I'm sure Keith wouldn't mind me posting this discussion, so I have done so.

rom spencer Thu Jun 17 00:53:42 2004
Date: Thu, 17 Jun 2004 00:53:42 -0300
To: keith@kano.net
Subject: PATCH: Corrections and improvements to the C++ in javabench

This email message has real content and is not a flame. :)

Regarding <http://kano.net/javabench/>:

As a professional C++ developer, I couldn't let the C++ code represent
the language, even given the limited scope of the benchmarks (which
seem to be more about optimizers and less about overall system
performance).  The C++ code presented was pretty amateurish, so I've
made some improvements, which I've attached.  I have not changed the
algorithms of the overall tests since I will presume they
were chosen by the original author with some basis, although there is
little to convince me that this is the case.  Since the files were so
small, I decided just to send the whole new copies of the ones I
changed.  

I'm using your run lines with SUN java 1.4.2_04-b05 (in server mode).
I'm using the gcc-3.3.3 from the current Debian unstable.

The java compiler/runtime seems to be very good at optimizing recursion!
I am truly impressed with the scores for ackermann and fibo.  (Even using
a bit of gcc's __builtin_expect only shaved 5% off its times.)

Without even running the hash and hash2 tests, I knew they would
lose.  I've made some comments in the attached files explaining why.

I decided to look at the hash and hash2 tests next because the logic
was very straight forward and the scores were so dramatically
different.  And they were next in alphabetical order :)

After my changes, the C++ hash ran ~1.75 times faster than the Java,
and the C++ hash2 ran ~0.89 times faster (i.e. slower) than the Java.
This went to ~0.94 times faster with my gcc-3.4.0 I use
professionally, probably due to changes from the "pool" allocator to
the "mt" allocator in libstdc++ in the two configurations. 

The heapsort is kind of iffy, because in C++, you'd probably just use
the STL algorithms that treat any randomly-accessible sequence as a
heap, or you'd just call std::sort().  I revamped it anyway, just for
amusement and didn't really get much out of it.  Using std::sort()
took a small fraction of the time of heapsort() with 10 000 000
elements, so I'm not putting much weight into this test (even though
the C++ "won"); the sort algorithm here stinks.  When I was done, the
C++ was clocking ~1.35 times faster for N =3D 10 000 000.

The matrix test is pretty strange with it's array of arrays, each with
their own call to malloc.  It's not even using objects for the
matrices!  This test is basically all "multiply and add".  I'm
surprised that C++ and Java didn't have nearly identical scores given
this.  The sophisticated C++ numerical packages (such as
uBLAS <http://www.boost.org/libs/numeric/ublas/doc/overview.htm>) uses
lazy evaluation, and of course works with any reasonable value type.

The methcall test is testing a _virtual_ method call, which is not
always the case in C++.  In fact, the test program does not even
depend on the method being virtual to work correctly.  Removing the
virtuality of activate() makes the C++ ~1.6 times as fast as the Java
code.  The Java virtual method call is a lot faster than the C++ one,
however.  Since real C++ applications do not use virtual method calls
exclusively (i.e. for every method call), I'd wager this at least
comes out nearly even in the end in real software.

The objinst test is a joke.  Real programs define custom new and
delete operators (easily via a shared base class) for small,
frequently allocated objects.  The default new and delete call malloc
and free, which on some platforms, including Linux, don't perform well
in those cases.  Swapping in a memory pool like libstdc++'s default
node allocator makes the C++ version _beat_ the Java version with no
other changes: now it's ~1.05 times as fast as Java.

When I get some more spare moments, I'll take a look at the rest of
the tests.

BTW, my build line, which I built up over my years of experience with
g++ is as follows.  Substitute your current CPU in the appropriate
spots for some extra tuning.  Note that the extra -ffast-math and
-funroll-loops are not enabled with any of the -Ox flags.  Note that
-mcpu is now -mtune in gcc-3.4.x and above.

g++ -Wall -O3 -fexpensive-optimizations -mtune=3Dathlon-tbird \
  -march=3Dathlon-tbird -ffast-math -funroll-loops hash.cpp -o hash

       
--
----------------------------------------------------------------
Brad Spencer - spencer@jacknife.org - "It's quite nice..."
"S.M.A.K.I.B.B.F.B." - A.J. Rimmer | http://jacknife.org/

Date: Thu, 17 Jun 2004 01:25:57 -0400
To: Brad Spencer <spencer@jacknife.org>
References: <20040617035342.GA19068@jacknife.org>
Subject: Re: PATCH: Corrections and improvements to the C++ in javabench

Great, thanks for all this, I'm looking to improve the code for a 
separate set of hand-optimized benchmarks I'll release later this summer.

I think that the original benchmarks represent performance of "average" 
code. As far as I know, the original author of the benchmarks didn't 
know C++ or Java any better than the other (or maybe at all) before he 
learned enough about them to write that code. So, in a way, the 
benchmarks represent "equivalent" Java and C++ code in terms of what 
kind of code an average program would contain.

I'd like to know what you think about this idea. Don't worry, I will 
post a comparison of hand-optimized C++ vs. Java code, I just wonder 
what your feelings are about the idea that benchmarks shouldn't test 
hand-optimized code, because most code in the world isn't hand-optimized 
beyond being written with performance in mind (along with many other 
issues).

-Keith

From spencer Thu Jun 17 08:19:32 2004
Date: Thu, 17 Jun 2004 08:19:32 -0300
To: Keith Lea <kano@kano.net>
Subject: Re: PATCH: Corrections and improvements to the C++ in javabench

On Thu, Jun 17, 2004 at 01:25:57AM -0400, Keith Lea wrote:
> Great, thanks for all this, I'm looking to improve the code for a
> separate set of hand-optimized benchmarks I'll release later this summer.

You're welcome.  I haven't but glanced at the other programs yet.
Thanks for considering my email!

> I think that the original benchmarks represent performance of "average"=
> code. As far as I know, the original author of the benchmarks didn't
> know C++ or Java any better than the other (or maybe at all) before he
> learned enough about them to write that code. So, in a way, the
> benchmarks represent "equivalent" Java and C++ code in terms of what
> kind of code an average program would contain.

> I'd like to know what you think about this idea. Don't worry, I will
> post a comparison of hand-optimized C++ vs. Java code, I just wonder
> what your feelings are about the idea that benchmarks shouldn't test
> hand-optimized code, because most code in the world isn't hand-optimized
> beyond being written with performance in mind (along with many other
> issues).

I don't consider the code to be hand-optimized.  The code you see is
pretty much how I and the folks I work with write code all the time.
We of course have made some things you saw into functions (like the
numeric to string conversions), but the constructs I used in hash,
hash2, and objinst, for example, are all typical of daily code in my
experience.

Plus, if the goal is to capture the power of the language with only
the "average" programmer, then I really hope the average programmer
wouldn't be implementing some of these algorithms (matrix
multiplication, sorting), where they're almost guaranteed to lose
against existing implementations.  I don't think that speed comparing
the "most obvious program" is necessarily fair to either language.  I
think you need to implement the same algorithms, but with the
knowledge of how to do things "right".  Hopefully nobody expects to
get notable performance from an "average" implementation when they
aren't experienced with the language (although I do see the value
there).  I guess I'm just saying that in any language, there are
"obvious" ways to do things that are also slow.

I do think your tests show that the Java compiler _is_ on par with
other language compilers.  Given the current state of the art, I would
hope this is the case; there's nothing in the language that should
make small algorithms like these perform that much differently.
Differences start to come out when you start to consider the tradeoffs
encountered in real-world scenarios; I believe in a "right tool for
the job" approach, and we use Java for quite a number of things where
I work, in tandem with (but not linked to) C++.

I'll forward some revised versions of the other tests later.  Thanks
again for your time!

--
----------------------------------------------------------------
Brad Spencer - spencer@jacknife.org - "It's quite nice..."
"S.M.A.K.I.B.B.F.B." - A.J. Rimmer | http://jacknife.org/

Date: Thu, 17 Jun 2004 15:05:11 -0400
From: Keith Lea <kano@kano.net>
To: Brad Spencer <spencer@jacknife.org>
Subject: Re: PATCH: Corrections and improvements to the C++ in javabench

Brad Spencer wrote:
> I don't consider the code to be hand-optimized.  The code you see is
> pretty much how I and the folks I work with write code all the time.
> We of course have made some things you saw into functions (like the
> numeric to string conversions), but the constructs I used in hash,
> hash2, and objinst, for example, are all typical of daily code in my
> experience. 

You seem like an experienced C++ programmer. Maybe hand-optimized isn't 
the right phrase, but your code is obviously optimized more than the 
original code.

> Plus, if the goal is to capture the power of the language with only
> the "average" programmer, then I really hope the average programmer
> wouldn't be implementing some of these algorithms (matrix
> multiplication, sorting), where they're almost guaranteed to lose
> against existing implementations.  I don't think that speed comparing
> the "most obvious program" is necessarily fair to either language.  I
> think you need to implement the same algorithms, but with the
> knowledge of how to do things "right".  Hopefully nobody expects to
> get notable performance from an "average" implementation when they
> aren't experienced with the language (although I do see the value
> there).  I guess I'm just saying that in any language, there are
> "obvious" ways to do things that are also slow.

I agree with you here, in a way.

The tests I ran are "narrow" in the sense that they do a very specific 
thing, over and over. Thus, they can be optimized heavily by a 
programmer, based on assumptions about what will and won't happen when 
the program is run. However, I think that most code in the world can't 
be so specifically optimized, because many things could happen in a 
real-world situation than in a 10-line single-threaded loop. So, maybe 
this benchmark code shouldn't be optimized by a programmer to represent 
more real-world code.

> I do think your tests show that the Java compiler _is_ on par with
> other language compilers.  Given the current state of the art, I would
> hope this is the case; there's nothing in the language that should
> make small algorithms like these perform that much differently.
> Differences start to come out when you start to consider the tradeoffs
> encountered in real-world scenarios; I believe in a "right tool for
> the job" approach, and we use Java for quite a number of things where
> I work, in tandem with (but not linked to) C++.

Yeah, this is the impression I wanted to get across by posting my 
results. At a very low level, Java and C++ have similar performance.

-Keith

From spencer Thu Jun 17 20:54:30 2004
Date: Thu, 17 Jun 2004 20:54:30 -0300
To: Keith Lea <kano@kano.net>
Subject: Re: PATCH: Corrections and improvements to the C++ in javabench

On Thu, Jun 17, 2004 at 03:05:11PM -0400, Keith Lea wrote:
> You seem like an experienced C++ programmer. Maybe hand-optimized isn't=

> the right phrase, but your code is obviously optimized more than the
> original code.
[...]
> The tests I ran are "narrow" in the sense that they do a very specific
> thing, over and over. Thus, they can be optimized heavily by a
> programmer, based on assumptions about what will and won't happen when
> the program is run. However, I think that most code in the world can't
> be so specifically optimized, because many things could happen in a
> real-world situation than in a 10-line single-threaded loop. So, maybe
> this benchmark code shouldn't be optimized by a programmer to represent
> more real-world code.

We appear to have different perspectives here :)  There are few more
things I'd like to say before I stop pestering you :)

As an aside, I think professional programmers _do_ think about what
will and won't happen when they write software; that's important not
just for speed, but for correctness (and robustness under failure) as
well.  I am saying this from my experience building very highly
available large scale systems.  (BTW, all these real-word systems are
single-threaded . . . but that's another topic ;)

I must insist that if it's worth comparing the languages for speed, we
can presume that speed is important in some contexts.  I'm sure you
would agree that otherwise the concern over execution speed is already
unwarranted.  And, I don't think it makes sense to say that you're not
interested in how well the code is written, performance-wise, when
you're talking about language speed comparisons.  (To me, that sounds
like comparing two drag-racing sports cars with monkeys behind the
wheels. :)

That being said, I should make it clear that I did not spend many
iterations trying to tune the C++ implementations.  I read the
existing versions to see what the test was doing, and then revised
the C++.  I didn't go through several approaches before I hit upon the
"faster-than-Java" versions; the natural C++ approaches I chose to
start with were fast from the beginning.  I don't consider that to be
optimization.  

As far the overall original code quality goes, consider first the
hash/hash2 tests.  These used the hash_map extension, but did so with
C-strings instead of std::string, which is Just Not C++.  One test was
buggy in that it modified the container when it just wanted to examine
it.  The other test was unnecessarily looking up data it already had
via its iterator, when the Java test was not.  Both also had multiple
memory leaks (coming out of the strdup() calls).  The objinst test
purported to represent C++ small-object allocation performance
(despite being a simple allocate/free test, and not even randomized),
but completely ignored the mainstream notion of custom allocators.
The only thing non-C in the matrix test is the use of "cout", and its
data structure is a naive ("obvious") array of pointers to other
arrays.  The heapsort test appears to be completely C and makes no use
of C++ features.  I stop here only because I haven't done more than
look at most of the rest of the tests, but they appear to be similar.

It is for the reasons of the above paragraph (and more like them) that
the original C++ tests are amateurish unfair representations
of the language.  This is why I thought it appropriate
to provide some quality implementations.  (I am not calling you an
amateur by any means! :)

> Yeah, this is the impression I wanted to get across by posting my
> results. At a very low level, Java and C++ have similar performance.

And kudos for it!  I support you in this effort.

--
----------------------------------------------------------------
Brad Spencer - spencer@jacknife.org - "It's quite nice..."
"S.M.A.K.I.B.B.F.B." - A.J. Rimmer | http://jacknife.org/

From kano@kano.net Fri Jun 18 03:24:29 2004
Date: Fri, 18 Jun 2004 02:24:20 -0400
From: Keith Lea <kano@kano.net>
To: Brad Spencer <spencer@jacknife.org>
Subject: Re: PATCH: Corrections and improvements to the C++ in javabench

Brad Spencer wrote:
> As an aside, I think professional programmers _do_ think about what
> will and won't happen when they write software; that's important not
> just for speed, but for correctness (and robustness under failure) as
> well.  I am saying this from my experience building very highly
> available large scale systems.  (BTW, all these real-word systems are
> single-threaded . . . but that's another topic ;)

Yes, definitely. I was saying that most applications I've dealt with 
aren't so clear-cut as computing the Fibonacci sequence in a single 
loop. There are obvious optimizations that can be made in the latter 
case that maybe couldn't be made about a real computation a program has 
to perform. This might not be significant to the results though.

> It is for the reasons of the above paragraph (and more like them) that
> the original C++ tests are amateurish unfair representations
> of the language.  This is why I thought it appropriate
> to provide some quality implementations.  (I am not calling you an
> amateur by any means! :)

It's a good point that they may not be fair representations of the 
language. I don't think some of the Java code was a fair representation 
of Java either, so I'll be optimizing those myself and comparing them to 
some of the optimized C++ code I've been getting (including yours), 
later this summer.

Thanks for all the good thinks,
-Keith

From spencer Fri Jun 18 08:59:20 2004
Date: Fri, 18 Jun 2004 08:59:20 -0300
To: Keith Lea <kano@kano.net>
Subject: Re: PATCH: Corrections and improvements to the C++ in javabench

On Fri, Jun 18, 2004 at 02:24:20AM -0400, Keith Lea wrote:
> It's a good point that they may not be fair representations of the
> language. I don't think some of the Java code was a fair representation=
> of Java either, so I'll be optimizing those myself and comparing them to=
> some of the optimized C++ code I've been getting (including yours),
> later this summer.

Ok!  If you could remember to drop me a note when that happens. 

> Thanks for all the good thinks,

Likewise!

--
----------------------------------------------------------------
Brad Spencer - spencer@jacknife.org - "It's quite nice..."
"S.M.A.K.I.B.B.F.B." - A.J. Rimmer | http://jacknife.org/