I'm sure Keith wouldn't mind me posting this discussion, so I have done so. rom spencer Thu Jun 17 00:53:42 2004 Date: Thu, 17 Jun 2004 00:53:42 -0300 To: keith@kano.net Subject: PATCH: Corrections and improvements to the C++ in javabench This email message has real content and is not a flame. :) Regarding : As a professional C++ developer, I couldn't let the C++ code represent the language, even given the limited scope of the benchmarks (which seem to be more about optimizers and less about overall system performance). The C++ code presented was pretty amateurish, so I've made some improvements, which I've attached. I have not changed the algorithms of the overall tests since I will presume they were chosen by the original author with some basis, although there is little to convince me that this is the case. Since the files were so small, I decided just to send the whole new copies of the ones I changed. I'm using your run lines with SUN java 1.4.2_04-b05 (in server mode). I'm using the gcc-3.3.3 from the current Debian unstable. The java compiler/runtime seems to be very good at optimizing recursion! I am truly impressed with the scores for ackermann and fibo. (Even using a bit of gcc's __builtin_expect only shaved 5% off its times.) Without even running the hash and hash2 tests, I knew they would lose. I've made some comments in the attached files explaining why. I decided to look at the hash and hash2 tests next because the logic was very straight forward and the scores were so dramatically different. And they were next in alphabetical order :) After my changes, the C++ hash ran ~1.75 times faster than the Java, and the C++ hash2 ran ~0.89 times faster (i.e. slower) than the Java. This went to ~0.94 times faster with my gcc-3.4.0 I use professionally, probably due to changes from the "pool" allocator to the "mt" allocator in libstdc++ in the two configurations. The heapsort is kind of iffy, because in C++, you'd probably just use the STL algorithms that treat any randomly-accessible sequence as a heap, or you'd just call std::sort(). I revamped it anyway, just for amusement and didn't really get much out of it. Using std::sort() took a small fraction of the time of heapsort() with 10 000 000 elements, so I'm not putting much weight into this test (even though the C++ "won"); the sort algorithm here stinks. When I was done, the C++ was clocking ~1.35 times faster for N =3D 10 000 000. The matrix test is pretty strange with it's array of arrays, each with their own call to malloc. It's not even using objects for the matrices! This test is basically all "multiply and add". I'm surprised that C++ and Java didn't have nearly identical scores given this. The sophisticated C++ numerical packages (such as uBLAS ) uses lazy evaluation, and of course works with any reasonable value type. The methcall test is testing a _virtual_ method call, which is not always the case in C++. In fact, the test program does not even depend on the method being virtual to work correctly. Removing the virtuality of activate() makes the C++ ~1.6 times as fast as the Java code. The Java virtual method call is a lot faster than the C++ one, however. Since real C++ applications do not use virtual method calls exclusively (i.e. for every method call), I'd wager this at least comes out nearly even in the end in real software. The objinst test is a joke. Real programs define custom new and delete operators (easily via a shared base class) for small, frequently allocated objects. The default new and delete call malloc and free, which on some platforms, including Linux, don't perform well in those cases. Swapping in a memory pool like libstdc++'s default node allocator makes the C++ version _beat_ the Java version with no other changes: now it's ~1.05 times as fast as Java. When I get some more spare moments, I'll take a look at the rest of the tests. BTW, my build line, which I built up over my years of experience with g++ is as follows. Substitute your current CPU in the appropriate spots for some extra tuning. Note that the extra -ffast-math and -funroll-loops are not enabled with any of the -Ox flags. Note that -mcpu is now -mtune in gcc-3.4.x and above. g++ -Wall -O3 -fexpensive-optimizations -mtune=3Dathlon-tbird \ -march=3Dathlon-tbird -ffast-math -funroll-loops hash.cpp -o hash -- ---------------------------------------------------------------- Brad Spencer - spencer@jacknife.org - "It's quite nice..." "S.M.A.K.I.B.B.F.B." - A.J. Rimmer | http://jacknife.org/ Date: Thu, 17 Jun 2004 01:25:57 -0400 To: Brad Spencer References: <20040617035342.GA19068@jacknife.org> Subject: Re: PATCH: Corrections and improvements to the C++ in javabench Great, thanks for all this, I'm looking to improve the code for a separate set of hand-optimized benchmarks I'll release later this summer. I think that the original benchmarks represent performance of "average" code. As far as I know, the original author of the benchmarks didn't know C++ or Java any better than the other (or maybe at all) before he learned enough about them to write that code. So, in a way, the benchmarks represent "equivalent" Java and C++ code in terms of what kind of code an average program would contain. I'd like to know what you think about this idea. Don't worry, I will post a comparison of hand-optimized C++ vs. Java code, I just wonder what your feelings are about the idea that benchmarks shouldn't test hand-optimized code, because most code in the world isn't hand-optimized beyond being written with performance in mind (along with many other issues). -Keith From spencer Thu Jun 17 08:19:32 2004 Date: Thu, 17 Jun 2004 08:19:32 -0300 To: Keith Lea Subject: Re: PATCH: Corrections and improvements to the C++ in javabench On Thu, Jun 17, 2004 at 01:25:57AM -0400, Keith Lea wrote: > Great, thanks for all this, I'm looking to improve the code for a > separate set of hand-optimized benchmarks I'll release later this summer. You're welcome. I haven't but glanced at the other programs yet. Thanks for considering my email! > I think that the original benchmarks represent performance of "average"= > code. As far as I know, the original author of the benchmarks didn't > know C++ or Java any better than the other (or maybe at all) before he > learned enough about them to write that code. So, in a way, the > benchmarks represent "equivalent" Java and C++ code in terms of what > kind of code an average program would contain. > I'd like to know what you think about this idea. Don't worry, I will > post a comparison of hand-optimized C++ vs. Java code, I just wonder > what your feelings are about the idea that benchmarks shouldn't test > hand-optimized code, because most code in the world isn't hand-optimized > beyond being written with performance in mind (along with many other > issues). I don't consider the code to be hand-optimized. The code you see is pretty much how I and the folks I work with write code all the time. We of course have made some things you saw into functions (like the numeric to string conversions), but the constructs I used in hash, hash2, and objinst, for example, are all typical of daily code in my experience. Plus, if the goal is to capture the power of the language with only the "average" programmer, then I really hope the average programmer wouldn't be implementing some of these algorithms (matrix multiplication, sorting), where they're almost guaranteed to lose against existing implementations. I don't think that speed comparing the "most obvious program" is necessarily fair to either language. I think you need to implement the same algorithms, but with the knowledge of how to do things "right". Hopefully nobody expects to get notable performance from an "average" implementation when they aren't experienced with the language (although I do see the value there). I guess I'm just saying that in any language, there are "obvious" ways to do things that are also slow. I do think your tests show that the Java compiler _is_ on par with other language compilers. Given the current state of the art, I would hope this is the case; there's nothing in the language that should make small algorithms like these perform that much differently. Differences start to come out when you start to consider the tradeoffs encountered in real-world scenarios; I believe in a "right tool for the job" approach, and we use Java for quite a number of things where I work, in tandem with (but not linked to) C++. I'll forward some revised versions of the other tests later. Thanks again for your time! -- ---------------------------------------------------------------- Brad Spencer - spencer@jacknife.org - "It's quite nice..." "S.M.A.K.I.B.B.F.B." - A.J. Rimmer | http://jacknife.org/ Date: Thu, 17 Jun 2004 15:05:11 -0400 From: Keith Lea To: Brad Spencer Subject: Re: PATCH: Corrections and improvements to the C++ in javabench Brad Spencer wrote: > I don't consider the code to be hand-optimized. The code you see is > pretty much how I and the folks I work with write code all the time. > We of course have made some things you saw into functions (like the > numeric to string conversions), but the constructs I used in hash, > hash2, and objinst, for example, are all typical of daily code in my > experience. You seem like an experienced C++ programmer. Maybe hand-optimized isn't the right phrase, but your code is obviously optimized more than the original code. > Plus, if the goal is to capture the power of the language with only > the "average" programmer, then I really hope the average programmer > wouldn't be implementing some of these algorithms (matrix > multiplication, sorting), where they're almost guaranteed to lose > against existing implementations. I don't think that speed comparing > the "most obvious program" is necessarily fair to either language. I > think you need to implement the same algorithms, but with the > knowledge of how to do things "right". Hopefully nobody expects to > get notable performance from an "average" implementation when they > aren't experienced with the language (although I do see the value > there). I guess I'm just saying that in any language, there are > "obvious" ways to do things that are also slow. I agree with you here, in a way. The tests I ran are "narrow" in the sense that they do a very specific thing, over and over. Thus, they can be optimized heavily by a programmer, based on assumptions about what will and won't happen when the program is run. However, I think that most code in the world can't be so specifically optimized, because many things could happen in a real-world situation than in a 10-line single-threaded loop. So, maybe this benchmark code shouldn't be optimized by a programmer to represent more real-world code. > I do think your tests show that the Java compiler _is_ on par with > other language compilers. Given the current state of the art, I would > hope this is the case; there's nothing in the language that should > make small algorithms like these perform that much differently. > Differences start to come out when you start to consider the tradeoffs > encountered in real-world scenarios; I believe in a "right tool for > the job" approach, and we use Java for quite a number of things where > I work, in tandem with (but not linked to) C++. Yeah, this is the impression I wanted to get across by posting my results. At a very low level, Java and C++ have similar performance. -Keith From spencer Thu Jun 17 20:54:30 2004 Date: Thu, 17 Jun 2004 20:54:30 -0300 To: Keith Lea Subject: Re: PATCH: Corrections and improvements to the C++ in javabench On Thu, Jun 17, 2004 at 03:05:11PM -0400, Keith Lea wrote: > You seem like an experienced C++ programmer. Maybe hand-optimized isn't= > the right phrase, but your code is obviously optimized more than the > original code. [...] > The tests I ran are "narrow" in the sense that they do a very specific > thing, over and over. Thus, they can be optimized heavily by a > programmer, based on assumptions about what will and won't happen when > the program is run. However, I think that most code in the world can't > be so specifically optimized, because many things could happen in a > real-world situation than in a 10-line single-threaded loop. So, maybe > this benchmark code shouldn't be optimized by a programmer to represent > more real-world code. We appear to have different perspectives here :) There are few more things I'd like to say before I stop pestering you :) As an aside, I think professional programmers _do_ think about what will and won't happen when they write software; that's important not just for speed, but for correctness (and robustness under failure) as well. I am saying this from my experience building very highly available large scale systems. (BTW, all these real-word systems are single-threaded . . . but that's another topic ;) I must insist that if it's worth comparing the languages for speed, we can presume that speed is important in some contexts. I'm sure you would agree that otherwise the concern over execution speed is already unwarranted. And, I don't think it makes sense to say that you're not interested in how well the code is written, performance-wise, when you're talking about language speed comparisons. (To me, that sounds like comparing two drag-racing sports cars with monkeys behind the wheels. :) That being said, I should make it clear that I did not spend many iterations trying to tune the C++ implementations. I read the existing versions to see what the test was doing, and then revised the C++. I didn't go through several approaches before I hit upon the "faster-than-Java" versions; the natural C++ approaches I chose to start with were fast from the beginning. I don't consider that to be optimization. As far the overall original code quality goes, consider first the hash/hash2 tests. These used the hash_map extension, but did so with C-strings instead of std::string, which is Just Not C++. One test was buggy in that it modified the container when it just wanted to examine it. The other test was unnecessarily looking up data it already had via its iterator, when the Java test was not. Both also had multiple memory leaks (coming out of the strdup() calls). The objinst test purported to represent C++ small-object allocation performance (despite being a simple allocate/free test, and not even randomized), but completely ignored the mainstream notion of custom allocators. The only thing non-C in the matrix test is the use of "cout", and its data structure is a naive ("obvious") array of pointers to other arrays. The heapsort test appears to be completely C and makes no use of C++ features. I stop here only because I haven't done more than look at most of the rest of the tests, but they appear to be similar. It is for the reasons of the above paragraph (and more like them) that the original C++ tests are amateurish unfair representations of the language. This is why I thought it appropriate to provide some quality implementations. (I am not calling you an amateur by any means! :) > Yeah, this is the impression I wanted to get across by posting my > results. At a very low level, Java and C++ have similar performance. And kudos for it! I support you in this effort. -- ---------------------------------------------------------------- Brad Spencer - spencer@jacknife.org - "It's quite nice..." "S.M.A.K.I.B.B.F.B." - A.J. Rimmer | http://jacknife.org/ From kano@kano.net Fri Jun 18 03:24:29 2004 Date: Fri, 18 Jun 2004 02:24:20 -0400 From: Keith Lea To: Brad Spencer Subject: Re: PATCH: Corrections and improvements to the C++ in javabench Brad Spencer wrote: > As an aside, I think professional programmers _do_ think about what > will and won't happen when they write software; that's important not > just for speed, but for correctness (and robustness under failure) as > well. I am saying this from my experience building very highly > available large scale systems. (BTW, all these real-word systems are > single-threaded . . . but that's another topic ;) Yes, definitely. I was saying that most applications I've dealt with aren't so clear-cut as computing the Fibonacci sequence in a single loop. There are obvious optimizations that can be made in the latter case that maybe couldn't be made about a real computation a program has to perform. This might not be significant to the results though. > It is for the reasons of the above paragraph (and more like them) that > the original C++ tests are amateurish unfair representations > of the language. This is why I thought it appropriate > to provide some quality implementations. (I am not calling you an > amateur by any means! :) It's a good point that they may not be fair representations of the language. I don't think some of the Java code was a fair representation of Java either, so I'll be optimizing those myself and comparing them to some of the optimized C++ code I've been getting (including yours), later this summer. Thanks for all the good thinks, -Keith From spencer Fri Jun 18 08:59:20 2004 Date: Fri, 18 Jun 2004 08:59:20 -0300 To: Keith Lea Subject: Re: PATCH: Corrections and improvements to the C++ in javabench On Fri, Jun 18, 2004 at 02:24:20AM -0400, Keith Lea wrote: > It's a good point that they may not be fair representations of the > language. I don't think some of the Java code was a fair representation= > of Java either, so I'll be optimizing those myself and comparing them to= > some of the optimized C++ code I've been getting (including yours), > later this summer. Ok! If you could remember to drop me a note when that happens. > Thanks for all the good thinks, Likewise! -- ---------------------------------------------------------------- Brad Spencer - spencer@jacknife.org - "It's quite nice..." "S.M.A.K.I.B.B.F.B." - A.J. Rimmer | http://jacknife.org/