The Computer Language Benchmarks Game
A game begun years ago. A game with many players. A game with many winners.
Learn something - look at several benchmarks and compare the program source code for a language you don't know with a language you do know, compare them on different measures, read Flawed Benchmarks. Write programs that improve the showing of your chosen language. Write the best program in your chosen language.
On 3 measures - ↓ Time-used, ↓ Memory-used and ↓ Code-used.
Compare them directly one-against-another for all the benchmarks.
Compare the summary measurements for just those language implementations.
March 1 - August 31, 2009
1,022,612 Unique Page Views; 150,802 Absolute Unique Visitors; 166 programs contributed.
It varies from benchmark to benchmark. It varies from week to week. It depends which language implementations are compared. It depends which measures are compared.
When the facts exceed our curiosity.
Look at what we show for Ubuntu™ Intel® Q6600® quad-core. Choose one of those programming languages. Choose one of those benchmarks. Read and accept the benchmarks game rules. Ask questions ↓ in the discussion forum.
↓ Write a new program and make sure it's correct by diff'ing the output. Profile and improve the program. ↓ Attach the program source code file to a tracker item.
There are 4 sets of up-to-date measurements. Measurements for different OS/machine combinations are shown on different color-coded pages. Click one of these color-code links to see measurements for a particular OS/machine -
Ubuntu™ : Intel® Q6600® quad-core
x64 Ubuntu™ : Intel® Q6600® quad-core
x64 Ubuntu™ : Intel® Q6600® one core
Ubuntu™ : Intel® Q6600® one core
It's worth asking why a program is better in one set of measurements rather than another set of measurements made on the same Intel® Q6600® machine. Caveat lector! Check the source code!
It isn't worth asking why a program is better on a different test machine, because as well as the obvious differences - hardware, os, language implementation versions - it's likely that the programs measured on the different machines are different programs (either because missing third party libraries stop a program being measured, or simply because the program was not downloaded and measured).
Gentoo : Intel® Pentium® 4
Debian : AMD™ Sempron™

They raced up, and down, and around and around and around, and forwards and backwards and sideways and upside-down.
Cheetah's friends said "it's not fair" - everyone knows Cheetah is the fastest creature but the races are too long and Cheetah gets tired!
Falcon's friends said "it's not fair" - everyone knows Falcon is the fastest creature but Falcon doesn't walk very well, he soars across the sky!
Horse's friends said "it's not fair" - everyone knows Horse is the fastest creature but this is only a yearling, you must stop the races until a stallion takes part!
Man's friends said "it's not fair" - everyone knows that in the "real world" Man would use a motorbike, you must wait until Man has fueled and warmed up the engine!
Snail's friends said "it's not fair" - everyone knows that a creature should leave a slime trail, all those other creatures are cheating!
Dalmatian's tail was banging on the ground. Dalmatian panted and between breaths said "Look at that beautiful mountain, let's race to the top!"
When the program was being measured: the first core was not-idle about 27% of the time, the second core was not-idle about 34% of the time, the third core was not-idle about 28% of the time, the fourth core was not-idle about 67% of the time.
When all the programs show ~CPU Load like this '0% 0% 0% 100%' you are probably looking at measurements of programs forced to use just one core - the fourth core (rather than being allowed to use any or all of the CPU cores).
N means the value passed to the program on the command-line (or the value used to create the data file passed to the program on stdin). Larger N causes the program to do more work - mostly measurements are shown for the largest N, the largest workload.
Interesting Alternative Program means that the program doesn't implement the benchmark according to the arbitrary and idiosyncratic rules of The Computer Language Benchmarks Game - but we simply couldn't resist showing the program.
Nothing - they are arbitrary suffixes that identify a specific program.
Each program was run as a child-process of a Python script using Popen.
Time measurements include program startup time.
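As a rough illustration of what measuring a child process with Popen can look like, here is a minimal sketch. It is not the actual measurement script: the function name `measure_elapsed` and the stand-in workload are assumptions made for this example. The clock starts before the child is created, so startup time is included in the result.

```python
import subprocess
import sys
import time


def measure_elapsed(command):
    """Run command as a child process; return (elapsed_seconds, exit_code).

    The clock starts before Popen is called, so process startup
    time is part of the measurement.
    """
    start = time.perf_counter()
    child = subprocess.Popen(command, stdout=subprocess.DEVNULL)
    child.wait()
    elapsed = time.perf_counter() - start
    return elapsed, child.returncode


if __name__ == "__main__":
    # Hypothetical workload: a tiny Python program standing in for a benchmark.
    command = [sys.executable, "-c", "print(sum(range(1000)))"]
    elapsed, code = measure_elapsed(command)
    print(f"exit code {code}, elapsed {elapsed:.3f}s")
```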
By sampling GTop proc_mem for the program and its child processes every 0.2 seconds. Obviously those measurements are unlikely to be reliable for programs that run for less than 0.2 seconds.
We started with the source-code markup you can see, removed comments, removed duplicate whitespace characters, and then applied minimum GZip compression. The Code-used measurement is the size in bytes of that GZip compressed source-code file.
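The steps described above (strip comments, collapse whitespace, compress, count bytes) can be sketched in Python. This is only an approximation under stated assumptions: the function name `code_used` is made up for this example, comments are assumed to be simple line comments introduced by a single marker, and `compresslevel=1` is our reading of "minimum GZip compression". The real script may differ on all three points.

```python
import gzip
import re


def code_used(source, line_comment="#"):
    """Approximate a Code-used figure, in bytes, for one source text.

    Assumption: every comment starts with `line_comment` and runs to
    end-of-line; markers inside string literals are not handled.
    """
    kept = []
    for line in source.splitlines():
        line = line.split(line_comment, 1)[0]     # drop the comment part
        line = re.sub(r"\s+", " ", line).strip()  # collapse whitespace runs
        if line:
            kept.append(line)
    data = "\n".join(kept).encode("utf-8")
    # Lowest compression level, then count the compressed bytes.
    return len(gzip.compress(data, compresslevel=1))


if __name__ == "__main__":
    sample = "total = 0   # running sum\nfor i in range(10):\n    total += i\n"
    print(code_used(sample), "bytes")
```

The stripped text only has to be measured, not executed, so losing indentation here does not matter.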
The GTop cpu idle and GTop cpu total were taken before forking the child-process and after the child-process exits. The percentages represent the proportion of cpu not-idle to cpu total for each core.
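The arithmetic behind those percentages can be sketched as follows. The function name and the counter values are invented for illustration; the example assumes each core reports a pair of (idle, total) counters, read once before the child-process starts and once after it exits.

```python
def cpu_load_percentages(before, after):
    """Return the rounded not-idle percentage for each core.

    before, after: one (idle, total) counter pair per core.
    """
    loads = []
    for (idle0, total0), (idle1, total1) in zip(before, after):
        total = total1 - total0
        idle = idle1 - idle0
        busy = total - idle
        loads.append(round(100 * busy / total) if total else 0)
    return loads


if __name__ == "__main__":
    # Hypothetical counters for a four-core machine.
    before = [(0, 0), (0, 0), (0, 0), (0, 0)]
    after = [(73, 100), (66, 100), (72, 100), (33, 100)]
    print(" ".join(f"{p}%" for p in cpu_load_percentages(before, after)))
    # prints: 27% 34% 28% 67%
```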
In these (x86 Ubuntu™ : Intel® Q6600® quad-core) examples we measured elapsed time once the Java program had started: in the first case, we simply started and measured the program 66 times; in the second case, we started the program once and repeated measurements again and again and again 66 times without restarting the JVM; and then discarded the first measurement leaving 65 data points.
The usual startup measurements and the "Java 6 steady state" approximations (and JVM time) are shown alongside for comparison.
| "1.6.0_16" | started 65 times: mean | started 65 times: σ | repeated 65 times: mean | repeated 65 times: σ | usual startup | approx. | JVM time |
|---|---|---|---|---|---|---|---|
| meteor-contest | 0.23s | 0.01 | 0.12s | 0.00 | 0.31s | 0.13s | (14 sec) |
| spectral-norm | 4.08s | 0.23 | 3.96s | 0.02 | 4.11s | 3.96s | (4 min) |
| pidigits | 5.03s | 0.14 | 4.65s | 0.10 | 5.00s | 4.55s | (5 min) |
| fasta | 7.50s | 0.21 | 6.91s | 0.12 | 7.51s | - | - |
| mandelbrot | 11.91s | 0.81 | 11.46s | 0.06 | 10.95s | 11.40s | (13 min) |
| binary-trees | 20.69s | 0.69 | 15.35s | 0.43 | 19.18s | 15.40s | (17 min) |
| fannkuch | 18.74s | 0.66 | 18.56s | 0.48 | 18.43s | 18.59s | (20 min) |
| nbody | 24.94s | 0.02 | 25.07s | 1.22 | 25.03s | 24.69s | (27 min) |
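For readers who want to reproduce the mean and σ columns from their own timings, a minimal sketch follows. The function name and sample values are made up, and the use of the population standard deviation is an assumption, since the page does not say which form of σ was used. The `discard_first` option mirrors dropping the first measurement in the repeated-without-restart case.

```python
import statistics


def summarize(samples, discard_first=False):
    """Return (mean, sigma) for a list of elapsed times in seconds."""
    if discard_first:
        samples = samples[1:]  # drop the first, un-warmed measurement
    return statistics.mean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    # Hypothetical timings: the first run is slower than the rest.
    timings = [9.0, 1.0, 3.0]
    mean, sigma = summarize(timings, discard_first=True)
    print(f"mean {mean:.2f}s, sigma {sigma:.2f}")
    # prints: mean 2.00s, sigma 1.00
```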
Loading Java bytecode, profiling and dynamic compilation do take time but not enough time to make much of a difference in these examples.
The obvious differences show where there is a mismatch between program structure and JVM optimization - even though methods have been fully compiled the JVM continues using the on-stack-replacement. The opportunity to use the fully optimized compiled methods seems only to arise the next time the code block is invoked - whether that's in 10 seconds or 10 days.
To highlight that mismatch, "Java 6 steady state" approximations are shown in the measurement tables alongside the usual startup measurements.
Without any optimization option the GCC compiler goal is to reduce compilation cost and make debugging reasonable. Typically we might set -O3 -fomit-frame-pointer -march=native. For some benchmarks -mfpmath=sse -msse2 makes a noticeable difference (note J2SE use of SSE instruction sets).
We use a quad-core 2.4GHz Intel® Q6600® machine with 4GB of RAM and 250GB SATA II disk drive.
The out-of-date measurements used a single-processor 2.2GHz AMD™ Sempron™ machine with 512MB of RAM and 40GB IDE disk drive; and a single-processor 2GHz Intel® Pentium® 4 machine with 512MB of RAM and 80GB IDE disk drive.
We use Ubuntu™ 9.04 Linux Kernel 2.6.28-17-generic
The out-of-date measurements used Debian Linux 'unstable', Kernel 2.6.18-3-k7 and Gentoo Linux gentoo-sources-2.6.20-r6
Broadcast message from root@hopper (Sun Oct 1 16:33:48 2006): Power button pressed The system is going down for system halt NOW!
Broadcast message from root@hopper (Sun Oct 1 16:33:48 2006): Power button pressed The system is going down for system halt NOW!
Broadcast message from root@hopper (Sun Oct 1 16:33:49 2006): Power button pressed The system is going down for system halt NOW!
Broadcast message from root@hopper (Sun Oct 1 16:33:49 2006): Power button pressed The system is going down for system halt NOW!
Broadcast message from root@hopper (Sun Oct 1 16:33:49 2006): Power button pressed The system is going down for system halt NOW!
Broadcast message from root@hopper (Sun Oct 1 16:33:50 2006): Power button pressed The system is going down for system halt NOW!
We no longer accept anonymous comments - please create an Alioth ID and login.
Ask questions or discuss the benchmarks in the discussion forums.
Tell us about content mistakes, inconsistencies, bad installs etc - Report a Bug.
Tell us about the latest language updates etc - add a Feature Request.
We use Andre Simon's highlight to convert program source code to XHTML, please contribute better language definition files.
Change the things you don't like - convince us that the change is a worthwhile improvement and then expect to do all the work.
Be Nice! Maybe we'll reject the program. Maybe we'll prefer our own opinions. Maybe we'll decide not to change something.
We prefer plain vanilla programs - after all we're trying to compare language implementations not programmer effort and skill. We'd like your programs to be easily viewable - so please format your code to fit in less than 80 columns (we don't measure lines-of-code!).
We also have a weakness for idiosyncratic, elegant, clever programs; and when they are too elegant to meet the requirements of the benchmark we might still show them in the ↓ Interesting Alternative Programs section.
Do design-iteration on your machine, or in a language newsgroup. Only Contribute Programs which give correct results on your machine - diff the program output with the provided output file. (Don't make-unnecessary-work for the committers.)
Leave it a couple of days, and then see if there are any minor improvements that you'd like to make, before you Contribute Programs to The Computer Language Benchmarks Game.
Programs are measured across a range of input-values; programs are expected to either take a single command-line parameter or read text from stdin.
(Look at what the other programs do.)
Programs should write to stdout. Program output is redirected to a log-file and diff'd with the expected output.
(Look at what the other programs do.)
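Before contributing, the same kind of check can be run locally. The sketch below is an assumption about how such a check might look, not the committers' script: it runs a program with N on the command line, captures stdout, and diffs it against the expected text. The function name `check_output` and the stand-in program are hypothetical.

```python
import difflib
import subprocess
import sys


def check_output(command, expected_text):
    """Run command and return unified-diff lines (empty list if identical)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return list(difflib.unified_diff(
        expected_text.splitlines(keepends=True),
        result.stdout.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    ))


if __name__ == "__main__":
    # Hypothetical program: doubles the N given on the command line.
    program = "import sys; print(int(sys.argv[1]) * 2)"
    diff = check_output([sys.executable, "-c", program, "21"], "42\n")
    print("output matches" if not diff else "".join(diff))
```

An empty diff means the output is byte-for-byte what was expected; anything else is worth fixing before attaching the program to a tracker item.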
Include a header comment in the program like this:
/* The Computer Language Benchmarks Game
   http://shootout.alioth.debian.org/

   contributed by …
   modified by …
*/
We use a simple script to split a single source file into multiple target source files.
One of the target source files must have the same filename as the original single source file, and is expected to be the 'main' program.
For example, the Eiffel nbody.e source file will be split into 3 target source files - nbody.e, body.e, nbody_system.e - each new target source file will start from the comment line which included the SPLITFILE=target-filename directive and run to the line preceding the next SPLITFILE=target-filename directive or end-of-file.
So, the new target source file body.e will start with the line
-- SPLITFILE=body.e
and end with the empty line following
end -- class BODY
Don't manually unroll loops!
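The SPLITFILE splitting rule described above can be sketched in a few lines of Python. This is not the script we actually use; the function name and the sample text are invented, and the sketch assumes that any lines before the first directive are simply dropped.

```python
import re

# A directive is the text SPLITFILE=target-filename somewhere in a comment line.
DIRECTIVE = re.compile(r"SPLITFILE=(\S+)")


def split_source(text):
    """Return {target_filename: content} for one single-file source text.

    Each target starts at its directive line and runs to the line
    preceding the next directive, or to end-of-file.
    """
    targets = {}
    current = None
    for line in text.splitlines(keepends=True):
        match = DIRECTIVE.search(line)
        if match:
            current = match.group(1)
            targets[current] = []
        if current is not None:
            targets[current].append(line)
    return {name: "".join(lines) for name, lines in targets.items()}


if __name__ == "__main__":
    # Hypothetical, heavily shortened stand-in for an Eiffel source file.
    sample = (
        "-- SPLITFILE=nbody.e\nclass NBODY\nend\n\n"
        "-- SPLITFILE=body.e\nclass BODY\nend -- class BODY\n\n"
    )
    for name, content in split_source(sample).items():
        print(name, len(content.splitlines()), "lines")
```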
We'll remove any promos that you add as comment text, so please don't waste our time.
There are many contributors and few committers - a little more time spent by contributors saves committers a great deal more time.
Attach the full source-code file of a tested program. Please don't paste source-code into the description field. Please don't contribute patch-files.
Before contributing programs
Follow these instructions step-by-step
Now start from the bottom of the "Play the Benchmarks Game" Submit-New form and work your way up.
You created an ↓ Alioth ID with a valid email address so you'll receive email updates when your program is accepted and measured.
Periodically we go through and remove slower programs from the website (if there's a faster program for the same language implementation). We don't remove those programs from the "Play the Benchmarks Game" tracker.
You can see previous programs by browsing through the Play the Benchmarks Game tracker items and looking at the attached source code files. If you log in with your Alioth ID, you will be able to create and save a query to search for particular tracker items.
You can see information about the language implementation, including the version number, at the bottom of each language measurements page.
You can see the build commands and runtime options at the bottom of each program page - make, command line, and program output logs.
The project is hosted by Alioth FusionForge.
You can browse the CVS tree.
We are trying to show the performance of various programming language implementations - so we ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result.
Back in the day, Doug Bagley used both same way (same algorithm) and same thing (same result) benchmarks - so in many cases the performance differences were simply better algorithms.
After hearing many arguments, it seems to me that we should think of same way (same algorithm) tests as benchmarks, and we should think of same thing (same result) tests as contests.
At present, we show just one contest - meteor-contest.
Is the language implementation
If that wasn't discouraging enough: in too many cases we've been asked to include a language implementation, and been told that of course programs would be contributed, but once the language didn't seem to perform as well as hoped, no more programs were contributed. We're interested in the whole range of performance - not just in the 5 programs which show a language implementation at its best.
We have no ambition to measure every Python implementation or every Haskell implementation or every C implementation - that's a chore for all you Python enthusiasts and Haskell enthusiasts and C enthusiasts, a chore which might be straightforward if you use our measurement scripts.
We are unable to publish measurements for many commercial language implementations simply because their license conditions forbid it.
We will accept and reject languages in a capricious and unfair fashion - so ask if we're interested before you start coding.
The Python script "bencher does repeated measurements of program cpu time, elapsed time, resident memory usage, cpu load while a program is running - and summarizes those measurements" - download bencher
As an alternative, you should take a look at these Python measurement scripts designed for statistically rigorous Java performance evaluation - JavaStats
You'll come across a range of uses for the programs and measurements: