The Computer Language Benchmarks Game
There are 4 sets of up-to-date measurements. Measurements for different OS/machine combinations are shown on different color-coded pages.
Example: Ubuntu™ : Intel® Q6600® quad-core - select all benchmarks and select all languages in the drop-down menus.
Example: x64 Ubuntu™ : Intel® Q6600® quad-core - select spectral-norm and select all languages in the drop-down menus.
Example: x64 Ubuntu™ : Intel® Q6600® one core - select all benchmarks and select Java 6 -server in the drop-down menus.
Example: Ubuntu™ : Intel® Q6600® one core - select all benchmarks, select Java 6 -server and select Python CPython in the drop-down menus.
Example: select Java 6 -server in the drop-down menu.
Each program was run as a child-process of a Python script using Popen.
Time measurements include program startup time.
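A minimal sketch of that kind of measurement, assuming nothing about the actual script beyond the use of Popen: the clock starts before the child process is created, so program startup is part of the elapsed time. The function name `measure_elapsed` is hypothetical.

```python
import subprocess
import sys
import time


def measure_elapsed(command, stdin_data=None):
    """Run command as a child process; return (elapsed seconds, stdout bytes)."""
    start = time.perf_counter()  # started before the child exists
    child = subprocess.Popen(
        command,
        stdin=subprocess.PIPE if stdin_data is not None else None,
        stdout=subprocess.PIPE,
    )
    output, _ = child.communicate(stdin_data)  # waits for the child to exit
    elapsed = time.perf_counter() - start      # includes program startup
    return elapsed, output


# Hypothetical usage: time a trivial Python child process.
seconds, out = measure_elapsed([sys.executable, "-c", "print(42)"])
```

Even a program that does almost no work reports a non-zero time here, because interpreter or runtime startup is inside the measured interval.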
Memory use was measured by sampling GTop proc_mem for the program and its child processes every 0.2 seconds. Obviously those measurements are unlikely to be reliable for programs that run for less than 0.2 seconds.
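The sampling idea can be sketched as a polling loop. This is an illustration only: `read_memory` and `is_running` are hypothetical hooks standing in for the GTop proc_mem reading (summed over the program and its children) and the process-liveness check, and keeping the peak sample is an assumption about how the samples are summarized.

```python
import time


def peak_memory(read_memory, is_running, interval=0.2, sleep=time.sleep):
    """Poll read_memory() every `interval` seconds while is_running() is true.

    Returns (peak, sample_count). Both callables are hypothetical hooks.
    """
    peak, samples = 0, 0
    while is_running():
        peak = max(peak, read_memory())
        samples += 1
        sleep(interval)
    return peak, samples
```

A program that exits between two polls contributes few samples or none at all, which is why very short runs give unreliable figures.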
We started with the source-code markup you can see, removed comments, removed duplicate whitespace characters, and then applied minimum GZip compression. The Code-used measurement is the size in bytes of that GZip compressed source-code file.
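A rough sketch of the Code-used calculation follows. Several details are assumptions rather than the measurement script's actual behaviour: comment removal is shown only for C-style comments, "duplicate whitespace" is read as collapsing each run of whitespace to one space, and "minimum GZip compression" is read as compression level 1. The function name `code_used` is hypothetical.

```python
import gzip
import re


def code_used(source):
    """Approximate the Code-used size in bytes for C-style source text."""
    # Assumption: comments are /* ... */ or // ...; other languages differ,
    # and this naive pattern would also strip comment markers inside strings.
    without_comments = re.sub(r"/\*.*?\*/|//[^\n]*", "", source, flags=re.DOTALL)
    # Assumption: collapse each run of whitespace to a single space.
    squeezed = re.sub(r"\s+", " ", without_comments).strip()
    # Assumption: level 1 is gzip's fastest, least thorough setting.
    return len(gzip.compress(squeezed.encode("utf-8"), compresslevel=1))
```

Because comments and extra whitespace are removed first, reformatting or commenting a program does not change its measured size.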
The GTop cpu idle and GTop cpu total were taken before forking the child-process and after the child-process exits. The percentages represent the proportion of cpu not-idle to cpu total for each core.
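The arithmetic can be sketched as below, assuming each snapshot is a list of cumulative (idle, total) counter pairs, one pair per core. The function name `cpu_load` and the data layout are illustrative, not taken from the measurement script.

```python
def cpu_load(before, after):
    """Per-core not-idle percentage between two snapshots.

    before/after: one (idle, total) pair of cumulative counts per core.
    """
    loads = []
    for (idle0, total0), (idle1, total1) in zip(before, after):
        total = total1 - total0
        idle = idle1 - idle0
        # not-idle as a proportion of total, rounded to a whole percent
        loads.append(round(100 * (total - idle) / total) if total else 0)
    return loads


# Hypothetical snapshots for four cores.
before = [(0, 0), (0, 0), (0, 0), (0, 0)]
after = [(73, 100), (66, 100), (72, 100), (33, 100)]
print(cpu_load(before, after))  # [27, 34, 28, 67]
```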
Do design-iteration on your machine, or in a language newsgroup. Only contribute programs which give correct results on your machine - diff the program output with the provided output file before you contribute the program.
Prefer plain vanilla programs - after all we're trying to compare language implementations, not programmer effort and skill. We'd like your programs to be easily viewable - so please format your code to fit in fewer than 80 columns (we don't measure lines-of-code!).
Programs are measured across a range of input-values; programs are expected to either take a single command-line parameter or read text from stdin.
(Look at what the other programs do.)
Programs should write to stdout. Program output is redirected to a log-file and diff'd with the expected output.
(Look at what the other programs do.)
Include a header comment in the program like this:
/* The Computer Language Benchmarks Game
   http://shootout.alioth.debian.org/
   contributed by …
   modified by …
*/
Don't manually unroll loops!
Attach the full source-code file of a tested program. Please don't paste source-code into the description field. Please don't contribute patch-files.
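Checking program output against the provided output file before contributing could be sketched like this. It is only an illustration using Python's standard difflib, not the project's own comparison, and the function name `output_diff` is hypothetical.

```python
import difflib


def output_diff(actual, expected):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(keepends=True),  # keep line endings so a
            actual.splitlines(keepends=True),    # missing final newline shows
            fromfile="expected",
            tofile="actual",
        )
    )
```

An empty result means the program output is identical to the expected output; any returned lines show where they differ.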
Before contributing programs
Follow these instructions step-by-step
Now start from the bottom of the "Play the Benchmarks Game" Submit-New form and work your way up.
You created an Alioth ID with a valid email address so you'll receive email updates when your program is accepted and measured.
Tell us about content mistakes, inconsistencies, bad installs etc.
Please create an Alioth ID, log in and Report a Bug.
Tell us about the latest language updates etc. - add a Feature Request.
Please ask more general questions in the discussion forum.
(Debian issues its own security certificate - your web browser will complain.)
N means the value passed to the program on the command-line (or the value used to create the data file passed to the program on stdin). Larger N causes the program to do more work - measurements are mostly shown for the largest N, the largest workload.
When the program was being measured: the first core was not-idle about 27% of the time, the second core was not-idle about 34% of the time, the third core was not-idle about 28% of the time, the fourth core was not-idle about 67% of the time.
When all the programs show ≈ CPU Load like this '0% 0% 0% 100%' you are probably looking at measurements of programs forced to use just one core - the fourth core (rather than being allowed to use any or all of the CPU cores).
Interesting Alternative Program means that the program doesn't implement the benchmark according to the arbitrary and idiosyncratic rules of The Computer Language Benchmarks Game - but we felt like showing the program anyway.
Nothing - they are arbitrary suffixes that identify a specific program.
Why don't you use our measurement scripts and publish measurements for language X?
The Python script "bencher does repeated measurements of program cpu time, elapsed time, resident memory usage, cpu load while a program is running, and summarizes those measurements" - download bencher and unpack into your ~ directory and read the license before use.
As an alternative, take a look at these Python measurement scripts designed for statistically rigorous Java performance evaluation - JavaStats.
In these (x86 Ubuntu™ : Intel® Q6600® quad-core) examples we measured elapsed time once the Java program had started: in the first case, we simply started and measured the program 66 times; in the second case, we started the program once and repeated measurements again and again and again 66 times without restarting the JVM; and then discarded the first measurement leaving 65 data points.
The usual startup measurements and the "Java 6 steady state" approximations (and JVM time) are shown alongside for comparison.
| "1.6.0_16" | started 65 times: mean | started 65 times: σ | repeated 65 times: mean | repeated 65 times: σ | usual startup | approx. | JVM time |
|---|---|---|---|---|---|---|---|
| meteor contest | 0.23s | 0.01 | 0.12s | 0.00 | 0.31s | 0.13s | (14 sec) |
| spectral norm | 4.08s | 0.23 | 3.96s | 0.02 | 4.11s | 3.96s | (4 min) |
| pidigits | 5.03s | 0.14 | 4.65s | 0.10 | 5.00s | 4.55s | (5 min) |
| fasta | 7.50s | 0.21 | 6.91s | 0.12 | 7.51s | - | - |
| mandelbrot | 11.91s | 0.81 | 11.46s | 0.06 | 10.95s | 11.40s | (13 min) |
| binary trees | 20.69s | 0.69 | 15.35s | 0.43 | 19.18s | 15.40s | (17 min) |
| fannkuch | 18.74s | 0.66 | 18.56s | 0.48 | 18.43s | 18.59s | (20 min) |
| nbody | 24.94s | 0.02 | 25.07s | 1.22 | 25.03s | 24.69s | (27 min) |
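The mean and σ columns are summary statistics over repeated elapsed-time measurements. A minimal sketch follows, assuming σ is the population standard deviation (the source does not say whether population or sample deviation was used). The function name `summarize` is hypothetical.

```python
import statistics


def summarize(elapsed_seconds, discard_first=False):
    """Return (mean, sigma) of a list of elapsed-time measurements."""
    # For the repeated-in-one-JVM case the first measurement is discarded.
    data = list(elapsed_seconds[1:]) if discard_first else list(elapsed_seconds)
    # Assumption: sigma is the population standard deviation.
    return statistics.mean(data), statistics.pstdev(data)
```

Discarding the first measurement removes the run most affected by class loading and initial compilation from the repeated-measurement figures.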
Loading Java bytecode, profiling and dynamic compilation do take time but not enough time to make much of a difference in these examples.
The obvious differences show where there is a mismatch between program structure and JVM optimization - even though methods have been fully compiled, the JVM continues using on-stack replacement. The opportunity to use the fully optimized compiled methods seems to arise only the next time the code block is invoked - whether that's in 10 seconds or 10 days.
To highlight that mismatch, "Java 6 steady state" approximations are shown in the measurement tables alongside the usual startup measurements.
We use a quad-core 2.4GHz Intel® Q6600® machine with 4GB of RAM and a 250GB SATA II disk drive.
The out-of-date measurements used a single-processor 2.2GHz AMD™ Sempron™ machine with 512MB of RAM and a 40GB IDE disk drive; and a single-processor 2GHz Intel® Pentium® 4 machine with 512MB of RAM and an 80GB IDE disk drive.
We use Ubuntu™ 9.04 with Linux kernel 2.6.28-17-generic.
The out-of-date measurements used Debian Linux 'unstable', kernel 2.6.18-3-k7, and Gentoo Linux gentoo-sources-2.6.20-r6.
Periodically we go through and remove slower programs from the website (if there's a faster program for the same language implementation). We don't remove those programs from the "Play the Benchmarks Game" tracker.
You can see previous programs by browsing through the Play the Benchmarks Game tracker items and looking at the attached source code files. If you log in with your Alioth ID, you will be able to create and save a query to search for particular tracker items.
We are trying to show the performance of various programming language implementations - so we ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result.
We do show one contest where you can use different algorithms - meteor-contest.
The out-of-date measurements are for many different programming languages. The program source code written in those other programming languages is interesting.
Gentoo : Intel® Pentium® 4
Debian : AMD™ Sempron™