28 Mar 2008, Friday, 6:05 pm GMT

This FAQ is short. You can read it really quickly.

 What kind of game is this?

A game begun years ago. A game with many winners. A game with many players.

How is the game scored?

On 3 measures - CPU time, memory use and source code size.

How do I play?

Choose a programming language. Choose a benchmark. Read and accept the benchmark rules. Ask questions in the discussion forum.

Write a new program and make sure it's correct by diff'ing the output. Profile and improve the program. Attach the program source code file to a tracker item.

How do I compare language implementations?

Compare them directly one-against-another for all the benchmarks.

How do I compare 2 or 3 programs?

Compare them directly side-by-side for all the data points.

How do I win?

Write the best program in your chosen language. Write programs that improve the showing of your chosen language. Learn something new.

Who's winning?

It varies from benchmark to benchmark. It varies from week to week. It depends on which language implementations are compared. It depends on which measures are compared.

Be curious - look at several benchmarks, compare several language implementations, compare them on different measures, read Flawed Benchmarks.

When does the game end?

When the facts exceed our curiosity.

 What does … mean?
What does "not fair" mean? (A fable)

They raced up, and down, and around and around and around, and forwards and backwards and sideways and upside-down.

Cheetah's friends said "it's not fair" - everyone knows Cheetah is the fastest creature but the races are too long and Cheetah gets tired!

Falcon's friends said "it's not fair" - everyone knows Falcon is the fastest creature but Falcon doesn't walk very well, he soars across the sky!

Horse's friends said "it's not fair" - everyone knows Horse is the fastest creature but this is only a yearling, you must stop the races until a stallion takes part!

Man's friends said "it's not fair" - everyone knows that in the "real world" Man would use a motorbike, you must wait until Man has fueled and warmed up the engine!

Snail's friends said "it's not fair" - everyone knows that a creature should leave a slime trail, all those other creatures are cheating!

Dalmatian's tail was banging on the ground. Dalmatian panted and between breaths said "Look at that beautiful mountain, let's race to the top!"

What does CPU Time mean?

CPU Time means program usr+sys time (in seconds), which includes the time taken to start up and shut down the program. For language implementations that use a Virtual Machine, the CPU Time includes the time taken to start up and shut down the VM.

You can get a vague idea of the difference in startup time between language implementations from the startup benchmark.

What about Java dynamic compilation?

Sometimes Java programmers point out that JVM profiling and dynamic compilation will improve program performance when the same program is used again and again and again without shutting down the JVM. Sometimes other programmers don't believe that JVM profiling and dynamic compilation will have any effect on simple programs like those shown in the benchmarks game - let's take a look.

In these examples we measured elapsed time once the Java program had started: in the first case, we simply started and measured the program 400 times; in the second case, we started the program once and measured the program again and again and again 400 times, without restarting the JVM.

                 started 400 times     started once
benchmark        mean (s)   σ          mean (s)   σ
nsieve            2.37      0.05        2.14      0.01
mandelbrot        3.42      0.01        3.20      0.01
binary-trees      6.37      0.06        5.66      0.05
nsievebits        7.43      0.28        7.15      0.08
fannkuch         11.66      0.16       11.13      0.43
nbody            16.21      0.05       16.06      0.34
spectral-norm    24.71      0.02       23.65      0.05

The costs of JVM profiling and dynamic compilation are always included in the first case; in the second case the first measurement shows the costs of partial interpretation and JVM profiling and dynamic compilation, but the next 399 measurements show the benefits without showing the costs. We can't just wish the costs away - Java bytecode does need to be loaded and profiled and compiled.

As part of performance analysis, those differences hint at how much is not accounted for and how little or how much we might still be able to achieve by writing better programs.
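The two measurement schemes can be sketched in Python. This is a minimal illustration under stated assumptions, not the script that produced the table: the workload is a hypothetical stand-in, the σ column is assumed to be the sample standard deviation, and because CPython has no JIT compiler the sketch only shows the shape of the two harnesses, not the JVM warm-up effect itself. As in the measurements above, only the workload is timed, not interpreter startup.

```python
import statistics
import subprocess
import sys
import time

# Hypothetical stand-in workload (not one of the real benchmark programs).
WORKLOAD_SRC = "def workload():\n    return sum(i * i for i in range(200_000))\n"

# Child script: define the workload, then time only the workload itself,
# so interpreter startup is excluded from the measurement.
CHILD_SRC = WORKLOAD_SRC + (
    "import time\n"
    "t0 = time.perf_counter()\n"
    "workload()\n"
    "print(time.perf_counter() - t0)\n"
)


def started_each_time(runs):
    """Case 1: start a fresh interpreter for every measurement."""
    times = []
    for _ in range(runs):
        out = subprocess.run([sys.executable, "-c", CHILD_SRC],
                             capture_output=True, text=True, check=True).stdout
        times.append(float(out))  # float() tolerates the trailing newline
    return times


def started_once(runs):
    """Case 2: start once, then measure the same code again and again."""
    namespace = {}
    exec(WORKLOAD_SRC, namespace)
    workload = namespace["workload"]
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        workload()
        times.append(time.perf_counter() - t0)
    return times


def summarize(times):
    """Mean and sample standard deviation (needs at least 2 measurements)."""
    return statistics.mean(times), statistics.stdev(times)


if __name__ == "__main__":
    for label, fn in (("started each time", started_each_time),
                      ("started once", started_once)):
        mean, sigma = summarize(fn(5))
        print(f"{label:18} mean {mean:.4f}s  sigma {sigma:.4f}s")
```

Under a JIT-compiling runtime such as the JVM, the "started once" loop is where warm-up would show: the first iteration carries the profiling and compilation cost, and later iterations reap the benefit.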

What language was used to write each initial benchmark program?

Different benchmark programs, different authors - different languages.

The benchmark descriptions sometimes refer to an article which included program source: Lisp and C for fannkuch; Java for binary-trees and meteor and chameneos; Haskell for pidigits; Erlang for thread-ring. And others as the author provided: C for mandelbrot and spectral-norm; Java for n-body. And others in Nice or C# or Lua or … as the mood would have it.

What does Interesting Alternative Program mean?

Interesting Alternative Program means that the program doesn't implement the benchmark according to the arbitrary and idiosyncratic rules of The Computer Language Benchmarks Game - but we simply couldn't resist showing the program.

What do #2 #3 mean?

Nothing - they are arbitrary suffixes that identify a specific program.

 How did you measure…?
How did you measure?

Each program was run once pre-test to reduce cache effects. Program output is redirected to a log-file and compared to the expected output.

Each program was then run 3 times with program output redirected to /dev/null. We show the lowest measured CPU time and the highest memory usage, from the 3 runs.
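The reporting rule can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Run` record per measured run (the field names are made up); note that the lowest CPU time and the highest memory use may come from different runs.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One measured run (hypothetical record; field names are illustrative)."""
    cpu_seconds: float   # usr+sys time of the child process
    peak_rss_kb: int     # highest sampled resident memory size


def report(runs):
    """Return (lowest CPU time, highest memory use) across the runs."""
    if not runs:
        raise ValueError("need at least one run")
    return (min(r.cpu_seconds for r in runs),
            max(r.peak_rss_kb for r in runs))


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    runs = [Run(1.5, 900), Run(1.2, 950), Run(1.3, 1000)]
    print(report(runs))  # (1.2, 1000)
```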

The variation between cpu times is different for different languages and for different benchmarks. The coefficient of variation for 100 measurements of nbody ranged from 0.029% (Lua) to 0.074% (Oberon2) to 0.092% (C#); and for 100 measurements of fasta ranged from 0.009% (Lua) to 0.088% (C#) to 0.655% (Oberon2).
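The coefficient of variation is the standard deviation expressed as a percentage of the mean. A minimal sketch, assuming the sample standard deviation (the FAQ does not say which variant was used):

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    print(coefficient_of_variation([9, 10, 11]))  # 10.0
```

A coefficient of variation of 0.029% on a 16-second program corresponds to a standard deviation of under 5 milliseconds.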

Don't sweat the small stuff - differences in cpu time of a few % are illusory.

How did you measure CPU time?

Each program was run as a child-process of a Perl script. We read the script's cumulative child-process usr+sys time before forking the child-process and again after the child-process exits; the difference is the program's CPU time.

(BSD::Resource::times)[2,3] does seem to provide better resolution than the Perl times() builtin function or GNU time; for example, measuring the same program:

Perl times() builtin function
16.650
16.660
16.640

BSD::Resource::times
16.659
16.656
16.655

GNU time version 1.7
16.62
16.61
16.60

Bash time builtin command
16.624
16.628
16.638

We use (BSD::Resource::times)[2,3]

The CPU time includes program startup time.
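The same before-fork/after-exit technique can be sketched in Python, using `resource.getrusage(resource.RUSAGE_CHILDREN)` in place of Perl's BSD::Resource::times. This is an illustrative sketch, not the benchmarks game's Perl script; it is Unix-only, and the example command is a hypothetical workload.

```python
import resource
import subprocess
import sys


def child_cpu_seconds(argv):
    """usr+sys CPU time of one child process, including startup and shutdown.

    Reads the cumulative usr+sys time of this process's waited-for children
    before forking and again after the child exits; the difference belongs
    to the child (and any descendants it waited for).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)  # fork, exec, wait
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return ((after.ru_utime - before.ru_utime) +
            (after.ru_stime - before.ru_stime))


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "sum(i * i for i in range(1_000_000))"]
    print(f"{child_cpu_seconds(cmd):.3f}")
```

Because the child is measured from fork to exit, interpreter or VM startup is counted, matching the definition of CPU Time given earlier.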

How did you measure memory usage?

In a very approximate and unreliable way. We sampled the child-process resident memory size (VmRSS) multiple times a second. We identified the main thread by checking for SIGCHLD being registered as the exit_signal in the second to last field of /proc/{pid}/stat.

There's a race condition. When the program completes quickly, this sampling technique will fail.
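The sampling approach, and its race condition, can be sketched in Python on Linux. This is a simplified illustration: it samples VmRSS from /proc/{pid}/status at a hypothetical 0.1 second interval and omits the exit_signal check that identifies the main thread.

```python
import subprocess
import sys
import time


def read_vmrss_kb(pid):
    """Return VmRSS in kB from /proc/<pid>/status, or None if unavailable.

    VmRSS is missing once the process has exited (zombie), and the file
    disappears once the process has been reaped.
    """
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # "VmRSS:   1234 kB"
    except (FileNotFoundError, ProcessLookupError):
        pass
    return None


def peak_rss_kb(argv, interval=0.1):
    """Sample the child's resident memory repeatedly and keep the maximum.

    Race condition: a program that finishes before the first sample is
    never observed, and the result is None.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL)
    peak = None
    while proc.poll() is None:  # poll() also reaps the child when it exits
        rss = read_vmrss_kb(proc.pid)
        if rss is not None and (peak is None or rss > peak):
            peak = rss
        time.sleep(interval)
    return peak
```

Running `peak_rss_kb([sys.executable, "-c", "pass"])` will often return None, which is exactly the failure described above for quickly completing programs.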

How did you measure GZip Bytes?

We started with the source-code markup you can see, removed comments, removed duplicate whitespace characters, and then applied minimum GZip compression.
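A rough Python sketch of the GZip Bytes measure follows. It makes several assumptions: only C-style `/* ... */` and `//` comments are recognised (comment syntax differs per language, and this naive regex also matches inside string literals), "duplicate whitespace" is read as any run of whitespace collapsed to its first character, and "minimum GZip compression" is taken as compression level 1.

```python
import gzip
import re


def gzip_bytes(source):
    """Approximate GZip Bytes for C-like source text (see assumptions above)."""
    # Remove /* ... */ block comments and // line comments.
    no_comments = re.sub(r"/\*.*?\*/|//[^\n]*", "", source, flags=re.DOTALL)
    # Collapse each run of whitespace to its first character.
    squeezed = re.sub(r"(\s)\s+", r"\1", no_comments)
    return len(gzip.compress(squeezed.encode("utf-8"), compresslevel=1))


if __name__ == "__main__":
    src = "int main(void) { /* comment */\n    return 0;   // done\n}\n"
    print(gzip_bytes(src))
```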

How did you set compiler options?

Without any optimization option, the GCC compiler's goal is to reduce the cost of compilation and to make debugging reasonable, so we set optimization options explicitly. Typically we might set -O3 -fomit-frame-pointer -march=pentium4. For some benchmarks, -mfpmath=sse -msse2 makes a noticeable difference (note that J2SE uses SSE instruction sets).

What machine are you running the programs on?

We use a single-processor 2.2GHz AMD™ Sempron™ machine with 512MB of RAM and a 40GB IDE disk drive, and a single-processor 2GHz Intel® Pentium® 4 machine with 512MB of RAM and an 80GB IDE disk drive.

What OS are you using on the test machine?

We use Debian Linux™ 'unstable', Kernel 2.6.18-3-k7 and Gentoo Linux™ gentoo-sources-2.6.20-r6

Sometimes the children help with the measurements…
Broadcast message from root@hopper (Sun Oct  1 16:33:48 2006):

Power button pressed
The system is going down for system halt NOW!

Broadcast message from root@hopper (Sun Oct  1 16:33:48 2006):

Power button pressed
The system is going down for system halt NOW!

Broadcast message from root@hopper (Sun Oct  1 16:33:49 2006):

Power button pressed
The system is going down for system halt NOW!

Broadcast message from root@hopper (Sun Oct  1 16:33:49 2006):

Power button pressed
The system is going down for system halt NOW!

Broadcast message from root@hopper (Sun Oct  1 16:33:49 2006):

Power button pressed
The system is going down for system halt NOW!

Broadcast message from root@hopper (Sun Oct  1 16:33:50 2006):

Power button pressed
The system is going down for system halt NOW!
 Where can I ask for help…?
Create an Alioth ID and log in

We no longer accept anonymous comments - please create an Alioth ID and log in.

Ask questions or discuss the benchmarks in the discussion forums.

Where can I report bugs… request features?

Tell us about content mistakes, inconsistencies, bad installs etc - Report a Bug.

Tell us about the latest language updates etc - add a Feature Request.

We use Andre Simon's highlight to convert program source code to XHTML; please contribute better language definition files.

 

Change the things you don't like - convince us that the change is a worthwhile improvement and then expect to do all the work.

Be Nice! Maybe we'll reject the program. Maybe we'll prefer our own opinions. Maybe we'll decide not to change something.

 How should I implement…?
How should I implement programs for the Benchmarks Game?

We prefer plain vanilla programs - after all we're trying to compare language implementations not programmer effort and skill.

We also have a weakness for idiosyncratic, elegant, clever programs; and when they are too elegant to meet the requirements of the benchmark we might still show them in the Interesting Alternative Programs section.

How much effort should I put into getting the program correct?

Do design-iteration on your machine, or in a language newsgroup. Only Contribute Programs which give correct results on your machine - diff the program output with the provided output file. (Don't make-unnecessary-work for the committers.)
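If `diff` isn't handy, the same check can be sketched with Python's difflib. This is an illustrative sketch; the file names you pass in are your own program output and the provided output file.

```python
import difflib
from pathlib import Path


def diff_output(actual, expected):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(difflib.unified_diff(expected.splitlines(keepends=True),
                                     actual.splitlines(keepends=True),
                                     fromfile="expected", tofile="actual"))


def diff_files(actual_path, expected_path):
    """Compare a program's output file with the provided output file."""
    return diff_output(Path(actual_path).read_text(),
                       Path(expected_path).read_text())
```

An empty result from `diff_files("out.txt", "expected.txt")` (hypothetical file names) means the program output matches exactly, including whitespace and line endings.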

Leave it a couple of days, and then see if there are any minor improvements that you'd like to make, before you Contribute Programs to The Computer Language Benchmarks Game.

How should I implement data-input?

Programs are measured across a range of input-values; programs are expected to either take a single command-line parameter or read text from stdin.

(Look at what the other programs do.)
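The two input styles can be sketched in Python. This is a minimal illustration; the default value is hypothetical and only keeps the program runnable without a parameter, so check what the other programs for the same benchmark actually do.

```python
import sys


def size_from_args(argv, default=1000):
    """Single command-line parameter giving the workload size.

    The default is a hypothetical fallback, not a benchmark requirement.
    """
    return int(argv[1]) if len(argv) > 1 else default


def read_stdin_text(stream=None):
    """Whole input read as text from stdin (or from a given stream)."""
    return (stream if stream is not None else sys.stdin).read()
```

A parameter-driven program would call `size_from_args(sys.argv)`; a stdin-driven one would call `read_stdin_text()` and process the returned text.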

How should I implement data-output?

Programs should write to stdout. Program output is redirected to a log-file and diff'd with the expected output.

(Look at what the other programs do.)

How should I identify my program?

Include a header comment in the program like this:

/* The Computer Language Benchmarks Game
   http://shootout.alioth.debian.org/

   contributed by …
   modified by …
*/
How should I implement multiple source code files?

We use a simple script to split a single source file into multiple target source files.

One of the target source files must have the same filename as the original single source file, and is expected to be the 'main' program.

For example, the Eiffel nbody.e source file will be split into 3 target source files: nbody.e, body.e and nbody_system.e. Each new target source file will start from the comment line which includes the SPLITFILE=target-filename directive and run to the line preceding the next SPLITFILE=target-filename directive or end-of-file.

So, the new target source file body.e will start with the line

-- SPLITFILE=body.e

and end with the empty line following

end -- class BODY
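The splitting rule can be sketched in Python. This is not the benchmarks game's own script; it assumes that any lines before the first directive belong to the original ('main') file and it does not check that the directive sits in a comment.

```python
import re

# Matches e.g. "-- SPLITFILE=body.e"; the comment prefix varies by language.
DIRECTIVE = re.compile(r"SPLITFILE=(\S+)")


def split_source(text, original_name):
    """Split one source file into {target filename: text}.

    Each target starts at the line containing a SPLITFILE=name directive and
    runs to the line before the next directive or end-of-file. Assumption:
    lines before the first directive go to original_name.
    """
    parts = {}
    current = original_name
    for line in text.splitlines(keepends=True):
        match = DIRECTIVE.search(line)
        if match:
            current = match.group(1)
        parts.setdefault(current, []).append(line)
    if original_name not in parts:
        raise ValueError(f"no target named {original_name} (the 'main' program)")
    return {name: "".join(lines) for name, lines in parts.items()}
```

Each entry could then be written out with `Path(name).write_text(body)`.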
How should I implement loops?

Don't manually unroll loops!

How should I advertise my company, services, website…?

We'll remove any promos that you add as comment text, so please don't waste our time.

 How do I contribute a program?

There are many contributors and few committers - a little more time spent by contributors saves committers a great deal more time.

Attach the full source-code file of a tested program. Please don't paste source-code into the description field. Please don't contribute patch-files.

Before contributing programs

  • read and accept the Revised BSD license - all contributed programs are published under this revised BSD license.

Follow these instructions step-by-step

  1. Start from the bottom. Attach the program source-code file - do this first because it's easy to forget.
  2. Say in the Description how this program fixes an error or is faster or was missing or … Give us reasons to accept your program.
  3. Each Summary text must be unique! Follow this convention:
    language, benchmark, your-name, date, (version)
    Ruby nsieve Glenn Parker 2005-03-28
  4. Category: select the language implementation
  5. Group: select the benchmark
  6. click the Submit button

Now start from the bottom of the "Play the Benchmarks Game" Submit-New form and work your way up.

How can I track what happens to the program I contributed?

You created an Alioth ID with a valid email address so you'll receive email updates when your program is accepted and measured.

 Where can I see…?
Where can I see previous programs?

Periodically we go through and remove slower programs from the website (if there's a faster program for the same language implementation). We don't remove those programs from the "Play the Benchmarks Game" tracker.

You can see previous programs by browsing through the Play the Benchmarks Game tracker items and looking at the attached source code files. Log in with your Alioth ID and you will be able to create and save a query to search for particular tracker items.

Where can I see more about a Timeout or Error?

Sometimes a program may produce correct results, within the timeout, for smaller workloads - so check the data on the full data page.

You may find information about an Error in the 'build & benchmark results' section of the program page.

Where can I see which language version was used?

You can see information about the language implementation, including the version number, at the bottom of each language comparison page.

Where can I see which compiler and runtime options were used?

You can see the build commands and runtime commands on each program page in the build & benchmark results section.

Where can I see more?

The project is hosted by Alioth GForge Debian.org.

You can browse the CVS tree.

Build dependencies include GTop and BSD::Resource

 Why don't you…?
Why don't you accept every program that gives the correct result?

We are trying to show the performance of various programming language implementations - so we ask that contributed programs not only give the correct result, but also use the same algorithm to calculate that result.

Back in the day, Doug Bagley used both same way (same algorithm) and same thing (same result) benchmarks - so in many cases the performance differences were simply better algorithms.

After hearing many arguments, it seems to me that we should think of same way (same algorithm) tests as benchmarks, and we should think of same thing (same result) tests as contests.

At present, we show just one contest - meteor-contest.

Why don't you include language X?

Is the language implementation

  • Used? There are way too many dead languages and unused new languages - see The Language List and Computer Languages History
  • Interesting? Is there something significant and interesting about the language, and will that be revealed by these simple benchmark programs? (But look closely and you'll notice that we sometimes include languages just because we find them interesting.)

If that wasn't discouraging enough: in too many cases we've been asked to include a language implementation, and been told that of course programs would be contributed, but once the language didn't seem to perform as well as hoped, no more programs were contributed. We're interested in the whole range of performance - not just in the 5 programs which show a language implementation at its best.

We have no ambition to measure every Python implementation or every Haskell implementation or every C implementation - that's a chore for Python enthusiasts and Haskell enthusiasts and C enthusiasts.

We are unable to publish measurements for many commercial language implementations simply because their license conditions forbid it.

We will accept and reject languages in a capricious and unfair fashion - so ask if we're interested before you start coding.

 What's it useful for?

You'll come across a range of uses for the programs and measurements:

Revised BSD license