The Computer Language |
"we have found that the CPU time is rarely the limiting factor; the expressibility of the language means that most programs are small and spend most of their time in I/O and native run-time code."
Do your programs start up and finish within a few seconds, like these benchmarks?
Are your programs tiny, like these benchmarks?
Do your programs avoid library use, like these benchmarks?
"We measure three specific areas of JavaScript runtime behavior: 1) functions and code; 2) heap-allocated objects and data; 3) events and handlers. We find that the benchmarks are not representative of many real websites and that conclusions reached from measuring the benchmarks may be misleading."
"The performance of a benchmark, even if it is derived from a real program, may not help to predict the performance of similar programs that have different hot spots."
"Generally, we can get accurate measurements for durations that are either very short (less than around 10 millisecond) or very long (greater than around 1 second), even on heavily loaded machines. Times between around 10 milliseconds and 1 second require special care to measure accurately."
Computer Systems: A Programmer's Perspective, Chapter 9 (pdf)
Measuring Program Performance (pdf slides)
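One common way to stay out of the awkward 10 millisecond to 1 second range is to repeat a short workload until the total measured time is comfortably long, then divide by the repetition count. The sketch below illustrates that idea only; it is not necessarily the method the cited chapter recommends. The names `time_per_call` and `workload` and the doubling strategy are assumptions made for illustration.

```python
import time


def time_per_call(func, min_total_seconds=1.0):
    """Repeat func, doubling the repetition count each round, until one
    round takes at least min_total_seconds in total.

    Returns (seconds per call, repetitions in the final round).
    """
    reps = 1
    while True:
        start = time.perf_counter()
        for _ in range(reps):
            func()
        elapsed = time.perf_counter() - start
        if elapsed >= min_total_seconds:
            return elapsed / reps, reps
        reps *= 2


def workload():
    # Hypothetical stand-in for the code being measured.
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    per_call, reps = time_per_call(workload)
    print(f"{per_call * 1e6:.1f} microseconds per call over {reps} repetitions")
```

The per-call figure includes loop overhead and still varies from run to run, so it is best treated as one sample rather than a final answer.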
"Most likely, if the performance differences between the alternatives are large, a statistically rigorous method will not alter the overall picture nor affect the general conclusions obtained using prevalent methods. However, for relatively small performance differences (that are within the margin of experimental error) not using statistical rigor may lead to incorrect conclusions."
Statistically Rigorous Java Performance Evaluation (pdf slides)
Statistically Rigorous Java Performance Evaluation (pdf paper)
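A minimal sketch of what "within the margin of experimental error" can mean in practice: compute a confidence interval for the mean run time of each alternative and check whether the intervals overlap. It assumes a normal approximation, which is only reasonable for larger samples (roughly 30 or more runs); smaller samples would call for a Student's t critical value instead. The function names are hypothetical and the timings in the usage example are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Confidence interval for the mean, using a normal approximation
    (an assumption that suits larger sample sizes)."""
    n = len(samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / sqrt(n)
    centre = mean(samples)
    return centre - half_width, centre + half_width


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Made-up run times in seconds, 30 runs per alternative.
    runs_a = [1.000 + 0.01 * (i % 5) for i in range(30)]
    runs_b = [1.005 + 0.01 * (i % 5) for i in range(30)]
    ci_a = confidence_interval(runs_a)
    ci_b = confidence_interval(runs_b)
    if intervals_overlap(ci_a, ci_b):
        print("Intervals overlap: the difference may be measurement noise.")
    else:
        print("Intervals do not overlap: the difference is likely real.")
```

Overlapping intervals do not prove the alternatives perform the same; they only show that these measurements cannot separate them.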
We can learn something about a particular language implementation from benchmarking - if we already know a great deal about the implementation and carefully analyze the results:
Performing Lisp Analysis of the FANNKUCH Benchmark (55KB postscript)
Java theory and practice: Anatomy of a flawed microbenchmark. Is there any other kind?
Many benchmark suites are designed to help language implementors optimize compiler designs:
nobench performance comparisons between different Haskell systems
Performance regressions in daily development versions of Mono
We should always question how useful a benchmark is for our specific purpose:
There's more to programming language comparison than CPU time, memory use, and program length - but other aspects are less easy to measure, and so are less often measured.
An Empirical Comparison of Seven Programming Languages (pdf)
Note: After reading the "Comparison Validity" section at the foot of pages 24-25, you might decide that it isn't reasonable to compare independently measured programming time for one group of languages against programming time reported by program authors for another group of languages, among other things.
"In order to find the optimal cost/benefit ratio, Wirth used a highly intuitive metric, the origin of which is unknown to me but that may very well be Wirth's own invention. He used the compiler's self-compilation speed as a measure of the compiler's quality. Considering that Wirth's compilers were written in the languages they compiled, and that compilers are substantial and non-trivial pieces of software in their own right, this introduced a highly practical benchmark that directly contested a compiler's complexity against its performance. Under the self compilation speed benchmark, only those optimizations were allowed to be incorporated into a compiler that accelerated it by so much that the intrinsic cost of the new code addition was fully compensated."
Oberon: The Overlooked Jewel (pdf) Michael Franz, in L. Boszormenyi, J. Gutknecht, G. Pomberger "The School of Niklaus Wirth" 2000.
"Overall Performance: PHP is rarely the bottleneck"
Simple is Hard, DrupalCon 2008 (HTML slides) Rasmus Lerdorf
Programming language implementations are compared against each other as though the designers intended them to be used for the exact same purpose - that just isn't so.
"Lua is a tiny and simple language, partly because it does not try to do what C is already good for, such as sheer performance, low-level operations, or interface with third-party software. Lua relies on C for those tasks."
Programming in Lua, preface
"Most (all?) large systems developed using Erlang make heavy use of C for low-level code, leaving Erlang to manage the parts which tend to be complex in other languages, like controlling systems spread across several machines and implementing complex protocol logic."
Frequently Asked Questions about Erlang
"Lua is not intended for building huge programs, where many programmers are involved for long periods. Quite the opposite, Lua aims at small to medium programs, usually part of a larger system, typically developed by one or a few programmers, or even by non programmers. Therefore, Lua avoids too much redundancy and artificial restrictions."
Programming in Lua, page 142
"Ada was originally designed with three overriding concerns: program reliability and maintenance, programming as a human activity, and efficiency … emphasis was placed on program readability over ease of writing … Like many other human activities, the development of programs is becoming ever more decentralized and distributed. Consequently, the ability to assemble a program from independently produced software components continues to be a central idea in the design."
Ada Reference Manual, Introduction, Design Goals
'Our system [Erlang] was originally designed for building telecoms switching systems. Telecoms switching systems have demanding requirements in terms of reliability, fault-tolerance etc. Telecoms systems are expected to operate "forever", they should exhibit soft real-time behaviour, and they should behave reasonably in the presence of software and hardware errors.'
Making reliable distributed systems in the presence of software errors, page 13. (840KB pdf)
"In return for learning a new language, the user is rewarded by the ability to write short, clear programs that are guaranteed to work well on thousands of machines in parallel. Ironically - but vitally - the user need know nothing about parallel programming; the language and the underlying system take care of all the details.
It may seem paradoxical to use an interpreted language in a high-throughput environment, but we have found that the CPU time is rarely the limiting factor; the expressibility of the language means that most programs are small and spend most of their time in I/O and native run-time code."
Interpreting the Data: Parallel Analysis with Sawzall, page 27.