Energy Efficiency Across Programming Languages (sites.google.com)
188 points by foob on Sept 14, 2017 | 139 comments



According to their normalized "global" results, here are some interesting things I see:

1. Pascal is, surprisingly, the most memory efficient of all. I should take a look at the implementation they used.

2. Rust is a good alternative to C, which leads in "energy efficiency" and speed.

3. Common Lisp has the best energy efficiency, the best speed, and the smallest memory footprint of all the dynamic programming languages in the list -- Python, Ruby, Lua, Perl, and even Racket (which fares pretty well).

4. PHP, JRuby, Ruby, TypeScript, Perl, and Python are massively slower than the fastest languages -- for example, Ruby is 59 times slower than Rust or C.

I agree that these languages (in item 4) are "acceptably fast" for many applications, but we can't say they are "close to the speed" of the fastest languages, even less so if we compare them against the fastest dynamic/interactive languages like Lisp (Python is 21 times slower than Lisp).

5. I wonder which implementation of Lua they used. Lua can be pretty fast, one of the fastest dynamic languages out there.


> 1. Pascal

Pascal is a "hidden" gem in the area of languages. Sadly there's not enough push for it, but imagine if it had the push that other languages have...


"Pascal is a "hidden" gem in the area of languages"

I agree. Programmers won't look at it because they perceive it to be old and out-of-date. But the language hasn't stood still. It's a fast, low-memory language. FreePascal with the Lazarus IDE is one of the best cross-platform development toolkits for building native desktop apps.

Sadly a lot of programmers can never see beyond the verbose (but readable) syntax.


Maybe somebody needs to create a more palatable language, that'll transpile (I know, I know, compile) into pascal?


These guys made a Pascal-like language:

http://www.elementscompiler.com/elements/oxygene/

Various dialects of Pascal exist (Modula, Oberon, etc.), so this isn't such a far-fetched idea.

But if you lose too much of the syntax you remove part of the charm.


Palatable how? Pascal's syntax is not especially different from that of most scripting languages


Every time I look at the syntax, it feels really verbose. IIRC some dialects require all caps or initial caps on keywords; at any rate that's how most examples are shown, and it's rather off-putting. Superficial, I know, but I'm probably not the only one.


I don't know about other implementations but FreePascal is completely case insensitive. Identifiers are actually treated the same if the only difference is case. I don't particularly like this, but I've never seen it actually be an issue


That was what I was looking for in Go, but they seem to have other goals in mind.

Failing that there is always Ada, .NET Native, Kotlin Native, D,...


When I started programming in the early 90's, I remember Pascal being a serious contender, with C vs Pascal compilers being a pretty evenly matched contest.


But why should its memory utilization be so much lower?

Pascal and C give the programmer basically the same control over memory layout, no?


Basically, C's semantics assume too little - everything is a primitive type or a pointer dressed up to look like something else. Pascal has a stronger notion of types and this allows for additional automatic optimization where they're relevant - memory layout just happens to be one such area.


I miss the ability to pair structures as "antagonistic" -- i.e. if one exists, the other does not -- allowing for unions of various datatypes with the guarantee that, while a member exists, the shared memory it resides in is its and its alone.
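If I'm reading this right, it sounds like a tagged (discriminated) union -- what Pascal calls a variant record. A plain C union gives you the shared storage but doesn't enforce the "if one exists, the other doesn't" part; the usual approximation is to carry the discriminant yourself. A rough, purely illustrative sketch:

    #include <stdio.h>

    struct shape {
        enum { CIRCLE, RECT } kind;           /* discriminant: which variant is live */
        union {                               /* variants overlap in the same storage */
            struct { double radius; } circle;
            struct { double w, h; } rect;
        } as;
    };

    static double area(const struct shape *s)
    {
        switch (s->kind) {
        case CIRCLE: return 3.141592653589793 * s->as.circle.radius * s->as.circle.radius;
        case RECT:   return s->as.rect.w * s->as.rect.h;
        }
        return 0.0;
    }

    int main(void)
    {
        struct shape c = { .kind = CIRCLE, .as.circle = { 2.0 } };
        printf("%f\n", area(&c));  /* the rect fields would reuse the same bytes */
        return 0;
    }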


Just a wild guess: Pascal may use tightly packed structures without alignment by default?
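For what it's worth, here's a quick way to see the default padding cost in C: the compiler inserts padding but is not allowed to reorder fields for you, while whether FreePascal actually packs records by default is exactly the guess above (standard Pascal at least lets you opt in with packed record). Sizes are ABI-dependent and purely illustrative:

    #include <stdio.h>

    struct loose     { char tag; double value; char flag; };  /* padding after each char */
    struct reordered { double value; char tag; char flag; };  /* same fields, laid out by hand */

    int main(void)
    {
        printf("loose:     %zu bytes\n", sizeof(struct loose));      /* typically 24 on x86-64 */
        printf("reordered: %zu bytes\n", sizeof(struct reordered));  /* typically 16 on x86-64 */
        return 0;
    }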


Much of Pascal (or, rather, Modula-2 and Oberon) is revived inside Go.


And even more so is revived in Nim.


What parts of Pascal do you recognize in Go?


The "id type" declaration order instead of "type id".

The way methods are attached to types appeared first in Oberon-2.

Type assertions are similar to Oberon.

The unsafe package idea is similar to SYSTEM in Oberon or Pascal.

The stronger type semantics forcing explicit casts.


Now I need to finish my lisp.pas


You will like this:

https://github.com/kanaka/mal

(It has an implementation in Pascal!)


I'm trying not to look at it, so I don't steal ideas :)


Ruby also has gem.


I have noticed a significant speedup using LuaJit, similar to PyPy compared to normal Python. I'm guessing if LuaJit & PyPy were used, the performance for Lua & Python would be similar to JS in these results.


> 5. I wonder which implementation of Lua they used. Lua can be pretty fast, one of the fastest dynamic languages out there.

Apparently 5.3, but I am very skeptical of these results. Also the way the data is organised is very weird. One should not compare a JIT compiled VM (JavaScript) with interpreters (Lua, Python, Ruby...). An honest approach would have used LuaJIT, PyPy etc. or separated the languages into "families" better. Not to mention the choice of algorithms they used. Some of the algorithms in the benchmark game have many flaws.

As was already stated in this thread, some of the Python programs were multithreaded but the Lua ones weren't. Why? In a different benchmark I've seen algorithms using C wrappers for Python, but not for Lua. While I understand that wrapping C and multithreading reflect industry use of Python, the same is true of Lua. Benchmarks are useless if you are not consistent in the implementations of the things you are comparing. It disgusts me, to be honest, when I see benchmarks comparing JIT compilers with vanilla Lua just to say "hey, we're faster", as if LuaJIT did not even exist. It looks like pure dishonesty. In the case of this paper in particular, since the authors do not represent any particular language, it strikes me as a lack of research / understanding.


> … but the ones in Lua weren't. Why?

No mystery: someone has made the effort to contribute all those multi-process Python programs; no one has made the effort to contribute all those multi-process Lua programs.


Oh yes, that is true. But the person making the benchmarks should be aware of that and either look for better matching algorithms or implement them themselves.


Why should a comparison be restricted just because the best contributed Lua programs fail to use multiple cores? (Tail wagging the dog.)

If someone who didn't know Lua wrote those programs, then you'd complain that the programs weren't written by an expert.

If Lua wasn't included then I think you'd complain about that too.


If your intention is not to make a good comparison, then sure. I've seen other papers on language implementation and benchmarks, and researchers were way more careful than this, individually inspecting the algorithms they were using.

I happened to spot a problem on this one on the Lua part and also on some other parts concerning how they classify things, because it happens to be things I know more. I'm not going out of my way to look for problems in the rest of the paper because I'm not a reviewer and I have better things to do.

But if I were the one conducting this academic research, then I would want the results not to be bogus. The difference here is that this is not a blog post about some bullshit comparison someone is making. If that were the case, then alright: if publicly contributed algorithms failed to represent exactly what was being looked for, why not go for whatever is out there. I just expect a more rigorous procedure from academics.


> I just expect a more rigorous procedure from academics

Are they likely to be expert in all those programming languages?

----

If Lua wasn't included would you complain ?

( Like this:

https://news.ycombinator.com/item?id=15255427

https://news.ycombinator.com/item?id=15251242

https://news.ycombinator.com/item?id=15251144

)


> Are they likely to be expert in all those programming languages?

No, but again, I take paper results very seriously. Misleading results are bad for science, period. Here is an example of more rigorous research: https://arxiv.org/pdf/1602.00602.pdf

> If Lua wasn't included would you complain ?

Maybe I would, who knows. I did also mention in another comment it would be nice if Julia was there. It depends how relevant they are for the research being conducted. As a language used very often in microcontrollers, Lua is very relevant for energy efficiency research. I don't know anything about Forth, but I wouldn't just dismiss people saying it should be there with "then contribute forth algorithms to the benchmarks game yourself".


> … I take paper results very seriously…

"disgusts me" & "pure dishonesty" don't seem like a serious response.

> … an example of more rigorous research…

Which again uses programs that were contributed to the benchmarks game.

> … wouldn't just dismiss people…

I wouldn't just dismiss them: I'd tell them that others have every right to present what seems important to them, and exclude what seems less important -- without being accused of dishonesty.

There's only so-much time & money.


> One should not compare a JIT … with interpreters

Because ?


Because they're "different families" of language implementation. A Just In Time compiler is a compiler. It looks like an interpreter, but it compiles the code to machine code and runs it.


And obviously the performance of programs using those language implementations on the same computer under the same workload may be measured and compared.

Why should "different families" of language implementation not be compared?


Firstly, the paper separates languages into compiled languages and interpreted languages to begin with. So I was expecting it would follow this pattern correctly.

Secondly, why would you compare the performance of C and Ruby, for example? Ruby is bound to be slower for design reasons. Doing it to say "Look, C is fast!" doesn't mean anything. There are different reasons one would compare C and Rust, or C and Ruby. The way the results are displayed and what it tries to convey is very important.

If you're not making a baseline, or being very clear, etc. then it's just misleading to have a benchmark portraying Javascript as faster than Ruby, Python and Lua. Either you put those into their correct classifications clearly, or you compare it with LuaJIT, PyPy, Julia etc.


> … the paper separates languages into…

The paper, both shows tables that include all the results (ordered by Energy consumed) and shows separate charts for what the authors classify as "either a compiled, interpreted, or virtual-machine language".

What is "an interpreted language" ?

"Although we refer to Lua as an interpreted language, Lua always precompiles source code to an intermediate form before running it. … The presence of a compilation phase may sound out of place in an interpreted language like Lua. However, the distinguishing feature of interpreted languages is not that they are not compiled, but that any eventual compiler is part of the language runtime and that, therefore, it is possible (and easy) to execute code generated on the fly."

p57 "Programming in Lua" (2003)

> … why would you compare the performance of C and Ruby, for example? Ruby is bound to be slower…

Except when the C program is measured to be 50x slower than the Ruby program --

http://benchmarksgame.alioth.debian.org/u32/compare.php?lang...


Lua is not compiled to machine code. It compiles to an intermediate bytecode, which is then interpreted. So you could say it is a VM. Also bear in mind, LuaJIT is not Lua. It's a different implementation.

I guess you misinterpreted my comment. I didn't imply there are no reasons to compare C and Ruby. I said the reason is important and the way you portray that, and the way you portray your results, changes things.


> Lua is not compiled to machine code…

Do you think Roberto Ierusalimschy is confused about that ?

Again, here's what the creator of Lua says -- "… the distinguishing feature of interpreted languages is not that they are not compiled, but that any eventual compiler is part of the language runtime…"

EDIT:

Is that not what you mean by "interpreted language" ?


> Do you think Roberto Ierusalimschy is confused about that ?

No, of course not. I think you are. You are clearly misinterpreting his words. I will repeat, (PUC)Lua is not compiled to machine code, but to an intermediate bytecode form. Those are completely different things, and what I said does not come in conflict at all with the quote you took from PiL, so I don't know what you are confronting me about.


> Lua is not compiled to machine code, but to an intermediate bytecode form.

Correct.

Now look at what Roberto Ierusalimschy means by "interpreted language": "… the distinguishing feature of interpreted languages is not that they are not compiled, but that any eventual compiler is part of the language runtime and that, therefore, it is possible (and easy) to execute code generated on the fly."

Is that what you mean by "interpreted language" ?


Imagine if the world ran on SBCL instead :)


SBCL is one lisp that I intend to learn more about. It's apparently the lisp of choice for XMaxima, which is a program that I used a lot in high school.


>5. I wonder which implementation of Lua they used. Lua can be pretty fast, one of the fastest dynamic languages out there.

I looked at the binary trees benchmark and while the Python version used multiprocessing to parallelize the code, the Lua version was not parallelized at all..

.. not that this is entirely unrealistic. Python comes with MP in the standard library while I am not aware of any "industrial strength" solution for Lua, whether MP or thread-based.


I don't know if it's industrial strength, but there is Lanes, and probably a few more listed at lua-users. Do you have any experience to share with these?

The choice of 5.3 is pretty strange, as Lua is known to have split in two at 5.2: LuaJIT stayed fixed on the 5.1 variant (with compatibility backports) and is, in my opinion, the "industrial" Lua standard. The 5.x releases are actually three different languages (none absolutely superior to the others) with a shared 5-like foundation and app-level code compatibility, so speaking about Lua in general is pointless in a sense.

http://lualanes.github.io/lanes/

http://lua-users.org/wiki/MultiTasking


For an interested Lua programmer, there's a kind-of workable approach --

http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...


The industry-standard approach in Lua is to have multiple Lua states doing the work. A different approach, but it is there.
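In case it helps picture that pattern: a bare-bones sketch from the embedding side, one independent lua_State per OS thread, nothing shared between them (assumes Lua 5.3 headers and pthreads, linked with -llua -lpthread; purely illustrative):

    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>
    #include <pthread.h>
    #include <stdio.h>

    static void *run_chunk(void *arg)
    {
        const char *chunk = arg;
        lua_State *L = luaL_newstate();          /* each thread owns its own state */
        luaL_openlibs(L);
        if (luaL_dostring(L, chunk) != LUA_OK)
            fprintf(stderr, "lua error: %s\n", lua_tostring(L, -1));
        lua_close(L);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        /* No Lua data is shared; anything the states exchange goes through the host. */
        pthread_create(&a, NULL, run_chunk, "print('worker A', 2^10)");
        pthread_create(&b, NULL, run_chunk, "print('worker B', 2^20)");
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }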


Where are you getting that Pascal is the most memory efficient (edit: duh, it's under the "memory efficiency" table; I mistakenly read this as overall "efficiency")? It's firmly in the middle of the pack in every result; in their global result table it's slower and less energy efficient than Java (!). Its only advantage is that it's smaller than anything else.

Don't mean to assume you are wrong, but what are you referring to?


In the `Normalized Global Results` table, Pascal is used as the reference (for memory use) and everything else is reported relative to it.


> 5. I wonder which implementation of Lua they used.

There is a table with the versions they used¹, but they don't mention the implementation – it can be inferred for some: e.g. for Pascal they used the Free Pascal Compiler.

① - https://sites.google.com/view/energy-efficiency-languages/se...


Ouch, they used Lua 5.3

LuaJIT is at least an order of magnitude(10x) faster.


To the point where a lot of popular libraries (like Love2D) use LuaJIT by default. IIRC, Lapis (Lua-based web framework) can be easily configured for it as well.

I'd be interested in seeing how it fared; Lua really is a nice language.


Lapis afaik also uses LuaJIT by default because it's built for openresty which does use LuaJIT


> 4. PHP, JRuby, Ruby, TypeScript, Perl, and Python are massively slower than the fastest languages -- for example, Ruby is 59 times slower than Rust or C.

Take a look at some of the TypeScript code and compare it to the Javascript versions. At least some are way different, i.e. the TypeScript uses "modern" JS features and the JS version... looks about as much like C as it can.


Typescript is not its own language in terms of having a distinct runtime from Javascript so it has no performance difference and does not make sense to appear in this comparison.

I can see comparing different target versions of Javascript against one another (es2015 vs es2017 for instance), but it all depends on the VM chosen as well.


Exactly. What at least some of these tests are measuring is the difference between idiomatic modern JS (promises, map, careful variable scoping) and fast-and-loose old-school C-like JavaScript (for loops, global vars galore). TypeScript has little to do with it.


TypeScript can use only one of these, so it does make sense to have it on the list. Clearly it doesn't fall right next to JavaScript in the results. It is a benchmark of language implementations, not runtimes. This is why JRuby, C#, and F# are on the list: one uses the JVM and the latter two both use the CLR. Different languages.


And Forth is missing.


I can't believe they used the Computer Language Benchmarks Game. Those benchmarks don't reflect real-world workloads at all, and the contest has fairly arbitrary rules about implementations and widely differing implementation quality between languages. This should have been rejected by peer review.


This criticism is flawed. The fact that the code is imperfect or suboptimal actually makes their conclusions stronger and more useful for real-world comparisons.

The code guidelines in their benchmarks represent real programming practices in those languages. They promote "idiomatic" code because it is meant to represent the typical quality of code in the real world. Obviously many of the examples could be made more performant, but in doing so you would make the code less representative of the real world. Example: the TypeScript code outputs wanky ES6 features that are slower than their plain old JS counterparts (classes, arrow functions, let, etc.). You could abuse the TS code until its output is identical to the JS, but you would have a pointless benchmark then.

What would the benchmark honestly represent if the code looked nothing like code in the real world? The theoretical speed of the language just doesn't matter.

In fact, JS engines have been optimizing their performance around numerical benchmarks for decades. The benchmark problems (nbody, etc) are actually highly unrepresentative of real javascript performance because real world javascript is touching strings and awful DOM apis and messing with dictionaries all day.

Your 'real-world workload' should be stuff like text editor operations, a domain where JS's alleged 6x slowdown compared to C has not been remotely approached by current editors.


> The code guidelines in their benchmarks represent real programming practices in those languages. They promote "idiomatic" code because it is meant to represent the typical quality of code in the real world.

This could not be further from my observations when I tried contributing. These results are neither controlled nor the result of idiomatic programs.


I'm talking about stuff like putting your code in classes, initializing with constructors, or using your language's standard library instead of writing your own stuff.

Perhaps idiomatic was the wrong word. I mean that the code isn't supposed to be fighting the language.

What were your observations when you were contributing? Which problems were you working on? Any chance you still have the code?


> I mean that the code isn't supposed to be fighting the language.

That's the issue: those programs that did best were those that fought the language the most, and those that pushed the closest to the edge of the rules. You can't, in general, look at two programs and assume they approach the problem the same way.


The results are what I expected, but yes, mandelbrot and n-body in Ruby? Ruby's typical workload is getting some strings from a web server, building a SQL query, sending it to a database, getting a result set, creating a bunch of objects to represent the result, and sending another string back to the web server. That would be more energy efficient if done in C, but not by as much as for a mandelbrot.

Then factor in the energy of the developer and their supporting environment for the extra development time (commuting, air conditioning / heating, laptop, monitors, etc.). The lower the load on the app, the more important this gets.


> Then factor in the energy of the developer and their supporting environment for the extra development time (commuting, air conditioning / heating, laptop, monitors, etc.). The lower the load on the app, the more important this gets.

No, it's almost always negligible because it's essentially a one-time cost, whereas the program's inefficiencies are multiplied by an unknown number of users over an unknown period of time (cf. Cobol programs still being in use today) - but for sure your goal is often to maximize both of these unknown numbers.

This fact is very obvious when you develop hardware/firmware combos. The firmware is often designed to minimize the hardware costs (RAM size, mass storage size,...) because the cost of producing the hardware is per piece, while the development cost is mostly a one-time cost that one rarely includes in the total cost of the product.


> Those benchmarks don't reflect real-world workloads at all…

That's a very definite claim, for which you provide no supporting evidence ;-)


One of the worst offenders is probably the binary-trees benchmark.

1. It features an atypically high allocation rate compared to real-world programs.

2. As a synthetic microbenchmark that only ever allocates objects of one size, it makes pool allocators look disproportionately good (a minimal free-list pool is sketched after this list to show why). But in real-world applications with varying object sizes, overuse of pool allocators creates a serious risk of fragmentation.

3. The rules allow programs that use manual memory management to use pretty much any pool allocator library that they can get their hands on, but forbid garbage-collected languages from adjusting their GC parameters (last I checked, you could easily improve the performance of OCaml and Dart for that benchmark by a factor of 2-3 simply by adjusting GC settings, especially the size of the minor heap).
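To make point 2 concrete, here is roughly what a fixed-size free-list pool looks like in C (an illustrative sketch, not code from the benchmarks game). Allocation and free are each a pointer swap, which is why a workload that only ever allocates one node type flatters it -- and note that nothing here copes with mixed sizes or ever returns memory to the OS:

    #include <stdlib.h>

    struct node { struct node *left, *right; int value; };

    union slot { struct node n; union slot *next_free; };

    static union slot *free_list;

    static struct node *pool_alloc(void)
    {
        if (free_list) {                        /* reuse a freed slot: one pointer swap */
            union slot *s = free_list;
            free_list = s->next_free;
            return &s->n;
        }
        enum { CHUNK = 1024 };                  /* grow in big chunks, never shrink */
        union slot *chunk = malloc(CHUNK * sizeof *chunk);
        if (!chunk) abort();
        for (size_t i = 1; i < CHUNK; i++) {    /* thread the new slots onto the free list */
            chunk[i].next_free = free_list;
            free_list = &chunk[i];
        }
        return &chunk[0].n;
    }

    static void pool_free(struct node *p)
    {
        union slot *s = (union slot *)p;        /* every slot is the same size: no lookup */
        s->next_free = free_list;
        free_list = s;
    }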


> It features an atypically high allocation rate compared to real-world programs.

Do you have any supporting evidence for this? Real-world programs can be pretty allocation heavy, and if anything high allocation rates would "unfairly" benefit GC languages since native programs are banned from using arena allocators.

> As a synthetic micro benchmark that only ever allocates objects of one size, it makes pool allocators look disproportionately good. But in real-world applications with varying object size, overuse of pool allocators is a serious risk for fragmentation.

Huh? It's so incredibly common in the real world for a structure to hold a single type that pretty much every language supports that first-class via templates.

And since they are single-size there's no risk of fragmentation at all. Indeed object pools are typically how you avoid fragmentation, they are not things that cause it. So I'm not sure what you're trying to get at with that comment.

> The rules allow programs that use manual memory management to use pretty much any pool allocator library that they can get their hand on, but forbid garbage-collected languages from adjusting their GC parameters (last I checked, you could easily improve the performance of OCaml and Dart for that benchmark by a factor of 2-3 simply by adjusting GC settings, especially the size of the nursery).

Last I checked if I write a Dart library I can't adjust the GC settings in my library to make my library run better, either, but if I ship a C++ library I can absolutely use a pool allocator to make my library run better (and indeed this is exactly what real libraries do)

So I'd call this a good rule, and allowing tweaking of GC parameters would be more about trying to over-tune the runtime to the benchmark rather than the benchmark illustrating what is feasible to achieve in the language as a library or as part of a larger system.


> Do you have any supporting evidence for this?

For starters, by virtue of the construction of the benchmark alone, which mostly is about stress-testing allocation.

There are also quite a few papers that benchmark allocators, such as [1], where you can see that such allocation rates aren't exactly ordinary.

> Real-world programs can be pretty allocation heavy,

I said "atypical", not "impossible". Obviously, I can construct real-world programs with pretty much arbitrary allocation rates, but especially performance-sensitive code will avoid that, if only because it hurts memory locality.

> and if anything high allocation rates would "unfairly" benefit GC languages since native programs are banned from using arena allocators.

My point is that this is apples vs. oranges, no matter how you cut it. I don't want one side to "win", I want results that are good science.

> Huh? It's so incredibly common in the real-world for a structure to be of a single type that pretty much every language makes that first-class supported via templates.

That sentence doesn't make sense, unless you live in a world where C++ is the only programming language (because templates are pretty much C++-specific).

> And since they are single-size there's no risk of fragmentation at all.

Simple example: A program alternates between allocating objects of size N (and then freeing most, but not all of them) and allocating objects of size M (and then freeing most, but not all of them). Worst case means that one object per block is enough to keep an entire block alive.

The concern here is long-running programs: if you have high allocation rates, you also have frequent deallocations (or the program eventually stops allocating).
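To spell out that alternating pattern, a toy C version (sizes and counts invented); the point is only that a few survivors per phase can pin otherwise-reusable blocks in a size-segregated allocator, which a general-purpose or compacting scheme may handle better:

    #include <stdlib.h>

    enum { ROUNDS = 1000, PER_ROUND = 4096 };

    int main(void)
    {
        void *survivors[ROUNDS];
        for (int r = 0; r < ROUNDS; r++) {
            size_t size = (r % 2 == 0) ? 64 : 4096;   /* alternate "N" and "M" */
            void *batch[PER_ROUND];
            for (int i = 0; i < PER_ROUND; i++)
                batch[i] = malloc(size);
            survivors[r] = batch[PER_ROUND - 1];      /* keep one object from each phase */
            for (int i = 0; i < PER_ROUND - 1; i++)   /* free everything else */
                free(batch[i]);
        }
        /* A long-running process repeating this can sit far above its live-set size,
           depending on how the allocator segregates and reuses blocks. */
        for (int r = 0; r < ROUNDS; r++)
            free(survivors[r]);
        return 0;
    }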

> Last I checked if I write a Dart library I can't adjust the GC settings in my library to make my library run better, either, but if I ship a C++ library I can absolutely use a pool allocator to make my library run better (and indeed this is exactly what real libraries do)

And instead you could tune the final application; allocation tuning is often a non-local concern. Apples and oranges again.

Importantly, you seem to be mistaking this for a tribal argument (the GC tribe vs. the manual memory management tribe), whereas my point is simply that the benchmark is bad and tries to compare incomparable things.

[1] http://dl.acm.org/citation.cfm?id=3030211


> … my point is simply that the benchmark … tries to compare incomparable things.

That trump card wasn't actually mentioned in your original comment.

The difficulty is that programming languages (and programming language implementations) are more different than apples and oranges, but the question is still asked - "Will my program be faster if I write it in language X?" - and there's still a wish for a simpler answer than - It depends how you write it!

http://benchmarksgame.alioth.debian.org/dont-jump-to-conclus...


> Obviously, I can construct real-world programs with pretty much arbitrary allocation rates, but especially performance-sensitive code will avoid that, if only because it hurts memory locality.

That's not actually true at all. Games, for example, can have high allocation rates. But they use custom allocators to make those allocations extremely cheap using things like arena allocators for the allocations for a frame.

> My point is that this apples vs. oranges, no matter how you cut it.

It's not apples vs. oranges at all, though. It's "how can you solve this problem in the optimal way on any given language"

The set of rules does not appear to unfairly punish any particular language design. You can do object pools in GC'd languages, too, for example.

Does the problem itself have bias? Probably, but real problems in the real world have inherent language biases, too. That's a problem with reality, not a problem with the benchmark.

> That sentence doesn't make sense, unless you live in a world where C++ is the only programming language (because templates are pretty much C++-specific).

generics? Not C++ specific.

> Simple example: A program alternates between allocating objects of size N (and then freeing most, but not all of them) and allocating objects of size M (and then freeing most, but not all of them). Worst case means that one object per block is enough to keep an entire block alive.

That example results in terrible fragmentation for everyone other than a compacting GC if M > N. It's not made particularly worse by an object pool.

> Importantly, you seem to be mistaking this for a tribal argument (the GC tribe vs. the manual memory management tribe), whereas my point is simply that the benchmark is bad and tries to compare incomparable things.

I'm not mistaking it for that at all. I'm saying your arguments for why it's bad are bad. You seem to be upset that benchmarks for problems exist that do not represent your priorities of what should be benchmarked.


> The set of rules does not appear to unfairly punish any particular language design. You can do object pools in GC'd languages, too, for example.

Actually the rules for the Computer Language Benchmarks Game say about Binary Tree: 'As a practical matter, the myriad ways to custom allocate memory will not be accepted. Please don't implement your own custom "arena" or "memory pool" or "free list" - they will not be accepted.'


> It's not apples vs. oranges at all, though. It's "how can you solve this problem in the optimal way on any given language"

And for any synthetic microbenchmark, that's unlikely to reflect real-world workloads, as they tend to narrowly test just one aspect of the language (or rather, its implementation).

Remember, we're talking about a peer-reviewed paper here, where the burden to show relevance is upon the authors. Section 4 ("Threats to Validity") of the paper does not really address that adequately.

As the Computer Language Benchmark Game site itself quotes (and one reason why it calls itself a "game" and disavows usefulness for actual programming language comparisons):

  Attempts at running programs that are much simpler than a real
  application have led to performance pitfalls. Examples include:

  ...

  * Toy programs, which are 100-line programs from beginning
    programming assignments ...
  * Synthetic benchmarks, which are small, fake programs invented
    to try to match the profile and behavior of real applications
    ...

  All three are discredited today, usually because the compiler
  writer and architect can conspire to make the computer appear
  faster on these stand-in programs than on real applications.
You can construct pretty much arbitrary rules that will arbitrarily favor certain implementations over others. For example, the rules could also say to only use the language's built-in allocation mechanism for this benchmark. Or you could construct a benchmark that would heavily favor languages with built-in JIT techniques.

> You can do object pools in GC'd languages, too, for example.

No. The rules do not allow for that. I did mention how arbitrary they are, right? The regex-redux benchmark, for example, comes down to what the best external library is that you can link to under the rules. The gcc version wins in large part (being massively better than the g++ version, which relies on Boost regexes) because it uses the JIT version of libpcre. It's borderline absurd.

> generics? Not C++ specific.

You said templates, not generics. Generics alone are not necessarily sufficient to implement pool allocators. Plus, C++ is the only language that has major adoption that has both some form of generics and manual memory management.

> That example results in terrible fragmentation for everyone if M > N other than a compacting GC. It's not made particularly worse by an object pool.

This is false. Even a simple first-fit allocator can fare better, for example. Obviously, a compacting GC can avoid external fragmentation (almost) entirely, no matter the cause.


> … and one reason why it calls itself a "game"…

No: http://benchmarksgame.alioth.debian.org/sometimes-people-jus...

> … and disavows usefulness for actual programming language comparisons…

Not a definitive conclusion but a starting point.


> The rules allow programs that use manual memory management to…

And still there's outrage that programs with allocators custom-written for binary-trees are rejected :-)


Maybe we could get away with a couple smaller claims:

- It would be weird to expect programs specifically optimized for a performance benchmark, to also be optimal for energy usage or memory. Maybe less weird for energy, if runtime is the biggest factor in how much energy gets used. But I'd expect there to be huge tradeoffs between runtime and memory usage, once we start really optimizing for memory.

- Running the Benchmarks Game has meant doing a lot of work to figure out what the relationship between "optimal" and "representative" is, and what the rules of the game should be as a result. Even if we take it for granted that we've answered those questions in a way that gives us useful benchmarks for performance, it would be weird to expect the answers to be exactly the same in benchmarks for energy or memory.


Maybe we should look at "4 Threats to Validity" section of their paper and ask whether they have really done enough to suggest that their results should be considered general rather than specific.


Perhaps I'm oversimplifying to the point of making mistakes, but intuitively I would expect performance and energy efficiency to be strongly correlated, because I assume that each instruction executed takes a certain amount of time and a certain amount of energy, and that this scales roughly linearly. That is, executing n instructions takes A * n time and B * n energy, so of course making it finish sooner means fewer instructions means less energy.
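Spelled out, the model being assumed is just (A, B, and the average power \bar{P} are illustrative constants, not numbers from the paper):

    T(n) = A\,n, \qquad
    E(n) = \int_0^{T(n)} P(t)\,\mathrm{d}t \;\approx\; \bar{P}\,T(n) = \bar{P}A\,n = B\,n

So as long as the average power draw is roughly constant, energy tracks time linearly; parallel programs are the obvious case where it isn't.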


performance usually == energy efficiency on modern cpus. race to shutdown.


Does your comment agree or disagree with what they measure?


The implementations in the benchmark game are all targeting performance.

This will almost always also be the most energy efficient thing on modern cpus.

It will not always be the most memory-efficient way. I neither agree nor disagree with what they are measuring; it's just data. I would not look at this if I were trying to get an idea of which languages are most memory efficient, maybe.


I wondered if you'd looked at all ;-)


they reflect a number of different real world workloads.


For a bunch of years I was writing in Java and driving a V8. Now I'm back to native (C++) and driving a Prius these days. I guess it's my eco-consciousness that makes me shudder in disgust every time I look at Python... pretty much the same way as when I look at Hummers. While of course I love Perl, an M1 Abrams that gets things done despite anything :)


A real shame that Forth wasn't included in the tested languages. Chuck Moore has been an advocate for more energy efficient computation for a while now.


The tests come from the benchmark game. If you want to see Forth, implement the tests in Forth:

http://benchmarksgame.alioth.debian.org/


> The tests come from the benchmark game. If you want to see Forth, implement the tests in Forth:

But perhaps archive your code elsewhere. Github? Lots of language communities participated in past versions of the game. Like gforth and bigforth on shootout.alioth.debian.org circa 2008. Now it seems little remains but a few archive.org snapshots without source. Though perhaps all of that code was archived somewhere else that I'm not quickly finding?


It seems like you are entirely ignorant of the facts.

Sitting in the same old-repo, a GForth binary-trees program from 2005 --

https://alioth.debian.org/scm/viewvc.php/shootout/bench/bina...

edit Now it seems little remains but for you to lift-that-shade and tell-everyone how amazing it is that someone took the trouble to archive those obscure old programs.


@igouy points out that while the old shootout site is gone, its language files are still available[1]. There are about 70 benchmarks, in about 70 languages. I remember them as interesting multilingual browsing.

Unlike the old shootout, adding languages directly to benchmarksgame.alioth.debian.org is not an option. "Because I know it will take more time than I choose. Been there; done that."[2] :) Instead, the code is available[3], and communities are invited to document their own benchmarksgame comparisons[2], such as Nim's[4]. The OP used different code[5].

A brief glance at the current benchmarks suggests there is some overlap with the old ones. Or at least an overlap of names - the requirements may have changed. So it might be possible to get started fairly easily? Perhaps even to do several languages...

[1] https://alioth.debian.org/scm/viewvc.php/shootout/bench/?roo... [2] https://benchmarksgame.alioth.debian.org/play.html [3] https://github.com/Byron/benchmarksgame-cvs-mirror [4] https://github.com/def-/nim-benchmarksgame [5] https://github.com/greensoftwarelab/Energy-Languages


> Unlike the old shootout, adding languages directly to benchmarksgame.alioth.debian.org is not an option

No, not unlike the old shootout -- exactly like it !

Please stop making things up.


It's a pity that the OP didn't include Nim despite benchmark files for it being available.


If a summary of the test data is/were machine readable in the individual language repos, perhaps one could create an automated aggregation? A distributed/federated version of the old shootout. One with lower maintenance requirements. benchmarksgame-results.json?

The data would be crufty, of limited comparability. But combining easily browsable links to colorized source code, with "just for a rough feel" speed relative to C, might be sufficient for the use case of raising language awareness - "What is this C-speed-like language I've never heard of? Oh, that looks pretty! I think I'll explore this language's web page..."

Or alternately, use the language files on github to create a new, broader benchmarksgame. That is, distribute the work of benchmark revisions and makefile compiler options, but keep the testing centralized. I've no idea of the relative costs of those tasks, or of others. But a continuous integration shootout sounds intriguing.


> A distributed/federated version of the old shootout

It would be wonderful but how do you ensure any form of hardware consistency?


> There are about 70 benchmarks…

Most of which were replaced by something better.


I'd rather say it began to be a focus with his latest processors, because it's the only hope they have of selling Forth processors. I remember he mentioned introducing a lot of complexity into a previous design to achieve very high speed for exactly the same reason (the only hope of selling a Forth processor is to make it super-fast), which was a bit of a contradiction with the reason he moved from software to hardware - simplifying the hardware/software combo globally.


Yes, see several minutes starting around 01:18 in this talk:

https://www.infoq.com/presentations/power-144-chip

As for efficiency, he mentions 7 picojoules per instruction in his forth chips.


Note: the PDF of the paper has a much more detailed listing of the results (with graphs, explanations, etc.) than the results web page: http://greenlab.di.uminho.pt/wp-content/uploads/2017/09/pape...


Only while printing it did I notice my university's name there! :)


Rust is fairly competitive with C, but its memory use was 50 percent higher. I bet if they focus on getting the memory usage down it will perform better too.


Rust uses jemalloc by default; I wonder what it would look like if you swapped the allocator.


I'm kinda surprised to see Go score so low on some of these, given that it's AOT-compiled to native code, and that its memory model and safety guarantees are optimization-friendly.

I'm also kinda surprised that OCaml scored that high, considering how high-level it is.


OCaml has an incredible optimizer, and many of the abstractions that one would use in OCaml are transparent to the optimizer.

You could say the opposite about Go.


Kinda makes me wonder why OCaml didn't become more popular for Unix systems programming, the way Go seems to be becoming now. Looking at the two languages, it seems like pretty much everything that Go has to offer in that department, OCaml has as well - and if it also optimizes better, why is this even a contest?


Higher-level scripting languages. Fast for programmer's time, slow for run times.


Not really, if you look into Lisp results.


One of the things with the C set is that they didn't use Intel's icc. On our main application (which involves a lot of heavy-duty number crunching) it speeds things up by 100% over the best we can get out of gcc.


I've seen 20% speedups with icc over gcc; but not a doubling. Is this including mkl over some other numeric library as well?


No nothing extra. The code does vectorise well. But even if the result is 20%, it would make quite a bit of difference to the results presented in this paper.


I'm surprised at the Java-C# comparison. When are these numbers from?

The following blog post explains how the author improved the C# implementations after seeing unfavorable C# results.

https://anthonylloyd.github.io/blog/2017/08/15/dotnetcore-pe...


So Go (a garbage-collected language) is about as memory efficient as Pascal, the most memory-efficient language. That comes as a surprise.


Not really. "Manual" memory allocation is usually malloc/free, which leaves a lot of gaps.


Go's garbage collector isn't compacting.


My point wasn't that garbage collection is more efficient than people think, but that "manual" memory allocation is less efficient than people think.


To the point that people have used GC-enabled systems programming languages to write full OSes[0], but as Joe Duffy put it in his last keynote, overcoming prejudice is an uphill battle.

[0]- Done at Xerox PARC, DEC/Olivetti, ETHZ, Erlangen, MSR


I have just started looking into Ada, and it is cool to see it up there with C. Does anyone know a good resource for writing fast Ada?


No Julia?


I see they monitor a complete process lifetime, not just the active workload. A quick grep of the paper didn't turn up anything discussing this. And it would seem to hurt dynamic languages and runtime-JIT languages a lot. Perhaps the active workload takes long enough that the transient is washed out. Anyone have insights?


If you want to tease the startup out, then you can run the game with a null or extremely small input set and then discount the time it takes to run a small/null workload from the actual run.

But really, if they hurt, then they get hurt. There's only so many knobs you can turn and so many runs you can run before you have to assemble a table and submit your paper. They put all the code online so you can try it yourself.


> Perhaps the active workload takes long enough that the transient is washed out.

That does seem to be the case with some-of the benchmarks game programs --

http://benchmarksgame.alioth.debian.org/sometimes-people-jus...


Considering most CPU cores (ARM/x86/etc.) are optimized for C/GCC, it's an unsurprising result.


Can someone elaborate? I always thought that the compiler was optimized for the architecture, not the other way around. Why would an architecture be optimized for a specific compiler?


I'd love to see some scatter plots of CPU time vs. energy. I expected a fairly linear correlation, but there look to be quite a lot of outliers; some, but not all, are explained by parallelism. Memory seems all over the place too.


Why is TypeScript so much less efficient than JavaScript?


Guess: its output has more lines of JS than the vanilla JS versions, and since a lot of the variation between scripting languages in these things often comes down to how fast they can stop running themselves and start running some underlying C, those extra lines hurt it.

This may also explain why JS is really fast in a couple of the examples. Those scripts likely do almost nothing but call some built-in function that does all its work in C (or C++, or whatever).

[EDIT] OK, if I'm reading these[1][2] files correctly, it has more to do with the TS version being written in "modern" "idiomatic" "beautiful" JS with promises and crap, and the vanilla JS one being written in "ugly" "bad" old-school 90's-lookin' C-ish JavaScript.

[1] https://github.com/greensoftwarelab/Energy-Languages/blob/ma...

[2] https://github.com/greensoftwarelab/Energy-Languages/blob/ma...


It seems to only be significantly different with fannkuch-redux. Looking at the CLBG page (http://benchmarksgame.alioth.debian.org/u32/performance.php?...) for that algorithm, there doesn't seem to be a very big difference at all.

I cannot explain the difference and if this is indeed an error by the authors of the paper it casts a fairly big shadow on the reliability of the rest of the data.


less optimal implementation is all


Haskell did not shine here. Ocaml did surprisingly well.


One wonders how Kotlin (or other popular JVM languages) would compare. And also Objective-C or Smalltalk.


I am curious to know if there is any explanation for why Java performs better than Go.


Two decades of optimization probably account for that. From the paper:

    Moreover, the top 5 languages that need less energy and
    time to execute the solutions are: C (57J, 2019ms), Rust (59J,
    2103ms), C++ (77J, 3155ms), Ada (98J, 3740ms), and Java
    (114J, 3821ms); of these, only Java is not compiled.
That is an admirable accomplishment. JRE 1.8 has the most efficient "managed" language runtime tested.

The PHP result is interesting; they used 7.x which is roughly twice as efficient as older (5.x) versions; PHP 7 was primarily a (successful) effort to improve performance. If you extrapolate the PHP result to the old version (multiply by 2) it's nearly as bad as Perl and among the slowest/most costly. That certainly corroborates Facebook's several PHP re-implementations.


The Java performance was the most interesting part of the results for me. I'm glad it got some attention. I have been developing the opinion for a while that, as an industry, we've become far too avoidant of "premature" optimization. It is astonishing to me how much slower a lot of this stuff is than Java.

Granted, I also think considering Java to be "not compiled" is a slight misrepresentation since it does get compiled to Java bytecode.


Go was invented in the 1970s and then placed on the shelf for 40 years.


I personally enjoy Go but your comment gave me a chuckle since it's very true. Have your upvote, friend, for whatever they're worth on HN.


20 years of optimizing tends to do this


I can't speak to the Java comparison, but the site says they used Go 1.6.3. The SSA back end introduced in Go 1.7 (amd64) came with a nice performance boost.

https://blog.golang.org/go1.7


It'd be nice if these weren't images of the tables... :-\


They're just screen capped from the actual tables in the paper PDF.


In brief: C is the winner of all the races (time, energy, and memory).


Pascal won on memory.


Nim compiles to C



