Benchmarketing Racket parallelism

The Computer Language Shootout is a popular, if not especially informative, way to compare the speed of various language implementations.  Racket does pretty well on their benchmarks, thanks to a lot of effort from various people, especially Eli. They run benchmarks on both 1-core and 4-core machines, so languages with support for parallelism can often take advantage of the extra cores.  However, up until this past week, there were no parallel versions of the Racket programs, and therefore Racket didn’t even show up on the 4-core benchmarks. I set out to fix this, in order to advertise Racket’s up-and-coming parallelism constructs.

There are now two new Racket versions of the benchmarks, one each using futures and places. The mandelbrot benchmark uses futures, getting a speedup of approximately 3.2x on 4 cores, and the binary-trees benchmark uses places, with a speedup of almost exactly 2x.

I learned a few things writing these programs:

  1. Racket’s parallelism constructs, though new, are quite performant, at least on microbenchmarks.  With only two parallel programs, Racket is right now competitive with Erlang on 4 cores.
  2. Futures are really easy to use; places take a little more getting used to. Both are quite simple once you get the hang of them, especially if you’ve previously written concurrent Racket programs using Racket’s threads.
  3. It can be very surprising which languages are easiest to translate to Racket.  F# and OCaml were the easiest, with Scala similar.  Programs written in Common Lisp, though fast, were much harder to convert to Racket.
  4. My quick rule of thumb for choosing between places and futures: if your program does much allocation in parallel, or it needs to synchronize, use places.  Otherwise, futures are probably easier.  I think this is roughly in line with the original design, and there are more applications where synchronization is unnecessary than you might think.
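To illustrate the futures side of that rule of thumb, here is a minimal sketch (illustrative only, not the benchmark code): pure flonum arithmetic with no allocation is exactly the kind of work that runs in parallel under futures, and `touch` joins the result.

```racket
#lang racket
(require racket/flonum)

;; Illustrative sketch, not the benchmark code: split a pure flonum
;; computation between a future and the main thread, then join.
(define (sum-squares lo hi)
  (for/fold ([acc 0.0]) ([i (in-range lo hi)])
    (fl+ acc (fl* (->fl i) (->fl i)))))

;; The future runs in parallel with the main-thread call below.
(define f (future (lambda () (sum-squares 0 5000000))))
(define main-part (sum-squares 5000000 10000000))
(define total (fl+ main-part (touch f)))   ; touch joins the future
```

Because the loop allocates nothing and never synchronizes, the future can run to completion on another core without ever blocking.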

There are a bunch more programs that could have parallel implementations; feel free to hack on them, or to improve mine.
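Places, by contrast, run in separate instances of the Racket runtime, each with its own garbage collector, and communicate over channels, which is why allocation-heavy programs like binary-trees scale with them. A minimal sketch (the worker body here is a placeholder computation, not the binary-trees code):

```racket
#lang racket
(require racket/place)

;; Minimal place sketch: the worker receives a number on its channel
;; and replies with a result. The body is a placeholder computation,
;; not the binary-trees benchmark.
(define p
  (place ch
    (define n (place-channel-get ch))
    (place-channel-put ch (for/sum ([i (in-range n)]) i))))

(place-channel-put p 1000000)
(displayln (place-channel-get p))
```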

8 comments
  1. This is awesome, thanks! I’m curious, though, what about Common Lisp was hard to translate to Racket? I’m signed up for Stanford’s ai-class 100000+ experiment, and I plan to do most of the assignments in Racket, and maybe some other languages too.

  2. I’ve been playing with futures in the past week or two. On my own (image processing) problems, I have yet to get real-time < cpu-time, on a machine with 4 cores. On the futures examples from the Racket Guide "Parallelism with Futures" chapter, I've got real-time < cpu-time, but I have yet to get real-time for a program using futures smaller than real-time for the corresponding sequential program.

    • By image processing, do you mean images from the 2htdp/image library? If so, they’re almost certainly not going to parallelize well with futures. The sweet spot for futures currently is programs like the mandelbrot benchmark: almost no communication, and all parallel operations are very simple (numeric computation, vector manipulation, etc.).

      Also, have you looked at the messages the futures library gives about what it’s blocking on?

  3. How is the performance on a quad-core machine?

  4. Correction: on the “any-double?” examples from “Parallelism with Futures”, I DO have real-time smaller using futures than sequentially. On the mandelbrot examples, I’ve got real-time < cpu-time, but (even on the version with all fl operations) real-time is greater for the parallelized version than for the sequential version.

    On the "image-processing" problems… I'm starting with 2htdp/image images, but rendering them to bitmaps before even trying to parallelize anything. Then I have one future working on (say) the top 25 pixel rows, another on the next 25 pixel rows, and so on. All the computations are completely independent; they are reported using bytes-set!, with no two futures affecting the same offset.
