Is Nashorn (JVM) faster than Node (V8)?

The answer to the question “Is Nashorn (JVM) faster than Node (V8)?” is in most people’s minds: a foregone conclusion, looking something like this expression.

no-way-9yykt8

It certainly was in my mind, until I actually ran a very simple benchmark that computes the Fibonacci sequence a few times (recursively). It’s a common enough benchmark, one frequently used to test method dispatching, recursion and maths in various languages / virtual machines. For disclosure, the code is listed below:

function fib(n) {
  if (n < 2)
    return 1;
  return fib(n - 2) + fib(n - 1);
}

try {
  print = console.log;
} catch(e) {
}

for(var n = 30; n <= 50; n++) {
  var startTime = Date.now();
  var returned = fib(n);
  var endTime = Date.now();
  print(n + " = " + returned + " in " + (endTime - startTime));
}

The results are, lets just say: “shocking”. The table below is both runs along side each other, the last numbers are the important ones: how many milliseconds each run took.

jason@Bender ~ $ node fib.js 
30 = 1346269 in 12
31 = 2178309 in 17
32 = 3524578 in 27
33 = 5702887 in 44
34 = 9227465 in 69
35 = 14930352 in 113
36 = 24157817 in 181
37 = 39088169 in 294
38 = 63245986 in 474
39 = 102334155 in 766
40 = 165580141 in 1229
41 = 267914296 in 2009
42 = 433494437 in 3241
43 = 701408733 in 5302
44 = 1134903170 in 8671
45 = 1836311903 in 13626
46 = 2971215073 in 22066
47 = 4807526976 in 45589
48 = 7778742049 in 74346
49 = 12586269025 in 120254
50 = 20365011074 in 199417
jason@Bender ~ $ jjs -ot fib.js 
30 = 1346269 in 70
31 = 2178309 in 16
32 = 3524578 in 19
33 = 5702887 in 30
34 = 9227465 in 48
35 = 14930352 in 76
36 = 24157817 in 123
37 = 39088169 in 197
38 = 63245986 in 318
39 = 102334155 in 517
40 = 165580141 in 835
41 = 267914296 in 1351
42 = 433494437 in 2185
43 = 701408733 in 3549
44 = 1134903170 in 5718
45 = 1836311903 in 9306
46 = 2971215073 in 15031
47 = 4807526976 in 34294
48 = 7778742049 in 38446
49 = 12586269025 in 61751
50 = 20365011074 in 100343

So what on earth is going on here? How can it be that the last result is 99 seconds faster on the Java VM than on V8, one of the fastest JavaScript engines on the planet?

The answer is actually hidden at the top of the tables: the -ot option being passed to JJS (Java JavaScript) is the key to this magic sauce. OT stands for Optimistic Typing, which aggressively assumes that variables are all compatible with the Java int type, and then falls back to more permissive types (like Object) when runtime errors happen. Because the code above only ever produces int values, this is a very good assumption to make and allows the Java VM to get on and optimise the code like crazy, where V8 continues to chug along with JavaScript’s Number type (actually it is more intelligent than that, but it’s just not showing in this little benchmark).

The point of this post is: don’t believe everything you read in benchmarks. They are very dependant on the code being run, and micro-benchmarks are especially dangerous things. The work that has been done on Optimistic Typing in Nashorn is amazing, and makes it a very attractive JavaScript environment. But you shouldn’t believe that just because these benchmarks show such a wide gap in performance that your Express application would run faster in Nashorn than under V8… In fact you’d be lucky to get it to run at all.

Feel free to replicate this experiment on your machine. It’s amazing to watch, it surprises me every time I re-run it.

Advertisements

Misleading Microbeanchmarks

You’ll often see people show a micro-benchmark to prove a point. I can think of many examples offhand:

  1. Using a synchronized singleton SimpleDateFormat is slower than tying them to a ThreadLocal or putting them in a pool
  2. Unfair ReadWriteLocks cause write-lock starvation (writeLock.lock() will never return)
  3. Using a HashSet of String’s is much faster than doing an indexOf()

These are all true in the confines of a micro-benchmark. The problem is: micro-benchmarks are misleading. For example, number 2. I recently read a micro-benchmark proving that under high contention situations the writeLock may never be acquired. Firstly the documentation does tell you this:

The constructor for this class accepts an optional fairness parameter. When set true, under contention, locks favor granting access to the longest-waiting thread. Otherwise this lock does not guarantee any particular access order.

Most people assume that the write-lock will have preference over read-locks. However, this is a general-purpose implementation of a ReadWriteLock (ReentrantReadWriteLock). That means it’s designed to work well (and fast) under most situations. The fast is: in most situations, lock contention is relatively low and/or fluctuates over time.

What does this have to do with micro-benchmarking? A micro-benchmark is generally built not to simulate the fluctuation of the systems load. A micro-benchmark is generally designed to simulate a very high load, often where the only thing done is what is being tested (formatting dates, acquiring locks, etc). This results in a very unrealistic picture of whats actually going on.

If we take 3 for example. Looking for a specific substring in a String list (for example “AN” in a comma separated String containing “GH,JK,IH,TA,AN,FR,MN,SA”), there are several different approaches to this:

  • Simply use indexOf to see if the substring exists
  • Use String.split(“,”) and then iterate through the resulting array to find the substring
  • Add the substring tokens to a HashSet and use the contains() method
  • Write a method to iterate through the String and test the substring tokens without using substring (ie: a char[] or some such).

Obviously the fastest (assuming the data can be re-used) will almost always be to add the strings to a HashSet. However if the HashSet is not kept around for re-use later, this is the slowest method. The fastest in this case would be indexOf() or a specialized method. In a micro-benchmark however, it would be easy to prove that a HashSet is by far the fastest, when the String is actually only tested once every day. In which case, whats the point in optimizing?