Misleading Microbeanchmarks

You’ll often see people show a micro-benchmark to prove a point. I can think of many examples offhand:

  1. Using a synchronized singleton SimpleDateFormat is slower than tying them to a ThreadLocal or putting them in a pool
  2. Unfair ReadWriteLocks cause write-lock starvation (writeLock.lock() will never return)
  3. Using a HashSet of String’s is much faster than doing an indexOf()

These are all true in the confines of a micro-benchmark. The problem is: micro-benchmarks are misleading. For example, number 2. I recently read a micro-benchmark proving that under high contention situations the writeLock may never be acquired. Firstly the documentation does tell you this:

The constructor for this class accepts an optional fairness parameter. When set true, under contention, locks favor granting access to the longest-waiting thread. Otherwise this lock does not guarantee any particular access order.

Most people assume that the write-lock will have preference over read-locks. However, this is a general-purpose implementation of a ReadWriteLock (ReentrantReadWriteLock). That means it’s designed to work well (and fast) under most situations. The fast is: in most situations, lock contention is relatively low and/or fluctuates over time.

What does this have to do with micro-benchmarking? A micro-benchmark is generally built not to simulate the fluctuation of the systems load. A micro-benchmark is generally designed to simulate a very high load, often where the only thing done is what is being tested (formatting dates, acquiring locks, etc). This results in a very unrealistic picture of whats actually going on.

If we take 3 for example. Looking for a specific substring in a String list (for example “AN” in a comma separated String containing “GH,JK,IH,TA,AN,FR,MN,SA”), there are several different approaches to this:

  • Simply use indexOf to see if the substring exists
  • Use String.split(“,”) and then iterate through the resulting array to find the substring
  • Add the substring tokens to a HashSet and use the contains() method
  • Write a method to iterate through the String and test the substring tokens without using substring (ie: a char[] or some such).

Obviously the fastest (assuming the data can be re-used) will almost always be to add the strings to a HashSet. However if the HashSet is not kept around for re-use later, this is the slowest method. The fastest in this case would be indexOf() or a specialized method. In a micro-benchmark however, it would be easy to prove that a HashSet is by far the fastest, when the String is actually only tested once every day. In which case, whats the point in optimizing?


One Response to “Misleading Microbeanchmarks”

  1. Know you contention ratios. « Techno Lemming Says:

    […] Programming, Software, Technical, Threading — Jason @ 3:26 pm After blogging about how Micro-Benchmarks are often misleading and focused on proving a point, I went ahead and wrote a little micro-benchmark to curb my […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: