You’ll often see people show a micro-benchmark to prove a point. I can think of many examples offhand:
- Using a synchronized singleton SimpleDateFormat is slower than tying them to a ThreadLocal or putting them in a pool
- Unfair ReadWriteLocks cause write-lock starvation (writeLock.lock() will never return)
- Using a HashSet of String’s is much faster than doing an indexOf()
These are all true in the confines of a micro-benchmark. The problem is: micro-benchmarks are misleading. For example, number 2. I recently read a micro-benchmark proving that under high contention situations the writeLock may never be acquired. Firstly the documentation does tell you this:
The constructor for this class accepts an optional fairness parameter. When set
true, under contention, locks favor granting access to the longest-waiting thread. Otherwise this lock does not guarantee any particular access order.
Most people assume that the write-lock will have preference over read-locks. However, this is a general-purpose implementation of a ReadWriteLock (ReentrantReadWriteLock). That means it’s designed to work well (and fast) under most situations. The fast is: in most situations, lock contention is relatively low and/or fluctuates over time.
What does this have to do with micro-benchmarking? A micro-benchmark is generally built not to simulate the fluctuation of the systems load. A micro-benchmark is generally designed to simulate a very high load, often where the only thing done is what is being tested (formatting dates, acquiring locks, etc). This results in a very unrealistic picture of whats actually going on.
If we take 3 for example. Looking for a specific substring in a String list (for example “AN” in a comma separated String containing “GH,JK,IH,TA,AN,FR,MN,SA”), there are several different approaches to this:
- Simply use indexOf to see if the substring exists
- Use String.split(“,”) and then iterate through the resulting array to find the substring
- Add the substring tokens to a HashSet and use the contains() method
- Write a method to iterate through the String and test the substring tokens without using substring (ie: a char or some such).
Obviously the fastest (assuming the data can be re-used) will almost always be to add the strings to a HashSet. However if the HashSet is not kept around for re-use later, this is the slowest method. The fastest in this case would be indexOf() or a specialized method. In a micro-benchmark however, it would be easy to prove that a HashSet is by far the fastest, when the String is actually only tested once every day. In which case, whats the point in optimizing?