When dealing with performance problems, I’ve noticed an alarming trend: profiling seems to be treated as something special! Profiling should be your first stop when trying to improve performance, and not pretend profiling either: when you profile a large system, you should profile parts of the production system.
Production Profiling 101
This doesn’t mean you have to put your profiling into production. I’ve often used a record / playback system for profiling. A recorder of sorts writes a binary log file detailing the actual actions of real users on the production system. A profiling system is then set up (after a few weeks of recording), and you play back the recording, capturing the profiling information. This method means that your users don’t experience bad performance caused by the profiling instrumentation of the code.
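To make the record / playback idea concrete, here is a minimal sketch in Java. The class and method names (`ActionRecorder`, `ActionPlayer`, the action-name-plus-timestamp log format) are illustrative assumptions, not a specific library; a real recorder would also capture the arguments needed to replay each call faithfully.

```java
import java.io.*;
import java.nio.file.*;
import java.util.function.Consumer;

// Hypothetical recorder: runs in production, appending each user
// action to a binary log file with minimal overhead.
class ActionRecorder implements Closeable {
    private final DataOutputStream out;

    ActionRecorder(Path logFile) throws IOException {
        this.out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(logFile)));
    }

    // Record an action name plus its timestamp. A real system would
    // also serialize the request parameters needed to replay it.
    void record(String action, long timestampMillis) throws IOException {
        out.writeUTF(action);
        out.writeLong(timestampMillis);
    }

    @Override public void close() throws IOException { out.close(); }
}

// Hypothetical player: runs later against an instrumented copy of the
// system, with the profiler attached, replaying the recorded actions.
class ActionPlayer {
    static void playback(Path logFile, Consumer<String> dispatch) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(logFile)))) {
            while (true) {
                String action;
                try { action = in.readUTF(); } catch (EOFException done) { break; }
                long ts = in.readLong(); // could be used to reproduce real pacing
                dispatch.accept(action);
            }
        }
    }
}
```

The binary format keeps the production-side cost low; all the expensive instrumentation lives only in the playback environment.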
A common answer to a slow system is to start caching some of the information. But what information should you cache? How long should entries live? How big should the cache be? Most decent caching systems offer metrics that will tell you what the cache is doing; use this information! There’s no point in gathering information you don’t use.
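As a sketch of the kind of metrics worth watching, here is a tiny LRU cache in plain Java that counts hits, misses, and evictions. The `MeteredCache` name and API are illustrative; production caching libraries expose far richer statistics, but these three numbers already answer whether a cache is earning its keep.

```java
import java.util.*;

// Illustrative cache that exposes the metrics the text recommends:
// hit rate (is the cache useful?) and eviction count (is it big enough?).
class MeteredCache<K, V> {
    private final int maxSize;
    private final LinkedHashMap<K, V> map;
    private long hits, misses, evictions;

    MeteredCache(int maxSize) {
        this.maxSize = maxSize;
        // An access-ordered LinkedHashMap gives simple LRU eviction.
        this.map = new LinkedHashMap<>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> e) {
                boolean evict = size() > MeteredCache.this.maxSize;
                if (evict) evictions++;
                return evict;
            }
        };
    }

    V get(K key) {
        V v = map.get(key);
        if (v == null) misses++; else hits++;
        return v;
    }

    void put(K key, V value) { map.put(key, value); }

    // A low hit rate means you are caching the wrong information;
    // a high eviction count suggests the cache is too small.
    double hitRate() { long n = hits + misses; return n == 0 ? 0.0 : (double) hits / n; }
    long evictionCount() { return evictions; }
}
```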
Finally, there are parts of your system you will always want to know about. These parts should continuously feed you information about their performance, information like:
- How often that special piece of optimized code is actually being used
- How much memory is being consumed in the process
- How long it takes to execute
  - Both the entire process and key sub-parts of the process
- Which users make the most use of the feature
- How often that area of the system is used by day / week / month
If you’re using Java, I would strongly recommend looking into writing some MBeans. JConsole will give you graphing of your stats for free, and also makes it very easy to create custom views of your MBeans.
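A minimal MBean for the kind of counters listed above might look like the sketch below. The `FeatureStats` name and its attributes are made up for illustration; the one real convention at work is that a standard MBean is an interface named `XxxMBean` (which must be public) implemented by a class named `Xxx`.

```java
// Standard MBean convention: interface XxxMBean + class Xxx.
// Getters become attributes that JConsole can graph over time.
// (Declared package-private here for a self-contained snippet; in real
// code the MBean interface must be public.)
interface FeatureStatsMBean {
    long getInvocationCount();
    double getAverageMillis();
}

class FeatureStats implements FeatureStatsMBean {
    private volatile long invocations;
    private volatile long totalMillis;

    // Call this from the feature you want continuous insight into.
    void recordInvocation(long elapsedMillis) {
        invocations++;
        totalMillis += elapsedMillis;
    }

    @Override public long getInvocationCount() { return invocations; }

    @Override public double getAverageMillis() {
        return invocations == 0 ? 0.0 : (double) totalMillis / invocations;
    }
}
```

Registration is a couple of lines against the platform MBean server, along the lines of `java.lang.management.ManagementFactory.getPlatformMBeanServer().registerMBean(stats, new javax.management.ObjectName("com.example:type=FeatureStats"))`, where `com.example:type=FeatureStats` is an illustrative ObjectName; once registered, the attributes show up in JConsole automatically.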
Measured insight into your system is possibly the only way to truly improve its performance.