Why memcached is probably a bad idea for your Java app

memcached has gotten a lot of press lately, and in the web-application world people are swallowing it whole as a cure-all for performance-related problems. However, in a Java application I would now consider memcached not just a bad idea but actually harmful.

Don’t get me wrong: memcached is a great little piece of software. It’s simple (both in itself and in its usage), and that’s one of its greatest virtues.

So now, why is it a bad idea for a Java app?

Remote caching

memcached is a 100% remote cache. Every time you get an object from memcached, it does a network fetch. In a PHP / Python / Perl style application, where you likely have memory separation, this is a logical course of action. In Java, however, there’s often a good chance you have that object lying around locally already, especially since many smaller clusters are configured with sticky sessions. Yet the Java clients for memcached that I have seen make no attempt to track the object locally, not even through a WeakReference map of some sort. Therefore, every time you get something from memcached, you allocate more memory for an object you may already have dozens of copies of in memory. Considering the amount of memory thrown at servers today, it can take the Garbage Collector ages to get to those objects.

Another side effect of this structure is that when storing Java objects, you incur a serialization overhead in both directions. While not a huge expense, it takes a bit of the edge off.
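To make the copying and serialization cost concrete, here is a minimal sketch using plain Java serialization. It does not talk to memcached at all: the byte array stands in for the value travelling over the wire, and CachedUser and RoundTrip are hypothetical names invented for the illustration. A real client may use a different encoding, but the effect on object identity should be the same.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical cached value; any Serializable class behaves the same way.
final class CachedUser implements Serializable {
    private static final long serialVersionUID = 1L;
    final long id;
    final String name;

    CachedUser(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class RoundTrip {
    // Serialize to bytes, as must happen before a value is sent to a remote cache.
    static byte[] toBytes(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize, as must happen for every value read back from a remote cache.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CachedUser original = new CachedUser(42L, "alice");
        byte[] wire = toBytes(original);

        CachedUser first = (CachedUser) fromBytes(wire);
        CachedUser second = (CachedUser) fromBytes(wire);

        System.out.println(first == original);                // false: a fresh copy
        System.out.println(first == second);                  // false: every read allocates again
        System.out.println(first.name.equals(original.name)); // true: same data, different object
    }
}
```

Every read produces a new object with the same contents, which is exactly the duplication described above.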

Storing Database Objects

Most of the use of memcached seems to be storing database objects. This is something your database layer in Java may well be doing for you already! Tools like Hibernate already have caches built in to reduce the number of database hits. Yes, memcached is a replicated cache, but so is SwarmCache. SwarmCache and similar products have the added advantage of existing within your Java VM and thus not producing extra copies of the objects fetched.

Another point I’d like to make here: most objects fetched from the DB in a web application are used to build one or more pages. Why not store the fully rendered page instead, or at least parts of it, in memcached?
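As a rough sketch of what caching rendered output could look like: the FragmentCache class below is hypothetical, and it uses an in-process map purely as a stand-in for the remote store, so no memcached client API is shown or implied. The point is only the shape: look up the rendered text by key, and run the expensive rendering (database hits included) only on a miss.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical fragment cache. The map is a stand-in for the remote store;
// a real setup would read and write rendered strings through a memcached client.
final class FragmentCache {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Returns the cached fragment, rendering (and storing) it only on a miss.
    String render(String fragmentKey, Supplier<String> renderer) {
        return store.computeIfAbsent(fragmentKey, key -> renderer.get());
    }

    // Drops a fragment so it is re-rendered the next time it is requested,
    // for example after the underlying database rows have changed.
    void invalidate(String fragmentKey) {
        store.remove(fragmentKey);
    }
}
```

A rendered fragment is just a string, so it also sidesteps most of the object-copying concern: there is no object graph to rebuild on the way back.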

2GB Memory Limit

Okay, in some ways this is one of memcached’s advantages. However, when you confront it with the fact that a Java VM can quite happily cope with 10GB of memory, why start multiple processes instead of just one? Why burden your scheduler (which is probably a server-class, non-pre-emptive scheduler) with the additional load? Once again, this also relates to the “not inside the VM” problem.

Very Large Hash-Tables

Hash-tables inherently get slower as you put more objects in them. This is generally not such a big deal, but breaking your cache into multiple smaller caches can have a huge impact on performance. Rather, cache each object type in its own cache (which is generally what you want to do anyway). Each memcached instance running in your network is one nice big hash-table. Given that most objects take up only a few KB of memory, and that memcached instances are normally allocated the full 2GB limit, there are often hundreds of thousands of objects stored in each memcached instance.
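One way to keep a separate cache per object type inside the VM is sketched below. TypedCaches is a hypothetical class made up for this example, and whether the split actually buys you speed is something to measure in your own application. A side benefit that does not depend on measurement is that the same ID can be used for different types without the entries colliding.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry holding one small cache per object type.
final class TypedCaches {
    private final Map<Class<?>, Map<Object, Object>> caches = new ConcurrentHashMap<>();

    // Stores the value in the cache that belongs to its type, creating that cache if needed.
    <T> void put(Class<T> type, Object key, T value) {
        caches.computeIfAbsent(type, t -> new ConcurrentHashMap<>()).put(key, value);
    }

    // Looks up the key in the cache for the given type only; returns null on a miss.
    <T> T get(Class<T> type, Object key) {
        Map<Object, Object> cache = caches.get(type);
        return cache == null ? null : type.cast(cache.get(key));
    }

    // Number of entries currently cached for the given type.
    int size(Class<?> type) {
        Map<Object, Object> cache = caches.get(type);
        return cache == null ? 0 : cache.size();
    }
}
```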

So what do I suggest?

I’ve outlined a few problems here, so what sort of solutions would I suggest? First, I would say take a look at pure Java caching solutions such as SwarmCache. SwarmCache itself is fairly old now and hasn’t seen much activity in a while, but it’s stable and well known.

If you really want to ride the memcached wave with Java, I would suggest putting a local cache in front of it with WeakReferences. This means you will never fetch an object from memcached that is already within your VM.
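A minimal sketch of such a front cache follows. WeakLocalCache is a hypothetical class, and remoteFetch is a stand-in for whatever call your memcached client uses to get a value; no particular client API is assumed. Values are held through WeakReferences, so the front cache never keeps an object alive on its own: it only hands back instances that the rest of the application is still using.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical local cache placed in front of a remote lookup.
final class WeakLocalCache<K, V> {
    private final ConcurrentMap<K, WeakReference<V>> local = new ConcurrentHashMap<>();
    // Stand-in for the remote get; an assumption, not a real client signature.
    private final Function<K, V> remoteFetch;

    WeakLocalCache(Function<K, V> remoteFetch) {
        this.remoteFetch = remoteFetch;
    }

    V get(K key) {
        WeakReference<V> ref = local.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value != null) {
            return value; // still alive somewhere in this VM, no network fetch needed
        }
        if (ref != null) {
            local.remove(key, ref); // the referent was collected, drop the stale entry
        }
        value = remoteFetch.apply(key);
        if (value != null) {
            local.put(key, new WeakReference<>(value));
        }
        return value;
    }

    // Forget the local instance, for example after the remote value was updated.
    void invalidate(K key) {
        local.remove(key);
    }
}
```

The sketch keeps things simple in two ways worth knowing about. Two threads that miss at the same moment may both go to the remote cache, and entries whose referent has been collected linger until that key is requested again (a ReferenceQueue could clean them up eagerly). It also does nothing to notice that another machine has changed the value in memcached, so it fits best with objects that do not change once cached.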

A final but important note about caching: I would strongly suggest making your cached objects immutable! This will save you from the possibility of concurrent modification. If you need to change an object, use a Builder that can take an existing object as a template.
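Here is a small sketch of what that might look like. Account and its fields are hypothetical, chosen only to illustrate the pattern: all fields are final, there are no setters, and a "change" is made by seeding a Builder from the existing object and building a new one.

```java
// Hypothetical immutable value object suitable for caching.
final class Account {
    private final long id;
    private final String owner;
    private final long balanceCents;

    private Account(Builder b) {
        this.id = b.id;
        this.owner = b.owner;
        this.balanceCents = b.balanceCents;
    }

    long id() { return id; }
    String owner() { return owner; }
    long balanceCents() { return balanceCents; }

    // Starts from scratch.
    static Builder builder() {
        return new Builder();
    }

    // Starts from an existing object, which is used as a template and never modified.
    static Builder builder(Account template) {
        return new Builder()
                .id(template.id)
                .owner(template.owner)
                .balanceCents(template.balanceCents);
    }

    static final class Builder {
        private long id;
        private String owner;
        private long balanceCents;

        Builder id(long id) { this.id = id; return this; }
        Builder owner(String owner) { this.owner = owner; return this; }
        Builder balanceCents(long balanceCents) { this.balanceCents = balanceCents; return this; }

        Account build() {
            return new Account(this);
        }
    }
}
```

Code that needs a different balance would call something like Account.builder(existing).balanceCents(2500L).build() and put the result into the cache, while any thread still holding the old instance keeps seeing a consistent object.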