MemCached Client for CFMX - alpha version

I've been looking at using memcached in order to speed up a few sites that I've been working on.

What does it do?

Memcached is basically a large RAM cache that is distributed across a network and can be shared between many machines. Many CF applications use the application scope to cache information - including the wonderful CF_Accelerate tag. This works great for single servers (and instances), but once you need to scale beyond that, each server ends up holding its own duplicate copy of the cached data.

Memcached gives you a central place to store this info and lets you move it out of the ColdFusion memory space, using as much memory across the network as you have available. There is plenty more info on the memcached site.
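
As a rough illustration of the application-scope caching mentioned above, here is a minimal sketch. The function loadExpensiveData and the key cachedData are hypothetical names made up for this example, locking is left out for brevity, and it assumes an application has already been defined (via cfapplication or Application.cfc). Every server or instance that runs this keeps its own copy of the data, which is the duplication memcached is meant to avoid.

```cfml
<cfscript>
   //Hypothetical loader standing in for an expensive query or render
   function loadExpensiveData() {
      var aData = ArrayNew(1);
      aData[1] = 'expensive result';
      return aData;
   }

   //Only build the data if this server/instance has not cached it yet
   if (NOT StructKeyExists(application, 'cachedData')) {
      application.cachedData = loadExpensiveData();
   }

   //Every later request on this instance reads the local copy
   aData = application.cachedData;
</cfscript>

<cfdump var="#aData#">
```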

How do I use it?

Install memcached on a machine (on Ubuntu, using apt):
sudo apt-get install memcached
Then run it (the -vv gives extra debugging info):
memcached -vv

Download the Java API for memcached (for CF7 you need the 1.3.2 version unless you have upgraded your JVM)

Copy it into the lib folder for CF and restart CF, e.g.:

sudo cp java_memcached-release_1.3.2.jar /opt/jrun/lib

Get the CFC:

Then try the following test code:

<cfscript>
   //Shut down any existing connection pool first (presumably so the test can be re-run cleanly)
   oMemCached = createObject("component","MemCached").getPoolInstance();
   oMemCached.shutdown();
   
   //List of server:ports to use
   serverList = '127.0.0.1:11211';
   
   //Create a new memcached object - this could be stored in application    
   oMemCached = createObject("component","MemCached").init(serverList=serverList);
      
   //Create some data to store    
   aTemp = ArrayNew(1);
   aTemp[1] = '34343';
   aTemp[2] = '134343';
   aTemp[3] = '234343';
   aTemp[4] = '334343';
   aTemp[5] = '434343';

   //Save to two keys (one with add, one with set)
   oMemCached.add('key',aTemp);
   oMemCached.set('key2',aTemp);      
</cfscript>

<!--- Output the data from cache --->
<cfdump var="#oMemCached.get('key')#">
<cfdump var="#oMemCached.get('key2')#">
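
The test code uses both add and set. In memcached itself, set always stores the value, while add only stores it if the key does not already exist. Below is a small sketch of that difference. It assumes that oMemCached is the object created in the test code above, that memcached is running, and that the alpha CFC passes these calls straight through to memcached (an assumption, not something the CFC documents). The key names demoKey and demoKey2 are made up for illustration.

```cfml
<cfscript>
   //set always writes, so demoKey now holds 'first'
   oMemCached.set('demoKey','first');

   //add only writes when the key is absent, so this should leave 'first' in place
   oMemCached.add('demoKey','second');

   //a second set on the same key overwrites the earlier value
   oMemCached.set('demoKey2','first');
   oMemCached.set('demoKey2','second');
</cfscript>

<cfdump var="#oMemCached.get('demoKey')#">
<cfdump var="#oMemCached.get('demoKey2')#">
```

If the pass-through assumption holds, the first dump should show 'first' and the second should show 'second'.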

The CFC is not production ready, but it is working in some basic tests. One area that is not working (as far as I can tell) is server failover.

Have a play, and all comments welcome.

Comments
Very cool stuff. Surprisingly easy to implement in CF!
# Posted By Adam Fortuna | 5/1/07 11:30 PM
Jealous! I've been looking for an excuse to play with it on some of my Rails apps but haven't had a project that justifies it (yet). Nice one!
# Posted By Tim Lucas | 5/2/07 1:17 AM
Not entirely sure why "each server having duplicate copies of the cached data" is a bad thing, since the primary reason for caching data locally is speed. For example, practically every element that appears on a Yahoo home page is pre-rendered and cached locally on each server.

That's why it can deliver extremely complex personalized pages quickly, since fundamentally it's grabbing a block here and a block there, stacking 'em all together, and pushing the result out the door.

It would seem that judicious use of RAM and file-based caching would beat this out for speed, especially if you have a significant number of cached items per page, each of which needs a separate HTTP request.
# Posted By Michael Long | 5/2/07 3:12 AM
@Adam
It is easy in CF - the only major gotcha is that the Java objects return null quite a lot - which is a pain to deal with in CF as it doesn't handle it nicely.

@Tim
I'm going to be implementing it over the next couple of months - I'll try to record and blog about the load testing performance before and after implementation.

@Michael
Biggest problem I've run into with having duplicate copies of the data is updating the data. It can be tricky to make the data update across all the CF instances if something updates.
Previous site I was working on was using multiple instances of CF on the same server - which gave very good reliability, but it's very wasteful of resources to have 3 copies (1 per instance) of cache data on a server.
With Memcached you can distribute the cached data across an array of servers, which could let you cache the rendered output of every page of a large site - and update it immediately when a change occurs, which is very neat.
# Posted By Mark Lynch | 5/2/07 5:59 AM
Mark - Agree about updating being a pain, as that typically means that I need to be a little more conservative in regard to cache periods than I'd really like. But even an expiration period as short as five minutes can still be a major improvement on something like a home page. Or you can do as I did, and set up a private page that simply rebroadcasts any query string parameters to stub pages on all known servers.

http://myserver.com/private/bc.cfm?sc=12345&ca...

(SC is a security code used to prevent anyone from using the page as a DOS attack vector.)

As such, I don't consider multiple copies wasteful. Any place large enough to need a dedicated content caching server isn't going to blanch over provisioning a web server with an extra gig or two of RAM. Performance-wise you can't beat it. And scalability-wise, you don't have single-point-of-failure issues when the caching server goes belly up.

That said, I do like the possibilities inherent in the technique.
# Posted By Michael Long | 5/2/07 7:05 AM
Michael: How would you perform fine-grained cache invalidation using your rebroadcasts? I'd imagine having the single point of failure as a stable and dedicated memcached server is going to be less problematic than a CF app instance. A library/framework approach also has the benefit of controlling the caching at the CFC level, completely decoupled from HTTP requests.
# Posted By Tim Lucas | 5/2/07 8:55 AM
If a CFApp instance is going to be problematic then--since it's running your web application--you're going to be dealing with a few issues anyway (grin).

And if I reframe your question you'll see it kind of answers itself: how would you, using a caching server, perform cache invalidation? If I can send a memcache server a key to invalidate (say, an article UUID), then I can probably also broadcast the same key to invalidate.

As to caching, I typically do three types: cfquery caches; caches of instantiated objects, ORM definitions, etc, which need to be present and are not good candidates for serialization; and completely rendered HTML at the view/page level (sidebar items, navigation items, home page content, landing page content, etc.).

The latter is the most significant, since for the most part I don't see the point in caching all of the data to do the rendering, and still needing to do the looping and code needed to render it. I'd much prefer to skip that work and cache the final output (all where appropriate, of course). You want sidebar XYZ? Here it is: plop.

Again, I think the technique is cool, and applicable to other areas. I was simply pointing out that the advantage of centralized caching is counter-balanced by the disadvantage of centralized caching. If one of my servers with distributed caching goes down, the other servers load balance and pick up the slack. If a single centralized server goes down... then we have issues.

Obviously, you could set up an application cluster to minimize the side effects of THAT scenario, but that makes the resulting solution even more expensive to implement and maintain.

Anyway, all of this doesn't mean people shouldn't do it. Doesn't mean they can't. Just means they need to weigh the pros and cons and make the right decision.
# Posted By Michael Long | 5/2/07 10:49 AM
@Michael

Further to this conversation - I just started replying to your comments but it got a bit long so I put it as a new post.

http://www.lynchconsulting.com.au/blog/index.cfm/2...

Cheers,
Mark
# Posted By Mark Lynch | 5/8/07 1:10 PM
The work you're doing here is very important. I hope you make this production ready and can ensure that it works with query objects - that's the thing I need really, really badly: memcached queries within CFMX.

Thanks,
Jon
# Posted By Jon | 5/18/07 6:53 AM
@Jon,

Thanks for the encouragement - I'll be getting back into this in earnest in the next few weeks when I start load testing a major application I'm working on. In theory queries should work just fine - but I haven't tested them yet. As a bonus, on Scorpio it should be possible to use this to cache full CFCs, as they are getting full support for serialization.

Please feel free to try it out and let me know any specific problems you encounter.

Cheers,
Mark
# Posted By Mark Lynch | 5/18/07 7:05 AM