Cache Aside Pattern: Part I

Brief Synopsis

The idea here is simple; we’ve been doing it for ages: after retrieving data from a durable store, you store a copy in a high-speed cache to improve the performance of future reads.
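The read path can be sketched in a few lines. This is a minimal illustration, using a plain dict as the cache and a stand-in function for the durable store; all names are hypothetical.

```python
cache = {}

def load_from_store(key):
    # Stand-in for a query against the system of record (e.g. a SQL database).
    return f"value-for-{key}"

def get(key):
    # 1. Try the cache first.
    if key in cache:
        return cache[key]
    # 2. On a miss, read from the durable store...
    value = load_from_store(key)
    # 3. ...and populate the cache so future reads of this key are fast.
    cache[key] = value
    return value
```

The first call to `get` pays the full cost of the store read; subsequent calls for the same key are served from the cache.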

The Upside

You get extremely low-latency reads. In a public cloud environment, you get the added benefit of reducing the workload on relational databases (e.g. SQL Server installed on virtual machines, or SQL Database), which are typically more expensive and harder to scale horizontally.

The Downside

There are a number of challenges. First and foremost (at the risk of being dubbed ‘Captain Obvious’), the pattern only helps with reads, and you only reap the performance benefit after the first time you retrieve the data. You can take steps to proactively load the cache, but for large sets of data this can be impractical.
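Proactive loading (cache warming) can be sketched as a simple preload loop over the known hot keys. Again a dict stands in for the cache and the key list and loader are illustrative assumptions; for large data sets, iterating every key like this is exactly what becomes impractical.

```python
cache = {}

def load_from_store(key):
    # Stand-in for a real database read.
    return f"value-for-{key}"

def warm_cache(hot_keys):
    # Preload the keys we expect to be read soon, so even the
    # first read after startup is a cache hit.
    for key in hot_keys:
        cache[key] = load_from_store(key)

warm_cache(["customer:1", "customer:2"])
```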

Then there is the issue of synchronization. Any time there are writes to the original system of record you are potentially invalidating the cached data. In an ideal world, you would expire the cached data as soon as it becomes invalid.
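One common way to narrow that window is to evict the cached entry as part of the write path: update the system of record first, then remove the stale cache entry so the next read repopulates it. A minimal single-process sketch, with dicts standing in for both stores:

```python
cache = {"user:1": "old-name"}
store = {"user:1": "old-name"}  # stand-in for the durable system of record

def update(key, value):
    # Write to the system of record first...
    store[key] = value
    # ...then invalidate the cached copy; the next read reloads fresh data.
    cache.pop(key, None)

update("user:1", "new-name")
```

Even with this ordering, a concurrent reader can still observe the old cached value between the store write and the eviction, which is the synchronization gap described above.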

In practical terms, there is usually a time gap where the cached data is invalid but hasn’t expired. In the insurance industry, we call it ‘Incurred but not yet reported’ (IBNR; yeah, it’s a real thing, look it up). Basically risk that has actualized but hasn’t quite turned into a claim yet. Think about a car accident prior to either of the drivers notifying their Insurance Agents that the accident has occurred—there is this time lapse where the insurance company has just a little bit less money than they think they do.

The issue of synchronization is a big one, particularly in distributed systems. In distributed systems, we might have dozens of virtual machines handling incoming web requests that could result in a write that invalidates cached data. At the same time, we likely have that cached data stored on dozens of other virtual machines waiting to be read by future incoming web requests. Somebody has to A) recognize that the write operation we are performing affects data that is in the cache, and B) notify all of the virtual machines where the cached data resides to purge that invalid data (and possibly replace it with fresh data).
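That notify-everybody step is typically built on a publish/subscribe channel: each web node keeps a local cache and subscribes to an invalidation topic, and a write publishes the affected key so every node evicts its stale copy. The sketch below models this in-process with hypothetical `InvalidationBus` and `WebNode` classes; a real deployment would use something like Redis pub/sub or a message broker instead.

```python
class InvalidationBus:
    """Stand-in for a pub/sub channel shared by all nodes."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, key):
        # Fan the invalidation out to every subscribed node.
        for callback in self.subscribers:
            callback(key)

class WebNode:
    """One VM handling web requests, with its own local cache."""
    def __init__(self, bus):
        self.local_cache = {}
        bus.subscribe(self.evict)

    def evict(self, key):
        self.local_cache.pop(key, None)

bus = InvalidationBus()
nodes = [WebNode(bus) for _ in range(3)]
for node in nodes:
    node.local_cache["price:42"] = 100  # every node holds a cached copy

# A write to the system of record publishes the key,
# and every node purges its stale entry.
bus.publish("price:42")
```

The fan-out itself is asynchronous in a real system, so there is still a brief window where some nodes serve the stale value—the distributed version of the IBNR gap described earlier.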
