Caching has been around for decades because accessing data quickly and efficiently is critical when building application services. It is effectively a requirement for building scalable microservice applications. Therefore, we will be reviewing three approaches to caching in modern cloud-native applications.

In many use cases, a cache is a relatively small data store built on fast, expensive technology. Its primary benefit is to reduce the time it takes to access frequently queried data that lives on slower, cheaper, and larger data stores. Modern applications can store cached data in many ways, but let’s briefly cover the two most popular approaches:

  1. Storing small amounts of frequently accessed data in local or shared memory (RAM)
  2. Storing larger amounts of data on fast or local disks (SSD)

Caching is considered an optimization along the z-axis of the AKF scaling cube, and while not exclusive, it is one of the most popular approaches to scaling on that axis.

Cache Storage

Let’s consider the first method, caching in memory. This approach was once fairly straightforward, as systems were large and many aspects of a computer program executed on a single machine. The many threads and processes could easily share RAM and cache frequently accessed data locally. As companies have begun transitioning to the cloud, microservices, and serverless, caching has become more of a challenge because service and functional replicas may not be running on the same host, let alone in the same data center.

Systems engineers have adapted to this. We have technologies such as Hazelcast that offer a shared caching model that transcends local restrictions through a secure networking model. There are also technologies such as Redis, which offer hash-based lookups and can run in RAM as well as on fast disks (SSDs) to form a tiered cache.

The second storage medium to consider when caching is a local or shared SSD system, which may be faster than older magnetic disk or tape media. These systems are usually deployed when the content is much larger than RAM can hold. Typically, large images or video media are cached this way.
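To make the idea of tiering concrete, here is a minimal sketch of a two-tier cache: a small in-memory tier that demotes its least recently used entries to files on disk. The class name `TieredCache`, its parameters, and the file layout are assumptions made for illustration; this is not how Redis or Hazelcast implement tiering.

```python
import hashlib
import tempfile
from collections import OrderedDict
from pathlib import Path


class TieredCache:
    """Hypothetical two-tier cache: a small LRU tier in RAM, a larger tier on disk."""

    def __init__(self, disk_dir, ram_capacity=2):
        self.ram = OrderedDict()          # hot tier, least recently used entry first
        self.ram_capacity = ram_capacity  # maximum number of entries kept in RAM
        self.disk_dir = Path(disk_dir)    # cold tier, ideally on a fast local SSD
        self.disk_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any string maps to a safe file name.
        return self.disk_dir / hashlib.sha256(key.encode()).hexdigest()

    def put(self, key, value: bytes):
        self.ram[key] = value
        self.ram.move_to_end(key)  # mark as most recently used
        if len(self.ram) > self.ram_capacity:
            # Demote the least recently used entry to the disk tier.
            old_key, old_value = self.ram.popitem(last=False)
            self._path(old_key).write_bytes(old_value)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        path = self._path(key)
        if path.exists():
            value = path.read_bytes()
            path.unlink()          # promote the entry back into RAM
            self.put(key, value)
            return value
        return None                # not cached in either tier


if __name__ == "__main__":
    demo = TieredCache(tempfile.mkdtemp(), ram_capacity=2)
    for name in ("a", "b", "c"):
        demo.put(name, name.encode())
    print(list(demo.ram))  # "a" has been demoted to disk
    print(demo.get("a"))   # read from disk and promoted back to RAM
```

A real deployment would also need size limits for the disk tier and some form of expiry, both of which are left out here to keep the sketch short.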

Cache Warming

The entire premise of caching is that disk lookups are slow, especially for large databases that hold orders of magnitude more data than can economically be stored in fast memory (RAM). For example, a stock trading company may keep the most recent transactions in RAM, a process called “cache warming”. The engineers at this company know that their customers will be accessing this data frequently, so they push the latest transactions into the cache system as they occur. This is a more proactive approach than waiting for a user to access data before storing it in the cache. That reactive approach is the most popular method of caching, and we’ll discuss it next.
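The stock trading example can be sketched roughly as follows. Every name here (`TransactionStore`, `RecentTransactionsCache`, `record_transaction`) is hypothetical, and the "slow" store is just a list standing in for a real database.

```python
from collections import deque


class TransactionStore:
    """Stand-in for the slow system of record (hypothetical)."""

    def __init__(self):
        self.rows = []

    def save(self, txn):
        self.rows.append(txn)


class RecentTransactionsCache:
    """Keeps only the newest transactions in memory."""

    def __init__(self, max_items=1000):
        # A bounded deque discards the oldest entry once it is full.
        self.recent = deque(maxlen=max_items)

    def warm(self, txn):
        self.recent.append(txn)

    def latest(self, n):
        if n <= 0:
            return []
        return list(self.recent)[-n:]


def record_transaction(txn, store, cache):
    store.save(txn)  # durable write to the system of record
    cache.warm(txn)  # proactively push into the cache; no reader has asked yet


if __name__ == "__main__":
    store, cache = TransactionStore(), RecentTransactionsCache(max_items=3)
    for i in range(5):
        record_transaction({"id": i}, store, cache)
    print(cache.latest(2))
```

The point of the sketch is the ordering inside `record_transaction`: the cache is filled at write time, so the first reader already finds the data in memory.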

Cache Hit or Miss?

Most cache implementations are a variation of the approach where data is accessed through normal methods, whether that be a database, storage bucket, or another implementation such as an API. The caching system acts as an intermediary: responses are intercepted and stored in memory, where they can be accessed again with much lower latency than from their slower counterparts. When a request is made, the caching system checks to see if it has the appropriate response. If it does not, this is called a cache miss. In this case, the request is passed along to the slower system to be fulfilled, and the response is stored in memory. Once this data has been stored in the cache and is accessed again, it is called a cache hit.
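The hit and miss flow described above can be sketched in a few lines. The class name `ReadThroughCache` and the `fetch` callback are assumptions for illustration; `fetch` stands in for whatever slower system (database, bucket, or API) actually holds the data.

```python
class ReadThroughCache:
    """Hypothetical in-memory cache placed in front of a slower lookup function."""

    def __init__(self, fetch):
        self.fetch = fetch   # callable that queries the slow backend
        self.entries = {}    # cached responses, keyed by request
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.hits += 1            # cache hit: answer from memory
            return self.entries[key]
        self.misses += 1              # cache miss: fall through to the backend
        value = self.fetch(key)
        self.entries[key] = value     # store the response for next time
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = ReadThroughCache(lambda key: key.upper())  # pretend this is slow
    cache.get("aapl")  # miss, goes to the backend
    cache.get("aapl")  # hit, served from memory
    print(cache.hits, cache.misses, cache.hit_ratio())
```

This sketch never evicts or expires entries, so it only suits data that does not change often. A production cache would add a size bound and a time-to-live.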

When deciding what to cache, and when, it’s important to take into consideration the latency of various actions on computer hardware and networking. There are some great resources available on the interwebz to help inform these decisions, but as a general rule of thumb you’ll want to cache anything that is accessed frequently, does not change often, and is on media that is slower than RAM.

Measuring Cache Effectiveness and Costs

Caching is always a good idea when hit rates are high and the overhead costs of the cache are outweighed by the cost of revenue lost to unhappy customers. Paying close attention to these costs used to be incredibly important because RAM was very expensive a decade or two ago, and the industry had complex formulas to determine budgets.

Today, RAM is cheap. Use a cache whenever you’ve determined that the cost of cache look-ups and updates is small or near zero; the performance benefit is then a function of the cache hit ratio. Let’s assume an average lookup time of 10 seconds and a hit ratio of 50%. With instant cache look-ups, half of the requests cost nothing and the other half still take 10 seconds, so the average falls by 50 percent to 5 seconds. Even in less-than-ideal scenarios (i.e., the real world), we’re going to see a huge performance increase by implementing a cache.
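The arithmetic above generalizes to a simple formula. The sketch below assumes a model in which every request pays the cache lookup cost and only misses additionally pay the backend cost. The function name and the `cache_time` parameter are illustrative and not taken from any library.

```python
def average_lookup_time(backend_time, hit_ratio, cache_time=0.0):
    """Expected time per request with a cache in front of a slower backend.

    Assumed model: every request checks the cache (cache_time); a miss
    then also pays the full backend_time.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_time + (1.0 - hit_ratio) * backend_time


if __name__ == "__main__":
    # The example from the text: 10 second lookups, 50% hit ratio, instant cache.
    print(average_lookup_time(10, 0.5))
    # A less ideal case: the cache itself takes 10 ms per lookup.
    print(average_lookup_time(10, 0.5, cache_time=0.01))
```

Under this model, the cache only hurts when `cache_time` exceeds `hit_ratio * backend_time`, which is why a high hit ratio matters more than shaving time off the cache lookup itself.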