A cache stampede can occur when web servers face very high loads. Caching is vital for keeping backend services efficient, but if the caching mechanism itself comes under intense strain, performance degrades and services can even fail. There are several strategies for mitigating cache stampedes, each with its own limitations. So for this article we’ve teamed up with Jointly to discover their innovative approach to tackling cache stampedes.
Cache Stampede: what is it and how to prevent it?
High performance web apps and services wouldn’t survive without caching. Caching prevents unnecessary re-computation of data and, when functioning properly, allows far greater traffic levels than would otherwise be the case. HTML, database queries, objects and other types of data can all be cached, often using dedicated frameworks like memcached.
Cached data needs periodic regeneration to stay up-to-date. But if too many cache regeneration requests occur at the same time, backend servers may be overwhelmed. This problem is known as a cache stampede. To understand it, let’s consider the basic mechanism of a cache service. If requested data exists in the cache, a ‘cache hit’ occurs. If not, a ‘cache miss’ is registered and the request is passed on to a backend service to recompute the data. This all works well until we have high levels of cache misses at the same time.
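As a minimal sketch of that cache-aside flow, assuming an in-memory Map standing in for a real cache such as memcached and a hypothetical recompute() callback standing in for the backend call:

// Minimal cache-aside sketch; `cache` and `recompute` are illustrative stand-ins.
const cache = new Map(); // key -> { value, expiresAt }

async function getValue(key, recompute, ttlMs = 60000) {
  const entry = cache.get(key);
  if (entry && entry.expiresAt > Date.now()) {
    return entry.value; // cache hit
  }
  const value = await recompute(key); // cache miss: ask the backend to recompute
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

When many requests hit the same expired key at once, each of them falls through to recompute(key) at the same time, and that simultaneous fall-through is exactly what turns into a stampede.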
A cache stampede involves multiple concurrent attempts to regenerate and write the same piece of data to the cache. It gets worse as further requests pile into the already overloaded backend, leading to a self-reinforcing cascade of more requests and slower responses. This is a race condition: multiple threads compete for a shared resource, and each one slows down the system’s capacity to produce it. The results are typically non-deterministic and can sometimes lead to the complete collapse of a system.
General strategies for avoiding cache stampedes and their limitations
Different strategies can be used to mitigate the problem of cache stampedes. Let’s look at the three main approaches.
External recomputation
The strategy of external recomputation does away with the problem of a request surge by regenerating caches in an independent process. The downside of this, however, is that it demands greater resources to maintain the separate updating process. By regenerating caches whether or not they are needed, it can be wasteful. Plus, determining which resources are to be recomputed is usually complex, requiring awareness of the logic of the system being mirrored. This violates the engineering principle of separation of concerns.
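A rough sketch of the idea, assuming a fixed list of keys, a hypothetical fetchFromBackend() helper and an in-memory cache:

const cache = new Map();
const keysToRefresh = ['homepage', 'top-products']; // must be known in advance

// Stand-in for the real backend computation.
async function fetchFromBackend(key) {
  return `fresh value for ${key} at ${new Date().toISOString()}`;
}

// Independent refresh loop: regenerates every key on a timer,
// whether or not anyone actually requested it.
setInterval(async () => {
  for (const key of keysToRefresh) {
    cache.set(key, await fetchFromBackend(key));
  }
}, 30000);

// Request handlers only ever read the cache; they never trigger recomputation.
function handleRequest(key) {
  return cache.get(key);
}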
Probabilistic early recomputation or expiration
Probabilistic early recomputation attempts to spread load by bringing forward some cache regeneration requests probabilistically. The probability increases as the actual expiration time approaches, and because each request makes its decision independently, regeneration requests should be better distributed over time. It is not a guaranteed solution, however, and the probability parameters can be difficult to tune. The pattern works well when you can predict where and when load will be directed at particular resources; otherwise it can waste resources on recomputations nobody needed.
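One common formulation of this idea, often referred to as “XFetch” in the literature on probabilistic early expiration, recomputes early with a probability that rises sharply as the expiry approaches, weighted by how long the recomputation usually takes. The sketch below is illustrative only; the entry shape and the beta tuning parameter are assumptions:

// Probabilistic early recomputation (a sketch of the "XFetch" rule).
// An entry remembers how long it took to compute (deltaMs) and when it expires.
function shouldRecomputeEarly(entry, beta = 1.0) {
  // -Math.log(Math.random()) is an exponentially distributed value >= 0, so the
  // chance of recomputing early rises sharply as Date.now() nears entry.expiresAt.
  const gap = entry.deltaMs * beta * -Math.log(Math.random());
  return Date.now() + gap >= entry.expiresAt; // always true once actually expired
}

async function getWithEarlyRecompute(cache, key, recompute, ttlMs = 60000) {
  const entry = cache.get(key);
  if (!entry || shouldRecomputeEarly(entry)) {
    const start = Date.now();
    const value = await recompute(key);
    cache.set(key, {
      value,
      deltaMs: Date.now() - start,        // remember how expensive this key is
      expiresAt: Date.now() + ttlMs,
    });
    return value;
  }
  return entry.value;
}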
Locks
Finally, locks provide a means of limiting the number of regeneration requests for any given resource. A lock is acquired by the first regeneration request following a cache miss, blocking all other regeneration attempts for that resource until it completes. But the question remains of what to do with those remaining requests. They may simply be queued. Alternatively, a stale cached item could be served until the fresh one is available. Neither outcome is ideal, and fault tolerance is poor: if the regeneration holding the lock fails, the timing of the next successful request is unpredictable.
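Sketched below with a simple in-process lock; a real deployment would typically use a distributed lock in something like Redis or memcached, and the helpers here are purely illustrative:

// Lock-based regeneration with stale serving (illustrative, in-process only).
const cache = new Map(); // key -> { value, expiresAt }
const locks = new Set(); // keys currently being regenerated

async function getWithLock(key, recompute, ttlMs = 60000) {
  const entry = cache.get(key);
  if (entry && entry.expiresAt > Date.now()) {
    return entry.value; // fresh hit
  }
  if (locks.has(key)) {
    if (entry) {
      return entry.value; // serve the stale value while regeneration is in progress
    }
    // Nothing cached yet: wait briefly and retry (the awkward "queued" case).
    await new Promise((resolve) => setTimeout(resolve, 100));
    return getWithLock(key, recompute, ttlMs);
  }
  locks.add(key); // first miss takes the lock and regenerates
  try {
    const value = await recompute(key);
    cache.set(key, { value, expiresAt: Date.now() + ttlMs });
    return value;
  } finally {
    locks.delete(key); // release even if recompute() failed
  }
}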
Asynchronous batching: an innovative solution
Given these partially effective, but suboptimal solutions to cache stampedes, what can be done to maintain high-performance caching and preserve backend stability? An innovative solution, chosen by Jointly for their systems, is asynchronous batching.
What is asynchronous batching?
Each of the above strategies for cache management has particular problems. With both external recomputation and probabilistic early recomputation, cache regeneration suffers from insufficient control or from unpredictability. Locks give greater control, but they are inflexible and can leave requests queued unnecessarily. So is there a way to restrict cache recomputation without holding everything up? That’s just what asynchronous batching is designed to achieve.
Asynchronous batching uses a construct known as a promise, available in many modern programming languages, including Python, Dart, C++ and JavaScript. A promise object represents the eventual outcome of an asynchronous operation, be it success or failure. In the case of asynchronous batching, promises can be stored in a temporary cache. Multiple clients awaiting the fulfilment of the same promise then all receive the same result as soon as it settles, without issuing multiple requests to the regeneration backend.
Why did Jointly choose this approach?
Jointly’s decision to adopt asynchronous batching as the solution to cache stampedes was based primarily on efficiency and stability. The technique allows developers to attach callbacks to a promise rather than passing them into a function. Promise chains can then be created that perform better and are more stable than old-fashioned, error-prone callback pyramids. Code is also simpler, since error handling is added only once at the end of the .then chain (typically with a .catch), covering exceptions from all stages. This decoupling of results from the processes that produce them also improves flexibility and reduces latency.
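For example, a chain like the following handles a failure from any stage in a single place. The stage functions below are hypothetical stand-ins for real work such as querying a database and rendering a template:

// Hypothetical stages of a cache regeneration pipeline; each returns a promise.
const loadFromDatabase = (key) => Promise.resolve({ key, rows: [1, 2, 3] });
const renderHtml = (data) => Promise.resolve('<ul>' + data.rows.join(', ') + '</ul>');
const storeInCache = (html) => Promise.resolve(html);

loadFromDatabase('top-products')
  .then(renderHtml)
  .then(storeInCache)
  .then((html) => console.log('cached:', html))
  .catch((err) => {
    // One handler covers an error thrown in any stage of the chain.
    console.error('regeneration failed:', err);
  });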
How to implement it in a caching library
Jointly’s implementation of asynchronous batching uses Node.js. Their library is fully open source and can be consulted here. However, the basic mechanism can be given in outline relatively simply.
Let’s look at a simple example to see the algorithm at work. The code prints “42” ten times, all during the same tick of the event loop in which the timeout fires, because every call receives the same promise.
The code below is a simple example of asynchronous batching. The function getResult() returns a promise that resolves to the value 42 after a delay of 1.5 seconds. The first time getResult() is called, it creates a new promise, stores it in the myPromise variable and returns it. Subsequent calls to getResult() return that same promise without creating a new one.
In the async IIFE (immediately invoked function expression), a loop runs 10 times with a 100ms delay between iterations. Each iteration calls getResult() and logs the result to the console once it resolves. Because every call to getResult() returns the same promise, all ten resolve together without the underlying function execution and resource allocation being performed 10 times. This makes more efficient use of resources and avoids overloading the system with simultaneous requests.
let myPromise = null;

function getResult() {
  // Reuse the in-flight promise if one already exists.
  if (myPromise) {
    return myPromise;
  }
  // Otherwise start the (simulated) expensive computation exactly once.
  myPromise = new Promise((resolve, reject) => {
    setTimeout(() => {
      resolve(42);
    }, 1500);
  });
  return myPromise;
}

(async () => {
  for (let i = 0; i < 10; i++) {
    // Stagger the ten "requests" by 100ms each.
    await new Promise((resolve) => setTimeout(resolve, 100));
    getResult().then((result) => {
      console.log(result); // all ten log 42 together when the single promise resolves
    });
  }
})();
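In a real caching layer you would not keep the promise around forever: once it settles, you would typically store the resolved value with a TTL and drop the in-flight promise so that a later cache miss can trigger a fresh regeneration. The variation below is a hedged sketch of that idea, not part of Jointly’s library:

// Variation: drop the in-flight promise once it settles, so the value can be
// regenerated on a later cache miss (TTL handling is left out for brevity).
let inFlight = null;

function getResultOnce() {
  if (inFlight) {
    return inFlight; // batch concurrent callers onto the same promise
  }
  inFlight = new Promise((resolve) => {
    setTimeout(() => resolve(42), 1500); // simulated expensive computation
  }).finally(() => {
    inFlight = null; // the next miss after this point starts a new computation
  });
  return inFlight;
}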
More resources and best practices for asynchronous batching
For many, learning about cache stampedes has come the hard way. Even some of the biggest web infrastructures have suffered. For example, Facebook suffered one of its biggest outages to date in September 2010 due to a cache stampede. The problem was caused by a configuration update that caused all clients to query the database cluster for a refreshed value at once. The cluster was quickly overwhelmed.
A lot has been learned since Facebook’s mishap. So what can we take from it? Let’s go through a few lessons:
- Add more caches (Russian doll caching). Given that excess caching requests are the source of the problem, this may sound odd, but a cache hierarchy can add stability. Just as your CPU uses L1 and L2 caches, so can your applications. With a two-level caching strategy, an expired L2 value may still be served from your L1 cache, helping to avert a stampede. However, the approach is not foolproof.
- Try early recomputation. We’ve seen above how probabilistic early recomputation can help distribute cache regeneration requests over time. Again, it’s not without its problems, but this approach has been adopted by the Internet Archive after it suffered its own cache stampede outage in 2016.
- Use Promises instead of Locks. Locks can prevent multiple clients from accessing the same resource at the same time. But how do you notify clients when the cache is renewed? As we have seen, leveraging promises allows you to process cached resources asynchronously and automatically notify clients once ready.
- Use a circuit breaker. Because a cache stampede is a self-reinforcing problem, it can be hard to recover services once it has started. Another answer is to use the circuit-breaker pattern in your code. This is effectively a kill switch that trips once a certain threshold of failures is reached. However, this solution adds significant complexity and can itself introduce a new set of issues; a minimal sketch follows this list.
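Here is a minimal sketch of the circuit-breaker idea; the thresholds, reset window and recompute callback are all illustrative assumptions:

// Minimal circuit breaker around a recompute function (illustrative only).
function circuitBreaker(recompute, { maxFailures = 5, resetMs = 30000 } = {}) {
  let failures = 0;
  let openedAt = 0;

  return async function guarded(key) {
    const open = failures >= maxFailures && Date.now() - openedAt < resetMs;
    if (open) {
      // Fail fast instead of piling more load onto a struggling backend.
      throw new Error('circuit open: backend is being given time to recover');
    }
    try {
      const value = await recompute(key);
      failures = 0; // a success closes the circuit again
      return value;
    } catch (err) {
      failures += 1;
      if (failures >= maxFailures) openedAt = Date.now();
      throw err;
    }
  };
}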
Conclusions: why it’s key to tackle cache stampedes
Cache stampedes arise from a relatively simple problem but can soon spiral out of control, devastating your service infrastructure. Without guards and mitigating measures in place, you can never be sure that a traffic spike, fault or outage won’t trigger a cascade of requests to your backend. At best you’ll take a performance hit; at worst, your sites, apps or services could be brought down altogether. And cache stampedes can be hard to recover from, so it’s wise to put measures in place to prevent or mitigate them before they start.