Cache replacement policies explained
Learn how cache replacement policies work, why they matter, and how common algorithms like LRU, LFU, FIFO, MRU, and advanced adaptive methods impact performance in modern systems.
Cache replacement is the policy and algorithm that manages which items to evict from a cache when the cache is full. A cache is a high-speed storage layer, either hardware or software-based, that holds a subset of data so future data requests get served faster than retrieving them again from the original data store.
However, because caches have limited capacity, they cannot hold everything. When new data needs to be cached and the cache is full, the system must decide which existing cache entry to remove. This decision is governed by the cache replacement policy. A well-designed replacement policy aims to retain data that will be reused soon, so future requests for it can be served from the cache (cache hits), while discarding data that is less likely to be needed again, improving the cache’s hit rate and overall effectiveness.
Developing an optimal cache replacement strategy is challenging because it requires predicting which data will be needed in the future. In theory, the best policy, called Bélády’s algorithm, would evict the item that won’t be used for the longest time in the future, but in practice, future accesses aren’t known. Instead, cache replacement policies rely on heuristics that approximate this ideal.
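To make the ideal concrete, here is a minimal sketch of Bélády’s clairvoyant eviction in Python. It assumes the entire future access sequence is known in advance, which no real system has, and the function name and structure are purely illustrative.

```python
def belady_evict(cached_keys, future_accesses):
    """Pick the cached key whose next use lies furthest in the future.

    cached_keys: the keys currently held in the cache.
    future_accesses: the (hypothetically known) upcoming access sequence.
    """
    def next_use(key):
        try:
            return future_accesses.index(key)  # position of the key's next access
        except ValueError:
            return float("inf")                # never needed again: the ideal victim

    return max(cached_keys, key=next_use)

# Example: "b" is the cached item needed furthest in the future, so it is evicted.
print(belady_evict({"a", "b", "c"}, ["a", "c", "a", "b"]))  # -> b
```

Every practical policy described below approximates this behavior using only information about past accesses.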
The need for cache replacement policies
Without an effective replacement policy, a cache’s performance degrades once it fills up, because there is no principled way to make room for new data. Cache replacement policies provide a systematic way to decide which entries to evict, making the cache more efficient. The goal is to keep in cache those items most likely to be requested again soon, improving the likelihood of future hits.
Cache replacement policies take advantage of the following patterns that typical workloads exhibit across the memory hierarchy:
Temporal locality: recently retrieved items tend to be retrieved again
Spatial locality: items near recently retrieved data are more likely to be retrieved
A good replacement strategy takes advantage of these patterns. For example, if temporal locality is more common, a policy favoring recently used items will work well. On the other hand, if certain items are retrieved often over a long period, a strategy that factors in access frequency might perform better.
Cache replacement policies also help prevent cache thrashing. Thrashing occurs when a cache is too small for the working data set, causing constant evictions and misses with little reuse. In such cases, an overly simplistic or poorly chosen policy might evict an item that’s still being used, leading to an immediate cache miss when that item is requested again. Replacement algorithms try to keep this from happening by making informed guesses about usage patterns.
Ultimately, the need for cache replacement policies stems from the constraint of finite cache size combined with the unpredictable nature of future access patterns. By applying heuristics based on past accesses, such as recency and frequency, these policies aim to keep the cache content as relevant as possible over time.
Common cache replacement policies
Computing systems use a variety of cache replacement policies, each with its own best-guess approach to evicting “least needed” data. Here are some of the most widely used algorithms and how they work:
Least recently used (LRU)
Least recently used (LRU) is one of the most common cache replacement policies. LRU operates on the assumption that data that hasn’t been used for the longest time is least likely to be needed soon. In practice, an LRU cache evicts the item that was retrieved longest ago, or the oldest “recently used” entry, when space is needed. Implementing LRU typically involves tracking access order, often with a linked list or similar structure where each new access moves an item to the “most recently used” end.
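As a concrete illustration, here is a minimal LRU sketch in Python that uses an OrderedDict in place of a hand-maintained linked list; the class and method names are illustrative rather than taken from any particular library.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: the front of the ordered dict is the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                       # cache miss
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```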
LRU tends to work well when the workload exhibits strong temporal locality, meaning recently retrieved items are more likely to be retrieved again, which is true for many real-world access patterns. However, LRU doesn’t work well with certain patterns. For example, in a scanning workload that cyclically goes through a dataset larger than the cache, LRU constantly evicts items that will be used again soon. Despite its shortcomings in some cases, LRU remains popular due to its simplicity and generally good performance for a wide range of workloads.
Least frequently used (LFU)
Least frequently used (LFU) eviction removes the item that has been retrieved the fewest times. The intuition behind LFU is that items that haven’t been retrieved often in the past are less likely to be needed in the future. LFU is especially useful in scenarios where certain data items are “hotter,” or retrieved repeatedly, and others are rarely touched. By counting accesses, LFU keeps the hot items in cache and evicts the cold ones.
One challenge with a basic LFU approach is that it may hold on to items that were retrieved heavily a long time ago but are no longer relevant. To address this, many LFU implementations include an aging mechanism or time decay factor, which gradually reduces the recorded frequencies of older accesses so the cache doesn’t get stuck holding items that were once popular but aren’t anymore.
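The sketch below shows one simple way to combine frequency counting with aging; periodically halving all counters is an illustrative choice of decay, not a standard implementation, and every name here is made up for the example.

```python
class LFUCacheWithDecay:
    """Minimal LFU cache that periodically halves its counters so old popularity fades."""

    def __init__(self, capacity, decay_every=1000):
        self.capacity = capacity
        self.decay_every = decay_every   # how many accesses between decay passes
        self.values = {}
        self.counts = {}
        self.accesses = 0

    def _maybe_decay(self):
        self.accesses += 1
        if self.accesses % self.decay_every == 0:
            for key in self.counts:
                self.counts[key] //= 2   # aging: once-hot items gradually lose weight

    def get(self, key):
        self._maybe_decay()
        if key not in self.values:
            return None                  # cache miss
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        self._maybe_decay()
        if key not in self.values and len(self.values) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)  # least frequently used
            del self.values[victim], self.counts[victim]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```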
LFU generally shines in workloads with stable popularity distribution, where some items are consistently far more popular than others. But it can be slower to adapt than LRU when access patterns shift, because an item’s past frequency might not quickly reflect a recent change in usage.
First-in, first-out (FIFO)
First-in, first-out (FIFO) is a simple policy that evicts the oldest item in the cache, or the one that was added first, without any regard to how often or how recently it was retrieved. Conceptually, the cache treats entries like a queue, where the longest-resident item is evicted first.
FIFO is efficient to implement because it uses insertion order and doesn’t require complex usage tracking, but sometimes it doesn’t perform well because it doesn’t consider how recently the item was retrieved. For instance, an item that was loaded early and is still frequently used might be evicted under FIFO simply because it’s “old.”
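A minimal FIFO cache needs nothing more than a record of insertion order, as in this illustrative sketch:

```python
from collections import deque

class FIFOCache:
    """Minimal FIFO cache: evicts whichever entry has been resident the longest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.order = deque()                    # keys in insertion order, oldest first

    def get(self, key):
        return self.values.get(key)             # lookups never change eviction order

    def put(self, key, value):
        if key not in self.values:
            if len(self.values) >= self.capacity:
                oldest = self.order.popleft()   # evict the first-inserted entry
                del self.values[oldest]
            self.order.append(key)
        self.values[key] = value
```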
FIFO can be appropriate in some situations, such as when cache entries naturally become stale after a fixed period or when it is combined with time-to-live (TTL) expiration. However, by itself, FIFO is generally not as good as LRU or LFU at taking advantage of access patterns. In fact, enlarging a FIFO cache can sometimes increase the miss rate, a counterintuitive effect known as Bélády’s anomaly.
Most recently used (MRU)
Most recently used (MRU) is essentially the opposite of LRU: It evicts the item that was retrieved most recently. At first glance, this sounds counterintuitive. Why remove something that was just used? MRU is actually beneficial in certain special cases. Consider a workload where whenever an item is retrieved, it’s needed only once and not again for a long time, such as in a scan or one-pass workload. In these cases, the most recently used item might indeed be the one least likely to be reused soon, because the system is moving on to other data. MRU capitalizes on this pattern by evicting the newest item, making space for data that is coming next.
While MRU is not commonly used as a general-purpose policy, it is effective for specific access patterns or when combined with other policies. Some database systems and two-level cache designs use an MRU section to handle bursty one-off accesses so they don’t disrupt an LRU-managed working set.
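Compared with the LRU sketch above, an MRU cache changes only which end of the recency order it evicts from. The sketch below is again illustrative rather than drawn from any real system.

```python
from collections import OrderedDict

class MRUCache:
    """Minimal MRU cache: evicts the entry that was touched most recently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)          # track recency exactly as LRU does
        return self.entries[key]

    def put(self, key, value):
        if key not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=True)    # but evict the *most* recently used entry
        self.entries[key] = value
        self.entries.move_to_end(key)
```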
Random replacement
Random replacement evicts a randomly chosen cache entry. This approach does not use any usage pattern information. The main advantage of random eviction is its simplicity and low overhead; there is no need to maintain any data structures to track access history or frequencies.
Surprisingly, random replacement works reasonably well in practice, sometimes nearly as well as LRU for certain workloads. It avoids the pathological worst-case scenarios that sometimes plague more structured algorithms. For example, it won’t consistently make the “wrong” choice for a repeating access pattern, because its choices are nondeterministic.
However, purely random eviction also means you might remove a popular item and leave a colder item in cache by chance. In large caches with uniform access distributions, random may work, but generally it’s used as a baseline or fallback rather than the primary policy.
Adaptive and advanced policies (ARC and others)
Over the years, researchers have developed more sophisticated algorithms to overcome the limitations of the basic policies. One notable example is adaptive replacement cache (ARC), which dynamically adjusts to workloads by keeping two lists: one tracking recent accesses and another tracking frequent accesses. ARC continuously balances between LRU and LFU behavior based on which list is experiencing more hits, adapting to whether the current workload favors recency or frequency. It has been shown to outperform standard LRU on a variety of access patterns and is scan-resistant, meaning it won’t let one-time full-cache scans pollute the cache for long.
Another advanced approach is multi-queue algorithms, such as 2Q, which use multiple queues to filter transient items from long-term ones. CLOCK (and its variants), widely used in operating systems, approximates LRU with less overhead by arranging entries in a circular buffer and giving each a reference bit.
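The core CLOCK mechanics can be sketched as a fixed-size array of slots, each carrying a reference bit. This is a simplified illustration, not production page-replacement code, and the names are ours.

```python
class ClockCache:
    """Simplified CLOCK: a circular array of slots, each with a reference bit."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # each slot holds (key, value, ref_bit) or None
        self.index = {}                 # key -> slot position
        self.hand = 0

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        k, v, _ = self.slots[pos]
        self.slots[pos] = (k, v, 1)     # set the reference bit on every hit
        return v

    def put(self, key, value):
        if key in self.index:
            self.slots[self.index[key]] = (key, value, 1)
            return
        # Sweep the hand, clearing reference bits, until an evictable slot is found.
        while True:
            slot = self.slots[self.hand]
            if slot is None or slot[2] == 0:
                if slot is not None:
                    del self.index[slot[0]]                # evict the victim
                self.slots[self.hand] = (key, value, 1)
                self.index[key] = self.hand
                self.hand = (self.hand + 1) % len(self.slots)
                return
            self.slots[self.hand] = (slot[0], slot[1], 0)  # clear bit: second chance
            self.hand = (self.hand + 1) % len(self.slots)
```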
More recent implementations, such as the TinyLFU strategy used in caches such as Caffeine for Java, combine an LFU admission policy with an LRU eviction policy in a two-tiered approach. This yields a high hit rate by admitting only items likely to be frequently used, while still quickly evicting items that fall out of use.
Each of these advanced policies aims to strike a better balance for particular access patterns or to make better replacement decisions without excessive bookkeeping overhead. The choice of algorithm in practice often depends on the specific workload characteristics and the environment in which the cache operates, such as hardware versus software cache or single node versus distributed.
Choosing the right replacement policy
No one cache replacement policy is best in all cases; each has tradeoffs. The effectiveness of a given policy depends on workload characteristics and system constraints. Here are factors to consider when choosing or tuning a cache replacement strategy:
Access pattern and locality: Workloads with strong temporal locality, where recently retrieved items are more likely to repeat soon, tend to favor recency-based policies such as LRU. In contrast, workloads with skewed popularity, where a few items account for most accesses over time, might benefit from frequency-based strategies such as LFU.
If the access pattern involves long sequential scans or one-time use of data, such as streaming through a large dataset, a basic LRU might suffer as it would evict everything only to have misses on the next pass. Algorithms that detect and adapt to scans, such as ARC or 2Q, or even MRU elements, handle this better.
Cache size relative to working set: The size of the cache and how much of the overall dataset fits in it influences which policy performs well. If the cache is small relative to the working set, no policy avoids a high miss rate, but some might handle it better. For example, random eviction might do as well as any policy in a thrashing scenario where useful data isn’t kept anyway.
On the other hand, if the cache is large enough to hold most of the hot data, a policy that keeps the right items, such as LRU or ARC, works better.
Overhead and complexity: Different policies carry different amounts of metadata and computation overhead. LRU typically requires updating data structures on every retrieval by moving a list node to the front, which has some cost, while LFU requires updating frequency counters on each retrieval. More complex schemes, such as ARC, maintain multiple lists and may do additional bookkeeping per operation. In high-throughput systems, this per-access bookkeeping can become significant enough to erode the time the cache saves.
Such systems might use simpler approximations instead, such as segmented LRU or CLOCK-Pro, which reduce overhead by not updating metadata on every access. Redis, an in-memory cache, uses an approximate LRU that samples a subset of entries to decide on eviction, trading some accuracy for efficiency. When choosing a policy, make sure the algorithm’s overhead doesn’t outweigh its caching gains.
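The sampling idea can be illustrated with a toy sketch like the following, which approximates LRU by evicting the oldest entry among a small random sample. It shows the general approach only and is not Redis’s actual implementation; the function and parameter names are assumptions.

```python
import random

def evict_by_sampling(cache, last_access, sample_size=5):
    """Evict the least recently used key among a small random sample.

    cache: dict mapping key -> value.
    last_access: dict mapping key -> timestamp of the key's last access.
    """
    sample = random.sample(list(cache), min(sample_size, len(cache)))
    victim = min(sample, key=lambda k: last_access[k])  # oldest access in the sample
    del cache[victim], last_access[victim]
    return victim
```

Sampling avoids maintaining a global recency ordering; the trade-off is that the evicted key is only approximately the least recently used one.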
Adaptability: Workloads aren’t static; patterns can change over time, such as peak versus off-peak, or introduction of new data types. A replacement policy that adapts to shifts is valuable. For example, ARC’s ability to adjust to recency versus frequency is a form of adaptability. LFU with aging adapts to gradually changing popularities.
In contrast, a static policy such as LFU without aging might “lock” the cache with items that used to be popular but aren’t anymore. When a system’s workload goes through phase changes, adaptive or hybrid policies recover more quickly.
Data priority and domain knowledge: Sometimes, the importance of cache entries isn’t uniform. It might be important to keep certain data cached for performance or business reasons, regardless of recency or frequency. For example, in web caching, you might always prefer to keep certain assets or metadata cached.
Many real-world caching systems allow tuning or hints, such as pinning items or weighting some items as more valuable. Replacement policies incorporate such priorities by not evicting pinned items or by using a weighted cost-benefit analysis of an item’s value versus how much space it occupies. This strays into cache policy more broadly, beyond purely the replacement algorithm, but it’s an important practical consideration.
Ultimately, choosing the right replacement policy often involves analyzing access logs or simulating with representative workloads. In practice, many systems also provide configurable policies or parameters. For example, a cache might allow switching between LRU and LFU or tweaking the “aging” rate for LFU so that architects can adjust behavior to match their situation. In summary, the best policy is context-dependent: it emerges from understanding the data access patterns and performance goals of the application.
Limitations and challenges of caching
While caching and replacement policies may improve average performance, they also introduce their own challenges. One issue is unpredictable latency due to cache misses. If a system relies on a cache for speed, a cache miss responds more slowly because the data must be fetched from a slower backend store. Fluctuations in hit rate, such as during traffic spikes, after a cache purge, or when a node restarts with a cold cache, lead to uneven response times: Users experience fast responses most of the time, but occasionally a slow response when a miss occurs and the request falls back to the primary database. This variability degrades tail latency, such as the 99th percentile response time or worst-case delays.
For applications that require predictable response times, this is a problem. It’s even worse in distributed systems, because if one user request fans out to many cache lookups, just one miss among many hits slows down the entire operation, known as “tail at scale.” Keeping the cache hit rate high under all conditions is difficult because it often requires overprovisioning to make the cache large enough for almost all data or complex pre-caching and warming strategies, especially after failures or deployments.
Another challenge is cache consistency. When the underlying data changes, cache entries may become stale. Updating cached data in tandem with database updates is particularly hard; if it isn’t done, applications may read outdated information from the cache. Some systems use time-to-live (TTL) expirations as a safety net so stale data eventually gets refreshed, but this introduces a trade-off between staleness and load: too long a TTL and data may be stale; too short and the cache might unnecessarily drop useful data. Designing a caching scheme that keeps data fresh without sacrificing hit rate is hard, and it often adds complexity to application logic, such as publishing changes to the cache or using write-through or write-back strategies.
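As a simple illustration of the TTL safety net, here is a hedged sketch of a read-through cache in Python; the loader callback and all names are assumptions made for the example.

```python
import time

class TTLCache:
    """Read-through cache in which every entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds, loader):
        self.ttl = ttl_seconds
        self.loader = loader      # callback that fetches fresh data from the backing store
        self.entries = {}         # key -> (value, expiry_time)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]       # hit: the entry is still within its TTL
        value = self.loader(key)  # miss or stale: fall back to the slow path
        self.entries[key] = (value, time.monotonic() + self.ttl)
        return value
```

A longer ttl_seconds raises the hit rate but widens the window in which stale data can be served; a shorter one does the opposite.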
There is also the matter of memory cost and scalability. Maintaining a high hit rate might require a cache so large that it’s almost as big as the dataset itself, which costs a lot if using expensive memory such as DRAM. In such cases, an external cache might not be worth it. Perhaps the database itself should be engineered to handle the load more directly.
Every extra layer, such as a cache in front of a database, is another moving part that may fail. If the cache layer goes down or becomes a bottleneck, the system might be worse off than if there were no cache at all. This is why some architectures choose databases that serve requests with predictable low latency directly from persistent storage, reducing the need for a separate caching tier.
In summary, caching is powerful but comes with pitfalls: uneven performance if not managed well, added complexity in ensuring data correctness, and additional infrastructure that needs scaling and monitoring. Using a cache effectively requires weighing these challenges so the cache helps rather than hurts the overall system’s reliability and predictability.
Aerospike and cache replacement
Cache replacement policies will continue to play an important role in system design as engineers strive to bridge the gap between fast memory and slower storage. An efficient caching layer, guided by the right replacement strategy, makes applications faster by keeping relevant data close at hand. However, maintaining a cache isn’t free: it adds complexity and can perform unpredictably.
This is where Aerospike’s approach to data management comes in. Aerospike delivers data quickly, often eliminating the need for a separate caching tier. With its patented Hybrid Memory Architecture that stores indexes in RAM and data on high-speed persistent storage, and smart data distribution, Aerospike offers in-memory speeds directly from the database. It incorporates built-in eviction and expiration policies such as configurable TTLs and efficient LRU eviction as part of the core data store, so stale or less-used data is phased out without external cache management. The result is predictable high performance even as data grows, and much simpler application stacks because you no longer have to maintain a cache synchronization layer or worry about cache misses causing tail latency spikes.
For organizations struggling with the limitations of traditional caches or looking to simplify their architecture, Aerospike provides a compelling solution. It delivers caching’s benefits of fast access and high throughput without the drawbacks by making the database itself as close to RAM speed as possible. This means fewer moving parts and more consistent performance.
