
Why Cache Data? [Latency Book Excerpt]

Latency is a monstrous concern here at ScyllaDB. So we’re pleased to bring you excerpts from Pekka Enberg’s new book on Latency…and a masterclass with Pekka as well!

Our engineers, our users, and our broader community are obsessed with latency…to the point that we developed an entire conference on low latency. [Side note: That’s P99 CONF, a free + virtual conference coming to you live, October 22-23.]

Join P99 CONF – Free + Virtual

We’re delighted that Pekka Enberg decided to write an entire book on latency to share his hard-fought latency lessons learned. The book (quite efficiently titled “Latency”) is now off to the printers. From his intro:

Latency is so important across a variety of use cases today. Still, it’s a tricky topic because many low-latency techniques are effectively developer folklore hidden in blog posts, mailing lists, and side notes in books. When faced with a latency problem, understanding what you’re even dealing with often takes a lot of time. I remember multiple occasions where I saw peculiar results from benchmarks, which resulted in an adventure down the software and hardware stack where I learned something new. By reading this book, you will understand latency better, learn how to measure latency accurately, and discover the different techniques for achieving low latency. This is the book I always wished I had when grappling with latency issues. Although this book focuses on applying the techniques in practice, I will also bring up enough of the background to strike a balance between theory and practice.

ScyllaDB is sponsoring chapters from the book, so we’ll be trickling out excerpts on our blog.

Get the Latency book excerpt PDF

More good news: Pekka is also joining us for a Latency Masterclass on September 25. Be there as he speedruns through the latency-related patterns you’ll want to know when working on low-latency apps. And bring your toughest latency questions!

Join us live on September 25

Until then, let’s kick off our Latency book excerpts with the start of Pekka’s caching chapter. It’s reprinted here with permission of the publisher.

***

Why cache data?

Typically, you should consider caching to reduce latency over other techniques if your application or system:

- Doesn’t need transactions or complex queries.
- Cannot be changed, which makes using techniques such as replication hard.
- Has compute or storage constraints that prevent other techniques.

Many applications and systems are simple enough that a key-value interface, typical for caching solutions, is more than sufficient. For example, you can store user data such as profiles and settings as a key-value pair where the key is the user ID and the value is the user data in JSON or a similar format. Similarly, session management, where you keep track of logged-in user session state, is often simple enough that it doesn’t require complex queries. However, caching can eventually become too limiting as you move to more complicated use cases, such as recommendations or ad delivery, and you have to look into other techniques. Overall, whether your application is simple enough to use caching is highly use case-specific.
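To make the key-value idea concrete, here is a minimal sketch (not from the book) of caching user profile data in an in-process Python dictionary keyed by user ID. The `load_profile_from_database` function and the profile fields are hypothetical placeholders standing in for whatever slow backing store you have.

```python
import json

# Hypothetical slow backing-store lookup (e.g., a database query or REST call).
def load_profile_from_database(user_id: str) -> dict:
    # Stand-in for an expensive query; fields here are made up for illustration.
    return {"user_id": user_id, "theme": "dark", "language": "en"}

# In-process cache: user ID -> JSON-encoded profile.
profile_cache: dict[str, str] = {}

def get_profile(user_id: str) -> dict:
    cached = profile_cache.get(user_id)
    if cached is not None:
        return json.loads(cached)                   # fast path: served from the cache
    profile = load_profile_from_database(user_id)   # slow path: backing store
    profile_cache[user_id] = json.dumps(profile)    # keep a temporary copy for next time
    return profile
```

The same shape works for session management: the key becomes a session token and the value the serialized session state.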
Often, you look into caching because you cannot or don’t want to change the existing system. For example, you may have a database system that you cannot change, which does not support replication, but you have clients accessing the database from multiple locations. You may then look into caching some query results to reduce latency and scale the system, which is a typical use of caching. However, this comes with various caveats on data freshness and consistency, which we’ll discuss in this chapter.

Compute and storage constraints can also be a reason to use caching instead of other techniques. Depending on their implementation, colocation and replication can have high storage requirements, which may prevent you from using them. For example, suppose you want to reduce access latency to a large data set, such as a product catalog in an e-commerce site. In that case, it may be impractical to replicate the whole data set in the client with the lowest access latency. However, it may still make sense to cache parts of the product catalog in the client to reduce latency while living within the client’s storage constraints. Similarly, it may be impractical to replicate a whole database to a client or a service, because database access requires compute capacity for query execution, which may not be available there.

Caching overview

With caching, you keep a temporary copy of data to reduce access time significantly by reusing the same result many times. For example, if you have a REST API that takes a long time to compute a result, you can cache the REST API results in the client to reduce latency. Accessing the cached results can be as fast as reading from memory, which can significantly reduce latency.

You can also use caching for data items that don’t exist, called negative caching. For example, maybe the REST API you use looks up customer information based on some filtering parameters. In some cases, no results will match the filter, but you still need to perform the expensive computation to discover that. In that scenario, you would use negative caching to cache the fact that there are no results, speeding up the search. Of course, caching has a downside, too: you trade off data freshness for reduced access latency. You also need more storage space to keep the cached data around. But in many use cases, it’s a trade-off you are willing to make.

Cache storage is where you keep the temporary copies of data. Depending on the use case, cache storage can be either in main memory or on disk, and it can be accessed either in the same memory address space as the application or over a network protocol. For example, you can use an in-memory cache library to cache values in your application’s memory, or a key-value store such as Redis or Memcached to cache values on a remote server.

With caching, an application looks up values from the cache based on a cache key. When the cache has a copy of the value, we call that a cache hit and serve the data access from the cache. However, if there is no value in the cache, we call that scenario a cache miss and must retrieve the value from the backing store. A key metric for an effective caching solution is the cache hit-to-miss ratio, which describes how often the application finds a relevant value in the cache and how frequently it does not. If a cache has a high hit ratio, it is utilized well, meaning there is less need to perform a slow lookup or compute the result. With a high cache miss ratio, you are not taking advantage of the cache; your application may even run slower than without caching, because caching itself has some overhead.
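As one way to picture that read path, here is a minimal Python sketch (not from the book) of looking up a value in the cache first and falling back to the backing store on a miss. The `find_customers` call and the filter parameters are hypothetical; the sketch caches an explicit sentinel for empty results to show negative caching, and counts hits and misses so the hit ratio can be computed.

```python
# Hypothetical slow lookup, e.g. an expensive REST call or database query.
def find_customers(filters: tuple) -> list:
    return []  # stand-in for an expensive search that may return nothing

cache: dict[tuple, object] = {}   # in-process cache keyed by the filter parameters
NO_RESULTS = object()             # sentinel so "no match" is cached too (negative caching)
hits = misses = 0

def cached_find_customers(filters: tuple) -> list:
    global hits, misses
    if filters in cache:                      # cache hit: served from memory
        hits += 1
        value = cache[filters]
        return [] if value is NO_RESULTS else value
    misses += 1                               # cache miss: go to the backing store
    result = find_customers(filters)
    cache[filters] = result if result else NO_RESULTS
    return result

def hit_ratio() -> float:
    # High ratio: the cache is doing its job. Low ratio: mostly overhead.
    total = hits + misses
    return hits / total if total else 0.0
```

A real cache would also bound its size and expire stale entries, which is exactly the eviction problem discussed next.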
One major complication with caches is the eviction policy: deciding which values to throw out of the cache. The main point of a cache is to provide fast access while also fitting in a limited storage space. For example, you may have a database with hundreds of gigabytes of data, but you can only reasonably cache tens of gigabytes in the memory address space of your application because of machine resource limitations. You therefore need some policy to determine which values stay in the cache and which ones you can evict if you run out of cache space. Similarly, once you cache a value, you can’t always retain it in the cache indefinitely, because the source value may change. For example, you may have a time-based eviction policy enforcing that a cached value must be at least a minute old before it is updated to the latest source value.

Despite the challenges, caching is an effective technique to reduce latency in your application, in particular when you can’t change some parts of the system and when your use case doesn’t warrant investment in things like colocation, replication, or partitioning. With that in mind, let’s look at the different caching strategies.

Caching strategies

When adding caching to your application, you must first consider your caching strategy, which determines how reads and writes happen from the cache and the underlying backing store, such as a database or a service. At a high level, you need to decide whether the cache is passive or active when there is a cache miss. In other words, when your application looks up a value from the cache, but the value is not there or has expired, the caching strategy mandates whether it’s your application or the cache that retrieves the value from the backing store. As usual, different caching strategies have different trade-offs in latency and complexity, so let’s get right into it.

To be continued…

Get the Latency book excerpt PDF

Join the Latency Masterclass on September 25