How Comcast Reduced Cassandra P99, P999, and P9999 Latencies
"What we saw was pretty phenomenal. We simulated 2.5X our peak load with a 95% drop in our response times. That's value that you pay back to the end-user right away."
Phil Zimich, Senior Director of Engineering at Comcast
About Comcast
Comcast is a global media and technology company with three primary businesses: Comcast Cable (one of the United States’ largest video, high-speed internet, and phone providers to residential customers), NBCUniversal, and Sky.
Comcast’s Database Use Case
The X1 Scheduler powers the DVR and program reminder experience on the Comcast X1 platform, a cable and streaming video service that now supports more than 31 million set-top boxes and “second screen” devices used each month across 15 million households. The X1 Scheduler processes more than 2 billion RESTful calls daily. To meet that scale, it uses multiple datastore technologies.
Comcast’s Cassandra Challenges: High Long-Tail Latencies Plus Massive Node Sprawl
Over the course of seven years, the project expanded from supporting 30K devices to over 31M devices. The team began with Oracle, then moved to Apache Cassandra (via DataStax). When Cassandra’s long-tail latencies proved unacceptable at the company’s rapidly increasing scale, they began exploring new options. In addition to lowering latency, the team also wanted to reduce complexity. To mask Cassandra’s latency issues from users, they had placed 60 cache servers in front of their database, and keeping this cache layer consistent with the database was causing major admin headaches.
Phil Zimich, Senior Director of Engineering at Comcast, explained, “The P99, P999, and P9999 is really where our stress was with Cassandra. The median times were tolerable. But it’s the P999 and P9999 where our systems started to fall over.”
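To make the percentile terminology concrete: P99, P999, and P9999 are the latencies below which 99%, 99.9%, and 99.99% of requests complete. The following is a minimal Python sketch, not Comcast's tooling, showing how a median can look tolerable while the tail is far slower. The latency distribution is synthetic, and the 2% slow-request rate, the millisecond ranges, and the function names are all assumptions chosen only for illustration.

```python
import math
import random


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # round() guards against floating-point noise before taking the ceiling
    rank = math.ceil(round(pct * len(ordered) / 100, 6))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def synthetic_latencies(n=100_000, slow_fraction=0.02, seed=42):
    """Hypothetical workload: most requests are fast, a small share is slow.
    All numbers here are made up for illustration."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(n):
        if rng.random() < slow_fraction:
            latencies.append(rng.uniform(200.0, 2000.0))  # slow request, ms
        else:
            latencies.append(rng.uniform(2.0, 10.0))      # fast request, ms
    return latencies


if __name__ == "__main__":
    data = synthetic_latencies()
    for label, pct in [("P50", 50), ("P99", 99), ("P999", 99.9), ("P9999", 99.99)]:
        print(f"{label}: {percentile(data, pct):.1f} ms")
```

In this synthetic data set the median stays within the fast range while P99 and beyond land in the slow range, which is the pattern the quote describes: averages and medians hide the requests that hurt users most.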
Ultimately, these challenges led Comcast to consider an alternate solution: ScyllaDB.
ScyllaDB is the #1 Apache Cassandra alternative. ScyllaDB provides the same CQL interface and queries, the same drivers, even the same on-disk SSTable format – but with a modern architecture designed to eliminate Cassandra performance issues, limitations, and operational barriers. ScyllaDB is built from the ground up in C++. No Java overhead. No garbage collection. And performance tuning? It’s automated.
Migrating from Cassandra to ScyllaDB
Comcast selected ScyllaDB, the NoSQL database for data-intensive apps that require high performance and low latency. ScyllaDB’s close-to-the-metal, shard-per-core architecture delivers greater performance for a fraction of the cost of DynamoDB, Apache Cassandra, MongoDB, and Google Bigtable. Thanks to ScyllaDB’s ability to take full advantage of modern infrastructure — allowing it to scale up as much as scale out — Comcast was able to replace 962 Cassandra nodes with just 78 nodes of ScyllaDB. They improved overall availability and performance while completely eliminating the 60 cache servers. The result: a 10x latency improvement with the ability to handle over twice the requests – at a fraction of the cost.
Comcast’s Cassandra Migration Results
By moving from Cassandra to ScyllaDB, Comcast:
- Reduced their total database infrastructure from 962 Cassandra nodes to 78 ScyllaDB nodes
- Decreased P99, P999, and P9999 latencies by 95%
- Achieved 60% savings over Cassandra operating costs
- Saved $2.5M annually in infrastructure costs and staff overhead