Expedia Group's Migration Journey from Apache Cassandra to ScyllaDB
Learn what made Expedia, a multibillion dollar travel company, move from Cassandra to ScyllaDB, and the lessons they drew from the process.
In their video presentation, Singaram “Singa” Ragunathan and Dilip Kolosani described the technical challenges they faced and how ScyllaDB was able to solve them.
Currently there are multiple applications at Expedia built on top of Apache Cassandra. “Which comes with its own set of challenges,” Singa noted. He highlighted four top issues:
Garbage Collection: The first well-known issue is with Java Virtual Machine (JVM) Garbage Collection (GC). Singa noted, “Apache Cassandra, written in Java, brings in the onus of managing garbage collection and making sure it is appropriately tuned for the workload at hand. It takes a significant amount of time and effort, as well as expertise required, to handle and tune the GC pause for every specific use case.”
Burst Traffic & Infrastructure Costs: The next two issues for Expedia are interrelated: burst traffic leads to overprovisioning. “With burst traffic or a sudden peak in the workload there is significant disturbance to the p99 response time. So we end up having buffer nodes to handle this peak capacity, which results in more infrastructure costs.”
Infrequent Releases: “Another significant worry” for Expedia, according to Singa, was Cassandra’s infrequent release schedule. “According to the past years’ history, the number of Apache Cassandra releases has significantly slowed down.”
Showing a comparative timeline between Cassandra and ScyllaDB, Singa continued, “We would like to compare the open source commits in Cassandra versus ScyllaDB in a timeline chart here, and highlight the amount of releases that ScyllaDB has gone through in the same past three year period. As you can see, it gives enough confidence towards ScyllaDB that, given an issue or bug with a specific release, it will be soon addressed with a patch. In contrast with Apache Cassandra, one might have to wait longer.”
Timeline created by Expedia showing the update frequency of Cassandra compared to ScyllaDB.
“So why did we end up with ScyllaDB?” Peace of mind, both operationally and expense-wise, was key. “Thanks to the C++ backend of ScyllaDB we no longer have to worry about ‘stop-the-world’ GC pauses. Also, we were able to store more data per node, and achieve more throughput per node, thereby saving significant dollars for the company.”
For Singa, another key issue was ease of migration. “From an Apache Cassandra code base, it’s frictionless for developers to switch over to ScyllaDB. For the use cases that we tried, there weren’t any data model changes necessary. And the ScyllaDB driver was compatible, and a swap-in replacement with Cassandra driver dependency. With a few tweaks to our automation framework that provisions an Apache Cassandra cluster, we were able to provision a ScyllaDB cluster.”
Lastly, in terms of being able to entrust Expedia’s business to ScyllaDB, “A clear roadmap and support from ScyllaDB’s Slack community comes in very handy.”
Expedia Geosystem on ScyllaDB
“The candidate application chosen for this proof-of-concept (POC) is our geosystem that provides information about geographical entities and the relationships between them. It aggregates data from multiple systems, like hotel location info, third party data, etc. This rich geography dataset enables different types of data searches using a simple REST API while guaranteeing single-digit millisecond p99 read response time.”
Singa then described the existing architecture: “To speed up API responses, we are using multilayered cache, with Redis as a first layer, and Cassandra as a second layer.” The goal was to replace both Redis and Cassandra with ScyllaDB.
Dilip then described the ScyllaDB cluster that they ran tests on:
- 24 nodes
- 25 TB of data
- i3.2xlarge AWS instances
- ScyllaDB Open Source 4.1.4
“The idea is to test if a lower capacity ScyllaDB cluster can match the performance of our existing Cassandra cluster or not,” Dilip explained. “We didn’t face any major challenges migrating from Cassandra to ScyllaDB. We are not using any fancy features like secondary indexes, materialized views and lightweight transactions, so we kept our data model and application drivers as-is while migrating from Cassandra to ScyllaDB.”
In the performance comparison between Cassandra and ScyllaDB, Dilip showed how write latency was essentially flat and negligible for both, “But the real winner here is p99 read latency for ScyllaDB, which is consistently around 5 ms throughout the day.” He showed how Cassandra latency was “spiky in nature, and it varies from 20 to 80 ms throughout the day depending on the traffic pattern.”
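To make the p99 figures above concrete, here is a minimal sketch of how a 99th-percentile latency can be computed from raw samples, using the nearest-rank method. The `percentile` helper and the sample latencies are hypothetical illustrations, not Expedia's measurements or tooling. The point is that a handful of slow reads barely moves the mean but dominates the p99.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical read latencies in milliseconds (illustrative only).
steady = [5.0] * 100                        # every read takes 5 ms
spiky = [5.0] * 97 + [20.0, 50.0, 80.0]     # three slow outliers

for name, data in (("steady", steady), ("spiky", spiky)):
    print(f"{name}: mean={mean(data):.2f} ms, p99={percentile(data, 99):.1f} ms")
```

In the hypothetical spiky series only 3 of 100 reads are slow, yet the p99 is ten times that of the steady series while the mean rises by little more than a millisecond. This is why tail latency, rather than average latency, is the metric the comparison focuses on.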
For throughput, “ScyllaDB was also able to deliver throughput close to 3x compared with Apache Cassandra. So for applications that require high throughput and single-digit read latency, ScyllaDB is recommended over Apache Cassandra.” Their benchmark also helped them show that ScyllaDB would deliver a 30% savings in infrastructure costs.