Expedia Group: Our Migration Journey to Scylla
Expedia Group, the multi-billion-dollar travel brand, spoke at our recent Scylla Summit 2021 virtual event. Singaram “Singa” Ragunathan and Dilip Kolosani described their technical challenges and how Scylla solved them.
Currently there are multiple applications at Expedia built on top of Apache Cassandra, “which comes with its own set of challenges,” Singa noted. He highlighted four top issues:
- Garbage Collection: The first well-known issue is with Java Virtual Machine (JVM) Garbage Collection (GC). Singa noted, “Apache Cassandra, written in Java, brings in the onus of managing garbage collection and making sure it is appropriately tuned for the workload at hand. It takes a significant amount of time and effort, as well as expertise required, to handle and tune the GC pause for every specific use case.”
- Burst Traffic & Infrastructure Costs: The next two issues for Expedia are interrelated: burst traffic leads to overprovisioning. “With burst traffic or a sudden peak in the workload there is significant disturbance to the p99 response time. So we end up having buffer nodes to handle this peak capacity, which results in more infrastructure costs.”
- Infrequent Releases: “Another significant worry” for Expedia, according to Singa, was Cassandra’s infrequent release schedule. “According to the past years’ history, the number of Apache Cassandra releases has significantly slowed down.”
Showing a comparative timeline between Cassandra and Scylla, Singa continued, “We would like to compare the open source commits in Cassandra versus Scylla in a timeline chart here, and highlight the amount of releases that Scylla has gone through in the same past three year period. As you can see, it gives enough confidence towards Scylla that, given an issue or bug with a specific release, it will be soon addressed with a patch. In contrast with Apache Cassandra, one might have to wait longer.”
Timeline created by Expedia showing the update frequency of Cassandra compared to Scylla.
“So why did we end up with Scylla?” Peace of mind, both operationally and expense-wise, was key. “Thanks to the C++ backend of Scylla we no longer have to worry about ‘stop-the-world’ GC pauses. Also, we were able to store more data per node, and achieve more throughput per node, thereby saving significant dollars for the company.”
For Singa, another key issue was ease of migration. “From an Apache Cassandra code base, it’s frictionless for developers to switch over to Scylla. For the use cases that we tried, there weren’t any data model changes necessary. And the Scylla driver was compatible, and a swap-in replacement with Cassandra driver dependency. With a few tweaks to our automation framework that provisions an Apache Cassandra cluster, we were able to provision a Scylla Open Source cluster.”
Lastly, in terms of being able to entrust Expedia’s business to Scylla, “A clear roadmap and support from ScyllaDB’s Slack community comes in very handy.”
Expedia Geosystem on Scylla
“The candidate application chosen for this proof-of-concept (POC) is our geosystem that provides information about geographical entities and the relationships between them. It aggregates data from multiple systems, like hotel location info, third party data, etc. This rich geography dataset enables different types of data searches using a simple REST API while guaranteeing single-digit millisecond p99 read response time.”
Singa then described the existing architecture: “To speed up API responses, we are using multilayered cache, with Redis as a first layer, and Cassandra as a second layer.” The goal was to replace both Redis and Cassandra with Scylla.
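The layered read path described above can be sketched as a two-tier lookup. This is only an illustration under stated assumptions, not Expedia's code: plain dictionaries stand in for Redis and Cassandra, and the names `LayeredCache`, `fast`, `slow`, and the sample key are hypothetical.

```python
from typing import Dict, Optional


class LayeredCache:
    """Two-tier lookup: check the fast layer first, then the slower one."""

    def __init__(self) -> None:
        # Hypothetical stand-ins: in the architecture described in the talk
        # these would be Redis (first layer) and Cassandra (second layer).
        self.fast: Dict[str, str] = {}
        self.slow: Dict[str, str] = {}
        self.fast_hits = 0
        self.slow_hits = 0

    def put(self, key: str, value: str) -> None:
        # Writes land in the second layer; the first layer fills on reads.
        self.slow[key] = value

    def get(self, key: str) -> Optional[str]:
        if key in self.fast:
            self.fast_hits += 1
            return self.fast[key]
        if key in self.slow:
            self.slow_hits += 1
            value = self.slow[key]
            self.fast[key] = value  # promote so the next read is a fast hit
            return value
        return None  # miss in both layers


cache = LayeredCache()
cache.put("region:1", "Seattle")  # hypothetical geography entry
first = cache.get("region:1")     # served by the second layer
second = cache.get("region:1")    # served by the first layer
```

With a single store that is fast enough on its own, `get` would reduce to one lookup and the promotion step would disappear, which is the kind of simplification the stated goal implies.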
Dilip then described the Scylla cluster that they ran tests on:
- 24 nodes
- 25 TB of data
- i3.2xlarge AWS instances
- Scylla Open Source 4.1.4
“The idea is to test if a lower capacity Scylla cluster can match the performance of our existing Cassandra cluster or not,” Dilip explained. “We didn’t face any major challenges migrating from Cassandra to Scylla. We are not using any fancy features like secondary indexes, materialized views and lightweight transactions, so we kept our data model and application drivers as-is while migrating from Cassandra to Scylla.”
In the performance comparison between Cassandra and Scylla, Dilip showed how write latency was essentially flat and negligible for both, “But the real winner here is p99 read latency for Scylla, which is consistently around 5 ms throughout the day.” He showed how Cassandra latency was “spiky in nature, and it varies from 20 to 80 ms throughout the day depending on the traffic pattern.”
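For readers less familiar with the p99 metric, here is a minimal sketch of how it can be computed with the nearest-rank method. The latency samples are invented for illustration and are not Expedia's measurements; the function name `percentile` is likewise an assumption.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical data: 98 fast reads at 2 ms and 2 slow reads at 40 ms.
latencies_ms = [2.0] * 98 + [40.0] * 2
p50 = percentile(latencies_ms, 50)  # 2.0
p99 = percentile(latencies_ms, 99)  # 40.0
```

In this made-up sample the median stays at 2 ms while the p99 jumps to 40 ms, which is why a small number of slow reads, such as those caused by pauses or traffic bursts, shows up so clearly in a p99 chart.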
For throughput, “Scylla was also able to deliver throughput close to 3x compared with Apache Cassandra. So for applications that require high throughput and single-digit read latency, Scylla is recommended over Apache Cassandra.” Their benchmark also helped them show that Scylla would provide a 30% savings in infrastructure costs.
Next Steps
Expedia’s roadmap for 2021 is to replace the Cassandra/Redis architecture with Scylla, since Scylla by itself can support single-digit millisecond latencies.
You can learn more about Expedia’s benchmark results and their plans for Scylla by watching their full Scylla Summit presentation.
If you have plans of your own for Scylla, we’d love to hear about them. You can contact us directly, or join our Slack channel to engage with our engineers and the Scylla community.