How Freshworks Cut Database P99 Latency by 95% – with Lower Costs
How Freshworks tackled high tail latencies, Cassandra administrative burden, and timeouts triggered by even small traffic surges
Freshworks creates AI-boosted business software that is purpose-built for IT, customer support, sales, and marketing teams to work more efficiently. Given their scale, managing petabytes of data across multiple RDBMS and NoSQL databases was a challenge. Preparing for 10x growth under such circumstances required a strategic approach that would allow them to scale without interrupting business continuity. Spoiler: this approach included ScyllaDB.
In the following video, Sunderjeet Singh (ScyllaDB India Manager) kicks off with an introduction to ScyllaDB and Freshworks. Then, Sreedhar Gade (VP of Engineering at Freshworks) shares how Freshworks architected a solution that enables the company to scale operations while keeping costs under control.
Here are highlights from the talk, as shared by Sreedhar Gade…
About Freshworks
Freshworks was founded in 2010 with the goal of empowering millions of companies across the world in multiple domains. The company went public in 2021. Today, Freshworks’ revenue is near $600 million. Customers in over 120 countries rely on us, and we have earned many recognitions across industry verticals.
Technical Challenges
From an application perspective, serving Freshworks’ global customer base requires the team to serve products and data with ultra-low latency and high performance. When using Cassandra, the team faced challenges such as:
- High tail latencies. Every SaaS product vendor is good at delivering high performance up to the 80th or 90th percentile. But the long tail is where performance actually starts to suffer. Improving the tail can really improve the customer experience.
- Administrative burden. We don’t want to keep adding SREs and database engineers in step with our company growth. We want to make sure that we stay lean and mean – but still be able to manage a large fleet of database instances.
- Timeouts during traffic surges. Even a slight surge in traffic led to an increase in timeouts. And with a global customer base, traffic patterns are quite unpredictable. Surges can lead to timeouts unless we’re able to rapidly scale up and down.
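To make the tail latency point concrete, here is a minimal sketch of how a small fraction of slow requests can leave the median and P90 untouched while dominating the P99. The latency values are synthetic and chosen purely for illustration; they are not Freshworks measurements, and the `percentile` helper is a hypothetical nearest-rank implementation rather than anything from the talk.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic workload (assumption): 980 fast requests and 20 slow ones.
latencies_ms = [20] * 980 + [1000] * 20

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)

print(f"P50={p50} ms, P90={p90} ms, P99={p99} ms")
```

In this made-up sample only 2% of requests are slow, yet the P99 lands on the slow value while the P50 and P90 still look healthy. That is why tail percentiles are worth tracking separately from averages.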
Why ScyllaDB
ScyllaDB proved that it could solve these challenges for our former Cassandra use cases. It helps us deliver engaging experiences to our customers across the world. It helps us reduce toil for our engineers. It’s easy to scale up. And more importantly, it’s very cost effective, which makes it easy on the eyes for our CFO. 😉
Migrating from Cassandra to ScyllaDB
To start the migration, we enabled zero-downtime dual writes on the Cassandra databases that we wanted to migrate to ScyllaDB. Then, we took a snapshot of the existing Cassandra cluster and created volumes in the ScyllaDB cluster. We started with around 10 TB as part of this project, then moved it forward in different phases. And once the migration from Cassandra was done, we used the Cassandra Data Migrator (CDM) to validate the migration quality.
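The talk does not describe how the dual writes were implemented, so the following is only a minimal sketch of the general pattern under stated assumptions. `InMemoryStore`, `DualWriter`, and `find_mismatches` are hypothetical names invented for illustration; the in-memory stores stand in for real database sessions and do not represent any driver API or the CDM tool.

```python
class InMemoryStore:
    """Hypothetical stand-in for a database session (not a real driver API)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.fail = False  # flip to True to simulate an outage

    def write(self, key, value):
        if self.fail:
            raise ConnectionError(f"{self.name} unavailable")
        self.rows[key] = value


class DualWriter:
    """Writes to the old store first, then mirrors the write to the new one."""

    def __init__(self, primary, secondary):
        self.primary = primary      # existing cluster, still the source of truth
        self.secondary = secondary  # migration target
        self.retry_queue = []       # mirrored writes that failed and need replay

    def write(self, key, value):
        # Errors from the primary propagate to the caller unchanged.
        self.primary.write(key, value)
        try:
            self.secondary.write(key, value)
        except ConnectionError:
            # Assumption: a failed mirror write must not fail the request.
            self.retry_queue.append((key, value))

    def replay(self):
        """Retry queued mirror writes; anything that fails again stays queued."""
        pending, self.retry_queue = self.retry_queue, []
        for key, value in pending:
            try:
                self.secondary.write(key, value)
            except ConnectionError:
                self.retry_queue.append((key, value))


def find_mismatches(source, target):
    """Return sorted keys that are missing from target or hold a different value."""
    return sorted(
        key for key, value in source.rows.items()
        if key not in target.rows or target.rows[key] != value
    )


cassandra = InMemoryStore("cassandra")
scylla = InMemoryStore("scylladb")
writer = DualWriter(cassandra, scylla)

writer.write("ticket:1", "open")
scylla.fail = True                      # simulate a brief target outage
writer.write("ticket:2", "pending")
mismatches_during_outage = find_mismatches(cassandra, scylla)

scylla.fail = False
writer.replay()
mismatches_after_replay = find_mismatches(cassandra, scylla)
print(mismatches_during_outage, mismatches_after_replay)
```

The comparison step plays the role that a validation tool plays after the bulk copy: it confirms that the target holds the same data as the source. A production setup would also need to preserve write ordering, for example with write timestamps, so that a replayed write cannot overwrite a newer value.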
The Results So Far
We are currently live with ScyllaDB in one of the regions, and we’ve been able to migrate about 25% of the data (more than two terabytes) as part of this project. We have already achieved a 20X reduction in tail latency – we brought the P99 latency down from one second to 50 milliseconds.
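As a quick check on how the figures relate, the sketch below converts the before and after P99 values quoted above into an improvement factor and a percentage reduction. The function name `latency_improvement` is hypothetical; the inputs are the one second and 50 milliseconds from the talk.

```python
def latency_improvement(before_ms, after_ms):
    """Return (improvement factor, percentage reduction) for two latencies."""
    factor = before_ms / after_ms
    reduction_pct = (before_ms - after_ms) * 100 / before_ms
    return factor, reduction_pct


# P99 before: 1 second (1000 ms). P99 after: 50 ms.
factor, reduction_pct = latency_improvement(1000, 50)
print(f"{factor:.0f}x lower, a {reduction_pct:.0f}% reduction")
```

A 20x reduction and a 95% reduction are the same result expressed two ways, which is how the headline figure follows from the P99 numbers.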
What’s Next with ScyllaDB at Freshworks
There are many more opportunities with ScyllaDB at Freshworks, and we have great plans going forward. One of the major projects we’re considering involves taking the text/BLOB data that’s currently stored in MySQL and moving it into ScyllaDB. We expect that will give us cost benefits as well as a performance boost.
We are also looking to use ScyllaDB to improve the scalability, performance, and maintainability of our existing Cassandra workloads across all our business units and products. This will help ensure that our products can scale 10x and scale on demand.