How ScyllaDB Simulates Real-World Production Workloads with the Rust-Based “latte” Benchmarking Tool
Learn why we use a little-known benchmarking tool for testing

Before using a tech product, it’s always nice to know its capabilities and limits. In the world of databases, there are many benchmarking tools that help us assess them. If you’re fine with standard benchmarking scenarios, you’re set: one of the existing tools will probably serve you well. But what if not? Rigorously assessing ScyllaDB, a high-performance distributed database, requires testing some rather specific scenarios with real-world production workloads. Fortunately, there is a tool to help with that: latte, a Rust-based lightweight benchmarking tool for Apache Cassandra and ScyllaDB.
Special thanks to Piotr Kołaczkowski for implementing the latte benchmarking tool.
We (the ScyllaDB testing team) forked it and enhanced it. In this blog post, I’ll share why and how we adopted it for our specialized testing needs.
About latte
Our team really values latte’s “flexibility.”
Want to create a schema using a user-defined type (UDT), Map, Set, List, or any other data type? Ok.
Want to create a materialized view and query it? Ok.
Want to change custom function behavior based on elapsed time? Ok.
Want to run multiple custom functions in parallel? Ok.
Want to use small, medium, and large partitions? Ok.
Basically, latte lets us define any schema and workload functions.
We can do this thanks to its implementation design. The latte tool is a type of engine/kernel, and rune scripts are essentially the “business logic” that’s written separately. Rune scripts are an enhanced, more powerful analog of what cassandra-stress calls user profiles. The rune scripting language is dynamically typed and native to the Rust programming language ecosystem.
Here’s a simple example of a rune script:
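(The script below is a minimal sketch for illustration. It assumes the upstream latte rune conventions: db.execute, db.prepare, db.execute_prepared, and the latte::param! helper. The keyspace, table, and statement names are made up.)

```rune
const KEYSPACE = "test_ks";
const TABLE = "test_table";

// Custom parameter with a default of 3; can be overridden from the command line
const REPLICATION_FACTOR = latte::param!("replication_factor", 3);

// Required: invoked by `latte schema` to create the keyspace and table
pub async fn schema(db) {
    db.execute(`CREATE KEYSPACE IF NOT EXISTS ${KEYSPACE} WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'replication_factor': ${REPLICATION_FACTOR}}`).await?;
    db.execute(`CREATE TABLE IF NOT EXISTS ${KEYSPACE}.${TABLE} (id bigint PRIMARY KEY, data text)`).await?;
}

// Required: registers the prepared statements used by the workload functions
pub async fn prepare(db) {
    db.prepare("insert_row", `INSERT INTO ${KEYSPACE}.${TABLE} (id, data) VALUES (:id, :data)`).await?;
}

// Our custom workload function; `i` is the cycle number supplied by latte
pub async fn myinsert(db, i) {
    db.execute_prepared("insert_row", [i, `data-${i}`]).await?;
}
```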
In the above example of a rune script, we defined two required functions (schema and prepare) and one custom function to be used as our workload: myinsert.
First, we create a schema:
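(Assuming the script above is saved as workload.rn and one of the cluster nodes is reachable at the example address 172.17.0.2.)

```bash
latte schema workload.rn 172.17.0.2
```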
Then, we use the latte run command to call our custom myinsert function:
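(A sketch; the exact option spellings, such as -f/--function, -d/--duration, and -r/--rate, may vary between latte versions, so check latte run --help.)

```bash
# Run the custom 'myinsert' function for 60 seconds at a fixed rate of 10,000 ops/s
latte run workload.rn -f myinsert -d 60s -r 10000 172.17.0.2
```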
The replication_factor parameter above is a custom parameter. If we do not specify it, latte will use its default value, 3. We can define any number of custom parameters.
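For example, assuming the upstream -P option for passing script parameters, the default could be overridden like this.

```bash
# Create the schema with replication factor 5 instead of the default 3
latte schema workload.rn -P replication_factor=5 172.17.0.2
```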
How is latte different from other benchmarking tools?
Based on our team’s experiences, here’s how latte compares to its two main competitors, cassandra-stress and ycsb:
How is our fork of latte different from the original latte project?
At ScyllaDB, our main use case for latte is testing complex and realistic customer scenarios with controlled disruptions. But (from what we understand), the project was designed to perform general latency measurements in healthy DB clusters. Given these different goals, we changed some features (“overlapping features”) – and added other new ones (“unique to our fork”). Here’s an overview:
Differences in overlapping features
Latency measurement.
Fork-latte accounts for coordinated omission in latencies
The original project doesn’t consider the “coordinated omission” phenomenon.
Saturated DB impact.
When a system under test cannot keep up with the load/stress, fork-latte tries to satisfy the requested rate, compensating for missed scheduler ticks as soon as possible.
Source-latte backs off when it cannot meet the rate requirement and does not compensate for missed scheduler ticks later.
This isn’t a “bug”; it is a design decision, but it also conflicts with proper latency calculation in the presence of the “coordinated omission” phenomenon.
Retries.
We enabled retries by default; in the original project, they are disabled by default.
Prepared statements.
Fork-latte supports all the CQL data types available in ScyllaDB Rust Driver.
The source project has limited support for CQL data types.
ScyllaDB Rust Driver.
Our fork uses the latest version, 1.2.0.
The source project sticks to the older version, 0.13.2.
Stress execution reporting.
Report is disabled by default in fork-latte.
It’s enabled in source-latte.
Features unique to our fork
Preferred datacenter support.
Useful for testing multi-DC DB setups
Preferred rack support.
Useful for testing multi-rack DB setups
Possibility to get a list of actual datacenter values from the DB nodes that the driver connected to.
Useful for creating schema with dc-based keyspaces
Sine-wave rate limiting.
Useful for SLA/Workload Prioritization demo and OLTP testing with peaks and lows.
Batch query type support.
Multi-row partitions.
Our fork can create multi-row partitions of different sizes.
Page size support for select queries.
Useful when using the multi-row partitions feature.
HDR histograms support.
The source project has only one way to get HDR histogram data:
It stores the HDR histogram data in RAM until the end of a stress command execution and only releases it at the end as part of the report.
This effectively leaks RAM for the duration of the run.
Forked latte supports the inherited approach above, plus one more:
Real-time streaming of HDR histogram data, without storing it in RAM.
No RAM leaks.
Rows count validation for select queries.
Useful for testing data resurrection.
Example: Testing multi-row partitions of different sizes
Let’s look at one specific user scenario where we applied our fork of latte to test ScyllaDB. For background, one of the user’s ScyllaDB production clusters was using large partitions that could be grouped by size into three groups: 2000, 4000, and 8000 rows per partition. 95% of the partitions had 2000 rows, 4% had 4000 rows, and the remaining 1% had 8000 rows. The target table had 20+ columns of different types. Also, ScyllaDB’s Secondary Indexes (SI) feature was enabled for the target table.
One day, on one of the cloud providers, latencies spiked and throughput dropped. The source of the problem was not immediately clear. To learn more, we needed to have a quick way to reproduce the customer’s workload in a test environment.
Using the latte tool and its great flexibility, we created a rune script covering all the above specifics. The simplified rune script looks like the following:
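(The script below is a heavily simplified sketch of that reproducer: illustrative keyspace and table names, only a few columns instead of the 20+, and a materialized view standing in for the full index setup. It assumes the same rune API as the earlier example.)

```rune
const KEYSPACE = "test_ks";
const TABLE = "big_partitions";
const MV = "big_partitions_mv";

const REPLICATION_FACTOR = latte::param!("replication_factor", 3);
const PARTITIONS = latte::param!("partitions", 1000);

// ~95% of partitions get 2000 rows, ~4% get 4000 rows, ~1% get 8000 rows
fn rows_in_partition(pk) {
    let bucket = pk % 100;
    if bucket < 95 { 2000 } else if bucket < 99 { 4000 } else { 8000 }
}

pub async fn schema(db) {
    db.execute(`CREATE KEYSPACE IF NOT EXISTS ${KEYSPACE} WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'replication_factor': ${REPLICATION_FACTOR}}`).await?;
    db.execute(`CREATE TABLE IF NOT EXISTS ${KEYSPACE}.${TABLE} (pk bigint, ck bigint, col1 text, col2 bigint, PRIMARY KEY (pk, ck))`).await?;
    db.execute(`CREATE MATERIALIZED VIEW IF NOT EXISTS ${KEYSPACE}.${MV} AS SELECT * FROM ${KEYSPACE}.${TABLE} WHERE pk IS NOT NULL AND ck IS NOT NULL AND col2 IS NOT NULL PRIMARY KEY (col2, pk, ck)`).await?;
}

pub async fn prepare(db) {
    db.prepare("insert", `INSERT INTO ${KEYSPACE}.${TABLE} (pk, ck, col1, col2) VALUES (:pk, :ck, :col1, :col2)`).await?;
    db.prepare("get", `SELECT * FROM ${KEYSPACE}.${TABLE} WHERE pk = :pk`).await?;
    db.prepare("get_from_mv", `SELECT * FROM ${KEYSPACE}.${MV} WHERE col2 = :col2`).await?;
}

// Spread cycle number `i` over partitions of different sizes
pub async fn insert(db, i) {
    let pk = i % PARTITIONS;
    let ck = (i / PARTITIONS) % rows_in_partition(pk);
    db.execute_prepared("insert", [pk, ck, `row-${i}`, pk % 100]).await?;
}

// Read a whole multi-row partition from the base table
pub async fn get(db, i) {
    db.execute_prepared("get", [i % PARTITIONS]).await?;
}

// Read from the materialized view by its partition key
pub async fn get_from_mv(db, i) {
    db.execute_prepared("get_from_mv", [i % 100]).await?;
}
```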
Assume we have a ScyllaDB cluster where one of the nodes has the IP address 172.17.0.2. Here is the command to create the schema we need:
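(Assuming the script is saved as multi_row_partitions.rn.)

```bash
latte schema multi_row_partitions.rn 172.17.0.2
```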
And here is the command to populate the just-created table:
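(A sketch; the duration and rate values are only examples.)

```bash
# Call the script's 'insert' function to fill the partitions
latte run multi_row_partitions.rn -f insert -d 1h -r 20000 172.17.0.2
```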
To read from the main table and from the MV, use a similar command, just replacing the function name with get or get_from_mv respectively.
Using the above commands, we got a stable reproducer of the issue and could start working on its solution.
Working with ScyllaDB’s Workload Prioritization feature
In other cases, we needed to:
Create a Workload Prioritization (WLP) demo.
Test an OLTP setup with continuous peaks and lows to showcase giving priority to different workloads.
And for these scenarios, we used a special latte feature called sine wave rate. It is an extension of the common rate-limiting feature, which lets us specify how many operations per second we want to produce. It can be used with the following command parameters:
And looking at the monitoring, we can see the following picture of the operations per second graph:
Internal testing of tombstones (validation)
As of June 2025, forked latte supports row count validation, which is useful for testing data resurrection. Here is a rune script for latte that demonstrates these capabilities:
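(A simplified sketch with illustrative names; it defines separate insert, get, and delete functions so reads and deletions can later run as parallel commands.)

```rune
const KEYSPACE = "test_ks";
const TABLE = "tombstone_test";

const REPLICATION_FACTOR = latte::param!("replication_factor", 3);
const ROW_COUNT = latte::param!("row_count", 100000);

pub async fn schema(db) {
    db.execute(`CREATE KEYSPACE IF NOT EXISTS ${KEYSPACE} WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'replication_factor': ${REPLICATION_FACTOR}}`).await?;
    db.execute(`CREATE TABLE IF NOT EXISTS ${KEYSPACE}.${TABLE} (id bigint PRIMARY KEY, data text)`).await?;
}

pub async fn prepare(db) {
    db.prepare("insert", `INSERT INTO ${KEYSPACE}.${TABLE} (id, data) VALUES (:id, :data)`).await?;
    db.prepare("get", `SELECT * FROM ${KEYSPACE}.${TABLE} WHERE id = :id`).await?;
    db.prepare("delete", `DELETE FROM ${KEYSPACE}.${TABLE} WHERE id = :id`).await?;
}

pub async fn insert(db, i) {
    db.execute_prepared("insert", [i % ROW_COUNT, `row-${i}`]).await?;
}

// Single-row read; with row-count validation enabled, latte checks how many rows came back
pub async fn get(db, i) {
    db.execute_prepared("get", [i % ROW_COUNT]).await?;
}

// Delete every second row, producing tombstones
pub async fn delete(db, i) {
    db.execute_prepared("delete", [(i * 2) % ROW_COUNT]).await?;
}
```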
As before, we create the schema first:
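(Assuming the script is saved as tombstone_test.rn.)

```bash
latte schema tombstone_test.rn 172.17.0.2
```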
Then, we populate the table with 100k rows using the following command:
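(A sketch; as far as we recall, -d accepts either a time span or, as here, a plain cycle count.)

```bash
# 100,000 cycles of the 'insert' function, one row per cycle
latte run tombstone_test.rn -f insert -d 100000 -P row_count=100000 172.17.0.2
```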
To check that all rows are in place, we use a command similar to the one above, change the function to get, and define the validation strategy as fail-fast:
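(The validation-strategy option spelling below is illustrative; consult the fork’s latte run --help for the exact name.)

```bash
# Read back all 100,000 rows; fail immediately on an unexpected row count
latte run tombstone_test.rn -f get -d 100000 --validation-strategy fail-fast 172.17.0.2
```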
The supported validation strategies are retry, fail-fast, and ignore.
Then, we run 2 different commands in parallel. Here is the first one, which deletes part of the rows:
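(The rate and duration values are only examples.)

```bash
# Continuously delete a subset of the rows at a modest rate
latte run tombstone_test.rn -f delete -d 30m -r 500 172.17.0.2
```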
Here is the second one, which knows when we expect 1 row and when we expect none:
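(Again with an illustrative option spelling; the expectation of one row versus none comes from the script logic and the command parameters.)

```bash
# Read rows in parallel with the deletions; the 'retry' strategy tolerates transient mismatches
latte run tombstone_test.rn -f get -d 30m -r 500 --validation-strategy retry 172.17.0.2
```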
And here is the timing of actions that take place during these 2 commands’ runtime:
That’s a simple example of how we can check whether data got deleted or not.
In long-running testing scenarios, we might run more parallel commands, make them depend on the elapsed time, and use many other such flexibilities.
Conclusions
Yes, to take advantage of latte, you first need to study a bit of rune scripting. But once you’ve done that to some extent, especially with the available examples at hand, it becomes a powerful tool capable of covering various scenarios of different types.