Rust and ScyllaDB: 3 Ways to Improve Performance
We’ve been working hard to develop and improve the scylla-rust-driver. It’s an open-source ScyllaDB (and Apache Cassandra) driver for Rust, written in pure Rust with a fully async API using Tokio. You can read more regarding its benchmark results and how our developers solved a performance regression.
In several benchmarks, the Rust driver outperformed other drivers, which gave us the idea of using it as a unified core for other drivers as well.
This blog post is based on the ScyllaDB University Rust lesson and covers its essentials. You’ll learn about prepared statements, paging, and retries, and see an example that uses the ScyllaDB Rust driver. The goal is to demonstrate how a few minor changes can significantly improve an application’s performance.
To continue learning about ScyllaDB and Rust with additional exercises and hands-on examples, log in or register for ScyllaDB University (it’s free). You’ll be on the path to a new certification and also gain unlimited access to all of our NoSQL database courses.
Starting ScyllaDB in Docker
Download the example from git:
git clone https://github.com/scylladb/scylla-code-samples.git
cd scylla-code-samples/Rust_Scylla_Driver/chat/
To quickly get ScyllaDB up and running, use the official Docker image:
docker run \
-p 9042:9042/tcp \
--name some-scylla \
--hostname rust-scylla \
-d scylladb/scylla:4.5.0 \
--smp 1 --memory=750M --overprovisioned 1
Example Application
In this example, you’ll create a console application that reads messages from standard input and puts them into a table in ScyllaDB.
First, create the keyspace and the table:
docker exec -it some-scylla cqlsh
CREATE KEYSPACE IF NOT EXISTS log WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 1
};
CREATE TABLE IF NOT EXISTS log.messages (
id bigint,
message text,
PRIMARY KEY (id)
);
Now, look at the main code of the application:
The application connects to the database, reads some lines from the console, and stores them in the table log.messages. It then reads those lines from the table and prints them.
So far, this is quite similar to what you saw in the Getting Started with Rust lesson. Using this application, you’ll see how some minor changes can improve the application’s performance.
Prepared Statements
In every iteration of the while loop, we want to insert new data into the log.messages table. Doing so naively is inefficient, as every call to session.query sends the entire query string to the database, which then parses it. To avoid this unnecessary database-side work, you can prepare a query in advance using the session.prepare method. A call to this method returns a PreparedStatement object, which can be used later with session.execute() to execute the desired query.
What Exactly are Prepared Statements?
A prepared statement is a query parsed by ScyllaDB and then saved for later use. One of the valuable benefits of using prepared statements is that you can continue to reuse the same query while modifying variables in the query to match parameters such as names, addresses, and locations.
When asked to prepare a CQL statement, the client library will send a CQL statement to ScyllaDB. ScyllaDB will then create a unique fingerprint for that CQL statement by MD5 hashing it. ScyllaDB then uses this hash to check its query cache and see if it has already seen it. If so, it will return a reference to that cached CQL statement. If ScyllaDB does not have that unique query hash in its cache, it will then proceed to parse the query and insert the parsed output into its cache.
The client will then be able to send and execute a request specifying the statement id (which is encapsulated in the PreparedStatement object) and providing the (bound) variables, as you will see next.
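To make the lookup flow described above concrete, here is a minimal sketch in Rust that uses only the standard library. It is a toy model, not ScyllaDB's implementation and not the driver's API: the StatementCache type and its prepare and execute methods are hypothetical names chosen for illustration, and Rust's DefaultHasher stands in for the MD5 fingerprint.

```rust
use std::collections::hash_map::{DefaultHasher, Entry};
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy model of a server-side prepared-statement cache (hypothetical type).
#[derive(Default)]
pub struct StatementCache {
    /// Statement id -> "parsed" statement (here simply the query text).
    parsed: HashMap<u64, String>,
    /// How many times the (pretend) parser actually ran.
    pub parse_count: usize,
}

impl StatementCache {
    /// Fingerprints the query text and parses it only on a cache miss.
    /// Assumption: DefaultHasher replaces the MD5 hash mentioned in the text.
    pub fn prepare(&mut self, cql: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        cql.hash(&mut hasher);
        let id = hasher.finish();
        if let Entry::Vacant(slot) = self.parsed.entry(id) {
            self.parse_count += 1; // cache miss: "parse" and remember
            slot.insert(cql.to_string());
        }
        id
    }

    /// Executes by statement id with bound values; no query text is parsed.
    /// Returns None for an unknown id or when too few values are bound.
    pub fn execute(&self, id: u64, values: &[&str]) -> Option<String> {
        let statement = self.parsed.get(&id)?;
        let mut bound = values.iter();
        let mut out = String::new();
        for ch in statement.chars() {
            if ch == '?' {
                out.push_str(bound.next()?); // fill the next bind marker
            } else {
                out.push(ch);
            }
        }
        Some(out)
    }
}
```

In this model, preparing the same statement twice returns the same id and leaves parse_count at 1. Every later execute call sends only the id and the bound values, which is the kind of saving the text describes.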
Using Prepared Statements In the Application
Go over the sample code above and modify it to use prepared statements.
The first step is to create a prepared statement (with the help of session.prepare) before the while loop. Next, you need to replace session.query with session.execute inside the while loop.
After these two steps, the app will reuse the prepared statement insert_message instead of sending raw queries. The database no longer has to parse the same query text on every insert, which improves performance.
Paging
Look at the last lines of the application:
There is a call to the Session::query method, and an unprepared select query is sent. Since this query is only executed once, it isn’t worth preparing. However, if we suspect that the result will be large, it might be better to use paging.
What is Paging?
Paging is a way to return a lot of data in manageable chunks.
Without paging, the coordinator node prepares a single result entity that holds all the data and returns it. In the case of a large result, this may have a significant performance impact as it might use up a lot of memory, both on the client and on the ScyllaDB server side.
To avoid this, use paging, so the results are transmitted in chunks of limited size, one chunk at a time. After transmitting each chunk, the database stops and waits for the client to request the next one. This is repeated until the entire result set is transmitted.
The client can limit the page size by specifying the maximum number of rows a page may contain. If a page hits the server-side limit on its size in bytes before it reaches the client-provided row limit, it’s called a short page or short read.
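The request-a-chunk, wait, request-the-next-chunk cycle can be sketched with a small in-memory model. This is a standard-library-only illustration under stated assumptions, not the CQL protocol or the driver's API: Page, fetch_page, and read_all are hypothetical names, a slice of integers plays the role of the table, and a plain index plays the role of the paging state.

```rust
/// One page of results plus a marker telling the client where to resume.
pub struct Page {
    pub rows: Vec<i64>,
    /// None means the result set is exhausted (hypothetical paging state).
    pub paging_state: Option<usize>,
}

/// Toy "server": returns at most `page_size` rows starting at `paging_state`.
pub fn fetch_page(table: &[i64], paging_state: Option<usize>, page_size: usize) -> Page {
    let page_size = page_size.max(1); // a zero-sized page would never progress
    let start = paging_state.unwrap_or(0).min(table.len());
    let end = (start + page_size).min(table.len());
    Page {
        rows: table[start..end].to_vec(),
        paging_state: if end < table.len() { Some(end) } else { None },
    }
}

/// Toy "client": asks for one page at a time until the server reports the end.
/// Returns (total rows seen, number of round trips).
pub fn read_all(table: &[i64], page_size: usize) -> (usize, usize) {
    let mut state = None;
    let mut rows_seen = 0;
    let mut round_trips = 0;
    loop {
        let page = fetch_page(table, state, page_size);
        round_trips += 1;
        rows_seen += page.rows.len(); // only this one page is held in memory
        match page.paging_state {
            Some(next) => state = Some(next),
            None => break,
        }
    }
    (rows_seen, round_trips)
}
```

With ten rows and a page size of four, read_all makes three round trips, and the client never holds more than four rows at once. That is the memory trade-off paging offers: more round trips in exchange for bounded result size.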
Adding Paging to Our App
As you may have guessed by now, Session::query does not use paging. It fetches the whole result into memory in one go. An alternative Session method uses paging under the hood – Session::query_iter (Session::execute_iter is another alternative that works with prepared statements). The Session::query_iter method takes a query and a value list as arguments and returns an async iterator (stream) over the result Rows. This is how it is used:
After the query_iter invocation, the driver starts a background task that fetches subsequent rows. The caller task (the one that invoked query_iter) consumes newly fetched rows by using an iterator-like stream interface. The caller and the background task run concurrently, so one of them can fetch new rows while the other consumes them.
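The producer/consumer arrangement described above can be modeled with a thread and a bounded channel from the standard library. This is only an analogy under stated assumptions: the driver uses async tasks and streams rather than OS threads, and rows_stream and collect_rows are hypothetical helpers, not driver functions. The channel capacity of one page is also an illustrative choice, not a claim about how the driver buffers rows.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Spawns a background "fetcher" that sends rows, page by page, over a
/// bounded channel. Assumption: a thread and a std channel stand in for
/// the driver's background task and async stream.
pub fn rows_stream(table: Vec<String>, page_size: usize) -> Receiver<String> {
    let page_size = page_size.max(1);
    // The bounded channel makes the fetcher block once a page's worth of
    // rows is waiting, so it cannot run far ahead of the consumer.
    let (tx, rx) = sync_channel(page_size);
    thread::spawn(move || {
        for page in table.chunks(page_size) {
            for row in page {
                if tx.send(row.clone()).is_err() {
                    return; // the consumer went away; stop fetching
                }
            }
        }
        // Dropping `tx` here signals the end of the result set.
    });
    rx
}

/// The "caller task": consumes rows as they arrive, while the fetcher
/// keeps working in the background.
pub fn collect_rows(table: Vec<String>, page_size: usize) -> Vec<String> {
    rows_stream(table, page_size).into_iter().collect()
}
```

The consumer sees the rows in their original order and never needs to know where one page ends and the next begins, which mirrors the iterator-like interface described above.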
By adding paging to the app, you reduce memory usage and increase the application’s performance.
Retries
After a query fails, the driver might decide to retry it based on the retry policy and on the query itself. The retry policy can be configured for the whole Session or just for a single query.
Available Retry Policies
The driver offers two policies to choose from:
- Fallthrough Retry Policy – never retries and returns all errors straight to the user.
- Default Retry Policy – used by default, might retry if there is a high chance of success.
It is possible to provide a custom retry policy by implementing RetryPolicy and RetrySession.
Using Retry Policies
The key to enjoying the benefits of retry policies is to provide more information about query idempotency. A query is idempotent if it can be applied multiple times without changing the result of the initial application. The driver will not retry a failed query if it is not idempotent. Marking queries as idempotent is expected to be done by the user, as the driver does not parse query strings.
Mark the app’s select statement as an idempotent one:
By making this change, you will be able to use retries (provided by the default retry policy) in case of a select statement execution error.
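How the idempotency flag and the retry policy interact can be sketched with a small model. It uses only the standard library and is deliberately simplified. ToyRetryPolicy, Fallthrough, SimpleDefault, and run_with_retries are hypothetical names, not the driver's RetryPolicy and RetrySession traits, and the real default policy also takes the kind of error into account, which this sketch ignores.

```rust
/// What to do after a failed attempt.
#[derive(Debug, PartialEq)]
pub enum RetryDecision {
    Retry,
    DontRetry,
}

/// Hypothetical, simplified policy interface (not the driver's trait).
pub trait ToyRetryPolicy {
    /// `attempt` is the zero-based index of the attempt that just failed.
    fn decide(&self, is_idempotent: bool, attempt: u32) -> RetryDecision;
}

/// Never retries; every error goes straight back to the caller.
pub struct Fallthrough;

impl ToyRetryPolicy for Fallthrough {
    fn decide(&self, _is_idempotent: bool, _attempt: u32) -> RetryDecision {
        RetryDecision::DontRetry
    }
}

/// Simplified stand-in for a default policy: retries only idempotent
/// queries, and only up to `max_retries` times.
pub struct SimpleDefault {
    pub max_retries: u32,
}

impl ToyRetryPolicy for SimpleDefault {
    fn decide(&self, is_idempotent: bool, attempt: u32) -> RetryDecision {
        if is_idempotent && attempt < self.max_retries {
            RetryDecision::Retry
        } else {
            RetryDecision::DontRetry
        }
    }
}

/// Runs `attempt_fn` until it succeeds or the policy says to stop.
/// Returns the number of attempts used on success.
pub fn run_with_retries<P, F>(
    policy: &P,
    is_idempotent: bool,
    mut attempt_fn: F,
) -> Result<u32, String>
where
    P: ToyRetryPolicy,
    F: FnMut(u32) -> Result<(), String>,
{
    let mut attempt = 0;
    loop {
        match attempt_fn(attempt) {
            Ok(()) => return Ok(attempt + 1),
            Err(err) => {
                if policy.decide(is_idempotent, attempt) == RetryDecision::DontRetry {
                    return Err(err);
                }
                attempt += 1;
            }
        }
    }
}
```

In this model, a query that fails twice and then succeeds completes on the third attempt when it is marked idempotent. The same query surfaces the first error immediately if it is not marked, or if the fallthrough policy is used.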
Additional Resources
- To run the application and see the results, check out the complete lesson on ScyllaDB University.
- The Rust Driver docs page contains a quick start guide for people who want to use our driver.
- The P99 CONF conference is happening soon – it’s a great (free and virtual) opportunity to learn about high-performance applications. Register now to save your spot.
- Check out some other NoSQL courses on ScyllaDB University