Rust and ScyllaDB: 3 Ways to Improve Performance

August 30, 2022

Guy Shtub

We’ve been working hard to develop and improve the scylla-rust-driver, an open-source ScyllaDB (and Apache Cassandra) driver for Rust. It is written in pure Rust and exposes a fully async API built on Tokio. You can read more about its benchmark results and about how our developers solved a performance regression.

In several benchmarks, the Rust driver outperformed other drivers, which gave us the idea of using it as a unified core for other drivers as well.

This blog post is based on the ScyllaDB University Rust lesson. In this post, I’ll cover the essentials of the lesson. You’ll learn about Prepared Statements, Paging, and Retries and see an example using the ScyllaDB Rust driver. The ultimate goal is to demonstrate how some minor changes can significantly improve the application’s performance.

To continue learning about ScyllaDB and Rust with additional exercises and hands-on examples, log in or register for ScyllaDB University (it’s free). You’ll be on the path to new certification and also gain unlimited access to all of our NoSQL database courses.

Starting ScyllaDB in Docker

Download the example from git:

git clone https://github.com/scylladb/scylla-code-samples.git


cd scylla-code-samples/Rust_Scylla_Driver/chat/

To quickly get ScyllaDB up and running, use the official Docker image:


docker run \
  -p 9042:9042/tcp \
  --name some-scylla \
  --hostname rust-scylla \
  -d scylladb/scylla:4.5.0 \
  --smp 1 --memory=750M --overprovisioned 1

Example Application

In this example, you’ll create a console application that reads messages from standard input and puts them into a table in ScyllaDB.

First, create the keyspace and the table:


docker exec -it some-scylla cqlsh


CREATE KEYSPACE IF NOT EXISTS log WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};


CREATE TABLE IF NOT EXISTS log.messages (
  id bigint,
  message text,
  PRIMARY KEY (id)
);

Now, look at the main code of the application:

The application connects to the database, reads some lines from the console, and stores them in the table log.messages. It then reads those lines from the table and prints them.
So far, this is quite similar to what you saw in the Getting Started with Rust lesson. Using this application, you’ll see how some minor changes can improve the application’s performance.
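The application listing itself is not reproduced here. The sketch below covers only its console-input half. It reads lines from standard input and pairs each one with a sequential id that matches the id bigint / message text columns (CQL bigint maps to Rust i64). The helper name read_messages is hypothetical, and the ScyllaDB calls are deliberately left out. In the real app, each pair would be inserted into log.messages instead of being collected in memory.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reads every line from `input` and pairs it with a
/// sequential id, mirroring the (id bigint, message text) columns of
/// log.messages. The real app would INSERT each pair into ScyllaDB instead
/// of collecting them in memory.
fn read_messages<R: BufRead>(input: R) -> io::Result<Vec<(i64, String)>> {
    let mut messages = Vec::new();
    for (id, line) in input.lines().enumerate() {
        // `lines()` strips the trailing "\n" (or "\r\n").
        messages.push((id as i64, line?));
    }
    Ok(messages)
}

fn main() -> io::Result<()> {
    // Reads until EOF (Ctrl-D on Unix terminals).
    let messages = read_messages(io::stdin().lock())?;
    for (id, message) in &messages {
        println!("{}: {}", id, message);
    }
    Ok(())
}
```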

Prepared Statements

In every iteration of the while loop, we want to insert new data into the log.messages table. Doing so naively is inefficient, because every call to session.query sends the entire query string to the database, which then has to parse it again. To avoid this unnecessary database-side work, you can prepare the query in advance with the session.prepare method. This method returns a PreparedStatement object, which you can later pass to session.execute() to run the query.

What Exactly are Prepared Statements?

A prepared statement is a query that ScyllaDB has parsed and saved for later use. A key benefit of prepared statements is that you can keep reusing the same query while binding different values to its parameters, such as names, addresses, and locations.

When asked to prepare a CQL statement, the client library sends the statement to ScyllaDB. ScyllaDB creates a unique fingerprint for the statement by computing its MD5 hash. It then uses this hash to check whether the statement is already in its query cache. If it is, ScyllaDB returns a reference to the cached statement. If it is not, ScyllaDB parses the query and inserts the parsed result into its cache.

The client will then be able to send and execute a request specifying the statement id (which is encapsulated in the PreparedStatement object) and providing the (bound) variables, as you will see next.
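To make the prepare/execute round trip concrete, here is a toy model of a server-side statement cache that uses only the standard library. It is a sketch built on several assumptions and is not ScyllaDB's implementation:

- DefaultHasher stands in for MD5.
- The "parsed" form is just the stored text.
- Bound values are spliced into the text only so the result is visible. A real server binds values through the protocol, never by string substitution.

StatementCache and its methods are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy server-side cache of prepared statements (hypothetical, for illustration).
struct StatementCache {
    statements: HashMap<u64, String>, // statement id -> "parsed" statement
    parse_count: usize,               // how many times a statement was actually parsed
}

impl StatementCache {
    fn new() -> Self {
        StatementCache { statements: HashMap::new(), parse_count: 0 }
    }

    /// Fingerprint of the statement text; DefaultHasher stands in for MD5.
    fn fingerprint(cql: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        cql.hash(&mut hasher);
        hasher.finish()
    }

    /// PREPARE: parse only on a cache miss, then return the statement id.
    fn prepare(&mut self, cql: &str) -> u64 {
        let id = Self::fingerprint(cql);
        if !self.statements.contains_key(&id) {
            self.parse_count += 1; // cache miss: "parse" and store
            self.statements.insert(id, cql.to_string());
        }
        id
    }

    /// EXECUTE: look up the statement by id and fill each `?` with the next
    /// bound value. Returns None for an unknown id or too few values.
    fn execute(&self, id: u64, values: &[&str]) -> Option<String> {
        let template = self.statements.get(&id)?;
        let mut bound = values.iter();
        let mut out = String::new();
        for ch in template.chars() {
            if ch == '?' {
                out.push_str(bound.next()?);
            } else {
                out.push(ch);
            }
        }
        Some(out)
    }
}

fn main() {
    let mut cache = StatementCache::new();
    let cql = "INSERT INTO log.messages (id, message) VALUES (?, ?)";
    let id = cache.prepare(cql);
    let _same_id = cache.prepare(cql); // cache hit: no second parse

    let messages: [&str; 2] = ["'hello'", "'world'"];
    for (n, msg) in messages.iter().enumerate() {
        let n = n.to_string();
        println!("{:?}", cache.execute(id, &[n.as_str(), *msg]));
    }
    println!("parsed {} time(s)", cache.parse_count);
}
```

Keying the cache by a hash of the text is what lets a second PREPARE of the same string skip parsing entirely.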

Using Prepared Statements In the Application

Go over the sample code above and modify it to use prepared statements.
The first step is to create a prepared statement (with the help of session.prepare) before the while loop. Next, you need to replace session.query with session.execute inside the while loop.

After these two steps, the app reuses the prepared statement insert_message instead of sending raw queries. Because the statement is parsed once rather than on every iteration of the loop, this significantly improves performance.
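One rough way to see why moving the prepare step out of the loop helps is to count the work each approach causes. The cost model below is a hypothetical sketch, not measured driver behavior. It makes three assumptions:

- The server parses every raw query it receives.
- A prepared statement is parsed once.
- A prepared statement is identified on the wire by a 16-byte id (the size of an MD5 digest) instead of by its full text.

Frame headers and bound values are ignored, since both approaches send the same values.

```rust
/// Assumed size of a prepared-statement id: an MD5 digest is 16 bytes.
const STATEMENT_ID_BYTES: usize = 16;

/// Work caused by the statement part of the insert loop (values excluded).
#[derive(Debug, PartialEq)]
struct Cost {
    parses: usize,          // server-side parse operations
    statement_bytes: usize, // bytes sent to identify the statement
}

/// session.query inside the loop: full text sent and parsed every iteration.
fn naive_cost(cql: &str, iterations: usize) -> Cost {
    Cost {
        parses: iterations,
        statement_bytes: cql.len() * iterations,
    }
}

/// session.prepare before the loop, session.execute inside it:
/// the text is sent and parsed once, then only the id goes out per iteration.
fn prepared_cost(cql: &str, iterations: usize) -> Cost {
    Cost {
        parses: 1,
        statement_bytes: cql.len() + STATEMENT_ID_BYTES * iterations,
    }
}

fn main() {
    let cql = "INSERT INTO log.messages (id, message) VALUES (?, ?)";
    for n in [1, 10, 1000] {
        println!(
            "{:>4} inserts: naive {:?}, prepared {:?}",
            n,
            naive_cost(cql, n),
            prepared_cost(cql, n)
        );
    }
}
```

In this model, a single insert actually sends more statement bytes on the prepared path. This fits the point made in the Paging section that a query executed only once isn't worth preparing. The benefit appears as the loop repeats.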

Paging

Look at the last lines of the application:

There is a call to the Session::query method, and an unprepared select query is sent. Since this query is only executed once, it isn’t worth preparing. However, if we suspect that the result will be large, it might be better to use paging.

What is Paging?

Paging is a way to return a lot of data in manageable chunks.

Without paging, the coordinator node prepares a single result entity that holds all the data and returns it. In the case of a large result, this may have a significant performance impact as it might use up a lot of memory, both on the client and on the ScyllaDB server side.

To avoid this, use paging, so the results are transmitted in chunks of limited size, one chunk at a time. After transmitting each chunk, the database stops and waits for the client to request the next one. This is repeated until the entire result set is transmitted.
The client limits the size of each page by the maximum number of rows it may contain. If a page reaches a size limit (in bytes) before it reaches the client-provided row limit, it is returned with fewer rows; this is called a short page or short read.
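The page-splitting rule just described can be sketched as a small pure function. This is a toy model, and it assumes two things:

- A row's size is simply its string length.
- The byte limit is a parameter, not ScyllaDB's actual internal limit.

A page closes when it holds max_rows rows or when its bytes reach max_bytes, whichever comes first. A page closed by the byte limit with fewer than max_rows rows is a short page. The function name paginate is hypothetical.

```rust
/// Toy pager (hypothetical): splits rows into pages of at most `max_rows`
/// rows, closing a page early once its total size reaches `max_bytes`.
fn paginate(rows: &[&str], max_rows: usize, max_bytes: usize) -> Vec<Vec<String>> {
    assert!(max_rows > 0, "max_rows must be positive");
    let mut pages = Vec::new();
    let mut page: Vec<String> = Vec::new();
    let mut page_bytes = 0;
    for row in rows {
        page.push(row.to_string());
        page_bytes += row.len();
        // Close the page on whichever limit is hit first.
        if page.len() == max_rows || page_bytes >= max_bytes {
            pages.push(std::mem::take(&mut page));
            page_bytes = 0;
        }
    }
    if !page.is_empty() {
        pages.push(page); // final, possibly partial, page
    }
    pages
}

fn main() {
    // The first row alone reaches the 4-byte limit, so page 1 is a short page.
    let rows = ["aaaa", "b", "c", "d", "e"];
    for (i, page) in paginate(&rows, 2, 4).iter().enumerate() {
        println!("page {}: {:?}", i + 1, page);
    }
}
```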

Adding Paging to Our App

As you may have guessed by now, Session::query does not use paging. It fetches the whole result into memory in one go. An alternative Session method uses paging under the hood – Session::query_iter (Session::execute_iter is another alternative that works with prepared statements). The Session::query_iter method takes a query and a value list as arguments and returns an async iterator (stream) over the result Rows. This is how it is used:

After the query_iter invocation, the driver starts a background task that fetches subsequent rows. The caller task (the one that invoked query_iter) consumes newly fetched rows by using an iterator-like stream interface. The caller and the background task run concurrently, so one of them can fetch new rows while the other consumes them.
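The fetch-while-consuming pattern can be imitated with the standard library alone. The sketch below is an analogy, not the driver's code:

- It uses an OS thread and a bounded std::sync::mpsc channel, where the driver uses a Tokio background task and an async stream.
- The "pages" come from an in-memory Vec instead of from ScyllaDB.

The channel capacity of 1 means the fetcher can run at most about one page ahead of the consumer, so fetching is throttled by consumption. The function name for_each_row_paged is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical analogue of query_iter: a background thread "fetches" pages
/// while the caller consumes rows one at a time through `on_row`.
fn for_each_row_paged(rows: Vec<String>, page_size: usize, mut on_row: impl FnMut(String)) {
    assert!(page_size > 0, "page_size must be positive");
    // Capacity 1: the fetcher blocks once it is a page ahead of the consumer.
    let (tx, rx) = mpsc::sync_channel::<Vec<String>>(1);

    let fetcher = thread::spawn(move || {
        for page in rows.chunks(page_size) {
            // Stand-in for one network round trip that returns one page.
            if tx.send(page.to_vec()).is_err() {
                break; // the consumer went away
            }
        }
        // `tx` is dropped here, which ends the consumer's loop below.
    });

    // Consume rows as pages arrive, concurrently with the fetcher.
    for page in rx {
        for row in page {
            on_row(row);
        }
    }
    fetcher.join().expect("fetcher thread panicked");
}

fn main() {
    let rows: Vec<String> = (0..7).map(|i| format!("message {}", i)).collect();
    for_each_row_paged(rows, 3, |row| println!("{}", row));
}
```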

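By adding paging to the app, you keep memory usage on both the client and the server bounded by the page size rather than by the size of the whole result, which improves the application’s performance for large results.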

Retries

After a query fails, the driver might decide to retry it based on the retry policy and on the query itself. The retry policy can be configured for the whole Session or just for a single query.

Available Retry Policies

The driver offers two policies to choose from:

Fallthrough Retry Policy – never retries and returns all errors straight to the user.
Default Retry Policy – used by default, might retry if there is a high chance of success.

It is possible to provide a custom retry policy by implementing RetryPolicy and RetrySession.
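The idea of a pluggable policy can be sketched as a trait with one implementation per policy. This is a simplified, hypothetical model, not the driver's RetryPolicy/RetrySession API. The error kinds and the rules of DefaultLikePolicy are assumptions chosen for illustration; the real default policy's rules are more detailed. A custom policy is just another impl of the trait, like UpToN below.

```rust
/// Hypothetical error kinds a query might fail with.
#[derive(Debug, Clone, Copy, PartialEq)]
enum QueryError {
    Timeout,
    Unavailable,
    InvalidQuery,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Retry,
    DontRetry,
}

/// Simplified policy interface; `retries_so_far` counts earlier retries.
trait RetryPolicy {
    fn decide(&self, error: QueryError, retries_so_far: u32) -> Decision;
}

/// Like the fallthrough policy: never retries, every error goes to the caller.
struct FallthroughPolicy;

impl RetryPolicy for FallthroughPolicy {
    fn decide(&self, _error: QueryError, _retries_so_far: u32) -> Decision {
        Decision::DontRetry
    }
}

/// Assumed "default-like" rule: retry once, and only for errors a second
/// attempt could plausibly fix (never for an invalid query).
struct DefaultLikePolicy;

impl RetryPolicy for DefaultLikePolicy {
    fn decide(&self, error: QueryError, retries_so_far: u32) -> Decision {
        match error {
            QueryError::Timeout | QueryError::Unavailable if retries_so_far == 0 => Decision::Retry,
            _ => Decision::DontRetry,
        }
    }
}

/// A custom policy: retry transient errors up to `max_retries` times.
struct UpToN {
    max_retries: u32,
}

impl RetryPolicy for UpToN {
    fn decide(&self, error: QueryError, retries_so_far: u32) -> Decision {
        let transient = matches!(error, QueryError::Timeout | QueryError::Unavailable);
        if transient && retries_so_far < self.max_retries {
            Decision::Retry
        } else {
            Decision::DontRetry
        }
    }
}

/// Prints each policy's decision for a timeout after 0, 1 and 2 retries.
fn show(name: &str, policy: &dyn RetryPolicy) {
    for retries in 0..3 {
        let decision = policy.decide(QueryError::Timeout, retries);
        println!("{}: timeout after {} retries -> {:?}", name, retries, decision);
    }
}

fn main() {
    show("fallthrough", &FallthroughPolicy);
    show("default-like", &DefaultLikePolicy);
    show("up-to-2", &UpToN { max_retries: 2 });
}
```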

Using Retry Policies

The key to enjoying the benefits of retry policies is to provide more information about query idempotency. A query is idempotent if it can be applied multiple times without changing the result of the initial application. The driver will not retry a failed query if it is not idempotent. Marking queries as idempotent is expected to be done by the user, as the driver does not parse query strings.
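The definition of idempotency can be checked directly. Apply an operation once, apply it twice (as a retry after a lost response might), and compare the resulting states. In this hypothetical sketch, neither function talks to ScyllaDB:

- An insert with fixed values stands in for INSERT INTO log.messages (id, message) VALUES (1, 'hi').
- An increment stands in for an update of a hypothetical counter column (SET views = views + 1).

```rust
use std::collections::HashMap;

/// Idempotent: writes fixed values, like an INSERT of (1, 'hi') into
/// log.messages. Running it again leaves the same state.
fn insert_message(table: &mut HashMap<i64, String>, id: i64, message: &str) {
    table.insert(id, message.to_string());
}

/// Not idempotent: like a counter update `SET views = views + 1` on a
/// hypothetical counter column. Every run changes the state again.
fn increment(views: &mut i64) {
    *views += 1;
}

/// Applies `op` twice, as a retry after a lost response would.
fn apply_twice<T>(state: &mut T, mut op: impl FnMut(&mut T)) {
    op(&mut *state);
    op(&mut *state);
}

fn main() {
    let mut once: HashMap<i64, String> = HashMap::new();
    insert_message(&mut once, 1, "hi");
    let mut twice: HashMap<i64, String> = HashMap::new();
    apply_twice(&mut twice, |t| insert_message(t, 1, "hi"));
    println!("insert: once == twice? {}", once == twice);

    let mut views_once: i64 = 0;
    increment(&mut views_once);
    let mut views_twice: i64 = 0;
    apply_twice(&mut views_twice, increment);
    println!("increment: once = {}, twice = {}", views_once, views_twice);
}
```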

Mark the app’s select statement as an idempotent one:

By making this change, you will be able to use retries (provided by the default retry policy) in case of a select statement execution error.
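Putting the two ideas together, a retry loop needs only the idempotency flag to decide whether a failed attempt may be repeated. The sketch below is hypothetical:

- Statement is a stand-in type whose is_idempotent field echoes the per-query idempotency marking described above.
- Failures are simulated with a closure rather than a real ScyllaDB request.

An idempotent SELECT is retried after a simulated timeout. A non-idempotent statement gets its first error handed straight back to the caller.

```rust
/// Hypothetical statement type carrying the user-supplied idempotency flag.
struct Statement {
    cql: String,
    is_idempotent: bool,
}

/// Runs `attempt` and, on failure, retries up to `max_retries` times, but only
/// when the statement is marked idempotent. Returns the final result and the
/// number of retries that were made.
fn run_with_retries<T, E>(
    stmt: &Statement,
    max_retries: u32,
    mut attempt: impl FnMut(&str) -> Result<T, E>,
) -> (Result<T, E>, u32) {
    let mut retries = 0;
    loop {
        match attempt(stmt.cql.as_str()) {
            Ok(value) => return (Ok(value), retries),
            // Safe to repeat: running an idempotent statement again has no extra effect.
            Err(_) if stmt.is_idempotent && retries < max_retries => retries += 1,
            // Not idempotent, or out of retries: surface the error.
            Err(err) => return (Err(err), retries),
        }
    }
}

fn main() {
    let select = Statement {
        cql: "SELECT id, message FROM log.messages".to_string(),
        is_idempotent: true,
    };
    // Simulated request that times out once, then succeeds.
    let mut calls = 0;
    let (result, retries) = run_with_retries(&select, 3, |cql| {
        calls += 1;
        if calls == 1 {
            Err(format!("timeout: {}", cql))
        } else {
            Ok("rows")
        }
    });
    println!("{:?} after {} retry(ies)", result, retries);
}
```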

Additional Resources

To run the application and see the results, check out the complete lesson on ScyllaDB University.
The Rust Driver docs page contains a quick start guide for people who want to use our driver.
The P99 CONF conference is happening soon – it’s a great (free and virtual) opportunity to learn about high-performance applications. Register now to save your spot.
Check out other NoSQL courses on ScyllaDB University.