Consuming CDC with Java, Go… and Rust!

A quick look at how to use ScyllaDB CDC with the Rust connector

In 2021, we published a guide for using Java and Go with ScyllaDB CDC. Today, we are happy to share a new version of that post, including how to use ScyllaDB CDC with the Rust connector!

Note: We will skip some of the sections in the original post, like “Why Use a Library?” and the challenges in using CDC. If you are planning to use CDC in production, you should absolutely go back and read them. But if you’re just looking to get a demo up and running, this post will get you there.

Getting Started with Rust

scylla-cdc-rust is a library for consuming the ScyllaDB CDC log in Rust applications. It automatically and transparently handles errors and topology changes of the underlying ScyllaDB cluster. As a result, the API lets the user read the CDC log without having to deeply understand the internal structure of CDC. The library is written in pure Rust, using the ScyllaDB Rust Driver and Tokio.

Let’s see how to use the Rust library. We will build an application that prints changes happening to a table in real time. You can see the final code here.

Installing the library

The scylla-cdc library is available on crates.io.

Setting up the CDC consumer

The most important part of using the library is defining a callback that is executed for each CDC log row read from the database. Such a callback is defined by implementing the Consumer trait located in scylla_cdc::consumer. For now, a struct with no member variables, TutorialConsumer, is enough for this purpose.

Since the callback is executed asynchronously, we have to use the async-trait crate to implement the Consumer trait. We also use the anyhow crate for error handling.

The library creates one instance of TutorialConsumer per CDC stream, so we also need to define a ConsumerFactory that produces them.

Adding shared state to the consumer

Different instances of Consumer run in separate Tokio tasks.
Because of that, the runtime might schedule them on separate threads. Consequently, a struct implementing the Consumer trait should also implement the Send trait, and a struct implementing the ConsumerFactory trait should implement the Send and Sync traits. Luckily, Rust implements these traits automatically if all member variables of a struct implement them. If the consumers need to share some state, like a reference to an object, that state can be wrapped in an Arc. An example might be a Consumer that counts the rows read by all its instances.

Note: In general, keeping shared mutable state in the Consumer is not recommended. It requires synchronization (i.e. a mutex or an atomic like AtomicUsize), which reduces the speedup Tokio provides by running the Consumer logic on multiple cores. Fortunately, keeping exclusive (not shared) mutable state in the Consumer comes with no additional overhead.

Starting the application

Now we’re ready to create our main function. To start the log reader, we have to configure a few things:

1. Create a connection to the database, using the Session struct from the ScyllaDB Rust Driver.
2. Specify the keyspace and the table name.
3. Create time bounds for the reader. This step is optional: by default, the reader starts reading from now and continues reading forever. In our case, we are going to read all logs added during the last 6 minutes.
4. Create the factory.
5. Build the log reader.

After creating the log reader, we can await the handle it returns so that our application terminates as soon as the reader finishes.

Now, let’s insert some rows into the table. After inserting 3 rows and running the application, you should see the output:

    Hello, scylla-cdc!
    Hello, scylla-cdc!
    Hello, scylla-cdc!

The application printed one line for each CDC log row consumed. To see how to use CDCRow and save progress, see the full example below.

Full Example

Follow this detailed cdc-rust tutorial, or clone the repository and run the printer example:

    git clone https://github.com/scylladb/scylla-cdc-rust
    cd scylla-cdc-rust
    cargo run --release --bin scylla-cdc-printer -- --keyspace KEYSPACE --table TABLE --hostname HOSTNAME

Where HOSTNAME is the IP address of the cluster.

Getting Started with Java and Go

For a detailed walkthrough with Java and Go examples, see our previous blog, Consuming CDC with Java and Go.

Further reading

In this blog, we have explained what problems the scylla-cdc-rust, scylla-cdc-java, and scylla-cdc-go libraries solve and how to write a simple application with each. If you would like to learn more, check out the links below:

- Replicator example application in the scylla-cdc-java repository. It is an advanced application that replicates a table from one Scylla cluster to another using the CDC log and the scylla-cdc-java library.
- Example applications in the scylla-cdc-go repository. The repository currently contains three examples: “simple-printer”, which prints changes from a particular schema; “printer”, which is the same as the example presented in our previous blog; and “replicator”, a relatively complex application that replicates changes from one cluster to another.
- API reference for scylla-cdc-go. Includes slightly more sophisticated examples which, unlike the example in this blog, cover saving progress.
- CDC documentation. Knowledge about the design of Scylla’s CDC can be helpful in understanding the concepts in the documentation for both the Java and Go libraries. The parts about the CDC log schema and the representation of data in the log are especially useful.
- ScyllaDB Users Slack. We will be happy to answer your questions about CDC on the #cdc channel.

We hope all that talk about consuming data has managed to whet your appetite for CDC! Happy and fruitful coding!
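
A postscript on the “Adding shared state to the consumer” section: the pattern described there (one consumer per stream, a shared counter wrapped in an Arc, and exclusive per-consumer state alongside it) can be sketched with the standard library alone. Everything below is an assumption made for illustration: CountingConsumer, CountingConsumerFactory, and count_rows are hypothetical names, the types are plain structs rather than implementations of the scylla-cdc Consumer and ConsumerFactory traits, and OS threads stand in for Tokio tasks so that the snippet has no dependencies.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a per-stream consumer.
/// This is NOT the scylla-cdc `Consumer` trait, only the same shape.
struct CountingConsumer {
    /// Shared by all consumers; every update needs an atomic operation.
    rows_seen: Arc<AtomicUsize>,
    /// Exclusive to this consumer; updating it needs no synchronization.
    local_rows: usize,
}

impl CountingConsumer {
    /// Called once per (pretend) CDC log row.
    fn consume(&mut self, _row: &str) {
        self.local_rows += 1;
        self.rows_seen.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical stand-in for a consumer factory: it hands every new
/// consumer a clone of the same `Arc`, so they all bump one counter.
struct CountingConsumerFactory {
    rows_seen: Arc<AtomicUsize>,
}

impl CountingConsumerFactory {
    fn new_consumer(&self) -> CountingConsumer {
        CountingConsumer {
            rows_seen: Arc::clone(&self.rows_seen),
            local_rows: 0,
        }
    }
}

/// Runs one consumer per "stream" on its own thread and returns the
/// total number of rows counted through the shared counter.
fn count_rows(streams: usize, rows_per_stream: usize) -> usize {
    let factory = CountingConsumerFactory {
        rows_seen: Arc::new(AtomicUsize::new(0)),
    };

    let handles: Vec<_> = (0..streams)
        .map(|_| {
            // The consumer is moved into the thread, which compiles only
            // because all of its fields are `Send`.
            let mut consumer = factory.new_consumer();
            thread::spawn(move || {
                for _ in 0..rows_per_stream {
                    consumer.consume("row");
                }
                consumer.local_rows
            })
        })
        .collect();

    for handle in handles {
        let local = handle.join().expect("consumer thread panicked");
        assert_eq!(local, rows_per_stream);
    }

    // Joining the threads makes all of their increments visible here.
    factory.rows_seen.load(Ordering::Relaxed)
}

fn main() {
    let total = count_rows(4, 1000);
    println!("rows seen by all consumers: {total}");
}
```

The shared counter is the part that costs something: every row touches the same atomic from several threads, which is the synchronization overhead the note in that section warns about. The local_rows field, by contrast, belongs to a single consumer and is updated without any coordination.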