Scylla University: New Spark and Kafka Lessons

Scylla University is our free online resource for you to learn and master NoSQL skills. We’re always adding new lessons and updating existing lessons to keep the content fresh and engaging.

We’re also expanding the content to cover data ecosystems, because we understand that your database doesn’t operate in a vacuum. To that end, we recently published two new lessons on Scylla University: “Using Spark with Scylla” and “Using Kafka with Scylla.”

Using Spark with Scylla

Whether you use on-premises hardware or cloud-based infrastructure, Scylla is a solution that offers high performance, scalability, and durability for your data. With Scylla, data is stored in a row-and-column, table-like format that is efficient for transactional workloads. In many cases, we see Scylla used for OLTP workloads.

But what about analytics workloads? Many users these days have standardized on Apache Spark, which accepts everything from columnar file formats like Apache Parquet to row-based formats like Apache Avro. It can also be integrated with transactional databases like Scylla.

By using Spark together with Scylla, users can deploy analytics workloads on the information stored in the transactional system.
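
To give a feel for what this looks like in practice, here’s a minimal sketch of reading a Scylla table into a Spark DataFrame. It assumes the open-source Spark Cassandra Connector is on the classpath (Scylla is CQL-compatible, so the connector works against it); the host, keyspace, table, and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object ScyllaAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    // Spark session pointed at a Scylla node (placeholder hostname).
    val spark = SparkSession.builder()
      .appName("scylla-analytics-sketch")
      .config("spark.cassandra.connection.host", "scylla-node-1")
      .getOrCreate()

    // Load a Scylla table as a DataFrame through the CQL-compatible connector.
    val purchases = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "shop", "table" -> "purchases")) // placeholder names
      .load()

    // Run an analytics-style aggregation over the transactional data.
    purchases.groupBy("customer_id").count().show()

    spark.stop()
  }
}
```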

The new Scylla University lesson “Using Spark with Scylla” covers:

  • An overview of Scylla, Spark, and how they can work together.
  • Scylla and analytics workloads
  • Scylla token architecture, data distribution, hashing, and nodes
  • Spark intro: the driver program, RDDs, and data distribution
  • Considerations for writing and reading data using Spark and Scylla
  • What happens when data is written and which variables are configurable
  • How data is read from Scylla using Spark
  • How to decide whether Spark should be colocated with Scylla
  • Best practices and considerations for configuring Spark to work with Scylla (see the sketch after this list)
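
For example, a handful of the connector’s tuning knobs can be set when the Spark session is created. The sketch below is illustrative only: the values are placeholders, and the exact property names can differ between Spark Cassandra Connector versions, so check them against the version you deploy.

```scala
import org.apache.spark.sql.SparkSession

object SparkScyllaWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-scylla-write-sketch")
      .config("spark.cassandra.connection.host", "scylla-node-1") // placeholder contact point
      .config("spark.cassandra.output.concurrent.writes", "8")    // in-flight writes per Spark task
      .config("spark.cassandra.input.split.sizeInMB", "64")       // read-side partition size; name varies by connector version
      .getOrCreate()

    import spark.implicits._

    // A tiny DataFrame standing in for the output of an analytics job.
    val dailyTotals = Seq(("2021-11-09", 1234L), ("2021-11-10", 2345L))
      .toDF("day", "total_purchases")

    // Append the results into a pre-created Scylla table (placeholder names).
    dailyTotals.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "shop", "table" -> "daily_totals"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```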

Using Kafka with Scylla

This lesson provides an intro to Kafka and covers some basic concepts. Apache Kafka is an open-source distributed event streaming system. It allows you to:

  • Ingest data from a multitude of different systems, such as databases, services, microservices, or other software applications
  • Store that data for future reads
  • Process and transform the incoming streams in real time
  • Consume the stored data stream
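
To make the ingest and consume steps above concrete, here’s a minimal round trip using the plain Kafka client API from Scala. It assumes a broker is already running on localhost:9092; the topic, key, and group names are placeholders.

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.jdk.CollectionConverters._

object KafkaRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val topic = "user-events"        // placeholder topic
    val bootstrap = "localhost:9092" // assumes a local broker

    // Ingest: produce a single JSON event into the topic.
    val producerProps = new Properties()
    producerProps.put("bootstrap.servers", bootstrap)
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](producerProps)
    producer.send(new ProducerRecord[String, String](topic, "user-42", """{"action":"login"}"""))
    producer.close()

    // Consume: read the stored stream back from the beginning.
    val consumerProps = new Properties()
    consumerProps.put("bootstrap.servers", bootstrap)
    consumerProps.put("group.id", "sketch-consumer")
    consumerProps.put("auto.offset.reset", "earliest")
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](consumerProps)
    consumer.subscribe(List(topic).asJava)
    consumer.poll(Duration.ofSeconds(5)).asScala.foreach { record =>
      println(s"${record.key} -> ${record.value}")
    }
    consumer.close()
  }
}
```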

Some common use cases for Kafka are:

  • Message broker (similar to RabbitMQ and others)
  • Serve as the “glue” between different services in your system
  • Provide replication of data between databases/services
  • Perform real-time analysis of data (e.g., for fraud detection)

The Scylla Sink Connector is a Kafka Connect connector that reads messages from a Kafka topic and inserts them into Scylla. It supports different data formats (Avro, JSON), it can scale across many Kafka Connect nodes, it has at-least-once semantics, and it periodically saves its current offset in Kafka.
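
As an illustration of how a Kafka Connect connector like this is typically deployed, the sketch below registers a sink connector through Kafka Connect’s standard REST API from Scala. The connector class name and the scylladb.* property keys are assumptions modeled on typical kafka-connect-scylladb configurations, and all addresses and names are placeholders, so verify everything against the connector’s own documentation before use.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RegisterScyllaSinkSketch {
  def main(args: Array[String]): Unit = {
    // Connector class and property keys below are assumptions; check the
    // Scylla Sink Connector documentation for the exact names.
    val connectorJson =
      """{
        |  "name": "scylla-sink-sketch",
        |  "config": {
        |    "connector.class": "io.connect.scylladb.ScyllaDbSinkConnector",
        |    "tasks.max": "2",
        |    "topics": "user-events",
        |    "scylladb.contact.points": "scylla-node-1",
        |    "scylladb.keyspace": "kafka_sink",
        |    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        |    "value.converter": "org.apache.kafka.connect.json.JsonConverter"
        |  }
        |}""".stripMargin

    // Kafka Connect exposes a REST API; POST /connectors registers a connector.
    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:8083/connectors")) // default Connect REST port
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode()}: ${response.body()}")
  }
}
```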

The Scylla University lesson also provides a brief overview of Change Data Capture (CDC) and the Scylla CDC Source Connector. To learn more about CDC, check out this lesson.

The Scylla CDC Source Connector is a Kafka Connect connector that reads messages from a Scylla table (with Scylla CDC enabled) and writes them to a Kafka topic. It works seamlessly with standard Kafka converters (JSON, Avro). The connector can scale horizontally across many Kafka Connect nodes. Scylla CDC Source Connector has at-least-once semantics.
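
A prerequisite for the source connector is that CDC is enabled on the table being tracked. Here is a minimal sketch of doing that over CQL from Scala using the DataStax Java driver (which works with Scylla); the keyspace and table names are placeholders, and it assumes a node reachable at the driver’s default address.

```scala
import com.datastax.oss.driver.api.core.CqlSession

object EnableCdcSketch {
  def main(args: Array[String]): Unit = {
    // With no explicit contact points, the driver defaults to 127.0.0.1:9042.
    val session = CqlSession.builder().build()

    // Enabling CDC is a per-table CQL option (placeholder keyspace/table).
    session.execute("ALTER TABLE shop.purchases WITH cdc = {'enabled': true}")

    session.close()
  }
}
```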

The lesson includes demos for quickly starting Kafka, using the Scylla Sink Connector, viewing changes on a table with CDC enabled, and downloading, installing, configuring, and using the Scylla CDC Source Connector.

To learn more about using Spark with Scylla and about Kafka and Scylla, check out the full lessons on Scylla University. These include quiz questions and hands-on labs.

Scylla University LIVE – Fall Event (November 9th and 10th)

Following the success of our previous Scylla University LIVE events, we’re hosting another event in November! We’ll run the live sessions in two different time zones to better support our global community of users. The November 9th training is scheduled at a time convenient for North and South America; the November 10th sessions repeat the same content at times better suited to users in Europe and Asia.

As a reminder, Scylla University LIVE is a FREE, half-day, instructor-led training event, with training sessions from our top engineers and architects. It will include sessions that cover the basics and how to get started with Scylla, as well as more advanced topics and new features. Following the sessions, we will host a roundtable discussion where you’ll have the opportunity to talk with Scylla experts and network with other users.

The event is held online; participants who complete the LIVE training will receive a certificate of completion.

REGISTER FOR SCYLLA UNIVERSITY LIVE

Next Steps

If you haven’t done so yet, register a user account in Scylla University and start learning. It’s free!

Join the #scylla-university channel on our community Slack for more training-related updates and discussions.