Scylla University: New Spark and Kafka Lessons

Scylla University is our free online resource for you to learn and master NoSQL skills. We’re always adding new lessons and updating existing lessons to keep the content fresh and engaging.

We’re also expanding the content to cover data ecosystems, because we understand that your database doesn’t operate in a vacuum. To that end, we recently published two new lessons on Scylla University: “Using Spark with Scylla” and “Using Kafka with Scylla.”

Using Spark with Scylla

Whether you use on-premises hardware or cloud-based infrastructure, Scylla is a solution that offers high performance, scalability, and durability for your data. With Scylla, data is stored in a row-and-column, table-like format that is efficient for transactional workloads. In many cases, we see Scylla used for OLTP workloads.

But what about analytics workloads? Many users these days have standardized on Apache Spark, which accepts everything from columnar file formats like Apache Parquet to row-based formats like Apache Avro. It can also be integrated with transactional databases like Scylla.

By using Spark together with Scylla, users can deploy analytics workloads on the information stored in the transactional system.
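
To give a feel for what this looks like in practice, here’s a minimal sketch of reading a Scylla table into a Spark DataFrame. It assumes the open-source Spark Cassandra Connector is on the classpath (Scylla is CQL-compatible, so the connector works against it); the host, keyspace, table, and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object ScyllaAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    // Spark session pointed at a Scylla node (placeholder hostname).
    val spark = SparkSession.builder()
      .appName("scylla-analytics-sketch")
      .config("spark.cassandra.connection.host", "scylla-node-1")
      .getOrCreate()

    // Load a Scylla table as a DataFrame through the CQL-compatible connector.
    val purchases = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "shop", "table" -> "purchases")) // placeholder names
      .load()

    // Run an analytics-style aggregation over the transactional data.
    purchases.groupBy("customer_id").count().show()

    spark.stop()
  }
}
```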

The new Scylla University lesson “Using Spark with Scylla” covers:

  • An overview of Scylla, Spark, and how they can work together.
  • Scylla and analytics workloads
  • Scylla token architecture, data distribution, hashing, and nodes
  • Spark intro: the driver program, RDDs, and data distribution
  • Considerations for writing and reading data using Spark and Scylla
  • What happens when data is written and which variables are configurable
  • How data is read from Scylla using Spark
  • How to decide whether Spark should be colocated with Scylla
  • Best practices and considerations for configuring Spark to work with Scylla (see the sketch after this list)
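
For example, a handful of the connector’s tuning knobs can be set when the Spark session is created. The sketch below is illustrative only: the values are placeholders, and the exact property names can differ between Spark Cassandra Connector versions, so check them against the version you deploy.

```scala
import org.apache.spark.sql.SparkSession

object SparkScyllaWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-scylla-write-sketch")
      .config("spark.cassandra.connection.host", "scylla-node-1") // placeholder contact point
      .config("spark.cassandra.output.concurrent.writes", "8")    // in-flight writes per Spark task
      .config("spark.cassandra.input.split.sizeInMB", "64")       // read-side partition size; name varies by connector version
      .getOrCreate()

    import spark.implicits._

    // A tiny DataFrame standing in for the output of an analytics job.
    val dailyTotals = Seq(("2021-11-09", 1234L), ("2021-11-10", 2345L))
      .toDF("day", "total_purchases")

    // Append the results into a pre-created Scylla table (placeholder names).
    dailyTotals.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "shop", "table" -> "daily_totals"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```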

Using Kafka with Scylla

This lesson provides an intro to Kafka and covers some basic concepts. Apache Kafka is an open-source distributed event streaming system. It allows you to:

  • Ingest data from a multitude of different systems, such as databases, services, microservices, or other software applications
  • Store that data for future reads
  • Process and transform the incoming streams in real time
  • Consume the stored data stream
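
To make the ingest and consume steps above concrete, here’s a minimal round trip using the plain Kafka client API from Scala. It assumes a broker is already running on localhost:9092; the topic, key, and group names are placeholders.

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.jdk.CollectionConverters._

object KafkaRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val topic = "user-events"        // placeholder topic
    val bootstrap = "localhost:9092" // assumes a local broker

    // Ingest: produce a single JSON event into the topic.
    val producerProps = new Properties()
    producerProps.put("bootstrap.servers", bootstrap)
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](producerProps)
    producer.send(new ProducerRecord[String, String](topic, "user-42", """{"action":"login"}"""))
    producer.close()

    // Consume: read the stored stream back from the beginning.
    val consumerProps = new Properties()
    consumerProps.put("bootstrap.servers", bootstrap)
    consumerProps.put("group.id", "sketch-consumer")
    consumerProps.put("auto.offset.reset", "earliest")
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](consumerProps)
    consumer.subscribe(List(topic).asJava)
    consumer.poll(Duration.ofSeconds(5)).asScala.foreach { record =>
      println(s"${record.key} -> ${record.value}")
    }
    consumer.close()
  }
}
```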

Some common use cases for Kafka are:

  • Message broker (similar to RabbitMQ and others)
  • Serve as the “glue” between different services in your system
  • Provide replication of data between databases/services
  • Perform real-time analysis of data (e.g., for fraud detection)

The Scylla Sink Connector is a Kafka Connect connector that reads messages from a Kafka topic and inserts them into Scylla. It supports different data formats (Avro, JSON), it can scale across many Kafka Connect nodes, it has at-least-once semantics, and it periodically saves its current offset in Kafka.
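
As an illustration of how a Kafka Connect connector like this is typically deployed, the sketch below registers a sink connector through Kafka Connect’s standard REST API from Scala. The connector class name and the scylladb.* property keys are assumptions modeled on typical kafka-connect-scylladb configurations, and all addresses and names are placeholders, so verify everything against the connector’s own documentation before use.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RegisterScyllaSinkSketch {
  def main(args: Array[String]): Unit = {
    // Connector class and property keys below are assumptions; check the
    // Scylla Sink Connector documentation for the exact names.
    val connectorJson =
      """{
        |  "name": "scylla-sink-sketch",
        |  "config": {
        |    "connector.class": "io.connect.scylladb.ScyllaDbSinkConnector",
        |    "tasks.max": "2",
        |    "topics": "user-events",
        |    "scylladb.contact.points": "scylla-node-1",
        |    "scylladb.keyspace": "kafka_sink",
        |    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        |    "value.converter": "org.apache.kafka.connect.json.JsonConverter"
        |  }
        |}""".stripMargin

    // Kafka Connect exposes a REST API; POST /connectors registers a connector.
    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:8083/connectors")) // default Connect REST port
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode()}: ${response.body()}")
  }
}
```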

The Scylla University lesson also provides a brief overview of Change Data Capture (CDC) and the Scylla CDC Source Connector. To learn more about CDC, check out this lesson.

The Scylla CDC Source Connector is a Kafka Connect connector that reads messages from a Scylla table (with Scylla CDC enabled) and writes them to a Kafka topic. It works seamlessly with standard Kafka converters (JSON, Avro). The connector can scale horizontally across many Kafka Connect nodes. Scylla CDC Source Connector has at-least-once semantics.
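
A prerequisite for the source connector is that CDC is enabled on the table being tracked. Here is a minimal sketch of doing that over CQL from Scala using the DataStax Java driver (which works with Scylla); the keyspace and table names are placeholders, and it assumes a node reachable at the driver’s default address.

```scala
import com.datastax.oss.driver.api.core.CqlSession

object EnableCdcSketch {
  def main(args: Array[String]): Unit = {
    // With no explicit contact points, the driver defaults to 127.0.0.1:9042.
    val session = CqlSession.builder().build()

    // Enabling CDC is a per-table CQL option (placeholder keyspace/table).
    session.execute("ALTER TABLE shop.purchases WITH cdc = {'enabled': true}")

    session.close()
  }
}
```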

The lesson includes demos for quickly starting Kafka, using the Scylla Sink Connector, viewing changes on a table with CDC enabled, and downloading, installing, configuring, and using the Scylla CDC Source Connector.

To learn more about using Spark with Scylla and about Kafka and Scylla, check out the full lessons on Scylla University. These include quiz questions and hands-on labs.

Scylla University LIVE – Fall Event (November 9th and 10th)

Following the success of our previous Scylla University LIVE events, we’re hosting another event in November! We’ll run the live sessions in two different time zones to better support our global community of users. The November 9th training is scheduled at a time convenient for North and South America; the November 10th sessions repeat the same content at times better suited to users in Europe and Asia.

As a reminder, Scylla University LIVE is a FREE, half-day, instructor-led training event, with training sessions from our top engineers and architects. It will include sessions that cover the basics and how to get started with Scylla, as well as more advanced topics and new features. Following the sessions, we will host a roundtable discussion where you’ll have the opportunity to talk with Scylla experts and network with other users.

The event is held online; participants who complete the LIVE training will receive a certificate of completion.

REGISTER FOR SCYLLA UNIVERSITY LIVE

Next Steps

If you haven’t done so yet, register a user account in Scylla University and start learning. It’s free!

Join the #scylla-university channel on our community Slack for more training-related updates and discussions.