Beyond Apache Cassandra

Cynthia Dunlop

ScyllaDB is no longer “just” a faster Cassandra.

In 2008, Apache Cassandra set a new standard for database scalability. Born to support Facebook’s Inbox Search, it has since been adopted by tech giants like Uber, Netflix, and Apple – where it’s run by experts who also serve as Cassandra contributors (alongside DataStax/IBM). And as its adoption scaled, Cassandra remained true to its core mission of scaling on commodity hardware with high availability.

But what about performance? Simplicity? Efficiency? Elasticity?

In 2015, ScyllaDB was born to go beyond Cassandra’s suboptimal resource utilization. Fresh from creating KVM and hacking the Linux kernel, the founders believed that their low-level engineering approach could squeeze considerably more power from the underlying infrastructure. The timing was ideal: just a year earlier, Netflix had published their numbers showing how to push Apache Cassandra to 1 million write RPS. This was an impressive feat, but one that required significant infrastructure investments and tuning efforts.

The idea was quite simple (in theory, at least): take Apache Cassandra’s scalable architecture and reimplement it close to the metal while keeping wire protocol compatibility. Not relying on Java meant less latency variability (plus no stop the world pauses), while a unique shard-per-core architecture maximized servers’ throughput even under heavy system load. To prevent contention, everything was made asynchronous, and all these optimizations were paired with autonomous internal schedulers for minimal operational overhead.

That was 10 years ago. While I can’t speak to Cassandra’s current direction, ScyllaDB evolved quite significantly since then – shifting from “just” a faster Cassandra alternative to a database with its own identity and unique feature set.

Spoiler: In this video, I walk you through some key differences between ScyllaDB and how it differs from Apache Cassandra. I discuss the differences in performance, elasticity, and capabilities such as workload prioritization. You can see how ScyllaDB maps data per CPU core, scales in parallel, and de-risks topology changes—allowing it to handle millions of OPS with predictable low latencies (and without constant tuning and babysitting).

ScyllaDB’s Evolution

The first generation of ScyllaDB was all about raw performance. That’s when we introduced the shard-per-core asynchronous architecture, row-based cache, and advanced schedulers that achieve predictable low latencies.

ScyllaDB’s second generation aimed for feature parity with Cassandra, but we actually went beyond that. For example, we introduced our Materialized views and production-ready Global Secondary Indexes (something that Cassandra still flags as experimental). Likewise, ScyllaDB also introduced support for local secondary indexes in that same year; those were just introduced in Cassandra 5 (after at least three different indexing implementations). Moreover, our Paxos implementation for lightweight transactions eliminated much of the overhead and limitations in Cassandra’s alternative implementation.

The third generation marked our shift to the cloud, along with continued innovation. This is when ScyllaDB Alternator—our DynamoDB-compatible API—was introduced. We added support for ZSTD compression in 2020 (Cassandra only adopted it late in 2021). During this period, we dramatically improved repair speeds with row-level repair and introduced workload prioritization (more on this in the next section).

The fourth generation of ScyllaDB emerged around the time AWS announced their i3en instance family, with high-density nodes holding up to 60TB of data (something Cassandra still struggles to handle effectively). During this period, we introduced the Incremental Compaction Strategy (ICS), allowing users to utilize up to 70% of their storage before scaling out. This later evolved into a hybrid compaction strategy (and we now support 90% storage utilization).

We also introduced Change Data Capture (CDC) with a fundamentally different approach from Cassandra’s. And we significantly extended the CQL protocol with concepts such as shard-awareness, BYPASS CACHE, per-query configurable TIMEOUTs, and much more.

Finally, we arrive at the fifth generation of ScyllaDB, which is still unfolding. This phase represents our path toward strong consistency and elasticity with Raft and Tablets. For more about the significance of this, read on…

Capabilities That Set ScyllaDB Apart

Our engineers have introduced lots of interesting features over the past decade. Based on my interactions with former Cassandra users, I think these are the most interesting to discuss here.

Tablets Data Distribution

Each ScyllaDB table is split into smaller fragments (“tablets”) to evenly distribute data and load across the system. Tablets bring elasticity to ScyllaDB, allowing you to instantly double, triple, or even 10x your cluster size to accommodate unpredictable traffic surges. They also enable more efficient use of storage, reaching up to 90% utilization. Since teams can quickly scale out in response to traffic spikes, they can satisfy latency SLAs without needing to overprovision “just in case.”

Raft-based Strong Consistency for Metadata

Raft introduces strong consistency to ScyllaDB’s metadata. Gone are the days when a schema change could push your cluster into disagreement or you’d lose access because you forgot to update the replication factor of your authentication keyspace (issues that still plague Cassandra).

Workload Prioritization

Workload prioritization allows you to consolidate multiple workloads under a single cluster, each with its own SLA. Basically, it controls how different workloads compete for system resources. Teams use it to prioritize urgent application requests that require immediate response times versus others that can tolerate slighter delays (e.g., large scans). Common use cases include balancing real-time vs batch processing, splitting writes from reads, and workload/infrastructure consolidation.

Repair-based Operations

Repair-based operations ensure your cluster data stays in sync, even during topology changes. This addresses a long-standing data consistency flaw in Apache Cassandra, where operations like replacing failed nodes can result in data loss. ScyllaDB also fully eliminates the problem of data resurrection, thanks to repair-based tombstone garbage collection.

Incremental Compaction

Incremental compaction (ICS) has been the default compaction strategy in ScyllaDB for over five years. ICS greatly reduces the temporary space amplification, resulting in more disk space being available for storing user data – and that eliminates the typical requirement of 50% free space in your drive. There is no comparable Cassandra feature. Cassandra just recently introduced Unified Compaction, which has yet to prove itself.

Row-based Cache

ScyllaDB’s row-based cache is also unique. It is enabled by default and requires no manual tuning. With the BYPASS CACHE extension, you can prevent cache pollution by keeping important items from being invalidated. Additionally, SSTable index caching significantly reduces I/O access time when fetching data from disk.

Per-shard Concurrency Limits and Rate Limiters

ScyllaDB includes per-shard concurrency limits and rate limiters per partition to protect against unexpected spikes. Whether dealing with a misbehaving client or a flood of requests to a specific key, ScyllaDB ensures resilience where Cassandra often falls short.

DynamoDB Compatibility

ScyllaDB also offers a DynamoDB-compatible layer, further distancing itself from its Apache Cassandra origins. This lets teams run their DynamoDB workloads on any cloud or on-prem – without code changes, and with 50% lower cost. This has helped quite a few teams consolidate multiple workloads on ScyllaDB.

What’s Next?

At the recent Monster SCALE Summit, CEO/co-founder Dor Laor shared a peek at what’s next for ScyllaDB. A few highlights…

Ready now (see this blog post and product page for details):

The ability to safely run at 90% storage utilization
Support for clusters with mixed instance type nodes
Dynamic provisioning and flex credit

Short-term:

Vector search
Strongly consistent tables
Fault injection service
Transparent repairs
Object and tiered storage
Raft for strongly consistent tables

Longer-term

Multi-key transactions
Analytics and transformations with UDFs
Automated large partition balancing
Immutable infrastructure for greater stability and reliability
A replication mode for more flexible and efficient infrastructure changes

For details, watch the complete talk here:

To close, ScyllaDB is faster than Cassandra (I’ll share our latest benchmark results here soon). But both ScyllaDB and Cassandra have evolved to the point that ScyllaDB is no longer “just” a faster Cassandra. We’ve evolved beyond Cassandra. If your project needs more predictable performance – and/or could benefit from the elasticity, efficiency, and simplicity optimizations we’ve been focusing on for years now – you might also want to consider evolving beyond Cassandra.