Alan Shimel and Dor Laor on Database Elasticity, ScyllaDB X Cloud

Cynthia Dunlop

Alan and Dor chat about elasticity, 90% storage utilization, powering feature stores and other AI use cases, ScyllaDB’s upcoming vector search release, and much more.

ScyllaDB recently announced ScyllaDB X Cloud: a truly elastic database that supports variable/unpredictable workloads with consistent low latency, plus low costs.

To explore what’s new and how X Cloud impacts development teams, DevOps luminary Alan Shimel recently connected with ScyllaDB Co-founder and CEO Dor Laor for a TechStrongTV interview. Alan and Dor chatted about database elasticity, 90% storage utilization, powering ML feature stores and other AI use cases, ScyllaDB’s upcoming vector search release, and much more.

If you prefer to read rather than watch, here’s a transcript (lightly edited for brevity and clarity) of the core conversation.

What is ScyllaDB X Cloud?

Alan: Dor, you guys recently announced ScyllaDB X Cloud. Tell us about it.

Dor: ScyllaDB is available as a self-managed database (ScyllaDB Enterprise) and also as a fully managed database, a service in the cloud (ScyllaDB Cloud). Recently, we released a new version called ScyllaDB X Cloud. What’s special about it? It allows us to be the most elastic database on the market.

Why would you need elasticity? At first, teams look for a database that offers performance and the high availability needed for mission-critical use cases. Afterwards, when they start using it, they might scale up significantly, and that often costs quite a bit of money.

Sometimes your peak level varies and the usage varies. There are cases where a Black Friday or similar event occurs, and then you need to scale your deployments. This is predictable scale. There’s also unpredictable scale: sometimes you have a big success, traffic surges, and you need to scale the database to serve it.

In many cases, elasticity is needed daily. Some sites have predictable traffic patterns: it increases when people wake up, traffic subsides for a while, then there’s another peak later. If you provision 100% for the peak load, you’re wasting resources through a lot of the day.
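To put a rough number on that waste, here is a minimal sketch. The hourly load profile is made up for illustration (it is not ScyllaDB data), and `idle_fraction` is a hypothetical helper that measures how much of a fixed, peak-sized deployment sits unused over a day.

```python
# Hypothetical hourly load for one day, expressed as a fraction of the daily peak.
# Quiet night, morning ramp, midday peak, afternoon lull, evening peak, late evening.
hourly_load = [0.2] * 6 + [0.6] * 3 + [1.0] * 3 + [0.5] * 6 + [0.9] * 3 + [0.3] * 3


def idle_fraction(load, provisioned=1.0):
    """Share of provisioned capacity-hours that go unused over the period."""
    used = sum(min(x, provisioned) for x in load)
    return 1 - used / (provisioned * len(load))


print(f"Idle capacity when provisioned for peak: {idle_fraction(hourly_load):.1%}")
```

With this invented profile, a bit under half of the provisioned capacity-hours go unused. The real figure depends entirely on the shape of your own traffic.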

However, with traditional databases, keeping your capacity aligned with your actual traffic requires constantly meddling with the number of servers and infrastructure. Many users have petabyte-sized deployments, and moving around petabytes within several hours is an extremely difficult task. X Cloud lets you move data fast. Teams can double or quadruple throughput within minutes. That’s the value of X Cloud.

See ScyllaDB scaling in this short demo:

Elastic Scaling

Alan: You know, in many ways, this whole idea of elasticity feels like déjà vu. I was told this about the cloud in general, right? That was one of the things about being in the cloud: burstable. But in the cloud, it proved to be like a balloon. If you blow up a balloon it stretches out – but then you let the air out, and the balloon is still stretched out. And for so many people, cloud elasticity is just one way. You can always make it bigger, but how many people really do make it smaller? Is this something that’s also true in ScyllaDB? Is it truly that elastic?

Dor: Everything you described is absolutely true. Many times, people just keep on adding more and more data. If you keep adding data and never delete any, we have different capabilities to help with that.

There are normally two reasons why you’d want to scale out and why you’d like to scale in. Number one is storage. So if your storage grows, you will scale out and add resources. If you delete data, it will automatically scale back in again. And our current release has two unique things to address that.

First, it has autoscale with storage. The storage automatically scales, and our compute is bundled together with the storage.

I’ll tell you a secret: at the end of the day, we run servers, and each server has memory and disk and networking and compute all together. That’s one of the reasons why we have really good performance. We bundle compute and storage. So if the amount of data people store decreases, we automatically decrease the number of servers, and we do it with 90% utilization. That means disk usage can go up to 90% of the shared cluster’s capacity. It’s a very high percentage. Previously, we went from 50% to 70%. Now, because we’re very elastic, and we can move data very fast, we can go up to 90%.
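The effect of the utilization target on server count can be shown with simple arithmetic. This is a back-of-the-envelope sketch, not ScyllaDB's sizing formula: the 100 TB dataset and 10 TB of disk per server are assumed values, `servers_needed` is a hypothetical helper, and replication and compaction headroom are deliberately ignored.

```python
import math


def servers_needed(data_tb, disk_per_server_tb, target_utilization):
    """Servers required so that stored data stays at or below the target disk utilization."""
    usable_tb = disk_per_server_tb * target_utilization
    return math.ceil(data_tb / usable_tb)


# Assumed example: 100 TB of data on servers with 10 TB of disk each.
for target in (0.5, 0.7, 0.9):
    print(f"target {target:.0%}: {servers_needed(100, 10, target)} servers")
```

With these made-up numbers, the count drops from 20 servers at a 50% target to 15 at 70% and 12 at 90%.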

The scaling is all automated. When you go to 90%, we automatically provision more servers. If you go below 90% (there is some threshold like 85%), then we automatically decrease the number of servers. And we also do it in a very precise way. For example, assume you have three gigantic servers – we support servers up to 256 CPUs. If you run out of space, you could add another gigantic server, but then your utilization will suddenly be very low. So it’s not ideal to add a big server if you just need another 1% or 2% of utilization.
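The two thresholds form a dead band: scale out at 90%, scale in only once utilization falls below roughly 85%, and do nothing in between so the cluster does not flap. A minimal sketch of that decision follows. The function name and return values are hypothetical, and the 85% figure is only the approximate value mentioned above; X Cloud's actual policy engine is not described in this conversation.

```python
def scaling_decision(used_tb, capacity_tb, scale_out_at=0.90, scale_in_below=0.85):
    """Return 'scale_out', 'scale_in', or 'hold' based on current disk utilization.

    Thresholds are illustrative assumptions taken from the figures in the interview.
    """
    utilization = used_tb / capacity_tb
    if utilization >= scale_out_at:
        return "scale_out"
    if utilization < scale_in_below:
        return "scale_in"
    return "hold"  # inside the dead band: leave the cluster as it is


print(scaling_decision(91, 100))
print(scaling_decision(87, 100))
print(scaling_decision(80, 100))
```

A production controller would presumably also check that utilization stays under the scale-out threshold after a server is removed; that check is left out here to keep the sketch short.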

X Cloud automatically selects the server size for you, and we mix server sizes. So if you only need an extra 5% along with these very big servers, we’re going to add a small, tiny server with two vCPUs next to those other big servers – and we will keep replacing it automatically for you, without you having to worry about it. That translates to value. The total cost of ownership will be low because we are targeting 90% utilization of this capacity.
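One way to picture the mixed-size idea is a "smallest server that covers the shortfall" rule. The sketch below is an assumption about how such a choice could be made, not X Cloud's actual selection logic; the catalog names, vCPU counts, and disk sizes are invented for illustration.

```python
# Hypothetical instance catalog: (name, vCPUs, disk in TB). Not real ScyllaDB Cloud sizes.
CATALOG = [("tiny", 2, 0.5), ("small", 8, 2.0), ("medium", 32, 8.0), ("large", 256, 60.0)]


def pick_server(used_tb, capacity_tb, target=0.90, catalog=CATALOG):
    """Pick the smallest server whose disk brings utilization back to the target.

    Returns None when the cluster is already at or below the target.
    """
    shortfall_tb = used_tb / target - capacity_tb
    if shortfall_tb <= 0:
        return None
    for name, _vcpus, disk_tb in sorted(catalog, key=lambda s: s[2]):
        if disk_tb >= shortfall_tb:
            return name
    # Nothing in the catalog covers the gap alone: fall back to the largest size.
    return max(catalog, key=lambda s: s[2])[0]


# Three large servers (180 TB total) that have just crossed 90% utilization.
print(pick_server(used_tb=162.2, capacity_tb=180.0))
```

In this example the gap is a fraction of a terabyte, so the rule adds the two-vCPU "tiny" server instead of a fourth 256-vCPU one.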

The other reason why you’d want to scale out is sometimes throughput and CPU consumption. This is even easier because we need to move less storage. We let our customers scale in and out multiple times an hour.

ScyllaDB and AI, Vector Search

Alan: Got it, that’s great, very in-depth. Thank you very much for that. You know, Dor, all the news today is “AI agentic, AI generative, AI LLMs…” Underlying all of this, though, is data. They’re training on data, storing data, using data…How has this affected ScyllaDB’s business?

Dor: It definitely drives our business. We have three main pillars related to AI. Number one, there’s traditional machine learning. ScyllaDB is used for machine learning feature stores, for real-time personalization and customization. For example, Tripadvisor is a customer of ours. They use ScyllaDB for a feature store that helps find the best recommendations, deals, and advice for their users.

Number two is for AI use cases that need large, scalable storage underneath. For example, a major vehicle vendor is using ScyllaDB to train their AI model for self-driving cars. Lots of AI use cases need to store and access massive amounts of data, fast.

Third is vector search, which is a core component of RAG and many agentic AI pipelines. ScyllaDB will release a vector search product by the end of the year – right now, it’s in closed beta.

Extreme Automation

Alan: You heard it here! I just want to make sure we hit the main points of the X Cloud offering. With all of these things that you’ve mentioned already, really what are the results? You’ve improved compression, improved streaming, helping to reduce storage and cloud costs…and network too, right? That’s an important piece of the equation as well. The other thing I want to emphasize for our audience is that all this is offered as a “database as a service,” so you don’t have to worry about your infrastructure.

Dor: Absolutely. We have a high amount of automation that acts on behalf of the end user. This isn’t about us doing manual operations on your behalf. The elasticity is all automated and natural, like the cluster is breathing.

Users just set the scaling policies and the cluster takes it from there. Everything runs under the hood, including backups. For example, imagine a customer who runs at 89% utilization and a backup brings them temporarily beyond 90% capacity. The second it crosses the 90% trigger, we will automatically provision more servers to the cluster. Once the backup is complete and the backup snapshot image is uploaded to S3, then the image will be deleted, the cluster will go back below 90% utilization, and we can automatically decrease the number of servers. Everything is automated. It’s really fascinating to see all of those use cases run under the hood.