Alan Shimel and Dor Laor on Database Elasticity and AI with ScyllaDB
Alan and Dor chat about high-performance databases & AI trends

Everything about re:Invent 2025 screamed “massive” – from the exhibit hall’s towering booths, to the overflowing keynotes, to product announcements at every turn. ScyllaDB’s “scale fearlessly” message fit in perfectly.

See ScyllaDB’s re:Invent videos

But despite the crowds and chaos, Alan Shimel (founder and CEO of Techstrong Group) and Dor Laor (ScyllaDB co-founder and CEO) found a way to meet for a laid-back chat. Topics ranged from ScyllaDB’s origin story, to OSS, to ScyllaDB’s latest announcements for AI and extreme database elasticity. Read highlights below, or enjoy the full interview:

ScyllaDB AI Use Cases: Vector Scale, Feature Store, AI Stack

Alan: What’s it like on the re:Invent floor? What are the conversations like? What are you hearing?

Dor: There’s certainly no shortage of crowds at the booth. A lot of the conversation is about AI. We’re seeing a surge in AI-related use cases. At this point, about half of the use cases we see with ScyllaDB are directly related to AI.

Alan: Explain that to me. What’s the use case?

Dor: We usually split our AI use cases into three categories. The first is being part of the AI stack itself. During training and serving, the stack needs to access a huge number of objects, and it needs a fast database to do that. In this case, we’re part of the core AI stack: distributed databases handling very high workloads – and these are very high workloads. That can be for large LLM companies, or for much smaller companies that are just starting their AI journey. That’s the first category.

The second category is the feature store. Feature stores are more traditionally associated with machine learning, but they’re still part of the AI world. A feature store lets people classify users, or sometimes agents, automatically. That can be used for recommendations in e-commerce, fraud detection, and a variety of other use cases. In those cases, the feature store needs a fast database to quickly determine how a user is classified and what’s appropriate for them – what they might want to watch, what ad they should see, and so on.

The third category is vector search for running LLMs on private datasets. That’s where RAG comes in, with vector data. We added vector search ourselves, and we’re already seeing a lot of interest. In January, we’ll be going live with the general availability of our RAG and vector store.

Alan: So in essence, they could use ScyllaDB as their vector database. They’re creating small language models or RAG. That’s got to be big… that’s fantastic.

Dor: Our vector search is the most scalable. We can easily run models with a billion objects. Very few vendors can even reach a billion. We can do that while handling hundreds of thousands of requests per second, so we scale to very high numbers. And for people with lower or medium demand – which is most users, with models around 10 million or 100 million objects – we can deliver the best latency at very low price points.
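As an aside, here is roughly what a RAG retrieval call against a ScyllaDB vector store might look like from Python. This is a minimal sketch: the keyspace, table, and embedding dimension are hypothetical, and the vector syntax shown (a vector<float, n> column queried with ORDER BY ... ANN OF) follows the CQL style popularized by Cassandra 5.0, which ScyllaDB’s GA release may refine.

```python
# Minimal sketch of a RAG-style nearest-neighbor lookup against a vector store.
# Assumes CQL-style vector search (vector<float, n> + ORDER BY ... ANN OF);
# ScyllaDB's GA syntax may differ. All schema names here are hypothetical.
# Requires the Python driver: pip install scylla-driver
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("rag")  # hypothetical keyspace

# Hypothetical schema:
#   CREATE TABLE rag.chunks (
#       doc_id    uuid PRIMARY KEY,
#       body      text,
#       embedding vector<float, 384>
#   );

def top_k_chunks(query_embedding: list[float], k: int = 5):
    """Fetch the k stored chunks nearest to the query embedding; the caller
    then stuffs them into the LLM prompt as retrieved context."""
    rows = session.execute(
        "SELECT doc_id, body FROM chunks ORDER BY embedding ANN OF %s LIMIT %s",
        (query_embedding, k),
    )
    return list(rows)
```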
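Similarly, the feature-store pattern Dor describes above boils down to a low-latency, single-partition point read. A minimal sketch, again with a hypothetical table and column names:

```python
# Sketch of a feature-store read path on ScyllaDB (hypothetical schema).
# Requires the Python driver: pip install scylla-driver
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("features")  # hypothetical keyspace

# Hypothetical table:
#   CREATE TABLE features.user_features (
#       user_id   bigint PRIMARY KEY,
#       segment   text,
#       embedding list<float>
#   );
lookup = session.prepare(
    "SELECT segment, embedding FROM user_features WHERE user_id = ?"
)

def classify_user(user_id: int):
    """Single-partition point read: the fast lookup a feature store needs
    before serving a recommendation or scoring a fraud model."""
    row = session.execute(lookup, [user_id]).one()
    if row is None:
        return None  # cold-start user: fall back to default features
    return row.segment, row.embedding

print(classify_user(42))
```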
Alan: That’s fantastic. Look, there are a lot of people saying we’ve scraped everything there is to scrape for these LLMs – that continuing to make generative AI better by just increasing model size or training data is starting to hit diminishing returns. The thinking is that the way forward might be smaller language models and more RAG. Some people even argue we should move away from ML altogether and toward things like world models. But I definitely believe there’s going to be a lot of activity in the SLM and RAG space.

And beyond that, as we build AI for specific use cases, I don’t need the whole internet. I just need the data that matters for that use case – especially if it’s my own proprietary information. I don’t want to put that out there. I want it right here. So I think that’s a huge business. Congratulations.

Dor: Thanks. It’s market demand. It’s not just an opportunity, it’s also a defensive move. If we don’t do it, customers will go elsewhere, to be frank. People now expect the same ease of use they get from LLMs on public internet data when they come to any vendor. They want to ask questions in free text, in a single line, and immediately get the best results – without digging through a complicated UI. That’s the power of LLMs. And sometimes it won’t even be people doing that. It could be agents that come in, automate things, and issue those queries on their behalf.

True Database Elasticity: Scaling Out and In, Fast

Alan: All right, let’s fast-forward past AWS for a second. You have some new announcements coming soon. Share a bit, if you don’t mind.

Dor: Thank you for the opportunity. We’re moving from beta to general availability with ScyllaDB X Cloud, our managed platform. This is the new generation of our core database, delivered as a database-as-a-service, including management and consumption. The unique thing here is our new core architecture, which we call tablets. It’s way more elastic than any other database – or even infrastructure – out there.

Before this, we were okay in terms of scaling clusters out and scaling them back in. We were about average. But there was demand to do it much faster. And frankly, we also compete with DynamoDB. We’re API-compatible with DynamoDB. Up until now, DynamoDB has been the best in the industry at scaling up and down quickly. If your workload changes throughout the day, you don’t want to pay for peak capacity all the time. You want the system to follow usage dynamically. That’s exactly what X Cloud is.

With tablets, we break a very large database – say, a petabyte of data – into five-gigabyte chunks. We can move those chunks around super quickly. That allows us to scale extremely fast. We can increase capacity by four times in about ten minutes. For example, you can go from 500,000 operations per second to two million operations per second in ten minutes.

Alan: And back to 500K?

Dor: That’s right.

Alan: Sometimes with these things, it’s like blowing up a balloon. You know what I mean? It never really goes back to the size it was before you blew it up.

Dor: So with this, we can also go back and shrink. It’s complicated, but it works. User workloads come and go – whether it’s Black Friday or just daily patterns. That leads to big TCO improvements and usability improvements.

It’s also pretty unique. We have a shard-per-core engine. So if you have a machine with 32 cores, you’ll have 32 independent threads in the server. If you have a 64-core machine, you’ll have 64 threads, and it will perform twice as well. Now, let’s say you have a 64-core machine, but you actually need 66 threads. If you only had 64, would you buy another 64-core machine? That’s expensive. Instead, we can mix and match. You can run a 64-core machine together with a small two-vCPU machine side by side. Because of the flexibility of our sharding model, we can combine the two. I haven’t seen any other vendor that can do that. What the user gets is efficiency. They have exactly what they need, without having to buy oversized, expensive servers.
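To make the shard-per-core idea concrete: each core owns an equal slice of the token range, so the number of shards simply tracks the number of cores. The sketch below illustrates that mapping conceptually; it is not ScyllaDB’s actual shard-selection algorithm.

```python
# Conceptual illustration of shard-per-core: every core owns an equal slice
# of the 64-bit token range, so a 64-core node runs 64 independent shards and
# a 2-vCPU node runs 2. Illustrative only; ScyllaDB's real algorithm differs.
def shard_for_token(token: int, shard_count: int) -> int:
    """Map a signed 64-bit partition token onto one of a node's shards."""
    unsigned = token + 2**63           # shift into [0, 2^64)
    return unsigned * shard_count // 2**64

token = 1_234_567_890_123_456_789      # an example partition token
print(shard_for_token(token, 64))      # shard index on a 64-core machine -> 36
print(shard_for_token(token, 2))       # shard index on a 2-vCPU machine  -> 1
```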
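The tablet math Dor sketches is also easy to check: at roughly five gigabytes per tablet, even a petabyte-scale dataset decomposes into a couple hundred thousand independently movable chunks, which is what makes a 4x scale-out in minutes plausible. A back-of-the-envelope illustration (not ScyllaDB’s implementation):

```python
# Back-of-the-envelope view of tablets: split the dataset into ~5 GB chunks
# that can be rebalanced across nodes independently. Conceptual only.
TABLET_SIZE_GB = 5

def tablet_count(dataset_gb: int) -> int:
    # ceil division: every partial chunk still needs a tablet
    return -(-dataset_gb // TABLET_SIZE_GB)

dataset_gb = 1_000_000                   # ~1 PB
print(tablet_count(dataset_gb))          # 200000 movable chunks
print(tablet_count(dataset_gb) // 6)     # ~33333 tablets/node on 6 nodes
print(tablet_count(dataset_gb) // 24)    # ~8333 tablets/node after a 4x scale-out
```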
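Dor also mentioned DynamoDB API compatibility. In practice, that compatibility (ScyllaDB’s Alternator interface) means existing DynamoDB client code can be repointed at a ScyllaDB cluster by swapping the endpoint. In the sketch below, the endpoint URL, credentials, and table name are placeholders for your own deployment:

```python
# Pointing standard boto3 DynamoDB code at ScyllaDB's DynamoDB-compatible
# API (Alternator). Endpoint, credentials, and table name are placeholders.
import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://scylla.example.com:8000",  # your Alternator endpoint
    region_name="us-east-1",                        # required by boto3
    aws_access_key_id="placeholder",
    aws_secret_access_key="placeholder",
)

table = dynamodb.Table("user_events")               # hypothetical existing table
table.put_item(Item={"user_id": "42", "event": "login"})
print(table.get_item(Key={"user_id": "42"}).get("Item"))
```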
Alan: Really, what we’re talking about here is almost a FinOps play. I think that’s where we are now, especially with cloud usage. Look, we’re talking about spending five trillion dollars on data center AI factories. But when I talk to people, what they actually say is: I want to get control of my cloud bill. They want to be more efficient in how they use these resources. That’s why I made the balloon joke – that’s pretty much how the cloud works. It never seems to go back down. People want insight. They want to be able to turn the dial. And they want to ask, how can I do this more efficiently?

Dor: Most databases aren’t that loaded. I’m not talking about spikes. I’m talking about normal daily usage, or overnight usage. Often it’s only 10% or 20% utilized – but you’re paying for the entire thing.

Alan: That was always the promise of the cloud – that elasticity would go up and down. In practice, it mostly just went up.

Tiered Storage at ScyllaDB

Alan: So, what else is new at ScyllaDB?

Dor: We’re also working on things like tiered storage and other technologies to reduce the bill. Normally, we use NVMe for fast storage and performance. It’s also relatively cheap compared to other high-performance storage options. But S3 is cheaper. The problem with S3 is latency. It can be 50 milliseconds, 100 milliseconds, which is prohibitive for many workloads. With tiered storage, we can keep the hot data on fast NVMe and automatically move cold data to S3. That lets us come up with a good solution for common use cases. For example, you might want to keep 30 days of data in ScyllaDB on NVMe, but keep a year of data overall – and still access it through the same API, without having to build a separate access path. That gives users a single API and a very cost-effective solution.

Learn more about what’s next for ScyllaDB at Monster SCALE Summit — free and virtual.
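As a closing footnote to the tiered-storage discussion: the point of “same API” is that the application issues one query regardless of whether rows sit on NVMe or have aged out to S3. A sketch of what that looks like from the client’s side, with a hypothetical time-series table:

```python
# One query path over hot and cold data: the client asks for a time range and
# the database decides which tier serves the rows. Schema is hypothetical.
from datetime import datetime, timedelta, timezone
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("metrics")  # hypothetical keyspace

# Hypothetical table:
#   CREATE TABLE metrics.readings (
#       device_id bigint,
#       ts        timestamp,
#       value     double,
#       PRIMARY KEY (device_id, ts)
#   ) WITH CLUSTERING ORDER BY (ts DESC);
query = session.prepare(
    "SELECT ts, value FROM readings WHERE device_id = ? AND ts >= ?"
)

now = datetime.now(timezone.utc)
# Recent window: likely served entirely from fast local NVMe.
hot = session.execute(query, [1, now - timedelta(days=30)])
# Full year: the very same statement, even if older rows were tiered to S3.
full_year = session.execute(query, [1, now - timedelta(days=365)])
```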