When ScyllaDB is Overkill vs. DynamoDB
ScyllaDB isn’t for everyone. In some cases, migrating from DynamoDB won’t reduce your costs at all.

ScyllaDB is designed specifically for teams that need predictable ultra-low latency with high throughput. And since one size never fits all, there are inevitably situations where you’d be much better served by a different database. For teams that need relatively high performance and want something simple to use, that “different database” is often DynamoDB.

We’re the first to admit that there are situations where you’d be quite fine with DynamoDB. Sometimes moving to ScyllaDB just doesn’t make sense from a pure cost perspective. Other times, it could lower costs 5-40X. And there are other situations when one database is simply a better technical fit, regardless of costs.

So where is the tipping point if you’re looking at the decision from a cost perspective? Often, it’s somewhere around 10,000 OPS. But the precise answer varies depending on your workload characteristics (read/write ratios) and the amount of data under management. That’s what we’ll focus on in this article.

ScyllaDB and DynamoDB are a lot alike

But first, let’s step back. Why are we comparing ScyllaDB and DynamoDB in the first place? For the past couple of years, we’ve worked with a lot of users moving from DynamoDB to ScyllaDB. That makes sense, given that ScyllaDB and DynamoDB share common goals of high performance and low hassle.

But they’re built fundamentally differently. ScyllaDB’s close-to-the-metal architecture handles millions of ops/sec with predictable single-digit millisecond latencies and lower, predictable costs. Still, you can use the same data model in ScyllaDB as in DynamoDB. And an open source DynamoDB-compatible API (“Alternator”) simplifies migration. You redirect your application to ScyllaDB instead of DynamoDB and it just works like magic (actually, we listen on a specific port that understands the DynamoDB API).
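For a Python application that uses boto3, that redirect can be sketched as follows. This is a minimal sketch under stated assumptions, not ScyllaDB’s documented procedure. The host name `scylla-node-1`, port `8000` as the Alternator port, the region, and the helper names are all hypothetical placeholders. Use whatever address and port your cluster actually exposes for the DynamoDB API. Credentials are still resolved through boto3’s normal configuration.

```python
# Minimal sketch: the same boto3 DynamoDB client, pointed either at AWS
# or at a ScyllaDB Alternator endpoint. Host, port, and region are
# hypothetical placeholders, not values from ScyllaDB documentation.

ALTERNATOR_ENDPOINT = "http://scylla-node-1:8000"  # hypothetical host and port


def dynamodb_client_kwargs(use_scylla: bool, region: str = "us-east-1") -> dict:
    """Return the keyword arguments to pass to boto3.client(...)."""
    kwargs = {"service_name": "dynamodb", "region_name": region}
    if use_scylla:
        # The one-line difference: an explicit endpoint_url that targets
        # the DynamoDB-compatible API instead of AWS.
        kwargs["endpoint_url"] = ALTERNATOR_ENDPOINT
    return kwargs


def make_client(use_scylla: bool):
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3

    return boto3.client(**dynamodb_client_kwargs(use_scylla))
```

The rest of the application keeps calling `get_item`, `put_item`, `query`, and so on exactly as before. Only the client construction changes.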
In most cases, minimal code changes are required. Moreover, both databases provide high performance, high availability, and multi-region support, with a fully managed “database-as-a-service” option.

Can you make use of (at least) a minimally viable ScyllaDB cluster?

One key difference between ScyllaDB and DynamoDB is provisioning clusters versus provisioning tables. When you work with DynamoDB, you provision tables and assign each one its own billing mode or capacity. In ScyllaDB, you provision a group of nodes (VMs, containers, pods, etc.) that collectively manage and distribute data as a cluster. You can route traffic to that cluster with great performance, as long as there is enough capacity to sustain your workload’s needs.

Even for applications with minimal traffic, you still need to provision a full ScyllaDB cluster. The smallest ScyllaDB Cloud cluster runs with 3 nodes. That’s 1) to ensure high availability and 2) to serve strongly consistent (quorum) reads even if one node fails or goes out for maintenance.

But in some cases, even that minimal ScyllaDB cluster might provide more power than you really need. ScyllaDB’s sweet spot is throughput-heavy workloads, given its ability to sustain massive parallel processing at scale. If your workload wouldn’t really benefit from that throughput, ScyllaDB is quite possibly overkill.

The amount of data under management is another factor to consider. ScyllaDB is optimized for high throughput and low latency, and it assumes that most of your data is frequently queried and requires low latency. We achieve this by relying on local SSDs, which provide higher concurrency and lower latency than other storage media. If you have a lot of data under management, your ScyllaDB cluster might require more, or larger, VMs. However, if your throughput isn’t high enough to make good use of that infrastructure because most of your data set is read infrequently, then ScyllaDB is probably overkill from a cost perspective.
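The reason a minimal cluster has three nodes comes down to simple quorum arithmetic. The sketch below assumes a replication factor equal to the node count of that minimal cluster (3) and QUORUM reads and writes. The function names are illustrative only and are not a ScyllaDB API.

```python
# Sketch of the quorum arithmetic behind a 3-node minimum.
# Assumption: replication factor (RF) = number of replicas per row,
# with reads and writes issued at QUORUM consistency.


def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM operation."""
    return replication_factor // 2 + 1


def tolerable_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)


def quorum_overlaps(replication_factor: int) -> bool:
    """True when a QUORUM read must overlap a QUORUM write (R + W > RF),
    so the read sees the latest acknowledged write."""
    q = quorum(replication_factor)
    return q + q > replication_factor


for rf in (1, 2, 3):
    print(
        f"RF={rf}: quorum={quorum(rf)}, "
        f"survives {tolerable_replica_failures(rf)} replica failure(s)"
    )
```

With two replicas, a quorum needs both of them, so a single node going out for maintenance blocks strongly consistent operations. Three is the smallest replica count that tolerates one failure.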
Discovering your specific tipping point

With that reasoning in mind, let’s get more specific. You’re probably curious about what makes the most sense for your own workload and storage requirements. There’s a calculator for that! We built a cost estimation calculator that compares ScyllaDB’s on-demand pricing with DynamoDB’s on-demand pricing, using the same math as AWS. If you’re debating between DynamoDB and ScyllaDB, or thinking of migrating from DynamoDB, these on-demand cost estimates are a fast way to see whether ScyllaDB could be worth exploring.

First off, note the minimums. The lowest number of operations per second (OPS) you can specify is 10K, and the minimum storage size is 1TB. That’s a pretty big hint: ScyllaDB is probably overkill for anything under that.

Second, be careful when you enter your storage utilization. If you’re already using DynamoDB, don’t just copy over your DynamoDB storage utilization. Keep in mind that the utilization DynamoDB reports refers to uncompressed raw data. As you move to another database, your storage utilization will be lower because you will typically be able to achieve at least 50% compression, if not more.

Before you go and plug in your own numbers, let’s walk through how it plays out across two very different scenarios.

Scenario 1: Storage Bound

First, let’s consider a “storage-bound” case. For example, say you’re consuming 250TB of storage with DynamoDB. You would need 18 i3en.24xlarge nodes (which are VERY large instances) to support 250TB in ScyllaDB. But if you factor in ScyllaDB’s compression, which typically reduces storage by 50%, that takes us down to 125TB and requires only 9 i3en.24xlarge nodes. If you look at the cluster capacity area, you’ll see that this cluster can easily sustain close to 3M operations/second (a rather conservative estimate).

Now, if we click the DynamoDB button, you’ll see that our calculator tries to make a ScyllaDB vs. DynamoDB cost comparison for you.
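As a sanity check on the node counts in Scenario 1, the sizing can be roughly approximated by hand. This is a back-of-envelope sketch, not the calculator’s documented model. It assumes a replication factor of 3, that each i3en.24xlarge contributes its 60TB of local NVMe (8 x 7.5TB, per the AWS instance spec), and a hypothetical ~70% maximum disk fill to leave headroom. Those assumptions happen to reproduce the 18- and 9-node figures above, but the calculator’s actual inputs may differ.

```python
import math

# Back-of-envelope node count for the storage-bound scenario.
# All constants are assumptions, not the calculator's documented model:
NODE_DISK_TB = 60.0      # i3en.24xlarge local NVMe: 8 x 7.5 TB
REPLICATION_FACTOR = 3   # assumed RF
MAX_DISK_FILL = 0.70     # hypothetical headroom target per node


def scylla_storage_tb(dynamodb_reported_tb: float, compression_savings: float) -> float:
    """Estimate on-disk size from DynamoDB's uncompressed figure."""
    return dynamodb_reported_tb * (1.0 - compression_savings)


def nodes_needed(dataset_tb: float, min_nodes: int = 3) -> int:
    """Nodes required to hold every replica under the fill target."""
    usable_per_node = NODE_DISK_TB * MAX_DISK_FILL
    raw = math.ceil(dataset_tb * REPLICATION_FACTOR / usable_per_node)
    return max(raw, min_nodes)  # never below the 3-node minimum cluster


print(nodes_needed(scylla_storage_tb(250, 0.0)))  # no compression: 18
print(nodes_needed(scylla_storage_tb(250, 0.5)))  # 50% compression: 9
```

Halving the stored bytes halves the replica footprint, which is why the same 250TB workload drops from 18 to 9 nodes.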
Unfortunately, the calculator tells us to talk to Sales. This indicates ScyllaDB is likely more expensive than DynamoDB here because you have so much storage for so few operations. Still, if you have other non-cost-related reasons to move, it might be worth a discussion.

However, now assume that you decided to store only your most frequently accessed data on ScyllaDB, and that this hot data is 10% of the total 125TB (roughly 13TB). In this case, the ScyllaDB pricing also drops by a factor of roughly 10.

Scenario 2: Write-Heavy

Now, let’s increase the rate of writes, which are more expensive than reads with DynamoDB. If you bump up the throughput, note that the ScyllaDB estimates stay the same anywhere from 200K to 2M writes per second. That’s because, as you move from 200K to 2M writes per second with ScyllaDB, you already have enough hardware to sustain that level of operations. In contrast, DynamoDB pricing keeps increasing as you increase the write throughput.

Other factors to consider

Of course, cost isn’t everything. Even when ScyllaDB seems like a more cost-effective option, moving from DynamoDB doesn’t always make sense. If your use case is heavily dependent on the AWS ecosystem, moving from DynamoDB to ScyllaDB might require considerable refactoring work. I once met with a DynamoDB user who started looking at ScyllaDB to replace DynamoDB, and they had over a thousand Lambdas connected to DynamoDB. That’s a thousand Lambdas you would need to refactor to connect to ScyllaDB.

Also consider the feature set you currently require from DynamoDB. There is not a one-to-one mapping for every DynamoDB feature: multi-item transactions (TransactWriteItems, TransactGetItems) and accounting/capping, for example, are not (yet) available in ScyllaDB.

Accounting and capping is an interesting one. I once met with a user who used DynamoDB’s provisioned throughput to their benefit, throttling and billing their own customers according to it.
Currently, there’s no such thing in ScyllaDB, since we don’t impose any throughput limits by default. In this case, ScyllaDB didn’t make sense.

On the flip side, there are two key situations where it makes sense to consider ScyllaDB even if there’s no impressive cost advantage over DynamoDB: when you need better latency, and when you need more deployment flexibility. If DynamoDB can’t achieve the latency you need for whatever reason, or you want to keep latencies low without the hassle and cost of DAX, ScyllaDB could likely help you address that. And if you need to move your DynamoDB workloads to another cloud or on-prem, ScyllaDB can help you move fast with minimal application refactoring, often just a one-line change. I talk a lot more about both of these situations in the blog post, DynamoDB: When to Move Out.

Rule of thumb

So the bottom line truly is “it depends.” But ScyllaDB could very well be overkill if your workload is under 10K OPS and you don’t expect much throughput growth, you’re satisfied with how DynamoDB meets your latency SLAs, and you have no foreseeable need to move beyond AWS.

Bonus: DynamoDB Cost Optimization Masterclass

If you want more opinions, more strategies, and more options for DynamoDB cost optimization, I encourage you to take a look at this masterclass I recently participated in with Alex DeBrie and Miles Ward…