Book Excerpt: ScyllaDB, a Different Database
What’s so distinctive about ScyllaDB? Read what Bo Ingram (Staff Engineer at Discord) has to say – in this excerpt from the book “ScyllaDB in Action.”
Editor’s note: We’re thrilled to share the following excerpt from Bo Ingram’s informative – and fun! – new book on ScyllaDB: ScyllaDB in Action. You might have already experienced Bo’s expertise and engaging communication style in his blog How Discord Stores Trillions of Messages or ScyllaDB Summit talks How Discord Migrated Trillions of Messages from Cassandra to ScyllaDB and So You’ve Lost Quorum: Lessons From Accidental Downtime. If not, you should 😉
You can purchase the full 325-page book from Manning.com. You can also access a 122-page early-release digital copy for free, compliments of ScyllaDB. The book excerpt includes a discount code for 45% off the complete book.
The following is an excerpt from Chapter 1; it’s reprinted here with permission of the publisher.
***
ScyllaDB is a database — it says it in its name! Users give it data; the database gives it back when asked. This very basic and oversimplified interface isn’t too dissimilar from popular relational databases like PostgreSQL and MySQL. ScyllaDB, however, is not a relational database, eschewing joins and relational data modeling to provide a different set of benefits. To illustrate these, let’s take a look at a fictitious example.
Hypothetical databases
Let’s imagine you’ve just moved to a new town, and as you go to new restaurants, you want to remember what you ate so that you can order it or avoid it next time. You could write it down in a journal or save it in the notes app on your phone, but you hear about a new business model where people remember information you send them. Your friend Robert has just started a similar venture: Robert’s Rememberings.
ROBERT’S REMEMBERINGS
Robert’s business (figure 1.2) is straightforward: you can text Robert’s phone number, and he will remember whatever information you send him. He’ll also retrieve information for you, so you won’t need to remember everything you’ve eaten in your new town. That’s Robert’s job.
Figure 1.2 Robert’s Rememberings has a seemingly simple plan.
The plan works swimmingly at first, but issues begin to appear. Once, you text him, and he doesn’t respond. He apologizes later and says he had a doctor’s appointment. Not unreasonable, you want your friend to be healthy. Another time, you text him about a new meal, and it takes him several minutes to reply instead of his usual instant response. He says that business is booming, and he’s been inundated with requests — response time has suffered. He reassures you and says not to worry, he has a plan.
Figure 1.3 Robert adds a friend to his system to solve problems, but it introduces complications.
Robert has hired a friend to help him out. He sends you the new updated rules for his system. If you only want to ask a question, you can text his friend, Rosa. All updates are still sent to Robert; he will send everything you save to her, so she’ll have an up-to-date copy. At first, you slip up a few times and still ask Robert questions, but it seems to work well. No longer is Robert overwhelmed, and Rosa’s responses are prompt.
One day, you realize that when you asked Rosa a question, she texted back an old review that you had previously overwritten. You message Robert about this discrepancy, worried that your review of the much-improved tacos at Main Street Tacos is lost forever. Robert tells you there was an issue within the system where Rosa hadn’t been receiving messages from Robert but was still able to get requests from customers. Your request hasn’t been lost, and they’re reconciling to get back in sync.
You wanted to be able to answer one question: is the food here good or not? Now, you’re worrying about contacting multiple people depending on whether you’re reading a review or writing a review, whether data is in sync, and whether your friend’s system can scale to satisfy all of their users’ requests. What happens if Robert can’t handle people only saving their information? When you begin brainstorming intravenous energy drink solutions, you realize that it’s time to consider other options.
ABC DATA: A DIFFERENT APPROACH
Your research leads you to another business – ABC Data. They tell you that their system is a little different: they have three people – Alice, Bob, and Charlotte – and any of them can save information or answer questions. They communicate with each other to ensure each of them has the latest data, as shown in figure 1.4. You’re curious what happens if one of them is unavailable, and they say they provide a cool feature: because there are multiple of them, they coordinate within themselves to provide redundancy for your data and increased availability. If Charlotte is unavailable, Alice and Bob will receive the request and answer. If Charlotte returns later, Alice and Bob will get Charlotte back up to speed on the latest changes.
Figure 1.4 ABC Data’s approach is designed to meet the scaling challenges that Robert encountered.
This setup is impressive, but because each request can lead to additional requests, you’re worried this system might get overwhelmed even more easily than Robert’s. This, they tell you, is the beauty of their system. They take the data set and create multiple copies of it. They then divide this redundant data amongst themselves. If they need to expand, they only need to add additional people, who take over some of the existing slices of data. When a hypothetical fourth person, Diego, joins, one customer’s data might be owned by Alice, Charlotte, and Diego, whereas Bob, Charlotte, and Diego might own other data.
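To make the slicing concrete, here is a minimal Python sketch of how customers could be divided among the team when every piece of data is kept by three people. It is an illustration only: the ring layout, the `position` and `owners` helpers, the customer names, and the use of an MD5 hash are all assumptions made for this example, not a description of how ABC Data (or ScyllaDB) actually assigns ownership.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def position(name: str) -> int:
    """Map a name to a stable spot on the ring (MD5 is used only for determinism)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


def owners(key: str, people: list[str], copies: int = 3) -> list[str]:
    """Return the distinct people holding `key`, walking clockwise from the key's spot."""
    ring = sorted(people, key=position)
    spots = [position(p) for p in ring]
    # First person clockwise of the key; wrap around at the end of the ring.
    start = bisect_right(spots, position(key)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(copies, len(ring)))]


team = ["Alice", "Bob", "Charlotte"]
customers = [f"customer-{i}" for i in range(1000)]  # hypothetical customers

# Three people, three copies: everyone holds every customer's data.
before = {c: owners(c, team) for c in customers}

# Diego joins: each customer still has three holders, chosen from four people.
after = {c: owners(c, team + ["Diego"]) for c in customers}

# How many customers each person holds once Diego has joined.
load = Counter(person for held_by in after.values() for person in held_by)
print(load)
```

With three people and three copies, everyone holds everything. Once Diego joins, every customer still has exactly three holders, but each customer is now skipped by exactly one of the four people, so nobody has to hold the entire data set.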
Because they allow you to choose how many people should respond internally for a successful request, ABC Data gives you control over availability and correctness. If you want to always have the most up-to-date data, you can require all three holders to respond. If you want to prioritize getting an answer, even if it isn’t the most recent one, you can require only one holder to respond. You can balance these properties by requiring two holders to respond — you can tolerate the loss of one, but you can ensure that a majority of them have seen the most up-to-date data, so you should get the most recent information.
Figure 1.5 ABC Data’s approach gives us control over availability and correctness.
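The trade-off between requiring one, two, or three holders can be sketched as a toy Python model. Everything here is hypothetical and simplified for illustration: the `Holder` class, the `write` and `read` helpers, and the integer version numbers used to decide which review is newest are assumptions of this sketch, not ScyllaDB's implementation.

```python
from dataclasses import dataclass


@dataclass
class Holder:
    name: str
    version: int = 0        # higher version means a more recent write
    review: str = ""
    available: bool = True


def write(holders, required, version, review):
    """Save the review on every available holder; succeed if enough of them acknowledged."""
    acks = 0
    for h in holders:
        if h.available:
            h.version, h.review = version, review
            acks += 1
    return acks >= required


def read(holders, required):
    """Return the newest review among the first `required` replies, or None if too few reply."""
    replies = [h for h in holders if h.available][:required]
    if len(replies) < required:
        return None
    return max(replies, key=lambda h: h.version).review


alice, bob, charlotte = Holder("Alice"), Holder("Bob"), Holder("Charlotte")
team = [alice, bob, charlotte]

write(team, required=2, version=1, review="tacos: bland")

# Charlotte misses the update but two holders still acknowledge it.
charlotte.available = False
saved = write(team, required=2, version=2, review="tacos: much improved")

# Charlotte returns before she has been brought back up to speed.
charlotte.available = True
ask_order = [charlotte, alice, bob]

one = read(ask_order, required=1)    # fast, but Charlotte's copy is stale
two = read(ask_order, required=2)    # overlaps with a holder that saw the update
three = read(ask_order, required=3)  # everyone replies, newest copy wins

# With Bob away, requiring all three fails while requiring two still works.
bob.available = False
all_with_bob_away = read(ask_order, required=3)
two_with_bob_away = read(ask_order, required=2)
```

In this run, requiring one holder returns the old review, requiring two or three returns the improved one, and once Bob is away only the request requiring all three fails. Requiring two works because any two of the three holders must include at least one of the two who acknowledged the latest write.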
You’ve learned about two imaginary databases here — one that seems straightforward but introduces complexity as requests grow, and another with a more complex implementation that attempts to handle the drawbacks of the first system. Before beginning to contemplate the awkwardness of telling a friend you’re leaving his business for a competitor, let’s snap back to reality and translate these hypothetical databases to the real world.
Real-world databases
Robert’s database is a metaphorical relational database, such as PostgreSQL or MySQL. They’re relatively straightforward to run, fit a multitude of use cases, and are quite performant, and their relational data model has been used in practice for more than 50 years. Very often, a relational database is a safe and strong option. Accordingly, developers tend to default toward these systems. But, as demonstrated, they also have their drawbacks. Availability is often all-or-nothing. Even if you run with a read replica (Rosa, in Robert’s database), losing your primary instance could leave you able to serve reads but not writes. Scalability can also be tricky – a server has a maximum amount of compute resources and memory. Once you hit that, you’re out of room to grow. It is through these drawbacks that ScyllaDB differentiates itself.
The ABC Data system is ScyllaDB. Like ABC Data, ScyllaDB is a distributed database that replicates data across its nodes to provide both scalability and fault tolerance. Scaling is straightforward – you add more nodes. This elasticity in node count extends to queries. ScyllaDB lets you decide how many replicas are required to respond for a successful query, giving your application room to handle the loss of a server.
***
Want to read more from Bo? You can purchase the full 325-page book from Manning.com. Also, you can access a 122-page early-release digital copy for free, compliments of ScyllaDB.