The State of NoSQL: Trends & Tradeoffs
If you’re hitting performance and scalability plateaus with SQL and are considering a migration to NoSQL, the recent discussion hosted by TDWI is a great place to get up to speed quickly. ScyllaDB VP of Product Tzach Livyatan joined TDWI Senior Research Director James Kobielus to explore some of the most common NoSQL questions that they’ve been hearing recently.
Here are a few highlights from the NoSQL trends and tradeoffs discussed.
What’s driving wide column NoSQL database adoption (e.g., Cassandra, Bigtable, ScyllaDB)?
- What we were calling “Big Data” 10 or 15 years ago has become bigger and bigger. The number of data-generating sensors and devices is only increasing, and people need to store all this data in a way that allows them to make it valuable.
- Databases and many other systems are becoming much more distributed. You have customers all around the world, and all of them want low latency and fast responses. As a result, splitting the data between countries or regions is often critical, but it’s not simple. You might partition the data between countries and have each country access only its own data. But someone might have an account in the US and want to access it while traveling in Europe. That requires global synchronization between all these regions, although it doesn’t always have to be completely synchronous. Asynchronous replication is fine in many cases.
- We’ve also noticed a significant increase in organizations with use cases that require low latency – across domains like gaming, IoT, and media streaming. Wide column databases have proven to be a very effective way to meet low latency expectations, especially at high throughputs.
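The regional partitioning point in the list above can be illustrated with a toy model. This is a minimal sketch, not how any particular database implements replication: the `RegionalStore` class, its method names, and the single shared version counter are all assumptions made for illustration. It shows a write committing in the local region right away and reaching other regions only when asynchronous replication runs.

```python
from collections import deque


class RegionalStore:
    """Toy multi-region store (hypothetical): local commit, async replication."""

    def __init__(self, regions):
        self.data = {region: {} for region in regions}
        self.pending = deque()  # queued (target_region, key, version, value)
        self.version = 0        # simplification: one global version counter

    def write(self, region, key, value):
        # Commit in the caller's region immediately (low local latency).
        self.version += 1
        self.data[region][key] = (self.version, value)
        # Queue the update for every other region instead of waiting for them.
        for other in self.data:
            if other != region:
                self.pending.append((other, key, self.version, value))

    def read(self, region, key):
        entry = self.data[region].get(key)
        return entry[1] if entry else None

    def replicate(self):
        # Drain the queue; keep the newest version if regions disagree.
        while self.pending:
            target, key, version, value = self.pending.popleft()
            current = self.data[target].get(key)
            if current is None or current[0] < version:
                self.data[target][key] = (version, value)


store = RegionalStore(["us", "eu"])
store.write("us", "account:42", {"plan": "premium"})
stale = store.read("eu", "account:42")   # not replicated to "eu" yet
store.replicate()
fresh = store.read("eu", "account:42")   # visible after replication
```

The gap between `stale` and `fresh` is the window a traveling user could notice. Whether that window is acceptable depends on the use case.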
What are some of the tradeoffs of NoSQL?
There are probably thousands of different NoSQL databases out there, all with slightly different approaches to negotiating tradeoffs. NoSQL started as a relaxation of the relational database (RDBMS) model used by Oracle, MySQL, and such. Some properties were relaxed and some properties were gained. For example, transactions were relaxed – NoSQL does not support the same level of transactions as an RDBMS. In exchange, you gain more availability and more distribution of data. Some databases relax the schema (for example, MongoDB, a document database) to give the application more flexibility. There are many tradeoffs in NoSQL: distribution, transactions, latency, scale, flexibility, and more. Different NoSQL databases take different positions on these tradeoffs.
The CAP theorem describes another big tradeoff. If you want your system to stay available during a network partition, you have to give up on another property: full consistency. Imagine that you have two servers in two data centers, located in two different regions. What happens if one goes down, or the link between them fails, and you can’t guarantee that the data is the same on both servers? In this case, databases that prioritize consistency limit availability – they reject reads and writes that they cannot guarantee will be consistent. Databases that prioritize availability continue serving reads and writes, even if the two servers might be out of sync for a short time. In many cases, this is fine. Consider your cable or your Netflix – you always want to have some service, even if in some rare cases you’re not getting the latest update to your watch history.
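The two-server scenario above can be sketched in a few lines. This is a toy model under stated assumptions: `TwoNodeStore`, its `prefer_availability` flag, and the last-write-wins repair in `heal` are hypothetical names and choices for illustration, not the behavior of any specific database. With only two nodes, the consistency-first variant simply refuses all requests while partitioned; real systems usually keep the side that holds a majority of replicas available.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.timestamp = 0


class TwoNodeStore:
    """Toy two-replica store (hypothetical) that favors either C or A."""

    def __init__(self, prefer_availability):
        self.prefer_availability = prefer_availability
        self.replicas = [Replica(), Replica()]
        self.partitioned = False
        self.clock = 0  # simplification: one logical clock for both nodes

    def _check(self):
        # A consistency-first store refuses work it cannot keep in sync.
        if self.partitioned and not self.prefer_availability:
            raise RuntimeError("unavailable: cannot reach the other replica")

    def write(self, node, value):
        self._check()
        self.clock += 1
        # During a partition only the contacted replica receives the write.
        targets = [self.replicas[node]] if self.partitioned else self.replicas
        for replica in targets:
            replica.value, replica.timestamp = value, self.clock

    def read(self, node):
        self._check()
        return self.replicas[node].value

    def heal(self):
        # Last-write-wins repair: copy the newest value to both replicas.
        self.partitioned = False
        newest = max(self.replicas, key=lambda r: r.timestamp)
        value, timestamp = newest.value, newest.timestamp
        for replica in self.replicas:
            replica.value, replica.timestamp = value, timestamp


ap = TwoNodeStore(prefer_availability=True)
ap.write(0, "episode 1")
ap.partitioned = True
ap.write(0, "episode 2")       # accepted, but replica 1 misses it
stale_read = ap.read(1)        # served from the out-of-date replica
ap.heal()
healed_read = ap.read(1)       # in sync again after repair

cp = TwoNodeStore(prefer_availability=False)
cp.partitioned = True
try:
    cp.write(0, "episode 1")
    cp_accepted = True
except RuntimeError:
    cp_accepted = False        # rejected rather than risk inconsistency
```

The availability-first store briefly serves an older watch history entry, which matches the streaming example above. The consistency-first store returns an error instead.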
How does NewSQL (Distributed SQL) compare with NoSQL? Is the distinction between SQL and NoSQL blurring?
When NoSQL was born, many requirements were relaxed to gain availability – for example, transaction support and full SQL (with JOINs, etc.). But in recent years, some databases have managed to combine distribution and, to some extent, availability with support for transactions and full SQL.
However, if you do want to support SQL, you must pay for it – usually in latency. This latency comes from the fact that the nodes of the distributed system need to exchange a lot of messages to reach a consensus, and every message takes time to send and process. The end result is that these databases – some of which are quite cool – still cannot compete with NoSQL’s excellent performance. But you gain things like support for transactions. Again, it’s a tradeoff. PACELC, an extension of the CAP theorem, formalizes this tradeoff: even when there is no network partition, a system must choose between lower latency and stronger consistency.
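A rough way to see where that latency comes from is a simplified model in which a write completes once a required number of replicas have acknowledged it. The function name and the round-trip times below are hypothetical, and the model covers a single round of messages only. Real consensus protocols can involve more than one round, so treat this as a lower bound rather than a benchmark.

```python
def commit_latency_ms(round_trips_ms, acks_required):
    """Latency until `acks_required` replicas have acknowledged a write.

    Simplified model: requests go to all replicas in parallel, so the write
    completes when the k-th fastest acknowledgment arrives.
    """
    if not 1 <= acks_required <= len(round_trips_ms):
        raise ValueError("acks_required must be between 1 and the replica count")
    return sorted(round_trips_ms)[acks_required - 1]


# Hypothetical round-trip times from a coordinator to three replicas:
# one in the same data center and two in remote regions.
round_trips_ms = [1.0, 70.0, 140.0]

one_ack = commit_latency_ms(round_trips_ms, 1)
majority = commit_latency_ms(round_trips_ms, len(round_trips_ms) // 2 + 1)
all_acks = commit_latency_ms(round_trips_ms, len(round_trips_ms))
```

In this model, waiting for a majority ties the write to the second-fastest replica, and waiting for every replica ties it to the slowest one. That is the consistency-versus-latency choice PACELC describes.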
Will a Kubernetes-based cloud native orchestration model become standard for databases in the next few years?
The trend is obvious: Everyone is moving to Kubernetes, and now databases are finally joining the party. Kubernetes was created and designed largely for stateless applications. Running stateful applications on it is still a challenge to some extent, but the market has spoken and everyone is moving to Kubernetes.
For a database that focuses on performance, it can be a challenge to retain that level of performance and low latency on Kubernetes, because Kubernetes adds latency – as does any abstraction layer that you add to the system. It’s certainly not trivial, but we’ve worked on it quite a bit at ScyllaDB and we’ve shown that it is possible.