Augury Chooses ScyllaDB for Ease-of-Use and Efficiency vs Cassandra

March 10, 2022

Daniel Barsky discusses the reasons behind Augury's choice of ScyllaDB over Cassandra.

Transcript

Hello, my name is Daniel Barsky. I'm a Senior Machine Learning Developer at Augury. Augury is building a world where people can rely on the machines that matter to them. We leverage massive amounts of data from multiple sensors and industrial IoT, using machine learning and deep learning to convert that data into actionable insights about the running machines that need to be maintained.

For some time, we had needed to be able to store and retrieve machine contexts going back a long period. We had used some solutions that were not really suitable for the task. Around two years back, we realized that we needed to change the underlying infrastructure to support both getting that context in the stream and doing big data analytics, converting that data into datasets for machine learning models.

We evaluated a few solutions where, aside from the data modeling and queries that we wanted to be able to make, we were really looking for a solution that we could manage ourselves if needed, that we could deploy easily and maintain easily, and would not be dependent on a very specific skill set in terms of infrastructure or DevOps.

So we evaluated Cassandra and several vendor-specific solutions such as DynamoDB and Google Bigtable. We also evaluated time-series databases such as InfluxDB. Each one had its own small kink that was a deal breaker for us in the specific use cases we needed.

Cassandra seemed like the best fit, but a short evaluation made it seem like it was very hard to deploy and maintain. We didn't really have the bandwidth or the manpower to go into that.

Our Chief Architect had encountered ScyllaDB some time back in a technical blog post, and suggested it as a possible alternative. That really sealed the deal for us. We did some internal integration, deployed a small cluster locally in a test environment, and saw that even someone who's not a senior DevOps or infrastructure engineer could maintain a simple cluster, maybe not optimized to the extreme. That really convinced us that ScyllaDB was a good way to go for us. And with all of the added customizations and tweaking, we were able to turn that into a solution that is very efficient and very effective for us.

So in terms of performance, we're getting very good performance. We also didn't plan on incorporating it as the back end for the web dashboard. But pretty early on, we saw that it was exceeding our expectations in terms of performance. We basically got two for the price of one in that context, which was a major win for us. We're also still using it to construct our datasets and train our models using parallel frameworks such as Apache Spark and Apache Beam. So ScyllaDB really plays nicely with anything that enables parallel access, which is also a major win for us. So we keep going strong with ScyllaDB, and I'm sure we'll find a lot more benefits in the future.