Augury: Real-Time Insights for the Industrial IoT
Augury stores and serves time-series features from massive streams of IoT data, both for real-time insights, and offline learning and analytics. Learn about Augury’s needs and constraints, their solution evaluation and architecture, and fundamental practices for efficient data modeling, plus get a glimpse into the next-gen architecture at Augury, with a view on time-series feature storage and serving.
Here's a summary of the above use case by Amit Ziv-Kenet, Backend Tech Lead at Augury, and Daniel Barsky, Senior Machine Learning Engineer and Algorithms Tech Lead at Augury.
Using predictive technology, Augury makes machines more reliable by combining two key shifts in the industry: artificial intelligence and the Internet of Things (IoT). The intersection of these trends allows us to provide visibility into the actual condition of equipment, enabling our customers to avoid surprises, make informed reliability decisions, and increase the lifetime value of critical equipment. To accomplish all this, we at Augury deliver accurate and actionable insights into the health and well being of industrial equipment.
By delivering real-time insights and historical analytics about the condition of machines on the manufacturing floor, we help our customers perform the right maintenance at the right time. Beyond just providing alerts, we provide actionable suggestions on the root cause of faults, recommendations on which checks to perform, and where to best focus the maintenance attention needed to keep machines running smoothly and avert catastrophic failures. The results speak for themselves — our customers average 75% fewer breakdowns, 30% lower asset costs, and have 45% higher uptime.
Today, Augury’s continuous diagnostics system can be found in over 7,000 facilities across North America and Europe, monitoring machines in industrial and commercial facilities spanning the Consumer Packaged Goods, Food & Beverage and Pharmaceutical Manufacturing industries.

Augury IoT in Action
The Challenge
We provide sensors to collect machine data, measuring vibration, temperature, ultrasonic, and electromagnetic emissions. Our algorithms, built on traditional vibration analysis knowledge, are tailored to the specific metrics associated with each type of machine. These metrics are augmented by our expertise and data specific to machines and their vulnerabilities.
We initially built our predictive services on top of MongoDB. Yet as our company grew the dataset reached the limits of MongoDB. We realized the need for data infrastructure that could continue to scale horizontally. Our team evaluated Apache Cassandra. While it initially looked promising, our team recognized there was a solution that provided the same type of features and data access without requiring a solution with Apache dependencies, such as Zookeeper. For this reason and more, we began focusing our attention on a Cassandra drop-in alternative: ScyllaDB.
The Solution
Our team conducted a proof of concept (POC) to evaluate ScyllaDB’s architecture and features, focusing on ensuring it could model our machine data properly. We also wanted to ensure support for high volume, offline analytics using Apache Spark and Beam with high levels of parallelism. Given our multi-data center architecture, we needed to demonstrate we could access production data from research environments. The good news? The system requires less management and configuration to keep it running. ScyllaDB just runs smoothly and with minimal configuration.

Augury devices
ScyllaDB’s main benefit for Augury is the ability to access the data in a way that precisely matches our use cases, which we had been unable to do previously. With ScyllaDB, we have yet to find a use case where we needed the data in a certain way, or we needed a certain query, or a certain format of the data, and we weren’t able to achieve that simply using ScyllaDB.
Our team was delighted to discover how ScyllaDB was fast enough to support real-time user interface queries as well. We are currently using ScyllaDB for OLTP-type access with millisecond queries for actually visualizing and providing APIs over our time-series data. At the same time, we use it for our OLAP use cases using the Spark connector with Apache Beam and Telescope. Being able to combine the two types of processing against a single cluster has reduced the cost and complexity of our data infrastructure.
The second largest benefit of adopting ScyllaDB is the reduction in management overhead. Self-hosting can be very expensive and time-consuming. We haven’t encountered any stability issues with ScyllaDB, and the system requires less management and configuration to keep it running. ScyllaDB just runs smoothly and with minimal configuration.
We finished migrating diagnostics and analytics data from MongoDB to ScyllaDB. We currently run a three-node ScyllaDB cluster in each data center, a total of six nodes, with plans to expand in the future.
It’s been a smooth ride so far. With lean DevOps resources, we tried to stay away from infrastructure, which is very heavy and requires loads of ongoing maintenance. ScyllaDB is perfect for us, since we don’t have the dedicated resources to babysit our database.