Natura: The Short and Straight Road That Leads from Cassandra to ScyllaDB
Natura is a multibillion-dollar Brazilian cosmetics company that prides itself on its ecological friendliness, bioethics and sustainable business practices. Its operations now span over 70 countries, 3,200 stores, 17,000 employees, and a human network of 1.8 million sales consultants. In this presentation, Felipe Moz, Big Data Engineer at Natura, discusses the technical and business drivers behind Natura’s decision to migrate from DataStax Cassandra to ScyllaDB, the business scenarios involved, and the expectations and results.
Natura is a multibillion-dollar Brazil-based consumer beauty products company that prides itself on its ecological friendliness, bioethics and sustainable business practices. Its operations span over 70 countries, 3,200 stores, 17,000 employees, and a human network of 1.8 million sales consultants. These operations support sales to over 100 million consumers.
The global powerhouse comprises three major brands: Natura, founded in Brazil; Aēsop, founded in Australia; and The Body Shop, founded in the United Kingdom.
Supporting Natura’s global operations requires managing a tremendous amount of consumer data. Felipe Moz, Big Data Engineer at Natura, took the time at ScyllaDB Summit 2018 to describe Natura’s corporate data infrastructure, and why they migrated from DataStax Enterprise (a proprietary version of Apache Cassandra) to ScyllaDB to continue scaling their business.
Natura’s architecture is front-ended with NGINX and Node.js. Data from user activity on their website is streamed over Apache Kafka into Apache Spark, which processes about 160,000 RDDs of streaming data and runs around 40 “long batch” jobs on a daily basis.
In their architecture, ScyllaDB replaced DSE for key-value data, while MongoDB is still used for document-oriented data. DSE was replaced to avoid JVM issues; Natura had suffered significant performance problems with it.
Natura's Data-Intensive Architecture
Natura’s Big Data Architecture, deployed and managed using Docker and Kubernetes, includes NGINX, Apache Kafka and Apache Spark, NoSQL databases including MongoDB and ScyllaDB, as well as a Talend integration to an Oracle SQL database.
They aren’t our provider. They are a part of our team.
— Felipe Moz, Natura
Beyond the product aspects of ScyllaDB, Felipe emphasized the value of ScyllaDB’s enterprise support. “They know our use case and data modeling.” He was also glad to be able to get direct access to developers when needed through ScyllaDB’s Slack channel.
Then Felipe got down to the bottom line, favorably comparing the cost of ScyllaDB versus DataStax Enterprise. Provisioning hardware for DataStax Enterprise on Microsoft Azure required five DS14 v2 servers, which cost $2,000 per node for 744 hours (equivalent to a 31-day month), resulting in a monthly cost of $10,000, and thus an annual cost of over $120,000 for the required hardware. With ScyllaDB running on AWS i3.4xlarge instances, the monthly cost per node dropped to $913.54, making the monthly hardware costs around $4,600, annualized to less than $55,000. This alone saved Natura over half their annual hardware expenditure.
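The cost comparison above can be checked with a few lines of arithmetic. The sketch below uses only the per-node prices quoted in the talk. The ScyllaDB node count of five is an assumption, inferred from the quoted monthly total of about $4,600 divided by $913.54 per node, and the variable and function names are purely illustrative.

```python
# Per-node monthly prices (744 hours) as quoted in the talk.
DSE_NODES = 5
DSE_MONTHLY_PER_NODE = 2000.00

# Assumption: five nodes, inferred from ~$4,600 / $913.54 per node.
SCYLLA_NODES = 5
SCYLLA_MONTHLY_PER_NODE = 913.54


def annual_cost(nodes: int, monthly_per_node: float) -> float:
    """Annualized hardware cost for a cluster of identical nodes."""
    return nodes * monthly_per_node * 12


dse_annual = annual_cost(DSE_NODES, DSE_MONTHLY_PER_NODE)
scylla_annual = annual_cost(SCYLLA_NODES, SCYLLA_MONTHLY_PER_NODE)

# Fraction of the original annual hardware spend that was saved.
savings_fraction = 1 - scylla_annual / dse_annual

print(f"DSE annual:      ${dse_annual:,.2f}")
print(f"ScyllaDB annual: ${scylla_annual:,.2f}")
print(f"Savings:         {savings_fraction:.1%}")
```

Under these assumptions the savings come out slightly above half of the original hardware spend, which matches the claim in the talk.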
On top of the cost savings, Felipe noted significant performance benefits. Batch processing times on ScyllaDB dropped on average to 10% of what they used to take on DataStax. In some cases, batch jobs that took 6 hours to run on DSE took less than 10 minutes to run on ScyllaDB (1/36th, or about 2.8% of the time).
For streaming analytics, Spark jobs that used to take 76 ms to complete on DSE were accomplished in as little as 6 ms on ScyllaDB.
And for pure database performance, p95 write latencies dropped from 220 milliseconds (ms) on DSE to around 500 microseconds (μs) on ScyllaDB.
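To put the three performance figures on a common footing, the sketch below converts each before/after pair into a speedup factor. It uses only the numbers quoted above; since "less than 10 minutes" and "as little as 6 ms" are best-case bounds, the batch and streaming ratios should be read as best cases rather than averages. The helper name `speedup` is illustrative.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster 'after' is than 'before' (same units for both)."""
    return before / after


# Batch job: 6 hours on DSE vs. under 10 minutes on ScyllaDB (in minutes).
batch_speedup = speedup(6 * 60, 10)

# Streaming Spark job: 76 ms on DSE vs. as little as 6 ms on ScyllaDB.
streaming_speedup = speedup(76, 6)

# p95 write latency: 220 ms on DSE vs. ~500 microseconds on ScyllaDB.
# Both values are expressed in microseconds before dividing.
write_speedup = speedup(220 * 1000, 500)

# Share of the original batch run time that remains after the migration.
batch_time_fraction = 10 / (6 * 60)

print(f"Batch:     {batch_speedup:.0f}x ({batch_time_fraction:.1%} of the time)")
print(f"Streaming: {streaming_speedup:.1f}x")
print(f"p95 write: {write_speedup:.0f}x")
```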
ScyllaDB provided Natura with performance an order of magnitude greater than DataStax Enterprise for half the cloud server cost.