How ScyllaDB Helped an AdTech Company Focus on Core Business
Editor’s Note: This article was originally published on The New Stack.
GumGum is a company whose platform serves up online ads related to the context in which potential customers are already shopping or searching. (For instance: it will send ads for Zurich restaurants to someone who’s booked travel to Switzerland.) To handle that granular targeting, it relies on its proprietary machine-learning platform, Verity.
“For all of our publishers, we send a list of URLs to Verity,” according to Keith Sader, GumGum’s director of engineering. “Verity goes in and basically categorizes those URLs as different [Interactive Advertising Bureau, or IAB] categories. So the IAB has tons of taxonomies, based on autos, based upon clothing, based upon entertainment. And then that’s how we do our targeting.”
Verity’s targeting data is stored in DynamoDB, and most of GumGum’s other data lives in managed MySQL, but its daily tracking data is stored in ScyllaDB, a database designed for data-intensive applications. ScyllaDB, Sader said, helps his company avoid serving audiences the same ads over and over by keeping track of which ads customers have already seen.
“That’s where ScyllaDB comes into the picture for us,” he said. “ScyllaDB is our rate limiter on ad serving.”
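To make the rate-limiting idea concrete, here is a minimal sketch of a per-user, per-ad daily frequency cap in Python. The class name `FrequencyCapper`, the one-day window and the cap value are hypothetical illustrations, not GumGum’s implementation. A plain dictionary stands in for the counters that, in production, would live in a shared, replicated database such as ScyllaDB.

```python
from datetime import date


class FrequencyCapper:
    """Hypothetical per-user, per-ad daily impression cap (illustration only)."""

    def __init__(self, max_impressions_per_day: int) -> None:
        self.max_impressions_per_day = max_impressions_per_day
        # (user_id, ad_id, day) -> impressions served that day.
        # Assumption: a dict stands in for a shared, replicated store.
        self._counts: dict[tuple[str, str, date], int] = {}

    def try_serve(self, user_id: str, ad_id: str, day: date) -> bool:
        """Record an impression and return True if the user is still under the cap."""
        key = (user_id, ad_id, day)
        seen = self._counts.get(key, 0)
        if seen >= self.max_impressions_per_day:
            return False  # cap reached: serve a different ad instead
        self._counts[key] = seen + 1
        return True


capper = FrequencyCapper(max_impressions_per_day=1)
today = date(2021, 11, 1)
print(capper.try_serve("user-1", "ad-42", today))  # True: first impression
print(capper.try_serve("user-1", "ad-42", today))  # False: already seen today
```

The key is the user, not the data center. The cap therefore holds when someone flies across the country, as long as every region reads and writes the same replicated counters. A real deployment would also have to handle concurrent updates to the same counter, which this single-process sketch ignores.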
In this episode of The New Stack’s Makers podcast, Sader and Dor Laor, CEO and co-founder of ScyllaDB, described how GumGum has used ScyllaDB to shift more IT resources to its core business and to avoid repeating ads to audiences that have already seen them, no matter where they travel.
This case study episode of Makers, hosted by Heather Joslyn, TNS features editor, was sponsored by ScyllaDB.
‘Where Do We Spend Our Limited Funds?’
Before adding ScyllaDB to its stack, Sader said, “We had a Cassandra-based system that some very smart people put in. But Cassandra relies upon you to have an engineering staff to support it.
“That’s great. But like many types of systems, managing Cassandra databases is not really what our business makes money at.”
GumGum was self-hosting its Cassandra database on Amazon Web Services, and the drain on resources brought the company’s teams to a crossroads, Sader said. “Where do we spend our limited funds? Do we spend it on Cassandra maintenance? Or do we hire someone to do it for us? And that’s really what determined the switch away from a sort of self-installed, self-managed Cassandra to another provider.”
A core issue for GumGum, Sader said, was making sure that it wasn’t over-serving consumers, even as they moved around the globe. “If you see an ad in one place, we need to make sure, if you fly across the country, you don’t see it again,” he said.
Cassandra had previously solved that problem for his company, he said. Because ScyllaDB is an API-compatible replacement for Apache Cassandra, it could take over that role, preventing over-serving in every region of the globe and thus keeping GumGum from losing money.
Beyond managing the database for GumGum and other customers, ScyllaDB offers an “always on” guarantee, Laor said.
“We have a big legacy of infrastructure that’s supposed to be resilient,” he said. “For example, every implementation of ours has configurable consistency, so you can have multiple replicas.”
Laor added, “Many, many times organizations have multiple data centers. Sometimes it’s for disaster recovery; sometimes it’s also to shorten the latency and be closer to the client.” Replicas located in geographically distributed data centers, he said, protect against the failure of any one data center.
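As a rough illustration of how configurable consistency relates to replicas, the sketch below applies the quorum rule used by Cassandra-style databases. With replication factor RF, a QUORUM operation waits for floor(RF / 2) + 1 replicas. A read is guaranteed to overlap the latest write when the read and write replica counts together exceed RF. The function names are hypothetical helpers for illustration, not part of any ScyllaDB API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM operation: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True if every read replica set must intersect every write replica set (R + W > RF)."""
    return read_acks + write_acks > replication_factor


rf = 3
q = quorum(rf)                            # 2 replicas
print(q, reads_overlap_writes(rf, q, q))  # prints: 2 True
print(reads_overlap_writes(rf, 1, 1))     # prints: False
print(rf - q)                             # replicas that may be unavailable: 1
```

With three replicas and QUORUM on both reads and writes, one replica can be unavailable and reads still see the latest acknowledged write. Choosing a weaker level such as ONE on both sides trades that guarantee for lower latency. That per-operation trade-off is what “configurable” refers to.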
Seeing Results
Bringing ScyllaDB to GumGum was not without challenges, Sader and Laor said. When ScyllaDB is added to an organization’s stack, Laor said, the company likes to start with as small a deployment as possible.
“But in the GumGum case, all of these clients were new processes,” Laor said. “So [you have] hundreds or thousands of processes, all trying to connect to the database. It’s really a connection storm.”
ScyllaDB’s team created a private version of its database to work on the problem and eventually solved it: “We had to massage the algorithm and make sure that all of the [open source] code committers upstream are summing it up.”
The team ultimately designed an admission control mechanism that measures how many parallel requests the distributed database is handling and slows down requests arriving for the first time from a new process. “We tried to have the complexity on our end,” Laor said.
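The admission control idea can be sketched in a few lines of Python. This is a hypothetical illustration of the general technique, not ScyllaDB’s actual code. The class name, the two thresholds, and the rule that a client counts as “new” until its first request is admitted are all assumptions. The controller counts in-flight requests and applies a lower limit to first-time clients, so a burst of new connections cannot crowd out established traffic.

```python
class AdmissionController:
    """Hypothetical admission control sketch (not ScyllaDB's implementation)."""

    def __init__(self, max_in_flight: int, new_client_threshold: int) -> None:
        self.max_in_flight = max_in_flight
        # Assumption: above this many in-flight requests, first requests
        # from previously unseen clients are deferred.
        self.new_client_threshold = new_client_threshold
        self.in_flight = 0
        self._known_clients: set[str] = set()

    def try_admit(self, client_id: str) -> bool:
        """Return True if the request may run now, False if the caller should retry later."""
        is_new = client_id not in self._known_clients
        limit = self.new_client_threshold if is_new else self.max_in_flight
        if self.in_flight >= limit:
            return False
        self.in_flight += 1
        self._known_clients.add(client_id)
        return True

    def release(self) -> None:
        """Call when an admitted request finishes."""
        if self.in_flight > 0:
            self.in_flight -= 1


controller = AdmissionController(max_in_flight=4, new_client_threshold=2)
print(controller.try_admit("proc-a"))  # True: system is idle
print(controller.try_admit("proc-b"))  # True: 1 in flight, below new-client limit
print(controller.try_admit("proc-c"))  # False: new client while 2 are in flight
print(controller.try_admit("proc-a"))  # True: known client, below overall limit
```

A deferred caller would typically back off, for example by sleeping with random jitter, before retrying. That is where the “slowing down” shows up. Setting the new-client threshold below the overall cap keeps headroom for clients that are already doing useful work.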
GumGum has seen the results of handing off that complexity and toil to a managed database. “With ScyllaDB, we have pretty much reduced our entire operations effort to almost nothing,” Sader said.
He added, “We’re coming into our busy point of the year; ads really get picked up in Q4. So we reach out and go, ‘Hey, we need more nodes in these regions, can you make that happen for us?’ They go, ‘Yep.’ Give us the things, we pay the money. And it happens.”
In 2021, Sader said, “we increased our volume by probably 75% plus 50%, over our standard. The toughest thing to do in this industry is to make things look easy. And ScyllaDB helped us make ad serving look easy.”
Listen to or watch the complete podcast to get more detail about GumGum’s move to a managed database.