Optimizing a Fast Feature Store for Costs: Lessons Learned
How ShareChat reduced costs for its billion-feature scale feature store with ScyllaDB “Great system…now please make it 10 times cheaper.” That’s not exactly what the ShareChat team wanted to hear after completing a major engineering feat: scaling a real-time feature store 1000X without scaling their database (ScyllaDB). To stretch from supporting 1 million features per second to supporting 1 billion features per second, the team already… Redesigned their database schema to store all features together as protocol buffers and optimized tile configurations (reduced required rows from 2B to 73M/sec) Switched their database compaction strategy from incremental to leveled (doubled database capacity) Forked caching and gRPC libraries to eliminate mutex contention and connection bottlenecks Applied object pooling and garbage collector tuning to reduce memory allocation overhead You can read about those performance optimizations in Scaling an ML Feature Store From 1M to 1B Features per Second. But ShareChat – an Indian leader in a globally competitive social media market – is always looking to optimize. After reaching this scalability milestone, the team received a follow-up challenge: reducing the feature store’s costs by 10X (without compromising performance, of course). David Malinge (Sr. Staff Software Engineer at ShareChat) and Ivan Burmistrov (then Principal Software Engineer at ShareChat) shared how they approached this new challenge in a keynote at Monster Scale Summit 2025. Watch the complete talk, or read the highlights below. Note: Monster Scale Summit is a free + virtual conference on extreme scale engineering, with a focus on data-intensive applications. Learn from luminaries like antirez, creator of Redis; Camille Fournier, author of “The Manager’s Path” and “Platform Engineering”; Martin Kleppmann, author of “Designing Data-Intensive Applications” and more than 50 others, including engineers from Discord, Disney, Pinterest, Rivian, LinkedIn, Nextdoor, and American Express. Register (free) and join us March 11-12 for some lively chats Background ShareChat is one of the largest Indian media network, with over 300 million monthly active users. The app’s popularity stems from its powerful recommendation system – and that’s powered by their feature store. As Malinge explained, “We process more than 2 billion events daily to compute our features and serve close to a billion features per second at peak, with P99 latency under 20 milliseconds. We also read more than 30 billion rows per day in ScyllaDB, so we’re very happy that we’re not charged by the transaction.” Cloud Cost Optimization: Where Do You Even Start? When it came time to make this system 10 times cheaper, the team very quickly realized how complicated and confusing cloud cost optimization could be. For example, Burmistrov noted: Cloud billing is cloudy. AWS and GCP each have around 40,000 different SKUs, which makes it really hard to figure out where your money’s actually going. And cloud providers aren’t exactly motivated to make this easier for you to understand and debug. It’s easy to lose track of little things that add up. Maybe you forgot about an instance that’s still running. Maybe there’s an old deployment nobody uses anymore. Or perhaps you have storage buckets filled with data that should have expired by now (e.g., via Time to Live [TTL]). It all accumulates over time. Your cost intuition from on-prem doesn’t necessarily translate to the cloud. Something that was cost-efficient in your own data center might be expensive in the cloud, and what works well on one cloud provider might not be cost-effective on another. For the first optimization step, the team became sticklers for “hygiene”: clearing out anything they didn’t really need. This involved a deep cleaning for forgotten instances and deployments, unused buckets, data without proper TTLs, and overprovisioning. Having proper attribution was critical here. As Burmistrov explained, “Every workload, every instance, and every resource must be attributed to a specific service and a specific team. Without attribution, cost optimization is a global problem. With attribution, it becomes a local problem. Once each team can clearly see the costs associated with their own services, optimization becomes much easier. The team that owns the service has both the context and the incentive to improve its cost profile. They can see which components are expensive and decide what to do about them.” Cloud Trap 1: Getting Sucked Into Unsustainable Costs Next, they shifted focus to challenges unique to running applications in the cloud. To get the required cost savings, they had to navigate around a few cloud traps. The first trap was the allure and ease of using the cloud provider’s own products – particularly, databases. “They’re great to start with, but at scale they become painful in terms of cost,” explained Burmistrov. Scaling can be particularly costly for solutions that use a metered pricing model (e.g., based on traffic). “We used several GCP databases, including Bigtable and Spanner, but they became expensive and, more importantly, those costs were largely out of our control, we had little leverage over cost,” continued Burmistrov. “So we migrated many workloads to ScyllaDB. Today, more than 30 ScyllaDB clusters are deployed across the organization, including for our feature store use case. Besides stability and low latency, ScyllaDB gives us strong cost control. We can choose instance types, merge use cases into a single cluster, and run at high utilization. ScyllaDB works well at 80%, 90%, even close to 100% utilization, which gives us a very strong cost profile.” Cloud Trap 2: Kubernetes Wastage Next, the team tackled Kubernetes wastage. In Kubernetes, you deploy containers into pods, but you pay for nodes, which are actual VMs. You typically choose node types upfront – for example, nodes with 4 CPUs and 16 GB of memory. However, over time, workloads change, teams change, and these nodes are no longer optimal. Pods cannot fully occupy the nodes, either because of CPU, memory, or scheduling constraints. At that point, you’re paying for resources that aren’t being used. “You may decide, ‘Okay, let’s isolate workloads into separate node pools,’” explained Burmistrov. “But it’s not a solution because, as shown in the image below, we have one node completely wasted.” Simple isolation just moves waste around. The solution: dynamic node allocation based on workloads. Options include open source solutions like Karpenter, cloud-specific ones like GKE Autopilot, or commercial multi-cloud solutions like Cast AI. ShareChat likes Cast AI because it lets them fine-tune the configuration with respect to long-term commitments, spot instances, and other pricing impacts. Cloud Trap 3: Network Costs (“The Cloud Tax”) And then there’s network costs, which the ShareChat team calls “the cloud tax.” Massive apps like ShareChat deploy across multiple zones for high availability. However, multiple zones need to communicate with one another, and the traffic between zones comes with its own cost: Network Inter-Zone Egress (NIZE). That’s the outbound network traffic between availability zones within the same region. It’s billed separately by many cloud providers For example, if you deploy ScyllaDB in three GCP zones with two nodes per zone and use a non-ScyllaDB driver (not recommended), reads may cross zones both at the client level and inside the cluster. For traffic of volume T, you pay for T in NIZE. ScyllaDB drivers help here. As Burmistrov explained, “ Using ScyllaDB’s token-aware routing removes the extra hop inside the cluster, reducing inter-zone traffic. Using token-aware plus zone-aware routing ensures reads stay within the same zone, reducing inter-zone traffic to zero for reads.” Removing the extra hop cuts NIZE down to about 2/3 of T. On the write path, it’s a little different because replication is required. Burmistrov continued, “With three zones, writes generate 2 times T of inter-zone traffic. You can trade availability for cost by using two zones instead of three, reducing this to 1.5 times T. ScyllaDB also lets teams model each zone as a separate data center. In that case, for traffic T, you pay for T of inter-zone traffic. You generally can’t go lower unless you deploy in a single zone.” Tip: Oracle Cloud and Azure don’t charge for inter-zone traffic. If you use one of these cloud providers, you can deploy across three zones with zero inter-zone cost. Database Layer Optimization Another challenge: ShareChat’s generally stable read latencies would spike beyond their SLAs when write-heavy workloads would occur (like a backfill job, for example). The usual option – just scale up the database – would be simple, but the team wanted to reduce costs. Isolating reads and writes into separate data centers could technically improve performance, but that would be even worse from the cost perspective. That approach would double the infrastructure and probably lead to underutilization. Ultimately, they discovered and applied ScyllaDB’s workload prioritization. This lets them control how their various workloads compete for system resources. Malinge explained, “Under the hood, this relies on ScyllaDB’s thread-per-core architecture. Each core runs an async scheduler with multiple queues, each with its own priority. Workload prioritization maps directly to these priorities. Looking at the image, we have a very high share of compute for serving because we have strong SLAs on the read path, which is user facing. Most of the rest goes to less latency-sensitive workloads from asynchronous jobs, like writing the features to the database. We also keep a tiny share for manual queries (debugging, for example). This enables us to have exactly what we want – different latencies for reads and writes – and that allowed us to have the best possible serving for our features.” Feature Serving Layer Optimization That serving layer is a distributed gRPC (Google Remote Procedure Call) service responsible for caching and handling consumer requests. The team made some cost optimizations here too. One of the most impactful was a clever shortcut: instead of fully deserializing complex Protobuf messages, they treated repeated messages as serialized byte blobs. Since Protobuf encodes embedded messages using the same wire format as bytes, they could simply append these serialized records to merge data. With that, they could skip the heavy lift of fully unpacking and rebuilding the messages from scratch. That optimization alone could be the subject of an entire article; please see the video (starting at 18:15) for a detailed walkthrough of their approach. Malinge’s top lesson learned from optimizing this layer: “The first advice is the Captain Obvious one: don’t go blindly. We set up continuous profiling very early on. Nowadays, there are tons of tools available. Integration is super straightforward. You might recognize the vanilla GCP profiler in the image below. It’s just four lines of code to add in Go programs, and this has really helped us on our quest to reduce costs. We’ve actually unlocked more than a 50% reduction in compute by doing optimizations guided by continuous profiling.” Another lesson learned: if you’re using protobuf at high-RPS, serde is probably burning your CPU. Even with ~95% cache hit rates, the hot path was still deserializing protos, merging them, and serializing them again just to stitch responses together. Most of that work was pointless. The team ended up leveraging protobuf’s wire format, where repeated fields are just sequences of records and embedded messages can be treated as raw bytes (no deserialization). They switched to this “lazy” deserialization and merged cached protos without ever touching individual fields. For a practical example, see this repository david-sharechat/lazy-proto. And one parting tip: the wins compound fast. Benchmarking showed lazy merging was about 6 times faster, with about a third of the allocations. In production, that meant lower compute bills and better tail latency. Cost Optimization Actually Fostered Innovation The team’s smart work here underscores that cost optimizations don’t have to compromise the product. They can actually act as a catalyst for better system design. Malinge left us with this: “Our cost savings initiatives actually led to a lot of innovations. As I tried to represent in the left quadrant, there is a very unhealthy way to look at cost savings – one that focuses on shortcuts that negatively impact your product. The key is to stay on the right side of this quadrant, to aim for the sweet spot of reducing costs while improving the product.”