Database Performance at Scale: Hear from the Book Authors
Hear from Database Performance at Scale book authors (and masterclass presenters)
If your team has ever worried about database performance at scale, you’ve come to the right place. We wrote the book on it.
Felipe Cardeneti Mendes, Piotr Sarna, Pavel Emelyanov, and I recently collaborated on Database Performance at Scale, an Apress “Open Access” book (available for free, published under the Creative Commons license).
The 270-page book stems from an unconventional global collaboration. Fun fact: we produced the entire book without a single all-author virtual meeting – and we’ve never all met in person.
We wrote the book to share our collective experience with performance-focused database engineering and with performance-focused database users. It represents what we think teams striving for extreme database performance – low latency, high throughput, or both – should be thinking about, but often overlook. This includes the nuances of DB internals, drivers, infrastructure, topology, monitoring, and quite a lot more.
As with any engineering challenge, there’s no one-size-fits-all solution. But there are a lot of commonly overlooked considerations and opportunities. We try to guide you through the database performance traps and tradeoffs that your peers have already faced and share the lessons learned. Hopefully, you’ll walk away with some new ideas for optimizing database performance for your team’s specific use cases and technical requirements.
We invite you to dive into the 270-page book and to share your brutally honest feedback with us (find us on the socials). But we also know that not everyone has time to read a book these days. So the authors – plus one of our tech reviewers – recently came together to deliver a Database Performance at Scale masterclass sharing key points from the book in just a couple of hours.
Behind the Book: Author Q&A
Before the event, I thought it would be fun to share author perspectives on some topics related to database performance at scale, the writing process, and other recommended technical resources. Here’s a look at the responses…
Piotr Sarna
Piotr is a software engineer who is keen on open-source projects and the Rust and C++ languages. He previously developed an open-source distributed file system and had a brief adventure with the Linux kernel during an apprenticeship at Samsung Electronics. He’s also a long-time contributor and maintainer of ScyllaDB, as well as libSQL. Piotr graduated from the University of Warsaw with an MSc in Computer Science.
What is your favorite part of the book and why?
My favorite part is the blank pages added as padding before each chapter so that it always starts on a right-hand page. It really helped us reach our 270-page goal! And from the technical point of view, I really enjoyed writing the Drivers chapter, since I could finally put all the frustration from designing and redesigning the ScyllaDB Rust Driver to practical use.
Editor’s note: We actually signed up to write 150 pages, but it seems we’re overachievers. This came back to haunt (some of) us during the 2-day final proofreading sprint.
What’s one of the easiest things a reader can do to improve database performance?
Put your database as close to the users as possible. If you’re targeting European markets with a hyperspeed database running from the us-east-1 AWS region, you’re not going to look very fast.
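Editor’s note: For a rough sense of the numbers behind that advice, here is a minimal Python sketch of the physics involved. The approximate coordinates for a us-east-1 location (Northern Virginia), the choice of Frankfurt as a stand-in for European users, and the figure of roughly 200,000 km/s for light in optical fiber are all our assumptions for illustration, not figures from the book. It computes only a lower bound on network round-trip time; real paths add routing, queuing, and processing on top.

```python
import math

# Assumed figures for illustration only (not from the book or the interview):
# approximate coordinates for Northern Virginia (us-east-1) and Frankfurt,
# and light in optical fiber travelling at roughly 200,000 km/s.
US_EAST_1 = (39.0, -77.5)
FRANKFURT = (50.1, 8.7)
FIBER_KM_PER_S = 200_000.0
EARTH_RADIUS_KM = 6371.0


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_round_trip_ms(a, b):
    """Lower bound on round-trip time: straight-line fiber, no routing or processing."""
    return 2 * great_circle_km(a, b) / FIBER_KM_PER_S * 1000


if __name__ == "__main__":
    km = great_circle_km(US_EAST_1, FRANKFURT)
    rtt = min_round_trip_ms(US_EAST_1, FRANKFURT)
    print(f"distance ~{km:.0f} km, round trip at least {rtt:.0f} ms")
```

Under these assumptions, the floor works out to tens of milliseconds per round trip before the database does any work at all, and no amount of server-side tuning can win that back.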
What technical books have you enjoyed?
Do Brandon Sanderson’s elaborate magical systems in his fantasy books count as technical? And if you mean technical books of the boring genre, I really enjoyed Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron, and Robert Love’s Linux Kernel Development.
What’s your preferred writing environment?
At home, standing desk, morning hours. I tend to be most productive with “Hooked on Classics 2” looped on YouTube for 4 to 6 hours.
What was the most interesting or surprising aspect of writing a book?
I expected to experience this mythical “writer’s block” and I didn’t, which I found quite surprising. I think the secret sauce is to have at least 3 co-authors, including at least one very professional writer. Preferably, professional enough to sneakily omit herself in the list of authors in a blog post, to avoid having to answer multiple obligatory questions.
Editor’s note: No comment 😉
Pavel “Xemul” Emelyanov
Pavel is an ex-Linux kernel hacker now speeding up row cache, tweaking the IO scheduler, and helping to pay back technical debt for component interdependencies. He is a Principal Engineer at ScyllaDB.
What is your favorite part of the book and why?
The chapter about B/B+ trees (Chapter 4), simply because it was the part I coded in ScyllaDB. Prior to coding it, I did some research and it was really fun.
What’s one of the easiest things a reader can do to improve database performance?
I’d say – equip it with a good monitoring system (and relevant metrics if you can modify the DB engine). Knowing your bottlenecks is 80% of solving the problem.
What technical books have you enjoyed?
My favorite one is Charles Wetherell’s Etudes for Programmers. It’s quite an old one already, but still available online… somewhere.
What’s your preferred writing environment?
VIM, evening, silence.
What was the most interesting or surprising aspect of writing a book?
The “write vs publish” effort ratio. I thought that writing would be the longest part, so the ratio would be at most 50:50. Surprisingly, writing was fast, easy, and tons of fun. Editing, correcting, polishing, shuffling chapters, brushing up images, etc. took most of the time.
Felipe Cardeneti Mendes
Felipe is an IT specialist with years of experience in distributed systems and open source technologies. He has co-authored three Linux books and is a frequent speaker at public events and conferences to promote open source technologies. At ScyllaDB, he works as a Solution Architect.
What is your favorite part of the book and why?
Chapter 2 (Your Project, Through the Lens of Database Performance) starts by discussing several considerations to think about before you actually start planning your solution. It should give readers a good overview of the multiple factors involved when working at scale. For example, a write-mostly workload will behave differently than a read-mostly workload, which will behave differently than a delete-heavy one. These also vary depending on the underlying database implementation. Therefore, these and other specific workload attributes need to be well understood whenever you decide to improve performance or evaluate a different vendor.
What’s one of the easiest things a reader can do to improve database performance?
You definitely want to invest time in selecting the right tool for the job. If you are currently struggling with your existing solution, it may be time to try to understand what workload nuances are causing you harm. We’ve seen several situations where a team starts with a small relational database deployment that later suffers from elevated latencies at scale. Of course, performance can be very subjective since your requirements and budget will greatly vary compared to other applications. In that sense, it definitely makes sense to start small – but consider anticipating where you want to be in the future and choose your database with that in mind.
What technical books have you enjoyed?
Martin Fowler’s NoSQL Distilled is definitely at the top of my shelf. It is a great book that covers most of the fundamentals when it comes to distributed NoSQL databases. It is a great resource for teaching you how to change your mindset compared to relational workloads, which is – by far – the main aspect that most people have trouble with in the field. After that, but no less important, Martin Kleppmann’s Designing Data-Intensive Applications is a must-read for learning about all the nuances and inner workings of databases and other distributed systems with a focus on performance.
Bonus: Access free Designing Data-Intensive Applications chapters courtesy of ScyllaDB.
What’s your preferred writing environment?
Believe it or not, I am a night person. Admittedly, I get distracted fairly easily, which – as you may guess – can be very counterproductive for writing. One of my advantages, however, is that I live and work in the Brazil timezone. That means my mornings are typically free to catch up on things, but the afternoons can get very busy. Since firefighting is a reality for almost anyone who works in tech, it is during the night (while my puppy is sleeping on my feet) when I typically can find some focus time.
What was the most interesting or surprising aspect of writing a book?
The whole experience of writing the book and the opportunity to connect with brilliant authors from different countries and with totally different backgrounds, experiences, and visions was very interesting. Notably, the whole process of writing and later on tech reviewing each chapter was a learning journey on its own. However, the most surprising aspect of writing the book was after it was done, when I realized that I would’ve been a much better professional if I had been introduced to such content upfront when I started my career.
Keep Learning: Podcasts with Book Authors
Piotr Sarna and Felipe Cardeneti Mendes were recently invited to share their insights on various tech podcasts.
Piotr on The Distributed Fabric Pod
Piotr got the podcast ball rolling with an appearance on The Distributed Fabric Pod, where host Vipul Vaibhaw explores the fascinating world of distributed systems, database internals, deep learning, programming languages, and more. In this episode, Vipul and Piotr chatted about the book, database drivers and database internals, Zig vs Rust, the broader challenges of distributed systems, and advice for new engineers.
Felipe on The Geek Narrator
Felipe kept the podcast momentum going with The Geek Narrator (Kaivalya Apte), who just kicked off a new series focused on database internals – featuring ScyllaDB as well as DynamoDB, Cassandra, CockroachDB, DuckDB, Neo4j, TiDB, ClickHouse, and more. Kaivalya and Felipe took a deep dive into ScyllaDB’s unique close-to-the-hardware design, with an emphasis on why design decisions like our shard-per-core architecture, specialized cache, and IO scheduling matter for database users who require predictable performance at scale.