Epic App Performance Starts With the Database

Tyler Benfield
Tyler Benfield

Tyler Benfield focused on the importance of optimizing database performance to improve the overall functioning of applications. It was highlighted that an application's speed is often dictated by its slowest database query. Therefore, optimization efforts should prioritize improving database performance.

Tyler demonstrated the significant potential of database performance enhancement, mainly through the use of indexes. Indexes were compared to an alphabetically organized address book, speeding up data retrieval from unindexed tables. He emphasized the need to consider multiple columns within a single index to optimize queries effectively.

Other database performance factors considered were network latency, open database connections, and the use of caching for inherently slow queries. Tyler recommended the use of connection poolers and stressed the importance of the database and application's proximity to minimize latency.

Tyler emphasized continuous monitoring and tracking of database query performance trends, recommending indexing all frequently used queries and foreign keys for immediate improvements.

Transcript

I'm here to talk about how epic app performance starts with a database.

That sounds kind of scary for a web dev conference, right? Most of my career, like many of you, I have been a web developer, but I've been very fortunate to get a lot of opportunities to dive really deep into databases and learn how they work. Whether that was running production systems, working at a company that actually built database

performance monitoring software, and now working at Prisma. So what I'd like to do is take the next 20 minutes and try to summarize a lot of what I've learned into this short time frame and maybe leave you with some things you can take back and apply to your own apps. Something I'd like to say is your app is only as fast as its slowest query.

We spend so much time talking about our app performance, whether it's like optimizing our back-end algorithms, how we deliver our content to the front-end, how we optimize our front-end rendering, but we very seldom talk about the database performance. I find that really interesting because the database is sort of fundamental to so much of what we do on the web. It is like the lowest level.

It's the first thing that has to happen to get data into the front-end, but we don't really have a lot of focus on how to make it faster, and it's actually pretty low-hanging fruit sometimes. So to let that kind of sink in, I want to share a little bit of a story. A while back, I was at a company that built an e-commerce platform, and one of the sites

that we manage had a kind of notable name behind it, and sometimes they would go post on social media, as they do, and we get a huge influx of users, and it would be enough to kind of bring the site to its knees. It would come down and have errors. It was kind of bad. So the team messaged me and said, hey, Tyler, can you come in and see what's going on here? The first thing I did wasn't to pull the app code.

I didn't go look at traces and things like that. Those things are all good, but I jumped straight to the database, and luckily, we were on a cloud vendor at the time that had really good analysis of how our database was performing. The very first thing on the list was this query that hit a session table, and it was

a really simple query, a few columns, one table, but that query ran an average of 140 to 200 milliseconds average duration. It executed 14,000 to 20,000 times an hour. It was like multiple times on a single page load, and on a normal workload, not even under

a spike, it consumed an entire CPU off of our database just by itself, leaving the rest for everything else. I added one index, single index, just on a few columns. It went from 140 to 200 milliseconds average. That's a crazy spread on an average, right? That's a symptom in itself.

It went down to 0.04 milliseconds consistently, 30% drop in active connections, and 20% drop in CPU. It was pretty much idle after this. The database was chilling on the beach. I'm not a prompt engineer. This is the best I've got from AI.

So, yeah, after this, fast forward a few weeks, added a few more indexes, tuned up a couple more things. We ended up slashing our database tier on our cloud provider, saved a bunch of money. The app still ran better than it did before. So I'm Tyler Binfield. I'm a staff software engineer at Prisma.

I work primarily on the Prisma data platform, so if you've been following us, that would be Prisma Accelerate, Prisma Pulse. I also just love databases. I love working in database performance and helping people learn how to get the most out of their database. So what you can expect from the rest of this talk, we're going to learn to think like a database.

It's actually surprisingly relatable. We're going to learn how to make databases fast. We're going to talk about a little bit of the nuance, as we have time for, and then we're going to leave with some things you can just take away and apply it yourself. So how do we think like a database? Remember physical address books?

Maybe I'm kind of old, but we had these pen and paper books we would carry around, and when we met somebody, we would write their name, phone number, whatever in this book. And the weird thing is, these were ordered by the time that we met somebody, right? So as it grew and it got worse, we would have to flip through all these pages trying to find the name of somebody that we wanted to call up.

Well, this is exactly how tables work. If you don't put any indexes and you just deploy a table in a database, everything's organized by the primary key. And that's great if you know the ID that you're looking for. You're going to jump straight to it, grab the record super fast, but for just about everything else, it is terrible. If you want to find somebody by their first name and last name, the database has to look

through every single record in the database around that table just to find the one that you cared about. Well, then with those address books, we had these little tabs down the side of it that had letters on it, right? And that would help you jump to this very specific point, you know, if you wanted to look up somebody that had like the last name starting with B, you could jump straight there.

Then you had a much smaller list to filter through. So you still might have to filter through some items, but it was, it grew at a much slower pace than the number of people that you met overall. But there was another problem. If you wanted to say, call up everybody whose birthday is today to wish them a happy birthday,

you would still have to flip through every single page in your address book looking for people whose birthday is today and make a list and then call them. This is kind of how indexes work. They're really great for filtering. You can index on one or more fields on a table. You can have multiple indexes as we'll look at. And they can narrow results either to a really specific record or a subset of records or

even partially narrow it and then apply some more filters that aren't in the index after that. One more terrible AI generation here. This is what I think a calendar looks like. So we can also use a calendar in the physical world. So if we wanted to go back to the birthday example, we can have a calendar where we put

our names of our friends whose birthday is today. Makes it really easy for us to go find who we want to call today, right? And the nice thing is we don't have to duplicate all of that information like their phone number, address, that sort of thing. We still have our address book. So we can look to see whose birthday is today, whose birthday is tomorrow, look them up in our address book and give them a call.

Well, this is how multiple indexes can coexist in a single database. You can index on birthday and you can index on first name and last name. The database is really smart. It's going to pick the best one to run for any query that you send it. So there's no harm in having multiple here. And fields not in the index, that's okay.

It'll just pull them back from the table anyway. So how do queries query? Let's look at this Prisma query up in the top left. We got a simple query on users that looks for first name Peter, last name Parker. That effectively becomes this SQL, selecting ID, first name, last name from users or first

name Peter, last name Parker. But at the database, conceptually, that actually executes this big for loop. It's looping over every user in that table, checking to see if their first name and last name matches and then pushing them into the result set. So that list is going to grow very fast.

And as your data size grows, as your app matures, this is just going to get slower and slower. How do indexes work? We can throw an index in our Prisma scheme in the top left on first name, last name. That ends up generating this SQL index on the user first name, last name.

And then conceptually, don't roast me if you're really no databases here, but conceptually, this is going to generate this type definition, a nested object, first name, last name, and the results are an array of IDs that match. What does that do to that query we just saw?

That big for loop just became a simple lookup in an object, conceptually. So as long as we know the first name and last name, we can navigate to those results with very little compute. Quick recap, queries without indexes are essentially for loops.

For loops grow with the size of the database, and looping is going to use a lot of CPU, memory, and disk reads just to filter those results out. And any and all of those properties are going to make your query slow, and your slow queries are going to make your app slow. So what kind of indexes do we have available to us?

There's what I would call a regular multi-column index. We should focus in on our things that we're filtering by and our relations here. You can expect that only one index is going to be used per table in a query generally, so don't try to fragment an index like indexing on first name, last name, birthday, all separately,

because it's just going to pick one of those and go with it. But you can do multiple columns in the same index. And specifying specific fields that you want to return back can also help it better utilize the index as well. Something I really want to stress, because this is often overlooked, is the order of

those columns in that index is incredibly important. If you notice in the example on the right, first name, last name, last name, first name, those are not the same thing, and we'll show an example of that. If you had that last index that is just on first name, that's redundant, because partial matching is allowed.

As long as it can navigate from right to left, or left to right, it will use those columns in that order, and it can just leverage everything after that. Inequalities also cause a partial utilization, so if you have a less than or greater than, it'll go from left to right, and once it gets to the inequality, it just won't use any columns

after that. So put that in a little bit of a practice. We pick one of those indexes on first name and last name here, and we got some Prisma queries on the right. If I'm looking at first name, last name in that first query, that's going to hit the index perfectly. I'm going to get a really fast lookup from that.

If I look for just for first name, Peter, that's still going to use the index, because it can use that first column, and then just loop over everything after that for the last name. But if I'm looking just at last name, Parker, that index is not going to match, because it can't just skip over first name in the index. And conceptually, let's go back to our TypeScript example here.

What if you didn't know that the first name was Peter, what would you have done in JavaScript code to loop over this? You would have probably looped over all the keys and users, right? So the database sees that and says, well, if I'm looping anyway, and I've got to go through all this data, I might as well jump back and loop over the table instead, conceptually.

So instead, we can actually add an index on last name, first name, just flip the order here. It looks redundant at first, but now when we look at our queries on the right, that first name, last name one can actually use either query or either index to apply. So a query on just first name will use the first name, last name index.

And a query on last name will use the last name, first name index. We also have unique indexes. You can do a unique index on one or multiple fields. They actually give you the same performance benefits as a regular index would. You can treat them very similarly. They just come with the additional property of uniqueness.

The same matching rules and everything we just discussed, that all applies, including the ordering requirement. It also enables find unique in the Prisma client. So if you're using Prisma, find unique gives you a little bit extra performance boost under certain circumstances. And this one's probably my favorite to talk about, relational indexes.

When you're defining a relation in your database, most databases do not index foreign keys by default. And that is a surprise to most people. It was a surprise to me when I first learned it. Foreign keys exist to protect your data integrity. So when you're inserting, updating, deleting data, it's going to make sure your database doesn't get into a bad state. But it does nothing for performance.

In fact, it can actually slightly hurt it, because it has to actually check if that other record is still valid on the other side of the relation. But you can index relations just like any other field in your database. And when you do that, it improves the performance of both the queries that are looking through that relation and also the things that are filtering on that relation.

And again, all the same rules apply, just like regular indexes, ordering, and everything. You can even stack other columns in with a relation index. I would just typically suggest putting the field from the relation on the far left. So you can see in the little example here, the post has an author relation using the

author ID. And we've indexed that. So like I said, everything has drawbacks. The two most common I hear for indexes are disk space and write overhead. Indexes do use more disk space. That is true. But let me tell you, disk space is much cheaper than database compute.

I would happily increase the size of my disk attached to a database, then increase the tier of the compute for it. It will also scale much better, because just throwing more compute power just makes that loop run slightly faster. The index actually fixes the problem. And then write overhead.

Indexes are going to slow your writes down a little bit. But most apps are going to read way more than they write. And maybe this is a hot take, but most users are going to be OK with the slow save button if the page loads really fast. I will take that trade as a user. And in some cases, index can even improve the performance of updates and deletes.

Going back to our post example, if I wanted to update all of the posts by a particular author, well, if I have an index on the author ID, that update is going to run really, really fast, because it can find those records very quickly. There's some other considerations we can think about that affect database performance, too.

We don't have enough time to jump into those in too much depth, but network latency. How far is your database from your app? You want to make these really close together. If you're deploying your app to many regions, you can do replication of your database to get it closer. You can also use something like Prisma Accelerate that moves the workload around to be more optimal.

Opening a connection is surprisingly slow on databases. There's a lot of back and forth that has to happen to get a new connection established. The Prisma client actually keeps a connection pool for you locally in your app. A lot of ORMs do this, and that helps reuse open connections to keep that really fast. But if you're running on serverless environments, that connection pool doesn't work super well,

because every serverless invocation makes kind of a new app instance, new memory, and has to reconnect. And that's going to take some time away from your serverless execution. So for those, you should use a connection pooler. Prisma Accelerate has one, and there's others you could host as well. Caching. Some queries are just going to be slow. Sometimes you don't want to do indexes on some tables.

Some queries you don't update that often. You can consider caching. There's lots of ways to do that. You could cache in memory. You could host a caching solution. And Prisma Accelerate also offers a cache. If you can't tell, we really built Accelerate to target these performance cases. And then monitor. Like, monitor your database performance. Look at its trends over time.

You will see it change, and you want to stay in front of that. And you can use the Prisma ORM's tracing, or whatever tool you're using's tracing, to keep an eye on your slowest database queries, and try to get in front of them. And keep an eye on more from us. We're really thinking deeply in this space right now.

So general advice. If you don't know much about indexes, and you just want to get started, this is where I would start. Index all of your foreign keys, all your relationships. Just do that. You'll probably see an immediate performance boost just like that, because you're probably doing either filters or joins to get data in your app all over the place. And then index your queries often.

Look for places where you're reusing the same filters a lot. If that's first name and last name, or birthday, or whatever your workload is doing, look for common patterns and try to build indexes around them. And then observe, improve, and repeat. Your performance is going to change over time. Your database is going to grow. You're going to be successful, and you're going to get all these records coming in.

And that's going to cause your performance to look worse as that grows. You're also going to be changing your queries. You're making changes to your code. Some indexes that you might add today might not be used a couple weeks from now. So you want to keep an eye on what indexes are being used and what you need to add to keep your app fast. Where should you go next?

I would read through the Prisma ORM docs on indexing if you're using our ORM. Also check out the data modeling deep dive on Epic Web Dev. Kent has done a fantastic job of taking a lot of these same tips and turning them into exercises and workshops. It really gets you some hands-on experience with this. You can reach out to us on the Prisma Discord.

I personally love helping people tune their databases. So even if you're not using Prisma, send us what you're working with. I would love to see it. Also you can find myself and John Harrell around. We'll be all over the conference. As a bonus, take a look at or Google how to use Explain Plan to understand how your queries work.

That's actually how I pulled that example of how the for loop works. It really breaks down what your SQL query is doing at the database level and you can add indexes and see how that changes the query profile. You can get really confident in what you're doing. Read up on more indexing patterns.

There's a lot of additional concepts, more advanced concepts, like included fields, covering indexes. You can take this a whole other step further. If you're here in person in your swag bag, you might have found a little card from us with a discount for Prisma Pulse. I just want to throw this up for anybody online. They get that same discount code.

So that's all for me. Thank you all for being here. Thank you to the organizers, Kent, everybody with Epic Web Dev. This has been fantastic. Thank you.

Related Talks