Whiteboard Confessional: Scaling Databases in a Single Bound
Publisher | Corey Quinn
Media Type | audio
Categories Via RSS | Business News, News, Tech News
Publication Date | Mar 06, 2020
Episode Duration | 00:11:35

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Transcript

Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semipolite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best name to call your staging environment is “theory”. Because invariably, whatever you’ve built works in theory, but not in production. Let’s get to it.

But first… On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying: CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care for and feed yourself, you now get to focus on just storing data, treating it like you normally would any other S3 data: not replicating it, not storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

So I’m going to deviate slightly from the format that I’ve established so far on these Friday morning whiteboard confessional stories, and talk instead about a pattern that has tripped me and others up more times than I care to remember. So it’s my naive hope that by venting about this for the next 10 minutes or so, I will eventually be able to encounter an environment where someone hasn’t made this particular mistake. And what mistake am I talking about? Well, as with so many terrifying architectural patterns, it goes back to databases. You decide that you’re going to write a small toy application. You’re probably not going to turn this into anything massive, and in all honesty, baby seals will probably get more hits than whatever application you’re about to build. So you don’t really think too hard about what your database structure is going to look like. You spin up a database, you define the database endpoint inside the application, and you go about your merry way. Now, that’s great. Everything’s relatively happy, and everything we just described will work. But let’s say that you hit that edge or corner case where this app doesn’t fade away into obscurity. In fact, it turns out to have some legs: the thing that you’re building has now attained business viability, or is at least seeing enough user traffic that it has to worry about load.
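For illustration only, not something from the episode: a minimal sketch of that “spin up a database, hard-code one endpoint, move on” approach might look like the following in Python with SQLAlchemy. The connection string, table, and helper names are hypothetical placeholders.

```python
# A minimal sketch of the "one endpoint, don't think about it" setup,
# using SQLAlchemy; the connection string is a placeholder.
from sqlalchemy import create_engine, text

DATABASE_URL = "postgresql://app:secret@db.example.internal:5432/toyapp"

# Every read and every write in the application funnels through this single engine.
engine = create_engine(DATABASE_URL, pool_pre_ping=True)

def fetch_user(user_id: int):
    with engine.connect() as conn:
        return conn.execute(
            text("SELECT id, name FROM users WHERE id = :id"), {"id": user_id}
        ).fetchone()

def create_user(name: str) -> None:
    with engine.begin() as conn:  # begin() wraps the insert in a committed transaction
        conn.execute(text("INSERT INTO users (name) VALUES (:name)"), {"name": name})
```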

So you start taking a look at this application because you get the worst possible bug report six to eight months later: it’s slow. Where do you start looking when something is slow? Well, personally, I start looking at the bar, because that is a terribly obnoxious problem to have to troubleshoot. There are so many different ways that latency can get injected into an application. You discover the person reporting the slowness is on the other side of the world with a satellite internet connection that they’re apparently trying to set up to the satellite with a tin can and a piece of very long string. There are a lot of failure states here that you get to start hunting down. The joys of latency hunting. But in many cases, the answer is going to come down to: oh, that database that you defined is no longer up to the task. You’re starting to bottleneck on that database. Now, you can generally buy your way out of this problem by scaling up whatever database you’re using. Terrific, great: it turns out that you can just add more hardware, which in a time of cloud, of course, just means more money and a bit of downtime while you scale the thing up, but that gets you a little bit further down the road. Until the cycle begins to rinse and repeat, and it turns out that instances only come so large; eventually there’s no bigger box left to power your database. Also, they’re not exactly inexpensive. Now, I would name the exact sizes those top-end databases might be, but this is AWS: they’re probably going to release at least five different instance families and sizes between the time I finish recording this and the time it gets published at the end of the week. So instead, there is an alternative here, and it doesn’t take much from an engineering or design perspective when you’re building out one of these silly toy apps that will never have to scale. What is that fix, you might wonder? Terrific question. Let me tell you in just a minute. 
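Again purely as an illustration: on AWS, “buying your way out” of the bottleneck usually means resizing the instance, which a boto3 sketch might express roughly like this. The instance identifier and target class are made-up placeholders, and the resize still implies a brief interruption.

```python
# Sketch: vertically scaling an RDS instance with boto3.
# The identifier and instance class below are illustrative placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.modify_db_instance(
    DBInstanceIdentifier="toyapp-db",   # hypothetical instance name
    DBInstanceClass="db.r5.2xlarge",    # the next (pricier) rung up the ladder
    ApplyImmediately=True,              # otherwise waits for the maintenance window
)
```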

In the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn’t have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in your AWS accounts, and we know what that costs. You don’t have to deal with running massive piles of infrastructure to be able to query that log data with APIs you’ve come to know and tolerate, and they’re just good people to work with. Reach out to CHAOSSEARCH.io. And my thanks to them for sponsoring this incredibly depressing podcast. 

So this is a pattern that, increasingly, modern frameworks are recommending, though a number of them still don’t. And I’m not going to name names, because I don’t want to wind up in a slap-and-tickle fight around which frameworks are good versus which frameworks are crappy. You can all make your own decisions around that. But the pattern that makes sense for this is: even when you’re beginning with a toy app, go ahead and define two database endpoints, one for reads and one for writes. Invariably, this is going to solve a whole host of problems with most database technologies. If you take a look at most applications, and yes, I know there are going to be exceptions to this, they tend to bottleneck on reads. If you have just a single database or database cluster, then all of the read traffic gets in the way of being able to write to it. That includes things that don’t actually need to be in line with the rest of what the application is doing. If you can have a read replica that’s used for business analytics, great. Your internal business teams can beat the living crap out of that database replica without damaging anything that’s in the critical path of serving users. And the writes can then go specifically to the primary node, w...
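To make that concrete, here is a minimal sketch of the two-endpoint pattern, assuming Python with SQLAlchemy and a Postgres primary plus a read replica; all hostnames, tables, and helper functions are hypothetical. The point is simply that reads and writes are routed separately from the very first commit, even if both URLs point at the same box to begin with.

```python
# Sketch: define separate read and write endpoints up front.
# Both URLs can point at the same instance on day one; when a read
# replica appears later, only READ_DATABASE_URL needs to change.
import os
from sqlalchemy import create_engine, text

WRITE_DATABASE_URL = os.environ.get(
    "WRITE_DATABASE_URL",
    "postgresql://app:secret@db-primary.example.internal:5432/toyapp",
)
# Fall back to the primary until a replica exists.
READ_DATABASE_URL = os.environ.get("READ_DATABASE_URL", WRITE_DATABASE_URL)

write_engine = create_engine(WRITE_DATABASE_URL, pool_pre_ping=True)
read_engine = create_engine(READ_DATABASE_URL, pool_pre_ping=True)

def dashboard_order_count() -> int:
    # Heavy, analytics-style reads hit the replica and stay out of the
    # critical write path.
    with read_engine.connect() as conn:
        return conn.execute(text("SELECT count(*) FROM orders")).scalar()

def record_order(user_id: int, total_cents: int) -> None:
    # Writes always go to the primary.
    with write_engine.begin() as conn:
        conn.execute(
            text("INSERT INTO orders (user_id, total_cents) VALUES (:u, :t)"),
            {"u": user_id, "t": total_cents},
        )
```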

Join me as I continue a new series called Whiteboard Confessional by examining an all-too-common problem: having to scale a database when it’s too late. In this episode, I touch upon the underlying reason many developers don’t think about their database until they’re forced to, what some of the primary drivers of latency are, the easiest (and priciest) way to scale a database, what you can do to avoid this whole problem altogether from the outset, my advice on how to save months of work down the road, how often this problem rears its ugly head in applications, and more.

