Whiteboard Confessional: Everything's a Database Except SQLite
Publisher | Corey Quinn
Media Type | audio
Categories Via RSS | Business News, News, Tech News
Publication Date | Mar 13, 2020
Episode Duration | 00:11:23

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

Transcript

Corey Quinn: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semipolite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best name for your staging environment is “theory”. Because invariably whatever you’ve built works in theory, but not in production. Let’s get to it.

On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying. CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care and feed for yourself, instead, you now get to focus on just storing data, treating it like you normally would other S3 data and not replicating it, storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

Many things make fine databases that replicate data from one place to another, that take various bits of data and put them where they need to go. Other things do not make fine databases that do such things. Let’s talk about one of those today. For those who have never had the dubious pleasure of working with it, SQLite is a C library that implements a relational database engine. And it’s pretty awesome. It’s very clearly not designed to work in a client-server fashion, but rather to be embedded into existing programs for local use. In practice, that means that if you’re running SQLite, that’s S-Q-L-I-T-E, your database backend is going to be a flat file, or something very much like that, that lives locally.
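Editor's sketch, not from the episode: a minimal illustration of what "embedded, backed by a local file" means in practice, using Python's standard-library sqlite3 module. The file name, table name, and row contents are made up for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical path and table name, chosen only for this illustration.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")

# There is no server to start and no network socket to connect to:
# the library running inside this process simply opens a local file.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
conn.commit()

rows = conn.execute("SELECT id, body FROM notes").fetchall()
conn.close()

# The whole database is one ordinary file on local disk.
file_exists = os.path.exists(db_path)
print(rows, file_exists)
```

Nothing here listens on a port, which is why anything resembling replication between machines has to come from somewhere outside SQLite itself.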

This is technology used all over the place: in mobile apps and embedded systems, and in web apps for some very specific things. But that’s not quite the point. I once worked somewhere that decided to build a replicated environment that was active-active-active across three distinct data centers. You would really hope that that statement was a non sequitur. It’s not. If you were to picture Hacker News coming to life as a person, and that person decided to design a replication model for a database from first principles, you would be pretty close to what I have seen. You can get a replicated model running on top of SQLite to work, but because there’s no concept of client-server, as mentioned, the only way to handle it is to kick all of the replication and state logic from the database layer, where it belongs, up into the application code itself, where it most assuredly does not belong. There are many downsides to this, but let’s start with a big one: this is not even slightly what SQLite was designed to do.
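Editor's sketch, not from the episode: the transcript does not describe how the replication was actually implemented, so the following is only a hypothetical illustration of what "replication logic in the application code" can look like. Three local files stand in for three data centers; the `kv` table, the `replicated_put` helper, and the failure simulation are all assumptions.

```python
import os
import sqlite3
import tempfile

# Three local files stand in for three data centers (hypothetical names).
workdir = tempfile.mkdtemp()
replica_paths = [os.path.join(workdir, f"dc{i}.db") for i in (1, 2, 3)]


def init_replica(path):
    """Create the (hypothetical) key-value table on one replica."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.commit()
    conn.close()


def replicated_put(paths, key, value):
    """Write to every replica in turn; return the paths where the write failed.

    The application, not the database, now owns the question of what to do
    when some replicas accept a write and others do not.
    """
    failed = []
    for path in paths:
        try:
            conn = sqlite3.connect(path)
            try:
                conn.execute(
                    "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
                )
                conn.commit()
            finally:
                conn.close()
        except sqlite3.Error:
            failed.append(path)
    return failed


# Only two replicas are initialised; the third has no table, which
# simulates a broken node and makes its write raise an error.
for p in replica_paths[:2]:
    init_replica(p)

failed = replicated_put(replica_paths, "user:1", "alice")
print(failed)
```

After this call two replicas hold the row and one does not, and nothing in SQLite will reconcile them; retries, conflict handling, and catch-up all become application code.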

However, take a startup that decides that if there’s one core competency they have, it’s knowing better than everyone else; this is that story. Now, I am obviously not a developer, and I’m certainly not a database administrator. I was an ops person, which means that a lot of the joy of various development decisions fell to whatever group I happened to be in at that point in time. It turns out that when you run replicated SQLite as a database, you have to get around an awful lot of architectural pain points by babying this thing something fierce. There are a number of operational problems that going down a path like this will expose. Let me explain what some of them look like, after this.

In the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn’t have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in your AWS accounts, and we know what that costs. You don’t have to deal with running massive piles of infrastructure to be able to query that log data with APIs you’ve come to know and tolerate, and they’re just good people to work with. Reach out to CHAOSSEARCH.io. And my thanks to them for sponsoring this incredibly depressing podcast. 

I’m not going to engage in a point-by-point teardown of this replicated SQLite as primary datastore Eldritch Horror. My favorite database personally remains Route 53, and even that’s a better plan than this monstrosity. I’m not going to tackle, point by point, everything that made this horrifying thing come to life so awful to deal with. Anyone who runs this at any sort of scale for more than a week is going to discover a lot of these on their own. But I am going to cherry-pick a few things that were problematic about it. Remember back in the days of Windows, when things would get slow and crappy, and you had to basically restart your machine while the disk defragmented forever? Yeah, it turns out that most database systems have the same problem. The difference is that reasonable adult-level database systems, which have human beings who are used to how this stuff works, tend to put that underneath the hood, so you don’t really have to think about this.

SQLite wasn’t really designed for this sort of use case. So you get to wind up playing these games yourself, which is just an absolute pleasure and a joy, except the exact opposite of that. Which means that every node periodically has to be taken down in a rotation, in our case after about a week or so, or it would start chewing disk, it would take forever to start returning the results to some queries, and the performance of the entire site would wind up slamming to a halt. So, you have to make people aware that this exists. When we first discovered that, it was fun. The problem here is that what you’re doing is speaking to a larger problematic pattern. Namely, you’re forcing what has historically been a low-level function that even most operations people don’t need to know or care about, into something that is now at the forefront of every developer’s mental model of the application. And if they forget that this is one of the things that has to happen, woe be unto them. Further, it should be pretty freakin’ obvious by now, by everything I’ve de...
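Editor's sketch, not from the episode: the transcript does not say which maintenance step the nodes were taken down for. One SQLite mechanism that fits the "defragmentation" analogy is `VACUUM`, which rebuilds the database file and returns free pages to the filesystem; treating it as the step in question is an assumption. The table name and row sizes below are invented to make the effect visible.

```python
import os
import sqlite3
import tempfile

# Hypothetical file and table, used only to make the effect visible.
db_path = os.path.join(tempfile.mkdtemp(), "churn.db")
conn = sqlite3.connect(db_path)

# Make sure automatic vacuuming is off, so freed pages stay in the file.
conn.execute("PRAGMA auto_vacuum = NONE")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# Write a couple of megabytes, then delete all of it again.
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [("x" * 1000,) for _ in range(2000)],
)
conn.commit()
conn.execute("DELETE FROM events")
conn.commit()

# The rows are gone, but their pages sit on the free list inside the file.
free_before = conn.execute("PRAGMA freelist_count").fetchone()[0]
size_before = os.path.getsize(db_path)

# VACUUM rebuilds the file; it must run outside of a transaction.
conn.execute("VACUUM")
free_after = conn.execute("PRAGMA freelist_count").fetchone()[0]
size_after = os.path.getsize(db_path)
conn.close()

print(free_before, size_before, free_after, size_after)
```

Because `VACUUM` rewrites the database and blocks other work on it while it runs, somebody has to schedule it, which is the kind of chore a managed database server normally hides.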

Join me as I continue a new series called Whiteboard Confessional with a look at the awesomeness that is SQLite, including how it wasn’t designed to work in a client-server fashion, when you should use it and when you absolutely shouldn’t, how deciding to use SQLite as a database invariably shifts businesses away from their core competencies, how your life will turn completely into edge cases if you choose this as an architecture, how SQLite as a database means you’ll run into dead-ends and be stuck on your own when you try to figure out the way forward, and more.

