Moving Data Is Expensive and Painful (Just Like Moving Banks)
Publisher: Corey Quinn
Media Type: audio
Categories Via RSS: Business News, News, Tech News
Publication Date: Feb 05, 2021
Episode Duration: 00:24:01

Transcript

Corey: This episode is sponsored in part by our friends at Fairwinds. Whether you’re new to Kubernetes or have some experience under your belt, and then definitely don’t want to deal with Kubernetes, there are some things you should simply never, ever do in Kubernetes. I would say, “run it at all.” They would argue with me, and that’s okay because we’re going to argue about that. Kendall Miller, president of Fairwinds, was one of the first hires at the company and has spent the last six years making the dream of disrupting infrastructure a reality while keeping his finger on the pulse of changing demands in the market, and valuable partnership opportunities. He joins senior site reliability engineer Stevie Caldwell, who supports a growing platform of microservices running on Kubernetes in AWS. I’m joining them as we all discuss what Dev and Ops teams should not do in Kubernetes if they want to get the most out of the leading container orchestrator by volume and complexity. We’re going to speak anecdotally of some Kubernetes failures and how to avoid them, and they’re going to verbally punch me in the face. Sign up now at fairwinds.com/never. That’s fairwinds.com/never.

Pete: Hello, and welcome to the AWS Morning Brief: Fridays From the Field. I am Pete Cheslock.

Jesse: I'm still Jesse DeRose.

Pete: We're still here. And you can also be here by sending us your questions at lastweekinaws.com/QA. We're continuing our Unconventional Guide to AWS Cost Management series, and today we're talking about moving data. It's not cheap, is it?

Jesse: No, it's definitely not cheap. It is expensive, and it's painful. And we're going to talk about why, today. And a reminder, if you haven't listened to some of the other episodes in this series, please go back and do so. Lots of really great information before this one and lots of really great information coming after this one. I'm really excited to dive in.

Pete: Yeah, look, they're all great episodes at the end of the day, right? They're just all fantastic.

Jesse: Yeah.

Pete: If I do say so myself.

Jesse: All of the information is important; all of the information is individually important—I think that's probably the best way to put it. You can listen to all these episodes and implement maybe just a handful of things that work best for you; you can listen to all these episodes and implement all of them, all of the suggestions. There's lots of opportunities here.

Pete: If you do actually go and implement all of these suggestions, you really should go to lastweekinaws.com/QA and tell us about it. We'd be very curious to hear how it goes. But if you're struggling with any of these, just let us know as well. These are things that are measured in long periods of time. 

It is rare that we run into engagements with clients that you can just click box, save money. Now, don't get me wrong; there's a whole bunch of those, too. But if you want to just fundamentally improve how you're using the Cloud and how you're saving money, those projects are multi-year investments. It's just all of this stuff takes a long time. And you just got to manage those expectations appropriately. 

And specifically around this topic, moving data, it is—as Jesse said—painful. It is expensive, especially in Amazon. They will charge you to move the tiniest bit of data literally everywhere, with, like, two minor exceptions. And it's just the worst. Data storage costs, so Duckbill Group, we've kind of become these experts on data transfer and data storage costs, understanding just the complexity around them. And I feel like a lot of times folks only think about the storage being the biggest driver of their spend. 

Jesse: Absolutely.

Pete: You know, you never delete your data. But you put it all on S3, right, Jesse? Like that's a cheap place to put your data.

Jesse: Absolutely. Worthwhile. Put it in S3 standard storage, call it a day. I'm done, right? 

Pete: Yeah, just do my little, like, wipe my hands, and go on, and we're good. Most people put it in standard storage, just like most people use gp2 EBS volumes; that's the standard everything. And that could be a big driver of cost, but more likely the larger driver, because it's a little bit more hidden and a little bit more spread around your entire bill, is the transferring of data, the moving data around. And I say moving specifically because there are some services that are charged via I/Os: via actually putting data into it or taking data out, not just the data transfer.

Jesse: I think it's also really important to call out that most companies that move into the Cloud don't realize that data transfer is something that AWS will charge you for, so I want to make that explicitly clear. As Pete mentioned, in almost every case moving data around, AWS will charge you for that versus in a data center environment where that's kind of hidden, that's not really explicitly a line item in your bill. And here, it absolutely is a line item in your bill and absolutely should be thought of as an important component to optimize. 

Pete: Exactly. In the data center world, for any of the folks out there that are in a data-center land, or maybe hybrid-cloud land, your networking costs are, I mean, it's largely a sunk cost. You've got your switches and your lines that run, maybe you're—get charged for the cross-connects, and interacting, data transferring to other areas and things like that. But within your racks, within your own secure domains, you don't have to really think about the cost of those network communications because it's already paid for. And you're definitely not charged at a per-gigabyte level like you are on Amazon.

Jesse: So, we talked about this a little bit before in a previous episode, when we talked about context is king. Context for your application infrastructure is really, really important; understanding how your application interacts with other applications within your cloud infrastructure ecosystem; how your data moves between workloads. All of these things are really, really important, and so specifically, when we talk about data transfer, it's really important to not just understand how your data is moved around, but why your data is moved around. So, we really like to suggest working with all of the teams within your organization. Again, product, potentially legal, maybe IT, to understand your data movement patterns and the business requirements for those data movement patterns. 

Why does your data need to move multiple times within an availability zone? Why does it need to move between regions? Do you need to have data that is copied across multiple availability zones? Do you need that data to be cross-region? These are some examples of really important questions to ask to understand, do you need to continue transferring that data? Because the more you can optimize the way that that data is moving around within AWS, the less money you'll ultimately spend.
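One rough way to start answering those questions is to list each data flow, tag it with the kind of boundary it crosses, and total the estimated monthly cost per boundary. The sketch below is only an illustration: the flow names, the volumes, and especially the per-GB rates are hypothetical placeholders, not actual AWS pricing, so swap in the rates from your own bill.

```python
from collections import defaultdict

# Hypothetical per-GB rates in USD. These are placeholders for illustration,
# NOT actual AWS pricing; replace them with the rates from your own bill.
RATES_PER_GB = {
    "same_az": 0.00,
    "cross_az": 0.01,
    "cross_region": 0.02,
    "internet_egress": 0.09,
}


def monthly_transfer_cost(flows):
    """Return the estimated monthly cost per path type.

    flows: iterable of (flow_name, path_type, gb_per_month) tuples.
    """
    totals = defaultdict(float)
    for _name, path_type, gb_per_month in flows:
        totals[path_type] += gb_per_month * RATES_PER_GB[path_type]
    return dict(totals)


# Hypothetical flows gathered by talking to the teams that own them.
flows = [
    ("replica-sync", "cross_az", 5_000),
    ("dr-backup", "cross_region", 2_000),
    ("api-responses", "internet_egress", 1_000),
]

costs = monthly_transfer_cost(flows)
for path_type, cost in sorted(costs.items(), key=lambda item: -item[1]):
    print(f"{path_type}: ${cost:,.2f}/month")
```

Sorting by cost puts the most expensive boundary first, which is usually the flow worth asking "why does this data need to move?" about before any other.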

Pete: Yeah, and this ties into, again as you've noticed, there's a reoccurring theme is th...

Join Pete and Jesse as they talk about the prohibitively expensive costs associated with moving data in the cloud. They touch upon how data transfer is so expensive in AWS and how many people don’t realize it when they first migrate, how data transfer costs in data centers have always been hidden, the role context plays in data transfer and why it’s important to know how and why data is moved around, the questions you need to ask yourself to figure out why data is moving within AWS, why you should rope legal into the process when figuring out how data transfers across your cloud environment, Pete’s gripes about the NAT gateway service, and more.
