Whiteboard Confessional: The Day IBM Cloud Dissipated
Publisher |
Corey Quinn
Media Type |
audio
Categories Via RSS |
Business News
News
Tech News
Publication Date |
Jul 03, 2020
Episode Duration |
00:13:14

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

Transcript

Corey: Welcome to AWS Morning Brief: Whiteboard Confessional. I'm Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real world forces us to build, and that the best thing to call your staging environment is "theory." Because invariably, whatever you've built works in theory, but not in production. Let's get to it.

This episode is sponsored in part by ParkMyCloud, fellow worshipers at the altar of turn that [BLEEP] off. ParkMyCloud makes it easy for you to ensure you're using public cloud like the utility it's meant to be. Just like water and electricity, you pay for most cloud resources when they're turned on, whether or not you're using them. Just like water and electricity, keep them away from the other computers. Use ParkMyCloud to automatically identify and eliminate wasted cloud spend from idle, oversized, and unnecessary resources. It's easy to use, and you can start reducing your cloud bills. Get started for free at parkmycloud.com/screaming.

Welcome to the AWS Morning Brief's Whiteboard Confessional series. I am Cloud Economist Corey Quinn, and today's topic is going to be slightly challenging to talk about. One of the core tenets that we've always had around technology companies and working in SRE or operations-type organizations is, full stop, you do not make fun of other people's downtime, because today it's their downtime, and tomorrow it's yours. It's important. That's why we see the hashtag #HugOps on Twitter start to—well, not trend. It's not that well known, but it definitely shows up fairly frequently when there's a well-publicized, multi-hour outage that affects a company people are familiar with.

So, what we're going to talk about is an outage that happened several weeks ago at IBM Cloud. I want to point out some failings on IBM's part, but this is in the quote-unquote "sober light of day." They are not currently experiencing an outage. They've had ample time to make public statements about the cause of the outage. And I've had time to reflect a little bit on what message I want to carry forward, given that there are definitely lessons for the rest of us to learn. HugOps is important, but it only goes so far, and at some point it's important to talk about the failings of large companies and their responses to crises so the rest of us can learn.

Now, I'm about to dunk on them fairly hard, but I stand by the position that I'm taking, and I hope that it's interpreted in the constructive spirit that I intend. For background, IBM Cloud is IBM's purported hyperscale cloud offering. It was effectively stitched together from a variety of different acquisitions, most notable among them SoftLayer. I've had multiple consulting clients who are customers of IBM Cloud over the past few years, and their experience has been, to put it politely, a mixed bag. In practice, the invective they would lob at it was something rather worse.

Now, a month ago, something strange happened to IBM Cloud. Specifically, it went down. I don't mean that a service started having problems in a region. That tends to happen to every cloud provider, and it's important that we don't wind up beating them up unnecessarily for these things. No, IBM Cloud went down. And when I say that IBM Cloud went down, I mean, the entire thing effectively went off the internet. Their status page stopped working, for example. Every resource that people had inside of IBM Cloud was reportedly down. And this was relatively unheard of in the world of global cloud providers. 

Azure and GCP don't have the same isolated per-region network boundary that AWS has, but even in those cases, we tend to see rolling outages far more frequently than global outages that affect everything simultaneously. What's strange is that their status page was down. Every point of access you had for seeing what was going on with IBM Cloud was down. Their Twitter accounts fell silent, other than pre-scheduled promotional tweets that were set to go out. It looked for all the world like IBM had just decided to pack up early, turn everything off on the way out of the office, and enjoy the night off.

That obviously isn't what happened, but it was notable that there was no communication for the first hour or so of the outage, and this was causing people to go more than a little bonkers. One of the things that interested me while this was happening, since it was impossible to get anything substantive or authoritative out of them, was their marketing site, which I pulled up. Now, the marketing site still worked—apparently, it does not live on top of IBM Cloud—but it listed a lot of their marquee customers and case studies. I went through a quick sampling, and American Airlines was the only site with a big outage notification on its front page. Everything else seemed to be working.

So, either the outage was not as widespread as people thought, or a lot of their marquee customers are only using them for specific components. Either one of those is compelling and interesting, but we don't have a whole lot of data to feed back into the system to draw reasonable conclusions. Their status page itself, as I mentioned, was down, and that's super bad. One of the early things you learn when running a large-scale system of any kind is that the thing that tells you—and the world—that you're down cannot have a dependency on anything you personally run. The AWS status page had this same problem, somewhat hilariously, during the S3 outage a few years ago, when they had trouble updating it because of that very outage. I would imagine that's no longer the case, but one does wonder.
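
To make that concrete, here is a minimal sketch of the idea, written in Python purely as an illustration: a tiny out-of-band probe that runs somewhere entirely separate from the platform it watches and emits a status document you could serve from independent hosting. The endpoint names and the notion of copying the output to separately hosted static storage are assumptions for the example, not any particular provider's real setup.

"""Out-of-band status probe sketch. Assumed to run on infrastructure that
is completely separate from the cloud it monitors; the endpoint URLs and
the publishing step are placeholders for the example."""
import json
import time
import urllib.request

# Public endpoints of the platform being monitored (made-up placeholders).
ENDPOINTS = {
    "console": "https://console.example-cloud.com/healthz",
    "api": "https://api.example-cloud.com/v1/ping",
}


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with an HTTP 2xx in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False


def build_status() -> dict:
    """Assemble a small status document from the probes."""
    return {
        "generated_at": int(time.time()),
        "components": {name: ("up" if probe(url) else "down")
                       for name, url in ENDPOINTS.items()},
    }


if __name__ == "__main__":
    # In practice you would copy this JSON to static hosting that shares no
    # dependencies (provider, DNS, CDN) with the platform being monitored;
    # here it is simply printed.
    print(json.dumps(build_status(), indent=2))

The specifics don't matter much; the point is that nothing in this path (the compute it runs on, its DNS, its hosting) should be a service you are also reporting on.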

And most damning, and the reason I bring this up, is that the following day they posted the following analysis on their site: “IBM is focused on external network provider issues as the cause of the disruption of IBM Cloud services on Tuesday, June 9th. All services have been restored. A detailed root cause analysis is underway. An investigation shows an external network provider flooded the IBM Cloud network with incorrect routing, resulting in severe congestion of traffic, and impacting IBM Cloud services, and our data centers. Mitigation steps have been taken to prevent a recurrence. Root ...
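
For a sense of what "flooded the network with incorrect routing" tends to mean in practice: networks normally protect their routing sessions with per-peer prefix filters and maximum-prefix limits, so a misbehaving neighbor gets disconnected instead of drowning everyone in bogus routes. The toy Python sketch below models only that guardrail; the class, the numbers, and the prefixes are invented for illustration, and it is emphatically not a description of IBM's actual mitigation.

"""Toy model of the guardrails that usually contain a routing flood:
per-peer prefix filters and a maximum-prefix limit. This is a teaching
sketch, not IBM's actual mitigation; the names and limits are made up."""
import ipaddress
from dataclasses import dataclass, field


@dataclass
class PeerSession:
    name: str
    # Only announcements inside these blocks are accepted (prefix filter).
    allowed_blocks: list
    # Tear the session down if the peer announces more routes than this.
    max_prefixes: int
    accepted: set = field(default_factory=set)
    shutdown: bool = False

    def receive_announcement(self, prefix: str) -> bool:
        """Accept or reject one route announcement from this peer."""
        if self.shutdown:
            return False
        net = ipaddress.ip_network(prefix)
        # Reject prefixes this peer has no business originating.
        if not any(net.subnet_of(block) for block in self.allowed_blocks):
            return False
        self.accepted.add(net)
        # A sudden flood of routes trips the max-prefix limit.
        if len(self.accepted) > self.max_prefixes:
            self.shutdown = True
            self.accepted.clear()
            return False
        return True


if __name__ == "__main__":
    peer = PeerSession(
        name="transit-provider",
        allowed_blocks=[ipaddress.ip_network("198.51.100.0/24")],
        max_prefixes=4,
    )
    print(peer.receive_announcement("198.51.100.0/25"))  # in range: accepted
    print(peer.receive_announcement("203.0.113.0/24"))   # leaked route: filtered
    # A burst of more-specific routes trips the limit and shuts the
    # session down instead of congesting the network.
    for i in range(8):
        peer.receive_announcement(f"198.51.100.{i * 16}/28")
    print(peer.shutdown)  # True

In a real network this lives in router configuration rather than application code, but the shape is the same: accept a peer's announcements only within agreed bounds, and drop the session rather than let a flood of routes congest everything.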

Join me as I continue the Whiteboard Confessional series by examining IBM Cloud’s recent widespread outage. I talk about why you should never make fun of someone else’s downtime, how IBM’s response to the outage was abysmal, what cloud providers should do whenever their systems go down, how a cloud provider’s status page can never be dependent on things they are personally running, why IBM Cloud deserves the Oxymoron of the Year award, why it’s important to let the broader community learn from your major mistakes, how this fiasco demonstrates that IBM Cloud isn’t ready to properly service the market, and more.
