Please login or sign up to post and edit reviews.
Whiteboard Confessional: How Cluster SSH Almost Got Me Fired
Publisher |
Corey Quinn
Media Type |
audio
Categories Via RSS |
Business News
News
Tech News
Publication Date |
Feb 28, 2020
Episode Duration |
00:14:03

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

TranscriptCorey: On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying. CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care and feed for yourself, instead, you now get to focus on just storing data, treating it like you normally would other S3 data and not replicating it, storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

So, once upon a time, way back in the mists of antiquity, was a year called 2006. This is before many folks listening to this podcast were involved in technology. And I admit as well that it is also several decades after other folks listening to this podcast got involved in technology. But that’s not the point of this story. It was my first real job working in anything resembling a production-style environment. I’d dabbled before this, running various environments on Windows desktop style support. I’d played around with small business servers for running Windows-style environments. And then I decided there wasn’t much of a future in technology and spent some time as a technical recruiter, spent a little bit more time working in a sales role, which I was disturbingly good at, but I was selling tape drives to people. But that’s not the interesting part of the story. What is, is that I somehow managed to luck my way into a job interview for a university, helping to run their Linux and Unix systems. 

Cool. Turns out that interviewing is a skill like any other. The technical reviewer was out sick that day, and they really liked both the confidence of my answers, as well as my personality. That’s two mistakes right there. One; my personality is exactly what you would expect it to be. And two; hiring the person who sounds the most confident is exactly what you don’t want to do. It also tends to lend credence to people who look exactly like me. So I had converted some systems over in the first few months for that role to FreeBSD, which is like Linux, except it’s not Linux. It’s a Unix and it’s far older, derived from the Berkeley software distribution. and managing a bunch of those systems at scale was a challenge. Now understand, in this era scale meant something radically different than it does today. I had somewhere between 12 and 15 nodes that I had to care about. Some more mail servers. Some were NTP servers, of all things. Utility boxes here and there, the omnipresent web servers that we all dealt with, the Cacti box whose primary job was to get compromised and serve as an attack vector for the rest of the environment, etcetera. 

This was a university. Mistakes didn’t necessarily mean the same thing there as they would in revenue-generating engineering activities. I was also young, foolish, and the statute of limitations is almost certainly expired by now. So, running the same command on every box was annoying. This was in the days before configuration management was really a thing. BCFG2 was out there and incredibly complex. And CFEngine was also out there, which required an awful lot of in-depth arcane knowledge that I frankly didn’t have. Remember, I bluffed my way into this job and was learning on the fly. So I did a little digging and, lo and behold, I found a tool that solved my problems. called ClusterSSH. And oh, was it a cluster. The way that this works was that it would spin up different xterm windows on your screen that you could then provide a list of hosts for, and it would open one for every host you gave it. 

Great. So now I’m logged into all of those boxes at once. If this is making you cringe already, it probably should, because this is not a great architectural pattern. But here we are, we’re telling this story, so you probably know how that worked out. One of the intricacies of FreeBSD is that instead of running systems that turn things on or turn things off, as far as services to start on boot. For example, with Red Hat derived systems, before the dark times of systemd, you could write things like chkconfig, that’s C-H-K, the word config, and then you could give a service and tell it to turn it on or off at certain run levels. This is how you would tell it to, for example, start the webserver when you boot, otherwise, you reboot the system, the webserver does not start, and you wonder why TCP now terminates on the ground. This was all controlled via a single file—/etc/rc.conf. That controlled which services were allowed to start, as well as which services were going to be started automatically on boot. It would generally be a boolean value provided to the particular service name. 

Well, I was trying to do something, probably, I want to say, NTP related, but don’t quote me on that, where I wanted to enable a certain service to start on all of the systems at once. So I typed a command, specifically echoing the exact string that I wanted in quotes, so it would be quoted appropriately, and then with the right angle bracket, to that file—/etc/rc.conf, and then I pressed enter. Now, for those who are unaware of Unix-isms and how things work shell, a single right angle bracket means overwrite this file, two angle brackets say append to the end of this file. I was trying to get the second one, and instead, I wound up getting the first. So suddenly, I had just rewritten all of those files across every server. Great plan, huh? Well, I realized what I’d done as soon as I checked my work to validate that the system had taken the update appropriately, it had not, it had taken something horrifying up instead. What happened next? Great question.

But first, in the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn’t have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in ...

Join me as I continue a new series called Whiteboard Confessional with a deep dive into Cluster SSH, how I landed my first role in a production-style environment at a university, how engineering work is much different in academia than in the for-profit world, the journey that led me to find Cluster SSH and how the tool works, how Unix admins generally get interested in backups right after they really need to have backups that are working, why restores are harder than backups, why systems that are doing configuration management need to understand the concept of idempotence, tools to use instead of Cluster SSH, and more.

About Corey Quinn

Over the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.

Links

TranscriptCorey: On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying. CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care and feed for yourself, instead, you now get to focus on just storing data, treating it like you normally would other S3 data and not replicating it, storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.

So, once upon a time, way back in the mists of antiquity, was a year called 2006. This is before many folks listening to this podcast were involved in technology. And I admit as well that it is also several decades after other folks listening to this podcast got involved in technology. But that’s not the point of this story. It was my first real job working in anything resembling a production-style environment. I’d dabbled before this, running various environments on Windows desktop style support. I’d played around with small business servers for running Windows-style environments. And then I decided there wasn’t much of a future in technology and spent some time as a technical recruiter, spent a little bit more time working in a sales role, which I was disturbingly good at, but I was selling tape drives to people. But that’s not the interesting part of the story. What is, is that I somehow managed to luck my way into a job interview for a university, helping to run their Linux and Unix systems. 

Cool. Turns out that interviewing is a skill like any other. The technical reviewer was out sick that day, and they really liked both the confidence of my answers, as well as my personality. That’s two mistakes right there. One; my personality is exactly what you would expect it to be. And two; hiring the person who sounds the most confident is exactly what you don’t want to do. It also tends to lend credence to people who look exactly like me. So I had converted some systems over in the first few months for that role to FreeBSD, which is like Linux, except it’s not Linux. It’s a Unix and it’s far older, derived from the Berkeley software distribution. and managing a bunch of those systems at scale was a challenge. Now understand, in this era scale meant something radically different than it does today. I had somewhere between 12 and 15 nodes that I had to care about. Some more mail servers. Some were NTP servers, of all things. Utility boxes here and there, the omnipresent web servers that we all dealt with, the Cacti box whose primary job was to get compromised and serve as an attack vector for the rest of the environment, etcetera. 

This was a university. Mistakes didn’t necessarily mean the same thing there as they would in revenue-generating engineering activities. I was also young, foolish, and the statute of limitations is almost certainly expired by now. So, running the same command on every box was annoying. This was in the days before configuration management was really a thing. BCFG2 was out there and incredibly complex. And CFEngine was also out there, which required an awful lot of in-depth arcane knowledge that I frankly didn’t have. Remember, I bluffed my way into this job and was learning on the fly. So I did a little digging and, lo and behold, I found a tool that solved my problems. called ClusterSSH. And oh, was it a cluster. The way that this works was that it would spin up different xterm windows on your screen that you could then provide a list of hosts for, and it would open one for every host you gave it. 

Great. So now I’m logged into all of those boxes at once. If this is making you cringe already, it probably should, because this is not a great architectural pattern. But here we are, we’re telling this story, so you probably know how that worked out. One of the intricacies of FreeBSD is that instead of running systems that turn things on or turn things off, as far as services to start on boot. For example, with Red Hat derived systems, before the dark times of systemd, you could write things like chkconfig, that’s C-H-K, the word config, and then you could give a service and tell it to turn it on or off at certain run levels. This is how you would tell it to, for example, start the webserver when you boot, otherwise, you reboot the system, the webserver does not start, and you wonder why TCP now terminates on the ground. This was all controlled via a single file—/etc/rc.conf. That controlled which services were allowed to start, as well as which services were going to be started automatically on boot. It would generally be a boolean value provided to the particular service name. 

Well, I was trying to do something, probably, I want to say, NTP related, but don’t quote me on that, where I wanted to enable a certain service to start on all of the systems at once. So I typed a command, specifically echoing the exact string that I wanted in quotes, so it would be quoted appropriately, and then with the right angle bracket, to that file—/etc/rc.conf, and then I pressed enter. Now, for those who are unaware of Unix-isms and how things work shell, a single right angle bracket means overwrite this file, two angle brackets say append to the end of this file. I was trying to get the second one, and instead, I wound up getting the first. So suddenly, I had just rewritten all of those files across every server. Great plan, huh? Well, I realized what I’d done as soon as I checked my work to validate that the system had taken the update appropriately, it had not, it had taken something horrifying up instead. What happened next? Great question.

But first, in the late 19th and early 20th centuries, democracy flourished around the world. This was good for most folks, but terrible for the log analytics industry because there was now a severe shortage of princesses to kidnap for ransom to pay for their ridiculous implementations. It doesn’t have to be that way. Consider CHAOSSEARCH. The data lives in your S3 buckets in ...

This episode currently has no reviews.

Submit Review
This episode could use a review!

This episode could use a review! Have anything to say about it? Share your thoughts using the button below.

Submit Review