Luca Casonato is the tech lead for Deno Deploy and a TC39 delegate.
Deno is a JavaScript runtime from Ryan Dahl, the original creator of Node.js.
You can help edit this transcript on GitHub.
[00:00:07] Jeremy: Today I'm talking to Luca Casonato. He's a member of the Deno core team and a TC39 delegate.
[00:00:06] Luca: Hey, thanks for having me.
[00:00:07] Jeremy: So today we're gonna talk about Deno, and on the website it says, Deno is a runtime for JavaScript and TypeScript. So I thought we could start with defining what a runtime is.
[00:00:21] Luca: Yeah, that's a great question. This question actually comes up a lot. Sometimes we also define Deno as a headless browser, or, I don't know, a JavaScript script execution tool. What actually defines a runtime? I think what makes a runtime a runtime is that it is implemented in native code.
It cannot be self-hosted. Like, you cannot self-host a JavaScript runtime. And it executes JavaScript or TypeScript or some other scripting language. It's essentially a JavaScript execution engine which is not self-hosted.
So it maybe has IO bindings, but it doesn't necessarily need to. Maybe it allows you to read from the file system or make network calls, but it doesn't have to. I think the primary definition is something which can execute JavaScript without already being written in JavaScript.
[00:01:20] Jeremy: And when we hear about JavaScript runtimes, whether it's Deno or Node or Bun, or anything else, we also hear about them in the context of V8. Could you explain the relationship between V8 and a JavaScript runtime?
[00:01:36] Luca: Yeah. So V8 and JavaScriptCore and SpiderMonkey, these are all JavaScript engines. These are the low-level virtual machines that can parse your JavaScript code, turn it into bytecode, maybe turn it into compiled machine code, and then execute that code. But these engines do not implement any IO functions.
They implement the JavaScript spec as it is written, and then they provide extension hooks for what they call host environments, environments that embed these engines to provide custom functionality, to essentially poke out of the sandbox, out of the virtual machine. And this is used in browsers.
Browsers have these engines built in; this is where they originated from. And then they poke holes into this sandboxed virtual machine to do things like, I don't know, writing to the DOM, or console logging, or making fetch calls, and all these kinds of things. And what a JavaScript runtime essentially does is it takes one of these engines and
it then provides its own set of host APIs, essentially its own set of holes it pokes into the sandbox. And depending on what the runtime is trying to do, the way it does this is gonna be different, and the sort of API that is ultimately exposed to the end user is going to be different.
For example, if you compare Deno and Node, Node is very loosey-goosey about how it pokes holes into the sandbox; it sort of just pokes them everywhere. And this makes it difficult to enforce things like runtime permissions, for example. Whereas Deno is much more strict about how it pokes holes into its sandbox.
Everything is either a web API or it's behind this Deno namespace, which means that it's really easy to find the places where you're poking out of the sandbox. And really, you can also compare these to browsers. Browsers are also JavaScript runtimes; they're just not headless
JavaScript runtimes, but JavaScript runtimes that also have a UI. And yeah, there's a whole bunch of different kinds of JavaScript runtimes, and I think we're also seeing a lot more embedded JavaScript runtimes. For example, if you've used React Native before, you may be using Hermes as the JavaScript engine in your Android app, which is a custom JavaScript engine written just for React Native.
And this is embedded within a React Native runtime, which is specific to React Native. So it's also possible to have runtimes, for example, where the backing engine can be exchanged, which is kind of cool.
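The engine-plus-host-APIs split Luca describes can be illustrated with a toy sketch. To be clear, this is not how V8 embedding actually works (real hosts bind native functions through the engine's C++ API, and the names below are invented for illustration); it only shows the idea that the host decides which extra capabilities the guest code can call:

```javascript
// Toy illustration of "a runtime = engine + host APIs".
// The "engine" here is just the Function constructor evaluating
// JavaScript source; the "host" chooses which extra capabilities
// (the "holes in the sandbox") to hand in as parameters.
// Note: unlike a real engine embedding, Function does NOT isolate
// the guest from globals; this is purely a conceptual sketch.

function makeRuntime(hostApis) {
  const names = Object.keys(hostApis);
  const values = names.map((n) => hostApis[n]);
  return {
    run(source) {
      // Compile guest source with the host APIs as named parameters.
      const fn = new Function(...names, source);
      return fn(...values);
    },
  };
}

// A "browser-like" host might expose a log sink; a "server-like"
// host might expose file or network primitives instead.
const logs = [];
const runtime = makeRuntime({
  consoleLog: (msg) => logs.push(msg),
});

runtime.run(`consoleLog("hello from the sandbox"); return 1 + 2;`);
```

Two hosts wrapping the same "engine" would expose entirely different APIs to identical guest code, which is exactly the Deno-vs-Node-vs-browser situation described above.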
[00:04:08] Jeremy: So it sounds like V8's role, one way to look at it, is that it can execute JavaScript code, but only pure functions, I suppose. You
[00:04:19] Luca: Pretty much. Yep.
[00:04:21] Jeremy: Do anything that doesn't interact with IO so you think about browsers, you were mentioning you need to interact with a DOM or if you're writing a server side application, you probably need to receive or make HTTP requests, that sort of thing.
And all of that is not handled by V8; that has to be handled by an external runtime.
[00:04:43] Luca: Exactly. There are some exceptions to this. For example, JavaScript technically has some IO built in within its standard library, like Math.random. Random number generation is technically an IO operation, so technically V8 has some IO built in, right? And getting the current date from the user, that's also technically IO.
So there are some very limited edge cases. It's not purely pure, but V8, for example, has a flag to make it completely deterministic, which means that it really is completely pure. And this is not something which runtimes usually have. This is a feature of an engine, because the engine is so low level that there's so little IO that it's very easy to make deterministic, whereas a runtime at a higher level has IO and is much more difficult to make deterministic.
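The "mostly pure, with a few ambient escape hatches" point can be shown directly. A sketch contrasting a pure computation (which an engine can evaluate deterministically) with the two built-in sources of nondeterminism Luca mentions:

```javascript
// Pure computation: the result depends only on the arguments,
// so the same call always produces the same value.
function pureSum(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// These, by contrast, are the "technically IO" bits built into the
// language itself: their results come from the outside world
// (an entropy source, the system clock), not from any argument.
const roll = Math.random(); // varies from run to run
const now = Date.now();     // varies from run to run

console.log(pureSum([1, 2, 3])); // always 6
```

This is why an engine flag that stubs out exactly these few ambient sources is enough to make execution fully deterministic, while a runtime with file and network APIs has far more surface to pin down.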
[00:05:39] Jeremy: And for things like when you're working with JavaScript, there's asynchronous programming
[00:05:46] Luca: mm-hmm.
[00:05:47] Jeremy: So you have concurrency and things like that. Is that a part of V8 or is that the responsibility of the run time?
[00:05:54] Luca: That's a great question. So there are multiple parts to this. There's JavaScript promises and concurrent JavaScript execution, which is handled by V8. In pure V8, you can create a promise and you can execute some code within that promise.
But without IO, there's actually no way to defer time, which means that with pure V8 you can either create a promise which executes right now, or you can create a promise that never executes. But you can't create a promise that executes in 10 seconds, because there's no way to measure 10 seconds asynchronously.
What runtimes do is they add something called an event loop on top of the base engine. A very simple event loop, for example, might have a timer in it which every second checks whether there's a timer scheduled to run within that second.
And if that timer exists, it'll call out to V8 and say, you can now execute that promise. But V8 is still the one keeping track of which promises exist, and the code that is meant to be invoked when they resolve, all that kind of thing. But the underlying infrastructure that actually decides which promises get resolved at what point in time, the asynchronous IO, as this is called,
is driven by the event loop, which is implemented by the runtime. Deno, for example, uses Tokio for its event loop. This is an event loop written in Rust; it's very popular in the Rust ecosystem. Node uses libuv, a relatively popular event loop implementation for C.
And libuv was written for Node, while Tokio was not written for Deno. But yeah, Chrome has its own event loop implementation, and Bun has its own event loop implementation.
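The timer half of an event loop described above can be sketched in a few lines. This is a deliberately simplified model with a manual clock so it stays deterministic; real event loops (Tokio in Deno, libuv in Node) run against the OS clock and OS IO events, and the names here are invented:

```javascript
// Minimal sketch of the timer part of an event loop. The loop owns
// the clock and the timer queue; when a timer comes due, it "calls
// back into the engine" (here, just invokes the callback), which is
// the point where the engine would resolve the associated promise.

class ToyEventLoop {
  constructor() {
    this.now = 0;
    this.timers = []; // entries of { due, callback }
  }
  setTimeout(callback, delay) {
    this.timers.push({ due: this.now + delay, callback });
  }
  // One "tick": advance the clock and fire every timer that has come due.
  advance(ms) {
    this.now += ms;
    const due = this.timers.filter((t) => t.due <= this.now);
    this.timers = this.timers.filter((t) => t.due > this.now);
    for (const t of due) t.callback();
  }
}

const loop = new ToyEventLoop();
const fired = [];
loop.setTimeout(() => fired.push("a"), 10);
loop.setTimeout(() => fired.push("b"), 25);

loop.advance(10); // fires "a"
loop.advance(10); // nothing due yet
loop.advance(10); // fires "b"
```

Note how the engine-side bookkeeping (which callback belongs to which timer) is separate from the loop-side decision of when to fire, which is exactly the division of labor between V8 and Tokio or libuv.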
[00:07:50] Jeremy: So we might go a little bit more into that later, but I think what we should probably go into now is why make Deno? Because you have Node, which is currently very popular, and the co-creator of Deno, to my understanding, actually created Node. So maybe you could explain to our audience what was missing or what was wrong with Node, where they decided, I need to create a new runtime.
[00:08:20] Luca: Yeah. So the primary point of concern here was that Node was slowly diverging from browser standards with no real path to re-converging. There was nothing that was pushing Node in the direction of standards compliance, and there was nothing that was forcing Node to innovate.
And we really saw this because in the time between, I don't know, 2015 and 2018, Node was slowly working on ESM while browsers had already shipped ESM for like three years. Node did not have fetch; Node only got fetch last year, right? Six, seven years after browsers got fetch.
Node's streams implementation is still very divergent from standard web streams. Node was very reliant on callbacks, and it still is; promises in many places of the Node API are an afterthought, which makes sense because Node was created in a time before promises existed. But there was really nothing that was pushing Node forward, right?
Nobody was actively investing in improving the API of Node to be more standards compliant. And so what we really needed was a new greenfield project which could demonstrate that actually writing a new server-side runtime is (a) viable, and (b) totally doable with an API that is more standards compliant.
Essentially, you can write a browser, like a headless browser, and have that be an excellent-to-use JavaScript runtime, right? And then there were some things on top of that, like TypeScript support, because TypeScript was, and still is, incredibly popular, even more so than it was four years ago when Deno was created or envisioned. And this permission system: Node really poked holes into the V8 sandbox very early on, and it's gonna be very difficult for Node to ever reconcile this.
Especially because some of the APIs that it exposes are just so incredibly low level that, I don't know, you can mutate random memory within your process. Which, if you want to have a secure sandbox, just doesn't work; it's not compatible. So there really needed to be a place where you could explore this direction and see if it worked.
And Deno was that. Deno still is that, and I think Deno has outgrown that now into something which is much more usable as a production-ready runtime, and many people do use it in production. And now Deno is on the path of slowly converging back with Node, from both directions. Node is slowly becoming more standards compliant, and depending on who you ask, this was done because of Deno, or some people say it had already been going on and Deno just accelerated it. But that's not really relevant, because the point is that Node is becoming more standards compliant, and in the other direction, Deno is becoming more Node compliant.
Deno is implementing Node compatibility layers that allow you to run code that was originally written for the Node ecosystem in the standards-compliant runtime. So through those two directions, the runtimes are sort of moving back towards each other. I don't think they'll ever merge, but we're getting to a point here pretty soon, I think, where it doesn't really matter which runtime you write for, because you'll be able to run code written for one runtime in the other runtime relatively easily.
[00:12:03] Jeremy: If you're saying the two are becoming closer to one another, both becoming closer to the web standards that run in the browser, and if you're talking to someone who's currently developing in Node, what's the incentive for them to switch to Deno, versus using Node and hoping that eventually they'll kind of meet in the middle?
[00:12:26] Luca: Yeah, so I think Deno is a lot more than just a runtime, right? A runtime executes JavaScript; Deno executes JavaScript, and it executes TypeScript. But Deno is so much more than that. Deno has a built-in formatter, it has a built-in linter, it has a built-in testing framework, a built-in benchmarking framework.
It has a built-in bundler. It can create self-contained executables: bundle your code and the Deno executable into a single executable that you can ship off to someone. It has a dependency analyzer. It has editor integrations. Yeah, I could go on for hours (laughs) about all of the auxiliary tooling that's inside of Deno that's not the JavaScript runtime.
And also, Deno as a JavaScript runtime is just more standards compliant than any of the other server-side runtimes right now. So if you're really looking for something which is standards compliant, which is gonna live on forever, then, you know, you cannot kill off the fetch API ever.
The fetch API is going to live forever because Chrome supports it. And the same goes for localStorage and, I don't know, the Blob API and all these other web APIs. They have shipped in browsers, which means that they will be supported until the end of time. And yeah, maybe Node has also reached that with its API, probably to some extent.
But don't underestimate the power of like three billion Chrome users that would scream immediately if the fetch API stopped working, right?
[00:13:50] Jeremy: Yeah, I think maybe what it sounds like also is that because you're using the API that's used in the browser, you would hope that the places where you deploy JavaScript applications in the future would all settle on using that same API, so that if you were using Deno, you could host it at different places and not worry about, do I need to use a special API, maybe, like you would in Node?
[00:14:21] Luca: Yeah, exactly. And this is actually something which we're specifically working towards. So, I don't know if you've heard of WinterCG? It's a community group at the W3C that Cloudflare and Deno and some others, including Shopify, started last year. Essentially, we're trying to standardize the concept of what a server-side JavaScript runtime is and what APIs it needs to have available to be standards compliant.
And essentially getting this portability written down somewhere: writing down exactly what code you can write and expect to be portable. And we can see that all of the big players that are involved in building JavaScript runtimes right now are actively engaged with us at WinterCG and are actively building towards this future.
So I would expect that any code that you write today which runs in Deno, runs in Cloudflare Workers, runs on Netlify Edge Functions, runs on Vercel's Edge Runtime, or runs on Shopify Oxygen is going to run on the other four of those within the next couple of years here. I think the APIs of these are gonna converge to be essentially the same.
There are obviously always gonna be some nuances. Like, I don't know, Chrome and Firefox and Safari don't perfectly have the same API everywhere, right? Chrome has some Web Bluetooth capabilities that Safari doesn't, or Firefox has some, I don't know, non-standard extensions to the Error object which none of the other runtimes do.
But overall you can expect these runtimes to mostly be aligned. And I think that's really, really excellent, and that's really one of the reasons why one should consider building for this standard runtime: it just guarantees that you'll be able to host this somewhere in five years' time and ten years' time with very little effort.
Even if Deno goes under, or Cloudflare goes under, or, I don't know, nobody decides to maintain Node anymore, it'll be easy to run somewhere else. And I also expect that the big cloud vendors will ultimately provide managed offerings for the standards-compliant JavaScript runtime as well.
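The "portable by construction" idea above can be sketched concretely. The function below is a hypothetical example (the name and shape are invented, not a WinterCG API); the point is that it touches only web-standard globals (URL, URLSearchParams, TextEncoder) that browsers, Deno, Node 18+, and the edge runtimes discussed all provide, so it needs no runtime-specific imports:

```javascript
// A sketch of code that is portable across standards-compliant
// runtimes because it uses only web-standard globals.

function describeRequestUrl(rawUrl) {
  const url = new URL(rawUrl);                       // WHATWG URL
  const name = url.searchParams.get("name") ?? "world";
  const body = `Hello, ${name}!`;
  return {
    path: url.pathname,
    body,
    // Byte length the standards way: encode to UTF-8 with TextEncoder.
    byteLength: new TextEncoder().encode(body).length,
  };
}

const result = describeRequestUrl("https://example.com/greet?name=Luca");
console.log(result.path);       // "/greet"
console.log(result.body);       // "Hello, Luca!"
console.log(result.byteLength); // 12
```

The same source file runs unchanged in a browser console, a Deno script, a Node 18+ script, or an edge function body, which is the portability guarantee the group is trying to pin down.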
[00:16:36] Jeremy: And this WinterCG group, is Node a part of that as well?
[00:16:41] Luca: Yes, we've invited Node to join. Due to the complexities of how Node's internal decision-making system works, Node is not officially a member of WinterCG. There are some individual members of the Node technical steering committee who are participating. For example, James M. Snell is my co-chair on WinterCG.
He also works at Cloudflare. He's also a Node TSC member. Matteo Collina, who has been instrumental in getting fetch landed in Node, is also actively involved. So Node is involved, but because Node is Node, and Node's decision-making process works the way it does, Node is not officially listed anywhere as a member.
But yeah, they're involved, and maybe they'll be a member at some point. But yeah, let's see. (laughs)
[00:17:34] Jeremy: Yeah. And so it sounds like you're thinking that's more of a governance or organizational aspect of Node than it is a technical limitation. Is that right?
[00:17:47] Luca: Yeah. I obviously can't speak for the Node technical steering committee, but I know that there's a significant chunk of the Node technical steering committee that is very favorable towards standards compliance. But parts of the Node technical steering committee are also not; they are either indifferent or, I don't know if they're still actively working against this, but they have actively worked against standards compliance in the past.
And because the Node governance structure is so open and lets all these voices be heard, that just means that decision-making processes within Node can take so long. This is also why the fetch API took eight years to ship. This was not a technical problem,
and it is also not a technical problem that Node does not have URLPattern support, or the File global, or that the Web Crypto API was not on the global object until late last year, right? These are not technical problems; these are decision-making problems. And yeah, that was also part of the reason why we started Deno as a separate thing, because you can try to innovate Node from the inside, but innovating Node from the inside is very slow, very tedious, and requires a lot of fighting.
And sometimes just showing somebody from the outside, look, this is the bright future you could have, makes them more inclined to do something.
[00:19:17] Jeremy: You gave the example of fetch taking eight years to get into Node. Do you have a sense of what the typical objection is to something like that? Like, I understand there's a lot of people involved, but why would somebody say, I don't want this?
[00:19:35] Luca: Yeah. So for fetch specifically, there were many different kinds of concerns. I can maybe list two of them. One of them was, for example, that the fetch API is not a good API, and as such, Node should not have it. Which is sort of missing the point: because it's a standard API, how good or bad the API is is much less relevant, because if you can share the API, you can also share a wrapper that's written around the API,
right? And then the other concern was, Node does not need fetch because Node already has an HTTP API. So these are both examples of concerns that people had for a long time, and it took a long time to either convince these people or to push the change through anyway. And this is also the case for other things like, for example, Web Crypto. Like, why do we need Web Crypto?
We already have Node's crypto module. Or why do we need yet another streams implementation? Node already has four different streams implementations; why do we need web streams? And there's that XKCD: there are 14 competing standards, so let's write a 15th standard to unify them all,
and at the end we just have 15 competing standards. So I think this is also the kind of thing that people were concerned about. But I think what we've seen here is that this is really not a concern one needs to have, because it turns out in the end that if you implement web APIs, people will use web APIs, and will use web APIs only for their new code.
It takes a while, but we're seeing this with ESM versus require: new code written with require is much less common than it was two years ago. And new code now using, like, XHR, whatever it's called, XMLHttpRequest, you know, the one. I mean, compared to using fetch, nobody uses that name; everybody uses fetch. And in Node, if you write a little script, you're gonna use fetch; you're not gonna use Node's http.get API or whatever. And we're gonna see the same thing with ReadableStream. We're gonna see the same thing with Web Crypto. We're gonna see the same thing with Blob.
I think one of the big ones where Node is still divergent, and I don't think this is one that's ever gonna get solved, is the Buffer global in Node. Like, we have this Uint8Array global in all the runtimes, including browsers, and Buffer is a superset of that, but it's in global scope.
So it's sort of this non-standard extension of Uint8Array that people in Node like to use, and it's not compatible with anything else. But because it's so easy to get at, people use it anyway. So those are also the kinds of problems that we'll have to deal with eventually. And maybe that means that at some point the Buffer global gets deprecated, though it probably can never get removed.
But yeah, these are the kinds of conversations that the Node TSC is going to have to have internally in, I don't know, maybe five years.
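The Buffer situation described above can be made concrete. This snippet is itself Node-specific (Buffer is a Node-only global), which is exactly the portability problem being discussed; the portable alternative shown alongside works in every standards-compliant runtime:

```javascript
// Buffer is a Node-only global that subclasses the standard
// Uint8Array, so Buffers flow into standard APIs fine, but code that
// produces Buffers or leans on Buffer-only methods does not run
// as-is in browsers or other runtimes.

const buf = Buffer.from("hi");          // Node-only entry point
console.log(buf instanceof Uint8Array); // true: a Buffer IS a Uint8Array

// The portable spelling of the same bytes, available everywhere:
const bytes = new TextEncoder().encode("hi");
console.log(bytes[0], bytes[1]);        // 104 105 ("h", "i")

// Buffer-only convenience that plain Uint8Array lacks; code relying
// on this is what ties a library to Node:
console.log(buf.toString("utf8"));      // "hi"
```

Because Buffer is "so easy to get at" in global scope, the Node-only spelling tends to win by default, which is why Luca doubts it can ever be removed.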
[00:22:37] Jeremy: Yeah, so at a high level: what's shipped in the browser went through a standards approval process. People got it into the browser, and once it's in the browser, it's probably never going away. And because of that, it's safe to build on top of it for these server runtimes, because it's never going away from the browser.
And so everybody can kind of use it into the future and not worry about it. Yeah.
[00:23:05] Luca: Exactly. Yeah. And that's excluding the benefit that if you have code that you can write once and use in both the browser and the server-side runtime, that's really nice. That's the other benefit.
[00:23:18] Jeremy: Yeah. I think that's really powerful. And right now, when someone's looking at running something in Cloudflare Workers, versus running something in the browser, versus running something in another runtime, I think a lot of people make the assumption it's just JavaScript, so I can use it as is. But there are, at least currently, differences in what APIs are available to you.
[00:23:43] Luca: Yep. Yep.
[00:23:46] Jeremy: Earlier you were talking about how Deno is more than just the runtime. It has a linter, a formatter, a file watcher; there's all sorts of stuff in there. And I wonder if you could talk a little bit to the reasoning behind that,
[00:24:00] Luca: Mm-hmm.
[00:24:01] Jeremy: Having them all be separate things.
[00:24:04] Luca: Yeah, so the reasoning here is essentially: if you look at other modern languages, Rust is a great example, Go is a great example. Even though Go was designed around the same time as Node, it has a lot of these same tools built in. And what it really shows is that if the ecosystem converges, like, is essentially forced to converge on a single set of built-in tooling, (a) that built-in tooling becomes really, really excellent, because everybody's using it.
And also, it means that if you open any project written by any Go developer, any Rust developer, and you look at the tests, you immediately understand how the test framework works and you immediately understand how the assertions work. You immediately understand how the build system works, and you immediately understand how the dependency imports work.
And you immediately understand, I wanna run this project and I wanna restart it when my file changes; you immediately know how to do that, because it's the same everywhere. And this kind of feeling of learning one tool and then being able to use it in all of your projects, being able to contribute to open source, when you're moving jobs, whatever, between personal projects that you haven't touched in two years, being able to learn this once and then use it everywhere is such an incredibly powerful thing.
People don't appreciate this until they've used a runtime or a language which provides it to them. You can go to any Go developer and ask them. There's this saying in the Go ecosystem; I don't remember exactly how it goes, but it essentially implies that the way gofmt formats code, maybe not everybody likes, but everybody loves gofmt anyway, because it just makes everything look the same.
And you can read your friend's code, your colleague's code, your new job's code the same way that you read your own code from two years ago. And that's such an incredibly powerful feeling, especially if it's well integrated into your IDE: you clone a repository, open that repository, and your testing panel on the left-hand side just populates with all the tests, and you can click on them and run them.
And if an assertion fails, it's the standard output format that you're already familiar with. It's a really great feeling. And if you don't believe me, just go try it out, and then you will believe me. (laughs)
[00:26:25] Jeremy: Yeah. No, I'm totally with you. I think it's interesting because with JavaScript in particular, it feels like the default in the community is the opposite, right? There are so many different build tools and testing frameworks and formatters, and it's very different from, like you were mentioning, a Go or a Rust, the more recent languages where they just include it all, bundled in. Yeah.
[00:26:57] Luca: Yeah, and I think you can see this as well in the time that the average JavaScript developer spends configuring their tooling compared to a Rust developer. If I write Rust, I write Rust all day, every day, and I spend maybe two, three percent of my time configuring Rust tooling: doing dependency imports, opening a new project, creating a formatter config file, I don't know, deleting the build directory, stuff like that.
That's essentially what it means for me to configure my Rust tooling. Whereas if you compare this to a front-end JavaScript project, you have to deal with making sure that your React version is compatible with your ReactDOM version, is compatible with your Next.js version, is compatible with your Vite version, is compatible with your whatever version, right?
This is all not automatic. As a front-end developer, you don't just have npm installed, no. You have npm installed, you have Yarn installed, you have pnpm installed. You probably have, like, Bun installed. And, I don't know, to use any of these, you need to have corepack enabled in Node, and you need to have all of their global bin directories symlinked or included in your path.
And then if you install something and you wanna update it, you don't know: did I install it with Yarn? Did I install it with pnpm? This is significant complexity, and you tend to spend a lot of time dealing with dependencies and dealing with package management and dealing with tooling configuration, setting up ESLint, setting up Prettier.
And I think that Prettier, especially, really showed this. It was one of the first things in the JavaScript ecosystem which was like, no, we're not gonna give you a config that you can spend six hours configuring; it's gonna be like seven options, and here you go. And everybody used it, because nobody likes configuring things,
it turns out. And even though there are always the people that say, oh, well, I won't use your tool unless... we get this all the time. Like, I'm not gonna use deno fmt because I can't, I don't know, remove the semicolons, or use single quotes, or change my tab width to 16. Right? Like, wait until all of your coworkers scream at you because you set the tab width to 16, and then see what they change it to.
And then you'll see that it's actually the exact default that everybody uses. So it'll take a couple more years, but I think we're also gonna get there. Like, Node is starting to implement a test runner, and I think over time we're also gonna converge on some standard build tools.
I think Vite, for example, is a great example of this. Doing a front-end project nowadays, like, building new front-end tooling that's not built on Vite? Yeah, don't. Vite's become the standard, and I think we're gonna see that in a lot more places.
[00:29:52] Jeremy: Yeah, though I think it's tricky, right? Because you have so many people with their existing projects, and you have people who are starting new projects and they're just searching the internet for what they should use. So you're gonna have people on webpack, you're gonna have people on Vite, and I guess now there's gonna be Turbopack, I think, is another one.
[00:30:15] Luca: Mm-hmm.
[00:30:16] Jeremy: There are all these different choices, right? And I think it's hard to really settle on one, I guess,
[00:30:26] Luca: Yeah,
[00:30:27] Jeremy: uh, yeah.
[00:30:27] Luca: Like, I think this is, in my personal opinion, also a failure of the Node technical steering committee, for the longest time, to not decide that, yes, we're going to bless this as the standard formatter for Node, and this is the standard package manager for Node. And they sort of did; for example, Node blessed npm as the standard package manager for Node.
But it didn't innovate on npm. Node's technical steering committee did not force npm to innovate. npm is a private company, ultimately bought by GitHub, and they had full control over how the npm CLI evolved, and nobody forced npm to make sure that package install times are six times faster than they were three years ago. Nobody did that, so it didn't happen. And I think this is really a failure of the Node technical steering committee, and also of the wider JavaScript ecosystem, of not being persistent enough with the focus on performance, focus on user experience, and focus on simplicity.
Things got so out of hand, and I'm happy we're going in the right direction now, but yeah, it was terrible for some time. (laughs)
[00:31:41] Jeremy: I wanna talk a little bit about how we've been talking about Deno in the context of you just using Deno using its own standard library, but just recently last year you added a compatibility shim where people are able to use node libraries in Deno.
[00:32:01] Luca: Mm-hmm.
[00:32:01] Jeremy: And I wonder if you could talk to, like earlier you had mentioned that Deno has, a different permissions model.
on the website it mentions that Deno's HTTP server is two times faster than node in a Hello World example. And I'm wondering what kind of benefits people will still get from Deno if they choose to use packages from Node.
[00:32:27] Luca: Yeah, it's a great question. Um, so, just to clarify what we actually implemented: what we have is support for you to import NPM packages. Um, so you can import any NPM package from NPM, from your TypeScript or JavaScript ECMAScript module, um, that you already have for your Deno code.
Um, and we will under the hood make sure that it is installed somewhere in some directory globally, like pnpm does. There's no local node_modules folder you have to deal with. There's no package.json you have to deal with. Um, and there's no, uh, package.json versioning things you need to deal with.
Like, what you do is you do import cowsay from "npm:cowsay@1", and that will import cowsay with, like, the semver tag 1. Um, and it'll do the semver resolution the same way Node does, or the same way npm does, rather. And what you get from that is that essentially it gives you this backdoor to call out to all of the existing Node code that's already been written, right?
Like, you cannot expect that Deno developers write, like, I don't know... There was this time when Deno did not really have that many third party modules yet. It was very early on, and, I don't know, if you wanted to connect to Postgres and there was no Postgres driver available, then the solution was to write your own Postgres driver.
And that is obviously not great. Um, (laughs) So the better solution here is, for these packages where there's no Deno native or, or web native or standards native, um, package yet that is importable with URL specifiers, to let users import them from npm. Uh, so it's sort of this backdoor into the existing NPM ecosystem.
And we explicitly, for example, don't allow you to create a package.json file or import bare Node specifiers, because we, we want to stay standards compliant here. Um, but to make this work effectively, we need to give you this little backdoor. Um, and inside of this backdoor, everything is terrible, right?
Like, inside there you can do bare specifiers, and inside there, there's package.json and there's crazy Node resolution and __dirname and CommonJS. And, like, all of that stuff is supported inside of this backdoor to make all the NPM packages work. But on the outside it's exposed as these nice, ESM-only, npm: specifiers.
and the, the reason you would want to use this over, like just using node directly is because again, like you wanna use TypeScript, no config, like necessary. You want to use, you wanna have a formatter you wanna have a linter, you wanna have tooling that like does testing and benchmarking and compiling or whatever.
All of that's built in. You wanna run this on the edge, like close to your users, in like 35 different, uh, points of presence. Um, it's like: okay, push it to your git repository, go to this website, click a button two times, and it's running in 35 data centers. Like, this is the kind of developer experience that you, I will argue, cannot get with Node right now. Like, even if you're using something like ts-node, it is not possible to get the same level of developer experience that you do with Deno. And the, the speed at which you can iterate on your projects, like create new projects, iterate on them, is incredibly fast in Deno.
Like, I can open a folder on my computer, create a single file, main.ts, put some code in there, and then call deno run main.ts. And that's it. I did not need to do npm install, I did not need to do npm init -y and remove the license and version fields from the generated package.json and set private to true and whatever else, right?
It just all works out of the box. And I think that's, that's what a lot of people come to Deno for and, and then ultimately stay for. And also, yeah, standards compliance. So, um, things you build in Deno now are gonna work in five, ten years with no hassle.
[00:36:39] Jeremy: And so with this compatibility layer or this, this shim, is it where the node code is calling out to node APIs and you're replacing those with Deno compatible equivalents?
[00:36:54] Luca: Yeah, exactly. Like, for example, we have a shim in place that shims out the node crypto API on top of the Web Crypto API. Some, some people may be familiar with this in the form of, um, Browserify shims, if anybody still remembers those. It's essentially... with your front end tooling, you were able to import from, like, node crypto in your front end projects, and then behind the scenes your webpack or your Browserify or whatever would take that import of node crypto and would replace it with, like, a shim that essentially exposed the same APIs as node crypto, but under the hood wasn't implemented with native calls, but was implemented on top of Web Crypto, or implemented in user land even.
And Deno does something similar. There's a couple edge cases of APIs where, where we do not expose the underlying thing that we shim to, to end users outside of the Node shim. Like node's nextTick, for example.
Um, like, to properly be able to shim node's nextTick, you need to implement this within the event loop in the runtime. And you don't need this in Deno, because in Deno you use the web standard queueMicrotask to, to do this kind of thing. But to be able to shim it correctly and run Node applications correctly, we need to have this sort of backdoor into some ugly APIs, um, which natively integrate in the runtime but, yeah, allow, allow this Node code to run.
[00:38:21] Jeremy: A, anytime you're replacing a component with a, a shim, I think there's concerns about additional bugs or changes in behavior that can be introduced. Is that something that you're seeing and, and how are you accounting for that?
[00:38:38] Luca: Yeah, that's, that's an excellent question. So this is actually a, a great concern that we have all the time. And it's not just even introducing bugs, sometimes it's removing bugs. Like sometimes there's bugs in the node standard library which are there, and people are relying on these bugs to be there for the applications to function correctly.
And we've seen this a lot, and then we implement this and we implement from scratch and we don't make that same bug. And then the test fails or then the application fails. So what we do is, um, we actually run node's test suite against Deno's Shim layer. So Node has a very extensive test suite for its own standard library, and we can run this suite against, against our shims to find things like this.
And there's still edge cases, obviously. Like, maybe there's a bug which Node was not even aware of existing, um, where it's, it's now like intended behavior, because somebody relies on it, right? Like, the second somebody relies on, on some non-standard or some buggy behavior, it becomes intended.
Um, but maybe there was no test that explicitly tests for this behavior. Um, so in that case we'll add our own tests to, to ensure that. But overall we can already catch a lot of these by just testing, against, against node's tests. And then the other thing is we run a lot of real code, like we'll try run Prisma and we'll try run Vite and we'll try run NextJS and we'll try run like, I don't know, a bunch of other things that people throw at us and, check that they work and they work and there's no bugs. Then we did our job well and our shims are implemented correctly. Um, and then there's obviously always the edge cases where somebody did something absolutely crazy that nobody thought possible. and then they'll open an issue on the Deno repo and we scratch our heads for three days and then we'll fix it.
And then in the next release there'll be a new bug that we added to make the compatibility with Node better. So yeah, running tests is the, is the main thing: running Node's tests.
[00:40:32] Jeremy: Are there performance implications? If someone is running an Express App or an NextJS app in Deno, will they get any benefits from the Deno runtime and performance?
[00:40:45] Luca: Yeah, there are performance implications, and they're usually the opposite of what people think they are. Like, usually when you think of performance implications, it's always a negative thing, right? It's always, okay, it's like a compromise: the shim layer must be slower than the real Node, right?
It can't be that we can run Express faster than Node can run Express. And obviously not everything is faster in Deno than it is in Node, and not everything is faster in Node than it is in Deno. It's dependent on the API, dependent on, on what each team decided to optimize. Um, and this also extends to other runtimes.
Like, you can always cherry pick results, I don't know, um, to, to make your runtime look faster in certain benchmarks. But overall, what really matters is... like, the first important step for good Node compatibility is to make sure that if somebody runs their Node code in Deno, or your other runtime or whatever, it performs at least the same.
And then anything on top of that is a great cherry on top. Perfect. But make sure the baseline is at least the same. And I think, yeah, we have very few APIs where, where like there's a significant performance degradation in Deno compared to Node. Um, and like we're actively working on these things.
Like, Deno is not a, a project that's done, right? Like, we have, I think at this point, like 15 or 16 or 17 engineers working on Deno, spanning across all of our different projects. And like, we have a whole team that's dedicated to performance, um, and a whole team that's dedicated to node compatibility.
so like these things get addressed and, and we make patch releases every week and a minor release every four weeks. so yeah, it's, it's not a standstill. It's, uh, constantly improving.
[00:42:27] Jeremy: Uh, something that kind of makes Deno stand out is its standard library. There's a lot more in there than there is in the node one.
[00:42:38] Luca: Mm-hmm.
[00:42:39] Jeremy: Uh, I wonder if you could speak to how you make decisions on what should go into it.
[00:42:46] Luca: Yeah, so early on it was easier. Early on, the, the decision making process was essentially: is this something that a top 100 or top 1000 NPM library implements? And if it is, let's include it. And the decision making is still sort of based on that, but right now we've already implemented most of the low hanging fruit.
So things that we implement now have, have discussion around them, whether we should implement them. And we have a process where, well, we have a whole team of engineers on our side, and we also have community members that, that will review PRs and, and make comments, open issues, and review those issues, to sort of discuss the pros and cons of adding any certain new API.
And sometimes it's also that somebody opens an issue that's like: I want, for example, an API to, to concatenate two Uint8Arrays together, which is something you can really easily do in Node with Buffer.concat, like the scary Buffer thing. And there's no standards way of doing that right now.
So we have a little utility function that does that. But in parallel, we're thinking about, okay, how do we propose an addition to the web standards that makes it easy to concatenate Uint8Arrays, right? Yeah, there's a lot to it. Um, but it's, it's really, um, it's all open, like, all of our, all of our discussions for, for additions to the standard library and things like that.
It's all public on GitHub: the GitHub issues and GitHub discussions and GitHub PRs. Um, so yeah, that's, that's where we do that.
[00:44:18] Jeremy: Yeah, cuz to give an example, I was a little surprised to see that there is support for markdown front matter built into the standard library. But when you describe it as, we look at the top hundred or thousand packages: are people looking at markdown? Are they looking at front matter? I, I'm sure there's a fair amount that are, so that, that makes sense.
[00:44:41] Luca: Yeah, like, that one specifically was driven by, like, our team was just building a lot of little blog pages and things like that. And every time, it was either you roll your own front matter parser, or you look for one, and this one has, like, a subtle bug here and the other one has a subtle bug there, and we really weren't satisfied with any of them.
So we, we rolled that into the standard library. We add good test coverage for it, add good documentation for it, and then it's just a resource that people can rely on. Um, and you then don't have to make the choice of, like, do I use this library to do my front matter parsing or the other library?
No, you just use the one that's in the standard library. It's, it's also part of this user experience thing, right? Like, it's just a much nicer user experience not having to make a choice about stuff like that. Like, completely inconsequential stuff, like which library do we use to do front matter parsing? (laughs)
[00:45:32] Jeremy: yeah. I mean, I think when, when that stuff is not there, then I think the temptation is to go, okay, let me see what node modules there are that will let me parse the front matter. Right. And then it, it sounds like probably ideally you want people to lean more on what's either in the standard library or what's native to the Deno ecosystem.
Yeah.
[00:46:00] Luca: Yeah. Like the, the, one of the big benefits is that the Deno Standard Library is implemented on top of web standards, right? Like it's, it's implemented on top of these standard APIs. so for example, there's node front matter libraries which do not run in the browser because the browser does not have the buffer global.
Maybe it's a nice library to do front matter parsing with, but you choose it, and then three days later you decide that actually this code also needs to run in the browser, and then you need to go switch your front matter library. Um, so, so those are also kind of reasons why we may include something in the standard library. Like, maybe there's even a really good module already to do something,
um, but if there's a certain reliance on specific node features, and we would like that library to also be compatible with, with web standards, we, uh, might include it in the standard library. Like, for example, the YAML parser in the standard library is, is a fork of, uh, of the node YAML module,
and it's, it's essentially that, but cleaned up and, and made to use more standard APIs rather than, um, node built-ins.
[00:47:00] Jeremy: Yeah, it kind of reminds me a little bit of when you're writing a front end application, sometimes you'll use node packages to do certain things and they won't work unless you have a compatibility shim where the browser can make use of certain node APIs. And if you use the APIs that are built into the browser already, then you won't, you won't need to deal with that sort of thing.
[00:47:26] Luca: Yeah. Also like less Bundled size, right? Like if you don't have to shim that, that's less, less code you have to ship to the client.
[00:47:33] Jeremy: Another thing I've seen with Deno is it supports running web assembly.
[00:47:40] Luca: Mm-hmm.
[00:47:40] Jeremy: So you can export functions and call them from type script. I was curious if you've seen practical uses of this in production within the context of Deno.
[00:47:53] Luca: Yeah, there's actually a bunch of, of really practical use cases. So probably the most executed bit of WebAssembly inside of Deno right now is actually esbuild. Like, esbuild has a WebAssembly build. esbuild is something that's written in Go. You have the choice of either running it natively in machine code, as, like, an ELF process on, on Linux or on Windows or whatever,
or you can use the WebAssembly build, and then it runs in WebAssembly. And the WebAssembly build is maybe 50% slower than the, uh, native build, but that is still significantly faster than rollup or, or, I don't know, whatever else people use nowadays to do JavaScript bundling. I don't know, I, I just use esbuild always, um,
So, um, for example, the Deno website is running on Deno Deploy. And Deno Deploy does not allow you to run subprocesses, because it's, it's like this edge runtime which, uh, has certain security permissions that are not granted, one of them being subprocesses. So it needs to execute esbuild, and the way it executes esbuild is by running it inside WebAssembly.
Um, because WebAssembly is secure. WebAssembly is, is something which is part of the JavaScript sandbox. It's inside the JavaScript sandbox; it doesn't poke any holes out. Um, so it's, it's able to run within, within a very strict security context. Um, and then other examples are, I don't know, you want to have an HTML sanitizer which is actually built on the real HTML parser in a browser.
We, we have an HTML sanitizer called, uh, ammonia, I don't remember exactly. There's, there's an HTML sanitizer library on deno.land/x which is built on the HTML parser from Firefox. Uh, which, like, ensures essentially that your HTML... like, if you do HTML sanitization, you need to make sure your HTML parser is correct, because if it's not, your browser might parse some HTML one way and your sanitizer parses it another way, and then it doesn't sanitize everything correctly.
Um, so there's the Firefox HTML parser compiled to WebAssembly. Um, you can use that to do HTML sanitization. Or the Deno documentation generation tool, for example, Deno Doc: there's a WebAssembly build for it that allows you to programmatically, like, generate documentation for, for your TypeScript modules.
Um, yeah, and, and also, like, you know, deno fmt is available as a WebAssembly module for programmatic access, and a bunch of other internal Deno, uh, components as well.
[00:50:20] Jeremy: What are some of the current limitations of web assembly and Deno for, for example, from web assembly, can I make HTTP requests? Can I read files? That sort of thing.
[00:50:34] Luca: Mm-hmm. Yeah. So WebAssembly, like, when you spawn WebAssembly, um, they're called instances, WebAssembly instances. It runs inside of the same VM, like the same V8 isolate is what they're called, but it's like a completely fresh sandbox, sort of. In the sense that I told you that an engine essentially implements no IO calls, right? And a runtime does: a runtime pokes holes into the, the engine.
WebAssembly by default works the same way: there are no holes poked into its sandbox. So you have to explicitly poke some holes. Uh, if you want to do HTTP calls, for example: when, when you create a WebAssembly instance, you can give it something called imports, uh, which are essentially JavaScript function bindings which you can call from within the WebAssembly.
And you can use those function bindings to do anything you can from JavaScript. You just have to pass them through explicitly. And, yeah, depending on how you write your WebAssembly: like, if you write it in Rust, for example, the tooling is very nice and you can just call some JavaScript code from your Rust, and then the build system will automatically make sure that the right function bindings are passed through with the right names.
And, like, you don't have to deal with anything. And if you're writing Go, it's slightly more complicated. And if you're writing, like, raw WebAssembly, like the WebAssembly text format, and compiling that to a binary, then you have to do everything yourself, right? It's, it's sort of the difference between writing C and writing JavaScript.
Like, yeah, what level of abstraction do you want? It's definitely possible, though. And as for limitations: the same limitations as, as existing browsers apply. Like, the WebAssembly support in Deno is equivalent to the WebAssembly support in Chrome. So you can do, uh, many things, like multi-threading and, and stuff like that, already.
but especially around, shared mutable memory, um, and having access to that memory from JavaScript. That's something which is a real difficulty with web assembly right now. yeah, growing web assembly memory is also rather difficult right now. There's, there's a, there's a couple inherent limitations right now with web assembly itself.
Um, but those, those will be worked out over time. And, and Deno is very up to date with the version of, of the standard it implements, um, through V8. Like, we're, we're up to date with Chrome Beta essentially all the time. So, um, yeah, anything you see in, in Chrome Beta is gonna be in Deno already.
[00:52:58] Jeremy: So you talked a little bit about this before: the Deno team, they have their own hosting platform called Deno Deploy. So I wonder if you could explain what that is.
[00:53:12] Luca: Yeah, so, sorry, I'm gonna start somewhere slightly, slightly unrelated. Maybe it sounds like it's unrelated, but you'll see in a second it's not. Um, Deno has this really nice permission system which allows you to sandbox Deno programs to only allow them to do certain operations.
For example, in Deno, by default, if you try to open a file, it'll error out and say you don't have read permissions to read this file. And then what you do is you specify --allow-read. You can either specify just --allow-read, and then it'll grant read access to the entire file system,
or you can explicitly specify files or folders or any number of things. Same goes for write permissions, same goes for network permissions, um, same goes for running subprocesses, all these kinds of things. And by limiting your permissions just a little bit, like, for example, by just disabling subprocesses and foreign function interface but allowing everything else, allowing reads and allowing network access and all that kind of stuff,
we can run Deno programs in a way that is significantly more cost effective for you as the end user, and, and like, we can cold start them much faster than you may be able to with a more conventional container based, uh, system. So what Deno Deploy is, is a way to run JavaScript or Deno code on our data centers all across the world with very little latency.
Like, you can write some JavaScript code which serves HTTP requests, deploy that to our platform, and then we'll make sure to spin that code up all across the world and have your users be able to access it through some URL or, or some, um, custom domain or something like that. And this is very similar to Cloudflare Workers, for example.
Um, and, like, Netlify Edge Functions is built on top of Deno Deploy, um, through our subhosting product. Yeah, essentially Deno Deploy is, is, um, yeah, a cloud hosting service for JavaScript, um, which allows you to execute arbitrary JavaScript.
And there, there's a couple different directions we're going there. One is, like, more end user focused, where, like, you link your GitHub repository and, like, we'll, we'll have a nice experience like you do with Netlify and Vercel, where, like, your commits automatically get deployed and you get preview deployments and all that kind of thing,
for your backend code, though, rather than for your front end websites. Although you could also write front end websites, obviously. And the other direction is more business focused. Like, you're writing a SaaS application that provides users with the ability to build their own online store,
um, and you want to give them some ability to customize the checkout experience in some way. So you give them a little text editor that they can type some JavaScript into. And then when, when your SaaS application needs to hit this code path, it sends a request to us with the code, and we'll execute that code for you in a secure way.
In a secure sandbox. You can, like, tell us: this code only has access to, like, my API server and no other networks, to, like, prevent data exfiltration, for example. And then you can have all this, like, super customizable code inside of your, your SaaS application, without having to deal with any of the operational complexities of scaling arbitrary code execution, or even just doing arbitrary code execution, right?
Like, this is a very difficult problem; give it to someone else, and we deal with it, and you just get the benefits. Yeah, that's Deno Deploy, and it's built by the same team that builds the Deno CLI. So, um, all of your favorite, like, Deno CLI or, or Deno APIs are available in there.
It's just as web standard as Deno: like, you have fetch available, you have Blob available, you have Web Crypto available, that kind of thing. Yeah.
[00:56:58] Jeremy: So when someone ships you their, their code and you run it, you mentioned that the, the cold start time is very low. Um, how, how is the code being run? Are people getting their own process? It sounds like it's not, uh, using containers. I wonder if you could explain a little bit about how that works.
[00:57:20] Luca: Yeah, yeah, I can, I can give a high level overview of how it works. So, the way it works is that we essentially have a pool of, of Deno processes ready. Well, it's not quite Deno processes, it's not the same Deno CLI that you download. It's like a modified version of the Deno CLI based on the same infrastructure, that we have spun up across all of our different regions across the world, uh, across all of our different data centers.
And then when we get a request, we'll route that request. Um, the first time we get a request for that... we call them deployments, that, like, code, right? We'll take one of these idle Deno processes and will assign that code to run in that process, and then that process can go serve the requests. And these processes, they're, they're isolated.
It's essentially a V8 isolate. Um, and it's a very, very slim, like, a much, much slimmer version of the Deno CLI, essentially. Uh, the only thing it can do is JavaScript execution, and, like, it can't even execute TypeScript, for example. Like, TypeScript we pre-process up front to make the cold start faster.
And then what we do is, if you don't get a request for some amount of time, we'll, uh, spin down that, um, that isolate and, uh, we'll spin up a new idle one in its place. And then, um, if you get another request, I don't know, an hour later for that same deployment, we'll assign it to a new isolate. And yeah, that's a cold start, right?
Uh, if you have a deployment which receives a bunch of traffic, like, let's say you receive a hundred requests per second, we can send a bunch of that traffic to the same isolate. Um, and we'll make sure that if that one isolate isn't able to handle that load, we'll spin it out over multiple isolates and we'll, we'll sort of load balance for you.
Um, and we'll make sure to always send to the, to the point of presence that's closest to, to the user making the request, so they get very minimal latency. And we've got these layers of load balancing in place. And I'm glossing over a bunch of, like, security related things here about how these, these processes are actually isolated and how we monitor to ensure that you don't break out of these processes.
And for example, Deno Deploy does, it looks like you have a file system cuz you can read files from the file system. But in reality, Deno Deploy does not have a file system. Like the file system is a global virtual file system. which is, is, uh, yeah, implemented completely differently than it is in Deno cli.
But as an end user you don't have to care about that, because the only thing you care about is that it has the exact same API as the Deno CLI, and you can run your code locally, and if it works there, it's also gonna work in Deploy. Yeah, so that's, that's kind of the high level of Deno Deploy. If, if any of this sounds interesting to anyone, by the way, uh, we're, like, very actively hiring on, on Deno Deploy.
I happen to be the, the tech lead for the Deno Deploy product, so I'm, I'm always looking for engineers to, to join our ranks and, and build cool distributed systems. Deno.com/jobs.
[01:00:15] Jeremy: For people who aren't familiar with the isolates, are these each run in their own processes, or do you have a single process and that has a whole bunch of isolates inside it?
[01:00:28] Luca: in, in the general case, you can say that we run, uh, one isolate per process. but there's many asterisks on that. Um, because, it's, it's very complicated. I'll just say it's very complicated. Uh, in, in the general case though, it's, it's one isolate per process.
Yeah.
[01:00:45] Jeremy: And then you touched a little bit on the permissions system. Like you gave the example of somebody could have a website where they let their users give them code to execute. how does it look in terms of specifying what permissions people have? Like, is that a configuration file? Are those flags you pass in?
What, what does that look?
[01:01:08] Luca: Yeah. So, so that product is called subhosting. It's, um, slightly different from our end user platform. Um, it's essentially a service where, like, you email us, we'll, um, onboard you, and then what you can do is you can send HTTP requests to a certain endpoint with an authentication token and
a reference to some code to execute. And then what we'll do is, when we receive that HTTP request, we'll fetch the code, spin up an isolate, execute the code, serve the request, return you the response, um, and then we'll pipe logs to you and, and stuff like that.
And, and part of that is also, when we pull the, um, the code to spin up the isolate, that code doesn't just include the code that we're executing, but also includes things like permissions and, and various other things. We call this the isolate configuration. Um, you can inspect this; it's all public.
We have public docs for this at deno.com/subhosting. I think. Yes, deno.com/subhosting.
[01:02:08] Jeremy: And is that built on top of something that's a part of the public Deno project, the open source part? Or is this specific to this sub hosting product?
[01:02:19] Luca: Um, so the underlying engine, the underlying runtime that executes the code here, like all of the code execution, is performed by code which is public. It uses the Deno CLI, which just strips out a bunch of stuff it doesn't need. The orchestration code is not public.
The orchestration code is proprietary. And yeah, if you have use cases where you would like to run this orchestration code on your own infrastructure, and you have interesting use cases, please email us. We would love to hear from you.
[01:02:51] Jeremy: Separate from the orchestration, more as an example: let's say I deploy a Deno application, and in the case that someone was able to get some malicious code or URLs into my application, could I tell Deno, I only want this application to be able to call out to these URLs?
[01:03:18] Luca: Yes. It's slightly more complicated, because you can't actually tell it that it can only call out to specific URLs, but you can tell it to call out only to specific domains or IP addresses, which is sort of the same thing, just a slightly different layer of abstraction. Yeah, you can do that.
The --allow-net flag allows you to specify a set of domains, to allow requests to those domains. Yes,
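As a sketch of what Luca describes, granting permissions with Deno CLI flags looks like this (assuming the deno CLI is installed; the file names are examples):

```
# Outbound network access only to the listed domains:
deno run --allow-net=example.com,api.example.com server.ts

# No --allow-net at all: any attempted network call is denied
# (Deno prompts for permission when run interactively).
deno run server.ts
```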
[01:03:41] Jeremy: I see. So on the user-facing open source part, there are configuration flags where you could say, I want this application to be able to access these domains, or I don't want it to be able to use IO, or whatever
[01:03:56] Luca: Yeah, exactly.
[01:03:57] Jeremy: their, their flags.
[01:03:59] Luca: Yeah. And on subhosting, this is done via the isolate configuration, which is like a JSON blob. Ultimately it's all the same concept, just slightly different interfaces, because the subhosting one needs to be a programmatic interface instead of something you type as an end user.
Right?
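To make the idea concrete, here is a purely hypothetical sketch of such a permissions-as-data blob. This is not the real subhosting schema (the actual format is documented at Deno.com/subhosting); every field name here is invented for illustration:

```
{
  "permissions": {
    "net": ["api.example.com", "db.example.com"],
    "read": false,
    "env": false
  }
}
```

The point is that the same permission decisions expressed as CLI flags become data that an orchestrator can attach to each isolate.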
[01:04:20] Jeremy: One of the things you mentioned about Deno Deploy is that it's centered around deploying your application code to a bunch of different locations. And you also mentioned that the cold start times are very low.
Could you kind of give the case for wanting your application code at a bunch of different sites?
[01:04:38] Luca: Mm-hmm. Yeah. So the main benefit of this is that when your user makes a request to your application, you don't have to round trip back to wherever your centrally hosted application would otherwise be. Even if you're a startup that's just in the US, for example, it's nice to have points of presence not just on one of the US coasts, but on both, because that means your round trip time is not gonna be a hundred milliseconds, it's gonna be 20 milliseconds.
There's obviously always the problem here that if your database lives on only one of the two coasts, you still need to do the round trip. And there are solutions to this. One is caching, that's the obvious, sort of boring solution. And then there's the solution of using databases which are built exactly for this.
For example, CockroachDB is a database which is Postgres compatible, but it's really built for global distribution, and built for being able to shard data across regions and have different primary regions for different shards of your tables. Which means, for example, that your users on the East Coast could have their data live in a database on the East Coast, and your users on the West Coast could have their data live in a database on the West Coast.
And your admin panel needs to show all of them; it has an aggregate view over both coasts, right? This is something that something like CockroachDB can do, and it can be a really great thing here. And we acknowledge that this is not very easy to do right now, and Deno tries to make everything very easy.
So you can imagine that this is something we're working on. We're working on database solutions, and actually I should more generally say persistence solutions: solutions that allow you to persist data in a way that makes sense for an edge system like this, where the data is persisted close to the users that need it.
[01:06:44] Luca: And data is cached around the world, and you still have semantics which are consistent with the semantics you have when you're locally developing your application. For example, you don't want strong consistency in local development, but then eventual consistency in production, where suddenly, I don't know, all of your code breaks because your US West region didn't pick up the changes from US East, because it's eventually consistent, right?
I mean, this is a problem that we see with a lot of the existing solutions here, specifically CloudFlare KV, for example. CloudFlare KV is a system with single primary write regions, where there's just a bunch of caching going on, and this leads to eventual consistency, which can be very confusing for end user developers.
Especially because if you're using this locally, the local emulator does not emulate the eventual consistency, right? So this can become very confusing very quickly. So for anything that we build in this persistence field, we weigh these trade-offs very seriously, and make sure that if something is eventually consistent, that's very clear, and it works the same eventually consistent way in the CLI.
[01:08:03] Jeremy: So for someone, let's say they haven't made that jump yet to use a CockroachDB. They just have their database instance in AWS East or whatever. Does having the code at the edge, where it all ends up needing to go to East anyway, is that better than having the code be located next to the database?
[01:08:27] Luca: Yeah, it totally does. There are trade-offs here, right? Obviously, if you have an admin panel, for example, or a user dashboard which is very reliant on data from your database, and needs to fetch fresh data from the database for every single request, then maybe the trade-off isn't worth it.
But most applications are not like that. Most applications are, for example, you have a landing page, and that landing page needs to do A/B tests, and those A/B tests are based on some heuristic that you can fetch from the database every five seconds. That's fine. It doesn't need to be perfect, right?
So you have caching in place, and by doing this caching locally to the user, and still being able to programmatically control it based on, I don't know, the user's user agent, or the IP address of the user, or the region of the user, or the past browsing history of that user as measured by their cookies or whatever else, right?
Being able to do these highly user-customized actions very close to the user means much lower latency. It's a much better user experience than if you have to do the round trip, especially if you're a startup, or a service which is globally distributed and serves users not just in the US or the EU, but all across the world.
[01:09:52] Jeremy: And when you talk about caching in the context of Deno Deploy, is there a cache native to the system, or are you expecting someone to have a Redis or a memcached, that sort of thing?
[01:10:07] Luca: Yeah. So Deno Deploy actually has a built-in web cache API, which is also the web cache API that's used by service workers and others; CloudFlare also implements this cache API. This is something that's implemented in the Deno CLI, and it's gonna be coming to Deploy this quarter. That's the native way to do caching. Otherwise, you can also use Redis, you can use services like Upstash, or even primitive in-memory caches, where it's just an LRU in memory, like a JavaScript data structure, right?
Or even just a JavaScript map or JavaScript object with a time on it, and every time you read from it, if the time is above some certain threshold, you delete the entry and go fetch it again, right? There are many things you could consider a cache that are not like Redis or the web cache API.
So there are ways to do that. And there are also a bunch of modules, not in the standard library, sorry, but in the third-party module registry and also on NPM, that you can use to implement different cache behaviors.
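The map-with-a-timestamp idea Luca describes can be sketched as a tiny TTL cache (a sketch, not any particular library's API):

```typescript
// Minimal in-memory TTL cache: a Map of values plus expiry timestamps.
type Entry<V> = { value: V; expiresAt: number };

class TtlCache<K, V> {
  private store = new Map<K, Entry<V>>();
  constructor(private ttlMs: number) {}

  set(key: K, value: V): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }

  get(key: K): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      // Entry is stale: evict it and report a miss so the caller refetches.
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

On a miss, the caller fetches fresh data and calls set again; eviction happens lazily on reads, which is usually fine for a best-effort cache that can disappear at any time anyway.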
[01:11:15] Jeremy: And when you give the example of an in-memory cache: when you're running in Deno Deploy, you're running in these isolates, which presumably can be shut down at any time. So what kind of guarantees do users have that whatever they put into memory will still be there?
[01:11:34] Luca: None. It's a cache, right? The cache can be evicted at any time. Your isolate can be restarted at any time. It can be shut down, you can be moved to a different region, the data center could go down for maintenance. Your application has to be built in a way that it's tolerant to restarts, essentially.
But because it's a cache, that's fine, because if the cache expires, or the cache is cleared through some external means, the worst thing that happens is that you have a cold request again, right? And if you're serving like a hundred requests a second, I can essentially guarantee to you that not every single request will invoke a cold start.
I can guarantee to you that probably less than 0.1% of requests will cause a cold start. This is not an SLA or anything, because it's totally up to however the system decides to scale you. But yeah, it would be very wasteful for us, for example, to spin up a new isolate for every request.
So we don't; we reuse isolates wherever possible. It's in our best interest to not cold start you, because it's expensive for us to do all the CPU work to cold start an isolate, right?
[01:12:47] Jeremy: And typically with applications, people will put a CDN in front, and they'll use things like cache control headers to be able to serve straight from the CDN. Is that a supported use case with Deno Deploy? Or is there anything people should be aware of when they're doing that sort of thing?
[01:13:09] Luca: Yeah, so you can do that. You could put a cache in front of Deploy, but in most cases it's really not necessary, because the main reason people use CDNs is essentially to solve this global distribution problem, right? You want to be able to cache close to users, but if your application is already executing close to users, the cost of serving a request from a JavaScript cache is marginal.
It's so low. There's nearly no CPU time involved here; it's network bandwidth that's the limiting factor, and that's the limiting factor for all CDNs. So whether you're serving on Deploy or you have a separate CDN that you put in front of it, it's not really that big a difference.
You can do it, but I don't know. Deno.com and Deno.land don't have a CDN in front of them. They're running bare on Deno Deploy, and yeah, it's fine.
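As a sketch of the cache-control-header approach Jeremy mentions, a Deploy-style handler might set the header like this (the header values and response body are illustrative):

```typescript
// A request handler that marks its response as cacheable, so a CDN
// (or the browser) in front of it may serve repeats from cache.
function handler(_req: Request): Response {
  return new Response("hello", {
    headers: {
      "content-type": "text/plain",
      // Browsers may reuse this for 30s; shared caches (CDNs) for 60s.
      "cache-control": "public, max-age=30, s-maxage=60",
    },
  });
}

// In Deno this could be wired up with: Deno.serve(handler);
```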
[01:14:06] Jeremy: So even for things like images, for example, something that somebody might store in object storage and put a CDN in front of?
[01:14:17] Luca: Mm-hmm.
[01:14:18] Jeremy: Are you suggesting that people could put those on Deno Deploy directly? Just kind of curious what your thoughts are there.
[01:14:26] Luca: Yeah. Like, if you have a blog and your profile image is part of your blog, right? You can put that in your static file folder and serve it directly from your Deno Deploy application. That's totally cool; you should do that, because that's the obvious way to do things.
If you're specifically building an image-serving CDN, go reach out to us, because we'd love to work with you. But also, there are probably different constraints that you have. You probably very, very much care about network bandwidth costs, because that is your number one primary cost factor.
So yeah, what trade-offs are you willing to make? Does some other provider give you a lower network bandwidth cost? I would argue that if you're building an image CDN, then even if you have to write your application code in Haskell or whatever, it's probably worth it if you can get a cent cheaper per gigabyte in transfer fees.
Just because network bandwidth is like 100% of your costs. So it's really a trade-off based on what you're trying to build.
[01:15:36] Jeremy: And if I understand correctly, Deno Deploy is centered around applications that take HTTP requests. So it could be a website, it could be an API, that sort of thing. And sometimes when people build applications, they have other things surrounding them. They'll need scheduled jobs, they may need some form of message queue, things like that.
Things that don't necessarily fit into what Deno Deploy currently hosts. And so I wonder, for things like that, what you recommend people do while working with Deno Deploy.
[01:16:16] Luca: Yeah, great question. Unfortunately I can't tell you too much about that without spoiling everything (laughs), but what I'm gonna say is, you should keep your eyes peeled on our blog over the next two to three months here. I consider message queues especially to be a persistence feature, and we are currently working on persistence features.
So yeah, that's all I'm gonna say. But you can expect Deno Deploy to do things other than just HTTP requests in the not-so-far future. And cron jobs and stuff like that too, at some point, yeah.
[01:16:54] Jeremy: All right, we'll look out for that. I guess as we wrap up, maybe you could give some examples of who's using Deno, and what types of projects you think are ideal for Deno?
[01:17:11] Luca: Yeah. Uh, Deno or Deno Deploy? Like, Deno as in all of Deno, or Deno Deploy specifically?
[01:17:17] Jeremy: I mean, I guess either (laughs)
[01:17:19] Luca: Okay. Okay. Yeah, yeah. Let's do it. So, one really cool use case for Deno, for example, is Slack. Slack has this app platform that they're building, which allows you to execute arbitrary JavaScript from inside of Slack, in response to slash commands and actions. I dunno if you've ever seen those little buttons you can have in messages; if you press one of those buttons, that can execute some Deno code.
And Slack has built this entire platform around that, and it makes use of Deno's security features and built-in tooling and all that kind of thing. And that's really cool. And Netlify has built edge functions, which is a really awesome primitive they have for being able to customize outgoing requests, and even come up with completely new requests on the spot, as part of their CDN layer.
Also built on top of Deno. And GitHub has built this platform called Flat, which allows you to pull data into git repositories on cron schedules, and process and post-process that data, and do things with it. And it's integrated with GitHub Actions, all that kind of thing.
It's kind of cool. Supabase also has an edge functions product that's built on top of Deno. I'm just thinking about others; those are the obvious ones that are on the homepage. I know, for example, there's an image CDN that serves images with Deno, like 400 million of them a day.
Kind of related to what we were talking about earlier, actually. I don't know if it's still 400 million; I think it's more. The last data I got from them was maybe eight months ago, so probably more at this point. Yeah, a bunch of cool things like that. We have a really active Discord, and there are always people showcasing what kind of stuff they've built in there; we have a showcase channel.
If you're really interested in what cool things people are building with Deno, that's a great place to look. I think we maybe also have a showcase page. Do we have Deno.land/showcase? I don't remember. Oh yeah, we do: Deno.com/showcase, which is a page of a bunch of projects built with Deno, or products using Deno, or other things like that.
[01:19:35] Jeremy: Cool. If people wanna learn more about Deno or see what you're up to, where should they head?
[01:19:42] Luca: Yeah. If you wanna learn more about the Deno CLI, head to Deno.land. If you wanna learn more about Deno Deploy, head to Deno.com/deploy. If you want to chat with me, you can hit me up on my website, lcas.dev. If you wanna chat about Deno, you can go to discord.gg/deno. And if you're interested in any of this and think that maybe you have something to contribute here, you can become an open source contributor on our open source project. Or if this is really something you wanna work on, and you like distributed systems or systems engineering or fast performance, head to deno.com/jobs and send in your resume.
We're very actively hiring, and we'd be super excited to work with you.
[01:20:20] Jeremy: All right, Luca. Well thank you so much for coming on Software Engineering Radio.
[01:20:24] Luca: Thank you so much for having me.
Leaguepedia is a MediaWiki instance that covers tournaments, teams, and players in the League of Legends esports community. It's relied on by fans, analysts, and broadcasters from around the world.
Megan "River" Cutrofello joined Leaguepedia in 2014 as a community manager and by the end of her tenure in 2022 was the lead for Fandom's esports wikis.
She built up a community of contributing editors in addition to her role as the primary MediaWiki developer.
She writes on her blog and is a frequent speaker at the Enterprise MediaWiki Conference.
You can help edit this transcript on GitHub.
[00:00:00] Jeremy: Today I'm talking to Megan Cutrofello. She managed the Leaguepedia eSports wiki for eight years, and in 2017 she got an award for being the unsung hero of the year for eSports. So Megan, thanks for joining me today.
[00:00:17] Megan: Thanks for having me.
[00:00:19] Jeremy: A lot of the people I talk to are into web development, so they work with web frameworks and things like that. And I guess when you think about it, wikis are web development, but they're kind of their own world, I suppose. For someone who's going to build some kind of site, when does it make sense for them to use a wiki versus a content management system, or just a more traditional web framework?
[00:00:55] Megan: I think it makes the most sense to use a wiki if you're going to have a lot of contributors and you don't want all of your contributors to have access to your server.
Also if your contributors aren't necessarily as tech savvy as you are, it can make sense to use a wiki. And if you have experience with MediaWiki, I guess it makes sense to use a wiki.
Anytime I'm building something, my instinct is always, oh, I wanna make a wiki (laughs). So even if it's not necessarily the most appropriate tool for the job, my first thought is always, hmm, let's see, I'm making a blog, should I make my blog in MediaWiki? So I always wanna do that. But I think pretty much anytime you're collaborating, you always wanna use MediaWiki.
[00:01:47] Jeremy: And I, I think that's maybe an important point when you say people are collaborating. When I think about Wikis, I think of Wikipedia, uh, and the fact that I can click the edit button and I can see the markup right there, make a change and, and click save. And I didn't even have to log in or anything. And it seems like that workflow is built into a wiki, but maybe not so much into your typical CMS or WordPress or something like that.
[00:02:18] Megan: Yeah, having a public ability to solicit contributions from anyone. So for Leaguepedia, we actually didn't have open contributions from the public; you did have to create an account. But it's still that open: anyone can make an account, and all you have to do is go through that one step of creating an account.
Admittedly, sometimes people are like, I don't wanna make an account, that's so much work. And we're like, just make the account, come on, it's not that hard. But you're a community and you want people to come and contribute ideas, and you want people to come and be a part of that community: to document your open source project, or record the history of eSports, or write down all of the easter eggs that you find in a video game or in a TV show, or in your favorite fantasy novels.
Um, and it's really about community and working together to create something where the whole is bigger than the sum of its parts.
[00:03:20] Jeremy: And in a lot of cases when people are contributing, I've noticed that on Wikipedia, when you edit, there's an option for a visual editor, and then there's one for looking at the raw markup. In your experience, are people who are doing the edits typically using the visual editor, or are they mostly editing the markup?
[00:03:48] Megan: So we actually disabled the visual editor on Leaguepedia, because the visual editor is not fantastic at knowing things about templates. A template is when you have one page that gets its content pulled into the larger page, and there's a special syntax for that, and the visual editor doesn't know a lot about that.
So that's the first reason. And then the second reason is that there's one extension that we use that allows you to make a clickable piece of text. It's called (https://www.mediawiki.org/wiki/Extension:CharInsert) CharInserts, for character inserts. So I made a lot of these things, along the same philosophy as the visual editor, where it's to help people not have to carry the same burden of knowledge, of knowing every exact piece of source that has to be inserted into the page. So you click the thing that says, insert a pick and ban prefill, and then a little piece of JavaScript fires and inserts a whole bunch of wiki text, and then you just enter the champions in the correct places in the prefill. Champions are the characters that you play in League of Legends.
And so then the text is prefilled for you, and you only have to fill in this outline. So the visual editor would conflict with CharInserts, and I much preferred the CharInserts approach, where you have a compromise in between never interacting with source and having to have all of the source memorized.
So between the fact that the visual editor is not a perfect tool and has these bugs in it, and the fact that I preferred CharInserts, we didn't use the visual editor at all. I know that some wikis do like to use the visual editor quite a bit, and especially if you're not working with these templates where you have all of these prefills, it can be a lot more preferred to use the visual editor.
The visual editor is an experience much more similar to editing something like Microsoft Word. It doesn't feel like you're editing code, and editing code is, I mean, it's scary. Like when I said MediaWiki is for when you have editors who aren't as tech savvy as the person who set up the wiki:
for people who don't have that experience, I mean, when you just say, you have to edit a wiki, someone who's never done that before can be very intimidated by it. And you're trying to build a sense of community. You don't want to scare away your potential editors; you want everyone to be included there.
So you wanna do everything possible to make everyone feel safe to contribute their ideas to the wiki. And if you make them memorize syntax, like even something that to me feels as simple as two open brackets, then the name of a page, then two close brackets means linking the page.
I mean, I'm used to memorizing a lot of syntax because I'm a programmer, but someone who's never written code before isn't used to memorizing things like that. So they wanna be able to click a button that says insert link, and then type the name of the page in the box that pops up there.
So the visual editor is a lot safer to use, so a lot of wikis do prefer it. And if it didn't have the bugs with the type of editing that my wiki required, and if we weren't using CharInserts so much, we definitely would've gone for it. But it wasn't conducive to the wiki that I built, so we didn't use it at all.
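For reference, the bracket syntax Megan describes looks like this in wikitext:

```
[[Some Page]]            <!-- a link, displayed as "Some Page" -->
[[Some Page|click me]]   <!-- the same link, displayed as "click me" -->
```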
[00:07:42] Jeremy: And the, the compromise you're referring to, is it where the editor sees the raw markup, but then they can, there's like little buttons on the side they can click and they'll know, okay, if I click this one, then it's going to give me the text for creating a list or something like that.
[00:08:03] Megan: Yeah, it's a little bit more high level than creating a list, because I would never even insert the raw syntax for creating a list; it would be a template that inserts a list at the very end. But basically that, yeah.
[00:08:18] Jeremy: And I know for myself, even though I do software development, if I click edit on a wiki and there's all the different curly brace tags and the square tags, I think if you spend some time with it, you can kind of get a sense of what it means. But for the average person who doesn't work with software day to day, do you find that's a big barrier for them, where they click edit and there's all this stuff that they don't really understand?
Is that where some people just go, oh, I don't know what to do?
[00:08:59] Megan: I think the biggest barrier is actually clicking edit in the first place. That was a big barrier for me, actually. I didn't wanna click edit in the first place, and I guess my reasons were maybe a little bit different, where for me it was like, I know that if I click edit, this is going to be a huge rabbit hole, and I'm going to learn way too much about wikis, and this is going to consume my entire life. And look where I ended up.
So I guess I was pretty right about that. I don't know if other people feel the same way, or if they just don't wanna get involved at all. But I think once people click edit, they're able to figure it out pretty well. I think there are two barriers, or maybe three. The first one is clicking edit in the first place.
The second one is if they learn to code templates at all. MediaWiki syntax is literally the worst I have encountered, other than programming languages that are literally parodies. So like, the Whitespace language is worse (laughs https://en.wikipedia.org/wiki/Whitespace_(programming_language)), but it's two curly braces for a template and three curly braces for a variable.
And like, are you actually kidding me? One of my blog posts is a plea to editors to write a comment saying the name of the template that they're ending, because MediaWiki doesn't provide any syntax for what you're ending. And there's no indentation, so you can't visually see what you're ending.
So when I said the Whitespace language, that was maybe appropriate, because MediaWiki prints all of the whitespace; it's really just PHP functions put into the text that you're literally putting onto the page. So any whitespace that you put gets printed.
So the only way to put whitespace into your code is if you comment it out. Anytime you wanna put a new line, you have to comment out your new line, and if you wanna indent your code, you have to comment out the indents. I'm not exaggerating here; it's just the worst. Occasionally you can put in a little bit of whitespace, because there are some divisions in parser functions that get handled when it gets sent to the parser. But for the most part, it's just terrible. So if I'm writing an if statement, I'll write if, and then I'll write a commented-out endif at the end. So once an editor starts to write templates, with parser functions and stuff, that's another big barrier. And that's not because people don't know how to code; it's because the MediaWiki language, and I use language very loosely, is like this collection of PHP functions poured into a disaster.
It's just, it's not good! (laughs) And the next barrier is when people start to jump to Lua, where you can write Lua modules. And Lua is fine. It's great: it has whitespace, you can make new lines, it's absolutely fine, you can write an entire code base, and as long as you're writing Lua, it's absolutely fantastic and there's nothing wrong with it anymore (laughs)
So as much as I just insulted the MediaWiki language, writing Lua in MediaWiki is great (laughs). So for most of my time I was writing Lua, and I have absolutely no complaints about that, except that Lua is one-indexed. But actually, the one-indexing of Lua is fine, because MediaWiki itself is one-indexed.
So people complain about Lua being one-indexed, and I'm like, what are you talking about? If another language were used, then you'd have all of this offsetting when you go to your scripting language, because you'd have the first argument from your template in MediaWiki going into your scripting language, and then you'd have to offset it to zero, and everyone would be vastly confused about what's going on.
So you should be thankful that they picked a one-indexed language, because it saves you all of this headache. So anyway, sorry for that tangent, but it's very good that we picked a one-indexed language.
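As a sketch of the Lua side Megan describes, a minimal Scribunto module might look like this (assuming MediaWiki's Scribunto extension; the module name and argument are examples):

```lua
-- Module:Hello (a hypothetical module page)
local p = {}

function p.hello(frame)
    -- frame.args[1] is the first unnamed template argument. Both
    -- MediaWiki's {{{1}}} and Lua tables are one-indexed, so the
    -- argument numbers line up with no offsetting.
    local name = frame.args[1] or "stranger"
    return "Hello, " .. name .. "!"
end

return p
```

On a wiki page this would be invoked as {{#invoke:Hello|hello|Megan}}.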
[00:13:17] Jeremy: When you were talking about the if statement and having to put in comments to have white space: when I think about an if statement in most languages, the if statement isn't itself rendering anything, it's deciding if you're going to do something inside of the if. So what would that white space do if you didn't comment it out, in the context of the if?
[00:13:44] Megan: So actually, you would be able to put some white space inside of an if statement, but you would not be able to put any white space after an if statement. And most likely, inside of the if statement you're printing variables or putting other parser functions, and the other parser functions also end in two curly braces.
And depending on what you're printing, you're likely ending with a series of like five or eight, or I don't know, some very large set of curly braces. And so what I like to do is to be able to see all of the things that I'm ending with; I wanna know how far the nesting goes, right?
So I wanna write an endif, and I have to comment that out because there's no endif statement. So I comment out an endif there. It's more that you can't indent the statements inside of the if, because anything that you would be printing inside of your code would get printed. So if I write text inside of the code, then that indentation would get printed onto the page.
And then if I put any white space after the if statement, then that would also get printed. So technically you can have a little bit of white space before the curly braces, but that's only because it's right before the curly braces and PHP will strip the contents right inside of the parser function.
So basically if PHP is stripping something, then you're allowed to have white space there. But if PHP isn't stripping anything, then all of the white space is going to be printed and it's like so inconsistent that for the most part it's not safe to put white space anywhere because you don't, you have to like keep track of am I in a location where PHP is going to be stripping something right now or not?
and I, I wanna know what statement or what variable or what template I'm closing at any location. So I always want to write out what I'm closing everywhere. And then I have to comment that, because there was no foresight to put like an end-if clause in this whitespace-sensitive language.
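A rough sketch of the pattern Megan describes, using MediaWiki's `#if` parser function with HTML comments standing in for the missing end-if. (The template names here are made up for illustration; the actual Leaguepedia templates differ.)

```wikitext
{{#if: {{{team|}}}
  | {{TeamRoster|{{{team}}}}}<!-- real indentation inside the branch would print to the page -->
    {{#if: {{{showLogo|}}}
      | {{TeamLogo|{{{team}}}}}
    }}<!-- end if showLogo -->
}}<!-- end if team -->
```

The trailing `<!-- end if ... -->` comments exist purely so a human can see which of the stacked `}}` pairs closes which statement.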
[00:16:22] Jeremy: Yeah, I, I think I see what you mean. So if you're gonna start an if, you have the if inside these curly braces, but then inside the if, you typically are going to render some text to the page, and so intuitively you would indent it so that it's indented in from the if statement. But then if you do that, then it's gonna be shifted to the right on, on the wiki.
Did I get that right?
[00:16:53] Megan: Yeah. So you have the flexibility to put white space immediately because PHP will strip immediately, but then you don't have flexibility to put any white space after that, if that makes sense.
[00:17:11] Jeremy: So, so when you say immediately, is that on the following line or is that
[00:17:15] Megan: yeah, so any white space before the first clause, you have flexibility. So like if you were to put an if statement, so it's like if, and then there's a colon, all of the next white space will get stripped. Um, so then you can put some text, but then, if you wanted to like put some text and then another if statement nested within the first if statement.
It's not like Lua where you could like assign a variable and then put a comment and then put some more white space and then put another statement. And it's white space insensitive because you're just writing code and you haven't returned anything yet.
it, it's more like Jinja (View templating language) than Python for, for an analogy.
So everything is getting printed because you're in like a, this templating language, not actually a programming language. Um, so you have to work as if you're in a templating language about, you know, 70% of the time , unless you're in this like very specific location where PHP is stripping your white space because you're at the edge of an argument that's being sent there.
So it's like incredibly inconsistent. And every now and then you get to like, pretend that you're in an actual language and you have some white space, that you can indent or whatever. it's just incredibly
inconsistent, which is like what you absolutely want out of a programming language (laughs)
[00:18:56] Jeremy: yeah, it's like you're, you're writing templates, but like, it seems like because of the fact that it's using php, there's weird exceptions to the behavior.
Yeah.
[00:18:59] Megan: Exactly. Yeah.
[00:19:01] Jeremy: and then you also mentioned these, these templates. So, if I understand correctly, this is kind of like how a lot of web frameworks will have, partials, I guess, where you'll, you'll be able to have a webpage, but it's made up of different I don't know if you would call them components, but you're able to build a full page that's made up of a bunch of different pieces.
So you could have a
[00:19:31] Megan: Yeah, yeah, that's a good analogy.
[00:19:33] Jeremy: Where it's like, here's my table of contents, or here's my info box, or things like that. And those are all things that you would create a MediaWiki template for, and then somehow the, the data gets passed into those templates and the template decides how to, to render it out.
[00:19:55] Megan: Yeah.
[00:19:56] Jeremy: And for these, these templates, I, I noticed on some of the Leaguepedia pages, I noticed there's some html in some of them. I was curious if that's typical to write them with HTML or if there are different ways native to Media Wiki for, for, creating these templates.
[00:20:23] Megan: Um, it depends on what you're doing. MediaWiki has a special syntax for tables specifically. I would say that it's not necessarily recommended to use the special syntax because occasionally you can get things to not work out fantastically if people slightly break things. But it's easier to use it.
So if you know that everything's going to work out perfectly, you can use it. and it's a simple shortcut. if you go to the help page about tables on Wikipedia, everything is explained. and not all HTML works, um, for security reasons. So there's like a list of allowed things that you can use, allowed tags, so you can't put like forms and stuff natively, but there's the widgets extension that you can use, and widgets just automatically renders all HTML that you put inside of a widget.
Uh, and then the security layer there is that you have to have a special permission to edit a widget. so, you only give trusted people that permission and then they can put the whatever html they want there. So, we have a few forms on Leaguepedia that are there because I edited, uh, whichever widgets, and then put the widgets into a Lua module and then put the Lua module into a template and then put the template onto the page.
I was gonna say, it's not that complicated. It's not as complicated as it sounds, but I guess it really is as complicated as it sounds (laughs) . Um, so, uh, I, I won't say that. I don't know how standard it is on other wikis to use that much html, I guess Leaguepedia is pretty unique in how complicated it is.
There aren't that many wikis that do as many things as we did there. but tables are pretty common. I would say like putting divs places to style them, uh, is also pretty common. but beyond that, usually there's not too many HTML elements just because you typically wanna be mobile friendly and it's relatively hard to stay mobile friendly within the bounds of MediaWiki if you're like putting too many elements everywhere.
And then also allowing users to put whatever content inside of them that they want. The reason that we were able to get away with it is because despite the fact that we had so many editors, our content was actually pretty limited. Like if there's a bracket, it's only short team names going into it.
So, and short team names were like at most five or six characters long, so we don't have to worry about like overflow of team names. Although we designed the brackets to support overflow of team names, and the team names would wrap around and the bracket would not break. And a lot of CSS magic went into making that work that, we worked really hard on and then did not end up using (laughs)
[00:23:39] Jeremy: Oh no.
[00:23:41] Megan: Only short team names go into brackets.
But, that's okay. uh, and then for example, like in, uh, schedules and stuff, a lot of fields like only contain numbers or only contain timestamps. there's like a lot of tables again where like there's only two digit numbers with one decimal point and stuff like that. So a lot of the stuff that I was designing, I knew the content was extremely constrained, and if it wasn't then I said, well, too bad.
This is how I'm telling you to put the content. Um, and for technical reasons, that's the content that's gonna go here and I don't care. So there's like a lot of understanding that if I said for technical reasons, this is how we have to do it, then for technical reasons, that was how we had to do it.
And I was very lucky that all of the people that I worked with like had a very big appreciation with like, for technical reasons, like argument over. This is what's happening. And I know that with like different people on staff, like they would not be willing to compromise that way. Um, so I always felt like extremely lucky that like if I couldn't figure out a way to redesign or recode something in order to be more flexible, then like that would just be respected.
And that was like how we designed something. But in general, like it's, if you are not working with something as rigid as, I mean, and like the history of eSports sounds like a very fluid thing, but when you think about it, like it's mostly names of teams, names of players and statistics. There's not that much like variable stuff going on with it.
It's very easy to put in relational databases. It's very easy to put in fixed width tables. It's very easy to put in like charts that look the same on every single page. I'm not saying. It was always easy to like write everything that I wrote, and it's not, it wasn't always easy to like, deal with designs and stuff, but like relative to other topics that you can pick, it was much easier to put constraints on what was going to go where because everything was very similar across regions, across, although actually one thing.
Okay, so this will be like the, the exception that proves the rule. uh, we would transliterate players' names when we showed them in team rosters. So, uh, for example, when we were showing the Hangul, the Korean players' names, we would show an English transliteration also.
Um, and we would do this for every single alphabet. but Hungarian players' names are really, really, really long. And so the transliteration doesn't fit in the table when we show the translation to the Roman alphabet. And so we couldn't do this, so we actually had to make a cargo table of alphabets that are allowed to be transliterated into the Roman alphabet, uh, when we have players' names in that alphabet.
So we had, like, Hangul was allowed and Arabic was allowed, and I can't remember the exact list, but we had like three or four alphabets that were allowed, and the rest of the alphabets were disallowed to be transliterated into, uh, the Roman alphabet. and so again, we made up a rule that was like a hard rule across the entire wiki where we forced the set of alphabets that were transliterated, so that these tables could be the same size roughly across every single team page, because these Hungarian player names are too long (laughs)
So I guess even this exception ended up being part of the rule of everything had to be standardized because these tables were just way too wide and they were running into the info box. They couldn't fit on the side. so it's really hard when you have like arbitrary user entered content to fit it into the HTML that you design.
And if you don't have people who all agree to the same standards, I mean, Even when we did have people who agreed to all of the same standards, it was really, really, really hard. And we ended up having things like a table of which alphabets to transliterate. Like that's not the kind of thing that you think you're going to end up having when you say, let's catalog the history of League of Legends eSports,
[00:28:40] Jeremy: And, and so when, let's say you had a language that you couldn't transliterate, what would go into the table?
[00:28:49] Megan: uh, just the native alphabet.
[00:28:51] Jeremy: Oh I see. Okay.
[00:28:53] Megan: Yeah. And then if they went to the player page, then you would be able to see it transliterated. But it wouldn't show up on the team page.
[00:29:00] Jeremy: I see. And then to help people visualize what some of these things you're talking about look like when you're talking about a, a bracket, it's, is it kind of like a tree structure where you're showing which teams are facing which teams and okay,
[00:29:19] Megan: We had a very cool, CSS grid structure that used like before and after pseudo elements to generate the lines, uh, between the teams and then the teams themselves were the elements of the grid. Um, and it's very cool. Uh, I didn't design it. Um, I have a friend who I very, very fortunately have a friend who's amazing at CSS because I am like mediocre at css and she did all of our CSS for us.
And she also like did most of our designs too. Uh, so the Wiki would not be like anything like what it is without her.
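A very rough sketch of the kind of grid-plus-pseudo-element bracket Megan describes; this is not the actual Leaguepedia stylesheet, and all class names and measurements here are invented.

```css
/* Hypothetical sketch: teams are grid items; connector lines are
   drawn with ::after pseudo-elements instead of extra markup. */
.bracket {
  display: grid;
  grid-template-columns: repeat(3, 12em); /* one column per round */
  row-gap: 0.5em;
}
.bracket .team {
  position: relative;
  overflow-wrap: break-word; /* long names wrap instead of breaking layout */
}
.bracket .team::after {
  content: "";
  position: absolute;
  left: 100%;
  top: 50%;
  width: 1em;
  border-top: 1px solid #888; /* the line toward the next round */
}
```

Keeping the connectors in CSS means the wikitext only has to emit the team cells, which matters when editors are the ones generating the markup.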
[00:30:00] Jeremy: And when you're talking about making sure the designs fit on desktop and, and mobile, um, I think when you were talking earlier, you're talking about how you have these, these templates to build these tables and the, these, these brackets. Um, so I guess in which part of the wiki is it ensuring that it looks different or that it fits when you're working with these different screen sizes
[00:30:32] Megan: Usually it's a pure CSS solution. Every now and then we hide an element on mobile altogether, and some of that is actually MediaWiki core; for example, uh, nav boxes don't show up on mobile. And that's actually on Wikipedia too. Uh, well, I guess, yeah. I mean, being MediaWiki core. So if you've ever noticed, the nav boxes that are at the bottom of pages on Wikipedia just don't show up on like en.m.wikipedia.org.
and that way you're not like loading-but-display-none-ing elements on mobile. but for the most part it's pure CSS solutions. Um, so we use a lot of, uh, display: flex to make stuff, uh, appropriate for mobile. Um, some media rules. sometimes we display: none stuff for mobile. Uh, we try to avoid that because obviously then mobile users aren't getting like the full content.
Occasionally we have like overflow rules, so you're getting scroll bars on mobile and then every now and then we sort of just say, too bad if you're on mobile, you're gonna have not the greatest solution or not the greatest, uh, experience. that's typically for large data tables. so the general belief at fandom was like, if you can't make it a good experience on mobile, don't put it on the Wiki.
And I just think that's like the worst philosophy, because like then no one gets a good experience. And you're just putting less content on the wiki, so no one gets to enjoy it, and no one gets to like use the content that could exist. So my philosophy has always been like the, the core overview pages should be as good as possible for both PC and mobile.
And if you have to optimize for one, then you slightly optimize for mobile because the majority of traffic is mobile. but attempt not to optimize for either one and just make it a good experience on both. but then the pages behind that, I say behind because we like have tabs views, so they're like sort of literally behind because it looks like folders sort of, or it looks like the tabs in a folder and you can, like, I, I don't know, it, it looks like it's behind (laughs) , the, the more detailed views where it's just really hard to design for mobile and it's really easy to design for pc and it just feels very unlikely that users on mobile are going to be looking at these pages in depth.
And it's the sort of thing. A PC user is much more likely to be looking at, and you're going to have like multiple windows open and you're gonna be tapping between them and you're gonna be doing all of your research at PC. You absolutely optimize this for PC users. Like, what the hell this is? These are like stats pages.
It's pages and pages and pages of stats. It's totally fine to optimize this for PC users. And if the option is like, optimize for PC users or don't create it at all, what are you thinking? To not create it at all? Like, make it a good experience for someone.
So I don't, I don't understand that philosophy at all.
[00:34:06] Jeremy: Did you, um, have any statistics in terms of knowing on these types of pages, these pages that are information dense or have really big tables? Could you tell that? Oh, most of the people coming here are on computers or, or larger screens.
[00:34:26] Megan: I didn't have stats for individual pages. Um, I accidentally lost Google Analytics access at some point, and honestly I wasn't interested enough to go through the process of trying to get it back. when I had it, it didn't really affect what I put time into, because it was, it was just so much what I expected it to be that it, it didn't really affect much. What I actually spent the most time on was looking, so you can, uh, you get URLs for search results.
That it, it didn't really affect much. What I actually spent the most time on was looking, so you can, uh, you get URLs for search results. And so I would look through our search results, and I would look at the URL of the failed search results and, so there would be like 45 results for this particular failed search.
And then I would turn that into a redirect for what I thought the target was supposed to be. So I would make sure that people's failed searches would actually resolve to the correct thing. So if they're like typo something, then I make the typo actually resolve. So we had a lot of redirects of like common typos, or if they're using the wrong name for a tournament, then I make the wrong name for the tournament resolve.
So the analytics were actually really helpful for that. But beyond that, I, I didn't really find it that useful.
[00:35:48] Jeremy: And then when you're talking about people searching, are these people using a search box on the Wiki itself And not finding what they were looking for?
[00:36:00] Megan: Yeah. So like the internal search. So like if you search Wikipedia for like New York City, but you spell it C-I-Y-T, then you're not going to get a result. But it might say, did you mean New York C-I-T-Y? If like 45 people did that in one month, then that would show up for me. And then I don't want them to be getting, like, that's a bad experience.
Sure. They're eventually getting there, but I mean, I don't want them to have to spend that extra time. So I'm gonna make an automatic redirect from C-I-Y-T to C-I-T-Y.
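On a MediaWiki wiki, that fix is just a redirect page: the page at the misspelled title contains nothing but a pointer to the correct one.

```wikitext
#REDIRECT [[New York City]]
```

Anyone who searches or links the misspelling then lands on the real page automatically, with no "did you mean" detour.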
[00:36:39] Jeremy: And, and. Maybe we should have talked about this a little earlier, but the, all the information on Leaguepedia is, it's about all of the different matches and players, um, who play League of Legends. so when you edit a, a page on Wikipedia, all of that information, or a lot of it I think is, is hand entered by, by people and on Leaguepedia, which has all this information about like what, how teams did in a tournament or, intricate stats about how a game went.
That seems like a lot of information for someone to be hand entering. So I was wondering how much of that information is somebody actually manually editing those things and how much is, is done automatically or programmatically.
[00:37:39] Megan: So it's mostly hand entered. We do have a little bit of it that's automated, via a couple scripts, but for the most part it's hand entered. But after being hand-entered into a couple of data pages, it gets propagated a lot of times based on a bunch of Lua modules and the cargo extension. So when I originally joined the Wiki back in 2014, it was hand entered.
Not just once, but probably, I don't know, seven times for tournament results and probably 10 or 12 times for roster changes. It was, it was a lot. And starting in 2017, I started rewriting all of the code so that it was entered exactly one time for everything. Tournament results get entered one time into a data page and roster changes get entered one time into a data page.
And, for roster changes, that was very difficult because, for a roster change that needs to update the team history on a player page, which goes, from a join to a leave and it needs to update the, the like roster, change portal for the off season, which goes from a leave to a join because it's showing like the deltas over the off season.
And it needs to update the current team in the, player's info box, which means that the current team has to be calculated from all of the deltas that have ever occurred in that player's history and it needs to update. Current rosters in the team pages, which means that the team page needs to know all of the current players who are currently on the team, which again, needs to know all of the deltas from all of history because all that you're entering is the roster changes.
You're not entering anyone's current team. So nowhere on the wiki does it ever store a current team anymore. It only stores the roster changes. So that was a lot of code to write, and deciding even what was going to be entered was a lot, because all I knew was that I was going to single-source-of-truth it somehow, and I needed to decide what I was going to single-source-of-truth.
So I decided, um, that it was going to be this delta, and then deciding what to do with that, uh, how to store it in a relational database. It was, it was a big project. and I didn't have a background as a developer either. so this was like, I don't know, this was like my third big project ever. So, that was, that was pretty intense.
but it was, it was a lot of fun. so it is hand entered but I feel like that's underselling it a little bit.
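The core idea, deriving "current team" purely by replaying the join/leave deltas, can be sketched like this. The field names and data are illustrative, not Leaguepedia's actual schema.

```python
from typing import Optional

# The single source of truth: only roster-change deltas, never a stored
# "current team" anywhere.
roster_changes = [
    {"player": "Alice", "team": "Red",  "action": "join",  "date": "2020-01-05"},
    {"player": "Alice", "team": "Red",  "action": "leave", "date": "2020-11-20"},
    {"player": "Alice", "team": "Blue", "action": "join",  "date": "2020-12-01"},
]

def current_team(player: str, changes: list) -> Optional[str]:
    """Replay every delta in chronological order; whatever team the
    player last joined without leaving is the current team."""
    team = None
    for c in sorted(changes, key=lambda c: c["date"]):
        if c["player"] != player:
            continue
        team = c["team"] if c["action"] == "join" else None
    return team

print(current_team("Alice", roster_changes))  # Blue
```

Every consumer (infobox, team roster, off-season portal) computes its view from the same log, which is why one edit on the data page can propagate everywhere.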
[00:40:52] Jeremy: Yeah, cuz I was initially, I was a little confused when you mentioned how somebody might need to enter the same information multiple times. But, if I understood correctly, it would be if somebody's changing which team they're on, they would have to update, for example, the player's page and say like, oh, this player is on this team now.
And then you would have to go to their old team and remove them from the roster there.
Go to the new team, add them to the roster there, And you can see where it would kind
[00:41:22] Megan: Yeah. And then there's the roster, there's the roster nav box, and there's like the old team, you have to say, like the next team. Cuz in the previous players list, like we show former team members from the old team and you have to say like the next team. Uh, so if they had like already left their old team, you'd have to say like, new team.
Yeah, there's a, there's a lot of, a lot of places.
[00:41:50] Jeremy: And so now what it sounds like is, I'm not sure this is exactly how it works, but if you go to any location that would need that information, which team is this player on? When you go to that page, for example, if you were to go to, uh, a teams page, then it would make a SQL query to figure out I guess who most recently had a, I forget what you called it, but like a join row maybe, or like a, they, they had the action of joining this team, and now, now there's a row in the database that says they did this.
[00:42:30] Megan: it actually looks at the ten-- so I have an in-between table called tenures. And so it looks at the tenures table instead of querying all the way through the joins and leaves table and doing like the whole list of deltas. yeah. So, and it's also cached, so it doesn't do the SQL query every time that you load the page.
So the only time that the SQL queries actually happen is if you do a save on the page. And then otherwise the entire generated HTML of the page is actually cached on the server. So you're, you're not doing that many database queries every time you load the page, so don't worry about that. but there, there can actually be something like a hundred SQL queries sometimes, when you're, saving a page.
So it would be absolute murder if you were doing that every time you went to the page. But yeah, it works. Something like that.
[00:43:22] Jeremy: Okay, so this, this tenures table is, that's kind of like what's the current state of all these players and where they are, and then.
[00:43:33] Megan: Um, the, the tenures table caches, sort of, or I guess the tenures table captures, is a better word than caches, um, every join-to-leave historically from every team. Um, and then I save that for two reasons. The first one is so that I don't have to recompute it, uh, when I'm doing the teams table, because I have to know both the current members and the former members.
And then the second reason is also that we have a public api and so people can query that.
if they're building tools, like a lot of people use the public api, uh, for various things. And, one person built like, sort of like a six degrees of Kevin Bacon except for League of Legends, uh, using our tenures tables.
So, part of the reason that that exists is so that uh, people can use it for whatever projects that they're doing.
Cause the join, the join leave table is like pretty unfriendly and I didn't wanna have to really document that for anyone to use. So I made tenures so that that was the table I could document for people to use.
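Collapsing the join/leave log into tenure spans looks roughly like this in plain SQL; notably, the correlated sub-select below is exactly the kind of query cargo can't express, which is part of why precomputing a tenures table makes sense. Schema and data are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE roster_changes (player TEXT, team TEXT, action TEXT, date TEXT)")
db.executemany("INSERT INTO roster_changes VALUES (?, ?, ?, ?)", [
    ("Alice", "Red", "join",  "2020-01-05"),
    ("Alice", "Red", "leave", "2020-11-20"),
    ("Bob",   "Red", "join",  "2021-02-01"),  # still on the team
])

# One tenure row per join: pair each join with the earliest later leave
# for the same player and team (NULL means the tenure is ongoing).
rows = db.execute("""
    SELECT j.player, j.team, j.date AS joined,
           (SELECT MIN(l.date) FROM roster_changes l
            WHERE l.player = j.player AND l.team = j.team
              AND l.action = 'leave' AND l.date > j.date) AS left_date
    FROM roster_changes j
    WHERE j.action = 'join'
    ORDER BY j.date
""").fetchall()
print(rows)
# [('Alice', 'Red', '2020-01-05', '2020-11-20'), ('Bob', 'Red', '2021-02-01', None)]
```

A flat, documented table like this is also far friendlier to hand to API consumers than the raw delta log.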
[00:44:39] Jeremy: Yeah. That, that's interesting in that, yeah, when you provide an api, then there's so many different things people can do that even if your wiki didn't really need it, they can build their own apps or their own pages built on all this information you've aggregated.
[00:44:58] Megan: Yeah. It's nice because then when someone says like, oh, can you build this as a feature request? I can say no, but you can (laughs)
[00:45:05] Jeremy: Well you've, you've done the, the hard part for them (laughs)
[00:45:09] Megan: Yeah. exactly.
[00:45:11] Jeremy: So that's cool. Yeah. that's, that's interesting too about the, the caching because yeah, I guess when you think about a wiki, most of the people who are visiting it are just visiting it to see what's on there. So the, provided that they're not logged in and they don't need anything specific to them. Yeah, you should be able to cache the whole response. It sounds like.
[00:45:41] Megan: Yeah. yeah. Caching was actually a nightmare with this in this particular thing. the, the team roster changes, because, so cargo, which I mentioned a couple times is the database extension that we used. Um, and it's basically a SQL wrapper that like, doesn't port 80% of the features that SQL has. so you can create tables and you can query, but you can't make, uh, like sub-select queries.
So your queries have to be like very simple. which is good for like most users of MediaWiki, because like the average MediaWiki user doesn't have that much coding experience, but if you do have coding experience, then you're like, what, what, what am I doing? I can't, I can't do anything. Um, but it's a very powerful tool, still, compared to most of what you could do with MediaWiki without this. basically you're adding a database layer to your software stack, which I mean, I, I, that's what you're doing, (laughs)
Um, so you get a huge amount of power from adding cargo to a wiki. Um, in exchange, it's, it's very performance-heavy. It's like, it's, it's resource heavy. uh, it hurts your performance a lot. and if you don't need it, then you shouldn't use it. But frequently you need it when you're doing difficult, or not necessarily difficult, but like intensive things.
Um, anytime that you need to pull data from one page to another, you wanna use something like that. Um,
So cargo, uh, one of the things that it doesn't do is it doesn't allow you to, uh, set a primary key easily. so you have to like, just like pretend that one row in the table is your primary key, basically. it internally automatically sets one, but it won't be static or it won't be the same every time that you rebuild the table because it rebuilds the table in a random order and it just uses an auto increment primary key.
So you set a row in the table to pretend to be your ran, to pretend to be your primary key. But editors don't know what, your editors don't understand anything about primary keys. And you wanna hide this from them completely. Like, you cannot tell an editor, protect this random number, please don't change this.
So you have to hide it completely. So if you're making your own auto increment, like an editor cannot know that that exists. Like this is back to when we were talking about like visual editor. This is like, one of the things about making the wiki safe for people is like not exposing them to the internals of like, anything scary like that.
So for example, if an editor accidentally reorders two rows and your roster change data like that has to not matter. Because that can't break the entire wiki. They, you can't make an editor like freak out because they just reordered two rows in, in the page. And you can't put like a scary notice somewhere saying, under no circumstances reorder two rows here.
Like, that's gonna scare people away. And you wanna be very welcoming and say like, it's impossible to break this page no matter how hard you tried. Don't worry. Anything you do, we can just fix it. Don't worry. But the thing is that everything's going to be cached. And so in particular, um, when I said I made that tenures table, one thing I did not wanna do was resave every single row from the join leave table.
So you had to join back to, sorry, I'm going to use, join in two different connotations. you had to join back to the join leave table in order to get like all of the auxiliary data, like all of the extra columns, like, I don't know, like role, date, team name and stuff. Because otherwise the tenures table would've had like 50 columns or something.
So I needed to store the fake primary key in the tenures table, but the tenures table is cached on the player page and the join leave table is on the data page, which means that I need to purge the cache on the player page anytime that someone edits the data on the data page. Which means that, so there's like some JavaScript that does that, but if someone like changes the order of the lines, then that primary key is going to change because I have an auto increment going on.
And so I had to like very, very carefully pick a primary key here so that it was literally impossible for any kind of order change to affect what the primary key was, so that the cache on the player page wasn't going to be changed by anything that the editor did, unless they were going to then update the cache on that player page after making that change.
If that makes sense. So after an editor makes a change on the news page, they're going to press a button to update the cache on the player page, but they're only going to update the player page for the one line that they change on the news page. These, uh, primary keys had to be like super invariant for accidental row moves, or also later on, like entire moves of separating a bunch of these data pages into like separate subpages because the pages were getting too big and it was like timing out the server because there were too many stores to the database on a single page every time you save the page.
And anyway, it took me like five iterations of making the primary key like more and more specific to the single line, because my auto increment was like originally covering every single line I was auto incrementing, and then I auto incremented only when that single player was involved. And then I auto incremented only when that player and the team was involved.
And then I reset the auto increment for that date. So, and it was just got like more and more convoluted what my primary key was. It was, it was a mess.
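The end state she describes, a key built from the row's own content plus a counter that only increments within that narrow group, can be sketched like this (all field names are illustrative). Because the key depends only on what the row says, not where it sits on the page, reordering rows can never change any row's key.

```python
from collections import defaultdict

def assign_keys(rows: list) -> list:
    """Give each row a key of page_date_player_team_N, where N counts
    only duplicates within that exact group."""
    counters = defaultdict(int)
    keys = []
    for r in rows:
        group = (r["page"], r["date"], r["player"], r["team"])
        counters[group] += 1
        keys.append("{}_{}_{}_{}_{}".format(*group, counters[group]))
    return keys

rows = [
    {"page": "News/2021", "date": "2021-11-20", "player": "Alice", "team": "Red"},
    {"page": "News/2021", "date": "2021-11-21", "player": "Bob",   "team": "Blue"},
]
# Swapping the two rows changes the list order but not any row's key,
# so the cached copy on the player page stays valid.
print(sorted(assign_keys(rows)) == sorted(assign_keys(list(reversed(rows)))))  # True
```

Contrast this with a plain auto-increment over the whole page, where moving a row shifts every key after it and silently invalidates the caches joined against them.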
Anyway, this is just like another thing when you're working with volunteers who don't know what's going on and they're editing the page and they can contribute content, you have to code for the editor and not code for like minimizing complexity,
The editor's experience matters more than the cleanliness of your code base, and you just end up with these like absolute messes that make no sense whatsoever because the editor's experience matters and you always have to code to the editor. And Media Wiki is all about community, and the editor just becomes part of the software and part of the consideration of your code base, and it's very, very different from any other kind of development because they're like, the UX is just built so deeply into how you're developing.
[00:53:33] Jeremy: if I am following correctly, when I, when I think of using SQL, when you were first talking about Cargo and you were talking about how you make your own tables, I'm envisioning the, the columns and the rows, and it's very common for the primary key to either be auto incrementing or some kind of GUID.
But then if I understood correctly, I think what you were saying is that anytime an editor makes changes to the data, it regenerates the whole table. Did I get that right?
[00:54:11] Megan: It regenerates all of the rows on that page.
[00:54:14] Jeremy: and when you talk about this, these
data pages, there's some kind of MediaWiki or Cargo specific markup where people are filling in what is going to go into the rows. And the actual primary key that's in MySQL is not exposed anywhere when they're editing the data.
[00:54:42] Megan: That's right
[00:54:44] Jeremy: And so when you're talking about trying to come up with a primary key, um, I'm trying to, I guess I'm trying to picture
[00:54:57] Megan: So usually I do page name underscore an auto increment. But then if people can rearrange the rows which they do because they wanna get the rows chronological, but some people just put it at the top of the page and then other people are like, oh my God, it's not chronological. And then they fix it and then other people are like, oh my God, you messed up the time zone.
And then they rearrange it again. Then, I mean, normally I wouldn't care because I don't really care like what the primary key is. I just care that it exists. But then because I have it cached on these player pages, I really, really do care what the primary key is. And because I need the primary key to actually agree with what it is on the data page, because I'm actually joining these things together.
and people aren't going to be updating the cache on the player page if they don't think that they edited the row because rearranging isn't actually editing and people aren't going to realize that. And again, this is burden of knowledge. People can't, I can't make them know that because they have to feel safe to make any edits.
It's bad enough that they have to know that they have to click this button to update the cache after making an edit in the first place. so, the auto increment isn't enough, so it has to be like an auto increment, but only within the set of rows that incorporate that one player. And then rearranging is pretty safe because they'd have to rearrange two pieces of news, including the same player.
And that's really unlikely to happen. It's really unlikely that someone's going to flip the order of two pieces of news that involve the same player without realizing that they actually are editing that single player, except maybe they are. So then I include the team in that also. So they'd have to rearrange two pieces of news, including the same player and the same team.
And that's like unlikely to happen in the first place. And then like, maybe a mistake happens like once a year. And at the end of the day, the thing that really saves us is that we're a wiki. We're not an official source. And so if we have a mistake once a year, like no one cares really. So we're not going for like five nines or anything.
We're going for like, you know, two (laughs) . Um, so
[00:57:28] Jeremy: so
[00:57:28] Megan: We were having like mistakes constantly until I added player and team and date to the set of things that I was auto incrementing against. and once I got all of those, it was pretty stable.
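The primary-key scheme Megan describes can be sketched roughly like this. This is a hypothetical JavaScript illustration, not the actual Leaguepedia code (which is Lua, run inside Cargo's store step); the function and field names are invented. The key idea is that the counter only increments within the bucket of rows sharing the same player, team, and date, so reordering rows that differ in any of those three fields can never change an existing row's key.

```javascript
// Assign each row a key that is stable under most row reorderings:
// instead of one global auto-increment (which shifts whenever rows are
// rearranged), increment a counter per (player, team, date) bucket.
// Two rows only risk swapping keys if they collide on all three fields.
function assignPrimaryKeys(rows) {
  const counters = new Map();
  return rows.map((row) => {
    const bucket = `${row.player}_${row.team}_${row.date}`;
    const n = (counters.get(bucket) || 0) + 1;
    counters.set(bucket, n);
    return { ...row, _key: `${bucket}_${n}` };
  });
}
```

With this shape, an editor swapping two chronologically misordered news lines about different players leaves every generated key untouched, which is exactly the invariance the cached player pages rely on.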
[00:57:42] Jeremy: And for the caching part, so when you're making a cargo query or a SQL query on one page and it needs to join on or get data from another page, it goes to this cache that you have instead of going directly to the actual table in the database. And the only way to get the right data is for the editor to click this button on the website that tells it to update the cache did I get that right?
[00:58:23] Megan: Not quite. So, well, yes, you did sort of. It goes to the actual table. The issue here is that the table was last updated the last time that a page was saved. And the last time the data got saved was the last time that the page that contains the parser function that generates those rows got saved.
So, let me say that again. So, some of the data is being saved from the data page where the users manually enter it, and that's fine, because the only time that gets updated is when the users manually enter it and then the page gets saved. But then these tenure tables are stored by my Lua code on the player pages, and those aren't going to get updated unless the player page gets blank edited or null edited, or a save action happens from the player page.
And so the way to make an edit happen from the player page is either to manually go there and click edit and then click save, which is called a blank edit because you blank edited, you didn't do anything but you pressed save, or to use my JavaScript gadget, which is clicking a button from the data page that just basically does that for you using the API.
And then that's going to update the table, and then the database table, because that's where the, the Cargo parser function is that writes to the database and updates the tables there with the information, hey, the primary key changed. Because that's where the parser function is physically located in the wiki: one of them is on the data page and one of them is on the player page.
So you get this disconnect in the cache where it's on two different pages and so you have to press a save action in both of them before the table is consistent again.
[01:00:31] Jeremy: Okay. So this is really all about the tenure table, which the editor will never modify directly. You need your code running on the data page and the player's page to run, to update the tenure table?
[01:00:55] Megan: Yeah, exactly.
[01:00:57] Jeremy: yeah, it's totally hidden that this exists to the editor, but it's something that, that you as the person who put this all together, um, have to always be aware of, yeah.
[01:01:11] Megan: Right. So there were just so many things like this, where you just had to press this one button. I call it refresh overview, because originally it was on a tournament page and you had to press the refresh overview button to purge the cache on the overview page of the tournament after editing the data. And you would refresh overview to deal with this cache lag.
And everyone knew you have to refresh overview, otherwise none of your data entry is gonna like, be worth anything because it's not, the cache is just gonna lag. but every editor learned, like if there's a refresh overview button, make sure you press the refresh overview button, , otherwise nothing's gonna happen.
Um, and there is just like tons of these littered across the Wiki. and like to most people, it just like, looks like a simple little button, but like so many things happen when you press this button.
so it is, it is very important.
[01:02:10] Jeremy: Are there no ways inside of MediaWiki, if somebody edits one page, for example, to force it to go and do, I forget what you called it, like a blank save or blank edit on another page?
[01:02:27] Megan: So that wouldn't even really work because, we had 11,000 player pages. And you don't know which one the user just edited. so it, it's unclear to MediaWiki what just happened when the user just edited some part of the data page. and like the whole point here is that I can't even blank edit every single player page that the data page links to because the data page probably links to, I don't know, 200 different player pages.
So I wanna blank edit like the five that this one news line links to. So I do that through like HTML attributes, in the JavaScript,
[01:03:14] Jeremy: Oh, so that's why you're using JavaScript so that you can tell what the person edited because there isn't really a way to know natively in, in MediaWiki. what just changed?
[01:03:30] Megan: there's like a diff so I could, like, MediaWiki knows the characters that got changed, but it doesn't really know like semantically what happened. So it doesn't know, like, oh, a link to this just got edited and especially because, I mean it's like templates that got edited, not really like the final HTML or anything.
So MediaWiki has no idea what's going on. so yeah, so the JavaScript, uh, looks at the HTML attributes and then runs a couple API queries, and then the blank edits happen, and then a couple purges after that so that the cache gets purged after the blank edit.
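A hedged sketch of what such a gadget might look like (this is not the actual Leaguepedia gadget; the `data-players` attribute and the shape of the injected `api` client, modeled loosely on `mw.Api`, are assumptions for illustration): the affected player pages are read from attributes the templates emitted into the row's HTML, then each gets a null edit so its Cargo store re-runs, followed by a purge.

```javascript
// Given a news-line row element that carries the player pages it links to
// as an HTML data attribute, e.g. <tr data-players="PlayerA,PlayerB">,
// perform a blank (null) edit on each page so its parser functions
// re-store their database rows, then purge the page cache. The MediaWiki
// Action API client is injected so it can be stubbed in tests.
async function refreshLinkedPlayerPages(rowElement, api) {
  const pages = (rowElement.dataset.players || '')
    .split(',')
    .filter((p) => p.length > 0);
  for (const page of pages) {
    // Null edit: save the page with no content change.
    await api.postWithToken('csrf', { action: 'edit', title: page, appendtext: '' });
    // Purge so readers immediately see the freshly stored data.
    await api.post({ action: 'purge', titles: page });
  }
  return pages;
}
```

Because the attribute names only the handful of pages this one line links to, the gadget avoids blank editing the hundreds of player pages the whole data page touches.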
[01:04:08] Jeremy: Yeah. So it, it seems like on these wiki pages, you have the HTML, you have the CSS, you have the ability to describe these data pages, which I, I guess in the end end up being rows in SQL. And then finally you have JavaScript. So it kind of seems like you can do almost everything in the context of a, a wiki page.
You have so many, so many of these tools at your, at your disposal.
[01:04:45] Megan: Yeah. Except write ES6 code.
[01:04:48] Jeremy: Oh, still, still only ES5.
[01:04:52] Megan: Yeah,
[01:04:52] Jeremy: Oh no. do, do you know if that's something that they are considering changing or
[01:05:01] Megan: There's a Phabricator ticket open.
[01:05:05] Jeremy: How, um, how, how many years?
[01:05:06] Megan: It has a lot of comments, oh a lot of years. I think it's since like 2014 or something
[01:05:14] Jeremy: Oh yeah. I, I guess the, the one, maybe, well, now the browsers all, all support ES6, but I, I guess one of the things it sounds like MediaWiki maybe sidestepped is the whole front end ecosystem in terms of node packages and build tools and things like that. is, is that right? It's basically you can write JavaScript and there, yeah,
[01:05:47] Megan: You can even write jQuery.
[01:05:49] Jeremy: Oh, okay. That's built in as well.
[01:05:52] Megan: Yeah .So I have to admit, like my, my front end knowledge is like a decade out of date or something because it's like what MediaWiki can do and there's like this entire ecosystem out there that I just like, don't have access to. And so I like barely know about. So I have this like side project that uses React that I've like, kind of sort of been working on.
And so like I know this tiny little bit of react and I'm like, why? Why doesn't MediaWiki do this?
Um, they are adding Vue support. So in theory I'll get to learn vue so that'll be fun.
[01:06:38] Jeremy: So I'm, I'm curious, just from the limited experience you've had outside of
MediaWiki, are, are there like specific things, uh, in your experience working with React that you, you really wish you had inside of MediaWiki?
[01:06:55] Megan: Well, really the big thing is like ES6, like I really wish we could use arrow functions, like that would be fantastic. Being able to build components would be really nice. Yeah, we can't do that.
[01:07:09] Jeremy: I, I suppose you, you've touched a little bit on performance before, but I, I guess that's one thing about Wikis is that, putting what's happening in the back end, aside the, the front end experience of Wikis, they, they feel pretty consistent since they're generally mostly server rendered.
And the actual JavaScript is, is pretty light, at least from, from Wikis I've seen.
[01:07:40] Megan: Yeah. I mean you can add as much JavaScript as you want, so I guess it depends on what the users decide to do. But it's, it's definitely true that wikis tend to load faster than some websites that I've seen.
[01:07:54] Jeremy: Yeah, I mean, I guess when you think of a wiki, it's, you're there cuz you wanna get specific information and so the goal is not to necessarily reproduce like some crazy complex app or something. It's, It's, to get you the, the, information. Yeah.
[01:08:14] Megan: Yeah. No, that's actually one thing that I really like about Wikis also is that you don't have the pressure to make them look nice. I know that some people are gonna hear that and just like, totally cringe and be like, oh my God, what is she saying? ? Um, but it's actually really true. Like there's an aesthetic that Wikis and Media Wiki in particular have, and you kind of stick to that.
And within that aesthetic, I mean, you make them look as nice as you can. Um, and you certainly don't wanna like, make them deliberately ugly, but there's not a pressure to like go over the top with like marketing and branding and like, you know, you, you just make them look reasonably nice. And then the focus is on the information and the focus is on making the information as easy to understand as possible.
And a wiki that looks really nice is a wiki that's very understandable and very intuitive, and one where you. I mean, one, that the information is the joy and, you know, not, not the presentation, I guess. So it's like the presentation of the information instead of the presentation of the brand. so I, I really appreciate that about wikis.
[01:09:30] Jeremy: Yeah, that's a good point about the aesthetics in the sense of like, they have a certain look, and yeah, maybe it's an authoritative look, which, uh, is interesting cuz it's, like a, a wiki that I'll, I'll commonly go to, for example, is the PC Gaming Wiki. And when you look at how it's styled, it feels very dated, or it doesn't look like, I guess you could say, normal webpages, but it's very much in line with what you expect a wiki to look like.
So it's, it's interesting how they have that, shared aesthetic, I guess.
[01:10:13] Megan: Yeah. yeah. No, I really like it. The Wiki experience,
[01:10:18] Jeremy: We, we kind of touched on this near the beginning, but sometimes when. I would see wikis and, and projects like Leaguepedia I would kind of wonder, you know, what's the decision between or behind it being a wiki versus something being like a custom CMS in, in the case of Leaguepedia but, you know, talking to you about how it's so, like wikis are structured so that people can contribute.
and then like you were saying, you have like this consistent look that brings the data to the user. Um, I actually, it gives me a better understanding of why so many people choose wikis as, as ways to present this information.
[01:11:07] Megan: Yeah, a, a lot of people have asked me over the years why, why MediaWiki, when it always feels like I'm jumping through so many hoops. Um, I mean, I just described the caching thing to you, and that's just like one of, I don't know, dozens of struggles that I've had where MediaWiki has gotten in the way of what I need to do.
Because really Leaguepedia is an entire software layer on top of MediaWiki, and so you might ask why. Why MediaWiki? Why not just build the software layer on top of something easier? And my answer is always, it's about the community. MediaWiki lends itself so well to community, and people enjoy contributing to wikis. Wikis are just kind of synonymous with community, and they always have been. And Wikipedia sort of set the example when they launched, and it's sort of always been that way. And, you know, I feel like I'm a part of a community when I'm on a wiki. And if it were just a custom site that had the ability to contribute to it, you know, it just feels like it's not the same.
[01:12:33] Jeremy: I think just even seeing the edit button on Wikis is such a different experience than having the expectation, well, I guess in the case of Leaguepedia, you do have to create an account, but even without creating the account, you can still click edit and you can look at the source and you can see how all this information, or a lot of it, how it got filled in.
And I feel like it's kind of more similar to the earlier days of webpages where people could right click a site and click view source and then look at the HTML and the css, and kind of see how it was put together. versus, now with a lot of sites, the, the code has been minified or there's build tools involved so that when you look at view source on websites, it just looks crazy and you're not sure what's going on.
So I, I, I feel like wikis in some ways are kind of closer to the, the spirit of, like, the earlier HTML sites. Yeah.
[01:13:46] Megan: And the knowledge transfers too. If you've, if you've ever edited Wikipedia, then you know that like open bracket, open bracket, close bracket, close bracket is how you link a page. And that knowledge transfers, admittedly maybe a little bit less so for Leaguepedia, since there you need to know how all the templates work and there's not so much direct source editing.
it's mostly like clicking the CharInsert prefills. but there's still a lot of cross knowledge transfer if you've edited one wiki and then change to editing another. And then it goes the other way too. If you edit Leaguepedia and then you want to go edit the Zelda Wiki, that knowledge will transfer.
[01:14:38] Jeremy: And, and talking about the community and the editors. I, I imagine on Wikipedia, most of the people editing are volunteers. Is it the same with Leaguepedia in your experience?
[01:14:55] Megan: Um, yeah, so I was contracted, uh, or I was not contracted. My LLC was contracted, and then I subcontracted. Um, it changed a bit over the years, um, as people left. Uh, so at first I subcontracted quite a few people. Um, and then I guess, as you can imagine, there was a lot more data entry that had to be done at the start.
And less had to be done later on, as I expanded the code base so that it was more a single source of truth and less stuff had to be duplicated. And I guess it probably became a lot more fun too, uh, when you didn't have to enter the same thing multiple times. but, uh, a bunch of people, uh, moved on over the years.
and so by the end I was only subcontracting, three people. Um, and everyone else was volunteer.
[01:15:55] Jeremy: And and the people that you were subcontracting, that was for only data entry, or was that also for the actual code?
[01:16:05] Megan: No, that wasn't for data entry at all. Um, and actually that was for all of my wikis, uh, because I was managing like all of the eSports wikis. One of them was for Call of Duty and Halo, uh, to manage those wikis. One of them was for, uh, just the Call of Duty Wiki. and then one of them was for Leaguepedia to do staff onboarding.
Oh
[01:16:28] Jeremy: okay. So this is, um, this is to help people contribute to all of these wikis. That's, that's what these, these, uh, subcontractors we're focusing on.
[01:16:41] Megan: Yeah,
[01:16:44] Jeremy: I guess that, that makes sense when we've been talking about the complexity, uh, what's behind Leaguepedia, but there's a lot that the editors, it sounds like, have to learn as well to be able to know basically where to go and how to fill everything out and Yeah.
[01:17:08] Megan: So basically, for the major leagues in League of Legends, um, we required some onboarding before you could cover them, because we wanted results entered within like about one to four minutes of the game ending, or sorry, of the games ending. Um, so that was like for North America, Korea, China, Europe. and for some regions, like the really minor ones, like second tier leagues, like for example the national leagues in Europe, second tier or something, we kind of didn't really care if it was entered immediately.
And so anyone who wanted to enter could just enter, uh, information. So we did want the experience to be easy enough that people could figure it out on their own. and we didn't really, uh, require onboarding for that. There was like a gradation of how much onboarding we required. But typically we tried to give training as much as we could.
Um, it, it was sort of dependent on how fast people expected the results and how available someone was to provide training. so like for Latin America, there was like a lot of people who were available to provide trainings. So even like the more minor leagues, people got training there. for example, But yeah, it was, it was very collaborative.
and a lot of people, a lot of people got involved, so, yeah.
[01:18:50] Jeremy: And in terms of having this expectation of having the results in, in just a few minutes and things like that, is it, where are, are these people volunteers where they would volunteer for a slot and then there was just this expectation? Or how did that work?
[01:19:09] Megan: Yeah. So, um, a lot of people volunteered with us as resume experience to try and get jobs in eSports. Um, and some people just volunteered with us because they wanted to give back to the community because, we're like a really valuable resource for the community. And I mean, without volunteer contribution we wouldn't have existed.
So it was like understood that we needed people's help in order to continue existing. So some people, volunteered for that reason. Some people just found it fun to help out. so there's like a range of reasons to contribute.
[01:19:46] Jeremy: And, and you were talking about how there's some people who, they really need this data in, in that short time span. you know, who, who are we talking about here? Are these like commentators? Are these journalists? I'm just curious who's, who's looking for this in such a short time span.
[01:20:06] Megan: Well, fans would look for the data immediately. sometimes if we entered a wrong result, someone would like come into our discord and be like, Hey, the result of this is wrong. you know, within seconds of the wrong result going up. So we knew that people were like looking at the Wiki, like immediately.
But everyone used the data, commentators at Riot. journalists. Fans, yeah. like everyone is using it.
[01:20:33] Jeremy: and since it's so important to, like you're mentioning Riot or the tournament organizers, things like that. What kind of relationship do you have with them? Do they provide any kind of support or is it mostly just, it's something they just use
[01:20:54] Megan: I, so there is, um, I definitely talk to people at Riot pretty regularly. and we got like resources from them, so they'd give us player photos to put up, and like answers to questions and stuff. but for the most part it was just something that they'd use.
[01:21:15] Jeremy: and, and so like now that, unfortunately, your, your contract wasn't renewed with Leaguepedia, like where do you, I guess, see the, the future of Leaguepedia, but, but also all these other eSports wikis going? is this something that's gonna be more just community driven or, I'm, I guess I'm trying to understand, you know, how this, the gap gets filled.
[01:21:47] Megan: Yeah, I'm, I'm not sure. Um, they're doing an update to MediaWiki 1.39 next year. we'll see if stuff majorly breaks during that. probably no one's gonna be able to fix it if it does. Who knows? (laughs) um, yeah, I don't know. There's another site that hosts, uh, eSports wikis called Liquipedia
um, so it's possible that they'll adopt some of the smaller wikis. Um, I think it's pretty unlikely that they'll want to take Leaguepedia, um, just because it's too complicated of a wiki. but yeah, I, I, I don't know.
[01:22:31] Jeremy: it kind of feels like one of these things where I guess whoever is in charge of making these decisions may not fully understand the implications or, or what it takes to, to run such a, a large wiki. yeah, I guess it'll be interesting to, to see if it ends up being like you said, one, one big mess.
[01:22:58] Megan: Yeah. I got them through the 1.37 upgrade by submitting like three or four patches to cargo, during that time and discovering that the patches needed to be made prior to the upgrade happening. So, you know, I don't think that they're going to update cargo during the 1.39 upgrade and it's cargo changes that have the biggest disruption.
So they're probably safe from that. and, and I don't think 1.39 has any big parser changes. I think that's later, but yeah, there'll probably still be like a bunch of CSS changes and who knows if anyone's going to fix the follow up from that.
So, yeah, we'll see.
[01:23:46] Jeremy: Yeah, that's, um, that's kind of interesting to know too that, these upgrades to MediaWiki and, and to extensions like cargo, that they change so significantly that they require pull requests. Is that, like, is that pretty common in terms of when you do an upgrade of a MediaWiki that there there are these individual things you need to do and it's not just update package.
[01:24:18] Megan: well the cargo change was the first time that we had upgraded in like two and a half years or something. so that one in particular, I think it was expected that that one wasn't going to go so smoothly. generally updates go not that badly. I say with rising intonation, (laughs) , um, if you keep up to date with stuff, it's generally pretty okay.
Cargo is probably one of the less stable ones, just because it has a relatively small contributor base, and so kind of crazy things happen sometimes. Um, Semantic MediaWiki is a lot more stable. Uh, but then the downside is that if you have a feature request for SMW, it's harder to get pushed through.
But cargo still changes a lot. The big change with cargo, like the big problematic change with cargo was a tiny bug fix that just so happened to change every empty string value to nil in Lua,
You know, no big deal or anything, whatever.
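The breakage this caused is easy to reproduce in miniature. Here is a JavaScript analogue (the actual wiki code is Lua, where `nil` behaves much like `null` here; the function and field are invented for illustration) of template code written under the old convention that a missing database value always arrives as an empty string:

```javascript
// Written when missing database values arrived as '' (the old behavior):
// calling a string method is always safe, and '' just produces ''.
// The moment the same missing value starts arriving as null instead,
// this throws a TypeError, silently breaking every template that did this.
function formatRole(row) {
  return row.role.toUpperCase();
}
```

Multiply that pattern across five years of wiki templates and a one-line "bug fix" in the extension becomes a site-wide breakage.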
[01:25:42] Jeremy: That, that's, uh, that's a good one right there.
[01:25:47] Megan: I mean,
I, I don't know how no one noticed this for like a year and a half or something, man.
It was a tiny bug fix.
[01:26:02] Jeremy: Mm.
[01:26:03] Megan: Like it was checked in as a bug fix and it really was a bug fix. I tracked down the guy who made the patch and I was like, I can't reproduce this bug. Can I just revert it? And he was like, I can't reproduce it either.
[01:26:21] Jeremy: Oh, wow. (laughs)
[01:26:23] Megan: And I was like, well, that's great. And I ended up just leaving it in, but then changing them back to empty string.
Um, when the extension was first released, null database values were sent to Lua as empty string due to a bug in the first place. Because null database values should just be nil in Lua. Like, come on here. But they were sent as empty string.
And so for like five years, people were writing code, assuming that you would never get a nil value for anything that you requested from the database. So you can't make a breaking change like that without putting a config value that defaults to true.
[01:27:10] Jeremy: Yeah.
[01:27:11] Megan: So I added a legacy "Lua nil value as empty string" config value or something, and defaulted it to true, and wrote in the documentation that it was recommended that you set it to false.
Or maybe I defaulted it to false. I, I don't remember what I set the default to, but I wrote in the documentation something about how you should, if possible, set this to false, but if you have a large code base, you probably need this. And then we set it platform-wide to true, and that's the story of how I saved the shit out of our 1.37 upgrade this year.
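The shape of that compatibility flag might look something like this. This is a sketch only: the real patch lives in the Cargo extension's PHP/Lua boundary, and the flag name here is invented, not Cargo's actual config variable.

```javascript
// Behavior toggle for how a database NULL is handed to Lua.
// With the legacy flag on, NULLs keep arriving as empty strings (the old
// buggy behavior five years of wiki code came to rely on); with it off,
// they arrive as null (Lua's nil). Non-null values pass through untouched.
function dbValueForLua(value, config = { legacyNullAsEmptyString: true }) {
  if (value === null) {
    return config.legacyNullAsEmptyString ? '' : null;
  }
  return value;
}
```

Defaulting the flag to the legacy behavior is what makes the fix non-breaking: existing wikis keep the old convention until they opt in to the corrected one.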
[01:27:57] Jeremy: Oh yeah, that's, um, that's a rough one. Changing, changing people's data is very scary.
[01:28:05] Megan: Yeah, I mean, it was totally unintended. and I don't know how no one noticed this either. I mean, I guess the answer is that not very many people do the kind of stuff that I do, working with Lua and Cargo in this much depth. but a fairly significant number of Fandom wikis do, and this would've just been an absolute disaster.
And the semi ironic thing is that, I, I have a wrapper that fixes the initial cargo bug where I detect every empty string value and then cast it to nil after I get my data from cargo. So I would've been completely unaffected by this. And my wiki was the primary testing wiki for cargo on the 1.37 patch. So we wouldn't have caught this, it would've gone to live
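The defensive wrapper Megan mentions could look something like this (again a JavaScript sketch standing in for the Lua original, with `null` playing the role of `nil`; `normalizeCargoRows` is an invented name): every empty-string field in the fetched rows is cast to a single "missing" value, so downstream code sees one consistent convention regardless of which one the extension version uses.

```javascript
// After fetching rows from the database, normalize every empty-string
// field to null so calling code only ever has to handle one "missing"
// representation. With this in place, the extension flipping between
// ''-for-NULL and null-for-NULL conventions is invisible downstream.
function normalizeCargoRows(rows) {
  return rows.map((row) => {
    const out = {};
    for (const [field, value] of Object.entries(row)) {
      out[field] = value === '' ? null : value;
    }
    return out;
  });
}
```

This is also why her own wiki, despite being the primary test wiki, would never have surfaced the bug: the wrapper made both conventions look identical.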
[01:28:56] Jeremy: Wow.
[01:28:58] Megan: So we got extremely lucky that I found out about this ahead of time prior to us QAing and fixed this bug
because it would've gone straight to live.
[01:29:10] Jeremy: that's wild yeah, it's just like kind of catastrophic, right? It's like, if it happens, I feel like whoever is managing the wikis is gonna be very confused. Like, why, why is everything broken? I don't, I don't understand.
[01:29:25] Megan: Right? And this is like so much broken stuff that it's very difficult to track down what's going on. I actually had a lot of trouble figuring out what was wrong in the code base, causing this error. And I submitted an incorrect patch at first, and then the incorrect patch got merged, and then I had to like roll back the incorrect patch.
And then I got a merge conflict on the incorrect patch. And it, it was, it was bad. It took me three patches to get this right.
Um, But eventually, eventually I got there.
[01:30:02] Jeremy: Yeah. that's software, I guess ,
[01:30:06] Megan: Yeah.
[01:30:07] Jeremy: the, the, the thing you were trying to avoid all these years.
[01:30:10] Megan: Yeah,
[01:30:13] Jeremy: you're in it now.
[01:30:14] Megan: It really was. That was actually the reason that I got into the wiki in the first place, um, and into eSports. Uh, it was that after Caltech, I wanted to like get away from STEM altogether. I was like, I've had enough of this. Caltech was too much, get me away (laughs).
And I wanted to do like event management or tournament organization or something.
And so I wanted to work in eSports. and that was like my life plan. And I wanted nothing to do with STEM and I didn't wanna work in software. I didn't wanna do math. I was a math major. I didn't wanna do math. I didn't wanna go to grad school. I wanted absolutely nothing to do with this. So that was my plan.
And somehow I stumbled and ended up in software.
[01:31:02] Jeremy: Well, at least you got the eSports part.
[01:31:05] Megan: Yeah, so that, that worked out. And really for the first couple of years I was doing like community management and social media and stuff.
Um, and I did stay away from software for about the first two years, so it lasted about two whole years.
[01:31:24] Jeremy: What ended up pulling you in?
[01:31:26] Megan: Um, actually, so when, when I signed back with Gamepedia, our developer just sort of disappeared and I was like, well, shit, I guess that's me now. (laughs)
So we had someone else writing all of our templates for a couple years, so I was able to just like make a lot of feature requests. and I'm very good at making feature requests.
If, if I ever have like access to someone else who's writing code for me, I'm like fantastic at just making a ton of like really minor feature requests, and just like taking up all of their time with like a billion tiny QA issues.
[01:32:09] Jeremy: You, you are the backlog.
[01:32:12] Megan: Yeah, I really, um, I, there's another OSS project that I've been working on, um, which is a Discord bot. And we, our, our backlog just expands and expands and
[01:32:26] Jeremy: Oh yeah. You know what, I, I think I did look at that. I, I looked at the issues and, usually when you look at a, the issues for an open source project, it's, it's all these people using it, right? That are like, uh, there's this thing I want, but then I looked and it was all, it was all you. So I guess that's okay cuz you're, you're in the position to do something about it.
[01:32:47] Megan: The, the part that you don't know is that I'm like constantly begging other people to open tickets too.
[01:32:53] Jeremy: Really?
[01:32:55] Megan: Yeah. Like constantly. I'm like, guys, it can't just be me opening tickets all the time.
[01:33:04] Jeremy: Yeah. Yeah. If it was, if it was someone else's project, I would be like, oh, this is, uh, I don't know about this. But when it's your own, you know, okay. It's, it's basically like, um, it's like a roadmap, I guess.
[01:33:20] Megan: Yeah. Some of them have been open for, for quite a long time, but actually a couple months ago we closed one that had been open since, I think like April, 2020.
[01:33:31] Jeremy: Oh, nice.
[01:33:32] Megan: That was quite an event.
[01:33:34] Jeremy: Yeah, it's open source, so you can do whatever you want, right? (laughs)
[01:33:41] Megan: We even have a couple good first issues that are actually good first issues.
[01:33:46] Jeremy: Yeah. Not, not getting any takers?
[01:33:49] Megan: No, we sometimes do. Yeah. I actually, we, so some of them are like semi-important features, but I like feel really bad if I ever do the good first issues myself because like somewhere else could do them. And so like, if it's like a one line ticket, I would just, I feel so much guilt for doing it myself.
[01:34:09] Jeremy: Oh, I see what you mean.
[01:34:10] Megan: I'm like, Yeah. so I just like, I can't do them. But then I'm like, oh, but this is really important. But then I'm like, oh, but we might get someone else who, and I just, I never know if I should just take the plunge and do it myself, so.
[01:34:22] Jeremy: Yeah. No, that's, that's a good point. It's, it's like, like these opportunities, right? For people to contribute, and it could, it could make a big difference for them. And then for you, it's like, I could do this in 10 minutes or whatever.
Uh, I, I guess it all depends on how annoyed you are by the thing not being there.
[01:34:43] Megan: Right. I know because my entire background is like community and getting new people to onboard and like the potential new contributor is worth like 10 times, like, The one PR that I can make. So I should just like absolutely leave it open for the next year.
[01:35:02] Jeremy: Yeah. Yeah, no, that's a, that's a good way of, of looking at it. I mean, I I think when you talk about open source or, or even wikis, that that sort of community aspect is, is so, so important, right? Because if it's just, if it's just one person, then I mean, it kind of, it lives or dies with the one person, right?
It, it's, it's so different when you actually get a bunch of people involved. And I think that's something like a lot of, a lot of projects struggle with
[01:35:38] Megan: Yeah. That's actually, as much as I'm like bitter about the fact that I was let go from my own project, I think the thing that I should, in a sense be the most proud of is that I grew my project to a place where that was able to happen in a sense. Like, I built this and I built it to a place where it was sustainable.
Although, we'll see how sustainable it was, (laughs) . but like I'm not needed for the day to day. and that means that like I successfully built a community.
[01:36:18] Jeremy: Yeah, no, you should be really proud about that because it's, it's not only like the, the code, right? Like over the years it sounds like you gradually made it easier and easier to contribute, but then also being able to get all these volunteers together and build a community on the discord and, and elsewhere.
Yeah, no, I think that's, I think that's really great to be able to, to do, do something like that.
[01:36:50] Megan: Thanks.
[01:36:53] Jeremy: I think that's, that's a good place to, to wrap up, but is there anything else you wanted to, to mention or do you want to tell people where to check out, uh, what you're up to?
[01:37:05] Megan: Yeah, I, I have a blog that's a little bit inactive for the past couple months, because I recently had surgery, but I, I've been saying for like five weeks that I will start posting there again. So hopefully that happens soon. Uh, but it's river.me, and so you can check that out.
[01:37:27] Jeremy: Cool. Well, yeah, Megan, I just wanna say thanks for, for taking the time. This was, this was really interesting. the world of wikis is like this, it's like a really big part of the internet that, um, I use wikis, but I, I've never really understood kind of what's going on in, in terms of the actual technology and the community. so so thank you for, for sharing that.
[01:37:53] Megan: Yeah. Thanks so much for having me.
Victor is a software consultant in Tokyo who describes himself as a yak shaver. He writes on his blog at vadosware and curates Awesome F/OSS, a mailing list of open source products. He's also a contributor to the Open Core Ventures blog.
Before our conversation Victor wrote a structured summary of how he works on projects. I recommend checking that out in addition to the episode.
You can help edit this transcript on GitHub.
[00:00:00] Jeremy: This episode, I talk to Victor Adossi who describes himself as a yak shaver. Someone who likes trying a whole bunch of different technologies, seeing the different options. We talk about what he uses, the evolution of front end development, and his various projects.
Talking to just different people it's always good to get where they're coming from because something that works for Google at their scale is going to be different than what you're doing with one of your smaller projects.
[00:00:31] Victor: Yeah, the context. Of course in direct conflict with that statement, I definitely use Google technology despite not needing to at all, right? Like, you know, 99% of people who are doing, like, what people like to call indiehacking, or building small products, could probably get by with just Dokku. Dokku or like CapRover are two projects that'll be like, Oh, you can just push your code here, we'll build it up like a little mini Heroku PaaS thing, and just go on one big server, right? Like 99% of the people could just use that. But of course I'm not doing that. So I'm a bit of a hypocrite in that sense.
I know what I should be doing, but I'm not doing that. I am running a Kubernetes cluster with like five nodes for no reason. Uh, yeah, I dunno, people don't normally count the controllers.
[00:01:24] Jeremy: Dokku and CapRover, I think those are where it's supposed to create a Heroku-like experience. I think it's based off of the Heroku buildpacks, right? At least Dokku is?
[00:01:36] Victor: Yeah, Buildpacks has actually been spun out into like a community thing, so like Pivotal and Heroku, it's like buildpacks.io, they're trying to build a wider standard around it so that more people can get involved.
And buildpacks are actually obviously fantastic as a technology and as a process piece. There's not much else like them and, you know, that's obvious from like Heroku's success and everything. I know Dokku uses that. I don't know that CapRover does, but I haven't, I haven't really run CapRover that much.
They, they probably do. Like at this point if you're going to support building from code, it seems silly to try and build your own buildpacks. Cause that's what you will do, eventually. So you might as well use what's there.
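For anyone curious what that push-to-deploy flow actually looks like, a Dokku deploy is basically just a git push; here's a sketch, assuming Dokku is already installed on the server, and with made-up server and app names:

```shell
# One-time, on the server: create the app.
# dokku apps:create my-app

# On your machine: point a git remote at the Dokku host...
git remote add dokku dokku@my-server.example.com:my-app

# ...and push. Dokku detects the stack via buildpacks (or a
# Dockerfile, if present) and builds and runs it, Heroku-style.
git push dokku main
```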
Anyway, this is like just getting to like my personal opinions at this point, but like, if you think containers are a bad idea in 2022, you're wrong. You should, you should stop. Like you should, you should stop, think about it. I mean, obviously there's not, um, I got a really great question at an interview once, which is, where are containers a bad idea?
That's probably one of the best like recent interview questions I've ever gotten cause I was like, Oh yeah, I mean, like, you can't, it can't be perfect everywhere, right? Nothing's perfect everywhere. So it's like, where is it? Uh, and of course the answer was networking, right? (unintelligible)
So if you need absolute performance, but like for just about everything else, containers are kind of it at this point. Like, time has borne it out, I think. So yeah, I always just like bias toward containers at this point. So I'm probably more of a CapRover person than a Dokku person, even though I have not used, I don't use CapRover.
[00:03:09] Jeremy: Well, like something that I've heard with containers, and maybe it's changed recently, but, but something that was kind of a holdout was when people would host a database, sometimes they would go, oh, we just don't wanna put this in a container. And I wonder if like that matches with your thinking or if things have changed.
[00:03:27] Victor: I am not a database administrator, right? Like, I read the, uh, the Postgres documentation, and I think I know a bit about Postgres, but I don't commit, right? Like, so, and I also haven't, like, oh, managed X terabytes on one server that you are making sure never goes down, kind of deal.
But the stickiness for me, at least from when I've run it. So I've done a lot of tests with like ZFS and Postgres and like, um, and also like just trying to figure out, and I run Postgres in Kubernetes of course, like on my cluster, and a lot of the stuff I found around is, is like fiddly kernel things, like sort of base kernel settings that you need to have set.
Like, you know, stuff like should you be using transparent huge pages, like stuff like that. But once you have that settled, containers are just processes with namespacing and resource control, right? Like, that's it. There are some other ins and outs, but for the most part, if you're fine running a process... so people ran processes, right?
And they were just completely like unprotected. Then people made users for the processes and they limited the users and ran the processes, right? Then the next step is now you can run a process and then do the limiting with namespaces and cgroups dynamically. Like there, there's, there's sort of not a humongous difference, unless you're hitting something very specific.
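That progression (plain process, then users and limits, then namespaces and cgroups) can be sketched with stock Linux tools; this assumes root on a cgroup-v2 system, and the numbers are arbitrary:

```shell
# 1. A plain process: no isolation at all.
sleep 300 &

# 2. The "make users for the processes" era: a restricted user
#    plus a ulimit on virtual memory.
sudo -u nobody bash -c 'ulimit -v 262144; sleep 300' &

# 3. Namespacing: a new PID and mount namespace, in which the
#    shell sees itself as PID 1 -- the heart of a container.
sudo unshare --pid --fork --mount-proc sh -c 'ps aux'

# 4. Resource control: a cgroup with a 64 MiB memory cap.
sudo mkdir /sys/fs/cgroup/demo
echo $((64 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/demo/memory.max
```

A container runtime is, loosely, automation that sets up steps 3 and 4 (plus networking and a filesystem image) for you.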
Uh, but yeah, databases have been a point of contention, but I think Kelsey Hightower had that tweet, yeah, that was like, um, don't run databases in Kubernetes. And I think he walked it back.
[00:04:56] Victor: I don't know, but I, I know that was uh, was one of those things that people were really unsure about at first, but then after people sort of like felt it out, they were like, Oh, it's actually fine. Yeah.
[00:05:06] Jeremy: Yeah, I vaguely remember one of the concerns having to do with persistent storage. Like there were challenges with Kubernetes and needing to keep that storage around, and I don't know if that's changed or if that's still a concern.
[00:05:18] Victor: Uh, I'd say that definitely has changed. Uh, and it was, it was a concern, depending on where you were. Mostly people who are running AKS or EKS or you know, all those other managed Kubernetes, they're just using EBS or like whatever storage provider is like offering for storage.
Most of those people don't actually have that much of a problem with, storage in general.
Now, high performance storage is obviously different, right? So like, so you'll, you're gonna have to start doing manual, like local volume management and stuff like that. It was a problem, because obviously CSI (the Kubernetes Container Storage Interface) didn't exist for some period of time, and like there was, it was hard to know what to do if you were just running a Kubernetes cluster. I think a lot of people were just using local, first of all, local didn't even exist for a bit.
Um, they were just using hostPath, right? And just like, Oh, it's on the disk somewhere. Where do we, we have to go get it, right? Or we have to like, sort of manage that. So that was something most people weren't ready for, especially if you were just, if you weren't like sort of a, a traditional sysadmin and used to doing that stuff.
And then of course local volumes came out, but I think they still had to be, um, pre-provisioned. So that's sysadmin stuff that most people, you know, maybe aren't, aren't necessarily ready for. Uh, and then most of the general solutions were slow. So like, I used Longhorn (https://longhorn.io) for a long time, and Longhorn, Longhorn's great. And super easy to set up, but it can be slower and you can have some, like, delays in mount time. It wasn't ideal for, for most people.
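The pre-provisioned local volume he describes looks roughly like this; the node name and path are hypothetical:

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pg-data
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  # "local" volumes must be provisioned by hand and pinned to a
  # specific node -- the sysadmin work mentioned above.
  local:
    path: /mnt/disks/pg
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["node-1"]
EOF
```

A CSI driver or a dynamic provisioner is essentially what removes the need to write objects like this yourself.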
So yeah, I, overall it's true. Databases, Databases in Kubernetes were kind of fraught with peril for a while, but it wasn't for the reason that, it wasn't for the fundamental reason that Kubernetes was just wrong or like, it wasn't the reason most people think of, which is just like, Oh, you're gonna break your database.
It's more like, running a database is hard and Kubernetes hasn't solved all the hard problems. Like, cuz that's what Kubernetes does. It basically solves a lot of problems in a very generic way. Right. So it just hadn't solved all those problems yet at this point. I think it's got decent answers on a lot of them.
So I, I mean, I don't know. I I do it. Don't, don't take what I'm saying to your, you know, PM meeting or your standup meeting, uh, anyone who's listening. But it's more like if you could solve the problems with databases in the sense before. You could probably solve 'em on Kubernetes now with a good understanding of Kubernetes.
Cause at the end of the day, it's all the same stuff. Just Kubernetes makes it a little easier to, uh, do it dynamically.
[00:07:50] Jeremy: It sounds like you could do it before, but some of the, I guess the tools or the ways of doing persistent storage were not quite there yet, or they were difficult to use. And so that was why people at the start were like, Okay, maybe it's not a good idea, but, now maybe there's some established practices for how you should run a database in Kubernetes.
And I, I suppose the other aspect too is that, like you were saying, Kubernetes is its own thing. You gotta learn Kubernetes and all its intricacies. And then running a database is also its own challenge. So if you stack the two of them together and, and the path was not really clear, then maybe at the start it wasn't the best idea. Um, uh, if somebody was going to try it out now, was there like a specific resource you looked at, or a specific path to where, like, okay, this is how I'm going to do it?
[00:08:55] Victor: I'll just say what I normally recommend to everybody.
Cause it depends on which path you wanna go right? If you wanna go down like running a database path first and figure that out, fill out that skill tree. Like go read the Postgres docs.
Well, first of all, use Postgres. That's the first tip there. But like, read those documents. And obviously you don't have to understand everything. You won't understand everything. But knowing the big pieces and sort of letting your brain see the mention of like a whole bunch of things, like what is TOAST?
Oh, you can do compression on columns. Like, you can do some, some things concurrently. Um, you know, what ALTER TABLE looks like. You get all that stuff kind of in your head. Um, and then I personally really believe in sort of learning by building and just like iterating. You won't get it right the first time. It's just like, it's not gonna happen. You can, you can get better the first time, right? By being really prepared and like, and leave yourself lots of outs, but you kind of have to like, get it out there. Do, do your best to make sure that you can't fail, uh, catastrophically, right?
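A few of the features name-dropped there, as SQL you could actually run; this assumes a local Postgres (14+) you can connect to, and the table is made up:

```shell
psql <<'EOF'
-- A toy table; large text values get TOASTed out of line automatically.
CREATE TABLE docs (id bigint PRIMARY KEY, body text);

-- "Compression on columns": choose the TOAST compression method (PG 14+).
ALTER TABLE docs ALTER COLUMN body SET COMPRESSION pglz;

-- What ALTER TABLE looks like: adding a column with a non-volatile
-- default needs no table rewrite (PG 11+).
ALTER TABLE docs ADD COLUMN created_at timestamptz DEFAULT now();

-- "Do some things concurrently": build an index without blocking writes.
CREATE INDEX CONCURRENTLY docs_created_idx ON docs (created_at);
EOF
```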
So this is like, goes back to that decision to like use ZFS as the bottom of this I'm just like, All right, well, I, I'm not a file systems expert, but if I. I could delegate some of that, you know, some of that, I can get some of that knowledge from someone else. Um, and I can make it easier for me to not fail catastrophically.
For the database side, actually read the documentation on Postgres or whatever database you're going to use, make sure you at least understand that. Then start running it like locally or whatever. Again, Docker, use, use Docker locally.
It's, it's, it's fine. And then, you know, sort of graduate to running sort of more progressively, more complicated versions. What I would say for the Kubernetes side is actually similar. The Kubernetes docs are really good. They're very large, but they're good.
So you can actually go through and know all the, like, workload, workload resources, know, like what a config map is, what a secret is, right? Like what etcd is doing in this whole situation. You know, what a kubelet is versus an API server, right? Like the, the general stuff, like if you go through all that, you should have like a whole bunch of ideas at least floating around in your head. And then once you try and start setting up a server, they will all start to pop up again, right? And they'll all start to like, you're like, Oh, okay, I need a CNI (Container Network Interface) plugin because something needs to make the services available, right? Or something needs to power the ingress, right? Like, if I wanna be able to get traffic, I need an ingress object.
But what listens, what does that, what makes that ingress object do anything? Oh, it's an ingress controller. nginx, you know, almost everyone's heard of nginx, so they're like, okay. Um, nginx has an ingress controller. Actually there's, there used to be two, I assume there's still two, but there's like one that's maintained by Kubernetes, one that's maintained by nginx, the company or whatever.
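The ingress object itself is small; a sketch, with made-up hostnames and service names (the installed controller, nginx or Traefik or whatever, is the piece that actually watches it and routes the traffic):

```shell
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  # Which installed ingress controller should handle this object.
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80
EOF
```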
I use Traefik, it's fantastic. But yeah, so I think those things kind of fall out, and that is almost always my first way to explain it and to start building. And tinkering iteratively. So like, read the documentation, get a good first grasp of it, and then start building yourself, because you'll, you'll get way more questions that way.
Like, you'll ask way more questions, you won't be able to make progress. Uh, and then of course you can, you know, hop into Slacks or like start looking around and, and searching on the internet. Oh, one of the things that really helped me out early learning Kubernetes was Kelsey Hightower's, um, Kubernetes the Hard Way. I'm also a big believer in doing things the hard way, at least knowing what you're choosing to not know, right? Distributing file system deltas, right? Or like changes to a file system over the network, is not a new problem. Other people have solved it. There's a lot of complexity there. But if you at least know the sort of surface level of what the thing does and what it's supposed to do and how it's supposed to do it, you can make a decision on, Oh, how deep am I going to go?
Right? To prevent yourself from like, making a mistake or going too deep in the rabbit hole. If you have an idea of the sort of ecosystem and especially like, Oh, here, like the basics of how I can use this thing, that's generally very good. And doing things the hard way is a great way to get a, a feel for that, right?
Cause if you take some chunk and like, you know, the first level of doing things the hard way, uh, or, you know, Kelsey Hightower's guide is like, get a machine, right? Like, so, like, if you somehow were like, Oh, I wanna run a Kubernetes cluster. but, you know, I don't want use necessarily EKS and you wanna learn it the hard way.
You have to go get a machine, right? If you, if you're not familiar, if you ran on Heroku the whole time, like you didn't manage your own machines, you gotta go like, figure out EC2, right? Or, I personally use Hetzner, I love Hetzner, so you have to go figure out Hetzner, DigitalOcean, whatever.
Right. And then the next thing's like, you know, the guide's changed a lot, and I haven't, I haven't looked at it in like, in years, actually, a while since I, since I've sort of been, I guess, living it, but it's, it's like, generate certificates, right? So if you've never dealt with SSL, or I should say TLS, uh, and generating certificates and how that whole dance works, right?
Which is fascinating because it's like, oh, right, nothing's secure on the internet, except that we distribute root certificates on computers that are deployed in every OS, right? Like, that's a sort of fundamental understanding you may not go deep enough to realize, but if you are fascinated by it, trying to do it manually would lead you down that path.
You'd be like, Oh, what, like what is this thing? What is a CSR? Like, why, who is signing my request? Right? And it's like, why do we trust those people? Right? And it's like, you know, that kind of thing comes out and I feel like you can only get there from trying to do it, you know, answering the questions you can.
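You can walk that whole dance by hand with openssl, which is a good way to meet the CSR he mentions; a sketch, with a made-up CN, self-signing instead of sending the request to a real CA:

```shell
# Generate a private key.
openssl genrsa -out key.pem 2048

# Create a certificate signing request (CSR) -- this is the
# "who is signing my request" artifact, normally sent to a CA.
openssl req -new -key key.pem -out server.csr -subj "/CN=example.com"

# Self-sign it, acting as your own (untrusted) CA, valid one day.
openssl x509 -req -in server.csr -signkey key.pem -days 1 -out cert.pem

# Inspect what you made.
openssl x509 -in cert.pem -noout -subject -dates
```

Browsers would reject this cert precisely because no root CA they distribute vouches for it, which is the trust chain he's describing.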
Right. And again, it takes some judgment to know when you should not go down a rabbit hole. uh, and then iterating. of course there are people who are excellent at explaining. you can find some resources that are shortcuts. But, uh, I think particularly my bread and butter has been just to try and do it the hard way.
Avoid pitfalls or like rabbit holes when you can. But know that the rabbit hole is there, and then keep going. And sometimes if something's just too hard, you're not gonna get it the first time. Like maybe you'll have to wait like another three months, you'll try again and you'll know more sort of ambiently about everything else.
You get a little further that time. that's how I feel about that. Anyway.
[00:15:06] Jeremy: That makes sense to me. I think sometimes when people take on a project, they try to learn too many things at the same time. I, I think the example of Kubernetes and Postgres is a pretty good example, where if you're not familiar with how do I install Postgres on bare metal or a VM, trying to make sense of that while you're trying to get into Kubernetes is probably gonna be pretty difficult.
So, so splitting them up and learning them individually, that makes a lot of sense to me. And the whole deciding how deep you wanna go. That's interesting too, because I think that's very specific to the person right because sometimes you wanna go a little deeper because otherwise you don't understand how the two things connect together.
But other times it's just like, with the example with certificates, some people, they may go like, I just put in Let's Encrypt, it gives me my cert, I don't care, right? And then, and some people, they wanna know, like, okay, how does the whole certificate infrastructure work? Which I think is interesting. Depending on who you are, maybe you go, ahh, maybe it doesn't really matter, right?
[00:16:23] Victor: Yeah, and, you know, shout out to Let's Encrypt. It's, it's amazing, right? I think it's single-handedly behind most of the deployment of HTTPS that happens these days, right? So many, so many of like internet providers and, uh, sort of service providers will use it, right?
Under the covers. Like, Hey, we've got you free SSL through Let's Encrypt, right? Like, kind of like under the, under the covers. which is awesome. And they, and they do it. So if you're listening to this, donate to them. I've done it. So now that, now the pressure is on whoever's listening, but yeah, and, and I, I wanna say I am that person as well, right?
Like, I use, Cert Manager on my cluster, right? So I'm just like, I don't wanna think about it, but I, you know, but I, I feel like I thought about it one time. I have a decent grasp. If something changes, then I guess I have to dive back in. I think it, you've heard the, um, innovation tokens idea, right?
I can't remember the site. It's like, um, do, like do boring tech or something.com (https://boringtechnology.club/) . Like it shows up on sort of hacker news from time to time, essentially. But it's like, you know, you have a certain amount of tokens and sort of, uh, we'll call them tokens, but tolerance for complexity or tolerance for new, new ideas or new ways of doing things, new processes.
Uh, and you spend those as you build any project, right? You can be devastatingly effective by just sticking to the stack, you know, and not introducing anything new, even if it's bad, right? And there's nothing wrong with LAMP stack, I don't wanna annoy anybody, but like if you, if you're running LAMP, or if you run on a HostGator, right?
Like, if you run on so, you know, some, some service that's really old but really works for you isn't, you know, too terribly insecure or like, has the features you need, don't learn Kubernetes then, right? Especially if you wanna go fast. cuz you, you're spending tokens, right? You're spending, essentially brain power, right?
On learning whatever other thing. So, but yeah, like going back to that, databases versus databases on Kubernetes thing, you should probably know one of those before you, like, if you're gonna do that, do that thing. You either know Kubernetes and you like, at least feel comfortable, you know, knowing Kubernetes is extremely difficult obviously, but you feel comfortable and you feel like you can debug.
Little bit of a tangent, but maybe that's even a better, sort of watermark if you know how to debug a thing. If, if it's gone wrong, maybe one or five or 10 or 20 times and you've gotten out. Not without documentation, of course, cuz well, if you did, you're superhuman.
But, um, but you've been able to sort of feel your way out, right? Like, Oh, this has gone wrong and you have enough of a model of the system in your head to be like, these are the three places that maybe have something wrong with them. Uh, and then like, oh, and then of course it's just like, you know, a mad dash to kind of like, find, find the thing that's wrong.
You should have confidence about probably one of those things before you try and do both when it's like, you know, complex things like databases and distributed systems management, uh, and orchestration.
[00:19:18] Jeremy: That's, that's so true in, in terms of you are comfortable enough being able to debug a problem because it's, I think when you are learning about something, a lot of times you start with some kind of guide or some kind of tutorial and you follow the steps. And if it all works, then great.
Right? But I think it's such a large leap from that to something went wrong and I have to figure it out. Right? Whether it's something's not right in my Dockerfile, or my Postgres instance, uh, the queries are timing out. So many things that could go wrong, that is the moment where you're forced to figure out, okay, what do I really know about this thing?
[00:20:10] Victor: Exactly. Yeah. Like, the rubber's hitting the road. It's, uh, you know, the car's about to crash or has already crashed. Like, if I open the bonnet, do I know what's happening, right? Or am I just looking at (unintelligible).
And that's, it's, I feel sort a little sorry or sad for, for devs that start today because there's so much. Complexity that's been built up. And a lot of it has a point, but you need to kind of have seen the before to understand the point, right? So I like, I like to use front end as an example, right? Like the front end ecosystem is crazy, and it has been crazy for a very long time, but the steps are actually usually logical, right?
Like, so like you start with, you know, HTML, CSS and JavaScript, just plain, right? And like, and you can actually go in lots of directions. Like HTML has its own thing. CSS has its own sort of evolution sort of thing. But if we look at JavaScript, you're like, you're just writing JavaScript on every page, right?
And like, just like putting in script tags and putting in whatever, and it's, you get spaghetti. You start like writing, copying the same function on multiple pages, right? You just, it, it's not good. So then people, people make jQuery, right? And now, now you've got like a, a bundled set of like good, good defaults that you can, you can go for, right?
And then like, you know, libraries like Underscore come out for like, sort of like non-DOM related stuff that you do want, you do want everywhere. And then people go from there and they go to like Backbone or whatever. It's because jQuery sort of also becomes spaghetti at some point and it becomes hard to manage, and people are like, Okay, we need to sort of like encapsulate this stuff somehow, right?
And like the new tools or whatever are around at the same timeframe, and you, you get, like, Backbone views, for example. And you have people who are kind of like, ah, but that's not really good. It's getting kind of slow.
Uh, and then you have, MVC stuff comes out, right? Like Angular comes out and it's like, okay, we're, we're gonna do this thing called dirty checking, and it's gonna be, it's gonna be faster and it's gonna be like, it's gonna be less sort of spaghetti and it's like a little bit more structured. And now you have sort of like the rails paradigm, but on the front end, and it takes people to get a while to get adjusted to that, but then that gets too heavy, right?
And then dirty checking is realized to be a mistake. And then you get stuff like MVVM, right? So you get Knockout, like Knockout.js, and you got like Durandal, and like some, some other like sort of front end technologies that come up to address that problem. Uh, and then after that, like, you know, it just keeps going, right?
Like, and if you come in at the very end, you're just like, What is happening? Right? Like if it, if it, if someone doesn't sort of boil down the complexity and reduce it a little bit, you, you're just like, why, why do we do this like this? Right? and sometimes there's no good reason.
Sometimes the complexity is just like, is unnecessary, but having the steps helps you explain it, uh, or helps you understand how you got there. and, and so I feel like that is something younger people or, or newer devs don't necessarily get a chance to see. Cause it just, it would take, it would take very long right? And if you're like a new dev, let's say you jumped into like a coding bootcamp. I mean, I've got opinions on coding boot camps, but you know, it's just like, let's say you jumped into one and you, you came out, you, you made it. It's just, there's too much to know. sure, you could probably do like HTML in one month.
Well, okay, let's say like two weeks or whatever, right? If you were, if you're literally brand new, two weeks of like concerted effort almost, you know, class level, you know, work days right on, on html, you're probably decently comfortable with it. Very comfortable. CSS, a little harder because this is where things get hard.
Cause if you, if you give two weeks for, for HTML, CSS is harder than HTML, kind of, right? Because the interactions are way more varied. Right? Like, and, and maybe it's one of those things where you just, like, you, you get somewhat comfortable and then just like know that in the future you're gonna see something you don't understand and have to figure it out. Uh, but then JavaScript, like, how many months do you give JavaScript? Because if you go through that first, like, sort of progression that I, I mentioned, everyone would have a perfect sort of, not perfect, but good understanding of the pieces, right? Like, why did we start transpiling at all? Right? Like, uh, or why did, you know, why did we adopt libraries?
Like why did Bower exist? No one talks about Bower anymore, obviously, but like, Bower was like a way to distribute front end only packages, right? Um, what is it? Um, uh, yes, there's Grunt. There's like the whole build system thing, right? Once, once we decide we're gonna, we're gonna do stuff to files before we, before we push. So there's Grunt, there's, uh, gulp, which is like Grunt, but like, Oh, we're gonna do it all in memory. We're gonna pipe, we're gonna use this pipes thing to make sure everything goes fast. Then there's like, of course, that leads to like the insanity that's webpack. And then there's like Parcel, which did better.
There's Vite, there's like, there's all this, there's this progression, but how many months would it take to know that progression? It, it's too long. So they end up just like, Hey, you're gonna learn React. Which is the right thing, because it's like, that's what people hire for, right? But then you're gonna be in React and be like, What's webpack, right?
And it's like, but you can't go down. You can't, you don't have the time. You, you can't sort of approach that problem from the other direction where you, which would give you better understanding cause you just don't have the time. I think it's hard for newer devs to overcome this.
Um, but I think there's some hope on the horizon, cuz some things are simpler, right? Like, some projects do reduce complexity, like, by watching another project sort of innovate. So like, React wasn't the first component-first framework, right? Like, technically, I think you might have to give that to maybe Backbone, because like they had views, and like Marionette also went with that.
Like maybe, I don't know, someone, someone I'm sure will get in, like, send me an angry email, uh, cuz I forgot, you know, MooTools, or like, you know, Ember. They've also been around. I used to be a huge Ember fan, still kind of am, but I don't use it. But if you have these, if you have these tools, right?
Like, people aren't gonna know how to use them. And Vue was able to realize that React had some inefficiencies, right? So React innovates the sort of component, reintroduces the component-based, component-first front end development model. Vue sees that and it's like, wait a second, if we just export this like data object, and of course that's not the only innovation of Vue, but if we just export this data object, you don't have to do this fine-grained tracking yourself anymore, right?
You don't have to tell React or tell your the system which things change when other things change, right? Like you, you don't have to set up this watching and stuff, right? Um, and that's one of the reasons, like Vue is just, I, I, I remember picking up Vue and being like, Oh, I'm done. I'm done with React now.
Because it just doesn't make sense to use React, because Vue essentially, you know, you could just say they learned from them, or they realized a better way to do things that is simpler and much easier to write. Uh, and, you know, functionally similar, right? Um, similar enough that it's just like, oh, they boiled down some of that complexity, and we're a step forward, and, you know, in other ways, I think.
Uh, so that's, that's awesome. Every once in a while you get like a compression in the complexity and then it starts to ramp up again and you get maybe another compression. So like joining the projects that do a compression. Or like starting to adopting those is really, can be really awesome. So there's, there's like, there's some hope, right?
Cause sometimes there is a compression in that complexity and you you might be lucky enough to, to use that instead of, the thing that's really complex after years of building on it.
[00:27:53] Jeremy: I think you're talking about newer developers having a tough time making sense of the current frameworks, but the example you gave of somebody starting from HTML and JavaScript, going to jQuery, Backbone, through the whole chain, that's just by nature of you've put in a lot of time, right? You've done a lot of work working with each of these technologies, so you see the progression.
Whereas if someone is starting new, just by nature of being new, you won't have been able to spend that time.
[00:28:28] Victor: Do you think it could work? Again, the time aspect is like really hard to get. Like, how can you just avoid spending time, um, to learn things? That's like a general problem. I think that problem is called education, in the general sense.
But like, does it make sense for, let's say, a bootcamp, or any, you know, school, right, to attempt to guide people through the previous solutions that didn't work? Like in math, you don't start with calculus, right? It just wouldn't, it doesn't make sense, right? But we try and start with calculus in software, right?
We're just like, okay, here's the complexity. You've got all of it. Don't worry. Just look at this little bit. If, you know, if the compiler ever spits out a weird error, uh oh, like, you're in for trouble, cuz you just didn't get the basics. And I think that's maybe some of what is missing.
And the thing is, it is like the constraints are hard, right? No one has infinite time, right? Or like, you know, even like, just tons of time to devote to learning, learning just front end, right? That's not even all of computing, That's not even the algorithm stuff that some companies love to throw at you, right?
Uh, or the computer sciencey stuff. I wonder if it makes more sense to spend some time taking people through the progression, right? Because discovering that we should do things via components, let's say, or at least encapsulate our functionality into components and compose that way, is something not everyone knew, right?
Or, you know, we didn't know wild widely. And so it feels like it might make sense to touch on that sort of realization and sort of guide the student through, you know, maybe it's like make five projects in a week and you just get progressively more complex. But then again, that's also hard cause effort, right?
It's just like, it's a hard problem. But, but I think right now, uh, people who come in at the end and sort of like see a bunch of complexity and just don't know why it's there, right? Like, if you've like, sort of like, this is, this applies also very, this applies to general, but it applies very well to the Kubernetes problem as well.
Like, if you've never managed nginx on more than one machine, or if you've never tried to format the file system on the machine you just rented, because it just, you know, comes with nothing, right? Or like, maybe some stuff was installed, but, you know, if you had to install LVM (Logical Volume Manager) yourself, if you've never done any of that, Kubernetes would be harder to understand.
It's just like, it's gonna be hard to understand. overlay networks are hard for everyone to understand, uh, except for network people who like really know networking stuff. I think it would be better. But unfortunately, it takes a lot of time for people to take a sort of more iterative approach to, to learning.
I try and write blog posts in this way sometimes, but it's really hard. And so like, I'll often have like an idea, like, so I call these, or I think of these as like onion, onion style posts, right? Where you either build up an onion sort of from the inside and kind of like go out and like add more and more layers or whatever.
Or you can, you can go from the outside and sort of take off like layers. Like, oh, uh, Kubernetes has a scheduler. Why do they need a scheduler? Like, and like, you know, kind of like, go, go down. but I think that might be one of the best ways to learn, but it just takes time. Or geniuses and geniuses who are good at two things, right?
Good at the actual technology and good at teaching. Cuz teaching is a skill and it's very hard. and, you know, shout out to teachers cuz that's, it's, it's very difficult, extremely frustrating. it's hard to find determinism in, in like methods and solutions.
And there's research of course, but it's like, yeah, that's a lot harder than the computer being like, nope, that doesn't work, right? Like, if the function call doesn't work, it doesn't work, right? If the person learned suboptimally, you won't know, right? Until like 10 years down the road, when they can't answer some question or, like, you know, when they don't understand, it's a missing fundamental piece. Anyway.
[00:32:24] Jeremy: I think with the example of front end, maybe you don't have time to walk through the whole history of every single library and framework that came out. But I think at the very least, if you show someone, or you teach someone how to work with CSS, and you have them, like you were talking about components before, you have them build a site where there's a lot of stuff that gets reused, right? Maybe you have five pages and they all have the same nav bar.
[00:33:02] Victor: Yeah, you kind of like make them do it.
[00:33:04] Jeremy: Yeah. You make 'em do it and they make all the HTML files, they copy and paste it, and probably your students are thinking like, ah, this, this kind of sucks
[00:33:16] Victor: Yeah
[00:33:18] Jeremy: And yeah, so then you, you come to that realization, and then after you've done that, then you can bring in, okay, this is why we have components.
And similarly, you brought up manual DOM manipulation with jQuery and things like that. I'm sure you could come up with an example where you don't even necessarily need to use jQuery. I think people can probably skip that step and just use the API that comes with the browser.
But you can have them go in like, oh, you gotta find this element by the id, and you gotta change this based on this, and let them experience the... I don't know if I would call it pain, but let them experience how it was, right? And give them a complex enough task where they feel like something is wrong, right? Or like, there should be something better. And then you could go straight to Vue or React. I'm not sure if we need to go like, here's Backbone, here's Knockout.
[00:34:22] Victor: Yeah. That's like historical. Interesting.
[00:34:27] Jeremy: I think that would be an interesting college course or something like that.
Like, I remember when, I went through school, one of the classes was programming languages. So we would learn things like, Fortran and stuff like that. And I, I think for a more frontend centered or modern equivalent you could go through, Hey, here's the history of frontend development here's what we used to do and here's how we got to where we are today.
I think that could be actually a pretty interesting class yeah
[00:35:10] Victor: I'm a bit interested to hear that you learned Fortran in your PL class.
I think when I went, it was like Lisp, and then some other, like, higher classes taught Haskell, but, um, I wasn't ready for Haskell, not many people are. But Fortran is interesting, I kinda wanna hear about that.
[00:35:25] Jeremy: I think it was more in terms of just getting you exposed to, historically, this is how things were, right? And it wasn't so much like you can take strategies you used in Fortran into programming as a whole. I think it was more of a survey, like, hey, here's Fortran, and like you were saying, here's Lisp and all these different languages, and at least you get to see them and go like, yeah, this is kind of a pain.
[00:35:54] Victor: Yeah
[00:35:55] Jeremy: And like, I understand why people don't choose to use this anymore, but I couldn't take away, like, a broad, oh, I really wish we had this feature. I think we were using Fortran 77 or something like that.
I think there's Fortran 77, a Fortran 90, and then there's, um, I think,
[00:36:16] Victor: Like old fortran, deprecated
[00:36:18] Jeremy: Yeah, yeah, yeah. So I actually don't know if they're continuing to, um, you know, add new things or maintain it, or if it's just static. But it's more interesting in terms of, like we were talking, front end, where as somebody who's learning front end development who is new, you get to see how Backbone worked, or how Knockout worked, how Grunt and Gulp worked.
It, it's like the kind of thing where it's like, Oh, okay, like, this is interesting, but let us not use this again. Right?
[00:36:53] Victor: Yeah. Yeah. Right. But I also don't need this, and I will never again
[00:36:58] Jeremy: yeah, yeah. It's, um, but you do definitely see the, the parallels, right? Like you were saying where you had your, your Bower and now you have NPM and you had Grunt and Gulp and now you have many choices
[00:37:14] Victor: Yeah.
[00:37:15] Jeremy: yeah. I, I think having he history context, you know, it's interesting and it can be helpful, but if somebody was. Came to me and said hey I want to learn how to build websites. I get into front end development. I would not be like, Okay, first you gotta start moo tools or GWT.
I don't think I would do that but it I think at a academic level or just in terms of seeing how things became the way they are sure, for sure it's interesting.
[00:37:59] Victor: Yeah. And I think another thing, I don't remember who asked or why I had to think of this lately, um, but it was: knowing the differentiators between other technologies is also extremely helpful, right? So, what's the difference between esbuild and SWC, right? Again, we're leaning heavy front end, but, sorry for context, of course not everyone is a front end developer, but these are two different, uh, build tools, right?
For JavaScript, right? Essentially you can think of 'em as transpilers, but I think they also bundle. Like, uh, generally I'm not exactly sure if esbuild will bundle as well. Um, but it's like, one is written in Go, the other one's written in Rust, right? And in addition, there's Vite, which is like, Vite does bundle, and Vite does a lot of things.
Like, there's a lot of innovation in Vite that has to do with making local development as fast as possible, and also making sure as many things as possible are strippable, or, sorry, tree-shakeable is the better term. Um, but yeah, knowing the differences between projects is often enough to sort of make it less confusing for me.
Um, as far as like, oh, which one of these things should I use, you know, outside of just going with what people are recommending. Cause generally, there are some people with wisdom who lead the crowd sometimes, right? So sometimes it's okay to be, you know, a crowd member, as long as you're listening to someone worth listening to.
Um, and so yeah, I think that's another thing that is like the mark of a good project. Or, it's not exclusive, right? The condition's not necessarily sufficient, but it's like, good projects have the "why use this versus X" section in the readme, right? They're like, hey, we know you could use Y, but here's why you should use us instead.
Or, we know you could use X, but here's what we do better than X that you might care about, right? That's, um, a really strong indicator of a good project, cuz that means the person who's writing the project has done the survey. And that's kind of like, um, how good research happens, right?
It's like, most of research is reading what's happening, right? Knowing the boundary you're about to push, right? Or trying to make one step forward. Um, so that's something where I think that rigor isn't necessarily everywhere in software development, right?
Which is good and bad. But someone who's done that sort of rigor, or I should say, has been rigorous about knowing the boundary, they can explain that to you. They can be like, oh, here's where the boundary was. These people were doing this, these people were doing this, these people were doing this, but I wanna do this.
So you just learn whether it's right for you, and sort of the other points in the space, which is awesome. Yeah. Going to your point, I feel like that's also important. It's probably not a good idea to try and get everyone to go through historical artifacts, but just a quick explainer and sort of, uh, note on the differentiation could help, for sure. Yeah. I feel like we've skewed too much front end. No more front end discussion at this point.
[00:41:20] Jeremy: It's just like, I think there's so many more choices, where the mental thought that has to go into, okay, what do I use next, I feel is bigger on front end.
I guess it depends on the project you're working on, but if you're going to work on anything front end, if you haven't done it before, or you don't have a lot of experience, there's so many build tools, so many frameworks, so many libraries, that, yeah, but we
[00:41:51] Victor: Iterate, yeah, in every direction. Like, it's good and bad, but front end just goes in every direction at the same time. Like, there's so many people who are so enthusiastic and so committed, and it's so approachable, that like everyone just goes in every direction at the same time, and a lot of people make progress, and then unfortunately you have to try and pick which branch makes sense.
[00:42:20] Jeremy: We've been kind of talking about some of your experiences with a few things, and I wonder if you could explain the context you're thinking of in terms of the types of projects you typically work on. Like, what are they, what's the scale of them, that sort of thing.
[00:42:32] Victor: So I guess I've gone through a lot of phases, right? In sort of what I use in my tooling and what I thought was cool. I wrote enterprise Java like everybody else. Like, no one really talks about it, but at some point it was almost like you're either a Rails shop or a Java shop, for so many people.
And I wrote enterprise Java for a long time, and I was lucky enough to have friends who were really into other kinds of computing and other kinds of programming. A lot of my projects were ideas that I was expressing via some new technology, let's say, right?
So, I wrote a lot of Haskell for a while, right? But what did I end up building with that? It was actually a job board that honestly didn't go very far, because I was spending much more time sort of doing Haskell things, right? And so I learned a lot about what I think is like the pinnacle of type-driven development in the non-research world, right?
Like, right on the edge of research and actual usability. But a lot of my ideas, sort of getting back to the ideas question, are just things I want to build for myself. Um, or things I think could be commercially viable, or, like, be well used, and profitable. Things that I think should be built.
Or like if, if I see some, some projects as like, Oh, I wish they were doing this in this way, Right? Like, I, I often consider like, Oh, I want, I think I could build something that would be separate and maybe do like, inspired from other projects, I should say, Right? Um, and sort of making me understand a sort of a different, a different ecosystem.
but a lot of times I have to say like, the stuff I build is mostly to scratch an itch I have. Um, and or something I think would be profitable or utilizing technology that I've seen that I don't think anyone's done in the same way. Right? So like learning Kubernetes for example, or like investing the time to learn Kubernetes opened up an entire world of sort of like infrastructure ideas, right?
Because like the leverage you get is so high, right? So you're just like, Oh, I could run an aws, right? Like now that I, now that I know this cuz it's like, it's actually not bad, it's kind of usable. Like, couldn't I do that? Right? That kind of thing. Right? Or um, I feel like a lot of the times I'll learn a technology and it'll, it'll make me feel like certain things are possible that they, that weren't before.
Uh, like Rust is another one of those, right? Like, cuz like Rust will go from like embedded all the way to WASM, which is like a crazy vertical stack. Right? It's, that's a lot, That's a wide range of computing that you can, you can touch, right? And, and there's, it's, it's hard to learn, right? The, the, the, the, uh, the, the ramp to learning it is quite steep, but, it opens up a lot of things you can write, right?
It opens up a lot of areas you can go into, right? Like, if you ever had an idea for a desktop app, right? You could actually write it in Rust. There's ways, and there's like, um, Tauri is one of my personal favorites, which uses web technology. But it's either I'm inspired by some technology and I'm just like, oh, what can I use this on?
And like, what would this really be good at doing? or it's, you know, it's one of those other things, like either I think it's gonna be, Oh, this would be cool to build and it would be profitable. Uh, or like, I'm scratching my own itch. Yeah. I think, I think those are basically the three sources.
[00:46:10] Jeremy: It's interesting about Rust, where it seems so trendy, I guess, and lots of people wanna do something with Rust, but then a lot of them also are not sure, does it make sense to write in Rust? Um, I think the embedded stuff, of course, that makes a lot of sense.
And you've seen a sort of surge in command line apps, stuff like ripgrep and ag, and places like that. I think the benefits are pretty clear in terms of, you've got the performance and you have the strong typing and whatnot. And I think there's sort of the in-between section that's kind of unclear to me, at least. Would I build a web application in Rust? I'm not sure, that sort of thing.
[00:47:12] Victor: Yeah. I would characterize it as kind of like, it's a toolkit, so it really depends on the problem. And I think we have many tools, and there's almost never a real reason to pick one in particular, right?
Cause it seems like, for most of the work, a lot of the work, like, unless you're really doing something interesting, right?
Like, something that's like, oh, I need to, like, I'm gonna run, you know, billions and billions of processes. Like, yeah, maybe you want Erlang at that point, right? Maybe that should be, you know, your thing. Um, but computers are so fast these days, and most languages have sort of adopted features from others, so it's really hard to find a specific use case for one particular tool.
Uh, so I often just categorize it by what I want out of the project, right? Like, either my goals or project goals, or like business goals, if you're, you know, doing this for a business, right? Um, so basically, if I want to go fast and I want to, you know, reduce time to market, I use TypeScript, right?
Oh, and also, I'm like a type zealot, I'd say. Like, I don't believe in not having types, right? I think it's crazy that you would have a function but not know what the inputs could be, and they could actually be anything, right? You're just like, and then you have to kind of just keep that in your head.
I think that's silly, now that we have ways to avoid the ceremony, right? You've got like Hindley-Milner type systems, so you have a way to, you know, predict what the types of things will be, and you don't have to write everything everywhere. So it's not that.
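TypeScript's inference isn't full Hindley-Milner, but the effect Victor is describing, avoiding the annotation ceremony, looks roughly like this. Only the function parameters below are annotated; every other type is inferred (the names are made up for illustration):

```typescript
// Annotations live at the boundaries; everything else is inferred.
function priceAfterTax(price: number, rate: number) {
  return price * (1 + rate); // inferred return type: number
}

const cart = [9.99, 24.5, 3.0]; // inferred: number[]

const total = cart
  .map((p) => priceAfterTax(p, 0.08)) // p is inferred as number
  .reduce((sum, p) => sum + p, 0);    // sum and p inferred as number

// Wrong shapes are still caught at compile time, without any extra
// annotations on the call site:
// priceAfterTax("9.99", 0.08); // error: string is not assignable to number
```

So you keep the "what could this input be?" guarantee Victor wants, while writing types in only one place.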
But anyway, so if I wanna go fast, the point is, going back to that earlier bit, the JS ecosystem goes everywhere at the same time. TypeScript is excellent because the ecosystem goes everywhere at the same time, and so you've got really good ecosystem support for just about everything you could do.
Um, you could write TypeScript that's very loose on the types and go even faster, but in general it's not very hard. There's not too much ceremony, just, you know, putting some stuff that shows you what you're using and, like, you know, the objects you're working with. And then generally, if I wanna get it really right, I'll reach for Haskell, right?
Cause it's just like, the contortions, and again, this takes time, this is not fast, but, right, the contortions you can do in the type system will make it really hard to write incorrect code, or code that isn't logical with itself. Of course, interfacing with the outside world, like if you do a web request, it's gonna fail sometimes, right?
Like, the network might be down, right? So you basically wrap that uncertainty in your system to whatever degree you're okay with, and then I know it'll be correct, right? But, and, oh, I should say, that's a bad quote, it's not that correctness is not important.
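The "wrap that uncertainty in your system" idea Victor attributes to Haskell (its `Either`/`Maybe` types) can be approximated in TypeScript with a discriminated-union `Result` type. A sketch under that assumption, not any particular library's API:

```typescript
// A fallible operation returns a Result instead of throwing, so the
// caller is forced by the types to handle the failure case.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Stand-in for something uncertain like a network call: parsing
// user input that may be garbage. (Hypothetical helper.)
function parsePort(input: string): Result<number, string> {
  const n = Number(input);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    return { ok: false, error: `invalid port: ${input}` };
  }
  return { ok: true, value: n };
}

// The compiler narrows the union: you cannot read .value without
// first checking .ok, so the failure case can't be silently ignored.
const good = parsePort("8080");
const bad = parsePort("http");
const port = good.ok ? good.value : 3000; // fall back if parsing failed
```

It's weaker than what Haskell enforces (nothing stops other code from throwing), but it captures the shape of the guarantee: the uncertainty is visible in the type, to whatever degree you choose to model it.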
It's like, if you need to get to market, you do not necessarily need every single piece of your code to be correct, right? If someone calls some function with, like, negative one, and it's not important, it's not tied to money, or it's, you know, whatever, then maybe it's fine. They just see an error, and then you get an error in your backend and you're like, oh, I better fix that.
Right? Um, and then generally, if I want to be correct and fast, I choose Rust these days. Right? Um, these days. And going back to your point, a lot of times that means I'm going to write in TypeScript for a lot of projects, cuz I'll just be like, ah, do I need absolute correctness, or some really, you know, fancy sort of type stuff?
No, so I don't pick Haskell, right? And it's like, do I need to be mega fast? No, probably not, so I don't necessarily need Rust. Um, maybe it's interesting to me in terms of a long-term thing, right? Like if I think, oh, but I want X, like for example, tight integration with WASM. If I'm just like, oh, I could see myself, but that's more of, like, you know, a fun thing that I'm doing, right?
Like, you don't need it. That's premature, like, you know, that's a premature optimization thing. But if I'm just like, ah, I really want the ability to maybe consider refactoring some of this out into a WebAssembly thing later, then I'm like, okay, maybe I'll pick Rust.
Or if I do want, you know, really, really fast, then I'll go Rust. But most of the time it's just like, I want a good ecosystem so I don't have to build stuff myself most of the time. Uh, and you know, TypeScript is good enough. So my stack ends up being, a lot of the time, just TypeScript, right? Yeah.
[00:52:05] Jeremy: Yeah, I think you've encapsulated the reason why there's so many packages on npm, and why there's so much usage of JavaScript and TypeScript in general. It fits the, it's good enough, right? And in terms of speed, like you said, most of the time you don't need Rust.
Um, and so TypeScript, I think, is a lot more approachable. A lot of people have to use it because they do front end work anyways, and so that kind of just becomes, I don't know if I should say the default, but I would say it's probably the most common in terms of when somebody's building a backend today. Certainly there's other languages, but JavaScript and TypeScript is everywhere.
[00:52:57] Victor: Yeah. Uh, another thing is, I mean, I've sort of ignored the, like, unreasonable effectiveness of Rails, cause there's just tons of Rails warriors out there, and that's great. They're fantastic. I'm not personally a huge fan of Rails, but that's to my own detriment, right?
In some ways. But like, Rails and Django sort of just, like, people who are like, I'm gonna learn this framework, it's gonna be excellent, they have carved out a great ecosystem for themselves. Um, or like, you know, even PHP, right? PHP and Laravel, or whatever. Uh, and so I'm ignoring those pockets of productivity, right?
Those pockets of intense productivity, where people have all their needs met in that same way. Um, but as far as general ecosystem size and speed for me, um, what you said applies to me. Like, especially if I'm just like, oh, I just wanna build a backend, like, I wanna build something that's super small and just does, you know, a couple endpoints or whatever, and I just wanna throw it out there.
Right? Uh, I will pick, yeah, TypeScript. It just makes sense to me. I also think Node is a better VM or platform to build on than any of the others as well. And by any of the others, I mean Python, Perl, Ruby, right? Like, sort of in the same class of tool.
So I am kind of convinced that, um, Node is better than those as far as core abilities, right? Like threading, right, versus just multi-processing, and, like, you know, other solutions, and stuff like that. So, if you want a boring stack, if I don't wanna use any tokens, right?
Any innovation tokens, I reach for TypeScript.
[00:54:46] Jeremy: I think it's good that you brought up Rails and Django because, uh, personally I've done work with Rails, and you're right in that Rails has so many things built in, and the ways to do them are so well established, that your ability to be productive and build something really fast is hard to compete with, at least in my experience, with what's available in the Node ecosystem.
Um, on the other hand, like I, I also see what you mean by the runtimes. Like with Node, you're, you're built on top of V8 and there's so many resources being poured into it to making it fast and making it run pretty much everywhere. I think you probably don't do too much work with managed services, but if you go to a managed service to run your code, like a platform as a service, they're gonna support Node.
Will they support your other preferred language? Maybe, maybe not,
You know that they will, they'll be able to run Node apps. But yeah, I don't know if it will ever happen, or maybe I'm just not familiar with it, but I feel like there isn't a real Rails of JavaScript.
[00:56:14] Victor: Yeah, you're, totally right.
There are, there are. It's weird, it's actually weird that there isn't, but I kind of agree with you. There's projects that are trying it recently. There's like Adonis, um, and there are backends that also do, like, will do basic templating, like Nest. NestJS is really excellent.
It's like one of the best sort of backend projects out there. But back in the day, there were projects like Sails, which was very much trying to do exactly what Rails did, but it just didn't seem to take off and reach that critical mass, possibly because of the size of the ecosystem, right?
Like, how many alternatives to Rails are there? Not many, right? And now, anyway, let's say the rest of 'em sort of died out over the years. But there's also, um, hapi, uh, which was similarly angling itself to be that, but they just never found the traction they needed.
I think, um, or at least to be as widely known as Rails is for the Ruby ecosystem. Um, but also for people to kind of know the magic, cause, like, I feel like you're productive in Rails only when you imbibe the magic, right? You know all the magic context, and you know the incantations, and they're comforting to you, right?
Like, you have the sort of convention. If you're living and breathing the convention, everything's amazing, right? You can't beat that, you're in the zone. But you need people to get in that zone, and I don't think Node has that. People are just too frazzled.
They're going like, there's too many options. They can't, it's hard to commit, right? Like, imagine if you'd committed to Backbone. Like, you can't, it's over. Oh, it's not over. I mean, I don't wanna, you know, disparage the Backbone project. I don't use it, but, you know, maybe they're still doing stuff, and I'm sure people are still working on it. But it's hard to commit and really imbibe that sort of convention, or sort of make yourself breathe that product, when there's like 10 products that are kind of similar and could be useful as well.
Yeah, I think that's, that's that's kind of big. It's weird that there isn't a rails, for NodeJS, but, but people are working on it obviously. Like I mentioned Adonis, there's, there's more. I'm leaving a bunch of them out, but that's part of the problem.
[00:58:52] Jeremy: On, on one hand, it's really cool that people are trying so many different things, because hopefully maybe they can find something that like other people wouldn't have thought of if they all stick to the same framework. But on the other hand, it's... how much time have we spent jumping between all these different frameworks, versus what we could have had if we had a Rails?
[00:59:23] Victor: Yeah, the, the sort of wasted time is, is crazy to think about, uh, I do think about that from time to time. And you know, and personally I waste a lot of my own time. Like, just, just recently, uh, something I've been working on for a long time. I came back to it after just sort of leaving it on the shelf for a while and I was like, You know what?
I should rewrite this in Rust. I, I really should. And so I talked myself into it, and I'm like, You know what? It's gonna be so much easier to deploy. I'm just gonna have one binary. I'm not gonna have to deal with anything else. I'm just like, it'll be, it'll be so much better. I'll, I'll be a lot more confident in the code I write.
And then sort of going through it and like finishing this a, a chunk of it and the kind of project it is, is like I'll have a lot of sort of, different services, right? That, that, that sort of do a similar thing, but a sort of different flavor of a, of a thing, if that makes sense. And I know that I can just go back to typescript on the second one, right?
Like, I'm, I'm doing one and I'm just like, and that's what I've decided to do. Cause I'm just like, Yeah, no, this doesn't make any sense. Like, I'm spending way too much time, um, when the other thing is like, is good enough. And like, I think maybe just, if you can, like, stay aware of just like, Oh, how much friction am I encountering, and maybe I should switch. Like if you know Rails and you know TypeScript, you should probably use Rails, if you're bought into the magic of Rails, right? And, and of course Rails is also another thing that always has great support from Platform as a Service companies. Rails is always gonna, you know, have great support, right? Because it's just one of those places where it's so nice and cozy that, you know, people who use it are just like, the people who don't want to think about the server underneath.
[01:01:03] Jeremy: I think that combination is really powerful. Like you were talking earlier about working with Kubernetes and learning how all that works and how to run a database and all that. And if you think about the Heroku experience, right? You create your, your Rails app. You tell Heroku, I want a database, and then you push it. You don't have to worry about pods or Docker or any of that. They take care of all of it for you. So I think that certainly there's a value to going deeper and, and learning about how to host things yourself and things like that, but I can totally understand if you have the money, uh, especially for a business, you would say, I don't wanna do this type of ops work. I don't want to learn how to set up a cluster. I just want to push it to Heroku and be done with it.
[01:02:00] Victor: Yeah. You don't, no one gives you an award for learning how to, like, wrangle LVM, right? No one gives you that. They just like, you know, you either make it to market or you don't. Uh, and it's like, uh, like I, I mean, I'd love to hear about what you sort of optimize for, but I feel like it's all about what you want to optimize for.
Like, are you optimizing for time to market? Are you optimizing for, a code base that people won't be able to mess up later? Right? Like a lot of just, you know, seed stage startups or like just early startups or big companies, like, it doesn't matter. We'll rewrite anyway. Right? That like the eBay example was a great, was a great sort of indication of that like it will get rewritten. So maybe it doesn't make sense. Maybe it's silly to, to optimize for strong code base the beginning. Um,
[01:02:45] Jeremy: I think it, uh, at the beginning, especially if you don't have an established audience, like you're not getting any money, then pick something that the team knows and that, you know, um, or at least the majority does, because that, I think, makes the biggest difference in speed. Because let's, let's say you, you were giving an example of, I would use Haskell if I need to be correct, and I would use Rust if I need to be fast. But if you are picking something everybody knows and you don't have some very specific requirement, for the most part, if you're using something you already know, it's going to be built faster, it's going to be easier to read and maintain, and it'll probably be more correct just because you're more familiar with that whole...
[01:03:50] Victor: So I, I agreed right up until the last point. I feel like correctness is one of those... if you use a tool that lets you be too sloppy, you can't stop people from being sloppy, right? Uh, like I think, and this is actually something I was thinking of earlier today, is like, I think writing good code is either people being disciplined or better systems, and of course it doesn't matter in every case, right? And so like, so in cases where like, it, it's just not that important and, and it's better to just let it error and then someone just goes and like, fixes it, right? But if you do that too long, you can get spaghetti, right? You can get either spaghetti or you can get a code base that's suffering from a lot of technical debt. Uh, and it, it won't be a problem early on, but when it is, it's a big problem, right? And can drain a lot of, a lot of time. But 99% of the time, I agree.
You don't need anything other than like TypeScript or Rails or like Django, or you could, you could use Perl if you want, PHP obviously, like, you know, right? Like, you, you could get very far, very fast with those. And often it's like, not even necessary to go anywhere else. But the only little thing I'd say is just like, I find that it's, it's so hard to be correct if you're not getting any help from your compiler, right?
Like, for me, at the very least, right? Like, if you're not getting any help from the language, it's so hard to like, write stuff. That's correct. That doesn't ship with bugs in it. Right? There was, um, there's a whole period of time where everyone was getting really excited about writing tests that were like, Oh, make sure to like, write a test with negative one.
Right? Like, just like, you know, like the next level test stuff was just like, Oh, but what if you like, you know, you gotta, I mean, and this is true, right? You have to think like, how could your system possibly be broken, right? Like, like thinking of how to break a system is hard. It's different from thinking of how to build a system, right?
It's a different skill set. But like some of those things you should really just be protected from, I think. A big, uh, moment in my career was like seeing Option. I, I'd been lucky enough to have friends that were like exploring with stuff like, um, like Haskell, super early on, and like Common Lisp, and sort of like, and reading Hacker News, shout out to AJ cuz like, that's his name.
But like, there's a, there's a person that was like, just kind of like, sort of like exploring the frontier. And then I would like hear a little bit and be like, Ooh, that's interesting. And like, kind of like, take a look. But Option coming in, like, I think in Java 8, it was like, wait a second, Option should be everywhere, right?
Because it's like NPEs, Null Pointer Exceptions, should almost, like, they shouldn't really be a thing, right? Like, and then you are like, Oh, wait, Option should be here, but that means it has to be there, and it kind of like, it just infects everything. And normally stuff that infects everything is bad, right?
You're just like, Oh no, this is bad. I better take it out. But then you're like, Wait a second. But Option is right, because I don't know if that thing is not null, actually, right? Like, the language doesn't give me that. So then, you know, you kind of stumble upon non-nullable types, right? As a language feature.
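The non-nullable types Victor lands on here are roughly what TypeScript gives you with `strictNullChecks`: a value that might be absent has a different type than one that can't be, and the compiler makes you handle the missing case before use. A minimal sketch, where the user lookup is a hypothetical example, not from the episode:

```typescript
// With "strictNullChecks" enabled in tsconfig.json, `User | undefined`
// is a different type than `User`, so the compiler forces a null check.
interface User {
  id: number;
  name: string;
}

const users: User[] = [{ id: 1, name: "Ada" }];

// The return type admits absence, like Option<User> in other languages.
function findUser(id: number): User | undefined {
  return users.find((u) => u.id === id);
}

function greet(id: number): string {
  const user = findUser(id);
  // Using `user.name` right here would be a compile error: user may be undefined.
  if (user === undefined) {
    return "who?";
  }
  // Inside this branch the type has narrowed to plain `User`.
  return `hello, ${user.name}`;
}
```

This is the "it infects everything" property Victor describes: once `findUser` admits absence in its type, every caller is forced by the compiler to deal with it.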
And so it's, it's really hard to quantify, but I think things like that can make a worthwhile difference in base choice of language, as far as correctness goes and in preventing bugs. But I also know that like, people are just blazing by in Rails, like, just like absolutely without a care in the world, and they're doing great and they, like, they have all the integrations and it's all, it's working out great for them.
But I personally just like, I'm just like, I have to, I feel the compulsion. I'm just like, I feel the compulsion and I'm just like, I need to at least do typescript and then I have a little bit more protection from myself. Uh, and then I can, and then I can go on. And it's also, it's like, it's also an excuse for me to like, write less tests as well.
Like a little bit like, you know, I'm just like, you know, I, I, I, there's, there's some, there's some assurance that I don't have to like go back and actually write that negative one test the first time, right? In practice, like, technically you, you, you should, cuz like, you know, at run time it's, it is a completely different world, right?
Like TypeScript is like a compile-time thing. But if you, if you write your types well enough, you, you, you're, you're protected from some amount of that. And I find that that helps me. Personally. So, so that's the, that's the one place I'm just like, ah, I do like that correctness stuff.
[01:08:13] Jeremy: Yeah. Yeah. I, I think like, I, I do agree in a general sense with languages that have static type checking, where you know at compile time whether something can even run; that can make a big difference. Maybe correctness wasn't the right word, but if you work in an ecosystem, whether it's Rails or Django or something else, you kind of know all of the, the gotchas, I guess? If you're, if you're, let's say you're building a product with Haskell and you've never used Haskell before, I feel like yes, you have a lot of strong guarantees from the type system, but there are going to be things about the language or the ecosystem that you, you'll just miss just because you haven't been in it.
And I think that's what I meant by correctness, in that you're going to make mistakes, either logical mistakes or mistakes in structure, right? Because if you, if you think about a Rails app, one of the things that I think is really powerful is that you can go to a company day one that uses Rails, and if they haven't done anything too crazy, you have a general sense of where things are, to some extent.
And when you're building something from scratch in a language and ecosystem you don't understand, um, there's just so many scrapes and cuts you have to get before you're proficient, right? Um, so I, so I think that is more what I was thinking of, yeah.
[01:10:01] Victor: Oh yeah. I, I'd fully agree with that. Yeah, I fully agree with that. You don't know what you, what you don't know, right? When you, uh, when you start, um, especially with a new ecosystem, right? Because you just, everything's hard. You have to go figure out everything. You have to go try and decide between two libraries that do similar things, despite, you know, like knowing how it's done in another language.
But you gotta like figure out how it's done in this language, et cetera. But it's like, well, you know, at least decisions are easier elsewhere sometimes, right? Like, like at the database level, or like, maybe the infrastructure level. But yeah, I, I totally get that. It's just, most of the time you just want to go with that, uh, that faster, that faster thing, you know. Feels funny to say, of course.
Cuz I never do this (laughs). I never, like, all my, all my projects are on, on essentially crazy stacks. But, but I, I try, and what I try and be mindful about is how much of my toil right now is even a good idea, right? Like, depending on my goals. Again, like going back to like that, it depends on what you're optimizing for, right? If you're optimizing for learning or like getting a really good fundamental understanding of something, then yeah, sure. If you're optimizing for like getting to market? Sure, that's a different answer. If you're, if you are optimizing for, like, being able to hire developers to work alongside you, right?
Like making it easy to hire teammates in the future, that's a different set of languages maybe. so yeah, I don't know. I kind of give the, the weasel answer, which is, you know, it depends , hmm right? But, um, yeah.
[01:11:32] Jeremy: Especially if you're, you're learning or you're doing personal projects for yourself, then yeah, if you, if you want to know how to use Haskell better, then yeah, go for it. Use, use Haskell, um, uh, or use Rust and so on. I think another thing I think about is the deployment. So if you personally are running a SaaS or you're running something that you deploy internally, then I think something like Rails or Django is totally fine, especially if you use a platform as a service, then there's so many resources for you. But if your goal is, to give you an example, like Mastodon, right? So we have the whole, Twitter substitute thing.
Mastodon is written in Rails and it has a number of dependencies, right? You have to have Sidekiq, which runs the workers, Elasticsearch for search, um, Postgres for the database, and Nginx and so on. And for somebody who's running an instance for a bunch of people, totally makes sense, right?
No big deal. Where I think it's maybe a little trickier is, and I don't know if this is the intent of Mastodon or ActivityPub, but some people, they wanna host their own instance, right? Um, rather than signing up for mastodon.social and having a whole bunch of people in one instance, they wanna be able to control their instance.
They wanna host it themselves. And I think for that, Rails, the, the resources that it requires are a little high for that kind of small usage. So, in an example like that, if I wanted something that I wanted to be able to easily give to you and you could host it, then something like a Go or a Rust I think would make a lot of sense. You can run the binary, right? And you don't have to worry about all the things that go around running a Ruby application.
So I think that's something to think about as well. And, and we talked about command line apps as well, right? If you're gonna build a command line app and you want it to run on Windows, well, the person on Windows is not gonna have Python or Ruby, so again, having it in Go or in Rust makes a lot of sense there. So that's another thing I would think about: who is it going to be given to, and who is going to deploy it as well.
[01:14:25] Victor: Yeah. That's um, that's a great point, uh, because it makes me think of the sort of explosion of sysadmins writing Go when it first came out. I don't know if I imagined this or I think it was real, but like there were just so... uh, up until then, like most sysadmins would be, they'd like obviously like get to know their routers or their, you know, their switches and their, you know, their servers and like racking, stacking, doing all that stuff.
Languages and like frameworks can unlock a certain group of people or like unblock a certain group of people and like unlock their sort of productivity. So like Ansible was one of those first things that was like really sort of easy to understand and like, Oh, you can imperatively set this machine up.
But a side effect is you get a lot of sysadmins that know Python, right? So like, now a lot of like the sort of black art stuff is accessible to you. Like, or, sorry, I say accessible to you as in accessible to me as the non sysadmin, right? Cause I'm just like, Oh, I can run this like little script this person wrote, uh, in Python and it like, will do all this stuff, right?
That I, I would've never been able to do before. And maybe I learned a little bit more about that, about that system, right? And so I, I, I saw something similar in Go, where people were writing a bunch of tools that were just really easy to run, right? Really, really easy to run everywhere. Um, and that means easy to download, easy to like, you know, everything's easier, and a lot of hard things got a lot easier, right? Uh, and this is the same with Rust. Like, I, I believe the library that most people use is like Clap. I've built a few things with Clap and it's like, it gives you excellent, uh, I guess you'd call them affordances, or like the ability to make a high quality CLI program with very little effort, right?
Uh, and so that means you end up writing really decent binaries, right? With like, good help texts and like reasonable like, you know, options and stuff like that. and then it's really easy to deploy to Windows, right? And like other, other platforms, uh, like you said, you don't have to try and bundle Python or, or whatever else the sort of interpreter class of languages. So yeah, I think that I'd agree that like just languages and, and, and sort of frameworks can, can unlock, easier creation of certain kinds of apps and certain sort of groups of people to share their knowledge or like to, to, to make a, a tool that's more usable by everyone else.
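Clap itself is Rust, but the affordance Victor describes, declaring your flags once and getting parsing, short options, and error messages for free, can be sketched in TypeScript with Node's built-in `util.parseArgs`, no third-party install needed:

```typescript
import { parseArgs } from "node:util";

// Declare the options once; parseArgs handles long and short flags,
// values, and throws a descriptive error on unknown or malformed input.
const { values, positionals } = parseArgs({
  args: ["--name", "world", "-v", "extra"], // normally process.argv.slice(2)
  options: {
    name: { type: "string", short: "n" },
    verbose: { type: "boolean", short: "v" },
  },
  allowPositionals: true,
});

const greeting = `hello, ${values.name ?? "anonymous"}`;
```

Clap goes further (generated `--help` text, subcommands, completions), but the declarative shape is the same idea.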
It could be like, kind of like a, multiplicative factor right. Just like, I made this really, really intense Python script, but like now, but to use it, you'd have to like install Python on Windows, like manage your environments, whatever. Like, I don't know if you're using pyenv, maybe you are, maybe you aren't.
Do you get the wheel? Like what, what do you do with that? No, I'll just give you an executable, and you have an executable, and then now you can use all the tools that like normally work with an executable, or with something that like produces output, and it's just faster for everybody, and everybody like just, you know, gets more value.
[01:17:17] Jeremy: Cool. Well, is there anything else you wanted to, to mention or, or talk about?
[01:17:26] Victor: I don't know. Oh yeah, I guess, I guess I could just like say my stack, right? Um, oh, I, I really love SvelteKit. I've been kind of all in on SvelteKit for the front end for a while now. It feels like I've used, um, I've used Nuxt, I've used, like, I've used a lot of frameworks, but I'm trying to think of, of frameworks that like, do the, um... like I think, I think a local, if not global, maximum for front end development is the power of the front end, component driven sort of paradigm, plus server side rendering, right?
Because there's like, what are the big advantages of using something like Rails or like whatever else that, that just, just, that's completely server side is that the pages are fast, the pages are always fast. It's there, but they don't have interactivity. Right. we've taken a weird path to get here and it looks really wasteful and maybe it is really wasteful, but at this point we now have kind of both kind of like glued and like hacked into one thing.
And I think that class of tools is like, is, is, is a local maximum, if not, if not global. So, so yeah. So like, there's like Next, Nuxt, SvelteKit. There's, there's other solutions. There's Astro, like there's, there's... which, Astro's really recent.
Um, there's Ember, right? Shout out to Ember, right. People, people still pushing that forward as well, which is great. But yeah, so I, I've gone with SvelteKit also, and this is again in like direct conflict with what we've talked about this entire time, which is like, use established things that get you there fast. But like, SvelteKit isn't at 1.0 yet, but it is excellent.
Like, I, I am more productive in it than I ever was with Nuxt. Um, and again, Nuxt has changed a lot since I've, you know, sort of made the switch and like, you know, maybe I, maybe it deserves a rethink and like re revisiting it, but I'm so productive with SvelteKit. I just, like, I don't mind. And like half the time I'll just, I'll just use SvelteKit, uh, and my database and then be done like no middle layer.
So like no API layer. I just like stuff it into the SvelteKit app, and then use Postgres on the backend, and then I'm done, and, and I feel like that's been really productive. You know, again, this is outside of the, the world where you use a Rails or whatever.
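The "no middle layer" setup Victor describes, a SvelteKit `load` function talking straight to the database, looks roughly like this sketch. Here the database call is mocked with an in-memory array; in a real `+page.server.ts` under `src/routes/`, `queryPosts` would run a Postgres query and `load` would be exported for the page to consume:

```typescript
// Sketch of a SvelteKit-style `load` function with no separate API layer.
// In a real app, `queryPosts` would hit Postgres via a client library;
// here it's an in-memory stand-in so the sketch is self-contained.
interface Post {
  slug: string;
  title: string;
}

async function queryPosts(): Promise<Post[]> {
  // e.g. `SELECT slug, title FROM posts` through a Postgres client
  return [{ slug: "hello", title: "Hello, world" }];
}

async function load() {
  // Whatever this returns is available to the page component as `data`,
  // so the page renders server-side with real data and no API round trip.
  return { posts: await queryPosts() };
}
```

The design point is that the server-side function and the page live in one codebase, so there is no separate REST or GraphQL layer to maintain.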
Um, so yeah. So that's, that's been my stack for a lot of the products I've done recently. So yeah, if I, if I had to, I guess, say something about like front end: give SvelteKit a try. It's pretty good. Uh, and obviously like databases, just use Postgres. Stop using other things. Don't, don't do that.
And like infrastructure stuff, I think Kubernetes is cool, but you probably don't need it.
Uh, I like Pulumi. I feel like no one, like I've been recommending Pulumi for a long time over Terraform. So it's just like DSLs have limits and those limits are a bad idea to have when you, like, the rest of your time is spent with no limits, right.
With like just general computing. Right. So, and Pulumi is just like, you can do your general computing and infrastructure stuff too, and it's, I feel like it's, it's always, you know, been better, but, but anyway, yeah. That's like, that's kind of my stack
[01:20:26] Jeremy: So Pulumi is, um, it's a way to provision infrastructure, but is there a language for it?
[01:20:35] Victor: It integrates with the language you use. And Terraform has caught up in that respect, right? Cause you have that now. But how it works is still slightly different, right? Because if I remember correctly, they still generate a Terraform file and execute that file. It's still a little bit different, which is like... and there's AWS's CDK as well, right? So, so the world has sort of caught up to what Pulumi is doing. But you know, I, I think it was like, I don't know, Terraform 0.12 or something like that, where it was just like, we've added better for loops.
I'm like, okay, at this point, like, that's the indication that you now need general computing. Like, DSLs can have for loops, but it's like, if you're starting to, like, pluck features from, you know, general computing languages, well, we have really good general computing languages right there.
You know, that was kind of my indication to be like, okay, Pulumi is the way, uh, for me. Um, again, this doesn't matter cuz like at work you're gonna, you're probably using Terraform, like, you know, just, every, just like, there's, you know, everyone's using certain tools and you don't have a choice. Sometimes you have to use certain tools, but I personally have my, uh, have my pet likes and stuff.
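A sketch of the point about DSL limits: in Pulumi, infrastructure is ordinary TypeScript, so plain loops and functions just work. `createBucket` below is a hypothetical stand-in for a real resource constructor (something like Pulumi's `new aws.s3.Bucket(...)`), used only to show the shape:

```typescript
// In Pulumi, resources are ordinary objects created from ordinary code,
// so general-purpose constructs like loops, maps, and conditionals work
// without any special DSL syntax. `createBucket` is a hypothetical stub.
interface Bucket {
  name: string;
  versioned: boolean;
}

function createBucket(name: string, opts: { versioned: boolean }): Bucket {
  return { name, versioned: opts.versioned };
}

const environments = ["dev", "staging", "prod"];

// A plain loop with a conditional; in an HCL-style DSL this would need
// dedicated for-each and conditional-expression syntax.
const buckets = environments.map((env) =>
  createBucket(`logs-${env}`, { versioned: env === "prod" })
);
```

That is the contrast Victor draws: a DSL has to grow special syntax for each of these constructs, while a general-purpose language already has them.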
[01:21:49] Jeremy: How about for caching?
[01:21:53] Victor: Uh, KeyDB. I go into rabbit holes a lot. I call myself a yak shaver cause I shave a lot of yaks, and it doesn't benefit anyone really except for me most of the time. But there are lots of Redis-alikes out there, and the best feature set right now is KeyDB's.
There's like, there's, there's one called Tendis, um, which is like, um, a little bit more distributed. There's like SSDB, which will do it off disk, which, I think because we have such fast disks now, is good enough for a bunch of applications. Right. Especially if, like, your alternative was, you know, calls to a much farther away service.
There's Pelikan out of Twitter, so they have a whole, they've got like a caching, it's like a framework kind of, right? Like they, they, they've sort of built a kernel of like really interesting caching, um, originally like sort of to serve their memcache workloads and stuff. But it's kind of grown in like, in lots of directions as well.
KeyDB is, was the most compelling, and still is to, to me, from a resource usage standpoint. Multithreading, obviously, like it is multi threaded, so it is now, it's, it's way faster. Right. Um, and also like it offers flash storage, using the SSD when you can. And, and that's, those are game changers. Right. And, and of course all the, you know, usual, and clusters, right? It clusters without you, you know, paying Redis Labs any money or whatever. Um, which is, which is fine. You know, people, open source projects and, and businesses have to, you know, make money. That is a thing. But yeah, KeyDB is, is my, uh... whenever I'm about to spin up Redis, I don't, and I spin up KeyDB instead. Uh, also they were bought by Snap, or, bought... hell of an acquihire.
I think, if, if you... cuz I think sometimes that has like a negative, pejorative context to it. Like you didn't, like, oh, you didn't make a billion dollars, you just got acquihired or whatever. But hell of an acquihire. Um, and, and so all of it's like free now, like all of the, like all the, the premium features are becoming free.
And I'm like, this is, this is like, I won the lottery, right? Cause um, you know, you get all the, the, the awesome stuff outta KeyDB for, for free. Um, so yeah, Caching KeyDB. I do KeyDB.
[01:24:11] Jeremy: KeyDB. I haven't heard of that one.
[01:24:14] Victor: Oh yeah, it's, um, yeah it's like keydb.dev.
[01:24:17] Jeremy: Oh KeyDB.
[01:24:18] Victor: It's awesome. They did YC.
[01:24:23] Jeremy: Oh, it uses the Redis wire protocol
[01:24:28] Victor: Like Redis is like, is the leader, unless you're using memcached for some other reason, and then like, obviously, you have to use memcached, whatever. But, um, but yeah, Redis is like the sort of app external cache du jour for basically everywhere, and when I wanna run Redis, I run KeyDB.
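Whatever Redis-compatible server you run, Redis itself or KeyDB, the application-side usage is usually the same cache-aside shape: check the cache, fall back to the source of truth on a miss, and write the result back with a TTL. A minimal in-memory sketch, with a `Map` standing in for the cache server (a real version would use a Redis client's `GET` and `SET` with `EX`):

```typescript
// Cache-aside with a TTL, using a Map as a stand-in for Redis/KeyDB.
const cache = new Map<string, { value: string; expiresAt: number }>();
let dbHits = 0; // counts trips to the "database"

// Stand-in for the slow source of truth, e.g. a Postgres query.
function slowLookup(key: string): string {
  dbHits++;
  return `value-for-${key}`;
}

function getCached(key: string, ttlMs: number): string {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value; // served from cache, no database trip
  }
  const value = slowLookup(key);
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

Because the pattern only depends on the Redis wire protocol, swapping Redis for KeyDB under it is transparent to the application.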
[01:24:51] Jeremy: And for search, do you just search in Postgres, or turn to something else?
[01:24:59] Victor: Oh, you've asked a dangerous question. So I recently did some, uh, some writing. So I, I, I, so recently, um, like this year, I've branched out and done a little bit more experiments in writing for companies that have an interesting you know developer product or sometimes where like, you know, my sort of like interest and stuff just aligned, right?
So like, uh, I've worked with, um, OCV, Open Core Ventures, um, which is, um, Sid's, if you know Sid from GitLab, that's his, um, his, uh, his fund. Uh, and then also Supabase, which does, um, you know, awesome stuff on Postgres. And, you know, it's fully open source, that, that company is amazing as well. And search has been a thing.
So Postgres has full text search, SQLite has full text search. They both have it built in. They're very good, and I think great approximations for like V1s at the very least, maybe even farther. Because a lot of the time, if someone's in your product and they're searching, something's wrong usually, right?
Like, like, unless you have vast gobs of data, like this means your UX is not good enough for something, right? Um, but um, that said, I almost always start with Postgres full text search. And then, there, there are, there's a huge crop of new search engines, right? So if we consider OpenSearch to be new, as in the Amazon fork of, of Elasticsearch, there's that. There's a project called Meilisearch.
There's a project called Typesense. Um, there's Sonic, uh, there's like, um, Tantivy, uh, which, which is like, um, kind of a Lucene alternative. There's like Quickwit, which has like shifted to logging a little bit. Like that's their like, path to sort of, um, profitability. I, I think, I think they, they sort of shifted a little bit.
there's a bunch more that I'm, I'm missing. And so that's what I wrote about and had a lot of fun writing about for Supabase very recently. And this was, um, this was something I just had written down, right? So I was just like, I need to do a blog post. And I, I write on my blog a lot, so I'm just like, Alright.
I write up yak shaves to my blog a lot, and I'm, and I was just like, I need to try and just use some of these, right? Because there's so many and they all look pretty good. And they have to have learned... like the gold standard is like, uh, Solr, right? Lucene, right? Like, it's like, it's like Solr and Lucene and like, you know, that or whatever.
And, but a lot of times you just don't need, like, you don't necessarily need every single feature of Lucene. And so there are so many new projects that look decent. Uh, and so I got a chance to, to sort of... I was paid to do some of that experimentation, which is awesome, cause I would've done it anyway.
But it's nice to be paid to do it, on search stuff. And I actually have a project... I liked that so much that I made a project to try and get a more representative dataset. So I started a site called podcastsaver.com. I use the Podcast Index, right?
Which has a lot of sort of like podcast information. And, you know, if someone doesn't know about podcasts, there's like an RSS feed, right? Which is kind of like, you can think of it as an XML-y, uh, format, where podcasts are just published as an RSS feed, and the RSS feed has links to where to download the actual files, right?
So it's really open, right? Um, and so I used, um, the structure of that to index, in multiple search engines at once, right? Running alongside each other, the information from the Podcast Index. This was fun for me cuz it was like an extension of that other project. It was a really good way to test them against each other.
Very fast, right? Like, or, or like in real time. So like right now, um, if you go to podcastsaver.com and you search a podcast, it will go to one of the search engines randomly. So right now there is Postgres FTS plus trigram. So, so there is, um, there's also a thing called, um, trigram search, another really good, like, um, sort of basic search feature.
And there's Meilisearch. So both of those are implemented right now. And there's actually a little nerds link, right? Which will show you how many, how many podcasts there are, right? So, so how many documents, essentially you can kind of assume there are. Um, and it'll show you how fast each search engine did, right?
At sort of returning an answer. Now it's a little bit of a problem, because you need to do some manual work to figure out whether the answer was good, right? If you're really fast but give a garbage answer, that's not good. But in general, like, so you can, you can actually use the nerd tab to control... you can like switch to only Postgres, uh, and I do that with like cookies, and you can, um, you can force it to go to Postgres and you can see the quality of the answers for yourself.
But they're generally, it's pretty good from both. Like it's not, it's not terrible from, from both. So I'm, I'm kind of like glossing over that part right now, but you can see the performance and it's actually, it's like meilisearch does a great job, right? Um, and you know, there's obviously some complexity in running another service and there's some other caveats and stuff like that, but it's, it's pretty good.
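The comparison podcastsaver.com does can be sketched as: route each query to a randomly chosen engine, time the call, and record latencies per engine. The two engines below are hypothetical stubs standing in for Postgres FTS and Meilisearch, just to show the routing-and-timing shape:

```typescript
// Route each search to a random backend and record its latency,
// the way podcastsaver.com compares engines. The engines are stubs.
type Engine = (query: string) => string[];

const engines: Record<string, Engine> = {
  postgres: (q) => [`pg result for ${q}`],
  meilisearch: (q) => [`meili result for ${q}`],
};

// Per-engine latency samples, for a "nerds" view of which engine is faster.
const timings: Record<string, number[]> = { postgres: [], meilisearch: [] };

function search(query: string): { engine: string; results: string[] } {
  const names = Object.keys(engines);
  const engine = names[Math.floor(Math.random() * names.length)];
  const start = performance.now();
  const results = engines[engine](query);
  timings[engine].push(performance.now() - start);
  return { engine, results };
}
```

As Victor notes, latency alone is not enough; you still need a manual or scored check of result quality per engine before drawing conclusions.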
And over time, I want to add more. So I wanna add, you know, at the very least Typesense, like people have reached out, so like, I, I made a, a comment on this on Hacker News, and like there's a long road ahead for that, and like, I honestly shouldn't be working on that cuz I have other things that I'm like, you know, I really should be full time on.
Um, but like, that's a thing I'm trying to, I'm trying to sort of grow in the future a little bit more, cuz it's just like, it's so fascinating. Like, everything's so cheap. Like compute is cheap, you know, like there's awesome projects out there with like really advanced functionality that we can just run, like, for free, or not, not for free, but like, you don't have to do the work to like build a search engine.
There's like five out there. So all you, the only thing that's missing is like knowing which one's the best fit for you and like, you can just find that out. Yeah.
[01:30:46] Jeremy: Are there any, I guess, early conclusions? In terms of, like, you like Meilisearch because of X, or?
[01:30:53] Victor: Yeah, the Supabase blog post, uh, was a little bit better in terms of, uh, takeaways. What I can say is that Meilisearch is definitely faster. Meilisearch was harder for me to load, and it took a little bit longer, cuz, you know, you have to do the network call.
And to be fair, if you choose Postgres, it's in the database. So like, your copying is a lot easier. Like, manipulating stuff is a lot easier. Um, but right now when I look at the stats, like Meilisearch goes way faster. It's like almost always under a hundred milliseconds, right? And that's including, you know, um, that network, you know, round trip.
Um, but, you know, Postgres is like, I don't know, I'm just so biased. Like, it is not a good idea to ever bet against Postgres, right? Obviously, it doesn't make sense for Postgres to be better than purpose-built tools like Meilisearch, um, because they are fully focused, right?
Like, they should be optimal, cuz they don't have any other sort of conflicting constraints to think about. But Postgres is very good. It's just so excellent, and it keeps moving. Like, it keeps getting better. It gets better and better every year, every quarter. It's hard not to bet on it. So yeah, I would say, based on pure performance of podcastsaver.com right now, the data lends itself to saying pick Meilisearch. Unfortunately, that data set is incomplete. I don't have Typesense up. I don't have all these other, like, search engines up.
So it's limited. There was also, like in the Supabase post, you'll see there was support for, um, misspellings and stuff, which was different among search engines. So there's also that axis as well. But if you happen to be running on Postgres, I really do suggest you just give Postgres FTS a try, even if it's just trigram search.
Like, even if you just do trigram search with a sort of fuzzy search bar, cause that's probably what a V1 would look like. Anyway, try that, and then, you know, if you need crazy faceting or really advanced features, then jump off.
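(Editor's note: a minimal sketch of the trigram setup Victor describes. The `podcasts` table and its columns are made up for illustration; `pg_trgm` ships with Postgres as a contrib extension.)

```sql
-- Enable trigram matching (ships with Postgres as a contrib extension)
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical table standing in for whatever you're searching
CREATE TABLE podcasts (
    id    bigserial PRIMARY KEY,
    title text NOT NULL
);

-- A GIN trigram index keeps similarity queries fast
CREATE INDEX podcasts_title_trgm_idx
    ON podcasts USING gin (title gin_trgm_ops);

-- Fuzzy search: '%' is pg_trgm's similarity operator, so a misspelled
-- query like 'postgers' can still match titles containing 'Postgres'
SELECT id, title, similarity(title, 'postgers') AS score
FROM podcasts
WHERE title % 'postgers'
ORDER BY score DESC
LIMIT 10;
```

From there, Postgres's built-in full-text search (`tsvector`/`tsquery` with a GIN index) covers the more structured case before you reach for a separate engine.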
Uh, but I don't know, that's not interesting, cause I feel like it already kind of confirms what I think. So other people need to do this. I need other people to please replicate, uh, and come up with better ideas than I have.
[01:33:20] Jeremy: But I think that's a good start, in terms of when you're comparing different solutions, whether it's databases or, I don't know what you call these... what do you call an Elasticsearch?
[01:33:32] Victor: Search engine.
[01:33:34] Jeremy: You go to open source projects or the company websites and they'll have their charts and go we're x times faster than Y.
But I, I think actually having a running instance where they're going against the same data, I think that's, that's helpful really for anyone trying to compare something to for someone having gone through the time. And I think that a lot of other things too not just search engines where you could have hey, I have my system and it's got, uh I don't know five different databases or something like that. I, I'm not sure the logistics of how you would do it,
[01:34:15] Victor: Like with Redis. Like, all the Redis-likes, just run 'em all at the same time. Someone needs to do that.
[01:34:26] Jeremy: Could be you.
[01:34:27] Victor: Ahaha, no! I do too much! Like, the Redis thing is obvious, right? Redis is easier, like, comparing these Redis-likes, and there's some great blog posts out there that kind of do it. But a running service is a really good way of showing, like, oh, we hit this cache X times a second, with this naturally random sort of traffic.
This is how it performed, this is how they performed against each other, these were the resources allotted, or whatever. But yeah, that stuff's really cool. I feel like people haven't done it or aren't doing it enough.
[01:35:01] Jeremy: Yeah. I guess the thing about putting together one of these tests as well, especially when you make it live, is that you have to spend the time and the money to maintain it, right? And I think, uh, if somebody's not paying you to do it... yeah, you gotta want it that bad to put it together.
[01:35:22] Victor: Hey, but you know what? We can go full circle: just use Kubernetes.
It's easy if you just use Kubernetes, man.
[01:35:33] Jeremy: First you gotta learn... where, where were we? First start with Postgres, then Kubernetes.
[01:35:42] Victor: Yeah. If you wanna use Kubernetes first, you start with Postgres and... It's like, what?
[01:35:49] Jeremy: So, learn these ten other things first, then you can start to build your project.
[01:35:58] Victor: Yeah, it's silly, but I know people out there have the knowledge. I just feel like they just need to do some of this stuff, right? Like, they just need to have the idea, or just go try it. Uh, and hopefully we get more of that in the future.
Just like, cause, cause at some point, like there's gonna be so much choice that you're like, how are you gonna decide? How does anyone decide these days? Right? Like, you know, more people have to dedicate their time to like, trying out different things, but also sharing it. Cause I think just inside companies, you do this, you do the bakeoffs, right?
Everyone does the bakeoffs to try and figure out, you know, within a week or whatever, whether they should use, let's say, Budibase or Appsmith, right? Like, the rest of the team has no idea what those are, right? But someone does the bakeoff. So maybe start sharing bakeoffs?
There it is. There's another app idea. I think of a lot of ideas, and this is, there's another one, right? Just make a site where people can share their bakeoff results with certain products. And then that knowledge just being available to people is massively valuable.
And it kind of helps the products that are mentioned, because they can figure out what to change, right? It kind of makes the market more efficient, right? In that vague, uh, capitalistic sense, where it's like, oh, if everyone has a chance to improve, then we get a better product at the end of the day.
But, um, yeah, I dunno. Hopefully more people yak shave. Please, more people, waste your time. Uh, not waste, uh, use your time to yak shave. It's fine.
[01:37:32] Jeremy: Well, I think it matters whether you have something at the end of it. Sometimes you can yak shave and at the end it's kind of like, well, I played with it and, oh well.
Versus you having something to show for it.
[01:37:50] Victor: Yeah, that's true. Yeah. I won't talk about all the other projects that went absolutely nowhere.
But, uh, yeah, I think you always feel selfish if you learn something and keep it to yourself. And I should rephrase this: like, I am definitely a selfish person. You know, this is not altruism, right? But at some point it feels like, man, someone should really know this other stuff, right? Like, if you've found something that's interesting, it's like, someone should know, cuz someone who's better at it will be like, oh, no, like, this part and this part.
Like, it's like everyone kind of wins, which is awesome. So, I dunno, maybe if more people have that feeling, they'll share some of their stuff. And maybe you do a thing and it doesn't help you, but then someone else comes along and they're like, oh, because I read this, I know how to do this.
And like, and then if they give that back too, it's, uh, it's pretty awesome. But anyway, that's all pie in the sky,
[01:38:57] Jeremy: I think in general, the fact that you are running a blog and, you know, you do your posts on Hacker News and so on, the fact that you're sharing what you've learned, I think is super valuable. And I think that goes for anybody who is learning a new technology or working on a problem: if you run into issues or things you get stuck on, for sure, yeah, you should share that. And the way I've heard it described: there's always someone on the internet just waiting to tell you why you're wrong.
[01:39:35] Victor: Oh yeah. Yeah.
[01:39:36] Jeremy: And provided that they're right, that can be very helpful, right?
[01:39:40] Victor: Yeah, yeah. I actually, I personally love it, because if you're a hacker in the, you know, Hacker News sense, that's excellent. That's like a free compiler, right?
It's like a free checker, right? If you just sit next to someone who is amazing at X, and you just start bouncing ideas around X, like how to do whatever it is, off of them, you get it compiled.
They're just like, no, you can't do that cuz of X, Y, and Z. And you're like, oh, okay, great, I've just saved myself, like, you know, months of thinking I could do it, and now I know I can't do it. And the internet is great cuz it gives you access to those people. And it's annoying at first, but if you realize that, like, oh, they've chosen to share some wisdom with me, you know, or they're trying to, right?
Assuming they're correct, or, like, even if they're not correct, it's pretty awesome. So I personally welcome that. Of course it doesn't feel good to be wrong, right? I don't like that. But, um, I love it when someone took the time to be like, no, your view on this is wrong because of this.
Or, like, you know, 99% of the time, you don't need that, you should have just done this, right? Cause then I learn. A lot of my posts will have updates at the top, right? So, like, when I posted the thing about the throat mic to, like, Hacker News, people were like, this sounds terrible.
I was like, I didn't think it was that bad, but, uh, I was like, you know, maybe I shouldn't use this all the time. But, you know, it was obvious that, oh, I should have never made the post without including a sample of the audio at the top, right? So I went back and added an update for that, and then people were discussing, like, oh, you should have used a bone conduction mic instead.
Like, and all this other stuff that I just didn't think about. I'm like, oh, awesome.
And then I update the post and go on with my life. So anyway, more people, please do that. And don't post it on Medium. Please don't do that. Stop that. If you write software, please put your writing about software somewhere else, unless, I don't know, you have to or something.
[01:41:52] Jeremy: You've reached your article limit.
[01:41:57] Victor: Yeah, yeah. Oh, also, shout out to the Web Archive, the best way to get almost any article, right? I don't think people in the general populace know this.
But, like, 99% of the time, if you're trying to read something, you can just go to the Web Archive.
It's common knowledge for us, um, but it's not common knowledge for everybody else. And it just feels like they're making a lot of stuff available, and legally, right? Cuz, you know, the precedent right now, I think, is in favor of scraping, right? If you make a thing available to the internet, right?
LinkedIn got ruled against a while ago, but, like, if you make a thing publicly available to the internet, without signing in or whatever, it is assumed public, right? So, yeah, whenever I read something and I'm like, ah, article limit, I hop right on. I hop right on archive.today.
But I just feel like it's sad that developers put knowledge en masse into that particular... it's not a trap, cause, I don't dislike Medium, I don't necessarily have any animosity towards Medium, but we should be the most capable of putting up something, like maintaining our own websites.
Right. If it's, like, the death of the personal website, why is it dying with developers? We should be the most capable. We have no hope of the regular world putting out websites if it's hard for us.
[01:43:32] Jeremy: I mean, I think for stuff like Medium, maybe sometimes it's the technical aspect of not wanting to set up your own site, but I think a large part of it is the social aspect. Like, with Medium you have discoverability, you have the likes system, if they even call it that. Um, I think that's the same reason why people can be happy to post on Twitter, right?
Um, but when it comes to posting on their own blog, it's like, well, I post and then nobody comes and sees it, right? Well, the thing is too, people could be seeing it, but you don't get the feedback, and you don't get the dopamine hit of, oh, I got 40 likes on Medium or Twitter or whatever.
And I think that's one of the challenges with personal sites, where I totally agree with you, I wish people would do it and do more, but I also understand you are on a little bit of an island unless you can get people to come and interact with you.
[01:44:44] Victor: There's another idea, right? Like, you know, can you build a self-hostable, but decentralized by default, Medium clone? Like a personal site that you could easily host, you know, almost like WordPress, let's say, right? Um, but with enough metrics, with the engagement stuff built in, even though it's not, like, powering a company, essentially, right?
Cause the incentives behind building in the engagement, like pumping up engagement, make sense if you're running a company, cuz, you know, you're trying to get MAUs up so you can do your next round, or make more revenue. I wonder if... I don't know. Yeah, that is a great point, cuz you don't get the positive reinforcement if you don't have the likes and the things that a company would add, right?
Like, as opposed to just, oh, I set up nginx and my site's up, or whatever. Not that anyone does that these days, but yeah, that's interesting. Could you make it really, like, just increase the engagement of doing it yourself, or, you know, having that? Huh.
[01:45:56] Jeremy: I think sites have tried. I mean, it's not quite the same thing, but dev.to, if you've seen that, like, uh, they have, um, I can't remember what it's called, I think it's like a canonical link or something. But basically you can post on their site, and then you can put the canonical link to your own website.
And so then when somebody searches on Google, the traffic goes to your site. It doesn't bring up dev.to.
And then people can comment and like it on dev.to. So I thought it was an interesting idea. I don't know how many people use it or take advantage of it, but that's one approach, anyways.
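(Editor's note: on dev.to the canonical link is set in the post's front matter; the title and URL below are placeholders.)

```yaml
---
title: "My post"
# Tells search engines the original lives on your own site
canonical_url: "https://example.com/blog/my-post"
---
```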
[01:46:44] Victor: Yeah, that's actually cool. I don't know enough about that space, I guess.
That sounds awesome. That sounds actually, you know, useful, and like a good middle ground, right, in encouraging the ecosystem but also capturing some of that value, right?
In terms of just SEO juice, I guess, if that's what you wanna call it. But that's awesome. I don't know, I've always thought of dev.to, and clearly I was, you know, at least in part wrong, as just Medium 2.0, but more developer focused. Um, but I will find great blog posts on there, um, you know, more often than not, and it's just like, okay, yeah, that's awesome. Like, it works.
Uh, and this canonical link thing sounds actually very good for, um, for everybody involved. So, awesome. Sounds like they're good.
[01:47:36] Jeremy: Yeah, if people wanna check out what you're up to, what you're working on, where should they head?
[01:47:43] Victor: Oh God. Uh, well, I have my blog at, um, vadosware.io, so V-A-D-O-S-W-A-R-E. The projects I work on, the biggest ones right now... oh, I guess three. Um, like we mentioned Podcast Saver, which is cool. Uh, if you need to download podcasts, do that.
Um, I send out ideas every week that I think are valuable, like things you could turn into a startup or a SaaS, and I kind of focus on validating them. Cuz one thing I've learned the hard way is that validating ideas is more important than having them.
Uh, cuz you can think something is good and it won't attract anybody. Or, you know, if you don't put it in front of people, it's not gonna take off. So I do that. I send that out at unvalidatedideas.com. So that's, you know, that's the domain.
I also started, um, trying to highlight FOSS projects, cuz in yak shaving, what you do is you come across a lot of awesome free and open source projects that are just like, oh, this is a whole world, and it's pretty polished and pretty good, and I just bookmark them. So I was like, I have so many bookmarks, it doesn't make sense that I hold all of them. Someone else should see this. So I send out, and this is, uh, new for me, a newsletter every day. So it's a daily newsletter for free and open source projects that do, you know, whatever, lots of various things.
And that is at Awesome FOSS. You can actually spell it multiple ways: awsmfoss.com, so like "awesome" without the vowels, um, but also just if you spell it normally like a normal person, awesomefoss.com. Um, so that's going.
And then the thing that's actually taking up all my time is Nimbus. Um, Nimbus Web Services is what I'm calling it.
Uh, it's not out yet, there's nothing to try there, but it is my attempt to host free and open source software, but give 10-30% of revenue, so not profit, right, cause those can be different things, and, you know, see the movie industry for how that can go wrong, back to the open source projects that made the software that I'm hosting.
And I think there's more interesting things to be done there, right? Like, I can be more aggressive with that, right, if it works out. Cuz it scales so well, you know, see Amazon, right? But yeah, so if you're interested in that, check out nimbusws.com.
And that's it. I've, I've plugged everything. Everything plugged.
[01:50:38] Jeremy: Yeah, that last one sounds pretty ambitious. So good luck.
[01:50:42] Victor: Thanks for taking the time.
Xe Iaso is the Archmage of Infrastructure at Tailscale and previously worked at Heroku.
This episode originally aired on Software Engineering Radio but includes some additional discussion about their blog near the end of the episode.
Transcript
[00:00:00] Jeremy: Today I'm talking to Xe Iaso. They're the Archmage of Infrastructure at Tailscale, and they also have a great blog everyone should check out.
Xe, welcome to Software Engineering Radio.
[00:00:12] Xe: Thanks. It's great to be here.
[00:00:14] Jeremy: I think the first thing we should start with is: what's a VPN? Because I think some people may have used one to remote into their workplace or something like that. But I think the scope of what it's good for and what it does is a lot broader than that. So maybe you could talk a little bit about that first.
[00:00:31] Xe: Okay. A VPN is short for virtual private network. It's basically a fake network that's overlaid on top of existing networks, and then you can use that network to do whatever you would with a normal computer network. This term has been co-opted by companies that are attempting to get into the, like, hide-my-ass style market, where, you know, you encrypt your internet information and keep it safe from hackers.
So, uh, it makes it really annoying and hard to talk about what a VPN actually is, because tailscale, uh, the company I work for, is closer to the actual intent of a VPN, and not just, you know, hide your internet traffic, which is already encrypted anyway, with another level of encryption, and make a great access point for, uh, three-letter agencies.
Jeremy: But are there use cases past that? Like, when you're developing a piece of software, why would you decide to use a VPN, outside of just, I want my, you know, my workers to be able to get access to this stuff?
[00:01:42] Xe: So something that's come up, uh, when I've been working at tailscale is that sometimes we'll make changes to something. And it'll be changes to like the user experience of something on the admin panel or something. So in a lot of other places I've worked in order to have other people test that, you know, you'd have to push it to the cloud.
It would have to spin up a review app in Heroku, or some terrifying Terraform abomination would have to put it out onto an actual cluster or something. But with tailscale, you know, if your app is running locally, you just give the name of your computer and the port number. And, you know, other people are able to just see it and poke it and experience it.
And that basically turns the, uh, feedback cycle from, you know, having to wait for the state of the world to converge, to: make a change, press F5, give the URL to a coworker, and be like, hey, is this Gucci?
they can connect to your app as if you were both connected to the same switch.
[00:02:52] Jeremy: You don't have to worry about, pushing to a cloud service or opening ports, things like that.
[00:02:57] Xe: Yep. It will act like it's in the same room, even when they're not. It'll even work if you're both at Starbucks, and the Starbucks has reasonable policies, like, holy crap, don't allow devices to connect to each other directly. So, you know, you're working on your screenplay app at your Starbucks or something, and you have a coworker there, and you're like, hey, uh, check this out, and give them the link.
And then, you know, they're also seeing the screenplay editor.
[00:03:27] Jeremy: in terms of security and things like that. I mean, I'm picturing it kind of like we were sitting in the same room and there's a switch and we both plugged in. Normally when you do something like that, you kind of have, full access to whatever else is on the switch. Uh, you know, provided that's not being blocked by a, a firewall.
is there like a layer of security on top of that, that a VPN service like tailscale would provide.
[00:03:53] Xe: Yes. Um, there are these things called access control lists, which are kind of like firewall rules, except you don't have to deal with the nightmare of writing an iptables rule that also works in Windows Firewall and whatever they use in macOS. The ACL rules are applied at the tailnet level, for every device in the tailnet.
So if you have like developer machines, you can put people into groups as things like developers and say that developer machines can talk to production, but not people in QA. They can only talk to testing and people on SRE have, you know, permissions to go everywhere and people within their own teams can connect to each other. you can make more complicated policies like that fairly easily.
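(Editor's note: the policy Xe sketches maps roughly onto a Tailscale ACL file, written in HuJSON, which allows comments. This fragment is illustrative, not a drop-in policy; the group names, emails, and tags are invented.)

```json
{
  "groups": {
    "group:dev": ["alice@example.com"],
    "group:qa":  ["bob@example.com"],
    "group:sre": ["carol@example.com"]
  },
  "acls": [
    // Developer machines can reach production
    {"action": "accept", "src": ["group:dev"], "dst": ["tag:prod:*"]},
    // QA can only reach testing
    {"action": "accept", "src": ["group:qa"], "dst": ["tag:testing:*"]},
    // SRE can go everywhere
    {"action": "accept", "src": ["group:sre"], "dst": ["*:*"]}
  ]
}
```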
[00:04:44] Jeremy: And when we think about infrastructure for, for companies, you were talking about how there could be development, infrastructure, production, infrastructure, and you kind of separate it all out. when you're working with cloud infrastructure. A lot of times, there's the, I always forget what it stands for, but there's like IAM.
There's like policies that you can set up with the cloud provider that says these users can access this, or these machines can access this. And, and I wonder from your perspective, when you would choose to use that versus use something at the, the network or the, the VPN level.
[00:05:20] Xe: The way I think about it is that things like IAM enforce permissions for more granularly scoped things, like can-create-EC2-instances or can-delete-EC2-instances, or something like that. And that's just kind of a different level of thing. Uh, tailscale ACLs are more, you know, X is allowed to connect to Y, or, with tailscale SSH, X is allowed to connect as user Y.
And that's really different than the arbitrary capability things that IAM offers.
you could think about it as an IAM system, but the main permissions that it's exposing are can X connect to Y on Zed port.
[00:06:05] Jeremy: What are some other use cases where if you weren't using a VPN, you'd have to do a lot more work or there's a lot more complexity, kind of what are some cases where it's like, okay, using a VPN here makes a lot of sense.
(The quick and simple guide to go links https://www.trot.to/go-links)
[00:06:18] Xe: There is a service internal to tailscale called go, which is a clone of Google's so-called go links, where it's basically a URL shortener that lives at http://go. And, you know, you have go/something to get to some internal admin service, or another thing to get to, like, you know, the company directory in Notion or something. And this kind of thing you could do with a normal setup, you know, you could set it up and have to do OAuth challenges everywhere, and have to make sure that everyone has the right DNS configuration so that it shows up in the right place.
And then you have to deal with HTTPS, um, because OAuth requires HTTPS for understandable and kind of important reasons. And it's just a mess. There's so many layers of stuff; the barrier to get, you know, just a darn URL shortener up turns from 20 minutes into three days of effort trying to understand how these various arcane things work together.
You need to have state for your OAuth implementation. You need to worry about what the hell a JWT is (sigh). It's just bad. And I really think that something like tailscale, where everybody has an IP address, changes that. In order to get into the network, you have to sign in with your auth provider, and your auth provider tells tailscale who you are.
So, transitively, every IP address is tied to an owner, which means that you can enforce access permission based on the IP address and the metadata about it that you grab from the tailscale daemon. It's just so much simpler. Like, you don't have to think about, oh, how do I set up OAuth this time? What the hell is an OAuth proxy?
Um, what is a Kubernetes? That sort of thing. You just think about doing the thing, and you just do it, and then everything else gets taken care of. It's kind of the ultimate network infrastructure, because it's both omnipresent and something you don't have to think about. And I think that's really the power of tailscale.
[00:08:39] Jeremy: typically when you would spin up a, a service that you want your developers or your system admins, to be able to log into, you would have to have some way of authenticating and authorizing that user. And so you were talking about bringing in OAuth and having your, your service understand that.
But I guess what you're saying is that when you have something like tailscale, that's kind of front loaded. I guess you authenticate with tailscale, you get onto the network, you get your IP, and then from that point on you can access all these different services that know, like, hey, because you're on the network, we know you're authenticated. And those services can just maybe map that IP, that's not gonna change, to, like, users in some kind of table, um, and not have to worry about figuring out, how do I authenticate this user?
[00:09:34] Xe: I would personally more suggest that you use the, uh, whois, uh, lookup route in the tailscale daemon's local API. But basically, yeah, you don't really have to worry too much about the authentication layer, because the authentication layer has already been done. You know, you've already done your two-factor with Gmail or whatever, and then you can just transitively push that property onto your other machines.
[00:10:01] Jeremy: So when you talk about this, this whois daemon, can you give an example of I'm in the network now I'm gonna make a service call to an application. what, what am I doing with this? This whois daemon?
[00:10:14] Xe: It's more of an internal API call that we expose via tailscaled's, uh, Unix socket. But basically, you give it an IP address and a port, and it tells you who the person is. It's kind of like the Unix ident protocol, in a way, except completely not. And at a high level, you know, if you have something like a proxy for Grafana, you have that proxy for Grafana make a call to the local tailscale daemon and be like, hey, who is this person?
And the tailscale daemon will spit back a JSON object, like, oh, it's this person on this device, and there you can do additional logic, like maybe you shouldn't be allowed to delete things from an iOS device, you know, crazy ideas like that. There's not really support for arbitrary capabilities in tailscaled at the time of recording, but we've had some thoughts. It would be cool.
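(Editor's note: to make the proxy idea concrete, here is a small Python sketch of the decision step. The JSON shape is a stand-in for what a whois lookup might return, not the exact LocalAPI response format.)

```python
import json

# Canned response standing in for tailscaled's whois answer --
# the field names here are illustrative, not the exact LocalAPI schema.
canned_whois = json.dumps({
    "UserProfile": {"LoginName": "alice@example.com"},
    "Node": {"Hostinfo": {"OS": "iOS"}},
})

def may_delete(whois_json: str) -> bool:
    """Allow destructive actions, except from an iOS device."""
    info = json.loads(whois_json)
    return info["Node"]["Hostinfo"]["OS"] != "iOS"

user = json.loads(canned_whois)["UserProfile"]["LoginName"]
# prints: alice@example.com may delete: False
print(user, "may delete:", may_delete(canned_whois))
```

A real proxy would query the daemon's Unix socket per request instead of using canned data, then attach the resolved identity to the upstream request.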
[00:11:17] Jeremy: would that also include things like having roles, for example, even if it's just strings, um, that you get back so that your application would know, okay. This person, is supposed to have admin access to this service based on what I got back from, this, this service.
[00:11:35] Xe: Not currently. Uh, you can probably do it via convention or something, but with what's currently implemented in the actual, like, source code and user experience, you can't do that right now. Um, it is something that I've been trying to think about different ways to solve, but it's also a problem
That's a bit big for me personally, to tackle.
[00:11:59] Jeremy: There's so many, I guess, different ways of doing it that it's kind of interesting to think of a solution that's kind of built into the network. Yeah.
[00:12:10] Xe: Yeah. and when I describe that authentication thing to some people, it makes them recoil in shock because there's kind of a Stockholm syndrome type effect with security, for a lot of things where, the easy way to do something and the secure way to do something are, you know, like completely opposite and directly conflicting with each other in almost every way.
And over time, people have come to associate security or like corporate VPNs as annoying, complicated, and difficult. And the idea of something that isn't annoying, complicated or difficult will make people reject it, like just on principle, because you know, they've been trained that, you know, VPN equals virtual pain network and it, it's hard to get that association outta people's heads because you know, a lot of VPNs are virtual pain networks.
Like, I used to work for Salesforce, and Salesforce had this corporate VPN where, no matter what you did, all of your traffic would go out to the internet from their data center. I think it was in San Francisco or something. And I was in the Seattle area, so whenever I had the VPN on, my latency to Google shot up by like eight times. And being a software person, you know, I use Google the same way that others breathe, and it was just not fun.
And I only had the VPN on for the bare minimum of when I needed it. And, oh God, it was so bad.
[00:13:50] Jeremy: like some people, when they picture a VPN, they picture exactly what you're describing, where all of my traffic is gonna get routed to some central point. It's gonna go connect to the thing for me and then send the result back. so maybe you could talk a little bit about why that's, that's maybe a wrong assumption, I guess, in the case of tailscale, or maybe in the case of just more modern VPN solutions.
[00:14:13] Xe: Yeah. So the thing that I was describing is what I've been lovingly calling the, uh, single point of failure as a service type model of VPN, where, you know, you have like the big server somewhere, it concentrates all the connections and, you know, like does things to make the computer feel like they've teleported over there, but overall it's a single point of failure.
And if that falls over, you know, like goodbye, VPN. everybody's just totally screwed. And in contrast, tailscale does a more peer-to-peer thing so that everyone is basically on equal footing. Everyone can send traffic directly to each other, and if it can't get directly to there, it'll use a network of, uh, relay servers, uh, lovingly called Derp and you don't have to worry about, your single point of failure in your cluster, because there's just no single point of failure.
Everything will directly communicate as much as possible. And if it can't, it'll still communicate anyway.
[00:15:18] Jeremy: let's say I start up my computer and I wanna connect to a server in a data center somewhere at the very beginning, am I connecting to some server hosted at tailscale? And then. There's some kind of negotiation process where after that I connect directly or do I just connect directly straight away?
[00:15:39] Xe: If you just turn on your laptop and log in, you know, it signs into tailscale and gets you on the tailnet and whatnot, then it will actually start all connections via Derp just so that it can negotiate the, uh, direct connection. And in case it can't, you know, it's already connected via Derp so it just continues the connection with Derp, and this creates a kind of seamless magic type experience where doing things over Derp is slower.
Yes, it is measurably slower, because, you know, you're not going directly, you're doing TCP inside of TCP, and, you know, that comes with a whole minefield of, uh, layers, or whatever you call it. And it does work though. It's not ideal if you wanna do things like copy large amounts of data, but if you just want to SSH into prod and see the logs for what the heck is going on and why you're getting paged at 3:00 AM, it's pretty great.
[00:16:40] Jeremy: What you, you were calling Derp is it where you have servers kind of all over the world and somehow it determines which one's, I guess, is it which one's closest to your destination or which one's closest to you. I'm kind of
[00:16:54] Xe: It's really interesting. It's one of the most weird distributed systems, uh, type things that I've ever seen. It's the kind of thing that could only come outta the mind of an ex-Googler, but basically every tailscale node has a connection to all of the Derp servers, and through a process of, you know, latency testing,
It figures out which connection is the fastest and the lowest latency, and it calls that its home Derp. But because everything is connected to every Derp, you can have two people with different home Derps getting their packets relayed to other clients from different Derps.
So, you know, if you have a laptop in Ottawa and a laptop in San Francisco, the laptop in San Francisco will probably use the, uh, Derp that's closest to it. But the laptop in Ottawa will also use the Derp that's closest to it. So you get this sort of like asynchronous thing, and it actually works out a lot better in practice, than you're probably imagining.
[00:17:52] Jeremy: And then these servers, what was the, the technical term for them? Are they like relays or what's
[00:17:58] Xe: They're relays. Uh, they only really deal with encrypted wireguard packets, and there's no way for us at tailscale to see the contents of Derp messages. It is literally just a forwarder. It, it literally just forwards things based on the key ID.
[00:18:17] Jeremy: I guess if tail scale isn't able to decrypt the traffic, is, is that because the, the keys are only on the user's devices, like it's on their laptop and on the server they're trying to reach, or
[00:18:31] Xe: Yeah. The private keys live and die with the devices they were minted on. And the public keys are given to the coordination server, and the coordination server spreads those around to every device in your tailnet. It does some limiting so that, like, if you don't have ACL access to something, you don't get the, uh, public key for it. The public key, not the private key. And yeah. Then, you know, you just go that way and it'll just figure it out. It's pretty nice.
[00:19:03] Jeremy: When we're kind of talking about situations where it can't connect directly, that's where you would use the relay. what are kind of the typical cases where that happens, where you, you aren't able to just connect directly?
[00:19:17] Xe: Hotel wifi and paranoid network security setups. Hotel wifi is the most notorious one because, you know, you have like an overpriced wifi connection. And if you, like, I don't know, you're recording a bunch of footage on your iPhone, and because in 2022 the iPhone has the USB2 connection on it,
And, you know, you wanna copy that. You wanna use the network, but you can't. So you could just let it upload through iCloud or something, or, you know, do the bare minimum you need to get the data off. With Derp it wouldn't be ideal, but it would work. And ironically enough, that entire complexity involved with, you know, doing TCP inside of TCP to copy a video file over to your laptop might actually be faster than USB2, which is something that I did the math for a while ago.
And I just started laughing.
[00:20:21] Jeremy: Yeah, that that is pretty, pretty ridiculous
[00:20:23] Xe: welcome to the future, man (laughs) .
[00:20:27] Jeremy: in terms of connecting directly, usually when you have a computer on the internet, you don't have all your ports open, you don't necessarily allow, just anybody to send you traffic over UDP and so forth. let's say I wanna send, UDP data to a, a server on my network, but, you know, maybe it has some TCP ports open. I I'm assuming once I connect into the network via the VPN, I'm able to use other protocols and ports that weren't necessarily exposed. Is that correct?
[00:21:01] Xe: Yeah, you can use UDP. you can do basically anything you would do on a normal network except multicast um, because multicast is weird.
I mean, there's thoughts on how to handle multicast, but the main problem is that, like, wireguard, which is what tailscale is built on top of, is a so-called OSI model layer three network, where it's at, like, you know, the IP address level, and multicast is a layer two or data link layer type thing.
And those are different numbers, and you can't really easily put, you know, like, broadcast packets into IP. Uh, IPv4 thinks otherwise, but, uh, in practice, no, people don't actually use the broadcast address.
[00:21:48] Jeremy: so for someone who's, they, they have a project or their company wants to get started. I mean, what does onboarding look like? What, what do they have to do to get all these devices talking to one another?
[00:22:02] Xe: Basically you install tailscale, you log in with a little GUI thing, or on a Linux server you run tailscale up, and then you all log in to, like, a G Suite account with the same domain name. So, you know, if your domain is like example.com, then everybody logs in with their example.com G Suite account.
And, there is no step three, everything is allowed and everything can just connect and you can change the permissions from there. By default, the ACLs are set to a, you know, very permissive allow everyone to talk to everyone on any port. Uh, just so that people can verify that it's working, you know, you can ping to your heart's content.
You can play Minecraft with others. You can, you know, host an HTTP server. You can SSH into your development box and write blog posts with emacs, whatever you want.
[00:22:58] Jeremy: okay, you install the, the software on your servers, your workstations, your laptops, and so on. And then at, after that there's some kind of webpage or dashboard you would go in and say, I want these people to be able to access these things and
[00:23:14] Xe: Mm-hmm
[00:23:15] Jeremy: these ports and so on.
[00:23:17] Xe: you, uh, can customize the access control rules with something that looks like JSON, but with trailing commas and comments allowed, and you can go from there to customize basically anything to your heart's content. you can set rules so that people on the DevOps team can access everything, but you know, maybe marketing doesn't need access to the production database.
So you don't have to worry about that as much.
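(For a flavor of what that JSON-with-comments policy file can look like — this is an illustrative fragment in the spirit of tailscale's ACL format for the devops-versus-marketing example above, not copied from real documentation, so treat the field names as approximate:)

```jsonc
{
  // Hypothetical groups for the example above.
  "groups": {
    "group:devops":    ["alice@example.com", "bob@example.com"],
    "group:marketing": ["carol@example.com"],
  },
  "acls": [
    // DevOps can reach everything on any port.
    { "action": "accept", "src": ["group:devops"], "dst": ["*:*"] },
    // Marketing gets the internal wiki, but not the production database.
    { "action": "accept", "src": ["group:marketing"], "dst": ["wiki:443"] },
  ],
}
```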
[00:23:45] Jeremy: there's, there's kind of different options for VPNs. CloudFlare access, zero tier, there's, there's some kind of, I think it's Nebula from slack or something like that. so I was kind of curious from your perspective, what's the, difference between those kinds of services and, and tailscale.
[00:24:04] Xe: I'm gonna lead this off by saying that I don't totally understand the differences between a lot of them, because I've only really worked with tailscale. I know things about the other options, but, uh, I have the most experience with tailscale. But from what I've been able to tell, there are things that tailscale offers that others don't, like reverse mapping of IP addresses to people. Or there's this other feature that we've been working on, where you can embed tailscale as a library inside your go application, and then write an internal admin service that isn't exposed to the internet, but is only exposed over tailscale.
And I haven't seen a way to do those things with those others, but again, I haven't done much research. Um, I understand that zero tier has some layer two capabilities, but I don't have enough time in the day to look into it.
[00:25:01] Jeremy: There's been different, I guess you would call them VPN protocols. I mean, there's people have probably worked with IP sec in some situations they may have heard of OpenVPN, wireguard. in the case of tailscale, I believe you chose to build it on top of wireguard.
So I wonder if you could talk a little bit about why, you chose wireguard and, and maybe what makes it unique.
[00:25:27] Xe: I wasn't on the team that initially wrote, like, the core of tailscale itself. But from what I understand, wireguard was chosen because of how little overhead it has. It's literally, you just encrypt the packets, you send them to the other server, the other server decrypts them, and, you know, you're done. It's also based purely on the public key, um, the key pairs involved. And from what I understand, like, at the wireguard protocol level, there's no reason why you would need an IP address at all in theory, but in practice, you kind of need an IP address because, you know, everything sucks. But also wireguard is, like, UDP only, I think, at its core implementation, which is a step up from like AnyConnect and OpenVPN where they have TCP modes.
So you can experience the, uh, glorious, trash fire of TCP in TCP. And from what I understand with wireguard, you don't need to set up a certificate authority or figure out how the heck to revoke certificates. Uh, you just have key pairs and if a node needs to be removed, you delete the key pair and you're done.
And I think that really matches up with a lot of the philosophy behind how tailscale networks work a lot better. You know, you have a list of keys and if the network changes the list of keys changes, that's, that's the end of the story.
Jeremy: So maybe one of the big selling points was just, what has the least amount of things, I guess, to deal with? Or what's the simplest? When you're using a component that you want to put into your own product, you kind of want the least amount of things that could go wrong, I guess.
[00:27:14] Xe: Yeah. It's more like simple, but not like limiting. Like, for example, a set of tinker toys is simple in that, you know, you can build things that you don't have to worry too much about the material science, but a set of tinker toys is also limiting because you know, like they're little wooden, dowels and little circles made out of wind that you stick the dowels into, you know, you can only do so much with it.
And I think that in comparison, wireguard is simple. You know, there's just key pairs. They're just encryption. And it's simple in it's like overall theory and it's implementation, but it's not limiting. Like you can do pretty much anything you want with it.
Jeremy: Inherently, whenever we build something, that's what we want, but that's a, that's an interesting way of putting it. Yeah.
[00:28:05] Xe: Yeah. It. It can be kind of annoyingly hard to figure out how to make things as simple as they need to be, but still allow for complexity to occur. So you don't have to like set up a keyboard macro to write if error not equals nil over and over.
[00:28:21] Jeremy: I guess the next thing I'd like to talk a little bit about is. We we've covered it a little bit, but at a high level, I understand that that tailscale uses wireguard, which is the open source, VPN protocol, I guess you could call it. And then there's the client software. You're saying you need to install on each of the servers and workstations.
But there's also a, a control plane. and I wonder if you could kind of talk a little bit about I guess at a high level, what are all the different components of, of tailscale?
[00:28:54] Xe: There's the agent that you install on your devices. The agent is basically the same between all the devices. It's all written in go, and it turns out that go can actually cross compile fairly well. So you have your, you know, your implementation in go that is basically the, the same code, more or less, running on Windows, MacOS, FreeBSD, Android, ChromeOS, iOS, Linux.
I think I just listed all the platforms. I'm not sure, but you have that. And then there's the sort of control plane on tailscale's side. The control plane is basically, like, control, uh, which is, uh, I think a Get Smart reference. And that is basically a key dropbox. So, you know, you authenticate through there. That's where the admin panel's hosted. And that's what tells the different tailscale nodes, uh, the keys of all the other machines on the tailnet. And also on tailscale's side there's, uh, Derp, which is a fleet of a bunch of different VPSs in various clouds, all over the world, both to try to minimize cost and to, uh, have resiliency, because if both Digital Ocean and Vultr go down globally, we probably have bigger problems.
[00:30:15] Jeremy: I believe you mentioned that the, the clients were written in go, are the control plane and the relay, the Derp portion. Are those also written in go or are they
[00:30:27] Xe: They're all written and go, yeah,
go as much as possible. Yeah.
It's kind of what happens when you have some ex go team members as the core people involved in tailscale. Like, there's a go compiler fork that has some additional patches that go upstream either can't accept, uh, won't accept, or hasn't yet accepted. For a while it was how we did things like trying to shave bytes off of the binary size to attempt to fit it into the iOS network extension limit.
Because for some reason they only allowed you to have 15 megabytes of RAM for both, like, your application and working RAM. And it turns out that 15 megabytes of RAM is way more than enough to do something like OpenVPN. But, you know, when you have a peer-to-peer VPN engine, it doesn't really work that well.
So, you know, that's a lot of interesting engineering challenge.
[00:31:28] Jeremy: That was specifically for iOS. So to run it on an iPhone.
[00:31:32] Xe: Yeah. Um, and amazingly, after the person who did all of the optimization to the linker, trying to get the binary size down as much as possible, like replacing Unicode packages with something that's more space-efficient, you know, like basically all but compressing parts of the binary to try to save space, then the iOS, I think 15, beta dropped, and we found out that they increased the network extension RAM limit to 50 megabytes. And the look of defeat on that poor person's face. I feel very bad for him.
[00:32:09] Jeremy: you got what you wanted, but you're sad about it,
[00:32:12] Xe: Yeah.
[00:32:14] Jeremy: so that's interesting too. you were using a fork of the go compiler
[00:32:19] Xe: Basically everything that is built is built using, uh, the tailscale fork, of the go compiler.
[00:32:27] Jeremy: Going forward is the sort of assumption is that's what you'll do, or is it you're, you're hoping you can get this stuff upstreamed and then eventually move off of it.
[00:32:36] Xe: I'm pretty sure that, I, I don't know if I can really make a forward looking statement like that, but, I've come to accept the fact that there's a fork of the go compiler. And as a result, it allows a lot more experimentation and a bit more of control, a bit more control over what's going on. like I'm, I'm not like the most happy with it, but I've, I understand why it exists and I'm, I've made my peace with it.
[00:33:07] Jeremy: And I suppose it, it helps somewhat that the people who are working on it actually originally worked on the, go compiler at Google. Is that right?
[00:33:16] Xe: Oh yeah. If, uh, there weren't ex go team people working on that, then I would definitely feel way less comfortable about it. But I trust that the people that are working on it, know what they're doing at least enough.
[00:33:30] Jeremy: I, I feel like, that's, that's kind of the position we put ourselves in with software in general, right? Is like, do we trust our ourselves enough to do this thing we're doing?
[00:33:39] Xe: Yeah. And trust is a bitch.
[00:33:44] Jeremy: um, I think one of the things that's interesting about tail scale is that it's a product that's kind of it's like network infrastructure, right? It's to connect you to your other devices. And that's a little different than somebody running a software as a service. And so. how do you test something that's like built to support a network and, and how is that different than just making a web app or something like that.
[00:34:11] Xe: Um, well, it's a lot more complicated for one, especially when you have to have multiple devices in the mix with multiple different operating systems. And I was working on some integration tests, doing stuff for a while, and it was really complicated. You have to spin up virtual machines, you know, you have to like make sure the virtual machines are attempting to download the version of the tailscale client you wanna test and. It's it's quite a lot in practice.
[00:34:42] Jeremy: I mean, do you have a, a lab, you know, with Android phones and iPhones and laptops and all this sort of stuff, and you have some kind of automated test suite to see like, Hey, if these machines are in Ottawa and, my servers in San Francisco, like you're mentioning before that I can get from my iPhone to this server and the data center over here, that kind of thing.
[00:35:06] Xe: What's the right way to phrase this without making things look bad. Um, it's a work in progress. It, it's, it's really a hard problem to solve, uh, especially when the company is fully remote and, uh, like, the address that's listed on the business records is literally one of the founders' condos, because, you know, the company has no office.
So that makes the logistics for a lot of this. Even more fun.
[00:35:37] Jeremy: Probably any company that's in an early stage feels the same way where it's like, everything's a work in progress and we're just gonna, we're gonna keep going and we're gonna get there. And as long as everything keeps running, we're good.
[00:35:50] Xe: Yeah. I, I don't like thinking about it in that way, because it kind of sounds like pessimistic or defeatist, but at some level it's, it, it really is a work in progress because it's, it's a hard problem and hard problems take a lot of time to solve, especially if you want a solution that you're happy with.
[00:36:10] Jeremy: And, and I think it's kind of a unique case too, where it's not like if it goes down, it's like people can't do their job. Right. So it's yeah.
[00:36:21] Xe: Actually, if tailscale's, like, control plane goes down, I don't think people would notice until they tried to, like, reboot a laptop or connect a new device to their tailnet. Because once, once all the tailscale agents have all of the information they need from the control plane, you know, they just, they just continue on independently and don't have to care.
Derp is also fairly independent of the, like the key dropbox component. And, you know, if that, if that goes down Derp doesn't care at all,
[00:37:00] Jeremy: Oh, okay. So if the control plane is down, as long as you had authenticated earlier in the day, you can still, I don't know if it's cached or something, but you can still continue to reach the relay servers, the Derp servers or your,
[00:37:15] Xe: other nodes. Yeah. I, I'm pretty sure that in most cases, the control plane could be down for several hours a day and nobody would notice unless they're trying to deal with the admin panel.
[00:37:28] Jeremy: Got it. that's a little bit of a relief, I suppose, for, for all of you running it,
[00:37:33] Xe: Yeah. Um, it's also kind of hard to sell people on the idea of here is a VPN thing. You don't need to self host it and they're like, what? Why? And yeah, it can be fun.
[00:37:49] Jeremy: though, I mean, I feel like anybody who has, self-hosted a VPN, they probably like don't really wanna do it. I don't know. Maybe I'm wrong.
[00:38:00] Xe: Well, so a lot of the idea of wanting to self host it is, uh, I think it's more of like trying to be self-sufficient and not have to rely on other companies' failures dictating your company's downtime. And, you know, like, from some level that's very understandable. And, you know, if, you know, like tailscale were to get bought out and the new owners would, you know, like, basically kill the product, they'd still have something that would work for them.
I don't know if like such a defeatist attitude is like productive. But it is certainly the opinion that I have received when I have asked people why they wanna self-host. other people, don't want to deal with identity providers or the, like, they wanna just use their, they wanna use their own identity provider.
And what was hilarious was there was one, there was one thing where they were like our old VPN server died once and we got locked out of our network. So therefore we wanna, we wanna self-host tailscale in the future so that this won't happen again.
And I'm like, buddy, let's, let's just, let's just take a moment and retrace our steps here. 'Cause I don't think you mean what you think you mean.
[00:39:17] Jeremy: yeah, yeah.
[00:39:19] Xe: In general, like I suggest people that, you know, even if they're like way deep into the tailscale, Kool-Aid they still have at least one other method of getting into their servers. Ideally, two. I, I admit that I'm, I come from an SRE style background and I am way more paranoid than most, but it, I usually like having, uh, a backup just in case.
[00:39:44] Jeremy: So I, I suppose, on, on that note, let's, let's talk a little bit about your role at tailscale. the title of the archmage of infrastructure is one of the, the coolest titles I've, uh, I've seen. So maybe you can go a little bit into what that entails at, at tailscale.
[00:40:02] Xe: I started that title as a joke that kind of stuck, uh, my intent, my initial intent was that every time someone asked, I'd say, I'd have a different, you know, like mystic sounding title, but, uh, archmage of infrastructure kind of stuck. And since then, I've actually been pivoting more into developer relations stuff rather than pure software engineering.
And, from the feedback that I've gotten at the various conferences I've spoken at, they like that title. Even though it doesn't really fit with developer relations work at all, it's like it fits because it doesn't, you know, in that kind of corny kind of way.
[00:40:40] Jeremy: I guess this would go more into the, the infrastructure side, but. What does the, the scale of your infrastructure look like? I mean, I, I think that you touched a little bit on the fact that you have relay servers all over the place and you've got this control plane, but I wonder if you could give people a little bit of perspective of what kind of undertaking this is.
[00:41:04] Xe: I am pretty sure at this point we have more developer laptops and the like than we do production servers. Um, I'm pretty sure that the scale of the production servers is in the tens, at most. Um, it turns out that computers are pretty darn efficient and, uh, you don't really need, like, a lot of computers to do something amazing.
[00:41:27] Jeremy: the part that I guess surprises me a little bit is, is the relay servers, I suppose, because, I would imagine there's a lot of traffic that goes through those. are you finding that just most of the time they just aren't needed and usually you can make a direct connection and that's why you don't need too many of these.
[00:41:45] Xe: From what I understand. I don't know if we actually have a way to tell, like what percentage of data is going over the relays versus not. And I think that was an intentional decision, um, that may have been revisited I'm operating based off of like six to 12 month old information right now. But in general, like the only state that the relay servers has is in Ram.
And whenever the relay, whenever you disconnect the server, the state is dropped.
[00:42:18] Jeremy: Okay.
[00:42:19] Xe: and even then that state is like, you know, this key is listening. It is, uh, connected, uh, in case you wanna send packets over here, I guess.
It's a bit less bandwidth than you're probably thinking. It's not, like, enough to max it out 24/7, but it is, you know, measurable, and there are some, you know, costs associated with it. This is also why it's on Digital Ocean and Vultr and not AWS. But in general, it's a lot less than you'd think. I'm pretty sure that, like, if I had to give a baseless assumption, I'd say that probably about, like, 85% of traffic goes directly.
And the remaining is, like, the few cases in the hole punching engine that we haven't figured out yet. Like Palo Alto firewalls. Oh God. Those things are a nightmare.
[00:43:13] Jeremy: I see. So it's most of the traffic actually ends up. Being straight peer to peer. Doesn't have to go through your infrastructure. And, and therefore it's like, you don't need too many machines, uh, to, to make this whole thing work.
[00:43:28] Xe: Yeah. it turns out that computers are pretty darn fast and that copying data is something that computers are really good at doing. Um, so if you have, you know, some pretty darn fast computers, basically just sitting there and copying data back and forth all day, like it, you can do a lot with shockingly little.
Um, when I first started, I believe that the Derp VMs were using, like, sometimes as little as one core and 512 megabytes of RAM as, like, a primary Derp. And, you know, we only noticed when there were some weird connection issues for people that were only on Derp, because there were enough users that the machine had run out of memory.
So we just, you know, upped the, uh, virtual machine size and called it a day. But it's, it's truly remarkable how far you can get with very little.
[00:44:23] Jeremy: And you mentioned the relay servers, the, the Derp servers were on services like digital ocean and Vultr. I'm assuming because of the, the bandwidth cost, for the control plane, is, is that on AWS or some other big cloud provider?
[00:44:39] Xe: it's on AWS. I believe it's in EU central 1.
[00:44:44] Jeremy: You're helping people connect from device to device and in a situation like that. what does monitoring look like in, in incidents? Like what are you looking for to determine like, Hey, something's not working.
[00:44:59] Xe: there's monitoring with, you know, Prometheus, Grafana, all of that stuff. there are some external probing things. there's also some continuous functional testing for trying to connect to tailscale and like log in as an account. And if that fails like twice in a row, then, you know, something's very wrong and, you know, raise the alarm.
But in general, a lot of our monitoring is kind of hard at some level, because, you know, we're tailscale and tailscale can't always benefit from tailscale to help operate tailscale, because, you know, it's tailscale. Um, so we're still trying to figure out how to detangle the chicken and egg situation.
It's really annoying.
Jeremy: There's the, the term dog fooding, right? Where they're saying like, oh, we run, um, our own development on our own platform or our own software. But I could see when your product is network infrastructure, VPNs, where that could be a little, little dicey.
[00:46:06] Xe: Yeah, it is very annoying. But I, I'm pretty sure we'll figure something out. It is just a matter of when. Another thing that's come up is we've kind of wanted to use tailscale's SSH features, where you specify ACL rules to allow people to SSH to other nodes as various users.
But if that becomes your main access to production, then, you know, like, if tailscale is down and you're tailscale, how do you get in? Uh, then there's been various philosophical discussions about this. It's also slightly worse if you use what's called check mode in SSH. With tailscale SSH without check mode, you know, the server checks against the policy rules and the ACL, and if it's okay, it lets you in. And if not, it says no. But with check mode, there's also this, like, eight hour quote unquote lifetime, like having sudo mode on GitHub, where you do an auth challenge with your auth provider. And then, you know, you're given a, uh, hey, this person has done this thing type verification.
And if that's down, and that goes through the control plane, and if the control plane is down and you're tailscale, trying to debug the control plane, and in order to get into the control plane over tailscale you need to use the, uh, control plane. It, you know, that's like chicken and egg problem level 78,
which is a mythical level of chicken egg problem that, uh, has only been foretold in the legends of yore or something.
[00:47:52] Jeremy: at that point, it sounds like somebody just needs to, to drive to the data center and plug into the switch.
[00:47:59] Xe: I mean, It's not, it's not going to, it probably wouldn't be like, you know, we need to get a person with an angle grinder off of Craigslist type bad. Like it was with the Facebook BGP outage, but it it's definitely a chicken and egg problem in its own right.
it makes you do a lot of lateral thinking too, which is also kind of interesting.
[00:48:20] Jeremy: When, when you say lateral thinking, I'm just kind of curious, um, if you have an example of what you mean.
[00:48:27] Xe: I don't know of any example that isn't NDAed. Um, but basically, you know, tailscale is getting to the point where tailscale is relying on tailscale to make tailscale function, and, you know, yeah. This is a classic ouroboros style problem.
I've heard a, uh, wise friend of mine say that that is an ideal problem to have, which sounds weird at face value. But if you're getting to that point, that means you're successful enough to be having that problem, which is in itself a good thing, paradoxically.
[00:49:07] Jeremy: better to have that problem than to have nobody care about the product. Right.
[00:49:12] Xe: Yeah.
[00:49:13] Jeremy: kind of on that note, um, you mentioned you worked at Salesforce, uh, I believe that was working on Heroku. I wonder if you could talk a little about your experience working at, you know, Tailscale, which is more of an early startup, versus an established company like Salesforce.
[00:49:36] Xe: So at the time I was working at Heroku, it definitely didn't feel like I was working at Salesforce for the majority of it. It felt like I was working at Heroku. Like, on my resume I listed it as Heroku. When I talked about it to people, I said I worked at Heroku, and Salesforce was this, you know, mythical Ohana thing that I didn't have to deal with unless I absolutely had to.
By the end of my time at Heroku, uh, the Salesforce sort of started to creep in, and, you know, we moved from tracking issues in GitHub issues, like we were used to, to using their, oh, what's the polite way to say this, their creation, which was like the moral equivalent of Jira implemented on top of Salesforce.
You had to be behind the VPN for it, and, you know, every ticket had 20 fields, and, uh, there were no templates. In comparison, at Tailscale, you know, we just use GitHub issues, maybe some things in Notion for longer-term tracking or kanban stuff, but it's nice to not have, you know, all of the pomp and ceremony of filling out 20 fields in a ticket for two sentences of "this thing is obviously wrong and it's causing X to happen.
Please fix.
[00:51:08] Jeremy: I like that phrase, "the creation." That's a very diplomatic term.
[00:51:14] Xe: I mean, I can think of other ways to describe it, but I'm pretty sure those ways wouldn't be allowed on the podcast. So
[00:51:25] Jeremy: Um, but yeah, I know what you mean, for sure, where it feels like there's this movement from "hey, let's just do what we need, let's fill in the information that's actually relevant and not do anything else" to "we need to fill in these ten fields because that's the thing we do."
Yeah.
[00:51:48] Xe: Yeah. And in the time I've been working for Tailscale, I'm like employee ID 12, and, uh, Tailscale has gone from a company where I literally knew everyone to, just recently, the point where I don't know everyone anymore. And it's a really weird feeling. I've never been in a small-stage startup that's gotten to this size before, and I've described some of my feelings to other people who have been there, and they're like, yeah, welcome to the club. So I figure a lot of it is normal. From what I understand, though, there's a lot of intentionality to try to prevent Tailscale from becoming, you know, Google-style organizational complexity, unless that is absolutely necessary to do something.
[00:52:36] Jeremy: it's a function of size, right? Like as you have more people, more teams, then more process comes in. that's a really tricky balance to, to grow and still keep that feeling of, I'm just doing the thing, I'm doing the work rather than all this other process stuff.
[00:52:57] Xe: Yeah, but I've also kind of managed to pigeonhole myself off into a corner with devrel stuff, and that's been nice. I've been working a bunch with, uh, marketing people, and, uh, helping out with support occasionally, and doing a, like, godawful amount of writing.
[00:53:17] Jeremy: the writing, for our audience's benefit, I think they should really check out your blog, because I think the way you write your articles is very thoughtful in terms of the balance of the actual example code or example scripts and the descriptions, and there's a little bit of a narrative sometimes too.
So,
[00:53:40] Xe: Um, I'm actually more of a prose writer just by like how I naturally write things. And a lot of the style of how I write things is, I will take elements from, uh, the Socratic style of dialogue where, you know, you have the student and the teacher. And, you know, sometimes the student will ask questions that the teacher will answer.
And I found that that's a particularly useful way to help model understanding or, you know, like put side concepts off into their own little blurbs or other things like that. I also started doing those conversation things with, uh, furry art, specifically to dunk on a homophobe that was getting very angry at furry art being in, uh, another person's blog.
And it's occasionally fun to go into the, uh, orange website of bad takes and see the comments when people complain about it. Oh gosh, the bad takes are hilariously good sometimes.
[00:54:45] Jeremy: it's good that you have, like, a positive mindset around that. I know some people can read, uh, that sort of stuff and, you know, just get really bummed out.
[00:54:54] Xe: One of the ways I see it is that a lot of the time algorithms are based on like sheer numbers. So if you like get something that makes people argue in the comments, that number will go up and because there's more comments on it, it makes more people more likely to, to read the article and click on it.
So, sometimes, what's the polite way to say this, I have been known to sprinkle in things that will intentionally, uh, get people and make them want to argue about it in the comments, purely to make the engagement numbers rise, which makes more people likely to read the article.
And, you know, it's kind of a dirty practice, but it makes more people read the article, and more people benefit. So it's kind of morally neutral, I guess.
[00:55:52] Jeremy: usually that, that seems like, a sketchy thing. But I feel like if it's in service to, uh, like a technical blog post, I mean, why not? Right.
[00:56:04] Xe: And a lot of the times I'll usually have the like, uh, kind of bad take, be in a little conversation blurb thing so that people will additionally argue about the characterization of, you know, the imaginary cartoon shark or whatever.
[00:56:20] Jeremy: That's good. It's the, uh, it's the Xe Xe universe that they're, they're stepping into.
[00:56:27] Xe: I've heard people describe it, uh, lovingly as the xeiaso.net cinematic universe.
I've had some ideas on how to expand it in the future with more characters that have more diverse backgrounds. But, uh, it turns out that writing this stuff is hard. Like, actually very hard, because you have to get this right.
You have to get the right balance of, like, snark, satire, uh, enlightenment. And
it's surprisingly harder than you'd think. Um, but after a while, I've just sort of managed to figure out as I'm writing where the side tangents come off, and which ones I should keep, which ones I should, uh, prune, and which ones can also help gain deeper understanding with a little Socratic dialogue: start with an incomplete assumption, like an incomplete picture.
And then, you know, a question of, wait, what about this thing? Doesn't that conflict with that? And, like, well, yes, technically it does, but realistically we don't have to worry about that as much, so we can think about it just in terms of this bigger model, and, uh, that's okay. Like, uh, I mentioned the OSI model earlier. You know, the seven-layer OSI model is genuinely overkill for basically everything, except it's a really great conceptual model for figuring out the difference between, you know, an ethernet cable, the ethernet card, the IP stack, TCP, and, you know, TLS or whatever.
I have a couple talks that are going to be up by the time this is published. Uh, one of them is my RustConf talk, or, what was it called? I think it was called "The Surreal Horrors of PAM" or something, where I discussed my experience trying to debug a PAM module in Rust, uh, for work. And, uh, it's the kind of story where, you know, it's bad when you have a breakpoint on dlopen.
[00:58:31] Jeremy: That sounds like a nightmare.
[00:58:32] Xe: Oh yeah. Like, part of attempting to fix that involved going very deep. We're talking like an HTML frameset in the Internet Archive for SunOS documentation that was written around the time that PAM was used. Like, things were bad enough that everything in the frameset was there, but the contents had eroded away through bit rot, and, you know, you're very lucky just to have what you do.
[00:59:02] Jeremy: well, I'm glad it was you and not me. We'll get to hear about it and not have to go through the suffering ourselves.
[00:59:11] Xe: yeah. One of the things I've been telling people is that I'm not like a brilliant programmer. Like I know a bunch of people who are definitely way smarter than me, but what I am is determined and, uh, determination is a bit stronger of a force than you'd think.
[00:59:27] Jeremy: Yeah. I mean, without it, nothing gets done. Right.
[00:59:30] Xe: Yeah.
[00:59:31] Jeremy: as we wrap up, is there anything we missed or anything else you wanna mention?
[00:59:36] Xe: if you wanna look at my blog, it's on xeiaso.net. That's X-E-I-A-S-O dot net. Um, that's where I post things. You can see the 280-something articles at time of recording. It's probably gonna get to 300 at some point. Oh God, it's gonna get to 300 at some point. Um, and yeah, I try to post articles about weekly, uh, depending on facts and circumstances. I have a bunch of talks coming up, like one about the hilarious overengineering I did in my blog.
And maybe some more, if I get back positive responses from calls-for-papers submissions.
[01:00:21] Jeremy: Very cool. Well, Xe thank you so much for, for coming on software engineering radio.
[01:00:27] Xe: Yeah. Thank you for having me. I hope you have a good day, and, uh, try out Tailscale. Uh, note my bias, but I think it's great.
Jonathan Shariat is the coauthor of the book Tragic Design and co-host of the Design Review Podcast.
He's currently a Sr. Interaction Designer & Accessibility Program Lead at Google.
This episode originally aired on Software Engineering Radio.
You can help edit this transcript on GitHub.
[00:00:00] Jeremy: Today I'm talking to Jonathan Shariat. He's the co-author of Tragic Design and the host of the Design Review podcast, and he's currently a senior interaction designer and accessibility program lead at Google. Jonathan, welcome to Software Engineering Radio.
[00:00:15] Jonathan: Hi, Jeremy, thank you so much for having me on.
[00:00:18] Jeremy: the title of your book is Tragic Design, and I think people can take a lot of different meanings from that. So I wonder if you could start by explaining what tragic design means to you.
[00:00:33] Jonathan: Hmm. For me, it really started with this story that we have in the beginning of the book. It's also online; uh, I originally wrote it as a Medium article. And that's really what opened my eyes to, hey, you know, design is this kind of invisible world all around us that we actually depend on very critically in some cases.
And so this story was about a girl, you know, a nameless girl, but we named her Jenny for the story. And in short, she came for treatment of cancer at the hospital, uh, was given the medication, and the nurses that were taking care of her were so distracted with the software they were using to chart, make orders, things like that, that they missed the fact that she needed hydration and that she wasn't getting it.
And then because of that, she passed away. And I still remember that feeling of just kind of outrage. And, you know, when we hear a lot of news stories, a lot of them are outraging. They touch us, but some of those feelings stay, and they stick with you.
And for me, that stuck with me, I just couldn't let it go because I think a lot of your listeners will relate to this. Like we get into technology because we really care about the potential of technology. What could it do? What are all the awesome things that could do, but we come at a problem and we think of all the ways it could be solved with technology and here it was doing the exact opposite.
It was causing problems. It was causing harm, and the design of that, or, you know, the way that was built or whatever, was failing Jenny. It was failing the nurses too, right? Like, a lot of times we blame the end user. So to me, that story was so tragic: something that deeply saddened me, was regrettable, and cut short someone's, uh, you know, life. And that's the definition of tragic. There's a lot of other examples with varying degrees of tragic, but, um, you know, as we look at the impact technology has, and then the impact we have in creating those technologies that have such large impacts, we have a responsibility to really look into that and make sure we're doing as good of a job as we can and avoid those as much as possible.
Because the biggest thing I learned in researching all these stories was, Hey, these aren't bad people. These aren't, you know, people who are clueless and making these, you know, terrible mistakes. They're me, they're you, they're they're people. Um, just like you and I, that could make the same mistakes.
[00:03:14] Jeremy: I think it's pretty clear to our audience where there was a loss of life, someone, someone died and that's, that's clearly tragic. Right? So I think a lot of things in the healthcare field, if there's a real negative outcome, whether it's death or severe harm, we can clearly see that as tragic.
and I know in your book you talk about a lot of other types of, I guess, negative things that software can cause. So I wonder if you could explain a little bit about, beyond death and severe injury, what's tragic to you.
[00:03:58] Jonathan: Yeah. Still in that line of injury and death, the side that most of us will actually impact in our day-to-day work is also physical harm. Like, creating the software in a car, I think that's a fairly common one, but also ergonomics, right?
Like, when we bring it back to something less impactful but multiplied over the reach of a product, it can be quite big, right? Like, if we're designing software in a way that's very repetitive, or, you know, everyone's got that scroll-thumb issue,
right, if, uh, our phones aren't designed well. So there's a lot of ways that it can still physically impact you ergonomically, and that can cause you a lot of problems, arthritis and pain. But yeah, there are other ways that are still really impactful. So the other one is by saddening or angering us.
You know, that emotional harm is very real, and oftentimes it gets overlooked a little bit, because, um, you know, physical harm is so real to us, but sometimes emotional harm isn't. But, you know, we talk about in the book the example of Facebook putting together this great feature which takes your most-liked photo and, you know, celebrates your whole year, saying, hey, look, here's your Year in Review, the top photo from the year. They added some great, you know, well-done illustrations behind it, of balloons and confetti and people dancing.
But some people had a bad year. Some people's most-liked, most-engaged photo is there because something bad happened, and Facebook totally missed that. And because of that, people had a really bad time with this, where, you know, they lost their child that year, they lost their loved one that year, their house burnt down. Um, something really bad happened to them.
And here was Facebook putting that photo of their dead child up with, you know, balloons and confetti and people dancing around it. And that was really hard for people. They didn't want to be reminded of that, especially not in that way. And these emotional harms also come into play with anger.
You know, we talk about how, one, there's a lot of software out there that tries to bring up news stories that anger us, because that equals engagement. Um, but also ones that use dark patterns to trick us into purchasing and buying and forgetting about that free trial, so they charge us for a yearly subscription and won't refund us.
Uh, if you've ever tried to cancel a subscription, you start to see their real colors. Um, so emotional harm and, uh, anger is a big one. We also talk about injustice in the book, where there are products that are supposed to be providing justice, um, in very real ways, like voting, or, you know, getting people the help that they need from the government, or letting people see their loved ones in jail.
Um, or, you know, you're getting a ticket unfairly because you were trying to read the sign and you couldn't understand it. So yeah, we look at a lot of different ways that the design and the software that we create can have very real impact on people's lives, in a negative way, if we're not careful.
[00:07:25] Jeremy: the impression I get, when you talk about tragic design, it's really about anything that could harm a person, whether physically, emotionally, you know, make them angry, make them sad. And I think the, the most liked photo example is a great one, because like you said, I think the people may be building something that, that harms and they may have no idea that they're doing it.
[00:07:53] Jonathan: Exactly. And I love that story, not to just jump on the bandwagon of saying bad things about, like, Facebook or something. No, I love that story because I can see myself designing the exact same thing: being a part of that product, you know, building it, looking at the, uh, specifications the PM put together and the decks that we had. You know, like, I could totally see that happening.
And just never having the thought, I think, because we're so focused on delighting our users, and, you know, we have these metrics and these things in mind. So that's why, in the book, we really talk about a few different processes that need to be part of the product development cycle to stop, pause, and think about, well, what are the negative aspects here?
Like, what are the things that could go wrong? What are the other life experiences that are negative, um, that could be a part of this? And you don't need to be a genius to think of every single thing out there. You know, in this example, if they had taken probably one hour out of their entire project, or maybe even ten minutes, they might've come up with, oh, there could be a bad thing here.
Right. But, um, if you don't have that moment to pause, that moment to just say, okay, we have time to brainstorm together about how this could go wrong, or how the negative side of life could be impacted by this, um, feature, that's all that it takes. It doesn't necessarily mean that you need to do,
you know, a giant study around the potential impact of this product and all the ways it could go wrong. Really just having a part of your process that takes a moment to think about that will create a better product and better product outcomes. You know, if you think about all of life's experiences, and Facebook can say, hey, condolences, and, like, you know, show that thoughtfulness, that would have higher engagement, that would have higher, uh, satisfaction, right?
So they could have created a better outcome by considering these things, and obviously avoided the negative impact to users and the negative impact to their product.
[00:10:12] Jeremy: continuing on with that thought, you're a senior interaction designer and you're an accessibility program lead. And so I wonder, on the projects that you work on, and maybe you can give us a specific example, how are you ensuring that you're not running up against these problems, where you build something that you think is going to be really great, um, for your users, but in reality ends up being harmful?
[00:10:41] Jonathan: Yeah, one of the best ways is, I mean, it should be part of multiple parts of your cycle. If you want a specific outcome out of your product development life cycle, um, it needs to be there from the very beginning, and then a few more times. I think, uh, programmers will all latch onto this, where they have the worst end of the stick, right,
and QA as well. Because, you know, any bad decision or assumption that's happened early on with, you know, the business team or the PM gets multiplied when they talk to the designer, and then gets multiplied again when they hand it off. And it's always the engineer who has to put the final foot down and be like, this doesn't make sense,
or, I think users are going to react this way, or, you know, this is the implication of that assumption. So, um, it's the same thing, you know. On our team, we have it in the very early stage, when someone's putting together the idea for the feature or project we want to work on; it's right there. There's a section about accessibility and a few other sections, uh, talking about looking out for this negative impact.
So right away, we can have a discussion about it when we're talking about what we should do about this and the different implications of implementing it. That's the perfect place for it. You know, maybe when you're brainstorming, uh, about what we should do, maybe it's not okay there, because you're trying to be creative.
Right, you're trying to think. But at the very next step, when you're saying, okay, what would it mean to build this, that's exactly where it should start showing up in, you know, the discussion from the team. And it depends also on the risk involved, right? The risk, uh, is attached to how much time and effort and resources you should put towards avoiding it. It's risk management.
So, you know, if you work, like, um, some of my colleagues, uh, or, you know, some of my friends, in the automotive industry, and you're creating software and you're worried that it might be distracting, there might need to be a lot more time and effort, or in the healthcare industry, um, those might need to take a lot more resources. But if you're maybe building, um, you know, SaaS software for engineers to spin up their, um, you know, resources,
there might be a different amount of resources. It's never zero, uh, because you're still dealing with people and you'll impact them. And, you know, maybe that service goes down, and it was a healthcare service that went down because of yours, you know. So you really have to think about what the risk is.
And then you can map that back to how much time and effort you need to spend on getting that right. And accessibility is one of those things too, where a lot of people think that it takes a lot of effort, a lot of resources, to be accessible, and it really doesn't. It's, um, it's just like tech debt, you know. If you have ignored your tech debt for, you know, five years, and then someone says, hey, let's fix all the tech debt, yeah, nobody's going to be on board for that. Versus addressing it as you go, finding the right level of tech debt that you're okay with, and when you address it and how; that's just better practice. It's the same thing with accessibility: if you're just building it correctly as you go, it's very low effort, and it just creates a better product, better decisions.
Um, and it is totally worth the increased number of people who can use it and the improved quality for all users. So, um, yeah, it's kind of a win-win situation.
[00:14:26] Jeremy: one of the things you mentioned was that this should all start at the very beginning, or at least right after you've decided on what kind of product you're going to build, and that's going to make it much easier than if you come in later and try to make fixes then. I wonder, when you're all getting together and you're trying to come up with these scenarios, trying to figure out negative impacts and what kind of accessibility needs you have, who are the people involved in that conversation?
Like, um, you know, say you have a team of 50 people; who needs to be in the room from the very beginning to start working this out?
[00:15:05] Jonathan: I think it would be the same people who are there for the project planning. Like, um, on my team, we have our eng counterparts there, at least the team lead if there's a lot of them, but, you know, whoever would go to the project kickoff, uh, should be there.
You know, we have everybody there: PM, design, engineers, um, our project manager. Anyone who wants to contribute, uh, should really be there, because the more minds you have on this the better, and you'll tease out much more of all the potential problems, because you have a more, um, diverse set of brains and life experiences to draw from.
And so you'll get closer to that 80% mark, uh, where you can quickly take a lot of those big items off the table, right?
[00:16:00] Jeremy: Is there any kind of formal process you follow or is it more just, people are thinking of ideas, putting them out there and just having a conversation.
[00:16:11] Jonathan: Yeah, again, it depends which industry you're in and what the risk is. So I previously worked in the healthcare industry, um, and we had to make sure we got it right in terms of how it's going to impact the patients, especially since it was cancer care, and they were using our product to get early warnings of adverse effects.
Our system for figuring out whether something was going to be an issue was more formalized. Um, in some cases, uh, like actually in healthcare, and especially if it's a device, or in certain software circumstances it's determined by the FDA to be a certain category, you literally have a, uh, governmental version of this.
The only reason that's there is because it can prevent a lot of harm, right? So, um, that one is enforced, but there are reasons, uh, outside of the FDA to have that exact formalized part of your process, and the size of it should scale depending on what the risk is. So on my team, the risk is actually somewhat low.
It's really just part of the planning process. We do have moments when we're, uh, brainstorming what we should do and how the feature will actually work, where we talk about what those risks are and call out the accessibility issues, and then we address those. And then, as we get ready to ship, we have another, um, formalized part of the process:
there, we'll check that the accessibility has been taken care of and, you know, that everything makes sense as far as impact to users. So we have those places. But in healthcare, it was much stronger, where we had to, um, make sure that we've tested it, that it's robust, that we think it's going to work.
Um, we, you know, we do user testing, and it has to pass that user testing, things like that, before we're able to ship it, uh, to the end user.
[00:18:12] Jeremy: So in healthcare, you said the FDA actually provides, is it like a checklist of things to follow, where you must have done this as you're testing, and you must have verified these things? That's actually given to you by the government?
[00:18:26] Jonathan: That's right. Yeah, it's like a checklist and a testing requirement. Um, and there are also levels there. I've only done the lowest level; I know there are, I think, two more levels above that. Um, and again, that's because the risk is higher and higher, and there are stricter requirements there, where maybe somebody at the FDA needs to review it at some point.
And, um, so again, mapping it back to the risk that your company has is really important. Understanding that is going to help you avoid the bad impact and build a better product. And I think that's one of the things I would like to focus on as well,
and I'd like to highlight for your listeners: it's not just about avoiding tragic design, because one thing I've discovered since writing the book and sharing it with a lot of people is that the exact opposite thing usually, you know, in the vast majority of cases, ends up being a strategically great thing to pursue for the product and the company.
You know, if you think about that example with Facebook: okay, you've run into a problem that you want to avoid, but if you actually do a 180 there, and you find ways to engage with people when they're grieving, you develop features that help people who are grieving, you've created value for your users that you can help build the company off of.
Right. Um, because they were already building a bunch of joy features, right? Um, you know, and also user privacy: we see Apple doing that really well, where they say, okay, you know, we are going to do our ML on device, we are going to, you know, let users decide on every permission, and things like that.
And that, um, is a strategy. We also see that with something like T-Mobile. When they initially started out, they were like one of the nobody, uh, telecoms in the world, and they said, okay, what are all the unethical, bad things that, uh, our competitors are doing? They're charging extra fees, you know, um, they have these weird data caps that are really confusing and don't make any sense, their contracts lock you in for many years.
They just did the exact opposite of that, and that became their business strategy, and it worked for them. Now they're like the top, uh, company. So, um, I think there are a lot of things like that, where you just look at the exact opposite, and, one, you get to avoid the bad, tragic design, but you also see, boom, an opportunity that can become a business strategy.
[00:21:03] Jeremy: So when you say "the exact opposite," I guess you're looking for the potentially negative outcomes that could happen. There was the Facebook example of seeing a photo, or being reminded of a really sad event, and figuring out: can I build a product around still having that same picture, but recontextualizing it, like showing you that picture in a way that's not going to make you sad or upset, but is actually a positive?
[00:21:35] Jonathan: Yeah. I mean, I don't know maybe what the solution was, but like one example that comes to mind is some companies. Now, before mother's day, we'll send you an email and say, Hey, this is coming up. Do you want us to send you emails about mother's day? Because for some people that's Can, be very painful. That's that's very thoughtful.
Right. And that's a great way to show that you care. But yeah, thinking about that Facebook example, if there were a formalized way to engage with grieving, I would use Facebook for that. I don't use Facebook very often, or almost at all, but if somebody passed away, I would engage with my Facebook account.
And I would say, okay, look, there's this whole formalized feature around grieving, and Facebook understands grieving, Facebook understands this event and can smooth that process, create comfort for the community. That's value, and engagement that is worthwhile, versus artificial engagement.
Engagement that's just for the sake of engagement. And that would create a better feeling towards Facebook. I would maybe then spend more time on Facebook. So it's in their mutual interest to do it the right way. And so it's great to focus on these things to avoid harm, but also to start to see new opportunities for innovation.
And we see this a lot already in accessibility, where so many innovations have come from just fixing accessibility issues, like closed captions. We all use them: on our TVs, in busy crowded spaces, on videos that have no translation for us in different places.
SEO is the same thing. You get a lot of SEO benefit from describing your images and making everything semantic and things like that, and that also helps screen readers. And different innovations have come about because somebody wanted to solve an accessibility need.
And then the one I love, I think it's the most common one, is readability: contrast and text size. Sure, there are some people who won't be able to read it at all, but it hurts my eyes to read bad contrast and bad text size. So fixing it just benefits everyone and creates a better design.

One of the things that comes up so often is, you know, I'm the accessibility program lead, and so I see a lot of our bugs: so many issues that are caught because of our audits and our test cases around accessibility are just bad design and a bad experience for everyone. And so we're able to fix that, and it's just another driver of innovation. There are a ton of accessibility examples, and I think there's also a ton of these other ethical examples, of avoiding harm, where you can just see it. It's an opportunity area where it's like, oh, let's avoid that. But then if you turn around, you can see that there's a big opportunity to create a business strategy out of it.
[00:24:37] Jeremy: Can you think of any specific examples where you've seen that? Where somebody doesn't treat it as something to avoid, but actually sees it as an opportunity?
[00:24:47] Jonathan: Yeah. I think the Apple example is a really good one, where from the beginning they saw, okay, in the market there's a lot of abuse of information, and people don't like that. So they created a business strategy around that, and that's become a big differentiator for them.
Right. They have ML on the device. They have a lot of these permission settings. You know, Facebook was very much focused on using customer data, and a lot of it without really asking permission. And so once Apple said, okay, now all apps need to show what you're tracking and ask for permission to do that, a lot of people said no, and that caused about $10 billion of loss for Facebook. And for Apple, they advertise on that now: that we're ethical, that we source things ethically and we care about user privacy. And that's a strong position, right?
I think there are a lot of other examples out there, like I mentioned with accessibility and others, but they're kind of overflowing, so it's hard to pick one.
[00:25:58] Jeremy: Yeah. And I think what's interesting about that too is, with the example of focusing on user privacy, or trying to be more sensitive around death or things like that, I think other people in the industry will notice that, and then in their own products they may start to incorporate those things as well.
[00:26:18] Jonathan: Yeah, exactly, like the example with T-Mobile. Once that worked really, really well and they just ate up the entire market, all the other companies followed suit, right? Like now those data caps are very rare, and those surprise fees are a lot rarer.

You know, there are no more contracts that lock you in, et cetera, et cetera. A lot of those things have become industry standard now. And it does improve the environment for everyone, because now it becomes a competitive advantage that everybody needs to meet. So yeah, I think that's really, really important.
So when you're going through your product's life cycle, you might not have the ability to make these big strategic decisions, like, we want to not have data caps or whatever. But if you're at that Facebook level and you run into that issue, you could say, well, look, what could we do to address this?
What could we do to help with this and make it a robust feature? You know, when we talk about a lot of these dating apps, one of the problems was a lot of abuse, where women were being harassed, or, you know, after the date didn't go well, things were happening. And so a lot of these dating apps have now differentiated themselves and attracted a lot of that market because they deal with that really well.
It's built into their strategy. It's oftentimes a really good place to start too, because one, it's not something we generally think about very well, which means your competitors haven't thought about it very well, which means it's a great place to build product ideas off of.
[00:27:57] Jeremy: Yeah, that's a good point, because I think so many applications now are social media applications, messaging applications, video chat, that sort of thing. I think when those applications were first built, they didn't really think so much about what if someone is sending hateful messages, or sending pictures that people really don't want to see.
People are doing abusive things. It was like they just assumed that, oh, people will be good to each other and it'll be fine. But, you know, in the last 10 years, pretty much all of the major social media companies have had to figure out, okay, what do I do if someone is being abusive, and what's the process for that?

And basically they all have to do something now.
[00:28:47] Jonathan: Yeah. And that's a hard thing, because if that unethical or bad design decision is deep within your business strategy and your company's strategy, it's hard to undo. Some companies still have to do that very suddenly and deal with it. Like, I know Uber had a big moment like that, and some other companies did too, where almost suddenly everything comes to a head and they need to deal with it.
Or, you know, like Twitter now, trying to be acquired by Elon Musk, some of those things are coming to light. But what I find really interesting is that these areas are really ripe for innovation. So if you're interested in a startup idea, or you're working in a startup, or you're about to start one, or you're one of the many people out there thinking about side projects right now: a great way to differentiate and win a market against well-established competitors is to say, okay, what are they doing right now that is unethical and core to their business strategy? Doing that differently is really what will help you win that market. And we see that happening all the time, especially with these established leaders in the market. They can't pivot like you can. So being able to say, we're going to do this ethically, we're going to do this with tragic design in mind and do the opposite, that's going to help you find your traction in the market.
[00:30:25] Jeremy: Earlier, we were talking about how in the medical field there is specific regulation, or at least requirements, to try to avoid this kind of tragic design. I noticed you also worked for Intuit before, in financial services, so I was wondering if there was anything similar, where the government is stepping in and saying, you need to make sure these things happen to avoid these harmful outcomes.
[00:30:54] Jonathan: Yeah, I don't know. I mean, I didn't work on TurboTax; I worked on QuickBooks, which is accounting software for small businesses. And I was surprised: we didn't have a lot of those robust things. We just relied on user feedback to tell us that things were not going well. And I think we should have. I think that was a missed opportunity to show your users that you understand them and you care, and to find those opportunity areas.

So we didn't have enough of that. And there were things we shipped that didn't work correctly right out of the box, which happens, but it had a negative impact on users. So it's like, okay, well, what do we do about that?
How do we fix that? And the more you formalize that and make it part of your process, the more you get out of it. And actually, this is a good pausing point, a bit that I think will affect a lot of engineers listening to this. If you remember, in the book we talk about the Ford Pinto story, and I want to talk about this story and why I added it to the book.
It's that, one, I think this is the thing that engineers deal with the most, and designers do too, which is that, okay, we see the problem, but we don't think it's worth fixing. So that's what we're going to dig into here. Hold on for a second while I explain some history about this car.
The Ford Pinto, if you're not familiar, is notorious because it was designed and built and shipped, and they knowingly had this problem: if it was rear-ended at even a pretty low speed, it would burst into flames, because the gas tank would rupture, and then oftentimes the doors would get jammed.
And so it became a death trap of fire and caused many deaths and a lot of injuries. And, according to an interview with the CEO at the time, it almost destroyed Ford; it very seriously would have brought the whole company down. During the design of it, design meaning in the engineering sense, they say they found this problem, and the engineers came up with their best solution.
It was this rubber block. And the cost was, I forget how many dollars, let's say it was like $9, or $6, but this is, again, back then. And also the margin on these cars was very, very thin, and it was very important to have the lowest price in the market; the customers were very price sensitive. So they, they being the legal team, looked at some recent cases where courts had put a value on a life, and started to work out, here's how many people would sue us and here's how much it would cost to settle all those suits.
And then, here's how much it would cost to add this to all the cars. And they found it was cheaper for them to just go with the lawsuits. And I think why this is so important is because of the two things that happened afterward. One, they were wrong: it affected a lot more people, and the lawsuits were for a lot more money.
And two, after all this blew up and was about to destroy the company, they went back to the drawing board. And what did the engineers find? They found a cheaper solution. They were able to rework that rubber block, get it under the margin, and hit the mark that they wanted to.
And there's a lot of focus on the first part, because it's so unethical to put a value on life, to do that calculation and say, we're willing to have people die. In some industries it's really hard to get away with that, but it's also very easy to get into that.
It's very easy to get lulled into this sense of, oh, we're just going to crunch the numbers and see how many users it affects, and we're okay with that. Versus when you have principles, and you have kind of a hard line, and you care a lot more than you think you need to, and you really push yourself to create a more ethical, safer design that avoids tragic design, then there's a solution out there.
You actually get to innovation, you actually get to solving the problem, versus when you just rely on, oh, the cost-benefit analysis we did says it's going to take an engineer a month to fix this, and blah blah blah. But if you have those values, if you have those principles and you say, you know what, we're not okay shipping this, then you'll find it.
You'll find, okay, there's a cheaper way to fix this, there's another way we could address this. And that happens so often. And I know a lot of engineers deal with that, with hearing, oh, this is not worth our time to fix. And that's why you need those principles: because oftentimes you don't see it, but it's right there, just outside the edge of your vision.
[00:36:12] Jeremy: Yeah. I mean, with the Pinto example, I'm just picturing, obviously there wasn't JIRA back then, but you can imagine somebody filing an issue that, hey, when somebody hits the back of the car, it's going to catch on fire. And going, well, how do I prioritize that? Right? Like, is this a medium ticket?

Is this a high ticket? And it just seems insane, right? That you could make the decision like, oh no, this isn't that big an issue, we can move it down to low priority and ship it.
[00:36:45] Jonathan: Yeah. And that's really what principles do for you, right? They help you make the tough decisions. You don't need a principle for an easy one. And that's why I really encourage people in the book to come together as a team and come up with, what are your guiding principles? That way it's not a discussion point every single time.
It's like, hey, we've agreed that this is something we're going to care about; this is something we are going to stop and fix. One of the things I really like about my team at Google is that product excellence is very important to us. And there are certain things that we're okay with letting slip and fixing in a next iteration.
And, you know, obviously we make sure we actually do that. So it's not like we always address everything, but because it's one of our principles, we care more. We take on more of those tickets and make sure that they're fixed before we ship.
And it shows the end user that this company cares and has quality. So you need a principle to guide you through those difficult things that aren't obvious on a decision-to-decision basis, but that strategically get you somewhere important. Like design debt, or technical debt, where it's like, this chunk of code should be optimized; you know, nah, but grouped together with a hundred of those decisions, yeah, it's going to slow down every single project from here on out. So that's why you need those principles.
[00:38:24] Jeremy: So in the book, there are a few examples of software in healthcare. And when you think about principles, you would think generally everybody on the team would be on board that we want to give whatever patient is involved good care. We want them to be healthy; we don't want them to be harmed.
And given that, I'm wondering, because you interviewed multiple people in the book and you have a few different case studies, why do you think medical software in particular seems to have such poor UX, or has so many issues?
[00:39:08] Jonathan: Yeah, that's a complicated topic. I would summarize it with maybe three different reasons. One, which I think is maybe a driving factor behind some of the others, is that in the way the medical industry works, the person who purchases the software is not the end user. It's not like you have doctors and nurses voting on which software to use.
So oftentimes it's more of a sales deal, and then it just gets pushed out, and they also have to commit to these things. The software is very expensive, and initially, in the early days, it very much had to be installed and maintained, and there had to be training.
So there was a lot of money to be made in that software. And the investment from the hospital was a lot, so they can't just say, oh, actually, we don't like this one, we're going to switch to the next one. So once it's sold, it's really easy to just keep that customer.
There's very little incentive to really improve it unless you're selling them a new feature. So there are a lot of feature add-ons, because they can charge more for those, but improving the experience and all that kind of stuff, there's less of that. I think also there's generally a lot less understanding of design in that field.
And there are sort of traditions of how things are done, so they end up putting a lot of the pressure and the responsibility on the end individuals. You've heard recently of that nurse who made a medication error, and she's going to jail for that. And oftentimes we blame that end person.
So the nurse gets all the blame, or the doctor gets all the blame. Well, what about the software that made that confusing? What about the medication that looks exactly like this other medication? Or what about the pump tool where you have to type everything in very specifically, when the nurses are very busy?
They're doing a lot of work. They have 12-hour shifts, they're dealing with lots of different patients and a lot of changing things, and on top of that they have to worry about typing something in a specific way. And yet when those problems happen, what do they do? They don't go in and redesign the devices. It's more training, more training, more training, and people can only absorb so much training.
And so I think that's part of the problem: there's no desire to change, and they blame the wrong person, the end user. Lastly, I think it is starting to change. Because the government is pushing healthcare records to be more interoperable, meaning I can take my health records anywhere, a lot of the power comes in where the data is.
And so I'm hoping that as the government and people and initiatives push these big companies, like Epic, to be more open, things will improve: one, because they'll have to keep up with their competitors, and because more competitors will be out there to improve things. Because I think the know-how is out there, but with no incentive to change, no turnover in systems, and the blaming of the end user, we're not going to see a change anytime soon.
[00:42:35] Jeremy: that's a, that's a good point in terms of like, it, it seems like even though you have all these people who may have good ideas may want to do a startup, uh, if you've got all these hospitals that already locked into this very expensive system, then yeah. Where's, where's the room to kind of get in there in and have that change.
[00:42:54] Jonathan: yeah.
[00:42:56] Jeremy: Uh, another thing that you talk about in the book is about how, when you're in a crisis situation, the way that a user interacts with something is, is very different. And I wonder if you have any specific examples for software when, when that can happen.
[00:43:15] Jonathan: Yeah. Designing for crisis is a very important part of every piece of software, because it might be hard for you to imagine being in that situation, but it definitely will still happen. One example that comes to mind is, let's say you're working on cloud software, like AWS or Google Cloud.
There are definitely use cases and user journeys in your product where somebody would be very panicked. If you've ever been on-call with something and it goes south, and it's a big deal, you don't think straight. When we're in crisis, our brains go into a totally different mode, that fight-or-flight mode.
And we don't think the way we normally do. It's really hard to read and comprehend, and we might not make the right decisions, and things like that. So, thinking about that, going back to the cloud software example, let's say you're working on that.
Are you relying on the user reading a bunch of text about this button? Or is it very clear, from the way you've crafted the exact button copy, how big it is, and where it is in relation to other content, what exactly it does? Is it going to shut down the instance immediately, or do it at a delay, or whatever? All those little decisions are really impactful.
And when you run them through the furnace of a user journey that involves a really urgent situation, you'll obviously improve that. And you'll start to see problems in your UI that you hadn't noticed before, or different problems in the way you're implementing things that you didn't notice before, because you're seeing it from a different angle.
And that's one of the great things about the systems we talk about in the book, around thinking about how things could go wrong, or designing for crisis: it makes you think of new use cases, which makes you think of new ways to improve your product.
You know, that improvement you make so that something is obvious enough to do in a crisis will help everyone, even when they're not in a crisis. So that's why it's important to focus on those things.
[00:45:30] Jeremy: And for someone who is working on these products, it's kind of hard to trigger that feeling of crisis if there isn't actually a crisis happening. So I wonder if you can talk a little bit about how you try to design for that when it's not really happening to you, and you're just trying to imagine what it would feel like.
[00:45:53] Jonathan: Yeah. You're never really going to be able to do that fully, so some of it has to be simulated. One of the ways we're able to do that is to simulate what we call cognitive load, which is one of the things that happens during a crisis, but which also happens when someone's very distracted; they might be using your product while they're multitasking.
Maybe they have a bunch of kids, a toddler constantly pulling on their arm, and they're trying to get something done in your app. So one of the ways that has been shown to help test for that is the foot tapping method. When you're doing user research, you have the user do something else, like tapping, so it's like they have a second task that they're doing on the side.
It's manageable, like tapping their feet or their hands or something, and then they also have to do your task. So you can build up what those taps and extra things are that they have to do while they're also working on finishing the task you've given them. And that's one way to simulate cognitive load.
Some of the other things are really just listening to users' stories and finding out: okay, this user was in crisis. Okay, great, let's talk to them and interview them about that, if it happened fairly recently, within the past six months or something like that. But sometimes you can't, and you just have to run through it and do your best.
And, you know, with those black swan events, even if you were able to simulate it yourself, to put yourself into that exact position and be in a panic, which you're not really able to, that still would only be your experience, and you wouldn't know all the different ways that people could experience this.
So there's going to be some point where you'll need to extrapolate a little bit, from what you know to be true, but also from user testing and things like that. And then wait for real data.
[00:47:48] Jeremy: You have a chapter in the book on design that angers, and there were a lot of examples in there of things that are just annoying, or, you know, make you upset while you're using software. I wonder, for our audience, if you could share a few of your favorites, or the ones that really stand out.
[00:48:08] Jonathan: My favorite one is Clippy, because I remember growing up, writing documents, and Clippy popping up. I was reading an article about it, and obviously, just like everybody else, I hated it. As a little character it was fun, but when you're actually trying to get some work done, it was very annoying.
And then I remember, a while later, reading this article about how much work the teams put into Clippy. I mean, if you think about it now, it had a lot of what we're playing with just now in AI, around natural language processing: understanding what type of thing you're writing and coming up with contextualized responses. It was very advanced for the time, adding animation triggers to that and all of that.
And they had done a lot of user research. I was like, what? You did research and it still got that reaction? And I love that example because, oh, and also, by the way, I love how they took Clippy out and highlighted that as one of the features of the next version of the Office software.
But I love that example, again, because I see myself in it. You have a team doing something technologically amazing, doing user research, and putting out a very polished product, but totally missing the mark. And a lot of products do that, a lot of teams do that. And why is that? It's because they're putting the business needs, or the team's needs, first, and the user's needs second.
And whenever we do that, whenever we put ourselves first, we become a jerk, right? If you're in a relationship and you're always putting yourself first, that relationship is not going to last long, or it's not going to go very well. And yet we do that in our relationship with users, where we're constantly just like, hey, well, what does the business want?
The business wants users to not cancel here, so let's make it very difficult for people to cancel. And that's a great way to lose customers. That's a great way to create dissonance with your users. Whereas if you're focused on, this is what we need to accomplish with the users, and then you work backwards from that, you lower your chances of missing it, of getting it wrong, of angering your users.

And you sometimes have to be very real with yourselves and your team. I think that's really hard for a lot of teams, because we don't want to look bad.
We don't want to. But what I've found is that those are the people who actually get promoted. If you look at the managers and directors, those are the people who can be brutally honest, right? Who can say, I don't think this is ready; I don't think this is good. And I've done that in front of our CEO and people like that.
And I've always had really good responses from them. They really appreciate that you can call it out, and call it like it is: hey, this is what we're seeing; maybe we shouldn't do this at all. And at Google, that's one of the criteria we have for our software engineers and designers: being able to spot things that we should stop doing.
And so I think that's really important for the development of a senior engineer: being able to recognize, hey, I would want this project to work, but in its current form it's not good, and being able to call that out is very important.
[00:51:55] Jeremy: Do you have any specific examples where there was something that was very obvious to you, but wasn't obvious to the rest of the team or to a lot of other people?
[00:52:06] Jonathan: Yeah, so here's an example. I was early on in my career, and I finally got to lead a whole project. We were redesigning our business micro-site, and I got assigned two engineers and another designer, and I got to lead the whole thing. I was like, this is my chance.
Right? So, we had a very short timeline as well, and I put together all these designs. And one of the things we aligned on at the time was this really cool idea: I put together a design for the contact form where, essentially, kind of like an ad-lib, it looks like a letter.
And, by the way, give me a little bit of leeway here, because this was like 10 years ago. But it was like a letter, and you would address it to our company. And it had all the things we wanted to get out of you, around your company size, your team, and so on, so our sales team could then reach out to this customer.
I designed it and showed it to the team, and everybody loved it. My manager signed off on it; all the engineers signed off on it. Even though we had a short timeline, they were like, we don't care, that's so cool, we're going to build it. But as I put it through that test of, does this make sense for what the user wants, the answer just kept coming back no.
So I had to go and back in and pitch everybody and argue with them around not doing the cool idea that I wanted to do. And, um, eventually they came around and that form performed once we launched it performed really well. And I think about like, what if users had to go through this really wonky thing?
Like this is the whole point of the website is to get this contact form. It should be as easy and as straightforward as possible. So I'm really glad we did that. And I can think of many, many more of those situations where, you know, um, we had to be brutally honest with ourselves with like this isn't where it needs to be, or this isn't what we should be doing.
And we can avoid a lot of harm that way too, where it's like, you know, I don't, I don't think this is what we should be building. Right.
[00:54:17] Jeremy: So in the case of this form, was it more like you had a bunch of drop-downs or, you know, selections, where you would say, okay, these are the types of information that I want to get from the person filling out the form as a company, but you weren't looking so much at, as the person filling out the form, this is going to be really annoying?
Was that kind of—
[00:54:38] Jonathan: Exactly, exactly. So their experience would have been: they come to the end of this page, or to "contact us," and it's like a letter to our company. We're essentially putting words in their mouth, because they're filling out the letter. And you have to read and then understand what that part of the page was asking you, versus a form, where it's very easy, well-known, bam: you're on this page, so you're interested, so get them in there.
So we were able to decide against that. And we also had to say no to a few other things, but we said yes to some things that were great, like responsive design, making sure that our website worked in every single use case. That was not a hard requirement at the time, but it was really important to us, and it ended up helping us a lot, because we had a lot of business people who were on their phone, on the go, who wanted to check in and fill out the form and do a bunch of other stuff and learn about us.
So that, that, that sales, uh, micro-site did really well because I think we made the right decisions and all those kinds of areas. And like those, those general, those principles helped us say no to the right things, even though it was a really cool thing, it probably would have looked really great in my portfolio for a while, but it just wasn't the right thing to do for the, the, the goal that we had.
[00:56:00] Jeremy: So did it end up being more like just a text box? You know, a contact us fill in. Yeah.
[00:56:06] Jonathan: You know, with usability, if someone's familiar with something, it's tired, everybody does it, but that means everybody knows how to use it. So usability constantly has that problem of innovation being less usable. And sometimes the trade-off is worth it, because you want to attract people with the innovation, and they'll get over that hump with you because the innovation is interesting.
So sometimes it's worth it and sometimes it's not, and I'd say most times it's not. So you have to figure out when it's time to innovate and when it's time to do what's tried and true. And on a business microsite, I think it's time to do tried and true.
[00:56:51] Jeremy: So in your research for the book, and in all the jobs you've worked previously, are there certain mistakes or UX things that you've noticed that you think our audience should know about?
[00:57:08] Jonathan: I think dark patterns are one of the most common, you know, tragic design mistakes that we see, because again, you're putting the company first and the user second. And if you go to darkpatterns.org, you can see a great list. There are a few other sites that have nice lists of them too, and Vox Media actually did a nice video about dark patterns as well.
So it's gaining a lot of traction. But, you know, things like trying to cancel your Comcast service or your Amazon service are very hard. I think I wrote this in the book, but I literally researched the fastest way to remove your Comcast account.
I prepared everything. I did it through chat, because that was the fastest way. And, not to mention, finding chat in the first place was very, very hard for me. So even though I went in knowing, okay, I'm going to do it through chat, I'm going to do all this, it still took me a while to even find the chat.
And once I finally found it, from that point to having them finally delete my account was about an hour. And I knew what to do going in, to say all the things to have them not bother me. That's on purpose. They've purposely made it hard, because it's easier to just say, fine, I'll take the discount thing
you're throwing in my face at the last second. And it's almost become a joke now that, you know, you have to cancel your Comcast every year so you can keep the costs down. And Amazon too: trying to find "delete my account" is so buried. They do that on purpose. And a lot of companies will do things like make it very easy to sign up for a free trial, and hide the fact that they're going to charge you for a year,
hide the fact that they're automatically going to bill you, not remind you when it's about to expire, so that they can surprise you or get you to forget about the billing subscription. Or, you know, if you've ever gotten Adobe software, they are really bad at that. They trick you into getting what looks like a monthly subscription, but actually you've committed to a year.
And if you want to cancel early, they'll charge you like 80% of the year, and it's really hard to contact anybody about it. So it happens quite often. The more you read into those different patterns, the more you'll start to see them everywhere. And users are really catching on to a lot of those things and are responding to them in a very negative way.
And, um, we recently looked at a case study where this company had a free trial, with the standard free trial kind of design, and then their test variant was really just focused on, hey, we're not going to scam you. If I had to summarize the entire direction of the second one, it was: cancel any time.
Here's exactly how much you'll be charged, and it'll be on this date, and five days before that we'll remind you to cancel, and all this stuff. That ended up performing about 30% better than the other one. And the reason is that people have now been burned by that trick so much that every time they see a free trial, they're like, forget it,
I don't want to deal with all this trickery, I didn't even care that much about trying the product. Versus: we're not going to trick you, we really want you to actually try the product, and we'll make sure that if you don't want to move forward with this, you have plenty of time and plenty of chances to leave. And people respond to that now.
So that's what we talked about earlier in the show, doing the exact opposite. This is another example of that.
[01:00:51] Jeremy: Yeah, because I think a lot of people are familiar with, like you said, trying to cancel Comcast, or trying to cancel their New York Times subscription. And everybody just gets so mad at the process, but I think they also maybe assume that it's a positive for the company. What you're saying is that maybe that's actually not in the company's best interest.
[01:01:15] Jonathan: Yeah. Oftentimes what we find with these dark patterns, or these unethical decisions, is that they look successful because, when you look at the most immediate metric, it looks like it worked, right? Like, for those free trials: okay, we implemented all this trickery and our subscriptions went up.
But if you look at the end result, which is farther on in the process, it's always a lot harder to track that impact. But we all know it to be true when we talk to each other about these different examples: we all hate that, and we all hate those companies, and we don't want to engage with them, and sometimes we don't use the products at all.
So it's one of those things where it actually has that very real impact, but it's harder to track. And oftentimes that's how these patterns become so pervasive: "oh, page views went up, this was really high engagement," but it was page views because people were refreshing the page trying to figure out where the heck to go, right? So oftentimes they're less effective, but they're easier to track.
[01:02:32] Jeremy: So I think that's, that's a good place to, to wrap things up, but, um, if people want to check out the book or learn more about what you're working on your podcast, where should they head?
[01:02:44] Jonathan: Um, yeah, just check out tragicdesign.com, and you can find our podcast in any podcasting software; just search for the Design Review podcast.
[01:02:55] Jeremy: Jonathan, thank you so much for joining me on software engineering radio.
[01:02:59] Jonathan: alright, thanks Jeremy. Thanks everyone. And, um, hope you had a good time. I did.
This episode originally aired on Software Engineering Radio.
Randy Shoup is the VP of Engineering and Chief Architect at eBay. He was previously the VP of Engineering at WeWork and Stitch Fix, a Director of Engineering at Google Cloud where he worked on App Engine, and a Chief Engineer and Distinguished Architect at eBay in 2004.
Related Links: @randyshoup
The Epic Story of Dropbox’s Exodus from the Amazon Cloud Empire
Transcript:
[00:00:00] Jeremy: Today, I'm talking to Randy Shoup. He's the VP of engineering and chief architect at eBay.
[00:00:05] Jeremy: He was previously the VP of engineering at WeWork and Stitch Fix, and he was also a chief engineer and distinguished architect at eBay back in 2004. Randy, welcome back to Software Engineering Radio. This will be your fifth appearance on the show. I'm pretty sure that's a record.
[00:00:22] Randy: Thanks, Jeremy, I'm really excited to come back. I always enjoy listening to, and then also contributing to software engineering radio.
Back at QCon 2007, you spoke with Markus Völter, the founder of SE Radio, and you were talking about developing eBay's new search engine at the time.
[00:00:42] Jeremy: And kind of looking back, I wonder if you could talk a little bit about how eBay was structured back then, maybe organizationally, and then we can talk a little bit about the, the tech stack and that sort of thing.
[00:00:53] Randy: Oh, sure. Okay. Yeah. Um, so eBay started in 1995, just to orient everybody. Same as the web, same as Amazon, same as a bunch of stuff. So eBay was actually almost 10 years old when I joined, which seemed very old at the time. So what was eBay's tech stack like then? eBay has gone through five generations of its infrastructure, and it was transitioning between the second and the third when I joined in 2004.
So the first iteration was Pierre Omidyar, the founder, over the three-day Labor Day weekend in 1995, playing around with this new cool thing called the web. He wasn't intending to build a business. He was just playing around with auctions and wanted to put up a webpage.
So he had a Perl backend, and every item was a file, and it lived on this little 486 tower or whatever he had at the time. So that wasn't scalable, and it wasn't meant to be. The second generation of eBay's architecture was what we called V2, very creatively. That was a C++ monolith: an ISAPI DLL which, at its worst, grew to 3.4 million lines of code in that single DLL, and basically in a single class. Not just in a single repo or a single file, but in a single class.
So that was very unpleasant to work in, as you can imagine. eBay had about a thousand engineers at the time, and they were really stepping on each other's toes and not able to make much forward progress. So starting in, I want to call it 2002, two years before I joined, they were migrating to the creatively named V3. The V3 architecture was Java, and,
you know, not microservices, we didn't even have that term; it was mini applications. So I'm actually going to take a step back. V2 was a monolith, so all of eBay's code was in that single DLL: buying and selling and search and everything. And then we had two monster databases, a primary and a backup, big Oracle machines on hardware that was bigger than refrigerators, and that ran eBay for a bunch of years. Before we changed the upper part of the stack, we chopped up that single monolithic database into a bunch of domain-specific databases, or entity-specific databases, right?
So a set of databases around users, sharded by user ID (I could talk about all that if you want), items, again sharded by item ID, transactions, sharded by transaction ID... I think when I joined there were several hundred instances of Oracle databases spread around, but still that monolithic front end.
And then in 2002, I want to say, we started migrating into that V3 I was describing. So that was a rewrite in Java, again, mini applications. You take the front end, and instead of having it be in one big unit, it was these EAR files, if people remember back to those days in Java, 220 different ones of those.
So, like, here is one of them for the search pages: the search application would do all the search-related stuff, the handful of pages around search. Ditto for the buying area, ditto for the checkout area, ditto for the selling area...
220 of those. And that was, again, vertically sliced domains. And then the relationship between those V3 applications and the databases was many-to-many. Many of those applications interact with items, so they would interact with the item databases. Many of them interact with users,
so they would interact with the user databases, et cetera. Happy to go into as much gory detail as you want about all that. But that's where we were: in the transition period between the V2 monolith and the V3 mini applications, in 2004. I'm just going to pause there, and let me know where you want to take it.
[00:05:01] Jeremy: Yeah. So you were saying that it started as Perl, then it became C++, and it's kind of interesting that you said it was all in one class, right? So, wow, that's gotta be a gigantic...
[00:05:16] Randy: I mean, completely brutal. Yeah. 3.4 million lines of code. Yeah. We were hitting compiler limits on the number of methods per class.
[00:05:22] Jeremy: Oh my gosh.
[00:05:23] Randy: I'm, uh, scared that I know that. I happen to know that, at least at the time, Microsoft allowed you 16K methods per class, and we were hitting that limit.
So, uh, not great.
[00:05:36] Jeremy: So it's just kind of interesting to think about how do you walk through that code, right? You have, I guess you just have this giant file.
[00:05:45] Randy: Yeah. I mean, there were, you know, different methods, but yeah, it was a monolith; it was a spaghetti mess. And, you know, as you can imagine, Amazon went through a really similar thing, by the way. So this wasn't unique. I mean, it was bad, but we weren't the only people making that mistake.
And just like Amazon, where they did like one update a quarter (laughs) in that period, around 2000, we were doing something really similar: very, very slow updates. When we moved to V3, the idea was to be able to make changes much faster. And we were very proud of ourselves, starting in 2004, that we upgraded the whole site every two weeks.
And we didn't have to do the whole site: each of those individual applications that I was mentioning, those 220 applications, would roll out on this biweekly cadence. And they had interdependencies, so we rolled them out in dependency order. Anyway, lots and lots of complexity associated with that.
Um, yeah, there you go.
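Rolling out interdependent applications in dependency order, as Randy describes, is essentially a topological sort of the dependency graph. A minimal sketch, with made-up application names and dependencies (eBay's actual tooling is not public):

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Hypothetical dependency graph: each app maps to the apps it depends on.
deps = {
    "search": {"common-lib"},
    "selling": {"common-lib"},
    "checkout": {"common-lib", "selling"},
    "common-lib": set(),
}

def rollout_order(dependencies):
    # Returns an order in which every app deploys after everything it depends on.
    return list(TopologicalSorter(dependencies).static_order())
```

With the sketch above, `common-lib` always deploys before the three apps that depend on it, and `selling` before `checkout`.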
[00:06:51] Jeremy: The V3 that was written in Java, I'm assuming this was a complete rewrite. You didn't use the C++ code at all?
[00:07:00] Randy: Yeah. We migrated page by page. So in the transition period, which lasted probably five years, in the beginning all pages were served by V2, and in the end all pages were served by V3. And over time you iterate: you rewrite, and maintain in parallel, the V3 version of the XYZ page and the V2 version of the XYZ page.
And then, when you're ready, you start to test it out at low percentages of traffic: what does V3 look like? Is it correct? And when it isn't, you go and fix it. But then ultimately you migrate the traffic over to be fully in the V3 world, and then you remove, or comment out, or whatever,
the code that supported that page in the V2 monolith.
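The percentage-based migration Randy describes, where a small and growing fraction of traffic for each page is routed to the rewrite, can be sketched roughly like this. The page names, percentages, and renderer functions are all hypothetical:

```python
import random

# Stand-ins for the legacy and rewritten page renderers.
def render_v2(page, request):
    return f"V2:{page}"   # legacy C++ monolith path

def render_v3(page, request):
    return f"V3:{page}"   # rewritten Java mini-application path

# Per-page rollout state: what fraction of requests goes to the rewrite.
V3_TRAFFIC_PERCENT = {
    "search": 5,      # 5% of search requests try V3
    "checkout": 0,    # checkout still served entirely by V2
}

def serve_page(page, request):
    percent = V3_TRAFFIC_PERCENT.get(page, 0)
    if random.randrange(100) < percent:
        return render_v3(page, request)
    return render_v2(page, request)
```

Ramping a page up is just raising its percentage; if the V3 version misbehaves, dropping it back to 0 reverts all traffic to the monolith.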
[00:07:54] Jeremy: And then you had mentioned using Oracle databases. Did you have a set for V2 and a separate V3 and you were kind of trying to keep them in sync?
[00:08:02] Randy: Oh, great question. Thank you for asking that question. No. Um, so again, as I mentioned, we had pre-de-monolithed (that's my technical term) the databases, starting in, let's call it, 2000. Actually, I'm almost certain it's 2000, because we had a major site outage in 1999, which everybody who was there at the time still remembers.
It wasn't me; I wasn't there at the time. But you can look it up. Anyway, starting in 2000, we broke up that monolithic database into what I was telling you before, those entity-aligned databases. Again, one set for items, one set for users, one set for transactions, dot dot dot. And those databases were shared.
They were shared between V2 using those things and V3 using those things. And so we completely decoupled the rewrite of the data storage layer from the rewrite of the application layer, if that makes sense.
[00:09:09] Jeremy: Yeah. So, so you had V2 that was connecting to these individual Oracle databases. You said like they were for different types of entities, like maybe for items and users and things like that. but it was a shared database situation where V2 was connected to the same database as V3. Is that right?
[00:09:28] Randy: Correct. And also within V3, different V3 applications were also connecting to the same databases. Anybody who used the user entity, which is a lot of them, was connecting to the user suite of databases, and anybody who used the item entity, which again is a lot, was connecting to the item databases, et cetera.
So yeah, it was this many-to-many relationship between applications in the V3 world and databases.
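The entity-aligned sharding described above, where each entity has its own suite of databases and a record's ID picks the shard, can be sketched roughly like this. Shard counts and database names are invented for illustration; eBay's real routing layer was of course far richer:

```python
# Hypothetical shard counts per entity suite.
USER_SHARDS = 4
ITEM_SHARDS = 8

def user_db(user_id: int) -> str:
    # Every application that touches users routes through the same rule,
    # which is exactly the many-to-many coupling described above: many
    # applications, one shared routing convention per entity.
    return f"users_db_{user_id % USER_SHARDS}"

def item_db(item_id: int) -> str:
    return f"items_db_{item_id % ITEM_SHARDS}"
```

Both a search application and a checkout application would call these routing functions directly, rather than going through a service that owns the data, which is the discomfort Randy acknowledges next.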
[00:10:00] Jeremy: Okay. Yeah, I think I, I got it because
[00:10:03] Randy: It's easier with a diagram.
[00:10:04] Jeremy: yeah. W 'cause when you, when you think about services now, um, you think of services having dependencies on other services. Whereas in this case you would have multiple services that rather than talking to a different service, they would all just talk to the same database.
They all needed users. So they all needed to connect to the user's database.
[00:10:24] Randy: Right, exactly. And so, I don't want to jump ahead in this conversation, but everybody who's feeling uncomfortable at the moment: you're right to feel uncomfortable, because that was an unpleasant situation. Microservices, or more generally the idea that individual services should own their own data,
and that the only interactions with a service should be through the service interface, not behind the service's back to the data storage layer, that's better. And Amazon discovered that; lots of people discovered that around that same early-2000s period.
And so yeah, we had that situation at eBay at the time. It was better than what came before, right? Better than a monolithic database and a monolithic application layer, but it definitely also had issues, as you can imagine.
[00:11:14] Jeremy: you know, thinking about back to that time where you were saying it's better than a monolith, um, what were sort of the trade-offs of, you know, you have a monolith connecting to all these databases versus you having all these applications, connecting to all these databases, like what were the things that you gained and what did you lose if that made sense?
[00:11:36] Randy: Hmm. Yeah. Well, why we did it in the first place was isolation between development teams, right? We were looking for developer productivity, or, the phrase we used to use was feature velocity: how quickly would we be able to move? And to the extent that we could move independently, the search team could move independently from the buying team, which could move independently from the selling team, et cetera.
That was what we were gaining. What were we losing? When you're in a monolith situation, if there's an issue, you know where it is: it's in the monolith. You might not know where in the monolith, but there's only one place it could be. That's an issue one has when you break things up into smaller units, especially when they have this shared mutable state, essentially, in the form of these databases: who changed that column? What's the deal?
Actually, we did have a solution for that, or something that really helped us: now more than 20 years ago, we had something that we would now call distributed tracing. I actually talked about this way back in that 2007 episode, because it was pretty cool at the time. Just like the spans one would create using modern distributed tracing, you know, OpenTelemetry or any of the distributed tracing vendors,
just like you would do that, we did the same thing. We didn't use the term span, but it was that same idea, and the goal was the same: to debug stuff. So every time we were about to make a database call, we would log, hey, I'm about to make this database call, and then it would happen,
and then we would log whether or not it was successful, and we could see how long it took, et cetera. And so we built our own monitoring system, which we called Central Application Logging, or CAL, totally proprietary to eBay. I'm happy to talk about whatever gory details you want to know about that, but it was pretty cool, certainly way back in 2000.
And that was our mitigation against the thing I'm telling you about: when something is weird in the database, we can back up and figure out where it might've happened. Or things are slow, what's the deal? Because sometimes the database is slow for reasons.
And which thing is it? From an application perspective, I'm talking to 20 different databases, but things are slow; like, what is it? CAL helped us figure out both elements of that, right? Like what applications are talking to what databases and what backend services, and debug and diagnose from that perspective.
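The CAL-style instrumentation Randy describes, logging before each database call and then logging the outcome and elapsed time afterward, can be sketched roughly like this. The log format, field names, and helper are invented for illustration; eBay's actual CAL system is proprietary:

```python
import time
import uuid
from contextlib import contextmanager

@contextmanager
def db_call_span(log, database, statement):
    # Log before the call; on exit, log success/failure and the duration,
    # tied together by a span id so the records can be correlated.
    span_id = uuid.uuid4().hex[:8]
    start = time.monotonic()
    log.append(("start", span_id, database, statement))
    try:
        yield span_id
        log.append(("ok", span_id, database, time.monotonic() - start))
    except Exception:
        log.append(("error", span_id, database, time.monotonic() - start))
        raise

# Usage: wrap each database call so slow or failing databases show up in the log.
log = []
with db_call_span(log, "users_db_3", "SELECT * FROM users WHERE id = ?"):
    pass  # the real database call would go here
```

Aggregating these records answers both directions Randy mentions: which applications hit a given database, and which databases a given application depends on.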
And then, for a given application, what databases and backend services are you talking to? And debug that. And then we had monitors on those things, and we would notice when databases were throwing a lot of errors, or when databases were starting to respond slower than they used to.
And then we implemented what people would now call circuit breakers, where we would notice that, oh, everybody who's trying to talk to database 1, 2, 3, 4 is seeing it slow down; I guess 1, 2, 3, 4 is unhappy. So now flip everybody to say, don't talk to 1, 2, 3, 4, that kind of stuff.
You're not going to be able to serve some requests, but whatever, that's better than stopping everything. So I hope that makes sense. All these modern resilience techniques, we had our own proprietary names for them, but we implemented a lot of them way back when.
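The circuit-breaker behavior Randy describes, noticing a database is unhappy and flipping everybody to stop talking to it for a while, can be sketched roughly like this. Thresholds, timings, and names are made up; this is a generic illustration of the pattern, not eBay's implementation:

```python
import time

class CircuitBreaker:
    """Trip after consecutive failures; reject calls until a cooldown passes."""

    def __init__(self, failure_threshold=5, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow(self):
        # Closed: let the call through. Open: reject until the cooldown
        # elapses, then let one probe through (half-open).
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.reset_after:
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

One breaker per database keeps a slow shard from dragging down every application that talks to it, which is exactly the "flip everybody away from 1, 2, 3, 4" behavior described above.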
[00:15:22] Jeremy: Yeah. And I guess just to contextualize it for the audience, I mean, this was back in 2004. Oh, back in 2000, even.
[00:15:32] Randy: Again, sorry to interrupt you, because we had the problem we were just talking about, where many applications are talking to many services and databases, and we didn't know what was going on. So we needed some visibility into what was going on.
Sorry, go ahead.
[00:15:48] Jeremy: yeah. Okay. So all the way back in 2000, there's a lot less, Services out there, like nowadays you think about so many software as a service products. if you were building the same thing today, what are some of the services that people today would just go and say like, oh, I'll just, I'll just pay for this and have this company handle it for me. You know, that wasn't available, then
[00:16:10] Randy: Sure. Well, essentially none. There was no cloud; cloud didn't happen until 2006. And there were a few software-as-a-service vendors, like Salesforce, that existed at the time, but they weren't usable in the way you're thinking of, where I could give you money and you would operate a technical software service on my behalf.
Do you know what I mean? So we didn't have any of the monitoring vendors; we didn't have any of the stuff we have today. So what would we do to solve that specific problem today? As we do today, I would instrument everything with OpenTelemetry, because that's generic. Thank you, Ben Sigelman and LightStep, for starting that whole open-sourcing process and getting all the vendors to respect it.
And then for my backend I would choose one of the very many wonderful distributed tracing vendors, of which there are so many I can't remember them all, but LightStep is one, Honeycomb... there are a bunch of backend distributed tracing vendors, in particular, for that.
What else do we have today? I mean, we could go on for hours on this one, but we didn't have distributed logging; we didn't have logging vendors, you know? So there was no Splunk, there was none of the many centralized logging vendors.
So we didn't have any of those things. We did it like cavemen, you know: we built our own data centers, we racked our own servers, we installed all the OSes on them. By the way, we still do all that, because at our scale it's way cheaper for us to do it. Happy to talk about that too.
Anyway, the point is that in 2022 the software developer has this massive menu of options. If you only have a credit card, and it doesn't usually cost that much, you can get a lot of stuff done from the cloud vendors, from the software-as-a-service vendors, et cetera, et cetera.
And none of that existed in 2000.
[00:18:31] Jeremy: it's really interesting to think about how different, I guess the development world is now. Like, cause you mentioned how cloud wasn't even really a thing until 2006, all these, these vendors that people take for granted. Um, none of them existed. And so it just, uh, it must've been a very, very different time.
[00:18:52] Randy: Well, we didn't know. Every year is better than the previous year in software, you know? So at that time we were really excited that we had all the tools and capabilities that we did have. And also, you look back from 20 years in the future, and it looks caveman from that perspective.
But all those things were cutting edge at the time. What happened, really, was that the big companies rolled their own, right? Everybody built their own data centers and racked their own servers, at least at scale. The most you could pay anybody else to do was rack your servers for you.
You know what I mean? There were external people, and a lot of them still exist: the Rackspaces, you know, the Equinixes, et cetera, of the world. They would have a co-location facility, and you ask them: please, I'd like to buy these specific machines, and please rack these specific machines for me and connect them up on the network in this particular way.
That was the thing you could pay for. But you pretty much couldn't pay them to put software on there for you. That was your job. And then operating it was also your job, if that makes sense.
[00:20:06] Jeremy: And then, back then, would that be where employees would actually have to go to the data center and, you know, put in their Windows CD or their Linux CD and actually do everything right there?
[00:20:18] Randy: Yeah, 100%. In fact, anybody who operates data centers... I mean, there's more automation now, but conceptually, we run three data centers ourselves at eBay right now, and all of our software runs on them. So we have those physical data centers, and we have employees that physically work in those things and physically
rack and stack the servers. Again, we're smarter about it now: we buy a whole rack, roll the whole rack in, and cable it as one big chunk, as distinct from individual wiring, and the networks are different and better. So there's a lot less individual stuff. But at the end of the day, everybody, in quotes, at that time was doing that, or paying somebody to do exactly that.
Right. Yeah.
[00:21:05] Jeremy: Yeah. And it's, it's interesting too, that you mentioned that it's still being done by eBay. You said you have three, three data centers. because it seems like now maybe it's just assumed that someone's using a cloud service or using AWS or whatnot. And so, oh, go ahead.
[00:21:23] Randy: I was just going to say, well, I'm just going to riff off what you said, how the world has changed. I mean, so much, right? So. Uh, it's fine. You didn't need to say my whole LinkedIn, but like I used to work on Google cloud. So I've been, uh, I've been a cloud vendor, uh, at a bunch of previous companies I've been a cloud consumer, uh, at stitch fix and we work in other places.
Um, so I'm fully aware, you know, fully, personally aware of all that stuff. But yeah, I mean, there's this, um, you know, eBay is at the size where it is actually cost-effective, very cost-effective, uh, can't tell you more than that, uh, for us to operate our own, um, our own infrastructure, right?
So, you know, you know, one would expect if Google didn't operate their own infrastructure, nobody would expect Google to use somebody else's right. Like that, that doesn't make any economic sense. Um, and, uh, you know, Facebook is in the same category. Uh, for a while, Twitter and PayPal have been in that category.
So there's like this class, you know, there are the known hyperscalers, right? You know, the Google, Amazon, uh, Microsoft, that are cloud vendors in addition to consumers, and internally have their own, their own clouds. Um, and then there's a whole class of other, um, places that operate their own internal clouds in quotes.
Uh, but don't offer them externally. And again, uh, Facebook or Meta, uh, you know, is one example. eBay's another. You know, Dropbox actually famously started in the cloud and then found it was much cheaper for them to operate their own infrastructure, again, for the particular workloads that they had.
Um, so yeah, there's probably, I'm making this up, let's call it two dozen around the world of these, I'm making this term up, mini hyperscalers, right? Like self-hyperscalers or something like that. And eBay's in that category.
[00:23:11] Jeremy: I know this is kind of a, you know, a big what if, but you were saying how once you reach a certain scale, that's when it makes sense to move into your own data center. And, uh, I'm wondering if, if E-bay, had started more recently, like, let's say in the last, you know, 10 years, I wonder if it would've made sense for it to start on a public cloud and then move to, um, you know, its own infrastructure after it got bigger, or if you know, it really did make sense to just start with your own infrastructure from the start.
[00:23:44] Randy: Oh, I'm so glad you asked that. Um, the answer is obvious, but like, I'm so glad you asked that because I love to make this point. No one should ever, ever start by building their own servers and their own (laughs) cloud. Like, no. You should be so lucky (laughs), after years and years and years, that you outgrow the cloud vendors.
Right. Um, it happens, but it doesn't happen that often. You know, it happens so rarely that people write articles about it when it happens. Do you know what I mean? Like Dropbox is a good example. So yes, 100%, any time, where are we, 2022? Any time in more than the last 10 years. Um, yeah, let's call it 2010, 2012.
Right. Um, when cloud had proved itself, you know, many times over, um, anybody who starts since that time should absolutely start in the public cloud. There's no argument about it. Uh, and again, one should be so lucky that over time you're seeing successive zeros added to your cloud bill, and it becomes so many zeros that it makes sense to shift your focus toward building and operating your own data centers.
That's it. I haven't been part of that transition. I've been the other way, you know, at other places where, you know, I've migrated from owned data centers and colos into, into public cloud. Um, and that's the, that's the more common migration. And again, there are, there are a handful, maybe not even a handful of, uh, companies that have migrated away, but when they do, they've done all the math, right.
I mean, uh, Dropbox has done some great, uh, talks and articles about, about their transition and boy, the math makes sense for them. So, yeah.
[00:25:30] Jeremy: Yeah. And it also seems like maybe it's for certain types of businesses where moving off of public cloud makes sense. Like you mentioned Dropbox, where so much of their business is probably centered around storage or centered around, you know, bandwidth. And, you know, there are probably certain workloads that need to leave public cloud earlier.
[00:25:51] Randy: Um, yeah, I think that's fair. Um, I think that's an insightful comment. Again, it's all about the economics. At some point, you know, it's a big investment, and it takes years, forget the money that you're paying people, just to develop the internal capabilities.
There are very specialized skill sets around building and operating data centers. So like, it's a big deal. Um, and, uh, yeah. So are there particular classes of workloads where you would, for the same dollar figure or whatever, uh, migrate earlier or later? I'm sure that's probably true. And again, one can absolutely imagine it.
Well, take Dropbox in this example. Um, yeah, it's because like they need to go direct to the storage. I mean, like, they want to remove every middle person, you know, from the flow of the bytes that are coming into the storage media. Um, and it makes perfect sense for them. And when I understood what they were doing, which was a number of years ago, they were hybrid, right? So they kept the top, you know, external layer, uh, in public cloud, and then the storage layer was all custom. I don't know what they do today, but people could check.
[00:27:07] Jeremy: And I'm kind of coming back to your first time at eBay. Is there anything you felt that you would've done differently with the knowledge you have now, but with the technology that existed then?
[00:27:25] Randy: Gosh, that's the 20/20 hindsight. Um, the one that comes to mind is the one we touched on a little bit, but I'll say it more starkly. If I could go back in time 20 years and say, hey, we're about to do this V3 transition at eBay, I would have had us move directly to what we would now call microservices, in the sense that individual services own their own data storage and are only interacted with through the public interface.
Um, there's a famous Amazon memo from around that same time. So Amazon did the transition from a monolith into what we would now call microservices over about a four or five-year period, 2000 to 2005. And there was a famous Jeff Bezos memo from the early part of that with, you know, seven requirements. I can't remember them all, but essentially it was: you may never talk to anybody else's database. You may only interact with other services through their public interfaces. I don't care what those public interfaces are, so they didn't standardize around, you know, CORBA or JSON or gRPC, which didn't exist at the time. Like they didn't standardize around any particular, uh, interaction mechanism. But you did need to, again, have this kind of microservice capability, that's modern terminology, um, where, you know, services own their own data and nobody can talk in the back door.
So that is the one architectural thing that I wish, you know, with 2020 hindsight, uh, that I would bring back in my time travel to 20 years ago, because that would help. That does help a lot. And to be fair, Amazon, um, Amazon was, um, pioneering in that approach and a lot of people internally and externally from Amazon, I'm told, didn't think it would work, uh, and it, and it did famously.
So that's, that's the thing I would do.
[00:29:30] Jeremy: Yeah. I'm glad you brought that up, because when you had mentioned that, I think you said there were 220 applications or something like that. At certain scales, people might think, oh, that sounds like microservices to me. But then you mentioned that a microservice, to you, means having its own data store.
I think that's a good distinction.
[00:29:52] Randy: Yeah. So, um, I've talked a lot about microservices for a decade or so. Yeah. I mean, there are several distinguishing characteristics. The micro in microservices is the size and scope of the interface, right? So you can have a service-oriented architecture with one big service, um, or some very small number of very large services.
But the micro in microservice means this thing does, maybe it doesn't have one operation, but it doesn't have a thousand. The several or the handful or several handfuls of operations are all about this one particular thing. So that's the one part of it. And then the other part of it that is critical to the success of that is owning the, owning your own data storage.
Um, so each service, you know, again, uh, it's hard to do this without a diagram, but like imagine, imagine the bubble of the service surrounding the data storage, right? So anybody from the outside, whether they're interacting synchronously, asynchronously, messaging, HTTP, whatever, doesn't matter, is only interacting with the bubble and never getting inside where the, uh, where the data is. I hope that makes sense.
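The bubble Randy describes, a service that owns its data store and is only reachable through its public interface, can be sketched as follows (the service name and operations here are illustrative, not eBay's actual APIs):

```python
class UserService:
    """A microservice "bubble": the data store is private, and callers
    can only reach it through the public interface, never directly."""

    def __init__(self):
        self._store = {}  # private data store; nothing outside may touch it

    # Public interface: the only way in or out of the bubble.
    def create_user(self, user_id, name):
        self._store[user_id] = {"id": user_id, "name": name}

    def get_user(self, user_id):
        record = self._store.get(user_id)
        # Hand back a copy so callers cannot mutate internal state.
        return dict(record) if record else None


service = UserService()
service.create_user("u1", "Ada")
copy_of_user = service.get_user("u1")
copy_of_user["name"] = "changed"              # mutating the copy...
name_inside = service.get_user("u1")["name"]  # ...does not leak inside
```

The point of the sketch is the encapsulation: nothing outside the class can reach `_store`, so there is no back door into the data.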
[00:31:04] Jeremy: Yeah. I mean, it's kind of in direct contrast to before, where you were talking about how you had all these databases that all of these services shared. So it was probably hard to kind of keep track of, um, who had modified data. Um, you know, one service could modify it, then another service could go to get data out and find it's been changed, but it didn't change it.
So it could be kind of hard to track what's going on.
[00:31:28] Randy: Yeah, exactly. Integration at the database level is something that people have been doing since probably the 1980s. Um, and so again, you know, in retrospect it looks like a caveman approach. Uh, it was pretty advanced at the time, actually. Even the idea of sharding, of, you know, hey, there are users and the users live in databases, but they don't all live in the same one.
Uh, they live in 10 different databases or 20 different databases. And then there's this layer that, for this particular user, figures out which of the 20 databases it's in, and finds it and gets it back. And, um, you know, that was all pretty advanced. And by the way, all those capabilities still exist.
They're just hidden from everybody behind, you know, nice, simple, uh, software-as-a-service, uh, interfaces. Anyway, but that takes nothing away from your excellent point, which is, yeah, when there's this many-to-many relationship between, um, uh, applications and databases, uh, and there's shared mutable state in those databases, like, that's bad. You know, it's not bad to have state.
It's not bad to have mutable state. It's bad to have shared mutable state.
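The user-to-database routing layer Randy describes usually comes down to a deterministic function of the user ID. A minimal sketch, assuming a hash-modulo scheme (one common approach, not necessarily what eBay used):

```python
import hashlib

NUM_SHARDS = 20  # e.g. 20 user databases

def shard_for(user_id: str) -> int:
    """Deterministically route a user to one of NUM_SHARDS databases.
    Uses a stable hash (not Python's randomized hash()) so routing stays
    consistent across processes and restarts."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The routing layer: every lookup for the same user hits the same database.
first_lookup = shard_for("user-12345")
second_lookup = shard_for("user-12345")
```

A real system would add a mapping table or consistent hashing so shards can be added without re-homing every user, but the core idea is this stable user-to-shard function.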
[00:32:41] Jeremy: Yeah. And I think anybody who's kind of interested in learning more about the sharding and things like that you talked about, if they go back and listen to your first appearance on Software Engineering Radio, um, yeah. It kind of struck me how you were talking about sharding and how it was something that was kind of unique or unusual.
Whereas today it feels like it's very, I don't know, if quaint is the right word, but it's like, um, it's something that, that people kind of are accustomed to now.
[00:33:09] Randy: Yeah. Yeah. Um, it's obvious. It seems obvious in retrospect. You know, at the time, and by the way, we didn't invent sharding. As I said, in 2007, you know, Google and Yahoo and, uh, Amazon, you know, it was the obvious thing. It took a while to reach it, but it's one of those things where once people have the, you know, brainwave to see, oh, you know what, we don't actually have to store this in one, uh, database.
We can chop that database up into, you know, into chunks that each look sort of self-similar. Um, yeah, that was reinvented by lots of, uh, lots of the big companies at the same time, again, because everybody was solving that same problem at the same time. Um, but yeah, when you look back, I mean, honestly, everything that I said there is still like this: all the techniques about how you shard things.
And there's lots of, you know, it's not interesting anymore because the problems have been solved, but all those solutions are still the solutions, if that makes any sense, but you know,
[00:34:09] Jeremy: Yeah, for sure. I mean, I think anybody who goes back and listens to it, yeah, like you said, it's very interesting, because it all still applies. And I think the solutions that are kind of interesting to me are ones where it's things that could have been implemented long ago, but we just later on realized, like, this is how we could do it.
[00:34:31] Randy: Well, part of it is, as we grow as an industry, we just, we discover new problems. You know, we get to the point where, you know, sharding over databases is only a problem when one database doesn't work, you know, when the load that you put on that database is too big, or you want the availability of, you know, multiple databases.
Um, and so that's not a day-one problem, right? That's a day-two or day-2000 kind of problem. Right. Um, and so a lot of these things, yeah, well, you know, it's software. So we could have done any of these things in older languages and older operating systems and with older technology.
But for the most part, we didn't have those problems, or we didn't have them at sufficient scale. Enough people didn't have the problem for us, you know, to have solved it as an industry, if that makes any sense.
[00:35:30] Jeremy: yeah, no, that's a good point because you think about when Amazon first started and it was just a bookstore, right. And the number of people using the site where, uh, who knows it was, it might've been tens a day or hundreds a day. I don't, I don't know. And, and so, like you said, the problems that Amazon has now in terms of scale are just like, it's a completely different world than when they started.
[00:35:52] Randy: Yeah. I mean, probably, I'm making it up, but I don't think it's too far off to say that their problems are a billion-fold what they were.
[00:36:05] Jeremy: The next thing I'd like to talk about is, you came back to eBay, I think it's been about two years ago?
[00:36:14] Randy: Two years yeah.
[00:36:15] Jeremy: Yeah. And, and so, so tell me about the experience of coming back to an organization that you had been at, you know, 10 years prior or however long it was like, how is your onboarding different when it's somewhere you've been before?
[00:36:31] Randy: Yeah, sure. So, um, like you said, I worked at eBay from 2004 to 2011, um, and I worked in a different role than I have today; I worked mostly on eBay's search engine. Um, and then, uh, I left to co-found a startup, which was in the 99%, you know, the ones that don't really go anywhere. Uh, I worked at Google in the early days of Google Cloud, as I mentioned, on Google App Engine, and had a bunch of other roles, including more recently, like you said, Stitch Fix and WeWork, um, leading those engineering teams.
And, um, so yeah, coming back to eBay as chief architect and, you know, leading the developer platform, essentially, a part of eBay. Um, yeah, what was the onboarding like? I mean, lots of things had changed, you know, in the intervening 10 years or so. Uh, and lots had stayed the same, you know, not in a bad way, but just, you know, uh, some of the technologies that we use today are still some of the technologies we used 10 years ago. A lot has changed, though.
Um, a bunch of the people are still around. So there's something about eBay that, um, people tend to stay a long time. You know, it's not really very strange for people to be at eBay for 20 years. Um, in my particular team of let's call it 150, there are four or five people that have crossed their 20 year anniversary at the company.
Um, and I also rejoined with a bunch of other boomerangs, as the term we use internally goes, um, including the CEO, by the way. So it's sort of bringing the band back together: a bunch of people that had gone off and worked at other places have come back for various reasons over the last couple of years.
So it was both a lot of familiarity, a lot of unfamiliarity, a lot of familiar faces. Um, yup.
[00:38:17] Jeremy: So, I mean, having these people who you work with still be there and actually coming back with some of those people, um, what were some of the big, I guess, advantages or benefits you got from, you know, those existing connections?
[00:38:33] Randy: Yeah. Well, I mean, as with all things, you know, imagine, I mean, everybody can imagine like getting back together with friends that they had from high school or university, or like you had some people had some schooling at some point and like you get back together with those friends and there's this, you know, there's this implicit trust in most situations of, you know, because you went through a bunch of stuff together and you knew each other, uh, you know, a long time.
And so that definitely helps, you know, when you're returning to a place where, again, there are a lot of familiar faces, where there's a lot of trust built up. Um, and then it's also helpful, you know, eBay's a pretty complicated place. 10 years ago it was too big to hold in any one person's head, and it's even harder to hold it in one person's head now. But to be able to come back and have a little bit of that, well, more than a little bit of that context about, okay, here's how eBay works.
And here, you know, here are the unique complexities of the marketplace, cause it's very unique, you know, in the world. Um, and so, yeah, no, I mean, it was helpful. It helps a lot. And then also, you know, in my current role, um, uh, my main goal actually is to just make all of eBay better. You know, so we have about 4,000 engineers, and, you know, my team's job is to make all of them better and more productive and more successful. And, uh, being able to combine
knowing the context about eBay and having a bunch of connections to, you know, a bunch of the leaders here, um, combining that with 10 years of experience doing other things at other places, you know, that's helpful. Because, you know, now there are things that we do at eBay where, okay, well, you know that this other place has that same problem and is solving it in a different way.
And so maybe we should, you know, look into that option. So,
[00:40:19] Jeremy: So you mentioned just trying to make developers' work or lives easier. You start the job. How do you decide what to tackle first? Like, how do you figure out where the problems are or what to do next?
[00:40:32] Randy: Yeah, that's a great question. Um, so, uh, again, I lead this thing that we internally call the velocity initiative, which is about giving us the ability to deliver features and bug fixes more quickly to customers. Right. And, um, so how did I figure out where to start on that problem: how can we deliver things more quickly to customers and improve, you know, get more customer value and business value?
Uh, what I did, uh, in collaboration with a bunch of people, is what one would call a value stream map. And that's a term from lean software and lean manufacturing, where you just look end to end at a process and, like, list all the steps and how long those steps take. So a value stream, as you can imagine, is all these steps that are happening, and at the end, there's some value, right?
Like we produced some, you know, feature, or, you know, hopefully got some revenue, or helped out the customer and the business in some way. And so, you know, mapping that value stream, that's what it means. Um, and you look at that, and when you can see the end-to-end process, you know, and really see it in some kind of diagram, uh, you can look for opportunities. Like, oh, okay, well, you know, if it takes us, I'm making this up, a week from when we have an idea to when it shows up on the site,
well, you know, some of those steps take five minutes. That's not worth optimizing. But some of those steps take, you know, five days, and that is worth optimizing. And so, um, getting some visibility into the system, you know, looking end to end with a kind of systems-thinking view, uh, that will give you the, uh, the knowledge about what can be improved.
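The exercise Randy describes reduces to listing each step in the stream with its duration and finding the dominant one. A toy sketch (the step names and durations below are made up for illustration):

```python
# Hypothetical end-to-end steps, idea to live-on-the-site, in hours.
value_stream = [
    ("write code", 8),
    ("wait for code review", 48),
    ("build", 2),
    ("deploy and roll out", 120),  # five days: the step worth optimizing
]

total_hours = sum(hours for _, hours in value_stream)
bottleneck, bottleneck_hours = max(value_stream, key=lambda step: step[1])
bottleneck_share = bottleneck_hours / total_hours
```

With numbers like these, the five-minute steps vanish next to the five-day step, which is exactly the point of drawing the map before optimizing anything.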
And so that's, that's what we did. And we didn't talk with all 4,000, you know, uh, engineers, or all, you know, whatever, half a thousand teams or whatever we had. Um, but we sampled, and after we talked with three teams, we were already hearing a bunch of the same things, you know, across the whole product life cycle, which I like to divide into four stages.
I like to say there's planning: how does an idea become a project or a thing that people work on? Software development: how does a project become committed code? Software delivery: how does committed code become a feature that people actually use? And then what I call post-release iteration, which is, okay, it's now out there on the site, and we're turning it on and off for individual users.
We're learning from analytics and usage in the real world, and experimenting. And so there were opportunities at eBay at all four of those stages, um, which I'm happy to talk about. But what we ended up seeing again and again, uh, is that the software delivery part was our current bottleneck. So again, that's how long it takes from when an engineer commits her code to when it shows up as a feature on the site.
And, you know, before we started the work that I've been doing for the last two years with a bunch of people, um, on average at eBay, that was like a week and a half. So, you know, it'd be a week and a half between when someone's finished, and then, okay, it gets code reviewed, and, you know, dot, dot, dot, it gets rolled out.
It gets tested, you know, all that stuff. Um, it was, you know, essentially 10 days. And now, for the teams that we've been working with, uh, it's down to two. So we used a lot of what people may be familiar with from, uh, the Accelerate book. It's called Accelerate, by Nicole Forsgren, um, Jez Humble, and Gene Kim, uh, 2018. Like, if there's one book anybody should read about software engineering, it's that.
Uh, so please read Accelerate. Um, it summarizes almost a decade of research from the State of DevOps reports, um, which the three people that I mentioned led. So Nicole Forsgren, you know, is a doctor, uh, you know, she has a PhD in, uh, data science. She knows how to do all this stuff. Um, anyway, so, uh, when your problem happens to be software delivery.
The Accelerate book tells you all the kinds of continuous delivery techniques, trunk-based development, uh, all sorts of stuff that you can do to solve those problems. And then there are also four metrics that they use to measure the effectiveness of an organization's software delivery. So people might be familiar with these: there's deployment frequency.
How often are we deploying a particular application? Lead time for change: that's the time from when a developer commits her code to when it shows up on the site. Uh, change failure rate, which is: when we deploy code, how often do we roll it back or hotfix it, or, you know, there's some problem that we need to, you know, address.
Um, and then, uh, mean time to restore, which is: when we have one of those incidents or problems, how quickly can we, uh, roll it back or do that hotfix? Um, and again, the beauty of Nicole Forsgren's research, summarized in the Accelerate book, is that the science shows that companies cluster. In other words, mostly the organizations that are not good at, you know, deployment frequency and lead time are also not good at the quality metrics of, uh, mean time to restore and change failure rate, and the companies that are excellent at, you know, uh, deployment frequency and lead time are also excellent at mean time to recover and, uh, change failure rate.
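Those four metrics can be computed from an ordinary deployment log. A minimal sketch with made-up data (the log format and numbers are invented for illustration):

```python
from datetime import datetime, timedelta

# Hypothetical deployment log: (commit time, deploy time, rolled back or hotfixed?)
deploys = [
    (datetime(2022, 5, 2, 9), datetime(2022, 5, 4, 9), False),
    (datetime(2022, 5, 5, 9), datetime(2022, 5, 6, 9), True),
    (datetime(2022, 5, 9, 9), datetime(2022, 5, 11, 9), False),
    (datetime(2022, 5, 12, 9), datetime(2022, 5, 13, 9), False),
]
restore_hours = [1.5, 0.5]  # hypothetical incident durations, detection to restored

days_observed = 14
deployment_frequency = len(deploys) / days_observed              # deploys per day
lead_times = [deployed - committed for committed, deployed, _ in deploys]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)   # commit to live
change_failure_rate = sum(1 for *_, failed in deploys if failed) / len(deploys)
mean_time_to_restore = sum(restore_hours) / len(restore_hours)   # in hours
```

With this sample log the lead time averages 36 hours, which by the clustering Randy describes next would put the team in the high-performer bucket.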
Um, so companies or organizations, uh, divide into these four categories: there are low performers, medium performers, high performers, and then elite performers. And, uh, eBay, on average, at the time was, and still on average is, solidly in that medium performer category. So, uh, what we've been able to do with the teams that we've been working with is move those teams to the high category.
So, just super briefly, uh, and then we'll give you a chance to ask some more questions, but like, in the low category, all those things are kind of measured in months, right? So how often are we deploying? You know, measure that in months. How long does it take us to get a commit to the site? You know, measure that in months.
Uh, the medium performers, everything's measured in weeks, right? So like, we deploy, you know, once every couple of weeks or once a week, uh, lead time is measured in weeks, et cetera. For the high performers, things are measured in days, and for the elite performers, things are measured in hours.
And so you can see there are, like, order-of-magnitude improvements when you move from one of those clusters to another cluster. Anyway, so what we were focused on, again, because our problem was software delivery, was moving a whole set of teams from that medium performer category, where things are measured in weeks, to the, uh, high performer category, where things are measured in days.
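The cluster boundaries Randy walks through are roughly order-of-magnitude buckets on lead time. A rough classifier (the exact thresholds here are my paraphrase of the months/weeks/days/hours framing, not the book's precise cutoffs):

```python
def performance_cluster(lead_time_hours: float) -> str:
    """Rough classifier over the low/medium/high/elite buckets:
    elite is hours, high is days, medium is weeks, low is months."""
    if lead_time_hours < 24:
        return "elite"        # measured in hours
    if lead_time_hours < 24 * 7:
        return "high"         # measured in days
    if lead_time_hours < 24 * 30:
        return "medium"       # measured in weeks
    return "low"              # measured in months

# The move described here: roughly ten days of lead time down to two.
before = performance_cluster(10 * 24)  # a week and a half: medium performer
after = performance_cluster(2 * 24)    # two days: high performer
```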
[00:47:21] Jeremy: Throughout all this, you said the big thing that you focused on was the delivery time. So somebody wrote code and felt that it was ready for deployment, but for some reason it took 10 days to actually get out to the actual site. So I wonder if you could talk a little bit about, uh, maybe a specific team or a specific application. Where was that time being spent?
You know, you said you moved from 10 days to two days. What was happening during that time?
[00:47:49] Randy: Yeah, no, that's a great question. Thank you. Um, yeah, so, uh, okay. So we looked end to end at the process, and we found that software delivery was the first place to focus, and then there are other issues in other areas, but we'll get to them later. Um, so then, to improve software delivery, we asked individual teams. We had a conversation like the one I'm about to describe. So we said, hi, it looks like you're deploying kind of once or twice.
If I, if I told you, you had to deploy once a day, tell me all the reasons why that's not going to work. And the teams are like, oh, of course, well, it's a build times take too long. And the deployments aren't automated and you know, our testing is flaky. So we have to retry it all the time and, you know, dot, dot, dot, dot, dot.
And we said, great, you just gave my team our backlog. Right? So rather than, you know, just complaining about it (and it's legit for the teams to complain), uh, because again, the developer platform, you know, is part of my team, we said, great, you just told us all your top, uh, issues, or your impediments, as we say, um, and we're going to work on them with you.
And so every time we had some idea, like, well, I bet we can use canary deployments to automate the deployment, which we have now done, we would pilot that with a bunch of teams, we'd learn what works and doesn't work, and then we would roll it out to everybody. Um, so what were the impediments? It was a little bit different for each individual team, but in sum, the things we ended up focusing on, or have been focusing on, are build times. You know, so we build everything in Java still.
Um, and, uh, even though we're on generation five, as opposed to that generation three that I mentioned, um, build times for a lot of applications were still taking way too long. And so we spent a bunch of time improving those things, and we were able to take stuff from, you know, hours down to, you know, single-digit minutes.
So that's a huge improvement to developer productivity. Um, we made a lot of investment in our continuous delivery pipelines. Um, so making all the, making all the automation around, you know, deploying something to one environment and checking it there and then deploying it into a common staging environment and checking it there and then deploying it from there into the production environment.
And, um, and then, you know, rolling it out via this canary mechanism. We invested a lot in something that we call traffic mirroring, which is, uh, something we didn't invent. Other places have a different name for this; I don't know if there's a standard industry name. Some people call it shadowing. But the idea is: I have a change that I'm making which is not intended to change the behavior.
Like, lots of changes that we make, bug fixes, et cetera, upgrading to new, you know, open source dependencies, whatever, changing the version of the framework. There's a bunch of changes that we make regularly, day to day, as developers, which are like refactorings, kind of, where we're not actually intending to change the behavior.
And so traffic mirroring was our idea of: you have the old code that's running in production, and you fire a request, a production request, at that old code, and it responds. But then you also fire that request at the new version and compare the results. You know, did the same JSON come back, you know, between the old version and the new version?
Um, and that's a great way, kind of from the outside, to sort of black-box detect any unintended changes in the behavior. And so we definitely leveraged that very, very aggressively. Um, we've invested in a bunch of other things, but all those investments are driven by what the particular teams tell us is getting in their way.
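The mirroring check Randy describes, firing the same request at old and new code and diffing the JSON, can be sketched like this (the handlers are stand-ins, not eBay's services):

```python
import json

def old_handler(request):
    # Stand-in for the version currently serving production traffic.
    return {"item": request["item"], "price_cents": 1099}

def new_handler(request):
    # Stand-in for the refactored version; intended to behave identically.
    return {"item": request["item"], "price_cents": 1099}

def mirror(request):
    """Serve from the old version, shadow the request to the new one,
    and flag any unintended behavior change by diffing canonical JSON."""
    old_resp = old_handler(request)
    new_resp = new_handler(request)
    matched = (json.dumps(old_resp, sort_keys=True)
               == json.dumps(new_resp, sort_keys=True))
    return old_resp, matched  # production always gets the old response

response, responses_match = mirror({"item": "vintage-camera"})
```

Because the user always gets the old response, a mismatch is just a signal to investigate, not a production incident.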
And there are a bunch of things that the teams themselves have, you know, been motivated to do, so my team's not the only one that's making improvements. You know, teams have reoriented, uh, moved from branch-based development to trunk-based development, which makes a big difference. Um, making sure that, uh, PR approvals and, um, you know, code reviews are happening much more regularly.
So, like, a thing that some teams have started doing is, immediately after standup in the morning, everybody does all the code reviews that, you know, are waiting. And so things don't drag on for, you know, two, three days, cause whatever. Um, so, you know, everybody kind of works through that much more quickly.
Um, teams are building their own automations for things like testing site speed and accessibility and all sorts of stuff. So, like, all the things that, you know, a team goes through in the development and rollout of their software, they've been spending a lot of time automating and making leaner, making more efficient.
[00:52:22] Jeremy: So some of those, it sounds like the PR example is really on the team. Like, you're telling them, hey, this is something that you internally should change about how you work. For things like improving the build time and things like that, did you have, like, a separate team that was helping these teams, you know, speed that process up? Or what was that like?
[00:52:46] Randy: Yeah, great. I mean, those two examples are, like you say, very different. So I'm going to start from: we just simply showed everybody, here's your deployment frequency for this application, here's your lead time for this application, here's your change failure rate, and here's your mean time to restore.
And again, as I mentioned before, all of the State of DevOps research and the Accelerate book prove that by improving those metrics, you get better engineering outcomes and you also get better business outcomes. So, like, it's scientifically proven that improving those four things matters. Okay. So now we've shown the teams: hey, we would like you to improve, you know, for your own good, but, you know, more broadly at eBay, we would like the deployment frequency to be faster.
And we would like the lead time to be shorter. And the insight there is, when we deploy smaller units of work, when we don't, like, batch up a week's worth of work, a month's worth of work, it's much, much less risky to just deploy, like, an hour's worth of work. Right? And the insight is, the hour's worth of work fits in your head.
And if you roll it out and there's an issue, first off, rolling back is no big deal, cause you've only lost an hour of work for a temporary period of time. But also, you never have this thing of, what in the world broke? Cause, like, with a month's worth of work, there's a lot of things that changed and a lot of stuff that could break, but with an hour's worth of work, it's only, like, one change that you made.
So, you know, if something happens, it's pretty much guaranteed to be that thing. Anyway, that's the backstory. Um, and so yeah, we were just working with individual teams. Oh yeah, so the teams were motivated to, like, see what's the biggest bang for the buck in order to improve those things.
Like, how can we improve those things? And again, some teams were saying, well, you know what, a huge component of that lead time between when somebody commits and it's a feature on the site, a huge percentage of that, maybe multiple days, is waiting for somebody to code review. Okay, great.
We can just change our team kind of agreements and our team behavior to make that happen. And then, yes, to answer your question about the other things, like building the canary capability and traffic mirroring and build-time improvements: those were done by central, uh, platform and infrastructure teams, you know, some of which were in my group and some of which are in peer groups, uh, in my part of the organization.
So, yeah, I mean, providing the generic tools and, you know, generic capabilities, those are absolutely things that a platform organization does. Like, that's our job. Um, and you know, we did it. And then there are a bunch of other things like that, around kind of team behavior and how you approach building a particular application, that are, and should be, completely in the control of the individual teams.
And we were definitely not being super prescriptive. Like, we didn't come in and say, alright, by next Tuesday we want you to be doing trunk-based development, and by, you know, the Tuesday after that, we want to see test-driven development, you know, dot, dot, dot. Um, we would just offer to teams, you know:
Here's where you are, and here's where we know you can get to, because, like, we work with other teams and we've seen that they can get there. Um, you know, we just work together on, well, what's the biggest bang for the buck and what would be most helpful for that team? So it's like a menu of options, and you don't have to take everything off the menu, if that makes sense.
[00:56:10] Jeremy: And how did that communication flow from you and your team down to the individual contributor? Like, I'm assuming you have engineering managers and technical leads and all these people sort of in the chain. How does it work?
[00:56:24] Randy: Yeah, thanks for asking that. Yeah, I didn't really say how we work as an initiative. So there are a bunch of teams that are involved. Um, and every Monday morning (it just so happens it's late Monday morning today, so we already did this a couple of hours ago), once a week, we get all the teams that are involved, both, like, the platform kind of provider teams and also the product,
or we would say domain, like, consumer teams. And we do a quick scrum of scrums, like a big old kind of standup: what have you all done this week, what are you working on next week, what are you blocked by, kind of idea. And, you know, there are probably 20 or 30 teams, again, across the individual platform capabilities and across the teams that, you know, uh, consume this stuff, and everybody gives a quick update. And, uh, it's a great opportunity for people to say, oh, I have that same problem too.
Maybe we should, offline, try to figure out how to solve that together. You built a tool that automates the site speed stuff? That's great, I would so love to have that. And, um, so this weekly meeting has been a great opportunity for us to share wins, share, um, you know, the help that people need, and then get teams to help each other.
And also, similarly, one of the platform teams would say something like, hey, we're about to be done, or in beta, let's say, with this new canary capability (I'm making this up). Anybody want to pilot that for us? And then you get a bunch of hands raised of, yo, we would be very happy to pilot that, that would be great.
Um, so that's how we communicate back and forth. And, you know, it's a big enough group. It's kind of like engineering managers are the level that's involved in that, typically. Um, so it's not individual developers, but it's, like, somebody on most every team, if that makes any sense. So that's kind of how we do that communication, uh, back to the individual developers.
If that makes sense.
[00:58:26] Jeremy: Yeah. So it sounds like you would have, like you said, the engineering managers go to the standup, and, um, you said maybe 20 to 30 teams. I'm just trying to get a picture of how many people are in this meeting.
[00:58:39] Randy: Yeah. It's like 30 or 40 people.
[00:58:41] Jeremy: Okay. Yeah.
[00:58:42] Randy: And again, it's quick, right? It's an hour. So we just go, boom, boom, boom, boom. And we've developed a cadence: we have a shared Google Doc, and people write their little summaries, you know, of what they've worked on and what they're working on.
So we've, over time, made it so that it's pretty efficient with people's time, and pretty dense, in a good way, in terms of information flow back and forth. Um, and then also, separately, we meet in more detail with the individual teams that are involved. Again, we try to elicit: okay, where are you now?
Here's where you are. Please let us know what problems you're seeing with this part of the infrastructure, or problems you're seeing in the pipelines, or something like that. And we're constantly trying to learn and get better and, you know, solicit feedback from teams on what we can do differently.
[00:59:29] Jeremy: Earlier you had talked a little bit about how there were a few services that got brought over from V2 or V3, basically kind of more legacy or older services that have been a part of eBay for quite some time.
And I was wondering if there were things about those services that made this process different, like, you know, in terms of how often you could deploy or, um, just what were some key differences between something that was made recently versus something that has been with the company for a long time?
[01:00:06] Randy: Yeah, sure. I mean, the stuff that's been with the company for a long time was best in class as of when we built it, you know, maybe 15 and sometimes 20 years ago. Um, actually, there are less than a handful. As we speak, there are two or three of those V3, uh, clusters or applications or services still around, and they should be gone, completely migrated away from, in the next couple of months.
So, like, we're almost at the end of, um, you know, moving all to more modern things. But yeah, I mean, again, stuff that was state of the art, you know, 20 years ago, which was like deploying things once every two weeks: that was a big deal in 2000 or 2004. Uh, and, you know, that was fast in 2004 and is slow in 2022.
So, um, yeah, what's the difference? Um, a lot of these things, you know, if they haven't already been migrated, there's a reason, and it's often because they're way in the guts of something that's really important. You know, I'm making these examples up and they're not even right, but, like, it's a core part of the payments flow.
It's a core part of, you know, uh, how sellers get paid. And those aren't real examples (ours are modern), but you see what I'm saying? Like, stuff that's really core to the business, and that's why it's kind of lasted.
[01:01:34] Jeremy: And, uh, I'm kind of curious, from the perspective of some of these new things you're introducing, like, um, improving continuous delivery and things like that. Uh, when you're working with some of these services that have been around a long time, is the rate at which the teams deploy, or the rate at which you find defects, noticeably different from services that are more recent?
[01:02:04] Randy: I mean, that's true of any legacy at any place, right? So, um, yeah, I mean, people legitimately have some trepidation, let's say, about changing something that's, you know, been running the business for a long, long time. And so, you know, it's a lot slower going, uh, exactly because it's not always completely obvious what the implications are of those changes.
So, you know, we're very careful, and we, you know, test things a whole lot. And, um, you know, maybe we didn't write stuff with a whole bunch of automated tests in the beginning, and so there's a lot of manual stuff there. You know, this is just what happens when you have a company that's been around for a long time.
[01:02:51] Jeremy: Yeah, I guess just to start wrapping up: through this process of you coming into the company, identifying where the problems are, and working on, like, um, you know, ways to speed up delivery, is there anything that came up that really surprised you? I mean, you've been at a lot of different organizations. Is there anything about your experience here at eBay that was very different than what you'd seen before?
[01:03:19] Randy: No. I mean, it's a great question. I think the thing that's surprising is how unsurprising it is. Like, the details are different. Like, okay, we have this V3, I mean, we have some uniquenesses around eBay. But, um, I think what is maybe pleasantly surprising is all the techniques about how one notices the things that are going on, uh, in terms of, you know, again, deployment frequency, lead time, et cetera, and what techniques you would deploy to make those things better. Um, all the standard stuff applies, you know. So again, all the techniques that are mentioned in the State of DevOps research and in Accelerate, and just all the known good practices of software development, they all apply everywhere.
Um, and that's, I think, the wonderful thing. So, like, maybe the most surprising thing is how unsurprising it is, or how applicable the standard industry techniques are. I mean, I certainly hoped that to be true, but that's why (I didn't really say this before) we piloted this stuff with a small number of teams.
Exactly because we thought, and it turned out to be true, that they applied, but we weren't entirely sure. You know, we didn't know what we didn't know. Um, and we also needed proof points, you know, not just out there in the world, but at eBay, that these things made a difference. And it turns out they do. So.
[01:04:45] Jeremy: Yeah, I mean, I think it's easy for people to kind of get caught up and think, like, my problem is unique, or my organization is unique, but it sounds like in a lot of cases, maybe we're not so different.
[01:04:57] Randy: I mean, the stuff that works tends to work everywhere. There's always some detail, but, um, yeah, I mean, all aspects of, you know, continuous delivery and the kind of lean approach to software. I mean, we, the industry, have yet to find a place where they don't work. Seriously, we have yet to find any place where they don't work.
[01:05:19] Jeremy: If people want to, um, you know, learn more about the work that you're doing at eBay, or just follow you in general, um, where should they look?
[01:05:27] Randy: Yeah. So, um, I tweet semi-regularly at @randyshoup. So, my name, all one word: R-A-N-D-Y-S-H-O-U-P. Um, I had always wanted to be a blogger. Like, there is a randyshoup.com, and there are some blogs on there, but they're pretty old. Um, someday I hope to be doing more writing. Um, I do a lot of conference speaking, though.
So I speak at the QCon conferences. I'm going to be at the Craft Conference in Budapest in a week and a half, uh, as of this recording. Um, so you can often find me on, uh, Twitter or at software conferences.
[01:06:02] Jeremy: all right, Randy. Well, thank you so much for coming back on software engineering radio.
[01:06:06] Randy: Thanks for having me, Jeremy. This was fun.
This episode originally aired on Software Engineering Radio.
You can help edit this transcript on GitHub.
[00:00:00] Jeremy: Today I'm talking to Ant Wilson, he's the co-founder and CTO of Supabase. Ant welcome to software engineering radio.
[00:00:07] Ant: Thanks so much. Great to be here.
[00:00:09] Jeremy: When I hear about Supabase, I always hear about it in relation to two other products. The first is Postgres, which is an open source relational database, and the second is Firebase, which is a backend-as-a-service product from Google Cloud that provides a NoSQL data store.
It provides authentication and authorization, it has a functions-as-a-service component, and it's really meant to be a replacement for needing to have your own server and create your own backend. You can have that all be done from Firebase. I think a good place for us to start would be walking us through what Supabase is and how it relates to those two products.
[00:00:55] Ant: Yeah. So, so we brand ourselves as the open source Firebase alternative
that came primarily from the fact that we ourselves use it as the alternative to Firebase. So my co-founder Paul, in his previous startup, was using Firestore, and as they started to scale, they hit certain limitations, technical scaling limitations, and he'd always been a huge Postgres fan.
So we swapped it out for Postgres and then just started plugging in the bits that were missing, like the real-time streams. Um, he used a tool called PostgREST, with a T, for the CRUD APIs. And so he just built, like, the open source Firebase alternative on Postgres, and that's kind of where the tagline came from.
But the main difference, obviously, is that it's a relational database and not a NoSQL database, which means that it's not actually a drop-in replacement. But it does mean that it kind of opens the door to a lot more functionality, actually. Um, which is hopefully an advantage for us.
[00:02:03] Jeremy: It's a hosted form of Postgres. So you mentioned that Firebase is different, it's, uh, NoSQL. People are putting in their JSON objects and things like that. So when people are working with Supabase, is the experience just, I'm connecting to a Postgres database, I'm writing SQL? And in that regard, it's kind of not really similar to Firebase at all. Is that kind of right?
[00:02:31] Ant: Yeah. I mean, the other important thing to note is that you can communicate with Supabase directly from the client, which is what people love about Firebase. You just, like, put the credentials on the client and you write some security rules, and then you just start sending your data. Obviously, with Supabase, you do need to create your schema, because it's relational.
But apart from that, the experience of client-side development is very much the same, or very similar. The interface, obviously the API, is a little bit different, but it's similar in that regard. But I think, like I said, we are just a database company, actually. And the tagline just explains really well, kind of, the concept of what it is: like a backend as a service. It has the real-time streams, it has the auth layer, it has the auto-generated APIs. So I don't know how long we'll stick with the tagline. I think we'll probably outgrow it at some point. Um, but it does do a good job of communicating roughly what the service is.
[00:03:39] Jeremy: So when we talk about it being similar to Firebase, the part that's similar is that you could be a person building the front-end part of the website, and you don't necessarily need a backend application, because all of that could talk to Supabase, and Supabase can handle the authentication, the real-time notifications, all those sorts of things. Similar to Firebase, where basically you only need to write the front-end part, and then you have to know how to set up Supabase in this case.
[00:04:14] Ant: Yeah, exactly. And we love Firebase, by the way. We're not building an alternative to try and destroy it. It's kind of like we're just building the SQL alternative, and we take a lot of inspiration from it. And the other thing we love is that you can administer your database from the browser.
So you go into Firebase and you can see the object tree, and when you're in development, you can edit some of the documents in real time. And so we took that experience and effectively built, like, a spreadsheet view inside of our dashboard, and we also obviously have a SQL editor in there as well.
We're trying to create a similar developer experience, because that's where Firebase just excels: the DX is incredible. And so we take a lot of inspiration from it in those respects.
[00:05:08] Jeremy: And to make it clear to our listeners as well: when you talk about this interface that's kind of like a spreadsheet and things like that, I suppose it's similar to somebody opening up pgAdmin and going in and editing the rows. But maybe you've got, like, another layer on top that just makes it a little more user-friendly, a little bit more like something you would get from Firebase, I guess.
[00:05:33] Ant: Yeah.
And, you know, we take a lot of inspiration from pgAdmin. pgAdmin is also open source, so I think we've contributed a few things, or are trying to upstream a few things, into pgAdmin. The other thing that we took a lot of inspiration from, for what we call the table editor, is Airtable.
Because Airtable is effectively a relational database, in that you can just come in and, you know, click to add your columns, click to add a new table. And so we just wanted to reproduce that experience, again backed up by a full dedicated Postgres database.
[00:06:13] Jeremy: So when you're working with a Postgres database, normally you need some kind of layer in front of it, right? A person can't open up their website and connect directly to Postgres from their browser. And you mentioned PostgREST before. I wonder if you could explain a little bit about what that is and how it works.
[00:06:34] Ant: Yeah, definitely. So yeah, PostgREST has been around for a while. Um, it's basically a server that you connect to your Postgres database, and it introspects your schemas and generates an API for you based on the table names and the column names. And then you can communicate with your Postgres database via this RESTful API.
So you can do pretty much most of the filtering operations that you can do in SQL: um, uh, equality filters, you can even do full-text search over the API. So it just means that whenever you add a new table or a new schema or a new column, the API just updates instantly. So you don't have to worry about writing that middle layer, which was always the drag, right?
Whenever you started a new project, it's like, okay, I've got my schema, I've got my client, now I have to do all the connecting code in the middle. Which is kind of, yeah, no developer should need to write that layer in 2022.
[00:07:46] Jeremy: So this layer you're referring to: when I think of a traditional web application, I think of having to write routes and controllers, and to create this sort of structure where I know all the tables in my database, but the controllers I create may not map one-to-one with those tables. And so you mentioned a little bit about how PostgREST looks at the schema and starts to build an API automatically.
And I wonder if you could explain a little bit about how it does those mappings, or if you're writing those yourself.
[00:08:21] Ant: Yeah, it basically does them automatically. By default, it will, you know, map every table, every column. When you want to start restricting things, well, there are two parts to this. There's one thing, which I'm sure we'll get into, which is: how is this secure, since you are communicating directly from the client?
But the other part is what you mentioned: giving, like, a reduced view of a particular bit of data. And for that, we just use Postgres views. So you define a view, which might have, you know, joins across a couple of different tables, or it might just be a limited set of columns on one of your tables. And then you can choose to just expose that view.
[00:09:05] Jeremy: So it sounds like, where you would typically create a controller and create a route, instead you create a view within your Postgres database, and then PostgREST can take that view and create an endpoint for it, map it to that.
[00:09:21] Ant: Yeah, exactly (laughs) .
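For listeners who haven't seen this pattern, here's a minimal sketch of the view idea; the table and column names are made up for illustration, not taken from the episode:

```sql
-- Hypothetical schema: a "users" table with a column that
-- should not be exposed publicly.
create table users (
  id         uuid primary key,
  username   text not null,
  email      text not null,   -- private
  avatar_url text
);

-- A view exposing only the public columns.
create view public_profiles as
  select id, username, avatar_url
  from users;
```

PostgREST then serves the view as its own endpoint (e.g. `GET /public_profiles`), with the same filtering syntax as a table, while the underlying `users` table can stay unexposed.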
[00:09:24] Jeremy: And PostgREST is an open source project, right? I wonder if you could talk a little bit about its history. How did you come to choose it?
[00:09:37] Ant: Yeah.
I think Paul probably read about it on Hacker News at some point. Anytime it appears on Hacker News, it just gets voted to the front page, because it's so awesome. And we got connected to the maintainer, Steve Chavez, at some point. I think he just took an interest in us, or we took an interest in PostgREST, and we kind of got acquainted.
And then we found out that, you know, Steve was open to work, and this kind of shaped a lot of the way we think about building out Supabase as a project and as a company, in that we then decided to employ Steve full time, but just to work on PostgREST, because it's obviously a huge benefit for us.
We're very reliant on it, and we want it to succeed because it helps our business. And then, as we started to add the other components, we decided that we would always look for existing tools, existing open source projects, before we decided to build something from scratch. So as we were starting to try and replicate the features of Firebase, auth is a great example.
We did a full audit of all the authentication and authorization open source tools that are out there, and which one, if any, would fit best. And we found that Netlify had built a library called GoTrue, written in Go, which did pretty much exactly what we needed. So we just adopted that.
And now, obviously, you know, we just have a lot of people on the team contributing to GoTrue as well.
[00:11:17] Jeremy: You touched on this a little bit earlier. Normally, when you connect to a Postgres database, your user has permission to basically everything, I guess, by default anyways. So how does that work? When you want to restrict people's permissions, make sure they only get to see records they're allowed to see, how is that all configured in PostgREST, and what's happening behind the scenes?
[00:11:44] Ant: Yeah. The great thing about Postgres is it's got this concept of row level security, which, actually, I don't think I had ever really looked at until we were building out this auth feature. The security rules live in your database as SQL. So you do, like, a create policy query, and you say: anytime someone tries to select or insert or update, apply this policy.
And then how it all fits together is our auth server, GoTrue. Someone will basically make a request to sign in or sign up with email and password, and we create that user inside the database. They get issued a UUID, and they get issued a JSON Web Token, a JWT, which, you know, when they have it on the client side, proves that they are this UUID, that they have access to this data.
Then, when they make a request via PostgREST, they send the JWT in the authorization header. Then Postgres will pull out that JWT, check the sub claim, which is the UUID, and compare it to any rows in the database according to the policy that you wrote. So the most basic one is: you say, in order to access this row, it must have a UUID column, and it must match whatever is in the JWT.
So we basically push the authorization down into the database, which actually has, you know, a lot of other benefits, in that as you write new clients, you don't need to have it live on an API layer or on the client. Everything is kind of just managed from the database.
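As a concrete sketch of what that looks like in SQL: the table and column names here are hypothetical, and the exact setting name PostgREST uses to expose the decoded JWT claims has varied between versions, so treat this as illustrative rather than exact:

```sql
-- Hypothetical "todos" table owned by individual users.
create table todos (
  id      bigint generated always as identity primary key,
  user_id uuid not null,   -- intended to match the JWT's "sub" claim
  task    text not null
);

alter table todos enable row level security;

-- Only let a user select rows whose user_id matches the JWT's sub claim.
-- PostgREST makes the verified claims available via current_setting().
create policy "own rows only" on todos
  for select
  using (
    user_id = (current_setting('request.jwt.claims', true)::json ->> 'sub')::uuid
  );
```

Supabase wraps that `current_setting` expression in a helper function, so policies there usually read more simply, along the lines of `user_id = auth.uid()`.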
[00:13:33] Jeremy: So the UUID you mentioned, that represents the user, correct?
[00:13:39] Ant: Yeah.
[00:13:41] Jeremy: Does that map to a user in Postgres, or is there some other way that you're mapping those permissions?
[00:13:50] Ant: Yeah. So when you connect GoTrue, which is the auth server, to your Postgres database for the first time, it installs its own schema. So you'll have an auth schema, and inside will be auth.users, with a list of the users. It'll have, uh, auth.tokens, which will store all the access tokens that it's issued.
And one of the columns on the auth.users table will be the UUID, and then, whenever you write application-specific schemas, you can just do a foreign key relation to the auth.users table. So it all gets into schema design, and hopefully we do a good job of having some good education content in the docs as well.
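A minimal sketch of that foreign-key pattern (the `profiles` table and its columns are a made-up example, not something from the episode):

```sql
-- Application-specific table that references GoTrue's auth.users table.
create table profiles (
  id       uuid primary key references auth.users (id),
  username text unique not null,
  bio      text
);
```

Any application row can then be tied back to an authenticated user just by joining on that id.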
Because one of the things we struggled with from the start was: how much do we abstract away from SQL, away from Postgres, and how much do we educate? And we actually landed on the educate side, because, I mean, once you start learning about Postgres, it becomes kind of a superpower for you as a developer.
So we'd much rather have people discover us because we're a Firebase alternative for front-end devs, and then we help them with things like schema design, learning about row level security. Because ultimately, if you try and abstract that stuff away, it gets kind of crappy, and maybe not such a great experience.
[00:15:20] Jeremy: To make sure I understand correctly: so you have GoTrue, which is, uh, a Netlify open source project, and that GoTrue project creates some tables in your database that hold, like you've mentioned, the tokens and the different users. Somebody makes a request to GoTrue, like, here's my username, my password, and GoTrue gives them back a JWT. And then from your front end, you send that JWT to the PostgREST endpoint, and from that JWT, it's able to know which user you are, and then it uses Postgres' built-in row level security to figure out which rows you're allowed to bring back. Did I get that right?
[00:16:07] Ant: That is pretty much exactly how it works. And it's impressive that you garnered that without looking at a single diagram (laughs). But yeah, and obviously we provide a client library, supabase-js, which actually does a lot of this work for you, so you don't need to manually attach the JWT in a header.
If you've authenticated with supabase-js, then for every request sent to PostgREST after that point, the header will just be attached automatically, and you'll be in a session as that user.
[00:16:43] Jeremy: And the users that we're talking about, when we talk about Postgres' row level security, are those actual users in PostgreSQL? Like, if I was to log in with psql, could I actually log in with those users?
[00:17:00] Ant: They're not. You could potentially structure it that way, but it would be more advanced. It's basically just users in the auth.users table, the way it's currently done.
[00:17:12] Jeremy: I see. And Postgres' row level security is able to work with that table, so you don't need to have actual Postgres users.
[00:17:23] Ant: Exactly. And it's basically Turing complete. I mean, you can write extremely complex auth policies. You can say, you know, only give access to this particular admin group on a Thursday afternoon between six and 8:00 PM. You can get really as fancy as you want.
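Just to illustrate how far that can go, here is a sketch of the "Thursday between six and eight" idea as a policy. The `reports` and `admins` tables are invented for illustration, and `auth.uid()` is assumed to resolve to the requesting user's UUID, as it does on Supabase:

```sql
-- Only members of an admins table may read reports,
-- and only on Thursday evenings (server time).
create policy "thursday_evening_admins" on reports
  for select
  using (
    exists (select 1 from admins where admins.user_id = auth.uid())
    and extract(dow from now()) = 4                  -- 4 = Thursday
    and extract(hour from now()) between 18 and 19   -- 6:00 PM to 8:00 PM
  );
```

Because the policy is just a SQL expression, it can reference any table, function, or session state the database can see.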
[00:17:44] Jeremy: Is that all written in SQL or are there other languages they allow you to use?
[00:17:50] Ant: Yeah, the default is plain SQL. Within Postgres itself, I think you can use, like, a Python extension; there's a JavaScript extension, which is, I think, a subset of JavaScript. I mean, this is the thing with Postgres: it's super extensible, and people have probably got all kinds of interpreters. So yeah, you can use whatever you want, but the typical user will just use SQL.
[00:18:17] Jeremy: Interesting. And that applies to logic in general, I suppose, where if you were writing a Rails application, you might write Ruby, um, if you're writing a Node application, you write JavaScript. But you're saying, in a lot of cases with PostgREST, you're actually able to do what you want to do, whether that's serialization or mapping objects, all through SQL.
[00:18:44] Ant: Yeah, exactly, exactly. And then, obviously, there's a lot of other awesome stuff that Postgres has, like PostGIS: if you've got, like, a geo application, it'll load it up with geo types for you, which you can just use. If you're doing encryption and decryption, we just added pgsodium, which is a new and awesome cryptography extension.
And so you can use all of these, these all add like functions, like SQL functions which you can kind of use in, in any parts of the logic or in the role level policies. Yeah.
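For instance, enabling one of these extensions is a single statement, and the types and functions it adds are then usable anywhere, including inside policies. A small illustrative example with PostGIS (the coordinates are just sample values):

```sql
create extension if not exists postgis;

-- Distance in meters between two points, using the geography type
-- that PostGIS adds:
select st_distance(
  st_makepoint(-0.1276, 51.5072)::geography,  -- London
  st_makepoint(2.3522, 48.8566)::geography    -- Paris
);
```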
[00:19:22] Jeremy: and something I thought was a little unique about PostgREST is that I believe it's written in Haskell. Is that right?
[00:19:29] Ant: Yeah, exactly. And it makes it fairly inaccessible to me as a result. But the good thing is it's got a thriving community of its own, and there are people who contribute probably because it's written in Haskell. It's just a really awesome project and it's an excuse to contribute to it.
But yeah, I think I did probably the intro course, like many people, and beyond that it's just kind of inaccessible to me.
[00:19:59] Jeremy: yeah, I suppose that's the trade-off, right? You have a really passionate community of people who really want to use Haskell, and then you've got, I guess, the group like yourselves that looks at it and goes, oh, I don't know about this.
[00:20:13] Ant: I would love to have the time to invest in it, but it's not practical right now.
[00:20:21] Jeremy: You talked a little bit about the GoTrue project from Netlify. I think I saw on one of your blog posts that you actually forked it. Can you sort of explain the reasoning behind doing that?
[00:20:34] Ant: Yeah, initially it was because we were trying to move extremely fast. So we did Y Combinator in 2020, and when you do Y Combinator, you get a group partner, as they call it, one of the partners from YC, and they add a huge amount of external pressure to move very quickly. And our biggest feature that we were working on in that period was auth.
And we just kept getting the question of, when are you going to ship auth? You know, and every single week we'd be like, we're working on it, we're working on it. And one of the ways we could do it was we just had to iterate extremely quickly, and we didn't really have the time to upstream things correctly.
And actually, the way we use it in our stack is slightly different. They connected it to MySQL, we connected to Postgres, so we had to make some structural changes to do that. The dream would be that we now spend some time upstreaming a lot of the changes, and hopefully we do get around to that.
But yeah, the pace at which we've had to move over the last year and a half has been kind of scary, and that's the main reason. But hopefully now that we're a little bit more established, we can hire some more people to just focus on GoTrue and bringing the two forks back together.
[00:22:01] Jeremy: it's just a matter of, like you said, speed, I suppose. Because with PostgREST, you chose to continue working off of the existing open source project, right?
[00:22:15] Ant: Yeah, exactly, exactly. And I think the other thing is it's not a major part of Netlify's business, as I understand it. I think if it was, and if both companies had more resource behind it, it would make sense to focus on a single codebase. But both companies don't contribute as much resource as we would like to. For me, though, it's one of my favorite parts of the stack to work on, because it's written in Go and I kind of enjoy how it all fits together.
So yeah, I like to dive in there.
[00:22:55] Jeremy: What about Go, or what about how it's structured, do you particularly enjoy about that part of the project?
[00:23:02] Ant: So I actually learned Go through GoTrue, and I have a Python and C++ background. And I hate the fact that I rarely get to use Python and C++ in my day to day job, it's obviously a lot of TypeScript. When we inherited this code base, as I was picking it up, it just reminded me a lot of the things I loved about Python and C++, and I found the tooling around it to be exceptional as well. You just do a small amount of config, and it makes it very difficult to write bad code, if that makes sense.
The compiler will just boot you back if you try and do something silly, which isn't necessarily the case with JavaScript. I think TypeScript is a little bit better now, but yeah, it just reminded me a lot of my Python and C++ days.
[00:24:01] Jeremy: Yeah, I'm not too familiar with Go, but my understanding is that there's a formatter that's a part of the language, so there's kind of a consistency there. And then the language itself tries to get people to build things in the same way, or maybe have simpler ways of building things. I don't know, maybe that's part of the appeal.
[00:24:25] Ant: Yeah, exactly. And the package manager as well is great. It just does a lot of the importing automatically, and makes sure all the declarations at the top are formatted correctly and are definitely there. So yeah, just all of that toolchain is really easy to pick up.
[00:24:46] Jeremy: Yeah. And I think with compiled languages as well, when you have the static type checking by the compiler, not having things blow up at run time, that's just such a big relief, at least for me in a lot of cases.
[00:25:00] Ant: And I just love the dopamine hit of when you compile something and it actually compiles. I lose that working with JavaScript.
[00:25:11] Jeremy: for sure. One of the topics you mentioned earlier was how Supabase provides real-time database updates, which is something that, as far as I know, is not natively a part of Postgres. So I wonder if you could explain a little bit about how that works and how it came about.
[00:25:31] Ant: Yeah. So Postgres, when you add replication databases, the way it does it is it writes everything to this thing called the write-ahead log, which is basically all the changes that are going to be applied to the database. When you connect a replication database, it basically streams that log across, and that's how the replica knows what changes to apply.
So we wrote a server which basically pretends to be a Postgres replica, receives the write-ahead log, encodes it into JSON, and then you can subscribe to that server over WebSockets. And you can choose whether to subscribe to changes on a particular schema or a particular table or particular columns, and even do equality matches on rows and things like this.
And then we recently added the row level security policies to the real-time stream as well. That was something that took us a while, because it was probably one of the largest technical challenges we've faced. But now the real-time stream is fully secure, and you can apply the same policies that you apply over the CRUD API as well.
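The mechanism being described, consuming the write-ahead log through logical replication, can be seen from plain psql as well. This sketch uses the common wal2json output plugin purely for illustration; Supabase's Realtime server does its own decoding in Elixir, and the `todos` table here is hypothetical:

```sql
-- Requires wal_level = logical and the wal2json plugin installed.
select pg_create_logical_replication_slot('demo_slot', 'wal2json');

insert into todos (task) values ('try logical decoding');

-- Peek at the decoded changes as JSON without consuming them:
select data from pg_logical_slot_peek_changes('demo_slot', null, null);

-- Drop the slot when done, or it will retain WAL indefinitely:
select pg_drop_replication_slot('demo_slot');
```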
[00:26:48] Jeremy: So for that part, did you have to look into the internals of Postgres and how it did its row level security and try to duplicate that in your own code?
[00:26:59] Ant: Yeah, pretty much. I mean, it's fairly complex, and there's a guy on our team who, well, for him it didn't seem as complex, let's say (laughs). But yeah, that's pretty much it. It's effectively a Postgres extension itself, which interprets those policies and applies them to the write-ahead log.
[00:27:26] Jeremy: and this piece that you wrote, that's listening to the write-ahead log, what was it written in, and how did you choose that language or that stack?
[00:27:36] Ant: Yeah. That's written in Elixir, which runs on the Erlang VM and is very horizontally scalable. Any application that you write in Elixir can kind of just scale horizontally with the message passing, and, you know, go into the billions and it's no problem. So it just seemed like a sensible choice for this type of application, where you don't know how large the WAL is going to be. It could just be a few changes per second, it could be a million changes per second, and then you need to be able to scale out. I think Paul, who's my co-founder, originally wrote the first version of it, and I think he wrote it as an excuse to learn Elixir, which is probably a lot of how PostgREST ended up being Haskell, I imagine.
But it's meant that, while the Elixir community is still relatively small, it's a group of very passionate and very highly skilled developers. So when we hire from that pool, everyone who comes on board is just really good and really enjoys working with Elixir.
So it's been a good source for hires as well, just using those tools.
[00:28:53] Jeremy: with a feature like this, I'm assuming it's where somebody goes to their website, they make a WebSocket connection to your application, and they receive the updates that way. How far have you seen you're able to push that, in terms of connections, in terms of throughput, things like that?
[00:29:12] Ant: Yeah, I don't actually have the numbers at hand. We have a team focused on maximizing that, obviously, but I don't have those numbers right now.
[00:29:24] Jeremy: one of the last things you've got on your website is a storage product. And I believe it's written in TypeScript, so I was curious. We've got PostgREST, which is in Haskell. We've got GoTrue in Go. We've got the real-time database part in Elixir.
And so with storage, how did we finally get to TypeScript?
[00:29:50] Ant: (Laughs) Well, the policy we kind of landed on was best tool for the job. Again, the good thing about being open source is we're not resource constrained by the number of people who are in our team, it's by the number of people who are in the community and willing to contribute. And so for that, I think one of the guys just went through a few different options. We could have gone with Go, just to keep it in line with a couple of the other APIs. But we just decided, you know, for everyone in the team, TypeScript is kind of just a given. And again, it was kind of down to speed, like what's the fastest we can get this up and running, and if we used TypeScript, that was the best solution there. But yeah, we just always go with whatever is best.
We don't worry too much about the resources we have, because the open source community has just been so great in helping us build Supabase. And building Supabase is like building five companies at the same time, actually, because each of these vertical stacks could be its own startup, like the auth stack and the storage layer and all of this stuff.
And each does have its own dedicated team. So yeah, we're not too worried about the variation in languages.
[00:31:13] Jeremy: And the storage layer, is this basically a wrapper around S3, or what is that product doing?
[00:31:21] Ant: Yeah, exactly. It's a wrapper around S3. It would also work with all of the S3-compatible storage systems, there's a few, Backblaze and a few others. So if you wanted to self-host and use one of those alternatives, you could. We just have everything in our own S3 buckets inside of AWS.
And then the other awesome thing about the storage system is that we store the metadata inside of Postgres, so basically the object tree of what buckets and folders and files are there. You can write your row level policies against the object tree. So you can say this user should only access this folder and its children, which was kind of an accident, we just landed on that. But it's one of my favorite things now about writing applications on Supabase, the row level policies kind of work everywhere.
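A sketch of what such a storage policy looks like, since object metadata lives in a `storage.objects` table. The per-user folder convention and the `avatars` bucket name are illustrative assumptions, not a fixed Supabase schema:

```sql
-- Let each authenticated user read only files under a folder named
-- after their own user id, e.g. "avatars/<uid>/photo.png".
create policy "users read their own folder"
on storage.objects for select
using (
  bucket_id = 'avatars'
  and (storage.foldername(name))[1] = auth.uid()::text
);
```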
[00:32:21] Jeremy: Yeah, it's interesting. It sounds like everything, whether it's the storage or the authentication, all comes back to Postgres, right? It's all using the row level security, it's using everything that you put into the tables there, and everything's just kind of digging into that to get what it needs.
[00:32:42] Ant: Yeah. And that's why I say we are a database company, we are a Postgres company, we're all in on Postgres. We got asked in the early days, oh, would you also make it MySQL compatible, compatible with something else? But the amount of features Postgres has, if we just continue to leverage them, it just makes the stack way more powerful than if we tried to go thin across multiple different databases.
[00:33:16] Jeremy: And so that kind of brings me to, you mentioned how you're a Postgres company. So when somebody signs up for Supabase and they create their first instance, what's happening behind the scenes? Are you creating a Postgres instance for them in a container, for example? How do you size it, that sort of thing?
[00:33:37] Ant: Yeah. So it's basically just EC2 under the hood for us. We have plans eventually to be multi-cloud, but again, coming down to speed of execution, the fastest way was to just spin up a dedicated Postgres instance per user on EC2. We do also package all of the APIs together in a second EC2 instance.
But we're starting to break those out into clustered services. So for example, not every user will use the storage API, so it doesn't make sense to run it for every user regardless. We've made that application code multi-tenant, and now we just run a huge global cluster which people connect through to access their S3 bucket, basically. And we have plans to do that for the other services as well.
So right now you get two EC2 instances, but over time it will be just the Postgres instance. And we wanted to give everyone a dedicated instance, because there's nothing worse than sharing database resource with all the users, especially when you don't know how heavily they're going to use it, whether they're going to be bursty.
So I think one of the things we just said from the start is everyone gets a Postgres instance, and you get access to it as well. You can use your Postgres connection string to log in from the command line and kind of do whatever you want. It's yours.
[00:35:12] Jeremy: so did I get it right that when I sign up and create a Supabase account, you're actually creating an EC2 instance for me specifically? So every customer gets their own isolated instance, their own CPU, their own RAM, that sort of thing?
[00:35:29] Ant: Yeah, exactly, exactly. And the way we've set up the monitoring as well is that we can expose basically all of that to you in the dashboard, so you have some control over the resources you want to use. If you want a more powerful instance, we can do that. A lot of that stuff is automated, so if someone scales beyond the allocated disk size, the disk will automatically scale up by 50% each time. And we're working on automating a bunch of these other things as well.
[00:36:03] Jeremy: so is it where, when you first create the account, you might create, for example, a micro instance, and then you have internal monitoring tools that see, oh, the CPU is getting hit pretty hard, so we need to migrate this person to a bigger instance, that kind of thing?
[00:36:22] Ant: Yeah, pretty much exactly.
[00:36:25] Jeremy: And is that something that the user would even see, or is it the case where you send them an email and go, hey, we noticed you're hitting the limits here, here's what's going to happen?
[00:36:37] Ant: Yeah.
In most cases it's handled automatically. There are people who come in and from day one they say, here are my requirements, I'm going to have this much traffic and I'm going to have, you know, a hundred thousand users hitting this every hour. In those cases we will over-provision from the start.
But if it's just the self-service case, then it will start on a smaller instance and upgrade over time. And this is one of our biggest challenges over the next five years: we want to move to a more scalable, cloud native Postgres. But the cool thing about this is there's a lot of different companies and individuals working on this and upstreaming into Postgres itself. So for us, we don't need to, and we would never want to, fork Postgres and try to separate the storage and the compute. Instead we're going to fund people who are already working on this, so that it gets upstreamed into Postgres itself and it's more cloud native.
[00:37:46] Jeremy: Yeah. So we talked a little bit about how Firebase was the original inspiration, and when you work with Firebase, you don't think about an instance at all, right? You just put data in, you get data out. And it sounds like in this case, you're kind of working from the standpoint of, we're going to give you this single Postgres instance, and as you hit the limits, we'll give you a bigger one. But at some point you will hit a limit where just that one instance is not enough, and I wonder if you have any plans for that, or if you're doing anything currently to handle that.
[00:38:28] Ant: Yeah. So the medium-term goal is to do replication, like horizontal scaling. We do that for some users already, but we manually set that up. We do want to bring that to the self-serve model as well, where you can just choose from the start: I want replicas in these zones and in these different data centers.
But then, like I said, the long-term goal is that it's not based on horizontally scaling a number of instances, it's that Postgres itself can scale out. And honestly, at the rate at which the Postgres community is working, I think we'll be there in two years. And if we can contribute resource towards that goal, we'd love to do that. But for now, we're working on this intermediate solution of what people already do with Postgres, which is, you know, having replicas to make it highly available.
[00:39:30] Jeremy: And with that, I suppose, at least in the short term, the goal is that your monitoring software and your team are handling scaling up the instance or creating the read replicas, so to the user it for the most part feels like a managed service. And then the next step would be to get something more similar to maybe Amazon's Aurora, I suppose, where it just kind of, you pay per use.
[00:40:01] Ant: Yeah, exactly, exactly. Aurora was kind of the goal from the start. It's just a shame that it's proprietary, obviously.
[00:40:08] Jeremy: right.
Um, but it sounds,
[00:40:10] Ant: the world would be a better place if Aurora was open source.
[00:40:15] Jeremy: yeah. And it sounds like, as you said, there's people in the open source community that are trying to get there, it'll just take time. To do all this, making it feel seamless, making it feel like a serverless experience even though internally it really isn't, I'm guessing you must have a fair amount of monitoring, or ways that you're making these decisions.
I wonder if you can talk a little bit about, you know, what are the metrics you're looking at, and what are the applications you have to help you make these decisions?
[00:40:48] Ant: Yeah, definitely. So we started with Prometheus, which is a metrics gathering tool, and then we moved to VictoriaMetrics, which was just easier for us to scale out. I think soon a hundred thousand Postgres databases will have been deployed on Supabase, so definitely some scale, and this kind of tooling needs to scale to that as well. Then we have agents kind of everywhere, on each application and on the database itself, and we listen for things like the CPU and the RAM and the network IO. We also poll Postgres itself. There's an extension called pg_stat_statements, which will give us information about the intensive queries that are running on that box.
So we just collect as much of this as possible, which we then obviously use internally. We set alerts to know when we need to upgrade in a certain direction, but we also have an endpoint that the dashboard subscribes to for these metrics, so the users themselves can see a lot of this information.
I think at the moment we expose a lot of the RAM and CPU stuff, but we're working on adding more and more of these observability metrics. Because it also helps with, let's say, you might be lacking an index on a particular table and not know about it.
And if we can expose that to you and give you alerts about that kind of thing, then it obviously helps with the developer experience as well.
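Both of the signals mentioned here can be pulled straight from Postgres. A sketch, assuming pg_stat_statements has been added to `shared_preload_libraries` (and a Postgres 13+ column name; older versions call it `total_time`):

```sql
create extension if not exists pg_stat_statements;

-- The ten most expensive queries by cumulative execution time:
select query, calls, total_exec_time, mean_exec_time
from pg_stat_statements
order by total_exec_time desc
limit 10;

-- A rough missing-index heuristic: tables that are sequentially
-- scanned far more often than they are index-scanned.
select relname, seq_scan, coalesce(idx_scan, 0) as idx_scan, seq_tup_read
from pg_stat_user_tables
where seq_scan > coalesce(idx_scan, 0)
order by seq_tup_read desc
limit 10;
```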
[00:42:29] Jeremy: Yeah. And that brings me to something that I hear from platform-as-a-service companies, where if a user has a problem, whether that's a crash or a performance problem, sometimes it can be difficult to distinguish between, is it a problem in their application, or is this a problem in Supabase? I wonder how your support team approaches that.
[00:42:52] Ant: Yeah, no, it's a great question, and it's definitely something we deal with every day. I think because of where we're at as a company, we actually have a huge advantage in that we can provide really good support. So anytime an engineer joins Supabase, we tell them your primary job is actually frontline support; everything you do afterwards is secondary.
And so everyone does a four-hour shift per week of working directly with the customers, to help determine this kind of thing. And where we are at the moment is we're happy to dive in and help people with their application code, because it helps our engineers learn about how it's being used and where the pitfalls are, where we need better documentation, where we need education.
So that is all part of the product at the moment, actually. And like I said, because we're not a 10,000-person company, it's an advantage that we have, that we can deliver that level of support at the moment.
[00:44:01] Jeremy: what are some of the most common things you see happening? I would expect you mentioned indexing problems, but I'm wondering if there's any specific things that just come up again and again.
[00:44:15] Ant: I think the most common is people not batching their requests. So they'll write an application which needs to pull 10,000 rows, and they send 10,000 requests (laughs). That's a typical one for people just getting started, maybe. And then I think the other thing we faced in the early days was people storing blobs in the database, which we obviously solved by introducing file storage. People would be trying to store 50-megabyte, hundred-megabyte files in Postgres itself, and then asking why the performance was so bad.
So I think we've mitigated that one by introducing the blob storage.
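The batching anti-pattern described above reduces to one round trip versus thousands. In plain SQL the fix is a single set-based query (the table and ids here are illustrative):

```sql
-- Anti-pattern: one request per row, repeated 10,000 times.
-- select * from items where id = 1;

-- Better: fetch the whole batch in one request.
select * from items where id = any(array[1, 2, 3]);  -- pass the full id list
```

With a client library the equivalent is a single filtered select, e.g. supabase-js's `.in('id', ids)`, rather than issuing one request per id.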
[00:45:03] Jeremy: and you mentioned you have over a hundred thousand instances running. I imagine there have to be cases where an incident occurs, where something doesn't go quite right. I wonder if you could give an example of one and how it was resolved.
[00:45:24] Ant: Yeah, it's a good question. We've improved the systems since then, but there was a period where our real-time server wasn't able to handle really large write-ahead logs. So there was a period where people would just make tons and tons of requests and updates to Postgres, and the real-time subscriptions were failing. But like I said, we have some really great Elixir devs on the team, so they were able to jump on that fairly quickly, and now the application is way more scalable as a result. And that's just kind of how the support model works: you have a period where everything is breaking, and then you can just tackle these things one by one.
[00:46:15] Jeremy: Yeah, I think anybody at an early startup is going to run into that, right? You put it out there, and then you find out what's broken, you fix it, and you just get better and better as it goes along.
[00:46:28] Ant: Yeah. And the funny thing was, this model of deploying EC2 instances, we had that in like the first week of starting Supabase, just me and Paul. It was never intended to be the final solution, we just kind of did it quickly to get something up and running for our first handful of users. But it's scaled surprisingly well.
And actually, the things that broke as we started to get a lot of traffic and a lot of attention were just silly things. Like, we give everyone their own domain when they start a new project, so you'll have your project ref dot supabase dot co. And the things that were breaking were like, you know, we'd run out of subdomains with our DNS provider. And those things always happen in periods of intense traffic.
So we'd be on the front page of Hacker News, or we had a TechCrunch article, and then you discover that you've run out of subdomains and the last thousand people couldn't deploy their projects. That's always a fun challenge, because you're then dependent on the external providers and their support systems as well.
So yeah, I think we did a surprisingly good job of putting in good infrastructure from the start. But all of these crazy things just break when you get a lot of traffic.
[00:48:00] Jeremy: Yeah, I find it interesting that you mentioned how you started with creating the EC2 instances and it turned out that just worked. I wonder if you could walk me through a little bit about how it worked in the beginning. Like, was it the two of you going in and creating instances as people signed up, and then how did it go from there to where it is today?
[00:48:20] Ant: yeah. So there's a good story about our first user, actually. Me and Paul used to contract for a company in Singapore, which was an NFT company, so we knew the lead developer very well, and we also still had the Postgres credentials on our own machines. And the other funny thing is, when we first started, we didn't intend to host the database. We thought we were just going to host the applications, which would connect to your existing Postgres instance.
So what we did was we hooked up the applications to the Postgres instance of this startup that we knew very well. And then we took the bus to their office, and we sat with the lead developer, and we said, look, we've already set this thing up for you. What do you think?
You know, you think like, ah, we've got the best thing ever, but it's not until you put it in front of someone and you see them contemplating it that you're like, oh, maybe it's not so good. Maybe we don't have anything. And we had that moment of panic of, maybe this isn't great.
And then what happened was, he didn't use us. He didn't become a Supabase user. He asked to join the team.
[00:49:45] Jeremy: nice, nice.
[00:49:46] Ant: that was a good kind of moment where we thought, okay, maybe we have got something, maybe this isn't terrible. So yeah, he became our first employee.
[00:49:59] Jeremy: And so yeah, that case was, you know, the very beginning, you set everything up from scratch. Now that you have people signing up, and I don't know how many signups you get a day, did you write custom infrastructure or applications to do the provisioning, or is there an open source project that you're using to handle that?
[00:50:21] Ant: Yeah, it's actually mostly custom. And you know, AWS does a lot of the heavy lifting for you, they just provide you with a bunch of API endpoints. So a lot of that is just written in TypeScript, fairly straightforward, and like I said, it was never intended to be the thing that lasts two years into the business.
But it's scaled surprisingly well. And I'm sure at some point we'll swap it out for some orchestration tooling like Pulumi or something like this, but actually, what we've got just works really well.
[00:50:59] Ant: Because we're so into Postgres, our queuing system is pg-boss, which keeps the queue inside Postgres. And then we have a fleet of workers, which we manage on ECS, so it's just a bunch of VMs, basically, which subscribe to the queue that lives inside the database.
And they perform all the work, whether it be project creation, deletion, modification, a whole suite of these things. Yeah.
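The core pattern behind a Postgres-backed queue like this is `FOR UPDATE SKIP LOCKED`, which lets many workers claim jobs concurrently without blocking each other. This sketch uses an illustrative `jobs` table, not pg-boss's actual schema:

```sql
create table jobs (
  id      bigserial primary key,
  kind    text  not null,              -- e.g. 'create_project'
  payload jsonb not null,
  state   text  not null default 'queued'
);

-- Each worker runs this to claim exactly one job; rows locked by
-- other workers are skipped rather than waited on.
update jobs
set state = 'active'
where id = (
  select id from jobs
  where state = 'queued'
  order by id
  for update skip locked
  limit 1
)
returning id, kind, payload;
```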
[00:51:29] Jeremy: very cool. And so even your provisioning is, is based on Postgres.
[00:51:33] Ant: Yeah, exactly. Exactly (laughs) .
[00:51:36] Jeremy: I guess in that case, did you say you're using the write-ahead log there in order to get notifications?
[00:51:44] Ant: We do use Realtime, and this is the fun thing about building Supabase: we use Supabase to build Supabase, and a lot of the features start as things that we build for ourselves. So for the observability features, we have a huge logging division. We were very early users of a tool called Logflare, which is also written in Elixir.
It's basically a log sink backed by BigQuery. And we loved it so much, and became such Logflare power users, that we decided to eventually acquire the company. And now we can just offer Logflare to all of our customers as part of using Supabase, so you can query your logs and get really good business intelligence on what your users are consuming from your database.
[00:52:35] Jeremy: the Logflare you're mentioning, though, you said that that's a log sink, and that's actually not going to Postgres, right? That's going to a different type of store?
[00:52:43] Ant: Yeah, that is going to BigQuery, actually.
[00:52:46] Jeremy: Oh, big query. Okay.
[00:52:47] Ant: yeah, and maybe eventually. And this is the cool thing about watching the Postgres progression: it's bringing transactional and analytical databases together. It's traditionally been a great transactional database, but if you look at a lot of the changes that have been made in recent versions, it's becoming closer and closer to an analytical database.
So maybe at some point we will use it, but BigQuery works just great.
[00:53:18] Jeremy: Yeah, it's interesting to see. I know that we've had episodes on different extensions to Postgres where, I believe, they change out how the storage works. So it's really interesting how it's this one database, but it seems like it can take so many different forms.
[00:53:36] Ant: It's just so extensible, and that's why we're so bullish on it. Because, okay, maybe it wasn't always the best database, but now it seems like it is becoming the best database, and at the rate at which it's moving, where's it going to be in five years? We're just very bullish on Postgres, as you can tell from the amount of mentions it's had in this episode.
[00:54:01] Jeremy: Yeah, we'll have to count how many times it's been said. I'm sure it's up there. Is there anything else we missed, or that you think we should have mentioned?
[00:54:12] Ant: No, some of the things we're excited about are cloud functions. It's the thing we just get asked for the most; anytime we post anything on Twitter, you're guaranteed to get a reply which is like, "when functions?" And we're very pleased to say that it's almost there. So that will hopefully be a really good developer experience. Also, we launched a GraphQL Postgres extension where the resolver lives inside of Postgres.
And that's still in early alpha, but I'm quite excited for when we can start offering that on the hosted platform as well. People will have that option to use GraphQL instead of, or as well as, the RESTful API.
[00:55:02] Jeremy: The common thread here is that PostgreSQL, you're able to take it really, really far, right? In terms of scale up. Eventually you'll have the read replicas. Hopefully you'll have some kind of, I don't know what you would call it, Aurora, but it's almost like self-provisioning, maybe that's not how you'd describe it.
But I wonder as a, as a company, like we talked about big query, right? I wonder if there's any use cases that you've come across, either from customers or in your own work where you're like, I just, I just can't get it to fit into Postgres.
[00:55:38] Ant: I think not very often, but sometimes we will respond to support requests and recommend that people use Firebase. It's rare, but if they really do have large amounts of unstructured data, which, you know, document storage is kind of perfect for, we'll just say, you know, maybe you should just use Firebase.
So we definitely come across things like that. And like I said, we love Firebase, so we're definitely not trying to destroy it as a tool. I think it has its use cases where it's an incredible tool, yeah, and it provides a lot of inspiration for what we're building as well.
[00:56:28] Jeremy: All right. Well, I think that's a good place to wrap it up, but where can people hear more about you, and more about Supabase?
[00:56:38] Ant: Yeah, so Supabase is at supabase.com. I'm on Twitter at Ant Wilson. Supabase is on Twitter at supabase. Just hit us up, we're quite active on there. And then definitely check out the repos, github.com/supabase. There's lots of great stuff to dig into, as we discussed. There's a lot of different languages, so kind of whatever you're into, you'll probably find something where you can contribute.
[00:57:04] Jeremy: Yeah, and we sorta touched on this, but I think everything we've talked about, with the exception of the provisioning part and the monitoring part, is all open source. Is that correct?
[00:57:16] Ant: Yeah, exactly.
And yeah, hopefully everything we build moving forward, including functions and GraphQL, will continue to be open source.
[00:57:31] Jeremy: And then I suppose the one thing I did mean to touch on is: what is the license for all the components you're using that are open source?
[00:57:41] Ant: It's mostly Apache 2.0 or MIT. And then obviously Postgres has its own Postgres license. So as long as it's one of those, then we're not too precious. As I said, we inherit a fair amount of projects, we contribute to and adopt projects, so as long as it's very permissive, then we don't care too much.
[00:58:05] Jeremy: As far as the projects that your team has worked on, I've noticed that over the years, we've seen a lot of companies move to things like the business source license or there's, there's all these different licenses that are not quite so permissive. And I wonder like what your thoughts are on that for the future of your company and why you think that you'll be able to stay permissive.
[00:58:32] Ant: Yeah, I really, really hope that we can stay permissive forever. It's a philosophical thing for us. You know, when we started the business, we were just, as individuals, very into the idea of open source. And you know, if AWS come along at some point and offer hosted Supabase on AWS, then it will be a signal that we're doing something right. And at that point I think we just need to be the best team to continue to move Supabase forward. And if we are that, and I think we will be, then hopefully we will never have to tackle this licensing issue.
[00:59:19] Jeremy: All right. Well, I wish you, I wish you luck.
[00:59:23] Ant: Thanks. Thanks for having me.
[00:59:25] Jeremy: This has been Jeremy Jung for software engineering radio. Thanks for listening.
Jason Swett is the author of the Complete Guide to Rails Testing. We covered Jason's experience with testing while building relatively small Ruby on Rails applications. Our conversation applies to just about any language or framework so don't worry if you aren't familiar with Rails.
A few topics covered:
- Listen to advice but be aware of its context. Something good for a large project may not apply to a small one
- Fast feedback loops help us work quicker and tests are great for this
- If you don't involve things like the database in any of your tests your application may not work at all despite your tests passing
- You may not need to worry about scaling at the start for smaller or internal applications
- Try to break features into the smallest pieces possible so they can be checked in and reviewed quickly
- Jason doesn't remember the difference between a stub and a mock because he rarely uses them
Related Links:
- Code with Jason
- The Complete Guide to Rails Testing
- Code With Jason Podcast
Transcript:
[00:00:00] Jeremy: Today I'm talking to Jason Swett. He's the author of The Complete Guide to Rails Testing, a frequent trainer and conference speaker, and he's the host of the Code with Jason podcast. So Jason, welcome to Software Sessions.
[00:00:13] Jason: Thanks for having me.
[00:00:15] Jeremy: From listening to your podcast, I get a sense that the size of the projects you work on, they're relatively modest.
Like they're not a super huge thing. They may be something that you can fit all within your head. And I was wondering if you could talk a little bit to that first, so that we kind of get where your perspective is and what the types of projects you work on are.
[00:00:40] Jason: Yeah. Good question. So that is true. Most of my jobs have been at small companies and I think that's probably typical of the typical developer because most businesses in the world are small businesses. You know, there's, there's a whole bunch of small businesses for every large business. And so most of the code bases I've worked on have been not particularly huge.
And most of the teams I've worked on have been relatively small, and sometimes so small that it's just me; I'm the only person working on the application. I don't really know any different, so I can't really compare it to working on a larger application. I have worked at AT&T, so that was a big place, but I was at AT&T just working on my own solo project, so that wasn't a big code base either.
So yeah, that's been what my experience has been like.
[00:01:36] Jeremy: Yeah. And I, I think that's interesting that you mentioned most people work in that space as well, because that's basically where I fall as well. So when I listened to your podcast and I hear you talking about like, oh, I have a, I have a rails project where I just have a single server and you know, I have a database and rails, and maybe I have nginx in front, maybe redis it's sort of the scale that I'm familiar with versus when I hear podcasts or articles, you know, I'm reading where they're talking about, oh, we have 500 microservices or we have 200 instances of the application.
That's, that's not a space that I've, I've worked in. So I, I found it helpful to, to hear, you know, from you on your show that like, Hey, you know, not everybody is working on these gigantic projects.
[00:02:28] Jason: Yeah. Yeah. It's not terribly relatable when you hear about those huge projects.
And obviously, sometimes, maybe people earlier in their career can get the wrong idea about what's applicable to their situation. I feel like one of the most dangerous kinds of advice is advice that's good advice, but it's good advice for somebody else.
And I've been a victim of that, where I get some advice and maybe it's genuinely good advice, but it's not good advice for me, where I am, doing what I'm doing. And so I apply the advice, but it's not the right thing, and so it doesn't work out for me. So I'm always careful to, like, asterisk a lot of the things I say, where it's like, hey, this is good advice if you're in this particular situation, but maybe not for everybody.
And really the truth is I, I try not to give advice these days because like advice is dangerous stuff for that very reason.
[00:03:28] Jeremy: so, so when you mentioned you try not to give advice and you have this book, the complete guide to rails testing, would you not describe what's in the book as advice? I'm kind of curious what the distinction is there.
[00:03:42] Jason: Yeah, Jeremy, right after I said that, I'm like, what am I talking about? I give all kinds of advice. So forget I said that, I totally give advice. But maybe not in certain things, like business advice or anything like that. I do give a lot of advice around testing and various programming things.
So, yeah, ignore that part of what I said.
[00:04:03] Jeremy: Something that I found a little bit unique about Rails testing was that a lot of the tests are centered around, I guess you could call it, like a full integration test, right? Because I noticed when working with Rails, if I write a test, a lot of times it's talking to the database; if I have an API or I have a website, it's actually talking to the API. So it's actually going through all the layers and spinning up a database and all that. And I wonder if you knew how that worked. Like, each time you run a test, is it creating a new database, so that each test is isolated? Or how does all that stuff actually work?
[00:04:51] Jason: Yeah, good question. First. I want to mention something about terminology. So I think one of the most important things for somebody who's new to testing to learn is that in our industry, we don't have a consensus around terminology. So what you call an integration test might be different from what I call an integration test.
The thing you just described as an integration test, I might call an acceptance test. Although I happen to also call it an integration test cause I use that terminology too, but I just wanted to give that little asterisk for the listener, because if they're like, wait, I thought an integration test was this.
And not that. Anyway, you asked how does that work? So it is true that with those types of Rails tests, and just to add more terminology into the mix, they call those system tests or system specs, depending on what framework you're using. But those are the tests that actually instantiate a browser and, simulating user input, exercise the UI of the application.
And those are the kinds of tests that show you that everything works together. And mechanically, how that works: one layer of it is that each test runs in a database transaction. So in order to run a certain test, maybe you need certain records, like a user. And then, I don't know, if it's a scheduling test, you might need to create an appointment and whatever. All those records that you create specifically for that test, that's happening inside of a database transaction. And then at the end of the test, the transaction is aborted, so that none of the data you create during the test actually gets persisted to the database. Then regarding the whole database: it's not actually creating a new database instance at the start of each test and then blowing it away.
It's still the same database instance; it's just that the data inside of each test is not being persisted.
[00:07:05] Jeremy: Okay. So when you run what you would call, I guess you called it an acceptance test, right? Where it's opening up your website, it's clicking through the website, creating records, things like that. That's happening in a database instance that's created for, I guess, all your tests, that all your tests get to reuse, and Rails is automatically wrapping your test in a transaction.
So even if you're doing five or 10 database queries, at the end of all that they all get rolled back, because they're all within the same transaction.
[00:07:46] Jason: Exactly. And the reason why we want to do that is because of a testing principle that you want your tests to be runnable in any order. And the key thing is you want your tests to be deterministic. So deterministic means that the starting state determines the end state, and it's the same every time, no matter what.
So if you have tests A, B, and C, it shouldn't be the case that you can run them in the order ABC and they all pass, but if you do it CBA, then test A fails. Because it should only fail if something's actually wrong; it shouldn't fail for some other reason, like the order in which you run the tests. And so to ensure that property of determinism, we need to make it so that each test doesn't leak into the other tests.
Cause imagine if that database transaction thing didn't happen. And it's only incidental that that's achieved via database transactions; it could conceivably be achieved some other way. That's just how this happens to work in this particular case. But imagine if no measure was taken to clean up afterward, and I ran a test and it generated an appointment.
And then the test that runs after that does something that involves doing a count of appointments or something like that. And maybe, coincidentally, my second test passes because I've always run the tests in a certain order. And so, unbeknownst to me, test B only passes because of what I did in test A. That's bad, because now the thing that's happening is different from what I think is happening.
And then if we flipped it and ran test B and then test A, it wouldn't work anymore. So that's why we make each test isolated, so it can be deterministic.
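The rollback-per-test idea Jason describes can be sketched in plain Ruby. This is a toy stand-in, not how Rails actually does it (Rails wraps each test in a real database transaction and aborts it); the FakeDatabase class and its method names are made up for illustration.

```ruby
# A minimal sketch of per-test rollback keeping tests deterministic.
# FakeDatabase is hypothetical; Rails uses real DB transactions instead.
class FakeDatabase
  attr_reader :appointments

  def initialize
    @appointments = []
  end

  # Run a block inside a "transaction": snapshot the data first,
  # then restore the snapshot afterward, discarding the block's writes.
  def with_rollback
    snapshot = @appointments.dup
    yield
  ensure
    @appointments = snapshot
  end

  def create_appointment(name)
    @appointments << name
  end
end

db = FakeDatabase.new

# "Test A": creates an appointment and asserts on the count.
db.with_rollback do
  db.create_appointment("checkup")
  raise "test A failed" unless db.appointments.count == 1
end

# "Test B": sees a clean database no matter whether A ran first,
# because A's write was rolled back. Order no longer matters.
db.with_rollback do
  raise "test B failed" unless db.appointments.count == 0
end

puts "both tests pass in either order"
```

Because every test's writes are discarded, running A then B gives the same result as B then A, which is exactly the determinism property described above.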
[00:09:51] Jeremy: And I wonder if you've worked with any other frameworks or any other languages, and if you found that the approach in those frameworks or languages is similar to Rails, like where it creates the transaction for you, does the rollback for you, and all of that.
[00:10:08] Jason: Good question. I have to plead ignorance. I've dabbled a little bit in testing and some other languages or frameworks, but not enough to be able to say anything smart about it.
[00:10:22] Jeremy: Yeah, I mean in my experience and of course there are many different frameworks that I'm not familiar with, but in a lot of cases, I I've seen that they don't have this kind of behavior built in, like, they'll provide you a way to test your application, but it's up to you if you want to write code that will wrap everything in a transaction or create a new database instance per test, things like that.
That's all left up to you. so I, I think it's interesting that that rails makes that decision for you and makes it to where you don't really have to think about that or make that decision. And for me personally, I found that really helpful.
[00:11:09] Jason: Yeah, it's really nice. It's a decision that not everybody is going to be on board with. And by that decision, I mean the general decision of Rails to make a lot of decisions for you. And it may not be the case that I agree with every single decision that Rails has made, but I do appreciate that the Rails team, or DHH, or whoever, has decided that Rails is just going to have all these sensible defaults.
And that's what you get. And if you want to go tweak that stuff, I guess you can, but you get all this stuff this way, 'cause we decided what we think is the best way to do it. And that is how most people use their Rails apps. I think it's great; it eliminates a lot of overhead. And then when I use some other technologies, I've done some JavaScript stuff, and it's just astonishing how much boilerplate and how much energy I have to expend on decisions that don't really matter.
And maybe frankly, decisions that I'm not all that equipped to make, because I don't have the requisite knowledge to be able to make those decisions. And usually I'd rather just have somebody else make those decisions for me.
[00:12:27] Jeremy: We've been talking about the more high level tests, the acceptance tests, the integration tests. And when you're choosing how to test something, how do you decide whether it should be tested at that level, or if it should be more of a unit level test, something smaller?
[00:12:49] Jason: Good question. So I want to zoom out just a little bit in order to answer that question and come at it from a distance. So I recently conducted some interviews for a programmer job. I interviewed about 25 candidates, and the first step of the interview was this technical coding exercise. Most of the candidates did not pass; maybe five or six or seven of the candidates out of those 25 did pass. I thought it was really interesting: the ones who failed all failed in the same way, and the ones who passed all passed in the same way. And I thought about what exactly is the difference.
And the difference was that the programmers who passed, they coded in feedback loops. So I'll say that a different way: the ones who failed, they tried to write their whole program at once, and they would spend 15, 20 minutes carefully writing the program, and then at the end of that 20 minutes, they would try to run it.
And unsurprisingly to me, the program would fail, like on line 2 of 30, because nobody's smart enough to write that much code and have the whole thing work. And then the ones who did well, they would write maybe one line of code, run it, observe what happens, compare what they observed to what they expected to see, and if any corrections were needed, they made those corrections and ran it again.
And then only once their expectations were satisfied did they go and write a second line, and they would repeat that process again. That workflow of programming in feedback loops, I think, is super important. And I think it's what distinguishes, hmm, I don't exactly want to say successful programmers from unsuccessful programmers, but it certainly has a lot to do with speed.
Like, think about how much slower it is to try to write your whole program, run it, and see that it fails, and then try to find the needle in the haystack. It's like, okay, I just wrote 30 lines, there's a problem somewhere, I don't know where, and now I have to dig through and find it. It's so much harder than if you just write one line, and you see a problem, and you know that that problem lies in the line you just wrote.
So I say all that, because testing is just feedback loops automated. So rather than writing a line and then manually running your program and using your own judgment to compare what you observed to what you expected to see you write a test that exercises your code and says, I expect to see this when this happens.
And so the kind of test you write, now to answer your question, will depend first on the nature of the thing you're writing. But if we take the typical case of, let's say, I'm building a form that will allow me to create a customer in a system, and I put in the first name, last name, and email address of the customer, that's a really basic CRUD functionality thing. There's not a lot of complexity there. And so, to be honest, I might just not write a test at all, and we can get into how I decide when to write a test and when not to. But I probably would write a test, and if I did, I would write a system spec, to use the Rails RSpec terminology, that spins up a browser.
I would fill in the first name field with a first name, fill in the last name field with the last name, email with email, click the submit button, and then I would assert that on the subsequent page I see some indicator of success. And then if we think about something that's maybe more involved, like I'm thinking about some of the complicated stuff I've been working on recently, regarding coming up with a patient's balance in the medical system that I work on.
That's a case where I'm not going to spin up a browser to check the correctness of a number, 'cause that feels like a mismatch. I'm going to work at a lower level and maybe create some database records and say, when I create this charge and when I create this payment, I expect the remaining balance to be such and such.
So the type of test I write depends highly on the kind of functionality.
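The customer-form test Jason walks through would typically take this RSpec/Capybara shape. This is illustrative only; it needs a Rails app with RSpec and Capybara to actually run, and the field labels, path helper, and success message are all hypothetical.

```ruby
# Illustrative sketch of a system spec; names are made up.
RSpec.describe "Creating a customer", type: :system do
  it "shows a success message after submitting the form" do
    visit new_customer_path
    fill_in "First name", with: "Ada"
    fill_in "Last name",  with: "Lovelace"
    fill_in "Email",      with: "ada@example.com"
    click_on "Create Customer"
    # Assert on the subsequent page, as described above.
    expect(page).to have_content("Customer was created")
  end
end
```

The balance calculation, by contrast, would be a lower-level model spec that creates a charge and a payment and asserts on the remaining balance, with no browser involved.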
[00:17:36] Jeremy: So it sounds like in the case of something that's more straight forward, you might write a high level test, I guess, where you were saying I just click this button and I see if the thing I expected to be created is there on the next page. And you might create that test from the start and then just start filling in the code and continually running that test you know, until it passes.
But you also mentioned that in the case of something simple like that, you might actually choose to forgo the test and just take a look, you know, visually: you open the app and you click that same button and you can see the same result. So I wonder if you could talk a little bit more about how you decide, like, yeah, I'm going to write this test, or no, I'm just going to inspect it visually.
[00:18:28] Jason: Yeah. So real quick, before I answer that, I want to say that it's not whether the test is straightforward or the feature is straightforward that determines which kind of test I write. Because sometimes the acceptance test that I write, which spins up a browser and everything, sometimes that might be quite an involved test in a complicated feature, or sometimes I might write a lower level test and it's a trivially simple one.
It has more to do with: what's the thing that I care about? Like, is it primarily a UI based feature, where that is like the meat of it? Or is it a lower level, like calculation type thing, or something like that? That's kind of what determines which kind I write. But you asked when I would decide not to write a test.
So the reason I write tests is because it's just like cost prohibitive to manually perform testing, not just in monetary terms, but like in emotional pain and mental energy and stuff like that. I don't want to go back and manually test everything to make sure that it's still working. And so the ROI on writing automated tests is almost always positive, but sometimes it's not a positive ROI.
And so when I don't write a test, it's if these conditions are true: if the consequences of the feature breaking are really small, and the frequency of the usage is low, and the cost of writing the test is high, then I probably won't write a test.
For example, if there's some report that somebody looks at once every six months, and it's maybe a front desk person who uses the feature, and if it doesn't work, then it means they have to instead go get the answer manually, and instead of getting the answer in 30 seconds, it takes them five minutes.
Extremely low cost to the failure. And it's like, okay, so I'm costing somebody maybe 20 bucks once every six months if this feature breaks. And let's say this test is one that would take like an hour for me to write. Clearly it's better just to accept the risk of that feature breaking once in a while, which it's probably not going to anyway. So those are the questions I ask when I decide. And to be clear, it's not like I run through all those questions for every single test I write; in the vast, vast majority of cases, I just write the test because it's a no-brainer that it's better to write the test. But sometimes my instincts tell me, like, hey, is this really actually important to write a test for?
And when I find myself asking that, then I say, okay, what are the consequences of the breakage? How hard is this test to write? All that.
[00:21:46] Jeremy: So you talked about the consequences being low, but you also talked about maybe the time to write the test being high. What are the types of tests that take a long time to write?
[00:21:58] Jason: Usually ones that involve a lot of setup. So pretty much every test requires some data to be in place, data either meaning database data, or like some object structure or something like that. Sometimes it's really easy; sometimes the setup is extremely complicated. And that's usually where the cost comes in.
And then sometimes you encounter a technical challenge, like, oh, how do I download this file and then inspect the contents of this file? Sometimes you just encounter something that's technically tricky to achieve. But more frequently, when a test is hard to write, it's because the setup is hard.
[00:22:49] Jeremy: and you're talking about set up being, you need to insert a whole bunch of different rows into your database or different things that interact with one, another things like that.
[00:23:02] Jason: Exactly.
[00:23:03] Jeremy: When you're testing a system and you create a database that has all these items in it for you to work with, I'm assuming that what's in your test database is much smaller than what's in the real database. So how do you get something that's representative, so that if you only have 10 things in your tests, but in production there's thousands of them, you can catch that, hey, this isn't going to work well once it gets to production?
[00:23:35] Jason: Yeah, that's a really interesting question. And the answer is that I usually don't try to make the test database representative of the production database in terms of scale. Obviously the right data has to be there in order to exercise the test; that has to be true. But for example, in production at this moment I know there's some tens of thousands of appointments in the database, but locally, at any given time, there are between zero and three or so appointments in any particular test. That's obviously nowhere near realistic, but it only becomes relevant in a great minority of cases. With regard to that stuff, the way I approach it, and I'm thinking about some of this for the first time right now, but obviously with performance in general, premature optimization is usually not a profitable endeavor. And so I'll write features without any thought toward performance, and then once things are out there performing in production, observe the bottlenecks and then fix the bottlenecks, starting with what's the highest ROI.
And usually tests haven't come into the picture for me there. Because, okay, the reason for tests, again, is so you don't have to go back and do that manual testing. But with these performance improvements, instead of tests, we have application performance monitoring tools, and that's what tells me whether something has an issue. Or people just say, like, hey, this certain page is slow, or whatever.
And so tests would be redundant to those other measures that we have that tell us if there's a performance problem.
[00:25:38] Jeremy: Yeah. So that sorta touches on what you described before, where let's say you were writing some kind of report, or adding a report, and when you were testing it locally it worked great, generated the report. Then you pushed it out to production, somebody went to run it, and maybe because of an indexing problem or some other issue, it times out, or it doesn't complete, takes a long time. But I guess what you're saying is, in a lot of cases the consequences of that are not all that high.
Like the person will try it, they'll see, like, hey, it doesn't work. Either you'll get a notification or they'll let you know, and then that's when you go in and go, like, okay, now we can fix this.
[00:26:30] Jason: Yeah. And I think like the distinction is the performance aspect of it. Because like with a lot of stuff, you know, if you don't have any tests in your application at all, there's a high potential for like silent failure. And so with the performance stuff, we have other ways of ensuring that there won't be silent failure.
So that's how I think about that particular case.
[00:26:56] Jeremy: I guess another thing about tests is when you build an application, a lot of times you're not just interacting with your own database, you're interacting with third-party APIs. You may even be connecting to different pieces of hardware, things like that. So when you're writing a test, how do you choose to approach that?
[00:27:23] Jason: Yeah, good question. This is an area where I don't have a lot of personal experience, but I do have some. There's another principle in testing, part of the determinism principle, where you don't want to involve external HTTP requests and stuff like that in your tests. Because imagine if I run my test today and it passes, but then I run my test tomorrow and this third-party API is down, and my test fails. The behavior of my program didn't change; the only thing that's different is this external API is down right now. And so what I do for those is I'll capture the response that I get from the API, and I'll usually somehow get my hands on a success response and a failure response and whatever kind of response I want to account for.
And then I'll insert those captured responses into my tests, so that on every subsequent run, I can be using these canned values rather than hitting the real API.
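The canned-response idea can be sketched in plain Ruby; in real Rails suites this is often done with tools like WebMock or VCR. Everything here, the WeatherClient, the CannedHttp fake, and the JSON shape, is a made-up example of the pattern, not any real API.

```ruby
require "json"

# The client takes its HTTP layer as a dependency, so tests can hand
# it a captured response instead of hitting the real API.
class WeatherClient
  def initialize(http)
    @http = http
  end

  def temperature(city)
    body = @http.get("/weather?city=#{city}")
    JSON.parse(body)["temp"]
  end
end

# A fake HTTP layer that replays a response captured earlier.
class CannedHttp
  def initialize(body)
    @body = body
  end

  def get(_path)
    @body
  end
end

# In a test: replay a captured success response. A captured failure
# response could be replayed the same way to exercise error handling.
success = CannedHttp.new('{"temp": 21}')
client = WeatherClient.new(success)
puts client.temperature("Oslo")  # => 21
```

Because the fake always returns the same body, the test is deterministic: it passes or fails the same way whether or not the real API is reachable.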
[00:28:37] Jeremy: I think in the description of your book, you mentioned a section on stubs and mocks, and I wonder, what you're describing here, which of those two things is it? And what's the difference?
[00:28:53] Jason: Yeah, it's such a tricky concept that I don't even trust myself to say it right. Every time I want to remind myself of the difference between mocks and stubs, I have to go back to my own blog post that I wrote on it and remind myself, okay, what is the difference between a mock and a stub? And I'll just say, I don't remember.
Because this isn't something that I find myself dealing with very frequently. It's something that people always want to know about, at least in the Rails world. But I'll speak for myself, at least: I don't find myself having to use or wanting to use mocks and stubs very much.
I will say that both mocks and stubs are a form of test double. So a mock is a test double and a stub is a test double. And a test double, it's like a play on "stunt double": instead of using a real object or whatever it is, you have this fake object. And sometimes that can be used to trick your program into behaving a certain way, or it can be used to gain visibility into an area that you otherwise wouldn't have visibility into.
And kind of my main use case for mocks and stubs when I do use them, is that when you're testing a particular thing, You want to test the thing you're interested in testing. You don't want to have to involve all the dependencies of the thing you're testing. And so I will like stub out the dependencies.
So, okay, here's an example. I have a rare usage of stubs in my test suite. And, dear listener, I'm going to use the word stub; don't give too much credence to that, maybe I mean mock, I don't remember. But anyway, I have this area where we determine a patient's eligibility to get a certain kind of medicine, and there's a ton that goes into it. There are these four different, like, coarse-grained determinations, and they all have to be a yes in order for it to overall be a yes,
that they can get this medicine. It has to do mostly with insurance. And then each one of those four coarse-grained determinations has some number of fine-grained determinations that determine whether it is a yes or a no. If I weren't using mocks and stubs in these tests, then in order to test one determination, I would have to set up the conditions.
This goes back to the setup work stuff we talked about. I'd have to set up all the conditions for the medicine to be a yes, in addition to the thing I'm actually interested in. And so that's a waste, because that stuff is all irrelevant to my current concern. Let me try to speak a little bit more concretely.
So let's say I have determinations A, B, C, and D. When I'm interested in determination A, I don't want to have to do all the setup work for determinations B, C, and D. And so what I'll do is I'll mock the determinations for B, C, and D. And I'll say for B, just have the function return true; for C, same thing, just return true; for D, return true. So it, like, short-circuits all that stuff and bypasses the actual logic that gives me the yes/no determination, and it just always gives me a yes. That way there's no setup work for B, C, and D, and I can focus only on A.
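The pattern Jason describes can be sketched in plain Ruby. Every name below (`EligibilityCheck`, `determination_a?`, and so on) is invented for illustration; in a real Rails suite you would more likely use RSpec's `allow(...).to receive(...)` rather than hand-rolled singleton methods, but the idea is the same: short-circuit B, C, and D so only A's real logic runs.

```ruby
# Hypothetical sketch of the eligibility example: four coarse-grained
# determinations that must all be "yes". All names are invented.
class EligibilityCheck
  def initialize(insurance_active:)
    @insurance_active = insurance_active
  end

  def eligible?
    determination_a? && determination_b? && determination_c? && determination_d?
  end

  # A has real, cheap-to-test logic.
  def determination_a?
    @insurance_active
  end

  # Imagine B, C, and D each need a pile of database setup.
  def determination_b?
    expensive_lookup
  end

  def determination_c?
    expensive_lookup
  end

  def determination_d?
    expensive_lookup
  end

  private

  def expensive_lookup
    raise "would need lots of setup work"
  end
end

# To test A in isolation, stub B, C, and D to short-circuit to true,
# so none of their setup work is needed.
check = EligibilityCheck.new(insurance_active: true)
%i[determination_b? determination_c? determination_d?].each do |m|
  check.define_singleton_method(m) { true }
end

puts check.eligible?  # prints "true": only A's real logic ran
```

With the stubs in place, exercising `eligible?` only depends on the state A cares about; without them, calling it would raise from the expensive dependencies.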
[00:32:48] Jeremy: And I think it may be hard to say in this example, but would you, would you still have at least one test that would run through and do all the setup, do the checks for ABC and D and then when you're doing more specific things start to put in doubles for the others, or would you actually just never have a full test that actually did the complete setup?
[00:33:14] Jason: Well, here's how I'm doing this one. I described the scenario where I'm, like, thoroughly testing A under many different conditions, but stubbing out B, C, and D. Then I have another set of tests where I thoroughly test B and stub out A, C, and D, and so on. I have one thorough set for, for each of those. If you're asking whether I have one that, like, exercises all four of them: no.
I just have ones for each of the four individually, which is maybe kind of a trade-off, cause it's arguable that I don't have complete confidence because I'm never testing the four together. But weighing the setup work and all that that's necessary to get that complete confidence against the value of it, really it's just, like, a tiny bit of additional confidence that I would get from testing all those things together.
In that particular case, my judgment was that that was not worth it.
[00:34:19] Jeremy: Yeah. Cause I was thinking from the perspective of, sometimes I hear that people will have an acceptance test that covers, sometimes you hear people call it the happy path, right, where everything lines up. It's like a very straightforward case of a feature. But then all the different ways that you can test that feature, they don't necessarily write tests for those, but they just write one for the, the base case.
And then, like you said, you actually drill down into more specifics and maybe only test a, a smaller part there, but it sounds like in this case, maybe you made the decision that, Hey, doing a test, that's going to test all four of these things, even in the simplest case is going to involve so much setup and so much work that, that maybe it's not, not worth it in this case.
[00:35:13] Jason: Yeah. And I'd have to go back and refresh my memory as to, like, what exactly the scenario is for those tests, because in general, I'm a proponent of having integration tests that make sure multiple things work together. You might've seen that GIF where it says, like, um, two unit tests, zero integration tests, and there's like a cabinet with two doors.
Each door can open on its own. Or, or maybe it's drawers: each drawer can open on its own, but you can't open both drawers at the same time. And so I think it's not smart to have only unit tests and no integration tests. And so I don't remember exactly why I chose to do that eligibility test with the A, B, C, and D the way I did.
Maybe it was just cost-prohibitive to do it all together. Um, one thing that I want to comment on regarding mocks and stubs: there's a mistake that's made kind of frequently, where people overdo it with mocks and stubs. They misunderstand the purpose. The purpose, again, is that you want to test the thing you're testing, not the dependencies of the thing.
But sometimes people stub out the very thing they're testing. And so they'll, like, assert that a certain method will return such and such value, but they'll stub the method they're testing so that the method is guaranteed to return the same value, and that doesn't actually test anything. So I just wanted to mention that as a common mistake to avoid.
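To make that anti-pattern concrete, here is a small invented sketch: the stub on `percentage` guarantees the assertion passes even though the real implementation is wrong, so the test verifies nothing.

```ruby
# Invented example of the mistake: stubbing the very method under test.
class Discount
  def percentage
    25  # pretend this is a bug: it should have been 10
  end
end

discount = Discount.new

# The "test" stubs the method it is supposed to be testing...
discount.define_singleton_method(:percentage) { 10 }

# ...so this check can never fail, no matter how broken the real code is.
puts discount.percentage == 10  # prints "true"
```

The real `percentage` still returns the wrong value; the stub just hides it.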
[00:36:47] Jeremy: I wonder if you could maybe give an example of when you, you have a certain feature and the thought process you're going through where you decide like, yes, this is the part that I should have a stub or a mock for. And this is the part where I definitely need to make sure I run the code.
[00:37:07] Jason: Well, again, it's very rare that I will use a mock or stub, and it's not common that I'll even consider it, for better or worse. Like we're talking about, the nature of Rails tests is that we spin up actual database records and, and test our models with database data and stuff like that. In other ecosystems, maybe the testing culture is different and there's more mocks and stubs.
I know when I was doing some coding with angular, there was a lot more mocking and stubbing. But with rails, it's kind of like everything's available all the time and we use the database a lot during testing. And so mocks and stubs don't really come into the picture too much.
[00:37:56] Jeremy: Yeah, it's, it's interesting that you, you mentioned that, because, like, I work with some projects that use C# and ASP.NET, and a lot of times you'll see people say, like, you should not be talking to the database in your tests. And, you know, they go through all this work to build probably the equivalent of a mock or a stub.
But then, you know, when I, when I think about that, then I go, like, well, but I'm not really testing how the database is going to react. You know, are my, are my queries actually valid, things like that? Because all this stuff is, is just not being run. In some other communities, maybe they have different ideas, I guess, about, about how to run tests.
[00:38:44] Jason: Yeah, And it's always interesting to hear expressions. Like you should do this or you shouldn't do that, or it's good to do this. It's bad to do that. And I think maybe that's not quite the right way to think about it. It's more like, well, if I do this, what are the costs and benefits of doing this? Cause it's like, nothing exactly is a good thing to do or a bad thing to do.
It's just, if you do this, this will happen as a consequence, and if you don't, this won't, and all that stuff. So people who don't want to talk to the database in their tests: why is that? What, what are the bad things they think will happen if you do that? The drawback, as it appears to me, is that it's slow to use the database. In any performance problem,
usually the culprit is the database. That's always the first thing I look at. And if you're involving the database in all of your tests, your tests are going to be much slower than if you don't use the database. But the cost of not talking to the database is exactly what you said: you're not exercising your real application, you're missing an entire layer, and maybe that's fine.
I've never tried approaching testing in that way. And I would love to like, get some experience like working with some people who do it that way. Cause I can't say that I know for an absolute fact that that doesn't work out. But to me it just makes sense to exercise everything that you're actually using when the app runs.
[00:40:18] Jeremy: What's challenging probably for a lot of people is that if you look online for how to do testing in lots of different frameworks, you'll get different answers, right? And it's not clear what's gonna fit your situation, right? And, you know, to give an example of that, we've been talking about how Rails predominantly focuses on tests that talk to the database, and it wraps everything in a transaction, as we talked about before, so that you can reset the state and things like that.
I've also seen in other frameworks where they'll say like, oh, you can run a database test, but you use this in-memory version of the database instead of actually talking to a real MySQL or Postgres instance, or they'll say, oh, for this test we're going to use SQLite in place of the Postgres database you're actually using in production.
And it, it makes the, the setup, I suppose, easier. Um, And maybe it makes the tests run quicker, but then it's also no longer the same as what you're really running. So there's like a lot of different approaches that, that people describe and take. And I think it can be hard for, for people to know, like what, what makes sense for me.
[00:41:42] Jason: Yeah. And this is another area where I have to plead ignorance because again, I don't have experience doing it the other way. Logically, I feel like my way makes sense, but I don't have empirical experience doing it the other way.
[00:41:57] Jeremy: we've talked a little bit about how there's cases where you'll say I'm not going to do this thing because it's going to take a lot of time and I've weighed the benefits. And I wonder if you could give some examples of things where you spent a lot of time on something, and then in hindsight, you, you realize like this really wasn't worth it.
[00:42:18] Jason: I don't think I have any examples of that, because I don't think it tends to happen very much. I really can't emphasize enough how rare the case where I choose not to write a test for something is. It's like a one-in-5,000 kind of thing. It's really not something I do frequently. The mistake is overwhelmingly in the opposite direction.
Like maybe I will get lazy and I'll skip a test, and then I'll realize, oh yeah, this is why I write tests, because it actually makes everything easier. And uh, we get pain as a consequence when we skip tests. So that's usually the mistake I make: not writing a test when I should, rather than writing a test when I should not have.
[00:43:08] Jeremy: So in general, you, you said that not writing it is, is the, the mistake. How do you get people in the habit of writing the tests, where they feel like it's not this thing that's slowing them down or is in the way, but is rather something that's helping them with that feedback loop and is something that they actively want to do?
[00:43:33] Jason: Yeah. So to me, it's all about a mindset. So there's a common perception that tests are something extra. Like I've heard stories about, like somebody gives a quote for a project and then the prospective client asks like, well, how much, if we skip tests, how much less would that be? And it's like, oh, it wouldn't be less.
It'd be like five times more, because tests are a time-saver. So I want to try to dispel that notion. But even so, it can be hard to bring oneself to write tests, because it feels like something that takes discipline. But in my case, I don't feel like it takes discipline, because I remind myself of a true fact: the lazy and easy way to code is actually to code using tests.
And it's the harder, more laborious way to write code, not using tests. Because think about it: what's, what's the alternative to writing tests? Like we said earlier, the alternative is to manually test everything. And that's just so painful, especially when it's some feature where, like, I'm sure you have experience with this, Jeremy: you, you make a code change.
And then in order to verify that the thing still works, you have to go through, like, nine different steps in the browser, and only on that last step do you get the answer you're after. That's just so painful. And if you write a test, you can automate that. Some things that might present friction in that process are just, like, a lack of familiarity with how to write tests, and maybe a, um, a lack of an easy process for writing tests.
And just to briefly touch on that, I think something that can help reduce that is to write tests in the same way that I write code: in feedback loops. So we talked about writing one line, checking, writing another line, checking, that kind of thing. I write my tests in the same way. First I'll write the shell of the test, and then I'll run just the shell, even though it seems kind of dumb to just run the shell, cause you know it doesn't do anything. I do that just to demonstrate to myself that I didn't, like, make some typo or something like that. I'm starting from, like, a clean baseline. And then I'll write one line of my test. Maybe if I'm writing a system spec, I'll write a line that creates a user. I know that nothing's going to happen when I run the test, but I'll run it just to see it run and make sure there's no errors.
And then I'll add a line that says log the user in, and then I'll run that, and so on, just one line at a time. There's this principle that I think is really useful when working, which is to separate the deciding what to do from the actually doing it. I think a lot of developers mix those two jobs of deciding what to do and doing it in the same step.
But if, if you separate those, so you, like, decide what you're going to have your test do. So, like, maybe I'll open my test and I'll write in comments what I want to achieve, not in technical terms necessarily, but I'll just write a comment that says create a user, write another comment that says log in, another comment that says click on such and such.
And then once I have those there, I'll go back to that first line and convert it to code. Okay, my comment that says create a user: I'll change that to the syntax that actually creates a user, and again, using the feedback loop, I'll run that. So, you know, once I'm, once I'm done writing all those comments that say what the test does, I'm now free to forget about it.
And I don't have to hold that in my mental RAM anymore, and I can clear my mental RAM. Now all my mental RAM is available to bring to bear on the task of converting the steps I already decided on into working syntax. If you try to do both those things at the same time, it's more than twice as hard. And so that's why I try to separate them.
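That two-pass flow can be sketched as follows. The helper methods stand in for what would be FactoryBot and Capybara calls in a real Rails system spec; every name here is illustrative, not a real API.

```ruby
# Stand-ins for what a real system spec would do; names are invented.
def create_user
  { email: "user@example.com" }
end

def log_in(user)
  "session for #{user[:email]}"
end

def visit(path)
  "GET #{path}"
end

# Pass 1 was comments only, deciding what the test does:
#   create a user
#   log the user in
#   visit the dashboard
# Pass 2 converts one comment at a time, re-running the file after each line:

# create a user
user = create_user
# log the user in
session = log_in(user)
# visit the dashboard
puts session
puts visit("/dashboard")
```

The point is that each planning comment becomes one line of code, and the file stays runnable after every conversion, so the feedback loop never breaks.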
[00:48:04] Jeremy: So that's interesting. So it's like you're designing, I guess, the feature, what you want to build, in the context of the test first. Would that be accurate?
[00:48:19] Jason: That certainly can be the case. So much of this is context-dependent. I very regularly give myself permission to be undisciplined and to go on exploratory spikes. And so if I have, like, a really vague idea about what shape a feature is going to take, I give myself permission to forget about tests and I just write some code. Because there's two reasons to write code.
You know, code is not only a work product; code is also a thinking medium. So I'll go into a different mode. I'll say, okay, I'm not trying to create a work product right now. I'm just using code as a thinking medium to figure out what I'm even going to do. So that's what I'll do in that case, and then maybe I'll write the test afterward. But if it's very clear what the thing is that I'm going to write, then I'll often write the test first, again in those two phases of deciding what it's going to be and deciding how it works.
And I won't do a thing where, where, like, I write 10 test cases and then I go through one by one and write code to make them pass. Usually I'll write one test, make it pass, write a second test, make it pass, and so on.
[00:49:38] Jeremy: Okay. So the more exploratory aspect, I guess, would be when you're either doing something that you haven't done before, or it's not clear to you what the feature should be. Is, is that right?
[00:49:58] Jason: Yeah, like maybe it's a feature that involves a lot of details. There's like a lot of room for discretion. It could be implemented in more than one way. Like how would I write a test for that? If I don't even know what form it's going to take? Like there's decisions to be made, like, what is the, the route going to be that I visit for this feature?
What am I even going to call, like, this entity and that entity, and stuff like that. And I think that goes back to my desire to not juggle and manage multiple jobs at the same time. I don't want to, I don't want to overly mix the design job with the testing job. Cause testing can help with design, but design in, like, a code structure sense.
I usually don't want to mix testing with, like, UI design. And not even UI design: like, design in the highest sense, meaning, like, what even is this thing? How does it work, big-picture-wise, and stuff like that? That's not the kind of design that testing helps with, in my mind. The kind of design that testing helps with, again, is the code structure.
So I want to have my big picture design out of the way before I start writing my test.
[00:51:21] Jeremy: and in terms of the big picture design, is that something that you keep all in your head or are you writing that down somewhere? I'm just wondering what your process is.
[00:51:34] Jason: Yeah, it can work a number of different ways in the past. I've done usability testing where I will do some uh, pen and paper prototypes and then do some usability testing with, with users. And then I will um, convert those pen and paper prototypes to something on the computer. The idea being pen and paper prototypes are the cheapest to create and change.
And then the more you cement it, the more expensive it gets to change. So only once I'm fairly certain that the pen-and-paper prototypes are right will I put it into something that's more of a formal mock-up. And then once I have my formal mock-up and it's been through whatever scrutiny I want to put it through, then I will do the even more expensive step of implementing it as a working feature.
Now, having said all that, very rarely do I go through all that ceremony. Usually a feature is sufficiently small that all that stuff would be silly to do. So sometimes I'll start straight with the mock-up on the computer and then I'll work off of that. Sometimes it's small enough that I'll just make a few notes in a note-taking program and then work off of that.
What is usually true is that our tickets in our ticketing system have a bulleted list of acceptance criteria. So we want to make it very black and white. Very yes, no. Whether a particular thing is done and that's super helpful because again, it goes back to the mixing of jobs and separating of jobs.
If we've decided in advance that this feature needs to do these four things. And if it does those four things it's done and it doesn't need to do anything more and if it doesn't meet those four criteria, then it's not done then building the thing is just a matter of following the instructions. Very little thinking is involved.
[00:53:45] Jeremy: depending on the scope of the feature, depending on how much information you have uh, you could either do something elaborate, I suppose, where, you know, you were talking about doing prototypes or sketches and, and so on before you even look at code or there could be something that's not quite that complicated where you have an idea of what it is and you might even play with code a little bit to get a sense of where it should go and how it should work.
But it's all sort of in service of getting to the point where you know enough about how you're going to do the implementation and you know enough about what the actual feature is to where you're comfortable starting to write steps in the test about like, these are the things that are going to happen.
[00:54:35] Jason: Yeah. And another key thing that might not be obvious is that all these things are small. So I never work, well, I shouldn't say never, but in general I don't work on a feature that's going to be, like, a week-long feature or something like that. We try to break them down into features that are at most, like, half a day.
And so that makes all that stuff a lot easier. Like, I use the number four as an example of how many acceptance criteria there might be, and that's a pretty representative example. We don't have tickets where there's 16 acceptance criteria, because the bigger something is, the more opportunity there is for the conceived design to turn out not to be viable.
And the more decisions there are that can't be made, because you don't know the later step until the earlier decision is made, and all that kind of stuff. So the small size of everything helps a lot.
[00:55:36] Jeremy: But I, I would imagine, if you're breaking things into that small of a piece, then would there be parts that you build and you test and you deploy, but to the user, they actually don't see anything? Is that the approach?
[00:55:52] Jason: definitely, we use feature flags. Like for example, there's this feature we're working on right now, where we have a page where you can see a long list of items. The items are of several different types right now. You just see all of them all the time, but depending on who you are and what your role is in the organization, you're not going to be interested in all those things.
And so we want people to be able to have checkboxes for each of those types, to show or hide those things. But this checkbox feature is actually really big and difficult to add. And so the first thing that I chose to do was to have us add just one single checkbox for one type. And even that one single checkbox is sufficiently hard that we're not even giving people that yet.
We coded it so that you get the checkboxes and that one checkbox is selected by default. When you uncheck it, the thing goes away. But it's selected by default so that we can feature-flag it. The checkbox UI is hidden, everything looks just the way it did before, and now we can wait until this feature is totally done before we actually surface it to users.
So it's the idea of making a distinction between deployment and release. Cause if we try to do this whole big thing, it's, it's gonna take weeks. If we try to do the whole thing, that's just too much risk for something to go wrong. And then like, we're going to deploy like three weeks of work at once.
That's like asking for trouble. So I'm a huge fan of feature flags.
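A minimal sketch of that deployment-versus-release idea, assuming a hand-rolled flag store (a real Rails app would more likely use a gem such as Flipper; all names here are invented):

```ruby
# Hand-rolled feature flag store; in production this would live in the
# database or a flags gem. Names are invented for illustration.
FLAGS = { item_type_filters: false }

def flag_enabled?(name)
  FLAGS.fetch(name, false)
end

# The checkbox code is deployed, but while the flag is off the page
# renders exactly as it did before, so nothing is "released" yet.
def render_item_list
  html = "<ul>...all items...</ul>"
  if flag_enabled?(:item_type_filters)
    html = "<label><input type='checkbox' checked> Type A</label>\n" + html
  end
  html
end

puts render_item_list  # flag off: output looks unchanged to users
```

Flipping the flag to `true` surfaces the checkbox without another deploy, which is the whole point of separating deployment from release.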
[00:57:35] Jeremy: Interesting. So it's almost like the foundation of the feature is going in. And if you were to show it to the user, well, I guess in this case it actually did have a function, right? You could filter by that one category.
[00:57:52] Jason: Oh, I was just going to say, you're exactly right. It wouldn't be a particularly impressive or useful feature, but what we have is complete. It's, it's not finished, but it is complete.
[00:58:06] Jeremy: I'm not sure if you have any examples of this, but I imagine that there are changes that are large enough that I'm not sure how you would split them up to, you mentioned, like, half a day's worth of time. And I, I wonder if you either have examples of features like that, or a general sense of: what do you do if you, you can't figure out a way to split it up that small?
[00:58:34] Jason: I have yet to encounter a feature that we haven't been able to break up into pieces that are that small. So, unfortunately, I can't really say anything more than that because I just don't have any examples of exceptions
[00:58:49] Jeremy: For, for people listening, maybe that should be a goal at least like, see if you can make everything smaller, see if you can ship as little as possible, you know, maybe you don't hit that half a day mark, but at least give it a, give it a try and see what you can do.
[00:59:10] Jason: Yeah. And the way I would characterize it maybe wouldn't be to ship as little as possible at a time, but to give yourself a certain limit that you try not to go over. And it's, it's a skill that I think can be improved with practice. You learn certain techniques that you can use over and over. Like, for example, one way that I split things up sometimes is: we will add the database tables in one chunk, and we'll just deploy that. Cause that presents a certain amount of risk, you know. When you're adding database tables or columns or anything like that, it's always risky when you're messing with the structure of the database. So I like to do just that by itself. And it's kind of tidy most of the time, because it's not something that's naturally visible to the user; it's just a structural change.
So that's an example of the kind of thing that you learn as you gain practice, breaking bigger things up into smaller pieces.
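In a real Rails app that structure-only chunk would be a migration (a class inheriting from `ActiveRecord::Migration` with a `create_table` block). The stand-in below mimics just enough of that shape to run on its own; the table and column names are invented.

```ruby
# Minimal stand-in for ActiveRecord::Migration's create_table DSL, so this
# sketch runs outside Rails. Table and column names are invented.
class SketchMigration
  def self.tables
    @tables ||= {}
  end

  def self.create_table(name)
    columns = []
    yield columns
    tables[name] = columns
  end
end

# The "nuts and bolts" deploy: add the table, with nothing user-facing
# referencing it yet.
SketchMigration.create_table(:determinations) do |t|
  t << [:patient_id, :integer]
  t << [:kind, :string]
  t << [:result, :boolean]
end

puts SketchMigration.tables[:determinations].length  # prints 3
```

Deploying just this chunk isolates the risky schema change from the later, user-visible code that builds on it.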
[01:00:16] Jeremy: So in that example, in terms of whatever issue-tracking system you use, what, what would you call that? Would you just call it setting up the schema for a future feature? I'm just kinda curious how you characterize that.
[01:00:35] Jason: Yeah, something like that. Those particular tickets don't have great names, because ideally each ticket has some amount of value that's visible to the user, and that one totally doesn't; it's a purely nuts-and-bolts kind of thing. So that's just a case where the name's not going to be great. But what's the alternative? I can't think of anything better, so we do it like that.
[01:01:02] Jeremy: Do you feel like that's, that's lower risk, shipping something that's not user-facing first, than it is to wait until you have at least, like, one small thing that, you know, is connected to that change?
[01:01:19] Jason: Yeah. I had a boss in the past who had a certain conception of the reason to do deployments. And, and her belief was that the reason you deploy is to deliver value to the user, which is of course true. But there's another really good reason to deploy, which is to mitigate risk. The further production and development are able to diverge from one another, the greater the risk when you do a deployment.
I remember one particular time at that job, I was made to deploy, like, three months of work at once, and it was a disaster. And I got the blame, because I was the one who did the work. And quite frankly, I was really resentful that that had happened. And that's part of what informs my preference for deploying small amounts of work at a time.
I think it's best if things can be deployed serially: rather than deploying in batches, just finish one thing, deploy it, verify it, finish the next thing, deploy it, verify it. I have a saying that it's better to be a hundred percent done with half of your work than halfway done with a hundred percent of your work, for the hopefully obvious reason that, like, if, if you have 15 things that are each halfway in progress, now you have to juggle 15 balls in your head. Whereas if you have 15 things you have to do, and then you finish seven of them, then you can completely forget about those seven things that you finished and deployed and verified and all that.
And your mental bandwidth is freed up just to focus on the remaining work.
[01:03:10] Jeremy: Yeah, that, that makes sense. And, and also, if you are putting things out bit by bit and something goes wrong, then at least it's not all 15 things you have to figure out which it was. It's just the last thing you pushed out.
[01:03:26] Jason: Exactly. Yeah. It's never fun when you deploy a big delta and something goes wrong and it's a mystery what introduced the problem. It's obviously never good if you deploy something that turns out to be a problem, but if you deployed just one thing and something goes wrong, at least you can roll it back, or at the very least have a pretty decent idea of where the problem lies, so you can address it quickly.
[01:03:56] Jeremy: for sure. Well I think that's probably a good place to leave it off on, but is there anything else about testing or just software in general that you, you thought we should've brought up?
[01:04:09] Jason: Well, maybe if I can leave the listener with one thing, um, I want to emphasize the importance of programming in feedback loops. It was a real eye-opener for me, when I was interviewing these candidates, to notice the distinct difference between programmers who didn't program in feedback loops and programmers who do. I have a post about it.
It's just called "How to Program in Feedback Loops," I believe, if anybody's interested in the details. Cause I have, like, seven steps to that feedback loop. First you write a line of code, then you do this. I don't remember all seven steps off the top of my head, but it's all there in the blog post.
Anyway, if I could give just one piece of advice to anybody who's getting into programming, it's: program in feedback loops.
[01:05:00] Jeremy: Yeah, I think that's been the, the common thread, I suppose, throughout this conversation: whether it's writing the features, you want them to be as small as possible so you get that feedback of it being done and, like you said, take it off of your plate. Then there's being able to have the tests there as you write the features, so that you get that immediate feedback that this is not doing what the test says it should be doing.
So yeah, it makes a lot of sense that basically in everything we do, we try to get to a point where we get a thumbs up, we get a "this is complete." The faster we can do that, the better off we'll all be, right?
[01:05:46] Jason: exactly. Exactly.
[01:05:50] Jeremy: If people want to check out your book, check out your podcast, and I think you even have a, a conference coming up, right? Where, where can they learn about that?
[01:06:02] Jason: So the hub for everything is codewithjason.com. That's where I always send people. You can find my blog, my podcast, and my book there. And yeah, my conference, it's called Sin City Ruby. It's a Ruby conference. This will only be applicable, dear listener, if you're listening before March 24th, 2022. But yeah, it's, it's happening in Las Vegas.
It's going to be just a small intimate conference and it's a whole different story, but I kind of put on this conference accidentally. I didn't intend to do a conference. I just kind of uh, stumbled into it, but I think it will be a lot of fun. But yeah, that's, that's another thing that I have going on.
[01:06:49] Jeremy: What, what was it that I guess. Got you into deciding this is, this is what I want to do. I want to make a conference.
[01:06:58] Jason: Well, it started off as I was going to put on a class, but then nobody bought a ticket. And so I had to pivot. And so I'm like, okay, I didn't sell any tickets to this class; maybe I can sell some tickets to a conference. And luckily for me, it turns out I was right. Because I was financially obligated to a hotel where I had reserved space for the class,
I couldn't just cancel it. I had to move forward somehow. So that's where the conference came from.
[01:07:28] Jeremy: Interesting. Yeah, I'm, I'm always kind of curious how people decide what they want to attend. I guess, like, you know, you said how you didn't get enough signups for your class, but you got signups for a conference. And, you know, for the people who are signing up and want to go, I wonder: to them, what is, what is it about going to a conference that is so much more appealing than, than going to a class?
[01:07:54] Jason: Oh, well, I think in order to go to a class, the topic has to be of interest to you. You have to be in like a specific time and place. The price point for that kind of thing is usually much higher than for, for a conference. Whereas with a conference it's affordable to individuals, you don't have to get your boss's permission necessarily, at least not for the money. It's more of like a, you don't have to be a specific kind of person in a specific scenario in order to benefit from it. It's a much more general interest. So that's why I think I've had an easier time selling tickets to that.
[01:08:31] Jeremy: Mm, mm. Yeah, it's, it's more of a I wanna get into a room with a bunch of people and just learn a bunch of cool stuff and not necessarily have a specific specific thing you're looking to get out of it, I guess.
[01:08:46] Jason: Yeah. There's no specific outcome or anything like that. Honestly, it's mostly just to have a good time. That's the main thing I'm hoping to get out of it. And I think that is the main draw for people they want to, they want to see their friends in the Ruby community form relationships and stuff like that.
[01:09:07] Jeremy: Very cool. Jason, good luck with the conference, and thank you so much for coming on Software Sessions.
[01:09:13] Jason: Thanks a lot. And uh, thanks for having me.
Swizec is the author of the Serverless Handbook and a software engineer at Tia.
You can help edit this transcript on GitHub.
[00:00:00] Jeremy: Today I'm talking to Swizec Teller. He's a senior software engineer at Tia, the author of the Serverless Handbook, and he's also got a bunch of other courses and, I don't know, is it thousands of blog posts now? You have a lot of them.
[00:00:13] Swizec: It is actually thousands of, uh, it's like 1500. So I don't know if that's exactly thousands, but it's over a thousand.
I'm cheating a little bit. Cause I started in high school back when blogs were still considered social media and then I just kind of kept going on the same domain.
Jeremy: Do you have some kind of process where you're always thinking of what to write next? Or are you writing things down while you're working at your job? Things like that. I'm just curious how you come up with that.
[00:00:41] Swizec: So I'm one of those people who likes to use writing as a way to process things and to learn. So one of the best ways I found to learn something new is to kind of learn it and then figure out how to explain it to other people and through explaining it, you really, you really spot, oh shit. I don't actually understand that part at all, because if I understood it, I would be able to explain it.
And it's also really good as a reference for later. So some, one of my favorite things to do is to spot a problem at work and be like, oh, Hey, this is similar to that side project. I did once for a weekend experiment I did, and I wrote about it so we can kind of crib off of my method and now use it. So we don't have to figure things out from scratch.
And part of it is, like you said, just always thinking about what I can write next. I like to keep a schedule, so I keep myself to posting two articles per week. It used to be every day, but I got too busy for that. When you have that schedule and you know, okay, on Tuesday morning I'm going to sit down and I have an hour or two hours to write whatever is on top of mind, you start spotting more and more of these opportunities. It's like, a coworker asked me something and I explained it in a Slack thread, and we had, maybe not an hour, but half an hour of back and forth, and you actually just wrote three or four hundred words to explain something. If you take those four hundred words and just polish them up a little bit, or rephrase them a different way so that they're easier to understand for somebody who is not your coworker, hey, that's a blog post, and you can post it on your blog and it might help others.
[00:02:29] Jeremy: It sounds like taking the conversations most people have in their day to day. And writing that down in a more formal way.
[00:02:37] Swizec: Yeah, not even in a more formal way, but more in a way that a broader audience can appreciate. If I'm super gnarly, detailed, deep in our infrastructure and our stack, I would have to explain so much of the stuff around it for anyone to even understand that it's useless. But you often get these nuggets where, oh, this is actually a really good insight that I can share with others, and then others can learn from it, and I can learn from it.
[00:03:09] Jeremy: What's the most accessible way or the way that I can share this information with the most people who don't have all this context that I have from working in this place.
[00:03:21] Swizec: Exactly. And then the power move, if you're a bit of an asshole is to, instead of answering your coworkers question is to think about the answer, write a blog post and then share the link with them.
I think that's pushing it a little bit.
[00:03:38] Jeremy: Yeah, It's like you're being helpful, but it also feels a little bit passive aggressive.
[00:03:44] Swizec: Exactly. Although that's a really good way to write documentation. One thing I've noticed at work is if people keep asking me the same questions, I try to stop writing my replies in slack and instead put it on confluence or whatever internal wiki that we have, and then share that link. and that has always been super appreciated by everyone.
[00:04:09] Jeremy: I think it's easy to, have that reply in slack and, and solve that problem right then. But when you're creating these Wiki pages or these documents, how're people generally finding these. Cause I know you can go through all this trouble to make this document. And then people just don't know to look or where to go.
[00:04:30] Swizec: Yeah. Discoverability is a really big problem, especially what happens with a lot of internal documentation is that it's kind of this wasteland of good ideas that doesn't get updated and nobody maintains. So people stop even looking at it. And then if you've stopped looking at it before, stop updating it, people stop contributing and it kind of just falls apart.
And the other problem that often happens is that you start writing this documentation in a vacuum. There's no audience for it, so it's not helpful. That's why I like the Slack-first approach, where you first answer the question, and now you know exactly what you're answering and exactly who the audience is.
And then you can even just copy paste from Slack, put it in Confluence or a JIRA board or wherever you put these things, spice it up a little, maybe fix some punctuation. And then next time when somebody asks you the same question, you can be like, oh hey, I remember where that is, go find the link and share it with them, and that kind of also trains people to start looking at the wiki.
I don't know, maybe it's just the way my brain works, but I'm really bad at remembering information, but I'm really good at remembering how to find it. Like my brain works like a huge reference network and it's very easy for me to remember, oh, I wrote that down and it's over there even if I don't remember the answer, I almost always remember where I wrote it down if I wrote it down, whereas in slack it just kind of gets lost.
[00:06:07] Jeremy: Do you also take more informal notes? Like, do you have notes locally? You look through or something? That's not a straight up Wiki.
[00:06:15] Swizec: I'm actually really bad at that. One of the things I do is that when I'm coding, I write things down. So I have almost like an engineering log book, where almost everything I think about, problems I'm working on, I'm always writing down by hand on a piece of paper. And then I never look at those notes again.
And it's almost like it helps me think it helps me organize my thoughts.
And I find that I'm really bad at actually referencing my notes and reading them later because, and this again is probably a quirk of my brain, but I've always been like this. Once I write it down, I rarely have to look at it again.
But if I don't write it down, I immediately forget what it is.
What I do really like doing is writing down SOPs. So if I notice that I keep doing something repeatedly, I write a, uh, standard operating procedure. For my personal life and for work as well, I have a huge, oh, it's not that huge, but I have a repository of standard procedures where, okay, I need to do X.
So you pull up the right recipe and you just follow the recipe. And if you spot a bug in the recipe, you fix the recipe. And then once you have that polished, it's really easy to turn that into an automated process that can do it for you, or even outsource it to somebody else who can work. So we did, you don't have to keep doing the same stuff and figuring out, figuring it out from scratch every time.
[00:07:55] Jeremy: And these standard operating procedures, they sound a little bit like runbooks I guess.
[00:08:01] Swizec: Yep. Run books or I think in DevOps, I think the big red book or the red binder where you take it out and you're like, we're having this emergency, this alert is firing. Here are the next steps of what we have to check.
[00:08:15] Jeremy: So for those kinds of things, those are more for incidents and things like that. But in your case, it sounds like it's more, uh, I need to get started with the next JS project, or I need to set up a Postgres database things like that.
[00:08:30] Swizec: Yeah. Or I need to reset a user to initial states for testing or create a new user. That's sort of thing.
[00:08:39] Jeremy: These probably aren't in that handwritten log book.
[00:08:44] Swizec: The wiki. That's also really good way to share them with new engineers who are coming on to the team.
[00:08:50] Jeremy: Is it where you just basically dump them all on one page or is it where you, you organize them somehow so that people know that this is where, where they need to go.
[00:09:00] Swizec: I like to keep a pretty flat structure because I think the idea of categorization has outlived its prime. We have really good search algorithms now and really good fuzzy searching, so it's almost easier if everything is just dumped and it's designed to be easy to search. There's a really interesting anecdote: I think they were professors at some school, and they realized that they had been trying to organize everything into files and folders.
And they're trying to explain this to their younger students, people who are in their early twenties and the young students just couldn't understand. Why would you put anything in a folder? Like what is a folder? What is why? You just dump everything on your desktop and then command F and you find it. Why would you, why would you even worry about what the file name is? Where the file is? Like, who cares? It's there somewhere.
[00:09:58] Jeremy: Yeah, I think I saw the same article. I think it was on the verge, right?
I mean, I think that's that's right, because when you're using, say a Mac and you don't go look for the application or the document you want to run a lot of times you open up spotlight and just type it and it comes up.
Though, I think what's also sort of interesting is, uh, at least in the note taking space, there's a lot of people who like setting up things like tags and things like that. And in a way that feels a lot like folders, I guess
[00:10:35] Swizec: Yeah. The difference between tags and categories is that the same file can have multiple tags and it cannot be in multiple folders. So that's why categorization systems usually fall apart. You mentioned note taking systems and my opinion on those has always been that it's very easy to fall into the trap of feeling productive because you are working on your note or productivity system, but you're not actually achieving anything.
You're just creating work for work sake. I try to keep everything as simple as possible and kind of avoid the overhead.
[00:11:15] Jeremy: People can definitely spend hours upon hours curating what's my note taking system going to be, the same way that you can try to set up your blog for two weeks and not write any articles.
[00:11:31] Swizec: Yeah. exactly.
[00:11:32] Jeremy: When I take notes, a lot of times I'll just create a new note in apple notes or in a markdown file and I'll just write stuff, but it ends up being very similar to what you described with your, your log book in that, like, because it's, it's not really organized in any way. Um, it can be tricky to go back and actually, find useful information though, Though, I suppose the main difference though, is that when it is digital, uh, sometimes if I search for a specific, uh, software application or a specific tool, then at least I can find, um, those bits there
[00:12:12] Swizec: Yeah. That's true. the other approach I'd like to use is called the good shit stays. So if I can't remember it, it probably wasn't important enough. And you can, especially these days with the internet, when it comes to details and facts, you can always find them. I find that it's pretty easy to find facts as long as you can remember some sort of reference to it.
[00:12:38] Jeremy: You can find specific errors or like you say specific facts, but I think if you haven't been working with a specific technology or in a specific domain for a certain amount of time, you, it, it can be hard to, to find like the right thing to look for, or to even know if the solution you're looking at is, is the right one.
[00:13:07] Swizec: That is very true. Yeah. Yeah, I don't really have a solution for that one other than relearn it again. And it's usually faster the second time. But if you had notes, you would still have to reread the notes. Anyway, I guess that's a little faster, cause it's customized to you personally.
[00:13:26] Jeremy: Where it's helpful is that sometimes when you're looking online, you have to jump through a bunch of different sites to kind of get all the information together. And by that time you've, you've lost your flow a little bit, or you you've lost, kind of what you were working on, uh, to begin with. Yeah.
[00:13:45] Swizec: Yeah. That definitely happens.
[00:13:47] Jeremy: Next I'd like to talk about the serverless handbook. Something that you've talked about publicly a little bit is that when you try to work on something, you don't think it's a great idea to just go look at a bunch of blog posts. Um, you think it's better to, to go to a book or some kind of more, uh, I don't know what you would call it like larger or authoritative resource. And I wonder what the process was for, for you. Like when you decided I'm going to go learn how to do serverless you know, what was your process for doing that?
[00:14:23] Swizec: Yeah. When I started learning serverless, there weren't a lot of good resources, or maybe I just wasn't good at finding them. That's one thing I've noticed with Google: when you're jumping into a new technical field, it's often hard to find stuff because you don't really know what you're searching for. And Google also likes to tune the algorithms to you personally a little bit.
So it can be hard to find what you want if you haven't been in that space. I couldn't really find a lot of good resources, which resulted in me doing a lot of exploration, essentially from scratch, or piecing together different blogs and scraps of information here and there. I know I spent ridiculous amounts of time going as deep as GitHub issues, closed issues that came up in Google where somebody answered something or people were figuring out how something works, and then piecing all of that together and doing a lot of banging my head against the wall until the wall broke.
And I got through. I decided after all of that that I really liked serverless as a technology, and I really think it's the future of how backend systems are going to be built. I think it's unclear yet what kinds of systems it's appropriate for and what kinds it isn't.
It does have pros and cons. A lot of the very annoying parts of building a modern website or a modern backend go away when you go serverless. So I figured I really liked this, and I've learned a lot trying to piece it together over a couple of years.
And I felt like I was able to do that because I had previous experience with building full stack websites, building full stack apps, and understanding how backends work in general. So it wasn't like, oh, how do I do this from scratch? It was more, okay, I know how this is supposed to work in theory.
And I understand the principles. What are the new things that I have to add to that to figure out serverless? So I wrote the Serverless Handbook basically as a reference, as the resource that I wish I had when I started learning this stuff. It gives you a lot of the background of just how backends work in general, how databases connect, what the different databases are, and how they work.
Then it talks some about distributed systems, because that comes up surprisingly quickly when you're going with serverless approaches, because everything is a lot more distributed. And it talks about infrastructure as code, because that kind of simplifies a lot of the ops parts of the process, and then it talks about how you can piece it together in the end to get a full product. And I approached it from the perspective that I didn't want to write a tutorial that teaches you how to do something specific from start to finish, because I personally don't find those to be super useful. They're great for getting started. They're great for building stuff, if you're building something that's exactly the same as the tutorial you found.
But they don't help you really understand how it works. It's kind of like if you just learn how to cook risotto, you know how to cook risotto, but nobody told you that, Hey, you actually, now that you know how to cook risotto, you also know how to just make rice and peas. It's pretty much the same process.
And if you don't have that understanding, it's very hard to then transition between technologies, and it's hard to apply them to your specific situation. So I tried to avoid that and write more from the perspective of how I can give somebody who knows JavaScript, who's a front end engineer or just a JavaScript developer, enough to really understand how serverless and backends work and be able to apply those approaches to any project.
[00:18:29] Jeremy: When people hear serverless, a lot of times they're not really sure what that actually means. I think a lot of times people think about Lambdas, they think about functions as a service. but I wonder to you what does serverless mean?
[00:18:45] Swizec: It's not that there's no server; there's almost always some server somewhere. There has to be a machine that actually runs your code. The idea of serverless is that the machine and the system that handles that stuff are invisible to you. You're offloading all of the DevOps work to somebody else so that you can fully focus on the business problems that you're trying to solve.
You can focus on the stuff that is specific and unique to your situation, because, you know, there's a million different ways to set up a server that runs on a machine somewhere and answers API requests with JSON. Some people have done that thousands of times; new folks have probably never done it.
And honestly, it's really boring, very brittle and kind of annoying, frustrating work that I personally never liked. So with serverless, you can kind of hand that off to a whole team of engineers at AWS or at Google or, whatever other providers there are, and they can deal with that stuff. And you can, you can work on the level of, I have this JavaScript function.
I want this JavaScript function to run when somebody hits this URL, and that's it. That's essentially all you have to think about. So that's what serverless means to me. It's essentially cloud functions, I guess.
[00:20:12] Jeremy: I mean, there have been services like Heroku, for example, that have let people make Rails apps or Django apps and things like that, where the user doesn't really have to think about the operating system, or about creating databases and things like that. And I wonder, to you, if that is serverless, or if that's something different, and what the difference there might be.
[00:20:37] Swizec: I think of that as an intermediary step between on-prem, or handling your own servers, and full serverless, because you still have to think about provisioning. You still have to think of your server as a whole blob of things that runs together, runs somewhere, and lives somewhere.
You have to provision capacity. You have to still think about how many servers you have on Heroku. They're called dynos. you still have to deal with the routing. You have to deal with connecting it to the database. Uh, you always have to think about that a little bit, but you're, you're still dealing with a lot of the frameworky stuff where you have to, okay, I'm going to declare a route. And then once I've declared the route, I'm going to tell it how to take data from the, from the request, put it to the function. That's actually doing the work. And then you're still dealing with all of that. Whereas with full serverless, first of all, it can scale down to zero, which is really useful.
If you don't have a lot of traffic, you can have, you're not paying anything unless somebody is actually using your app. The other thing is that you don't deal with any of the routing or any of that. You're just saying, I want this URL to exist, and I want it to run that function, that you don't deal with anything more than that.
And then you just write the actual function that's doing the work. So it ends up being a normal JavaScript function that accepts a request as an argument and returns a JSON response, or even just a JSON object, and the serverless machinery handles everything else, which I personally find a lot easier. And you don't have what I call JSON bureaucracy, where you're piping an object through a bunch of different functions to get from the request to the actual part that's doing the work. You're just doing the core interesting work.
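As a rough illustration of that "just a function" shape, here is a sketch of an AWS Lambda-style handler behind API Gateway. The event shape follows the API Gateway proxy integration format; the greeting logic is purely hypothetical.

```javascript
// A Lambda-style handler: it takes a request event and returns a plain
// JSON response. There is no router, middleware, or framework around it;
// the serverless machinery turns the HTTP request into this event and the
// return value back into an HTTP response.
async function handler(event) {
  const params = event.queryStringParameters || {};
  const name = params.name || "world";
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: `Hello, ${name}!` }),
  };
}
```

Locally, invoking it is just calling the function with a fake event, which is also what makes these handlers easy to unit test.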
[00:22:40] Jeremy: Sort of sounds like one of the big distinctions is with something like Heroku or something similar. You may not have a server, but you have the dyno, which is basically a server. You have something that is consistently running,
Whereas with what you consider to be serverless, it's something that basically only launches when it's invoked, whether that's an API call or something else. The routing thing is a little bit interesting because, when I was going through the course, there are still routes that you write. It's just that you're telling, I guess, Amazon's API Gateway how to route to your functions, which is very similar to routing to a controller action or something like that in other frameworks.
[00:23:37] Swizec: Yeah. I think that part is actually pretty similar, and I think it kind of depends on what kind of framework you end up building on. It can be very simple. I know with Rails it's relatively simple to define a new route; I think you have to touch three or four different files. I've also worked in large Express apps where hooking up the controller with all of the Swagger definitions or OpenAPI definitions and everything else ends up being six or seven different files that have to have functions that are named just right, and you have to copy paste it around. And I find that to be kind of a waste of effort. With the serverless framework, what I like is you have this YAML file and you say, this route is handled by this function, and then the rest happens on its own. With Next.js or with Gatsby Cloud Functions, they've gone even a step further, which I really like: you have the /api directory in your project and you just pop a file in there.
And whatever that file is named, that becomes your API route, and you don't even have to configure anything. In both of them, if you put a JavaScript file in /api called hello that exports a handler function, that is automatically a route, and everything else happens behind the scenes.
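A sketch of that file-based convention: in a Next.js-style project this file would live at pages/api/hello.js, and the file name alone makes it the /api/hello route. The framework calls the exported handler with (req, res) objects; they are mocked minimally below so the sketch is self-contained, and the greeting is a hypothetical example.

```javascript
// The route itself: no router configuration, just an exported handler.
function hello(req, res) {
  res.status(200).json({ message: `Hello, ${req.query.name || "world"}!` });
}

// Minimal stand-in for the framework invoking the route, recording what
// the handler writes to the response:
const recorded = [];
const res = {
  status(code) { recorded.push(code); return this; },
  json(body) { recorded.push(body); return this; },
};
hello({ query: { name: "Jeremy" } }, res);
```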
[00:25:05] Jeremy: So that's more a matter of the framework you're using and how easy it makes it to handle routing, whether that's a pain or not.
[00:25:15] Swizec: Yeah, and I think with the serverless frameworks, it's because serverless itself as a concept makes it easier to set this up. We've been able to have these modern frameworks with really good developer experience: Gatsby with Gatsby Cloud, Next.js with Vercel, and I think Netlify is working on it as well. They can have this really tight coupling and tight integration between a web framework and the deployment environment, because serverless is enabling them to spin that up so easily.
[00:25:53] Jeremy: One of the things about your courses, this isn't the only thing you focus on, but one of the use cases is basically replacing a traditional server rendered application or a traditional rails, django, spring application, where you've got Amazon's API gateway in front, which is serving as the load balancer.
And then you have your Lambda functions, which are basically what would be a controller action in a lot of frameworks. and then you're hooking it up to a database which could be Amazon. It could be any database, I suppose. And I wonder in your experience having worked with serverless at your job or in side projects, whether that's like something you would use as a default or whether serverless is more for background jobs and things like that.
[00:26:51] Swizec: I think the underlying hidden question you're asking is about cold starts and API response times. One of the concerns that people have with serverless is that if your app is not used a lot, your servers scale down to zero, so then when somebody new comes on, it can take a really long time to respond.
And they're going to bail and be upset with you. One way that I've solved that is using kind of a more JAMstack-y approach. I feel like that buzzword is still kind of in flux, but the idea is that the actual front-end app, the client app, is running off of CDNs and doesn't even touch your servers.
So that first load of the entire client app is really fast, because it comes from a CDN that's running somewhere as close as possible to the user, and only the actual API calls are hitting your server. For example, if you have something like a blog, most blogs are pretty static.
Most of the content is very static. I use that on my blog as well. You can pre-render everything that's static when you're deploying the project, and then it becomes just static files that are served from the CDN. So you get the initial article. I haven't tested in a while, but I think if you load one of my articles on swizec.com, it's readable right away. On Lighthouse reports, if you look at the part where it gives you the series of screenshots, the first screenshot is already fully readable.
I think that means it's probably under 30 or 40 milliseconds to get the content and start reading. But then it rehydrates and becomes a React app, and when it's a React app, it can make further API calls to the backend. So usually on user interaction, like if you have upvotes or comments or something like that, only when the user clicks something do you then make an API call to your server, and that then calls a Lambda or Gatsby function or a Netlify cloud function, or even a Firebase function, which then wakes up and talks to the database and does things. And usually people are a lot more forgiving of that one taking 50 milliseconds to respond instead of 10 milliseconds, but, you know, 50 milliseconds is still pretty good.
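The interaction pattern being described might look like the sketch below on the client side: the page itself is static files from a CDN, and the serverless backend is only called when the user does something. The /api/upvote route is a hypothetical example, and the fetch function is injectable so the sketch can run without a network.

```javascript
// Called from a click handler in the hydrated React app. Until this runs,
// no serverless function has been invoked at all; the page came entirely
// from the CDN.
async function onUpvoteClick(postId, fetchFn = globalThis.fetch) {
  const resp = await fetchFn("/api/upvote", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ postId }),
  });
  return resp.json();
}
```

Because the first Lambda invocation is deferred to a user action, the cold start cost lands on a click, where a slightly slower response is far more forgivable than on initial page load.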
And I think there were recently some experiments shared where they were comparing cold start times, and if you write your cloud functions in JavaScript, the average cold start time is something like a hundred milliseconds. And a big part of that is because you're not wrapping an entire framework like Express or Rails into your function. It's just a small function. I think my biggest cloud functions have been maybe 10 kilobytes with all of the dependencies and everything bundled in, and that's pretty fast for a server to load, run, start Node.js, and start serving your request.
It's way fast enough. And then if you need even more speed, you can go to Rust or Go, which are even faster. As long as you avoid Java, .NET, C#, those kinds of things, it's usually fine.
[00:30:36] Jeremy: One of the reasons I was curious is because I was going through the REST example you've got, where it basically goes through Amazon's API Gateway to a Lambda function written in JavaScript, which then talks to DynamoDB and gives you a record back or creates a record. And I found that just making a few calls, hopefully enough to account for the cold start, I was getting response times of maybe 150 to 250 milliseconds, which is not terrible, but it's also not what I would call fast either.
So I was just kind of curious, when you have a real app, are there things that you've come across where Lambda might have some issues, or at least there are tricks you need to do to work around them?
[00:31:27] Swizec: Yeah. So the big problem there is that as soon as a database is involved, that tends to get slow, especially if that database is not co-located with your Lambda. When I've experimented, it was a really bad idea to go from a Vercel API function and talk to DynamoDB in AWS; that goes over the open internet.
And it becomes really slow very quickly. At my previous job, I experimented with serverless and connecting it to RDS, which is the Postgres database service they have. If RDS is running in a separate private network from your functions, it immediately adds 200 or 300 milliseconds to your response times.
If you keep them together, it usually works a lot faster. And then there are ways of keeping them pre-warmed; usually that doesn't work as well as you would want. There are ways on AWS, I forget what it's called right now, but they have some sort of automatic pre-warming, if you really need response times that are smaller than 100 or 200 milliseconds.
But yeah, it mostly depends on what you're doing. As soon as you're making API calls or database calls, you're essentially talking to a different server, and that is going to be slower on a Lambda than it is if you have a packaged server that's running the database and the server itself on the same machine.
[00:33:11] Jeremy: And are there any specific challenges related to say you mentioned RDS earlier? I know with some databases, like for example, Postgres sometimes, uh, when you have a traditional server application, the server will pool the connections. So it'll make some connection into your data database and just keep reusing them.
Whereas with the Lambda is it making a new connection every time?
[00:33:41] Swizec: Almost. So Lambdas. I think you can configure how long it stays warm, but what AWS tries to do is reuse your laptops. So when the Lambda wakes up, it doesn't die immediately. After that initial request, it stays, it stays alive for the next, let's say it's one minute. Or even if it's 10 minutes, it's, there's a life for the next couple of minutes.
And during that time, it can accept new requests and serve them. So anything that you put in the global namespace of your function will potentially remain alive between invocations, and you can use that to build a connection pool to your database, so that you can reuse the connections instead of having to open new connections every time.
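That warm-container reuse pattern can be sketched as follows. State in module scope survives between invocations while the container stays warm, so the connection pool is created once per container rather than once per request. Here createPool is a stand-in for a real driver's pool (for example pg.Pool), stubbed so the sketch is self-contained.

```javascript
// Stub pool factory; a real handler would use a database driver's pool.
let poolsCreated = 0;
function createPool(opts) {
  poolsCreated += 1;
  return { max: opts.max, query: async (sql) => `ran: ${sql}` };
}

let pool = null; // module scope: survives across warm invocations

async function queryHandler() {
  if (!pool) pool = createPool({ max: 10 }); // only runs on a cold start
  return pool.query("SELECT 1");
}
```

Calling queryHandler repeatedly in the same container creates the pool only once; a second container (the truly simultaneous case described below) would build its own pool, which is exactly how the connection counts multiply.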
What you have to be careful with is simultaneous requests, actually simultaneous requests: not 10 requests in 10 milliseconds, but 10 requests in the same millisecond. Then you're going to wake up multiple Lambdas, and you're going to have multiple connection pools running in parallel.
So it's very easy to crash your RDS server with something like AWS Lambda, because I think the default concurrency limit is a thousand Lambdas. And if each of those can have a pool of, let's say, 10 connections, that's 10,000 open connections on your RDS server. And you were probably not paying for a high enough tier for the RDS server to survive that. That's where it gets really tricky.
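The arithmetic is worth making explicit. The 1,000 figure is AWS's default per-region concurrency soft limit; the pool size of 10 is an assumed per-instance setting:

```typescript
// Back-of-the-envelope: why Lambda concurrency can exhaust RDS connections.
const concurrencyLimit = 1000;      // AWS default per-region soft limit
const connectionsPerInstance = 10;  // assumed pool size inside each Lambda

const worstCaseOpenConnections = concurrencyLimit * connectionsPerInstance;

// 10,000 connections easily exceeds max_connections on a small RDS tier.
console.log(worstCaseOpenConnections); // → 10000
```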
I think AWS now has a service that lets you offload the connection pool, so that you can connect your Lambda to the connection pool, and the connection pool keeps warm connections to your server. But an even better approach is to use something like Aurora, which is also on AWS, or DynamoDB, which are designed from the ground up to work with serverless applications.
[00:35:47] Jeremy: It's things that work, but you have to know sort of the little, uh, gotchas, I guess, that are out there.
[00:35:54] Swizec: Yeah, exactly. There are sharp edges to be found everywhere. Part of that is also that serverless isn't that old yet. I think AWS Lambda launched in 2014 or 2015, which is forever in internet time, but it's still not that long ago. So we're still figuring out how to make things better.
And, where you mentioned earlier whether it's more appropriate for backend processes or for user-facing processes, it does work really well for backend processes, because you have better control over the maximum number of Lambdas that run, and you have more patience for them being slow sometimes. And so on.
[00:36:41] Jeremy: It sounds like even for front end processes, as long as you know, like you said, the sharp edges, you can do things like putting a CDN in front so your Lambdas don't even get hit until some later time.
There's a lot of things you can do to make it a good choice. I guess what I'm wondering is, when you're building an application, do you default to using a serverless type of stack?
[00:37:14] Swizec: Yes, for all of my side projects, I default to using serverless. Um, I have a bunch of apps running that way, even when serverless means no servers at all. Like my blog doesn't have any cloud functions right now. It's all running from CDNs, basically. The only thing, and I don't know if you could even count that as a cloud function, is that my email signup forms go to an API with my email provider.
So there's also nothing there; I don't have any servers. It's directly from the front end. I would totally recommend it. If you are a startup that just got tens of millions of dollars in funding, and you are planning to have a million requests per second by tomorrow, then maybe not. That's going to be very expensive very quickly.
But there's always a trade-off. I think that with serverless, it's a lot easier to build in terms of dev ops and in terms of handling your infrastructure, but it takes a bit of a mind shift in how you're building when it comes to the actual logic and the actual server system that you're building.
And then in terms of costs, it really depends on what you're doing. If you're a super huge company, it probably doesn't make sense to go serverless. But if you're that big, or if you have that much traffic, you hopefully are also making enough money to essentially build your own serverless system for yourself.
[00:38:48] Jeremy: For someone who's interested in trying serverless, I know for myself, when I was going through the tutorial, you're using the serverless framework and it creates all these different things in AWS for you. At a high level I could follow: okay, it has the API Gateway, and you've got your Simple Queue Service and DynamoDB, and the Lambdas, all that sort of thing.
So at a high level, I could follow along. But when I log into the AWS console, not knowing a whole lot about AWS, it's creating a ton of stuff for you.
And I'm wondering, from your perspective, for somebody who's learning about serverless, how much do they need to really dive into the AWS internals and understand what's going on there?
[00:39:41] Swizec: That's a tough one, because personally I try to stay away as much as possible. And especially with the serverless framework, what I like is configuring everything through the framework rather than doing it manually. Um, because there's a lot of sharp edges there as well, where if you go in and manually change something, then the serverless framework can't clean up anymore, and you can have ghost processes running.
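Configuring everything through the framework typically means describing functions and their triggers in a `serverless.yml`. This fragment is purely illustrative; the service name, function name, and handler path are made up, not from the episode:

```yaml
# Hypothetical serverless.yml fragment: one HTTP-triggered function,
# declared in config so the framework creates and tears down the
# API Gateway route and Lambda for you.
service: my-side-project

provider:
  name: aws
  runtime: nodejs18.x
  region: us-east-1

functions:
  sendEmail:
    handler: src/sendEmail.handler
    events:
      - http:
          path: /email
          method: post
```

Because the framework owns these resources, `sls remove` can clean them all up, which is exactly what breaks if you start editing the same resources by hand in the console.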
At Tia, we've had that as a really interesting challenge. We're not using the serverless framework; we're using something called CloudFormation, which is essentially one level of abstraction lower than the serverless framework. We're doing a lot more work, we're creating a lot more work for ourselves, but that's what we have and that's what we're working with. These decisions predate me, so I'm just going along with what we have. And we wanted to have more control, because again, we have dev ops people on the team and they want more control, because they also know what they're doing. And we keep having trouble with, oh, we were trying to use infrastructure as code, but then there's this little part where you do have to go into the AWS console and click around a million times to find the right thing and click it.
And we've had interesting issues with hanging deploys, where something gets stuck on the AWS side and we can't take it back. We can't tear it down, we can't stop it. It's just a hanging process, and you have to wait like seven hours for AWS to go, oh, okay, if it's been there for seven hours, it's probably not needed, and then kill it, and then you can deploy.
So that kind of stuff gets really frustrating very quickly.
[00:41:27] Jeremy: Sounds like maybe in your personal projects, you've been able to stick to the serverless framework abstraction and not necessarily have to understand or dive into the details of AWS, and it's worked out okay for you.
[00:41:43] Swizec: Yeah, exactly. It's useful to know from a high level what's there and what the different parts are doing, but I would not recommend configuring them through the AWS console, because then you're going to always be in the AWS console. And it's very easy to get something slightly wrong.
[00:42:04] Jeremy: Yeah. I mean, I know for myself, just going through the handbook, going into the console and finding out where I could look at my logs or what was actually running in AWS wasn't that straightforward. So even knowing the bare minimum, for somebody who's new to it, was a little daunting.
[00:42:26] Swizec: Yeah, it's super daunting. And they have hundreds, if not thousands, of different products on AWS. And when it comes to, like you mentioned, logs, I don't think I put this in the handbook, because I either didn't know about it yet or it wasn't available quite yet, but the serverless framework also lets you look at logs through the framework.
So you can say sls logs with the function name, and it shows you the latest logs. It also lets you run functions locally, to an extent. It's really useful from that perspective. And I personally find the AWS console super daunting as well, so I try to stay away as much as possible.
[00:43:13] Jeremy: It's pretty wild when you first log in and you click the button that shows you the services and it's covering your whole screen. Right. And you're like, I just want to see what I just pushed.
[00:43:24] Swizec: Yeah, exactly. And there's so many different ones, and they all have these obscure names that I don't find meaningful at all.
[00:43:34] Jeremy: I think another thing that I found a little bit challenging was that when I develop applications, I'm used to having the feedback cycle of writing the code, running the application or running a test, and seeing, did it work? And if it didn't, what's the stack trace, what happened? And I found the process of going into CloudWatch and looking at the logs and waiting for them to eventually refresh to be a little challenging. So I was wondering, in your experience, how have you been able to get a fast feedback loop, or is this just part of it?
[00:44:21] Swizec: I am very lazy when it comes to writing tests, or when it comes to fast feedback loops. I like having them; I'm really bad at actually setting them up. But what I found works pretty well for serverless is, first of all, if you write your backend, or if you write your cloud functions, in TypeScript, that immediately resolves the most common sources of bugs. It makes sure that you're not using something that doesn't exist.
It makes sure you're not making typos, makes sure you're not holding a function wrong, which I personally find very helpful because I type pretty fast and I make typos. And it's so nice to be able to say, if it compiles, I know that it's at least going to run. I'm not going to have some stupid issue of a missing semicolon or some weird fiddly detail.
So that's already a super fast feedback cycle that runs right in your IDE. The next step is, because you're just writing the business logic function and you know that the function itself is going to run, you can write unit tests that treat that function as a normal function. I'm personally really bad at writing those unit tests, but they can really speed up the actual process of testing, because you can go and be like, okay.
So I know that the code is doing what I want it to be doing if it's running in isolation. And that can be pretty fast. The next step, which is another level of abstraction and gives you more feedback, is that with serverless you can locally invoke most Lambdas. The problem with locally running your Lambdas is that it's not the same environment as on AWS.
And I asked one of the original developers of the serverless framework, and he said, just forget about accurately replicating AWS on your system. There are so many dragons there, it's never going to work. And I had an interesting example of that when I was building a little project for my girlfriend that sends her photos from our relationship to an IoT device every day, or something like that.
It worked when I ran sls invoke: it ran, it even called all of the APIs, and everything worked. It was amazing. And then when I deployed it, it didn't work, and it turned out to be a permissions issue. I forgot to give myself a specific IAM role for something to work. So that's kind of a stair-stepping process of having fast feedback cycles. First, if it compiles, that means you're not doing anything absolutely wrong.
If the tests are running, that means it's at least doing what you think it's doing. If it's invoking locally, it means you're holding the APIs and the third-party stuff correctly. And then the last step is deploying it to AWS and actually running it with a curl or some sort of request and seeing if it works in production.
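The middle rung of that ladder, unit-testing the business logic as a plain function, might look like this. `applyDiscount` and the event shape are hypothetical examples, not something from the episode:

```typescript
// Pure business logic: testable as a normal function, no AWS involved.
function applyDiscount(totalCents: number, percent: number): number {
  if (percent < 0 || percent > 100) throw new Error("invalid percent");
  return Math.round(totalCents * (1 - percent / 100));
}

// The Lambda handler stays a thin wrapper around the pure function,
// so the only things left to test on AWS are wiring and permissions.
async function handler(event: { total: number; discount: number }) {
  return {
    statusCode: 200,
    body: String(applyDiscount(event.total, event.discount)),
  };
}
```

A unit test just calls `applyDiscount(1000, 10)` and checks it gets 900 back, which runs in milliseconds; the deploy-and-curl step is then only verifying the AWS plumbing.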
And that then tells you if it's actually going to work with AWS. And the nice thing there is, because the serverless framework does, I think, a sort of incremental deploy, that cycle is pretty fast. You're not waiting half an hour for your CodePipeline or for your CI to run an integration test and do stuff.
One minute, it takes one minute and it's up and you can call it and you immediately see if it's working.
[00:47:58] Jeremy: Basically you're trying to do everything you can, static typing and running tests just on the functions, but I guess when it comes down to it, you really do have to push everything to AWS and have it all run in order to really know. And so I guess it's sort of a trade-off, right? Versus being able to, if you're writing a Rails application and you've got all your dependencies on your machine, you can spin it up and you don't really have to wait for it to push anywhere.
[00:48:36] Swizec: Yeah. But you still don't know. What if your database is misconfigured in production?
[00:48:42] Jeremy: right, right. So it's, it's never, never the same as
production. It's just closer. Right? Yeah. Yeah, I totally get When you don't have the real services or the real databases, then there's always going to be stuff that you can miss. Yeah,
[00:49:00] Swizec: Yeah. It's not working until it's working in production.
[00:49:03] Jeremy: That's a good place to end it on, but is there anything else you want to mention before we go?
[00:49:10] Swizec: No, I think that's good. Uh, I think we talked about a lot of really interesting stuff.
[00:49:16] Jeremy: Cool. Well, Swiz, thank you so much for chatting with me today.
[00:49:19] Swizec: Yeah. Thank you for having me.
Alexander Pugh is a software engineer at Albertsons. He has worked in Robotic Process Automation and the cognitive services industry for over five years.
This episode originally aired on Software Engineering Radio.
Related Links
Enterprise RPA Solutions
Enterprise "Low Code/No Code" API Solutions
RPA and the OS
Transcript
You can help edit this transcript on GitHub.
[00:00:00] Jeremy: Today, I'm talking to Alexander Pugh. He's a solutions architect with over five years of experience working on robotic process automation and cognitive services.
Today, we're going to focus on robotic process automation.
Alexander welcome to software engineering radio.
[00:00:17] Alex: Thank you, Jeremy. It's really good to be here.
[00:00:18] Jeremy: So what does robotic process automation actually mean?
[00:00:23] Alex: Right. It's a very broad, nebulous term. When we talk about robotic process automation as a concept, we're talking about automating things that humans do, in the way that they do them. So that's the robotic: an automation that is, um, done in the way a human does a thing.
Um, and then the process is that thing that we're automating. And automation is just saying we're turning this into an automation, we're orchestrating it. And the best way to think about it is a factory or a car assembly line. Initially, when we automated a car factory assembly line, what they did is essentially replicate the process as a human did it. So one day you had a human that would pick up a door, put it on the car, and bolt it on with their arms. And so the initial automations on those factory lines were a robot arm that would pick up that door from the same place, put it on the car, and bolt it on there.
Um, so the same can be said for robotic process automation. We're essentially looking at these, processes that humans do, and we're replicating them, with an automation that does it in the same way. Um, and where we're doing that is the operating system. So robotic process automation is essentially going in and automating the operating system to perform tasks the same way a human would do them in an operating system.
So that's RPA in a nutshell.
Jeremy: So when you say you're replicating something that a human would do, does it mean it has to go through some kind of GUI or some kind of user interface?
[00:02:23] Alex: That's exactly right, actually. when we're talking about RPA and we look at a process that we want to automate with RPA, we say, okay. let's watch the human do it. Let's record that. Let's document the human process. And then let's use the RPA tool to replicate that exactly in that way.
So: go double-click on Chrome, launch it, click in the URL line and send keys in www.cnn.com or what have you, or ServiceNow. Hit enter, wait for it to load, then click where you want to fill out your ticket for ServiceNow, and send keys in. So that's exactly how an RPA solution, at the most basic, can be achieved.
Now, any software engineer knows, if you sit there and look over someone's shoulder and watch them use an operating system, you'll say, well, there's a lot of ways we can do this more efficiently without going over here and clicking that. We can use a lot of services that the operating system provides in a programmatic way to achieve the same ends, and RPA solutions can also do that.
The real key is making sure that it is still achieving something that the human does and that if the RPA solution goes away, a human can still achieve it. So if you're, trying to replace or replicate a process with RPA, you don't want to change that process so much so that a human can no longer achieve it as well.
That's something where, if you get a very technical and very fluent software engineer, they lose sight of that, because they say, oh, you know what, there's no reason why we need to go open a browser and go to the ServiceNow portal and type this in when I can just directly send information to their backend,
which a human could not replicate. Right? So that's kind of where the line gets fuzzy. How efficiently can we make this RPA solution?
[00:04:32] Jeremy: I, I think a question that a lot of people are probably having is a lot of applications have APIs now. but what you're saying is that for it to, to be, I suppose, true RPA, it needs to be something that a user can do on their own and not something that the user can do by opening up dev tools or making a post to an end point.
[00:04:57] Alex: Yeah. And so this is probably really important right now, to talk about why RPA, right? Why would you do this when you could put on a server a really good API ingestion point or trigger or a web hook that can do this stuff? So why would we ever pursue RPA?
There there's a lot of good reasons for it. RPA is very, very enticing to the business. RPA solutions and tools are marketed as a low code, no code solution for the business to utilize, to solve their processes that may not be solved by an enterprise solution and the in-between processes in a way.
You have, uh, a big enterprise finance solution that everyone uses for the finance needs of your business, but there are some things it doesn't provide for, that you have a person doing a lot of. And the business says, okay, this thing this human is doing is really beneath their capability. We need to get a software solution for it, but our enterprise solution just can't account for it. So let's get an RPA capability in here. We can build it ourselves, and there we go. So there are many reasons to do that. IT might not have the capability or the funding to actually build and solve for it. Or it's at a scale that is too small to open up an IT project to solve for. So, you know, a team of five is just doing this, and they're doing it for 20 hours a week, which is large, but in a big enterprise that's maybe not worth building an enterprise solution for. Or, and this is a big one, there are regulatory constraints and security constraints around being able to access or communicate some data or information in a way that is non-human or programmatic. So that's really where RPA is correctly and best applied, and where you'll see it most often.
So what we're talking about there is in finance, in healthcare or in big companies where they're dealing with a lot of user data or customer data in a way. So when we talk about finance and healthcare, there are a lot of regulatory constraints and security reasons why you would not enable a programmatic solution to operate on your systems.
You know, it's just too hard. We're not going to expose our databases or our data to any other thing. It would take a huge enterprise project to build out that capability, secure it, and ensure it's done correctly. We just don't have the money, the time, or the strength, honestly, to afford it.
So they say, well, we already have a user pattern. We already allow users to talk to this information and communicate it. Let's get an RPA tool, which for all intents and purposes will be acting as a user, and then it can just automate that process without us exposing queries, or any other enterprise or programmatic solution.
So that's really why and where you would apply RPA. There's just no capability at enterprise, for one reason or another, to solve for it.
[00:08:47] Jeremy: As software engineers, when we see this kind of problem, our first thought is, okay, let's build this custom application or workflow that's going to talk to all these APIs. And what it sounds like is, in a lot of cases, there just isn't the time, there just isn't the money, to put in the effort to do that.
And, it also sounds like this is a way of being able to automate that. and maybe introducing less risk because you're going through the same, security, the same workflow that people are doing currently. So, you know, you're not going to get into things that they're not supposed to be able to get into because all of that's already put in place.
[00:09:36] Alex: Correct. And it's an already accepted pattern. It's kind of odd to apply that very IT, software engineer term to a human user, but a human user is a pattern in software engineering. We have patterns that do this and that, you know, databases and whatnot, and the user journey, or the user permissions and security and all that, is a pattern.
And that is accepted by default when you're building these enterprise applications. Okay, what's the user pattern? And since that's already established and well-known, and hopefully all the walls are built around it to enable it to correctly do what it needs to do, it's saying, okay, we've already established that. Let's just use that, instead of building a programmatic solution where we have to go and find out: do we already have an appropriate pattern to apply? Can we build it in a safe way? And then can we support it? You know, all of a sudden we have the support teams that watch our Splunk dashboards and make sure nothing's going down with our big enterprise application.
And then you're going to build another capability. Okay, where's that support going to come from? And now we've got to talk about change access boards, user acceptance testing, and, uh, you know, dev, UAT, and production environments, and all that. So it becomes untenable, depending on your organization, to do that for things that don't justify the scale that needs to be thrown at them.
But when we talk about something like APIs: APIs exist for a lot of things, but they don't exist for everything. And a lot of times that's legacy databases, that's mainframe capability. This is really where RPA shines and is correctly applied, especially in big businesses or highly regulated businesses where they can't upgrade to the newest thing, or they can't throw something to the cloud.
They have, you know, their mainframe systems, or their database systems, that have to exist for one reason or the other, until there is the motivation and the money and the time to correctly migrate and solve for them. So until that day, and again, there's no API to do anything on a mainframe in this bank or whatnot, it's like, well, okay, let's just throw RPA on it.
Let's, you know, have RPA do this thing in the way that a human does it, but it can do it 24/7. An example use case: you work at a bank, and there's no way that InfoSec is going to let you query against this database with your users that have this account, or your customers that have this. No way, in any organization at a bank.
Is InfoSec going to say, oh yeah, sure, let me give you an OData query driver, and you can just set up your own SQL queries and do whatever? They're going to say no way. In fact, how did you find out about this database in the first place, and who are you?
So how do we solve it? We go and say, okay, how does the user get in here? Well, they open up a mainframe emulator on their desktop, which shows them the mainframe. And then they go in, they click here, they put this number in there, they look up this customer, they switch this value to that value, and they say save.
And it's like, okay, cool. That, RPA can do, and quite easily. We don't need to talk about APIs, and we don't need to talk about special access or queries that make InfoSec very scared. A great use case for that: a bank acquires a regional bank, and they say, cool, you're now part of our bank. But in your systems, which are now going to be part of our systems, you have users that have this value, whereas in our bank that value is this value here. So now we have to go and change this one field for 30,000 customers to make it line up with our systems. Traditionally, you would get an extract-transform-load tool, an ETL tool, to do that. But for 30,000 customers, that might be below the threshold, and this is banking, so it's very regulated, and you have to be very, very intentional about how you manipulate and move data around.
So what do we have to do? Okay, we have to hire 10 contractors for six months, and literally what they're going to do, eight hours a day, is go into the mainframe through the emulator and, customer by customer, change this value and hit save, looking at an Excel spreadsheet that tells them which customer to go into.
And that's going to cost X amount of money over six months. Or what we could do is just build an RPA solution, a bot essentially, that goes and, for each line of that Excel spreadsheet, repeats this one process: open up the mainframe emulator, navigate into the customer profile, change the value, then shut down and repeat.
And it can do that in one week, and it can be built in two. That's the dream use case for RPA, and that's really where it shines.
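The bot Alex describes is essentially a loop over spreadsheet rows. A rough sketch, where `MainframeEmulator` and its methods are hypothetical stand-ins for whatever your RPA tool actually exposes (UiPath, Blue Prism, and friends each differ):

```typescript
// Hypothetical interface for the emulator the bot drives; a real RPA
// tool would provide its own way to do each of these steps.
interface MainframeEmulator {
  open(): void;
  navigateToCustomer(id: string): void;
  setField(field: string, value: string): void;
  save(): void;
  close(): void;
}

// One spreadsheet row per iteration: the same binary process a human
// contractor would repeat, with no branching or interpretation.
function updateCustomers(
  rows: { customerId: string; newValue: string }[],
  emulator: MainframeEmulator
): number {
  let updated = 0;
  for (const row of rows) {
    emulator.open();
    emulator.navigateToCustomer(row.customerId);
    emulator.setField("branchCode", row.newValue); // the single field to change
    emulator.save();
    emulator.close();
    updated++;
  }
  return updated;
}
```

Note how everything inside the loop is a fixed sequence. The moment a row needs an "if this, then something else" decision, you're outside the binary-process discipline Alex describes next.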
[00:15:20] Jeremy: It sounds like the best use case for it is an old system, a mainframe system, in COBOL maybe, that doesn't have an API. And so, uh, it makes sense, rather than go, okay, how can we get directly into the database...
[00:15:38] Alex: How can we build on top of it? Yeah,
[00:15:40] Jeremy: We build on top of it? Let's just go through the user interface that exists, but automate that process. And the example you gave sounds very well-defined. You're going to log in, and you're going to put in maybe this ID, here's the fields you want to get back,
and you're going to save those, and you didn't have to make any real decisions, I suppose, in terms of, do I need to click this thing or this thing. It's always going to be the same path to get there.
[00:16:12] Alex: Exactly. And that's really it: you need to be disciplined about your use cases and what those look like. You can broadly say, a use case that I am going to accept has these features, and one of the best ways to do that is to say it has to be a binary decision process, which means there is no dynamic or interpreted decision, or information, that needs to be made.
Exactly like that use case: it's very binary. Either it is or it isn't. You go in, you journey into there, you change that one thing, and that's it. There's no, oh, well, this information says this, which means I then have to go do this. Once you start getting into those if-else processes, you're going down a rabbit hole, and it can get very shaky, and that introduces extreme instability in what you're trying to do.
It also really expands your development time, because you have to capture these processes, and you have to say, okay, tell me exactly what we need to build this bot to do. For binary decision processes, that's easy: go in here, do this. But nine times out of 10, as you're trying to address this and solution for it, you'll find those uncertainties.
You'll find these things where the business says, oh, well, yeah, that happens one time out of 10, and this is what we need to do. And it's like, well, that's going to break the bot. Nine times out of 10, this bot is going to fall over. This is now where we start getting into the machine learning and AI realm.
And why RPA is sometimes classified as a subset of the AI or machine learning field, or as a pattern within that field, is because now that you have this bot, this software that enables you to do a human process, let's enable that bot to do decision-making processes, where it can interpret something and then do something else.
Because while we could just build a big decision tree to address every capability, you're never going to be able to do that, and it's just a really heavy, bad way to build things. So instead, let's throw in some machine learning capability, where it can understand what to do. And that's the next level of RPA application: okay, we've gone throughout our organization, we've found every binary thing that can be replaced with an RPA bot. Okay.
Now, what are the ones we said we couldn't do, because they had some of that decision-making that required too much dynamic intelligence behind it? Let's see if we can address those now that we have this. And so that's the 2.0 in RPA: addressing those non-binary paths.
I would argue that especially in organizations that are big enough to justify bringing in an RPA solution to solve for their processes. They have enough binary processes, binary decision processes to keep them busy.
Some people kind of get caught up in trying, right out of the gate, to say, we need to throw some machine learning in, we need to make these bots really capable, instead of just saying, well, we've got plenty of work just changing the binary processes, or addressing those. Let's just be disciplined and take that approach.
Uh, I will say, regarding RPA and bots: the best solution, or the only solution, when you talk about building a bot, is the one that you eventually turn off. So you can say, I built a bot that will go into our mainframe system and update this value, and that's successful.
I would argue that's not successful. When that bot is successful is when you can turn it off, because there's an enterprise solution that addresses it, and you don't have to have this RPA bot that lives over here and does it instead. Your enterprise capability now affords for it. And so that's really, I think, a successful bot or a successful RPA solution: you've been able to take away the pain point, that human process, until it can be correctly addressed by the systems that everyone uses.
[00:21:01] Jeremy: From the business perspective, what are some of the limitations or long-term problems with leaving an RPA solution in place?
[00:21:12] Alex: That's a good question. From the business side, there isn't one; it's solved for. Leaving it in place, other than servicing it and supporting it, there's no real issue there, especially if it's an internal system like a mainframe. You own that. If it changes, you'll know it; if it changes, it's probably being fixed or addressed.
So there's no problem. However, that's not the only application for RPA. Let's talk about another use case here. Your organization uses a bank, and you don't have an internal way to communicate with it. Your user literally has to go to the bank's website, log in, and see information that the bank is saying, hey, this is your stuff, right?
The bank doesn't have an API for their, that service. because that would be scary for the bank. They say, we don't want to expose this to another service. So the human has to go in there, log in, look at maybe a PDF and download it and say, oh, Okay.
So that is happens in a browser. So it's a newer technology.
This isn't our mainframe built in 1980. It's browser-based, it's on the internet and all that, but that's still a valid RPA application, right? It's a human process. There's no API, there's no easy programmatic way to solution for it. It would require the bank and your IT team to get together and, you know, hate each other thinking about why this is so hard. So let's just throw a bot on it that's going to go and log in, download this thing from the bank's website, and then send it over to someone else. And it's going to do that all day, every day. That's a valid application. And then tomorrow the bank changes its logo, and now my bot is confused.
Stuff has shifted on the page; it doesn't know where to click anymore. So you have to go in and update that bot, because sure enough, that bank's not going to send out an email saying, hey, by the way, we're upgrading our website in two weeks. Not going to happen; you'll know after it's happened.
So that's where you're going to have to upgrade the bot, and that's the indefinite use of RPA: it's going to have to keep running until someone else decides to upgrade their systems and provide for a programmatic solution, and that's completely outside the organization's capability to change. And so that's where the business would say, we need this indefinitely.
It's not up to us. And so that is an indefinite solution that would be valid, right? You can keep that going for 10 years. I would say you probably need to find a bank that meets your business needs a little more easily, but it's valid. And that would be a good way for the business to say, yes, this needs to keep running forever, until it doesn't.
[00:24:01] Jeremy: You brought up the case where the webpage changes and the bot doesn't work anymore. Specifically, you're giving the example of finance, and I feel like it would be basically catastrophic if the bot is moving money somewhere it shouldn't be moving because the UI has moved around, or the button's not where it expects it to be.
And I'm kind of curious what your experience has been with that sort of thing.
[00:24:27] Alex: You need to set organizational thresholds and say, if something is this impacting, or something could go this wrong, it is not acceptable for us to solve it with RPA. Even though we could do it, it's just not worth it. Some organizations say that's anything that touches customer data. Healthcare and banking especially say, yeah, we have a human process where the human will go and issue refunds to a customer, and that could easily be done via an RPA solution, but it's fraught with: what if it does something wrong? It's literally going to impact someone somewhere, their money or their security or something like that.
So that definitely should be part of your evaluation, and as an organization, you should set that up early and stick to it and say, nope, this is outside our purview even though we can do it; it has these risks.
So I guess the answer to that is you should never get to that point. But now we're going to talk about, I guess, the actual nuts and bolts of how RPA solutions work, and how they can be made to not act upon stuff when it changes, or if it does. So RPA software, by and large, operates by exposing the operating system's or the browser's underlying models and interpreting them.
Right. So when we talk about something like a mainframe emulator, you have your RPA software on Microsoft Windows. It's going to use COM, the Component Object Model, to see what is on the screen, what is on that emulator, and it's going to expose those objects to the software and say, you can pick these things and click on that and do that.
When we're talking about the browser, what the RPA software is looking at is not only the COM, the Component Object Model, which is the browser itself, but it's also looking at the DOM, the Document Object Model, which is the webpage being served through the browser. And it's exposing that and saying, these are the things that you can touch or operate on.
And so when you're building your bots, what you want to make sure is that the thing you're trying to access is truly unique, and that if the page changes, that one thing the bot is looking for will not change. So let's go back to the banking website, right?
We go in and we launch the browser, and the bot is sitting there waiting for the operating system to say, this process is running, which is what you wanted to launch, and it is in this state. The bot says, okay, I'm expecting this kind of COM object to exist. I see it does exist, it's this process, and it has this kind of name. Cool, Chrome is running. Okay, let's go to this website. And after I've typed this in, I'm going to look at the DOM and wait for it to return this expected webpage title. But they could change their webpage title, right? One day it can say, hello, welcome to this bank, and the next day it says, bank website. All of a sudden your bot breaks; it no longer finds what it was told to expect.
So you want to find something unique that conceivably will never change. And so you find that one thing in the DOM on the banking website, this element or this tag, and say, okay, there's no way they're changing that. And so it says, cool, the page is loaded. Now click on this field, which is log in.
Okay, you want to find something unique on that field that won't change when they upgrade from, you know, Bootstrap to some other UI framework. That's all well and good; that's what we call the happy path. It's doing this perfectly. Now you need to define what it should do when it doesn't find these things, which is not to keep going or find something similar. It needs to fail fast and gracefully, and pass that failure on to someone, and not keep going. And that's kind of how we prevent that scary use case where it's like, okay, it's gone in, it's logged into the bank website, now it's transacting bad things to bad places that we didn't program it for. Well, you unfortunately did not specify in a detailed enough way what it needs to look for.
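To make that fail-fast idea concrete, here is a minimal sketch in Python, not tied to any particular RPA product. The page is modeled as a plain list of element dicts rather than a real DOM, and the lookup refuses to guess when its uniqueness assumption breaks:

```python
def find_unique(elements, element_id):
    """Return the single element whose 'id' matches, or fail fast.

    Zero or multiple matches means our assumption about the page is
    broken, so we stop and surface the failure instead of clicking
    whatever looks "close enough".
    """
    matches = [e for e in elements if e.get("id") == element_id]
    if len(matches) != 1:
        raise RuntimeError(
            f"expected exactly one element with id={element_id!r}, "
            f"found {len(matches)}; stopping instead of guessing"
        )
    return matches[0]


# Hypothetical snapshot of the bank's login page, as the bot sees it.
page = [
    {"id": "logo", "tag": "img"},
    {"id": "login-field", "tag": "input"},
]
print(find_unique(page, "login-field"))
```

If the bank retitles the page or renames the field, the bot raises immediately and a human hears about it, rather than transacting against the wrong element.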
And if it doesn't find that, it needs to break instead of saying this is close enough. And so, as in all things software engineering, it's that specificity, it's that detail, that you need to hook onto. And that's also where, when we talk about this being a low-code, no-code solution, which is how RPA is sometimes marketed to the business,
it's just so often not the case. Because yes, it might provide a very user-friendly, business-friendly interface for you to build bots, but the knowledge you need to be able to ensure stability and accuracy in building the bots is a familiarity that's probably not going to be had in the business. It's going to be had by a developer who knows what the DOM and COM are, how the operating system exposes services and processes, and how
JavaScript works, especially when we're talking about single-page apps and React, where you do have this very reactive DOM that's going to change. You need to be fluent with that and know not only how HTML tags work and how CSS will change stuff on you with classes, but also how clicking on something in a single-page app, something as simple as a username input field, will dynamically change that whole DOM, and you need to account for it. So it's traditionally not as easy as saying, oh, the business person can just click, click, click, click, and then we have a bot. You'll have a bot, but it's probably going to be breaking quite often, and it's going to be inaccurate in its execution.
Say this is a business-friendly, user-friendly, non-technical tool, and I launch it, and it says, what do you want to do? And it says, let me record what you're going to do. And you say, cool.
And then you go about it: you open up Chrome, you type in the browser, you click here, click there, hit send, and then you stop recording. The tool says, cool, this is what you've done. Well, I have yet to see a solution that doesn't need further direction or defining on that process. You still need to go in there and say, okay,
you recorded this correctly, but you're not interpreting that field I clicked on correctly, or as accurately as you need to.
And if anybody hits F12 on their keyboard while they have Chrome open, they'll see how the DOM is built. Especially if the page is using any kind of templated webpage software, it's going to have a lot of cruft in that HTML. So while, yes, the recording did correctly see that you clicked on the input box,
what it's actually captured is that you clicked on the div that is four levels above it, the parent, and there are other things within that as well. And so the software could be correctly clicking on that later, but other things could be in there, and you're going to get some instability.
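As a rough illustration of what paring that down means, here is a sketch that takes the recorded click target, modeled as a hypothetical nested dict rather than a real DOM API, and descends past the wrapper divs to the input the user actually meant:

```python
def innermost_input(node):
    """Descend past wrapper elements to the first input underneath.

    `node` is the element the recorder latched onto (often a div
    several levels above what the user actually clicked). Returns
    None if there is no input anywhere below it.
    """
    if node.get("tag") == "input":
        return node
    for child in node.get("children", []):
        found = innermost_input(child)
        if found is not None:
            return found
    return None


# The recorder saw a click on this outer div, four levels up.
recorded = {
    "tag": "div",
    "children": [
        {"tag": "div", "children": [
            {"tag": "div", "children": [
                {"tag": "input", "id": "username"},
            ]},
        ]},
    ],
}
print(innermost_input(recorded))  # the username input, not the wrapper div
```

Real RPA studios surface this as selector editing rather than code, but the underlying judgment, which node in the tree is the stable target, is the same.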
So the human, or the business bot builder, the roboticist, I guess, would need to say, okay, listen, we need to pare this down. But it's even beyond that. There are concepts that you can't get around when building bots that are unique to software engineering, and even though they're very basic, it's still sometimes hard for the business user to learn them.
And I'm talking concepts as simple as for loops, or loops in general, where the business of course has knowledge of what we would call a loop, but they wouldn't call it a loop, and it's not as accurately defined. So they have to learn that. And it's not as easy as just saying, oh yeah,
do a loop. The business will say, well, what's a loop?
Like, I know conceptually what a loop could be, like a loop when I'm tying my shoe. But when you're talking about a loop, that's a very specific thing in software: what you can do, and when you shouldn't do it. And that's something that, no matter how good your low-code, no-code solution might be, it's going to have to afford for that concept.
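The loop concept in question is just "do the same steps for each item". A tiny Python sketch, with made-up invoice rows standing in for whatever table the bot scraped:

```python
# Hypothetical invoice rows; in a real bot these would come from a
# spreadsheet or a web table the bot has read.
invoices = [
    {"id": "INV-001", "amount": 120.00},
    {"id": "INV-002", "amount": 75.50},
    {"id": "INV-003", "amount": 300.25},
]

total = 0.0
for invoice in invoices:        # "for each invoice, repeat the same steps"
    total += invoice["amount"]  # one repetition of the work

print(f"processed {len(invoices)} invoices, total {total:.2f}")
```

The subtlety the business user has to pick up is exactly what the transcript describes: not just writing the loop, but knowing when it should run zero times, when it should stop, and what to do when an item doesn't look like the others.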
And so a business user is still going to have to have some lower-level capability to apply those concepts. And I've yet to see anybody be able to get around that in their RPA solutions.
[00:33:42] Jeremy: So in your experience, even though these vendors may sell it as a tool that anybody can just sit down and use, you would want a developer to sit with them, or see the result, and try to figure out, okay, what do you really want this code to do?
Not just the broad strokes that you were hoping the tool was going to take care of for you?
[00:34:06] Alex: That's exactly right, and that's how every organization will come to that realization pretty quickly. The ahead-of-the-game ones have said, okay, we need to have a really good COE structure to this robotic operating model, where we can have a software engineering, developer capability that sits with the business capability,
and they can marry with each other. Other businesses, who may take these vendors at their word and say, it's low-code, meant for the business, it just needs to be on and accessible, and then our business people are just going to go in there and do this, find out pretty quickly that they need some technical guidance, because they're building unstable or inaccurate bots.
And whether they come to that sooner or later, they always come to it. They realize that, okay, there's a technical capability needed here. And this is not just RPA; this is the story of all low-code, no-code solutions that have ever existed. It always comes around to: while this is a great interface, and it's very easy and it makes concepts easy,
every single time, there is a technical capability that needs to be afforded.
[00:35:26] Jeremy: For the web browser, you mentioned the DOM, which is how we typically interact with applications there. But for native applications, you briefly mentioned COM. And I was wondering, when someone is writing a bot, what are the sorts of things they see, or what are the primitives they're working with?
Like, is there a name attached to each button, each text field?
[00:35:54] Alex: Wouldn't that be a great world to live in? So, there's not. As we build things in the DOM, we've seen people getting much better about using uniqueness when they build those things, so that you can latch onto them. But when things were built for the COM, or, you know, .NET for the OS, no one was thinking, oh yeah, we're going to automate this,
or, we need to make this button here unique from that button over there in the COM. They didn't care about, you know, different names. So yeah, that is sometimes a big issue when you're using an RPA solution. You say, okay, cool, look at this, your calculator app,
and okay, it's showing me the component object model this was built with. That's describing what it's looking at, but none of these nodes have a name. They're all node one, node 1.1, node two, or whatnot, or a button is just "button", and there's no uniqueness around it. You see a lot of that in legacy, older software, and legacy here means things built in 2005, 2010.
You do see that, and that's the difficulty at that point. You can still solve for this, but what you're doing is using send keys. So instead of saying, okay,
RPA software, open up this application and then look for this object in the COM and click on it, it can't, because there is no uniqueness.
So what you say is, just open up the software and then hit tab three times, and that should get you to this one place that was not unique, but we know that if you hit tab three times it's going to get there. Now, that's all well and good, but there are so many things that could interfere with that and break it.
And there's no context for the bot to grab onto, to verify, okay, I am there. Any one thing, say a pop-up, essentially hijacks your send keys, right? And so the bot, yes, absolutely hit tab three times, and it should be in that one place. It thinks it is, and it hits enter. But in between the first and second tab, a pop-up happened, and now it's latched onto this other process, hits enter, and all of a sudden Outlook's opening. The bot doesn't know that, but it's still going, and it's going to enter financial information into, oops, an email that it launched, because it thought hitting enter again would do so.
That's where you get that instability. There are other ways around it, other solutions.
And this is where you get into using lower-level software engineering solutioning instead of doing it exactly how the user does it. When we're talking about the operating system and Windows, there are a ton of interop services and assemblies that an RPA solution can access.
So instead of cracking open Excel, double-clicking on an Excel workbook, waiting for it to load, and then reading information and putting information in, you can use, say, the Office 365 interop service assembly and say, hey, launch this workbook without showing the UI, attach to that process,
and then just send information into it using that assembly. The human user can't do that; they can't manipulate stuff like that, but the bot can, and it achieves the same end the human user was trying for. And it's much more efficient and stable, because the UI couldn't afford that kind of stability.
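The same headless idea, sketched with the standard library and a CSV file standing in for the workbook; a real bot would go through the Office interop assemblies or a spreadsheet library, which is beyond a quick sketch:

```python
import csv
import os
import tempfile

# Headless write: no spreadsheet window ever opens. The bot streams
# rows straight into the file, analogous to updating a workbook
# through the interop assembly without showing the UI.
rows = [
    ["id", "status"],
    ["INV-001", "paid"],
    ["INV-002", "open"],
]

path = os.path.join(tempfile.gettempdir(), "bot_output.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Verify headlessly too: read it straight back, no UI involved.
with open(path, newline="") as f:
    readback = list(csv.reader(f))
print(readback)
```

Nothing here is something a human at a keyboard could replicate step for step, which is exactly the organizational question Alex raises next.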
So that would be a valid solution. But at that point, you're really migrating into a software engineering, IT-developer solution of something that you were trying not to do that for. So why not just go and solve for it with an enterprise or programmatic solution in the first place?
So that's the balance.
[00:40:18] Jeremy: Earlier you were talking about how the RPA needs to be something that the person is able to do, and it sounds like in this case there still is a way for the person to do it. They can open up the Excel sheet, right? It's just that the way the RPA tool is doing it is different.
[00:40:38] Alex: Right. And more efficient and more stable, certainly. Especially when we're talking about Excel: you have an Excel workbook with, you know, 200,000 lines. Just opening that, that's your day. Excel is just going to take its time opening and visualizing that information for you, whereas an RPA solution doesn't even need to crack that open.
It can just send data directly to that workbook, and that's a valid solution. And again, some of these processes, it might be just two people at your organization essentially doing it, so it's not at a threshold where you need an enterprise solution, but they're spending 30 minutes of their day just waiting for that Excel workbook to open, and then manipulating the data and saving it.
And then, oh, their computer crashed. So you can do an RPA solution to essentially build a more efficient way of doing it, and that would be using the programmatic solution. But you're right, it is doing it in a way that a human could not achieve, and that, again, is where the discipline and the organizational aspect of this comes in, where you're asking, is that acceptable?
Is it okay to have it do things in this way, that are not human, but achieve the same ends? And if you're not disciplined, that creeps, and all of a sudden you have an RPA solution doing things in a way where the whole reason to bring in RPA was to not have something doing that. And that's usually where the stuff falls apart. IT all of a sudden perks their head up and says, wait, I have a lot of connections coming in from this one computer, doing stuff very quickly with, you know, a SQL query. What is going on? And so, all of a sudden, someone built a bot to essentially do a programmatic connection.
And IT is like, you should not be doing this. Who gave you these permissions? Who did this? Shut down everything that is RPA here until we figure out what you guys went and did. So that's the dance.
[00:42:55] Jeremy: It's almost like there's this hidden API, or this API that you're not intended to use, but in the process of trying to automate this thing you use it, and then if your IT is not aware of it, things just kind of spiral out of control.
[00:43:10] Alex: Exactly right. So a use case of that would be: we need to get California tax information on alcohol sales. We need to see what each county taxes for alcohol, to apply to something. And so today the human users go into the California, you know, tobacco, wildlife, whatever website, and they look stuff up. Okay, that's very arduous.
Let's throw a bot on that. Let's have a bot do that. Well, the bot developer, a smart person, knows their way around Google, and they find out, well, California has an API for that. So instead of the bot cracking open Chrome, it's just going to send this REST API call and get information back, and that's awesome and accurate and way better than anything. But now all of a sudden IT sees connections going in and out, stuff happening very quickly, information coming into your systems in a way that you did not know was going to be happening. And so while it was all well and good, it's a good way for the people whose job it is to protect you, or to know about these things, to get very angry, rightly so, that this is happening.
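That shift, from driving a browser to calling an API directly, might look like the sketch below. The endpoint and response shape are entirely made up for illustration, and `fetch` is injectable so the parsing logic can be exercised without any network call:

```python
import json
from urllib.request import urlopen


def county_alcohol_tax(county, fetch=None):
    """Fetch one county's alcohol tax rate from a hypothetical API.

    By default this would make a real HTTP request; passing `fetch`
    lets callers substitute a stub or a cached response instead.
    """
    url = f"https://api.example.ca.gov/alcohol-tax/{county}"  # made-up endpoint
    if fetch is None:
        def fetch(u):
            with urlopen(u) as resp:
                return resp.read().decode()
    data = json.loads(fetch(url))
    return data["rate"]


# Exercised with a stubbed response rather than a live call:
stub = lambda url: '{"county": "alameda", "rate": 0.0825}'
print(county_alcohol_tax("alameda", fetch=stub))
```

Notice there is no browser, no DOM, and no send keys anywhere in it, which is exactly why IT sees it as programmatic integration rather than RPA.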
That's an organizational challenge, an oversight challenge, and a developer challenge, because what you're getting into is the problem of having overly technical people build these RPA bots, right? So on one hand, we have business people who are told, hey, just crack this thing open and build it,
and they don't have enough technical fluency to actually build a stable bot, because they're just taking it at face value. On the other hand, you have software engineers or developers that are very technical, who say, oh, this process? Yeah, okay, I can build a bot for that. But what if I used these interop services and assemblies that Microsoft gives me, and I can access it like that?
And then I can send an API call over here, and while I'm at it, I'm just going to spin up a server on this one computer that can do this when the bot talks to it. And so you have the opposite problem. Now you have something that is just not at all RPA; it's just using the tool to manipulate stuff programmatically.
[00:45:35] Jeremy: So, as a part of all this, it's using the same credentials as a real user, right? You're logging in with a username and password. If the form requires something like two-factor authentication or something like that, how does that work, since it's not an actual person?
[00:45:55] Alex: Right. So in a perfect world, you're correct: a bot is a user. I know a lot of times you'll hear people say, oh, I have 20 RPA bots. What they're usually saying is, I have 20 automations being run for separate processes, with one user's credentials, on a VDI. So you're right,
they are using a user's credentials, with the same permissions as any user that does that process. That's why it's easy. But now we have these concepts like two-factor authentication, which every organization is using, and which should require something that exists outside of that bot user's environment. So how do you afford for that? In a perfect world, it would be a service account, not a user account, and service accounts are governed a little differently. A lot of times service accounts have much more stringent rules, but they also allow for things like password resets not being a thing, or two-factor authentication not being a thing.
So that would be the perfect solution, but now you're dragging in IT, so if you're not structurally set up for that, that's going to be a long slog. So what do people do? Some people will literally have a business person who keeps the two-factor auth for that bot user on their phone,
and then they'll just go in and say, yeah,
that's me. That's untenable. So sometimes what a lot of these vendors, like Microsoft, for instance, allow you to do is install a two-factor authentication application on your desktop, so that when you go to log in, the website says, hey, type in your password,
cool, okay, now give me that code that's on your two-factor auth app. The bot can actually launch that app, copy the code, paste it in there, and be on its way. But you're right, now you're having to afford for things that aren't really part of the process you're trying to automate. They are the incidentals that also happen.
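Under the hood, an authenticator app is just computing RFC 6238 TOTP codes from a shared secret, which is why a desktop app, or a bot, can produce the same code a phone would. A standard-library sketch; whether letting a bot hold the 2FA secret is acceptable is an organizational policy question, not a technical one:

```python
import base64
import hmac
import struct
import time


def totp(secret_b32, for_time=None, digits=6, step=30):
    """Compute an RFC 6238 TOTP code (SHA-1, 30-second time steps)."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if for_time is None else for_time) // step)
    msg = struct.pack(">Q", counter)
    digest = hmac.new(key, msg, "sha1").digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


# RFC 6238's published test secret ("12345678901234567890" in base32),
# evaluated at t=59 seconds:
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", for_time=59))  # → 287082
```

This only works if the site enrolls the "bot user" with a TOTP secret; push-approval or SMS-based second factors can't be reproduced this way, which is where the phone-on-a-desk workarounds come from.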
And so you have to build your bot to afford for those things and interpret, oh, I need to do two-factor authentication. And a lot of times, especially if you have an entirely business-focused robotic operating model, they will forget about those things, or find ways around them that the bot isn't addressing, like keeping that authenticator app on their phone.
That's stuff that definitely needs to be addressed, and sometimes it's only found at runtime. Like, oh, it's asking for a login, and when I developed it I didn't need to do that, because I had the cookie that said you're good for 30 days. But now, oh no.
[00:48:47] Jeremy: Yeah. You could have two-factor, you could have it asking you to check your email for a code, there could be a fraud warning. There's all sorts of failure cases that can happen.
[00:48:58] Alex: Exactly. And those things matter when we talk about third-party provider vendors. Going back to the banking website: if you don't tell them that you're going to be using a bot to get their information, or to interface with that website, you're setting yourself up for a bad time, because they're going to see the kind of runtime behavior that is not possible at that scale by a user.
And so you run into that issue at runtime. But then, you're correct, there are other things you might run into at runtime that are not, again, part of the process. The business didn't think that was part of the process; it's just something they do that the bot actually has to afford for. That's part of the journey in building these.
[00:49:57] Jeremy: When you're building these bots, what are the types of tools that you've used in the past? Are these commercial packages? Are they open source? What does that ecosystem look like?
[00:50:11] Alex: Yeah, in this space we have three big ones, which are Automation Anywhere, UiPath, and Blue Prism. Those are the RPA juggernauts providing this software to the companies that need it. And then you have smaller ones that are trying to get in there, or provide stuff in a little different way. And you even have big juggernauts now trying to provide for it, like Microsoft with something like Power Automate Desktop.
Say three years ago, all of these RPA solution softwares operated in the same kind of way: you would install it on your desktop, and it would provide you a studio to either record or define the process that was going to be automated on that desktop when you pushed play. And they all kind of exposed or operated in the same way: they would interpret the COM or the DOM that the operating system provided, the things that something like Task Scheduler has traditionally exposed. Their real value proposition was the orchestration capability and the management of that.
So I build a bot to do this, Jim over there built a bot to do that. Okay. This RPA software not only enabled you to define those processes, but its real value was that it gave you a place where I can say, this needs to run at this time on this computer,
and I need to be able to monitor it, and it needs to return information, and all that kind of orchestration capability. Now all of these RPA solutions actually exist, like everything else, in the browser. So instead of installing the application and launching it, with the orchestration capability installed on another computer that looked at these computers and ran stuff on them,
now it's all in the cloud, as it were, and they are in the browser. So I go to wherever my RPA solution is in my browser, and it says, okay, cool, you still need to install something on the desktop where you want the bot to run, and it deploys it there. But I define and build my process in the provided browser studio.
And then they give you the capability to orchestrate, monitor, and receive information on those bots that you have running. And what they're now providing as well is the ability to tie other services into your bot so that it has expanded capability. So I'm using Automation Anywhere and I built my bot, and it's going, and it's doing this or that,
and Automation Anywhere says, hey, that's cool. Wouldn't you like your bot to be able to do OCR? Well, we don't have our own OCR engine, but you, as an enterprise, probably do. Just use your Kofax OCR engine, or hey, if you're really high-speed, why don't you use your Azure Cognitive Services capability?
We'll tie it right into our software. And so when you're building your bot, instead of just cracking open a PDF and using send keys, control-C and control-V, to do stuff, we'll use the OCR engine you've already paid for to understand it. And so that's how they expand what they're offering into addressing more and more capabilities.
[00:53:57] Alex: But now we're migrating into a territory where it's like, well, things have APIs. Why even build a bot for them? You can just build a program that uses the API, and the user can drive this. And so that's where people kind of get stuck: they're using RPA on something that just as easily provides for a programmatic solution as opposed to an RPA solution.
But because they're in their RPA mode and they say, we can use a bot for everything, they don't even stop and investigate and say, hey, wouldn't this be just as easy to generate a React app and let a user use this, because it has an API, and IT can just as easily monitor and support that because it's in an Azure resource bucket?
That's where an organization needs to be clear-eyed and say, okay, at this point RPA is not the actual solution. We can do this just as easily over here; let's pursue that.
[00:54:57] Jeremy: The experience of making these RPAs, it sounds like you have this browser-based IDE, and there's probably some kind of drag-and-drop setup, and then you mentioned JavaScript. So does that mean you can dive a little bit deeper, and if you want to set up specific rules or loops, you're actually writing that in JavaScript?
[00:55:18] Alex: Yeah, so not necessarily. Again, the business does not know what an IDE is. It's a studio.
But you're correct, it's an IDE. Whether we're talking about Blue Prism or UiPath or Automation Anywhere, they all have a different flavor of what that looks like and what they enable.
Traditionally, Blue Prism gave you a studio that was more shape-based, where you are using UML shapes to define or describe your process. Whereas Automation Anywhere traditionally used essentially lines or descriptors. So I say, hey, I want to open this file, and your studio would just show a line that said "open file".
Although they do now, all of them, have a shape-based way to define your process: go here, here, here's a circle which represents this, let's do that. Or a way for you to define it more creatively, in a text-based way. When we talk about JavaScript, or anything like that, they all provide predefined actions, like, I want to open a file or execute this. But all of them as well, at least last time I checked, also allow for a way to say, I want to programmatically run something that I define.
And since they're all in the browser, it is JavaScript that you're going to be using: hey, run this JavaScript, run this function. Previously, things like Automation Anywhere would let you write stuff in .NET, essentially, to do that. But again, now everything's in the browser.
So yeah, they do provide a capability to introduce more low-level capability into your automation. That can get dangerous. It can be powerful, and it can be stabilizing, but it can be a very slippery slope, where you have an RPA solution bot that does the thing, but really all it does is start up and then execute code that you built.
[00:57:39] Alex: Like, what was the point in the first place?
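To make the trade-off concrete, a custom script action of the kind Alex describes — logic the studio's predefined drag-and-drop actions can't express — might look something like this. This is a hypothetical sketch: the function name, the input shape, and how the studio would invoke it are all assumptions, since each vendor exposes scripting in its own way.

```javascript
// Hypothetical custom action an RPA studio might run via its
// "execute JavaScript" step. Input: rows the bot scraped from a
// legacy screen. Output: only the overdue invoices, with a
// computed days-late field the predefined actions can't derive.
function escalateOverdueInvoices(rows, today) {
  return rows
    .filter(row => new Date(row.dueDate) < today) // keep overdue only
    .map(row => ({
      ...row,
      daysLate: Math.floor((today - new Date(row.dueDate)) / 86400000),
    }));
}

const rows = [
  { id: "INV-001", dueDate: "2022-01-01" },
  { id: "INV-002", dueDate: "2022-03-01" },
];
const result = escalateOverdueInvoices(rows, new Date("2022-02-01"));
console.log(result); // only INV-001 remains, 31 days late
```

The point Alex makes holds even for a snippet this small: the loop and the date math live outside the studio's visual flow, so the next maintainer has to read code, not shapes.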
[00:57:43] Jeremy: Yeah. And I suppose at that point, then anybody who knows how to use the RPA tool but isn't familiar with that code you wrote, they just can't maintain it.
[00:57:54] Alex: You have business continuity, and this goes back to: it has to be replicable, or as close to the human process as you can make it, because that's going to be the easiest to inherit and support. That's one of the great things about it. Whereas if you're a low-level programmer, a dev who says, I can easily do this with a couple of lines of, you know, .NET or TypeScript or whatever.
And so the bot just starts up and executes. Well, unless someone that is just as proficient comes along later and says, this is why it's breaking, you now have an unsupportable business solution. That's bad juju.
[00:58:38] Jeremy: You have the software engineers who want to write code, then you have the people who are either in business or in IT that go, I don't want to look at your code.
I don't want to have to maintain it. So it's like, if you're a software engineer coming in, you almost have to fight that urge to write anything yourself, and figure out, okay, what can I do with the tool set, and only go to code if I can't do it any other way.
[00:59:07] Alex: That's correct, and it takes discipline. More often than not, it's not as fun as writing the code, where you're like, I can do this. And this is really where the wheels come off: you went to the business that says, I have this process, very simple, I need to do this, and you say, cool, I can do that.
And then you're sitting there writing code and you're like, but you know what? I know what they really want to do, and I can write that now. And so you've changed the process. And nine times out of ten, the business will be like, oh, that's actually what we wanted. The human process was just as close as we could get, nothing else, but you're right.
That's exactly what we needed, thank you. Nine times out of ten, they'll love you for that. But now you own their process. Now you're the one that defined it. You have to do the business continuity, you have to document it, and when it falls over, you have to pick it back up and you have to retrain.
And unless you have an organizational capacity to say, okay, I've gone in and changed your process. I didn't automate it, I changed it. Now I have to go in and tell you how I changed it and how you can do it. Unless you have built your robotic operating model and your team to afford for that, your developer could be writing checks bigger than they can cash.
Even though this is a better capability.
[01:00:30] Jeremy: You sort of touched on this before, and I think this is probably the last topic we'll cover, but you've been saying how the end goal should be to not have to use the RPAs anymore.
And I wonder if you have any advice for how to approach that process, and what are some of the mistakes you've seen people make?
[01:00:54] Alex: Mm-hmm. I mean, the biggest mistake I've seen organizations make, I think, is throwing the RPA solution out there, building bots, and they're great bots, and they are creating that value. They're enabling you to save money, and also enabling your employees to go on and do better, more gratifying work. But then they say, that's it, that's as far as we're going to think, instead of taking those savings and saying, this is for replacing this pain point that we had to get a bot to do in the first place.
That's a huge common mistake. Absolutely understandable: if I'm a CEO, or even the person in charge of enterprise transformation, it's very easy for me to say, ha, victory, here's our money, here's our savings, I justified what we've done, go have fun. Instead of saying, we need to squirrel this money away and give it to the people that are going to change the system. So that's definitely one of the biggest things.
The problem with that is that it's not realized until years later, when they're like, oh, we're still supporting these bots. So it is, upfront, having a turn-off strategy. When can we turn this bot off? What is that going to look like? Does it have a roadmap that will eventually do that?
And that I think is the best way, and that will define what kind of processes you do indeed build bots for: you go to IT and say, listen, we've got a lot of these user processes, human processes, that are doing this stuff. Is there anything on your roadmap that is going to replace that? And they say, oh yeah, you know, in three years we're actually going to be standing up our new thing.
We're going to be converting, and part of our analysis of the solution that we will eventually stand up will be, does it do these things? And so yes, in three years you're good. And you say, cool, those are the processes I'm going to automate, and we can shut those off.
That's your point of entry for these things. Not doing that leads to bots running and doing things even after there is an enterprise solution for that. And more often than not, I would say greater than five times out of ten, when we are evaluating a process to build a bot for, we say, whoa, no, actually, you don't even need to do this.
Our enterprise application can do this. You just need retraining, because your process is just old and no one knew you were doing this, and so they didn't come in and tell you, hey, you need to use this.
So that's really a lot of times what the issue is. And then after that, we go in and say, okay, no, there's no solution for this. This is definitely something a bot needs to do. Let's make sure, number one, that there isn't a solution on the horizon, six months to a year out, because otherwise we're just going to waste time. But let's make sure there is, or at least that IT, or the people in charge, are aware that this is something that needs to be replaced, bot or no bot.
And so let's have an exit strategy. Let's have a turn-off strategy.
Jeremy: When you have applications that are relatively modern, like you have a Jira, a ServiceNow, you know, they must have some sort of API, and it may just be that nobody has come in and told them, you just need to plug these applications together.
[01:04:27] Alex: And so kind of what you're hitting on and surfacing is the future of RPA. Whereas everything we're talking about is using a bot to essentially bridge a gap: moving data from here to there that can't be done programmatically, accessing something from here to there that can't be done programmatically.
So we use a bot to do it. That's only going to exist for so long. Legacy can only be legacy for so long, although conceivably, because we had that big COBOL thing, maybe longer than we'd all like, but eventually these things will be upgraded. And so either the RPA market will get smaller, because there's less legacy out there, and RPA as a tool and a solution will become much more targeted towards specific systems, or we expand what RPA is and what it can afford for. And I think that is more likely the case, and that's the future: where bots or automations aren't necessarily interpreting the COM and the DOM and saying, okay, click here, do that.
But rather, you're able to quickly build bots that utilize APIs that are built in and friendly. And so what we're talking about there is that things like Appian or MuleSoft, which are these kinds of API integrators, are eventually going to be classified as RPA. They're going to be within this realm.
And I think where you're seeing that at least surfaced, or moving towards, is really what Microsoft's offering, where they have something called Power Automate, which essentially is just a very user-friendly way to access APIs that they built or that other people have built.
So I want to go and I need to get information to ServiceNow. ServiceNow has an API. Your IT can go in and build you a nice little app that does a RESTful call to it, a REST API call to it, gets information back. Or you can go in and use Microsoft Power Automate and say, okay, I want to access ServiceNow.
And it says, cool, these are the things you can do. And I say, okay, I just want to put information in this ticket. And we're not talking about GET or PATCH or PUT or anything like that. We're just saying, ah, that's what it's going to do. And that's kind of what Microsoft is offering. I think that is the new state of RPA: being able to interface in a user-friendly way with APIs, 'cause everything's in the browser, to the point where, you know, Microsoft's enabling add-ins for Excel to be written in JavaScript, which is just the new frontier. So that's kind of going to be the future state of this, I believe.
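As a rough sketch of what a tool like Power Automate is wrapping, "put information in this ticket" ultimately becomes a REST call. The helper below builds such a request against ServiceNow's Table API; the instance name, record sys_id, and bearer-token auth are placeholder assumptions for illustration, not a definitive integration.

```javascript
// Hypothetical helper: builds the HTTP request that an "update a
// record" action boils down to. ServiceNow's Table API exposes
// records at /api/now/table/{table}/{sys_id}; a PATCH updates
// the fields you send. Instance, sys_id, and token are placeholders.
function buildTicketUpdate(instance, table, sysId, fields, token) {
  return {
    url: `https://${instance}.service-now.com/api/now/table/${table}/${sysId}`,
    options: {
      method: "PATCH",
      headers: {
        "Content-Type": "application/json",
        Accept: "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(fields),
    },
  };
}

// What the no-code action hides from the business user:
const req = buildTicketUpdate("example", "incident", "abc123",
  { comments: "Automated update from the bot" }, "TOKEN");
// fetch(req.url, req.options).then(res => res.json()); // actual send
```

Power Automate's ServiceNow connector presents this same operation as a form with named fields, which is exactly the shift Alex describes: the user picks "update a record" without ever seeing the verb or the URL.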
[01:07:28] Jeremy: So moving from RPAs being this thing that's gonna click through a website, click through a desktop application, instead it's maybe more of this higher-level tool where the user will still get this, I forget the term you used, but this tool to build a workflow, right? A studio. Okay. And instead of saying, oh, I want this to click this button or fill in this form,
it'll be, I want to get this information from ServiceNow, and I want to send a message using that information to Slack or to Twilio. You're basically talking directly to these different services and just telling it what you want and where it should go.
[01:08:14] Alex: That's correct. So, as you said, everything's going to have an API, right? Seemingly everything has an API. And so instead of our RPA bots or solutions being UI-focused, they're going to be API-focused, where it doesn't have to use the user interface, it's going to use the other service. And again, the cool thing about APIs in that way is that it's not directly connecting to your data source.
It's the same as your UI for a user: it sits on top of it, it gets the request, and it correctly interprets that. It does the same thing as your UI, where I say, I click here, and wherever it says, okay, yeah, you're allowed to do that, go ahead. So that's kind of the benefit to that.
But to your point, the user experience, whether you're using a UI or an API to build your RPA bot, is going to be the same experience for the user. And at this point, what we're talking about is, well, where's the value offering, or what is the value proposition, of RPA? And that's orchestration and monitoring and data, essentially.
Jeremy: We'll take care of hosting these for you. We'll take care of where they're going to run, giving you a dashboard, things like that.
[01:09:37] Alex: That's a hundred percent correct. It's providing a view into that thing and letting the business say, I want to no-code this, and I want to be able to just go in and understand and say, oh, I do want to do that. I'm going to put these things together and it's going to automate this business process that I hate but is vital, and I'm going to save it. The RPA software enables you to say, oh, I saw they did that, and I see it's running and everything's okay in the world, and I want to turn it on or off. And so it's that seamless kind of capability that that will provide.
And I think that's really where it isn't yet, but really where it's going. It'll be interesting to see when the RPA providers switch to that kind of language, because currently and traditionally they've gone to business and said, we can build you bots, or no, your users can build bots, and that's the value proposition: they can go in.
And instead of, as in Excel, where you had one very, very advanced user building macros into Excel with VBA, unknown to IT or anybody else, instead, you know, build a bot for it. And so that's their business proposition today.
Instead, it's going to shift, and I'd be interested to see when it shifts, where they say, listen, we can provide you a view into those solutions and you can orchestrate them. And oh, here's the studio that enables people to build them.
But really what you want to do is give that to your IT and just say, hey, we're going to go over here and address business needs and build them. But don't worry, you'll be able to monitor them and at least say, yeah, okay, this is going.
[01:11:16] Jeremy: Yeah, and that's a shift. It sounds like where RPA is currently, you were talking about how, when you're configuring them to click on websites and GUIs, you really do still need someone with the software expertise to know what's going on. But maybe when you move over to communicating with APIs, maybe that won't be as important. Maybe somebody who just knows the business process really can just use that studio and get what they need.
[01:11:48] Alex: that's correct. Right. Cause the API only enables you to do what it defined right. So service now, which does have a robust API, it says you can do these things the same as a user can only click a button that's there that you've built and said they can click. And so that is you can't go off the reservation as easy with that stuff, really what's going to become prime or important is as no longer do I actually have an Oracle server physically in my location with a database.
Instead I'm using Oracle's cloud capability, which exists on their own thing. That's where I'm getting data from. What becomes important about being able to monitor these is not necessarily like, oh, is it falling over? Is it breaking? It's saying, what information are you sending or getting from these things that are not within our walled garden.
And that's really where IT or InfoSec is going to be maybe the main orchestrator, the owner, of RPA, because they're going to be the ones to say, you can't get that. You're not allowed to get that information. It's not necessarily that you can't do it, or that you can't do it in a dangerous way, but rather, I don't want you transporting that information or bringing it in.
So that's really what's going to change.
[01:13:13] Jeremy: I think that's a good place to wrap it up, but is there anything we missed, or anything else you want to plug before we go?
[01:13:21] Alex: No. Uh, I think this was uh pretty comprehensive and I really enjoyed it.
Jeremy: Alex, thanks for coming on the show.
[01:13:28] Alex: No, thank you for having me. It's been a joy.
[01:13:31] Jeremy: This has been Jeremy Jung for Software Engineering Radio. Thanks for listening.