Towards Data Science

Help
Help

Suggested Topics
- What is Podknife?
  
  Podknife is a curated podcast information and review site designed to be accessible in your browser from any device. We’re working to build the most useful podcast information source available by providing you with as much publicly available information about each podcast in our database as we can find and keeping it as up to date as possible.
- Can I listen to podcasts on Podknife?
  
  Yeah! There's a player at the bottom of the page.
- How can I get my podcast on Podknife?
  
  We’d love to hear about your podcast and consider it for addition. You can suggest your own by logging in and selecting the “Suggest a Podcast” link from the menu. Fill out the form and we’ll take it from there.
- How can I submit a review of a podcast?
  
  To submit a review of a podcast, go to the page of the podcast you want to review and click the "Submit Review" button below the image associated with the podcast. Clicking that button brings you to a new page where you can select a rating between 1 and 5 stars and write a brief review.
- How can I submit a review of a podcast episode?
  
  To submit a review of a podcast episode, go to the page of the podcast whose episode you want to review and find the specific episode you're looking for. Clicking on any episode reveals a "View Episode" button and clicking on that brings you to an episode-specific page where you can find more information displayed about the episode along with a "Submit Review" button. Clicking that brings you to a new page where you can select a rating between 1 and 5 stars and write a brief review of the episode.
- Is there a Podknife app?
  
  Not yet - we're keeping it in the browser at least for now. No app download required to access anything on the site.
- How can I change my display name?
  
  To change your display name, log in to Podknife and select "Account Settings" in the dropdown menu. From the Account Settings page, enter your desired new display name and press the "Save" button to confirm the change.
- How can I change my password?
  
  To change your password, log in to Podknife and select "Account Settings" in the dropdown menu. From the Account Settings page, enter your desired new password in the two fields below "Update Password" and press the "Save" button to confirm the change.
- How can I keep track of all my reviews?
  
  Within the menu, click on "My Profile", you will then be able to check all your reviews for both podcast & episode. You can also check all your favorite podcasts.
- How can I keep track of all my favorited podcasts?
  
  You can find podcasts you’ve favorited displayed on the My Profile page associated with your account (https://podknife.com/profile).
  You can share your My Profile page displaying your favorite podcasts as well as any reviews you’ve written publicly with a link in the form https://podknife.com/users/[displayname] where [displayname] is replaced by your Display Name.
- How can I favorite a podcast?
  
  On each podcast page you’ll find the outline of a star above the podcast title. Clicking this star fills it in and designates the podcast as a favorite (for logged in users) or invites the user to log in or register for an account (for users who are not logged in). You can find favorited podcasts displayed on the My Profile page associated with your account (https://podknife.com/profile).
- How can I register for a Podknife account?
  
  To register, click on the “Register” button in the top right of your browser window (or visit https://podknife.com/users/sign_up). Enter a Display Name to be associated with your account on the site (alongside any reviews you contribute, for example), an Email address to associate with the account and your password (twice for confirmation). Press the “Confirm” button and you should be all set, logged in and ready to review and favorite podcasts as you wish.
- How can I get in touch?
  
  We’d love to hear from you. You can reach us using the Feedback form (https://podknife.com/feedbacks/new) which can be found either in the footer on each page (for users who are not logged in) or in the dropdown menu (for logged in users).
  
  You can also reach us on Twitter by tweeting us @podknife (https://twitter.com/podknife) or on Facebook by messaging us from our page (https://www.facebook.com/podknife/).
- Log In
- Register
- Feedback
- Help
- Privacy Policy
- Terms of Use
© Podknife 2025

This podcast currently has no reviews.

Submit Review

Favorite Report Add all episodes to Queue

Towards Data Scienceinactive

Publisher |: The TDS team

Media Type |: audio

Podknife tags |: Artificial Intelligence,; Data Science,; Interview,; Machine Learning,; Technology

Categories Via RSS |: Technology

Note: The TDS podcast's current run has ended.

Researchers and business leaders at the forefront of the field unpack the most pressing questions around data science and AI.

Premiere Date |: 2019-07-16
Related Hashtags |: #ChatGPT,; #DataScience,; #MachineLearning,; #ML,; #Python
Frequency |: Weekly
Explicit |: No

This podcast currently has no reviews.

Submit Review

131 Available Episodes (131 Total) / Average duration: 00:49:38

Date Duration Most Reviewed Highest Rated

131 Available Episodes (131 Total)Average duration: 00:49:38

Oct 19, 2022

00:32:26

On the last episode of the Towards Data Science Podcast, host Jeremie Harris offers his perspective on the last two years of AI progress, and what he thinks it means for everything, from AI safety to the future of humanity. Going forward, Jeremie will be exploring these topics on the new Gladstone AI podcast.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

Chapters:

0:00 Intro
6:00 The Bitter Lesson
10:00 The introduction of GPT-3
16:45 AI catastrophic risk (paper clip example)
23:00 Reward hacking
27:30 Approaching intelligence
32:00 Wrap-up

Links

- The new Gladstone AI podcast, where I’ll be talking about one new, cutting-edge AI model each week in plain English (its use cases, its potential malicious applications, and its relevance to AI alignment risk).

- 80,000 Hours: a website where you can get advice on how to contribute to solving AI safety and AI policy problems.

- Concrete Problems in AI Safety: an oldie but a goodie, that introduces a lot of the central problems in AI alignment that remain open to this day.

Oct 12, 2022

00:58:22

Progress in AI has been accelerating dramatically in recent years, and even months. It seems like every other day, there’s a new, previously-believed-to-be-impossible feat of AI that’s achieved by a world-leading lab. And increasingly, these breakthroughs have been driven by the same, simple idea: AI scaling.

For those who haven’t been following the AI scaling sage, scaling means training AI systems with larger models, using increasingly absurd quantities of data and processing power. So far, empirical studies by the world’s top AI labs seem to suggest that scaling is an open-ended process that can lead to more and more capable and intelligent systems, with no clear limit.

And that’s led many people to speculate that scaling might usher in a new era of broadly human-level or even superhuman AI — the holy grail AI researchers have been after for decades.

And while that might sound cool, an AI that can solve general reasoning problems as well as or better than a human might actually be an intrinsically dangerous thing to build.

At least, that’s the conclusion that many AI safety researchers have come to following the publication of a new line of research that explores how modern AI systems tend to solve problems, and whether we should expect more advanced versions of them to perform dangerous behaviours like seeking power.

This line of research in AI safety is called “power-seeking”, and although it’s currently not well understood outside the frontier of AI safety and AI alignment research, it’s starting to draw a lot of attention. The first major theoretical study of power seeking was led by Alex Turner, who’s appeared on the podcast before, and was published in NeurIPS (the world’s top AI conference), for example.

And today, we’ll be hearing from Edouard Harris, an AI alignment researcher and one of my co-founders in the AI safety company (Gladstone AI). Ed’s just completed a significant piece of AI safety research that extends Alex Turner’s original power-seeking work, and that shows what seems to be the first experimental evidence suggesting that we should expect highly advanced AI systems to seek power by default.

What does power seeking really mean though? And does all this imply for the safety of future, general-purpose reasoning systems? That’s what this episode will be all about.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

- 0:00 Intro

- 4:00 Alex Turner's research

- 7:45 What technology wants

- 11:30 Universal goals

- 17:30 Connecting observations

- 24:00 Micro power seeking behaviour

- 28:15 Ed's research

- 38:00 The human as the environment

- 42:30 What leads to power seeking

- 48:00 Competition as a default outcome

- 52:45 General concern

- 57:30 Wrap-up

Oct 05, 2022

00:51:21

It’s no secret that a new generation of powerful and highly scaled language models is taking the world by storm. Companies like OpenAI, AI21Labs, and Cohere have built models so versatile that they’re powering hundreds of new applications, and unlocking entire new markets for AI-generated text.

In light of that, I thought it would be worth exploring the applied side of language modelling — to dive deep into one specific language model-powered tool, to understand what it means to build apps on top of scaled AI systems. How easily can these models be used in the wild? What bottlenecks and challenges do people run into when they try to build apps powered by large language models? That’s what I wanted to find out.

My guest today is Amber Teng, and she’s a data scientist who recently published a blog that got quite a bit of attention, about a resume cover letter generator that she created using GPT-3, OpenAI’s powerful and now-famous language model. I thought her project would be make for a great episode, because it exposes so many of the challenges and opportunities that come with the new era of powerful language models that we’ve just entered.

So today we’ll be exploring exactly that: looking at the applied side of language modelling and prompt engineering, understanding how large language models have made new apps not only possible but also much easier to build, and the likely future of AI-powered products.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

- 0:00 Intro

- 2:30 Amber’s background

- 5:30 Using GPT-3

- 14:45 Building prompts up

- 18:15 Prompting best practices

- 21:45 GPT-3 mistakes

- 25:30 Context windows

- 30:00 End-to-end time

- 34:45 The cost of one cover letter

- 37:00 The analytics

- 41:45 Dynamics around company-building

- 46:00 Commoditization of language modelling

- 51:00 Wrap-up

Sep 28, 2022

00:49:02

Imagine you’re a big hedge fund, and you want to go out and buy yourself some data. Data is really valuable for you — it’s literally going to shape your investment decisions and determine your outcomes.

But the moment you receive your data, a cold chill runs down your spine: how do you know your data supplier gave you the data they said they would? From your perspective, you’re staring down 100,000 rows in a spreadsheet, with no way to tell if half of them were made up — or maybe more for that matter.

This might seem like an obvious problem in hindsight, but it’s one most of us haven’t even thought of. We tend to assume that data is data, and that 100,000 rows in a spreadsheet is 100,000 legitimate samples.

The challenge of making sure you’re dealing with high-quality data, or at least that you have the data you think you do, is called data observability, and it’s surprisingly difficult to solve for at scale. In fact, there are now entire companies that specialize in exactly that — one of which is Zectonal, whose co-founder Dave Hirko will be joining us for today’s episode of the podcast.

Dave has spent his career understanding how to evaluate and monitor data at massive scale. He did that first at AWS in the early days of cloud computing, and now through Zectonal, where he’s working on strategies that allow companies to detect issues with their data — whether they’re caused by intentional data poisoning, or unintentional data quality problems. Dave joined me to talk about data observability, data as a new vector for cyberattacks, and the future of enterprise data management on this episode of the TDS podcast.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

0:00 Intro
3:00 What is data observability?
10:45 “Funny business” with data providers
12:50 Data supply chains
16:50 Various cybersecurity implications
20:30 Deep data inspection
27:20 Observed direction of change
34:00 Steps the average person can take
41:15 Challenges with GDPR transitions
48:45 Wrap-up

Sep 21, 2022

00:41:34

Today, we live in the era of AI scaling. It seems like everywhere you look people are pushing to make large language models larger, or more multi-modal and leveraging ungodly amounts of processing power to do it.

But although that’s one of the defining trends of the modern AI era, it’s not the only one. At the far opposite extreme from the world of hyperscale transformers and giant dense nets is the fast-evolving world of TinyML, where the goal is to pack AI systems onto small edge devices.

My guest today is Matthew Stewart, a deep learning and TinyML researcher at Harvard University, where he collaborates with the world’s leading IoT and TinyML experts on projects aimed at getting small devices to do big things with AI. Recently, along with his colleagues, Matt co-authored a paper that introduced a new way of thinking about sensing.

The idea is to tightly integrate machine learning and sensing on one device. For example, today we might have a sensor like a camera embedded on an edge device, and that camera would have to send data about all the pixels in its field of view back to a central server that might take that data and use it to perform a task like facial recognition. But that’s not great because it involves sending potentially sensitive data — in this case, images of people’s faces — from an edge device to a server, introducing security risks.

So instead, what if the camera’s output was processed on the edge device itself, so that all that had to be sent to the server was much less sensitive information, like whether or not a given face was detected? These systems — where edge devices harness onboard AI, and share only processed outputs with the rest of the world — are what Matt and his colleagues call ML sensors.

ML sensors really do seem like they’ll be part of the future, and they introduce a host of challenging ethical, privacy, and operational questions that I discussed with Matt on this episode of the TDS podcast.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

- 3:20 Special challenges with TinyML

- 9:00 Most challenging aspects of Matt’s work

- 12:30 ML sensors

- 21:30 Customizing the technology

- 24:45 Data sheets and ML sensors

- 31:30 Customers with their own custom software

- 36:00 Access to the algorithm

- 40:30 Wrap-up

Sep 14, 2022

00:55:43

Deep learning models — transformers in particular — are defining the cutting edge of AI today. They’re based on an architecture called an artificial neural network, as you probably already know if you’re a regular Towards Data Science reader. And if you are, then you might also already know that as their name suggests, artificial neural networks were inspired by the structure and function of biological neural networks, like those that handle information processing in our brains.

So it’s a natural question to ask: how far does that analogy go? Today, deep neural networks can master an increasingly wide range of skills that were historically unique to humans — skills like creating images, or using language, planning, playing video games, and so on. Could that mean that these systems are processing information like the human brain, too?

To explore that question, we’ll be talking to JR King, a CNRS researcher at the Ecole Normale Supérieure, affiliated with Meta AI, where he leads the Brain & AI group. There, he works on identifying the computational basis of human intelligence, with a focus on language. JR is a remarkably insightful thinker, who’s spent a lot of time studying biological intelligence, where it comes from, and how it maps onto artificial intelligence. And he joined me to explore the fascinating intersection of biological and artificial information processing on this episode of the TDS podcast.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

2:30 What is JR’s day-to-day?
5:00 AI and neuroscience
12:15 Quality of signals within the research
21:30 Universality of structures
28:45 What makes up a brain?
37:00 Scaling AI systems
43:30 Growth of the human brain
48:45 Observing certain overlaps
55:30 Wrap-up

Sep 07, 2022

00:48:19

It’s no secret that the US and China are geopolitical rivals. And it’s also no secret that that rivalry extends into AI — an area both countries consider to be strategically critical.

But in a context where potentially transformative AI capabilities are being unlocked every few weeks, many of which lend themselves to military applications with hugely destabilizing potential, you might hope that the US and China would have robust agreements in place to deal with things like runaway conflict escalation triggered by an AI powered weapon that misfires. Even at the height of the cold war, the US and Russia had robust lines of communication to de-escalate potential nuclear conflicts, so surely the US and China have something at least as good in place now… right?

Well they don’t, and to understand the reason why — and what we should do about it — I’ll be speaking to Ryan Fedashuk, a Research Analyst at Georgetown University’s Center for Security and Emerging Technology and Adjunct Fellow at the Center for a New American Security. Ryan recently wrote a fascinating article for Foreign Policy Magazine, where he outlines the challenges and importance of US-China collaboration on AI safety. He joined me to talk about the U.S. and China’s shared interest in building safe AI, how reach side views the other, and what realistic China AI policy looks like on this episode of the TDs podcast.

May 18, 2022

00:51:47

There’s a website called thispersondoesnotexist.com. When you visit it, you’re confronted by a high-resolution, photorealistic AI-generated picture of a human face. As the website’s name suggests, there’s no human being on the face of the earth who looks quite like the person staring back at you on the page.

Each of those generated pictures are a piece of data that captures so much of the essence of what it means to look like a human being. And yet they do so without telling you anything whatsoever about any particular person. In that sense, it’s fully anonymous human face data.

That’s impressive enough, and it speaks to how far generative image models have come over the last decade. But what if we could do the same for any kind of data?

What if I could generate an anonymized set of medical records or financial transaction data that captures all of the latent relationships buried in a private dataset, without the risk of leaking sensitive information about real people? That’s the mission of Alex Watson, the Chief Product Officer and co-founder of Gretel AI, where he works on unlocking value hidden in sensitive datasets in ways that preserve privacy.

What I realized talking to Alex was that synthetic data is about much more than ensuring privacy. As you’ll see over the course of the conversation, we may well be heading for a world where most data can benefit from augmentation via data synthesis — where synthetic data brings privacy value almost as a side-effect of enriching ground truth data with context imported from the wider world.

Alex joined me to talk about data privacy, data synthesis, and what could be the very strange future of the data lifecycle on this episode of the TDS podcast.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

2:40 What is synthetic data?
6:45 Large language models
11:30 Preventing data leakage
18:00 Generative versus downstream models
24:10 De-biasing and fairness
30:45 Using synthetic data
35:00 People consuming the data
41:00 Spotting correlations in the data
47:45 Generalization of different ML algorithms
51:15 Wrap-up

May 12, 2022

00:54:43

Two ML researchers with world-class pedigrees who decided to build a company that puts AI on the blockchain. Now to most people — myself included — “AI on the blockchain” sounds like a winning entry in some kind of startup buzzword bingo. But what I discovered talking to Jacob and Ala was that they actually have good reasons to combine those two ingredients together.

At a high level, doing AI on a blockchain allows you to decentralize AI research and reward labs for building better models, and not for publishing papers in flashy journals with often biased reviewers.

And that’s not all — as we’ll see, Ala and Jacob are taking on some of the thorniest current problems in AI with their decentralized approach to machine learning. Everything from the problem of designing robust benchmarks to rewarding good AI research and even the centralization of power in the hands of a few large companies building powerful AI systems — these problems are all in their sights as they build out Bittensor, their AI-on-the-blockchain-startup.

Ala and Jacob joined me to talk about all those things and more on this episode of the TDS podcast.

---

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

---

Chapters:

2:40 Ala and Jacob’s backgrounds
4:00 The basics of AI on the blockchain
11:30 Generating human value
17:00 Who sees the benefit? 22:00 Use of GPUs
28:00 Models learning from each other
37:30 The size of the network
45:30 The alignment of these systems
51:00 Buying into a system
54:00 Wrap-up

May 04, 2022

00:43:02

As you might know if you follow the podcast, we usually talk about the world of cutting-edge AI capabilities, and some of the emerging safety risks and other challenges that the future of AI might bring. But I thought that for today’s episode, it would be fun to change things up a bit and talk about the applied side of data science, and how the field has evolved over the last year or two.

And I found the perfect guest to do that with: her name is Sadie St. Lawrence, and among other things, she’s the founder of Women in Data — a community that helps women enter the field of data and advance throughout their careers — and she’s also the host of the Data Bytes podcast, a seasoned data scientist and a community builder extraordinaire. Sadie joined me to talk about her founder’s journey, what data science looks like today, and even the possibilities that blockchains introduce for data science on this episode of the towards data science podcast.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

2:00 Founding Women in Data
6:30 Having gendered conversations
11:00 The cultural aspect
16:45 Opportunities in blockchain
22:00 The blockchain database
32:30 Data science education
37:00 GPT-3 and unstructured data
39:30 Data science as a career
42:50 Wrap-up

Oct 19 | 00:32:26

131. Jeremie Harris - TDS Podcast Finale: The future of AI, and the risks that come with it

Oct 12 | 00:58:22

130. Edouard Harris - New Research: Advanced AI may tend to seek power *by default*

Oct 05 | 00:51:21

129. Amber Teng - Building apps with a new generation of language models

Sep 28 | 00:49:02

128. David Hirko - AI observability and data as a cybersecurity weakness

Sep 21 | 00:41:34

127. Matthew Stewart - The emerging world of ML sensors

Sep 14 | 00:55:43

126. JR King - Does the brain run on deep learning?

Sep 07 | 00:48:19

125. Ryan Fedasiuk - Can the U.S. and China collaborate on AI safety?

May 18 | 00:51:47

124. Alex Watson - Synthetic data could change everything

May 12 | 00:54:43

123. Ala Shaabana and Jacob Steeves - AI on the blockchain (it actually might just make sense)

May 04 | 00:43:02

122. Sadie St. Lawrence - Trends in data science

This podcast could use a review!

This podcast could use a review! Have anything to say about it? Share your thoughts using the button below.

Submit Review