Alex Watson is the co-founder and CEO of Gretel.ai, a startup that offers APIs for creating anonymized and synthetic datasets. Previously, he was the founder of Harvest.ai, whose product Macie, an analytics platform that protects against data breaches, was acquired by AWS.
Learn more about Alex and Gretel AI:
Every Thursday I send out the most useful things I’ve learned, curated specifically for the busy machine learning engineer. Sign up here: https://www.cyou.ai/newsletter
Follow Charlie on Twitter: https://twitter.com/CharlieYouAI
Subscribe to ML Engineered: https://mlengineered.com/listen
Comments? Questions? Submit them here: http://bit.ly/mle-survey
Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/
Timestamps:
02:15 Introducing Alex Watson
03:45 How Alex was first exposed to programming
05:00 Alex's experience starting Harvest AI, getting acquired by AWS, and integrating their product at massive scale
21:20 How Alex first saw the opportunity for Gretel.ai
24:20 The most exciting use-cases for synthetic data
28:55 Theoretical guarantees of anonymized data with differential privacy
36:40 Combining pre-training with synthetic data
38:40 When to anonymize data and when to synthesize it
41:25 How Gretel's synthetic data engine works
44:50 Requirements of a dataset to create a synthetic version
49:25 Augmenting datasets with synthetic examples to address representation bias
52:45 How Alex recommends teams get started with Gretel.ai
59:00 Expected accuracy loss from training models on synthetic data
01:03:15 Biggest surprises from building Gretel.ai
01:05:25 Organizational patterns for protecting sensitive data
01:07:40 Alex's vision for Gretel's data catalog
01:11:15 Rapid fire questions
Links:
Netflix Cancels Recommendation Contest After Privacy Lawsuit
Improving massively imbalanced datasets in machine learning with synthetic data
Deep dive on generating synthetic data for Healthcare
Radek Osmulski is a fully self-taught machine learning engineer. Tired of his corporate job, he taught himself programming and started a new career as a Ruby on Rails developer before setting out to learn machine learning. Since then, he has been a Fast AI International Fellow and become a Kaggle Master, and he is now an AI Data Engineer at the Earth Species Project.
Learn more about Radek:
https://twitter.com/radekosmulski
Timestamps:
02:15 How Radek got interested in programming and computer science
09:00 How Radek taught himself machine learning
26:40 The skills Radek learned from Fast AI
39:20 Radek's recommendations for people learning ML now
51:30 Why Radek is writing a book
01:01:20 Radek's work at the Earth Species Project
01:10:15 How the ESP collects animal language data
01:21:05 Rapid fire questions
Links:
Universal Language Model Fine-tuning for Text Classification
How to do Machine Learning Efficiently
Rodrigo Rivera is a machine learning researcher at the Advanced Data Analytics in Science and Engineering Group at Skoltech and technical director of Samsung Next. He has previously held data science and research leadership roles at companies around the world, including Rocket Internet and Philip Morris.
Learn more about Rodrigo:
https://rodrigo-rivera.com/
https://twitter.com/rodrigorivr
Timestamps:
03:00 How Rodrigo got started in computer science and started his first company
10:40 Rodrigo's experiences leading data science teams at Rocket Internet and PMI
26:15 Leaving industry to get a PhD in machine learning
28:55 Data science collaboration between business and academia
32:45 Rodrigo's research interest in time series data
39:25 Topological data analysis
45:35 Framing effective research as a startup
48:15 Neural Prophet
01:04:10 The potential future of Julia for numerical computing
01:08:20 Most exciting opportunities for ML in industry
01:15:05 Rodrigo's advice for listeners
01:17:00 Rapid fire questions
Links:
Advanced Data Analytics in Science and Engineering Group
Dan Jeffries is the chief technical evangelist at Pachyderm, a leading data science platform. He is a prominent writer and speaker on all things related to the future. He has worked in software for over two decades, many of those years at Red Hat, and is the founder of the AI Infrastructure Alliance and Practical AI Ethics.
Learn more about Dan:
https://twitter.com/Dan_Jeffries1
https://medium.com/@dan.jeffries
Timestamps:
02:15 How Dan got started in computer science
06:50 What Dan is most excited about in AI
14:45 Where we are in the adoption curve of ML
20:40 The "Canonical Stack" of ML
32:00 Dan's goal for the AI Infrastructure Alliance
40:55 "Problems that ML startups don't know they're going to have"
49:00 Closed vs open source tools in the Canonical Stack
01:00:05 Building out the "boring" part of the infrastructure to enable exciting applications
01:08:40 Dan's practical approach to AI Ethics
01:23:50 Rapid fire questions
Links:
AI Infrastructure Alliance
Practical AI Ethics Alliance
Rise of the Canonical Stack in Machine Learning
Rise of AI - The Age of AI in 2030
Willem Pienaar is the co-creator of Feast, the leading open source feature store, and leads its development as a tech lead at Tecton. Previously, he led the ML platform team at Gojek, a super-app in Southeast Asia.
Learn more:
https://twitter.com/willpienaar
Timestamps:
02:15 How Willem got started in computer science
03:40 Paying for college by starting an ISP
05:25 Willem's experience creating Gojek's ML platform
21:45 Issues faced that led to the creation of Feast
26:45 Lessons learned building Feast
33:45 Integrating Feast with data quality monitoring tools
40:10 What it looks like for a team to adopt Feast
44:20 Feast's current integrations and future roadmap
46:05 How a data scientist would use Feast when creating a model
49:40 How the feature store pattern handles DAGs of models
52:00 Priorities for a startup's data infrastructure
55:00 Integrating with Amundsen, Lyft's data catalog
57:15 The evolution of data and MLOps tool standards for interoperability
01:01:35 Other tools in the modern data stack
01:04:30 The interplay between open and closed source offerings
Links:
Benedikt Koller is a self-professed "Ops guy" who has spent over 12 years in roles such as DevOps engineer, platform engineer, and infrastructure tech lead at companies like Stylight and Talentry, as well as at his own consultancy, KEMB. He recently dove headfirst into the world of ML, where he hopes to bring his extensive Ops knowledge to the field as the co-founder of Maiot, the company behind ZenML, an open source MLOps framework.
Learn more:
Timestamps:
02:15 Introducing Benedikt Koller
05:30 What the "DevOps revolution" was
10:10 Bringing good Ops practices into ML projects
30:50 Pivoting from vehicle predictive analytics to open source ML tooling
34:35 Design decisions made in ZenML
39:20 Most common problems faced by applied ML teams
49:00 The importance of separating configurations from code
55:25 Resources Ben recommends for learning Ops
57:30 What to monitor in ML pipelines
01:00:45 Why you should run experiments in automated pipelines
01:08:20 The essential components of an MLOps stack
01:10:25 Building an open source business and what's next for ZenML
01:20:20 Rapid fire questions
Links:
Josh Albrecht is the co-founder and CTO of Generally Intelligent, an independent research lab investigating the fundamentals of learning across humans and machines. Previously, he was the lead data architect at Addepar, CTO of CloudFab, and CTO of Sourceress, from which Generally Intelligent pivoted.
Learn more about Josh:
http://generallyintelligent.ai/
Timestamps:
02:15 Introducing Josh Albrecht
03:30 How Josh got started in computer science
06:35 Josh's first two startup attempts
09:15 The tech behind Sourceress, an AI recruiting platform
16:10 Pivoting from Sourceress to Generally Intelligent, an AI research lab
23:50 How Josh defines "general intelligence"
28:35 Why Josh thinks self-supervised learning is currently the most promising research area
36:15 Generally Intelligent's immediate research roadmap: BYOL, simulated environments
59:20 How Josh thinks about creating an optimal research environment
01:11:35 The "why" behind starting an independent research lab
01:13:30 AI alignment
01:17:00 Rapid fire questions
Links:
Bootstrap your own latent: A new approach to self-supervised Learning
Understanding self-supervised and contrastive learning with "Bootstrap Your Own Latent" (BYOL)
Elena Samuylova and Emeli Dral are the co-founders of Evidently AI, where they build open source tools to analyze and monitor machine learning models. Elena was previously the head of the startup ecosystem at Yandex, director of business development at Yandex Data Factory, and chief product officer at Mechanica AI. Emeli was previously a data scientist at Yandex and chief data scientist at Yandex Data Factory and Mechanica AI; she has also taught machine learning both online and at multiple universities.
Learn more about Elena, Emeli, and Evidently AI:
https://twitter.com/elenasamuylova
Timestamps:
02:15 How Emeli and Elena each got started in data science
07:10 Applying machine learning across a wide variety of industries at the Yandex Data Factory
14:55 Using ML for industrial process improvement
23:35 Challenges encountered in industrial ML and technical solutions
27:15 The huge opportunity for ML in manufacturing
34:35 How to ensure safety when using models in physical systems
37:40 Why they started working on tools for data and ML monitoring
42:50 Different kinds of data drift and how to address them
48:25 Common mistakes ML teams make in monitoring
55:25 Features of Evidently AI's library
57:35 Building open source software
01:02:25 Technical roadmap for Evidently
01:05:50 Monitoring complex data
01:08:50 Business roadmap for Evidently
01:11:35 Rapid fire questions
Links:
Harikrishna Narayanan is the co-founder of a YC-backed stealth startup. He was previously a Principal Engineer at Yahoo and a Director in Workday's Machine Learning organization, and he holds an M.S. from Georgia Tech.
Timestamps:
02:45 How Hari got started in computer science and machine learning
06:00 Making the transition from IC to manager
14:35 What it means to be an effective engineering manager
19:20 Differences in managing machine learning vs traditional software teams
24:30 The importance of explaining complicated topics simply
30:15 How he thinks about hiring for data science and machine learning
36:50 Mistakes Workday made as it adopted machine learning
41:50 Essential skills for machine learning engineers
54:05 Why the future of AI is augmentation, not automation
58:30 His experience so far with YC
01:02:00 Rapid fire questions
Links:
Good to Great
The Hard Thing About Hard Things
Learn more about the ML Ops Community: https://mlops.community/
Timestamps:
02:45 Intro
04:10 How I got into data science and machine learning
08:25 My experience working as an ML engineer and starting the podcast
12:15 Project management methods for machine learning
20:50 ML job roles are trending towards more specialization
26:15 ML tools enable collaboration between roles and encode best practices
34:00 Data privacy, security, and provenance as first class considerations
39:30 The future of managed ML platforms and cloud providers
49:05 What I've learned about building a career in ML engineering
54:10 Dealing with information overload
Links:
Josh Tobin: Research at OpenAI, Full Stack Deep Learning, ML in Production
Practical ML Ops // Noah Gift // MLOps Coffee Sessions
Building a Post-Scarcity Future using Machine Learning with Pavle Jeremic (Aether Bio)
SRE for ML Infra // Todd Underwood // MLOps Coffee Sessions
Luigi Patruno on the ML Ops Community podcast
Luigi Patruno: ML in Production, Adding Business Value with Data Science, "Code 2.0"