LLM Data Frontiers
Publisher: Prateek Joshi
Media Type: audio
Categories Via RSS: Technology
Publication Date: Jan 22, 2024
Episode Duration: 00:33:45

Curtis Northcutt is the cofounder and CEO of Cleanlab, a data curation platform for LLMs. Cleanlab has raised $30M in funding from Bain Capital Ventures, Menlo, Databricks, and TQ. Curtis was previously the cofounder and CTO of ChipBrain. He has a PhD in Computer Science from MIT.

(00:07) Data Curation in the Context of LLMs
(01:14) Connection between Language Models and Computer Science
(03:14) Importance of Data Curation for LLMs
(04:06) Challenges in Data Curation for LLMs
(06:09) Confident Learning and its Concept
(09:42) Cleanlab and its Role
(12:42) Role of Open Source Datasets and Tooling
(15:08) Balancing Data and Privacy in Regulated Industries
(17:25) Feasibility of Federated Learning
(20:35) Decentralized Compute and Aggregating Compute Clusters
(25:19) Determining Model Size for Data Representation
(27:09) Advice for ML Engineers in Handling Data Curation
(30:20) Rapid Fire Round

Curtis's favorite book: The Bible (in the context of marketing)

--------

Where to find Prateek Joshi:
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19
Twitter: https://twitter.com/prateekvjoshi
