For data-hungry tech companies, YouTube is a gold mine
Podcast: Marketplace Tech
Publisher: Marketplace
Media Type: audio
Categories (via RSS): News, Technology
Publication Date: Jul 30, 2024
Episode Duration: 00:11:41

Companies competing in the chatbot wars are using something known in the industry as “the Pile” to train their large language models. It’s a trove of open-source data made up of text scraped from all around the internet, including Wikipedia and the European Parliament. Annie Gilbertson, investigative reporter for Proof News, recently took a deep dive into the Pile and discovered something else: a dataset called “YouTube Subtitles.” Marketplace’s Lily Jamali spoke with Gilbertson about her investigation and how YouTube creators feel about their content being used without their consent.
