Please login or sign up to post and edit reviews.
Snorkel: Training Dataset Management with Braden Hancock
Media Type |
audio
Categories Via RSS |
Technology
Publication Date |
Apr 09, 2020
Episode Duration |
00:52:56

Machine learning models require the use of training data, and that data needs to be labeled. Today, we have high quality data infrastructure tools such as TensorFlow, but we don’t have large high quality data sets. For many applications, the state of the art is to manually label training examples and feed them into the

The post Snorkel: Training Dataset Management with Braden Hancock appeared first on Software Engineering Daily.

Machine learning models require the use of training data, and that data needs to be labeled. Today, we have high quality data infrastructure tools such as TensorFlow, but we don’t have large high quality data sets. For many applications, the state of the art is to manually label training examples and feed them into the

Machine learning models require the use of training data, and that data needs to be labeled. Today, we have high quality data infrastructure tools such as TensorFlow, but we don’t have large high quality data sets. For many applications, the state of the art is to manually label training examples and feed them into the training process.

Snorkel is a system for scaling the creation of labeled training data. In Snorkel, human subject matter experts create labeling functions, and these functions are applied to large quantities of data in order to label it. 

For example, if I want to generate training data about spam emails, I don’t have to hire 1000 email experts to look at emails and determine if they are spam or not. I can hire just a few email experts, and have them define labeling functions that can indicate whether an email is spam. If that doesn’t make sense, don’t worry. We discuss it in more detail in this episode.

Braden Hancock works on Snorkel, and he joins the show to talk about the labeling problems in machine learning, and how Snorkel helps alleviate those problems. We have done many shows on machine learning in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about machine learning, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

The post Snorkel: Training Dataset Management with Braden Hancock appeared first on Software Engineering Daily.

This episode currently has no reviews.

Submit Review
This episode could use a review!

This episode could use a review! Have anything to say about it? Share your thoughts using the button below.

Submit Review