Data Contamination
Media Type |
audio
Categories Via RSS |
Technology
Publication Date |
May 02, 2016
Episode Duration |
00:20:58
Supervised machine learning assumes that the features and labels used for building a classifier are isolated from each other--basically, that you can't cheat by peeking. Turns out this can be easier said than done. In this episode, we'll talk about the many (and diverse!) cases where label information contaminates features, ruining data science competitions along the way. Relevant links: https://www.researchgate.net/profile/Claudia_Perlich/publication/221653692_Leakage_in_data_mining_Formulation_detection_and_avoidance/links/54418bb80cf2a6a049a5a0ca.pdf

This episode currently has no reviews.

Submit Review
This episode could use a review!

This episode could use a review! Have anything to say about it? Share your thoughts using the button below.

Submit Review