Why should you use separate datasets for exploratory data analysis (EDA) and for statistical hypothesis testing? The logic of statistical inference is all about surprise: do these data surprise you? Well, it's hard to be surprised by patterns in data that you've seen already...
This video is a 30-second summary of how not to be a charlatan with data: don't use the same datapoints to generate your hypothesis and to test it. Split your data!
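If it helps to see the split concretely, here is a minimal Python sketch. The function name `split_for_eda_and_testing`, the 50/50 ratio, and the fixed seed are illustrative assumptions, not something from the video. The idea is simply to partition the data at random before looking at any of it, explore one part freely, and leave the other part untouched until you have a hypothesis to test.

```python
import random


def split_for_eda_and_testing(rows, eda_fraction=0.5, seed=0):
    """Randomly partition rows into an exploration set and a held-out test set.

    eda_fraction and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)   # local generator, so the split is reproducible
    shuffled = list(rows)       # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * eda_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data: 100 row identifiers standing in for a real dataset.
data = list(range(100))
eda_set, test_set = split_for_eda_and_testing(data)

# Explore eda_set as much as you like to come up with a hypothesis.
# Only then open test_set, and use it once to test that hypothesis.
```

Because no datapoint lands in both sets, whatever you spot while exploring can still surprise you (or not) when you test it on the held-out part.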
Learn more on my blog:
bit.ly/quaesita_charlatan
bit.ly/quaesita_sydd
or
Watch this episode of my ML/AI course:
bit.ly/mfml_049
If you find my musings useful, show some love to those subscribe and share buttons.