Why it's important to split your data

1,036 views

Cassie Kozyrkov


10 months ago

Why should you use separate datasets for exploratory data analysis (EDA) and for statistical hypothesis testing? The logic of statistical inference is all about surprise: do these data surprise you? Well, it's hard to be surprised by patterns in data that you've seen already...
This video is a 30-second summary of how not to be a charlatan with data: don't use the same data points both to generate your hypothesis and to test it. Split your data!
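As a rough illustration of the split described above, here is a minimal Python sketch. The function name `split_for_eda_and_testing`, the 50/50 default, and the fixed seed are assumptions made for this example; they do not come from the video. The idea is only that hypotheses are generated on one part of the data and tested on a part that has not been looked at.

```python
import random


def split_for_eda_and_testing(records, test_fraction=0.5, seed=0):
    """Randomly partition records into an exploration set and a held-out test set.

    Hypothetical helper for illustration: explore (EDA) on the first part,
    and run the hypothesis test only on the second, unseen part.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a reproducible split

    n_test = int(round(len(shuffled) * test_fraction))
    test_set = shuffled[:n_test]           # set aside; do not inspect during EDA
    exploration_set = shuffled[n_test:]    # free to plot, slice, and brainstorm on
    return exploration_set, test_set


# Example usage with placeholder records (the integers stand in for real rows).
records = list(range(100))
exploration, holdout = split_for_eda_and_testing(records, test_fraction=0.5, seed=42)
```

The split should happen before any exploration starts, so that nothing seen during EDA can leak into the test set.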
Learn more on my blog:
bit.ly/quaesita_charlatan
bit.ly/quaesita_sydd
or
Watch this episode of my ML/AI course:
bit.ly/mfml_049
If you find my musings useful, show some love to those subscribe and share buttons.

Comments: 4
@denpluzhhnikov4969 9 months ago
"watching you just to have a good mood" - professional head of data science at my company:)
@greensock4089 10 months ago
"We don't need to split our data, we have so much of it that p-values are always tiny and significant" - head of data science at my company...
@Flylikea 10 months ago
One input is just one input. Isn't that what we are talking about? So if, for example, one notices a sudden increase or decrease in something, then they'd have to gather a larger dataset to check whether what they saw means something or is just normal. (I mean... this is the point, right? I perform EDA and notice a pattern, e.g. a specific demographic group and its risk of churn; that's the inspiration. To test the rigor of my inspiration/observation, I'll have to check a longer or different time period, or check across different cities.)
@mitto20 10 months ago
How?