What haunts statisticians at night

Views: 74,431

Very Normal

Day ago

Comments: 160
@very-normal 6 months ago
forgot to pin this, don't shame me Brilliant lol To try everything Brilliant has to offer for free for 30 days, visit brilliant.org/VeryNormal. The first 200 of you will get 20% off Brilliant’s annual premium subscription.
@Vaeinoe 6 months ago
My favourite example of differentiating correlation and causation is that even though firefighters and housefires are often seen together, it's pretty clear the fire department didn't cause the fire
@very-normal 6 months ago
or did they…? (jk that’s a really good example)
@realGBx64 6 months ago
What? It is quite clear that the house fire caused the firefighters to be there…
@Vaeinoe 6 months ago
@realGBx64 Wait... true. I wonder whether this is still a valid example or not.
@AnthonyBerlin 6 months ago
@realGBx64 Yeah. The fire causes the firefighters to be there, but the firefighters do not cause the fire to be there. They are there because of the fire, which is just another way of saying the fire causes the firefighters to be there. So where is the problem in the example?
@realGBx64 6 months ago
@AnthonyBerlin That it is not a correlation without causation. There is a clear causal link.
@AmberSZ 6 months ago
A couple years ago I read a survey of sci-comm articles that actually found they were too conservative on describing things as correlation/causation. People had the correlation =/= causation warning drilled into them to the point that they were describing results which by the study design could be attributed to causation as "just" correlation!
@code861 6 months ago
what were the relationships between?
@Tzizenorec 6 months ago
To show causation, you must arrange for one of the variables to be caused purely by randomness (randomly select some of the study participants, and _force_ thing X to happen). Then, if there's a correlation, you know the thing forced to happen as part of the study was the cause of the other thing. There is _always_ a cause and effect; the possibilities are either 1) X caused Y, 2) Y caused X, 3) Another thing not tested for caused both, 4) Some mixture of X and Y caused availability for the study. Rule out three of those as possibilities, and you prove the fourth (assuming the study still shows a correlation).
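A minimal simulation of the randomization argument above. It assumes a made-up world in which a hidden trait `c` drives the outcome `y` and the treatment `x` has no effect at all; the variable names and effect sizes are illustrative, not taken from the video.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)
n = 20_000

# Assumed toy world: hidden trait c drives outcome y; x has NO effect on y.
c = [rng.gauss(0, 1) for _ in range(n)]
y = [ci + rng.gauss(0, 1) for ci in c]

# Observational data: people with a high c tend to pick x = 1 themselves.
x_observed = [1 if ci + rng.gauss(0, 1) > 0 else 0 for ci in c]

# Randomized study: x is forced by a coin flip, so it cannot depend on c.
x_forced = [rng.randint(0, 1) for _ in range(n)]

r_observed = pearson(x_observed, y)
r_forced = pearson(x_forced, y)
print(f"self-selected x vs y: r = {r_observed:.3f}")
print(f"randomized x vs y:    r = {r_forced:.3f}")
```

In this sketch the self-selected treatment correlates with the outcome purely through `c`, while the coin-flip assignment shows essentially no correlation, which is what one would expect when `x` truly does nothing.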
@andrewharrison8436 6 months ago
Good science is to "torture" the data to find some correlations (basically nature giving you some hints). Then design randomised trials to find out which correlations are spurious and which are genuine.
6 months ago
Fallacist's fallacy.
@robertwilsoniii2048 4 months ago
@Tzizenorec Only trouble is "random" can fail.
@BradyPostma 6 months ago
I like how you quote the cartoon _The Boondocks_ about unknown unknowns, in the part where they quote Donald Rumsfeld, who was paraphrasing NASA administrator William Graham, who was referencing Joseph Luft and Harrington Ingham's work developing the Johari window in 1955. I love how real expert research gradually enters the pop culture zeitgeist!
@blakaligula3745 6 months ago
As someone in social science (public policy), this is constantly on my mind. 'All of Statistics' has great chapters on causation and DAGs, even covering continuous causal variables. It also points to other great sources on causal analysis for anyone curious.
@very-normal 6 months ago
Yes! All of Statistics is a great book
@figmundsreud9363 6 months ago
Just some words of caution about the statement "causation implies correlation". This does not, of course, mean that a strong linear bivariate correlation is necessary for a causal effect. Confounders can also push the empirical correlation close to 0 while there is still a causal effect.
@andrewharrison8436 6 months ago
And, of course, causation can be non-linear.
@MarkusAldawn 6 months ago
@andrewharrison8436 I'm not a statistician. Nonlinear causation doesn't mean backwards in time, I'm assuming, so is it something like a threshold dose? A certain amount of X causes a certain amount of Y, but half the amount of X causes no Y, for example?
@andrewharrison8436 6 months ago
@MarkusAldawn I was thinking more of examples like mortality against weight: very light people have higher mortality (basically starved*) and so do very heavy people (strain on the heart), so there is an optimum weight (depending on age, sex and height). Threshold dose would also be an example, but the response could still be largely linear. * But TB could also explain low weight. We ought to be eliminating that factor but I fear we aren't succeeding. Could be an example of a confounding factor!
@jakedewey3686 6 months ago
@MarkusAldawn @andrewharrison8436 Consider the drug levothyroxine. It's a thyroid drug where the dose needs to be controlled so precisely that it's available in strength increments of 12.5 micrograms. Even that level of precision in dosage sometimes isn't enough, with doctors instructing patients to alternate between taking two different strengths of the drug to get the correct average dosage. However, the dosage needed by different patients varies wildly: some may take as little as 25mcg, while others take up to 300mcg. If you compared patient outcomes vs provided dosage, you'd see a sharp improvement of outcomes as you approach their optimum dosage, and a sharp worsening of outcomes as you pass the optimum dosage, even if you only considered the range of dosages that can be considered therapeutic. If you were to do a study comparing patient outcomes to provided dosage, with patients randomly assigned a dosage, your data would almost certainly show no correlation between outcomes and dosage, because you'd be giving most patients either too much or too little of the drug to get the beneficial effects. In actuality there absolutely is a causative effect on patient outcomes, which could be easily observed by varying the dosage in individual patients. However, because your study design doesn't control for the confounding variables that determine the optimal dosage for the patients, you might conclude that levothyroxine has no effect on patient outcomes, or even that giving any amount of levothyroxine to a patient is actively harmful to them.
@PeloquinDavid 6 months ago
I have a quibble: causation often involves correlation, but there are situations where it doesn’t. You have to take the functional form into account as well. For example, if the underlying function is a more or less symmetric "U"-shaped one (e.g. a quadratic one) over the range for which you have data, you can have strong causal relation with little or no correlation (which is designed to assess linear relationships but can also detect monotonic ones, albeit with a loss of accuracy/"explanatory power"). Obviously, the equivalent of the correlation coefficient in a general regression model (one that takes account of real non-linear interaction effects), the R-square, does enable you to get a good sense of the explanatory power of non-linear relationships - albeit not without their own issues with confounding factors. But it remains somewhat misleading to assert a clear one-way relationship between causality and "correlation".
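A small sketch of the U-shaped case described above, under assumed numbers: `y` is completely determined by `x` up to a little noise, yet the Pearson coefficient is close to zero because the relationship is symmetric around the mean of `x`. The quadratic form and noise level are illustrative choices.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)
n = 20_000

# x is symmetric around 0 and y depends on x only through x squared.
x = [rng.uniform(-1, 1) for _ in range(n)]
y = [xi ** 2 + rng.gauss(0, 0.05) for xi in x]

r_linear = pearson(x, y)                       # near 0: no linear trend
r_squared_term = pearson([xi ** 2 for xi in x], y)  # near 1: right functional form
print(f"corr(x, y)   = {r_linear:.3f}")
print(f"corr(x^2, y) = {r_squared_term:.3f}")
```

Using the correct functional form (here the squared term) recovers the strong relationship that the plain linear coefficient misses.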
@very-normal 6 months ago
I think you’re correct. I avoided any notion of non-linear cause and correlation in my video, so what I said applies to linear relationships. But I’ll wholly admit that I have no exposure to non-linear causal inference so it’s definitely something for me to learn in the future. Thanks for raising your point!
@jakedewey3686 6 months ago
There's also just the impact of confounding factors even when causation exists. The exposure or intervention may be a necessary but insufficient cause (the intervention must occur for the outcome to occur, but isn't enough to cause the outcome on its own) or it may be an unnecessary but sufficient cause (the intervention does cause the outcome, but other things do, too). In both of those scenarios the effect of the exposure/intervention has a causative relationship to the outcome but can easily be hidden by confounding factors that are also affecting the outcome.
@TheLoneWolfling 6 months ago
Distance correlation is still a correlation measure, and will reveal correlation with a U shape. I really wish people wouldn't conflate correlation with linear correlation.
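To make the distance correlation remark concrete, here is a plain-Python sketch of the sample distance correlation (the biased, V-statistic form) applied to a noiseless parabola. The sample size and the data-generating choices are assumptions for illustration, and the O(n^2) implementation is only meant for small samples.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def distance_correlation(xs, ys):
    """Sample distance correlation for 1-D data, O(n^2) time and memory."""
    n = len(xs)

    def centered_distances(v):
        d = [[abs(a - b) for b in v] for a in v]
        row_mean = [sum(row) / n for row in d]
        grand_mean = sum(row_mean) / n
        # d is symmetric, so column means equal row means.
        return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
                 for j in range(n)] for i in range(n)]

    def mean_product(p, q):
        return sum(a * b for rp, rq in zip(p, q) for a, b in zip(rp, rq)) / (n * n)

    a, b = centered_distances(xs), centered_distances(ys)
    dcov2 = max(mean_product(a, b), 0.0)  # guard against tiny negative rounding
    dvar2 = mean_product(a, a) * mean_product(b, b)
    return sqrt(dcov2 / sqrt(dvar2))


rng = random.Random(2)
x = [rng.uniform(-1, 1) for _ in range(400)]
y = [xi ** 2 for xi in x]  # U-shaped dependence, no noise

r_linear = pearson(x, y)
d_cor = distance_correlation(x, y)
print(f"Pearson r            = {r_linear:.3f}")
print(f"distance correlation = {d_cor:.3f}")
```

The Pearson value hovers around zero while the distance correlation is clearly positive, which is the distinction between "correlation" in general and linear correlation.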
@shreeniwaz 6 months ago
One of the best explanations of basic concepts, which also sheds light on the related intricacies.
@justdave9195 6 months ago
That Boondocks clip lol. This is probably my favorite stats page. Keep doing what you're doing.
@superuser8636 6 months ago
I’m so glad I double-majored with a stats concentration… This was one of the best things I covered in our undergrad ANOVA course.
@andresfelipehiguera785 5 months ago
I needed this explanation a long time ago! Thanks. I loved the simulations to show the impact of ‘C’
@MorseAttack 6 months ago
I was hoping this would go into Panel regression, would make for a good follow up in this series!
@dumbpenguin900 6 months ago
This is the best explanation of a counterfactual I've heard. Made it very simple. 🎉
@requetevision 6 months ago
YOU TRICKED ME I subliminally absorbed your “example” of clicking this video to SUBSCRIBE, to see what effect there was, and now I am subscribed!!! (Happily 😊 )
@very-normal 6 months ago
ayyy gottem (thanks!)
@Agent-cipher-6120 6 months ago
Hi, I just want to say that you and your videos are a blessing to me and many others! Cheers man!
@travisretriever7473 6 months ago
It keeps them up at night the way those confounding Dover Boys drove that one guy to drink!
@trentneilson9783 6 months ago
Would love for you to extend this with a topic on Mendelian randomisation. About how we actually can say this drug caused this outcome.
@RyeCA 6 months ago
I'm currently studying for my exam on linear regression and its mathematical foundations in two weeks. I actually revised the lecture about correlation/causation today! What a coincidence (or is it now :P )
@very-normal 6 months ago
Good luck on your exam!
@shadeblackwolf1508 6 months ago
There are correlations between ice cream consumption and drownings. Makes it nice and easy.
@sitrakaforler8696 6 months ago
As a statistician I wanna say: congrats, and thank you for sharing such a great video 🎉🎉
@ZenoDiac 6 months ago
Correlation = association = tendency = ???? 😂 it's like it's going in circles
@very-normal 6 months ago
is it
@alejandromilian3767 6 months ago
This has gotta be one of the best YouTube videos I have seen
@hectornonayurbusiness2631 6 months ago
correlation is not causation, but it does imply it
@Honigm3lone 6 months ago
Thank you very much for that video. Last year I was reading "The Book of Why" by Judea Pearl and it was quite tricky to read, especially for me as a non-native English speaker, even after having multiple lectures in statistics. You explained the part about confounders very well in my opinion. Please make more videos on causal inference! :)
@prod.kashkari3075 6 months ago
Can you do a video on asymptotic vs exact distributions of statistics?
@jaytout2224 6 months ago
statisticians when I tell them correlation doesn’t equal causation (they are in shambles)
@very-normal 6 months ago
🫠 can confirm
@psl_schaefer 6 months ago
Thanks for the great video! I was just thinking that the collider bias (aka Berkson's paradox) would be a great fit here as well. It is somewhat less intuitive than the classical confounder, but still causes spurious relationships. And I would highly recommend "Statistical Rethinking" by Richard McElreath to learn about these topics (on an introductory level).
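A rough illustration of the collider (Berkson) effect mentioned above. It assumes two independent traits and a made-up selection rule that only keeps cases whose combined score clears a threshold; the trait names and the threshold are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(3)
n = 20_000

# Two traits generated independently of each other.
trait_a = [rng.gauss(0, 1) for _ in range(n)]
trait_b = [rng.gauss(0, 1) for _ in range(n)]

# Selection on a collider: a case is observed only if a + b clears a bar.
kept = [(a, b) for a, b in zip(trait_a, trait_b) if a + b > 1.0]

r_everyone = pearson(trait_a, trait_b)
r_selected = pearson([a for a, _ in kept], [b for _, b in kept])
print(f"whole population: r = {r_everyone:.3f}")
print(f"selected cases:   r = {r_selected:.3f}")
```

Among the selected cases the traits look negatively related even though neither influences the other: a low value of one trait has to be offset by a high value of the other to pass the bar.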
@very-normal 6 months ago
Yes! I thought about including colliders in here but cut it out on the editing floor. Statistical Rethinking is def a solid book
@minchulkim87 6 months ago
Loved it. Are you sure about that last statement: causation -> correlation?
@very-normal 6 months ago
If I were to be very strict about my phrasing, then causation does not necessarily imply (Pearson’s linear) correlation. Counterexamples can be cooked up to show this. I was trying to say this along the lines of “causation always implies correlation (a general association)”, like how I first defined correlation, but I realize now this is vague since I focus on Pearson’s correlation later.
@Croccifixo 6 months ago
With my very limited knowledge (some math in high school, a course in R studio as part of a CS degree and some random YT videos), I would say that it depends on how you view it. Stripped of all known and unknown additional causal relationships (including true randomness), I would say no correlation would mean that we can express the dependent variable's value as a function of the independent variable x as f(x) = 0. Any deviation from the 0 line would indicate a correlation, no matter how small. I like to imagine the combination of all causal relationships kind of like how different frequencies combine into a seemingly messy output in a Fourier transform. A change in any of the constituent functions results in a change in the combined function, no matter how imperceptible it might seem. At some point you could argue that the change is so small that we shouldn't count it, but in the strictest sense, I would say it still counts as a correlation. Now, there is a slight problem when we think about confounders, in this case specifically about confounders that are not just correlated, but causally linked to the independent variable. Let's say we have the following 3 causal relationships (again thinking about it like with a Fourier transform, so = really refers to the portion of the full result that comes from the cause, not an actual equality): b = f(a), c = g(a), b = h(c). Here, a is the confounding, b is the dependent and c is the independent variable. Now let's say that f(a) = -h(g(a)). Any change a would cause in b is cancelled by the effect c has on b, so we would see no correlation between the confounding variable and the dependent variable. If c definitely has no additional causal factors, c = g(a) is a true equality. Therefore any changes to c have to come from a, and we know that changes to a don't change b, so we actually end up with no causality or relation at all from a or c onto b.
Let's instead look at the case where c does have additional causal factors. Let's say x is the part of c that is not derived from g(a). If we look at how b is affected, we see that the g(a) part of c doesn't affect it at all, so we could possibly argue that h(c) should really be h(x), and then fully remove f(a) as a cause for b. Functionally, this would be the same, since changes to a don't affect b, but any other changes to c do. Since c refers to a measurable variable, it becomes a little tricky. We either need to know what g(a) is, and then write h(x) as h(c - g(a)), or we need to find the factors that generate the x part of c, and instead consider b in terms of those factors, and therefore actually also remove c as a causal variable of b. To me, this seems to imply that we have two options that both would result in the same outcome. Either a and c both have a causal relationship with b, where a has causation without correlation, or neither a nor c has any causal relation to b, but c and b share other causal factors. We then need to think about what we really mean by causality. Would b happen purely based on x if c didn't exist as a concept at all? Let's consider what it would look like if a didn't exist in the example, and c was only affected by the factors contained in x. It would then be true that changes to x necessarily become reflected in b through the effect they have on c. However, would it still be the case if c didn't exist at all? I don't think it necessarily would (unless you also consider the defining concept of c as a factor, but I would not say that it is; c just either exists or it doesn't). Because of this, even though the maths would be the same, I don't think the two options in regards to a are the same.
We need to have a causal relationship between c and b, as that is the actual cause for why x affects b, but since we defined that a has a causal relationship with c, we also need to have a causal relationship from a on b to counteract the changes in c. With all that said, I would say that "causality -> correlation" can possibly be considered true (in the case where you just use x and bypass c), but I would argue that it doesn't have to be true. Again, my knowledge in statistics is pretty limited, so there might be some formal definition that means much of what I've said is bollocks, and there very well might be some flaw in my logic somewhere, and I've probably spent way too much time thinking about it (and half-way through changed my original opinion from being definitely true, it just got stuck in my mind somehow), but at least that's what makes sense to me right now.
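The cancellation scenario in this comment can be sketched numerically. The linear forms chosen for `f`, `g` and `h` below are assumptions picked so that f(a) = -h(g(a)); `x` plays the role of the part of `c` that does not come from `a`.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    return sxy / sqrt(sxx * syy)


def g(a):          # effect of a on c (assumed linear)
    return 2.0 * a


def h(c):          # effect of c on b (assumed linear)
    return c


def f(a):          # direct effect of a on b, built to cancel the path via c
    return -h(g(a))


rng = random.Random(4)
n = 20_000
a = [rng.gauss(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]          # part of c not caused by a
c = [g(ai) + xi for ai, xi in zip(a, x)]
b = [h(ci) + f(ai) + rng.gauss(0, 0.5) for ai, ci in zip(a, c)]

r_ab = pearson(a, b)
r_cb = pearson(c, b)
print(f"corr(a, b) = {r_ab:.3f}")
print(f"corr(c, b) = {r_cb:.3f}")
```

In this toy setup `a` feeds into `b` along two paths that cancel exactly, so `a` and `b` look unrelated, while `c` still correlates with `b` through its `x` component.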
@TheKivifreak 6 months ago
Great video. Looking forward to how you will solve these unknown confounders. Also, how do you handle the raven’s paradox?
@kkrlolorkk1657 6 months ago
What a fantastic video! I'm a philosophy student from Italy currently writing a thesis about this! I could talk about it the whole day. For everyone who wants to read more about this, I suggest "The Book of Why" by Judea Pearl, one of the founders of causal inference and causal modelling. A lot of historical anecdotes are present in the book. For everyone wanting something more technical to read, I suggest "Causality" by Judea Pearl and "Causation, Prediction and Search" by SGS (Spirtes, Glymour and Scheines), the two major works of the field. Have a nice day!
@vishesh0512 6 months ago
I wish you provided some intuition about why the confounder affected the data in the way it did. I'd have thought that if half of the 1h people have friends, and the same for 2h, the friend effect would just average out...
@GoneZombie 6 months ago
Oh look, a channel for me, specifically!
@erenjaegersrightbicep63 6 months ago
Great video as always! I had a weird question. So, assume a spectrum ranging from all correlation and no causation to all correlation and mostly/all causation (of course assuming here that causation also implies said correlation). Where do the most common causal inference techniques fall on this spectrum? E.g. linear regression, A/B testing (paired t-test, for example), RDD, DID, etc. And can it be concluded that a better functional form that controls for variables more accurately will likely push the particular inference method chosen towards the causation side of the spectrum?
@TheLoneWolfling 6 months ago
> can it be concluded that a better functional form that controls for variables more accurately will likely push the particular inference method chosen towards the causation side of the spectrum? Yes, but this is a leading question. If the controls are _actually_ more accurate, that's one thing. If. If you phrased it as "can it be concluded that a better functional form that [attempts to control] for variables more accurately" the answer would be "no". It is very easy to inadvertently add controls that end up making the result less accurate.
@BleachWizz 6 months ago
12:54 - that's a nice philosopher saying. I just think he forgot to mention unknown knowns. Because there are things someone knows and you don't know it does.
@Siroitin 6 months ago
Hi zizek
@walterbushell7029 6 months ago
And the things you know that aren't true.
@Siroitin 6 months ago
@walterbushell7029 And yet you behave like the non-true thing is true
@magnus.discipulus 6 months ago
What if the confounder variable is a categorical one? Say, I think that age has an influence on salary, but also the type of geography (Europe/Asia/...)
@reinerwilhelms-tricarico344 6 months ago
It was a pretty good explanation. Still, it’s not satisfying to know that ultimately you can’t “prove” causation merely by statistics, even accounting for confounding conditions. Before using statistics one should always do a deep analysis and build a theory formulated at least with clear, testable hypotheses. And if you want to prove them with stats you have to make sure you write them down first, so that you stay honest. There is always a danger of going the other way: making ad hoc assumptions about causation based on apparent statistical findings, and finding something that has a good significance value. That’s one of the reasons so many medications are invented which actually came from failed experiments to find something, but were later found to be of some completely different use. So I'd rather try to stick to the approach preferred in physics: First, try to make a theory that should explain the data at hand, ideally starting from first principles. Then you can use that theory to make predictions for other measurements, and if your theory is quantitative and analytic you should also be able to predict estimates of the uncertainties of your predictions. Then when you evaluate measurements compared to predicted measurements you can actually compute the significance, the probability with which your theory is correct. Still, even this can fail big time. Your theory may make the right prediction but still be completely flawed because of some assumptions that no one involved knew were wrong. A famous example is epicycles (cycles in cycles), which were used in ancient history and for many centuries to predict planetary motion. These were fairly accurate, but the theory was based on flawed geocentric assumptions. Later it took quite a while to get more accurate predictions using Kepler’s laws, derived from first principles, and the heliocentric assumptions.
@jonfe 6 months ago
So basically the idea is that you always have more nodes than you think affecting all relationships. The universe is an infinite graph, all connected.
@martinstephens4633 6 months ago
In the fraction at 03:12, why is there no square bracket, as in E[X]? E is an operator!
@very-normal 6 months ago
Sorry, slightly sloppy notation on my part 😅
@MannISNOR 6 months ago
Great video! 👏 But is it correct to say that causation -> correlation? There are many instances where a causal effect is masked by another variable. For example if alcohol causes men to become more prone to aggression and women less so. If you don't include gender in the analysis and aggregate across genders then it would look like no relationship between alcohol and aggression.
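A sketch of this masking effect with invented numbers: the same exposure raises the outcome in one group and lowers it in the other, so the pooled correlation washes out. The effect sizes (+1 and -1 per unit) and the 50/50 group split are assumptions for illustration only, not empirical claims.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(5)
n = 20_000

is_man = [rng.random() < 0.5 for _ in range(n)]
alcohol = [rng.uniform(0, 5) for _ in range(n)]
# Hypothetical effects: +1 per unit for men, -1 per unit for women.
aggression = [(al if man else -al) + rng.gauss(0, 1)
              for man, al in zip(is_man, alcohol)]

rows = list(zip(is_man, alcohol, aggression))
men = [(al, ag) for man, al, ag in rows if man]
women = [(al, ag) for man, al, ag in rows if not man]

r_pooled = pearson(alcohol, aggression)
r_men = pearson([al for al, _ in men], [ag for _, ag in men])
r_women = pearson([al for al, _ in women], [ag for _, ag in women])
print(f"pooled: {r_pooled:.3f}  men: {r_men:.3f}  women: {r_women:.3f}")
```

Splitting by the group variable reveals two strong but opposite relationships that the aggregate analysis hides.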
@fathertedczynski 6 months ago
Yep I'd say causation -> correlation but correlation -/-> causation. I think the hard part is conditioning on the correct variables and understanding which other factors are also correlated. Most often, if you see a causation and a correlation there will be one common factor which is causing the correlation. A common example is correlation of ice cream sales and shark attacks. The common factor here is the number of people populating the beach, which is correlated with both shark attacks and ice cream sales. So (beach population -> ice cream) && (beach population -> shark attacks) -/-> (shark attacks -> ice cream). This is actually a nice little proof found in propositional logic.
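The ice cream and shark attack example can be simulated under simple assumptions: beach population drives both quantities, and neither affects the other. Adjusting for the common cause is done here by correlating the residuals left after regressing each variable on beach population (a partial correlation); all numbers are made up.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(6)
n = 20_000

beach = [rng.gauss(0, 1) for _ in range(n)]           # common cause
ice_cream = [b + rng.gauss(0, 1) for b in beach]      # depends only on beach
sharks = [b + rng.gauss(0, 1) for b in beach]         # depends only on beach

r_raw = pearson(ice_cream, sharks)
r_adjusted = pearson(residuals(ice_cream, beach), residuals(sharks, beach))
print(f"raw correlation:               {r_raw:.3f}")
print(f"after adjusting for the beach: {r_adjusted:.3f}")
```

The raw correlation is substantial, but it disappears once the shared cause is accounted for, which matches the reading that the association is spurious.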
@Wishkeyn 6 months ago
I do think it's wrong to say that causation -> correlation, as you say the effect may be too small to get significant correlation. However there should be an increase in correlation due to the causation. Easiest example would be a stock that is part of an index fund, thus causation is obvious. The stock then drops significantly even though the rest of the index fund increases, logically you can reason that the fund would've increased more if that stock had not been part of it. Hence the correlation is higher, however you wouldn't necessarily be able to find the correlation
@TheLoneWolfling 6 months ago
correlation != linear correlation.
@zhuwenhao4852 6 months ago
great lecture. great example.
@zeevkeane6280 6 months ago
Great videos, keep it up!
@gaspo1523 6 months ago
Great video. I was wondering, however, what is the formal definition of causation? Because if we had X = a*C and Y = b*C, wouldn’t we also have C = (1/a)*X, thus Y = (b/a)*X? Wouldn’t this mean X causes Y? Are random variables not the correct way to express causation?
@very-normal 6 months ago
Thanks! A relationship between X and Y is not necessarily causal. There could be a linear association between X and Y like you described, but that does not mean it is causal. In the context of statistics, causes and effects are usually described in terms of random variables since we assume we’re dealing with random data. This is different from other notions of cause, so random variables are not necessarily “correct” or “incorrect”, but just a way to formalize causes in a statistical sense.
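One way to see why Y = (b/a)*X does not make X a cause of Y is to treat the equations as a data-generating recipe and then override X from outside. The sketch below assumes the structure from the question (X = a*C, Y = b*C) with hypothetical coefficients; the `intervene` flag is an illustrative device, not a standard API.

```python
import random
from math import sqrt

A, B = 2.0, 3.0  # hypothetical coefficients a and b


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n, rng, intervene):
    """Draw (xs, ys); if intervene is True, x is set externally, ignoring c."""
    xs, ys = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        x = rng.gauss(0, 1) if intervene else A * c
        y = B * c  # y is generated from c only, never from x
        xs.append(x)
        ys.append(y)
    return xs, ys


rng = random.Random(7)
r_observed = pearson(*simulate(20_000, rng, intervene=False))
r_intervened = pearson(*simulate(20_000, rng, intervene=True))
print(f"observational corr(X, Y):  {r_observed:.3f}")
print(f"corr(X, Y) when X is set:  {r_intervened:.3f}")
```

Observationally X predicts Y perfectly, but once X is set independently of C the relationship vanishes, because Y was only ever generated from C.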
@MrRenanHappy 6 months ago
Audio sounds duplicated
@pmmeurcatpics 6 months ago
Great video!
@ProfDrGrock 6 months ago
Great video, keep up the great work:)
@krislai7453 6 months ago
Hi, I really like your videos and I am also a stat major. It would be great if you could include the references you're using for your videos or the papers you have been reading. I just recently started reading journals and it would be great to see what others are reading too, thanks!
@very-normal 6 months ago
That’s a great idea! I’ll try to keep this up for future videos and I’ll collect my citations for this one soon. If it helps, I usually turn to Journal of the American Statistician and Biometrics to see what’s been happening. But keep in mind I’m in biostatistics, so I’m usually looking at papers in those contexts
@asfreild9383 6 months ago
So.. Correlations correlate with causations but do causations necessarily cause correlations?
@ucchi9829 6 months ago
I thought it would be like uncleaned data or divergence.
@DOTvCROSS 5 months ago
I take it 'Gin Rummy' is Donald Rumsfeld. Do quote sources matter to a statistician?
@very-normal 5 months ago
no he’s the guy from the boondocks clip lol, also Rumsfeld got the quote from other people soooooo yeah
@justinahole336 6 months ago
Great video! But, frankly, it's my ex-wife's antics that keep me up at night! She confounds my whole life!
@cliftonjohnson1990 6 months ago
solid video
@yulia6354 6 months ago
Thanks for the video! But I don't quite understand: if E is some kind of expectation, and it's in front of the parentheses, how can it be used to calculate both the deviation from the average work time and the deviation from the average score 🤔 It feels like the expectations for time and scores should be different. Has anyone got it?
@very-normal 6 months ago
Hey thanks! I think you’re talking about the covariance equation. You’re right that the expectation for work time and score should be different. By using the expectation operator on both of them (EX and EY), we can get those averages. But the covariance considers a new random variable: (X - EX)(Y - EY). This product represents how much two random variables will vary in the same/opposite direction. To get the average of this new random variable, we can take the expectation over it. Hopefully this clarifies a little bit!
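Written out with explicit brackets, the standard definitions being discussed look like this. The inner expectations E[X] and E[Y] are two separate averages, and the outer expectation is taken over the product of the two deviations; the symbol \rho for Pearson's correlation is conventional notation and may differ from what is shown on screen in the video.

```latex
\operatorname{Cov}(X, Y)
  = E\big[(X - E[X])\,(Y - E[Y])\big]
  = E[XY] - E[X]\,E[Y]

\rho_{X,Y}
  = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}},
\qquad
\operatorname{Var}(X) = E\big[(X - E[X])^{2}\big]
```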
@ixywas 6 months ago
i love this !!!!!
@kacodemonio 6 months ago
Watching this video caused me to subscribe.
@very-normal 6 months ago
🫡
@mesplin3 6 months ago
Causation has always confused me. I always just assumed it meant P(Y | X) >= P(Y) where X and Y are random variables.
@TheLoneWolfling 6 months ago
a) as described that only works for binary events b) consider the case where X is a coinflip and Y = !X. P(Y | X) = 0, but P(Y) = 0.5.
@mesplin3 6 months ago
@TheLoneWolfling a) Causation isn't a binary event? b) X is the event that someone flips a coin and Y is the event where someone doesn't flip a coin? I agree that someone flipping a coin doesn't cause someone not flipping a coin.
@TheLoneWolfling 6 months ago
Sorry, it's been too long since university; I'm not using standard notation. X is a binary random variable (50% chance of false, 50% chance of true). Y is a binary variable dependent on X: if X is true, Y is false, and vice versa. Under your definition of causation: P(Y) = 0.5 and P(Y | X) = 0, so P(Y | X) < P(Y), therefore X and Y are _not_ causally related. ...but they are.
@TheLoneWolfling 6 months ago
"a) Causation isn't a binary event? " I was referring to "P(Y | X) >= P(Y)" - which only works if X and Y are binary. You can talk about e.g. correlation versus causation between food intake and height, but P(height) makes no sense, for instance.
@mesplin3 6 months ago
@TheLoneWolfling Let X be a random variable that is 1 if one gets heads on a coin flip and 0 if one does not. Let Y be a random variable that is 1 if one does not get heads and 0 if one does get heads. P(Y=1) = 0.5: the coin is fair. P(Y=1 | X=1) = 0: the act of getting heads on the coin does not cause the act of not getting heads. P(Y=0 | X=1) = 1: the act of getting heads on the coin does cause the act of (not not) getting heads on the coin. I'm not seeing any issues.
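The probabilities traded in this exchange can be checked by exact enumeration. The sketch assumes the setup both commenters describe: X is a fair coin coded 0/1 and Y = 1 - X; the helper names are illustrative.

```python
from fractions import Fraction

# Joint distribution of (x, y) when X is a fair coin and Y = 1 - X.
joint = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 2)}


def prob(event):
    """Probability of the set of outcomes for which event(outcome) is true."""
    return sum(p for outcome, p in joint.items() if event(outcome))


def cond_prob(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)


p_y1 = prob(lambda o: o[1] == 1)
p_y1_given_x1 = cond_prob(lambda o: o[1] == 1, lambda o: o[0] == 1)
p_y1_given_x0 = cond_prob(lambda o: o[1] == 1, lambda o: o[0] == 0)
print(p_y1, p_y1_given_x1, p_y1_given_x0)  # prints: 1/2 0 1
```

Y is fully determined by X, yet P(Y=1 | X=1) is smaller than P(Y=1). A rule that only accepts "conditioning raises the probability" therefore misses dependence that lowers it, which seems to be the point of the counterexample.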
@difflocktwo 6 months ago
using = instead of
@very-normal 6 months ago
equals sign gang rise up
@ArgumentumAdHominem 6 months ago
You make a potentially misleading statement that it is generally good to control for a confounder if it is available. This is only true if you know the true DAG. However, there are different possible DAGs, such as forks, pipes or colliders. In some cases, you make things worse by controlling for the supposed confounding variable, as opposed to doing nothing at all. Causal inference is highly limited or even impossible without exact prior knowledge of the DAG. Which, of course, raises the question of how one can even learn the DAG. Would be a good topic for a future video. For a more complete picture, I highly recommend watching Statistical Rethinking by Richard McElreath.
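A minimal sketch of "controlling makes it worse", assuming the simplest collider DAG: `x` and `y` are independent and both feed into `z`. Adjustment is done by correlating residuals after regressing each variable on `z`; the structure and coefficients are invented for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(xs, ys))
             / sum((u - mx) ** 2 for u in xs))
    return [(v - my) - slope * (u - mx) for u, v in zip(xs, ys)]


rng = random.Random(8)
n = 20_000

x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]                    # independent of x
z = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]    # collider: x -> z <- y

r_unadjusted = pearson(x, y)
r_adjusted_for_z = pearson(residuals(x, z), residuals(y, z))
print(f"x vs y, no adjustment:    {r_unadjusted:.3f}")
print(f"x vs y, adjusting for z:  {r_adjusted_for_z:.3f}")
```

Without adjustment the analysis correctly shows no relationship; adjusting for the collider manufactures a negative one. Whether a variable should be controlled for depends on where it sits in the DAG, not merely on whether it is available.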
@very-normal 6 months ago
Yes, you bring up a valid point! I tried to allude to this briefly in the video, but didn’t mention any of the specific reasons that you had mentioned. Definitely a good future video topic.
@ArgumentumAdHominem 6 months ago
@very-normal Thanks for noting. You did a good job overall; I added this for completeness, as I presume these videos mostly target a beginner audience.
@TheLoneWolfling 6 months ago
This. The rather interesting issue around stats is:
* Causal analysis requires that there be pairs of variables with zero correlation. (Otherwise your inference fails and you end up with the trivial graph, which isn't exactly helpful.)
* We know there are confounders in the real world that logically would have correlation with ~everything. (Obvious example: neutrino flux)
* We know that controlling for variables without the correct DAG can make things worse.
* Controlling for variables in the real world appears to help anyway??
@jackrorystaunton4557 6 months ago
So, it is hopeless then, yes? I mean, you don't really show strategies to deal with this. Statisticians should leave causation to the physicists ;)
@very-normal
@very-normal 6 months ago
The way I mentioned in the video was to control for the confounder in the regression. I also mentioned randomization, but didn’t explain like you said. So it’s not hopeless thankfully!
@korigamik
@korigamik 6 months ago
Man, although I like your videos, you don't give good references for your stuff. If you could create a written blog for your videos, I'm sure a lot of us would appreciate that. Only creating videos seems counterproductive if one wants to look up the things that you say. Having things written down would be a great thing for the viewers :)
@very-normal
@very-normal 6 months ago
Thanks for the feedback! This is actually a question I’ve been struggling with for a while, so I’m glad you brought this up. Can I ask you what sort of information you’re missing out on video, but would prefer to see in writing?
@nickmavromatis7657
@nickmavromatis7657 6 months ago
Your vids are so great ❤ My degree is in statistics, and I swear I have my meals with your videos lmao
@korigamik
@korigamik 6 months ago
@@very-normal You're great at introducing topics in these videos. If you usually write a script for them, you could create a simple blog with headings for the individual topics and add these great graphics alongside the content. It wouldn't take much time, and the effort would be greatly appreciated (not to mention the SEO for your website).
@AkamiChannel
@AkamiChannel 6 months ago
34 seconds in and I'm already annoyed. "Correlation does not mean causation" and the human brain inferring patterns from randomness are two different things!!!! (Though these may be correlated! 😂) But why do you explain it as if those are the same!? Idc if you have a phd or anything that is a serious mistake! Stupid people will be confused.
@AkamiChannel
@AkamiChannel 6 months ago
Ugh and now I am 3 minutes and 10 seconds in and though I thought you were "dumbing things down" to be "more accessible" to a wider audience, you so far seem to flaunt more complicated things than you actually need to explain these things. It's as simple as "instead of A causing B, there may be C that causes both A and B" or you could say "ice cream sales don't cause the weather to be hot," etc, but instead you're throwing completely unexplained formulas at us. What is your goal? Is your purpose to make a video that informs and explains or is it to make yourself look smart (you're failing).
@very-normal
@very-normal 6 months ago
lol
@hoagie911
@hoagie911 2 months ago
what a terrible video. You kept pointing at it being impossible to control unknown confounders. And then kept saying don't worry, we have ways around it. But never said what they were.
@very-normal
@very-normal 2 months ago
lol you could try my randomization video if you still want an answer to that question
@InOtherNews1
@InOtherNews1 6 months ago
Genuinely loving the stuff here about theory and practice. Keep it up!
@lok2676
@lok2676 6 months ago
My question is, if excluding confounders leads to omitted variable bias and including confounders leads to multicollinearity (due to the high correlation between the confounder variable and the other explanatory variable/s), what can one do?
@very-normal
@very-normal 6 months ago
My educated guess here is that it would be better to include the confounder in the analysis, even with high correlation between the exposure and the confounder. You do get higher standard errors for your exposure as a result, but I think that the resulting estimate is a better description of the effect you want to get. Alternatively, if we can randomize exposure, we theoretically don’t have to include the confounder in the model! But some problems/areas don’t allow for it.
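The trade-off described in this reply (bias from leaving the confounder out versus larger standard errors from putting it in) can be sketched with a repeated simulation. This is an illustration under assumed values, not the video's code: the names `one_fit` and `summarize`, the true effect of 1.0, and the correlations 0.1 and 0.95 are all hypothetical choices.

```python
import random
import statistics


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


def one_fit(n, rho, rng):
    """Return (unadjusted, adjusted) exposure slopes for one simulated sample."""
    z = [rng.gauss(0, 1) for _ in range(n)]  # confounder
    # Exposure x has variance 1 and correlation rho with the confounder z.
    x = [rho * zi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for zi in z]
    # Assumed true effect of x on y is 1.0; z also pushes y by 1.0.
    y = [1.0 * xi + 1.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    unadjusted = sxy / sxx  # confounder omitted
    adjusted = (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)  # confounder included
    return unadjusted, adjusted


def summarize(rho, reps=300, n=200, seed=0):
    """Average and spread of the two estimators over many simulated samples."""
    rng = random.Random(seed)
    fits = [one_fit(n, rho, rng) for _ in range(reps)]
    unadjusted = [f[0] for f in fits]
    adjusted = [f[1] for f in fits]
    return {
        "unadjusted_mean": statistics.mean(unadjusted),
        "adjusted_mean": statistics.mean(adjusted),
        "adjusted_sd": statistics.stdev(adjusted),
    }


for rho in (0.1, 0.95):
    print(rho, summarize(rho))
```

In this setup the adjusted estimate should stay centered near the assumed effect of 1.0 at both correlations, while its spread grows noticeably at 0.95; the unadjusted estimate is tighter but centered in the wrong place.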
@phamnguyenductin
@phamnguyenductin 6 months ago
There are techniques for addressing multicollinearity such as PCA or ridge regression.
@nicholasfengschaefer4758
@nicholasfengschaefer4758 3 months ago
You could also look into double machine learning.
@florentinudrea7837
@florentinudrea7837 6 months ago
please, PLEASE don't stop doing these videos! I love them!
@TheDilla
@TheDilla 6 months ago
I was with you until I saw the equal sign assignment in the R code.
@caty863
@caty863 6 months ago
what's wrong with that? it's a matter of convention...in most cases anyway!
@TheDilla
@TheDilla 6 months ago
@@caty863 Oh I never said it was rational, it just bothers me, haha. Although, I believe there are times when equal sign assignment will do something different than <-
@caty863
@caty863 6 months ago
@@TheDilla Yes, there are times these operators do different things; for instance, the scope of the assigned variable differs when they are used while setting function arguments or within function definitions.
@TheSwordfish-g3r
@TheSwordfish-g3r 4 months ago
It's classic chaotic random
@d_b_
@d_b_ 6 months ago
7:15 bugs me. I don't work in an experimental setting, so datasets are given to me. If all I can confidently state are correlations, which may or may not have an effect, what use is my analysis? 😣
@very-normal
@very-normal 6 months ago
Correlations themselves are not bad; they are still hints to underlying (causal) relationships. What you ultimately estimate may not be the *exact* causal effect, but you can still get something that’s close, and then you can be open about what you did and where your data comes from. One thing that helped the fight against smoking was the accumulation of many observational studies all pointing at similar conclusions. None could really suggest a “cause” by themselves, but the body of evidence that was built became the basis for thinking of smoking as a cause. Your analysis could play a similar role! What kind of settings/data do you usually work with? Out of curiosity 🤠
@d_b_
@d_b_ 5 months ago
@@very-normal A very delayed "thanks" for the response! It's software telemetry data, with no direct access to A/B tests. I'm mostly tasked with seeing how efficiently users navigate the software, and what we can learn from the data.
@allieindigo
@allieindigo 6 months ago
I know this might be a silly question, but is your voice ai-gen? It just sounds so much like it, especially at 1:47. Sorry, if it isn't, that's my bad.
@very-normal
@very-normal 6 months ago
haha, no offense taken! It’s not AI generated, but I do use Adobe’s AI to try to reduce background noise for my voiceovers. I’ve always felt something was off about the result, but I have yet to find ways to improve my audio. You’re actually the first person to catch that AI has altered my voice
@allieindigo
@allieindigo 6 months ago
@very-normal Ah, I see. Thanks for the reply. Also, phenomenal video, man, top quality!
@louiesumrall358
@louiesumrall358 6 months ago
great job wow
@larrybridge7456
@larrybridge7456 6 months ago
I am just curious what Einstein got wrong in his physics. Cool vid. Thanks for making it.
@very-normal
@very-normal 6 months ago
He was not a fan of the then-new field of quantum mechanics, and spent a lot of time trying to refute it. Einstein famously argued that “God does not play with dice,” but experiments have since come down on the side of quantum mechanics.
@jamesdavison1333
@jamesdavison1333 6 months ago
Great video. Would like to see a discussion of instrumental variables as a workaround.
@DamaKubu
@DamaKubu 6 months ago
This thing slaps! Haha +1 respect for the Boondocks reference :D Oh how I wish you were my stats professor!
@baestrosity1233
@baestrosity1233 6 months ago
Do you have a video on taking oneself (as an autodidact) from algebra up to highish-level statistics?
@seongunness608
@seongunness608 6 months ago
Great video; it reminds me of the saying that you aren't measuring what you think you're measuring. Having a suite of variables that each carry some weight in the outcome variable you are studying tends to be useful in more complex settings.
@ConnorMcCormick
@ConnorMcCormick 6 months ago
normally in causal structural diagrams X → Y means X "listens" to Y, the opposite of what you said
@lumotroph
@lumotroph 6 months ago
Cool video! Thank you. More story next time please! Subbed
@avaraportti1873
@avaraportti1873 6 months ago
Statisticians typically avoid causal claims.
@MrSomethingred
@MrSomethingred 6 months ago
This was a good topic for a video
@poketopa1234
@poketopa1234 6 months ago
How does this "lm" function account for the confounders? Like how do you know what the coefficient on the confounder is, and whether the confounder scales the effects of X? I'd love to learn more about this.
@very-normal
@very-normal 6 months ago
The “lm” function is what R gives us to run linear regressions. In my simulation, I generated data such that the confounder has a specific linear relationship with both the exposure and outcome, but the specific value itself doesn’t matter. It’s just something I controlled in my simulation. With multiple linear regression, you can only interpret a single coefficient by holding the values of the other regressors constant. If one predictor is the exposure and the other is the confounder, the coefficient on the exposure is interpreted as “the average change in the outcome per unit change in the exposure, holding the confounder constant.” Since the confounder is held constant, its effects on both the exposure and the outcome are accounted for. Hopefully this clarifies a bit!
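The "holding the confounder constant" interpretation can be checked on simulated data. The sketch below is plain Python rather than R's `lm`, and everything in it is an assumption made for illustration: the names `simulate`, `slope_unadjusted` and `slope_adjusted`, the true effect of 2.0, and the confounder strengths 0.8 and 1.5. It computes the least-squares exposure coefficient with and without the confounder in the model.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


def simulate(n, effect, seed):
    """Confounder z drives both the exposure x and the outcome y (assumed setup)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
    y = [effect * xi + 1.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, z, y


def slope_unadjusted(x, y):
    """Least-squares slope of y on x alone (confounder ignored)."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(x, z, y):
    """Least-squares coefficient on x when y is regressed on x and z."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


x, z, y = simulate(20_000, effect=2.0, seed=42)
print("true effect:        2.0")
print("without confounder:", round(slope_unadjusted(x, y), 3))
print("with confounder:   ", round(slope_adjusted(x, z, y), 3))
```

With these assumed values, the coefficient from the model that includes the confounder should land close to 2.0, while the one that ignores it should be inflated because it absorbs part of the confounder's effect on the outcome.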
@poketopa1234
@poketopa1234 6 months ago
@@very-normal Thank you so much for the response and the great videos! :)
@yee6365
@yee6365 6 months ago
Amazing video, and thanks to you I finally understand the relationship between linear models and regression. I still struggle to understand how this relates to hypothesis testing, e.g. ANOVA. What do confounders mean there, and how do you handle them?
@very-normal
@very-normal 6 months ago
Thanks! Confounders would still have the same meaning, but they would not be handled in the same way as with a regression. This is a future video topic, but more generally, we would need to randomize who gets the exposure in order to give us a better shot at finding causal effects. Randomization makes the distribution of -all- confounders the same, on average, among all the exposure groups; this better isolates the causal effect since the effect of all the confounders is the same in all groups. In terms of hypothesis tests, nothing much changes really. Being able to randomize exposure lets us go a step further and say that a statistically significant association is also a statistically significant causal relationship.
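The balancing effect of randomization described here can be sketched with a simulation. This is a minimal illustration under assumed values, not the author's method: the function `run_study`, the true effect of 1.0, and the self-selection rule are hypothetical. It compares how far apart the exposure groups are on the confounder and on the outcome when exposure is self-selected versus assigned by coin flip.

```python
import random


def mean(values):
    return sum(values) / len(values)


def run_study(n, randomized, seed, effect=1.0):
    """Return (gap in confounder means, gap in outcome means) between groups."""
    rng = random.Random(seed)
    exposed_rows, control_rows = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # confounder, also raises the outcome below
        if randomized:
            exposed = rng.random() < 0.5       # coin flip that ignores z
        else:
            exposed = z + rng.gauss(0, 1) > 0  # self-selection driven by z
        y = (effect if exposed else 0.0) + 2.0 * z + rng.gauss(0, 1)
        (exposed_rows if exposed else control_rows).append((z, y))
    z_gap = mean([r[0] for r in exposed_rows]) - mean([r[0] for r in control_rows])
    y_gap = mean([r[1] for r in exposed_rows]) - mean([r[1] for r in control_rows])
    return z_gap, y_gap


for label, randomized in (("observational", False), ("randomized", True)):
    z_gap, y_gap = run_study(20_000, randomized, seed=0)
    print(f"{label:>13}: confounder gap = {z_gap:.2f}, outcome gap = {y_gap:.2f}")
```

In the randomized arm the confounder gap should be close to zero and the outcome gap close to the assumed effect of 1.0; in the observational arm both gaps should be much larger, because the groups differ on the confounder before exposure has any effect.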
@KostasAlexiou0
@KostasAlexiou0 6 months ago
Awesome video, but you left us on a cliffhanger: where should I read about how to deal with observed confounders? Thank you for your awesome content!
@rajinfootonchuriquen
@rajinfootonchuriquen 6 months ago
Why not just use Bayes' theorem instead of confounders? Ronald Fisher was a frequentist, but in Bayesian statistics there is causality.