The biggest beef in statistics explained

83,135 views

Very Normal

1 day ago

Comments: 375
@very-normal 1 month ago
To try everything Brilliant has to offer for free for a full 30 days, visit brilliant.org/VeryNormal. You’ll also get 20% off an annual premium subscription.
@bificommander7472 1 month ago
At the end of one Bayesian statistics lecture, the professor ended with approximately this summary: "Frequentist statistics gives a mathematically rigorous answer to questions no one asked. Bayesian statistics tells you what you want to know, based on assumptions no one believes."
@sophigenitor 1 month ago
The reason frequentist answers to questions no one asked are still somewhat useful is that, under some assumptions, they are a reasonable approximation of the Bayesian answers to the questions you are actually interested in.
@Critical-Smoke 23 days ago
@@sophigenitor examples?
@sophigenitor 23 days ago
@@Critical-Smoke The easiest examples are confidence intervals. What people actually want are Bayesian credible intervals. And in most practical examples, the full Bayesian treatment with flat or uninformative priors will result in credible intervals that are indistinguishable from the Frequentist confidence intervals. I have seen constructed examples where this wasn't the case, but that was caused by boundary effects of impossible parameter values.
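To make that comparison concrete, here is a minimal sketch (all numbers invented; a flat Beta(1,1) prior assumed) of how close the two intervals typically come out for a simple proportion, using only the Python standard library:

```python
import random

random.seed(0)

k, n = 88, 100  # hypothetical data: 88 "good" ratings out of 100 reviews

# Frequentist: Wald 95% confidence interval for the proportion
p_hat = k / n
se = (p_hat * (1 - p_hat) / n) ** 0.5
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Bayesian: flat Beta(1, 1) prior -> Beta(1 + k, 1 + n - k) posterior;
# approximate the 95% credible interval by Monte Carlo sampling
draws = sorted(random.betavariate(1 + k, 1 + n - k) for _ in range(100_000))
cred = (draws[2_500], draws[97_500])

print(ci, cred)  # the two intervals nearly coincide
```

With this much data the endpoints differ by about 0.01, which is the "indistinguishable in practice" effect the comment describes.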
@berjonah110 1 month ago
I tend to lean toward the Bayesian approach for two reasons: It tends to be easier to build up complicated models using conditional latent variables, and the prior distribution gives a way to incorporate expert knowledge about a subject. I've worked with many subject matter experts who don't have a firm grasp of statistics, but are very knowledgeable about their own corner of the world. Having the ability to take what amounts to "vibe based reasoning" from them and quantify it using an informative prior distribution gives a lot more power than just using a flat prior.
@martian8987 1 month ago
calibration
@Bamawagoner 23 days ago
Apparently you also tend to lack a fundamental understanding of these approaches
@martian8987 22 days ago
@@Bamawagoner tell us then
@dhonantarogundul1737 1 month ago
I always interpret frequentist statistics as "static statistics" and Bayesian statistics as "dynamic statistics," which works well in my field of study, robotics!
1 month ago
robotic statistics!
@haukur1 21 days ago
That's a really neat way of putting it
@hughobyrne2588 1 month ago
It's like having factions that say "Pi is determined by geometry: the ratio of a circle's circumference to its diameter" and "Pi is determined by calculus: the limit of an infinite sum (pick your favourite)".
@very-normal 1 month ago
and then some people in one of the factions can’t stand it that the other one says it differently, I won’t say who
@LBR_Bounty 29 days ago
So, off an hour of YouTube math, I'm assuming the geometry side is the frequentist ideal while the calculus side is the Bayesian ideal, and the geometry side will be upset at the calculus side over how to interpret it. Let me know if I'm right or wrong.
@scepticalchymist 1 month ago
Idealists will never stop debating which approach is more valid; pragmatists will just use one or the other depending on which fits best in any given situation.
@Kubboz 1 month ago
@@sumdumbmick I mean, it isn't based on dogma. That's why the guy in your story failed, no?
@Ethan13371 1 month ago
One option for avoiding Bayesian prior-rigging is to simply publish the calculation itself, without committing to any prior. The posterior probability is then simply some function of the prior, which you could graph or otherwise visualize.
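The comment's idea can be sketched numerically, with made-up data and a Beta(a, b) family standing in for "the prior": instead of committing to one prior, report the posterior as a function of the prior's hyperparameters.

```python
# Sketch of "publish the calculation, not the prior". Data here is invented:
# k = 12 successes in n = 20 trials, with a Beta(a, b) prior on the proportion.
k, n = 12, 20

def posterior_mean(a, b):
    # Beta(a, b) prior + binomial likelihood -> Beta(a + k, b + n - k) posterior
    return (a + k) / (a + b + n)

# A reader can scan the prior's effect instead of trusting a single choice
for a, b in [(0.5, 0.5), (1, 1), (5, 5), (20, 20)]:
    print(f"prior Beta({a}, {b}) -> posterior mean {posterior_mean(a, b):.3f}")
```

The table this prints is exactly the "graph it" suggestion: each reader can see how much the conclusion depends on the prior they would have chosen.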
@andrewharrison8436 1 month ago
That seems a really interesting approach, as much for the psychology as for the mathematics. It might make it easier to soften a dogmatic prior into an introspection on the evidence/knowledge/belief that underlay the prior.
@danielkeliger5514 1 month ago
That is more or less the frequentist interpretation of Bayesian methods: they are "just a fancy estimator" for which you can prove things like asymptotic unbiasedness and normality, etc. Fun fact: apart from degenerate cases, Bayesian estimates are not unbiased.
@Tom-qz8xw 1 month ago
The prior is a distribution though? How are you going to graph a functional?
@Y2B123 1 month ago
Exactly, it is very common to have close to no idea about the prior and somehow be able to state something useful in this way.
@Zxymr 1 month ago
I told my Asian parents that I was Bayesian. They disowned me.
@nunkatsu 1 month ago
Dude, worst time possible for me to read that comment. I just found out that my Asian crush (who reminds me of my Asian ex) at statistics classes in college has a boyfriend. Everything you wrote gave me PTSD.
@definitelynorandomvideos24 1 month ago
ya both should've calculated the probabilities of those events happening
@kevinvanhorn2193 1 month ago
That's because you mispronounced "Bayesian". It's "bay-zee-uhn," not "bay-zhun."
@xinpingdonohoe3978 1 month ago
@@nunkatsu PTSD over a crush. Is that normal?
@harlowcj 27 days ago
@@xinpingdonohoe3978 Probably, for the statistics crowd.
@John-zz6fz 1 month ago
One of the advantages of the Bayesian approach is that it feels more "natural" to incorporate non-quantitative evidence into your calculations. For example, it's pretty easy in a frequentist analysis to estimate the odds of rolling a 6 on a d6 if you have rolled it a hundred times and can see the distribution of prior outcomes (fair or loaded). If instead you're told there's no prior data but there's double-sided tape on the 1 side, you can easily "swag" that prior with a Bayesian approach and get better results. I've never actually seen a calculable advantage to either view, but if you start fudging numbers in a frequentist approach it just feels like you're doing something wrong... I don't actually think there is a difference, or if there is, I clearly don't understand it.
@micheldavidovich6940 1 month ago
Question about the null hypothesis you selected, as I've had this issue come up as well. Why do you select H0: pi = 85%? If you want to make a decision on whether the coffee shop is good or bad, wouldn't it make more sense to assume pi ≤ 85%?
@BobJones-rs1sd 1 month ago
Excuse the long explanation, but I'm trying to correct a few potential fundamental misconceptions in your question first. First, the null hypothesis by convention is typically assumed to equal a specific value, though there are some textbooks and sources that use notation like you're suggesting for one-sided tests. The reason for the equals sign even in one-sided tests is that the actual computation of a p-value for a hypothesis test essentially requires it. (In more advanced statistics, there are ways of defining things to get around this and construct a p-value that accounts for a range of null values, but that complexity isn't necessary for this video.) The reality is that the test done in this video is basically computing the p-value using the null hypothesis value that maximizes the probability of a Type I error. If you do a one-sided test, that null value will still be the one that gets the equals sign, and any values "to the other side" of the null (in this case, less than 85%) would have a smaller chance of producing a Type I error. The test we're interested in produces a p-value for that maximizing case. To put it more simply from a math standpoint: whether you're doing a one-sided or two-sided test, you still need a specific null value (not a range) to plug into the estimation formula for the standard deviation used in computing the z value, which is then used to calculate the p-value. The specific null value (here 85%, or 0.85) is the one that maximizes that Type I error probability. Any other value below 85% would give a lower p-value and thus potentially produce an inaccurate test result if used by itself. Hence, pi = 0.85 suffices for the null hypothesis. I think what you're really asking is why the video didn't do a one-sided test. The difference in that case would really be in the ALTERNATIVE hypothesis (not the null); I think you're arguing for an alternative of H_a: pi > 85% rather than "not equal to 85%".
Arguably, you're correct that it might be more appropriate here, as he's interested in whether the proportion exceeds 85%. But even if he did a one-sided test (in this basic case), he'd still effectively be constructing distributions to test a null hypothesis at a specific value, not a range. The null still wouldn't need to be written as an inequality.
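For what it's worth, the one-sided vs. two-sided distinction is easy to see in an exact binomial test. A minimal sketch with invented counts (183 "good" ratings out of 200, null value 0.85), stdlib only:

```python
from math import comb

# Exact binomial test against the null value pi0 = 0.85, with hypothetical
# data: k = 183 "good" ratings out of n = 200.
k, n, pi0 = 183, 200, 0.85

def pmf(i):
    # Binomial(n, pi0) probability of exactly i successes
    return comb(n, i) * pi0**i * (1 - pi0)**(n - i)

# One-sided p-value for H_a: pi > 0.85 (only the upper tail counts)
p_one = sum(pmf(i) for i in range(k, n + 1))

# A common two-sided p-value: all outcomes no more likely than the observed one
p_two = sum(pmf(i) for i in range(n + 1) if pmf(i) <= pmf(k))

print(p_one, p_two)  # here the one-sided p-value is the smaller of the two
```

Both tails are computed from the same single null value pi0 = 0.85, which is the point the reply above is making.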
@micheldavidovich6940 1 month ago
@@BobJones-rs1sd On the contrary, I really appreciate you taking the time to answer that thoroughly. You are completely correct about the alternative hypothesis; that was wrong on my part. The rest is just ignorance on my part, so I really appreciate the explanation.
@lazerbungalow 1 month ago
@@BobJones-rs1sd I'm also thinking he left it as a two-tailed test because that was the default for the test he ran in R, and he didn't really think about it too much. Here, it worked out that the test rejected anyway, but yeah, doing a one-tailed test, since that was the research question he was interested in, might have been preferred. It still works for this example, though.
@otsoko66 25 days ago
@@lazerbungalow Not necessarily -- doing a one-tailed test assumes that all of the error/variation must be in one direction, and you need to provide some justification for that assumption. If you perform a one-tailed test at p = .05 without such justification, you are really doing a test at p = .10. One-tailed versus two-tailed is not a function of your hypotheses; it is a function of how error/variation happens in the world. An example of a good one-tailed test is change in kids' height from age 10 to age 12: we can pretty much assume that kids don't shrink from 10 to 12, and that the error/variation will only be in how much they grow. But note we MUST assume no kid-shrinkage to do the one-tailed test here.
@lazerbungalow 25 days ago
@@otsoko66 I see what you're saying, but my initial feeling is to disagree. If he is only interested in whether it is "better," then if the sample ends up being at the end he's not interested in, he fails to reject the null. There is no type I error there. The error rate is still 0.05. If the two populations are the same, he will still only falsely reject 5% of the time. Now, you might possibly make the argument that it affects type II error if you are saying that the two populations could be significantly different in the opposite direction that he assumes. Because while that is not his research question, a rejection in that direction is valid science because it calls into question his basic ideas about his hypothesis. So in that case, if you feel it is a convincing argument to do a two-tailed test, then that's something to go on. But he is not really testing at 10% type I error rate, because in one direction he will fail to reject.
@PrParadoxy 1 month ago
The Bayesian view doesn't apply very well to statistical physics. There is a concept called the microcanonical ensemble, where we assume the frequency of each event to be equal. From this, we can calculate entropy, and from that calculate other physical quantities of interest like temperature. In the frequentists' view, no problem arises, as the physical system's properties are independent of the observer's knowledge. Everyone agrees on the temperature of a box of gas. However, from the Bayesian point of view, if someone (say, by prior measurements) has extra knowledge about the system, they would not assign equal probabilities to the individual events, causing them to claim a different entropy, temperature, etc., which would not agree with our actual observations. I have seen some efforts to fix this; however, the Bayesian view is not as natural as this video makes it out to be.
@very-normal 1 month ago
yeah that’s fair, I see what you mean. Physics is a totally different world than the biostatistics world I’m used to
@naturalequations 1 month ago
@PrParadoxy I want to hear more about this! But I would argue that the logical Bayesian approach has to do with the process of acquiring information about a real phenomenon and updating the status of knowledge of the observer. Its objects are propositions, not the physical system itself. On the other hand, for statistical physics and quantum mechanics, the probabilistic nature of the evolution of the system is a characteristic of the physical model of the system, has nothing to do with the observer (before measurement), and is objective in nature. While the system evolves, there's no update of knowledge for anyone to do. Moreover, when you build the concept of an ensemble, one basically starts with the idea of taking an infinite number of copies of the system. What this means is that when you go and do measurements of the system, when you want to check if your model is correct or not, the model itself is not deterministic but comes with probability distributions. But Bayesian statistics would apply in the context of characterizing statistical properties of the system (P, V, T, S, U, whatever) given the experimental data, with its own uncertainties, and the model, which is now non-deterministic.
@bjorntorlarsson 1 month ago
Does "everyone agree"? Isn't that worse than subjective, in that it is also collective? What about the ratio of matter to antimatter in the universe: how does the a priori assumption work in that case? Wouldn't it be a good idea to consider adjusting the parameter, the theory, because the data doesn't fit well with the prior?
@PrParadoxy 1 month ago
@@bjorntorlarsson Certain physical quantities are only meaningful in an absolute sense. Imagine someone tells you the temperature of the boiling water in your kettle is in fact 0 kelvin. They reason that they know the microstate of each individual particle, so the amount of uncertainty, or the entropy if you will, that they have about the system is zero. That does not make sense, does it? It has nothing to do with parameterization, really.
@naturalequations 1 month ago
@@bjorntorlarsson Firstly you choose your hypothesis H, which is a proposition like "This physical process can be described by this specific model which has these specific parameters". Then the prior gives a probability distribution for the parameters of that theory given all you already know, both about the physical process itself you wanna describe and the model you're trying to describe it with. Then through the likelihood what happens is that your prior knowledge about the parameters of that model is updated and you get a new probability distribution "a posteriori", that takes into account the new data. If you change the model, you consider a totally new prior associated to the new model. In the case of choosing the relative value of dark matter fraction or dark energy fraction, it's not a change in the model. If you want to be totally agnostic you choose a prior so that all the Omega_i sum to 1 but are then free to vary within the allowed range. If instead you've done multiple experiments already on the LambdaCDM model another choice could be to take as your prior the posterior given by the chain of former experiments. I don't see the issue here
@davidarredondo2106 1 month ago
Excellent video!! I’m almost done with Bernoulli’s Fallacy myself. I do want to add that, for what I’ll call “reasonable” priors, the choice of prior doesn’t matter in the long run, as the data will dominate the posterior through the likelihood. Basically, with Bayesian statistics, we’ll find the truth if we just keep on collecting more data. Again, thanks for this great summary! I teach both a high school stats course and a high school Bayesian data science course, and this is the best short explanation of the difference I’ve seen. Congrats!
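The "data will dominate the posterior" claim above can be sketched numerically: two deliberately opposed Beta priors, updated with the same simulated flips of a coin whose true heads probability is 0.7 (all numbers invented).

```python
import random

# Two very different Beta priors, swamped by the same simulated data
random.seed(1)
true_p = 0.7
n = 2000
k = sum(random.random() < true_p for _ in range(n))  # simulated heads count

priors = {"optimist": (8, 2), "skeptic": (2, 8)}
for name, (a, b) in priors.items():
    # Beta(a, b) prior -> Beta(a + k, b + n - k) posterior
    post_mean = (a + k) / (a + b + n)
    print(name, round(post_mean, 3))  # both land close to 0.7
```

After 2,000 flips the two posterior means differ by only a few thousandths, which is the likelihood doing the dominating.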
@simonpedley9729 1 month ago
But there are many fields of science where you can't collect more data (pretty much the whole of environmental science). So then priors are critical.
@martian8987 1 month ago
@@simonpedley9729 So is this a chicken-or-the-egg situation? (Which really doesn't make sense, because eggs came first: other animals had them, like chickens' ancestors... so the egg always came first!)
@Skyhigh91100 1 month ago
Ok but if you can just keep collecting more data, you can just do a frequentist analysis. If the priors eventually “drop out” of the calculation, all you’re left with is the experimental ratio.
@ZekeRaiden 19 days ago
That would seem to be a concession to the frequentist, though? That is, the specific reason given for why Bayesian approaches are better is that frequentist assumptions either don't make sense (e.g. long-past or one-off events cannot be understood as having a "frequency") or refer to impossible actions (somehow collecting a brand-new, comparably-sized data set and running the "experiment" again, when such data simply doesn't exist). If fixing the problem of a bad prior requires repeatedly collecting data, the Bayesian is now in exactly the same hot water as the frequentist: they both need do-overs that are impossible or nonsensical. In that light, the rationale seems to be in the frequentists' favor by parsimony: the Bayesian is embarked, they _must_ commit to a prior, but the frequentist does not. Instead, the frequentist commits to a particular risk of making a mistake. Now, I'll note that I'm a pretty firm frequentist who does not have a very positive view of Bayesian methods (not least because I find a lot of Bayesian boosters make some strident and excessive claims...), but I think the point still stands. If the Bayesians' problem with frequentist methods is that the latter require imaginary repeats, why don't they also have a problem with the risk of bad priors for things where we're only able to update our beliefs a very small number of times?
@simonpedley9729 19 days ago
@@ZekeRaiden To add to your final comment about bad priors...the Datta, Mukerjee, Ghosh and Sweeting (2000) paper shows that the error due to having the wrong prior, and the error due to not having enough data, are the same order, O(1/n).
@itzsnorlax6057 1 month ago
As someone who lives near that Mostra coffee in San Diego, I recommend it!
@philipoakley5498 1 month ago
Great point at the end about needing to pre-identify the prior _distribution_ (and hence how fast or slowly the data will pull toward 'truthiness').
@paulschmitt6703 1 month ago
Excellent, concise description of the essential differences between the Bayesian and frequentist philosophical perspectives, with examples. The frequentist methodology as used today is all too often a hybrid mess of two distinct approaches: the separate frequentist approaches of Fisher and of Neyman-Pearson have been combined in a manner that neither Fisher nor Neyman would have approved of. Bayesian findings tend to be more intuitive than frequentist results, so much so that frequentist analyses are often interpreted in a Bayesian framework! For example, most consumers of statistical information will interpret a frequentist 95% confidence interval as a Bayesian 95% credible interval, as the latter is much more intuitive to understand!
@douglaszare1215 1 month ago
Confusing a confidence interval with a credible interval is just a common error. We can run an experiment to calculate pi and might find a confidence interval of (3.1,3.2), or (2.9,3.1). I guess some people might say that our belief is that the probability pi is in (3.1,3.2) is 95%, but these people are wrong.
@Toksyuryel 20 days ago
I often look at this debate through the lens of physics models, where you can have one model that is simpler and often "good enough" in most scenarios, and another model that is much more complex and able to accurately describe a larger number of scenarios. Examples being electron orbitals vs the electron cloud, or Newton vs Einstein. Here, I consider the frequentist approach to be the "simpler, good enough" form and the Bayesian approach to be the "complex, more accurate" form.
@Skeleman 1 month ago
To understand the beta distribution:
1. Imagine you are sending rockets to an alien planet to see what portion of the surface is covered in water.
2. You can send probes that hit the ground, are destroyed instantly, but send back whether they landed on water or land.
3. Let's say you send down a probe and it says it hit land.
4. If the entire planet were water, there is a 0% chance the probe would say land. If the planet were 100% land, there is a 100% chance. If you plot the percent of land on the planet on the x axis, and the relative probability that the probe says land on the y axis, you get a line from (0,0) to (1,1). You can imagine the opposite being true if it said water: a line from (0,1) to (1,0).
5. If you send another probe and it says water, you can combine the two plots: multiply the land plot by the water plot, because at each possible percent of land on the planet the probabilities are being combined. You'll end up with a parabola.
6. Keep multiplying in the right plot as each probe says water or land, and slowly you'll get a bell curve whose peak is at the ratio of land and water probes. This is what the beta distribution is.
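The multiply-the-plots procedure above can be run numerically; a minimal sketch with invented probe counts, which reproduces the unnormalized Beta(land + 1, water + 1) shape:

```python
# Start flat, multiply by x for every "land" reading and by (1 - x) for
# every "water" reading. The result is proportional to a
# Beta(land + 1, water + 1) density.
N = 1000
xs = [i / N for i in range(N + 1)]

land, water = 7, 3  # hypothetical probe results
curve = [x**land * (1 - x)**water for x in xs]

# The peak sits at the observed land ratio, land / (land + water)
peak_x = xs[curve.index(max(curve))]
print(peak_x)  # 0.7
```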
@ClementinesmWTF 1 month ago
Except that this interpretation only works for positive integer parameters α, β, whereas the full distribution is defined for any positive real numbers α, β. A complete description of the beta distribution would require somewhat more complicated "probe sampling" than described here; I'm having trouble thinking up any alteration of step 3 of the example that covers it, though.
@BillyViBritannia 1 month ago
Apart from maybe being easier for some people to understand, I don't get what the Bayesian approach adds. For problems with a very small frequency or sample size, your prior is the thing that's going to influence the outcome the most, so you are essentially guessing where the frequentist would say "no idea". Doesn't sound like a huge improvement to me. Edit: I guess if you HAVE to make a decision, saying "it's between 0 and 100" is better than saying "it's 50 but I'm probably wrong".
@very-normal 1 month ago
with something like statistics, the value in being easier to understand can't be overstated
@Skyhigh91100 1 month ago
There is an excellent 3blue1brown video that explains a situation where the Bayesian approach is very impactful: health screening tests. Imagine you have an extremely specific test for a rare disease (let's say the "true" probability of having it is ~1/10,000 people), something that only gives a false positive 0.1% of the time, and a false negative 1% of the time as well. That's a great test, right? We should give it to everyone to screen for this disease! What's the harm, right? With such a low error (from the frequentist perspective), most people who test positive will have the disease and be able to be treated. Well, hold on though, is that assumption true? Imagine giving this test to 1,000,000 people. Since I've defined this disease to have an actual objective rate of 1/10,000, about 100 people in this group actually have the disease, with 99 of them being caught by the test and 1 missed. On the other hand, since the test has a false positive rate of 0.1%, about 1,000 people have been given a false positive result. That means on a test with extremely low type I and II error rates, your chance of actually having the disease if you get a positive on the test is only about 10%! That's incredibly unintuitive from a frequentist perspective, and how one would even go about getting that number and justifying it isn't really clear. Bayesian statistics, however, bake all of these assumptions into the calculation, so they can be interrogated and updated. That 1/10,000 number was something that I just magically knew in this example, but a Bayesian statistician can get a similar prior probability estimate from any number of sources. The really important part of this is that it demonstrates the need for multiple screening techniques, because they represent multiple times that the probability that you are positive for a specific disease is updated.
This is why, for instance, it is no longer recommended that all women get mammograms after a certain age unless there is some other indication that boosts the probability that they have breast cancer: there were too many false positives, and false positives are not free. They cause stress and anxiety for patients, they cost additional healthcare resources, and they dilute the pool of patients who actually need care with patients who only have been told they might need care.
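Plugging the comment's (invented) screening numbers straight into Bayes' theorem reproduces the roughly 9-10% figure:

```python
# Screening example: prevalence 1/10,000, sensitivity 99%,
# false-positive rate 0.1% (all numbers from the comment above).
prevalence = 1 / 10_000      # P(disease)
sensitivity = 0.99           # P(positive | disease)
false_positive = 0.001       # P(positive | no disease)

# Bayes' theorem: P(disease | positive)
p_positive = (sensitivity * prevalence
              + false_positive * (1 - prevalence))
p_disease_given_positive = sensitivity * prevalence / p_positive

print(round(p_disease_given_positive, 3))  # 0.09, i.e. about 9%
```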
@hashmarker4994 29 days ago
> Here we are trying to understand what Mostra cafe's good rating is. We did a hypothesis test with 1 degree of freedom, since we evaluated only one Mostra cafe instance. Realise that we are not comparing Mostra cafe to other cafes,
> but trying to understand the good rating for Mostra cafe in real life based on its rating on Google (which is the sample, not the population), using the frequentist approach.
> We find the Google rating for Mostra cafe is around 0.88 (and we are 95% confident the actual real percentage is between 0.859 and 0.899). But we don't really know the actual probability that the good rating is 0.88; we just know an interval of about two standard errors around it.
@alex_zetsu 1 month ago
In my opinion, these two philosophies can be reconciled by thinking of frequentist statistics as approaching the problem with a specific prior and asking "am I X percent confident in posterior outcome A?"
@Jk-trp 1 month ago
I like the channel, subscribed, keep up the good work.
@wayneford6504 1 month ago
Very clearly and skillfully explained. Maybe one day you will do a series on the Logic of Science material?
@donaldlacombe84 1 month ago
Another advantage of Bayesian statistics is that the joint posterior allows for the calculation of the marginal distributions for the parameters and probability statements can be made regarding these parameters.
@coda-n6u 1 month ago
Thanks for your video! I’m by no means a statistician, but I find Bayesian inference to be interesting and valuable in and of itself when you look at statistical learning. When you need to examine and theorize about the process of learning, viewing probability in terms of a belief updating process is extremely useful. So many people get stuck on the “Bayesian stats is subjective”, but if you’re looking at a machine learning model, the point is that over time it can learn and reduce its error over a training process using belief update rules. Is there a frequentist interpretation of machine learning?
@stephenbrillhart6223 1 month ago
Did anyone else notice that the “prejudiced” prior distribution toward the end is not a valid probability density function?
@very-normal 1 month ago
manim has trouble drawing beta distributions
@CrypticManu 1 month ago
Man I just love your videos, keep it up!
@thinkingchristian 1 month ago
Great video. I always thought the frequentist approach was actually more subjective, because what truly counts as a relevant counterfactual is open to interpretation in many cases (in my view, the Bayesian is more up front about this). Alan Hájek has a lot of great work on this; in fact, Hájek is worth reading in general. What may be interesting to note is that in my field (electrical engineering) I find most of us are Bayesians. It may be because there is an intuitive connection between Bayesian statistics and topics in information theory like entropy and mutual information. Still, there is promise for a hybrid view: as Roderick Little suggests, "inferences under a particular model should be Bayesian, but model assessment can and should involve frequentist ideas". It is also interesting to note that Clayton's book Bernoulli's Fallacy borrows quite a bit from E. T. Jaynes (though I disagree with Clayton on a few points). Jaynes was a great statistician, but he was as hardcore a Bayesian as they come.
@craigparker1410 1 month ago
Do you use Manim to make your visualizations? I love how you work through the concepts and keep the canvas as clean as possible. Keep up the great work 🎉
@very-normal 1 month ago
Yee i am a manim novice
@tuongnguyen9391 1 month ago
@@very-normal Where to learn manim, from your Bayesian prior?
@very-normal 1 month ago
I'm self-taught from reading documentation, but I'm aware of tutorial videos on YouTube
@RAFAELSILVA-by6dy 29 days ago
This video gives the impression that Bayes' theorem is exclusively part of Bayesian probability theory. It's basic set theory and applies whether you are a frequentist or a Bayesian.

Another issue is that the finite number of measurements applies across all of physics. You cannot, for example, measure an instantaneous velocity. You can only measure position on either side of a finite time interval and calculate an average velocity. That does not, however, mean that you cannot use calculus in your physical model and cannot use the concept of an instantaneous velocity. Likewise, although you can only repeat an experiment a finite number of times, you can use mathematics to model an infinite number of experiments. We are free, therefore, to use the mathematics of infinite sequences in probability theory. It's not even necessary to believe that there is an absolute underlying probability: only that you can usefully model a scenario using the mathematical concepts of absolute probabilities and of relative frequency as the limit of an infinite sequence of experiments. That doesn't need to be practically achievable in order to be a valid mathematical model. Otherwise, physics would have to rely solely on the mathematics of finite numbers!

Finally, I don't agree that a Bayesian can believe that A has a 20% probability and not-A a 50% probability. That would be absurd. The priors have to be consistent. In fact, both frequentists and Bayesians are essentially tied to the Kolmogorov axioms.
@philipoakley5498 1 month ago
What is probability? One also needs to compare and contrast that with 'statistics', as either synonyms or distinct terms, to help the discussion. The frequentist 'close enough for practical purposes' get-out also isn't great from an engineering perspective ('when will the bridge fall down?', 'tracking a radar blip', etc.). I feel that the Bayes formula starts as a 'complicated' (tricky to visualise) formula, and that P(A & B) = P(A|B).P(B) = P(B|A).P(A) is an easier starting point that is just as simple as frequentist counting, with the same underlying assumptions (belief: identically distributed, consistent, independent events)...
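Spelled out, the identity in the comment gives Bayes' rule in one step:

```latex
P(A \mid B)\,P(B) \;=\; P(A \cap B) \;=\; P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) \;=\; \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0.
```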
@vagarisaster 1 month ago
One of the best ad segues I've ever heard. 💀
@notimportant2478 1 month ago
The frequentist approach looks like a neutral, pragmatic approach to statistics, while the Bayesian approach is more flexible and adaptive. I'm sure each has its own strengths in different situations. I believe the frequentist approach is good as a first estimate when you know nothing about the data you're studying, while the Bayesian approach lets you get more precise results as your understanding improves. I know what I'm saying isn't mathematically rigorous, but mathematics is often derived from "desired properties", and it very much looks like these approaches were developed for the desired properties they offer. If you have any objections, let me know; I'm very interested in learning more about what you think.
@aakashparida2026
@aakashparida2026 Ай бұрын
New found love for statistics....Thank you so much!!
@Shantanu_Dixit
@Shantanu_Dixit Ай бұрын
You just gained a subscriber; love your content 🌿🌿🌿
@PerishingTar
@PerishingTar Ай бұрын
You got my butt with that ad transition 😅
@very-normal
@very-normal Ай бұрын
gottem
@danielkeliger5514
@danielkeliger5514 Ай бұрын
Yes, I totally agree that for large sample sizes the two methods basically give the same answer. They are also compatible in the sense that Bayesians have their own interpretation of maximum likelihood, and Bayesian methods can be analysed in frequentist language. (In fact, it is more natural to understand the limit theorem mentioned in the video in frequentist terms, in my opinion.) Still, I want to make some remarks.
Firstly, I'm sort of a pluralist. I don't think probability stands for a single concept. Statements like "what is the probability that this previously unknown sonnet was written by Shakespeare" can be interpreted in a Bayesian way much more naturally, while physical problems (see below) make more sense under the frequentist interpretation. Ultimately, there are many things satisfying the Kolmogorov axioms that have nothing to do with randomness. (Say, the ratio of votes in an election.) It is possible to do probability theory without referring to randomness at all.
There are cases when we do actually talk about frequencies in the world. Ergodicity is a good example. Saying things like "if I know the exact initial conditions, I can calculate the exact ratio of times the coin will land on heads" and therefore "probabilities are purely epistemic" kind of misses the point. I'm not interested in this very particular initial condition. I want to show that the behaviour where roughly 1/2 of the coin tosses land on heads is typical for most initial conditions. This 1/2 is a property of the system; it doesn't describe the mental state of an idealised, rational observer. (With the obvious objection that, of course, observations themselves are model-dependent, etc.)
Lastly, all the popular interpretations have their own philosophical problems. I don't know of any interpretation that isn't ultimately flawed under greater scrutiny. This is actually very typical when it comes to philosophical problems. (Think about all the different schools of ethics.)
I think I like the propensity interpretation of probability the most, but that is not perfect either.
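The "roughly 1/2 of the coin tosses land on heads is typical" point can be illustrated with a quick simulation; a sketch, assuming a fair coin and a fixed seed for reproducibility (this shows typical behaviour, it is not a proof):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))
freq = heads / n
print(f"relative frequency of heads after {n} tosses: {freq:.4f}")

# Typical behaviour: the relative frequency settles near 1/2
# for almost every sequence of tosses.
assert abs(freq - 0.5) < 0.01
```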
@f1f1s
@f1f1s Ай бұрын
The idea of a repeated experiment showcases the inherent variability in the parameter estimate, i.e. the sampling distribution. A frequentist assumes that there could be a different data set borne by the same invisible data-generating process (law). Bayesians tend to jump onto data matrices as if those n=200 observations were the one and only realisation possible, without other hypothetical scenarios occurring, as if there were no Heisenberg principle or quantum uncertainty. The frequentist approach better reflects the randomness of Nature and the unobservability of hypothetical outcomes: 'it could have been otherwise'. Finally, Bayesians often make ridiculous distributional claims: 'assuming the prior normal distribution, the posterior distribution of the linear regression slope estimator is precisely Student with n=198 degrees of freedom', whilst frequentists are much more careful about heteroskedasticity, calibration, coverage probability, and the Bartlett correction, which are essential to control the false discovery rate: 'there is some unknown law, but we can compute some functionals thereof regardless of the joint and marginal distributions, as long as enough finite moments exist for the WLLN and CLT to work'.
@ucchi9829
@ucchi9829 Ай бұрын
Finally, a Frequentist defense.
@Tom-qz8xw
@Tom-qz8xw Ай бұрын
There’s a lot of Bayesian bullshit that uses parametrised distributions in the posterior
@Velereonics
@Velereonics 20 күн бұрын
I have hated the law of large numbers ever since I had the misfortune of learning about it in high school
@huhuboss8274
@huhuboss8274 Ай бұрын
Why would you risk having a wrong prior when you can simply use the frequentist approach? Genuine question
@very-normal
@very-normal Ай бұрын
I think that for any simple problem like this, enough data will make up for any “wrong” prior. But I’m not sure what it means to have a wrong prior in the first place
@floatingblaze8405
@floatingblaze8405 Ай бұрын
(Talking as a complete amateur here) As I understand it, having a "wrong prior" is no less fatal than making certain assumptions that aren't applicable to your circumstances in a frequentist context (given that I strongly believe there's no such thing as "simply" using a frequentist approach... or "simply" using statistics in general😅). Both will lead to the same outcome of misinterpreted or straight up wrong numbers, so I believe this is more of a "pick your poison" situation.
@huhuboss8274
@huhuboss8274 Ай бұрын
@@very-normal If you believe to have prior knowledge that does not match reality, I would call that a wrong or bad prior, resulting in bad results.
@very-normal
@very-normal Ай бұрын
If that’s the case, then I’ll need to rely on someone to call me out on a bad prior. If results are going to be published, they have to be vetted. Why risk someone misusing frequentist statistics when we can force them to express their beliefs via the prior
@WeirdPatagonia
@WeirdPatagonia Ай бұрын
Most of the time if you have no good guess you actually use non informative priors, which is equivalent to what you suggest. But, even with that, some people would argue that the interpretation of the findings is different, and they would be right
@tunneloflight
@tunneloflight Ай бұрын
Btw - my arguments with statistics do not mean it is useless. Rather, it is frequently and all too easily abused (intentionally or unintentionally). In every instance, the use of statistics in real-world analyses must be critically analyzed and scrutinized, starting with the assumptions, presumptions, desires, and relative ignorance of those involved. Even when all of that is fair, statistics often strays wildly from truth or reality, and researchers often fail to apply even the most basic critical analyses to the results. Is the population a single population? Is the population linearly, triangularly, normally, Poisson, or otherwise distributed? Are there hidden variables? Is the data the result of stochastic events acting on stochastic events? Do the results violate sanity checks? Do the results suggest conclusions outside the bounds of the analysis? Is the thesis or hypothesis that motivated the data gathering biased in its own right? Etc...
@antigonid
@antigonid Ай бұрын
That Brilliant joke was, well, brilliant
@markuspfeifer8473
@markuspfeifer8473 Ай бұрын
Bayesianism is just superior. It allows for straightforward statistical connectives and gives us distributions rather than rigid numbers. It’s just a lot richer and might also lend itself more readily to generalizations of statistics once we understand them better (eg negative probabilities and so on)
@xenoduck3189
@xenoduck3189 Ай бұрын
Bayesian probability still has the same definition as frequentist probability!!! What you are showing is not a "definition" of probability, it is just Bayes' rule, which says NOTHING of P(A), only of P(A|B). The law of large numbers gives the definition of probability, regardless of what field of maths you study. I feel like this was really misrepresented in the video.
@foresthobo1166
@foresthobo1166 Ай бұрын
Your comment only shows one thing: you don't understand the Bayesian viewpoint. Back when I was doing my PhD, I had a designated helper from the statistics department to coach me on methods. One day we started talking about Bayesian thinking (he was a frequentist). After trying to do some math with me being confused, he stated (again, as a frequentist): if I roll a die, covering it with my hand, the probability of it being a specific number from one to six is 1/6. A frequentist says it IS one of the six (it's a physical entity) with said probability; a Bayesian will say we believe it to be something, but it isn't settled for us until we discover more (remove the hand). This distinction makes very little sense for his example, but a huge difference for more advanced statistics (and all the natural sciences that depend on it).
As a side note, there is more than one kind of math, and they don't always agree. Look it up.
@xenoduck3189
@xenoduck3189 Ай бұрын
@@foresthobo1166 Whether or not you calculate the probability as a frequentist would, what you are trying to estimate through Bayesian thinking is how likely a specific outcome is to happen. If given the means, you can verify your result using the law of large numbers by running a bunch of experiments. That is probability, regardless of what method of dealing with it you subscribe to. I feel like this is not particularly debatable or hard to understand.
@santiagobustamante6192
@santiagobustamante6192 Ай бұрын
@xenoduck3189 I'm a probability and statistics professor for physicists, and a little bit of a quantum information scientist. Sure enough, the probability of physical events should not depend on your interpretation of probability. That is, both a Bayesian and a frequentist must agree that the probability of obtaining a particular outcome in a fair dice roll is 1/6. However, the frequentist states this due to previous experience with fair dice, whilst the Bayesian does so due to a complete lack of information about the result of the roll. Once the Bayesian sees the result (obtains information about the system), they update their probability distribution to one where the observed result of the dice roll now has unit probability (a distribution with zero entropy, i.e. a state of complete knowledge).
An instance which may help illustrate the difference in interpretations is the following: for a frequentist, the question "what is the probability that God exists?" does not make any sense and cannot be answered, since there is no way of performing trials for the existence of God, something that underlies the frequentist definition of probability. On the other hand, a Bayesian may say there is a 50% chance that God exists, since the answer is binary (God either exists or not) and, in this case, a 50-50 probability distribution is the one which best describes a state of complete lack of knowledge (i.e. maximum entropy).
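The update described in this reply (a uniform prior collapsing to a zero-entropy point mass once the roll is seen) can be sketched directly; the observed face below is an arbitrary example:

```python
import math

# Prior: complete ignorance about the covered roll -> uniform over faces.
prior = {face: 1 / 6 for face in range(1, 7)}

def entropy(dist):
    """Shannon entropy in bits (0 * log 0 treated as 0)."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

def update(dist, observed_face):
    """Condition on an observation whose likelihood is 1 for the seen face."""
    post = {f: (p if f == observed_face else 0.0) for f, p in dist.items()}
    total = sum(post.values())
    return {f: p / total for f, p in post.items()}

posterior = update(prior, observed_face=4)  # the '4' is an arbitrary example

print(f"prior entropy:     {entropy(prior):.3f} bits")      # log2(6), ~2.585
print(f"posterior entropy: {entropy(posterior):.3f} bits")  # 0.000
```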
@plainguy3567
@plainguy3567 Ай бұрын
This is incorrect. It makes sense, but it isn't correct when you start thinking about events that eventually happen but occur only once. For example, a massive star WILL go supernova: 100% this will occur. But it WON'T go supernova today: essentially 100% it will not occur today. The Law of Large Numbers struggles with events like this. The law tells you it WON'T happen (though we know it will), because each day, taken as an experiment, suggests it won't; but it also somehow tells you it WILL happen, because many stars meet this fate. So how do you deduce the odds of it going supernova today? If you say it's zero, as the LLN suggests, and it happens, then you were wrong; but if you say it's one, as the LLN also suggests, and it doesn't happen, you are also wrong. Sequences of outcomes that are not IID do not work with the LLN. It's just incorrect for more complex systems.
@vez3834
@vez3834 Ай бұрын
I don't know why you need to say that Bayes' rule isn't the definition of probability. When he brings it up in the video, he is pretty clear that he is talking about P(A|B). He is talking about the philosophical difference between the two viewpoints. Bayesian thinking gives us a different way of looking at our error and at our assumptions.
@angrymeowngi
@angrymeowngi Ай бұрын
I take offense at that remark against statisticians on their incapacity for violence. I'd have you know, Sir, that statisticians are just as likely to commit violent crimes but have less probability of being caught because they know how not to become a statistic.
@AdamKuczynski322
@AdamKuczynski322 Ай бұрын
Surely more people have visited the cafe than have left a review? So repeating the experiment and collecting another ~1k reviews from those who simply hadn't written theirs down isn't all that far fetched. Impractical, yes, but entirely possible. The idea of 'repetition' breaks down much more when we think of data we can't really sensibly resample/remeasure (like a country's annual GDP or the employment rate).
@scepticalchymist
@scepticalchymist Ай бұрын
The fact that more people have visited than left a review could be already a bias for the statistics because maybe for some weird reason people writing reviews are already prejudiced in their judgement all in the same way. I guess, this could be modeled in a Bayesian approach but a frequentist just has the plain numbers and cannot take anything else into account.
@dataandcolours
@dataandcolours Ай бұрын
To label individuals "frequentists" or "Bayesians" is probably one of the most misleading things one can do when actually trying to explain this in a helpful way.
@seriousbusiness2293
@seriousbusiness2293 Ай бұрын
It's very arguable whether the idea of probability itself is a fundamentally real thing. As a mini example, the digits of pi behave randomly by every single metric we know, yet they are deterministic and nothing random is happening. The ultimate goal of probability is modeling unknown outcomes, and that can be done in many ways. So there is no one true option; all we care about is how accurately we can predict things and how interpretable it is to us. (PS: in my eyes, Bayesian feels more true to real life and my thinking.)
@weetabixharry
@weetabixharry Ай бұрын
I'm not sure what you mean by "real" here. Casinos make real profits. A digit of pi, selected at random, (it is believed, but not proven) has an equal probability of being any number. Meanwhile, a digit of 50/99 (in base 10), selected at random, will be either 0 or 5 with equal probability. These things seem real to me.
@seriousbusiness2293
@seriousbusiness2293 Ай бұрын
@@weetabixharry I meant the sequence is random-looking but deterministic; if you pick random digits, you introduce other randomness. My thinking is that multiple things appear to us as random, but if we knew the underlying dynamics we could often agree that probability theory is the wrong approach. Let's imagine an event I can only measure a single time, like "Alex immediately says yes if I ask him on a date today." The idea of doing repeated trials is not real unless I have access to parallel universes, and taking other variables into account to refine my guess, like comparing with other people I asked, gives confidence but doesn't fundamentally reflect Alex's choice. Even if we measured every atom interaction in Alex's brain, we get into discussions of quantum and chaos theories. So even if our best models say the probability was 50%, we can't tangibly experience or measure that 50%, since we only see one outcome.
@weetabixharry
@weetabixharry Ай бұрын
@@seriousbusiness2293 I think I see roughly what you're saying... and it's uncomfortable to think about. I only feel relatively comfortable in the simple cases where the tests are repeatable and the "parallel universes" all behave the same. For example, I need 1000 dice all rolled in parallel to have the same statistical behavior as 1 die rolled 1000 times. And my dice have to have a *known* probability distribution (preferably, perfectly uniform) or I'm gonna panic.
@seriousbusiness2293
@seriousbusiness2293 Ай бұрын
@@weetabixharry haha 😂 I feel ya. In any case, I'm sure probability theory is an extremely good tool for reasoning and decision making, and often close to some truth. But as soon as we get philosophical about the fundamentals, there is room for doubt. I think it's comparable to going from Newton's theories to relativity: having a fixed frame of reference makes the math easy and works most of the time, but if you care about fundamentals and edge cases you need a relative model of physics. Thinking about dice and cards is more a clean setup, like a Newtonian model that assumes each object has some absolute probability, making for an actually very good model. But converting any probability number into a tangible real-world concept may not always work and may need a more nuanced idea of what that number means; in relativity we found that two observers can disagree on a space or time measurement, but that gets fixed if you talk about the new concept of spacetime.
@tom-kz9pb
@tom-kz9pb Ай бұрын
The Bayesian camp drives artificial intelligence. It is a viable approach by the grace of Big Data. It is a double-edged sword. It can sometimes ferret out subtle patterns that humans would miss, but there is risk of conflating correlation and causation. The frequentist approach works best if you have a theoretically perfect coin with an exact 50-50 chance of heads or tails. The Bayesian approach works best if you CANNOT be sure in advance whether a coin is loaded or honest, but want to make the best estimate as to the outcome of the next throw, regardless of the uncertain coin status.
@Skeleman
@Skeleman Ай бұрын
If anyone is interested in if there is an objective way to pick a prior probability distribution, you do it with something called "maximum entropy". And the entropy they refer to is the same one the physicists talk about.
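For a finite set of outcomes with no further constraints, the maximum-entropy distribution is the uniform one; a quick numerical check (the alternative distributions are arbitrary examples of mine, chosen only for comparison):

```python
import math

def entropy(probs):
    """Shannon entropy in nats (0 * log 0 treated as 0)."""
    return sum(-p * math.log(p) for p in probs if p > 0)

uniform = [1 / 6] * 6
# Arbitrary alternative distributions over the same six outcomes:
alternatives = [
    [0.5, 0.1, 0.1, 0.1, 0.1, 0.1],
    [0.3, 0.3, 0.1, 0.1, 0.1, 0.1],
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

h_uniform = entropy(uniform)
assert all(entropy(q) < h_uniform for q in alternatives)
print(f"uniform entropy = {h_uniform:.3f} nats (= ln 6 = {math.log(6):.3f})")
```

With constraints (e.g. a fixed mean), the maxent distribution is no longer uniform, which is where the method gets its real use.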
@danielkeliger5514
@danielkeliger5514 Ай бұрын
I disagree. In the case of the p parameter of a Bernoulli, the maximum-entropy prior would be the uniform distribution. That, however, depends on the coordinate system you choose, as opposed to other methods like Jeffreys' prior. Maximum entropy arguments in general rely on some assumption of a uniform base distribution, even in physics. (Think about the whole combinatoric derivation with Stirling's formula.) Ultimately, all models depend on assumptions, be they frequentist or Bayesian. There is no such thing as "purely letting the data speak for itself".
@Skeleman
@Skeleman Ай бұрын
@@danielkeliger5514 I agree that there is never a way to "let the data speak for itself". I think I misused the term "objective". There are reasons to use the maxent distribution to ensure you aren't adding any "hidden" assumptions to your analysis.
@danielkeliger5514
@danielkeliger5514 Ай бұрын
@@Skeleman I totally agree that uninformative priors are great tools for mitigating subjectivity. I just don't believe in logical positivism :)
@GenericInternetter
@GenericInternetter Ай бұрын
Not a statistician, but I do have a take on this...
The Bayesian method relies on priors, which hamstrings the whole practical purpose of analysis. Instead of debating results, people instead debate priors. It just shifts the whole thing from one frying pan to the other.
The simplistic frequentist approach you described is utterly naive: it misses the whole concept of a random walk. In practice, the most reliable approach to probability is the non-naive version with a large dataset, or a large set of datasets. Random walk behaviour is critical to understand for the frequentist approach to make any sense.
For example, imagine flipping a balanced coin 4 times (a small example, easier to explain). The naive approach would assume the outcome tends towards 50% heads, but the full picture is a distribution. The probabilities are:
0% heads -> 1/16
25% heads -> 4/16
50% heads -> 6/16
75% heads -> 4/16
100% heads -> 1/16
It's a bell curve centered at 50%. Your chance of getting exactly the expected 50% result is only 6/16, but your chance of getting either 25% or 75% is 8/16... which means the naive point prediction is more likely to be off than exactly right! Random walks (results steering away from the expectation) are a huge topic in themselves and definitely need to be accounted for to rely on the frequentist method.
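The table in this comment is exactly the Binomial(4, 1/2) pmf; a short check of the listed fractions (note that Fraction reduces 6/16 to 3/8 when printing, but the values are the same):

```python
from fractions import Fraction
from math import comb

n = 4  # four flips of a fair coin
pmf = {k: Fraction(comb(n, k), 2**n) for k in range(n + 1)}

for k, p in pmf.items():
    print(f"{k} heads ({k / n:.0%}): {p}")

assert pmf[2] == Fraction(6, 16)           # exactly 50% heads
assert pmf[1] + pmf[3] == Fraction(8, 16)  # 25% or 75% heads
assert sum(pmf.values()) == 1
```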
@very-normal
@very-normal Ай бұрын
how would accounting for it help us understand frequentist methods any better
@punditgi
@punditgi Ай бұрын
Very educational video! 🎉😊
@tahsinahmed7585
@tahsinahmed7585 Ай бұрын
How would you explain the choice of likelihood distribution?
@very-normal
@very-normal Ай бұрын
I viewed the data as binary, so it needed to come from a discrete distribution. The binomial is the most commonly used family for this, but nothing would stop me from other discrete distributions that also fit binary data. I would lose conjugacy, but there are tools for doing Bayesian things in that case
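The conjugate beta-binomial update mentioned in this reply can be sketched in a few lines; the review counts below are hypothetical numbers of mine, not the video's actual data:

```python
# Beta-binomial conjugate update: with a Beta(a, b) prior on the success
# probability and k successes in n binary trials, the posterior is
# Beta(a + k, b + n - k). The review counts here are hypothetical.
def beta_binomial_update(a, b, k, n):
    return a + k, b + (n - k)

a0, b0 = 1, 1     # flat Beta(1, 1) prior
k, n = 850, 1000  # hypothetical: 850 of 1000 reviews were 4-5 stars

a_post, b_post = beta_binomial_update(a0, b0, k, n)
posterior_mean = a_post / (a_post + b_post)

print(f"posterior: Beta({a_post}, {b_post}), mean ~ {posterior_mean:.4f}")
assert (a_post, b_post) == (851, 151)
```

With a non-conjugate likelihood the closed-form update above disappears, and the posterior has to be computed numerically instead.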
@__-de6he
@__-de6he Ай бұрын
I guess probability is derived from a geometric property of our microscopic space (like general relativity is derived from spacetime geometry). So the frequentist approach is more relevant.
@tuongnguyen9391
@tuongnguyen9391 Ай бұрын
When I use a machine learning algorithm to predict stuff, is it the Bayesian way or the frequentist way? Or something between both? Or does it really depend on the data distribution, or on the specific machine learning algorithm?
@very-normal
@very-normal Ай бұрын
I think it depends on the model. For prediction, I don't think the distinction matters all that much. I don't work a lot with prediction, but this has been my experience. For inference, though, it changes how you do statistics and interpret results.
@VincentKun
@VincentKun Ай бұрын
During this video I just started to hate the frequentist approach; they simplify everything as if it's all independent. Bayesians give a guess and can iteratively get to the right probability through Bayesian updates, taking into account all the complex stuff the world offers, while with the frequentist approach you need a lot of trials.
@AkshayKumar-vd5wn
@AkshayKumar-vd5wn Ай бұрын
A lot of trials leads to an average of outcomes, which can be analyzed more easily than a one-time focused analysis.
@therealjezzyc6209
@therealjezzyc6209 Ай бұрын
Does constant Bayesian updating not also require a lot of experimentation and trials? Not defending frequentism, but your reasoning doesn't make sense.
@AkshayKumar-vd5wn
@AkshayKumar-vd5wn Ай бұрын
@@therealjezzyc6209 That's alright. I use whatever; I never thought there was a beef. But in the real world, averages are fine. You cannot expect to inspect every little event, datum, or record one by one in detail; hence generalization beats specialization.
@therealjezzyc6209
@therealjezzyc6209 Ай бұрын
@@AkshayKumar-vd5wn averages aren't exactly fine in the real world though because not all distributions have finite expectation and variance. Depends on your domain. For example, the ratio of two normal variables is cauchy, whose expected value diverges. This means that if you build a model which ends up requiring a ratio of two samples then you might not have any convergence in your sample means at all. You will need to use extreme value measurements rather than expected values, and estimate the median instead. This actually happens a lot in finance and other complicated modeling because you are working with heavy tailed distributions, so outliers actually occur quite frequently, enough to throw off your samples. Although this is just me being pedantic, I'm sure you get the point and a lot of things end up being normally distributed (but a lot of things also don't too). Typically averages are only good up until the central limit theorem holds, and you can not know whether your distribution has finite variance or expectation before performing your trials in the frequentists perspective. Which means you might not converge to your desired probabilities ever and be wasting your time. idk what you meant in your last paragraph about inspecting everything at once though.
@VincentKun
@VincentKun Ай бұрын
@@therealjezzyc6209 Yeah, in some sense they're two faces of the same coin, so a lot of things are in common. In machine learning we love Bayesian updates, and I might be biased by my field of study, but I feel that's the right approach to problems.
@Barteks2x
@Barteks2x Ай бұрын
To me it seems like the frequentist approach starts with "the experiment is all we know", and therefore you can calculate the probability directly from the definition, while the Bayesian starts with some belief about what we expect, and we try to use not just the experimental data but also other knowledge we may have. Wouldn't then the Bayesian approach with an uninformative prior always reproduce a (correctly done) frequentist approach? The frequentist approach is based on the implicit assumption that every possibility is equally likely; with the Bayesian approach you don't necessarily have that assumption, though you may provide it explicitly.
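For a simple proportion, the flat-prior Bayesian estimate does track the frequentist one closely; a sketch using Laplace's rule of succession, with made-up success counts:

```python
# With a flat Beta(1, 1) prior, the posterior mean after k successes in
# n trials is (k + 1) / (n + 2) (Laplace's rule of succession), which
# approaches the frequentist estimate k / n as n grows. Made-up numbers:
for n in (10, 100, 10_000):
    k = round(0.7 * n)  # pretend 70% of trials succeeded
    mle = k / n
    bayes = (k + 1) / (n + 2)
    print(f"n={n:>6}: frequentist {mle:.4f}  vs  flat-prior Bayes {bayes:.4f}")
```

The point estimates converge, but as other commenters note, the interpretation of the resulting intervals still differs between the two schools.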
@very-normal
@very-normal Ай бұрын
what do you mean by every possibility
@hannahnelson4569
@hannahnelson4569 Ай бұрын
This may be a dumb response. I think 'every possibility' means the support of the random parameter/variable in question.
@koonsickgreen6272
@koonsickgreen6272 Ай бұрын
I got lost. In the coffee shop exercise, "the probability it receives a 4 or 5 star review": received from whom? Does it mean it 'has' received them from past customers, or is it ideally about the coffee shop's track record, at the end of time, across all its reviewers?
@cutestbear3327
@cutestbear3327 Ай бұрын
I know I am probably focusing on the wrong thing, but shouldn't the cafe example be using a one-tailed test? 😖
@very-normal
@very-normal Ай бұрын
it could be, but it wouldn’t really change the results of the test
@cutestbear3327
@cutestbear3327 Ай бұрын
@@very-normal it wouldn't indeed. thanks for the wonderful and interesting video 🙏
@brashmane2749
@brashmane2749 Ай бұрын
There is a mechanics analogue to this: do you use classical mechanics or include relativistic effects? It depends. If classical is good enough, you use that, because relativity reduces to classical mechanics for simple and slow systems. Frequentist or Bayesian? Same reasoning. If your problem is described well enough (or perfectly) by frequentist approaches, you use those; otherwise, Bayesian. Why would you intentionally shoot yourself in the foot just to do it the more complicated way?
@-NguyenDuyTanA-mh1db
@-NguyenDuyTanA-mh1db 9 күн бұрын
What program did you use to do your research?
@very-normal
@very-normal 9 күн бұрын
I’m not quite sure what you mean, but I do use Obsidian to collect and organize all my research in general
@tobiaseriksson7216
@tobiaseriksson7216 Ай бұрын
How come we assume everything is Gaussian, or treat things as if they were? A lot of statistical tests rely on it, but it seems like the conditions for the tests to be valid are often not respected.
@very-normal
@very-normal Ай бұрын
central limit theorem
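A minimal illustration of that answer: averages of many die rolls look roughly Gaussian even though a single roll is nothing like Gaussian (seeded simulation; the sample sizes are illustrative choices):

```python
import random
import statistics

random.seed(0)

# Average of 50 fair-die rolls, repeated 5000 times. By the CLT the
# sample means cluster around 3.5 roughly like a Gaussian, even though
# a single die roll is uniform, not Gaussian.
means = [statistics.mean(random.randint(1, 6) for _ in range(50))
         for _ in range(5_000)]

grand_mean = statistics.mean(means)
spread = statistics.stdev(means)
theory_sd = (35 / 12) ** 0.5 / 50 ** 0.5  # sd of one roll / sqrt(50)

print(f"mean of sample means ~ {grand_mean:.3f} (theory: 3.5)")
print(f"sd of sample means   ~ {spread:.3f} (theory: {theory_sd:.3f})")
```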
@therealjezzyc6209
@therealjezzyc6209 Ай бұрын
​@@very-normal The CLT doesn't always hold, though, especially if you're working with a distribution whose higher moments diverge. In finance and physics this can happen quite often.
@Impatient_Ape
@Impatient_Ape Ай бұрын
@@therealjezzyc6209 Lévy distributions, for instance.
@therealjezzyc6209
@therealjezzyc6209 Ай бұрын
@@Impatient_Ape My go-to example is the Cauchy distribution, because it looks normal but its expected value is undefined (the defining integral diverges). It is also the ratio of two centered normal random variables, so it's actually easy to unknowingly make a model Cauchy if you start looking at ratios.
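The heavy tails behind this example are easy to quantify from the standard Cauchy CDF, F(x) = 1/2 + arctan(x)/pi; a small deterministic comparison against the normal tail:

```python
import math

# Tail probability P(|X| > k) for a standard Cauchy, whose CDF is
# F(x) = 1/2 + arctan(x)/pi, versus a standard normal. These heavy
# tails are exactly why the Cauchy mean fails to exist.
def cauchy_tail(k):
    return 2 * (0.5 - math.atan(k) / math.pi)

def normal_tail(k):
    return math.erfc(k / math.sqrt(2))  # P(|Z| > k) for Z ~ N(0, 1)

for k in (2, 5, 10):
    print(f"P(|X| > {k:>2}): Cauchy {cauchy_tail(k):.5f}   "
          f"Normal {normal_tail(k):.2e}")
```

Even ten scale units out, a standard Cauchy still puts several percent of its mass in the tails, while the normal tail is astronomically small.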
@ucchi9829
@ucchi9829 Ай бұрын
Have you heard of non-parametric statistics?
@qcard76
@qcard76 27 күн бұрын
“…For some reason.” 0:37 At least you’re honest about your bias right off the bat.
@joe_hoeller_chicago
@joe_hoeller_chicago 16 күн бұрын
Causality and geometric inference for the win, with sometimes some Bayes. Frequency is only good for seeing which categories of things are trending in time, nothing else. Correlation doesn't translate well to real-world use cases outside of that.
@antoinesoonekindt9753
@antoinesoonekindt9753 Ай бұрын
Interesting video. I'm a little bit surprised, though. I'm fairly confident (let's say 0.80) that the uninformative prior for the binomial distribution is a beta distribution with parameters alpha = beta = 1/2. I'm using Jeffreys priors. If there's something I'm missing, I'd like to know.
@very-normal
@very-normal Ай бұрын
It doesn’t matter much in this context because there’s so much data that it dominates the posterior. From my perspective, the prior parameters can represent “past” successes and failures, and Beta(1,1) just says we saw only one of both. Having 0.5 of a success doesn’t make as much sense, but it still works in the end. In a paper, we might justify our priors slightly differently
@antoinesoonekindt9753
@antoinesoonekindt9753 Ай бұрын
​@@very-normal, I concur that the alpha and beta parameters are directly linked to the numbers of successes and failures. Jeffreys priors are proportional to the square root of the determinant of Fisher's information matrix, so they cannot be as readily interpreted. If other methods for constructing uninformative priors exist, I'm interested. Thanks, and thanks for the video!
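The Bernoulli case of the Jeffreys construction can be checked numerically: the Fisher information is I(p) = 1/(p(1-p)), and sqrt(I(p)) is the Beta(1/2, 1/2) kernel, up to the normalising constant B(1/2, 1/2) = pi:

```python
import math

# For a Bernoulli(p) trial the Fisher information is I(p) = 1 / (p(1-p)),
# so the Jeffreys prior is proportional to sqrt(I(p)) = p^(-1/2) (1-p)^(-1/2):
# the kernel of Beta(1/2, 1/2), whose normalising constant B(1/2, 1/2) = pi.
def jeffreys_kernel(p):
    return math.sqrt(1 / (p * (1 - p)))

def beta_half_half_pdf(p):
    return p ** -0.5 * (1 - p) ** -0.5 / math.pi

# The ratio should be the constant pi at every p in (0, 1):
ratios = [jeffreys_kernel(p) / beta_half_half_pdf(p)
          for p in (0.1, 0.25, 0.5, 0.9)]
print([round(r, 6) for r in ratios])

assert all(abs(r - math.pi) < 1e-9 for r in ratios)
```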
@Velereonics
@Velereonics 20 күн бұрын
When I use reviews, I just go through the two-, three-, and four-star reviews until I find a few that are worded and written the way I write, speak, and think. Basically I'm looking for someone who has the same personality as me, and trying to judge that through the way they leave comments, which I think is actually a pretty robust method given the way I speak and write. Then I make a choice based on those few reviews alone, because I don't really care what somebody thinks about something if we have literally nothing in common; what determines whether something is good or bad for that person is not going to resemble what determines whether something's good or bad for me.
@very-normal
@very-normal 20 күн бұрын
what would you do if none of the reviews talk like you
@manosprotonotarios5187
@manosprotonotarios5187 Ай бұрын
Your hypothesis should be one-sided in classical statistics: p >= .85
@LNVACVAC
@LNVACVAC Ай бұрын
I am not a mathematician, but I have some knowledge of biostatistics. The frequentist approach falls short in regard to rare diseases because:
1 - The definition of a rare disease is both arbitrary and normative (fewer than 1 in 2,000).
2 - Most medical practitioners, including doctors, are not sufficiently informed about rare diseases.
3 - Differential diagnosis is typically symptom-informed and does not revolve around investigating the prevalence of specific causes. (Example: when you go to the doctor with a sore throat, the doctor doesn't order a swab before looking for bacterial plaque. However, only a subset of individuals with a bacterial infection and a sore throat will have bacterial plaque when seeing a doctor.)
4 - All of these create not only under-reporting, typically above 40%, but also control and patient samples that will never approach infinity.
5 - Complex adaptive natural entities often behave like loaded dice, not only in prevalence but in appearance/presentation, aggravating the item 3 problem.
--
Mathematicians need to understand that these are not transcendental matters and that tools are instrumental, not necessary. This battle is not a first-principles contention.
@thomasjalabert658
@thomasjalabert658 24 күн бұрын
I would love to see another example with fewer data points, where the results are much more different
@johnrichardson7629
@johnrichardson7629 Ай бұрын
The problem with a lot of the current faddish enthusiasm for Bayesian analysis is that some people are pretending to have very specific, numerical priors that are OBVIOUSLY just pulled out of thin air, at which point it is unclear what point there is in hearing out the rest of their alleged "analysis".
@very-normal
@very-normal Ай бұрын
i was not aware bayesian analysis was a fad lol
@johnrichardson7629
@johnrichardson7629 Ай бұрын
@very-normal It's no doubt not a fad amongst actual statisticians but it seems to have become a mostly rhetorical gimmick in other fields, including debates over the historicity of religious figures, of all ridiculous things.
@bjorntorlarsson
@bjorntorlarsson Ай бұрын
I never understood what's going on with this "choice of significance level" business. In the social sciences it's often 5%; in particle physics it's 0.000-something. Doesn't it imply that there is a third choice? To take an example from our unfortunate times: a soldier has to either move now or stay put. Wouldn't a decision at a 49% significance level be better than the alternative?
@bjorntorlarsson
@bjorntorlarsson Ай бұрын
Astrophysics is by the way the only instance where I've seen clever people draw conclusions based on data which in their diagrams have an error bar that is taller than the Y-axis. So there's physics, and physics. There's stuff in space that we don't know much about, for obvious reasons.
@calloftrading
@calloftrading Ай бұрын
It does imply that there is a spectrum of choices. The significance level is just a measure of how rigorous you should be. When looking at a particular situation, you are probably not seeing every variable inside that system (the system is huge and complex), leading to bigger errors and higher variation in the observed impact of each independent variable on the dependent variable.
@calloftrading
@calloftrading Ай бұрын
So in more controlled environments, and when trying to prove theorems and essentially turn them into laws or verified characteristics, you need to be more certain. Therefore there is a higher level of strictness (confidence level). In financial forecasting it is normal to have higher randomness associated with a bigger and more complex system, especially at smaller time frames, which leads to accepting lower levels of confidence in forecasts
@bjorntorlarsson
@bjorntorlarsson Ай бұрын
​@@calloftrading It would be nice if there were a way to quantify which confidence level to use. Taking it from the other direction, and simply accepting an outcome together with its confidence level, whatever it is, isn't popular. It's looked down upon. But if one has to make a choice, as things are in reality, then the confidence level seems to me to be as relevant a parameter as the expected value and the spread measure. I don't quite get why the confidence level should be picked first, and only then the rest of the parameters evaluated, given a binary within-or-not of such an arbitrary significance. Isn't all of this olden Gaussian machinery obsolete now, by the way, given that machine learning fits patterns on big data without considering things that were once invented only because they made data analysis simple and practical given the limits of ancient tools?
@calloftrading
@calloftrading Ай бұрын
@@bjorntorlarsson You can always resort to the p-value, which indicates the point at which the significance level would be invalidated
@haukur1
@haukur1 21 күн бұрын
The frequentist view prohibits all notions of epistemology, so it fundamentally has no meaningful way to talk about evidence or partial knowledge. It's the reason why meta-reviews are phrased so awkwardly compared to something like civil court cases ("judged by the weight of the evidence").
@Tom-qz8xw
@Tom-qz8xw Ай бұрын
The problem with Bayesianism is the assumption that the data will conform to these parametric distributions, in the real world this is never the case.
@very-normal
@very-normal Ай бұрын
i think that’s a general problem for statistical models
@froao
@froao Ай бұрын
I didn't understand why, in the comparison, when he mentioned the bootstrap he didn't mention or do a one-sided frequentist test
@very-normal
@very-normal Ай бұрын
what would doing a one-sided test have changed
@simonpedley9729
@simonpedley9729 Ай бұрын
A lot of it is hammers vs wrenches. There are plenty of cases where subjective Bayesian isn't appropriate at all. If a drug company did a clinical trial, and proved that their drugs works, based on an analysis that involved their own subjective prior which assumed that the drug works, would you believe them? If someone is trying to prove that climate change affects x, and they use their own prior which assumes that climate change affects x, would you believe them? These examples illustrate that objectivity is sometimes really important (where objectivity means: reducing arbitrary decisions as much as possible...clearly nothing can be completely objective). On the other hand, there are plenty of situations where you should be including subjective prior information. There is also a whole field of statistics which is frequentist Bayesian methods, which to some extent takes the best of both worlds. It uses Bayesian methods, but has the objectivity of frequentism. The real problem in statistics is over-use of maxlik, which is neither frequentist nor Bayesian.
@very-normal
@very-normal Ай бұрын
That’s fair, but to clear something up: priors in clinical trials are often done with past studies in mind and with input from field experts, they’re not often made purely from the beliefs and feelings of a sole statistician
@simonpedley9729
@simonpedley9729 Ай бұрын
@@very-normal yes…there’s a big philosophical distinction between subjective priors and priors from previous studies
@chonchjohnch
@chonchjohnch Ай бұрын
I thought probability distributions were Green's functions
@QuandaleDingle-bq1on
@QuandaleDingle-bq1on Ай бұрын
Bayesian propaganda 😂
@very-normal
@very-normal Ай бұрын
propaganda for great posteriors
@12nites
@12nites Ай бұрын
You did a two-sided test when a one-sided test was needed. The actual p-value was half of what you got. Your H0 should have been pi ≤ 0.85.
@very-normal
@very-normal Ай бұрын
lol even if the pvalue would have been halved, it doesn’t change the conclusion
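For anyone who wants to check this exchange numerically, here is a sketch using an exact binomial test; the counts below (945 positive reviews out of 1074, against a null of 0.85) are assumed values reconstructed to roughly match the video's example, not the actual data:

```python
from scipy.stats import binomtest

# Assumed counts: 945 of 1074 reviews are "positive" (phat ~ 0.88),
# tested against the null proportion pi0 = 0.85.
k, n, p0 = 945, 1074, 0.85

two_sided = binomtest(k, n, p0, alternative="two-sided").pvalue
one_sided = binomtest(k, n, p0, alternative="greater").pvalue

# For an exact test the two-sided p-value is not exactly double the
# one-sided one, but the one-sided value is at least as small, so a
# rejection at the 5% level here survives either way.
print(f"two-sided p = {two_sided:.4f}, one-sided p = {one_sided:.4f}")
```

Under these assumed counts both p-values fall well below 0.05, which is why halving the p-value does not change the conclusion.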
@RevolutionAdvanced1
@RevolutionAdvanced1 Ай бұрын
I have trouble when you say "you can have strange priors, but you're gonna need to justify them with evidence". There is no rigorous method of assessing whether verbal statements such as "I have a heavy prejudice against cafes like mostra" produce valid or invalid priors. If we cannot have rigor in determining the validity of priors presented in a Bayesian analysis, then we are no longer considering logic and are instead considering rhetoric and argumentation, which the frequentists are very right to point out as being a major flaw.
@brashmane2749
@brashmane2749 Ай бұрын
So, in short: 1) Governor et al show gross incompetence and broadcast private information of their employees. 2) Governor et al misuse their power in order to cover up their mistakes and silence the witnesses with threats and false accusations. 3) Governor et al attempt to influence the legal process they falsely instigated in order to get at an innocent journalist who did them a favor. 4) After being publicly proven wrong, the governor et al persist in their defamation and malicious prosecution of the journalist. ... Hold on. Doesn't this exact playbook resemble the actions of a certain yellow gorilla? It seems the societal rot does spread from the top down.
@josiaphus
@josiaphus Ай бұрын
The basis of the science crisis
@kellymoses8566
@kellymoses8566 Ай бұрын
One major issue with frequentist statistics is that it only considers the total count of events and not their more detailed order. It would consider a coin that did 1000 heads in a row and then 1000 tails to have the same behavior as a regular coin even though that is clearly wrong.
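For what it's worth, order-sensitive frequentist tests do exist (e.g. the Wald-Wolfowitz runs test); here is a minimal sketch of the statistic they use, applied to the two coins described above:

```python
import random

random.seed(0)

streaky = [1] * 1000 + [0] * 1000                      # 1000 heads, then 1000 tails
regular = [random.randint(0, 1) for _ in range(2000)]  # a "regular" coin

def n_runs(seq):
    """Number of maximal runs of identical outcomes in the sequence."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))

# Both sequences have (roughly) the same head frequency...
print(sum(streaky) / 2000, sum(regular) / 2000)

# ...but the run counts are wildly different: 2 for the streaky coin
# versus roughly n/2 for a fair one, which a runs test would flag.
print(n_runs(streaky), n_runs(regular))
```

So total counts alone can't distinguish the two coins, but the run structure does, immediately.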
@TwentyNineJP
@TwentyNineJP Ай бұрын
Finally I have the vocabulary to describe my philosophical objections to the way that the topic of statistics is often discussed. Probabilities have no place in a world of perfect knowledge; to a hypothetical god, all probabilities would be either 1 or 0, and nothing in between. It is only our ignorance of outcomes that gives meaning to statistics. FWIW the Bayesian approach is what I studied in signal analysis. I just didn't realize that the whole of statistics was bifurcated like this.
@simonpedley9729
@simonpedley9729 Ай бұрын
It's not actually bifurcated. Bayesian statistics produces useful methods, while frequentist statistics is an aspiration. They are not mutually exclusive. There are plenty of methods that are neither, and plenty of methods that are both.
@ZergZfTw
@ZergZfTw Ай бұрын
Bayesian statistics reminds me of Kalman filters to a certain degree. It also seems to me that frequentist statistics is the limit of Bayesian statistics as you gather more data points.
@WeirdPatagonia
@WeirdPatagonia Ай бұрын
Or that frequentist statistics is Bayesian statistics with non-informative priors (keeping only the likelihood function)
@SAliGhaderi
@SAliGhaderi Ай бұрын
The Kalman filter is a direct application of Bayes' rule. In fact, there is evidence suggesting that Laplace may have applied a similar approach in his calculations of planetary orbits.
@xavierlarochelle2742
@xavierlarochelle2742 Ай бұрын
​@@WeirdPatagonia Using non-informative priors is very different from keeping only the likelihood function. This is especially obvious when you condition your posterior on small samples.
@WeirdPatagonia
@WeirdPatagonia Ай бұрын
​@@xavierlarochelle2742 Strictly speaking you are right; in practice, it depends on the sample size, as you say. I haven't encountered a difference yet, but it is also true that most of my analyses are on medium/big datasets. Thanks for your comment
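The "frequentist as a limit" intuition in this thread can be sketched with the conjugate Beta-Binomial update; the true probability, the flat Beta(1, 1) prior, and the sample sizes below are all illustrative choices:

```python
import random

random.seed(1)

true_p = 0.7        # assumed truth, used only to simulate the data
a, b = 1.0, 1.0     # flat Beta(1, 1) prior

successes = 0
for i in range(1, 10001):
    successes += random.random() < true_p
    if i in (10, 100, 10000):
        # Conjugacy: the posterior is Beta(a + successes, b + failures),
        # with mean (a + successes) / (a + b + i).
        post_mean = (a + successes) / (a + b + i)
        mle = successes / i
        print(f"n={i:>5}  posterior mean={post_mean:.4f}  MLE={mle:.4f}")
```

As n grows, the posterior mean and the frequentist MLE become indistinguishable; with small n the prior still matters, which is exactly the small-sample caveat raised above.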
@bschrobru536
@bschrobru536 Ай бұрын
I noticed that my interpretation of the frequentist confidence interval is quite Bayesian, and I have seen this often in courses as well. What is your take on this @very-normal ?
@very-normal
@very-normal Ай бұрын
These courses are mistaking the frequentist interpretation for the Bayesian interpretation
@klyklops
@klyklops Ай бұрын
Major problems with Bayesian stats (according to a very famous Bayesian statistician) - Gelman A, Yao Y. Holes in Bayesian statistics. Journal of Physics G: Nuclear and Particle Physics. 2020 Dec 10;48(1):014002.
@very-normal
@very-normal Ай бұрын
“recognition of these holes should not be taken as a reason to abandon Bayesian methods or Bayesian reasoning”
@klyklops
@klyklops Ай бұрын
@@very-normal one could say the same about the frequentist approach. My point in sharing is only that the problems with the Bayesian approach aren't just "priors"
@piwi2005
@piwi2005 Ай бұрын
As usual for a Bayesian video, there is much bias towards complexification. First, the test should be one-sided: you requested at least 85%, so please have the courtesy to do the correct one. That divides the p-value by 2 from scratch. Then, you do not need a confidence interval at all; you have the p-value. What the test tells you is that from a sample of 1074 people, there was a probability of 0.27% of getting the data you got if people were putting 4 or 5 stars _less_ than 85% of the time (by the way, this is how you got the 99.7% "that only Bayesian gives you", supposedly...). This is the frequentist approach, and it deals with facts and makes two assumptions: independence of choices between users and validity of the CLT. Then from that p-value, you can do what you want; you are not even obliged to do anything, because so far you have only collected data and done maths. Once the computation is done, you can _finally_ go philosophical and decide you do not live in a universe where you were unlucky enough to be in the 0.27%. There is no binomial, no beta, no prior, no "I don't have an idea of my prior, so I'll use the uniform distribution but I will call it Beta(1,1)", no god of philosophy telling me that "no idea" implies the existence of a uniform distribution in the realm of ideas, etc... Frequentists work with facts and try, at least when they're not psychologists or marketers, to be rigorous, not forgetting the assumptions they made. They use stats to falsify theories, and they don't put probabilities on theories, which remain true or false. Bayesians do decision making, using a tool that always works, always getting an answer whatever the question was. It is very good for investors who want to use some maths and have a magical tool that allows them to propose a strategy with some appearance of seriousness, and it will work whenever they were lucky with their priors.
But at the end of the day, either your posterior "probability" depends a lot on your priors, and you have only put a number on your feelings, or it doesn't, and you didn't need to go Bayesian. Frequentists don't deal with philosophy. Bayesians do and must.
@very-normal
@very-normal Ай бұрын
wow
@mousev1093
@mousev1093 Ай бұрын
It's a shame this isn't higher, but that's probably to be expected in a channel/comment section so heavily biased toward one approach. The number one suggested video to follow this is literally called "the better way to do statistics". His entire interpretation of the "frequentist perspective" was purposefully limited, and he tried to divorce it from reality and naturally occurring events. I'd go as far as to argue that his interpretation of how to report a confidence interval was bordering on incorrect. It can, and should, be phrased practically identically to the way he talked about credible intervals later. The entire point is that you can't know something perfectly to arbitrary confidence, and the estimation of the true probability can only be refined. A confidence interval is the way of quantifying this spread of uncertainty. He even contradicted himself on the definition of "repeated experiment". First he defines experiments as events that produce individual data points, and then he's purposely obtuse and redefines repeating the experiment as gathering another 1074 reviews. He really should have partnered with someone else to present the other side. An entire video of straw men is boring
@very-normal
@very-normal Ай бұрын
feel free to make that video with more correct frequentist teachings, more good statistics videos wouldn’t hurt on KZbin
@mousev1093
@mousev1093 Ай бұрын
@@very-normal I think that might hurt my current content algorithm stuff yknow ;)
@jacobbartkiewicz9994
@jacobbartkiewicz9994 Ай бұрын
The Bayesian approach doesn't necessarily need to be subjective. It can simply be the difference between modelled and unmodelled probability. Given omniscient knowledge, you could create a perfect model of all factors contributing to the probability of a particular event. The randomness would then be due only to unmeasurable quantum effects, i.e. Heisenberg's uncertainty principle, thus achieving a completely objective probability. Of course omniscient knowledge is impossible, so any measured probability can be expressed as the objective probability biased by unmeasured factors. I think this actually unites the two hypotheses, but I haven't done the math to check.
@therealjezzyc6209
@therealjezzyc6209 Ай бұрын
Heisenberg's uncertainty principle doesn't give probabilities in QM. Also, if you had omniscient knowledge and could control for all degrees of freedom that determine the outcomes, you actually wouldn't need to talk about probability at all; your model would be completely deterministic. Which I think is what you meant to say, but then using Bayesian reasoning on a deterministic system is not that effective, since you'd be better off using a regular deductive approach rather than the inductive Bayesian one.
@danielkeliger5514
@danielkeliger5514 Ай бұрын
​@@therealjezzyc6209 I think OP tried to make a distinction between "true" probabilities and epistemic ones. A coin toss is basically deterministic; the main source of uncertainty comes from our lack of knowledge of initial conditions. The spin of a photon under measurement, by contrast, is not caused by anything. There are no further parameters down the line that would make it possible to predict the photon's future state. Even Laplace's demon, with complete knowledge of the world, wouldn't be able to predict the outcome of that measurement, as opposed to coin flips (which of course have some fluctuations from quantum mechanics, but they are negligible for all intents and purposes). Sure, there is a debate about whether these "truly random" events actually happen or not, but based on Bell's inequality we have to get rid of either determinism or locality.
@Kram1032
@Kram1032 Ай бұрын
I don't really know this very well, I just remember having read it somewhere, so maybe it's completely wrong, but I thought the common "uninformative" choice of the beta distribution is alpha = beta = as small as possible? IIRC the theoretically optimal choice is alpha = beta = 1/2 (I'm sure you'll eventually talk about that), but I've seen people argue it really should be alpha = beta = epsilon, so like 1/10 or even 1/100 basically. It's impossible to set both parameters to 0, but in principle I could get the *effect* of that, I think, by fixing my initial prior as "a beta distribution with alpha = beta = 0" without worrying about the issues with that, and then just following the regular update rules and going from there, right? It's like a truly limiting-case uninformative prior, I think? Or is there a good reason not to do this?
@very-normal
@very-normal Ай бұрын
That’s a great question. In my experience, I’ve only seen Beta(1, 1), but most of my experience is in clinical trials, so maybe customs are different elsewhere? My understanding is that your initial prior parameters also influence how much the data will influence the shape of the posterior. Parameters 1 and 1 suggest you know absolutely nothing with discrete trials. But parameters 100 and 100 still look uniform but suggest you had 200 trials that went both ways the same amount of times. Data will influence the shape of the former more than the latter. Not a complete answer but I hope it helps a little bit
@Kram1032
@Kram1032 Ай бұрын
​@@very-normal Reading up a bit about it now: alpha = beta = 1 is the Bayes-Laplace prior; alpha = beta = 1/2 is the Jeffreys prior and comes from a specific property: this choice is invariant under reparameterization, i.e. (more or less) proportional to Fisher's information matrix. That's where my suggestion of alpha = beta = 1/2 came from. There is also Kerman's "neutral" prior, alpha = beta = 1/3, and the limiting case, Haldane's prior (alpha = beta = 0). The higher alpha and beta are, the more the prior influences the posterior, so in that sense, if you want literally no influence on the posterior, you really ought to go with Haldane's. In that case the posterior mean equals the maximum likelihood estimate, but there are also plenty of people arguing against that choice. For very small datasets the "uniform" choice alpha = beta = 1 can be a pretty strong bias, but of course if you have LOADS of data it's gonna be fine.
@very-normal
@very-normal Ай бұрын
That’s interesting, I hadn’t heard of these before. It definitely highlights the fact that choosing a good prior isn’t trivial, something I chose not to include in the video
@falquicao8331
@falquicao8331 Ай бұрын
I think the problem with setting both parameters to zero is that you're not "skeptical" of the data. Suppose you find a restaurant that has a single, positive review. Would you consider that to probably be a better restaurant than one where 990 people leave positive reviews and only 10 leave negative reviews? Ultimately it depends on how likely you consider any proportion of positive reviews to be. Personally, I'd say that parameters of beta=0.5 and alpha=2 work pretty well in this case. Ideally, you would find the exact rating distribution of any coffee place and use that. Also keep in mind that alpha=beta=epsilon means that you think it's either zero or one, with no middle ground. It means you don't expect the value to be a probability but merely a true/false with some accepted error.
@Kram1032
@Kram1032 Ай бұрын
there is a looong section on Wikipedia about the Beta Distribution titled Bayesian Inference where it compares a bunch of choices for uninformative prior and quotes a bunch of works by different people. Most often it seems that Jeffrey's prior is favored by theorists, at least as presented on that page
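The practical difference between these candidate priors is easy to see from the closed-form posterior: with a symmetric Beta(a, a) prior, k successes in n trials give a Beta(a + k, a + n - k) posterior with mean (a + k) / (2a + n). A small sketch (the sample sizes below are illustrative, and the Haldane row is the stated limiting case of that formula):

```python
# Posterior mean of p under a symmetric Beta(a, a) prior,
# after observing k successes in n Bernoulli trials.
def post_mean(a, k, n):
    return (a + k) / (2 * a + n)

priors = {
    "Haldane (a=0)": 0.0,        # limit: posterior mean equals the MLE k/n
    "Jeffreys (a=1/2)": 0.5,
    "Bayes-Laplace (a=1)": 1.0,
}

for k, n in [(2, 3), (945, 1074)]:   # a tiny sample vs. a video-sized one
    print(f"k={k}, n={n}:")
    for name, a in priors.items():
        print(f"  {name:<22} posterior mean = {post_mean(a, k, n):.4f}")
```

With n = 3 the choice of prior shifts the answer noticeably; with n = 1074 the three priors agree to about two decimal places, which is why the choice mostly matters for small samples.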
@tunneloflight
@tunneloflight Ай бұрын
Both frequently lead to bogus answers by never understanding the problem in the first place (in dozens of different ways), by wrongly conflating chance, risk, and probability of many different flavors, and by presuming assumptions that have no basis in reality. Lies, damnable lies, and statistics - the most damnable lies of all. Beyond all of this, statistical analysis is subject to thousands of human flaws in cognition, in emotion, in biases, and more. Statistical analysis often then gets subsumed into bogus hierarchical fault-tree analysis that piles error upon error, and/or supports bogus and erroneous multi-function weighted-attribute analysis, and/or conflates cost-benefit analysis as dispassionate, fair and/or meaningful, and/or truncates high-consequence low-probability events, and/or utterly misses consequence analysis and vulnerability assessment, and/or utterly fails to understand and properly apply "safety culture", wrongly substituting a belief that a misunderstanding of the two words is a substitute for understanding and proper application of the phrase. Oh yes, AND confusing or reversing cause and effect, or assuming causal linkage when no linkage exists. Said more simply: a pox on both.
@very-normal
@very-normal Ай бұрын
i was waiting for someone to bring up the “lies” quote, congrats on being the first here 🥇🎊
@tunneloflight
@tunneloflight Ай бұрын
​@@very-normal It isn't and wasn't simply a matter of a quote. Having spent an entire career in engineering and science, I had the opportunity to watch as statistics was repeatedly abused as a replacement for wisdom and reality, often concealing powerful truths. Among these, I got to watch in real time the development and spread of p-hacking in hundreds of wilful and inadvertent ways. I also got to see risk assessments pruned of true errors to focus on a central value - which has no meaning at all - and to see cases where cause and effect were reversed to allow maximal harm while asserting little to no harm, etc... In all of that, I came to appreciate just how true that aphorism is. Add to this the hierarchical approach in science and academia, where an Italian researcher over three decades ago defined uncertainty in an analysis of a real system to be numerical variation in a chosen model. I cannot speak to whether this was a logical or intentional error, or simple ignorance. Whatever it was, by being the first major author in the field, other researchers have relied on that error to narrow their analyses of unrelated subjects to achieve desired results (whether they understood the invalidity of that or not). Meanwhile other experts attempting to reach truer results using tools like the Jupiter Suite are repeatedly rejected as they broaden the distributions, while the researchers' desire is to (wrongly) narrow their analysis distribution. Bayesian analysis can bring new insights. More often, in my experience, it serves to obfuscate the relationships, resulting in hidden errors being buried in the process. Also add to this the rejection of other tools, like "inconfidence" analysis, that more easily identify vulnerabilities in the analyses.
@tunneloflight
@tunneloflight Ай бұрын
"These concocted analyses don't get very far." Quite to the contrary in my experience they rule the roost, and are lost to most analysts in the complexity or obtuseness of the analysis.
@waraiotoko374
@waraiotoko374 Ай бұрын
I still don't understand why the Bayesian method is not susceptible to manipulation and subjectivity. You claim that even if I arbitrarily choose the initial probability, it only makes sense if it is supported by evidence. But where does that evidence come from? From the frequentist method, right? Because if it's from the Bayesian method, then I'm stuck in a circular argument... am I not?
@very-normal
@very-normal Ай бұрын
If a past study uses a frequentist method to analyze the data, then a new prior should be formed to reflect what that finding found. For example if a past study found the probability to be 70%, then my new study should probably make the prior on and around 70% more likely. If past studies use a Bayesian analysis, then it’s even easier. The posterior from the past study becomes the prior in the new study. The past data helps inform the prior, not so much the method was frequentist or Bayesian. You’re right that it can be hard and arbitrary to choose a prior, but that’s not a reason to abandon the method in the first place. Classic frequentist methods don’t work well with smaller sample size, yet people are taught to do it anyway
@XxRiseagainstfanxX
@XxRiseagainstfanxX Ай бұрын
Read "A history of mathematical statistics (from 1750 to 1930)" by Anders Hald
@very-normal
@very-normal Ай бұрын
it’s a good book, you should also try Stigler’s History of Statistics too
@6PrettiestPics
@6PrettiestPics Ай бұрын
More please.
@camillebrugel2988
@camillebrugel2988 Ай бұрын
Couldn't the null hypothesis be an inequality? It would have been more logical to ask whether mu > 0.85. You have an MLE of 0.88, and with a t-test you get your p-value and confidence interval, but instead of the p-value you could use 1 - p_value to get a probability similar to the one from the Bayesian side? I know the Student distribution of the test statistic is very different from the Bayesian posterior, but I would make this kind of bad leap in reasoning intuitively. ^^ Isn't there a test statistic for inequalities?
@camillebrugel2988
@camillebrugel2988 Ай бұрын
Sorry it's pi not mu, need to practice my greek alphabet.
@camillebrugel2988
@camillebrugel2988 Ай бұрын
And it should probably be 1 - p_value/2, for the symmetry.
@very-normal
@very-normal Ай бұрын
You could use a composite null hypothesis actually! You’d end up with a one-sided test. I’m aware of other tools for composite null hypotheses, but they’re usually outside the scope of what most statistics users would be familiar with
@Acbelable
@Acbelable Ай бұрын
I love this guy
@sriharsha580
@sriharsha580 Ай бұрын
"The CI doesn't tell us whether it contains the true value of pi or not; you can only know that if you repeated the experiment multiple times, then most of them would." Can you explain this statement? I didn't get it.
@very-normal
@very-normal Ай бұрын
The definition of confidence is the proportion of intervals that contain the true parameter value. Different experiment repetitions will produce different datasets, so the ends of the intervals will change depending on the data. In the same way that choosing a 5% level means you only get a type-I error in 5% of experiments, the confidence interval will contain/cover the value of the true parameter in 95% of experiments. There’s no guarantee that you know the one you calculated actually contains it or not
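That long-run coverage claim is easy to check by simulation; the true proportion below is an arbitrary choice that the simulation (but not the "analyst") gets to know:

```python
import math
import random

random.seed(2)

true_p = 0.85          # known to the simulation only
n, reps, z = 1074, 2000, 1.96
covered = 0

for _ in range(reps):
    k = sum(random.random() < true_p for _ in range(n))
    phat = k / n
    half = z * math.sqrt(phat * (1 - phat) / n)   # Wald interval half-width
    covered += (phat - half) <= true_p <= (phat + half)

# Any single interval either contains true_p or it doesn't; "95%"
# describes the long-run fraction of intervals that do.
print(f"empirical coverage ~ {covered / reps:.3f}")
```

The empirical coverage hovers near 0.95, but for any one interval you computed, you still can't say whether it is one of the lucky ones.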
@lemmingsoutside
@lemmingsoutside Ай бұрын
What prior could you use to account for the fact that the tails are probably heavy, i.e. that 3-star reviews are a lot rarer than 1- and 5-star reviews?
@very-normal
@very-normal Ай бұрын
You could set the first parameter higher to reflect this. You could choose one to have a particular prior mean to reflect your thoughts on how rare/common the reviews are
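One common recipe for this: fix the prior mean m and a concentration c = alpha + beta (which acts like a pseudo-sample size), then solve alpha = m·c and beta = (1 - m)·c. The numbers below are purely illustrative choices:

```python
def beta_params(mean, concentration):
    """Beta(alpha, beta) with the given mean; the concentration
    alpha + beta behaves like a pseudo-sample size, i.e. how many
    imaginary prior observations the prior is worth."""
    alpha = mean * concentration
    beta = (1 - mean) * concentration
    return alpha, beta

# e.g. a prior belief that ~70% of reviews are positive, held about
# as strongly as 10 observations' worth of data:
alpha, beta = beta_params(0.70, 10)
print(alpha, beta)              # ~7 and ~3
print(alpha / (alpha + beta))   # recovers the prior mean, 0.7
```

Raising the concentration makes the prior pull harder on the posterior; lowering it lets the data dominate.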
@ckq
@ckq Ай бұрын
I'm a big prediction-markets and forecasting guy, so I essentially take Bayes for granted. Typically you look at past frequencies and then apply a prior, which can get kind of subjective. Bayesian as a word seems so fancy (aka objective) that I thought you could never be truly Bayesian. But really it's just fancy terminology people say to look cool and not be a frequentist dummy who doesn't understand that randomness exists. I never really saw it as a conflict.
@philipoakley5498
@philipoakley5498 Ай бұрын
You are offered a pair of loaded dice with an assertion of their 'loading'. Can you believe it, and how much should you pay to test them before buying? Should you start by assuming the dice are unweighted (and the sale is a confidence trick), or that the dice are weighted as offered? PS: the con artist (?) did a single dice throw, to show you, before stating the weighting...
@very-normal
@very-normal Ай бұрын
i don’t take dice from strangers
@barttrudeau9237
@barttrudeau9237 Ай бұрын
I really enjoy your videos and style of teaching. I found this video especially well done, and I learned a lot about a subject that's hard to understand (I'm an architect, not a mathematician, but I love this subject). I have been trying to learn statistical concepts for years, and from this video I am starting to understand that if you are a mathematician focused on the computation and not knowledgeable about the subject you are studying, a frequentist approach may be more appealing and appropriate. If you are a subject-matter expert trying to predict possible future outcomes, a Bayesian approach would be a better fit. You could say it caters to your prejudices, and that's fair, but it also allows you to employ your expertise. So perhaps a Bayesian approach is more corruptible, but done properly it seems to have higher potential.
@very-normal
@very-normal Ай бұрын
Thanks, I’m glad they could be helpful to you. And I think you’re right, both methods have their use. Something I took out of this video was the fact that both methods are necessary in my space of biostatistics. Frequentist statistics are very desirable because of error rate control. If a medicine is risky but possibly useful, we want to be as sure as possible it works. Type-I errors are a different beast when humans are involved. But when we’re trying to look for new drugs and there’s millions of candidates to vet, Bayesian methods are a little better at this because posterior updates are still valid with multiple looks/analyses of the data without having to finagle with our level each time. Sometimes I think people skip my conclusion that you need both but it is what it is lol
@barttrudeau9237
@barttrudeau9237 Ай бұрын
@@very-normal I wish I could do both, I try, but I don't have the depth of mathematics education to do it properly. I do the best I can and keep learning every day. Thank you again so much for sharing your knowledge. It's really appreciated.
@zak3744
@zak3744 Ай бұрын
20:04 - "You can have strange priors, but you're going to have to justify them with evidence." But in that case, they're not really a subjective prior at all, are they? If they're properly evidence-based, then they're objective, surely? And in that case, the fundamental objective/subjective difference that you'd previously described is no longer there. An objective prior followed by an objective analysis gives an objective result, and a subjective prior followed by an objective analysis gives a result that is to some extent not evidence-based!
@happyduck1
@happyduck1 Ай бұрын
The priors are subjective because they are still degrees of belief, not real properties of some object. People disagree about what is "properly evidence-based", and that is reflected in their priors. Two people with the same base evidence would also have the same subjective priors, but having the exact same evidence isn't really possible for humans. If you had literally all the information about everything, you could theoretically get "objective priors", but in that case you aren't dealing with probabilities anymore; you just know the correct answer. An "objective prior" could really only be 0 or 1, because in a Bayesian sense "objective probabilities" don't exist.
@zak3744
@zak3744 Ай бұрын
​@@happyduck1 Of course it's possible to have two humans with the exact same evidence. If you run some sort of experimental trial, the maths shouldn't change on the basis of which particular researcher analyses the dataset you obtain from that trial. (When I say "shouldn't" I'm assuming that science "should" be objective and evidence-based, but that should be taken as given: it's definitional to science!) I'm not a mathematician or a statistician as such, but I am an engineer with a background in research, so it's not like I'm averse to the numerical and the analytical. I periodically try to understand what this supposed crucial rift is between frequentist and Bayesian stats, and it always _appears_ to me to come back to two possibilities, as far as I understand them: a) there's no real difference at all. They're just two different ways of framing the same underlying ideas, which might be practically useful (based on how you expect data to come in over time, for example), but that does not mean there are any distinctions of principle between the two. This is the appearance I'm often left with, but my certainty is challenged when I hear statisticians insist that there really are important underlying distinctions of principle that make a real difference. b) there actually is a meaningful difference between the two, and that difference is around this notion of subjective belief in the priors. There wouldn't be a difference if the priors were truly objective, but they don't have to be, and that is where the difference of principle creeps in. And if that really is the point of distinction between the two approaches, then it strikes me as nothing more than an attempt to "launder" subjective prior beliefs that cannot be stringently evidenced into a result that has the _appearance_ (since it came out of a statistical formula containing lots of actual evidence in the non-prior part) of objectivity. And that seems to me not to be science!
@andreypovyakalo9669
25 days ago
The presentation is substantially misleading, indicating a serious misunderstanding of the mathematical theory of probability. *** Since 1933 nobody defines probability as frequency. It's defined via Kolmogorov's formalism as a normalised measure over a given sigma-algebra of events. Frequency is just a maximum likelihood estimate of a parameter for a particular algebra of events, the binomial distribution. Bayes' theorem and the whole analytical apparatus of Bayesian statistics is part of Kolmogorov's theory. "Frequentist" is a label invented by hostile Bayesians to misinterpret the Fisher-Neyman-Pearson (Fisherian) approach to statistics based on maximum likelihood. As J. Neyman pointed out in his paper on hypothesis testing, he had nothing against the inverse probability (Bayesian) approach so long as the prior distribution is properly justified. Bayesians just add latent unobservable variables, called "parameters", to the observed variables via a parameterised model and arbitrarily assign a prior distribution to these unobserved variables without any reasonable justification. Example: for observations with a hypothesised Weibull distribution with unknown parameters, Bayesians try to convince the customer that they (the customers) know a "subjective" distribution of the parameters, and use the resulting joint distribution to infer the distribution of the "parameters" conditional on the observed values of the observable variables. In Bayesianism, the hypothesised parameterised model and the prior distributions are never formally validated. Even an apostle of Bayesianism, L.J. Savage, admitted in his paper on the elicitation of personal probabilities that people have no intuition about probabilities of infrequent events (with probabilities of order 1e-6, 1e-10). Therefore any subjective probabilities of such events cannot be reasonably validated. As another apostle of Bayesianism, D. Lindley, demonstrated in the foreword to his book on Bayesian statistics, Bayesianism is an escape haven for those who are unable to understand, and therefore brutally misinterpret, Kolmogorov's formalism and basic concepts of Fisherian statistics, e.g. the p-value. Bayesians normally misinterpret the p-value as the likelihood of unobserved outcomes of the experiment in question. In fact, the p-value is just a universal STATISTIC, which has a known UNIFORM DISTRIBUTION, given that the hypothesised distribution of the observations is true. *** Another feature of Bayesianism is that if a given prior distribution results in a low expected likelihood of the observations, then the posterior distribution of the parameters will be close to the prior distribution... I.e. the Bayesian model in question will be ignoring the observed data...
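The claim that the p-value is a statistic with a uniform distribution under the null is easy to check by simulation. This sketch (mine, not the commenter's) draws samples that genuinely satisfy the null hypothesis, runs a two-sided one-sample z-test on each, and tallies the resulting p-values by decile:

```python
import math
import random

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test_pvalue(sample):
    # Two-sided z-test of H0: mean = 0, with known sigma = 1
    n = len(sample)
    mean = sum(sample) / n
    z = mean * math.sqrt(n)
    return 2 * (1 - norm_cdf(abs(z)))

random.seed(0)
# 10,000 experiments in which the null is true (data ~ N(0, 1))
pvals = [z_test_pvalue([random.gauss(0, 1) for _ in range(30)])
         for _ in range(10000)]

# If p-values are Uniform(0, 1), about 10% should land in each decile.
for i in range(10):
    lo = i / 10
    frac = sum(lo <= p < lo + 0.1 for p in pvals) / len(pvals)
    print(f"[{lo:.1f}, {lo + 0.1:.1f}): {frac:.3f}")
```

Each printed fraction should hover near 0.1, which is exactly the uniformity property the comment describes (and the property that makes the significance-level guarantee work).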
@andreypovyakalo9669
25 days ago
Another pleasant feature of the Bayesian approach is its sensitivity to re-parameterisation. The exponential models F(x) = 1 - exp(-lambda * x) and F(x) = 1 - exp(-x / T) result in inconsistent posterior distributions, even if uninformative priors are defined over the coordinated segments 0 < lambda < 1e-3 and 0 < T < 1000...
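The reparameterisation point can be demonstrated numerically. This sketch is mine, with made-up data and parameter ranges chosen for convenience rather than the ones in the comment: a flat prior on the rate lambda versus a flat prior on the mean T = 1/lambda (which, by change of variables, induces a prior proportional to 1/lambda² on lambda) give visibly different posteriors for the same exponential data.

```python
import math

# Made-up exponential observations
data = [1.2, 0.4, 2.7, 0.9, 1.5]

def log_likelihood(lam):
    # Exponential density: f(x | lambda) = lambda * exp(-lambda * x)
    return sum(math.log(lam) - lam * x for x in data)

def posterior_mean(log_prior, grid):
    # Unnormalised posterior on a grid, then the normalised mean of lambda
    weights = [math.exp(log_likelihood(l) + log_prior(l)) for l in grid]
    z = sum(weights)
    return sum(l * w for l, w in zip(grid, weights)) / z

# Grid over lambda in (0, 10]
grid = [0.001 + i * 0.001 for i in range(10000)]

# Flat prior on lambda itself
flat_on_lambda = posterior_mean(lambda l: 0.0, grid)
# Flat prior on T = 1/lambda, i.e. prior proportional to 1/lambda^2
flat_on_T = posterior_mean(lambda l: -2 * math.log(l), grid)

print(f"posterior mean of lambda, flat prior on lambda:     {flat_on_lambda:.3f}")
print(f"posterior mean of lambda, flat prior on T=1/lambda: {flat_on_T:.3f}")
```

With these five observations the two "uninformative" priors land on noticeably different posterior means (analytically, Gamma(6, 6.7) versus Gamma(4, 6.7)), which is the inconsistency the comment is pointing at: flatness is not preserved under a change of parameterisation.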