Can’t believe this content is free 😭, thanks so much for this kind of valuable content.
@tr0wb3d3r519 hours ago
same 💖
@very-normal 19 hours ago
🫡
@d_b_19 hours ago
Great video. Seeing code was really a turning point for me in understanding statistics, making the formulas and charts in textbooks more tangible. Now seeing the differences at 27:28 makes me want to dive deeper into the tests to see what's causing these differences.
@olipito6 hours ago
Waited for soooo long for a good MC explanation video (series). YT is surprisingly poor on this topic. Thank you!
@ziggle31416 hours ago
Fantastic exposition. You have inspired me to spend more time on Monte Carlo simulations.
@Gigano19 hours ago
Fantastic stuff! I wish this were available 7 years ago haha when I did my PhD haha!
@realshmel5 hours ago
Amazing video. Thanks a million!
@berjonah1103 hours ago
One thing to keep in mind about saving the results is that disk operations are orders of magnitude slower than ordinary computation. Depending on the complexity of the calculation, you might be better off saving an intermediate result, like the chunked mean or something similar.
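A minimal sketch of the chunked-mean idea, assuming each replicate produces a single number. `simulate_once` is a hypothetical stand-in for the real simulation step, and the chunk size is arbitrary. Only one (count, mean) pair per chunk is kept, so that is all that would need to be written to disk.

```python
import random
import statistics


def simulate_once(rng):
    """Hypothetical stand-in for one replicate: the mean of 27 standard-normal draws."""
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(27))


def chunked_means(n_reps, chunk_size, seed=0):
    """Run n_reps replicates but keep only one (count, mean) pair per chunk."""
    rng = random.Random(seed)
    summaries = []
    done = 0
    while done < n_reps:
        size = min(chunk_size, n_reps - done)
        chunk = [simulate_once(rng) for _ in range(size)]
        # This pair is the intermediate result that would be saved
        # instead of the individual replicate values.
        summaries.append((size, statistics.fmean(chunk)))
        done += size
    return summaries


def combine(summaries):
    """Recover the overall mean as a count-weighted average of the chunk means."""
    total = sum(n for n, _ in summaries)
    return sum(n * m for n, m in summaries) / total


summaries = chunked_means(n_reps=10_000, chunk_size=1_000)
overall = combine(summaries)
print(len(summaries), round(overall, 4))
```

Weighting by the chunk count keeps the combined mean exact even when the last chunk is smaller than the others.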
@jeffreychandler841819 hours ago
I've been diving into Rand Wilcox's robust estimation textbook and it's been a really compelling statistical book. It presents the argument that we care most about the core of the data, so we can use trimmed means and winsorized variances (for those that don't know those terms, you cut the outer percentiles for the mean, and for the variance, you set the outer percentiles to the new min and max from trimming). This seems to allow the robustness of rank statistics while maintaining the interpretability of regular statistics. I've worked with it on some of my data and it seems to capture similar patterns as the rank statistics but I can compute effect sizes and communicate it more effectively.
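For anyone who wants to see the two definitions in code, here is a rough sketch using only the standard library. The 20% trimming proportion and the toy sample are assumptions made for illustration, and the function names are hypothetical rather than taken from the textbook.

```python
import statistics


def trimmed_mean(data, proportion=0.2):
    """Mean after cutting the lowest and highest `proportion` of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    g = int(proportion * len(xs))  # observations removed from each tail
    return statistics.fmean(xs[g:len(xs) - g])


def winsorized_variance(data, proportion=0.2):
    """Sample variance after clamping each tail to the trimmed min and max."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    g = int(proportion * len(xs))
    low, high = xs[g], xs[len(xs) - g - 1]
    # Values outside [low, high] are pulled in to the boundary, not dropped.
    clamped = [min(max(x, low), high) for x in xs]
    return statistics.variance(clamped)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # toy data with one large outlier
print(statistics.fmean(sample), trimmed_mean(sample))
print(statistics.variance(sample), winsorized_variance(sample))
```

The single value of 100 pulls the ordinary mean well above the bulk of the data, while the trimmed mean stays in the middle of it.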
@jeffreychandler841818 hours ago
Also, DO NOT RUN COMPUTATIONS ON THE LOGIN NODE. That is the ONE WAY you can break a cluster. I cannot emphasize that enough xD The amount of times I've had to postpone work because some person somewhere in the country ran a "10 minute job" on a login node.... Also I know there are tools to help write SLURM files for you, just remember your own cluster will have its own limits. It's also nice practice to request the minimum CPU/memory (usually the two are tied, so may as well use all memory you are allowed) necessary, since these are shared systems. And I will testify to the value of clusters. I took an analysis that would have taken 6 months on a high end consumer-grade desktop and shrunk it to a day and a half using cluster resources. If you are doing research/big data anything, this is a critical skill to your success.
@Terracotta-warriors_Sea19 hours ago
Thank you for a great video
@sethsims741413 hours ago
Python has common sampling distributions in the "random" module of the standard library. Numpy/Scipy have a much wider selection of distributions though and will usually be faster if you know how to use them.
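A short side-by-side sketch of the two options. The seeds, sizes, and parameters are arbitrary choices for illustration; the calls themselves are the standard `random` and NumPy `Generator` APIs.

```python
import random

import numpy as np

# Standard library: one draw per call, so a sample is built with a loop.
random.seed(1)
std_draws = [random.gauss(0.0, 1.0) for _ in range(5)]
exp_draw = random.expovariate(1.0)

# NumPy: a whole array of draws in one vectorized call.
rng = np.random.default_rng(1)
np_draws = rng.normal(loc=0.0, scale=1.0, size=(10_000, 27))

# A Student t distribution, which the `random` module does not offer directly.
t_draws = rng.standard_t(df=3, size=1_000)

# One sample mean per simulated dataset (one dataset per row).
row_means = np_draws.mean(axis=1)
print(len(std_draws), np_draws.shape, row_means.shape)
```

Drawing a full (replicates, sample size) array at once is what lets the later summary steps run as array operations rather than Python loops.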
@lex4944 hours ago
Great video, thanks. I really like the aesthetics of your channel. It’s clean and tidy. Can you recommend some resources to get a deep understanding of statistics? I have no background in mathematics, but I’m self-studying, because I feel like just intuition is not enough to excel at the subject. I would like to self-study statistics from scratch as well and do a lot of problems and calculations myself. Something like a good entry-level textbook or an online resource. Thanks!
@Gigano18 hours ago
By the way, at 14:14, I do not think that code shows how to read in the results of the simulations. Rather it shows the code that runs 10,000 simulations of the non-parallelized kind.
@very-normal 18 hours ago
Oops you’re right, that’s my bad. I’ll add an addendum to the comments when I get back to my desktop. Thanks for the heads up!
@TM-fk4gt19 hours ago
I don't understand the conclusion at 27:07. Could you rephrase it?
@very-normal 19 hours ago
The t-tests work better than the nonparametric test when all the assumptions needed to support the t-test are met. Furthermore, there’s not much of a difference between Student’s and Welch’s t-tests. But this study shows that the t-tests’ power is severely decreased in the presence of outliers, which are a natural occurrence on YouTube. Under either data-generating distribution, power is higher when the two groups are more different, since it’s much easier to detect the difference in their means/typical values. Since the power of the nonparametric test is not that much lower even in the Normal setting, it’s my opinion that it’s a safer hypothesis test to use if I want to analyze this type of data in the future.
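A rough sketch of this kind of power comparison, not the code from the video. Several things here are assumptions made for illustration: the nonparametric test is taken to be the Mann-Whitney U (Wilcoxon rank-sum) test, outliers are generated by replacing about 10% of observations with draws from a much wider normal, and the shift, number of simulations, and helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def draw_group(rng, n, shift, outliers):
    """Normal data centered at `shift`; optionally contaminated with wide outliers."""
    x = rng.normal(loc=shift, scale=1.0, size=n)
    if outliers:
        # Assumed outlier mechanism: roughly 10% of points come from a normal with sd 10.
        mask = rng.random(n) < 0.1
        x[mask] = rng.normal(loc=shift, scale=10.0, size=mask.sum())
    return x


def estimate_power(shift, outliers, n=27, n_sims=1000, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which each test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = {"student": 0, "welch": 0, "mann_whitney": 0}
    for _ in range(n_sims):
        a = draw_group(rng, n, 0.0, outliers)
        b = draw_group(rng, n, shift, outliers)
        pvalues = {
            "student": stats.ttest_ind(a, b, equal_var=True).pvalue,
            "welch": stats.ttest_ind(a, b, equal_var=False).pvalue,
            "mann_whitney": stats.mannwhitneyu(a, b, alternative="two-sided").pvalue,
        }
        for name, pvalue in pvalues.items():
            rejections[name] += int(pvalue < alpha)
    return {name: count / n_sims for name, count in rejections.items()}


normal = estimate_power(shift=1.0, outliers=False)
contaminated = estimate_power(shift=1.0, outliers=True)
print("normal:      ", normal)
print("contaminated:", contaminated)
```

Under these assumed settings the three tests should look similar on clean normal data, while the rank-based test should hold up better once outliers are added. The exact numbers depend on the outlier mechanism chosen.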
@statswithbrian18 hours ago
@@very-normal The equal variance (Student) t-test is only going to out-perform Welch's when the sample sizes are really small, which is where you get the benefit of pooling the variances. If you run these simulations with n=3 and data coming from normal distributions, that's where the pooled test will really shine compared to the others. With n=27, you'll be able to estimate each variance separately very well anyways, so you don't really get an advantage from pooling, which is why the power is the same. Making sample size (in addition to effect size) a variable in the simulation would let you see when the pooled test outperforms the unpooled test. Adding various degrees of heteroskedasticity would also probably highlight the reverse cases where the unpooled test does better.
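One way to make sample size a simulation variable, sketched under assumed settings: the shift of 2 standard deviations, the grid of sample sizes, and the function name are illustrative choices. The `sd_ratio` argument is a hypothetical hook for exploring unequal variances and is left at 1 here.

```python
import numpy as np
from scipy import stats


def power_by_sample_size(n, shift=2.0, sd_ratio=1.0, n_sims=5000, alpha=0.05, seed=0):
    """Estimated power of the pooled (Student) and unpooled (Welch) t-tests."""
    rng = np.random.default_rng(seed)
    # One simulated dataset per row, so both tests run on all rows at once.
    a = rng.normal(0.0, 1.0, size=(n_sims, n))
    b = rng.normal(shift, sd_ratio, size=(n_sims, n))
    p_pooled = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
    p_welch = stats.ttest_ind(a, b, axis=1, equal_var=False).pvalue
    return float(np.mean(p_pooled < alpha)), float(np.mean(p_welch < alpha))


results = {n: power_by_sample_size(n) for n in (3, 5, 10, 27)}
for n, (pooled, welch) in results.items():
    print(f"n={n:2d}  pooled={pooled:.3f}  welch={welch:.3f}")
```

With equal group sizes the two tests share the same t statistic and differ only in their degrees of freedom, so any gap between them comes from Welch's smaller degrees of freedom, which matters most at very small n.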
@BOKRalavac6 hours ago
I really love your videos. If there is anything that I can do to help you produce them, I would love to do so (for example translation, since I am German, or writing short code chunks for viewers to download and play with).
@nj-bz8pv16 hours ago
Thanks
@therealawesomequest16 hours ago
Great video! Love your stuff! When you're displaying code blocks, and transitioning between them, could you not morph the entire block? It makes it impossible to track what's being changed. I love the way you talk about these things!
@very-normal 15 hours ago
Thanks! Yeah, I’m not a big fan of how manim handles the code changes either. I’ll try to look into this more or get better at an alternative way of showing code!
@onnio7998 an hour ago
A very interesting video! I can't help but wonder, though, whether it would be possible to run the third simulation on a laptop in a reasonable amount of time. Maybe I'm missing something, but shouldn't it be 180 000 * 27 * 2 samples of the distributions and then some math on those from which the final result can be derived? The number of calculations will probably be in the hundreds of millions, but that shouldn't be too bad if I imagine writing some multithreaded C++. Might take a couple of seconds. Am I dreaming?
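A back-of-the-envelope sketch for checking this on your own machine, assuming the 180 000 figure counts simulated datasets with two groups of 27 each. It computes only a Welch t statistic per dataset with vectorized NumPy. The rank-based test and any disk output from the video are not included, so it is a lower bound on the real workload rather than a replica of it. The batch size is an arbitrary choice.

```python
import time

import numpy as np

N_REPLICATES = 180_000  # figure quoted in the comment above
N_PER_GROUP = 27
N_GROUPS = 2
total_draws = N_REPLICATES * N_PER_GROUP * N_GROUPS  # 9,720,000 random draws


def welch_t_batch(rng, batch_size, n):
    """Welch t statistics for `batch_size` simulated two-group datasets."""
    a = rng.normal(size=(batch_size, n))
    b = rng.normal(size=(batch_size, n))
    se = np.sqrt(a.var(axis=1, ddof=1) / n + b.var(axis=1, ddof=1) / n)
    return (a.mean(axis=1) - b.mean(axis=1)) / se


rng = np.random.default_rng(0)
batch_size = 10_000
start = time.perf_counter()
t_stats = np.concatenate(
    [welch_t_batch(rng, batch_size, N_PER_GROUP) for _ in range(N_REPLICATES // batch_size)]
)
elapsed = time.perf_counter() - start
print(f"{total_draws:,} draws, {t_stats.size:,} t statistics in {elapsed:.2f} s")
```

The elapsed time it prints gives a direct answer for a given laptop, at least for the t-test part of the study.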
@sethsims741413 hours ago
You can embed your R script in the submit script using here docs. Assuming you're using bash.