In this R video tutorial, we learn how to perform a hypothesis test in R using a bootstrap approach. An R package does exist for bootstrap hypothesis testing (“boot”), but the package is limited. Here we show how to build the bootstrap approach ourselves, which lets us change the statistics/estimates we want to test. The R script accompanying this video has all the R code used in this tutorial, plus extra R code for students to explore on their own (statslectures.com/r-scripts-datasets). If you would like to support us, you can donate (bit.ly/2CWxnP2), share our videos, leave us a comment and give us a like! Either way, we thank you!
@AndreaLyuu 3 years ago
Hello Mike! Thanks for all the great content! Please help me with this question: If our hypothesis test were to be a one sided (H0:mean-c>=mean-m, H1: mean-c
@kaibecker1411 3 years ago
10:23 Minor mistake: 48.0 and 68.2 (#16) are larger than your test statistic
@auroralorenzi2957 5 years ago
Hi Marin. Thank you for your videos! It's not an easy topic to teach, but your videos are very clear.
@carlosalexandrecosta4234 4 years ago
Just amazing content. I'm studying Data Science and I can tell you, your videos are helping a lot! Thank you!
@yoyme_blumenfeld 1 year ago
Lovely video, thank you! Everything is clear, and I love that you showed us what is behind those functions, so one can experiment with other libraries while having a good benchmark.
@kahuilim2122 4 years ago
Thanks for taking the time to put these amazing videos together.
@marinstatlectures 4 years ago
You’re welcome
@cb4808 3 years ago
5:00 what do these numbers in the brackets mean? set.seed(112358)
@zjardynliera-hood5609 3 years ago
It's just an arbitrary number that sets the random number generator to a particular state. Basically, if I set that seed on my machine, I would get the same results he is getting, and if I reran the code, I would get the same results each time.
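A small sketch of what this reply describes (not taken from the video's script; the range 1:100 and the object names are only illustrative): resetting the same seed before drawing reproduces exactly the same random values.

```r
# Set the seed used in the video, then draw five random integers
set.seed(112358)
first_draw <- sample(1:100, size = 5, replace = TRUE)

# Reset the same seed and draw again: the generator restarts from the same state
set.seed(112358)
second_draw <- sample(1:100, size = 5, replace = TRUE)

# The two draws match element for element
identical(first_draw, second_draw)  # TRUE
```

Any other integer would work just as well as the seed; it only needs to be the same each time the code is run.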
@TanMan1 4 years ago
at 5:35 in the video - if you want to bootstrap for multiple variables, how would you adjust the code?
@yinanxue8653 6 years ago
Thank you for the video. I have a question. How do you know that in the bootstrap data, the first 12 rows are weights of casein and the last 11 rows are weights of meatmeal?
@yinanxue8653 6 years ago
Nvm. I figured it out. Is it because H0 is that the difference between the means equals 0, so if H0 is true, we can pool the data?
@yekhtiari 6 years ago
I have the same question.
@marinstatlectures 5 years ago
For bootstrapping, we resample with replacement, so really it just matters that we have 12 observations for the one group and 11 for the other. We could just as easily label the first 11 rows as meat meal and the next 12 as casein. All that is really necessary is that we assign 12 of them to casein and 11 to meat meal; the order we assign them in doesn't matter. Hope that clarifies it.
@gillianjean2237 5 years ago
@@marinstatlectures Hi Mike, brilliant video but I'm still confused about this part. I have 45016 observations, 22049 in group A and 22967 in group B. How do I make sure that I'm sampling correctly from both groups?
@rhosigma2199 5 years ago
@@marinstatlectures Thank you for the video. I also have the same question. BootstrapSamples = matrix(sample(variable, size = n*B, replace = TRUE), nrow = n, ncol = B) Does this function ensure that BootstrapSamples has 11 rows of meat meal and 12 rows of casein?
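A minimal sketch of the resampling step discussed in this thread. It assumes the built-in chickwts data, restricted to the casein and meatmeal feeds, stands in for the video's data; the names d, n_c and n_m are illustrative. The matrix is filled from the pooled weights, and the row split only fixes how many values are treated as each group, which also shows how to handle other group sizes.

```r
# Assumption: chickwts (two feeds only) stands in for the data used in the video
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))

n_c <- sum(d$feed == "casein")    # group size for casein (12 in chickwts)
n_m <- sum(d$feed == "meatmeal")  # group size for meatmeal (11 in chickwts)
n <- n_c + n_m
B <- 10000                        # number of bootstrap samples

set.seed(112358)
# Each column is one bootstrap sample drawn from the pooled weights
BootstrapSamples <- matrix(sample(d$weight, size = n * B, replace = TRUE),
                           nrow = n, ncol = B)

# Treat the first n_c rows as "casein" and the remaining n_m rows as "meatmeal"
Boot.test.stat1 <- rep(0, B)
for (i in 1:B) {
  Boot.test.stat1[i] <- abs(mean(BootstrapSamples[1:n_c, i]) -
                            mean(BootstrapSamples[(n_c + 1):n, i]))
}

# Observed statistic uses the real group labels
test.stat1 <- abs(mean(d$weight[d$feed == "casein"]) -
                  mean(d$weight[d$feed == "meatmeal"]))

# Proportion of bootstrap statistics at least as large as the observed one
p_value <- mean(Boot.test.stat1 >= test.stat1)
p_value
```

With larger groups (for example 22049 and 22967 observations), the same code applies once n_c and n_m are computed from the data rather than typed in by hand.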
@niceday2015 2 years ago
Thank you very much. Your teaching has expanded my perception of the world.
@MrSoumyaBanerjee 4 years ago
Thank you for the video, but by resampling from d$weight, won't it cause the casein and meatmeal observations to get jumbled up in most cases? Wouldn't it be better to create 2 separate dataframes for each set of weight values, and resample separately from these 2?
@marinstatlectures 4 years ago
Good question. Because in a hypothesis test we begin by assuming there is no difference in weight between the two groups, we want them all mixed together, to see how often we'd observe a difference as large or larger than the one we saw if there really is no difference between the groups. When building a confidence interval, however, we want to keep the groups' observations separate, to preserve any group differences observed. Hope that clarifies it.
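A rough sketch of the confidence-interval case mentioned in this reply, where each group is resampled separately. It assumes chickwts (casein and meatmeal only) stands in for the video's data and uses a simple percentile interval, which is one common choice and not necessarily the one used in the video.

```r
# Assumption: chickwts (two feeds only) stands in for the data used in the video
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
casein   <- d$weight[d$feed == "casein"]
meatmeal <- d$weight[d$feed == "meatmeal"]

B <- 10000
set.seed(112358)

# Resample WITHIN each group, so any real group difference is preserved
boot_diff <- replicate(B, {
  mean(sample(casein, replace = TRUE)) - mean(sample(meatmeal, replace = TRUE))
})

# Percentile 95% interval for the difference in mean weight
ci <- quantile(boot_diff, probs = c(0.025, 0.975))
ci
```

Contrast this with the hypothesis test, where the pooled weights are resampled because the null hypothesis says the group labels carry no information.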
@SinanMavruk 2 years ago
@@marinstatlectures Thank you for the video and explanations. You made the point very clear for me. But in this case, the only difference between using permutation or bootstrapping is in the replacement in resampling? So, how to decide which test is better in application?
@farahyounes2813 3 years ago
Thank you for your explanation. In the case of a time series, how can we apply the bootstrap method to compare two spectral densities?
@JJ-wx3nd 3 years ago
Thank you so much! Especially since I found that the R documentation for boot() was not enough to go on.
@feloria1862 3 years ago
When you resample at 6:20 does the resample keep the feed type variables separate? Or do weight values from either feed type resample to anywhere in the 23 observations?
@feloria1862 3 years ago
I figured it out: the feeds are resampled to any position because the null hypothesis is that the two populations are the same, so swapping them to any position, even if the feed type doesn't match, is fine.
@nevertheless4504 1 year ago
Hi sir. Your video really helps me a lot. However, I just have a question. When we want to do a one-sided test, we just need to delete the abs, right? And when doing the comparison, mean(Boot.test.stat1 >= test.stat1), we only use < or > instead of >=. Am I right?
@sohanaryal 2 years ago
How can we randomly assign first 11 to one type of feed and remaining to other type of feed in bootstrap matrix?
@forestalgarcia1506 4 years ago
If I have more than two "diets", how to calculate the absolute difference of means?
@alinevm4915 2 years ago
such a didactical video, thanks a lot!
@xdienn 3 years ago
Great video, thank you so much! I'm still wondering though, how to proceed if you have a 2x2 factorial design? Do you then calculate 4 test statistics, one for each group? And for interaction effects?
@b.ambrozio 4 years ago
Thanks for the video! Question about your last statement: "Any time doing a hypothesis test, we should also include a confidence interval to give an idea of how big the difference would be". Does it mean I should run my t-test against the CI instead? (E.g. calculate the CI of all my 1000 arrays, and do the t-test against the means from the CI? In other words, should I use the means calculated from the range between the first and last quantile in my t-test?)
@munafahmed725 4 years ago
In mean.default(BootstrapSamples[1:12, i]) : argument is not numeric or logical: returning NA How to solve this error?
@juanfranciscoecheverry3031 4 years ago
I have the same problem. thanks a lot!
@gbganalyst 6 years ago
Can we then interpret the results of Bootstrapping with the way we interpret the result of independent t-test?
@marinstatlectures 6 years ago
For the most part you can. The beauty of the bootstrap though is that you can also work with more interesting/relevant statistics, aside from just mean/median, which the classical approaches use. You can work with just about any statistic/estimate you can imagine
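As an illustration of swapping in a different statistic, here is a hedged sketch that repeats the bootstrap test with the absolute difference in medians. It assumes chickwts (casein and meatmeal only) stands in for the video's data; Boot.test.stat2 and test.stat2 are illustrative names.

```r
# Assumption: chickwts (two feeds only) stands in for the data used in the video
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
n_c <- sum(d$feed == "casein")
n <- nrow(d)
B <- 10000

set.seed(112358)
# Pooled resampling, one bootstrap sample per column
BootstrapSamples <- matrix(sample(d$weight, size = n * B, replace = TRUE),
                           nrow = n, ncol = B)

# Same recipe as for the means, but with median() applied to each column
Boot.test.stat2 <- abs(apply(BootstrapSamples[1:n_c, ], 2, median) -
                       apply(BootstrapSamples[(n_c + 1):n, ], 2, median))

test.stat2 <- abs(median(d$weight[d$feed == "casein"]) -
                  median(d$weight[d$feed == "meatmeal"]))

# Bootstrap p-value for the difference in medians
p_value_median <- mean(Boot.test.stat2 >= test.stat2)
p_value_median
```

Only the function applied to each half of the column changes; the resampling itself stays the same.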
@Rainstorm121 3 years ago
Thanks very much, sir. Suppose I have a random distribution of scores for each variable as follows: A=7, B=13, C=23, D=19, E=15, F=30. If I want to do hypothesis testing to find out which of the variables has a statistically significant score, what is the best advice for using bootstrapping in this situation? Given that H0: the expected probability for each of the variables is equal to 0.12, and H1: it is not equal to 0.12.
@Loggies89 3 years ago
You lost me at the i=1 and i=2 bit. Is there a step you aren't showing where these are created? I'm getting an error saying object 'i' not found, so I assume I have to create it at some point before entering it into the boot test statistic.
@gruppenzwangimweb20 5 years ago
nice video! btw... this BootstrapSamples
@duncanrager7180 5 years ago
Hi Mike, thanks for the helpful video. In this case, the first test statistic is the same as used in a two-sided, two-sample T-test. As an alternative, to use the same test statistic as a one-sided, two-sample T-test, would that be the difference of the mean weight for the two diets (not the absolute difference)?
@marinstatlectures 5 years ago
yes, that's right
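A sketch of the one-sided version confirmed in this exchange: the signed difference in means replaces the absolute difference. It assumes chickwts (casein and meatmeal only) stands in for the video's data, and the direction of the alternative (casein mean greater than meatmeal mean) is chosen here purely for illustration.

```r
# Assumption: chickwts (two feeds only) stands in for the data used in the video
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
casein   <- d$weight[d$feed == "casein"]
meatmeal <- d$weight[d$feed == "meatmeal"]
n_c <- length(casein)
n <- nrow(d)
B <- 10000

set.seed(112358)
# Pooled resampling under H0, one bootstrap sample per column
BootstrapSamples <- matrix(sample(d$weight, size = n * B, replace = TRUE),
                           nrow = n, ncol = B)

# Signed difference (no abs), so the direction of the difference is kept
Boot.diff <- colMeans(BootstrapSamples[1:n_c, ]) -
             colMeans(BootstrapSamples[(n_c + 1):n, ])
obs_diff <- mean(casein) - mean(meatmeal)

# Illustrative alternative H1: mean(casein) > mean(meatmeal)
p_one_sided <- mean(Boot.diff >= obs_diff)
p_one_sided
```

For the opposite alternative, the comparison would be flipped to Boot.diff <= obs_diff.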
@alinepontes7360 4 years ago
Thanks for the video! It was the only way for doing a hypo test for a complex dataset. The boot package was not enough. BTW, is there a recommended citation for this boot method (e.g. a book)?
@mathieufen2239 4 years ago
Very cool video! Thanks! I wonder if this approach could be used on paired data...
@marinstatlectures 4 years ago
Definitely, the concept of bootstrapping can be used for just about any structure of data. I explained it simply here, but the concept transfers very widely
@parvenraj98 4 years ago
You are the best !!!!!
@chathuraedirisuriya6535 6 years ago
The Bootstrap Hypothesis Testing R script link directs to the wrong file. Please correct it.
@marinstatlectures 6 years ago
Thanks for letting us know. It should point to the correct file now.
@MsWilliam63 5 years ago
Hi Mike, great videos. Really clear and helpful. I have two questions. What is the best way to report these results in text? Is it best just to report the bootstrapped difference in means and SD and p value (e.g., observed stat = X, mean ± SD, p-value=X)? Is it possible to combine a t-test or Mann-Whitney U with the bootstrapped data in order to get a t-stat as well as the p-value for your difference in means/medians?
@marinstatlectures 5 years ago
It's hard to say what the best way to report is, as that really depends on context: what is the discipline, what was the focus of the paper, etc. Regarding approaches, you can certainly combine this sort of approach with a parametric one like the t-test. For example, you may wish to use a bootstrap only to estimate the SE of your estimate, and then substitute this estimate into a standard approach like a t-test, e.g. confidence interval: Estimate +/- t * BootstrapSE. This of course requires the assumptions of the standard t-test/confidence interval approach to be met.
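A hedged sketch of the "Estimate +/- t * BootstrapSE" idea from this reply. It assumes chickwts (casein and meatmeal only) stands in for the video's data. The degrees of freedom (n1 + n2 - 2) are an assumption made for illustration; the reply does not say which value to use.

```r
# Assumption: chickwts (two feeds only) stands in for the data used in the video
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
casein   <- d$weight[d$feed == "casein"]
meatmeal <- d$weight[d$feed == "meatmeal"]

B <- 10000
set.seed(112358)

# Resample within each group and record the difference in means each time
boot_diff <- replicate(B, {
  mean(sample(casein, replace = TRUE)) - mean(sample(meatmeal, replace = TRUE))
})

estimate <- mean(casein) - mean(meatmeal)  # observed difference in means
boot_se  <- sd(boot_diff)                  # bootstrap estimate of its standard error

# Assumption: pooled degrees of freedom, n1 + n2 - 2
t_crit <- qt(0.975, df = length(casein) + length(meatmeal) - 2)

# 95% interval: Estimate +/- t * BootstrapSE
ci <- estimate + c(-1, 1) * t_crit * boot_se
ci
```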
@mbellett74 4 years ago
If the column "feed" is not ordered (with respect to meat meal and casein), how do I order it before running the boot.test.stat? Many thanks.
@marinstatlectures 4 years ago
You don’t necessarily need to order it, but you can do that with the sort() command. You can also use the tidyverse arrange() command as well
@mbellett74 4 years ago
@@marinstatlectures many thanks Mike, I did it using arrange(). Maybe I have to do it because in the boot.test command I have to define two groups of lines to confront: abs(median(bootstrapsamples1[1:938, i]) - median(bootstrapsamples1[939:1511, i] , ...and in my data-set the two groups of events to confront are mixed, again many thanks!!
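A small base-R sketch of the ordering step discussed here, using order() on the grouping column. It assumes chickwts (casein and meatmeal only) stands in for the commenter's data, with the rows shuffled first to mimic an unordered file; n1 is an illustrative name.

```r
# Assumption: chickwts (two feeds only), shuffled, stands in for an unordered data set
set.seed(1)
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
d <- d[sample(nrow(d)), ]

# order() returns the row permutation that groups the feeds together
d_sorted <- d[order(d$feed), ]

# Group size for the first block of rows, computed rather than hard-coded
n1 <- sum(d_sorted$feed == "casein")

# The observed statistic can then be taken from the two blocks of rows
obs_stat <- abs(median(d_sorted$weight[1:n1]) -
                median(d_sorted$weight[(n1 + 1):nrow(d_sorted)]))
obs_stat
```

Indexing by the label directly, as in d$weight[d$feed == "casein"], gives the same observed statistic without sorting.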
@santiagomendozapaz2135 4 years ago
@@marinstatlectures maybe I am misunderstanding the figure, but, if our matrix from which we are going to resample contains 12 values for type 1 and 11 values for type 2 and we apply the resampling directly to the 23 values, the resulting resampled matrix is going to contain randomly 23 values from both types, therefore, why are you obtaining the mean between [1:12] and [13:23] as in the resulting matrix we are not sure if type 1 is contained in [1:12] or type 2 in [13:23]?
@veducatube5701 4 years ago
Sir, I need to bootstrap spatial point data. Meaning I have 10 values with lat, long and a z. I need to bootstrap pairs of xy in a defined region (shapefile). Can you help? Regards from India
@marinstatlectures 4 years ago
It's difficult to answer without knowing exactly what your data looks like, but it sounds like you will want to resample entire rows of your data.
@veducatube5701 4 years ago
@@marinstatlectures Thank you for replying, sir. I'm giving you dummy data:
lat      long     water table (depth in m)
29       79       23
28.45    78.30    21
27       77.45    25
30.30    79.02    26
31       77       22
25.45    80.30    32
Assume that all these original points of latitude and longitude with water table values fall in a district (the boundary line of this district is a map file format called .shp or ESRI shapefile). Sir, I want to bootstrap these three columns so that I may have more geographic points for water table in my district. That is possible only when the latitudes and longitudes do not fall outside the district boundary or shapefile, meaning the lat long column values must remain contained within the shapefile latitudes and longitudes. Sir, it's very crucial for me. Please guide me or share some code with me. Thank you
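A minimal sketch of resampling entire rows, as suggested in the reply above, using the six dummy points from this comment. The data frame name wells and its column names are illustrative. Note that resampling rows only reuses the existing locations; it does not create new points inside the district, and no shapefile handling is attempted here.

```r
# Dummy data copied from the comment; column names are illustrative
wells <- data.frame(
  lat           = c(29, 28.45, 27, 30.30, 31, 25.45),
  long          = c(79, 78.30, 77.45, 79.02, 77, 80.30),
  water_table_m = c(23, 21, 25, 26, 22, 32)
)

set.seed(112358)
# Draw row indices with replacement, then take whole rows so that
# each lat/long pair stays attached to its own water table value
idx <- sample(nrow(wells), size = nrow(wells), replace = TRUE)
boot_wells <- wells[idx, ]
boot_wells
```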
@sunayana98 5 years ago
Hi Marin, when I'm trying to find the test stats of bootstrap samples, R is telling me 'i' is not found. What do I do?
@marinstatlectures 5 years ago
It's difficult to tell without knowing the code you've entered, etc., but it sounds like this part of the code is not in a loop that is running from i=1,2,...,B. The "i" is referencing the iteration number in the loop, and R cannot see what i is, so it sounds like you either haven't initiated a loop, or that command is outside of the loop.
@sunayana98 5 years ago
@@marinstatlectures I've typed the command exactly like how you've typed it i.e., in the square brackets. However, it says 'i is not found'. Is there an alternate command?
@pilobond 5 years ago
@@sunayana98 I had the same problem, but then I realized I had typed "for (i in i:B)" instead of "for (i in 1:B)" by mistake. Once this was corrected it ran fine. I wonder if you have the same problem.
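To make the role of i concrete, here is a sketch with the statistic inside a for loop over 1:B, next to a loop-free alternative using colMeans() that avoids i altogether. The data are simulated purely for illustration (the rnorm() values are hypothetical); the 12/11 row split follows the indices quoted in these comments.

```r
set.seed(1)
n <- 23
B <- 1000
x <- rnorm(n, mean = 300, sd = 60)  # hypothetical weights, for illustration only

# One bootstrap sample per column
BootstrapSamples <- matrix(sample(x, size = n * B, replace = TRUE),
                           nrow = n, ncol = B)

# Loop version: i only exists inside the loop, taking the values 1, 2, ..., B
Boot.test.stat1 <- rep(0, B)
for (i in 1:B) {
  Boot.test.stat1[i] <- abs(mean(BootstrapSamples[1:12, i]) -
                            mean(BootstrapSamples[13:23, i]))
}

# Loop-free version: column means of each block of rows, no i needed
Boot.alt <- abs(colMeans(BootstrapSamples[1:12, ]) -
                colMeans(BootstrapSamples[13:23, ]))

# Both versions give the same statistics
all.equal(Boot.test.stat1, Boot.alt)
```

Running the line that contains [1:12, i] on its own, outside the loop, is what produces the "object 'i' not found" error.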
@damianspencer 4 years ago
Do you do consultations? Please contact me.
@marinstatlectures 4 years ago
It depends on the work. I have no way of contacting you. You can get in touch with me if you like, my contact info is in the about section of our channel