Bootstrap Hypothesis Testing in R with Example | R Video Tutorial 4.4


52,394 views

MarinStatsLectures-R Programming & Statistics


Comments: 67

@marinstatlectures 6 years ago

In this R video tutorial, we learn to use R to perform a hypothesis test using a bootstrap approach. An R package for bootstrap hypothesis testing does exist (“boot”), but the package is limited. Here we show how to build the bootstrap approach ourselves; this lets us change the statistics/estimates we want to test. The R script accompanying this video has all the R code used in this tutorial, plus extra R code for students to explore on their own (statslectures.com/r-scripts-datasets). If you'd like to support us, you can Donate (bit.ly/2CWxnP2), Share our Videos, Leave us a Comment and Give us a Like! Either way, we thank you!

@AndreaLyuu 3 years ago

Hello Mike! Thanks for all the great content! Please help me with this question: If our hypothesis test were to be a one sided (H0:mean-c>=mean-m, H1: mean-c

@kaibecker1411 3 years ago

10:23 Minor mistake: 48.0 and 68.2 (#16) are larger than your test statistic

@auroralorenzi2957 5 years ago

Hi Marin. Thank you for your videos! It's not an easy topic to teach, but your videos are very clear.

@carlosalexandrecosta4234 4 years ago

Just amazing content. I'm studying Data Science and I can tell you, your videos are helping a lot! Thank you!

@yoyme_blumenfeld 1 year ago

Lovely video, thank you! Everything is clear, and I love that you showed us what is behind those functions, so one can experiment with other libraries with a good benchmark in hand.

@kahuilim2122 4 years ago

Thanks for taking the time to put these amazing videos together.

@marinstatlectures 4 years ago

You’re welcome

@cb4808 3 years ago

5:00 what do these numbers in the brackets mean? set.seed(112358)

@zjardynliera-hood5609 3 years ago

It's just an arbitrary number that sets the seed, putting the RNG in that state. Basically, if I set that seed on my machine, I would get the same results he is getting, and if I reran the code, I would get the same results each time.
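
To make the point about the seed concrete, here is a minimal sketch. The seed value is the one quoted from the video; the vector names and the choice of `sample(1:100, ...)` are only illustrative assumptions.

```r
# Setting the same seed before a draw reproduces the same random numbers.
set.seed(112358)
first_draw <- sample(1:100, size = 5, replace = TRUE)

set.seed(112358)
second_draw <- sample(1:100, size = 5, replace = TRUE)

# Same seed, same call, so the two draws match exactly.
identical(first_draw, second_draw)

# Without resetting the seed, the generator has moved on, so a further
# draw is not tied to the first two.
third_draw <- sample(1:100, size = 5, replace = TRUE)
```

Any integer works as a seed; what matters is that the same value is set before the same sequence of random calls.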

@TanMan1 4 years ago

at 5:35 in the video - if you want to bootstrap for multiple variables, how would you adjust the code?

@yinanxue8653 6 years ago

Thank you for the video. I have a question. How do you know that in the bootstrap data, the first 12 rows are weights of casein and the last 13 rows are weights of meatmeal?

@yinanxue8653 6 years ago

Nvm, I figured it out. Is it because H0 is that the difference between the means equals 0, so if H0 is true, we can pool the data?

@yekhtiari 6 years ago

I have the same question.

@marinstatlectures 5 years ago

For bootstrapping, we resample with replacement, so really it just matters that we have 12 observations for the one group and 13 for the other. We could just as easily label the first 13 rows as meat meal and the next 12 as casein. All that is really necessary is that we assign 12 of them to casein and 13 to meat meal; the order we assign them in doesn't matter. Hope that clarifies it.
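
The pooled resampling described in this reply can be sketched as follows. This is only a sketch under assumptions: it uses R's built-in `chickwts` data restricted to the casein and meatmeal feeds as a stand-in for the video's data frame `d`, and all variable names are illustrative. The group sizes are read from the data rather than hard-coded.

```r
# Assumption: the built-in chickwts data, restricted to two feed types,
# stands in for the data frame `d` used in the video.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))

n <- nrow(d)
n_casein <- sum(d$feed == "casein")

# Observed test statistic: absolute difference in mean weight.
test_stat <- abs(mean(d$weight[d$feed == "casein"]) -
                 mean(d$weight[d$feed == "meatmeal"]))

set.seed(112358)
B <- 10000

# Each column is one bootstrap sample drawn from the pooled weights.
boot_samples <- matrix(sample(d$weight, size = n * B, replace = TRUE),
                       nrow = n, ncol = B)

# Under H0 the feed labels carry no information, so the first n_casein
# rows are simply called "casein" and the remaining rows "meatmeal".
boot_stat <- numeric(B)
for (i in 1:B) {
  boot_stat[i] <- abs(mean(boot_samples[1:n_casein, i]) -
                      mean(boot_samples[(n_casein + 1):n, i]))
}

# Bootstrap p-value: share of resampled statistics at least as large as
# the observed one.
p_value <- mean(boot_stat >= test_stat)
```

Swapping which block of rows is called "casein" changes nothing in distribution, which is the point made in the reply: only the two group sizes matter.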

@gillianjean2237 5 years ago

@marinstatlectures Hi Mike, brilliant video but I'm still confused about this part. I have 45016 observations, 22049 in group A and 22967 in group B. How do I make sure that I'm sampling correctly from both groups?

@rhosigma2199 5 years ago

@marinstatlectures Thank you for the video. I also have the same question. BootstrapSamples = matrix(sample(variable, size = n*B, replace = TRUE), nrow = n, ncol = B) Does this function ensure that the BootstrapSamples have 13 rows of the meat meal and 12 rows of the casein?

@niceday2015 2 years ago

Thank you very much, Your teaching has expanded my perception of the world

@MrSoumyaBanerjee 4 years ago

Thank you for the video, but by resampling from d$weight, won't it cause the casein and meatmeal observations to get jumbled up in most cases? Wouldn't it be better to create 2 separate dataframes for each set of weight values, and resample separately from these 2?

@marinstatlectures 4 years ago

Good question. Because in a hypothesis test we begin by assuming there is no difference in weight between the two groups, we want them all mixed together, to see how often we'd observe a difference as large as or larger than the one we saw if there really is no difference between the groups. When building a confidence interval, however, we want to keep the groups' observations separate, to preserve any group differences observed. Hope that clarifies it.
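
For contrast with the pooled test, here is a hedged sketch of the confidence-interval case, where each group is resampled separately. The built-in `chickwts` data again stands in for the video's data, and the percentile interval is an assumption made for illustration; the thread does not say which interval method the video uses.

```r
# Assumption: chickwts (casein vs. meatmeal) stands in for the video's data.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
casein <- d$weight[d$feed == "casein"]
meatmeal <- d$weight[d$feed == "meatmeal"]

set.seed(112358)
B <- 10000
boot_diff <- numeric(B)
for (i in 1:B) {
  # Resample within each group so any real group difference is preserved.
  boot_casein <- sample(casein, size = length(casein), replace = TRUE)
  boot_meatmeal <- sample(meatmeal, size = length(meatmeal), replace = TRUE)
  boot_diff[i] <- mean(boot_casein) - mean(boot_meatmeal)
}

# Assumption: a percentile-method 95% interval for the difference in means.
ci <- quantile(boot_diff, probs = c(0.025, 0.975))
```

The only structural change from the hypothesis test is where the resampling happens: from the pooled weights for the test, from each group's own weights for the interval.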

@SinanMavruk 2 years ago

@marinstatlectures Thank you for the video and explanations. You made the point very clear for me. But in this case, is the only difference between permutation and bootstrapping the replacement in resampling? If so, how do we decide which test is better in application?

@farahyounes2813 3 years ago

Thank you for your explanation. For a time series, how can we apply the bootstrap method to compare two spectral densities?

@JJ-wx3nd 3 years ago

Thank you so much! Especially after I found that the R documentation for boot() doesn't explain enough.

@feloria1862 3 years ago

When you resample at 6:20 does the resample keep the feed type variables separate? Or do weight values from either feed type resample to anywhere in the 23 observations?

@feloria1862 3 years ago

I figured it out: the feeds are resampled to any position because the null hypothesis is that the two populations are the same, so swapping them to any position is fine even if the feed type doesn't match.

@nevertheless4504 1 year ago

Hi sir, your video really helps me a lot. However, I have a question. When we want to do a one-sided test, we just need to delete the abs(), right? And when doing the comparison, mean(Boot.test.stat1 >= test.stat1), do we use only < or > instead of >=? Am I right?

@sohanaryal 2 years ago

How can we randomly assign first 11 to one type of feed and remaining to other type of feed in bootstrap matrix?

@forestalgarcia1506 4 years ago

If I have more than two "diets", how to calculate the absolute difference of means?

@alinevm4915 2 years ago

such a didactical video, thanks a lot!

@xdienn 3 years ago

Great video, thank you so much! I'm still wondering though, how to proceed if you have a 2x2 factorial design? Do you then calculate 4 test statistics, one for each group? And for interaction effects?

@b.ambrozio 4 years ago

Thanks for the video! A question about your last statement: "Any time doing a hypothesis test, we should also include a confidence interval to give an idea of how big the difference would be". Does it mean I should run my t-test against the CI instead? (E.g. calculate the CI of all my 1000 arrays, and do the t-test against the means from the CI? In other words, should I use the means calculated from the range between the first and last quantile in my t-test?)

@munafahmed725 4 years ago

In mean.default(BootstrapSamples[1:12, i]) : argument is not numeric or logical: returning NA How to solve this error?

@juanfranciscoecheverry3031 4 years ago

I have the same problem. thanks a lot!

@gbganalyst 6 years ago

Can we then interpret the results of Bootstrapping with the way we interpret the result of independent t-test?

@marinstatlectures 6 years ago

For the most part you can. The beauty of the bootstrap though is that you can also work with more interesting/relevant statistics, aside from just mean/median, which the classical approaches use. You can work with just about any statistic/estimate you can imagine

@Rainstorm121 3 years ago

Thanks very much, sir. If I have a random distribution of scores for each variable as follows: A=7, B=13, C=23, D=19, E=15, F=30, and I want to do hypothesis testing to find out which of the variables has a statistically significant score, what is the best advice on using bootstrapping in this situation? Given that H0: the expected probability for each of the variables is equal to 0.12, and H1: it is not equal to 0.12.

@Loggies89 3 years ago

You lost me at the i=1 and i=2 bit. Is there a step you aren't showing where these are created? I'm getting an error saying object 'i' not found, so I assume I have to create it at some point before entering it into the boot test statistic.

@gruppenzwangimweb20 5 years ago

nice video! btw... this BootstrapSamples

@duncanrager7180 5 years ago

Hi Mike, thanks for the helpful video. In this case, the first test statistic is the same as used in a two-sided, two-sample T-test. As an alternative, to use the same test statistic as a one-sided, two-sample T-test, would that be the difference of the mean weight for the two diets (not the absolute difference)?

@marinstatlectures 5 years ago

yes, that's right
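
A one-sided version along the lines confirmed here might look like the sketch below: the statistic is the signed difference in means, with no `abs()`. As before, `chickwts` is an assumed stand-in for the video's data and the names are illustrative.

```r
# Assumption: chickwts (casein vs. meatmeal) stands in for the video's data.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
n <- nrow(d)
n_casein <- sum(d$feed == "casein")

# Signed (not absolute) observed difference: casein minus meatmeal.
obs_diff <- mean(d$weight[d$feed == "casein"]) -
            mean(d$weight[d$feed == "meatmeal"])

set.seed(112358)
B <- 10000
boot_samples <- matrix(sample(d$weight, size = n * B, replace = TRUE),
                       nrow = n, ncol = B)

# Signed difference for every bootstrap column at once.
boot_diff <- colMeans(boot_samples[1:n_casein, ]) -
             colMeans(boot_samples[(n_casein + 1):n, ])

# H1: mean(casein) > mean(meatmeal), so count the upper tail.
p_upper <- mean(boot_diff >= obs_diff)

# H1: mean(casein) < mean(meatmeal), so count the lower tail.
p_lower <- mean(boot_diff <= obs_diff)
```

Which tail to use follows from the direction stated in the alternative hypothesis, and should be chosen before looking at the data.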

@alinepontes7360 4 years ago

Thanks for the video! It was the only way for doing a hypo test for a complex dataset. The boot package was not enough. BTW, is there a recommended citation for this boot method (e.g. a book)?

@mathieufen2239 4 years ago

Very cool video! Thanks! I wonder if this approach could be used on paired data...

@marinstatlectures 4 years ago

Definitely, the concept of bootstrapping can be used for just about any structure of data. I explained it simply here, but the concept transfers very widely

@parvenraj98 4 years ago

You are the best !!!!!

@chathuraedirisuriya6535 6 years ago

The Bootstrap Hypothesis Testing R script link points to the wrong file. Please correct it.

@marinstatlectures 6 years ago

Thanks for letting us know. It should point to the correct file now.

@MsWilliam63 5 years ago

Hi Mike, great videos. Really clear and helpful. I have two questions. What is the best way to report these results in text? Is it best just to report the bootstrapped difference in means and SD and p value (e.g., observed stat = X, mean ± SD, p-value=X)? Is it possible to combine a t-test or Mann-Whitney U with the bootstrapped data in order to get a t-stat as well as the p-value for your difference in means/medians?

@marinstatlectures 5 years ago

It's hard to say what the best way to report is, as that really depends on context: what the discipline is, what the focus of the paper was, etc. Regarding approaches, you can certainly combine this sort of approach with a parametric one like the t-test. For example, you may wish to use a bootstrap only to estimate the SE of your estimate, and then substitute this estimate into a standard approach like a t-test, e.g. Confidence interval: Estimate +/- t * BootstrapSE. This of course requires the assumptions of the standard t-test/confidence interval approach to be met.
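
The "Estimate +/- t * BootstrapSE" idea from this reply can be sketched as below. Several pieces are assumptions made only for illustration: `chickwts` stands in for the data, the SE is bootstrapped by resampling within each group, and the degrees of freedom are taken as n1 + n2 - 2 (the reply does not specify them).

```r
# Assumption: chickwts (casein vs. meatmeal) stands in for the video's data.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
casein <- d$weight[d$feed == "casein"]
meatmeal <- d$weight[d$feed == "meatmeal"]

# Point estimate: observed difference in means.
estimate <- mean(casein) - mean(meatmeal)

set.seed(112358)
B <- 10000

# Bootstrap distribution of the difference, resampling within each group.
boot_diff <- replicate(
  B,
  mean(sample(casein, replace = TRUE)) - mean(sample(meatmeal, replace = TRUE))
)

# Bootstrap standard error: SD of the bootstrap estimates.
boot_se <- sd(boot_diff)

# Assumption: pooled degrees of freedom, n1 + n2 - 2.
df <- length(casein) + length(meatmeal) - 2
t_crit <- qt(0.975, df = df)

# 95% interval: Estimate +/- t * BootstrapSE.
ci <- estimate + c(-1, 1) * t_crit * boot_se
```

As the reply notes, this hybrid interval still leans on the usual t-interval assumptions; only the standard error comes from the bootstrap.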

@mbellett74 4 years ago

If the column "feed" is not ordered (with respect to meat meal and casein), how do I order it before running the boot.test.stat? Many thanks.

@marinstatlectures 4 years ago

You don’t necessarily need to order it, but you can do that with the sort() command. You can also use the tidyverse arrange() command as well
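
As a hedged illustration of the ordering step in base R: `sort()` works on a single vector, while reordering the rows of a data frame is usually done by indexing with `order()`. The sketch below shuffles the assumed `chickwts` stand-in data to mimic an unordered file; the names are illustrative. With dplyr loaded, `arrange(shuffled_d, feed)` would be the tidyverse counterpart.

```r
# Assumption: chickwts (casein vs. meatmeal) stands in for the video's data.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))

# Shuffle the rows to mimic a data set where the two feeds are mixed.
set.seed(112358)
shuffled_d <- d[sample(nrow(d)), ]

# Base R: order() returns the row positions that sort by feed.
ordered_d <- shuffled_d[order(shuffled_d$feed), ]

# Ordering is not required to compute group summaries: logical indexing
# picks out each group wherever its rows happen to sit.
casein_weights <- shuffled_d$weight[shuffled_d$feed == "casein"]
meatmeal_weights <- shuffled_d$weight[shuffled_d$feed == "meatmeal"]
```

Logical indexing is often the safer route for the observed statistic, since it does not depend on hard-coded row ranges.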

@mbellett74 4 years ago

@marinstatlectures Many thanks Mike, I did it using arrange(). Maybe I have to do it because in the boot.test command I have to define two groups of rows to compare: abs(median(bootstrapsamples1[1:938, i]) - median(bootstrapsamples1[939:1511, i] , ... and in my dataset the two groups of events to compare are mixed. Again, many thanks!!

@santiagomendozapaz2135 4 years ago

@marinstatlectures Maybe I am misunderstanding the figure, but if the matrix we resample from contains 12 values for type 1 and 11 values for type 2, and we apply the resampling directly to the 23 values, the resulting resampled matrix is going to contain 23 values drawn randomly from both types. So why are you taking the means of [1:12] and [13:23], when in the resulting matrix we are not sure whether type 1 is contained in [1:12] or type 2 in [13:23]?

@veducatube5701 4 years ago

Sir, I need to bootstrap spatial point data, meaning I have 10 values with lat, long and a z. I need to bootstrap pairs of xy in a defined region (shapefile). Can you help? Regards from India

@marinstatlectures 4 years ago

It's difficult to answer without knowing exactly what your data looks like, but it sounds like you will want to resample entire rows of your data.

@veducatube5701 4 years ago

@marinstatlectures Thank you for replying, sir. I'm giving you some dummy data:

lat    long   water table (depth in m)
29     79     23
28.45  78.30  21
27     77.45  25
30.30  79.02  26
31     77     22
25.45  80.30  32

Assume that all these original points of latitude and longitude with water table values fall in a district (the boundary line of this district is in a map file format called .shp or ESRI shapefile). Sir, I want to bootstrap these three columns so that I may have more geographic points for the water table in my district. That is possible only if the latitudes and longitudes do not fall outside the district boundary or shapefile, meaning the lat/long column values must remain contained within the shapefile's latitudes and longitudes. Sir, it's very crucial for me. Please guide me or share some code with me. Thank you
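
Resampling entire rows, as suggested in the earlier reply, can be sketched with the dummy values from this comment. The data frame name `wells` and the column name `depth_m` are illustrative assumptions. Note that this only redraws the observed points (so every resampled location is one of the original locations); it does not create new coordinates.

```r
# Dummy values copied from the comment above; names are illustrative.
wells <- data.frame(
  lat     = c(29, 28.45, 27, 30.30, 31, 25.45),
  long    = c(79, 78.30, 77.45, 79.02, 77, 80.30),
  depth_m = c(23, 21, 25, 26, 22, 32)
)

set.seed(112358)

# Draw row indices with replacement, then take whole rows, so each
# lat/long pair stays attached to its own water table depth.
rows <- sample(nrow(wells), size = nrow(wells), replace = TRUE)
boot_wells <- wells[rows, ]

# Repeating this B times gives a bootstrap distribution of any statistic,
# for example the mean depth.
B <- 1000
boot_mean_depth <- replicate(
  B,
  mean(wells$depth_m[sample(nrow(wells), replace = TRUE)])
)
```

Because whole rows are drawn, the resampled points stay inside the district boundary whenever the original points do.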

@sunayana98 5 years ago

Hi Marin, when I'm trying to find the test stats of bootstrap samples, R is telling me 'i' is not found. What do I do?

@marinstatlectures 5 years ago

It's difficult to tell without knowing the code you've entered, etc., but it sounds like this part of the code is not in a loop that runs over i = 1, 2, ..., B. The "i" references the iteration number in the loop, and R cannot see what i is, so it sounds like you either haven't initiated a loop or that command is outside of the loop.

@sunayana98 5 years ago

@marinstatlectures I've typed the command exactly like how you've typed it, i.e., in the square brackets. However, it says 'i is not found'. Is there an alternate command?

@pilobond 5 years ago

@sunayana98 I had the same problem, but then I realized I had typed "for (i in i:B)" instead of "for (i in 1:B)" by mistake. Once this was corrected it ran fine. I wonder if you have the same problem.
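
To illustrate the `i` issue discussed in this thread: `i` is created by the `for` loop itself, so the indexed expression has to run inside `for (i in 1:B)`. The sketch below also shows a loop-free alternative with `colMeans()`, which avoids `i` entirely. The `chickwts` stand-in data and all names are assumptions for illustration.

```r
# Assumption: chickwts (casein vs. meatmeal) stands in for the video's data.
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))
n <- nrow(d)
n1 <- sum(d$feed == "casein")

set.seed(112358)
B <- 1000
boot_samples <- matrix(sample(d$weight, size = n * B, replace = TRUE),
                       nrow = n, ncol = B)

# Loop version: `i` is defined by the loop, starting at 1 (not at `i`).
boot_stat <- rep(NA_real_, B)
for (i in 1:B) {
  boot_stat[i] <- abs(mean(boot_samples[1:n1, i]) -
                      mean(boot_samples[(n1 + 1):n, i]))
}

# Loop-free alternative: colMeans() computes every column's mean at once.
boot_stat_vec <- abs(colMeans(boot_samples[1:n1, ]) -
                     colMeans(boot_samples[(n1 + 1):n, ]))
```

Both versions compute the same statistic per column and should agree up to floating-point rounding.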

@damianspencer 4 years ago

Do you do consultations? Please contact me.

@marinstatlectures 4 years ago

It depends on the work. I have no way of contacting you. You can get in touch with me if you like, my contact info is in the about section of our channel