Bootstrap in Stata

Views: 32,475

Econometrics, Causality, and Coding with Dr. HK

A day ago

This video will talk about some of the basics of bootstrapping, which is a handy statistical tool, and how to do it in Stata.

Comments: 71

@I.amBago 4 months ago

First 3 minutes were exactly what I wanted to know. Thank you!

@aysecetinel 2 years ago

This was super helpful!!! I first bootstrapped using bsample and then ran a multi-level fixed effect using reghdfe in the loop. It works great! I noticed by default it sets the obs to be equal to the size of the dataset that you are sampling from. It also lets you oversample by setting an obs greater than the size of the dataset. I also tried bootstrapping by using the command bootstrap, reps(#): then reghdfe. This by default lets you specify the obs number to sample as equal to the number of clusters in the dataset. Thank you again for creating content and sharing! Looking forward to reading your book and hope that you'll have workshops tailored to grad students around the world.

@jessyjkn 3 years ago

Omg you literally SAVED MY LIFE!!!!! Thank you Thank you Thank you!!!!!!

@tarantula6649 1 year ago

Very helpful video! Thanks a lot!

@pablovelazquez1903 6 years ago

Thank you for this clear explanation.

@lifehappy217 3 years ago

Hi, Nick. Thank you so much for the nice video. I am doing panel regression, and wondering whether it is possible to use bootstrap to get the confidence intervals for the panel model using Stata (or R).

@NickHuntingtonKlein 3 years ago

Yep! That's a different goal than in this video though. See www.stata.com/support/faqs/statistics/bootstrap-with-panel-data/

@lifehappy217 3 years ago

@@NickHuntingtonKlein Thank you so much. This is what I want to learn. It is helpful.

@yasmindoghri9175 2 years ago

Thank you very much for this video!! I was wondering if I could use bootstrap with different samples. To construct an index, I merged in data from a different dataset (I extracted mean values per variable from the latter, since it is way larger than my sample). I would like to check whether the final index measurement is influenced by the size of the external sample. So I treated my sample as the original dataset, and after preserve I loaded the external dataset, whereas in the loop I put the distance index formula. Yet, once I run it, it says already preserved. What am I getting wrong?

@NickHuntingtonKlein 2 years ago

There's a bit too much in here for me to follow it, but if you're getting an already-preserved error, that means that you tried to preserve twice in a row without a restore in between. So make sure each preserve is matched by a restore, or if you want to clear out your last preserve without restoring, use "restore, not".
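The preserve/restore pairing described above can be sketched as follows. This is only an illustration on Stata's built-in auto dataset, which is an assumption here and not the commenter's data:

```stata
* Minimal sketch: the auto dataset and the variable mpg are stand-ins
sysuse auto, clear

forvalues i = 1/3 {
    preserve                  // snapshot the data currently in memory
    bsample                   // replace it with a bootstrap resample
    quietly summarize mpg
    display "resample `i': mean mpg = " r(mean)
    restore                   // bring the snapshot back before the next preserve
}

* Discarding a snapshot without going back to it:
preserve
keep if foreign == 1
restore, not                  // cancels the preserve; the subset stays in memory
```

Calling preserve a second time before either restore or "restore, not" is what triggers the already-preserved error.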

@gabriellanocita4239 3 years ago

Thanks for the video! I'm wondering if bootstrapping can be used to run an MLM model with random effects predictors in Stata?

@NickHuntingtonKlein 3 years ago

It sounds like you're looking for bootstrapped standard errors, which is something a bit different than this video is about. But yes you can apply bootstrap SEs to any model in Stata, see www.stata.com/features/overview/bootstrap-sampling-and-estimation/
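For readers who want to see what bootstrapped standard errors look like in practice, here is a hedged sketch of the two usual routes in Stata. The dataset and the regression are arbitrary stand-ins, not the multilevel model asked about:

```stata
* Illustrative sketch on Stata's built-in auto data; the model is an assumption
sysuse auto, clear

* Option form: usual point estimates, bootstrap standard errors
regress mpg weight foreign, vce(bootstrap, reps(200) seed(12345))

* Prefix form: same idea, usable with commands that lack a vce(bootstrap) option
bootstrap _b, reps(200) seed(12345): regress mpg weight foreign
```

For a multilevel model the resampling would normally need to respect the grouping structure, so the linked Stata documentation is worth reading before adapting this.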

@mikecheng6010 4 years ago

Hi thank you so much Nick! If I wanna get the coefficient for each iteration, what should I do?

@NickHuntingtonKlein 4 years ago

If you are running a regression in your bootstrap you can pull a coefficient out and store it in a local (just like in the code in the video). The way to refer to a coefficient after running the regression is with _b[x], where x is the name of the variable you want the coefficient for
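A rough sketch of that idea follows. It is not the code from the video: it collects the coefficients with postfile (a swapped-in technique), and the dataset, variables, and names such as b_weight are assumptions chosen for illustration:

```stata
* Sketch: keep one regression coefficient per bootstrap iteration
sysuse auto, clear
set seed 12345
local boots 200                      // number of bootstrap iterations

tempname results
tempfile coefs
postfile `results' b_weight using `coefs'

forvalues i = 1/`boots' {
    preserve
    bsample                          // resample observations with replacement
    quietly regress mpg weight
    local b = _b[weight]             // pull the coefficient into a local
    post `results' (`b')             // append it to the results file
    restore
}
postclose `results'

use `coefs', clear
summarize b_weight                   // one row per iteration
```

The standard deviation that summarize reports for b_weight is the bootstrap standard error of that coefficient.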

@mikecheng6010 4 years ago

@@NickHuntingtonKlein Got it thank you so much it works perfectly.

@aibannongspung1765 2 years ago

Hi Nick. Thank you so much for the insightful video. I have a question to ask you. I am running a regression model and have also added weights to it (e.g. I used aw = wt) since the survey data comes with a survey weight/multiplier. I need to bootstrap the model and report the standard errors thereafter. However, I cannot use the same weights while bootstrapping. Is there a way around this issue? Will the standard errors generated without weights after bootstrapping be significantly different from the standard errors of the regression model with the weight?

@NickHuntingtonKlein 2 years ago

The downloadable package bsweights will help you do this

@aibannongspung1765 2 years ago

@@NickHuntingtonKlein Thank you for the reply. I just want to mention that the survey data that I am using does not have replicate weights. From what I understand, bsweights are helpful when the survey data also includes replicate weights. Can bsweights be used to manually generate these replicate weights for survey data without them?

@NickHuntingtonKlein 2 years ago

@@aibannongspung1765 oh I see. Maybe look at svy bootstrap. The replicate weights refer to the weights you get from bootstrap www.stata.com/manuals/svysvybootstrap.pdf

@aibannongspung1765 2 years ago

@@NickHuntingtonKlein Thank you Nick. I will give it a try.

@kangkana1354 3 years ago

Thank you so much Nick. I have a query on whether bootstrapping can be used on a survey weighted data set, which uses a svy command before a regression. If yes, how can the codes be modified?

@NickHuntingtonKlein 3 years ago

If you're just trying to get bootstrapped SEs, look at the "svy bootstrap" help file

@kangkana1354 3 years ago

@@NickHuntingtonKlein Thank you so much. I am going through the file currently to clear the basics.

@ProfessorAliAhmed 4 years ago

I am using the Stata KCDF function and then using the variable generated from it in my regression model. Since my variable is estimated, I have to bootstrap the process. I am able to do the looping and bootstrapping based on your method, but I am not able to use the generated bootstrapped variable in the model to get bootstrapped standard errors. Any suggestions would be very helpful. Thank you.

@NickHuntingtonKlein 4 years ago

Just take the standard deviation of your bootstrapped coefficient (for example, with the summarize command). That's the bootstrap standard error.

@ProfessorAliAhmed 4 years ago

@@NickHuntingtonKlein Thank you Nick!

@nandinimishra2149 1 year ago

Nice Job Nick 💓💓💓💓

@nandinimishra2149 1 year ago

Could you share your ID so I can ask about a Stata problem?

@evahakobjanyan8528 5 years ago

Great video, I have a question. I did exactly what you show in the video, but without g x normal, because I already had data. But an error happens every time: "invalid obs no". What does it mean?

@NickHuntingtonKlein 5 years ago

The "set obs" command is for the purpose of creating the fake data, you don't need it if you already have data, and it will produce that error.

@evahakobjanyan8528 5 years ago

@@NickHuntingtonKlein Do I need the g store_means that you write before the word 'quietly'?

@NickHuntingtonKlein 5 years ago

@@evahakobjanyan8528 You need some sort of variable to store the results in, yes.

@ataliethompson6725 4 years ago

How does one get a bootstrap 95CI and p-value for the difference in two proportions, particularly in multilevel data? I have dataset where eyes are nested within subjects. I want to show that the proportion of var1 is significantly different from the proportion of var2, and since the data is multilevel I'm assuming bootstrap 95CI and p value would be the way to address this?

@NickHuntingtonKlein 4 years ago

For multilevel data you generally want to do bootstrap sampling by cluster. Once you do that, just store all the ratio estimates from all the bootstrap iterations. The 2.5th and 97.5th percentiles of the estimates are your confidence interval.

@ataliethompson6725 4 years ago

@@NickHuntingtonKlein How does one bootstrap for the difference in two proportions (as opposed to a mean)?

@NickHuntingtonKlein 4 years ago

@@ataliethompson6725 that's the beauty of bootstrap - just calculate whatever it is you want to calculate in each of the bootstrap samples. So calculate the difference in proportions
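As a hedged illustration of this exchange, the sketch below bootstraps a difference in two proportions while resampling whole clusters, then reads off a percentile confidence interval. The data are simulated and every name (subject, var1, var2, diff) is hypothetical:

```stata
* Sketch: cluster bootstrap of a difference in proportions, fake data
clear
set seed 2024
set obs 200                           // 200 hypothetical subjects
generate subject = _n
expand 2                              // two eyes per subject
generate var1 = runiform() < 0.40     // binary indicator 1
generate var2 = runiform() < 0.30     // binary indicator 2

tempname results
tempfile diffs
postfile `results' diff using `diffs'

forvalues i = 1/500 {
    preserve
    bsample, cluster(subject)         // resample subjects, keeping both eyes together
    quietly summarize var1
    local p1 = r(mean)
    quietly summarize var2
    local p2 = r(mean)
    post `results' (`p1' - `p2')      // statistic of interest in this resample
    restore
}
postclose `results'

use `diffs', clear
_pctile diff, percentiles(2.5 97.5)
display "95% percentile CI: [" r(r1) ", " r(r2) "]"
```

Swapping in a different statistic only means changing what is computed between bsample and post.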

@andreab2114 4 years ago

What if I have missing values or a multiply imputed dataset?

@NickHuntingtonKlein 4 years ago

Missing values you just keep using as normal. For multiple imputation you could bootstrap each imputation separately. There might even be a special MI bootstrap in Stata 16, I'm not sure, they added a bunch of MI stuff.

@user-cr7hy7sr7s 3 years ago

Thank you so much for your wonderful video! I just registered this channel as my favorite. Thanks. I'm wondering if I could use this in the regression command. In each loop, I opened the original dataset, ran the regression command and obtained the coefficient. Then I aggregated the results of each resampling. (I mean I calculated the mean and sd of the coefficient.) Am I right?

@NickHuntingtonKlein 3 years ago

Yep, that works

@user-cr7hy7sr7s 3 years ago

@@NickHuntingtonKlein Thanks! Your videos went viral in my community!

@user-cr7hy7sr7s 3 years ago

@@NickHuntingtonKlein By the way, in Stata software, the bootstrap command can also work but the coefficients do not change and only standard errors change. I could not understand why.

sysuse auto, clear
regress mpg weight gear foreign
regress mpg weight gear foreign, vce(bootstrap, rep(1000))

In the second command, you can get the coefficient and SE. But the coef is actually the same as the original model. What is the difference?

@NickHuntingtonKlein 3 years ago

@@user-cr7hy7sr7s The second command is estimating the coefficient by regular OLS and only the standard errors by bootstrap. This is actually a good idea if you plan to use them for hypothesis tests, as it helps any hypothesis tests done after the fact be sure they're comparing the right things.

@user-cr7hy7sr7s 3 years ago

@@NickHuntingtonKlein Thank you very much! Got it! Now I understand the mechanism. Much appreciate it. I am working on prediction model development and I wanted to learn how to perform internal validation using the bootstrap resampling method. I guess your program would work to calculate the optimism statistics to evaluate the prediction model based on the regression models. Aren't you going to make some video on this topic??

@QuynhNguyen-ij6fe 4 years ago

Can you guide using bootstrap with xtabond2? Thanks

@NickHuntingtonKlein 4 years ago

For bootstrap SEs? I'm not certain that the bootstrap standard error assumptions are justified in the Arellano-Bond case. But in any case you should be able to apply the guide on this page about bootstrapping in a panel/ts setting www.stata.com/support/faqs/statistics/bootstrap-with-panel-data/

@YorgosEU 5 years ago

I am doing a cost-effectiveness analysis for costs and health benefit. From my data I calculated an average cost and an average effect per treatment arm in order to calculate the ICER. Then my supervisors told me that this is not enough and that I need to do bootstrapping... I know how, but I DO NOT HAVE A CLUE WHY I need to do this. Does anyone know? THANKS!!

@NickHuntingtonKlein 5 years ago

I would recommend posting this question in more detail on StackExchange

@YorgosEU 5 years ago

@@NickHuntingtonKlein thanks Nick

@alisadavtyan2133 5 years ago

What command should I change if I already have an existing variable? This part: g X=rnormal(4)*2+4

@NickHuntingtonKlein 5 years ago

Bootstrapping over an existing variable? It should all work the same, you can just skip generating a new variable and use the old one.

@alisadavtyan2133 5 years ago

@@NickHuntingtonKlein And what about set obs 10000? Should I write my obs number?

@NickHuntingtonKlein 5 years ago

@@alisadavtyan2133 Everything before the "save originaldata.dta" line is just me creating the fake data, you don't need it. You can just open up your existing data instead.

@alisadavtyan2133 5 years ago

@@NickHuntingtonKlein And is local boots the number of my obs?

@NickHuntingtonKlein 5 years ago

@@alisadavtyan2133 That's the number of bootstrap iterations

@justalice5139 5 years ago

What if it shows "floor not found"?

@NickHuntingtonKlein 5 years ago

That suggests there's an error in the line with floor in it. Remember, floor is a function, not a variable. So floor() is correct, not floor () or floor*()

@HE-gw2gr 10 months ago

How to implement the Kónya (2006) bootstrap panel Granger causality approach in Stata? Please help me 😢

@NickHuntingtonKlein 10 months ago

No idea! Never heard of it. If it were me I'd Google for it.

@HE-gw2gr 10 months ago

@@NickHuntingtonKlein Thank you. Of course I searched, unfortunately I couldn't find it.

@diverdown0011 6 years ago

Could you provide the do file. I keep getting an error

@NickHuntingtonKlein 6 years ago

Walter Chin I'm afraid I didn't keep the do file. It's just the same code you can see in the video though.

@diverdown0011 6 years ago

Thanks for taking the time to reply. I figured it out. There was a minor issue in the code I entered. The boot code is working. Would you happen to know how this can be done for nested data? I have diving data with parameters of depth and bottom time (how deep and how long). These dives belong to a group of 17 small-scale fishermen divers. Each fisherman conducted 100-400 dives per year. My goal is to get a good understanding of their average depth and bottom time. The dives are nested within each fisherman. The averages per fisherman have a lot of variance. Anyway, any help is greatly appreciated.

@NickHuntingtonKlein 6 years ago

Walter Chin There are two ways to go about this depending on what you want to do with it. One uses the "strata" option of bsample, and the other uses the "cluster" option (see help bsample). Strata does a bootstrap such that you are resampling within fishermen (ie fisherman A did ten trips and B did 16, so you resample from A ten times and B 16 times). Cluster resamples at the fisherman level (ie it will resample from fisherman A and fisherman B, picking all the trips that fisherman goes on). If the problem is that there's a lot of noise within fishermen, you probably want the strata option, but I'd recommend looking closer at the help file for more details.
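The two options can be compared side by side in a small sketch. The diving data here are simulated, and the names fisherman, ndives, and depth are hypothetical:

```stata
* Sketch: strata() versus cluster() in bsample, on fake nested data
clear
set seed 42
set obs 17                                       // 17 hypothetical fishermen
generate fisherman = _n
generate ndives = 100 + floor(301*runiform())    // 100 to 400 dives each
expand ndives                                    // one row per dive
generate depth = 15 + fisherman + rnormal(0, 5)  // made-up depths

preserve
bsample, strata(fisherman)      // resample dives within each fisherman
quietly summarize depth
display "strata resample, mean depth:  " r(mean)
restore

preserve
bsample, cluster(fisherman)     // resample whole fishermen with all their dives
quietly summarize depth
display "cluster resample, mean depth: " r(mean)
restore
```

With strata() every fisherman appears in each resample with the same number of dives as in the original data, while with cluster() some fishermen are drawn several times and others not at all.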

@ASMTowhid 6 years ago

Could you please help me? My code is not working. It's showing the following error: . set obs 'boots' ''' invalid It is not an integer or its value is too large.

@hamaybe 4 months ago

@@ASMTowhid The first apostrophe should be a backtick (the key next to the 1 key), i.e. `boots'. It is an annoying feature of specifying locals.
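A short sketch of the quoting rule, assuming a local named boots as in the thread:

```stata
* Sketch: referencing a local macro needs a backtick, then an apostrophe
clear
local boots 1000

set obs `boots'        // opening backtick, closing apostrophe: expands to 1000
display _N

* set obs 'boots'      // apostrophes on both sides: not expanded, so Stata
*                      // sees the literal text and reports an error
```

Run this from a do-file or select all the lines together, since a local defined in one interactive run is not visible in the next.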