Difference in Differences Estimation in Stata

206,663 views

SebastianWaiEcon


Comments: 215
@wm6698 3 years ago
Thank you so much for this! My concern is why didn't you run a complete regression model for house price? Why only a bivariate regression? (i.e., dependent and dummies).
@sebastianwaiecon 3 years ago
The purpose of this video is to demonstrate the basic technique of differences in differences estimation. You can certainly add controls to the basic model, but that is outside the scope of this video. You can search my channel for other videos on that.
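A minimal sketch of what adding controls could look like, assuming the KIELMC data from the Wooldridge textbook is saved in the working directory. The choice of controls here (age, agesq, rooms, area, land, baths) is purely illustrative and is not taken from the video.

```stata
* Sketch only: assumes KIELMC.dta is in the current working directory
use KIELMC, clear

* Basic diff-in-diff: time dummy, treatment-group dummy, interaction
regress price y81 nearinc y81nrinc

* Same model with illustrative controls appended to the variable list
regress price y81 nearinc y81nrinc age agesq rooms area land baths
```

The coefficient on y81nrinc remains the diff-in-diff estimate in both regressions; the controls only change what is held fixed when it is estimated.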
@d.gondwe 1 month ago
This video formed the backbone of my masters degree dissertation some 5 years ago. And guess what! I got the honourable mention for best thesis award! Eternally grateful
@tdogz8932 4 years ago
I watched the video 2 years ago, it helped me understand the DID model and Stata so that I could finish my graduation dissertation on time. After the graduation, I published another paper using the same model, thank you sooooooooo much!!!!!!!!
@fgghdfg8638 3 years ago
Can you help me do my diff in diff? Otherwise I will miss my year. We can talk about price.
@tdogz8932 3 years ago
@fgghdfg8638 I'm sorry that I only saw your message now. Hope you are doing fine with your dissertation :)
@amnashaukat7827 3 years ago
Can you help me in this technique?
@tamandanikuchanje260 2 years ago
Hello can you help me?
@dargon1084 2 years ago
I learnt more in this video than six 2-hour videos of my own uni's lectures
@davecullins1606 4 years ago
You saved my exam in the previous semester, and you're saving me in this semester as well!
@jonaFUN999 3 years ago
I’m from Andover, England and I approve this video 👍
@dandellionsy6537 4 years ago
Thank you so much, I need it. My model might be more complicated but at least I can sense the idea of doing it. Awesome! Keep sharing more
@huangkiana6165 3 years ago
THIS VIDEO SAVED ME FROM MY DEADLINE. THANK YOU SO MUCH *cry
@sylvieyin5261 3 years ago
Thank you so much. This video makes my HW much easier.
@danielkrupah 3 years ago
Sir, do you provide a paid service for DD? I need a coach.
@simonazambelli5320 3 years ago
Thank you very much. You explained everything very clearly! Thanks
@simonazambelli5320 1 year ago
Love it! Thank you Sebastian!!
@timothyowuor9478 3 years ago
Nice tutorial on DID, thanks for saving me
@sireenkhalili8631 3 years ago
Thank you so much for this video, it was really helpful!
@sajidnoor9482 4 years ago
Thank you very much for explaining this very clearly.
@lVaNeSsA90 3 years ago
What did you use the rprice and lrprice variables for?
@MrAdhoul 2 years ago
Great video, thank you.
@samknight7290 5 years ago
Hi Sebastian, thank you very much for the video. Just wondering why you did not regress the other independent variables?
@sebastianwaiecon 5 years ago
I wanted to keep things simple and focus just on the diff in diff technique. However, you can certainly add more variables to the regression as controls.
@nazlcaneroglu4427 4 years ago
Thank you for the video! Btw is there any way that we can also see the trends of both groups by drawing a line graph in Stata? If the trends are same before the treatment period, we should be able to see that right?
@sebastianwaiecon 4 years ago
Yes, you can use a twoway graph to do that.
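A rough sketch of such a graph, assuming a dataset with several years of data. The variable names outcome, year, and treated are hypothetical; the two-period data from the video is too short for this to be informative.

```stata
* Sketch only: outcome, year, and treated are hypothetical variable names
preserve
collapse (mean) outcome, by(year treated)

* One line per group; similar pre-treatment slopes support parallel trends
twoway (line outcome year if treated == 0, sort) ///
       (line outcome year if treated == 1, sort), ///
       legend(order(1 "Control" 2 "Treatment")) ///
       ytitle("Mean outcome") xtitle("Year")
restore
```

The preserve and restore pair keeps the original observations intact, since collapse replaces the data in memory with the group means.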
@user-vb3do7hh9v 2 years ago
Thanks, very well explained. Can I get this dummy data set, or can you please tell me where I can get such a data set for educational purposes only?
@VINAYKUMAR-kf6kd 1 year ago
Thanks for the detailed info. What if my dependent variable is categorical, like anemia (yes/no)? Should I take the B coefficient or Exp(B)? And how do I cross-check in Excel?
@aung9211 5 years ago
Could you please explain how to check the equal trend (parallel trend) assumption?
@sebastianwaiecon 5 years ago
Unfortunately, we can't do it with this dataset, since we don't have extra data on either side of the change.
@shamsunnahar2294 4 years ago
Clear presentation. Do you have any video on two-way cluster regression in Stata? If yes, please send me the link here.
@sabrinanasir5844 4 years ago
Thanks for the video! If you don't have an ideal counterfactual control group (i.e. there are some slight differences between the treatment and control groups in the pre-treatment period), can you add other independent variables to the diff in diff when running the regression in Stata?
@sebastianwaiecon 4 years ago
Yes, you can.
@YY-ty5fx 4 years ago
What a clear explanation! I'm working on my own DD regression, and it really helped. The dependent variable 'price' covers prices before and after the treatment here, right?
@sebastianwaiecon 4 years ago
At the beginning of the video, I show the data browser and scroll through the data. You can see some observations are before and some are after.
@nathanmasak 3 years ago
That's really helpful. Thank you. Did you ever run the "event study" model? I can't find resources on this model. Your input would be appreciated.
@sebastianwaiecon 3 years ago
I haven't, but a Google search turned up some resources. Best of luck with it.
4 years ago
Hey, thanks. How do you do it with multiple time points?
@sebastianwaiecon 4 years ago
You can still make a variable indicating before and after treatment. You might also want to think about a fixed effects regression.
@subhalakshmipaul4816 6 years ago
Hello sir, please provide a video on reshaping from wide to long, particularly when the data set is very large, i.e., how to organise the variables before the reshape. Please, sir.
@sarahfranz5748 3 years ago
Thanks for this video! One question: how would you proceed if you are comparing the difference between control and treated group across a 4 week period, testing whether the difference is bigger in the beginning and decreases?
@sebastianwaiecon 3 years ago
You can interact a time variable (linear trend, or quadratic, etc.) with a treatment dummy variable.
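One way this could be written with Stata's factor-variable notation is sketched below. The names y, treated, and week are hypothetical, and the linear and quadratic trends are only two of many possible specifications.

```stata
* Sketch only: y, treated, and week are hypothetical variable names
* c.week treats week as a continuous (linear) trend;
* the treated#c.week coefficient is how the group gap changes per week
regress y i.treated##c.week

* Quadratic version: lets the gap shrink or grow at a changing rate
regress y i.treated##c.week##c.week
```

A gap that starts large and shrinks would show up as a coefficient on 1.treated and a coefficient on the treated#c.week interaction with opposite signs.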
@myleswhitmore8803 3 years ago
Hi SebastianWaiEcon, I am a student at Morehouse College, and I really enjoyed watching your video. I need help running a Diff in Diff regression for my research paper. For context, I am using Stata to analyze NAFTA's impact on GDP and trade flow for its member nations. To facilitate this process, I will be running an individual diff in diff analysis for each country. My dummy variable will be years before 1994 (when NAFTA was signed) and after 1994. My DV will be GDP growth. And my extra variables will be looking at human capital, agriculture industry growth percentage, manufacturing growth percentage, and other variables. However, I struggle with the Stata platform and would like your advice to ensure this regression runs smoothly.
@sebastianwaiecon 3 years ago
The most important thing for diff in diff is to identify a control and treatment group. In your case, that might be countries that were part of NAFTA and countries that were not.
@amnashaukat7827 3 years ago
@sebastianwaiecon Enjoying your video, but I need help. I have 25 countries and data from 1960-2020. How can I specify only one time, 2012, while comparing it with 2010-2016? Please help me.
@sebastianwaiecon 3 years ago
@amnashaukat7827 A fixed effects model may be more appropriate: kzbin.info/www/bejne/fmqYc3uprMeHadk&ab_channel=SebastianWaiEcon
@MrLi1231 4 years ago
Hi Sebastian, thank you so much. Quick question. Is this dataset a panel, or two separate cross section datasets? I am assuming it is two separate cross section, right?
@sebastianwaiecon 4 years ago
You are correct. It's a pooled cross section. It would be very unlikely for the same houses to be on the market in both years.
@MrLi1231 4 years ago
@sebastianwaiecon Good point and thank you so much for the quick reply! I am working on a thesis and realised that I was supposed to be doing DiD when I had been using a different methodology for the past few weeks. Your video is incredible. Big thanks from Australia!
@sebastianwaiecon 4 years ago
Happy to help!
@2thedata 3 years ago
Thank you so much! Your video helps me! :D
@ssjvegeto4ever 3 years ago
Hi Sebastian, thanks a lot for the clean explanation! Could you tell me why you were including post-treatment levels of your covariates? Aren't they endogenous and thus result in bias? Thanks in advance!
@jackgandhi 3 years ago
I don't understand the question. What I showed here is the most basic version of diff in diff, with the bare minimum amount of variables needed. Even if I had added more variables, that would not have created any bias -- bias happens because you left variables out.
@ssjvegeto4ever 3 years ago
@jackgandhi Thank you for the fast reply! Sorry, I meant the covariate data structure. I recently did a DiD setup making use of this video's data structure and got the criticism that, since I included covariates with a time index for the post-treatment period in the regression, these were endogenous and would thus introduce bias.
@sebastianwaiecon 3 years ago
@ssjvegeto4ever What you are describing is a common and valid criticism of time series analysis. The purpose of diff in diff is, if the data allows, solving this problem using a control and treatment group. The "post" dummy (y81 in the video) is not enough to establish a causal relationship. This is why we have the interaction term (y81nrinc in the video). In this video, y81 controls for effects over time that are constant across groups while nearinc controls for group effects that are constant over time. The interaction pulls out the estimated effect. This is not to say this method is perfect as there could still be endogeneity due to variables that are constant neither across groups nor across time, so you still may need to think about controls. The diff in diff method is just one tool in the analyst's toolbox.
@yanvianna4737 2 years ago
Could you demonstrate how it would work when there is more than one year before and after treatment?
@TommasoSchembri 3 years ago
Hi, thanks for the clear explanation. Is it possible to do a DID in percentage terms, so that I come up with a % increase/decrease in the treatment group? Thanks!!
@sebastianwaiecon 3 years ago
Yes, you can take the natural log of the dependent variable to get an approximation of a percentage change.
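A minimal sketch of the log version, assuming the KIELMC data is in the working directory. The name ln_price is hypothetical and chosen to avoid clashing with any log-price variable already in the file.

```stata
* Sketch only: assumes KIELMC.dta is in the current working directory
use KIELMC, clear
generate ln_price = ln(price)

* The interaction coefficient is now an approximate proportional change:
* a coefficient of -0.06 would read as roughly a 6% lower price
regress ln_price y81 nearinc y81nrinc
```

The approximation is good for small coefficients and becomes less accurate as the coefficient moves away from zero.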
@aymanissa6722 2 years ago
Thanks for such an informative video. Could you please explain the DiD method using the diff command?
@pneumascope 4 years ago
I note that you have large Standard Errors in your findings. Does this in any way have an impact on the reliability of the findings or the interpretation of the overall impact of the program (or incinerator in this case)?
@sebastianwaiecon 4 years ago
It's all relative when it comes to standard errors. You could say an SE of about 8000, as it is here, is large, but the estimate is -20,000. Standard errors are always going to be big numbers when dealing with things like the prices of homes, which are in the tens of thousands. All other things being equal, larger standard errors mean less precision in the estimates. Here, we can still be quite confident the incinerator did decrease property values.
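As a back-of-the-envelope check using only the rounded figures quoted above (an estimate of about -20,000 and a standard error of about 8,000), the t-ratio is roughly:

$$ t \approx \frac{-20{,}000}{8{,}000} = -2.5 $$

In absolute value this is above the conventional 1.96 cutoff for a two-sided test at the 5% level in a large sample. The exact figure depends on the unrounded regression output.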
@keith-ole 4 years ago
Phenomenal explanation, thank you. If you wanted to include more prior years and a few years after, would you have to make a dummy variable for each year?
@sebastianwaiecon 4 years ago
You don't have to do that, but you might want to look into fixed effects models for that kind of thing.
@pudurvivek 5 years ago
Do we need to check the p values of the variables before understanding the effect of the interaction variable on the dependent variable?
@sebastianwaiecon 5 years ago
If you want to know about p-values, I suggest taking a look at my video on hypothesis testing: kzbin.info/www/bejne/opnSoo2ghq17oM0
@katieleck9955 3 years ago
Hi, many thanks for the video. When I try to do DID for my panel data set, Stata says that my treatment group dummy and DID variable are omitted due to collinearity. Do you know why this would be and how I could fix it?
@sebastianwaiecon 3 years ago
Most likely what happened is that you made a mistake creating your dummy variables. Click the magnifying glass button to look at your data to check what went wrong.
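Besides browsing the data, a quick command-line check could look like the sketch below. The names treated, post, and did are hypothetical stand-ins for the group dummy, the time dummy, and the interaction.

```stata
* Sketch only: treated, post, and did are hypothetical variable names
* All four cells should be non-empty; an empty cell makes a dummy redundant
tabulate treated post

* Confirms the interaction equals the product of the two dummies
assert did == treated * post
```

If assert stops with an error, the interaction variable was not built as the product of the two dummies.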
@mertbakirci6030 4 years ago
Hey, thanks for the great content here. QUESTION: How can I test for the "common trend" assumption of the DiD-estimator in Stata or in general? Thanks in advance!
@sebastianwaiecon 4 years ago
Usually, this is done informally by comparing the dependent variable movement across groups in an extended period of time before and after the treatment goes into effect. You need a lot more data than I have in this example.
@mertbakirci6030 4 years ago
@sebastianwaiecon thank you!
@IamPaste 4 years ago
How would you do it for 1978?
@indagame9 5 years ago
Have you ever done a coefplot to test the treatment effect? If so, I get a positive but not significant coefficient for my treat dummy variable. This would mean that the treatment group actually saw an increase in the fatalities (my y variable) or does it mean my treatment effect is positive? It is confusing because if I do a lowess plot on just the different states fatalities drops over time. However, in the coefplot the graph is trending upwards.
@sebastianwaiecon 5 years ago
I don't use coefplot, but I don't see why it would show results any different from your regression table.
@md.arrahman7125 4 years ago
Dr., thanks for your excellent explanation. Is this step the same for panel data? I am planning to run DID for a panel (2000-2019). I look forward to your kind suggestion.
@sebastianwaiecon 4 years ago
I have some other videos on general panel data methods.
@trobberkah3425 2 years ago
Hi, I'm doing a DiD for my thesis, but I'm dealing with panel data. Do you know what I should do differently compared to the regression you show in this video? I noticed that there is a Stata command for a fixed effects DiD regression, for example.
@nunosilva1563 2 years ago
I face exactly the same situation, can you please reply to the above question?
@AAH123-v4x 4 years ago
A very useful video. Thank you so much. I have a question. I created 3 columns similar to y81, nearinc, and y81nrinc. I am running a two-part logit and GLM model. Since the values of y81 and the other two are either 0 or 1, should we put i.y81 and so on? I mean, aren't we supposed to put i. before a binary variable?
@sebastianwaiecon 4 years ago
For a binary variable, you will get the same result just putting the variable in or using the i. structure. If you have a categorical variable with more than two possible values, then you need to use i.
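The equivalence can be checked directly. The sketch below assumes the KIELMC data is in the working directory and compares the hand-built dummies with factor-variable notation.

```stata
* Sketch only: assumes KIELMC.dta is in the current working directory
use KIELMC, clear

* Entering the 0/1 dummies and the pre-built interaction directly
regress price y81 nearinc y81nrinc

* Factor-variable notation: ## adds both main effects and the interaction
regress price i.y81##i.nearinc
```

The coefficient labelled 1.y81#1.nearinc in the second table corresponds to the y81nrinc coefficient in the first.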
@AAH123-v4x 4 years ago
@sebastianwaiecon Thanks a lot!!
@zdavirandimuhammad1515 3 years ago
Hi, thank you for the explanation. Can we request the data so we can also practice?
@sebastianwaiecon 3 years ago
This is the dataset KIELMC.dta from the Wooldridge econometrics textbook. It is widely available online.
@zdavirandimuhammad1515 3 years ago
@sebastianwaiecon Thank you, also for kindly replying to my message. God bless. Stay safe, stay healthy.
@FannysVista 4 years ago
Hi Sebastian, your video helps me a lot to understand DID estimation. I have a follow-up question. Is it possible to estimate difference in differences for survey data analysis? I tried it on my survey data. However, the DID from the regression and the DID from manual collapse calculations show different results.
@sebastianwaiecon 4 years ago
The actual source of the data shouldn't matter here, whether it's from a survey or not.
@johnkaimenyi9292 2 years ago
Hello, is DID regression possible in STATA 15.0?
@sebastianwaiecon 2 years ago
I'm not aware of any changes in recent versions of Stata that would change anything in this video.
@cherrykhalil7481 6 years ago
Sebastian, thank you so much for this video. Does the data have to be in long shape? Is there a way to run the diff in diff regression on a wide dataset? Thank you.
@sebastianwaiecon 6 years ago
Yes, you can do it. Generate a new variable for the difference, then regress the difference on a dummy variable for the treatment group.
@cherrykhalil7481 6 years ago
Thank you very much! What about the interaction dummy between year and dummy? Given that my dataset is a balanced panel of 400 firms observed in both 2008 and 2013? Thanks again
@sebastianwaiecon 6 years ago
With the wide dataset, there are no interactions, as you've already built that in by taking the difference ahead of time.
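A sketch of the wide-data approach described in this exchange, with one row per firm. The variable names y2008, y2013, and treated are hypothetical.

```stata
* Sketch only: y2008, y2013, and treated are hypothetical names, one row per firm
generate dy = y2013 - y2008

* The coefficient on treated is the diff-in-diff estimate;
* the constant is the average change in the control group
regress dy treated
```

Because each firm contributes a single differenced observation, this requires the same units to be observed in both years.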
@jargodm 6 years ago
@sebastianwaiecon Just to follow up on this, if you do have the same units before and after, the paired difference test gives a different result than the regression you discuss in the video: Y = b1 + b2*treat + b3*time + b4*treat*time, which assumes independent samples, does it not?
@sebastianwaiecon 5 years ago
I believe the estimate would be the same, but the standard error would be different.
@Maria-ny2mj 6 years ago
Hi! Nice video, thank you very much! I have a question: what do you do if the treatment timing varies? In your example, imagine the incinerator was built in one neighbourhood (1) in 81 but in another neighbourhood (2) in 82. Would it be reg price y81 y82 nearincneighbourdhood1 nearincneighorhood2 y81* nearincneighbourdhood1 y82*nearincneighorhood2? Something like that?
@sebastianwaiecon 6 years ago
You could also consider including interactions between y81 and neighborhood 2 and y82 and neighborhood 1. Once we get into more than 2 periods you should also be thinking of this as a fixed effects model. You may find my video on that helpful.
@Maria-ny2mj 6 years ago
@sebastianwaiecon Thank you very much! I will take a look at the video!
@oluwaseunoginni9828 5 years ago
Please, how did you generate the interaction variable?
@sebastianwaiecon 5 years ago
Create an interaction term by multiplying the two variables you are interacting.
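In Stata that multiplication is a single generate command. The sketch below uses the video's two dummies, y81 and nearinc; the new name y81_nearinc is hypothetical and is used because KIELMC already contains y81nrinc.

```stata
* Sketch only: y81_nearinc is a hypothetical name for the new interaction
* It equals 1 only for houses near the incinerator observed in 1981
generate y81_nearinc = y81 * nearinc

* Diff-in-diff regression using the hand-built interaction
regress price y81 nearinc y81_nearinc
```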
@alexbrunofmn 5 years ago
When was the incinerator built?
@sebastianwaiecon 5 years ago
According to the original paper, construction took place from 1981-1984.
@nirobkothopokothon 6 years ago
Hi, I would like to know whether difference in differences analysis is suitable for a small data set that contains only 2 years of data and only 168 samples (84 control and 84 treatment). Thank you so much.
@sebastianwaiecon 6 years ago
I don't see any reason why not. However, with only 2 years of data, you have no idea of how the outcomes have been trending over time, and you may have a hard time justifying your counterfactual.
@nirobkothopokothon 6 years ago
thank you so much.
@amartilianom 6 years ago
Hello, if you want to add control variables or covariates, do you add them to the regression as usual? Thanks for the information!
@sebastianwaiecon 6 years ago
Yes, I forgot to mention that in the video. You can add controls to the diff in diff regression as in any other.
@amartilianom 6 years ago
Thanks. Another question would be, it is not necessary to tell Stata we have Panel Data when we have already created the dummy variables that differentiate the control and treatment group, and the pre and post periods? No need to run a fixed effects regression too, I guess. I'm just learning about the subject :)
@sebastianwaiecon 6 years ago
For a simple DD like this, you don't need to use xtset, if that's what you're asking. You can actually think of a DD as a very simple sort of FE model that only has two groups and two periods. If you want to see more about FE, I also have a video on it.
@amartilianom 6 years ago
I really appreciate your responses. Keep helping us!
@Muhammadilyas-ij6jh 3 years ago
Hello sir! I have a question...it looks like you first run a simple OLS regression and then you compute the differences using the collapse command. I do not understand whether to use just OLS regression and report the differences estimator (-18824) as the DID estimator. Please guide me..
@sebastianwaiecon 3 years ago
The number you gave estimates the difference between the treatment and control group before the treatment. We need to use the coefficient estimate for the interaction term to get the DID estimator.
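Written out as an equation, with b_1 to b_4 as generic coefficient labels and u as the error term, the regression in the video is:

$$ price = b_1 + b_2\,nearinc + b_3\,y81 + b_4\,(y81 \times nearinc) + u $$

Here b_2 is the gap between the two groups before the treatment (the -18824 figure), b_3 is the change over time in the control group, and b_4 is the diff-in-diff estimate. In terms of mean prices \bar{p} by group and year:

$$ b_4 = (\bar{p}_{near,81} - \bar{p}_{near,78}) - (\bar{p}_{far,81} - \bar{p}_{far,78}) $$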
@usmannasim618 5 years ago
Hi Sebastian, Can you also please describe the coding to be used when we have a dummy variable for 'treatment' and 'control' groups? Thanks,
@sebastianwaiecon 5 years ago
I did that in the video. The variable nearinc is the dummy variable for the treatment group.
@adriabc7614 4 years ago
Hi Sebastian, very useful video at a great pace ;). In this example you compare the differences in price, how would you interpret the results if the variable is categorical (eg. completed studies, married, etc). Many thanks!
@sebastianwaiecon 4 years ago
You can only do this if the categorical variable is binary (eg. married and not married). Assign a 1 to married and 0 to unmarried. We now have a linear probability model (see my video on binary choice models). The interpretation of the diff-in-diff is now the difference in probability of being married.
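A sketch of that linear probability version, with hypothetical 0/1 variables married, post, and treated. The robust option is an addition here, on the grounds that a binary outcome makes the error variance depend on the regressors.

```stata
* Sketch only: married, post, and treated are hypothetical 0/1 variables
* Linear probability model; robust SEs because a binary outcome is heteroskedastic
regress married i.post##i.treated, vce(robust)
```

A coefficient of 0.05 on the interaction would read as a 5 percentage point higher probability of being married attributable to the treatment.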
@nazda2007 4 years ago
Dear Sebastian, I am working on my dissertation using DiD, and I included additional control variables in my model. However, the model suffers from heteroskedasticity and autocorrelation. How should I deal with them?
@sebastianwaiecon 4 years ago
You might want to look at my videos on heteroscedasticity.
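As a starting point, two common adjustments to the standard errors are sketched below. The names y, post, treated, x1, and id are hypothetical, and whether either option suits a given dissertation dataset is a judgment call.

```stata
* Sketch only: y, post, treated, x1, and id are hypothetical names
* Heteroskedasticity-robust standard errors
regress y i.post##i.treated x1, vce(robust)

* Panel data: clustering on the unit also allows serial correlation within unit
regress y i.post##i.treated x1, vce(cluster id)
```

Both options leave the coefficient estimates unchanged and only alter the standard errors.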
@ThePMPDiary 4 years ago
Hey! How do I generate a variable that separates the years?
@sebastianwaiecon 4 years ago
In this dataset, that is y81 -- a dummy variable with a 1 for 1981 and 0 otherwise. I have another video with some examples of how to create dummy variables: kzbin.info/www/bejne/eqakmYimgpJobKc
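A short sketch of building such a dummy from a year variable. The names post81 and post are hypothetical, and the 1981 cutoff follows the video's data.

```stata
* Sketch only: post81 and post are hypothetical names; year is the survey year
* A logical expression evaluates to 1 when true and 0 when false
generate post81 = (year == 1981)

* With many years, mark everything from the treatment year onward;
* the if qualifier keeps missing years from being coded as 1
generate post = (year >= 1981) if !missing(year)
```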
@peterdastan1288 6 months ago
Does that mean house prices near garbage incinerator declined by an average of 21.13%?
@abmakwara8010 4 years ago
Hi Sebastian, thank you for the great, very informative content. I have a question. My research looks at the impact of a bank regulation implemented in 2014 that only affects the bigger banks in my population (banks with population of 25b and over). I have gathered panel data from 2010 to 2019. I intend to use performance ratios as dependent variables and variables that determine profitability as controls. I am using DID in an FE model in Gretl to run the regression. I have generated some dummy variables: a time dummy for before and after, a group dummy with the banks affected by the regulation as the treatment group and the rest as the control, and a regulatory dummy, which I am not sure is necessary. Three questions: 1. Is this research feasible in terms of parallel trends? 2. Will I need to interact all the other variables in my model with time, or does the interaction only need to be between the time and group dummies? If yes, do I need to add the group dummy to every interaction? 3. Is there a need to add individual time effects, since I am running the regression as an FE model? Many thanks in advance.
@sebastianwaiecon 4 years ago
1. I have no idea, but it sounds like you have enough data to make that determination yourself. 2. You should think about this on a case-by-case basis. Think about what you're trying to accomplish and whether or not interactions would help with that. 3. Time dummy variables are an important component in FE. I have some videos on FE and panel data on my channel.
@FanettiMazakura 6 years ago
Sebastian, what if I want to include id and time fixed effects in the regression? Do I only keep the interaction variable in the regression?
@sebastianwaiecon 6 years ago
Unlike FE models, diff in diff does not necessarily have the same cross-sectional units across time periods. In my example, it's not the same houses in '78 and '81. As such, ID-based FE won't work. Here, the nearinc variable plays the same role as the FE. Your time dummy is already in there in DD.
@FanettiMazakura 6 years ago
Yes, I get that. I have unbalanced panel data and I want to conduct a Difference-in-Differences with id and time fixed effects. Is // xtreg DepVar i.treated##i.during controls i.month , fe cluster(id) // the correct model to achieve that? Or do you think that it would be better to exclude the fixed effects?
@sebastianwaiecon 6 years ago
If I'm understanding what you're trying to do correctly, I think you can include the fixed effects.
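For reference, a sketch of the full sequence around the command quoted above, including the xtset declaration that xtreg needs. All variable names (id, month, depvar, treated, during) are hypothetical placeholders, and vce(cluster id) is used as the current spelling of the cluster option.

```stata
* Sketch only: id, month, depvar, treated, and during are hypothetical names
xtset id month

* Unit fixed effects via fe, time fixed effects via i.month
* If treated never changes within a unit, its main effect is absorbed by the
* unit fixed effects and Stata reports it as omitted; the interaction remains
xtreg depvar i.treated##i.during i.month, fe vce(cluster id)
```

An omitted main effect for the treatment dummy in this setup is expected and does not by itself indicate a mistake in the data.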
@motnaichuoiktnb 6 years ago
Firstly thank you for your video which is very helpful. As you have mentioned in your comment it was not the same house in '78 and 81', does that mean your treatment and control group are not the same pre and post-treatment ?
@sebastianwaiecon 6 years ago
The criterion for being in the control or treatment group is the same in both years, but the specific houses aren't the same.
@pujiannauli 6 years ago
What if the p-value for the time dummy * group dummy interaction is not significant after running the regression? How do I fix this? Thank you so much.
@sebastianwaiecon 6 years ago
You don't "fix" it, it's just the result you got. It tells you that you can't reject the hypothesis that your treatment had no effect. Now, it could be that you have some endogeneity that you need to control for, but statistical significance, or lack thereof, is not (by itself) a problem to be fixed.
@consultingfaqs 5 years ago
@sebastianwaiecon Hi, if the interaction term is insignificant, will adding more variables help make the result significant? The results show that the constant term is highly significant, which means that there is an omitted variable bias. I guess adding more controls can help solve the problem of the insignificant interaction term.
@sebastianwaiecon 5 years ago
@consultingfaqs It bears repeating that the treatment not being significant is not a "problem" to be solved unless you think this is because of an omitted variable. Tinkering around with different models with the explicit purpose of finding a significant effect is not an ethical use of data. The constant term being highly significant is also not evidence of omitted variables. I'm not sure where you got that idea. Adding more variables might or might not result in existing terms being more significant. It all depends on the direction of the bias, if there is one.
@vaishalisharma6519 5 years ago
Hello sir. How to create the dummy for near inc. The actual command?
@sebastianwaiecon 5 years ago
nearinc indicates whether the house is within 3 miles of the incinerator. There is a variable called "dist" which is the distance from the incinerator in feet. To create the dummy, we would use the command: gen nearinc = dist
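Taking the description above at face value (dist measured in feet, a 3-mile cutoff), the dummy could be built as sketched below. The threshold of 15,840 feet is simply 3 x 5,280; treating a house at exactly that distance as "within" is an assumption, and near3mi is a hypothetical name chosen so as not to overwrite the existing nearinc.

```stata
* Sketch only: 3 miles x 5,280 feet per mile = 15,840 feet
* Assumes "within" includes the boundary; the new name avoids the existing nearinc
generate near3mi = (dist <= 15840) if !missing(dist)
```

The if qualifier leaves the dummy missing wherever dist is missing, rather than coding those houses as 0.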
@jamesleleji9470 3 years ago
How can you do DID using SPSS or R programming? Thanks.
@sebastianwaiecon 3 years ago
The idea will be the same -- create dummy variables for treatment and time and an interaction, then put those in a regression.
@KIMKIM-bt6hr 3 years ago
Good morning. I am a student working with the DID model. Thanks to your DID explanation, I was able to complete my assignment smoothly. But yesterday, the professor asked, 'Why was the control variable excluded?' and I couldn't actually answer. After class, the professor gave me a separate assignment: put the control variable in and analyze it again. I want to use STATA again. But how do I add a control variable to the model in this video? Could you please advise which code to enter?
@sebastianwaiecon 3 years ago
You can simply add control variables to the DID regression, if you want.
@KIMKIM-bt6hr 3 years ago
@sebastianwaiecon I'm a STATA beginner, so can you explain a little bit more about where to put this part?
@Diana-mo6mg 4 years ago
if you used logprice instead of price would the coefficient be different?
@sebastianwaiecon 4 years ago
Yes, it would. See my video on natural logarithms for how that would work.
@zdavirandimuhammad1515 3 years ago
could you explain to us about Propensity Score Matching using STATA?
@DX-nh8qc 3 years ago
May I know how to enter control covariates in Stata?
@consultingfaqs 5 years ago
Could you please tell if we are using for example DHS data, which has data on demographics and health of a nation; but we want to see the effect of an external policy, like NREGA on labourforce participation of females ( the data for which is available in DHS). Then, should we merge NREGA data with DHS data, and then apply matching techniques to determine treatment and control groups? If not this, then how should we see the impact? Thanks
@sebastianwaiecon 5 years ago
This question is specific to data I don't have any experience with and is therefore outside the scope of this video.
@achintyawidhi2299 4 years ago
Sir, what is the difference between xtreg and reg? If I use data from the years 2007 and 2014, should I use reg or xtreg? My dataset doesn't have the same units across 2007 and 2014.
@sebastianwaiecon 4 years ago
Reg is the basic regression command and xtreg is used for panel data methods such as within estimation and random effects. If you don't have the same units across years (pooled cross section), then you probably want to use reg.
@harunasanibk2662 5 years ago
Sir, how am I supposed to run the data for both "treatment and control" groups? Should I run the data separately? Please, what command should I use?
@sebastianwaiecon 5 years ago
I don't know what you mean by "run the data" here.
@Bibirallie 3 years ago
What if there are multiple before and after variables, but not one conclusive before and after or year variable?
@sebastianwaiecon 3 years ago
You may want to consider a fixed effects model instead.
@GHSHAH 6 months ago
How do I interpret the interaction term, and how do I check whether it is significant or not? Please reply fast.
@nip5554 5 years ago
Hi what if I want to control for additional variables? Then the command "collapse (mean) y, by(after treatment) " is not sufficient. Please tell me what to do to control for variables.
@sebastianwaiecon
@sebastianwaiecon 5 жыл бұрын
You can add control variables, but you'll have to run a regression rather than using the collapse method.
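A minimal sketch of the regression route, in case it helps. The data are made up and noise-free so the result is easy to check, and the names `treat`, `post`, `y` and `x` are placeholders rather than variables from the video's dataset. The coefficient on the interaction is the diff-in-diff estimate, now holding the control `x` fixed.

```stata
* Illustrative data only: 16 observations, 4 per treat/post cell
clear
set obs 16
generate treat = mod(_n, 2)                    // 0/1 group indicator
generate post  = mod(floor((_n - 1) / 2), 2)   // 0/1 period indicator
generate x     = _n                            // a stand-in control variable
generate y     = 10 + 2*treat + 3*post + 5*treat*post + 0.5*x

* i.treat##i.post adds both dummies and their interaction; x is the control
regress y i.treat##i.post x
display _b[1.treat#1.post]                     // the DID estimate (5 by construction)
```

With real data you would replace `x` with whatever controls you think are needed; the collapse method cannot do this because it only compares four cell means.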
@nip5554 5 years ago
@@sebastianwaiecon Thanks :)
@thanhtoba1464 5 years ago
Thank you for sharing this. When I run the command "corr(y81 nearinc y81nrinc)" to test for autocorrelation between the variables, the result shows an autocorrelation between the "nearinc" and "y81nrinc" variables. The correlation coefficient is 0.5776. So my question is: what should we do in this situation?
@sebastianwaiecon 5 years ago
First of all, "autocorrelation" is a very specific term, which you are using incorrectly. In time series data, this refers to a variable correlating with itself across time. In any case, you've pointed out that an interaction term is correlated with one of the variables you are interacting. This is true by definition. There isn't anything you can do about that -- it would be strange if it were not the case. In a more general sense, there is nothing wrong with two variables in a regression being correlated with each other. That is completely normal and probably the case in most regressions.
@thanhtoba1464 5 years ago
Thank you for pointing out my problem. You are right, I used the term "autocorrelation" incorrectly. What I really meant was "multicollinearity", but I made a typing mistake. In any case, according to the data in the video, multicollinearity does appear in the regression, because the correlation coefficient between the "nearinc" and "y81nrinc" variables is 0.5776. When we encounter multicollinearity, we usually omit one of the two variables from the model. However, neither of these two variables can be omitted, because the difference-in-differences method requires both of them to show the effect of the construction of the incinerator. That is why I asked what we should do in this situation. This problem does not only occur in this example; it occurs in every DID model, because we usually create a "did" variable by multiplying the "time" and "treated" variables (did = time * treated). As a consequence, there is always multicollinearity in a DID model. Can you help me solve this issue?
@sebastianwaiecon 5 years ago
Multicollinearity is not a big deal. Getting into the practice of dropping variables because they are correlated with another variable in the model will lead you quickly into omitted variable bias. There is a simple test where you regress the one variable you are concerned about on all the other explanatory variables. If the R-squared is under 0.9, don't worry about it. As I explained previously, it is mathematically impossible for a variable and an interaction term involving it to be uncorrelated. The interaction term is absolutely key to a diff in diff regression.
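To see both points with numbers, here is a small sketch on made-up data (a balanced design with 25 observations per cell; `treat`, `post` and `did` are illustrative names). The dummy and its interaction are correlated by construction, yet the auxiliary R-squared from the check described above stays well under 0.9.

```stata
clear
input treat post
0 0
0 1
1 0
1 1
end
expand 25                      // 25 copies of each cell, 100 observations
generate did = treat * post    // the interaction term

correlate treat did            // about 0.577 in a balanced 2x2 design
display r(rho)

regress did treat post         // auxiliary regression on the other regressors
display e(r2)                  // about 0.667
```

The 0.577 here is simply what a balanced two-group, two-period design produces, which is close to the 0.5776 reported in the question.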
@thanhtoba1464 5 years ago
@@sebastianwaiecon Thank you very much for the explanation.
@gregorychung9421 4 years ago
@@sebastianwaiecon Hello, I found this video very helpful. However, when running my model, my DID variable keeps getting dropped because of collinearity. Is there a fix for that?
@manojsapkota4880 5 years ago
Hello sir, I am interested in DID and want to know the command to run a DID regression in Stata.
@sebastianwaiecon 5 years ago
It's all in the video.
@dalemantey6028 6 years ago
Can you do a DD with logistic regression? Say I have a dichotomous outcome; for this example, it could be something like house sold (yes/no). Would it be similar Stata code, just changing "regress" to "logistic", or are there considerations within DD that might limit the statistical validity of that sort of analysis?
@sebastianwaiecon 6 years ago
The principles which drive DD -- controlling for time trends and cross sectional trends -- are still useful for logits (and probits also). However, you need to be careful about the coefficient interpretations, as it's not as clean as in the least squares DD. I would suggest looking at my video on binary choice models for details.
@sebastianwaiecon 6 years ago
For the code, yes, you can change "regress" to "logit" and it will run.
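A rough sketch of the logit version, on invented cell counts rather than real housing data (`sold`, `treat`, `post` and `n` are illustrative names). It also shows why the interpretation needs care: the interaction coefficient is a difference in differences of log odds, which is not the same number as the difference in differences of the sale probabilities.

```stata
* Invented counts: 100 houses per cell; share sold = 0.40, 0.50, 0.45, 0.70
clear
input treat post sold n
0 0 1 40
0 0 0 60
0 1 1 50
0 1 0 50
1 0 1 45
1 0 0 55
1 1 1 70
1 1 0 30
end

logit sold i.treat##i.post [fweight = n]
display _b[1.treat#1.post]              // DID in log odds, about 0.64

* DID in probabilities, straight from the cell shares
display (0.70 - 0.45) - (0.50 - 0.40)   // 0.15
```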
@dalemantey6028 6 years ago
Thank you!
@vojtechkolar5897 1 year ago
Hey, I kind of understand diff-in-diff, but now I am dealing with a problem: what if the control group is at much higher levels than the treatment group? Let's say control before: 100, after: 200 (a 100% increase); treatment before: 5, after: 9. If I calculate the DID effect using the standard table, i.e. the difference between the differences, I get in this case 100-4 = 96! So the counterfactual state of the world for the treatment group would be 105?! That does not make sense, does it? Even R with OLS gives me these results. What am I doing wrong? Thank you.
@vojtechkolar5897 1 year ago
I get that I can solve this problem by working with a log-level model. But isn't this always a problem with level-level diff in diff? What am I missing?
@sebastianwaiecon 1 year ago
You can do diff in diff with the dependent variable in logs. That's no problem as long as you are careful with the percentage change interpretation.
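For what it's worth, the numbers in the question can be worked through directly. This sketch (variable names are illustrative) runs the same four cell means in levels and in logs. In levels the estimate is -96, because the counterfactual adds the control group's 100-unit change to the treated group's starting value of 5. In logs the counterfactual applies the control group's growth rate instead, so the treated group would have doubled from 5 to 10, and the observed 9 is about 10% below that.

```stata
clear
input treat post y
0 0 100
0 1 200
1 0 5
1 1 9
end
expand 2                         // duplicate rows so the regression has spare observations
generate lny = ln(y)

regress y i.treat##i.post
display _b[1.treat#1.post]       // (9 - 5) - (200 - 100) = -96

regress lny i.treat##i.post
display _b[1.treat#1.post]       // ln(9/5) - ln(200/100), about -0.105
display exp(_b[1.treat#1.post]) - 1   // about -0.10: 10% below the log counterfactual
```

Which version is appropriate depends on whether parallel trends is more plausible in absolute changes or in growth rates; the code only shows that the two assumptions give different counterfactuals.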
@ariagalit1875 5 years ago
Hi. My data ranges from 2009 to 2018, and I have both treatment and comparison groups. I just want to ask whether DID, like what you did in the video, is applicable. I am not very familiar with the method or Stata, actually.
@ariagalit1875 5 years ago
And how come the interaction variable is all zero?
@sebastianwaiecon 5 years ago
You can do DID if you set up a dummy variable to indicate when the treatment went into effect. Once this is in place, you can create the interaction term.
@ariagalit1875 5 years ago
Thanks very much for your reply, sir.
@frankzhao1678 3 years ago
Thank you so much, it is a great video. Could you please show me how to do a DiD with multi periods?
@sebastianwaiecon 3 years ago
Do you mean you have multiple periods before and after the change? It functions the same as this, but you need to define your "post" variable to include all periods after the change.
@frankzhao1678 3 years ago
@@sebastianwaiecon So if I have 2000-2010 data, and the policy happened in 2005. I need to set 2000-2004 equal to '0', and 2005-2010 equal to '1'?
@sebastianwaiecon 3 years ago
That would be the simplest way to do it. I'm not promising this is the perfect solution as you may need to think about more sophisticated ways to handle your specific data, but it is a good starting point.
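As a small sketch of that coding (a made-up panel skeleton for 2000-2010 with one treated and one untreated unit; `treat`, `post` and `did` are illustrative names):

```stata
clear
set obs 22
generate treat = (_n > 11)               // second block of 11 rows is the treated unit
generate year  = 2000 + mod(_n - 1, 11)  // 2000, ..., 2010 within each unit
generate post  = (year >= 2005)          // 0 for 2000-2004, 1 for 2005-2010
generate did   = treat * post            // interaction used in the DID regression
tabulate year post
```

The `tabulate` line is just a check that the dummy switches on in the intended year.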
@jodieteague8254 5 years ago
Could you then graph this in Stata?
@sebastianwaiecon 5 years ago
Yes. You would do this after running the collapse to get all the averages. The "classic" diff in diff graph has the outcome on the vertical axis and time on the horizontal axis. There are three lines: the treated group, the untreated group, and a counterfactual with the same starting point as the treated group but the same slope as the untreated group. See my video on graphing for how to use the twoway command.
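A rough sketch of that graph using twoway, with invented group means in place of the collapsed data. Here the counterfactual is built in the usual way, as the treated group's starting level plus the untreated group's change; `y0`, `y1` and `cf` are names created in the sketch.

```stata
clear
input treat post y
0 0 10
0 1 12
1 0 11
1 1 16
end
collapse (mean) y, by(treat post)      // one mean per group and period
reshape wide y, i(post) j(treat)       // y0 = untreated, y1 = treated
sort post
generate cf = y1[1] + (y0 - y0[1])     // treated start plus untreated change

twoway (connected y1 post) (connected y0 post) ///
       (connected cf post, lpattern(dash)), ///
       xlabel(0 "Before" 1 "After") ytitle("Mean outcome") ///
       legend(order(1 "Treated" 2 "Untreated" 3 "Counterfactual"))
```

The vertical gap between the treated line and the dashed line in the "After" period is the diff-in-diff estimate (3 with these made-up numbers).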
@jodieteague8254 5 years ago
@@sebastianwaiecon Thank you will do!
@BrickTemplar 5 years ago
Hi Sebastian, I wonder what we have to do if the effect is spread over the years, say, the treatment was implemented in one year for the firms in one industry and the next year for another? Say, over three decades, the U.S. authorities have gradually cut import tariffs on a large variety of goods and services. CUT=1 if this happened, 0 otherwise. The equation will have the form Investment = b1*tariff CUT + b2*lagged controls + industry FE etc., clustered by industry-year. I do not understand what I have to add to a simple regression to make it diff-in-diffs in this case... Dummy CUT interacted with what?
@BrickTemplar 5 years ago
Or, as in your example, the incinerator would have been installed for one neighborhood in 1981, for another in 1985, and so on, for another in 2005... The y81 time dummy won't work anymore, so what do we have to interact?
@sebastianwaiecon 5 years ago
You'll need a dummy variable that "turns on" from a 0 to a 1 once the treatment is active. You won't be able to do this by building an interaction term, as it's more complex than that now. I'm not sure there's a better way than putting in the 1s on a case by case basis.
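If the adoption date is stored in the data, the case-by-case 1s can be filled in with a single rule. A minimal sketch, assuming a hypothetical variable `adopt_year` that holds each unit's first treated year and is missing for never-treated units:

```stata
clear
input id adopt_year
1 1981
2 1985
3 .
end
expand 5
bysort id: generate year = 1979 + 2 * _n   // 1981, 1983, ..., 1989 for each unit

* 1 from the adoption year onward, 0 before it and for never-treated units
generate treated_now = (year >= adopt_year) & !missing(adopt_year)
list id year adopt_year treated_now, sepby(id)
```

If no such variable exists, the dummy still has to be entered by hand as described above; the sketch only covers the case where adoption dates are recorded.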
@mathewchandy9588 4 years ago
Is heteroscedasticity ever an issue when you conduct a difference-in-difference analysis?
@sebastianwaiecon 4 years ago
Yes, it is. In this example, you could imagine there might be a difference in the variance of prices with and without the incinerator.
@mathewchandy9588 4 years ago
@@sebastianwaiecon Then to solve this, would you add the vce(robust) option at the end of your regression?
@sebastianwaiecon 4 years ago
I can't think of a theoretical reason why you couldn't do that. To be honest, I think most people just use robust all the time and don't really think about it.
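For reference, a sketch of what that looks like on simulated data in which the error variance differs by group (all names and numbers are invented for the illustration). Robust standard errors are requested with the `vce(robust)` option of `regress`; the coefficients are unchanged and only the standard errors differ.

```stata
clear
set seed 12345
set obs 400
generate treat = mod(_n, 2)
generate post  = (_n > 200)
* Noise has three times the standard deviation in the treated group
generate y = 10 + 2*treat + 3*post + 5*treat*post + rnormal(0, 1 + 2*treat)

regress y i.treat##i.post                // conventional standard errors
regress y i.treat##i.post, vce(robust)   // heteroskedasticity-robust standard errors
```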
@narlikar78 6 years ago
Can we have the dataset used in the video, so we can try to reproduce the results ourselves?
@sebastianwaiecon 6 years ago
The dataset is KIELMC.dta that comes with the Wooldridge econometrics book. You should be able to find it online.
@monikasrivastava5565 4 years ago
What are the steps to generate the result? Why have you not shown them? Please, I really need to know how to do it.
@emilieriislarsen5134 6 years ago
Hi, Sebastian, thank you so much for your video. I was wondering if it's possible to do propensity score matching and difference in differences when my dependent variable is dichotomous?
@sebastianwaiecon 6 years ago
I can't comment on specifics as I've never combined all of these myself. However, both diff in diff and propensity score matching can be done with dichotomous dependent variables. You just need to be careful about the issues inherent in linear probability. See my video on binary choice models for details.
@popi20101 4 years ago
What if we add more than one control variable, not only nearinc?
@sebastianwaiecon 4 years ago
You are always allowed to add controls if you think the DD method did not eliminate endogeneity.
@popi20101 3 years ago
And if we have a five-year period, 2007 to 2011, and the policy is announced in 2009, how do we set the year variable?
@himaep_agungkrisyana1013 3 years ago
Can I get the Stata do-file for this?
@narlikar78 6 years ago
Sir, another question in this regard, and I humbly request your attention at the earliest. Suppose I have a panel data set of 75 banks for 5 years (pre-merger), which have merged to become 30 banks (also for 5 years, post-merger). Using the standard panel data tests, viz. the F-test, the BP-LM test, and the Hausman (1978) test, I have been able to establish that my model is a fixed effects model. My dependent variable is an Index of Inclusion (whose values lie between 0 and 1), while all the independent variables are metric data from the banks' balance sheets, with a time dummy (0 for pre-and post merger). Can I run a panel Tobit model, knowing that it is a fixed effects model? I use Stata 14 for my econometric model testing. I have been told that panel Tobit can be used only with a random effects model. My problem is that my dependent variable has a truncated range. Please guide me as soon as possible.
@sebastianwaiecon 6 years ago
Mechanically, you can do it with dummy variables (see my fixed effects video). While I am not aware of a specific reason you should not do so, I don't know enough to definitively tell you one way or another.
@ashishstat 4 years ago
Can I have a link to the dataset used in this video?
@sebastianwaiecon 4 years ago
It's KIELMC.dta, which comes with the Wooldridge econometrics textbook. You should be able to find it online.
@antoniomastrandrea967 5 years ago
Hi Sebastian, thank you for your video! I have two questions: 1) What should I do if the FE variables (time and individual) are not significant? (I mean p-value > 0.1) 2) Do I have to take care of R squared in this case? Thank you!
@sebastianwaiecon 5 years ago
1) If what you're after is measuring the treatment effect, this doesn't matter. 2) I don't know what you mean by "take care," but R squared is not particularly relevant in DID estimation.
@raulfotso4032 4 years ago
Good morning, all. Please, I want to know how to do a Fairlie decomposition. I am a student lecturer at the University of Douala.
@lateralus5117 6 years ago
Hello, I ran into a problem when running my regression. My regression looks like this: regress DepVar post_tr_yr treat_group treat_groupXpost_tr_yr, where post_tr_yr is a dummy for year>2007. However, my interaction term (treat_groupXpost_tr_yr) gets omitted due to collinearity. Is this a problem?
@sebastianwaiecon 6 years ago
I always recommend you go to the data browser and take a look at the values. Presumably something went wrong in your variable generation.
@Ilaay23 6 years ago
I also have this problem. My interaction term is omitted due to collinearity, does anyone know how you can fix this?
@bencaplan4565 6 years ago
I have the same issue - what sort of issue in the variable generation can result in this?
@xMooshy 6 years ago
@@bencaplan4565 For the time dummy, the control group also gets a 1 in the post period, even though it is never treated.
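A small sketch of how that coding mistake produces the omitted interaction, on made-up data with the 2007 cutoff from the question (variable names are illustrative). If the time dummy switches on only for treated units, it is identical to the interaction and Stata has to drop one of them; coding it for every unit removes the problem.

```stata
clear
input treat year
0 2006
0 2008
1 2006
1 2008
end
expand 10

* Problem coding: "post" is 1 only for treated units after 2007
generate post_wrong = (year > 2007) & (treat == 1)
generate did_wrong  = treat * post_wrong
count if did_wrong != post_wrong     // 0: the two variables are identical

* Usual coding: "post" is 1 for every unit after 2007
generate post = (year > 2007)
generate did  = treat * post
count if did != post                 // 10: control units in the post period
```

This is only one possible cause; browsing the data, as suggested earlier in the thread, is still the quickest way to see which variables coincide.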
@jaredgreathouse3672 6 years ago
What if your data have multiple units treated and untreated at the same time? There, a clean post period makes no sense. If city 1, for example, is being treated at time t, but cities 2 and 4 aren't, and the next year city 3 is being treated, and so on, wouldn't you just use a treatment##time interaction?
@sebastianwaiecon 6 years ago
For that, you might want to look into a full fixed effects model. I have a video on that, as well.
@fgghdfg8638 3 years ago
Hi professor, I hope you are doing well. I'm a follower on KZbin. Can you help me with an assignment on the difference-in-differences method? I can't find a subject or data to do it with, and I must complete it or I will have to repeat the year. I have slept only 3 hours a night for more than 3 weeks because of this project. Can you help me? If you want, I can pay you for your help.
@sebastianwaiecon 3 years ago
I recommend you ask your professor for help - it's what they're there for!