Staggered Treatment in Difference-in-Differences (The Effect, Videos on Causality, Ep 56)

12,718 views

Econometrics, Causality, and Coding with Dr. HK

2 years ago

Please visit www.theeffectbook.net to read The Effect online for free, or find links to purchase a physical copy or ebook.
The Effect is a book about research design and causal inference. How can we use data to learn about the world? How can we answer questions about whether X causes Y even if we can't run a randomized experiment? The book covers these things and plenty more. These videos are meant to accompany the book, although they can also be viewed on their own.
This video relates to material found in Chapter 18 of the book.
A version of this video without background music can be found here: • Staggered Treatment in...
All the DID stuff we've done so far has been about treatments that go into place at the same time, whether there's only one treated group or many groups all getting treated at once. But what if that's not the case? What if treatment goes into effect at different times? We used to think two-way fixed effects was fine for that. But oops! It's not. Why not, and what can we do instead?

Comments: 64
@brad2349 1 year ago
Love your explanation of this! Very clear and concise!
@dr.kingschultz 1 year ago
As always another amazing video!
@tahabilgic9439 1 year ago
Great video, thanks!
@yangyijane 1 year ago
Thank you so much for the explanation. Could you make another video to apply this staggered DID in R?
@NickHuntingtonKlein 1 year ago
Thanks! There are R code examples and packages listed in the textbook chapter.
@svl8389 10 months ago
Your book has been a great help to me in understanding difference-in-differences. Could you add some more info to the DID section about rollout DID designs in R? That would be really helpful.
@NickHuntingtonKlein 10 months ago
Yep! This is already slated for the second edition.
@renanchicarellimarques4272 1 year ago
Great video, as always! Do you have a good suggestion of an applied paper which uses the Callaway and Sant'Anna (2021) staggered DID estimator?
@NickHuntingtonKlein 1 year ago
Thanks! Because it's relatively new most of these are working papers still. But I'd recommend checking the list of papers that cite it. scholar.google.com/scholar?cites=6052894912618674159&as_sdt=5,48&sciodt=0,48&hl=en
@gabrielaterra3734 2 years ago
Thank you for the great explanation! When I have a design case like this, should I do matching for each year of treatment considering the treated in the period n+1 as possibly in the control group?
@NickHuntingtonKlein 2 years ago
I'd advise against attempting it by hand (unless that's something you do a lot) and instead use a package. For an approach with matching, the Callaway and Sant'Anna estimator makes sense: the csdid package in Stata or did in R. The appropriate control group can either be the entire set of groups that haven't been treated yet as of time n, or only the groups that never get treated. The former is more precise, but you need to be willing to make some assumptions about the treatment not being anticipated.
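The two control-group choices described in the reply above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not code from csdid or did: the unit names, treatment years, and the `control_units` helper (including its `anticipation` parameter) are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: unit -> first treated period (None = never treated).
first_treated: Dict[str, Optional[int]] = {
    "A": 2004,
    "B": 2006,
    "C": 2008,
    "D": None,
}


def control_units(
    first_treated: Dict[str, Optional[int]],
    cohort: int,
    period: int,
    not_yet_treated: bool = True,
    anticipation: int = 0,
) -> List[str]:
    """Return the units usable as controls for `cohort` at `period`.

    not_yet_treated=False keeps only never-treated units.
    not_yet_treated=True also keeps units from other cohorts whose treatment
    starts more than `anticipation` periods after `period`.
    """
    controls = []
    for unit in sorted(first_treated):
        start = first_treated[unit]
        if start is None:
            controls.append(unit)  # never-treated units always qualify
        elif not_yet_treated and start != cohort and start > period + anticipation:
            controls.append(unit)  # treated later, still untreated now
    return controls


# Controls for the 2004 cohort, evaluated in 2005.
wide = control_units(first_treated, cohort=2004, period=2005)
narrow = control_units(first_treated, cohort=2004, period=2005, not_yet_treated=False)
cautious = control_units(first_treated, cohort=2004, period=2005, anticipation=1)
print(wide, narrow, cautious)
```

The not-yet-treated set is larger, which is where the extra precision comes from. Raising `anticipation` drops units that are about to be treated, which is one way to act on a worry that they change behavior before treatment starts.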
@lukaparisi4351 1 year ago
Thank you for the video, it complements the book in a great way! Quick question for you Nick, I am currently doing my thesis on Gun delay laws and how they affect gun-related suicides in the United States. Since a few states "switch-off" gun laws in my sample I am looking to use this staggered treatment difference-in-differences design. I am having trouble formulating my model equation, would the following equation be correct if I'm only looking at the ATT? ln(Y)_it = \alpha + \beta_1 ( Treatment_s x Post_{s\tau}) + \lambda_t + \mu_s + \epsilon_{st}, with \tau being the subscript for event time, subscript s for state and subscript t for calendar time?
@NickHuntingtonKlein 1 year ago
That would be biased for the reasons mentioned in the video. You'd need to use one of the staggered treatment estimators like Callaway and Sant'Anna. If you want to incorporate the "switches back off" part, that's harder; matrix completion would be one option.
@jacobmorgan7495 11 months ago
Great balance of intuitive and technical. Are the Callaway & Sant'Anna / Wooldridge estimators required when cases are matched to controls prior to the analysis (using propensity score matching, for example)? Thanks!
@NickHuntingtonKlein 11 months ago
Thanks! And not necessarily, you can do matching without C&S, but it is a good way to do it, especially with staggered treatment.
@matterne.i 7 months ago
Great video, and your explanation is crystal clear! Thanks. I'm curious: can a staggered difference-in-differences analysis be used when there are varying treatment intensities among the subjects receiving the treatment?
@NickHuntingtonKlein 7 months ago
Thank you! And to answer your question: It can but needs to specifically account for the continuous nature of the treatment variable. See e.g. arxiv.org/abs/2107.02637
@dunstanburghcastlegolfcour6440 1 year ago
Timely clarification on time-variant DID issues. It would be great to see how to solve the problem in a package like Stata using the new approaches. It would get a lot of interest, I am sure.
@NickHuntingtonKlein 1 year ago
I discuss this (and other coding stuff) in the chapter itself. I'd recommend checking out the csdid package.
@dunstanburghcastlegolfcour6440 1 year ago
Thank you very much - I have looked at your book (which is fantastic by the way!) and downloaded the package. The instructions/example included in the Stata package provide very useful help in implementation i.e. they have data set examples - which are invaluable in moving towards implementation of modelling. A point you make in your book - that the Stata DID package does not deal with time variant DID - is really worth putting in bold!! I spent a lot of time reading the Stata manual to try and find this out. It seems to dodge the issue entirely which is very frustrating and will no doubt create similar problems for others (i.e. why can't I do a parallel trends test etc.)! From what I can gather CSDID does allow for some kind of parallel trends test, post estimation, which is very handy. A youtube video running through an example of CS/DR DID Stata implementation using a dataset would be a big hit, I am sure.
@chenzhang8005 9 months ago
Thank you so much for the great content! I have one question regarding testing the pretrend using the csdid Stata code. The pretrend is significantly different between treatment and control. I wonder what I should adjust to make sure the common trend assumption is met, and how I can conduct matching manually for staggered DID. Thank you for your time 😊
@NickHuntingtonKlein 9 months ago
There's no way to guarantee that parallel trends holds (since it is an assumption that you cannot observe in the data - common prior trends is suggestive of parallel trends but they aren't the same thing). But if you have some reason to believe, say, that trends differ because of some different starting value of a covariate, then controlling/matching for that variable will fix it. Adding it as a covariate in csdid will do the matching for you, no need to do it manually.
@chenzhang8005 9 months ago
@NickHuntingtonKlein Thank you for the reply! I know that most papers just use visual inspection to roughly check parallel pretrends, and I wonder whether it is possible to do the visual inspection on the staggered DID as well?
@NickHuntingtonKlein 9 months ago
@chenzhang8005 One approach is to do the same inspection, but separately for each cohort (and its matched control).
@user-dk9cg2cr9f 1 year ago
What if I want to test moderating effects using three-way interaction using a staggered DID model? How should I measure the moderator when different groups have different treatment times?
@NickHuntingtonKlein 1 year ago
I suspect it would work to take the Wooldridge approach and add another level of interactions, then average together the interaction effects to get your moderation effect. I don't actually know if this works but I suspect it would.
@desperatewanderer742 5 months ago
Thank you! You're so helpful. I was wondering why you say that DiD doesn't care that there are different control groups in different time periods, but it's the TWFE that forces these treated groups that are already treated to act as if they're control groups... Why is that? Not getting the connection there.
@NickHuntingtonKlein 5 months ago
You're welcome! Basically, the fixed effects in TWFE estimate the DID effect by comparing "variation in treatment" against "no variation in treatment" in a given period. But there are two ways for treatment not to vary: starting untreated and staying that way, or being treated and staying that way next period. So the latter ends up getting included in the control group.
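A small numeric sketch may help show why it matters that already-treated groups end up in the control group. It does not run a TWFE regression; it only computes two of the simple 2x2 comparisons that such a regression implicitly mixes. The outcome function, the common trend, and the growing effect (1 in the first treated period, 3 in the second) are made-up assumptions.

```python
def effect(periods_since_treatment: int) -> float:
    # Assumption: the treatment effect grows with exposure (1, then 3).
    return {0: 1.0, 1: 3.0}.get(periods_since_treatment, 0.0)


def outcome(first_treated: int, t: int) -> float:
    # Baseline of 10, a common time trend, plus the effect once treated.
    exposure = t - first_treated
    treated_part = effect(exposure) if exposure >= 0 else 0.0
    return 10.0 + 0.5 * t + treated_part


def did(treated_start: int, control_start: int, pre: int, post: int) -> float:
    # Classic 2x2 difference-in-differences between two groups.
    treated_change = outcome(treated_start, post) - outcome(treated_start, pre)
    control_change = outcome(control_start, post) - outcome(control_start, pre)
    return treated_change - control_change


EARLY, LATE = 2, 3  # periods in which each group is first treated

# Clean comparison: early group turns on, late group is still untreated.
clean = did(EARLY, LATE, pre=1, post=2)

# Forbidden comparison: late group turns on, already-treated early group
# is used as the "control" because its treatment status does not change.
forbidden = did(LATE, EARLY, pre=2, post=3)

print(clean, forbidden)
```

In this toy setup the true first-period effect is 1 for both groups. The clean comparison recovers it, while the forbidden comparison returns -1 because the early group's effect is still growing and gets subtracted off as if it were a time trend.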
@desperatewanderer742 5 months ago
@NickHuntingtonKlein Thank you! And gotcha, and let's just say we didn't include the TWFEs, then I suppose the DID effect will include some of the treated (but staying the same) group's already treated effect...?
@NickHuntingtonKlein 5 months ago
@desperatewanderer742 Correct.
@aliothrosen9242 8 months ago
If I want to draw a graph to see evolution of outcome variables of treated and untreated but the x-axis is relative time to the treatment, how can I add the line for untreated group in this graph since untreated doesn't have a relative time to the treatment?
@NickHuntingtonKlein 8 months ago
Ideally, you draw a separate one of these graphs for each treatment cohort. That way, treatment time is fixed for each graph.
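As a rough sketch of the per-cohort idea: once a single cohort is fixed, the untreated group can borrow that cohort's treatment date, so both lines share one relative-time axis. The yearly means and the `event_time_series` helper below are invented for illustration, and the plotting step itself is left out.

```python
# Hypothetical yearly mean outcomes.
untreated_mean = {2003: 5.0, 2004: 5.5, 2005: 6.0, 2006: 6.5}
cohort_mean = {
    2005: {2003: 5.2, 2004: 5.7, 2005: 7.2, 2006: 8.0},  # first treated in 2005
}


def event_time_series(cohort: int):
    """Rows of (relative time, treated mean, untreated mean) for one cohort.

    The untreated group has no treatment date of its own, so it is
    re-indexed with the cohort's date: relative time = year - cohort.
    """
    rows = []
    for year in sorted(untreated_mean):
        rows.append((year - cohort, cohort_mean[cohort][year], untreated_mean[year]))
    return rows


series_2005 = event_time_series(2005)
for relative_time, treated, untreated in series_2005:
    print(relative_time, treated, untreated)
```

Each cohort would get its own call and its own graph, with relative time 0 marking that cohort's treatment date.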
@yufangsun7725 1 year ago
I have a stupid question. From what you say, do you mean that in the staggered difference in difference specification (without any of the improved estimators), the "post" variable is only 1 for 1 period after the treatment, and becomes 0 thereafter? (so first 0, then 1, then back to 0)
@NickHuntingtonKlein 1 year ago
Do you mean the two way fixed effects model? Post is 1 in all the post-treatment periods, not just the first one. But to be clear, this model does not work as intended if your treatment is staggered.
@RobertWF42 8 months ago
I'm working on a healthcare 12 month pre/post case-control analysis with longitudinal data (monthly obs) where patients in the treatment group received an intervention at staggered times over several years. We'd like to compare trmt patients with control patients to estimate the average trmt effect. Rather than building a separate control group dataset, which is likely very different from the trmt population & have to worry about unmeasured confounders, can we simply compare a trmt patient to another trmt patient who hasn't yet received the intervention (create an ad hoc ctrl group from the trmt group)? We'd match the trmt intervention date to the same date in our "control group" and compare pre to post 12 month outcomes.
@NickHuntingtonKlein 8 months ago
Yep, no particular reason you couldn't do that as long as you think parallel trends holds for those comparisons. I would want to make sure that (a) you do have enough post-observation periods for your "control" group before they get treated (or might potentially start altering their behavior in anticipation of treatment), and you don't include any post-treatment data for your control group for when they're supposed to be acting as a control, and (b) you have enough observations in your "control" group and aren't introducing too much noise by throwing out most of your controls.
@RobertWF42 8 months ago
@NickHuntingtonKlein Thanks Nick! If the parallel trends assumption doesn't hold for the pre-intervention data, can we avoid the rule by condensing the outcome observations into only two time points per member: Y0 = average pre-intervention outcome and Y1 = average post-intervention outcome? Or avoid DiD and instead run an ANCOVA by regressing Y1 on Y0 + trmt_flag + covariates.
@NickHuntingtonKlein 8 months ago
@RobertWF42 Keep in mind that parallel trends is a *theoretical* assumption that you can't actually observe (see my video on parallel trends). When you check for prior trends in pre-treatment data, you can at best get suggestive evidence of whether parallel trends holds, you can't actually check it. If your research design is "compare a newly-treated group as they change over time to an untreated group as they change over time, including both pre- and post-treatment periods", regardless of the way you estimate the model, you need to assume that the effect of time independent of treatment affects both groups equally, but can't actually observe it. So condensing to two time periods wouldn't help, since the change you need to assume parallel trends for is before vs. "after if the treatment hadn't occurred" and you've still got that change even if it's only a single pre vs a single post.
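For reference, the assumption described in this reply is usually written in potential-outcomes notation roughly as follows. The notation is an assumption added here, not taken from the video: $Y_t(0)$ is the outcome a unit would have in period $t$ without treatment, and $D = 1$ marks the treated group.

```latex
% Parallel trends for one pre period and one post period:
% absent treatment, both groups would have changed by the same amount.
\mathbb{E}\left[ Y_{\text{post}}(0) - Y_{\text{pre}}(0) \mid D = 1 \right]
  = \mathbb{E}\left[ Y_{\text{post}}(0) - Y_{\text{pre}}(0) \mid D = 0 \right]
```

The term $Y_{\text{post}}(0)$ for the treated group is never observed, which is why the assumption cannot be checked directly and why collapsing the data to a single pre and a single post observation does not remove it.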
@mountainsmusicbeer5532 10 months ago
Your videos are fantastic (as is your enthusiasm). But I'm new to difference-in-differences. Can it be applied when there are different treatments to different cohorts? Specifically, I'm thinking about a 2-year language program, for which the control group had two years of face-to-face (F2F) classes. But as a result of COVID-19 prevention measures that began in 2020 (and ended in 2022), different cohorts had different course delivery methods. (Each cohort had the same standardized language test at the beginning of Year 1, end of Year 1, and end of Year 2.)
2018 cohort: Year 1 F2F classes, Year 2 F2F classes
2019 cohort: Year 1 F2F, Year 2 online classes
2020 cohort: Year 1 online, Year 2 online
2021 cohort: Year 1 online, Year 2 F2F
@NickHuntingtonKlein 10 months ago
Thanks, and yep! The only part of what you said that isn't covered in this video is the non-monotonic treatment (treatment turns "on" over time for some but "off" over time for others). There are some DID variants designed for that case, or you might try matrix completion.
@mountainsmusicbeer5532 10 months ago
@NickHuntingtonKlein Thanks for the quick reply. I'll need some time to think about this, but this is encouraging.
@vaibhavpuri9278 10 months ago
Amazing video! Please do expand this with a Stata-based example. Found it really informative.
@robbiemaris6238 8 months ago
Thanks for the great video! What about a situation where there is a staggered rollout over time on two dimensions (location and another covariate)? For example, a new teaching programme is rolled out for different school locations and subjects. So, in Year 1, some schools get the treatment but only teachers of some subjects get the treatment. In Year 2, more schools get the treatment and the list of subjects expands (so some of the teachers at schools initially treated in Year 1 are now treated because their subject is included). Hopefully that makes sense! It's almost like the number of subjects that the programme covers is a measure of treatment intensity that varies over time... However, it's not a linear measure of intensity! Maybe each subject-level teacher training is its own treatment?
@NickHuntingtonKlein 8 months ago
I'd probably treat each subject/school combination as its own treatment, assuming you have data at this level. If I weren't worried about spillovers at all, I had subject-school level outcomes, and my outcomes were comparable between subjects, this is definitely what I'd do. If you're worried that one subject getting trained will affect others in the same school before training occurs, that makes things more complicated.
@robbiemaris6238 8 months ago
@NickHuntingtonKlein Thanks, that's very helpful! Assuming spillovers weren't an issue, how would that many treatments enter a DiD regression framework? Would there be separate treatment dummies for all combinations interacted with a post-treatment variable?
@NickHuntingtonKlein 8 months ago
@robbiemaris6238 Most staggered-DID methods, like those I mention in the video, do separate treatment dummies by cohort, i.e. by when the treatment started for that group.
@robbiemaris6238 8 months ago
@NickHuntingtonKlein Great! So in the example, there would be separate treatment dummies for each cohort (school) and subject combination?
@NickHuntingtonKlein 8 months ago
@robbiemaris6238 I don't know about subject, but cohort yes. See the Callaway and Sant'Anna or Wooldridge methods.
@MMichiganSalveRegina 10 months ago
But how do you aggregate the effects from separate models?
@NickHuntingtonKlein 10 months ago
Add or average them as desired. Everything should be coming from a single model so it should be straightforward to just do linear combinations of coefficients as you might normally do. In software, packages for Callaway and Sant'Anna or Wooldridge will include aggregation commands.
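To make "add or average them as desired" concrete, here is a minimal sketch of two common ways to combine cohort-by-period effect estimates. The estimates, the cohort sizes, and the choice to weight by cohort size are all hypothetical. The estimator packages' own aggregation commands may weight differently, and they also handle standard errors, which this sketch ignores.

```python
# Hypothetical cohort-by-period effect estimates: (cohort, period) -> ATT.
att = {
    (2004, 2004): 1.0,
    (2004, 2005): 2.0,
    (2006, 2006): 1.5,
    (2006, 2007): 2.5,
}
# Hypothetical number of treated units in each cohort.
n_units = {2004: 30, 2006: 10}


def overall_att(att, n_units):
    """Average of all cohort-period effects, weighted by cohort size."""
    total_weight = sum(n_units[g] for (g, _t) in att)
    weighted_sum = sum(n_units[g] * value for (g, _t), value in att.items())
    return weighted_sum / total_weight


def att_by_exposure(att, n_units):
    """Cohort-size-weighted average effect at each length of exposure."""
    sums, weights = {}, {}
    for (g, t), value in att.items():
        exposure = t - g  # periods since the cohort was first treated
        sums[exposure] = sums.get(exposure, 0.0) + n_units[g] * value
        weights[exposure] = weights.get(exposure, 0) + n_units[g]
    return {e: sums[e] / weights[e] for e in sorted(sums)}


overall = overall_att(att, n_units)
dynamic = att_by_exposure(att, n_units)
print(overall, dynamic)
```

The first function gives a single summary number, while the second gives an event-study-style path showing how the effect changes with time since treatment.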
@TheArasmcz 5 months ago
yellow flashes are highly annoying
@user-hp6in6vz3m 17 days ago
Thank you for the amazing explanation! Your textbook is also super helpful. I would like to ask a question regarding other ways to estimate staggered treatment effects. I've come across some papers using matching x classic DID, which looks like this:
1. matching x DID (transforming the treatment year to t=0): The OLS looks like Y_i,t = Treat_i + Post_t + Treat_i * Post_t
2. matching x DID (without transforming the treatment year to t=0): The OLS looks like Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t
I was wondering what the pros and cons of these 2 models are, and how good they are compared to TWFE staggered DID?
@NickHuntingtonKlein 17 days ago
As long as you force it to drop the year just before treatment as the reference year, both of those should give the same result. However, both are wrong under staggered treatment so should only be used if treatment occurs all at the same time. Glad you like the book and videos!
@user-hp6in6vz3m 16 days ago
@NickHuntingtonKlein Hi, thank you for the prompt reply!! I realized these 2 models are essentially doing the same thing. Could you also explain why this model cannot be used with a staggered treatment? Or is there any reference material that I can refer to? I thought this could be an alternative to TWFE DID. Is it because it is not comparing the early treated with late treated?
@NickHuntingtonKlein 16 days ago
@user-hp6in6vz3m As for why this doesn't work, and other estimators, I'd recommend this video! Or the corresponding section of my book, 18.2.5 www.theeffectbook.net/ch-DifferenceinDifference.html#how-the-pros-do-it-2
@user-hp6in6vz3m 16 days ago
@NickHuntingtonKlein Ooh, so Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t, is this method the same as two-way fixed effects DID? Could I understand it as: when we fix some units and some years, the problem you mentioned for two-way fixed effects would happen (like the forbidden comparison)?
@NickHuntingtonKlein 16 days ago
@user-hp6in6vz3m The model you posted with by-year effects is not the same as the regular TWFE model that is just before/after, but neither model works under staggered treatment.
@c.comploj3775 1 year ago
You are targeting graduate students. Why should people read your book, if your videos are very simplistic? Explanations are good, but discussing assumptions etc. might be more relevant for students.
@NickHuntingtonKlein 1 year ago
1. Who says I'm targeting grad students?
2. The book discusses more assumptions, which sort of addresses your other question.