Comments
@CharlieDickens-j4m 2 days ago
Hi mate. Not sure if you are still using this platform, but I have a question for you. For a job application assignment I need to use diff-in-diff to estimate the impact of a job training programme on log(employment). For context, in this hypothetical scenario, local governments either roll out the programme (treatment group) or not (control group). I have data on log(employment) and log(population) for 3 periods: two pre-treatment periods (the parallel trends assumption holds) and one post-treatment period. My current regression looks like Δlog(employment) = time fixed effects + group fixed effects + DiD dummy + log(emp) + ε. Does it make sense to add an interaction term between each local government and the DiD dummy to capture the heterogeneity in treatment effects across treated units?
@NickHuntingtonKlein 2 days ago
I would probably make log(employment) the dependent variable and leave it off the right side, but your version works too I think. I'm assuming treatment is applied to all cases at the same time - if treatment is staggered then the typical TWFE setup doesn't work. If you add an interaction term for each local government, then in effect what you're doing is running a separate DID for each local government vs. the same control group. There's nothing inherently wrong with this, although keep in mind that your effective sample size for each of the DIDs will be much smaller (i.e. do you have enough sample to actually be able to look at each effect separately) and you now have *many* parallel trends assumptions to investigate - one for each treated government - rather than just one.
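A minimal sketch of the "separate DID for each local government vs. the same control group" idea from the reply above. All unit names and log(employment) values are hypothetical, and the sketch uses plain differences in means rather than a regression; in this balanced toy setup the pooled DID happens to equal the average of the unit-level DIDs.

```python
from statistics import mean

# Hypothetical log(employment): unit -> (pre-period mean, post-period value)
control = {"C1": (4.00, 4.05), "C2": (3.50, 3.55), "C3": (5.00, 5.05)}
treated = {"T1": (4.20, 4.35), "T2": (3.80, 3.87)}


def change(pre_post):
    """Before/after change for one unit."""
    pre, post = pre_post
    return post - pre


# Average before/after change in the shared control group
control_change = mean(change(v) for v in control.values())

# One DID per treated government, each compared to the same control change
unit_effects = {u: change(v) - control_change for u, v in treated.items()}

# Pooled DID: average treated change minus average control change
pooled_effect = mean(change(v) for v in treated.values()) - control_change

for unit, effect in unit_effects.items():
    print(f"{unit}: {effect:.3f}")
print(f"pooled: {pooled_effect:.3f}")
```

Each unit-level estimate rests on a single treated unit, which illustrates the small effective sample size mentioned in the reply.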
@Luger0312 2 days ago
Dear Mr. Huntington-Klein, I have only today received my copy of your book "The Effect". I'm currently working on my final paper for my Bachelor's degree in sociology. I found your book when I was trying to get a better understanding of how to interpret squared terms and their respective base variables in a regression model. I read but a few paragraphs of the online book and decided to order a copy. Books about statistics aren't usually very "beginner friendly", but I found your approach of explaining things with little assumed knowledge and, where knowledge needs to be assumed, pointing to the respective chapter, very encouraging. I'll have to rewrite certain paragraphs a little. Not because my understanding and explanation were wrong, but because with the help of your book I'll be able to describe certain contents more precisely while also making them easier to understand. After reading a few subchapters, I am already certain your book will not only be a great help for my paper, but will also be a go-to read when trying to solve a problem or gain a better understanding of statistical methods in the future. Or even for spare-time reading, something I wouldn't expect to say about a book discussing statistics. So, long story short, what I want to say is thank you.
@NickHuntingtonKlein 2 days ago
@@Luger0312 you're welcome! Glad you've enjoyed the book so far, and hope the rest is as helpful.
@rizkydarmawan6540 2 days ago
Thank you for this. I needed a refresher on this particular subject and this video is one of the best there is. Simple and intuitive with good practical examples 👍
@atiyaabdulkarim716 3 days ago
Hi, do you add covariates to your model? If so, do you use covariates measured at baseline?
@NickHuntingtonKlein 3 days ago
Since in ITS the treatment applies to everyone at the same time, baseline-measured covariates can't be a source of confounding, so adding covariates won't solve any causal inference issue. But you can add them to improve predictive power and reduce noise. Covariates that change over time might in some cases be necessary to solve causal inference issues, but you need to be careful with these to avoid issues like post-treatment bias.
@aza6513 5 days ago
It's just econ that does that, hahah. They just rediscover something and name it like it's new.
@NickHuntingtonKlein 5 days ago
@@aza6513 wait till you hear about machine learning
@MB-sh9ur 8 days ago
Hi. I would like to discuss a project I'm doing with you. Can you please tell me how I can connect with you?
@bisiadeyemo3082 10 days ago
The video would be a lot better if you explained the coefficients instead of glancing through them. Even in your book, you barely explain the coefficients. It's not just you; other books on advanced methods do not do a very good job of explaining the coefficients.
@NickHuntingtonKlein 10 days ago
What about the section titled "How do we interpret the results of this regression once we have estimated it?"
@bisiadeyemo3082 10 days ago
@@NickHuntingtonKlein in journal articles, you are presented with only the coefficients, and most students typically have problems explaining them. This is by far more important than the inner workings, because most statistical software will do the calculations for you. I read through your instrumental variables section, and you barely explain the results of the first-stage regressions. It's similar with DID and regression discontinuity. This is not just you; several of the books that I have read tend to pay little attention to explaining the coefficients.
@donoiskandar6820 11 days ago
Hi Nick. I have just started following your causality series, and it really is wonderful. I just wonder: in the case of fixed effects, could they unintentionally control for a collider and thus introduce bias? Let's say in the height vs. basketball ability in the NBA example (assuming there is height variation in each year, while there is no variation in NBA status across years).
@NickHuntingtonKlein 11 days ago
Thank you! It would be an unusual case where fixed effects introduce collider bias, since for that to be the case, one of those fixed-over-time characteristics would have to be caused by two separate variable-over-time characteristics. It's certainly possible that there is a collider bias problem in the analysis anyway that the fixed effects don't solve, though. In the NBA example, there's already a collider bias problem having to do with the ability to get into the NBA, and fixed effects would not resolve the issue.
@donoiskandar6820 3 days ago
@@NickHuntingtonKlein Thank you for your enlightening answer Nick!
@donoiskandar6820 11 days ago
60% go when sick, 10% go when not sick. Thus 60 - 10 = 50% of going to the doctor is explained by being sick. Are you assuming that the sample of people who are not sick and the sample of those who are sick are the same size?
@NickHuntingtonKlein 11 days ago
Nope! It still works with uneven sample sizes.
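A small sketch of why uneven sample sizes are not a problem here, assuming hypothetical counts chosen to reproduce the 60% and 10% figures from the question. Each share is computed within its own group, so the group sizes drop out of the comparison.

```python
# Hypothetical counts; the two groups deliberately differ in size
sick = {"n": 200, "went_to_doctor": 120}       # 120 / 200 = 60%
not_sick = {"n": 1000, "went_to_doctor": 100}  # 100 / 1000 = 10%


def share(group):
    """Share of a group that went to the doctor."""
    return group["went_to_doctor"] / group["n"]


# Difference in shares between the sick and the not-sick groups
difference = share(sick) - share(not_sick)
print(f"{difference:.2f}")
```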
@guzwall 17 days ago
Great explanation!!
@Seitanistin 22 days ago
Thanks from a German socioeconomics student! :)
@user-hp6in6vz3m 23 days ago
Thank you for the amazing explanation! Your textbook is also super helpful. I would like to ask a question about other ways to estimate staggered treatment effects. I've come across some papers that combine matching with classic DID, in two forms:
1. Matching x DID (transforming the treatment year to t=0). The OLS looks like Y_i,t = Treat_i + Post_t + Treat_i * Post_t
2. Matching x DID (without transforming the treatment year to t=0). The OLS looks like Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t
I was wondering what the pros and cons of these two models are, and how good they are compared to TWFE staggered DID?
@NickHuntingtonKlein 23 days ago
As long as you force it to drop the year just before treatment as the reference year, both of those should give the same result. However, both are wrong under staggered treatment so should only be used if treatment occurs all at the same time. Glad you like the book and videos!
@user-hp6in6vz3m 22 days ago
@@NickHuntingtonKlein Hi, thank you for the prompt reply!! I realized these 2 models are essentially doing the same thing. Could you also explain why this model cannot be used with staggered treatment? Or is there any reference material that I can refer to? I thought this could be an alternative to TWFE DID. Is it because it is not comparing the early treated with the late treated?
@NickHuntingtonKlein 22 days ago
@@user-hp6in6vz3m as for why this doesn't work, and other estimators, I'd recommend this video! Or the corresponding section of my book, 18.2.5 www.theeffectbook.net/ch-DifferenceinDifference.html#how-the-pros-do-it-2
@user-hp6in6vz3m 22 days ago
@@NickHuntingtonKlein Ooh, so the method Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t is the same as two-way fixed effects DID? Could I understand it this way: when we fix some units and some years, the problem you mentioned for two-way fixed effects (like the forbidden comparison) would happen?
@NickHuntingtonKlein 22 days ago
@@user-hp6in6vz3m the model you posted with by-year effects is not the same as the regular TWFE model that is just before/after, but neither model works under staggered treatment.
@saadsarwar7162 1 month ago
Man! You are amazing. I have been searching for videos on YouTube to learn from for a long time, but none of them was enough for me to understand properly. Thank you!!
@qinghuafeng1705 1 month ago
Hi Dr. HK, could I understand it this way: if the controls influence both x and y, then including them or not in the model will influence the coefficient on x and also the adjusted R-squared. But if they only influence y and not x, then they won't influence the coefficient on x or the adjusted R-squared, but will influence R-squared. Thank you for taking the time.
@NickHuntingtonKlein 1 month ago
Incorrect. Adjusted r square will be affected either way.
@qinghuafeng1705 1 month ago
@@NickHuntingtonKlein Thank you very much!
@aleksandermolak5885 1 month ago
As a person with hierarchical modeling background, I suffer major confusion every time I hear "fixed effects" in the other meaning 🙈
@Matthew-eb3di 1 month ago
This is the best explanation and animation I’ve ever seen for multiple regression and control variables! 🎉🤩
@dany84ct 1 month ago
What do you think about c#?
@NickHuntingtonKlein 1 month ago
I know they use it in computational finance sometimes. Wouldn't you have to program everything yourself from scratch though for most econometric applications? If you're willing to do that then c# would be fine but also so would any general purpose programming language.
@tareqalmahmud621 1 month ago
could not find function "linearHypothesis"
@NickHuntingtonKlein 1 month ago
It's in the car package, so install car and then library(car).
@zahradidarali5804 1 month ago
Great videos! Video would be even better if you spoke slower :)
@qinghuafeng1705 1 month ago
Because of your videos, I think I understand fixed effects now. I appreciate your excellent explanations and your replies to our questions!
@qinghuafeng1705 1 month ago
Great explanation for the R^2! Thanks a lot! If the R^2 is very low, does that mean there might be omitted variables?
@qinghuafeng1705 1 month ago
Hi Dr. HK, in this video, you said "The omitted variable bias part, the ordinary least squares will assign the effect of Z to being the effect of X". Could you explain what "the ordinary least squares will assign" means? Thank you!
@NickHuntingtonKlein 1 month ago
Meaning that if you regress Y on X alone, the coefficient on X will include both the effect of X and some part of the effect of Z. The statistical method can't separate out the Z effect since it doesn't know about Z, so that gets lumped into the X coefficient.
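A minimal sketch of the "lumped into the X coefficient" point, using made-up data with no noise so the arithmetic is exact. The names `beta_x`, `beta_z`, and the data values are assumptions for illustration. Regressing Y on X alone recovers the effect of X plus the effect of Z scaled by how strongly Z moves with X.

```python
from statistics import mean


def slope(x, y):
    """OLS slope from regressing y on x alone (with an intercept)."""
    mx, my = mean(x), mean(y)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    return cov_xy / var_x


# Hypothetical data: Z is the omitted variable and is correlated with X
z = [0, 1, 2, 3, 4, 5, 6, 7]
x = [1, 1, 3, 2, 5, 4, 6, 8]
beta_x, beta_z = 2.0, 3.0  # assumed true effects of X and Z on Y
y = [beta_x * xi + beta_z * zi for xi, zi in zip(x, z)]

# Regression of Y on X alone, leaving Z out
short_coef = slope(x, y)

# The part of Z's effect that gets assigned to X:
# beta_z times the slope from regressing Z on X
bias = beta_z * slope(x, z)

print(f"coefficient on X without Z: {short_coef:.3f}")
print(f"true effect of X plus bias: {beta_x + bias:.3f}")
```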
@qinghuafeng1705 1 month ago
Hi Dr. HK, if I control for person fixed effects, could I understand that if a person never changes cities, the coefficient on the city doesn't actually capture the influence of this person? The coefficient only captures the influence of people who move cities? Thank you!
@NickHuntingtonKlein 1 month ago
Correct
@qinghuafeng1705 1 month ago
"by changing a variable by making a wet spot in front of my store. That's all there needs to be for there to be a causal relationship. Me changing this variable, making the floor wet, changes the distribution of another: it increases the probability that somebody will fall, even if nobody actually did." - It's really good to know, thank you!
@rkmofficial202 1 month ago
Thank you very much, Dr. for sharing the link. Very interesting and knowledgeable.
@wangguan1548 1 month ago
Hey, Dr. HK. If a variable is a mediator, should we control for it in a regression?
@NickHuntingtonKlein 1 month ago
Generally no.
@wangguan1548 1 month ago
@@NickHuntingtonKlein Thanks for the response!!
@qinghuafeng1705 1 month ago
This is really helpful for me. Could I understand that demeaning actually removes the influence of something we cannot observe but that is unique to that individual? Thank you for sharing and for your reply.
@NickHuntingtonKlein 1 month ago
That's right. It removes anything about that individual that is constant over time. It will not remove things unique to that individual that also change over time.
@qinghuafeng1705 1 month ago
@@NickHuntingtonKlein Thank you so much! I appreciate your quick reply!
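A minimal sketch of the demeaning point in this thread, using hypothetical noise-free data in which each person has an unobserved, time-constant effect (`person_effect`) and an assumed true slope of 2. Subtracting each person's own means removes the constant part, so the within-person slope recovers the true value while the pooled slope does not.

```python
from statistics import mean


def ols_slope(pairs):
    """OLS slope of y on x (with an intercept) for a list of (x, y) pairs."""
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    cov_xy = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x, _ in pairs)
    return cov_xy / var_x


true_beta = 2.0                             # assumed true effect of x on y
person_effect = {"A": 10.0, "B": -5.0}      # unobserved, constant over time
x_values = {"A": [1.0, 2.0, 4.0], "B": [6.0, 7.0, 9.0]}

# y = person effect + true_beta * x, observed over three periods per person
data = {
    p: [(x, person_effect[p] + true_beta * x) for x in xs]
    for p, xs in x_values.items()
}

# Pooled regression ignores who each observation belongs to
pooled = [pair for rows in data.values() for pair in rows]

# Within transformation: subtract each person's own mean of x and of y
demeaned = []
for rows in data.values():
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    demeaned.extend((x - mx, y - my) for x, y in rows)

naive = ols_slope(pooled)
within = ols_slope(demeaned)
print(f"pooled slope: {naive:.3f}")
print(f"within slope: {within:.3f}")
```

Anything about a person that changed over time would survive the demeaning and would still need to be handled separately.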
@qinghuafeng1705 1 month ago
Really helpful, thank you very much! Now I know why controlling for drugs is a bad control. But could you let me know why you call a confounding factor a "back door"? Why is it "back"? Thank you.
@NickHuntingtonKlein 1 month ago
Thanks! The back door terminology comes from the fact that it's an alternate way you can get from cause to effect. On a causal diagram, you can follow arrows pointing from cause to effect (for example cause -> outcome) - those are front doors. But there are two ways to get out of your house - the front door or a back door! Back doors are an alternate way to get from cause to effect (for example cause <- confounder -> outcome).
@ski34able 1 month ago
Very helpful thanks!
@lb.basnet 1 month ago
nice explanation
@wangguan1548 1 month ago
Fantastic explanation!!!
@wangguan1548 1 month ago
Hey, Dr. HK, I really love your videos and the way you explain things. Btw, I think there may be a mistake/typo in this video: at 4:46, when X=1, shouldn't the effect of a change in X be beta1 + 2 beta2, and likewise when X=5, shouldn't it be beta1 + 10 beta2?
@NickHuntingtonKlein 1 month ago
Thanks! And yes a 2 got dropped.
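The correction in this exchange follows from the derivative of beta1*X + beta2*X^2 with respect to X, which is beta1 + 2*beta2*X. A small sketch with hypothetical coefficient values (the numbers are not from the video):

```python
def marginal_effect(beta1, beta2, x):
    """Slope of beta1*X + beta2*X**2 with respect to X, evaluated at x."""
    return beta1 + 2 * beta2 * x


# Hypothetical coefficients, chosen only for illustration
beta1, beta2 = 3.0, -0.25

at_1 = marginal_effect(beta1, beta2, 1)  # beta1 + 2 * beta2
at_5 = marginal_effect(beta1, beta2, 5)  # beta1 + 10 * beta2
print(at_1, at_5)
```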
@blaisepascal3905 1 month ago
I learned most of these languages, in the following order: Stata - Python - R - Julia. And by far R and Julia are the best! (At least for me)
@ifeyinwaumeokeke2571 1 month ago
Hi, thanks very much for this video. I would love to know which packages you installed before library(margins). Thank you. I am using version 4.3.1.
@NickHuntingtonKlein 1 month ago
The other two packages I loaded before margins were "wooldridge" (which I just used to get data) and "jtools" (which I used for regression tables, although these days I'd more likely use modelsummary)
@haraldurkarlsson1147 2 months ago
As an NRC fellow at NASA JSC I studied Martian meteorites for my postdoc. Some of the samples were onsite while others had to be obtained from natural history museums or individual researchers. Although I was fairly successful in obtaining the stones I desired, some refused to send or share samples (for reasons typically not given). What type of missingness is that?
@NickHuntingtonKlein 2 months ago
If you think the decision to withhold the stones from you is related to the characteristics of the stones, that's MNAR. If the choice to withhold is random, it's MCAR. It can't be MAR because you're missing the entire observation instead of just some of the values. Yours is more a case of sample selection than missing data (which usually implies you have some variables for your observations but not other variables).
@haraldurkarlsson1147 2 months ago
@@NickHuntingtonKlein Interesting. These samples are typically rare and thus curators or individuals do not want to part with a big sample used for destructive analysis (different from simple loans). The material after the analysis had less or no value for some future work. Another reason is that museum curators (I used to be one) are simply not willing to part with rare samples. Finally, some may simply want to do the work you are proposing themselves.
@haraldurkarlsson1147 2 months ago
Very interesting. Now, there are some missing values in the card data. Father's education is missing for about 23% and IQ for about 32%. Is that of concern in the modelling?
@NickHuntingtonKlein 2 months ago
Yes that can be a concern and may be enough to warrant an approach like multiple imputation
@haraldurkarlsson1147 2 months ago
@@NickHuntingtonKlein What is considered an "acceptable" loss, percentage-wise? This is tricky stuff. I know that major issues have arisen due to improper imputation (e.g. Rogoff at Harvard, if I recall correctly).
@NickHuntingtonKlein 2 months ago
@@haraldurkarlsson1147 Was Rogoff a multiple imputation issue? I thought it was something else. There's not really a specific cutoff (cutoffs that guide your inference or analysis in statistics are almost always a bad idea or at least subpar). But if there's a small amount of missing data (say in the like 5% range), then it likely won't cause a huge issue. With more than that, at the very least you need to start thinking about why it's missing.
@haraldurkarlsson1147 2 months ago
@@NickHuntingtonKlein I think you are right in regards to RR (Reinhart and Rogoff). I may have mistaken the omission of countries in the study by RR as the result of imputation. In the paper criticizing the results (Herndon, Ash and Pollin) it is stated that "The omitted countries are selected alphabetically. It is clear from the spreadsheet itself that these are random exclusions." (section 3.2, Spreadsheet coding error). That is what caught my eye. However, it does show the effect of selective use of data and its dangers. Thanks for your reply.
@sebastionheitzmann3233 2 months ago
Correlation is not causation, still a good decision😂 Cracked me up :)
@oumardiallo7292 2 months ago
Very neat!
@ginaelconstelmpoubou1986 2 months ago
Great
@RumbutterMcSquash 2 months ago
Excellent video. Those jump cuts are giving me epilepsy though.
@tusharsaini6558 2 months ago
Brexit vote is a good example.
@shankarjeetpanda3476 2 months ago
Literally saved me. Ur a legend ❤
@dehiole6463 2 months ago
0:45❤❤
@dehiole6463 2 months ago
Is it still true if I put white = 1; not white = 2? 6:50
@NickHuntingtonKlein 2 months ago
Yep it will work for any value of white (although in this case white is binary so it can only be 0 or 1 anyway, but in a case with a variable with a wider range, yes)
@cheriseregier4729 2 months ago
Thank you. Do you know of an informative resource that lays out the R code for CEM?
@NickHuntingtonKlein 2 months ago
Yep, the whole series is about my book, which contains code for most of the topics, including CEM. See near the end of this section of Chapter 14: www.theeffectbook.net/ch-Matching.html
@cheriseregier4729 2 months ago
@@NickHuntingtonKlein Wonderful, thank you again.
@cheriseregier4729 2 months ago
@@NickHuntingtonKlein Can you tell me what the 'w' in this code from the chapter you linked above represents? I am having trouble figuring out how to incorporate the CEM data into my regression analysis. In other words, once I have run the cem() code, what are the next steps? It looks like you re-weight your data based on CEM and then specify this weighting in the regression?
brcem <- brcem %>% mutate(cem_weight = c$w)
lm(responded ~ leg_black, data = brcem, weights = cem_weight)
@NickHuntingtonKlein 2 months ago
@@cheriseregier4729 the w is the CEM weight. See further up the section to see how this is calculated. And yep, once you have the weight you can use it as a weight in your analysis to apply your CEM matching set/weights to any regression, means comparison, etc.
@cheriseregier4729 2 months ago
@@NickHuntingtonKlein Thank you🙏
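A rough sketch of how weights like the `w` discussed in this thread can be computed by hand. It assumes the commonly cited CEM weighting rule (treated units in matched strata get weight 1, controls get the ratio of treated to control counts in their stratum rescaled by the overall matched counts, unmatched units get 0); this is an assumption about the formula, not a reproduction of the cem package or of the book's code, and the strata and units are made up.

```python
from collections import Counter

# Hypothetical units: (stratum after coarsening, treated indicator)
units = [
    ("s1", 1), ("s1", 0), ("s1", 0),
    ("s2", 1), ("s2", 1), ("s2", 0),
    ("s3", 0),  # stratum with no treated units, so it is unmatched
]

treated_per = Counter(s for s, t in units if t == 1)
control_per = Counter(s for s, t in units if t == 0)

# A stratum is matched only if it contains both treated and control units
matched = {s for s in treated_per if control_per[s] > 0}
m_T = sum(treated_per[s] for s in matched)  # matched treated units overall
m_C = sum(control_per[s] for s in matched)  # matched control units overall


def cem_weight(stratum, is_treated):
    """Weight for one unit under the assumed CEM weighting rule."""
    if stratum not in matched:
        return 0.0
    if is_treated:
        return 1.0
    return (m_C / m_T) * (treated_per[stratum] / control_per[stratum])


weights = [cem_weight(s, t) for s, t in units]
print(weights)
```

These weights would then play the role of `cem_weight` in a weighted regression or weighted comparison of means.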
@anthonymenor1152 2 months ago
Could you ever do an interaction term between just 1 level of a categorical variable and another variable?
@NickHuntingtonKlein 2 months ago
I suppose you could but it would make the interpretation pretty difficult. The coefficient on the other variable would be a weighted average of the effects for the other levels of the category, and the coefficient on the interaction would be the difference between the effect for that group and that weird weighted average.
@kirkmotocross9388 2 months ago
This may be too old of a video, but outreg2 is not working. It says it is unrecognized.
@kirkmotocross9388 2 months ago
Well, I got it to work from other comments, but now it is opening in Word.
@NickHuntingtonKlein 2 months ago
@@kirkmotocross9388 Set the output document type in "using" i.e. output.tex or look at the help file for how to set the output type using options.
@marinakousta680 2 months ago
Thanks for the great video and your very helpful book! May I ask: how exactly would the Stata code look to test the "pre-treatment" period effects you are seeing and then plot them?
@NickHuntingtonKlein 2 months ago
Thanks! See this section of the book www.theeffectbook.net/ch-DifferenceinDifference.html#long-term-effects
@tomgroves1497 2 months ago
Thank you so much for this video! ChatGPT had me running in circles with error codes!
@jayp5898 2 months ago
Hello Nick, thank you for the detailed explanation. How would you do this (code in Python) when you have 20 independent variables, one of which is a binary treatment variable?
@NickHuntingtonKlein 2 months ago
In statsmodels? See this section of my book www.theeffectbook.net/ch-StatisticalAdjustment.html#coding-up-polynomials-and-interactions
@jayp5898 2 months ago
@@NickHuntingtonKlein Thank you for the quick response. formula = 'inspection_score ~ NumberofLocations*Weekend + Year' - this assumes that there is an interaction only between NumberofLocations and Weekend. What if I have variables x1, x2, ..., x20 and x21 as the treatment variable? Can I write something like: formula = 'y ~ x21*(x1 + x2 + x3 + ... + x19 + x20)'?
@NickHuntingtonKlein 2 months ago
@@jayp5898 that would work in R but I'm not sure about the statsmodels implementation of patsy. Try it and see! Otherwise you might have to do them one at a time (but you can build the model string procedurally using regular ol string manipulation if that's the case)
@jayp5898 2 months ago
@@NickHuntingtonKlein Hi Nick, I was able to fit a logistic regression in Python including all variables and their products with the treatment variable. However, this model's ROC is much lower (0.53) than that of the model with just the variables (main effects). Also, this model only contains the interaction effects and none of the main variables. What am I doing wrong?
@NickHuntingtonKlein 2 months ago
@@jayp5898 You definitely do want to include the variables by themselves, not just the interactions. So that's likely to be your issue. In addition, do you mean AUC instead of ROC? I wouldn't be too worried about your ROC at any given cutoff (and I'd only worry about AUC if your goal is classification as opposed to inference).
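A small sketch of the "build the model string procedurally" suggestion from this thread, written so that the main effects are included alongside the interactions. The variable names x1 to x21 and y follow the question; the resulting string is only assembled here, and passing it to a formula interface such as statsmodels is left as an assumption to verify.

```python
# Covariate names x1 ... x20 and the treatment variable x21, as in the question
covariates = [f"x{i}" for i in range(1, 21)]
treatment = "x21"

# Main effects: every covariate plus the treatment by itself
main_effects = covariates + [treatment]

# Interaction-only terms between the treatment and each covariate
interactions = [f"{treatment}:{x}" for x in covariates]

# Assemble a formula string with 21 main effects and 20 interactions
formula = "y ~ " + " + ".join(main_effects + interactions)
print(formula)
```

Writing each term out explicitly makes it easy to confirm that no main effect has been dropped.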