An intuitive introduction to Propensity Score Matching

210,218 views

9 years ago

Propensity score matching is a common technique used to estimate the effects of a treatment or program when you don't have a randomized controlled experiment. In particular, it's used when you have observational data that includes pre-program characteristics that determine whether or not each individual received the treatment.
In this video, I work through a simple example of how it works and give you the basic intuition for the method. I also talk briefly about how to assess how well the method works, and discuss the method's advantages and disadvantages relative to multiple regression.
Intended audience: Folks who have had some exposure to linear regression models, but want to learn more statistical methods.

Comments: 118

@christopherzimmer 2 years ago

Among the dozens of PSM videos, this stands out as simply the best. The central example, shown clearly with the intuitive elements highlighted, and the discussion at the end regarding what PSM does *not* do are crucial and critical! One suggestion: insert a slide showing the logit regression model to really highlight where the probabilities are coming from.

@DermDrNik 4 years ago

This is excellent, refreshing to see a tutorial where you can tell someone knows what they're doing

@moqaraza 9 years ago

Extremely helpful, especially with the simple, minimalistic data example. Thank you.

@katyasotiris9667 2 years ago

Intuitive indeed! Love the simplicity and clarity in your explanation, thank you!

@svalbard01 9 years ago

This was really helpful and intuitive. Thank you!

@namgaydorji3344 4 years ago

Extremely helpful to someone who is just beginning to learn the PSM approach. Thank you very much.

@anupamghosh6578 5 years ago

Excellently presented intuitive explanation of p-score matching! Thank you

@zhulin2531 5 years ago

Well done. It's very clear and I like it when you explained the advantages and disadvantages of propensity score matching. Very useful for interviews

@Dave48797 3 years ago

Loved the Video. Best explanation of Propensity Score Matching I've come across thus far.

@32deepan 5 years ago

Thanks for excellent video Doug. Very informative and intuitive

@myyoutubechannel2858 3 years ago

Thank you --- wonderful video. When I read "intuitive", I was skeptical. But you truly made it intuitive.

@triong 5 years ago

Just beautiful! Thanks a lot, Doug.

@gelodude07 4 years ago

This is so much better than most books!

@daniloamfreire 9 years ago

Very easy to understand. Thanks a lot!

@sangheepark07 7 years ago

This is an amazing explanation! Thank you!

@chaiwuty 9 years ago

Thank you very much. This made me understand a lot more, and I'm looking forward to your video on propensity scores. I use it in medical research.

@Haz2288 7 years ago

Huge thanks for this, Doug!

@montanabuntragulpoontawee4065 4 years ago

So easy to understand. As a clinician, I have a hard time studying statistics. Really appreciate your work. Thank you so much. Please do more videos like this! P.S. I still have a hard time figuring out inverse probability weighting following propensity score use.

@Jhonnydonny 6 years ago

This is an amazing explanation. Thanks.

@Has_1990 4 years ago

Thank you Doug! This was very helpful

@ceciliapisoni 6 years ago

The video is excellent. Thank you very clear and helpful.

@indikamallawaarachchi7188 5 years ago

Very good explanation. Thank you!!!

@olajumokeolateju1104 2 years ago

Your example made it easy to understand. Thanks so much

@szai6068 1 year ago

Okay the second time watching this I finally understood. Thank you!

@projectkfw8201 4 years ago

Thank you very much, sir. After watching many videos, I finally understood it from yours.

@dougmckee673 9 years ago

Thanks so much for the positive feedback!

@roraaa11 1 year ago

Great explanation!

@Lake_mondota 4 years ago

very clear! great example! thanks

@ripples1984 2 years ago

quite intuitive and helpful, thanks!

@richardmuhindo1439 6 years ago

indeed i needed this at this time in my phd studies

@fksons4161 2 years ago

Thank you for this explanation

@sumitmandal3901 2 years ago

amazingly explained! Thanks

@johnnychiu9715 5 years ago

This is great! Thank you!

@linpershey 2 years ago

Brilliant! Learned a lot from it!

@user-xh4lp5ts8g 8 years ago

This is great:D Thanks!

@eviirawan48 7 years ago

Very clear explanation

@douglasespindola5185 5 years ago

Man, I LOVE YOU! Hahaha! Greetings from Brazil! Nice job!

@garbour456 6 years ago

Great video, thanks for doing this

@hangsu5294 5 years ago

Really really helpful, you saved my ass! THANKS!!!!! You earned yourself a subscriber!

@popo-je8ze 2 years ago

great explanation

@andeslam7370 2 years ago

i don't know what to say but your teaching is way better than my professor's teaching.

@paigetao6758 3 years ago

Very helpful. Thank you so much

@TheAkshaykher 4 years ago

Awesome Video!

@spencerfrank8837 4 years ago

Really helpful. Thanks!

@hm.91 3 years ago

Great video! Thanks a lot!

@kayjang4901 5 years ago

Thank you so much for your great presentation. It is really intuitive. I have seen an article that used multiple regression with matched samples instead of using just one approach. What do you think of that? Could you advise me?

@melodydaccache4189 2 years ago

This is excellent

@ericlau6435 5 years ago

Great work

@FlywithZahanat 2 years ago

very clear Dear

@jacksheng7650 1 year ago

God, this is so good!

@yumik4990 4 years ago

I love how your examples are small. There are pros and cons to propensity score matching vs. multivariate regression. But if one believes that the propensity score can be used to explain causal effects, the multivariate regression model is just as able to explain causal effects, as both eliminate confounding factors.

@bijaya7764 6 years ago

Thanks for the teaching... Do you also have video that shows how you calculated the individual ps1 values? thanks

@douglasmangini8744 5 years ago

helped a lot, thank you!

@cocoagardenia 6 years ago

So helpful!

@250nation7 2 years ago

This was well simplified

@timte5924 2 years ago

Excellent video, thank you very much! Can you maybe quickly explain how you calculated and displayed PS1 in Stata? I understand how to run the regression but I struggle to find the PS1 outputs per line, so I can actually match one line to another

@bharathkumar32 4 years ago

Hello Doug, I learned a great deal from your video. I have one challenge in applying it. My treatment observations outnumber my control observations. In this case, how does the matching work? What challenges would this kind of data set generally have?

@MrCuongnguyendang 5 years ago

Thank you for this video, it is very helpful. I need to use the propensity score matching methodology and my dependent variable is a dummy. Could you give me a suggestion for evaluating the difference between the control and treatment groups? Thank you so much

@Potencyfunction 3 months ago

😃 What an interesting score.

@maxi01v 3 years ago

better than my textbook!

@pricillajeyapaul 1 month ago

Thanks a lot bro 🎉

@AbhishekSharma-mt8yz 5 years ago

This is very helpful. What happens if the balancing property is not satisfied?

@chris6925 1 year ago

Awesome!

@NZegg 7 years ago

Dear Doug, Thank you for this very helpful video. I have a question regarding the selection of the covariates when using teffects in Stata. The dataset I'm using contains 2.8 million observations and I want to estimate the causal effect of Brazil's Bolsa Família program (similar to Mexico's Oportunidades, on which you've also uploaded a video) on educational outcomes. I'm not sure which variables I should match the treatment and control groups on. Could you please give any suggestions on how one should choose the right variables for matching? Thank you in advance =)

@fernandojackson7207 7 years ago

Thanks, nice presentation, Prof. Please check if my understanding is correct. I just saw a claim that school X has a graduation rate higher than all other schools with students in similar socioeconomic background. Would PSM work as to make sure that the student groups being compared to each other re graduation, have similar social background?

@yulinliu850 2 years ago

Thanks!

@wgeorge1602 4 years ago

really good

@kellermartinezsolis5926 1 year ago

Thanks for the video! It is very clear, just a quick question: how did you compute in Stata the column "ps1"?

@ghadaabu-sheasha4278 7 years ago

Amazing

@AnandKhanna17 4 years ago

Question, while estimating the propensity score, do we train on the entire dataset or only the records which got the treatment and then estimate for the non-treatment group as unseen data?

@3foss191 7 years ago

thks for the video

@SNSDjennifer 7 years ago

Dear Doug, thank you for making this great and easily understandable video. However, I have a small question regarding the computation of the predicted probability of treatment: could you show me the calculation of one ps1 value in the example? Thank you :)

@powermod6772 2 years ago

Logistic regression models the Posterior P(T|X) as a Bernoulli. So for some x value, the logistic regression model returns a probability p for T=1, i.e. p = P(T=1|X=x). This is the propensity score. Note that in classification p is the predicted probability for T being 1. To make a class label (for which purpose logistic regression is most often used) you simply predict class 1 if p > 0.5. But this class label prediction step is omitted here.
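
A minimal sketch of the idea in the comment above, written in Python rather than the Stata used in the video: fit a logistic regression of the treatment flag on the covariates and keep the fitted probabilities as propensity scores. The data values, the helper names `fit_logistic` and `propensity_scores`, and the plain gradient-ascent fit are all assumptions made for illustration; they are not the video's data and not Stata's estimation routine.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, t, lr=0.1, n_iter=5000):
    """Fit P(T=1 | X=x) = sigmoid(w0 + w.x) by gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, ti in zip(X, t):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            grad[0] += ti - p
            for j, xj in enumerate(x):
                grad[j + 1] += (ti - p) * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def propensity_scores(X, w):
    """Return p = P(T=1 | X=x) for every row; no 0.5 class cutoff is applied."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) for x in X]

# Hypothetical data: [poverty rate, doctors per 1,000 people] and a treatment flag.
X = [[0.5, 1.0], [0.6, 2.0], [0.7, 1.0], [0.3, 3.0],
     [0.6, 2.0], [0.4, 3.0], [0.1, 4.0], [0.3, 5.0], [0.2, 4.0], [0.6, 1.0]]
t = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

w = fit_logistic(X, t)
ps = propensity_scores(X, w)
for ti, p in zip(t, ps):
    print(ti, round(p, 3))
```

Two units with identical covariates (rows 2 and 5 here) receive the same score even though one was treated and the other was not, which is what makes the score usable as a matching variable.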

@mayastoyanovawarner7997 4 years ago

Yes! Thank you! I had so many aha moments watching this!

@roypeijen 8 years ago

Dear Doug, thanks for this video since it already helped me a lot. I have a question though I would like to ask. After you computed ps1 by logistic regression (controlled for vector X), you create match1. How did you create this match1 variable? Did you do this just by hand or is there any stata command that looks at the best match given the scores in ps1? In my large dataset I cannot do it by hand, that is why I am asking. Thanks in advance.

@dougmckee673 8 years ago

+Roy Peijen Great question--I used Stata's "teffects" command. Specifically: . teffects psmatch (imrate) (T povrate pcdocs), gen(match) atet
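
For readers who want to see what the matching step does conceptually, here is a rough Python sketch of one-to-one nearest-neighbor matching on the propensity score, with replacement. It is not Stata's `teffects psmatch` implementation (which also handles ties and standard errors); the function name `match_nearest` and the score values are hypothetical.

```python
def match_nearest(ps, treated):
    """Map each treated unit's index to the control index with the closest score.

    Matching is with replacement, so one control can serve several treated units.
    """
    controls = [i for i, t in enumerate(treated) if t == 0]
    matches = {}
    for i, t in enumerate(treated):
        if t == 1:
            matches[i] = min(controls, key=lambda j: abs(ps[j] - ps[i]))
    return matches

# Hypothetical propensity scores: units 0-3 are treated, units 4-6 are controls.
ps = [0.80, 0.70, 0.60, 0.40, 0.65, 0.35, 0.10]
treated = [1, 1, 1, 1, 0, 0, 0]

matches = match_nearest(ps, treated)
print(matches)  # {0: 4, 1: 4, 2: 4, 3: 5}
```

In this toy input, control 4 is reused three times and control 6 is never used, which mirrors the reuse of a single control in the video's example.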

@masudparvez9133 1 year ago

It's really helpful, but can you please tell me how you calculated ps1? How can I do it in Stata?

@masonwang9218 4 years ago

nice video

@paulinavazquezquintana5662 1 year ago

Which program do you use to calculate this analysis? Are there some code packages, which can be used and upload data? Thanks!

@user-rv3ic2dz9x 3 years ago

informative

@siyuhou1957 4 years ago

I don't quite understand the reasoning behind why we can use people's characteristics to predict whether a person is assigned to the treatment group or not. Why are we assuming that the assignment is based on the characteristics, and hence build a logistic regression to predict the assignment using these characteristics, then use the probability as a measure of 'similarity'? I am sure it's right, just don't understand why...

@paolo4401 9 months ago

My problem is: how do I interpret the new dataset generated after PSM? How do I create a table showing the percentages of each categorical covariate I've chosen for matching?

@valeriablanco03 3 years ago

Hi! Here you calculate ATT = -7, how do you obtain ATE in this simple example?

@zeinebouni8764 8 years ago

Thank you for this video, it is very helpful. I need to use the propensity score matching methodology and my dependent variable is ordinal. I am using Stata 14. I just want to know if there is a specific specification for ordinal outcomes. In Stata 14 we have the choice between continuous outcomes, binary outcomes, count outcomes, fractional outcomes, nonnegative outcomes, and survival outcomes, but not ordinal outcomes. Thank you

@dougmckee673 8 years ago

+Zeineb Ouni I don't know of anything built in, but I think you could use propensity score matching to create your matched control group, and then use something like a Wilcoxon rank-sum test to see if the distributions are significantly different in the two groups. You could also run an ologit with a single independent variable (the treatment dummy) on the combined treatment and matched control data set to quantify the differences. Hope this helps!
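
As a rough illustration of the rank-based comparison suggested above, the following Python sketch computes the Mann-Whitney U statistic, which is equivalent to the Wilcoxon rank-sum statistic up to a constant. The ratings are hypothetical, the function name `mann_whitney_u` is an assumption, and no p-value is computed; with heavily tied ordinal data a tie-corrected test from a statistics package would be needed for inference.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a versus sample b; each tied pair counts one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical ordinal ratings (1 to 7) for treated units and their matched controls.
treated_ratings = [5, 6, 4, 7, 5]
control_ratings = [3, 4, 5, 2, 4]

u = mann_whitney_u(treated_ratings, control_ratings)
# Share of (treated, control) pairs in which the treated rating is higher.
prob_superiority = u / (len(treated_ratings) * len(control_ratings))
print(u, prob_superiority)  # 22.0 0.88
```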

@zeinebouni8764 8 years ago

+Doug McKee Thank you Mr Doug for your response. It's very helpful. I have another idea. This is the situation: the dependent variable is firm ratings (1 to 7; 1 is a low rating and 7 is high). Independent variables: D1 (treatment); D2 (time). I thought I could transform my dependent variable into a binary variable according to the average rating, so Rating2 = 1 if Rating > average and 0 if Rating < average, and then use propensity score matching for binary outcomes with Rating2. What do you think? Thank you so much.

@dougmckee673 8 years ago

+Zeineb Ouni This throws away a little information, but it should work.

@artwork2179 8 years ago

What are the 0.25 and -0.25 written in the blue equation on slide 13? Thanks for the video. It's insightful.

@dougmckee673 8 years ago

+Soumya Upadhyay I'm computing the average in the treatment group by just adding the four outcomes together and dividing by 4 (aka multiplying by 0.25) and then doing the same thing for the matched control group. Hope this clears things up!
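
Written out as a formula, the calculation described in this reply looks as follows. The notation is an assumption introduced here, not taken from the slides: $N_T$ is the number of treated units, $Y_i$ is the outcome of treated unit $i$, and $Y_{m(i)}$ is the outcome of the control matched to it.

```latex
\widehat{\mathrm{ATT}}
  = \frac{1}{N_T}\sum_{i:\,T_i=1} Y_i \;-\; \frac{1}{N_T}\sum_{i:\,T_i=1} Y_{m(i)},
\qquad N_T = 4 \;\Longrightarrow\; \frac{1}{N_T} = 0.25
```

With four treated units, each sum is multiplied by 0.25, which is where the 0.25 and -0.25 come from.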

@nikolov901 8 years ago

I'm trying to learn more about matching and stumbled upon your video. It seems that you frame the question Regression vs. Matching, while other articles I read (including wikipedia) seem to use matching as a preprocessing step in a regression. What's up with this discrepancy?

@dougmckee673 8 years ago

Both are correct. Classic propensity score matching (what I describe here) is an alternative to regression--You use the covariates to identify close matches between observations of treatment and control. More recently it's become popular to combine regression and propensity scores. That is, you can use the inverse of the propensity score for each observation as a weight in a regression analysis.
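
A hedged sketch of the weighting idea mentioned in this reply. In the usual inverse probability weighting convention, treated units are weighted by 1/p and control units by 1/(1-p), where p is the propensity score; the reply above states this more briefly. The function name `ipw_ate`, the normalized form of the estimator, and the numbers are assumptions for illustration only.

```python
def ipw_ate(outcomes, treated, ps):
    """Normalized inverse-probability-weighted estimate of the average treatment effect.

    Assumes every propensity score is strictly between 0 and 1.
    """
    w1 = [t / p for t, p in zip(treated, ps)]              # weights for treated units
    w0 = [(1 - t) / (1 - p) for t, p in zip(treated, ps)]  # weights for control units
    mean1 = sum(w * y for w, y in zip(w1, outcomes)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, outcomes)) / sum(w0)
    return mean1 - mean0

# Hypothetical check: when every unit has the same score, the estimate
# reduces to the plain difference in group means (12 - 22).
ate = ipw_ate([10, 14, 20, 24], [1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
print(ate)  # -10.0
```

Units that were unlikely to end up in their observed group get large weights, so scores very close to 0 or 1 can make the estimate unstable.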

@dharman.bhatta7042 8 years ago

Dear Doug, your videos are very informative and easy to follow. Could you please provide the PSM Stata commands for RCT study designs? Your first video, on DiD, is very easy to follow with the Stata commands. Thank you

@dougmckee673 8 years ago

+Dharma N. Bhatia Glad you like the video! If your RCT is truly randomized, you shouldn't need to do any adjustment using matching--Just use a simple t-test to compare means of continuous variables in your treatment group to your control group.
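
For illustration, a two-sample t statistic can be computed in a few lines of Python. This sketch uses Welch's unequal-variance version, which is a choice made here rather than something the reply specifies, and the outcome values and the function name `welch_t` are hypothetical. It returns the statistic and approximate degrees of freedom only, not a p-value.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t_stat = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df

# Hypothetical outcomes from a randomized treatment group and control group.
t_stat, df = welch_t([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(round(t_stat, 4), round(df, 2))
```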

@dharman.bhatta7042 8 years ago

+Doug McKee , Thank you for your response, yes true, just I wanted to cross check the DiD (impact) with matching or without matching. Thank you.

@michellesaksena1226 8 years ago

Doug, I was wondering if PSM can be used when there is no apparent selection bias, but rather to make a comparison between the treated and non-treated groups. For example, if i were to designate birth cohort as my "treatment" where obviously birth year is not an individual decision, the PSM would essentially boil down to pair-wise controlling of treated and non-treated individuals based on whatever J attributes. As in, the distributions of p-scores should be the same for treated and non-treated groups. For an example, i have seen gender used as a "treatment" to compare wage differentials between men and women within subsets of STEM disciplines and gender is for the most part, not an individual decision. However, this was a tautological exercise so i am not sure if this is actually practiced in real life research. Basically, are there other benefits of PSM other than ameliorating selection bias that are used in practice to justify using PSM? Thanks, Michelle

@dougmckee673 8 years ago

+Michelle Saksena Sometimes people use propensity score matching when they believe the treatment might have very different effects on different groups and they want the control group to look as much as possible like the treatment group. In the situation you describe where you have two groups that are not systematically different, a t-test is the most straight-forward way to compare outcomes. If there is a lot of variation that can be explained by observable characteristics, most people would simply use a regression to increase the precision of the estimate of the difference. Hope this helps!

@michellesaksena1226 8 years ago

this helps! thank you!!

@gelodude07 4 years ago

Good. However, in the logistic regression, why wasn't the predictive accuracy of the model factored in? One can use the confusion matrix and sensitivity.

@toobaahmedalvi7008 10 months ago

How did you conclude that the infant mortality rate was lowered by 7 deaths per 1000? Was 1000 your sample population among treated and non-treated infants?

@3foss191 7 years ago

is lowering the infant mortality by 7...? sorry im not getting well the pronunciation. thks

@zeinebouni8764 8 years ago

Hi Mr Doug, I am very confused by the commands for endogenous treatment effects (eteffects in Stata) and linear regression with endogenous treatment effects (etregress in Stata). What's the main difference, and when do I have to use one rather than the other? Really confused. Thank you for the help.

@dougmckee673 8 years ago

+Zeineb Ouni Great question and believe it or not, this is the first I've heard of either of these commands! Sorry I can't be of any help at all! I recommend spending some quality time with the TE (Treatment Effects) Stata manual.

@zeinebouni8764 8 years ago

Thank you very much for your interest and for the recommendations.

@omarfrikhat5191 1 year ago

Interesting (y)

@anmolpardeshi3138 1 year ago

Why are you considering weights when calculating the effect size, e.g. 0.25*() - 0.25*()? Where did this 0.25 come from, and why?

@danielmillian2024 3 years ago

From where did you get the 0.25?

@f2harrell 3 years ago

It doesn't follow that a large number of control observations are irrelevant if the treatment is very imbalanced. Matching methods tend to discard very applicable controls just because they came later in the dataset. The resulting loss of sample size makes matching inefficient.

@mekonnendemlie2028 3 years ago

It is good and clear; however, it would be clearer with a practical example.

@kareemmohammed7862 3 years ago

At 10:12, where the matches were 6 and 5, the formula is -0.25*(19+25+25+25). It should have been -0.25*(25+19+19+19).

@graysonbuning500 3 years ago

No, observation 5 was matched three times, and thus we use observation 5's outcome of 25 three times.
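
The point about reusing a control can be checked with a few lines of Python. The match list and the two outcome values (19 for observation 6, 25 for observation 5) are taken from the two comments above; the variable and function names are hypothetical.

```python
from collections import Counter

def matched_control_mean(matches, outcomes):
    """Average outcome over matched controls, counting a reused control once per match."""
    return sum(outcomes[m] for m in matches) / len(matches)

# One matched control per treated unit, as quoted in the thread:
# observation 6 once and observation 5 three times.
matches = [6, 5, 5, 5]
outcomes = {5: 25, 6: 19}

print(Counter(matches))                         # Counter({5: 3, 6: 1})
print(matched_control_mean(matches, outcomes))  # 23.5, i.e. 0.25*(19+25+25+25)
```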

@artwork2179 8 years ago

Mr. McKee, you said that the command is logistic. Isn't it psmatch2 in Stata?

@dougmckee673 8 years ago

+Soumya Upadhyay People used to use 3rd party plugins to do propensity score matching in Stata, but in version 13, Stata added the teffects command which is quite powerful and does ps matching along with several other things.