Propensity Score Analysis in R with Nearest Neighbor, Optimal Pair, and Optimal Full Matching

15,335 views

A day ago

Comments: 46

@statsguidetree 1 year ago

I needed to update the R code that loads and cleans the dataset to get the data ready for the analyses. Please use the updated R code here to follow along: gist.github.com/musa5237/78a694bd6663a92a82e45e684e616724

@sanjanakhondaker887 1 year ago

What an amazing explanation!!! Hats off. You even provided the R-script. Super helpful! You saved my thesis, thank you so very much.

@basser1995 3 years ago

I am pretty desperate because I need to perform a propensity score matched analysis, having never used R (I used SPSS). But 15 minutes into this video I can already tell it's going to be extremely helpful!

@statsguidetree 3 years ago

Thank you so much for the compliment.

@muhammedhadedy4570 1 year ago

I've watched many tutorials explaining propensity score matching on YouTube, and I can tell that this video is the best I've ever seen. Well done, sir. You helped me a lot. ❤❤❤❤

@lanredaodu945 6 months ago

Excellent tutorial, I watched it 3x.

@brainwt 5 months ago

Very good guide! Thanks

@analyticspipeline2526 2 years ago

Great video, thank you for that!

@francyy-ug1qr 8 months ago

thank you sm!!

@manonkinaupenne2090 1 year ago

Thank you very much for this clear explanation! I have a small question: would you use PSM to match patients to healthy controls in a cross-sectional case-control study? I want to look at the difference in physical activity, expressed in minutes per day (dependent variable), between these two groups. Thank you!

@statsguidetree 1 year ago

Yes, PSM should always work when you have a control group.

@alexwisniewski7105 1 year ago

Do you include both the quadratic and non-quadratic terms in your propensity score model? For example, if my quadratic term had a lower SMD, should I remove the non-quadratic term and include only the quadratic one in my final model?

@statsguidetree 1 year ago

This depends on your data, the type of relationships you want to capture, and what makes sense for the data you are working with. If you have both a linear term and a quadratic term for your explanatory variable in the model, you are saying that the relationship between your response and the explanatory variable is both linear and quadratic (i.e., your model captures both), but by keeping just the quadratic term you are saying the relationship is purely quadratic. Generally, if you want to capture a wider range of relationships you can keep both, but be mindful that this could lead to overfitting.
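A minimal base-R sketch of the point above, using made-up data (the names `treat`, `age`, and `dat` are hypothetical): the propensity model is fitted once with the linear term only and once with linear plus quadratic terms, and the two fits are compared by AIC. `I(age^2)` is needed because `^` has a special meaning inside R formulas. The same right-hand side could be passed to MatchIt, e.g. `matchit(treat ~ age + I(age^2), data = dat)`; balance would still be judged on SMDs and variance ratios rather than on the fit comparison alone.

```r
# Hypothetical simulated data: binary treatment and one continuous covariate
set.seed(123)
n <- 500
age <- rnorm(n, mean = 40, sd = 10)
# Treatment assignment depends on age both linearly and quadratically
p_true <- plogis(-0.5 + 0.03 * (age - 40) + 0.004 * (age - 40)^2)
treat <- rbinom(n, size = 1, prob = p_true)
dat <- data.frame(treat, age)

# Propensity model with the linear term only
ps_linear <- glm(treat ~ age, family = binomial, data = dat)
# Linear + quadratic; I() stops ^ being read as formula syntax
ps_quad <- glm(treat ~ age + I(age^2), family = binomial, data = dat)

# Lower AIC suggests the extra term earns its added complexity
print(c(linear = AIC(ps_linear), quadratic = AIC(ps_quad)))

# Estimated propensity scores from the fuller model
dat$ps <- fitted(ps_quad)
summary(dat$ps)
```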

@amalalkalbani4572 2 years ago

Thank you for the comprehensive explanation. I have an issue with my PSA: the variance ratio doesn't appear when I use the summary function. I only get dots! Could you please tell me why? Thank you. (All my covariates are categorical and binary.)

@fleurestethique 2 years ago

I had the same problem when I entered my covariates as factors in the formula, but the variance ratios appeared once I converted them with as.numeric(). I don't know what that means in terms of interpretation, though.

@statsguidetree 2 years ago

@fleurestethique I noticed that the function for visualizing overall imbalance, love.plot(), does not allow for categorical variables. However, you can still inspect the covariate imbalance when you use the summary() function.
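On the dots themselves, one likely explanation (an assumption, not confirmed in the thread) is that MatchIt's `summary()` leaves the variance ratio blank for binary covariates. For a 0/1 variable the variance is fully determined by its proportion, p(1 - p), so a variance ratio adds nothing beyond the two proportions; converting the variable with `as.numeric()` makes a ratio appear, but it carries the same information as the difference in proportions. A small base-R check with hypothetical data and hypothetical helper names:

```r
# Hypothetical 0/1 covariate in each group
treated_x <- c(1, 1, 1, 0, 0, 1, 1, 0, 1, 1)  # proportion 0.7
control_x <- c(1, 0, 0, 0, 1, 0, 1, 0, 0, 1)  # proportion 0.4

# Population variance computed directly from the data
pop_var <- function(x) mean((x - mean(x))^2)
# The same variance from the proportion alone: p * (1 - p)
bin_var <- function(x) { p <- mean(x); p * (1 - p) }

print(c(direct = pop_var(treated_x), from_p = bin_var(treated_x)))

# The "variance ratio" is just a function of the two proportions
vr_binary <- bin_var(treated_x) / bin_var(control_x)
vr_binary
```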

@vikasmishra4485 2 years ago

This video is pretty informative. I have one question. In the covariate balance plot from cobalt, do we need both the mean and the variance statistics to be balanced? In my case the means are balanced within the threshold but the variances are not. Can I say the matching is balanced based on mean balance only?

@statsguidetree 2 years ago

It is good to have both. I presented only one set of criteria, but other criteria have been suggested, and recommendations in the literature keep changing. I would try some other techniques to see if I can get better balance. If I cannot do better, I would report it in the methods and in the discussion/limitations. Balancing the covariates is a big part of the challenge of PS matching.
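A hedged base-R sketch of checking mean balance and variance balance separately, using the ranges quoted later in this thread (SMD between -0.1 and 0.1, variance ratio between 0.8 and 1.25). The helpers `smd()`, `vr()`, and `check_balance()` and the data are hypothetical, and the SMD here uses the pooled SD of both groups, which is one common choice; packages such as cobalt may standardize differently. The example covariate has identical means but a much wider spread in the treated group, i.e., the situation described in the question.

```r
# Standardized mean difference using the pooled SD of the two groups
smd <- function(x_t, x_c) {
  (mean(x_t) - mean(x_c)) / sqrt((var(x_t) + var(x_c)) / 2)
}
# Variance ratio: treated variance over control variance
vr <- function(x_t, x_c) var(x_t) / var(x_c)

# Flag each statistic against a chosen threshold
check_balance <- function(x_t, x_c, smd_max = 0.1, vr_range = c(0.8, 1.25)) {
  d <- smd(x_t, x_c)
  r <- vr(x_t, x_c)
  data.frame(smd = d, vr = r,
             mean_ok = abs(d) <= smd_max,
             var_ok  = r >= vr_range[1] & r <= vr_range[2])
}

# Hypothetical covariate: equal means, but treated values are more spread out
x_t <- c(8, 9, 10, 11, 12)
x_c <- c(9, 9.5, 10, 10.5, 11)
res <- check_balance(x_t, x_c)
print(res)  # mean balance passes, variance balance fails
```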

@fleurestethique 2 years ago

This was extremely helpful, thank you so much! When working with subsets, should I calculate the propensity scores on the whole dataset first and then apply them to the subset, or calculate the propensity scores only for the observations in my subset? Also, the dataset I am working with requires me to incorporate additional weights because of the way the sampling was done. How can I apply both the propensity score and the other weights in my regression? Thank you

@statsguidetree 2 years ago

I may need some more information on the nature of the dataset, but generally you could calculate PS for the whole dataset. As for your question about weights, not all PS matching methods produce weights. For example, if 1:1 matching without replacement is used, all the weights = 1. But if you are using a PS matching method that does produce weights and you already have a set of weights you need to apply, there are a few things you can do. The issue is that the 'weights' argument in the lm() function only accepts a single vector. Depending on the nature of your dataset, you may have a reason not to use the whole dataset and to consider subsets instead, if that makes sense. Or you may want to consider combining the two sets of weights by multiplying them; however, you would need to look at the weights produced and check whether they make sense before carrying out your regression analysis. Ultimately, these suggestions are just general statements; you may want to consult other sources (e.g., previous PS analyses using your dataset or a similar dataset, content experts, etc.).
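A minimal sketch of the "combine by multiplying" option from the reply, with hypothetical data and column names (`match_w`, `sampling_w`): `lm()` takes a single `weights` vector, so the matching weights and the sampling weights are multiplied, inspected, and then passed in. With a single binary treatment term, the coefficient equals the difference between the weighted treated and control means. This only illustrates the weighting of the point estimate; it does not give design-based (survey) standard errors.

```r
# Hypothetical matched data with two sets of weights
matched <- data.frame(
  y          = c(12, 15, 11, 14, 10, 13),
  treat      = c(1, 1, 1, 0, 0, 0),
  match_w    = c(1, 1, 1, 0.5, 1.5, 1),  # e.g. weights from full matching
  sampling_w = c(2, 1, 1, 1, 2, 2)       # weights from the sampling design
)

# lm() accepts one weights vector, so combine the two by multiplication
matched$w_combined <- matched$match_w * matched$sampling_w

# Inspect the combined weights before using them
print(summary(matched$w_combined))

fit <- lm(y ~ treat, data = matched, weights = w_combined)
coef(fit)
```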

@velonty 5 months ago

@statsguidetree Thank you for the awesome video. I have a similar question. If I am using a dataset that has survey design requirements, do I carry out propensity score matching with just the sample data or with the weighted dataset? Or can I carry out the PS matching with the sample and then use the weighted data (survey design weights) in my final regression of the matched data?

@festusattah8612 1 year ago

Great video!!! What would you advise if I have more treated than control units, and which matching approach should I use if treatment is not randomized, for example, a state legislation?

@statsguidetree 1 year ago

You can try K-to-1 matching and optimization, or you can try full matching. You can run both and compare which gives you better balance across your covariates.

@lizhang9898 4 months ago

Just to clarify, is the first covariate the same as your DV?

@hasanhash12 1 year ago

Hi, thank you for the video. I loaded the dataset coll from the link you pinned and then ran the script from "identify field names" to "adjust units for continuous variables". After running it, all values in coll become NULL and coll2 has 0 obs. of 6 variables. What should I do?

@hasanhash12 1 year ago

And also at line 136 (#no psa, just regression), if I run mod_test1.

@hasanhash12 1 year ago

I suppose the problem is here, at line 22: coll

@praveena6095 2 years ago

Great video. If I want to include some additional covariates in my analysis that were not used for matching, how can I get them into my data after using match.data()?

@statsguidetree 2 years ago

If you want to use additional variables in the analysis phase, you can enter the variables that were not included in the matching process into the final regression model.
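A small sketch of the reply, assuming the matched data frame came from `match.data()`, which by default returns the matched rows of the data frame given to `matchit()` (all of its columns, not only the matching covariates) plus added columns such as `weights`. The data below are made up; `income` stands for a covariate that was not in the matching formula and simply enters the outcome model. If the extra variable was not in that data frame, it would have to be merged back in first, e.g. by an ID column.

```r
# Hypothetical matched data: every original column plus the matching weights
matched <- data.frame(
  outcome = c(5.1, 6.3, 4.8, 5.9, 4.2, 5.0, 4.6, 5.5),
  treat   = c(1, 1, 1, 1, 0, 0, 0, 0),
  age     = c(34, 45, 29, 51, 33, 47, 30, 49),  # used in matching
  income  = c(40, 62, 35, 70, 38, 60, 36, 66),  # NOT used in matching
  weights = rep(1, 8)                           # 1:1 matching gives weights of 1
)

# The extra covariate simply goes into the final outcome model
fit <- lm(outcome ~ treat + income, data = matched, weights = weights)
summary(fit)$coefficients
```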

@priyankaroy7243 2 years ago

While I'm installing "MatchIt", it shows "There is no package called MatchIt". How do I solve it?

@statsguidetree 2 years ago

Hello, I just saw your post. Did you run the code library(MatchIt) first without running install.packages("MatchIt")? I did not install it again because I had already installed it before. I kept that line in the code but put the hash sign # in front of it, so it was there as a note. Try running it without the hash sign.

@priyankaroy3686 2 years ago

@statsguidetree Yes, that's solved. Thanks!

@user-iq2qr8lb2y 1 year ago

I did the first step (design phase: selecting covariates), but only 3 out of 14 covariates are significant. I want to know whether that is considered balanced or not, and what to do.

@statsguidetree 1 year ago

Whether covariates are significant is not related to whether their values are balanced across treatment conditions. To check balance, you have to look at the standardized mean difference and/or variance ratio values to see whether they fall within the threshold you decide to use.
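To illustrate the reply with hypothetical numbers (base R only; `smd()` is a hypothetical helper using the pooled SD): the two samples below differ in mean by 0.05 in both a small and a large version. The standardized mean difference stays around 0.06 to 0.07, inside a -0.1 to 0.1 threshold, while the t-test p-value moves from clearly non-significant to highly significant purely because the sample grows. Significance tracks sample size; the SMD tracks the size of the imbalance.

```r
# Pooled-SD standardized mean difference
smd <- function(x_t, x_c) {
  (mean(x_t) - mean(x_c)) / sqrt((var(x_t) + var(x_c)) / 2)
}

# Hypothetical covariate values; control is shifted down by 0.05
base_t <- c(10.0, 10.5, 11.0, 11.5, 12.0)
base_c <- base_t - 0.05

# Small sample: 5 per group; large sample: the same values repeated 2000 times
small <- list(t = base_t, c = base_c)
large <- list(t = rep(base_t, 2000), c = rep(base_c, 2000))

smd_small <- smd(small$t, small$c)
smd_large <- smd(large$t, large$c)
p_small <- t.test(small$t, small$c)$p.value
p_large <- t.test(large$t, large$c)$p.value

print(c(smd_small = smd_small, smd_large = smd_large))
print(c(p_small = p_small, p_large = p_large))
```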

@sharmilibalarajah1940 2 years ago

Thank you, this was really helpful! Do you have any ideas about how I can approach this if I want to match three groups, i.e., a non-binary treatment?

@statsguidetree 2 years ago

I can say that generally PS analyses can be conducted with non-binary treatment groups (i.e., treatment variable with more than 2 levels). But, I do not think the MatchIt package supports it (I could be wrong because it could have been updated). There is another package available if your treatment variable has 3 levels instead of 2 levels called TriMatch. I am not too familiar with the package but here is the general documentation: cran.r-project.org/web/packages/TriMatch/TriMatch.pdf

@김수연-h2r5k 2 years ago

Thank you for the informative video. I did full matching based on your video and ran comparisons after propensity matching. But the means, standard deviations, and p score did not change at all compared to the unmatched data. How can I solve this problem?

@statsguidetree 1 year ago

That is a good question. I assume you are talking about p-values in your final model after matching. If that is the case, ultimately with PS matching you are just attempting to balance the data between your treatment and control groups so that you can make more reliable interpretations of your final model. It could be that after balancing your data you find no average treatment effect.
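One thing worth checking (a suggestion, not from the original reply): full matching typically keeps all or nearly all units and expresses the matching only through weights. If the post-matching means, SDs, or models are computed without those weights, they reproduce the unmatched results exactly. A base-R sketch with hypothetical full-matching weights:

```r
# Hypothetical full-matching result: all units kept, balance carried by weights
dat <- data.frame(
  y       = c(8, 9, 10, 4, 5, 9, 10),
  treat   = c(1, 1, 1, 0, 0, 0, 0),
  weights = c(1, 1, 1, 0.2, 0.3, 1.5, 2.0)
)
ctrl <- dat[dat$treat == 0, ]

# Ignoring the weights reproduces the unmatched control mean
print(mean(ctrl$y))
# Using them gives the matched (reweighted) control mean
print(weighted.mean(ctrl$y, ctrl$weights))

# Same contrast in a regression: unweighted vs weighted treatment effect
fit_unweighted <- lm(y ~ treat, data = dat)
fit_weighted   <- lm(y ~ treat, data = dat, weights = weights)
c(unweighted = coef(fit_unweighted)[["treat"]],
  weighted   = coef(fit_weighted)[["treat"]])
```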

@mahnoorjadoon2674 1 month ago

Can you please tell me on what basis to select outcome means?

@statsguidetree 1 month ago

By "select outcome means", are you referring to checking balance and the ranges I used for the standardized mean difference (between -.1 and +.1) and variance ratios (between .8 and 1.25)? Those ranges are general recommendations. Zhang et al. (2019) suggested something similar.
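For reference, one common way to write the two statistics behind those ranges, with T and C denoting the treated and control groups, x̄ the group mean, and s² the sample variance. This uses the pooled SD in the denominator; some software standardizes by the treated-group SD instead.

```latex
\mathrm{SMD} = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\left(s_T^2 + s_C^2\right)/2}},
\qquad
\mathrm{VR} = \frac{s_T^2}{s_C^2},
\qquad
\text{balanced if } -0.1 \le \mathrm{SMD} \le 0.1 \text{ and } 0.8 \le \mathrm{VR} \le 1.25
```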

@SCaRaB6288 3 years ago

Can we use categorical covariates, e.g., 1 = male, 2 = female, or should they be dummy coded? Thank you

@statsguidetree 3 years ago

Yes. Categorical covariates can be included.
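A base-R sketch, with hypothetical data, of why either route works for a two-level variable: declaring the 1/2 code as a `factor()` makes R build the dummy variable internally, so the propensity model gives the same fitted scores as a hand-coded 0/1 dummy. For a categorical variable with more than two levels, leaving it as a plain numeric code would treat the categories as ordered and equally spaced, so `factor()` (or explicit dummies) is the safer choice.

```r
# Hypothetical data with sex coded 1 = male, 2 = female
dat <- data.frame(
  treat = c(1, 0, 1, 1, 0, 0, 1, 0, 0, 1),
  sex   = c(1, 1, 2, 2, 1, 2, 1, 2, 1, 2),
  age   = c(30, 42, 35, 50, 28, 45, 39, 33, 41, 36)
)

# Option 1: declare the code as a factor; R creates the dummy internally
dat$sex_f <- factor(dat$sex, levels = c(1, 2), labels = c("male", "female"))
m_factor <- glm(treat ~ sex_f + age, family = binomial, data = dat)

# Option 2: dummy-code by hand (1 = female, 0 = male)
dat$female <- as.integer(dat$sex == 2)
m_dummy <- glm(treat ~ female + age, family = binomial, data = dat)

# Both specifications give the same propensity scores
all.equal(fitted(m_factor), fitted(m_dummy))
```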

@katieweir4166 1 year ago

The data doesn't work anymore!

@statsguidetree 1 year ago

My apologies for the delayed response. You can use the following code to load it into R: coll

@maddybond007 2 years ago

Please confirm whether this link has the same data you posted initially, since your link is no longer accessible: LINK: ed-public-download.app.cloud.gov/downloads/CollegeScorecard_Raw_Data_04262022.zip

@statsguidetree 2 years ago

I will try to find a way to upload the dataset to my GitHub. Until then, I can email it to you. Just send me an email at statsguidetree@gmail.com