Simple techniques for dealing with missing data

Views: 7,225

Mikko Rönkkö

Day ago

Comments: 30

@duckcluck123 1 year ago

I loved that you added those simulation results. That was very interesting and helped my understanding

@mronkko 1 year ago

You are welcome!

@MaverickRam 1 year ago

Very helpful and simplified explanation. Thanks for the video!

@mronkko 1 year ago

You are welcome!

@danielvoss2483 6 months ago

Good Job 👍👍👍

@mronkko 6 months ago

Thanks!

@surya-td4dg 3 years ago

Strong Finnish accent :).. Thank you for the awesome content

@mronkko 3 years ago

I take the comment about my accent as a compliment ;) Funny thing: I used to live in the US and part of the accent was lost during that time. Even though that was about 20 years ago, I still notice my accent diminish when I spend a couple of days there. But now that we cannot travel, the accent is as strong as ever!

@PavanKumar-ef1yy 7 months ago

Thanks a lot sir

@mronkko 6 months ago

Most welcome

@mohamadmatinhavaei9859 6 months ago

Great job, but what about missingness that exists in a single column and affects more than 50% of the values? Would deep models like GANs be useful for imputation (in time-series prediction)? Many thanks🙏

@mronkko 6 months ago

I assume GAN refers to some kind of neural network. Imputation works regardless of the amount of missing data under these three conditions: 1) You are doing multiple imputation and not single imputation, so that you can quantify the uncertainty introduced by the imputation process. 2) The imputation model contains all features of your data that are relevant for the analysis. 3) The missingness does not depend on the missing value itself (i.e., the data are MAR or MCAR). I do not really see what neural nets would add over a thoughtfully developed imputation model, but they are likely to increase sample size requirements.
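
To illustrate condition 1, here is a minimal sketch of how results from several imputed datasets are commonly pooled with Rubin's rules, which is where the extra uncertainty from imputation enters the standard error. The function name `pool_rubin` and the numbers are hypothetical and not taken from the video; the sketch assumes the same analysis has already been run on each of the m imputed datasets.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)       # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return pooled, total ** 0.5

# Hypothetical regression slopes and standard errors from m = 5 imputed datasets
estimates = [0.42, 0.47, 0.39, 0.45, 0.44]
std_errors = [0.10, 0.11, 0.10, 0.12, 0.10]

pooled_est, pooled_se = pool_rubin(estimates, [se ** 2 for se in std_errors])
print(f"pooled estimate = {pooled_est:.3f}, pooled SE = {pooled_se:.3f}")
```

With single imputation there is only one estimate, so the between-imputation term cannot be computed and the standard error ignores the uncertainty about the imputed values.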

@mohamadmatinhavaei9859 6 months ago

@mronkko Hi again Mikko, I'm tackling a unique challenge with my dataset and believe your insights could greatly help. Could you share any contact info for a brief discussion? Thanks!

@mronkko 6 months ago

@mohamadmatinhavaei9859 I take consulting orders through instats.org/expert/mikko--rönkkö-829.

@ashayagarwal 1 year ago

I found your channel recently, and started liking your teaching approach. I want to ask if pairwise deletion is possible in regression y = X*beta + e, beta = inv(X'X)X'y. It is possible to calculate a pairwise version of X'X. Would love to hear your thoughts. Thx

@mronkko 1 year ago

In pairwise X'X you would need to adjust for sample size for each cell. But in principle you can estimate pairwise covariances of all the variables and then estimate regression from that covariance matrix. The resulting estimator should be consistent under MCAR but getting the standard errors right would require adjustments to the complete data standard error formulas. I have not seen any paper discussing how to do this and therefore I would not be comfortable using this approach. That being said, that I have not read something does not mean that it does not exist. I have just come to the conclusion that because FIML and multiple imputation exist already and I know how to do both, there is little reason for me to learn about other approaches to adjusting for missing data in estimation.
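
A rough sketch of the point-estimate part of this idea, assuming two predictors and missing values coded as `None`. The helper names `pairwise_cov` and `pairwise_ols` and the data are made up for illustration. Each covariance uses every case where both variables are observed, and the slopes come from solving the 2x2 normal equations. Standard errors are deliberately left out, since the complete-data formulas would not be correct here.

```python
def pairwise_cov(a, b):
    """Sample covariance using only the cases where both a and b are observed."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    mean_a = sum(x for x, _ in pairs) / n
    mean_b = sum(y for _, y in pairs) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in pairs) / (n - 1)

def pairwise_ols(x1, x2, y):
    """Slopes of y on x1 and x2 computed from pairwise covariances."""
    v11, v22 = pairwise_cov(x1, x1), pairwise_cov(x2, x2)
    c12 = pairwise_cov(x1, x2)
    c1y, c2y = pairwise_cov(x1, y), pairwise_cov(x2, y)
    det = v11 * v22 - c12 ** 2
    if det <= 0:
        # A pairwise covariance matrix is not guaranteed to be positive definite
        raise ValueError("pairwise covariance matrix is not positive definite")
    b1 = (v22 * c1y - c12 * c2y) / det
    b2 = (v11 * c2y - c12 * c1y) / det
    return b1, b2

# Hypothetical data with scattered missing values
x1 = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [5.0, 1.0, 4.0, None, 2.0, 6.0, 3.0, 7.0]
y = [1.2, 3.9, 3.1, 6.0, None, 8.1, 6.8, 10.2]
print(pairwise_ols(x1, x2, y))
```

Because each cell of the covariance matrix can rest on a different subset of cases, the matrix may fail to be positive definite, which is why the sketch checks the determinant before solving.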

@George70220 7 days ago

Our teacher is focusing on us using KNN to impute data. This seems like a biased method like the traditional methods but I'm not 100% sure.

@mronkko 6 days ago

What does KNN stand for?

@bandungmee 1 year ago

Hi. It was mentioned that "the imputed data can only be used within the pooling testing and cannot be used for the model testing". Does this mean the data are only imputed/simulated for the purpose of analysing reliability? If they cannot be used for model testing, does it mean we still need to use the actual data and delete the missing data? Correct me if I'm wrong. Thank you.

@mronkko 1 year ago

I need more context. Can you give me a timestamp from the video?

@zhaowu3193 2 years ago

Hi, thank you for the content. I would like to know how to choose the reference variable. For example, in your case IQ is taken as the reference when imputing job performance. I have a lot of variables in my data set, and some of them have a lot of missing values. How can I identify which variable to refer to when I want to impute another one?

@mronkko 2 years ago

Your imputation model needs to use all variables and model all relationships that you have in your main model. In addition, you can use auxiliary variables (I have a video about that). The rule with auxiliary variables is that you should be liberal in including them. However, if your sample size is small you can start to get bias and computational difficulties if you include too many.

@couragee1 2 years ago

thank you

@mronkko 2 years ago

You are welcome

@Stelnice 3 years ago

Hi! In what types of research can I use pairwise/listwise deletion?

@mronkko 3 years ago

Deleting observations is never ideal if you only consider it from a statistical perspective. However, simplicity is also a virtue in applied research (for example, you would be less likely to make mistakes if you keep things simple), and simple techniques should be used over complex ones if the difference in outcomes is small. Deleting observations is OK if a) your sample size is sufficient after deletion and b) your missing data are MCAR. I would not use pairwise deletion because using a different sample size for different analyses complicates things, but this depends on how the data are missing.
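
A small simulation sketch of why condition b) matters. It is not the simulation from the video; the function name `listwise_means`, the sample size, and the missingness probabilities are all assumptions chosen for illustration. It compares the mean of the retained cases when values are deleted completely at random against the case where high values are more likely to be missing.

```python
import random
import statistics

def listwise_means(n=20000, seed=1):
    """Return the full-data mean and the means after MCAR and value-dependent deletion."""
    rng = random.Random(seed)
    full = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # MCAR: every value has the same 50% chance of being kept
    mcar = [x for x in full if rng.random() < 0.5]
    # Missingness depends on the value itself: positive values are kept
    # with probability 0.2, all other values with probability 0.8
    mnar = [x for x in full if rng.random() < (0.2 if x > 0 else 0.8)]
    return statistics.fmean(full), statistics.fmean(mcar), statistics.fmean(mnar)

full_mean, mcar_mean, mnar_mean = listwise_means()
print(f"full: {full_mean:.3f}  MCAR: {mcar_mean:.3f}  value-dependent: {mnar_mean:.3f}")
```

Under MCAR the retained cases are a random subsample, so deletion mainly costs sample size. When missingness depends on the value itself, the retained cases are systematically different and the mean is biased.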

@Stelnice 3 years ago

got this, thank you!