I loved that you added those simulation results. That was very interesting and helped my understanding
@mronkko 1 year ago
You are welcome!
@MaverickRam 1 year ago
Very helpful and simplified explanation. Thanks for the video!
@mronkko 1 year ago
You are welcome!
@mohamadmatinhavaei9859 8 months ago
Great job, but what about missingness that exists in a single column and is more than 50%? Would deep models like GANs be useful for imputation (in time-series prediction)? Many thanks🙏
@mronkko 8 months ago
I assume GAN refers to some kind of neural network. Imputation works regardless of the amount of missing data under these three conditions: 1) You are doing multiple imputation rather than single imputation, so that you can quantify the uncertainty introduced by the imputation process. 2) The imputation model contains all features of your data that are relevant for the analysis. 3) The missingness does not depend on the missing value itself (i.e., the data are MAR or MCAR). I do not really see what neural nets would add over a thoughtfully developed imputation model, but they are likely to increase sample size requirements.
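Editor's sketch: the three conditions can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the method used in the video. The data are simulated, about half of y is missing with a probability that depends only on the observed x (MAR), each imputation refits a regression on a bootstrap sample of the complete cases, and the results are pooled with Rubin's rules. The function names (fit_line, impute_once, rubin_pool, demo) are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for y ~ x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd


def impute_once(x, y, rng):
    """One imputed copy of y: bootstrap the complete cases, refit, add noise."""
    complete = [(a, b) for a, b in zip(x, y) if b is not None]
    boot = [rng.choice(complete) for _ in complete]
    intercept, slope, sd = fit_line([a for a, _ in boot], [b for _, b in boot])
    return [b if b is not None else intercept + slope * a + rng.gauss(0, sd)
            for a, b in zip(x, y)]


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputations."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    u_bar = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b


def demo(n=2000, m=20, seed=7):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y_full = [1.0 + 0.8 * a + rng.gauss(0, 1) for a in x]  # true mean of y is 1.0
    # MAR: y is missing more often when the observed x is positive (about 50% overall).
    y = [None if rng.random() < (0.7 if a > 0 else 0.3) else b
         for a, b in zip(x, y_full)]
    estimates, variances = [], []
    for _ in range(m):
        y_imp = impute_once(x, y, rng)
        estimates.append(statistics.fmean(y_imp))
        variances.append(statistics.variance(y_imp) / n)
    return rubin_pool(estimates, variances)


pooled_mean, total_var = demo()
print(f"pooled mean of y: {pooled_mean:.3f}, standard error: {total_var ** 0.5:.3f}")
```

The between-imputation term in rubin_pool is what single imputation cannot provide: it is the part of the standard error that reflects uncertainty about the imputed values.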
@mohamadmatinhavaei9859 8 months ago
@@mronkko "Hi again Mikko, I'm tackling a unique challenge with my dataset and believe your insights could greatly help. Could you share any contact info for a brief discussion? Thanks!"
@mronkko 8 months ago
@@mohamadmatinhavaei9859 I take consulting orders through instats.org/expert/mikko--rönkkö-829.
@ashayagarwal 1 year ago
I found your channel recently, and started liking your teaching approach. I want to ask if pairwise deletion is possible in regression y = X*beta + e, beta = inv(X'X)X'y. It is possible to calculate a pairwise version of X'X. Would love to hear your thoughts. Thx
@mronkko 1 year ago
In pairwise X'X you would need to adjust for sample size for each cell. But in principle you can estimate pairwise covariances of all the variables and then estimate the regression from that covariance matrix. The resulting estimator should be consistent under MCAR, but getting the standard errors right would require adjustments to the complete-data standard error formulas. I have not seen any paper discussing how to do this, and therefore I would not be comfortable using this approach. That said, the fact that I have not read about something does not mean that it does not exist. I have just come to the conclusion that because FIML and multiple imputation exist already and I know how to do both, there is little reason for me to learn about other approaches to adjusting for missing data in estimation.
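Editor's sketch: the idea of estimating a regression from pairwise covariances can be illustrated as follows. This is a minimal sketch, assuming two predictors and simulated MCAR data. The function names are hypothetical, and only point estimates are computed, because (as noted above) the usual standard error formulas do not apply.

```python
import random


def pairwise_cov(a, b):
    """Covariance of a and b using only the rows where both are observed."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)  # each cell of the covariance matrix has its own sample size
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in pairs) / (n - 1)


def pairwise_slopes(y, x1, x2):
    """Slopes of y ~ x1 + x2 computed from a pairwise-deletion covariance matrix."""
    s11, s22 = pairwise_cov(x1, x1), pairwise_cov(x2, x2)
    s12 = pairwise_cov(x1, x2)
    s1y, s2y = pairwise_cov(x1, y), pairwise_cov(x2, y)
    det = s11 * s22 - s12 * s12
    if det <= 0:
        # A pairwise covariance matrix is not guaranteed to be positive definite.
        raise ValueError("pairwise covariance matrix is not positive definite")
    # Solve the 2x2 system S_xx * beta = s_xy with Cramer's rule.
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(n=20000, p_missing=0.2, seed=1):
    """Simulated data with true slopes 2.0 and -1.0 and MCAR missingness."""
    rng = random.Random(seed)
    y, x1, x2 = [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = 0.5 * a + rng.gauss(0, 1)
        out = 2.0 * a - 1.0 * b + rng.gauss(0, 1)
        # MCAR: every value is dropped with the same probability, independently.
        x1.append(None if rng.random() < p_missing else a)
        x2.append(None if rng.random() < p_missing else b)
        y.append(None if rng.random() < p_missing else out)
    return y, x1, x2


y, x1, x2 = simulate()
b1, b2 = pairwise_slopes(y, x1, x2)
print(f"pairwise slopes: b1 = {b1:.3f}, b2 = {b2:.3f} (true values 2.0 and -1.0)")
```

Dividing by the per-pair n - 1 is the cell-by-cell sample size adjustment. The explicit determinant check is there because each cell comes from a different subsample, so the matrix can fail to be invertible in small samples.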
@zhaowu3193 2 years ago
Hi, thank you for the content. I would like to know how to choose the reference variable; for example, in your case IQ is taken as a reference when imputing job performance. I have a lot of variables in my data set, and some of them have a lot of missing values. How can I identify which variable to refer to when I want to impute another one?
@mronkko 2 years ago
Your imputation model needs to use all variables and model all relationships that you have in your main model. In addition, you can use auxiliary variables (I have a video about that). The rule with auxiliary variables is that you should be liberal in including them. However, if your sample size is small you can start to get bias and computational difficulties if you include too many.
@bandungmee 1 year ago
Hi, it was mentioned that "the imputed data can only be used within the pooling testing and cannot be used for the model testing". Does it mean the data is only imputed/simulated for the purpose of analysing its reliability? If it cannot be used for model testing, does it mean we still need to use the actual data and perform the deletion of missing data? Correct me if I'm wrong. Thank you
@mronkko 1 year ago
I need more context. Can you give me a timestamp from the video?
@surya-td4dg 3 years ago
Strong Finnish accent :) Thank you for the awesome content
@mronkko 3 years ago
I take the comment about my accent as a compliment ;) Funny thing: I used to live in the US and part of the accent was lost during that time. Even though that was about 20 years ago now, I still notice my accent diminish when I spend a couple of days there. But now that we cannot travel, the accent is as strong as ever!
@danielvoss2483 8 months ago
Good Job 👍👍👍
@mronkko 8 months ago
Thanks!
@Stelnice 3 years ago
Hi! In what types of research can I use pairwise/listwise deletion?
@mronkko 3 years ago
Deleting observations is never ideal if you only consider it from a statistical perspective. However, simplicity is also a virtue in applied research (for example, you are less likely to make mistakes if you keep things simple), and simple techniques should be used over complex ones if the difference in outcomes is small. Deleting observations is OK if a) your sample size is sufficient after deletion and b) your missing data are MCAR. I would not use pairwise deletion because using a different sample size for different analyses complicates things, but this depends on how the data are missing.
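Editor's sketch: condition b) can be checked in a small simulation. This is a minimal sketch on simulated data with hypothetical names, not an analysis from the video. It compares the mean of y after deleting rows completely at random (MCAR) with the mean after deleting rows with a probability that depends on a covariate x that is correlated with y.

```python
import random
import statistics


def simulate(n=20000, seed=3):
    """Simulated x and y, where y = x + noise, so the true mean of y is 0."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [a + rng.gauss(0, 1) for a in x]
    return rng, x, y


def listwise_mean(y, keep):
    """Mean of y over the rows that survive deletion."""
    return statistics.fmean([v for v, k in zip(y, keep) if k])


rng, x, y = simulate()

# MCAR: every row has the same 50% chance of being deleted.
keep_mcar = [rng.random() > 0.5 for _ in y]

# Not MCAR: rows with positive x are deleted far more often (80% versus 20%).
keep_dependent = [rng.random() > (0.8 if a > 0 else 0.2) for a in x]

mean_mcar = listwise_mean(y, keep_mcar)
mean_dependent = listwise_mean(y, keep_dependent)
print(f"mean of y after MCAR deletion:      {mean_mcar:.3f}")
print(f"mean of y after x-dependent deletion: {mean_dependent:.3f}")
```

Both deletion schemes keep roughly half of the rows, so the difference between the two means comes from how the rows go missing and not from the sample size.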
@Stelnice 3 years ago
got this, thank you!
@PavanKumar-ef1yy 9 months ago
Thanks a lot sir
@mronkko 8 months ago
Most welcome
@George70220 2 months ago
Our teacher is focusing on us using KNN to impute data. This seems like a biased method like the traditional methods but I'm not 100% sure.