Multinomial Logistic Regression with R | 1. Data and Model

31,685 views

Dr. Bharatendra Rai

Comments: 73
@hdempers 3 years ago
Hi Dr. Bharatendra, I'm not sure why you split the dataset into "training" and "testing"?
@bkrai 3 years ago
When we test the performance of a model, we use data that was not used to build the model. So keeping the 'testing' data separate helps us assess the model's performance later on.
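A minimal R sketch of the train/test partition discussed above. It assumes your data are in a data frame; the built-in `iris` data stand in for it here, and the seed (222) and the 0.6/0.4 probabilities are illustrative choices.

```r
# The built-in iris data stand in for your own data frame
mydata <- iris

# Fix the random-number seed so the split is reproducible
set.seed(222)

# Draw a label 1 (training) or 2 (testing) for every row, with
# probabilities 0.6 and 0.4; replace = TRUE lets the two labels be
# drawn repeatedly, it does not duplicate rows
ind <- sample(2, nrow(mydata), replace = TRUE, prob = c(0.6, 0.4))

training <- mydata[ind == 1, ]
testing  <- mydata[ind == 2, ]

# Every row lands in exactly one of the two sets
nrow(training) + nrow(testing) == nrow(mydata)  # TRUE
```

Because the sampling is over the labels 1 and 2 rather than over rows, no observation ends up in both the training and the testing set.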
@Lilian.Chidinma.Nwafor 4 months ago
You're just amazing sir. Thank you
@bkrai 4 months ago
You are most welcome!
@maehovland6222 2 years ago
Here's a copy of the code (with my variables instead of Dr. Rai's) in case y'all want to just copy and paste:
set.seed(45)
ind = sample(2, nrow(crash), replace = TRUE, prob = c(0.6, 0.4))
training
@bkrai 2 years ago
Thanks!
@pshani3512 3 years ago
Very clear and helpful..as always...Thank you very much Sir...!
@bkrai 3 years ago
You're most welcome!
@drbanan4168 2 years ago
Thank you so much, Dr. You are very clear in your videos. I have some questions, please. After I make sure that the model is accurate using the training and testing datasets, do I use the whole dataset to build the model for my project? Or do I report only the training/testing results? Another question: is the accuracy you tested the same as goodness of fit? Is there a way to get the AIC for our models? I need to develop many models using different sets of variables and find the best model using forward/backward selection. Can I do that with the multinom function? Thank you very much, Dr.
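On the AIC question: a minimal sketch, assuming the `nnet` package's `multinom()` (a recommended package included with most R installations) and using the built-in `iris` data as a stand-in. A fitted multinom object stores its deviance and AIC as components, so candidate models with different predictor sets can be compared directly; the two formulas below are arbitrary examples.

```r
library(nnet)  # provides multinom()

# Two candidate models for the three-level factor Species
fit1 <- multinom(Species ~ Sepal.Length, data = iris, trace = FALSE)
fit2 <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris, trace = FALSE)

# Each multinom fit stores its residual deviance and AIC as components
aics <- c(fit1 = fit1$AIC, fit2 = fit2$AIC)
aics                    # the model with the smaller AIC is preferred
names(which.min(aics))  # name of the preferred candidate
```

For automated forward/backward selection, `MASS::stepAIC()` is a commonly used option with multinom fits; check its documentation for your setup before relying on it.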
@leonnorblad758 3 years ago
Thank you for the amazing video, but why did you use replacement in your sample? Don't you want to run tests on data other than what you used to create the model? Thank you!
@bkrai 3 years ago
See the video linked below at the 42:30 mark for details: kzbin.info/www/bejne/iHPSm6RmeaZ0iZo
@prashu25925 4 years ago
As always, amazing video sir...
@bkrai 4 years ago
Thanks for comments!
@navthavanesan4343 2 years ago
Dr. Rai, can you make a video or advise how to build cross-validation or bootstrapping into your nnet multinom regression code?
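A minimal sketch of k-fold cross-validation around `nnet::multinom()`, written as a plain base-R loop. The data (`iris`), the number of folds (`k = 5`) and accuracy as the metric are all assumptions for illustration; a bootstrap version would resample rows with replacement inside the same kind of loop.

```r
library(nnet)  # provides multinom()

set.seed(222)
mydata <- iris
k <- 5

# Randomly assign each row to one of k folds of equal size
folds <- sample(rep(1:k, length.out = nrow(mydata)))

accuracy <- numeric(k)
for (i in 1:k) {
  train <- mydata[folds != i, ]   # k - 1 folds for fitting
  test  <- mydata[folds == i, ]   # held-out fold for evaluation
  fit <- multinom(Species ~ ., data = train, trace = FALSE)
  pred <- predict(fit, newdata = test)       # predicted class labels
  accuracy[i] <- mean(pred == test$Species)  # share classified correctly
}

accuracy        # one accuracy value per fold
mean(accuracy)  # cross-validated accuracy estimate
```

Every row is used for testing exactly once, which matters most when the dataset is too small to spare a large fixed testing set.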
@YatiChoudhary 3 years ago
Sir, one doubt: what if our sample size is less than 100, say 80? Can we still use multinomial logistic regression? (It is for a qualitative study, but I want to study the variables in detail.) Also, is there a way to figure out what train-test split one should use, or whether the model can be built without a train-test split? Thank you 🙏
@bkrai 3 years ago
When response is a factor type variable, you can use this. It's ok if you have 80. For splits you can try different one such as 60:40 or 70:30.
@YatiChoudhary 3 years ago
@@bkrai Thank you so much, sir, for the patience in replying to all my questions. It has genuinely helped me clarify doubts and build my own model. I will keep learning more about statistical models from your KZbin videos. Thank you
@bkrai 3 years ago
Thanks Yati!
@adityaupadhyaya6441 2 years ago
Do you have a video on multinomial mixed effects regression? Thank you!
@bkrai 2 years ago
I've added it to my list of future videos.
@wereskiryan 3 years ago
Fantastic video. Many thanks
@bkrai 3 years ago
Many thanks!
@manikandankrishnakumar5430 4 years ago
Thanks for the video
@bkrai 4 years ago
Welcome!
@fatikanabila2131 2 years ago
Hi, sir. This is a very nice video; it's so easy to understand with your explanation. But I want to ask: what if I get NaN values for the standard errors in the summary? What should I do then? Thank you very much, and I hope you see my comment and reply.
@bkrai 2 years ago
Likely because error is too small.
@fatikanabila2131 2 years ago
@@bkrai Then how to handle it?
@lianjek5788 4 years ago
Hi sir, thanks for the video, it is very clear! Why do we need to convert the independent variable from integer to factor? Is there any problem if all my variables, i.e. both the IV and the DV, are integers? Thanks.
@bkrai 4 years ago
For that you should use multiple linear regression.
@lianjek5788 4 years ago
@@bkrai Hi sir, sorry to bother you again. Like the dependent variable, do I have to convert the independent variables to factors as well to do multinomial logistic regression? I converted both the DV and the IVs to factors, and after the regression, the only categorical variable that I turned into a factor became insignificant, while the other variables became highly significant. My datasets from multiple countries all show the same pattern. I am not sure whether I am on the right track. Please advise. Thanks.
@bkrai 4 years ago
Whether to use regular regression or logistic regression depends mainly on the type of DV you have, and much less on the type of IV.
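To illustrate the point about integer-coded predictors: whether an integer IV should become a factor depends on whether its codes are categories or a genuine numeric scale. The sketch below uses a hypothetical predictor `group` coded 1/2/3 and shows how R's `model.matrix()` encodes it each way.

```r
# Hypothetical predictor coded 1/2/3 (for example, three categories)
df <- data.frame(group = c(1, 2, 3, 1, 2, 3))

# Kept numeric: a single column, i.e. one slope for the whole 1-2-3 scale
num_cols <- colnames(model.matrix(~ group, data = df))
num_cols  # "(Intercept)" "group"

# Converted to a factor: dummy columns for levels 2 and 3, each compared
# with the reference level 1 (R's default treatment contrasts)
df$group_f <- factor(df$group)
fac_cols <- colnames(model.matrix(~ group_f, data = df))
fac_cols  # "(Intercept)" "group_f2" "group_f3"
```

A numeric coding forces equal steps between 1, 2 and 3, while a factor coding estimates a separate effect for each level, so the two versions can give quite different significance results.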
@jolojololo3221 3 years ago
Hi, I want to know if you can help me find how to calculate R² or a pseudo-R² for my model.
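A minimal sketch of one common pseudo-R², McFadden's, for a `multinom()` fit. It assumes the `nnet` package and uses `iris` as stand-in data; McFadden's is only one of several pseudo-R² definitions.

```r
library(nnet)  # provides multinom()

# Model of interest and intercept-only (null) model for the same response
fit_full <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris, trace = FALSE)
fit_null <- multinom(Species ~ 1, data = iris, trace = FALSE)

# McFadden's pseudo-R^2 = 1 - logLik(full) / logLik(null).
# multinom stores deviance = -2 * logLik, so the ratio of deviances is the same
mcfadden <- 1 - fit_full$deviance / fit_null$deviance
mcfadden
```

Values are not directly comparable with the R² of linear regression; they are mainly useful for comparing models fitted to the same response.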
@thejuhulikal6290 3 years ago
Sir, when I fit a multinomial logistic regression, it gives results as "0 power b" for some variables in some comparisons. What does that mean? What can I do to get rid of that?
@bkrai 3 years ago
Can you give a more specific example of what you are getting?
@Lilian.Chidinma.Nwafor 4 months ago
Good morning, Dr. I hope I can get urgent attention because I'm in a tight corner right now. Your videos have been helpful in my journey into the data field. Is there an alternative to the 0.6, 0.4 prob split? I am getting "error in sample.int(x, size, replace, prob ): incorrect number of probabilities". I also tried 0.7, 0.3, and it gives the same error.
@bkrai 4 months ago
Thanks for comments! Check your code again, you should not get any error. Let me know if you still get error.
@Lilian.Chidinma.Nwafor 4 months ago
@@bkrai Thank you for your quick response, Dr. I honestly don't know what the problem is today. I have a research article to submit, and I used a 5-point Likert scale survey to generate my data, hoping to analyze it with multinomial logistic regression, but I keep getting errors. I hope I get it working tonight; otherwise, I will just do normal descriptive statistics or correlation. I feel so frustrated.
@bkrai 4 months ago
We can do a quick zoom meeting where you can show me where you see a problem.
@Lilian.Chidinma.Nwafor 4 months ago
@@bkrai Please, can you drop a link or email so I can contact you?
@bkrai 4 months ago
seemabharat@gmail.com
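One common way to get the "incorrect number of probabilities" error from `sample()` is to swap its first two arguments. The commenter's code isn't shown, so this is only an assumed cause; the sketch uses `iris` as stand-in data.

```r
mydata <- iris
set.seed(222)

# Correct order: x = 2 (two group labels), size = nrow(mydata),
# and prob has one entry per label
ind <- sample(2, nrow(mydata), replace = TRUE, prob = c(0.6, 0.4))
table(ind)

# Swapped order: x = nrow(mydata) now means 150 possible values,
# but prob still has only 2 entries, so R stops with an error
msg <- tryCatch(
  sample(nrow(mydata), 2, replace = TRUE, prob = c(0.6, 0.4)),
  error = function(e) conditionMessage(e)
)
msg  # "incorrect number of probabilities"
```

The length of `prob` must match the number of labels in the first argument, so a 0.7/0.3 split fails in exactly the same way if the arguments are swapped.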
@adrianaroca9127 3 years ago
Amazing video! It saved my life! I am just worried because I get an error after I create mymodel, saying: Error in `contrasts
@bkrai 3 years ago
Make sure the response variable shows as 'factor'.
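A minimal sketch of the check suggested above, using a hypothetical data frame whose response was stored as numbers. Because the error message in the question is cut off, its cause can't be confirmed; the last line shows a second check that is often worth running, since a factor predictor with a single level also produces an error beginning with "Error in `contrasts<-`".

```r
# Hypothetical data frame whose response was read in as numeric codes 1/2/3
df <- data.frame(
  outcome = c(1, 2, 3, 1, 2, 3, 2, 1),
  x       = c(0.5, 1.2, 3.1, 0.7, 1.9, 2.8, 1.4, 0.2)
)

class(df$outcome)   # "numeric": not yet a factor

# Convert the response to a factor before fitting the model
df$outcome <- factor(df$outcome)
class(df$outcome)   # "factor"
levels(df$outcome)  # "1" "2" "3"

# Second check: number of levels of every factor column;
# a value of 1 flags a factor that cannot be used as a predictor
sapply(Filter(is.factor, df), nlevels)
```

Running `str()` on the data frame is another quick way to see every column's type at once.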
@deprofundis3293 4 years ago
Hi, I have a DV with 6 levels, although 2 of them only have a few observations each and likely need to be excluded. So, it'll probably have to be a DV with 4 levels. The problem is that I cannot partition my data because my sample size is too small. (This kind of data is extremely difficult to collect, and it took years to collect even what I did, so it's not possible to increase sample size). I saw that you recommended Random Forest to someone else with a similar DV, but Random Forest also requires partitioning of the data. Is there really no other way to do my analysis, given my rather small but extremely-hard-earned dataset?
@bkrai 4 years ago
You can also explore this for oversampling: kzbin.info/www/bejne/fqCVfJ-sr8-Ynck
@rgemsph7339 3 years ago
Good day, sir. What should I do if my multinomial function converged after some iterations?
@bkrai 3 years ago
That should be fine.
@sudanmac4918 4 years ago
Sir, if we have 5 levels in the dependent variable and imbalanced data, how do we rectify it?
@dexterrity 4 years ago
Are you able to combine any of the 5 levels of the dependent variable to make the data less imbalanced?
@bkrai 4 years ago
That could be a good solution: you can group low-frequency classes into one, perhaps calling it 'other'.
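A minimal base-R sketch of grouping low-frequency classes into 'other'. The response `y`, its counts and the threshold of 10 are hypothetical.

```r
# Hypothetical imbalanced response with five levels
y <- factor(c(rep("A", 40), rep("B", 35), rep("C", 20), rep("D", 3), rep("E", 2)))
counts <- table(y)
counts

# Levels with fewer than min_count observations are treated as rare
min_count <- 10  # arbitrary threshold; choose one that suits your data
rare <- names(counts)[counts < min_count]

# Giving several levels the same new name merges them into one level
levels(y)[levels(y) %in% rare] <- "other"
table(y)  # counts are now A = 40, B = 35, C = 20, other = 5
```

Merging changes the question the model answers (the rare classes can no longer be told apart), so it fits best when those classes are not of primary interest.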
@sallu.mandya1995 4 years ago
Hi sir, where can I get the medical datasets?
@bkrai 4 years ago
I've now added the link. Data: goo.gl/MYgpLX
@sallu.mandya1995 4 years ago
@@bkrai Dear sir, I mean different medical datasets
@sallu.mandya1995 4 years ago
@@bkrai to work on
@patriciageletkova9772 4 years ago
Hey sir, I don't understand why those numbers, 0.6 and 0.4, are used. Could you help me, please? Thank you.
@bkrai 4 years ago
It means 60% of the data will be randomly assigned to training data and about 40% to testing data.
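A quick way to see what the 0.6/0.4 probabilities do. Because each row is assigned to training or testing independently, the realized split is close to, but usually not exactly, 60/40 (hence "about 40%" above). The row count of 150 is an arbitrary example.

```r
set.seed(222)
n <- 150  # arbitrary number of rows

# Each row independently gets label 1 (prob 0.6) or label 2 (prob 0.4)
ind <- sample(2, n, replace = TRUE, prob = c(0.6, 0.4))

# Realized shares of training (1) and testing (2) rows
prop.table(table(ind))
```

If an exact 60/40 split is needed, one alternative is to sample exactly `round(0.6 * n)` row indices without replacement for training and use the rest for testing.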
@patriciageletkova9772 4 years ago
Dr. Bharatendra Rai, is it a necessary step? Can I use this procedure if my dependent variable is only 0 and 1 (not 1, 2, 3)? Or would it be better if I recoded them as 1, 2? I have 225 observations. Thank you so much!
@bkrai 4 years ago
If you have only 0 and 1 situation, use this link: kzbin.info/www/bejne/d4fbaIqZZqiEbbs
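For a 0/1 response, the usual base-R approach is `glm()` with `family = binomial`. A minimal sketch, using the built-in `mtcars` data (where `am` is 0 for automatic and 1 for manual) purely as stand-in data; the linked video remains the detailed reference.

```r
# Binary logistic regression: probability of a manual transmission (am = 1)
# modelled from car weight (wt, in 1000 lbs)
fit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(fit)$coefficients

# Predicted probability of am = 1 for each car
p <- predict(fit, type = "response")
head(p)
```

With a 0/1 response there is no need to recode it as 1/2; `glm()` with the binomial family works directly on 0/1 values.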
@Zizuzot 3 years ago
Why do you take the number 222 for the seed?
@bkrai 3 years ago
It's for reproducibility so that anyone partitioning the data has same train and test data.
@Zizuzot 3 years ago
@@bkrai Yes I understand that, but I was wondering why specifically the number 222 :)
@bkrai 3 years ago
There is no significance attached to 222. It could have been any other number as in other videos.
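A small sketch of what the seed does: re-running `set.seed()` with the same value reproduces the same partition, while a different value (45 here, as in the earlier comment's code) gives a different one. The row count of 150 is arbitrary.

```r
set.seed(222)
ind1 <- sample(2, 150, replace = TRUE, prob = c(0.6, 0.4))

set.seed(222)  # same seed again
ind2 <- sample(2, 150, replace = TRUE, prob = c(0.6, 0.4))

set.seed(45)   # any other value works; it just yields a different split
ind3 <- sample(2, 150, replace = TRUE, prob = c(0.6, 0.4))

identical(ind1, ind2)  # TRUE: same seed, same partition
```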
@kapiljhalani20 4 years ago
Dear Sir, I have data for 10,000 patients; each patient has one CSV file containing 50,000 rows and 36 columns. This is a multi-label classification problem where the disease label y = (0, 1, 2, 3, 4) maps to a disease name in blood cancer. Is it possible to build a model on such data? If yes, I would be very, very happy if you could share a URL, a video, or just an idea; that would be enough. Thank you so much in advance. Kapil Jhalani from Munich, Germany!!
@bkrai 4 years ago
You can use this link: kzbin.info/www/bejne/mnvGnYF_g5KHhtE
@kapiljhalani20 4 years ago
@@bkrai Thank you for your reply. But I do not have a single CSV; I have 10K CSVs, and each CSV has 50K rows and 36 columns, where the 50K rows represent one disease name. The example shown in the video was for one CSV. How do I handle multiple CSVs with multiple rows in machine learning classification? Thank you again for your time and help. Looking forward to hearing from you. Kind regards, Kapil
@mohamedabdullah9061 4 years ago
Sir, in my project I have 207 dependent variables. What should I do, sir? Please help me.
@bkrai 4 years ago
And how many independent variables?
@mohamedabdullah9061 4 years ago
@@bkrai One independent variable, which is the user ID.
@bkrai 4 years ago
I guess you may be referring to the dependent variable as independent. Usually, data have one dependent variable and several independent variables.
@mohamedabdullah9061 4 years ago
@@bkrai Yes sir, I have this type of data. What should I do?
@bkrai 4 years ago
If the response variable is of factor type, you should be able to use this method.
@sarbajitg 4 years ago
Please give the link to the source code.