What would our feature selection approach be if we are using a mixture of continuous and categorical variables to predict a categorical variable?
@lavendermlay5731 8 months ago
Hello, this was a very helpful video. If you have done a Bayesian analysis, please provide the video link.
@akhileshlekurwale364 4 years ago
Is it good practice to perform a statistical test on every column available for modelling? Is there any trigger point to consider for this?
@AIEngineeringLife 4 years ago
Akhilesh, it is very subjective. I would say it is good to investigate each variable to see how it impacts the model. How exhaustive to be depends on the case. Most of these tests can be automated.
@rameshthamizhselvan2458 4 years ago
I have one doubt: instead of using the stats package, we can use chi-square directly from the sklearn library, right?
@AIEngineeringLife 4 years ago
Yes, you can. Since I did not use an sklearn pipeline, I used the stats one.
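A minimal sketch of the stats-package route mentioned in the reply above: build a contingency table with pandas and pass it to scipy.stats.chi2_contingency. The counts below are made up for illustration and are not the video's dataset; only the column names echo it. On the sklearn side, sklearn.feature_selection.chi2 is the function that fits into a pipeline, but it expects non-negative encoded features rather than a contingency table.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: values are invented, column names echo the churn example.
df = pd.DataFrame({
    "PaperlessBilling": ["Yes"] * 60 + ["No"] * 40,
    "Churn": ["Yes"] * 30 + ["No"] * 30 + ["Yes"] * 8 + ["No"] * 32,
})

# Rows are PaperlessBilling levels, columns are Churn levels.
table = pd.crosstab(df["PaperlessBilling"], df["Churn"])

# Null hypothesis: the two variables are independent.
chi2_stat, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2_stat:.3f}, p={p_value:.4f}, dof={dof}")
```

A small p-value here means the null hypothesis of independence is rejected at the chosen significance level.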
@raviirla459 4 years ago
Awesome videos with great content. Loved it :) Waiting for more videos on feature engineering.
@chiragbilimoria3275 4 years ago
This is what I was looking for, sir. Thanks a lot for this video.
@TheSocialDrone 3 years ago
You explained it very well! Thanks for producing and sharing this tutorial.
@tryingtolearn3299 4 years ago
Thank you very much for the videos. I had two questions. When we have categorical variables, can we use Pearson correlation to get the order of significance, such as paperless billing being more significant than SeniorCitizen? Or do we need to use only the chi-squared test? Another question: if I have a few categorical variables with multiple categories, should we first create dummy variables and then run the chi-squared test on each of the dummy variables against the target variable?
@AIEngineeringLife 4 years ago
The strength of the relationship between two categorical variables can be measured with Cramér's V. You can check my Cramér's V video if you have not already. One-hot encoding might not be required; you just create a contingency table based on the number of categories.
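As a rough sketch of the reply above, Cramér's V can be computed straight from the contingency table of a multi-category feature, with no dummy variables. The helper name cramers_v and the toy Contract/Churn values are assumptions made for illustration; the formula used is V = sqrt(chi2 / (n * (min(rows, cols) - 1))), without any bias correction.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Cramér's V between two categorical series (0 = no association, 1 = perfect)."""
    table = pd.crosstab(x, y)
    # correction=False so that a perfect 2x2 association gives exactly 1.
    chi2_stat = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1  # assumes both variables have at least 2 levels
    return float(np.sqrt(chi2_stat / (n * min_dim)))


# Hypothetical feature with three categories, used as-is (no one-hot encoding).
contract = pd.Series(["Monthly"] * 50 + ["OneYear"] * 30 + ["TwoYear"] * 20)
churn = pd.Series(
    ["Yes"] * 30 + ["No"] * 20 + ["Yes"] * 6 + ["No"] * 24 + ["Yes"] * 2 + ["No"] * 18
)

v = cramers_v(contract, churn)
print(f"Cramér's V = {v:.3f}")
```

Because V is scaled to the range 0 to 1, it can be used to rank features by strength of association, which the chi-square p-value alone does not give you.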
@tryingtolearn3299 4 years ago
@AIEngineeringLife Thank you very much for the quick reply.
@junaidasghar8462 4 years ago
Can we do a chi-square test between two categorical variables when there is no target variable (gender and paperless billing), i.e. unsupervised data?
@AIEngineeringLife 4 years ago
Yes, Junaid, you can. It can be any two categorical variables.
@shubhamchoudhary5461 3 years ago
@AIEngineeringLife In that way, can we find out multicollinearity between two categorical features?
@rsinh3792 3 years ago
Sir, a reviewer has asked me this question and I don't know how to address it. Can you please guide me? "Use some statistical significant test such as T-test or ANOVA to prove you validate the proposed diagnostic model on patients and quality improvements of your method". I have two datasets. Dataset 1 was used to train the model and dataset 2 was used to validate the trained model. I have trained the ML model, deployed it, validated it on new data, and presented the results. Actually, I have understood the question. Shall I apply the statistical test between the performance metrics of the trained model results and the validation results? Please help me, sir.
@kodjigarpp 4 years ago
Best way to have a productive lunch, thank you! I have a question: did you choose chi-square because the degrees of freedom is 1 (for churn x gender, for example)? If it had been DOF > 30, what would you have chosen?
@AIEngineeringLife 4 years ago
Chi-square can be used with higher-cardinality categories as well. But if there are a lot of low-tail categories, it is better to group them before feeding them in; otherwise the low tails can distort the output statistics.
@kodjigarpp 4 years ago
@AIEngineeringLife Thank you for your answer. What do you call low tail?
@AIEngineeringLife 4 years ago
Say you have a feature with 30 categories: the last few might have too few observations to support a strong conclusion. These are the low-tail ones.
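One way to group low-tail categories before building the contingency table is sketched below. The helper name group_rare_categories, the "Other" label and the min_count=5 threshold are all assumptions for illustration; the right cut-off depends on the data.

```python
import pandas as pd


def group_rare_categories(series, min_count=5, other_label="Other"):
    """Replace categories seen fewer than min_count times with a single label.

    Assumes a string/object column; a pandas Categorical would need
    other_label added to its categories first.
    """
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare, overwrite the rest.
    return series.where(~series.isin(rare), other_label)


# Hypothetical feature with two sparsely populated levels ("c" and "d").
feature = pd.Series(["a"] * 10 + ["b"] * 8 + ["c"] * 2 + ["d"] * 1)
grouped = group_rare_categories(feature, min_count=5)
print(grouped.value_counts())
```

The grouped column can then go into pd.crosstab and the chi-square test as usual, with fewer near-empty cells in the table.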
@DoYouHaveAName1 1 year ago
Thank you very much, you are a great teacher
@simransharma1070 3 years ago
How do we know which variables in the given data should be compared using the chi-square test?
@AIEngineeringLife 3 years ago
If you do not have much business background, you can compare all categorical variables, or compare each independent variable with the dependent variable.
@GladstonLeon 3 years ago
Null hypothesis: there is no relation between the variables. At 13:30 we fail to reject the null hypothesis, so the gender column is not significant for the Churn column! How is that possible?
@subhadipghosh8194 3 years ago
What if there are more categories in a feature, say 15-20? What should we use in such cases?
@AIEngineeringLife 3 years ago
You can still use it. But if some categories have very low occurrence, it might not give a correct outcome.
@subhadipghosh8194 3 years ago
@AIEngineeringLife Thanks for your reply
@mukeshkund4465 4 years ago
Best way to start your morning !!:)
@hardikraja 4 years ago
Awesome...
@akashsinha9938 4 years ago
Hi sir, thanks for posting the video. I had a question: to check the significance between two categorical variables we use the chi-square test, and for significance between two continuous variables we use the t-test. How can we check significance between an independent categorical variable and a dependent continuous variable, or vice versa?
@devpratap 4 years ago
You can use regression after converting your categorical variable to numeric values. If you're looking for a statistical test, then ANOVA would suffice. This will help: www.researchgate.net/post/What_if_an_independentvariable_is_categorical_and_dependent_variables_iscontinuous_variable_can_anyone_suggest_a_suitable_test
@akashsinha9938 4 years ago
@devpratap Thanks for your answer. But ANOVA will work in the case of a categorical independent variable and a continuous dependent variable. What test applies in the case of a continuous independent variable and a categorical dependent variable? Is there any test for such a case, or do we need to convert the categorical dependent variable to numerical?
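A minimal sketch of the ANOVA suggestion in this thread, assuming a one-way design: the continuous values are split by category level and passed to scipy.stats.f_oneway. The Contract and MonthlyCharges values are invented for illustration. The computation only needs one categorical and one continuous column, so the same grouping can be applied when the categorical column is the target; sklearn.feature_selection.f_classif scores continuous features against a class label in this way.

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical data: one categorical column, one continuous column.
df = pd.DataFrame({
    "Contract": ["Monthly"] * 5 + ["OneYear"] * 5 + ["TwoYear"] * 5,
    "MonthlyCharges": [70, 75, 80, 72, 78,
                       60, 62, 58, 65, 61,
                       40, 45, 42, 38, 44],
})

# One array of continuous values per category level.
groups = [g["MonthlyCharges"].to_numpy() for _, g in df.groupby("Contract")]

# Null hypothesis: all groups share the same mean.
f_stat, p_value = f_oneway(*groups)
print(f"F={f_stat:.2f}, p={p_value:.6f}")
```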
@Vk-gv3sc 4 years ago
What if I have a dataset with multiple data? Should I change it to 1NF? How can I do it in Python? Any resources, please?
@AIEngineeringLife 4 years ago
Vijay, can you elaborate? The dataset I have shown in the video has multiple data. Typically, we first test each individual column against the target during the data analysis phase. Instead of doing it column by column manually, we can create functions and iterate through multiple columns.
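The "create functions and iterate through multiple columns" idea from the reply above could look roughly like this. The function name chi_square_screen, the alpha=0.05 default and the toy columns are assumptions for illustration, not the code from the video's notebook.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi_square_screen(df, target, alpha=0.05):
    """Run a chi-square test of every other column against the target column."""
    rows = []
    for col in df.columns:
        if col == target:
            continue
        table = pd.crosstab(df[col], df[target])
        chi2_stat, p_value, dof, _ = chi2_contingency(table)
        rows.append({
            "feature": col,
            "chi2": chi2_stat,
            "p_value": p_value,
            "dof": dof,
            "reject_null": p_value < alpha,
        })
    # Smallest p-values first.
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)


# Hypothetical data: gender is unrelated to churn, paperless billing is related.
data = pd.DataFrame({
    "gender": ["Male", "Female"] * 50,
    "PaperlessBilling": ["Yes"] * 60 + ["No"] * 40,
    "Churn": ["Yes"] * 30 + ["No"] * 30 + ["Yes"] * 8 + ["No"] * 32,
})

result = chi_square_screen(data, target="Churn")
print(result)
```

This assumes every non-target column is categorical; continuous columns would need to be excluded or bucketed first.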
@SahilSingh-cu7rh 4 years ago
Hello sir, what types of projects should we do as freshers to get a job? And to what extent should one know Python?
@AIEngineeringLife 4 years ago
You can check my videos on projects. These are sample approaches; you can try out something similar: kzbin.info/aero/PL3N9eeOlCrP7RBbok898Yk0SsUw1O9urP Learn Python to the extent that you can do data science work. You need a good understanding of the pandas, NumPy, scikit-learn and Matplotlib packages.
@SahilSingh-cu7rh 4 years ago
Will surely watch and work on your recommended approach. Thank you.
@manishsharma2211 3 years ago
Epic tut
@venkateshkatepally6110 3 years ago
It would be helpful if a Colab link were shared for all the videos. Thanks.
@AIEngineeringLife 3 years ago
You can find the repo details of my courses here: github.com/srivatsan88/ The one you are seeing is part of my applied stats course.
@poojashah5095 4 years ago
Hello sir. Thank you for posting this video. But I have some doubts regarding this chi-square test. Is it possible to use it for a numerical dataset, as I have numerical rather than categorical data? I'm working on a lung cancer dataset in which all the data is numerical. Can you please post a video on selecting the best features using the chi-square test for numerical data? It would be a great help if you could do that and explain it.
@AIEngineeringLife 4 years ago
The chi-square test is for categorical data, but if the data is numeric, will Pearson or Spearman correlation not work? Or you can use another feature selection method such as forward selection.
@poojashah5095 4 years ago
@AIEngineeringLife So the chi-square test is not possible for numerical data? But at the beginning of your video you said that the next video would show how to use the chi-square test for a numerical dataset.
@poojashah5095 4 years ago
@AIEngineeringLife Is it not possible to use it even for continuous data?
@AIEngineeringLife 4 years ago
@poojashah5095 What you can do is bucket the numerical data and run chi-square. This works for continuous data where bucketing makes sense, such as age or salary buckets. If I said chi-square for purely continuous data, then I made a mistake, but I do have a video for continuous data using regular correlation.
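The bucketing approach described in the reply above might be sketched as follows, using pandas.cut to turn a continuous column into ranges before the chi-square test. The tenure values, bin edges and labels are all invented for illustration; pandas.qcut is an alternative when equal-sized buckets are preferred.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical continuous feature and binary target (pattern repeated 5 times).
df = pd.DataFrame({
    "tenure": [1, 2, 3, 4, 5, 6,
               20, 22, 25, 28, 30, 35,
               50, 55, 60, 65, 70, 72] * 5,
    "Churn": ["Yes", "Yes", "Yes", "Yes", "Yes", "No",
              "Yes", "No", "Yes", "No", "No", "Yes",
              "No", "No", "No", "No", "No", "Yes"] * 5,
})

# Right-inclusive bins: (0, 12], (12, 48], (48, 72]. Edges are an assumption.
df["tenure_bucket"] = pd.cut(
    df["tenure"], bins=[0, 12, 48, 72], labels=["0-12", "13-48", "49-72"]
)

table = pd.crosstab(df["tenure_bucket"], df["Churn"])
chi2_stat, p_value, dof, _ = chi2_contingency(table)
print(table)
print(f"chi2={chi2_stat:.2f}, p={p_value:.6f}, dof={dof}")
```

The result depends on where the bin edges fall, so buckets that carry business meaning (age bands, salary bands) are usually easier to defend than arbitrary ones.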
@poojashah5095 4 years ago
@AIEngineeringLife Can you please provide that video link?
@madhukerbillapati3944 4 years ago
Good one!!
@rushikeshbulbule8120 4 years ago
Nice ✌
@farahalaa2362 4 years ago
Can you give me the code, please?
@AIEngineeringLife 4 years ago
It is in my Git repo here: github.com/srivatsan88/KZbinLI/blob/master/statistics/Statistical_Thinking_Feature_Selection_Categorical_Variables.ipynb
@nickw22689 3 years ago
Fantastic video, you just helped me with a major assignment and saved me a lot of stress. Buy you a beer if I could! How can I access your github repo?
@uttamagrahari 2 years ago
Thank you sir
@salvadorrojas7969 3 years ago
Great explanation
@erinwolf1563 4 years ago
Thank you😊😊
@swapnanilsharma 4 years ago
Why did you choose "no relation" as the null hypothesis? Why can't the null hypothesis be that there is some relationship between the two categorical variables?
@AIEngineeringLife 4 years ago
That is the way the chi-square test is defined. Each test, when it was devised, was framed around a particular hypothesis. The null hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population, but other tests might have different null hypotheses.
@swapnanilsharma 4 years ago
@AIEngineeringLife Thanks for your quick reply. Suppose that, for a test, the p-value is very small and below the significance level, and the Cramér's V score is also very low (due to the large sample size). What can we conclude from this?
@AIEngineeringLife 4 years ago
@swapnanilsharma This can also be an input to the feature selection process of your ML model, to see whether this variable is important in modelling the target variable. Again, keep in mind that this is a statistical test that gives you probabilistic evidence of an association, but you can always override it if you feel this variable is important based on domain understanding.
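The situation raised in this thread (tiny p-value, tiny Cramér's V) can be reproduced with a small sketch. The two tables below are invented and have identical proportions, differing only in sample size; the helper name chi2_and_cramers_v is an assumption for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency


def chi2_and_cramers_v(table):
    """Return (p_value, Cramér's V) for a contingency table given as nested lists."""
    table = np.asarray(table)
    chi2_stat, p_value, _, _ = chi2_contingency(table, correction=False)
    v = np.sqrt(chi2_stat / (table.sum() * (min(table.shape) - 1)))
    return float(p_value), float(v)


# Same 51/49 split in both tables; only the number of observations changes.
small = [[51, 49], [49, 51]]          # n = 200
large = [[5100, 4900], [4900, 5100]]  # n = 20,000

p_small, v_small = chi2_and_cramers_v(small)
p_large, v_large = chi2_and_cramers_v(large)

print(f"n=200:    p={p_small:.4f}, V={v_small:.3f}")
print(f"n=20000:  p={p_large:.4f}, V={v_large:.3f}")
```

Both tables give the same V, but only the large one crosses the 0.05 threshold: the p-value says the association is unlikely to be chance, while V says how strong it is. A significant but very weak association may add little to a model, which is where the domain judgment mentioned in the reply comes in.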