Statistical Thinking - Chi Square Test - Feature Selection

19,703 views

AIEngineering

Comments: 61
@hssp1534 2 years ago
What would our feature selection approach be if we are using a mixture of continuous and categorical variables to predict a categorical variable?
@lavendermlay5731 8 months ago
Hello, this was a very helpful video. If you have done a Bayesian analysis, please provide the video link.
@akhileshlekurwale364 4 years ago
Is it good practice to perform a statistical test on every column available for modelling? Is there any trigger point to consider for this?
@AIEngineeringLife 4 years ago
Akhilesh, it is very subjective. I would say it is good to investigate each variable to see how it impacts the model. How exhaustive to be depends on the case. Most of these tests can be automated.
@rameshthamizhselvan2458 4 years ago
I have one doubt: instead of using the stats package, we can use chi-square directly from the sklearn library, right?
@AIEngineeringLife 4 years ago
Yes, you can. Since I did not use an sklearn pipeline, I used the stats one.
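A minimal sketch of the stats-package route mentioned in this reply, using `scipy.stats.chi2_contingency` on a `pandas.crosstab` table. The toy counts are hypothetical, the column names `gender` and `Churn` are assumed to mirror the churn dataset discussed in the video, and the 0.05 significance level is an assumption rather than something stated in the thread.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data; column names are assumed, not taken from the notebook.
df = pd.DataFrame({
    "gender": ["Male"] * 50 + ["Female"] * 50,
    "Churn": ["Yes"] * 15 + ["No"] * 35 + ["Yes"] * 12 + ["No"] * 38,
})

# Contingency table of observed counts: rows = gender, columns = Churn.
table = pd.crosstab(df["gender"], df["Churn"])

# Returns the test statistic, the p-value, the degrees of freedom and the
# expected counts under independence (the null hypothesis).
chi2, p_value, dof, expected = chi2_contingency(table)

alpha = 0.05  # assumed significance level
if p_value < alpha:
    print(f"p = {p_value:.3f}: reject H0, the variables appear related")
else:
    print(f"p = {p_value:.3f}: fail to reject H0, no evidence of a relationship")
```

With counts this close between the two groups, the test has nothing to reject, which is the same kind of outcome discussed for gender versus churn elsewhere in this thread.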
@raviirla459 4 years ago
Awesome videos with great content. Loved it :) Waiting for more videos on feature engineering.
@chiragbilimoria3275 4 years ago
This is what I was looking for, sir... Thanks a lot for this video.
@TheSocialDrone 3 years ago
You explained it very well! Thanks for producing and sharing this tutorial.
@tryingtolearn3299 4 years ago
Thank you very much for the videos. I have two questions. When we have categorical variables, can we use Pearson correlation to get the order of significance, such as paperless billing being more significant than seniorcitizen? Or do we need to use only the chi-squared test? Second question: if I have a few categorical variables with multiple categories, should we first create dummy variables and then run the chi-squared test on each of the dummy variables against the target variable?
@AIEngineeringLife 4 years ago
The strength of the relationship between two categorical variables can be measured with Cramér's V. You can check my Cramér's V video in case you have not already. One-hot encoding might not be required; you just create a contingency table based on the number of categories.
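The reply above can be sketched as follows: build one contingency table for the multi-category feature (no dummy variables) and compute Cramér's V from the chi-square statistic as V = sqrt(chi2 / (n * (min(rows, cols) - 1))). The counts and the `cramers_v` helper are hypothetical illustrations, not code from the video.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts: rows = 3 contract types, columns = churn No/Yes.
observed = np.array([
    [220, 180],
    [130, 20],
    [160, 10],
])

def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    # correction=False so the uncorrected chi-square statistic is used.
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

v = cramers_v(observed)
print(f"Cramér's V = {v:.3f}")
```

V ranges from 0 (no association) to 1 (perfect association), so it can be used to rank features by strength of association, which a p-value alone does not give.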
@tryingtolearn3299 4 years ago
@AIEngineeringLife Thank you very much for the quick reply.
@junaidasghar8462 4 years ago
Can we do chi-square between two categorical variables when there is no target variable (gender and paperlessbilling), i.e. unsupervised data?
@AIEngineeringLife 4 years ago
Yes Junaid, you can. It can be any two categorical variables.
@shubhamchoudhary5461 3 years ago
@AIEngineeringLife In that way, can we find multicollinearity between two categorical features?
@rsinh3792 3 years ago
Sir, a reviewer has asked me this question and I don't know how to address it. Can you please guide me? "Use some statistical significant test such as T-test or ANOVA to prove you validate the proposed diagnostic model on patients and quality improvements of your method". I have two datasets. Dataset 1 was used to train the model and dataset 2 was used to validate the trained model. I have trained the ML model, deployed it, validated it on new data and presented the results. Actually, I have understood the question. Shall I apply the statistical test between the performance metrics of the training results and the validation results? Please help me, sir.
@kodjigarpp 4 years ago
Best way to have a productive lunch, thank you! I have a question: did you choose chi-square because the degrees of freedom is 1 (for churn x gender, for example)? If it had been DOF > 30, what would you have chosen?
@AIEngineeringLife 4 years ago
Chi-square can be used with higher-cardinality categories as well. But if there are a lot of low-tail categories, it is better to group them before feeding them in; otherwise the low tails can distort the output statistics.
@kodjigarpp 4 years ago
@AIEngineeringLife Thank you for your answer. What do you call low tail?
@AIEngineeringLife 4 years ago
Say you have a feature with 30 categories: the last few might have too few observations to support a strong conclusion. These are the low-tail ones.
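One way to group low-tail categories before running the test, as suggested in this thread, is to fold every category below a count threshold into a single "Other" level. This is a sketch only: the data is hypothetical and the threshold of 5 is an assumption borrowed from the common rule of thumb about small cell counts, not a value given in the video.

```python
import pandas as pd

# Hypothetical feature with two rare ("low tail") categories, D and E.
s = pd.Series(["A"] * 40 + ["B"] * 30 + ["C"] * 25 + ["D"] * 3 + ["E"] * 2)

min_count = 5  # assumed threshold for calling a category "low tail"
counts = s.value_counts()
rare = counts[counts < min_count].index

# Keep frequent categories as they are; replace rare ones with "Other".
grouped = s.where(~s.isin(rare), "Other")
print(grouped.value_counts())
```

The grouped series can then be passed to `pd.crosstab` against the target in place of the original column.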
@DoYouHaveAName1 1 year ago
Thank you very much, you are a great teacher
@simransharma1070 3 years ago
How do we get to know which variables in the given data are to be compared using the chi-square test?
@AIEngineeringLife 3 years ago
If you do not have much business background, you can compare all categorical variables with each other, or the dependent variable with every independent variable.
@GladstonLeon 3 years ago
Null hypothesis: there is no relation between the variables. At 13:30 we fail to reject the null hypothesis, so the gender column is not significant with the Churn column! How is that possible?
@subhadipghosh8194 3 years ago
What if there are more categories in a feature, say 15-20? What should we use in such cases?
@AIEngineeringLife 3 years ago
You can still use it. But if some categories have very low occurrence, it might not give a correct outcome.
@subhadipghosh8194 3 years ago
@AIEngineeringLife Thanks for your reply.
@mukeshkund4465 4 years ago
Best way to start your morning!! :)
@hardikraja 4 years ago
Awesome...
@akashsinha9938 4 years ago
Hi sir, thanks for posting the video. I have a question: to check the significance between two categorical variables we use the chi-square test, and between two continuous variables we use the t-test. How can we check significance between an independent categorical variable and a dependent continuous variable, or vice versa?
@devpratap 4 years ago
You can use regression after converting your categorical variable to numeric values. If you're looking for a statistical test, then ANOVA would suffice. This will help: www.researchgate.net/post/What_if_an_independentvariable_is_categorical_and_dependent_variables_iscontinuous_variable_can_anyone_suggest_a_suitable_test
@akashsinha9938 4 years ago
@devpratap Thanks for your answer. But ANOVA works in the case of a categorical independent and a continuous dependent variable. What test applies in the case of a continuous independent and a categorical dependent variable? Is there any test for such a case, or do we need to convert the categorical dependent variable to numerical?
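The ANOVA suggestion in this thread can be sketched with `scipy.stats.f_oneway`, which compares the mean of a continuous variable across the levels of a categorical one. The group names and values below are hypothetical. Note that the test itself only compares group means; which variable is called dependent is a modelling choice that the computation does not encode.

```python
from scipy.stats import f_oneway

# Hypothetical monthly charges, split by a categorical variable (contract type).
month_to_month = [70.1, 85.3, 90.2, 79.9, 88.4]
one_year = [60.5, 65.2, 58.9, 62.0, 66.1]
two_year = [45.0, 50.3, 48.7, 52.2, 47.5]

# One-way ANOVA: H0 is that all group means are equal.
f_stat, p_value = f_oneway(month_to_month, one_year, two_year)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```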
@Vk-gv3sc 4 years ago
What if I have a dataset with multiple data? Should I change it to 1NF? How can I do it in Python? Any resources, please?
@AIEngineeringLife 4 years ago
Vijay, can you elaborate? The dataset I have shown in the video has multiple data. Typically, during the data analysis phase, we first test each individual column against the target. Instead of doing it column by column manually, we can create functions and iterate through multiple columns.
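A hedged sketch of the "create functions and iterate through multiple columns" idea from this reply. The helper name `chi_square_report`, the toy data frame and the 0.05 threshold are all assumptions for illustration; the function assumes every non-target column is categorical.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chi_square_report(df, target, alpha=0.05):
    """Run a chi-square test of every other column against `target`."""
    rows = []
    for col in df.columns:
        if col == target:
            continue
        table = pd.crosstab(df[col], df[target])
        chi2, p, dof, _ = chi2_contingency(table)
        rows.append({"feature": col, "chi2": chi2, "p_value": p,
                     "dof": dof, "significant": p < alpha})
    # Smallest p-value first, so the strongest evidence is at the top.
    return pd.DataFrame(rows).sort_values("p_value").reset_index(drop=True)

# Hypothetical toy frame; column names are assumed to mirror the churn data.
df = pd.DataFrame({
    "gender": ["M", "F"] * 20,
    "PaperlessBilling": ["Yes"] * 20 + ["No"] * 20,
    "Churn": ["Yes"] * 15 + ["No"] * 5 + ["Yes"] * 4 + ["No"] * 16,
})

report = chi_square_report(df, target="Churn")
print(report)
```

When many columns are tested this way, some small p-values will appear by chance, so the report is better treated as a screening aid than as a final verdict.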
@SahilSingh-cu7rh 4 years ago
Hello sir, what types of projects should we do as freshers to get a job? And to what extent should one know Python?
@AIEngineeringLife 4 years ago
You can check my videos on projects. These are sample approaches; you can try out something similar: kzbin.info/aero/PL3N9eeOlCrP7RBbok898Yk0SsUw1O9urP Learn Python to the extent that you can do data science work. You need a good understanding of the pandas, numpy, scikit-learn and matplotlib packages.
@SahilSingh-cu7rh 4 years ago
Will surely watch and work on your recommended approach. Thank You
@manishsharma2211 3 years ago
Epic tutorial
@venkateshkatepally6110 3 years ago
It would be helpful if the Colab link were shared for all the videos. Thanks.
@AIEngineeringLife 3 years ago
You can find the repo details of my courses here: github.com/srivatsan88/ The one you are seeing is part of my applied stats course.
@poojashah5095 4 years ago
Hello sir, thank you for posting this video. I have some doubts regarding the chi-square test. Is it possible to use it on a numerical dataset, as my dataset is numerical, not categorical? I'm working on a lung cancer dataset in which all the data is numerical. Can you please post a video on selecting the best features using the chi-square test for numerical data? It would be a great help if you could do it and explain.
@AIEngineeringLife 4 years ago
The chi-square test is for categorical data. If the data is numeric, would Pearson or Spearman correlation not work? Or you can use another feature elimination method, such as forward selection.
@poojashah5095 4 years ago
@AIEngineeringLife So the chi-square test is not possible for numerical data? But at the beginning of your video you said that the next video would show how to use the chi-square test on a numerical dataset...
@poojashah5095 4 years ago
@AIEngineeringLife Is it not possible to use it even for continuous data?
@AIEngineeringLife 4 years ago
@poojashah5095 What you can do is bucket the numerical data and run chi-square. This is for continuous data where bucketing makes sense, like age or salary buckets. If I said chi-square for pure continuous data, then I made a mistake, but I do have a video for continuous data using regular correlation.
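The bucketing approach described in this reply can be sketched with `pandas.cut` followed by the same contingency-table test. The data, the column names `age` and `target`, and the bin edges are all hypothetical; in practice the edges should come from domain knowledge, and a sample this small gives expected counts too low for a trustworthy result.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: a continuous feature and a binary target.
df = pd.DataFrame({
    "age": [22, 25, 31, 35, 38, 42, 47, 51, 55, 58, 63, 67, 70, 74, 78, 81],
    "target": ["No", "No", "No", "No", "No", "Yes", "No", "No",
               "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"],
})

# Bucket the continuous column; bins are right-inclusive: (0, 40], (40, 60], (60, 100].
df["age_bucket"] = pd.cut(df["age"], bins=[0, 40, 60, 100],
                          labels=["<=40", "41-60", ">60"])

# The bucketed column is now categorical, so the usual test applies.
table = pd.crosstab(df["age_bucket"], df["target"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

The result depends on where the bin edges fall, so it is worth checking that the conclusion is stable under a few reasonable choices of buckets.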
@poojashah5095 4 years ago
@AIEngineeringLife Can you please provide that video link?
@madhukerbillapati3944 4 years ago
Good one!!
@rushikeshbulbule8120 4 years ago
Nice ✌
@farahalaa2362 4 years ago
Can you give me the code, please?
@AIEngineeringLife 4 years ago
It is in my Git repo here: github.com/srivatsan88/KZbinLI/blob/master/statistics/Statistical_Thinking_Feature_Selection_Categorical_Variables.ipynb
@nickw22689 3 years ago
Fantastic video, you just helped me with a major assignment and saved me a lot of stress. Buy you a beer if I could! How can I access your github repo?
@uttamagrahari 2 years ago
Thank you, sir
@salvadorrojas7969 3 years ago
Great explanation
@erinwolf1563 4 years ago
Thank you😊😊
@swapnanilsharma 4 years ago
Why did you choose "no relation" as the null hypothesis? Why can't the null hypothesis be that there is some relationship between the two categorical variables?
@AIEngineeringLife 4 years ago
That is the way the chi-square test is defined. Each test, when it was devised, was framed around some hypothesis. The null hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population, but other tests might have different null hypotheses.
@swapnanilsharma 4 years ago
@AIEngineeringLife Thanks for your quick reply. Suppose that for an observation the p-value is very small, less than the significance level, and the Cramér's V score is also very low (due to the high sample size). What can we conclude from this?
@AIEngineeringLife 4 years ago
@swapnanilsharma This can be an input to the feature selection process of your ML model, to see whether this variable is important in modelling the target variable. Again, this is a statistical test that gives you a probabilistic indication of association, but you can always override it if you feel the variable is important based on domain understanding.
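The situation raised in this exchange (a tiny p-value together with a low Cramér's V on a large sample) can be reproduced with a small experiment. The sketch below uses hypothetical counts and an illustrative helper `chi2_and_v`; it scales the same proportions from n = 200 to n = 20,000 and compares the two measures.

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_and_v(table):
    """Return the chi-square p-value and Cramér's V for a contingency table."""
    chi2, p, _, _ = chi2_contingency(table, correction=False)
    v = np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))
    return float(p), float(v)

small = np.array([[52, 48], [48, 52]])  # hypothetical counts, n = 200
large = small * 100                     # same proportions, n = 20,000

p_small, v_small = chi2_and_v(small)
p_large, v_large = chi2_and_v(large)

print(f"n = {small.sum()}: p = {p_small:.4f}, V = {v_small:.3f}")
print(f"n = {large.sum()}: p = {p_large:.2e}, V = {v_large:.3f}")
```

The p-value shrinks as the sample grows while V stays the same, so one reasonable reading is that the association is statistically detectable but weak. Whether such a feature is worth keeping is then the domain judgment call described in the reply.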