Chi Square (Category) | Feature Selection

Chi Square (Category) | Feature Selection | Python

12,622 views

Hackers Realm

Days ago

Comments: 35

@HackersRealm 1 year ago

Hi everyone, I mistakenly said that the p-value should be greater than 0.5. It should be 0.05.
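With the corrected 0.05 threshold, selecting features can look like the sketch below. It is a minimal example assuming sklearn's `chi2` and a small made-up, already-encoded dataset (the column values are hypothetical, not the video's data): a feature is kept when its p-value is below 0.05, i.e. when independence from the target is unlikely.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2

# Hypothetical toy data (not the video's dataset): 20 rows of already-encoded
# categorical features and a binary target.
y = np.array([0] * 10 + [1] * 10)
X = pd.DataFrame({
    "Credit_History": y.copy(),      # moves exactly with the target
    "Gender": np.tile([0, 1], 10),   # same share of 1s in both classes
})

chi_scores, p_values = chi2(X, y)
p_values = pd.Series(p_values, index=X.columns)

# Keep a feature when p < 0.05, i.e. when independence from the target is unlikely.
selected = p_values[p_values < 0.05].index.tolist()
print(p_values.round(4))
print(selected)
```

Here only `Credit_History` passes the 0.05 cut; `Gender` has the same proportion of 1s in both classes, so its score is 0 and its p-value is 1.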

@DasSquadBureauhalt 2 years ago

Literally popped up in my recommendations 10 minutes ago. This is great, thank you!

@HackersRealm 2 years ago

Hope it's helpful!!!

@javidhesenov7611 1 year ago

Nice explanation, thanks. Before this I was looking at scipy's chi2, and it was a little difficult. But it turns out sklearn's chi2 is pretty straightforward and well explained on the website. Thanks for introducing it.
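A rough side-by-side of the two libraries, as a sketch on hypothetical toy data: sklearn's `chi2` scores every column of `X` in one call, while `scipy.stats.chi2_contingency` runs a full test of independence on a contingency table you build yourself (here with `pd.crosstab`), one feature at a time.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.feature_selection import chi2

# Hypothetical toy data: one encoded feature that matches the binary target.
y = np.array([0] * 10 + [1] * 10)
X = pd.DataFrame({"Credit_History": y.copy()})

# sklearn: one call scores every column of X against y.
sk_scores, sk_pvalues = chi2(X, y)

# scipy: build the feature-by-target contingency table yourself, then test it.
table = pd.crosstab(X["Credit_History"], y)
stat, p, dof, expected = chi2_contingency(table, correction=False)

print(sk_scores[0], stat, dof)
```

The two statistics are not expected to match (10 vs 20 on this data): sklearn's `chi2` compares the summed feature values per class, treating the feature like a count, while `chi2_contingency` compares every cell of the feature-by-target table. Both lead to the same kind of keep/drop decision, but the numbers are not interchangeable.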

@HackersRealm 1 year ago

Glad it was helpful!!!😄

@isaackodera9441 1 year ago

Just what I was looking for.

@HackersRealm 1 year ago

😄

@kennylouries410 1 month ago

Hello sir, excellent work. Kindly share the playlist link for the previous video.

@HackersRealm 1 month ago

Thanks. Which video are you referring to?

@pradeeppaladi8513 1 year ago

Hi Ashwin, your explanation is very good. I liked it, and in fact I have subscribed to your channel as well.

@HackersRealm 1 year ago

Glad you liked the video!!! I will try my best to share more videos like this!!!

@sravanirekha 11 months ago

Can we do label encoding if one of the features has more than 10 categories?

@HackersRealm 11 months ago

Yes, you can
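As a minimal sketch of label encoding a high-cardinality feature, assuming sklearn's `LabelEncoder` and a hypothetical `City` column with 12 categories: each distinct category gets one integer code, however many categories there are.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical feature with 12 distinct categories, each appearing twice.
city = pd.Series([f"city_{i:02d}" for i in range(12)] * 2, name="City")

encoder = LabelEncoder()
codes = encoder.fit_transform(city)   # one integer per category, 0 .. 11 (sorted order)

print(len(encoder.classes_), codes.min(), codes.max())
```

One thing worth keeping in mind: sklearn's `chi2` treats feature values like counts and sums them per class, so with many categories the arbitrary order of the integer codes can influence the score. One-hot encoding (for example `pd.get_dummies`) avoids that, at the cost of more columns.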

@owurakuagyekum3871 11 months ago

Please, what would you do next after finding the chi-square values and p-values and plotting the graph? How would you use this to analyze the data and reach a conclusion?

@HackersRealm 11 months ago

You can find the importance of the features and try to eliminate the rest if you have many features, e.g. 1,000 features.
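Ranking features and keeping only the strongest ones can be sketched with sklearn's `SelectKBest` using `chi2` as the scoring function. The data, the column names and the choice `k=2` are hypothetical; with 1,000 real features you would pick `k` (or a p-value cut) to suit the problem.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical encoded features; in practice X might have ~1,000 columns.
y = np.array([0] * 10 + [1] * 10)
X = pd.DataFrame({
    "Credit_History": y.copy(),
    "Gender": np.tile([0, 1], 10),
    "Married": np.array([0, 1] * 5 + [1, 1] * 5),
})

# Keep the k features with the highest chi-square scores.
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X, y)
kept = X.columns[selector.get_support()].tolist()

print(pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False))
print(kept)
```

`get_support()` returns a boolean mask in the original column order, which is why it is used here to recover the names of the kept columns.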

@pradeeppaladi8513 1 year ago

Hi Ashwin, I have a question. In the list of categorical variables that you extracted, why did you include "Dependents" and "Credit_History"? Aren't they numerical variables? I just want to understand the basis for adding them to the list of categorical variables. An early response would be highly appreciated.

@HackersRealm 1 year ago

If you check the data, Dependents is categorical because it has the value 4+, which is a string, and Credit_History is also a category, similar to Gender. Only continuous values are considered numerical.
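The reasoning above can be checked directly in pandas. This sketch uses a small hypothetical CSV excerpt (the `LoanAmount` column and all values are made up for illustration): one string value such as "4+" keeps pandas from parsing `Dependents` as numbers, while `Credit_History` is numeric in dtype but only takes the values 0 and 1.

```python
import io
import pandas as pd

# Hypothetical CSV excerpt shaped like the columns discussed above.
csv_text = """Dependents,Credit_History,LoanAmount
0,1.0,128
1,0.0,66
4+,1.0,120
2,,141
"""
df = pd.read_csv(io.StringIO(csv_text))

# A single string such as "4+" stops pandas from parsing the column as numbers.
print(df.dtypes)

# Credit_History is numeric in dtype, but it only takes the values 0 and 1,
# so it behaves like a yes/no category rather than a continuous measurement.
print(sorted(df["Credit_History"].dropna().unique()))

# Non-numeric columns plus the manually identified binary flag.
categorical_cols = [c for c in df.columns
                    if not pd.api.types.is_numeric_dtype(df[c])] + ["Credit_History"]
print(categorical_cols)
```

Looking at the distinct values, not just the dtype, is what separates a 0/1 flag from a continuous column like a loan amount.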

@pradeeppaladi8513 1 year ago

@HackersRealm Where can we find this dataset? Could you please share the link here?

@HackersRealm 1 year ago

@pradeeppaladi8513 It's in the GitHub repo, and the link is in the description!!!

@kartikjha5704 1 year ago

Do we need to label encode the variables before applying this, or will it work as it is?

@HackersRealm 1 year ago

You need to encode them before applying it.
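A minimal sketch of encoding first and scoring second, assuming sklearn's `OrdinalEncoder` to encode all string feature columns at once (label encoding column by column works the same way). The columns and values are hypothetical. The target can stay as strings, because `chi2` binarizes the class labels internally.

```python
import pandas as pd
from sklearn.feature_selection import chi2
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical raw categorical columns, still stored as strings.
X_raw = pd.DataFrame({
    "Gender":  ["Male", "Female", "Male", "Male", "Female", "Male"],
    "Married": ["Yes", "No", "Yes", "No", "No", "Yes"],
})
y = pd.Series(["Y", "N", "Y", "Y", "N", "Y"], name="Loan_Status")

# chi2(X_raw, y) would raise an error: sklearn cannot convert "Male"/"Female" to numbers.
# OrdinalEncoder maps each column's categories (sorted) to 0, 1, 2, ...
encoder = OrdinalEncoder()
X_encoded = pd.DataFrame(encoder.fit_transform(X_raw), columns=X_raw.columns)

scores, p_values = chi2(X_encoded, y)
print(pd.DataFrame({"chi2": scores, "p_value": p_values}, index=X_raw.columns))
```

Fitting the encoder once and reusing it (`encoder.transform`) on new data keeps the category-to-code mapping consistent between training and later scoring.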

@DharmendraKumar-DS 1 year ago

Great explanation. Can I use this technique with any dataset for regression?

@HackersRealm 1 year ago

This is mostly for categorical data.

@SWJ-MKhyathi 6 months ago

Hi, it's a very useful video. But how can we use chi-square for malware detection in Android applications? Could you please reply?

@HackersRealm 6 months ago

Could you please explain this in more detail, e.g. which attributes you're considering?

@joseluisbeltramone599 1 year ago

¡Tremendous explanation! Thank you very much.

@HackersRealm 1 year ago

Glad you liked it!!!

@shuvamsingh4014 11 months ago

My chi scores contain NaN values in the array, and the Series attribute in pandas is also not working. Could you please help me with my problem?

@HackersRealm 11 months ago

Are you using a different dataset or the same one?

@shuvamsingh4014 11 months ago

@HackersRealm Different dataset.
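The thread does not resolve this, so the following is only a diagnostic sketch under an assumption about the cause. sklearn's `chi2` rejects input containing NaN or negative values with an error, and a feature column that is entirely zero gets an expected count of 0, which yields a NaN score (0/0). The helper `columns_to_fix` and the `Unused_Flag` column are hypothetical names for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2


def columns_to_fix(X: pd.DataFrame) -> dict:
    """Hypothetical helper: list columns that commonly break sklearn's chi2."""
    return {
        "has_missing": X.columns[X.isna().any()].tolist(),   # chi2 rejects NaN input
        "has_negative": X.columns[(X < 0).any()].tolist(),   # chi2 needs non-negative values
        "all_zero": X.columns[(X == 0).all()].tolist(),      # expected count 0 -> NaN score
    }


# Hypothetical encoded data with one all-zero column.
y = np.array([0, 1, 0, 1, 1, 0])
X = pd.DataFrame({
    "Gender": [1, 0, 1, 1, 0, 1],
    "Unused_Flag": [0, 0, 0, 0, 0, 0],
})

problems = columns_to_fix(X)
print(problems)

# Drop the all-zero columns before scoring.
X_clean = X.drop(columns=problems["all_zero"])
scores, p_values = chi2(X_clean, y)
print(pd.Series(scores, index=X_clean.columns))
```

Missing values would need to be imputed or dropped, and negative values re-encoded, before calling `chi2`; the helper only reports where to look.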

@69nukeee 1 year ago

Thank you! This video was very clear and insightful. I just have one quick question that still isn't clear to me: what is the null hypothesis H0? Is it the hypothesis that there is some correlation between the categorical variables and the target variable y? If so, then only Credit_History and Education have a p-value lower than 0.05, so they mean something (H0 valid), while the other categorical variables should be dropped (since their p-values are higher than 0.05, hence rejecting H0). Did I get it right? Anyway, really nice job, keep it up ;)

@AnasAbid-zm1lk 1 year ago

The end result is correct, but the reasoning isn't. I think you have misunderstood the chi-square test of independence, so let me clarify it for you:
- H0: the feature and the target are independent.
- H1: the feature and the target are dependent.
The p-value is linked to the chi-square test statistic, which measures the distance between observed and expected counts. The greater the chi-square value, the greater that distance, and therefore the less likely it is that the variables are independent (if they were independent, the observed and expected counts would be close and chi-square would be small). Also, the greater the chi-square value, the smaller the p-value. To sum up: if the p-value is small (0.05 is a common threshold), independence is unlikely and we reject H0. We therefore keep only the variables whose p-values are lower than 0.05, since they depend on the target (and are therefore useful).
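For reference, the standard Pearson chi-square test of independence on an r × c table (feature categories as rows, target classes as columns) uses the statistic below, where O_ij is the observed count, R_i and C_j are the row and column totals, and N is the number of samples. sklearn's `chi2` applies the same (O − E)²/E sum to the summed feature values per class rather than to the full table, with dof = number of classes − 1.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\text{dof} = (r - 1)(c - 1)

p = P\left(\chi^2_{\text{dof}} \ge \chi^2_{\text{observed}}\right),
\qquad
\text{reject } H_0 \text{ (independence) if } p < 0.05
```

Because p is the upper-tail probability of the chi-square distribution, a larger observed statistic always gives a smaller p-value, which is the relationship described in the reply above.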

@69nukeee 1 year ago

@AnasAbid-zm1lk Thank you for getting back to me!