Hi, thank you for the amazing video. Where can I find the dataset that you used?
@StatsWire 2 years ago
Thank you. You can find the notebook and dataset here: github.com/siddiquiamir/PySpark-Tutorial
@aksmalviyan8342 1 year ago
Thanks for the video....
@StatsWire 1 year ago
You're welcome!
@user-lq1cs 2 years ago
Hello, I have another question. I just found out that if you remove random_state from train_test_split, the f_score output is different every time you run the code. I want it to stay consistent without random_state, because I can't find a reason or explanation for using random_state in my final project. Can I use k-fold cross-validation here to keep the f_score consistent? I hope you understand what I'm talking about, thank you!
@StatsWire 2 years ago
Hi, if you remove the random_state parameter, then every time you split the data into train and test you get different samples, so the accuracy and F1 score will be different too. If you want the same result every time, use the random_state parameter.
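For illustration, here is a minimal sketch of both options; the Iris data and the LogisticRegression classifier are stand-ins, not the video's exact code:

```python
# Minimal sketch: a fixed random_state makes a single split reproducible,
# while cross-validation averages the score over several splits.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Option 1: fix random_state so the split (and hence the score) is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
print("F1 on one fixed split:", f1_score(y_test, model.predict(X_test), average="macro"))

# Option 2: k-fold cross-validation, so the reported score depends far less
# on any single train/test split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, scoring="f1_macro", cv=cv)
print("Mean F1 over 5 folds:", scores.mean())
```

Fixing random_state keeps one particular split; cross-validation reports an average over several splits, which is usually the more defensible number for a final project.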
@leamon9024 2 years ago
Hi, thanks for your hard work. I have one question, though. My target variable is categorical, but my features are all numerical rather than categorical like the ones in the dataset in this video. Can I still use chi2 for feature selection?
@StatsWire 2 years ago
No, both the features and the target should be categorical.
@QornainAji 9 months ago
You can convert your numerical variables into categorical ones using a binning technique. How many bins you should use depends on your data. Hope that answers your question.
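As an illustration of that binning idea, here is a minimal sketch assuming pandas and scikit-learn; the column name "age" and the choice of three bins are made up for the example:

```python
# Minimal sketch of binning a numerical column before chi-square feature selection.
# The "age" column and the three bins are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

df = pd.DataFrame({"age": [23, 45, 31, 62, 18, 40, 55, 29]})

# Option 1: pandas.cut with labelled bins
df["age_binned"] = pd.cut(df["age"], bins=3, labels=["low", "mid", "high"])

# Option 2: KBinsDiscretizer, which produces non-negative ordinal codes
# that sklearn's chi2 can consume directly
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
df["age_bin_code"] = binner.fit_transform(df[["age"]]).ravel()
print(df)
```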
@reanwithkimleng 6 days ago
Hello sir, how do we do feature selection or compute feature importance when the dataset has both categorical and quantitative features? For example, the penguin dataset has length, depth, and mass as quantitative features, and sex, island, and species as categorical ones.
@Thomas-mr2xx 2 years ago
Hello, I don't understand why the chi2 function in your tutorial video outputs a (p_values, chi2) array, while the sklearn documentation and my local code output (chi2, p_values). Do you know why your code outputs it like this?
@StatsWire 2 years ago
The output is the same, I am just printing it in a different order.
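For reference, scikit-learn's chi2 returns the chi-square statistics first and the p-values second; printing them in a different order only changes the display, not the values. A minimal sketch, using the Iris data purely as an illustration rather than the video's dataset:

```python
# Minimal sketch: chi2 returns (chi2 statistics, p-values) in that order.
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

X, y = load_iris(return_X_y=True)
chi2_stats, p_values = chi2(X, y)

print("chi2 statistics:", chi2_stats)
print("p-values:      ", p_values)
# Printing p_values before chi2_stats only swaps the display order.
```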
@mazharalamsiddiqui6904 2 years ago
Very nice
@StatsWire 2 years ago
Thank you
@putridisperindag6986 1 year ago
Hello Mr. Amir Siddiqui, thank you for your nicely explained video. May I ask, what is the difference between f_score and p_values, and which one should we choose? Thanks in advance.
@StatsWire 1 year ago
Hello, thank you for your kind words. The F-score and p-value are totally different. An F-score is the harmonic mean of precision and recall. A p-value is used in hypothesis testing to help you support or reject the null hypothesis; it measures the evidence against the null hypothesis.
@putridisperindag6986 1 year ago
Then, @StatsWire, for feature selection purposes, should we select the best features based on the top 10 highest p-values or the top 10 highest f_score values? Thanks.
@StatsWire 1 year ago
@putridisperindag6986 With chi-square we should select the features whose p-values are less than 0.05, because we are doing hypothesis testing.
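As an illustration of that rule, here is a minimal sketch that keeps the features with chi2 p-values below 0.05, using the Iris data as a stand-in dataset:

```python
# Minimal sketch of p-value-based selection with the 0.05 threshold mentioned above.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

data = load_iris(as_frame=True)
X, y = data.data, data.target

chi2_stats, p_values = chi2(X, y)
results = pd.DataFrame({"feature": X.columns, "chi2": chi2_stats, "p_value": p_values})

# Keep only the features for which we reject the null hypothesis of independence
selected = results[results["p_value"] < 0.05]["feature"].tolist()
print(results)
print("Selected features:", selected)
```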
@user-lq1cs 2 years ago
Hello, thank you for making the video, but I have a question. I followed every step exactly but used my own dataset, and the output I got looks like really high numbers; for example, the highest is 9.417441e-01 and the lowest is 1.134117e-01. Do you know where I went wrong? I'm so confused. Keep up the good work, thank you!
@StatsWire 2 years ago
Hello! Those numbers are actually small, not large. The e-01 part is scientific notation (multiply by 10^-1), so 9.417441e-01 is just 0.9417. You can convert the scientific notation to plain decimal numbers online or directly in your code.
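If it helps, here is a minimal sketch of forcing plain decimal display in NumPy and pandas; the two values are taken from the comment above:

```python
# Minimal sketch: display plain decimals instead of scientific notation.
import numpy as np
import pandas as pd

p_values = np.array([9.417441e-01, 1.134117e-01])

np.set_printoptions(suppress=True)  # NumPy: avoid scientific notation when printing arrays
print(p_values)

pd.set_option("display.float_format", "{:.6f}".format)  # pandas: fixed-point display
print(pd.Series(p_values, name="p_value"))
```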
@Angela-Gee 2 years ago
In the last step I'm getting this error: 'numpy.ndarray' object has no attribute 'index'
@StatsWire 2 years ago
Please check whether the previous steps are correct.
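One common cause of that error, offered only as a hedged guess since the exact code is not shown: .index exists on a pandas Series or DataFrame but not on a NumPy array, so wrapping the scores in a Series keyed by the column names usually fixes it. A minimal sketch with the Iris data as a stand-in:

```python
# Minimal sketch of a common cause of this error (not verified against the
# exact code in the video): .index exists on a pandas Series, not a NumPy array.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

data = load_iris(as_frame=True)
X, y = data.data, data.target

chi2_stats, p_values = chi2(X, y)

# p_values is a plain NumPy array, so p_values.index would raise
# AttributeError: 'numpy.ndarray' object has no attribute 'index'.
# Wrapping it in a Series keyed by the column names restores .index:
p_series = pd.Series(p_values, index=X.columns)
print(p_series.sort_values())
```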