198 - Feature selection using Boruta in python

14,911 views

DigitalSreeni

1 day ago

Comments: 45
@greendsnow · 2 years ago
I can't get enough of these videos. And he knows that.
@DigitalSreeni · 2 years ago
☺️
@evyatarcoco · 3 years ago
Dear sir, your episodes are great! I like learning about new tools and libraries. Keep teaching us! Thanks
@DigitalSreeni · 3 years ago
Keep watching
@marco_6145 · 3 years ago
Really well explained, thanks from Australia
@DigitalSreeni · 3 years ago
Glad it was helpful!
@channelforstream6196 · 3 years ago
I would also be interested in more traditional machine learning. Most of the work done by data scientists I've seen is just preprocessing and postprocessing anyway.
@_olavodecarvalho · 2 months ago
Thank you for sharing your knowledge. One question: why are you scaling the features if Boruta uses tree-based models?
@bikashchandragupta6333 · 2 years ago
Hello Sir, I am following your tutorial but facing an error: "ValueError: Please check your X and y variable. The provided estimator cannot be fitted to your data. Invalid Parameter format for seed expect int but value='RandomState(MT19937)'". Any help regarding this issue would be highly appreciated.
@joytishfromm8223 · 3 years ago
Really good video! Thank you so much!
@fassesweden · 3 years ago
Thank you for this video! Great stuff!
@DigitalSreeni · 3 years ago
Glad you enjoyed it!
@RadhakrishnanBL · 3 years ago
Awesome video man. It really helped me.
@jeffabc1997 · 5 months ago
Nice content, thank you!
@mohammadhassan5240 · 3 years ago
Best channel on YouTube
@DigitalSreeni · 3 years ago
Thanks
@manonathan5892 · 3 years ago
Thanks for the video. May I know how Boruta is different from Random Forest's feature importance? Are they the same?
@lalitsingh5150 · 3 years ago
Thank you for another great video.
@DigitalSreeni · 3 years ago
Thanks for watching!
@lalitsingh5150 · 3 years ago
@@DigitalSreeni Sir, XGBoost gives an error in Boruta... SVM works fine. ValueError: Please check your X and y variable. The provided estimator cannot be fitted to your data. Invalid Parameter format for seed expect int but value='RandomState(MT19937)'
@anjalisetiya2335 · 2 years ago
Can this algorithm be applied for feature selection on mixed data types, i.e. data that has both boolean and continuous variables? Please let me know.
@0xcalmaf976 · 3 years ago
Thanks a lot for sharing your knowledge with us! Would you consider making a tutorial on the BraTS or LiTS challenges? We would love it :)
@DigitalSreeni · 3 years ago
I plan on recording videos on multiclass semantic segmentation, which will help you segment BraTS or LiTS slice by slice. I haven't experimented with 3D U-Net yet, but I plan on doing it sometime in the next few months.
@0xcalmaf976 · 3 years ago
@@DigitalSreeni That's great! But one thing: could you please choose a public dataset so we can follow along with you? Thanks!
@zakirshah7895 · 3 years ago
Hello teacher, nice video. I am doing classification using a CNN. Is there any good way to do feature selection, as I am using a hybrid model? The accuracy is low, maybe because of redundant features from the two models.
@aditya_baser · 2 years ago
There are 7 features with rank 1; how do you further rank the features among them?
@kannansingaravelu · 3 years ago
Hi Sreeni. Thanks for the excellent videos. In many cases, once BorutaPy finishes running, the number of tentative features printed out is different from (less than) what the run itself reported. For example, in one of my use cases with 196 features, the (100) iterations ended with 46 tentative features while the summary printed only 28. Why is this different? How is this handled in Boruta?
@DigitalSreeni · 3 years ago
Sorry, I didn't notice that behavior. I hope the documentation provides some explanation.
@yogaforyou4213 · 3 years ago
Thank you so much for your video
@DigitalSreeni · 3 years ago
Glad it was helpful!
@leamon9024 · 2 years ago
Hello sir, would you cover a feature selection technique that uses hierarchical or k-means clustering, if possible? scikit-learn seems to have such a function (sklearn.cluster.FeatureAgglomeration), but few people talk about it. Thanks in advance.
@pablovaras9435 · 1 year ago
Great video
@awa8766 · 3 years ago
I'm curious to know if you could point out what the issue is. I have a dataset where my number of labels (y) is 55, and the number of independent variables (X) is 100. The combined dataframe (X and y together) would be 55x101. I used a similar procedure to what you presented, and the only difference in datatype is that my y_train is int64 and my X_train is float64. I ran XGBoost and BorutaPy, but I am receiving an error when fitting the feature selector to X_train and y_train. The error I'm getting is: "Please check your X and y variable. The provided estimator cannot be fitted to your data. Invalid Parameter format for seed expect int but value='RandomState(MT19937)'" I can't seem to find an issue opened on either the BorutaPy or the XGBoost forums with the same error. I'd appreciate your input!
@RadhakrishnanBL · 3 years ago
Any help to solve this error? "XGBoostError: Invalid Parameter format for seed expect int but value='RandomState(MT19937)'"
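For everyone hitting this RandomState(MT19937) message: BorutaPy hands the estimator a NumPy RandomState object as its random_state, and some xgboost versions only accept a plain integer seed there, which appears to be what triggers the error. A commonly suggested workaround is to use a scikit-learn tree ensemble instead (newer xgboost releases may handle RandomState objects, so checking your installed version is also worth a try). Below is a minimal sketch with RandomForestClassifier and synthetic stand-in data, not the exact setup from the video:

```python
# Sketch of a workaround for the RandomState(MT19937) error, assuming the cause
# is that BorutaPy passes a NumPy RandomState object while some xgboost versions
# expect a plain integer seed. A scikit-learn ensemble avoids the mismatch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

# Synthetic stand-in data; replace with your own X (features) and y (labels)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
feat_selector = BorutaPy(rf, n_estimators='auto', verbose=2, random_state=42)

# Note: BorutaPy expects NumPy arrays, not pandas DataFrames;
# call .values on DataFrames before fitting.
feat_selector.fit(X, y)

print("Confirmed feature indices:", np.where(feat_selector.support_)[0])
print("Rankings:", feat_selector.ranking_)
```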
@sallahamine9467 · 2 years ago
Why does the Boruta algorithm not work with AdaBoost or CatBoost?
@MrTapan1994 · 1 year ago
I tried testing with all the features and with the Boruta-selected features, and the accuracy doesn't change. So is the idea to use fewer features while keeping the metric the same?
@DigitalSreeni · 1 year ago
Selecting fewer features using a feature selection technique like Boruta has several potential benefits:
Improved model performance: by selecting only the most relevant features, the model may be better able to distinguish between signal and noise in the data.
Reduced overfitting: selecting a subset of relevant features can reduce the risk of overfitting, which occurs when the model becomes too complex and fits noise in the data rather than the underlying patterns.
Improved interpretability: with fewer features, the resulting model may be easier to interpret, making it simpler to understand the factors driving its predictions.
Reduced computational cost: working with fewer features can lower the cost of training and evaluating the model, which matters in large datasets or when real-time predictions are required.
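One way to see the trade-off described above is to train the same model twice, once on all features and once on only the Boruta-confirmed subset, and compare the test scores. A minimal sketch with synthetic data and a RandomForest model (not the exact setup from the video):

```python
# Sketch: compare a model trained on all features vs. only the
# Boruta-confirmed features (synthetic data for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from boruta import BorutaPy

X, y = make_classification(n_samples=1000, n_features=30, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Run Boruta on the training split only, to avoid leaking test information
rf = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=0)
selector = BorutaPy(rf, n_estimators='auto', random_state=0)
selector.fit(X_train, y_train)

def fit_and_score(X_tr, X_te):
    # Fresh model each time so the two runs are comparable
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_tr, y_train)
    return accuracy_score(y_test, model.predict(X_te))

acc_all = fit_and_score(X_train, X_test)
acc_sel = fit_and_score(selector.transform(X_train), selector.transform(X_test))

print(f"All {X_train.shape[1]} features: accuracy = {acc_all:.3f}")
print(f"{selector.support_.sum()} selected features: accuracy = {acc_sel:.3f}")
```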
@carlosleandrosilvadospraze4005 · 3 years ago
Professor, congratulations again for the video! I'm very grateful! I have a doubt: could I use the feature selector at the end of a pre-trained CNN (on the flattened layer)? I would like to reduce the dimensionality using an ML method.
@DigitalSreeni · 3 years ago
Technically the output of a pre-trained CNN would be a bunch of features, so I don't see why you cannot perform feature selection on those features. I should admit that I have never tried it, so I cannot guide you on what to expect.
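As a rough illustration of that idea, the pooled output of a pre-trained network can be treated as an ordinary feature table and fed to Boruta. The sketch below assumes Keras' VGG16 with global average pooling as the feature extractor and uses random placeholder images and labels, so it only demonstrates the plumbing, not a meaningful result:

```python
# Sketch: feature selection on features extracted by a pre-trained CNN.
# VGG16 + global average pooling is an assumed choice of extractor;
# the images and labels here are random placeholders.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

# Placeholder data: 100 RGB images with binary labels
images = np.random.rand(100, 224, 224, 3) * 255.0
labels = np.random.randint(0, 2, size=100)

# Pre-trained CNN as a fixed feature extractor (512 features per image)
extractor = VGG16(weights='imagenet', include_top=False, pooling='avg')
features = extractor.predict(preprocess_input(images), verbose=0)

# Boruta over the extracted features
rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
selector = BorutaPy(rf, n_estimators='auto', random_state=1)
selector.fit(features, labels)

print("Selected", selector.support_.sum(), "of", features.shape[1], "CNN features")
```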
@carlosleandrosilvadospraze4005 · 3 years ago
@@DigitalSreeni Thank you! 😊
@RajeshSharma-bd5zo · 3 years ago
Nice video 🤘🤘
@DigitalSreeni · 3 years ago
Thanks ✌
@gurdeepsinghbhatia2875 · 3 years ago
Nice, sir
@DigitalSreeni · 3 years ago
Keep watching
@DisgustingGaming · 2 months ago
Great video, thank you