A true teacher who not only selflessly shares his knowledge, but also thanks all of us who subscribe to learn from his teachings. Unlike most professionals in workplaces, who do all sorts of silly things to protect their little turfs and safeguard the very little knowledge they have. What a difference! Thank you, sir.
@bkrai · 2 years ago
You are very welcome! 😊
@jasonlee5814 · 4 years ago
By far the best example of the Boruta model on YouTube. Thank you, Dr. Rai.
@bkrai · 4 years ago
You are very welcome!
@muhammadmurtala9045 · 6 years ago
Your videos have been hugely instrumental in my learning R. I don't have words that can adequately convey the magnitude of my gratitude to you, but let me follow the norm and say a BIG THANK YOU for your effort. Thanks a lot, sir!
@bkrai · 6 years ago
Many thanks for your comments and feedback!
@netmarketer77 · 5 years ago
I learnt a lot from your videos, Dr. Bharatendra. Your helpful videos solved a lot of problems for me, and they are better than the three-hour lectures I attend in class. Regards
@bkrai · 5 years ago
Thanks for your positive comments!
@sammy0722 · 3 years ago
Thank you, sir. I used to do these things in Python, and they used to consume too much time. I find R to be a time saver; I enjoy R more than Python.
@bkrai · 3 years ago
Thanks for the feedback!
@piuslutakome610 · 3 years ago
Hello Prof. Rai, I am trying to select features with the Boruta package but am getting errors. My data has 75 observations and 32 variables. The first two columns are character type, with column 1 having cow IDs and column 2 having days in milk (DIM, i.e. d-7, d+9, and d+21). When I tried running Boruta, I got the error below: "1. run of importance source... Error in ranger::ranger(data = x, dependent.variable.name = "shadow.Boruta.decision", : Error: Unsupported type of dependent variable." What could be the problem?
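That ranger error usually means the response (or a character column) is not a factor, since ranger, which Boruta uses internally, needs a factor or numeric dependent variable. A minimal sketch of the likely fix, with a hypothetical file and column names:

```r
# Sketch of the likely fix (file name and column names are hypothetical):
# drop the ID column and convert the character response to a factor
# before calling Boruta, since ranger cannot handle character responses.
library(Boruta)

df <- read.csv("milk_data.csv", stringsAsFactors = FALSE)
df$CowID <- NULL               # an ID column is not a predictor
df$DIM   <- factor(df$DIM)     # character -> factor (d-7, d+9, d+21)

set.seed(123)
b <- Boruta(DIM ~ ., data = df, doTrace = 2)
print(b)
```

Any remaining character predictors should be converted to factors the same way.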
@aks1008 · 6 years ago
Sir, are there different feature selection functions for different algorithms in R, or will Boruta work for all algorithms, like linear regression, logistic regression, support vector machines, cluster analysis, etc.?
@bkrai · 6 years ago
For feature selection you can use a common method, and then develop your model with various methods using the important features.
@aks1008 · 6 years ago
@@bkrai Sir, and is there a boruta() in Python too?
@dr.naeemhaider4747 · 6 years ago
Great video; I haven't seen any other channel with such a detailed explanation with code. Please keep making videos like these; you are such a good teacher. Please make videos about time series forecasting and feature selection with the genetic algorithm.
@bkrai · 6 years ago
Thanks for the comments and suggestion; I've added them to my list.
@db2885 · 4 years ago
I am building a model for two parameters, A and B. Do I need to select the explanatory variables that are common to both parameters A and B? Thank you in advance.
@bkrai · 4 years ago
Yes, A and B can be used in one column as the independent variable. And then you will need data on the explanatory variables for each row.
@shivamnigam9628 · 3 years ago
Sir, I have a task to predict energy rating transmission. The problem is that the dataset contains more than 70 variables, and around 60 of them are categorical variables, like energy transmission, battery condition, etc. Would it be wise to use the Boruta algorithm first, given that several columns contain null and blank values? And how do I convert 6 or more features into one feature?
@bkrai · 3 years ago
Yes, Boruta will help with feature selection. For reducing the number of features, you can use PCA. I'm also including a link for missing data. PCA: kzbin.info/www/bejne/haDaeH6EnMmiraM Missing data: kzbin.info/www/bejne/d5-an4OCf5WZqck
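The two-step idea (Boruta for selection, then PCA for compression) can be sketched as follows; `df` and response `y` are placeholders for your own data frame and response column:

```r
# Sketch: Boruta first, then PCA on the confirmed features only.
library(Boruta)

set.seed(111)
b <- Boruta(y ~ ., data = df, doTrace = 2, maxRuns = 500)
b <- TentativeRoughFix(b)            # resolve attributes left tentative
keep <- getSelectedAttributes(b)     # names of confirmed features

# Optional compression of the confirmed features with PCA (base R):
pc <- prcomp(df[, keep], center = TRUE, scale. = TRUE)
summary(pc)                          # choose components by explained variance
```

`TentativeRoughFix()` and `getSelectedAttributes()` are part of the Boruta package; `prcomp()` is base R.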
@yashodharpathak189 · 4 months ago
Thanks for the video. Can you please advise how to carry out feature selection when there is a mixture of continuous and categorical independent variables?
@delt19 · 6 years ago
Congratulations on reaching the 10k mark. Looking forward to many more of your videos.
@bkrai · 6 years ago
Thanks!
@ashok6644 · 6 years ago
I've watched the video on "Feature Selection", which is commendable. Thank you for sharing your knowledge.
@bkrai · 6 years ago
Thanks for comments and feedback!
@tamtzeheuey · 3 years ago
Thank you, Dr. Bharatendra Rai, for providing a very clear explanation of feature selection (FS) in R. How can I modify a feature selection algorithm in order to enhance its performance?
@bkrai · 3 years ago
As you saw in this video, it uses random forest and shadow variables. You can try any variation on your own.
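The shadow-variable idea can be illustrated directly: add a permuted copy of each predictor, fit a random forest, and keep only the real features whose importance beats every shadow. This is a conceptual sketch of the mechanism, not Boruta's exact iterative algorithm:

```r
# Conceptual demo of shadow features using randomForest and iris (not Boruta itself).
library(randomForest)
data(iris)

set.seed(42)
shadows <- as.data.frame(lapply(iris[1:4], sample))   # permuted copies
names(shadows) <- paste0("shadow_", names(shadows))
d <- cbind(iris, shadows)

rf  <- randomForest(Species ~ ., data = d, importance = TRUE)
imp <- importance(rf, type = 1)                       # mean decrease in accuracy
best_shadow <- max(imp[grep("^shadow_", rownames(imp)), ])
imp[imp[, 1] > best_shadow, , drop = FALSE]           # real features beating all shadows
```

Boruta repeats this comparison over many forests and uses a statistical test rather than a single cutoff.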
@lindanidube5714 · 3 years ago
Sorry... mine keeps giving me Error: Object "SelectedIndices" not found upon generating ROC curves. How do I deal with this issue, please?
@bkrai · 3 years ago
Which line in the video is causing this?
@lindanidube5714 · 3 years ago
@@bkrai Thank you so much for your response. Using your code, may you please also include ROC curves for the machine learning models? I was following your codes and they work perfectly fine. Also, please include code to generate this on a single graph. Please 🙏🏽 Machine learning algorithms: neural networks, decision tree, kNN, SVM, random forest, logistic regression, naive Bayes, and eXtreme gradient boosting. Your assistance will be greatly appreciated 😊
@osowatejiri5930 · a year ago
Thank you so much for this video. Please, what packages can I install in R to add these libraries?
@bkrai · a year ago
Use these packages:
library(Boruta)
library(mlbench)
library(caret)
library(randomForest)
@singhvaibhav033 · 6 years ago
Sir, amazing video. One question though: can we use Boruta even when we have categorical variables in our data? If not, then please do a feature selection video for when there is a mix of numeric and categorical variables in the dataset.
@bkrai · 6 years ago
This will work fine with categorical independent variables.
@singhvaibhav033 · 6 years ago
Thank you, sir. I watch not only your videos but all the comment replies as well (clears half my doubts from there). This is truly great of you; please continue replying to important questions in the comments. It helps all your students!
@debasmitadey231 · 5 years ago
If some variables in my data are non-linearly correlated, will this package work? Or will it reject all those variables because their linear correlation is not significant?
@bkrai · 5 years ago
It takes care of non-linearity.
@ramp2011 · 6 years ago
Great video, thank you. I have been using randomForest's varImp to do this. I am curious whether Boruta is a better package for identifying variable importance? Thank you.
@bkrai · 6 years ago
It has several features that are very useful for handling situations with lots of variables. For example, instead of typing the names of 33 variables, you can get the formula easily and just copy/paste.
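The formula convenience mentioned here is provided by Boruta's helper functions; a short sketch, where `b` is a finished Boruta run and `df` is your data frame:

```r
# Build the model formula from Boruta's results instead of typing it by hand.
library(Boruta)
library(randomForest)

f <- getConfirmedFormula(b)      # formula over confirmed attributes only
print(f)                         # e.g. Class ~ V1 + V11 + V12 + ...

set.seed(222)
rf <- randomForest(f, data = df)
```

`getNonRejectedFormula(b)` does the same but also keeps tentative attributes.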
@rameshsahni · 4 years ago
Sir, I have data on 140 wavelength bands with the reflectance of a particular object, and I want to select which wavelength band is best suited to do the job. Will a random forest model using Boruta work for this?
@bkrai · 4 years ago
I didn't fully understand your data, so it's difficult to suggest a method.
@Gius3pp3K · 5 months ago
Love the videos. I have learnt so much from your YouTube channel. Thanks. Is there a way to use feature selection for a neural network, for a large dataset, using Keras in RStudio, please? I have tried to use LIME, with no joy.
@bkrai · 5 months ago
Thanks for comments! I'll look into the Keras question.
@Gius3pp3K · 5 months ago
@@bkrai That would be greatly appreciated. Thanks
@mustafacakir__ · 2 years ago
Hi Mr. Rai. Firstly, I would like to say thanks for your clear explanation of the topic. I tried the Boruta function a few times with different seed numbers and got many different results for the important attributes. Is this function very sensitive to random seed numbers? Thanks.
@bkrai · 2 years ago
Yes, it's because of the randomness.
@abdulwaheedshaikh3745 · 6 years ago
Sir, you are an excellent mentor for R. I love you for the sake of R. I am a Ph.D. scholar from Chennai. The institute name is: Crescent Institute of Sci & Tech, Chennai.
@bkrai · 6 years ago
Thanks for your comments!
@tsumigoonetilleke4628 · 6 years ago
Hi Bharatendra, thank you for the link. I watched the video, and it doesn't seem to have a wrapper for the classification. Do you have any other videos with a wrapper? Thank you for your help. Cheers
@MHRAJAI · a year ago
Thank you very much for your nice explanation. Can you please explain what significance level is used to decide whether a variable is important or unimportant? If the Z score is significantly higher than the maximum Z score of the shadow features (MZSA), it is assigned as an important feature. Can you please explain what alpha is here? Thank you 🙏
@dhanashreedeshpande7100 · 6 years ago
Wonderful video! Where do you find such different algorithms (such as this feature selection)? You can increase accuracy here further by using the parameter tuning function in random forest. If possible, please add a video on web server log preprocessing too.
@bkrai · 6 years ago
Thanks for the feedback and suggestion. I've added your suggestion to my list.
@zaafirc369 · 6 years ago
I love your videos. Thanks for the amazing work that you do. Do you offer online R courses? I really want to master machine learning using R programming.
@bkrai · 6 years ago
Thanks for your feedback! My online courses are limited to UMass-Dartmouth at this time. But you can learn many machine learning methods in R from this link: kzbin.info/aero/PL34t5iLfZddu8M0jd7pjSVUjvjBOBdYZ1
@hanivlog774 · 3 years ago
Great video and detailed information. Thumbs up for your hard work! Can we use the Boruta algorithm with other classification models such as SVM, NB, or MLP, etc.?
@ChungChingZhou · 6 years ago
Very clear articulation! Thank you!
@bkrai · 6 years ago
Thanks for comments!
@deprofundis3293 · 4 years ago
Thank you so much for this video; I just subscribed! But I have a question; you'd recommended this method for me on one of the multinomial videos (and I *cannot* thank you enough for the personalized feedback... you wouldn't believe how long I've been slogging through this on my own, desperate for more personalized guidance...). I was really impressed with Boruta's performance; its tendency to keep ALL important variables actually makes the most sense for my data (but I fear that might not jibe with regression). But the cross-validation is where things fall apart. I'd read that I shouldn't use random forest on too small a dataset because there isn't enough data to partition, and my dataset is admittedly small. And when running this today, my accuracy during the final steps on the test dataset was awful.*** I was wondering: (1) Would you recommend an alternative that doesn't require partitioning of data, e.g., a LOOCV (leave-one-out cross-validation) regularization method, like glmnet? (2) Would it be valid for me to use the results of Boruta to help inform model-building with AICc? Honestly, much of this machine learning is quite new to me. Small sample size issues are common in my field, but it seems like most people just use AICc to build their models, without external validation. I want to be rigorous enough to avoid Type I error, but I also need something sensitive enough to avoid Type II error on such a small dataset. ***It probably doesn't help that I have very low membership in a couple of the groups for my categorical response variable (I also plan to run it without them, but then my sample size is even smaller). I'm going to take a look at your video on class imbalance, but I'd be so grateful for more guidance on resolving my particular situation. I've been trying to find an appropriate approach for over a year now...
@yousif_alyousifi · 10 months ago
Can Boruta deal with missingness? And which is better to use, all the data or the train data?
@bkrai · 10 months ago
Yes, it should work fine. Also, I used all the data, as the response was available for all rows. If the test data doesn't have the response column, then only the train data can be used.
@sebismo · 6 years ago
Thanks for the video! Very useful! Does it work just for binary classification problems? What about regression problems? Are categorical features also supported by this method?
@bkrai · 6 years ago
It would work fine with categorical independent variables. Classification and regression would both work fine. For regression, when using test data, make sure you use root mean square error, as a confusion matrix is not valid there.
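For a regression response, the test-data check reduces to computing RMSE between actual and predicted values; a minimal base-R sketch with made-up numbers:

```r
# RMSE replaces the confusion matrix when the response is continuous.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

actual    <- c(3.0, 5.0, 2.5, 7.0)   # illustrative values only
predicted <- c(2.5, 5.0, 3.0, 8.0)
rmse(actual, predicted)              # 0.6123724
```

In practice `predicted` would come from `predict(model, test)` on held-out data.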
@sudiptapaul2919 · 4 years ago
Useful indeed! Lots of struggle ends here. Thank you, Professor.
@bkrai · 4 years ago
You're very welcome!
@ousmanelom6274 · 3 months ago
Thanks. Can we use this for categorical variables? What test does Boruta use?
@bkrai · a month ago
You may refer to the following for more details: cran.r-project.org/web/packages/Boruta/Boruta.pdf
@ravindarmadishetty736 · 6 years ago
Wonderful video, sir. Please clarify my doubt: if we are applying logistic regression, can we avoid calculating WOE and IV to identify the strength of the attributes before considering them in the model? Also, can we avoid data reduction techniques when there are a huge number of attributes?
@bkrai · 6 years ago
Thanks for comments! What are the WOE and IV that you referred to? If you would not like to remove any attributes when there are a huge number of attributes, you can do PCA.
@ravindarmadishetty736 · 6 years ago
Thank you for replying, sir. WOE stands for weight of evidence, and IV stands for information value (like a p-value criterion). We use them for continuous variables, splitting them into bins. These are the keywords we generally use in credit scoring when applying logistic regression; they are useful for knowing the strength of, and identifying, important variables in a model. After watching this video, I wondered whether we can directly use the algorithm proposed here instead of calculating WOE and IV for important variables. WOE and IV are only considered for logistic regression and not for other classification models.
@bkrai · 6 years ago
Note that logistic regression is used only when your response variable is categorical. Usually we try more methods and look at the confusion matrix and accuracy. Apart from logistic regression, you can also try random forest and see which method gives a better result.
@ravindarmadishetty736 · 6 years ago
Sir, please give me your suggestion: I have 578 attributes and I have applied Boruta for feature selection. But it is taking 15 minutes for every importance-source run. How do I overcome this issue? I guess factor analysis or PCA can be applied for data reduction. Please advise.
@sumeet1509 · 4 years ago
Thanks so much for this awesome tutorial. I am running random forests (RF) for academic research purposes. We really are not concerned, in the first instance, if we have a large number of features. We are more concerned about what to do with collinearity between some of the features (absolute r = 0.5 to 0.9). Some literature suggests that we can include correlated features in RF. What would you recommend? Can Boruta help with this issue? I notice that there is much collinearity in the Sonar dataset, especially amongst adjacent variables.
@bkrai · 4 years ago
With machine learning models such as RF, collinearity is not an issue.
@vijaysrirambhatla3874 · 4 years ago
Thanks for the explanation of Boruta. Can you please provide a reference to get more information about the method?
@bkrai · 4 years ago
Here is the link: cran.r-project.org/web/packages/Boruta/Boruta.pdf
@joujoumilor2898 · 5 years ago
Thanks, sir, for sharing this amazing video with us. I have a question about the data: should it be normalized or not?
@bkrai · 5 years ago
Normalizing has no negative effects. So I would say whenever there is a doubt, then definitely normalize.
@joujoumilor2898 · 5 years ago
Thank you so much!
@bemuzeeqtv · 5 years ago
Great video. I am following the steps, but I would like to know how I can create the class for my dataset (if possible in R), because it seems the class was already defined in the dataset you used (M, R). I would appreciate any help. Thank you.
@bkrai · 5 years ago
You can create a new column for that.
@bemuzeeqtv · 5 years ago
@@bkrai Thank you. I did that.
@surbhiagrawal3951 · 4 years ago
Sir, will Boruta work with any model where feature selection is required, or is it mainly for random forest? I suppose we generally use it with classification problems?
@bkrai · 4 years ago
Since random forest is used along with the idea of shadow attributes, it should work well in many situations. Also, since random forest is used, it should work both for classification and regression.
@surbhiagrawal3951 · 4 years ago
Also one more question: does the response variable need to be categorical? If it is a continuous response variable, will it still work?
@bkrai · 4 years ago
Yes
@deepakbalajiselvam8067 · 5 years ago
Great video with crystal clear explanation, many thanks
@bkrai · 5 years ago
Thanks for comments!
@kolozsie · 5 years ago
Thanks for the explanation. I tried to run the algorithm on radiomic features, but it didn't find any important attributes ('No attributes deemed important'). How is that possible?
@bkrai · 5 years ago
I used a maxRuns of 500. You can increase it to a higher value if your data needs more runs.
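Raising `maxRuns` and settling leftover tentative attributes looks like this in practice; `df` and response `y` are placeholders for your own data:

```r
# Give Boruta more iterations when attributes stay tentative.
library(Boruta)

set.seed(333)
b <- Boruta(y ~ ., data = df, doTrace = 2, maxRuns = 500)  # default maxRuns is 100
b <- TentativeRoughFix(b)   # force a decision on any that remain tentative
attStats(b)                 # per-attribute importance stats and decisions
```

If everything is still rejected after many runs, the features may genuinely carry little signal for that response.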
@melodicguitarist · 6 years ago
Very well explained, sir. Can you please let me know what the set.seed() function does? And after watching your videos, can I practice by downloading datasets from Kaggle? Please let me know.
@bkrai · 6 years ago
set.seed() helps to obtain repeatable results. Without it, two people may run the same code but get different results. I've provided a link to the data in the description area of the video; you can easily get the file there. Practicing with a Kaggle dataset is also a good idea.
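A two-line base-R demonstration of what set.seed() buys you:

```r
# Same seed, same "random" draws - that is all set.seed() does.
set.seed(99); a <- sample(1:100, 5)
set.seed(99); b <- sample(1:100, 5)
identical(a, b)   # TRUE
```

Without the second set.seed(99), `b` would almost certainly differ from `a`.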
@anjaliacharya9506 · 6 years ago
I installed all the packages mentioned in the video. I get a result from print(boruta), but for plot(boruta) I get an error message: "Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but does not have components 'x' and 'y'". What can I do? Do I need some other packages?
@bkrai · 6 years ago
Check the earlier steps and also look at the structure of your data.
@aks1008 · 6 years ago
Dear sir, can we apply the boruta() function with other algorithms, e.g. linear regression, logistic regression, support vector machines, etc., along with random forest? Thanks
@bkrai · 6 years ago
Once you select the important features, the final model can be developed using other algorithms.
@mondalsandip · 4 years ago
Hello sir, I have two datasets. One is the predictor matrix (species abundance data) and the other is the response variable (soil data) matrix. Can we use Boruta in this type of situation? I want to see which soil factors influence my species community. I ran this in R, but it shows a problem like this: "Error in ranger::ranger(data = x, dependent.variable.name = "shadow.Boruta.decision", : Error: Competing risks not supported yet. Use status=1 for events and status=0 for censoring." Thanks in advance
@bkrai · 4 years ago
What was the code that you used?
@mondalsandip · 4 years ago
@@bkrai Boruta(x, y, doTrace = 2, maxRuns = 100), where x is the species abundance matrix and y is the soil data matrix.
@bharathjc4700 · 6 years ago
Great presentation, sir. How is it different from RFE (recursive feature elimination)?
@bharathjc4700 · 6 years ago
Good to see you, sir.
@bkrai · 6 years ago
Thanks!
@bkrai · 6 years ago
In Boruta, importance significantly larger than that of the shadow variables is used. In RFE, the random forest with the smallest error, based on iterative removal of the least important variables, is used. Both methods are effective for feature selection.
@bharathjc4700 · 6 years ago
Thanks a ton for your valuable inputs, sir.
@abiani007 · 4 years ago
How can I use this feature selection for regression? Shall I use the same technique as you have shown for regression purposes? Please confirm.
@bkrai · 4 years ago
With regression you can use statistical significance for deciding which independent variables to keep in the model.
@abiani007 · 4 years ago
Dr. Bharatendra Rai, sir, please also share a video on FastICA.
@kellyng5474 · 4 years ago
This video has been very helpful; however, when running the confusion matrix I got an error as follows: > p = predict(rf70, test) > p > confusionMatrix(p, test$yVar) Error: `data` and `reference` should be factors with the same levels. All of my variables (x and y) are continuous. Any idea how to solve this issue? Thanks!
@bkrai · 4 years ago
For a continuous response variable, you don't need a confusion matrix. Probably you can try this: kzbin.info/www/bejne/lWTbfoaYfsmYaKs
@kellyng5474 · 4 years ago
@@bkrai Thank you very much! I will check out the video :)
@bkrai · 4 years ago
You are welcome!
@marcelofalchetti · 6 years ago
Your videos are really useful and well explained; thanks for your work!
@bkrai · 6 years ago
Thanks for the feedback!
@kushxmen · 5 years ago
This video is great. Thank you so much. I have a question about Featuretools (Python). Is there an equivalent package in R for doing the same? I want to create new features from the predictor variables I have before I perform feature selection.
@bkrai · 5 years ago
I'm not aware of any equivalent package in R at this time.
@kushxmen · 5 years ago
@@bkrai Thanks for the reply. I think we in the R community may need to develop one in future. I will embark on further research. Thanks again for the reply.
@outinthebeach · 6 years ago
Thank you so much for your videos. You explain and articulate the steps so well by keeping them simple. It really helps me understand these models easily. Could you please help with, or put up, a complete lifecycle (steps right up to AUC) for some common models like random forest, SVM, logistic regression, etc., that can be used as a template for model performance and model improvement, if possible? Thank you so much again.
@bkrai · 6 years ago
Thanks for your feedback! You can find many of these methods at this link: kzbin.info/aero/PL34t5iLfZddu8M0jd7pjSVUjvjBOBdYZ1 I'll continue to add more.
@jean-lucfanny4210 · 3 years ago
Could you use the same Boruta algorithm with a numeric response instead of the class (binary) response, since RF can be run with either a class response or a numeric response? Please guide me.
@bkrai · 3 years ago
Thanks for the suggestion, I've added it to my list.
@jean-lucfanny4210 · 3 years ago
You could use numeric as well. Your videos are life-changing. I really appreciate them.
@bkrai · 3 years ago
Thanks!
@faridehalasti3752 · 3 years ago
Many thanks. Can Boruta handle Surv (survival) responses? How does that work? I need an example.
@playerjc9969 · 2 years ago
Thanks very much for the awesome tutorial! It really helped me understand a bit of Boruta. I have a question: I am currently running Boruta, and before doing the tentative fix I found three attributes that were tentative. When I did the tentative fix, two were confirmed important and the other one unimportant. The thing is, the unimportant attribute had a higher normHits percentage than one of the confirmed attributes: the rejected attribute had 0.5306122 and the confirmed attribute had 0.4693878. Is this okay?
@bkrai · 2 years ago
I would suggest running the classification model with and without those variables and seeing whether it helps to improve model performance. If performance improves, keep them; otherwise exclude them.
@RahilKhowaja · a year ago
Great. You made it very simple for us.
@bkrai · a year ago
Thanks for comments!
@sumeet1509 · 4 years ago
Thanks for the well-explained tutorial. I used the algorithm on the same sample dataset, Sonar. The results indicate v43, v44, v45, v46, v47, v48, and v49 as one set of confirmed features among many others. However, I also noticed that amongst these seven features there is much collinearity, with the correlation coefficient r ranging from 0.5 to 0.87. Is there any specification in the code that can be used to filter out highly correlated features?
@bkrai · 4 years ago
Note that random forest is not impacted by collinearity. However, if you were using regression, then it is definitely a problem and needs to be addressed.
@devawratvidhate9093 · 6 years ago
Great video, sir, thanks. Could you suggest any web scraping material in R, from beginner to pro level?
@bkrai · 6 years ago
Thanks for the suggestion, I've added it to my list.
@DnyaneshwarPanchaldsp · 3 years ago
Can we extract features from text, such as nouns, verbs, pronouns, etc., for selecting features for aspect sentiment analysis?
@AliHoolash · 6 years ago
Thank you for this nice tutorial. In your example dataset, all the variables (except the target variable) are numeric. Will Boruta also work on a mix of categorical and numeric variables?
@bkrai · 6 years ago
Yes, it will work with both types of variables.
@AliHoolash · 6 years ago
Thanks for your prompt reply.
@bkrai · 6 years ago
Welcome!
@raghavendras5331 · 6 years ago
No words to explain how informative and specific your videos are; great thanks for that. To add to Ali's question: should we do one-hot encoding before running Boruta on categorical variables?
@bkrai · 6 years ago
Thanks for comments!
@thourayaaouledmessaoud9223 · 6 years ago
Thanks for this well-explained video. I just have one question: can we use Boruta with clustering algorithms such as k-means, DBSCAN, etc.? Does it work?
@bkrai · 6 years ago
Clustering is an unsupervised learning method; there is no response variable.
@kapilgupta8722 · 3 years ago
Thank you, Professor, for such a nice explanation. I have a query: for data where we have a multicollinearity issue, lasso removes the correlated variables, but Boruta doesn't remove them and shows all the correlated variables as important. If this is the case, then which one is preferable? Can you please comment on such a situation?
@bkrai · 3 years ago
You can think of this as non-parametric, as there are no statistical assumptions for this machine-learning-based method.
@kapilgupta8722 · 3 years ago
@@bkrai Thanks, Professor, for the comment. If we want to show a comparison of logistic regression and random forest, then how do we comment about the variables and their significance? Or can you please suggest how to show the comparison between these two methods?
@bkrai · 3 years ago
For comparison you can use the confusion matrix, accuracy, etc. I would suggest you review this playlist for more: kzbin.info/www/bejne/oqakeneepMmEmrM
@kapilgupta8722 · 3 years ago
@@bkrai Thank you, Professor :). Will go through this lecture.
@SaranathenArun11E214 · 6 years ago
Sir, thanks. If we have 20 discrete variables, how do we find the variable importance?
@bkrai · 6 years ago
This method will work fine with discrete variables.
@theahmads7590 · a year ago
Hello sir, huge fan, thanks for your effort. I tried installing Boruta, but R says there is no such package. Can you help me?
@bkrai · a year ago
Run these lines:
install.packages('Boruta')
library(Boruta)
@abiani007 · 4 years ago
Can you send some links on ensembling model outputs in R for regression purposes? Thanks in advance.
@bkrai · 3 years ago
Here is the link: kzbin.info/www/bejne/nnSvfICfj6eHqLc
@tsumigoonetilleke4628 · 6 years ago
Hi, thank you for your video. It's great. Do you have a video or any help document or training for implementing a wrapper naive Bayes classifier? Can you help?
@bkrai · 6 years ago
Try this: kzbin.info/www/bejne/iH3NhISamMxrd68
@tsumigoonetilleke4628 · 6 years ago
Thank you, Bharatendra. I'll try that. Thanks again.
@poojamahesh8594 · 3 years ago
Sir, I need to find the accuracy of the model after finding the variable importance. Please tell me how to do it.
@bkrai · 3 years ago
It's covered from line 34 onward.
@asifhayat4163 · 3 years ago
Sir, kindly make a video on Boruta applied to raster data having a DEM and satellite imagery at the same time. I need your help and am waiting for your code.
@bkrai · 3 years ago
Thanks for the suggestion!
@21bagong · 4 years ago
Dear Prof. Rai, would you mind explaining how the Boruta algorithm and function work?
@bkrai · 4 years ago
It creates a shadow variable for each independent variable and randomizes its values. The algorithm then checks whether or not the real variables exhibit better performance than their corresponding shadow variables.
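The whole loop on the video's dataset (Sonar from the mlbench package) condenses to a few lines; a sketch, assuming Boruta and mlbench are installed:

```r
# End-to-end Boruta run on the Sonar data used in the video.
library(Boruta)
library(mlbench)

data(Sonar)                       # 208 rows, 60 numeric predictors, Class = M/R
set.seed(111)
b <- Boruta(Class ~ ., data = Sonar, doTrace = 2, maxRuns = 100)
print(b)                          # counts of confirmed / tentative / rejected
plot(b, las = 2, cex.axis = 0.6)  # green = confirmed, red = rejected, blue = shadows
```

Exact confirmed/rejected counts vary slightly with the seed because the underlying forests are random.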
@21bagong · 4 years ago
@@bkrai Thank you, Dr. Rai.
@akshayagrawal7848 · 6 years ago
I love your videos! Very helpful; they got me through my deep learning grad class.
@bkrai · 6 years ago
Thanks for comments and feedback!
@SaranathenArun11E214 · 6 years ago
Happy to see you, sir. Thanks for all the videos; I am a great fan of yours.
@bkrai · 6 years ago
I appreciate your feedback and support!
@vijaysrirambhatla3874 · 4 years ago
How can I see the complete list of important variables? Thanks
@flamboyantperson5936 · 6 years ago
Great, sir. This is the first time I am ever seeing you on video, and I am glad to see you, sir. This is the video I have been waiting a long time for. Thank you so much, sir.
@bkrai · 6 years ago
Thanks for your comments!
@Sandeep-sl7lp · 3 years ago
Sir, do we need to standardise the data, or can we give non-standardised data to Boruta?
@bkrai · 3 years ago
It's not needed here.
@Sandeep-sl7lp · 3 years ago
@@bkrai Thanks a lot
@bkrai · 3 years ago
You are welcome!
@choubeyrajj · 4 years ago
After searching for a perfect example, my search ends here. Thanks for sharing this. While working on 1 lakh (100,000) records, the process becomes really time-consuming. Is there any way to process it faster, or any other method to apply this to large datasets?
@bkrai · 4 years ago
Probably you can take a 30-40% sample and then apply feature selection to save time.
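Taking that 30% sample is a one-liner in base R; a sketch with a made-up data frame:

```r
# Sample 30% of the rows before running a slow feature-selection step.
n  <- 100000                             # pretend-large dataset
df <- data.frame(id = 1:n, x = rnorm(n))

set.seed(123)
idx <- sample(n, size = round(0.3 * n))  # 30% of row indices
sub <- df[idx, ]
nrow(sub)                                # 30000
```

Boruta (or any selection method) would then be run on `sub` instead of `df`.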
@DeevyankarAgarwal · 4 years ago
It really was a nice video, thank you, sir. It is taking too much time, as my dataset contains 50,000 rows and 57 columns. Can you suggest an algorithm for the same purpose that will take less time?
@bkrai · 4 years ago
You can refer to this link; among the 10 algorithms, xgboost is very quick: kzbin.info/aero/PL34t5iLfZddsQ0NzMFszGduj3jE8UFm4O
@kamilchosta5526 · 2 years ago
Amazing stuff. Thank you!
@bkrai · 2 years ago
You are welcome!
@sharjeelarain6897 · 3 years ago
Sir, your video is awesome. Please make videos on Facebook and YouTube analytics.
@bkrai · 3 years ago
Thanks for the suggestion! For YouTube and Twitter, refer to: kzbin.info/www/bejne/ZqnWfmODl7eDfac
@harunbakirci1781 · 4 years ago
Hi teacher, first of all thank you very much for your tutorials. I have a little question: variable importances differ between algorithms. For example, TV is better than radio in the random forest algorithm, but in the xgboost algorithm radio is better than TV. Which one is correct? Does it depend on the algorithm I use?
@harunbakirci1781 · 4 years ago
Firstly, I used Boruta. It gave the same result as random forest. But then I used the xgboost algorithm, and it gave a different result from random forest and neuralnet.
@bkrai · 4 years ago
Yes, it can depend on the algorithm used, and their ranks on the list can change a little bit.
@harunbakirci1781 · 4 years ago
@@bkrai Okay, my teacher, thank you very much 👍👍👍👍
@harunbakirci1781 · 4 years ago
Should we use the Boruta package, or varImp, or importance() on a randomForest model, or the garson function for neuralnet? Which one should we use?
@bkrai · 4 years ago
Yes, that's OK, as they are different methods.
@deepakpanigrahi96014 жыл бұрын
Can we also have some video (great learning tools) on Model deployment please
@bkrai4 жыл бұрын
Thanks for the suggestion, I've added it to my list.
@rajlaxmikati11755 жыл бұрын
Actually my dataset has so many NAs when I am using boruta I am getting can't process NAs in input error how to handle that?
@bkrai5 жыл бұрын
You can use this link for missing values: kzbin.info/www/bejne/d5-an4OCf5WZqck
@rajlaxmikati11755 жыл бұрын
@@bkrai thank you so much
@bkrai5 жыл бұрын
welcome!
@rajlaxmikati11755 жыл бұрын
@@bkrai why working with very huge dataset soo I am getting error saying it can't process that much mb of data .. what should I do please suggest me some idea
@bkrai5 жыл бұрын
In the 3rd RStudio window if you have too many big dataset, you can remove them by clicking on broom symbol. That will free up space.
@syuhadaazamil41115 жыл бұрын
i want to ask, if my datasets is to large, why when i run it takes longer times? hope you can reply as soon as possible to help me
@bkrai5 жыл бұрын
That's natural. Larger datasets consume more computing resources; that's why they take more time.
@syuhadaazamil41115 жыл бұрын
@@bkrai Thank you, Dr. Do you think I should reduce the data in the dataset? I want to make it faster.
@syuhadaazamil41115 жыл бұрын
@@bkrai Actually, I don't understand how to make a graph using Boruta in R, because my dataset is for linear regression.
@bkrai5 жыл бұрын
Sometimes when data size is too big, samples of sufficient size can be taken to reduce processing time.
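A sketch of that sampling idea (the data here is simulated as a stand-in for a large dataset):

```r
# Run Boruta on a random sample of rows to cut processing time
library(Boruta)
set.seed(42)

# Simulated stand-in for a big dataset: 10,000 rows, 10 predictors
big <- data.frame(matrix(rnorm(1e4 * 10), ncol = 10))
big$y <- factor(big$X1 + big$X2 + rnorm(1e4) > 0)

# Draw a manageable subset and run Boruta on it
sub <- big[sample(nrow(big), 2000), ]
bor <- Boruta(y ~ ., data = sub, doTrace = 0)
print(bor)
```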
@bkrai5 жыл бұрын
It should still work, as the random forest method works for both categorical and numeric variables.
@sm.melbaraj16823 жыл бұрын
Can we use Boruta for data with categorical variables?
@bkrai3 жыл бұрын
Yes
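A short sketch of how that looks in practice (not from the video; mtcars is used only for illustration) — the key point is to store categorical columns as factors before calling Boruta:

```r
# Boruta's random forest backend handles factor predictors directly
library(Boruta)
df <- mtcars
df$vs   <- factor(df$vs)    # response, treated as categorical
df$cyl  <- factor(df$cyl)   # categorical predictors
df$gear <- factor(df$gear)
df$am   <- factor(df$am)

set.seed(5)
bor <- Boruta(vs ~ ., data = df, doTrace = 0)
print(getSelectedAttributes(bor, withTentative = TRUE))
```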
@sm.melbaraj16823 жыл бұрын
@@bkrai Okay, thank you! Your lectures are very easy to understand since you explain things very simply.
@bkrai3 жыл бұрын
Thanks for the comments! Today's live lecture will be at 7:30pm India time.
@hans42235 жыл бұрын
Simply awesome and excellent
@bkrai5 жыл бұрын
Thanks for feedback and comments!
@dennismontoro73126 жыл бұрын
Was your data already scaled or normalized before you ran Boruta?
@bkrai6 жыл бұрын
No, it was not scaled or normalized.
@liangqueenie1516 жыл бұрын
Great video! Thank you Sir
@bkrai6 жыл бұрын
Thanks for the feedback!
@muralidhara20636 жыл бұрын
Thank you for sharing your thoughts, and congratulations.
@bkrai6 жыл бұрын
Thanks for your comments!
@Asdfasdffff6 жыл бұрын
Really like your videos 👍🏼. Please, could you make a video about TensorFlow in R? Something like image recognition, or working on a GPU.
@bkrai6 жыл бұрын
Thanks for your feedback! You can find TensorFlow in some of the videos here: kzbin.info/aero/PL34t5iLfZddtC6LqEfalIBhQGSZX77bOn
@subaganesh5525 жыл бұрын
Sir, can you explain the stepwise feature selection algorithm in R?
@alipaloda95714 жыл бұрын
Error: `data` and `reference` should be factors with the same levels. I got this error; can you please help? Do we have to convert every field to a factor?
@bkrai4 жыл бұрын
Which line in the video are you referring to?
@alipaloda95714 жыл бұрын
@@bkrai I was trying to create a confusion matrix, sir. At that point I got this error.
@bkrai4 жыл бұрын
I would suggest you check your data using str().
@Ketonen084 жыл бұрын
Thank you for making these videos. How would you set up the Boruta function for regression? For example, for a frequency model (Poisson) or a severity model (gamma). I added some code in case it is easier to answer with an example. The code below is what I tried, but I don't know if it is correct. # Example install.packages("insuranceData") library("insuranceData") data(dataOhlsson) ds_in
@bkrai4 жыл бұрын
Note that the algorithm automatically creates the shadow attributes; you do not have to create them yourself.
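For the regression question, a minimal sketch (not from the video; mtcars is used as a stand-in) — passing a numeric response is enough to put Boruta in regression mode:

```r
# Boruta with a numeric (regression) target; shadow attributes
# are generated internally by the algorithm
library(Boruta)
set.seed(1)
bor <- Boruta(mpg ~ ., data = mtcars, doTrace = 0)  # mpg is numeric
print(bor)
plot(bor, las = 2, cex.axis = 0.7)                  # importance boxplot
```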
@equbalmustafa6 жыл бұрын
Thank you, sir, for your videos, which are helping us a lot.
@bkrai6 жыл бұрын
Thanks for your feedback!
@alamgirsarder13315 жыл бұрын
Can anyone help with how to use glmnet for feature selection?
@bkrai5 жыл бұрын
Here is the link: kzbin.info/www/bejne/lWTbfoaYfsmYaKs
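In brief, a lasso fit with glmnet keeps the features whose coefficients stay nonzero. A minimal sketch (not from the video; mtcars is used for illustration):

```r
# Lasso (alpha = 1) feature selection with cross-validated glmnet
library(glmnet)
x <- as.matrix(mtcars[, -1])  # predictors
y <- mtcars$mpg               # response

set.seed(1)
cvfit <- cv.glmnet(x, y, alpha = 1)       # cross-validated lasso
coefs <- coef(cvfit, s = "lambda.min")    # coefficients at best lambda
selected <- rownames(coefs)[as.vector(coefs) != 0]
print(setdiff(selected, "(Intercept)"))   # features the lasso retained
```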
@SandeepKumar-me6qr6 жыл бұрын
One of the great videos, sir, which is really helpful.
@bkrai6 жыл бұрын
Thanks for feedback!
@Adityasharma-zb7no6 жыл бұрын
Sir, can you please teach us factor analysis as well?
@bkrai6 жыл бұрын
Thanks for the suggestion! I've added it to my list.
@redarabie70986 жыл бұрын
Thank you, sir, for this useful video.
@bkrai6 жыл бұрын
Thanks for the comments!
@thejll2 жыл бұрын
I do not understand how a variable can be less important than its shadow.
@bkrai2 жыл бұрын
It can happen with a variable that doesn't contribute much to the model.
@kevingeorgejohn90945 ай бұрын
Sir, I have a doubt: does the run of importance sources take hours or even days to complete with (..., doTrace = 2, maxRuns = 500)? It's taking a very large amount of time for each run of importance sources.
@bkrai5 ай бұрын
It depends on how big your data is in addition to computing power.
@kevingeorgejohn90945 ай бұрын
@@bkrai And if I give no maxRuns argument, will there be a default number of runs of importance sources?
@kevingeorgejohn90945 ай бұрын
@@bkrai After it finishes executing, why does it show "no features deemed important or unimportant; all are tentative"?
@bkrai5 ай бұрын
It needs more runs
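For context: in the Boruta package the default maxRuns is 100, and attributes left tentative can be forced to a decision with TentativeRoughFix. A small sketch (iris is used here only as a stand-in dataset):

```r
# Resolving tentative attributes: raise maxRuns (default 100),
# then apply TentativeRoughFix to decide any leftovers
library(Boruta)
set.seed(7)
bor <- Boruta(Species ~ ., data = iris, maxRuns = 200)
if (any(bor$finalDecision == "Tentative")) {
  bor <- TentativeRoughFix(bor)  # decides based on the runs already done
}
print(getSelectedAttributes(bor))
```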
@kevingeorgejohn90945 ай бұрын
@@bkrai How many would be needed at minimum? Would 20 be enough? Or 15?