I enjoyed this and found it instructive to follow along. I appreciate your quick-paced yet somehow unhurried teaching style.
@paulushimawan5196 3 years ago
Yes, that's the reason I like this video. Nice teaching style, although it's light on depth. Still, it's much better than those courses out there that just hand you the notebook and make you run it yourself, without explaining each step.
@onyedikachiadigwe8995 4 years ago
Can you show how we would predict which staff will leave, using the database?
@kazimrazatalpur7228 4 years ago
Amazing, very informative; can't wait to see your upcoming tutorials.
@shyamkishore6232 a year ago
How do you draw conclusions from the whole coding process for a presentation? For example, which of the columns/factors affect Attrition the most?
@QUIZ_WHIZ_SMART 4 years ago
This is a good, straightforward model-training walkthrough for beginners, but the model is weak. Especially for this problem: if you are modelling employee attrition, you want to know who will quit the job so you can perhaps contact them, not the other way around. It would probably be better to choose another metric, such as recall or the F1 score.
@ComputerSciencecompsci112358 4 years ago
You can never have enough metrics.
@cloudbaud7794 4 years ago
This has a recall of about 15%... how is that any good? And in this case, the cost of an employee leaving unpredicted can never be the same as falsely flagging someone who ends up staying. So does F1 really add that much value?
@SANJIVRAI6693 3 years ago
@@cloudbaud7794 The goal would be to reduce the false negatives as much as possible, so the higher the recall, the better.
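For anyone who wants to check those metrics themselves, here is a minimal sketch using scikit-learn's standard metrics; the names forest, X_test and Y_test are assumed to be the fitted model and test split from the video:

    from sklearn.metrics import classification_report, recall_score, f1_score

    # Predict on the held-out test split
    Y_pred = forest.predict(X_test)

    # Per-class precision, recall and F1 in one table
    print(classification_report(Y_test, Y_pred))

    # Recall and F1 for the positive ("Yes" = 1) class only
    print("recall:", recall_score(Y_test, Y_pred))
    print("f1:", f1_score(Y_test, Y_pred))

A high accuracy combined with a low recall on the Yes class is exactly the imbalance problem discussed in this thread.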
@2lauren54 a year ago
At df.corr(), why do I get this? (FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning. df.corr()) And then an error on every cell after that.
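In case it helps: newer pandas versions no longer silently drop the text columns in corr(), which is what that warning (and the later error) is about. A minimal sketch of two ways around it, assuming df is the attrition DataFrame from the video:

    # Option 1: tell corr() to use only the numeric columns
    corr = df.corr(numeric_only=True)

    # Option 2: keep only the numeric columns first, then correlate
    corr = df.select_dtypes(include="number").corr()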
@sukruthms7984 3 years ago
Thank you for the appropriate explanation
@ComputerSciencecompsci112358 3 years ago
Glad you enjoyed the video!
@SaiCharan-zi1zu 3 years ago
Hi, this video is at its best, but I need a conclusion: which factors does attrition depend on most, and how do we find the main factor affecting attrition?
@jimalyajenkins9133 3 years ago
Solid tutorial. How do I use this though?
@shilpashreshta a month ago
How did you decide to use RandomForestClassifier? Why not go for logistic regression after dropping the redundant features? I am new to this, hence the confusion. Please guide.
@rohittiwari1610 4 years ago
Simple and easy code. Nice explanation. Thank you so much
@shashankbafna2867 4 years ago
Fantastic approach. Can you also make a video explaining how we can use this model? That is, what comes after creating the prediction?
@erickwang5850 4 years ago
I also want to know: can we see the significance of each feature, and how do we do that?
@SANJIVRAI6693 3 years ago
@@erickwang5850 Yes, you can check the important features by their importance scores.
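For reference, a minimal sketch of pulling those scores out of a fitted random forest; forest and X are assumed to be the fitted classifier and the feature DataFrame from the video:

    import pandas as pd

    # feature_importances_ holds one score per column of X, in the same order
    importances = pd.Series(forest.feature_importances_, index=X.columns)

    # Sort so the most influential features come first
    print(importances.sort_values(ascending=False).head(10))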
@robiparvez 9 months ago
Where can I find the dataset?
@michaelmullings 2 years ago
Question: how do I predict whether a current employee will attrit? How do I test which employees are now on their way out the door, and what factors do I look for that show this?
@NitinBhavvsarPoems 4 years ago
You are a pro, boss!! Good to see your video. Query: how do you validate the prediction results? What methods are there to validate them? Your thoughts on classification reports for this?
@jananisridhar5175 3 months ago
Can you share the IBM dataset?
@SantoshMaurya-is4bp a year ago
It's a very nice video; it really helped me.
@ajayantony4144 4 years ago
Instead of dropping the Age column, can't we change the index to one for Attrition? Just asking, since I am new to data science and curious.
@SANJIVRAI6693 3 years ago
Yes, you can.
@manideep4486 4 years ago
With this model, how can I check which employee is more likely to attrite?
@idowukila5992 4 years ago
Great question. I was wondering, too. Have you by any chance gotten an answer to this?
@QUIZ_WHIZ_SMART 4 years ago
@@idowukila5992 Well, this should be what recall measures, but in this tutorial it was very weak.
@SANJIVRAI6693 3 years ago
Any employee the model predicts as Yes is most likely to leave. Since it's binary classification, you only get a Yes or No result.
@sonyishutin9949 a year ago
@@SANJIVRAI6693 How do I see which employees are predicted to leave? I'm still learning.
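Since this question comes up several times in the thread, here is a minimal sketch of how the per-employee predictions could be inspected; forest and X_test are assumptions carried over from the video:

    import pandas as pd

    # Predict attrition (0 = stay, 1 = leave) for the test rows
    results = pd.DataFrame(X_test).copy()
    results["predicted_attrition"] = forest.predict(X_test)

    # Keep only the employees the model flags as likely to leave
    leavers = results[results["predicted_attrition"] == 1]
    print(leavers)

If X_test keeps the original DataFrame index, the index of leavers points back to the employees concerned.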
@mehtabrosul6909 a year ago
In the last step, forest.fit(x_train, y_train) shows "could not convert string to float". Why is that?
@RahulRautela5797 2 years ago
Can we also find the specific reason for leaving, i.e. the variable with the highest value?
@abdulalimbaig3286 4 years ago
Where is the link to the dataset?
@nbddesigns7620 3 years ago
Getting an error at RandomForestClassifier using sklearn. How do I solve this?
@jeevarajahjeevaratnam6224 4 years ago
I can't run seaborn; I keep getting ModuleNotFoundError: No module named 'resource'. I'm using Windows 10.
@debarati27 3 years ago
How do we show the decision tree?
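One way to do it, sketched here: a random forest is just a collection of decision trees, so you can plot any single tree from the fitted model (forest is the classifier from the video; the feature and class names are assumptions):

    import matplotlib.pyplot as plt
    from sklearn import tree

    # Take the first tree out of the fitted forest
    one_tree = forest.estimators_[0]

    plt.figure(figsize=(20, 10))
    tree.plot_tree(one_tree, feature_names=list(X.columns),
                   class_names=["No", "Yes"], filled=True, max_depth=3)
    plt.show()

max_depth=3 only limits how much of the tree is drawn, which keeps the plot readable.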
@allammihay 4 years ago
Hello, I have followed your steps on Medium, but at the last step, when I want to show the feature importances, there is an error: "ValueError: arrays must all be same length". I don't understand this problem; could you help me?
@sherifelgazar4089 2 years ago
Friend, can you post the dataset, so we can apply this?
@rahulahuja1412 4 years ago
Informative, thanks. But it would've been better had you standardized the data and then given an analysis of it.
@ComputerSciencecompsci112358 4 years ago
Thanks for your opinion!
@cloudbaud7794 4 years ago
standardized in what way?
@SANJIVRAI6693 3 years ago
@@cloudbaud7794 Scaling, meaning the dataset is rescaled so all values fall within a certain range, mostly from -1 to 1, with the lowest values mapped toward -1 and the highest toward 1 (strictly, StandardScaler centres each column to zero mean and unit variance, while MinMaxScaler is the one that maps values to a fixed range).
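For completeness, a minimal sketch of standardizing before training; X_train and X_test are the splits from the video, and StandardScaler here rescales each column to zero mean and unit variance:

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()

    # Fit the scaler on the training data only, then apply it to both splits
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)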
@cloudbaud7794 4 years ago
Can someone please explain how we get 80% accuracy just by guessing "No" all the time? I need to understand the math: (1233-237)/1233.
@SANJIVRAI6693 3 years ago
What he means is that if you predict Attrition = No for all the rows, you will be correct about 80% of the time.
@AnkitBhargava 3 years ago
I think he is just trying to point out that there are far more No's (did not leave the company) than Yes's. So many that even without any modeling or scaling, if you simply guess that an employee has NOT left, you would be right about 80% of the time.
@ItAintNecessarilySo 2 years ago
It should really be (# who did not leave) / (total employees) = 1233 / (1233 + 237), which is approximately 84%, rather than the (1233 - 237) / 1233 the creator originally wrote.
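A quick sanity check of the arithmetic, using the counts quoted in this thread:

    stay, leave = 1233, 237
    total = stay + leave              # 1470 employees in the dataset

    # Accuracy of always predicting "No" (the majority class)
    baseline = stay / total
    print(round(baseline, 3))         # 0.839, i.e. roughly 84%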
@yashwantkumarverma1480 2 years ago
Can we fix the range of the x-axis? Because I have many data points along the x-axis.
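If this refers to the seaborn/matplotlib plots in the video, a minimal sketch; df and the Age/Attrition columns are from the video, and the limits below are placeholders:

    import matplotlib.pyplot as plt
    import seaborn as sns

    # On a numeric axis, xlim clips the visible range directly
    sns.histplot(data=df, x="Age", hue="Attrition", multiple="stack")
    plt.xlim(18, 60)
    plt.show()

    # On a categorical countplot the ticks are category positions, so it is
    # usually easier to rotate or thin the labels than to clip the range
    ax = sns.countplot(x="Age", hue="Attrition", data=df)
    ax.tick_params(axis="x", rotation=90)
    plt.show()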
@prasunprakash2297 4 years ago
How do we calculate employee performance, department-wise?
@AnkitBhargava 3 years ago
Thank you for the walkthrough - really helpful. Question: early on in the analysis, you plotted a bar graph of Age with Attrition as the hue. But we don't know whether Age is correlated with any other attribute(s), so what would be the point of that graph? Age alone does not explain the attrition rate. Why look at it at all?
@nbddesigns7620 3 years ago
When we fit on X_train and Y_train, we get ValueError: Input contains NaN.
@fabfitmom 2 years ago
Your columns have null values; clean up the data so that every row has data in all columns. His step where he tests this is:
    # Get a count of empty values for each column
    df.isna().sum()
The above should give you 0 for all fields, and the below should give you False for the X_train/Y_train fit to work:
    # check for any missing or null values
    df.isnull().values.any()
@RohanTayal 4 years ago
Thank you for the amazing explanation, but I have a query: why did you use LabelEncoder and not OneHotEncoder to convert the non-numeric data into numeric data?
@SANJIVRAI6693 3 years ago
You can use either of them.
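For anyone comparing the two options, a minimal sketch; df is the attrition DataFrame from the video, and pd.get_dummies is the quickest route to one-hot encoding:

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    # Label encoding: each category becomes an integer in the same column
    df_label = df.copy()
    for col in df_label.select_dtypes(include="object").columns:
        df_label[col] = LabelEncoder().fit_transform(df_label[col])

    # One-hot encoding: each category becomes its own 0/1 column
    df_onehot = pd.get_dummies(df, drop_first=True)

Label encoding imposes an arbitrary order on the categories, which tree-based models tolerate well; for linear models, one-hot encoding is usually the safer choice.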
@harikanttiwari5326 2 years ago
I am getting an error after:
    # use random forest classifier
    from sklearn.ensemble import RandomForestClassifier
    forest = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
    forest.fit(X_train, Y_train)
and the error is: could not convert string to float: 'Non-Travel'
@dannymuzata4633 2 years ago
Before you get to the random forest classifier, you must make sure you have converted all your categorical data to numeric data. Then you won't have that error.
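As a sketch of what that looks like in practice ('Non-Travel' is a value from the BusinessTravel column, so at least that column is still text); this assumes the features are still held in DataFrames:

    from sklearn.preprocessing import LabelEncoder

    # List the feature columns that still hold strings
    text_cols = X_train.select_dtypes(include="object").columns
    print(list(text_cols))

    # Encode each of them to integers before calling forest.fit(...)
    # (in a real pipeline, fit the encoder before splitting so the test
    # split cannot contain categories the encoder has never seen)
    for col in text_cols:
        le = LabelEncoder()
        X_train[col] = le.fit_transform(X_train[col])
        X_test[col] = le.transform(X_test[col])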
@vijaysolanki7497 3 years ago
Why didn't you use oversampling on this imbalanced data (Yes: 237, No: 1233)?
@grahamg4529 2 years ago
Yes, that would really help with the false negatives and the recall score.
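Two common ways to handle the imbalance, sketched here; X_train/Y_train are the splits from the video, and the second option assumes the imbalanced-learn package is installed:

    from sklearn.ensemble import RandomForestClassifier

    # Option 1: keep the data as-is but weight the rare "Yes" class more heavily
    forest = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                    random_state=0)
    forest.fit(X_train, Y_train)

    # Option 2: oversample the minority class before fitting (imbalanced-learn)
    from imblearn.over_sampling import SMOTE
    X_res, Y_res = SMOTE(random_state=0).fit_resample(X_train, Y_train)
    forest.fit(X_res, Y_res)

Either way, judge the result on recall/F1 rather than accuracy, for the reasons discussed earlier in the thread.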
@surender6320 3 years ago
Can you please share the code, if you don't mind?
@namanagrawal4968 4 years ago
Where is the dataset used?
@DaisyBhullar27 4 years ago
Kaggle
@soumyasrm 4 years ago
Can you please share the GitHub link for this project?
@chowadagod 4 years ago
Lovely video, but please do projects that involve data cleaning, especially handling text data. What you do is lovely and very much appreciated, sir, but it's a bit too plain, and the majority of the work in data science is DATA CLEANING. So please focus on this aspect in upcoming videos. Thank you, sir.
@mrgz999 3 years ago
I agree - tutorials on (i) data cleaning and (ii) merging two files from two different years to do a combined analysis.
@codewiththink303 2 years ago
Please give me the HRM dataset.
@HumptyDumptyActual 4 years ago
Random guessing gives your model 80% accuracy, while machine learning gives 86%. That arguably makes a case against ML here, since results within about 6 percentage points of a random guess are not that far off, so it may be better to go with guessing than with ML. That's my opinion; others are welcome to share theirs as well.
@QUIZ_WHIZ_SMART 4 years ago
The building of this model was very straightforward. Of course, if you build it for a real project, you would do some feature-engineering steps before starting the training. You can see that the model is weak from the true positives: only 9, against 45 false positives. The recall is very bad, which means the whole model is not usable. But with some feature engineering and maybe a better algorithm, you will get great results!
@furkanozbudak4440 4 years ago
Guessing gives 80% accuracy only on this particular dataset. Newly gathered data could have 80% of attrition values = "Yes", which would drop your guess's accuracy to 20%. Then your guess would be far worse than flipping a coin and predicting based on heads or tails.
@grahamg4529 2 years ago
@@furkanozbudak4440 Exactly. I fell into the trap of relying on accuracy when working with an imbalanced dataset. It can be very misleading for a beginner, but I've learned that precision and recall are actually more important for identifying the target class.
@being_aspirang 4 years ago
This dataset is imbalanced, so we should use a different approach for the project...
@pavel822 4 years ago
Where can I get this data?
@q_1 4 years ago
Kaggle.com, IBM HR Analytics Employee Attrition
@alexanderthegreat9631 4 years ago
I keep getting a ValueError:
    from sklearn.model_selection import train_test_split
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=None, random_state=0)
    ValueError: Found input variables with inconsistent numbers of samples: [1, 1470]
Can someone help?
@SANJIVRAI6693 3 years ago
test_size needs to be defined - how much of the whole dataset you will split off for testing versus training.
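For reference, a sketch of the split as the video defines it; note that the message "inconsistent numbers of samples: [1, 1470]" also says X and Y ended up with different numbers of rows, so it is worth checking how X and Y were built:

    from sklearn.model_selection import train_test_split

    # X and Y must have the same number of rows (1470 each for this dataset)
    print(X.shape, Y.shape)

    # 75% of the rows for training, 25% held out for testing
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.25, random_state=0)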
@mrgz999 3 years ago
@@SANJIVRAI6693 Why did we select a 75/25 percent split? Why not more?
@ainli4125466 2 years ago
Thank you, but I got an error, "ValueError: Input contains NaN, infinity or a value too large for dtype('float32')", when running:
    # use the random forest classifier
    from sklearn.ensemble import RandomForestClassifier
    forest = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
    forest.fit(x_train, y_train)
Could you shed some light on how to fix it?
@grahamg4529 2 years ago
You need to remove NaNs from the dataset during the data-cleaning process.
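A minimal sketch of that cleaning step; df is the DataFrame from the video, and whether to drop or fill depends on the data:

    import numpy as np

    # See how many missing values each column has
    print(df.isna().sum())

    # Option 1: drop any row that still contains a missing value
    df = df.dropna()

    # Option 2: treat infinities as missing, then fill numeric gaps instead
    df = df.replace([np.inf, -np.inf], np.nan)
    df = df.fillna(df.median(numeric_only=True))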