Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
@dipto624 · 3 years ago
Man!! I was struggling with how to use statistics in EDA. I knew std, mean and all, but couldn't use them in the EDA flow. You just cleared my confusion!!!! You won't believe how long I have been struggling with this. Thank god I found this video. You're a great teacher. I had the tools but couldn't use them; you just taught me how to use them.
@codebasics · 3 years ago
☺️👍
@Mohammed-rx6ok · 4 months ago
+1
@sultanhusnoo8552 · 3 years ago
Can't thank you enough for the amazing work you do. It is explained in such a simple, honest way. Many YouTubers explain things incompletely and then keep referencing their paid courses. You are probably the only one who has a complete course, complete explanations and exercises all available for free, and you even give some feedback to those who interact with you. This is so rare and precious. I have been learning programming and data science with a view to improving my career. As soon as I get a salary from any coding-related work, I promise to join your Patreon. Can't thank you enough for what you do. All the best to you and your family.
@codebasics · 3 years ago
Sultan, you are a very kind person, and thanks for all your appreciation :) This kind of feedback motivates me to continue my work on YouTube!
@sumitkumarsah8782 · 4 years ago
Sir, I just want to say that my respect for you keeps increasing. Keep making such videos. Thank you for your efforts.🙏
@subuqerpsmja · 4 years ago
You are such an inspiration for people like me who are looking to transition into data science. Day and night I'm spending my time in this quarantine on data science, and your YouTube videos play a huge role in increasing my caliber. I am a systems engineer at CTS and now I wish to move my career towards data science. I'm tirelessly preparing my portfolio and my resume to send out for evaluation, as per your latest video.
@ndosh1man · 1 year ago
Did you make it?
@krishnanarwade1467 · 4 years ago
I am totally inspired by Dhaval sir and Krish Naik sir. Thank you very much for sharing your valuable knowledge with us.
@kirandeepmarala5541 · 4 years ago
I have no words to say thank you. You always provide such knowledge for free. I pray to God to keep you and your family safe at all times, with health, wealth and prosperity. Thank you once again.
@jaganinfo · 4 years ago
We will not stop the video :) We will watch the entire video. Each piece of info is very valuable to us (learners).
@shaiksuleman3191 · 4 years ago
Simply superb, star. You and Krish are the two eyes of data science.
@cesarkastoun5752 · 4 years ago
Hello, first of all, I love your videos. You have a great talent for teaching and are putting it to good use. Just a small nit: the heights file you're using is not really a normal distribution but a bimodal one, as it has two modes. The reason is very simple: you're lumping together males and females. If you use separate datasets for each gender, you get much "cleaner" normal distributions. Cheers -CJK
@anandshimpi8011 · 3 years ago
Really amazing lecture, sir. My interest in data science is increasing, sir.
@hardikatri7803 · 4 years ago
One of the finest tutorials. Great teaching style.
@codebasics · 4 years ago
Thanks Hardik, keep learning.
@hardikatri7803 · 4 years ago
Thank you for the support and guidance. The exercise part of your tutorials is just awesome. I really love your way of teaching.
Excellent explanation of every topic; it really helps me a lot in my data science career. Thanks!
@sa89879 · 4 years ago
Very good and neat explanation, but there is one drawback with the z-score: it relies on the mean, so an extreme outlier or a human-made error can distort it. If we use the median instead, the detection becomes robust, because whatever the extreme values are, the median depends only on the middle values. Thanks for teaching the z-score.
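The median-based idea in the comment above is usually formalized as the modified z-score (Iglewicz and Hoaglin), which replaces the mean with the median and the standard deviation with the median absolute deviation (MAD). Below is a minimal NumPy sketch; the 0.6745 constant and the 3.5 cut-off are the conventional choices rather than values from the video, and the function name and data are hypothetical.

```python
import numpy as np

def modified_zscore(values):
    """Robust z-score: 0.6745 * (x - median) / MAD."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    if mad == 0:
        # At least half the points equal the median, so the score is undefined
        raise ValueError("MAD is zero; modified z-score is undefined")
    return 0.6745 * (x - median) / mad

heights = [5.9, 6.0, 6.1, 5.8, 6.2, 6.0, 5.9, 14.0]  # hypothetical data, 14.0 is a typo
scores = modified_zscore(heights)
outliers = [h for h, s in zip(heights, scores) if abs(s) > 3.5]
print(outliers)  # [14.0]
```

Because the median and MAD ignore how extreme the largest values are, the typo cannot inflate the scale it is measured against.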
@Hale-xn6ec · 3 years ago
It is a really beneficial and useful video on this topic, thank you!
@likhithsasank8017 · 3 years ago
Thank you so much, sir. Your way of teaching is so clear and easily understandable.
@abdeali004 · 4 years ago
Great Greaaaaat and a fulll too Greaaattttt explanation man. Loved it.
@hasanbutt8622 · 4 years ago
Best tutorial, thanks a lot, sir. You are great; I have learnt a lot of concepts from your videos. God bless you, and keep making more videos.
@siddharthmodi2740 · 3 years ago
Wow! What a simple and easy-to-understand tutorial. Love it. Thank you, sir.
@bhavindedhia9968 · 4 years ago
Top content, seriously. Thanks, sir. Waiting for more videos, especially on EDA.
@Deepsim · 3 years ago
Your tutorial is so clear. Well done!
@codebasics · 3 years ago
Glad it was helpful!
@akshaypatil8155 · 2 years ago
16:38 This is just the trimming technique. If we want to do capping, i.e. replacing outliers with the lowest or highest allowed value, how do we do it?
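Capping (often called winsorizing) keeps every row but pulls extreme values back to the limits, and pandas' Series.clip does exactly that. A minimal sketch, assuming a hypothetical DataFrame with a height column and the mean ± 3 std limits used in the video:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000 normal heights plus one data-entry error
rng = np.random.default_rng(0)
df = pd.DataFrame({"height": np.append(rng.normal(66, 4, 1000), 120.0)})

mean, std = df["height"].mean(), df["height"].std()
lower_limit, upper_limit = mean - 3 * std, mean + 3 * std

# Trimming (as in the video) drops the rows outside the limits
df_trimmed = df[(df["height"] > lower_limit) & (df["height"] < upper_limit)]

# Capping keeps every row and replaces out-of-range values with the nearest limit
df["height_capped"] = df["height"].clip(lower=lower_limit, upper=upper_limit)
```

The trimmed frame is shorter than the original, while the capped column has the same length with 120.0 replaced by upper_limit.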
@subuqerpsmja · 4 years ago
My sincere thanks for your valuable efforts; I'm keenly following your guidelines.
@Medjdiptiranjan · 2 years ago
You are simply amazing; your simple explanations help a lot. Thanks a trillion!
@learnerlearner4090 · 2 years ago
Your videos are easy to understand. Thanks so much!
@chivalrousforlan238 · 4 years ago
Nice one, sir, thank you. One thing, sir: could you please make a tutorial on SQL? Thank you, sir.
@whimsicalkins5585 · 1 year ago
Thanks very much for your simple and clear code.
@fahadreda3060 · 4 years ago
Great video, thanks man. Keep up the good work.
@srishtikumari6664 · 4 years ago
Very well explained, sir!! Worth watching.
@codebasics · 4 years ago
👍😊
@python360 · 2 years ago
Great tutorial, thanks for using a readily available sample CSV as well. ☑☑
@hrushik10 · 2 years ago
You can also use seaborn to plot the bell curve. It's much easier than the matplotlib method: seaborn.histplot(data=df.height, kde=True), where kde adds a kernel density estimate line.
@jp-hm · 1 month ago
Great video - well explained!
@dhananjaykansal8097 · 4 years ago
It's been a long time, sir. I wish you had taken a dataset with at least 5-6 features. Nonetheless, it's fantastic.
@yogeshbharadwaj6200 · 4 years ago
Thanks for the very detailed explanation, sir...
@AryanFelix · 3 years ago
How do we determine the z-score range for skewed data? Do I use the same range on either side (like -3 to 3), or can I use different values like -1 to 3 (for left-skewed data) after looking at the histogram? Thanks in advance!
@haythemb4214 · 2 years ago
Same question. I don't know what the right range is for my data, because (-3, 3) doesn't work in my case.
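For skewed data, one option that avoids choosing symmetric z cut-offs is to trim by percentiles, which naturally gives limits at different distances from the mean on each side. A minimal sketch on hypothetical right-skewed data; the 1st and 99th percentiles are an assumption you would tune after looking at the histogram, not a standard.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed data
rng = np.random.default_rng(42)
s = pd.Series(rng.lognormal(mean=3, sigma=0.6, size=2000))

lower_limit, upper_limit = s.quantile(0.01), s.quantile(0.99)
trimmed = s[(s >= lower_limit) & (s <= upper_limit)]

# On skewed data the two limits sit at very different distances from the mean
print(round(s.mean() - lower_limit, 1), round(upper_limit - s.mean(), 1))
```

Separate lower and upper z thresholds (like -1 and 3) are another way to get asymmetric limits, but they are equally a judgment call for each dataset.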
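@pranjalgupta9427 · 4 years ago
Sir, if data is not normally distributed, which technique should we prefer for removing outliers?
@stuttzzzi · 3 years ago
There are ways to transform data toward a normal distribution; look into transformations such as log or Box-Cox. (Plain scaling, e.g. standardization, does not change the shape of the distribution.)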
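Another common answer for non-normal data is the IQR rule (Tukey's fences), which uses quartiles instead of the mean and standard deviation and therefore does not assume normality. A minimal sketch; the 1.5 multiplier is the conventional default rather than something from the video, and the helper name and data are hypothetical.

```python
import pandas as pd

def iqr_outlier_mask(s, k=1.5):
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

s = pd.Series([10, 12, 11, 13, 12, 11, 14, 12, 13, 95])
print(s[iqr_outlier_mask(s)].tolist())  # [95]
```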
@haintuvn · 4 years ago
Thank you for your lectures! I have learnt a lot from them. Can we apply the standard deviation and z-score methods to remove outliers only if the dataset is normally distributed, or can we apply these two methods to all types of datasets (normal or non-normal)? Thank you again!
@codebasics · 4 years ago
You would do that if you have a normal distribution.
@haintuvn · 4 years ago
@@codebasics Thank you very much! Does that mean we need to test whether the dataset is normally distributed before we apply the z-score or standard deviation method to remove outliers?
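A quick, informal way to decide is to look at the histogram and the sample skewness before reaching for the z-score. A minimal sketch using pandas' Series.skew(); the |skew| < 0.5 rule of thumb is an assumption, not a formal test, and it ignores tail weight (kurtosis). A proper normality test such as Shapiro-Wilk is available in SciPy as scipy.stats.shapiro.

```python
import numpy as np
import pandas as pd

def roughly_symmetric(s, max_abs_skew=0.5):
    """Informal check: treat |sample skewness| below the threshold as roughly normal."""
    return abs(s.skew()) < max_abs_skew

rng = np.random.default_rng(1)
normal_like = pd.Series(rng.normal(66, 4, 5000))
skewed = pd.Series(rng.exponential(10, 5000))

print(roughly_symmetric(normal_like), roughly_symmetric(skewed))
```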
@prdfrnd · 4 years ago
Hi sir, your explanation is really amazing. I recently started to learn data science and I have a doubt about this video, kindly explain: we have a mean of 66.36755, and if we add 3.8475 it becomes about 70.2. How is that one standard deviation?
@hustleto-n6d · 4 years ago
one standard deviation = 3.8475
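To make the arithmetic concrete with the numbers quoted in this thread (mean 66.36755, standard deviation 3.8475): one standard deviation above the mean is 66.36755 + 3.8475 ≈ 70.215, and the ±2 and ±3 std limits follow the same pattern of mean ± k × std.

```python
mean, std = 66.36755, 3.8475

# Band of mean ± k standard deviations for k = 1, 2, 3
for k in (1, 2, 3):
    lower, upper = mean - k * std, mean + k * std
    print(f"{k} std: {lower:.4f} to {upper:.4f}")
```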
@nareshchinnam8349 · 4 years ago
Thanks so much for explaining it in such an easy way. Could you please clarify what we should do if other columns contain important values in the same row where the outlier exists? Can we still go ahead and remove the entire row?
@naveenkalhan95 · 4 years ago
Thank you very much again... I am really following all your videos, really knowledgeable. At 5:50 of this video you created the bell curve. I am aware of a function, .kde(), which does the same thing. Is it wise to use that, or is there some difference between it and the function you created for drawing the bell curve? Thank you very much again. Really appreciate it.
@codebasics · 4 years ago
Naveen, actually I don't know about the kde() function. What does the API specification say about it? Can you try plotting it and see if the result is the same as mine?
@naveenkalhan95 · 4 years ago
@@codebasics Thank you for your reply. I followed your advice and plotted the height using the .kde() method, and it produced the same bell curve with a slight difference. I just had to write this line to draw it: df.Height.plot.kde() But thank you again for your precious work, because it's opening up my brain to think of more agile ways of drawing it and understanding it mathematically.
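The two curves differ because they are different estimates. The bell curve in the video is a parametric normal density fitted with the sample mean and standard deviation, while .plot.kde() is a kernel density estimate: it places a small Gaussian bump on every data point and averages them, so it can show skew or a second mode that a fitted normal cannot. A NumPy sketch of both follows; to my understanding pandas delegates to SciPy's gaussian_kde with Scott's bandwidth rule by default, which is what the hypothetical gaussian_kde_manual imitates.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Parametric normal density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def gaussian_kde_manual(x, data):
    """Gaussian kernel density estimate with Scott's rule bandwidth (1-D)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    bandwidth = data.std(ddof=1) * n ** (-1 / 5)  # Scott's rule for one dimension
    # Distance from every evaluation point to every data point, in bandwidth units
    u = (np.asarray(x, dtype=float)[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(7)
heights = rng.normal(66, 4, 500)

grid = np.linspace(50, 82, 400)
fitted = normal_pdf(grid, heights.mean(), heights.std(ddof=1))
kde = gaussian_kde_manual(grid, heights)
```

On data that really is normal the two curves nearly coincide; on the bimodal male/female heights mentioned earlier in the thread, the KDE would show the two humps and the fitted normal would not.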
@hardikatri7803 · 4 years ago
We can also plot it with seaborn using the parameter kde=True.
@vishalvig01 · 4 years ago
Concise Explanation !
@pranjalgupta9427 · 4 years ago
Do we remove outliers before or after feature scaling?
@codebasics · 4 years ago
We don't need to remove them all the time. We need to treat them, which means we might end up changing the value to some reasonable value.
@codebasics · 4 years ago
Yes we remove them before feature scaling
@obigvee · 4 years ago
I have a question. Let's assume a DataFrame has some missing values along with outliers, and I don't want to just remove the outliers; I want to winsorize them. Is it right to treat the missing values first and then winsorize, or the other way round?
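One reasonable order (a judgment call, not the only valid one) is to winsorize first using only the observed values and impute afterwards, so that outliers cannot leak into the fill value. pandas supports this directly: Series.quantile skips NaN and Series.clip leaves NaN untouched. A minimal sketch on hypothetical data; the 5th/95th percentile limits are an assumption.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.1, 2.9, np.nan, 3.0, 3.2, 40.0, np.nan, 2.8, 3.1, 3.0])

# 1) Winsorize using percentiles of the observed values (quantile() skips NaN)
lower, upper = s.quantile(0.05), s.quantile(0.95)
capped = s.clip(lower=lower, upper=upper)  # NaN stays NaN

# 2) Impute the remaining gaps from the capped data
filled = capped.fillna(capped.median())
```

Imputing first with the mean would instead pull the fill value toward 40.0 before it is capped.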
@sahanjayawarna4894 · 4 years ago
Very good session as always. I came across this situation but couldn't figure out why: unless we pass the argument density=True to matplotlib.pyplot.hist(), it is not possible to see the normal curve and the histogram together in the graph. What is the reason for that?
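Without density=True, the histogram bars show raw counts per bin (hundreds, for a dataset this size), while the normal pdf is a density whose total area is 1, so the curve is squashed near zero on the same axis. density=True rescales each bar to count / (n × bin width). NumPy's np.histogram uses the same convention, so the relationship can be checked without plotting:

```python
import numpy as np

rng = np.random.default_rng(3)
heights = rng.normal(66, 4, 1000)

counts, edges = np.histogram(heights, bins=20)
density, _ = np.histogram(heights, bins=20, density=True)

bin_width = edges[1] - edges[0]
# density=True divides each count by (total count * bin width), so bar areas sum to 1
rescaled = counts / (counts.sum() * bin_width)
```

Equivalently, you could keep the count histogram and multiply the pdf by n × bin width instead.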
@flaviobrienza7697 · 2 years ago
A little suggestion to make it simpler: in the z-score method, I can take the absolute value with np.abs and then only write < 3 in the condition for the new DataFrame. In addition, to visualize the curve it is better to use sns.histplot with kde=True.
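The suggestion above can be sketched as follows, assuming a hypothetical DataFrame with a height column; |z| < 3 selects exactly the same rows as the two-sided condition z > -3 and z < 3.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"height": np.append(rng.normal(66, 4, 1000), [20.0, 120.0])})

df["zscore"] = (df["height"] - df["height"].mean()) / df["height"].std()

# One condition instead of two: keep rows whose absolute z-score is below 3
df_no_outliers = df[np.abs(df["zscore"]) < 3]
```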
@sarfrazhussain9851 · 2 years ago
Nice effort
@tucomax · 4 years ago
Question: say you have a DataFrame of drink consumption, and instead of eliminating the outliers you want to replace them with NaN while keeping the zero values. What would you do? Thanks
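Series.mask replaces values with NaN wherever a condition is True (NaN is its default replacement) and leaves everything else, including zeros, as it was. A minimal sketch, assuming a hypothetical drinks column and the |z| > 3 rule from the video:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
values = np.append(rng.integers(0, 6, 500).astype(float), 250.0)  # includes zeros
df = pd.DataFrame({"drinks": values})

z = (df["drinks"] - df["drinks"].mean()) / df["drinks"].std()
# Outliers become NaN; zeros and other in-range values are kept as-is
df["drinks_clean"] = df["drinks"].mask(z.abs() > 3)
```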
@satyavardhan8204 · 4 years ago
Please also make videos about seaborn.
@ajaykushwaha-je6mw · 3 years ago
Is removing outliers the better option, or is replacing them with another value the better option?
@estherugwueke5409 · 2 years ago
How can you apply this rule when you have about 10 features? Do you do them one by one?
@modhua4497 · 3 years ago
Does this work only if the feature is normally distributed? Most of the features in real-world data are not normally distributed.
@priyantangupta5176 · 3 years ago
Hello! Your lesson is very helpful for me. Can you tell me how I can find outliers using multiple parameters? I want to find outliers using all the columns of my data together. What should I do? Thank you in advance.
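For outliers that only stand out when columns are considered together (for example a typical height paired with an implausibly low weight), one common multivariate extension of the z-score is the Mahalanobis distance, which measures distance from the column means while accounting for the correlation between columns. A minimal NumPy sketch on hypothetical data; in practice you would pick a cut-off (often from a chi-square quantile) rather than only ranking the distances.

```python
import numpy as np

def mahalanobis_distances(X):
    """Distance of each row of X from the column means, scaled by the inverse covariance."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: sqrt((x - mu)^T * inv_cov * (x - mu))
    return np.sqrt(np.einsum("ij,jk,ik->i", centered, inv_cov, centered))

rng = np.random.default_rng(11)
height = rng.normal(66, 4, 500)
weight = 2.5 * height + rng.normal(0, 8, 500)  # weight correlated with height
X = np.column_stack([height, weight])
X = np.vstack([X, [66.0, 120.0]])  # typical height, weight far below the trend

d = mahalanobis_distances(X)
```

The appended row is unremarkable in either column alone but has by far the largest distance.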
@trinayanbharadwaj146 · 3 years ago
How can we apply this to multiple columns? Is there a shortcut, or do we have to do it manually for every column?
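The z-score rule can be applied to all numeric columns at once: compute a z-score DataFrame and keep a row only if every column passes. A minimal sketch on hypothetical data; select_dtypes leaves categorical columns out of the calculation while the row filter still keeps them in the result, and printing the kept fraction shows how much data the combined filter costs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "height": rng.normal(66, 4, 1000),
    "weight": rng.normal(160, 20, 1000),
    "gender": rng.choice(["Male", "Female"], 1000),
})
df.loc[0, "height"] = 130.0  # hypothetical data-entry errors
df.loc[1, "weight"] = 600.0

numeric = df.select_dtypes(include="number")
z = (numeric - numeric.mean()) / numeric.std()

# Keep a row only if every numeric column is within 3 standard deviations
keep = (z.abs() < 3).all(axis=1)
df_clean = df[keep]
print(f"kept {keep.mean():.1%} of rows")
```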
@Aaron_duckroast · 3 years ago
Hey, why can't we use StandardScaler and delete all the outliers?
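StandardScaler only rescales values to z-scores; it never drops rows, so you would still need a filter such as |z| < 3 afterwards. One detail worth knowing: scikit-learn's StandardScaler (like scipy.stats.zscore by default) divides by the population standard deviation (ddof=0), whereas pandas' .std() defaults to the sample standard deviation (ddof=1). The scores therefore differ slightly on small datasets. A pandas-only sketch of the difference on hypothetical data:

```python
import numpy as np
import pandas as pd

s = pd.Series([60.0, 62.0, 64.0, 66.0, 68.0, 70.0, 90.0])

z_pandas = (s - s.mean()) / s.std()        # ddof=1 (sample std), the pandas default
z_scaler = (s - s.mean()) / s.std(ddof=0)  # ddof=0, the convention StandardScaler uses

# Every score differs by the same factor, sqrt(n / (n - 1))
ratio = z_scaler / z_pandas
```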
@pythonenthusiast9292 · 4 years ago
Awesome.
@sadikaljarif9635 · 1 year ago
Why did we choose the height column? Why didn't we choose the weight column?
@Artech.Ranjit · 3 years ago
How do we decide on 3 as the threshold for the z-score values? You used, for example, zscore > 3.
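The 3 comes from the normal distribution's 68-95-99.7 rule: for normally distributed data about 99.7% of values lie within 3 standard deviations of the mean, so anything beyond that is rare enough to flag. The coverage for any k can be computed from the error function, P(|Z| < k) = erf(k/√2), using only the standard library:

```python
import math

def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    outside = 1000 * (1 - coverage(k))
    print(f"within {k} std: {coverage(k):.2%}, expected outside per 1,000 points: {outside:.1f}")
```

Even perfectly normal data yields roughly 3 points per 1,000 beyond ±3 std, so the threshold is a convention for "unusual", not proof that a point is an error.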
@bikashpokharel478 · 4 years ago
It really helped me. Thank you.
@codebasics · 4 years ago
Glad it helped!
@GusMD84 · 4 years ago
What happens when the standard deviation is much bigger than the mean? I'm currently exploring a dataset where the mean price is ~220 and the std dev is ~395. Evidently there are some big outliers that can be seen straight away (i.e. a min price of 4 and a max price of 36000). Should I remove those 'clear' outliers manually and then apply the remove-outliers function? (I presume that if I don't, the function will remove a lot of 'non-outliers'?)
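A standard deviation larger than the mean for a positive quantity such as price usually signals a strongly right-skewed distribution, where mean ± 3 std is dominated by the extreme values. One common approach (an assumption here, not the video's method) is to apply the z-score rule on the log scale, where such data is often much closer to normal. A minimal sketch on synthetic prices:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
price = pd.Series(rng.lognormal(mean=5, sigma=1.0, size=5000))
price.iloc[:2] = [4.0, 36000.0]  # hypothetical extreme entries like those in the comment

log_price = np.log(price)  # requires strictly positive prices
z = (log_price - log_price.mean()) / log_price.std()
price_clean = price[z.abs() < 3]
```

On the raw scale this synthetic data has a standard deviation larger than its mean, similar to the numbers in the comment; on the log scale both extremes fall outside ±3 std.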
@shounaksushantadasgupta8440 · 3 years ago
How do we remove outliers from a DataFrame that has categorical as well as continuous data? With the percentile technique I am getting NaN values in the categorical columns.
@reshaknarayan3944 · 4 years ago
Clear and succinct
@ajaykushwaha-je6mw · 3 years ago
I have a question, kindly answer. Suppose we have 20 columns and we remove outliers from each of them. We exclude only a small amount of data per column, but altogether we lose a huge amount of data. Is this the correct way to handle outliers?
@beautyisinmind2163 · 2 years ago
Hello sir, can we learn from you personally? And how can we contact you?
@piyush_sh98 · 1 year ago
How is the number of standard deviations chosen as 3, and the z-score threshold as 3 too? Please, someone explain.
@rsinh3792 · 3 years ago
Sir, a reviewer has asked me this question and I don't know how to address it; can you please guide me? "Use some statistical significant test such as T-test or ANOVA to prove you validate the proposed diagnostic model on patients and quality improvements of your method". I have two datasets: dataset 1 was used to train the model and dataset 2 was used to validate the trained model. I trained the ML model, deployed it, validated it on new data and presented the results. I think I have understood the question: should I apply the statistical test between the performance metrics of the trained model's results and the validation results? Please help me, sir.
@harleyquinn5245 · 4 years ago
Sir, can I become a data analyst after 12th?
@HabeshaTV1 · 1 year ago
Can you provide a mock interview?
@pukyalligator · 2 years ago
Great Video. Thx!!
@zehraup4722 · 4 years ago
Here is a great explanation: www.kaggle.com/c0derr/outlier-detection?scriptVersionId=39511980
@saurabhbarasiya4721 · 4 years ago
Great, sir.
@boubacaramaiga4408 · 4 years ago
Fantastic, many thanks.
@harshal_ajetrao · 4 years ago
Thanks for the video, sir. I am new to machine learning. I use the percentile, standard deviation and z-score methods, but the problem I get with the standard deviation and z-score methods is that removing the outliers doesn't change the values in our data, i.e. df; instead, the result gets stored in a new frame, df_no_outlier_std_dev. So how do I update df after removing the outliers? Please help...
@viveksingh881 · 3 years ago
That is because we are storing the result in a new DataFrame, not the original one. If you want the changes reflected in the original DataFrame, assign the filtered result back to it, e.g. df = df[condition] (boolean filtering has no inplace option). Happy learning.
@harshal_ajetrao · 3 years ago
@@viveksingh881 Thanks. That was six months ago; now I'm at an intermediate level in machine learning 👍
@viveksingh881 · 3 years ago
@@harshal_ajetrao That's great, bro. Just clearing some doubts on random YouTube videos. Happy learning :)
@harshal_ajetrao · 3 years ago
@@viveksingh881 Thanks for helping, man. Keep it up 🤘🤘🤘
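In pandas, boolean-mask filtering returns a new DataFrame and has no inplace option, so the usual way to "update" df is to assign the result back to the same name. If you do want an in-place operation, DataFrame.drop accepts inplace=True and can be given the index labels of the rows to remove. A minimal sketch, assuming hypothetical height data and limits:

```python
import pandas as pd

df_original = pd.DataFrame({"height": [64.0, 66.0, 65.5, 67.0, 120.0]})
lower_limit, upper_limit = 50.0, 80.0  # hypothetical limits
mask = (df_original["height"] > lower_limit) & (df_original["height"] < upper_limit)

# Option 1: reassign the filtered result to the same name
df = df_original.copy()
df = df[mask]

# Option 2: drop the outlier rows from the existing DataFrame in place
df_inplace = df_original.copy()
df_inplace.drop(index=mask[~mask].index, inplace=True)
```

Both options leave the same four rows; reassignment is the more common idiom.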
@anirbaniitgn8407 · 2 years ago
Everything is fine when you apply the z-score to find outliers that are all positive or all negative. If both positive and negative outliers are present together, it does not work! data = [1, 2, 2, 2, 3, 1, 1, -19, 2, 2, 2, 3, 1, 1, 2, 19, 25] Try it with this simple dataset: with the IQR method you can detect -19, 19 and 25, but the z-score does not detect them. I don't know the reason. If you know, sir, please let us know.
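The reason is that the outliers themselves inflate the standard deviation (sometimes called masking). In this 17-value list the mean is about 2.94 and the population standard deviation about 8.6, so even 25 only reaches z ≈ 2.6, below the cut-off of 3. Small samples make this worse: with the population standard deviation, no point in a list of n values can have |z| above (n - 1)/√n, about 3.9 for n = 17. The IQR method uses quartiles, which the three extreme values barely move. A sketch reproducing the comparison with the data from the comment:

```python
import numpy as np

data = np.array([1, 2, 2, 2, 3, 1, 1, -19, 2, 2, 2, 3, 1, 1, 2, 19, 25], dtype=float)

# Classic z-score with the population standard deviation (ddof=0)
z = (data - data.mean()) / data.std()
z_flagged = data[np.abs(z) > 3]

# IQR rule with the conventional 1.5 multiplier
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_flagged = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
```

Here z_flagged is empty while iqr_flagged contains -19, 19 and 25.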
@ssrriinniivvaass · 4 years ago
Hi Sir, how do I decide the z-score values? Does it depend on my data, or is it always -3 to +3?
@codebasics · 4 years ago
Usually it is between -3 and 3, but yes, it depends on the data. Sometimes people use more than 3 based on the data distribution.
@pythongui5199 · 3 years ago
Very nice
@nikhilgaikwad9954 · 4 years ago
How do we select the number of standard deviations in the z-score technique to remove outliers?
@codebasics · 4 years ago
The general guideline is 3 or more. If the dataset is small, people use 2 std devs too, but be careful that you don't remove data points that can add value to the data analysis process.
@marco_6145 · 4 years ago
Fantastic, thank you
@barkhapaswan5807 · 3 years ago
🙌🙌🙌
@AlonAvramson · 3 years ago
Thank you!
@sayantandas9281 · 4 years ago
Sir, thank you.
@renanaoki714 · 1 year ago
Thanks!
@anthonym9130 · 4 years ago
I noticed he didn't use the z-score or Cook's distance in the real estate project.
@bhaskarsubbaiah6002 · 4 years ago
Thanks, sir.
@research__7644 · 2 years ago
Bruh... why would you remove one column? This just ruins the purpose.
@Kingcolumbian · 3 years ago
You know Python, but you don't know much about statistics for identifying outliers in normally distributed data.
@janaramon1232 · 8 months ago
Bruh, wdym?
@mohammadfasih7752 · 4 years ago
Zoom in on your screen!!!
@sushobhan14 · 4 years ago
The content is good, but your delivery is boring.
@skcbca8580 · 3 years ago
Sir, does the z-score work only for numeric data? In the case of text data, what can we do?