Outlier detection and removal using percentile | Feature engineering tutorial python # 2

134,274 views

codebasics


Comments: 135
@codebasics 2 years ago
Check out our premium machine learning course with 2 industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
@chivalrousforlan238 4 years ago
Most of the YouTube tutorials I've gone through are top notch, but yours is miles ahead. What makes yours different is that you start from the basics at a slow pace, which makes it easier to understand.
@sumit121285 3 years ago
A true comment for a true teacher...
@onlyguitars 10 months ago
Best channel by far. You have become my go-to DS/DA YouTuber. You tackle one of the most important things: understanding the concepts and translating them into how to do it. I recently realized that it is far better to understand the concepts well than to know every pandas function perfectly and code them fast. It also makes things much more interesting and fun, because it clicks and makes sense.
@suyash7450 1 year ago
You teach from zero to the end, your pace is perfect, and the best part is that you provide exercises and resources. Thanks for helping and teaching us.
@kirandeepmarala5541 4 years ago
max_threshold = df['price'].quantile(0.90)
print(max_threshold)
min_threshold = df['price'].quantile(0.05)
print(min_threshold)
df1 = df[(df['price'] > min_threshold) & (df['price'] < max_threshold)]
I am really learning a lot from your channel and wait every day for your videos. Your way of explaining concepts, giving exercises and genuine talk makes your channel different from others. Thank you once again.
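The snippet in the comment above assumes a DataFrame `df` is already loaded. A minimal self-contained sketch of the same filter follows; the ten `price` values are hypothetical toy data, not the dataset from the video.

```python
import pandas as pd

# Hypothetical toy data standing in for the video's dataset.
df = pd.DataFrame({"price": [1, 40, 45, 50, 52, 55, 58, 60, 65, 5000]})

# Same thresholds as in the comment: 5th and 90th percentiles.
min_threshold = df["price"].quantile(0.05)
max_threshold = df["price"].quantile(0.90)
print(min_threshold, max_threshold)

# Keep only rows strictly between the two thresholds.
df1 = df[(df["price"] > min_threshold) & (df["price"] < max_threshold)]
print(df1)
```

With these toy values the two extreme prices (1 and 5000) fall outside the thresholds and are dropped; on real data the cut-offs would need to be chosen per dataset.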
@_SanchitaPatil 2 years ago
How do you decide the quantile values?
@BarongoCalvine 4 years ago
Thank you. I just had a DS interview and used the age example for the outlier question. Thanks for your useful lessons.
@stuttzzzi 2 years ago
Thank you. "If a person can't explain something simply, it means he hasn't understood it properly." Following your entire playlists.
@vidhanmaheshwari2082 4 years ago
I was stuck in the middle of a project due to messy datasets, and then I came across your video. It helped a lot. Thanks a lot.
@KenJee_ds 4 years ago
Great video as always!
@r7918 3 years ago
Thanks a lot, Sir! This video was very helpful for me. I was doing things wrong till now. I was under the impression that outliers are meant to be deleted instead of wasting time on irrelevant data points.
@atunraseayomide2504 2 years ago
You never go wrong watching codebasics. I love your work so much, sir.
@codebasics 2 years ago
👍👍 Thanks for your kind words of appreciation, Atunrase 🙏
@AbhishekBalsara 3 years ago
This is exactly what I was looking for, particularly for outliers. Thanks a lot for this 👍
@codebasics 3 years ago
Glad it was helpful!
@iaconst4.0 1 month ago
You are an excellent teacher! Thank you for sharing your knowledge!
@arhataria 4 years ago
This is amazing. Thanks a lot for posting this valuable video!
@Ankurkumar14680 4 years ago
It is an amazing explanation. Eager to watch more videos on this topic. Thanks a lot for sharing your knowledge and skills :)
@codebasics 4 years ago
I am glad it was helpful
@manojkumar-hf6vk 2 years ago
You have created an amazing series, Sir.
@pallavikharbanda1372 3 years ago
Great video! Please upload more methods to detect and remove outliers.
@sakshamverma3114 4 years ago
Sir, please upload all the videos for feature engineering. And your teaching methods are great.
@codebasics 4 years ago
Sure, Saksham
@r21061991 4 years ago
Thank god you started making videos again 😀😀
@sumit121285 3 years ago
You are a real teacher. Thanks, thanks a lot.
@codebasics 3 years ago
Happy to help
@kenny87ification 4 years ago
Thank you for this video. I learned a great way to detect outliers and remove them.
@AjayKumar-id7mb 3 years ago
You really explain it in a very easy way.
@codebasics 3 years ago
Glad you liked it
@shaikhkashif9973 1 year ago
8:10 After this, if you check for outliers in a box plot you will find some again, because Q1, Q2 and Q3 are recomputed on the trimmed data. That is why it is better to replace outliers than to remove them.
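One common way to "replace rather than remove", as the comment above suggests, is to cap values at the percentile thresholds (often called winsorizing). The sketch below is only an illustration of that idea with hypothetical toy prices and arbitrarily chosen 5%/95% cut-offs; it is not shown in the video.

```python
import pandas as pd

# Hypothetical toy prices with one very low and one very high value.
prices = pd.Series([1, 40, 45, 50, 52, 55, 58, 60, 65, 5000], name="price")

lower = prices.quantile(0.05)
upper = prices.quantile(0.95)

# Cap values outside [lower, upper] instead of dropping the rows.
capped = prices.clip(lower=lower, upper=upper)
print(capped)
```

Every row is kept, so other columns of the same rows are not lost; only the extreme values are pulled in to the thresholds.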
@OceanAlves23 4 years ago
👏👏👏✔ from Brazil - Teresina - Piauí
@saswatleo 4 years ago
Amazing video. Superb, Dhaval Sir 🙏
@codebasics 4 years ago
☺️☺️
@naveenkalhan95 4 years ago
I would request that, if possible, you add the link to this series' playlist in the video description. I have bookmarked it, but that would make it really easy :) Thank you very much again for your good work.
@mallickamu 3 years ago
Correct me if I am wrong: we can use percentile-based outlier detection only if we know that the variable follows a normal distribution. For the height example you used, we know it follows a normal distribution, so percentile-based outlier detection can be applied. What if the variable follows a Weibull distribution? Can we still use percentile-based outlier detection?
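On the question above: a quantile is computed from the ranks of the data, so `quantile()` itself does not assume any particular distribution. The sketch below uses synthetic samples (the parameters are arbitrary assumptions) to show that trimming at the 1st and 99th percentiles removes the same share of rows from normal and Weibull data. Whether the trimmed rows are really outliers is a separate judgment call.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic samples; the distribution parameters are arbitrary.
samples = {
    "normal": pd.Series(rng.normal(loc=170, scale=10, size=10_000)),
    "weibull": pd.Series(rng.weibull(a=1.5, size=10_000)),
}

removed_fraction = {}
for name, s in samples.items():
    low, high = s.quantile([0.01, 0.99])
    kept = s[(s > low) & (s < high)]
    removed_fraction[name] = 1 - len(kept) / len(s)

print(removed_fraction)
```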
@Deepsim 2 years ago
Very clear explanation! Thank you.
@codebasics 2 years ago
Glad it was helpful!
@leelavathigarigipati3887 4 years ago
Thanks a lot for sharing your knowledge and skills.
@rajasekharboppasamudram4463 4 years ago
Your videos are very easy to understand, and thanks for the content. One request from my side: the volume of your videos is low compared with other sources. If possible, please increase the volume.
@codebasics 4 years ago
Can you check the volume of your computer? I played this video on my computer and the volume is quite good.
@kratugautam304 2 months ago
@codebasics Savage 😂
@fahadreda3060 4 years ago
Thanks for the video. Keep up the good work. Wish you all the best.
@codebasics 4 years ago
Thank you
@naveenkalhan95 4 years ago
@10:24, when you take the min and max quantiles as 0.001 and 0.999, I am confused by 0.001. Should it not be 0.01 (meaning the 1st percentile)? Thank you again.
@kketanbhaalerao 4 years ago
Yes, me too, because 0.001 means 0.1%?
@viveksingh881 3 years ago
No need to get confused, he just went very low with the quantile value. Happy learning.
@aldot1532 3 years ago
No, basically he excluded 0.1% of the samples at each end (upper and lower). The lower threshold is the 0.1th percentile, which means he excluded the 0.1% of samples below it. The upper threshold is the 99.9th percentile, which means he excluded the 0.1% of samples above it.
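To make the explanation above concrete, the sketch below checks what share of rows falls below `quantile(0.001)` versus `quantile(0.01)`. The series of 10,000 evenly spaced values is a hypothetical stand-in for real data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the integers 1..10000.
s = pd.Series(np.arange(1, 10_001))

# Share of rows strictly below each threshold.
share_below = {q: float((s < s.quantile(q)).mean()) for q in (0.001, 0.01)}
print(share_below)
```

So `quantile(0.001)` is the 0.1th percentile and cuts off about 0.1% of the rows, while `quantile(0.01)` is the 1st percentile and cuts off about 1%.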
@nafisehyazdi9876 3 years ago
This tutorial was really good. Thank you.
@codebasics 3 years ago
Glad it was helpful!
@matlabtutorials2986 11 months ago
Sir, please make a video on HDBSCAN outlier detection too.
@tauhidanwar3512 4 years ago
I am confused about the percentages you used for the min and max thresholds. You said you used 1% as the min, but you used 0.001, which is 0.1%; and 99% as the max, but you used 0.999, which is 99.9%. Please clarify this, sir. Thanks for your hard work for us.
@codebasics 4 years ago
I actually meant 0.1% and 99.9%. The exact percentage or quantile can really vary on a case-by-case basis. You basically use your judgement to come up with these threshold values.
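Since the reply above leaves the thresholds to judgement, one way to inform that judgement is to look at several candidate quantiles side by side before committing to one. The sketch below is just one possible approach; the toy prices and the candidate list are assumptions.

```python
import pandas as pd

# Hypothetical toy prices; real data would be far larger.
price = pd.Series([1, 40, 45, 50, 52, 55, 58, 60, 65, 5000], name="price")

candidates = [0.90, 0.95, 0.99, 0.999]
cutoffs = price.quantile(candidates)

# For each candidate, show the cut-off value and how many rows exceed it.
report = pd.DataFrame(
    {
        "cutoff": cutoffs.values,
        "rows_above": [int((price > c).sum()) for c in cutoffs],
    },
    index=candidates,
)
print(report)
```

Reading the cut-offs against what is plausible in the domain (for example, a realistic maximum price) is what turns the numbers into a decision.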
@_Raz_ 1 year ago
This is good only for small datasets. If we have big datasets with many columns, it is very hard to apply.
@abhimistry9226 2 years ago
Thank you, Dhaval bhai
@immanuelsuleiman7550 4 years ago
Great content. You just gained a new subscriber. Thank you for your efforts, sir.
@codebasics 4 years ago
I am glad it was helpful
@santoshkumarmishra441 3 years ago
Nice explanation, sir, like always.
@haintuvn 4 years ago
"quantile(0.001, 0.999)": why do we choose 0.001 rather than 0.005 or some other value? Are there any rules or suggestions for choosing these numbers? Thank you, teacher!
@abhimanyutiwari100 2 years ago
Same question
@demoprog6878 2 years ago
This is a matter of intuition
@dhruvshah3394 8 months ago
Just one correction: @12:15 you mentioned that the minimum threshold is 1%, but isn't it actually 0.1%?
@javedj5338 3 years ago
Nice explanation. Clear.
@codebasics 3 years ago
I am glad you liked it
@Ashokkumar-ds1nq 4 years ago
The tutorial is just amazing. Can you please magnify your screen a bit for a clearer view?
@MindyBrockdesigns 2 years ago
What if you have several variables to fix outliers for in one dataset? For example, what if you wanted to remove outliers in both the 'price' and 'price-per-square' variables?
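For the multi-column question above, one possible approach is to compute the thresholds per column and keep only rows that are within range in every selected column. This is a sketch under assumptions: the column names and values are hypothetical, and the 5%/95% cut-offs are arbitrary.

```python
import pandas as pd

# Hypothetical data with extreme values in two different columns.
df = pd.DataFrame({
    "price": [1, 40, 45, 50, 52, 55, 58, 60, 65, 5000],
    "price_per_sqft": [4000, 4200, 10, 4500, 4700, 4800, 5000, 5200, 90000, 5500],
})

cols = ["price", "price_per_sqft"]
low = df[cols].quantile(0.05)   # one lower threshold per column
high = df[cols].quantile(0.95)  # one upper threshold per column

# A row is kept only if every selected column lies strictly inside its range.
mask = ((df[cols] > low) & (df[cols] < high)).all(axis=1)
df_clean = df[mask]
print(df_clean)
```

Note that every extra column trims more rows, so with many columns the share of data removed can grow quickly.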
@saisridatta844 3 years ago
Awesome, sir
@robertaraujo347 2 years ago
I'd like to know why you used the variable price per area (which is just the quotient of price and area) for the outlier treatment instead of using the Mahalanobis distance (which in fact takes into account the correlation between the variables), since you have 4 numerical columns. I hope someone can answer me. I'm new to this and I have so many questions about it :)
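The question above went unanswered in the thread. For readers curious what the Mahalanobis alternative looks like, here is a rough NumPy sketch on synthetic, correlated area/price data with one planted off-trend point. This is not the method used in the video, and all numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data: price roughly follows 0.05 * area.
area = rng.normal(1500, 300, size=500)
price = 0.05 * area + rng.normal(0, 5, size=500)
X = np.column_stack([area, price])

# Planted point: a typical area, but a price far off the trend.
X = np.vstack([X, [1500, 150]])

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mean

# Squared Mahalanobis distance of every row from the mean.
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
suspect = int(np.argmax(d2))
print(suspect, d2[suspect])
```

The planted point has an unremarkable area taken alone, but it gets the largest distance because it breaks the relationship between the two columns.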
@NikitaSharma-bs4gg 2 years ago
Thank you, really helpful
@soniadubey4773 3 years ago
Thanks. Very clearly explained.
@codebasics 3 years ago
Glad it was helpful!
@saramoeini4286 4 years ago
Thanks a lot. I have a question: should we do this procedure for all features one by one to detect outliers and then remove them?
@khushpatelmd 4 years ago
Shouldn't it be 97.5 and 2.5, as 95% of values are within 2 SD?
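The arithmetic behind the comment above can be checked with the standard library, assuming the data is normally distributed. It only shows how 2 SD relates to the 2.5th and 97.5th percentiles; it does not say which cut-offs suit a given dataset.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of a normal distribution within 2 standard deviations of the mean.
inside_2sd = z.cdf(2) - z.cdf(-2)

# z-score of the 97.5th percentile (the 2.5th percentile is its negative).
z_975 = z.inv_cdf(0.975)

print(round(inside_2sd, 4), round(z_975, 2))
```

About 95.45% of a normal distribution lies within 2 SD, and exactly 95% lies between the 2.5th and 97.5th percentiles, which sit at roughly ±1.96 SD.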
@yashikasorathia8639 3 years ago
Amazing tutorial, sir, but I have one question. How would you decide which quantile value to use when the dataset comes from a different domain than retail prices, such as a weather report or a sales report?
@covelus 10 months ago
If I understood your question properly (good one, BTW), I would say domain knowledge mixed with an initial statistical analysis of the dataset. Right?
@TheShubham743 4 years ago
Great video
@temurochilov 2 years ago
Thank you a lot, great content.
@ishandaar 4 years ago
Thanks, this video is really helpful
@codebasics 4 years ago
Glad it was helpful!
@bhaskartripathi 2 years ago
Good explanation, mate! However, you can apply fixed-percentile removal only on toy datasets. For real-world data you would use Hampel filters, Isolation Forest, rolling-window MAD, etc. for outlier removal.
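As a rough illustration of the MAD-based family mentioned in the comment above, the sketch below flags points by their distance from the median, measured in units of the median absolute deviation (the rule behind the Hampel identifier, applied here globally rather than in a rolling window). The toy data and the cut-off of 3 are assumptions.

```python
import pandas as pd

# Hypothetical toy prices.
price = pd.Series([1, 40, 45, 50, 52, 55, 58, 60, 65, 5000], name="price")

median = price.median()
mad = (price - median).abs().median()  # median absolute deviation

# 1.4826 rescales the MAD so it is comparable to a standard deviation
# when the data is roughly normal.
robust_z = (price - median) / (1.4826 * mad)

outliers = price[robust_z.abs() > 3]
print(outliers)
```

Unlike a fixed percentile, this rule flags nothing when no point is far from the median, so it does not automatically discard a fixed share of the data.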
@preeethan 4 years ago
What if we want to treat the outliers rather than remove them? Which is the better practice?
@codebasics 4 years ago
You can treat or remove them depending on the situation; either can be a good choice.
@yogeshbharadwaj6200 3 years ago
Thanks for the great video.
@Unclear-Reality 1 year ago
Thank you, Sir
@mithilanavishka4531 2 years ago
Sir, how do we decide which column to use for removing outliers? What is the logic for finding that column? I mean, why can't we use other fields like total_sqft? When I ran describe on the data, I saw that 75% of samples have total_sqft less than 1672, while the max total_sqft is 52272. I wanted to remove the rows with the largest total_sqft. Is the way I thought about it wrong?
@mohlagare3417 3 years ago
Thanks a lot for sharing your knowledge and skills. Can you please give us the dataset? Thanks in advance.
@AlonAvramson 3 years ago
Thank you!
@sudhanshusingh8508 4 years ago
Hello Sir, I have watched your video and it's nice. Since I am new to data analytics, I have fair theoretical knowledge of quantiles, but I don't know on what basis you choose the quantile level. I mean, what do you look for in the output of the describe command for a table? Please help me with this.
@suryanshsingh2873 1 year ago
How do you find where the outliers are, given that there are so many variables in the dataset?
@chrschra 3 years ago
Nice explanation, thx! But what should I do if my data points follow an exponential distribution?
@manish17788 2 years ago
What if the data has no outliers? In that case, will we lose a small amount of data? How do we know whether outlier removal is needed in a big dataset?
@ankitac4994 2 years ago
Are there outliers in categorical data?
@amc8437 3 years ago
How about using a log transformation to remove the skewness? Doesn't it do a similar job to the min and max thresholds?
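On the log-transform question above, a small experiment shows the difference: a log transform keeps every row and compresses large values, whereas percentile thresholds drop rows. The sketch uses synthetic right-skewed data; the lognormal parameters are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed "prices".
price = pd.Series(rng.lognormal(mean=4, sigma=1, size=1000), name="price")

# log1p computes log(1 + x); it keeps all rows but shrinks the long right tail.
log_price = np.log1p(price)

print(price.skew(), log_price.skew())
```

The skewness drops sharply after the transform, but no row is removed, so values that are genuine errors are still in the data.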
@zainnaveed267 2 years ago
min_thresold = df['price'].quantile(0.01)
_thresold = df['price'].quantile(0.9999)
@sagar8460830871 4 years ago
Please make a video on how to set up a GPU laptop for deep learning projects. I have a GPU laptop, but when I start training, the GPU does not process my task; the whole task runs on the CPU. I have a 4 GB NVIDIA GTX 1650 graphics card.
@mitalipatle4993 1 year ago
Can we use this method with a larger dataset?
@YO-in2ij 2 years ago
Thank you
@ismailkaracakaya260 1 year ago
But how do you know if the outlier value is above the 95th percentile?
@haneulkim4902 2 years ago
There are different ways to remove outliers. When should you use which?
@raghuram6382 3 years ago
Is it possible to filter out the outliers of multiple columns in a single program? Please do let me know.
@mayanktripathi4u 4 years ago
Are outliers and imbalanced data the same concept or different? If different, could you please share some information? Based on the definitions I found, both seem to be the same, but they have different methods for detection and removal, so I am a bit confused.
@codebasics 4 years ago
They are different concepts. By imbalanced, you are most likely referring to imbalanced datasets in machine learning, where one class label has far fewer samples than another class label.
@mayanktripathi4u 4 years ago
@codebasics Does that mean imbalance is mainly about the target/class label, and outliers are about the other features in the dataset?
@ashfakurrahman79 1 year ago
Why quantile? And how does quantile work?
@mihirit7137 1 year ago
The outliers in this video are the mean prices today 😅
@sa89879 4 years ago
Can you do an example of removing outliers using a box plot?
@codebasics 4 years ago
Yes, that is coming up
@sa89879 4 years ago
@codebasics Thank you very much
@panduenglishacademy7856 4 years ago
I downloaded data from Kaggle to do hands-on practice, but when I import the CSV file in Jupyter Notebook by its name, I get an error (name error, file not found). Please help me solve this issue.
@panduenglishacademy7856 4 years ago
Attribute error: module pandas has no attribute 'read'
@swapnshah3234 4 years ago
@panduenglishacademy7856 df = pd.read_csv('yourfilename.csv'). It should be in this format.
@trashantrathore4995 2 years ago
Hi, did anyone do the exercise? The main quantile-based outlier removal is done using the process shown, but what should we do with the 10k NaN values in the "last_review" and "reviews_per_month" columns? Apart from this exercise, if we encounter that many NaN values, what should we do? Any suggestions?
@malinibhattacharyya304 2 years ago
Sir, how do I get Jupyter Notebook?
@JatinSharma-tu2zg 3 years ago
Sir, I need the second dataset.
@akashgaddam6320 3 years ago
How can we fix the threshold value?
@muditsrivastava7719 4 years ago
Sir, I am not able to download this CSV file. It says the data is very large. What should I do?
@codebasics 4 years ago
Can you just git clone it?
@muditsrivastava7719 4 years ago
@codebasics Yes sir, I did, but it says we can't make it into a CSV (only raw data is available).
@AnushkaSingh-YearBTechChemical 2 years ago
This won't work if the data contains NA values.
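Regarding NA values, as raised in the comment above: pandas `quantile()` ignores NaN when computing the thresholds, but comparisons against NaN evaluate to False, so rows with a missing value are silently dropped by the filter. The sketch below, on hypothetical toy data, shows both behaviours and one way to keep the NaN rows if that is what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing prices.
df = pd.DataFrame({"price": [1, 40, np.nan, 50, 52, 55, np.nan, 60, 65, 5000]})

# pandas skips NaN here, so both thresholds are ordinary numbers.
low = df["price"].quantile(0.05)
high = df["price"].quantile(0.95)

in_range = (df["price"] > low) & (df["price"] < high)

trimmed = df[in_range]                                # NaN rows are dropped too
trimmed_keep_nan = df[in_range | df["price"].isna()]  # NaN rows are kept

print(len(trimmed), len(trimmed_keep_nan))
```

Whether missing values should be dropped, kept or imputed is a separate decision from outlier removal and depends on the column.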
@jaitiwari241 2 years ago
Where can I find the 'bhp.csv' file?
@tharallaanil4115 3 years ago
Dataset please, sir
@codebasics 3 years ago
You can find it on the GitHub page
@shubhamtyagi4962 4 years ago
Exercise done: github.com/styagi9817/oulier-detection-using-quantile-funtion
@codebasics 4 years ago
That's the way to go, Shubham. Good job working on that exercise.
@borgir6368 4 years ago
Who else spotted Binod at 4:09? lol ;()
@jeevan999able 3 years ago
binod
@krutkanjiya2434 3 years ago
4:00 Binod xD
@allasrinivasulu1464 4 years ago
Thank you so much, sir