How to Do Data Cleaning (step-by-step tutorial on real-life dataset)

  168,623 views

Mısra Turp

1 day ago

Comments: 130
@LightHouse31073 2 years ago
I would highly recommend new data analysists to participate in this data cleaning process you present. The level of detail is impeccable 👌
@misraturp 2 years ago
Great to hear Loyiso, thank you!
@LightHouse31073 2 years ago
@misraturp My pleasure. You're so good. (*meant to say Data Analysts and not the gibberish above🤭)
@ExtraKanin 2 years ago
As someone with a short attention span these days, I'd like to pat myself on the back for being able to sit through the entirety of the video. Data is really interesting! Thanks, Misra :)
@misraturp 2 years ago
Great to hear that you enjoyed it! :)
@cybiryan 1 year ago
Thank you for showing us these techniques. And I love the pace, thank you!
@simenandreasknudsen9272 3 years ago
This is great Misra, thanks! :) Please make more of these!
@misraturp 3 years ago
Thank you!
@fehmidatahir2509 2 years ago
Thank you SO much! I am totally in love with the content you share!
@misraturp 2 years ago
You're very welcome :)
@antonioarana8002 2 years ago
PERFECTION, all this explanation and process. Thanks Misra! You are a master, thanks a lot!
@misraturp 2 years ago
You're very welcome :) and thank you for your nice words Antonio :)
@iqraasif3783 1 year ago
This is gold. Learned a lot from your videos. Thanks!
@prekshagampa5889 2 years ago
Hey Misra! I am new here and I love your content. And thank you for this wonderful explanation. It's really helpful to get a clear view of how things work.
@misraturp 2 years ago
That’s great to hear. Thank you!
@fernandoraposo5038 3 years ago
That was the video that I was searching for. Thanks :)
@misraturp 3 years ago
Awesome! You are very welcome.
@aaratigeorge2904 2 years ago
This was a wonderful tutorial wherein you also gave insights on what things you consider while data cleaning, thank you.
@misraturp 2 years ago
You're very welcome!
@ahmedkaram-ws3hs 3 years ago
Thank you Misra for this great content. Please keep making videos, it's very, very helpful and your explanation is great.
@misraturp 3 years ago
Thank you Ahmed!
@atharvasavdekar7613 3 years ago
You deserve more appreciation. Great Content!!!
@misraturp 3 years ago
Thank you so much 😀
@animelover5093 1 year ago
Great content! This step-by-step guide has been extremely helpful in my journey.
@KonradTamas 1 year ago
Smart bute :) and great content ! Thanks
@la-vieborde9825 2 years ago
Thank you so much. I am a new data scientist and these videos are very helpful.
@misraturp 2 years ago
Great to hear!
@mohammedansar8023 3 years ago
Amazing... My search has ended here... Please continue uploading more videos...
@misraturp 3 years ago
That's awesome to hear! Thank you.
@KhaliDALKhafaji 1 year ago
That was a very great illustration, thanks a lot.
@misraturp 1 year ago
You're very welcome!
@abdelrhmanrhyaseen6194 1 year ago
This was really amazing, thank you.
@deandu3414 9 months ago
Thank you for the amazing lecture.
@raminsadeghnasab9310 2 years ago
That was amazing. Thanks for your time.
@misraturp 2 years ago
You're very welcome :)
@FRANKWHITE1996 3 years ago
Thanks for sharing! 👍
@misraturp 3 years ago
Of course!
@BillusTinnus 2 years ago
So this is what data scientists do... Nice video ! :)
@misraturp 2 years ago
Yes, thanks!
@judyostroot8682 3 years ago
Wonderful tutorial! This is exactly the type of content I was looking for. When you are cleaning data in a work environment, do you document all the changes you make?
@misraturp 3 years ago
Hey Judy, I’m happy to hear that! I did not document the cleaning steps most of the time. But having clear comments in the code itself is very useful: they help the people who maintain the code after you understand why you did something.
@paragandozdroch3791 1 year ago
Thank you Misra for the detailed video, very helpful. I am wondering what the shortcut is for the search drop-down bar at 29:18 in your video. Thanks
@asadghnaim2332 3 years ago
Please more on data cleaning
@misraturp 3 years ago
Noted!
@rwejolandacademy 1 year ago
I loved you from the time I see you. TYJ
@8CountLife 3 years ago
I'm very appreciative of your channel
@misraturp 3 years ago
Thank you 8CountLife!
@hassanmahamat-pz8fx 8 months ago
Good explanation.
@onyinyeobijiofor7075 3 years ago
This is so nice and clean 👌
@misraturp 3 years ago
Thank you! Cheers!
@harikishan437 2 years ago
It's really amazing, I learned a lot of new coding lines here as well as the concept too... Thanks a lot @Misra Turp 😇😇😇😇😇
@misraturp 2 years ago
That's great to hear, thank you!
@CresentX 3 years ago
Good work Misra
@misraturp 3 years ago
Thank you :)
@NotFog1 2 years ago
So good, thanks a lot!
@misraturp 2 years ago
You're very welcome!
@haliltezel8106 2 years ago
Thank you for the content Mısra, data is not always as clean as in tutorials :) Good job, keep going my friend
@misraturp 2 years ago
Thank you!
@okotpascal 2 years ago
The playlist doesn't seem to be arranged in an order that makes it easy to follow the videos. Otherwise great content, and thank you for the tutorials.
@alifiaz7792 3 years ago
Very well explained. During the cleaning you arbitrarily took the 25th and 75th percentiles as limits for cleaning tree diameter. Can you recommend a more systematic approach to selecting these lower and upper quantile values, so that appropriate treatment can be applied to the values below and above?
@misraturp 3 years ago
Hey Ali, great question. My main goal was to not drag the video out for very long, so I didn't go very deep into that decision. You can read a good description on this page (machinelearningmastery.com/how-to-use-statistics-to-identify-outliers-in-data/) under the "Interquartile Range Method".
@alifiaz7792 3 years ago
@misraturp Thanks Misra for sharing the link
@nicolaslpf 1 year ago
Most people use IQR
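For readers following this thread, the interquartile range (IQR) rule mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the diameter values are made up, and the 1.5 multiplier is the common convention for the rule rather than something stated in the thread.

```python
import pandas as pd

# Made-up tree diameters for illustration only (not the real dataset).
diameters = pd.Series([1, 3, 5, 8, 10, 12, 15, 20, 59])

# First and third quartiles, and the spread between them.
q1 = diameters.quantile(0.25)
q3 = diameters.quantile(0.75)
iqr = q3 - q1

# Conventional fences: values beyond 1.5 * IQR from the quartiles
# are flagged as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = diameters[(diameters < lower_fence) | (diameters > upper_fence)]
print(outliers.tolist())
```

Whether flagged values are dropped, capped, or inspected by hand still depends on the dataset and the goal.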
@diegomartins7214 1 year ago
Thank you!
@ecitahpi385 2 years ago
Dear Misra, thank you for your tutorial. From 19:40 onward, the merge code and the other code that follows unfortunately do not work on my side. Is it possible to share your source code on GitHub, too?
@searchbug 2 years ago
Wow! Not everyone gets to share this kind of in-depth walkthrough. Thanks for sharing this, Misra! For our dear friends who are not really techy or have no experience dealing with code, you may learn the basics or find other ways to clean data. With bulk validation, for example, the only thing you need to do is make sure that your data sets are well organized, with proper headings. You can then simply upload them to third-party software that will do the verification for you and let you know which data sets are outdated, invalid, or inactive.
@misraturp 2 years ago
You are welcome! Thank you for the addition.
@searchbug 2 years ago
@misraturp My pleasure! Thank you for considering it :)
@idongessien2245 2 years ago
Not only are you pretty but freaking intelligent. (Forgive my choice of words, but I'm always straightforward). I was almost frustrated at some point... Thank God I ran into your channel
@kunjalsahu3504 2 years ago
Great content ma'am for freshers, keep up the good work. New subbie
@misraturp 2 years ago
Welcome!
@hamidmirza333 1 year ago
Grateful for the awesome tutorials. How do I split the dataset into training and testing sets? I am working on EMG signal classification using an SVM classifier. I am confused about how to split the data set for the classification task.
@asadghnaim2332 3 years ago
Thank you a lot Misra :D
@misraturp 3 years ago
You're welcome 😊
@dominicatuahene7303 2 years ago
Hi Misra, please share the link to the video before this one
@misraturp 2 years ago
Here it is: kzbin.info/www/bejne/hYqXloSomtCErNU
@HasanKarakus 2 years ago
Great explanation!!
@misraturp 2 years ago
Thank you!
@shivamagarwal587 2 years ago
mask = ((tree_census_subset['status'] == "Stump") | (tree_census_subset['status'] == "Dead"))
This line gives me an error. Can you explain what the problem is here?
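The quoted line is valid pandas syntax on its own, so the error probably comes from somewhere else. Since no error message is given, this is only a guess: common causes are `tree_census_subset` not being defined yet, a missing `status` column, curly quotes pasted from a web page, or dropped parentheses around each comparison. The sketch below runs the same mask on a small made-up DataFrame and shows an equivalent `isin` form.

```python
import pandas as pd

# Made-up stand-in for the DataFrame used in the video.
tree_census_subset = pd.DataFrame(
    {"status": ["Alive", "Stump", "Dead", "Alive"]}
)

# Each comparison needs its own parentheses because | binds
# more tightly than == in Python.
mask = (tree_census_subset["status"] == "Stump") | (
    tree_census_subset["status"] == "Dead"
)

# Equivalent and shorter: test membership in a list of values.
same_mask = tree_census_subset["status"].isin(["Stump", "Dead"])

# Select only the rows where the mask is True.
stumps_and_dead = tree_census_subset[mask]
print(stumps_and_dead)
```

If this runs but the original line does not, checking `tree_census_subset.columns` is a reasonable next step.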
@govindant8360 3 years ago
Very useful for me, ma'am.
@misraturp 3 years ago
Thanks a lot
@rangabharath4253 3 years ago
Awesome 👍😎
@misraturp 3 years ago
Thanks ✌️
@ashikinfodu 1 year ago
Hi Misra, a quick question: what do you do with missing values created by logical/skip questions in a survey? Thanks!
@md.alamintalukder3261 1 year ago
Great ❤
@tejaswaroopdasari1495 3 years ago
Your look Great by subject wise Content
@misraturp 3 years ago
Thank you.
@prasadxev 5 months ago
when angel teaching programming
@shameermasroor7375 1 year ago
Hello, Misra! Great video! I had a question. When I run the tree_census_subset['steward'].value_counts() statement, pandas does not return the None count, or the count of the trees which do not have a steward. Has there been some change in the value_counts() function or am I doing something wrong?
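One possible explanation, offered as an assumption rather than a confirmed answer: `value_counts()` leaves out missing values by default, and newer pandas releases (2.0 and later, as far as I know) parse the literal text "None" in a CSV as missing. The sketch below uses a tiny made-up CSV to show two ways to get the count back.

```python
import io

import pandas as pd

# Tiny made-up CSV; the real file and its columns may differ.
csv_text = "tree_id,steward\n1,None\n2,1or2\n3,None\n4,3or4\n"

# Default parsing: depending on the pandas version, "None" may become NaN.
default_df = pd.read_csv(io.StringIO(csv_text))

# dropna=False keeps missing values in the tally either way.
counts_with_missing = default_df["steward"].value_counts(dropna=False)

# keep_default_na=False disables the built-in list of NA strings,
# so "None" stays an ordinary text value.
raw_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
counts_raw = raw_df["steward"].value_counts()

print(counts_with_missing)
print(counts_raw)
```

Note that `keep_default_na=False` also stops empty cells from being read as missing, so it changes how the rest of the file is parsed.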
@shuklajitechnicals2907 1 year ago
Hey, what code font are you using?
@ankitsrivastava06 1 year ago
Please provide the dataset CSV files for practice
@mkhex87 2 years ago
Isn't it often useful to leave missing values as np.NaN so that your aggregation functions skip over them but still compute?
@misraturp 2 years ago
Depending on your goal that would be helpful. When preparing data for a machine learning task, we would like to get rid of or correct as many missing values as possible so they don't crash the training process.
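Both points in this exchange can be seen in a short sketch. The values are made up, and median imputation is just one possible choice, not necessarily what the video does.

```python
import numpy as np
import pandas as pd

# Made-up diameters with two missing entries.
diameters = pd.Series([4.0, np.nan, 10.0, np.nan, 16.0])

# pandas aggregations skip NaN by default (skipna=True).
mean_skipping_nan = diameters.mean()

# With skipna=False the missing values propagate into the result.
mean_strict = diameters.mean(skipna=False)

# Many model-training routines reject NaN inputs,
# so the gaps are usually dropped or filled beforehand.
dropped = diameters.dropna()
filled = diameters.fillna(diameters.median())

print(mean_skipping_nan, mean_strict)
print(filled.tolist())
```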
@samirmendhe7387 2 years ago
You made this so beautiful
@misraturp 2 years ago
Thank you!
@hilloldasfisheuphoria 2 years ago
Hello Misra, while enrolling in the course "deep learning 101 with python and keras", a coupon code is asked for. How do I get it? Where do I get the coupon code to fill in the "Add Coupon Code" field?
@misraturp 2 years ago
Hey Hillol, that field is optional. I occasionally have campaigns, then I create a coupon code. There are no eligible coupons right now.
@vinayakdixit2855 2 years ago
@19:40
@minhhapham3010 1 year ago
You are so beautiful, and the content of the video is very useful for new learners. Many thanks
@ekaterinakorneeva4792 11 months ago
Please make the links stay on the screen for more than 1 second, it would be much more convenient. Thank you.
@MOHANAMona-yq3dl 9 months ago
Where can I get the data set?
@misraturp 9 months ago
I believe this is the dataset I'm using. www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
@MOHANAMona-yq3dl 9 months ago
@misraturp Thank you ✨
@DataSet 1 year ago
I'm ready to be cleaned
@um1541 2 years ago
Would it be better to find 75% and 25% of the max value, instead of using percentiles? 15 doesn't look like 75% of 59, so I assume we are editing 50% of the data.
@mad1337nes 1 year ago
Those are the quartile markers, not percentages of the maximum. The 25th percentile is the value below which 25% of the data falls, and the 75th percentile is the value below which 75% of the data falls. Neither is 75% of the max value.
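The difference between a percentile and a percentage of the maximum can be checked with a small example. The numbers are made up, chosen so that the 75th percentile is 15 and the maximum is 59, as in the question above.

```python
import pandas as pd

# Made-up diameters; only the 15 and 59 echo the numbers in the thread.
diameters = pd.Series([1, 3, 5, 8, 10, 12, 15, 20, 59])

# Percentiles are positions in the sorted data, not fractions of the max.
q1 = diameters.quantile(0.25)
q3 = diameters.quantile(0.75)

# Share of values at or below the 75th percentile: roughly three quarters.
share_at_or_below_q3 = (diameters <= q3).mean()

# 75% of the maximum is a different number, dominated by the one large value.
three_quarters_of_max = 0.75 * diameters.max()

print(q1, q3, share_at_or_below_q3, three_quarters_of_max)
```

Because one large value sets the maximum, a cutoff based on a fraction of the max is much more sensitive to outliers than one based on percentiles.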
@parkuuu 2 years ago
Hello Misra,
For the last part (substituting the diameter with the Q1 and Q3 values), wouldn't it be better to retain the original column and just add a new one based on a condition (like np.where: if actual dia < Q1, use Q1, else dia), so the two can be compared side by side?
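The suggestion above could look something like the sketch below. The column names `tree_dbh` and `tree_dbh_capped` and the values are hypothetical, chosen for illustration. `Series.clip` is shown as a shorter way to get the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical column name and made-up values.
trees = pd.DataFrame({"tree_dbh": [1, 3, 5, 8, 10, 12, 15, 20, 59]})

q1 = trees["tree_dbh"].quantile(0.25)
q3 = trees["tree_dbh"].quantile(0.75)

# Write the capped values to a new column so the original stays intact.
trees["tree_dbh_capped"] = np.where(
    trees["tree_dbh"] < q1,
    q1,
    np.where(trees["tree_dbh"] > q3, q3, trees["tree_dbh"]),
)

# clip() applies the same lower and upper bounds in a single call.
clipped = trees["tree_dbh"].clip(lower=q1, upper=q3)

print(trees)
```

Keeping both columns makes it easy to count how many rows were changed, for example with `(trees["tree_dbh"] != trees["tree_dbh_capped"]).sum()`.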
@مسافر-ح2ط 7 months ago
I thought cleaning should be before exploration?
@dr.emmrich 4 months ago
How do you clean data you haven't explored? The exploration actually reveals what to clean.
@kiwi-mf2do 3 months ago
Can't this get automated with Generative AI?
@antukhan5592 1 year ago
Can you share the GitHub code?
@hilloldasfisheuphoria 2 years ago
Misra, I submitted my "first name" and "email" to get the "Pandas Cheat Sheet (free)" three times, but I have not received any mail yet!
@misraturp 2 years ago
Hey Hillol, could you check your spam folder? It looks like the email was sent.
@keidran_r3 1 year ago
Data Wrangler, a VS Code extension. You're welcome.
@ttffan658 2 years ago
Cute smile
@salimayad2151 1 year ago
She can fix me
@rohitbuddabathina 3 years ago
Did anyone tell you that you resemble Angelina Jolie? 😀
@misraturp 3 years ago
Not until now. I'm flattered. 😅
@rohitbuddabathina 3 years ago
@misraturp I hope you run a correlation test on your face and Angelina's face. I really wanna see the results 😀
@misraturp 3 years ago
@rohitbuddabathina Haha sure. :D
@msumode4493 1 year ago
Visiting for your face.
@theamithsingh 1 year ago
Great video series @misraturp
@misraturp 1 year ago
Thank you!
@misraturp 3 years ago
👉 Get real world data science experience by doing hands-on work www.misraturp.com/hods
@namaniitkanpur5697 2 years ago
You look very beautiful