Real World Data Cleaning in Python Pandas (Step By Step)

48,054 views

Ryan Nolan Data

11 months ago

In this video, I show you how to clean up data with Python Pandas in a Jupyter notebook. This Python tutorial is great for those trying to get into Data Analytics or Data Science.
Cricket Data: www.espncricinfo.com/records/...
Everything is coded in Python with Pandas, inside a Jupyter notebook.
Interested in discussing a Data or AI project? Feel free to reach out via email or simply complete the contact form on my website.
📧 Email: ryannolandata@gmail.com
🌐 Website & Blog: ryannolandata.com/
🍿 WATCH NEXT
Python for Data Analyst and Scientists Playlist: • Python Tutorials
Python Groupby: • The Complete Guide to ...
Python Pandas Interview Questions: • 23 Python Pandas Codin...
Python Lambda Functions: • Python Pandas Lambda F...
MY OTHER SOCIALS:
👨‍💻 LinkedIn: / ryan-p-nolan
🐦 Twitter: / ryannolan_
⚙️ GitHub: github.com/RyanNolanData
🖥️ Discord: / discord
📚 *Practice SQL & Python Interview Questions: stratascratch.com/?via=ryan
WHO AM I?
As a full-time data analyst/scientist at a fintech company specializing in combating fraud within underwriting and risk, I've transitioned from my background in Electrical Engineering to pursue my true passion: data. In this dynamic field, I've discovered a profound interest in leveraging data analytics to address complex challenges in the financial sector.
This YouTube channel serves as both a platform for sharing knowledge and a personal journey of continuous learning. With a commitment to growth, I aim to expand my skill set by publishing 2 to 3 new videos each week, delving into various aspects of data analytics/science and Artificial Intelligence. Join me on this exciting journey as we explore the endless possibilities of data together.
*This is an affiliate program. I may receive a small portion of the final sale at no extra cost to you.

Comments: 72
@ArmanKHAN-bj9iv 11 months ago
Fantastic tutorial! Your step-by-step guide on data cleaning in Python Pandas was excellent. Clear explanations and practical examples made it easy to follow along. Looking forward to more of your uploads. Keep up the great work!
@RyanNolanData 11 months ago
Thank you! I’ll have another Python video up this week as well as more coming soon!
@AJAY7509 1 month ago
This video really helped me, man. I was trying to learn about pandas and it popped up in my notifications. Thanks for the video.
@RyanNolanData 1 month ago
No problem. Check out my other pandas vids; I have a full playlist.
@nickdaboss03 11 months ago
You work super hard and put out really good content. Keep it up, man. I'm looking forward to watching you grow!
@RyanNolanData 11 months ago
Thank you! I have another video ready to go later this week, and I'm 90% done with another Python interview question video.
@Al-Ahdal 1 month ago
@Ryan Nolan: Excellent Video. Very clearly explained. I'm looking forward to watching you grow!
@RyanNolanData 1 month ago
Much appreciated!
@koo5867 3 months ago
Now that's some cool content. This is exactly what I wanted. Thanks, bro 🙏🏼 Keep helping poor students like us! 😌
@RyanNolanData 3 months ago
No problem
@nagamanickam6604 7 months ago
Thank you, Ryan Nolan.
@RyanNolanData 7 months ago
No problem
@tapspasi2319 1 month ago
Amazing! Very good presentation
@RyanNolanData 1 month ago
Thank you
@loydteds3944 26 days ago
Your video is very helpful! One question, though: how do you remove duplicates in high-dimensional data, let's say with 500 duplicates? Thanks.
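On the duplicates question, here is a minimal sketch using pandas' `duplicated` and `drop_duplicates`. The column names and values are made up for illustration and are not taken from the video's dataset. The same calls work regardless of how many columns or duplicate rows there are; `subset` is one way to limit the comparison to the columns that identify a record.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 in every column.
df = pd.DataFrame({
    "Player": ["A", "B", "A", "C"],
    "Runs": [100, 50, 100, 75],
    "Inns": [10, 5, 10, 8],
})

# Count rows that repeat an earlier row across every column.
n_dupes = int(df.duplicated().sum())

# Drop the repeats, keeping the first occurrence of each row.
deduped = df.drop_duplicates(keep="first").reset_index(drop=True)

# With many columns, compare only the ones that identify a record.
deduped_by_key = df.drop_duplicates(subset=["Player"]).reset_index(drop=True)

print(n_dupes)
print(deduped)
```

Checking `df.duplicated().sum()` before and after is a quick way to confirm the rows were actually removed.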
@user-iu5nz2gy6l 2 months ago
Thanks, I appreciate this tutorial. Just one question on Q5: why is it already a data frame, while we have to use to_frame for Q4? Thanks.
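The notebook's Q4 and Q5 are not reproduced on this page, so the following is only a general sketch of when pandas returns a Series and when it returns a DataFrame. It is not a reconstruction of the video's code, and the column names are hypothetical. Selecting one column with single brackets gives a Series, which `to_frame` converts; selecting with a list of columns already gives a DataFrame.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Player": ["A", "B", "C"], "Runs": [100, 50, 75]})

as_series = df["Runs"]           # single brackets: a Series
as_frame = df[["Runs"]]          # list of columns: a DataFrame
converted = as_series.to_frame() # Series turned into a one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```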
@yankoshuan6225 8 months ago
I guess by watching your videos while preparing my own portfolio, I am halfway there. Thanks a lot.
@RyanNolanData 8 months ago
No problem. My first batch of classification vids is done; I'm working on regression now.
@prathmesh_jadhav8930 1 month ago
Brother, you're doing awesome... Upload more videos related to data analysis.
@RyanNolanData 1 month ago
I have a full playlist of 70ish vids! Working on more though
@lewismurigi3623 3 months ago
This was so helpful. Thanks, man.
@RyanNolanData 3 months ago
No problem
@tianbowen721 1 month ago
Pretty amazing :) and I'd say it's some dense content to fit in 40 mins. I learned a lot.
@RyanNolanData 29 days ago
Awesome
@far3582 4 months ago
I am trying to move away from R, and this is a great video. Thanks Ryan!
@RyanNolanData 4 months ago
No problem, best of luck.
@SuccessGossips 1 month ago
The star means the highest score was made not out. You don't need to remove it.
@ArhamZaiem 8 days ago
In the highest inns score, why didn't you use rstrip to remove * instead of split?
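To compare the two approaches raised here, a small sketch on made-up highest-score values (not the video's data). On values where `*` only ever appears as a trailing marker, `str.split('*').str[0]` and `str.rstrip('*')` give the same result, so either works; `rstrip` just states the intent more directly.

```python
import pandas as pd

# Hypothetical highest-score values; "*" marks a not-out innings.
scores = pd.Series(["334", "400*", "299*", "270"])

# Split on "*" and keep the part before it.
via_split = scores.str.split("*").str[0]

# Strip any trailing "*" characters.
via_rstrip = scores.str.rstrip("*")

# Either result can then be converted to integers.
as_int = via_rstrip.astype(int)

print(via_split.equals(via_rstrip))
print(as_int.tolist())
```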
@satishharijan7280 7 months ago
Nice lecture, bro. Thanks for this; it is a useful video for me.
@RyanNolanData 7 months ago
No problem
@yvonnemukhono3566 1 month ago
Very helpful.
@RyanNolanData 1 month ago
No problem
@pradeeppadeliya 11 months ago
This is the best tutorial .... 👍👍👍👍👍👍👍👍👍👍👍👍👍👍
@RyanNolanData 11 months ago
Means a ton, thank you.
@Al-Ahdal 1 month ago
@Ryan Nolan: Your videos are great indeed. Could you do a comprehensive series on "Data Analytics & Visualization"? Thanks.
@RyanNolanData 1 month ago
I have a full data analyst playlist, check it out.
@Al-Ahdal 1 month ago
@RyanNolanData, could you please tag it or link to it? Thanks.
@RyanNolanData 1 month ago
@Al-Ahdal kzbin.info/aero/PLcQVY5V2UY4JrrKi2bW7DdOD08shTs4QQ
@benayawilly6536 10 months ago
Good work. Keep it up.
@RyanNolanData 10 months ago
Thank you! I just uploaded a new video
@user-bf9lq6bb5s 3 months ago
Overall it was a great effort, and your hard work is much appreciated. I would like to know how to remove or drop null values from the columns. Thanks in advance.
@RyanNolanData 3 months ago
Look up dropna.
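For the null-value question, a minimal sketch of `dropna` on made-up data (the column names are hypothetical). By default it drops any row containing a null; `subset` restricts the check to specific columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one null in each column.
df = pd.DataFrame({
    "Player": ["A", "B", None, "D"],
    "Runs": [100.0, np.nan, 30.0, 75.0],
})

# How many nulls each column holds.
null_counts = df.isna().sum()

# Drop every row that has a null in any column.
any_null_dropped = df.dropna()

# Drop rows only when the "Runs" column is null.
runs_null_dropped = df.dropna(subset=["Runs"])

print(null_counts)
print(any_null_dropped)
print(runs_null_dropped)
```

`dropna` returns a new DataFrame, so the result needs to be assigned back (for example `df = df.dropna()`) for the change to stick.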
@user-bf9lq6bb5s 3 months ago
@RyanNolanData Cheers, man... any advice on how to remove a year from a column? For instance, if a column has numeric and year values and I want to remove only the year (in a format like 2004).
@khan07700 27 days ago
Sir, when we import data from the site into a table, I'm not getting the option of Table 0. What's the solution for that, at 1:54?
@salfrat55 11 months ago
"FS Jackson played for Cambridge University, Yorkshire and England. He spotted the talent of Ranjitsinhji when the latter, owing to his unorthodox batting and his race, was struggling to find a place for himself in the university side, and as captain was responsible for Ranji's inclusion in the Cambridge First XI and the awarding of his Blue. According to Alan Gibson this was "a much more controversial thing to do than would seem possible to us now". He was named a Wisden Cricketer of the Year in 1894. He captained England in five Test matches in 1905, winning two and drawing three to retain The Ashes. Captaining England for the first time, he won all five tosses and topped the batting and bowling averages for both sides, with 492 runs at 70.28 and 13 wickets at 15.46. These were the last of his 20 Test matches, all played at home as he could not spare the time to tour."
@RyanNolanData 11 months ago
Didn't know this. It's a really cool story. Like Branch Rickey in baseball.
@pavankalyan_297 7 months ago
The star in the Highest score column means they were not out till the end of the match. Great tutorial, Ryan. Would it be possible for you to attach the notebook file here?
@RyanNolanData 7 months ago
Thank you, and I can look at adding the code to GitHub this weekend.
@user-vl3hm9hv3x 17 days ago
Bro... you should have used the replace method with regex for cleaning *, + etc. chars from the columns.
@RyanNolanData 17 days ago
I used regex in my latest project and have a video coming out on it soon, funny enough.
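A short sketch of the regex approach suggested above, using `Series.str.replace` with `regex=True`. The sample values are invented for illustration. The character class `[*+]` matches either marker wherever it appears, so one call removes both.

```python
import pandas as pd

# Hypothetical raw values carrying "*" and "+" markers.
raw = pd.Series(["400*", "56+", "78", "12*+"])

# Remove every "*" or "+" in one pass.
cleaned = raw.str.replace(r"[*+]", "", regex=True)

# The cleaned strings can then be converted to integers.
as_int = cleaned.astype(int)

print(as_int.tolist())
```

Passing `regex=True` explicitly matters here: recent pandas versions treat the pattern as a literal string by default.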
@davideschreiber2821 7 months ago
Lots of good stuff here, but I finally gave up at 31:24. If you're confused about what's happening, imagine how confused we learners are as you bounce around from cell to cell copying-pasting-deleting-trying again, trying to figure things out.
@RyanNolanData 7 months ago
Bugs are part of programming and no one is perfect. I show how it’s solved and why it happens
@MrFravallec 3 months ago
Great tutorial, got this issue on the data types:

AttributeError                            Traceback (most recent call last)
Cell In[11], line 1
----> 1 df['Inns']= df["Inns"].str.split(pat = '*').str[0]

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:5902, in NDFrame.__getattr__(self, name)
   5895 if (
   5896     name not in self._internal_names_set
   5897     and name not in self._metadata
   5898     and name not in self._accessors
   5899     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5900 ):
   5901     return self[name]
-> 5902 return object.__getattribute__(self, name)

File ~\anaconda3\Lib\site-packages\pandas\core\accessor.py:182, in CachedAccessor.__get__(self, obj, cls)
    179 if obj is None:
    180     # we're accessing the attribute of the class, i.e., Dataset.geo
    181     return self._accessor
--> 182 accessor_obj = self._accessor(obj)
    183 # Replace the property with the accessor object. Inspired by:
    184 # www.pydanny.com/cached-property.html
    185 # We need to use object.__setattr__ because we overwrite __setattr__ on
    186 # NDFrame
    187 object.__setattr__(obj, self._name, accessor_obj)

File ~\anaconda3\Lib\site-packages\pandas\core\strings\accessor.py:181, in StringMethods.__init__(self, data)
    178 def __init__(self, data) -> None:
    179     from pandas.core.arrays.string_ import StringDtype
--> 181     self._inferred_dtype = self._validate(data)
    182     self._is_categorical = is_categorical_dtype(data.dtype)
    183     self._is_string = isinstance(data.dtype, StringDtype)

File ~\anaconda3\Lib\site-packages\pandas\core\strings\accessor.py:235, in StringMethods._validate(data)
    232 inferred_dtype = lib.infer_dtype(values, skipna=True)
    234 if inferred_dtype not in allowed_types:
--> 235     raise AttributeError("Can only use .str accessor with string values!")
    236 return inferred_dtype

AttributeError: Can only use .str accessor with string values!
@Muhammad.Kashif31 3 months ago
Your data may contain integer values; that's why you are getting the error.
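A small sketch of that error and one possible workaround, assuming the `Inns` column was parsed as integers (the values below are made up). The `.str` accessor only works on string data, so casting with `astype(str)` first avoids the `AttributeError`. If the column is already numeric, it likely has no `*` to strip and the line can simply be skipped.

```python
import pandas as pd

# Hypothetical frame where "Inns" was read in as integers.
df = pd.DataFrame({"Inns": [80, 233, 131]})

# Using .str on an integer column raises the AttributeError from the traceback.
try:
    df["Inns"].str.split("*")
    raised = False
except AttributeError:
    raised = True

# Workaround: cast to string before using .str, then back to int.
df["Inns"] = df["Inns"].astype(str).str.split("*").str[0].astype(int)

print(raised)
print(df["Inns"].tolist())
```

Running `df.dtypes` before cleaning shows which columns are strings (`object`) and which are already numeric.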
@VladislavShishkin11 7 months ago
I completed the project, but when I reopened it today all the code was still there, yet when I typed df it showed the old, uncleaned table. How do I make sure this doesn't happen again?
@RyanNolanData 7 months ago
I'll add my code to GitHub this weekend.
@marcus.the.younger 15 days ago
Save the cleaned data.
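A sketch of what saving the cleaned data could look like. The file name `cricket_cleaned.csv` and the sample values are hypothetical. A notebook keeps its code when reopened, but the variables in memory are gone until the cells are run again, so either re-run the cleaning cells or write the result to disk and load it back.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical cleaned frame standing in for the notebook's df.
df = pd.DataFrame({"Player": ["A", "B"], "Runs": [100, 50]})

# Hypothetical output path; any writable location works.
out_path = Path(tempfile.gettempdir()) / "cricket_cleaned.csv"

# Write the cleaned data without the index column.
df.to_csv(out_path, index=False)

# In a later session, load the cleaned data directly.
reloaded = pd.read_csv(out_path)

print(reloaded)
```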
@hemantsharma-xf3ub 3 months ago
Where can I get the notes?
@tasmisa6778 2 days ago
How am I supposed to know all the alphabets are named as those you just did???
@taha5754 3 days ago
Can you share the notebook used in this tutorial? @RyanNolanData
@RyanNolanData 3 days ago
I need to make a website article on this. It’ll have the code in there
@sachinnambiar 3 months ago
It's a dictionary, right? Not a list. #rename multiple columns in a dictionary
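To illustrate the point, a minimal sketch of renaming several columns at once. `DataFrame.rename` takes a dictionary mapping old names to new names through its `columns` parameter. The column names below are hypothetical stand-ins for the abbreviated headers in the cricket table.

```python
import pandas as pd

# Hypothetical abbreviated column names.
df = pd.DataFrame({"Mat": [52], "NO": [10], "HS": ["334"]})

# Rename multiple columns with a dictionary of {old_name: new_name}.
df = df.rename(columns={
    "Mat": "Matches",
    "NO": "Not_Outs",
    "HS": "Highest_Inns_Score",
})

print(df.columns.tolist())
```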
@dogzrgood 7 months ago
Star * means the batsman was not out 😊
@RyanNolanData 7 months ago
I appreciate it. Didn't know.
@Al-Ahdal 1 month ago
@RyanNolanData, yes, * means the batsman was not out, but it won't affect any calculations. Great work indeed.
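Since the star carries meaning, one option is to keep it as a separate flag before stripping it, rather than discarding it. This is a sketch on made-up values, and the column names `HS`, `HS_not_out`, and `HS_runs` are hypothetical, not the ones used in the video.

```python
import pandas as pd

# Hypothetical highest-score column; "*" marks a not-out innings.
df = pd.DataFrame({"HS": ["334", "400*", "299*", "270"]})

# Record whether the highest score was made not out.
df["HS_not_out"] = df["HS"].str.endswith("*")

# Store the numeric score without the marker.
df["HS_runs"] = df["HS"].str.rstrip("*").astype(int)

print(df)
```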
@rajareddyraju6773 3 months ago
19:09
@salfrat55 11 months ago
Headley @4 min mark 😂😁
@RyanNolanData 11 months ago
Haha one day I’ll buy your dup