Real-World Dataset Cleaning with Python Pandas! (Olympic Athletes Dataset)

16,478 views

Keith Galli

I'm prepping a dataset for an upcoming tutorial and I figured walking through the process of cleaning it would work well for a livestream! We use various Python Pandas functions to accomplish our data cleaning goals.
We'll be working off of this repo:
github.com/KeithGalli/Olympic...
Some topics that we cover:
- How you can use web scraping to collect data like this (with Python's BeautifulSoup).
- Splitting strings into separate columns
- Using regular expressions (regexes) to extract specific details from columns
- Converting columns to datetime & numeric types
- Grabbing only a subset of our columns
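As a rough sketch of the splitting, numeric-conversion, and column-subset topics above: the rows below are made up, and the column names and the "height cm / weight kg" layout are assumptions for illustration, not taken from the actual dataset.

```python
import pandas as pd

# Toy rows; names, values, and the "cm / kg" layout are assumptions.
df = pd.DataFrame({
    "name": ["Athlete A", "Athlete B", "Athlete C"],
    "Measurements": ["183 cm / 76 kg", "170 cm / 64 kg", None],
    "notes": ["x", "y", "z"],
})

# Split one string column into two columns on the " / " separator.
parts = df["Measurements"].str.split(" / ", expand=True)

# Strip the unit suffixes, then convert to numbers.
# errors="coerce" turns anything unparseable (including missing values) into NaN.
df["height_cm"] = pd.to_numeric(
    parts[0].str.replace(" cm", "", regex=False), errors="coerce"
)
df["weight_kg"] = pd.to_numeric(
    parts[1].str.replace(" kg", "", regex=False), errors="coerce"
)

# Keep only the columns we care about.
clean = df[["name", "height_cm", "weight_kg"]]
print(clean)
```

Real data will likely have rows with only a height or only a weight, in which case a regex-based extraction is usually more robust than a plain split.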
Sorry that this was a bit last minute scheduling-wise, will try to give more advance notice in the future!
Video timeline!
0:00 - Livestream Overview
4:00 - About the Olympics dataset (source website and how it was scraped)
9:50 - Cleaning the dataset (getting started with code & data)
19:26 - What aspects of our data should be cleaned?
29:08 - Get rid of bullet points in Used name column
34:08 - How to split Measurements into two separate height/weight numeric columns.
1:05:00 - Parse out dates from Born & Died columns
1:25:43 - Parse out city, region, and country from Born column (working with regular expressions)
1:41:15 - Get rid of the extra columns
1:46:08 - Next steps (how would we clean the results.csv)
1:49:41 - Questions & Answers
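A minimal sketch of two of the steps in the timeline (removing bullet points from the Used name column, and pulling the date text, city, region, and country out of the Born column with a regex). The rows are invented, and the "day month year in city, region (country code)" layout is an assumption about how the values look, so the patterns would need adjusting for the real data.

```python
import pandas as pd

# Invented rows; the value layout is an assumption for illustration.
df = pd.DataFrame({
    "Used name": ["Athlete•One", "Athlete•Two"],
    "Born": [
        "12 December 1886 in Bordeaux, Gironde (FRA)",
        "4 March 1901 in Springfield, Illinois (USA)",
    ],
})

# Replace the bullet separator with a plain space.
df["Used name"] = df["Used name"].str.replace("•", " ", regex=False)

# Capture the leading "day month year" text as a string.
df["born_date_text"] = df["Born"].str.extract(r"(\d{1,2} \w+ \d{4})")[0]

# Named groups become column names in the frame returned by str.extract.
place_pattern = (
    r"in (?P<born_city>[^,]+), "
    r"(?P<born_region>[^(]+) "
    r"\((?P<born_country>[A-Z]{3})\)"
)
df = df.join(df["Born"].str.extract(place_pattern))
print(df)
```

Rows that do not match a pattern come back as missing values rather than raising an error, which makes it easy to inspect the leftovers afterwards.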
-------------------------
Follow me on social media!
Instagram | / keithgalli
Twitter | / keithgalli
TikTok | / keithgalli
-------------------------
Practice your Python Pandas data science skills with problems on StrataScratch!
stratascratch.com/?via=keith
Join the Python Army to get access to perks!
YouTube - / @keithgalli
Patreon - / keithgalli
*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.

Comments: 28
@KeithGalli 16 days ago
Thank you everyone who tuned in today!!
@beauforda.stenberg1280 16 days ago
I missed the livestream, but I am watching this video at the moment. This is the second upload of yours I have watched. I am a subscriber and wish to thank you very much for your uploads. Please keep them coming. I am very new to Python. I am learning it, firstly, to build a knowledge graph 'index' of computational shells and shell scripting, in the widest possible purview, for a web app/website version of a dedicated work on the subject that I have spent the last six months writing. I need to extract all the data from an archive of Markdown files (the book I have written), which involves cleaning the data and preserving its relationships, so that natural language processing can inform an ontology of the computational shells and shell scripting domain. Then: establish a dataset, export it into a directed graph, and visualise it with NetworkX. I don't yet know how to do any of this. If you could cover some of the processes involved in building a knowledge graph from a Markdown file, that would be brilliant! Thanks again for your uploads.
@danprovost8232 15 days ago
Great stream this was very helpful! Keep up the good work!
@KeithGalli 15 days ago
My man 💪
@aishwaryapattnaik3082 6 days ago
Such a great tutorial Keith. Please keep uploading such high quality videos on Pandas and many more
@marcinjagusz2481 15 days ago
Thanks Keith! I know it takes some time to prepare and record such stuff, but please upload more Python coding!
@KeithGalli 12 days ago
will try to keep them coming!
@Kidpambi 15 days ago
Thanks a lot man
@KeithGalli 12 days ago
you're very welcome!
@chenjackson6001 7 days ago
Thank you for your hard work
@KeithGalli 7 days ago
You're welcome
@AnasM24 8 days ago
Thank you man
@KeithGalli 7 days ago
you're welcome!
@067-ashish7 15 days ago
Please upload more videos related to data cleaning
@Kira-vs4np 6 days ago
Just a note: at 1:19:21, format="mixed" isn't really working for me; it fills the date_born column with NaT values. I tried format="%d %B %Y" instead and it works.
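For reference, a small sketch of the explicit-format approach described in this comment. The sample values are invented. One possible explanation for the NaT values is that format="mixed" was only added in pandas 2.0, though that is an assumption about the commenter's setup rather than something stated in the video.

```python
import pandas as pd

# Invented sample values, including a year-only entry and a missing one.
born = pd.Series(["12 December 1886", "4 March 1901", "1920", None])

# An explicit strptime-style format: day, full month name, four-digit year.
# errors="coerce" turns anything that does not match into NaT instead of raising.
date_born = pd.to_datetime(born, format="%d %B %Y", errors="coerce")

print(date_born)
# Count how many rows failed to parse so they can be inspected separately.
print(date_born.isna().sum())
```

With an explicit format, partial dates such as a bare year end up as NaT, so it is worth checking the count of unparsed rows before dropping the original column.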
@chillydoog 14 days ago
Hawaiian shirt and Twisted Tea! My man
@KeithGalli 12 days ago
hawaiian shirt yes, but sorry to disappoint just a standard sparkling water I'm drinking haha
@chillydoog 10 days ago
@@KeithGalli 😉
@ramarisonandry8571 14 days ago
From Madagascar
@Hamsters_Rage 4 days ago
29:26 - he starts writing some code
@sebastianalvarez1537 14 days ago
holy fuq
@KeithGalli 12 days ago
😎😎
@SAGAR-ox6ks 9 days ago
I asked ChatGPT the questions that you framed and it gave the same solution. I could have just used ChatGPT rather than watching this video: download the dataset, paste some rows into ChatGPT along with the questions, and the answers are the same as in this 2-hour video. It took ChatGPT 5 minutes.
@mohammadsamir2713 9 days ago
If you're not going to support people's efforts, at least don't discourage them
@Opoliades 6 days ago
Yeah, but what are you going to do when ChatGPT can’t save you? You didn’t “easily” do the task at hand… you made someone/something else do it. Maybe data analysis isn’t your thing. Perhaps consider being an LLM expert instead 😊