Real-World Dataset Cleaning with Python Pandas! (Olympic Athletes Dataset)

Views: 34,403

Keith Galli

Days ago

Comments: 48

@KeithGalli 8 months ago

Thank you everyone who tuned in today!!

@rrrprogram8667 5 months ago

I really thank God that I found your channel. Thanks for sharing knowledge, and keep uploading!

@kebincui 3 months ago

Fabulous session. Thanks Keith 👍

@aishwaryapattnaik3082 8 months ago

Such a great tutorial Keith. Please keep uploading such high quality videos on Pandas and many more

@lisitashamatutu1140 4 months ago

watching from Zambia 🇿🇲

@danprovost8232 8 months ago

Great stream this was very helpful! Keep up the good work!

@KeithGalli 8 months ago

My man 💪

@marcinjagusz2481 8 months ago

Thanks Keith! I know it takes some time to prepare and record such stuff, but please upload more Python coding!

@KeithGalli 8 months ago

will try to keep them coming!

@chenjackson6001 8 months ago

Thank you for your hard work

@KeithGalli 8 months ago

You're welcome

@zahidmhd 3 months ago

We need more videos like this that work on real-world data

@AndyJagroom-ur7xh 5 months ago

What's your laptop? Cool videos BTW

@nabuzaidnasr 2 months ago

thank you

@Kira-vs4np 8 months ago

Just a note: at 1:19:21, format="mixed" isn't really working for me, and it fills the date_born column with NaT values. So I tried format="%d %B %Y" and it works.
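A minimal sketch of the explicit-format approach described in the comment above. The sample values are hypothetical and only illustrate the "day month-name year" layout; the real column may contain other layouts. One possible cause of the NaT values is the pandas version: format="mixed" was introduced in pandas 2.0, so older versions do not treat it as a special keyword.

```python
import pandas as pd

# Hypothetical sample values; not taken from the actual dataset.
born = pd.Series(["12 December 1886", "1 April 1969", "1884", None])

# An explicit format parses strings that match it exactly.
# errors="coerce" turns anything that does not match into NaT instead of raising.
date_born = pd.to_datetime(born, format="%d %B %Y", errors="coerce")

# Count the rows that could not be parsed, to see how much the format misses.
n_unparsed = date_born.isna().sum()
print(date_born)
print("unparsed rows:", n_unparsed)
```

Checking the number of NaT values after parsing is a quick way to tell whether one format covers the column or whether some rows need separate handling.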

@AndyJagroom-ur7xh 5 months ago

Can you do an update on the numpy video? Thank you so much for these videos, they helped me a lot ❤

@brendanthorne8353 4 months ago

Hi Keith, watching this video and following along. Just wondering: when we got the fillna code from ChatGPT, should we have applied it to our original data frame? Loving the content!

@Hamsters_Rage 7 months ago

29:26 - he starts writing some code

@chillydoog 8 months ago

Hawaiian shirt and Twisted Tea! My man

@KeithGalli 8 months ago

hawaiian shirt yes, but sorry to disappoint just a standard sparkling water I'm drinking haha

@chillydoog 8 months ago

@@KeithGalli 😉

@SangNguyen-bu8xd 6 months ago

Amazing thank u sir

@AnasM24 8 months ago

Thank you man

@KeithGalli 8 months ago

you're welcome!

@067-ashish7 8 months ago

Please Upload more videos related to data cleaning

@zahidmhd 3 months ago

Okay, I need a full course on data science

@Kidpambi 8 months ago

Thanks a lot man

@KeithGalli 8 months ago

you're very welcome!

@alphonsinebyukusenge3071 5 months ago

Where can we find the dataset?

@vg5675 7 months ago

Should I always drop the rows containing null values and then perform further analysis?

@rohitsinha1092 7 months ago

Not necessarily, it depends. When doing the same kind of cleaning for machine learning, dropping an entire column can cause a loss of data that might have helped the ML algorithm recognize patterns, so you can use other methods to handle missing values in that case. I think it's better to handle them separately rather than just drop an entire column, even though that is a possible approach for smaller datasets, so it's a case-by-case basis. But as I am analysing this dataset now, I see a few columns with excessively large numbers of null values, so I think it's okay to drop them. Cheers
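The case-by-case check described in the reply above can be sketched as follows: measure the share of missing values per column first, then drop only the columns that are mostly empty. The toy frame, the column names, and the 0.5 threshold are all assumptions for illustration, not values from the video.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are made up for this sketch.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "height_cm": [180.0, np.nan, 175.0, 168.0],
    "nickname": [np.nan, np.nan, np.nan, "Dee"],
})

# Fraction of missing values in each column (0.0 = complete, 1.0 = empty).
null_share = df.isna().mean()

# Assumed cut-off: drop a column only when more than half of it is missing.
threshold = 0.5
sparse_cols = null_share[null_share > threshold].index
cleaned = df.drop(columns=sparse_cols)

print(null_share)
print(cleaned)
```

Here the mostly empty column is dropped while the column with a single gap is kept, NaN included, so no rows are lost.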

@hassankhalid5569 5 months ago

HATS OFF TO YOU BRO... BRING SOME REAL-LIFE PROBLEMS AND END-TO-END PROJECTS RELATED TO DATA SCIENCE

@ramarisonandry8571 8 months ago

From Madagascar

@sebastianalvarez1537 8 months ago

holy fuq

@KeithGalli 8 months ago

😎😎

@NaveedAhmed-xt4xk 1 month ago

why are you drinking soda Keith Galli

@KeithGalli 1 month ago

It's a sparkling water! No sugar or calories :)

@rrcr4769 5 months ago

Hi Keith, this code handles the issue well:

# Split column 'Measurements' into height_cm and weight_kgs
dfCpy['height_cm'] = None  # add a blank column to store height
dfCpy['weight_kgs'] = None  # add a blank column to store weight
# Extract height and weight information
dfCpy['height_cm'] = dfCpy['Measurements'].str.extract(r'(\d+) cm', expand=False).astype(float)
dfCpy['weight_kgs'] = dfCpy['Measurements'].str.extract(r'(\d+) kg', expand=False).astype(float)
dfCpy

@SAGAR-ox6ks 8 months ago

I asked ChatGPT the questions that you framed and it showed the same solutions. I could have easily used ChatGPT rather than watching this video: just download the dataset, paste some rows of it into ChatGPT along with all the framed questions, and the answers will be the same as in this 2-hour video. It took ChatGPT 5 minutes.

@mohammadsamir2713 8 months ago

If you're not going to support people's efforts, at least don't discourage them

@Opoliades 8 months ago

Yeah, but what are you going to do when ChatGPT can't save you? You didn't "easily" do the task at hand... you made someone/something else do it. Maybe data analysis isn't your thing. Perhaps consider being an LLM expert instead 😊

@cnliving 3 months ago

Great! The height/weight part is a bit long; there is a simpler solution:

measure_pattern = r'(?:(\d+)\s*cm)?(?:\s*/\s*)?(?:(\d+)\s*kg)?'
df[['height', 'weight']] = df['Measurements'].str.extract(measure_pattern)
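A runnable sketch of the single-pattern idea from the comment above, using named groups so the extracted columns get their names directly. The sample strings are hypothetical, and the names height_cm and weight_kg are chosen here for illustration. One caveat: every part of the pattern is optional, so a string with text before the numbers would match the empty string at position 0 and produce NaN in both columns.

```python
import pandas as pd

# Hypothetical sample values in a "180 cm / 75 kg" style; not from the real dataset.
df = pd.DataFrame({"Measurements": ["180 cm / 75 kg", "165 cm", "70 kg", None]})

# Named groups become the column names of the frame returned by str.extract.
pattern = r"(?:(?P<height_cm>\d+)\s*cm)?(?:\s*/\s*)?(?:(?P<weight_kg>\d+)\s*kg)?"
parts = df["Measurements"].str.extract(pattern)

# str.extract returns strings; convert each column to numbers, keeping gaps as NaN.
parts = parts.apply(pd.to_numeric)

# Attach the two new columns to the original frame by index.
result = df.join(parts)
print(result)
```

Rows that list only a height or only a weight keep NaN in the other column rather than failing.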

@ajp3355 1 month ago

Why didn't you use .fillna in your df code?

df['weight_kg'] = df['weight_kg'].fillna(df['height_cm'])

@KeithGalli 1 month ago

I didn't want to fillna in this specific dataset, given that weights are associated with specific individuals. It didn't seem right to try to automatically populate weights for people based on an average weight or something similar. It's okay to have some NaN values in your datasets.
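A small illustration of why leaving NaN values in place, as the reply above suggests, is usually workable: pandas aggregations such as mean() and count() skip missing values by default. The numbers are made up for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical weights; one athlete has no recorded value.
df = pd.DataFrame({"weight_kg": [70.0, np.nan, 80.0]})

# mean() ignores NaN by default (skipna=True), so no filling is needed first.
mean_weight = df["weight_kg"].mean()

# count() reports how many values are actually known.
n_known = df["weight_kg"].count()

print(mean_weight, n_known)
```

The missing entry stays missing in the data, so later analysis can still distinguish recorded weights from unknown ones.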

@youcefbouras-f1s 3 months ago

That's what I used:

# Parse out dates from Born and Died
df['Born Date'] = df['Born'].str.replace(r'in.*','', regex=True)
df['Death Date'] = df['Died'].str.replace(r'in.*','', regex=True)
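A slightly stricter variant of the pattern above, as a sketch. It assumes the column looks like "date in place"; the sample strings are hypothetical. Requiring whitespace around "in" avoids cutting at those letters inside another word, and it also removes the space that r'in.*' would leave at the end of the date.

```python
import pandas as pd

# Hypothetical sample values in an assumed "<date> in <place>" layout.
born = pd.Series(["12 December 1886 in Bordeaux, Gironde (FRA)", "1 April 1969", None])

# Remove " in <anything>" up to the end of the string; rows without it are unchanged.
born_date = born.str.replace(r"\s+in\s.*$", "", regex=True)

print(born_date)
```

The cleaned strings can then be passed to pd.to_datetime with an explicit format.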