Solving real-world data analysis problems with Python Pandas! (Lego dataset analysis)

Views: 87,418

Keith Galli

Comments: 120
@KeithGalli 2 years ago
Level up your data science skills with courses, projects, and competitions offered by DataCamp! Use my link below and check out the first chapter of any course for FREE! :) datacamp.pxf.io/c/3588040/1012793/13294
@masternobody1896 2 years ago
Can you do some Google job coding? How can I get a job?
@KeithGalli 2 years ago
Big shout-out to my mom for not throwing away my Legos! She's the real MVP
@bobbyg603 2 years ago
Thanks mom!
@vishwasjajpura796 2 years ago
Finally Keith will build his LEGO
@ocraking 9 months ago
nice Kevin Durant reference
@DataProfessor 2 years ago
Wow the Lego stop motion was awesome!
@ahsanshah1866 2 years ago
Data professor is here 😀
@KenJee_ds 2 years ago
dude, loved the intro!
@KeithGalli 2 years ago
Hahaha thanks man :). Very happy that my mom didn't throw out all of my legos!
@rafaelmello8194 2 years ago
I'm a beginner in Python and I'm learning a lot from you. You are an awesome teacher. Your pacing and didactics are perfect. Thanks a lot for your effort.
@markomarjanovic8348 a year ago
Absolutely love the raw, natural style you are doing. Hope everyone else appreciates it too. Keep going buddy, you are amazing!
@lVaNeSsA90 2 years ago
Thanks for being honest while you search for syntax in the beginning. Love this raw, step-by-step video. I'm using your videos on my project to get inspired ❤️ Thanks for being a good tutor 😊
@alan6506305 2 years ago
God, this is brilliant. I watched the other two videos of yours on Pandas. You are a great teacher and friend. Thank you very much for your hard work and kindness.
@FIBONACCIVEGA 2 years ago
This video has been a true inspiration to continue learning. I'm doing DataCamp since I want to change my field and I've always liked programming and analyzing data. But I didn't know if I could use what I learned in real life. Now I know that everything I have learned is what is used in real-life data analysis. Saludos
@simonvanwijk5178 2 years ago
Man, so good to have you back! If it was not for you I would not have gotten a role as a DA, as you helped me the most in the beginning.
@JW-pu1uk 2 years ago
I really like the thought process in these videos. It's very raw, and really will translate well to an actual work project.
@thebeeskhakis7145 2 years ago
I'm so happy you're back. Your videos helped me get my new job!
@rksingh1997mp 2 years ago
He’s back baby!!
@logannon 2 years ago
Dude, I thought you were dead. Your videos have helped me so much. Glad to see you back!
@amansorout.6779 2 years ago
Happy to see you back, fighting with something serious, you are not alone.
@H99x2 2 years ago
These types of videos are your strength! Great tutorial and explanation, Keith
@stratascratch 2 years ago
Good to see you’re back!
@qalinlekhaliif5518 2 years ago
Thanks a lot man. Your videos are helpful and entertaining as well. We appreciate your great work.
@danielsantoyo2640 2 years ago
I'm so happy to see you are back! Pandas and NumPy tutorials would be great!!! I'm currently trying to learn pandas and NumPy for data analytics and this video was super interesting!!! Thanks Keith, keep going, you are doing great 💯
@weitingteng3241 2 years ago
Great great and great to see you back
@leomiao5959 2 years ago
The man is back. The hero is back for us!!
@ben-tiki 2 years ago
Another great video Keith! Glad to see you back. Awesome that you got to work with DataCamp. If you can, please make a video on OpenAI, it would be awesome. I've been using their API and it's awesome.
@Sensei10238 2 years ago
Finally back! It helped me a lot in learning Python! Thank you so much!
@PaYaMv2 2 years ago
Good to have you back my dude! Loooooooved this!
@Omzodijacky 2 years ago
Man, I'm happy you are back! You were truly missed
@Magmatic91 2 years ago
Did this project on DataCamp. Was a lot of fun.
@azrmuradl6420 2 years ago
Please provide more videos like this, or, as you always do, give us tips on how we can find these kinds of real-world DS projects online.
@YunusFidan_ 2 years ago
Good to see you uploading again!!
@rafaelcastellarmartinez3498 2 years ago
Hi Keith, I just tried to do the project with you and I got that Star Wars was not the most popular theme in two years: 2004 (Harry Potter) and 2017 (Super Heroes). Weird that the DataCamp test said OK, but I did the math manually and Harry Potter was the most popular in 2004. Thanks for your videos, from a student in Colombia, Latin America!
@adelekeemmanuel4917 a year ago
omg... I just did the exercise myself and I discovered the same thing too... Came to check the video but I'm seeing something else
@itsReshad 2 years ago
Love the great content! Please don't stop! You have an impeccable way of teaching, it's amazing
@kirubaselvi6754 2 years ago
Keith, PyTorch tutorial please
@KeithGalli 2 years ago
I definitely want to! I need to spend considerable time reviewing and building up my own PyTorch skills before I make a tutorial on it.
@putyah 2 years ago
Awesome video. Small detail: for the new-era answer you typed the value in by hand. It would be nicer to drop every row whose value is Star Wars, then select the remaining year as a variable. That way the variable is dynamic, so if the dataset changes the answer would still be correct.
@KeithGalli 2 years ago
Good suggestion! I agree that would be a better way to go about it :)
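One way the suggestion above could look in code is sketched here. This is only an illustration, not the code from the video: the `licensed_sets` frame is made-up sample data, the column names `year`, `parent_theme` and `set_num` are assumed to match the project's dataset, and taking the first non-Star-Wars year is an assumption about how to pick the answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the licensed-sets table
licensed_sets = pd.DataFrame({
    "year": [1999, 1999, 2000, 2000, 2000, 2001, 2001, 2001],
    "parent_theme": ["Star Wars", "Star Wars", "Star Wars", "Star Wars",
                     "Harry Potter", "Harry Potter", "Harry Potter", "Star Wars"],
    "set_num": ["a", "b", "c", "d", "e", "f", "g", "h"],
})

# Rows are years, columns are themes, cells are the number of sets
sets_per_theme = pd.crosstab(licensed_sets["year"], licensed_sets["parent_theme"])

# For each year, the theme with the highest set count
top_theme = sets_per_theme.idxmax(axis=1)

# Drop the years where Star Wars wins and read the year off the index
# instead of typing it in by hand
new_era = int(top_theme[top_theme != "Star Wars"].index[0])
print(new_era)
```

Because `new_era` is derived from the data, it would follow the dataset if the rows changed.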
@cyrilodoi6868 2 years ago
So good to have you back man! 💯
@tuandino6990 2 years ago
Question 2:
theme_count_by_year = licensed_lego_set.groupby('year')['parent_theme'].value_counts().unstack()
theme_count_by_year.fillna(0, inplace=True)
theme_count_by_year = pd.DataFrame.transpose(theme_count_by_year)
Or you can use the pivot_table function. Approaching it this way gives you a data frame that's easy to plot (heatmap) and makes the high numbers pop out.
@tuandino6990 2 years ago
@Josh Yorko nice
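The `pivot_table` alternative mentioned in this thread might look roughly like the sketch below. The sample data is invented, and the column names (`year`, `parent_theme`, `set_num`) are assumptions about the project's dataset rather than something confirmed in the comment.

```python
import pandas as pd

# Hypothetical sample data standing in for the licensed-sets table
licensed_lego_set = pd.DataFrame({
    "year": [1999, 1999, 2000, 2000, 2000, 2001, 2001, 2001],
    "parent_theme": ["Star Wars", "Star Wars", "Star Wars", "Star Wars",
                     "Harry Potter", "Harry Potter", "Harry Potter", "Star Wars"],
    "set_num": ["a", "b", "c", "d", "e", "f", "g", "h"],
})

# One row per theme, one column per year, cells hold the set count;
# fill_value=0 replaces the separate fillna step
theme_count_by_year = licensed_lego_set.pivot_table(
    index="parent_theme",
    columns="year",
    values="set_num",
    aggfunc="count",
    fill_value=0,
)
print(theme_count_by_year)
```

The resulting themes-by-years grid is already in the shape a heatmap function usually expects, so no transpose is needed.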
@manfungnewmanyu1426 2 years ago
Yeah!!! Your tutorial is great and helps me so much in the AI master's course.
@MashiroRedo 2 years ago
Waited so long! Thank you
@ocraking 9 months ago
Dude, you ROCK
@admonitoring-pi9os 8 months ago
Hello there. I hope you are good. I am a little late with this comment because this video is already more than 2 years old, but since I have started learning Python now, it's the right time for me. Where can I find the code you explained in the video? No code is available in the project file at the provided GitHub link.
@kartikeyasharma9908 2 years ago
Hi Keith, loving the video tutorials!
@1990andstillgoing 2 years ago
Props for sharing your knowledge man, it's really easy to understand and apply what you're doing (Y)
@terrytas13 2 years ago
Welcome back Keith, so good to see your face again. Stay well my friend!
@KeithGalli 2 years ago
Glad to be back!! :)
@tuandino6990 2 years ago
I've been waiting for this
@terrytas13 2 years ago
Love the introduction!!!
@Levy957 2 years ago
That task #2 was really hard to do alone
@ChileHeroico 2 years ago
Keep doing more videos pls :D
@jongcheulkim7284 2 years ago
Thank you, sir. I had lots of fun^^
@shahoftrading 2 years ago
Question: when you merge using left_on and right_on, we get the merged df. In the merged df, under parent_theme, why are most if not all of the values "Legoland", with all IDs 411? Also, how do we check the full tabular data: print(df)?
@baburamchaudhary159 2 years ago
In line [99], i.e. .groupby(['year', 'parent_theme']), and in the next line, .drop_duplicates(['year']): since we have already grouped by 'year' and 'parent_theme' (I think it groups unique year and parent_theme), why do we need to drop duplicates by 'year'?
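A small sketch may help with the question above. It assumes the pattern under discussion is "group by year and theme, count, sort, then drop duplicates on year"; the sample data and the `set_count` column name are invented for illustration. Grouping by both columns yields one row per (year, theme) pair, so each year still appears once per theme; sorting by count and keeping the first row per year is what reduces it to the top theme.

```python
import pandas as pd

# Hypothetical sample data standing in for the licensed-sets table
licensed_sets = pd.DataFrame({
    "year": [1999, 1999, 2000, 2000, 2000, 2001, 2001, 2001],
    "parent_theme": ["Star Wars", "Star Wars", "Star Wars", "Star Wars",
                     "Harry Potter", "Harry Potter", "Harry Potter", "Star Wars"],
})

# One row per (year, theme) pair: a year with two themes appears twice
counts = (
    licensed_sets.groupby(["year", "parent_theme"])
    .size()
    .reset_index(name="set_count")
)
print(counts)

# Sort so the biggest count comes first, then keep only the first row
# seen for each year, i.e. that year's most popular theme
top_per_year = (
    counts.sort_values("set_count", ascending=False)
    .drop_duplicates("year")
    .sort_values("year")
)
print(top_per_year)
```

Without the `drop_duplicates("year")` step, the result would still list every theme for every year rather than a single winner per year.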
@lucaspioli7970 2 years ago
Love your videos! Keep going
@Viralvlogvideos 2 years ago
Welcome back with your first tutorial after a long time :P
@aditiparashar9171 a year ago
You are freakishly smart!
@dharshankumar2522 2 years ago
Keith is back...yeahhhh
@kotharidhruv75 2 years ago
Waiting for more videos like this
@freddy4videos a year ago
thank you, much love
@alkiviadessavoullis2021 2 years ago
Does anyone know why, when I press continue or start project, the Python Use python ... code checks gets highlighted pink and I can't work on the project?
@guisande 2 years ago
Hey Keith, I'm torn between going towards data science or cyber security. I love both but I kind of need to make money by now. Do you think I can earn money in a short time in data science? Working as a freelancer or supporting small companies... Edit: I'm glad that you came back. Really love your videos
@adeshmishra1671 2 years ago
Go for cybersecurity, brother, since the difficulty level is medium. But while earning 💰 you can also learn data science!!
@ratchakoon 2 years ago
The themes.csv which you provided on GitHub does not have an 'is_licensed' field. Is the 'parent_id' field the same as the 'is_licensed' field?
@KeithGalli 2 years ago
A little confusing, but you want to use parent_themes.csv, not themes.csv !!
@ratchakoon 2 years ago
@@KeithGalli Thank you
@ДимитърСираков-щ7ы 2 years ago
Keep up the good work!
@raghavgoyal3324 2 years ago
Please upload a project every week
@KeithGalli 2 years ago
I'll try my best!
@ElianMrl 2 years ago
Hey guys, would it be a good idea to use DataCamp projects in my resume?
@clayherz_ a year ago
If I solve the second question with this code:
counted_2 = licensed_sets.groupby(["year", "parent_theme"])[["is_licensed"]].count()
counted_2 = counted_2.reset_index().sort_values("is_licensed", ascending=False)
counted_2.drop_duplicates("year").sort_values("year", ascending=True)
is it wrong?
@baggid6257 2 years ago
He is back~!
@nitiknayyar7659 2 years ago
Damn, I also started this project on DataCamp.
@rodrigo100kk 2 years ago
This dude is cool, this channel too.
@codewithkarthik7136 2 years ago
Nice video Keith
@rabinmainali3373 2 years ago
I did question 2 the following way: 1. I counted the licensed sets released each year. 2. Then I counted only the Star Wars sets released each year. 3. Then I calculated the proportion of step 2 to step 1. Is that okay? By the way, the result is also 2017 for me.
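The three steps described above could be sketched as follows. The data is invented and the column names are assumptions. One caveat worth stating: a Star Wars share below 50% only shows that all other themes combined outnumber it, which is not necessarily the same as a single other theme being the most popular that year.

```python
import pandas as pd

# Hypothetical sample data standing in for the licensed-sets table
licensed_sets = pd.DataFrame({
    "year": [1999, 1999, 2000, 2000, 2000, 2001, 2001, 2001],
    "parent_theme": ["Star Wars", "Star Wars", "Star Wars", "Star Wars",
                     "Harry Potter", "Harry Potter", "Harry Potter", "Star Wars"],
})

# Step 1: all licensed sets released per year
total_per_year = licensed_sets.groupby("year").size()

# Step 2: only the Star Wars sets per year (0 for years with none)
star_wars_per_year = (
    licensed_sets[licensed_sets["parent_theme"] == "Star Wars"]
    .groupby("year")
    .size()
    .reindex(total_per_year.index, fill_value=0)
)

# Step 3: Star Wars share of each year's licensed sets
star_wars_share = star_wars_per_year / total_per_year
print(star_wars_share)
```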
@damarbowo 2 years ago
Can I see your membership playlist? I can't find that playlist
@KeithGalli 2 years ago
Hmm I'm not sure what you are asking to see, can you clarify?
@damarbowo 2 years ago
@@KeithGalli You have membership benefits. One of the benefits is a playlist or videos for members. Do you have an example of the videos or playlist that members get for joining your channel? Hope you understand
@KeithGalli 2 years ago
I just started my memberships last week so I haven't posted any exclusive videos there yet. To get an idea of the types of content I'll post there, check out these videos kzbin.info/www/bejne/p5-2d2uPlrWrbZo kzbin.info/www/bejne/hZyooIN_hNypnsk
@damarbowo 2 years ago
@@KeithGalli I'll wait Keith. Regards
@KeithGalli 2 years ago
Sounds good!
@manu93ize 2 years ago
Bro, can you do a tutorial on data cleaning with PySpark with a real-world example?
@merterisen 2 years ago
16:52 how did you change the 'Star wars' text immediately?
@KeithGalli 2 years ago
Lol that was just video editing xD.
@gersonchadijunior7499 2 years ago
Hey Keith, I love your videos so much. I've been learning Pandas with you since your Pokemon video, but I feel that the last answer is not accurate and in fact the right year should be 2006, because it was the year with the fewest Star Wars sets released. Can I send you my code somehow?
@letsjoinhands 2 years ago
Hello again Keith. For Q#2 I am getting a different result for new_era using the code below. lego_all_lic is the DF containing all licensed Lego set themes, with shape (1179 x 8), and it has been grouped by year to form lego_all_lic_yr. The rest of the code I have written is quite simple to understand. It looks as if I have made a big mistake in the aggregation but I can't seem to locate it.
lego_all_lic_yr = pd.DataFrame(lego_all_lic.groupby(by=['year', 'parent_theme'], axis=0).agg(Parent_Theme=('set_num', 'count')))
lego_all_lic_yr.reset_index(inplace=True)
lego_all_lic_yr.replace(to_replace=[theme for theme in lego_all_lic_yr['parent_theme'] if theme != 'Star Wars'], value='Others', inplace=True)
lego_all_lic_yr = pd.DataFrame(lego_all_lic_yr.groupby(by=['year', 'parent_theme'], axis=0).agg(Parent_Theme=('Parent_Theme', 'sum')))
lego_all_lic_yr
When you look at the result, it shows that 2006 was the first year in which Star Wars lost to other themes in terms of the sets released in that year.
@letsjoinhands 2 years ago
OK, so I basically misunderstood the question. It wasn't about Star Wars themed sets vs. all the rest; rather, it was about the year in which Star Wars lost out to some other individual theme. Got the correct answer using:
lego_all_lic_yr = pd.DataFrame(lego_all_lic.groupby(by=['year', 'parent_theme'], axis=0).agg(Parent_Theme=('set_num', 'count')))
lego_all_lic_yr.reset_index(inplace=True)
lego_all_lic_yr = pd.DataFrame(lego_all_lic_yr.groupby(by=['year', 'parent_theme'], axis=0).agg(Parent_Theme=('Parent_Theme', 'sum')))
lego_all_lic_yr = lego_all_lic_yr.sort_values(by=['year', 'Parent_Theme'], ascending=False)
lego_all_lic_yr.head(50)
@gopikaprasad8607 a year ago
How do I export a for loop's results into Excel?? Please reply
@sabbirahmed8012 2 years ago
Hello Keith, can you please mention some resources to master natural language processing?
@KeithGalli 2 years ago
Hey! I actually did a PyCon lecture on NLP. That should be pretty helpful: kzbin.info/www/bejne/rKqymIqerLqgm8U
@mufasao6776 2 years ago
I see that you posted some of your hidden videos. Thank you.
@БулатМиннуллин-р8щ 2 years ago
Why didn't you use .agg?
@sanjeetlal1873 2 years ago
Legend's back❤️
@zeasammy7572 2 years ago
Does DataCamp have a video learning platform?
@KeithGalli 2 years ago
The typical structure of classes is short videos that overview the concepts and then a bunch of interactive problems with a code editor to drill down the technical side of those concepts.
@V0X._.T3K a year ago
keith moment
@davida99 2 years ago
Yoooo love the vids
@letsjoinhands 2 years ago
Hi Keith! This is how I solved Q#1. Please let me know if this is bad coding practice, acceptable, or good in your opinion. I first made a function called is_lic:
def is_lic(df_1, df_2):
    df_1['is_licensed'] = bool
    theme_1 = list(df_1['parent_theme'])
    theme_2 = list(df_2['name'])
    lic_status = list(df_2['is_licensed'])
    for i, s in enumerate(theme_1):
        for r, t in enumerate(theme_2):
            if s == t:
                df_1['is_licensed'][i] = lic_status[r]
Then:
is_lic(lego_sets, lego_themes)
Then:
all_themes = []
for r in lego_sets.itertuples():
    all_themes.append([r[6], r[1], r[7]])
Then:
all_lic_themes = [x for [x, y, z] in all_themes if y is not np.NaN and z == True]
star_wars = [theme for theme in all_lic_themes if theme == 'Star Wars']
the_force = int(len(star_wars)/len(all_lic_themes) * 100)
the_force = 51%
@KeithGalli 2 years ago
So my biggest recommendation based on your code is to be more explicit with how you name your variables. So instead of "df_1" & "df_2" you might name those dataframes "lego_sets_df" & "parent_themes_df" respectively. Furthermore, it would be better to name variables "i" & "s" something like "parent_theme_index" & "parent_theme_value". These types of changes will make your code more readable. Functionally, everything looks sound though. Nice work!
@letsjoinhands 2 years ago
@@KeithGalli Thanks a bunch Keith. Now, in retrospect, when I think about how you were working on this question in the video, I realise that you were using pandas built-in methods the whole time. So yes, we could use a smattering of plain Python to do this (like I did), but using the library's built-in methods would be simpler and more advantageous most of the time. Is that correct?
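For comparison with the loop-based version in this thread, here is a minimal sketch of the same calculation using pandas built-ins. It is not the code from the video: both data frames are tiny invented stand-ins, and the column names (`set_num`, `parent_theme`, `name`, `is_licensed`) are taken from the comment above and assumed to match the real files.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables used in the project
lego_sets = pd.DataFrame({
    "set_num": ["a", "b", "c", "d"],
    "parent_theme": ["Star Wars", "Star Wars", "City", "Harry Potter"],
})
parent_themes = pd.DataFrame({
    "name": ["Star Wars", "City", "Harry Potter"],
    "is_licensed": [True, False, True],
})

# The merge replaces the nested loops: each set picks up the
# is_licensed flag of the theme whose name matches its parent_theme
merged = lego_sets.merge(parent_themes, left_on="parent_theme", right_on="name")

# Keep licensed sets only and drop rows without a set number
licensed = merged[merged["is_licensed"]].dropna(subset=["set_num"])

# Percentage of licensed sets that are Star Wars, truncated to an int
the_force = int((licensed["parent_theme"] == "Star Wars").mean() * 100)
print(the_force)
```

The merge does one vectorized lookup instead of comparing every set against every theme in Python, which tends to be both shorter and faster on larger tables.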
@igor-xadrezxadrez8541 2 years ago
Hey, there's a red dot on your nose.
@KeithGalli 2 years ago
I got in a fight playing hockey!
@Viralvlogvideos 2 years ago
Big nose :P
@AbhishekSharma-hy4nl 2 years ago
Bro what happened to your nose😟?
@KeithGalli 2 years ago
Got into a little fight playing ice hockey! We won the game though so it's cool xD