Python Pandas Tutorial (Part 8): Grouping and Aggregating

Python Pandas Tutorial (Part 8): Grouping and Aggregating - Analyzing and Exploring Your Data

Рет қаралды 441,376

Күн бұрын

Пікірлер: 734

@coreyms 5 жыл бұрын

I hope everyone had a great week! We've got a long video this week, but we go over a lot of important topics about how to analyze data in Pandas. We will learn how to answer very interesting questions such as "What is the most popular social media site by country?". I put timestamps together for this video so that you all can skip around if you need to go back and watch a specific section. Here are those timestamps: Aggregate Column - 2:00 Aggregate DataFrame - 3:55 Value Counts - 7:51 Grouping - 12:30 Multiple Aggregates on Group - 26:00 People Who Know Python By Country - 27:20 Practice Question - 34:20 Concat Series - 37:27 Have a great weekend everybody!

@calebmbugua745 5 жыл бұрын

Thanks so much bro,,,,much love from kenya

@anonymous-kl1un 4 жыл бұрын

Hey, is this series gonna continue?

@anonymous-kl1un 4 жыл бұрын

Can you explain all the types of joins

@anonymous-kl1un 4 жыл бұрын

And if possible please explain multi-level Indexing as well

@JoshuaDHarvey 4 жыл бұрын

Corey, is it safe to assume if your coming from a SQL background, that you can effectively use things like the 'pd.concat()' to replace the various joins (left, right, inner etc) workflows in SQL and just use SQLAlchemy or pyodbc libs to load the data and then do all the calculations with python that you would normally do in whatever SQL dialect?

@parthrawri3001 4 жыл бұрын

I love the fact that there are no ads interrupting in the middle. So thoughtful. ❤️

@coreyms 4 жыл бұрын

Yeah, I didn’t want the to ruin the flow of the videos. Glad you noticed :)

@parthrawri3001 4 жыл бұрын

Corey Schafer OMG! Your reply just made my day!

@livingwithlinlin3122 4 жыл бұрын

@@coreyms Thank you so much for doing this. You are such a considerable person with a big heart.

@JoshKonoff1 3 жыл бұрын

Corey, do you have a Patreon page? Thank you for your exceptional videos; a huge help for me and so many people!

@anubhavtomar1384 4 жыл бұрын

3:10 median function 5:00 describe function 7:20 count() 8:05 value_counts() 12:51 grouping the data 14:39 groupby() function 16:07 get_group(), grabbing a specific group by name 17:30 doing same by using the filters 18:40 using value_counts on filters 20:20 value_counts() for groups 21:49 using loc to find for one country 23:40 percentage by using normalize 25:00 median by country group 26:13 agg function for multiple functions 27:30 using filtering to get python users by country 30:20 error on using same approach for groups 31:40 apply method to run that on group 35:40 finding the percentage of people using python in each country(group) 37:40 using concat for combining series in a dataframe 45:30 adding percentage column

@afdqwfqwqwdfqwdawdas 4 жыл бұрын

thx, this is very useful. The videos already are very concise and to the point, but if I am just looking for how to do a proper groupby quickly on my own dataset....

@umutdemir2762 4 жыл бұрын

thanks a lot.

@ravishekharprakash4172 4 жыл бұрын

@@afdqwfqwqwdfqwdawdas sure

@80expertube 4 жыл бұрын

FYI, the percentage problem can be solved alternatively as follows: country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum()/x.count())

@sayarmandal1885 4 жыл бұрын

@@80expertube It throws a RuntimeWarning

@kylebeckhorn885 4 жыл бұрын

Yes please, do a video on the topic of MULTIPLE INDEXING!!

@j3553hh 7 ай бұрын

I would pay to see Corey's tutorial on this. Every time I encounter a multi-index, I'm on stack overflow. It just doesn't seem to stick.

@pewolo 3 жыл бұрын

Let's all admit that this dude is a hard working man and his work is just a wow! I've been following him for quite some time now and I am always impressed by how thoughtful, tactical and clear his explanation is in every tutorial he makes. Hat off to you, dude!

@zhenpan2048 Жыл бұрын

numeric_columns = df.select_dtypes(include="number") medians=numeric_columns.median() print(medians) # this is a way of getting the medians of numerical values as I use df.median(), it gave me value error that says could not convert string to float"I am not a student who is learning to code" thanks for great work. I learn more from you than from my professors. Thank you so much for great efforts!😎

@heretolearndshare Жыл бұрын

You saved my learning session, thanks!

@giovannimantovani795 11 ай бұрын

Thank you bro

@salehabdullahi9356 11 ай бұрын

Thank you, you save me alot of time,

@mn4769 9 ай бұрын

I found that you can shorten it by writing numeric_columns.median()

@sick-ol3jd 7 ай бұрын

Thanks man

@prakhararora8981 Жыл бұрын

hey if ur df.median() doesn't work and ur getting typeerror and valueerror u can do df.median(numeric_only=True)

@DilpreetSingh02 9 ай бұрын

Thanks man

@atienograce2520 8 ай бұрын

Thanks a bunch!

@anre3821 5 ай бұрын

was looking for an advice on this, thanks a lot!

@greentree9751 3 ай бұрын

thanks a lot

@Clrakey 19 күн бұрын

Thank u so much man.

@diegoalarcon6062 4 жыл бұрын

I don't care if some of your videos are long, in other channels they're just redundant but that's not your case! If you start doing short videos we may be losing all that valuable information that you provide to us. So far, this is the best Python channel I've seen. Greetings from Medellín, Colombia.

@merajajam425 5 жыл бұрын

The level of my programming in Python has been substantially improved since I have started watching your great videos. Many thanks, Corey. Would you please prepare some videos regarding the networkx module as well?

@jorgetiz99 4 жыл бұрын

This has to be one of the best videos on youtube about Pandas, thank you so much. Greetings from Perú.

@milrione8425 4 жыл бұрын

I love how you are just using the same data throughout the whole series. Thank you so much, Corey!

@jiangxu3895 4 жыл бұрын

I just discover that your way of teaching is to tell not only how to do it but why this is how to do it. thumb up!!

@chukwuemekamusic663 Ай бұрын

Thanks!

@coreyms Ай бұрын

Thank you!

@amir_forooghi 5 жыл бұрын

YESSSS !!! Corey`s video for groupby. I press like before I watch it. Groupby is just a superpower. Thank you for this awesome series Corey. You are the best.

@elnazdehkharghani6121 5 жыл бұрын

You make all your subscribers happy with just uploading your videos !!! Thanks, Corey

@coreyms 5 жыл бұрын

Thank you all for watching!

@jongyoonsohn8559 5 жыл бұрын

I'd like to share my solution to the practice question. ctr_knows_python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python', na=False).value_counts(normalize=True)) ctr_knows_python.rename({False:'Don\'t know', True:'I know'}, inplace=True) ctr_knows_python Hope this helps too!

@coreyms 4 жыл бұрын

Nice!

@moushumitamanna 4 жыл бұрын

Hi, can you please explain what "na=False" means here and why do we have to put this in the code? Thanks in advance

@tplano3794 4 жыл бұрын

@@moushumitamanna not applicable

@moushumitamanna 4 жыл бұрын

@@tplano3794 thanks. But why should we put na=false in this code

@tplano3794 4 жыл бұрын

@@moushumitamanna in a column which is expected to have numbers, na does not make sense so we filter out these values. also if you run any functions (mean, median) then you may run into syntax errors

@felipegomez3047 3 жыл бұрын

I'd like to share my solution to the practice question: country_grp['LanguageWorkedWith'].apply( lambda x: x.str.contains('Python').sum() / len(x) * 100 ) As you can see it's just as symple as adding " / len(x) * 100 " in the lambda function, where len(x) is the total number of users for each country.

@ironpolux 3 жыл бұрын

como se te ocurrio esto? ahem I mean, How did u come up with this? well played

@BCS_FahadAhmad 3 жыл бұрын

I guess x.count() in place len(x) makes more sense, since there can be people who did not answer language(I highly doubt XD)

@nicocilia5871 3 жыл бұрын

@@BCS_FahadAhmad I think x.count() will not count NaN so I think len is better if you want to include people that skipped that question. I am assuming that was an option.

@gurjotsingh8631 2 жыл бұрын

I was thinking the same, so i downloaded his repository and tried it and it works. Came here to comment and saw your comment. so , i just wasted 5-10 minutes of my day. whatever.hallelujah.

@kingler199 2 жыл бұрын

Damn well played

@antonyjohne 4 жыл бұрын

Hey Corey! Thanks a million for the Pandas Series. As always, very intuitive and easy to follow. Now that you've taught Matplotlib and Pandas, would love to see a new Numpy series in order to complete the Data Science trinity. Please consider adding a Numpy Series.

@walternyc 3 жыл бұрын

Working on a project evaluating an employee survey and this is just what the doctor ordered. Thanks! One of the best channels in KZbin for data analysis hands down

@Davidkiania 4 жыл бұрын

Best video in the series loving them and normally can’t wait for the next.

@valerioharvey7289 4 ай бұрын

other gurus are just like "here's the code for this, copy it and don't ask why" but you are the only one who shows how things work. Thank you very much

@YeekyYeeky 4 жыл бұрын

one of the best thing that happened to me when I woke up (I am on the opposite side of the world to Corey Schafer) is finding that Corey just upload another Pandas tutorial video , thank you !

@fvdvhome Жыл бұрын

Mr. Schafer, I am so happy I found your teaching. I have been on a journey to become a data analyst, and after completing the Google Analytics Course , I realized that I needed to learn much more. I am currently finishing a Python Course through Coursera offered by IBM. Not every professional, no matter how good they are, have the natural ability to teach. Your method and technique are so amazing and helped me to overcome some of the confusions I had with coding in Python. I learned so much from just this video alone. I will definitely visit the site you referenced, and look forward to learning more from your videos. Thank you so much!

@gregoryogunna9527 Жыл бұрын

American?

@codewithluq 5 жыл бұрын

Corey Again. Very fantastic tutor. I press the like button before I watch.

@brewtalxxx 2 жыл бұрын

Thank you so much for this video. I learnt way more from this than the many hours I spent sitting in class listening to a teacher who just wanted to end the lesson early or have long lunch breaks. This is really precious. And thanks for the reassurance that if I find this difficult, there's nothing wrong with me LOL.

@shikharsaxena9989 4 жыл бұрын

after this lecture i started loving the complex coding of pandas and matplotlib. really you are an amazing teacher

@vagelisilias 3 жыл бұрын

I am a GIS student and I want to thank you because I'm doing my last assignment for university and I'm using Geopandas, matplotlib, pandas, cartopy and forth on and you helped so much with your videos, I have build a nice map and I have produced different tables with my data. Thanks god you are out there and sharing your knowledge free

@LibardoLambrano 4 жыл бұрын

Thanks Corey for sharing these videos. Pretty clear explanations. You are a great teacher.

@Blueshockful 4 жыл бұрын

Im browsing thru some of the videos to brush up on Python, and this is the first python video that didnt get me bored. Concise and brillliant. Love your videos! keep up the good work :)

@panpan4433 3 жыл бұрын

For your exercise (What % knows Python) , I divided the sum in the lambda function by x.count() then multiplied by 100 : country_group['LanguageWorkedWith'].apply(lambda x: 100 * x.str.contains('Python', ).sum() / x.count()) Thanks for the free content, awesome

@priyavratchaudhary9211 10 ай бұрын

use len(x) instead of x.count() because count() function exclude respondents who does not know any language.

@deniscampana8345 4 жыл бұрын

Thanks so much Corey ! It's clearly impossible not to understand what you explain on all your videos : It's fluid, straightforward, crystal clear ! And more over your english : Whaoooo ... Congratulations !! I wonder if I've learned more Pandas or english !! 200% great !!

@saiakhil4751 3 жыл бұрын

I signed up for brilliant org just for Corey Schafer. Thanks for sponsoring him.

@WrongSmth 4 жыл бұрын

Hey, Corey. I'm a network engineer and I'm learning pandas to be able to do some packet analysis and your videos really help me a bunch! This is my solution for the coding problem from the video. Hope it helps! know_python = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum()) total_respondents = country_grp['LanguageWorkedWith'].apply(len) know_python / total_respondents

@sayantanchakraborty75 4 жыл бұрын

Best videos on pandas on KZbin by Corey Sir. Loving them and normally wait for the next videos. Lots of love for you from India.

@WillMoody-crmstorm 4 ай бұрын

Perfect timing. Thank you for all the effort put into these videos. I've done that last jibe crash so many times, with the feet just off centre

@Schmidt3k 4 жыл бұрын

For your practise question, use .mean() instead of .sum() .mean() on a Series of bool will give you the fractions in a quick and easy way. Multiply by 100 for %. edit: As per discussion below, .mean() ignores NA values whereas Corey's approach treats NA as '0'. An alternative is thus: mygroups['LanguageWorkedWith'].apply(lambda x:x.str.contains('Python').fillna(0).mean()) Now, the results should be equal to Corey's.

@davidsp7949 4 жыл бұрын

It looks like a nice solution but numbers from Corey's video are slightly different than those with .mean() and I do not know why. For example: for Afghanistan PctKnowsPython 18.181818, with mean is 20.512821 for Albania PctKnowsPython 26.744186, with mean 27.710843 Does anyone know why?

@sunramaroc 4 жыл бұрын

@@davidsp7949 yeah i have the same doubt,,i guess that s due to the fact that mean() take in consideration only the respondents who effectively answered the question,,and sum() take all respondents even the ones with NaN for the question.so corey solution is Pct over all respondents,, and the mean() is over only the ones who answered this Q.

@jasleung2932 4 жыл бұрын

.mean() neglects those "NAN" responses while if u use x.str.contains('Python').sum()/x.size instead, it would count those "NaN" as "no pythoner" which is what Corey was doing

@fjramons 2 жыл бұрын

Well played. For me your solution is quite elegant. BTW, in case you wanted to treat NA as zeros (to get the same results from the video), you can simply use .mean() with its 'skipna' option disabled. This would make: mygroups['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').mean(skipna = False)

@mahanansari6152 2 жыл бұрын

I just changed sum to value_count(normalize=True) and it worked

@finncollins5696 Жыл бұрын

thanks so much for this series. started from the first video two weeks ago, now in the 8th. this series so far made a lot progress in me,. thanks so much, .May God Bless You. Love from Sri Lanka...

@ahmedhawater7522 4 жыл бұрын

Man you are one of the best teachers who ever learned me something, much love and support ♥️

@sayarmandal1885 4 жыл бұрын

Thanks, Corey. This is one of the most comprehensive pandas tutorials on KZbin. Love from India. I also noticed a subtle issue. We are adding the number of respondents who filled their Country and not who filled LanguageWorkedWith. Someone can fill Country and not LanguageWorkedWith.

@rukhan8900 3 жыл бұрын

The reassurance at the end was so appreciated as a beginner. Thank you for your help !!!

@MatthewFoulk 4 жыл бұрын

Really appreciate the addition of practice problems. It helps me to grasp the material

@yosephkurabachew6539 11 ай бұрын

perfect content. one flaw is that , you never explained what a "lamda" function is and went straight to using the function in your previous videos. you did the same thing here. now i have to first study lamda.

@lvcas9313 11 ай бұрын

Yeah I had to take a look into it. For anyone curious, lambda and "def return" is the same thing, but lambda are throway functions while "def return" assign a name to be applid further in the code. The "def return" syntax is easier to read and more clean than lambda

@bhargav1811 3 жыл бұрын

Every second of your python video are really worth it!!!

@alejandropereyra438 3 жыл бұрын

This video is so useful , the simplification that python does for the problems is so helping. is the best language in the programming of code. And the proffesor of this video is really a genius. !!! thanks.

@aborucu 3 жыл бұрын

Perfect explanation. Making a convoluted yet so important concept crystal clear through step by step explaning and also giving connections to pandas object types. Cheers!

@bobchannell3553 4 жыл бұрын

Thanks for doing this video in a detailed way, like you always do. Just under an hour is a good length for a video like this. Thanks!

@Imrannaseem818 4 жыл бұрын

Thanks Corey. I have waited for this video whole week. Great explaination

@qigangdeng8636 4 жыл бұрын

Hey, Corey, I see many people gave their own answer which are wonderful. So want to give my one here, which looks like more a beginner answer: #create a new column called: 'Python use' df['Python_use'] = ( df['LanguageWorkedWith'].str.contains('Python') ) #.value_counts() the 'Python use' column as it is a boolean type: country_grp = df.groupby(['Country']) Python = country_grp['Python_use'].value_counts(normalize=True) Thanks for your great Lectures! I watched from part 1. I am your big fan now.

@yuewang9623 4 жыл бұрын

Every time I saw a new post, I click the 'like' button before watching:D

@Prasanna_Rahavendra 4 жыл бұрын

Hey Corey! For the question you gave: The percentage of people by country who use python. There is an efficient solution too (Without creating a separate dataframe). country_grp["LanguageWorkedWith"].apply(lambda x:(x.str.contains("Python").sum()/len(x))*100) Actually what I am doing here is, in the lambda function, at the return part, I divide the No. of python users by the length of the given series and then multiply it by 100. This gives the percent of python users in each country. This approach might be a bit code efficient but can be a bit confusing for some.

@maheshmmmec 4 жыл бұрын

len(x) might not give u total respondent since it is series on LanguageWorkedWith and people might have skipped.

@gonzalezgenaro 4 жыл бұрын

@@maheshmmmec Correct

@stevensukenik254 3 жыл бұрын

You can use len(x) in the lambda, it will include the na in the series. You cannot use count(x) because it skips na. But you can use value_counts(x). If you run the following code, it will verify that Prasanna solution is corret: len_total = country_grp["LanguageWorkedWith"].apply(lambda x: len(x)).sort_values(ascending=False) us_no_answer = country_grp["LanguageWorkedWith"].apply(lambda x: x.isna().sum()) us_answer = country_grp["LanguageWorkedWith"].apply(lambda x: x.notna().sum()) df_counts = df['Country'].value_counts() df_counts = pd.concat([df_counts,len_total,us_answer,us_no_answer],axis='columns') df_counts.columns=['df_value_counts','lambda_len','user_respond','user_did_not_respond'] df_counts produces: df_value_counts lambda_len user_respond user_did_not_respond United States 20949 20949 20769 180 India 9061 9061 8844 217

@mggarekar 4 жыл бұрын

nice video :) i liked the q/a approach at the end where you left it open.

@Yasharvl 4 жыл бұрын

Thanks Corey! This is pure gold!

@denizcicek7333 3 жыл бұрын

You are just wonderful, it makes so much fun watching your tutorials. I finde directly the answer, those I need. God bless you brother.

@joncochran9647 3 жыл бұрын

I've watched quite a variety of different data analysis tutorials and this one was easily one of the most engaging for me. Having interesting data really helps.

@mohammadghouse25 4 жыл бұрын

Best Pandas playlist in youtube. One point solution for python learners

@dennisp5302 3 жыл бұрын

I just went through Part 8 a second time. Thanks a bunch!! I learned a lot.

@nicholaspolino2657 4 жыл бұрын

LOL @ "If I did this correctly, and it's definitely possible I made a mistake." Happy I found these videos, thanks.

@Rifbas01 5 ай бұрын

You had me on the "WayChat"..lol 23:00

@fiefiego2298 2 жыл бұрын

thank you Corey!! this is a wonderful pandas series!! you make the concept so easy that even a python beginner (that is me) without programming background in colleague can understand!! i'd like to share my solution as well: (since i don't know concat method, i calculate the answer first then convert them into a dataframe by dictionary) know_py = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum()) answer = df['Country'].value_counts() per_cent = know_py/ answer result = pd.DataFrame({'answer': answer, 'Python': know_py, 'percentage': per_cent}) i find you stop uploading new tutorials for a long time, hope everything goes well with you. and strongly looking forward to hearing from you soooooon!! thank you & greeting at 2022 sep 7th :)

@jeremine9259 Жыл бұрын

Just want to share here my solution for the practice question (but with the survey of 2022): --- country_group['LanguageHaveWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True)) --- And also give thanks to your wonderful videos, Corey! It's been 3 years and they're still among one of the best tutorials.

@keepcontinue Жыл бұрын

I thought the same thing

@AugustoGeografo 5 жыл бұрын

Always looking forward for your videos, Corey.

@RavenEX1980 3 жыл бұрын

best tutorials ever, i have read lot of books, but your technique is global and works best...keep on the good word @Corey Schafer

@Am-hsb 4 жыл бұрын

Thanks a lot Corey! Got to learn complex syntax in simple ways. You are amazing teacher.

@manish_chandra 2 жыл бұрын

One of the best and most easily understandable vid on Pandas. Thank you for creating this !!

@way_to_be_analyst6042 3 жыл бұрын

im just diving into pandas and would like to say - GREAT THANK YOU for such nice and detailed explanation. great job!

@stressfreetrading1341 4 жыл бұрын

NAMASTE!! Corey Schafer.. Love From India

@federicohan1458 4 жыл бұрын

I found amusing explaining what a percentage% is after going over apply & lambda methods, but that's exactly the thoroughness that makes your videos so loved :)

@matthiashupfer2659 4 жыл бұрын

These tutorials are well thought out and really great in explainatory purposes. Greats skills here from Corey! Thank you.

@СергейФролов-ъ5я 3 жыл бұрын

Corey, thank you very much for your free videos!

@pcordeirobr 3 жыл бұрын

Corey, the content of your videos are amazing. This tutorial in special is sensational.

@buzz.b 2 жыл бұрын

Thank you for the last example (percent that knows python). It was great to see how the different methods learnt can come together in a practical example; this really helped consolidate the knowledge gained.

@rauberhozenplotz7009 4 жыл бұрын

Helps me to get into my PhD. Thanks a lot for uploading this!

@parsahosseini4241 4 жыл бұрын

47 minutes of a pure pandas tutorial from a god in python, man you're a hero🔥🔥

@Tigrex281 4 жыл бұрын

Hey Corey, first of all thank you very much for all those fantastic videos. I also have tried to answer the question of percentage knowing python for each country. I came up with following solution: PrctKnowPython = country_grp['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python', na=False).value_counts(normalize=True)).loc[:,True] One advantage of this approach is, that you can just remove na=False and ignore NaN values in your data.

@markkennedy9767 4 ай бұрын

The groupby explanation comparing it to a filter is really good.

@belleriveblvd 4 жыл бұрын

Corey, I learn a lot from your videos. But this one has been especially helpful. Thanks.

@ironpolux 3 жыл бұрын

Really enjoying this series, thank u Corey!

@akritisstory850 4 жыл бұрын

you are a great teacher Corey.

@sumranms 4 жыл бұрын

I really like the way you speak. Your language is clearly understandable and you have a great accent. :)

@turksonmichael1236 Жыл бұрын

Thank you for this. Had clearer understanding of pandas than before. Wish you the very best

@bobchannell3553 4 жыл бұрын

This was a lot to learn in one video. That's why I went back and watched it again this week. At the end, I added something I think would be useful in what I do. I added a filter to select records where the number of respondents is >= 5. filt = python_df['NumRespondents'] >= 5 python_df.loc[filt]

@Boat-xs8lm 8 ай бұрын

Thank you,I am very lucky that I found your tutorials.

@tolex3 4 жыл бұрын

I've been doing data analysis using Python & Pandas for a few years now. Still, I'm picking up new things from your videos. Very clearly explained! Thank you!

@vijaybhatt4347 3 жыл бұрын

You're Genius Love From India You mentioned India in your tutorial Love this gesture 🇺🇸🇮🇳

@maheribnerahman7783 4 жыл бұрын

Really appreciate your teaching strategy man..have been learning a lot from you since the quarantine started.Love from bangladesh

@ashkanfarahani6532 4 жыл бұрын

Hi Corey. I think this might be a relevant simpler approach for getting percentage. I used value_counts(normalize=True) instead of sum. df.groupby(['Country'])['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True)) This of course return both percentage who know Python and Who don't know. So if we want to get for a specific country, for instance Japan, then: df.groupby(['Country'])['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').value_counts(normalize=True)).loc['Japan'][1]

@RegularDude95 Жыл бұрын

I have a similar approach to,i am happy to see that i am not the only one who always sees the easiest ways =)))

@JahidHasan-Aneek Жыл бұрын

instead of using : value_counts(), use sum() in second line. Then you'll get appropriate answer. df.groupby(['Country'])['LanguageWorkedWith'].apply(lambda x: x.str.contains('Python').sum())

@abdulkadirguven1173 Жыл бұрын

Great approach. Thanks for sharing

@thotarohith2060 Жыл бұрын

Here is my approach : filt=df['LanguageWorkedWith'].str.contains('Python',na=False) python_count=df.loc[filt]['Country'].value_counts() python_count.rename('p_c',inplace=True) python_count -- total_count=country_grp['Country'].value_counts() total_count.rename('t_c',inplace=True) total_count -- result_horizontal = pd.concat([total_count, python_count], axis=1) import numpy as np result_horizontal.replace({'p_c':np.nan},0,inplace=True) result_horizontal['perc']=(result_horizontal['p_c']/result_horizontal['t_c'])*100 result_horizontal

@antoniodefalco6179 3 жыл бұрын

you're a amazing teacher man, thank you for this free content

@anantharjun9662 Жыл бұрын

Heyyy coreyyy I got drops in my eyes after watching the way you taught....you made my day❣️✨love you so much corey

@akosasuke5128 2 жыл бұрын

Corey Shafer deserves a KZbin Teacher award

@weiyancheng6360 4 жыл бұрын

Hi Schafer, thanks a lot for making these great vidoes and sharing the programming knowledge with us. I have watched your python, django tutorials and now this pandas topic and matplotlib is my next plan. I am 35 years old with no basic knowledge about programmnig, but your great work helped me a lot to learn new things.

@moushumitamanna 4 жыл бұрын

First of all, Thank you so much for all your tutorials. You are a great teacher. and yes, please make a video on Multiple Indexing.

@AAND8805 3 жыл бұрын

I am following your pandas series since the last 3 days and may complete in 1 or 2 days max, I will come back to the series to revise it, very well made Series and keep up the good work !😀

@satyamgaba 4 жыл бұрын

Best Valentine's Video!!

@samratsengupta8881 4 жыл бұрын

good job Corey, your content is well suited for preparation of data science

@mohammadsalimkhan4974 4 жыл бұрын

Good work Corey. Loved all the explanations:-)

@AkshayGhadi01 Жыл бұрын

Hello Corey Sir, I love your teaching a lot. You are the best. Thank you

@Terence818 4 жыл бұрын

Yes Corey, having a future video on multi-index will be very helpful!

@xuanyibutzin4775 Жыл бұрын

Super helpful video! Thank you Corey!

@ahmedabdulrahman8567 4 жыл бұрын

very useful ... thanks Corey for the great effort

@omj7113 Жыл бұрын

Thanks a lot for your teaching! Here is the my solution at the end of the video: # group object['column'] is a Series object, so the input of the function is a Series, ana the output value of the function is a float def percent_know_python_each_country(countrySeries): num_know_python = countrySeries.str.contains('Python').sum() num_all = len(countrySeries) percent = round((num_know_python / num_all * 100), 2) return percent country_group['LanguageWorkedWith'].apply(percent_know_python_each_country).sort_values(ascending=False).head(30)