Advanced Use of groupby(), aggregate, filter, transform, apply - Beginner Python Pandas Tutorial #5

42,458 views

StrataScratch

1 day ago

Comments: 57

@immanuelsuleiman7550 4 years ago

That display 3 dataframes function is incredibly useful thanks

@stratascratch 4 years ago

Glad you found it useful!

@immanuelsuleiman7550 4 years ago

@stratascratch keep up the good work

@henrygraterol 2 years ago

The way you explain and manually break down the methods is amazing. I do not have a software background, only basic experience with for and while loops and if-else statements in C. I am able to easily understand each method due to the structure of your presentation. I subscribed to your channel after this video. Hope to see more of you and see your channel grow.

@thequiickbrownfox 15 hours ago

excellent excellent tutorial!

@StefanoVerugi 1 year ago

You are truly talented. The superb teaching quality of your method is by far better (and more effective with entry-level people like myself) than most of what can be found in the YT sphere these days. Most grateful, immediately subscribed.

@iaroslavd.916 4 months ago

Great tutorial! Very detailed. Thank you!

@craftykidsclub7039 3 years ago

The way it explains everything is really awesome. Thank you for the nice video!!!

@stratascratch 3 years ago

Thank you! I'm glad you enjoy the pandas tutorial. Definitely a must know if you're working with data and with python. Take a look at the notebook as well!

@shahidkarim7352 4 years ago

Looking forward to the SQL and other Python videos, thanks for the content.

@tarast4456 4 years ago

Thank you for this information. The apply method example has helped me with my project

@stratascratch 4 years ago

Wonderful!

@Bakhiet89 3 years ago

Thank you so much!! I was fighting with groupby and apply!

@stratascratch 3 years ago

I'm glad you found it useful! Good luck with python.

@manuel9345 1 year ago

Thank you, very useful

@TassoP-p6m 1 year ago

Content is interesting; it's a carbon copy of material described in the “Data Science Handbook” by Jake VanderPlas. Now you have the option to read or watch the video.

@juliannavas9561 4 years ago

Very good video, many many thanks!

@stratascratch 4 years ago

Glad you liked it!

@yusufbas035 2 years ago

Great work, keep going dude

@jaysonjaylen 2 years ago

Great video, if you could increase the volume somehow that'd be great though.

@kennethstephani692 8 months ago

Great video!!

@davida99 2 years ago

Love the videos. I became a premium member on SS and subbed to the channel. I've seen a huge improvement in my sql AND python skills ! It would be nice to add more questions like leetcode does where maybe you restrict some questions to only using UPDATE or DELETE FROM or even some practice questions where we create tables

@stratascratch 2 years ago

That's great to hear! We're definitely going to be releasing UPDATE/DELETE/CREATE questions this year. On our roadmap are data structure & algorithm questions, take home assignments using python notebooks, and UPDATE/DELETE/CREATE questions. Stay tuned!

@davida99 2 years ago

@stratascratch Wow, can't wait!

@mohammadyahya78 2 years ago

Thank you very much. Hopefully you can also do more advanced pandas videos. This is very helpful. I'm not sure what `str.lower` at 36:22 means. How does it know that this refers to the `key` column?
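A minimal sketch of what is probably happening at that point, assuming (as in the Python Data Science Handbook version of this example) that `key` was first moved into the index with `set_index`. When `groupby` is given a function instead of a column name, pandas calls that function on every index label and groups by the returned value. The data values below are illustrative, not taken from the video.

```python
import pandas as pd

# Illustrative data; the values are an assumption for this sketch.
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": range(6),
    "data2": [5, 0, 3, 3, 7, 9],
})

# Move 'key' into the index so that groupby(function) has labels to work on.
df2 = df.set_index("key")

# str.lower is called on each index label ('A' -> 'a', ...),
# and rows whose labels map to the same result form one group.
lowered_means = df2.groupby(str.lower).mean()
print(lowered_means)
```

So `str.lower` never looks for a column called `key`; it only sees whatever labels are in the index at the time `groupby` is called.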

@Sheshagiriksrao 2 years ago

Nice one. I was not able to understand the groupby section from Jake VanderPlas's Python Data Science Handbook, but your video helped me out. Could you please take the planets data set example and use two keys to group by? It is a bit tricky to understand. Thank you

@joseleonardosanchezvasquez1514 2 years ago

Great thanks

@stratascratch 2 years ago

You're welcome.

@Konzor 3 years ago

Thanks a lot. Really clear.

@soojinkim6450 2 years ago

Thank you for the explanation. What's the difference between transform() and apply()?

@adityaaware9844 1 year ago

apply can use multiple columns of each group, but it's slower... transform works on a single column at a time, but it's faster
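A small sketch of the difference, using made-up data (the column names follow the video's `key`/`data1`/`data2` convention; the helper name `data1_over_data2_total` is hypothetical). It only illustrates the shape of the results, not the speed.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "key": ["A", "B", "A", "B"],
    "data1": [1, 2, 3, 4],
    "data2": [10, 20, 30, 60],
})

# transform: on a single selected column, the function receives that column's
# values for one group (a Series); the result has the same length as the input.
centered = df.groupby("key")["data1"].transform(lambda s: s - s.mean())

# apply: the function receives the whole sub-DataFrame of one group,
# so it can combine several columns and return anything (here, one number).
def data1_over_data2_total(g):
    return g["data1"].sum() / g["data2"].sum()

ratio = df.groupby("key")[["data1", "data2"]].apply(data1_over_data2_total)

print(centered)  # one value per original row
print(ratio)     # one value per group
```

A rough rule of thumb: reach for `transform` when you want a result aligned row by row with the original data, and `apply` when the per-group computation needs more than one column or returns a different shape.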

@AnkanChatterjee-d8v 1 year ago

can you please drop a link to download the dataset 'planets'? That would really help me. Thank you :)

@stratascratch 1 year ago

Here you go! github.com/mwaskom/seaborn-data/blob/master/planets.csv

@jongcheulkim7284 2 years ago

Thank you^^

@Sam-tg4ii 1 year ago

Hard to read the screen. Plz zoom in when recording. Clear explanations. Thanks

@utkalmaheshwari 1 year ago

The filter function is applied to the groupby object. How does it return rows from the original dataset?

@stratascratch 1 year ago

I'm not sure if I understand your question but the filter function still has access to the dataset so the output can still have values from the original dataset if filtered in the correct way. I would play around with the filters and see what you get in the output as you experiment.
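A minimal sketch of that behaviour, with illustrative data (the values and the threshold of 4 are assumptions for this example). The function passed to `filter` is called once per group and returns True or False; pandas then returns the original rows of every group that passed, with their original index labels.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": range(6),
    "data2": [5, 0, 3, 3, 7, 9],
})

# Keep every row belonging to a group whose data2 standard deviation exceeds 4.
# The lambda sees one group's rows at a time and returns a single boolean.
kept = df.groupby("key").filter(lambda g: g["data2"].std() > 4)
print(kept)
```

Group A is dropped as a whole, while the B and C rows come back unchanged, which is why the output looks like a subset of the original dataset rather than an aggregated table.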

@kirubababu7127 3 years ago

Hi bro, my requirement is: I have to group by the 'key' column and the 'data2' column, and I need the sum of data1. Kindly share your ideas.

@yadali4833 1 year ago

What is x in x['data2']? Is it df? If so, why, when x is the cell value in the transform section?

@osoriomatucurane9511 1 year ago

I have the same issue; I struggle a lot to get my head around function parameters and iteration. It seems to me x is a row element, and each row is a distinct category. Now, looking at the [ ] operator, I guess x['data'] accesses the data series corresponding to category x, over which the aggregation measure is calculated, in this case the standard deviation.
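One way to check what `x` is: record its type inside the function. The sketch below assumes illustrative data and reuses the `norm_by_data2` name mentioned elsewhere in this thread; the `center` helper and the `seen` dictionary are hypothetical additions for the demonstration.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": range(6),
    "data2": [5, 0, 3, 3, 7, 9],
})

seen = {"apply": set(), "transform": set()}

def norm_by_data2(x):
    # With apply, x is a DataFrame holding all rows of one group,
    # so x['data2'] is that group's data2 column.
    seen["apply"].add(type(x).__name__)
    out = x.copy()
    out["data1"] = x["data1"] / x["data2"].sum()
    return out

def center(x):
    # With transform on one selected column, x is a Series:
    # that column's values for one group.
    seen["transform"].add(type(x).__name__)
    return x - x.mean()

normed = df.groupby("key", group_keys=False)[["data1", "data2"]].apply(norm_by_data2)
centered = df.groupby("key")["data1"].transform(center)

print(seen)
```

So `x` is neither the full `df` nor a single cell: it is the slice of the data that belongs to one group, passed in once per group.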

@0Fallen0 2 years ago

I came from the pandas data science handbook to youtube to learn more but this is the same thing lol

@ohh_nina_nyc 3 years ago

Nice video

@stratascratch 3 years ago

Thank you. I created this lecture and notebooks for a university course and released the contents for free. So I hope you like it.

@manavsaxena5579 3 years ago

Hi Nate, I was practicing '3 Bed Minimum Problem' on the website and although I managed to solve the question in SQL, I am really struggling with the Pandas Solution. Could you please make a video on it or provide the solution? Also, I would really appreciate if in your future videos you could solve the same problem in both SQL and Pandas.

@stratascratch 3 years ago

Here's the python solution to the 3 bed min problem:

# keep only neighbourhoods where every listing has at least 3 beds, then average the beds per neighbourhood
min_beds = airbnb_search_details.groupby('neighbourhood').filter(lambda g: g['beds'].min() >= 3).groupby('neighbourhood')['beds'].mean().reset_index()
result = min_beds.rename(index=str, columns={"beds": "n_beds_avg"}).sort_values('n_beds_avg', ascending=False)

Hope that helps. I'll be doing some python videos in the future but not all of the questions will have a python solution, unfortunately. I will try though!

@royalchamp 2 years ago

thanks

@Pandimoori_krish 3 years ago

How do I create HTML/PDF reports after data cleaning to send to a client? Please make a video.

@stratascratch 3 years ago

Try this man! kzbin.info/www/bejne/i56xY5KIabB4nZo

@Konzor 3 years ago

Hi Nate, I have a question @25:56 when you do groupby apply: How do you use groupby().apply(function) if you have multiple input parameters of the function? E.g. if in your example "norm_by_data2" would have 2 inputs (x,y).

@stratascratch 3 years ago

Are you talking about something like this? stackoverflow.com/questions/43483365/use-pandas-groupby-apply-with-arguments
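A short sketch of the idea, assuming illustrative data: `groupby(...).apply(func, *args, **kwargs)` forwards any extra positional and keyword arguments to `func` after the group itself. The function `norm_by_column` and its `column` and `scale` parameters are hypothetical, modelled on `norm_by_data2` from the question.

```python
import pandas as pd

# Illustrative data.
df = pd.DataFrame({
    "key": ["A", "B", "C", "A", "B", "C"],
    "data1": range(6),
    "data2": [5, 0, 3, 3, 7, 9],
})

def norm_by_column(x, column, scale=1.0):
    # x is one group's DataFrame; column and scale are the extra inputs.
    out = x.copy()
    out["data1"] = scale * x["data1"] / x[column].sum()
    return out

# "data2" is passed as the positional argument `column`,
# scale=100 as a keyword argument.
result = df.groupby("key", group_keys=False)[["data1", "data2"]].apply(
    norm_by_column, "data2", scale=100
)
print(result)
```

An equivalent option is to wrap the call in a lambda, for example `apply(lambda g: norm_by_column(g, "data2", scale=100))`, which some people find easier to read.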

@SudhirKumar-ry4gk 3 years ago

Please help. I have employee data in which each employee made multiple sales. If an employee's sales total more than 50000, I want to print Excellent against every row for that employee ID, and Low for the rest. Like:

Emp ID    Sale    Status
Emp1001   5000    Excellent
Emp1001   45000   Excellent
Emp1001   2000    Excellent
Emp1002   5000    Low
Emp1003   2500    Low

@stratascratch 3 years ago

I think you'd probably want to do a groupby() on employee ID first. Then create a new column (the status column) and add the value ('excellent' or 'low') for the status column based on the employee total sale that you were able to calculate from the groupby(). This can be done using an if/else statement. Hope that helps!

@SudhirKumar-ry4gk 3 years ago

@stratascratch Can you please share the code? It will help me a lot.

@stratascratch 3 years ago

@SudhirKumar-ry4gk Something like this might work. Hard to test without the dataset. Refer to this resource (stackoverflow.com/questions/40603264/pandas-add-a-new-column-in-a-data-frame-based-on-a-value-in-another-data-frame) for help. Also, you can post on stackoverflow since it's a website of people helping out others.

import pandas as pd
# total sale per employee
df = employee_table.groupby('id')['sale'].sum().reset_index(name='total_sale')
df['status'] = ['excellent' if x > 50000 else 'low' for x in df['total_sale']]
# attach each employee's status to every one of their rows
final_df = pd.merge(employee_table, df[['id', 'status']], on='id', how='left')
# then remove all the rows you don't need.
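As an alternative sketch that skips the merge, `transform('sum')` can broadcast each employee's total back onto every one of their rows. The table below is rebuilt from the sample rows in the question, and the `id` and `sale` column names are taken from the reply above; both are assumptions about the real dataset.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; column names are assumed.
employee_table = pd.DataFrame({
    "id": ["Emp1001", "Emp1001", "Emp1001", "Emp1002", "Emp1003"],
    "sale": [5000, 45000, 2000, 5000, 2500],
})

# Total sale per employee, repeated on each of that employee's rows.
total = employee_table.groupby("id")["sale"].transform("sum")

# Label every row by its employee's total.
employee_table["status"] = np.where(total > 50000, "excellent", "low")
print(employee_table)
```

Here Emp1001 totals 52000, so all three of that employee's rows are labelled excellent, and the other two employees are labelled low.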

@SudhirKumar-ry4gk 3 years ago

@stratascratch thanks for your support