How to use groupby() to group categories in a pandas DataFrame

128,019 views

Chart Explorers

Days ago

Comments: 147
@ShiladityaBiswasNow · 3 years ago
Thanks a lot! You saved me days! I'm literally crying rn. So precise and to the point. Love the content.
@ChartExplorers · 3 years ago
I'm glad it helped! Groupby was always a sore spot for me when I was learning, but now that I know it I use it all the time.
@DuniyaJahan1 · 2 years ago
🙏🙏🚩🚩🙏🙏 Truly, sir, a great lecture. I had been trying to understand groupby in pandas for the last 25 days, but no one was able to clear up my confusion. But you, sir, explained it brilliantly, and I am really obliged to you. Thanks! I subscribed and shared it on my Facebook page, from Banaras City, India 😄😄😄🙏🙏🙏🙏🙏🙏
@rashadm.sadigov4366 · a year ago
Dude, thank you sooo much. Finally someone with proper English explained things properly.
@lightningmi · 2 years ago
Good step-by-step tutorial. But one thing you missed: grouping by multiple columns and applying a different aggregate function to each column. For example, [column A, column B] with A = sum and B = average. Something like that.
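A minimal sketch of what this comment asks for. It assumes a small made-up frame that borrows the Titanic column names (pclass, survived, fare): passing a dict to `.agg()` maps each column to its own aggregate function.

```python
import pandas as pd

# Made-up values; column names borrowed from the Titanic example
df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3],
    "survived": [1, 1, 1, 0, 0, 0],
    "fare":     [80.0, 60.0, 20.0, 10.0, 8.0, 7.0],
})

# A dict maps each column to its own aggregate: survived -> sum, fare -> mean
summary = df.groupby("pclass").agg({"survived": "sum", "fare": "mean"})
print(summary)
```

The same call also works with several grouping columns, e.g. `df.groupby(["pclass", "sex"]).agg({...})`.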
@athief · 2 years ago
It's great to have a 5-minute quick-and-dirty dive, but it would help to spend a couple more seconds here and there explaining that "agg" means "aggregate", that if we want more than one column summarised we must provide a list (hence the double brackets), etc. A simple explanation like that makes things easier to remember.
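Here is a small sketch of the bracket point on made-up data (the column names come from the Titanic example, the values are invented). Single brackets return a Series. Double brackets pass a Python list and return a DataFrame. `.agg()` accepts a list of function names.

```python
import pandas as pd

df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3],
    "survived": [1, 1, 1, 0, 0, 0],
    "fare":     [80.0, 60.0, 20.0, 10.0, 8.0, 7.0],
})

# Single brackets select one column -> the result is a Series
one_col = df.groupby("pclass")["survived"].sum()

# Double brackets: the outer pair is the selection, the inner pair is a
# Python list of column names -> the result is a DataFrame
two_cols = df.groupby("pclass")[["survived", "fare"]].sum()

# .agg() ("aggregate") applies every function in the list to every column
stats = df.groupby("pclass")[["survived", "fare"]].agg(["sum", "mean"])

print(type(one_col).__name__)   # Series
print(type(two_cols).__name__)  # DataFrame
print(stats)
```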
@crystalchaung1576 · 2 years ago
I had to watch this a couple of times to hear the part around 4:18 about why groupby will only return those who survived. It is good you added that. Now that I understand it, I can take a shot at age groups for the Titanic.
@Aleqsie · 10 months ago
OK, this is mad comprehensive information, explained amazingly briefly and clearly in just 7 minutes.
@imad_uddin · 3 years ago
I have seen three of your videos so far, all were very well thought out. Really helpful. You deserve many more subscribers!
@ChartExplorers · 3 years ago
Thanks for your kind words Imad Uddin!
@sgerodes · 3 years ago
Brilliant. It had exactly what I needed: multiple groups and the splitting trick.
@ChartExplorers · 3 years ago
Perfect! I'm glad it had what you needed.
@skye5107 · 11 months ago
Thanks a lot, I have been searching articles for this for weeks.
@tonianibal7585 · 2 years ago
Thank you very much for sharing! It really helped me; it was exactly what I was looking for. People like you are blessed and good people helping to develop this world! I just subscribed and followed, and will share it in my groups!
@jackfarah7494 · 10 months ago
Simple and informative. I love this video and am saving it for future reference! Thank you!
@InteligenciadeNegocios · 2 years ago
This is one of the best videos EVER! Really helpful! Thanks a LOT!
@lawngreenlyp · 3 years ago
This is a very good explanatory video. Thanks so much from Hong Kong.
@Monkeysal07 · 3 years ago
THANK YOU!!! That last tip is a lifesaver.
@saisarath623 · 2 years ago
Really helpful tricks. Thank you!
@ChartExplorers · 2 years ago
You're welcome!
@aishwaryapattnaik3082 · 2 years ago
Just what we needed. Awesome content 🙌🏼
@mrb7931 · a year ago
Thanks a lot! You saved my day. Now I can calculate means by category in my datasets.
@blueciel_03 · 10 months ago
Thanks a lot, it's really informative for my upcoming exam.
@carolinamalosabastos2648 · 11 months ago
Great video! So clear... It helps me a lot! Thanks from Brazil!
@rohitekka2674 · 3 years ago
Concise, short, illustrious!! Thanks a lot!!!
@ChartExplorers · 3 years ago
You're welcome!
@afonsoosorio2099 · 2 years ago
Awesome 👌. Crystal clear 🔮. I especially like the bin trick; it's straightforward. That is really amazing 👏 😍. I used to break data into intervals with numpy select() or a user-defined function with apply() to get the same result as the bin method. Keep it up.
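Here is a quick way to check that equivalence on a handful of made-up ages, using hypothetical edges 0/18/60/120. `pd.cut` with explicit bins and the matching `np.select` conditions give the same labels. `pd.cut` intervals are closed on the right by default, so the conditions use `<=`.

```python
import numpy as np
import pandas as pd

ages = pd.Series([2, 15, 29, 45, 61, 80])
labels = ["young", "middle_age", "old"]

# pd.cut with explicit edges; intervals are right-closed: (0, 18], (18, 60], (60, 120]
cut_labels = pd.cut(ages, bins=[0, 18, 60, 120], labels=labels)

# The same intervals written out as conditions for np.select
conditions = [ages <= 18, (ages > 18) & (ages <= 60), ages > 60]
select_labels = np.select(conditions, labels, default="unknown")

print(list(cut_labels.astype(str)))
print(list(select_labels))
```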
@zebramc3693 · a year ago
Thank you for your detailed demonstrations.
@andrenevares7543 · 2 years ago
Great explanation! Good JOB! Thumbs up!
@VRUNO · 2 years ago
You got a new follower, sir! Really clear, really well explained. God, finally I understand :D Thanks so much!
@ThanhVo-zs7ns · 2 years ago
Very good and funny videos bring a great sense of entertainment!
@ssrwarrior7978 · 3 years ago
Wow, you made it easy for me and saved a lot of time. THANK YOU
@Jitendrakumar-du1ng · 2 years ago
Thanks for the great video, it really helped me.
@vitorribeirosa · a year ago
Neat and objective!!! Thanks for sharing. I do appreciate your content.
@ZirothTech · 2 years ago
Great video, thanks!
@fashaikh5339 · 3 years ago
VERY CLEAR. PLEASE, IF YOU CAN, EXPLAIN HOW TO DO AN INTERSECTION IN CASE WE HAVE A ONE-TO-MANY RELATIONAL DATABASE? THANKS
@youknownothing_ · a year ago
Great video. It would be great if you also provided the link to the notebook.
@XuanTran-ri1hn · 2 years ago
Hi. Thank you for your video. May I ask how you know exactly which age goes into which bin? The ages are put into 3 bins, but I am unclear which exact ages each bin contains. For example, what is the age range for 'young' in this case?
@JopieSchaft · 2 years ago
@Adeel Khan I can think of 3 approaches to this:
- Group by age_bins, then take the minimum and maximum age: df.groupby(['age_bins'])['age'].agg(['min', 'max'])
- Use retbins=True in the pd.cut() function; it also returns the bin edges that cut used.
- Define the bins yourself, e.g. bins=[0, 20, 60, 120] (instead of bins=3 as in the video) will divide the passengers into (0, 20], (20, 60] and (60, 120] bins.
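The three approaches in this reply, sketched on a handful of made-up ages. The real Titanic column would produce different edges.

```python
import pandas as pd

# Made-up ages; the real Titanic column would produce different edges
df = pd.DataFrame({"age": [2, 10, 25, 30, 45, 60, 72, 80]})
labels = ["young", "middle_age", "old"]

# Approach 1: bins=3 as in the video, then look up each bin's youngest and oldest age
df["age_bins"] = pd.cut(df["age"], 3, labels=labels)
ranges = df.groupby("age_bins", observed=True)["age"].agg(["min", "max"])
print(ranges)

# Approach 2: retbins=True also returns the bin edges that cut computed
_, edges = pd.cut(df["age"], 3, retbins=True)
print(edges)   # the first edge is nudged slightly below the minimum age

# Approach 3: choose the edges yourself; intervals are (0, 20], (20, 60], (60, 120]
df["age_bins_custom"] = pd.cut(df["age"], bins=[0, 20, 60, 120], labels=labels)
print(df)
```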
@denisml42 · 2 years ago
Thanks for the great video. I'm wondering how you could group the ages in intervals of 10 years. I feel like you probably wouldn't use cut for that, since you would need to know the highest/lowest age to determine how many cuts you need. Do you have a recommendation on how to do that?
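Here are two possible ways to get 10-year groups without first counting how many bins you need, sketched on made-up ages. Floor division maps each age to the start of its decade. Alternatively, `np.arange` can build decade edges from the data's own maximum and pass them to `pd.cut`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [0.9, 4, 17, 22, 38, 39, 54, 71, 80]})

# Option 1: floor each age to the start of its decade; no bin count needed
df["decade"] = (df["age"] // 10 * 10).astype(int)
counts = df.groupby("decade")["age"].count()
print(counts)

# Option 2: build 10-year edges from the data's own maximum for pd.cut.
# The last edge lands above the largest age, so the oldest person still
# falls inside a [lower, upper) interval.
top = df["age"].max() // 10 * 10 + 20
edges = np.arange(0, top, 10)              # 0, 10, ..., 90 for this data
df["decade_bin"] = pd.cut(df["age"], bins=edges, right=False)
print(df.groupby("decade_bin", observed=True)["age"].count())
```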
@mohamedfawzy5453 · a year ago
Great explanation! Thank you.
@gabriellopes0 · a year ago
Great explanation!
@MohsinAli-yd9js · 3 years ago
At 5:39, when setting the labels for 'age_bins', how did it know which age group is young, which is middle and which is old? You did not set ranges such as 0 to 20 for young, 21 to 60 for middle and above 60 for old. Or does it do that implicitly?
@JopieSchaft · 2 years ago
Using bins=3 as a parameter to the pd.cut() function automatically divides the ages into 3 categories of equal width (equal age ranges, not equal numbers of passengers). See my comment to Xuan Tran for how you can find out which ranges it used or what you could do differently.
@onurkoc6869 · 2 years ago
You explain things very well, professor :))
@febriannuralam4760 · 28 days ago
I love it, keep it up mate.
@coledd9487 · 2 years ago
Hey there, for some reason when I try doing Single Group, Multiple Columns (like at 2:19), I keep getting an error basically stating that it thinks my 'fare' column is filled with strings as opposed to floats. As a result, I can't do sum/mean/numeric methods on that data. I can't seem to get around it.
@ChartExplorers · 2 years ago
Hey Cole DD, sometimes when you read in your data, pandas thinks a column is a string even though it should be integers or floats. This video, kzbin.info/www/bejne/m6euiqyJgbitr80, discusses how to convert the datatypes of columns and some common problems you may run into when doing so. Let me know if that works.
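A minimal sketch of one common fix. It assumes the column is text because of a stray non-numeric entry; the '?' below is invented for illustration. `pd.to_numeric(..., errors='coerce')` converts what it can and turns the rest into NaN.

```python
import pandas as pd

# 'fare' read in as text (object dtype); the '?' entry is invented for illustration
df = pd.DataFrame({
    "pclass": [1, 1, 3, 3],
    "fare":   ["80.5", "60", "7.25", "?"],
})
print(df["fare"].dtype)        # object

# Convert to numbers; anything that cannot be parsed becomes NaN instead of raising
df["fare"] = pd.to_numeric(df["fare"], errors="coerce")
print(df["fare"].dtype)        # float64

# mean() skips NaN by default
mean_fare = df.groupby("pclass")["fare"].mean()
print(mean_fare)
```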
@TheShrikhande · 3 years ago
What if I have a dataframe with two date columns (start-date, end-date) along with other attributes, and I wish to create bins for each year, incorporating both of those date columns? How do you think I could manage to do that?
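One possible reading of this question is counting how many records are active in each calendar year. Here is a sketch under that assumption, using hypothetical column names start_date and end_date. It lists every year each row spans, uses `explode()` to give each year its own row, and then groups by year.

```python
import pandas as pd

# Hypothetical records with a start and an end date
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "start_date": pd.to_datetime(["2019-03-01", "2020-06-15", "2021-01-10"]),
    "end_date":   pd.to_datetime(["2021-02-28", "2020-12-31", "2022-05-01"]),
})

# List every calendar year each record touches, from its start year to its end year
df["year"] = [list(range(s.year, e.year + 1))
              for s, e in zip(df["start_date"], df["end_date"])]

# explode() gives each list element its own row, so a record spanning
# three years now appears once per year
per_year = df.explode("year")
per_year["year"] = per_year["year"].astype(int)

# Number of records active in each year
active = per_year.groupby("year")["id"].count()
print(active)
```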
@mohamedkhaled902 · a year ago
Very helpful, keep it up ❤
@ericc1317 · 2 years ago
The as_index=0 tip is great! When doing this with .count() instead of sum, for example in a project where I use the format df.groupby(['x'], as_index=False)['y'].count(), is there any way to keep the original y column along with the new y "count" column in the resulting data frame? With this method, it replaces the original y with the count of y.
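`transform()` is one way to do this. It returns a result aligned with the original rows instead of one row per group, so the count can be added as a new column while y stays unchanged. A small sketch with made-up x/y values:

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "a", "b", "a", "b"], "y": [10, 20, 30, 40, 50]})

# transform() returns one value per original row (aligned on df's index),
# so the group count can sit next to the untouched y values
df["y_count"] = df.groupby("x")["y"].transform("count")
print(df)
```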
@rajibroy1170 · a year ago
You are a savior
@nivviyer_ · 2 years ago
Thank you so much, sir!!
@aliyananwar3727 · 2 years ago
I came here to understand the concept of groupby but left with emotions about what we men sacrificed. 🥺
@pazenriqueguillermo · 2 years ago
Great video! One question... Let's say you do the first example, grouping survivors by class and using sum(), but I want the result sorted in descending order (from the class with the most survivors to the least). How would you do that?
@coledd9487 · 2 years ago
.sort_values(ascending=False)
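Here is that reply combined with the first example from the video, on made-up data (the survivor counts are invented):

```python
import pandas as pd

# Made-up passengers; survivors: class 1 -> 1, class 2 -> 2, class 3 -> 0
df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3, 3],
    "survived": [1, 0, 1, 1, 0, 0, 0],
})

# sum() returns a Series indexed by pclass; sort_values(ascending=False)
# orders it from the most survivors to the fewest
by_class = df.groupby("pclass")["survived"].sum().sort_values(ascending=False)
print(by_class)
```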
@bnadir3930 · 2 years ago
Great video! How can I get the max() value grouped by a column and still have all of the dataframe's columns shown?
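One approach, sketched on a made-up frame. `idxmax()` returns the index label of each group's maximum row, and `.loc` then retrieves those full rows with every column. If a group has ties, `idxmax()` returns only the first matching row.

```python
import pandas as pd

df = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3],
    "name":   ["A", "B", "C", "D", "E"],
    "fare":   [80.0, 120.0, 13.0, 26.0, 7.25],
})

# idxmax() gives, for each group, the index label of the row with the largest fare
best_rows = df.groupby("pclass")["fare"].idxmax()

# .loc pulls those complete rows, so every column is kept
print(df.loc[best_rows])
```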
@tinayesibanda3070 · a year ago
How can I combine groupby, then do a distinct count on one of the categorical columns, then a sum on some of the numeric columns?
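Here is a sketch using named aggregation on a made-up frame (store, customer, amount and items are hypothetical column names). `nunique` gives the distinct count and `sum` handles the numeric columns, all in one `.agg()` call.

```python
import pandas as pd

df = pd.DataFrame({
    "store":    ["A", "A", "A", "B", "B"],
    "customer": ["c1", "c2", "c1", "c3", "c3"],
    "amount":   [10.0, 5.0, 7.5, 20.0, 1.0],
    "items":    [1, 2, 1, 4, 1],
})

# Named aggregation: output_name=(input_column, function)
summary = df.groupby("store").agg(
    unique_customers=("customer", "nunique"),   # distinct count
    total_amount=("amount", "sum"),
    total_items=("items", "sum"),
)
print(summary)
```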
@nurshibumi · 2 years ago
Thank you for your time and effort! I have a question: I have a dataset with a few columns, including "Fuel_Type". The fuel types are petrol, diesel and CNG. All I want is to group by Fuel_Type and store copies of the petrol and diesel data in variables. How can I do that? I have been searching for hours :))) Please answer me.
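A minimal sketch, assuming the values are spelled exactly 'Petrol', 'Diesel' and 'CNG' (the other columns are invented). `get_group()` pulls out one group, and iterating over the groupby object yields every group at once.

```python
import pandas as pd

df = pd.DataFrame({
    "Car":       ["c1", "c2", "c3", "c4", "c5"],
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol", "Diesel"],
    "Price":     [3.5, 5.0, 2.9, 4.1, 6.2],
})

grouped = df.groupby("Fuel_Type")

# get_group() returns the rows of one group; .copy() makes an independent frame
petrol_df = grouped.get_group("Petrol").copy()
diesel_df = grouped.get_group("Diesel").copy()

# Or keep every group in a dict keyed by fuel type
by_fuel = {fuel: group.copy() for fuel, group in grouped}
print(petrol_df)
print(list(by_fuel))
```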
@czr372 · a year ago
Saved me looots of hours haha! thanx!
@hansrc4469 · 2 years ago
When I use groupby for multiple columns like you did, it shows me a message saying to use a list instead of square brackets.
@pramishprakash · a year ago
Great video, sir
@osoriomatucurane9511 · a year ago
Hi Bradon, awesome tutorial. At 4:41 (survived by class, mean and sum), a proportion would have been more meaningful. How do I get a percentage there, I mean the proportion that survived (the survival rate) by class? Using transform? For aggregation, only sum, mean, count, ... are allowed.
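Because survived is coded as 0/1, the mean per class already is the survival proportion, so multiplying by 100 gives a percentage. `transform('mean')` writes that rate back onto every row. A sketch on made-up passengers:

```python
import pandas as pd

df = pd.DataFrame({
    "pclass":   [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "survived": [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
})

# survived is coded 0/1, so the group mean is the survival proportion
rate_pct = df.groupby("pclass")["survived"].mean() * 100
print(rate_pct)          # one row per class

# transform() broadcasts the same rate back onto every passenger row
df["class_survival_pct"] = df.groupby("pclass")["survived"].transform("mean") * 100
print(df)
```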
@rohanbangash5827 · 2 years ago
How would we put the result of a groupby function as a column in our dataframe?
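One option, sketched on made-up fares: compute the per-group result, then `map()` it back onto the grouping column so every row receives its group's value. `df.groupby('pclass')['fare'].transform('mean')` would produce the same column in one step.

```python
import pandas as pd

df = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3],
    "fare":   [80.0, 60.0, 20.0, 8.0, 6.0],
})

# One value per class...
mean_fare = df.groupby("pclass")["fare"].mean()

# ...looked up for every row via map() on the grouping column
df["class_mean_fare"] = df["pclass"].map(mean_fare)
print(df)
```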
@govindrajput8503 · 2 years ago
Hi, thanks for this. How do I show groupby results for more than one variable with more than one aggregate function, without the index? So basically multiple groups as columns, aggregated with more than one function.
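A sketch on made-up data. It groups by two columns, uses named aggregation so the output columns stay flat (no MultiIndex), and calls `reset_index()` so the group keys come back as ordinary columns.

```python
import pandas as pd

df = pd.DataFrame({
    "pclass":   [1, 1, 1, 3, 3, 3],
    "sex":      ["female", "female", "male", "female", "male", "male"],
    "survived": [1, 1, 0, 1, 0, 1],
    "fare":     [80.0, 90.0, 60.0, 8.0, 7.0, 9.0],
})

# reset_index() turns the group keys back into ordinary columns;
# named aggregation keeps the output column names flat
result = (
    df.groupby(["pclass", "sex"])
      .agg(survived_sum=("survived", "sum"),
           survived_mean=("survived", "mean"),
           fare_mean=("fare", "mean"))
      .reset_index()
)
print(result)
```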
@ahovebismark4001 · 2 years ago
So please, I need a personal favor: I need to make labels for a plot I generated from a groupby method. Any help with that?
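`Series.plot()` returns a matplotlib Axes, and labels are set on that object. A minimal sketch, assuming matplotlib is installed and using made-up survivor counts. `bar_label` needs matplotlib 3.4 or later.

```python
import matplotlib
matplotlib.use("Agg")            # off-screen backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up survivor counts
df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3],
    "survived": [1, 1, 1, 0, 1, 0],
})
survivors = df.groupby("pclass")["survived"].sum()

# Series.plot returns a matplotlib Axes; axis labels and the title are set on it
ax = survivors.plot(kind="bar", rot=0)
ax.set_xlabel("Passenger class")
ax.set_ylabel("Number of survivors")
ax.set_title("Survivors by class")

# Write each bar's value above it (matplotlib 3.4+)
ax.bar_label(ax.containers[0])
plt.tight_layout()
# plt.show() displays the figure in an interactive session
```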
@MagnusAnand · 3 years ago
excellent tutorial
@sebastianperalta4775 · 3 years ago
Thanks for the video.
@AIdevel · 2 years ago
I have a problem: it keeps giving me a KeyError. It doesn't recognize the names of the columns. How can I solve it? Please help me.
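A KeyError from groupby often means the name you typed does not exactly match a column name. Here is a sketch of one common cause, assuming the file's headers carry stray spaces and capital letters (the headers below are invented):

```python
import pandas as pd

# Column names read from a file can carry stray spaces or different casing
df = pd.DataFrame({" Pclass": [1, 2, 3], "Survived ": [1, 0, 1]})
print(df.columns.tolist())       # [' Pclass', 'Survived ']

try:
    df.groupby("pclass")["survived"].sum()
except KeyError as err:
    print("KeyError:", err)

# Normalise the names: strip surrounding whitespace and lowercase them
df.columns = df.columns.str.strip().str.lower()
totals = df.groupby("pclass")["survived"].sum()
print(totals)
```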
@maxons.e4643 · 2 years ago
How do you sort the data when different conditions are involved in the groupby?
@javierclement3047 · a year ago
It seems to me like this function doesn't really need to exist. I feel like I could make all of these manipulations relatively easily with Boolean operations. Can someone explain the advantage of using groupby()? Is it just easier, or is there something I'm missing?
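For a handful of categories, the two approaches give the same numbers. Here is a sketch on made-up data comparing a hand-written boolean-mask loop with a single groupby call. The groupby version finds the categories itself and chains directly into `agg()`, `transform()` and the other methods shown in the video.

```python
import pandas as pd

df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3, 3],
    "survived": [1, 0, 1, 1, 0, 0, 1],
})

# Boolean-mask version: one filter per category, looped over by hand
manual = {
    cls: df.loc[df["pclass"] == cls, "survived"].sum()
    for cls in sorted(df["pclass"].unique())
}

# groupby version: one call, categories found automatically
grouped = df.groupby("pclass")["survived"].sum()

print(manual)
print(grouped.to_dict())
```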
@jakobstigsson9687 · 2 years ago
Hey, thanks for the video. I have a dataframe with a column containing values from 0 to 4, but I wish to group it into 0 and 1-4. How would that be possible? Is it a big difference?
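One way, sketched on made-up values: build a grouping key with `np.where` that labels zeros as '0' and everything else as '1-4', then pass that array directly to `groupby()`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [0, 1, 4, 0, 2, 3], "value": [5, 1, 2, 7, 3, 4]})

# Build a grouping key on the fly: "0" for zeros, "1-4" for everything else
key = np.where(df["score"] == 0, "0", "1-4")
totals = df.groupby(key)["value"].sum()
print(totals)
```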
@ibrar6121 · a year ago
In the Quick Tip section, how did the program know that 29 is Middle_age, 2 is Young_age and 50 is old???
@ericzheng4815 · 2 years ago
When trying out this example: df['age_bins'] = pd.cut(df['age'], 3, labels=('young', 'middle_age', 'old')), I got an error: TypeError: can only concatenate str (not "float") to str. I don't know why. I looked at the manual and the code seems fine to me.
@michaelcruz1322 · 3 years ago
How did Python determine which age_bin to place each individual into? You never specified the age ranges associated with the categories.
@ChartExplorers · 3 years ago
Hi Michael, good question. The age bins were created with the pandas cut method, which turns continuous data into categorical data by splitting it into bins. In the video we passed 3 as the bins argument, so cut divided the range of ages (from the youngest to the oldest passenger) into three intervals of equal width and assigned the labels to those intervals in order. The bins have equal width, not an equal number of values; if you want bins with roughly the same number of values in each, pandas qcut does that. pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html
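To see how `pd.cut` with an integer bins argument actually splits the data, and how it differs from `pd.qcut`, here is a sketch on twelve made-up ages with one outlier. `cut` puts almost everyone in the first bin, while `qcut` splits them into three groups of four.

```python
import pandas as pd

ages = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 90])
labels = ["young", "middle_age", "old"]

# cut: 3 intervals of equal WIDTH across min..max; counts can be very uneven
by_width = pd.cut(ages, 3, labels=labels)
print(by_width.value_counts(sort=False))

# qcut: 3 intervals holding roughly equal NUMBERS of values (quantile-based)
by_count = pd.qcut(ages, 3, labels=labels)
print(by_count.value_counts(sort=False))
```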
@Monkeysal07 · 3 years ago
Maybe this will allow you to specify the ranges of the bins. The labels must have one element fewer than the bin edges (n edges make n - 1 intervals): df['age_cat'] = pd.cut(df['age'], bins=[x for x in range(0, 100, 5)], labels=[x for x in range(5, 100, 5)], right=True)
@MachineLearningPro · 11 months ago
Great video
@premprakash6863 · 2 years ago
I want to group by mobile number and merge the messages received. How can I do that?
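Here is a sketch with made-up numbers and messages. `agg(' '.join)` concatenates each group's strings, and `reset_index()` turns the phone number back into a column. Swap ' ' for ', ' or '\n' to change the separator.

```python
import pandas as pd

df = pd.DataFrame({
    "mobile":  ["555-0101", "555-0102", "555-0101", "555-0101"],
    "message": ["hi", "hello", "are you there?", "call me"],
})

# agg(" ".join) concatenates each group's strings in their original row order
merged = df.groupby("mobile")["message"].agg(" ".join).reset_index()
print(merged)
```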
@danielrico3352 · 2 years ago
Thanks for the video! I have a question. If I want to select one specific biological sex, for example just females, how could I write that code? Would it be right to write it like this: df.groupby(["pclass", [sex] == female])["survived].sum()? Thanks in advance!
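The usual pattern is to filter first and group afterwards. Here is a sketch on made-up rows. It also shows an alternative that groups by both columns and then selects the 'female' slice with `xs()`.

```python
import pandas as pd

df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3],
    "sex":      ["female", "male", "female", "female", "male", "female"],
    "survived": [1, 0, 1, 1, 0, 1],
})

# Filter the rows first with a boolean mask, then group what is left
female_by_class = df[df["sex"] == "female"].groupby("pclass")["survived"].sum()
print(female_by_class)

# Alternative: group by both columns and pick the 'female' slice afterwards
both = df.groupby(["pclass", "sex"])["survived"].sum()
female_slice = both.xs("female", level="sex")
print(female_slice)
```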
@fashaikh5339 · 3 years ago
I have a data frame containing three columns: one for restaurants_id, the second for its categories (one or more categories) and the third for its zone. For each restaurant, I need to calculate how many restaurants in its zone share at least one category with it, and put the result in a new column.
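Here is a possible approach, sketched under the assumption that each restaurant's categories are stored as a Python list (the ids, categories and zones below are made up). `explode()` gives one row per category. A self-merge on zone and category then pairs restaurants that share one, and `nunique()` counts the distinct other restaurants, so a pair sharing two categories is counted once. The new column name n_sharing_category is hypothetical.

```python
import pandas as pd

# Hypothetical layout: categories stored as a Python list per restaurant
df = pd.DataFrame({
    "restaurants_id": [1, 2, 3, 4],
    "categories": [["pizza", "italian"], ["italian"], ["sushi"], ["pizza", "sushi"]],
    "zone": ["north", "north", "north", "south"],
})

# One row per (restaurant, category)
long = df[["restaurants_id", "categories", "zone"]].explode("categories")

# Pair every restaurant with every restaurant in the same zone and category,
# then drop the pairs of a restaurant with itself
pairs = long.merge(long, on=["zone", "categories"], suffixes=("", "_other"))
pairs = pairs[pairs["restaurants_id"] != pairs["restaurants_id_other"]]

# Count distinct neighbours (a pair sharing two categories is counted once)
shared = pairs.groupby("restaurants_id")["restaurants_id_other"].nunique()

# Restaurants with no match are missing from `shared`, so fill them with 0
df["n_sharing_category"] = df["restaurants_id"].map(shared).fillna(0).astype(int)
print(df)
```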
@ChartExplorers · 3 years ago
Hi F Ashaikh, is it possible for you to email me your data (or provide me with some made-up data that is similar to the data you have)? That will help me see what is going on a little better. My email is bradonvalgardson@gmail.com
@fashaikh5339 · 3 years ago
I did, thank you very much for your help.
@VKRealsta · 2 years ago
Thanks from the heart
@MatthieuKhairallah · a year ago
Thanks a lot!
@yili6498 · 2 years ago
very clear, thxxx
@paar6128 · a year ago
Wow, you're amazing man :))
@AimarZayyan · 3 years ago
Hi, how do I get the sum for one specific value of the pclass column, for example 1 only?
@ChartExplorers · 3 years ago
I'm not sure I understand your question. Are you looking to filter the dataframe so that only pclass = 1 is contained in it? You could use a boolean mask: pclass1 = df[df['pclass'] == 1]. If that's what you are looking for, you can check out this video on filtering, which I think you will find helpful: kzbin.info/www/bejne/pJqcn5pqf95mkJo
@kiko1955 · 3 years ago
How do I make a graph with the result of a groupby?
@pritisingh2432 · 3 years ago
Hey, I'm having a problem with groupby: it gives a DataError saying "No numeric types to aggregate". Could you please help?
@ChartExplorers · 3 years ago
Hi Priti, will you run df.dtypes and let me know if there are any numeric (float or int) datatypes in your dataframe? If they are all objects, check out this video on how to convert objects into numeric values: kzbin.info/www/bejne/m6euiqyJgbitr80 (hopefully that will solve your problem). If this doesn't solve your problem, will you copy and paste your groupby statement and send it to me, please?
@pritisingh2432 · 3 years ago
@ChartExplorers
import plotly.graph_objs as go
import plotly.offline as po

# Visualize Churn Rate by Gender
plot_by_gender = churn_dataset.groupby('gender').Churn.mean().reset_index()
plot_data = [
    go.Bar(
        x=plot_by_gender['gender'],
        y=plot_by_gender['Churn'],
        width=[0.3, 0.3],
        marker=dict(color=['orange', 'green'])
    )
]
plot_layout = go.Layout(
    xaxis={"type": "category"},
    yaxis={"title": "Churn Rate"},
    title='Churn Rate by Gender',
    plot_bgcolor='rgb(243,243,243)',
    paper_bgcolor='rgb(243,243,243)',
)
fig = go.Figure(data=plot_data, layout=plot_layout)
po.iplot(fig)
This is giving me the error. Can you suggest an alternative?
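The "No numeric types to aggregate" error is what older pandas versions raise when `.mean()` is called on a text column. Here is a sketch assuming, as in many telco churn datasets, that Churn holds the strings 'Yes'/'No' (the rows below are invented). Mapping them to 1/0 first makes the mean equal to the churn rate.

```python
import pandas as pd

# Assumption: Churn holds the strings "Yes"/"No"; rows are invented
churn_dataset = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male", "Female"],
    "Churn":  ["Yes", "No", "No", "No", "Yes"],
})

# mean() cannot average strings, so map the labels to 1/0 first
churn_dataset["Churn"] = churn_dataset["Churn"].map({"Yes": 1, "No": 0})

plot_by_gender = churn_dataset.groupby("gender")["Churn"].mean().reset_index()
print(plot_by_gender)
```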
@crunchnos · 3 years ago
Thank you so f much!
@souravde2283 · 3 years ago
Awesome.
@isaacenobun6370 · 3 years ago
Thanks man
@richarda1630 · 3 years ago
Nice! Thanks :)
@brainwaves2389 · 3 years ago
thanks
@ChartExplorers · 3 years ago
You're welcome! 😀
@marchanselthomas · a year ago
to the point!
@laychansethaaerd · 3 years ago
Perfect
@jaskaransingh3200 · a year ago
Nice. helpful
@russellmubaya2662 · 3 years ago
Can we then plot a graph of any sort using the generated table we've just grouped? @Chat Explorers
@russellmubaya2662 · 3 years ago
@Chart Explorers*
@shaikhjunaid8693 · 2 years ago
Sir, how would you solve the problem of determining the top 5 highest-rated players for every position in the FIFA dataset?
@YoungerLei · a year ago
Hi, it might be fifa.groupby(by='position').apply(lambda group: group.sort_values(by='rate', ascending=False).head(n=5))
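Here is a sketch of an alternative that avoids `apply()`. It sorts once by rating, then `groupby().head(n)` keeps the first n rows of each position. The player names, positions and ratings are made up, and n is set to 2 to keep the toy output short.

```python
import pandas as pd

fifa = pd.DataFrame({
    "name":     ["p1", "p2", "p3", "p4", "p5", "p6", "p7"],
    "position": ["GK", "GK", "ST", "ST", "ST", "CB", "CB"],
    "rate":     [88, 85, 91, 87, 90, 84, 89],
})

n = 2   # use 5 for a top-5 list; 2 keeps this toy example short

# Sort once, then keep the first n rows of each position group
top_n = fifa.sort_values("rate", ascending=False).groupby("position").head(n)
print(top_n.sort_values(["position", "rate"], ascending=[True, False]))
```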
@houndofjustice5 · 3 years ago
Hello, is there any way to put all values in one column depending on their index? If the value I'm trying to group by is, let's say, Switzerland, and it has multiple happiness ratings (one for each year), how do I put all the ratings in the same column, separated by commas, without summing them up?
@ChartExplorers · 3 years ago
Great question Ivan. Try this out and see if it works for you. First I create a dictionary of data with 3 different countries and some happiness scores. Then I create a DataFrame with this data. Then I use the groupby function to group each country and use apply(list) to create a list of all the values in each group.
data_dict = {'country': ['country_1','country_2','country_3','country_1','country_1',
                         'country_2','country_3','country_2','country_3','country_1'],
             'happiness': [3,1,3,5,7,4,1,2,3,4]}
df = pd.DataFrame(data_dict)
df_grouped = df.groupby('country')['happiness'].apply(list)
@houndofjustice5 · 3 years ago
@ChartExplorers Thank you for the swift answer. I managed to do it for one column, but I'm trying to do it for multiple columns, basically just uniting rows with the same country value but separating the values with commas. It works when I do it for happiness score, but if I try to add happiness rank, it just returns the strings "Happiness Score" and "Happiness Rank", not the values. I tried it as a list, but it's still not working. I did it with this code, which works for Happiness Score: frame.groupby(['Country'])['Happiness Score'].apply(lambda x: ' , '.join(x.astype(str))).reset_index()
@ChartExplorers · 3 years ago
@houndofjustice5 I think I see what you are asking. So you want to group by country and then list out all the values for that country in the happiness and rank columns. Let me know if this works. If not, I am setting up a Discord server for Chart Explorers. That might be a better medium for problem solving.
# Example Data
data_dict = {'country': ['country_1','country_2','country_3','country_1','country_1',
                         'country_2','country_3','country_2','country_3','country_1'],
             'happiness': [3,1,3,5,7,4,1,2,3,4],
             'rank': [1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(data_dict)
# groupby with list for multiple columns
df_grouped = df.groupby('country')[['happiness','rank']].agg(lambda x: list(x))
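If the goal is one comma-separated string per column rather than a list, the same `agg()` call can join the values instead. Here is a sketch on a smaller set of made-up country data:

```python
import pandas as pd

df = pd.DataFrame({
    "country":   ["country_1", "country_2", "country_1", "country_2", "country_1"],
    "happiness": [3, 1, 5, 4, 7],
    "rank":      [1, 2, 3, 4, 5],
})

# agg() applies the lambda to each column of each group separately,
# so every column becomes one comma-separated string per country
joined = (
    df.groupby("country")[["happiness", "rank"]]
      .agg(lambda s: ", ".join(s.astype(str)))
      .reset_index()
)
print(joined)
```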
@SudhirKumar-ry4gk · 3 years ago
Please help: I have data on employees in which each employee made multiple sales. If any employee made sales of more than 50000, I want every row with that person's employee ID to say Excellent, and the rest Low. Like:
Emp ID    Sale    Status
Emp1001   5000    Excellent
Emp1001   45000   Excellent
Emp1001   2000    Excellent
Emp1002   5000    Low
Emp1003   2500    Low
@ChartExplorers · 3 years ago
Hi @SudhirKumar-ry4gk, so you want to group by employee ID, mark employees that had sales greater than $50,000 as excellent, and mark everyone else as low? Is that correct?
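Here is a sketch under the interpretation that the 50,000 threshold applies to each employee's total sales. Emp1001's rows add up to 52,000 even though no single sale exceeds 50,000. `transform('sum')` puts each employee's total on every one of their rows, and `np.where` assigns the status.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "emp_id": ["Emp1001", "Emp1001", "Emp1001", "Emp1002", "Emp1003"],
    "sale":   [5000, 45000, 2000, 5000, 2500],
})

# Each row gets its employee's total sales (transform keeps the original row count)
total = df.groupby("emp_id")["sale"].transform("sum")

# Rows of employees whose total exceeds 50,000 are "Excellent", the rest "Low"
df["status"] = np.where(total > 50000, "Excellent", "Low")
print(df)
```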
@mohammadmfd682 · 3 years ago
very good
@ChartExplorers · 3 years ago
Thanks!
@ainahannani4489 · 3 years ago
How do I make a poisson distribution of a groupby column?
@ChartExplorers · 3 years ago
I'm not sure. I would need to see your data and know more context to better understand what you are trying to accomplish.
@shoaibsoomro · 2 years ago
At 5:54, applying pd.cut did not work for me; it gave the error TypeError: can only concatenate str (not "float") to str.
Solution: I used these two lines, which solved the issue:
df['age'] = df['age'].replace('?', 0)  # clean data
df['age'] = df.age.astype('float64')  # convert data type to float
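Replacing '?' with 0 lets the code run, but it treats unknown ages as newborns and puts them in the youngest bin. Here is a sketch of an alternative on made-up ages. `pd.to_numeric(..., errors='coerce')` turns '?' into NaN, and `pd.cut` then leaves those rows unbinned.

```python
import pandas as pd

# Made-up ages in which '?' marks an unknown value
df = pd.DataFrame({"age": ["22", "?", "38", "4", "?", "71"]})

# errors="coerce" turns '?' into NaN instead of a fake age of 0
df["age"] = pd.to_numeric(df["age"], errors="coerce")

df["age_bins"] = pd.cut(df["age"], 3, labels=["young", "middle_age", "old"])
print(df)   # rows with an unknown age get NaN rather than landing in 'young'
```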
@srideviponmalarp · a year ago
Can you send the dataset?
@apz9022 · 3 years ago
I have a dataframe that has around 20 columns and 800 rows. One column contains multiple duplicate values, which I am using as the group, and based on one of the other columns I want to filter the dataframe to show unique values with the highest number in that column, using max(). I still want to retain all of the other columns and end up with a dataframe that contains these unique values, including the original columns. group = df_UE5_Compatability_info.groupby('lookup')['Function Count'].max(), where "lookup" is the column I want to group by (containing multiples of the same value). To filter to the rows with the highest number for "Function Count", how do I make the dataframe contain the other remaining columns associated with the resulting rows determined by the groupby? I am struggling. It's difficult to describe in words, sorry.
@ChartExplorers · 3 years ago
Hi Alan, you did a great job explaining, and thanks for providing me an example of what you have done. 😀 If I'm understanding correctly (please correct me if I'm wrong), you have 1 column that contains categories and you want to get the max value for each of those categories in every column that you have (using groupby). Here is a simple example I made that will get the max value for every column in the dataframe based on the groups in Col_4.
import pandas as pd
# Create practice df
df = pd.DataFrame({'Col_1': [1,2,3,4,5],
                   'Col_2': [6,7,8,9,10],
                   'Col_3': [11,12,13,14,15],
                   'Col_4': ['Group_1','Group_2','Group_1','Group_1','Group_2']})
# groupby Col_4 (in your case use lookup)
group = df.groupby('Col_4').max()
group.head()
You will notice here, instead of adding a list of columns to perform the groupby function on, I excluded it. This will perform the operation on all the columns. In your example, you should be able to do the following to get your answer:
group = df_UE5_Compatability_info.groupby('lookup').max()
@apz9022 · 3 years ago
@ChartExplorers Thanks for the reply. Below is a sample dataset (made up) to try to explain better, and one that is more representative of my actual dataset.
df = pd.DataFrame({'lookup': ['abc123','abc124','abc123','abc125','abc125'],
                   'Supported': ['no','yes','no','yes','yes'],
                   'Percentage': [0.9,0.6,0.6,0.7,0.6],
                   'Number of features': [1,6,10,8,11],
                   'Platform': ['Release 1.0','Release 1.0','Release 2.0','Release 1.0','Release 2.0']})
Printed, the dataframe looks like this:
   lookup Supported  Percentage  Number of features     Platform
0  abc123        no         0.9                   1  Release 1.0
1  abc124       yes         0.6                   6  Release 1.0
2  abc123        no         0.6                  10  Release 2.0
3  abc125       yes         0.7                   8  Release 1.0
4  abc125       yes         0.6                  11  Release 2.0
In column "lookup", rows 0 and 2 have the same value, as do rows 3 and 4. My goal is to have one row per value in column "lookup", choosing the row with the highest value in column "Number of features", and all other column values for the selected row should be shown in the output data frame.
Using group = df.groupby('lookup').max() creates:
       Supported  Percentage  Number of features     Platform
lookup
abc123        no         0.9                  10  Release 2.0
abc124       yes         0.6                   6  Release 1.0
abc125       yes         0.7                  11  Release 2.0
But the percentage is wrong for abc123 and abc125, as it has taken the highest percentage in each of the groups. My desired result is as follows:
abc123  no   0.6  10  Release 2.0
abc124  yes  0.6   6  Release 1.0
abc125  yes  0.6  11  Release 2.0
where the values for the columns 'Supported' and 'Percentage' are taken as-is from the dataframe row that has the highest 'Number of features'.
In my script I am using group = df.groupby('lookup')['Number of features'].max(), which returns the following, but I am missing the other columns, in this example Supported, Percentage and Platform:
lookup
abc123    10
abc124     6
abc125    11
Also, if I try to save the dataframe to csv, I only get the following:
Number of features
10
6
11
I would have expected this csv output:
lookup  Number of features
abc123  10
abc124  6
abc125  11
Thanks again, and I hope this is more descriptive?
@ChartExplorers · 3 years ago
@apz9022 thanks for providing the example, that clarifies things a lot. If you use the same dataframe you created in your example, you should be able to use the following code:
new_df = pd.DataFrame(pd.DataFrame(columns=df.columns))
for item in df['lookup'].unique():
    temp_df = df[df['lookup'] == item]
    row = temp_df[temp_df['Number of features'] == temp_df['Number of features'].max()]
    alist.append(row)
    new_df = pd.concat([new_df, row], ignore_index=True)
new_df
Sadly, this uses a for loop. There might be another way to do this that would avoid the for loop (I need to work on it a little more to get it to work; I'll let you know if I get it to work). I'm also going to look into groupby a little more. There are some cool things you can do with groupby, but this has several constraints that I do not think groupby will support. With 800 rows and 20 columns, performance should not be an issue (but it's always nice to squeeze out as much performance as possible just for fun!). Hope this works. Let me know.
@apz9022 · 3 years ago
@ChartExplorers Thanks. What is "alist.append"? I get an error stating "alist" is not defined.
@apz9022 · 3 years ago
@ChartExplorers Thanks, I updated my code and it's working like a charm! One point: alist.append(row) did not work for me. I have left it out and it still seems to work. What does this do?
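`alist` is never created in that snippet, so the `alist.append(row)` line can simply be dropped, because the rows are already collected by `pd.concat`. Here is a loop-free sketch using the sample data from this thread. It sorts by 'Number of features' in descending order, then keeps the first row per lookup with `drop_duplicates`. Unlike the loop, this keeps only one row when two rows tie for the maximum.

```python
import pandas as pd

df = pd.DataFrame({
    'lookup': ['abc123', 'abc124', 'abc123', 'abc125', 'abc125'],
    'Supported': ['no', 'yes', 'no', 'yes', 'yes'],
    'Percentage': [0.9, 0.6, 0.6, 0.7, 0.6],
    'Number of features': [1, 6, 10, 8, 11],
    'Platform': ['Release 1.0', 'Release 1.0', 'Release 2.0', 'Release 1.0', 'Release 2.0'],
})

# Highest 'Number of features' first, then keep the first row seen for each lookup
new_df = (
    df.sort_values('Number of features', ascending=False)
      .drop_duplicates(subset='lookup', keep='first')
      .sort_values('lookup')
      .reset_index(drop=True)
)
print(new_df)
```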
@pursh2002 · 3 years ago
# A function that groups data by attribute1 and calculates per-group statistics (mean and count) for attribute2. How do we write a function for this? def get(data, attr1, attr2, statistic):
@ChartExplorers · 3 years ago
Hi Pursh, I'm not sure I understand exactly what you are trying to accomplish. Are you trying to obtain the mean and count on groups based on multiple columns/attributes? df.groupby(['pclass','sex'], as_index=False)['survived'].agg(['mean','count']) If this is the case, I'm not sure of the purpose of creating a function to do this.
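If a reusable function is still wanted, a thin wrapper is enough. Here is a sketch with a hypothetical name, group_stats, and made-up data. Any aggregation names that pandas accepts can be passed in statistics.

```python
import pandas as pd

def group_stats(data, attr1, attr2, statistics=("mean", "count")):
    """Group `data` by column `attr1` and summarise column `attr2`.

    `statistics` is any sequence of aggregation names pandas understands.
    """
    return data.groupby(attr1)[attr2].agg(list(statistics)).reset_index()

df = pd.DataFrame({
    "pclass":   [1, 1, 2, 3, 3, 3],
    "survived": [1, 0, 1, 0, 0, 1],
})
stats = group_stats(df, "pclass", "survived")
print(stats)
```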
@jha6783 · a year ago
How do you know what is young, middle_age or old? This is not defined.
@Abdullah_Alhathloul · 8 months ago
nice
@azrflourish9032 · 3 years ago
Why is '?' needed while reading the csv file??
@ChartExplorers · 3 years ago
Good question, I should have explained this in the video. In the csv file, missing data is represented with '?'. When we read the data into pandas, we can tell it that missing data is represented by '?', and then pandas will treat it as a missing value rather than getting confused.
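Here is a small sketch of the difference, using an invented three-row CSV read from a string with `io.StringIO`. Without `na_values`, the '?' keeps the age column as text. With `na_values='?'`, it becomes NaN and the column is numeric.

```python
import io
import pandas as pd

# A tiny CSV in which missing values are written as '?', like the file in the video
csv_text = """pclass,survived,age
1,1,29
3,0,?
2,1,4
"""

# Without na_values, the '?' keeps the whole 'age' column as text (object dtype)
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["age"].dtype)      # object

# With na_values='?', pandas reads '?' as NaN and 'age' becomes numeric
clean = pd.read_csv(io.StringIO(csv_text), na_values="?")
print(clean["age"].dtype)    # float64
```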
@azrflourish9032 · 3 years ago
@ChartExplorers Oh, thank you (^ ^)
@shekharmandal4569 · a year ago
goat
@ericfayhuynh · a year ago
looks like the data set is outdated
@abhishekpanda85 · 8 months ago
simpler way to explain things...