What do I need to know about the pandas index? (Part 2)

68,462 views

Data School



In part two of our discussion of the index, we'll switch our focus from the DataFrame index to the Series index. After discussing index-based selection and sorting, I'll demonstrate how automatic index alignment during mathematical operations and concatenation enables us to easily work with incomplete data in pandas.
SUBSCRIBE to learn data science with Python:
www.youtube.co...
JOIN the "Data School Insiders" community and receive exclusive rewards:
/ dataschool
== RESOURCES ==
GitHub repository for the series: github.com/jus...
"set_index" documentation: pandas.pydata.o...
"value_counts" documentation: pandas.pydata.o...
"sort_values" documentation: pandas.pydata.o...
"sort_index" documentation: pandas.pydata.o...
"Series" documentation: pandas.pydata.o...
"concat" documentation: pandas.pydata.o...
Indexing and selecting data: pandas.pydata.o...
== LET'S CONNECT! ==
Newsletter: www.dataschool...
Twitter: / justmarkham
Facebook: / datascienceschool
LinkedIn: / justmarkham
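
The index-based selection and sorting of a Series mentioned in the description can be sketched roughly as follows. The tiny `drinks` table here is a hypothetical stand-in for the dataset used in the video, not the real data.

```python
import pandas as pd

# Hypothetical miniature of the drinks dataset (values are illustrative).
drinks = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra", "Angola", "Argentina"],
    "continent": ["Europe", "Africa", "Europe", "Africa", "South America"],
    "beer_servings": [89, 25, 245, 217, 193],
}).set_index("country")

# A Series selected from a DataFrame keeps the DataFrame's index.
continent = drinks["continent"]
andorra_continent = continent.loc["Andorra"]

# value_counts() returns a Series whose index holds the unique values,
# so a count can be selected by its label.
counts = continent.value_counts()
africa_count = counts["Africa"]

# A Series can be sorted by its values or by its index labels.
by_value = counts.sort_values()
by_label = counts.sort_index()
```

Because every Series carries an index, the same label-based selection works on the output of methods such as `value_counts()`.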

Comments: 134
@Diachron 7 years ago
Crystal clear as always. This channel is the unsung hero of pandas instruction on YouTube.
@dataschool 7 years ago
Wow, thank you so much for the incredibly nice comment!
@themustknowfacts510 3 years ago
@dataschool I have one doubt. Why did pandas remove the index name "country" from the DataFrame when we concatenated the drinks DataFrame with the people Series? You can see it was there in the DataFrame before concatenation.
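
A possible explanation, offered as a sketch rather than a confirmed answer from the video: when the two indexes being combined have different names (here "country" versus no name at all), pandas has no single name to keep, so the combined index typically ends up unnamed. The data below is hypothetical, and the exact behavior may depend on the pandas version.

```python
import pandas as pd

drinks = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra"],
    "beer_servings": [89, 25, 245],
}).set_index("country")

# Hypothetical population figures. Note that this index has no name.
people = pd.Series([3000000, 85000], index=["Albania", "Andorra"], name="population")

combined = pd.concat([drinks, people], axis=1)
print(combined.index.name)  # likely None, since the two index names differ

# Option 1: put the name back on the result.
combined = combined.rename_axis("country")

# Option 2: give the Series index the same name before concatenating.
people_named = people.rename_axis("country")
combined2 = pd.concat([drinks, people_named], axis=1)
```

Either option should leave the combined DataFrame with an index named "country".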
@ikura18 3 years ago
Greatly clear explanation. This is still helpful in 2021. Thank you.
@dataschool 3 years ago
Great!
@20hawkar10 7 years ago
Very well explained, Kevin. This series has been a resource I've come back to time and time again. Please continue your great work.
@dataschool 7 years ago
That's great to hear! Thanks for your encouraging words!
@nghiepcrypto7034 4 years ago
I like your videos and the way you explain everything. It's just so clear and easy to understand. Thanks from Viet Nam
@dataschool 4 years ago
Thank you! P.S. I have been to your country (Hanoi, Hoi An, and more...)
@nghiepcrypto7034 4 years ago
@dataschool If you are going to visit Ha Noi again, I'd be very happy to be your tour guide.
@sunyboy333 3 years ago
Really powerful mechanism to know, thank you very much for showing it!
@dataschool 3 years ago
You're very welcome!
@bettyfish1749 7 years ago
these are SO helpful...thanks so much for doing these!
@dataschool 7 years ago
You're very welcome! So glad to hear!
@pradneshkalkar5525 4 years ago
Hi Kevin, when I run drinks.beer_servings * people as shown in your video (8:57), the values I get after multiplying are floats and not ints as in your video. Please explain this.
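
One likely cause, sketched under the assumption that `people` covers only some of the countries: index alignment produces NaN for every unmatched label, and NaN forces the result to a float dtype. The numbers below are illustrative.

```python
import pandas as pd

beer = pd.Series([89, 25, 245],
                 index=["Albania", "Algeria", "Andorra"], name="beer_servings")
people = pd.Series([3000000, 85000],
                   index=["Albania", "Andorra"], name="population")

# Alignment by index: Algeria has no match in `people`, so its result is NaN.
total_beer = beer * people
print(total_beer.dtype)  # float64, because NaN cannot be stored in int64

# If integers are wanted, one option is to drop the unmatched rows first.
total_beer_int = total_beer.dropna().astype("int64")

# Another option (assuming a pandas version with nullable integers) keeps
# the missing entry as <NA> instead of dropping it.
total_beer_nullable = total_beer.astype("Int64")
```

Which option fits depends on whether the unmatched rows should stay visible in the result.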
@DodovNes 8 years ago
Thank you very much! You are an amazing teacher!
@dataschool 8 years ago
You're welcome, and thanks for your kind words!
@sukhendutarafder381 6 years ago
Best teacher on Pandas... thank you Sir
@dataschool 6 years ago
You're welcome!
@MmahamRroblox 6 years ago
After years, I finally understood the concept of indexes just from watching your two videos in this series. Great work.
@dataschool 6 years ago
That's awesome! Thanks for sharing.
@manikanta4418 3 years ago
Very thankful for your work. Is it possible to find the index and columns based on a particular data value of one variable (variable1), and then use them to select variable2 data in the same file for the corresponding index and columns?
@kennethstephani692 8 months ago
A terrific video, as always!!
@dataschool 6 months ago
Thank you!
@GauravSharma-HP-23 5 years ago
I am trying to fetch data from a CSV file, and I am not able to handle string data types. The code below works perfectly for int data types, but when it fetches string data, the output shows the object details along with the index number. I don't want the object details and index number in the output. Please let me know how I can do that. Here is my code:
-----------------------------------------------------------------------
import pandas as pd
df=pd.read_csv(r'C:\ABCC.csv')
k=df[df.GS==df['GS'].max()]
print("NR PDSCH scheduled rank-SCG SCell :",int(k['NR PDSCH scheduled rank-100']))
print("NR PDSCH modulation for CW0 :",k['NR PDSCH modulation for CW0-100'])
-----------------------------------------------------------------------
The output of my code:
NR PDSCH scheduled rank-SCG SCell : 4
NR PDSCH modulation for CW0 : 422 16-QAM
Name: NR PDSCH modulation for CW0-100, dtype: object
-----------------------------------------------------------------------
I don't want 422 (the index number) and "Name: NR PDSCH modulation for CW0-100, dtype: object" in my output. Below is the desired output format:
-----------------------------------------------------------------------
NR PDSCH scheduled rank-SCG SCell : 4
NR PDSCH modulation for CW0 : 16-QAM
-----------------------------------------------------------------------
@dataschool 4 years ago
I won't be able to help, sorry!
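
For readers with the same problem, here is one possible approach, offered as a sketch and not as the channel's answer. Filtering with a boolean mask returns a DataFrame, so selecting a column from it gives a one-element Series, which prints with its index and dtype. Pulling out the scalar with `.iloc[0]` avoids that. The DataFrame and its shortened column names below are hypothetical stand-ins for the commenter's CSV.

```python
import pandas as pd

# Hypothetical stand-in for the CSV file; column names are shortened.
df = pd.DataFrame({
    "GS": [10, 42, 7],
    "rank": [2, 4, 1],
    "modulation": ["QPSK", "16-QAM", "64-QAM"],
})

# Boolean filtering returns a DataFrame (one row here), not a scalar.
k = df[df["GS"] == df["GS"].max()]

# .iloc[0] extracts the first element of the one-element Series.
rank = int(k["rank"].iloc[0])
modulation = k["modulation"].iloc[0]

print("NR PDSCH scheduled rank-SCG SCell :", rank)
print("NR PDSCH modulation for CW0 :", modulation)

# Alternative: idxmax() gives the label of the row with the largest GS,
# and .loc with a single label returns that row as a Series.
best = df.loc[df["GS"].idxmax()]
```

With the scalar extracted, `print` shows only the value, without the index number or the `Name: ..., dtype: object` footer.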
@asneogy 8 years ago
Wow, the index-based multiplication was just awesome. I was expecting a merge and then a new column calculation. But this is a super shortcut and super efficient (at least in terms of coding effort). Thanks! As a caveat, I think it's important to point out that if you do a pd.concat() WITHOUT setting a sensible index, it will patch things together based on the default index, which totally messes it up.
@dataschool 8 years ago
Great points... the index-based alignment is very handy!
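
The caveat raised above can be illustrated with a small sketch. The data is hypothetical: the population figures are meant for Albania and Andorra, but with default integer indexes they are matched by position labels 0 and 1 instead.

```python
import pandas as pd

drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Andorra"],
    "beer_servings": [0, 89, 245],
})  # default index: 0, 1, 2

# Intended for Albania and Andorra, but the default index is 0, 1.
people = pd.Series([3000000, 85000], name="population")

# Alignment happens on 0, 1, 2, so Albania's figure lands on Afghanistan's row.
wrong = pd.concat([drinks, people], axis=1)

# With meaningful labels on both sides, alignment matches the right rows.
people_labeled = pd.Series([3000000, 85000],
                           index=["Albania", "Andorra"], name="population")
right = pd.concat([drinks.set_index("country"), people_labeled], axis=1)
```

No error is raised in the first case, which is why setting a sensible index before concatenating matters.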
@abhishekguha1837 3 years ago
Amazing videos. The right amount of depth and breadth is covered for each topic. Just wow! Keep up the good work sir.
@dataschool 3 years ago
Thank you!
@RobertWF42 5 years ago
It's still not entirely clear to me why we use indices in data frames, as well as all of the index operations (resetting, sorting, etc.). I get that an index is essentially just a row # label. But couldn't we still analyze data frames & run predictive models without using indices, or with indices that are out of order? Coming from a SAS background, I didn't explicitly use indices or index operations.
@dataschool 5 years ago
The index is key to a lot of pandas functionality, such as alignment. You could build a system that doesn't use the index, and that is a valid design decision, but you would be making various tradeoffs.
@boughrood 4 years ago
How do you get a physical frame around the data, i.e. the lines forming boxes which neatly hold and display the data? I'm using IDLE, which just produces figures on a blank background.
@dataschool 4 years ago
That's because I'm using Jupyter notebook - Hope that helps!
@manujjoshi1682 3 years ago
I have gone through 17 of these pandas videos by you, but the more I delve into all this, the more I realise that it's just memorising all these commands rather than framing them myself by logic. I feel that your videos would be more beneficial if you could teach your viewers how to build a command when they know nothing about it. After all, how many commands can we memorise? I understand that basic vocabulary like sort etc. will need to be memorised, but the commands should not be, right? I want to talk to my computer, I don't want to just be a slave to these memorised commands.
@dataschool 3 years ago
Thanks for sharing your perspective!
@rubayetalam8759 2 years ago
these videos are from 6 years ago, they are so good! Thanks Kevin!
@dataschool 2 years ago
Thank you!
@Chandrikareddyy 7 years ago
I'm trying to add a Series S to a DataFrame df by comparing the index values of the Series and df. The index values are as follows:
S = [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 13, 14, 16, 17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]
df = [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 13, 14, 16, 18, 19, 20, 24, 25, 26, 27, 28, 29, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]
But I'm confused about how to compare the index values. Can you share your idea on how to do that?
@dataschool 7 years ago
The concat method will automatically align the data by the index, so there's no need to compare the index values yourself. Does that help?
@Chandrikareddyy 7 years ago
Yeah! I got that, thanks. I have one more problem. I want to delete rows across different columns based on a row value. E.g. if I have col1|col2|col3 with a few row values, I want to delete the col1, col2, and col3 row values based on the col3 row value. Is it possible?
@dataschool 7 years ago
I'm sorry, I don't quite understand what you are trying to do... could you explain in more detail? Thanks!
@Chandrikareddyy 7 years ago
E.g. I have a DataFrame like:
name | age | year
dil    24    1993
bil    24    0
kil    42    0
Now I want to delete the rows, across all 3 columns, where the year column is 0. My output should look like:
name | age | year
dil    24    1993
The remaining two rows should be deleted from all 3 columns based on the row value of the year column. Hope you got my point :)
@archidar1 7 years ago
I think what you should do is: dataframe.loc[dataframe.year != 0, :]
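
The row filtering suggested above can be sketched as follows, using the small example table from the question. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["dil", "bil", "kil"],
    "age": [24, 24, 42],
    "year": [1993, 0, 0],
})

# Keep only the rows whose year is not 0. The boolean mask is built from
# the DataFrame's own column, so it aligns with the rows by index.
kept = df.loc[df["year"] != 0]

# Equivalent alternative: drop the rows whose year is 0 by their index labels.
kept_by_drop = df.drop(df[df["year"] == 0].index)
```

Both approaches return a new DataFrame and leave `df` unchanged.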
@xannosp.6894 5 years ago
wow, 3 years ago and still the best indexing tutorial. Thank you!
@dataschool 5 years ago
You are too kind, thank you!
@Tony-mt4pi 5 years ago
Love your videos! They're all great.
@dataschool 5 years ago
Thank you! 😊
@prashant4moni 6 years ago
Great videos
@dataschool 6 years ago
Thanks!
@ajayrana4296 3 years ago
How do I delete the index if I want to?
@hitsviralonly2215 5 years ago
Hi Sir, I'm getting "Name: Population, dtype: object" but you're getting "Name: Population, dtype: int64"???
@hitsviralonly2215 5 years ago
I got the output, thank you.
@dataschool 5 years ago
Great!
@Taran72 4 years ago
Thank you so much for making this series. I am not a programmer, but I find your videos very helpful, concise, and clear, and they help me learn quickly. This is ideal if you are already in the workforce (like me) and you need programming to complete a project...but don't have time to go back to school. :)
@dataschool 4 years ago
Thank you so much for your kind words! I truly appreciate it 🙏
@LonglongFeng 7 years ago
thanks for the video. index alignment + pd.concat could solve many dataset combining problems...woohoo~~
@dataschool 7 years ago
Excellent!
@andreykuskov8807 6 years ago
When you use 'inplace=True', what exactly happens to a DataFrame? Will it be changed permanently? What if I want to change it temporarily? Is that possible, or does any operation that changes a DataFrame always come with 'inplace=True'?
@dataschool 6 years ago
When you use inplace=True, it changes permanently. Use inplace=False to change it temporarily.
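
To make the distinction concrete, here is a minimal sketch with a hypothetical two-row table. With the default `inplace=False`, the method returns a new object and leaves the original untouched. With `inplace=True`, the original is modified and the method returns `None`.

```python
import pandas as pd

drinks = pd.DataFrame({"country": ["Albania", "Andorra"],
                       "beer_servings": [89, 245]})

# Default (inplace=False): a new DataFrame is returned, drinks is unchanged.
indexed = drinks.set_index("country")
columns_after_copy = list(drinks.columns)  # still includes "country"

# inplace=True: drinks itself is modified and nothing is returned.
result = drinks.set_index("country", inplace=True)
```

Assigning the returned object to a new name, as with `indexed`, is a common way to experiment without altering the original.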
@MrBhargavafirst 6 years ago
Also, we have one question. Suppose we have created the model and need to test it using the complete dataset, which is huge (approx. 1 million records and 1000 columns). What should the configuration of the sandbox be, and what are your thoughts on this?
@dataschool 6 years ago
Sorry, I'm not sure how to answer that... good luck!
@rakuma98 6 years ago
ufo.describe().loc['25%'] works but ufo.describe()['25%'] didn't. Why? (drinks.continent.value_counts()['africa'] was a similar case.)
@dataschool 6 years ago
value_counts() outputs a Series, and you can select elements of a Series by putting the index in brackets. describe() outputs a DataFrame, and selection for DataFrames works differently. Hope that helps!
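
The difference described in this reply can be sketched with a small hypothetical table. Brackets on a Series select by index label, while brackets on a DataFrame select a column, so a row label such as '25%' needs `.loc`.

```python
import pandas as pd

drinks = pd.DataFrame({
    "continent": ["Europe", "Africa", "Europe", "Africa"],
    "beer_servings": [89, 25, 245, 217],
    "wine_servings": [54, 14, 312, 45],
})

# value_counts() returns a Series: brackets look up an index label.
counts = drinks["continent"].value_counts()
africa = counts["Africa"]

# describe() returns a DataFrame: rows are statistics, columns are the
# numeric columns of the original DataFrame.
summary = drinks.describe()
q1 = summary.loc["25%"]                 # row selection by label
beer_stats = summary["beer_servings"]   # column selection

# Brackets on a DataFrame look for a column named '25%', which does not exist.
try:
    summary["25%"]
    bracket_failed = False
except KeyError:
    bracket_failed = True
```

Using `.loc` for row labels works for both Series and DataFrames, so it is a safe habit whenever the selection is by label.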
@amitgupta-dy4vg 6 years ago
# TODO: Select three indices of your choice you wish to sample from the dataset
indices = []
# Create a DataFrame of the chosen samples
samples = pd.DataFrame(data.loc[indices], columns = data.keys()).reset_index(drop = True)
Can someone please explain the "data.keys()" part to me?
@dataschool 6 years ago
What data type is the "data" object?
@yffzju3405 7 years ago
Wow, thank you very much for these videos. For video 18, I have a question: how can I make the 'concat()' function happen in place when it doesn't have the 'inplace' argument? My English is not good, so I hope you can understand my question.
@dataschool 7 years ago
'concat' can't happen in-place. Instead, it is designed to combine two existing objects into a new object. Hope that helps!
@vishavgupta3717 4 years ago
Please answer as soon as possible. I want your help. How do I append a pandas DataFrame below an existing Excel file with the help of any library other than openpyxl? I have used openpyxl, but it removes the pivot table from my Excel sheet. Please provide me with code. How can I achieve it with the xlwings or win32com library?
@dataschool 4 years ago
Sorry, I won't be able to help - good luck!
@vishavgupta3717 4 years ago
@dataschool Hello. Thanks for your reply. I have achieved it with xlwings.
@dataschool 4 years ago
Great to hear!
@gunjankumar2267 6 years ago
Hi, fantastic video, thumbs up for the way you teach. Suppose I have a dataset with, say, 200 rows. With the help of nrows=40, I am able to read the first 40 rows. How do I read rows 40 to 70 of the dataset, or any other range?
@dataschool 6 years ago
You should be able to accomplish your goal using various read_csv parameters.
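
One combination of `read_csv` parameters that should do this is `skiprows` together with `nrows`. The sketch below builds a hypothetical 200-row CSV in memory so that it runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV with a header line and 200 data rows (row_id 0 to 199).
csv_text = "row_id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(200))

# skiprows=range(1, 41) skips file lines 1 to 40 (the first 40 data rows)
# while keeping line 0, the header. nrows=30 then reads the next 30 rows.
chunk = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 41), nrows=30)
```

With a real file, the path would replace `io.StringIO(csv_text)`. The resulting DataFrame gets a fresh default index starting at 0.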
@andreapiras1671 6 years ago
Thanks for your work! Is it possible to split a column containing two coordinates separated by a space character into two columns, one for each coordinate? (The column is of object type.) Thank you.
@dataschool 6 years ago
I'm sorry, I don't precisely understand what you are trying to do. If you could clarify with more details, I might be able to help. Thanks!
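
If the question is read as "split one space-separated string column into two numeric columns", which is an assumption about what the commenter meant, a sketch could look like this. The column names `coords`, `lat`, and `lon` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"coords": ["40.7128 -74.0060", "51.5074 -0.1278"]})

# str.split with expand=True returns a DataFrame with one column per piece.
parts = df["coords"].str.split(" ", expand=True)

# The pieces are still strings, so convert them to floats.
df["lat"] = parts[0].astype(float)
df["lon"] = parts[1].astype(float)
```

The new columns are aligned with the original rows by index, so they can be assigned straight back to `df`.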
@赵紫薇-w5y 7 years ago
Hi, teacher, I have another question. When do I need pd. in front of a command, like pd.concat(), and when is there no need to use pd. in front of the command? Thank you for your class!!
@dataschool 7 years ago
pd is part of the command because it's a pandas function. However, if you have a function like "len" which is a built-in Python function, then you just use it by writing "len(whatever)". Also, if you import the function using "from pandas import concat", then you can just use it as "concat(whatever)" rather than "pd.concat(whatever)". Hope that helps!
@raysun5889 6 years ago
Hi, I'd like to ask if it's possible to make the newly set index align (in the same row) with the others, since in the example the 'country' index label appears lower than the rest of the column names. Thanks for the tutorial, BTW.
@dataschool 6 years ago
If you set country as the index, then 'country' becomes the name of the index, and is no longer part of the columns attribute. Thus, when the DataFrame is being displayed, 'country' will not show up next to the rest of the columns. Does that answer your question? Let me know!
@Entn86 6 years ago
Can you please explain the difference between the loc function and groupby?
@dataschool 6 years ago
This video explains loc: kzbin.info/www/bejne/rqfTf3Rtl6hrmdU
This video explains groupby: kzbin.info/www/bejne/p6qTl3enpLJ9rpo
@chrisoburu3532 4 years ago
HOW CAN I DELETE THE INDICES
@dataschool 4 years ago
The pandas DataFrame always has an index.
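
Although the index cannot be removed, it can be reset to the default integers or left out when displaying or saving. The sketch below uses a hypothetical two-row table to show a few options that may cover what people usually mean by "deleting" the index.

```python
import pandas as pd

drinks = pd.DataFrame({"country": ["Albania", "Andorra"],
                       "beer_servings": [89, 245]}).set_index("country")

# Move the index back into a regular column and restore the default index.
as_column = drinks.reset_index()

# Discard the index labels entirely and restore the default index.
dropped = drinks.reset_index(drop=True)

# Hide the index when rendering the DataFrame as text.
text = dropped.to_string(index=False)

# Leave the index out when writing CSV (returns a string when no path is given).
csv_text = drinks.to_csv(index=False)
```

In every case the DataFrame still has an index. It is either the default integer one or simply not shown.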
@MrBhargavafirst 6 years ago
Hi Kevin, can you please make a video on how to resample data using Python? This would be helpful for us.
@dataschool 6 years ago
Thanks! I cover resampling in my DataCamp course: www.datacamp.com/courses/analyzing-police-activity-with-pandas?tap_a=5644-dce66f&tap_s=280411-a25fc8
@sideegsalahmustafahassan4547 7 years ago
How can I get the name of the country which has the max number of wine_servings?
@dataschool 7 years ago
Here are two different methods that both work:
drinks.loc[drinks.wine_servings == drinks.wine_servings.max(), 'country']
drinks.sort_values('wine_servings', ascending=False).head(1).country
Hope that helps!
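
A runnable version of the two methods above, on a hypothetical three-row table, together with a third option that relies on the index: when `country` is the index, `idxmax()` returns the label of the largest value directly.

```python
import pandas as pd

drinks = pd.DataFrame({
    "country": ["Albania", "Andorra", "France"],
    "wine_servings": [54, 312, 370],
})

# Method 1: boolean mask on the maximum value (returns a Series).
top_by_mask = drinks.loc[drinks.wine_servings == drinks.wine_servings.max(), "country"]

# Method 2: sort descending and take the first row (returns a Series).
top_by_sort = drinks.sort_values("wine_servings", ascending=False).head(1).country

# Method 3 (an addition, not from the reply): idxmax() on a labeled Series
# returns the index label of the maximum as a plain string.
top_label = drinks.set_index("country")["wine_servings"].idxmax()
```

The first method returns every country tied for the maximum, while the other two return a single one.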
@m_fadhln 4 years ago
No extra tip?? Unsubscribing. Hehehe, jk. I love your work!
@dataschool 4 years ago
😄
@jshaaan 6 years ago
How do I remove the "Name:" and "dtype" lines when using drinks.continent.value_counts()? I need only the output.
@dataschool 6 years ago
I don't think there is an easy way to remove it, I'm sorry!
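
One possible workaround, offered as a sketch: `Series.to_string()` renders the values without the `Name: ..., dtype: ...` footer by default. The data below is hypothetical.

```python
import pandas as pd

continent = pd.Series(["Europe", "Africa", "Europe", "Asia", "Africa", "Europe"],
                      name="continent")
counts = continent.value_counts()

# to_string() leaves out the Name/dtype footer unless asked to include it.
rendered = counts.to_string()
print(rendered)

# If only the label-to-count mapping is needed, a plain dict also works.
as_dict = counts.to_dict()
```

Depending on the pandas version, the rendered text may start with the index name as a header line. Passing `header=False` to `to_string()` should suppress it.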
@何喆帆 6 years ago
Thanks for the nice video! Is there a tutorial where you explain the difference between merge, join, and concat?
@dataschool 6 years ago
No, I don't have a video like that, sorry!
@ansontong560 6 years ago
Thanks, Really Learn lot from your teaching video!!!!! U r great teacher =]
@dataschool 6 years ago
Thank you for your kind words!
@Dopeboyz789 5 years ago
How do you open up pandas
@dataschool 5 years ago
First you install the library, then you import it. Maybe this video will help? kzbin.info/www/bejne/r6usfpyomKyIa6s
@adilzade 6 years ago
Excellent tutorial. I clicked on the ad 10 times so you can profit from what you do. You have to create a course on Udemy.
@dataschool 6 years ago
Thanks! :)
@lukassteindl1914 4 years ago
Great stuff! Looking forward to the MultiIndex video. Could you add a link to it once it's available, please?
@dataschool 4 years ago
Sure! Here's the video: kzbin.info/www/bejne/qpS1eJRoqNSWY8U
@swadhikarc7858 4 years ago
excellent video all around. practical examples are exceptional brother!
@dataschool 3 years ago
I appreciate that!
@ItsWithinYou 2 years ago
Super!
@dataschool 2 years ago
Thank you!
@alkamansi 8 years ago
hey this is really useful and the explanations are very detailed....great work
@dataschool 8 years ago
Thanks! I'm happy to help.
@erfannazari6110 4 years ago
in our language, I would tell you : "baba to dige ki hasti!"
@dataschool 4 years ago
😄
@pankaj3856 4 years ago
How can i add a new row which is the total of all
@dataschool 4 years ago
pandas DataFrames aren't intended to be used in that way. DataFrames contain "the data", and if you want summary statistics about the data, then you run those functions on the DataFrame. That's the pandas way of thinking. Hope that helps!
@pankaj3856 4 years ago
Thanks, I did:
total = grouped_state.append(grouped_state.iloc[:,2:].sum(),ignore_index=True)
total.loc[35, 'STATE/UT']='Total'
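
`DataFrame.append` was removed in pandas 2.0, so the snippet above fails on current versions. A rough equivalent using `pd.concat` is sketched below. The `grouped_state` table and its column names other than `STATE/UT` are hypothetical, since the commenter's data is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the commenter's grouped_state DataFrame.
grouped_state = pd.DataFrame({
    "STATE/UT": ["Goa", "Kerala", "Punjab"],
    "YEAR": [2012, 2012, 2012],
    "MURDER": [10, 20, 30],
    "THEFT": [100, 200, 300],
})

# Sum the numeric columns from position 2 onward, as in the original snippet.
totals = grouped_state.iloc[:, 2:].sum()

# Turn the totals Series into a one-row DataFrame and label the row.
total_row = totals.to_frame().T
total_row["STATE/UT"] = "Total"

# Stack the totals row under the data. Columns are aligned by name, and
# columns missing from total_row (YEAR here) are filled with NaN.
total = pd.concat([grouped_state, total_row], ignore_index=True)
```

Keeping the totals in a separate object, as the reply suggests, avoids mixing summary rows into the data. The concatenated version is mainly useful for display or export.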
@jd5787 6 years ago
Sorry for the stupid question (I am jumping from one video to the next without following the sequence...), but what is the difference between these two syntaxes: df['column'] and df.column (or, to stick to the example here, drinks['continent'] and drinks.continent)? Thank you.
@dataschool 6 years ago
They are the same! However, bracket notation will always work, whereas dot notation will not always work. More details are here: kzbin.info/www/bejne/sKnUm5ivgLVlis0
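
A short sketch of the cases where dot notation does not work, using a hypothetical DataFrame: column names containing spaces, names that clash with existing DataFrame attributes or methods, and the creation of new columns.

```python
import pandas as pd

df = pd.DataFrame({
    "continent": ["Europe", "Africa"],
    "beer servings": [89, 25],   # name contains a space
    "count": [1, 2],             # name clashes with the DataFrame.count method
})

# For an ordinary column name, both notations return the same Series.
same = df["continent"].equals(df.continent)

# A name with a space can only be reached with brackets.
beer = df["beer servings"]

# df.count is the method, not the column. Brackets return the column.
count_col = df["count"]
count_attr = df.count

# New columns must be created with bracket notation.
df["total"] = df["beer servings"] * 2
```

Bracket notation is therefore the safer default, with dot notation as a convenience for simple column names.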
@jd5787 6 years ago
Data School thanks!
@dataschool 6 years ago
You're welcome!
@jay199334 6 years ago
can any one tell me how to remove index??
@dataschool 6 years ago
The index of a DataFrame cannot be removed.
@Fateaha 6 years ago
I think it's like join in SQL, Right?
@dataschool 6 years ago
You mean concatenation? Yes, it's similar.
@taherhekmatfar795 6 years ago
So simple and clear
@dataschool 6 years ago
Glad it was helpful to you!
@sankyeat 5 years ago
One of the best channels for data science.
@dataschool 5 years ago
Thanks for watching and commenting! :)
@Pradeepkumar-is8vs 6 years ago
You are the best
@dataschool 6 years ago
You are too kind :)
@rvma77 7 years ago
Once again, great videos!
@dataschool 7 years ago
Thanks!
@vivektiwari3459 7 years ago
How do I add a new row to the data set?
@dataschool 7 years ago
You would create a second DataFrame, and then concatenate the DataFrames using the concat function. Learn more about concat here: kzbin.info/www/bejne/Y4DZYoFnlKuVhpo
@vivektiwari3459 7 years ago
Thanks. Got it!
@ericguo8209 8 years ago
Can you speak faster please thanks!
@dataschool 8 years ago
I'm sorry, but my current pace of speaking is deliberately slower in order to be accessible to a wide audience that includes people for whom English is not their first language. However, if you would like to play my videos faster, the YouTube controls do allow that (at least on most browsers and devices). Thanks for understanding!