More of your pandas questions answered!

Views: 28,091

Comments: 71

@freakybob13579 4 years ago

At 10:00, I saw that you read the important part and ignored the negativity in the latter part of the question. That is another lesson we can take from this video, on top of learning Python and pandas.

@praveenng 7 years ago

Great video, just like the other videos in the series. Thanks for your time and effort! If loc were exclusive of the second limit, it would be impossible to select the last column or last row. This could be another reason to make loc inclusive.

@dataschool 7 years ago

Great point! And, thanks for your kind words - I'm glad the videos have been helpful to you!

@ludovicolaci9582 4 years ago

I guess that's probably why Kevin drew the distinction between labels and integer positions for loc and iloc. iloc references positions the way a list does: it starts at 0 and ends at n-1 (row positions likewise run from 0 to number_of_rows - 1). loc, on the other hand, lets you reference columns by label name. So, to avoid mixing the two concepts, the pandas developers probably chose to make loc slices include both ends and to keep the usual structure for iloc.
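As a rough illustration of the point made in this thread, here is a minimal sketch. The small `ufo` frame is a hypothetical stand-in for the dataset used in the video (the column names are assumed from the other comments), not the real file.

```python
import pandas as pd

# Hypothetical stand-in for the ufo DataFrame used in the video
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke"],
    "Colors Reported": [None, None, None],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL"],
    "State": ["NY", "NJ", "CO"],
    "Time": ["6/1/1930 22:00", "6/30/1930 20:00", "2/15/1931 14:00"],
})

# loc slices by label and includes both endpoints,
# so naming the last column is enough to reach it
by_label = ufo.loc[:, "City":"Time"]

# iloc slices by integer position and excludes the stop position,
# so 0:4 stops before the column at position 4 ("Time")
by_position = ufo.iloc[:, 0:4]

print(list(by_label.columns))
print(list(by_position.columns))
```

If loc excluded the stop label, there would be no label to put after "Time" to select the last column, which is the argument made above.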

@ramleo1461 5 years ago

I can't find the words to describe how helpful your videos are... Please keep up the good work... A tutorial on NumPy as well, please!

@dataschool 5 years ago

Thank you!

@dariuszspiewak5624 2 years ago

The convention with .loc (it includes both ends) is fully understandable when one realizes that labels do not continue forever like the integers do. So, if one wants to select all columns, which label would one use as the last? What's after 'Time' in this case? There isn't anything that would logically follow 'Time.' Of course, one could have a convention that in this case None would do the trick but... what if None can be a label as well? And it probably can! With integers there's no such puzzle because they do continue up forever. Therefore there's no ambiguity if there's no (n+1)th column (or even (n + m)th). I think this is the logical explanation of why there are 2 different conventions, one for .loc and the other for .iloc.

@dataschool 2 years ago

Thanks for sharing, Dariusz!

@ajaybhandari9596 9 months ago

Hi sir, thanks for making these videos. I watched many Python videos on YouTube but couldn't understand anything. When I heard about your channel, I checked it out, and now I have learned all the major concepts of Python. So I want to thank you for your support. I am sure that in the future I won't face any problems in pandas. I also have a question: how can we write a function for any problem in Python? Is there any way to learn or practice writing functions in pandas? Thanks!!

@dataschool 8 months ago

I have a course that will help you with writing functions: courses.dataschool.io/python-essentials-for-data-scientists

@jigneshpatel-ns2it 4 years ago

How can I convert a column to datetime format and select the year for a univariate biplot?

@sherlockom22 3 years ago

I think '~' acts as a 'not' operator, meaning 'not in train'. Let me know if this is correct. :-D

@guptaachin 8 years ago

About loc and iloc: the two methods may be unplanned, but the need to hate them can be obviated. How? loc: since loc only takes the actual names of the columns (generally str) or of the index (generally int), it is made inclusive on both sides. When specifying the slice we say 'select from this label to this label', so making it exclusive would not make sense. iloc: it only accepts positions, regardless of whether they happen to coincide with integer index labels. Since it operates on positions, it can make the latter part of the slice exclusive (the coders could also have made it inclusive). This may not be the exact explanation, but it sure helps you not to be annoyed and confused every time the aforementioned methods are used. Summary: loc uses labels, so it is inclusive; iloc uses integer positions, so it is exclusive for the latter part of the slice.

@dataschool 8 years ago

Great summary, thanks!

@saracachique8500 2 years ago

Hello, thank you for this video. I have a question about one of your DataFrames, the one with the actors. For example, if I want to count the number of actors in that list string, how can I do it?

@dataschool 2 years ago

You'll have to use string methods, see this video: kzbin.info/www/bejne/mKDJknZmfsieftE Hope that helps!

@mariodamhur1151 6 years ago

Hi. Great video! Thanks for your time. I have a question. If I have a folder with n subfolders that are the classes of my data (let's say cat/ and dog/), how do I read it as a DataFrame? Or a similar layout, but with train/ and test/ folders and dog/ and cat/ inside them?

@dataschool 6 years ago

It depends on exactly what is in each folder, but you would likely have to use 'glob.glob' to get a list of the filenames, and then loop through them, reading each one and appending to the DataFrame.
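A minimal sketch of that suggestion, under the assumption that each class subfolder holds CSV files (the question does not say what the files are). The function name `read_class_folders` and the `label` column are hypothetical names chosen for illustration.

```python
import glob
import os

import pandas as pd


def read_class_folders(root):
    """Read every CSV under root/<label>/ into one DataFrame (assumed layout)."""
    frames = []
    pattern = os.path.join(root, "*", "*.csv")
    for path in sorted(glob.glob(pattern)):
        frame = pd.read_csv(path)
        # The parent folder name (for example "cat" or "dog") becomes the label
        frame["label"] = os.path.basename(os.path.dirname(path))
        frames.append(frame)
    # Concatenating once at the end avoids growing a DataFrame inside the loop
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

For a train/ and test/ layout, the same function could be called once per top-level folder, for example `read_class_folders("train")`.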

@yangw9328 7 years ago

Thank you, Data School, for posting amazing pandas Q&A videos. If no one else has raised this question, could you explain how you use the logging library in a Jupyter notebook? For example, if I want to print a message in the notebook and also export it to a separate file on disk, are there any IPython magics for this, or should I still use the logging library? Thank you, and I look forward to your suggestion on this.

@dataschool 7 years ago

Glad you are enjoying the pandas videos! Regarding your question, I actually don't know. But if you figure it out, please let me know!
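One possible approach to the question above, offered only as a sketch: the standard library `logging` module can attach two handlers to a single logger, one writing to standard output (which a notebook shows under the cell) and one writing to a file. The helper name `make_logger` and the message format are assumptions made for illustration.

```python
import logging
import sys


def make_logger(name, log_path):
    """Return a logger that writes each message to stdout and to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # keep messages from also reaching the root logger
    logger.handlers.clear()    # avoid duplicate output when the cell is re-run
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(sys.stdout),
                    logging.FileHandler(log_path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Usage would look like `log = make_logger("analysis", "analysis.log")` followed by `log.info("loaded data")`.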

@slavrine 6 years ago

Great questions and great explanations!

@dataschool 6 years ago

Thanks!

@sergeyteslyuk5496 8 years ago

Regarding the last question, I suppose it was about how to read a CSV file using sampling, not how to use the sample method on a DataFrame that has already been read.

@dataschool 8 years ago

It's hard to say for sure... both are good questions! I'll try to cover the scenario you are describing in a future video.

@ghanemimehdi1063 8 years ago

Hello, I work with a pandas DataFrame, and I want to send a column to Qt software, but when I do that I get an error saying that the pandas object is not serializable. How can I solve this error, please?

@dataschool 8 years ago

Have you tried the to_pickle DataFrame method? pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_pickle.html

@kostasnikoloutsos5172 7 years ago

Joshua dan I love you! Best question

@tamal_sen 5 years ago

Hello Kevin, thank you for all your effort and time. Could you please elaborate on the reason for setting the parameter 'random_state=99' at 16:39? And what would happen if I set it to a different value? Thank you.

@dataschool 5 years ago

Great question! It's for reproducibility, meaning it will produce the same result every time you run it. You can use any integer. I wish I could explain more, but it's a complicated topic. Hope that helps!

@fffppp8762 7 years ago

Great videos. I have a question about something that confuses me. In video 23, whatever value you set for random_state, it always returns the same result for that value. In that case, why do we set different values for random_state (e.g. 1, 9, 42)? Thanks

@dataschool 7 years ago

The goal of setting a random_state is to ensure reproducibility, meaning that you will get the exact same results every time you run your code. However, it doesn't actually matter what number you use as the "seed" (that's the term used to refer to that number), as long as you don't change it in between runs.
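To make the reproducibility point concrete, here is a small sketch using the DataFrame `sample` method with a `random_state`. The DataFrame `df` and the seeds 42 and 99 are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})

# The same seed gives the same sample on every run
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)

# A different seed is equally valid; it just fixes a different sample
other_seed = df.sample(n=3, random_state=99)

print(first.equals(second))
```

Without `random_state`, each call to `sample` may return different rows, which makes results hard to compare between runs.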

@user-jq3um6ez2h 6 years ago

Thank you for your videos

@dataschool 6 years ago

You're welcome!

@thliang17 8 years ago

Hi Kevin. Thanks for so many very useful videos. My question: the code ![](plot_name.png) does not work for me. plot_name.png is in my working directory. I always get this error: '[]' is not recognized as an internal or external command, operable program or batch file. Do I need to install some packages? Thanks.

@dataschool 8 years ago

That is Markdown code, and so the cell has to be run as a Markdown cell in order for it to render. Currently, you're trying to run it as a code cell, and that generates an error because it's not Python code. Just convert it to a Markdown cell, and re-run it, and it should work!

@thliang17 8 years ago

Thank you, Kevin. That works! I appreciate it and look forward to more of your videos! :-)

@yashovardhanmopur438 6 years ago

Great video! Sorry for the dumb question. I wanted to know the difference between the two sets of tools available. One set involves Pig, Hive, Flume, Spark, etc., and the other set includes tools like pandas, NumPy, etc. Can we use these tools interchangeably, or combine them, to perform data analysis and data munging?

@dataschool 6 years ago

The first set is tools for big data (across various languages), and the second set is tools for small data (all in Python). You would generally perform your analysis and munging in one tool, but if you use two tools, then ideally they will be in the same language. Hope that helps!

@ish694 7 years ago

Just curious. There are so many parameters. Do all of them even come up in day-to-day use, or have you ever used all of them?

@dataschool 7 years ago

I have definitely not used all of them! :)

@Om-iy9ix 6 years ago

Hi there, thanks for the awesome videos. Question: why is ufo.iloc[:,'City':'State'] not working? It shows an error saying that it cannot do slice on class pandas.core.indexes.base.index with indexers[city] of class str

@dataschool 6 years ago

I think you want to use loc instead of iloc in that case.

@mahesh_kok 5 years ago

Utsav Patel... Hey brother, iloc accepts position-based slicing and you are providing column names, so you have two options: use loc instead and you will get the desired output, or, if you still want to use iloc, use the column positions, like ufo.iloc[:, 0:4], and it will work.
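The error in this thread can be reproduced with a short sketch. The one-row `ufo` frame below is a hypothetical stand-in with the column names assumed from the comments; `Index.get_loc` is shown as one way to translate labels into positions when iloc is still preferred.

```python
import pandas as pd

ufo = pd.DataFrame(
    [["Ithaca", None, "TRIANGLE", "NY", "6/1/1930 22:00"]],
    columns=["City", "Colors Reported", "Shape Reported", "State", "Time"],
)

# iloc only understands integer positions, so label slices are rejected
try:
    ufo.iloc[:, "City":"State"]
    error_type = None
except TypeError as err:
    error_type = type(err).__name__

# Option 1: use loc, which slices by label (both ends included)
with_loc = ufo.loc[:, "City":"State"]

# Option 2: look up the positions of the labels, then use iloc;
# add 1 to the stop position because iloc excludes it
start = ufo.columns.get_loc("City")
stop = ufo.columns.get_loc("State") + 1
with_iloc = ufo.iloc[:, start:stop]

print(error_type)
print(list(with_iloc.columns))
```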

@roygao9477 4 years ago

Many thanks, Kevin, the video is very helpful. But I still have a question about "random". I understand that you used it to slice a table, including rows and columns. But when I tried the same function to randomly pick a cell value from a column/Series, it also picked up the "type" information (e.g. "12 49017.0 Name: Project ID (IHRIS), dtype: float64"). Would you know the reason? Also, is it possible to make a video teaching how to write the data into a particular Excel template, instead of just saving it into any Excel file (assuming there's an existing table in the target Excel spreadsheet)? Thank you.

@dataschool 4 years ago

I'm sorry, I won't be able to help... good luck!

@enian82 8 years ago

Thank you for the amazing tutorials. Seriously, these are some of the best I have seen. I have a quick question. In the IPython notebook, once you execute a cell and then save the notebook, the cell stays executed when you reopen the notebook. How do you make it so that it has not been executed yet?

@daxpicture9996 6 years ago

In Jupyter, use the file menu at the top and select Kernel and then Restart & Clear Output.

@sushichanel7299 6 years ago

I ran train = ufo.sample(frac=0.75, random_state=99), and then test.shape returns (0, 5). I don't know why the row count is 0. Please advise.

@dataschool 6 years ago

I think you've made an error in your code prior to this line.

@Martin-lv1xw 4 years ago

I think I am not too late! It's 2020, but I have picked up Python so fast through your teaching. Before, I used to work with R, and I developed a habit of passing arguments into a function without their names, which worked fine in R but not in Python. Is there a trick to do that?

@dataschool 4 years ago

It depends on the particular scenario... I don't know how to explain briefly, sorry!

@sipanarevshatyan6354 7 years ago

Hi, thanks for your interesting tutorials. I have a question: could you please tell me how I can make tables from DataFrames and save those tables as a .png image? And how can I give a specific colour to each row?

@dataschool 7 years ago

I'm sorry, I'm not familiar with how to do that. If you figure it out, I'd love to hear!

@mahesh_kok 5 years ago

I would like to rephrase the answer to the 3rd question, about why both ends of the range are inclusive in loc (at 14:20). Let's assume I want to change the row labels to the City values, so I use ufo.set_index('City', inplace=True). Now try using loc to get the rows from Ithaca to Abilene, i.e. the first 4 rows. We write the code as ufo.loc['Ithaca':'Abilene'], and it outputs the first 4 rows. Imagine if we had gotten only the first 3 rows: you would be wondering, "I asked for rows up to Abilene, so why does it stop at Holyoke?" To avoid that, the upper bound is inclusive. So treat row labels as row names, not row positions. I hope this clears up the doubt.

@dataschool 5 years ago

Thanks for sharing!
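The row-label scenario described above might look like the following sketch. The five-row `ufo` frame is a hypothetical stand-in in which every city appears only once; in a dataset with repeated city names, slicing by label may raise an error unless the index is sorted.

```python
import pandas as pd

# Hypothetical stand-in with unique city names
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "New York Worlds Fair"],
    "State": ["NY", "NJ", "CO", "KS", "NY"],
})

# Use the city names as row labels
by_city = ufo.set_index("City")

# Label-based slice: both 'Ithaca' and 'Abilene' are included
subset = by_city.loc["Ithaca":"Abilene"]

print(list(subset.index))
```

Here the stop label reads as "up to and including Abilene", which matches how a person would naturally describe the selection.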

@rishi6739 5 years ago

Can you make a video on the difference between the transform and apply methods?

@dataschool 5 years ago

Thanks for your suggestion!

@rajbir_singh0517 5 years ago

Hello sir, can you please prepare a video on sampling, covering the various types of sampling methods and their usage?

@dataschool 5 years ago

Thanks for your suggestion!

@mladenvujic4034 2 years ago

"~" | Using the tilde operator to return inverse data

@dataschool 2 years ago

Right!

@ritchieng8073 8 years ago

I do not get the code for the last tip. I can't seem to break this down: ~ufo.index.isin(train.index) Thanks! Your other videos are succinct and great for beginners.

@dataschool 8 years ago

Thanks for your kind words! Regarding the code, here's how to think about it: ufo.index is the index for 100% of the rows in the ufo DataFrame. train.index is the index for 75% of the rows in the ufo DataFrame. ufo.index.isin(train.index) outputs an array of booleans (with the same length as the ufo DataFrame) that contains a True for every row in ufo that is also in train. Thus, it outputs an array in which 75% of the elements are True (corresponding to the rows that are in the train DataFrame). The tilde character inverts the array, meaning Trues become Falses and Falses become Trues. At that point, the boolean array can be passed to the loc method to select the other 25% of the rows from the ufo DataFrame. Does that make sense? :)

@ritchieng8073 8 years ago

I get it! Excellent explanation as usual.
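The steps in the explanation above can be traced on a tiny example. The eight-row `ufo` frame is a hypothetical stand-in; the seed 99 and the 0.75 fraction are taken from the comments in this thread.

```python
import pandas as pd

# Hypothetical stand-in: 8 rows, so 75% is exactly 6 rows
ufo = pd.DataFrame({"City": list("ABCDEFGH")})

# Randomly pick 75% of the rows for training
train = ufo.sample(frac=0.75, random_state=99)

# True for every row of ufo whose index label also appears in train
in_train = ufo.index.isin(train.index)

# The tilde flips True and False, leaving the rows that are not in train
test = ufo.loc[~in_train, :]

print(train.shape, test.shape)
```

Because the two selections are complementary, every row of `ufo` ends up in exactly one of `train` and `test`.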

@eugeneskokowski7098 7 years ago

Dear Kevin, it'd be great if you could explain more about using the tilde operator, as it behaves in a somewhat confusing way (~True returns -2 and ~False returns -1). Thanks!

@sidaliu8989 7 years ago

Hi Eugene Skokowski. I searched for an answer to this question. It turns out that ~ has two meanings in Python. For integers, ~i is bitwise NOT, which is equivalent to (-i)-1. This is why you got ~True = -2: Python treats True as the integer 1. On the other hand, ~ can also be applied to an object, whose class can override it by defining a method named '__invert__'. Thus a pandas Series can define its own response to this operator. For more information, see: stackoverflow.com/questions/8305199/the-tilde-operator-in-python
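Both meanings of the tilde can be seen in a short sketch that needs only plain Python. The `Flags` class is a hypothetical toy written for illustration, not how pandas implements it. Note that recent Python versions emit a DeprecationWarning when ~ is applied directly to a bool.

```python
class Flags:
    """Toy container that defines its own meaning for the ~ operator."""

    def __init__(self, values):
        self.values = list(values)

    def __invert__(self):
        # Flip each flag instead of applying integer bitwise NOT
        return Flags(not value for value in self.values)


# On integers, ~i is bitwise NOT, which equals -i - 1
print(~5)            # -6
print(~int(True))    # -2, because True counts as the integer 1
print(~int(False))   # -1, because False counts as the integer 0

# On an object, ~ calls the class's __invert__ method
print((~Flags([True, False])).values)  # [False, True]
```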

@pomicsaviox9971 4 years ago

Can you please create a video on VLOOKUP? For example: column C3 (in Sheet1) = VLOOKUP(C2,Sheet2!$A$3:$C$4795,3,FALSE), applied to all 1000s of rows. $4795 - max row of Sheet 2; $ - values from Sheet 2.