At 10:00, I saw that you read the important part and ignored the negativity in the latter part of the question. That is another lesson we can take from this video, on top of learning Python and pandas.
@praveenng 7 years ago
Great video, just like the other videos in the series. Thanks for your time and effort! If loc were exclusive of the second limit, it would be impossible to select the last column or last row. This could be another reason to make loc inclusive.
@dataschool 7 years ago
Great point! And, thanks for your kind words - I'm glad the videos have been helpful to you!
@ludovicolaci9582 4 years ago
I guess that's probably why Kevin made this distinction between labels and indexes for loc and iloc. iloc uses a list-like structure for referencing: it starts at 0 and ends at n-1 (just like positional indexes, which start with 0 and run to number_of_rows - 1). loc, on the other hand, lets you reference columns by label name. So, to avoid mixing up the two concepts inside the loc code, the pandas developers probably chose to make loc inclusive on both ends and keep the normal structure for iloc.
@ramleo1461 5 years ago
I can't find the words to describe how helpful your videos are... Please keep up the good work... A tutorial on NumPy as well, please!
@dataschool 5 years ago
Thank you!
@dariuszspiewak5624 2 years ago
The convention with .loc (it includes both ends) is fully understandable when one realizes that labels do not continue forever like the integers do. So, if one wants to select all columns, which label would one use as the last? What's after 'Time' in this case? There isn't anything that would logically follow 'Time.' Of course, one could have a convention that in this case None would do the trick but... what if None can be a label as well? And it probably can! With integers there's no such puzzle because they do continue up forever. Therefore there's no ambiguity if there's no (n+1)th column (or even (n + m)th). I think this is the logical explanation of why there are 2 different conventions, one for .loc and the other for .iloc.
@dataschool 2 years ago
Thanks for sharing, Dariusz!
@ajaybhandari9596 9 months ago
Hi sir, thanks for making these videos. I watched many videos on KZbin related to Python but couldn't understand anything. When I heard about your channel, I checked it out, and now I have learnt all the major concepts of Python. So I want to thank you for your support, and I am sure that in future I won't face any kind of problem in pandas. I also have a question: how can we write a function for any problem in Python? Is there any way to learn or practice writing functions in pandas? Thanks!!
@dataschool 8 months ago
I have a course that will help you with writing functions: courses.dataschool.io/python-essentials-for-data-scientists
@jigneshpatel-ns2it 4 years ago
How can I convert to datetime format and select the year for a univariate biplot?
@sherlockom22 3 years ago
I think '~' acts as a 'not' operator, meaning not in 'train'. Let me know if this is correct. :-D
@guptaachin 8 years ago
About loc and iloc: the two methods may be unplanned, but the need to hate them can be obviated. How? loc: since loc only takes the actual names of the columns (generally str) or index labels (generally int), it is made inclusive on both sides. When specifying the slice we say 'select from this label to this label', so making it exclusive would not make sense. iloc: it only accepts positions, regardless of whether the index labels happen to be integers too. Since it operates on positions, it can make the latter part of the slice exclusive (it could have been made inclusive by the coders). This may not be the exact explanation for the problem, but it sure helps you not to be annoyed and confused every time the aforementioned methods are used. Summary: loc - labels, so inclusive; iloc - integer positions, so exclusive for the latter part of the slice.
@dataschool 8 years ago
Great summary, thanks!
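To make the summary above concrete, here is a minimal sketch of the two slicing conventions. The small `ufo` DataFrame is a hypothetical stand-in with made-up values, not the dataset from the video; only the column names follow the ones mentioned in this thread.

```python
import pandas as pd

# Hypothetical stand-in for the ufo DataFrame (values are made up)
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene"],
    "Colors Reported": ["RED", "GREEN", "BLUE", "RED"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK"],
    "State": ["NY", "NJ", "CO", "KS"],
    "Time": ["6/1/1930 22:00", "6/30/1930 20:00",
             "2/15/1931 14:00", "6/1/1931 13:00"],
})

# loc slices by label and includes both ends:
# row labels 0, 1, 2 and columns City through State
by_label = ufo.loc[0:2, "City":"State"]

# iloc slices by position and excludes the stop value:
# row positions 0, 1 and column positions 0, 1, 2, 3
by_position = ufo.iloc[0:2, 0:4]

print(by_label.shape)     # three rows, four columns
print(by_position.shape)  # two rows, four columns
```

With the default integer index the row labels and row positions happen to coincide, which is exactly why the different stop behaviour is easy to trip over.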
@saracachique8500 2 years ago
Hello, thank you for this video. I have a question about one of your DataFrames, the one about the actors. For example, if I want to count the number of actors in that list string, how can I do it?
@dataschool 2 years ago
You'll have to use string methods, see this video: kzbin.info/www/bejne/mKDJknZmfsieftE Hope that helps!
@mariodamhur1151 6 years ago
Hi. Great video! Thanks for your time. I have a question. If I have a folder with n subfolders that are the classes of my data (let's say cat/ and dog/), how do I read it as a DataFrame? Or a similar case, but with train/ and test/, and inside them dog/ and cat/?
@dataschool 6 years ago
It depends on exactly what is in each folder, but you would likely have to use 'glob.glob' to get a list of the filenames, and then loop through them, reading each one and appending to the DataFrame.
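A rough sketch of the glob approach described in the reply, under the assumption that each class subfolder holds CSV files (the question does not say what the folders contain). The helper name `read_class_folders`, the `label` column and the demo files are all hypothetical. It collects the pieces in a list and calls `pd.concat` once, since `DataFrame.append` is no longer available in recent pandas versions.

```python
import glob
import os
import tempfile

import pandas as pd


def read_class_folders(root):
    """Read every CSV in root/<class>/ and tag each row with its folder name."""
    frames = []
    for path in sorted(glob.glob(os.path.join(root, "*", "*.csv"))):
        df = pd.read_csv(path)
        # The parent folder name (e.g. "cat" or "dog") becomes the class label
        df["label"] = os.path.basename(os.path.dirname(path))
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Demo on a throwaway directory with made-up files
with tempfile.TemporaryDirectory() as root:
    for label in ("cat", "dog"):
        os.makedirs(os.path.join(root, label))
        pd.DataFrame({"weight": [1.0, 2.0]}).to_csv(
            os.path.join(root, label, "sample.csv"), index=False
        )
    combined = read_class_folders(root)

print(combined)
```

For a train/ and test/ layout, the same helper could be called once per top-level folder.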
@yangw9328 7 years ago
Thank you, Data School, for posting amazing pandas Q&A videos. If no one else has raised this question, could you explain how you use the logging library in Jupyter Notebook? For example, if I want to print a message in the notebook and also export it to a separate file on disk, are there any IPython magics for this, or should I still use the logging library? Thank you, and I look forward to your suggestion on this.
@dataschool 7 years ago
Glad you are enjoying the pandas videos! Regarding your question, I actually don't know. But if you figure it out, please let me know!
@slavrine 6 years ago
Great questions and great explanations!
@dataschool 6 years ago
Thanks!
@sergeyteslyuk5496 8 years ago
Regarding the last question, I suppose it was about how to read a CSV file using sampling, not how to use the sample method on a DataFrame that has already been read.
@dataschool 8 years ago
It's hard to say for sure... both are good questions! I'll try to cover the scenario you are describing in a future video.
@ghanemimehdi1063 8 years ago
Hello, I work with a pandas DataFrame, and I want to send a column to Qt software, but when I do that I get an error saying that the pandas object is not serializable. How can I solve this error, please?
@dataschool 8 years ago
Have you tried the to_pickle DataFrame method? pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_pickle.html
@kostasnikoloutsos5172 7 years ago
Joshua dan I love you! Best question
@tamal_sen 5 years ago
Hello Kevin, thank you for all your effort and time. Please elaborate on the reason for setting the parameter 'random_state=99' at 16:39. What would happen if I set it to a different value? Thank you.
@dataschool 5 years ago
Great question! It's for reproducibility, meaning it will produce the same result every time you run it. You can use any integer. I wish I could explain more, but it's a complicated topic. Hope that helps!
@fffppp8762 7 years ago
Great videos. I have a question that confuses me. In video 23, whatever value you set for random_state, it always returns the same constant result. In that case, why do we set different values for random_state (e.g. 1, 9, 42)? Thanks
@dataschool 7 years ago
The goal of setting a random_state is to ensure reproducibility, meaning that you will get the exact same results every time you run your code. However, it doesn't actually matter what number you use as the "seed" (that's the term used to refer to that number), as long as you don't change it in between runs.
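A small sketch of that point, using a made-up DataFrame rather than the one from the video: the same seed reproduces the same sample, and the particular integer only determines which sample that is.

```python
import pandas as pd

# Made-up data, just to have something to sample from
df = pd.DataFrame({"n": range(100)})

# Same seed, same rows, every time the code is run
first = df.sample(n=5, random_state=42)
second = df.sample(n=5, random_state=42)
print(first.equals(second))

# A different seed is equally valid; it simply picks
# (in general) a different set of rows
other = df.sample(n=5, random_state=99)
print(other.index.tolist())
```

Leaving `random_state` out altogether gives a fresh sample on every run, which is fine for exploration but makes results hard to reproduce.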
@user-jq3um6ez2h 6 years ago
Thank you for your videos
@dataschool 6 years ago
You're welcome!
@thliang17 8 years ago
Hi Kevin. Thanks for so many very useful videos. My question: the code ![](plot_name.png) does not work for me. plot_name.png is in my working directory. I always get the error: '[]' is not recognized as an internal or external command, operable program or batch file. Do I need to install some packages? Thanks.
@dataschool 8 years ago
That is Markdown code, and so the cell has to be run as a Markdown cell in order for it to render. Currently, you're trying to run it as a code cell, and that generates an error because it's not Python code. Just convert it to a Markdown cell, and re-run it, and it should work!
@thliang17 8 years ago
Thank you Kevin. That works! Appreciate and anticipate your more videos! :-)
@yashovardhanmopur438 6 years ago
Great video! Sorry for the dumb question. I wanted to know the difference between 2 sets of tools. One set involves Pig, Hive, Flume, Spark, etc., and on the other side there are tools like pandas, NumPy, etc. Can we use these tools interchangeably, or combine them to perform data analysis and data munging?
@dataschool 6 years ago
The first set is tools for big data (across various languages), and the second set is tools for small data (all in Python). You would generally perform your analysis and munging in one tool, but if you use two tools, then ideally they will be in the same language. Hope that helps!
@ish694 7 years ago
Just curious. There are like soooo many parameters. Do all of them even come in day to day use or have you ever used all of them ??
@dataschool 7 years ago
I have definitely not used all of them! :)
@Om-iy9ix 6 years ago
Hi there, thanks for the awesome videos. Doubt: why is ufo.iloc[:,'City':'State'] not working? It shows an error that it cannot do slice on class pandas.core.indexes.base.index with indexers[city] of class str
@dataschool 6 years ago
I think you want to use loc instead of iloc in that case.
@mahesh_kok 5 years ago
Utsav Patel... hey brother, iloc accepts position-based slicing and you are providing column names. So you have two options: use loc instead, and you will get the desired output; or, if you still want to use iloc, use the column positions, like ufo.iloc[:, 0:4], and it will work.
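If you would rather not count column positions by hand, one possible approach is to look them up with `Index.get_loc` and pass the result to iloc. This is only a sketch: the two-row `ufo` DataFrame is a hypothetical stand-in with made-up values, and it assumes the column names are unique (otherwise `get_loc` does not return a single integer).

```python
import pandas as pd

# Hypothetical stand-in for the ufo DataFrame (values are made up)
ufo = pd.DataFrame(
    [["Ithaca", "RED", "TRIANGLE", "NY", "6/1/1930 22:00"],
     ["Willingboro", "GREEN", "OTHER", "NJ", "6/30/1930 20:00"]],
    columns=["City", "Colors Reported", "Shape Reported", "State", "Time"],
)

# Option 1: slice by label with loc (the stop label is included)
via_loc = ufo.loc[:, "City":"State"]

# Option 2: translate labels to positions, then slice with iloc.
# iloc excludes the stop position, hence the + 1.
start = ufo.columns.get_loc("City")
stop = ufo.columns.get_loc("State")
via_iloc = ufo.iloc[:, start:stop + 1]

print(list(via_loc.columns))
print(list(via_iloc.columns))
```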
@roygao9477 4 years ago
Many thanks Kevin, the video is very helpful. But I still have a question about "Random". I understand that you used it to "slice" a table including rows and columns. But when I tried the same function to randomly pick a cell value from a column/Series, it also returned the "type" information (e.g. "12 49017.0 Name: Project ID (IHRIS), dtype: float64"). Would you know the reason? Also, would it be possible to make a video on how to write data into a particular Excel template, instead of just saving it into any Excel file (assuming there's an existing table in the target spreadsheet)? Thank you.
@dataschool 4 years ago
I'm sorry, I won't be able to help... good luck!
@enian82 8 years ago
Thank you for the amazing tutorials. Seriously, these are some of the best I have seen. I have a quick question. In the IPython notebook, once you execute a cell and then save the notebook, the cell stays executed when you reopen the notebook. How do you make it so that it has not been executed yet?
@daxpicture9996 6 years ago
In Jupyter, use the file menu at the top and select Kernel and then Restart & Clear Output.
@sushichanel7299 6 years ago
train = ufo.sample(frac=0.75, random_state=99), and then test.shape gives (0, 5). I don't know why the number of rows is 0. Please advise.
@dataschool 6 years ago
I think you've made an error in your code prior to this line.
@Martin-lv1xw 4 years ago
I think I am not too late! It's 2020, but I have picked up Python so fast through your teaching. Before, I used to work with R, and I developed a habit of passing arguments into a function without their names, which worked pretty well in R but not in Python. Is there a trick to do that?
@dataschool 4 years ago
It depends on the particular scenario... I don't know how to explain briefly, sorry!
@sipanarevshatyan6354 7 years ago
Hi, thanks for your interesting tutorials. I have a question: could you please tell me how I can make tables from DataFrames and save those tables as a .png image? And how can I give a specific colour to each row?
@dataschool 7 years ago
I'm sorry, I'm not familiar with how to do that. If you figure it out, I'd love to hear!
@mahesh_kok 5 years ago
I would like to reframe the answer to the 3rd question, on why both ends are inclusive in loc. At 14:20, let's assume I want to change the row labels to the City values: I will use ufo.set_index('City'). Now try using loc, and say you want the rows from Ithaca to Abilene, i.e. the first 4 rows. We write ufo.loc['Ithaca':'Abilene'], and it outputs the first 4 rows. Imagine the situation where we got only the first 3 rows: you would be wondering, "I asked for rows up to Abilene, so why does it stop at Holyoke?" To avoid this, the upper bound is inclusive. So treat row labels as row names and not row positions. I hope this clears up the doubt.
@dataschool 5 years ago
Thanks for sharing!
@rishi6739 5 years ago
Can you make a video on the difference between the transform and apply methods?
@dataschool 5 years ago
Thanks for your suggestion!
@rajbir_singh0517 5 years ago
Hello Sir, can you please prepare a video on sampling: the various types of sampling methods and their usage?
@dataschool 5 years ago
Thanks for your suggestion!
@mladenvujic4034 2 years ago
"~" | Using the tilde operator to return inverse data
@dataschool 2 years ago
Right!
@ritchieng8073 8 years ago
I do not get the code for the last tip. I can't seem to break this down: ~ufo.index.isin(train.index) Thanks! Your other videos are succinct and great for beginners.
@dataschool 8 years ago
Thanks for your kind words! Regarding the code, here's how to think about it: ufo.index is the index for 100% of the rows in the ufo DataFrame. train.index is the index for 75% of the rows in the ufo DataFrame. ufo.index.isin(train.index) outputs an array of booleans (with the same length as the ufo DataFrame) that contains a True for every row in ufo that is also in train. Thus, it outputs an array in which 75% of the elements are True (corresponding to which rows are in the train DataFrame). The tilde character inverts the array, meaning Trues become Falses and Falses become Trues. At that point, the boolean array can be passed to the loc method to select the other 25% of the rows from the ufo DataFrame. Does that make sense? :)
@ritchieng8073 8 years ago
I get it! Excellent explanation as usual.
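For anyone who wants to step through that explanation, here is a minimal sketch on a tiny made-up DataFrame (eight rows, so that 75% is exactly six). The `in_train` name is just for illustration.

```python
import pandas as pd

# Tiny made-up stand-in for the ufo DataFrame
ufo = pd.DataFrame({"City": [f"city_{i}" for i in range(8)]})

# 75% of the rows, chosen reproducibly
train = ufo.sample(frac=0.75, random_state=99)

# True for every row of ufo whose index label also appears in train
in_train = ufo.index.isin(train.index)

# ~ flips the booleans, so loc keeps only the rows NOT in train
test = ufo.loc[~in_train, :]

print(len(train), len(test))
print(sorted(train.index) + sorted(test.index))
```

Every row ends up in exactly one of the two DataFrames, which is the property you want from a train/test split.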
@eugeneskokowski7098 7 years ago
Dear Kevin, it'd be great if you could explain more about the tilde operator, as it behaves in a somewhat confusing way (~True returns -2 and ~False returns -1). Thanks!
@sidaliu8989 7 years ago
Hi Eugene Skokowski. I searched for an answer to this question. It turns out that ~ has two meanings in Python. For integers, ~i is the bitwise NOT of i, which is equivalent to (-i)-1. This is why you got ~True = -2: Python interprets True as the integer 1. On the other hand, ~ can also be an operator on an object, which can be overridden by defining a method named '__invert__'. Thus pandas.Series can define its own response to this operator. For more information, see: stackoverflow.com/questions/8305199/the-tilde-operator-in-python
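A short sketch of both behaviours side by side. The `Flags` class is a hypothetical toy, written only to show how `__invert__` lets an object decide what ~ means; it is not how pandas implements it internally.

```python
import pandas as pd

# bool is a subclass of int, so True behaves like 1,
# and for integers ~i equals -i - 1
scalar_result = ~int(True)
print(scalar_result)


class Flags:
    """Hypothetical container that gives ~ an element-wise 'not' meaning."""

    def __init__(self, values):
        self.values = list(values)

    def __invert__(self):
        # Called when ~ is applied to a Flags instance
        return Flags(not v for v in self.values)


flipped = ~Flags([True, False, True])
print(flipped.values)

# A boolean pandas Series responds to ~ element-wise as well
mask = pd.Series([True, False, True])
inverted = ~mask
print(inverted.tolist())
```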
@pomicsaviox9971 4 years ago
Can you please create a video on VLOOKUP? For example: column C3 (in Sheet1) = VLOOKUP(C2,Sheet2!$A$3:$C$4795,3,FALSE), applied to all 1000 rows. $4795 - max row of Sheet2. $ - values from Sheet2
@firasbayazed7479 5 years ago
Thanks, but you didn't explain the numbers used in random_state.
@dataschool 5 years ago
It sets the seed for the pseudo-random number generator used by the sample method. Does that help?
@firasbayazed7479 5 years ago
@dataschool but what is the difference when you set the random state to 42 or 99?
@dataschool 5 years ago
Sorry, I don't know how to explain a random number seed in a KZbin comment... it's a longer explanation!