Starting in pandas version 0.19, you can create a category column during the file reading process! Learn more here: kzbin.info/www/bejne/Y3_Fimp7bs1-rs0 And starting in pandas 0.21, the method for specifying ordered categories has changed. Learn the new method here: kzbin.info/www/bejne/qpaYe6WJeLxggrs
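A minimal sketch of the first tip above, assuming pandas 0.19 or later. The inline CSV text and its column names are made up for illustration; with a real file you would pass its path instead of the `StringIO` object.

```python
import io
import pandas as pd

# Hypothetical data standing in for a real CSV file
csv_text = """country,continent,beer_servings
Albania,Europe,89
Algeria,Africa,25
Andorra,Europe,245
Angola,Africa,217
"""

# dtype maps a column name to the type it should have as soon as it is read
drinks = pd.read_csv(io.StringIO(csv_text), dtype={'continent': 'category'})

print(drinks.dtypes)
print(drinks.continent.cat.categories)
```

The other columns keep whatever type pandas infers for them; only the columns named in `dtype` are affected.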
@BadriNathJK 8 years ago
I am recommending your channel to all my friends. You are too good.
@dataschool 8 years ago
Wow, thank you!
@amandal8170 4 years ago
Yes, he is too good. Even our professor recommended learning pandas from him. lol.
@amandal8170 4 years ago
@dataschool Thanks a lot. Could we have some videos on R Shiny or Python visualisation? I like your teaching style.
@WaltterValdez 8 years ago
Thanks, I reduced my data from 592.4 MB to 195.0 MB using categories. That's amazing!!!
@dataschool 8 years ago
That is awesome!!
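For anyone who wants to measure this kind of saving on their own data, here is a small sketch. The Series is made up (a few distinct strings repeated many times), so the exact byte counts will differ from the figures quoted above and from one pandas version to another.

```python
import pandas as pd

# Made-up column: many rows, only a few distinct strings
continent = pd.Series(['Europe', 'Africa', 'Asia', 'Europe'] * 1000)

# deep=True asks pandas to measure the actual string storage as well
bytes_as_strings = continent.memory_usage(deep=True)
bytes_as_category = continent.astype('category').memory_usage(deep=True)

print(bytes_as_strings, bytes_as_category)
```

The saving comes from storing each distinct string once and keeping only small integer codes per row, so it is largest when a column has few unique values relative to its length.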
@ilyastrojnov7627 4 years ago
Remember, with big data you need pd.eval and df.query for filtering; these functions don't use memory for a temporary boolean Series.
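A short sketch of the two filtering styles mentioned in this comment, using a made-up DataFrame. It only shows that both return the same rows; whether `query` actually saves memory is the commenter's claim and likely depends on the data size and on whether the optional numexpr package is installed, so treat that part as unverified here.

```python
import pandas as pd

drinks = pd.DataFrame({
    'country': ['Albania', 'Algeria', 'Andorra', 'Angola'],
    'beer_servings': [89, 25, 245, 217],
})

# Boolean-mask filtering builds an intermediate boolean Series first
masked = drinks[drinks.beer_servings > 100]

# query() takes the condition as a string and evaluates it internally
queried = drinks.query('beer_servings > 100')

print(queried)
```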
@readtilleternity 6 years ago
Dude, you are awesome! This is THE best tutorial on Pandas I have come across on the internet. You are really doing the internet a great favor! Thanks a lot!
@dataschool 6 years ago
Wow! Thank you so much for your kind words! :) You are very welcome.
@andreacazzaniga8488 7 years ago
Very useful! I was still a bit skeptical, but the example with the country Series made it all very clear! You are good at giving the best frame to understand things.
@dataschool 7 years ago
Excellent! Glad to hear that this video was helpful to you.
@JSchellergJ 6 years ago
Good lord man, this is awesome and your way of teaching is well paced and easy to follow. You're an incredible teacher, keep it up and you will reach the stars!
@dataschool 6 years ago
Thanks so much for your kind words! Much appreciated!
@jiwon5315 6 years ago
You’re amazing at explaining, thanks for uploading this content
@dataschool 6 years ago
You're very welcome! Thanks for your kind comments :)
@Diachron 8 years ago
Well I must sound like a broken record about how good these videos are but they only get better. I've come close on occasion to manually implementing what the category dtype does, so thanks for that revelation.
@dataschool 8 years ago
Thank you! I'm glad the category tip was helpful to you!
@UndoubtablySo 1 year ago
The category feature is super powerful, glad I learnt this.
@dataschool 1 year ago
Great to hear!
@JR-di9uk 7 years ago
You should mention that if you perform df['mycolumn'] = df['mycolumn'].astype('category'), you won't be able to enter arbitrary strings into that column anymore (write ops are limited to the exact categories). This may be an advantage (typo protection) or a disadvantage, depending on the use case! Otherwise, thanks for the concise and clear instructions!
@dataschool 7 years ago
That's a great point, thank you for bringing it up! I really appreciate it.
@FabioRBelotto 2 years ago
I understand that a category column only accepts the values already defined as its categories, but what should I do when I need to edit them? For example, for a sex/gender column I used to have only Male or Female. Now I need to store many other values. How do I edit or extend the category list?
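One way to handle this, sketched on a made-up Series: the write restriction described a few comments up is real, and `Series.cat.add_categories` is the pandas method for extending the list. The exception raised for a value outside the categories has varied between pandas versions, so the sketch catches both likely types.

```python
import pandas as pd

sex = pd.Series(['Male', 'Female', 'Male'], dtype='category')

# Writing a value that is not an existing category is rejected
try:
    sex[0] = 'Other'
    rejected = False
except (TypeError, ValueError):  # exception type differs across pandas versions
    rejected = True

# Register the new category first, then the assignment is allowed
sex = sex.cat.add_categories(['Other'])
sex[0] = 'Other'

print(rejected)
print(sex.cat.categories)
```

Related methods on the same accessor, such as `cat.remove_categories` and `cat.rename_categories`, cover the other kinds of edits.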
@s.baskaravishnu22 7 years ago
Congratulations, and many thanks for sharing the code used in the video with us. It is very useful to me. My warm regards to you.
@dataschool 7 years ago
You're welcome!
@nitishkumar-bk8kd 4 years ago
Loved your explanation, great teacher
@dataschool 4 years ago
Thank you! 😃
@ahmadmponda3294 2 years ago
Thanks a million. I've been struggling with inplace returning a NoneType instead of a DataFrame most of the time.
@GregHacob 8 years ago
Very useful tips. You make pandas easy to understand. Thank you!
@dataschool 8 years ago
You're very welcome!
@kp9834 5 years ago
Thank you for an excellent video on writing memory efficient code with categorical data in input. I'm interested in understanding various options to read in large dataframes (other than common pandas and spark methods) containing only numerical data, iterate over its length, create smaller dataframe out of it based on a condition and do some processing, all of which in a faster and memory efficient way. Please cover it if possible.
@dataschool 5 years ago
Thanks for your suggestion!
@senupranesh 5 years ago
Amazing explanation along with hands-on examples. I am really stunned by your way of teaching. Thank you very much. Your accent sometimes reminds me of Bruce Lee.
@dataschool 5 years ago
Thank you!
@pldeepesh 5 years ago
This is one of the coolest tutorials I have watched on pandas. Thanks for making it. I have a question though: would these categories improve the speed of a for loop if I use iterrows() on the DataFrame?
@dataschool 5 years ago
Thanks for your kind words! As for your question, I'm not sure, sorry!
@Kralnor 4 years ago
Using iterrows() in pandas is an anti-pattern and should only be done as a last resort. See engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
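To illustrate the point about iterrows(), here is a small sketch with made-up columns that computes the same totals twice: once row by row and once as a single column operation. It does not measure speed, and it says nothing about whether the category dtype changes loop performance, which remains unanswered in this thread.

```python
import pandas as pd

drinks = pd.DataFrame({'beer_servings': [89, 25, 245],
                       'wine_servings': [54, 14, 312]})

# Row-by-row loop: every row is converted to a Series before it is used
loop_totals = []
for _, row in drinks.iterrows():
    loop_totals.append(row['beer_servings'] + row['wine_servings'])

# Vectorized alternative: one operation over whole columns
vector_totals = drinks['beer_servings'] + drinks['wine_servings']

print(vector_totals.tolist())
```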
@Russel4973 8 years ago
Great explanation! Never knew about "category" before.
@dataschool 8 years ago
Thanks! It's so useful, I knew I had to cover it in the video series!
@sibinh 8 years ago
Really useful tips. Thanks Kevin.
@dataschool 8 years ago
You're very welcome!
@virenr5767 7 years ago
Great Videos. Thank you. Would appreciate your advice on the following - I am attempting to maintain customer-wise product wise monthly sales data. The index would be the product and the columns would be the customer name. Data would have to be captured into the table every month. 1. How would you recommend setting up the structure - As different data frames for each month or as a 3 dimensional array, with the 3rd dimension being the monthly data. 2. How do you set up a blank structure containing all possible products and customers and then populate each data frame with monthly sales data received? 3. Suppose you start dealing with a new customer mid year, how do you populate the entire table with this new customer Series and then start capturing their sales data from the month they start buying? Thank you in advance, for the answers
@dataschool 6 years ago
I'm sorry, but this is way beyond what I can address in a comment... good luck!
@niteshsrivastava6504 5 years ago
Thanks for sharing your knowledge. My question is how this category type is different from label encoding. Do they do the same thing?
@dataschool 4 years ago
Great question! When using the category data type, you are defining how pandas stores that column of data. However, you still treat that column as strings when working with it within pandas. With label encoding, your goal is to convert categories to numbers so that you can work with the numbers, not the strings. Does that answer your question?
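A small sketch of that distinction on a made-up Series: the column still reads and filters like strings, while the integer codes and the lookup table are available through the `.cat` accessor if you want to inspect them.

```python
import pandas as pd

continent = pd.Series(['Europe', 'Africa', 'Europe', 'Asia']).astype('category')

# Still reads and compares like strings
europe_rows = continent[continent == 'Europe']

# The integer codes and the lookup table stored underneath
codes = continent.cat.codes.tolist()
lookup = continent.cat.categories.tolist()

print(codes)   # [2, 0, 2, 1]
print(lookup)  # ['Africa', 'Asia', 'Europe']
```

The codes look like label encoding, but pandas treats them as a storage detail rather than as numbers to compute with.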
@silverahmad 4 years ago
Amazing as always. This entire playlist is in my favorites bar now! I have a quick question. I tried the bonus tip on the drinks-by-continent DataFrame just to see how it works: drinks['continent']=drinks.continent.astype('category', categories=['South America', 'Africa', 'North America', 'Europe', 'Asia', 'Oceania'], ordered=True) and I get this error: TypeError: astype() got an unexpected keyword argument 'categories' Any idea why?
@tugraalp01 3 years ago
(11:00) That method might be useful for data analysis, but if we apply machine learning algorithms, we HAVE TO use label encoding, one-hot encoding, or similar techniques, right? I want to know how appropriate it is to convert an attribute to the 'category' type for ML instead of applying an encoding technique.
@dataschool 3 years ago
You are correct that converting to the category type does not prepare it for ML. See this video for more: kzbin.info/www/bejne/ZqiaaXZ-gsSomK8
@mrmuranga 4 years ago
Amazing... I enjoy learning from the channel
@dataschool 4 years ago
Thank you!
@AbrahamHoffman 8 years ago
Yeah, this one was totally awesome. Thanks for making the videos!
@dataschool 8 years ago
Ha! Thank you for the comment! And you are very welcome, I enjoyed making these videos.
@fruitfcker5351 5 years ago
If anyone is seeing a FutureWarning error when specifying categories, instead of: df['quality'] = df.quality.astype('category', categories=['good', 'very good', 'excellent'], ordered=True) use: quality_dtype = pd.api.types.CategoricalDtype(categories=['good', 'very good', 'excellent'], ordered=True) df['quality'] = df.quality.astype(quality_dtype)
@dataschool 5 years ago
Right! The API changed in pandas 0.21. More details here: kzbin.info/www/bejne/qpaYe6WJeLxggrs
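For readers hitting the same error, here is a self-contained version of the newer syntax, assuming pandas 0.21 or later. The small DataFrame is made up to resemble the one in the video; the `quality_dtype` name follows the comment above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'ID': [100, 101, 102, 103],
                   'quality': ['good', 'very good', 'good', 'excellent']})

quality_dtype = CategoricalDtype(categories=['good', 'very good', 'excellent'],
                                 ordered=True)
# Overwrite the column, otherwise the conversion is not kept
df['quality'] = df['quality'].astype(quality_dtype)

# Sorting and comparisons now follow the logical order, not alphabetical order
sorted_quality = df.sort_values('quality')['quality'].tolist()
better_than_good = df.loc[df['quality'] > 'good', 'ID'].tolist()

print(sorted_quality)
print(better_than_good)
```

`pd.Categorical(df.quality, categories=[...], ordered=True)`, suggested elsewhere in these comments, is an alternative that produces the same kind of column.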
@Leonardo-jv1ls 5 years ago
Man. You are insanely good.
@dataschool 5 years ago
Thank you! 😊
@vinayakmaheshwari3697 6 years ago
Can you make a video on how to merge, join and concatenate in Python, and also the differences between these? Nice videos by the way!
@dataschool 6 years ago
Thanks for your suggestion, I'll consider it! :)
@jolespin 6 years ago
Possible new topic: Methods in pandas that are not well known to most users. I've been using pandas for years and didn't know about the `cat`, `str`, and `memory_usage` methods. I'm familiar with `groupby`, `applymap`, `map`, etc. but it would be cool if you could showcase some other methods that are less well known to common users. Thanks
@dataschool 6 years ago
Great suggestion, thanks!
@jolespin 6 years ago
Didn't know about the memory_usage, cat, str, etc. Nice!
@dataschool 6 years ago
Thanks!
@vvasani 7 years ago
You are the best! I'm feeling Lucky that I found your channel at right time in my learning path ...Thanks a lot! I have one question here. could you please help understanding general idea behind using 'categories' in astype method since it is not a pre-defined parameter in method documentation (if we click shift+ tab :) )? I mean what all parameters we can use in place of kwargs in an instancemethod just like we used 'categories' here? (All properties/attributes of an object?)
@dataschool 7 years ago
Glad you like the videos! Please consider subscribing to the Data School mailing list: www.dataschool.io/subscribe/ Regarding your question, I don't know how to explain the technical details behind why you can pass the argument 'categories' in this case, other than to say that it's because the pandas code has been written to allow that argument. I'm sorry if that's not what you were looking for!
@mmimpositive 5 years ago
How do I make the output appear in tabular form, as shown in your video? It makes the data much clearer.
@dataschool 5 years ago
The way the output looks is determined by your editor. I'm using the Jupyter notebook, though note that the output varies even across different versions of the notebook.
@jaikishank 4 years ago
Great explanation. Thank you.
@dataschool 4 years ago
You are welcome!
@mdzahidulislam6857 7 years ago
I am glad that I came across your videos. They are really helpful for me. However, can we use categorical and numeric features for building decision trees in sklearn? I am getting the following error: ValueError: could not convert string to float: 'Zimbabwe' Thank you very much for your help.
@dataschool 7 years ago
You can use categorical features with any scikit-learn model, however you will need to transform them to numeric values. Here are two videos that may help you: kzbin.info/www/bejne/ZqTCYnyph7SaesU kzbin.info/www/bejne/r521nXp5qaann6c
@mdzahidulislam6857 7 years ago
Thanks a lot! These are awesome.
@dataschool 7 years ago
You're very welcome! Glad they were helpful to you :)
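As a rough illustration of "transform them to numeric values", here is one common option, `pd.get_dummies`, on a made-up DataFrame. This is only a sketch of the idea and is not necessarily the approach taken in the linked videos.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Zimbabwe', 'Albania', 'Zimbabwe'],
                   'beer_servings': [64, 89, 64]})

# One indicator column per country; the numeric column passes through unchanged
features = pd.get_dummies(df, columns=['country'])

print(features.columns.tolist())
```

The resulting frame contains no strings, which is what avoids the "could not convert string to float" error quoted above.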
@uguree 4 years ago
custom ordered category is now a bit different: from pandas.api.types import CategoricalDtype cat_type = CategoricalDtype(categories=['good', 'very good', 'excellent'], ordered=True) df.quality.astype(cat_type)
@hsrayyar 3 years ago
thanks! It works!
@jaikapoor3666 4 years ago
Why does .info() have parentheses? Isn't it an attribute of the DataFrame?
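A quick sketch of the difference, using a made-up DataFrame: `shape` is an attribute (a stored value), while `info` is a method that has to be called. Because `info()` prints its summary and returns nothing, assigning its result gives `None`, which is probably also behind the NoneType question near the end of this thread.

```python
import pandas as pd

df = pd.DataFrame({'quality': ['good', 'excellent']})

# shape is an attribute: read it without parentheses
print(df.shape)

# info is a method: it does its work when called, so it needs parentheses.
# It prints a summary and returns None.
result = df.info()
print(result is None)
```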
@nelsonmacy1010 3 years ago
Brilliant video! Thx. The bonus was awesome.
@dataschool 3 years ago
Glad you enjoyed it!
@safeeqahmed3306 6 years ago
Great video. I have a question. Suppose I have a dataset about computers, with a column for the number of antivirus programs installed on a computer. I have 100 observations in total but only 3 unique values for this column (1, 2 and 3). Should I consider this column numeric or categorical?
@dataschool 6 years ago
It depends - what are you trying to predict?
@safeeqahmed3306 6 years ago
Data School I am predicting whether a particular machine will be attacked by malware soon, based on its configuration and a number of other parameters, including the number of antivirus programs installed
@dataschool 6 years ago
You would consider the column numeric.
@safeeqahmed3306 6 years ago
Data School thanks a lot. May I know the reason please? And why does it depend on what is being predicted?
@spacedustpi 5 years ago
Thanks. Very useful. Why do you prefer df.loc[df.quality > 'good', :] over df[df.quality > 'good']?
@dataschool 5 years ago
Either is fine. The first is more explicit, whereas the second is more readable, so I go back and forth! :)
@priyankrajsharma 5 years ago
awesome tutorial.. you made it so easy
@dataschool 5 years ago
Thanks!
@olabrew 7 years ago
Hi, could you do a lesson on using the pivot function in Pandas? Haven't seen a good example anywhere.
@dataschool 7 years ago
Thanks for the suggestion! Maybe this might be helpful to you? pbpython.com/pandas-pivot-table-explained.html
@olabrew 7 years ago
Thanks! That helps to explain it a bit better. Cheers
@evapatrick3476 5 years ago
Hi there, thanks for your excellent tutorial. I have a question that I have been unable to find an answer to: can you use these columns (ones which have been converted into categories) in analysis, specifically machine learning models? If not, what can one do without having to use the get_dummies option, since I have a column with about 8,000 unique values?
@dataschool 5 years ago
I recommend scikit-learn's OneHotEncoder for this case. No, you can't directly feed a category column to scikit-learn. Hope that helps!
@serdarb8995 6 years ago
You are great Kevin
@dataschool 6 years ago
Thanks! You are great Serdar!
@oliverf2924 3 years ago
Great tutorial, thank you
@dataschool 3 years ago
You are welcome!
@rephechaun 6 years ago
Hi Kevin, does this mean we can throw this category-converted variable into a machine learning model like Logistic Regression in sklearn or statsmodels?
@dataschool 6 years ago
No, that's not how it works, sorry!
@nishitsethi9405 8 years ago
Thanks for the very informative video. I have one question. How do we convert multiple columns to 'category' data type at once? In my data set, I have 25 categorical columns and 6 integer columns. So is there an efficient way of converting these 25 columns to categorical while importing the data set or after importing? Thanks.
@dataschool 8 years ago
Great question! There might be an easy way to do this, perhaps with the apply function, but I'm not sure at the moment. Let me know if you figured out an efficient method!
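One possible approach, sketched on a made-up DataFrame: `DataFrame.astype` accepts a dictionary mapping column names to types, so several columns can be converted in one call. The `cols` list is a hypothetical stand-in for the 25 column names in the question. When reading from a file, the same kind of dictionary can be passed to `read_csv` through its `dtype` parameter (pandas 0.19 or later).

```python
import pandas as pd

df = pd.DataFrame({'continent': ['Europe', 'Africa', 'Europe'],
                   'quality': ['good', 'good', 'excellent'],
                   'beer_servings': [89, 25, 245]})

cols = ['continent', 'quality']  # hypothetical names of the columns to convert

# Build {column: 'category'} and convert all listed columns in one call
df = df.astype({col: 'category' for col in cols})

print(df.dtypes)
```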
@FabioRBelotto 2 years ago
I usually have to work with very large data samples, even for simple analysis. The main issue I face is that pandas takes more time to read and store the DataFrame than to work on it. Sadly, it is quicker and easier to just run some extractions using SQL, since it runs on the database server, than to import the data to my local machine.
@AlonsoParejawee 4 years ago
Thank you! Is it possible to create multiple dataframes based on the categories I have in my dataset?
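One way to do this, sketched with made-up data: iterating over a `groupby` yields one sub-DataFrame per distinct value, and a dictionary comprehension keeps them addressable by name. The `frames` name is just for illustration.

```python
import pandas as pd

drinks = pd.DataFrame({'country': ['Albania', 'Algeria', 'Andorra', 'Angola'],
                       'continent': ['Europe', 'Africa', 'Europe', 'Africa']})

# groupby yields (key, sub-DataFrame) pairs, one per distinct continent
frames = {name: group for name, group in drinks.groupby('continent')}

print(frames['Europe'])
```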
@richardanderson8377 8 years ago
My question is about using categorical variables to build a logistic regression model using statsmodels. I had some 0-1 integer variables that I wanted to use as some of the predictor variables to build a logistic regression model, but converted them to categorical thinking this would avoid being treated as numerical. However, I got a ValueError: unrecognized data structures: / . Do you understand why? I can take this to a different forum if that would be better..
@dataschool 8 years ago
My video coming out on July 12 will answer that question! I'll let you know when it's posted.
@dataschool 8 years ago
Check out my latest video, and see if it answers your question: kzbin.info/www/bejne/ZqTCYnyph7SaesU Hope that helps!
@richardanderson8377 8 years ago
Nice video. My question goes a bit further. Suppose you wanted to use your k-1 dummy variables in a statsmodels or scikit-learn logistic regression. Would you leave them as type integer or convert them to type categorical?
@dataschool 8 years ago
You would leave them as type integer. Good luck!
@sunoreal 5 years ago
There are no 'categories' or 'ordered' parameters in the astype() method. I use pandas version 0.25.1. So, how do I set the category order in this version? Oh, you did explain it in your message. Thank you.
@dataschool 5 years ago
This should help: nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/pandas_changes.ipynb
@jmjdotorg 4 years ago
Why did the sort_values() method not work in line 9, so that you used sorted() instead?
@ganeshs8522 6 years ago
Hi, thanks for the nice videos! df[df.quality > 'good'] also works. Is there any reason you use df.loc[df.quality > 'good'] in the last part of this video? Under what conditions do you use df[condition] vs df.loc[condition]?
@dataschool 6 years ago
In this case, I use loc to be more explicit. In general, I use loc whenever its flexibility is required.
@itsme.samrat 4 years ago
loved this part
@dataschool 4 years ago
Thanks!
@amish1502 3 years ago
The tutorials are super nice and helpful, but I just got a slight problem that the 'categories' and 'ordered' arguments are not working in python 3.9 and pandas version 1.2.2
@dataschool 3 years ago
See here: kzbin.info/www/bejne/qpaYe6WJeLxggrs
@biswajitpatowary5784 7 years ago
That's too good. Can you please come up with tutorial videos on Matplotlib?
@dataschool 7 years ago
Thanks for the suggestion! :)
@asiftandel8750 4 years ago
Great Video Sir
@dataschool 4 years ago
Thanks!
@annelizabeth728 7 years ago
Thanks for another fantastic video! I tried the tip at the end, and got a warning message: "FutureWarning: specifying 'categories' or 'ordered' in .astype() is deprecated; pass a CategoricalDtype instead." I checked the pandas documentation and substituted CategoricalDtype, e.g. "cat_type = CategoricalDtype(categories=["good", "very good", "excellent"], ordered=True) [newline] df['quality'].astype(cat_type)" but that didn't really work the way I was expecting either. Is there a newer way of accomplishing this?
@dataschool 7 years ago
Thanks for your kind words! Regarding your question, you are correct that this has changed in the latest versions of pandas. However, your proposed code looks exactly correct to me. What exactly are you expecting that you are not seeing? Just to be clear, you do need to overwrite the existing 'quality' column if you want there to be a permanent change: df['quality'] = df['quality'].astype(cat_type)
@dataschool 7 years ago
I discuss the new syntax for specifying categories in my latest video, "5 new changes in pandas you need to know about": kzbin.info/www/bejne/qpaYe6WJeLxggrs Hope that helps!
@grijeshmnit 5 years ago
brilliantly explained.
@dataschool 5 years ago
Thank you!
@jdavis38100 8 years ago
Great job Kevin!
@dataschool 8 years ago
Thanks! :)
@danielmayper6548 5 years ago
I've been following along with your examples and they've all been incredible, but I encountered an error I can't seem to get around on this one. At about 16:45, the command df['quality'] = df.quality.astype('category', categories=['good','very good','excellent'], ordered=True) is given, and whenever I try to submit that line to the compiler I get the error ValueError: Got an unexpected argument: categories Was there an update to pandas that may have changed this function, or is there some kind of error I'm not aware I'm making?
@danielmayper6548 5 years ago
I had tried going to your github and copying the line you used from there, but I was getting the same error
@dataschool 5 years ago
The pandas API has changed, please see this video: kzbin.info/www/bejne/qpaYe6WJeLxggrs
@ItsWithinYou 3 years ago
As usual, great lesson. Many thanks!
@dataschool 3 years ago
Thank you!
@experimentalhypothesis1137 6 years ago
these videos are excellent!
@dataschool 6 years ago
Thanks!
@amitghosh425 5 years ago
At 16:44 I get the error message "ValueError: Got an unexpected argument: categories" when running "df['quality'] = df.quality.astype('category', categories=['good', 'very good', 'excellent'], ordered=True)". Please help.
@dataschool 5 years ago
The pandas API has changed. See this video: kzbin.info/www/bejne/qpaYe6WJeLxggrs
@tkannab1 7 years ago
Excellent video!! thank you!
@dataschool 7 years ago
You're very welcome!
@hoegwonkim1727 5 years ago
I should have found your channel earlier! Thanks for sharing a great video.
@dataschool 5 years ago
😄
@LonglongFeng 7 years ago
question: at 5:20, when you coded drinks.memory_usage(deep=True).sum(), it gave '24920L'. What does the 'L' mean after the figure? I think I seemed to see the 'L' thing appears when using the '.shape' function. what does that 'L' mean?
@dataschool 7 years ago
L stands for "long", which I believe refers to the "long integer" type, which is the NumPy data type being used to store that data. In other words, it's an implementation detail that you don't really need to know. Hope that helps!
@kostasnikoloutsos5172 7 years ago
I am wondering if there is any cryptographic system that can convert strings to integers and then decrypt them back. If yes, then why does pandas not implement that in the background to reduce space? Also, if we use astype("category"), does it have any effect when we export the DataFrame to a CSV or Excel file?
@dataschool 7 years ago
Question 1 - I'm not sure. Question 2 - no effect. Hope that helps!
@rvg296 4 years ago
Seems like in the latest pandas 1.1.2 version df['quality'] = df.quality.astype('category',categories=['good','verygood','excellent'],ordered=True) this throws an error saying unexpected categories argument. I guess this should work. df['quality'] = pd.Categorical(df.quality,categories=['good','verygood','excellent'],ordered=True)
@dataschool 4 years ago
Thanks for sharing! Yes, the pandas API for ordered categories has changed since I recorded this video.
@geocarvalhont 7 years ago
Amazing tip, thank you again!
@dataschool 7 years ago
You're very welcome!
@u0000-u2x 8 years ago
Very useful!
@dataschool 8 years ago
Agreed! It's surprising that it's not more widely known! I'm trying to change that :)
@FabioRBelotto 2 years ago
What amount of non-unique values still makes a column worth converting to a category?
@srosell100 4 years ago
Hi, why do you have to put memory_usage='deep' and not only memory_usage?
@dataschool 4 years ago
That's how you specify the parameter
@srosell100 4 years ago
@dataschool Thank you very much!!! I never thought you would answer. And thank you very much in general for your content, you have taught me so much!!!
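To make the two spellings concrete, here is a small sketch on a made-up DataFrame. `DataFrame.memory_usage` is a method that takes `deep=True`, while `DataFrame.info` takes the same request through its `memory_usage='deep'` parameter.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Albania', 'Algeria'], 'beer_servings': [89, 25]})

# Bytes per column, without and with measuring the string contents
shallow = df.memory_usage()
deep = df.memory_usage(deep=True)

# info() accepts the same idea through memory_usage='deep'
df.info(memory_usage='deep')

print(shallow['country'], deep['country'])
```

Numeric columns report the same size either way; only columns holding Python objects such as strings can differ.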
@KhalilYasser 4 years ago
Thank you very much. Amazing tutorial. When trying this line `df['Quality'] = df.Quality.astype('category', categories = ['good', 'very good', 'excellent'], ordered=True)`, I encountered an error `TypeError: astype() got an unexpected keyword argument 'categories'`
@KhalilYasser 4 years ago
I searched and solved it like this: `from pandas.api.types import CategoricalDtype` and then I changed the line to `df['Quality'] = df['Quality'].astype(CategoricalDtype(categories=['good', 'very good', 'excellent'], ordered=True))`
@hariharamoorthythennetipan2190 7 years ago
cool. Very nice examples.
@dataschool 7 years ago
Thanks! Glad it was helpful to you!
@PradeepKumar6 8 years ago
Amazing as always!!! Is it possible to convert this type of data into category while we read the data into Python? Also, there is another data type called datetime. I think it would be great if you could enlighten us about that as well, for the purpose of datetime manipulation in future.
@dataschool 8 years ago
Thanks! Regarding your first question, I haven't figured out a way to do it. Regarding datetimes, I will cover that in an upcoming video :)
@dataschool 8 years ago
My latest video on the datetime format has been released: kzbin.info/www/bejne/r3TKe3qpnJWLl5Y Hope that helps!
@rahulgulati890 8 years ago
Thanks for sharing such great videos. Can you create one video in explaining pivot table in pandas. That would be really helpful. Regards Rahul
@dataschool 8 years ago
You're welcome! And, I will do my best to create one on pivot table. In the meantime, here's a good post on it: pbpython.com/pandas-pivot-table-explained.html
@rahulgulati890 8 years ago
+Data School thank you kevin
@vasanthnayak4086 8 years ago
Hi... Thanks for sharing the greatest series of videos on pandas...!!! Quick question: is there a way to convert a CSV (size more than 2 GB) to a pandas DataFrame on a system where the RAM is 2 GB? I am getting 'memory error' while executing the code. I can't use 'category'; I need the data to be the same as in the CSV. Thanks...!!!
@dataschool 8 years ago
Thanks for your kind words! One strategy is to read in only some of the rows and columns (only the ones you need), demonstrated here: kzbin.info/www/bejne/eF7VaomrgJ1jms0
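A minimal sketch of that strategy, using inline CSV text in place of a large file (the data and column names are made up). `usecols` and `nrows` limit what is loaded; `chunksize`, which is an addition here and not something mentioned in the reply, is another `read_csv` option that processes a file in pieces without holding it all in memory.

```python
import io
import pandas as pd

csv_text = ("country,continent,beer_servings\n"
            "Albania,Europe,89\n"
            "Algeria,Africa,25\n"
            "Andorra,Europe,245\n")

# Load only two of the columns and only the first two rows
subset = pd.read_csv(io.StringIO(csv_text),
                     usecols=['country', 'beer_servings'], nrows=2)

# Or walk through the file in chunks, keeping only a running result
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk['beer_servings'].sum()

print(subset)
print(total)
```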
@TheAlderFalder 6 years ago
This was awesome!
@dataschool 6 years ago
Thanks!
@AbhishekAnand-y2n 1 year ago
Hi! The data file url doesn't seem to be working all of a sudden. Could you look it up please?
@dataschool 1 year ago
You can get the datasets from here if needed: github.com/justmarkham/pandas-videos
@RohanB-xg6vg 3 years ago
Hello, I am currently using pandas version 1.2.2, and I get an error while running this code: df.quality.astype('category', categories=['good','very good','excellent'], ordered=True) It says that astype() got an unexpected keyword argument 'categories'. Did they remove those parameters in newer versions of pandas, since this video is a few years old?
@dataschool 3 years ago
See this video: kzbin.info/www/bejne/qpaYe6WJeLxggrs
@reazshafqat5504 7 years ago
first of all thank you for all of your videos! my question would be: in your case the size of the continent category is 488KB but in my case its 744KB. Can you explain the reason behind this difference?
@dataschool 7 years ago
Glad you like the videos! Regarding your question, it's probably due to the version of pandas or Python.
@aakashkumarnain7592 8 years ago
Hello Kevin!! How can I rename my columns which I changed to categorical data to the original names of the columns?
@dataschool 8 years ago
You can use the DataFrame method 'rename', which I talk about in this video: kzbin.info/www/bejne/ZqalmqWPe82csKc
@Kavyashree40 6 years ago
Hi, your videos are superb. Learnt a lot. Could you please explain pivot and pivot_table to me?
@dataschool 6 years ago
Thanks! I will consider that for future videos.
@bhanu4187 6 years ago
I want to compare two date-and-time columns and produce a new column holding a categorical value: if both columns have the same date and time I need 1, else 0. How can it be done? Please help me.
@dataschool 6 years ago
df['new'] = (df['first'] == df['second']).astype(int)
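A runnable sketch of this comparison, with made-up timestamps and the hypothetical column names 'first' and 'second'. Bracket notation is used because `df.first` refers to a DataFrame method in many pandas versions, so dot notation would not reach a column with that name. The `.astype(int)` step turns True/False into the 1/0 asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'first': pd.to_datetime(['2016-01-01 10:00', '2016-01-02 11:30']),
    'second': pd.to_datetime(['2016-01-01 10:00', '2016-01-02 12:00']),
})

# Element-wise comparison gives booleans; astype(int) maps them to 1 and 0
df['new'] = (df['first'] == df['second']).astype(int)

print(df)
```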
@muhammadfayyaz7134 6 years ago
I would be grateful if you made some tutorials on big data analytics. Thanks.
@dataschool 6 years ago
Thanks for your suggestion!
@muhammadfayyaz7134 6 years ago
Data School I hope we will see a great tutorial series from you about big data soon. 😊
@ishaangupta2223 4 years ago
Hey python shows an error whenever I type categories in astype, saying: astype got an unexpected keyword argument 'categories'. Can you please help.
@anngu3086 4 years ago
the syntax got updated, you better check out the first comment he pinned on top
@ishaangupta2223 4 years ago
Ann Gu Thanks
@rdg8268 6 years ago
I need something like categories for an age range, for example 0-10, 0-20... Is it possible?
@dataschool 6 years ago
Sure!
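One way to get range-based categories is `pd.cut`, sketched below. The ages, bin edges and labels are made up for illustration and use non-overlapping ranges; `right=False` makes each bin include its lower edge and exclude its upper edge.

```python
import pandas as pd

ages = pd.Series([3, 15, 27, 42])

# Bins are [0, 10), [10, 20), [20, 30), [30, 50)
age_range = pd.cut(ages, bins=[0, 10, 20, 30, 50],
                   labels=['0-9', '10-19', '20-29', '30-49'], right=False)

print(age_range)
```

The result is an ordered category column, so sorting and comparisons follow the bin order.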
@Jacob930321 7 years ago
What about cols = ['col1', 'col2']; df[cols] = df[cols].apply(lambda x: x.astype('category'))
@dataschool 7 years ago
That seems like it would work!
@saurabhkhodake 8 years ago
For the bonus tutorial i got error as "_astype() got an unexpected keyword argument 'categories' " Has the definition to astype() changed? Appreciate if someone could help.
@mleiano 8 years ago
I had a similar error, I think what you did is you somehow ran the code without the "ordered = True" bit of the code at first or some such partial code and then tried to run it again with all the arguments as shown in the tutorial above, in that case it does show the error you mentioned. Just run the DataFrame creation command; ie, df = pd.DataFrame(...) again and then run the df.quality.astype(...) code, it should work. It did for me anyways. Let me know how it goes. Can anyone explain why it happens though? I am not sure about that.
@dataschool 8 years ago
What version of pandas are you running?
@KimmoHintikka 8 years ago
Thanks, re-running the df creation worked. My pandas version info from conda: pandas 0.19.2 np112py36_1 ------------------------- file name : pandas-0.19.2-np112py36_1.tar.bz2 name : pandas version : 0.19.2 build string: np112py36_1 build number: 1 channel : defaults size : 8.4 MB arch : x86_64 date : 2017-02-04 license : BSD md5 : 5ce048ed69412b7bec27989c5c963678 noarch : None platform : darwin url : repo.continuum.io/pkgs/free/osx-64/pandas-0.19.2-np112py36_1.tar.bz2 dependencies: numpy 1.12* python 3.6* python-dateutil pytz
@vinodkumar-ro7rc 8 years ago
Excellent Article
@dataschool 8 years ago
Thanks!
@lonewolf2547 6 years ago
For my dataset it reduced the size by approximately 50%. What I wanted to ask is: if it has to do a lookup each time, does this increase the time complexity?
@dataschool 6 years ago
No, the lookup shouldn't take a meaningful amount of time.
@niteshsawant2716 4 years ago
How to autoupdate the ID column
@vanmemet 7 years ago
Thanks for your great videos, I am really enjoying watching them and learning a lot. But most of these concepts are already addressed in the SQL world. I think when you teach in the videos, you could relate these subjects to their SQL equivalents. IMHO.
@dataschool 7 years ago
SQL and pandas can indeed accomplish many of the same tasks. For SQL users, you are right that SQL comparisons might be helpful. You might like resource #5 here: www.dataschool.io/best-python-pandas-resources/
@rikicade2012 3 years ago
df['Quality'] = df.Quality.astype("category", categories=["good", "very good", "excellent"],ordered=True) any idea when I run this I get this
@dataschool 3 years ago
See this video for details: kzbin.info/www/bejne/qpaYe6WJeLxggrs
@kostasnikoloutsos5172 7 years ago
You used a parameter called categories. This is not in the parameters of the astype method. I think it's in **kwargs. In the docs I found this: kwargs : keyword arguments to pass on to the constructor. Where is the constructor? I cannot understand this.
@dataschool 7 years ago
Sorry, I don't know how to answer your question!
@Om-iy9ix 6 years ago
Hi there, great videos. When we wrote drinks.continent.cat.codes.head() we got 1 2 0 2 0, and when I ran drinks.head() after that, it displayed Asia, Europe and so on instead of just numbers pointing to a lookup table containing strings. Then I ran drinks.memory_usage(deep=True), which showed a reduced size for continent. How does this work? On one side the change is not reflected in the DataFrame, and on the other side the size is reduced. Hope you can help me out soon. Thanks a lot for your amazing videos. Please make more videos on Data Science and ML topics.
@dataschool 6 years ago
Great question! The integers are the internal encodings for those categories, and the size is reduced due to those encodings. Does that help? You might like this video series: kzbin.info/aero/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
@TrevorHigbee 5 years ago
Looks like Pandas ordered categories syntax has changed. Should now be: from pandas.api.types import CategoricalDtype df['quality'] = df.quality.astype(CategoricalDtype(categories=['good', 'very good', 'excellent'], ordered=True))
@pdileepan 5 years ago
What worked for me is: df['quality'] = pd.Categorical(df.quality, categories=['good', 'very good','excellent'], ordered=True)
@mehmetbugu 5 years ago
@pdileepan thanks
@dataschool 5 years ago
Thanks for sharing! I have more details here: kzbin.info/www/bejne/qpaYe6WJeLxggrs
@patrickmckowen1154 5 years ago
Over 9000!!!!!
@dataschool 5 years ago
😄
@JuanManuelBerros 8 years ago
before the conversion it was OVER 9000 !!!! @10:21
@dataschool 8 years ago
Pretty cool, right? :)
@alancheriyan9938 6 years ago
I was looking through the comments for this comment! xD
@aleemkhusro3268 7 years ago
does anyone know why I get NoneType when I do df.info()? Thanks in advance.
@dataschool 7 years ago
Are you sure that the 'df' object is a pandas DataFrame?