Very easy to follow and understand, in contrast with many other tutorials I found. Great, and many thanks!
@dataschool8 жыл бұрын
Great to hear! Thanks for your kind words!
@jwilliams8210 Жыл бұрын
Amazingly clear explanation!!! Thank you!!
@dataschool11 ай бұрын
You're very welcome!
@souraneelmandal79122 жыл бұрын
How do you create that structured table style in Jupyter?
@brendensong80003 жыл бұрын
Wow!!! This is my first time watching you teach, and it's crystal clear!!! Looking forward to more videos!
@dataschool3 жыл бұрын
Awesome! Thank you!
@TheNikhileshYadav5 жыл бұрын
Hello Kevin, for a multi-label categorical field with more than 600 entries, can the same dummy-variable strategy be followed? If not, please suggest ways to convert it to numeric form. Thank you.
@dataschool5 жыл бұрын
You can use the same strategy, though I would recommend using OneHotEncoder from scikit-learn. Hope that helps!
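A minimal sketch of what this reply suggests, assuming a small made-up `city` column standing in for a field with many categories (the data and column name are hypothetical, not from the video):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for a categorical column with many distinct values.
df = pd.DataFrame({'city': ['Paris', 'Lima', 'Oslo', 'Lima']})

# handle_unknown='ignore' encodes categories unseen during fit as all zeros
# instead of raising an error.
encoder = OneHotEncoder(handle_unknown='ignore')

# The result is a sparse matrix by default, which keeps memory use low
# when there are hundreds of categories.
encoded = encoder.fit_transform(df[['city']])

print(encoded.shape)        # one row per sample, one column per category
print(encoder.categories_)  # the categories learned for each input column
```

The sparse output is the main practical difference from `get_dummies` when the number of categories is large.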
@Raaajzzz Жыл бұрын
Thank you for illustrating it so well. I was not clear on the reasoning behind dropping the first column when using the dummies, but now I have a clear idea about that.
@dataschool Жыл бұрын
Glad I could be helpful!
@sandhya68184 жыл бұрын
That bonus is awesome... Thankyou so much... You explained it so well....
@dataschool4 жыл бұрын
My pleasure!
@tronalddump24442 жыл бұрын
Thanks bro. You are my hero ❤
@dataschool2 жыл бұрын
Thank you!
@crigar0014 жыл бұрын
Amazing, bro. Do you have material in Spanish?
@ramleo14615 жыл бұрын
Hi Kevin, in relation to the bonus question, do I need to assign the results of get_dummies to a variable in order to make the changes permanent?
@dataschool5 жыл бұрын
Yes you do!
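A small sketch of why the assignment is needed, assuming a two-row stand-in for the Titanic `train` DataFrame (the rows are made up; `dtype=int` is passed only so the output looks the same across pandas versions):

```python
import pandas as pd

# Hypothetical two-row stand-in for the Titanic training data.
train = pd.DataFrame({'Sex': ['male', 'female'], 'Embarked': ['S', 'C']})

# get_dummies returns a new DataFrame; calling it without assignment
# leaves `train` unchanged.
pd.get_dummies(train, columns=['Sex', 'Embarked'], drop_first=True, dtype=int)
original_columns = train.columns.tolist()
print(original_columns)

# Assigning the result back is what makes the change stick.
train = pd.get_dummies(train, columns=['Sex', 'Embarked'], drop_first=True, dtype=int)
print(train.columns.tolist())
```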
@luisportillo34913 жыл бұрын
Dude, you're amazing! new follower here!
@dataschool3 жыл бұрын
Thanks!
@alainleclerc4523 Жыл бұрын
you are a wonderful teacher!! thank you very much!!
@dataschool Жыл бұрын
Thanks so much!
@Negr0ni3 жыл бұрын
Your videos are making me passionate about a data science career again, and they are also making my first job in data analytics so much easier. Thank you so much!
@dataschool3 жыл бұрын
You're welcome!
@haciendadad5 жыл бұрын
I really like that he explains the extra attributes and the things that people gloss over. For example, the : and axis. I'm a newbie, so that little stuff was useful to me.
@dataschool4 жыл бұрын
Great to hear!
@rohitjacob88906 жыл бұрын
Hello Kevin. I am a big fan of your work. Being a big user of R, your tutorials have made me like Python so much that I have completely switched to Python at work now. It would be very helpful if you did a video series on each of the other basic Python packages, like NumPy, Matplotlib, seaborn, statsmodels, and Bokeh. Learning from your videos is so much easier and less time-consuming. Currently I am working on my internship during my course, and I use at least one of your tips daily at work. Thanks again. Hoping to see more good content like this. Cheers!
@dataschool6 жыл бұрын
That's awesome to hear! Thanks for your kind comments and suggestions! I will do my best :)
@nadineprins16474 жыл бұрын
This was so useful! i didn't know your channel before I googled how to make dummies in pandas. Definitely going to check out your other videos :)
@fet16124 жыл бұрын
2:05 The Series map method:
    train['Sex_male'] = train.Sex.map({'female': 0, 'male': 1})
    train.head(2)
Dummy encoded with map({'female': 0, 'male': 1}): female ==> 0, male ==> 1
@fet16124 жыл бұрын
7:10
    train.Embarked.value_counts()
    S    644
    C    168
    Q     77
    Name: Embarked, dtype: int64
The embarkation points of the RMS Titanic in April 1912 were: (1) Southampton, England, (2) Cherbourg, France, and finally (3) Queenstown, Ireland.
@fet16124 жыл бұрын
3:55 Try the following piece of code:
    train.columns
    Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked', 'Sex_male'], dtype='object')
@AhmedKhaliet2 жыл бұрын
Thank you 💞 it's really great ❣️
@dataschool2 жыл бұрын
You're welcome!
@RachelBb-k6q8 күн бұрын
Hey Kevin, regarding dummy variables, what technique can I apply to model input data if I am foreseeing high sales performance in the future, pent-up demand, etc.? Would you still add 0 and 1 to flag those dates?
@aytachuseyn38103 жыл бұрын
I have a question. 🙋♂️ There are two variables and dozens of observations on the set that we converted to dummy variables. If we delete one of the dummy variables and then delete the original variable, how does the train time machine understand which one belongs to which one? E.g; Sex_Female and Remarked_C have been deleted. Then came the new variable for prediction: Sex_Male: 0, Remarked_Q: 0 Remarked_Q: 0, 1, 0. Is it Sex_Female 0 or is it Remarked_C 0? How does the machine know which variable is Sex_Female and which is Remarked_C? (No ordering because real variables have been deleted) P.s. If you do not understand the question, I m sorry for my bad English.
@sarmigarmi4 жыл бұрын
Awesome! I have a question. Why are we dropping the first column? As in, for example, Embarked_C?
@dikshyasurvi68693 жыл бұрын
This was useful. How do you create dummies for specific ranges ? For instance, 10-50% 1 group, 50-70% - group 2, etc.
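One possible approach to this question, sketched under assumptions: bin the numeric values with `pd.cut` first, then create dummies from the bins. The `score` values, bin edges, and group labels below are all made up for illustration.

```python
import pandas as pd

# Hypothetical percentage values.
scores = pd.Series([12, 55, 68, 90], name='score')

# Illustrative bin edges; pd.cut intervals are right-closed by default,
# so these bins are (10, 50], (50, 70], and (70, 100].
groups = pd.cut(scores, bins=[10, 50, 70, 100],
                labels=['group1', 'group2', 'group3'])

# One dummy column per bin.
dummies = pd.get_dummies(groups, prefix='score', dtype=int)
print(dummies)
```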
@andreacazzaniga84884 жыл бұрын
very good especially the last trick !
@dataschool4 жыл бұрын
Thank you!
@Malachiasz19834 жыл бұрын
Great video. It's a shame that KZbin algorithm will probably demonetize it due to "sexual content" :(
@Jinsh05 жыл бұрын
SUPER VIDEO!! Very Useful!
@dataschool5 жыл бұрын
Thanks!
@ROT4C3 жыл бұрын
Suppose I have multiple columns of dummy variables and I simply want a sum of the variables across those columns, how do I do that?
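A short sketch of one way to do this, assuming hypothetical dummy columns named `A_x`, `A_y`, and `B_z`: select the columns and call `sum` with the axis you want.

```python
import pandas as pd

# Hypothetical DataFrame that already contains dummy columns.
df = pd.DataFrame({'A_x': [1, 0, 1], 'A_y': [0, 0, 1], 'B_z': [1, 0, 0]})
dummy_cols = ['A_x', 'A_y', 'B_z']

# axis=1 sums across the columns, giving one total per row.
df['total'] = df[dummy_cols].sum(axis=1)

# axis=0 sums down the rows, giving one total per dummy column.
column_totals = df[dummy_cols].sum(axis=0)

print(df)
print(column_totals)
```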
@ashokgahatraj12102 жыл бұрын
It is crystal clear , thanks man❤️
@dataschool2 жыл бұрын
You're very welcome!
@ashwinsingh13255 жыл бұрын
These are great tutorials! Finally found a clear, concise explanation for why your code is written the way it is :)
@dataschool4 жыл бұрын
Thank you!
@LS-rw3hn5 жыл бұрын
Dude seriously, you just saved me a lot of work.
@dataschool5 жыл бұрын
Awesome, that's great to hear!
@MrKingoverall5 жыл бұрын
THAAAANKKK YOUUUUUU !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! I love you man !!!!!!!
@dataschool5 жыл бұрын
Ha! You are very welcome! 😍
@VarunKumar-pz5si4 жыл бұрын
Living Legend Kudos...!!!!!!
@dataschool3 жыл бұрын
Thanks!
@JerryBlane5 жыл бұрын
Hi, just wanted to say I love your videos. Can you please do a video on join(), concat(), and merge()?
@dataschool5 жыл бұрын
Thanks for your suggestion! See here for concat: kzbin.info/www/bejne/Z2bUXpypbbWSfpY
@watheusbr2 жыл бұрын
so helpful, thanks a lot!
@dataschool2 жыл бұрын
Great to hear!
@rhettsmedia4 жыл бұрын
How’s the public repository moved
@lumosyang6 жыл бұрын
OMG you just saved my ass, thank you!!! and love you!!! will follow up and watch thru all your data videos.
@dataschool6 жыл бұрын
You're very welcome! :)
@vijayanandhan46495 жыл бұрын
Great tutorial about how to deal with categorical variables using dummies. The last bonus tip helped with my assignment.
@dataschool5 жыл бұрын
Great to hear!
@robertue13 жыл бұрын
Thank you so much for this video, really well and easily explained!
@dataschool3 жыл бұрын
Thank you!
@angsumandas13 жыл бұрын
My dox. Its 4 year old I am seeing it now
@alensadventures20804 жыл бұрын
Hey I'm new to Python and I just wanted to say that your videos are super clear and easy to understand! This has been a great help for me! Teaching code is clearly your calling
@dataschool4 жыл бұрын
Thanks very much for your kind words! I really appreciate it 🙏
@D4nte-RN10 ай бұрын
As usual, I was trying to understand an ML concept that was not clear to me. I always go the same way: click, click, click between videos from YouTubers, most of whom make videos from the same source, without thinking, without understanding. And then finally, once again, I'm on your channel and you explain everything clearly and slowly. Thanks for your amazing job!
@dataschool9 ай бұрын
Thanks so much for your kind words!
@ajaykushwaha-je6mw3 жыл бұрын
Best tutorial video on Dummy variable.
@jasonwong83154 жыл бұрын
awesome!! excellent!!!
@dataschool4 жыл бұрын
Thank you!
@dembobademboba692411 ай бұрын
Very helpful and very interesting....keep up the good work always bro....
@dataschool9 ай бұрын
Thank you!
@arjunpukale33105 жыл бұрын
Should we apply feature scaling to categorical columns?
@dataschool4 жыл бұрын
I'm not sure there is a definitive answer to this, sorry!
@manasa410878 жыл бұрын
I am addicted to your videos ...I want to re do my old assignments with all the tricks :)
@dataschool8 жыл бұрын
Ha! Great to hear :)
@vipul53404 жыл бұрын
Can we assign numbers from 0,1,2,3... to a categorical variable rather than making n-1 extra columns?
@dataschool4 жыл бұрын
Yes, but you should generally only do that if the categories have a natural ordering.
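A sketch of the ordered case mentioned in this reply, assuming a made-up `sizes` column whose categories have a natural order (small < medium < large):

```python
import pandas as pd

# Hypothetical categorical data with a natural ordering.
sizes = pd.Series(['small', 'large', 'medium', 'small'])

# Declaring the category order explicitly makes the integer codes follow it:
# small=0, medium=1, large=2.
ordered = pd.Categorical(sizes, categories=['small', 'medium', 'large'], ordered=True)
codes = pd.Series(ordered.codes)

print(codes.tolist())
```

For unordered categories such as Embarked, integer codes like these would imply an ordering that does not exist, which is why dummy columns are usually preferred there.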
@Thelaunius4 жыл бұрын
Hi. I understand we only need k-1 dummy variables because we can infer the last variable from the rest, but how would that affect certain classifiers like rule-based ones for example? If they don't have that last variable they cannot create rules like "IF Vk = 1 THEN class = 0". I am thinking that they might not be able to infer it because they only use what columns they have.
@dataschool4 жыл бұрын
I'm not sure how to answer that question, I'm sorry!
@sandy0111875 жыл бұрын
Thank you. i was searching for what is drop_first=True. And i found this video. The bonus tip which you had explained cleared this doubt. Please make more videos like this, on interesting tricks and tips on python, machine learning and data science.
@dataschool5 жыл бұрын
You are in luck, because I'm working on a video of my top 25 pandas tricks right now!! Stay tuned...
@flutterflowhack2 жыл бұрын
Easy to understand, straight to the point thank you for your tutorials they have been of great help
@dataschool2 жыл бұрын
You're welcome!
@8eck3 жыл бұрын
Very clear and very helpful, thank you very much. But I still don't understand why we need to remove a column after making dummy columns. Is it like with training and test data?
@dataschool3 жыл бұрын
Whether or not you need to depends on the circumstances. See this video for more: kzbin.info/www/bejne/hIrXqKysrtt3e80
@haciendadad5 жыл бұрын
Wow, rarely do you see such a high rating. usually about 10 - 20% vote down. Good ones are like 5%, this guy has less than 1%. Gotta subscribe to him if he is that good! I loved the first video, cant wait to see more.
@dataschool4 жыл бұрын
Thank you so much!
@fet16124 жыл бұрын
6:45 Break the bottom piece of code into several segments and contemplate the output. Ask yourself why it happens and when it doesn't. Then following along will start making more sense. Remember, a good data scientist is always thinking and he is always LEARNING.
    pd.get_dummies(train.Sex)
    pd.get_dummies(train.Sex, prefix='Sex')
    pd.get_dummies(train.Sex, prefix='Sex').iloc[:,:1]
@Ihsan_almohsin3 жыл бұрын
you are simply awesome
@dataschool3 жыл бұрын
Thank you!
@prathikasundaramurthy5 жыл бұрын
Hey, what if I have a larger amount of categorical data? E.g.: 15000 unique values of that feature
@dataschool5 жыл бұрын
It depends, but you probably wouldn't use dummy variables. There's not a simple answer, sorry!
@marklittlewood24187 жыл бұрын
If you can create a video or series on Tensorflow that is not esoteric then I would be more impressed than I already am with your video tut's, many thanks
@dataschool7 жыл бұрын
Thanks for your suggestion!
@fet16124 жыл бұрын
3:50 Dummy variables, an alternative method:
    pd.get_dummies(train.Sex)
This is a top-level function, meaning you have to write pandas. (or pd.) before it, such as: pandas.get_dummies()
@twafsimon1033 жыл бұрын
If we want all three values of the Embarked feature in a single column, meaning values 0, 1, and 2 for the individual categories, how could we do it?
@dataschool3 жыл бұрын
You could use the map method, see this video for an example: kzbin.info/www/bejne/hpDUYaehjtapic0
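A minimal sketch of the map approach from this reply, assuming a four-row stand-in for `train`; the particular numbers assigned to C, Q, and S are an arbitrary choice for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic training data.
train = pd.DataFrame({'Embarked': ['S', 'C', 'Q', 'S']})

# Map each category to an integer; values missing from the dictionary become NaN.
train['Embarked_num'] = train.Embarked.map({'C': 0, 'Q': 1, 'S': 2})

print(train)
```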
@sumitbali91945 жыл бұрын
Can't thank you enough for the BONUS tip!!!! Impressed!!!
@dataschool5 жыл бұрын
You're very welcome! :)
@sushichanel72996 жыл бұрын
We'd like to know more about tensorflow and machine learning. Thanks so much for great videos.
@dataschool6 жыл бұрын
Thanks for your suggestion!
@sushichanel72996 жыл бұрын
Thanks so much Sir.
@samc24816 жыл бұрын
yeah, Thanks kevin, but tensorflow tutorial would be booommmm, please try it, Thanks
@dataschool6 жыл бұрын
I appreciate the suggestion!
@vinayaknaik5404 жыл бұрын
Hi, I wanted to know which one you prefer: OneHotEncoder from sklearn or the get_dummies function from pandas. What are the pros and cons of both methods?
@dataschool4 жыл бұрын
I now recommend OneHotEncoder from scikit-learn if your goal is to prepare your dataset for Machine Learning. I have a whole video explaining exactly how to do this: kzbin.info/www/bejne/n6OrmXeDl9xmrtE
@amitdarak7 жыл бұрын
Why does pd.get_dummies work with iloc and not loc?
@dataschool7 жыл бұрын
It will work with either, but I use iloc because it allows me to always use the same code since I'm referencing columns by position. If you use loc, you have to reference columns by name, but the names will change every time. More information is here: kzbin.info/www/bejne/rqfTf3Rtl6hrmdU
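To illustrate the point in this reply, a short sketch assuming a three-row stand-in for `train`: both selections drop the first dummy column, but the `loc` version hard-codes column names that depend on the data.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic training data.
train = pd.DataFrame({'Embarked': ['S', 'C', 'Q']})
dummies = pd.get_dummies(train.Embarked, prefix='Embarked', dtype=int)

# Position-based: the same code works no matter what the columns are called.
by_position = dummies.iloc[:, 1:]

# Label-based: works too, but the labels must match this particular feature.
# Note that loc slices include the end label.
by_label = dummies.loc[:, 'Embarked_Q':'Embarked_S']

print(by_position.equals(by_label))
```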
@carlosdiaz34284 жыл бұрын
Hi Kevin, How could I apply this to numeric variables? For example, if the ticket fare is in [0, 2000) have a 0 and if it is in [2000, inf) have a 1 Thanks!
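One way this could be done, sketched with made-up fares and a hypothetical output column named `Fare_high`: compare against the threshold and convert the boolean result to integers.

```python
import pandas as pd

# Hypothetical fares chosen to sit on both sides of the 2000 threshold.
train = pd.DataFrame({'Fare': [7.25, 1999.99, 2000.0, 2500.0]})

# The comparison yields True/False; astype(int) turns that into 1/0.
# Fares in [0, 2000) become 0 and fares in [2000, inf) become 1.
train['Fare_high'] = (train.Fare >= 2000).astype(int)

print(train)
```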
@bharath-cm2bt5 жыл бұрын
thank u......
@dataschool5 жыл бұрын
You're welcome!
@dipakraut60585 жыл бұрын
Great Explanation, Just Amazing.
@dataschool4 жыл бұрын
Thank you!
@rashayahya4 жыл бұрын
Can you please explain the difference between join, concat, and append... .thanks
@dataschool4 жыл бұрын
I just released a video on that topic! See here: kzbin.info/www/bejne/n4q6fJmLhNl6l9k
@sourovroy79515 жыл бұрын
Great!
@dataschool5 жыл бұрын
Thanks!
@alal-zj4zb5 жыл бұрын
Very nice video and great explanatio. Keep it up 👏👏
@dataschool5 жыл бұрын
Thank you!
@kostasnikoloutsos51727 жыл бұрын
I cannot understand why dummy variables are useful. At first I thought they were something similar to the category type we learned about earlier, but by the end of this video I still did not know when and why I need dummy variables!
@dataschool7 жыл бұрын
I cover dummy encoding in this lesson: github.com/justmarkham/DAT8/blob/master/notebooks/10_linear_regression.ipynb Hope that helps!
@muslumyildiz56943 жыл бұрын
Thank you so much. You are a really wonderful great instructor..
@dataschool3 жыл бұрын
Thank you so much!
@seansantiagox3 жыл бұрын
Thanks for showing how to add this to the dataframe, very helpful!
@dataschool3 жыл бұрын
Glad it was helpful!
@kuldipchauhan5246 жыл бұрын
Your videos are awesome. I get back to your videos whenever I get stuck anywhere. Not only do I get solutions, I get a bonus, which is always for real.
@dataschool6 жыл бұрын
Thanks for your kind words! Glad I can be helpful :)
@sabinadhikari26433 жыл бұрын
Which encoder should we use If the column has more than 100 categorical values?
@dataschool3 жыл бұрын
That's a complex question, but you can always try one-hot encoding or ordinal encoding, regardless of the number of levels.
@jaikishank4 жыл бұрын
Great video and simple explanation . Thank you. One clarification if we need to feed the columns to the data frame for modelling hope we should not use drop=True (since the variable will be lost) or am i assuming wrong???
@alimahmood41584 жыл бұрын
Hi there, bro. I have 24 different categories, so how many columns should I drop in that case?
@dataschool4 жыл бұрын
Sorry, I'm not sure I understand?
@wuminminnie3 жыл бұрын
This is awesome, thank you so much
@misslindiwelive2 жыл бұрын
Once again, my fighter!
@ankitgupta66974 жыл бұрын
Sir, I want to know: what does the get_dummies() function do, and why is it needed?
@dataschool4 жыл бұрын
That's what the video covers! Hope it's helpful to you.
@borntolose_livetowin6 жыл бұрын
Let's imagine I have NaN in my Embarked column. Regardless of my replacement value, I would have 4 new columns. How many columns (or which ones) would I have to remove?
@dataschool6 жыл бұрын
I think I understand your question... you can still define any of those columns as the baseline level and remove it. Hope that helps!
@borntolose_livetowin6 жыл бұрын
Ahhh, ok, I see ... a NaN-value is more or less nothing else than another category... just checked the documentation, NaN will be handled by the get_dummies-function by default as baseline :-) thanks!
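A small sketch of the NaN behaviour discussed here, assuming a three-row Embarked series with one missing value: by default a missing value gets no column of its own and its row is all zeros, while `dummy_na=True` adds an extra indicator column.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value.
embarked = pd.Series(['S', np.nan, 'C'], name='Embarked')

# Default (dummy_na=False): no column for NaN, so that row is all zeros.
default = pd.get_dummies(embarked, prefix='Embarked', dtype=int)

# dummy_na=True: an extra indicator column flags the missing values.
with_na = pd.get_dummies(embarked, prefix='Embarked', dummy_na=True, dtype=int)

print(default)
print(with_na)
```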
@22MJangel5 жыл бұрын
Detailed and systematic= easy to follow..
@dataschool5 жыл бұрын
Thanks!
@eric33726 жыл бұрын
This was an exceptional video! Thank you so much! Sincerely!
@dataschool6 жыл бұрын
You're very welcome! Glad it was helpful!
@jourdango26155 жыл бұрын
Hi, I understand how dummy variables work, but why would we want to drop the first dummy variable column? If i were someone looking at the dataframe, i'm going to end up thinking that 'male' or 'not male' are the categorical values for Sex, and i'm going to think embarked only has 'Q', 'S', and 'Not Q and Not S', i'm not going to know that the other Embarked Value is 'C'. Isn't this dropping readability and data? how does this help????
@juliangermek48435 жыл бұрын
As someone who didn't come too far in data science yet, I'd say: You're right, by dropping these columns you forget what this first category was. We don't create this dummy dataset for humans to read, however, but for computers and their algorithms. They don't know what these letters mean anyway (neither do I, as a matter of fact); for them it is just important to be able to distinguish between three cases; and this they can still do: Q, S, or neither of them. I stand to be corrected by someone with more experience ;) (Was curious: the letters apparently indicate the Port of Embarkation: C = Cherbourg; Q = Queenstown; S = Southampton)
@dataschool5 жыл бұрын
Excellent answer, Julian! 👏
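To make the point in this exchange concrete, a sketch assuming one row per port: after `drop_first=True`, the dropped column can still be reconstructed from the remaining ones, so no information is lost.

```python
import pandas as pd

# Hypothetical stand-in with one passenger per port.
train = pd.DataFrame({'Embarked': ['C', 'Q', 'S']})

full = pd.get_dummies(train.Embarked, prefix='Embarked', dtype=int)
reduced = pd.get_dummies(train.Embarked, prefix='Embarked', drop_first=True, dtype=int)

# Embarked_C is 1 exactly when Embarked_Q and Embarked_S are both 0.
recovered_c = 1 - reduced.sum(axis=1)

print(reduced)
print(recovered_c.tolist())
```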
@twafsimon1033 жыл бұрын
I am always inspired by your lecture thanks
@dataschool3 жыл бұрын
Thank you! 🙏
@PankajMishra-rt6hr8 жыл бұрын
Hey kevin :) One question....here if we use get_dummies we add more and more colums to our data frame,is there any way to do this inplace like if our series has 'adult','kid','senior_citizen' so whenever it occurs adult get replaced by 0,kid with 1,senior citizen with 2 and so on for different values whenever it occurs in the series,can I map like this ? Thanks
@PankajMishra-rt6hr8 жыл бұрын
EDIT: I have found it. For future readers, we can do this using sklearn's preprocessing package. Steps:
1) Import the class: from sklearn.preprocessing import LabelEncoder
2) Create the encoder object: le = LabelEncoder()
3) Convert to numbers: train['Sex'] = le.fit_transform(train['Sex'])
4) Convert back: train['Sex'] = le.inverse_transform(train['Sex'])
That's it :)
@dataschool8 жыл бұрын
Right! LabelEncoder is useful for taking a series of categorical data and converting it into a series of integers representing the categories. You can also do this within pandas using factorize: pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.factorize.html
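A minimal sketch of the factorize alternative mentioned in this reply, using the category names from the question above as made-up data:

```python
import pandas as pd

# Hypothetical series using the categories from the question.
ages = pd.Series(['adult', 'kid', 'senior_citizen', 'kid'])

# factorize returns integer codes (in order of first appearance)
# plus the unique values those codes refer to.
codes, uniques = pd.factorize(ages)

# Indexing the uniques with the codes restores the original labels.
restored = uniques[codes]

print(codes)
print(list(restored))
```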
@thereadletter24267 жыл бұрын
That bonus tip is amazing. Thank you!
@dataschool7 жыл бұрын
Glad you liked it! You're very welcome :)
@NikhilKumar-pz3uz6 жыл бұрын
I used the bonus tip, but the columns whose names I did not pass in the columns list are also getting converted. How can we solve this?
@dataschool6 жыл бұрын
Sorry, it's hard for me to say without seeing your code. Good luck!
@dembobademboba692411 ай бұрын
Please send to me more link of python , entire guidelines ....
How do you use get_dummies in a data pipeline, for example when the test data and train data are not split from the same source?
@dataschool4 жыл бұрын
For creating dummy variables within a pipeline, I definitely recommend using scikit-learn's OneHotEncoder instead. I have a lesson about that here: kzbin.info/www/bejne/n6OrmXeDl9xmrtE
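A sketch of the pipeline approach this reply recommends, under stated assumptions: the tiny train and test frames are made up, the test frame deliberately contains a port ('Q') that never appears in training, and logistic regression is just a placeholder model.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training and test data from different sources.
X_train = pd.DataFrame({'Embarked': ['S', 'C', 'S', 'C'],
                        'Fare': [7.0, 70.0, 8.0, 80.0]})
y_train = [0, 1, 0, 1]
X_test = pd.DataFrame({'Embarked': ['Q', 'S'], 'Fare': [10.0, 9.0]})  # 'Q' is unseen

# One-hot encode Embarked and pass Fare through unchanged.
# handle_unknown='ignore' encodes the unseen 'Q' as all zeros.
ct = make_column_transformer(
    (OneHotEncoder(handle_unknown='ignore'), ['Embarked']),
    remainder='passthrough',
)

# The encoder learns its columns from the training data only, so the
# test data always ends up with the same columns in the same order.
pipe = make_pipeline(ct, LogisticRegression())
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)

print(predictions)
```

Calling `get_dummies` separately on each frame would instead produce different columns for train and test whenever their categories differ.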
@mkosinski8 жыл бұрын
Let's say you did conversion to dummy variable and now want to train multinomial classification algorithm (say logistic regression) on Embarked column. You would need the original Embarked column, wouldn't you?
@dataschool8 жыл бұрын
You would not need the original Embarked column. The dummy variables encode the same information as the Embarked column, but in a numerical way that can be used by a machine learning model - that's the primary reason you create the dummy variables. Hope that helps!
@mkosinski8 жыл бұрын
Thanks!
@cookiepp37774 жыл бұрын
Thank you very much
@raghulsuresh19975 жыл бұрын
Is there any way to reverse get_dummies so that i can get my original dataframe?
@dataschool5 жыл бұрын
Great question! I don't think so, but maybe it's possible.
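One possible way to go back, sketched under the assumption that no dummy column was dropped (so every row has exactly one 1): take the name of the column holding the maximum in each row. Newer pandas releases (1.5 and later) also provide `pd.from_dummies` for this purpose.

```python
import pandas as pd

# Hypothetical original column.
original = pd.Series(['S', 'C', 'Q', 'S'], name='Embarked')

# No prefix and no drop_first, so the column names are the category labels.
dummies = pd.get_dummies(original, dtype=int)

# idxmax(axis=1) returns, for each row, the label of the column containing the 1.
restored = dummies.idxmax(axis=1)

print(restored.tolist())
```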
@anotherone62764 жыл бұрын
your bonus tip was a life saver! Thank you thank you thank you
@dataschool4 жыл бұрын
Glad it helped!
@l88704 жыл бұрын
Thanks, it really helped. Can you do mean target encoding?
@dataschool4 жыл бұрын
I'm not sure what you mean, I'm sorry!
@usmanshaikh11155 жыл бұрын
Thank you wonderful explanation as always!
@dataschool5 жыл бұрын
You're very welcome!
@nikotuba5 жыл бұрын
It's impossible to listen at normal speed. only 1.5
@dataschool5 жыл бұрын
Everyone has their own preferences. Thanks for watching!
@rohitshete53725 жыл бұрын
How do I encode the Ticket variable in the dataset?
@dataschool5 жыл бұрын
You probably need to use string methods to pull out whatever you think is relevant.
@tensianne4 жыл бұрын
Thank you for all those great videos!
@dataschool3 жыл бұрын
Thanks!
@ragurajan75675 жыл бұрын
Bro which one is better? One hot encoding or dummy method?
@dataschool5 жыл бұрын
Nowadays, I recommend OneHotEncoder from scikit-learn rather than get_dummies from pandas.