Pandas Get Dummies | pd.get_dummies()

  Views: 20,060

Greg Kamradt (Data Indy)

1 day ago

Comments: 61
@gudguy1a 1 year ago
Well, kinda late to this party, by a couple of years, but DANG...!!!!! VERY good, clear, explicit explanation with examples. If I could put this all in bold to emphasize, I might... Thx for the short but deep piece, this is the missing spark some of us do not get in a timely manner.
@svishal25 3 years ago
Hey, I read about the 'dummy variable trap', and that drop_first should help counter this! Anyway thanks a lot, great video
@DataIndependent 3 years ago
Wonderful! Glad it worked out. Good luck.
@BlissOn47 1 day ago
The drop_first parameter reduces the amount of data to be interpreted without losing the existing significance of the data. So I guess it makes the data interpretation process more concise. Tell me if I'm getting it wrong :)
@sudiptomitra 3 years ago
Explained in a crisp & smart way.
@DataIndependent 3 years ago
Awesome thank you! Any other topics or videos you'd like to see?
@sudiptomitra 3 years ago
@DataIndependent I'd like to see videos on train and test data, model building with linear regression, computing R2 on the train set, prediction on the test set, and computing its R2.
@omarmiah7496 3 years ago
I see this channel blowing up if you keep it up!
@omarmiah7496 3 years ago
BTW, in my data science course we're using drop_first indicators for two distinct values. My data set looks at Titanic survivors. My category is the gender. Using pd.get_dummies(df, drop_first=True) it will return a column with only males indicated by a 1. We wouldn't see a column for women, but we can infer it implicitly by understanding that 0 refers to women. In my experience, drop_first works well when there are two discrete values, such as gender classifications (M or F). In your example you have 3 different columns, so 0 and 0 in the table indicates 500 club, while 1 indicates every other cinema club.
@DataIndependent 3 years ago
Nice thank you! We've done 50 pandas functions so far, the goal is more content to help people. YouTube is mostly polished information. If you want to see behind the scenes I would go to the twitter account twitter.com/DataIndependent
@DataIndependent 3 years ago
@omarmiah7496 Nice, ya that works well, you don't need the extra column. The only reason I usually don't do that is because it's harder to explain to other people. Validation by negation. Very do-able but it's up to you on which one you want to use. Thanks for the comments!
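A minimal sketch of the drop_first behaviour discussed in this thread. The DataFrame and its "sex" column are made up for illustration and are not the data from the video or the course.

```python
import pandas as pd

# Hypothetical Titanic-style data; column name and values are illustrative only.
df = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# Without drop_first: one indicator column per category.
full = pd.get_dummies(df, columns=["sex"], dtype=int)

# With drop_first=True: the first category in sorted order ("female") is dropped,
# so a 0 in sex_male implies female.
reduced = pd.get_dummies(df, columns=["sex"], drop_first=True, dtype=int)

print(list(full.columns))     # ['sex_female', 'sex_male']
print(list(reduced.columns))  # ['sex_male']
```

Which form to keep is a readability choice as much as a technical one: the reduced form carries the same information in fewer columns, while the full form is easier to explain to someone else.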
@SalmanAhmed-mg9gk 3 years ago
short and sweet! Thanks man!
@DataIndependent 3 years ago
Thank you! Glad it worked out.
@pranavgoyal2366 3 years ago
beautifully explained man, great work, thnx for making this video
@DataIndependent 3 years ago
Glad it worked out! Thanks!
@roshinroy5129 1 year ago
Thanks a lot buddy!!!
@DataIndependent 1 year ago
nice, glad it worked out
@vin___weasel 1 year ago
great video!
@DataIndependent 1 year ago
Thanks!
@ashishchandra141 3 years ago
very well explained ..!!! thank you :)
@DataIndependent 3 years ago
Great! Glad it worked out!
@azuremis 2 years ago
Brilliant explanation, short and concise!
@DataIndependent 2 years ago
Wonderful! Glad it worked out. Anything else you wanna see?
@azuremis 2 years ago
@DataIndependent thanks for asking. Nothing immediately comes to mind 😊
@rishigupta2342 2 years ago
Thanks for the explanation!!!
@DataIndependent 2 years ago
Nice, glad it worked out
@damdarhudaii1911 1 year ago
on point explanation
@DataIndependent 1 year ago
Get dummies is one of my favorite pandas functions
@harryshambaugh5981 1 year ago
FYI - Reason to use drop_first=True = Avoid multicollinearity issues
@DataIndependent 1 year ago
Nice thanks for the tip
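One way to see the multicollinearity point is to compare the rank of a design matrix with and without drop_first. This is only a sketch: the category values are made up, the helper design_rank is hypothetical, and it assumes the model includes an intercept column.

```python
import numpy as np
import pandas as pd

clubs = pd.Series(["a", "b", "c", "a", "b"], name="club")  # made-up categories

def design_rank(dummies):
    # Prepend an intercept column of ones; return (matrix rank, number of columns).
    X = np.column_stack([np.ones(len(dummies)), dummies.to_numpy(dtype=float)])
    return int(np.linalg.matrix_rank(X)), X.shape[1]

# All three dummy columns sum to the intercept column, so one column is redundant.
full_rank, full_cols = design_rank(pd.get_dummies(clubs))

# Dropping the first category removes the redundancy.
red_rank, red_cols = design_rank(pd.get_dummies(clubs, drop_first=True))

print(full_rank, full_cols)  # 3 4
print(red_rank, red_cols)    # 3 3
```

With all dummies kept, the matrix has four columns but only rank three, which is the situation the "dummy variable trap" refers to for linear models with an intercept.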
@parsiabolouki7685 3 years ago
we use drop_first=True to prevent duplicated information when doing deep learning.
@DataIndependent 3 years ago
Awesome thanks for sharing! I figured this was the case
@aminmw5258 3 years ago
You're underrated
@DataIndependent 3 years ago
wow thanks - What other videos do you want to see? I'm looking for more content ideas
@WhiskeyjackXXX 3 years ago
Good explanation.
@DataIndependent 3 years ago
Glad to hear it. Thank you. What other videos would you like to see?
@user-yt4rl6xs3j 9 months ago
Seems like drop_first is used for binary categories. If you have a gender category and the options are Male and Female, it will create two dummies, one for Male and a second for Female. It is repetitive because if Male is 0 then Female is 100% 1. These columns can be combined into one: a single Female column with 0s and 1s, 0 meaning they are male, 1 meaning they are female. That's my assumption.
@ajaykushwaha-je6mw 3 years ago
In a real-world project we may have more than 20 categorical features, so do we need to mention all the column names in the function?
@DataIndependent 3 years ago
Nope, you have options if you don't want to list every column by name. You can create the list of column names another way. The most common method I've seen (which only works if your columns are next to each other) is: df.columns[2:20]. This will list out the column names in that range. Another way to do it is via string matching ("Give me all columns that start with 'my_special_column'"). Happy to explain this further if you want.
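The two approaches in this reply can be sketched as follows. The DataFrame, its column names, and the "cat_" prefix are assumptions made up for the example.

```python
import pandas as pd

# Illustrative data; the "cat_" prefix is a made-up naming convention.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "cat_color": ["red", "blue", "red"],
    "cat_size": ["S", "M", "S"],
    "price": [9.5, 12.0, 7.25],
})

# Option 1: slice by position (only works when the columns sit next to each other).
by_position = list(df.columns[1:3])

# Option 2: string matching on the column names.
by_prefix = [c for c in df.columns if c.startswith("cat_")]

# Pass either list to the columns parameter; other columns are left untouched.
encoded = pd.get_dummies(df, columns=by_prefix, dtype=int)
print(list(encoded.columns))
```

The positional slice is shorter, but it breaks silently if the column order changes, so name matching may be the safer choice for a wide table.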
@response2u 2 years ago
Thank you, sir!
@DataIndependent 2 years ago
Nice! glad it worked out.
@response2u 2 years ago
@DataIndependent Tnx! It sure did!
@emeraldpopcorniac7673 6 months ago
When I run get_dummies on my dataframe for gender, it returns True/False instead of numbers - why? Can anyone explain?
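A likely explanation, assuming a recent pandas install: since pandas 2.0, pd.get_dummies returns boolean columns by default (older versions returned uint8). The sketch below, on made-up data, shows two ways to get 0/1 integers instead.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["M", "F", "M"]})  # illustrative data

# Default output: True/False in pandas >= 2.0, 0/1 (uint8) in older versions.
as_default = pd.get_dummies(df)

# Ask for integers directly with the dtype parameter...
as_int = pd.get_dummies(df, dtype=int)

# ...or convert an existing result afterwards.
as_int_later = as_default.astype(int)

print(as_int)
```

Boolean dummies carry the same information, and most modelling libraries accept them, so the conversion is mainly about readability or about downstream code that expects numbers.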
@shaikhkashif9973 1 year ago
If we have, for example, age categories such as Young, Adult, Old, should we go for ordinal or nominal encoding? Pls answer
@DataIndependent 1 year ago
Sorry I don't understand the question. Could you rephrase it?
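For reference, the two encodings the question contrasts can be sketched like this. The data and the order Young < Adult < Old are assumptions for illustration; which encoding suits a given model is a judgment call that this sketch does not settle.

```python
import pandas as pd

ages = pd.Series(["Young", "Old", "Adult", "Young"], name="age_group")  # made-up data

# Nominal (one-hot) encoding: one indicator column per category, no order implied.
nominal = pd.get_dummies(ages, dtype=int)

# Ordinal encoding: a single integer column that preserves an assumed order.
order = ["Young", "Adult", "Old"]
ordinal = pd.Categorical(ages, categories=order, ordered=True).codes

print(list(nominal.columns))  # ['Adult', 'Old', 'Young']
print(list(ordinal))          # [0, 2, 1, 0]
```

Ordinal codes keep the ranking but also imply equal spacing between levels, which may or may not be appropriate; one-hot columns make no such assumption at the cost of extra columns.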
@anthonymalary7616 7 months ago
I have a question. I run my pd.get_dummies() method and my categorical data remains categorical and does not convert into numerical values. My categorical values are true or false statements
@DataIndependent 7 months ago
Do you just have one categorical column? Are you ensuring to set the values on the DF?
@anthonymalary7616 7 months ago
@DataIndependent I have multiple columns of categorical data and when I run the get_dummies function on them it just remains true or false. Any ideas what it could be?
@736939 2 years ago
What if I have a multi-label target that is represented as a list of items (not one mutually exclusive class)? How to use get_dummies then?
@DataIndependent 2 years ago
Sorry, I'm not understanding your question. Could you rephrase it?
@736939 2 years ago
@DataIndependent In multi-label classification the target column is not represented in an atomic way (not in 1st normal form); instead each cell of the target is a list of items (a nested table). Then, how to use get_dummies from pandas to turn all the targets into the proper multi-label one-hot encoding?
@DataIndependent 2 years ago
@736939 ha ya - sorry man, I'm totally not getting it. If you take a video of your data I can try to figure it out with you.
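One possible reading of this question is a column whose cells are lists of labels. Under that assumption, a sketch using explode followed by get_dummies looks like this; the column name and labels are made up.

```python
import pandas as pd

# Illustrative multi-label target: each cell holds a list of labels.
df = pd.DataFrame({"labels": [["a", "b"], ["b"], ["a", "c"]]})

# explode() gives one row per label and repeats the original index.
exploded = df["labels"].explode()

# One-hot encode each label, then collapse back to one row per original index.
multi_hot = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

print(multi_hot)
```

Each row of multi_hot has a 1 for every label present in the original list. scikit-learn's MultiLabelBinarizer is an alternative that produces the same kind of matrix.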
@tarblood 2 years ago
good thx
@DataIndependent 2 years ago
Solid!!! Thank you
@aadititanksale5879 7 months ago
👍
@DataIndependent 7 months ago
👍
@daniloyukihara2143 3 years ago
nice video, but not the info i was seeking =(
@DataIndependent 3 years ago
What was the info you were seeking?
@daniloyukihara2143 3 years ago
@DataIndependent I have a dataset from Netflix with a column named 'listed_in' that lists all the movie types using ', ' as a separator. When I apply pd.Series.str.get_dummies(sep=',') it works fine but duplicates all columns. When I try DataFrame.columns.duplicated(), Python does not flag the duplicated columns as True. Just burning my brain to solve this mystery.
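One plausible cause, assuming the values really are separated by a comma plus a space: splitting on ',' alone leaves a leading space on every label after the first, so " Dramas" and "Dramas" become different columns that only look identical. The sketch below uses made-up values in place of the real 'listed_in' column.

```python
import pandas as pd

# Made-up stand-in for the 'listed_in' column.
listed_in = pd.Series(["Dramas, Comedies", "Comedies, Dramas"], name="listed_in")

# Splitting on "," keeps the space, so each genre can appear twice.
with_comma = listed_in.str.get_dummies(sep=",")

# Splitting on ", " matches the actual separator.
with_comma_space = listed_in.str.get_dummies(sep=", ")

print(list(with_comma.columns))        # [' Comedies', ' Dramas', 'Comedies', 'Dramas']
print(list(with_comma_space.columns))  # ['Comedies', 'Dramas']
```

This would also explain why columns.duplicated() reports False everywhere: the names differ by a leading space, so they are not exact duplicates.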