Advance House Price Prediction-Feature Engineering Part 1

81,794 views

Krish Naik

A day ago

Comments: 97
@akashsaha3921 5 years ago
Applied AI course plus Krish's tutorials!!! Deadly combination. I am in love with ML thanks to you. You just changed my perception and gave me a perfect strategy to proceed. God bless you, man!!!
@ganeshahire8168 5 years ago
Bro, how much of the course have you completed??
@unsharma9229 5 years ago
Which course are you talking about? Where can I get it, please tell?
@akashsaha3921 4 years ago
@ganeshahire8168 Now doing case studies.
@akashsaha3921 4 years ago
@unsharma9229 Search "Applied AI" on Google.
@tulrose 4 years ago
You are right. I completed the course recently. I come here for quick revisions.
@shatadruroychowdhury6319 3 years ago
5:00 That condition should have been > 0; if you have a feature with only 1 missing value, you won't be able to capture it.
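The effect of the threshold can be checked on a small toy frame. Below is a minimal sketch with made-up columns A, B and C (not from the House Prices data), comparing the > 1 condition discussed here with > 0:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "A" has exactly one NaN, "B" has two, "C" has none.
dataset = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0],
    "B": [np.nan, np.nan, 3.0, 4.0],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Condition discussed in the comments: skips columns with exactly one NaN.
with_gt_1 = [f for f in dataset.columns if dataset[f].isnull().sum() > 1]

# Corrected condition: any column with at least one NaN.
with_gt_0 = [f for f in dataset.columns if dataset[f].isnull().sum() > 0]

print(with_gt_1)  # ['B']
print(with_gt_0)  # ['A', 'B']
```

Because isnull().sum() is an integer count, >= 1 (suggested further down in the thread) and > 0 select the same columns.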
@Ravi-sl5ms 5 years ago
You have such a nice style of explaining things. Waiting eagerly for the next part.
@RahulGupta-kd1cn 5 years ago
On dividing train and test: if we already have separate train and test data, then we can split like this: train data and validation data (obtained by dividing the original train data into train and validation sets), and use the test data for testing the model.
@astrostudent2302 4 years ago
About the finding-missing-values part at 05:04, I have a doubt: why is it not dataset[feature].isnull().sum() >= 1 (I have added the equals operator)? Can you please clarify, sir?
@RavinderSingh-te8vy 3 years ago
Maybe for features having only 1 missing value, the complete row can be deleted.
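If a feature has only one missing value, dropping that row is one option, as suggested above. A minimal sketch, assuming a training frame with hypothetical toy values; it drops only the rows where a single-NaN feature is missing and leaves other NaNs for imputation:

```python
import numpy as np
import pandas as pd

# Hypothetical toy values: "LotFrontage" has one NaN, "Alley" has three.
dataset = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "Alley": [np.nan, np.nan, "Grvl", np.nan],
    "SalePrice": [200000, 180000, 220000, 140000],
})

# Features with exactly one missing value (the case discussed above).
single_nan = [f for f in dataset.columns if dataset[f].isnull().sum() == 1]

# Drop only the rows where one of those features is missing; NaNs in other
# columns (e.g. "Alley") are kept for later imputation.
trimmed = dataset.dropna(subset=single_nan)

print(single_nan)    # ['LotFrontage']
print(len(trimmed))  # 3
```

This only makes sense for training data: rows of a test set cannot be dropped, because each one needs a prediction.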
@c.vinaykumar7737 3 years ago
Hello Krish sir, your videos are brilliant. One slight correction: while calculating the percentage of NaN values in the columns, you are using the mean to calculate the percentage; after taking the mean you should multiply by 100 to get a percentage. Pardon me if I am wrong.
@youssefzayn1159 2 years ago
It is not necessary: multiplying by 100 just makes it clearer, but you already know that 0.53 is 53%.
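On the percentage question: isnull() returns booleans, and the mean of booleans is the fraction of True values, so the result lies between 0 and 1 and multiplying by 100 turns it into a percentage. A minimal sketch on a hypothetical PoolQC column with 2 of 4 values missing:

```python
import numpy as np
import pandas as pd

# Hypothetical column with 2 missing values out of 4 rows.
dataset = pd.DataFrame({"PoolQC": [np.nan, "Gd", np.nan, "Ex"]})

feature = "PoolQC"
# isnull() yields True/False; their mean is the fraction of missing rows.
fraction = dataset[feature].isnull().mean()
# Scale by 100 to report it as a percentage.
percent = fraction * 100

print(feature, np.round(fraction, 4), "fraction")
print(feature, np.round(percent, 2), "% missing values")
```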
@devadularani6811 5 years ago
Explained well... will be waiting for tomorrow's video.
@ocean2738 1 year ago
In the train-test split part there should be dataset.drop('SalePrice', axis=1), dataset['SalePrice']
@ananthkumar8901 11 months ago
Yes, you are correct.
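Passing features and target separately, as suggested above, looks like this with scikit-learn's train_test_split. A minimal sketch; the toy frame and its values are hypothetical stand-ins for the House Prices training data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the training data (10 rows).
dataset = pd.DataFrame({
    "LotArea": range(1000, 11000, 1000),
    "OverallQual": [5, 6, 7, 5, 8, 6, 7, 9, 4, 6],
    "SalePrice": range(100000, 200000, 10000),
})

X = dataset.drop("SalePrice", axis=1)  # features only; target column removed
y = dataset["SalePrice"]               # target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```

Note that drop needs axis=1 (or columns=[...]) to remove a column; without it pandas looks for a row labeled 'SalePrice' and raises a KeyError.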
@Mish-333 A month ago
Good explanation of the code, but if you look closely, he hasn't explained the logic behind it: he did not explain the real reasons why the particular steps are followed, or what the ultimate goal of the code is.
@csprusty 4 years ago
Amazing effort and clear explanation. Great work, Krish!!!
@ijeffking 5 years ago
Very nice tutorial. Thank you very much.
@sahayaajay7684 4 years ago
Thank you for sharing your knowledge. I am waiting for part 2 of feature engineering.
@sandipansarkar9211 4 years ago
Finished practising this particular code in Jupyter Notebook. Thanks.
@rakeshdayalan8049 4 years ago
Krish, you're awesome! Thanks for your videos.
@aashishrana4129 3 years ago
Very insightful video, thanks Krish.
@ishankanodia7477 7 months ago
For the train-test split, we should pass X and y, so why are we using the complete dataset in place of X? (It contains SalePrice too.)
@AmanKumarSharma-de7ft 5 years ago
Great work, sir 👏👏👏, simplicity as always. Please upload the ROC curve video, the Docker video, and how to write a research paper in ML and DL. Thanks for your support.
@ManishKumar-qs1fm 5 years ago
Really nice video 👍 sir. In the next part, please explain highly negatively or positively skewed data, and how to make it normal in one piece of code.
@sarrae100 3 years ago
Excellent, yet simple.
@Abdullahkbc 2 years ago
Pretty smart way at 09:58.
@ananthkumar8901 11 months ago
Krish Naik video + ChatGPT/Bard is a deadly combination.
@arunprabhu1853 5 years ago
Hi Krish, I need guidance from experts like you. My qualification is MSc (IT). My entire work experience is as a "Computer Teacher", and I am in my 30s. Out of personal interest, I am currently pursuing some Data Science courses online. My question is whether companies will consider me for an IT job, even as a fresher. Please guide me on this; I am planning to change my profession.
@sidnayak4395 3 years ago
Yes. One of my faculty members from engineering college, with 4+ years of experience, is working as an ML Engineer.
@raghupro 4 years ago
Thanks for the video, Krish. I would like to understand why, for identifying features with missing values, you have used isnull().sum() > 1 and not > 0. If a feature has only 1 missing value, can we omit it?
@tanishajain9073 4 years ago
Did you get the answer?
@prabhatkumarsharma4240 3 years ago
As you said, in the presence of outliers the missing values of a feature should be replaced with the median or mode. Can you elaborate why? And what should we do if our variables don't have many outliers? Please answer if possible; it may clear up others' doubts as well.
@madhuradhongade7632 2 years ago
It might be because high outliers imply a skewed distribution, so the mean would not be the right measure of central tendency. For example, given the numbers 10, 15, 17, 20, 95, the mean is 31.4 although most of the data is centered around 15 or 16, the reason being 95, the outlier. Hence it makes more sense to take the median, i.e. 17, as it better represents the given set of numbers. Correct me if I am wrong or missing something.
@dipanshuawhad7396 2 years ago
For that you can refer to the 7-day live statistics playlist; Krish explains there which of mean, median, and mode is suitable when outliers are present in the dataset, and why.
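The worked example above can be reproduced in a few lines: the outlier 95 pulls the mean up to 31.4 while the median stays at 17, and that median is what a median-based fill would insert. A minimal sketch using the same five numbers plus one hypothetical missing value:

```python
import statistics

import numpy as np
import pandas as pd

values = [10, 15, 17, 20, 95]
mean_value = statistics.mean(values)      # pulled upward by the outlier 95
median_value = statistics.median(values)  # middle value of the sorted list
print(mean_value, median_value)           # 31.4 17

# Median-based fill: NaN is replaced by the median of the observed values.
s = pd.Series([10, 15, 17, 20, 95, np.nan])
filled = s.fillna(s.median())             # median() skips NaN, so it is 17.0
print(filled.tolist())
```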
@sandipansarkar9211 4 years ago
Great video, Krish. Now I need to get my hands dirty with coding in Jupyter Notebook. Thanks.
@143balug 4 years ago
Hi Krish, I have observed one thing here: the missing values of the temporal feature "GarageYrBlt" are replaced while handling the numerical features. Please correct me if I am wrong.
@aimeokoko726 5 years ago
Thanks for your videos, very useful. I have a question. To handle missing values, you use "sum() > 1"; does that take into account features with only one missing value?
@manojrangera 3 years ago
Use sum() >= 1.
@manishchouhan6626 3 years ago
@manojrangera Use sum() > 0.
@miteshkumar7739 5 years ago
Hello sir... How does a data scientist work in a company? Please make a practical video!
@peacefullmusic8374 3 years ago
@Krish Naik bro, why didn't you use SimpleImputer for missing values?
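For reference, scikit-learn's SimpleImputer with strategy="median" performs the same per-column median fill as a manual fillna, but stores the learned medians (statistics_) so they can be reused on other data. A minimal sketch on hypothetical values, not the code from the video:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical numerical features with missing values.
train = pd.DataFrame({
    "LotFrontage": [60.0, np.nan, 80.0, 70.0],
    "MasVnrArea": [0.0, 100.0, np.nan, 300.0],
})

# Manual approach: fill each column with its own median.
manual = train.fillna(train.median())

# SimpleImputer: fit() learns the per-column medians, transform() applies them.
imputer = SimpleImputer(strategy="median")
imputed = pd.DataFrame(imputer.fit_transform(train), columns=train.columns)

print(imputer.statistics_)  # learned medians, one per column
print(imputed)
```

The practical advantage is reuse: calling imputer.transform on the test features fills them with the medians learned from the training set.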
@ayanasalim3318 4 years ago
If the year fields are subtracted from the year sold, is there any handling for the situation where the year value is zero, say in the year modified? Because 2020 - 0 would give 2020 years since modification.
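One way to guard against the 2020 - 0 problem is to treat a zero year as missing before subtracting. A minimal sketch, assuming that a 0 in a year column means "unknown" (an assumption for illustration; cells that are empty or NA in a CSV are read by pandas as NaN instead); the rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows; the 0 in YearRemodAdd stands for an unknown year.
dataset = pd.DataFrame({
    "YrSold": [2008, 2007, 2010],
    "YearBuilt": [2003, 1976, 2001],
    "YearRemodAdd": [2003, 0, 2002],
})

for feature in ["YearBuilt", "YearRemodAdd"]:
    # Assumption: a non-positive year means "missing", so mask it to NaN
    # instead of producing e.g. 2007 - 0 = 2007 years.
    year = dataset[feature].where(dataset[feature] > 0)
    # Convert the year into "years before the sale".
    dataset[feature] = dataset["YrSold"] - year

print(dataset)
```

The resulting NaN can then be filled like any other numerical missing value, for example with the median.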
@unezkazi4349 4 years ago
The percentages of missing values that you print at the start are, I guess, divided by 100. You need to multiply them by 100.
@unezkazi4349 4 years ago
And how did you handle categorical features? By just replacing NaN with "Missing"?
@vinayakbasavaraddi3135 4 years ago
Why are we not creating dummies for categorical variables, instead of just replacing the null values with "Missing"?
@vignesh7687 3 years ago
Nice, Krish. I have one question: why can't we fill the NAs in a categorical feature column with the mode of that feature? Why encode them as 'Missing'?
@ranganathjoshi1592 3 years ago
Because it can even be used for encoding (if needed).
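The trade-off discussed in this thread can be seen on a toy column: filling with "Missing" keeps the fact that a value was absent, and one-hot encoding then turns it into its own dummy column, while filling with the mode merges those rows into the most common category. A minimal sketch with hypothetical FireplaceQu values:

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with missing values.
dataset = pd.DataFrame({"FireplaceQu": ["Gd", np.nan, "TA", "Gd", np.nan]})

# Option 1: keep "missing" as its own label.
as_label = dataset["FireplaceQu"].fillna("Missing")

# Option 2: fill with the most frequent category (the mode, here "Gd").
as_mode = dataset["FireplaceQu"].fillna(dataset["FireplaceQu"].mode()[0])

# One-hot encoding: "Missing" becomes its own dummy column, so a model can
# still see that the value was absent; with the mode, that information is lost.
dummies = pd.get_dummies(as_label, prefix="FireplaceQu")
print(list(dummies.columns))
print(as_mode.tolist())
```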
@saipatibandla4049 4 years ago
Hi Krish, when I looked at the description for this dataset, some of the categorical features have 'NA' as one of the categories. I think this creates a conflict between data that is actually absent and a category that is labeled NA. Wouldn't that be a problem when deciding which data is actually missing?
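The 'NA' issue can be handled at load time: pandas' read_csv treats the string NA as missing by default, but keep_default_na=False together with an explicit na_values list keeps it as a label. A minimal sketch on a made-up two-row CSV. Whether this helps with the real train.csv depends on how its genuinely missing cells are written; if those are also NA, the decision has to be made per column using the data description.

```python
import io

import pandas as pd

# Made-up CSV excerpt: "NA" in Alley is meant as a category ("no alley
# access"), while the empty LotFrontage cell is genuinely unknown.
csv_text = "Id,LotFrontage,Alley\n1,65,NA\n2,,Grvl\n"

# Default parsing: both "NA" and the empty cell become NaN.
default = pd.read_csv(io.StringIO(csv_text))

# Only empty cells count as missing; the literal "NA" stays a label.
explicit = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(default["Alley"].isnull().tolist())
print(explicit["Alley"].tolist())
print(explicit["LotFrontage"].isnull().tolist())
```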
@vijendramathur1483 5 years ago
Any idea about algo trading?
@SuperChowhan 5 years ago
Please, please help me with the Anaconda installation. I have reinstalled Anaconda, but I cannot find Anaconda Navigator or the Anaconda command prompt. No shortcuts related to Anaconda are found. Please help me.
@udaymishra3238 4 years ago
Which OS?
@rohitjaiswal6102 5 years ago
Thank you sir...
@tirumaleshn8504 4 years ago
Krish sir, why didn't you use the train and test data separately for feature engineering?
@raghavarora5077 4 years ago
What if we use K-fold cross-validation instead of a train-test split?
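K-fold cross-validation replaces the single split with several: each row lands in the validation fold exactly once. A minimal sketch with scikit-learn's KFold on hypothetical data; the comment in the loop marks where preprocessing such as median imputation would be fitted on the training fold only, which also relates to the earlier question about treating train and test separately:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for the training data (10 rows).
dataset = pd.DataFrame({
    "LotArea": np.arange(10) * 1000.0,
    "SalePrice": np.arange(10) * 10000.0,
})
X = dataset.drop("SalePrice", axis=1)
y = dataset["SalePrice"]

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
for train_idx, val_idx in kf.split(X):
    X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
    y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
    # Fit preprocessing (e.g. imputation medians) and the model on X_train
    # only, then score on X_val, so no fold peeks at its validation rows.
    fold_sizes.append((len(train_idx), len(val_idx)))

print(fold_sizes)
```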
@louerleseigneur4532 3 years ago
Thanks Krish
@suryapratap1961 4 years ago
dataset.groupby('YrSold')['SalePrice'].median().plot(): here median() will give one value, so how can we plot from this?
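On the plotting question: groupby('YrSold') produces one group per year, so median() returns one value per year (a Series indexed by YrSold), not a single number, and .plot() draws that Series as a line. A minimal sketch with hypothetical prices:

```python
import pandas as pd

# Hypothetical sales: two houses sold in each of three years.
dataset = pd.DataFrame({
    "YrSold": [2006, 2006, 2007, 2007, 2008, 2008],
    "SalePrice": [100000, 140000, 150000, 170000, 120000, 160000],
})

# groupby + median returns one median per group: a Series indexed by year.
yearly_median = dataset.groupby("YrSold")["SalePrice"].median()
print(yearly_median)

# yearly_median.plot()  # requires matplotlib; draws year (x) vs median price (y)
```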
@mohanramesh3506 4 years ago
Hi Krish, how do I handle a column with a 'text description', i.e. a paragraph of text in it? Please let me know.
@abdurahman1019 1 year ago
I am a beginner at data science and Python as well. Am I expected to be able to write all this code by myself, or is understanding it enough for an interview?
@adityapathania3618 3 years ago
For numerical NaN values, I am getting them all replaced by 0; none of them are replaced by 1. Any suggestions? @krish
@saswatleo 4 years ago
Why are you doing NaN replacement with 0 or 1 when you have already replaced them with the median?? Please clarify.
@adityanarendra5886 3 years ago
Which feature is replaced with 0, 1, and the median, respectively?
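The 0/1 values appear to come from an indicator column that flags which rows were missing, created alongside the median fill. A minimal sketch of that pattern, assuming hypothetical data and the column-name suffix "nan"; it also shows one likely cause of the all-zeros symptom: if the NaNs are filled before the flag is computed, no row is null anymore, so every flag is 0.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical features containing NaN.
dataset = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan, 84.0],
    "MasVnrArea": [196.0, 0.0, np.nan, 0.0, 350.0],
})
numerical_with_nan = [f for f in dataset.columns if dataset[f].isnull().sum() > 0]

for feature in numerical_with_nan:
    median_value = dataset[feature].median()
    # 1) Flag missing rows (1) vs present rows (0) BEFORE filling; if you fill
    #    first, nothing is null anymore and the flag column is all 0s.
    dataset[feature + "nan"] = np.where(dataset[feature].isnull(), 1, 0)
    # 2) Replace the NaNs themselves with the column median.
    dataset[feature] = dataset[feature].fillna(median_value)

print(dataset)
```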
@manojrangera 3 years ago
Sir, I have one question: in the categorical features, some have more than 90% missing data. So can we remove those features from the dataset? I also saw the EDA video, in which we found that missing data plays an important role in SalePrice. Maybe that is why you used all features, to keep all the information. Please reply, sir, and clear my doubt. 🙏🙏
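Before dropping a mostly-empty categorical feature, one can check whether missingness itself relates to the target, which is the kind of signal the EDA video looked at. A minimal sketch, with hypothetical data and a hypothetical 90% threshold:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "PoolQC" is 90% missing, "Alley" 40%, "Street" 0%.
dataset = pd.DataFrame({
    "PoolQC": [np.nan] * 9 + ["Ex"],
    "Alley": [np.nan] * 4 + ["Grvl"] * 6,
    "Street": ["Pave"] * 10,
    "SalePrice": list(range(100000, 200000, 10000)),
})

# Fraction of missing values per column (mean of the boolean mask).
missing_frac = dataset.isnull().mean()
high_missing = [f for f, frac in missing_frac.items() if frac >= 0.9]

# Compare the median target for rows where PoolQC is missing (True) vs present (False).
median_by_missing = dataset.groupby(dataset["PoolQC"].isnull())["SalePrice"].median()
print(high_missing)
print(median_by_missing)

# Dropping is one option; keeping a "Missing" label instead preserves the signal.
reduced = dataset.drop(columns=high_missing)
```

If the medians differ noticeably, replacing NaN with a "Missing" label rather than dropping the column keeps that information available to the model.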
@rajsekharrouthu8438 4 years ago
Can we do feature engineering for the train and test data together, at once?
@rohitjaiswal6102 5 years ago
Please upload the 2nd part.
@amitbudhiraja7498 3 years ago
Sir, you forgot to remove the outliers in the data.
@sumironchatterjee6289 4 years ago
In the last part, my 'YearBuilt' and 'YearRemodAdd' got converted, but 'GarageYrBlt' did not. Someone help, please.
@shreyapande9289 4 years ago
Why replace the 'nan' values ONLY with the median or mode in numerical features? Is there any specific reason?
@shashireddy7371 4 years ago
Because the feature has outliers. If you take the mean, it will be a completely wrong value.
@mranaljadhav8259 4 years ago
Because the feature has outliers: if you take the mean, it is influenced by the outliers and the skewed distribution, so the median is better for dealing with outliers.
@mansisarda2259 4 years ago
Sir, why are the GarageYrBuilt values getting converted to the object datatype after handling its missing values?
@mranaljadhav8259 4 years ago
I think you typed something wrong while handling the missing values; check your code. My dataset['GarageYrBuilt'] column is of float type.
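One plausible cause of the object dtype is filling a numeric column with a string such as "Missing" (the label used for categorical features): mixing floats and strings forces pandas to the object dtype, and a later numeric step such as subtracting the year from YrSold would then likely fail. A minimal sketch on a hypothetical GarageYrBlt series:

```python
import numpy as np
import pandas as pd

garage = pd.Series([2003.0, np.nan, 2001.0], name="GarageYrBlt")

# Filling a numeric column with a string mixes floats and str -> object dtype.
as_string = garage.fillna("Missing")

# Filling with the (numeric) median keeps the float64 dtype.
as_median = garage.fillna(garage.median())

print(as_string.dtype)  # object
print(as_median.dtype)  # float64
```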
@ramavathusrinivas8282 4 years ago
Why are we taking the mean here: dataset[feature].isnull().mean()?
@aashiagarwal9870 4 years ago
I need help, sir. If I run the same code on the test data, then in place of SalePrice, which variable should I use?
@thisdot3955 3 years ago
I have the same doubt too.
@ranjan4495 4 years ago
Sir, the test.csv dataset does not contain a feature named "SalePrice". So how should we proceed with this dataset?
@Joshua75623 4 years ago
Hi, did you get the answer to this question??
@mranaljadhav8259 4 years ago
Hey, test data doesn't contain the target variable.
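A common way to handle test.csv is to learn every preprocessing value from the training features and apply it to the test features, so nothing needs SalePrice at transform time. A minimal sketch with hypothetical helper names fit_medians and transform and toy data; steps that use SalePrice, such as plotting features against the target, are analysis on the training set only and are simply skipped for the test set:

```python
import numpy as np
import pandas as pd

TARGET = "SalePrice"

# Hypothetical train (with target) and test (without target) frames.
train = pd.DataFrame({"LotFrontage": [60.0, np.nan, 80.0],
                      "SalePrice": [150000, 180000, 210000]})
test = pd.DataFrame({"LotFrontage": [np.nan, 75.0]})


def fit_medians(df):
    """Learn imputation values from the training features only."""
    features = df.drop(columns=[TARGET], errors="ignore")
    return features.median(numeric_only=True)


def transform(df, medians):
    """Apply the learned medians; works whether or not the target exists."""
    out = df.copy()
    cols = [c for c in medians.index if c in out.columns]
    out[cols] = out[cols].fillna(medians[cols])
    return out


medians = fit_medians(train)           # LotFrontage median from train: 70.0
train_clean = transform(train, medians)
test_clean = transform(test, medians)  # test NaN filled with the train median
print(test_clean)
```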
@lijindurairaj2982 3 years ago
God bless you.
@Neuraldata 4 years ago
Great video, sir. I have also started an initiative to teach Data Science online for knowledge dissemination :)
@manishbolbanda9872 4 years ago
discrete_feature = [feature for feature in num_feature if len(dataset[feature].unique())
@christiansetzkorn6241 1 year ago
Nothing advanced, even in part 2 )-:
@gulsanafatima419 4 years ago
Sir, please also give this same lecture in Urdu.
@sandipansarkar9211 3 years ago
Code finished.
@devleenabanerjee4036 4 years ago
Hi Krish, thanks for such a nice explanation. If I want to share my file with you, where should I send it? Please share your email ID. Thank you.
@krishnaik06 4 years ago
krishnaik06@gmail.com
@devleenabanerjee4036 4 years ago
@krishnaik06 Thank you.
@scott.bradley.16940 5 years ago
Position the camera so we can also see your mouth; it makes it easier to understand what you are saying.