Advance House Price Prediction- Exploratory Data Analysis- Part 1

346,310 views

Krish Naik

1 day ago

Comments: 266
@alankarshukla4385 5 years ago
Sir, this is one of your best video series. Everything is implemented together, not in pieces. Can't wait to watch this video series together. Hope you upload all the remaining parts soon. Great work.
@theoutlet9300 4 years ago
this is so clean. i cant look at my nb now. this series is a gem. i have learned so much
@vengalrao5772 3 years ago
Bro, I want to DM you. I need a roadmap for data science. Please advise me; I am fully confused and I am self-studying.
@amritgurung3410 5 years ago
This is the third notebook I have read on this topic from Kaggle, and yours was the simplest to understand, with straightforward code. Thank you for this notebook. Highly appreciated!!
@shivaprakashranga8688 5 years ago
At 5:26 the code has to be feature_withNa = [fea for fea in dataset.columns if dataset[fea].isnull().sum()>=1], because the Electrical feature has one missing value.
@arjyabasu1311 4 years ago
Yes i agree with you
@piratedartist332 4 years ago
If a feature has only one missing value, we can ignore it and simply fill it. Maybe that is why he took sum()>1.
@iamjm_rishav7894 4 years ago
could you tell me if its necessary to examine the data before even starting the notebook!!
@tsungchewu 4 years ago
Shivaprakash, I totally agree with you. That's why I have 19 features with null values in total, but Krish had only 18.
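The difference between the two conditions discussed in this thread can be seen on a toy table. This is a minimal sketch, assuming a small hand-made DataFrame (the values are hypothetical, not the Kaggle file) in which `Electrical` has exactly one missing value, as the comments above report for the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data (hypothetical values, not the Kaggle file).
dataset = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],         # two missing values
    "Electrical": ["SBrkr", "SBrkr", np.nan, "FuseA"],   # exactly one missing value
    "SalePrice": [208500, 181500, 223500, 140000],       # no missing values
})

# "> 1" skips columns that have exactly one missing value.
more_than_one = [col for col in dataset.columns if dataset[col].isnull().sum() > 1]

# ">= 1" keeps every column with at least one missing value.
at_least_one = [col for col in dataset.columns if dataset[col].isnull().sum() >= 1]

# Same result as ">= 1", without a list comprehension.
via_any = dataset.columns[dataset.isnull().any()].tolist()

print(more_than_one)
print(at_least_one)
print(via_any)
```

Whether a single missing value is worth tracking is a judgment call, but `>= 1` (or `.any()`) at least makes the column visible before that decision is made.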
@Aman-yu4re 7 months ago
At the start, when Sir computed the percentage of missing values: just to clear up any misunderstanding, those are not actually percentages, they are ratios of missing values to all values.
@srujanachalla8797 4 years ago
Hi Krish, this series was too good,end-end implementation for solving a problem. Followed the same steps in my Project and it had lot of appreciation from my Managers. All Credit goes to you. You are just Awesome .Thank You so much.
@harshstrum 5 years ago
Thank you krish bhaiya lots and lots of blessings...Please never stop doing this...I have completed two projects by seeing your videos and got learn a lot many things from you. Thanks again.
@junaidarshad8465 9 months ago
For the mean calculation to find out the missing values, do you need to multiply by 100 to get the percentage?
@amiraaliIsInLove 3 years ago
I can't believe I've never come across your channel before! Thank you so much! 😍😍😍😍😍😍😍
@abhishek-shrm 5 years ago
Sir how you make 2 videos every day? It has now become a new habit of mine to eat, code, sleep and watch your videos. I love your videos.
@dentrifications 3 years ago
@saransh5760 yes
@dentrifications 3 years ago
Did u get a job now
@mungarasaikishore3248 3 years ago
Sir, thank you very much for the videos. They are very helpful, more than some paid online courses.
@elukasreeja7208 4 years ago
This is amazing got to learn many possible things and tricks in eda. Thanks a lot , your videos are just awesome!! have been following regularly
@nikhil_somani 3 years ago
Missing-value percentages can be checked with simpler code: df.isna().sum() * 100 / len(df)
@deepaklonare9497 1 year ago
thanks😊
@snehalsanap1750 2 years ago
sir what an explanation, this was by far the best video on eda loved it... you are truly an inspiration sir! keep going
@ashishbhai5288 2 years ago
Why did he take 25 as the threshold value to define discrete features at timestamp 18:00? How can 25 separate discrete and continuous features from each other? How did he decide on this value? If months are less than 25, then what happens? Did you get it? Please help me out.
@manusreeg4999 3 years ago
I'm getting the error "'str' object is not callable" at 7:55 after using the code. I'm a beginner; can anyone help me, please?
@salihsarii 1 year ago
This is the best EDA series . Thanks Krish :)
@mohit000singh 5 years ago
Thank you sir!!! for sharing whole procedure. Waiting for the further parts...
@adwaitpatil8300 3 years ago
Helped me a lot i recently enrolled for this competition and is my first time working on more than 20 columns but the way u structured your eda great THANKS!
@ashishbhai5288 2 years ago
bro, can you please explain to me why he has taken a value less than 25 for defining discrete_features... I didn't get it
@adwaitpatil8300 2 years ago
@ashishbhai5288 I need to rewatch the video 😂 it's been a year
@ashishbhai5288 2 years ago
@adwaitpatil8300 If you get some free time, please have a look at it and explain if possible... God bless you 😁😁
@ashishbhai5288 2 years ago
@adwaitpatil8300 timestamp 18:00
@ShivShankarDutta1 4 years ago
One of best EDA, Excellent Analysis. Thanks Krish.
@ittecheval1868 3 years ago
Your explanation does not say why you are doing each step or what it does. I feel you are just doing a walkthrough, a kind of KT on what you have done. It confuses and demotivates new learners, sir. I have watched more than 40 of your videos.
@sfodjknfwoa 3 years ago
Thank you, I've learned so much already
@adityachandra2462 5 years ago
Great job done sir, it's really helpful !
@sunilsharanappa7721 4 years ago
wow superb explanation Krish. Thanks.
@deepakkumarpatel5728 4 years ago
This is the best video to date for EDA
@hiteshdamani30 5 years ago
Sir please upload video on AUC ROC metrics and Regression Error metrics..... M waiting for it since long
@arshkatyal2807 4 years ago
Hi Krish I had a doubt. I didn't understand the relationship found between dependent variables and features having missing values. Please could you explain a bit more
@Paragparashar03 2 years ago
It is a simple check: if we mark the null values as 1 and all other values as 0, does the presence of nulls affect the dependent variable? If not, we can consider dropping those values later. All other values are set to 0 so that they do not affect the result during this analysis step.
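The check described in this reply can be sketched on a toy table. This is a minimal sketch, assuming a hand-made DataFrame with hypothetical values; the column names `Alley` and `SalePrice` follow the Kaggle data, but the numbers are made up so that the effect is easy to see.

```python
import numpy as np
import pandas as pd

# Hypothetical data: houses with a missing "Alley" value happen to be pricier here.
dataset = pd.DataFrame({
    "Alley": [np.nan, "Grvl", np.nan, "Pave", np.nan, "Grvl"],
    "SalePrice": [200000, 120000, 220000, 130000, 210000, 110000],
})

data = dataset.copy()

# Replace the feature by an indicator: 1 if the value was missing, 0 otherwise.
data["Alley"] = np.where(data["Alley"].isnull(), 1, 0)

# Median SalePrice for rows where the value is present (0) vs. missing (1).
median_by_missing = data.groupby("Alley")["SalePrice"].median()
print(median_by_missing)
```

If the two medians are far apart, the missingness itself carries information about the target, which is an argument for encoding it rather than dropping it. Calling `.plot.bar()` on `median_by_missing` would draw the bar chart shown in the video.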
@rohitbharti9360 4 years ago
Very useful information for new learners..... Thanks sir
@sandipansarkar9211 4 years ago
Awesome video for beginners in data science career.Thanks
@TirtharajSen 1 year ago
There are mistakes in this tutorial: 1) Identifying the features that have NaN values: a feature should be flagged if its column has at least one NaN value, but in the video the condition is >1, which means a column needs at least two NaN values to be flagged. 2) The process for identifying discrete features is vague; there is no concrete rule for identifying discrete features.
@rohitjaiswal6102 5 years ago
Thank you so much sir for your great help to us....
@berkaycamur8282 3 years ago
Are all these percentages the share of null values (I'm talking about 07:25)? What do all those percentages mean?
@ayansrivastava722 2 years ago
If anyone is reading this comment, make sure to use ...isnull().mean()*100, i.e. multiply by 100 too! The percentage is then correct. If you run sns.heatmap(df.isnull()) and look at the second- or third-to-last feature, you'll see :)
@deepaklonare9497 1 year ago
Dataset1['LotFrontage'].isnull().sum()*100/len(Dataset1)
Dataset1['LotFrontage'].isnull().mean()*100
@rushikeshbulbule8120 5 years ago
Thanks ....you are big support for learning ......
@ashishbhai5288 2 years ago
Sir, why have you taken the threshold value as 25 to define discrete features at timestamp 18:00. how 25 can separate discrete and continuous features from each other. many of us have the same query.
@sudhanshuparab198 3 years ago
Amazing work sir
@bhooshan25 3 years ago
Learned a lot. Will watch again while practicing in Jupyter.
@SeluDisplay 2 years ago
i dont get the point of 8:20, u should just use the feature thats the most important/representative or got biggest weight in the dataset?
@geethamadhurimalempati6408 5 years ago
Can anyone explain, at 9:12, why the median? Could we use the mean, the variance, or something else? (The code comment says "let's calculate the mean value of sales price", but the code uses the median.)
@rakeshranjan9728 4 years ago
As per my understanding, if we use mean it is heavily influenced by outliers, so we prefer to go with median, ie, the middle value.
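The outlier argument in this reply can be checked with a few numbers. A minimal sketch, assuming five made-up sale prices of which one is an extreme outlier:

```python
import pandas as pd

# Four ordinary prices and one extreme outlier (hypothetical values).
prices = pd.Series([100000, 120000, 130000, 150000, 2000000])

mean_price = prices.mean()      # pulled upward by the single outlier
median_price = prices.median()  # middle value, unaffected by the outlier's size

print(mean_price)    # 500000.0
print(median_price)  # 130000.0
```

The mean lands above four of the five prices, while the median stays at a typical value. That is a common reason to summarize skewed quantities such as house prices by the median.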
@shaelanderchauhan1963 4 years ago
@rakeshranjan9728 Can we use the median in a normal distribution for anomaly detection?
@krishvedagiri-q9r 1 month ago
Content is very good. But why are we using median(), isn't it like mean() fits well in those cases, like "Finding relation between missing values vs SalesPrice" in the video.
@anshmahajan1247 3 years ago
Instead of using list comprehension, can select_dtypes() be used to select numerical, discrete and continuous features? I was wondering, specifically in the case of discrete features, whether using 25 as a threshold value is practical or not?
@nikhil_somani 3 years ago
Yes, select_dtypes() is handier and good to go: df.select_dtypes(exclude="O") in this case.
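The `select_dtypes()` suggestion can be sketched as follows. This is a minimal sketch on a hypothetical three-column table; it uses `include="number"` / `exclude="number"` rather than the `exclude="O"` form from the reply, an assumption made here because selecting by "number" does not depend on how a given pandas version stores string columns.

```python
import pandas as pd

# Hypothetical slice of the housing data: two numeric columns, one text column.
dataset = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, 80.0, 68.0],
    "Street": ["Pave", "Pave", "Grvl"],
})

# Integer and float columns.
numerical_features = dataset.select_dtypes(include="number").columns.tolist()

# Everything else (here: the text column).
categorical_features = dataset.select_dtypes(exclude="number").columns.tolist()

print(numerical_features)
print(categorical_features)
```

Note that `select_dtypes()` only separates columns by storage type; splitting numeric columns further into discrete and continuous still needs a rule such as the unique-value threshold discussed in the thread.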
@samuelnikhade5612 4 years ago
This is a very helpful video, thanks a lot !!
@haintuvn 4 years ago
Why you choose threshold of "25" in " discrete_feature=[feature for feature in numerical_features if len(dataset[feature].unique())
@thirunaidu8934 3 years ago
same question
@thirunaidu8934 3 years ago
That's experimental. If you choose 24 you get only 16 discrete features; with 25 you get 17, and with 26, 27, 28... you still get only 17, so the number of features no longer changes. That's the reason he chose 25.
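The "try several thresholds and see where the list stops changing" idea from this reply can be sketched like this. It is a minimal sketch on synthetic data: the column names mirror the Kaggle data, but the values are randomly generated, so the counts of 16 and 17 quoted above are not reproduced here. The helper name `discrete_features` is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic numeric columns (hypothetical values, not the Kaggle file).
dataset = pd.DataFrame({
    "OverallQual": rng.integers(1, 11, size=n),    # at most 10 distinct values
    "MoSold": rng.integers(1, 13, size=n),         # at most 12 distinct values
    "LotArea": rng.integers(1000, 50000, size=n),  # hundreds of distinct values
    "GrLivArea": rng.normal(1500, 400, size=n),    # hundreds of distinct values
})
numerical_features = dataset.columns.tolist()

# How many distinct values does each column take?
print(dataset.nunique().sort_values())

def discrete_features(threshold):
    """Columns with fewer distinct values than the threshold."""
    return [f for f in numerical_features
            if len(dataset[f].unique()) < threshold]

# The selected list is the same over a wide range of thresholds.
for threshold in (15, 25, 50):
    print(threshold, discrete_features(threshold))
```

When the unique-value counts fall into two well-separated groups, any threshold in the gap gives the same split, so 25 is a convenient choice rather than a special number. Printing the counts first, as above, is a quick way to see whether such a gap exists in a new dataset.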
@kudaykumar1261 4 years ago
Sir, here you differentiate the continuous and discrete variables among the numerical variables based on the count of unique values (taking 25 as the threshold). What exactly does that mean? From my understanding of your previous videos, discrete variables are whole numbers that we can count (number of schools in a city), whereas continuous variables take a range of values (height of a person, which may be a whole number or a float). So how can we differentiate these variables based on the unique() function in the code?
@Mish-333 1 month ago
Good explanation of the codes, but if you look closely, he's not explained the logic behind the codes meaning - did not explain the real reasons of why the particular steps are followed, or say what's the ultimate goal of the code/s.
@zohaibiqbal589 4 years ago
sir I tried this project for practice but my "plot bar" shows same color for every field, how can I differentiate them??
@GamerBoy-ii4jc 3 years ago
Yeah, same here. You can try the following code as well. Instead of this code:
data.groupby(feature)['SalePrice'].median().plot.bar()
you can use this one:
sns.barplot(x=feature, y='SalePrice', data=data, ci=False, estimator=np.median)
@manikumardonepudi665 5 years ago
thank you so much bro... for making a pipeline on linear regression. if possible make a pipeline on logistic also.................thank q
@mmudgal33 3 years ago
Please make a video about the hackathon project from Analytics Vidhya held last weekend. It's about employee retention. It has a lot of problems, in every part of the project.
@rupayan21 4 years ago
Any reason, why median is being considered and not average?
@bagavathypriya4628 4 years ago
Sir it's better if you also include the kaggle dataset link in the description box.
@km02-cr7 1 year ago
Yes we need kaggle dataset link
@km02-cr7 1 year ago
Plz share link sir ,it's great help
@karanarora9341 5 years ago
Great job!!
@UpendraKumar-mu7mk 3 years ago
Why are you using the median to view discrete feature statistics? Shouldn't the mean have been better?
@me_debankan4178 2 years ago
14:56 I didn't understand dataset.groupby('YrSold')['SalePrice'].median().plot(). How is it labelling the graph, and why can't we simply use plt.plot()?
@Samverma-h3z 6 months ago
Sir, why am I getting an error like "features is not defined" when finding missing values, even though I copied your code? Please guide me.
@pushpitkumar99 3 years ago
Ok i'm lost. Aren't continuous variables floating point numbers? Like they fall in some interval. But all the features you took as continuous are not floating type. Please explain.
@lakshyarajsinghrathore1902 1 year ago
why in my jupyterlab all the bars having same colour. like all the features are having the same blue color instead of different colours, all the code is same, even i have downloaded and re run the notebook.
@khushboochhabra2136 1 year ago
At 16:18 why did we subtract the year feature from year sold?
@ananthkumar8901 1 year ago
To determine the difference in years and compare it with the house sale prices, to check whether there is any relationship between them.
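The subtraction discussed here turns two calendar years into a single "age at sale" feature. A minimal sketch, assuming four hypothetical rows; `HouseAge` is a column name introduced only for this illustration (the video overwrites the year column in a copy instead).

```python
import pandas as pd

# Hypothetical rows using the Kaggle column names.
dataset = pd.DataFrame({
    "YearBuilt": [2003, 1976, 2001, 1915],
    "YrSold": [2008, 2007, 2008, 2006],
    "SalePrice": [208500, 181500, 223500, 140000],
})

data = dataset.copy()

# Age of the house at the time of sale, in years.
data["HouseAge"] = data["YrSold"] - data["YearBuilt"]

house_ages = data["HouseAge"].tolist()
age_price_corr = data["HouseAge"].corr(data["SalePrice"])

print(house_ages)       # [5, 31, 7, 91]
print(age_price_corr)   # negative for these made-up rows: older houses, lower prices
```

Plotting `HouseAge` against `SalePrice` (for example with a scatter plot) then shows directly whether older houses tend to sell for less, which neither year column shows on its own.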
@khushboochhabra2136 1 year ago
@ananthkumar8901 Can you please help me understand: to get the relationship, we could have plotted both years against the sale amount... how can the difference bring out the relationship?
@ananthkumar8901 1 year ago
@khushboochhabra2136 Plotting just the year sold and year built wouldn't tell us much on its own. We need to see the age of the house to understand how that impacts price. That's why we subtract year built from year sold. This gives us the "house age", which shows how much time has passed since construction. By analyzing this difference, we can see if older houses generally have lower prices (due to depreciation) or if newer ones cost more. Simply plotting both years without this calculation wouldn't reveal this crucial relationship.
@kushswaroop7436 2 years ago
Great Video and Explanation
@ashishbhai5288 2 years ago
Why did he take 25 as the threshold value to define discrete features at timestamp 18:00? How can 25 separate discrete and continuous features from each other? How did he decide on this value? If months are less than 25, then what happens? Did you get it? Please help me out.
@kushswaroop7436 2 years ago
@ashishbhai5288: 25 does not separate them by itself. He took 25 as a threshold on the number of unique values to select discrete features; a continuous variable would generally have hundreds of values. You can set it to 10, 50, or whatever you choose. For this dataset I selected 15 as my threshold. The discrete features have no big role to play, though, and you can skip the step altogether; I tried that, and I get 99.98% accuracy either way.
@ashishbhai5288 2 years ago
@kushswaroop7436 Thanks, bro... Your feedback is really helpful. Now I don't have any doubts.
@CptPrick 2 months ago
Hi, I have a doubt about finding the relationship between missing values and sales price. In video it is shown that feature with missing values may have high median sales price, but isn't it possible that the sales price may be affected more by other features? Will Correlation Coefficient be much better here to find the relation between these features and SalePrice?
@CptPrick 2 months ago
This is the answer I got from ChatGPT(if someone else also has the same doubt): That's a great question! You're right that the sales price may indeed be influenced by other features, but the relationship between missing values and the target variable (SalePrice) can still provide valuable insights. When we use the median sale price to compare houses with missing values (marked as 1) versus those without (marked as 0), we're checking if the presence of missing data itself is correlated with higher or lower sales prices. This can sometimes suggest that missing values are not random. For example, missing values in a feature might indicate a higher-end house where certain details weren’t recorded because they weren’t needed, leading to higher prices. However, using the correlation coefficient would indeed provide a more formal, numerical measure of how strongly missingness in a feature is associated with SalePrice. But there’s a catch: correlation only works for numerical data, and missingness is binary (1 or 0). You’d likely use something like point-biserial correlation, which measures the relationship between a binary variable (missing or not) and a continuous variable (SalePrice). Both approaches have merit. The bar chart with median sale prices helps visually identify trends, while correlation offers a more quantifiable way to assess the strength of this relationship. If you're dealing with multiple features and suspect that others may have stronger correlations, using correlation (and even more advanced methods like multivariate analysis) could give a clearer picture. In summary, median values help detect patterns, but correlation can offer a deeper, numerical perspective. You could use both methods depending on the analysis depth you're aiming for.
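The point-biserial correlation mentioned in this answer is the ordinary Pearson correlation computed with a 0/1 variable, so pandas alone is enough to try it. A minimal sketch, assuming a hypothetical six-row table; the values are made up so that missingness and price are clearly related.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows with a missing "Alley" value have higher prices.
dataset = pd.DataFrame({
    "Alley": [np.nan, "Grvl", np.nan, "Pave", np.nan, "Grvl"],
    "SalePrice": [200000, 120000, 220000, 130000, 210000, 110000],
})

is_missing = dataset["Alley"].isnull().astype(int)  # 1 = missing, 0 = present
price = dataset["SalePrice"]

# Pearson correlation between the 0/1 indicator and the price.
r_pearson = is_missing.corr(price)

# The same quantity from the point-biserial formula
# (M1 - M0) / s * sqrt(p * q), with s the population standard deviation.
m1 = price[is_missing == 1].mean()
m0 = price[is_missing == 0].mean()
p = is_missing.mean()
r_formula = (m1 - m0) / price.std(ddof=0) * np.sqrt(p * (1 - p))

print(r_pearson, r_formula)
```

A value near 0 would suggest that missingness in the feature says little about `SalePrice`; as the comment notes, neither this number nor the bar chart of medians rules out that other features drive the difference.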
@saxenarachit 5 years ago
Hi Krish, at 17:47 I am unable to understand the logic of differentiating Continuous and Discrete data. Like unique count less than 25 becomes discrete data?? Can you pl explain?
@krishnaik06 4 years ago
Yes
@saxenarachit 4 years ago
@krishnaik06 How did this happen? Any explanation of this logic, please?
@sarthaktyagi6783 4 years ago
I also have same query . Can you please explain this ?
@shaelanderchauhan1963 4 years ago
for i in numerical_features:
    #print(i,dataset[i].unique(),len(dataset[i].unique()))
    print(i,len(dataset[i].unique()))
________________________________________
for i in numerical_features:
    #print(i,dataset[i].unique(),len(dataset[i].unique()))
    if len(dataset[i].unique())
@sumatsrivastava2244 4 years ago
Sir same query please help us out
@vineethkumarsahu 3 years ago
at 6:00 , shouldn't the code be ....isnull().sum() > 0 instead of '> 1' ? if no, pls be kind enough to explain briefly. thanks! :)
@kishlayamourya3141 5 years ago
Thank you very much for this tutorial... I was trying to work on this dataset on Kaggle but got stuck; now it will go better 👍
@rajatpuri3917 3 years ago
What is the name of the dataset on kaggle
@manojrangera 3 years ago
@rajatpuri3917 Advanced house price dataset... The link is also given in the description.
@pothinenilaharipriya1321 1 year ago
Hi sir can you please explain why did you consider 25 as a limit to the discrete variable.
@prabhatdass6184 5 years ago
Hello Krish, print(feature, np.round(dataset[feature].isnull().mean(), 4), ' % missing values') How this line is calculating the % missing values, How this mean is calculating, i tried to find on excel with filtering but not able to understand how the missing value percentage is coming?
@manthanrathod1046 4 years ago
Exactly. I too am watching this video and I stopped suddenly to figure it out. I don't understand since isnull() would output a true or false value and then mean? Like mean of what? The remaining values of the column? Because the NaN would not contribute to the mean. I think it is simply calculating the mean of the remaining values ignoring the NaN values and printing but it's ofcourse not % values as stated in the video.
@varunwalvekar9595 4 years ago
@manthanrathod1046 dataset[feature].isnull().mean() gives (number of null values in the feature column / total number of values in the same column). For example, with feature = 'abc', dataset['abc'].isnull().mean() returns (number of null values in column 'abc' / total number of values in column 'abc'). This is the fraction of null values in column 'abc'; multiply by 100 to get the percentage.
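Why `isnull().mean()` yields the share of missing values can be seen step by step. A minimal sketch, assuming a hypothetical eight-row `LotFrontage` column with three missing entries:

```python
import numpy as np
import pandas as pd

# Eight rows, three of them missing (hypothetical values).
dataset = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan, 84.0, 85.0, 75.0, np.nan],
})

# isnull() gives True/False per row; True counts as 1 and False as 0.
mask = dataset["LotFrontage"].isnull()

# The mean of a boolean column is therefore (number of True) / (number of rows).
fraction = mask.mean()
by_count = mask.sum() / len(dataset)

# Multiply by 100 to report an actual percentage.
percent = np.round(fraction * 100, 2)

print(fraction)   # 0.375
print(by_count)   # 0.375
print(percent)    # 37.5
```

So the value printed in the video next to "% missing values" is a fraction between 0 and 1; the label becomes accurate once the result is multiplied by 100, as several comments point out.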
@manthanrathod1046 4 years ago
@varunwalvekar9595 Ohh okay, got it. Thanks.
@me_debankan4178 2 years ago
You can do this instead, giving the same result:
nullcol=[]
for i in dataset:
    if(((dataset[i].isnull().sum())/1460)*100!=0.0):
        print(i,' ',np.round((dataset[i].isnull().sum()/1460)*100,2),'% missing')
        nullcol.append(i)
@unsharma9229 5 years ago
thank u for everything
@RaviKumar-mu4ne 2 years ago
If I have a very large dataset with 40,000 rows and 11 columns, what should my limit on unique values be for treating a feature as discrete or continuous? Just like Krish has taken 25 as the limit here, what should I take?
@newforest9985 4 years ago
Please explain this line of code: data.groupby(feature)['SalePrice'].median().plot.bar()
@davitbuliskeria1324 4 years ago
Good question, I am also interested
@aravindp2176 4 years ago
It basically groups the rows by the given feature. Say the feature is 'Year': it groups all the rows with the same year, e.g. 2010, calculates the median of all the SalePrice values for that year, and displays it as a bar graph.
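The chain `groupby(...)[...].median()` described in this reply can be followed one step at a time. A minimal sketch, assuming six hypothetical sales spread over three years:

```python
import pandas as pd

# Hypothetical sales: three in 2006, two in 2007, one in 2008.
dataset = pd.DataFrame({
    "YrSold": [2006, 2006, 2006, 2007, 2007, 2008],
    "SalePrice": [100000, 150000, 400000, 200000, 220000, 180000],
})

# 1) groupby("YrSold") collects the rows that share the same year.
# 2) ["SalePrice"] keeps only the price column of each group.
# 3) median() reduces each group to a single number.
median_by_year = dataset.groupby("YrSold")["SalePrice"].median()

print(median_by_year)
```

The result is a Series whose index holds the group labels (the years) and whose values are the medians. Calling `.plot.bar()` on it draws one bar per index entry, and that index is where the x-axis labels in the video's charts come from.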
@ajaydanam-zi8yz 8 months ago
Sir, Actually i have some doubts regarding this video. Is there any channel or any application to discuss on these.
@UpskillingAcademy 1 month ago
Hello, My name is Eka from Upskiling. I'm really interested in the content you create, and I would like to ask for your permission to use the link to your KZbin video as a resource on the Upskiling website. Please note that Upskiling will not repost or re-upload the video from your KZbin channel; we will only be sharing the link to the video. I hope to hear from you soon. Thank you! 🙏
@shashireddy7371 4 years ago
Thanks for the great video, Krish. I have one doubt, mentioned below. Why have you taken the median of SalePrice while plotting/comparing a feature with SalePrice? E.g.:
data.groupby(feature)['SalePrice'].median().plot.bar()
plt.title(feature)
plt.show()
You have done the same while checking the relationship between missing values and SalePrice, and between discrete features and SalePrice. We could directly plot missing values vs SalePrice and discrete features vs SalePrice, so what is the reason for taking the median of SalePrice? Please help me understand this. Thanks.
@mlanhenke 4 years ago
I was wondering the same, especially when the annotation/comment explicitly tells 'compare to the mean sale-price'. As a comment further down states: the use of median prevents the influence of outliers compared to the mean, which made sense to me
@-Raviteja-fn2oi 2 years ago
I am getting a bar plot only for MiscFeature; the other features are not appearing. What should I do?
@louerleseigneur4532 3 years ago
Thanks Krish
@vidityatyagi2748 4 years ago
Hi, You have imputed the GarageYrBlt column with Median value. This column is a DateTime type, does it makes sense to impute such column with Median value ?
@sheikhsalman1873 4 years ago
I just copied the same code to work with another house dataset, but when I start using the for-loop code for the % of null values, it's not working. Help me, please.
@YOGESHMULEY-n1j 1 year ago
if we dont have any variable columns then ..?
@amolsawle2868 4 years ago
Hi sir, Your videos are very good. Thank you for making it. Sir, Can we use seaborn lib to plot graph using for loop and why did you take 25 for extracting discrete data Thank you.
@ashishbhai5288 2 years ago
bro, I also have the same question... have you got the answer
@ashishbhai5288 2 years ago
why did he take 25 as the threshold value
@reegee8321 4 years ago
Hi Krish, please can you do a video on image dataset, augmentation and class decomposition of images. Thanks
@abdurahman1019 1 year ago
I am a beginner at data science and python as well. Is it expected from me to be able to write all these code by myself or is the understanding enough for an interview?
@bharathjc4700 5 years ago
Hi Sir, what statistical tests should we perform as soon we get the data
@nallalarajureddy8550 3 years ago
sir can you please provide literature survey report for this project i.e for house price prediction project
@8th_gen 4 years ago
sir, Im a student working on a machine learning project to predict the delivery of goods (early or late) if its ordered again in the future.Since I have a lot of variables like weather ,road closure etc, which model best suits to predict the delivery
@dhirendrasingh7379 4 years ago
If the variables are non-linear and linear, try decision-tree-based models (RF, XGBoost).
@alankritreddy-e8z 1 year ago
Why do we write None, and what does it mean, in pd.pandas.set_option('display.max_columns', None)?
@ananthkumar8901 1 year ago
None: In this context, setting it to None means there is no limit on the number of columns to be displayed. It effectively tells Pandas to display all columns when showing a DataFrame. Source : ChatGPT 🙂
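The effect of passing `None` can be tried on a wide table. A minimal sketch, assuming a hypothetical 30-column DataFrame; it uses `pd.set_option` directly, which is the usual spelling of the `pd.pandas.set_option` call quoted in the question.

```python
import pandas as pd

# A single-row table with 30 columns (hypothetical).
wide = pd.DataFrame([[0] * 30], columns=[f"col{i}" for i in range(30)])

# None means "no limit": every column is shown when the frame is displayed.
pd.set_option("display.max_columns", None)
unlimited = pd.get_option("display.max_columns")
print(unlimited)  # None
print(wide)       # all 30 columns, none replaced by "..."

# Restore the default so the setting does not leak into other code.
pd.reset_option("display.max_columns")
```

With the default setting, pandas may elide the middle columns of a wide frame with `...`, which is inconvenient for a dataset with around 80 columns like this one.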
@jyotishranjanmallik2356 4 years ago
Sir, How do u get to know the parameter for choosing Discrete feature is those whose unique values is less than 25, incase some feature are discreet but if they contain 26 or 27 uniques values, then our code reject those features. plz guide.
@jyotishranjanmallik2356 4 years ago
Understood, sir: we have to keep checking until no new feature gets added to the discrete_feature list. Up to 24 it has 16 features, at 25 it has 17, and after that no new discrete feature gets added to the list. You are awesome, sir.
@bendivanitha7211 2 years ago
Can we get accurate prediction with only one feature available plz reply
@arvindkumar-ug1zf 5 years ago
Great Sir !
@jitendrapradhan8495 1 year ago
Hello Krish, I am following you for the last year for this ML. will these all-listed EDA videos help with a classification problem statement? if not do you have any FE,FS videos for both classification and Regression ML?
@niranjandhavan 4 years ago
Great Content.
@Manojrohtela 3 years ago
How bar plot is displaying in diff color ?
@shashankpandey1966 3 years ago
is feature here is inbuilt ???can someone please help me.
@hyperaktive100 3 years ago
Hi Sir, one doubt. Why is the median Sales Price being considered in most bar plots?
@karanmthevar 4 years ago
Hi Krish Great Work. I am facing an issue. My bar graphs are showing same color as it is taking sale price as legend and not the other features. How do we change it?
@GauravMehra-up5ly 4 years ago
yes same plz tell if you solved that problem
@yashsharma4521 4 years ago
Hello sir, I am a big fan of your work, but I have a question. Doesn't a data science project start with the question? Like: business problem > data extraction > then all the parts mentioned in the video. Please help me understand this. Thanks, Yash Sharma
@parikshitrajpara5706 4 years ago
he is of no use replies to no one
@sartyakimanna9986 4 years ago
why we are taking median for calculating
@PriyaAmar848 3 years ago
hiiiii, How are you doing Krish ? I could see different colors for barplot in every feature versus SalePrice plot. But I could not see the relevant code of color control within barplot. Could you please share. Tried searching in internet too. But the code uses range function for generating different colors. appreciate sharing the code.
@m.b.9496 4 years ago
Hey guys, when I Plot my Bars, i get all the bars in the same colour, how did he do it?
@sunilsharanappa7721 4 years ago
Usually, when you plot from a DataFrame the bars get the same color; if you plot a Series, you get different colors. You can use the code below to see the same graph in different colors.
plt.figure(figsize=(20,25))
i=0
LabelName_with_na=[features for features in house.columns if house[features].isnull().sum()>1]
for feature in LabelName_with_na:
    data = house.copy()
    # let's make a variable that indicates 1 if the observation was missing or zero otherwise
    data[feature] = np.where(data[feature].isnull(), 1, 0)
    # let's calculate the median SalePrice where the information is missing or present
    bplot=data.groupby(feature)['SalePrice'].median()
    if i!=16:
        plt.subplot(4,4,i+1)
        plt.title(feature)
        bplot.plot.bar(color=plt.cm.Paired(np.arange(len(data))))
    i=i+1
@piratedartist332 4 years ago
Hi Krish, I just have a doubt is that why did you choose the median over here "data.groupby(feature)['SalePrice'].median().plot.bar()" why can't we go for count??
@harishkumar-zx6vg 4 years ago
Because of outliers
@Sathwik-fh8uc 9 months ago
while finding the SalePrice with respect to YrSold why median(), why not mean(). pls can anyone explain why
@moulavb3932 5 years ago
for feature in feature_na:
    print(feature,np.round(data[feature].isnull().mean(),4),"% missing value")
In this code, what is the use of mean(),4?
@adityachandra2462 5 years ago
It will return the mean with 4 digits after decimal point for features having null values
@liyekting2189 4 years ago
@adityachandra2462 What does the mean mean here? How does it relate to the percentage of missing values? Sorry, I didn't get it.
@varunwalvekar9595 4 years ago
@liyekting2189 dataset[feature].isnull().mean() gives (number of null values in the feature column / total number of values in the same column). For example, with feature = 'abc', dataset['abc'].isnull().mean() returns (number of null values in column 'abc' / total number of values in column 'abc'). This is the fraction of null values in column 'abc'; multiply by 100 to get the percentage.
@keshavsharma-pq4vc 4 years ago
The 4 is the number of decimal places to keep; e.g. 54.9898 has 4 digits after the decimal point.
@shivambhayre5056 5 years ago
Sir why do we need to copy data i mean in this code many times you have copy data why??
@amritgurung3410 5 years ago
It makes it easier to access the values of particular features for plotting or other analysis; otherwise we would only have a list of feature names with no access to their values. We could also use the DataFrame variable that was used to read the CSV file instead of copying it. Making a separate copy like data = dataset.copy() is simply for ease. Without the copy, we can just use the dataset variable, which represents the DataFrame, e.g. dataset[feature].
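One practical effect of `data = dataset.copy()` in the video's loops is that columns overwritten during the analysis do not change the original table. A minimal sketch of that effect, assuming a hypothetical three-row DataFrame; this is offered as one likely reason for the copy, not as the author's stated one.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; "Alley" has two missing values.
dataset = pd.DataFrame({
    "Alley": [np.nan, "Grvl", np.nan],
    "SalePrice": [200000, 120000, 220000],
})

# Work on a copy, then overwrite the column with a 0/1 missing indicator.
data = dataset.copy()
data["Alley"] = np.where(data["Alley"].isnull(), 1, 0)

# The original still holds the raw values, including its NaNs.
missing_in_original = int(dataset["Alley"].isnull().sum())
indicator_in_copy = data["Alley"].tolist()

print(missing_in_original)  # 2
print(indicator_in_copy)    # [1, 0, 1]
```

Had the code used `data = dataset` instead, both names would refer to the same object and the raw `Alley` values would be lost after the first loop iteration.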
@keerthivasan1277 5 years ago
Good job sir
@RS-cn2cf 4 years ago
hi, can somebody tell how these bar graphs for '## Lets Find the realtionship between them and Sale PRice' are colored, I cannot see any code for then in repository.
@GauravMehra-up5ly 4 years ago
yeah same my bar graphs are not showing different colors
@keshavsharma-pq4vc 4 years ago
you can use seaborn library sns.barplot(x=feature,y='SalePrice',data=data,ci=False)
@MMOI96 2 years ago
where can i find the dataset??
@vinr 3 years ago
Really good video