Sir, this is one of your best video series. Everything is implemented together, not in pieces. Can't wait to watch the whole series together. Hope you upload all the remaining parts soon. Great work.
@theoutlet93004 жыл бұрын
this is so clean. i cant look at my nb now. this series is a gem. i have learned so much
@vengalrao57723 жыл бұрын
Bro, I want to DM you, I need a roadmap for data science... please suggest one. I am fully confused and I am doing self-study.
@amritgurung34105 жыл бұрын
This is third notebook i read for this topic from kaggle. And yours was simplest to understand with straightforward code. Thank you for this notebook. Highly appreciated!!
@shivaprakashranga86885 жыл бұрын
At 5:26 the code should be feature_withNa = [fea for fea in dataset.columns if dataset[fea].isnull().sum()>=1], because Electrical (a feature) has exactly one missing value.
@arjyabasu13114 жыл бұрын
Yes i agree with you
@piratedartist3324 жыл бұрын
If a feature has only one missing value, we can ignore it and simply fill it in. Maybe that is why he used sum()>1.
@iamjm_rishav78944 жыл бұрын
could you tell me if its necessary to examine the data before even starting the notebook!!
@tsungchewu4 жыл бұрын
Shivaprakash, I totally agree with you. That's why I get 19 features with null values in total, while Krish only had 18.
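To make the >1 versus >=1 point in this thread concrete, here is a minimal sketch on made-up data. The three-column frame and its values are assumptions for illustration only, not the actual Kaggle dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): "Electrical" has exactly one missing value.
dataset = pd.DataFrame({
    "Electrical": ["SBrkr", np.nan, "FuseA", "SBrkr"],
    "LotFrontage": [65.0, np.nan, np.nan, 60.0],
    "SalePrice": [208500, 181500, 223500, 140000],
})

na_counts = dataset.isnull().sum()

# A strict ">1" skips columns that have a single missing value.
more_than_one = [f for f in dataset.columns if na_counts[f] > 1]
# ">=1" (equivalently ">0") keeps every column with any missing value.
at_least_one = [f for f in dataset.columns if na_counts[f] >= 1]

print(more_than_one)
print(at_least_one)
```

On the real data this would be consistent with the 18 versus 19 features reported above, assuming Electrical is the only column with exactly one missing value.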
@Aman-yu4re7 ай бұрын
At the start, where Sir computes the percentage of missing values: just to clear up any misunderstanding, those are not actually percentages, they are ratios of missing values to all values.
@srujanachalla87974 жыл бұрын
Hi Krish, this series was too good,end-end implementation for solving a problem. Followed the same steps in my Project and it had lot of appreciation from my Managers. All Credit goes to you. You are just Awesome .Thank You so much.
@harshstrum5 жыл бұрын
Thank you krish bhaiya lots and lots of blessings...Please never stop doing this...I have completed two projects by seeing your videos and got learn a lot many things from you. Thanks again.
@junaidarshad84659 ай бұрын
For the mean calculation to find out the missing values, do you need to multiply by 100 to get the percentage?
@amiraaliIsInLove3 жыл бұрын
I can't believe I've never come across your channel before! Thank you so much! 😍😍😍😍😍😍😍
@abhishek-shrm5 жыл бұрын
Sir how you make 2 videos every day? It has now become a new habit of mine to eat, code, sleep and watch your videos. I love your videos.
@dentrifications3 жыл бұрын
@@saransh5760 yes
@dentrifications3 жыл бұрын
Did u get a job now
@mungarasaikishore32483 жыл бұрын
Sir , thank you very much for the videos ,these are very help full, more than some paid online courses.
@elukasreeja72084 жыл бұрын
This is amazing got to learn many possible things and tricks in eda. Thanks a lot , your videos are just awesome!! have been following regularly
@nikhil_somani3 жыл бұрын
The missing-value percentage can be checked with simpler code: df.isna().sum() * 100 / len(df)
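A small sketch of the one-liner suggested above, run on made-up data (column names borrowed from the house-price dataset, values invented). It also shows that the ratio from isna().mean() times 100 gives the same numbers.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values).
df = pd.DataFrame({
    "Alley": [np.nan, np.nan, np.nan, "Pave"],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "SalePrice": [208500, 181500, 223500, 140000],
})

ratio = df.isna().mean()                   # fraction of missing values per column
percent = df.isna().sum() * 100 / len(df)  # the same thing as a percentage

# Keep only columns that actually have missing values, largest first.
missing = percent[percent > 0].sort_values(ascending=False).round(2)
print(missing)
```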
@deepaklonare9497 Жыл бұрын
thanks😊
@snehalsanap17502 жыл бұрын
sir what an explanation, this was by far the best video on eda loved it... you are truly an inspiration sir! keep going
@ashishbhai52882 жыл бұрын
why he has taken the threshold value as 25 to define discrete features at timestamp 18:00? how 25 can separate discrete and continuous features from each other? how he has decided this value? if months are less than 25 then what happens? did you get it? please help me out
@manusreeg49993 жыл бұрын
'str' object is not callable: I'm getting this error at 7:55 after using the code. I'm a beginner, can anyone help me please?
@salihsarii Жыл бұрын
This is the best EDA series . Thanks Krish :)
@mohit000singh5 жыл бұрын
Thank you sir!!! for sharing whole procedure. Waiting for the further parts...
@adwaitpatil83003 жыл бұрын
Helped me a lot i recently enrolled for this competition and is my first time working on more than 20 columns but the way u structured your eda great THANKS!
@ashishbhai52882 жыл бұрын
bro, can you please explain to me why he has taken a value less than 25 for defining discrete_features... I didn't get it
@adwaitpatil83002 жыл бұрын
@@ashishbhai5288 I need to rewatch the video 😂 it's been a year
@ashishbhai52882 жыл бұрын
@@adwaitpatil8300 if you get some free time then please have a look over it and explain if possible.... dev tujha bhala karo😁😁
@ashishbhai52882 жыл бұрын
@@adwaitpatil8300 timetsamp 18:00
@ShivShankarDutta14 жыл бұрын
One of best EDA, Excellent Analysis. Thanks Krish.
@ittecheval18683 жыл бұрын
Your explanation doesn't say why you are doing what you do in each step. I feel you are just giving a walkthrough, a kind of KT on what you have done. It confuses and demotivates the new learner, sir. I have watched 40+ of your videos.
@sfodjknfwoa3 жыл бұрын
Thank you, I've learned so much already
@adityachandra24625 жыл бұрын
Great job done sir, it's really helpful !
@sunilsharanappa77214 жыл бұрын
wow superb explanation Krish. Thanks.
@deepakkumarpatel57284 жыл бұрын
This is the best video till date for EDA
@hiteshdamani305 жыл бұрын
Sir please upload video on AUC ROC metrics and Regression Error metrics..... M waiting for it since long
@arshkatyal28074 жыл бұрын
Hi Krish I had a doubt. I didn't understand the relationship found between dependent variables and features having missing values. Please could you explain a bit more
@Paragparashar032 жыл бұрын
It is a simple analysis: rows where the feature is null are marked as 1, and we check whether being null affects the dependent variable. If it does not, we can consider dropping those values later. All other rows are marked as 0 so that their actual values don't affect the result during this analysis step.
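The step described in this reply can be sketched roughly as follows. The data is invented; the pattern (flag missing rows as 1, everything else as 0, then compare the median SalePrice per flag) follows the code quoted elsewhere in these comments.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values).
dataset = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan, 60.0],
    "SalePrice": [200000, 260000, 180000, 240000, 190000],
})

data = dataset.copy()
# 1 where the observation is missing, 0 otherwise.
data["LotFrontage"] = np.where(data["LotFrontage"].isnull(), 1, 0)

# Median sale price for rows with (1) and without (0) a missing value.
median_by_flag = data.groupby("LotFrontage")["SalePrice"].median()
print(median_by_flag)
```

If the two medians differ noticeably, missingness itself may carry information about the price; on its own this does not prove a causal link.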
@rohitbharti93604 жыл бұрын
Very useful information for new learners..... Thanks sir
@sandipansarkar92114 жыл бұрын
Awesome video for beginners in data science career.Thanks
@TirtharajSen Жыл бұрын
There are mistakes in this tutorial: 1) The identification of the features that have NaN values. A feature should be flagged if there is at least 1 NaN value in the column, but in the video it is done with >1, which means a column needs at least 2 NaN values to be flagged. 2) The discrete value identification process is vague; there is no concrete rule given for identifying discrete features.
@rohitjaiswal61025 жыл бұрын
Thank you so much sir for your great help to us....
@berkaycamur82823 жыл бұрын
Are all these percentage expressions an indication of a null expression?(I'm talking about 07:25 ). What does mean all those percentages?
@ayansrivastava7222 жыл бұрын
If anyone is reading this comment, make sure you use ...isnull().mean()*100, i.e. multiply by 100 too! The percentage is then correct. If you do sns.heatmap(df.isnull()) and look at the second-to-last or third-to-last feature, you'll know :)
Thanks ....you are big support for learning ......
@ashishbhai52882 жыл бұрын
Sir, why have you taken the threshold value as 25 to define discrete features at timestamp 18:00. how 25 can separate discrete and continuous features from each other. many of us have the same query.
@sudhanshuparab1983 жыл бұрын
Amazing work sir
@bhooshan253 жыл бұрын
Learned a lot. Will watch again while practicing in Jupyter.
@SeluDisplay2 жыл бұрын
i dont get the point of 8:20, u should just use the feature thats the most important/representative or got biggest weight in the dataset?
@ashishbhai52882 жыл бұрын
why he has taken the threshold value as 25 to define discrete features at timestamp 18:00? how 25 can separate discrete and continuous features from each other? how he has decided this value? if months are less than 25 then what happens? did you get it? please help me out
@geethamadhurimalempati64085 жыл бұрын
Can anyone explain, at 9:12, why median? Could we use mean or variance or something else? (In the comments it is written as "let's calculate the mean value of sales price", but in the code it is written as median.)
@rakeshranjan97284 жыл бұрын
As per my understanding, if we use mean it is heavily influenced by outliers, so we prefer to go with median, ie, the middle value.
@shaelanderchauhan19634 жыл бұрын
@@rakeshranjan9728can we use the median in Normal Distribution for Anomaly detection?
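A quick illustration of the outlier argument made in this thread, using invented prices: one very expensive house drags the mean far from the typical value, while the median barely notices it.

```python
import pandas as pd

# Hypothetical sale prices; the last one is an outlier.
prices = pd.Series([100000, 120000, 130000, 150000, 2000000])

mean_price = prices.mean()      # pulled upward by the outlier
median_price = prices.median()  # middle value, robust to the outlier

print(mean_price, median_price)
```

House prices are often right-skewed, which is a common reason to prefer the median in summaries like these; whether that was the motivation in the video is an assumption.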
@krishvedagiri-q9rАй бұрын
Content is very good. But why are we using median()? Wouldn't mean() fit well in cases like "finding the relation between missing values and SalePrice" in the video?
@anshmahajan12473 жыл бұрын
Instead of using list comprehension, can select_dtypes() be used to select numerical, discrete and continuous features? I was wondering, specifically in the case of discrete features, whether using 25 as a threshold value is practical or not?
@nikhil_somani3 жыл бұрын
yes select_dtypes() is more handy and good to go. df.select_dtypes(exclude="O") in this case
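A minimal sketch of the select_dtypes() idea from this exchange, on an invented three-column frame. It uses include="number" and exclude="number" rather than the "O" shorthand, which is a choice made here for clarity and not what the video does.

```python
import pandas as pd

# Toy frame (hypothetical values).
df = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL"],
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, 80.0, 68.0],
})

# Numeric columns (ints and floats).
numerical = df.select_dtypes(include="number").columns.tolist()
# Everything that is not numeric, here the string column.
categorical = df.select_dtypes(exclude="number").columns.tolist()

print(numerical)
print(categorical)
```

Note that select_dtypes() only separates by dtype; splitting numeric columns further into discrete and continuous still needs a rule such as a unique-value threshold.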
@samuelnikhade56124 жыл бұрын
This is a very helpful video, thanks a lot !!
@haintuvn4 жыл бұрын
Why you choose threshold of "25" in " discrete_feature=[feature for feature in numerical_features if len(dataset[feature].unique())
@thirunaidu89343 жыл бұрын
same question
@thirunaidu89343 жыл бұрын
That's experimental. If you choose 24 you get only 16 discrete features, with 25 you get 17, and with 26, 27, 28... you still get only 17, so the number of features stops changing. That's the reason he chose 25.
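The "try a few thresholds and see when the list stops changing" approach described above could look something like this. The columns and their value patterns are invented so that the unique counts are known in advance (12, 10 and 200).

```python
import numpy as np
import pandas as pd

n = 200
# Toy numeric columns (hypothetical): 12, 10 and 200 unique values.
dataset = pd.DataFrame({
    "MoSold": np.arange(n) % 12 + 1,
    "OverallQual": np.arange(n) % 10 + 1,
    "LotArea": np.arange(n) * 37 + 1000,
})

# nunique() ignores NaN; len(col.unique()) would count NaN as a value.
unique_counts = dataset.nunique()

results = {}
for threshold in (5, 11, 25, 50):
    results[threshold] = unique_counts[unique_counts < threshold].index.tolist()
    print(threshold, results[threshold])
```

The threshold is a heuristic, not a definition: it works when discrete columns have few distinct values and continuous ones have many, and it should be re-checked for each dataset.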
@kudaykumar12614 жыл бұрын
Sir, here you differentiate the continuous and discrete variables among the numerical variables based on the count of unique values (taking 25 as the threshold). What exactly does that mean? From my understanding of your previous videos, discrete variables are whole numbers that we can count (number of schools in a city), whereas continuous variables take a range of values (height of a person, which may be whole numbers or floats). So how can we differentiate these variables based on the unique() function in the code?
@Mish-333Ай бұрын
Good explanation of the code, but if you look closely, he has not explained the logic behind it, meaning he did not explain the real reasons why particular steps are followed, or what the ultimate goal of the code is.
@zohaibiqbal5894 жыл бұрын
sir I tried this project for practice but my "plot bar" shows same color for every field, how can I differentiate them??
@GamerBoy-ii4jc3 жыл бұрын
Yeah, same here. Instead of this code: data.groupby(feature)['SalePrice'].median().plot.bar() you can try this one: sns.barplot(x=feature, y='SalePrice', data=data, ci=False, estimator=np.median)
@manikumardonepudi6655 жыл бұрын
thank you so much bro... for making a pipeline on linear regression. if possible make a pipeline on logistic also.................thank q
@mmudgal333 жыл бұрын
Please make a video about the hackathon project from Analytics Vidhya held last weekend. It's about employee retention. It has a lot of problems, in every part of the project.
@rupayan214 жыл бұрын
Any reason, why median is being considered and not average?
@bagavathypriya46284 жыл бұрын
Sir it's better if you also include the kaggle dataset link in the description box.
@km02-cr7 Жыл бұрын
Yes we need kaggle dataset link
@km02-cr7 Жыл бұрын
Plz share link sir ,it's great help
@karanarora93415 жыл бұрын
Great job!!
@UpendraKumar-mu7mk3 жыл бұрын
Why are you using the median to view discrete feature statistics? Wouldn't the mean have been better?
@me_debankan41782 жыл бұрын
14:56 I didn't understand dataset.groupby('YrSold')['SalePrice'].median().plot(): how is it labelling the graph, and why can't we simply use plt.plot()?
@Samverma-h3z6 ай бұрын
Sir, why am I getting an error when finding missing values, like "features is not defined", even though I copied your code? Please guide me.
@pushpitkumar993 жыл бұрын
Ok i'm lost. Aren't continuous variables floating point numbers? Like they fall in some interval. But all the features you took as continuous are not floating type. Please explain.
@lakshyarajsinghrathore1902 Жыл бұрын
why in my jupyterlab all the bars having same colour. like all the features are having the same blue color instead of different colours, all the code is same, even i have downloaded and re run the notebook.
@khushboochhabra2136 Жыл бұрын
At 16:18 why did we subtract the year feature from year sold?
@ananthkumar8901 Жыл бұрын
To determine the difference in years and compare it with the house sale price, to check whether there is any relationship between them.
@khushboochhabra2136 Жыл бұрын
@@ananthkumar8901 can you please help me understand: to get the relationship, we could have plotted both years and the sales amount....how can difference bring up the relationship?
@ananthkumar8901 Жыл бұрын
@@khushboochhabra2136 Plotting just the year sold and year built wouldn't tell us much on its own. We need to see the age of the house to understand how that impacts price. That's why we subtract year built from year sold. This gives us the "house age", which shows how much time has passed since construction. By analyzing this difference, we can see if older houses generally have lower prices (due to depreciation) or if newer ones cost more. Simply plotting both years without this calculation wouldn't reveal this crucial relationship.
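The "house age" idea from this reply, as a small sketch on invented rows. The column name HouseAge is hypothetical and only used here for illustration; the video overwrites the year feature instead.

```python
import pandas as pd

# Toy rows (hypothetical values).
dataset = pd.DataFrame({
    "YearBuilt": [2003, 1976, 1915, 2006],
    "YrSold": [2008, 2007, 2006, 2006],
    "SalePrice": [208500, 181500, 140000, 250000],
})

data = dataset.copy()
# Years between construction and sale.
data["HouseAge"] = data["YrSold"] - data["YearBuilt"]

print(data[["HouseAge", "SalePrice"]].sort_values("HouseAge"))
# A negative correlation here would mean older houses tend to sell for less.
print(data["HouseAge"].corr(data["SalePrice"]))
```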
@kushswaroop74362 жыл бұрын
Great Video and Explanation
@ashishbhai52882 жыл бұрын
why he has taken the threshold value as 25 to define discrete features at timestamp 18:00? how 25 can separate discrete and continuous features from each other? how he has decided this value? if months are less than 25 then what happens? did you get it? please help me out
@kushswaroop74362 жыл бұрын
@@ashishbhai5288 : 25 is not separating anything by itself. He means that to select discrete features he took 25 as the threshold on the number of unique values; generally a continuous variable would have hundreds of values. You can make it 10, 50, whatever you choose. For this dataset I selected 15 as my threshold. The discrete features don't play a big role, so you can skip the step altogether; I tried that and I am getting 99.98% accuracy both ways.
@ashishbhai52882 жыл бұрын
@@kushswaroop7436 thanks bro ...Your feedback is really helpful. Now I don't have any doubt.
@CptPrick2 ай бұрын
Hi, I have a doubt about finding the relationship between missing values and sales price. In video it is shown that feature with missing values may have high median sales price, but isn't it possible that the sales price may be affected more by other features? Will Correlation Coefficient be much better here to find the relation between these features and SalePrice?
@CptPrick2 ай бұрын
This is the answer I got from ChatGPT(if someone else also has the same doubt): That's a great question! You're right that the sales price may indeed be influenced by other features, but the relationship between missing values and the target variable (SalePrice) can still provide valuable insights. When we use the median sale price to compare houses with missing values (marked as 1) versus those without (marked as 0), we're checking if the presence of missing data itself is correlated with higher or lower sales prices. This can sometimes suggest that missing values are not random. For example, missing values in a feature might indicate a higher-end house where certain details weren’t recorded because they weren’t needed, leading to higher prices. However, using the correlation coefficient would indeed provide a more formal, numerical measure of how strongly missingness in a feature is associated with SalePrice. But there’s a catch: correlation only works for numerical data, and missingness is binary (1 or 0). You’d likely use something like point-biserial correlation, which measures the relationship between a binary variable (missing or not) and a continuous variable (SalePrice). Both approaches have merit. The bar chart with median sale prices helps visually identify trends, while correlation offers a more quantifiable way to assess the strength of this relationship. If you're dealing with multiple features and suspect that others may have stronger correlations, using correlation (and even more advanced methods like multivariate analysis) could give a clearer picture. In summary, median values help detect patterns, but correlation can offer a deeper, numerical perspective. You could use both methods depending on the analysis depth you're aiming for.
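The point-biserial correlation mentioned in this answer is numerically the same as the Pearson correlation between a 0/1 indicator and a continuous variable, so it can be sketched with plain NumPy. The data below is invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values).
dataset = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan, 60.0, 70.0],
    "SalePrice": [200000, 260000, 180000, 240000, 190000, 210000],
})

# 1 where LotFrontage is missing, 0 otherwise.
is_missing = dataset["LotFrontage"].isnull().astype(int)

# Pearson correlation with a binary variable equals the point-biserial coefficient.
r = np.corrcoef(is_missing, dataset["SalePrice"])[0, 1]
print(round(r, 3))
```

A value near 0 would suggest missingness is unrelated to price; with only six rows, as here, the number is purely illustrative.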
@saxenarachit5 жыл бұрын
Hi Krish, at 17:47 I am unable to understand the logic of differentiating Continuous and Discrete data. Like unique count less than 25 becomes discrete data?? Can you pl explain?
@krishnaik064 жыл бұрын
Yes
@saxenarachit4 жыл бұрын
@@krishnaik06 how this happened? Any explanation to this logic please?
@sarthaktyagi67834 жыл бұрын
I also have same query . Can you please explain this ?
@shaelanderchauhan19634 жыл бұрын
for i in numerical_features: #print(i,dataset[i].unique(),len(dataset[i].unique())) print(i,len(dataset[i].unique())) ________________________________________ for i in numerical_features: #print(i,dataset[i].unique(),len(dataset[i].unique())) if len(dataset[i].unique())
@sumatsrivastava22444 жыл бұрын
Sir same query please help us out
@vineethkumarsahu3 жыл бұрын
at 6:00 , shouldn't the code be ....isnull().sum() > 0 instead of '> 1' ? if no, pls be kind enough to explain briefly. thanks! :)
@kishlayamourya31415 жыл бұрын
Thank you very much for this tutorial.... I was trying to work on this data set on kaggle but got struck but now it will be better👍
@rajatpuri39173 жыл бұрын
What is the name of the dataset on kaggle
@manojrangera3 жыл бұрын
@@rajatpuri3917 advanced house price dataset... The link is also given in the description.
@pothinenilaharipriya1321 Жыл бұрын
Hi sir can you please explain why did you consider 25 as a limit to the discrete variable.
@prabhatdass61845 жыл бұрын
Hello Krish, print(feature, np.round(dataset[feature].isnull().mean(), 4), ' % missing values') How this line is calculating the % missing values, How this mean is calculating, i tried to find on excel with filtering but not able to understand how the missing value percentage is coming?
@manthanrathod10464 жыл бұрын
Exactly. I too was watching this video and stopped suddenly to figure it out. I don't understand, since isnull() outputs True or False values, and then mean? Mean of what? The remaining values of the column? Because the NaN would not contribute to the mean. I think it is simply calculating the mean of the remaining values, ignoring the NaN values, and printing that, but it's of course not the % values stated in the video.
@varunwalvekar95954 жыл бұрын
@@manthanrathod1046 dataset[feature].isnull().mean() gives (number of null values in the feature column / total number of values in the same column). For example, say feature = 'abc': dataset['abc'].isnull().mean() will give (number of null values in the 'abc' column / total number of values in the 'abc' column). This is the fraction of null values in column 'abc'; multiply by 100 to get the percentage.
@manthanrathod10464 жыл бұрын
@@varunwalvekar9595 ohh okayy. Got it Thanks
@me_debankan41782 жыл бұрын
You can do this instead, giving the same result:
nullcol=[]
for i in dataset:
    if(((dataset[i].isnull().sum())/1460)*100!=0.0):
        print(i,' ',np.round((dataset[i].isnull().sum()/1460)*100,2),'% missing')
        nullcol.append(i)
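To see why taking the mean of isnull() works, as explained in this thread, here is a tiny step-by-step sketch on an invented five-value column. True counts as 1 and False as 0, so the mean of the mask is the share of missing values.

```python
import numpy as np
import pandas as pd

# Toy column (hypothetical): 3 of 5 values are missing.
col = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan])

mask = col.isnull()                    # True where the value is missing
fraction = mask.mean()                 # True counts as 1, False as 0
by_hand = mask.sum() / len(col)        # same thing written out
percent = np.round(fraction * 100, 2)  # multiply by 100 for a percentage

print(mask.tolist(), fraction, by_hand, percent)
```

Using len(col) instead of a hard-coded row count such as 1460 keeps the calculation correct if the data changes size.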
@unsharma92295 жыл бұрын
thank u for everything
@RaviKumar-mu4ne2 жыл бұрын
If I have a very large dataset with 40000 rows and 11 columns, what should my unique-values limit be to consider a feature discrete or continuous? Just like Krish has taken 25 as the limit here, what should I take?
@newforest99854 жыл бұрын
Please explain this line of code: data.groupby(feature)['SalePrice'].median().plot.bar()
@davitbuliskeria13244 жыл бұрын
Good question, I am also interested
@aravindp21764 жыл бұрын
It basically groups all the rows by the given feature. Say the feature is 'Year': it will group all the rows with the same year, e.g. 2010. Then it calculates the median of all the sale price values for each year and displays the result as a bar graph.
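Breaking that one-liner into its steps may help. This sketch uses invented rows and stops before plotting; calling .plot.bar() on the final Series would draw one bar per group, with the group labels (the Series index) on the x-axis.

```python
import pandas as pd

# Toy rows (hypothetical values).
data = pd.DataFrame({
    "YrSold": [2006, 2006, 2007, 2007, 2007],
    "SalePrice": [100000, 300000, 150000, 160000, 500000],
})

grouped = data.groupby("YrSold")       # one group of rows per distinct year
prices = grouped["SalePrice"]          # keep only the SalePrice column in each group
median_price = prices.median()         # Series indexed by YrSold

print(median_price)
```

Because the result is indexed by the grouping column, pandas can label the axis on its own, which a bare plt.plot() call on raw values would not do.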
@ajaydanam-zi8yz8 ай бұрын
Sir, Actually i have some doubts regarding this video. Is there any channel or any application to discuss on these.
@UpskillingAcademyАй бұрын
Hello, My name is Eka from Upskiling. I'm really interested in the content you create, and I would like to ask for your permission to use the link to your KZbin video as a resource on the Upskiling website. Please note that Upskiling will not repost or re-upload the video from your KZbin channel; we will only be sharing the link to the video. I hope to hear from you soon. Thank you! 🙏
@shashireddy73714 жыл бұрын
Thanks for the great video, Krish. I have one doubt, mentioned below. Why have you taken the median of SalePrice while plotting / comparing a feature with SalePrice? E.g.: data.groupby(feature)['SalePrice'].median().plot.bar() plt.title(feature) plt.show() You have done the same while checking the relationship between missing values and SalePrice, and between discrete features and SalePrice. We could directly plot missing values vs SalePrice and discrete features vs SalePrice, so what is the reason for taking the median of SalePrice? Please help me understand this. Thanks
@mlanhenke4 жыл бұрын
I was wondering the same, especially when the annotation/comment explicitly tells 'compare to the mean sale-price'. As a comment further down states: the use of median prevents the influence of outliers compared to the mean, which made sense to me
@-Raviteja-fn2oi2 жыл бұрын
I am getting a bar plot only for MiscFeature; the other features are not appearing. What should I do?
@louerleseigneur45323 жыл бұрын
Thanks Krish
@vidityatyagi27484 жыл бұрын
Hi, You have imputed the GarageYrBlt column with Median value. This column is a DateTime type, does it makes sense to impute such column with Median value ?
@sheikhsalman18734 жыл бұрын
I just copied the same code to work with another house dataset, but when I start using the for-loop code for % null values, it's not working. Help me please.
@YOGESHMULEY-n1j Жыл бұрын
if we dont have any variable columns then ..?
@amolsawle28684 жыл бұрын
Hi sir, Your videos are very good. Thank you for making it. Sir, Can we use seaborn lib to plot graph using for loop and why did you take 25 for extracting discrete data Thank you.
@ashishbhai52882 жыл бұрын
bro, I also have the same question... have you got the answer
@ashishbhai52882 жыл бұрын
why did he take 25 as the threshold value
@reegee83214 жыл бұрын
Hi Krish, please can you do a video on image dataset, augmentation and class decomposition of images. Thanks
@abdurahman1019 Жыл бұрын
I am a beginner at data science and python as well. Is it expected from me to be able to write all these code by myself or is the understanding enough for an interview?
@bharathjc47005 жыл бұрын
Hi Sir, what statistical tests should we perform as soon we get the data
@nallalarajureddy85503 жыл бұрын
sir can you please provide literature survey report for this project i.e for house price prediction project
@8th_gen4 жыл бұрын
sir, Im a student working on a machine learning project to predict the delivery of goods (early or late) if its ordered again in the future.Since I have a lot of variables like weather ,road closure etc, which model best suits to predict the delivery
@dhirendrasingh73794 жыл бұрын
If the variables are non linear and linear , try decision trees(RF , XGboost )
@alankritreddy-e8z Жыл бұрын
Why do we write None, and what does it mean, in pd.pandas.set_option('display.max_columns', None)?
@ananthkumar8901 Жыл бұрын
None: In this context, setting it to None means there is no limit on the number of columns to be displayed. It effectively tells Pandas to display all columns when showing a DataFrame. Source : ChatGPT 🙂
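A short sketch of the option discussed here. It uses pd.set_option directly, which is the documented spelling; the 30-column frame is invented just to have something wide to print.

```python
import pandas as pd

# None removes the cap on how many columns pandas prints.
pd.set_option("display.max_columns", None)

# Toy wide frame (hypothetical): 30 columns in a single row.
wide = pd.DataFrame([range(30)], columns=[f"col{i}" for i in range(30)])
print(wide)

current = pd.get_option("display.max_columns")
print(current)
```

This only affects how DataFrames are displayed; the data itself is unchanged. pd.reset_option("display.max_columns") restores the default.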
@jyotishranjanmallik23564 жыл бұрын
Sir, how do you know that the criterion for choosing discrete features is fewer than 25 unique values? In case some features are discrete but contain 26 or 27 unique values, our code rejects those features. Please guide.
@jyotishranjanmallik23564 жыл бұрын
Understood sir, we have to keep checking until no new feature gets added to the discrete_feature list. Up to 24 it has 16 features, at 25 it has 17, and after that no new discrete feature gets added to the list. You are awesome, sir.
@bendivanitha72112 жыл бұрын
Can we get accurate prediction with only one feature available plz reply
@arvindkumar-ug1zf5 жыл бұрын
Great Sir !
@jitendrapradhan8495 Жыл бұрын
Hello Krish, I am following you for the last year for this ML. will these all-listed EDA videos help with a classification problem statement? if not do you have any FE,FS videos for both classification and Regression ML?
@niranjandhavan4 жыл бұрын
Great Content.
@Manojrohtela3 жыл бұрын
How bar plot is displaying in diff color ?
@shashankpandey19663 жыл бұрын
is feature here is inbuilt ???can someone please help me.
@hyperaktive1003 жыл бұрын
Hi Sir, one doubt. Why is the median Sales Price being considered in most bar plots?
@karanmthevar4 жыл бұрын
Hi Krish Great Work. I am facing an issue. My bar graphs are showing same color as it is taking sale price as legend and not the other features. How do we change it?
@GauravMehra-up5ly4 жыл бұрын
yes same plz tell if you solved that problem
@yashsharma45214 жыл бұрын
Hello sir, I am a big fan of your work but I have a question. The data science project does not start with the question. LIke Business problem >Data extraction >then all the mention parts in the youtube. Please help me to understand this thanks Yash Sharma
@parikshitrajpara57064 жыл бұрын
he is of no use replies to no one
@sartyakimanna99864 жыл бұрын
why we are taking median for calculating
@PriyaAmar8483 жыл бұрын
Hi, how are you doing Krish? I could see different colors in the bar plot for every feature versus SalePrice, but I could not see the relevant code for color control within the bar plot. Could you please share it? I tried searching on the internet too, but the code there uses the range function for generating different colors. I would appreciate you sharing the code.
@m.b.94964 жыл бұрын
Hey guys, when I Plot my Bars, i get all the bars in the same colour, how did he do it?
@sunilsharanappa77214 жыл бұрын
Usually when you plot from a DataFrame you get the same color, and if you use a Series you get different colors. You can use the code below to see the same graphs in different colors.
plt.figure(figsize=(20,25))
i=0
LabelName_with_na=[features for features in house.columns if house[features].isnull().sum()>1]
for feature in LabelName_with_na:
    data = house.copy()
    # let's make a variable that indicates 1 if the observation was missing or zero otherwise
    data[feature] = np.where(data[feature].isnull(), 1, 0)
    # let's calculate the median SalePrice where the information is missing or present
    bplot=data.groupby(feature)['SalePrice'].median()
    if i!=16:
        plt.subplot(4,4,i+1)
        plt.title(feature)
        bplot.plot.bar(color=plt.cm.Paired(np.arange(len(data))))
        i=i+1
@piratedartist3324 жыл бұрын
Hi Krish, I just have a doubt is that why did you choose the median over here "data.groupby(feature)['SalePrice'].median().plot.bar()" why can't we go for count??
@harishkumar-zx6vg4 жыл бұрын
Because of outliers
@Sathwik-fh8uc9 ай бұрын
while finding the SalePrice with respect to YrSold why median(), why not mean(). pls can anyone explain why
@moulavb39325 жыл бұрын
for feature in feature_na: print(feature,np.round(data[feature].isnull().mean(),4),"% missing value") in this code what the use mean(),4?
@adityachandra24625 жыл бұрын
It will return the mean with 4 digits after decimal point for features having null values
@liyekting21894 жыл бұрын
@@adityachandra2462 what does the mean here means ? how do it relates to percentage of missing value ? sorry i didnt get it
@varunwalvekar95954 жыл бұрын
@@liyekting2189 dataset[feature].isnull().mean() gives (number of null values in the feature column / total number of values in the same column). For example, say feature = 'abc': dataset['abc'].isnull().mean() will give (number of null values in the 'abc' column / total number of values in the 'abc' column). This is the fraction of null values in column 'abc'; multiply by 100 to get the percentage.
@keshavsharma-pq4vc4 жыл бұрын
The 4 is the number of digits after the decimal point, e.g. for 54.9898 it keeps only 4 digits after the point ( . )
@shivambhayre50565 жыл бұрын
Sir, why do we need to copy the data? In this code you have copied the data many times. Why?
@amritgurung34105 жыл бұрын
It makes it easier to access the values of particular features for plotting or other analysis; otherwise we would only have a list of feature names with no access to their values. We can also use the DataFrame variable that was used to read the CSV file instead of copying it. Making a separate copy like data = dataset.copy() is simply for ease. If we don't copy like that, we can simply use the dataset variable, which represents the DataFrame, e.g. dataset[feature]
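One practical effect of the copy, sketched on invented data: the loop in the notebook overwrites a feature with a 0/1 missing flag, and doing that on a copy leaves the original frame untouched for later steps. That this is the intended reason in the video is an assumption.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values).
dataset = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0],
    "SalePrice": [208500, 181500, 223500],
})

data = dataset.copy()  # independent copy of the values
# Overwrite the feature with a missing-value flag, on the copy only.
data["LotFrontage"] = np.where(data["LotFrontage"].isnull(), 1, 0)

print(data["LotFrontage"].tolist())           # flags on the copy
print(dataset["LotFrontage"].isnull().sum())  # original still has its NaN
```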
@keerthivasan12775 жыл бұрын
Good job sir
@RS-cn2cf4 жыл бұрын
Hi, can somebody tell me how these bar graphs for '## Lets Find the realtionship between them and Sale PRice' are colored? I cannot see any code for them in the repository.
@GauravMehra-up5ly4 жыл бұрын
yeah same my bar graphs are not showing different colors
@keshavsharma-pq4vc4 жыл бұрын
you can use seaborn library sns.barplot(x=feature,y='SalePrice',data=data,ci=False)
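For the recurring "all my bars are the same color" question, one option besides seaborn is to pass an explicit color list to Matplotlib. This is a minimal sketch with invented medians; the color names and the Agg backend (used so it runs without a display) are choices made here, not something taken from the video.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical median SalePrice for rows without (0) and with (1) a missing value.
medians = pd.Series({0: 190000, 1: 250000}, name="SalePrice")

fig, ax = plt.subplots()
labels = [str(k) for k in medians.index]
# One color per bar.
bars = ax.bar(labels, medians.values, color=["tab:blue", "tab:orange"])
ax.set_title("LotFrontage: missing (1) vs present (0)")
ax.set_ylabel("Median SalePrice")
```

In a notebook, plt.show() after this would display the figure. Whether the differently colored bars in the video come from an older pandas or Matplotlib default is not confirmed in this thread.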