Kaggle Competition - House Prices: Advanced Regression Techniques Part 1

293,683 views

Krish Naik

A day ago

Comments: 321
@ШопенШумит · 5 years ago
You're the best! My RMSE dropped dramatically from 0.20 to 0.14 after watching your video. Thanks to XGBoost and you! Thanks a lot!
@shivalikapatel7222 · 4 years ago
I'm new to the data science field. Could you please guide me a little on how we decide to choose XGBoost over something like linear regression?
@rajeevmayekar1775 · 3 years ago
@shivalikapatel7222 Check Krish's video: search "evalml krish naik" on YouTube. It covers which algorithm is best for the model; EvalML helps with that.
@Dynamite_mohit · 4 years ago
Here is a confession: I didn't like your videos earlier. But this series, and the way you explained things with the why and how, is awesome. Thanks a lot for this; it helped me clear multiple doubts. A note to viewers: I have never commented on a YouTube video before, but this guy deserves it.
@chetannalinde1441 · 5 years ago
Thanks a ton, Krish. This is exactly what I was looking for to start with Kaggle. Keep the awesome work coming; looking forward to more such videos.
@xpabhi · 5 years ago
Liked the way you explain the problem statement and the feature engineering part. I am pursuing a data scientist career and have a great interest in ML techniques. It's always a pleasure to watch you.
@harshitahluwalia8443 · 4 years ago
At 18:08, if you want to see all those values, do this:
    pd.set_option('display.max_rows', None)
then run df.isnull().sum() again. Now you will be able to see all the values.
@manojkuna3962 · 1 year ago
Thanks for the information.
@whatitgoingtoB · 4 years ago
Thanks Krish, much appreciated. Not slow or time-consuming; it is exactly what every beginner needs, everything without wasting time. Loved it.
@shivadigitalsolutionsandam56 · 2 years ago
Brother, you explain really well. I watched at least 10 videos on the same problem, but yours is the good one.
@sandrafield9813 · 4 years ago
Hey thanks for this!! You're a great teacher. You really helped me parse through some things in my machine learning class. Sadly, I almost pushed ctrl-enter to enter this comment. lol
@MrJaga121 · 5 years ago
Great work, Krish. Thank you very much for explaining line by line.
@nabeelpm4894 · 5 years ago
Thanks again, Krish. Love your passion and humility; you are sharing such valuable knowledge. Indeed, sharing knowledge is the best thing in the world. Thank you so much, brother.
@samudragupta719 · 5 years ago
All I can say is this is one of the best explorations I've ever gone through! This must go on... ❤
@codingfun915 · 4 years ago
Instead of writing out the whole big list of categorical columns, we can do this:
    c = data.columns
    categorical = []
    for a in c:
        if data[a].dtype == object:
            categorical.append(a)
and categorical will be a list similar to the columns list in the video. Btw, amazing video, just loved it.
@harikaepuri9337 · 4 years ago
Very neat and detailed explanation, sir. Thank you very much for helping me understand the whole project and how to participate in a Kaggle competition. Looking forward to more such Kaggle competition videos, sir.
@sabyasachighosh9847 · 5 years ago
Krish, you are doing a great job. Good to learn from you.
@sauravsrivastava2353 · 2 years ago
This video was really helpful for me because I am just a fresher in the data science world and don't yet know how to deal with such real-world data science problems. So thanks, Krish sir, for this kind of video; please make more videos on other Kaggle competitions.
@rajinirox · 2 years ago
Around 22:07: if you want to check for yourself whether the number of categories really differs between the training and test datasets, use this:
    for column, col in zip(df, test_df):
        print(len(df[column].value_counts()), len(test_df[col].value_counts()))
This will print the number of categories column-wise for the training and test datasets.
@akshayjadhav2213 · 4 years ago
Very nice, dear sir. You explain from the basics, which is the necessary and important thing one should do. I had heard about Kaggle competitions, but today I understood how they work. Thanks a lot and keep encouraging us.
@abhimanyutiwari100 · 5 years ago
Nice. Coincidentally, I too completed this advanced regression Kaggle problem yesterday.
@RahulVarshney_ · 4 years ago
How to extract all the categorical features:
    features = df.select_dtypes(include=['object']).copy()
This will give a dataframe of all categorical features. To extract the column names we write:
    categorical_features = features.columns
😊
@Omarismail-vs4jl · 4 years ago
You are a life saver.
@astridbrenner2957 · 3 years ago
df.columns?
@RahulVarshney_ · 3 years ago
@astridbrenner2957 df.columns will give you the original dataframe's columns, while features.columns will give you only the categorical columns.
@astridbrenner2957 · 3 years ago
@RahulVarshney_ Thank you, Rahul. I'm new.
@RahulVarshney_ · 3 years ago
@astridbrenner2957 You can go through the many tutorials on dataframes and play with them.
@rakesh2you · 4 years ago
Thanks for these videos. They helped me submit to Kaggle and understand what else goes into a data science project.
@oleholeynikov8659 · 2 years ago
This is my exam project. Thanks a lot for the video!!!!
@BiancaAguglia · 5 years ago
Nice job, as usual, Krish. 😊 One note about accuracy: I recently heard a data scientist at Netflix say that some of the models that win competitions on Kaggle are too complex and too impractical to be put into production. So our job is to find a balance between accuracy and usability. I thought that was interesting. 😊
@whoknows8992 · 5 years ago
Yeah, that's true!
@saurabhtripathi62 · 4 years ago
Yes.
@eyabaklouti9066 · 4 years ago
Can you give us the name of the documentary, please?
@mohammedfaisal6714 · 4 years ago
If increasing the accuracy by about 0.5% demands a lot of computational effort, companies will not be interested in such investments. I agree with your point, @Bianca.
@BGrovesyy · 4 years ago
Exactly, it's very important to consider overfitting if the objective is deployment. An overfitted model will not respond well to unseen data and so will not be suitable for the "real world".
@akshayjhamb1022 · 5 years ago
Thanks, Krish, for the video. Keep the Kaggle competition solutions coming.
@gaganlohar5517 · 5 years ago
Thank you, Krish, you are doing a great job. Very nice video.
@pradipkaushik6583 · 5 years ago
Thank you so much, sir; you have explained these topics with so much ease. Keep posting such excellent videos.
@asdubey007 · 4 years ago
Thanks a lot, sir, for making my concepts crystal clear. Thank you so much for all this effort; keep doing really awesome work.
@vishal56765 · 5 years ago
Loved it. Try to make the next videos like the one you did on hyperparameter tuning, so that we understand how to iterate on the same problem to get better results. Slowly turn this problem into a complete series using the same dataset.
@geekydanish5990 · 5 years ago
Great start, man. Hope to see more videos soon.
@PradeepSingh-gh1jp · 5 years ago
You made a big mistake here. If two or more features have identical category names, doing
    final_df = final_df.loc[:, ~final_df.columns.duplicated()]
will actually create problems. You should have done
    pd.get_dummies(final_df[fields], drop_first=True, prefix=fields)
to avoid that problem, but you did
    pd.get_dummies(final_df[fields], drop_first=True)
The prefix is very important for distinguishing each category by its associated feature. The rank you achieved here doesn't mean much after that, but the knowledge you shared is awesome. Thank you.
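For reference, a minimal sketch of the issue described above; the toy data and the per-column encoding loop are illustrative, not the video's exact code:

    import pandas as pd

    # Two features that happen to share the category labels 'Gd' and 'TA'
    df = pd.DataFrame({'KitchenQual': ['Gd', 'TA', 'Gd'],
                       'GarageQual':  ['TA', 'Gd', 'TA']})
    fields = ['KitchenQual', 'GarageQual']

    # Encoding each column separately without a prefix yields duplicate column names,
    # so dropping "duplicated" columns silently discards one feature's information.
    no_prefix = pd.concat([pd.get_dummies(df[f], drop_first=True) for f in fields], axis=1)
    print(no_prefix.columns.tolist())    # ['TA', 'TA'] - ambiguous

    # Passing prefix=f keeps every dummy column traceable to its original feature.
    with_prefix = pd.concat([pd.get_dummies(df[f], drop_first=True, prefix=f) for f in fields], axis=1)
    print(with_prefix.columns.tolist())  # ['KitchenQual_TA', 'GarageQual_TA']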
@pranavkirdat8192 · 5 years ago
Keep up the good work. Your channel will grow.
@greenshadowooo · 1 year ago
A very useful share! 😍😍😍
@CharzIntLtd · 3 years ago
Thank you very much, Mr. Krish; you have given me a clear start.
@dennisbesseling9267 · 3 years ago
In the data description file it says that NA should be treated as a value meaning the absence of the feature. So if there is a null value in any of the basement columns, it means the house doesn't have a basement, and so on.
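For reference, a minimal sketch of handling those columns under that reading of the description; the column lists are illustrative, not exhaustive:

    import pandas as pd

    df = pd.read_csv('train.csv')  # Kaggle house-prices training file

    # Columns where NA means "feature not present" according to the data description
    absent_cat = ['BsmtQual', 'BsmtCond', 'GarageType', 'FireplaceQu', 'Fence', 'PoolQC']
    absent_num = ['GarageYrBlt']

    df[absent_cat] = df[absent_cat].fillna('None')  # explicit "no basement/garage/..." category
    df[absent_num] = df[absent_num].fillna(0)       # e.g. no garage, so no garage year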
@rakeshkumarrout2629 · 4 years ago
Krish, this will stay useful for decades to come...
@bivekyadav08 · 1 year ago
This man is always there to help 🙌 Thanks 🥺❤🙏
@shailendraverma761 · 5 years ago
Hi Krish. Thanks for such a nice explanation. I was able to follow along easily and tried a few other things on the data, which resulted in a rank of 2513.
@Prajwal_KV · 4 years ago
How are you dividing the training dataset and the testing dataset?
    df_train = final_df.iloc[:1422, :]
    df_test = final_df.iloc[1422:, :]
How did you know it was 1422? How do you calculate it?
@kalyanprasad4069 · 4 years ago
Hello Krish, it would be really helpful if you did a video on how to get started with Kaggle competitions: what are the basic things one should be aware of before entering? Thanks for understanding. Sincerely.
@o_rod8954 · 4 years ago
Thanks for the video. You make it easy to understand and follow!
@vijayanarayanan3425 · 4 years ago
Hi Krish, it was so nice listening to your video. Thoroughly enjoyed it.
@spamaccount1513 · 2 months ago
15:14 Isn't that data leakage? He imputed the missing values using the mean of the test df.
@shashankvm · 5 years ago
You are my role model, brother... I want to be like you :)
@Amrrkevin · 5 years ago
Please continue/complete the deep learning series. We are waiting for those videos very eagerly.
@VivekKumar-li6xr · 5 years ago
Yes please, I am waiting for the same. But this too is very informative. Thanks a lot, Krish, for making videos out of your busy schedule.
@BhartiDeepak · 5 years ago
First of all, thanks for the video, it is very informative. I am new to data science so this may be a novice question, but one thing I wanted to point out is that you are combining the train and test data for modelling. From what I have learnt, we should never combine train and test data for training, as the model will then not predict well on data it has not seen. Please correct me if I am wrong.
@mohammed.dawood_ · 4 years ago
He didn't feed the combined dataset into the model. He only gave the training part as input, i.e. the first 1422 rows. The only reason for combining the training and test data was to create the dummy variables. That could have been done separately, but he mentioned that for some attributes the training data had 3 categories and the test data had 4, so making dummy variables separately would have created a discrepancy in the number of columns.
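For reference, a minimal sketch of that concat-encode-split pattern; the file and variable names are assumptions, and an alternative that never looks at the test rows is to fit the dummies on the train set alone and reindex the test columns to match:

    import pandas as pd

    train_df = pd.read_csv('train.csv')
    test_df = pd.read_csv('test.csv')

    n_train = train_df.shape[0]                        # remember where train ends
    combined = pd.concat([train_df, test_df], axis=0)  # stack so both see the same categories

    combined = pd.get_dummies(combined, drop_first=True)

    df_train = combined.iloc[:n_train, :]              # only these rows are used for fitting
    df_test = combined.iloc[n_train:, :]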
@maheshwarang2008 · 5 years ago
Thank you so much, sir. You are doing a superb job for those of us who want to get into machine learning.
@mohitkeshwani456 · 5 years ago
This really helps... Thanks a lot for making these types of videos.
@aryamahima3 · 5 years ago
Very, very helpful videos. Your efforts are highly appreciated.
@kevinmartinezperez4111 · 1 year ago
Man, what a great video, thank you very much. Greetings from Peru.
@Marcel-f1 · 1 year ago
In summary: the machine learning engineer is making a "guess" about the dataset he is working on and performing multiple repetitive tasks like filling null values and feature extraction, work that could be automated.
@nandalal-dev6095 · 4 years ago
Sir, not every feature requires one-hot encoding. For example, the feature LotShape has the values Regular, Slightly Irregular, Moderately Irregular and Irregular; we can do label (ordinal) encoding for these values ([1, 2, 3, 4]).
@AKHILESHKUMAR-nk2rk · 4 years ago
Is it an ordinal categorical feature?
@AKHILESHKUMAR-nk2rk · 4 years ago
Yes, please do it.
@colinodwanny3022 · 1 year ago
I thought you weren't supposed to use LabelEncoder on the X variables? I don't have a lot of experience, but that is just what I have heard.
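For reference, a minimal sketch of the ordinal mapping suggested in this thread. Treating LotShape as ordered is the commenter's assumption, and scikit-learn's LabelEncoder is documented for target labels, so an explicit map (or OrdinalEncoder) is the usual choice for input features:

    import pandas as pd

    df = pd.read_csv('train.csv')

    # Explicit ordinal map, from Regular (4) down to Irregular (1)
    lotshape_order = {'Reg': 4, 'IR1': 3, 'IR2': 2, 'IR3': 1}
    df['LotShape'] = df['LotShape'].map(lotshape_order)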
@Zeba_Sayyed · 5 years ago
Thank you so much, sir. Your videos are so helpful.
@aktharm1317 · 5 years ago
Great!! Good work!
@abhileshm7216 · 5 years ago
Thank you for this initiative. Can you please explain in detail what the different category levels are in the train and test data, why we concatenated them, what duplicate columns got created when we dummified, and why we removed those duplicates in this example? Please explain it, sir.
@kadhirn4792 · 4 years ago
Good techniques. I have learnt a lot from this, thank you so much.
@tonyt6379 · 3 years ago
Thanks. Great work! Could you explain where the duplicate columns come from at 26:23? I don't understand why you get these from one-hot encoding the train and test set together.
@ashutoshsalekar1810 · 3 years ago
Some columns share the same categories as other columns, and when creating dummy variables the new column gets the name of the category, so multiple columns end up with the same name. E.g.:
    column1 = ['yes', 'no', 'neutral']  # 3 categories
    column2 = ['yes', 'no']             # 2 categories
When we apply get_dummies to these columns one at a time, without a prefix, it creates columns named yes, no, neutral, yes, no: two columns named 'yes' and two named 'no'.
@ritika2708 · 3 years ago
@ashutoshsalekar1810 Yeah, but shouldn't it be col1_yes and col2_yes? We need the values of both attributes; if we only have columns named 'yes' and 'no', how will we know which attribute each belongs to? And a simple pd.get_dummies(df) would have done that. I don't understand why we need such a complicated method. I am getting around 275 distinct columns this way and am not sure how only 175 columns can serve the purpose.
@rohitbharti9360 · 4 years ago
Thank you so much... It is very helpful 🙂
@niteshsoni5379 · 5 years ago
Great job, sir.
@ShubhamGuptaGgps · 4 years ago
What should I do about this? When I write df.isnull().sum(), my Jupyter notebook doesn't show the complete output; it truncates the middle with dots instead of giving a scroll bar:
    df.isnull().sum()
    Out[13]:
    Id                 0
    MSSubClass         0
    MSZoning           0
    LotFrontage      259
    LotArea            0
                     ...
    MoSold             0
    YrSold             0
    SaleType           0
    SaleCondition      0
    SalePrice          0
    Length: 81, dtype: int64
@ShubhamGuptaGgps · 4 years ago
Got it:
    nulls = df.isnull().sum().to_frame()
    for index, row in nulls.iterrows():
        print(index, row[0])
@faisalkhan-oo5jd · 3 years ago
Great videos! Thanks a lot. But in the property description text file it says that for some features NA means the feature is not present, rather than that the data for that column is missing. Does anyone else agree that we shouldn't treat all of those columns with the mean and mode?
@PrasadHonavar · 5 years ago
Excited for your next Kaggle video.
@isingcauseimliving · 4 years ago
Hi Krish. It would have helped if you had read the description of the pricing data: why certain features were chosen, or why something like area = f(length * width) is given as separate features, so we could have created features ourselves. I would also have liked to see why you removed any of the features. I am just a noob learning ML, so allow me to question. Could it also be that the reason "Fence" has so many null values is that very few houses actually have fences? Houses that do have fences are, by intuition, costly houses. In that case shouldn't we consider the value of those rows rather than just the count of nulls? For example, a big mansion would have a fence, and even though there are not many mansions in the training set, that doesn't mean we should exclude the mansion's price from our solution. We would need to go over all 81 features and decide, with intuition, what the real-life scenario could be rather than just thinking of the data as "null points". Please let me know if I am right or wrong. Thank you.
@eanamhossain1156 · 5 years ago
Thanks, brother, for this video. Please continue and upload more videos.
@pembasherpa3240 · 3 years ago
Very helpful! Thank you.
@jeremyheng8573 · 2 years ago
Thank you! Good tutorial!
@liiinx_com · 4 years ago
Hey man, thanks for the video!
@sogolgolafshan7843 · 3 years ago
When I type final_df.shape, I get a 'NoneType' error. Can you help me figure out what I should do?
@Peter-ns6jg · 2 years ago
This helped me a lot. Thanks.
@Prajwal_KV · 4 years ago
Sir, how are you dividing the training dataset and the testing dataset?
    df_train = final_df.iloc[:1422, :]
    df_test = final_df.iloc[1422:, :]
How did you know it was 1422? How do you calculate it?
@shivambansal1993 · 3 years ago
I have the same question.
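For reference, 1422 is not a magic number: it is simply the number of training rows remaining at that point (the original train.csv has 1460 rows, and some rows with leftover nulls were presumably dropped during cleaning in the video). Recording it programmatically avoids hard-coding, as in this hedged sketch using the variable names from the question above:

    n_train = train_df.shape[0]            # number of cleaned training rows, e.g. 1422
    df_train = final_df.iloc[:n_train, :]
    df_test = final_df.iloc[n_train:, :]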
@naageshk1256 · 7 months ago
Thank you, sir ❤🎉
@chetanmazumder310 · 4 years ago
You are GREAT, sir.
@abhinavshrivastava4637 · 4 years ago
I want to evaluate my model with a confusion matrix, but the test data doesn't have the 'SalePrice' column, so if I run confusion_matrix(y_test, y_pred) I don't have 'y_test'. What should I do? Please suggest.
@AKHILESHKUMAR-nk2rk · 4 years ago
A confusion matrix is for classification problems, right?
@atulpandey1979 · 5 years ago
Excellent!!
@samratkishore4668 · 5 years ago
Sir, I think you also have to fix the outliers in the dataset; that can improve your predictions, sir.
@krishnaik06 · 5 years ago
Yes, that is the plan; I will be doing that in my next video. This is just the beginning.
@samratkishore4668 · 5 years ago
@krishnaik06 Thank you for your response, sir. Awaiting more updates from you, sir!!! Always with love ❤️
@santosharavind2887 · 5 years ago
@krishnaik06 Thank you for clearly explaining each and every step. I would like to know when you are going to post the continuation video, so we can complete one entire project. Thank you in advance.
@anandacharya9919 · 5 years ago
Super, but we can merge the test and train data first and then do the feature engineering, to avoid doing the work twice.
@krishnaik06 · 5 years ago
Yes, we can.
@hafizmfadli · 4 years ago
Nice video, thank you for sharing it.
@babayaga626 · 5 years ago
Hello sir, can you please discuss the parameters and values for the XGBoost classifier? Also, how do we get the best parameter values using cross-validation?
@cusematt23 · 4 years ago
I'm pretty sure you're deleting perfectly good columns when you're removing "duplicates". For example, the categorical features can come with levels like "good", "bad" and "excellent", and these can apply to the garage, basement, or attic. Since you didn't use a prefix for get_dummies, you now have several columns each named "good", "bad" and "excellent", and you delete all but one of each when you "remove duplicates". They were never duplicates; they simply weren't intelligently named. If you use a prefix equal to fields, there are no duplicates. Logically, why would there be duplicates in the first place? It doesn't make sense that duplicate columns appear after applying get_dummies when there were no duplicate columns before applying it!
@nickey0207 · 3 years ago
I agree.
@ritika2708 · 3 years ago
Exactly, I have the same doubt; the columns should be something like garage_good and attic_good. As I am a newbie in this field I thought I might be missing something, so I looked in the comments to see if anyone else felt the same way. Also, he used drop_first=True, which removes the first category; not sure why he did that. I get around 275 columns if I apply encoding to all the categorical columns, but we could convert a lot of them into a simple 1, 2, 3 ranking, I believe.
@kdevendranathyadav5087 · 1 year ago
Bro, what is the final output of this project?
@victormayowa7989 · 1 year ago
But duplicates go across the observations unless the check is restricted to a column.
@Nitsoney · 5 years ago
Hi sir, thanks for uploading the videos and training us with such good content. I didn't get the one-hot encoding part; why was it done?
@abhishekbajiya3332 · 4 years ago
Hey Krish, why didn't you use OneHotEncoder and ColumnTransformer to convert the categorical variables?
@kristiangohibi7326 · 2 months ago
Thank you for your video.
@chandrakanthshalivahana8616 · 5 years ago
Sir, why did you drop GarageYrBlt with df.drop(['GarageYrBlt'], axis=1, inplace=True) when it has only 81 null values?
@selvaprabu3878 · 5 years ago
Superb, sir, great. I have two doubts, sir. 1) When treating skewness and normality, is it required to check all the columns (the independent variables) or only the dependent variable? I am a beginner in data science (the linear regression assumptions don't say much about the IVs). Can you explain clearly, sir? 2) For feature selection, do I have to look at each column separately, or can I use an automatic selection method? When done manually, numerical features can be judged by correlation, but what about categorical features? Should I use a chi-square or ANOVA test, or something else? I know the theory, but how does it work in actual real-time projects (some datasets have 400 columns)?
@thejswaroop5230 · 3 years ago
Thank you, it was helpful.
@MrPriti999 · 5 years ago
Great one.
@edzhem · 3 years ago
Hey Krish, thanks for your effort! Just one quick question: this is heterogeneous data, isn't it?
@hiw92 · 4 years ago
Great video.
@bibhasgiri527 · 5 years ago
Thank you. This is really helpful.
@devathimahesh8007 · 5 years ago
Nice video.
@Akshat.agr13 · 5 years ago
Why are we dropping some records in the test and train data? The null values should already have been removed after applying the mean to numeric features and the mode to categorical features. (The heatmap generated the second time still shows null values for some features before you execute that step.)
@krishnaik06 · 5 years ago
Because at the end there were some features that still had a very small number of NaN values, so I thought of dropping those records.
@zeuspolancosalgado3385 · 3 years ago
Kaggle Competition Link: www.kaggle.com/c/house-prices-advanced-regression-techniques/overview
Original Dataset Link: jse.amstat.org/v19n3/decock.pdf
@himanshusahoo143 · 5 years ago
Hi sir, just one question: why did you apply get_dummies rather than label encoding? get_dummies creates a huge number of columns, and then you remove the duplicate columns afterwards. Can't we simply apply label encoding? Please advise.
@sahil-7473 · 4 years ago
Hello sir. Thanks for the walkthrough of this problem. I have a suggestion: in the test data, why are you replacing NaN with values computed from the test data itself? This is wrong! In the test data, you have to replace NaN with values computed from the train data. The test data should be treated as hidden. For example, say I trained a model with train.csv and I feed it a single query; the query's features can be anything, null or not. To fill a null value for that query, you don't have a whole batch of test data at hand, right? So the replacement should use the statistics computed on the train data. Thanks.
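For reference, a minimal sketch of the train-statistics imputation being suggested; the column names are illustrative, and scikit-learn's SimpleImputer fitted on the train set and applied to both sets achieves the same thing:

    import pandas as pd

    train_df = pd.read_csv('train.csv')
    test_df = pd.read_csv('test.csv')

    # Compute imputation values on the training data only...
    lot_frontage_mean = train_df['LotFrontage'].mean()
    ms_zoning_mode = train_df['MSZoning'].mode()[0]

    # ...and apply those same values to both train and test.
    for frame in (train_df, test_df):
        frame['LotFrontage'] = frame['LotFrontage'].fillna(lot_frontage_mean)
        frame['MSZoning'] = frame['MSZoning'].fillna(ms_zoning_mode)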
@muzamilshah8028 · 5 years ago
Very nice work.
@bars-qt9yi · 5 years ago
Hi sir, nice work, but please organize your videos in an ordered sequence; we don't know how to start from scratch.
@BiancaAguglia · 5 years ago
Krish has several playlists organized by topic. Pick a playlist and try to go through it in order; I think you'll find that helpful. 😊 Another idea is to make your own list for the thing you're trying to learn. For example, let's say you're trying to learn statistics. Google "statistics for data science" and create a list of things you need to learn. Then come back to YouTube and search for each topic in order. This is your chance to start thinking like a data scientist. Data scientists solve problems. Think of this (i.e. finding a good sequence to learn data science) as your first data science problem. 😊 Don't get discouraged. It takes a while to become good at data science. Learning it is like going to a completely foreign city: at first you feel lost, but if you explore for a while you'll be able to easily find your way around. Best wishes to you. 😊
@bars-qt9yi · 5 years ago
@BiancaAguglia Love you, man, love you. Can you help me become a data scientist? Please tell me the full path or a YouTube channel to follow. Your thinking is very powerful. Love you, and please reply.
@BiancaAguglia · 5 years ago
@bars-qt9yi Please remember that your goal is to be a problem solver and ask for as little help as possible. 😊 I know you feel overwhelmed at first though, so here is a quick path to get you started (look these up on YouTube):
1. Python tutorial for beginners.
2. Statistics tutorial for beginners.
3. Pandas tutorial for beginners.
4. Matplotlib tutorial for beginners.
5. Scikit-learn tutorial for beginners.
6. Sign up for a Kaggle account and look at the Titanic challenge. That's Kaggle's best-known challenge and you'll find plenty of help about it. Give yourself 2 to 6 months to go through these steps, and practice solving the problems in the tutorials on your own.
7. Start taking advanced statistics.
8. Start learning calculus.
9. Start learning linear algebra.
By the time you're done with this list you'll know enough about data science to start carving your own path. 😊 Remember Henry Ford's words: "Whether you think you can, or you think you can't - you're right." 😊 Start thinking that you CAN be a data scientist and then work hard and smart to become one. 😊 Best wishes.
@bars-qt9yi · 5 years ago
@BiancaAguglia Love you, God bless you. How can I contact you?
@bars-qt9yi · 5 years ago
@BiancaAguglia And is Krish's first playlist about machine learning or about data science?
@vivianjoseph822 · 2 years ago
Thanks so much, brother!!
@Girish0512 · 5 years ago
Before getting into this, what basic preparation should one work on?
@preetdahiya3012 · 4 years ago
I am getting an error: I converted the float to int32, but while saving it to a CSV file the 'Id' column's data type converts to int64, which shows an error on Kaggle. Do you know how to overcome this? Please help me if you know.
@datasciencetoday7127 · 2 years ago
At 27:58, if you drop SalePrice, XGBoost will give a feature-mismatch error.
@ibrahimkuru450 · 2 years ago
Sir, I am getting this error. How can I solve it?
@ashutoshkumar2834 · 4 years ago
I have one question. Let's say I have 2 dataframes, train_df and test_df. Both have the same columns (column_A, column_B, column_C) and both have some missing values. If I drop column_A in train_df, is it mandatory to drop the same column_A in test_df?
@wise1569 · 4 years ago
Yes.
@anoopk4659 · 3 years ago
GarageCars is a categorical variable, but the mean is used to fill its NA values.
@ApurvaMishra9 · 3 years ago
Hi Krish! Thank you so much for this ML intro to Kaggle via house price prediction. I am a novice in the field and have a doubt; I would be grateful if you could help me out. In theory, isn't the test data supposed to be left untouched? How can we perform preprocessing on that new, unseen data without violating that concept?
@adityadwivedi9159 · 2 years ago
Bro, assume he first preprocessed the train set by creating two functions and passing the train data to them: 1) a NaN handler, which handles all the NaN values, and 2) a category handler, which handles all the categorical features. Now, if we don't apply these two functions to the test set, we may still have null values and categorical features in the test data, and in that case the ML algorithms won't work on it and can't predict. So preprocessing on the test set is also required.
@imPriyansh77 · 2 months ago
@adityadwivedi9159 Thanks!!