Tutorial 11-Exploratory Data Analysis(EDA) of Titanic dataset

331,566 views

Krish Naik

Comments: 290
@aakritiroy7336 4 years ago
After so much struggle with my LMS, I was finally able to understand the entire EDA within 30 minutes. Thank you.🙏👍
@bhadrakadabra 4 years ago
Is it the inmovidu one?
@ytg6663 3 years ago
What is LMS?
@ONE-THING-2RAY 3 years ago
@ytg6663 learning machine shorts
@shinosukenohara.123 3 years ago
@ONE-THING-2RAY Where is it?
@akashravindra.. 2 years ago
@ytg6663 Learning Management System
@VVV-wx3ui 5 years ago
Doing the job of a true Guru; Ekalavyas are all around, raring for such knowledge-impartation. Thanks much, Krish.
@Esha25ghosh 4 years ago
You are awesome sir! Not only are you a great mentor, but also a great motivator. Thanks for all the great work you have been doing. Stay blessed!
@chaos8514 2 years ago
I am learning this to become a data analyst, but I'm not sure what more I should learn to get a job ASAP. If you can help, please, we can connect on Instagram.
@classicemmaeasy2292 2 years ago
I was trying to understand data analysis with Python a couple of days ago; now you actually make it simpler and beginner-friendly. More unction to function, sir.
@souvikdas3905 5 years ago
What a beautiful video for a beginner who is just getting his hands on data science.
@aayushshukla342 6 months ago
Loved the video; in fact, the entire playlist gives an amazing approach to the intricacies of Machine Learning. Thank you, Sir.
@thePrabhuChannel 4 years ago
21:30 The median age of the passengers travelling in each Pclass can be calculated with the code below instead of looking at the boxplot and guessing the number:
df[df['Pclass']==1]['Age'].median()
df[df['Pclass']==2]['Age'].median()
df[df['Pclass']==3]['Age'].median()
@viveksingh881 4 years ago
Good one, brother. I was thinking the same: why guess it when we can actually calculate it?
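A compact alternative to the three separate lines above, sketched on a small made-up DataFrame (the values are illustrative, not the real Titanic data): `groupby` should return the median age for every class in a single call.

```python
import pandas as pd

# Toy stand-in for the Titanic training data; the values are made up.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3],
    "Age": [38.0, 35.0, 54.0, 27.0, 14.0, 22.0, None, 2.0],
})

# One median per passenger class; missing ages are skipped by default.
median_age_by_class = df.groupby("Pclass")["Age"].median()
print(median_age_by_class)
```

The result is a Series indexed by Pclass, so `median_age_by_class.loc[1]` gives the first-class median.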
@tusharmahuri2439 3 years ago
I get an error when I try to use sns.countplot. The error is "could not interpret input 'survived'".
@yashikaarora8573 2 years ago
@tusharmahuri2439 Bro, copy the column headers from the dataset instead of typing them. The language is case-sensitive: it is 'Survived', not 'survived'.
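A quick way to check the exact spelling before plotting, sketched on a toy frame. The helper `find_column` is hypothetical, written only for this illustration; it is not part of pandas or seaborn.

```python
import pandas as pd

# Toy frame using the capitalised column name from the Titanic CSV.
train = pd.DataFrame({"Survived": [0, 1, 1], "Sex": ["male", "female", "female"]})

print(train.columns.tolist())        # look at the exact spelling first
print("survived" in train.columns)   # the lowercase name is not a column
print("Survived" in train.columns)   # the exact name is

# Hypothetical helper: find a column while ignoring upper/lower case.
def find_column(frame, name):
    matches = [c for c in frame.columns if c.lower() == name.lower()]
    return matches[0] if matches else None

column = find_column(train, "survived")
```

The name returned in `column` can then be passed as `x=` to `sns.countplot`.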
@sunnychandra5064 5 years ago
You have actually cleared up the EDA concept for me. Thanks a lot!!
@ShivamChaudhary-jn4kw A year ago
Why are 0 and 1 used with cols when the column indexes are 2 and 5? Can you clarify?
@aliakbarrayhan6389 5 years ago
Sir, I'm very impressed by your amazing video. I am very weak in programming, but now I feel that I should start my programming journey again, because I have someone like you who can explain anything in a very simple way.
@PiyushSingh-cq2xv 3 years ago
This is one of the best datasets for understanding how to fix nulls. Great job, and thank you.
@vital4statistix 3 years ago
Krish, This material is FIRST CLASS. Appreciate it very much.
@imranullah7355 4 years ago
Thanks a lot, Sir... You've explained it in a great way... Love from Pakistan
@sudeeprajput1830 3 years ago
You are amazing brother. Your videos are helping me gain confidence in ML. Keep up the good work
@ManishKumar-gg2vm 5 years ago
Awesome explanation... I really couldn't stop myself from commenting on this video... one of the great videos on data visualization.
@sowjanyadharmavarapu2653 3 years ago
Sir, I really liked your video, but according to the roadmap video, you asked us to watch Python lectures 1-24 first. In this EDA video you mention some new terms like get_dummies and a few others. I got stuck on the explanation in the last 10 minutes; everything else is really clear and understandable. Thanks for all the effort.
@dynamictechnocrat 2 years ago
get_dummies is used in pandas.
@ashridas9896 2 years ago
It is basically one-hot encoding. Encoding techniques are used to convert categorical data into numerical data. Here it is applied to the 'Embarked' column. kzbin.info/www/bejne/hYWzq2imobCVapI
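A minimal sketch of what one-hot encoding with `get_dummies` does to 'Embarked', on a made-up four-row frame. Using `drop_first=True` here is an assumption about the setup, not something confirmed by the comments.

```python
import pandas as pd

train = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

# One 0/1 column per category. drop_first=True removes the first category
# ("C" here), which is implied whenever all remaining columns are 0.
embark = pd.get_dummies(train["Embarked"], drop_first=True, dtype=int)

# Swap the text column for its numeric encoding.
train = pd.concat([train.drop(columns="Embarked"), embark], axis=1)
print(train)
```

After this step the frame holds only numbers, which is what a model such as logistic regression expects.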
@aination7302 4 years ago
Both imputing and dropping missing values (NaN) are not good practice with real-world data. The ideal way is to derive a new field indicating missing values: 1 for missing, else 0. Sometimes a missing value can be new information in itself. Just sharing some learning from my job :)
@okonvictor8711 2 years ago
Hi, would you mind sharing how to do that here? Or can I reach you via email?
@waqarmehdi4394 2 years ago
Yes, it depends upon the dataset and the problem you want to solve. In this case, dropping the null values is the best possible option in my opinion.
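One way the missing-value indicator described above could look, as a sketch on made-up ages. The column name `Age_missing` is hypothetical, and filling with the median afterwards is optional.

```python
import pandas as pd

train = pd.DataFrame({"Age": [22.0, None, 38.0, None, 26.0]})

# 1 where Age was missing, 0 otherwise; create it before any imputation.
train["Age_missing"] = train["Age"].isna().astype(int)

# Optional: still fill the gaps so that models which reject NaN can run.
train["Age"] = train["Age"].fillna(train["Age"].median())
print(train)
```

The indicator keeps the "was missing" information even after the gaps are filled.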
@vinothv8514 5 years ago
Nice work Mr. Krish... It's really helpful
@girishmahamuni1830 4 years ago
Thank you for providing knowledge in a simple way.
@rupeshnandanyadav8108 2 years ago
Awesome tutorial on Exploratory Data Analysis ❤️❤️
@MrKmdmustaq 5 years ago
Can you please make a video on treating outliers? It would help us a lot in solving problems.
@premkishanmishra1574 A year ago
Loved your video, far better than the uni teachers :P
@akanshabhandari1062 4 years ago
Very helpful... You did a lot of hard work for us. Thank you so much, sir 🙌🙌🙏🙏. And your way of teaching is very good: it starts from the basics.
@theayodejipopshow 2 years ago
This video is amazing. Thanks so much for sharing your wealth of knowledge.
@garvitjain4106 3 years ago
@Krish You are doing an amazing job.
@VengalraoPachavaedu 5 years ago
I have seen some of your videos, excellent work. I really appreciate your work Mr. Krish Naik.
@GauravVerma-jk6cf 3 years ago
This was really one of the most useful things available!
@warmachinex5330 2 months ago
That notification at 3:39 🤣🤣😂😂
@mustafaraza6107 4 months ago
16:15 Now we have displot() (without the "t").
@tumul1474 5 years ago
This is beyond amazing... an amazing place to learn and revise the important techniques.
@RajatSharma-ct6ie 5 years ago
Great work sir, learning a lot from your videos, please upload more videos on EDA.
@AshishRoy 2 years ago
Very nicely explained. Awesome
@MuhammadAwais-n2b 4 months ago
3:37 Ad, hahahaha. Great learning experience, love you brother.
@ifhamaslam9088 4 years ago
Superb explanations, and interesting to learn.
@pravinmore434 4 years ago
Thanks a lot for the very detailed lesson, Sir. That was really fruitful and helped me complete one of my projects. Thanks a ton.
@lavanyameesa6432 3 years ago
Wonderful explanation
@mssnal 3 years ago
Great one Krish. Basically covers most of the things a beginner needs to understand.
@ashishgoyal7020 3 years ago
Thank you Krish.
@gkmadhav 3 years ago
Is there a part 2 and 3 for this video, about feature engineering on the same dataset?
@GreatHimalayanAsmr 4 years ago
Thank you sir, it is very helpful 😊.
@umeshrbaidya 4 years ago
Great video, Sir. I just have two doubts. First, why did you not use get_dummies on "Pclass", as it is also categorical data? Second, why did you not normalize the "Fare" and "Age" columns, as their values might overpower the results?
@bharathb3946 4 years ago
Same doubt bro
@harshmakwana8001 4 years ago
If you type "train.info()" you will see the dtypes of all the columns. I don't know if this helps, but I think get_dummies() can be used for object columns only: since they do not represent any numerical value for the system to compute with, get_dummies() converts those objects into numerical values. Please correct me if I am wrong, as I am also confused about this. If you agree or have a different insight, please tell me.
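On the dtype question above, a small sketch on made-up rows: by default `get_dummies` leaves numeric columns such as Pclass untouched, but passing the `columns` argument should encode them too. Whether encoding Pclass helps the model is a separate modelling choice.

```python
import pandas as pd

train = pd.DataFrame({
    "Pclass": [1, 3, 2, 3],
    "Sex": ["male", "female", "female", "male"],
})

# Default call: only the text column (Sex) is encoded; Pclass stays numeric.
default = pd.get_dummies(train, dtype=int)

# Naming Pclass explicitly encodes it as well.
explicit = pd.get_dummies(train, columns=["Pclass", "Sex"], dtype=int)

print(default.columns.tolist())
print(explicit.columns.tolist())
```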
@venkatadeviprasadkankanala7387 5 years ago
Very nice one, thank you very much for sharing valuable information
@ShubhamJain-in6sz 4 years ago
Great work sir!!👍🏻👍🏻
@pandian3731 4 years ago
Another great video, a very useful one, bro, like the NLP one. 📍
@piyush_paul_ 5 months ago
3:35 the ad 🫠💀
@diprajkadlag 2 years ago
One note: in a boxplot, the middle line inside the box is the median value, not the mean value.
@pepetisiddhardha9848 4 years ago
I didn't understand why the categorical features disappeared from the training data for logistic regression.
@siddhisingh4713 A year ago
Every time I import data, it shows the error "file not found":
import pandas as pd
data=pd.read_csv('C:\Users\Siddhi Singh\Desktop\Iris.csv')
print(data)
@Kishor_D7 A year ago
Actually, you should reset the laptop, because if there is any file named pandas, an error will be encountered. Otherwise, download the file, upload it in Jupyter Notebook, and copy the path from inside Jupyter Notebook.
@krishs7244 2 days ago
You can try using Google Colab.
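One likely cause of the error above: in a normal Python string, `\U` in `C:\Users` is read as the start of a Unicode escape, so the path is broken before pandas ever looks for the file. A sketch of two common spellings that avoid this, using only the standard library so it runs on any operating system:

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally, so "\U" is not an escape.
raw_path = r"C:\Users\Siddhi Singh\Desktop\Iris.csv"

# Forward slashes are also accepted in Windows paths by Python.
slash_path = "C:/Users/Siddhi Singh/Desktop/Iris.csv"

# Both spellings describe the same file.
same = PureWindowsPath(raw_path) == PureWindowsPath(slash_path)
print(same)

# With pandas installed, either string can then be passed to read_csv:
# data = pd.read_csv(raw_path)
```

If the error persists with a correct path, the file may simply not be at that location.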
@unnatiraut9553 2 years ago
Great to understand. Thanks a lot
@vinayaksharma6349 4 years ago
Sir, how did you know that age is related to Pclass? (How, and which analysis did you do?)
@ashishmeher216 4 years ago
@Vinayak sharma you can relate any column with any other column.
@SravanKumar-td5im 3 years ago
You could plot a heat map of the correlations between all features; from that you can tell which feature depends on which.
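A sketch of the correlation matrix behind such a heat map, on made-up rows. Only pandas is needed for the numbers; the seaborn call is left as a comment and assumes seaborn is installed.

```python
import pandas as pd

train = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Age": [50.0, 40.0, 30.0, 32.0, 20.0, 22.0],
    "Fare": [80.0, 70.0, 25.0, 20.0, 8.0, 7.0],
    "Sex": ["male", "female", "male", "female", "male", "female"],
})

# Correlation is only defined for numeric columns, so select those first.
corr = train.select_dtypes("number").corr()
print(corr.round(2))

# With seaborn installed, the matrix can be drawn as a heat map:
# import seaborn as sns; sns.heatmap(corr, annot=True)
```

In this toy data Age falls as Pclass rises, which shows up as a negative value in the Pclass/Age cell.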
@naveenrawat6505 3 years ago
Great video :) I have a suggestion: we can drop PassengerId to increase the accuracy score, because it doesn't contribute to the dependent variable.
@tusharmahuri2439 3 years ago
@naveen rawat I get an error when I try to use sns.countplot. The error is "could not interpret input 'survived'".
@naveenrawat6505 3 years ago
@tusharmahuri2439 Show me the line of code.
@abhinavmahajan448 4 years ago
Thanks for the detailed video. Really helpful :)
@mohamedshathik8045 3 years ago
Hi Krish, you didn't drop the PassengerId column before fitting the logistic regression model, even though it doesn't contain any information.
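A minimal sketch of the suggestion above, on made-up rows: leave PassengerId out of the feature matrix before fitting. Whether this changes the accuracy score is not guaranteed; it mainly keeps a meaningless row label away from the model.

```python
import pandas as pd

train = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})

# PassengerId is only a row label, so keep it out of the features.
X = train.drop(columns=["PassengerId", "Survived"])
y = train["Survived"]
print(X.columns.tolist())
```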
@kasturidas4081 3 years ago
Where are the previous and next videos of this video? I couldn't find them. Someone help me, please.
@ganeshrao405 3 years ago
Really helpful, thank you so much.
@babupatil2416 5 years ago
Hi Krish, Please create some more videos on EDA, it will be helpful.
@saylisuryawanshi3989 4 years ago
Great job, sir. Please do make more such practice videos for beginners.
@sulaimankhan8033 4 years ago
Krish, thank you for the EDA. Please throw some light on storytelling. If you had to conclude the EDA, theoretically, in layman's terms, we must do the storytelling. Correct me if I am wrong.
@honey9111 4 years ago
Thanks a lot, Krish. EDA was well explained. I could not understand the last part, starting from the confusion matrix, or how to read the final result of the analysis.
@devanshusharma9386 5 years ago
Very helpful for beginners
@pedrocrespo2681 4 years ago
Pretty nice explanation!
@buzzfeedRED 11 months ago
Why did you place this video at this point in the playlist? There are so many doubts.
@MrDeeb00 A year ago
Hi, please enable auto subtitles; it helps a lot. Thank you.
@louerleseigneur4532 3 years ago
Thanks Krish
@yashaskumargb3827 2 years ago
Sir, the playlist is the best. But please share the link from which you downloaded the dataset for every video, so that we can do what you explained in the video.
@samyakkumarsahoo8706 4 years ago
It was a resourceful video. But why is EDA done before the train-test split?
@aasthasingh67 3 years ago
How do you know exactly which plot to use for a given kind of result?
@buzzfeedRED 11 months ago
@Krish: Arrange your complete ML playlist videos into a roadmap playlist, from start to end, for becoming a data scientist.
@hemapriyaelumalaipalani3752 4 years ago
Great video, Krish. One doubt: how did you find the correlation between Pclass and Age before creating the box plot?
@joelbraganza3819 4 years ago
Use an ANOVA test: it checks whether the mean of the continuous variable (Age) differs across the groups of the categorical variable (Pclass), by comparing the variance between the groups with the variance within each group.
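A hand-rolled sketch of the one-way ANOVA F statistic mentioned above, on made-up ages, to show what the test compares. The function name `one_way_anova_f` is hypothetical; in practice `scipy.stats.f_oneway` computes the same statistic together with a p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up ages for the three passenger classes.
ages_by_class = [[38, 35, 54, 40], [27, 30, 34, 29], [22, 26, 20, 24]]
f_stat = one_way_anova_f(ages_by_class)
print(round(f_stat, 2))
```

A large F value means the class means differ by more than the spread inside each class would explain, which is the kind of relationship the box plot shows visually.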
@aradhyakanth8409 2 years ago
Sir, what is the need to visualise the data in this problem? You haven't used any insight from the visualisation to help with data cleaning.
@nanduneo1 3 years ago
No one will travel with only children and no spouse XD..18:22
@subhamsaha2235 3 years ago
One correction, sir: in the boxplot, the middle line is the median (50th percentile). Thank you
@RahulRoy-qy8rk 4 years ago
This was so helpful. Thank You
@KimJennie-fl3sg 4 years ago
20:20 Hey, uhmm... the 50th percentile gives us the MEDIAN age of people in 1st class. So we are using the MEDIAN value instead of the MEAN, right? Very helpful video for me to understand EDA
@sharathkumar8422 4 years ago
You're right, the 50th percentile is the median. I think you should check out the definition of median and percentiles on this page - www.statisticshowto.com/probability-and-statistics/percentiles-rank-range/#:~:text=The%2050th%20percentile%20is%20generally,quartiles%20is%20the%20interquartile%20range. That should clear your doubt.
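To make the median versus mean point concrete, a small standard-library sketch on made-up ages with one large value:

```python
from statistics import mean, median, quantiles

# Made-up first-class ages with one large value to skew the mean.
ages = [22, 30, 35, 37, 40, 45, 80]

# Quartiles: the 25th, 50th and 75th percentiles.
q1, q2, q3 = quantiles(ages, n=4)

print(median(ages))  # middle value: the line drawn inside the box
print(q2)            # 50th percentile, the same number as the median
print(mean(ages))    # pulled upward by the 80-year-old
```

The box of a boxplot spans `q1` to `q3`, and the line inside it sits at `q2`.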
@gangasekar3224 2 years ago
Instead of matplotlib and seaborn, can we use Power BI?
@hrcnszn 2 years ago
Totally unrelated to the topic, but how does your taskbar look like that?
@anahitasaxena9439 11 months ago
Why did you decide to analyse Age with respect to Pclass in the missing value stage?
@classicgd 4 years ago
Hi Krish, thanks for the videos... do you have a playlist explaining all algorithms?
@thistimeforafricaa 3 years ago
Thank you sir
@yashkhilavdiya5693 2 years ago
Thank You So Much
@abhishekts740 A year ago
Please upload a video on time series analysis.
@Sab_Moh_Maya_Hal 4 years ago
Very knowledgeable, thanks man :)
@jagadeeshabburi570 3 years ago
Kind of a fantastic video, bro, but it needs 2-3 watches for a crystal-clear understanding.
@shubhamthapa7586 4 years ago
I have a question: why is he not using the SimpleImputer class from scikit-learn instead of finding the relation to fill in the NaN values? We can easily do it through the sklearn module. Also, why isn't he using a label encoder for binary values?
@kalpatarusahoo1820 5 years ago
Krish, can you explain why, during data cleaning, the passenger class is compared with Age and not with any other column? Big doubt of mine.
@Wanderlust1342 4 years ago
Did you get the answer?
@adeniyi5875 A year ago
I like the video, but how did you know exactly which graphical representation to use? I mean, why countplot and not jointplot? Why a line plot and not a boxplot? I hope you understand my questions, sir.
@samudragupta719 5 years ago
Sir, one question always revolves in my mind: how should we remember all the libraries and syntax needed to preprocess the data or do the visualization? I would be grateful if you shared your strategies for that.
@nabiltech1366 4 years ago
Same question
@samitaadhikari3182 4 years ago
@Rahul Ranjan I'll try too, thanks for the suggestion.
@muhammadbilalanwar6429 4 years ago
A very good video about EDA, but one thing I must mention is that you didn't even touch on outliers. They're a major part of EDA, and honestly I watched this video only for outliers, but didn't find them covered.
@josephtolentino1900 4 years ago
I wish I found this the first time around
@naveenrawat6505 3 years ago
loving the playlist :)))))
@vinayakindulkar3706 4 years ago
Why didn't you use transform on the train set? Is there any reason behind it?
@ittecheval1868 3 years ago
After separating out 'Survived', I was not able to understand what you did.
@ds-hy9nc 4 years ago
When I try to apply my function (23:20), it shows "unexpected EOF while parsing".
@sohamdeshpande3654 4 years ago
Thank you very much!!!
@sung3898 4 years ago
The middle line in a box plot is not the average; it's the median.
@naveengoud3264 4 years ago
Best explanation
@LearnwithNaviOfficial 11 months ago
@Krish Naik We dropped the Age column, so how does the Age column appear again?
@madhumeenamk6531 4 years ago
Can regression be done using unsupervised algorithms?
@bhavanshah1368 3 years ago
@Krish Naik: Hi Krish, could you please explain why Age is assigned cols[0] and Pclass cols[1]? I have not understood this.
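A possible answer to the cols[0] / cols[1] question, assuming the imputation function is applied row by row to a two-column selection such as `train[['Age', 'Pclass']]` (this call is an assumption; it is not quoted anywhere in the comments). Each row passed to the function then holds only those two values, so position 0 is Age and position 1 is Pclass, whatever their positions in the full table. The sketch below uses made-up rows and computes the class medians from the data instead of reading them off a plot.

```python
import pandas as pd

train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Pclass": [1, 1, 3, 3, 3],
    "Name": ["A", "B", "C", "D", "E"],
    "Age": [38.0, None, 22.0, 26.0, None],
})

# Median age per class, used as the fill value for that class.
class_median = train.groupby("Pclass")["Age"].median()

def impute_age(cols):
    # cols is one row of the two-column selection below, so it holds
    # exactly two values: Age at position 0 and Pclass at position 1.
    age, pclass = cols.iloc[0], cols.iloc[1]
    return class_median.loc[int(pclass)] if pd.isna(age) else age

train["Age"] = train[["Age", "Pclass"]].apply(impute_age, axis=1)
print(train)
```

Using `.iloc` makes the positional lookup explicit; `cols["Age"]` and `cols["Pclass"]` would read the same values by name.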
@Parshant17 3 years ago
Are you sure that is the average in the boxplot near the 20th minute? When we talk about percentiles, the 50th percentile should be the median.
@abdullahkidwai7222 2 years ago
I am getting a KeyError after executing the following code:
sns.distplot(train['Age'].dropna(),kde=False,color='darkred',bins=40)
Any suggestion or idea as to what should be done to stop getting this error?
@udhayarajsivakumar6735 2 years ago
I'm facing the same error too... Have you fixed it?