Kaggle - Titanic Solution [1/3] - data analysis

79,664 views

Minsuk Heo 허민석

Comments: 72
@MadhuKumarKilli 5 years ago
Wonderful step-by-step data analysis of a complicated dataset. Thanks for your video. It was very helpful in guiding me through the initial phase of data analysis with Python.
@anubhavsood1510 5 years ago
Complicated? This is one of the easiest datasets.
@robertogarza1902 5 years ago
This is a very good video and it helped me with an assignment for my machine learning class. Thank you very much!
@ishaansharma5982 6 years ago
Useful for getting started with Kaggle competitions. Nice.
@TheEasyoung 6 years ago
Thanks for the cheerful reply. I will keep up the good YouTubing :)
@jeehoneybee2589 5 years ago
Wonderful introduction on Kaggle. Thanks!
@brianheartwood2071 4 years ago
Good video, Minsuk. Thanks, my friend.
@mathError4004 3 years ago
Excellent, sir. Can you make a video about the Advanced House Price Prediction problem from Kaggle?
@akshaymishra4957 6 years ago
I was able to rectify it. Good tutorial. Keep helping us with more datasets.
@bhavyar2986 5 years ago
Beautiful explanation
@prabirbiswas440 4 years ago
Awesome. Thank you very much, Minsuk. This is very, very helpful.
@creativecore3575 5 years ago
Hello! Awesome video on this dataset! It's very nice to see you work your way around the mystery as a data detective. But I have a question: in the SibSp data visualization section at 9:15, how did you confirm that you were more likely to survive if you were accompanied by family members? The colors at the top of the [Dead] bar show the people who came with many family members, no? Thanks again!
@TheEasyoung 5 years ago
Lannon Khau you are right. The chart shows that passengers with only a few siblings were more likely to survive, while those with many siblings were more likely to die. Thanks for the feedback!
@kaushik4420 6 years ago
Fantastic work, thanks a ton :)
@Antnierv 5 years ago
How is it that you are more likely to survive if you have more than 2 siblings/spouses? I see all the colors representing more than 2 in the Dead bar...
@TheEasyoung 5 years ago
Good catch, and thanks! You are right. Passengers with more than 2 actually died more often.
@Antnierv 5 years ago
@TheEasyoung Okay, thanks! Great tutorial.
@vanshikapathak517 5 years ago
What is the need for the Title feature, given that we already have the Sex and Pclass features?
@dhananjaykansal8097 5 years ago
You're lovely, sir. Beautifully explained.
@abrahamjacob8876 6 years ago
Really informative video! Thanks a lot!
@russellmatson7923 5 years ago
Great work! Would be better as a rap duo with a Titanic survivor or their descendants.
@simsin3297 5 years ago
Is it necessary to plot a graph?
@TheEasyoung 5 years ago
Simran Singh nope! It is just for a better understanding of the data :)
@simsin3297 5 years ago
@TheEasyoung Thanks for replying. I am a beginner in machine learning, so how should I decide which features to graph?
@TheEasyoung 5 years ago
Simran Singh plot anything you think is relevant.
@davidpage9681 4 years ago
Fantastic video. I am trying to follow what you have done and am getting the following error when trying to apply the code from Binning 4.4.2:
for dataset in train_test_data: dataset.loc[ dataset['Age'] 16) & (dataset['Age'] 26) & (dataset['Age'] 36) & (dataset['Age'] 62, 'Age'] = 4
ValueError: cannot set using a multi-index selection indexer with a different length than the value
I have been on Stack Overflow and, as I am still relatively new to coding, am struggling to understand this error. Any help would be very much appreciated. Thanks, Dave
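One way to sidestep a chain of dataset.loc assignments like the one in this question is to bin the column in a single call. The following is only a sketch, not the video's code: it assumes the cut points 16, 26, 36 and 62 that are still visible in the comment, assumes the labels run from 0 to 4 (only the final label 4 is visible above), and uses a small made-up Age column rather than the real data.

```python
import pandas as pd

# Made-up ages for illustration only; not the real Titanic data.
df = pd.DataFrame({"Age": [5, 16, 20, 30, 40, 62, 70]})

# Bin edges taken from the thresholds visible in the comment (16, 26, 36, 62).
# pd.cut uses right-closed intervals by default, so 16 lands in the first bin.
bins = [float("-inf"), 16, 26, 36, 62, float("inf")]

# Labels 0..4 are an assumption; the comment only shows the last one (4).
df["AgeBin"] = pd.cut(df["Age"], bins=bins, labels=[0, 1, 2, 3, 4]).astype(int)

print(df)
```

Writing the result to a new column (AgeBin here, a hypothetical name) keeps the original Age values around, which makes the binning easy to check by eye.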
@bharatchandra2512 5 years ago
NameError: name 'df' is not defined. I'm getting an error that df is not defined at 7:31. Please reply quickly, sir. How do I solve the error?
@anubhavsood1510 5 years ago
You need to import the source data into df first. Rather than jumping into coding, I would suggest going through the theory and trying to see why we are doing what we are doing.
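As a rough sketch of what "import the source data first" can look like: a NameError for df means no earlier cell has assigned that name, so the data has to be loaded before it is used. The inline CSV text below is a made-up stand-in so the snippet runs on its own; with the Kaggle download you would pass the path of the CSV file (for example "train.csv", a hypothetical location) to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up stand-in for a file on disk, for illustration only.
csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n3,1,3\n"

# The name df exists only after this assignment has run.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.head())
```

In a notebook, the same error also appears when the loading cell exists but has not been executed in the current session.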
@robind999 6 years ago
Very good video. Thanks.
@stevesongrealestate 5 years ago
I logged in to Kaggle but didn't see the "My Submissions" or "Submit Predictions" button. Why?
@TheEasyoung 5 years ago
I logged in to Kaggle and there is a Submit Predictions button at the top right corner. I hope you find it.
@stevesongrealestate 5 years ago
@TheEasyoung I don't have it on my side. Is there any requirement for the account?
@TheEasyoung 5 years ago
Steve Song I have no idea, but I don't remember there being one. I hope you find it soon!
@stevesongrealestate 5 years ago
@TheEasyoung That's okay.
@manjunathnagendra9014 6 years ago
Amazing. In the plot function, how do I set the default colors you are using in the stacked bar plot? The plot function doesn't have palettes; you can only set colors.
@manjunathnagendra9014 6 years ago
Hey, I figured it out. sns.set() will do it. Thanks :)
@Tino-df1su 5 years ago
Hi! I watched your video about the Titanic solution today, and I don't see your code at the GitHub link in the description. Can you share the code from the video with me? Thank you!
@akshaymishra4957 6 years ago
My code shows an error when calling bar_chart('Pclass'). Help me out.
@TheEasyoung 6 years ago
Akshay Mishra, which error do you get?
@satyammeena3155 6 years ago
survived = train[train['Survived']==1][feature].value_counts()
Please, sir, explain the use of feature here, and also explain what value_counts() does.
@TheEasyoung 6 years ago
In the bar_chart function, feature is passed as a parameter; it can be 'Pclass', 'Embarked', or any other feature. train[train['Survived']==1] selects only the rows for passengers who survived. train[train['Survived']==1][feature] is that feature's column for the survivors (if there are 100 survivors, it has 100 items). value_counts() then groups those 100 items by the feature's classes. For example, if feature were 'Pclass', value_counts() would give something like 1: 30, 2: 30, 3: 40, since we have 1st, 2nd, and 3rd class and are assuming 100 survivors in the train data. I hope this answers your question.
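The steps described in this reply can be tried on a tiny table. This is a sketch under assumptions: the eight rows are invented for illustration, survival_counts is a hypothetical helper name, and the plotting step is left out so that only the filtering and value_counts() parts are shown.

```python
import pandas as pd

# Invented rows for illustration only; not the real Titanic data.
train = pd.DataFrame({
    "Survived": [1, 1, 1, 0, 0, 0, 0, 1],
    "Pclass":   [1, 1, 2, 3, 3, 3, 2, 3],
})


def survival_counts(data, feature):
    """Count each value of `feature` separately for survivors and non-survivors."""
    # Keep only the survivors' rows, take the feature column, count each class.
    survived = data[data["Survived"] == 1][feature].value_counts()
    # Same for the passengers who did not survive.
    dead = data[data["Survived"] == 0][feature].value_counts()
    table = pd.DataFrame([survived, dead])
    table.index = ["Survived", "Dead"]
    # A class missing from one group shows up as NaN; treat it as a count of 0.
    return table.fillna(0).astype(int)


counts = survival_counts(train, "Pclass")
print(counts)
```

A table in this shape, with one row per outcome and one column per class, is the kind of input a stacked bar chart needs.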
@vininitdgp 5 years ago
Hi, thanks for the code. Everything is working fine, but I got stuck at cell 77: the error is at line 3 and says "could not convert string to float: 'S'". Please let me know how to resolve this. Thanks in advance.
@caissa6187 4 years ago
Thanks so much! 😊
@joelkhaung 6 years ago
Thanks for your demo. How did you generate the train and test data CSVs? Any approach?
@TheEasyoung 6 years ago
Kyaw Khaung hi, you can get the Titanic dataset from Kaggle.
@coricua88 5 years ago
Great work, Prof. Heo. I have a question for you. Is it indispensable or recommended for beginners to always divide the dataset into separate train and test sets?
@TheEasyoung 5 years ago
Yes, since the test data must be unseen by the ML model for valid testing. Thanks for watching!
@coricua88 5 years ago
@TheEasyoung So how can I select which data to place into the train and test datasets (CSV files) respectively? Is there any rule for selecting them for the sake of accuracy?
@ChannelMath 5 years ago
@coricua88 They just need to be randomly selected.
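A random split of this kind can be sketched with pandas alone. Everything specific here is an assumption for illustration: the ten made-up rows, the 80/20 ratio, and the fixed random_state, which is only there to make the shuffle repeatable.

```python
import pandas as pd

# Made-up rows for illustration only; not the real Titanic data.
data = pd.DataFrame({
    "PassengerId": range(1, 11),
    "Survived": [0, 1] * 5,
})

# Randomly pick 80% of the rows for training.
train = data.sample(frac=0.8, random_state=42)

# Every row that was not sampled becomes test data, so the two sets never overlap.
test = data.drop(train.index)

print(len(train), len(test))
```

For the Titanic competition itself this step is not needed to get started, since the dataset downloaded from Kaggle already comes as separate train and test files.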
@zeus1082 6 years ago
But people who paid a high fare or who departed from a rich place are the same as Pclass. Why don't we just include Pclass and ignore the others?
@TheEasyoung 6 years ago
You may be right. You can test your hypothesis and compare the results to see which approach gives a better score on the Kaggle test set. Sometimes we can delete duplicate or ignorable features, but sometimes features look correlated (when A goes up, B goes up) when they actually are not. Thanks for the good question!
@zeus1082 6 years ago
Minsuk Heo 허민석 I used the forest algorithm with sex, pclass, age, and parch as features. I got a score of 0.7655 for that on Kaggle; I submitted just now. How bad is that score? By the way, the tutorial was superb.
@TheEasyoung 6 years ago
aneesh cool, your score is OK. Mine was around 79.5. Cheers!
@zeus1082 6 years ago
Minsuk Heo 허민석 Cool. We could still improve our model, right? Last time I took the median for the age. I am going to include some more features and see the result. We could improve our score.
@TheEasyoung 6 years ago
Good to hear that. One tip for you is ensembling: take a look at bagging, boosting, and stacking. Since you already have preprocessed data, all you need to do is combine the results of multiple classifiers, which should give you better results.
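The simplest way to combine the results of several classifiers is a majority vote over their predicted labels. The sketch below shows only that idea, which is a much simpler relative of the bagging, boosting, and stacking mentioned in the reply rather than any of them. The three prediction arrays are invented stand-ins for the outputs of three already-trained models, and majority_vote is a hypothetical helper name.

```python
import numpy as np


def majority_vote(predictions):
    """Combine 0/1 predictions from several classifiers by majority vote."""
    stacked = np.asarray(predictions)    # shape: (n_classifiers, n_samples)
    votes_for_one = stacked.sum(axis=0)  # how many classifiers predicted 1
    # Predict 1 only where more than half of the classifiers voted for 1.
    return (votes_for_one * 2 > stacked.shape[0]).astype(int)


# Invented predictions standing in for three different classifiers.
preds = [
    np.array([1, 0, 1, 0]),
    np.array([1, 1, 0, 0]),
    np.array([0, 1, 1, 0]),
]

final = majority_vote(preds)
print(final)
```

Using an odd number of classifiers avoids ties; with an even number, this sketch resolves a tie to 0.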
@РадаСеношенко 4 years ago
Thank you very much!!
@deep1998-j1v 4 years ago
Is this done in ML or DL?
@TheEasyoung 4 years ago
In ML.
@pran6663 5 years ago
Thanks a lot!!
@tamago9760 6 years ago
This is very helpful! Thank you.
@pranjalpathak4498 5 years ago
"It is the first class... very rich people" LOL
@ChannelMath 5 years ago
Then what he finds next... not so funny.
@andrianjuric2732 4 years ago
"Damn it Jin Yang"
@4abdoulaye 7 years ago
Thanks
@TheEasyoung 7 years ago
Abdoulaye Diallo my pleasure :)
@slawomirgontarek4213 6 years ago
Please add translations into other languages. Thanks!
@guntherrondina2190 5 years ago
It's so funny hearing him say that most males are dead at 8:04
@slawomirgontarek4213 6 years ago
Hello, PLEASE ADD TRANSLATIONS INTO OTHER LANGUAGES! THANKS
@y_rb4080 4 years ago
Man, your accent is insane :(
@denisvoronov6571 4 years ago
It's better to have it in English with an accent than to not have it at all.