Titanic Survival Prediction in Python - Machine Learning Project

91,785 views

NeuralNine

1 day ago

In this video, we build a model that predicts Titanic survivors with decent accuracy.
Kaggle Challenge: www.kaggle.com/c/titanic
◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾
📚 Programming Books & Merch 📚
🐍 The Python Bible Book: www.neuralnine.com/books/
💻 The Algorithm Bible Book: www.neuralnine.com/books/
👕 Programming Merch: www.neuralnine.com/shop
🌐 Social Media & Contact 🌐
📱 Website: www.neuralnine.com/
📷 Instagram: / neuralnine
🐦 Twitter: / neuralnine
🤵 LinkedIn: / neuralnine
📁 GitHub: github.com/NeuralNine
🎙 Discord: / discord
🎵 Outro Music From: www.bensound.com/

Comments: 97
@timvielhauer1231 7 months ago
The latest pandas version no longer ignores string values in the .corr function. Just add "numeric_only=True" and it will work again.
@user-ss5uf4dx3n 7 months ago
Thank you so much! I was looking for how to resolve this issue.
@hk6926 1 month ago
For people who are dumb like me, here is what it means :) sns.heatmap(train_data.corr(numeric_only=True), cmap='YlGnBu')
@ireallydontknow. 1 month ago
Thank you so much bro, I was trying to solve this for 2 days continuously and nothing worked..🥹
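The fix described in this thread can be sketched as follows. The small DataFrame and its values are invented for illustration and are not the Kaggle data; the behaviour shown assumes pandas 2.0 or later, where string columns are no longer dropped silently by .corr().

```python
import pandas as pd

# Hypothetical stand-in for the Titanic training data (values invented).
train_data = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Pclass": [3, 1, 3, 2],
    "Fare": [7.25, 71.28, 7.92, 13.0],
    "Survived": [0, 1, 1, 0],
})

# numeric_only=True restricts the correlation to numeric columns,
# so the string column "Name" is left out instead of raising an error.
corr = train_data.corr(numeric_only=True)
print(corr)

# The heatmap call from the comments would then be:
# sns.heatmap(corr, cmap="YlGnBu")
```

Note that numeric_only expects the boolean True, not the string 'True'.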
@saya5664 2 years ago
Great tutorial video! It helped me understand how pipelines in ML work. I hope there will be more Kaggle competition walkthroughs like this from you soon! :)
@muratsahin1978 1 year ago
I was pretty confused when I saw 100% accuracy lol, thanks for explaining.
@benjamindeporte3806 2 years ago
Nice "real life" example of the scikit pipeline. Helped me a lot, thanks.
@jaym0ney_ 2 years ago
This is a great video. I've been trying to find a good place that would show the code behind creating a basic ML pipeline, or show some beginner feature engineering and whatnot, but I haven't found anything as straightforward as this. A lot of other people have a lot of fluff in their tutorials, but you just show it straight up, which I really appreciate. Do you have any recommendations for textbooks/articles for a beginner wanting to get into Machine Learning? I have a strong math/programming background, so that's not an issue; I just need something that will comprehensively explain all the main components of making an ML project. Thanks in advance and keep up the good work!
@cryptigo 2 years ago
This is actually such a good idea. A lot of Python program / resume ideas are boring. Thanks!
@jeremyheng8573 2 years ago
Thank you for the great tutorial! Do you have more Kaggle competition walkthroughs?
@shashvatsinghal2574 1 year ago
This is the best video I have ever watched on data science and ML to date.
@armantech5926 11 months ago
Great video, thank you!
@vivekthumu8992 1 year ago
Thank you so much for providing this video. It helped me understand a lot.
@paralogyX 2 years ago
Good video, but: 1) What was the purpose of the test set? You didn't use it to evaluate your model; you used cross-validation instead. 2) You shouldn't fit StandardScaler on the Kaggle test set; only transform it with the same scaler you fitted on the training data. If the features are distributed a bit differently, the scaling will be different and your model will get different numbers for an identical passenger. It would be nice if you paid attention to these details, because they are really important. But generally, the video is nice and useful.
@jaysoncastillo2593 7 months ago
I had the same comment. The test set shouldn't be fitted again, only transformed.
@jaysoncastillo2593 7 months ago
Do you know any YouTube channel solving the Titanic dataset for reference?
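The second point in this thread (fit the scaler on the training data, then only transform the test data) can be sketched like this. The arrays are invented numbers standing in for a numeric feature such as Fare; this is an illustration of the idea, not the video's code.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented values standing in for one numeric feature.
X_train = np.array([[10.0], [20.0], [30.0], [40.0]])
X_test = np.array([[20.0], [60.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std, no refitting

# A passenger with the raw value 20.0 gets the same scaled value in both sets.
print(X_train_scaled[1, 0], X_test_scaled[0, 0])
```

Calling fit_transform on the test set instead would compute a new mean and standard deviation, so the same raw value would map to a different scaled value.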
@wasgeht2409 1 year ago
Thank you... I have one question: why did you pick these models? On which KPIs do you base your choice of models for different kinds of problems? That would be very interesting for me.
@jomp6141 1 month ago
Man your video was awesome. Easy to follow and replicate, plus you explain the key insights for those of us who have only a little knowledge of data analysis. Thanks a lot!
@abhinavchoudhary6849 2 years ago
Awesome bro
@Mychannel12404 2 years ago
Your videos are awesome, I like them so much. That's a great job. Love from India....
@mertmunuklu7732 5 months ago
Thanks, it is a great tutorial
@aflahalabri6331 4 months ago
I don't think there was a need to create the AgeImputer class, at least in the latest versions; using the SimpleImputer class directly is probably sufficient. But it's a good learning tip on how to create a custom class.
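A minimal sketch of the suggestion above, assuming the goal is simply to fill missing Age values with the column mean. The data is invented, and the video's AgeImputer class may do more than this.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Invented ages with two missing entries.
X = pd.DataFrame({"Age": [22.0, np.nan, 38.0, np.nan, 30.0]})

imputer = SimpleImputer(strategy="mean")
# fit_transform expects 2D input, hence the double brackets; ravel() flattens the result.
X["Age"] = imputer.fit_transform(X[["Age"]]).ravel()
print(X)
```

SimpleImputer can also be placed directly in a scikit-learn Pipeline, since it already implements fit and transform.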
@AzureCz 2 years ago
I'm curious: how do I find the accuracy percentage inside the notebook, by comparing the predictions with the dataset that we have, and not just by uploading to Kaggle?
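One way to answer this question is to hold out part of the labelled training data, since Kaggle's test file has no Survived column. The sketch below uses synthetic data from make_classification as a stand-in for the preprocessed Titanic features, so the printed numbers say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for preprocessed features (X) and Survived labels (y).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Keep 20% of the labelled data aside for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Accuracy on the held-out rows, plus a cross-validated estimate on the training rows.
holdout_acc = accuracy_score(y_val, clf.predict(X_val))
cv_scores = cross_val_score(clf, X_train, y_train, cv=3, scoring="accuracy")
print(f"hold-out accuracy: {holdout_acc:.3f}")
print(f"3-fold CV accuracy: {cv_scores.mean():.3f}")
```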
@novagamings4505 1 year ago
I am new to data science in terms of experience, though I have completed a paid skills course from IBM. In my first attempt at this project, which is my first project, I got an accuracy of 78%. Is that good enough to move on to the next project, or should I try to refine my model for better accuracy? Someone with experience, please advise.
@pravachanpatra4012 2 years ago
Can you make a tutorial on an AI that plays a game using the NEAT module in Python and pygame???
@anotherone8256 2 years ago
Nice video.
@Warclimb64 1 month ago
I had a problem at 42:05. I solved it by selecting only the numeric columns: X_test_numeric = X_test.select_dtypes(include=[np.number])
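For readers unfamiliar with select_dtypes, here is a small illustration of what the line above does. The DataFrame is invented; the comment does not say which columns caused the problem at 42:05.

```python
import numpy as np
import pandas as pd

# Invented test frame with one string column and two numeric columns.
X_test = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B"],
    "Age": [22.0, 38.0],
    "Pclass": [3, 1],
})

# Keep only columns with a numeric dtype (ints and floats).
X_test_numeric = X_test.select_dtypes(include=[np.number])
print(X_test_numeric.columns.tolist())
```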
@valentinmagis6743 1 year ago
Why are you scaling the variables when using a tree-based model? Scaling normalizes the data so that no particular feature is given priority. It matters mostly for distance-based algorithms, such as those that rely on Euclidean distance. Random Forest is a tree-based model and hence does not require feature scaling.
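The claim above can be checked empirically. The sketch below uses a single decision tree rather than a full random forest to keep it short, and synthetic data rather than the Titanic set; both choices are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the preprocessed features.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)

# Same model and seed, trained once on raw features and once on scaled features.
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(
    scaler.transform(X_train), y_train
)

pred_raw = tree_raw.predict(X_test)
pred_scaled = tree_scaled.predict(scaler.transform(X_test))

# Tree splits depend only on the ordering of values within a feature,
# which standard scaling preserves, so the predictions should match.
agreement = np.mean(pred_raw == pred_scaled)
print(f"share of identical predictions: {agreement:.3f}")
```

Scaling a tree-based model is therefore harmless but unnecessary; it only becomes important if the pipeline is later reused with a scale-sensitive model.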
@yogeshwarkethepalli4234 11 months ago
I get the error message "sparse matrix length is ambiguous; use getnnz() or shape[0]". How do I solve this? column_names = ["C", "S", "Q", "N"] ---> 13 for i in range(len(matrix.T)): 14 X[column_names[i]] = matrix.T[i]
@wbdhh317 7 months ago
Me too, how do I solve it?
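A likely cause of this error is that OneHotEncoder returns a SciPy sparse matrix by default, and len() is not defined for it. One possible fix, sketched on invented data, is to convert the result with toarray() before looping. The column names here are taken from the fitted encoder rather than from the hard-coded list in the comment.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Invented Embarked values.
X = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it a dense
# NumPy array, so len() and row indexing work as usual.
matrix = encoder.fit_transform(X[["Embarked"]]).toarray()

# categories_ holds the column order the encoder actually used (sorted values).
column_names = list(encoder.categories_[0])
for i in range(len(matrix.T)):
    X[column_names[i]] = matrix.T[i]

print(X)
```

Reading the names from encoder.categories_ avoids mismatches between a hand-written list and the encoder's sorted column order.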
@yashtysingh1171 10 months ago
Sir, my updated sklearn version doesn't have fit_transform. Please guide me on what I should do!
@statistikochspss-hjalpen8335 1 year ago
11:45 You can't use the Pearson correlation coefficient for nominal/ordinal data. 12:49 You need to create dummy variables for each class.
@unfff 10 months ago
Hey, I see he addresses the Pearson correlation coefficient issue later on, where he uses one-hot encoding to turn the data from ordinal to discrete. Is there a better way to visualize correlation even when you use this method? Or would doing the one-hot encoding first and then the correlation heat map be best practice?
@statistikochspss-hjalpen8335 10 months ago
@unfff Doing one-hot encoding and choosing the right correlation coefficient are two separate things. One-hot encoding has nothing to do with correlation analysis. One-hot encoding is just a transformation of a variable that can be used for multiple purposes.
@Summer-of8zk 9 months ago
To fix the fact that corr() doesn't work with words, you can do "df.corr(numeric_only=True)", where df is your data. That will give the correlation for your data, but you do lose the non-numeric data columns.
@statistikochspss-hjalpen8335 9 months ago
@Summer-of8zk You are talking about a technical solution. What do you mean by "doesn't work"? Every statistical software will produce a correlation coefficient as long as your columns have some digits in them. I'm talking about what's theoretically (in)correct.
@juanmariomorenochaparro127 1 year ago
Thanks, very interesting video, new subscriber.
@TheErick211_ 2 months ago
Is there a video in which you give a deep explanation of classes, __init__, and everything related to these methods?
@Animax590 4 months ago
I just used logistic regression and got 0.7655 using only gender and Pclass. Thanks for your clarification about 100% accuracy though.
@supremenp 1 year ago
sns.heatmap(titanic_data.corr(), cmap="YlGnBu") plt.show() This gives the error: could not convert string to float: 'Braund, Mr. Owen Harris'. Shouldn't titanic_data.corr() drop the string columns automatically?
@heisgiovann 10 months ago
How did you solve this error?
@unfff 10 months ago
Use sns.heatmap(titanic_data.corr(numeric_only=True), cmap="YlGnBu") instead of sns.heatmap(titanic_data.corr(), cmap="YlGnBu") at 11:50. I assume numeric_only defaulted to True when this video was made and the default was changed later. The correlation function can't compute a correlation for anything that isn't quantitative, so you have to tell it to look only at numerical features.
@TheShakour 9 months ago
@unfff tnx bro... it helped
@sushre10 3 months ago
Yes, I get this same error too.
@mahis7232 3 months ago
@unfff tysm 🥰
@paulbuono5088 1 year ago
Interesting that at 15:10 you said you don't want to look too much at your training set so you don't get biased. It seems everyone else I hear says to examine it as much as possible... is there something I'm misinterpreting from you or them?
@alimemon9942 3 months ago
He said the testing dataset, not the training dataset.
@philjoseph3252 2 months ago
Is there a difference between one-hot encoding in pandas and sklearn? The process is so much easier with pandas. Is there a particular reason why he used sklearn?
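One practical difference, which may or may not be the video's reason, is that scikit-learn's encoder remembers the categories it saw during fitting, while pd.get_dummies builds columns from whatever values are present in each frame. A small sketch on invented data:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Invented data: the test frame happens to contain only one port.
train = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})
test = pd.DataFrame({"Embarked": ["S", "S"]})

# pandas: the columns depend on which categories appear in each frame.
train_dummies = pd.get_dummies(train["Embarked"])
test_dummies = pd.get_dummies(test["Embarked"])
print(train_dummies.shape, test_dummies.shape)

# scikit-learn: the encoder is fitted once and reuses the same columns later.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train[["Embarked"]])
test_encoded = encoder.transform(test[["Embarked"]]).toarray()
print(test_encoded.shape)
```

With get_dummies the training and test frames can end up with different numbers of columns, which breaks a model expecting a fixed feature layout; a fitted encoder inside a pipeline avoids that.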
@soorajsridhar3279 1 year ago
I followed the code as shown in the video and came across an error when calling fit_transform with strat_test_set. The error was that the 'Embarked' column was missing. I think it is because we drop it in the featuredropper function, and when the pipeline processes the data all over again, we get this error. Can you help me fix it ASAP???
@yogeshchoudhary1414 1 year ago
I got the same error too
@rachelalam560 9 months ago
Me too
@binglinjian2324 9 months ago
Maybe that's because you ran that part of the code multiple times? I restarted and ran all the code, and it works fine.
@jeeaspirant7890 17 days ago
@binglinjian2324 Please tell me how to fix this 😢
@fizipcfx 2 years ago
This is strange, but adding the name length as a column helps. The name length has a 0.332350 correlation with the Survived column :)
@paralogyX 2 years ago
Correlation is not causation. Very good example!
@ParthivShah 1 month ago
nice
@jsemslava7880 1 year ago
A little bit fast (especially the typing xD), but a good tutorial; I got 79.42%, thanks!
@emmaoye2704 1 year ago
Am I the only one stuck at 32:31? I keep getting this error: AttributeError: 'FeatureEncoder' object has no attribute 'transform'
@aidaosmonova4798 6 months ago
Could you solve this?
@lemanosmanli2006 29 days ago
@aidaosmonova4798 Hi, could you solve it?
@jeeaspirant7890 17 days ago
Please tell me how to fix this
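An AttributeError like the one in this thread generally means the class has no method named exactly transform at class level, for example because the name is misspelled or the method is indented inside fit. The thread does not confirm the cause, so the following is only a sketch of the expected shape of a custom transformer. FeatureDropper and its column list are hypothetical and not the video's exact code.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class FeatureDropper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that drops columns; not the video's exact class."""

    def __init__(self, columns=("Name", "Ticket")):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; returning self lets the pipeline chain calls.
        return self

    def transform(self, X):
        # errors="ignore" keeps this safe when the columns are already gone,
        # for example after re-running a notebook cell on processed data.
        return X.drop(columns=list(self.columns), errors="ignore")


pipeline = Pipeline([("dropper", FeatureDropper())])

# Invented one-row frame to show the effect.
df = pd.DataFrame({"Name": ["Passenger A"], "Ticket": ["T1"], "Age": [22.0]})
result = pipeline.fit_transform(df)
print(result.columns.tolist())
```

Both fit and transform must be defined at the same indentation level as __init__; fit_transform then comes for free from TransformerMixin. The import line also shows where BaseEstimator lives (sklearn.base).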
@TheErick211_ 2 months ago
Can we download your Jupyter notebook from somewhere?
@user-cd5gd5ij4k 2 months ago
Thank you for your teaching video, it is very good for a noob
@tgmbrett 2 years ago
At 32:00, how is he calling strat_train_set in the pipeline.fit_transform function when the variable doesn't exist yet?
@90Lema 1 year ago
Did you find the answer? 😬
@sayuri_20 1 month ago
@90Lema Did you find it yet?
@shanondalmeida7235 7 months ago
Correlation doesn't work for string values. How did you do it? 🤔
@Dan-mm9yd 2 months ago
Same problem
@lemanosmanli2006 29 days ago
@Dan-mm9yd numeric_only=True
@TheNewfacto 5 months ago
I just submitted mine today and got a score of 0.78229, but then I saw all those 1s and I was like "just how did they do that" 😂
@komalrehman7173 2 months ago
I am getting a strat data error, and after that there are errors everywhere. Can anyone explain why?
@cristhianriverajurado7497 1 year ago
I got the error "ValueError: Input contains NaN" after this line: strat_train_set = pipeline.fit_transform(strat_train_set). I was following your tutorial.
@yashp5341 1 year ago
I got the same error, did you perhaps get the answer?
@francoramirezcastillo8075 1 year ago
@yashp5341 I solved it, but I don't know if you get the same error. It kept pointing at this: X[column_names[i]] = matrix.T(i), and it should look like this: X[column_names[i]] = matrix.T[i]. I had to change the parentheses to [ ]. I hope it helps.
@dragosdalta4317 1 year ago
Can't import BaseEstimator, can anyone help?
@lemanosmanli2006 29 days ago
Hello, thanks for this video, but strat_train_set = pipeline.fit(strat_train_Set) gives an AttributeError: 'DataFrame' object has no attribute 'toarray'
@jeeaspirant7890 17 days ago
How do I fix this? Please tell me.
@lemanosmanli2006 17 days ago
@jeeaspirant7890 I can't fix it
@whilstblower901 9 months ago
Give the notebook
@mtk-0_0 1 year ago
decent vid
@HypnosisBear 2 years ago
Lol
@pogus3229 2 years ago
lol
@HypnosisBear 2 years ago
Even I laughed at the title.
@kianestrera-hr5vt 29 days ago
I see, they are probably cheating. I lost confidence when I saw some 100% scores while I only got 0.76, which I think is not bad.
@aleksandr.v100 2 years ago
Very interesting. But please translate your video into Russian.
@quasii7 2 years ago
No offence, but the generally accepted language of computer science is English. It would be hard to translate everything, and I am saying this as a non-native speaker.
@aleksandr.v100 2 years ago
@quasii7 ah, okay then
@paralogyX 2 years ago
I am also Russian, but computer science literature etc. is mostly in English, so it's better to get used to it.