How to do the Titanic Kaggle competition in R - Part 2

27,568 views

Data Science Dojo

Comments: 24
@michaelpastor6841 3 years ago
Three Cheers for Data Science Dojo! Excellent Tutorial - Thanks!
@cCc1453cCc 2 years ago
Just wow. Salutations. I studied political science and have already done some modeling, but this is next-level stuff. Thank you for the insights, man. It surprised me that you didn't bother checking R-squared when picking the best model, but this is still way ahead of anything I've seen. Again, thanks, man.
@Datasciencedojo 2 years ago
Keep following us for more tutorials, Seytan.
@Bradelff 6 years ago
This 2nd tutorial on fixing NAs is just what I've been looking for as a way to improve my model. Thanks.
@syhusada1130 2 years ago
You're predicting 1 missing value of Fare while 246 values of Age are still missing. (246 of the 263 missing Age values are in the version of the data frame filtered by the upper whisker.) Is this all right?
@saravanansarasu4573 7 years ago
Clear and good explanation. Cheers from India :)
@wendhynovizar3926 1 year ago
Unfortunately, I get a missing-values error on this chunk: titanic.model
@Datasciencedojo 1 year ago
It seems that you have missing values in your "titanic.train" dataset, which causes an error when you call the randomForest function. There are two ways to deal with this. Remove rows with missing values: if the missing values affect only a small portion of the dataset, you can drop the rows that contain them with the na.omit() function. Be cautious when removing data, though, as it may discard valuable information. Impute the missing values: there is an R package called missForest that uses random forests to impute missing values. Once the gaps are filled in, you can pass the completed dataset to the randomForest function.
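A minimal sketch of the two options in the reply above, using base R only. The data frame `toy` and its values are made up for illustration and stand in for "titanic.train". The median fill is a deliberately simple stand-in for imputation; it is not what missForest does internally.

```r
# Toy data standing in for titanic.train (values are invented for illustration).
toy <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1, 0)),
  Pclass   = c(3, 1, 3, 1, 2, 3),
  Age      = c(22, 38, NA, 35, NA, 54),
  Fare     = c(7.25, 71.28, 7.92, 53.10, NA, 51.86)
)

# Count missing values per column to see where the problem is.
na.counts <- colSums(is.na(toy))

# Option 1: drop every row that contains at least one NA.
toy.complete <- na.omit(toy)

# Option 2: fill the gaps instead of dropping rows. The column median is
# only a simple stand-in; a model-based imputation would replace this loop.
toy.filled <- toy
for (col in c("Age", "Fare")) {
  missing <- is.na(toy.filled[[col]])
  toy.filled[missing, col] <- median(toy.filled[[col]], na.rm = TRUE)
}

print(na.counts)
print(nrow(toy.complete))
print(sum(is.na(toy.filled)))
```

Option 1 shrinks the training set, which matters when many rows have a gap; Option 2 keeps every row but the quality of the model then depends on how sensible the filled-in values are.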
@wendhynovizar3926 1 year ago
@Datasciencedojo Thanks for the heads-up, I will look at it tomorrow.
@ishwarashar8503 4 years ago
I am getting this error after running the predict() function: "Error in eval(predvars, data, env): numeric 'envir' arg not of length one". I am predicting Age. I followed the same steps as for Fare, except that I skipped the outlier filter and used the full data. Help, please!
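One common cause of this message is passing predict() a plain numeric vector as newdata instead of a data frame; whether that is what happened in this case is an assumption. The sketch below reproduces the situation on invented data (`toy` and `age.model` are hypothetical names) and shows how `drop = FALSE` keeps the selection a data frame.

```r
# Invented data; only the shape matters here.
toy <- data.frame(
  Age  = c(22, 38, 26, 35, 54, 2),
  Fare = c(7.25, 71.28, 7.92, 53.10, 51.86, 21.08)
)
age.model <- lm(Age ~ Fare, data = toy)

# Selecting a single column with [rows, "col"] drops to a numeric vector.
new.vector <- toy[1:2, "Fare"]

# predict() expects a data frame, so this call fails; the error is captured
# as text instead of stopping the script.
result <- tryCatch(
  predict(age.model, newdata = new.vector),
  error = function(e) conditionMessage(e)
)
print(result)

# drop = FALSE keeps the single-column selection as a data frame.
new.frame <- toy[1:2, "Fare", drop = FALSE]
fixed <- predict(age.model, newdata = new.frame)
print(fixed)
```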
@Bradelff 6 years ago
This tutorial on how to build and submit a Kaggle competition model could not be better.
@oscaryiu3329 5 years ago
Will it be a problem if the filter does not remove the rows that have NA values in Fare? I want to build a regression model to predict Age, but there are too many NAs in the filtered data.
@Datasciencedojo 5 years ago
It is not always a good idea to simply remove rows with missing values: if a value is missing in one column, the other columns in that row may still carry important information. Another option is to compute an average and substitute it for the missing values. Here, even that is a bad idea because, as mentioned in the video, Fare varies with Pclass. So you want some form of predictive model that estimates the missing values from the other columns. First use such a model to fill in the NA values in Fare, as we did in the video. Then, with the Fare column filled, build a regression model to predict Age. Hope this helps!
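The order of operations described in this reply can be sketched roughly as follows. Everything here is an assumption for illustration: the data frame `toy` is synthetic, only `Pclass` and `SibSp` are used as predictors, and the upper-whisker filter is one plausible reading of the outlier filter from the video rather than a copy of its code.

```r
set.seed(1)

# Synthetic stand-in for the combined Titanic data (not the real values).
n <- 60
toy <- data.frame(
  Pclass = factor(sample(1:3, n, replace = TRUE)),
  SibSp  = sample(0:3, n, replace = TRUE)
)
toy$Fare <- 90 - 25 * as.integer(toy$Pclass) + 5 * toy$SibSp + rnorm(n, sd = 3)
toy$Age  <- 45 - 5 * as.integer(toy$Pclass) + 0.1 * toy$Fare + rnorm(n, sd = 4)
toy$Fare[c(5, 20)] <- NA        # row 20 is missing both Fare and Age
toy$Age[c(7, 20, 33)] <- NA

# Step 1: fit Fare on rows where Fare is known and not above the upper whisker.
upper.whisker <- boxplot.stats(toy$Fare)$stats[5]
fare.train <- toy[!is.na(toy$Fare) & toy$Fare <= upper.whisker, ]
fare.model <- lm(Fare ~ Pclass + SibSp, data = fare.train)

# Predict only the rows where Fare is missing and write the values back.
fare.missing <- is.na(toy$Fare)
toy$Fare[fare.missing] <- predict(
  fare.model,
  newdata = toy[fare.missing, c("Pclass", "SibSp")]
)

# Step 2: with Fare complete, fit Age on the rows where Age is known.
age.missing <- is.na(toy$Age)
age.model <- lm(Age ~ Pclass + SibSp + Fare, data = toy[!age.missing, ])
toy$Age[age.missing] <- predict(
  age.model,
  newdata = toy[age.missing, c("Pclass", "SibSp", "Fare")]
)

print(sum(is.na(toy)))
```

Leaving Age out of the Fare model means the Fare prediction never needs a value that is itself missing, which is why Fare is filled first and then reused as a predictor for Age.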
@syedarif8726 7 years ago
What was the reason for doing the categorical casting? (Beginner here.)
@lekalache9888 7 years ago
Can anyone explain why, after using linear models to fill in new data (I cleaned Fare and Age), the result of the final prediction is the same? And how can we clean a column like Embarked? Thanks a lot for the video and for your support.
@mario17-t34 2 years ago
Thanks. Is there a more detailed video about the actual randomForest part starting at 17:27? I can't understand how we are able to attach Survived to the original test set without messing up the row IDs. Is that somehow done internally? 60 Survived str(Survived) Factor w/ 2 levels "0","1": 1 2 1 1 2 1 2 1 2 1 ... - attr(*, "names")= chr [1:418] "1" "2" "3" "4" ... 62 PassengerId
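On the row-order question: predict() in R returns one prediction per row of newdata, in the same order, and names the result after the row names of newdata, so pairing the predictions with PassengerId from the same data frame keeps them aligned. The sketch below illustrates this on invented data and swaps in a logistic regression (glm) for the random forest so that it runs with base R only; `train`, `test`, and `output` are hypothetical names.

```r
set.seed(42)

# Invented training and test sets; the test set has no Survived column.
train <- data.frame(PassengerId = 1:40, Fare = runif(40, 5, 100))
train$Survived <- factor(as.integer(train$Fare + rnorm(40, sd = 20) > 50))
test <- data.frame(
  PassengerId = 892:896,
  Fare        = c(7.8, 80, 9.7, 60, 12.3)
)

# glm stands in for randomForest here; the row-order behaviour is the point.
model <- glm(Survived ~ Fare, data = train, family = binomial)
prob <- predict(model, newdata = test, type = "response")

# The predictions carry the row names of `test`, in the same order.
print(names(prob))

# Pair each prediction with the PassengerId from the same row.
output <- data.frame(
  PassengerId = test$PassengerId,
  Survived    = as.integer(prob > 0.5)
)
print(output)
```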
@videojock100 8 years ago
Very good video for new learners. Can we have something on cross-validation?
@OpeItsJoey 5 years ago
Man you've got to make the text smaller and enlarge your window so I can actually see what's going on... Otherwise a handy video
@akashprabhakar6353 4 years ago
Thanks for the video, sir. I tried applying logistic regression with the following code: survived.equation2
@piyushpandey8509 4 years ago
The moment at 2:57... wait for it.
@funkmouseyang 6 years ago
My accuracy got worse :(
@mandarvichare61285 5 years ago
Same here. It decreased from 0.779 to 0.775.