How to do the Titanic Kaggle competition in R

How to do the Titanic Kaggle competition in R - Part 2

27,568 views

Data Science Dojo

A day ago

Comments: 24

@michaelpastor6841 3 years ago

Three Cheers for Data Science Dojo! Excellent Tutorial - Thanks!

@cCc1453cCc 2 years ago

Just wow. Salutations. I studied political science and have already done some modeling, but this is next-level stuff. Thank you for the insights, man. It surprised me that you didn't even bother checking R-squared to pick the best model, but you're still way ahead of anything I've ever seen. Again, thx man.

@Datasciencedojo 2 years ago

Keep following us for more tutorials, Seytan.

@Bradelff 6 years ago

This 2nd tutorial for fixing NA's is just what I've been looking for as a way to improve my model. thx

@syhusada1130 2 years ago

You're predicting 1 missing Fare value while 246 Age values are still missing. (246 of the 263 missing Age values are still present in the version of the data frame filtered at the upper whisker.) Is this alright?

@saravanansarasu4573 7 years ago

Clear and Good explanation.. Cheers from India :)

@wendhynovizar3926 A year ago

Unfortunately I get a "missing values in object" error on this chunk: titanic.model

@Datasciencedojo A year ago

It seems that you have missing values in your "titanic.train" dataset, which is causing the issue when you call the randomForest function. There are two ways to deal with this:

Remove rows with missing values: If the missing values are confined to a small portion of the dataset, you can remove the rows that contain them with the na.omit() function. Be cautious when removing data, though, as it may lead to loss of valuable information.

Impute the missing values with a random forest: There is an R package called missForest that uses random forests to impute missing values. Once the missing values have been imputed, you can pass the completed dataset to the randomForest function.
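A minimal sketch of the first option in base R. The toy data frame below is a made-up stand-in for titanic.train (the values are not from the real dataset); it only shows how to see where the NAs are and how many rows na.omit() would cost you before you decide to drop them.

```r
# Toy stand-in for titanic.train; the values are invented for illustration.
titanic.train <- data.frame(
  Survived = factor(c(0, 1, 1, 0, 1, 0)),
  Pclass   = factor(c(3, 1, 3, 1, 2, 3)),
  Age      = c(22, 38, NA, 35, NA, 54),
  Fare     = c(7.25, 71.28, 7.92, 53.10, NA, 8.05)
)

# Count missing values per column to see where the NAs are.
na.counts <- colSums(is.na(titanic.train))
print(na.counts)

# Drop every row that contains at least one NA.
titanic.complete <- na.omit(titanic.train)

# Check how many rows were lost before committing to this approach.
rows.lost <- nrow(titanic.train) - nrow(titanic.complete)
print(rows.lost)
```

If rows.lost is a large share of the data, imputing the missing values is probably the safer route than dropping rows.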

@wendhynovizar3926 A year ago

@Datasciencedojo Thanks for the heads-up, I will look at it tomorrow.

@ishwarashar8503 4 years ago

I am getting this error after running the predict() function: "Error in eval(predvars, data, env): numeric 'envir' arg not of length one". I am predicting Age. I followed the same steps as for Fare, except that I skipped the outlier filter and used the full data. Help please!!

@Bradelff 6 years ago

This tutorial on how to build and submit a kaggle competition model could not be better

@oscaryiu3329 5 years ago

Will it be a problem if the filter does not remove the rows that have an NA value in Fare? I want to build a regression model to predict Age, but there are too many NAs in the filtered data.

@Datasciencedojo 5 years ago

It is not always a good idea to just remove the rows that have missing values: even if a value is missing from one column, the other columns may still tell you something important about the data. Another way is to compute an average value and substitute it for the missing values. Even that is a bad idea here, since, as we mentioned, Fare varies across the different Pclass values. So you want some form of predictive model that tells you the best value to fill in for each missing entry. You can first use such a model to fill in the NA values in Fare, just as we did in the video, and then build a regression model to predict Age. Hope this helps!
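The two-step idea can be sketched roughly like this. It is only an illustration under assumptions: the toy data frame, the name titanic.full, and the choice of Pclass and Sex as predictors are made up for the example and are not necessarily what the video uses.

```r
# Toy stand-in for the combined Titanic data; the values are invented.
titanic.full <- data.frame(
  Pclass = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
  Sex    = factor(c("male", "female", "male", "female", "male",
                    "female", "male", "female", "male", "female")),
  Age    = c(40, 38, NA, 30, 28, NA, 22, 19, 25, NA),
  Fare   = c(80, 71, 75, 26, 21, 24, 7.3, 7.9, NA, 8.1)
)

# Step 1: fit a Fare model on the rows where Fare is known,
# then fill the missing Fare values with its predictions.
fare.known <- !is.na(titanic.full$Fare)
fare.model <- lm(Fare ~ Pclass + Sex, data = titanic.full[fare.known, ])
titanic.full$Fare[!fare.known] <-
  predict(fare.model, newdata = titanic.full[!fare.known, ])

# Step 2: now that Fare is complete, fit an Age model on the rows
# where Age is known and fill the missing Age values the same way.
age.known <- !is.na(titanic.full$Age)
age.model <- lm(Age ~ Pclass + Sex + Fare, data = titanic.full[age.known, ])
titanic.full$Age[!age.known] <-
  predict(age.model, newdata = titanic.full[!age.known, ])

# Both columns should now be free of NAs.
print(colSums(is.na(titanic.full)))
```

Filling Fare first matters because a row with a missing Fare could not otherwise be used as a predictor row for Age.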

@syedarif8726 7 years ago

What was the reason for doing the categorical casting? (Beginner here.)

@lekalache9888 7 years ago

Can anyone explain why, after using the linear models and getting new data (I cleaned Fare and Age), the result of the last prediction is the same? And how can we clean a column like Embarked? Thanks a lot for the video and for your support.

@mario17-t34 2 years ago

Thanks. Is there a more detailed video about the actual randomForest part starting at 17:27? I cannot understand how we are able to attach Survived to the original test set without messing up the row IDs. Is that somehow done internally?

60 Survived
str(Survived )
Factor w/ 2 levels "0","1": 1 2 1 1 2 1 2 1 2 1 ...
- attr(*, "names")= chr [1:418] "1" "2" "3" "4" ...
62 PassengerId