Multinomial classification with tidymodels and volcano eruptions

Views: 7,770

Julia Silge

A day ago

Comments: 35

@pablotercero4860 4 years ago

Amazing!! Thanks for sharing. I learn something incredibly useful every time, even tips and tricks.

@haraldurkarlsson1147 1 year ago

With regard to the tectonic settings, it would be best to simply lump together all Intraplate, all Rift zone, and all Subduction zone entries to get a factor with three levels. Another approach is to group by crust type into oceanic, continental, and intermediate crust. I think either would be better than simply tossing stuff.
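A minimal sketch of the two groupings suggested above, in base R. The example strings are assumptions written in the style of the dataset's `tectonic_settings` column (a plate setting, then " / ", then a crust type); check the real values before relying on these patterns.

```r
# Hypothetical example values; the real column may contain other variants.
settings <- c(
  "Subduction zone / Continental crust (>25 km)",
  "Rift zone / Oceanic crust (< 15 km)",
  "Intraplate / Intermediate crust (15-25 km)",
  "Subduction zone / Oceanic crust (< 15 km)"
)

# Option 1: keep only the plate setting (the text before " / ").
plate_setting <- factor(sub(" / .*$", "", settings))

# Option 2: keep only the crust type (the word before "crust").
crust_type <- factor(sub("^.* / ([A-Za-z]+) crust.*$", "\\1", settings))

table(plate_setting)
table(crust_type)
```

Either factor has three levels, so no rows need to be dropped or lumped into an "Other" category.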

@haraldurkarlsson1147 1 year ago

As a side comment: I worked on a project as an undergraduate to determine the type of volcano that had most likely produced a particular mix of rock types. Based on this work (around 1977), we concluded whether or not a particular mix of rock samples dredged off the coast of Iceland originated from a central volcano (there was also gravity data and possibly paleomagnetic data).

@JuliaSilge 1 year ago

That is so interesting! 👀

@haraldurkarlsson1147 3 years ago

Very interesting project! I must say, however, as a geologist, that I would have been surprised if the data correlated with latitude and longitude. Volcano types are mostly linked to their tectonic settings. Shield volcanoes are almost exclusively linked to oceanic settings and hotspots such as Iceland or Hawaii and are dominated by basalts. Stratovolcanoes, on the other hand, are typically linked with andesite (and rhyolite) and are found around subduction zones. Most active volcanoes link up with plate boundaries, and those boundaries have no relation to latitude or longitude. When I was an undergrad in Iceland I worked on volcanic rocks dredged off the seafloor near Iceland. My task was to identify the volcano type they were associated with based on the mix of rock types we collected at each site. I would have loved to have access to the tools you are using here, but alas those did not exist. It would have been much easier to infer the origin of these rocks.

@hesamseraj 4 years ago

Thank you very much Julia.

@jonathanjayes 4 years ago

Thank you Julia! This was fascinating!

@foobar4275 3 years ago

@Julia: In the volcano_rec recipe I think there is a mistake. Minute mark ~21 - EDIT: I thought there was a mistake but it turns out there is no mistake, just a different way to handle a feature matrix with continuous and dummy variables. The issue is step_zv and step_normalize on all_predictors after creating dummy variables. In the recipe, dummy variables are created for tectonic_settings and major_rock_1. Then, all variables are passed to steps zero variance and normalization. I ran a quick simulation on my personal machine and the recipe as written would calculate the variance for the previously created dummy variables and standardize the dummy variables. EDIT: I thought that binary variables shouldn't be standardized but apparently there is some literature that suggests binary variables should be standardized (Tibshirani) or how to standardize continuous variables to approximate the scale of a [one-hot encoded] binary variable. I haven't finished the video yet so if you go back and correct this, I apologize. Otherwise, others be warned, those steps are wrong. One solution would be to do the step_zv and step_normalize before the dummy step as step_zv(all_numeric_predictors()) and step_normalize(all_numeric_predictors()). I've tested this and it works.

@JuliaSilge 3 years ago

Well, it's not necessarily a "no-no" to center and scale dummy variables: community.rstudio.com/t/should-i-center-scale-dummy-variables/43212

@foobar4275 3 years ago

@@JuliaSilge Thank you for sharing the link! =D I wasn't aware of Tibshirani's or Gelman's views on standardizing binary variables.
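For anyone comparing the two orderings discussed in this thread, here is a minimal sketch using the recipes package. The `toy` data frame and its column names are made up for illustration and are not the volcano data from the video; neither ordering is presented as the "correct" one, since, as noted above, standardizing dummy variables is a defensible choice.

```r
library(recipes)

set.seed(123)
# Hypothetical toy data standing in for the volcano data frame.
toy <- data.frame(
  volcano_type = factor(sample(c("Shield", "Stratovolcano", "Other"), 60, replace = TRUE)),
  latitude = runif(60, -60, 60),
  elevation = rnorm(60, 1500, 800),
  tectonic_settings = factor(
    sample(c("Intraplate", "Rift zone", "Subduction zone"), 60, replace = TRUE)
  )
)

# Variant A: create dummies first, then normalize every predictor,
# so the dummy columns are centered and scaled too.
rec_all <- recipe(volcano_type ~ ., data = toy) |>
  step_dummy(tectonic_settings) |>
  step_zv(all_predictors()) |>
  step_normalize(all_predictors())

# Variant B: normalize only the originally numeric predictors,
# then create dummies, which stay as 0/1 indicators.
rec_numeric <- recipe(volcano_type ~ ., data = toy) |>
  step_zv(all_numeric_predictors()) |>
  step_normalize(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors())

baked_all <- bake(prep(rec_all), new_data = NULL)
baked_numeric <- bake(prep(rec_numeric), new_data = NULL)

# Compare the dummy columns under each variant.
dummy_cols <- grep("^tectonic_settings_", names(baked_numeric), value = TRUE)
summary(baked_all[dummy_cols])
summary(baked_numeric[dummy_cols])
```

In Variant A the dummy columns end up with mean zero and unit variance; in Variant B they remain 0/1. Which is preferable depends on the model, for example whether it is regularized.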

@lukasputtmann3590 4 years ago

I really enjoyed this video! Thanks a lot.

@haraldurkarlsson1147 3 years ago

Does the step_zv remove variables with perfect correlation? Possible confounding variables?

@JuliaSilge 3 years ago

No, just those with zero variance: recipes.tidymodels.org/reference/step_zv.html You can filter out variables that are highly correlated with step_corr(): recipes.tidymodels.org/reference/step_corr.html

@haraldurkarlsson1147 3 years ago

Ah - thanks
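A small sketch of the difference between the two steps mentioned in this thread, assuming the recipes package. The toy columns (`x`, `x_copy`, `constant`, `z`) are hypothetical and exist only to trigger each filter.

```r
library(recipes)

set.seed(42)
x <- rnorm(100)
# Hypothetical toy data: one constant column, one perfectly correlated pair.
toy <- data.frame(
  y = rnorm(100),
  x = x,
  x_copy = 2 * x + 1,  # perfectly correlated with x
  constant = 1,        # zero variance
  z = rnorm(100)
)

# step_zv() alone drops only the constant column.
zv_only <- recipe(y ~ ., data = toy) |>
  step_zv(all_predictors()) |>
  prep() |>
  bake(new_data = NULL)

# Adding step_corr() also drops one column of the highly correlated pair.
zv_corr <- recipe(y ~ ., data = toy) |>
  step_zv(all_predictors()) |>
  step_corr(all_numeric_predictors(), threshold = 0.9) |>
  prep() |>
  bake(new_data = NULL)

names(zv_only)
names(zv_corr)
```

Neither step addresses confounding as such; they only filter on variance and pairwise correlation among the predictors.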

@taiwankyh 4 years ago

You suggest a good article for multi-classification; could you please spell the author or give the hyperlink? Thanks

@haraldurkarlsson1147 3 years ago

Julia, I am enjoying your videos tremendously. Currently I am focusing on tidymodels. Do you have a suggestion for which order they should be watched in, or are they each stand-alone? Thanks. P.S. I have used skimr for some time, but recently it has stopped working. I have updated the version but no change. Any ideas?

@JuliaSilge 3 years ago

I unfortunately haven't invested time at this point in putting the videos "in order"; they do vary in how advanced they are and I have tried to note in the descriptions which ones are better for folks just starting out with tidymodels. Sorry about that! They have been made somewhat organically week by week using Tidy Tuesday data. I haven't had any problems with skimr lately, but if you can create a reprex showing the problem, I'm sure the maintainers would be happy to see what is happening: github.com/ropensci/skimr/issues

@haraldurkarlsson1147 3 years ago

@@JuliaSilge Thanks - I understand. I do love the holistic approach, though, of working through a project from beginning to end. That to me has been my main issue with places like DataCamp, where you see more bite-size projects.

@clarkevansteenderen7827 4 years ago

Thank you for this awesome tutorial!! Does one only subset data into training and testing sets if there is a lot of data available? Or how do you decide whether to do that, or to just use bootstrapping on the original data as a whole, as you did in this example?

@JuliaSilge 4 years ago

Almost *always* you want to split into training/testing; this is the most important step in empirical model validation. The only time when you might not want to do this is when the available data is "pathologically" small, like this dataset of volcanoes.
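As a rough sketch of the usual workflow described above, assuming the rsample package: hold out a test set first, then resample only the training portion. The `toy` data frame is hypothetical; in the video the bootstraps were drawn from the full dataset because it is so small.

```r
library(rsample)

set.seed(123)
# Hypothetical toy data with a three-class outcome.
toy <- data.frame(
  x = rnorm(200),
  y = factor(sample(c("a", "b", "c"), 200, replace = TRUE))
)

# Usual case: split once, stratifying on the outcome.
split <- initial_split(toy, prop = 0.75, strata = y)
train <- training(split)
test <- testing(split)

# Resample the training set only; the test set stays untouched
# until the final evaluation.
boots <- bootstraps(train, times = 25)

# Very small data: resample everything instead of holding out a test set.
boots_all <- bootstraps(toy, times = 25)
```

With the second approach there is no untouched test set, so the resampled performance estimates are the only estimates available.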

@王伟-g1h 4 years ago

Thanks for the tutorial. Great.

@christopheraloo5121 4 years ago

I had a feeling the population-within-kilometres variables would make for good predictors, since different types of volcanoes have different footprints (used loosely).

@UsmanKhaliq10 4 years ago

thanks! this was a pretty cool tutorial.

@renanxcortes2 4 years ago

Very cool video! Very didactic and informative! I wonder where in the code of tidymodels (or its dependencies) the predicted probabilities are corrected for the resampling strategy that the user applies (for example, oversampling some of the minority categories), similar to what is explained here: www.knime.com/blog/correcting-predicted-class-probabilities-in-imbalanced-datasets Also, I think the metric was good, wasn't it? Because in this case "naive guessing" would be a probability of 33.33% and not 50%, so an AUC higher than 60% is already good, isn't it? Thank you so much again for posting this video!

@flamboyantperson5936 4 years ago

You are amazing. could you please recommend someone like you who makes video in Python? It would be of great help.

@JuliaSilge 4 years ago

I really like Rachael Tatman's livestreams: www.twitch.tv/rctatman

@flamboyantperson5936 4 years ago

@@JuliaSilge Thank you so much.

@flamboyantperson5936 4 years ago

@@JuliaSilge Can I add you on facebook?

@JuliaSilge 4 years ago

@@flamboyantperson5936 HA well I'm not on Facebook, actually.

@flamboyantperson5936 4 years ago

@@JuliaSilge No problem. There is a lot to learn from you, but unfortunately my company does not work in R. I wish you could share the same knowledge in Python. You are very, very talented.