Amazing!! Thanks for sharing. I learn something incredibly useful every time, even tips and tricks.
@haraldurkarlsson1147 1 year ago
Regarding the tectonic settings, it would be best to simply lump together all intraplate, all rift zone, and all subduction settings to get a factor with three levels. Another approach is to group them by crust into oceanic, continental, and intermediate crust categories. I think this would be better than simply tossing stuff.
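A minimal sketch of both groupings follows. It is written in plain Python rather than R and assumes the labels follow a "<setting> / <crust>" pattern. The example labels and the helper names by_setting and by_crust are illustrative assumptions, not copied from the dataset.

```python
# Sketch: collapse "<setting> / <crust>" labels into coarser factors.
# The labels below are hypothetical examples in the style of the
# volcano data's tectonic_settings column; the real strings may differ.
from collections import Counter

labels = [
    "Subduction zone / Continental crust (>25 km)",
    "Subduction zone / Oceanic crust (< 15 km)",
    "Rift zone / Oceanic crust (< 15 km)",
    "Intraplate / Intermediate crust (15-25 km)",
    "Intraplate / Continental crust (>25 km)",
]


def by_setting(label: str) -> str:
    """Keep only the tectonic setting, i.e. the part before ' / '."""
    return label.partition(" / ")[0].strip().lower()


def by_crust(label: str) -> str:
    """Keep only the crust type (first word after ' / '), else 'unknown'."""
    crust = label.partition(" / ")[2]
    return crust.split()[0].lower() if crust else "unknown"


# Count how many labels fall into each collapsed level.
print(Counter(by_setting(x) for x in labels))
print(Counter(by_crust(x) for x in labels))
```

Either function could then be applied to the column before modelling, so the model sees three levels instead of every setting/crust combination.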
@haraldurkarlsson1147 1 year ago
As a side comment: as an undergraduate, I worked on a project to determine the type of volcano that most likely had produced a particular mix of rock types. Based on this work (around 1977), we determined whether a particular mix of rock samples dredged off the coast of Iceland originated from a central volcano or not (there was also gravity data and possibly paleomagnetic data).
@JuliaSilge 1 year ago
That is so interesting! 👀
@haraldurkarlsson1147 3 years ago
Very interesting project! I must say, however, as a geologist, that I would have been surprised if the data correlated with latitude and longitude. Volcano types are mostly linked to their tectonic settings. Shield volcanoes are almost exclusively linked to oceanic settings and hotspots such as Iceland or Hawaii and are dominated by basalts. Stratovolcanoes, on the other hand, are typically linked with andesite (and rhyolite) and are found around subduction zones. Most active volcanoes are located along plate boundaries, and those boundaries have no relation to latitude or longitude. When I was an undergrad in Iceland, I worked on volcanic rocks dredged off the seafloor near Iceland. My task was to identify the volcano type they were associated with, based on the mix of rock types we collected at each site. I would have loved to have access to the tools you are using here, but alas, those did not exist. They would have made it much easier to infer the origin of these rocks.
@hesamseraj 4 years ago
Thank you very much Julia.
@jonathanjayes 4 years ago
Thank you Julia! This was fascinating!
@foobar4275 3 years ago
@Julia: In the volcano_rec recipe I think there is a mistake, around minute 21. EDIT: I thought there was a mistake, but it turns out there is none, just a different way to handle a feature matrix with continuous and dummy variables. The issue is running step_zv and step_normalize on all_predictors() after creating dummy variables. In the recipe, dummy variables are created for tectonic_settings and major_rock_1. Then all variables are passed to the zero-variance and normalization steps. I ran a quick simulation on my personal machine, and the recipe as written calculates the variance of the previously created dummy variables and standardizes them. EDIT: I thought that binary variables shouldn't be standardized, but apparently there is some literature suggesting either that binary variables should be standardized (Tibshirani) or that continuous variables should be standardized to approximate the scale of a [one-hot encoded] binary variable. I haven't finished the video yet, so if you go back and correct this, I apologize. Otherwise, others be warned: those steps are wrong. One solution would be to do step_zv and step_normalize before the dummy step, as step_zv(all_numeric_predictors()) and step_normalize(all_numeric_predictors()). I've tested this and it works.
@JuliaSilge 3 years ago
Well, it's not necessarily a "no-no" to center and scale dummy variables: community.rstudio.com/t/should-i-center-scale-dummy-variables/43212
@foobar4275 3 years ago
@@JuliaSilge Thank you for sharing the link! =D I wasn't aware of Tibshirani's or Gelman's views on standardizing binary variables.
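The sketch below makes the two orderings in this exchange concrete. It is plain Python rather than the recipes package, and it runs on made-up data. dummies, drop_zero_variance and normalize are hypothetical helpers that loosely stand in for step_dummy, step_zv and step_normalize. normalize divides by the sample standard deviation.

```python
# Sketch of the two step orders discussed above, in plain Python rather
# than the recipes package. Data and column names are hypothetical.
from statistics import mean, stdev

rock = ["basalt", "andesite", "basalt", "dacite"]
elevation = [1200.0, 2500.0, 800.0, 3100.0]
levels = ["andesite", "basalt", "dacite", "rhyolite"]  # rhyolite never occurs


def dummies(values, levels):
    """One 0/1 column per non-reference level (first level is the reference)."""
    return {f"rock_{lvl}": [1.0 if v == lvl else 0.0 for v in values]
            for lvl in levels[1:]}


def drop_zero_variance(cols):
    """Drop columns whose values are all identical (the idea behind step_zv)."""
    return {name: col for name, col in cols.items() if len(set(col)) > 1}


def normalize(col):
    """Center to mean 0 and scale by the sample standard deviation."""
    mu, sd = mean(col), stdev(col)
    return [(x - mu) / sd for x in col]


# Order used in the video: dummy -> zero variance -> normalize everything.
order_a = drop_zero_variance({"elevation": elevation, **dummies(rock, levels)})
order_a = {name: normalize(col) for name, col in order_a.items()}

# Alternative from the comment: zero variance + normalize numeric columns
# first, then add the dummy columns, which therefore stay 0/1.
numeric = drop_zero_variance({"elevation": elevation})
order_b = {name: normalize(col) for name, col in numeric.items()}
order_b.update(dummies(rock, levels))

print(order_a["rock_basalt"])  # centered and scaled dummy
print(order_b["rock_basalt"])  # untouched 0/1 dummy
```

With the video's order, the dummy columns end up centered and scaled, and the all-zero rock_rhyolite column is dropped. With the alternative order, the dummies stay 0/1. In this sketch, though, rock_rhyolite survives, because the zero-variance filter ran before the dummies existed. A second zero-variance pass at the end would catch it.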
@lukasputtmann3590 4 years ago
I really enjoyed this video! Thanks a lot.
@haraldurkarlsson1147 3 years ago
Does step_zv remove variables with perfect correlation? Possibly confounding variables?
@JuliaSilge 3 years ago
No, just those with zero variance: recipes.tidymodels.org/reference/step_zv.html You can filter out variables that are highly correlated with step_corr(): recipes.tidymodels.org/reference/step_corr.html
@haraldurkarlsson1147 3 years ago
Ah - thanks
@taiwankyh 4 years ago
You suggested a good article on multiclass classification; could you please spell the author's name or give the hyperlink? Thanks
@haraldurkarlsson1147 3 years ago
Julia, I am enjoying your videos tremendously. Currently I am focusing on the tidymodels ones. Do you have a suggestion for the order in which they should be watched, or are they each stand-alone? Thanks. P.S. I have used skimr for some time, but recently it has stopped working. I have updated the version, but nothing changed. Any ideas?
@JuliaSilge 3 years ago
I unfortunately haven't invested time at this point in putting the videos "in order"; they do vary in how advanced they are and I have tried to note in the descriptions which ones are better for folks just starting out with tidymodels. Sorry about that! They have been made somewhat organically week by week using Tidy Tuesday data. I haven't had any problems with skimr lately, but if you can create a reprex showing the problem, I'm sure the maintainers would be happy to see what is happening: github.com/ropensci/skimr/issues
@haraldurkarlsson1147 3 years ago
@@JuliaSilge Thanks, I understand. I do love the holistic approach, though, of working through a project from beginning to end. To me, that has been my main issue with places like DataCamp, where you see more bite-sized projects.
@clarkevansteenderen7827 4 years ago
Thank you for this awesome tutorial!! Does one only subset data into training and testing sets if there is a lot of data available? Or how do you decide whether to do that, or to just use bootstrapping on the original data as a whole, as you did in this example?
@JuliaSilge 4 years ago
Almost *always* you want to split into training/testing; this is the most important step in empirical model validation. The only time when you might not want to do this is when the available data is "pathologically" small, like this dataset of volcanoes.
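Here is a rough sketch of the two strategies Julia contrasts, in plain Python rather than rsample. It does a one-time train/test split, followed by bootstrap resamples drawn only from the training rows. The 75% proportion, the seed, and the number of resamples are illustrative assumptions.

```python
# Sketch of the two resampling ideas in plain Python (not rsample itself):
# a one-time train/test split, then bootstrap resamples of the training rows.
import random


def initial_split(n_rows, prop=0.75, seed=123):
    """Shuffle row indices and hold out (1 - prop) of them as a test set."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    cut = int(n_rows * prop)
    return sorted(idx[:cut]), sorted(idx[cut:])


def bootstraps(rows, times=25, seed=123):
    """Each resample draws len(rows) rows with replacement for fitting;
    the rows never drawn (out-of-bag) are used for assessment."""
    rng = random.Random(seed)
    splits = []
    for _ in range(times):
        analysis = [rng.choice(rows) for _ in rows]
        in_bag = set(analysis)
        assessment = [r for r in rows if r not in in_bag]
        splits.append((analysis, assessment))
    return splits


train, test = initial_split(100)
resamples = bootstraps(train, times=5)
print(len(train), len(test))
print([len(assessment) for _, assessment in resamples])
```

The test rows are touched only once, at the very end. With a very small dataset like the volcano data, the bootstrap resamples would instead be drawn from all rows. The out-of-bag rows of each resample then act as its assessment set.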
@王伟-g1h 4 years ago
Thanks for the tutorial. Great.
@christopheraloo5121 4 years ago
I had a feeling population within kilometres would make a good predictor, since different types of volcanoes have different amounts of footprint (used loosely).
@UsmanKhaliq10 4 years ago
Thanks! This was a pretty cool tutorial.
@renanxcortes2 4 years ago
Very cool video! Very didactic and informative! I wonder where in the tidymodels code (or its dependencies) the generated predicted probabilities are corrected for the resampling strategy the user applies (for example, oversampling some of the minority categories), as explained here: www.knime.com/blog/correcting-predicted-class-probabilities-in-imbalanced-datasets. Also, I think the metric was good, wasn't it? In this case "naive guessing" would give a probability of 33.33%, not 50%, so an AUC higher than 60% is already good, isn't it? Thank you so much again for posting this video!
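As far as we understand it, the correction in the linked post is the standard prior-shift adjustment. Each class probability is multiplied by the ratio of that class's frequency in the original data to its frequency after resampling, and the results are renormalized to sum to 1. Below is a minimal plain-Python sketch with made-up class priors. correct_probabilities is a hypothetical helper, not a tidymodels function. Whether any tidymodels package applies this automatically is not something this sketch confirms.

```python
# Sketch of the prior-shift correction for class probabilities from a model
# trained on resampled (e.g. oversampled) data. Class names and priors are
# hypothetical; this is not a tidymodels function.


def correct_probabilities(probs, original_priors, resampled_priors):
    """Reweight each class probability by original/resampled prior, then renormalize."""
    weights = {
        c: p * original_priors[c] / resampled_priors[c] for c, p in probs.items()
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# The training data was resampled to balanced classes...
resampled = {"stratovolcano": 1 / 3, "shield": 1 / 3, "other": 1 / 3}
# ...but these (made-up) frequencies reflect the original data.
original = {"stratovolcano": 0.6, "shield": 0.1, "other": 0.3}

model_output = {"stratovolcano": 0.40, "shield": 0.35, "other": 0.25}
corrected = correct_probabilities(model_output, original, resampled)
print({c: round(p, 3) for c, p in corrected.items()})
```

One caveat on the baseline question: for ROC AUC, a model with no skill scores about 0.5 regardless of the number of classes, because multiclass AUC averages one-vs-rest or pairwise comparisons. The 1/3 figure is the expected accuracy of guessing uniformly at random among three classes.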
@flamboyantperson5936 4 years ago
You are amazing. Could you please recommend someone like you who makes videos in Python? It would be of great help.
@JuliaSilge 4 years ago
I really like Rachael Tatman's livestreams: www.twitch.tv/rctatman
@flamboyantperson5936 4 years ago
@@JuliaSilge Thank you so much.
@flamboyantperson5936 4 years ago
@@JuliaSilge Can I add you on facebook?
@JuliaSilge 4 years ago
@@flamboyantperson5936 HA well I'm not on Facebook, actually.
@flamboyantperson5936 4 years ago
@@JuliaSilge No problem. There is a lot to learn from you, but unfortunately my company doesn't use R. I wish you could share the same knowledge in Python. You are very, very talented.