These are honestly some of the most helpful/instructional videos that I've ever found on statistical modeling.
@My-NaMeS_jEfF 3 years ago
Thank you Julia I’m voting for you tonight because you rock!!!!
@sarizwan1986 3 years ago
Nicely done, Prof. Julia. I hope you take viewer requests? :) Could you do a video on ensemble modelling for multiclass classification problems, and then derive how important the variables are based on the ensemble? Thank you very much for doing these screencasts consistently and regularly. I can't stress enough how important these are to new learners like me.
@deanmait 2 years ago
Thank you. Another great screencast. The systematic step through is a treat.
@collinguidry9867 2 years ago
This was really informative! I was not expecting latitude and longitude passed straight into the model to work so well.
@method341 3 years ago
Hi Julia, love your videos. I was wondering if you could do one on web scraping? In particular stock market data. Thanks
@jesuisrobert808 3 years ago
I liked the part at the end where you had the graph of items that correlated most with price. It reinforced the geo map of prices. Houses west of the city generally do have bigger lots. It would have been interesting to see the trend of prices over time. Also, where are the jobs, and what type of jobs? That plays a huge role in the latitude - price correlation. For the longest time, tech jobs were north and west. Now we are seeing more jobs in central Austin, and Tesla and Oracle just built a presence in South Austin. Central Austin has more to do with activities, IMO.
@Leonhard-Euler 2 years ago
Good video. Two points: 1) When generating top_words, the clause "filter(!word %in% as.character(1:5))" could be improved to "filter(nchar(word) > 1)", which would eliminate all single-character words. 2) It looks like the hasSpa logical column is not converted to numeric before tuning, which causes the tuning step to fail on my box. I simply added a clause "mutate(hasSpa = as.numeric(hasSpa))" when generating austin_split; this works around the problem and the tuning step succeeds.
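To make the two suggestions in this comment concrete, here is a minimal base R sketch. The `words` vector and the `has_spa` values are made up for illustration; the same conditions can be dropped into `filter()` and `mutate()` as the comment describes.

```r
# Hypothetical tokens, standing in for words from the listing descriptions
words <- c("pool", "1", "a", "3", "spa", "x", "12")

# Original condition: drops only the tokens "1" through "5"
kept_original <- words[!words %in% as.character(1:5)]

# Suggested condition: drops every one-character token, digits and letters alike
kept_suggested <- words[nchar(words) > 1]

# A logical column converted to 0/1 before modelling
has_spa <- c(TRUE, FALSE, TRUE)
has_spa_numeric <- as.numeric(has_spa)

kept_original
kept_suggested
has_spa_numeric
```

Note that neither condition removes multi-digit tokens such as "12"; those would need a separate filter if they are unwanted.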
@mosesotieno1629 3 years ago
The videos are magnificently great! I love them. Thank you very much
@deannanuboshi1387 2 years ago
Great video. Is there a way to obtain the confidence interval of the final prediction?
@JuliaSilge 2 years ago
xgboost doesn't generate a confidence interval, but if you are using a model that does, you can get that as shown here: www.tidymodels.org/start/models/ You might be interested in bootstrap prediction intervals: markjrieke.github.io/workboots/
@EsinaViwn9 1 year ago
Demonstrated EDA attempts are perfect.
@AnantaPradhan 3 years ago
Really nice!! Could you upload some prediction models using raster datasets?
@vitorklein 3 years ago
Love your videos, you have great teaching skills
@alanjiang2930 3 years ago
Thanks for sharing, Julia!
@charlesnjorogekamau8681 3 years ago
This has been very helpful. Where can I access this specific dataset?
@georgecooke5639 3 years ago
Cheers Julia, these videos are great. 👍👍
@tankUpp 3 years ago
At 19:28, Julia mentioned adjusting the p-value because she trained many models. Can anyone recommend a book that touches on this topic? I don't understand the reasoning behind it. ty!
@JuliaSilge 3 years ago
I think the Wikipedia article could be a good place to start and at least helps you know about the right vocabulary to look up for more info: en.wikipedia.org/wiki/Multiple_comparisons_problem
@tankUpp 3 years ago
@@JuliaSilge Huge ty for introducing this new term to me!!! I would probably have found it a few years later :)
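For anyone following this thread, a short sketch of what adjusting p-values for multiple comparisons can look like in base R. The p-values below are invented, and the choice of methods is only an illustration, not necessarily what the screencast used. `p.adjust()` ships with base R, and `"holm"` is its default method.

```r
# Hypothetical raw p-values from five separate model fits
p_raw <- c(0.001, 0.01, 0.02, 0.04, 0.2)

# Holm correction controls the family-wise error rate
p_holm <- p.adjust(p_raw, method = "holm")

# Benjamini-Hochberg controls the false discovery rate and is less conservative
p_bh <- p.adjust(p_raw, method = "BH")

# Side-by-side comparison of raw and adjusted values
data.frame(p_raw, p_holm, p_bh)
```

The idea is that when many models are fit, some small p-values are expected by chance alone, so each one is inflated to account for the number of tests.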
@N1loon 3 years ago
Amazing! I learn so much from each of your videos, thank you!
@황성근-t9k 3 years ago
As always, thanks for the useful code and info
@darmaw22 3 years ago
Julia, Many Thanks!
@EsinaViwn9 1 year ago
I have noticed that parentheses are colorized differently (so it is easy to track where the closing one is). How do you configure this? Which RStudio theme is used?
@JuliaSilge 1 year ago
That part is not a theme, but an option in RStudio. You can check that out here: posit.co/blog/rstudio-1-4-preview-rainbow-parentheses/
@EsinaViwn9 1 year ago
@@JuliaSilge thanks!
@afazdadash9562 2 years ago
Thank you
@mniyas3612 3 years ago
Hi! Recently I have encountered a problem in my work. My variables are several maps in raster format, the value of which I extracted at the sampling points, and I have to generate the pollution zoning map from them with the xgb regression model. As long as the data is in Excel format, everything is going well, but I do not know with what function I can generalize the model on raster maps. I could not do that with the predict function.
@JuliaSilge 3 years ago
You can `predict()` on lat/long points that were not part of your training set but you'll need to generate a new grid of points for prediction
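A rough sketch of the "new grid of points" idea from this reply, kept to base R so it runs without extra packages. Everything here is hypothetical: the simulated training data, the coordinate ranges, and the use of `lm()` as a stand-in for the xgboost model. With a fitted tidymodels workflow the call would be `predict(fitted_workflow, new_data = grid)` instead.

```r
set.seed(123)

# Hypothetical training data: sampled points with a measured response
train <- data.frame(
  latitude  = runif(50, 30.1, 30.5),
  longitude = runif(50, -98.0, -97.6)
)
train$pollution <- 2 * train$latitude - train$longitude + rnorm(50, sd = 0.1)

# Stand-in model; lm() is used only to keep the sketch dependency-free
fit <- lm(pollution ~ latitude + longitude, data = train)

# New regular grid of points covering the area of interest
grid <- expand.grid(
  latitude  = seq(30.1, 30.5, length.out = 20),
  longitude = seq(-98.0, -97.6, length.out = 20)
)

# Predict at every grid point; the result can then be mapped or rasterized
grid$pollution_pred <- predict(fit, newdata = grid)
head(grid)
```

For raster inputs, the grid would need one column per predictor, with values extracted from each raster layer at the grid coordinates, so that its columns match the training data.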
@reshmilb2527 11 months ago
Please avoid a black background. Use a colour that is easier on the eyes.
@joekannoo 3 years ago
You’re the best Julia!
@harisjaved1379 3 years ago
Hi Julia thank you. Can you please deal with a more complicated dataset? Very sparse dataset or imbalanced classes please? Thank you
@violinplayer7201 3 years ago
Awesome, thanks for sharing!
@MrJruda 3 years ago
Just curious, is there a particular reason you decided to use xgboost instead of something else?
@JuliaSilge 3 years ago
xgboost tends to give you really good performance on dense tabular data like this, so it's a good option for a "competitive" situation like SLICED
@MrJruda 3 years ago
Can't figure out why R keeps giving me this message: `Error in workflow(austin_recipe, xgb_spec) : unused arguments (austin_recipe, xgb_spec)`. Everything worked perfectly until here.
@JuliaSilge 3 years ago
Looks like you need to update to the latest version of workflows from CRAN
@journey-in-pixels 3 years ago
Much appreciated 🙏
@MrAbhimufc 3 years ago
Love the content!
@ravi281381 3 years ago
Thanks for the wonderful screencast! I have a question. Why do we use price_total in the glm?
@JuliaSilge 3 years ago
That's like the "failures" in successes out of failures for a binomial model.
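For readers wondering what a two-column binomial response looks like, here is a minimal sketch. The data frame, its values, and the `price_rank` predictor are made up for illustration, and the formula is an assumption about the general shape of the model in the screencast, not a copy of it. In `glm()`, the first column of a `cbind()` response is read as the number of successes and the second as the number of failures.

```r
# Hypothetical counts for one word across five price bins
word_counts <- data.frame(
  price_rank  = 1:5,
  n           = c(5, 9, 14, 22, 30),            # times the word appears in each bin
  price_total = c(1000, 1100, 1050, 1200, 1150) # total words in each bin
)

# Two-column response: first column plays the role of "successes",
# second column plays the role of "failures"
fit <- glm(
  cbind(n, price_total) ~ price_rank,
  data = word_counts,
  family = "binomial"
)

# A positive slope means the word becomes more common as price increases
coef(fit)
```

Using the bin total as the second column is an approximation that stays close when a word's count is small relative to the total; the strict count of failures would be `price_total - n`.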
@alexandregeorgelustosa5969 3 years ago
Top 👏👏👏
@Akbar_Ato 2 years ago
Julia, what amazing content. Kudos! I am religiously watching each one of them. You know you've got the best Tidy Tuesdays. Meanwhile, I am making a house price prediction using XGBoost regression. I have 5 categorical predictors for one of numerical predictors. When I run it, I get this error over and over again 🙁 I have a reprex, just in case. [1] "Error in as_indices_impl(): ! Must subset columns with a valid subscript vector. x Subscript has the wrong type quosures. i It must be numeric or character."
@JuliaSilge 2 years ago
That's great that you have created a reprex! I recommend that you post on RStudio Community, where folks can see the reprex and help you understand where your problem is: rstd.io/tidymodels-community