Predict housing prices in Austin TX with tidymodels and xgboost

14,030 views

Julia Silge


Comments: 49
@chrisjackson8526 3 years ago
These are honestly some of the most helpful/instructional videos that I've ever found on statistical modeling.
@My-NaMeS_jEfF 3 years ago
Thank you Julia I’m voting for you tonight because you rock!!!!
@sarizwan1986 3 years ago
Nicely done, Prof. Julia. I hope you take viewer requests? :) Can you do a video on ensemble modelling for multiclass classification problems and then derive how important the variables are based on the ensemble? Thank you very much for doing these screencasts consistently and regularly. I can't stress enough how important these are to new learners like me.
@deanmait 2 years ago
Thank you. Another great screencast. The systematic step through is a treat.
@collinguidry9867 2 years ago
This was really informative! I was not expecting latitude and longitude passed straight into the model to work so well.
@method341 3 years ago
Hi Julia, love your videos. I was wondering if you could do one on web scraping? In particular stock market data. Thanks
@jesuisrobert808 3 years ago
I liked the part at the end where you had the graph of items that correlated most with price. It reinforced the geo map of prices. Houses west of the city generally do have bigger lots. It would have been interesting to see the trend of prices over time. Also, where are the jobs, and what types of jobs? That plays a huge role in the latitude-price correlation. For the longest time, tech jobs were north and west. Now we are seeing more jobs in central Austin, and Tesla and Oracle just built a presence in South Austin. Central Austin has more to do with activities, IMO.
@Leonhard-Euler 2 years ago
Good video. Two points: 1) When generating top_words, the clause "filter(!word %in% as.character(1:5))" could be changed to "filter(nchar(word) > 1)", which eliminates all single-character words. 2) It looks like the hasSpa logical column is not converted to numeric before tuning, which caused the tuning step to fail on my machine. I simply added the clause "mutate(hasSpa = as.numeric(hasSpa))" when generating austin_split; this works around the problem and the tuning step succeeds.
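The two suggestions in this comment can be tried on toy data. This is a minimal sketch, assuming dplyr is installed; the `tokens` and `homes` data frames are hypothetical stand-ins, and only the `word` and `hasSpa` column names come from the comment.

```r
library(dplyr)

# Hypothetical token table standing in for the words from the listing descriptions
tokens <- data.frame(
  word = c("pool", "1", "a", "5", "10", "garage", "w"),
  stringsAsFactors = FALSE
)

# Original clause: drops only the digit tokens "1" through "5"
kept_original <- tokens %>% filter(!word %in% as.character(1:5))

# Suggested clause: drops every single-character token, digits and letters alike
kept_suggested <- tokens %>% filter(nchar(word) > 1)

# Hypothetical listings table with a logical hasSpa column
homes <- data.frame(hasSpa = c(TRUE, FALSE, TRUE))

# Workaround from the comment: convert the logical to numeric (TRUE -> 1, FALSE -> 0)
homes_numeric <- homes %>% mutate(hasSpa = as.numeric(hasSpa))
```

The two filters are not equivalent: the suggested one also removes single letters such as "a" and "w", while both keep multi-digit tokens such as "10".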
@mosesotieno1629 3 years ago
The videos are magnificently great! I love them. Thank you very much
@deannanuboshi1387 2 years ago
Great video. Is there a way to obtain the confidence interval of the final prediction?
@JuliaSilge 2 years ago
xgboost doesn't generate a confidence interval, but if you are using a model that does, you can get that as shown here: www.tidymodels.org/start/models/ You might be interested in bootstrap prediction intervals: markjrieke.github.io/workboots/
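To illustrate the general idea of a bootstrap prediction interval, here is a rough base R sketch. It is not the workboots API, and it makes several assumptions: the data are simulated, `lm()` stands in for xgboost, and observation noise is approximated by resampling in-sample residuals.

```r
set.seed(123)

# Simulated training data (hypothetical): one predictor, noisy linear response
n <- 200
train <- data.frame(x = runif(n, 0, 10))
train$y <- 3 + 2 * train$x + rnorm(n, sd = 1.5)

# The new observation we want an interval for
new_point <- data.frame(x = 5)

# One bootstrap draw: refit on resampled rows, predict, then add a resampled
# residual so the draw reflects observation noise as well as model uncertainty
one_draw <- function() {
  idx <- sample(n, replace = TRUE)
  fit <- lm(y ~ x, data = train[idx, ])
  pred <- predict(fit, newdata = new_point)
  unname(pred + sample(residuals(fit), 1))
}

draws <- replicate(1000, one_draw())

# Percentile interval: the middle 90% of the bootstrap draws
interval <- quantile(draws, probs = c(0.05, 0.95))
interval
```

With a flexible model such as xgboost, in-sample residuals tend to understate the error, so a sketch like this would likely produce intervals that are too narrow.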
@EsinaViwn9 1 year ago
Demonstrated EDA attempts are perfect.
@AnantaPradhan 3 years ago
Really nice!! Could you upload some prediction models using a raster dataset?
@vitorklein 3 years ago
Love your videos, you have great teaching skills
@alanjiang2930 3 years ago
Thanks for sharing, Julia!
@charlesnjorogekamau8681 3 years ago
This has been very helpful. Where can I access this specific dataset?
@georgecooke5639 3 years ago
Cheers Julia, these videos are great. 👍👍
@tankUpp 3 years ago
At 19:28, Julia mentioned adjusting the p-values because she trained many models. Are there any books someone can recommend that touch on this topic? I don't understand the reasoning behind it. ty!
@JuliaSilge 3 years ago
I think the Wikipedia article could be a good place to start and at least helps you know about the right vocabulary to look up for more info: en.wikipedia.org/wiki/Multiple_comparisons_problem
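As a small illustration of the multiple comparisons adjustment, base R provides `p.adjust()`. The p-values below are hypothetical, and this sketch is not taken from the video.

```r
# Hypothetical raw p-values from five separately fitted models
p_raw <- c(0.001, 0.012, 0.03, 0.04, 0.2)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonferroni <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate and is less conservative
p_bh <- p.adjust(p_raw, method = "BH")

# Number of results below 0.05 before and after the Bonferroni adjustment
sum(p_raw < 0.05)
sum(p_bonferroni < 0.05)
```

The more models you fit, the more likely it is that some p-value falls below a threshold by chance alone, which is why fewer results survive once the adjustment is applied.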
@tankUpp 3 years ago
@@JuliaSilge Huge ty for introducing this new term to me!!! I would probably have found it a few years later :)
@N1loon 3 years ago
Amazing! I learn so much from each of your videos, thank you!
@황성근-t9k 3 years ago
As always, thanks for the useful code and info
@darmaw22 3 years ago
Julia, Many Thanks!
@EsinaViwn9 1 year ago
I have noticed that the parentheses are colored differently (so it is easy to see where the closing one is). How do you configure this? Which RStudio theme is used?
@JuliaSilge 1 year ago
That part is not a theme, but an option in RStudio. You can check that out here: posit.co/blog/rstudio-1-4-preview-rainbow-parentheses/
@EsinaViwn9 1 year ago
@@JuliaSilge thanks!
@afazdadash9562 2 years ago
Thank you
@mniyas3612 3 years ago
Hi! Recently I have encountered a problem in my work. My variables are several maps in raster format, the value of which I extracted at the sampling points, and I have to generate the pollution zoning map from them with the xgb regression model. As long as the data is in Excel format, everything is going well, but I do not know with what function I can generalize the model on raster maps. I could not do that with the predict function.
@JuliaSilge 3 years ago
You can `predict()` on lat/long points that were not part of your training set, but you'll need to generate a new grid of points for prediction.
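Generating a grid of points and predicting on it might look like the following base R sketch. The sample data are simulated, the coordinate ranges are arbitrary, and `lm()` is only a stand-in for a fitted xgboost model.

```r
set.seed(42)

# Hypothetical sampled points with a measured value at each location
samples <- data.frame(
  longitude = runif(100, -98.0, -97.5),
  latitude  = runif(100, 30.1, 30.5)
)
samples$value <- 10 + 4 * (samples$latitude - 30.1) + rnorm(100, sd = 0.2)

# Stand-in model; a fitted xgboost model would take its place
fit <- lm(value ~ longitude + latitude, data = samples)

# Regular grid covering the study area: every combination of the two sequences
grid <- expand.grid(
  longitude = seq(-98.0, -97.5, length.out = 50),
  latitude  = seq(30.1, 30.5, length.out = 40)
)

# Predict at every grid point; the result can then be mapped or rasterized
grid$predicted <- predict(fit, newdata = grid)
```

If the model uses raster layers as predictors, each grid point would also need those layer values extracted into columns before calling `predict()`.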
@reshmilb2527 11 months ago
Please avoid a black background colour. Use an eyesight-friendly colour.
@joekannoo 3 years ago
You’re the best Julia!
@harisjaved1379 3 years ago
Hi Julia thank you. Can you please deal with a more complicated dataset? Very sparse dataset or imbalanced classes please? Thank you
@violinplayer7201 3 years ago
Awesome, thanks for sharing!
@MrJruda 3 years ago
Just curious, is there a particular reason you decided to use xgboost instead of something else?
@JuliaSilge 3 years ago
xgboost tends to give you really good performance on dense tabular data like this, so it's a good option for a "competitive" situation like SLICED
@MrJruda 3 years ago
Can't figure out why R keeps giving me this message: `Error in workflow(austin_recipe, xgb_spec) : unused arguments (austin_recipe, xgb_spec)`. Everything worked perfectly until here.
@JuliaSilge 3 years ago
Looks like you need to update to the latest version of workflows from CRAN
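If updating is not possible right away, one workaround is to build the workflow step by step instead of passing the recipe and model spec to `workflow()` directly. This is a sketch, assuming the recipes, parsnip, and workflows packages are installed; `example_recipe` and `example_spec` are hypothetical stand-ins for `austin_recipe` and `xgb_spec`.

```r
library(recipes)
library(parsnip)
library(workflows)

# Hypothetical stand-ins for austin_recipe and xgb_spec from the video
example_recipe <- recipe(mpg ~ ., data = mtcars)
example_spec <- set_engine(linear_reg(), "lm")

# Start from an empty workflow, then add the preprocessor and the model
wf <- workflow()
wf <- add_recipe(wf, example_recipe)
wf <- add_model(wf, example_spec)

# Installed version, useful when checking whether an update is needed
packageVersion("workflows")
```

The "unused arguments" error suggests the installed release of workflows predates the two-argument form of `workflow()`, whereas `add_recipe()` and `add_model()` are available in older releases too.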
@journey-in-pixels 3 years ago
Much appreciated 🙏
@MrAbhimufc 3 years ago
Love the content!
@ravi281381 3 years ago
Thanks for the wonderful screencast! I have a question. Why do we use price_total in the glm?
@JuliaSilge 3 years ago
That's like the "failures" in successes out of failures for a binomial model.
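In base R, a binomial `glm()` accepts a two-column response in which the first column counts successes and the second counts failures. The sketch below uses made-up counts; only `price_total` is named in the thread, and the other column names and all values are assumptions for illustration.

```r
# Hypothetical counts: how often one word appears in each price bin (n)
# and how many words each bin contains overall (price_total)
word_counts <- data.frame(
  priceRange  = c(0, 250000, 350000, 450000, 650000),
  n           = c(12, 30, 45, 70, 110),
  price_total = c(20000, 24000, 26000, 27000, 28000)
)

# Two-column response: the first column plays the role of successes and the
# second the role of failures, following the reply above
fit <- glm(
  cbind(n, price_total) ~ priceRange,
  data = word_counts,
  family = binomial
)

# A positive slope means the word becomes more common as price rises
coef(fit)[["priceRange"]]
```

Strictly speaking, the failures column would be the bin total minus the word count, but when a single word's count is tiny relative to the bin total the difference is negligible.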
@alexandregeorgelustosa5969 3 years ago
Top 👏👏👏
@Akbar_Ato 2 years ago
Julia, what amazing content. Kudos! I am religiously watching each one of them. You know you've got the best Tidy Tuesdays. Meanwhile, I am making a house price prediction using XGBoost regression. I have 5 categorical predictors for one of numerical predictors. When I run it, I get this error over and over again 🙁 I have a reprex, just in case. [1] "Error in as_indices_impl(): ! Must subset columns with a valid subscript vector. x Subscript has the wrong type quosures. i It must be numeric or character."
@JuliaSilge 2 years ago
That's great that you have created a reprex! I recommend that you post on RStudio Community, where folks can see the reprex and help you understand where your problem is: rstd.io/tidymodels-community