Predict Baseball Stats using Machine Learning and Python

17,068 views

Dataquest

1 day ago

We'll predict future season stats for baseball players using machine learning. The stat we'll predict is the wins above replacement (WAR) a player will generate next season.
We'll first download and clean baseball season data using Python and pybaseball. We'll do feature selection using a sequential feature selector to identify the most promising predictors for machine learning. We'll then train a ridge regression model to predict future season WAR. We'll measure error and improve the model.
In the end, you'll have a model that can predict future season WAR and the next steps to improve the model.
You can find the full code here - [project-walkthroughs/baseball_games at master · dataquestio/project-walkthroughs · GitHub](github.com/dataquestio/projec...)
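As a rough illustration of the feature selection and ridge regression steps described above, here is a minimal sketch on synthetic data. It is not the code from the video: the random data, `alpha=1.0`, `n_features_to_select=2`, and the three-way time series split are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)

# Synthetic stand-in for cleaned, scaled season data: 200 rows, 6 candidate features.
X = rng.random((200, 6))
# In this toy setup only features 0 and 3 actually drive the target.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.05, size=200)

model = Ridge(alpha=1.0)

# Forward selection: start with no features and repeatedly add the one that
# most improves the cross-validated score of the model.
selector = SequentialFeatureSelector(
    model,
    n_features_to_select=2,
    direction="forward",
    cv=TimeSeriesSplit(n_splits=3),
)
selector.fit(X, y)

# Column indices of the features the selector kept.
selected = selector.get_support(indices=True)

# Train the ridge model on the selected features only.
model.fit(X[:, selected], y)
print(selected, model.coef_)
```

With real season data, `X` would hold the numeric stat columns and `y` the next-season WAR for the same rows.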
Chapters
00:00 - Introduction
02:00 - Download the data
05:52 - Creating an ML target
09:15 - Cleaning the data
16:19 - Selecting useful features
27:13 - Making predictions with ML
38:15 - Improving accuracy
49:26 - Diagnosing issues with the model
52:28 - Wrap-up and next steps with the model
-----------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef

Comments: 40
@imfrshlikeuhh 1 year ago
The fact that this type of content is FREE is mind-blowing.
@anishapostate4221 1 year ago
The fact that people don't know about this is another mind-blowing thing.
@imfrshlikeuhh 1 year ago
@anishapostate4221 I wouldn't say that; there are plenty more people who don't know this than do.
@DanielGarcia-uq8yz 1 year ago
Great project... love the concept of Dataquest's guided project walkthroughs. Thanks Vik
@jscott21 1 year ago
Incredible video - thank you so much
@pushkarratnaparkhi2205 1 year ago
Great video. Thanks 💯💯
@SuperNunera 1 year ago
Ty for sharing. Amazing content.
@kingofhavila9850 1 year ago
That day I joined the webinar slightly late so I was excited about watching this video.
@evanmaurer1968 1 year ago
I appreciate this content sir. Thank you so much!
@reena3571 1 year ago
Thank you immensely for sharing
@leassis91 1 year ago
Thank you for this content!
@tomkmb4120 1 year ago
Hey Vik, coming here from your more recent video with NBA stats analysis. In this instance, is pybaseball replacing the more manual work being done by playwright and having to parse the specific HTML in order to scrape the data you need? Is there an equivalent of pybaseball for the NBA? I think there may be one for the NFL that I've seen in places, but this is all new to me so I can't be sure. I'm just struggling a bit with adapting that previous video into a regular Python file instead of following along directly with your Jupyter tutorial.
@henryryan5194 1 year ago
I might be missing something, but... once you have trained and tested the model, what is the process for applying it to predict the following year? In this video you trained the model to predict "Next_WAR", which in this case would be the player's 2022 WAR, and then evaluated the model by comparing the real result to the predicted result. But if you wanted to predict 2023 WAR, how would the code need to be adjusted? Essentially, how do you use the trained model to predict 2023 player WAR?
@willcarroll9762 1 year ago
You ever figure it out? I’m struggling there too
@Chris-rl6rw 1 year ago
@willcarroll9762 This model can only predict one year into the future. To predict 2023, you would need 2022 data. It's not necessarily a full time series analysis, but a linear regression model used to predict the following year's stats. Predicting Next WAR is predicting next year's stat. You could attempt to create a column for two years into the future by shifting the 'WAR' column again, then test how the model predicts two years ahead, and so on. My guess is it may start performing poorly at that stage.
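The shifting idea in this reply can be sketched with pandas as below. This is a minimal sketch, not the code from the video: the tiny table is made up, and it assumes one row per player per season with consecutive seasons, using the `IDfg`, `Season`, `WAR`, and `Next_WAR` column names mentioned in this thread.

```python
import pandas as pd

# Made-up data: one row per player (IDfg) per season.
batting = pd.DataFrame({
    "IDfg": [1, 1, 1, 2, 2],
    "Season": [2020, 2021, 2022, 2021, 2022],
    "WAR": [2.0, 3.5, 4.1, 0.5, 1.2],
})

batting = batting.sort_values(["IDfg", "Season"])

# Within each player, next season's WAR is the WAR value one row later.
batting["Next_WAR"] = batting.groupby("IDfg")["WAR"].shift(-1)

# Rows with a known target can be used for training and evaluation.
train = batting.dropna(subset=["Next_WAR"])

# Each player's latest season has no target yet. A model fitted on `train`
# would be applied to these rows to predict the season that has not happened.
to_predict = batting[batting["Next_WAR"].isna()]

print(train)
print(to_predict)
```

Under these assumptions, predicting 2023 means fitting on the rows in `train` and calling the model's `predict` on the 2022 rows in `to_predict`. Using `shift(-2)` instead would give a two-years-ahead target.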
@LouieWinehouse 1 year ago
You could train it on the first 3 months of data to predict the next 6 months of the season, or however you want. For my MLB ML model, I train it on March-July to predict August-October.
@hakeemyatim5363 1 year ago
Hello! This is an awesome project and walkthrough! I wanted to try predicting HRs instead of WAR with this model, but when I scaled the data for ridge regression with the min-max scaler, I got HR numbers between 0 and 1. If I skip that step, I get whole-number HR predictions for the next year. Would it still be accurate to skip the scaling if we are just looking at HRs? Again, great video!
@Dataquestio 1 year ago
You don't want to scale your target column. So if you're predicting HRs, you want to scale all of the columns except the HR column.
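One way to follow this advice is to scale only the feature columns and carry the target over untouched. The sketch below is hypothetical: the data is made up, and `Next_HR` is an assumed name for the target column, not something taken from the video.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up data. "Next_HR" is an assumed name for the target column.
df = pd.DataFrame({
    "Age": [24, 30, 27, 35],
    "PA": [600, 450, 520, 300],
    "HR": [30, 12, 22, 5],
    "Next_HR": [28, 10, 25, 3],
})

target = "Next_HR"
feature_cols = [c for c in df.columns if c != target]

# Scale only the feature columns to the 0-1 range.
scaler = MinMaxScaler()
features_scaled = pd.DataFrame(
    scaler.fit_transform(df[feature_cols]),
    columns=feature_cols,
    index=df.index,
)

# Re-attach the target unchanged, so predictions stay in home-run units.
scaled = features_scaled.join(df[target])
print(scaled)
```

Because the target is never passed through the scaler, a model trained on `scaled` predicts home run counts directly rather than values between 0 and 1.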
@tomkmb4120 11 months ago
I'm a little confused about the sequential feature selector. You mention that after normalising the data, it picks the features that it thinks will help with accuracy the most. How is it determining that? Sorry if that's a stupid question.
@arundey3971 1 year ago
Any idea why the pybaseball package no longer loads? I tried pip install pybaseball and I get an error.
@cloudcomputingbd 1 year ago
nice
@wanjohisamuel8547 1 year ago
Your videos are amazing. I'm starting to love ML. What advice would you give to someone who is starting data science?
@Dataquestio 1 year ago
That's great to hear, Wanjohi! I actually started a site called Dataquest where you can learn data science from scratch. The data scientist path will teach you all the main data science skills: www.dataquest.io/path/data-scientist/
@paperk1d 1 year ago
Is it possible to do this in R? I have just started learning programming, so I don’t have much knowledge about this.
@chealol4233 7 days ago
How would you do this for predicting whether a player will record a hit in a given game? Is that possible?
@vitonash 8 months ago
I'm a bit confused about the purpose of making the full copy and then calling dropna(). It doesn't seem like the full copy was used at all throughout the rest of the code?
@tjans1979 1 year ago
What editor are you using for this?
@turtle1897 1 year ago
It’s Jupyter Notebook
@fudgenuggets405 8 months ago
I don't think pybaseball is working any more. I get a blank .csv at the beginning after supposedly downloading the Fangraphs data.
@gianpierrealvarado993 4 months ago
Does anyone know why I wouldn’t be able to import pybaseball on JupyterLab anymore? I’m trying to follow along in my own notebook and for some reason I’m getting an error saying that the module doesn’t exist. Thanks in advance for any help!
@AlyssaFord-xs3ht 1 year ago
I am having trouble finding the batting CSV file
@zachbroussard8734 1 year ago
I’m not getting the CSV when I run this. Can anyone help?
@el_goomba 1 year ago
How would you adjust the code to predict 2023 WAR?
@kellybjames 3 months ago
Did you solve this?
@peter93263 1 year ago
Can you do something similar for English Premier League soccer?
@AbrarMuhtasim 1 year ago
'Customer segmentation and clustering in retail using machine learning' with a real data set. Please make a project tutorial on this 😭😭😭😭
@emmamutegi5919 1 year ago
I have a problem running this... help:
removed_columns = ['NEXT_WAR', 'Name', 'Team', 'IDfg', 'Season']
selected_columns = dataset.columns[~dataset.columns.isin(removed_columns)]
AttributeError: 'function' object has no attribute 'columns'
@Dataquestio 1 year ago
It looks like `dataset` is a function for some reason. It should be a pandas DataFrame. Make sure you didn't accidentally assign a function to the `dataset` variable.
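One common way to end up with this error is assigning a function to the variable without calling it. The sketch below reproduces that case. The `load_dataset` helper and its data are hypothetical, and the actual cause in the commenter's notebook may be different.

```python
import pandas as pd


def load_dataset():
    # Hypothetical loader standing in for however the batting data is built.
    return pd.DataFrame(
        {"Name": ["A"], "Season": [2021], "WAR": [2.0], "Next_WAR": [2.5]}
    )


# Missing parentheses: `dataset` is now the function itself, not a DataFrame.
dataset = load_dataset
try:
    dataset.columns
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Calling the function returns the DataFrame, and the column filter then works.
dataset = load_dataset()
removed_columns = ["Next_WAR", "Name", "Season"]
selected_columns = dataset.columns[~dataset.columns.isin(removed_columns)]
print(list(selected_columns))
```

Running `type(dataset)` in the notebook just before the failing line is a quick way to check which case applies.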
@turtle1897 1 year ago
@Dataquestio I have that same issue. I have just started Dataquest and was using this as a follow-along project while I wasn’t studying. I have some knowledge, but I'm not at this stage yet, just working towards familiarity.