Cleaning NBA Stats Data With Python And Pandas: Data Project [part 2 of 3]

Views: 13,016

Dataquest

This is part 2 of a series where we predict which NBA player will win MVP. You can watch this without having seen part 1.
In this video, we'll do some data cleaning! We'll combine our MVP, player, and team stats data using pandas. Along the way, we'll work through several data cleaning steps, including merge, map, fillna, and replace. We'll also use matplotlib to explore the data.
By the end, we'll have a clean DataFrame that we can use for machine learning in part 3 of this series.
The code that we write in this video can be found here - github.com/dataquestio/projec....
If you want to watch part 1, you can see it here - • Web Scraping NBA Stats... .
Chapters
00:00 Intro
01:16 Overview of the NBA data
04:35 Cleaning the MVP vote data
06:55 Cleaning the player data
15:23 Combining the player and MVP data
19:04 Cleaning the team data
30:08 Combining the team, player, and MVP data
31:08 Exploring the NBA data

Comments: 34
@patrickmurray1988 2 years ago
Thanks for this project. I'm currently working through the Data Analyst in Python path and it's fun seeing the things I'm learning being put to use on projects outside of the lessons.
@SuperSumittanwar 2 years ago
This is the second video I have seen from Dataquest, and it feels like Vik is the rockstar. The way he thinks through and solves data problems is remarkable 🤩
@futureverse9347 9 months ago
As a former professional athlete who is looking to learn data science and statistics, THANK YOU for this great work!
@Data_Man 2 years ago
This was very helpful for applying things I recently learned to something I'm interested in. Plus, now I have a dataset to do all kinds of analysis with.
@Dataquestio 2 years ago
Glad to hear it! -Vik
@nicesoundworks7954 2 years ago
Thanks to you and the DQ team. Another DQ student here, working through and enjoying the Data Scientist path. You are doing a great, great job. Again: [ ∞ 🙏 ]
@Dataquestio 2 years ago
Thanks a lot! Glad you're enjoying it :)
@afasfafafas 2 years ago
Great project! Looking forward to the ML model!
@Dataquestio 2 years ago
We'll be uploading this later in the week!
@tarkanh2519 2 years ago
Perfect video...
@naschendani1474 1 year ago
Huge thanks!
@bobbyjordan4532 1 year ago
Great project! By the way, the * means that the player was an All-Star in that specific year!
@Dataquestio 1 year ago
Awesome, thanks! -Vik
@IamDeftly 1 year ago
I'm running into a bit of an issue. After I combine the MVPs and Player data and I try to look at the data by sorting through the "Pts Won" for the MVPs, the data I get in the table has NaN values for everything except for the pts won, pts max and share. I'm not sure what went wrong or how to fix this. Any help?
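One possible cause, not confirmed in this thread, is that the merge keys do not match between the two tables (for example, a leftover "*" in a player name), so the MVP rows end up with no player stats attached. A minimal sketch of how that could be diagnosed with the `indicator` option of `merge` follows. The frames, the key columns `Player` and `Year`, and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables (values are invented).
players = pd.DataFrame({
    "Player": ["Player A*", "Player B"],
    "Year": [2021, 2021],
    "PTS": [27.0, 18.5],
})
mvps = pd.DataFrame({
    "Player": ["Player A"],
    "Year": [2021],
    "Pts Won": [900.0],
    "Pts Max": [1010],
    "Share": [0.891],
})

# indicator=True adds a "_merge" column showing where each row came from.
check = players.merge(mvps, how="outer", on=["Player", "Year"], indicator=True)

# MVP rows that found no matching player row: these are the rows that show
# NaN for every player stat and values only for Pts Won, Pts Max and Share.
unmatched = check[check["_merge"] == "right_only"]
print(unmatched[["Player", "Year"]])

# Assumed fix for this toy data: strip the literal "*" and merge again.
players["Player"] = players["Player"].str.replace("*", "", regex=False)
combined = players.merge(mvps, how="outer", on=["Player", "Year"])
print(combined)
```

If `unmatched` is not empty on the real data, comparing those names with the names in the player table should show what differs.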
@danielowolabi6891 2 years ago
Thanks for this project. Should we create the CSV file that maps team nicknames to team names ourselves, or can it be accessed somewhere?
@Dataquestio 2 years ago
Hi Daniel - you can find the mapping file here - github.com/dataquestio/project-walkthroughs/blob/master/mvp/nicknames.txt
@AndresIniestaLujain 2 years ago
How would you have dealt with null values in your dataframe? E.g. For a player during a given season, 3P% is 'null' for a player that had 0 3PA. Would you complete the data, leave it null, or is it context-dependent? If you would complete it, would you replace the null value with a calculation of their career average? If leaving it null, could you still run correlations without the null values affecting the corr values too heavily? Thanks! Really learning a lot from this series.
@Dataquestio 2 years ago
Hi Andres - it is definitely context dependent. It depends on what you'll be doing with the data. In this case, we're predicting who will win MVP. Someone with 0 3P attempts won't get any MVP votes, so we can replace it with 0 or drop the row. If you were trying to calculate a rating for each player, then replacing with career averages would make more sense. When finding correlations, rows with null values usually aren't considered, so you'd still be fine!
@AndresIniestaLujain 2 years ago
@@Dataquestio Thanks for your answer! Makes sense. However, I would disagree that someone with 0 3PT attempts won't get MVP votes. Ex: Paint-dominant players like Shaquille O'Neal in 1997-98 :)
@Dataquestio 2 years ago
That's a fair point :) I was thinking modern era!
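The options discussed in this thread (filling a null 3P% with 0, dropping the row, or leaving it null when computing correlations) can be sketched roughly as follows. The frame and its values are invented for illustration; only the column names 3PA and 3P% come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical player rows: P1 has 0 three-point attempts, so 3P% is null.
players = pd.DataFrame({
    "Player": ["P1", "P2", "P3", "P4"],
    "3PA": [0, 120, 300, 50],
    "3P%": [np.nan, 0.35, 0.40, 0.30],
    "PTS": [20.0, 15.0, 25.0, 10.0],
})

# Option 1: replace the null percentage with 0.
filled = players.fillna({"3P%": 0})

# Option 2: drop rows where the percentage is missing.
dropped = players.dropna(subset=["3P%"])

# Option 3: leave the null in place. Series.corr skips pairs with a missing
# value, so the result matches the correlation computed on the dropped frame.
corr_with_nulls = players["3P%"].corr(players["PTS"])
corr_dropped = dropped["3P%"].corr(dropped["PTS"])
print(corr_with_nulls, corr_dropped)
```

Which option fits depends on the task, as the reply says. Filling with 0 treats a player with no attempts the same as one who missed every shot, which may or may not be acceptable.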
@justDlight 1 year ago
The nicknames CSV file can be created from the list below. It's a manual process, so you should do it yourself.
ATL Atlanta Hawks
BKN Brooklyn Nets
BOS Boston Celtics
CHA Charlotte Hornets
CHI Chicago Bulls
CLE Cleveland Cavaliers
DAL Dallas Mavericks
DEN Denver Nuggets
DET Detroit Pistons
GSW Golden State Warriors
HOU Houston Rockets
IND Indiana Pacers
LAC Los Angeles Clippers
LAL Los Angeles Lakers
MEM Memphis Grizzlies
MIA Miami Heat
MIL Milwaukee Bucks
MIN Minnesota Timberwolves
NOP New Orleans Pelicans
NYK New York Knicks
OKC Oklahoma City Thunder
ORL Orlando Magic
PHI Philadelphia 76ers
PHX Phoenix Suns
POR Portland Trail Blazers
SAC Sacramento Kings
SAS San Antonio Spurs
TOR Toronto Raptors
UTA Utah Jazz
WAS Washington Wizards
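A minimal sketch of how a list like this could be turned into a lookup and applied with `map`. It assumes each line holds an abbreviation followed by the full team name, and that the player table has a team-abbreviation column, here given the hypothetical name `Tm`. The actual nicknames file linked by Dataquest may use a different layout.

```python
import pandas as pd

# A few entries in "ABBREVIATION Full Team Name" form (assumed layout).
raw = """ATL Atlanta Hawks
BOS Boston Celtics
LAL Los Angeles Lakers"""

# Split each line on the first space only, since team names contain spaces.
nicknames = {}
for line in raw.splitlines():
    abbrev, name = line.split(" ", 1)
    nicknames[abbrev] = name

# Hypothetical player rows with a team-abbreviation column.
players = pd.DataFrame({"Player": ["P1", "P2"], "Tm": ["BOS", "LAL"]})

# map looks up each abbreviation; abbreviations missing from the dict become NaN.
players["Team"] = players["Tm"].map(nicknames)
print(players)
```

Historical abbreviations that are not in the list would come out as NaN here, so they would have to be added to the mapping by hand.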
@vivekdwivedi3130 8 months ago
Can anybody tell me where I can get this MVP CSV file?
@vasoochigava5213 2 years ago
Thanks. stats.apply(pd.to_numeric, errors='ignore') doesn't convert the object columns to integers, and there is no error. To be honest, in future please don't skip the parts you don't showcase (I mean the nicknames CSV). I did the conversion with astype, by the way. I tried to understand the issue by checking the unique values; in the case of age they were definitely integers, but the column still wasn't converted by to_numeric.
@Dataquestio 2 years ago
Hi Vaso - pd.to_numeric should convert all numeric columns. You do need to make sure to assign back, though, so `stats=stats.apply....`. You also need to make sure the columns are clean (don't have any non-numeric values in them), otherwise `errors="ignore"` will cause nothing to happen.
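The two points in this reply (assign the result back, and a column with any non-numeric value is left unchanged) can be sketched as follows. This is an illustration rather than the code from the video: it uses a small hypothetical helper, `to_numeric_if_clean`, instead of `errors="ignore"`, because recent pandas versions have deprecated that option. The frame and its values are invented.

```python
import pandas as pd

def to_numeric_if_clean(column):
    """Hypothetical helper: convert a column to numbers, or return it
    unchanged if any value cannot be parsed."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column

# Invented data: every column starts out holding strings.
stats = pd.DataFrame({
    "Player": ["A", "B"],
    "Age": ["25", "31"],
    "PTS": ["27.1", "-"],  # the "-" keeps this column from converting
})

# apply returns a new DataFrame, so the result has to be assigned back.
stats = stats.apply(to_numeric_if_clean)
print(stats.dtypes)
```

Here `Age` becomes numeric, while `PTS` stays as text until the stray "-" is cleaned up.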
@Speakingmymind365 2 years ago
When combining the teams data with the rest of the combined data, I am getting suffixes on the column names. Kindly help.
@Dataquestio 2 years ago
Hi - can you please share the error message you are getting, your code, and the code right before/after your code?
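Without the code it is hard to say what happened, but one common source of suffixes is a column name that exists in both tables and is not part of the merge key. A minimal sketch of that behavior, with invented frames and a hypothetical shared column `G`:

```python
import pandas as pd

# Hypothetical frames: both contain a "G" column that is not a merge key.
combined = pd.DataFrame({
    "Player": ["P1"], "Team": ["Boston Celtics"], "Year": [2021], "G": [70],
})
teams = pd.DataFrame({
    "Team": ["Boston Celtics"], "Year": [2021], "G": [72], "W": [36],
})

# With default settings, pandas renames the clashing columns to G_x and G_y.
default_merge = combined.merge(teams, how="outer", on=["Team", "Year"])
print(list(default_merge.columns))

# The suffixes parameter gives the clashing columns more descriptive names.
named_merge = combined.merge(
    teams, how="outer", on=["Team", "Year"], suffixes=("_player", "_team")
)
print(list(named_merge.columns))
```

Another option is to rename or drop the overlapping column in one of the tables before merging.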
@romhen233 2 years ago
Hey, I would like to know how to create the nicknames.csv file.
@Dataquestio 2 years ago
Hi Rom - I started with this list (en.wikipedia.org/wiki/Wikipedia:WikiProject_National_Basketball_Association/National_Basketball_Association_team_abbreviations), then added in some historical team codes/nicknames as well.
@romhen233 2 years ago
Thank you. Can you show or write out how to do it? I'm stuck on it and cannot continue the project. I would really appreciate it.
@tarkanh2519 2 years ago
Hi, how can we find the relevant CSV files? Please help.
@Dataquestio 2 years ago
You can find the 3 csv files from the last part (teams, players, mvps) here - github.com/dataquestio/project-walkthroughs/tree/master/mvp
@svetlanadolgushina4936 2 years ago
The regex=False trick works with "+" as well.
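A short sketch of that point: with `regex=False`, `Series.str.replace` treats the pattern as literal text, so characters such as `*` and `+` that have special meaning in regular expressions need no escaping. The names below are invented.

```python
import pandas as pd

# Invented player names carrying stray marker characters.
names = pd.Series(["Player A*", "Player B+", "Player C"])

# regex=False makes "*" and "+" plain characters rather than regex operators.
cleaned = (
    names.str.replace("*", "", regex=False)
         .str.replace("+", "", regex=False)
)
print(cleaned.tolist())
```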