Who is the data MVP of the NBA?
Use data science to analyze the data from FiveThirtyEight.com and draw your own conclusions about who is the most valuable player.
Background
Sports fans and sports teams have long been obsessed with data and numbers. In modern professional sports, data and statistics are heavily used in personnel decisions that can include which players to draft, how much to pay for a given player, and even which players to play in which game situations. The actual statistical models that professional teams use to evaluate their players are kept highly secret, but fans and sports pundits have long engaged in number crunching as they analyze the data from their favorite sports, teams, and players.
One of the best examples of this is the FiveThirtyEight website, which covers sports, politics, and science through the lens of data. We love what they do with statistics and mathematical modeling of NBA players and teams. We have built an activity around the metric they developed to rank and compare NBA players in terms of offense, defense, and overall contribution to their team’s wins and losses. They call it RAPTOR, which stands for Robust Algorithm (using) Player Tracking (and) On/Off Ratings.
From the FiveThirtyEight website:
To complete this activity, you do not need a deep understanding of basketball or even of how the RAPTOR metric is calculated. The important thing to remember is that RAPTOR measures how much a player contributes on offense or defense relative to a hypothetical average player. A RAPTOR value of zero represents an average player. High positive values mean the player is making a strong contribution in the measured area (offense, defense, or overall), and negative values mean the player is contributing less than an average player would. If you would really like to geek out on how RAPTOR is measured, we suggest that you check this out.
Dataset
We downloaded this data from the FiveThirtyEight.com website here. The full dataset contained 826 rows covering all NBA players, with separate observations (rows) for a player’s regular season and playoff performances. We took a subset that includes only regular season data for the 381 players who played more than 500 total minutes in games. That subset is the dataset you will use in this activity.
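If you are curious how a subset like this could be produced, here is a minimal sketch in Python. The rows, the column names (season_type, minutes), and the "RS"/"PO" codes are hypothetical stand-ins chosen for illustration; the real file's headers and values may differ.

```python
# Hypothetical rows and column names, for illustration only;
# the real file's headers and values may differ.
rows = [
    {"player": "Player A", "season_type": "RS", "minutes": 2100},
    {"player": "Player A", "season_type": "PO", "minutes": 640},
    {"player": "Player B", "season_type": "RS", "minutes": 500},
    {"player": "Player C", "season_type": "RS", "minutes": 501},
]


def regular_season_subset(rows, min_minutes=500):
    """Keep regular season rows with strictly more than min_minutes played."""
    return [
        row for row in rows
        if row["season_type"] == "RS" and row["minutes"] > min_minutes
    ]


subset = regular_season_subset(rows)
print([row["player"] for row in subset])
```

Note that "more than 500 minutes" is treated as a strict comparison, so a player with exactly 500 minutes is left out.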
Variables
Player - This is the name of the player. It has been coded as info so you will not use it in graphing, but you can view it by hovering your pointer on any datapoint on the graph.
Team - This categorical variable is the Basketball-Reference ID, or three-letter code, of an NBA team.
Playing Time (min) - This numeric variable is the total number of minutes a given player played during the season.
RAPTOR offense - This numeric variable represents the points above average per 100 possessions added by a player on offense, using both box and on-off components.
RAPTOR defense - This numeric variable represents the points above average per 100 possessions added by a player on defense, using both box and on-off components.
RAPTOR total - This numeric variable represents the total points above average per 100 possessions added by a player on both offense and defense, using both box and on-off components.
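Wins Against Replacement (WAR) - This numeric variable represents the hypothetical number of wins a player adds to his team during the regular season relative to a hypothetical replacement-level player, that is, a readily available substitute. This statistic is more commonly known as Wins Above Replacement.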
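To get a feel for what "points above average per 100 possessions" means, here is a rough sketch that converts a rating into points over a stretch of play. The ratings and the possession count are made up, and treating RAPTOR total as the simple sum of the offense and defense ratings is our assumption, not something stated in the variable descriptions above.

```python
def points_above_average(raptor_per_100, possessions):
    """Scale a per-100-possessions rating to a given number of possessions."""
    return raptor_per_100 * possessions / 100


# Hypothetical ratings for one player (not real data).
offense = 4.0
defense = -1.5

# Assumption: the total rating is the sum of the offense and defense ratings.
total = offense + defense

print(total)                             # rating per 100 possessions
print(points_above_average(total, 200))  # points above average over 200 possessions
```

In this made-up case the player is well above average on offense, a little below average on defense, and still above average overall.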
Activity
Get to know the data
1. Describe how the data are distributed for RAPTOR Offense. In other words, what is the shape of the data when you look at that variable across the entire dataset? Include a histogram as a visual aid to go along with your description.
2. Describe how the data are distributed for RAPTOR Defense. In other words, what is the shape of the data when you look at that variable across the entire dataset? Include a histogram as a visual aid to go along with your description.
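If you want to see what a histogram is doing behind the scenes, the sketch below sorts values into bins of equal width and counts how many land in each. The values are hypothetical, the helper name histogram_counts is ours, and the bin width of 1.0 is an arbitrary choice.

```python
import math
from collections import Counter


def histogram_counts(values, bin_width=1.0):
    """Count values per bin; each bin covers [left, left + bin_width)."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical ratings (not real data).
values = [-2.3, -0.4, 0.1, 0.6, 0.9, 1.2, 3.8]

# Print a simple text histogram, one row per non-empty bin.
for left, count in histogram_counts(values).items():
    print(f"{left:>5.1f} | {'#' * count}")
```

Empty bins are not printed in this sketch, so gaps in the left-hand column stand for ranges with no players. A tall middle with a long run of bins on one side is a hint that the distribution is skewed.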
Key Questions:
3. Plot RAPTOR offense on Y (as a jittered dot plot) to see the values of all players. Use your pointer to hover and see which player is represented by any data point. Who are the top 3 offensive players in the NBA according to RAPTOR?
4. Plot RAPTOR defense on Y (as a jittered dot plot) to see the values of all players. Use your pointer to hover and see which player is represented by any data point. Who are the top 3 defensive players in the NBA according to RAPTOR?
5. Do players who are good on offense also tend to be good on defense? Plot RAPTOR Offense on X and RAPTOR Defense on Y and include the graph below as visual evidence for your answer.
6. Visually add RAPTOR Total to your graph by showing it as Z on the plot you already made with RAPTOR Offense on X and RAPTOR Defense on Y. Paste your graph here.
7. Based on these data alone, who would be your top five for the 2023 NBA MVP? List them in order from 1st place to 5th place.
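The key questions above come down to two operations: ranking players by a variable, and checking whether two variables move together. Here is a minimal sketch of both, using hypothetical players and ratings. The helper names top_n and pearson_r are ours, and the Pearson correlation coefficient is only one possible way to put a number on the relationship you see in the scatter plot.

```python
import math

# Hypothetical players and ratings (not real data).
players = [
    {"player": "Player A", "offense": 6.0, "defense": 1.0},
    {"player": "Player B", "offense": 2.0, "defense": 3.0},
    {"player": "Player C", "offense": -1.0, "defense": 2.0},
    {"player": "Player D", "offense": 4.0, "defense": -2.0},
]


def top_n(rows, key, n=3):
    """Return the names of the n players with the highest value for key."""
    ranked = sorted(rows, key=lambda row: row[key], reverse=True)
    return [row["player"] for row in ranked[:n]]


def pearson_r(xs, ys):
    """Pearson correlation: near +1 or -1 is strong, near 0 is weak."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(top_n(players, "offense"))
print(top_n(players, "defense"))
print(pearson_r([p["offense"] for p in players],
                [p["defense"] for p in players]))
```

A correlation close to zero would suggest that being strong on offense tells you little about a player's defense, while a clearly positive value would suggest the two tend to go together.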
Further Questions
8. The actual top 5 for the 2023 NBA MVP voting are listed in the table below. The MVP voting system awards 10 points for a first-place vote, 7 points for a second-place vote, 5 points for a third-place vote, 3 points for a fourth-place vote, and 1 point for a fifth-place vote.
How do these results differ from your results based on the data alone?
9. The voting for NBA MVP is conducted by humans and is not based on data, although data may influence the MVP voters. Why do the results based on the actual voting differ from the results based on the data alone? What factors other than the data do you think may have been considered when selecting the 2023 NBA MVP?
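To see how the point system described in question 8 turns ballots into a ranking, here is a small worked sketch. The point values come from question 8; the players and ballot counts are hypothetical and are not the actual 2023 results.

```python
# Points awarded per ballot position, as described in question 8.
POINTS_BY_PLACE = {1: 10, 2: 7, 3: 5, 4: 3, 5: 1}


def mvp_points(votes_by_place):
    """Total points for one player, given a count of votes at each place."""
    return sum(
        POINTS_BY_PLACE[place] * count
        for place, count in votes_by_place.items()
    )


# Hypothetical ballot counts (not the real 2023 voting).
ballots = {
    "Player A": {1: 60, 2: 30, 3: 10},
    "Player B": {1: 40, 2: 50, 3: 5, 4: 5},
}

totals = {name: mvp_points(votes) for name, votes in ballots.items()}

# List players from most points to fewest.
for name, points in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(name, points)
```

Because a first-place vote is worth more than a second-place vote, a player's total depends on where the votes fall on the ballot as well as how many ballots name the player.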