2018-2019 NBA MVP predictions

With the All-Star break here, I thought it would be a great opportunity to predict the 2018-2019 MVP. Halfway through the season, Giannis Antetokounmpo and James Harden look like the favorites, but there are also some interesting challengers, such as Paul George with his brilliant season.

Based on the players' statistics and past MVP voting results, I will try to predict the 2018-2019 MVP Award winner by predicting each contender's Vote Share (award points / maximum number of award points).
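As a minimal sketch of the Vote Share definition above: the function name and the ballot totals below are assumptions made up for illustration, not real voting figures.

```python
def vote_share(award_points: float, max_award_points: float) -> float:
    """Return award points as a fraction of the maximum attainable points."""
    if max_award_points <= 0:
        raise ValueError("max_award_points must be positive")
    return award_points / max_award_points


# Hypothetical totals, for illustration only (not real MVP voting data).
example_share = vote_share(award_points=750, max_award_points=1000)
print(example_share)
```

A Share of 1.0 would mean a unanimous MVP, while 0.0 means the player received no award points.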


The Data

I collected the data from Basketball Reference and built a database of every MVP contender for each year since 1980. For this season, based on the NBA MVP ladder, our 10 contenders are:

  1. Giannis Antetokounmpo
  2. James Harden
  3. Paul George
  4. Stephen Curry
  5. Joel Embiid
  6. Nikola Jokic
  7. Kevin Durant
  8. LeBron James
  9. Kawhi Leonard
  10. Kyrie Irving

For each player, I selected the following stats:

  • Basic statistics: G, MP, PTS/G, TRB/G, AST/G, STL/G, BLK/G, FG%, 3P%, FT%
  • Team statistics: Seed, number of wins
  • Advanced statistics: WS, WS/48, VORP, BPM
  • Vote share statistic

First, let’s look at the variables that are most correlated with our target (Share).

WS, WS/48, BPM and VORP are the four variables most strongly correlated with Share.
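One way to produce such a ranking is to compute the Pearson correlation between each stat and Share. The sketch below assumes that coefficient (the post does not say which one was used) and runs on made-up toy rows, not on the Basketball Reference data; the helper names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_features_by_correlation(rows, target="Share"):
    """Sort features by the absolute value of their correlation with target."""
    features = [name for name in rows[0] if name != target]
    target_values = [row[target] for row in rows]
    scores = {
        name: pearson([row[name] for row in rows], target_values)
        for name in features
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy rows with invented values, for illustration only.
rows = [
    {"WS": 10.0, "G": 70, "Share": 0.10},
    {"WS": 12.0, "G": 82, "Share": 0.25},
    {"WS": 15.0, "G": 75, "Share": 0.60},
    {"WS": 18.0, "G": 78, "Share": 0.90},
]
ranked = rank_features_by_correlation(rows)
for name, value in ranked:
    print(f"{name}: {value:.3f}")
```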

Then, we can rank our 10 players according to those 4 metrics.

We can see the duel between Harden and Giannis, but also that Jokic may have a chance, since he ranks 3rd on 3 of the metrics.
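The per-metric ranking can be sketched as follows. The player labels and values are placeholders invented for the example, and rank_players is a hypothetical helper, not code from the original analysis.

```python
def rank_players(stats, metric):
    """Return {player: rank}, where rank 1 is the highest value of metric."""
    ordered = sorted(stats, key=lambda name: stats[name][metric], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Placeholder players and values, for illustration only.
stats = {
    "Player A": {"WS": 9.0, "BPM": 8.0},
    "Player B": {"WS": 8.5, "BPM": 10.0},
    "Player C": {"WS": 7.0, "BPM": 6.5},
}
ranks = {metric: rank_players(stats, metric) for metric in ("WS", "BPM")}
for metric, ranking in ranks.items():
    print(metric, ranking)
```

Comparing the ranks across metrics shows at a glance who is consistently near the top.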


The Model

I tried several Machine Learning regression algorithms to predict the Share for each of our contenders. To evaluate the models, I chose the Mean Squared Error (MSE), which measures the average of the squared errors. After a 10-fold cross-validation, I displayed a boxplot of the scores for each model.
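To make the evaluation procedure concrete, here is a minimal pure-Python sketch of MSE and k-fold cross-validation. It assumes a single-feature ridge regression with an unpenalized intercept and contiguous, unshuffled folds; the actual models, features and library used in the analysis are not specified here, and the data is synthetic.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_ridge_1d(xs, ys, alpha):
    """Single-feature ridge fit; alpha=0 gives ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_val_mse(xs, ys, alpha, k=10):
    """Return one MSE per fold, training on the other k - 1 folds each time."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit_ridge_1d(train_x, train_y, alpha)
        y_true = [ys[i] for i in test_idx]
        y_pred = [predict(xs[i]) for i in test_idx]
        scores.append(mean_squared_error(y_true, y_pred))
    return scores


# Synthetic, perfectly linear data, for illustration only.
xs = [float(i) for i in range(20)]
ys = [0.05 * x for x in xs]
ols_scores = cross_val_mse(xs, ys, alpha=0.0)
ridge_scores = cross_val_mse(xs, ys, alpha=5.0)
print(len(ols_scores), len(ridge_scores))
```

The ten per-fold scores for each model are what a boxplot would summarize. On this noise-free toy data the unpenalized fit is essentially exact, so the example only illustrates the mechanics, not which model is better on real data.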

The best models are Ridge Regression and Linear Regression; since the Ridge median MSE is smaller, I decided to choose this model.
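The selection rule, picking the model whose per-fold MSE has the smallest median, can be sketched like this. The model names and scores are placeholders, not the results of the analysis, and pick_by_median is a hypothetical helper.

```python
from statistics import median


def pick_by_median(fold_scores):
    """Return the model name whose list of fold MSEs has the smallest median."""
    return min(fold_scores, key=lambda name: median(fold_scores[name]))


# Placeholder per-fold MSE values, for illustration only.
fold_scores = {
    "model_a": [0.010, 0.012, 0.030, 0.011],
    "model_b": [0.009, 0.013, 0.014, 0.015],
}
best = pick_by_median(fold_scores)
print(best)
```

Using the median rather than the mean makes the choice less sensitive to a single bad fold, as with the 0.030 outlier of model_a above.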



The 2018-2019 MVP should be …

Giannis Antetokounmpo!


Limits of the model

It’s important to remember that Machine Learning models only take stats into account and not, for example:

  • Popularity: LeBron and Curry are much more popular than Jokic.
  • “Surprises”: neither Paul George nor the Denver Nuggets were expected to be this good.
  • Triple-doubles, scoring records.

These factors are also important.