Find the best players of the 2015-2018 NBA Drafts with AI

I have always been passionate about following young players through their progression. In today's sports, and particularly in American leagues such as the NBA, scouting young players for the annual Draft is essential.

The NBA Draft takes place every year once the season is over. Teams pick young players coming from the NCAA (the US college league) and from all around the world. This process introduces new players to the League and helps balance the franchises: the teams with the worst records have a higher probability of picking first. So, logically, the best prospects are picked in the first positions. However, being picked first does not guarantee success: Anthony Bennett was the 1st pick of the 2013 NBA Draft (by the Cleveland Cavaliers) but has never been successful in the NBA. Conversely, some players were picked much later and turned out to be good players: Isaiah Thomas (pick 60 in 2011) or Draymond Green (pick 35 in 2012), for example.

In this article, I set out to find the best players of recent NBA Drafts based on their statistics during their Rookie (1st) and Sophomore (2nd) seasons in the NBA. For each Draft since 2015, I tried to predict the 9 best players: those who will be the best 5 years after their arrival in the League. For example, for players drafted in 2018, the goal is to predict who will be the best in 2023. I chose 5 years because it gives a good view of both potential and consistency.

The goal is to use an algorithm that evaluates the progression between the Rookie and Sophomore years to predict whether a player will still be in the league after 5 years and, above all, whether he will be a top player, a good player or an average one. To that end, I collected the following data from NBA Stats and Basketball Reference for all players drafted between 1996 and 2018:

The data covers both the Rookie and Sophomore years. PIE stands for Player Impact Estimate; it is the metric the model predicts in order to estimate how good a player will be 5 years after being drafted.


Some Visualizations

Before getting to the heart of the matter, I wanted to draw a few charts to illustrate the data.

First of all, here are the top 10 universities that sent the most players to the Draft and the top 10 teams that drafted the most players over the last 25 years.

The Philadelphia 76ers come first, with their "Trust the Process". Among universities, the Kentucky Wildcats are in first position, with famous players such as DeMarcus Cousins, Anthony Davis, Karl-Anthony Towns and Devin Booker.

Then I wanted to see whether there really is a link between being picked early and having a high PIE. I plotted the following graphs:

  • PIE by pick and by number of years after the Draft: picks 1-10 lead regardless of the year. In the first 2 years (Rookie and Sophomore) the picture is a bit mixed, while after 5 years the ordering is quite clear.
  • Mean PIE for each pick range over the years: players picked 1-10 are clearly better than the other picks. Interestingly, second-round players (picks 30-60) are on average better than players from the end of the 1st round (picks 21-29).
  • Good players regardless of pick as time goes on: the first thing to observe is that many drafted players don’t play more than 2 years. In the end only good players stay in the NBA, so it is logical to see the green portion (proportion of good players) grow over time. (I defined good players as those with a PIE higher than 8.5 in their Rookie year, higher than 10 in their Sophomore year and higher than 11 after 5 years in the league.)
  • Distribution among the good players: picks 1-10 represent almost 50%. As already mentioned, after 5 years there are more good players from the 2nd round than from the end of the 1st round. (This is partly explained by the fact that there are 3 times more players from picks 30-60 than from picks 21-29.)
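
The pick ranges and the "good player" thresholds used in these graphs can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the charts: the 11-20 range, the function names and the sample numbers are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# "Good player" PIE thresholds quoted in the article, keyed by season in the league.
GOOD_PIE_THRESHOLD = {1: 8.5, 2: 10.0, 5: 11.0}

# Pick ranges: 1-10, 21-29 and 30-60 come from the article; 11-20 is assumed.
PICK_RANGES = [(1, 10), (11, 20), (21, 29), (30, 60)]


def pick_range(pick):
    """Return the label of the range a draft pick falls into, e.g. '1-10'."""
    for low, high in PICK_RANGES:
        if low <= pick <= high:
            return f"{low}-{high}"
    raise ValueError(f"pick {pick} is outside 1-60")


def is_good(pie, season):
    """True if the PIE is strictly above the threshold for that season."""
    return pie > GOOD_PIE_THRESHOLD[season]


def mean_pie_by_range(rows):
    """Average PIE per pick range; rows is an iterable of (pick, pie) tuples."""
    groups = defaultdict(list)
    for pick, pie in rows:
        groups[pick_range(pick)].append(pie)
    return {label: mean(values) for label, values in groups.items()}


# Made-up numbers, only to show the call shape.
sample = [(1, 14.0), (3, 12.0), (25, 8.0), (41, 10.0), (55, 9.0)]
print(mean_pie_by_range(sample))
```

Keeping the thresholds in one dictionary makes it easy to change the definition of a good player and redraw the graphs.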

To finish, I drew the histogram of each group. They suggest that our data is approximately Gaussian, which is consistent with the fact that a sum of independent Gaussians (an assumption taken to hold here) is itself Gaussian. Furthermore, we can confirm that, most of the time, pick position is linked to the level of the players in the NBA.

The Model

(All the mathematical details are at the end.)

As a reminder, the aim of the model is to predict the level of a player (more precisely, his Player Impact Estimate) 5 years after being drafted, based on his statistics in his Rookie and Sophomore years. Top players are those with a PIE above 15.

I tried many Machine Learning algorithms for regression (Decision Tree, kNN, Random Forest, Bagging, Boosting, Linear Regression, Lasso/Ridge Regression and Elastic Net Regression). To evaluate these models, I used the R² coefficient, where 1 is a perfect fit and 0 means no better than always predicting the mean. The best model is Elastic Net Regression (R² = 0.6 on the test set). I trained and validated the models with a 10-fold cross-validation on players drafted between 1996 and 2013, then tested them on the whole 2014 Draft.
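
To make the evaluation protocol concrete, here is a minimal pure-Python sketch of its three ingredients: the R² score, a 10-fold split, and the hold-out of one Draft year. The function names and the `draft_year` field are assumptions for illustration; in practice a library such as scikit-learn provides equivalent tools.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    start = 0
    for fold in range(k):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


def split_by_draft_year(players, holdout_year=2014):
    """Earlier Drafts are used for training; the hold-out year is the test set."""
    train = [p for p in players if p["draft_year"] < holdout_year]
    test = [p for p in players if p["draft_year"] == holdout_year]
    return train, test


# Hypothetical records, only to show the call shape.
players = [
    {"name": "Player A", "draft_year": 2012},
    {"name": "Player B", "draft_year": 2013},
    {"name": "Player C", "draft_year": 2014},
]
train, test = split_by_draft_year(players)
print(len(train), len(test))
```

Note that R² can be negative on unseen data when a model does worse than always predicting the mean, so a test score of 0.6 means the model explains a substantial part of the variance, not all of it.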

For the 2014 Draft, the predicted top players are Joel Embiid (pick 3), Nikola Jokic (pick 41) and Andrew Wiggins (pick 1). The predicted good players are Bogdan Bogdanovic (pick 27), Zach LaVine (pick 13), Jabari Parker (pick 2), Dario Saric (pick 12), TJ Warren (pick 14) and Marcus Smart (pick 6).

As you can see, the model is not so bad: Embiid and Jokic are definitely NBA top players today. On the other hand, I think there are better players than Dario Saric or Jabari Parker if we look at them today.
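
One simple way to turn predicted PIE values into a "top" list and a "good" list is to sort a draft class and slice it. The sketch below assumes 3 top and 6 good players per Draft (9 in total); the function name, player names and values are hypothetical, and a fixed threshold such as PIE above 15 would be an equally valid rule.

```python
def rank_draft_class(predicted_pie, n_top=3, n_good=6):
    """Split a draft class into 'top' and 'good' lists by predicted PIE.

    predicted_pie maps a player name to his predicted PIE in year 5.
    """
    ranked = sorted(predicted_pie, key=predicted_pie.get, reverse=True)
    return ranked[:n_top], ranked[n_top:n_top + n_good]


# Made-up predictions, only to show the call shape.
predicted = {
    "Player A": 16.2,
    "Player B": 12.1,
    "Player C": 9.4,
    "Player D": 14.0,
}
top, good = rank_draft_class(predicted, n_top=1, n_good=2)
print(top, good)
```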


The Results

Now that we have a fairly successful model, we can predict the top and good players of the 2015 to 2018 Drafts.

2015 Draft

Top: Karl-Anthony Towns (pick 1), Devin Booker (pick 13), Kristaps Porzingis (pick 4)

Good: D’Angelo Russell (pick 2), Montrezl Harrell (pick 32), Jahlil Okafor (pick 3), Richaun Holmes (pick 37), Myles Turner (pick 11) and Norman Powell (pick 46)

2016 Draft

Top: Ben Simmons (pick 1), Brandon Ingram (pick 2), Domantas Sabonis (pick 11)

Good: Jaylen Brown (pick 3), Jamal Murray (pick 7), Caris LeVert (pick 20), Buddy Hield (pick 6), Malcolm Brogdon (pick 36) and Ivica Zubac (pick 32)

2017 Draft

Top: Donovan Mitchell (pick 13), John Collins (pick 19), De’Aaron Fox (pick 5)

Good: Jayson Tatum (pick 3), Bam Adebayo (pick 14), Kyle Kuzma (pick 27), Lonzo Ball (pick 2), Lauri Markkanen (pick 7) and Jonathan Isaac (pick 6)

2018 Draft

Top: Luka Doncic (pick 3), Trae Young (pick 5), Deandre Ayton (pick 1)

Good: Shai Gilgeous-Alexander (pick 11), Mitchell Robinson (pick 36), Jaren Jackson Jr. (pick 4), Collin Sexton (pick 8), Devonte’ Graham (pick 34) and Donte DiVincenzo (pick 17)

I would say that all the players named here are already top or good players and can still improve, but some results surprise me: for example, Jayson Tatum is not rated a top player despite his level this season. Moreover, some players are probably missing, such as Pascal Siakam.


The limits
  • Drafted players who don’t play or get injured in their first years are penalized (Markelle Fultz, for example).
  • Being drafted by a low-ranked team means more minutes on the court, and therefore more chances to shine.
  • The model does not take injuries into account: injuries can slow a player’s progression or even end his career.
  • It is "quite easy" to predict the level of players after observing them for 2 years in the league (we don’t need AI to predict that Doncic will be a top player, for example).


Mathematical details

Linear Regression

In statistics, linear regression is used to model a target (for us, PIE) as a linear function of one or more variables (in our case: minutes played, points scored, etc.). It is mathematically defined as follows.
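
A minimal sketch in the usual textbook notation (the symbols are assumptions chosen for illustration): for player i out of n, y_i is his PIE after 5 years, x_ij is his j-th statistic out of p, and ε_i is an error term.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n
```

Each coefficient β_j measures how much the predicted PIE changes when the j-th statistic increases by one unit, all other statistics being equal.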

We need to estimate the betas from the dataset.

Elastic Net

Elastic Net is a regularization method that combines the L1 (LASSO) and L2 (Ridge) penalties in order to "control" the values of the estimated beta parameters. The estimates are defined as follows.
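
A common way to write the estimator, assuming the same notation as for linear regression (y_i the target, x_ij the statistics, β the coefficients) and two non-negative penalty weights λ₁ and λ₂, which are symbols introduced here for illustration:

```latex
\hat{\beta} = \underset{\beta}{\operatorname{argmin}}
\left\{
  \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2
  + \lambda_1 \sum_{j=1}^{p} |\beta_j|
  + \lambda_2 \sum_{j=1}^{p} \beta_j^2
\right\}
```

Setting λ₂ = 0 gives the LASSO, setting λ₁ = 0 gives Ridge, and setting both to 0 gives ordinary least squares.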

(Argmin: the value that minimizes the function)

The advantage is that we keep the benefits of both the LASSO and Ridge methods:

  • Selection of relevant variables
  • Grouping of correlated variables
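
Both effects can be seen on a toy example. The sketch below is a minimal pure-Python coordinate descent for an Elastic Net objective. It is an illustration under stated assumptions, not the implementation used for the results above: the `alpha` / `l1_ratio` parameterization, the function names and the data are hypothetical, and the features are assumed to be centered so that no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_fit(X, y, alpha=0.1, l1_ratio=0.5, n_iter=500):
    """Coordinate descent minimizing
        (1 / 2n) * sum_i (y_i - x_i . beta)^2
        + alpha * l1_ratio * sum_j |beta_j|
        + 0.5 * alpha * (1 - l1_ratio) * sum_j beta_j^2
    X is a list of rows with centered columns; y is centered too.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            column = [row[j] for row in X]
            z = sum(v * v for v in column) / n
            # Covariance of feature j with the residual computed without feature j.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(column, residual)) / n
            denominator = z + alpha * (1 - l1_ratio)
            if denominator == 0:
                new_beta = 0.0
            else:
                new_beta = soft_threshold(rho, alpha * l1_ratio) / denominator
            # Keep the residual in sync with the updated coefficient.
            delta = new_beta - beta[j]
            residual = [r - v * delta for r, v in zip(residual, column)]
            beta[j] = new_beta
    return beta


# Toy data: columns 0 and 1 are identical, column 2 is uncorrelated with y.
X = [
    [-2.0, -2.0, 1.0],
    [-1.0, -1.0, -2.0],
    [0.0, 0.0, 2.0],
    [1.0, 1.0, -2.0],
    [2.0, 2.0, 1.0],
]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]
print(elastic_net_fit(X, y))
```

On this toy data the two identical columns end up sharing the weight almost equally (grouping), while the uncorrelated column gets a coefficient of exactly zero (selection).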