Five weeks into the season, with 20 % of the regular season played, we can start making some assumptions about how it will end. Some teams are already in a difficult position and will probably switch to tanking mode. At the other end, some teams are off to an interesting start and must now confirm it over the long run. In this article I will try to predict the final ranking at the end of the regular season and give each team a probability of being in the Playoffs in April 2020. To do so, I will build statistical models that predict the end-of-season winrate from current statistics.
Taking stock after 20 % of the season played
First, I wanted to take stock after the 20 % of the season already played. The sports analytics literature offers many estimators of the winrate. I selected the Pythagorean expectation, devised by Bill James for baseball and adapted by Daryl Morey for basketball. It estimates the number of games a team should have won based on the points it scored and allowed (going back to a really simple definition, we win a game when we score more points than our opponent). The formula is the following: expected winrate = PS^x / (PS^x + PA^x), where PS is the points scored, PA the points allowed and x an exponent fitted to the sport.
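As a minimal sketch, the Pythagorean expectation can be computed as below. The exponent 13.91 is the value commonly attributed to Daryl Morey's basketball adaptation and is an assumption here, since the article does not state which exponent was used. The function names and the sample point totals are hypothetical.

```python
def pythagorean_win_rate(points_for: float, points_against: float,
                         exponent: float = 13.91) -> float:
    """Expected win rate from points scored and points allowed.

    Algebraically equal to PS^x / (PS^x + PA^x), written with a ratio
    so that large point totals do not produce huge intermediate values.
    The default exponent is an assumption, not a value from the article.
    """
    return 1.0 / (1.0 + (points_against / points_for) ** exponent)


def expected_wins(points_for: float, points_against: float,
                  games_played: int, exponent: float = 13.91) -> float:
    """Number of games a team 'should' have won so far."""
    return games_played * pythagorean_win_rate(points_for, points_against, exponent)


if __name__ == "__main__":
    # Hypothetical team: 2250 points scored, 2150 allowed over 20 games.
    print(round(expected_wins(2250, 2150, 20), 1))
```

Comparing this value with the actual number of wins tells us whether a team is ahead of or behind what its point differential suggests.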
Using this estimator, I plotted the expected number of wins for each team against its actual number of wins. A green dot means the team has performed better than (or equal to) expected; a red dot means it has performed worse than expected.
Memphis, Charlotte and Sacramento are the top 3 teams and Phoenix, Dallas and Oklahoma City are the bottom 3 teams when actual wins are compared with expected wins. The estimator looks accurate, with 8 exact predictions and a maximum error of ± 2 wins.
Still using the Pythagorean expectation, and assuming that each team will keep performing exactly the same way (which is impossible), we can predict the final ranking at the end of the regular season. This gives a first but certainly wrong idea, because some teams are going to wake up, there will be injuries, and the further the season advances the more the Playoffs will motivate teams. Still, here are the rankings for each conference.
(the colored number is the actual number of wins and the black one is the predicted number)
For example, I am not sure the Bucks will end the regular season with 70 wins. In addition, as I said previously, the predicted rank is based on current performance (doing well in November does not mean you will still be doing well in March), and we can see this very well with the Mavs, for example.
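The naive projection discussed above can be sketched as follows. This is one possible reading of it, assuming that the wins already recorded are kept and that the remaining games are won at the Pythagorean rate; the article does not spell out the exact computation. Team names and numbers are hypothetical.

```python
SEASON_GAMES = 82  # length of an NBA regular season


def project_final_wins(current_wins: int, games_played: int,
                       expected_win_rate: float,
                       season_games: int = SEASON_GAMES) -> float:
    """Current wins plus the remaining games won at a constant rate."""
    remaining_games = season_games - games_played
    return current_wins + remaining_games * expected_win_rate


def rank_conference(projected_wins: dict) -> list:
    """Team names sorted from most to fewest projected wins."""
    return sorted(projected_wins, key=projected_wins.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical teams: (current wins, games played, expected win rate).
    teams = {"Team A": (16, 20, 0.80), "Team B": (9, 20, 0.55), "Team C": (5, 20, 0.30)}
    projections = {name: project_final_wins(*stats) for name, stats in teams.items()}
    for name in rank_conference(projections):
        print(name, round(projections[name]))
```

Because the rate is frozen at its early-season value, a hot or cold start is extrapolated over the whole season, which is why totals such as 70 wins can appear.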
To relax this strong hypothesis, we need models that take into account the fact that performance varies. For that, I used stats.nba.com data going back to the 2000-2001 season. For each season and each team, I selected many statistics measured at the end of October (a small sample of games played), at the end of November (when I am writing this article) and at the end of the season. They cover all kinds of statistics, from basic ones such as winrate to more advanced ones such as impact estimates or ratings.
Next, we need to build a model, and I chose 2 approaches:
- Model 1: inspired by the Pythagorean expectation, based on points scored and allowed, to which I added shooting percentage.
- Model 2: based on Dean Oliver's "Four Factors of Basketball Success", which are shooting, turnovers, rebounding and free throws.
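To make the inputs of Model 2 concrete, here is a minimal sketch of the Four Factors using their commonly cited definitions. The exact features used in the article are not given, so the field names, the `TeamBoxScore` class and the sample totals are assumptions. Free throw rate is sometimes defined with attempts instead of makes.

```python
from dataclasses import dataclass


@dataclass
class TeamBoxScore:
    """Cumulative team totals (hypothetical field names)."""
    fgm: float      # field goals made
    fga: float      # field goals attempted
    fg3m: float     # three-pointers made
    ftm: float      # free throws made
    fta: float      # free throws attempted
    tov: float      # turnovers
    orb: float      # offensive rebounds
    opp_drb: float  # opponent defensive rebounds


def four_factors(box: TeamBoxScore) -> dict:
    """Shooting, turnovers, rebounding and free throws as rates."""
    return {
        # Effective FG%: a made three counts as 1.5 made field goals.
        "efg_pct": (box.fgm + 0.5 * box.fg3m) / box.fga,
        # Turnovers per estimated possession-ending play.
        "tov_pct": box.tov / (box.fga + 0.44 * box.fta + box.tov),
        # Share of available offensive rebounds that were grabbed.
        "orb_pct": box.orb / (box.orb + box.opp_drb),
        # Free throws made per field goal attempt.
        "ft_rate": box.ftm / box.fga,
    }


if __name__ == "__main__":
    sample = TeamBoxScore(fgm=40, fga=80, fg3m=10, ftm=20, fta=25,
                          tov=11, orb=10, opp_drb=30)
    print(four_factors(sample))
```

The same four rates can also be computed for the opponent, which would give eight features per team.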
To train these models I used the 2000-2001 to 2017-2018 seasons, and I tested them on the 2018-2019 season. Since predicting a winrate is a regression problem, my evaluation criterion is the R2 coefficient. Moreover, as I want an accurate prediction, I will also look at the Mean Absolute Error.
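For reference, both evaluation metrics can be written in a few lines of plain Python. This is only a sketch of the definitions; scikit-learn provides equivalent `r2_score` and `mean_absolute_error` functions in `sklearn.metrics`. The sample values are hypothetical.

```python
def r2_score(y_true: list, y_pred: list) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true: list, y_pred: list) -> float:
    """Average absolute gap between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical end-of-season wins for three teams of a held-out season.
    actual = [50, 40, 30]
    predicted = [48, 41, 33]
    print(r2_score(actual, predicted), mean_absolute_error(actual, predicted))
```

R2 says how much of the spread between teams the model explains, while the MAE is expressed in the same unit as the target and is therefore easier to read as a typical error.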
After implementing several common Machine Learning algorithms (Linear Regression, Random Forest, kNN, boosting...), boosting seems to be the best for both approaches. To check the validity of the models, I plotted the residuals to see whether at least 95 % of them are within 2 standard deviations of the mean.
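The residual check described above can be sketched as follows, assuming residuals are computed as actual minus predicted values. The function name and the sample residuals are hypothetical.

```python
from statistics import mean, pstdev


def share_within_two_sd(residuals: list) -> float:
    """Fraction of residuals within 2 standard deviations of their mean."""
    center = mean(residuals)
    spread = pstdev(residuals)  # population standard deviation
    inside = sum(1 for r in residuals if abs(r - center) <= 2 * spread)
    return inside / len(residuals)


if __name__ == "__main__":
    # Hypothetical residuals (actual wins minus predicted wins).
    residuals = [-1, 0, 1, 0, -1, 1, 0, 0, 0, 10]
    share = share_within_two_sd(residuals)
    print(f"{share:.0%} of residuals within 2 standard deviations")
```

For normally distributed residuals this share should be close to 95 %, which is why that threshold is used.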
Then, to give a mathematical interpretation, I ran a Shapiro-Wilk test to check whether the residuals are normally distributed. Here are the results:
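Both of our models have more than 95 % of their residuals within 2 standard deviations and a p-value < 0.05. Note that the null hypothesis of the Shapiro-Wilk test is that the sample is normal, so a p-value below 0.05 is evidence against normality rather than for it; the normality check should therefore be read with caution. As we can see, the first model (inspired by the Pythagorean expectation) does better than the second one (the Four Factors model), with a better R2 and a better MAE (4.83 for Model 1 and 5.72 for Model 2). I am going to use Model 1 for the final predictions.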
For each team, the predicted number of wins and the probability of reaching the Playoffs are given.
Ladies and gentlemen, here are the predicted rankings at the end of the regular season for each conference!
No big surprise in the East: the Bucks finish first, joined by the Raptors, the 76ers, the Celtics, the Heat, the Pacers, the Nets and the Magic for the 2020 Playoffs. The Pistons and the Wizards still have a chance of grabbing one of the last spots.
The West is much more contested, but the King and his buddies should finish 1st. The Nuggets, the Clippers and the Rockets are almost certain to reach the Playoffs, followed by the Jazz and the Mavericks. For the last 2 spots there is a battle between Portland, Minnesota, Phoenix and Oklahoma City as the underdog, but the Blazers and the Wolves should make it.
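The article does not say how the playoff probabilities are derived from the predicted wins. One possible way, shown here purely as an assumption, is a small Monte Carlo simulation: add random noise to each predicted win total, take the top 8 of the conference, and count how often each team qualifies. The noise level `error_sd`, the function name and the team values are all hypothetical.

```python
import random


def playoff_probabilities(predicted_wins: dict, error_sd: float = 6.0,
                          spots: int = 8, n_sims: int = 10_000,
                          seed: int = 0) -> dict:
    """Share of simulations in which each team finishes in the top `spots`.

    Each simulation perturbs every prediction with Gaussian noise of
    standard deviation `error_sd` (an assumed value, not from the article).
    """
    rng = random.Random(seed)
    counts = {team: 0 for team in predicted_wins}
    for _ in range(n_sims):
        simulated = {team: wins + rng.gauss(0, error_sd)
                     for team, wins in predicted_wins.items()}
        qualified = sorted(simulated, key=simulated.get, reverse=True)[:spots]
        for team in qualified:
            counts[team] += 1
    return {team: counts[team] / n_sims for team in predicted_wins}


if __name__ == "__main__":
    # Hypothetical conference of 10 teams with predicted win totals.
    conference = {f"Team {i}": wins for i, wins in
                  enumerate([60, 55, 52, 50, 47, 45, 43, 41, 39, 30], start=1)}
    for team, prob in playoff_probabilities(conference).items():
        print(team, f"{prob:.0%}")
```

With this approach, teams whose predicted totals sit close to the 8th place get probabilities near 50 %, which matches the idea of a battle for the last spots.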
I also looked at Win Shares (the estimated number of wins contributed by a player) for important players on top teams, to measure how much each team would be affected if the player were injured. The number displayed is WS divided by the number of games played, expressed as a percentage. For example, James Harden has contributed 23 % of the Rockets' wins so far.
Western players: James Harden (0.23), Luka Doncic (0.2), LeBron James (0.19) & Anthony Davis (0.19), Damian Lillard (0.18), Karl-Anthony Towns (0.17), Rudy Gobert (0.15) & Donovan Mitchell (0.11), Kawhi Leonard (0.11), Nikola Jokic (0.1), Devin Booker (0.07).
Eastern players: Giannis Antetokounmpo (0.2), Jimmy Butler (0.18) & Bam Adebayo (0.14), Kyrie Irving (0.15), Domantas Sabonis (0.13), Pascal Siakam (0.12) & Fred VanVleet (0.12), Al Horford (0.12) & Joel Embiid (0.12), Jonathan Isaac (0.1) & Evan Fournier (0.09), Jayson Tatum (0.07).
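The ratio behind these numbers is simply Win Shares divided by games played. A minimal sketch, with hypothetical player names and Win Share totals rather than the actual data:

```python
def win_share_ratio(win_shares: float, games_played: int) -> float:
    """Win Shares per game played, as described in the text."""
    return win_shares / games_played


if __name__ == "__main__":
    # Hypothetical players: (Win Shares so far, games played).
    players = {"Player A": (4.6, 20), "Player B": (3.0, 20), "Player C": (1.4, 20)}
    ratios = {name: win_share_ratio(ws, games) for name, (ws, games) in players.items()}
    # Print from the most to the least impactful player.
    for name in sorted(ratios, key=ratios.get, reverse=True):
        print(name, round(ratios[name], 2))
```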
If one of these players is injured, his team could lose 10 to 20 % of its games.