Written By: John McCool (@Desertrose28)
Before analytics and advanced scouting, NBA teams tended to rely on sloppy intuition in the draft. Teams drafted players based on criteria such as college points per game, overall “look”, and shooting displayed at scouting sessions. At best, these factors offer a partial snapshot of a player’s future performance at the NBA level.
The advent of analytics and advanced scouting in the NBA can be credited largely to Daryl Morey, the current general manager of the Houston Rockets. Since taking over, Morey has changed how the Rockets scout players in the draft. The Rockets were also among the earliest adopters of floor spacing, maximizing three-point and layup opportunities.
In his latest book, The Undoing Project, Michael Lewis briefly describes how Morey brought analytics into player evaluation and how the Rockets became the first front office to examine draft prospects through a statistical lens.
Lewis describes how, when building his initial draft model, Morey looked not only at players’ college numbers and physical attributes but also at less obvious factors, such as whether a player was left- or right-handed, had two parents, or played zone defense in college. Although most of these variables were poor predictors of NBA success, Morey ultimately found that his model identified skilled, undervalued players.
As less of a predictive model and more of a fun experiment, we used random forest modeling to project NBA performance from physical attributes (e.g., body fat) and college statistics. We considered NBA players drafted from 2004-05 through 2015-16. Because some players lacked complete body measurements, the sample was reduced to 204 players.
The NBA Draft with Random Forest Modeling
Random forest is a machine learning method that can perform both regression and classification. It builds many decision trees, each grown much like a standard decision tree, except that each tree is trained on a bootstrap sample (a random resample of the data, drawn with replacement) and, at every split, considers only a small random subset of the features. For classification, the forest then predicts the class that receives the most votes across its trees.
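To make those mechanics concrete, here is a minimal from-scratch sketch in Python of a random forest classifier: bootstrap samples, a random feature subset at each split (Gini impurity picks the threshold), and a majority vote. It is a toy for illustration only. The article does not say what software or settings it used, and the function names, `n_trees=25`, `max_depth=4`, and the square-root rule for features per split are our assumptions.

```python
import random
from collections import Counter
from math import sqrt


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, features):
    """Search only the given feature indices for the lowest-impurity threshold."""
    best = None  # (weighted impurity, feature index, threshold)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_features, depth, rng):
    """Grow one tree; every split draws a fresh random subset of features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority  # leaf node: predict the most common label
    features = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, features)
    if split is None:  # no threshold separates the rows on these features
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_features, depth - 1, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], max_features, depth - 1, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    rng = random.Random(seed)
    max_features = max(1, int(sqrt(len(X[0]))))  # sqrt(number of features) per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_features, max_depth, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote across the trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Toy data (hypothetical): two redundant features, label set by their size.
X = [[i, 2 * i] for i in range(10)]
y = ["low"] * 5 + ["high"] * 5
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [9, 18]))
```

In practice a library such as scikit-learn’s `RandomForestClassifier` would normally be used instead; its `max_features` and `bootstrap` parameters correspond to the feature subsampling and resampling shown here.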
We classified players by their win shares per 48 minutes (WS/48) in the NBA. The 204 players in our sample averaged 0.056 WS/48, with first and third quartiles of 0.022 and 0.103 WS/48, respectively. Based on their WS/48, players were classified as replacement, average, good, or elite.
In the sample, 97 draft picks were labeled “good” or “elite”, while 107 were labeled “average” or “replacement” based on their NBA WS/48. For reference, WS/48 divides a player’s win shares (an estimate of the wins he contributed) by his minutes played and scales the result to 48 minutes, the length of a regulation game, so that players with very different playing time can be compared.
In the model, “elite” players posted WS/48 above 0.103 and “good” players posted WS/48 between 0.056 and 0.103; sampled players in these two categories include Kawhi Leonard, Jimmy Butler, and Draymond Green. “Average” players posted WS/48 between 0.022 and 0.056, and “replacement” draft picks produced less than 0.022 WS/48 in the NBA, a group that includes Devin Booker, Brandon Knight, and Malcolm Lee.
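The tier rules above amount to three cutoffs on WS/48. A minimal Python sketch, using the cutoffs reported here (0.022, 0.056, 0.103) and the standard WS/48 definition (win shares times 48, divided by minutes played). The function names are hypothetical, and sending a value that falls exactly on a cutoff to the lower tier is our assumption, since the text does not say.

```python
from bisect import bisect_left

# Cutoffs reported in the article: first quartile, 0.056, third quartile.
CUTOFFS = [0.022, 0.056, 0.103]
TIERS = ["replacement", "average", "good", "elite"]


def ws48_from_totals(win_shares, minutes):
    """Win shares per 48 minutes played."""
    return win_shares * 48 / minutes


def ws48_tier(ws48):
    """Map WS/48 to a tier.

    Assumption: a value exactly on a cutoff goes to the lower tier,
    so "elite" means strictly above 0.103.
    """
    return TIERS[bisect_left(CUTOFFS, ws48)]


# A player with 5.0 win shares in 2,400 minutes: 0.1 WS/48.
print(ws48_tier(ws48_from_totals(5.0, 2400)))  # good
```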
Model 1: Physical Attributes
The first random forest model used players’ body fat, hand length, hand width, standing reach, wingspan, height (without shoes), and weight as predictors. Using just these factors, the model classified players’ WS/48 tier with 35.3% accuracy.
Figure 1: Confusion matrix showing the predicted versus actual classification of NBA draft picks based on WS/48.
With a 78.6% error rate, this model had the most difficulty identifying average WS/48 players. By contrast, it was much more successful at identifying elite players, with an error rate of 47.7%. The strongest predictors of player performance were standing reach (p-value of 0.20) and, to a lesser extent, hand length (p-value of 0.38), though neither is significant at a conventional threshold such as 0.05.
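The accuracy and per-tier error rates quoted for these models can be read off a confusion matrix. The Python sketch below assumes, as in common random forest output, that a tier’s error rate is the share of players actually in that tier whom the model placed in another tier; the eight players and their predictions are made up for illustration.

```python
# Tier order used for both rows (actual) and columns (predicted).
LABELS = ["replacement", "average", "good", "elite"]


def confusion_matrix(actual, predicted, labels=LABELS):
    """Count players by (actual tier, predicted tier)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of all players whose tier was predicted correctly (the diagonal)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def class_error_rates(matrix, labels=LABELS):
    """Per actual tier: share of those players put in some other tier."""
    return {label: 1 - matrix[i][i] / sum(matrix[i])
            for i, label in enumerate(labels) if sum(matrix[i]) > 0}


# Hypothetical predictions for eight players (not the article's results).
actual = ["elite", "elite", "good", "good", "average", "average", "replacement", "replacement"]
predicted = ["elite", "good", "good", "elite", "replacement", "good", "replacement", "average"]
m = confusion_matrix(actual, predicted)
print(accuracy(m))           # 3 of 8 correct -> 0.375
print(class_error_rates(m))  # "average" is never right here, so its error rate is 1.0
```

For context, guessing one of the four tiers uniformly at random would be right 25% of the time on average, a useful baseline for the accuracies reported in this article.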
Model 2: College Statistics
The second random forest model was trained on college per-game statistics: minutes played, rebounds, assists, steals, blocks, turnovers, and points. We also created a binary variable (0 or 1) indicating whether the player played in a power conference. This model had a slightly better accuracy rate of 40.2% in predicting NBA success (in terms of WS/48).
Figure 2: Confusion matrix showing the predicted versus actual classification of NBA draft picks based on WS/48.
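Model 2’s power-conference variable is a simple 0/1 flag appended to each player’s college stats. A sketch in Python; the article does not list which conferences it counted as power conferences, so `POWER_CONFERENCES` is a hypothetical set, and the stat keys are illustrative names for the per-game stats discussed in this section.

```python
# Assumption: the article does not list which conferences it treated as
# "power"; this set is a hypothetical example.
POWER_CONFERENCES = {"ACC", "Big East", "Big Ten", "Big 12", "Pac-12", "SEC"}


def power_conference_flag(conference):
    """Return 1 if the school plays in a power conference, else 0."""
    return 1 if conference in POWER_CONFERENCES else 0


def college_features(player):
    """Build one model input row from per-game college stats (hypothetical keys)."""
    stats = ["minutes", "rebounds", "assists", "steals", "blocks", "turnovers", "points"]
    return [player[s] for s in stats] + [power_conference_flag(player["conference"])]


# Made-up player for illustration.
example = {"minutes": 31.2, "rebounds": 9.8, "assists": 1.4, "steals": 0.9,
           "blocks": 1.6, "turnovers": 2.1, "points": 12.5, "conference": "ACC"}
print(college_features(example))
```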
Using just college statistics, this model did a poor job of predicting replacement and average players (64.7% and 60.7% error rates, respectively). The most significant predictors of NBA WS/48 were total rebounds and turnovers per game, both with p-values below 0.05. Steals and assists per game were also somewhat predictive of future performance (p-values below 0.10).
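The article does not say how its predictor p-values were computed (a random forest does not produce them on its own). As a hedged illustration of one common way to measure how much a model relies on a feature, the sketch below computes permutation importance: shuffle one feature column, re-score the model, and average the drop in accuracy over several shuffles. It stops at the importance score and does not compute a p-value; the toy model and data are hypothetical.

```python
import random


def accuracy_of(predict, X, y):
    """Share of rows where the model's prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy_of(predict, X, y)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between feature f and the label
            X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy_of(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical fitted model: uses only feature 0 (think rebounds per game).
def toy_model(row):
    return "good" if row[0] > 7 else "poor"


# Feature 0 = rebounds, feature 1 = an unrelated stat (made-up values).
X = [[reb, other] for reb, other in zip(range(3, 13), [2, 3, 1, 4, 2, 3, 1, 2, 4, 3])]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

Shuffling the unused feature leaves every prediction unchanged, so its importance is exactly zero, while shuffling the feature the model depends on costs a large share of its accuracy.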
It is also interesting that college scoring doesn’t necessarily translate to NBA performance. In the sample, eight players posted WS/48 above the 80th percentile while scoring fewer than 10 points per game in college; this group includes Montrezl Harrell, Steven Adams, and Mason Plumlee. On the flip side, 12 players who scored more than 15 points per game in college flamed out in the NBA.
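Counts like these come from a percentile cutoff plus a filter. The sketch below uses hypothetical players, not the article’s data, and treats “flamed out” as WS/48 below the 0.022 replacement cutoff, which is our assumption since the article does not define it.

```python
from statistics import quantiles

# Hypothetical players: (name, college points per game, NBA WS/48).
players = [
    ("A", 8.5, 0.010), ("B", 17.0, 0.020), ("C", 12.0, 0.030), ("D", 9.0, 0.040),
    ("E", 16.5, 0.050), ("F", 11.0, 0.060), ("G", 14.0, 0.070), ("H", 18.0, 0.080),
    ("I", 7.5, 0.090), ("J", 15.5, 0.100),
]

# With n=5, quantiles() returns the 20th, 40th, 60th and 80th percentile cut points.
p80 = quantiles([ws for _, _, ws in players], n=5)[3]

# Low college scorers who became top-fifth NBA performers.
low_scoring_hits = [name for name, ppg, ws in players if ws > p80 and ppg < 10]

# Assumption: "flamed out" means below the 0.022 replacement cutoff.
high_scoring_misses = [name for name, ppg, ws in players if ppg > 15 and ws < 0.022]

print(low_scoring_hits, high_scoring_misses)  # ['I'] ['B']
```

Percentile definitions vary between tools; `statistics.quantiles` uses `method="exclusive"` by default, so a cutoff computed elsewhere may differ slightly.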
Model 3: Combine and College Statistics
The final random forest model combined players’ college statistics and combine metrics. This model correctly classified players 41.2% of the time. The best predictors were once again total rebounds and turnovers per game (both significant at the 0.05 level).
Figure 3: Confusion matrix showing the predicted versus actual classification of NBA draft picks based on WS/48.
Finding the Right Fit
For front offices across the league, it is difficult to pinpoint how a draft prospect will adjust to the athleticism and pace of play in the NBA. As the models suggest, college numbers and physical metrics aren’t necessarily the best indicators of NBA success. Beyond the higher level of competition, some draft picks may also struggle to fit into a team’s system or clash with their coach’s style of play.
Another important but underused factor in prospect evaluation is player psychology. Independent sports psychologists are playing an increasingly pivotal role in player development at the college and NBA levels. For example, in the NBA, Aaron Gordon, Zach LaVine, and Andre Drummond regularly work with sports psychologists to improve their mental focus.
Advanced modeling and scouting have improved NBA player projections. However, it still takes some luck to draft the league’s next megastar. Beyond a player’s statistics, physical features, and athletic ability, it helps to consider how he will fit a particular playing style and how he might influence the social dynamics of the locker room. The teams willing to consider these factors have the best chance of finding a hidden gem in this year’s draft.
(Statistics courtesy of NBA.com and Basketball Reference).