Written by: John McCool (@Desertrose28)
This year’s draft features an exciting crop of prospects including Markelle Fultz, Josh Jackson, and Malik Monk. Despite the deep draft class, teams picking in the lottery are still at the mercy of player flops and the injury bug that derails the careers of many promising young players. We will use the physical measurements and college statistics from the 2010 through 2015 draft classes to determine which variables are the best indicators of performance at the NBA level.
A random forest was used to predict win shares per 48 minutes (WS/48) for the 2010-2015 draft picks. Players who lacked physical measurements and/or college playing experience were excluded from the model.
The model combined players’ college statistics and physical attributes to predict their WS/48 in the NBA. Some players did not have long careers in the NBA and therefore had a small game sample size that added an element of bias to the model.
Random forest is an ensemble model: it grows many individual decision trees, each built from a different random slice of the data and variables, and averages their predictions across the group.
It creates randomness through bootstrap aggregation and creating subsets. The former technique takes a randomized sample of rows in the data set with replacement. The latter takes a subset of variables (instead of the entire group of variables) when creating the model.
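The two sources of randomness can be sketched in a few lines of Python. This is a minimal illustration and not the code behind the model in this post: the variable names and row ids are hypothetical, and the sketch draws one variable subset per tree for brevity, whereas standard random forest implementations redraw the subset at every split.

```python
import random

def bootstrap_rows(rows, rng):
    """Bootstrap aggregation: draw len(rows) rows at random WITH replacement."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(features, k, rng):
    """Pick k distinct candidate variables instead of using all of them."""
    return rng.sample(features, k)

# Hypothetical variable names and toy row ids, for illustration only.
features = ["off_reb", "def_reb", "three_pa", "weight", "max_vert", "wingspan"]
rows = list(range(10))  # stand-ins for ten player records

rng = random.Random(0)  # fixed seed so the sketch is repeatable
for tree in range(3):
    sample = bootstrap_rows(rows, rng)                   # some rows repeat, some are left out
    candidates = random_feature_subset(features, 2, rng)
    print(tree, sorted(sample), candidates)
```

Because each tree sees a different sample of players and a different set of candidate variables, the trees disagree with one another, and averaging their predictions smooths out the errors of any single tree.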
Figure 1: Increase in node purity (variable importance) for college statistics and physical attributes in the random forest model
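The above plot shows the node purity for each variable in the random forest model. Node purity relates to the loss function by which splits are chosen: it measures how well splitting on a variable separates the data in the trees, so a larger increase in node purity indicates a more important variable.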
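To make the idea concrete, here is a minimal sketch of the purity gain from a single split. It assumes a regression tree that measures impurity with the residual sum of squares (RSS), which is a common choice but is not confirmed as the exact setting used for the plot. The function names and toy numbers are hypothetical.

```python
def rss(values):
    """Residual sum of squares around the mean: the impurity of one node."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def purity_gain(x, y, threshold):
    """Decrease in RSS when a node is split into x <= threshold and x > threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return rss(y) - (rss(left) + rss(right))

# Toy data, for illustration only: x is a predictor, y is the target.
x = [1, 2, 3, 4]
y = [0.0, 0.0, 1.0, 1.0]

gain_good = purity_gain(x, y, 2)  # separates the two groups cleanly
gain_poor = purity_gain(x, y, 1)  # leaves mixed values on the right side
print(gain_good, gain_poor)
```

Under this assumption, a variable's importance score is the sum of such gains over every split that uses the variable, averaged across the trees in the forest.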
The most predictive variables for future NBA win shares per 48 minutes are a player’s offensive rebounds, conference, three-point attempts, and defensive rebounds at the college level. On the physical side, a player’s weight, max vertical jump, and wingspan are among the most predictive variables in the model. On average, the model was fairly accurate in projecting draft picks’ WS/48. In the sample, the average actual WS/48 was 0.0519, compared to an average of 0.0507 WS/48 predicted by the model.
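Comparing sample averages is one check on accuracy, but a per-player error measure is a useful complement, since over- and under-predictions can cancel out in the mean. The sketch below uses made-up numbers (not values from the model) and hypothetical helper names to show two sets of predictions whose averages match exactly while individual players are still missed.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def mean_absolute_error(actual, predicted):
    """Average size of the per-player miss, ignoring its direction."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])

# Made-up WS/48 values for three hypothetical players, for illustration only.
actual = [0.10, 0.00, 0.05]
predicted = [0.00, 0.10, 0.05]

gap_in_means = abs(mean(actual) - mean(predicted))  # the misses cancel out
mae = mean_absolute_error(actual, predicted)        # the misses are counted
print(gap_in_means, mae)
```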
Players with short NBA careers can post artificially high win shares per 48 minutes despite playing limited minutes. The model was trained on data from the 2009-2010 through 2015-2016 college basketball seasons, so some players in the model have incomplete college statistics. For example, Kemba Walker’s freshman statistics in 2008-2009 were not included in the model, which could add bias to the win shares predictions.
It is also important to note that it is much easier to predict draft picks’ WS/48 after two or three years in the NBA. Projecting the future value of 2017 draft picks such as De’Aaron Fox and Lonzo Ball is much harder without having their baseline statistics in the NBA.
Figure 2: Predicted WS/48 compared to the actual WS/48 in the NBA
Source: Basketball Reference and Sports Reference (Statistics through 3/9/17)
One of the most undervalued players in the model is San Antonio Spurs forward Kawhi Leonard. He is underestimated in the model by 0.07 WS/48. Leonard’s “model value” is hurt in part because he played in the Mountain West Conference against weaker competition and owned a below-average 32-inch max vertical jump entering the NBA. Leonard still received the highest WS/48 projection in the model, edging out Karl-Anthony Towns.
Isaiah Thomas was also undervalued, in part because of his below-average wingspan and reach. However, Thomas had one of the highest vertical jumps (40 inches) and averaged 1.8 three-pointers per game, which stabilized his predicted value.
Similar to Thomas, Kemba Walker was undervalued because of below-average reach and wingspan but made up for these deficiencies with a 39.5-inch max vertical jump. Walker was also a prolific scorer, averaging 19.5 points and 1.5 three-pointers per game in his final three seasons with the Connecticut Huskies.
Figure 3: Actual WS/48 versus projected WS/48 for undervalued picks in the sample
The model overvalued Andy Rautins in part because he averaged 2.8 threes per game at Syracuse. Despite his sharp shooting, the shooting guard recorded just a 30.5-inch max vertical jump. Another overvalued prospect was Justin Harper, the 32nd pick in the X draft. The model rewarded Harper’s ability to knock down the three and grab offensive boards.
Figure 4: Actual WS/48 versus projected WS/48 for overvalued picks in the sample
Improving Draft Precision
It would be useful to take a player’s position into account in the model. For example, point guards and stretch forwards typically have different body types. When projecting a point guard’s future, the model should evaluate him based on his assist-to-turnover ratio and shooting metrics rather than his rebound totals. On the flip side, the model should “reward” a stretch power forward for having high offensive rebound totals and an above-average wingspan.
While statistical modeling can offer a fun and intriguing glimpse of a draft pick’s performance, data-driven analysis should be complemented with detailed scouting reports and personality tests. Theo Epstein, the architect behind the Red Sox and Cubs’ World Series teams, builds his teams around character and chemistry. Scouts working with the Cubs are required to give three examples of how a prospect responds to adversity on the field and three examples of how the prospect responds to adversity off the field. A handful of NBA teams are likely applying some of these same scouting and player development principles to complement their use of analytics.