Are NHL Expected Goal Models Biased? (Part One)

Written by John McCool (@Desertrose28)

Quick Takeaways:

  • There is potential evidence that the rebound and rush coefficients in logistic regression models for expected goals (xG) shifted across seasons
  • The distance and angle coefficients for xG models did not differ significantly from 2007-08 through 2015-16, though there appears to be a slow, upward shift
  • Significant changes in these coefficients across seasons may render xG models trained on past seasons of data inappropriate for computing xG statistics on subsequent seasons
  • Analysts publishing xG models/results should carefully consider how they train and apply logistic regression models for xG, and should be clear about the data on which their models were trained
  • Part 2 of this series will look at the distribution of these coefficients accounting for variation within and across seasons

In hockey analytics, shot attempt metrics have grown in popularity in recent years because of their ability to project future player and team performance. However, metrics like Corsi (all shot attempts) and Fenwick (all unblocked shot attempts) do not take shot quality into account. One way to incorporate shot quality into shot attempt metrics is to weight each unblocked shot attempt by its probability of being converted into a goal. In this approach, a goal expectancy is assigned to each unblocked shot based on factors such as its distance and angle and whether it was a rebound or came off the rush. The resulting metric is typically referred to as “Expected Goals” (xG).
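To make the weighting idea concrete, here is a minimal Python sketch that treats xG as a Fenwick count in which each unblocked attempt contributes its goal probability instead of 1. The per-shot probabilities are made-up illustrative values, not output from any real model, and the `goal_prob` field name is hypothetical.

```python
def fenwick(shots):
    """Count unblocked shot attempts (goals, shots on goal, and missed shots)."""
    return len(shots)


def expected_goals(shots):
    """Sum each unblocked attempt's estimated probability of becoming a goal."""
    return sum(shot["goal_prob"] for shot in shots)


# Two hypothetical teams with identical Fenwick but different shot quality.
team_a = [{"goal_prob": p} for p in (0.03, 0.04, 0.05, 0.04)]
team_b = [{"goal_prob": p} for p in (0.15, 0.02, 0.20, 0.03)]

print(fenwick(team_a), fenwick(team_b))
print(round(expected_goals(team_a), 2), round(expected_goals(team_b), 2))
```

Both teams record four unblocked attempts, but the second team's xG is far higher because its attempts were more dangerous.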

Expected goals (xG) are better predictors of team and player performance than Corsi and +/-: unlike Corsi, they account for shot quality, and unlike +/-, they do not rely on actual goals, which are random and scarce events. For reference, the NHL-average expected goal value of a shot hovered around 0.068 in 2015-16, based on shooting percentage.[1] It is also important to realize that expected goals are influenced not just by player skill, but also by pass sequences, quick touches, and traffic in front of the net. Ryan Stimson, a contributor to Hockey Graphs, created The Passing Project, which tracks the relationship between pass types (e.g., passes from behind the net or stretch passes) and shooting percentage.

Expected goals differ from shot attempt metrics like Corsi or Fenwick in that the value assigned to each shot attempt comes directly from a statistical model, typically logistic regression. To evaluate players or teams in the 2016-17 season, it might seem reasonable to use an xG model trained on previous seasons (e.g., data from 2007-08 to 2015-16). However, doing so could introduce a substantial amount of bias. What if goalies have systematically gotten better or worse over time at stopping particular types of shots (e.g., through better rebound control), or shots from particular areas of the ice? What if players have gotten systematically better or worse at converting particular types of shots, or shots from particular areas of the ice? If so, it would be inappropriate to use an xG model trained on previous seasons of data to evaluate players or teams in the current season. To our knowledge, no one has yet explored this potential bias.

Part one of this series explores how the distance, angle, rebound, and rush coefficients in logistic regression models for expected goals varied from the 2007-08 through the 2015-16 season. Changes in these xG coefficients may indicate systematic shifts over time in how goals are scored in the NHL, which would make xG models trained on past seasons of data biased when applied to the 2016-17 season. As shown below, we find that the xG coefficients do differ significantly between some seasons, particularly those for rebound and rush shots.

The statistical analysis was carried out using logistic regression. For each season, we fit a model of the probability (equivalently, the log-odds) that an unblocked shot attempt is a goal (response = 1 if goal, 0 if non-goal), conditional on the distance of the shot, the angle of the shot, whether it was a rebound, and whether it came off the rush. Each season's coefficients are compared to those from the most recent season, 2015-16, which were estimated from 107,872 observations (goals, shots on goal, and missed shots). All data comes from Corsica.
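The season-by-season model described above can be sketched as a plain logistic regression fit by Newton-Raphson. This is a minimal, dependency-free illustration, not the code or software actually used for this analysis; the feature order `[intercept, distance, angle, rebound, rush]`, the synthetic shots, and the `TRUE_BETA` values used to generate them (including the invented intercept) are assumptions for demonstration. Standard errors come from the inverse Fisher information, and each Z-value is the coefficient divided by its standard error.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function: log-odds -> probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score_and_information(rows, goals, beta):
    """Log-likelihood gradient X'(y - p) and Fisher information X'WX at beta."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, y in zip(rows, goals):
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (y - p) * x[i]
            for j in range(k):
                info[i][j] += w * x[i] * x[j]
    return grad, info


def fit_logistic(rows, goals, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns (coefficients, standard errors); Z-value = coefficient / SE.
    """
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        grad, info = score_and_information(rows, goals, beta)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information.
    _, info = score_and_information(rows, goals, beta)
    k = len(beta)
    se = [math.sqrt(solve(info, [float(i == j) for j in range(k)])[i]) for i in range(k)]
    return beta, se


# Synthetic shots, columns [intercept, distance_ft, angle, rebound, rush].
# TRUE_BETA is an assumed data-generating model (the intercept is invented).
TRUE_BETA = [-0.5, -0.041, -0.013, 0.934, 0.839]
rng = random.Random(2016)
rows, goals = [], []
for _ in range(4000):
    x = [1.0, rng.uniform(5, 60), rng.uniform(0, 80),
         float(rng.random() < 0.1), float(rng.random() < 0.1)]
    rows.append(x)
    p_goal = sigmoid(sum(b * xi for b, xi in zip(TRUE_BETA, x)))
    goals.append(1 if rng.random() < p_goal else 0)

beta, se = fit_logistic(rows, goals)
for name, b, s in zip(["intercept", "distance", "angle", "rebound", "rush"], beta, se):
    print(f"{name:>9}: coef={b:+.3f}  SE={s:.3f}  Z={b / s:+.1f}")
```

In practice a statistics package (for example statsmodels' `Logit` or R's `glm`) would be used; the hand-rolled version only makes the fitting and Z-value steps explicit.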

Distance Coefficient

Shot distance is a good predictor of goal probability: unsurprisingly, shot attempts closer to the net have a higher probability of getting past the goaltender. As an example, the Montreal Canadiens had a -0.47 correlation between expected goals and shot distance last season (a figure that includes the statistics of players they signed during the offseason).

The distance coefficient was -0.041 in 2015-16. In other words, for every one-foot increase in shot distance, the log-odds of a goal (relative to a non-goal / save) fell by 0.041.[2] This coefficient had the highest absolute Z-value (-48.1), suggesting that distance is the strongest predictor of goal probability in the model (p-value < .001).[3]
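The interpretation above is simple arithmetic on the log-odds scale. A small sketch using the reported 2015-16 distance coefficient and Z-value (the helper names are hypothetical): exponentiating a change in log-odds gives the multiplicative change in the odds of a goal. Converting to a probability would also require the intercept and the other predictors, which are not reported here.

```python
import math

# 2015-16 values reported above.
DISTANCE_COEF = -0.041  # change in log-odds of a goal per additional foot
DISTANCE_Z = -48.1


def log_odds_change(coef, delta_feet):
    """Change in log-odds of a goal when distance changes by delta_feet (others fixed)."""
    return coef * delta_feet


def odds_multiplier(coef, delta_feet):
    """Factor by which the odds of a goal are multiplied for the same change."""
    return math.exp(log_odds_change(coef, delta_feet))


# A shot taken 10 feet closer: delta = -10 ft.
print(round(log_odds_change(DISTANCE_COEF, -10), 3))
print(round(odds_multiplier(DISTANCE_COEF, -10), 3))

# Z = coef / SE, so the implied SE is about coef / Z (rough: both inputs are rounded).
print(round(DISTANCE_COEF / DISTANCE_Z, 5))
```

For a shot taken 10 feet closer, the log-odds rise by 0.41 and the odds of a goal are multiplied by about 1.51.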

While there are no significant changes in the distance coefficient across the past nine seasons, there appears to be a slight upward trend in its value, from -0.044 in 2007-08 to -0.041 in 2015-16. Based on this increase, we might infer that shots from farther away are now penalized slightly less, i.e., they carry a somewhat higher expected goal probability relative to shots from close range. Interestingly, between some consecutive seasons, such as 2010-11 and 2011-12 or 2013-14 and 2014-15, the distance coefficient makes substantial (though not significant) jumps.
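To see what the small shift from -0.044 to -0.041 means in practice, one can compare the odds of a goal on a 50-foot shot relative to an otherwise identical 10-foot shot under each season's coefficient. This is a sketch; the 50- and 10-foot distances are arbitrary illustrative choices.

```python
import math

# Distance coefficients reported above for the first and last seasons studied.
DISTANCE_COEFS = {"2007-08": -0.044, "2015-16": -0.041}


def relative_odds(coef, far_ft, near_ft):
    """Odds of a goal from far_ft relative to near_ft, holding angle, rebound, rush fixed."""
    return math.exp(coef * (far_ft - near_ft))


for season, coef in DISTANCE_COEFS.items():
    # 50 ft vs. 10 ft is an arbitrary illustrative comparison.
    print(season, round(relative_odds(coef, 50, 10), 3))
```

With these values, the 50-foot shot's odds are roughly 0.17 times those of the 10-foot shot in 2007-08, versus roughly 0.19 times in 2015-16.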

Angle Coefficient

We also found that each one-unit increase in shot angle slightly decreases the log-odds of a goal, by 0.013 (p-value < .001). There was also a -0.20 correlation between shot angle and expected goals last season.

The time series shows an increasing trend in the angle coefficient since 2007-08: the coefficient moved from -0.016 to -0.015 before jumping to -0.013 in 2015-16. The large increase for 2015-16 is concerning, since that season's angle coefficient falls outside the 95% confidence intervals of all previous seasons except the lockout-shortened 2012-13 season. This graph provides some potential evidence of a systematic shift in recent seasons in what affects the likelihood of an unblocked shot attempt resulting in a goal. At the very least, caution should be used when applying xG models from previous seasons to obtain xG statistics for the current season.
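The comparison used above, checking whether one season's coefficient falls outside the 95% confidence interval around another season's estimate, can be sketched as follows. A Wald test on the difference of two independently estimated coefficients is included as an alternative check; it is not the method used in this article. The standard errors are hypothetical, since the article does not report them, and -0.015 stands in for the previous season's angle coefficient.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def confidence_interval(coef, se, z=Z_95):
    """Wald 95% confidence interval for a logistic regression coefficient."""
    return coef - z * se, coef + z * se


def outside_interval(coef, ref_coef, ref_se):
    """True if coef falls outside the 95% CI around a reference season's estimate."""
    low, high = confidence_interval(ref_coef, ref_se)
    return coef < low or coef > high


def difference_z(coef_a, se_a, coef_b, se_b):
    """Alternative check: Wald Z for the difference of two independent estimates."""
    return (coef_a - coef_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def two_sided_p(z):
    """Two-sided normal p-value, 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Angle coefficients; both standard errors are HYPOTHETICAL (not reported in the article).
prev_coef, prev_se = -0.015, 0.0008
curr_coef, curr_se = -0.013, 0.0008

print(outside_interval(curr_coef, prev_coef, prev_se))
z = difference_z(curr_coef, curr_se, prev_coef, prev_se)
print(round(z, 2), round(two_sided_p(z), 3))
```

With these illustrative numbers the two checks disagree: -0.013 lies outside the interval around -0.015, yet the difference test gives p ≈ 0.08, because the interval check treats the 2015-16 estimate as if it had no uncertainty of its own.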

Rebound Coefficient

The rebound coefficient was 0.934 last season (p-value < .001). Because rebound is a binary (indicator) variable, its coefficient has a slightly different interpretation than the distance and angle coefficients: in 2015-16, the log-odds of a goal (vs. a non-goal / save) were 0.934 higher for a rebound shot than for a non-rebound shot, holding the other predictors fixed.[4]
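For an indicator like rebound, exponentiating the coefficient gives an odds ratio. The sketch below (helper names hypothetical) checks that this ratio is the same whatever the rest of the linear predictor is, which is what "holding the other predictors fixed" means; the baseline log-odds values are arbitrary.

```python
import math

REBOUND_COEF = 0.934  # 2015-16 estimate reported above


def odds(log_odds):
    """Convert log-odds to odds."""
    return math.exp(log_odds)


def indicator_odds_ratio(coef):
    """Odds ratio implied by a 0/1 indicator's logistic regression coefficient."""
    return math.exp(coef)


# Arbitrary baselines standing in for different distance/angle/rush combinations.
for baseline in (-3.0, -2.0, -1.0):
    ratio = odds(baseline + REBOUND_COEF) / odds(baseline)
    print(baseline, round(ratio, 2))

print(round(indicator_odds_ratio(REBOUND_COEF), 2))
```

Every line reports an odds ratio of about 2.54, the value given in footnote [4].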

This graph is particularly concerning. The time series plot shows a significant drop in the rebound coefficient, especially in recent seasons: the coefficient peaked in 2010-11 (1.27) before steadily falling to 0.934 in 2015-16. This decreasing trend might suggest that rebound chances are having less influence on goal probability; it is also possible that goalies have been stopping more rebound shots in recent seasons. The coefficients for several seasons (e.g., 2011-12, 2013-14, 2015-16) fall outside the 95% confidence interval around the previous season's coefficient. These significant season-to-season differences in the rebound coefficient provide some evidence that it may be inappropriate to use xG models trained on previous seasons to obtain xG statistics for subsequent seasons.
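To illustrate why this drift matters when a model is reused, the sketch below scores the same shot twice: once with the 2015-16 slopes reported here, and once with the 2010-11 peak rebound coefficient (1.27) swapped in, as if an older estimate had been carried forward. The intercept is not reported in this article, so the `-0.5` intercept is hypothetical, as are the shot's distance and angle; only the direction of the difference should be read from this.

```python
import math


def predict_xg(shot, coefs):
    """Goal probability for one unblocked shot under a given coefficient set."""
    log_odds = (coefs["intercept"]
                + coefs["distance"] * shot["distance"]
                + coefs["angle"] * shot["angle"]
                + coefs["rebound"] * shot["rebound"]
                + coefs["rush"] * shot["rush"])
    return 1.0 / (1.0 + math.exp(-log_odds))


# Slopes are the 2015-16 values reported in this article; the intercept is HYPOTHETICAL.
current = {"intercept": -0.5, "distance": -0.041, "angle": -0.013,
           "rebound": 0.934, "rush": 0.839}
# Same coefficients, but carrying forward the 2010-11 rebound estimate.
stale = dict(current, rebound=1.27)

# Made-up shots from the same location: one rebound, one not.
rebound_shot = {"distance": 15, "angle": 20, "rebound": 1, "rush": 0}
plain_shot = dict(rebound_shot, rebound=0)

for label, shot in [("rebound", rebound_shot), ("non-rebound", plain_shot)]:
    print(label, round(predict_xg(shot, stale), 3), round(predict_xg(shot, current), 3))
```

The non-rebound shot gets exactly the same xG under both coefficient sets, so any bias from a stale rebound coefficient lands entirely on rebound shots.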

Rush Coefficient

Rush shots increased the log-odds of a goal by 0.839 compared to non-rush shot attempts in 2015-16 (p-value < .001).[5] Like the rebound coefficient, the rush coefficient showed a high amount of variation, likely because of the smaller sample of rush shots. It increased by 0.04 last season relative to 2014-15, only a moderate change compared to the 0.10 drop between the 2012-13 and 2013-14 seasons or the large increase from 2009-10 to 2010-11. This instability could reflect changes in goaltending, defensive, or offensive performance in rush situations.
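The suggestion that the rush coefficient is noisy because rush shots are relatively rare can be illustrated with a simplified case. If a model contained only a single 0/1 indicator, its logistic coefficient would equal the log odds ratio from a 2-by-2 table of shot type vs. outcome, whose standard error (Woolf's formula) grows as the counts shrink. The counts below are hypothetical and the single-indicator model ignores distance and angle, so this is only a sketch of the sample-size effect.

```python
import math


def log_odds_ratio_and_se(goals_x, non_goals_x, goals_base, non_goals_base):
    """Log odds ratio for a 0/1 shot type vs. the baseline, with Woolf's standard error.

    In a model whose only predictor is this indicator, the logistic
    coefficient equals this log odds ratio.
    """
    lor = math.log((goals_x / non_goals_x) / (goals_base / non_goals_base))
    se = math.sqrt(1 / goals_x + 1 / non_goals_x + 1 / goals_base + 1 / non_goals_base)
    return lor, se


# Hypothetical counts: identical conversion rates, but ten times fewer flagged shots.
common = log_odds_ratio_and_se(1500, 8500, 6000, 90000)
rare = log_odds_ratio_and_se(150, 850, 6000, 90000)
print("common:", [round(v, 3) for v in common])
print("rare:  ", [round(v, 3) for v in rare])
```

The point estimate is identical in both cases, but the standard error grows from about 0.03 to about 0.09 when the flagged shot type is ten times rarer.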

Overall, we expected to find minimal variation in the expected goal coefficients across seasons. On one hand, there was relatively little deviation in the distance and angle coefficients. On the other, the rebound and rush coefficients had an unusually high amount of variation. Some of this variation could be related to the changing pace and style of play, or simply to random variation.

Finally, note that we used a parsimonious xG model here (with no interaction terms). More complicated xG models (for example, those that include interaction terms involving rebound and/or rush shots) may suffer even more significant coefficient instability across seasons than what we found above. We encourage other analysts to perform similar tests on their own xG models and, perhaps more importantly, to be clear about which seasons their models were trained on when publishing xG results.
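As a sketch of what an interaction term does, the helpers below (hypothetical names) add a rebound-by-rush column to a feature row built from the four predictors used in this article. With the interaction, the rebound effect on the log-odds is no longer a single number: it depends on whether the shot also came off the rush. The coefficient values are hypothetical. Each extra term is estimated from a narrower slice of shots (here, the rare rebound-and-rush combination), which is one reason such models may be less stable across seasons.

```python
def design_row(distance, angle, rebound, rush, with_interaction=False):
    """Feature row [intercept, distance, angle, rebound, rush] (+ rebound*rush if requested)."""
    row = [1.0, float(distance), float(angle), float(rebound), float(rush)]
    if with_interaction:
        # Nonzero only for shots that are both rebounds and off the rush.
        row.append(float(rebound) * float(rush))
    return row


def rebound_effect(coefs, rush):
    """Change in log-odds from rebound=0 to rebound=1 at a given rush value (0 or 1)."""
    return coefs["rebound"] + coefs.get("rebound_x_rush", 0.0) * rush


# HYPOTHETICAL coefficients, for illustration only.
additive = {"rebound": 0.9}
interacting = {"rebound": 0.9, "rebound_x_rush": -0.3}

print(rebound_effect(additive, 0), rebound_effect(additive, 1))
print(rebound_effect(interacting, 0), rebound_effect(interacting, 1))
```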

The second part of this series will explore coefficient variability in more depth using bootstrap analysis, which uses resampling to measure the variability of the coefficient estimates from individual seasons. This allows us to approximate the distribution of the distance, angle, rebound, and rush coefficients while accounting for variability within seasons. We can then compare the variability of the bootstrapped coefficient means and identify how these coefficients changed during the last nine NHL seasons.
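The bootstrap idea can be sketched as: resample a season's shots with replacement, refit, and repeat to build up a distribution for the coefficient. To keep this dependency-free and fast, the sketch below uses a rebound-only model, whose maximum-likelihood coefficient is exactly the log odds ratio of the 2-by-2 table; that is a simplification of the four-predictor model, not the procedure planned for Part 2. The season data are synthetic and the helper names are hypothetical.

```python
import math
import random


def rebound_log_odds_ratio(shots):
    """MLE of the rebound coefficient in a rebound-only logistic model.

    shots: list of (rebound, goal) pairs of 0/1 values. The MLE equals the log
    odds ratio of the 2-by-2 table (a simplification of the four-predictor model).
    """
    counts = {(r, g): 0 for r in (0, 1) for g in (0, 1)}
    for rebound, goal in shots:
        counts[(rebound, goal)] += 1
    return math.log((counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)]))


def bootstrap(shots, statistic, n_resamples=300, seed=0):
    """Recompute statistic on resamples drawn with replacement, each the size of shots."""
    rng = random.Random(seed)
    return [statistic(rng.choices(shots, k=len(shots))) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Approximate percentile interval from bootstrap replicates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int((1 - level) / 2 * last)], ordered[int((1 + level) / 2 * last)]


# Synthetic "season": 10% rebounds; assumed goal rates of 17% (rebound) vs. 7% (other).
gen = random.Random(7)
season = []
for _ in range(3000):
    rebound = 1 if gen.random() < 0.1 else 0
    goal = 1 if gen.random() < (0.17 if rebound else 0.07) else 0
    season.append((rebound, goal))

estimate = rebound_log_odds_ratio(season)
replicates = bootstrap(season, rebound_log_odds_ratio)
mean = sum(replicates) / len(replicates)
sd = math.sqrt(sum((v - mean) ** 2 for v in replicates) / (len(replicates) - 1))
print(round(estimate, 3), round(mean, 3), round(sd, 3), percentile_interval(replicates))
```

The spread of the replicates (`sd`) approximates the within-season sampling variability of the coefficient, which is what allows differences between seasons to be judged against it.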




[1] This means that players were expected to score on 6.8% of their shots on average.
[2] We will be using this interpretation for the distance and angle coefficients.
[3] The small p-value indicates that the distance coefficient is significantly different from zero, i.e., distance adds predictive value even after accounting for the other predictors in the model.
[4] Alternatively, one could say that the odds of a goal were multiplied by about 2.54 (e^0.934) for a rebound shot relative to a non-rebound shot.
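[5] Because the model also includes distance, angle, and rebound status, this is the effect of a shot coming off the rush with those factors held fixed; a given rush shot's overall goal probability still depends on them.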
