NFL Expected Points with nflscrapR: Part 2 - Multinomial Logistic Regression

Written by: Ron Yurko (@Stat_Ron)

The Goal of an Expected Points Model

The ultimate purpose of an expected points (EP) model is to provide an estimate for how many points a team is expected to score given the situation.  This allows us, statisticians and football fans, to measure the value of a play in terms of expected points rather than yards (which we already know is flawed).  So… how do we actually model expected points?

Enter the Statistician

As I mentioned in Part 1 of this series, most football analysts and websites don’t explain their methods in any level of statistical detail.  As a Carnegie Mellon Statistics PhD student, I don’t find that acceptable.  To address this problem as a statistician, I first need to understand the data I’m working with.  Using the nflscrapR package, it’s easy to create a play-by-play dataset from NFL games in the 2009-2016 seasons containing the following key information:

  • Possession team: team with the ball on offense (opponent on defense)
  • Down: 4 downs to advance the ball 10 yards (typically)
    • Can convert for new set of downs, else turnover to defense
  • Yards to go: distance in yards to convert first down
  • Yard line: distance in yards away from opponent’s end zone (100-0) - the field position
  • Time remaining: Seconds remaining in the half
    • Each half is 1800 seconds, overtime is 900 seconds¹

Then of course a series of plays makes up a drive, which ends with a change of possession or one of the following types of scoring events:

  • No Score: 0 points, turnover or half/game ends
  • Field Goal: 3 points, kick through opponent’s goal post
  • Touchdown: 7 points², enter opponent’s end zone
  • Safety: 2 points for opponent, tackled in own end zone

For every play, we can then identify the type of the next score that occurs within the current half, whether on the current drive or a future drive, with respect to the possession team³:

  • For: Touchdown (7), Field Goal (3), Safety (2)
  • Against: -Touchdown (-7), -Field Goal (-3), -Safety (-2)
  • No Score
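The labeling described above can be sketched in a few lines of Python. This is only an illustration, not the nflscrapR implementation: the tuple layout, the function name `label_next_score`, and the toy plays are all assumptions made for the example, and it assumes a scoring play is labeled with its own score.

```python
from typing import List, Optional, Tuple

# Hypothetical play record: (possession team, scoring event on this play or None,
# team awarded the points or None). Plays are in order within ONE half.
Play = Tuple[str, Optional[str], Optional[str]]

def label_next_score(plays: List[Play]) -> List[str]:
    """Label each play with the next score in the half, relative to its possession team."""
    labels = [""] * len(plays)
    next_event = None  # (event type, team awarded the points), filled in while walking backwards
    for i in range(len(plays) - 1, -1, -1):
        posteam, event, scoring_team = plays[i]
        if event is not None:
            next_event = (event, scoring_team)
        if next_event is None:
            labels[i] = "No Score"  # the half ends before anyone scores
        else:
            event_type, team = next_event
            # "For" if the possession team gets the points, "Against" (minus sign) otherwise
            labels[i] = event_type if team == posteam else "-" + event_type
    return labels

# Toy half: PIT has the ball twice without scoring, BAL then kicks a field goal,
# and PIT's final play is followed by the end of the half.
plays = [
    ("PIT", None, None),
    ("PIT", None, None),
    ("BAL", "Field Goal", "BAL"),
    ("PIT", None, None),
]
labels = label_next_score(plays)
print(labels)
```

Walking the plays in reverse means each play only needs to remember the most recent score seen so far, which is exactly the next score in game order.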

Modeling the Next Score

From a modeling perspective, our response variable Y is the next score and is limited to only the seven types listed above.  Below we can see the distribution of the next score for all plays from 2009-2016, and can clearly see that this is an offense-driven league.

A common approach to modeling expected points takes a “nearest neighbors” form by identifying similar plays in historical data based on down, yards to go, etc. and then taking the average (like Carter’s initial approach in 1971).  This idea seems straightforward, but what actually defines a similar play?  We shouldn’t only look at plays with the exact same situation due to potentially unique or rare situations.  Do we define similarity using Euclidean distance, which would mean assuming the difference between 1st and 2nd down is the same as the difference between 3rd and 4th?  See the problem?  A nearest neighbors approach requires a lot of assumptions that may not be easy to defend.
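To make the Euclidean distance concern concrete, here is a small sketch. The three-number encoding of a play as (down, yards to go, yard line) is an assumption chosen for illustration; a real nearest neighbors model could scale or weight the variables differently, which is itself one more assumption to defend.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two plays encoded as numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical encoding: (down, yards to go, yard line)
first_down = (1, 10, 50)
second_down = (2, 10, 50)
third_down = (3, 10, 50)
fourth_down = (4, 10, 50)

d_12 = euclidean(first_down, second_down)  # 1st vs 2nd down
d_34 = euclidean(third_down, fourth_down)  # 3rd vs 4th down
d_yard = euclidean(first_down, (1, 10, 51))  # same down, one yard of field position

print(d_12, d_34, d_yard)
```

Under this encoding the jump from 3rd to 4th down counts exactly as much as the jump from 1st to 2nd, and both count the same as a single yard of field position.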

Another simple type of model is linear regression.  With the response Y as the next score, and X describing a play’s situation we can define an intuitive parametric model for expected points providing coefficients describing the relationships for each variable:
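In generic notation, this is the standard linear regression form. The symbols below are assumed for illustration ($p$ situational variables $X_{i1}, \dots, X_{ip}$ for play $i$) and are not necessarily the ones used in the original post:

$$
Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \epsilon_i
$$

Here the expected points for a play would be the linear part, $E[Y_i \mid X_i] = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}$, and each coefficient $\beta_j$ describes how the expected next score changes with variable $j$.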

But given the fact the next score only takes seven possible values, is modeling our response as continuous appropriate?  The first thing we MUST do after fitting a linear regression model is check the residual diagnostics, because we’re assuming the model errors follow the Normal distribution with constant variance, and are independently, identically distributed:
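Writing $\epsilon_i$ for the error of play $i$ (the difference between its observed next score and the value given by the linear model), the usual statement of this assumption, in generic notation, is:

$$
\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)
$$

That is, every error is drawn independently from the same Normal distribution with mean zero and a single variance $\sigma^2$ that does not depend on the situation.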

When we plot the residuals versus fitted values, all we should see is a random scattering of points with no clear trends whatsoever.  So after fitting a linear regression model using the exact same variables as the nflscrapR expected points model, this happens…

THIS IS DISGUSTING!!!  Honestly, I don’t think I would’ve graduated if I turned in a selected linear regression model with diagnostics like this.  This is a no-go and shows how linear regression results in systematic trends for the errors of the different types of scoring events.
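The reason for these systematic trends can be seen with a small simulation. The sketch below uses synthetic data with made-up probabilities and a single predictor (yard line), so it is not the nflscrapR model or its data; it only shows that when the response takes seven values, the residuals must fall on seven parallel lines rather than a random cloud.

```python
import random

random.seed(1)
values = [7, 3, 2, 0, -2, -3, -7]  # the seven next score values

# Synthetic plays: next score drawn with invented weights that favor the
# offense when it is close to the opponent's end zone (small yard line).
x, y = [], []
for _ in range(500):
    yard_line = random.randint(1, 99)
    w = (100 - yard_line) / 100
    weights = [3 * w, 2 * w, 0.1 * w, 1.0, 0.1 * (1 - w), 2 * (1 - w), 3 * (1 - w)]
    x.append(yard_line)
    y.append(random.choices(values, weights=weights)[0])

# Ordinary least squares with one predictor, computed by hand
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# residual = (next score value) - fitted, so every play with the same next
# score lies on one line with slope -1 in the residuals vs. fitted plot.
bands = sorted({round(r + f) for r, f in zip(residuals, fitted)})
print(bands)
```

Since the fitted values never reach the extreme scores in this simulation, every touchdown has a positive residual and every opposing touchdown a negative one, which is the kind of structure a residual plot is meant to expose.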

Multinomial Logistic Regression

Our solution to this problem is to treat modeling expected points properly as a classification problem, and to fit a multinomial logistic regression using the nnet package in R.  This is an extension of logistic regression to more than two classes for the response variable.  The nflscrapR expected points model is actually modeling the probabilities of next scoring event using six logit transformations relative to the No Score event:
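In generic notation, with $X$ the variables describing the play and $\beta_k$ a separate coefficient vector for each scoring event (symbols assumed here for illustration), the six logits take the standard baseline-category form:

$$
\log \frac{P(Y = k \mid X)}{P(Y = \text{No Score} \mid X)} = X \cdot \beta_k,
\qquad k \in \{\text{Touchdown}, \text{Field Goal}, \text{Safety}, -\text{Touchdown}, -\text{Field Goal}, -\text{Safety}\}
$$

Solving these six equations together with the constraint that the seven probabilities sum to one gives

$$
P(Y = k \mid X) = \frac{e^{X \cdot \beta_k}}{1 + \sum_{j} e^{X \cdot \beta_j}},
\qquad
P(Y = \text{No Score} \mid X) = \frac{1}{1 + \sum_{j} e^{X \cdot \beta_j}},
$$

where the sums run over the six scoring events.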

The advantage of this approach is that our model is agnostic to the value associated with each next score type.  If you don’t think a touchdown is worth 7 points, our model actually doesn’t care.  It outputs the probability that a touchdown for the offense is the next score, the probability a field goal for the offense is the next score, and so on for each of the seven possible next score events.  Then to get expected points we simply multiply each event’s probability by its assigned value and add them up; in the case of nflscrapR this means a touchdown is 7, with the other values as listed above:

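  • Next score probabilities: $P(Y = y \mid X)$ for each of the seven next score types $y$
  • Expected Points = $E[Y \mid X] = \sum_{y} y \cdot P(Y = y \mid X)$ = $7 \cdot P(\text{Touchdown}) + 3 \cdot P(\text{Field Goal}) + 2 \cdot P(\text{Safety}) - 2 \cdot P(-\text{Safety}) - 3 \cdot P(-\text{Field Goal}) - 7 \cdot P(-\text{Touchdown})$, with No Score contributing 0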

One could easily then assign their own values for each of the next score events, and we’ll consider adjusting these values later on as well.  By using a multinomial logistic regression we are modeling expected points in an appropriate manner as a classification problem, and retain the interpretable properties of a parametric model.
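As a rough sketch of this last step, the code below turns six linear predictors (one per scoring event, with No Score as the baseline) into seven probabilities and then into expected points. The function names, the linear predictor values, and the alternative touchdown value of 6.97 are all invented for illustration; in practice the linear predictors would come from a fitted multinomial logistic regression.

```python
import math

# Point values used in this post; "-" marks a score against the possession team
NEXT_SCORE_VALUES = {
    "Touchdown": 7, "Field Goal": 3, "Safety": 2, "No Score": 0,
    "-Safety": -2, "-Field Goal": -3, "-Touchdown": -7,
}

def next_score_probs(linear_predictors):
    """Convert six logits (relative to No Score) into seven probabilities."""
    etas = dict(linear_predictors)
    etas["No Score"] = 0.0  # baseline category has a logit of zero by definition
    shift = max(etas.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - shift) for k, v in etas.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def expected_points(probs, values=NEXT_SCORE_VALUES):
    """Probability-weighted sum of the next score values."""
    return sum(p * values[event] for event, p in probs.items())

# Made-up linear predictors for a single play
etas = {
    "Touchdown": 0.9, "Field Goal": 0.4, "Safety": -3.0,
    "-Safety": -4.0, "-Field Goal": -0.5, "-Touchdown": -0.2,
}
probs = next_score_probs(etas)
ep = expected_points(probs)

# Swapping in your own values only changes the final sum, not the probabilities
custom_values = dict(NEXT_SCORE_VALUES, **{"Touchdown": 6.97, "-Touchdown": -6.97})
ep_custom = expected_points(probs, custom_values)
print(round(ep, 3), round(ep_custom, 3))
```

Keeping the probabilities and the point values in separate steps is what makes the model agnostic to the values: the regression is fit once, and any value assignment can be applied afterwards.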

The next post in this series will cover the variables selected in the nflscrapR model, visualizing their relationships with the different types of next scores.

And of course the code to generate the figures in this post is located here.

1 The NFL shortened overtime to 10 minutes (600 seconds) in 2017.

2 We’re using 7 points for simplicity, but ultimately this does not affect the model.

3 Remember that we treat PATs separately.




None of the nflscrapR research would be possible without Max’s incredible work in creating the package and Sam’s advice every step of the way.

Thanks as well to Dr. Joseph Yurko (MIT) and Madeline Marco Scanlon (Carnegie Mellon) for their constant feedback in my ridiculous football endeavors.


Goldner, K. (2017).
Situational success: Evaluating decision-making in football.
In Albert, J., Glickman, M. E., Swartz, T. B., and Koning, R. H.,
editors, Handbook of Statistical Methods and Analyses in Sports,
pages 183–198. CRC Press, Boca Raton, Florida.

Carter, V. and Machol, R. (1971).
Operations research on football.
Operations Research, 19(2):541–544.

Hastie, T., Tibshirani, R., and Friedman, J. (2009).
The Elements of Statistical Learning: Data Mining, Inference,
and Prediction.
Springer, New York, New York.

Pasteur, R. D. and David, J. A. (2017).
Evaluation of quarterbacks and kickers.
In Albert, J., Glickman, M. E., Swartz, T. B., and Koning, R. H.,
editors, Handbook of Statistical Methods and Analyses in Sports,
pages 165–182. CRC Press, Boca Raton, Florida.

Burke, B. Advanced football analytics.
