NFL Expected Points with nflscrapR: Part 1 - An Introduction to Expected Points

Written by: Ron Yurko (@Stat_Ron)

A Tale of Two Runs

In the first game of the 2016 season, the Cleveland Browns lined up on 3rd-and-20 after the Philadelphia Eagles sacked QB Robert Griffin III for a 10-yard loss (this sentence pretty much sums up the 2016 Browns). On the next play, Browns RB Duke Johnson took a draw play up the middle for an 11-yard gain, of course failing to convert the first down, and the Browns once again punted. The Eagles would go on to win 29-10.

Meanwhile, the Pittsburgh Steelers, a much better team than the Browns, were facing a 3rd-and-1 against the Washington Redskins in their first game of the 2016 season. Pittsburgh RB DeAngelo Williams rushed up the middle for a 3-yard gain, converting the first down. The Steelers would go on to score a touchdown, and eventually win the game 38-16.

Now obviously, you don't need me to tell you that the 2016 Pittsburgh Steelers were a lot better than the 2016 Cleveland Browns. But if you only look at yards gained on a play, a traditional football stat, Johnson's 11 yards, which failed to convert the first down, are worth 3.67 times as much as Williams's 3 yards, which converted the first down!

This is a clear example of an important fact... NOT ALL YARDS ARE CREATED EQUAL!

A Brief History Lesson

I'm not introducing anything new with this idea; in fact, it's been around since the ’70s. A great irony of the opposition to applying statistical analysis in football is that former BYU and NFL quarterback Virgil Carter was a pioneer! He studied statistics at BYU, and after the Chicago Bears drafted him, the front office funded his MBA, with a quantitative focus, at Northwestern University. In 1971, Carter and Dr. Robert Machol published Operations Research on Football and introduced an expected points model. After dividing the field into 10-yard bins, they calculated the average value of the next scoring event, noting the relationship between field position and the expected points:

The next big step in football analytics arrived in The Hidden Game of Football by Bob Carroll, Pete Palmer, and John Thorn (Palmer and Thorn also wrote the classic book The Hidden Game of Baseball). The authors went above and beyond Carter's initial work, establishing that a play's success is a function of both the down and yards to go. If it's 1st-and-10, a play must gain 4 yards to be successful, because that leads to 2nd-and-6, then another 4 yards leads to 3rd-and-2, and then a final gain picks up the first down. If the play gains fewer than 4 yards, such as 3 yards, then it's 2nd-and-7, which means more yards must be gained on a following play to compensate for the first-down failure (this changes if the team is willing to go for it on 4th down). In addition to the incredible amount of football analysis in the book [1], they also developed a linear expected points model beginning with -2 points on the team's own goal line and increasing by 2 points for every 25 yards, ending at +6 on the opposing goal line.
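The linear model described above is simple enough to write down directly. Here is a minimal Python sketch of it, assuming field position is measured as yards from the team's own goal line (0 to 100); the function name `hidden_game_ep` is my own label for illustration, not something from the book:

```python
def hidden_game_ep(yards_from_own_goal: float) -> float:
    """Linear expected points as described in the text.

    -2 points on the team's own goal line, +2 points for every
    25 yards gained, ending at +6 on the opposing goal line.
    """
    if not 0 <= yards_from_own_goal <= 100:
        raise ValueError("field position must be between 0 and 100 yards")
    # 2 points per 25 yards is a slope of 0.08 points per yard.
    return -2.0 + 2.0 * yards_from_own_goal / 25.0


# Walk down the field in 25-yard steps.
for yard in (0, 25, 50, 75, 100):
    print(yard, hidden_game_ep(yard))
```

Under this model every yard is worth the same 0.08 points regardless of down or distance, which is exactly the limitation the later models address.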

We can see how this compares with Carter’s model:

In terms of an expected points model, the approaches above are pretty simplistic. An expected points model needs to account for all aspects of the situation: down, yards to go, yard line, and, in the case of nflscrapR's model, time remaining (more on that later). Recent developments have been led by Aaron Schatz at Football Outsiders, Keith Goldner at numberFire, and Brian Burke, who is now at ESPN but previously ran his own site, Advanced Football Analytics (apologies to many others not listed) [2]. Their work has led to new insights into the game; however, there is one clear problem...

Reproducible Research and nflscrapR

Because of the reliance on proprietary and costly data sources, such as charting that can also be biased by human judgement, recent work in football analytics is not easily reproducible. I don't need to tell you why reproducible research is important, but the lack of reproducibility has been a major setback in football analytics. Other sports such as baseball and hockey have seen an incredible amount of community research due to growth in publicly available data. An example of this is baseball’s PITCHf/x data, which led to several baseball “outsiders” joining front offices around the league due to their incredible work. Meanwhile, research on Roger Goodell’s football is limited to either the companies able/willing to pay, or those brave souls who chart every little detail themselves (which of course takes a ridiculous amount of time). For a PhD student like myself, this presents a dilemma - what publicly available dataset can I use for football research?

The solution to this problem is nflscrapR, an R package created by Max Horowitz which uses an API maintained by the NFL to scrape, parse, and output clean datasets at the individual play, player, game, and season levels going back to 2009. Under the supervision of Sam Ventura, Max also calculated the expected points and win probability for every play from models built solely using nflscrapR data. His goal was to spark a movement for a larger football analytics community to form around reproducible research. I joined the development of nflscrapR this past year, focusing on improving the expected points and win probability models. This post serves as an introduction to a series of posts on every aspect of the nflscrapR expected points model, to clearly explain what, why, and how we model expected points. Here are key facts that will be explained in future posts:

  • It’s primarily a multinomial logistic regression model generating probabilities for the seven possible types of next scoring events within the same half:
    • Touchdown (7)
    • Field Goal (3)
    • Safety (2)
    • No Score (0)
    • Opponent Safety (-2)
    • Opponent Field Goal (-3)
    • Opponent Touchdown (-7)
  • The model is fit using the following variables and interactions:
    • Yards from opponent’s end zone
    • Down
    • log(yards to go)
    • Indicator for goal down situations
    • Seconds remaining in the current half
    • Indicator for under two-minute warning
    • Interaction between log(yards to go) and down
    • Interaction between yards from opponent’s end zone and down
    • Interaction between log(yards to go) and goal down indicator
  • Observations are weighted by both score differential and the difference in number of drives between the play and the next score.
  • All model decisions were based on Leave-One-Season-Out cross validation and calibration.
  • Field goals, PATs, and kickoffs are treated separately.
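To make the first bullet concrete: once a model produces a probability for each of the seven next-score outcomes, the expected points value is the probability-weighted sum of the point values listed above. The Python sketch below shows only that final step. The probabilities are made up for illustration and are not output from the nflscrapR model, and the names `SCORE_VALUES` and `expected_points` are my own:

```python
# Point values for the seven next-score outcomes, from the possession
# team's perspective (taken from the list above).
SCORE_VALUES = {
    "touchdown": 7,
    "field_goal": 3,
    "safety": 2,
    "no_score": 0,
    "opp_safety": -2,
    "opp_field_goal": -3,
    "opp_touchdown": -7,
}


def expected_points(probs: dict) -> float:
    """Probability-weighted sum of the next-score point values."""
    if set(probs) != set(SCORE_VALUES):
        raise ValueError("need exactly one probability per outcome")
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(SCORE_VALUES[event] * p for event, p in probs.items())


# Hypothetical probabilities, invented purely for illustration.
example_probs = {
    "touchdown": 0.35,
    "field_goal": 0.20,
    "safety": 0.005,
    "no_score": 0.10,
    "opp_safety": 0.005,
    "opp_field_goal": 0.12,
    "opp_touchdown": 0.22,
}

print(round(expected_points(example_probs), 2))  # 1.15
```

The regression itself (the variables, interactions, and weights in the other bullets) is what generates the probabilities; this sketch assumes they are already in hand.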

And you can access our code for the exact same model used in nflscrapR here.

Expected Points Results

Before diving into the various explanation posts, I’ve decided to pull a Tarantino and show you the results of the model in this introduction. Below is a comparison of the nflscrapR model by down with the Carter and Hidden Game of Football models from before:

The importance of down is clear, and it’s interesting to see how the nflscrapR model built on data from 2009-2016 compares to Carter’s estimates from 1971. Beyond the expected points output itself, an advantage of using a multinomial logistic regression model is that we can also view the relationship between field position and the probability of each type of scoring event (with respect to the possession team):

This is personally one of my favorite charts for understanding the importance of field position with regard to down. On 1st down, the team is more likely to score a touchdown than a field goal across the field, with the probabilities diverging closer to the end zone. But this relationship changes as the down increases, causing field goals to be much more likely once it is 4th down.

Now returning to our original run example, how do we use expected points to properly evaluate those two running plays? Using the change in expected points from one play to the next, we can calculate the expected points added (EPA), which provides a point value based on the change in situation for a play. Duke Johnson’s 11-yard rush failing to convert the first down? Only 0.04 EPA. Meanwhile, DeAngelo Williams’s 3-yard carry converting the first down? 0.29 EPA, meaning his 3 yards were over 7 times more valuable than Johnson’s 11 yards. This is ultimately the point of an expected points model: not necessarily to predict the next scoring event optimally, but to provide a baseline for measuring team and player performance appropriately.
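The arithmetic behind that comparison is short enough to sketch in Python. The helper `expected_points_added` is a hypothetical name for the subtraction described above, and the before/after values passed to it are placeholders rather than the model's actual numbers for these plays; only the two EPA figures and the yardage come from this post:

```python
def expected_points_added(ep_before: float, ep_after: float) -> float:
    """EPA is the change in expected points from one play to the next."""
    return ep_after - ep_before


# Placeholder situation values, only to show the subtraction.
print(expected_points_added(ep_before=1.0, ep_after=1.5))  # 0.5

# Figures quoted in the post.
johnson_yards, johnson_epa = 11, 0.04
williams_yards, williams_epa = 3, 0.29

# By raw yards, Johnson's run looks about 3.67 times better...
yards_ratio = round(johnson_yards / williams_yards, 2)
# ...but by EPA, Williams's run is about 7.25 times more valuable.
epa_ratio = round(williams_epa / johnson_epa, 2)

print(yards_ratio, epa_ratio)  # 3.67 7.25
```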

The next post in the series should make Carnegie Mellon Stats professor Rebecca Nugent proud, as I’ll explain why we chose a multinomial logistic regression model and the problems with other approaches. I’ll also eventually post about the various expected points added metrics we’ve calculated with nflscrapR, but in the meantime check out my slides from the Great Lakes Analytics in Sports Conference and access the data files with all the stats here.

And of course the code to generate the figures in this post is located here.

1 It’s really an incredible book but out of print; I had to buy a used copy of the 1998 edition myself.

2 Burke's site in particular has been incredibly helpful for my nflscrapR research with its clear posts and explanations of expected points and win probability.




None of the nflscrapR research would be possible without Max’s incredible work in creating the package and Sam’s advice every step of the way.

Thanks as well to Dr. Joseph Yurko (MIT) and Madeline Marco Scanlon (Carnegie Mellon) for their constant feedback in my ridiculous football endeavors.


Goldner, K. (2017).
Situational success: Evaluating decision-making in football.
In Albert, J., Glickman, M. E., Swartz, T. B., and Koning, R. H.,
editors, Handbook of Statistical Methods and Analyses in Sports,
pages 183–198. CRC Press, Boca Raton, Florida.

Carter, V. and Machol, R. (1971).
Operations research on football.
Operations Research, 19(2):541–544.

Hastie, T., Tibshirani, R., and Friedman, J. (2009).
The Elements of Statistical Learning: Data Mining, Inference,
and Prediction.
Springer, New York, New York.

Pasteur, R. D. and David, J. A. (2017).
Evaluation of quarterbacks and kickers.
In Albert, J., Glickman, M. E., Swartz, T. B., and Koning, R. H.,
editors, Handbook of Statistical Methods and Analyses in Sports,
pages 165–182. CRC Press, Boca Raton, Florida.

Burke, B. Advanced football analytics.