CMU in Sports: Sherri Nichols’ Impact on Baseball Analytics

Written by: Austin Leung

The CMU in Sports Series profiles past and present CMU students, faculty, post-docs, and staff who have made an impact in the world of sports analytics. The first profile in this series features an interview with Sherri Nichols, a former CMU graduate student in Computer Science.

In the early to mid eighties, there was Bill James, there was Pete Palmer and then, before the internet had ever taken the world by storm, there was Sherri Nichols on rec.sport.baseball. Rec.sport.baseball was a Usenet newsgroup, a precursor to internet forums, where discussion of baseball analytics flourished. While the baseball world continued on, unaware of the growing sabermetrics movement, Nichols argued for the significance of walks and discussed the new OPS stat. A breakthrough in MLB would not come for years, most notably with Billy Beane’s Moneyball Oakland Athletics in the early 2000s. Bill James and Pete Palmer were instrumental in paving the way for the analytics revolution that dominates today’s MLB, but not to be lost in the timeline, so was Sherri Nichols. Nothing came easy for her, as she was one of the only women in a field that was almost completely unexplored. And yet, she made lasting contributions in the form of Defensive Average and Retrosheet that have changed the way baseball is understood today.

For as long as she can remember, Nichols has been a baseball fan, having grown up with the Reds. She would play with the boys at recess, but as much as she wanted to, girls weren’t allowed to play Little League in her area. So, she pursued her love for baseball in another way.

Elsewhere, while Nichols was still in high school, aspiring baseball writer Bill James had decided to look at baseball from a different angle than everyone else—with statistics. To James, home runs, strikeouts, and other traditional stats didn’t do the complexity of America’s pastime justice. The first annual edition of The Bill James Baseball Abstract, filled with brand new statistics, was published in 1977. Then in 1984, John Thorn and Pete Palmer’s The Hidden Game of Baseball added yet another layer, arguing for the measurements that actually contributed to winning. The sabermetrics movement had begun to materialize. How much more was there to baseball?

But at the end of the day, it was still the eighties. And the eighties were a different time for analytics. You couldn’t build on the analysis tools of others by downloading packages in R, because, well, R didn’t exist then. Instead, C was the language of choice. And with the Internet still in its infancy, Nichols had to find her own way by working through textbooks. She built upon her affinity for programming and math after conducting data analysis in physics as an undergraduate.

When she stumbled upon rec.sport.baseball while she was studying at CMU, she found like-minded people who wanted to talk about walks and newer stats like OPS and how they affected the game. There, she met and conversed with people like Gary Huckabay, who founded the popular sabermetrics website Baseball Prospectus. As Nichols and others bounced ideas around, she went to sabermetrics conventions and used her skills to conduct her own analysis.

But as sabermetrics grew, there was one pressing problem: how do you analyze in the absence of data? It wasn’t as simple as hopping online and checking Statcast for baseball data; the World Wide Web wasn’t even released to the public until 1991. Nichols recognized this hurdle. And in a way, you could say she became Statcast.

In 1983, when Bill James founded Project Scoresheet to collect play-by-play baseball game data, Nichols became involved in recording the data. Using information from Project Scoresheet, Nichols and Pete DeCoursey, another Project Scoresheet contributor, created their own statistic to evaluate fielding, called Defensive Average (DA). The status quo was to use fielding percentage, but Nichols used the hit location data being collected in Project Scoresheet to evaluate what percentage of balls hit into each player’s area turned into outs. It was one of the first statistics of its kind, splitting the field into zones. It gave fielders credit for getting to balls that were harder to reach and harder to field, rather than simply for avoiding errors. Defensive Average became a precursor to commonly used defensive statistics today such as Ultimate Zone Rating (UZR), which also evaluates fielders based on how they perform on balls hit into their area on the baseball field.
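The idea behind Defensive Average, as described above, can be sketched in a few lines of Python. This is only an illustration of the "outs divided by balls hit into a fielder's area" concept: the zone labels, the `ZONE_OWNER` mapping, and the sample data are hypothetical, and the actual Project Scoresheet zones and the details of the original calculation are not given here.

```python
from collections import defaultdict

# Hypothetical mapping of field zones to the position responsible for them.
# The real Project Scoresheet zone layout is not reproduced here.
ZONE_OWNER = {"6M": "SS", "56": "SS", "4M": "2B", "34": "2B"}


def defensive_average(balls_in_play):
    """Return outs / chances per position.

    balls_in_play: iterable of (zone, is_out) pairs, one per batted ball.
    Balls hit to zones with no assigned owner are ignored.
    """
    chances = defaultdict(int)
    outs = defaultdict(int)
    for zone, is_out in balls_in_play:
        owner = ZONE_OWNER.get(zone)
        if owner is None:
            continue  # zone not assigned to any fielder in this sketch
        chances[owner] += 1
        if is_out:
            outs[owner] += 1
    return {pos: outs[pos] / chances[pos] for pos in chances}


# Made-up sample: three balls into the shortstop's zones, two into the
# second baseman's, and one into an unassigned zone.
sample = [
    ("6M", True),
    ("56", False),
    ("6M", True),
    ("4M", True),
    ("34", False),
    ("XX", True),
]
result = defensive_average(sample)
```

In this made-up sample, the shortstop converts two of three chances and the second baseman one of two. Unlike fielding percentage, a ball that drops for a hit in a fielder's zone counts against him even though no error is charged.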

As Project Scoresheet died down, Nichols became involved in what could be considered its successor, Retrosheet, after meeting its founder, biology professor David Smith. She would serve as the vice-president on Retrosheet’s Board of Directors through 2003. Retrosheet had the goal of retroactively recovering box scores and play-by-play data for the decades of baseball that had been played to that point. The status quo for baseball statistics prior to Project Scoresheet and Retrosheet was the Elias Sports Bureau, which kept official records for MLB, but there was no such resource for the public. Certain statistics might have been available on a team-by-team basis, perhaps through yearbooks or media guides, but there was really no viable comprehensive option.

Compiling these statistics was a multi-step process, in which Nichols took a leadership role. It first relied on obtaining game data by contacting retired sports writers, teams, announcers, and fans, or even buying collections of old score sheets on eBay. Then came the tedious part, as score sheets were mailed out to thousands of volunteers, who would translate each score sheet into a common system and enter it into their personal computers (personal computers only gained popularity in the late 80s). Every score sheet would take 45 minutes to an hour. Finally, the data entries would be sent back on floppy disks. Today, 94.4% of American and National League games since 1901 have play-by-play data on Retrosheet, totaling over 186,000 games. Websites such as Baseball-Reference rely on Retrosheet’s enormous dataset, which was only made possible by the tremendous undertaking of people such as Nichols. Nowadays, people have lived their whole lives without knowing a world where they couldn’t look up whether Mack Wheat had a higher batting average on balls in play against left-handed starters in 1920 (.270 against lefties versus .248 against righties). That’s the magic of Retrosheet.

One of the decisive factors in the end of Project Scoresheet was monetization at the top, even though thousands of unpaid volunteers were involved in making it possible. Retrosheet was founded upon the principle that its data would be given away publicly. There were offers to purchase the data, primarily from game companies, but Retrosheet’s directors recognized that game companies, book writers, and broadcasters would all be able to benefit from freely available data. Sherri Nichols was one of the most vocal supporters of Retrosheet’s accessibility. To her, Retrosheet wasn’t non-profit, it was anti-profit. Retrosheet, and all of Nichols’ work for that matter, has never been monetized. It was never about money for Nichols.

 

It was for the love of baseball.

 

Despite Nichols’ incredible pioneering work for her time, she never really considered working with a major league team. Teams just didn’t see sabermetrics as the future at the time. She recalls a time when the general manager of the Pittsburgh Pirates wanted Nichols and a few others at CMU to use statistics to convince the field manager of an idea. Of course, it wasn’t for pay, and the general manager didn’t really believe in the statistics. They rejected his offer.

Nichols ended up dropping out of CMU and working on the Andrew File System with IBM, and later worked as a software engineer at Adobe. Over time, especially with the birth of her daughter, she made the decision to step back from the world of baseball analytics. Lost in all her accomplishments is that, at the end of the day, she was one of the lone women in a male-dominated field. She had no female trailblazer to look up to because she was that trailblazer. Reflecting on her time in sabermetrics and tech, Nichols expressed frustration with a culture driven by a gender imbalance that still lingers today. There are times when you just get tired of it. You reach a point where it’s just one of the factors that make you think, is it worth it? Despite having a welcoming company in tech, she found that overall, it just wasn’t a world that was friendly towards women. It wore on her. And even to this day, she sees the same issues playing out in controversies like Gamergate. Her message to women: Pick where you go, but watch the culture.

Nowadays, Nichols can be found in the Seattle area, where she does a good deal of community work, including working with the ACLU, serving on the planning commission for the city of Redmond, and working on the community truancy board in her local district. She’s a converted San Francisco Giants fan, having lived in the Bay Area for thirteen years and preferring to follow the National League.

She’s just a fan now, like the rest of us. Long gone are her days of Defensive Average and Retrosheet, but their remnants are still etched into the storied history of baseball. As a kid, Sherri Nichols was told that girls weren’t supposed to play baseball. At CMU, she got her chance at the plate, playing intramural softball. That’s where she discovered sabermetrics, and her opportunity to make an impact on the world of baseball. She hit it out of the park.
