Josh Harrison: Fooled By Randomness?

By Ron Yurko

If you predicted Justin Morneau would win the 2014 NL batting title by four points over Pittsburgh Pirates super-utility man Josh Harrison, then you’re either a genius or crazy. No one predicted a breakout year for Harrison in 2014; he was expected to sit on the Pirates bench and fill in only for injured or resting starters. Instead, Harrison played phenomenally, made the All-Star game, forced the team to bench throwing-error machine Pedro Alvarez, and was named the starting third baseman. J-Hay quickly became a fan favorite in Pittsburgh with his “Charlie Hustle” style and absurdly entertaining rundown abilities. Now, heading into the 2015 season as a starter for the first time, the question is whether Harrison’s 2014 was a fluke.

On the surface, J-Hay’s 2014 slash line (AVG/OBP/SLG) of .315/.347/.490 looks great, and according to Fangraphs, his WAR of 4.9 makes his season look even better. However, a cause for concern, pointed out by anyone with sabermetric knowledge, was Harrison’s .353 BABIP (batting average on balls in play), which was much higher than in any of his prior MLB seasons (.304, .259, and .253 for 2011, 2012, and 2013, respectively). Naturally, this suggests that Harrison enjoyed some batted-ball luck in 2014, finding the gaps more often than in any previous year, and that he should take a step back in 2015. But 2014 was Harrison’s first full season, so it’s not reasonable to immediately write him off as a flash in the pan. To test whether Harrison’s 2014 could have come from ordinary ball-in-play (BIP) variation, I decided to simulate his 2014 season using only his career numbers prior to 2014, generating a null distribution of batting averages (I chose batting average because it’s the stat most commonly cited outside of sabermetrics). I then compare this distribution to his actual 2014 AVG to calculate a p-value, which serves as a ball-in-play significance test.
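Since the argument leans on BABIP, here is a minimal sketch of its standard definition, BABIP = (H - HR) / (AB - K - HR + SF): hits that were not home runs, divided by at-bats that ended with the ball in play, plus sacrifice flies. The counts in the example are invented for illustration and are not Harrison’s.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play

# Hypothetical season line, for illustration only
print(round(babip(h=150, hr=10, ab=500, k=100, sf=5), 3))  # 0.354
```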

Before testing J-Hay’s numbers, I will walk through how my simulation model works using a 2014 league-average player. Rather than using the league batting average as a single probability of a hit, the model is built around the different types of batted balls and their respective batting averages. To start, I calculated the 2014 league-average values for BB% (walk rate = BB/PA), K% (strikeout rate = SO/PA), LD% (line drive rate = LD/BIP), GB% (ground ball rate = GB/BIP), and FB% (fly ball rate = FB/BIP). Table 1 displays the values for this fictional league-average hitter:

Table 1:  2014 League Average Hitter Rates
BB% K% LD% GB% FB%
8.1% 17.8% 21.3% 43.6% 35.1%
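The rate definitions above can be computed from raw counts as sketched below. This assumes balls in play are simply LD + GB + FB; the function name and the counts (chosen so that 1000 PA reproduce the Table 1 rates) are hypothetical.

```python
def rate_stats(pa, bb, so, ld, gb, fb):
    """BB% and K% are per plate appearance; LD%, GB%, FB% are per ball in play.
    Assumption: balls in play = line drives + ground balls + fly balls."""
    bip = ld + gb + fb
    return {
        "BB%": bb / pa,
        "K%": so / pa,
        "LD%": ld / bip,
        "GB%": gb / bip,
        "FB%": fb / bip,
    }

# Hypothetical counts for 1000 PA of a league-average hitter
rates = rate_stats(pa=1000, bb=81, so=178, ld=158, gb=323, fb=260)
print({name: round(value, 3) for name, value in rates.items()})
# {'BB%': 0.081, 'K%': 0.178, 'LD%': 0.213, 'GB%': 0.436, 'FB%': 0.351}
```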

Along with these values, I used the 2014 batting averages for the three types of batted balls according to Fangraphs: .239 for ground balls, .685 for line drives, and .207 for fly balls. Finally, for simplicity’s sake, I use Josh Harrison’s 550 PA as the number of PA for each simulated season. It’s important to note that I’m ignoring sacrifices, but this should not have much of an impact. The process is as follows:

  1. For each of the 550 PA, choose one of three outcomes:
    1. Strikeout with probability = K% = 17.8%
    2. Walk with probability = BB% = 8.1%
    3. Ball in play with probability = 1 - (K% + BB%) = 74.1%
  2. Mark each of the simulated BIP as one of three outcomes:
    1. Ground ball with probability = GB% = 43.6%
    2. Line drive with probability = LD% = 21.3%
    3. Fly ball with probability = FB% = 35.1%
  3. Use the batted-ball batting averages to simulate the success of a hit:
    1. Probability of a ground ball going for a hit = .239
    2. Probability of a line drive going for a hit = .685
    3. Probability of a fly ball going for a hit = .207
  4. Finally, calculate the simulated batting average by dividing the number of hits by the simulated at-bat total (PA minus simulated walks).
  5. Repeat this process 1000 times to generate a large distribution.
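The five steps above can be sketched in Python as follows. This is a minimal illustration rather than the code behind the figures: the function names, the fixed seed, and the use of the standard `random` module are my assumptions, and besides ignoring sacrifices as the text does, this sketch also does not model hit-by-pitches.

```python
import random

def simulate_avg(pa, k_rate, bb_rate, gb_rate, ld_rate, fb_rate,
                 gb_avg, ld_avg, fb_avg, rng):
    """Simulate one season of `pa` plate appearances and return its AVG."""
    hit_prob = {"GB": gb_avg, "LD": ld_avg, "FB": fb_avg}
    hits = walks = 0
    for _ in range(pa):
        # Step 1: strikeout, walk, or ball in play
        u = rng.random()
        if u < k_rate:
            continue                      # strikeout: an at-bat, no hit
        if u < k_rate + bb_rate:
            walks += 1                    # walk: not an at-bat
            continue
        # Step 2: classify the ball in play by batted-ball type
        kind = rng.choices(("GB", "LD", "FB"),
                           weights=(gb_rate, ld_rate, fb_rate))[0]
        # Step 3: does this batted ball fall for a hit?
        if rng.random() < hit_prob[kind]:
            hits += 1
    # Step 4: AVG = hits / (PA - walks)
    return hits / (pa - walks)

def simulate_seasons(n_sims=1000, seed=2014, **kwargs):
    """Step 5: repeat the season simulation n_sims times."""
    rng = random.Random(seed)
    return [simulate_avg(rng=rng, **kwargs) for _ in range(n_sims)]

# Table 1 rates plus the 2014 batted-ball averages, 550 PA per season
league_avg_hitter = dict(pa=550, k_rate=0.178, bb_rate=0.081,
                         gb_rate=0.436, ld_rate=0.213, fb_rate=0.351,
                         gb_avg=0.239, ld_avg=0.685, fb_avg=0.207)
sims = simulate_seasons(**league_avg_hitter)
print(round(sum(sims) / len(sims), 3))   # should land near .260
```

Swapping in the Table 2 or Table 3 rates runs the Harrison versions the same way; exact results shift slightly with the seed.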

In Figure 1, I display the distribution of the 1000 simulated batting averages for the league-average hitter (in red) versus the distribution of the actual batting averages of the 146 qualified hitters in 2014 (in blue). I plot density rather than frequency because the two samples differ in size (1000 simulations versus 146 hitters).
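Dividing each bin’s count by (sample size × bin width) turns a histogram into a density with total area 1, which is what lets 1000 simulated seasons and 146 real hitters share one set of axes. Plotting libraries usually offer this directly (for example, matplotlib’s `hist(..., density=True)`); the hypothetical helper below just makes the arithmetic explicit.

```python
def density_histogram(values, lo, hi, n_bins):
    """Equal-width histogram on [lo, hi), scaled so its total area is 1."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against float rounding pushing v past the last bin
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside [lo, hi)")
    return [c / (total * width) for c in counts], width

# Two samples of different sizes end up on the same density scale
small, w = density_histogram([0.24, 0.30], 0.20, 0.35, 15)
large, _ = density_histogram([0.25, 0.26, 0.26, 0.27, 0.28, 0.29], 0.20, 0.35, 15)
```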

Figure 1: Simulated Batting Average Distribution for League Average Hitter

The point of Figure 1 is to show that while most simulated seasons for the league-average hitter land around .260, the same hitter could hit .290 or .230 from batted-ball luck alone. That gap could mean a big difference in pay in arbitration, which is why it is so important to look at a player’s BABIP before declaring him a “true” .300 hitter. Compared with the blue curve, the red curve is narrower: it represents a single league-average hitter, so it has lower density in the tails than the actual 2014 batting averages, which also reflect genuine differences in talent between hitters.
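As a rough cross-check on that spread, the simulation’s center and width can be approximated without simulating at all. This sketch is my own back-of-envelope approximation, not part of the original method: it fixes at-bats at their expected value and treats each at-bat as an independent trial with a constant hit probability (a binomial model).

```python
import math

def approx_avg(pa, k_rate, bb_rate, gb_rate, ld_rate, fb_rate,
               gb_avg, ld_avg, fb_avg):
    """Approximate mean and SD of a season's AVG under a binomial model."""
    hit_per_bip = gb_rate * gb_avg + ld_rate * ld_avg + fb_rate * fb_avg
    bip_per_ab = (1 - k_rate - bb_rate) / (1 - bb_rate)  # walks are not at-bats
    p = bip_per_ab * hit_per_bip                        # chance an AB is a hit
    expected_ab = pa * (1 - bb_rate)
    sd = math.sqrt(p * (1 - p) / expected_ab)           # binomial standard error
    return p, sd

mean, sd = approx_avg(550, 0.178, 0.081, 0.436, 0.213, 0.351,
                      0.239, 0.685, 0.207)
print(f"{mean:.3f} +/- {sd:.3f}")  # 0.260 +/- 0.020
```

Under this approximation, .230 and .290 each sit about 1.5 standard deviations from .260, which is consistent with the text’s point that such seasons are well within reach of luck alone.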

Now, using the same logic, I simulate Josh Harrison’s 2014 season 1000 times using his total MLB career rates prior to 2014, displayed in Table 2. J-Hay had a total of 575 PA from 2011 to 2013, essentially a full season of playing time stretched over three years.

Table 2:  Josh Harrison’s Career MLB Rates (2011-2013)
BB% K% LD% GB% FB%
2.61% 12.3% 19.87% 41.28% 38.85%

Figure 2: Simulated Batting Average Distribution for Josh Harrison

The red dashed line marks J-Hay’s actual 2014 batting average of .315, which turns out to be significantly higher than these simulations based on his prior MLB career numbers would predict. Only 2.3% of the simulated batting averages were higher, confirming the obvious: Josh Harrison improved beyond what his MLB career to date would suggest. To shed light on why J-Hay improved so much, I also stored the batted-ball rates from these 1000 simulations; Figures 3 and 4 display histograms of the simulated line drive and ground ball rate distributions, with his actual 2014 rates marked by the red dashed line.
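Both the 2.3% p-value and the batted-ball percentages are one-sided tail shares of a simulated distribution. The sketch below shows both calculations with hypothetical helper names. To stay short, it holds the number of balls in play fixed at roughly 550 × (1 - .123 - .0261) ≈ 468 instead of letting it vary by season as the full simulation does, so its tail shares will not exactly match the percentages quoted for Harrison.

```python
import random

def tail_share(simulated, observed):
    """One-sided empirical p-value: share of simulated values >= observed."""
    return sum(1 for s in simulated if s >= observed) / len(simulated)

def simulate_batted_ball_rates(n_bip, gb_rate, ld_rate, fb_rate,
                               n_sims=1000, seed=0):
    """Classify n_bip batted balls per simulated season; return LD% and GB% lists."""
    rng = random.Random(seed)
    ld_rates, gb_rates = [], []
    for _ in range(n_sims):
        kinds = rng.choices(("GB", "LD", "FB"),
                            weights=(gb_rate, ld_rate, fb_rate), k=n_bip)
        ld_rates.append(kinds.count("LD") / n_bip)
        gb_rates.append(kinds.count("GB") / n_bip)
    return ld_rates, gb_rates

# Harrison's 2011-2013 batted-ball rates (Table 2), ~468 balls in play
ld_sims, gb_sims = simulate_batted_ball_rates(468, gb_rate=0.4128,
                                              ld_rate=0.1987, fb_rate=0.3885)
print(tail_share(ld_sims, 0.240))   # share of simulations with LD% >= 24.0%
print(tail_share(gb_sims, 0.373))   # share of simulations with GB% >= 37.3%
```

Applying the same `tail_share` to 1000 simulated batting averages with an observed value of .315 gives the AVG p-value.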

Figure 3: Simulated Line Drive Rate Distribution for Josh Harrison

Figure 4: Simulated Ground Ball Rate Distribution for Josh Harrison

The key takeaway from these histograms is that Josh Harrison’s line drive and ground ball rates were significantly different from the simulation results. His 2014 line drive rate was a career-best 24%, with only 2.1% of the simulations having a higher rate, and his ground ball rate was 37.3%, with 95.6% of the simulations having a higher ground ball rate. In other words, he hit fewer ground balls and more line drives than his career numbers would predict. Of course, some improvement should have been expected, as J-Hay is entering his prime at 27 years old. Plus, his 2011-2013 career numbers were made up of three partial seasons, so his rate statistics did not necessarily reflect his true value as a hitter. Harrison’s 2014 success was not unprecedented either: consider his 2013 numbers at AAA, where in 296 PA he hit .317/.373/.507 with a .360 BABIP. Unfortunately, I don’t have access to his minor league batted-ball rates, so I can’t compare his minor league performance with simulations. But as another check on whether his 2014 season was due to BABIP luck, I ran the simulation again using his 2014 rates (Table 3) rather than his career rates. The distribution of the simulations is displayed in Figure 5:

Table 3:  Josh Harrison’s 2014 Rates
BB% K% LD% GB% FB%
4.0% 14.7% 24.0% 37.3% 38.7%

Figure 5: Simulated Batting Average Distribution for 2014 Josh Harrison

The simulations using his 2014 rates show an uptick in his expected batting average, but his actual batting average of .315 was still significantly higher, with only 5% of the simulations producing higher values. By comparison, 20% of the simulated seasons had batting averages above .300, so if his 2014 rates are his true rates, a .300 season was well within reasonable chance for J-Hay. Because this was Harrison’s first full season, I’m not going to declare that he experienced tremendous luck in 2014, but going into 2015 my expectation is that Harrison is really a .280-.290 hitter. It would be interesting to see how this simulation method compares with actual performance on a year-to-year basis, but I’ll save that for next time. It’s also extremely important to note that these batted-ball rates are not definitive; as Fangraphs bluntly points out, not all line drives are created equal, and the same goes for ground balls and fly balls. They merely serve as solid starting points; ideally, with access to proprietary data on launch angles and speed off the bat, one could paint a much more accurate picture of a player’s true hitting value, along with any possible variation. One idea I will look into to work around the lack of this data is to compare a player’s actual BABIP with his simulated values based on his batted-ball rates. Then I could approximate a “hard contact” value for a player if, for instance, his BABIP consistently outperforms his simulated values.
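The “hard contact” idea can be sketched as a residual: a hitter’s actual hit rate on batted balls minus the rate his batted-ball mix would imply at league-average outcomes for each type. The function names and the residual itself are a hypothetical illustration of the idea, not an established metric. One caveat: the actual rate should be measured on the same basis as the batted-ball averages, so if those averages count home runs as hits, compare against hits per batted ball with home runs included rather than standard BABIP, which excludes them.

```python
def expected_hit_rate(gb_rate, ld_rate, fb_rate,
                      gb_avg=0.239, ld_avg=0.685, fb_avg=0.207):
    """Hits per batted ball implied by a batted-ball mix, using the 2014
    league averages by batted-ball type as defaults."""
    return gb_rate * gb_avg + ld_rate * ld_avg + fb_rate * fb_avg

def hard_contact_residual(actual_hit_rate, gb_rate, ld_rate, fb_rate):
    """Hypothetical score: how far the actual hit rate on batted balls sits
    above (+) or below (-) what the batted-ball mix alone implies."""
    return actual_hit_rate - expected_hit_rate(gb_rate, ld_rate, fb_rate)

league = expected_hit_rate(0.436, 0.213, 0.351)
print(round(league, 4))  # 0.3228
```

A hitter who posted this residual well above zero season after season would be the kind of “hard contact” outperformer described above.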

Returning to Harrison, the safe bet is to say that he won’t hit .315 again this year but rather around .285, near the peak of the simulated distribution, which is still a pretty good average. By comparison, Baseball Prospectus’ PECOTA projects him to hit .282, along with a drop in his power numbers, something I did not address here. The real problem in projecting him is that 2014 was his only full season, which makes it hard to judge how much to trust last year’s performance. Even though it’s terribly cliché, 2015 will probably be a defining year in J-Hay’s career, proving whether or not his 2014 numbers were sustainable. In my opinion, of Walker, Mercer, and Harrison, I could see Harrison being the first to lose his job to the newly acquired Jung-ho Kang, considering Walker’s prime power numbers and Mercer’s reliability. It’s very easy to root for a player like J-Hay, someone who wasn’t even supposed to be a starter yet, through what appears to be sheer willpower, played himself into the All-Star game and competed for the batting title. To close, I leave you with two spray charts showing all of Josh Harrison’s 2014 line drives (left) and ground balls (right), made with the interactive spray chart tool from Fangraphs (again…). J-Hay sprayed line drives all over the outfield, but his ground balls were heavily concentrated on the left side of the infield, making him an ideal candidate for the infield shift. The shift would cut further into his batting average if every team the Pirates played were as diligent about it as the Pirates or the Astros, but only time will tell…
