Let us discuss the concept of Batting Average on Balls In Play, or BABIP. In simplest terms, a player’s BABIP is his batting average while ignoring strikeouts and home runs. Why ignore those? Because neither play results in a “ball in play” that could possibly be converted to an out by the defense. The goal of BABIP is to describe how difficult it was to get a player out when he put the ball in play.
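For anyone who likes to see the arithmetic, here is a minimal sketch of the calculation in Python. It uses the commonly published form of the formula, (H - HR) / (AB - K - HR + SF), which also counts sacrifice flies as balls in play (a detail not covered above). The function name and the season line in the example are made up for illustration.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    # Balls in play: at-bats that did not end in a strikeout or a home run,
    # plus sacrifice flies (which are balls in play but not at-bats).
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("player has no balls in play")
    # Home runs are hits, but the defense never had a chance at them.
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, not a real player.
example = babip(hits=150, home_runs=25, at_bats=550, strikeouts=120, sac_flies=5)
print(f"{example:.3f}")  # 125 hits on 410 balls in play -> 0.305
```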
Many factors affect a player’s BABIP; a few examples are speed, a propensity to hit the ball hard, and batted-ball profile (think line drives and pop-ups!).
Being fast (think Billy Hamilton!) makes it easier to leg out infield singles, thus increasing a player’s BABIP. Naturally, you’d expect someone like Billy to have a higher BABIP for grounders than an average player; he does. For his career, Billy is batting .302 when he puts the ball on the ground, while the average player in 2016 batted .239 on grounders.
Hitting the ball hard (think Giancarlo Stanton!) will also increase your BABIP. When a ball is hit hard, defenders have less time to react and reach the ball before it falls in for a hit. Hitting the ball hard, more often, will lead to higher batting averages. This should be very intuitive.
There are many other factors that play into BABIP. Obviously, hitting line drives is preferable to hitting pop-ups (think Joey Votto!). Along the same lines, it is nice to spray the ball around enough that a team can’t easily employ a defensive shift (think Jay Bruce!).
In short, everything you do in the batter’s box accounts for your BABIP. Each player, based on all these factors, will have a unique BABIP profile. An average player will have a BABIP around .300. Each of the last 5 seasons saw league-wide BABIPs no lower than .297 and no higher than .300.
For the sake of demonstration, let us look at Joey Votto, since he happens to be a historic example of BABIP prowess. He hits the ball hard regularly, hits gobs of line drives, avoids pop-ups better than just about anyone in history, and usually sprays the ball to all parts of the field. Because of this, and despite the fact that Votto doesn’t possess great speed, he has posted a career BABIP of .359. Said another way, when Votto puts the bat on the ball (and it doesn’t leave the yard), it has turned into a hit 36% of the time. This happens to be tied for 3rd all-time. Like…in all of history. He trails only Ty Cobb and Rogers Hornsby and is tied with Rod Carew (min 4000 PA, no 1800s guys!). Folks…please appreciate the fact that you get to watch an all-time great in the batter’s box on a daily basis.
Switching gears a bit, now that we understand how BABIP is calculated and what affects it, let’s talk about the “L word:” luck. Many productive conversations on player valuation have broken down when the “L word” is invoked.
Since we know an average player with an average batted-ball profile should have a BABIP around .300, we might sometimes say a guy has been unlucky if he maintains an average batted-ball profile and has a BABIP lower than .300. Conversely, we sometimes say a guy has been lucky if he maintains an average batted-ball profile and has a BABIP north of .300.
In these terms, the word luck is used to express the mathematical idea of variance. Inherently, we all understand the concept and accept it without the need for math. We know, as baseball fans, if a guy hits five pop-ups in a row and all five fall in for base hits, the batter’s true talent level is not a 1.000 AVG. Along the same lines, if a batter hits five screaming line drives that all happen to be hit directly towards a defender, we know that batter’s true talent level is not a .000 AVG. The answer lies somewhere in between.
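For those who do want a little math, here is a rough sketch of how much wiggle room variance alone creates. It assumes every ball in play is an independent coin flip with the same chance of becoming a hit, which is a simplification (real batted balls are not all alike), and the function name is just for illustration.

```python
import math


def babip_standard_error(true_babip, balls_in_play):
    """Standard deviation of observed BABIP under a simple binomial model."""
    if balls_in_play <= 0:
        raise ValueError("need at least one ball in play")
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)


# How far a true .300 BABIP hitter might drift over different sample sizes.
for n in (5, 100, 450):
    se = babip_standard_error(0.300, n)
    print(f"{n:>3} balls in play: .300 give or take {se:.3f}")
```

Under these assumptions, one standard deviation is still about .046 after 100 balls in play and about .022 after 450, so a gap of 20 points of BABIP over a full season is well within what chance alone can produce.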
It is very important to understand that batted-ball profiles can vary quite a bit from player to player. It follows, then, that we should expect players’ BABIPs to vary quite a bit as well. For example, it is simultaneously possible for a guy running a .330 BABIP to be unlucky while his teammate running a .295 BABIP is lucky.
So, how do we describe the luck factor for each player, and how do we figure out if that factor is within expectations based on the way that player is hitting the ball?
The answer, of course, is xBABIP, or expected batting average on balls in play. This is a framework used to describe, with the greatest accuracy we can, how often a player’s balls in play should be falling in for hits, based on what the player is actually doing. This isn’t hocus-pocus prediction; this is honest-to-goodness multivariate regression using historical data! Three cheers for historical data!!
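To make the two-teammates example concrete, here is a tiny sketch of the comparison. The xBABIP values are invented purely for illustration, and the helper name is hypothetical; the point is only that "lucky" and "unlucky" are measured against each player's own expected number, not against .300.

```python
def luck_gap(actual_babip, expected_babip):
    """Actual minus expected BABIP; positive suggests good fortune."""
    return round(actual_babip - expected_babip, 3)


# Hypothetical xBABIP values for the two teammates described earlier.
teammate_a = luck_gap(actual_babip=0.330, expected_babip=0.350)
teammate_b = luck_gap(actual_babip=0.295, expected_babip=0.280)

print(teammate_a)  # negative: running below what his batted balls deserve
print(teammate_b)  # positive: running above what his batted balls deserve
```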
Several days ago, a fantastic writer/researcher by the name of Mike Podhorzer released a version of a formula that calculates a player’s xBABIP which now also incorporates defensive shifts. The fact that some players are shifted quite frequently has been a common missing factor in many of these xBABIP equations up to this point.
As usual, these types of equations can be calculated using publicly available data and using FanGraphs split tools. The process is outlined in the article linked above. The inputs for this equation are as follows:
– Speed Score (Spd)
– Hard-Hit Rate (Hard%)
– Line Drive Rate (LD%)
– Fly Ball Rate (FB%)
– Infield Fly Ball Rate (IFFB%)
– Ground Ball Rate (GB%)
– Pull Rate while Shifted (pullGBshift%)
– Not Shifted Balls in Play (NoShBIP)
– Shifted Balls in Play (ShBIP)
– Rate of BIP While Shifted (%BIPSh)
It might seem like a lot, but following the outlined formula in the article, and using a spreadsheet to keep yourself organized, it is quite straightforward.
I decided to go ahead and calculate xBABIP for the eight projected Reds starters with the new system and the old system (which did not incorporate shifts) and then compare the two.
Mostly I did this for fun. And mostly, my idea of fun is viewed as “odd” by many. Regardless, here is a chart to gaze upon!
Looking first at the 2016 Actual BABIP column, one might conclude that Joey Votto and Jose Peraza were very, very lucky! Their BABIPs were way over .300! No way they can sustain those .360+ numbers! Well, what do the peripherals say?
Check out the xBABIP columns; they incorporate all the peripherals we care about. While not quite as high as the actuals, the inflated xBABIPs show both players were doing the things that need to be done to carry an inflated BABIP. That is to say, we shouldn’t expect either player to fully regress to league average this year. (Peraza, given his tiny sample size, however, is still a candidate for huge BABIP regression. Joey Votto is not.)
As you might expect, a player who pulls grounders at a high rate (Scott Schebler) is going to have a big drop between old and new, now that we are incorporating shift data. In 2017, we probably should not expect Schebler to maintain an above-average BABIP unless he changes his batted-ball profile.
Zack Cozart’s big delta is interesting, given that righties are not shifted a ton. I think it likely has something to do with the new equation weighting each event slightly differently.
For those who care, this new model is the first one of this type (i.e., not using granular ball-in-play exit velocity data) to break an r-squared of 0.5. Again, more on that in the article linked above.
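If r-squared is unfamiliar, here is a minimal sketch of one common way to compute it: one minus the ratio of the model's squared misses to the total squared spread in the actual values. The numbers below are toy values invented for illustration, not output from any real xBABIP model.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Squared misses of the model.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared spread of the actual values around their own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Toy values only: four made-up hitters' actual BABIPs and model estimates.
actual_babips = [0.280, 0.300, 0.320, 0.340]
expected_babips = [0.290, 0.300, 0.310, 0.330]
print(f"{r_squared(actual_babips, expected_babips):.2f}")
```

An r-squared of 0.5 would mean the model accounts for about half of the player-to-player variation in BABIP, with the rest left to factors it does not capture and to plain variance.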
Really, there’s nothing that groundbreaking here from an analysis perspective, but I ran these numbers and figured I’d share with everyone! Baseball is upon us!!