In Part 1 of this series, we looked at the weaknesses of using ERA as a statistic to measure pitcher performance. It turns out there are a number of factors that have a substantial impact on a pitcher’s ERA that the pitcher doesn’t control. Let’s take a look at a few of the statistics that improve on ERA.

Isolating the Pitcher’s Contribution

Suppose we pare back things for which we hold the pitcher accountable. Pitchers do have significant control over strikeouts, although catchers play a role with calling pitches and pitch framing. Umpire strike zones matter, too. But pitcher performance plays an overwhelming role in strikeouts. The same is true for walks and hit batters. Let’s give the pitcher credit and blame for those three outcomes.

For the moment, let’s assume home runs are something the pitcher controls. The number of home runs surrendered is unrelated to defense or relief pitchers or sequencing or official scorers, although it does depend on park factors. But let’s assume for now that home runs belong in the bucket of stuff the pitcher controls.

We need a statistic that evaluates pitchers on those outcomes. Just count up home runs, walks, HBP and strikeouts. Those are standard box score stats. Nothing fancy. Figure out a weighting for each that reflects the known data on contribution to runs scored. To help with familiarity, use a formula that puts our stat on the same scale as ERA with 4.00 about average, 5.00 and above lousy, 3.50 good, and below 3.00 outstanding.

What we described is Fielding Independent Pitching (FIP), which is one of a group of similar statistics referred to as ERA Estimators. FIP has been popularized by the site FanGraphs and used as a basis for their WAR calculations.
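
As a rough sketch of the recipe above, here is the commonly published form of the FIP calculation in Python. The weights (13 for home runs, 3 for walks and hit batters, 2 for strikeouts) follow the formula FanGraphs publishes. The function name, the argument names and the default constant of 3.10 are assumptions for illustration: the constant is reset each season so that league FIP matches league ERA, so treat 3.10 as a placeholder. The sample pitcher is hypothetical.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from standard box score counts.

    ip is innings pitched as a true decimal (6 1/3 innings is 6.333,
    not the box score shorthand 6.1).
    constant is an assumed placeholder; the real value changes by season.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    # Home runs hurt the most, walks and hit batters less,
    # and strikeouts count in the pitcher's favor.
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
print(f"{fip(hr=20, bb=50, hbp=5, k=200, ip=200):.2f}")
```

Hits, runs and earned runs never appear as inputs, which is the point of the statistic.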

FIP – Evaluating on Strikeouts, Walks and Home Runs

Fielding Independent Pitching measures how the pitcher actually pitched. The pitcher gave up those home runs and walks. He struck out those batters.

It’s just as important to list the factors that don’t influence FIP. It doesn’t look at the number of runs scored or whether they were earned. It doesn’t include hits in the formula. Remember from yesterday’s post that hits have a huge component of randomness and are affected by batter skill and defense.

FIP does a better job of isolating what the pitcher controls in his performance than ERA does. FIP does not depend on official scorers’ decisions, or a shortstop’s range, or a left fielder’s arm strength, or the effectiveness of relief pitchers, or the sequence of events, or whether soft fly balls fall in as hits.

Research shows a pitcher’s FIP is a better predictor of how many runs he’ll give up in the future than his ERA is. Think of FIP as what a pitcher’s ERA would be assuming average defense, average bullpen and average luck.

A pitcher’s FIP is more stable than ERA from year to year, which is another indication it better reflects actual pitcher talent. If a pitcher has a long enough career, his ERA usually converges to his FIP. Seventy-five percent of pitchers with at least a thousand innings pitched had an ERA within 0.20 of their FIP.

FIP isn’t perfect. It doesn’t account for that small part of batted balls that the pitcher does control. It includes home runs, even though those are influenced by park factors. But if you’re looking for a better measure of pitching performance, it’s a good place to start.

xFIP – Evaluating on Strikeouts, Walks and Fly Balls

Let’s go back to home runs. After years of study, we’ve learned that pitchers surrender one home run for every 10-12 fly balls they allow. That stat is expressed as the ratio HR/FB. For many years, HR/FB remained near 10%. In the past three seasons, the number jumped to around 12.5%.

Pitchers do have a degree of control over the number of fly balls they give up. If, as the data indicates, home runs are a reasonably consistent percentage of fly balls, the number of home runs a pitcher gives up is a function of his fly ball percentage (FB%).

Let’s say we wanted a version of FIP that “normalizes” home runs hit across luck and stadium dimensions. The way to do that would be to remove HR from the equation and replace it with the number of home runs the pitcher would be expected to allow given his fly balls and the league-average HR/FB rate.

That statistic is called xFIP, where the “x” stands for “expected.” FIP counts how many home runs a pitcher gives up. xFIP estimates how many home runs a pitcher should give up assuming average luck and stadium size. It works essentially the same way FIP does. Pitchers control strikeouts, walks, hit batters and fly ball percentage. The formula is scaled to ERA. You can find xFIP at FanGraphs.
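
To make the substitution concrete, here is a minimal Python sketch. In the commonly published form of xFIP, actual home runs are swapped for fly balls allowed multiplied by the league HR/FB rate, and everything else stays as in FIP. The function name, the argument names, the 10% league rate and the 3.10 constant are assumptions for illustration, and the two pitchers are hypothetical.

```python
def xfip(fb, bb, hbp, k, ip, lg_hr_per_fb=0.10, constant=3.10):
    """Expected FIP: like FIP, but with league-average home run luck.

    fb is fly balls allowed; lg_hr_per_fb is the league HR/FB rate.
    Both defaults are assumed placeholders that vary by season.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    # Home runs the pitcher "should" have allowed on his fly balls.
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Two hypothetical pitchers, identical except for fly balls allowed.
fly_ball_pitcher = xfip(fb=260, bb=50, hbp=5, k=200, ip=200)
ground_ball_pitcher = xfip(fb=160, bb=50, hbp=5, k=200, ip=200)
print(fly_ball_pitcher > ground_ball_pitcher)  # True
```

Because actual home runs are not an input, a pitcher who was lucky or unlucky on fly balls gets the same xFIP either way.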

Why is xFIP important?

Over a season, the number of home runs an individual pitcher gives up varies quite a bit and might even diverge from league average over the duration of an entire year. Eventually the pitcher will move back toward league average. But an unusually high or low HR/FB for certain stretches may not be a good indicator of his true talent.

Studies show that xFIP is a better predictor of future pitching than FIP. Both are better than ERA.

SIERA – Adding Back Some Pitcher Skills

Let’s return to that small amount of influence pitchers have on batted balls and try to factor that into an ERA estimator.

Here is what the research shows: Pitchers with greater velocity and more strikeouts also generate more poor contact and more double plays per ground ball. Pitchers with higher walk rates give up more runs than a simple linear relationship would predict. Pitchers with higher ground ball rates have lower out rates than fly ball pitchers.

A formula that takes all of that into account is more complicated than the one for FIP or xFIP. But it is still based on what the pitcher controls.

This statistic is called SIERA, which stands for Skill-Interactive ERA. You can find it at FanGraphs.

SIERA assumes the pitcher has average luck, defense, sequencing, park factors and home runs. It incorporates strikeouts, walks, HBP and FB% as things under the pitcher’s control. What SIERA adds to xFIP is an attempt to model the small fraction of batted balls that the pitcher can influence.

Studies show that SIERA is a better predictor of future pitching than xFIP, FIP and ERA.

DRA

In 2015, the folks at Baseball Prospectus (a historic and tremendous baseball site) introduced their own stylized pitching statistic. It’s called Deserved Runs Average (DRA). DRA is a “mixed model” because like ERA it weights all batting events, including hits, but normalizes ERA in many, many ways. DRA controls for the stadium, temperature, quality of opposing batter, pitching on the road, defense, pitch count, catcher framing, umpire strike zone, number of runners on base, number of outs, base runner speed and more. It’s also scaled the same as ERA. 

Statcast “Expected” Stats

MLB’s Trackman system now gives us batted ball data, such as exit velocity and launch angle, for every play. Using that, it’s possible to develop new measures of how the pitcher performed. MLB’s Statcast Search page contains several new statistics that look at every hit ball a pitcher gives up.

Based on exit velocity and launch angle, it’s possible to formulate an expectation for how many hits and extra-base hits the pitcher should have given up. Examples include expected batting average (xBA), expected slugging percentage (xSLG) and expected weighted on-base average (xwOBA). You can find them at the Baseball Savant website operated by MLB. They evaluate pitchers but are scaled to hitting stats, not ERA.

These new stats share similarities with FIP, xFIP and SIERA in that they assume average defense, bullpens, sequencing and, to a certain degree, luck.

But there is an important difference between the ERA Estimators and the new Statcast Expected Stats. Expected Stats give the pitcher 100% credit for the batted ball profile he surrendered. If a pitcher gives up more hits with a powerful angle-velocity combination, Expected Stats attribute it entirely to pitcher performance. But we know pitchers control far less of the variance in batted-ball profiles than that.

The Statcast Expected Stats give additional insight into actual pitcher performance and eliminate much of the noise that makes ERA an unreliable measure. But assuming that pitchers have complete control over batted balls will lead you down a questionable path.

“Minus” Stats

It is possible to adjust certain statistics for park effects. The convention among baseball statisticians is to put a minus sign at the end and scale the statistics so that 100 is league average. You can find ERA-, FIP- and xFIP-. Each point below 100 means the pitcher is one percent better than average. For example, a pitcher with a FIP- of 90 is 10 percent better than average, taking into account ballpark.
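
The scaling can be sketched in a few lines of Python. This is a simplified illustration, not the exact published method: the park adjustment below is one simple form, and the function name, the argument names and the convention that a park factor of 100 means a neutral park are assumptions. The sample numbers are hypothetical.

```python
def minus_stat(stat, league_stat, park_factor=100):
    """Scale an ERA-type stat so 100 is league average and lower is better.

    park_factor is assumed to be on a scale where 100 is neutral and
    values above 100 mark a hitter-friendly park.
    """
    if league_stat <= 0:
        raise ValueError("league_stat must be positive")
    # Give the pitcher credit for a hitter-friendly home park
    # (and the reverse for a pitcher-friendly one).
    park_adjusted = stat + (stat - stat * park_factor / 100)
    return 100 * park_adjusted / league_stat


# Hypothetical pitcher: 3.60 FIP in a 4.00 league, neutral park.
print(round(minus_stat(3.60, 4.00)))  # 90
```

With a hitter-friendly park factor such as 105, the same 3.60 FIP scores lower (better) than 90, since the pitcher did it in tougher conditions.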

About WHIP

The statistic WHIP stands for Walks plus Hits per Inning Pitched. It measures how many base runners a pitcher allows per inning. Because it’s a non-traditional baseball acronym, people often assume WHIP is a new-fangled sabermetric stat when that isn’t the case.
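
For reference, the arithmetic behind WHIP is a single division. The function and argument names in this small Python sketch are assumptions for illustration, and the sample line is hypothetical.

```python
def whip(walks, hits, ip):
    """Walks plus hits per inning pitched.

    ip is innings pitched as a true decimal (6 1/3 innings is 6.333).
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (walks + hits) / ip


# Hypothetical season: 50 walks and 170 hits in 200 innings.
print(whip(walks=50, hits=170, ip=200))  # 1.1
```

Note that hits make up most of the numerator in this example, which matters for how much of WHIP the pitcher actually controls.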

WHIP was a term invented by the guys who came up with the first fantasy baseball league in 1979. So it’s a made-up fantasy baseball stat.

WHIP does offer a certain snapshot of pitcher performance. Walks are an important way to evaluate pitchers. Of course, plenty of other statistics measure walks. The second half of the WHIP equation is Hits. Assigning the number of hits given up to the pitcher is a problematic and inaccurate way to measure the pitcher.

Defense, luck and hitter talent play an overwhelming role in Hits. An analyst looking to mitigate that variance would avoid using WHIP to analyze pitchers in favor of the stats described above. In that sense, WHIP is more of an anti-modern stat than a modern one.

Conclusion

In the first two parts of this series, we’ve looked at ERA, ERA estimators and Statcast Expected Stats as ways to measure pitching.

But there are new, granular ways to evaluate pitchers, many of which are at the cutting edge of thinking and based on brand new technology. Those metrics examine and measure the pitcher’s arsenal, individual pitches and outcomes. We’ll cover them in Part 3.

20 Responses

  1. matthew hendley

    Query: is there a place to find the mathematical formulas for all the various subjects here? Is there in addition a breakdown of the more complex formulas as well?

    • Jordan Barhorst

      Generally, Google can help you out here. Since not all of these statistics are created/maintained by the same group, the formulas and research are scattered among the creators’ separate sites.

      My favorite statistics site, Fangraphs, has an amazing glossary that not only dives into the formula for each of these stats, but also has examples for how to use the statistics to evaluate players, as well as tables that list what an ‘Excellent’, ‘Above Average’, ‘Average’, ‘Below Average’, and ‘Poor’ example of each statistic can be for a given season.

  2. wkuchad

    Televised games started showing a few updated hitting stats other than just batting average. Do you think pitching stats like FIP or SIERA will be shown along with ERA?

    • Steve Mancuso

      That’s a great question. Radio and TV broadcasts have been the slowest media to adopt new statistics. The Reds’ main radio announcer still routinely uses Pitcher Wins as the primary way to evaluate pitchers. To the team’s credit, the Reds do post more hitting stats on the GABP scoreboard including some of the Statcast data. I’ve overheard many fans asking each other questions like “What is OPS?” based on what’s on the scoreboard. It promotes good conversation. I’d like to see the Reds post FIP on the scoreboard. More on this later in this series.

      It was a huge step for FSO to start showing on-base percentage (albeit briefly and only once) on Reds broadcasts. That’s just a baby step. OBP has been known to be important for 20 years. And you can listen to months of the radio broadcast and never hear the statistic on-base percentage even mentioned. Chris Welsh is the one Reds broadcaster who studies new statistics and tries to work them into the discussion.

      The place where the newer stats are breaking through is the written word, like team blogs. Even the beat writers have tried to incorporate a few new stats. In the end, it comes down to education first. The broadcasters, writers and media executives need to learn and understand what’s out there. Fans need to be educated on them, too, so that the profit-driven media won’t shy away from using new stats.

      That’s the purpose of this series at Redleg Nation. The more we learn collectively about the stats, the more we can use them freely in our discussions.

      • Amarillo

        I read at one point that team OPS was the stat with the highest correlation to final standings. From a friend who has a job in an MLB office I am told that wOBA is the primary comparison between hitters used by teams.

      • Steve Mancuso

        OPS is similar to wOBA in that it is a single measure that combines power with batting average and walks. wOBA is a little more sound from a math standpoint. But they both get at the same thing.

  3. greenmtred

    A question: Are walks not also influenced by the hitter’s skill (think Joey Votto)? Ditto for strikeouts, since it seems, intuitively at least, that hitters who are good at not striking out would not strike out very much even against good strikeout pitchers. Thanks for doing this, Steve. Very clear and understandable and recommended reading for–especially–old-school fans like me.

  4. BK

    Really nice article…thanks for putting this together!

  5. WVRedlegs

    Great series so far. Very educational for the masses. Great writing style by not throwing in a lot of opinion and criticizing those that do not think along the same lines. However, good baseball fans need to take it upon themselves to learn these newer stats. Each year I try to take a couple of newer stats and learn more about them to be able to use them half way intelligently. The pitching ones have been the harder ones to adjust to.
    Thanks for the refresher course.
    One question? How is the new radio personality, Tommy Thrall, in using the newer stats? That could usher in a new era of stat usage on the broadcasts in addition to ushering in a new era from Marty.

  6. WVRedlegs

    Now this looks like a regular season lineup today.
    1. Gennett 2B
    2. Votto 1B
    3. Kemp LF
    4. Suarez 3B
    5. Puig RF
    6. Winker DH
    7. Schebler CF
    8. Peraza SS
    9. Barnhart C
    P- Santillan

    Votto looks entrenched at that #2 spot, so far.

    • Drrobo

      Now, if the Reds can avoid being down by 5 after three, it should be an interesting season. While I say that tongue in cheek, the past few years have been frustrating to me as a fan for the game to be over before it started. Can’t imagine how that played on the psyche of the position players.

  7. Drrobo

    As I read the two articles I got dizzy from my spinning head with all the new (to me) acronyms. While it would be easier to stick to the traditional terms I immediately realized the inevitable that I needed to re-educate myself since new age broadcasters are going to be using the new analytical terminology especially since it seemingly demonstrates more accurately a player’s performance. Then I read WV’s comment and it confirmed my thoughts. I have put a glossary of acronyms in the note section of my mobile device for quick access.

  8. Indy Red Man

    In my opinion it’s hard to go wrong with a pitcher’s WHIP. I like to look at batting average against and HRs allowed as well. Of course HRs can be subjective. How many rockets to the wall did Alex Wood allow last year? A towering 375 ft flyball to left-center at night in LA is just another out, but the same swing in GABP would probably be 8 rows back in the cheap seats.

    • Drrobo

      I have asked numerous times through social media if there were stats on the number of first-third row HR’s hit in GABP with no results. Can anyone help? I maintain it will forever be an issue when it comes to signing FA pitchers. Other organizations have restructured their outfields so is it worth considering?

      • Matt WI

        I don’t have specifics, but I’ve seen answers posted around here before that based on how GABP was constructed, moving back the walls is not technically feasible. It’s too bad, it’d be worth it otherwise.

    • Matt WI

      I have always liked WHIP too. I wonder if there are any studies done that would demonstrate its correlation strength with FIP or what have you.

      • Ethan L

        It’s easy to download the stats from FanGraphs and put them into Excel. From there, you can run a quick correlation. I did that with WAR and wRC+, etc. for fun a while back. It was easy and also informative.

  9. scottya

    Current Pitching staff analysis based on the last two season’s sierra: 4.14 starting staff & 3.75 bullpen staff = 3.98 projected staff earned runs given per 9 innings. The pirates gave up 693 runs last season with a 4.00 staff era.

    1. Wood ERA 17% of starter innings (3.74 sierra over last two seasons)
    2. Castillo ERA 17% of starter innings (3.77 “)
    3. Gray 15% of starter innings (4.18 sierra “)
    4. Roark 15% of starter innings (4.35 sierra “)
    5. Desclafani 15% of starter innings (3.96 sierra “)

    6. – 8. 21% of starter innings equally at a (4.72) : Reed (4.29) career sierra, R Stephenson (5.15) career as a sierra, T. Mahle (4.72) career sierra. (last season 23% of our starts came from pitcher 6-10, I used 21%)

    Reliever’s 2 year sierra = 3.75 at 41% of all innings pitched.
    1. Iglesias at 13% of reliever innings (3.24 sierra)
    2. J Hughes 13% of reliever innings (3.56 sierra)
    3. D Hernandez at 13% of reliever innings (3.37 sierra)
    4. M Lorenzen at 13% of reliever innings (4.22 sierra)
    5. A Garrett 10% of reliever innings (3.49 sierra)
    6. M Bowman 10% of reliever innings (3.87 sierra)
    7. Z Duke 10% of reliever innings (4.00 sierra)
    8. S. Romano, C Reed, J Stephens, M Wisler 3.77+ 1.98 + 4.23 + 6.58 (career bullpen) = 4.14

  10. Steve Schoenbaechler

    “Pitchers do have significant control over strikeouts”

    “pitcher performance plays an overwhelming role in strikeouts.”

    First statement, wrong. Second statement, overwhelming may seem a bit strong to me, but I can agree with this. Do the pitchers have control over K’s? Not at all. If they did, they would be K-ing a lot more batters than they do. Does a pitcher’s performance play an overwhelming role in K’s? Yes.

    “let’s assume home runs are something the pitcher controls”

    You’re basing this on controls. But, then, just last part, you talked about the park dimensions, which would definitely play a significant role in HR’s, also. A pitcher can control when a pitch is that 1 inch too high or low, putting it right in the batter’s zone for HR’s? And, a pitcher can control the dimensions of the park?

    “FIP – Evaluating on Strikeouts, Walks and Home Runs” as well as all of the others

    We aren’t going to evaluate when a pitcher gives up singles and doubles? We aren’t going to evaluate when a pitcher keeps the ball out of the batter’s hot zones? We aren’t going to evaluate how a pitcher keeps a batter off balance and guessing? These are too difficult to track? But, these are definitely important aspects to consider when pitching.

    That’s why I said before, there is definitely a correlation between wins and ERA, FIP, SIERA, QMF, ABC, and whatever other statistic that can be developed to measure a pitcher’s success. But, as an average baseball fan, high energy baseball fan, low energy baseball fan, or likewise, what would I rather have, the wins? Or, the ERA? Or, the FIP? Or the SIERA?

    Give me the wins, baby. In the end, those are the only things that matter.

    Don’t get me wrong. Hey, I like Roark, and he comes to us with a losing record most recently. But, he also had a lower ERA than a vast majority of our starters had last year. So, I’m willing to see what he can do here.

    So, what am I getting to? You have to look at the entire gamut of items, including the ones that can’t be sabermetrified. For the haters, I never did say not to look at the sabermetrics. For, I would look at the sabermetrics. At all of them plus additional material, not just the sabermetrics.

    I mean, most basically, if the pitcher is known not to be a very good team player and clubhouse buddy, aka a locker room poison, but he has good sabermetrics, are you really going to look to bring that pitcher into your clubhouse? What’s the sabermetric for being a good clubhouse personality?

    That’s why, for me, “if” I had to choose, give me the wins, baby. In the end, that’s all that matters. If the person wins, odds are drastic they are going to have a low ERA, a low FIP, a low SIERA, a low BABIP, etc. For example, rarely will you find a 20-game winner with an ERA over 5.

  11. Ethan L

    I enjoyed this article a lot. More please. Feel free to do threads like these on more sabermetrics :). This is what makes RLN legit.