
Measuring pitcher performance with FIP, xFIP and SIERA

If you want to measure how many runs a pitcher has given up, ERA is your statistic.

If you want to measure how well a pitcher has actually pitched then FIP, xFIP or SIERA may be better.

And understanding the difference between those ideas is what this post is about.

The most commonly used measure for how a pitcher has performed is Wins and Losses. But those statistics don’t directly measure how the pitcher pitched; rather, they measure how well the team played overall, including defense and hitting, on the day that pitcher pitched.

To isolate the role of the pitcher, you could look at the Total Runs he gave up. Yet Total Runs is still substantially dependent on fielding performance. It’s widely accepted that pitchers shouldn’t be accountable for runs that score because of their teammates’ fielding errors.

A further refinement in isolating the pitcher’s contribution is to rely on the official scorer to determine errors and then judge runs as “earned” or “unearned.” Assigning only earned runs to a pitcher is the basis of the statistic we know as Earned Run Average (ERA). We typically cite ERA and not Total Runs because we want to limit the credit/blame of the pitcher to only what he controls.

While ERA is an improvement over Total Runs, it still depends on many things out of the pitcher’s control and is a relatively crude measure of how well the pitcher has actually pitched. Fielding isn’t just about errors; it’s also about range and arm strength. So ERA depends on the quality of your fielders (and the whims of the official scorer).

ERA also depends on the competency of the bullpen, as many “earned” runs charged to a pitcher are actually inherited runners who score against relief pitchers. For example, Mike Leake was charged with two earned runs when Logan Ondrusek allowed inherited runners to score against the Cardinals. If Sam LeCure comes in instead of Ondrusek, that might significantly affect Leake’s ERA. Bullpen performance varies widely between teams and even from night to night on the same team.

ERA also depends on how lucky or unlucky a pitcher is on balls batted into play. Research has shown that pitchers have little control over what happens to a ball that is put in play.

Finally, ERA depends on the luck of random sequencing. Suppose Pitcher A gives up a single, walk and home run, in that order. He’d have given up three earned runs. Suppose Pitcher B gave up a home run, single then walk. That sequencing would have produced only one earned run. Did Pitcher A perform three times worse than Pitcher B? Because that’s what his ERA would indicate.
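To make the sequencing point concrete, here is a minimal Python sketch that replays the two sequences. The function name `runs_scored` and the baserunning rules are simplifying assumptions made for illustration (no outs are tracked, a single moves every runner up exactly one base, a walk only moves forced runners); it is not an official scoring model.

```python
def runs_scored(events):
    """Count runs produced by a sequence of plate appearances.

    Simplified, hypothetical rules: no outs are tracked, a single moves
    every runner up exactly one base, and a walk only moves forced runners.
    """
    first = second = third = False
    runs = 0
    for event in events:
        if event == "home_run":
            # Batter and every runner on base score; bases are cleared.
            runs += 1 + first + second + third
            first = second = third = False
        elif event == "single":
            runs += third  # a runner on third scores
            third, second, first = second, first, True
        elif event == "walk":
            if first and second and third:
                runs += 1  # bases loaded: a run is forced home
            elif first and second:
                third = True
            elif first:
                second = True
            first = True
        else:
            raise ValueError(f"unknown event: {event}")
    return runs


pitcher_a = ["single", "walk", "home_run"]
pitcher_b = ["home_run", "single", "walk"]

print(runs_scored(pitcher_a))  # 3
print(runs_scored(pitcher_b))  # 1
```

Both pitchers allowed the same three events; only the order changes the run total.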

ERA measures the actual “earned” runs a pitcher is assigned, but it depends on many variables that a pitcher can’t control. Fortunately, lots of smart people have come up with more refined and accurate measures that further isolate how a pitcher has actually performed.

One of these is FIP.

FIP stands for Fielding Independent Pitching. FIP is calculated by counting the home runs, walks and hit batters the pitcher actually allows, along with the strikeouts he records, and plugging those numbers into a formula. A constant term (3.2) is added so the resulting number is scaled to ERA for the sake of familiarity and comparability.
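The post does not spell the formula out, so the sketch below uses the commonly published FIP weights (13 for home runs, 3 for walks and hit batters, 2 for strikeouts) together with the 3.2 constant quoted above. The function name and the sample season line are hypothetical, and in practice the constant is recalculated each season, so treat the output as illustrative.

```python
def fip(hr, bb, hbp, k, ip, constant=3.2):
    """Fielding Independent Pitching from raw counting stats.

    hr, bb, hbp, k: home runs, walks, hit batters, strikeouts.
    ip: innings pitched as a true decimal (6 1/3 innings = 6.333...).
    constant: scaling term; 3.2 is the value quoted in the post.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season line, invented purely for illustration.
print(round(fip(hr=20, bb=50, hbp=5, k=180, ip=200.0), 3))
```

Nothing in the inputs depends on what fielders, relievers or the official scorer did, which is the whole point of the statistic.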

The argument for FIP over ERA is that it better isolates what the pitcher controls — not his shortstop’s range, not the official scorer’s ruling, not the relief pitcher’s ability to throw strikes, not whether bloops fall in for hits etc.

Studies have shown that FIP is a better predictor of a pitcher’s future ERA than the pitcher’s current or past ERAs. That’s an important sentence to process. If you want to measure how many runs the pitcher has already given up, use ERA. If you want to predict how many runs a pitcher will give up in the future, FIP is better.

FIP is just one of the alternatives to ERA.

xFIP (Expected FIP) uses the concept of FIP as a starting point, but normalizes home runs across luck and stadiums. FIP uses a hard count of how many home runs a pitcher actually gives up. xFIP estimates how many a pitcher should give up assuming normal luck. It’s based on the notion that a pitcher only controls how many fly balls he surrenders, and that home runs are a fairly constant percentage of all fly balls (HR/FB) over time. [On the other hand, some pitchers, like Mike Leake, have seemed to be more prone to home runs, even when taking into account how many fly balls they give up.] Studies have shown xFIP to be a better ERA predictor than FIP.

SIERA (skill-interactive ERA) adds more nuance yet to FIP. It accounts for the fact that not all balls in play are the same. For example, ground balls are turned into outs at a higher rate than line drives but at a lower rate than fly balls. It also turns out that pitchers with more strikeouts generally have lower HR/FB, so it’s a refinement on xFIP. Pitchers with more strikeouts also tend to have lower BABIP and more double plays per ground ball. Studies have shown SIERA to be a better predictor of ERA than either FIP or xFIP.
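As a rough sketch of that idea, xFIP can be written as the FIP calculation with actual home runs swapped for fly balls multiplied by a league-average HR/FB rate. The weights are the commonly published ones; the function name, the 0.105 default rate and the sample numbers are assumptions for illustration, not figures from this post.

```python
def xfip(fly_balls, bb, hbp, k, ip, league_hr_per_fb=0.105, constant=3.2):
    """Expected FIP: FIP with home runs replaced by an expected count.

    fly_balls: fly balls allowed by the pitcher.
    league_hr_per_fb: assumed league-average share of fly balls that
        become home runs (0.105 is an illustrative placeholder).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    expected_hr = fly_balls * league_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Two hypothetical pitchers with identical fly balls, walks and strikeouts
# get the same xFIP no matter how many fly balls actually left the park.
print(round(xfip(fly_balls=200, bb=50, hbp=5, k=180, ip=200.0), 3))
```

A pitcher who really is homer-prone gets no penalty here, which is the trade-off noted in the bracketed aside above.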

Of course, you should never look at just one measure of how a pitcher performs. FIP, xFIP and SIERA are just a few of the most common sabermetric attempts to further refine measurement of pitcher performance. If you want to know how a team did, look at Wins, Runs and ERA. If you want to look at how the pitcher, in isolation, performed and will perform, look at FIP, xFIP and SIERA.

Here are the up-to-date 2013 numbers for the Reds’ starting rotation:

[table id=34 /]

 

51 thoughts on “Measuring pitcher performance with FIP, xFIP and SIERA”

  1. Finally, ERA depends on the luck of random sequencing. Suppose Pitcher A gives up a single, walk and home run, in that order. He’d have given up three earned runs. Suppose Pitcher B gave up a home run, single then walk. That sequencing would have produced only one earned run. Did Pitcher A perform three times worse than Pitcher B? Because that’s what his ERA would indicate.

    Why yes, yes he did.

    Certain pitchers, whether you know it or not, have another level that they call upon when they “get into trouble”. What this extra level constitutes, whether it’s pulling out a little bit more velocity, or snapping their wrists a little crisper to get their breaking balls to break more, or simply focusing more intently, is up for debate, but certain pitchers do seem to have it.

    It’s also a measure of a pitcher not getting flustered when things don’t go their way. Good pitchers tend to embody the “When the going gets tough, the tough get going” mentality while other pitchers who may have a fragile psyche fold like a cheap card table at the first sign of adversity.

    So really, it has nothing to do with the luck of the sequence and everything to do with a pitcher’s ability to pitch well under pressure and to bear down instead of breaking down when they get themselves into a mess.

    • @CI3J: I’m not sure why you’re saying one pitcher in my example was more flustered than the other. They each gave up a home run, walk and single. Just in a different order. How does that relate to being flustered or in trouble?

      Put it on a smaller scale, just walk and home run. You think that a pitcher who faces a runner on first can be described as “flustered” or in “trouble”? I don’t.

      Sequence has more to do with the part of the lineup they are facing.

      You really think one pitcher was *three times* worse than the other? It seems kind of absolutist to say it has “nothing to do with luck.”

      • @Steve Mancuso:

        If a pitcher gives up a walk and a single before the homerun, that means he has two men on and would necessarily have to be aware of them on base and how each pitch could potentially end up with him giving up multiple runs, as opposed to pitching with the bases empty where the pitcher knows that, at worst, he can only give up one run. This is added pressure that otherwise would not be there. There is also the fact the pitcher has to change his pitching mechanics with runners on so as to prevent the running game.

        So yes, the order of the sequence does matter. Statistically, sure, they all gave up a walk, a hit, and homerun. But statistics can’t take into account the emotional impact the sequence can have on a pitcher. Say he gives up a walk on a ball that he thought should have been called a strike. Say he gives up the hit on a ball that he meant to place in one location but it got away from him. As the frustration builds, it acts as a distraction that in turn could lead to the pitcher “hanging one” and having it deposited in the stands.

        • @CI3J: I totally understand what you’re saying. And emotional factors like that undoubtedly play a part some of the time. But you have to admit that your narrative certainly doesn’t apply in every single case. Runs are totally, completely determined by the mental state of the pitcher? Random luck obviously plays a part, too.

          Pitchers A and B might have had six-run leads or been six runs behind, so that allowing three runs may not have mattered much or caused stress. Or both pitchers might have been in a tied or one-run game, so that giving up just a solo homer was stressful, like last night in the seventh.

          My point: To say that one pitcher was three times worse than the other exaggerates the difference. They gave up the exact same things.

        • @CI3J: The biggest problem I have with the whole “flustered” doctrine being presented is this: you cannot measure it. We should never look at things that are immeasurable when making decisions, which is the basis for stats like FIP. We can’t accurately measure the impact of the players behind the pitcher on runs allowed, so we limit it to things that only the pitcher can control.

          We can’t measure “flusteredness” or any of that. You have anecdotal evidence, but that doesn’t stand up to any sort of logical, mathematical test.

          Take Steve’s example, but this time apply it to Pitcher A twice, rather than to a Pitcher B. Say he throws Game 1 and Game 7 of the WS and faces the same lineup both times. He gives up a walk then a homer in Game 1 and a homer then a walk in Game 7. Has the pitcher’s skill level changed? No. But ERA would tell you a negative story about him in Game 1 and a positive one in Game 7.

      • @Steve Mancuso: I can understand it a little, Steve. It sort of has to do with the “human factor”. That is something that stats will never be able to consider. That’s why I’ve said before that the only 100% correlator you will find for how many runs a team scores is how many times they cross home plate before the 3rd out when they are at bat. Or like the example I posted down below. I felt I saw it from Leake a lot myself. When the Reds got him an early lead, he normally ended up having a good game. When the Reds got behind early with him pitching, it seemed to get worse for him as the game went on. He rarely seemed to be able to keep the Reds in the game if down by a couple of runs, for even more runs would end up crossing the plate with him pitching. But, stake him to an early lead, you could almost count on him for a good 7+ innings.

    • @CI3J: It is more than just sequencing, I think. The pitcher who gave up the 3-run homer in your example may, for example, not be as effective from the stretch as he is from the wind-up. And a pitcher’s overall effectiveness includes how good he is from the stretch. I’m not really buying the “emotional stress” argument, though.

      But Steve’s post in general was very well written and useful.

      Steve, in this small sample size of 5 starters, the advanced metrics give higher numbers (relative to ERA) to the 2 soft-tossers, Arroyo and Leake. Is there some theory that would explain that, or is it likely just a quirk of these 2 guys?

      By observation, Homer tends to leave about 3 balls a game at thigh high, which get crushed, thereby erasing his other 95 good pitches. I wish there was a Disaster Ball stat that would cover that. Homer also tends to get to an 0-2 count pretty easily, then flounder/nibble to retire the guy in 7 pitches.

      • @Big Ed: Until last night, Homer was actually well below league average in home runs allowed this year. He’s only given up four home runs in GABP this year.

        The answer to your question about Leake and Arroyo is their strikeout rate. The advanced stats love strikeouts because they prevent balls from being put in play where they can become hits.

  2. Nice post, Steve. Well done, especially with the stats to show some comparison. As you may have implied, stats are just that, stats. They never tell why. They can’t tell exactly what will happen; they only serve as a predictor.

    Along those lines, I would like to know how these would compare in another regard, maybe even as an addition to this. This is only a hypothetical example, without any research behind it (even if one doesn’t agree with this, general stats have shown this forever):

    Assuming all competitors are similar, all parks are similar, etc.

    Pitcher #1 could be more all over the runs chart in his games, like 5, 2, 6, 2, 4, 0, 4 and 1 runs given up in 8 straight games, assuming all are complete games. His ERA is 3.00.

    Pitcher #2 could be more consistent, like 3, 3, 2, 3, 4, 4, 2, 3. The ERA is still 3.00.

    But, obviously, there is a difference in their pitching. One seemingly more prone to a “roller coaster ride” (ups and downs, never really knowing what you are going to get), the other seemingly more consistent. Technically, it would be the difference between a high standard deviation and a low standard deviation. As an analogy, it would be the difference between, for example, a running back just as likely to gain just a yard or two and even fumble the ball as to break away for a long run TD, compared to a running back who probably won’t fumble and who you can almost guarantee will get at least 3-4 yards per carry, but who doesn’t possess the breakaway speed to get that long run TD.

    The entire thing I may be getting to is this: would it be incorrect to say all of these are still averages and still don’t say what the pitcher may be prone to do? Like pitcher #1, he was prone to give up anywhere from 0 to 6 runs. But, pitcher #2, you could almost bank he would give up 2-4 runs. Possibly even being a situational thing? Like with the running backs, if it is late in the game, and you are losing by less than a TD, you may be more prone to playing the breakaway back, having a better chance of getting that TD. But, if behind by the same score and, say, earlier in the game, you may still play the more consistent back.
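The two hypothetical run lines in this comment can be checked directly. The sketch below is only an illustration under the comment's own assumptions (eight nine-inning complete games, every run earned); the helper name `era` and the choice of population standard deviation are mine.

```python
from statistics import pstdev

# Runs allowed per game, copied from the two hypothetical pitchers above.
pitcher_1 = [5, 2, 6, 2, 4, 0, 4, 1]
pitcher_2 = [3, 3, 2, 3, 4, 4, 2, 3]


def era(runs_by_game, innings_per_game=9):
    """ERA assuming every game is a complete game and every run is earned."""
    total_innings = innings_per_game * len(runs_by_game)
    return 9 * sum(runs_by_game) / total_innings


for name, runs in (("Pitcher #1", pitcher_1), ("Pitcher #2", pitcher_2)):
    # Same average, different spread.
    print(name, era(runs), round(pstdev(runs), 2))
```

Both lines come out to a 3.00 ERA, while the game-to-game standard deviation is about 1.94 runs for Pitcher #1 and about 0.71 for Pitcher #2. An average like ERA, FIP, xFIP or SIERA does not carry that spread.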

    • @steveschoen:

      This is a very good point. Say you have a guy who throws 3 no-hitters in a season but in other games gives up 5-6 runs and has to be pulled in the 5th, while you have another guy that always makes it to the 8th while giving up 3 or fewer runs.

      Personally, I value consistency in a pitcher. I think it is a fallacy to place too much stock into a pitcher who has a tendency towards feast or famine. Again, this difference in consistency may be due to the feast or famine guy being unable to deal with pressure as soon as things start going bad while the consistent guy just gets the job done. Certainly something like this can’t be all luck, can it?

      • @CI3J: And, that’s why I wonder if any of these stats show any of that.

        And, for baseball, I’m not saying I would choose one or the other for, say, the WC playoff game. I could see the manager selecting your “no-hitter” guy, hoping he will have one of his good games and his team will only need 1-2 runs to win the game. I could see a manager selecting your “8th inning” guy, hoping his offense will respond with enough runs to still win the game. Still, in theory, as I read it, each pitcher could have the same ERA, FIP, xFIP, and SIERA, since I believe these are all averages and don’t tell how “good and bad” (max and min for the stat guys) a single pitcher has been.

  3. I’m not a big sabermetrics guy, and while I have a passing knowledge of some of the better known advanced stats, I do appreciate these primers with additional depth. That said, I often wonder if they detract from appreciation of the game. I was at the Braves-Phillies game last night and enjoyed as always seeing the game played at the highest level. Saw some defensive gems, a couple of minor misjudgments that might have influenced the outcome (I love that baseball is so finely tuned a game that a moment’s hesitation or a little better jump can make all the difference), and some old fashioned hard luck. The Phils hit several rockets right at Braves defenders that defied the BABIP numbers and made outs out of balls that should probably have been hits and might have flipped the outcome. Most of those subtle, beautiful details get lost in the statistical view. Maybe I’m just saying that I hope the number-crunching types still get to enjoy the beauty of a boys’ game played with the utmost skill.

    I’m also fascinated by some of the contradictions that arise when the sabermetrics are juxtaposed with the day to day foibles of our favorite team. It’s the classic large vs. small sample size problem, but it’s still quite interesting. In Steve’s column we learn that a pitcher has no control over batted ball outcomes – so many batted balls become hits, so many fly balls become home runs, and the pitcher’s only skills are in missing bats or in inducing fly balls vs. grounders. An oversimplification I know, but go with it for a moment. Then in the game summary we see that Homer Bailey did not pitch well, allowing 4 runs on 3 home runs in 6 1/3 innings. But the box score tells me that he induced 8 fly ball outs. That means 3 of 11 fly balls left the yard, and I conclude that Homer did not pitch poorly, he was just unlucky! That HR/FB ratio is 2 1/2 times his normal rate and unsustainable. To my eye (oh no, the dreaded eye test), Homer was a bit off, missing spots and perhaps with a bit less movement on his fastball than he often has. So I agree that he did not pitch particularly well – not awful, just not great.

    So here’s my take anyway. I love these advanced metrics as a planning tool: roster construction, lineup planning, in-game strategy (best example = almost never sacrifice bunt), and even comparisons between different players whether considering trades or debating relative merits of all-time greats or two mid-level shortstops. But I love baseball because of the joy, the skill, the drama, the unexpected moment, the dazzling play, the subtle shift, the energy, and the heroics. And I love the Reds because, well, I love the Reds.

    Thanks as always for the food for thought and the place to sit and chew it over…
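One way to put a number on the 3-of-11 fly ball night described in this comment is a simple binomial calculation. This is a sketch under strong assumptions that the comment does not make: each fly ball is treated as an independent trial with a fixed home run chance, and the "normal" rate is backed out of the "2 1/2 times" figure. The function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


fly_balls = 11
home_runs = 3
observed_rate = home_runs / fly_balls

# "2 1/2 times his normal rate" implies a normal HR/FB of observed / 2.5.
assumed_normal_rate = observed_rate / 2.5

chance = prob_at_least(home_runs, fly_balls, assumed_normal_rate)
print(round(assumed_normal_rate, 3), round(chance, 3))
```

Under those assumptions a night like this comes up on the order of one start in ten for a pitcher with that many fly balls, so it is unusual without being freakish. Whether the three pitches were bad luck or bad pitches is exactly what the model cannot say.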

    • @Chris DeBlois: I can understand what you mean Chris. For many, like the math geeks (I will admit I am one and have to struggle many times to hold it back. For, many times when I hear some math geeks start talking stats, I will hear something I believe they either don’t consider enough in depth or too much depth and think I should add my two cents), it can add to their appreciation for the game. Or, it’s how they relate to things. For others, yes, it does detract from the overall appreciation of the game.

      For the record, as far as I am concerned, to each their own. If you want to talk stats, great. If not, great. If you want to bash the Bakerman, fine. If not, fine. Just don’t diss the other posters simply because they are discussing that.

    • @Chris DeBlois: I haven’t met a lot of sabermetricians, but I can say that my experience is that the people who put lots of stock into the advanced stats also love watching baseball. You’d really have to love the sport to take the time to get so involved in all the stats. I’m not saying that you can’t love baseball if you don’t use advanced stats. I’m saying that people who do use advanced stats also love baseball including the idiosyncratic quirks, at least in my experience.

    • @Chris DeBlois: The way I look at advanced stats is what they mostly do is help tease out the contributions of individual players. Most of the old-school stats, like wins/losses, ERA, RBI etc. do a pretty good job of evaluating the overall performance of teams. RBI for example, depend on someone getting on base and someone driving them in — two players.

      Sabermetrics, to me, seems devoted to taking those stats a step further and trying to figure out the value of individual players.

      For example, last night in the first inning, Choo walked, Votto doubled, Phillips singled (RBI) and Bruce grounded to first (RBI). Different stats assign different value to what each player did. If you look at RBI, then Choo and Votto made no contribution to the inning. And Phillips and Bruce made exactly the same.

      Obviously, you’d want statistics (like weighted runs created wRC+) that isolate the fact that without Choo walking, Phillips would have had no RBI. Without Votto doubling, maybe neither RBI occurs. And Phillips, by not making an out, contributed to the possibility of future runs in the inning more than Bruce did, who made an out.

      Nothing wrong with RBI, it just measures team factors a lot and should be thought of that way. Stats like wRC+ isolate the specific performance of the hitters.

  4. The discrepancies between ERA and FIP are highest for Homer (in his favor) and Leake (not in his favor). Has the bullpen really bailed out Leake that often this year (and, in turn, have they done Homer fewer favors)?

    Also, pardon the ignorance, but is WHIP not a respected pitching stat?

    • @Davis Stuns Goliath: The main reason that Bailey and Leake are treated so differently by the advanced stats is their strikeout rate. A second smaller factor is that Leake has been a little luckier this year on balls in play (BABIP).

      WHIP combines H/9 and BB/9 which are relatively commonly cited stats. WHIP has become popularized mostly because it is used so often in fantasy baseball leagues.

      One way to think about H/9 is that it’s partly derivative of strikeouts (more strikeouts, fewer opportunities for balls in play to become hits), LD% (line drive percentage) and luck. Sabermetrics tries to weed out the luck factor and focus more on things like K/9, BB/9 and LD% which are more under the control of the pitcher.

    • @joelie1274: Close call. If I had to decide right now, I’d say Latos. But in the time between now and the end of the season they could pitch in ways that would make me change my mind. Both are pretty good choices, though. Homer has a better track record in the post-season (although extremely small sample).

  5. Food for thought… the Reds could sell Leake at an all-time high this offseason and let him regress for whoever takes him.

  6. I posted this a few days ago in the “Is Homer Bailey An Ace?” article. It’s why I don’t think FIP is all that superior to ERA. I think FIP can be a stat, it may show some things, but I don’t think it’s the king of how a pitcher will do in the future or anything.

    Mike Leake 2013: 10-5, 2.94 ERA, 22 GS, 140.2 IP, 34 BB, 86 K, 15 HR, pitches in GABP.
    Edinson Volquez 2013: 8-9, 5.44 ERA, 22 GS, 132.1 IP, 62 BB, 109 K, 12 HR, pitches in Petco Park.

    Mike Leake 2013 FIP: 4.04
    Edinson Volquez 2013 FIP: 4.03
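To see why two such different stat lines can land on nearly the same FIP, it helps to split the formula into its pieces. The sketch below uses the commonly published FIP weights (13, 3, 2) and only the numbers quoted in this comment. Hit batters are left out because the comment does not list them, and the scaling constant is omitted since it is the same for both pitchers, so the sums will not match the quoted FIP values exactly. Function names are hypothetical.

```python
def innings(ip_notation):
    """Convert box-score innings ('140.2' = 140 and 2/3) to a decimal."""
    whole, _, outs = ip_notation.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def fip_components(hr, bb, k, ip):
    """Per-inning FIP terms, without HBP and without the scaling constant."""
    return {"hr": 13 * hr / ip, "bb": 3 * bb / ip, "k": -2 * k / ip}


leake = fip_components(hr=15, bb=34, k=86, ip=innings("140.2"))
volquez = fip_components(hr=12, bb=62, k=109, ip=innings("132.1"))

for name, parts in (("Leake", leake), ("Volquez", volquez)):
    rounded = {term: round(value, 2) for term, value in parts.items()}
    print(name, rounded, round(sum(parts.values()), 2))
```

Volquez's extra walks cost him more than Leake's, but his strikeouts and lower home run total give most of that back, so the pre-constant totals come out around 0.89 for Leake and 0.94 for Volquez.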

    • @ToddAlmighty: And I guess the other part of my argument is that FIP takes luck and defense into account… but I don’t really understand how. If they can’t properly gauge defense on its own, how do they do it in a complex pitching formula?

      Jay Bruce gets to a ball quickly and launches it on a laser to second base. The runner is forced to settle for a single on what would normally be a double. There’s no fielding stat that credits a single player for that. So how do you properly work in the fielding ability of 8 guys in FIP?

      –Continuing inability to properly measure defense:
      Joey Votto in 2011: 6 errors, .996 Fld%, Gold Glove Winner.
      Joey Votto in 2013: 12 errors, .988 Fld% (so far)

      Joey Votto’s 2011 dWAR: -0.6
      Joey Votto’s 2013 dWAR: 0.0

      So I think FIP is just something that has TOO MUCH that it’s attempting to incorporate and calculate with, when some of those things can’t even be properly calculated on their own.

      • @ToddAlmighty: FIP does not incorporate defense. All it does is count up the number of home runs, walks, strikeouts and hbp a pitcher gives up. That’s it. That’s the list. None of those are affected by defense, unlike ERA, which counts the runs pitchers have given up. Runs are influenced tremendously by defense, as your example proves.

        • @Steve Mancuso: Are there adjustments for which ballpark you pitch in?

          And will high strikeout throwers always be favored in FIP? Because I am failing to see how Volquez could ever be construed to be having a better year this year than Leake or Arroyo.

        • @ToddAlmighty: xFIP normalizes home runs (it assumes that every pitcher, regardless of where they pitch, gives up the same percentage of home runs on fly balls) so it would neutralize park factors. Leake has a bit of a lead over Volquez in xFIP (4.06 vs. 4.18). PETCO moved their fences in this year, so the park differences aren’t as great as they used to be. Volquez has actually given up more homers in PETCO (7) than he has on the road (6).

          I’m not saying that Volquez has had as good of a year as Leake, but many of the stats you cited (W-L, ERA) are awfully noisy when it comes to teammate contributions, as I detail in the post. Volquez has more strikeouts and has given up fewer home runs. Leake has a big edge on walks. And Volquez has been really unlucky on balls in play, with a BABIP of .330 (career .307).

        • @ToddAlmighty: I think strikeouts should be favored in that there is no better way for a pitcher to control his situation than to be able to eliminate the batter at the plate all by himself to get out of a jam. Some guys can do that, and they are more likely to avoid runs being scored because nobody advances on a SF or anything like that. But, there will always be some guys that will be better than what they should be according to the stats FIP measures.

        • @ToddAlmighty: That right there is one reason why you have to take all stats with a grain-of-salt. For instance, I know of one pitcher whose ERA/FIP career was 3.16/3.26 (23 seasons), a control pitcher, and another pitcher whose line read 3.19/2.97 (27 seasons), a power pitcher. It might look like the second one would be more likely to win a game. But, then, their career records: 355-227 to 324-292 respectively. As Steve was alluding to, I believe, FIP, xFIP, and SIERA all lean towards power pitchers being better pitchers. But, like here, obviously not true. The control pitcher was able to produce the same ERA and 31 more wins in 4 fewer seasons.

          Not to say control pitchers are better than power pitchers. For, we can also go into other details, like control pitchers would normally throw more pitches during a game and, thus, throw fewer innings and leave more games early, sending the game to the pen for them to finish, who may not be as good a pitcher. Just saying that you have to look at all stats with a grain-of-salt. One stat will never be the end all cure all. High correlations very possible, something to go by sure, but never 100% correlations since we will always be dealing with the “human factor”.

        • @Steve Mancuso: Touching back on low-K pitchers and FIP:

          Bronson Arroyo 2009-2013:
          2009: 3.84 ERA/4.78 FIP
          2010: 3.88 ERA/4.61 FIP
          2011: 5.07 ERA/5.71 FIP
          2012: 3.74 ERA/4.08 FIP
          2013: 3.51 ERA/4.12 FIP

          The article up top said that FIP was a better predictor of future ERA than ERA was… yet I can’t help but notice that FIP keeps trying to tell me Arroyo will be terrible (and for that HR laden year he was) but four out of the last five years he’s had an ERA under 3.90, which would tell me ERA has been telling me more of what I should expect out of Arroyo than FIP.
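The Arroyo numbers in this comment can be turned into a small check: use each season's ERA, and separately each season's FIP, as the guess for the following season's ERA, then compare the average miss. This is a sketch for one pitcher only, with 2013 still in progress; the function name is hypothetical, and four data points say nothing about pitchers in general.

```python
# Season figures copied from the comment above.
era = {2009: 3.84, 2010: 3.88, 2011: 5.07, 2012: 3.74, 2013: 3.51}
fip = {2009: 4.78, 2010: 4.61, 2011: 5.71, 2012: 4.08, 2013: 4.12}


def mean_abs_error(predictor, actual):
    """Average miss when last season's value predicts this season's ERA."""
    years = sorted(year for year in actual if year - 1 in predictor)
    errors = [abs(predictor[year - 1] - actual[year]) for year in years]
    return sum(errors) / len(errors)


era_miss = mean_abs_error(era, era)
fip_miss = mean_abs_error(fip, era)
print(round(era_miss, 2), round(fip_miss, 2))
```

For Arroyo over these seasons, prior ERA misses by roughly 0.70 runs on average and prior FIP by roughly 0.98, which matches the impression in the comment. The claim in the post is about averages across many pitchers, so a single pitcher who runs the other way does not contradict it.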

        • @ToddAlmighty: There are obviously going to be some outliers. Some pitchers do tend to outperform their peripheral stats (K/9 etc.). Maybe it’s because Bronson is a wily veteran. Cueto is another pitcher who has outperformed his FIP pretty consistently.

          But when they crunch the numbers across all pitchers over a long time, FIP predicts ERA better than past ERA. You can’t think that finding one pitcher who beats his FIP disproves the entire theory?

        • @ToddAlmighty: You posted this while I was typing. Bronson is a guy that gives some stats guys fits because of how he pitches around his peripherals. But he does it all the time. He’s the man, and we’re lucky to have had him.

        • @Steve Mancuso: To Todd, also don’t forget that WAR is a measure of how good you are in relation to a replacement player. The value of the “replacement” player varies from year to year. Also, errors aren’t the only thing contributing to defensive WAR.

        • @prjeter: This was somehow posted in a completely random spot. Supposed to be tied to Todd’s comments about Votto’s dWAR.

  7. Chris makes a point above that Homer was theoretically and allegedly “unlucky” by giving up 3 homers on 11 flyballs.

    By observation, though, Homer threw 3 terrible pitches that got crushed, and were homers even into the teeth of a stiff Wrigley wind blowing in. Homer wasn’t unlucky; he was bad, at least on those pitches.

    • @Big Ed: That’s what puts me more on the side of FIP over xFIP than a lot of the advanced stats people are. As I indicate in the post, if you think some pitchers are more inclined to give up home runs than others (like Mike Leake over his career), then you’d favor FIP.

      Remember, FIP measures the actual home runs given up. It’s xFIP that normalizes all HR/FB. So if you want to blame Homer for those three home runs, FIP is your stat for that.

      (I do think the home run to Navarro was a bit unlucky. It was fair by about a foot.)

    • @Big Ed: The word “luck” gets thrown around a lot and I’m not always sure we understand what it means.

      Is it bad luck that 3 fly balls happened to leave the yard last night? Is it bad luck that Homer happened to throw 3 very hittable pitches in one game? Is it good luck that the hitters were able to make such solid contact? Were there other bad pitches that weren’t hit? Is it luck that on some nights your ace just isn’t feeling it? How many things can he actually control?

      Our brains are programmed to take what we see and simplify it into understandable terms. But I don’t think it’s always that simple.

      • @Aaron Lehr: I agree. This is one of the tricky issues with the advanced stats (or any stats, really). We measure a player’s performance and evaluate his skill and value over the long run. A month, a season, a career. But they play the games one at a time. And we all know that in the variations from game to game, at bat to at bat, there is more going on than just luck (as defined by random variations from expected performance based on cumulative statistics). Is there real luck involved (meaning, true random events)? I think very little. But are the effects of a thousand tiny variations that we can’t possibly measure and yet that have a significant influence on the outcome a lot like luck? Sure. If Votto starts his swing a tiny fraction sooner or later then his double play hot grounder to a drawn in first basement becomes a two run single to the fielder’s left or right. Same if the pitch speed goes up or down by a mile an hour or two. Same if the fielder is positioned a bit differently. And so on. To me that’s the beauty of the game that the stats can’t capture. As I’ve said before the stats are great tools for evaluation and planning, but pretty lousy when it comes to parsing the outcome of one at bat or one game. And that uncertainty in the outcome of any at bat or any game is what keeps us going back to the park or glued to our tv’s.

      • @Aaron Lehr: “Randomness” is probably a more accurate term than “luck.” Phillips, if you remember, got a double and I think 2 RBIs earlier this year on what amounted to a grounder to first base, and Votto got an out when Carlos Gomez pulled one back from over the wall with 2 out in the 9th. And a bullet grounder to the shortstop is a double play, whereas the same ball 6 feet in either direction is a hit, when the hitter (or pitcher) really has no control over that 6 feet. Even the stats don’t necessarily tell you what actually happened.

        I guess teams have to look at a range of stats, trust their scouts’ observations, and make decisions on the whole picture. Latos, meanwhile, pitched his best game as a Red Monday night; loved the pitch efficiency.

    • @Big Ed: I was being a bit sarcastic. My point was that according to the advanced stats (particularly xFIP), Homer would have been considered unlucky.

      I thought Homer was not his sharpest (pretty sure there’s a consensus on that), but wouldn’t really put any of the home runs down to luck. If you watch the tape, the fastball that Schierholtz hit out was right at the glove, but probably without much movement and apparently right in the batter’s hot zone – he got good extension and the fat part of the bat on the ball and it went a long way. Was that a poor location? It was right where Hanigan wanted it. Poor pitch selection? Perhaps. Lack of movement? Perhaps. A batter who guessed right and got the pitch he was hoping for? Maybe. But probably not bad luck. As for Navarro, yes a couple of feet further right and that’s a foul ball, but it was certainly hit really hard so I’d say it would have been good luck if it curved foul, but probably not bad luck that it didn’t, just the natural result of a bomb. And Murphy’s home run just looked like a tiring Homer who couldn’t get one by a decent major league hitter. So my point was that I saw a pitcher who wasn’t particularly sharp, but xFIP would say he was just unlucky. So my stats assessment is as follows: I put some weight in FIP, don’t like xFIP much, would still give Homer the ball for a big game, but was disappointed that Bailey’s stuff wasn’t his sharpest last night and really enjoyed seeing the Reds find a way to win in spite of trying too many sac bunts. And I’m ready for another game at 2:20 today!

    • @Big Ed:

      I’ve made this same argument before as well: How much of it is “luck” and how much of it is “skill” or “failure to perform”? I think sometimes, these stats make the mistake of assuming all pitchers are robots who, all things being equal, have about the same luck and ability to perform in any given situation.

      As they say, when you are good, you make your own luck.

  8. I like advanced stats. And this is a very helpful article–thanks.

    Homer’s got great stuff. But games like last night’s are frustrating. A #1 or # 2 will typically find a way to throttle a team like the Cubs, especially after being given a lead and/or four runs. I hope he keeps developing, and I hope he keeps doing it in a Reds uniform. Assuming both he and Latos continue to improve, and if Cueto can get healthy, that’s a heck of a top 3 in the next couple of years–and that’s not even counting the team ERA leader (Leake), the fireballing lefty with the funky delivery, or the guy who is probably better than all of them (Chapman).

  9. Just commenting on how good this conversation/chat is. Good stuff. There is one more number that I like on the chart, given the quality of pitching being shown: all the numbers under 30 in parens by 4 of the pitchers, five if we include Cueto (27) on the list.

  10. This reminds me of a football stat. QB Rating. It is based on completions, attempts, interceptions, yards and TD passes. The problem is that TD passes are weighted at all.
    If a QB goes 20-for-30 for 300 yards, 3 TDs and no INTs, he might have a rating of 120.
    A second QB throws for exactly the same line but has 0 TDs. His rating would be lower. But maybe he threw 3 passes where the receiver was tackled/fell down at the 1 yard line.

    Does that make him a worse passer?
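To put numbers on the QB comparison above, here is a minimal sketch of the standard NFL passer rating formula, assuming the usual four components each capped between 0 and 2.375. The function name `passer_rating` and its parameter names are just for illustration. Worked through for the two stat lines in the comment, the formula gives about 132.6 with the three TDs (a bit higher than the rough 120 guessed above) and about 99.3 without them.

```python
def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating. Each of the four components is clamped to [0, 2.375]."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(value):
        return max(0.0, min(value, 2.375))

    completion_part = clamp((completions / attempts - 0.3) * 5)
    yardage_part = clamp((yards / attempts - 3) * 0.25)
    touchdown_part = clamp(touchdowns / attempts * 20)
    interception_part = clamp(2.375 - interceptions / attempts * 25)

    total = completion_part + yardage_part + touchdown_part + interception_part
    return total / 6 * 100


# The two stat lines from the comment: identical except for touchdowns.
with_tds = passer_rating(20, 30, 300, 3, 0)
without_tds = passer_rating(20, 30, 300, 0, 0)
print(round(with_tds, 1), round(without_tds, 1))  # 132.6 99.3
```

The whole gap of roughly 33 points comes from the touchdown term, which is the commenter's point: the rating rewards where the play happened to end, not only how well the ball was thrown.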

  11. Thanks for the lengthy descriptions, Steve. However, I do not see any of these stats as significantly more predictive than another.

    I’ll agree that FIP, xFIP, and SIERA are better at saying which pitcher is better, and may begin to describe under- or over-performance, but they are still backward-looking and prone to large errors.

    FIP: takes scorer decisions out of play. Good, but that’s about all it does, as it in essence assumes all pitchers have the same BABIP.

    xFIP: attempts to normalize luck and park factors, but only looks at the parks the pitcher has played in, not the ones he will play in in the future, so how predictive can it really be?

    SIERA: has some of the issues of xFIP but adds some human judgment back in (which we are trying to remove) by using LD%, etc. So now someone has to determine: was that a line drive or a fly ball? And is that determination any easier than hit vs. error?

    I don’t say this as a rant against advanced stats, but to show that enough error is present in them that we should be careful about throwing the word ‘predictive’ around casually.
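For reference on the FIP point above, here is a rough sketch of the commonly published FIP formula. It uses only home runs, walks, hit batters, strikeouts and innings, so balls in play (and the scorer's hit-versus-error calls) never enter the calculation. The function name `fip` is illustrative, the default constant of 3.10 is an assumed typical value (the real constant is recalculated each season to put FIP on the ERA scale), and `innings` is assumed to be in true decimal form (6 1/3 innings entered as 6.333, not the box-score 6.1).

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    """Fielding Independent Pitching from the commonly published formula.

    The constant is an assumed typical value; it varies by season.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    weighted_events = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return weighted_events / innings + constant
```

Because nothing about balls in play appears in the inputs, two pitchers with the same home runs, walks, hit batters, strikeouts and innings get the same FIP no matter how many of their balls in play fell for hits.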

  12. @CI3J: @Steve Mancuso:

    Regarding the two pitchers and order of events.

    One pitcher gave up the HR from the windup and the other from the stretch. Surely, that is a significant change in circumstance that could explain a skill difference?

  13. I am WAY late to the party today and this may be mentioned somewhere above in the comments, but regarding the concern some have about “luck,” a simple exercise to perform is this.

    Take three stats, ERA, FIP, and xFIP. Which of them is better at predicting what a pitcher’s future ERA will be?

    The answer is xFIP followed by FIP followed by ERA. Are the advanced stats perfect? No. But they are better than the old-school stats when it comes to telling us how good a player has been.

    • Are the advanced stats perfect? No. But they are better than the old-school stats when it comes to telling us how good a player has been.

      This is my whole point: stats of any variety are better at looking back than looking forward. ‘Predictive’ is a loaded word and implies a level of confidence that may be an overreach for the reasons mentioned.
