
Walks are Refundable

The purpose of this article is two-fold. First, we’ll take a quick look at how much it costs teams, on average, to let a guy reach Super Two status. For an explanation of Super Two, you can go read this. Second, we’ll take a detailed look at how much teams pay for certain types of statistics during salary arbitration, and we’ll build a model to predict a player’s salary based on our findings.

To start, I’ll describe the sample I’m using. Basically, I pulled the data for every hitter between 2012 and 2015 who went through some sort of salary arbitration (Super Two, Arb1, Arb2, or Arb3). That’s a pretty simple sample, I’d say! This sample included 299 player-seasons from 174 players.

Let’s look at Super Two guys first. In my sample, there are 41 players who entered salary arbitration via Super Two status. Here’s a quick look at the average salary and raise those players received in each year, from their first Super Two season through the end of their arbitration years in the sample. (NOTE: I set all players’ salaries at $500K for their first three (or two) years of team control to save time. Adding the extra $5,000 to $30,000 wasn’t going to materially change the results and it probably saved me an hour of digging!)

[Table: average salary and raise by arbitration year for Super Two players]

I thought it was pretty interesting that each year’s average raise was almost identical, except for the anomalous Arb2 year.  Big raises for Brandon Belt, Lucas Duda, Trevor Plouffe, and Brandon Moss brought that average up quite a bit.

In the total row you can see the average player who reached Super Two status earned about $14.3 million over his 4 arbitration years.

So, how does this compare to non-Super Two guys?  This would likely be beneficial information to have if you were a team deciding whether or not to play service time shenanigans with a guy, or just let him reach Super Two status and help the team early.   Behold:

[Table: average salary by arbitration year, Super Two vs. non-Super Two players]

And now in graphical form!

[Chart: the Super Two vs. non-Super Two comparison in graphical form]

On average, by the end of arbitration, players’ salaries are about the same, but the Super Two player earns about $3.4 million more over a 4-year arbitration period, or about $850,000 per season.
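
The per-season figure is just the total premium spread over the four arbitration years. A minimal sketch of that arithmetic, using the rounded numbers quoted above (the variable names are mine, for illustration only):

```python
# Rounded figures quoted in the article.
SUPER_TWO_PREMIUM = 3_400_000  # extra dollars earned by the average Super Two player
ARBITRATION_YEARS = 4          # a Super Two player goes through arbitration four times

premium_per_season = SUPER_TWO_PREMIUM / ARBITRATION_YEARS
print(f"Premium per season: ${premium_per_season:,.0f}")  # Premium per season: $850,000
```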

With that said, it’s probably a good idea to try and keep your non-superstar players from reaching Super Two status.

(Note: A certain guy who reached Super Two and got the largest raise of the entire sample also signed a long-term deal the next year, making him something of an outlier, so I omitted him from all of the Super Two stuff above. We refer to this guy as “Buster Posey,” who went from league minimum to $8,000,000 as a Super Two player. He won an MVP, I hear.)

So, now to the 2nd part (the meat!) of the article: a look at regular salary arbitration and what sort of statistics teams pay for.

The stats we’ll be looking at are as follows: age, games played, plate appearances, batting average, on-base percentage, slugging percentage, isolated power, home runs, stolen bases, FanGraphs Positional Defensive Runs Above Average (Def), hits, times on base, and times on base excluding home runs. From these stats and the analysis thereof, we’ll attempt to create a predictive model for salary arbitration hearings based on what teams have actually paid guys in the past.

The methodology for this was a bit tricky at first. First, you need to get all the data. This was a bit more time-consuming than I thought. There isn’t a place that has service time, salary data, and all the statistics in a single exportable database. So, every piece of data had to be entered into my own worksheets by hand! (I had this article planned for April and it became far too much work!) Second, you need to get the details of each salary arbitration hearing, and third, you need to find a meaningful way to compare individual seasons to the delta between what a player previously earned and what he was awarded in that year’s salary arbitration.

In order to get an idea of how well certain stats correlate to pay raises, I used a familiar method (to readers of my articles and posts) known as linear regression. This method gives you a value called R-squared [R2] which ranges from 0 to 1. The closer to 1 an R2 value is, the more of the variation in one variable is explained by the other.
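
For anyone who wants to see what goes into that number, here is a minimal Python sketch of a one-variable least-squares fit and its R2. The data points are made up purely for illustration (they are not from the sample), and the function name is my own:

```python
def r_squared(xs, ys):
    """R-squared of a one-variable least-squares line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation (residuals) versus total variation in ys.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustration data (NOT from the article's sample).
home_runs = [5, 10, 15, 20, 25, 30]
raise_millions = [0.4, 1.1, 1.3, 2.2, 2.4, 3.1]

print(round(r_squared(home_runs, raise_millions), 3))
```

A perfectly straight-line relationship returns 1, and a stat with no linear relationship to the raise returns something near 0.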

Visually, we can look at graphs to see what I mean.   Below are a few choice stats with their graphs, showing the relationship between the stat and the raise each player received.

[Charts: raise vs. times on base, home runs, OBP, and defense (Def)]

As you can see, some things correlate well (times on base, home runs) and some things don’t correlate well (OBP and defense).

Something you may have picked up on is that certain stats, OBP for example, don’t take playing time into account. This causes a low R2, because a .400 OBP over a 2-week AAA call-up doesn’t really provide all that much accumulated value, but a .380 OBP over a full season would provide tons of value.
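
To put rough numbers on that, here is a small sketch. The 50 and 650 plate-appearance figures are assumptions chosen for illustration (the article doesn’t give them), and OBP x PA is only an approximation of times on base, since OBP’s real denominator leaves out a few plate-appearance types such as sacrifice bunts.

```python
def approx_times_on_base(obp, plate_appearances):
    """Approximate times on base as OBP x PA (a shortcut, not the exact stat)."""
    return obp * plate_appearances


# Hypothetical playing-time figures, assumed for illustration only.
call_up = approx_times_on_base(0.400, 50)       # short call-up
full_season = approx_times_on_base(0.380, 650)  # everyday player

print(f"Call-up:     about {call_up:.0f} times on base")
print(f"Full season: about {full_season:.0f} times on base")
```

The higher rate stat belongs to the player who reached base far fewer times, which is why a rate stat alone lines up poorly with raises.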

With that in mind, here’s a list of all the variables I looked at, along with their R2 values for a single-variable linear regression.

[Table: R2 of each single-variable regression against raise]

Interesting to me was how important plate appearances are in calculating pay raises. An R2 of nearly 0.6 is very high. Intuitively, this makes sense, because a player who receives a large number of plate appearances is probably doing something right.

Also interesting, and the thing which spawned the title of this article, is that batting average is more predictive of raises than OBP.  This essentially means walks and hit-by-pitches are “free” for a team when negotiating in salary arbitration (given what happened in my sample).

So, now that we have some ideas of what might be important, we need to move on to another method: multivariate linear regression. This lets us use multiple variables to make a model more accurate. This method has a few well-documented issues, namely overfitting (i.e., “kitchen sink regression”), where R2 never decreases when you add more independent variables, even if they are gibberish. Because of this, we use what is known as “Adjusted R-Squared,” which adjusts the R2 value based on how many independent variables were used.
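
For reference, the standard adjustment is 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. A minimal sketch follows; the function name and the example inputs are hypothetical, not figures from this article:

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjusted R-squared: penalizes R2 for each extra predictor."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / dof


# Hypothetical example: 100 observations.
lean = adjusted_r_squared(0.75, 100, 3)           # 3 useful variables
kitchen_sink = adjusted_r_squared(0.76, 100, 30)  # 30 variables, barely higher R2

print(f"3 variables:  {lean:.4f}")
print(f"30 variables: {kitchen_sink:.4f}")
```

In this made-up case the kitchen-sink model has the higher raw R2 but the lower adjusted value, which is exactly the behavior the adjustment is meant to produce.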

My first effort was to look at AVG, OBP, and SLG… the good ol’ triple slash… to see how all of those variables together worked out. These three netted an Adjusted R2 of 0.2364, which is lower than SLG by itself. Basically, this means adding AVG and OBP to SLG did nothing to strengthen this particular model.

Next, I decided to look at only things that involve some sort of playing time component, since those had high individual correlations from above.

I decided to look at getting on base (OBP), power (ISO), stolen bases, defense, and plate appearances.  Those stats should give a nice, well-rounded view of what type of player we’re looking at.  The Adjusted R2 for this venture? 0.6633.  Much better than our first attempt, but still only slightly higher than “times on base” by itself.  Let’s keep trying.

Next, I simply tried using the highest two individual variables (HR and PA) to see where we get.  How about an Adjusted R2 of .7227?  That’s nice!  I think we’re on to something.

Next, I started adding individual variables to the PA-HR-combo to see if I got better results.  After testing everything, the best I could find was the triumvirate of PA, HR, and TOB-HR.  The reason I use TOB-HR is to avoid double counting home runs.  This gave an Adjusted R2 of 0.7500.   Since TOB counts singles, doubles, triples, walks, and HBP the same, I tried adding SLG back into the mix… this actually lowered the Adjusted R2 to 0.7498.  Basically, SLG didn’t explain enough of the remaining variance to justify the noise it created by being included.
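
The add-a-variable-and-compare loop described above can be sketched in plain Python. This is only an illustration under stated assumptions: the data are made up (not the article’s sample), the function names are mine, and the fit uses the textbook normal equations, which are fine for a toy example but less numerically robust than a real statistics package.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares fit of y on the predictor rows.

    Returns (coefficients, adjusted R2); coefficients[0] is the intercept.
    """
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, xi)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    n, p = len(y), k - 1
    return beta, 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up toy data (NOT the article's sample): two predictors and a response.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [3.1, 6.8, 6.2, 11.0, 8.9, 16.1]

_, adj_one = fit_ols([(r[0],) for r in rows], y)
_, adj_two = fit_ols(rows, y)
print(f"adjusted R2, first predictor only: {adj_one:.4f}")
print(f"adjusted R2, both predictors:      {adj_two:.4f}")
```

A variable earns its place only if the adjusted R2 goes up when it is added, which is the test SLG failed above.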

So, here’s a chart to summarize what I just long-windedly typed:

[Table: Adjusted R2 for each combination of variables tested]

There you have it… nearly three-quarters of what determines your arbitration salary raise comes from how often you went to bat, how many home runs you hit, and how often you got on base outside of your home runs.  Definitely not what I was expecting, but hey!

So, from that data we can create a model using the outputs of the multivariate linear regression.  We get the following model:

Salary Raise = ($78,537*HR) + ($16,233*(TOB-HR)) - ($218*PA) - $391,280

As a player, you start out with a $391,280 hole, and it costs you $218 every time you step into the batter’s box. Hitting a homer will net you about $78k, while getting on base will net you about $16k. Obviously, you aren’t going to get a salary cut, so any negative prediction should be treated as a zero-dollar raise; the model is effectively valid from $0 upward. Within the sample of 299 player-seasons, there were only two instances of a zero-dollar raise: Jurickson Profar, when he was injured for an entire season; and Drew Butera, who went 1-for-10 in 6 games for the Dodgers in 2013.
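
The model is easy to turn into a small function. This is a minimal sketch using the coefficients quoted above; the function name, the zero floor as a `max`, and the example stat lines are my own illustrative choices, not anything from the sample.

```python
def predicted_raise(hr, tob, pa):
    """Predicted arbitration raise in dollars, using the article's coefficients.

    hr  = home runs, tob = times on base (including home runs),
    pa  = plate appearances. Negative predictions are floored at zero,
    since players don't take pay cuts in arbitration.
    """
    raw = 78_537 * hr + 16_233 * (tob - hr) - 218 * pa - 391_280
    return max(0, raw)


# Hypothetical stat lines, for illustration only.
print(f"${predicted_raise(hr=20, tob=200, pa=600):,}")  # $3,970,600
print(f"${predicted_raise(hr=0, tob=1, pa=10):,}")      # $0
```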

As the last step, let us graph every player’s expected raise (as calculated by the above model) versus his actual raise and see what it looks like!

[Chart: expected raise from the model vs. actual raise, with outliers labeled]

The model itself carries an R2 of 0.7803, meaning approximately 78% of what goes into a player’s raise can be explained by PA, HR, and TOB-HR.

On the graph, I labeled a few outliers. The reasons for each are pretty evident: Buster Posey won an MVP, Chris Davis hit 53 home runs, Giancarlo Stanton had already accumulated 117 career homers before hitting Arb1, and Matt Wieters just had a really good overall year as a catcher for Baltimore. Eliminating those outliers makes the R2 of the model increase to over 0.8.

So, loyal Nation readers… what do you think makes up the other 20%?  Agent? Team? Desire to sign the player later?  Let’s hear about it in the comments!

Also, try predicting what you think Peraza, Herrera, Duvall, and Winker’s stats will be in their first arbitration year, along with their predicted salaries!  I promise you’ll have fun… 😉

31 thoughts on “Walks are Refundable”

  1. If Duvall stays in the 4 or 5 slot, hits 30HR, and gets his OBP rate into the .310 range, he is going to make a lot of money. Winker really is going to be a poor man’s Votto unless he bats 1 or 2 or hits a lot more homers. Or did I not understand?

    • You understood.

      So, if we say 30 HR, 600 PA, and a .310 OBP, that means his TOB-HR would be about 156.

      Plugging that into the model, we’d get a first year ARB salary of about $4,860,000.
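
      For anyone following along, that arithmetic can be sketched in a few lines of Python. The function name is hypothetical, times on base is approximated as OBP x PA as in the line above, and the $500,000 base is the flat minimum salary assumed in the article.

```python
LEAGUE_MINIMUM = 500_000  # flat pre-arbitration salary assumed in the article


def projected_first_arb_salary(hr, pa, obp):
    """Base salary plus the model's predicted raise, from HR, PA, and OBP."""
    tob = round(obp * pa)  # approximate times on base
    predicted = 78_537 * hr + 16_233 * (tob - hr) - 218 * pa - 391_280
    return LEAGUE_MINIMUM + max(0, predicted)


# The scenario discussed above: 30 HR, 600 PA, .310 OBP.
print(f"${projected_first_arb_salary(hr=30, pa=600, obp=0.310):,}")  # $4,866,378
```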

        • I get Winker at $5.2 million with those numbers. Recall that the model is for the raise itself, so we have to add back the $500,000 base salary.

          So, you have the following:

          HR portion: $78,537*15=$1.178 million
          TOB-HR: ((670*.4)-15)*$16,233 = $4.1 million
          PA: -$218*670 = -$146,060
          Adjustment: -$391,280
          Min Salary: $500,000

          Total Salary = $500,000 + $1,178,000 + $4,106,000 - $146,060 - $391,280
          Total Salary ~ $5.2 million

          Having a .400 OBP season with 15 HR is like a Matt Carpenter year… even with middling defense in LF/RF, that’s a ~4 WAR year… very, very good outcome for Winker, and I’m sure the Reds would be happy to pay him that much.

          The only similar seasons in the sample are the following:

          Austin Jackson 2012 – .377 OBP, 16 HR, 617 PA – $3.5M salary
          Buster Posey 2012 – .408 OBP, 24 HR, 610 PA – $8.0M salary

          So, those are big stats.

    • Ah, but what if a player’s age when he reaches arbitration has an effect (positive or negative) on his raise? Most likely, it does. In Duvall’s case, he may not see as big of a raise as another younger player with similar stats.

      Patrick, did you pull any data on that variable?

      • I should have looked closer at your chart above. There I see that the age regressor has a pretty low correlative value. Interesting result as I would think a team would at least be somewhat concerned with a player’s age even in arbitration years. But I guess it plays a much stronger role in free agency.

        • I really thought age would play a larger role. I think the reason it does not has to do with the fact that so many of the guys who reach Arb2 and Arb3 are already 27+ years old, given that most guys don’t break into the majors until at least 23-24 years old.

          Most of the guys who are studs, and also young, end up signing early extensions. This is exactly what happened with Posey, Donaldson, and several others.

      • Arbitration is essentially a one year at a time cash and carry process; so, I’d think the impact of age would be minimal.
        In Duvall’s specific case, he figures to be first time arb eligible ahead of his age 30 season. As a corner OF or even corner IF, I wouldn’t think his age would matter at all.

        Another view is that while a team wants to pay everyone as little as possible, they might have slightly more incentive to hold down a younger guy’s 1st year arb salary because they would likely be more inclined to at some point want a longer term deal with the player; and the first year arbitration eligible salary, whether negotiated or won at actual arb, sets the low bar from there on out.

        • Age probably does not figure into team arb calculations if the player is providing some value to the team. All but the latest of late-bloomers will be done with arb by the time age has any impact.

  2. I am curious as to why you didn’t use WAR as one of the statistics, just to see what the correlation was.

    If Super 2 only costs an average of $3 million extra over 4 years then I don’t see any reason for teams to even worry about it. If a guy can help your team now, there is no reason to delay his call-up for weeks to save $800K per year for four years.

    Also, this seems to suggest that the Reds should continue to get good value out of Billy Hamilton even in his arb years as stolen bases and defense don’t seem to be rewarded.

    • Because WAR is made up of runs above replacement for offense, defense, and base running.

      I wanted to isolate the counting statistics. The reason being, a Duvall-type with 2.5 WAR will get paid more than a Hamilton-type with 2.5 WAR and the model wouldn’t explain that if we used WAR as an explanatory variable. We’d still be left with the question “Why did Duvall earn more?”

  3. Great stuff Patrick… This is the sort of analysis that I hope the Reds are doing. It may help them find how to get the most production for the fewest dollars.

    • It’s funny, because if they are doing this sort of thinking, they might already be implementing the plan.

      As I’ve said from the beginning, a guy like Peraza will cost significantly less money producing 2.0 WAR with base running and defense than a guy like Schebler would hitting 25 HR.

      Now, the problem with that strategy might be that shoehorning yourself into guys with super low ceilings like Peraza means you are never going to find a superstar. Guys like Winker, Senzel are probably a better model than Peraza.

      • Patrick: First, thanks for the exhaustive research. Question, though: Why is Peraza’s ceiling super low? I assume that you are referring to his lack of power, but there have certainly been superstars (Pete Rose, for one) who weren’t home run hitters.

        • It isn’t only the lack of power (which projects to around 5-8 HR a year, where Rose in his prime hit 10-15 HR), it is his lack of walking. Rose has a career walk rate of 9.9%, which, coupled with a high BABIP, made him a valuable offensive player.

          Unless Peraza turns into a 10-15 HR guy (which is possible given he’s athletic and not tiny and still very young), he’ll need to vastly improve his plate discipline if he ever wants to sniff a 4 WAR season, which is sort of my floor for being called a “star.”

          At this time, what I see in Peraza is a guy who is almost a certainty to put up 1.0 WAR, very likely to put up 1.5 WAR, somewhat likely to put up 2.0 WAR, and probably fairly unlikely to put up any more than 2.5 WAR.

          Also, this may be controversial… but Pete really only played at a superstar level for a very short period of time. His norm was “star” rather than superstar.

          Case in point – Rose had 354 offensive runs above average in 3562 career games. Joey Votto currently has… 354 offensive runs above average in 1268 games. (Liberty taken with rounding to the nearest whole number)

        • Thanks, Patrick. Reasonable answer and interesting point on Rose–a point I find uncontroversial, actually.

      • Yes! Cheap wins! Although cheap wins may never get you 110 wins, it may improve the chances of never getting 80 losses, if that makes sense.

        I’ll also add an entirely subjective controversial anecdotal comment. Speed, pitching, defense seem to be much more deadly in the postseason. If their plan is to claw their way into wild cards and then apply the heat to opposing teams in the postseason at a discount of the cost, not entirely a bad plan for smallish market team.

    • Most production for the fewest dollars, though, only goes so far, since there’s no salary cap. The Reds could get the best production per dollar of any team and still finish under .500. They’ll need more than efficiency and will likely have to add some superstars to the stable to complement the lower ceiling guys.

      • Joey Votto won the MVP while making $525K. Kris Bryant makes about 15% of what Devin Mesoraco makes.

        If you build a roster using the price for production model then you should have enough flexibility to selectively retain a superstar or 2 who becomes expensive. When Votto, Bruce, Bailey, Cueto, Chapman, Frazier were cheap the Reds could afford more expensive, ancillary pieces. When those guys became expensive they couldn’t afford anyone else and the cheap players they had weren’t good enough to make up the difference.

  4. Patrick, it’s so nice to see another post from you. I realize the time commitment required to assemble the data needed for your posts and I appreciate the effort and commitment.

    From your data analysis, I get the impression that while front offices around MLB are moving to a new generation of methodology, the arbitrators are stuck decades behind the times in their awareness and award determinations. Does that sound like a reasonable takeaway from your analysis?

    • It does sound reasonable. Part of the problem is arbitrators (and players and their agents) try to use the most reasonable (or favorable, in the case of the player) past comps. If we keep looking backwards, we’ll always have some of this old school mentality baked in.

    • And thanks for the kind words. 😉 I do like being a part of the community, so I’ll post often enough so Chad and Steve don’t revoke my posting privs!

  5. Great post Patrick. Elite defense at a premium position is 20%+…… what will Cozart get? That last 2 months has to hurt him bad… or maybe not… Billy with his soon-to-be Gold Glove and Duvall with his power and solid defense are setting themselves up to cash in… Reds will need to be proactive if both continue in 2017 what they did in 2016.

  6. Jeter, the Reds need to hire you. I’m totally serious. I bet I can account for the outliers by 2 factors. First, be a catcher (high non-offensive value) or have CRAZY power. Hence, HR are non-linear in that you get extra points for over say 40ish and definitely 50. I didn’t read all the responses so apologies if I repeated anything.

    • You certainly could be right. If I still had access to some more substantial analysis packages I could have looked for a mixed-model approach, but I am stuck with Excel’s analysis add-in! 😉

  7. Great analysis as usual Patrick. I wonder if the Super-2 rules will be tweaked in the new upcoming CBA? I am not a fan of teams holding back players until after June 1 or thereabouts. Not having a set date, and using only a % calculation that changes from year to year seems to be unfair to both the teams and the players. Middling players who are lucky enough to get the service time get the status that a star 2 year player is not afforded. Super-2 status, if there is going to be one, should be earned through on-field play. Not in the fact that a player can stay on a 25-man roster for 2+ years.
    The money difference a player can make by achieving Super-2 status was laid out nicely. I had wondered many times what the differences were. Luckily, most salary negotiations don’t reach arbitration anymore. But the ones that do help to set the market rate.
    Personally, I’d like to cap individual player salaries (+ all bonuses) at $20M, up the minimum player salary to $750k, and then pour some money into minor league salaries. MLB neglecting the issue of minor league salaries while they throw money around like Monopoly money at the ML level is criminal. MLB needs to protect and reinforce the foundations of the game at the minor league level. Shore up the foundation rather than lavishing opulence on the Majors.
    Maybe even cap the number of players a team can have earning the $20M. Limit the number of years a contract can be at 5. The $25M-$30M+ / yr. salaries are going to kill the game.

    • The union will fight any kind of cap with every option they have, including a strike. Neither the owners nor the players want a strike, so it won’t happen. Baseball will be riding high popularity wise after this World Series, and to mess that up with a strike would be incredibly counterproductive.

      • I clearly see your point.
        It won’t happen with this new CBA, but 4 years from now, that might be a different story. The minor league salaries will have to be addressed soon. I would rather take $12M away from Kershaw, Greinke, etc. and bring them down to $20M and take the saved money and spread it out through the minor leagues.
        At some point the MLBPA is going to have to represent the minor league players. The betterment of a few players over that of hundreds of minor leaguers is what the union should be fighting against. The rights of a few over the rights of hundreds seems very elitist to me. However, the MLBPA doesn’t currently represent the minor leaguers, so their fate is left up to the teams.

    • Biggest issue with changing anything is that minor league players are not members of the MLBPA, so the union has no real charter or mandate to do anything at all for the minor leaguers.
