About a week ago, I saw this tweet:

MLB strength of schedule so far:

1. Reds*

2. Cubs

22. White Sox

30. Nationals* – Reds have played the Cubs 7 times.

— Gruncle Pooch (@Pooch7171) April 26, 2016

And this tweet:

@ZachENQ according to Baseball Reference the #Reds have the best strength of schedule in #MLB and people bash BP as the manager

— Jakob Porter (@Jakob130) April 27, 2016

And then this article was posted a few days ago:

The 30 is live! Breaking down strength of schedule for Reds/Diamondbacks/Tigers/Mets & singing praises of Conforto: https://t.co/RydWgUxk5K

— Jonah Keri (@jonahkeri) May 2, 2016

All of which made me wonder: how exactly does strength of schedule work in baseball? With a different pitcher every day, the ridiculous randomness of play, and general parity across the league, how could you possibly determine strength of schedule in any meaningful way?

It turns out strength of schedule can be found a couple of different ways, neither of which is useful or accurate (in my opinion). First, there’s ESPN’s Relative Power Index (RPI), which is 75 percent strength of schedule and 25 percent how good the team itself is, leading to a largely irrelevant statistic.

The exact formula is 25 percent the team’s winning percentage, 50 percent its opponents’ average winning percentage, and 25 percent its opponents’ opponents’ average winning percentage. In theory, RPI is cool because it controls for the difficulty of other teams’ schedules, taking the measure a step further, but the inclusion of the team’s own winning percentage is weird. Just look at the current RPI rankings and try to draw a meaningful conclusion.
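To make the weighting concrete, here is a minimal Python sketch of the formula as described above. The function name `rpi` and the sample winning percentages are mine, purely for illustration; they are not ESPN's code or real team numbers.

```python
def rpi(wp: float, owp: float, oowp: float) -> float:
    """Relative Power Index using the 25/50/25 weights described in the text.

    wp   -- the team's own winning percentage
    owp  -- average winning percentage of its opponents
    oowp -- average winning percentage of its opponents' opponents
    """
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# Made-up inputs: a .400 team that has faced .550 opponents.
example = rpi(0.400, 0.550, 0.500)
print(round(example, 3))
```

Notice that the two schedule terms carry 75 percent of the weight, so a bad team with a brutal schedule can land right next to a decent team with a soft one.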

The second method is Baseball Reference’s (the one Jonah Keri used in his article). Here, a team is ranked on how many more runs their opponents would score when compared to the average MLB team. You can clearly see that the Reds and the Braves have had a rough go of it this year, while the Cubs, Dodgers, and Nationals are just coasting.

While this method is far more useful (the rankings actually make sense), it still doesn’t do the one thing that a good measure of strength of schedule should: account for pitching.

Imagine you come to the park one day to play the Dodgers and are facing Clayton Kershaw. That Dodgers team seems a lot more fearsome than the one you played yesterday when Alex Wood was starting. But according to all available strength of schedule measures, it’s still the Dodgers so it’s still the same level of difficulty.

With this grievous oversight in how we think of our favorite team’s schedule in mind, I decided to make my own strength of schedule measure, using pitching as the foundation.

What I did was compile, for all 30 teams, each opposing starting pitcher faced in 2016 and that pitcher’s FIP from 2015. I used FIP because it doesn’t depend on the defense behind the pitcher, thus more accurately capturing the skill of the pitcher and not the team. I also used 2015 stats as opposed to 2016 or career numbers because a) most pitchers haven’t logged enough innings in 2016 for those numbers to be meaningful, and b) factoring the type of pitcher C.C. Sabathia was in his prime into his 2016 self seemed disingenuous.

From those numbers, I assigned a point value to each pitcher, breaking them into different types.

- An Ace is a pitcher you expect to win every time out. Justin Verlander in his prime. These are the guys that don’t even need people playing the field behind them. These are the Satchel Paiges of the world.
- A Dependable pitcher is one you expect to compete and give you a quality start each time. This is John Lackey. This is Michael Wacha. This is your number two who you overpay for but you know you couldn’t live without.
- The Toss Up guy is the one you run out there because your other guys need rest, and if this guy has a good day, then you could eke out a win. This is the type of pitcher Reds fans know all too well. Alfredo Simon is the prototypical Toss Up.
- The Spot Starter is someone you only run out there when you’re rebuilding or your training room resembles the library during finals week. Newbies, or Debuts, also fall here because they’re unproven and you can’t assign a score to potential.

So the first part of my strength of schedule stat was just a raw score portraying the average relative difficulty of pitchers faced.
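For the curious, the tiering can be sketched in a few lines of Python. Fair warning: the FIP cutoffs and the point values below are hypothetical placeholders chosen only to show the mechanics, and the names `TIERS`, `classify`, and `raw_pitching_score` are illustrative. The one rule taken straight from the text is that a pitcher with no 2015 numbers lands in the Spot Starter bucket.

```python
from typing import Iterable, Optional, Tuple

# HYPOTHETICAL cutoffs and point values, for illustration only.
# Each entry: (2015 FIP must be below this, tier name, points).
TIERS = [
    (3.00, "Ace", 3),
    (3.75, "Dependable", 2),
    (4.50, "Toss Up", 1),
]
SPOT_STARTER = ("Spot Starter", 0)


def classify(fip_2015: Optional[float]) -> Tuple[str, int]:
    """Return (tier, points) for one opposing starter."""
    if fip_2015 is None:  # debut or no 2015 stats: unproven
        return SPOT_STARTER
    for cutoff, name, points in TIERS:
        if fip_2015 < cutoff:
            return (name, points)
    return SPOT_STARTER


def raw_pitching_score(fips: Iterable[Optional[float]]) -> int:
    """Sum the tier points over every opposing starter a team has faced."""
    return sum(classify(fip)[1] for fip in fips)


# Made-up FIPs for four opposing starters, one of them a debut.
print(raw_pitching_score([2.50, 3.50, None, 5.00]))
```

Swapping in different cutoffs or points only means editing `TIERS`; the rest stays the same.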

While pitching is an integral part of a team’s success, I realized the other measures do have it right in including whole-team measures. After all, Clayton Kershaw pitching in front of the Bad News Bears will not do nearly as well as he does in front of the Dodgers. That being the case, I included a second numeric score, which represented the difference of the cumulative win-loss records of each team’s opponents. Instead of taking the present-day win-loss record for each opponent, though, I took their record at the time of the game, hopefully reflecting the strength of that team when the game was played.
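One way to read that second score, sketched in Python: add up every opponent's wins and losses as they stood on game day, then subtract total losses from total wins. This is an assumed interpretation of "difference of the cumulative win-loss records," and the function name and sample records are invented for illustration.

```python
from typing import Iterable, Tuple


def opponent_differential(games: Iterable[Tuple[int, int]]) -> int:
    """Cumulative opponent wins minus cumulative opponent losses.

    Each item in `games` is (opp_wins, opp_losses) as of the day
    that game was played, not the opponent's present-day record.
    """
    total_wins = 0
    total_losses = 0
    for wins, losses in games:
        total_wins += wins
        total_losses += losses
    return total_wins - total_losses


# Made-up schedule: three games against opponents who were
# 5-2, 3-4, and 10-6 when the games were played.
print(opponent_differential([(5, 2), (3, 4), (10, 6)]))
```

A positive number means the opponents were collectively above .500 when faced, and a negative number means they were below.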

To get the final number–the Pitching Dependent Strength of Schedule if you will–I added the raw pitching score to the difference number and then divided by the number of games played.
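In code, that last step is a one-liner. Here is a minimal Python sketch; the name `pdsos` and the sample inputs are made up for illustration and are not any team's actual totals.

```python
def pdsos(raw_pitching_score: float, differential: float, games_played: int) -> float:
    """Pitching Dependent Strength of Schedule.

    (raw pitching score + opponent win-loss differential) / games played
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return (raw_pitching_score + differential) / games_played


# Made-up inputs: 30 pitching points, a +12 differential, 24 games.
print(pdsos(30, 12, 24))
```

Dividing by games played keeps teams comparable even when they have played a different number of games.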

As you can see, nothing radically changed. The Reds and Braves have still played far and away the hardest schedule, while the Dodgers and the Nationals have been taking a month-long cake walk. The Tigers and the Cubs did jump pretty significantly from the bottom to the middle, but in terms of magnitude, neither has had a particularly difficult road.

Looking at just the Reds, you realize pretty quickly just how strong the NL Central is. The Cubs and Pirates inflate the win differential considerably, and both of them plus the Cardinals have top-tier pitchers throughout their rotations. So the road’s not going to get any easier any time soon.

Though I like to think my measure is a bit more accurate than the professional statisticians’, it has its fair share of problems. For one, it doesn’t control for Dallas Keuchel being otherworldly last year (earning Ace status) but painfully mortal this year. The same goes for newbies like Kenta Maeda, who has dominated but earns no pitching points because he has no 2015 stats to go on. Also, converted relievers like Juan Nicasio pose a problem in that it is far easier to compile a FIP below three pitching just one inning at a time than with a full starter’s workload.

If any of you have tweaks to my system or have a far superior system to propose in the comments, I’m all ears. Personally, I feel pitching must be accounted for, but obviously the powers that be feel a bit differently.

The bottom line is: This has not and will not be a fun season for Reds’ fans. But, if you want to point to one cold, unfeeling number to blame, here it is. Point and blame away.

Very creative, but a rabbit hole of nuance (why stop at pitching? What about times facing the “B” lineup or a star player taking a day off, and then there’s the day that the Reds blow up Johnny Cueto, an ace, for 6 runs?). I’m pretty content to go by overall won-loss record and trust that in the end, the good teams have winning records and the bad teams do not.

Interesting but, sort of like rankings in college sports, maybe it doesn’t make sense to look at this type of statistic until the season is a bit older. Maybe that way you could use the FIP (assuming sample size is right) from the current year, negating the Keuchel effect.

Also, are you including relievers in this? It would be interesting and also might raise the strength of schedule when a newbie or tossup pitcher is up there. If the starter can only go 5 innings but the bullpen is lights out doesn’t that sort of raise the strength of the pitching staff? Maybe you could just use FIP of the entire pen if going that deep on relievers with low innings counts doesn’t fit the bill.

Overall interesting to think about.

I have nothing germane to contribute to the discussion. But I want to say how much I enjoyed reading it. And reading the comments as well.

Certainly a noble attempt to quantify SOS for baseball. I think SOS is just very hard to get an accurate read on because there are so many factors to consider for individual games, and some were mentioned above.

For instance, in college basketball, my favorite team UNC took a little flak in the media when they won the ACC outright this past season because they had the weakest SOS of the top tier schools. Fortunately they went ahead and won the ACC tournament too so they could shut up some of the talking heads. But really, there were two major factors contributing to the poor SOS. #1 Carolina couldn’t play themselves. So they were in effect penalized by not playing the top team in the conference when everyone else got a boost from playing them. #2 They played 0-18 BC twice this year, which significantly dragged down their SOS numbers. If Carolina had played any of the other middling schools at home instead of BC they were very likely to still have won and their SOS numbers for the conference wouldn’t have merited mention. Of course you are dealing with a small sample size (18 conference games) and an extremely unbalanced schedule for all teams. But the point is, there are factors out there that cannot be predicted or determined simply from looking at SOS.

The Reds will have a more difficult schedule because they can’t play the Reds. The Reds will also benefit in their SOS numbers from playing a larger number of games against the Cubs, Cardinals, and Pirates. Does that make their schedule harder? Yes. Does that make SOS numbers mean anything more? Not really in my opinion.

Very neat concept. I never thought of doing something like that. Wonder what it might look like if one were to take the SIERA and wRC+ metrics and then compute an index based on those metrics, weighted by opponents’ index? I wish I had more time to do that kind of stuff.

Many, many years ago I was working on something similar for picking NFL games. Essentially I was comparing how many points each team had allowed versus common opponents versus how many they had scored versus common opponents with data of the two teams in question removed from all calculations. As the season progressed (sample size grew) it started to get very accurate not so much at picking the raw score but indicating the relative closeness of the game (and usually the winner except where the index was very tight).

I’ve been sitting here trying to recreate my exact logic and apply it to MLB, but alas the synapses have aged and the notes (this was back in the pencil, paper, and 4 function calculator era) were long since lost in one of numerous moves…

There is also the additional layer of complexity with baseball games in that a lot depends on the exact starting pitcher. This is neat mathematical fun though!

Kudos for finding something positive the Reds are leading the league in!

Nice article, enjoyed reading it. However, I think your method is actually double-counting the pitching, rather than adding a missing element. If I’m the Dodgers, and I have 5 Clayton Kershaw clones, my winning percentage is going to benefit dramatically, right? Or, if I have 5 Alex Woods, it’s going to suffer; and different mixes of pitchers are going to have different impacts on how many games I win. Or, if they don’t, then my defense and offense are amazing, and I’m still a really tough team to play.

Likewise, Runs Scored and Runs Allowed in comparison to an average team are going to be highly dependent on who my pitchers are and how well they pitched. In other words, wpct. and RS/RA are already measuring the impacts of different pitchers and their pitching.