# When Do Stats Stabilize?

If you’ve been around this website for the past few years, chances are you have seen me write something like “Well, [Stat X] doesn’t begin to stabilize until a player reaches [X Plate Appearances].” Statements like these hint at a topic that is near and dear to me: sample size!

We all know the most common baseball caveat: small sample size. What does it really mean, though? Is it consistent among statistics? How do you know when a sample size is no longer small? I hope to shed some light on these topics.

To begin, I must disclose this is definitely not my own work. Many folks have tinkered with this stuff over the years, but the most commonly used metrics come from Russell Carleton, aka “Pizza Cutter.” You can read one of his posts here. I’ll do my best to briefly summarize and then we’ll look at some Reds stats!

Basically, by using math we can determine what sample size of plate appearances, at-bats, or balls in play is required before you can begin to trust the stat you are analyzing. For example, we all know as baseball fans that if a batter gets a hit in his first at-bat of the season, we shouldn’t consider his “true talent level” to be a 1.000 AVG. If a hitter goes 7-for-10, we know his true talent level is not a .700 AVG. However, most fans get to a point where they start to believe the numbers they are seeing. This may be different for each fan. Take Joey Votto’s slow start this year. Entering Thursday’s game with the Cubs, he had a .182 AVG in 63 plate appearances. It’s natural to wonder what’s wrong with Joey. But if I told you batting average took *over 900* plate appearances to begin to stabilize, would you stop worrying? Some might. Some might ask “What is stabilizing?” Glad you asked!

Stabilization points were determined by Carleton using a method called split-half reliability. Essentially, you take a sample and cut it in half, then arrange those halves in every possible combination and run correlations between the two sets of samples. Average those out, and you get an overall correlation. So, what is stabilization? Carleton surmised that once all the sets of samples reach a mean correlation of around 0.7, they have “begun to stabilize.” Another way to say this is that the signal accounts for about half of what you see in the stat (i.e., 0.707 × 0.707 ≈ 0.5), which means there is as much true talent level description in the stat as there is random variation.
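For readers who like to see the idea in code, here is a rough Python sketch of split-half reliability. It is not Carleton’s actual code or data: the simulated players, the `simulate_players` helper, and the range of “true” strikeout rates are all made up for illustration, and it averages over a handful of random splits rather than every possible combination of halves.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(players, n_splits=50, seed=0):
    """Mean correlation between random halves of each player's outcomes.

    players: one list of 0/1 outcomes per player (1 = the event happened).
    Random splits stand in for "every possible combination" of halves.
    """
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_splits):
        first, second = [], []
        for outcomes in players:
            shuffled = outcomes[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            # Each player's rate in one half vs. the other half.
            first.append(statistics.fmean(shuffled[:half]))
            second.append(statistics.fmean(shuffled[half:2 * half]))
        correlations.append(pearson(first, second))
    return statistics.fmean(correlations)


def simulate_players(n_players, n_pa, seed=1):
    """Hypothetical league: each player has a fixed 'true' strikeout rate."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        talent = rng.uniform(0.10, 0.30)  # assumed spread of true talent
        players.append([1 if rng.random() < talent else 0 for _ in range(n_pa)])
    return players


if __name__ == "__main__":
    for n_pa in (40, 400):
        r = split_half_reliability(simulate_players(100, n_pa), n_splits=20)
        print(f"{n_pa} PA per player: mean split-half correlation = {r:.2f}")
```

Run on the made-up league, the mean correlation should climb as each player gets more plate appearances. The sample size where it crosses roughly 0.7 is the “begins to stabilize” point described above.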

So, when do some stats begin to stabilize? Here’s a chart of a few for your viewing pleasure:

Notice the quickest things to stabilize are all related to how often you swing: swing percentage, strikeout percentage, and contact percentage. Other things take very, very long to stabilize. Batting average, for example, takes longer than a full season to determine a true talent level. For some players, we’re already at the stabilization points for Swing% and K%. For others, we’re close. Let’s have a look at the first three and see if we can draw any conclusions from the data:

(Note: The asterisk denotes that I also used Scott Schebler’s 2015 AAA numbers when determining his K%, and the double-asterisk denotes that I used Devin Mesoraco’s 2014 numbers in the 2015 column since he didn’t really have a 2015 to speak of.)

This chart is sorted by Swing Rate. I think if you’ve been watching the Reds this year you didn’t need a chart to tell you Brandon Phillips has been swinging at everything in sight. Now you have proof! Given that Swing% stabilizes rather quickly, we can make the statement that Phillips has likely altered his game plan, and his new approach and true talent level have him swinging at many more pitches than at any other time in his career.

This chart is sorted by Strikeout Rate. Zack Cozart has been dazzling so far. If he truly changed his approach and we can trust his increased contact rate and decreased strikeout rate, he could be in for a career year, even if his luck on balls-in-play takes a major turn for the worse.

As expected, since Phillips is swinging way more than ever he’s striking out less. This may be counter-intuitive, but swinging early and often means you put the ball in play before you ever have the chance to strike out.

This chart has been sorted a third and final time, now by Contact%. This stat begins to stabilize around 100 PA, so there can still be some noise in these. But they can also be telling.

The point of this article was not to draw any huge, ground-breaking conclusions, but simply to introduce the idea of reliability and stabilization so we can discuss it more in the future!

I’d be interested to hear what you guys can glean from the above data, as well as your eyeballs! Post your findings in the comments below!

(Note: For those of you who were expecting a third RE24 article, I apologize. I decided three articles in a row about the same topic was probably not fun for readers who didn’t like the topic.)

I’m surprised Mesoraco’s contact % is so high for 2016. Seems like he’s been struggling. I guess he’s either unlucky with BABIP so far or not hitting the ball hard.

But he also has a very low swing rate. He also has a low BABIP, but he has ALWAYS had that, so that may simply be who he is.

I worry that Mes is never going to be a star player. The good thing is that he takes walks, but the bad thing is that his swing is complicated and can get out of mechanics easily. Other than 2 months in mid 2014, he is a mediocre hitter. His defense is also shaky.

Devin is coming back from nearly a year on the sideline. It’s going to take him a while to rediscover his 2014 form, but I’m fairly confident he can do it. Here’s all you need to know about him:

For his career, nearly 30% of all balls he makes contact with are “hard contact,” which is much more likely to result in hits, especially of the extra-base variety. In 2014, nearly 40% of his contact was “hard.” But in limited at-bats last year and so far this year, he has yet to crack 20%. He’s just now entering his prime years, so there is no reason to think he can’t get his hard contact % at least back to his career norms, if not well beyond them like he did in 2014.

Once he starts making harder contact with the ball, the hits (and doubles, and home runs) will come.

The bottom line is, it’s way too early to give up on Devin. It’s all about sample sizes. Someone really should write an article about that…. (/snark)

Not at all ready to give up on Mesoraco, but I’m sure glad the Reds have a solid backup catcher in Barnhart.

Planning on doing all the other more interesting stats once we begin to near their appropriate stabilization points.

I thought baseball’s most common caveat is “but that was against the Phillies.”

Bryce Harper gets to play the Phillies, Braves, and Marlins something like 60 times this year.

When we talk about statistics “stabilizing” that doesn’t mean they’ve reached a point where they can be highly predictive. Remember that the data has just cleared the 50-50 split between talent and luck. So there’s still a hefty portion of luck. Plus, talent changes as the year goes on. Players aren’t the same players all year long. For example, Devin Mesoraco has sat out an entire season. He’s likely to be a different player in the first month than he is in the third month. Pitchers change a lot during the course of a season.

So the stability threshold is more a floor. Before you reach it, the numbers are unreliable. After you reach it, they mean more in terms of evaluating what a player has done, but are still far from strongly predictive about the future.

Well-stated, Steve.