Cautionary Notes About Sample Size Stabilization Points

In this post I’m going to try to tie a pizza utensil, Bill Murray, and Charlie Blackmon together, all while trying to help you avoid a pitfall I think many are falling into in their fantasy baseball research.

Read enough fantasy baseball advice and you’re bound to come across something like this:

We can now trust player x’s <insert rate statistic here> because he’s reached the number of plate appearances for the stat to become reliable.

Or maybe this:

We’ve reached the point of the season where <insert rate statistic here> starts to stabilize.

Maybe you even clicked on a link near the comment, saw some fancy tables with a lot of other stats and when they “stabilize”, references to r-squared, and then concluded, “Seems legit to me.”

These comments are usually followed by some kind of analysis that uses the stat in order to project into the future. This is the problem! More on that in a bit.

Not So Fast My Friend

I’ve long been a victim of this. I’m not a statistician, so if someone makes claims like this and links to a study that looks legitimate at a quick glance, I’ll buy into it. This seemed even more reliable because the study is quoted on a lot of reputable sites like Fangraphs, Beyond the Box Score, and more.

But I’m also a regular listener of Fangraphs’ “Sleeper and the Bust Podcast” with Eno Sarris and Jason Collette. I’ve heard Sarris mention a disclaimer several times when referring to sample size stabilization points that has always left me a little unsettled. So I decided to investigate.

A Little History

The original study was performed in 2007 by a man writing under the name “Pizza Cutter”. It’s a heavy load of information to consume, but I do recommend it so you can understand how he performed the test. Plus, it’s proven to be a very popular piece of reference material, so it wouldn’t hurt to familiarize yourself with it.

You can’t really tell from the original study, but it turns out that the research has been misused and misinterpreted by many people. So much so that Pizza Cutter himself has since written several times that his work is being misused.

These Stats Are Not Predictive

Russell A. Carleton, who ditched the Pizza Cutter nickname (except on Twitter), is the man behind the stabilization points research. He has this to say about the predictive value of stabilization points:

…they are not nearly as powerful in predicting the future as people seem to believe that they are.

And it makes sense. When developing projections before the season starts, the typical projection system uses at least three years of data. So then why are we so quick to believe that three weeks of April at-bats are meaningful for predicting the rest of the season?!

When referring to an example of how his study is used to say, “this new strikeout rate that we’ve seen is what we can start to expect”, Carleton writes,

That’s not what the study was actually about.

If you haven’t gone and read the article yet, I do recommend it. You can just sense the angst in Carleton’s writing. The title of the article, “It Happens Every May”, speaks volumes. I can just see Carleton surfing the web as we speak reading countless articles inappropriately referencing his work and thinking to himself, “Every season I have to put up with this $#!_”.

It Helps To Understand What Carleton Was Trying To Do

Carleton wasn’t trying to develop a projection methodology in doing this research. He states that one of his favorite things to do is to make up his own statistics and study if they correlate to other metrics we already use in baseball research.

It doesn’t make sense to do advanced baseball research on small sample sizes. So all he wanted to know was how soon into a season, or with how small of a sample size, could he begin conducting these studies of his.

What Stabilization Points Really Mean

Carleton clearly states that these really are meant to be used for retrospective analysis, meaning they are for backward-looking uses. I probably don’t have to warn you that we are not in the backward-looking business when playing fantasy baseball. We are forward-looking. We want to predict the future.

I think this quote from the same Carleton article is very illustrative of what the stabilization points are good for:

If you took Player X and gave him 60 PA, and then gave him another 60 PA in roughly the same circumstances, how well would his performance in each match up?

Again, he was trying to prove the validity of newly created baseball metrics. So his intent was to find the earliest point where he could reliably use other more commonly used stats.

Using 2014 as an example, he wants to know this: if Charlie Blackmon were to repeat the season and face the exact same pitchers, in the exact same order, in the exact same ballparks, in the exact same weather, how close would the second pass through those exact same scenarios be to the first? Carleton was trying to find the point at which Blackmon’s statistics represent the true Charlie Blackmon talent level and are not susceptible to noise from a small sample.

Carleton wants to know what would happen if Charlie Blackmon was put through Groundhog Day for purposes of baseball research. He was not trying to give you fantasy baseball advice.
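To make the Groundhog Day idea concrete, here is a rough simulation sketch. It is not Carleton’s actual method or data. I’m assuming a made-up league where each hitter has a fixed “true” strikeout rate somewhere between 10% and 30%, giving every hitter two separate samples of plate appearances under identical conditions, and then checking how well the first sample matches the second across the league. The function names and the talent range are my own inventions for illustration.

```python
import random


def simulate_k_count(true_rate, pa, rng):
    """Count strikeouts in `pa` plate appearances at a fixed true rate."""
    return sum(1 for _ in range(pa) if rng.random() < true_rate)


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_correlation(pa, n_players=1000, seed=0):
    """Correlate two same-size samples per player in identical conditions."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_players):
        # Assumption: talent is spread evenly between 10% and 30%.
        true_rate = rng.uniform(0.10, 0.30)
        first.append(simulate_k_count(true_rate, pa, rng) / pa)
        second.append(simulate_k_count(true_rate, pa, rng) / pa)
    return pearson(first, second)


if __name__ == "__main__":
    for pa in (20, 60, 200, 600):
        r = split_half_correlation(pa)
        print(f"{pa:>4} PA per sample: correlation = {r:.2f}")
```

The correlation climbs as the samples get bigger, which is the backward-looking question the research was built to answer. Notice what the simulation quietly assumes: the player’s true rate never changes and the second sample happens in exactly the same circumstances as the first. That is the Groundhog Day part, and it is exactly what a real season does not give us.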

Think About It This Way

Especially when we’re dealing with a stat like strikeout rate, which stabilizes in 60 PA, a player’s next 60 PA are unlikely to come in roughly the same circumstances.

A player might have faced Yu Darvish twice in the first four weeks of the season, or had a four game series in Coors Field, or had a blister on the palm of his hand for two weeks.

These are precisely the kinds of differences Carleton wanted to take out of the picture. He wanted to control the environment as much as possible. That is why his study was designed the way it was. But it is very much NOT what we want for fantasy baseball analysis.

At such a small sample size, we really don’t know much more about a player than we knew from studying the last three years of statistics that were incorporated into our preseason projections.
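A bit of back-of-the-envelope arithmetic shows how little 60 PA adds. This is only a sketch with made-up numbers: I’m assuming a hypothetical hitter with a 20% strikeout rate over 1,500 PA across the last three years who then strikes out 18 times in his first 60 PA (30%). Simply pooling the old and new plate appearances is a simplification of my own and not how any real projection system works, but it gives a feel for the scale.

```python
def pooled_rate(prior_k, prior_pa, new_k, new_pa):
    """Strikeout rate after simply pooling old and new plate appearances."""
    return (prior_k + new_k) / (prior_pa + new_pa)


# Hypothetical player (all numbers are assumptions for illustration):
# 300 strikeouts in 1,500 PA over three years, then 18 in 60 PA this April.
prior_pa, prior_k = 1500, 300
april_pa, april_k = 60, 18

blended = pooled_rate(prior_k, prior_pa, april_k, april_pa)

print(f"three-year rate: {prior_k / prior_pa:.3f}")
print(f"April rate:      {april_k / april_pa:.3f}")
print(f"pooled rate:     {blended:.3f}")
```

Under those assumptions the pooled rate only moves from 20% to about 20.4%. The alarming April barely registers against three years of history.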

Pizza Cutter Does Give Us This Advice

What I have chosen to extract from Carleton’s writing is his strong reminder that each player’s true talent level is continually changing. Every day a player is a little bit different than the day before. They’re a little older and a little wiser. They learned something from the day before, or they might have forgotten something learned long ago. Their talents have either improved slightly or deteriorated.

So even if a player you drafted has a 30% strikeout rate and he has reached the “stabilization point”, it’s not the end of the world. Tomorrow is a new day and he might be a slightly different player. You never know, within a few more days he might have changed back to that same player with a career 20% strikeout rate.

References

In writing this article, I referred to the following articles and websites:

“525,600 Minutes: How Do You Measure a Player in a Year?” by Pizza Cutter, 11/14/2007

“It Happens Every May” by Russell Carleton, 7/24/2012

“It’s a Small Sample Size After All” (updated research) by Russell Carleton, 7/16/2012

“When Samples Become Reliable” by Eric Seidman, 5/22/2009

Thanks For Sticking It Out to the End

Stay smart.

Follow @smartfantasybb
