Projections

An Important Concept Behind Making Projections

Envision a Major League Baseball player’s stat line.  If you’re having trouble doing that, here’s one:

[Image: Paul Goldschmidt’s recent MLB stat lines, courtesy of Fangraphs.com.]

Those are Paul Goldschmidt’s Major League statistics for the last three seasons.

How Do We Take That Information And Create 2014 Projections?

Do we just eyeball it and say, “He hit 20 HR in 2012 and 36 in 2013, so I’ll project 28”?  Do we give more weight to 2013 because it’s the most recent season?  Is Goldschmidt still improving?  Could he hit more than 36?

What about stolen bases?  Or batting average?  Runs?  RBI?

There are a lot of moving parts here.  And they’re all somewhat related to each other. How do you make sense of all this information and develop a sound, reliable, and accurate projection for what will happen in 2014?

We Have To Disaggregate the Data

“There you go again, Tanner.  Using words like ‘disaggregate’.  What does that even mean?”

An Example

Assume you own an ice cream cone stand and you’re trying to project what sales of ice cream will be this month.  What factors would go into that calculation?

You could just project it at a very high level and say, “Sales were $10,000 last month and $9,000 the month before.  So I will estimate $9,500 for the current month”.  And that might give you a reasonably close estimate.

But the key to accurate projections is to look at underlying data or events that make up that end result.  You want to break apart the big event, or disaggregate it into smaller events you can study and measure.  Instead of trying to guess the ending sales result, you’re better off trying to project the smaller things that make up that monthly total:

  • The average selling price per ice cream cone
  • The number of ice cream cones sold
  • How many hours is the stand open each day?
  • How many people will walk by the ice cream stand in a day?  In an hour?
  • Out of every 100 people that walk by the stand, how many buy a cone?

After you have estimated this information, you run the math and calculate the total sales for the month.
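Here is a rough sketch of that math in Python. Every number in it is made up purely for illustration, and the function name `project_monthly_sales` and its parameters are hypothetical names of my own, so plug in your own estimates.

```python
def project_monthly_sales(price_per_cone, hours_open_per_day,
                          passersby_per_hour, buyers_per_100, days_open):
    """Build a monthly sales projection from the smaller pieces."""
    # Foot traffic per day, then the share of passersby who actually buy.
    passersby_per_day = hours_open_per_day * passersby_per_hour
    cones_per_day = passersby_per_day * buyers_per_100 / 100
    # Cones sold over the month, times the price of each cone.
    return cones_per_day * days_open * price_per_cone


# Illustrative inputs only (assumptions, not real data).
monthly_sales = project_monthly_sales(
    price_per_cone=4.00,      # dollars
    hours_open_per_day=10,
    passersby_per_hour=80,
    buyers_per_100=10,        # 10 of every 100 passersby buy a cone
    days_open=30,
)
print(f"Projected monthly sales: ${monthly_sales:,.2f}")
```

The nice part of setting it up this way is that a change to any one input (a price increase, longer hours, a festival bringing more foot traffic) flows straight through to the monthly total.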

Why This Works

It’s hard to just look at $9,000 and $10,000 of monthly ice cream sales and make sense of those numbers.  But suppose you know that you raised the price of each cone 25 cents, that you just hired an employee who will allow you to keep the stand open longer each day, that the employee has a striking resemblance to Jennifer Lawrence (with long hair, please) and an uncanny ability to sell ice cream, and that a large festival this month will bring an extra 5,000 people by the stand.  With that information, you’ll be able to make a much more accurate projection than you would by simply looking at past monthly sales figures.

Applying This To Baseball

You can think of our typical rotisserie baseball categories as aggregated data, like the monthly ice cream sales.  When you break it down, a home run is actually the end result of many smaller events that add up to a baseball being hit over the fence.

All of these events have to happen for a home run to occur:

  • The ball has to clear the fence, which means:
    • The ball has to travel at least X feet
    • The fence has to be less than X feet from home plate
  • The ball has to be hit in the air (a fly ball)
  • The hitter has to have an at bat, which means:
    • The hitter has to have a plate appearance
    • The hitter has to swing
    • The hitter has to make contact (no swing and miss)

We could take this further, but you get the idea.
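To make the chain concrete, here is a minimal Python sketch that collapses the bullets above into three rates. The function name, the parameter names, and all of the rates are hypothetical, chosen only for illustration. They are not Goldschmidt’s actual numbers.

```python
def project_home_runs(plate_appearances, batted_ball_rate,
                      fly_ball_rate, hr_per_fly_ball):
    """Chain the smaller events together to arrive at a home run total.

    batted_ball_rate: share of plate appearances that end with contact
                      (no walk, strikeout, or hit by pitch)
    fly_ball_rate:    share of batted balls hit in the air
    hr_per_fly_ball:  share of fly balls that clear the fence
    """
    batted_balls = plate_appearances * batted_ball_rate
    fly_balls = batted_balls * fly_ball_rate
    return fly_balls * hr_per_fly_ball


# A made-up hitter (all four inputs are assumptions).
projected_hr = project_home_runs(
    plate_appearances=600,
    batted_ball_rate=0.70,
    fly_ball_rate=0.40,
    hr_per_fly_ball=0.15,
)
print(f"Projected home runs: {projected_hr:.1f}")
```

With these made-up inputs the hitter projects for about 25 home runs. Each input is something you can study and estimate on its own, which is the whole point of disaggregating.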

We Live In An Amazing Time

Fortunately, we have data available (for free!) to measure every bullet point above.  Sticking with our original Goldschmidt example:

(more…)

Case Study - Weighted Average Probabilities and Ryan Braun

Hindsight is 20-20.  We all know this.  And now that Ryan Braun has been suspended for his association with the Biogenesis scandal, it’s easy to say that we overvalued Braun in our draft preparation.  But let’s look back at what we knew in the preseason and use this as a learning opportunity to apply a lesson in weighted average probability and expected results.

What Did We Know?

News surfaced in early 2013 that Ryan Braun and numerous other players were associated with Biogenesis.  Documents were obtained that showed an official link between the players and the clinic.   There was speculation that the players involved could face suspensions during the season.

We didn’t know much more than this.  Would players miss 50 games?  100 games? Would the suspensions come down during the 2013 season?  Or after?  Could MLB even uncover enough evidence to support suspensions?

What Could Happen?

For Braun, we could reasonably assume he’d be the target of a 100-game suspension. He nearly received a 50-game suspension heading into the 2012 season (after a positive test in the fall of 2011), but managed to avoid it on a technicality.  So new evidence could push him from a first-time offender to a second-time offender (and a 100-game penalty).

Let’s Start A Basic Projection For Braun’s 2013 Season

If we are to build a projection for Braun’s 2013 season, a reasonable place to start would be to look at career averages.  Braun played a partial season in 2007 and played at least 150 games in each season from 2008 to 2012.  So let’s use these five “full seasons” and figure out the average production as our baseline estimate:

[Table: Ryan Braun’s season statistics, 2008-2012]

These average to 154 games, 672 plate appearances, 34 home runs, 105 runs, 109 RBI, and 22 SB.

But What If This Isn’t An Average Season?

We know Braun was nearly caught as a PED user in 2012. So what if he was scared into stopping his use of PEDs?  Can we build this into our estimate?

We don’t have any scientific data to understand the exact effect of PEDs.  So let’s throw out a rough guess and say we think the effect of stopping the use of PEDs would slightly decrease his production.  We’ll say his numbers would remain at 154 games and 672 plate appearances, but he drops to 25 HR, 90 R, 90 RBI, and 20 SB.

To summarize our two scenarios:

[Table: projected stat lines for the two scenarios]

How Likely Are These Scenarios To Occur?

You might have your own beliefs about the likelihood of each, but for the sake of example let’s say we think Braun is 90% likely to have another year in line with his past five seasons and 10% likely to experience a year where the effect of no PEDs drags his performance down some.

[Table: the two scenarios with their 90% and 10% probabilities]

And What If He Gets Suspended?

Again, for the sake of illustrating a simple example, assume a 50% chance Braun does not get suspended during the year and a 50% chance Braun misses half the season.

These 50-50 alternatives are subsets of our previous two scenarios.  So the 90% chance Braun has another average year now becomes a 45% chance (90% * 50%) he has a career average year and does not get suspended and a 45% chance he has a career average year and does get suspended.

Likewise, the 10% chance he sees a drop in productivity due to coming off PEDs is split into a 5% bucket of not being suspended and a 5% bucket of being suspended.

Regardless of the scenarios we lay out, we must remain at 100% total probability for all the possible outcomes.  Something has to happen.  And with 45, 45, 5, and 5, we’re still at 100%.

[Table: the four scenarios with their 45%, 45%, 5%, and 5% probabilities]
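If you want to check the bucket math, here is a small Python sketch. Multiplying the probabilities the way we did above treats the suspension as independent of the performance scenario, which is a simplifying assumption. The scenario labels and variable names are just names I picked for the example.

```python
from itertools import product

# Probabilities from the example above.
performance = {"5 Year Avg": 0.90, "No PEDs": 0.10}
suspension = {"No Suspension": 0.50, "Suspended": 0.50}

# Split each performance scenario into suspended / not suspended buckets.
joint = {
    f"{perf} - {susp}": p_perf * p_susp
    for (perf, p_perf), (susp, p_susp)
    in product(performance.items(), suspension.items())
}

for name, probability in joint.items():
    print(f"{name}: {probability:.0%}")

# Something has to happen, so the buckets must still add up to 100%.
total_probability = sum(joint.values())
assert abs(total_probability - 1.0) < 1e-9
```

The final check is worth keeping in any version of this you build. If the buckets don’t add up to 100%, a scenario is missing or double counted.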

Weighted Average Probability, Expected Results

Once you have probabilities for each possible outcome, it’s easy to calculate the total expected result.  We simply multiply the expected statistics for each scenario by the likelihood of that scenario.  This is the “weighting”.

Look at the 5 Year Avg – No Suspension example.  We have determined this scenario has a 45% chance of occurring.  45% multiplied by 672 plate appearances is 302.40.  45% multiplied by 34 home runs is 15.3.  And so on.

Here are the weighted averages of all scenarios:

[Table: weighted statistics for each scenario, with totals at the bottom]

Our overall expectation is the sum of the weighted scenarios.  You can see this total at the bottom of the table above.  After taking all possible scenarios and their probabilities into account, our estimate for Braun is 25 HR, 78 R, 80 RBI, and 16 SB.
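Here is the whole calculation as a short Python sketch. The stat lines and probabilities come from the scenarios above. The one assumption I’m adding is that missing half the season cuts every counting stat exactly in half. The helper names `scale` and `expected_line` are hypothetical.

```python
# Scenario stat lines from the article.
baseline = {"G": 154, "PA": 672, "HR": 34, "R": 105, "RBI": 109, "SB": 22}
no_peds = {"G": 154, "PA": 672, "HR": 25, "R": 90, "RBI": 90, "SB": 20}


def scale(stat_line, factor):
    """Scale every stat, e.g. 0.5 for a player who misses half the season."""
    return {stat: value * factor for stat, value in stat_line.items()}


# (probability, stat line) for each of the four scenarios.
scenarios = [
    (0.45, baseline),              # 5 Year Avg, no suspension
    (0.45, scale(baseline, 0.5)),  # 5 Year Avg, suspended half the season
    (0.05, no_peds),               # No PEDs, no suspension
    (0.05, scale(no_peds, 0.5)),   # No PEDs, suspended half the season
]


def expected_line(scenarios):
    """Weight each scenario by its probability and sum the results."""
    total_probability = sum(p for p, _ in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must add up to 100%")
    stats = scenarios[0][1].keys()
    return {stat: sum(p * line[stat] for p, line in scenarios)
            for stat in stats}


expected = expected_line(scenarios)
for stat, value in expected.items():
    print(f"{stat}: {value:.1f}")
```

Rounded to whole numbers, this lands on the same 25 HR, 78 R, 80 RBI, and 16 SB. Change a probability or a stat line and you can see right away how much the overall expectation moves.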

The Bigger Point

This approach of calculating weighted average probabilities can be used in many different scenarios.  Do you think there’s a 25% chance Troy Tulowitzki plays a full season, a 50% chance he plays 120 games, and a 25% chance he plays 80 games?  Do you think a rookie has a 25% chance of being called up in May, 25% in June, and 50% in July?  Do you think there’s a 50% chance a player will bat leadoff during the year and a 50% chance he’ll bat 9th?  Is there a 25% chance a rookie call-up will break onto the scene and be very productive, a 50% chance he’ll be an average player, and a 25% chance he’ll be sent back to the minors?

In any of these situations, estimate the outcome for each scenario, weight it by the probability of that scenario occurring, and add up the weighted results.
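As a quick illustration, here is the Tulowitzki question as a Python sketch. I’m assuming a “full season” means all 162 games, which is generous for any player, and `expected_value` is a hypothetical helper name.

```python
def expected_value(outcomes):
    """outcomes is a list of (probability, value) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to 100%")
    return sum(p * value for p, value in outcomes)


# 25% full season (assumed 162 games), 50% 120 games, 25% 80 games.
expected_games = expected_value([(0.25, 162), (0.50, 120), (0.25, 80)])
print(f"Expected games played: {expected_games}")
```

That works out to 120.5 games, which you could then use to scale a full-season projection up or down.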

Be Smart

Thanks for reading and continue to make smart choices.