Wednesday, October 19, 2011

How much does "Moneyball" help a team?

How much is sabermetrics worth to a team?

That's probably a hard question to answer. Every team uses statistics to some extent. Even before sabermetrics, teams were looking at player statistics to decide who to play and who not to play. They may not have had any fancy formulas, but they had a pretty good idea of how to weight the relative contributions of players. Nobody ever released a 30-HR guy because he was only hitting .240, and nobody ever released a .330 hitter because he had no power. Intuitive evaluations weren't perfect, of course, but they were pretty reasonable most of the time.

Where sabermetrics helps, I think, is not in evaluating actual performance, but in helping figure out *future* performance. How to extrapolate minor-league performance into major-league performance ... how to take luck out of a player's batting or pitching line ... figuring out how different kinds of players age ... that sort of thing.

Suppose you took a team's management right out of the early 1970s, and gave them a team today without letting them learn anything discovered after 1977. How much would that team underperform compared to the rest of MLB? I don't have an answer to that question, but I'd be interested in hearing yours.

Anyway, here's a narrower question. How much can a more sabermetric approach *today* benefit a team, compared to, say, the typical team's sabermetric approach? For instance, how much did Billy Beane really mean to the A's?

A couple of weeks ago, Tango did a study to figure out which teams did better or worse than expected, given their payroll. The A's were the team that outperformed the most over the last decade -- about 7 games per season, it looks like. That's a lot, but there's probably a whole bunch of luck there, since we're cherry-picking them as the best of the lot. Also, it's possible that much of their outperformance came in the early years, when, as many critics of "Moneyball" hype have pointed out, they had three underpriced ace starters.

So, we'd have to regress that 7 games to the mean a fair bit. If you forced me to make an arbitrary guess, I'd be willing to bet that less than half of that seven-game advantage came from sabermetrics. (But, I have no real basis for that guess without studying it.)


Anyway, with the Cubs signing Theo Epstein, we now have a market estimate for what sabermetrics might be worth today. Epstein's new agreement is for about $4 million per season. He still had one year to go on his contract with the Red Sox, for which they will receive some sort of compensation from the Cubs. Let's say that compensation will be worth $1 million. So Epstein's value is around $5 million. I don't know how much a replacement-level GM makes, by comparison. To be conservative, let's say it's $500,000, although it's probably more than that. That means Epstein's excess value is $4.5 million, exactly what it costs in free-agent players to gain one extra win.
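
Laid out as arithmetic, that works out like this (a quick sketch in code; the $1 million compensation, the $500,000 replacement-GM salary, and the $4.5 million per win are the rough guesses above, not actual figures):

```python
# Rough market value of Theo Epstein in wins per season, using the
# guesses from the paragraph above (not actual contract terms).
epstein_salary  = 4_000_000   # reported ~$4 million per season from the Cubs
compensation    = 1_000_000   # guessed value of the compensation owed to the Red Sox
replacement_gm  = 500_000     # guessed salary of a replacement-level GM
dollars_per_win = 4_500_000   # rough 2011 free-agent price of one win

excess_value = epstein_salary + compensation - replacement_gm   # $4.5 million
implied_wins = excess_value / dollars_per_win                   # 1.0

print(f"Implied value: about {implied_wins:.1f} win(s) per season")
```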

It looks like that's what Epstein is worth: one win per season.

Is that a lot? Frankly, I don't know. It's a competitive market for players these days, with lots of money on the line, and there's lots of random luck in who makes it and who doesn't. In that light, it could be that one win per season is an exceptional, genius-level performance.

If that's the case, doesn't it mean that the "Moneyball" approach is overrated? I mean, one win a year. At that rate, it would take decades, even centuries, to have good statistical evidence that the sabermetric approach works.

Of course, you have to remember that that's compared to other teams ... and, nowadays, those other teams are doing a fair amount of statistical work themselves. Maybe it's three or four games over a team that won't look at anything new at all, that never heard of Voros McCracken and winds up overpaying pitchers with lucky BABIPs. And, maybe Epstein took less pay than he was worth in order to become a Cub. Maybe it's a win and a quarter, or a win and a half.

Still ... to me, one game doesn't seem that unreasonable. The point might not be that you can win pennants just by embracing sabermetrics. The point might be that, with every team in a sabermetric arms race against every other team, you certainly can *lose* pennants if you persist in living in the '70s.

But, again ... one game. Doesn't that mean that if a team does well, and someone credits "Moneyball," they're probably just blowing smoke?


UPDATES:

1. In the comments, Bill Waite suggests that sabermetrically savvy managers might have a significant impact, too. He says that just rejigging the lineup is worth almost half a game a season, and that the difference between best and worst could be as much as eight games.

Food for thought. It would be interesting to figure out how to look for this in the historical record (if that's even possible), since we know that some managers are more numbers-oriented than others.

2. Matt Swartz e-mailed me about a study where he found a positive correlation between sabermetric management and team performance. It's here.



13 Comments:

At Wednesday, October 19, 2011 12:10:00 PM, Anonymous mettle said...

Interesting!
My intuition says that the A's 7-regressed-to-something-like-4 wins is a more reasonable estimate than the 1 win Theo is paid. So, it seems that there is an inefficiency there, perhaps?
I guess really figuring it out requires defining what Moneyball is or isn't, which seems pretty tricky. And in terms of evaluating individual GMs (regardless of philosophy), it seems quite impossible to establish a sample size (of what? trades? wins/$?) that will tell you anything, as you point out.

 
At Wednesday, October 19, 2011 1:51:00 PM, Anonymous Bill Waite said...

I'd have to guess that it's significantly more than 1 win. 1 win would be about a 0.625% improvement in single-game win percentage. I don't know about GMs off the top of my head, but I know a lot of MLB managers out there are losing a lot more than 5/8 of a percent through boneheaded in-game decisions.

Recently, people on Tango's blog have been pointing out the most obvious mistakes made by managers in the ALCS and NLCS, and it seems as if managers are making 2-3 boneheaded decisions per game with an expected loss of about 1% of a game per bad decision. (Small sample size, I know, but bad decisions don't seem all that rare to me.)
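
Taking those ballpark figures at face value, here's roughly what they add up to over a full season (a quick sketch; the 2-3 decisions per game and ~1% per decision are just the rough estimates above):

```python
# Season-long cost of routine in-game mistakes, using the ballpark
# figures above (2-3 bad decisions per game, ~1% of a win each).
decisions_per_game = 2.5     # midpoint of "2-3 boneheaded decisions"
cost_per_decision  = 0.01    # ~1% of a game per bad decision
games_per_season   = 162

wins_lost = decisions_per_game * cost_per_decision * games_per_season
print(f"Roughly {wins_lost:.1f} wins per season from in-game decisions alone")
```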

If we're talking about sabermetric perfection (where your team's proprietary model is somewhat more accurate than anything publicly available, and every decision you make on and off the field has been well studied) vs. complete idiocy on and off the field, the difference would be pretty large. Maybe 8-10 games per season? More?

If we're talking decently smart vs. extra-smart (where even the "replacement-level" GMs are at least smart enough to realize that OBP is more useful than batting average), the difference might be in the range of 1-4 wins. (And the concept of a "replacement-level" GM is silly; the worst GMs are bad because the owner doesn't recognize the difference between good and bad GMs, not because the worldwide pool of potentially competent GMs is that tiny.)

 
At Wednesday, October 19, 2011 2:26:00 PM, Blogger Phil Birnbaum said...

Bill,

That's a good point about in-game managerial decisions. I was thinking only about GMs, but managers might be a factor too. I always thought the biggest factor for managers was in choosing which players to put in the lineup ... I still do, but after reading some of MGL's posts lately, I'm starting to think that standard strategic decisions might be bigger than I thought.

Hmmmm ... 8-10 games per season? You may be right, but that seems like a lot. Where would they come from?

1-4 games seems to me like the right order of magnitude. My gut says the low end of that, a game or two. But, again, you may be right.

 
At Wednesday, October 19, 2011 2:55:00 PM, Blogger Phil Birnbaum said...

A related question, for Bill or anyone else: have there been smart GMs or managers in the past whose sabermetric sophistication got them lots of wins?

Like, Earl Weaver, good pitching and the three-run homer (and, presumably, OBP). How many wins was Earl worth, and why?

 
At Thursday, October 20, 2011 2:25:00 PM, Anonymous Bill Waite said...

Keep in mind that 8 games per season is only 5%, and I'm talking about best vs. worst.

A lot of that 5% comes from the idiocy of the worst managers, like those who IBB Miguel Cabrera w/ nobody on base.

The rest comes from the accumulation of lots of minor edges. For example, I recently updated my expected runs calculator (posted on Tango's blog under "Markov 2") to look at individual batters and show the effect of changing the batting order (w/o changing the identity of the starters). It's not polished and ready for publication, but it appears to show that the popular strategy of putting your superstar in the cleanup slot typically scores about .03 runs per game less than the optimal batting order.

That's a tiny amount, obviously, but it adds up to 4.86 runs per season, which is about half a win.
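
In runs-to-wins terms, the conversion looks like this (a quick sketch, using the usual rule of thumb of roughly ten runs per win; that conversion isn't part of my calculator):

```python
# Convert a small per-game run gain from batting-order changes into wins.
runs_per_game_gain = 0.03    # optimal order vs. superstar-batting-cleanup, per the calculator
games_per_season   = 162
runs_per_win       = 10.0    # standard rule of thumb, not from the Markov model

season_runs = runs_per_game_gain * games_per_season   # 4.86 runs
season_wins = season_runs / runs_per_win              # ~0.49 wins

print(f"{season_runs:.2f} runs, or about {season_wins:.2f} wins, per season")
```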

And that's just one minor thing that I threw together with less than a week of actual work.

I imagine that a great statistics-minded manager with a handful of programmers working for him could come up with several minor unpublished findings like that, and they would all add up to several wins per season.

 
At Thursday, October 20, 2011 2:51:00 PM, Anonymous Bill Waite said...

I don't know much about GMs, though. I imagine there are GMs out there making objectively bad draft decisions, and I imagine there's a blogger out there complaining about all the bad decisions, but I personally don't know anything about drafting, trading, free agent contract negotiations or anything like that.

 
At Thursday, October 20, 2011 4:04:00 PM, Anonymous mettle said...

To begin, a few of the problems with using $/win (or WOWY) as a way of evaluating GMs:
1) GMs don't have the chance to build their system from scratch, and presumably it would take something like 6 years for the org to completely reflect their decisions. So, most GMs' performance is based on the decisions made by the previous guy; you can only really start to look at $/win after year 4 or 5, and how many GMs make it that far? Removing the first 5 years kills your sample size.

2) You really want to look at (payroll + development costs)/win and not just payroll, since isn't the GM in charge of the farm system and hiring the people running the farm as well?

3) Is it the GM that's responsible for marketing the team and hiring all the people whose job it is to extract revenue from fans? If so, that should somehow be reflected.

Therefore, I think you need to determine the GM value-added as reflecting signing the correct FAs and making the correct trades and drafting the right players (and perhaps #3). This would be a huge pain, but I don't see any other way.

I guess we would start by defining the average GM as having a net WAR+/- of 0, right? So, that would reflect getting equal return in trades for what's given up, and for paying the right amount for FAs.

It's not clear to me that we want or need the notion of a replacement-level GM here, especially because there are so few, right?

Then, you'd need to essentially evaluate every move they made, but again, you could only do that many, many years after the fact. For example, the Bagwell-Andersen trade wouldn't have looked too terrible for Lou Gorman (only -13 WAR) when he was fired in 1993.

Not an easy problem...

 
At Friday, October 21, 2011 2:52:00 PM, Anonymous Bill Waite said...

I think the best way to evaluate GMs (given small sample size and long lag time) is to look at the expected return on their decisions based on the most accurate published statistical model. There must be some simple and effective models for predicting how well a college player will perform in the MLB given his college stats, and I'm sure that some GMs are routinely making draft decisions that are unquestionably wrong.

It wouldn't help you distinguish between the best and second-best GMs, given that they will probably both make reasonable decisions according to the existing models AND the best GM should know a few things we don't know, but it would at least give you an idea of the order of magnitude of the difference between a competent GM and the worst GMs.

 
At Friday, October 21, 2011 3:04:00 PM, Anonymous Bill Waite said...

But mettle makes a good point; hiring good coaches and creating a good farm system is very important and would be very difficult to measure.

On another note, it occurs to me that because GM value is so much harder to measure than player value, the best GMs SHOULD be dramatically underpaid compared to players with the same win value.

Also, since GMs don't have the same PR value as a famous athlete, they are genuinely worth less than a famous athlete who provides the same number of wins.

And since the value of a great organization comes from not just the one guy in the GM spot, but from every guy working under him, the GM's salary shouldn't be anywhere near the full win value of the organization he puts together.

So a GM getting paid what appears to be the equivalent of one win is not evidence that a good organization is worth one win; it's evidence that a good organization is probably worth quite a bit more than one win. (Just looking at salary in a vacuum, I'd say maybe 3-4 wins?)

 
At Monday, October 24, 2011 5:57:00 PM, Blogger Don Coffin said...

I'm really late to this, for which I apologize. I took the data from Matt Swartz's study and regressed actual wins (WINS) on his measure of sabermetric management (SABR) and payroll in millions (PAY):

WINS = 65.2 + 1.2*SABR + 0.14*PAY
(t-statistics: 12.5 for the intercept, 0.6 for SABR, 4.5 for PAY)

Adjusted R-sq = 0.44

So the measure of sabr-ness is not statistically significant in this extremely simple model. However, taking the coefficient at face value, it suggests that the difference between the most sabermetric team (Red Sox, 3.93) and the least (KC, 1.7) is 2.23*1.2 = about 2.7 wins a year. The difference between best and average would be about 1.7. For whatever this is worth, which (I would suggest) is not a whole lot...

 
At Monday, October 24, 2011 6:01:00 PM, Blogger Don Coffin said...

Incidentally, the regression also suggests that, based on whatever year those data were for, the payroll difference, highest (Yankees; $215 million) to lowest (Marlins; $35 million), would be worth about 25 wins...
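
For anyone who wants to reproduce the setup, here's a minimal sketch (the team rows below are made-up placeholders; the real inputs would be Swartz's SABR ratings, payrolls, and win totals for all 30 teams):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical placeholder rows -- the real data would be Swartz's sabermetrics
# rating (SABR), payroll in $ millions (PAY), and actual wins for all 30 teams.
teams = pd.DataFrame({
    "SABR": [3.93, 1.70, 2.40, 2.90, 2.10, 3.10],
    "PAY":  [160.0, 36.0, 95.0, 210.0, 60.0, 120.0],
    "WINS": [90, 67, 81, 95, 72, 86],
})

X = sm.add_constant(teams[["SABR", "PAY"]])
fit = sm.OLS(teams["WINS"], X).fit()
print(fit.params)    # intercept, SABR and PAY coefficients
print(fit.tvalues)   # the t-statistics reported in parentheses above

# Implied spreads using the coefficients reported above
# (1.2 wins per point of SABR, 0.14 wins per payroll million):
print((3.93 - 1.70) * 1.2)   # best-to-worst sabermetric rating: ~2.7 wins
print((215 - 35) * 0.14)     # Yankees-to-Marlins payroll: ~25 wins
```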

 
At Monday, October 24, 2011 6:10:00 PM, Blogger Phil Birnbaum said...

Thanks, Doc! I was wondering about the coefficient myself. 2.7 wins from best to worst does seem reasonable, or at least in line with Theo Epstein being paid 1 win.

BTW, the $7 million per win you came up with is consistent with previous regressions for other seasons.

 
At Wednesday, October 26, 2011 5:17:00 PM, Anonymous Bill Waite said...

Doc's analysis is a good one, but I would argue that Swartz's methodology (surveying his coworkers about the PERCEPTION of whether sabermetrics is used often or rarely) is likely to understate the impact sabermetrics has on teams that use it well and overstate the role of sabermetrics on teams that spout a lot of PR bullshit without fully understanding the statistics they're trying to use. It may also fail to capture the difference between the true worst teams and sanely managed teams with a non-sabermetric public image.

But, given the baseline number of 2.7, I would be surprised if the actual best management in the league is more than 5.4 payroll-adjusted wins better than the actual worst management in the league.

I still believe that a team COULD do better than that; there are just SO MANY decisions that a large organization like that has to make, and if you could squeeze 0.1 extra wins out of every draft pick and every trade, an extra $100,000 out of every contract negotiation, .02 extra runs out of every pitching change, etc., there just have to be a ton of wins on the table.

But maybe the practical realities of running a large organization prevent teams from getting more than 60-70% coverage of their decisions; maybe some of the decisions I'm talking about cost more to figure out than you could gain from getting them right. I don't know.

 
