Tuesday, February 21, 2012

Why is the SD of the sum proportional to the square root?

(Warning: Math/teaching statistics post. No sports this time.)

----

If you take the sum of two identical independent variables, the SD of the sum is *not* two times the SD of each variable. It’s only *the square root of two* times the SD.

There’s a mathematical proof of that, but I’ve always wondered if there was an intuitive understanding of why that is.

First, you can get a range without doing any math at all.

It seems obvious that when you add up two variables, you’re going to get a wider spread than when you just look at one. For instance, you can see that it’s easier to go (say) 10 games over .500 over two years than over just one year. You can see that if you roll one die, it’s hard to go two points over or under the average (you have to roll a 1 or 6). But if you roll 100 dice, it’s easy to land at least two points away from the average of 350. (You can roll anything except 349, 350, or 351.)

So, the spread of the sum is wider than the spread of the original. In other words, it's more than 1.00 times the original.

Now, if you just doubled everything, it’s obvious that the multiplier would be 2.00, that the curve would be twice as wide. A team that goes +10 over one season will go +20 over two seasons. The team that goes -4 will go -8. And so on. So, if you do that, it's exactly 2.00 times the original.

But, in real life, you don't just double everything -- there’s regression to the mean. The team that goes +10 one random season probably will go a lot less than +10 the second random season. And if you roll 6 on the first die, you're probably going to roll less than 6 the second die. So the curve will be less stretched out than if you just doubled everything.

That means the multiplier has to be less than 2.00.

So, the answer has to be something between 1.00 and 2.00. The square root of two is 1.41, which fits right in. It seems reasonable. But why exactly the square root of two? Why not 1.5, or 1.3, or 1.76, or Pi divided by 2?
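Before hunting for intuition, it's easy to check the claim itself by brute force. Here's a minimal sketch that lists all 36 equally likely outcomes of two dice and compares the SD of the sum to the SD of a single die. The variable names are just mine, for illustration.

```python
from itertools import product
from statistics import pstdev

faces = range(1, 7)

# Every equally likely outcome for one die, and for the sum of two dice.
one_die = list(faces)
two_dice = [a + b for a, b in product(faces, repeat=2)]

# Population SD, since we are listing the whole distribution, not a sample.
sd_one = pstdev(one_die)
sd_two = pstdev(two_dice)
ratio = sd_two / sd_one

print(f"SD of one die:      {sd_one:.3f}")   # 1.708
print(f"SD of sum of two:   {sd_two:.3f}")   # 2.415
print(f"Ratio:              {ratio:.3f}")    # 1.414
```

So the multiplier really is the square root of two, not 1.5 or 1.3. The code confirms the number, but it doesn't explain it.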

I’m looking for an intuitive way to explain why it’s the square root of two. I’ve come up with two different ways, but I’m not really happy with either. They’re both ways in which you can see how the square root comes into it, but I don’t think you really *feel* it.

Here they are anyway. Let me know if you have improvements, or you know of any others. I’m not looking for a mathematical proof -- there are lots of those around -- I’m just looking for an explanation that lets you say, “ah, I get it!”

-------

Explanation 1:

First, I’m going to cheat a bit and use something simpler than a normal distribution. I’m going to use a normal six-sided die. That’s because of my limited graphics skills.

So, here’s the distribution of a single die. Think of it as a bar graph, but using balls instead of bars.


Part of my cheating is that I’m going to use the shortcut that the SD of a distribution is proportional to its horizontal width. That’s not true for normal distributions, but if you pretend it is, you’ll still get the intuitive idea.

Now, since we’re adding two dice, I’m going to prepare a little addition table with one die on the X axis, and another on the Y axis. The sums are in white:




Now, I’m going to take away the axes, and just leave the sums:




The balls represent the distribution of the sum of the dice. We want the standard deviation of this distribution. That is, we want to somehow measure its spread.

We can’t do it just like this, because the sums seem scattered around, instead of organized into the graph of a distribution. But we can fix that, just by turning it 45 degrees. I’ll also add some color, to make it easier to see:




See? Now the distribution is in a more familiar format. All the 2s are in a vertical line, and all the 3s, 4s, and so on. (Well, they should be exactly vertical, but they’re a bit off … my graphics abilities are pretty mediocre, so I couldn’t get that square to be exactly square. But you know what I mean.)

It’s like the usual bar graph you see of the distribution of the sum of two dice, except that the bar extends above and below the main axis, instead of just above. (If you want, imagine that the column of 7s is sitting on the floor. Then let gravity drop all the other columns down to also rest on the floor. That will give you the more standard bar graph.)

Now, in the above diagram, look at the main horizontal axis, the one that goes 2-4-6-8-10-12. The length of that axis is the spread of the graph, the one that we’re using to represent the standard deviation. What’s that length?

Well, it’s the hypotenuse of a right triangle, where the two sides are the spread of the original die.



By the Pythagorean theorem -- the real one, not the baseball one -- the diagonal must be exactly the square root of two times the original.

As I said, I’m not thrilled with this, but it kind of illustrates where the square root comes from.

-----

Explanation 2:

If I just take one die and double it, I get twice the SD. It looks like this:



The blue and green are the two SDs of 1. The pink line just goes from beginning to end, and its length represents the SD of the sum. Obviously, that SD is 2.

Now, suppose I take one die but, instead of just doubling it for the sum, I add the amount on the bottom of the die. I always get 7 (because that’s how dice are designed). That means the bottom is perfectly negatively correlated with the top. The variance of the top is 1, the variance of the bottom is 1, but the variance of the sum is zero (since the sum is always the same). That looks like this, with the "second die" arrow going exactly the opposite direction of the first. The pink line isn't a line at all -- which is to say, it's a line of length zero, since the beginning is the same as the end.



Now, what if I take the one die and roll it again? Then, the second die is completely independent of the first die. It doesn’t go right, and it doesn't go left. It has to go in a direction that’s independent of the first direction. Like, straight up:

Now, the distance from beginning to end is the hypotenuse of the triangle, which is the square root of 2! Which is what we were trying to show.
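For what it's worth, the three pictures are all special cases of one standard formula: the variance of a sum is the two variances added together, plus twice the correlation times the product of the two SDs. The correlation plays the role of the cosine of the angle between the arrows. Here's a rough sketch, with each SD scaled to 1 as in the diagrams; the function name is made up for illustration.

```python
import math

def sd_of_sum(sd_a, sd_b, correlation):
    """SD of A + B, given each SD and the correlation between A and B."""
    variance = sd_a**2 + sd_b**2 + 2 * correlation * sd_a * sd_b
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

doubled = sd_of_sum(1, 1, 1.0)           # same die counted twice: arrows in line
top_plus_bottom = sd_of_sum(1, 1, -1.0)  # top plus bottom: arrows opposite
independent = sd_of_sum(1, 1, 0.0)       # second roll: arrows at right angles

print(doubled, top_plus_bottom, round(independent, 3))
```

The three cases come out to 2, 0, and the square root of 2, matching the three diagrams.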

----------

As I said, I’m not thrilled with these explanations. Are there better ones?




Wednesday, February 15, 2012

Absence of evidence vs. evidence of absence

People tell me that Albert Pujols is a better hitter than John Buck. So I did a study. I watched all their at-bats in August, 2011. I observed that Pujols hit .298, and Buck hit .254.

Yes, Pujols' batting average was better than Buck's, but the difference wasn't statistically significant. In fact, it wasn't even close: it was less than 1 standard deviation!

So, clearly, August's performance shows no evidence that Pujols and Buck are different in ability.

Does that sound wrong? It's right, I think, at least as I understand how things work in the usual statistical studies. If you fail to reject the null hypothesis, you are entitled to use the words "no evidence."

Which is a little weird, because, of course, it *is* evidence, although perhaps *weak* evidence. I suppose they could have chosen to say "not enough" evidence, or "insufficient" evidence, but that carries with it an implication that the null hypothesis is correct. If I say, "the study found no evidence that whites are smarter than blacks," that sounds fine. But if I say, "the study found insufficient evidence that whites are smarter than blacks," that sounds racist.

The problem is, if you don't really know what "no evidence" really means, you might get the wrong impression. You might have 25 different studies testing whether Pujols is better than Buck, each of them using a different month. They all fail to reject the hypothesis that they're equal, and they all say they found "no evidence". (That's not unlikely: to be significant at .01 for a single month, you'd have to find Pujols outhitting Buck by about 200 points.)

And you think, hey, "25 studies all failed to find any evidence. That, in itself, is pretty good evidence that there's nothing there."

But, the truth is, they all found a little bit of evidence, not *no* evidence. If you multiply *no* evidence by 25, you still have *no* evidence. But if you multiply a little bit of evidence by 25, now you have *enough* evidence.
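Here's a small sketch of how that multiplication works, under simplifying assumptions: the 25 months are independent, the same size, and each shows the same average difference. Pooling them leaves the estimated difference where it was, but shrinks its SD by the square root of 25. The single-month figure of 0.8 SD is hypothetical; all I actually said was "less than 1 SD."

```python
import math

def combined_z(z_single, n_studies):
    """z-score after pooling n independent, equal-sized studies
    that each show the same average effect."""
    return z_single * math.sqrt(n_studies)

z_one_month = 0.8                      # hypothetical single-month result
z_25_months = combined_z(z_one_month, 25)

print(f"One month: {z_one_month:.1f} SD")
print(f"25 months: {z_25_months:.1f} SD")
```

Under those assumptions, 25 "no evidence" months add up to about 4 SD, which nobody would call "no evidence."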

------

There's an old saying, "absence of evidence is not evidence of absence." The idea is, just because I look around my office and don't see any proctologists or asteroids, it doesn't mean proctologists or asteroids don't exist. I may just not be looking in the right place, or looking hard enough. Similarly, if I look at only one month of Pujols/Buck, and I don't see a difference, it doesn't mean the difference isn't there. It might just mean that I'm not looking hard enough.

This is the point Bill James was making in his "Underestimating the Fog." We looked for clutch hitting, and we didn't find it. And so we concluded that it didn't exist. But ... maybe we just need to look harder, or in different places.

What Bill was asking is: we have the absence of evidence, but do we have the evidence of absence?

------

Specifically, what *would* constitute evidence of absence? The technically-correct answer: nothing. In normal statistical inference, there's actually no evidence that can support absence.

Suppose I do a study of clutch hitting, and I find it's not significantly different from zero. But ... my parameter estimate is NOT zero. It's something else, maybe (and I'm making this up), .003. And maybe the SD is .004.

If I think clutch hitting is zero, and you think it's .003, we can both point to this study as confirming our hypotheses. I say, "look, it's not statistically significantly different from zero." And you say, "yeah, but it's not statistically significantly different from .003 either. Moreover, the estimate actually IS .003! So the evidence supports .003 at least as much as zero."

That leaves me speechless (unless I want to make a Bayesian argument, which let's assume I don't). After all, it's my own fault. I didn't have enough data. My study was incapable of noticing a difference between .000 and .003.

So I go back to the drawing board, and use a lot more data. And, this time, I come up with an estimate of .001, with an SD of .002.

And we have the same conversation! I say, "look, it's not different from zero." And you say, "it's not different from .001, either. I still think clutch hitting exists at .001."

So I go and try again. And, every time, I don't have an infinite amount of data, so, every time, my point estimate is something other than zero. And every time, you point to it and say, "See? Your study is completely consistent with my hypothesis that clutch hitting exists. It's only a matter of how much."
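To make the stalemate concrete, here's a quick sketch using the two made-up studies above. It measures how many SDs each estimate sits from my hypothesis (zero) and from yours (the point estimate itself). The helper name is hypothetical.

```python
def sds_away(estimate, sd, hypothesis):
    """How many SDs the estimate is from a hypothesized true value."""
    return abs(estimate - hypothesis) / sd

# (estimate, SD) for the two made-up clutch studies in the text.
studies = {
    "first study": (0.003, 0.004),
    "bigger study": (0.001, 0.002),
}

for name, (estimate, sd) in studies.items():
    from_zero = sds_away(estimate, sd, 0.0)
    from_estimate = sds_away(estimate, sd, estimate)
    print(f"{name}: {from_zero:.2f} SD from zero, "
          f"{from_estimate:.2f} SD from {estimate}")
```

Neither study is anywhere near 2 SD from zero, but each is exactly 0 SD from its own estimate, so both of us get to claim it.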

------

What's the way out of this? The way out of this is to realize that you can't use statistics to prove a point estimate. The question, "does clutch hitting exist?" is the same as the question "is clutch hitting exactly zero?". And, no statistical technique can ever give you an exact number. There will always be a standard error, and a confidence interval, so it will always be possible that the answer is not zero.

You can never "prove" a hypothesis about a single point. You can only "disprove" it. So, you can never use statistical techniques to demonstrate that something does not exist.

What we should be talking about is not existence, but size. We can't find evidence of absence, but we can certainly find evidence of smallness. When an announcer argues for the importance of being able to step up when the game is on the line, we can't say, "we studied it and there's no such thing". But we *can* say, "we studied it, and even under the most optimistic assumptions, the best clutch hitter in the league is only going to hit maybe .020 better in the clutch ... and there's no way to tell who he is."

Or, the short form -- "we studied it, and the differences between players are so small that they're not worth worrying about."

------

But aren't there issues where it's important to actually be able to disprove a hypothesis? Take, for instance, ESP. Some people believe they can do better than chance at guessing which card is drawn from an ESP deck.

If we do a study, and the subject guesses exactly what you'd expect by chance, you'd think that would qualify as a failure to find ESP. But when you calculate the confidence interval, centered on zero, you might have to say, "our experiment suggests that if ESP exists, its maximum level is one extra correct guess in 10,000."

And, of course, the subject will hold it up, and triumphantly say, "look, the scientists say that I might have a small amount of ESP!!"

What's the solution there? It's to be common-sense Bayesian. It's to say, "going into the study, we have a great deal of "evidence of absence" that ESP doesn't exist -- not from statistical tests, but from the world's scientific knowledge and history. If you want to challenge that, you need an equal amount of evidence."

That makes sense for ESP, but not for clutch hitting. Don't we actually *know* that clutch hitting talent must exist, even at a very small level? Every human being is different in how they respond to pressure. Some batters may try to zone out, trying to forget about the situation and hit from instinct. Some may decide to concentrate more. Some may decide to watch the pitcher's face between pitches, instead of adjusting their batting glove.

Any of those things will necessarily change the results a tiny bit, in one direction or the other. Maybe concentration makes things worse, maybe it makes it better. Maybe it's even different for different hitters.

But we *know* something has to be different. It would be much, much too coincidental if every batter did something different, but the overall effect is exactly .0000000.

Clutch hitting talent *must* exist, although it might be very, very small.

So why are we so fixated on zero? It doesn't make sense. We know, by logical argument, that clutch hitting can't be exactly zero. We also know, by logical argument, that even if it *were* exactly zero, it's impossible to have enough evidence of that.

When we say "clutch hitting doesn't exist," we're using it as a short form for, "clutch hitting is so small that, for all intents and purposes, it might as well not exist."

------

When the effect is small, like clutch hitting, it's not a big deal. But when the effect might be big, it's a serious issue.

A lot of formal studies -- not just clutch hitting or baseball -- will find they can't reject the null hypothesis. They usually say, "we found no evidence," and then go on to assume that what they're looking for doesn't exist.

They'll do a study on, I don't know, whether an announcer is right that playing a day game after a night game affects you as a hitter. And they'll get an estimate that says that batters are 40 points of OPS worse the day after. But it's not statistically significant. And they say, "See? Baseball guys don't know what they're talking about. There's no evidence of an effect!"

But that's wrong. Because, unlike clutch hitting, the confidence interval does NOT show an effect that "for all intents and purposes, might as well not exist." The confidence interval is compatible with a *large* effect, of at least 80 points. (That is, since 2 SD is enough to drop from 40 points to zero on one side, it's also enough to rise from 40 points to 80 points on the other side.)
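A minimal sketch of that parenthetical, assuming the borderline case where the SD is exactly 20 points (a hypothetical figure; the only thing known is that 40 points wasn't significant, so the SD was at least that big):

```python
def two_sd_interval(estimate, sd):
    """Rough 95% interval: the estimate plus or minus 2 SD."""
    return estimate - 2 * sd, estimate + 2 * sd

estimate = 40   # points of OPS, from the example in the text
sd = 20         # hypothetical: the smallest SD that leaves 40 non-significant

low, high = two_sd_interval(estimate, sd)
print(f"Interval: {low} to {high} points of OPS")
```

Any larger SD only pushes the top of the interval further past 80.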

So it's not that there's evidence of absence. There's just absence of evidence.

And that's because of the way they did their study. It was just too small to find any evidence -- just like my office is too small to find any asteroids.




Tuesday, February 14, 2012

Two new "Moneyball"-type possibilities

I'm usually doubtful that significant "Moneyball"-type inefficiencies still exist in sports. But, recently, two possibilities came up that got me wondering.

First, in a discussion about baseball player aging, commenter Guy suggested that there are lots of good young players kept in the minors when they're good enough to be playing full-time in the majors. He mentions Wade Boggs, whom the Red Sox held back in the early 80s in favor of Carney Lansford.

It's certainly a possibility, especially when you consider the Jeremy Lin story. Of course, baseball and hockey are different from basketball and football, because they have minor leagues in which players get to show their stuff. But, still.

Second, and even bigger, is something Gabriel Desjardins discovered.

For the past several seasons, the NHL has been keeping track of the player who draws a penalty -- that is, the victim who was fouled. Desjardins grabbed the information and tallied the numbers.

Most of the players near the top of the list are who you would expect -- Crosby, Ovechkin, and so on. But the runaway leader is Dustin Brown, of the Los Angeles Kings.


Over the past seven seasons, Brown drew 380 opposition penalties. Ovechkin was second, with 255; Ryan Smyth was twentieth, at 181.

That means the difference between first and second place was almost twice the difference between second and twentieth place. Dustin Brown is exceptionally good at getting his team a power play.

Desjardins writes,

"Incidentally, 380 non-coincidental penalties is worth roughly $33M in 2012 dollars relative to the league average, and quite a bit more relative to replacement level. ... Dustin Brown has made roughly $15M so far in his career, making him one of the biggest deals in the entire league."


Wow. If you had tried to convince me that you could find an official NHL stat that would uncover $33 million worth of hidden value, I wouldn't have believed you. But there it is.





Wednesday, February 08, 2012

A research study is just a peer-reviewed argument (part II)

I've always said that a regression doesn't speak for itself. A regression is just manipulated data. To support a hypothesis, you need more than just data: you need an argument about why that data matters.

I wrote about that here, when I said that a research paper is just a peer-reviewed argument. Some commenters disagreed. They argued that science is, and has to be, objective -- whereas, arguments are always subjective.

Having thought about it further, I don't understand how it isn't more obvious that there's always a subjective argument involved. At the very least, if you find a significant association between X and Y, you have to at least suggest whether X causes Y, whether Y causes X, or whether something else causes both.

So, I don't get it. For those of you who don't believe that studies need to argue subjectively, what is it you're thinking?

---

Here's an example to let you be specific. It's an imaginary regression, where A, B, and C are used to predict X. I'm assuming .05 is the threshold for significance, but if you prefer a different level, feel free to change the p-values accordingly.

Here are the independent variables, the coefficients, and the significance levels. An asterisk means the value is significantly different from zero.

A +0.15 p=0.05 *
B +0.13 p=0.08
C +0.16 p=0.04 *

What can you conclude?

Sure, you can say, "a unit increase in A was associated with a 0.15 increase in the dependent variable X, and that was statistically significantly different from zero." But that's not really a conclusion, that's just reading the results right off the regression. Papers wouldn't have a "conclusions" section if that was all they contained.

So, now, let me ask you: what would you write in your conclusions that's not subjective?


Tuesday, February 07, 2012

A Don Cherry / Darryl Sittler tracer

Today is the 36th anniversary of Darryl Sittler's 10-point night, and a few websites (like this one) are linking to a Don Cherry clip where he talks about the game.

Cherry says that the Leafs "showed no mercy" on the Bruins, and says you should never embarrass another team, because it might come back to haunt you. He says that the Leafs gave Sittler a silver tea set after the game, "for murdering us like that. I got the paper, I cut out the picture, and every time we came to Maple Leaf Gardens after that, I put this picture up ..."

"For the next three years, we never lost to the Leafs at the Gardens."

So as not to be accused of picking on Ken Dryden, I looked it up.

The presentation of the tea set was Friday, April 9, 1976, before a playoff game against the Penguins (according to the next day's Globe and Mail). The next two games in Toronto against the Bruins were actually Leaf wins:

12/09/76: Leafs 7, Bruins 5
11/27/76: Leafs 4, Bruins 2

But after that, the Leafs didn't beat the Bruins at home until the 1982-83 season:

03/26/77: Bruins 7, Leafs 5
11/19/77: Bruins 3, Leafs 1
02/15/78: Bruins 4, Leafs 2
04/08/78: Bruins 3, Leafs 1
10/28/78: Bruins 5, Leafs 3
12/05/78: Bruins 5, Leafs 1
12/27/78: Bruins 1, Leafs 1
04/04/79: Bruins 3, Leafs 3
11/17/79: Bruins 2, Leafs 0
04/02/80: Bruins 5, Leafs 2
12/27/80: Bruins 6, Leafs 3
03/11/81: Bruins 3, Leafs 3
11/21/81: Bruins 5, Leafs 3

10/27/82: Leafs 4, Bruins 1

So, if Cherry remembers the events correctly, it must be that he started posting the photo only after the game of 11/27/76. And, although it was at least five years until the Leafs won again, rather than just three, Cherry left the Bruins after the 1978-79 season, so he might just be referring to his own tenure there.

Thanks to the Hockey Summary Project for the scores.


(Note: This was cross-posted to the SIHR mailing list.)



Friday, February 03, 2012

Bettors don't regress to the mean enough, investment firm claims

An investment firm has a Super Bowl prediction method that they think can beat the spread, according to this Bloomberg story.

The firm, Analytic Investors LLC, does this: for each of the NFL's 32 teams, they figure out how much money you would have made betting that team during the regular season (betting them outright, it appears, not betting them against the spread).

In 2011, betting the 49ers would have generated an investment return of 52.9 percent (I don't know how that's calculated, but that's all they say), making them the team with the highest "alpha". The Colts were the worst, at minus 57.6 percent. The Giants were +32.3 percent, and the Patriots +16.1 percent.

For the Super Bowl, the method says, you should bet on the team that returned less during the regular season. The reasoning is that bettors who made more money off their team are "overreacting to information," so the team with the higher return is the one to avoid.

"[The Giants] have been the hotter team. They are like the cocktail party stock that everyone’s talking about, that some people have made a lot of money on."


OK, fair enough. But ... is there evidence that this works? The article doesn't give any, except to say that it's beaten the spread for the last eight consecutive Super Bowls. That doesn't mean much, of course, since nobody's claiming the system is accurate enough for eight in a row to be expected.

(Suppose the method predicts with a 60% success rate, which seems way optimistic. Then the chance of 8 in a row is around 1 in 60. At 50%, the chance is 1 in 256.)
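The arithmetic in that parenthetical is just the success rate multiplied by itself eight times, assuming each year's pick is independent of the others. A quick sketch (the function name is made up):

```python
def one_in(success_rate, streak=8):
    """Odds against a streak, as "1 in N", for independent picks."""
    return 1 / success_rate ** streak

print(f"60% picker: 1 in {one_in(0.6):.0f}")   # 1 in 60
print(f"50% picker: 1 in {one_in(0.5):.0f}")   # 1 in 256
```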

This is probably just a publicity stunt to get some exposure for the firm. But it seems like an interesting hypothesis to check out. If a team outperforms expectations during the regular season, it probably did so by luck. And, it seems reasonable to suggest that maybe bettors misinterpret that luck as skill, and overweight the team's future chances.

You'd need a bigger study, of course. Suppose every year you looked at the last three games of the season, for the top 6 and bottom 6 teams in terms of "alpha" in the earlier weeks. That would give you maybe 25 games a year, 500 games over twenty years, 250 games for each group. Worth a shot. I don't have NFL data or betting line data, but I'm sure someone out there does.



Hat tip: Freakonomics



Monday, January 30, 2012

Do NHL teams get a boost after killing a two-man advantage?

In an OHL game I was watching the other day, one of the teams had a two-man advantage and didn't score. The announcer expected the shorthanded team to get a boost from having killed off the penalties, as conventional wisdom says it should.

Is conventional wisdom right? Now that I have access to a database of NHL games (thanks again to the Hockey Summary Project), I was able to check.

This study is basically the same format as the study I did on fights a few weeks back. I found all games from 1967-68 to 1984-85 where one team killed off a two-man advantage (of any length). Then, for each one, I found a random control game, which matched the score differential and the relative quality of the home and road teams. When I was done, I had two pools, each consisting of 1,703 games.

The teams that killed the penalties scored an average 0.26 more goals than their opponents from that point to the end of the game (actually, to the 17:00 mark of the third period). On the other hand, the control teams scored only 0.12 more goals than their opponents.

That's statistically significant, at almost exactly 2 SDs.

I'll put that in chart form to make it easier to read, along with the SD. I use the term "killing teams" to mean the ones that actually killed off the two-man advantage.

Killing teams .... +0.26 goals (+/- 0.05)
Control teams .... +0.12 goals (+/- 0.05)
------------------------------------------
Difference ....... +0.14 goals (+/- 0.07)
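The SD of the difference in the chart comes from the usual square-root rule for two independent quantities: square each SD, add, and take the square root. Here's a small sketch, assuming the two pools are independent (which is only approximately true):

```python
import math

killing_mean, killing_sd = 0.26, 0.05
control_mean, control_sd = 0.12, 0.05

difference = killing_mean - control_mean
# SDs of independent quantities add "in quadrature," not in a straight line.
sd_difference = math.sqrt(killing_sd**2 + control_sd**2)
z = difference / sd_difference

print(f"Difference: {difference:+.2f} goals (+/- {sd_difference:.2f})")
print(f"That's {z:.1f} SDs from zero")
```

Two SDs of 0.05 combine to about 0.07, not 0.10, which is why a difference of 0.14 lands at almost exactly 2 SDs.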

At six goals per win, you'd have expected the extra goals to have resulted in around 40 extra wins. They actually resulted in the equivalent of 26 extra wins: 30 extra wins, minus half a win for each of the 8 fewer ties:

Killing teams .... 836-604-263
Control teams .... 806-626-271
------------------------------------
Difference ....... +30 wins, -8 ties

So, should we conclude that killing off a two-man advantage causes a psychological boost? Well, not so fast. Because, after you take two consecutive penalties, the referee is very likely to try to even things up by giving future penalties to the other team.

The difference of +0.14 goals is almost exactly what you'd get from a single power play. So, if the result of surviving a two-man advantage is that you get one extra "free" power play in the remainder of the game, that would explain the results exactly.

As it turns out, it's not quite that high. It's only half that high. On average, the teams that survived being shorthanded two men got about half an extra power play in the remainder of the game:

Killing teams ... +.346 power plays rest of game
Control teams ... -.130 power plays rest of game
------------------------------------------------
Difference ...... +.476 power plays rest of game

That leaves about 0.07 goals per game as the unexplained difference. It's only 1 SD, which is no longer statistically significant. It's about the effect of half a power play. Or, with an average save percentage of .900, it works out to 7/10 of an additional shot on goal.
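Here's the back-of-the-envelope version of that adjustment, as a sketch. It takes a power play to be worth 0.14 goals (the "almost exactly what you'd get from a single power play" figure above), and assumes that an extra 0.07 goals has to come from extra shots that beat a .900 goalie.

```python
goal_difference = 0.14          # killing teams minus control teams
goals_per_power_play = 0.14     # rough value of one power play, from the text
save_percentage = 0.900

# Power-play edge of killing teams over control teams, rest of game.
extra_power_plays = 0.346 - (-0.130)

explained = extra_power_plays * goals_per_power_play
unexplained = goal_difference - explained
extra_shots = unexplained / (1 - save_percentage)

print(f"Extra power plays:     {extra_power_plays:.3f}")
print(f"Explained by those:    {explained:.2f} goals")
print(f"Left unexplained:      {unexplained:.2f} goals")
print(f"Equivalent shots:      {extra_shots:.1f}")
```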

--------

We can also handle the penalty issue another way. We can insist that when we choose a control game for the real game, we make sure the control team was the one who took the last penalty. That way, we'd expect some of the referee "evening up" difference to disappear. Perhaps not all of it, because a two-man advantage isn't the same as a one-man advantage -- but at least part of it.

The additional restriction reduced the sample size to 1,662 games; for the remaining 41 games, I couldn't find a suitable control.

As it turns out, the goal difference stays about the same, even though the penalty difference is significantly reduced:

Killing teams ... +0.25 goals (+/- 0.05)
Control teams ... +0.08 goals (+/- 0.05)
----------------------------------------
Difference ...... +0.17 goals (+/- 0.07)

Killing teams ... +.340 power plays rest of game
Control teams ... +.032 power plays rest of game
------------------------------------------------
Difference ...... +.308 power plays rest of game

The difference of .308 power plays accounts for around .04 goals of the observed .17 difference. That leaves .13, which is a little less than 2 SD from zero. Not statistically significant, but close. (Technically, it's even less than that, because the control games aren't completely independent. Also, when I ran the study a second time, I got +0.10 goals instead of +0.08, which lowers the difference. So think of the 1.9 SD as probably a bit too high.)

Strangely, though, there wasn't as much difference in game results; only the equivalent of 13.5 wins:

Killing teams ... 815-591-256
Control teams ... 807-610-245
------------------------------
Difference ...... +8 wins, +11 ties
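As a sketch of how the records translate into wins: count a tie as half a win, and compare that to what the goal difference would normally buy at the six-goals-per-win rate used earlier in this post.

```python
def win_equivalents(wins, losses, ties):
    """Wins plus half a win for every tie."""
    return wins + ties / 2

killing = (815, 591, 256)   # W-L-T from the table above
control = (807, 610, 245)

games = sum(killing)
actual = win_equivalents(*killing) - win_equivalents(*control)

goal_difference_per_game = 0.17
goals_per_win = 6
expected = goal_difference_per_game * games / goals_per_win

print(f"Games in each pool:     {games}")
print(f"Actual extra wins:      {actual}")
print(f"Expected from goals:    {expected:.0f}")
```

That's 13.5 wins where the extra goals would normally have bought about 47.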

Again at six goals per win, you'd expect 47 wins, not 13.5. What happened?

Well, it turns out that the "killing" teams spent a lot of their goals winning blowouts. For instance, in games won by six goals or more, they were 81-34. The control group was only 73-51.

In those games, the difference was 12.5 wins. That normally "costs" 75 goals, but, for these games, the difference was really around 150 goals. So, that accounts for 75 of the 282 goal difference right there.

The "killing" group also "wasted" goals in the 3- and 4-goal games. That was offset by the opposite effect in five-goal games, but not by much.

------

If you recall, we found the same effect when we looked at fighting: teams that started a fight appeared to score more goals, but not necessarily win more games.

What connects the two studies is ... penalties. It could be that teams that get penalized a lot win a lot of blowouts. Not necessarily because of cause-and-effect, but because it just so happened that, between 1967 and 1984, certain teams just happened to be high in both categories.

Or, it could be coincidence. Or, it could be something else.

------

For my bottom line, I'd say: after killing off a two-man advantage, teams did appear to benefit by about 1/7 of a goal. Half of that can be traced to referees calling fewer penalties against them in the remainder of the game.

The other half is unknown. It's not statistically significant, so you have to give serious consideration to the idea that it's just coincidence ... but the teams *did* appear to benefit, by around 0.07 goals.

Historically, the average size of the "boost" in a team's play after a two-man kill has been small: the equivalent of less than a single shot on goal over the remainder of the game.




Friday, January 20, 2012

GiveWell: Overcomplicating research studies can cost lives

"GiveWell" is an organization that evaluates charities. Not just the usual things -- how well they're run, or how much money goes to administrative expenses -- but also how much good they do for the money they receive.

The idea is: if you have $100 to give to try to make the world a better place, shouldn't you give that $100 where it would give the most benefit? Not just to whoever shows up at your door that day, or whatever organization makes you feel guiltiest, or whoever's suffering kids look the cutest ... but, seriously, to where you can do the most good.

That might not appeal to everyone. If you donate to maximize your own good feelings, instead of the good your donation actually does, GiveWell's evaluations won't make much difference to you. Some people hate to say "no", and so they prefer to give $5 to each of the twenty charities that ask for money. Some people prefer to give to diseases that killed their loved ones, or diseases associated with heroes like Terry Fox. Some people give to causes that signal their political views. Most people prefer to give to help people in their own city or country, even when their dollars will save many more lives abroad.

(I've done all these things, and I'm a bit embarrassed about some of them. But I'm not alone. I mean, people give money to the Children's Wish Foundation to send a terminally ill kid to Disneyland ... which is nice, but, that same amount of money might actually save ten lives if they sent it to Africa where kids are actually dying of things that are easily preventable. I'm not sure what's up with me, and my fellow humans, sometimes. But I digress.)

So, in at least one sense, GiveWell is to donors what sabermetrics is to Joe Morgan. It does analysis to reach conclusions that some might find uncomfortable.

However, in another sense, what GiveWell does is *unlike* sabermetrics, in that it usually doesn't try to get down to the third decimal place. It argues that it can evaluate charities heuristically, that the differences are big enough that they can figure out which charities are the best, using the charities' own reports. As I interpret what they're saying, GiveWell can very easily tell you whether a charity is a Danny Ainge or an Albert Pujols, and it can even tell you more subtle things, like whether a charity is a Joe Carter or an Albert Pujols. But it doesn't try to figure out if a charity is a Ryan Braun or an Albert Pujols. It will just tell you that both are recommended.

That is, GiveWell argues that its goals are better met by the transparency of its recommendations than by any detailed, opaque analyses.

Which is almost exactly what I argued in one of my recent posts -- that, in research, simplicity and transparency are more important than rigor. Simple studies make it much easier to understand the results and catch the inevitable errors. A gentleman from GiveWell, Elie Hassenfeld, read that post, and pointed me to a particular example of a serious error that his organization uncovered.

(Disclaimer: I don't really know much about GiveWell. However, I've been impressed by what I've seen, and at least two of the blogs I read and respect (here's one) say very good things about them. So my Bayesian evaluation of them is quite high.)

-----

As I said, GiveWell doesn't believe they need detailed statistical cost/benefit studies to decide which charities to recommend. However, charities themselves often use such analyses to decide where the money should be spent. There's a whole bunch of organizations and academics devoted to figuring out how to save the most lives for the fewest dollars.

With that objective, the Bill and Melinda Gates Foundation donated $3.5 million to fund a study, "Disease Control Priorities in Developing Countries". They published a report ranking various interventions on cost-effectiveness. The Gates Foundation didn't do that itself -- it was done jointly by The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau. Those sound like heavyweights in the world health field.

The results found that -- unsurprisingly to me -- hygiene promotion was the cheapest way to reduce death and disease. The second cheapest, though, was deworming. Specifically, "soil-transmitted helminth" (STH) deworming treatments.

After the report was released, the Gates Foundation provided another $4.4 million to promote the findings. And the findings did indeed attract serious attention. GiveWell writes,

The DCP2’s cost-effectiveness estimates for deworming have been cited widely to advocate a greater focus on treating STH infections, including in:

-- an article in The Lancet

-- a report by REACH, a consortium of large international NGOs and other organizations working to end child hunger, which labeled deworming one of 11 “promoted interventions”

-- the most-cited paper published in the journal International Health

-- an editorial by Peter Hotez, a co-founder of the Global Network for Neglected Tropical Diseases, which has received more than $40 million in funding from the Gates Foundation

-- work by charity evaluators, such as GiveWell, Giving What We Can, and the University of Pennsylvania’s Center for High Impact Philanthropy.


But, as GiveWell later discovered, it turns out the STH estimate was wrong.

That doesn't sound too serious, but here's the thing: it's not just that the estimate was wrong. It was wrong by a factor of almost ONE HUNDRED. The study said that you could save one "disability-adjusted life year" by spending $3.41 on deworming treatments. But, after correcting for the (acknowledged) errors in the study, the actual number was $326.43.
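A quick check of that arithmetic, as a Python sketch. The two dollar figures are the ones quoted above; the variable names are just for illustration.

```python
# Dollars per disability-adjusted life year (DALY), as quoted above.
published_cost = 3.41     # the study's original estimate
corrected_cost = 326.43   # the estimate after correcting the errors

# How many times more expensive the corrected figure is.
ratio = corrected_cost / published_cost

# Share of the expected benefit per dollar that was never really there.
overstated_share = 1 - published_cost / corrected_cost

print(f"corrected / published: {ratio:.1f}x")
print(f"overstated share of benefit: {overstated_share:.1%}")
```

The ratio comes out a little under 100, which is where the "almost ONE HUNDRED" comes from.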

All these well-respected organizations, with serious researchers and serious money, wound up promoting a conclusion that was about as wrong as it could have been. Until the error was caught, then, effectively, 99% of the money devoted to STH treatment was wasted.

How did GiveWell catch the error? Subject matter expertise, mostly. In reading the report, they noticed that the STH estimate was much, much lower than other estimates they had seen. Instead of just assuming that this research was somehow better than the previous studies, they investigated.

That seems like just common sense, right? If you see a study that says an iPod can be bought for $3, when you know it usually costs $300, you should look again, shouldn't you? But that didn't happen until someone at GiveWell decided to figure out what was going on.

So they wrote to one researcher, who sent them to other researchers, who sent them complicated spreadsheets. They tried to figure those out, but they couldn't, so they wrote back and forth with questions and explanations. They were referred to still another researcher, who sent them a copy of yet another study that was the source of some of the data.

Eventually, they figured out where the issues were ... if you want a full explanation, it's in their post. It was a lot of detailed, technical effort to figure out what went wrong, and which parameters were in error.

GiveWell's conclusions:

We believe that the errors we’ve found in the estimate would have been caught by a helminth expert independently examining the estimate. Therefore, the presence of these errors implies to us that there has been no such examination. If this is the case, it would argue against the reliability of the DCP2’s estimates in general.

We’ve previously argued for a limited role for cost-effectiveness estimates; we now think that the appropriate role may be even more limited, at least for opaque estimates (e.g., estimates published without the details necessary for others to independently examine them) like the DCP2’s.

More generally, we see this case as a general argument for expecting transparency, rather than taking recommendations on trust - no matter how pedigreed the people making the recommendations. Note that the DCP2 was published by the Disease Control Priorities Project, a joint enterprise of The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau, which was funded primarily by a $3.5 million grant from the Gates Foundation. The DCP2 chapter on helminth infections, which contains the $3.41/DALY estimate, has 18 authors, including many of the world’s foremost experts on soil-transmitted helminths.

Absolutely right. You can't substitute credentials for subject matter expertise, and you can't substitute complexity for transparency.

And, one thing I would add: when a study appears to discover that you can get benefits at 99% off the original, well-accepted price ... you have to be suspicious about accepting that conclusion, even if you have no other reason to believe there was any mistake.

-----

P.S. GiveWell expands on the theme here.




Sunday, January 15, 2012

Are more NHL penalties called in back-to-back games?

In a comment to one of the posts on "make-up" penalties, J.-P. Martel wrote,

"... blow-outs can easily lead to situations that get out of hand, so referees may call penalties on the leading team so that the trailing team still thinks it has a chance to come back, rather than resort to fighting to "prepare" the next game between the two teams.

Actually, you may want to check penalties in the second half of the third period when the teams' next game is (or may be, depending on outcome) against each other (particularly in the playoffs), as opposed to when it's not."


So I did. And, J.-P. is right, it looks like there's something there.

I found all cases from 1967-68 to 1984-85 where teams played back-to-back games (regular season only). Then, I formed three groups:

-- first game of back-to-back games
-- second game of back-to-back games
-- other games that year between those two teams

It turns out that, overall, there are more penalties than usual in the first game, and fewer penalties than usual in the second game:

First game .... 12.36
Second game ... 10.87
Other games ... 11.77

Broken down by periods:

-------------- Gm 1 --- Gm 2 --- Other
--------------------------------------
Period 1 ..... 4.78 ... 3.98 ... 4.37
Period 2 ..... 4.25 ... 3.75 ... 4.12
Period 3 ..... 3.32 ... 3.13 ... 3.26
--------------------------------------
Total ....... 12.36 .. 10.87 .. 11.77

So: there's 0.6 extra penalties in the first game, and 0.9 fewer penalties in the second game.

I thought the second game would be dirtier because the players are holding recent grudges from the previous game, but the numbers show the opposite. The players seem to be more aggressive early, rather than late. In fact, more than half the "first game" effect happens in the first period. By contrast, a large "second game" effect seems to last two periods rather than one.

Most of the differences are statistically significant, which suggests that they're all real. For those scoring at home, here are the standard errors:

-------------- Gm 1 --- Gm 2 --- Other
--------------------------------------
Period 1 ..... 0.16 ... 0.12 ... 0.06
Period 2 ..... 0.13 ... 0.11 ... 0.06
Period 3 ..... 0.15 ... 0.12 ... 0.07
--------------------------------------
Total ........ 0.30 ... 0.24 ... 0.13
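
If you want to turn the two tables into "how many SDs apart" figures yourself, here's a minimal Python sketch. It assumes the groups are independent, so the standard error of a difference is the square root of the sum of the squared standard errors. The function name `z_score` is made up for this example, and the roughly-2-SD threshold for "significant" is the usual convention, not something stated above.

```python
import math

# Mean penalties per game and their standard errors, copied from the tables above.
means = {
    "Period 1": {"gm1": 4.78, "gm2": 3.98, "other": 4.37},
    "Period 2": {"gm1": 4.25, "gm2": 3.75, "other": 4.12},
    "Period 3": {"gm1": 3.32, "gm2": 3.13, "other": 3.26},
    "Total":    {"gm1": 12.36, "gm2": 10.87, "other": 11.77},
}
ses = {
    "Period 1": {"gm1": 0.16, "gm2": 0.12, "other": 0.06},
    "Period 2": {"gm1": 0.13, "gm2": 0.11, "other": 0.06},
    "Period 3": {"gm1": 0.15, "gm2": 0.12, "other": 0.07},
    "Total":    {"gm1": 0.30, "gm2": 0.24, "other": 0.13},
}

def z_score(mean_a, se_a, mean_b, se_b):
    """Difference of two independent means, in units of its standard error."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

for row in means:
    m, s = means[row], ses[row]
    z1 = z_score(m["gm1"], s["gm1"], m["other"], s["other"])
    z2 = z_score(m["gm2"], s["gm2"], m["other"], s["other"])
    z12 = z_score(m["gm1"], s["gm1"], m["gm2"], s["gm2"])
    print(f"{row:9s} Gm1 vs other: {z1:+.1f}  Gm2 vs other: {z2:+.1f}  Gm1 vs Gm2: {z12:+.1f}")
```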

Finally, coming back to J.-P.'s hypothesis about the second half of the third period of the first game, here are the numbers:

First game .... 1.76
Second game ... 1.63
Other games ... 1.67

So, yes, there's a small effect where, when the teams are going to meet again next game, the referee calls more penalties than normal in the last ten minutes of the third period. Whether that's because of the referee, or the players, we can't tell.

Taken alone, these differences aren't statistically significant. But, considering they match the pattern, and the broader picture is statistically significant, we can be fairly confident that this is a real effect we're seeing.

That's actually why I saved J.-P.'s scenario for last, so I could first show that the effect is probably real and not just random.

-----

UPDATE, 1/15/2012:

Technical note: the "other games" rows and columns are weighted by games, rather than matchups. Suppose teams A and B had back-to-back games, and so did C and D. But A and B met only 2 other times that year, while C and D met 4 other times. That means that C/D will be overrepresented in the "other games" column.

If I reweight that column so A/B and C/D get equal weight, the results change just a little bit. These are the revised "other" columns:

Overall ...... 11.48 (was 11.77)
1st period .... 4.24 (was 4.37)
2nd period .... 4.07 (was 4.12)
3rd period .... 3.15 (was 3.26)
Last 10 min ... 1.59 (was 1.67)
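
To make the two weighting schemes concrete, here's a small Python sketch. The penalty counts are invented purely for illustration (they are not from the actual data), and the matchup labels A/B and C/D follow the example above.

```python
# Hypothetical penalty totals for the "other" games of two matchups.
# These numbers are made up for illustration only.
matchups = {
    "A/B": [10.0, 12.0],               # met 2 other times
    "C/D": [13.0, 14.0, 12.0, 13.0],   # met 4 other times
}

# Weighted by games: every game counts once, so C/D counts twice as much as A/B.
all_games = [p for games in matchups.values() for p in games]
by_games = sum(all_games) / len(all_games)

# Weighted by matchups: average within each matchup first, then average those.
matchup_means = [sum(games) / len(games) for games in matchups.values()]
by_matchups = sum(matchup_means) / len(matchup_means)

print(f"weighted by games:    {by_games:.2f}")
print(f"weighted by matchups: {by_matchups:.2f}")
```

With these made-up numbers, the by-games average is pulled toward C/D's higher rate, which is the kind of overrepresentation described above.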




Friday, January 13, 2012

Do hockey fights lift a team's performance? Part II

The previous post was a study on NHL fights. It found that, generally, a fight doesn't help the team that it's sometimes said to help (the team that's behind in the game, for instance), but in one particular case, MAYBE it did. That was the case where:

(a) one team was behind in the game
(b) that team fought more regularly than the other team, and
(c) the player fighting also fought more often than the other team's fighter.

In that situation, that team appeared to benefit by around 0.13 goals, as compared to a similar team that didn't fight. That was about the same as one extra power play.

However, the result was not statistically significant, being only 1 SD away from zero. Still, I left it at least a little bit open whether the effect *might* be real.

Tom Tango is more skeptical than that:


It’s not monkeys at a typewriter creating Shakespeare, but it’s close.


Well, I have some more evidence that supports that point of view.

I repeated the study 27 times, to get a larger sample of random control games. (I didn't pick the number 27 beforehand; I just ran the thing over and over until I got sick of it.) Here's the average of those 27 runs:

Actual teams .... -0.18 goals
Control teams ... -0.29 goals
------------------------------
Difference ...... +0.11 goals

To remind you what this means: the fighting team meeting the conditions was outscored by its opponent by 0.18 goals over the rest of the game. On the other hand, the control teams, which were selected randomly from games which matched as closely as possible (except for the fight), got outscored by 0.29 goals.

So, it looks like the team that fought gained 0.11 goals per game. As I said, that result is not statistically significant.

But now, here's the new thing. Even though the fighting team gained 0.11 goals, it actually lost more games. Here are the records, in W-L-T format:

Actual teams .... 52-274-38
Control teams ... 49-267-48
----------------------------
Difference ...... -2 wins

So, even though the fighting teams did better on the scoreboard, they did worse in terms of winning games. Actually, they won three extra games, but they lost seven more and tied 10 fewer. That adds up to minus four points in the standings, which is why I write "-2 wins". (I'm ignoring the "pity point" for an overtime loss.)
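
Here's that standings arithmetic as a short Python sketch. It assumes the traditional two points for a win and one for a tie, with no overtime-loss point, as noted above. The helper name `points` is just for this example.

```python
def points(wins, losses, ties):
    """Standings points: 2 per win, 1 per tie, nothing for a loss."""
    return 2 * wins + ties

actual = (52, 274, 38)    # W-L-T for the teams that fought
control = (49, 267, 48)   # W-L-T for the control teams

# Both groups cover the same number of games.
assert sum(actual) == sum(control) == 364

point_diff = points(*actual) - points(*control)
win_equivalent = point_diff / 2   # two points per win

print(f"point difference: {point_diff}")
print(f"in wins:          {win_equivalent:+.0f}")
```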

You wouldn't expect this to happen, that you score more goals but lose more games. The better your goal differential, the better your outcomes should be. I think I saw Gabriel Desjardins write, somewhere, that six goals equals one win. The observed difference of +0.11 goals per game, over 364 games, equals around 40 goals, which is almost seven wins.
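
As a rough check of that conversion in Python: the six-goals-per-win figure is the rule of thumb recalled above, taken here as an assumption rather than a verified constant.

```python
goal_edge_per_game = 0.11   # observed difference, fighting teams minus controls
games = 364                 # games in each group
goals_per_win = 6           # rule of thumb recalled above (an assumption)

extra_goals = goal_edge_per_game * games
expected_extra_wins = extra_goals / goals_per_win

print(f"extra goals:         {extra_goals:.1f}")
print(f"expected extra wins: {expected_extra_wins:.1f}")
```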

But instead of winning seven extra games, the fighting teams *lost* two extra games.

Why did this happen? I think it's just luck, well within the bounds of random error. I think the +0.11 goals per game is random chance, I think the -2 wins is random chance, and I think the discrepancy between the two results is also random chance.

In any case, if you don't like all this talk of significance levels and randomness, you can just summarize like this: overall, the teams that fought wound up very slightly better on the scoreboard, but very slightly worse in the standings.






Tuesday, January 10, 2012

Do hockey fights lift a team's performance?

It's been said that when an NHL team needs a lift, a fight can jolt it out of its complacency and make it better. And, just a few days ago, the media cited a study by researcher Terry Appleby, of powerscouthockey.com, showing that momentum (in terms of shots on goal) usually increases for at least one team after a fight.

But, if *either* team can benefit from a fight, what's the point? You want to know if *your* team can benefit from a fight, at least more than the other team does.

The problem is: how can you know that? A fight involves both teams, so if it helps one, it hurts the other by the same amount. If you look at both teams, you'll always find the total effect to be zero.

So, the "fighting helps a team" theory has to say *which* team is helped. The most logical interpretation would be that the fight helps the team that instigated it.

If you're going to study that, you need to know which team is the instigating team. That's tough to figure out from historical data. But, one shortcut would be to assume that the team that generally gets involved in more fights is the team that's more likely to have instigated. The 1974-75 Philadelphia Flyers took 76 fighting penalties (actually, 76 offsetting majors, which I used as a proxy for fights). That same season, the expansion Kansas City Scouts took only 19. It seems fair to assume that if a fight broke out at a Flyers/Scouts game, it was the Flyers who were likely responsible.

On that assumption, I decided to check.

Using data from the Hockey Summary Project, I looked at fights between 1967-68 and 1984-85, and checked to see how the more-likely-to-fight team did in the remainder of the game. Then, I found a control game to match it with. The result was two large groups of games, which could then be compared.

I'll give you an example of how the controls were found.

On February 16, 1969, the Bruins played the Black Hawks at Chicago Stadium. Just as the first period ended, with the score 2-0 Chicago, the Bruins' Don Awrey got into a fight against Stan Mikita of the Hawks.

I looked for a game to serve as the control for that Boston/Chicago game. What I wanted was:

1. A game in the same season, the season before, or the season after;
2. ... where the home team had the same size lead at that same time of the game;
3. ... and where the two teams were of roughly similar relative quality.


#1 and #2 were non-negotiable (except that all differences of 4 or more goals were considered the same). But, for #3, the quality only had to be close, within two goals (which I'll explain in a minute).

I started pulling random games until I found one that matched all three requirements. In this particular case, the control wound up being the Bruins vs. Rangers game of February 23, 1969.

That game qualifies under the rules because

1. It was played in 1968-69, the same season as the original;
2. That game had the home team also leading by two goals at 20:00 of the first period, and
3. The two sets of teams are of similar relative quality.

Now, let me explain #3.

In 1968-69, the Bruins were +82 in goal differential (303 goals for, 221 against). The Black Hawks were +34 (280-246). So, for the original game, the home team was 48 goals worse than the visiting team.

Since the control game was the same year, the Bruins were still +82. The Rangers were +35 (231-196). So, in the control game, the home team was 47 goals worse than the visiting team.

Since "47 goals worse" is within two goals of "48 goals worse," that's close enough for the Bruins/Rangers game to serve as a control. If it hadn't been within two goals -- which is most of the time -- that game wouldn't have qualified under #3, and I would have tried another random game. (If there were absolutely no games that qualified under #3, I would have taken the one where the team quality was closest in goals. If none of the random games had qualified under #1 and #2, I would have thrown the original game out of the study -- but that never happened.)
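
The matching rules above can be sketched roughly like this in Python. This is only an illustration of the logic, not the code used for the study: the `GameState` class, its field names, and `find_control` are all hypothetical. It also assumes each candidate has already been snapshotted at the same clock time as the fight, and it picks uniformly among the qualifying games instead of drawing random games until one fits.

```python
import random
from dataclasses import dataclass

@dataclass
class GameState:
    season: int        # starting year of the season, e.g. 1968 for 1968-69
    home_lead: int     # home score minus visitor score at the fight's clock time
    quality_gap: int   # home team's season goal differential minus the visitor's

def cap(lead):
    """Treat all leads of 4 or more goals (either way) as the same."""
    return max(-4, min(4, lead))

def find_control(fight, candidates, rng, tolerance=2):
    # Rules 1 and 2 are non-negotiable: season within one year, same (capped) lead.
    pool = [c for c in candidates
            if abs(c.season - fight.season) <= 1
            and cap(c.home_lead) == cap(fight.home_lead)]
    if not pool:
        return None  # no control possible; the fight is dropped from the study

    # Rule 3: relative team quality within `tolerance` goals.
    close = [c for c in pool
             if abs(c.quality_gap - fight.quality_gap) <= tolerance]
    if close:
        return rng.choice(close)

    # Fallback: nothing within tolerance, so take the closest quality match.
    return min(pool, key=lambda c: abs(c.quality_gap - fight.quality_gap))

# The example from the text: home team up 2-0, and 48 goals worse than the visitors.
fight = GameState(season=1968, home_lead=2, quality_gap=34 - 82)
candidates = [
    GameState(season=1968, home_lead=2, quality_gap=35 - 82),  # Bruins at Rangers
    GameState(season=1969, home_lead=2, quality_gap=-10),      # quality too far off
    GameState(season=1968, home_lead=1, quality_gap=-48),      # wrong lead
]
print(find_control(fight, candidates, random.Random(0)))
```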

OK, so now we have our real game, and our control game.

Which team in our real game are we going to expect to have gotten the "lift" from the fight? In 1968-69, the Bruins had 41 fights, but the Black Hawks had only 20. So, the assumption is that the fight was more the work of the Bruins, and they should be the ones expected to benefit.

How did the Bruins do in the rest of the game relative to the Black Hawks?

Well, the final score was 5-1 Hawks. Since it was 2-0 at the time of the fight, that means the Black Hawks outscored the Bruins 3-0 in the remainder of the game. In other words, a "minus 3" goal differential for the visiting Bruins. (I excluded any goals in the last three minutes of the third period, to make sure empty-net goals didn't screw things up.)

What about the control game? That game actually wound up 9-0 Rangers, which means 7-0 Rangers from the fight to the end of the game. Since the "real" game was relative to the Bruins, the visiting team, we also want to express the control game from the standpoint of the visiting team. So that's "minus 7".

So, our score so far is:

Actual games: -3.0 goal differential for the fighting team

Control group: -7.0 goal differential for the control team

So far, it looks like fighting helps, by four goals a fight!

Of course, that's only one game. I repeated this process for every fight from 1967-68 to 1984-85. Actually, not *every* fight. First, I included only fights where one team appeared to be significantly more aggressive than the other (specifically, where the two teams were 10 or more fighting penalties apart for the season). Second, I included only first- or second-period fights, to increase the amount of time for the "lift" effect to make itself felt.

Even with those restrictions, there were 2,834 fights total. The results:

Fighting teams ... -0.04 goals
Control group .... -0.02 goals

The team with more fights was 0.04 goals worse than the other team over the remainder of the game. It "should have" been 0.02 goals worse. (Both numbers are negative probably because the teams that got in more fights were slightly worse teams overall than their opponents.)

So, there seems to be a small, negative effect: a team loses one additional goal for every 50 fights. But, that difference isn't even close to statistically significant. It's less than one SD from zero. (The two individual SDs are about 0.04 each, so the SD of the difference is around 0.06.)
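
Here's where the 0.06 comes from, as a short Python sketch. It assumes the two groups are independent, so their variances add; the variable names are just for illustration.

```python
import math

fighting_mean, control_mean = -0.04, -0.02
sd_fighting = 0.04   # approximate SD of the fighting-team average
sd_control = 0.04    # approximate SD of the control-group average

# For independent estimates, variances add, so the SD of the difference is
# sqrt(sd_a^2 + sd_b^2). With equal SDs, that's sqrt(2) times either one.
sd_diff = math.sqrt(sd_fighting ** 2 + sd_control ** 2)

diff = fighting_mean - control_mean
z = diff / sd_diff

print(f"difference: {diff:+.2f} goals")
print(f"SD of difference: {sd_diff:.3f}")
print(f"distance from zero: {abs(z):.2f} SD")
```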

Conclusion: it doesn't appear that fighting helps a team.

-----

Maybe a difference of 10 fights a year isn't enough to separate the two teams? I redid the study, but required the teams to be 20 fighting penalties apart. That reduced the sample size to 1,581 each group. The results were about the same (the +/- in parentheses is the standard error):

Fighting teams .... 0.00 goals (+/- 0.05)
Control group .... -0.03 goals (+/- 0.06)

-----

Looking at the entire database, I found that the average fight starts with a goal differential of 1.617. The average goal differential in all other games, weighted by the times of fights, is 1.421 goals. So, it seems like fights start when the game is a little more lopsided than usual.

So, maybe it's the team that's *trailing* that starts the fight, in an effort to wake itself up. Maybe we should look at trailing teams, not goonier teams.

I tried that. I threw away all situations where the score was tied when the fight happened, and looked at all the rest. The results:

2,941 datapoints
----------------
Trailing teams ... -0.20 goals (+/- 0.04)

Control group .... -0.19 goals (+/- 0.04)


Again, no real difference.

-----

Trying again, but looking only at fights where one team was trailing by at least three goals:

591 datapoints
--------------
Trailing teams ... -0.25 goals (+/- 0.08)

Control group .... -0.29 goals (+/- 0.08)

Nothing there, either.

-----

Is it possible that the benefit accrues only to GOOD teams trailing by three goals? Those are the teams playing the worst relative to their abilities, so the "wake up" effect should be strongest. Here are teams trailing by 3 goals that were at least +30 in goal differential for the season:

122 datapoints
--------------
Trailing teams ... +0.14 goals (+/- 0.17)

Control group .... +0.17 goals (+/- 0.16)

Nope. What if we look at good teams trailing by *any* number of goals?

841 datapoints
--------------
Trailing teams ... +0.40 goals (+/- 0.07)

Control group .... +0.26 goals (+/- 0.07)

Aha! This time, there's a small "lift" effect, at about 1.4 SD. But, why would there be an effect for teams trailing by 1 goal, but not for teams trailing by 3 goals?

I got curious and ran the same study again, and this time the random control group came in at +0.33, bringing the difference down to 1.0 SD. (Of course, it's not appropriate to dismiss the first result just because the second one came out less extreme.)

-----

At this point, you might reasonably argue that the rules "team with more majors that year" and "team trailing in the game" are not precise enough in selecting teams that started the fight. So, this time, I assumed the fight was started by the *player* with the most majors that season, rather than the *team* with the most majors that season. So when the goon of a pacifist team starts a fight with a pacifist of a goon team, you go with the goon player on the pacifist team. The results:

4,185 datapoints
----------------
Goonier Player ... +0.01 goals (+/- 0.03)

Control Group .... -0.02 goals (+/- 0.03)

Again, less than 1 SD difference. There's not much difference between this "goon player" breakdown and the previous "goon team" breakdown, probably because most of the goonier players also played on goonier teams. But it was worth a try.

-----

Finally, one last try. For this run, I combined all three criteria. To be included in the study:

(a) one team had to have at least 20 more majors for the season than its opponent;
(b) that same team's fighter had to have more majors that year than his opponent; and
(c) that same team had to be trailing in the game.

This *has* to work, right? I mean, that pushes all the right buttons: a truculent team, with a fighter selected for that purpose, behind in the game and likely to be needing a lift. If *those* teams don't benefit from the fight, then who would?

I expected the same non-result, but, this time, we get the biggest effect so far:

364 datapoints
--------------
Teams qualifying ... -0.18 goals (+/- 0.10)

Control group ...... -0.38 goals (+/- 0.11)

There's a difference of .20 goals -- almost a fifth of a goal per fight! Taken at face value, that means that when a team like that starts a fight, it benefits by even more than a power play (which has a 15 to 20 percent success rate).

That difference is still only about 1.4 SD from zero. Still, I hate to just dismiss it. I've always thought that if you get a result that's significant in the real-world (hockey) sense, but it's not statistically significant, that's a problem with your study -- it's just that you haven't used enough data to be able to prove anything. We should still be open to the possibility that the effect might be real.

I ran it a few more times, to check if maybe the control group was just a random outlier. The extra results:

Control group: -0.26 goals
Control group: -0.21 goals
Control group: -0.27 goals
Control group: -0.38 goals (again)
Control group: -0.38 goals (again)
Control group: -0.29 goals

So, the original run was a little extreme, but not much.

There are, however, some mitigating factors. First, the control group numbers aren't all independent, since there's a limited number of control games to choose randomly from. Second, we obviously can't do extra runs to reduce random chance in the *real* games, but it's still possible those teams scored more goals for random reasons having nothing to do with any lift they got from the fight. Third, the SDs of both groups are a bit understated: I calculated them based on the assumption that games are independent, but they're not -- a real game appears in the study multiple times, once for each fight, and a control game could get randomly selected more than once, too.

If you average the seven control groups in the seven repetitions of the study, you get -0.31 goals. That's 0.13 goals worse than the actual games. Taking into account the fact that we ran the control group seven times, the 0.13 difference is now around 1 SD.
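
Here's a rough Python sketch of that averaging, using the seven control results listed above (the original run plus the six extras). The SD part treats the control runs as independent, which, as noted, they aren't quite, so take it as an optimistic approximation rather than the exact calculation.

```python
import math

actual = -0.18
control_runs = [-0.38, -0.26, -0.21, -0.27, -0.38, -0.38, -0.29]

avg_control = sum(control_runs) / len(control_runs)
gap = actual - avg_control   # how much better the real teams did

# Standard errors from the single-run table: 0.10 (actual), 0.11 (one control run).
se_actual = 0.10
se_one_control = 0.11

# Assumption: runs are independent, so averaging n runs shrinks the SE by sqrt(n).
se_avg_control = se_one_control / math.sqrt(len(control_runs))
sd_gap = math.sqrt(se_actual ** 2 + se_avg_control ** 2)

print(f"average control: {avg_control:.2f}")
print(f"gap: {gap:+.2f} goals, roughly {gap / sd_gap:.1f} SD")
```

Since the runs overlap and the SDs are understated, the real figure is probably somewhat lower than what this prints.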

Oh, and this is as good a time as any to emphasize that I could also have screwed up somewhere ... I've already had to rerun everything once when I found a misplaced parenthesis in my code.

-----

So, I guess, our overall conclusion from this study isn't completely certain. We wind up with a summary like:

1. The effect doesn't seem to exist for run-of-the-mill fights.
2. When a goon fighter on a goon team fights when his team is down, it seems to benefit that team by 1/8 of a goal, or a bit less than a normal power play.
3. But, that effect isn't statistically significant, so we have some doubts that it's real.
4. And, with only 364 such datapoints qualifying out of around 5,000, only a small percentage of fights match the criterion for that kind of boost.

If you had to reduce that to one line, it might be:

At best, there might be a small effect in certain specific circumstances ... but much, much less than sportscasters make it out to be.




UPDATE: Part 2 is here.


Monday, January 09, 2012

Do NHL referees call "make up" penalties? Part IV

A couple of links to other similar studies on penalty-calling:

1. Commenter Jack linked to this article with some basketball foul-calling data. Turns out the more consecutive fouls against one team, the more likely the next will go to the other team.

2. Another reader pointed me to a 2009 hockey study (web version here, PDF here) by Jack Brimberg and William J. Hurley. They looked at the first three penalties of every game, and found results similar to what I found.



Saturday, January 07, 2012

Ken Dryden tracers from "The Game"

In my review of Ken Dryden's book "The Game," I listed seven of the details that I tried to trace. I now have an eighth one, and then a retrace of the second one.

These two updates were originally posted to the SIHR mailing list. The original seven are here. Thanks again to the Hockey Summary Project for the data making these tracers possible.

----

8. Here's Dryden, from page 121 of my edition:

"A few months ago, we played the Colorado Rockies at the Forum. Early in the game, I missed an easy shot from the blueline, and a little unnerved, for the next fifty minutes I juggled long shots, and allowed big rebounds for three additional goals. After each Rockies goal, the team would put on a brief spurt and score quickly, and so with only minutes remaining, the game was tied. Then the Rockies scored again, this time a long, sharp-angled shot that squirted through my legs. The game had seemed finally lost. But in the last three minutes, Lapointe scored, then Lafleur, and we won 6-5. Alone in the dressing room later .... I just sat there, unable to understand why I felt the way I did. Only slowly did it come to me: I had been irrelevant; I couldn't even lose the game."

In Dryden's career, I found 12 Canadiens games against Colorado. Montreal won 11 of them and tied one. But none of them was by a score of 6-5.

Montreal did not have any 6-5 wins at all in 1978-79 (when the book is set), or in the previous two seasons. In Dryden's entire career, I found only two 6-5 Montreal wins where he was in net.

One was February 12, 1972, against the Kings. The narrative doesn't match. In that game, the Habs led 6-3, and then the Kings scored two late goals.

The other was November 16, 1972. In that game, Dryden was replaced by Michel Plasse after the first period, so that doesn't match either.

So, I extended the search to look for all games where Dryden gave up 5+ goals, but the Canadiens won anyway. There were five such games:

February 12, 1972, 6-5 against LA (described above)
November 22, 1974, 7-6 against Kansas City
February 18, 1976, 7-5 against Toronto
November 21, 1976, 9-5 against Toronto
December 23, 1977, 7-5 against New York Islanders.

None of the games match exactly, but the Kansas City game is the best candidate:

-- It was against the team that eventually became the Rockies;

-- The opposition scored late (14:49 of the third), and the Habs won it later (17:53);

-- It was a one-goal game;

-- Dryden probably didn't play great (6 goals on 21 shots, seven per period);

-- The last five goals alternated by team.

But other things don't match:

-- The Scouts tied it late, not took the lead late;

-- The Habs scored one goal to win, not two;

-- The goal was scored by Doug Risebrough, not Lapointe or Lafleur -- in fact, neither Lapointe nor Lafleur scored at all that game;

-- The early goals didn't alternate (The goal sequence was kMMMMkkkMkMkM);

-- The game was in Kansas City, not Montreal;

-- The game happened several years previously, rather than months.
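
For anyone who wants to double-check the goal-sequence claims in the two lists above, here's a tiny Python sketch. The sequence string is the one quoted above; the `alternates` helper is made up for this check.

```python
sequence = "kMMMMkkkMkMkM"   # k = Kansas City goal, M = Montreal goal

montreal_goals = sequence.count("M")
kansas_city_goals = sequence.count("k")

def alternates(goals):
    """True if no team scores twice in a row within this stretch."""
    return all(a != b for a, b in zip(goals, goals[1:]))

print(f"final score: Montreal {montreal_goals}, Kansas City {kansas_city_goals}")
print(f"last five alternate:  {alternates(sequence[-5:])}")
print(f"first eight alternate: {alternates(sequence[:8])}")
```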

I checked the Globe and Mail recap for that game ... it was just a couple of paragraphs long, and didn't mention Dryden at all, or how the Kansas City goals went in. I don't have online access to any Montreal newspapers to get a more detailed game story.

-----

2. Number 2 in my blog post listed a game against Toronto. Dryden writes that the Leafs tie the game early, get confident that they can keep up with the Canadiens, and begin to take over the play. But Mark Napier and Pat Hughes score two quick goals for the Habs. The Canadiens score two more, and then the Leafs get two late. The next day, the players wonder why coach Scotty Bowman didn't give them hell for allowing those two late goals.

It all adds up to 6-4. There was no 6-4 win in Toronto in 1978-79.

So, I looked for other games that might match.

There were only two games during Dryden's career where Napier and Hughes both scored.

One was November 15, 1978, a 6-1 win over Colorado. That doesn't match.

But the game of January 17, 1979 is probably it. It matches, though not exactly:

-- It's in 1978-79, the season Dryden was writing about.

-- It was against Los Angeles, not Toronto, and home, not road.

-- Although the Kings scored to make it 2-1 at 8:43 of the second period, they never tied it up.

-- After the Kings' goal to make it 2-1, Napier and Hughes scored to make it 4-1.

-- After that, the Habs scored three more goals (not two): Houle, then Napier and Hughes again.

-- After that, the Kings got their two late goals.

-- The final was 7-3, not 6-4.


But, I think, pretty close anyway.


