Jonah Lehrer illustrates a common misunderstanding of (im)probability. He writes:
It’s been a hotly debated scientific question for decades: was Joe DiMaggio’s 56-game hitting streak a genuine statistical outlier, or is it an expected statistical aberration, given the long history of major league baseball?
He is referring to the observation that 56-game hitting streaks, while intuitively improbable, will nevertheless happen once the game has been around long enough. Does this make DiMaggio's streak less of a feat?
- Say I have a monkey banging on a keyboard. Take any sequence of letters. The chance that the monkey will bang out that particular sequence is vanishingly small. But some sequence will be produced. When we see that sequence, do we change our minds and say that it's not so surprising after all, because some unlikely sequence was certain to be produced? No. Similarly, the chance that somebody will hit safely in 56 straight games could be high, but the chance that it will be player X is small. Indeed, that probability is equal to the probability that player X is the greatest streak hitter ever to play the game. So if X turns out to be Joe DiMaggio, then we conclude that Joe DiMaggio indeed accomplished quite a feat.
- We might be asking a different question. We grant that DiMaggio achieved the highly improbable and hit for the longest streak of any player in history, but we ask whether 56 is really all that long. After all, he didn’t hit for 57, which is even less likely. To address this question we might ask: on average, how many players “should” hit safely in 56 straight games in the time that the game has been around? This question is easy to answer. Our best estimate of the expected number of players to hit 56-game streaks is 1, the actual number. (Because the count is so close to zero, this estimate is noisy, but it is still the best estimate available without making any assumptions about the underlying distribution.)
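How noisy is "noisy"? Putting a number on it requires an assumption the post deliberately avoids. If one is willing to model the number of 56-game streaks as a Poisson count (an added assumption, not something claimed above), the maximum-likelihood estimate of the mean from a single observed count k is just k, and an exact (Garwood) 95% interval shows how wide the uncertainty is. A minimal pure-Python sketch; the helper names are illustrative:

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def solve(f, lo, hi, iters=100):
    """Bisection: find x in [lo, hi] with f(x) = 0, given f(lo), f(hi) differ in sign."""
    flo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_interval(k, alpha=0.05):
    """Exact two-sided CI for a Poisson mean after observing k events.

    Assumes the count is Poisson; the point estimate (MLE) is k itself.
    """
    hi_bound = 10.0 * (k + 10)  # heuristic bracket, well above the upper limit
    if k == 0:
        lower = 0.0
    else:
        # Smallest mean for which seeing >= k events has probability alpha/2.
        lower = solve(lambda lam: (1 - poisson_cdf(k - 1, lam)) - alpha / 2, 0.0, hi_bound)
    # Largest mean for which seeing <= k events has probability alpha/2.
    upper = solve(lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, hi_bound)
    return lower, upper
```

With one observed streak, `exact_poisson_interval(1)` gives roughly (0.025, 5.57): under this model, an expected count anywhere from a few hundredths to about five and a half is consistent with what happened.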

5 comments
July 2, 2009 at 10:05 am
Ana Andjelic
Would DiMaggio’s hitting streak be treated as an aberration 100 years from now? That is, as time passes and more people have more winning streaks, is it less likely to be considered an outlier? Maybe we just treat it as such because we don’t have enough other possible and probable information.
July 3, 2009 at 10:10 pm
jeff
It's possible, but that depends on whether lots more hitters reach 56 between now and then. The more that do, the less impressive it looks. On the other hand, if few match him, then he looks more and more exceptional.
Since we don’t know which of these two scenarios will occur, I would phrase your question a different way: do we *expect* that he will look less and less like an outlier? The answer to that follows from a basic law of probability: our best estimate of our future belief is the belief we hold today. So we expect that he will look just as much like an outlier as he does today (but no more).
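The "basic law" here is the martingale property of Bayesian beliefs: averaged over everything that might be observed next, the updated belief equals the current one. A minimal sketch, assuming a Beta-Bernoulli model chosen only for illustration (a Beta(a, b) belief about some unknown probability, updated by yes/no observations); exact fractions make the equality visible without rounding:

```python
from fractions import Fraction


def posterior_mean(a, b):
    """Mean of a Beta(a, b) belief about an unknown probability."""
    return Fraction(a, a + b)


def expected_mean_after(a, b, n):
    """Expected value, computed today, of the posterior mean after n more observations.

    Each observation is a success with predictive probability a/(a+b), today's
    mean; a success moves the belief to Beta(a+1, b), a failure to Beta(a, b+1).
    Enumerates all 2**n outcome sequences, so keep n small.
    """
    if n == 0:
        return posterior_mean(a, b)
    p = posterior_mean(a, b)
    return (p * expected_mean_after(a + 1, b, n - 1)
            + (1 - p) * expected_mean_after(a, b + 1, n - 1))
```

For example, `expected_mean_after(2, 5, 8)` returns `Fraction(2, 7)`, exactly today's `posterior_mean(2, 5)`: individual future beliefs can move either way, but their expectation does not.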
July 2, 2009 at 4:13 pm
allan
I’m not sure how this is a misunderstanding. He links to some very interesting statistical physics work, and the podcast he cites also did a better-than-decent job of talking about what rare events actually mean.
July 2, 2009 at 4:56 pm
Jonathan Berman
The following links to an article that tackles some of these questions:
“But the right question is not how likely it was for DiMaggio to have a 56-game hitting streak in 1941. The question is: How likely was it that anyone in the history of baseball would have achieved a streak that long or longer?…Using a comprehensive collection of baseball statistics from 1871 to 2005, we simulated the entire history of baseball 10,000 times in a computer…And suddenly the unlikely becomes likely: we get a very long streak each time we run baseball history.”
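The quoted study does not give its code, and it used real statistics from 1871 to 2005. As a toy illustration of the same idea, here is a minimal Monte Carlo sketch assuming a hypothetical league whose players have fixed per-game hit probabilities and independent games; the function names, the league, and every parameter are made up for illustration. It reports two fractions: how often anyone reaches the threshold, and how often one particular player named in advance does, which mirrors the post's distinction between "somebody" and "player X".

```python
import math
import random


def longest_streak(p_hit, games, rng):
    """Longest run of consecutive games with a hit, out of `games` games.

    Assumes independent games with a fixed per-game hit probability
    0 < p_hit < 1. Runs of hits are geometric, so run lengths are sampled
    directly instead of simulating game by game.
    """
    log_p = math.log(p_hit)
    longest = 0
    remaining = games
    while remaining > 0:
        u = 1.0 - rng.random()            # uniform on (0, 1], avoids log(0)
        run = int(math.log(u) / log_p)    # P(run >= k) = p_hit ** k
        longest = max(longest, min(run, remaining))
        remaining -= run + 1              # the run plus the hitless game ending it
    return longest


def simulate(hit_probs, games, threshold, n_histories, seed=0):
    """Replay a hypothetical league's history n_histories times.

    Returns the fraction of histories in which anyone has a streak of at
    least `threshold` games, and the fraction in which player 0 does.
    """
    rng = random.Random(seed)
    anyone = player0 = 0
    for _ in range(n_histories):
        streaks = [longest_streak(p, games, rng) for p in hit_probs]
        anyone += max(streaks) >= threshold
        player0 += streaks[0] >= threshold
    return anyone / n_histories, player0 / n_histories
```

For example, `simulate([0.75] * 50, 500, 25, 200)` replays a made-up 50-player league 200 times. The second fraction can never exceed the first, and with many similar players it is typically much smaller.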
July 2, 2009 at 9:59 pm
mike
So what’s the difference between an ‘outlier’ and an ‘aberration’? Does one mean 1/10^6 and the other is like 1/10^666, or something like that?
Anyway, not really sure why they needed to use a simulation. They could have just done the math.
Joe’s got an average of .325 (.675 miss). With 4 at-bats per game, that’s a .675^4 ≈ .21 chance of a hitless game, or roughly an 80% chance of a hit. Take that to the power of 56 and you’ve got around a 2/10^6 chance for any given string of 56 games to all have hits.
He played >1700 games, which gives him roughly that many opportunities, less 56. That means he personally had around a 1/300 chance of getting that streak. (I suppose one could refine the % somehow, given that those opportunities overlap, but such is the nature of stats.)
Their sim had him winning the streak contest 30 times out of 10k, which is a bit different from getting the 60 or so games with hits, but the percentages are still in the same ballpark.
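Under the comment's own assumptions (a .325 average, exactly four independent at-bats per game, independent games, and ">1700" rounded down to 1700), the arithmetic can be checked directly. The function below is one way to do the refinement the comment mentions: a small dynamic program over the current run length gives the exact probability of at least one streak of 56 or more, so overlapping windows are not double-counted. The function name and the rounding are illustrative choices, not from the source.

```python
def prob_streak_at_least(p, n, r):
    """Exact P(at least one run of >= r consecutive successes in n trials).

    Tracks the distribution of the current run length (0 .. r-1); once a run
    reaches r, that probability mass moves to `done` and stays there.
    """
    state = [0.0] * r
    state[0] = 1.0
    done = 0.0
    for _ in range(n):
        new = [0.0] * r
        new[0] = sum(state) * (1 - p)     # a hitless game resets the run
        for k in range(r - 1):
            new[k + 1] = state[k] * p     # a hit extends the run by one
        done += state[r - 1] * p          # the run reaches length r
        state = new
    return done


avg = 0.325
p_game = 1 - (1 - avg) ** 4               # chance of at least one hit in a game
p_window = p_game ** 56                   # a given 56-game window is all hits
games = 1700                              # the comment's ">1700", rounded down
windows = games - 56 + 1                  # overlapping 56-game windows
expected_windows = windows * p_window     # the comment's per-window estimate
p_streak = prob_streak_at_least(p_game, games, 56)
```

`expected_windows` is the expected number of all-hit windows, which is an upper bound on `p_streak`: a single 57-game streak already contains two overlapping 56-game windows.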