You are currently browsing the tag archive for the ‘statistics’ tag.

A buyer and a seller negotiating a sale price.  The buyer has some privately known value and the seller has some privately known cost and with positive probability there are gains from trade but with positive probability the seller’s cost exceeds the buyers value.  (So this is the Myerson-Satterthwaite setup.)

Do three treatments.

  1. The experimenter fixes a price in advance and the buyer and seller can only accept or reject that price.  Trade occurs if and only if they both accept.
  2. The seller makes a take it or leave it offer.
  3. The parties can freely negotiate and they trade if and only if they agree on a price.

Theoretically there is no clear ranking of these three mechanisms in terms of their efficiency (the total gains from trade realized.)  In practice the first mechanism clearly sacrifices some efficiency in return for simplicity and transparency.  If the price is set right the first mechanism would outperform the second in terms of efficiency due to a basic market power effect.  In principle the third treatment could allow the parties to find the most efficient mechanism, but it would also allow them to negotiate their way to something highly inefficient.

A conjecture would be that with a well-chosen price the first mechanism would be the most efficient in practice.   That would be an interesting finding.

A variation would be to do something similar but in a public goods setting.  We would again compare simple but rigid mechanisms with mechanisms that allow for more strategic behavior.  For example, a version of mechanism #1 would be one in which each individual was asked to contribute an equal share of the cost and the project succeeds if and only if all agree to their contributions.  Mechanism #3 would allow arbitrary negotation with the only requirement be that the total contribution exceeds the cost of the project.

In the public goods setting I would conjecture that the opposite force is at work.  The scope for additional strategizing (seeding, cajoling, guilt-tripping, etc) would improve efficiency.

Anybody know if anything like these experiments have been done?

Nonsense?

For Shmanske, it’s all about defining what counts as 100% effort. Let’s say “100%” is the maximum amount of effort that can be consistently sustained. With this benchmark, it’s obviously possible to give less than 100%. But it’s also possible to give more. All you have to do is put forth an effort that can only be sustained inconsistently, for short periods of time. In other words, you’re overclocking.

And in fact, based on the numbers, NBA players pull greater-than-100-percent off relatively frequently, putting forth more effort in short bursts than they can keep up over a longer period. And giving greater than 100% can reduce your ability to subsequently and consistently give 100%. You overdraw your account, and don’t have anything left.

Here is the underlying paper.  <Painfully repressing the theorist’s impulse to redefine the domain to paths of effort rather than flow efforts, thus restoring the spiritually correct meaning of 100%>

Cap curl:  Tim Carmody guest blogging at kottke.org.

In tennis, a server should win a larger percentage of second-serve points compared to first-serve points; that much we know.  Partly that’s because a server optimally serves more faults (serves that land out) on first serve than second serve.  But what if we condition on the event that the first serve goes in? Here’s a flawed logic that takes a bit of thinking to see through:

Even conditional on a first serve going in, the probability that the server wins the point must be no larger than the total win probability for second serves. Because suppose it were larger.  Then the server wins with a higher probability when his first serve goes in.  So he should ease off just a bit on his first serve so that a larger percentage lands in, raising the total probability that he wins the point.  Even though the slightly slower first serve wins with a slightly reduced probability (conditional on going in) he still has a net gain as long as he eases off just slightly so that it is still larger than the second serve percentage. Indeed the lower probability of a fault could even raise the total probability that he wins on the first serve.

Consider the following syllogism:

  1. If a person is an American, he is probably not a member of Congress.
  2. This person is a member of Congress.
  3. Therefore he is probably not American.

As John D. Cook writes:

We can’t reject a null hypothesis just because we’ve seen data that are rare under this hypothesis. Maybe our data are even more rare under the alternative. It is rare for an American to be in Congress, but it is even more rare for someone who is not American to be in the US Congress!

Jonah Lehrer writes about how bad NFL teams are at drafting talented players, particularly at the quarterback position.

Despite this advantage, however, sports teams are impressively amateurish when it comes to the science of human capital. Time and time again, they place huge bets on the wrong players. What makes these mistakes even more surprising is that teams have a big incentive to pick the right players, since a good QB (or pitcher or point guard) is often the difference between a middling team and a contender. (Not to mention, the player contracts are worth tens of millions of dollars.) In the ESPN article, I focus on quarterbacks, since the position is a perfect example of how teams make player selection errors when they focus on the wrong metrics of performance. And the reason teams do that is because they misunderstand the human mind.

He talks about a test that is given to college quarterbacks eligible for the NFL draft to test their ability to make good decisions on the field.  Evidently this test is considered important by NFL scouts and indeed scores on this test are good predictors of whether and when a QB will be selected in the draft.

However,

Consider a recent study by economists David Berri and Rob Simmons. While they found that Wonderlic scores play a large role in determining when QBs are selected in the draft — the only equally important variables are height and the 40-yard dash — the metric proved all but useless in predicting performance. The only correlation the researchers could find suggested that higher Wonderlic scores actually led to slightly worse QB performance, at least during rookie years. In other words, intelligence (or, rather, measured intelligence), which has long been viewed as a prerequisite for playing QB, would seem to be a disadvantage for some guys. Although it’s true that signal-callers must grapple with staggering amounts of complexity, they don’t make sense of questions on an intelligence test the same way they make sense of the football field. The Wonderlic measures a specific kind of thought process, but the best QBs can’t think like that in the pocket. There isn’t time.

I have not read the Berri-Simmons paper but inferences like this raise alarm bells.  For comparison, consider the following observation. Among NBA basketball players, height is a poor predictor of whether a player will be an All-Star.  Therefore, height does not matter for success in basketball.

The problem is that, both in the case of IQ tests for QBs and height for NBA players, we are measuring performance conditional on being good enough to compete with the very best. We don’t have the data to compare the QBs who are drafted to the QBs who are not and how their IQ factors into the difference in performance.

The observable characteristic (IQ scores, height) is just one of many important characteristics, some of which are not quantifiable in data. Given that the player is selected into the elite, if his observable score is low we can infer that his unobservable scores must be very high to compensate. But if we omit those intangibles in the analysis, it will look like people with low scores are about as good as people with high scores and we would mistakenly conclude that they don’t matter.

I am always writing about athletics from the strategic point of view:  focusing on the tradeoffs.  One tradeoff in sports that lends itself to strategic analysis is effort vs performance.  When do you spend the effort to raise your level of play and rise to the occasion?

My posts on those subjects attract a lot of skeptics.  They doubt that professional athletes do anything less than giving 100% effort.  And if they are always giving 100% effort, then the outcome of a contest is just determined by gourd-given talent and random factors. Game theory would have nothing to say.

We can settle this debate.  I can think of a number of smoking guns to be found in data that would prove that, even at the highest levels, athletes vary their level of performance to conserve effort; sometimes trying hard and sometimes trying less hard.

Here is a simple model that would generate empirical predictions.  Its a model of a race. The contestants continuously adjust how much effort to spend to run, swim, bike, etc. to the finish line. They want to maximize their chance of winning the race, but they also want to spend as little effort as necessary.  So far, straightforward.  But here is the key ingredient in the model: the contestants are looking forward when they race.

What that means is at any moment in the race, the strategic situation is different for the guy who is currently leading compared to the trailers.  The trailer can see how much ground he needs to make up but the leader can’t see the size of his lead.

If my skeptics are right and the racers are always exerting maximal effort, then there will be no systematic difference in a given racer’s time when he is in the lead versus when he is trailing.  Any differences would be due only to random factors like the racing conditions, what he had for breakfast that day, etc.

But if racers are trading off effort and performance, then we would have some simple implications that, if it were born out in data, would reject the skeptics’ hypothesis.  The most basic prediction follows from the fact that the trailer will adjust his effort according to the information he has that the leader does not have.  The trailer will speed up when he is close and he will slack off when he has no chance.

In terms of data the simplest implication is that the variance of times for a racer when he is trailing will be greater than when he is in the lead.  And more sophisticated predictions would follow.  For example the speed of a trailer would vary systematically with the size of the gap while the speed of a leader would not.

The results from time trials (isolated performance where the only thing that matters is time) would be different from results in head-to-head competitions. The results in sequenced competitions, like downhill skiing, would vary depending on whether the racer went first (in ignorance of the times to beat) or last.

And here’s my favorite:  swimming races are unique because there is a brief moment when the leader gets to see the competition:  at the turn.  This would mean that there would be a systematic difference in effort spent on the return lap compared to the first lap, and this would vary depending on whether the swimmer is leading or trailing and with the size of the lead.

And all of that would be different for freestyle races compared to backstroke (where the leader can see behind him.)

Finally, it might even be possible to formulate a structural model of an effort/performance race and estimate it with data.  (I am still on a quest to find an empirically oriented co-author who will take my ideas seriously enough to partner with me on a project like this.)

Drawing:  Because Its There from www.f1me.net

Boston being a center for academia as well as professional sports, Harvard and MIT faculty and students are leading the way in the business of sports consulting.

And some of those involved aren’t that far away from being kids. Harvard sophomore John Ezekowitz, who is 20, works for the NBA’sPhoenix Suns from his Cambridge dorm room, looking beyond traditional basketball statistics like points, rebounds, assists, and field goal percentage to better quantify player performance. He is enjoying the kind of early exposure to professional sports once reserved for athletic phenoms and once rare at institutions like Harvard and MIT. “If I do a good job, I can have some new insight into how this team plays, what works and what doesn’t,” says Ezekowitz. “To think that I might have some measure of influence, however small, over how a team plays is a thrill.” It’s not a bad job, either. While he doesn’t want to reveal how much he earns as a consultant, he says that not only does he eat better than most college students, the extra cash also allows him to feed his golf-club-buying habit.

From a fun little article by Andrew Gelman and Deborah Nolan:

The law of conservation of angular momentum tells us that once the coin is in the air, it spins at a nearly constant rate (slowing down very slightly due to air resistance). At any rate of spin, it spends half the time with heads facing up and half the time with heads facing down, so when it lands, the two sides are equally likely (with minor corrections due to the nonzero thickness of the edge of the coin); see Figure 3. Jaynes (1996) explained why weighting the coin has no effect here (unless, of course, the coin is so light that it floats like a feather): a lopsided coin spins around an axis that passes through its center of gravity, and although the axis does not go through the geometrical center of the coin, there is no difference in the way the biased and symmetric coins spin about their axes.

On the other hand, a weighted coin spun on a table will show a bias for the weighted side.  The article describes some experiments and statistical tests to use in the classroom.  There are some entertaining stories too.  Like how the King of Norway avoided losing the entire Island of Hising to the King of Sweden by rolling a 13 with a pair of dice (“One die landed six, and the other split in half landing with both a six and a one showing.”)

Visor volley:  Toomas Hinnosaar.

By asking a hand-picked team of 3 or 4 experts in the field (the “peers”), journals hope to accept the good stuff, filter out the rubbish, and improve the not-quite-good-enough papers.

…Overall, they found a reliability coefficient (r^2) of 0.23, or 0.34 under a different statistical model. This is pretty low, given that 0 is random chance, while a perfect correlation would be 1.0. Using another measure of IRR, Cohen’s kappa, they found a reliability of 0.17. That means that peer reviewers only agreed on 17% more manuscripts than they would by chance alone.

That’s from neuroskeptic writing about an article that studies the peer-review process.  I couldn’t tell you what Cohen’s kappa means but let’s just take the results at face value:  referees disagree a lot.  Is that bad news for peer-review?

Suppose that you are thinking about whether to go to a movie and you have three friends who have already seen it.  You must choose in advance one or two of them to ask for a recommendation.  Then after hearing their recommendation you will decide whether to see the movie.

You might decide to ask just one friend.  If you do it will certainly be the case that sometimes she says thumbs-up and sometimes she says thumbs-down. But let’s be clear why.  I am not assuming that your friends are unpredictable in their opinions.  Indeed you may know their tastes very well.  What I am saying is rather that, if you decide to ask this friend for her opinion, it must be because you don’t know it already. That is, prior to asking you cannot predict whether or not she will recommend this particular movie.  Otherwise, what is the point of asking?

Now you might ask two friends for their opinions.  If you do, then it must be the case that the second friend will often disagree with the first friend.  Again, I am not assuming that your friends are inherently opposed in their views of movies. They may very well have similar tastes. After all they are both your friends. But, you would not bother soliciting the second opinion if you knew in advance that it was very likely to agree or disagree with the first on this particular movie. Because if you knew that then all you would have to do is ask the first friend and use her answer to infer what the second opinion would have been.

If the two referees you consult are likely to agree one way or the other, you get more information by instead dropping one of them and bringing in your third friend, assuming he is less likely to agree.

This is all to say that disagreement is not evidence that peer-review is broken. Exactly the opposite:  it is a sign that editors are doing a good job picking referees and thereby making the best use of the peer-review process.

It would be very interesting to formalize this model, derive some testable implications, and bring it to data. Good data are surely easily accessible.

(Picture:  Right Sizing from www.f1me.net)

(Regular readers of this blog will know I consider that a good thing.)

The fiscal multiplier is an important and hotly debated measure for macroeconomic policy. If the government spends an additional dollar, a dollar’s worth of output is produced, but in addition the dollar is added to disposable income of the recipients who then spend some fraction of it. More output is produced, etc.

It’s hard to measure the multiplier because observed increases in government spending are endogenous and correlated with changes in output for reasons that have nothing to do with fiscal stimulus.

Daniel Shoag develops an instrument which isolates a random component to state-level government spending changes.

Many US states manage pensions which are defined-benefit plans. Defined benefits means that retirees are guaranteed a certain benefit level. This means that the state government bears all of the risk from the investments of these pension funds. Excess returns from these funds are unexpected exogenous windfalls to state spending budgets.

With this instrument, Daniel estimates that an additional dollar of state government spending increases income in the state by $2.12. That is a large multiplier.

The result must be interpreted with some caveats in mind. First, state spending increases act differently than increases at the national level where general equilibrium effects on prices and interest rates would be larger. Second, these spending increases are funded by windfall returns. The effects are likely to be different than spending increases funded by borrowing which crowds out private investment.

Here’s a broad class of games that captures a typical form of competition.  You and a rival simultaneously choose how much effort to spend and depending on your choices, you earn a score, a continuous variable.  The score is increasing in your effort and decreasing in your rival’s effort.  Your payoff is increasing in your score and decreasing in your effort.  Your rival’s payoff is decreasing in your score and his effort.

In football, this could model an individual play where the score is the number of yards gained.  A model like this gives qualitatively different predictions when the payoff is a smooth function of the score versus when there are jumps in the payoff function.  For example, suppose that it is 3rd down and 5 yards to go. Then the payoff increases gradually in the number of yards you gain but then jumps up discretely if you can gain at least 5 yards giving you a first down. Your rival’s payoff exhibits a jump down at that point.

If it is 3rd down and 20 then that payoff jump requires a much higher score. This is the easy case to analyze because the jump is too remote to play a significant role in strategy.  The solution will be characterized by a local optimality condition.  Your effort is chosen to equate the marginal cost of effort to the marginal increase in score, given your rival’s effort.  Your rival solves an analogous problem.  This yields an equilibrium score strictly less than 20.  (A richer, and more realistic model would have randomness in the score.)  In this equilibrium it is possible for you to increase your score, even possibly to 20, but the cost of doing so in terms of increased effort is too large to be profitable.

Suppose that in the above equilibrium you gain 4 yards. Then when it is 3rd down and 5 this equilibrium will unravel.  The reason is that although the local optimality condition still holds, you now have a profitable global deviation, namely putting in enough effort to gain 5 yards.  That deviation was possible before but unprofitable because 5 yards wasn’t worth much more than 4.  Now it is.

Of course it will not be an equilibrium for you to gain 5 yards because then your opponent can increase effort and reduce the score below 5 again.  If so, then you are wasting the extra effort and you will reduce it back to the old value. But then so will he, etc.  Now equilibrium requires mixing.

Finally, suppose it is 3rd down and inches.  Then we are back to a case where we don’t need mixing.  Because no matter how much effort your opponent uses you cannot be deterred from putting in enough effort to gain those inches.

The pattern of predictions is thus:  randomness in your strategy is non-monotonic in the number of yards needed for a first down.  With a few yards to go strategy is predictable, with a moderate number of yards to go there is maximal randomness, and then with many yards to go, strategy is predictable again. Variance in the number of yards gained in these cases will exhibit a similar non-monotonicity.

This could be tested using football data, with run vs. pass mix being a proxy for randomness in strategy.

While we are on the subject, here is my Super Bowl tweet.

I am talking about world records of course.  Tyler Cowen linked to this Boston Globe piece about the declining rate at which world records are broken in athletic events, especially Track and Field.  (Usain Bolt is the exception.)

How quickly should we expect the rate of new world records to decline?  Suppose that long jumps are independent draws from a Normal distribution.  Very quickly the world record will be in the tail.  At that point breaking the record becomes very improbable.  But should the rate decline quickly from there?  Two forces are at work.

First, every new record pushes us further into the tail and reduces the probability, and hence freqeuncy, of new records.  But, because of the thin tail property of the Normal distribution, new records will with very high probability be tiny advances.  So the new record will be harder to beat but not by very much.

So the rate will decline and asymptotically it will be zero, but how fast will it converge to zero?  Will there be a constant K such that we will have to wait no more than nK years for the nth record to be broken or will it be faster than that?

I am sure there is an easy answer to this question for the Normal distribution and probably a more general result, but my intuition isn’t taking me very far.  Probably this is a standard homework problem in probability or statistics.

The Boston Globe piece is about humans ceasing to progress physically.  The theory could shed light on this conclusion.  If the answer above is that the arrival rate increases exponentially, I wonder what rate the mean of the distribution can grow and still give rise to the slowdown.  If the mean grows logarithmically?

Tennis commentators will typically say about a tall player like John Isner or Marin Cilic that their height is a disadvantage because it makes them slow around the court.  Tall players don’t move as well and they are not as speedy.

On the other hand, every year in my daughter’s soccer league the fastest and most skilled player is also among the tallest.  And most NBA players of Isner’s height have no trouble keeping up with the rest of the league. Indeed many are faster and more agile than Isner.  LeBron James is 6’8″.

It is not true that being tall makes you slow. Agility scales just fine with height and it’s a reasonable assumption that agility and height are independently distributed in the population. Nevertheless it is true in practice that all of the tallest tennis players on the tour are slower around the court.

But all of these facts are easily reconcilable.  In the tennis production function, speed and height are substitutes.  If you are tall you have an advantage in serving and this can compensate for lower than average speed if you are unlucky enough to have gotten a bad draw on that dimension.  So if we rank players in terms of some overall measure of effectiveness and plot the (height, speed) combinations that produce a fixed level of effectiveness, those indifference curves slope downward.

When you are selecting the best players from a large population, the top players will be clustered around the indifference curve corresponding to “ridiculously good.” And so when you plot the (height, speed) bundles they represent, you will have something resembling a downward sloping curve.  The taller ones will be slower than the average ridiculously good tennis player.

On the other hand, when you are drawing from the pool of Greater Winnetka Second Graders with the only screening being “do their parent cherish the hour per week of peace and quiet at home while some other parent chases them around?” you will plot an amorphous cloud.  The best player will be the one farthest to the northeast, i.e. tallest and fastest.

Finally, when the sport in question is one in which you are utterly ineffective unless you are within 6 inches of the statistical upper bound in height, then  a) within that range height differences matter much less in terms of effectiveness so that height is less a substitute for speed at the margin and b) the height distribution is so compressed that tradeoffs (which surely are there) are less stark.  Mugsy Bogues notwithstanding.

From Presh Talwalker:

In poker tournaments, everyone gets a fair shot at holding the dealer position as seats are assigned randomly.

In home games, an attempt is also made to assign the dealer spot randomly. There are many methods to choosing the dealer. One of the common methods is dealing to the first ace. It works like this: the host deals a card to each player, face up, and continues to deal until someone receives an ace. This player gets to start the game as dealer.

The question is: does dealing to the first ace give everyone an equal chance to be dealer? Is this a fair system?

Answer:  it’s not.  Presh goes through the full analysis, but here’s a simple way to see why.  Suppose you have 5 players at the table and you are dealing from a deck of 5 cards with 2 aces in it.  Every time you deal there will be two people with aces.  But the person who gets to be dealer is the one who is closest to the host’s left.  If the deal went in the other direction, someone closer to the host’s right would be dealer.

It can’t be fixed by tossing a coin to decide which direction to deal because that would disadvantage players sitting directly across from the dealer.  You need to randomly choose the first person to deal to.  But if you have a trustworthy device for doing that, you don’t need to bother with the aces.

(Regular readers of this blog will know that I consider that a good thing.)

It is rare that I even understand a seminar in econometric theory let alone come away being able to explain it in words but this one was exceptionally clear.

A perennial applied topic is to try to measure the returns to education.  If someone attends an extra year of school how does that affect, say, their lifetime earnings? Absent a controlled experiment, the question is plagued with identification problems.  You can’t just measure the earnings of someone with N years of education and compare that with the earnings of someone with N-1 years because those people will be different along other, unobservable, dimensions.  For example, if intrinsically smarter students go to school longer and earn more, then that difference will be at least partially attributable to intrinsic smartness, independent of the extra year of school.

Even a controlled experiment has confounding factors.  Say you divide the population randomly into two groups and lower the cost of schooling for one group. Then you see the difference in education levels and lifetime earnings among these groups.  These data are hard to interpret because different people in the treated group will respond differently to the cost reduction, probably again depending on their unobserved characteristics.  Those who chose to get an extra year of education are not a random sample from the treated group.

Torgovitzky shows that under a natural assumption you can nevertheless identify the returns to additional schooling for students of all possible innate ability levels, even if those are unobservable.  The assumption is that the ranking of students by educational attainment is unaffected by the treatment.  That is, if students of ability level A get more education than students of ability level B when education is costly, they would also get more education than B when education is less costly.  (Of course their absolute level of education will be affected.)

The logic is surprisingly simple.  Under this assumption, when you look at students in the Qth percentile of education attainment in the treated and control groups, you know they have the same distribution of unobserved ability.  So whatever their difference in earnings is fully explained by their difference in education attainment. (Remember that the Qth percentile measures the relative position in the distributions.  The Qth percentile of the treated groups education distribution is a higher raw number of years of schooling.)

Not only that, but after some magic (see figure 1 in the paper), the entire function mapping (quantiles of) ability level and education to earnings can be identified from data.

In a paper published in the Journal of Quantitative Analysis in Sports; Larsen, Price, and Wolfers demonstrate a profitable betting strategy based on the slight statistical advantage of teams whose racial composition matches that of the referees.

We find that in games where the majority of the officials are white, betting on the team expected to have more minutes played by white players always leads to more than a 50% chance of beating the spread. The probability of beating the spread increases as the racial gap between the two teams widens such that, in games with three white referees, a team whose fraction of minutes played by white players is more than 30 percentage points greater than their opponent will beat the spread 57% of the time.

The methodology of the paper leaves some lingering doubt however because the analysis is retrospective and only some of the tested strategies wind up being profitable.  A more convincing way to do a study like this is to first make a public announcement that you are doing a test and, using the method discussed in the comments here, secretly document what the test is.  Then implement the betting strategy and announce the results, revealing the secret announcement.

 

It’s looney to celebrate New Years, New Millenia, etc.  Every day counts equally in the march of time.  Just by arbitrary historical accident one of those days is called the first day of the year.

But it occurred to me when I wrote the post about the rise in stock prices last month that there is social value from coordinating our focus on arbitrary milestone days.  If someone presents statistics to you about the behavior of some variable over the course of a year, which would be more meaningful?

  1. Stocks rose x% from July 9 2010 to July 9 2011.
  2. Stocks rose x% from Jan 1 2010 to Jan 1 2011.

Subjectively, it is more likely that the dates for the first range were cherrypicked by the statistician to generate the conclusion.  Restricting attention to dates that have significance “outside the model” makes the exhibit more credible.

Commenting on Jonah Lehrer’s article on “The Truth Wears Off,” and how once rock-solid science eventually becomes impossible to replicate, Chris Blattman blames publication bias in all of its various forms.

The culprit? Not biology. Not adaptation to drugs. Not even prescription to less afflicted patients. Rather, it’s scientists themselves.

Journals reward statistical significance, and too many academics massage or select results until the magical two asterisks are reached.

But more worrisome is that much of the problem might be more unconscious: a profession-wide tendency to pay attention to, pursue, write up, publish, and cite unusually large and statistically significant findings.

This is all true, and it’s why you should reject out of hand studies like the one documenting “precognition” that made the rounds a few months ago.  (Who’s gonna even mention, let alone publish a study reporting that “we tried but just couldn’t find evidence that people can see the future”?)

But do be careful:  if there is a publication bias in favor of the unexpected, then you have just as much reason to doubt that the “truth wears off.”  If a fact was first proven then disproven, was publication bias to blame for the proof or the disproof?

 

In sports, high-powered incentives separate the clutch performers from the chokers.  At least that’s the usual narrative but can we really measure clutch performance?  There’s always a missing counterfactual.  We say that he chokes if he doesn’t come through when the stakes are raised.  But how do we know that he wouldnt have failed just as miserably under normal circumstances?  As long as performance has a random element, pure luck (good or bad) can appear as if it were caused by circumstances.

You could try a controlled experiment, and probably psychologists have.  But there is the usual leap of faith required to extrapolate from experimental subjects in artificial environments to professionals trained and selected for high-stakes performance.

Here is a simple quasi-experiment that could be done with readily available data.  In basketball when a team accumulates more than 5 fouls, each additional foul sends the opponent to the free-throw line.  This is called the “bonus.”  In college basketball the bonus has two levels.  After fouls 5-10 (correction: fouls 7-9) the penalty is what’s called a “one and one.”  One free-throw is awarded, and then a second free-throw is awarded only if the first one is good.  After 10 fouls the team enters the “double bonus” where the shooter is awarded two shots no matter what happens on the first.  (In the NBA there is no “single bonus,” after 5 fouls the penalty is two shots.)

The “front end” of the one-and-one is a higher stakes shot because the gain from making it is 1+p points where p is the probability of making the second.  By contrast the gain from making the first of two free throws is just 1 point.  On all other dimensions these are perfectly equivalent scenarios, and it is the most highly controlled scenario in basketball.

The clutch performance hypothesis would imply that success rates on the front end of a one and one are larger than success rates on the first free-throw out of two.  The choke-under-pressure hypothesis would imply the opposite.  It would be very interesting to see the data.

And if there was a difference, the next thing to do would be to analyze video to look for differences in how players approach these shots.  For example I would bet that there is a measurable difference in the time spent preparing for the shot.  If so, then in the case of choking the player is “overthinking” and in the clutch case this would provide support for an effort-performance tradeoff.

What makes an actor a big box office draw?  Is it fame alone or is talent required? Usually that question is confounded:  it’s hard to rule out that an actor became famous because he is talented.

The actors playing Harry, Ron, and Hermione in Harry Potter and The Deathly Hallows are most certainly famous, but almost certainly not because they are talented.  They were cast in that movie nearly 10 years ago when their average age was 11.  No doubt talent played a role in that selection but acting talent at the age of 11 is no predictor of talent at age 20.  The fact that they are in Deathly Hallows is statistically independent of how talented they are.

That is, from the point of view of today it is as if they were randomly selected to be famous film stars out of the vast pool of actors who have been training just as hard as they from ages 11 to 20.  So they are our natural experiment.  If they go on to be successful film stars after the Harry Potter franchise comes to an end then this is statistical evidence that fame itself makes a Hollywood star.

Here’s Daniel Radcliffe on fame.

We focused our analysis on twelve distinct types of touch that occurred when two or more players were in the midst of celebrating a positive play that helped their team (e.g., making a shot). These celebratory touches included fist bumps, high fives, chest bumps, leaping shoulder bumps, chest punches, head slaps, head grabs, low fives, high tens, full hugs, half hugs, and team huddles. On average, a player touched other teammates (M = 1.80, SD = 2.05) for a little less than two seconds during the game, or about one tenth of a second for every minute played.

That is the highlight from the paper “Tactile Communication, Cooperation, and Performance” which documents the connection between touching and success in the NBA. Controlling (I can’t vouch for how well) for variables like salary, preseason expectations, and early season success, the conclusion is:  the more hugs in the first half of the season the more success in the second half.

Suppose you find out that someone named Rory L. Newbie predicted the financial crisis.  Should you conclude that he has some unique expertise in predicting financial crises?  Seems obvious right: someone who has no expertise would need tremendous luck to make a correct prediction, so Rory must be an expert.

But you know that millions of people are making predictions all the time, and even if not a single one of them has any expertise,  the numbers guarantee that at least one of them is going to get it right, just by sheer luck. So for sure someone like Rory is going to get it right, that doesn’t make it any more likely that he is a true expert.

But this sounds unfair to Rory.  Rory made his prediction all on his own and he got it right.  All those other people had nothing to do with it.  If Rory were the only person on the planet then when he gets it right he is an expert.  It seems that just because there are lots of other people on the planet making predictions, Rory is no longer an expert.  How could it be that his being an expert is dependent on how many other people there are in the world?

The way to resolve this is to remember that we only came to know about Rory because he made a correct prediction.  If Rory hadn’t made a correct prediction but instead Rube did, then we would have been talking about Rube instead of Rory.  No matter who it was that made the correct prediction, and for sure there’s somebody out there who did, we would be talking about that person. The name Rory is a trick because in this scenario it is really naming “the person who made a correct prediction.”

But when there’s only Rory, the name refers to that fixed individual.  He was very unlikely to make a correct prediction by dumb luck and so we are correct to conclude his prediction was born of expertise.

Fine, but I leave you with one more paradox for you to resolve on your own. Suppose Rory told his prediction to his wife in advance.  For Rory’s wife Rory is a fixed person.  While there are still many other predictors on the planet, none of them are Rory.  They are irrelevant for Rory’s wife deciding whether Rory is an expert. Now Rory’s prediction comes true. Impossible by dumb luck alone so Rory’s wife concludes that he is an expert.  But, following our logic from above, nobody else does.

Normally a difference of opinion between two people is logically consistent provided they were led to their opinions by different information.  But Rory’s wife and the rest of the world have exactly the same information. This particular guy Rory made a prediction and got it right.  There is nothing that Rory’s wife knows that the rest of the world doesn’t know.  And Rory’s wife is just as aware as the rest of the world that there is a world full of people making predictions. For Rory’s wife that doesn’t matter. Why should it for the rest of the world?

(Not a post about Juan Williams.) From the comments in a post from Jonathan Weinstein:

In fact, there is a simple procedure to simulate a (exactly) unbiased random coin from a biased one. Flip your coin twice (and repeat the procedure if you obtain the same outcome). Call “Heads” if you first got heads than tails, and “Tails” otherwise.

Check out the whole discussion to see how this relates to Ultimate Frisbee, the essential randomness of the last digit in any large integer, Mark Machina, and Fourier analysis over Abelian groups.

From the latest issue of the Journal of Wine Economics, comes this paper.

The purpose of this paper is to measure the impact of Robert Parker’s oenological grades on Bordeaux wine prices. We study their impact on the so-called en primeur wine prices, i.e., the prices determined by the chaˆteau owners when the wines are still extremely young. The Parker grades are usually published in the spring of each year, before the wine prices are established. However, the wine grades attributed in 2003 have been published much later, in the autumn, after the determination of the prices. This unusual reversal is exploited to estimate a Parker effect. We find that, on average, the effect is equal to 2.80 euros per bottle of wine. We also estimate grade-specific effects, and use these estimates to predict what the prices would have been had Parker attended the spring tasting in 2003.

Note that the €2.80 number is the effect on price from having a rating at all, averaging across good ratings and bad.  You do have to buy some identifying assumptions, however.

The most widely cited study on the effect of cell phone usage on traffic accidents is this one by Redelmeier and Tibshirani in the New England Journal of Medicine.  Their conclusion is that talking on the phone leads to a fourfold increase in accident risk.

Their method is interesting.  It’s called a case crossover design, and it works like this.  We want to know the odds ratio of an accident when you talk on the phone versus when you don’t.  Let’s write it like this, where A is the event of an accident and C is the event of talking on a cell phone while driving.

\frac{P(A \mid C)}{P(A \mid \neg C)}.

But we have no way of estimating numerator or denominator from traffic accident data because we would need to know the counterfactuals of how often people drive (with and without talking on the phone) and don’t have accidents.  Case crossover studies are based on a little algebraic trick which transforms the odds ratio into something we can estimate, with just a little more data.  Using Bayes’ rule and two lines of algebra, we can rewrite it like this.

\frac{P(A \mid C)}{P(A \mid \neg C)} = \frac{P(C \mid A)}{P(\neg C \mid A)} \cdot \frac{P(\neg C)}{P(C)}.

From accident data we can estimate the first term on the right-hand-side.  We just calculate the fraction of accidents in which someone was talking on the phone.  The finesse comes in when we estimate the second term. We don’t want to just estimate the overall frequency of cell phone use because we estimated the first term using a selected sample of people who had accidents.  They may be different from the population as a whole. We want the cellphone usage rates for the people in our sample.

Case crossover studies take each person in the data who had an accident and ask them to report whether they were talking on the phone while driving at the same time of day one week before.  Thus, each person generates their own control case.  It’s a valid control because its the same person, driving at the same time, and on average therefore under the same conditions.  These survey data are used to estimate the second term.

It’s really clever and its used a lot in epidemiological studies.  (People get sick, some were exposed to some potential hazard, others not.  The method is used to estimate the increase in risk of getting sick due to being exposed to the hazard.)

I have never seen it in economics however.  In fact, this was the first I ever heard of it.  So its natural to wonder why.  And it doesn’t take long before you see that it has a serious weakness when applied to data with a lot of heterogeneity.

To see the problem, suppose that there are two types of people. The first group, in addition to being generally accident prone are also easily distracted.  Everyone else is a safe driver and talking on cellphones doesn’t make them any less safe.  Then our sample of people who actually had accidents would consist disproportionately of the first group.  We would be estimating the effect of cell phone use on them alone. If they make up a small fraction of the population then we are drastically overestimating the increase in risk.

It’s fair to say that at best we can use the estimate of 4 as an upper bound on the risk ratio averaging over the entire population.  That population average could be zero and still be consistent with the findings from case crossover studies.  And there is no simple way to remedy the problems with this method.  So I think there is good reason to approach this question from a different direction.

As I described before, if cell phone distractions increase accident risk we would see it by comparing the population of drivers to drivers with hearing impairment, who don’t use cell phones.  And it turns out that the data exist.  In the NHTSA’s database of traffic accidents, there is this variable:

P18 Person’s Physical Impairment

Definition: Identifies physical impairments for all drivers and non-motorists which may have contributed to the cause of the crash.

And “deaf” is impairment number 9.

If a tree falls in a forest and no one is around to hear it, does it make a sound?

This old philosophical conundrum can be mapped into the dilemma facing the aging academic:

If I publish a paper and nobody reads it, teaches it or cites it, can it ever be a truly great paper?

As with all questions with no Platonic certitude, economists say: Let the market speak and tell us the answer.

Glenn Ellison has studied a more serious version of my question in his paper “How Does the Market Use Citation Data? The Hirsch Index in Economics.” The Hirsch index for an author is the highest number h such that the author has h papers with at least h citations.  So, an index of 5 means you have five  papers with at least five citations and that you do not have six papers with at least six citations etc.

Glenn points out  that the Hirsch index doesn’t do a great job at ranking economists.  Nobel prize winner Roger Myerson’s Hirsch index is a mere 32.  But he has a few papers with over a thousand citations.  Seminal papers in economics tend to get a huge number of citations but most only get a few.  So, the plain vanilla Hirsch index needs to be re-evaluated.

Glenn turns to the market to guide his measure.  He studies an index of the form h is the highest number such that the author has at least h papers with at least a times h to the power b citations. The plain vanilla Hirsch index sets a=b=1. Glenn estimates a and b in various ways. In one method, he looks at the NRC department rankings and finds the variables a and b that best predict the NRC rank of a (young) economist’s department.  To cut a long story short, a=5 and b=2 come out as the best predictors.  With this estimation in hand, we can perform various comparisons – Which fields are highly cited? Which economists are highly cited? Etc..

Here are some tasty morsels of information.  International finance, trade and behavioral economics are highly cited fields (Table 6).  Micro theory and cross-sectional econometrics are the worst and IO does not do too well either.  These facts mean Yale and NU, which are strong in these three areas, are under-cited economics departments.  But basically one gets the picture that an economists citations are closely connected to the rank of the university where s/he is employed.

Ranking young economists, it is pretty obvious who is going to come out on top: Daron Acemoglu with an index of 7.84 (Table 7).  This means Daron has 7.84 papers with roughly 300 citations.  Ed Glaeser and Chad Jones are close behind.  Once you adjust by field, more theorists start to rank highly: Glenn, Ilya Segal, Stephen Morris and Susan Athey pop up.  Also, my friend Aviv Nevo gets a shout out as an underplaced guy.

A few comments:

Most of these people are tenured well before their citations go crazy.  Expert opinion not data-mining leads to their tenure.  This tells you how well expert opinion predicts citations.  Also, to the extent that citations take time, expert opinion will always play a role in tenure decisions.   There is a difference between external opinion and internal opinion.  The same few people always get asked to write letters and they will do a good job.  But internal opinions may be more noisy and depend on the quality of the department.  Then, Glenn’s field-adjusted citation measure gives you some idea of a candidate’s quality and might be a valuable input into the tenure decision.

Finally, there are citations and citations.  A paper getting regular cites in top journals is better than a paper getting cites in lower tier journals.  This can be dealt with by improving the citation index.

At another extreme, some papers may be journalistic, not academic, and then their citations mean less.  For example, Malcom Gladwell gets high citations for the Tipping Point but he did not do any of the original scientific research on which his book is based.  Of course he writes wonderfully and comes up with amazing examples and he is clearly an intellectual.  I bet Harvard would love to have him an as an adjunct professor but they will not give him a tenured professorship.

Despite these caveats, the generalized Hirsch index is an interesting input for academic decision-making.

They say you can’t compare the greats from yesteryear with the stars of today. But when it comes to Nobel laureates, to some extent you can.

The Nobel committee is just like a kid with a bag of candy.  Every day (year) he has to decide which piece of candy to eat (to whom to give the prize) and each day some new candy might be added to his bag (new candidates come on the scene.)  The twist is that each piece of candy has a random expiration date (economists randomly perish) so sometimes it is optimal to defer eating his favorite piece of candy in order to enjoy another which otherwise might go to waste.

The empirical question we are then left with is to uncover the Nobel committee’s underlying ranking of economists based on the awards actually given over time.  It’s not so simple, but there are some clear inferences we can make. (Here’s a list of Laureates up to 2006, with their ages.)

To see that it is not so simple, note that just because X got the prize and Y didn’t doesn’t mean that X is better than Y.  It could have been that the committee planned eventually to give the prize to Y but Y died earlier than expected (or Y is still alive and the time has not yet arrrived.)

When would the committee award the prize to X before Y despite ranking Y ahead of X?  A necessary condition is that Y is older than X and is therefore going to expire sooner.  (I am assuming here that age is a sufficient statistic for mortality risk.)  That gives us our one clear inference:

If X received the prize before Y and X was born later than Y then X is revealed to be better than Y.

(The specific wording is to emphasize that it is calendar age that matters, not age at the time of receiving the prize.  Also if Y never received the prize at all that counts too.)

Looking at the data, we can then infer some rankings.

One of the  first economists to win the prize, Ragnar Frisch (who??) is not revealed preferred to anybody. By contrast, Paul Samuelson, who won the very next year is revealed preferred to kuznets, hicks, leontif, von hayk, myrdal, kantorovich, koopmans, friedman, meade, ohlin, lewis, schulz, stigler, stone, allais, haavelmo, coase and vickrey.

Outdoing Samuelson is Ken Arrow, who is revealed preferred to everyone Samuelson is plus simon, klein, tobin, debreu, buchanan, north, harsanyi, schelling and hurwicz (! hurwicz won the prize 37 years later!), but minus kuznets (a total of 25!)

Also very impressive is Robert Merton who had an incredible streak of being revealed preferred to everyone winning the prize from 1998 to 2006, ended only by Maskin and Myerson (but see below.)

On the flipside, there’s Tom Schelling who is revealed to be worse than 28 other Laureates.  Leo Hurwicz is revealed to be worse than all of those plus Phelps. Hurwicz is not revealed preferred to anybody, a distinction he shares with Vickrey, Havelmo, Schultz (who??), Myrdal (?), Kuznets and Frisch.

Paul Krugman is batting 1,000 having been revealed preferred to all (two) candidates coming after him:  Williamson and Ostrom.

Similar exercises could be carried out with any prize that has a “lifetime achievement” flavor (for example Sophia Loren is revealed preferred to Sidney Poitier, natch.)

There’s a real research program here which should send decision theorists racing to their whiteboards.  We deduced one revealed preference implication. Question:  is that all we can deduce or are there other implied relations?  This is actually a family of questions that depend on how strong assumptions we want to make about the expiration dates in the candy bag.  At one extreme we could ask “is any ranking consistent with the boldface rule above rationalizable by some expiration dates known to the child but not to us?”  My conjecture is yes, i.e. that the boldface rule exhausts all we can infer.

At the other end, we might assume that the committee knows only the age of the candidates and assumes that everyone of a given age has the average mortality rate for that age (in the United States or Europe.)  This potentially makes it harder to rationalize arbitrary choices and could lead to more inferences.  This appears to be a tricky question (the infinite horizon introduces some subtleties.  Surely though Ken Arrow has already solved it but is too modest to publish it.)

Of course, the committee might have figured out that we are making inferences like this and then would leverage those to send stronger signals.  For example, giving the prize to Krugman at age 56 becomes a very strong signal.  This would add some noise.

Finally, the kid-with-a-candy-bag analogy breaks down when we notice that the committee forms bundles.  Individual rankings can still be inferred but more considerations come into play.  Maskin and Myerson got the prize very young, but Hurwicz, with whom they shared the prize, was very close to expiration. We can say that the oldest in a bundle is revealed preferred to anyone older who receives a prize later.  Plus we can infer rankings of fields by looking at the timing of prizes awarded to researchers in similar areas.  For example, time-series econometrics (2003) is revealed preferred to the theory of organizations (2009.)

The Bottom Line:  There is clear advice here for those hoping to win the prize this year, and those who actually do.  If you do win the prize, for your acceptance speech you should start by doing pushups to prove how virile you are.  This signals to the world that you were not given the award because of an impending expiration date but that in fact there was still plenty of time left but the committee still saw fit to act now.  And if you fear you will never win the prize, the sooner you expire the more willing will the public be to believe that you would have won if only you had stuck around.

Cell phone use increases the risk of traffic accidents right?  But how do we prove that?  By showing that a large fraction of accidents involve people talking on cell phones?  Not enough.  A huge fraction of accidents involve people wearing shoes too.

I thought about this for a while and short of a careful randomized experiment it seems hard to get a handle on this using field data.  I poked around a bit and I didn’t find much that looked very convincing.  To give you an example of the standards of research on this topic, one study I found actually contains the following line:

Results Driver’s use of a mobile phone up to 10 minutes before a crash was associated with a fourfold increased likelihood of crashing (odds ratio 4.1, 95% confidence interval 2.2 to 7.7, P < 0.001).

(Think about that for a second.)

Here’s something we could try.  Compare the time trend of accident rates for the overall population of drivers with the same trend restricted to deaf drivers. We would want a time period that begins before the widespread use of mobile phones and continues until today.  Presumably the deaf do not talk on cell phones. So if cell phone use contributed to an increase in traffic risk we would see that in the general population but not among the deaf.

On the other hand, the deaf can use text messaging.  Since there was a period of time when cell phones were in widespread use but text messaging was not, then this gives us an additional test.  If text messaging causes accidents, then this is a bump we should see in both samples.

Anyone know if the data are available?  I am serious.

This article from Not Exactly Rocket Science discusses an experiment studying “competition” between the left and right sides of the brain. Subjects in the experiment had to pick up an object placed at different points on a table and what was observed was which hand they used depending on where the object was. The article makes this observation in passing.

they always used the nearest hand to pick up targets at the far edges of the table, but they used either hand for those near the middle. Their reaction times were slower when they had to choose which hand to use, and particularly if the target was near the centre of the table.

This much is expected, but it supports the idea that the brain is choosing between possible movements associated with each hand. At the centre of the table, when the choice is least clear, it takes longer to come down on one hand or the other.

I stopped there.  Because while this sounds intuitive, there is another intuition that points squarely in the opposite direction.  When the object is in the center of the table, that’s when it matters least which hand you use, so there is no reason to spend extra time thinking about it.  Right?  So…when you have competing intuitions you need a model.

You have to take an action, say “left” or “right” and your payoff depends on the state of the world, some number between -1 and 1.  You prefer “right” when the state is positive and “left” when the state is negative and the farther away from zero is the state, the stronger is that preference.  When the state is exactly zero you are indifferent.

You don’t know the state with perfect precision.  Instead, you initially receive a noisy signal about the state and you have to decide whether to take action right away (and which action) or wait and get a more accurate signal.  It’s costly to wait.  For what values of the initial signal do you wait?  Note that in this model, both of the competing intuitions are present.  If your initial signal is close to zero, it is likely that the true state is close to zero so your loss from choosing the wrong action is small.  Thus the gain from waiting for better information is small.  On the other hand, if your initial signal is far from zero, then the new information is unlikely to affect which action you take so again the gain from waiting is small.

But now we can compute the relative gain.  And the in-passing intuition quoted above is the winner.

Consider two possible values of the initial signal, both positive but one close to zero and one close to +1.  In either case if you don’t wait you will take action “right.”  Now consider the gain from waiting.  Take any state x and let’s consider the scenario where waiting would lead you to believe that the state is x.  If x is positive then you would still choose “right” and waiting would not gain anything.  So fix any negative x and ask what would the gain be if waiting led you to believe that the state is x.  The key observation is that for any fixed x, this gain would be the same regardless of which of the initial signals you had.

So the comparison then just boils down to comparing how likely it is to switch to x from the two different initial signals.  And this comparison depends on how far to the left x is.  Signals very close to -1 are much easier to reach from an initial signal close to zero than from an initial signal close to 1.  And these are the signals where the gain is large.  On the other hand, for x’s just to the left of zero (where the gain is small), the relative likelihood of reaching x from the two initial signals is closer to 50-50.

Formally, unless the distribution generating these signals is very strange, the distribution of payoff gains after an initial signal close to zero first-order stochastically dominates the distribution of payoff gains when you start close to 1.  So you are always more inclined to wait when your initial signal is close to zero.

For 15 years, the British bookmaker William Hill allowed bettors to wager on their own weight loss, often taking out full-page newspaper ads to publicize the bet.  This was a clear opportunity for those looking to lose weight to make a commitment, with real teeth.  Here is a paper by Nicholas Burger and John Lynham which analyzes the data.

Descriptive statistics are presented in Table 2, which shows that 80% of bettors lose their bets. Odds for the bets range from 5:1 to 50:1 and potential payoffs average $2332.9 The average daily weight loss that a bettor must achieve to win their bet is 0.39 lbs. In terms of reducing caloric intake to lose weight, this is equivalent to reducing daily consumption by two Starbucks hot chocolates. The first insight we draw from this market is that although bettors are aware of their need for commitment mechanisms, those in our sample are not particularly skilled at selecting the right mechanisms.10 Bettors go to great lengths to construct elaborate constraints on their behaviour, which are usually unsuccessful.

Women do much worse than men.  Bets in which the winnings were committed to charity outperformed the average.  Bets with a longer duration (Lose 2x pounds in 2T days rather than x pounds in T days) have longer odds, suggesting that the market understands time inconsistency.

Beanie barrage:  barker.

Follow

Get every new post delivered to your Inbox.

Join 737 other followers