By asking a hand-picked team of 3 or 4 experts in the field (the “peers”), journals hope to accept the good stuff, filter out the rubbish, and improve the not-quite-good-enough papers.

…Overall, they found a reliability coefficient (r^2) of 0.23, or 0.34 under a different statistical model. This is pretty low, given that 0 is random chance, while a perfect correlation would be 1.0. Using another measure of IRR, Cohen’s kappa, they found a reliability of 0.17. That means that peer reviewers only agreed on 17% more manuscripts than they would by chance alone.

That’s from neuroskeptic writing about an article that studies the peer-review process.  I couldn’t tell you what Cohen’s kappa means but let’s just take the results at face value:  referees disagree a lot.  Is that bad news for peer-review?

Suppose that you are thinking about whether to go to a movie and you have three friends who have already seen it.  You must choose in advance one or two of them to ask for a recommendation.  Then after hearing their recommendation you will decide whether to see the movie.

You might decide to ask just one friend.  If you do it will certainly be the case that sometimes she says thumbs-up and sometimes she says thumbs-down. But let’s be clear why.  I am not assuming that your friends are unpredictable in their opinions.  Indeed you may know their tastes very well.  What I am saying is rather that, if you decide to ask this friend for her opinion, it must be because you don’t know it already. That is, prior to asking you cannot predict whether or not she will recommend this particular movie.  Otherwise, what is the point of asking?

Now you might ask two friends for their opinions.  If you do, then it must be the case that the second friend will often disagree with the first friend.  Again, I am not assuming that your friends are inherently opposed in their views of movies. They may very well have similar tastes. After all they are both your friends. But, you would not bother soliciting the second opinion if you knew in advance that it was very likely to agree or disagree with the first on this particular movie. Because if you knew that then all you would have to do is ask the first friend and use her answer to infer what the second opinion would have been.

If the two referees you consult are likely to agree one way or the other, you get more information by instead dropping one of them and bringing in your third friend, assuming he is less likely to agree.

This is all to say that disagreement is not evidence that peer-review is broken. Exactly the opposite:  it is a sign that editors are doing a good job picking referees and thereby making the best use of the peer-review process.

It would be very interesting to formalize this model, derive some testable implications, and bring it to data. Good data are surely easily accessible.

(Picture:  Right Sizing from