By asking a hand-picked team of 3 or 4 experts in the field (the “peers”), journals hope to accept the good stuff, filter out the rubbish, and improve the not-quite-good-enough papers.
…Overall, they found a reliability coefficient (r^2) of 0.23, or 0.34 under a different statistical model. This is pretty low, given that 0 is random chance, while a perfect correlation would be 1.0. Using another measure of IRR, Cohen’s kappa, they found a reliability of 0.17. That means that peer reviewers only agreed on 17% more manuscripts than they would by chance alone.
That’s from Neuroskeptic, writing about an article that studies the peer-review process. I couldn’t tell you what Cohen’s kappa means, but let’s just take the results at face value: referees disagree a lot. Is that bad news for peer-review?
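For anyone in the same boat, my rough understanding is that Cohen’s kappa compares how often two referees actually agree with how often they would agree by chance, given each referee’s own accept/reject rate. Here is a minimal sketch; the contingency table is invented purely for illustration and has nothing to do with the study’s data:

```python
def cohens_kappa(table):
    """table[i][j] = number of manuscripts where referee A gave verdict i
    and referee B gave verdict j (0 = reject, 1 = accept)."""
    total = sum(sum(row) for row in table)
    # Observed agreement: fraction of manuscripts on the diagonal.
    p_o = sum(table[i][i] for i in range(len(table))) / total
    # Chance agreement: product of the two referees' marginal rates.
    p_e = sum(
        (sum(table[i]) / total) * (sum(row[i] for row in table) / total)
        for i in range(len(table))
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical counts, invented purely for illustration:
#                  B: reject  B: accept
counts = [[45, 20],    # A: reject
          [20, 15]]    # A: accept
print(round(cohens_kappa(counts), 2))  # -> 0.12: little agreement beyond chance
```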
Suppose that you are thinking about whether to go to a movie and you have three friends who have already seen it. You must choose in advance one or two of them to ask for a recommendation. Then after hearing their recommendation you will decide whether to see the movie.
You might decide to ask just one friend. If you do, it will certainly be the case that sometimes she says thumbs-up and sometimes she says thumbs-down. But let’s be clear why. I am not assuming that your friends are unpredictable in their opinions. Indeed you may know their tastes very well. What I am saying is rather that, if you decide to ask this friend for her opinion, it must be because you don’t know it already. That is, prior to asking you cannot predict whether or not she will recommend this particular movie. Otherwise, what is the point of asking?
Now you might ask two friends for their opinions. If you do, then it must be the case that the second friend will often disagree with the first friend. Again, I am not assuming that your friends are inherently opposed in their views of movies. They may very well have similar tastes. After all they are both your friends. But you would not bother soliciting the second opinion if you knew in advance that it was very likely to agree (or very likely to disagree) with the first on this particular movie. Because if you knew that, then all you would have to do is ask the first friend and use her answer to infer what the second opinion would have been.
If the two friends you consult are likely to agree one way or the other, you get more information by dropping one of them and bringing in your third friend, assuming his opinion is harder to predict from the first.
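If you want to put a number on “more information,” one way (this is my own framing, not anything from the study) is to treat each opinion as a random variable and ask how much uncertainty the second opinion still resolves once you have heard the first. The joint probabilities below are made up for illustration:

```python
from math import log2

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def extra_info_from_second(joint):
    """H(B | A): the uncertainty the second opinion still resolves once you
    have heard the first.  joint[a][b] = P(friend A says a, friend B says b)."""
    h_a = entropy([sum(row) for row in joint])          # H(A)
    h_ab = entropy([p for row in joint for p in row])   # H(A, B)
    return h_ab - h_a                                   # chain rule: H(B|A) = H(A,B) - H(A)

# Made-up joint distributions over (thumbs-down, thumbs-up) verdicts:
independent    = [[0.25, 0.25],
                  [0.25, 0.25]]  # second friend unpredictable from the first
agree_prone    = [[0.45, 0.05],
                  [0.05, 0.45]]  # second friend almost always agrees
disagree_prone = [[0.05, 0.45],
                  [0.45, 0.05]]  # almost always disagrees -- equally predictable

print(extra_info_from_second(independent))     # 1.0 bit: a full extra opinion
print(extra_info_from_second(agree_prone))     # ~0.47 bits: mostly redundant
print(extra_info_from_second(disagree_prone))  # ~0.47 bits: redundant the other way
```

Either way, a pair whose verdicts are mutually predictable, whether by agreeing or by disagreeing, leaves less for the second opinion to add.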
This is all to say that disagreement is not evidence that peer-review is broken. Exactly the opposite: it is a sign that editors are doing a good job picking referees and thereby making the best use of the peer-review process.
It would be very interesting to formalize this model, derive some testable implications, and bring it to data. Good data are surely easily accessible.
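As a purely back-of-the-envelope sketch of my own (none of the parameters or assumptions here come from the post or the study): suppose each referee observes the paper’s true quality with independent noise and recommends acceptance when the signal clears a bar. Even individually informative referees then agree only modestly more often than chance, which is one testable implication such a model would deliver:

```python
import random

random.seed(0)

def simulate_agreement(noise, n_papers=100_000):
    """Two referees each observe true quality plus independent noise and
    recommend acceptance when their own signal clears the bar at zero.
    Returns the fraction of papers on which their verdicts coincide."""
    agree = 0
    for _ in range(n_papers):
        quality = random.gauss(0, 1)
        verdict_a = quality + random.gauss(0, noise) > 0
        verdict_b = quality + random.gauss(0, noise) > 0
        agree += (verdict_a == verdict_b)
    return agree / n_papers

# As the referees' private noise grows, raw agreement falls toward the 50%
# chance level even though each referee's verdict still tracks quality.
for noise in (0.5, 1.0, 2.0):
    print(noise, round(simulate_agreement(noise), 3))
```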
(Picture: Right Sizing from www.f1me.net)
6 comments
February 16, 2011 at 12:01 am
MMP
Now suppose I add in some sort of reputation and moral hazard. You’re worried as an editor that I’m a slacker as a referee… I suspect that you can use ‘high’ levels of agreement in the reports, relative to your ex-ante opinion, as evidence that the referees aren’t doing their job and are just basing their opinions on the authors’ names instead. Did you ever do that as an editor?
February 16, 2011 at 12:02 am
jeff
only with you.
February 16, 2011 at 1:26 am
anonymouse
Prof. Myerson once related a story to us about a paper he had submitted to some journal. One referee said, “This is great but you should really keep the first half and get rid of the second.” Another referee said, “This is great but you should really keep the second half and get rid of the first.” And fortunately the editor decided to take the union of those recommendations, rather than the intersection…
Having a variety of opinions means the editor can take the appropriate unions and intersections of different sorts of comments the referees might make.
February 16, 2011 at 10:08 am
wellplacedadjective
efficient and durable decision rules with incomplete information?
February 25, 2011 at 11:22 pm
zbicyclist
I’m reminded of a study (in Chance, I believe, some years ago) that compared top movie reviewers. Interestingly (1) reliability was pretty low, and (2) Siskel and Ebert, whose show emphasized disagreement, were more in agreement than other top critics of the day.
December 2, 2012 at 11:25 pm
A New Paper About The Editorial Process « Cheap Talk
[…] The random selection of referees removes this potential objection. […]