From the authors:
A fundamental problem of the peer review process is that it introduces conflicts of interest or moral hazard in a variety of situations. By accepting high-quality work and thus promoting it, the referee risks drawing attention to those ideas, and possibly away from her own. A post-doc looking for his next position may not be happy to accept a good paper by a peer competing for the same position. A big shot in a particular field might fear for his 'guru status' if he accepts challenging, and perhaps better, ideas than his own, and so on. In other words, referees who optimize their overall 'utility' (status, papers, fame, position, ...) might find that accepting good scientific work by others is in direct conflict with their own utility maximization. In the following we call utility-optimizing referees rational. Though one might argue that it is obvious that self-interest would hurt peer review, we would like to be able to put some numbers behind the idea so as to "quantify" how bad the problem might be.
To test the effects of a rational referee, the authors ran several simulations in which papers are refereed by the 3 types of referees described below. The scientific quality of the papers follows a Gaussian distribution, i.e., each paper "is assigned an 'IQ' index... drawn from a normal distribution" with mean = 100 and standard deviation = 10. In addition to being a referee, each person in the simulation is also the author of a paper, but never referees his or her own paper. Here are the types of referees considered:
- The correct referee: This person is competent to judge the quality of the work and accepts only the best scientific papers submitted. (Using an algorithm described in the above paper.)
- The stupid referee: This is someone who is not competent to properly judge the work, so acceptance or rejection is effectively random. (Who hasn't run across this? :) )
- The rational referee: This is someone who compares the quality of the paper under review to the quality of his or her own work, accepting it only if it is no better than his or her own.
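The three decision rules above can be sketched as simple functions. This is only a minimal interpretation of the setup, not the paper's actual acceptance algorithm; in particular, the fixed threshold of 100 for the correct referee is an assumption made here for illustration.

```python
import random

MEAN, SD = 100, 10  # each paper's 'IQ' is drawn from N(100, 10)

def correct_referee(paper_iq, own_iq, threshold=MEAN):
    # Competent judge: accepts papers above a quality threshold.
    # (The paper describes its own algorithm; a fixed threshold
    # at the population mean is an assumption of this sketch.)
    return paper_iq > threshold

def stupid_referee(paper_iq, own_iq):
    # Cannot judge quality: acceptance is a coin flip.
    return random.random() < 0.5

def rational_referee(paper_iq, own_iq):
    # Self-interested: accepts only work no better than his or her own.
    return paper_iq <= own_iq
```

Note that the rational referee's rule is the only one that depends on the referee's own paper, which is why each agent in the simulation is both an author and a referee.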
At this point it should be pointed out that since papers are drawn from a distribution with mean 100, an average accepted-paper score of 100 means the peer review process does no better than flipping a coin. With that said, here are some plots:
The above plots show average accepted-paper quality versus the fraction of rational referees. The three colored curves correspond to different fractions of "stupid" referees; for example, the blue curve has 10% of the referees being stupid.
The plot above shows what happens after t publication rounds. Fig. a is the case where all referees are correct; as you can see, the average paper IQ is ~120. Fig. b shows a histogram of the IQ of the accepted papers compared to the Gaussian distribution they were drawn from. Fig. c is the same as Fig. a except that 10% of the referees are now rational, and Fig. d is the corresponding histogram. With only 10% of referees rational, paper quality already diminishes significantly.
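The qualitative effect is easy to reproduce with a toy one-round simulation. This sketch assigns a single randomly chosen referee to each paper (the paper itself uses repeated rounds and a different acceptance algorithm, so the numbers here will not match the ~120 in the plots; only the downward trend as rational referees are added is the point):

```python
import random

def average_accepted_iq(n_papers=10_000, frac_rational=0.1,
                        frac_stupid=0.1, seed=1):
    """Toy single-round simulation: each paper gets one referee
    drawn from a population mixing the three referee types.
    Assumptions (not from the paper): one referee per paper,
    one round, and a fixed threshold of 100 for correct referees."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_papers):
        paper_iq = rng.gauss(100, 10)
        referee_iq = rng.gauss(100, 10)  # the referee is also an author
        r = rng.random()
        if r < frac_stupid:                        # stupid: coin flip
            accept = rng.random() < 0.5
        elif r < frac_stupid + frac_rational:      # rational: no better than own work
            accept = paper_iq <= referee_iq
        else:                                      # correct: above threshold
            accept = paper_iq > 100
        if accept:
            accepted.append(paper_iq)
    return sum(accepted) / len(accepted)
```

With all referees correct, the average accepted IQ sits several points above 100 even in a single round; raising the rational fraction pulls the average back toward 100, the coin-flip baseline.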
The authors conclude thus:
And as we all know, referees who factor in their own self-interest, or who are incompetent to judge the works they are assigned, do exist. As may be inferred from the above, if these two groups aren't kept in check, the process may become no more reliable than flipping a coin.
The presence of relatively small fractions of 'rational' and/or 'random' referees (deviating from correct behavior) considerably reduces the average quality of published or sponsored science as a whole... systemic level. Our message is clear: if it can not be guaranteed that the fraction of 'rational' and 'random' referees is confined to a very small number, the peer review system will not perform much better than by accepting papers by throwing (an unbiased!) coin.
Stefan Thurner & Rudolf Hanel (2010). Peer-review in a world with rational scientists: Toward selection of the average. E-print: arXiv:1008.4324v1