Why a Crowd-Sourced Peer-Review System Would Be Good for Philosophy (guest post)

By Justin Weinberg | February 7, 2022 at 6:00 am

Would "an online, crowd-sourced peer-review system" work better than traditional peer-review as a "quality control device" in philosophy? In a paper forthcoming in The British Journal for the Philosophy of Science, three philosophers, Marcus Arvan (Tampa), Liam Kofi Bright (LSE), and Remco Heesen (Western Australia), argue for a positive answer to this question.

In the following guest post,* they lay out some of the main considerations in favor of the idea that they discuss more fully in the paper itself, as well as address some objections to it.


Why a Crowd-Sourced Peer-Review System Would Be Good for Philosophy
by Marcus Arvan, Liam Kofi Bright, and Remco Heesen

Peer review is often thought to be an important form of quality control on academic research. But, assuming it is, what is the best form of peer review for this purpose? It appears to be widely assumed that peer review at academic journals is the best method. For example, hiring and tenure committees evaluate candidates on the basis of their publication record. But is peer review at journals really the best method for evaluating quality? We argue not. Using the Condorcet Jury Theorem, we contend that an online, crowd-sourced peer-review system similar to what currently prevails in math and physics is likely to perform better as a quality control device than traditional peer review.

We first argue that, if any form of peer review is to have any success at quality control, two conditions need to be satisfied. First, researchers in a given field must be competent at evaluating the quality of research. Second, for a given paper there must be some intersubjective agreement (however broad or vague) on what constitutes quality appropriate for that paper. If either of these assumptions were false, then no system of peer review could perform the form of quality control commonly attributed to it.

Next, we assume that a crowd-sourced peer-review system could be expected to have a higher average number of reviewers per paper than traditional peer review. This is plausible because the number of reviewers who evaluate a given paper in journal review is minuscule: papers submitted to journals are standardly evaluated by an editor or two at the ‘desk-reject’ stage, and if they pass this stage, they are normally sent to only one to three reviewers. We expect that an online, crowd-sourced system would involve many more people reviewing papers, particularly if a crowd-sourced peer-review website (built on top of preprint servers like arXiv or PhilPapers) incentivized reviewing.

Based on these assumptions, we construct a series of arguments that a crowd-sourced approach is likely to evaluate the quality of academic research more reliably than traditional peer review. Our arguments are based on the Condorcet Jury Theorem, the famous mathematical result that, provided each evaluator is more likely than not to judge correctly, the majority judgment of a larger group is more likely to be correct than that of a smaller one. To see how, consider a jury of 100 people tasked with voting on whether p is true. Suppose that the likelihood that any individual member will judge p rightly is slightly better than chance, say .51. The most likely outcome is that 51 members of the jury vote correctly and 49 do not, so only a couple of additional errant votes are needed for the majority judgment to get p wrong—which happens with a probability of about .38. Now consider a jury of 100,000. If each jury member’s accuracy remains .51, then the most likely result is 51,000 jury members voting correctly and 49,000 incorrectly. For the majority judgment to err, more than 1,000 additional voters must err—which occurs with a probability of only about one in ten billion. In short, the Condorcet Jury Theorem shows that, so long as individual evaluators are more likely than not to judge correctly, a larger group is far more likely to evaluate something correctly as a group than a smaller one.
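
To make these figures concrete, the arithmetic can be checked directly against the binomial distribution. The short sketch below is an illustration of the calculation just described, not part of the paper's formal apparatus; it computes the probability that a strict majority of n reviewers errs, assuming each reviewer independently judges correctly with probability .51:

```python
# Sketch of the Condorcet arithmetic described above: the probability that a
# strict majority of n independent reviewers errs, when each reviewer judges
# correctly with probability p = 0.51 (the assumption used in the example).
from scipy.stats import binom

p = 0.51  # assumed individual accuracy, slightly better than chance

for n in (100, 100_000):
    # The majority errs when fewer than n/2 of the n votes are correct.
    prob_majority_errs = binom.cdf(n // 2 - 1, n, p)
    print(f"n = {n:>7,}: P(majority errs) ~ {prob_majority_errs:.1e}")

# Prints roughly 3.8e-01 for n = 100 and about 1e-10 for n = 100,000,
# matching the .38 and 'one in ten billion' figures quoted above.
```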

We then provide three arguments using this theorem that a crowd-sourced peer-review system is likely to result in more reliable group judgments of paper quality than journal review. We argue that this follows irrespective of whether the crowd-sourced system involves (1) binary judgments (i.e., paper X is good/not good), (2) reviewer scores (i.e., evaluating papers on some scale, e.g., 1-100), or (3) qualitative reasons given by reviewers. Since peer review at journals standardly utilizes one or more of these measures of quality—as reviewers may be asked to render an overall judgment on a paper (accept/reject), rate a paper numerically, or write qualitative reviewer reports—it follows that a crowd-sourced peer-review system is likely to evaluate paper quality better than journal review.
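
As a toy illustration of the first two kinds of judgment (the reviewer data and field names below are hypothetical, not drawn from the paper or from any existing platform), a crowd-sourced site could aggregate binary verdicts by majority vote and numerical scores by averaging:

```python
# Hypothetical illustration of aggregating crowd reviews under modes (1) and (2).
# The reviewer data below is invented for the example.
from statistics import mean

reviews = [
    {"good": True,  "score": 78},
    {"good": True,  "score": 85},
    {"good": False, "score": 42},
    {"good": True,  "score": 66},
    {"good": True,  "score": 71},
]

# (1) Binary judgments: report the majority verdict.
majority_good = sum(r["good"] for r in reviews) > len(reviews) / 2

# (2) Reviewer scores: report the mean score on a 1-100 scale.
mean_score = mean(r["score"] for r in reviews)

print("Majority verdict:", "good" if majority_good else "not good")  # good
print(f"Mean score: {mean_score:.1f}/100")  # 68.4/100
```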

Finally, we address a variety of objections, including logistical concerns about how an online, crowd-sourced system would work. First, we argue that ‘review bombing’ and trolling could be addressed in several ways, ranging from technological solutions (such as statistical software to detect and flag correlated votes) to human-based ones, such as initially anonymizing papers for some period of time, allowing reviewers or moderators to flag suspicious reviews, and maintaining two types of reviewers with separate reviewer scores: expert reviewers and general reviewers.

Second, to the common objection that journals are likely to select more reliable reviewers than a crowd-based system would have—since journals (particularly selective ones) may be likely to select the most highly established experts in a field as reviewers—we argue that a variety of findings cast doubt on this. Empirical studies of peer review indicate that interrater reliability among journal reviewers is barely better than chance, and moreover that journal review is disproportionately conservative, preferring ‘safe’ papers over more ambitious ones. We suggest a variety of reasons for this: journals have incentives to avoid false positives (publishing bad papers); reviewers and editors have incentives to reject papers given that a journal can accept only a few; well-established researchers have reasons to be biased in favor of the status quo; and small groups of reviewers who publish in the same area and attend conferences together may be liable to groupthink. These speculations are backed up by numerous examples in a variety of fields—including philosophy, psychology, and economics—of influential or otherwise prestigious papers (including Nobel Prize-winning economics papers) being systematically rejected by journals. We argue that whatever biases exist in a crowd-sourced model are likely to be distributed more randomly. Hence, the combined judgment of crowd-sourced reviewers will be more reliable on average, not less.

If we are correct, should peer review at journals disappear? We are agnostic about this (at least as a group), as the disciplines of math and physics combine crowd-sourced peer review with journal review. Given that some are likely to remain skeptical of online reviews, we suspect that a Rotten Tomatoes-like crowd-sourced peer-review site—perhaps housed at PhilPapers or here—might complement rather than supplant peer-reviewed journals, broadly as the two systems currently coexist in math and physics: a ‘best of both worlds’ approach. Indeed, it would be interesting to compare how the two systems work concurrently.

Would a crowd-based peer-review system like we propose actually work in practice? Would enough people partake in it? Would reviews be thoughtful and evidence-based (reflecting reviewer competence) or incompetent? Could logistical problems (such as the kinds of things that have plagued Rottentomatoes.com) be overcome? We argue that answers to these questions cannot be settled a priori, but that there are a number of reasons to be optimistic. Finally, we offer suggestions for how to ‘beta’ (and later tweak) our proposal. Only time will tell, but we believe that what we currently lack are not good reasons for attempting to create such a forum—as our paper purports to show that there are good reasons to try. What we currently lack is the will to create such a system, and we hope that our paper contributes to building this will.

6 Comments

David Wallace, 9 hours ago:

I think the comparison to physics might be misleading. Physics doesn’t have any formal crowd-sourced peer review: arxiv.org doesn’t have a "rate my paper" function, for instance. What it does have is a pretty systematic culture of posting to arXiv at the same time as, before, or (occasionally) instead of submitting to a journal.

If the proposal is that we should develop that culture too, I’m all for it, and it would be logistically very easy to do given our existing archives (PhilPapers and philsci-archive). Indeed, to some extent we already have that culture in philosophy of physics: a respectable fraction of people, including me, post their papers physics-style. (And I’ve done so since I was a grad student.) If anything, I think preprint submission has decreased among junior people in my field, and I’d love to see it reversed.

If the proposal is to develop a formal, aggregative system to rank preprints, I’m ambivalent about it, but at any rate it would be going well beyond what physics does.

Reply to David Wallace, 8 hours ago:

As a junior philosopher of physics with a mixed record of uploading preprints: I am generally wary of posting preprints for anything that I intend to possibly submit to a triple-blind journal, for fear of (increasing the odds of) losing the best relevant editor at that journal. Since BJPS, in particular, is one of these journals, there is often at least one such journal on the list for any given manuscript. So, I feel this chilling effect most of the time—but it only takes the one journal to induce it.

6 hours ago:

Nice to see the virtues of crowd-reviewing rigorously expounded. This is something that I’ve been thinking about as a possible PhilPapers project. I continue to think about it, though these projects have taken a back seat with COVID-induced disruptions.

H. N. Torrance, 4 hours ago:

I am dubious that we should ‘crowd source’ assessments that involve expertise that only a minority in the crowd will have. For example: even if the Condorcet Jury Theorem does well in cases where people have broadly the same background knowledge (e.g., how many marbles are in the jar), it would be a mistake to crowd-source questions about whether some theory in quantum physics should be published (or: imagine if they had decided whether to publish Andrew Wiles’ FLT proof by crowdsourcing it… even though only a small handful of people even understood it). But what we say for quantum physics and math should presumably go also for philosophy (e.g., crowdsourcing the latest argument for what grounds facts about grounding, etc.).

Marcus Arvan, in reply to H. N. Torrance, 3 hours ago:

Hi H.N.: it’s worth noting here that we build components into our proposal to address this, including mechanisms for identifying pools of expert reviewers in different subfields and reporting their scores and reviews separately from non-experts. We argue that there are advantages in providing both types of scores simultaneously. Yes, experts are experts. But experts can also be subject to groupthink and have dubious assumptions pointed out by relative outsiders. So, having both types of reviewers in a crowd-based system corrects for/balances the respective epistemic merits of both types of reviewer (and better so, we argue, than journal peer review alone).

2 hours ago:

‘Would "an online, crowd-sourced peer-review system" work better than traditional peer-review as a "quality control device" in philosophy?’

This is the first sentence of this article. I think it is the wrong question. The question I would ask is: would "an online, crowd-sourced peer-review system" enhance the traditional peer-review system?

And even if the answer were a tepid ‘maybe…’, I think it is worth a shot. What have we got to lose? Not much! And the potential gains could fundamentally change the dynamics of innovation in theoretical thinking.

Now that we all agree that such a system would potentially be beneficial, let me refer to the last statement of the article to drill to the crux of the issue as it stands: ‘…[W]hat we currently lack are not good reasons for attempting to create such a forum—as our paper purports to show that there are good reasons to try. What we currently lack is the will to create such a system, and we hope that our paper contributes to building this will.’

Luckily, my system—The Matrix-8 Solution—will handle the logistical problems traditionally besetting democratic processes in large groups. It should put Condorcet at ease. Not only does it allow for large numbers of evaluators; it solves the Democratic Trilemma to boot! And in Trusted Reputation, it solves the longstanding problem of differentiating between bots, trolls, and honorable participants. It is currently under development as the governance system for an up-and-coming cryptocurrency. It could easily be tweaked to fit your proposed forum’s needs.

You can see the dynamic of Trusted Reputation here, and the full White Paper for the system here.
