Why a Crowd-Sourced Peer-Review System Would Be Good for Philosophy (guest post)
Would "an online, crowd-sourced peer-review system" work better than traditional peer-review as a "quality control device" in philosophy? In a paper forthcoming in The British Journal for the Philosophy of Science, three philosophers, Marcus Arvan (Tampa), Liam Kofi Bright (LSE), and Remco Heesen (Western Australia), argue for a positive answer to this question.
In the following guest post,* they lay out some of the main considerations in favor of the idea that they discuss more fully in the paper itself, as well as address some objections to it.
Why a Crowd-Sourced Peer-Review System Would Be Good for Philosophy
by Marcus Arvan, Liam Kofi Bright, and Remco Heesen
Peer review is often thought to be an important form of quality control on academic research. But, assuming it is, what is the best form of peer review for this purpose? It appears to be widely assumed that peer review at academic journals is the best method. For example, hiring and tenure committees evaluate candidates on the basis of their publication records. But is peer review at journals really the best method for evaluating quality? We argue not. Using the Condorcet Jury Theorem, we contend that an online, crowd-sourced peer-review system similar to what currently prevails in math and physics is likely to perform better as a quality control device than traditional peer review.
We first argue that, if any form of peer review is to have any success at quality control, two conditions must be satisfied. First, researchers in a given field must be competent at evaluating the quality of research. Second, for a given paper there must be some intersubjective agreement (however broad or vague) on what constitutes quality appropriate for that paper. If either of these assumptions were false, then no system of peer review could perform the kind of quality control commonly attributed to it.
Next, we assume that a crowd-sourced peer-review system could be expected to have a higher average number of reviewers per paper than traditional peer review. This is plausible because the number of reviewers who evaluate a given paper in journal review is minuscule: papers submitted to journals are standardly evaluated by an editor or two at the ‘desk-reject’ stage, and if they pass this stage, they are normally sent to only one to three reviewers. We expect that an online, crowd-sourced system would involve many more people reviewing papers, particularly if a crowd-sourced peer-review website (built on top of preprint servers like arXiv or PhilPapers) incentivized reviewing.
Based on these assumptions, we construct a series of arguments that a crowd-sourced approach is likely to evaluate the quality of academic research more reliably than traditional peer review. Our arguments are based on the Condorcet Jury Theorem, the famous mathematical result that the majority judgment of a large group of independent, better-than-chance evaluators is far more likely to be correct than that of a small group. To see how, consider a jury of 100 people tasked with voting on whether p is true. Suppose that the probability that any individual member judges p correctly is slightly better than chance, say .51. The most likely outcome is that 51 members of the jury vote correctly and 49 do not, so it takes only a couple of additional errant votes to tip the majority judgment into error; the probability of the majority erring is roughly .38. Now consider a jury of 100,000. If the average jury member’s accuracy remains .51, then the most likely result is 51,000 jury members voting correctly and 49,000 incorrectly. For the majority judgment to err, roughly 1,000 additional voters must err, which occurs with a probability of about one in ten billion. In short, the Condorcet theorem shows that a large group of evaluators is more likely to evaluate something correctly as a group than a small one.
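For readers who want to check these figures, here is a minimal sketch, not from the paper, that computes the probability of an errant majority from the binomial distribution; the function name and the convention of counting a 50-50 tie as neither outcome are illustrative choices:

```python
# Illustrative check of the Condorcet figures above (not from the paper).
from scipy.stats import binom

def majority_error_probability(n_voters: int, accuracy: float) -> float:
    """Probability that strictly more voters judge incorrectly than
    correctly, when each of n_voters votes independently and is correct
    with probability `accuracy`. For even n, a 50-50 tie counts as
    neither a correct nor an incorrect majority judgment."""
    # The majority errs when the number of correct votes is at most:
    threshold = (n_voters - 1) // 2 if n_voters % 2 else n_voters // 2 - 1
    return binom.cdf(threshold, n_voters, accuracy)

print(majority_error_probability(100, 0.51))      # ~0.38, as quoted above
print(majority_error_probability(100_000, 0.51))  # ~1.2e-10, about 1 in 10 billion
```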
We then provide three arguments using this theorem that a crowd-sourced peer-review system is likely to result in more reliable group judgments of paper quality than journal review. We argue that this follows irrespective of whether the crowd-sourced system involves (1) binary judgments (i.e., paper X is good/not good), (2) reviewer scores (i.e., evaluating papers on some scale, e.g., 1-100), or (3) qualitative reasons given by reviewers. Since peer review at journals standardly utilizes one or more of these measures of quality, as reviewers may be asked to render an overall judgment on a paper (accept/reject), rate it numerically, or write qualitative reviewer reports, it follows that a crowd-sourced peer-review system is likely to evaluate paper quality better than journal review.
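As a minimal illustration (ours, not the paper's specification) of how the first two kinds of judgment might be pooled in such a system, binary verdicts can be aggregated by majority vote and numeric scores by averaging, while qualitative reports would be read rather than computed over:

```python
# Illustrative aggregation of crowd judgments (a sketch, not the paper's spec).
from statistics import mean

def aggregate_binary(verdicts: list[bool]) -> bool:
    """Majority vote over good/not-good judgments of a paper."""
    return sum(verdicts) > len(verdicts) / 2

def aggregate_scores(scores: list[float]) -> float:
    """Average of reviewer scores on some fixed scale (e.g., 1-100)."""
    return mean(scores)

# Example: four binary verdicts and three numeric scores for one paper.
print(aggregate_binary([True, True, False, True]))  # True (3 of 4 approve)
print(aggregate_scores([72, 85, 64]))               # ~73.67
```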
Finally, we address a variety of objections, including logistical concerns about how an online, crowd-sourced system would work. First, we argue that ‘review bombing’ and trolling could be addressed in several ways, ranging from technological solutions (such as statistical software to detect and flag correlated votes) to human-based ones: initially anonymizing papers for some period of time, allowing reviewers or moderators to flag suspicious reviews, and distinguishing two types of reviewers with separate reviewer scores, expert reviewers and general reviewers. Second, consider the common objection that journals are likely to select more reliable reviewers than a crowd-based system would have, since journals (particularly selective ones) may be likely to select the most highly established experts in a field as reviewers. We argue that a variety of findings cast doubt on this. Empirical studies of peer review indicate that interrater reliability among journal reviewers is barely better than chance, and moreover that journal review is disproportionately conservative, preferring ‘safe’ papers over more ambitious ones. We suggest a variety of reasons for this: journals have incentives to avoid false positives (publishing bad papers); reviewers and editors have incentives to reject papers, given that a journal can accept only a few; well-established researchers have reasons to be biased in favor of the status quo; and small groups of reviewers who publish in the same area and attend conferences together may be liable to groupthink. These speculations are backed up by numerous examples in a variety of fields, including philosophy, psychology, and economics, of influential or otherwise prestigious papers (including Nobel Prize-winning economics papers) being systematically rejected by journals. We argue that whatever biases exist in a crowd-sourced model are likely to be distributed more randomly. Hence, the combined judgment of crowd-sourced reviewers will be more reliable on average, not less.
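The paper does not specify an algorithm for detecting correlated votes; as one hypothetical illustration of the idea, a moderation tool might flag pairs of reviewers whose ratings move together suspiciously closely across the papers they both rated (all names and thresholds below are invented for the sketch):

```python
# Hypothetical sketch of the "flag correlated votes" idea; not the paper's
# method. Flags reviewer pairs whose rating patterns are near-identical
# across many shared papers, for a human moderator to inspect.
import numpy as np

def flag_correlated_reviewers(scores: np.ndarray,
                              min_shared: int = 10,
                              threshold: float = 0.95) -> list[tuple[int, int]]:
    """scores: (n_reviewers, n_papers) array of ratings, with np.nan where
    a reviewer did not rate a paper. Returns suspicious reviewer pairs."""
    flagged = []
    n_reviewers = scores.shape[0]
    for i in range(n_reviewers):
        for j in range(i + 1, n_reviewers):
            shared = ~np.isnan(scores[i]) & ~np.isnan(scores[j])
            if shared.sum() < min_shared:
                continue  # too few co-rated papers to judge
            a, b = scores[i, shared], scores[j, shared]
            if a.std() == 0 or b.std() == 0:
                continue  # constant ratings: correlation undefined
            if np.corrcoef(a, b)[0, 1] > threshold:
                flagged.append((i, j))  # candidates for moderator review
    return flagged
```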
If we are correct, should peer review at journals disappear? We are agnostic about this (at least as a group), since the disciplines of math and physics combine crowd-sourced peer review with journal review. Given that some may remain skeptical of online reviews, we suspect that a Rotten Tomatoes-like crowd-sourced peer-review site, perhaps housed at PhilPapers or here, might complement rather than supplant peer-reviewed journals, in broadly the way that math and physics currently combine the two: a ‘best of both worlds’ approach. Indeed, it would be interesting to compare how the systems work concurrently.
Would a crowd-based peer-review system like the one we propose actually work in practice? Would enough people partake in it? Would reviews be thoughtful and evidence-based (reflecting reviewer competence) or incompetent? Could logistical problems (such as the kinds of things that have plagued Rottentomatoes.com) be overcome? We argue that answers to these questions cannot be settled a priori, but that there are a number of reasons to be optimistic. Finally, we offer suggestions for how to ‘beta’ (and later tweak) our proposal. Only time will tell, but we believe that what we currently lack is not good reasons for attempting to create such a forum; our paper purports to show that there are good reasons to try. What we currently lack is the will to create such a system, and we hope that our paper contributes to building it.
I think the comparison to physics might be misleading. Physics doesn’t have any formal crowd-sourced peer review: arxiv.org doesn’t have a "rate my paper" function, for instance. What it does have is a pretty systematic culture of posting to arxiv at the same time as, before, or (occasionally) instead of submitting to a journal.
If the proposal is that we should develop that culture too, I’m all for it, and it would be logistically very easy to do given our existing archives (PhilPapers and philsci-archive). Indeed, to some extent we already have that culture in philosophy of physics: a respectable fraction of people, including me, post their papers physics-style. (And I’ve done so since I was a grad student.) If anything, I think preprint submission has decreased among junior people in my field, and I’d love to see it reversed.
If the proposal is to develop a formal, aggregative system to rank preprints, I’m ambivalent about it, but at any rate it would be going well beyond what physics does.
As a junior philosopher of physics with a mixed record of uploading preprints: I am generally wary of posting preprints for anything that I intend to possibly submit to a triple-blind journal, for fear of (increasing the odds of) losing the best relevant editor at that journal. Since BJPS, in particular, is one of these journals, there is often at least one such journal on the list for any given manuscript. So, I feel this chilling effect most of the time — but it only takes the one journal to induce it.
Nice to see the virtues of crowd-reviewing rigorously expounded. This is something that I’ve been thinking about as a possible PhilPapers project. I continue to think about it, though these projects have taken a back seat with COVID-induced disruptions.
I am dubious that we should ‘crowd-source’ assessments that involve expertise that only a minority in the crowd will have. For example: even if the Condorcet Jury Theorem does well in cases where people have broadly the same background knowledge (e.g., how many marbles are in the jar), it would be a mistake to crowd-source questions about whether some theory in quantum physics should be published (or: imagine if they decided whether to publish Andrew Wiles’ FLT proof by crowdsourcing it… even though only a small handful of people even understood it). But what we say for quantum physics and math should presumably go also for philosophy (e.g., crowdsourcing the latest argument for what grounds facts about grounding, etc.).
Hi H.N.: it’s worth noting here that we build components into our proposal to address this, including mechanisms for identifying pools of expert reviewers in different subfields and reporting their scores and reviews separately from non-experts’. We argue that there are advantages in providing both types of scores simultaneously. Yes, experts are experts. But experts can also be subject to groupthink and have dubious assumptions pointed out by relative outsiders. So, having both types of reviewers in a crowd-based system corrects for and balances the respective epistemic merits of both types of reviewer (and better so, we argue, than journal peer review alone).
‘Would "an online, crowd-sourced peer-review system" work better than traditional peer-review as a "quality control device" in philosophy?’
This is the first sentence of this article. I think it is the wrong question. Here is the question I would ask: Would "an online, crowd-sourced peer-review system" enhance the traditional peer-review system?
And even if the answer were a tepid ‘maybe’, I think it is worth a shot. What have we got to lose? Not much! And the potential gains could fundamentally change the dynamics of innovation in theoretical thinking.
Now that we all agree that such a system would potentially be beneficial, let me refer to the last statement of the article to drill down to the crux of the issue as it stands: ‘…[W]hat we currently lack is not good reasons for attempting to create such a forum; our paper purports to show that there are good reasons to try. What we currently lack is the will to create such a system, and we hope that our paper contributes to building it.’
Luckily, my system, The Matrix-8 Solution, will handle the logistical problems traditionally besetting democratic processes in large groups. It should put Condorcet at ease. Not only does it allow for large numbers of evaluators; it solves the Democratic Trilemma to boot! And in Trusted Reputation, it solves the longstanding problem of differentiating between bots, trolls, and honorable participants. It is currently under development as the governance system for an up-and-coming cryptocurrency. It could easily be tweaked to fit your proposed forum’s needs.
You can see the dynamic of Trusted Reputation here, and the full White Paper for the system here.