Why a Crowd-Sourced Peer-Review System Would Be Good for Philosophy (guest post)
Would "an online, crowd-sourced peer-review system" work better than traditional peer-review as a "quality control device" in philosophy? In a paper forthcoming in The British Journal for the Philosophy of Science, three philosophers, Marcus Arvan (Tampa), Liam Kofi Bright (LSE), and Remco Heesen (Western Australia), argue for a positive answer to this question.
In the following guest post,* they lay out some of the main considerations in favor of the idea that they discuss more fully in the paper itself, as well as address some objections to it.
Why a Crowd-Sourced Peer-Review System Would Be Good for Philosophy
by Marcus Arvan, Liam Kofi Bright, and Remco Heesen
Peer review is often thought to be an important form of quality control on academic research. But, assuming it is, what is the best form of peer review for this purpose? It appears to be widely assumed that peer review at academic journals is the best method. For example, hiring and tenure committees evaluate candidates on the basis of their publication records. But is peer review at journals really the best method for evaluating quality? We argue not. Using the Condorcet Jury Theorem, we contend that an online, crowd-sourced peer-review system similar to what currently prevails in math and physics is likely to perform better as a quality control device than traditional peer review.
We first argue that, if any form of peer review is to have any success at quality control, two conditions must be satisfied. First, researchers in a given field must be competent at evaluating the quality of research. Second, for a given paper there must be some intersubjective agreement (however broad or vague) on what constitutes quality appropriate for that paper. If either of these assumptions were false, then no system of peer review could perform the kind of quality control commonly attributed to it.
Next, we assume that a crowd-sourced peer-review system could be expected to have a higher average number of reviewers per paper than traditional peer review. This is plausible because the number of reviewers who evaluate a given paper in journal review is minuscule: papers submitted to journals are standardly evaluated by an editor or two at the ‘desk-reject’ stage, and if they pass this stage, they are normally sent to only one to three reviewers. We expect that an online, crowd-sourced system would involve many more people reviewing papers, particularly if a crowd-sourced peer-review website (built on top of preprint servers like arXiv or PhilPapers) incentivized reviewing.
Based on these assumptions, we construct a series of arguments that a crowd-sourced approach is likely to evaluate the quality of academic research more reliably than traditional peer review. Our arguments are based on the Condorcet Jury Theorem, the famous mathematical result that the majority judgment of a large group of independent, better-than-chance evaluators is far more likely to be correct than that of a small group. To see how, consider a jury of 100 people tasked with voting on whether p is true. Suppose that the probability that any individual member judges p correctly is slightly better than chance, say .51. The most likely outcome is that 51 members of the jury vote correctly and 49 do not, so it takes only a couple of additional errant votes to tip the majority judgment into error; the probability of the majority erring is roughly .38. Now consider a jury of 100,000. If the average jury member’s accuracy remains .51, then the most likely result is 51,000 jury members voting correctly and 49,000 incorrectly. For the majority judgment to err, roughly 1,000 additional voters must err, which occurs with a probability of about one in ten billion. In short, the Condorcet theorem shows that a large group of evaluators is more likely to evaluate something correctly as a group than a small one.
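For readers who want to check these figures, here is a minimal sketch, not from the paper, that computes the probability of an errant majority from the binomial distribution; the function name and the convention of counting a 50-50 tie as neither outcome are illustrative choices:

```python
# Illustrative check of the Condorcet figures above (not from the paper).
from scipy.stats import binom

def majority_error_probability(n_voters: int, accuracy: float) -> float:
    """Probability that strictly more voters judge incorrectly than
    correctly, when each of n_voters votes independently and is correct
    with probability `accuracy`. For even n, a 50-50 tie counts as
    neither a correct nor an incorrect majority judgment."""
    # The majority errs when the number of correct votes is at most:
    threshold = (n_voters - 1) // 2 if n_voters % 2 else n_voters // 2 - 1
    return binom.cdf(threshold, n_voters, accuracy)

print(majority_error_probability(100, 0.51))      # ~0.38, as quoted above
print(majority_error_probability(100_000, 0.51))  # ~1.2e-10, about 1 in 10 billion
```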
We then provide three arguments using this theorem that a crowd-sourced peer-review system is likely to result in more reliable group judgments of paper quality than journal review. We argue that this follows irrespective of whether the crowd-sourced system involves (1) binary judgments (i.e., paper X is good/not good), (2) reviewer scores (i.e., evaluating papers on some scale, e.g., 1-100), or (3) qualitative reasons given by reviewers. Since peer review at journals standardly utilizes one or more of these measures of quality, as reviewers may be asked to render an overall judgment on a paper (accept/reject), rate it numerically, or write qualitative reviewer reports, it follows that a crowd-sourced peer-review system is likely to evaluate paper quality better than journal review.
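As a minimal illustration (ours, not the paper's specification) of how the first two kinds of judgment might be pooled in such a system, binary verdicts can be aggregated by majority vote and numeric scores by averaging, while qualitative reports would be read rather than computed over:

```python
# Illustrative aggregation of crowd judgments (a sketch, not the paper's spec).
from statistics import mean

def aggregate_binary(verdicts: list[bool]) -> bool:
    """Majority vote over good/not-good judgments of a paper."""
    return sum(verdicts) > len(verdicts) / 2

def aggregate_scores(scores: list[float]) -> float:
    """Average of reviewer scores on some fixed scale (e.g., 1-100)."""
    return mean(scores)

# Example: four binary verdicts and three numeric scores for one paper.
print(aggregate_binary([True, True, False, True]))  # True (3 of 4 approve)
print(aggregate_scores([72, 85, 64]))               # ~73.67
```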
Finally, we address a variety of objections, including logistical concerns about how an online, crowd-sourced system would work. First, we argue that ‘review bombing’ and trolling could be addressed in several ways, ranging from technological solutions (such as statistical software to detect and flag correlated votes) to human-based ones: initially anonymizing papers for some period of time, allowing reviewers or moderators to flag suspicious reviews, and distinguishing two types of reviewers with separate reviewer scores, expert reviewers and general reviewers. Second, consider the common objection that journals are likely to select more reliable reviewers than a crowd-based system would have, since journals (particularly selective ones) may be likely to select the most highly established experts in a field as reviewers. We argue that a variety of findings cast doubt on this. Empirical studies of peer review indicate that interrater reliability among journal reviewers is barely better than chance, and moreover that journal review is disproportionately conservative, preferring ‘safe’ papers over more ambitious ones. We suggest a variety of reasons for this: journals have incentives to avoid false positives (publishing bad papers); reviewers and editors have incentives to reject papers, given that a journal can accept only a few; well-established researchers have reasons to be biased in favor of the status quo; and small groups of reviewers who publish in the same area and attend conferences together may be liable to groupthink. These speculations are backed up by numerous examples in a variety of fields, including philosophy, psychology, and economics, of influential or otherwise prestigious papers (including Nobel Prize-winning economics papers) being systematically rejected by journals. We argue that whatever biases exist in a crowd-sourced model are likely to be distributed more randomly. Hence, the combined judgment of crowd-sourced reviewers will be more reliable on average, not less.
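The paper does not specify an algorithm for detecting correlated votes; as one hypothetical illustration of the idea, a moderation tool might flag pairs of reviewers whose ratings move together suspiciously closely across the papers they both rated (all names and thresholds below are invented for the sketch):

```python
# Hypothetical sketch of the "flag correlated votes" idea; not the paper's
# method. Flags reviewer pairs whose rating patterns are near-identical
# across many shared papers, for a human moderator to inspect.
import numpy as np

def flag_correlated_reviewers(scores: np.ndarray,
                              min_shared: int = 10,
                              threshold: float = 0.95) -> list[tuple[int, int]]:
    """scores: (n_reviewers, n_papers) array of ratings, with np.nan where
    a reviewer did not rate a paper. Returns suspicious reviewer pairs."""
    flagged = []
    n_reviewers = scores.shape[0]
    for i in range(n_reviewers):
        for j in range(i + 1, n_reviewers):
            shared = ~np.isnan(scores[i]) & ~np.isnan(scores[j])
            if shared.sum() < min_shared:
                continue  # too few co-rated papers to judge
            a, b = scores[i, shared], scores[j, shared]
            if a.std() == 0 or b.std() == 0:
                continue  # constant ratings: correlation undefined
            if np.corrcoef(a, b)[0, 1] > threshold:
                flagged.append((i, j))  # candidates for moderator review
    return flagged
```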
If we are correct, should peer review at journals disappear? We are agnostic about this (at least as a group), since the disciplines of math and physics combine crowd-sourced peer review with journal review. Given that some may remain skeptical of online reviews, we suspect that a Rotten Tomatoes-like crowd-sourced peer-review site, perhaps housed at PhilPapers or here, might complement rather than supplant peer-reviewed journals, in broadly the way that math and physics currently combine the two: a ‘best of both worlds’ approach. Indeed, it would be interesting to compare how the systems work concurrently.
Would a crowd-based peer-review system like the one we propose actually work in practice? Would enough people partake in it? Would reviews be thoughtful and evidence-based (reflecting reviewer competence) or incompetent? Could logistical problems (such as the kinds of things that have plagued Rottentomatoes.com) be overcome? We argue that answers to these questions cannot be settled a priori, but that there are a number of reasons to be optimistic. Finally, we offer suggestions for how to ‘beta’ (and later tweak) our proposal. Only time will tell, but we believe that what we currently lack is not good reasons for attempting to create such a forum; our paper purports to show that there are good reasons to try. What we currently lack is the will to create such a system, and we hope that our paper contributes to building it.
I think the comparison to physics might be misleading. Physics doesn’t have any formal crowd-sourced peer review: arxiv.org doesn’t have a "rate my paper" function, for instance. What it does have is a pretty systematic culture of posting to arxiv at the same time as, before, or (occasionally) instead of submitting to a journal.
If the proposal is that we should develop that culture too, I’m all for it, and it would be logistically very easy to do given our existing archives (PhilPapers and philsci-archive). Indeed, to some extent we already have that culture in philosophy of physics: a respectable fraction of people, including me, post their papers physics-style. (And I’ve done so since I was a grad student.) If anything, I think preprint submission has decreased among junior people in my field, and I’d love to see it reversed.
If the proposal is to develop a formal, aggregative system to rank preprints, I’m ambivalent about it, but at any rate it would be going well beyond what physics does.
As a junior philosopher of physics with a mixed record of uploading preprints: I am generally wary of posting preprints for anything that I intend to possibly submit to a triple-blind journal, for fear of (increasing the odds of) losing the best relevant editor at that journal. Since BJPS, in particular, is one of these journals, there is often at least one such journal on the list for any given manuscript. So, I feel this chilling effect most of the time — but it only takes the one journal to induce it.
Nice to see the virtues of crowd-reviewing rigorously expounded. This is something that I’ve been thinking about as a possible PhilPapers project. I continue to think about it, though these projects have taken a back seat with COVID-induced disruptions.
I am dubious that we should ‘crowd-source’ assessments that involve expertise that only a minority in the crowd will have. For example: even if the Condorcet Jury Theorem does well in cases where people have broadly the same background knowledge (e.g., how many marbles are in the jar), it would be a mistake to crowd-source questions about whether some theory in quantum physics should be published (or: imagine if they decided whether to publish Andrew Wiles’ FLT proof by crowdsourcing it… even though only a small handful of people even understood it). But what we say for quantum physics and math should presumably go also for philosophy (e.g., crowdsourcing the latest argument for what grounds facts about grounding, etc.).
Hi H.N.: it’s worth noting here that we build components into our proposal to address this, including mechanisms for identifying pools of expert reviewers in different subfields and reporting their scores and reviews separately from non-experts’. We argue that there are advantages in providing both types of scores simultaneously. Yes, experts are experts. But experts can also be subject to groupthink and have dubious assumptions pointed out by relative outsiders. So, having both types of reviewers in a crowd-based system corrects for and balances the respective epistemic merits of both types of reviewer (and better so, we argue, than journal peer review alone).
‘Would "an online, crowd-sourced peer-review system" work better than traditional peer-review as a "quality control device" in philosophy?’
This is the first sentence of this article. I think it is the wrong question. Here is the question I would ask: Would "an online, crowd-sourced peer-review system" enhance the traditional peer-review system?
And even if the answer were a tepid ‘maybe’, I think it is worth a shot. What have we got to lose? Not much! And the potential gains could fundamentally change the dynamics of innovation in theoretical thinking.
Now that we all agree that such a system would potentially be beneficial, let me refer to the last statement of the article to drill down to the crux of the issue as it stands: ‘…[W]hat we currently lack is not good reasons for attempting to create such a forum; our paper purports to show that there are good reasons to try. What we currently lack is the will to create such a system, and we hope that our paper contributes to building it.’
Luckily, my system, The Matrix-8 Solution, will handle the logistical problems traditionally besetting democratic processes in large groups. It should put Condorcet at ease. Not only does it allow for large numbers of evaluators; it solves the Democratic Trilemma to boot! And in Trusted Reputation, it solves the longstanding problem of differentiating between bots, trolls, and honorable participants. It is currently under development as the governance system for an up-and-coming cryptocurrency. It could easily be tweaked to fit your proposed forum’s needs.
You can see the dynamic of Trusted Reputation here, and the full White Paper for the system here.