Against “publication lakes”: glam journals provide a needed service to science
Or why evaluating papers (partly) by the journal impact factor is not so bad
(I have had this in my drafts for months, but the recent announcement by eLife that it was becoming a publication lake made me revisit it and publish it).
In parts of the open science community that I am adjacent to, there is a cluster of ideas that goes roughly as follows:
1. The Journal Impact Factor (JIF, a number that applies to journals rather than individual papers; a sketch of its definition follows this list) is a highly imperfect metric.1
2. We should not evaluate papers by where they are published at all.2
3. We should divorce the evaluation of correctness from any notion of “predicted impact.” Thus, pre-publication review should only evaluate papers for correctness and not for “perceived impact.” Correct papers can be published into a publication lake and it should be post-publication review that determines whether they are worthy of having their profile raised.
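As an aside, for readers who have not run into the JIF before: it is, roughly, a two-year citation average per journal. The sketch below follows the common definition (the one used by Clarivate), glossing over details such as what exactly counts as a “citable item”:

$$
\mathrm{JIF}_{Y}(\text{journal}) \;\approx\; \frac{\text{citations received in year } Y \text{ by items the journal published in years } Y{-}1 \text{ and } Y{-}2}{\text{number of citable items the journal published in years } Y{-}1 \text{ and } Y{-}2}
$$

So, as a purely illustrative example, a journal whose 2020–2021 papers were cited 5,000 times in 2022, spread over 1,000 citable items, would have a 2022 JIF of about 5.0.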
These ideas have been embodied in initiatives like PLoS One, the DORA Declaration, and eLife’s new model.
I also once thought roughly this way. Now, it feels like the kind of naïve ideal about how the internet would work that we might have taken seriously in 2004. That naïve view has been disproved in practice. YouTube and Twitter are publication lakes: you put things out there, and up/down votes (or retweets, &c) plus automated algorithms determine how much attention you get (and how many resources go along with it).
In some sense, this works: millions of people use these services every day (including myself). It is hard to argue, though, that it is doing a good job directing attention towards the most reliable sources of information. What goes viral is driven in large part by pure randomness and Matthew Effects (where the famous get more attention).
Filtering is necessary
Sometimes there is a glib statement that “you should not rely on journal names to evaluate papers, you have to read them.” In practice, this is absurd. Given that hundreds of papers are published every day in any given field, there is a need to select and filter them. When evaluating thousands of papers in hundreds of CVs, is every member of a committee expected to read them all?
The idealized model of the publication lake argues that post-publication curation and automated filtering will solve the problem. Having seen semi-automated emergent filtering on social media, I think it will bring its own set of problems.
In particular, I am pretty sure that, for all its faults, the system whereby a knowledgeable editor uses multiple peer reviews to make an educated guess about the importance of a paper is a superior one (not a perfect one, but let us not fall prey to the Nirvana Fallacy).
I do read a lot of preprints. Which preprints do I read? I mostly read preprints by people whose names I already recognize. When thousands of papers are thrown into the “level playing field” of bioRxiv, pre-existing markers of prestige end up taking an even greater role.
I also skim the titles that Google Scholar emails me a few times a week, when it sends a selection of papers that have cited me. I don’t know how it selects those, but it picks 10 or so per week, which I again skim for recognizable markers (journal titles and authors).3
I also sometimes read papers that are popular on social media. But how well does being able to go viral on social media correlate with solid science? Is that correlation even positive? (see this recent story).
Maybe we already have publication lakes and post-publication review
An alternative view is that, actually, we already have a publication lake and we call it bioRxiv. We also have post-publication review, except that, confusingly, we call it “pre-publication review” even though it happens after publication in the bioRxiv lake. Then, the preprint gets a stamp of approval from a journal editor, a step again confusingly called “publishing it” (even though it was already published).
We have our lake and eat it too.
Can we have prestige at a lower cost?
If there is one problem with the current system, it is its high costs4: a significant amount of effort is spent optimizing for an increased probability of a high-profile publication.
Papers in journals such as Nature can spend years in review, and revisions for those journals often take more effort than writing a completely new manuscript for a lower-profile journal. We recently spent a full 30 months between first submission and official publication. The science did improve quite a lot between the first and the last submission, but a large fraction of the effort was driven purely by the publication process: we spent months revising text, redoing figures, and writing hundreds of pages of “reviewer rebuttals.” There was also a lot of uncertainty and angst, and less mental space to work on future projects.
This is an economics problem: prestige sorting has societal value, but the individual value of prestige is so large that there are incentives to over-optimize for it. The question should have been: how can we keep the societal value derived from prestige sorting while minimizing its costs?
In this respect, eLife had actually taken significant positive steps before. It was a high(ish)-profile journal that worked to minimize the amount of effort it demanded of its authors (consolidating reviewer reports, minimizing the number of review rounds, …).5 Now, it seems to have taken the view that prestige sorting is bad and no cost should be paid for it. Frankly, I fail to see what it is contributing to the problem space that is not already offered by PLOS One or F1000 Research, except that it is reusing a brand that had acquired some cachet (see this critical tweet). I feel that, rather than gaining a new model we did not have before (PLOS One is over 15 years old now), we have instead lost an alternative that was worth exploring.
1. I have remarked that this is the area of my life where Twitter opinions and real-life opinions differ the most. On Twitter, mocking the JIF is de rigueur. In real life, it is taken seriously almost all the time.
I have seen ridiculous usages of the JIF (comparisons between journals where the difference was in the third decimal place), but, as the saying goes, “reverse stupidity is not intelligence,” and most often people correctly see the JIF as a noisy metric that nonetheless roughly mirrors the field consensus as to which journals are more prestigious and worth paying attention to. Cross-field comparisons are the most problematic, which may be partly why bioinformaticians are often particularly salty about it: bioinformatics papers get fewer citations partly because bioinformaticians do not cite each other very much (see my earlier rant about Application Notes).
2. Note that point 2 does not logically derive from point 1 at all. A metric being imperfect is not a reason to completely disregard it.
3. Web of Science similarly emails me papers, but it seems to make no selection and instead sends a random sample. It is interesting how truly random these papers are. Often they have only a very tenuous connection to my work and I only rarely wish to follow up.
4. I do not mean a high price; price and cost are different things (briefly, price is what you pay in money, while cost is the total amount of resources expended; if someone else is paying or the cost is paid in effort, then the price goes down, but the costs remain and can even increase).
5. On the negative side, it did impose idiosyncratic formatting requirements, which it enforced and which made submitting to it costly.