TL;DR: Open access publishing has very high administrative overhead and is therefore too expensive. Github and similar services have substantial, perhaps insurmountable, technical and funding advantages as publishing platforms. Scientific publications should therefore be git repos, created by the researchers themselves, that contain the manuscript, data, and analysis code, and that are hosted on services such as Github, Gitlab, Bitbucket, or SourceForge. A ‘journal’ would just be a managed collection of repos. Reviews would be handled via issues. Long-term archiving and minting of DOIs would be handled by Zenodo or equivalent data archiving services. Journal reputations would be based on the reputations of the editors, who are typically researchers themselves.
July 17, 2019: Added links at the end to researchers and journals that are already publishing on Github.
Elsevier, the world’s largest publisher of scientific articles, just cut off access to its journals by the University of California, one of the world’s largest producers of scientific articles, because UC objected to Elsevier’s increasingly exorbitant subscription fees. Meanwhile, many funders of scientific research, mostly in Europe but also including the Gates Foundation, have signed on to Plan S, which stipulates that research funded by these institutions must be published in open access journals or platforms.
Open access journals are expensive, though, typically charging $1000 or more per article. The reason they are so expensive is that administration and software development are expensive. arXiv.org, the famous physics/math preprint server that hosts articles for free, has a relatively small leadership team of six people, yet salaries alone amount to more than $1.3 million/year. Add in indirect costs, and the total is about $2 million/year, covered by grants, memberships, and Cornell. Their servers and miscellaneous expenses are less than one-tenth of the total.
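To make the proportions concrete, here is a back-of-the-envelope sketch in Python. It uses only the approximate arXiv figures quoted above; the variable names are mine, and the results are rough bounds, not audited numbers.

```python
# Approximate arXiv figures quoted in this post (USD per year).
salaries = 1_300_000   # "more than $1.3 million/year", so this is a lower bound
total = 2_000_000      # "about $2 million/year", including indirect costs

# Share of the budget that goes to salaries (a lower bound).
salary_share = salaries / total

# Servers and miscellaneous expenses are "less than" one-tenth of the total,
# so this is an upper bound on that line item.
max_servers_misc = total / 10

print(f"Salaries: at least {salary_share:.0%} of the budget")
print(f"Servers and misc: under ${max_servers_misc:,.0f}/year")
```

In other words, on these figures roughly two-thirds or more of the budget is people, and the hardware is a small fraction of it.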
PLOS, PeerJ, and other open access journals cover their substantial administrative and development costs with publication fees that range from $1000 to $3000 per article, which is about what traditional journal publishers like Elsevier charge for open access.
The Center for Open Science (COS) and osf.io offer free preprint, preregistration, and file hosting services (full disclosure: I use osf.io, and we received one of their $1000 preregistration awards). They currently have about 50 employees and are spending in the neighborhood of $7 million/year, which, as far as I can tell, comes mostly from grants. As COS itself admits, sustainability is a major concern, and will probably involve charging fees to stakeholder communities, e.g., universities.
In sum, the multiple open science initiatives each have their own admin teams, incurring high administrative overhead, and are chasing a relatively small pool of users, many of whom have little funding or incentive to contribute to these public goods.
The open source software community, like the science community, wants to give its products away for free in exchange for prestige. Their efforts have transformed the planet. Unlike scientists, however, open source developers pay nothing to publish their highly technical “documents” (code). How do they do it? In a word, Github.
If you don’t know what Github is, I describe it in a bit more detail below. For now, think of it as an online service for collaborating on software development. The open source community is allowed to use Github for free because the tech industry benefits tremendously from open source code and the talent that produces it. Open source is therefore subsidized by the fees commercial firms pay to use Github, which has estimated annual revenues of $250 million and was acquired last year by Microsoft for $7.5 billion. Microsoft’s annual revenue is $110 billion. Gitlab, a similar service, has $10.5 million in annual revenue and recently received $100 million in venture capital. Atlassian, which owns Bitbucket, yet another such service, offers a number of commercial collaboration services and has about $1 billion in annual revenue.
As of this writing, Github alone has over 30 million users working on close to 100 million projects. The technological and financial investment in these platforms and the economies of scale are orders of magnitude larger than those enjoyed by any open science initiative.
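For a rough sense of that gap, the sketch below compares the annual figures quoted in this post. It is an illustration under loud assumptions: it sets platform revenue against open science spending, which are not the same kind of number, and the Atlassian figure covers all of its products, not just Bitbucket. The dictionary and function names are mine.

```python
# Approximate annual figures quoted in this post (USD).
platform_revenue = {
    "Github": 250e6,      # estimated annual revenue
    "Gitlab": 10.5e6,     # annual revenue
    "Atlassian": 1e9,     # all products, not only Bitbucket
}
open_science_spend = {
    "arXiv": 2e6,         # total including indirect costs
    "COS": 7e6,           # "in the neighborhood of $7 million/year"
}

def scale_ratio(platform: str, initiative: str) -> float:
    """Return platform revenue divided by initiative spending."""
    return platform_revenue[platform] / open_science_spend[initiative]

for platform in platform_revenue:
    for initiative in open_science_spend:
        r = scale_ratio(platform, initiative)
        print(f"{platform} vs {initiative}: {r:,.1f}x")
```

Revenue is only one axis, of course; the user and project counts above are not captured here because the post gives no comparable counts for the open science services.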
In 2011, Marcio von Muhlen argued that academia needed a Github of science. He made three key points:
Each of these points is just as true today as it was then. The only thing I would add is that the Github of science should be…Github. The costs of hosting scientific articles on Github or similar services, such as Gitlab and Bitbucket, would be a rounding error.
Moreover, much of science’s computational infrastructure is already developed on Github and friends. This includes Python and SciPy, machine learning frameworks, key R packages, and much more.
Just as it benefits from open source software, the business community benefits tremendously from science. Instead of researchers paying the scientific publishing oligopoly hefty fees to publish tax-funded research, commercial businesses would subsidize the (small) cost for researchers to publish their research on git hosting services.
Sounds fair to me.