The Office of Science and Technology Policy (OSTP) and the National Science and Technology Council’s Subcommittee on Open Science (SOS) are engaged in ongoing efforts to facilitate implementation of, and compliance with, the 2013 memorandum “Increasing Access to the Results of Federally Funded Scientific Research” and to address actions recommended by the Government Accountability Office in a November 2019 report.
In February 2020, OSTP and the SOS issued a request for information (RFI) inviting all interested individuals and organizations to recommend approaches for ensuring broad public access to the peer-reviewed scholarly publications, data, and code that result from federally funded scientific research. The MIT Libraries provided the following comments, endorsed by MIT’s Committee on the Library System, on Federal Register Document 2020-03189.
MIT has a long history of promoting public access to educational materials through MIT OpenCourseWare (OCW) and MITx, and to research papers through the open access repository DSpace@MIT. MIT reaffirmed its commitment to public access to research outputs in the 2019 recommendations of the MIT Ad Hoc Task Force on Open Access to MIT’s Research (MIT OATF). Recognizing that public access to research accelerates the progress of science and its application to the world’s greatest challenges, the MIT OATF recommends that “data, code, and other types of scholarly work, especially when necessary to validate, replicate, and/or reuse scholarly work, must be openly and responsibly available.”
The current global pandemic gives new urgency to the cause of open science, as open sharing of research data and papers is critical to understanding and combating the coronavirus. As policymakers, medical professionals, and ordinary citizens seek accurate information about this virus, the open availability of research hastens our collective understanding and our ability to respond effectively. The COVID-19 Open Research Dataset (CORD-19) is an example of the kind of open resource that ought to be commonplace rather than a special project spurred by crisis. Understanding and solving a range of new and persistent global challenges, from coronavirus to cancer to climate change, requires ongoing and immediate public access to peer-reviewed articles, data, and code.
Our responses to the questions posed in the RFI are informed by a two-year broad-based engagement process undertaken by the MIT OATF, which included all sectors of the MIT community (faculty, staff, research scientists, postgraduate fellows and associates, graduate and undergraduate students), as well as consultation with a range of external stakeholders and subject matter experts.
What current limitations exist to the effective communication of research outputs (publications, data, and code) and how might communications evolve to accelerate public access while advancing the quality of scientific research? What are the barriers to and opportunities for change?
As articulated in the MIT Framework for Publisher Contracts, “the benefits to society are greatest when…scholarship is freely and immediately available to the entire world to access, read, and use; without restriction and for any lawful purpose.” The primary barriers to realizing this vision are political and cultural rather than technological. Federal agencies are uniquely positioned to effect political and cultural change through policy, incentives, and support.
Actions we recommend at these levels include:
- Requiring immediate open access to journal articles emerging from publicly-funded research, under open licenses and in formats that permit broad reuse, including computational access.
- Adopting policies that default to open sharing for data and code, with opt-out exceptions available for a range of ethical, legal, security, privacy, and/or professional considerations.
- Providing incentives for sharing data and code, including support for credentialing and peer review, and encouraging open licensing. Such policies and incentives are crucial to accelerating scientific progress and to addressing the reproducibility crisis.
- Requiring the use of standard persistent identifiers, e.g., Digital Object Identifiers (DOIs) for publications, data, and code; ORCID iDs for authors; and Ringgold IDs for organizations. Such identifiers are essential infrastructure for scholarly communications, enabling interoperability, disambiguation, discovery, and attribution.
- Recognizing data and code as “legitimate, citable products of research” and providing incentives and support for systems of data sharing and citation similar to those of the crystallographic community, with an emphasis on open systems that support machine negotiation of citations and computational access.
- Expanding use of and support for open enabling infrastructure, such as the Public Access Submission System (PASS) and GitHub.
- Supporting the development and/or maintenance of trusted and reliable data and code repositories, guided by the FAIR principles, which “put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.”
- Developing policies that support the creation of new academy-based publishing technologies and platforms. Public access to research depends on stable, interoperable, and sustainable infrastructure, which we believe is better provided by well-supported academy-based platforms than by dependence on an increasingly consolidated commercial market. One local example of the kind of not-for-profit consortium of academic, industry, and advocacy organizations that federal agencies might support is the Knowledge Futures Group (KFG), which originated in a partnership between the MIT Press and the MIT Media Lab.
What more can Federal agencies do to make tax-payer funded research results, including peer-reviewed author manuscripts, data, and code funded by the Federal Government freely and publicly accessible in a way that minimizes delay, maximizes access, and enhances usability? How can the Federal Government engage with other sectors to achieve these goals?
Federal agencies should eliminate all embargo periods permitted on taxpayer-funded peer-reviewed manuscripts and should require immediate open access to all manuscripts resulting from federally funded research, as funders in Europe have done and as Canada has just committed to do. This will ensure that taxpayer-funded research is “freely and publicly accessible in a way that minimizes delay, maximizes access, and enhances usability.” While the 2013 White House Directive resulted in substantial progress toward open access, many publishers have adopted post-publication embargoes of twelve months or longer, even though the Directive only encouraged agencies to “use a twelve-month post-publication embargo period as a guideline for making research papers publicly available.” In negotiations with publishers, it is clear that many treat the Federal agencies’ twelve-month embargo guidance as a de facto federally endorsed standard that they can invoke when refusing contracts that call for immediate open access to articles and data. By eliminating embargoes on federally funded research, federal agencies can shift the norm toward immediate open access across the entire system.
Viable open access business models are emerging both in Europe, under government requirements, and in the US. One promising model is the collaborative agreement between the Association for Computing Machinery (ACM) and the libraries at four research universities (MIT, the University of California, Carnegie Mellon, and Iowa State University) to co-create a sustainable open access business model for the ACM. As MIT and other universities seek new kinds of open access agreements, libraries, scholarly societies, and Federal agencies have the opportunity to work collaboratively to advance our shared goals of supporting scholars and advancing knowledge. Nonprofit university presses likewise require new models to thrive in a fully open access ecosystem. The MIT Press is an early leader in open access publishing, providing examples and evidence of sustainable open access practices for scholarly publishers. With an abundance of guidance and proven business models available, the argument that embargoes are necessary to sustain scholarly publishing rings hollow. Embargoes artificially delay access to scholarly literature, limiting the impact of science, hindering rapid innovation, and slowing the advancement of knowledge that would serve national and global interests.
Federal agencies can and should engage with university libraries, university-based presses, and scholarly societies in advancing sustainable models for open scholarly publishing. The willingness and creativity of the research library community in imagining and supporting innovation in open publishing is evident in the more than 100 individual research libraries and consortia that have endorsed the MIT Framework for Publisher Contracts. The Framework focuses on leveraging existing academy-owned and -operated open access repositories to provide public access to research outputs, while also encouraging publishers, scholars, libraries, and funders to fundamentally rethink their relationships with one another. One example of this kind of engagement between Federal agencies and research libraries is the recent collaboration between the MIT Libraries and the Department of Energy to define workflows for complying with the Department’s public access policy using MIT’s open access repository.
Supporting university presses in transitioning their publication and business models to full and immediate open access is another way Federal agencies can engage with key stakeholders to advance the public access goals of the Federal Government. Locally, the experience of the MIT Press is instructive. The MIT Press has grown from two OA journals in 2015 to twelve in 2020, and is actively seeking to transition existing journals to OA with a responsible financial model. In January 2019, the MIT Press launched a new OA journal, Quantitative Science Studies, after the editorial board of an Elsevier-owned, society-supported journal resigned because Elsevier refused to provide open access to citation data, to lower its article processing charges (APCs) for open articles, or to transfer ownership of the title to the board. With the support of Federal agencies, the MIT Press and other university-based publishers could assist other journals and societies in the transition to open access. Nonprofit presses like the MIT Press could also benefit from alternative funding practices to support journals, such as direct federal funding to cover costs associated with peer review, copyediting, proofreading, and other typical publishing functions.
Federal agencies should also ensure that research products based on federally funded research are openly and publicly available for computational access and analysis. Machine access to much of the scientific literature is impossible or, at best, extremely cumbersome. Many publishers prohibit computational access to content, including content the federal government has made open, through contracts and licenses that offer open access to human readers but not to machines. The NIH public access policy has significantly opened up reading access to medical research, but the substantial research advances made possible by text and data mining are largely blocked by an “all rights reserved” limitation on most articles in PubMed Central. Computational access to and analysis of the literature allow scholars to apply artificial intelligence and machine learning techniques to pressing issues ranging from “early breast cancer detection” to the current global COVID-19 pandemic. By requiring research outputs to be openly available for machine access and computational analysis, Federal agencies would enhance the usability of federally funded research.
Federal agencies can also significantly advance open sharing of publications through infrastructure support. Investing in the infrastructure needed for scholarly societies to provide auto-deposit to open repositories (disciplinary, federal, or institutional) would make open access publishing easier for scholars and publishers alike. It would also be significantly more sustainable and impactful than funding open access article by article through financial support for article processing charges, a model that carries potential perils of its own. Having more agencies follow the lead of NIH and NASA in working with the Public Access Submission System (PASS) would provide a needed process improvement for authors and universities seeking to meet public access requirements, while minimizing administrative barriers that reduce compliance levels and hurt researcher productivity. MIT has contributed to PASS and has worked with commercial and society publishers on auto-deposit services, and we would be happy to share what we have learned and to partner with OSTP and others in thinking through these kinds of models.
For data and code, Federal agencies could significantly advance responsible sharing by providing incentives for cleaning, documenting, and appropriately sharing the datasets and code underlying published research. These activities are expensive and labor-intensive, and all too often go unrewarded in the current highly competitive grant-seeking environment. To make federally funded data and code more openly and publicly available, Federal agencies should promote, support, and require effective data practices, such as persistent identifiers for data and efficient means of creating auditable, machine-readable data management plans. Examples of effective and responsible data sharing policies include the PLOS data sharing policy and the MIT Press Research Data Policy. Federal actions on open data should align with the Joint Declaration of Data Citation Principles and the FAIR Data Principles.
How would American science leadership and American competitiveness benefit from immediate access to these resources? What are potential challenges and effective approaches for overcoming them? Analyses that weigh the trade-offs of different approaches and models, especially those that provide data, will be particularly helpful.
In this particular cultural moment, there are significant pressures toward closure rather than openness, especially in relation to data and code. Many of these pressures stem from fears that the open, global sharing of our valuable research products might hinder American technological and economic competitiveness. Access to sensitive research products can be protected through classification, while research that falls outside the classification framework remains publicly accessible. There are also legitimate concerns about the misuse of American research and technology in ways that violate individual rights and/or run counter to American values and global objectives. Ultimately, MIT is guided on these issues by the vision articulated by President L. Rafael Reif: that we must focus on “building a farsighted national strategy for sustaining American leadership in science and innovation” and, in doing so, must resist the urge “to try to double-lock all our doors.” [1]
Although some publishers see lost revenue as a troubling trade-off of open access to research publications, many publishers have found ways to offset that loss and are eager to embrace new models for open publishing. Recently, the MIT Press joined other distinguished open access publishers in signing a letter to the US government in support of open access to publications, stating that “the U.S. will best lead the world by showcasing its research for everyone, including the American taxpayers who have funded it, to learn from and build on.”
As MIT and other leading US universities seek to “bridge the gap between discovery and commercialization” by supporting the launch of technology companies with the potential to transform the planet and solve some of the world’s most pressing challenges, the lack of open access means that recent graduates and other unaffiliated innovators cannot reach the peer-reviewed literature, data, and code that would inform and accelerate their work. In working with start-up founders supported by The Engine at MIT, MIT librarians trying to provide information to these innovators were stymied by embargoes and paywalls. Given widespread recognition that open access improves the return on investment of research dollars, as manifest in policies in Europe and China, US Federal agencies can best accelerate American innovation, discovery, and competitiveness by adopting zero-embargo open access policies for federally funded research outputs. As MIT President L. Rafael Reif said in comments before the House Ways and Means Committee: “Whatever else the U.S. does to counter the challenges posed by China…we must enhance our capacity to get the most out of [our] investment” in research and technology.
While challenges in sharing code and data vary across disciplines, the MIT OATF concluded that making data and code openly available is critical to “support the robust validation and replication of research.” In weighing and addressing trade-offs associated with security, sensitivity, and privacy concerns, the MIT OATF recommended that “data, code, and other types of scholarly work…must be openly and responsibly available” [emphasis added] and that responsible data sharing should follow the principle of “as open as possible, as closed as necessary.” This recommendation fully recognizes concerns about asymmetries in data sharing and their potential implications for national security and competitiveness. It is consistent with the approach recently announced by the Canadian government, under which researchers’ outputs will be “open by design and by default,” including a recommendation that the government develop a “framework identifying criteria for when restricting access to federal scientific research outputs is warranted.” This approach would be productive in the US as well. The Canadian policy reflects evidence of the propulsive power of data sharing to advance innovation. That evidence includes the increased citation of articles whose associated datasets are publicly available and the substantial economic impact of the Human Genome Project, which was built on open data.
Federal funders can take action even in the context of complex trade-offs. Trade-offs related to code present a particular need: balancing openness that can fuel and speed innovation with incentives and support for invention. At MIT, the dynamic tension between these aims was discussed as part of the task force’s community engagement, and the task force concluded that MIT should “encourage more open sharing of code and reduce the potential negative impact of the proliferation of software patents on entrepreneurship and innovation.” This can be done by developing a set of recommended open licenses for software; by creating and publicizing guidelines, policies, and practices for publishing code under open source licenses; by reviewing software licensing practices to ensure they promote innovation; and by encouraging authors to distribute code openly under popular open source licenses. We suggest that Federal agencies adopt similar steps to promote the processes, policies, and infrastructure that will allow data and code sharing to advance as the sharing of publications has. Agencies can also address the vital need for sustainable data and code repositories and for credit infrastructure that recognizes these contributions to the scholarly record and to society.
Any additional information that might be considered for Federal policies related to public access to peer-reviewed author manuscripts, data, and code resulting from federally supported research.
The challenges presented by the current COVID-19 pandemic make the need for immediate open access to the products of federally funded research abundantly clear and increasingly urgent. While it is laudable that many publishers are opening access to content to support faculty and students now forced to work and learn remotely, our ability to understand and successfully address the medical, social, and economic challenges of this pandemic requires fully open access for humans and machines to all relevant research articles, data, and code. Ensuring such access now and into the future requires strong federal open access policies, as well as aggressive support for enabling infrastructure and business models.
From the 2008 NIH Public Access Policy to the 2013 White House Directive, federal policies and actions have played a major role in advancing openness in research. Federal agencies could continue the push to ensure that publicly funded research is openly available to solve the world’s greatest challenges by doing the following: eliminating embargoes; supporting the development of key infrastructure; incentivizing responsible sharing of data and code; and encouraging and endorsing partnerships among scholarly societies, libraries, and nonprofit publishers to develop new publishing models.
Given the challenges described above, bold action in support of responsible open sharing of research outputs is called for. The United States should lead in advancing openness in the service of innovation, discovery, and the rapid application of new knowledge to complex local and global challenges. MIT President Reif’s recent comments to the House Ways and Means Committee are again relevant here: “Leading in research is a necessary but not sufficient condition for prosperity and security. We also have to be the best and the fastest at translating ideas into products and processes. That’s not something that can be accomplished by closing off our system – that just would shut down intellectual exchange that benefits us.” We thank the Office of Science and Technology Policy for this opportunity to comment on a topic that is central to the advancement of this critical intellectual exchange, an exchange that is an essential ingredient for building human knowledge and solving humanity’s greatest problems.
[1] L. Rafael Reif, “China’s Challenge Is America’s Opportunity,” New York Times, August 8, 2018.