Applied data science, machine learning, and artificial intelligence research often require computational use of data. FAIR data are findable, accessible, interoperable, and reusable by machines. Open repositories and open-data aggregators, such as those indexed in re3data.org and FAIRsharing.org, increasingly provide access to datasets that better meet the FAIR principles. However, many FAIR datasets needed for computational research may still not be open: access may be restricted by fees or other barriers.
These FAIR but not open datasets can include:
- datasets “closed as necessary” to protect personal privacy, national security, or competitiveness
- well-structured and curated texts and data from proprietary sources, such as:
  - primary research articles available for text mining (e.g., Text and Data Mining at MIT)
  - reference books (e.g., Landolt-Börnstein) and datasets (e.g., NIST SRM)
  - curated databases of data extracted from the literature (e.g., Reaxys, GlobalData Power, Google Patents)
  - in-house databases from industry (e.g., those of pharmaceutical or data analytics companies)
Here are some resources for working with these two types of datasets that can be FAIR but not necessarily open.
For datasets requiring protection of confidentiality and intellectual property, see the MIT Libraries Data Management Services’ Confidentiality and Intellectual Property page for guidance on sharing them, and contact data-management@mit.edu for help making them FAIRer. Some repositories provide closed or restricted access to such datasets (see examples of closed-access and restricted-access repositories in re3data.org). If you need to use such datasets, check the terms of use in the dataset records, or contact the authors or creators directly to ask whether, and under what conditions, you can gain access.
To use curated data from proprietary sources, you will often need to discuss and negotiate terms and conditions for computational access and use, including any additional fees, with the providers of these sources. If you need computational access to an MIT-subscribed resource, please contact textmine@mit.edu or your subject specialist. You can learn more about the technical and legal challenges of using data and texts that are not open from this Force11 Scholarly Communication Institute 2020 workshop, and evaluate their FAIRness with the assessment tool developed at the workshop.