A guide to Open Data
What data can be shared?
Most research data involving human participants can be shared openly so long as appropriate consent, anonymisation, rights management, and access control have been considered. Data protection laws (including GDPR) govern the processing of personal data, but do not apply to anonymised data. However, be aware that some seemingly anonymised data can still be used to identify participants. Data sharing should be considered from the outset of your research project, and the relevant ethical and legal issues should be given full consideration.
What follows are some tips centring on data involving human participants, but some of this section might also be relevant where commercial, security, or legal implications exist.
For data involving human participants to be shared, you must obtain informed consent for sharing from your participants. You must inform participants about what data will be stored, where it will be stored, for how long it will be stored (e.g., indefinitely), and how their confidentiality will be protected. The UK Data Service has a repository of example sharing-friendly forms for different types of data. Ensure consent forms do not promise to destroy the data or promise that the data will only be seen by the research team, as either promise would rule out sharing.
Consideration of how to anonymise your data should occur at the outset of your project, and should occur in tandem with your planning around participant consent. Anonymising your data is the best way to minimise the risk of leaking personally identifiable information. Remove all personal identifiers (both direct and indirect) from both quantitative and qualitative data sets before sharing. If your data cannot be fully anonymised (i.e., where there is a risk of re-identification of participants), consider imposing access controls.
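As a rough illustration of removing direct and indirect identifiers, the following Python sketch drops direct identifiers, generalises an exact age into a band, and assigns each participant a random code. The column names (`name`, `email`, `age`, `score`), the ten-year band width, and the helper functions are assumptions made for this example only; which variables count as identifying depends on your data set, and steps like these do not by themselves guarantee that re-identification is impossible.

```python
import secrets

# Hypothetical column names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(records):
    """Return copies of the records with direct identifiers dropped,
    exact age replaced by a band, and a random participant code added."""
    cleaned = []
    for record in records:
        # Drop direct identifiers.
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        # Generalise an indirect identifier (exact age) into a coarser band.
        row["age_band"] = age_band(row.pop("age"))
        # Random code with no link back to the person's identity.
        row["participant_code"] = secrets.token_hex(4)
        cleaned.append(row)
    return cleaned


# Made-up example records.
raw = [
    {"name": "A. Example", "email": "a@example.org", "age": 34, "score": 12},
    {"name": "B. Example", "email": "b@example.org", "age": 57, "score": 9},
]
shared = anonymise(raw)
```

The random code is deliberately not derived from the name or email address, so it cannot be reversed. If you keep a separate key linking codes to people, the data are pseudonymised rather than anonymised and may still count as personal data.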
Where data are sensitive, safeguarding or shielding subsets of data may be appropriate. Most data repositories allow management of access via an End User Licence which stipulates conditions for access, including agreement not to attempt to re-identify individuals.
Making Data Open
Open data should abide by the FAIR principles: they should be Findable (i.e., easily discoverable for both humans and automated computer searches), Accessible (clear instructions provided on access and authentication), Interoperable (compatible with other data types), and Reusable (full descriptions provided of the data, as well as clear usage licences).
Keep the following recommendations in mind:
- Provide clear and detailed data descriptions in a "data dictionary". This will help others understand your data, allowing better reuse.
- Try to make the data accessible on their own terms (i.e., independent of the paper reporting the results). This can be achieved by posting the data in a dedicated public repository.
- Ensure missing data or data exclusions are fully annotated.
- Provide as much unprocessed data as possible so users can "rebuild" information.
- Include analysis code and processing scripts where feasible.
- Bonus points for not using proprietary formats / file types as the only way to access the data. Where possible, convert data in proprietary formats to open or standard formats before sharing.
- Consider asking colleagues to review the data submission before sharing to ensure quality control and accessibility.
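To make the data dictionary and open-format recommendations more concrete, here is a minimal Python sketch that writes a data dictionary as CSV (an open format) and checks that every column in a data file is documented. The variable names, the dictionary fields (`variable`, `type`, `units`, `description`, `missing_code`), and the helper functions are hypothetical; adapt them to your own data set and to any conventions your chosen repository requires.

```python
import csv
import io

# Hypothetical variables and dictionary fields, for illustration only.
data_dictionary = [
    {"variable": "participant_code", "type": "string", "units": "",
     "description": "Random code assigned to each participant",
     "missing_code": ""},
    {"variable": "age_band", "type": "string", "units": "years",
     "description": "Age in ten-year bands, e.g. 30-39",
     "missing_code": "NA"},
    {"variable": "score", "type": "integer", "units": "points",
     "description": "Total questionnaire score",
     "missing_code": "NA"},
]


def to_csv(rows):
    """Serialise a list of dictionaries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented_columns(data_header, dictionary):
    """List the data columns that have no entry in the data dictionary."""
    documented = {entry["variable"] for entry in dictionary}
    return [column for column in data_header if column not in documented]


dictionary_csv = to_csv(data_dictionary)

# A data file header with one column the dictionary does not describe.
missing = undocumented_columns(
    ["participant_code", "age_band", "score", "notes"], data_dictionary
)
```

Here `missing` contains only `notes`, flagging a column that needs a description before the data are shared. Recording a `missing_code` for each variable is one way to annotate missing data, so that users can tell absent values apart from real ones.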
Steps to Open Data
Follow these steps to share your FAIR data with others:
- Find a suitable repository. Make sure the repository is suitable for your needs. Is a general or specific repository more suitable? What are the file size limits? How long will the data be available? Do you need to manage sharing permissions, or set up an embargo period? Will you get a DOI or other persistent identifier? You can find a data repository at re3data.org. Remember that Keele also has a data repository.
- Provide reuse guidance by deciding on an appropriate licence for your data. Be clear about how you would like the data cited.
- Share the persistent URL of your data in any publications or conference talks that use the data.
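One way to make the licence and preferred citation explicit is to keep them in a small machine-readable record stored alongside the data. The Python sketch below is only an illustration: every value is a placeholder rather than a real data set, the field names are assumptions and not a repository standard, and the citation layout is one possible style. Many repositories generate equivalent metadata and a suggested citation for you.

```python
import json

# All values are placeholders for illustration, not a real data set.
metadata = {
    "title": "Example survey dataset",
    "creators": ["Example, A.", "Example, B."],
    "year": 2024,
    "repository": "Example Repository",
    "licence": "CC-BY-4.0",
    "identifier": "https://doi.org/10.xxxx/placeholder",
}


def format_citation(record):
    """Build a suggested citation string from the metadata record."""
    authors = "; ".join(record["creators"])
    return (
        f"{authors} ({record['year']}). {record['title']} [Data set]. "
        f"{record['repository']}. {record['identifier']}"
    )


citation = format_citation(metadata)

# JSON is an open format that both people and software can read.
metadata_json = json.dumps(metadata, indent=2)
```

The resulting `citation` string can be pasted into a README, and `metadata_json` can be saved next to the data files so the licence and persistent identifier travel with them.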