
The 2023 NIH Data Management and Sharing Policy recommends that the Data Management and Sharing (DMS) plan be two pages or fewer and include the following elements:

  • Data Type: A brief description of the scientific data to be managed, preserved, and shared. If ethical, legal, or technical factors may limit sharing, the DMS plan should include a rationale for those limits.
  • Related Tools, Software, and/or Code: An indication of whether specialized tools are needed to access or manipulate shared scientific data to support replication or reuse, the name(s) of the necessary tool(s), and how the tools can be accessed.
  • Standards: A description of the standards, if any, that will be applied to the scientific data and associated metadata.
  • Data Preservation, Access, and Associated Timelines: Plans and timelines for data preservation and access.
  • Access, Distribution, or Reuse Considerations: A description of any applicable factors affecting access, distribution, or reuse of scientific data related to privacy, security, informed consent, and proprietary issues.
  • Oversight of Data Management and Sharing: An indication of how compliance with the plan will be monitored and managed, frequency of oversight, and by whom.

Policy Links

Writing a Data Management & Sharing Plan

NOT-OD-21-014: Supplemental Information to the NIH Policy for Data Management and Sharing: Elements of an NIH Data Management and Sharing Plan


Sample Responses for DMS Plan Elements

Below are some sample responses for each NIH DMS plan element as presented in the DMPTool. These responses are based on a hypothetical research project investigating the effectiveness of a health intervention.

1. Data Type

A. Types and amount of scientific data expected to be generated in the project

Summarize the types and estimated amount of scientific data to be generated and/or used in the project.

This project will produce pre- and post-intervention health training assessment survey data and post-intervention training focus group data. Data will be collected from 120 participants, generating 24 assessment survey datasets and 12 focus group audio files and transcripts, totaling approximately 100 MB. The following data files will be used or produced during the project:

1. REDCap survey data will be exported as .csv files and converted to Stata (.dta) files for analysis.
2. Focus groups will be recorded as .mp3 files, then transcribed and coded in NVivo. Transcripts will be saved as .pdf files, and the NVivo project will be exported as a .qdpx file.

B. Scientific data that will be preserved and shared, and the rationale for doing so.

Describe which scientific data from the project will be preserved and shared and provide the rationale for this decision.

We expect that the raw and analysis datasets will contain Protected Health Information (PHI) and/or Personally Identifiable Information (PII). Based on ethical considerations and the need to protect participant confidentiality, we expect to preserve and share the following public-use datasets covering all participants: de-identified and aggregate survey data, and redacted, de-identified focus group transcripts.

Identifiable survey data from participants who agree to share them in the donation agreement will be shared in a separate file.

C. Metadata, other relevant data, and associated documentation

Briefly list the metadata, other relevant data, and any associated documentation (e.g., study protocols and data collection instruments) that will be made accessible to facilitate interpretation of the scientific data.

To facilitate interpretation of the data, we will share the study protocols, survey instruments, and codebooks, along with the focus group schedule of questions, the qualitative coding schema, and code reports of frequency and density, and associate each with the relevant datasets.

2. Related Tools/Software and/or Code

State whether specialized tools are needed to access or manipulate shared scientific data to support replication or reuse, and name(s) of the needed tool(s) and software and specify how they can be accessed.

Survey data will be made available in Stata format (.dta; requires Stata SE 16 or later) and in tabular .csv format, which can be opened in Excel, R (v3.6 or later), or other commonly used statistical programs. Focus group transcripts will be made available as .pdf files for long-term preservation, and the coded focus group data can be transferred to commonly used qualitative data analysis (QDA) software.

3. Standards

State what common data standards will be applied to the scientific data and associated metadata to enable interoperability of datasets and resources, and provide the name(s) of the data standards that will be applied and describe how these data standards will be applied to the scientific data generated by the research proposed in this project. If applicable, indicate that no consensus standards exist.

To facilitate their efficient use, all our data and materials will be structured and described as follows.

Formal standards for pre- and post-intervention training assessment data have not yet been widely adopted. However, our data and other materials will be structured and described according to best practices.

Data will be stored in commonly used and open formats: .csv and Stata .dta for survey data, and .pdf and .qdpx for focus group data, with the .qdpx export complying with the REFI-QDA Standard for qualitative data exchange. Information needed to use these data (e.g., the meaning of variable names, codes, information about missing data, and other metadata) will be recorded in codebooks and coding schema that will be accessible to the research team and subsequently shared alongside the final datasets.
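A codebook of the kind described above can be partly drafted from the data themselves. The Python sketch below is one minimal way to do that, assuming a CSV export and treating an empty cell, "-99", or "NA" as missing; those codes, the variable names, and the `build_codebook` helper are hypothetical, and variable labels and value meanings would still come from the survey instruments.

```python
import csv
import io
from collections import Counter

# Hypothetical missing-value codes; replace with the study's own conventions.
MISSING_CODES = {"", "-99", "NA"}


def build_codebook(csv_text, labels=None, max_levels=10):
    """Summarize each variable in a CSV export as one codebook entry.

    labels: optional dict of variable name -> description (hypothetical input;
    in practice these come from the survey instrument).
    max_levels: list individual values only for variables with at most this
    many distinct non-missing values (likely coded items).
    """
    labels = labels or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    columns = {name: [] for name in names}
    for row in reader:
        for name in names:
            # DictReader fills short rows with None, so normalize to "".
            columns[name].append((row[name] or "").strip())

    codebook = []
    for name in names:
        values = columns[name]
        present = [v for v in values if v not in MISSING_CODES]
        counts = Counter(present)
        entry = {
            "variable": name,
            "label": labels.get(name, ""),
            "n_valid": len(present),
            "n_missing": len(values) - len(present),
        }
        if len(counts) <= max_levels:
            entry["values"] = dict(sorted(counts.items()))
        codebook.append(entry)
    return codebook


if __name__ == "__main__":
    sample = "record_id,pre_score,post_score\n1,3,4\n2,-99,5\n3,2,\n"
    for entry in build_codebook(sample, {"pre_score": "Pre-intervention score"}):
        print(entry)
```

Counting missing values per variable is what lets the shared codebook document missing data explicitly rather than leaving reusers to infer it.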

Information about our research process, including the details of our analysis pipeline, will be documented contemporaneously in the study protocols and the qualitative coding scheme. This information will be accessible to all members of the research team and will be shared alongside our data.

4. Data Preservation, Access, and Associated Timelines

A. Repository where scientific data and metadata will be archived.

Provide the name of the repository(ies) where scientific data and metadata arising from the project will be archived.

Tabular dataset(s) will be deposited in the UNC Dataverse, a generalist data repository managed by the Odum Institute at the University of North Carolina at Chapel Hill. The Odum Institute Data Archive staff has over 50 years of experience in data archiving, management, and sharing. Their expertise will ensure that the data are well-documented and curated in compliance with best practices and standards for data sharing and preservation.

Qualitative data will be shared in Syracuse University’s Qualitative Data Repository (QDR), through the institutional membership managed by the Odum Institute Data Archive. QDR provides the UNC community with dedicated expertise and a customized data repository for preserving and sharing qualitative data sets to ensure long-term access and use.

B. How scientific data will be findable and identifiable

Describe how the scientific data will be findable and identifiable, i.e., via a persistent unique identifier or other standard indexing tools.

UNC Dataverse provides persistent identifiers and robust, standardized metadata, and it is committed to the long-term preservation of and access to research data. UNC Dataverse publishes data under a CC0 license by default and tracks all downloads and explorations of the files. It is routinely backed up and preserved on multiple geographically distributed servers and is a member of Data-PASS, a community committed to the sustainability of and access to research data.

QDR assigns persistent digital object identifiers (DOIs) and provides annotations, data citations, and multiple export formats for bibliographic citations. Data citations enable QDR to track the reuse of datasets. QDR staff conduct multiple backups and routine file integrity checks and monitor for file-format obsolescence. The data published in QDR will be linked to the survey data published in UNC Dataverse to facilitate discovery.
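File integrity (fixity) checks in digital preservation generally work by recording a checksum for each file and recomputing it later. The Python sketch below illustrates the idea with SHA-256; it is not QDR's or UNC Dataverse's actual procedure, and the file name and helper functions are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(directory):
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(directory, manifest):
    """Return the names of files that are missing or whose digest changed."""
    current = make_manifest(directory)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp, "transcript_01.pdf")  # hypothetical file name
        target.write_bytes(b"original bytes")
        manifest = make_manifest(tmp)
        target.write_bytes(b"altered bytes")
        print(verify_manifest(tmp, manifest))  # ['transcript_01.pdf']
```

Reading in chunks keeps memory use flat even for large audio files.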

C. When and how long the scientific data will be made available

Describe when the scientific data will be made available to other users (i.e., no later than time of an associated publication or end of the performance period, whichever comes first) and for how long data will be available.

Data will be made available once analysis is completed or at the time of an associated publication, whichever comes first, and no later than the end of the performance period. Data will be stored in UNC Dataverse and the Qualitative Data Repository in perpetuity, as indicated in their preservation policies.

5. Access, Distribution, or Reuse Considerations

A. Factors affecting subsequent access, distribution, or reuse of scientific data

NIH expects that in drafting Plans, researchers maximize the appropriate sharing of scientific data. Describe and justify any applicable factors or data use limitations affecting subsequent access, distribution, or reuse of scientific data related to informed consent, privacy and confidentiality protections, and any other considerations that may limit the extent of data sharing.

All participants will consent to the sharing of aggregate and de-identified survey and focus group data. Any potentially identifying variables or focus group comments will be stripped from the public-use data in compliance with IRB protocols and human subjects protections.
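As one hedged illustration of stripping identifying focus group comments, the Python sketch below replaces a study-specific list of names and places with bracketed placeholders. The term list, placeholders, and `redact` helper are hypothetical, and simple pattern matching misses misspellings and indirect identifiers, so redacted transcripts would still need manual review.

```python
import re


def redact(text, replacements):
    """Replace identifying terms in a transcript with placeholders.

    replacements: dict of term -> placeholder, e.g. {"Jordan": "[NAME]"}.
    Matching is case-insensitive and limited to whole words.
    """
    if not replacements:
        return text
    lookup = {term.lower(): label for term, label in replacements.items()}
    # Try longer terms first so "Mary Ann" wins over "Mary".
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


if __name__ == "__main__":
    terms = {"Jordan": "[NAME]", "Mary Ann": "[NAME]", "Chapel Hill": "[PLACE]"}
    print(redact("Jordan said Mary Ann drove in from Chapel Hill.", terms))
    # [NAME] said [NAME] drove in from [PLACE].
```

Category placeholders such as [NAME] and [PLACE] keep the transcript readable for qualitative reuse while removing the identifying detail.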

Participants will have the option to consent to sharing their identifiable survey data for future research and scholarly use as part of a donation agreement. These identifiable survey data will be made available as a separate, identifiable dataset in UNC Dataverse.
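The separation between public-use and identifiable survey files could be scripted along the lines of the Python sketch below. It assumes a hypothetical `donation_consent` column recorded as "yes" or "no" and hypothetical identifier columns; dropping direct identifiers is only a first step, and the public rows would still need full de-identification.

```python
import csv
import io

# Hypothetical column names; a real REDCap export uses the project's own variables.
IDENTIFIER_COLUMNS = {"name", "email", "phone"}
CONSENT_COLUMN = "donation_consent"


def split_by_consent(csv_text):
    """Split survey records into public-use and identifiable sets.

    Returns (public_rows, identifiable_rows):
    - public_rows: every participant, with identifier and consent columns
      dropped (further de-identification is still required);
    - identifiable_rows: full records, only for participants who agreed to
      share identifiable data in the donation agreement.
    """
    public, identifiable = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        consent = (row.get(CONSENT_COLUMN) or "").strip().lower()
        if consent == "yes":
            identifiable.append(dict(row))
        public.append({
            key: value for key, value in row.items()
            if key not in IDENTIFIER_COLUMNS and key != CONSENT_COLUMN
        })
    return public, identifiable


def to_csv(rows):
    """Serialize a list of dicts (sharing the same keys) back to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    export = (
        "record_id,name,email,phone,donation_consent,score\n"
        "1,A,a@example.org,555-0101,yes,4\n"
        "2,B,b@example.org,555-0102,no,3\n"
    )
    public_rows, identifiable_rows = split_by_consent(export)
    print(to_csv(public_rows))
    print(to_csv(identifiable_rows))
```

Deriving both files from the same export in one pass keeps record IDs aligned between the public and identifiable datasets.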

B. Whether access to scientific data will be controlled

State whether access to scientific data derived from humans will be controlled (i.e., made available by a data repository only after approval).

Due to the small sample size and the potential for re-identification, the raw data will not be made available for public use. Researchers wishing to build on the raw data and transcripts may request access by submitting a data use agreement. They will be required to comply with IRB protocols and to store the data on a secure, off-network system, with access limited to project members approved in the data use agreement. Any breach of the agreement will be handled according to the terms of use it stipulates.

C. Protections for privacy, rights, and confidentiality of human research participants

If generating scientific data derived from humans, describe how human research participants will be protected (e.g., confidentiality, and other protective measures).

To ensure participant consent for data sharing, IRB paperwork and informed consent documents will include language describing the plans for data management and sharing, explaining the motivation for sharing, and stating that personally identifying information will be removed from public-use data. A donation agreement will allow participants to decide whether to share their identifiable survey data for future research and scholarly use.

To protect participant privacy and confidentiality, public-use data will be de-identified using the Safe Harbor method described by the US Department of Health and Human Services under the HIPAA Privacy Rule. This method removes 18 specified types of identifiers, and we will also remove any other variables or values within the data that could be used to re-identify a participant.
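The Safe Harbor method enumerates 18 identifier types, so a full implementation depends on the study's variables. The Python sketch below covers only a few of those rules (direct identifiers, dates, ages over 89, and ZIP codes) and assumes hypothetical column names, ISO-formatted dates, and an explicit `age` column; the list of restricted three-digit ZIP prefixes must be supplied from current Census data.

```python
from datetime import date

# Hypothetical column names in the survey export.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "street_address", "mrn"}
DATE_COLUMNS = ("birth_date", "survey_date")


def safe_harbor_record(record, restricted_zip3=frozenset()):
    """Apply a subset of the HIPAA Safe Harbor rules to one record.

    record: dict of column name -> string value.
    restricted_zip3: three-digit ZIP prefixes whose combined area holds
    20,000 or fewer people; supply these from current Census data.
    Other Safe Harbor identifiers are NOT handled here.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Dates directly related to the individual: keep only the year.
    for col in DATE_COLUMNS:
        if out.get(col):
            out[col] = str(date.fromisoformat(out[col]).year)

    # Ages over 89 collapse into one "90+" category, and the birth year,
    # which would reveal that age, is removed as well.
    if out.get("age") and int(out["age"]) > 89:
        out["age"] = "90+"
        if "birth_date" in out:
            out["birth_date"] = ""

    # ZIP codes: keep the first three digits, or "000" for restricted areas.
    if out.get("zip"):
        zip3 = out["zip"][:3]
        out["zip"] = "000" if zip3 in restricted_zip3 else zip3

    return out


if __name__ == "__main__":
    record = {
        "record_id": "7", "name": "A. Participant", "birth_date": "1930-05-02",
        "survey_date": "2024-03-15", "age": "93", "zip": "27514", "score": "4",
    }
    print(safe_harbor_record(record))
```

Passing the restricted ZIP prefixes in as a parameter, rather than hard-coding them, keeps the sketch correct as Census figures are updated.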

6. Oversight of Data Management and Sharing

Describe how compliance with this Plan will be monitored and managed, frequency of oversight, and by whom at your institution (e.g., titles, roles).

The following individuals will be responsible for data collection, management, storage, retention, and dissemination of project data, including updating and revising the Data Management and Sharing Plan when necessary.

– PI Name, Researcher, Institution/Department, ORCID, email
– Project Manager Name, Researcher, Institution/Department, ORCID, email
– Data Manager Name, Researcher, Institution/Department, ORCID, email
– Analyst Name, Researcher, Institution/Department, ORCID, email
– Research Data Archivist, The Odum Institute Data Archive at the University of North Carolina at Chapel Hill, ORCID, email