Data Management Plans for IRB: Security, Storage, and Sharing

The data management section of an IRB protocol is where vague language is most common and most costly. Reviewers want to see the full lifecycle: what you collect, how it is identified, where it lives, who can access it, how long it is retained, whether it will be shared, and how it will be destroyed. Each piece is a specific operational commitment, and the regulation at 45 CFR 46.111(a)(7) makes adequate provisions to protect participants' privacy and maintain the confidentiality of data, when appropriate, a condition of approval.

What data you are collecting

Start with a full inventory. For each data type, specify:

  • Identifiable, coded, or de-identified.
  • Structured (survey responses, measurements) or unstructured (transcripts, recordings, images).
  • Collected directly from participants or derived from other sources.

"Coded" is not the same as "de-identified." Coded data has identifiers removed and replaced with a study ID, with a linking file maintained separately. De-identified data has no link back to identifiers at all.

Identifiability

Reviewers think about identifiability in three buckets:

  • Direct identifiers: name, email, phone, address, SSN, MRN.
  • Indirect identifiers: birth date, zip code, job title, unique combinations of demographics.
  • Data that is identifiable in context: a small purposive sample where any detail narrows to one person.

For qualitative research especially, indirect identifiability is often the larger risk. A transcript with no names but with specific role, institution, and location information can identify someone to a reader who knows the setting. Stats for Scholars has useful guidance on re-identification risk in small samples; the math matters, and reviewers increasingly ask for it.
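
One simple way to put a number on indirect identifiability is to count how many participants share each combination of indirect identifiers. The sketch below is a k-anonymity-style count, offered as an illustration rather than as the method any particular guide or reviewer expects. The field names (role, zip3, age_band) and the rows are hypothetical.

```python
from collections import Counter

def smallest_group(rows, quasi_identifiers):
    """Return (k, combos): k is the size of the smallest group of rows that
    share the same values on the quasi-identifiers, and combos lists the
    value combinations that occur exactly k times.
    Assumes rows is non-empty."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    k = min(counts.values())
    rare = sorted(combo for combo, n in counts.items() if n == k)
    return k, rare

# Hypothetical sample: no names, but role + location + age band remain.
rows = [
    {"role": "teacher", "zip3": "021", "age_band": "30-39"},
    {"role": "teacher", "zip3": "021", "age_band": "30-39"},
    {"role": "teacher", "zip3": "021", "age_band": "40-49"},
    {"role": "principal", "zip3": "021", "age_band": "50-59"},
]

k, rare = smallest_group(rows, ["role", "zip3", "age_band"])
print(k)     # size of the smallest group
print(rare)  # combinations that occur only k times
```

Here k is 1: two combinations each describe exactly one person, so those rows are identifiable to anyone who knows the setting. Coarsening a field (wider age bands, dropping role) and re-running the count shows whether the change helps.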

Storage

Name the platform. "A password-protected computer" is not enough. Specify:

  • Primary storage location (university-managed OneDrive, REDCap, encrypted server, approved cloud platform).
  • Encryption at rest (disk encryption, platform encryption).
  • Encryption in transit (HTTPS, VPN, SFTP).
  • Backup location and frequency.

For HIPAA-regulated data or data under special regulatory regimes (CUI, FERPA), the storage platform must be approved for that data classification. Check with your institution's information security office before drafting.

Access controls

Who can access the data? Reviewers want a specific list:

  • PI and co-investigators by name and role.
  • Research assistants with current CITI training.
  • Transcriptionists (under confidentiality agreement).
  • Statistical consultants (often under a data use agreement).

Describe the access mechanism: a shared folder with role-based permissions, an institutional study account, a password manager. If access requires two-factor authentication, state that explicitly rather than leaving it implied.

Linking files and code lists

If you maintain a linking file between study IDs and participant identifiers, describe:

  • Where it is stored (separate from main data, ideally different platform or encrypted container).
  • Who has access (usually PI only or PI plus one designee).
  • When it will be destroyed (often at the end of data collection or at the end of the follow-up window).

A linking file that lives in the same folder as the de-identified data defeats its own purpose.
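
To make the separation concrete, the sketch below splits raw records into coded rows and a linking table that would be stored somewhere else. It is a minimal illustration, not a prescribed procedure: the "P-" study ID format, the function name code_records, and the sample records are all assumptions made for the example.

```python
import secrets

def code_records(records, direct_identifiers):
    """Split raw records into coded rows and a separate linking table.

    Coded rows carry a random study ID in place of direct identifiers.
    The linking table maps study ID -> identifiers and belongs in a
    different location from the coded rows.
    """
    coded_rows = []
    linking_table = {}
    for record in records:
        study_id = "P-" + secrets.token_hex(4)  # hypothetical ID format
        while study_id in linking_table:        # regenerate on a rare collision
            study_id = "P-" + secrets.token_hex(4)
        linking_table[study_id] = {k: record[k] for k in direct_identifiers}
        coded = {k: v for k, v in record.items() if k not in direct_identifiers}
        coded["study_id"] = study_id
        coded_rows.append(coded)
    return coded_rows, linking_table

# Hypothetical raw records.
raw = [
    {"name": "Example Person A", "email": "a@example.org", "score": 12},
    {"name": "Example Person B", "email": "b@example.org", "score": 17},
]

coded_rows, linking_table = code_records(raw, ["name", "email"])
```

The study IDs are random rather than derived from names or dates, so the ID itself reveals nothing. Destroying linking_table removes the path back to direct identifiers; any indirect identifiers left in the coded rows still need their own review.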

Retention

Institutions typically require research data to be retained for a minimum period; 3 to 7 years is common, and the period can be longer for federally funded work. State the retention period, name the event that starts the clock (for example, study closure), and cite the institutional policy where one applies. For data that will be retained indefinitely (because it informs a longitudinal study or a published dataset), say so and describe the ongoing protections.
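
A stated retention period translates directly into an earliest destruction date. The sketch below assumes the clock starts at study closure and uses a 7-year period taken from the range above; both the trigger and the dates are hypothetical, and your institution's policy defines the real ones.

```python
from datetime import date

def retention_expiry(start, years):
    """Return the date `years` after `start`.

    If `start` is Feb 29 and the target year is not a leap year,
    roll forward to Mar 1 so the full period has elapsed.
    """
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, month=3, day=1)

closure = date(2025, 6, 30)            # hypothetical study closure date
earliest_destruction = retention_expiry(closure, 7)
print(earliest_destruction.isoformat())  # 2032-06-30
```

Writing the computed date into the plan, alongside the policy it comes from, gives the reviewer a concrete trigger instead of a bare number of years.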

Data sharing

Increasingly, journals and funders require data sharing. Describe:

  • Whether data will be shared (yes, no, or conditionally).
  • In what form (de-identified public-use file, restricted-access repository, data enclave).
  • Where (specific repository if known).
  • Under what terms (data use agreement, license).

Consent language must be consistent with sharing plans. If you promise "only the research team will see the data" and then deposit the data in a public repository, you have a consent problem. Our consent form guide includes sample language for sharing scenarios.

Destruction

State the destruction plan. For electronic data, use secure deletion tools or the institutional equivalent (not just "empty trash"). For paper records, use cross-cut shredding. Audio and video files are held to the same secure deletion standard as other electronic data. Give a specific trigger (end of study, final publication, retention period expiry) and a specific method.

Tying it back to the protocol

The data management plan should mirror the consent form and the protocol. If the protocol says interviews will be transcribed by a professional service, the data management plan should describe how that service handles audio, what their confidentiality agreement requires, and how the files are transferred.

For a structured template with the expected fields, see the data management plan template in our template library. For how the data management section connects to the rest of the protocol, see our protocol writing guide.

Common reviewer comments

The recurring flags:

  • Vague storage language ("secure server" without naming it).
  • Encryption in transit not specified.
  • Linking file in the same location as the de-identified data.
  • Retention period not stated.
  • Destruction method not specified.
  • Access list not matching personnel list.
  • Sharing plan inconsistent with consent language.

Each is a short fix. Doing all of them at drafting time saves a revision cycle.
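
The flags above are mechanical enough to check before submission. The sketch below treats a draft plan as a dictionary and returns the flags it would trigger. Every field name, the list of vague phrases, and the sample draft are hypothetical; the checks are a rough self-review aid, not a substitute for what a reviewer actually reads. In particular, treating the access list and personnel list as needing to match exactly is a simplifying assumption.

```python
# Phrases treated as "vague storage language" (an assumed, incomplete list).
VAGUE_STORAGE = {"", "secure server", "password-protected computer",
                 "a password-protected computer"}

def review_flags(plan):
    """Return the recurring reviewer flags that a draft plan would trigger."""
    flags = []
    storage = plan.get("storage_platform", "").strip().lower()
    if storage in VAGUE_STORAGE:
        flags.append("vague storage language")
    if not plan.get("encryption_in_transit"):
        flags.append("encryption in transit not specified")
    linking = plan.get("linking_file_location")
    if linking and linking == plan.get("data_location"):
        flags.append("linking file stored with the de-identified data")
    if not plan.get("retention_years"):
        flags.append("retention period not stated")
    if not plan.get("destruction_method"):
        flags.append("destruction method not specified")
    if set(plan.get("access_list", [])) != set(plan.get("personnel_list", [])):
        flags.append("access list does not match personnel list")
    if plan.get("sharing") == "public" and not plan.get("consent_permits_sharing"):
        flags.append("sharing plan inconsistent with consent language")
    return flags

# Hypothetical draft with several of the common problems.
draft = {
    "storage_platform": "secure server",
    "encryption_in_transit": "SFTP",
    "data_location": "REDCap project",
    "linking_file_location": "REDCap project",
    "retention_years": 7,
    "destruction_method": "",
    "access_list": ["PI", "RA 1"],
    "personnel_list": ["PI", "RA 1", "Transcriptionist"],
    "sharing": "public",
    "consent_permits_sharing": False,
}

flags = review_flags(draft)
for flag in flags:
    print(flag)
```

This draft triggers five of the seven flags; only encryption in transit and the retention period pass.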

The underlying principle

The data management plan is the part of the protocol that proves you have thought about the full lifecycle of participant information. A clear, specific, operational plan tells a reviewer that the rest of the protocol is probably equally careful. A vague one does the opposite. Reviewers read data management as a proxy for overall protocol quality, so it is worth the extra hour at drafting.