Data Management and Confidentiality for IRB

Your data management plan is the part of the protocol where abstract confidentiality promises become concrete. The IRB wants to know exactly what happens to each piece of information you collect, from the moment a participant provides it to the moment it is destroyed or archived. This guide walks through the vocabulary, the technical safeguards, the retention rules, and the data-sharing arrangements that reviewers expect to see.

The core federal criterion is 45 CFR 46.111(a)(7), which requires that, "when appropriate, there are adequate provisions to protect the privacy of subjects and to maintain the confidentiality of data." Your institution, your funders, and (for some studies) HIPAA may impose additional requirements.

Privacy vs. confidentiality

These terms are not interchangeable in IRB documents.

Privacy refers to the participant's control over access to themselves — when and how they interact with researchers, what they disclose, where interviews occur.
Confidentiality refers to what researchers do with information after it is provided — how it is stored, who has access, and how it is shared.

Reviewers expect protections for both. A private setting for an interview protects privacy; encryption of the resulting audio file protects confidentiality.

Identifiability: anonymous, de-identified, coded

Many protocols describe data as "anonymous" when they are actually coded. Precision here matters because the level of identifiability determines regulatory obligations, consent language, and data sharing options.

Anonymous

No identifiers are ever collected, and no one, not even the researcher, can link data back to a participant. An unsigned paper survey dropped in a collection box is anonymous. An "anonymous online survey" that records IP addresses in server logs or in the response export is not.
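
Checking an export for identifier columns is one concrete way to test the "anonymous" label before you use it in a protocol. The following is a minimal Python sketch. It assumes the survey export's header row has already been read into a list of strings. The column names in SUSPECT_COLUMNS are illustrative assumptions modeled on common survey-platform exports, not an authoritative list for any product, and `identifying_columns` is a hypothetical helper.

```python
# Hypothetical column names that would make an "anonymous" export identifiable.
# These are assumptions; check your own platform's export format.
SUSPECT_COLUMNS = {
    "ipaddress",
    "locationlatitude",
    "locationlongitude",
    "recipientemail",
    "recipientfirstname",
    "recipientlastname",
    "externalreference",
}


def identifying_columns(header: list[str]) -> list[str]:
    """Return header fields that match a known identifier column (case-insensitive)."""
    return [col for col in header if col.strip().lower() in SUSPECT_COLUMNS]


if __name__ == "__main__":
    header = ["ResponseId", "IPAddress", "Q1", "Q2", "LocationLatitude"]
    flagged = identifying_columns(header)
    if flagged:
        print("Not anonymous; export contains:", ", ".join(flagged))
    else:
        print("No known identifier columns found")
```

A clean header does not by itself establish anonymity. Server logs, recipient lists, and embedded metadata can still hold identifiers, so this check complements a review of how the platform is configured and does not replace it.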

De-identified

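Identifiers were collected but have since been removed using a documented method. Under the HIPAA Safe Harbor method, 18 specified identifiers are removed, and the covered entity must have no actual knowledge that the remaining information could identify an individual. Under the expert determination method, a person with appropriate statistical expertise determines and documents that the risk of re-identification is very small. What separates de-identified data from coded data is that the research team keeps no link back to identities.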
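
Several Safe Harbor rules are mechanical enough to sketch in code. The example below is a minimal Python illustration. It assumes each record is a dictionary, and the field names and the `deidentify` helper are hypothetical. It covers only a handful of the 18 identifiers, so it is not a complete Safe Harbor implementation. Safe Harbor also requires that the covered entity have no actual knowledge that the remaining data could identify someone, and no script can establish that.

```python
from datetime import date

# Hypothetical field names; a real study must address every identifier it collects.
# zip_code is dropped outright because Safe Harbor allows the first three digits
# only under a population condition. birth_date is dropped outright because a
# birth year can reveal an age over 89.
DROP_FIELDS = {"name", "email", "phone", "street_address", "zip_code", "birth_date"}


def deidentify(record: dict) -> dict:
    """Apply a few Safe Harbor-style rules to one record (illustrative, not complete)."""
    out = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue  # remove the identifier entirely
        if isinstance(value, date):
            out[field] = value.year  # dates related to the person: keep only the year
        elif field == "age" and value > 89:
            out[field] = "90 or older"  # ages over 89 go into a single category
        else:
            out[field] = value
    return out
```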

Coded

Identifiers have been replaced with a code (e.g., participant ID 047), and a separate key links codes to identities. Coded data is still considered identifiable under the Common Rule as long as the key exists and is accessible to the research team.
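
Generating codes and keeping the key in a separate file takes only a few lines of Python. This is a minimal illustration. It assumes each participant has a unique identity string, such as an enrollment email. `assign_codes` and `split_coded_data` are hypothetical helpers, and the "P plus six digits" code format is an arbitrary choice.

```python
import csv
import io
import secrets


def assign_codes(identities: list[str]) -> dict[str, str]:
    """Map each identity (assumed unique) to a random, non-sequential study code."""
    codes: dict[str, str] = {}
    used: set[str] = set()
    for identity in identities:
        code = f"P{secrets.randbelow(10**6):06d}"
        while code in used:  # re-draw on the rare collision
            code = f"P{secrets.randbelow(10**6):06d}"
        used.add(code)
        codes[identity] = code
    return codes


def _to_csv(header: list[str], records) -> str:
    """Render a header and rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(records)
    return buf.getvalue()


def split_coded_data(rows, codes: dict[str, str]) -> tuple[str, str]:
    """Return (coded_data_csv, key_csv): responses carry only codes, the key maps codes to identities."""
    data_csv = _to_csv(["participant_id", "response"],
                       [(codes[identity], response) for identity, response in rows])
    key_csv = _to_csv(["participant_id", "identity"],
                      [(code, identity) for identity, code in codes.items()])
    return data_csv, key_csv
```

Random codes, unlike sequential ones such as 047, do not reveal enrollment order. The code only produces the two CSV strings. Saving them to different locations with different access lists is up to you.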

"Confidential" is not a substitute for a plan

Writing "data will be kept confidential" without specifics will almost always draw a revision request. Describe the actual file, the encryption, the location, the access controls, and the retention schedule.

Data security controls

At rest

Store data on institutionally managed, encrypted drives or approved cloud services (e.g., institutional OneDrive, Box, REDCap, Qualtrics). Avoid personal Google Drive or Dropbox accounts.
Encrypt laptops and external drives with full-disk encryption (BitLocker on Windows, FileVault on macOS).
Store paper records in a locked cabinet in a locked office. Name who holds the keys.
Keep the code-identity key physically and logically separate from the coded data — different folder, different access list.
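
The separation rule in the last item can be checked against a written storage plan. Here is a minimal Python sketch. It assumes the plan records each location as a folder path plus the set of people who have access. `StorageLocation` and `separation_problems` are hypothetical names, and the paths and usernames are placeholders. The code inspects only the plan, not actual file-system permissions or encryption status.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class StorageLocation:
    """One entry in the storage plan: where something lives and who can open it."""
    path: PurePosixPath
    access: frozenset[str]


def separation_problems(data: StorageLocation, key: StorageLocation) -> list[str]:
    """Return reasons the key is not separated from the coded data (empty if fine)."""
    problems = []
    # A path counts as relative to itself, so this also catches "same folder".
    if key.path.is_relative_to(data.path) or data.path.is_relative_to(key.path):
        problems.append("key and data share a folder tree")
    if key.access == data.access:
        problems.append("key and data have the same access list")
    return problems


if __name__ == "__main__":
    data = StorageLocation(PurePosixPath("/study/coded_data"), frozenset({"pi", "ra1", "ra2"}))
    key = StorageLocation(PurePosixPath("/study/key"), frozenset({"pi"}))
    print(separation_problems(data, key) or "key is separated from the data")
```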

In transit

Transmit data only over encrypted connections (HTTPS, SFTP, institutional VPN).
Do not send identifiable data by email without encryption. Institutional secure-email services are acceptable when configured correctly.
For interviews, use platforms with end-to-end encryption or institutionally approved recording (Zoom with cloud recording disabled, for example).
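
Scripts that move study data can refuse unencrypted destinations before sending anything. Below is a minimal Python sketch using the standard library's `urllib.parse.urlparse`. `check_transfer_url` is a hypothetical helper, and the set of allowed schemes is an assumption that you should adjust to your institution's approved list. The sketch checks only the URL scheme. It cannot confirm that a VPN is connected or that a server is institutionally approved.

```python
from urllib.parse import urlparse

# Schemes treated as encrypted in this illustration (an assumption; adjust locally).
ENCRYPTED_SCHEMES = {"https", "sftp"}


def check_transfer_url(url: str) -> None:
    """Raise ValueError unless the destination URL uses an encrypted scheme."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ENCRYPTED_SCHEMES:
        label = scheme or "(none)"
        raise ValueError(f"refusing unencrypted transfer over scheme {label}: {url}")
```

A transfer script would call `check_transfer_url(destination)` before opening any connection, so a mistyped `http://` or `ftp://` address fails loudly instead of sending data in the clear.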

Access control

Limit access to the minimum number of personnel necessary. Name each person in the protocol.
Use role-based permissions rather than shared accounts.
Require multi-factor authentication on all accounts that touch study data.
Log access where the platform supports it.
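
A short Python sketch shows how role-based permissions and access logging fit together. This is a minimal illustration. It assumes roles and named personnel are kept in simple dictionaries, and the role names, usernames, permission labels, and `authorize` helper are all hypothetical. Platforms such as REDCap provide their own user-rights and logging features, and those are preferable to hand-rolled code.

```python
import logging

log = logging.getLogger("study_access")

# Hypothetical roles and permissions; permissions attach to roles, not to people.
ROLE_PERMISSIONS = {
    "pi": {"read_coded", "read_key", "export"},
    "research_assistant": {"read_coded"},
}
# Hypothetical named personnel from the protocol's access list.
PERSONNEL = {"j.smith": "pi", "a.lee": "research_assistant"}


def authorize(user: str, action: str) -> bool:
    """Allow an action only if the user's role grants it, and log every attempt."""
    role = PERSONNEL.get(user)
    allowed = role is not None and action in ROLE_PERMISSIONS.get(role, set())
    log.info("user=%s role=%s action=%s allowed=%s", user, role, action, allowed)
    return allowed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    authorize("a.lee", "read_coded")  # allowed
    authorize("a.lee", "read_key")    # denied, but still logged
```

Because each person maps to a role instead of holding individual permissions, removing someone from the study takes one change to PERSONNEL. Every attempt, whether allowed or denied, lands in the log.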

Certificates of Confidentiality

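For sensitive research (e.g., substance use, mental health, illegal activity), consider a Certificate of Confidentiality (CoC) under 42 U.S.C. 241(d). NIH-funded research that collects or uses identifiable, sensitive information is automatically covered. Several other federal agencies issue their own certificates, and investigators without federal funding can request one from NIH. A CoC protects identifiable data from compelled disclosure, including subpoenas, in most circumstances. The statute's exceptions include disclosures the participant consents to and certain disclosures required by law.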

Retention

Retention periods are set by federal regulation, funder policy, and institutional rules. Common defaults:

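Common Rule: records must be retained for at least 3 years after completion of the research (45 CFR 46.115(b))
HIPAA-covered research: 6 years from the date required documentation (such as a signed authorization) was created or was last in effect, whichever is later (45 CFR 164.530(j))
FDA-regulated drug research: 2 years after a marketing application is approved for the indication, or 2 years after the investigation is discontinued and FDA is notified (21 CFR 312.62(c)), which can run well beyond the other periods
Funder requirements: NIH and NSF awards generally require 3 years from submission of the final financial report (2 CFR 200.334); many universities require 5–7 years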

State the governing retention period (whichever applicable period ends last) in the protocol, along with the responsible custodian and the destruction method (secure shredding for paper, cryptographic erase for digital).
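
Each rule starts its clock at a different event: study completion, the last date an authorization was in effect, or submission of a final financial report. For that reason, the governing period is the one whose end date falls latest, which is not necessarily the one with the most years. Below is a minimal Python sketch. It assumes you have already identified the applicable rules and their start dates. `add_years` and `retention_end_date` are hypothetical helpers, and the dates in the example are placeholders.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Add calendar years, mapping Feb 29 to Feb 28 when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_end_date(rules: dict[str, tuple[date, int]]) -> tuple[str, date]:
    """Given rule name -> (start date, years), return the rule that ends last and its end date."""
    ends = {name: add_years(start, years) for name, (start, years) in rules.items()}
    governing = max(ends, key=ends.get)
    return governing, ends[governing]


if __name__ == "__main__":
    completion = date(2025, 6, 30)  # placeholder study completion date
    rules = {
        "Common Rule": (completion, 3),
        "Funder (from final financial report)": (date(2025, 9, 28), 3),
        "Institutional policy (hypothetical 7 years)": (completion, 7),
    }
    rule, end = retention_end_date(rules)
    print(f"Retain until at least {end.isoformat()} ({rule})")
```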

Sharing and secondary use

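If you plan to share data (with collaborators, in a public repository, or for future research), the consent form must describe this explicitly. For research that collects identifiable private information or identifiable biospecimens, 45 CFR 46.116(b)(9) also requires a statement about whether de-identified information could be used or distributed for future research without additional consent. Vague "data may be used for future research" language does not satisfy the broad consent requirements at 45 CFR 46.116(d). Exemption categories 7 and 8 (45 CFR 46.104(d)(7) and (d)(8)) describe the storage and secondary research use that broad consent can cover.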

Data Use Agreements

When sharing identifiable data or a HIPAA limited data set with outside collaborators, a Data Use Agreement (DUA) is typically required; for a limited data set, HIPAA requires one. Key elements:

Specific data elements transferred
Permitted uses and explicit prohibition on re-identification attempts
Required security controls at the receiving institution
Reporting obligations for breaches
Destruction or return requirements at end of use
Publication rights and credit

Route DUAs through your institution's sponsored programs or technology transfer office — do not sign on behalf of the institution as an individual investigator.

Public repositories

Deposits to dbGaP, ICPSR, OSF, or similar repositories must match what you told participants they agreed to. If your consent form did not mention public deposit, you cannot deposit without re-consent or an IRB-approved waiver.
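
If the consent form changed partway through enrollment, eligibility for deposit can differ from one participant to the next. Below is a minimal Python sketch. It assumes you track which consent version each participant signed and which sharing uses each version describes. The version labels, use labels, and `eligible_for` helper are hypothetical.

```python
# Hypothetical consent versions and the sharing each one describes to participants.
CONSENT_VERSION_USES = {
    "v1": {"study_team"},
    "v2": {"study_team", "public_repository"},
}


def eligible_for(use: str, participants: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split participant IDs by whether their signed consent version covers `use`."""
    ok, blocked = [], []
    for pid, version in sorted(participants.items()):
        # An unknown or missing version is treated as not covering the use.
        if use in CONSENT_VERSION_USES.get(version, set()):
            ok.append(pid)
        else:
            blocked.append(pid)
    return ok, blocked
```

Anyone in the blocked list stays out of the deposit unless they are re-consented or the IRB approves a waiver.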

Breach response

Plan for the possibility of unauthorized access before it happens. Your protocol should name:

Who is notified within the study team
Who is notified at the institution (IRB, information security, privacy officer)
How and when affected participants are notified
Timelines (deadlines vary by IRB; many institutions require IRB notification within 5 business days of discovery)
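
Deadlines counted in business days are easy to miscount across weekends. Below is a minimal Python sketch. It assumes counting starts on the first business day after discovery and that your institution's holidays are supplied explicitly. `add_business_days` is a hypothetical helper, and your IRB's policy governs how the clock actually runs.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int, holidays: frozenset[date] = frozenset()) -> date:
    """Count forward `days` weekdays after `start`, skipping weekends and listed holidays."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:  # Mon=0 .. Fri=4
            remaining -= 1
    return current
```

Suppose a problem is discovered on Friday, March 7, 2025. Five business days later is Friday, March 14, which is what `add_business_days(date(2025, 3, 7), 5)` returns. Passing Monday, March 10 as a holiday pushes the deadline to Monday, March 17.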

Unanticipated problems and protocol deviations related to confidentiality must be reported per your IRB's policy — see the amendments and reporting guide.

Data management plan checklist

Every data element collected is named
Identifiability status is specified (anonymous, de-identified, coded)
Storage location, format, and encryption described
Access list named and minimum-necessary justified
Transmission security described
Retention period and custodian specified
Destruction method described
Data sharing arrangements (if any) described and reflected in consent
Certificate of Confidentiality considered for sensitive data
Breach response procedures documented
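
The checklist can double as a completeness check on a draft plan. Below is a minimal Python sketch. It assumes the draft is kept as a dictionary of text fields keyed by checklist item. The keys and the `missing_items` helper are hypothetical. The check catches only blank or malformed entries. Deciding whether an entry is specific enough still takes human judgment.

```python
# Hypothetical keys, one per checklist item above. For items that do not apply,
# write "none" or "not applicable, because ..." instead of leaving them blank.
REQUIRED_ITEMS = [
    "data_elements",
    "identifiability",
    "storage_and_encryption",
    "access_list",
    "transmission_security",
    "retention_and_custodian",
    "destruction_method",
    "sharing_and_consent",
    "certificate_of_confidentiality",
    "breach_response",
]

VALID_IDENTIFIABILITY = {"anonymous", "de-identified", "coded"}


def missing_items(plan: dict[str, str]) -> list[str]:
    """List checklist items that are absent or blank, plus an invalid identifiability status."""
    problems = [item for item in REQUIRED_ITEMS if not plan.get(item, "").strip()]
    status = plan.get("identifiability", "").strip()
    if status and status not in VALID_IDENTIFIABILITY:
        problems.append("identifiability must be anonymous, de-identified, or coded")
    return problems
```

An identifiability entry of "confidential" is rejected, which mirrors the point that "confidential" is not a substitute for a plan.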

Match the consent to the plan

Every confidentiality promise you make to participants in the consent form must be backed by a specific provision in the data management plan. Reviewers compare the two documents directly.

A solid data management plan makes the rest of the protocol easier to defend — it grounds your confidentiality claims in concrete practices. Next, review the amendments guide to understand how to report changes or incidents once you are under IRB oversight.