How to Classify Research Data

Properly protecting research data is a fundamental responsibility that stems from the research community's underlying obligations to:

  • honor its commitments to the providers and sources of the data,
  • uphold the efficacy of the campus's research mission, and
  • prevent financial or reputational damage to the University.

To protect research data appropriately and effectively, researchers must understand and carry out their data security responsibilities. The first step toward that goal is to identify the appropriate data classification, which determines the security controls required to protect the research data.

Why should research data be classified?

Researchers have an obligation to protect research data when:

  • The data elements pose a risk of exposing the identity of the research participants.
  • The data at risk of exposure includes personal medical or financial information, Social Security or driver's license numbers, or other highly sensitive information whose breach could require notifying the affected research participants.
  • A data usage agreement (DUA) from the data provider explicitly stipulates security control requirements.

Researchers also have an obligation to comply with campus security policies:

  • To provide baseline protection of the research data that corresponds to its protection level classification, whether or not a DUA exists.
  • To act as responsible members of the campus computing community by protecting endpoint and server devices from compromises that could affect others on campus.

At the most basic level, researchers should protect data appropriately to avoid a costly security incident that could delay their research or distract from its goals.

A relevant example of this last point occurred recently on campus.  Ransomware infected a researcher's workstation and spread to the department's network file-share drive, encrypting files containing over 20 years of research project data, with little hope of retrieving the encryption key except by paying the ransom.

This disaster was averted by restoring the files from a recent backup, a good example of security preparedness. Proper security logging also helped rule out any illicit access to personally identifiable information. Without such logging, the department might have been required to notify research subjects, at considerable cost, of potential identity fraud. Additional safeguards based on campus security policies, implemented appropriately, could have prevented this incident or stopped it from spreading.

How is research data classified?

The UC Berkeley Data Classification Standard is a framework for assessing data sensitivity, measured by the adverse business impact a breach of the data would have on the campus. The following protection levels reflect the basic principle that as the risk associated with research data increases, more stringent security requirements must be implemented.

Protection Level 2 (PL2)
Impact: High (extremely sensitive individually identifiable information)
Data examples:
  • Information governed by a contract or Data Usage Agreement (DUA) between the research unit and the data provider that requires compliance with the HIPAA Security Rule or notification to research subjects in the event of a breach.
  • Highly sensitive information, such as personally identifiable health information or criminal history.
  • Protected Health Information (PHI) protected by HIPAA. Note that research health information (RHI) is similar to PHI but has key differences. For more information about what is and is not HIPAA PHI, please refer to the UC position paper and the UCB CPHS HIPAA website.

Protection Level 1 (PL1)
Impact: Moderate (moderately sensitive individually identifiable information)
Data examples:
  • Information governed by a contract or DUA between the research unit and the data provider that does NOT require notification to research subjects in the event of a breach.
  • PHI-related limited data sets.
  • Student record information protected by FERPA.

Protection Level 0 (PL0)
Impact: Low (non-sensitive individually identifiable or public information)
Data examples:
  • Fully de-identified research information about people that is not PHI-related (use caution: full de-identification is very difficult).
  • Identifiable information that the subject has consented to make publicly available.
  • Information intended for public access, e.g., public directory information.
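One way to read the table is as a decision rule: the data takes the highest protection level triggered by any of its characteristics. The Python sketch below encodes that reading under simplifying assumptions. The `DataProfile` fields are hypothetical yes/no attributes drawn from the "Data examples" lists, the sketch covers only those examples, and it is no substitute for a review with ISP or the Research Data Management Program.

```python
from dataclasses import dataclass
from enum import IntEnum


class ProtectionLevel(IntEnum):
    """Protection levels from the Data Classification Standard (higher = more sensitive)."""
    PL0 = 0
    PL1 = 1
    PL2 = 2


@dataclass
class DataProfile:
    # Hypothetical attributes, one per example in the classification table.
    hipaa_phi: bool = False                           # PHI protected by HIPAA
    highly_sensitive: bool = False                    # e.g., identifiable health info, criminal history
    dua_requires_hipaa_or_notification: bool = False  # DUA requires HIPAA Security Rule or notification
    hipaa_limited_data_set: bool = False              # PHI-related limited data set
    ferpa_student_records: bool = False               # student records protected by FERPA
    dua_without_notification: bool = False            # DUA exists but does not require notification


def classify(profile: DataProfile) -> ProtectionLevel:
    """Return the highest protection level triggered by any attribute (sketch only)."""
    if (profile.hipaa_phi
            or profile.highly_sensitive
            or profile.dua_requires_hipaa_or_notification):
        return ProtectionLevel.PL2
    if (profile.hipaa_limited_data_set
            or profile.ferpa_student_records
            or profile.dua_without_notification):
        return ProtectionLevel.PL1
    # Nothing sensitive flagged: de-identified, consented-public, or public information.
    return ProtectionLevel.PL0


if __name__ == "__main__":
    print(classify(DataProfile(ferpa_student_records=True)).name)  # prints "PL1"
```

Checking PL2 conditions first means a data set that mixes, say, FERPA records with PHI is classified at the stricter level.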

Steps for classifying research data

The following steps outline the considerations needed to determine the protection level for research data. Please use the provided template to answer the questions below:

Step 1: Start by identifying the purpose and nature of the research and the data to be classified.
  • Does the research involve human subjects?
  • Is the data public (no sharing restrictions) or private (accessible only to those with a need to know)?

Step 2: Identify the specific data elements, for example:
  • Health-related information
  • Personally Identifiable Information (PII)
  • Data collected about human research subjects

Step 3: Identify any laws, regulations, or data usage agreements that govern the data.
  • Is there a Data Usage Agreement (DUA) between the research unit and the data provider?
  • Does the data fall under the category of California "notice-triggering" information (e.g., Social Security numbers, driver's license numbers)?
  • Does the data include health information protected by HIPAA?

Step 4: Estimate the number of sensitive records stored.
  • Use this number to help determine the potential impact of a breach (see Step 5).
  • For data elements covered by California law, does the number of records exceed the threshold (500) for "notice-triggering" requirements?

Step 5: Understand what notification requirements may apply in the event of a breach and the potential impact of those requirements.
  • Does the DUA specify requirements for an incident response plan?
  • Who will need to be contacted when a security incident is reported?
  • Estimate the cost of notification in the event of a breach (approximately $200 per person).
  • Include potential DUA penalties or fees and possible litigation costs.

Step 6: Estimate the impact on the research project if the data is lost.
  • Will the research project be able to continue unimpeded if the data is lost? Is there a backup plan?
  • How will the project be affected by lost work and delays?
  • Will the validity of the research outcome be called into question because of a security event?
  • How will a breach affect the reputation of the research unit (and the University), especially with respect to future projects and funding?
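Steps 4 and 5 involve some simple arithmetic. The Python sketch below multiplies the estimated number of sensitive records by the approximate per-person notification cost cited in Step 5 (about $200), adds any DUA penalties and litigation estimates you supply, and flags whether the count exceeds the 500-record threshold cited in Step 4. It assumes one sensitive record per affected person. The function and field names are hypothetical, and the result is a rough planning figure, not a legal or actuarial estimate.

```python
from dataclasses import dataclass

# Figures cited in Steps 4 and 5; adjust if current campus guidance differs.
COST_PER_PERSON_USD = 200          # approximate notification cost per affected person
CA_NOTICE_THRESHOLD_RECORDS = 500  # threshold cited in Step 4


@dataclass
class BreachEstimate:
    exceeds_ca_threshold: bool
    notification_cost_usd: int
    total_cost_usd: int


def estimate_breach_impact(
    sensitive_records: int,
    dua_penalties_usd: int = 0,
    litigation_usd: int = 0,
) -> BreachEstimate:
    """Rough breach-cost estimate, assuming one sensitive record per affected person."""
    if sensitive_records < 0:
        raise ValueError("record count cannot be negative")
    notification = sensitive_records * COST_PER_PERSON_USD
    return BreachEstimate(
        exceeds_ca_threshold=sensitive_records > CA_NOTICE_THRESHOLD_RECORDS,
        notification_cost_usd=notification,
        total_cost_usd=notification + dua_penalties_usd + litigation_usd,
    )


if __name__ == "__main__":
    print(estimate_breach_impact(1_200, dua_penalties_usd=50_000))
```

For example, 1,200 sensitive records with a $50,000 DUA penalty give an estimated $240,000 in notification costs and $290,000 in total, and the count exceeds the 500-record threshold.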

Protection Level Requirements

Based on the data protection levels defined in the Data Classification Standard, the Minimum Security Standard for Electronic Information (MSSEI) policy identifies the security protections required to safeguard the data.

The MSSEI requirements include the Minimum Security Standard for Networked Devices (MSSND), a mandatory set of protections for all endpoint devices that use campus network services, which applies to data at every protection level.

These baseline requirements, such as keeping operating systems and productivity software up to date and running current malware detection tools, go a long way toward protecting the campus from security incidents like the ransomware example described above.

The following is an overview of the baseline requirements for each protection level:

Data Class    Security Requirements
PL0           All MSSND requirements
PL1           MSSND + MSSEI requirements for PL1 data + other relevant requirements (e.g., DUA)
PL2           MSSND + MSSEI requirements for PL2 data + other relevant requirements (e.g., DUA, HIPAA)
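The requirements table is cumulative: every level starts from MSSND, and PL1 and PL2 add the MSSEI requirements for their level plus any contract or regulatory obligations. The minimal Python sketch below expresses that lookup. The labels are illustrative strings, not actual MSSEI control identifiers, and `required_controls` is a hypothetical helper; the additional obligations (e.g., a DUA or the HIPAA Security Rule) must come from the project's own agreements.

```python
from typing import Dict, Iterable, List

# Baseline requirement sets from the table above (labels are illustrative only).
BASELINE: Dict[int, List[str]] = {
    0: ["MSSND"],
    1: ["MSSND", "MSSEI (PL1)"],
    2: ["MSSND", "MSSEI (PL2)"],
}


def required_controls(level: int, other_obligations: Iterable[str] = ()) -> List[str]:
    """Return the baseline for a protection level plus project-specific obligations."""
    if level not in BASELINE:
        raise ValueError(f"unknown protection level: {level}")
    controls = list(BASELINE[level])  # copy so the shared baseline is never mutated
    for item in other_obligations:
        if item not in controls:  # skip duplicates, keep first-seen order
            controls.append(item)
    return controls


if __name__ == "__main__":
    # prints ['MSSND', 'MSSEI (PL2)', 'DUA', 'HIPAA Security Rule']
    print(required_controls(2, ["DUA", "HIPAA Security Rule"]))
```

The sketch simply appends whatever obligations are passed in, at any level, so a project's checklist always includes the MSSND baseline first.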

If your research data is classified as PL1 or PL2, please contact the Research Data Management Program and/or Information Security & Policy (ISP) for assistance with applying the MSSEI requirements to research data and with planning their implementation.

Additional Resources