‘Deidentified’ Health Data Not So Deidentified After All

MHE Publication, July 2023, Volume 33, Issue 7

Because personal information can be discovered rather easily through ‘reidentification,’ additional steps are needed to protect the privacy of people’s healthcare data.

All Jessilyn Dunn, Ph.D., wanted was a clear-cut answer.

Dunn is an assistant professor of biomedical engineering at Duke University, and a critical first step in her research is contacting Duke’s information technology (IT) department for an assessment of the privacy risks associated with the data sets she plans to use. Such assessments are necessary to properly secure and store the data her research relies on.

But the IT department’s responses were never fully satisfying.

“They couldn’t seem to give us a straight answer on what the risk level of the data was,” she recalls. “And so they would default to the highest risk level, which would mean it would be much more expensive to store the data.”

Frustrated, Dunn decided to take matters into her own hands. She resolved to undertake a thorough analysis of existing scientific literature to better understand the privacy risks associated with the use of healthcare data.

She found an answer. It wasn’t what she expected.

Reidentifying the deidentified

At the heart of Dunn’s question is the concept of data “deidentification”: decoupling health data from personal information, so the stripped-down data can be used and shared in scientific research without compromising the privacy of the individuals who generated the data. The concept is critically important in the age of digital health because millions of people around the globe now use wearable health devices, prescription digital therapeutics and other technologies that generate constant streams of very valuable — but also very personal — health data.

Optimally, such a massive amount of data could be used to improve everything from digital health algorithms to drug development to, ultimately, patient outcomes. However, most academic institutions and government agencies will only allow the use of such data if it does not risk patient privacy.

The Health Insurance Portability and Accountability Act (HIPAA) laid out 18 personal health information identifiers (name, birth date, address, phone number and so on) that could be used to link data to individuals. The standard practice in deidentification is to remove those identifiers from a given data set before using or sharing the data. In that respect, deidentifying data is pretty straightforward, Dunn says.
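As a concrete illustration of that column-dropping step, here is a minimal Python sketch using pandas. The column names are hypothetical; real data sets label these fields in many different ways.

```python
import pandas as pd

# A few of HIPAA's 18 Safe Harbor identifiers, as hypothetical column names.
HIPAA_IDENTIFIER_COLUMNS = [
    "name", "birth_date", "address", "phone_number",
    "email", "ssn", "medical_record_number",
]

def deidentify(df: pd.DataFrame) -> pd.DataFrame:
    """Drop whichever HIPAA identifier columns appear in this data set."""
    present = [col for col in HIPAA_IDENTIFIER_COLUMNS if col in df.columns]
    return df.drop(columns=present)
```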

However, if one takes a broader view, things get more complicated. Deleting columns on a spreadsheet is simple enough, but Dunn and colleagues wondered whether doing so would actually achieve the goal of making it difficult or impossible to reidentify the people who generated the data.

“So I guess I would say the technical definition (of deidentified data) and the functional definition may be slightly different,” she says.

Diving in

To find out whether HIPAA-based deidentification actually works, Dunn and colleagues conducted a systematic review of the literature on reidentifying people from health data sets, primarily data generated by wearable devices. They found 72 studies that met their inclusion criteria.

It did not take long to see a trend. Dunn says a former student, Lucy Chikwetu, M.Sc., did the data collection. “As she was going through the data, I would mention that it seems like there are a lot more reidentification possibilities than we initially thought,” she says.

Still, she knew she could not draw conclusions until all the data were compiled. Once they were, her fears were confirmed. Across the studies, reidentification from deidentified data succeeded between 86% and 100% of the time. In some cases, just a few seconds of sensor data were enough to reidentify a person by matching the sensor data in a deidentified data set to the same data in another data set that included identifying information. In an era when the generation of data is far outpacing privacy controls and regulation, finding such matching data sets is not as hard as it might seem, Dunn says.
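To make the matching concrete, here is a minimal, hypothetical Python sketch of such a linkage attack: it pairs each record in a deidentified data set with the closest-matching heart-rate trace in a second, identified data set. The distance measure and threshold are invented for illustration; published attacks use far richer features.

```python
import numpy as np

def link_records(deidentified, identified, max_distance=5.0):
    """Match anonymous sensor traces to named ones by similarity.

    deidentified: {record_id: heart-rate trace as np.ndarray}
    identified:   {person_name: heart-rate trace as np.ndarray}
    Traces are assumed to be time-aligned and of equal length.
    """
    matches = {}
    for record_id, trace in deidentified.items():
        # Find the named trace with the smallest Euclidean distance.
        name, dist = min(
            ((n, float(np.linalg.norm(trace - t))) for n, t in identified.items()),
            key=lambda pair: pair[1],
        )
        if dist <= max_distance:
            matches[record_id] = name  # the "deidentified" record now has a name
    return matches
```

A successful match reattaches an identity to a supposedly anonymous record, which is exactly the failure mode the reviewed studies documented.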

Dunn and her colleagues laid out a scenario in which an employee participates in her company’s wellness program, which tracks daily step counts and heart rate. Separately, in this hypothetical, the same employee participates in a stroke prevention study that also uses step and heart-rate data. Following study protocols, she discloses to the researchers that she has HIV, a fact she has not told her employer.

Although the study deidentifies the information before publishing the data, her employer could obtain the study data set, match it to its wellness program data and learn that the employee has HIV. All of that might be perfectly legal, Dunn says. However, it would also expose the employee to potential discrimination due to her HIV diagnosis.

Given the ubiquity of personal data on the internet and the wide array of data peddlers — legal and otherwise — the risks to patient privacy are quite high.

And although the hypothetical outlined in the study referred to step counts and heart-rate data, Dunn says a wide variety of data can be leveraged to identify people. For example, a 2019 study of 46 participants published in the journal IEEE Access found that just 2 seconds of electroencephalogram data could be used to correctly identify 95% of them.

Another study, also published in IEEE Access, showed that accelerometer and gyroscope data gathered from daily activities like toothbrushing could be used to identify the patients who generated the data. Dunn and her colleagues reported their findings in February 2023 in The Lancet Digital Health. 

Cause for concern, not retreat

The study has major implications for biomedical research. It means that a central premise upon which digital health innovation has been built — that big data can be used without risking participant privacy — is shaky. In publishing the analysis, Dunn says her goal is to shed light on privacy issues, not to bring digital health research to a halt. “We want to avoid a situation where we post data without understanding the consequences,” she says.

Improving data privacy

Instead of encouraging people to avoid sharing data, she says, she hopes to start a conversation about new ways to share data safely. A number of proposals to improve the privacy of personal health data are already in development. One potential piece of the solution, Dunn says, is so-called secure enclaves: hardware now being rolled out that adds a layer of protection by cordoning off sensitive information from users and applications that do not need access to it.

Another is federated learning. “So rather than bringing data to your algorithms, you bring your algorithms to data,” Dunn says. Such a system would allow algorithms to learn from and analyze data without compromising the data’s security.
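Here is a minimal sketch of that “algorithms to data” pattern, with made-up data: each site computes a model update locally, and only the updates, never the raw records, are averaged centrally.

```python
import numpy as np

def local_gradient(weights, X, y):
    """One least-squares gradient, computed where the data live."""
    residual = X @ weights - y
    return X.T @ residual / len(y)

def federated_round(weights, sites, lr=0.1):
    """Average the sites' gradients; raw data never leaves each site."""
    gradients = [local_gradient(weights, X, y) for X, y in sites]
    return weights - lr * np.mean(gradients, axis=0)

# Hypothetical example: two sites, each holding private (X, y) records.
rng = np.random.default_rng(seed=0)
sites = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(2)]
weights = np.zeros(3)
for _ in range(50):
    weights = federated_round(weights, sites)
```

Only the three model coefficients cross site boundaries in each round; the underlying records stay put.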

Another important point, Dunn says, is for researchers to be more discerning and to think more critically about who needs access to which data. Rather than making an entire data set publicly available to anyone, they could offer tiers of access, making sure the right data get into the “right hands, for the right reasons,” she says.

Sharing data involves significant risks, she says, but it also brings the potential for monumental scientific advancement.

“And so we have to be able to think about both sides of that and make informed decisions,” Dunn comments. “There’s a lot we have to learn.”

Jared Kaltwasser is a writer in Iowa and a frequent contributor to Managed Healthcare Executive.
