Can Patient Data Be Truly 'De-Identified' for Research?
Lawsuit Against University of Chicago, Google Raises Important Privacy Issues
A lawsuit against the University of Chicago Medical Center and Google seeking class action status points to the important privacy and security issues raised when sharing patient data for research purposes - and whether data can be truly "de-identified."
The lawsuit, filed in a federal court in Illinois on June 26 by a former University of Chicago Medicine patient on behalf of other affected individuals, alleges that the hospital did not properly de-identify patient health records before sharing them with Google, without patient consent, to support the company's development of predictive medical data analytics technology.
On its website, the University of Chicago Medicine says the collaboration signed in 2017 with Google was designed to study ways to use data in electronic medical records to make discoveries that could improve the quality of healthcare. The work focuses on using new machine-learning techniques to create predictive models that could help prevent unplanned hospital readmissions, avoid costly complications and save lives, the university says.
The lawsuit notes that HIPAA allows patient information to be shared for research purposes if it has been de-identified by one of two methods: the "expert determination" method, in which an expert determines that the risk of re-identification is very small, and the "safe harbor" method, which involves removing a long list of identifiers.
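As a rough illustration of the safe harbor idea, the Python sketch below drops a handful of direct identifier fields from a record and reduces dates to the year. The field names and the short identifier set are hypothetical, chosen only for this example; the actual HIPAA list is considerably longer, and this is not a compliance tool.

```python
from datetime import date

# Hypothetical subset of direct-identifier fields, for illustration only;
# the real safe harbor list is much longer.
DIRECT_IDENTIFIERS = {"name", "phone", "email", "ssn", "medical_record_number"}


def safe_harbor_sketch(record: dict) -> dict:
    """Return a copy of record with listed identifiers dropped and dates cut to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop the field entirely
        if isinstance(value, date):
            cleaned[field] = value.year  # keep only the year
        else:
            cleaned[field] = value
    return cleaned


# Invented example record.
record = {
    "name": "Jane Doe",
    "phone": "555-0100",
    "admitted": date(2015, 3, 14),
    "discharged": date(2015, 3, 18),
    "diagnosis": "pneumonia",
}
print(safe_harbor_sketch(record))
```

The sketch does nothing about free-text fields such as clinical notes, which can carry identifiers of their own and would need separate review.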
"If patient data is properly de-identified it is no longer considered protected health information under HIPAA, and can be shared for research," notes attorney Stephen Wu of Silicon Valley Law Group. "In this [case] context, I don't know if Google is a 'business associate' under HIPAA." But if the data has been de-identified, and it's no longer PHI, then Google would be a vendor but not necessarily a BA in this situation." Under HIPAA, properly de-identified data can be shared with third-parties that are not business associates, he notes.
Privacy attorney David Holtzman of the security consultancy CynergisTek notes that HIPAA requirements for disclosing protected health information to a vendor that is a business associate are separate and distinct from those that apply to disclosures to a researcher.
"The HIPAA rules define research as a systematic investigation designed to develop or contribute to generalizable knowledge," he says. "A covered entity is permitted to disclose protected health information for research without the patient's authorization so long as the researchers have obtained the approval of a privacy board or institutional review board that has oversight over the research program."
HHS guidance notes that the HIPAA Privacy Rule establishes the conditions under which PHI may be used or disclosed by covered entities for research purposes. The Privacy Rule also defines the means by which individuals will be informed of uses and disclosures of their medical information for research purposes, and their rights to access information about them held by covered entities. But a covered entity may always use or disclose for research purposes health information which has been de-identified, HHS adds.
The lawsuit alleges that while the medical center claims it de-identified patient records shared with Google, the data included date stamps of when patients checked in and out of the hospital, as well as "copious free-text notes." As a result, the lawsuit alleges, through Google's "prolific data mining ... [the company] is uniquely able to determine the identity of almost every medical record released by the university."
Google and the university touted the security measures used to transfer and store these records, along with the fact that they had been "de-identified," the suit alleges. "In reality, these records were not sufficiently anonymized and put the patients' privacy at grave risk," it adds.
"The inclusion of, at the very least, the date stamp data immediately places the transfer of this medical data outside of the safe harbor provisions of HIPAA," the suit claims.
"The university did not perform an expert determination before transferring the medical records to Google; or, alternatively, if it did make that attempt, any finding that 'the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information' [as required under HIPAA] was woefully misplaced," the suit says.
"While defendants claim to have de-identified the hundreds of thousands of medical records transferred to Google without patient permission, Google is uniquely able to re-identify those records," the lawsuit alleges.
"The widespread availability of new tools and technologies makes the current de-identification standards less meaningful."
—David Holtzman, CynergisTek
Google is one of the largest data mining companies in the world, drawing data from thousands of sources and compiling information about individuals' personal traits, the lawsuit contends. "Based on these detailed profiles alone, Google has access to public and nonpublic information that could easily lead to the re-identification of the medical records it received," the suit alleges.
"Beyond the vast amount of personal information Google possesses, and its incredibly powerful analytics capabilities ... Google has in its possession detailed geolocation information that it can use to pinpoint and match exactly when certain people entered and exited the university's hospital."
In a statement provided to Information Security Media Group, Google says: "We believe our healthcare research could help save lives in the future, which is why we take privacy seriously and follow all relevant rules and regulations in our handling of health data. In particular, we take compliance with HIPAA seriously, including in the receipt and use of the limited data set provided by the University of Chicago."
The University of Chicago Medical Center did not immediately respond to ISMG's request for comment on the lawsuit.
Among other allegations, the lawsuit claims the university did not obtain patients' express consent to disclose their medical records to Google.
The suit alleges that the medical center's notice of privacy practices explicitly states that it "will obtain your written permission ... for the sale of your medical information." Nowhere does the university disclose that it would transfer patients' medical records to Google, the suit claims.
"Likewise, the university's admission and outpatient agreement and authorization form does not give the university permission to disclose patient's medical records to Google for any purpose whatsoever," according to the lawsuit
Data Re-identification Risk?
"I'm not a data scientist but my gut instincts tell me that on a case-by-case, it's getting harder to identify data and getting easier to combine [de-identified] data sets with other [de-identified] data sets for re-identification," Wu says.
Whether Google's Android phone, geolocation and data mining technology - combined with other factors - can be used to help re-identify patient data is unclear, Wu says.
"My mind is open to the possibility, but I'm skeptical," he says. "There are other reasons why an individual could enter or leave a hospital at any particular time - perhaps for a sales call or to visit a patient," he says.
But technology attorney Steven Teppler of the law firm Mandelbaum Salsburg says truly de-identifying data is becoming more difficult.
"With the evolution of AI [artificial intelligence], and with the help of patients themselves - who increasingly provide identifiable bits of information about themselves every time they fire up a browser, in particular, Chrome, or access a web site, any 'de-identification' process becomes less efficacious, to the point of being nearly useless."
Holtzman offers a similar assessment. "HIPAA's de-identification standards were developed at a time where technology and the availability of comprehensive data profiles made it a low risk for re-identification information to an individual consumer," he says.
"The widespread availability of new tools and technologies makes the current de-identification standards less meaningful. A number of states are exploring ways to limit the use of de-identified health data or to require authorization of the individual prior to making the data identifiable."
So what are the emerging lessons for covered entities regarding sharing data for research purposes?
Wu says HIPAA covered entities, such as hospitals, should hire data scientists who are "up on the latest technologies" and make sure they conduct due diligence "that de-identified data can't be potentially re-identified by vendors on the sly."
Teppler urges covered entities to obtain authorization from patients for sharing data for research. "And be very clear about the scope of the authorization," he says. "Obtain assurances - where possible - that the data for which consent is authorized is not further disclosed in a manner outside the scope of the authorization."
Covered entities and business associates need to think beyond the minimum requirements of HIPAA when sharing de-identified data, Holtzman contends.
"In light of the ubiquity of personal data characteristics and the ability to construct complete profiles of individuals, healthcare organizations should obtain contractual guarantees prohibiting the attempt to re-identify health-related data by any party who comes to possess the information," Holtzman says. "Otherwise, they run the risk of defending themselves against class action lawsuits by plaintiffs' attorneys who are continually pushing the edge of expanding protections of consumer's rights to control the use of their personal information."