Australian Health Breach Exposes Danger of 'Anonymous' Data
Researchers Cracked Encrypted Health Information
The abrupt withdrawal of a set of historical medical data by Australia's Department of Health once again shows the dangerous waters organizations enter when they try to make valuable data sets available to the public while still preserving privacy.
The data was a 30-year sampling of claims Australians made under Medicare, the country's public health service, and for pharmaceutical benefits. The data was released as part of a large Australian program to make government data more accessible.
But researchers at the University of Melbourne found that the method used to encrypt a key part of the information could be reverse engineered in a few days due to a weak algorithm.
The finding underscores the importance of carefully examining mathematical techniques used to anonymize data, says Vanessa Teague, one of the researchers and a senior lecturer in the computing and information systems department at the University of Melbourne.
"The answer is there should be an opportunity to examine the algorithm in advance," Teague says.
Although no patient information was exposed, the discovery was compelling enough for the Department of Health to withdraw the data set until it can be more strongly secured. Australia's Privacy Commissioner has also launched an investigation.
Not So Anonymous
The data was released in mid-August and contained claims from the Medicare Benefits Schedule and Pharmaceutical Benefits Scheme lodged between 1984 and 2014. It contained more than 1 billion lines of information on about 3 million Australians, about 10 percent of Medicare's patients.
The Department of Health took steps to protect sensitive data. Patients were not named; each was instead assigned an ID number that was not connected to their real Medicare number. Only a person's birth year was included. The dates of medical consultations were shifted so that each published date fell within 14 days of the actual event.
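The three protections described above can be illustrated with a short Python sketch. It is not the department's actual process, which the article does not detail; the record fields (`medicare_number`, `birth_date`, `service_date`, `item`) and the `deidentify` helper are hypothetical names chosen for the example.

```python
import random
import secrets
from datetime import date, timedelta

def deidentify(records, max_shift_days=14, rng=None):
    """Return de-identified copies of claim records (illustrative only).

    Each input record is a dict with the hypothetical keys
    'medicare_number', 'birth_date', 'service_date' and 'item'.
    """
    rng = rng or random.SystemRandom()
    pseudonyms = {}  # real number -> random ID; must never be published
    released = []
    for rec in records:
        real_id = rec["medicare_number"]
        if real_id not in pseudonyms:
            # A random token has no mathematical link to the real number.
            pseudonyms[real_id] = secrets.token_hex(8)
        # Shift the service date by a random offset of at most 14 days.
        shift = rng.randint(-max_shift_days, max_shift_days)
        released.append({
            "patient_id": pseudonyms[real_id],
            "birth_year": rec["birth_date"].year,  # drop day and month
            "service_date": rec["service_date"] + timedelta(days=shift),
            "item": rec["item"],
        })
    return released

# Made-up sample claims: two for one patient, one for another.
sample = [
    {"medicare_number": "1111", "birth_date": date(1950, 3, 2),
     "service_date": date(2010, 6, 15), "item": "23"},
    {"medicare_number": "1111", "birth_date": date(1950, 3, 2),
     "service_date": date(2011, 1, 5), "item": "36"},
    {"medicare_number": "2222", "birth_date": date(1972, 9, 30),
     "service_date": date(2010, 6, 15), "item": "23"},
]
released = deidentify(sample)
```

The same patient keeps the same pseudonym across claims, which is what lets analysts follow a patient over time, and it is also what makes the remaining fields useful to someone trying to re-identify that patient.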
Medical facilities have a provider ID, which is supplied to Medicare as part of a claim. That provider ID was used as a seed, or starting point, to generate what should be a random-looking, encrypted value. In theory, it should not be possible to reverse the value and link it back to a real provider ID.
Authorities were providing the data so that medical industry stakeholders could run analyses and reveal more about use of the country's public health system. The structure of the data was designed to allow it to be linked to other data sets collected by hospitals and records of services, such as immunizations.
The researchers - Chris Culnane, Benjamin Rubinstein and Teague - found they were able to decrypt the service provider ID. When the agency originally posted the data, the Department of Health described, in part, the encryption algorithm used.
The description allowed the researchers to figure out its weaknesses and eventually decrypt the provider IDs. But Teague says the researchers did not successfully decrypt the patient ID.
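The article does not describe the department's algorithm or the researchers' attack, so the following Python sketch only illustrates a general weakness of this kind. It assumes a stand-in scheme (an unkeyed, truncated SHA-256 hash) and a hypothetical five-digit provider ID space. When the transformation is public and deterministic and the input space is small, an attacker can simply compute the output for every possible ID and look the published value up.

```python
import hashlib
import hmac
import secrets

def weak_pseudonym(provider_id: int) -> str:
    # Stand-in for a flawed scheme: deterministic, unkeyed, publicly known.
    return hashlib.sha256(str(provider_id).encode()).hexdigest()[:16]

def build_reverse_table(id_space):
    # The attacker runs the public algorithm over every candidate ID.
    return {weak_pseudonym(pid): pid for pid in id_space}

def keyed_pseudonym(provider_id: int, key: bytes) -> str:
    # Contrast: an HMAC with a secret key cannot be recomputed by outsiders.
    digest = hmac.new(key, str(provider_id).encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

ID_SPACE = range(100_000)           # hypothetical: five-digit provider IDs
table = build_reverse_table(ID_SPACE)

published = weak_pseudonym(48_213)  # value as it would appear in the data set
recovered = table[published]        # attacker recovers the real provider ID

secret_key = secrets.token_bytes(32)
protected = keyed_pseudonym(48_213, secret_key)
```

In this toy setup the lookup table takes a fraction of a second to build. The keyed variant resists the same table because the attacker cannot reproduce the mapping without the key, although it still depends on the key staying secret.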
Even if an attacker can't decrypt the patient or service provider IDs, the information that is not encrypted could help narrow down a patient's identity. For example, the set contained information about several people who were born in the 1890s. That is a distinctive piece of information, since only a few such people exist. Combined with other public information, the birth year could be used to make a solid guess about a person's identity, Teague says.
Another hypothetical example might be a member of Parliament who suffered a heart attack, Teague says. The media might have covered such an event, and the person's birth year wouldn't be hard to find. The more an attacker knows about a person, the greater the chance that information could be leveraged to discover more, Teague says. "It's a question of how much information you've got and how unusual that information is."
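One common way to quantify "how unusual" a record is involves counting how many records share the same combination of unencrypted attributes, the idea behind k-anonymity. The Python sketch below uses made-up records; the `state` field and the threshold `k` are assumptions for illustration and are not taken from the released data set.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of attribute values."""
    return Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)

def at_risk(records, quasi_identifiers, k=2):
    """Return records whose attribute combination is shared by fewer than k."""
    sizes = group_sizes(records, quasi_identifiers)
    return [rec for rec in records
            if sizes[tuple(rec[q] for q in quasi_identifiers)] < k]

# Made-up records; 'state' is a hypothetical extra attribute.
records = [
    {"patient_id": "a1", "birth_year": 1895, "state": "VIC"},
    {"patient_id": "b2", "birth_year": 1972, "state": "VIC"},
    {"patient_id": "c3", "birth_year": 1972, "state": "VIC"},
    {"patient_id": "d4", "birth_year": 1972, "state": "NSW"},
]

rare_by_year = at_risk(records, ["birth_year"])
rare_by_year_and_state = at_risk(records, ["birth_year", "state"])
```

On birth year alone, only the person born in 1895 stands out. Adding a second attribute makes another record unique, which mirrors Teague's point that each extra piece of information shrinks the crowd a person can hide in.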
Now a Criminal Offense
The Department of Health's mistake is not likely to raise confidence in the Australian government's handling of data.
Part of the uproar around the country's recent census focused on how that data is going to be retained for longer periods. Similar to the handling of the data held by the Department of Health, the Australian Bureau of Statistics plans to anonymize data, but in such a way that it can be accurately linked to other government databases.
There's a fear of "linkage attacks," where supposedly anonymous data can be linked to real people or events.
One of the most famous examples dates from 2006, when AOL released 20 million search queries from more than 650,000 users over a three-month period. In an investigation, the New York Times showed it was trivial to analyze search terms and contextually home in on a particular user, even after AOL had replaced user IDs with an anonymous number.
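A linkage attack amounts to a join between the "anonymous" records and outside knowledge on whatever attributes the two share. The Python sketch below is a toy illustration with invented data, echoing the hypothetical member of Parliament mentioned earlier. The field names and the `link` helper are assumptions, and matching on an exact month is a simplification that ignores the 14-day date shifting.

```python
def link(deidentified, public_facts, keys):
    """Match public facts to de-identified records on shared attributes.

    Only unambiguous matches (exactly one candidate record) are returned.
    """
    index = {}
    for rec in deidentified:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    matches = {}
    for person in public_facts:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]["patient_id"]
    return matches

# Invented de-identified claims.
claims = [
    {"patient_id": "p-01", "birth_year": 1951,
     "event": "heart attack", "month": "2013-05"},
    {"patient_id": "p-02", "birth_year": 1951,
     "event": "fracture", "month": "2013-05"},
    {"patient_id": "p-03", "birth_year": 1980,
     "event": "heart attack", "month": "2013-05"},
]

# What an attacker might learn from news coverage (invented).
news = [
    {"name": "Hypothetical MP", "birth_year": 1951,
     "event": "heart attack", "month": "2013-05"},
]

linked = link(claims, news, ["birth_year", "event", "month"])
```

Once a name is tied to a patient ID, every other claim filed under that ID is exposed as well, which is why a single successful link can reveal far more than the facts used to make it.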
Following the Department of Health's withdrawal of the data, Attorney-General George Brandis announced a quick amendment to the Privacy Act 1988, making it a criminal offense to de-anonymize government data. Teague says a better approach would be to start with more secure anonymizing techniques.