
Does This Exposed Chinese Database Pose a Security Threat?

The Zhenhua Data Leak Is Scraped Public Data. It Poses No Threat.
The login page for the Overseas Key Information Database, which collects data from social media platforms and other sources.

A leaked database compiled by a Chinese company has suddenly become the focus of media reports warning that it could be used for espionage by Beijing. But on closer examination, the data is public information that's been scraped, largely from social media sites and other public sources.


On Monday, news outlets including the ABC and the Australian Financial Review released a coordinated scoop about a leaked database from China. The data includes that of prominent members of Australian society, including many politicians.

The database contains details on at least 2.4 million people, including 35,000 Australians, among them Prime Minister Scott Morrison, as well as many business people.

The breathless reporting about the database has stoked fears Beijing may be collecting data on Australians and other people around the world to spy on them. But while it's easy to spin up a furor over anything involving China and cybersecurity, this data exposure deserves a more precise examination.

The database comes from a company called Zhenhua Data. According to Christopher Balding, an American academic in Vietnam, a source in China passed him the data, putting the source "at risk" from the Chinese Communist Party.

"The individual who provided the Shenzhen Zhenhua database by putting themselves at risk to get this data out has done an enormous service and is proof that many inside China are concerned about CCP authoritarianism and surveillance," Balding writes in a blog post on Monday.

What is the OKIDB?

The database is called the Overseas Key Information Database, or OKIDB. As I read the reports about it, I thought I might have seen it before. In fact, I had.

By virtue of being on the cybersecurity beat, I often receive tips about leaks. I've amassed files filled with random leaked data, many of which remain unconfirmed. The OKIDB information had remained in that bucket.

I started posting screenshots on Twitter of the version of the OKIDB I'd seen. I tagged Balding and Robert Potter, the co-founder of a Canberra-based company called Internet 2.0. Balding shared the database with Internet 2.0 to put it into a more digestible format because the version he'd received was corrupted.

I called Potter on Monday morning, and it became clear that the OKIDB that I saw is the same database Balding and Potter possess. In response, some people have rightly asked me why I didn't write about this sooner. Here's the skinny.

The database was brought to my attention in late December 2019 or early January by a computer security researcher who is not based in China. The database had been left on the internet, open for anyone to access, presumably by mistake. In more precise terms, it was an unsecured Elasticsearch cluster.

Elasticsearch is an open-source platform for storing and querying data. By default, Elasticsearch clusters are not publicly accessible. But the clusters can be rolled out in a misconfigured manner, leaving data open on the internet. Often, it's possible to hunt out misconfigured Elasticsearch instances using device-focused search engines.
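For administrators who want to confirm that a cluster they operate is not exposed in the same way, the check can be sketched roughly as follows. This is a minimal illustration, not a description of how the OKIDB was found: the function names `classify_response` and `check_own_cluster` are hypothetical, and the sketch assumes that an unauthenticated GET to the cluster's root URL returns a JSON document with a `cluster_name` field when access is open, and a 401 or 403 status when authentication is enforced.

```python
import json
import urllib.error
import urllib.request


def classify_response(status, body):
    """Interpret the reply to an unauthenticated GET on the cluster root URL."""
    if status in (401, 403):
        return "auth-required"
    if status == 200:
        try:
            info = json.loads(body)
        except ValueError:
            return "unknown"
        # Assumption: an open Elasticsearch root endpoint reports a cluster_name.
        if isinstance(info, dict) and "cluster_name" in info:
            return "open"
    return "unknown"


def check_own_cluster(base_url, timeout=5.0):
    """Send one unauthenticated request to a cluster you operate (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # Error statuses such as 401 arrive as exceptions in urllib.
        return classify_response(err.code, "")
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

Calling `check_own_cluster` with the address of a cluster you are responsible for, from a machine outside your network, returns "open" only if the data can be read without credentials. Run it only against systems you are authorized to test.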

When I reviewed the data stored in the OKIDB, it appeared impressive mostly for its size - hundreds and hundreds of gigabytes - but otherwise the data didn't appear to be sensitive. All of it seemed to be public. For example, there were bits from U.S. Navy press releases announcing deployments of ships, some of which had been translated into Mandarin.

One of the indices contained a list of U.S. Air Force personnel. It included names and addresses but no birth dates. Those listings contained a couple of interesting fields, such as "airmenID" and "medicalExpirationDate." But that data turned out to be public. There were also entries for U.S. Navy officers, but those included links to public biographies that have been posted on Navy websites.

A sample of the data concerning U.S. Air Force personnel. The data is public, but ISMG chose to redact the name and address in this specific record.

Other indices contained what appeared to be research papers from think tanks. Copious amounts of data had been copied from sources including Crunchbase and EveryPolitician. Largely, however, I didn't see anything that raised alarms.

Social Media Scraping

So where did all this data originate? The database is related to a domain, aggso[dot]com, which belonged to a commercial Chinese company. The company specialized in aggregating data.

The front page of its now-shuttered website, okidb.aggso[dot]com, mentioned numerous data sources, including LinkedIn, Facebook, Instagram, YouTube, Twitter and Medium. It appeared quite similar to U.S. companies such as Spokeo or Pipl, which mine a variety of public data sources and link them together.

Aggso[dot]com advertised its ability to collect massive amounts of data posted publicly on popular social networking sites.

Early views of aggso[dot]com on the Wayback Machine from around 2012 show that it started out as something called the Weiju Social Media Management System.

A view of the site on Jan. 4, 2012.

Over time, the company changed how it marketed itself. The Australian Financial Review reports that Zhenhua Data was recently marketing the data it holds as the "Internet Big Data Military Intelligence System." While the company's website is now offline, it had listed such customers as the People's Liberation Army and Communist Party, the Australian Financial Review reports.

After reviewing the data set in January, I didn't see much to merit a story. To be sure, it contained a huge amount of data, some of which had obvious ties to China, but nothing appeared to be overtly nefarious. I also tried to contact the registrant for aggso[dot]com but received no reply. OKIDB joined the long list of other data exposures that I have learned about but not seen fit to report on.

Risky Data Collection?

I asked Potter this key question: What kind of non-public data is in the database? Because if there is any, it might give more weight to suggestions that the collected data poses a risk.

Potter responded that "it depends on how you define open source" and that "there seemed to be a fair amount in there that had been pinched from other platforms, which in and of itself wasn't open source as a method it was ingested in."

When I asked Potter to define exactly what that meant, he told me that there seemed to be data that was "not classified but they're not public sources." He mentioned data from Factiva, the news-monitoring and research tool from Dow Jones. I pointed out that Factiva content isn't sensitive; it is simply subscriber-only.

To be sure, there are reasons to be worried about China's cyber activity. U.S. prosecutors have pinned on China some of the largest and most worrisome hacks in memory, including those of the U.S. Office of Personnel Management, Equifax and health insurance giant Anthem. Here in Australia, China has been blamed for attacks on Parliament's email system and against Australian National University. The data from those hacks has never publicly surfaced. If Zhenhua's repository had that kind of data, this would be a much more significant finding.

Caution: I have seen only a small slice of the data. There could be material in there that is highly sensitive. But if that is true, then I call on anyone who's making this out to be a significant national security concern to describe that highly sensitive data more fully. So far, that hasn't happened.

I cringed when I saw the Australian Financial Review's hyperbolic headline contending that this material comprises a "social media warfare database." Anyone who posts material to social media sites or the internet in general should expect that data to be scraped by marketing agencies and others. By this point in the internet's history, everyone should have gotten fair warning that this is the current state of affairs. Be careful what you expose.

Zhenhua Data looks like a company that has done what countless Western companies have done in the age in which data is the new oil: Collect it and sell it. The company wasn't trying to hide. Neither was it very good at securing its own data.

About the Author

Jeremy Kirk


Executive Editor, Security and Technology, ISMG

Kirk was executive editor for security and technology for Information Security Media Group. Reporting from Sydney, Australia, he created "The Ransomware Files" podcast, which tells the harrowing stories of IT pros who have fought back against ransomware.
