
Data Analytics Firm Polecat Exposed 30TB of Data

Researchers Say Social Media Information Exposed

Prajeet Nair (@prajeetspeaks) • March 5, 2021

An unsecured server belonging to UK-based data analytics company Polecat exposed an estimated 30 terabytes of data, including 12 billion records related to social media, according to the Wizcase CyberResearch Team. The server has since been secured.

Researchers found that Polecat's unsecured Elasticsearch server was accessible without any authentication and had no encryption in place, and that data continued to be written to the database even after the company had been notified of the breach.

Researchers first discovered the exposed database on Oct. 29, 2020. The database was secured on Nov. 2, 2020.

The database held 30TB of data and exposed over 12 billion records, including over 6.5 billion tweets, almost 5 billion records labeled “social” (which all appeared to be tweets) and over 1 billion posts from various blogs and websites. The exposed fields included tweet content, tweet ID, author username, view and follower counts, post content, URL, time of harvesting, publisher, region and post title.

Researchers say the exposed Elasticsearch server could have been found and accessed by anyone with the server's URL.
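For administrators wondering whether their own Elasticsearch node is open in the same way, the check can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not an official tool: it assumes that a node without security enabled answers an unauthenticated `GET /` with HTTP 200 and a JSON body containing `cluster_name`, while a node with security enabled answers 401. The function names `classify_response` and `check_own_endpoint` are hypothetical, and the probe is meant only for servers you administer.

```python
import json
import urllib.error
import urllib.request


def classify_response(status: int, body: str) -> str:
    """Classify an unauthenticated answer from an Elasticsearch root endpoint.

    Assumption: an unsecured node returns 200 with JSON containing
    "cluster_name"; a node with security enabled returns 401.
    """
    if status == 401:
        return "auth-required"
    if status == 200:
        try:
            info = json.loads(body)
        except json.JSONDecodeError:
            return "unknown"
        if isinstance(info, dict) and "cluster_name" in info:
            return "open"
    return "unknown"


def check_own_endpoint(url: str, timeout: float = 5.0) -> str:
    """Send one GET with no credentials to a server you administer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx answers, so a 401 lands here
        return classify_response(err.code, "")
    except OSError:
        # URLError, timeouts and refused connections are all OSError subclasses
        return "unreachable"
```

A hypothetical call such as `check_own_endpoint("http://localhost:9200/")` (9200 is Elasticsearch's default HTTP port) returning `"open"` would indicate the kind of exposure described here; `"auth-required"` means credentials are at least being demanded.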

Polecat did not respond to Information Security Media Group's request for comment.

While much of the exposed data was gleaned from publicly available sources, Wizcase notes: "It is important to mention that the server exposed some well-protected usernames and hashed passwords belonging to Polecat’s employees. This shows that the company is aware of the security measures required to protect its data, and that the server exposure was likely a result of human error."

Dean Ferrando, a systems engineer at the security firm Tripwire, notes: "Misconfigurations like these are becoming all too common. Exposing sensitive data doesn’t require a sophisticated vulnerability, and the rapid growth of cloud-based data storage has exposed weaknesses in processes that leave data available to anyone. A misconfigured database on an internal network might not be noticed, and if noticed might not go public, but the stakes are higher when your data storage is directly connected to the Internet."

Data Stolen

The day after the server was exposed, researchers discovered that a Meow attack had erased half of the data.

Meow attacks are purely destructive and have no financial motive: the attacker wipes the original index and replaces it with a newly created one whose name ends in the suffix “-meow”. "The malicious actors behind the attacks do it just for fun, because they can and because it is really simple to do," says Bob Diachenko, the security researcher who first reported this attack type.
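Because the attack leaves a recognizable naming pattern, a first-pass check for it can be sketched in Python. This is a minimal sketch, assuming you already have a list of index names (for example, from your cluster's index listing); the helper `find_meow_indices` and the sample names are hypothetical and not part of any Elasticsearch client.

```python
from typing import Iterable, List

MEOW_SUFFIX = "-meow"  # suffix reported for indices created by Meow attacks


def find_meow_indices(index_names: Iterable[str]) -> List[str]:
    """Return the index names that end with the "-meow" suffix.

    Hypothetical helper: it only inspects names and contacts no server.
    """
    return [name for name in index_names if name.endswith(MEOW_SUFFIX)]


if __name__ == "__main__":
    sample = ["tweets-2020-10", "social-2020-10", "abc123-meow"]  # illustrative names
    print(find_meow_indices(sample))  # prints ['abc123-meow']
```

A name match is only a heuristic: an index ending in "-meow" is a strong hint of this attack, but missing data can have other causes.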

Wizcase researchers say they found that a few more terabytes of data were missing after the Meow attack, leaving the database with just over 2 billion records totaling around 4TB.

They suspect that the remaining data was later attacked by a third actor, who left a ransom note demanding 0.04 bitcoin (around $550 at the time) for the data's return. "It’s important to note that these types of scams/ransoms are usually automated and sent to many open databases," the researchers say.

Polecat, which analyzes data collected for such purposes as election result predictions, started to harvest the exposed data in July 2019, but the server contained records dating back to 2007, the Wizcase researchers say.

The researchers also found that new indices were being added even on the day they discovered the leak, with many new records added every second.

The Wizcase CyberResearch Team estimates that Polecat harvested about 20 million to 50 million tweets per day since the end of July 2019, which represents 4% to 10% of the approximately 500 million total tweets sent each day.
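The percentage range follows directly from the two estimates quoted above. A quick Python check, using only the article's figures (the constant and function names are illustrative):

```python
DAILY_TWEETS_TOTAL = 500_000_000  # approximate tweets sent per day, per the article


def share_of_daily_tweets(harvested_per_day: int,
                          total_per_day: int = DAILY_TWEETS_TOTAL) -> float:
    """Return harvested tweets as a percentage of all tweets sent per day."""
    return 100 * harvested_per_day / total_per_day


low = share_of_daily_tweets(20_000_000)
high = share_of_daily_tweets(50_000_000)
print(f"{low:.0f}% to {high:.0f}%")  # prints "4% to 10%"
```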

"The breach affected tweets and posts from Twitter users all over the world, in many different languages and across multiple countries," the researchers note.

Upon analyzing a sample of the leaked database, researchers found that much of the content related to topics such as racism, propaganda, firearms, COVID-19 and healthcare, as well as politicians such as Donald Trump, Barack Obama, Vladimir Putin and others.

"The data exposed is public data, most likely harvested with the tools Polecat is promoting. Yet, the number of tweets harvested seems very high," the researchers note. "Some of the users whose data was harvested seemed to have all of their tweets exposed in the database, while others had only a few."

The researchers say that because other attackers had already found the data, any of them who downloaded it could try to sell it to Polecat’s competitors.

"Even if there had not been an upcoming election, the data could still have been used to analyze other trends, especially considering the amount of data the leak contained," the researchers note.

Widespread Problem

Misconfigurations are a widespread and growing problem because many organizations fail to implement and enforce policies and procedures for change control and secure configuration management.

For example, an unprotected database that was apparently owned by Adit, a Houston-based online medical appointment and patient management software company, exposed information on 3.1 million patients (see: Unsecured Database Exposed on Web - Then Deleted).

In another incident, an unsecured Amazon Web Services database belonging to India's Dr Lal Path Labs, which offers diagnostic testing, exposed approximately 50GB of patient data, including notes related to the results of COVID-19 tests conducted in October 2020 (see: Unsecured AWS Database Left Patient Data Exposed).