A Hong Kong-registered company that sells data on social media influencers has left as many as 235 million user profiles scraped from Instagram, TikTok and YouTube exposed on the web. No password or any other authentication was required to access them, according to a report by British research firm Comparitech.
On August 1, security researcher Bob Diachenko, who leads Comparitech's cybersecurity research team, uncovered three identical copies of a database containing names, contact information, images and follower statistics, Comparitech said in its report on Wednesday.
The data was from a company called Social Data, which helps businesses “find influencers and get in-depth insights into demographic and psychographic data of influencers and their audience throughout different types of social media on the web”, according to its website.
The vast majority of the profiles were scraped from Facebook-owned Instagram, and the largest data sets included two that each held data from more than 95 million Instagram profiles. The database also included at least 42 million records from TikTok and nearly 4 million from Google-owned YouTube, according to the Comparitech report, which added that about one in five records contained a phone number or email address.
The breach comes as both Western and Chinese social media giants face heavy scrutiny from governments over their data protection policies. Last year, Facebook agreed to pay a fine over the Cambridge Analytica scandal, in which millions of Facebook users' personal data was harvested without their consent and used for political campaigns, including those related to the 2016 US presidential election and the UK's referendum that same year on leaving the European Union.
TikTok has also been criticised by governments in countries including the US, India and France for its data collection practices. The short video app is now blocked in India and faces a similar ban in the US if it does not divest its American operations within 90 days, US President Donald Trump said last Friday.
Much of the data originated from Deep Social, another now-defunct firm, although Social Data denies any connection to it, Comparitech said. The report added that Social Data's chief technology officer acknowledged the exposure, and the servers hosting the data were taken down about three hours later.
Web scraping is the automated, bulk copying of data and information from web pages. Scraping bots can be difficult to distinguish from ordinary website visitors, so it is hard for social media platforms to stop them from accessing user profiles, according to the research firm.
Comparitech warned in its report that data scraped and stored in this way is "vulnerable to spam marketing and phishing campaigns". It added that "even though the information is publicly available, the size and scope of an aggregated database makes it more vulnerable to mass attack than it would be in isolation".
Facebook spokeswoman Stephanie Otway said that scraping people's information from Instagram is a clear violation of the company’s policies.
“We revoked Deep Social's access to our platform in June 2018 and sent a legal notice prohibiting any further data collection,” Otway said.
A TikTok representative said the short video app places the “highest priority on user privacy” and has anti-scraping policies in place.
“Our Terms of Service prohibit third parties from running automated scripts to collect information from our platform, including public profile information,” the representative said. “If we identify any such practices, we will take rapid action, including seeking legal redress.”
A YouTube representative said that the video platform's terms of service explicitly forbid collecting data that can be used to identify a person.
“We are currently investigating the specific issue, and will send Social Data a cease and desist letter if the scraping activity is verified or otherwise we believe it necessary,” the representative said.
Social Data did not immediately respond to the Post’s request for comment. According to the Comparitech report, a spokesperson from Social Data told the research firm that “all of the data is available freely to anyone with internet access” and that “social networks themselves expose the data to outsiders – that is their business”.
“Those users who do not wish to provide information, make their accounts private,” the spokesperson reportedly said.
Michael Gazeley, managing director of Hong Kong cybersecurity firm Network Box, said that despite the size of the leak, he did not think that it was a particularly serious breach.
“I don't think it's really a breach of privacy, if the data is already public,” he said. “It's far more worrying when critical, private, data is leaked. For example: passwords, bank details, health records.”
He added: "It becomes more serious if it's possible to do data analysis for, say, political manipulation, but the key data, in this case as far as I understand it, isn't critical private data."
Nathaniel Rushforth, a US-qualified lawyer and cybersecurity specialist at Shanghai-based DaWo Law Firm, also said that scraping public profile information is a legal “grey area”, and whether it amounts to a real breach of privacy is “highly debatable”.
“Scraping itself is not necessarily illegal, and it probably doesn’t really breach anybody’s privacy in any significant way,” he said, although he added that some countries penalise offences such as misusing scraped data to inappropriately target people for financial gain or exploiting the data in anticompetitive ways.
“The only real way to prevent a determined data-gatherer from obtaining information on you is to limit what information you put online,” Rushforth said.
This article, "Nearly 235 million social media profiles from Instagram, TikTok and YouTube exposed in data leak", first appeared in the South China Morning Post.