The data within the three PDL indexes also varied slightly: some focused on scraped LinkedIn information, email addresses and phone numbers, while others provided information on individual social media profiles such as a person’s Facebook, Twitter, and GitHub URLs. According to the company’s website, the PDL application can be used to search over 1.5 billion unique people, including close to 260 million in the US.
The server was found to be publicly exposed without password protection or encryption during routine IP-address checks on potentially unsecured databases, researchers said. It contained more than 318 million records in total.
SocialArks’ data-management platform is used for programmatic advertising and marketing. It bills itself as a “cross-border social-media management company dedicated to solving the current problems of brand building, marketing, marketing, social customer management in China’s foreign trade industry.”
The affected server, hosted by Tencent, was segmented into indices in order to store data obtained from each social-media source, which allowed researchers to look into the data further. “Our research team was able to determine that the entirety of the leaked data was ‘scraped’ from social-media platforms, which is both unethical and a violation of Facebook’s, Instagram’s and LinkedIn’s terms of service,” researchers said, in a Monday blog post. The scraped profiles included 11,651,162 Instagram user profiles; 66,117,839 LinkedIn user profiles; 81,551,567 Facebook user profiles; and 55,300,000 Facebook profiles that were deleted within a few hours after the open server was discovered.
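As a quick check on the figures above, the per-platform counts can be tallied. This is a minimal sketch: the dictionary and helper names are illustrative, and the numbers are the ones reported in the article.

```python
# Profile counts as reported for the exposed SocialArks server.
# The names below are illustrative; only the figures come from the article.
scraped_profiles = {
    "Instagram": 11_651_162,
    "LinkedIn": 66_117_839,
    "Facebook": 81_551_567,
    "Facebook (deleted after discovery)": 55_300_000,
}


def total_profiles(counts):
    """Sum the per-platform profile counts."""
    return sum(counts.values())


total = total_profiles(scraped_profiles)
print(f"Listed profiles: {total:,}")
```

The listed profiles add up to roughly 214.6 million, which is fewer than the more than 318 million records the server reportedly held in total.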
The public profile data included biographies, profile pictures, follower totals, location settings, contact details such as email addresses and phone numbers, number of comments, frequently used hashtags, company names, employment position and more.
“Social media data scraped for marketing purposes will inevitably include sensitive information,” Jack Mannino, CEO at nVisium, told Threatpost. “For every privacy-conscious person using social media, there is an exponentially greater number of people publicly sharing intimate details about their private lives. To protect yourself, restrict public access to your profile and media assets, be sensible about what you post online, and be careful what permissions you grant to applications that may abuse, misuse or steal your information.”
To create personalized Products that are unique and relevant to you, we use your connections, preferences, interests and activities based on the data we collect and learn from you and others (including any data with special protections you choose to provide); how you use and interact with our Products; and the people, places, or things you're connected to and interested in on and off our Products.
However, in addition to the collating of publicly available data, the database also included, inexplicably, private data for social-media users.
“SocialArks’ database stored personal data for Instagram and LinkedIn users such as private phone numbers and email addresses for users that did not divulge such information publicly on their accounts,” researchers said. “How SocialArks could possibly have access to such data in the first place remains unknown…It remains unclear how the company managed to obtain private data from multiple secure sources…Moreover, the company’s server had insufficient security and was left completely unsecured.”
Threatpost has reached out to SocialArks for more information. The database was secured by SocialArks the same day that Security Detectives alerted the company to the issue.
SocialArks suffered a similar data breach in August, which affected 66 million LinkedIn users, 11.6 million Instagram accounts and 81.5 million Facebook accounts – about 150 million in all. The information exposed also consisted of scraped, publicly available data such as full names, country of residence, place of work, position, subscriber data and contact information, as well as direct links to profiles.
Having a central repository for such information opens the door to high-volume, automated social-engineering attacks, experts warned.
“Most data scraping is completely innocuous and carried out by web developers, business intelligence analysts, honest businesses such as travel booker sites, as well as being done for market research purposes online,” the researchers said. “However, even if such data is obtained legally – if it is stored without adequate cybersecurity, large leaks affecting millions of people can occur. When private information including phone numbers, email addresses and birth information is extracted and/or leaked, criminals are empowered to commit heinous acts including identity theft and financial fraud.”
Dirk Schrader, global vice president at New Net Technologies, said the fact that the scraping took place at all, whether of public or private information, is itself of interest.
But buried within its business-like announcement of the indictment of four Chinese military hackers, there is the following statement, which has huge implications for privacy: For years, we have witnessed China’s voracious appetite for the personal data of Americans, including the theft of personnel records from the U.S. Office of Personnel Management, the intrusion into Marriott hotels, and Anthem health insurance company, and now the wholesale theft of credit and other information from Equifax.
“Public profiles have been scraped before and the giants in that space usually try to block mass scraping attempts as the intention behind is to get access to their ‘oil,’” he told Threatpost. “Why it hasn’t worked in this case would be an interesting fact to know. As a likely affected LinkedIn user, my choices are limited. Either I accept that scraping will happen, or I can reduce my profile which limits my ability to make business connections to a certain extent. How much information a user provides is their choice. Scraping itself, especially when the data collected is so badly secured, increases the likelihood to be targeted with specific attacks and unwanted emails.”