A collection containing data about more than 700 million users, believed to have been scraped from LinkedIn, was leaked online this week after hackers tried to sell it in June of this year.
The collection, obtained by The Record from a source, is currently being shared in private Telegram channels as a torrent containing approximately 187 GB of archived data.
The Record analyzed files from this collection and found the data to be authentic, with data points such as:
- LinkedIn profile names
- LinkedIn ID
- LinkedIn profile URL
- Location information (town, city, country)
- Email addresses
While the vast majority of the data points contained in the leak are already public information and pose no threat to LinkedIn users, the leak also contains email addresses that are not normally viewable to the public on the official LinkedIn site.
Linked to users’ real-life names and personas, the email addresses make the leak a gold mine for threat actors looking to target high-profile executives or employees working in sensitive areas of a company, such as financial departments or security teams.
Fortunately, the leak does not include an email address for every user, meaning that the vast majority of its entries are worthless to attackers.
Contacted via email earlier this week, LinkedIn declined to comment further and pointed to its official statement from June 2021.
At the time, LinkedIn said that no data breach had occurred and that the data had been scraped from LinkedIn as well as other sites.
In fairness, the company might be getting the short end of the stick in this situation. Data scraped from its website and enriched with email addresses from other sources might not be something that LinkedIn can control, and the company can’t be blamed for threat actors collecting the public data that its service needs in order to function in the first place.
More broadly, however, scraping of public sites has become more common, with recent examples including scrapes of Clubhouse, Instagram, and Facebook data.
While the data collected in these scrapes is typically considered public and not particularly sensitive, the collections are still sought after for other purposes, such as building OSINT databases and enriching them with information from multiple sources to better understand the victims that threat actors may want to target in the future.