Is It Legal to Scrape Data from LinkedIn? A Comprehensive Guide

LinkedIn is the world's largest professional networking site, with over 900 million members in more than 200 countries. With detailed data on companies, jobs, and individual professionals, it's no surprise that LinkedIn is an appealing target for web scraping. Many businesses and researchers want to extract large amounts of LinkedIn data to gain business intelligence, generate leads, conduct market research, or train machine learning models.

But is it actually legal to scrape data from LinkedIn? The short answer is: it depends. Many factors affect the legality of scraping LinkedIn, and the legal landscape is complex and evolving. In this guide, we'll examine LinkedIn scraping from a legal perspective: LinkedIn's terms of service, past legal cases, and the key issues involved.

What is Web Scraping and LinkedIn Scraping?

First, let's define what we mean by "web scraping" and "LinkedIn scraping." Web scraping refers to the automated process of extracting data from websites using software tools called web scrapers, spiders, or bots. Instead of manually copying and pasting data, a web scraper can quickly extract large amounts of information.
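As a minimal illustration of the idea, the sketch below pulls job titles out of a small HTML snippet using Python's standard-library html.parser module. The markup, the "job-title" class name, and the TitleExtractor and extract_titles names are assumptions invented for this example; they do not reflect the structure of LinkedIn's pages or any other real site.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; not taken from any real site.
SAMPLE_HTML = """
<ul>
  <li><span class="job-title">Data Engineer</span></li>
  <li><span class="job-title">Product Manager</span></li>
  <li><span class="location">Berlin</span></li>
</ul>
"""


class TitleExtractor(HTMLParser):
    """Collects the text of every <span class="job-title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "span" and ("class", "job-title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_title = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a matching span.
        if self._in_title and data.strip():
            self.titles.append(data.strip())


def extract_titles(html):
    """Return the job titles found in the given HTML string."""
    parser = TitleExtractor()
    parser.feed(html)
    return parser.titles
```

Calling extract_titles(SAMPLE_HTML) returns the two job titles and skips the location entry. A real scraper adds network requests on top of this parsing step, which is where the legal questions discussed in this guide come in.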

LinkedIn scraping is simply the process of scraping data from LinkedIn.com. This could include extracting:

  • Individual profiles with names, job titles, work history, education, skills, etc.
  • Company pages with employee counts, job openings, industries, locations, etc.
  • Job postings with descriptions, requirements, salaries, application links, etc.
  • Posts, articles, and other user-generated content
  • Connection graphs and social network data

With a wealth of valuable business and professional data, LinkedIn is a prime source for web scraping.

LinkedIn's Terms of Service and robots.txt

To determine if scraping LinkedIn is allowed, the first place to look is LinkedIn's user agreement and terms of service. As of 2023, LinkedIn's user agreement prohibits scraping without express permission:

"You agree that you will not: […] Develop, support or use software, devices, scripts, robots or any other means or processes (including crawlers, browser plugins and add-ons or any other technology) to scrape the Services or otherwise copy profiles and other data from the Services."

"Services" refers to LinkedIn's website, content, and related services. So according to the TOS, any unauthorized scraping is prohibited.

Additionally, LinkedIn has a robots.txt file that specifies certain restrictions for web crawlers. However, it does not disallow all crawling and scraping. The robots.txt includes comments stating:

"LinkedIn's appreciation for the existence of web crawlers goes hand-in-hand with our commitment to the privacy of our members. Therefore we request that you do not crawl or index individual profile pages, in order to maintain our members' privacy. Crawling company pages, on the other hand, is allowed."

So LinkedIn's robots.txt distinguishes between crawling individual profile pages, which it asks crawlers to avoid, and crawling company pages, which it allows.
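To show how a crawler can check such rules programmatically, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules in SAMPLE_ROBOTS_TXT, the example.com URLs, the "example-bot" user agent, and the helper names are hypothetical and chosen only to mirror the profile/company distinction described above; they are not a copy of LinkedIn's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; NOT a copy of LinkedIn's robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /in/
Allow: /company/
"""


def build_parser(robots_txt):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/company/acme",
                "https://example.com/in/jane-doe"):
        print(url, is_allowed(rules, "example-bot", url))
```

Keep in mind that robots.txt is a voluntary convention. Passing this check does not amount to permission under a site's terms of service, and failing it does not by itself settle the legal question.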

Past Legal Cases Involving LinkedIn and Scraping

LinkedIn has been involved in several high-profile legal cases related to scraping:

hiQ Labs v. LinkedIn (2019)

The most significant case is hiQ Labs v. LinkedIn. HiQ was a data analytics startup that scraped publicly available LinkedIn profiles to provide business insights to its clients. In 2017, LinkedIn sent hiQ a cease and desist letter demanding that it stop scraping LinkedIn data. HiQ filed suit, seeking a declaration that its scraping activity was lawful.

In 2019, the U.S. Court of Appeals for the Ninth Circuit ruled in favor of hiQ. The court held that hiQ's scraping of publicly available data likely did not violate the Computer Fraud and Abuse Act (CFAA), a federal anti-hacking law. The court found that since the scraped data was not behind a login, hiQ did not circumvent any access restrictions. Therefore, hiQ's access was likely not "without authorization" under the CFAA.

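However, that ruling concerned only a preliminary injunction, and the litigation continued. The U.S. Supreme Court vacated the decision in 2021 and sent it back for reconsideration in light of Van Buren v. United States; the Ninth Circuit reaffirmed its holding in April 2022. Back in the district court, the judge ruled later that year that hiQ had breached LinkedIn's user agreement, and the parties settled in December 2022. So while the Ninth Circuit's ruling was important, the hiQ case did not definitively settle the legality of scraping LinkedIn.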

LinkedIn v. Markovitz (2022)

In another recent case, LinkedIn sued data analytics firm Mantheos and its founder Luke Markovitz for scraping LinkedIn data in violation of the site's terms of service. LinkedIn alleged that Mantheos created thousands of fake LinkedIn accounts to collect data from hundreds of thousands of real accounts.

In 2022, the court granted LinkedIn a permanent injunction prohibiting Mantheos from scraping LinkedIn going forward. The court found that Mantheos likely violated the CFAA by circumventing LinkedIn's technical restrictions through fake accounts and automated scripts.

Factors Impacting the Legality of LinkedIn Scraping

As the hiQ and Markovitz cases illustrate, there are several key factors that impact whether scraping LinkedIn is legal in a particular case:

Authorization and Credentials

A key issue is whether the scraper has authorization to access and collect the data. Logging into LinkedIn with valid user credentials can, in some cases, constitute authorization to access certain data available to that account. However, LinkedIn's TOS prohibits using automation to scrape even when logged in.

Using fake accounts to log in and scrape, as in the Markovitz case, is more likely to be deemed "without authorization" or "exceeding authorized access" under laws like the CFAA.

Terms of Service

Another important factor is whether the scraping violates the TOS of the target site. Most websites have terms prohibiting scraping. As we've seen, LinkedIn's TOS expressly disallows using automation to scrape member data.

However, courts have reached different conclusions on whether violating a site's TOS is by itself a basis for liability under laws like the CFAA. Some courts have held that mere TOS violations do not constitute "without authorization" access. Other courts have found that TOS-violating scrapers can still face liability.

Technical Controls

Courts also look at whether the scraper circumvented any technical access controls or restrictions. Disregarding the robots.txt file, evading IP address blocking, or bypassing login requirements can tip the scales toward an unauthorized access finding.

As noted above, LinkedIn's robots.txt asks crawlers not to collect individual profile pages but allows crawling of company pages. A court might find a scraper that ignored the robots.txt to be "without authorization."

Publicly Available vs. Login-Protected Data

Another consideration is whether the scraped data is available to the general public or protected behind a login. Scraping publicly available data is less likely to violate anti-hacking laws like the CFAA.

In the hiQ case, the court emphasized that the scraped LinkedIn data was not behind a login barrier. However, some courts have still found scraping public data to be unauthorized if it violates the TOS.

Data Use and Privacy

How the scraped data will be used can also impact the legal analysis. Scraping data for non-commercial research purposes may be treated differently than scraping for commercial gain.

Additionally, if the scraping captures personal data or raises privacy concerns, it may implicate data privacy laws like the California Consumer Privacy Act (CCPA), the EU General Data Protection Regulation (GDPR), or the Illinois Biometric Information Privacy Act (BIPA). Scrapers may need to comply with requirements around notice, consent, opt-out, and data protection.

Jurisdiction

Finally, the specific jurisdiction and applicable laws matter. The CFAA is the main federal anti-hacking statute in the U.S., but states have their own laws as well.

The hiQ case was in the Ninth Circuit, which has adopted a somewhat narrow view of CFAA liability for web scraping. But courts in other circuits have interpreted the statute more broadly to cover TOS violations.

The location of the scraper and the scraped site can also determine which laws apply. A U.S.-based court may be less likely to enforce foreign scraping laws. And if the scraper and scraped site are in different countries, there may be additional legal complexity.

Risks and Potential Consequences of LinkedIn Scraping

Scraping LinkedIn does carry some legal risks, as the cases above illustrate. Potential consequences can include:

  • Cease and desist letters from LinkedIn
  • Account suspensions or bans
  • Civil lawsuits for breach of contract, trespass, misappropriation, etc.
  • Criminal penalties for violation of anti-hacking laws
  • Injunctions prohibiting future scraping activity
  • Monetary damages and attorneys' fees

The specific risks depend on factors like the scale of scraping, the methods used, and the scraped data's ultimate use. But in general, the more unauthorized and invasive the scraping, the higher the legal risk.

Legitimate and Responsible LinkedIn Scraping

This doesn't mean all LinkedIn scraping is off-limits. There are some practices that can help keep scraping activity on the right side of the law:

  • Stay within the bounds of the site's TOS and robots.txt
  • Don't circumvent access controls or use fake accounts
  • Limit the scale and speed of scraping
  • Only collect publicly available data
  • Use the data for legitimate purposes like research and analysis
  • Comply with relevant privacy laws and best practices
  • Be transparent about your scraping activity

By scraping ethically and responsibly, you can reduce your legal exposure while still gaining valuable insights from LinkedIn data.
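One of the practices above, limiting the scale and speed of scraping, can be sketched as a simple rate limiter that enforces a minimum gap between requests. The MinIntervalLimiter class and its parameters are hypothetical names for this example, and the interval you choose is a judgment call rather than a legal threshold.

```python
import time


class MinIntervalLimiter:
    """Enforces a minimum gap (in seconds) between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the limiter can be tested
        # without actually waiting.
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # scheduled time of the previous request

    def wait(self):
        """Sleep if the previous request was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval_s - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A crawler would call wait() immediately before each request, for example with MinIntervalLimiter(5.0) to allow at most one request every five seconds. Throttling reduces load on the target site, but it does not make otherwise prohibited scraping permissible.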

Alternatives to Scraping LinkedIn

If you want to access LinkedIn data without the legal risks of scraping, there are some alternatives to consider:

  • Use the official LinkedIn API, which provides authorized access to certain data
  • Partner with LinkedIn directly to get access to data for specific use cases
  • Collect your own first-party data with user consent
  • Purchase LinkedIn data from authorized data brokers or aggregators
  • Use other public data sources and research methods

These options can provide many of the same benefits as scraping without running afoul of LinkedIn's TOS.

The Bottom Line

So is it legal to scrape LinkedIn? As we've seen, it depends on a variety of factors and the specific circumstances involved. The legal landscape around web scraping remains complex and fact-specific.

In general, scraping that circumvents LinkedIn's technical controls, violates the TOS, uses fake accounts, or captures personal data is more likely to be deemed unlawful. But scraping that stays within the bounds of the robots.txt, only accesses public data, and has a legitimate purpose may be defensible.

Ultimately, anyone considering scraping LinkedIn should carefully evaluate the risks and consult with legal counsel. The law in this area continues to evolve, and there are still many unsettled questions.

Web scraping can be a powerful tool for data-driven insights, but it's important to approach it thoughtfully and responsibly, especially when it comes to sensitive platforms like LinkedIn. By understanding the key legal issues and following best practices, you can reap the benefits of LinkedIn data while limiting your legal exposure.