Web Scraping: What is it used for and how does it affect data protection?

  • 28/06/2021

What it is used for
Companies currently have a great demand for data of all kinds. Some want directories of email addresses for other companies in a given sector so they can try to contact them, while others want visitor data for their competitors' websites. Others want a clearer profile of the users who interact with certain pages.

Data is very valuable information and can make a difference when it comes to gaining a competitive advantage.

Scraping consists of extracting information from web pages, copying it, and building a database from it that can be useful to another company.
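As a rough illustration of the extract, copy and store steps just described, here is a minimal Python sketch using only the standard library. Everything in it is assumed for the example: the sample HTML, the `ContactParser` class, the `scrape_to_database` function and the `contacts` table are hypothetical names, and a real scraper would download pages rather than read an inline string.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li><a href="mailto:sales@example.com">Example Ltd</a></li>
    <li><a href="mailto:info@example.org">Example Org</a></li>
    <li><a href="/about">About us</a></li>
  </ul>
</body></html>
"""


class ContactParser(HTMLParser):
    """Collects (name, email) pairs from mailto: links."""

    def __init__(self):
        super().__init__()
        self.contacts = []
        self._email = None  # email of the link currently being read, if any
        self._text = []     # text fragments found inside that link

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith("mailto:"):
            self._email = href[len("mailto:"):]
            self._text = []

    def handle_data(self, data):
        if self._email is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._email is not None:
            self.contacts.append(("".join(self._text).strip(), self._email))
            self._email = None


def scrape_to_database(html, connection):
    """Extract contacts from the HTML and copy them into a database table."""
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    connection.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT, email TEXT)")
    connection.executemany("INSERT INTO contacts VALUES (?, ?)", parser.contacts)
    connection.commit()
    return len(parser.contacts)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # throwaway in-memory database
    print(scrape_to_database(SAMPLE_HTML, db), "contacts stored")
```

The sketch stores only what appears in `mailto:` links and ignores every other link on the page.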

It can also be used as a black-hat SEO technique: competing websites receive a lot of traffic from scraper bots extracting data, but because these visits are very short, they suggest to Google that the content is poor, leading to a loss of ranking on search results pages.

How does it affect data protection?
Web scraping is not illegal as long as it is carried out within the limits established by the GDPR (RGPD, in its Spanish abbreviation) and the LOPD, Spain's data protection law.

It is possible to extract data from a website as long as it comes from publicly accessible sources and there is a legitimate or general public interest. In no case can this technique be used to obtain information in a way that infringes intellectual property regulations or people's right to privacy, or to carry out practices related to identity theft.

If these limits are not respected, the affected parties can file a complaint about the mishandling of their data, which would force whoever carried out the scraping to demonstrate a legitimate interest that entitles them to obtain and use that data.

Web scraping is legal, but it is not unavoidable. If we own a website and want to prevent others from extracting information from it, the best approach is to treat all of its data as the site's intellectual property, which means relying on copyright protection. The other option is to protect the data behind a password, so that only those who have it can access it.
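The password option can be pictured with a minimal Python sketch, again using only the standard library. The function names, the example password and the status codes returned are assumptions made for illustration; an actual site would normally rely on its web framework's authentication features rather than hand-written checks like these.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a hash from the password so the plain text is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical stored credentials for the example.
SALT = os.urandom(16)
STORED_HASH = hash_password("correct horse", SALT)


def can_view_data(supplied_password):
    """Return True only if the supplied password matches the stored hash."""
    candidate = hash_password(supplied_password, SALT)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, STORED_HASH)


def serve_page(supplied_password):
    """Return (status, body); the data is only sent to authenticated visitors."""
    if supplied_password is None or not can_view_data(supplied_password):
        return 401, "Login required"
    return 200, "Protected directory data"


if __name__ == "__main__":
    print(serve_page(None))             # anonymous visitor or scraper bot
    print(serve_page("correct horse"))  # visitor who has the password
```

A bot that requests the page without the password receives only the login response, so there is nothing for it to extract.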

