Diving Into the Initiative of Data Scraping

  • 24/06/2021

The terms web crawling and web spider refer to the fact that, to obtain the web pages that interest us, we have to follow their links: we navigate the tree (or graph) structure of the website, recursively exploring its links down to the depth level we choose. Screen scraping refers to extracting information from a web page part of whose content is created at the moment it is viewed, so we have to resort to headless-browser tools that include JavaScript and CSS engines.
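The recursive, depth-limited exploration described above can be sketched roughly as follows. This is a minimal illustration, not a production crawler: the `crawl` function, the `LinkExtractor` class, and the in-memory `SITE` dictionary are hypothetical names introduced here, and `fetch` stands in for whatever HTTP client would download a page in practice.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth):
    """Follow links recursively from start_url, down to max_depth.

    fetch is any callable mapping a URL to its HTML text (or None on failure).
    Returns the visited URLs in visiting order.
    """
    visited = []
    seen = set()

    def visit(url, depth):
        if depth > max_depth or url in seen:
            return
        seen.add(url)
        html = fetch(url)
        if html is None:
            return
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment
            absolute, _ = urldefrag(urljoin(url, href))
            visit(absolute, depth + 1)

    visit(start_url, 0)
    return visited


# Hypothetical four-page site held in memory, so no network access is needed
SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/c">C</a>',
    "https://example.com/b": '<a href="/">home</a>',
    "https://example.com/c": "no links",
}

pages = crawl("https://example.com/", SITE.get, max_depth=1)
```

With `max_depth=1` the page `/c` is never fetched, because it sits two links away from the start page. Since the exploration is depth-first, a page first reached through a long path is not expanded again if a shorter path to it appears later; a breadth-first queue would avoid that.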

One initial source of links for a data scraping job could be the results page of a search engine, such as Google's SERP (Search Engine Results Page), obtained after searching by keyword, listing the indexed links of a website (site:), finding links that contain a certain term, or running complex queries with any of the commands and operators that Google allows (footprints). We could also use the internal search engine that ecommerce sites, or any well-made website, usually have.
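As a rough sketch of how such a query could be assembled, the snippet below builds a search URL that combines a keyword with the `site:` and `inurl:` operators. The helper name `search_url` and its parameters are assumptions made for illustration; it only constructs the URL and does not fetch or parse any results.

```python
from urllib.parse import urlencode


def search_url(terms, site=None, inurl=None):
    """Build a Google search URL from keywords plus optional operators."""
    parts = [terms] if terms else []
    if site:
        # Restrict results to pages indexed under one website
        parts.append(f"site:{site}")
    if inurl:
        # Restrict results to URLs containing a given term
        parts.append(f"inurl:{inurl}")
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})


url = search_url("web scraping", site="example.com")
```

The resulting URL can then serve as the starting point whose result links are collected for later crawling.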

The uses of these techniques are very varied: performing web optimizations of all kinds (such as SEO and WPO), extracting and analyzing data, feeding a database, migrating a website, collecting and presenting data scattered across several websites, generating alerts, and some other less lawful or discouraged uses.

Practical applications include monitoring competitors' prices, locating items or stock on ecommerce sites, collecting product data sheets from ecommerce sites, detecting changes in websites, tracking launches and news, etc.
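To make one of these applications concrete, detecting changes in a website can be approximated by storing a fingerprint of each page and comparing it on the next visit. The sketch below is one simple way to do it, assuming the page HTML has already been downloaded; the function names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(html):
    """Return a SHA-256 digest of the page with whitespace runs collapsed."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, html):
    """True when the page content differs from the stored fingerprint."""
    return fingerprint(html) != previous_fingerprint


# Fingerprint saved on a first visit (sample HTML, not real data)
stored = fingerprint("<p>Price: 10</p>")

same = has_changed(stored, "<p>Price:   10</p>\n")   # only whitespace differs
different = has_changed(stored, "<p>Price: 12</p>")  # the price was edited
```

Hashing the whole page flags any edit, including irrelevant ones such as rotating banners, so in practice the comparison is often limited to the extracted fields of interest, for example the price.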

