Not long ago, one of my batch mates studying computer science had a brilliant idea for a university project. The project was supposed to be his great escape from bad grades and to make up for all the shortfalls in his semester results. What was it about? His idea was to deploy a mega web crawler to crawl and fetch results from multiple well-known, high-profile websites and then publish all the crawled data as a neat analytical note for everyone to benefit from.
While this was a truly noble thought and really did have the potential to sail him through the academic year, there was one small problem here.
It is not legal!
Now, who would want to get on the wrong side of the law for a petty project? Although the internet is flooded with tutorials and content supporting the idea of running a web scraping or web crawling project, it is quite worrisome that the legal aspects are blatantly ignored and education about the associated law is conveniently overlooked.
Web scraping and crawling are really useful and add a lot of value; however, they are much more valuable when they come free of legal trouble.
While this article is intended to cover the major nuances of the legal aspects of web scraping and web crawling, it is always advisable to seek professional legal counsel. Treat this article as an indicative primer on the existing legal aspects of web crawling and web scraping.
Before delving into the legal aspects associated with web crawling and web scraping, it is worth taking a look at these concepts individually.
So, what exactly does web scraping mean?
Web scraping is the process of going through the content hosted on a web page and parsing it for particularly relevant data. Such data sets can then be stored externally in various locations.
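To make the idea concrete, here is a minimal sketch of scraping in Python using only the standard library. The page content, the `symbol` and `price` class names, and the `scrape_quotes` helper are all made up for illustration; a real scraper would first download the HTML from a site it is permitted to access.

```python
from html.parser import HTMLParser

# Hypothetical page content (an assumption for this sketch, not a real site).
SAMPLE_HTML = """
<html><body>
  <h1>Quotes</h1>
  <table>
    <tr><td class="symbol">ABC</td><td class="price">12.50</td></tr>
    <tr><td class="symbol">XYZ</td><td class="price">7.25</td></tr>
  </table>
</body></html>
"""


class QuoteScraper(HTMLParser):
    """Picks out only the cells we care about and ignores the rest of the page."""

    def __init__(self):
        super().__init__()
        self.current_field = None   # "symbol" or "price" while inside such a cell
        self.current_symbol = None  # last symbol seen, waiting for its price
        self.quotes = {}

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            css_class = dict(attrs).get("class")
            if css_class in ("symbol", "price"):
                self.current_field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current_field is None:
            return
        if self.current_field == "symbol":
            self.current_symbol = text
        elif self.current_symbol is not None:
            self.quotes[self.current_symbol] = float(text)

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_field = None


def scrape_quotes(html):
    """Parse one page and return the relevant data as a dict."""
    parser = QuoteScraper()
    parser.feed(html)
    parser.close()
    return parser.quotes


quotes = scrape_quotes(SAMPLE_HTML)
print(quotes)  # {'ABC': 12.5, 'XYZ': 7.25}
```

Note that the scraper looks at a single page and keeps only the data it was told to find; the resulting dictionary could then be written to a file or database.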
Web scraping is often used synonymously with web crawling; however, they are chalk and cheese or at least cheddar and parmesan.
Unlike web scraping, web crawling is not limited to downloading relevant information hosted on a web page; it also involves fetching the URLs found on that page and crawling through them. The fetched data is typically stored as an index or database, which makes life simpler in case one wishes to look it up.
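A crawler can be sketched in the same spirit. The example below is only an illustration under stated assumptions: `FAKE_SITE` is a made-up in-memory website, and `fetch` is a hypothetical stand-in for a real HTTP request, so the code runs without touching the network. The part that makes it a crawler is the queue of discovered URLs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" (an assumption so the sketch runs offline).
FAKE_SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a> home page',
    "https://example.com/a": '<a href="/b">B</a> page about apples',
    "https://example.com/b": '<a href="/">Home</a> page about bananas',
}


class LinkAndTextParser(HTMLParser):
    """Collects the links on a page along with its visible text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def fetch(url):
    """Stand-in for an HTTP request; returns None for unknown URLs."""
    return FAKE_SITE.get(url)


def crawl(start_url, max_pages=10):
    """Breadth-first crawl that builds a URL -> page text index."""
    index = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()
        index[url] = " ".join(parser.text_parts)
        # Follow the links found on this page: this is what scraping lacks.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index


index = crawl("https://example.com/")
for url, text in index.items():
    print(url, "->", text)
```

The `seen` set keeps the crawler from looping forever when pages link back to each other, and `max_pages` is an arbitrary safety cap chosen for this sketch.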
As an example, to understand the difference better, web scraping may help you scrape Yahoo! Finance for stock quotes and then use them as the base for your analytics. A web crawler, on the other hand, has the potential not only to fetch data but also to carry out a full-fledged search engine task, similar to Google’s very own web crawler, Googlebot.
Hence, although these terms are used interchangeably, the processes serve entirely different purposes.
While web scraping is used widely and contributes massively to the industries’ collective business and competitive intelligence, it has received flak from web administrators all across the globe for a host of reasons. A few reasons are as follows –
Watch this space for more. So far, we've only established the ground rules; the game is yet to begin.