Web Crawling & Web Scraping: The Mystery Behind Legal Lines (Part 2)

  • 25/11/2017

In the earlier part, we established the basic concepts and the thinking behind web crawling and web scraping services and their legality. In case you missed it, you can read 'Web Crawling & Web Scraping: The Mystery Behind Legal Lines (Part 1)'.

Continuing from there:

While web scraping is widely used and contributes massively to the industry’s collective business and competitive intelligence, it has drawn flak from web administrators across the globe for a host of reasons. A few of them are as follows:

  1. While the objectives of web crawling and web scraping are often sugar-coated with phrases like business intelligence and competitive intelligence, financial gain is typically the underlying motive, which is why the entire process tends to be viewed with suspicion.
  2. Terms of Service and copyrights are often overlooked or conveniently ignored.
  3. The fine line between use and abuse of web crawling and web scraping has blurred to the point where abuse on the web is more visible than legitimate use. The most common form of abuse is high-frequency requests: the scraping or crawling bot visits and requests information from the website at an alarmingly high rate. This puts heavy pressure on the website’s server and chokes its bandwidth, slowing the site down enough to ruin the experience for its human visitors. Among other unwelcome behaviours, the web crawler may choose to stay anonymous, which makes its very presence on a webpage look like a threat, or it may perform prohibited actions such as circumventing security measures to steal data instead of requesting it.
  4. With the rapid growth of the web scraping and crawling industry, the practice is on the verge of becoming a genuine nuisance to the websites being scraped. This is especially true for data-rich websites such as social media networks and e-commerce sites, like Facebook and Amazon, respectively.
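To make point 3 concrete, here is a minimal Python sketch of what the opposite of abusive behaviour might look like: a crawler that spaces out its requests and identifies itself instead of staying anonymous. The names `PoliteThrottle`, `build_request` and `fetch_all`, the 2-second interval and the `ExampleBot` user-agent string are all hypothetical choices made for illustration; a sensible request rate depends on the website, and no particular rate is implied to be legally sufficient.

```python
import time
import urllib.request


class PoliteThrottle:
    """Enforce a minimum gap between consecutive requests to one website."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s  # hypothetical value, e.g. 2.0
        self._clock = clock                   # injectable so it can be tested
        self._sleep = sleep
        self._last_request_at = None

    def wait(self):
        """Sleep until min_interval_s has passed since the previous call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        now = self._clock()
        delay = 0.0
        if self._last_request_at is not None:
            elapsed = now - self._last_request_at
            delay = max(0.0, self.min_interval_s - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last_request_at = now + delay
        return delay


def build_request(url, user_agent):
    """Build a request that announces who the crawler is."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_all(urls, throttle, user_agent, opener=urllib.request.urlopen):
    """Fetch each URL in turn, pausing between requests."""
    pages = {}
    for url in urls:
        throttle.wait()
        with opener(build_request(url, user_agent), timeout=10) as response:
            pages[url] = response.read()
    return pages
```

A call such as `fetch_all(urls, PoliteThrottle(2.0), "ExampleBot/1.0 (+mailto:ops@example.com)")` would then send at most one request every two seconds, each carrying a contact address the website administrator can use. Throttling and self-identification address only the server-load and anonymity complaints above; they do not replace the administrator's approval.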

However, web crawling is the very process that made popular search engines famous. Search engines like Google and Bing use their proprietary web crawling bots to fetch and index web pages hosted on the internet. While at their core these search engines are web crawlers too, their crawling does more good than harm to the web pages they crawl, in more ways than one, and hence it is seen as favourable.

So, are web scraping and web crawling legal or illegal in India or, for that matter, anywhere in the world? This is a question that has been conveniently sidestepped for years now. Let’s break it down for you.

Think of it this way: if you were to scrape or crawl your own website, would it be legal? Of course, without a doubt. Now one may argue that picking up something within one’s own house is legal anyway, so what about picking up something from someone else’s house?

Common sense prevails here. What if you wanted something from your friend’s house? How would you go about it? There are two ways: you just pick it up and go (which would amount to theft and would certainly be illegal), or you simply ask your friend for it and then, with their permission, take it home. In the latter case, it is completely legal and avoids unfavourable repercussions later.

As in the analogy above, the legal problem arises when you try to crawl or scrape a website that does not belong to you without the website administrator’s approval. Here, a lack of approval implicitly means an attempt to circumvent or blatantly bypass the guidelines typically set out in the website’s “Terms of Use”. Once you do this, you have exposed yourself to significant legal risk.

Keep an eye on this space for more.
