People typically proceed with their web crawling and web scraping ambitions without giving the legal side the thought it deserves. When they are asked about it midway through a project, it is interesting to hear what they have to say and then try to understand how good their argument really is.
The following are the most typical arguments people offer to cover their legal side:
1. “Public domain data is meant for free use”
Now, while there is some element of truth in the argument, it is not going to save you in the courtroom, especially when the attorney rips you apart on grounds of breaching a “copyright”. Yes, copyright. We know data can’t be copyrighted, but a “creative arrangement of data” certainly can be, and using such “creatively arranged data” without permission can amount to infringement. And yes, this can be used effectively against you in the courtroom.
The Digital Millennium Copyright Act protects copyrighted pieces of digital work, and the principle is commonly stated as: “Facts cannot be copyrighted. However, the creative selection, coordination and arrangement of information and materials forming a database or compilation may be protected by copyright. Note, however, that the copyright protection only extends to the creative aspect, not to the facts contained in the database or compilation.”
So, what does this mean? Simple. You cannot do whatever you want with data just because it is available in the public domain.
2. “This is covered as fair use”
This is quite an ambiguous zone.
- In Kelly v. Arriba, the court ruled in favour of the image search engine ditto.com, which had displayed the works of a professional photographer as thumbnails.
- Interestingly, in Associated Press v. Meltwater U.S. Holdings Inc, the court’s verdict went against Meltwater for publishing excerpts of scraped articles.
In the end, this may really boil down to the strength of your attorney.
3. “This is exactly what I can do anyway through a search engine”
While it is true that you could fetch this data manually anyway, honestly, it does not matter. Websites typically have a clause in their “Terms of Use” that prohibits the use of automated services, including data extraction, scraping, crawling and harvesting, along with similar associated services.
4. “At worst, my IP or MAC can be blocked”
Well, as established in earlier posts, the worst is worse than you might imagine. While the worst-case scenario depends heavily on the website administrator’s mood, it can go far beyond simply blocking you.
Facebook’s attorney once threatened a certain Mr. Warden with a lawsuit to dissuade him from publishing data that he had collected by scraping millions of Facebook profiles.
In a similar case, LinkedIn blocked a certain Mr. Michael Keating’s account because he had built a tool that was believed to scrape LinkedIn (it did not). Nevertheless, he has never been able to restore his account.
Another episode involving LinkedIn ended pretty well for the company: Robocog Inc was instructed to pay USD 40,000 to LinkedIn for allegedly scraping the website without authorisation.
There is more to follow. Keep watching this space for more arguments and the real facts of the matter.