The first commandment in the Bible of web scraping is to not harm the website. The second rule is just as important: never break commandment number one. Web scrapers and service providers are all for a democratic and judicious use of free-floating data, but not at the cost of site admins. Within the web scraping and data extraction community, there is huge respect for scraping ethics and etiquette. Like every community, this one has its share of good and bad actors, and Botscraper is firmly among the good.
Some may call them spiders, crawlers and all sorts of eerie and creepy names, but the truth is that well-behaved bots are very mindful and polite in their behavior.
What do you mean by polite bots, you ask? Well, it’s the difference between screaming at your neighbor through your window, asking her to send her kid over with a bowl of sugar because you’ve run out, and “politely” stepping out, knocking on her front door and asking whether you could borrow some sugar.
This is exactly the difference between rowdy and polite bots. A polite bot figuratively knocks on a website’s door and requests data without causing any ruckus or harm. The unruly, rowdy bot just barges into a website and rummages through whatever data it finds, a bit like what an alcoholic would do to a bar when he’s deprived of alcohol.
So, just as good manners, polite behavior and mindful actions make a person “polite”, there are a few basic traits that put a web scraping and data extracting bot on the list of polite web scrapers.
A polite web scraping bot will always respect robots.txt.
Now, what’s a “robots.txt”? It is a plain-text file at the root of a website in which the admin lists the parts of the site that crawlers should stay out of. A simple analogy would be: if you visit someone’s home and they do not wish you to look inside a certain room, they hang a “please do not enter” sign on its door. The sign is “robots.txt” and the room is the part of the website’s data that the admin doesn’t want web scraping crawlers to read. Nothing physically stops the bot, so there are two options: either the bot respects the admin’s choice, or it turns unruly, ignores the sign and forces its way into the data.
The latter is definitely not polite.
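For the curious, here is a minimal sketch of what “respecting robots.txt” can look like in practice, using Python’s standard urllib.robotparser module. The robots.txt content, the bot name “polite-bot” and the example.com URLs are all made up for illustration; this is one way to do it, not a description of how any particular scraper is built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without
# touching the network. A real file lives at https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer 'may I fetch this?'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the site's rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "polite-bot" and both URLs are assumptions for the example.
for url in ("https://example.com/blog/post.html",
            "https://example.com/private/report.html"):
    verdict = "fetch" if is_allowed(parser, "polite-bot", url) else "skip"
    print(f"{verdict}: {url}")
```

Against a live site you would instead point the parser at the real file with set_url() and read(), and then ask the same question before every request.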
Next, a polite data extracting bot will never degrade a site’s performance. Just imagine that a nosy neighbor comes to your home to borrow sugar while you are busy vacuuming the living room. Now imagine that this neighbor is stepping all over your freshly vacuumed carpet, jumping on it in his dirty slippers and literally not letting you do your job till he gets his cup of sugar. A bot that fires off requests faster than a server can comfortably handle does the same thing to a website and its real visitors.
By now, you have definitely got the drift. A polite web scraper will never degrade a site’s performance: it spaces out its requests so that the site stays just as fast for everyone else.
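One simple way to keep a scraper from trampling the carpet is to enforce a minimum pause between requests. The sketch below is only an illustration: the class name PoliteThrottle and the two-second interval are assumptions made for the example, and the right delay depends on the site.

```python
import time


class PoliteThrottle:
    """Enforce a minimum gap between consecutive requests to one site."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds to leave between requests
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to honor min_interval; return seconds slept."""
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited


if __name__ == "__main__":
    throttle = PoliteThrottle(min_interval=2.0)  # assumed value, tune per site
    for page in ("page-1", "page-2", "page-3"):
        slept = throttle.wait()
        # A real scraper would fetch the page here.
        print(f"waited {slept:.1f}s before requesting {page}")
```

Calling wait() before every request means the first one goes out immediately and each later one is held back until the gap has passed.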
A polite bot will never try to be a pain in the administrator’s neck. Unlike those uninvited relatives and guests who show up at your place without notice and then keep demanding hospitality, a polite bot won’t keep beating at the door till the administrator loses their cool and either lets the bot in or beats the crap out of it.
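In code, “not beating at the door” usually means waiting longer after each refused request and giving up after a few tries. The sketch below shows one common approach, capped exponential backoff; the function name backoff_delays and all the numbers are assumptions chosen for illustration.

```python
def backoff_delays(base, factor, max_attempts, cap):
    """Return the pause (in seconds) to take before each retry.

    Each delay is `factor` times the previous one, never exceeding `cap`.
    After `max_attempts` retries the bot should stop knocking altogether.
    """
    return [min(cap, base * factor ** attempt) for attempt in range(max_attempts)]


# Assumed values: start at 1 second, double each time, never wait more
# than 5 seconds, and give up after 4 retries.
for attempt, delay in enumerate(backoff_delays(1.0, 2.0, 4, 5.0), start=1):
    print(f"retry {attempt}: wait {delay:.0f}s first")
```

If the site is still refusing after the last retry, the polite move is to leave and come back much later rather than to keep trying.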
While the internet is overflowing with data, one must never forget that someone owns the home that data lives in, the website, and that owner has every right to love and take care of it. Polite bots slowly and gradually get almost all the data they set out for. Their rowdy counterparts, on the other hand, have to face the wrath of angry and intelligent owners who won’t just crush the bot out of existence, but may also bring legal trouble to both the owner of the bot and the person for whom the web scraper was out crawling.
Data is a beautiful world, as long as respect is mutual and politeness is the norm. We at Botscraper strongly believe in maintaining the highest ethical standards while offering nothing but excellence to our clients.