
Scraping Amazon Best Sellers Rankings Data using BotScraper

  • 31/01/2023


Amazon is a gigantic marketplace with massive amounts of products in every possible category; it rarely lacks anything you might need. That very scale means obtaining data from Amazon is no cakewalk. With numerous product categories and intense competition, where every product has many substitutes, pinpointing the best sellers is a real task. And if tried manually, it is no less than a nightmare.

What is the solution?

Amazon scraping services. They provide solutions such as scraping Amazon reviews and Amazon data extractor tools, offered by the best web scraping services in the USA. Today, we shall learn how to scrape Amazon best sellers rankings data using BotScraper.

Since there are so many products out there, let's walk through the process of scraping Amazon with an example: the best sellers among books on authorship.

Prerequisites

BeautifulSoup and Pandas are the two libraries you will use for the Amazon data extraction process. With Python 3 and pip already installed, you can install both libraries using the following command.

$ pip install beautifulsoup4 pandas

After creating a new Python script, import them along with Request and urlopen from the standard library module urllib:

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import pandas as pd

To answer the question of how to scrape best sellers data from Amazon, you will need to do the following.

  1. Open the page using urlopen.
  2. Build a Request object so you can contact the server with the right headers.
  3. Parse the HTML using BeautifulSoup.
  4. Export the extracted data to a CSV file using pandas.

Page Request

This step of the Amazon data extractor requests the page: wrap the URL in a Request object and pass it to the urlopen function. Here’s how it can be done.

# Page for best sellers in writing (Authorship subcategory)
url = 'https://www.amazon.com/gp/bestsellers/books/11892'
request = Request(url, headers={'User-agent': 'Mozilla/5.0'})
html = urlopen(request)

Please take note of the following.

Amazon may block your request. This can be avoided by setting the User-Agent header as shown above; otherwise, you will see urllib.error.HTTPError: HTTP Error 503: Service Unavailable.
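If you would rather fail gracefully than crash on a block, you can wrap the request in a small retry loop. This is only a sketch of one way to handle the 503; the fetch() helper and its retries and delay parameters are illustrative, not part of the original tutorial.

import time
from urllib.error import HTTPError

def fetch(url, retries=3, delay=5):
    # Retry a few times in case Amazon answers with HTTP 503
    for attempt in range(retries):
        try:
            request = Request(url, headers={'User-agent': 'Mozilla/5.0'})
            return urlopen(request)
        except HTTPError as err:
            if err.code == 503 and attempt < retries - 1:
                time.sleep(delay)  # wait before retrying
            else:
                raise

html = fetch(url)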

HTML Parsing

urlopen cannot decode the page by itself; the html variable simply holds the raw response containing the page's HTML. To work with it, you need to parse it using BeautifulSoup. Once parsed, you will be able to extract the information you require based on the structure of the HTML.

soup = BeautifulSoup(html, 'html.parser')

From here, the next step is to open the page in your browser and examine the structure of its HTML. CTRL+SHIFT+C on Windows and CMD+SHIFT+C on macOS open the element inspector in your browser. Hover over the information you are looking for and click it to reveal the corresponding HTML tag in the Elements panel.
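If you prefer to explore the structure from Python rather than the browser, you can print a slice of the parsed tree; prettify() is a standard BeautifulSoup method that returns the markup with indentation (slicing it here just keeps the output short).

# Print an indented snippet of the parsed HTML to inspect its structure
print(soup.prettify()[:1000])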

Scrape Amazon

This is the step where the Amazon data extractor actually scrapes the best sellers webpage. Inspecting the HTML in the browser, you will see that each product sits inside a div tag with the id gridItemRoot. The find_all() method collects all the div tags carrying that id.

At this point, we have nearly everything we need to access data about every best seller in the books section.

books = soup.find_all('div', id="gridItemRoot")

Here’s an insight: Amazon could have structured its HTML better than what we see here. The id attribute is supposed to be unique on a page, yet gridItemRoot is repeated for every product; a class attribute would have been the better option.
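Because the id is repeated, you can collect the same elements equally well with a CSS selector through select(); this is just an alternative to the find_all() call above, not a requirement.

# Equivalent to find_all('div', id="gridItemRoot"), using a CSS selector
books = soup.select('div#gridItemRoot')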

Continuing with one product at a time, you can pull the information you need out of the HTML with a simple loop.

for book in books:
    rank = book.find('span', class_="zg-bdg-text").get_text().replace('#', '')
    print(rank)
    title = book.find(
        'div',
        class_="_p13n-zg-list-grid-desktop_truncationStyles_p13n-sc-css-line-clamp-1__1Fn1y"
    ).get_text(strip=True)
    print(f"Title: {title}")

This snippet retrieves the rank from the HTML span tag with the class zg-bdg-text. A slight adjustment is needed on the rank result: replace() strips the leading # so only the number remains.

Next, the title comes from the div tag with the long auto-generated class, obtained in the same way as the rank. The same loop can also extract the author and the rating of each product.

for book in books:
    ...
    author = book.find('div', class_="a-row a-size-small").get_text(strip=True)
    print(f"Author: {author}")
    # Not every product has a rating, so guard against missing tags
    row = book.find('div', class_="a-icon-row")
    r = row.find('a', class_="a-link-normal") if row else None
    rating = r.get_text(strip=True).replace(' out of 5 stars', '') if r else None

The extracted values are all strings; if you are looking solely for numerical information, a little tweaking of the results will do the job.
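For example, assuming the rank and rating variables extracted above, you could convert them to numbers like this (the guard on rating is there because it may be None):

rank_number = int(rank)                            # e.g. '3' -> 3
rating_number = float(rating) if rating else None  # e.g. '4.7' -> 4.7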

Convert Into CSV

To transfer the information to a CSV file, create empty lists just before the loop and append the extracted values to them inside the loop, as sketched below. Only once those lists are filled can a Pandas DataFrame be built and exported to a CSV file.
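Here is a minimal sketch of that bookkeeping; the list names match the DataFrame columns used below, and the ... stands for the extraction code shown earlier.

# Empty lists declared before the loop
ranks, titles, authors, ratings = [], [], [], []

for book in books:
    ...
    # Append each extracted value inside the loop
    ranks.append(rank)
    titles.append(title)
    authors.append(author)
    ratings.append(rating)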

pd.DataFrame({
    'Rank': ranks,
    'Title': titles,
    'Author': authors,
    'Rating': ratings
}).to_csv('best_seller.csv', index=False)
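To confirm the export worked, you can read the file straight back with pandas; head() simply previews the first few rows.

# Quick sanity check of the exported file
print(pd.read_csv('best_seller.csv').head())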

An important note:

This instance of scraping Amazon best sellers rankings data has a minor constraint in the code. BeautifulSoup is only an HTML parser: it sees the static HTML returned by the server, so any content the page renders with JavaScript will not appear in the parsed document.
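If you do hit JavaScript-rendered content, one common workaround, which goes beyond this tutorial, is to render the page in a headless browser first and hand the resulting HTML to BeautifulSoup. A minimal sketch with Selenium, assuming Chrome and the selenium package are installed:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
options.add_argument('--headless')   # run Chrome without opening a window
driver = webdriver.Chrome(options=options)

driver.get('https://www.amazon.com/gp/bestsellers/books/11892')
soup = BeautifulSoup(driver.page_source, 'html.parser')  # parse the rendered HTML
driver.quit()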

Conclusion

Nevertheless, apart from product data, there is a lot of information that can be obtained through an established Amazon data web scraping service in the USA. This was a basic, introductory tutorial to help you begin your journey to scrape Amazon.

If your business is actively scraping information from Amazon, it is highly recommended to reach out to a web scraping service in the USA like BotScraper. We will assist you not only with Amazon scraping services but also with many other web scraping services.

So contact us today if you have any data scraping needs, including scraping Amazon best sellers rankings data. We look forward to hearing from you!

