Web Scraping - Amazon Reviews

Vijesh s
3 min read · Nov 14, 2019

Importance of Data

Data must be readily available when it is needed. It is essential for running a business efficiently and is also critical for achieving compliance.

Web Scraping

How useful it would be to have the data from the websites we view through a web browser. Unfortunately, the browser itself gives us no option to save that data anywhere.

Copying it manually would be a tedious job. Web scraping is a technique that automates the process of saving the required data from any website.

Setting Headers and Cookies

Python Requests does not force you to set headers while scraping, but a few smart websites will not let you read anything important unless certain headers and cookies are set in the request.
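As a minimal sketch of how this might look with Requests: the User-Agent string and the use of a Session object below are illustrative assumptions, not the exact values from the original code.

```python
import requests

# Illustrative headers; in practice, copy the User-Agent and Accept-Language
# values your own browser sends (visible in the browser's Network tab).
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/78.0.3904.108 Safari/537.36"),
    "Accept-Language": "en-US,en;q=0.9",
}

# A Session keeps any cookies the site sets and reuses them on later requests.
session = requests.Session()
session.headers.update(HEADERS)
```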

Inspect the Website

To get data about the elements we want to access, we first need to inspect the web page using the developer tools.

We can right-click on the element and choose Inspect to see its tag and attributes.

Scraping the Product ASIN

ASIN stands for Amazon Standard Identification Number. This identifier lets us access each product individually.

This function returns the HTML page of the website we want to scrape.

If the response status code is 200, it means we can access the page successfully.
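A sketch of what such a helper might look like, reusing the session created above; the function name and the search URL pattern (https://www.amazon.com/s?k=...&page=...) are assumptions for illustration.

```python
from bs4 import BeautifulSoup

def get_search_page(query, page):
    """Fetch one Amazon search-results page and return it parsed with BeautifulSoup."""
    url = f"https://www.amazon.com/s?k={query}&page={page}"
    response = session.get(url)
    # A 200 status code means the page was fetched successfully.
    if response.status_code == 200:
        return BeautifulSoup(response.text, "html.parser")
    return None
```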

We are planning to scrape data for JBL speakers; the Amazon search shows a variety of JBL speakers spread across 16 pages of results.

As the base URL is already defined in the function, we just pass the search query as “jbl+speakers&page=”, which lets us step through all the pages.

By inspecting a product, we have identified the tag name and attributes for the product name and ASIN; the same structure applies to all the products on the page.

BeautifulSoup, along with the find function, helps us extract the data and save it in a list.

For the product name, we access the text of the matched element, and for the ASIN we read the “data-asin” attribute, which holds the ASIN.
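A sketch of that extraction step: the 16-page loop and the “data-asin” attribute come from the text above, while the class used to locate the title element is an assumption about Amazon's markup.

```python
product_names, asins = [], []

for page in range(1, 17):                      # 16 pages of JBL speaker results
    soup = get_search_page("jbl+speakers", page)
    if soup is None:
        continue
    # Each search result is a div carrying a "data-asin" attribute.
    for item in soup.find_all("div", attrs={"data-asin": True}):
        asin = item["data-asin"]
        title = item.find("span", attrs={"class": "a-text-normal"})  # class is an assumption
        if asin and title:
            asins.append(asin)                 # ASIN comes from the data-asin attribute
            product_names.append(title.text)   # product name comes from the element text
```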

Scraping the Product Customer Review Link

With the ASINs we have scraped, we can access each product individually. We first declare another function, because this time the URL path is different.

It does the same job of returning the HTML of the requested page.

Next, we have to scrape the “All customer reviews” link at the end of each product page. I found that a few ASINs return nothing, probably because Amazon has removed those products from its site.

So we add a conditional check (if), and then, with the help of BeautifulSoup and the find function, we save the link in a list.

Here the link is stored in the “href” attribute, and we append it to a list.
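A possible sketch of this step, assuming a product URL of the form https://www.amazon.com/dp/&lt;ASIN&gt; and a data-hook attribute for locating the reviews link; both are assumptions about Amazon's page structure rather than the original code.

```python
review_links = []

def get_product_page(asin):
    """Fetch a single product page by its ASIN (URL pattern is an assumption)."""
    response = session.get(f"https://www.amazon.com/dp/{asin}")
    if response.status_code == 200:
        return BeautifulSoup(response.text, "html.parser")
    return None

for asin in asins:
    soup = get_product_page(asin)
    if soup is None:
        continue                              # product may have been removed
    link = soup.find("a", attrs={"data-hook": "see-all-reviews-link-foot"})
    if link is not None:                      # the conditional check mentioned above
        review_links.append(link["href"])     # the link lives in the href attribute
```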

Scraping Customer Reviews

Once all the links are available, we can easily access the review pages. As they follow a different URL path, we declare another function that does a similar job as before.

As each product can have multiple pages of reviews, we fetch the first 15 pages of reviews for each product.

We can access the next page by changing the page number in the web address, which lets us go through all the pages using a loop.

With the help of BeautifulSoup and the find function, we can save all the reviews in a list.
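A sketch of that loop: the pageNumber parameter, the way it is appended to the link, and the review-body data-hook are all assumptions about how Amazon paginates and marks up its review pages.

```python
reviews = []

for link in review_links:
    for page in range(1, 16):                 # first 15 pages of reviews per product
        # Step through the review pages by changing the page number in the URL.
        url = f"https://www.amazon.com{link}&pageNumber={page}"
        response = session.get(url)
        if response.status_code != 200:
            break
        soup = BeautifulSoup(response.text, "html.parser")
        # Collect the text of every review body on the page.
        for span in soup.find_all("span", attrs={"data-hook": "review-body"}):
            reviews.append(span.text.strip())
```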

Saving the reviews in a CSV file

Once the reviews are scraped, they need to be saved in a file for further analysis.

First, we import pandas and convert the list of reviews into a pandas DataFrame.

Then, using the “.to_csv” method, we save the DataFrame to a CSV file.
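A minimal sketch of this last step; the output filename is an arbitrary choice.

```python
import pandas as pd

# Convert the list of reviews into a DataFrame and write it to disk.
df = pd.DataFrame({"review": reviews})
df.to_csv("jbl_speaker_reviews.csv", index=False)
```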

The complete code can be found at

https://github.com/vijeshs/Web-Scraping-

This is a simple procedure for scraping Amazon reviews.
