Web Scraping for Price Comparison in Python - ProxyScrape

Web scraping is the art of extracting data from the internet. When it comes to its applications, it has a vast amount of applications. One of them is price comparison from different websites. Online shopping has become the boom in the industry now, and comparing the pricing of certain products has become a necessity. We all visit multiple websites when we need to purchase a particular product but have you ever thought of making a price comparison tool that does the same job for you and places the best deal in front of you?  

In this article, we will be making an amazing price comparison tool in Python that will let you track the price of the products across different sources and inform you about the performance of different competitors in the market. Furthermore, it will also let the business be informed either the price of a specific product goes up or down than the predicted price.

The data source we will be using for this article will be a JSON file, and we will compare the product prices we are getting from Amazon, eBay, and Walmart. Our sample data looks like below,

    "last_visited": "2018-01-30T13:38:01",
    "name": "PUMA Men's Evospeed 17.4 TT Soccer Shoe",
    "amazon_price": 36.94,
    "ebay_price": 37,
    "walmart_price": 37,
    "amazon_url": "https://www.amazon.com/PUMA-Evospeed-Soccer-Ultra-Yellow-Peacoat-Orange/dp/B01J5LEMZI/",
    "ebay_url": "https://www.ebay.com/itm/PUMA-Mens-Evospeed-17-4-Tt-Soccer-Shoe/302471489090",
    "walmart_url": "https://www.walmart.com/ip/PUMA-Men-s-Evospeed-17-4-Tt-Soccer-Shoe/587074448",
    "description": "The new evospeed 17.4 is a performance football boot for players of all levels. The soft and lightweight synthetic leather on the upper keeps the boot lightweight, comfortable and ensures durability. The lightweight outsole offers the perfect balance between traction, stability and acceleration PUMA is the global athletic brand that successfully fuses influences from sport, lifestyle and fashion. PUMA's unique industry perspective delivers the unexpected in sport-lifestyle footwear, apparel and accessories, through technical innovation and revolutionary design.",
    "brand": "PUMA",
    "image": "https://images-na.ssl-images-amazon.com/images/I/61v1mylcAqL._UL1500_.jpg"
    "last_visited": "2018-01-30T13:38:07",
    "name": "L'Oreal Paris Skin Care Revitalift Cicacream Face Moisturizer",
    "amazon_price": 13.97,
    "ebay_price": 13.99,
    "walmart_price": 13.97,
    "amazon_url": "https://www.amazon.com/LOreal-Paris-Revitalift-Cicacream-Moisturizer/dp/B074MBDRHW",
    "ebay_url": "https://www.ebay.com/itm/LOREAL-Paris-NEW-Revitalift-Cicacream-Anti-Wrinkle-Skin-Barrier-Repair-ORIGINAL/112715734801",
    "walmart_url": "https://www.walmart.com/ip/L-Or-al-Paris-Revitalift-Cicacream-Anti-Wrinkle-Skin-Barrier-Repair/519350834",
    "description": "Skin's moisture barrier weakens with age, resulting in greater moisture loss, more prominent wrinkles and loss of firmness. Lightweight, protective cream is formulated with Pro-Retinol, a powerful wrinkle-fighting ingredient and Centella Asiatica, an herb used in traditional Chinese medicine. Strengthens and repairs skin barrier to help resist visible lines, loss of firmness and other signs of aging that a weakened skin barrier can accentuate. See visible results immediately: skin feels healthier, softer, smoother and more supple. Skin feels noticeably more hydrated. Skin barrier is stronger, helping to resist signs of aging. In two weeks: fine lines appear visibly reduced. Firmness and elasticity look noticeably improved. In four weeks: wrinkles appear less visible. Clarity and tone improves, skin exudes luminosity. Skin continues to look and feel soft, smooth, healthy.",
    "brand": "L'Oreal Paris",
    "image": "https://images-na.ssl-images-amazon.com/images/I/71Ff2vn4vjL._SL1500_.jpg"
    "last_visited": "2018-01-30T13:38:12",
    "name": "Adidas Dynamic Pulse By Adidas For Men",
    "amazon_price": 6.96,
    "ebay_price": 18.99,
    "walmart_price": 7,
    "amazon_url": "https://www.amazon.com/Adidas-Dynamic-Toilette-3-4-Ounce-Bottle/dp/B000VON5F2/",
    "ebay_url": "https://www.ebay.com/itm/Adidas-DYNAMIC-PULSE-Cologne-for-Men-3-4-oz-edt-3-3-Spray-New-in-BOX/252837623533",
    "walmart_url": "https://www.walmart.com/ip/Adidas-Dynamic-Pulse-for-Men-3-4-oz-EDT/28664356",
    "description": "Launched by the design house of Adidas in 1997, ADIDAS DYNAMIC PULSE is a men's fragrance that possesses a blend of A fresh scent of citrus, cedar and mint with low tones of sweet fruits, fragrant woods and tonka bean. It is recommended for daytime wear.When applying any fragrance please consider that there are several factors which can affect the natural smell of your skin and, in turn, the way a scent smells on you. For instance, your mood, stress level, age, body chemistry, diet, and current medications may all alter the scents you wear. Similarly, factor such as dry or oily skin can even affect the amount of time a fragrance will last after being applied",
    "brand": "adidas",
    "image": "https://images-na.ssl-images-amazon.com/images/I/41%2BAnOP5nbL.jpg"
    "last_visited": "2018-01-30T13:38:19",
    "name": "Canon EOS Rebel T6 Digital SLR Camera",
    "amazon_price": 449,
    "ebay_price": 449,
    "walmart_price": 449,
    "amazon_url": "https://www.amazon.com/Canon-Digital-Camera-18-55mm-3-5-5-6/dp/B01CO2JPYS",
    "ebay_url": "https://www.ebay.com/itm/Canon-EOS-Rebel-T6-DSLR-Camera-with-18-55mm-Lens/232596041502",
    "walmart_url": "https://www.walmart.com/ip/Canon-EOS-Rebel-T6-DSLR-Camera-with-18-55mm-Lens-Black/50820749",
    "description": "",
    "brand": "Canon",
    "image": "https://images-na.ssl-images-amazon.com/images/I/81YszfZS8%2BL._SL1500_.jpg"
    "last_visited": "2018-01-30T13:38:25",
    "name": "Woodland Fox Critter 36' Mylar Balloon",
    "amazon_price": 5.49,
    "ebay_price": 6.49,
    "walmart_price": 7.6,
    "amazon_url": "https://www.amazon.com/Woodland-Fox-Critter-Mylar-Balloon/dp/B00S9TKVYO",
    "ebay_url": "https://www.ebay.com/itm/Woodland-Critters-Fox-36-inch-Foil-Balloon/132058119680",
    "walmart_url": "https://www.walmart.com/ip/Woodland-Fox-Foil-Balloon/43350002",
    "description": "Celebrate any occasion with an adorable woodland fox critter balloon! 36\" Woodland Critters fox shape foil balloon.",
    "brand": "Betallic",
    "image": "https://images-na.ssl-images-amazon.com/images/I/71Z9bG-BzuL._SL1500_.jpg"

Some of the important fields relevant to the script we are writing are amazon_price, ebay_price, and walmart_price.

Now we have seen our data. So let’s get into the development phase.

We will make the tool in Python 3.x, and first of all, we will be using the JSON library for parsing JSON and further processing. The tool provides amazing functionality by printing the product name and price of the site. We are importing JSON library to parse JSON.

import json

Now we will call the open() function in the code snippet to read the content from the JSON file,

import json
if __name__ == '__main__':
    price_data = None
    price = []
    with open('data.json', encoding='utf8') as f:
        price_data = f.read()
    if price_data is not None:
       json_price_data = json.loads(price_data)

Now our JSON data is read, we will convert it into Python’s built-in data structures for which the code will call json.loads() method for converting JSON string into a dictionary or a list of dictionaries, depending upon the entries.

Since the main goal is to find the store that sells the product at the lowest price, our target is to find the minimum price and other relevant details like product and store name. The price info of the relevant store is store in amazon_price, ebay_price, and Walmart_price keys. To find the minimum of each product, we need to iterate the price list items.

for d in json_price_data:
            price.append({'name': d['name'], 'price': float(d['amazon_price']), 'url': d['amazon_url']})
            price.append({'name': d['name'], 'price': float(d['walmart_price']), 'url': d['walmart_url']})
            price.append({'name': d['name'], 'price': float(d['ebay_price']), 'url': d['ebay_url']})
            minPricedItem = min(price, key=lambda x: x['price'])
            price = []

We are using lambdas and setting the key of min() to make sure the price field is being compared. It produces the following output:

Let’s restructure the format a little bit.

for d in json_price_data:
            price.append({'name': d['name'], 'price': d['amazon_price'], 'url': d['amazon_url']})
            price.append({'name': d['name'], 'price': d['walmart_price'], 'url': d['walmart_url']})
            price.append({'name': d['name'], 'price': d['ebay_price'], 'url': d['ebay_url']})
            minPricedItem = min(price, key=lambda x: float(x['price']))
            store_name = ''
            # Pick the store name based on url
            if 'amazon' in minPricedItem['url'].lower():
                store_name = 'Amazon'
            elif 'walmart' in minPricedItem['url'].lower():
                store_name = 'Amazon'
            elif 'ebay' in minPricedItem['url'].lower():
                store_name = 'eBay'
            print('{} is available in cheap price at {}. The price is ${}'.format(minPricedItem['name'], store_name,
            price = []

It will give the following output:

Congratulations! We have successfully made the script that you can run periodically to get the updated prices of the product.


This article explored one more wonder of web scraping, i.e. “Price Comparison”. Not only this, we have built a tool that can do the price comparison job for you and keep you updated with the market trends. I hope everything is clear and the article was fun learning along. That was all for this article. See you in the next ones!