Web scraping is the practice of extracting data from websites. It has many applications, and one of them is comparing prices across different websites. Online shopping is booming, and comparing the prices of products has become a necessity. We all visit multiple websites when we need to purchase a particular product, but have you ever thought of making a price comparison tool that does the same job for you and places the best deal in front of you?
In this article, we will build a price comparison tool in Python that lets you track the price of products across different sources and see how competitors are pricing them. Such a tool can also tell a business whether the price of a specific product is above or below the predicted price.
The data source for this article is a JSON file, and we will compare the product prices we get from Amazon, eBay, and Walmart. Our sample data looks like this:
[
{
"last_visited": "2018-01-30T13:38:01",
"name": "PUMA Men's Evospeed 17.4 TT Soccer Shoe",
"amazon_price": 36.94,
"ebay_price": 37,
"walmart_price": 37,
"amazon_url": "https://www.amazon.com/PUMA-Evospeed-Soccer-Ultra-Yellow-Peacoat-Orange/dp/B01J5LEMZI/",
"ebay_url": "https://www.ebay.com/itm/PUMA-Mens-Evospeed-17-4-Tt-Soccer-Shoe/302471489090",
"walmart_url": "https://www.walmart.com/ip/PUMA-Men-s-Evospeed-17-4-Tt-Soccer-Shoe/587074448",
"description": "The new evospeed 17.4 is a performance football boot for players of all levels. The soft and lightweight synthetic leather on the upper keeps the boot lightweight, comfortable and ensures durability. The lightweight outsole offers the perfect balance between traction, stability and acceleration PUMA is the global athletic brand that successfully fuses influences from sport, lifestyle and fashion. PUMA's unique industry perspective delivers the unexpected in sport-lifestyle footwear, apparel and accessories, through technical innovation and revolutionary design.",
"brand": "PUMA",
"image": "https://images-na.ssl-images-amazon.com/images/I/61v1mylcAqL._UL1500_.jpg"
},
{
"last_visited": "2018-01-30T13:38:07",
"name": "L'Oreal Paris Skin Care Revitalift Cicacream Face Moisturizer",
"amazon_price": 13.97,
"ebay_price": 13.99,
"walmart_price": 13.97,
"amazon_url": "https://www.amazon.com/LOreal-Paris-Revitalift-Cicacream-Moisturizer/dp/B074MBDRHW",
"ebay_url": "https://www.ebay.com/itm/LOREAL-Paris-NEW-Revitalift-Cicacream-Anti-Wrinkle-Skin-Barrier-Repair-ORIGINAL/112715734801",
"walmart_url": "https://www.walmart.com/ip/L-Or-al-Paris-Revitalift-Cicacream-Anti-Wrinkle-Skin-Barrier-Repair/519350834",
"description": "Skin's moisture barrier weakens with age, resulting in greater moisture loss, more prominent wrinkles and loss of firmness. Lightweight, protective cream is formulated with Pro-Retinol, a powerful wrinkle-fighting ingredient and Centella Asiatica, an herb used in traditional Chinese medicine. Strengthens and repairs skin barrier to help resist visible lines, loss of firmness and other signs of aging that a weakened skin barrier can accentuate. See visible results immediately: skin feels healthier, softer, smoother and more supple. Skin feels noticeably more hydrated. Skin barrier is stronger, helping to resist signs of aging. In two weeks: fine lines appear visibly reduced. Firmness and elasticity look noticeably improved. In four weeks: wrinkles appear less visible. Clarity and tone improves, skin exudes luminosity. Skin continues to look and feel soft, smooth, healthy.",
"brand": "L'Oreal Paris",
"image": "https://images-na.ssl-images-amazon.com/images/I/71Ff2vn4vjL._SL1500_.jpg"
},
{
"last_visited": "2018-01-30T13:38:12",
"name": "Adidas Dynamic Pulse By Adidas For Men",
"amazon_price": 6.96,
"ebay_price": 18.99,
"walmart_price": 7,
"amazon_url": "https://www.amazon.com/Adidas-Dynamic-Toilette-3-4-Ounce-Bottle/dp/B000VON5F2/",
"ebay_url": "https://www.ebay.com/itm/Adidas-DYNAMIC-PULSE-Cologne-for-Men-3-4-oz-edt-3-3-Spray-New-in-BOX/252837623533",
"walmart_url": "https://www.walmart.com/ip/Adidas-Dynamic-Pulse-for-Men-3-4-oz-EDT/28664356",
"description": "Launched by the design house of Adidas in 1997, ADIDAS DYNAMIC PULSE is a men's fragrance that possesses a blend of A fresh scent of citrus, cedar and mint with low tones of sweet fruits, fragrant woods and tonka bean. It is recommended for daytime wear.When applying any fragrance please consider that there are several factors which can affect the natural smell of your skin and, in turn, the way a scent smells on you. For instance, your mood, stress level, age, body chemistry, diet, and current medications may all alter the scents you wear. Similarly, factor such as dry or oily skin can even affect the amount of time a fragrance will last after being applied",
"brand": "adidas",
"image": "https://images-na.ssl-images-amazon.com/images/I/41%2BAnOP5nbL.jpg"
},
{
"last_visited": "2018-01-30T13:38:19",
"name": "Canon EOS Rebel T6 Digital SLR Camera",
"amazon_price": 449,
"ebay_price": 449,
"walmart_price": 449,
"amazon_url": "https://www.amazon.com/Canon-Digital-Camera-18-55mm-3-5-5-6/dp/B01CO2JPYS",
"ebay_url": "https://www.ebay.com/itm/Canon-EOS-Rebel-T6-DSLR-Camera-with-18-55mm-Lens/232596041502",
"walmart_url": "https://www.walmart.com/ip/Canon-EOS-Rebel-T6-DSLR-Camera-with-18-55mm-Lens-Black/50820749",
"description": "",
"brand": "Canon",
"image": "https://images-na.ssl-images-amazon.com/images/I/81YszfZS8%2BL._SL1500_.jpg"
},
{
"last_visited": "2018-01-30T13:38:25",
"name": "Woodland Fox Critter 36' Mylar Balloon",
"amazon_price": 5.49,
"ebay_price": 6.49,
"walmart_price": 7.6,
"amazon_url": "https://www.amazon.com/Woodland-Fox-Critter-Mylar-Balloon/dp/B00S9TKVYO",
"ebay_url": "https://www.ebay.com/itm/Woodland-Critters-Fox-36-inch-Foil-Balloon/132058119680",
"walmart_url": "https://www.walmart.com/ip/Woodland-Fox-Foil-Balloon/43350002",
"description": "Celebrate any occasion with an adorable woodland fox critter balloon! 36\" Woodland Critters fox shape foil balloon.",
"brand": "Betallic",
"image": "https://images-na.ssl-images-amazon.com/images/I/71Z9bG-BzuL._SL1500_.jpg"
}
]
The fields most relevant to the script we are writing are amazon_price, ebay_price, and walmart_price.
Now that we have seen our data, let’s get into the development phase.
We will write the tool in Python 3.x. First, we import the json library, which we use to parse the JSON file for further processing. The tool prints each product’s name along with its lowest price and the store offering it.
import json
Now we will call the open() function to read the content of the JSON file:
import json
if __name__ == '__main__':
    price_data = None
    price = []
    with open('data.json', encoding='utf8') as f:
        price_data = f.read()
    if price_data is not None:
        json_price_data = json.loads(price_data)
Once the JSON data has been read, json.loads() converts the JSON string into Python’s built-in data structures: a dictionary or a list of dictionaries, depending on the entries. Our file holds an array of products, so we get a list of dictionaries.
Since the main goal is to find the store that sells the product at the lowest price, our target is to find the minimum price and other relevant details like the product and store name. The price for each store is stored in the amazon_price, ebay_price, and walmart_price keys. To find the minimum for each product, we iterate over the list of products.
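As a quick illustration of what json.loads() returns, here is a minimal sketch that parses a shortened version of the sample data. The inline string is an assumption standing in for the contents of data.json; only the name and amazon_price fields are kept.

```python
import json

# Shortened stand-in for the contents of data.json (assumption for illustration).
raw = """
[
  {"name": "Canon EOS Rebel T6 Digital SLR Camera", "amazon_price": 449},
  {"name": "Adidas Dynamic Pulse By Adidas For Men", "amazon_price": 6.96}
]
"""

products = json.loads(raw)

# A JSON array becomes a list, and each JSON object becomes a dict.
print(type(products).__name__)     # list
print(type(products[0]).__name__)  # dict

# JSON numbers keep their kind: 449 is parsed as an int, 6.96 as a float.
print(type(products[0]["amazon_price"]).__name__)  # int
print(type(products[1]["amazon_price"]).__name__)  # float
```

Because whole-number prices arrive as int and the others as float, converting with float() before comparing keeps the values uniform.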
for d in json_price_data:
    price.append({'name': d['name'], 'price': float(d['amazon_price']), 'url': d['amazon_url']})
    price.append({'name': d['name'], 'price': float(d['walmart_price']), 'url': d['walmart_url']})
    price.append({'name': d['name'], 'price': float(d['ebay_price']), 'url': d['ebay_url']})
    minPricedItem = min(price, key=lambda x: x['price'])
    print(minPricedItem)
    print('=================')
    price = []
We pass a lambda as the key argument of min() so that the price field is the value being compared. For each product, the script prints the cheapest entry as a dictionary, followed by a separator line.
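To see the key argument in isolation, here is a small self-contained sketch using the three L'Oreal prices from the sample data. The "store" field is an assumption added for readability; the script above stores a URL instead.

```python
# One product's offers, in the same order the script appends them.
offers = [
    {"store": "Amazon", "price": 13.97},
    {"store": "Walmart", "price": 13.97},
    {"store": "eBay", "price": 13.99},
]

# key tells min() which value to compare. Without it, min() would try to
# compare the dicts themselves, which raises a TypeError.
cheapest = min(offers, key=lambda offer: offer["price"])

print(cheapest)  # {'store': 'Amazon', 'price': 13.97}
```

Note that Amazon and Walmart tie at 13.97 here. When several items share the minimum, min() returns the first one it encounters, so the order in which the offers are appended decides which store is reported.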
Let’s make the output a little more readable.
for d in json_price_data:
    price.append({'name': d['name'], 'price': d['amazon_price'], 'url': d['amazon_url']})
    price.append({'name': d['name'], 'price': d['walmart_price'], 'url': d['walmart_url']})
    price.append({'name': d['name'], 'price': d['ebay_price'], 'url': d['ebay_url']})
    minPricedItem = min(price, key=lambda x: float(x['price']))
    store_name = ''
    # Pick the store name based on url
    if 'amazon' in minPricedItem['url'].lower():
        store_name = 'Amazon'
    elif 'walmart' in minPricedItem['url'].lower():
        store_name = 'Walmart'
    elif 'ebay' in minPricedItem['url'].lower():
        store_name = 'eBay'
    print('{} is cheapest at {}. The price is ${}'.format(minPricedItem['name'], store_name,
                                                          minPricedItem['price']))
    # Reset the list before processing the next product
    price = []
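It prints one line per product, naming the store with the lowest price. For the sample data above, that store is Amazon for every product, since Amazon's price is the lowest or tied for the lowest in each case.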
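Matching on the URL works, but the store is already implied by the key each price comes from. The sketch below is one possible variation, not the article's original code: the STORES mapping and the cheapest_offer function are hypothetical names introduced here, and the product is the balloon record from the sample data with unrelated fields omitted.

```python
# Maps the key prefix used in the JSON file to a display name (hypothetical helper).
STORES = {"amazon": "Amazon", "walmart": "Walmart", "ebay": "eBay"}


def cheapest_offer(product):
    """Return (store_name, price, url) for the lowest-priced store."""
    offers = [
        (label, float(product[f"{key}_price"]), product[f"{key}_url"])
        for key, label in STORES.items()
    ]
    # Compare on the price, which is the second element of each tuple.
    return min(offers, key=lambda offer: offer[1])


product = {
    "name": "Woodland Fox Critter 36' Mylar Balloon",
    "amazon_price": 5.49,
    "ebay_price": 6.49,
    "walmart_price": 7.6,
    "amazon_url": "https://www.amazon.com/Woodland-Fox-Critter-Mylar-Balloon/dp/B00S9TKVYO",
    "ebay_url": "https://www.ebay.com/itm/Woodland-Critters-Fox-36-inch-Foil-Balloon/132058119680",
    "walmart_url": "https://www.walmart.com/ip/Woodland-Fox-Foil-Balloon/43350002",
}

store, price, url = cheapest_offer(product)
print(f"{product['name']} is cheapest at {store} for ${price:.2f}")
```

With this layout, adding another store only requires a new entry in STORES, provided the JSON file follows the same <store>_price and <store>_url naming.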
Congratulations! We have successfully written a script that you can run periodically to get the updated prices of the products.
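Running the script periodically is most useful when each run is compared with the previous one. Here is a minimal sketch of that comparison, assuming the lowest prices from each run are kept in a dictionary keyed by product name. The price_changes function, the snapshot layout, and the earlier Canon price of 459.0 are all made up for illustration and are not part of the script above.

```python
def price_changes(previous, current):
    """Yield (name, old_price, new_price) for products whose price moved."""
    for name, new_price in current.items():
        old_price = previous.get(name)
        # Skip products that are new in this run or whose price is unchanged.
        if old_price is not None and new_price != old_price:
            yield name, old_price, new_price


# Lowest prices from an earlier run (the Canon price is a made-up value).
previous = {
    "Canon EOS Rebel T6 Digital SLR Camera": 459.0,
    "Adidas Dynamic Pulse By Adidas For Men": 6.96,
}
# Lowest prices from the current run, taken from the sample data.
current = {
    "Canon EOS Rebel T6 Digital SLR Camera": 449.0,
    "Adidas Dynamic Pulse By Adidas For Men": 6.96,
}

for name, old, new in price_changes(previous, current):
    direction = "dropped" if new < old else "rose"
    print(f"{name} {direction} from ${old:.2f} to ${new:.2f}")
```

In practice the previous snapshot could be written to a JSON file at the end of each run and loaded at the start of the next.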
ProxyScrape is one of the most popular and reliable proxy providers online. It offers three proxy services: dedicated datacenter proxies, residential proxies, and premium proxies. So, which is the best HTTP proxy for price comparison web scraping in Python? Before answering that question, it helps to look at the features of each proxy type.
A dedicated datacenter proxy is best suited for high-speed online tasks, such as streaming large amounts of data from various servers for analysis. This is one of the main reasons organizations choose dedicated proxies for transmitting large amounts of data in a short time.
A dedicated datacenter proxy has several features, such as unlimited bandwidth and concurrent connections, dedicated HTTP proxies for easy communication, and IP authentication for more security. With 99.9% uptime, the dedicated datacenter proxy is expected to be available for nearly every session. Last but not least, ProxyScrape provides customer service that will help you resolve your issue within 24-48 business hours.
Next is the residential proxy, the go-to choice for general consumers. The main reason is that the IP address of a residential proxy resembles an IP address assigned by an ISP, so the target server is more likely to grant access to its data.
Another feature of ProxyScrape’s residential proxy is rotation. A rotating proxy helps you avoid a permanent ban because it dynamically changes your IP address, making it difficult for the target server to tell whether you are using a proxy.
Other features of the residential proxy include unlimited bandwidth and concurrent connections, dedicated HTTP/HTTPS proxies, availability at any time thanks to a pool of more than 7 million proxies, username and password authentication for more security, and the ability to change the country of the server. You can select your desired country by appending the country code to the username.
The last one is the premium proxy. Premium proxies work the same way as dedicated datacenter proxies; the main difference is accessibility. With premium proxies, the proxy list (the list that contains the proxies) is made available to every user on ProxyScrape’s network. That is why premium proxies cost less than dedicated datacenter proxies.
So, which is the best HTTP proxy for price comparison web scraping in Python? The answer is the residential proxy. The reason is simple: as noted above, the residential proxy rotates, meaning your IP address changes dynamically over time. This lets you send many requests within a short time frame with less risk of an IP block.
Another advantage is the ability to choose the proxy server by country. You just have to append the country ISO code to the end of the IP authentication or the username and password authentication.
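As a rough sketch of how a username and password proxy with a country code might be wired into a Python scraper, the example below uses only the standard library. Everything specific in it is a placeholder: the host proxy.example.com, the port, the credentials, and especially the "-country-US" username suffix are assumptions for illustration, not ProxyScrape's documented format. Check the provider's dashboard for the real values. No request is actually sent.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(username, password, host, port, country=None):
    """Build an HTTP proxy URL, optionally appending a country code to the username."""
    # The "-country-XX" suffix is a made-up placeholder, not a documented format.
    user = f"{username}-country-{country}" if country else username
    # Percent-encode the credentials so special characters do not break the URL.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


proxy_url = build_proxy_url("user", "secret", "proxy.example.com", 8080, country="US")

# Route both HTTP and HTTPS requests through the same proxy.
handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
opener = urllib.request.build_opener(handler)

# Calling opener.open(url) would now send the request through the proxy.
print(proxy_url)
```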
This article explored one more application of web scraping: price comparison. We built a tool that does the price comparison for you and helps you keep up with market trends. We hope it gives you enough information on web scraping for price comparison in an easy way. A proxy server is a useful companion for web scraping, and ProxyScrape provides a best-in-class residential proxy for your price comparison projects.