Welcome to Read the deals project documentation!
Authors
- Ajay Kumar Soma
Idea background - The existing online discount sites
All discounts and deals in one place
There are many sites on the web that collect the discounts from e-retailers in one place and make them searchable and accessible to price-conscious consumers who like deals and discounts. However, no such service exists for offline (in-store) deals and discounts. I think the reasons are as follows.
Sample sites
1. Item-level discount details
2. All coupons without item info
    - CouponDunia - all coupons
    - SaveMyRupee - all coupons. It has a wish list: if you give it an item, it finds the lowest price for you.
Online deals: advantages
- Online retailers provide APIs, so the data is easy to get.
- Online retailers pay referral bonuses.
Offline deals: disadvantages
- Too many interfaces and no API support.
- Referral bonuses are difficult to earn.
Our Idea
To bring all offline discounts and deals online, just as the online discount sites do for e-retailers.
Challenges - our approach
- Product data - use a web scraper.
- Store data - build a very simple publishing system for each store.
- Referral bonus - for now, focus on reaching consumers; later, plan for revenue from:
    - Ads
    - Referrals
    - Analytics data
Our unique features
0. Discounts Guide
A discounts guide that explains why a price has been marked down.
1. Deal Rating
A rating for each discount that tells whether the discount is good or not.
2. User Likes and Comments
Likes/dislikes and comments on each item and on the overall sale.
3. Reserve Item
Users can reserve an item for one or two days and collect it from the store.
4. Wish List
Users can add an item to a wish list with a target price or discount percentage, and we send them an alert when a deal meets it.
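A minimal sketch of the wish-list check, assuming each wish stores an optional target price and an optional minimum discount percentage, and that a deal matches when it meets either target. Keying items by product URL follows the scraper's database; the `Wish`, `Deal`, and `matches` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Wish:
    """A user's wish-list entry (hypothetical structure)."""
    url: str                                  # product URL, the same key the scraper uses
    max_price: Optional[float] = None         # alert at or below this price
    min_discount_pct: Optional[float] = None  # alert at or above this discount


@dataclass
class Deal:
    """A scraped deal with its regular and discounted prices."""
    url: str
    regular_price: float
    discounted_price: float

    @property
    def discount_pct(self) -> float:
        # Percentage off the regular price, e.g. 4000 -> 2800 is 30.0
        return (self.regular_price - self.discounted_price) * 100 / self.regular_price


def matches(wish: Wish, deal: Deal) -> bool:
    """True if the deal is for the wished item and meets either target."""
    if wish.url != deal.url:
        return False
    price_ok = wish.max_price is not None and deal.discounted_price <= wish.max_price
    discount_ok = (
        wish.min_discount_pct is not None
        and deal.discount_pct >= wish.min_discount_pct
    )
    return price_ok or discount_ok


shoe = Deal("https://example.com/p/1", regular_price=4000.0, discounted_price=2800.0)
print(matches(Wish(shoe.url, max_price=3000.0), shoe))       # True
print(matches(Wish(shoe.url, min_discount_pct=40.0), shoe))  # False: only 30% off
```

Treating the two targets as alternatives ("either") is a design choice; requiring both would only change `or` to `and`.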
5. Location-Based Discount Alerts
Users get an alert when they are near a store with a deal that matches their wish list. This is the most important feature.
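How "near" is measured is not specified here. One hedged way to sketch it: compute the great-circle (haversine) distance from the user to each store and alert only for stores within a radius that have a wish-listed item on sale. The `Store` structure, the 1 km default radius, and the function names are assumptions for illustration; a production system would more likely rely on a geospatial index or the mobile platform's geofencing.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Store:
    """A physical store and the URLs of items on sale there (hypothetical)."""
    name: str
    lat: float
    lon: float
    deal_urls: set = field(default_factory=set)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stores_to_alert(user_lat, user_lon, wish_urls, stores, radius_km=1.0):
    """Names of stores within radius_km that have a deal on a wish-listed item."""
    alerts = []
    for store in stores:
        if not (store.deal_urls & wish_urls):
            continue  # nothing from this user's wish list is on sale here
        if haversine_km(user_lat, user_lon, store.lat, store.lon) <= radius_km:
            alerts.append(store.name)
    return alerts


stores = [
    Store("Store A", 12.9750, 77.6000, {"https://example.com/p/1"}),  # about 0.7 km away
    Store("Store B", 13.0350, 77.5970, {"https://example.com/p/1"}),  # about 7 km away
]
print(stores_to_alert(12.9716, 77.5946, {"https://example.com/p/1"}, stores))  # ['Store A']
```

Checking the wish-list overlap before the distance keeps the cheaper set test first, so distance is only computed for stores that could trigger an alert.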
6. Offline and Online Price Comparison
If a product is also available online, show its in-store price alongside the online price, giving the customer a reason to visit the store.
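A small sketch of the comparison message, assuming the online price (if any) has already been looked up, for example through an online retailer's API. The function name and the message wording are hypothetical.

```python
from typing import Optional


def compare_prices(offline_price: float, online_price: Optional[float]) -> str:
    """Describe how the in-store price compares with the online price, if any."""
    if online_price is None:
        return "Not available online"
    diff = online_price - offline_price
    if diff > 0:
        return f"Save {diff:.2f} by buying in store"
    if diff < 0:
        return f"In store costs {-diff:.2f} more than online"
    return "Same price in store and online"


print(compare_prices(2800.0, 3200.0))  # Save 400.00 by buying in store
```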
Template
$project will solve your problem of where to start with documentation, by providing a basic explanation of how to do it easily.
Look how easy it is to use:
    import project
    # Get your stuff done
    project.do_stuff()
Features
- Be awesome
- Make things faster
Contribute
- Issue Tracker: github.com/$project/$project/issues
- Source Code: github.com/$project/$project
Support
If you are having issues, please let us know. We have a mailing list located at: project@google-groups.com
License
The project is licensed under the BSD license.
Scraper
The scraper is the heart of our app: it fetches the data and feeds it into the app, so its design is critical. I considered several tools and technologies for scraping.
I have done scraping in Node.js and Python. Python is my choice because it is very simple and lets me write the scraper in a sequential programming style.
The scraping frameworks and browser tools I considered are described below.
1. Scrapy (Python)
Scrapy is one of the best frameworks for scraping. However, on its own it cannot scrape dynamic sites where the content is generated on the fly by JavaScript, because it does not use a browser at all: it downloads the raw HTML and never executes the page's JavaScript.
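2. Selenium with Xvfb
Selenium drives a real browser, so it can scrape dynamic sites. It is a bit slow, but that does not matter much here; it does the job well. Xvfb (X virtual framebuffer) is a virtual display server that lets the browser run without a physical screen, i.e. in hidden mode.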
Puma India scraping
Puma India is a dynamic site, but its initial page is static. So we used Scrapy for the first page and Selenium for the product pages.
Puma-lev1
- Its job is to fetch all the unique product pages along with the following fields:
    - id
    - process_lev1
    - process_lev2
    - sale
    - name
    - url
    - image_small
    - regular_price
    - discounted_price
- Command to run:
    cd puma-spider/stack/stack/
    scrapy crawl puma-lev1
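A sketch of one puma-lev1 record as a Python dataclass. The field names come from the puma-lev1 field list; their types, the Y/N flag encoding, and the hypothetical `parse_price` helper (which assumes prices appear as text such as "Rs. 2,799") are assumptions, not taken from the actual spider.

```python
import re
from dataclasses import dataclass


def parse_price(text: str) -> float:
    """Turn price text such as 'Rs. 2,799.00' into 2799.0 (assumed format)."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group().replace(",", ""))


@dataclass
class PumaLev1Item:
    """One product row produced by puma-lev1 (field types are assumptions)."""
    id: str
    process_lev1: str        # 'Y' or 'N' (assumed encoding)
    process_lev2: str        # 'N' until puma-lev2 has processed the item
    sale: str                # sale the item was listed under (assumed meaning)
    name: str
    url: str                 # unique key when saving to the DB
    image_small: str
    regular_price: float
    discounted_price: float

    @property
    def discount_pct(self) -> float:
        """Percentage off the regular price, rounded to one decimal place."""
        off = self.regular_price - self.discounted_price
        return round(off * 100 / self.regular_price, 1)


item = PumaLev1Item(
    id="1", process_lev1="Y", process_lev2="N", sale="End of season",
    name="Running shoe", url="https://example.com/p/1",
    image_small="https://example.com/p/1_small.jpg",
    regular_price=parse_price("Rs. 4,000"),
    discounted_price=parse_price("Rs. 2,800.00"),
)
print(item.discount_pct)  # 30.0
```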
Program Logic
- Fetch the data from the site and, while saving it, do the following:
    - If the item is not yet in the DB (checked using the URL as the key), insert it.
    - If the item already exists in the DB, update the following fields:
        - sale
        - regular_price
        - discounted_price
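A minimal sketch of this insert-or-update logic using Python's built-in `sqlite3` module. The `products` table and its columns are a hypothetical subset of the lev1 fields, and setting `process_lev2` to 'N' on insert is an assumption so that new items are picked up by puma-lev2. In a Scrapy project, this logic would typically live in an item pipeline's `process_item` method.

```python
import sqlite3


def save_item(conn: sqlite3.Connection, item: dict) -> str:
    """Insert a new item, or refresh the sale/price fields of an existing one.

    The product URL is the key. Returns "inserted" or "updated".
    """
    exists = conn.execute(
        "SELECT 1 FROM products WHERE url = ?", (item["url"],)
    ).fetchone()
    if exists is None:
        # New product: store it and flag it for level-2 processing (assumed 'N').
        conn.execute(
            "INSERT INTO products (url, name, sale, regular_price, discounted_price, process_lev2)"
            " VALUES (?, ?, ?, ?, ?, 'N')",
            (item["url"], item["name"], item["sale"],
             item["regular_price"], item["discounted_price"]),
        )
        result = "inserted"
    else:
        # Known product: only the sale and prices change between crawls.
        conn.execute(
            "UPDATE products SET sale = ?, regular_price = ?, discounted_price = ?"
            " WHERE url = ?",
            (item["sale"], item["regular_price"], item["discounted_price"], item["url"]),
        )
        result = "updated"
    conn.commit()
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (url TEXT PRIMARY KEY, name TEXT, sale TEXT,"
    " regular_price REAL, discounted_price REAL, process_lev2 TEXT)"
)
shoe = {"url": "https://example.com/p/1", "name": "Running shoe", "sale": "Summer",
        "regular_price": 4000.0, "discounted_price": 3000.0}
print(save_item(conn, shoe))                                 # inserted
print(save_item(conn, dict(shoe, discounted_price=2800.0)))  # updated
```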
Puma-lev2
- Commands to run:
    python puma-lev2.py refresh=N
    python puma-lev2.py refresh=Y
- The level-2 spider yields the following fields:
    - images - original, big, small
    - style number
    - availability
    - size
Keep a refresh flag for this program: if the refresh flag is Y, process all items; otherwise, process only the items whose process_lev2 flag is N.
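A hedged sketch of the refresh-flag handling, assuming a hypothetical `products` table whose `process_lev2` column holds 'Y' or 'N', and assuming the flag is set to 'Y' once an item has been processed. `parse_refresh`, `urls_to_process`, and `mark_processed` are illustrative names, and defaulting to N when no flag is given is also an assumption.

```python
import sqlite3
import sys


def parse_refresh(argv: list) -> str:
    """Return 'Y' or 'N' from a refresh=... argument; default 'N' (assumed)."""
    for arg in argv:
        key, sep, value = arg.partition("=")
        if sep and key == "refresh":
            value = value.strip().upper()
            if value not in ("Y", "N"):
                raise ValueError(f"refresh must be Y or N, got {value!r}")
            return value
    return "N"


def urls_to_process(conn: sqlite3.Connection, refresh: str) -> list:
    """All product URLs if refresh is 'Y', else only those with process_lev2 = 'N'."""
    if refresh == "Y":
        query = "SELECT url FROM products ORDER BY url"
    else:
        query = "SELECT url FROM products WHERE process_lev2 = 'N' ORDER BY url"
    return [row[0] for row in conn.execute(query)]


def mark_processed(conn: sqlite3.Connection, url: str) -> None:
    """Flag an item as done so later runs with refresh=N skip it (assumed 'Y')."""
    conn.execute("UPDATE products SET process_lev2 = 'Y' WHERE url = ?", (url,))
    conn.commit()


if __name__ == "__main__":
    # e.g. python puma-lev2.py refresh=Y
    print("refresh =", parse_refresh(sys.argv[1:]))
```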