Welcome to pyolx’s documentation!
Introduction
pyolx supplies two methods to scrape data from the www.olx.pl website.
Scraping category data
This method scrapes the urls of available offers from OLX search results for the given parameters.
olx.category.get_category(main_category, sub_category, detail_category, region, **filters)
Parses available offer urls from every page of the given category.
Parameters:
- main_category – Main category
- sub_category – Sub category
- detail_category – Detail category
- region – Region of search
- filters – Dictionary with additional filters. The following example dictionary contains every possible filter, with examples of its values.
Example:
input_dict = {
    "[filter_float_price:from]": 2000,  # minimal price
    "[filter_float_price:to]": 3000,  # maximal price
    "[filter_enum_floor_select][0]": 3,  # desired floor, enum: from -1 to 11 (10 and more) and 17 (attic)
    "[filter_enum_furniture][0]": True,  # furnished or unfurnished offer
    "[filter_enum_builttype][0]": "blok",  # valid build types:
    # blok, kamienica, szeregowiec, apartamentowiec, wolnostojacy, loft
    "[filter_float_m:from]": 25,  # minimal surface
    "[filter_float_m:to]": 50,  # maximal surface
    "[filter_enum_rooms][0]": 2  # desired number of rooms, enum: from 1 to 4 (4 and more)
}
Returns: List of all offers for given parameters
Return type: list
It can be used like this:
input_dict = {'[filter_float_price:from]': 2000}
parsed_urls = olx.category.get_category("nieruchomosci", "mieszkania", "wynajem", "Gdańsk", **input_dict)
The above code stores a list of urls for all the apartments found in the given category in the parsed_urls variable.
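The filter names contain brackets and colons, so they are not valid Python identifiers and cannot be written as ordinary keyword arguments. The sketch below illustrates why the dictionary is unpacked with `**`. `show_filters` is a hypothetical stand-in that only mirrors the documented signature; it is not part of pyolx and performs no scraping.

```python
def show_filters(main_category, sub_category, detail_category, region, **filters):
    """Hypothetical stand-in: returns the arguments it received, unchanged."""
    return {
        "path": (main_category, sub_category, detail_category, region),
        "filters": filters,
    }


# Keys such as "[filter_float_price:from]" can only reach **filters
# through dictionary unpacking, never as literal keyword arguments.
input_dict = {
    "[filter_float_price:from]": 2000,
    "[filter_float_price:to]": 3000,
    "[filter_enum_rooms][0]": 2,
}

received = show_filters("nieruchomosci", "mieszkania", "wynajem", "Gdańsk", **input_dict)
print(received["filters"])
```

The same unpacking pattern applies to every function in this documentation that accepts `**filters`.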
Scraping offer data
This method scrapes the details of every offer in a list of offer urls.
olx.offer.get_descriptions(parsed_urls)
Parses details of offers in category.
Parameters: parsed_urls (list) – List of offers urls
Returns: List of details of offers
Return type: list
Except: If this offer is not available anymore
It can be used like this:
descriptions = olx.offer.get_descriptions(parsed_urls)
The above code stores the details of each offer url provided in parsed_urls as a list in the descriptions variable.
Category methods
olx.category.get_category(main_category, sub_category, detail_category, region, **filters)
Parses available offer urls from every page of the given category.
Parameters:
- main_category – Main category
- sub_category – Sub category
- detail_category – Detail category
- region – Region of search
- filters – Dictionary with additional filters. The following example dictionary contains every possible filter, with examples of its values.
Example:
input_dict = {
    "[filter_float_price:from]": 2000,  # minimal price
    "[filter_float_price:to]": 3000,  # maximal price
    "[filter_enum_floor_select][0]": 3,  # desired floor, enum: from -1 to 11 (10 and more) and 17 (attic)
    "[filter_enum_furniture][0]": True,  # furnished or unfurnished offer
    "[filter_enum_builttype][0]": "blok",  # valid build types:
    # blok, kamienica, szeregowiec, apartamentowiec, wolnostojacy, loft
    "[filter_float_m:from]": 25,  # minimal surface
    "[filter_float_m:to]": 50,  # maximal surface
    "[filter_enum_rooms][0]": 2  # desired number of rooms, enum: from 1 to 4 (4 and more)
}
Returns: List of all offers for given parameters
Return type: list
olx.category.get_offers_for_page(main_category, sub_category, detail_category, region, page, **filters)
Parses offers for one specific page of the given category with filters.
Parameters:
- main_category (str) – Main category
- sub_category (str) – Sub category
- detail_category (str) – Detail category
- region (str) – Region of search
- page (int) – Page number
- filters (dict) – See olx.category.get_category for reference
Returns: List of all offers for given page and parameters
Return type: list
olx.category.get_page_count(markup)
Reads the total number of pages from an OLX search page.
Parameters: markup (str) – OLX search page markup
Returns: Total number of pages, extracted from the JS script
Return type: int
olx.category.get_page_count_for_filters(main_category, sub_category, detail_category, region, **filters)
Reads the total number of pages for the given search filters.
Parameters:
- main_category (str) – Main category
- sub_category (str) – Sub category
- detail_category (str) – Detail category
- region (str) – Region of search
- filters – See olx.category.get_category for reference
Returns: Total number of pages
Return type: int
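The category functions above suggest a simple composition: read the page count once, then collect the offers page by page. The following is a minimal sketch of that loop, not the library's actual implementation. `collect_all_pages` and both callables are hypothetical stand-ins, and numbering pages from 1 is an assumption.

```python
def collect_all_pages(get_page_count, get_offers_for_page):
    """Gather offers from every result page into a single list.

    Both arguments are hypothetical callables standing in for the
    documented page-count and per-page functions.
    """
    offers = []
    # Assumption: pages are numbered 1..page_count inclusive.
    for page in range(1, get_page_count() + 1):
        offers.extend(get_offers_for_page(page))
    return offers


# Canned data instead of network access.
fake_pages = {
    1: ["https://example.com/offer-a", "https://example.com/offer-b"],
    2: ["https://example.com/offer-c"],
}

all_offers = collect_all_pages(lambda: len(fake_pages), lambda page: fake_pages[page])
print(all_offers)
```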
Offer methods
olx.offer.get_additional_rent(offer_markup)
Searches for additional rental costs.
Parameters: offer_markup (str)
Returns: Additional rent
Return type: int
olx.offer.get_date_added(offer_markup)
Searches for the date the offer was added.
Parameters: offer_markup (str) – Class “offerbody” from offer page markup
Returns: Date the offer was added
Return type: str
olx.offer.get_descriptions(parsed_urls)
Parses details of offers in category.
Parameters: parsed_urls (list) – List of offers urls
Returns: List of details of offers
Return type: list
Except: If this offer is not available anymore
olx.offer.get_gps(offer_markup)
Searches for gps coordinates (latitude and longitude).
Parameters: offer_markup (str) – Class “offerbody” from offer page markup
Returns: Tuple of gps coordinates
Return type: tuple
olx.offer.get_img_url(offer_markup)
Searches for images in offer markup.
Parameters: offer_markup (str) – Class “offerbody” from offer page markup
Returns: Images of offer in list
Return type: list
olx.offer.get_poster_name(offer_markup)
Searches for poster name.
Parameters: offer_markup (str) – Class “offerbody” from offer page markup
Returns: Poster name
Return type: str
olx.offer.get_surface(offer_markup)
Searches for surface in offer markup.
Parameters: offer_markup (str) – Class “offerbody” from offer page markup
Returns: Surface
Return type: float
Except: When there is no offer surface it will return None
olx.offer.get_title(offer_markup)
Searches for offer title on offer page.
Parameters: offer_markup (str) – Class “offerbody” from offer page markup
Returns: Title of offer
Return type: str
olx.offer.parse_description(offer_markup)
Searches for the description in offer markup.
Parameters: offer_markup (str) – Body from offer page markup
Returns: Description of offer
Return type: str
olx.offer.parse_flat_data(offer_markup)
Parses flat data from the Google Tag Manager script.
The data includes whether the offer is private or business, the floor number, the number of rooms, the built type and furniture.
Parameters: offer_markup (str) – Body from offer page markup
Returns: Dictionary of flat data
Return type: dict
olx.offer.parse_offer(markup, url)
Parses data from offer page markup.
Parameters:
- markup (str) – Offer page markup
- url (str) – Url of current offer page
Returns: Dictionary with all offer details
Return type: dict
Utils methods
olx.utils.city_name(city)
Creates a valid OLX url city name.
An OLX city name can’t include Polish characters or upper case letters, and white spaces are replaced with dashes.
Parameters: city (str) – City name not in OLX url format
Returns: Valid OLX url city name
Return type: str
Example:
>>> city_name("Ruda Śląska")
"ruda-slaska"
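The three rules above (no Polish characters, no upper case letters, dashes instead of white space) can be sketched in a few lines. This is an illustration under stated assumptions, not the library's source: `sketch_city_name` and the translation table are hypothetical names. An explicit table is used because "ł" has no Unicode decomposition, so accent stripping through normalization alone would leave it untouched.

```python
# Hypothetical mapping of lower case Polish letters to ASCII equivalents.
POLISH_TO_ASCII = str.maketrans("ąćęłńóśźż", "acelnoszz")


def sketch_city_name(city):
    """Lower-case the name, replace Polish letters, join words with dashes."""
    ascii_name = city.lower().translate(POLISH_TO_ASCII)
    # split() with no arguments also collapses repeated white space.
    return "-".join(ascii_name.split())


print(sketch_city_name("Ruda Śląska"))
```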
olx.utils.flatten(container)
Flattens a list.
Parameters: container (list) – List with nested lists
Returns: List with elements that were nested in container
Return type: list
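For illustration, flattening a nested list might look like the sketch below. `sketch_flatten` is a hypothetical name, and handling arbitrary nesting depth through recursion is an assumption; the documented function may only flatten one level.

```python
def sketch_flatten(container):
    """Return a new list with the elements of all nested lists, in order."""
    flat = []
    for item in container:
        if isinstance(item, list):
            # Assumption: nested lists are flattened at every depth.
            flat.extend(sketch_flatten(item))
        else:
            flat.append(item)
    return flat


nested = [["url-1", "url-2"], ["url-3", ["url-4"]], "url-5"]
print(sketch_flatten(nested))
```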
olx.utils.get_content_for_url(url)
Connects with the given url.
If the environment variable DEBUG is True, it will cache the response for the url in the /var/temp directory.
Parameters: url (str) – Website url
Returns: Response for requested url
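The debug cache described above could work roughly as in the following sketch. Everything here is an assumption made for illustration: `cached_fetch`, the injected `fetch` callable, the SHA-256 file naming and the temporary directory are hypothetical and are not taken from pyolx.

```python
import hashlib
import os
import tempfile


def cached_fetch(url, fetch, cache_dir, debug=None):
    """Return fetch(url), reusing a file in cache_dir when debug is on."""
    if debug is None:
        # Assumption: the flag is the literal string "True" in the environment.
        debug = os.environ.get("DEBUG") == "True"
    if not debug:
        return fetch(url)
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the url so that it is always a safe file name.
    path = os.path.join(cache_dir, hashlib.sha256(url.encode("utf-8")).hexdigest())
    if os.path.exists(path):
        with open(path, "rb") as cached:
            return cached.read()
    content = fetch(url)
    with open(path, "wb") as cached:
        cached.write(content)
    return content


calls = []


def fake_fetch(url):
    """Stand-in for a real HTTP request; records each call."""
    calls.append(url)
    return b"<html>offer</html>"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("https://www.olx.pl/", fake_fetch, tmp, debug=True)
    second = cached_fetch("https://www.olx.pl/", fake_fetch, tmp, debug=True)

print(len(calls))
```

Passing the fetch function in as an argument keeps the sketch runnable without network access.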
olx.utils.get_search_filter(filter_name, filter_value)
Generates a url search filter.
Parameters:
- filter_name (str) – Filter name in OLX format. See olx.category.get_category for reference
- filter_value – Correct value for filter
Returns: Percent-encoded url search filter
Return type: str
Example:
>>> get_search_filter("[filter_float_price:from]", 2000)
"search%5Bfilter_float_price%3Afrom%5D=2000"
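The percent-encoding in the example can be reproduced with the standard library. The sketch below is one way to arrive at the documented output; `sketch_search_filter` is a hypothetical name and the library may build the string differently.

```python
from urllib.parse import quote


def sketch_search_filter(filter_name, filter_value):
    """Build 'search<encoded name>=<encoded value>' for one filter."""
    # safe="" makes quote() encode every reserved character,
    # including "[", "]" and ":".
    return "search" + quote(filter_name, safe="") + "=" + quote(str(filter_value), safe="")


price_filter = sketch_search_filter("[filter_float_price:from]", 2000)
print(price_filter)
```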
olx.utils.get_url(main_category, sub_category, detail_category, region, page=None, **filters)
Creates a url for the given parameters.
Parameters:
- main_category (str) – Main category
- sub_category (str) – Sub category
- detail_category (str) – Detail category
- region (str) – Region of search
- page (int) – Page number
- filters (dict) – Dictionary with additional filters. See olx.category.get_category for reference
Returns: Url for given parameters
Return type: str
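As a rough illustration of how the parameters might combine into one address, consider the sketch below. The url layout (path segments followed by a query string with a `page` parameter) is an assumption that this documentation does not confirm, and `BASE_URL` and `sketch_url` are hypothetical names.

```python
from urllib.parse import quote

BASE_URL = "https://www.olx.pl"  # assumed base address


def sketch_url(main_category, sub_category, detail_category, region, page=None, **filters):
    """Join the categories and region into a path, then append the filters."""
    path = "/".join([BASE_URL, main_category, sub_category, detail_category, region]) + "/"
    # Each filter becomes one percent-encoded 'search[...]=value' pair.
    params = [
        "search" + quote(name, safe="") + "=" + quote(str(value), safe="")
        for name, value in filters.items()
    ]
    if page is not None:
        # Assumption: the page number is passed as a 'page' query parameter.
        params.append("page=" + str(page))
    return path + ("?" + "&".join(params) if params else "")


url = sketch_url(
    "nieruchomosci", "mieszkania", "wynajem", "gdansk",
    page=2, **{"[filter_float_price:from]": 2000}
)
print(url)
```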