Welcome to pyolx’s documentation!

Introduction

pyolx supplies two methods to scrape data from www.olx.pl website

Scraping category data

This method scrapes available offer urls from OLX search results with parameters

olx.category.get_category(main_category=None, sub_category=None, detail_category=None, region=None, search_query=None, url=None, **filters)[source]

Parses available offer urls from given category from every page

Parameters:
  • url – User defined url for OLX page with offers. It overrides category parameters and applies search filters.
  • main_category – Main category
  • sub_category – Sub category
  • detail_category – Detail category
  • region – Region of search
  • search_query – Additional search query
  • filters – Dictionary with additional filters. Following example dictionary contains every possible filter

with examples of it’s values.

Example:
input_dict = {
“[filter_float_price:from]”: 2000, # minimal price “[filter_float_price:to]”: 3000, # maximal price “[filter_enum_floor_select][0]”: 3, # desired floor, enum: from -1 to 11 (10 and more) and 17 (attic) “[filter_enum_furniture][0]”: True, # furnished or unfurnished offer “[filter_enum_builttype][0]”: “blok”, # valid build types: # blok, kamienica, szeregowiec, apartamentowiec, wolnostojacy, loft “[filter_float_m:from]”: 25, # minimal surface “[filter_float_m:to]”: 50, # maximal surface “[filter_enum_rooms][0]”: 2 # desired number of rooms, enum: from 1 to 4 (4 and more)

}

Returns:List of all offers for given parameters
Return type:list

It can be used like this:

input_dict = {'[filter_float_price:from]': 2000}
parsed_urls = olx.category.get_category("nieruchomosci", "mieszkania", "wynajem", "Gdańsk", **input_dict)

The above code will put a list of urls containing all the apartments found in the given category into the parsed_url variable

Scraping offer data

This method scrapes all offer details from

It can be used like this:

descriptions = olx.offer.get_descriptions(parsed_urls)

The above code will put a list of offer details for each offer url provided in parsed_urls into the descriptions variable

Category methods

olx.category.get_category(main_category=None, sub_category=None, detail_category=None, region=None, search_query=None, url=None, **filters)[source]

Parses available offer urls from given category from every page

Parameters:
  • url – User defined url for OLX page with offers. It overrides category parameters and applies search filters.
  • main_category – Main category
  • sub_category – Sub category
  • detail_category – Detail category
  • region – Region of search
  • search_query – Additional search query
  • filters – Dictionary with additional filters. Following example dictionary contains every possible filter

with examples of it’s values.

Example:
input_dict = {
“[filter_float_price:from]”: 2000, # minimal price “[filter_float_price:to]”: 3000, # maximal price “[filter_enum_floor_select][0]”: 3, # desired floor, enum: from -1 to 11 (10 and more) and 17 (attic) “[filter_enum_furniture][0]”: True, # furnished or unfurnished offer “[filter_enum_builttype][0]”: “blok”, # valid build types: # blok, kamienica, szeregowiec, apartamentowiec, wolnostojacy, loft “[filter_float_m:from]”: 25, # minimal surface “[filter_float_m:to]”: 50, # maximal surface “[filter_enum_rooms][0]”: 2 # desired number of rooms, enum: from 1 to 4 (4 and more)

}

Returns:List of all offers for given parameters
Return type:list
olx.category.get_offers_for_page(page, main_category=None, sub_category=None, detail_category=None, region=None, search_query=None, url=None, **filters)[source]

Parses offers for one specific page of given category with filters.

Parameters:
  • page (int) – Page number
  • url (str, None) – User defined url for OLX page with offers. It overrides category parameters and applies search filters.
  • main_category (str, None) – Main category
  • sub_category (str, None) – Sub category
  • detail_category (str, None) – Detail category
  • region (str, None) – Region of search
  • filters (dict) – See :meth category.get_category for reference
Returns:

List of all offers for given page and parameters

Return type:

list

olx.category.get_page_count(markup)[source]

Reads total page number from OLX search page

Parameters:markup (str) – OLX search page markup
Returns:Total page number extracted from js script
Return type:int
olx.category.get_page_count_for_filters(main_category=None, sub_category=None, detail_category=None, region=None, search_query=None, url=None, **filters)[source]

Reads total page number for given search filters

Parameters:
  • url (str, None) – User defined url for OLX page with offers. It overrides category parameters and applies search filters.
  • main_category (str, None) – Main category
  • sub_category (str, None) – Sub category
  • detail_category (str, None) – Detail category
  • region (str, None) – Region of search
  • search_query (str, None) – Additional search query
  • filters – See :meth category.get_category for reference
Returns:

Total page number

Return type:

int

olx.category.parse_ads_count(markup)[source]

Reads total number of adds

Parameters:markup (str) – OLX search page markup
Returns:Total ads count from script
Return type:int
olx.category.parse_available_offers(markup)[source]

Collects all offer links on search page markup

Parameters:markup (str) – Search page markup
Returns:Links to offer on given search page
Return type:list
olx.category.parse_offer_url(markup)[source]

Searches for offer links in markup

Offer links on OLX are in class “linkWithHash”. Only www.olx.pl domain is whitelisted.

Parameters:markup (str) – Search page markup
Returns:Url with offer
Return type:str

Offer methods

olx.offer.get_additional_rent(offer_markup)[source]

Searches for additional rental costs

Parameters:offer_markup (str) –
Returns:Additional rent
Return type:int
olx.offer.get_date_added(offer_markup)[source]

Searches of date of adding offer

Parameters:offer_markup (str) – Class “offerbody” from offer page markup
Returns:Date of adding offer
Return type:str
olx.offer.get_gps(offer_markup)[source]

Searches for gps coordinates (latitude and longitude)

Parameters:offer_markup (str) – Class “offerbody” from offer page markup
Returns:Tuple of gps coordinates
Return type:tuple
olx.offer.get_gpt_script(offer_markup)[source]

Parses data from script of Google Tag Manager

Parameters:offer_markup (str) – Body from offer page markup
Returns:GPT dict data
Return type:dict
olx.offer.get_img_url(offer_markup)[source]

Searches for images in offer markup

Parameters:offer_markup (str) – Class “offerbody” from offer page markup
Returns:Images of offer in list
Return type:list
olx.offer.get_poster_name(offer_markup)[source]

Searches for poster name

Parameters:offer_markup (str) – Class “offerbody” from offer page markup
Returns:Poster name or None if poster name was not found (offer is outdated)
Return type:str, None
Except:Poster name not found
olx.offer.get_surface(offer_markup)[source]

Searches for surface in offer markup

Parameters:offer_markup (str) – Class “offerbody” from offer page markup
Returns:Surface or None if there is no surface
Return type:float, None
Except:When there is no offer surface it will return None
olx.offer.get_title(offer_markup)[source]

Searches for offer title on offer page

Parameters:offer_markup (str) – Class “offerbody” from offer page markup
Returns:Title of offer
Return type:str, None
olx.offer.parse_description(offer_markup)[source]

Searches for description if offer markup

Parameters:offer_markup (str) – Body from offer page markup
Returns:Description of offer
Return type:str
olx.offer.parse_flat_data(offer_markup, data_dict)[source]

Parses flat data

Data includes if offer private or business, number of floor, number of rooms, built type and furniture.

Parameters:
  • offer_markup (str) – Body from offer page markup
  • data_dict (dict) – Dict with GPT script data
Returns:

Dictionary of flat data

Return type:

dict

olx.offer.parse_offer(url)[source]

Parses data from offer page url

Parameters:
  • url (str) – Offer page markup
  • url – Url of current offer page
Returns:

Dictionary with all offer details or None if offer is not available anymore

Return type:

dict, None

olx.offer.parse_region(offer_markup)[source]

Parses region information

Parameters:offer_markup (str) – Class “offerbody” from offer page markup
Returns:Region of offer
Return type:list
olx.offer.parse_tracking_data(offer_markup)[source]

Parses price and add_id from OLX tracking data script

Parameters:offer_markup (str) – Head from offer page
Returns:Tuple of int price and it’s currency or None if this offer page got deleted
Return type:tuple, None
Except:This offer page got deleted and has no tracking script.

Utils methods

olx.utils.city_name(city)[source]

Creates valid OLX url city name

OLX city name can’t include polish characters, upper case letters. It also should replace white spaces with dashes.

Parameters:city (str) – City name not in OLX url format
Returns:Valid OLX url city name
Return type:str
Example:

>> city_name(“Ruda Śląska”) “ruda-slaska”

olx.utils.get_content_for_url(url)[source]

Connects with given url

If environmental variable DEBUG is True it will cache response for url in /var/temp directory

Parameters:url (str) – Website url
Returns:Response for requested url
olx.utils.get_search_filter(filter_name, filter_value)[source]

Generates url search filter

Parameters:
  • filter_name (str) – Filter name in OLX format. See :meth:’olx.get_category’ for reference
  • filter_value – Correct value for filter
Returns:

Percent-encoded url search filter

:rtype str

Example:

>> get_search_filter([filter_float_price:from], 2000) “search%5Bfilter_float_price%3Afrom%5D=2000”

olx.utils.get_url(main_category=None, sub_category=None, detail_category=None, region=None, search_query=None, page=None, user_url=None, **filters)[source]

Creates url for given parameters

Parameters:
  • user_url – User defined OLX search url
  • main_category (str, None) – Main category
  • sub_category (str, None) – Sub category
  • detail_category (str, None) – Detail category
  • region (str, None) – Region of search
  • search_query (str) – Search query string
  • page (int, None) – Page number
  • filters (dict) – Dictionary with additional filters. See :meth:’olx.get_category’ for reference
Returns:

Url for given parameters

Return type:

str

Indices and tables