Welcome to pyolx’s documentation!

Introduction

pyolx provides two methods for scraping data from the www.olx.pl website.

Scraping category data

This method scrapes the available offer URLs from OLX search results that match the given parameters.

olx.category.get_category(main_category, sub_category, detail_category, region, **filters)

Parses the available offer URLs from every results page of the given category

Parameters:
  • main_category – Main category
  • sub_category – Sub category
  • detail_category – Detail category
  • region – Region of search
  • filters – Dictionary with additional filters. The following example dictionary contains every possible filter, with example values.

Example:

input_dict = {
    "[filter_float_price:from]": 2000,  # minimal price
    "[filter_float_price:to]": 3000,  # maximal price
    "[filter_enum_floor_select][0]": 3,  # desired floor, enum: from -1 to 11 (10 and more) and 17 (attic)
    "[filter_enum_furniture][0]": True,  # furnished or unfurnished offer
    "[filter_enum_builttype][0]": "blok",  # valid build types:
    # blok, kamienica, szeregowiec, apartamentowiec, wolnostojacy, loft
    "[filter_float_m:from]": 25,  # minimal surface
    "[filter_float_m:to]": 50,  # maximal surface
    "[filter_enum_rooms][0]": 2,  # desired number of rooms, enum: from 1 to 4 (4 and more)
}

Returns: List of URLs of all offers for the given parameters
Return type: list

It can be used like this:

input_dict = {'[filter_float_price:from]': 2000}
parsed_urls = olx.category.get_category("nieruchomosci", "mieszkania", "wynajem", "Gdańsk", **input_dict)

The code above stores in the parsed_urls variable a list of URLs of all apartments for rent in Gdańsk found in the given category with a minimum price of 2000.
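The documentation does not show how get_category walks through the result pages. A minimal sketch, assuming it reads the page count and then concatenates the offer URLs found on each page: collect_offer_urls and fetch_page are hypothetical names, with fetch_page standing in for olx.category.get_offers_for_page so the example runs without network access. Page numbering starting at 1 is also an assumption.

```python
from typing import Callable, Dict, List


def collect_offer_urls(page_count: int,
                       fetch_page: Callable[[int], List[str]]) -> List[str]:
    """Hypothetical sketch: gather offer URLs from pages 1..page_count.

    fetch_page stands in for olx.category.get_offers_for_page with the
    category, region and filters already bound.
    """
    urls: List[str] = []
    for page in range(1, page_count + 1):
        # Each call returns the offer URLs found on one results page.
        urls.extend(fetch_page(page))
    return urls


# Usage with a fake two-page result set instead of live OLX requests.
fake_pages: Dict[int, List[str]] = {
    1: ["https://www.olx.pl/oferta/a.html", "https://www.olx.pl/oferta/b.html"],
    2: ["https://www.olx.pl/oferta/c.html"],
}
print(collect_offer_urls(2, fake_pages.__getitem__))
```

Injecting the page fetcher keeps the pagination logic separate from HTTP access, which makes it easy to test.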

Scraping offer data

This method scrapes all offer details from a list of offer URLs, such as the one returned by get_category.

olx.offer.get_descriptions(parsed_urls)

Parses the details of each offer in the category

Parameters: parsed_urls (list) – List of offer URLs
Returns: List of offer details
Return type: list
Except: If an offer is no longer available

It can be used like this:

descriptions = olx.offer.get_descriptions(parsed_urls)

The code above stores in the descriptions variable a list containing the details of each offer whose URL is in parsed_urls.
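Assuming the Except entry means that offers which are no longer available are skipped instead of aborting the whole run, the pattern can be sketched as below. describe_offers, parse and fake_parse are hypothetical names, and raising LookupError for a removed offer is an assumption; the exception pyolx actually handles is not documented here.

```python
from typing import Callable, Dict, List


def describe_offers(urls: List[str],
                    parse: Callable[[str], Dict[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical sketch: parse each offer URL, skipping unavailable offers."""
    descriptions: List[Dict[str, str]] = []
    for url in urls:
        try:
            descriptions.append(parse(url))
        except LookupError:
            # Assumed signal that the offer has been removed from OLX.
            continue
    return descriptions


def fake_parse(url: str) -> Dict[str, str]:
    # Stand-in for downloading and parsing an offer page.
    if url.endswith("removed.html"):
        raise LookupError(url)
    return {"url": url}


print(describe_offers(["https://www.olx.pl/oferta/a.html",
                       "https://www.olx.pl/oferta/removed.html"], fake_parse))
```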

Category methods

olx.category.get_offers_for_page(main_category, sub_category, detail_category, region, page, **filters)

Parses the offers on one specific results page of the given category, using the given filters.

Parameters:
  • main_category (str) – Main category
  • sub_category (str) – Sub category
  • detail_category (str) – Detail category
  • region (str) – Region of search
  • page (int) – Page number
  • filters (dict) – See olx.category.get_category for reference
Returns: List of all offers for the given page and parameters
Return type: list

olx.category.get_page_count(markup)

Reads the total number of pages from an OLX search page

Parameters: markup (str) – OLX search page markup
Returns: Total number of pages, extracted from a JavaScript script on the page
Return type: int

olx.category.get_page_count_for_filters(main_category, sub_category, detail_category, region, **filters)

Reads the total number of pages for the given search filters

Parameters:
  • main_category (str) – Main category
  • sub_category (str) – Sub category
  • detail_category (str) – Detail category
  • region (str) – Region of search
  • filters – See olx.category.get_category for reference
Returns: Total number of pages
Return type: int

olx.category.parse_available_offers(markup)

Collects all offer links from the search page markup

Parameters: markup (str) – Search page markup
Returns: Links to the offers on the given search page
Return type: list

olx.category.parse_offer_url(markup)

Searches for an offer link in the markup

Offer links on OLX have the class "linkWithHash". Only the www.olx.pl domain is whitelisted.

Parameters: markup (str) – Search page markup
Returns: Offer URL
Return type: str
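The link search described above can be sketched with the standard library's html.parser; pyolx may use a different parser internally. OfferLinkParser is a hypothetical name, the sample markup is made up, and reading "only www.olx.pl is whitelisted" as "the link host must be exactly www.olx.pl" is our interpretation. Unlike parse_offer_url, the sketch collects every matching link, which is closer to what parse_available_offers returns.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlparse


class OfferLinkParser(HTMLParser):
    """Hypothetical sketch: collect hrefs of <a> tags with class linkWithHash."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href") or ""
        # Keep only offer links that point at the whitelisted domain.
        if "linkWithHash" in classes and urlparse(href).netloc == "www.olx.pl":
            self.urls.append(href)


# Made-up sample markup: one valid link, one foreign domain, one without the class.
markup = (
    '<a class="link linkWithHash" href="https://www.olx.pl/oferta/a.html">A</a>'
    '<a class="linkWithHash" href="https://www.otodom.pl/oferta/b.html">B</a>'
    '<a href="https://www.olx.pl/oferta/c.html">C</a>'
)
parser = OfferLinkParser()
parser.feed(markup)
print(parser.urls)  # ['https://www.olx.pl/oferta/a.html']
```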

Offer methods

olx.offer.get_additional_rent(offer_markup)

Searches for additional rental costs

Parameters: offer_markup (str)
Returns: Additional rent
Return type: int

olx.offer.get_date_added(offer_markup)

Searches for the date the offer was added

Parameters: offer_markup (str) – Class "offerbody" from offer page markup
Returns: Date the offer was added
Return type: str

olx.offer.get_gps(offer_markup)

Searches for GPS coordinates (latitude and longitude)

Parameters: offer_markup (str) – Class "offerbody" from offer page markup
Returns: Tuple of GPS coordinates
Return type: tuple

olx.offer.get_img_url(offer_markup)

Searches for images in the offer markup

Parameters: offer_markup (str) – Class "offerbody" from offer page markup
Returns: List of the offer's image URLs
Return type: list

olx.offer.get_poster_name(offer_markup)

Searches for the poster name

Parameters: offer_markup (str) – Class "offerbody" from offer page markup
Returns: Poster name
Return type: str

olx.offer.get_surface(offer_markup)

Searches for the surface area in the offer markup

Parameters: offer_markup (str) – Class "offerbody" from offer page markup
Returns: Surface area
Return type: float
Except: If the offer has no surface information, returns None
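A minimal sketch of the surface parsing, assuming the surface appears in the offer text as something like "45,5 m²" or "38 m2". This format is hypothetical, since the real markup is not described here. parse_surface and SURFACE_RE are hypothetical names; like get_surface, the sketch returns None when no surface is found.

```python
import re
from typing import Optional

# Hypothetical pattern: a number with an optional decimal comma or dot, then "m²" or "m2".
SURFACE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*m(?:²|2)")


def parse_surface(text: str) -> Optional[float]:
    """Return the first surface found in text as a float, or None."""
    match = SURFACE_RE.search(text)
    if match is None:
        return None
    # Polish listings use a decimal comma, so normalise it before converting.
    return float(match.group(1).replace(",", "."))


print(parse_surface("Powierzchnia: 45,5 m²"))  # 45.5
print(parse_surface("Brak danych"))            # None
```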

olx.offer.get_title(offer_markup)

Searches for the offer title on the offer page

Parameters: offer_markup (str) – Class "offerbody" from offer page markup
Returns: Title of the offer
Return type: str

olx.offer.parse_description(offer_markup)

Searches for the description in the offer markup

Parameters: offer_markup (str) – Body from offer page markup
Returns: Description of the offer
Return type: str

olx.offer.parse_flat_data(offer_markup)

Parses flat data from the Google Tag Manager script

The data includes whether the offer is private or business, the floor number, the number of rooms, the build type and whether the flat is furnished.

Parameters: offer_markup (str) – Body from offer page markup
Returns: Dictionary of flat data
Return type: dict

olx.offer.parse_offer(markup, url)

Parses data from the offer page markup

Parameters:
  • markup (str) – Offer page markup
  • url (str) – URL of the current offer page
Returns: Dictionary with all offer details
Return type: dict

olx.offer.parse_price(offer_markup)

Searches for the price on the offer page

Parameters: offer_markup (str) – Head from offer page
Returns: Tuple of the price (int) and its currency
Return type: tuple
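A minimal sketch of splitting a price label into the (int price, currency) tuple described above, assuming a hypothetical label format such as "2 500 zł", where digit groups may be separated by ordinary or non-breaking spaces. split_price and PRICE_RE are hypothetical names; labels without a leading amount yield None.

```python
import re
from typing import Optional, Tuple

# Hypothetical label format: an amount whose digit groups may be separated by
# whitespace (including non-breaking spaces), followed by a currency label.
PRICE_RE = re.compile(r"^\s*(\d[\d\s]*)\s*(\D*?)\s*$")


def split_price(label: str) -> Optional[Tuple[int, str]]:
    """Hypothetical sketch: turn '2 500 zł' into (2500, 'zł')."""
    match = PRICE_RE.match(label)
    if match is None:
        return None
    # Drop the thousands separators before converting the amount.
    amount = int(re.sub(r"\s", "", match.group(1)))
    return amount, match.group(2)


print(split_price("2 500 zł"))  # (2500, 'zł')
```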

olx.offer.parse_region(offer_markup)

Parses region information

Parameters: offer_markup

Utils methods

olx.utils.city_name(city)

Creates a valid OLX URL city name

An OLX URL city name cannot include Polish characters or upper-case letters, and whitespace must be replaced with dashes.

Parameters: city (str) – City name not in OLX URL format
Returns: Valid OLX URL city name
Return type: str
Example:

>>> city_name("Ruda Śląska")
'ruda-slaska'
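The rules above (lower case, Polish characters transliterated, whitespace replaced with dashes) can be sketched as follows. olx_city_name is a hypothetical name and the transliteration table is ours; pyolx may handle further cases, such as punctuation, differently.

```python
# Map each lower-case Polish letter to its closest ASCII letter.
POLISH_TO_ASCII = str.maketrans("ąćęłńóśźż", "acelnoszz")


def olx_city_name(city: str) -> str:
    """Hypothetical sketch: convert a city name into OLX URL form."""
    # Lower-case first so upper-case Polish letters are covered by the table.
    lowered = city.lower().translate(POLISH_TO_ASCII)
    # Collapse any run of whitespace into a single dash.
    return "-".join(lowered.split())


print(olx_city_name("Ruda Śląska"))  # ruda-slaska
```

Lower-casing before translating keeps the table half as long, because "Ś" becomes "ś" before the lookup.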

olx.utils.flatten(container)

Flattens a list

Parameters: container (list) – List containing nested lists
Returns: List of the elements that were nested in container
Return type: list
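A minimal sketch, assuming one level of nesting (for example, a list of per-page URL lists), which itertools.chain.from_iterable handles directly. flatten_once is a hypothetical name. Whether pyolx's flatten also handles deeper nesting is not stated.

```python
from itertools import chain
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def flatten_once(container: Iterable[Iterable[T]]) -> List[T]:
    """Concatenate the inner lists of container into one list."""
    return list(chain.from_iterable(container))


print(flatten_once([["a", "b"], [], ["c"]]))  # ['a', 'b', 'c']
```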

olx.utils.get_content_for_url(url)

Connects to the given URL

If the environment variable DEBUG is True, the response for the URL is cached in the /var/temp directory.

Parameters: url (str) – Website URL
Returns: Response for the requested URL
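A minimal sketch of the DEBUG caching behaviour described above. To stay self-contained, it takes a hypothetical fetch callable instead of performing an HTTP request. get_content_cached and fake_fetch are hypothetical names, and the details are assumptions rather than pyolx's own: hashing the URL into an .html file name and treating DEBUG as enabled only when it equals the string "True".

```python
import hashlib
import os
import tempfile
from typing import Callable, List


def get_content_cached(url: str, fetch: Callable[[str], str],
                       cache_dir: str = "/var/temp") -> str:
    """Hypothetical sketch: return fetch(url), caching it on disk when DEBUG is "True"."""
    if os.environ.get("DEBUG") != "True":
        # Caching disabled: always call fetch (a real version would make an HTTP request).
        return fetch(url)
    # Hash the URL to get a file-system-safe cache file name.
    key = hashlib.sha1(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".html")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as cached:
            return cached.read()
    content = fetch(url)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as cached:
        cached.write(content)
    return content


# Usage: with DEBUG enabled, two requests for the same URL call fetch only once.
calls: List[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return "<html>offer</html>"


os.environ["DEBUG"] = "True"
with tempfile.TemporaryDirectory() as tmp:
    first = get_content_cached("https://www.olx.pl/", fake_fetch, cache_dir=tmp)
    second = get_content_cached("https://www.olx.pl/", fake_fetch, cache_dir=tmp)
print(len(calls))  # 1
```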

olx.utils.get_search_filter(filter_name, filter_value)

Generates a URL search filter

Parameters:
  • filter_name (str) – Filter name in OLX format, e.g. "[filter_float_price:from]". See olx.category.get_category for reference
  • filter_value – Valid value for the filter
Returns: Percent-encoded URL search filter
Return type: str

Example:

>>> get_search_filter("[filter_float_price:from]", 2000)
'search%5Bfilter_float_price%3Afrom%5D=2000'
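The percent-encoding in the example above matches what the standard library's urllib.parse.urlencode produces when the filter name is prefixed with "search". A minimal sketch under that assumption. search_filter is a hypothetical name, not pyolx's implementation.

```python
from urllib.parse import urlencode


def search_filter(filter_name: str, filter_value: object) -> str:
    """Hypothetical sketch: build one percent-encoded OLX search filter."""
    # urlencode quotes "[", "]" and ":" as %5B, %5D and %3A.
    return urlencode({"search" + filter_name: filter_value})


print(search_filter("[filter_float_price:from]", 2000))
# search%5Bfilter_float_price%3Afrom%5D=2000
```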

olx.utils.get_url(main_category, sub_category, detail_category, region, page=None, **filters)

Creates a URL for the given parameters

Parameters:
  • main_category (str) – Main category
  • sub_category (str) – Sub category
  • detail_category (str) – Detail category
  • region (str) – Region of search
  • page (int) – Page number
  • filters (dict) – Dictionary with additional filters. See olx.category.get_category for reference
Returns: URL for the given parameters
Return type: str
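A minimal sketch of how such a URL could be assembled. The path layout https://www.olx.pl/<main>/<sub>/<detail>/<city>/ and the "page" query parameter are assumptions about OLX's URL scheme, not taken from pyolx. build_search_url is a hypothetical name, and the region is expected to be in OLX URL form already (as produced by city_name).

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_search_url(main_category: str, sub_category: str,
                     detail_category: str, region: str,
                     page: Optional[int] = None, **filters: object) -> str:
    """Hypothetical sketch of an OLX search URL builder."""
    # Assumed path layout: one segment per category level, then the city.
    path = "/".join([main_category, sub_category, detail_category, region])
    query: Dict[str, object] = {"search" + name: value for name, value in filters.items()}
    if page is not None:
        query["page"] = page
    url = "https://www.olx.pl/" + path + "/"
    return url + "?" + urlencode(query) if query else url


print(build_search_url("nieruchomosci", "mieszkania", "wynajem", "gdansk",
                       page=2, **{"[filter_float_price:from]": 2000}))
# https://www.olx.pl/nieruchomosci/mieszkania/wynajem/gdansk/?search%5Bfilter_float_price%3Afrom%5D=2000&page=2
```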

olx.utils.replace_all(text, input_dict)

Replaces specific substrings in a string

Parameters:
  • text (str) – String containing the substrings to be replaced
  • input_dict (dict) – Dictionary mapping each substring to its replacement
Returns: String with the substrings replaced
Return type: str
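A minimal sketch, assuming the replacements are applied one after another in the dictionary's insertion order with str.replace, so a later replacement can act on the output of an earlier one. replace_each is a hypothetical name.

```python
from typing import Dict


def replace_each(text: str, input_dict: Dict[str, str]) -> str:
    """Apply every old -> new replacement from input_dict to text, in order."""
    for old, new in input_dict.items():
        text = text.replace(old, new)
    return text


print(replace_each("Ruda Śląska", {"Ś": "S", "ą": "a", " ": "-"}))  # Ruda-Slaska
```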
