Welcome to region’s documentation!

Region

This is a package for spatially-explicit regionalization. It is currently under active development.

Installation

Download & install

You can clone the github-repo and install region using:

git clone https://github.com/pysal/region.git
cd region
python3 setup.py

Check your installation

To see whether the installation was successful, start a Python 3 session and run

>>> import region

If this command doesn’t raise an ImportError, then region has been installed properly.

User Guide

Introduction

Regionalization or spatial clustering

Let’s assume we have a map of areas and each area has an attribute (or a set of attributes) attached to it. Let’s further assume we want to cluster the areas according to the attribute(s) into larger spatial units – let’s call these larger spatial units regions. If we used a conventional clustering algorithm like the K-Means, we would often end up with regions that are not spatially contiguous (that means the regions wouldn’t form a connected set of areas). Sometimes this is not what we want and we are rather interested in a clustering under contiguity restrictions. Other terms for this task are regionalization and spatial clustering.

This sort of clustering is often one of several steps when analysing geographical data. For example, it helps condense large amounts of data, thus facilitating calculations and improving the readability of visualizations.

region

The region package offers various clustering algorithms which can be used with Python 3. The algorithms work with a range of common representations of your map, this means you can use geopandas’ GeoDataFrames, PySAL’s W objects, networkx’ Graphs, sparse (adjacency) matrices, or simple dictionaries.

With the p-regions problem and the max-p-regions problem, region addresses different regionalization tasks.

The p-Regions Problem

The task

The p-regions problem is defined as a regionalization task where the number of regions (clusters) is known. This number is often denoted by p in the literature, hence the name of the problem. The goal is to construct regions with a large degree of similarity among a region’s areas and with large dissimilarities between regions. Furthermore, each region has to consist of spatially contiguous areas. This condition distinguishes regionalization algorithms from conventional clustering algorithms like the k-Means-algorithm.

There are many approaches to solving the p-regions problem. One way to solve it is to translate it into an mixed integer program and solve it exactly while another category of algorithms comprises heuristic approaches.

An example

Let’s have a look at an example. Assume we have a map of 3 x 3 areas. Each area has an attribute and the situation looks as follows:

 726.7 | 623.6 | 487.3
----------------------
 200.4 | 245.0 | 481.0
----------------------
 170.9 | 225.9 | 226.9

Furthermore, assume we want to cluster the areas into two regions. A possible solution could have all areas with an attribute value greater than 300 clustered into one region. The remaining areas would then form the other region. Here we see our nine areas again but this time each number denotes what region an area belongs to:

    1  |   1   |   1
----------------------
    0  |   0   |   1
----------------------
    0  |   0   |   0

Defining the above input data, calculating the solution and plotting two maps (with the color once indicating the areas’ attribute values and once their calculated region) takes only a few lines of code – check it out.

Exact methods

[DCM2011] describe three different ways to translate a p-regions problem into exact optimization models and call these approaches Flow, Order, and Tree respectively. While the objective function of these models ensures maximum homogeneity within the regions, the models’ constraints ensure that the resulting regions are spatially contiguous. Implementations for all of the three approaches are available through the region.p_regions.exact.PRegionsExact class. For detailed information on the three different methods please refer to [DCM2011].

Heuristic methods

While exact approaches ensure optimality, they can be rather slow in delivering the solution. That’s why – especially for clustering tasks including many areas – it may be preferable to use an heuristic approach. Many heuristic algorithms have been designed aiming for a high probability of a good solution while keeping the run times low.

This category of algorithms includes the AZP described in [OR1995]. One drawback of this algorithm is that it can get caught in a local optimum and thus may fail to reach the global optimum. That’s why [OR1995] also present three more algorithms which aim to improve results. These algorithms use simulated annealing, a basic tabu queue, or a reactive tabu queue. You can find the implementations in the region.p_regions.azp module. For detailed information on the four different algorithms please refer to [OR1995].

The Max-p-Regions Problem

The task

As the p-regions problem the max-p-regions problem is about finding a clustering which satisfies the spatial contiguity condition. A key difference is that in the max-p-regions problem there is no predefined number of regions. The primary goal is to find a clustering that has as many regions as possible. In order to avoid a clustering in which every area forms its own region there is another condition. It is stated by using so called spatial extensive attributes which each area has. The condition requires that the sum of these spatial extensive attributes reaches or exceeds a predefined threshold.

When the maximum number of regions is found, a second goal has to be met, namely to find the best clustering with the maximum number of regions. Here, optimality is defined by the areas’ attributes. (These attributes may – and in most applications will – be different from the spatial extensive attributes mentioned above.) The max-p-regions problem can be solved using either an exact or a heuristic approach. For a detailed description please refer to [DAR2012].

Exact methods

[DAR2012] shows a way to translate the max-p-regions problem into an integer programming problem. It is implemented in the region.max_p_regions.exact.MaxPRegionsExact class.

Heuristic methods

Since solving the problem exactly may be very slow, [DAR2012] also suggests an heuristic approach. This algorithm involves two steps - a so called construction phase and a local search phase. You can find the implementation in the region.max_p_regions.heuristics.MaxPRegionsHeu class. Please note that region’s implementation uses a modified heuristic p-regions-algorithm for the local search phase. The modification ensures the threshold condition on the spatial extensive attributes is met.

region - package overview

region package

Subpackages

region.max_p_regions package
Subpackages
region.max_p_regions.tests package
Submodules
region.max_p_regions.tests.data module

The data in this file describes the max-p-regions problem depicted in figure 2 on p. 402 in [DAR2012].

region.max_p_regions.tests.test_exact module
region.max_p_regions.tests.test_exact.test_dict_basic()
region.max_p_regions.tests.test_exact.test_dict_multi_attr()
region.max_p_regions.tests.test_exact.test_geodataframe_basic()
region.max_p_regions.tests.test_exact.test_geodataframe_multi_attr()
region.max_p_regions.tests.test_exact.test_graph_dict_basic()
region.max_p_regions.tests.test_exact.test_graph_dict_multi_attr()
region.max_p_regions.tests.test_exact.test_graph_str_basic()
region.max_p_regions.tests.test_exact.test_graph_str_multi_attr()
region.max_p_regions.tests.test_exact.test_scipy_sparse_matrix()
region.max_p_regions.tests.test_exact.test_scipy_sparse_matrix_multi_attr()
region.max_p_regions.tests.test_exact.test_w_basic()
region.max_p_regions.tests.test_exact.test_w_multi_attr()
region.max_p_regions.tests.test_heu module
region.max_p_regions.tests.test_heu.test_dict_basic()
region.max_p_regions.tests.test_heu.test_dict_multi_attr()
region.max_p_regions.tests.test_heu.test_geodataframe_basic()
region.max_p_regions.tests.test_heu.test_geodataframe_multi_attr()
region.max_p_regions.tests.test_heu.test_graph_dict_basic()
region.max_p_regions.tests.test_heu.test_graph_dict_multi_attr()
region.max_p_regions.tests.test_heu.test_graph_str_basic()
region.max_p_regions.tests.test_heu.test_graph_str_multi_attr()
region.max_p_regions.tests.test_heu.test_scipy_sparse_matrix()
region.max_p_regions.tests.test_heu.test_scipy_sparse_matrix_multi_attr()
region.max_p_regions.tests.test_heu.test_w_basic()
region.max_p_regions.tests.test_heu.test_w_multi_attr()
Submodules
region.max_p_regions.exact module
class region.max_p_regions.exact.MaxPRegionsExact

Bases: object

A class for solving the max-p-regions problem by transforming it into a mixed-integer-programming problem (MIP) as described in [DAR2012].

labels_

dict – Each key is an area and each value the region it has been assigned to.

fit(adj, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')

Alias for fit_from_scipy_sparse_matrix().

Solve the max-p-regions problem as MIP as described in [DAR2012].

The resulting region labels are assigned to the instance’s labels_ attribute.

Parameters:
  • adj (class:scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
  • attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering.
  • spatially_extensive_attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to ensuring the threshold condition.
  • threshold (numbers.Real or numpy.ndarray) – The lower bound for a region’s sum of spatially extensive attributes. The argument’s type is numbers.Real if there is only one spatially extensive attribute per area, otherwise it is a one-dimensional array with as many entries as there are spatially extensive attributes per area.
  • solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") –

    The solver to use. Unless the default solver is used, the user has to make sure that the specified solver is installed.

    • ”cbc” - the Cbc (Coin-or branch and cut) solver
    • ”cplex” - the CPLEX solver
    • ”glpk” - the GLPK (GNU Linear Programming Kit) solver
    • ”gurobi” - the Gurobi Optimizer
  • metric (str or function, default: "euclidean") – See the metric argument in region.util.get_metric_function().
fit_from_dict(neighbors_dict, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • neighbors_dict (dict) – Each key represents an area and each value is an iterable of neighbors of this area.
  • attr (dict) – A dict with the same keys as neighbors_dict and values representing the attributes for calculating homo-/heterogeneity. A value can be scalar (e.g. float or int) or a numpy.ndarray.
  • spatially_extensive_attr (dict) – A dict with the same keys as neighbors_dict and values representing the spatially extensive attribute (scalar or iterable of scalars). In the max-p-regions problem each region’s sum of spatially extensive attributes must be greater than a specified threshold. In case of iterables of scalars as dict-values all elements of the iterable have to fulfill the condition.
  • threshold (numbers.Real or numpy.ndarray) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • metric (str or function, default: "euclidean") – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
fit_from_geodataframe(gdf, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean', contiguity='rook')

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • gdf (GeoDataFrame) –
  • attr (str or list) – The clustering criteria (columns of the GeoDataFrame gdf) are specified as string (for one column) or list of strings (for multiple columns).
  • spatially_extensive_attr (str or list) – The name (str) or names (list of strings) of column(s) in gdf containing the spatially extensive attributes.
  • threshold (numbers.Real or numpy.ndarray) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • metric (str or function, default: "euclidean") – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • contiguity ({"rook", "queen"}, default: "rook") –

    Defines the contiguity relationship between areas. Possible contiguity definitions are:

    • ”rook” - Rook contiguity.
    • ”queen” - Queen contiguity.
fit_from_networkx(graph, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • graph (networkx.Graph) –
  • attr (str, list or dict) – If the clustering criteria are present in the networkx.Graph graph as node attributes, then they can be specified as a string (for one criterion) or as a list of strings (for multiple criteria). Alternatively, a dict can be used with each key being a node of the networkx.Graph graph and each value being the corresponding clustering criterion (a scalar (e.g. float or int) or a numpy.ndarray). If there are no clustering criteria present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument in fit_from_dict() for more details about the expected dict.
  • spatially_extensive_attr (str, list or dict) – If the spatially extensive attribute is present in the networkx.Graph graph as node attributes, then they can be specified as a string (for one attribute) or as a list of strings (for multiple attributes). Alternatively, a dict can be used with each key being a node of the networkx.Graph graph and each value being the corresponding spatially extensive attribute (a scalar (e.g. float or int) or a numpy.ndarray). If there are no spatially extensive attributes present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument in fit_from_dict() for more details about the expected dict.
  • threshold (numbers.Real or numpy.ndarray) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • metric (str or function, default: "euclidean") – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
fit_from_scipy_sparse_matrix(adj, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')

Solve the max-p-regions problem as MIP as described in [DAR2012].

The resulting region labels are assigned to the instance’s labels_ attribute.

Parameters:
  • adj (class:scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
  • attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering.
  • spatially_extensive_attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to ensuring the threshold condition.
  • threshold (numbers.Real or numpy.ndarray) – The lower bound for a region’s sum of spatially extensive attributes. The argument’s type is numbers.Real if there is only one spatially extensive attribute per area, otherwise it is a one-dimensional array with as many entries as there are spatially extensive attributes per area.
  • solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") –

    The solver to use. Unless the default solver is used, the user has to make sure that the specified solver is installed.

    • ”cbc” - the Cbc (Coin-or branch and cut) solver
    • ”cplex” - the CPLEX solver
    • ”glpk” - the GLPK (GNU Linear Programming Kit) solver
    • ”gurobi” - the Gurobi Optimizer
  • metric (str or function, default: "euclidean") – See the metric argument in region.util.get_metric_function().
fit_from_w(w, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
region.max_p_regions.heuristics module
class region.max_p_regions.heuristics.MaxPRegionsHeu(local_search=None, random_state=None)

Bases: object

assign_enclaves(partition, enclave_areas, neigh_dict, attr)

Start with a partial partition (not all areas are assigned to a region) and a list of enclave areas (i.e. areas not present in the partial partition). Then assign all enclave areas to regions in the partial partition and return the resulting partition.

Parameters:
  • partition (list) – Each element (of type set) represents a region.
  • enclave_areas (list) – Each element represents an area.
  • neigh_dict (dict) – Each key represents an area. Each value is an iterable of the corresponding neighbors.
  • attr (numpy.ndarray) – See the corresponding argument in fit_from_scipy_sparse_matrix().
Returns:

partition – Each element (of type set) represents a region.

Return type:

list

find_best_area(region, candidates, attr)
Parameters:
  • region (iterable) – Each element represents an area.
  • candidates (iterable) – Each element represents an area bordering on region.
  • attr (numpy.ndarray) – See the corresponding argument in fit_from_scipy_sparse_matrix().
Returns:

An element of candidates with minimal dissimilarity when being moved to the region region.

Return type:

best_area

find_best_region_idx(area, partition, candidate_regions_idx, attr)
Parameters:
  • area – The area to be moved to one of the regions specified by candidate_regions_idx.
  • partition (list) – Each element (of type set) represents a region.
  • candidate_regions_idx (iterable) – Each element is the index of a region in the partition list.
  • attr (numpy.ndarray) – See the corresponding argument in fit_from_scipy_sparse_matrix().
Returns:

best_idx – The index of a region (w.r.t. partition) which has the smallest sum of dissimilarities after area area is moved to the region.

Return type:

int

fit(adj, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Alias for fit_from_scipy_sparse_matrix().

Solve the max-p-regions problem in a heuristic way (see [DAR2012]).

The resulting region labels are assigned to the instance’s labels_ attribute.

Parameters:
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
  • attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering.
  • spatially_extensive_attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to ensuring the threshold condition.
  • threshold (numbers.Real or numpy.ndarray) – The lower bound for a region’s sum of spatially extensive attributes. The argument’s type is numbers.Real if there is only one spatially extensive attribute per area, otherwise it is a one-dimensional array with as many entries as there are spatially extensive attributes per area.
  • max_it (int, default: 10) – The maximum number of partitions produced in the algorithm’s construction phase.
  • objective_func (region.objective_function.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – The objective function to use.
fit_from_dict(neighbors_dict, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Solve the max-p-regions problem in a heuristic way (see [DAR2012]).

The resulting region labels are assigned to the instance’s labels_ attribute.

Parameters:
  • neighbors_dict (dict) – Each key represents an area and each value is an iterable of neighbors of this area.
  • attr (dict) – A dict with the same keys as neighbors_dict and values representing the attributes for calculating homo-/heterogeneity. A value can be scalar (e.g. float or int) or a numpy.ndarray.
  • spatially_extensive_attr (dict) – A dict with the same keys as neighbors_dict and values representing the spatially extensive attribute (scalar or iterable of scalars). In the max-p-regions problem each region’s sum of spatially extensive attributes must be greater than a specified threshold. In case of iterables of scalars as dict-values all elements of the iterable have to fulfill the condition.
  • threshold (numbers.Real or numpy.ndarray) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • max_it (int, default: 10) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • objective_func (region.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
fit_from_geodataframe(gdf, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>, contiguity='rook')

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • gdf (geopandas.GeoDataFrame) –
  • attr (str or list) – The clustering criteria (columns of the GeoDataFrame gdf) are specified as string (for one column) or list of strings (for multiple columns).
  • spatially_extensive_attr (str or list) – The name (str) or names (list of strings) of column(s) in gdf containing the spatially extensive attributes.
  • threshold (numbers.Real or numpy.ndarray) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • max_it (int, default: 10) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • objective_func (region.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • contiguity ({"rook", "queen"}, default: "rook") –

    Defines the contiguity relationship between areas. Possible contiguity definitions are:

    • ”rook” - Rook contiguity.
    • ”queen” - Queen contiguity.
fit_from_networkx(graph, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • graph (networkx.Graph) –
  • attr (str, list or dict) – If the clustering criteria are present in the networkx.Graph graph as node attributes, then they can be specified as a string (for one criterion) or as a list of strings (for multiple criteria). Alternatively, a dict can be used with each key being a node of the networkx.Graph graph and each value being the corresponding clustering criterion (a scalar (e.g. float or int) or a numpy.ndarray). If there are no clustering criteria present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument in fit_from_dict() for more details about the expected dict.
  • spatially_extensive_attr (str, list or dict) – If the spatially extensive attribute is present in the networkx.Graph graph as node attributes, then they can be specified as a string (for one attribute) or as a list of strings (for multiple attributes). Alternatively, a dict can be used with each key being a node of the networkx.Graph graph and each value being the corresponding spatially extensive attribute (a scalar (e.g. float or int) or a numpy.ndarray). If there are no spatially extensive attributes present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument in fit_from_dict() for more details about the expected dict.
  • threshold (numbers.Real or numpy.ndarray) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • max_it (int, default: 10) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • objective_func (region.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
fit_from_scipy_sparse_matrix(adj, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Solve the max-p-regions problem in a heuristic way (see [DAR2012]).

The resulting region labels are assigned to the instance’s labels_ attribute.

Parameters:
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
  • attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering.
  • spatially_extensive_attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to ensuring the threshold condition.
  • threshold (numbers.Real or numpy.ndarray) – The lower bound for a region’s sum of spatially extensive attributes. The argument’s type is numbers.Real if there is only one spatially extensive attribute per area, otherwise it is a one-dimensional array with as many entries as there are spatially extensive attributes per area.
  • max_it (int, default: 10) – The maximum number of partitions produced in the algorithm’s construction phase.
  • objective_func (region.objective_function.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – The objective function to use.
fit_from_w(w, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • w (libpysal.weights.weights.W) – W object representing the contiguity relation.
  • attr (numpy.ndarray) – Each element specifies an area’s attribute which is used for calculating the objective function.
  • spatially_extensive_attr (numpy.ndarray) – Each element specifies an area’s spatially extensive attribute which is used to ensure that the sum of spatially extensive attributes in each region adds up to a threshold defined by the threshold argument.
  • threshold (numbers.Real or numpy.ndarray) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • max_it (int, default: 10) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • objective_func (region.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
grow_regions(adj, attr, spatially_extensive_attr, threshold)
Parameters:
Returns:

resultresult[0] is a list. Each list element is a set of a region’s areas. Note that not every area is assigned to a region after this function has terminated, so they won’t be in any of the set`s in `result[0]. result[1] is a list of areas not assigned to any region.

Return type:

tuple

region.p_regions package
Subpackages
region.p_regions.tests package
Submodules
region.p_regions.tests.data module

The data in this file describes the p-regions problem depicted in figure 1 on p. 106 in [DCM2011].

region.p_regions.tests.test_azp module
region.p_regions.tests.test_azp.test_dict()
region.p_regions.tests.test_azp.test_dict_multi_attr()
region.p_regions.tests.test_azp.test_geodataframe()
region.p_regions.tests.test_azp.test_geodataframe_multi_attr()
region.p_regions.tests.test_azp.test_graph_dict_basic()
region.p_regions.tests.test_azp.test_graph_dict_multi_attr()
region.p_regions.tests.test_azp.test_graph_str_basic()
region.p_regions.tests.test_azp.test_graph_str_multi_attr()
region.p_regions.tests.test_azp.test_scipy_sparse_matrix()
region.p_regions.tests.test_azp.test_scipy_sparse_matrix_multi_attr()
region.p_regions.tests.test_azp.test_w_basic()
region.p_regions.tests.test_azp.test_w_multi_attr()
region.p_regions.tests.test_azp_basic_tabu module
region.p_regions.tests.test_azp_basic_tabu.test_dict()
region.p_regions.tests.test_azp_basic_tabu.test_dict_multi_attr()
region.p_regions.tests.test_azp_basic_tabu.test_geodataframe()
region.p_regions.tests.test_azp_basic_tabu.test_geodataframe_multi_attr()
region.p_regions.tests.test_azp_basic_tabu.test_graph_dict_basic()
region.p_regions.tests.test_azp_basic_tabu.test_graph_dict_multi_attr()
region.p_regions.tests.test_azp_basic_tabu.test_graph_str_basic()
region.p_regions.tests.test_azp_basic_tabu.test_graph_str_multi_attr()
region.p_regions.tests.test_azp_basic_tabu.test_scipy_sparse_matrix()
region.p_regions.tests.test_azp_basic_tabu.test_scipy_sparse_matrix_multi_attr()
region.p_regions.tests.test_azp_basic_tabu.test_w_basic()
region.p_regions.tests.test_azp_basic_tabu.test_w_multi_attr()
region.p_regions.tests.test_azp_reactive_tabu module
region.p_regions.tests.test_azp_reactive_tabu.test_dict()
region.p_regions.tests.test_azp_reactive_tabu.test_dict_multi_attr()
region.p_regions.tests.test_azp_reactive_tabu.test_geodataframe()
region.p_regions.tests.test_azp_reactive_tabu.test_geodataframe_multi_attr()
region.p_regions.tests.test_azp_reactive_tabu.test_graph_dict_basic()
region.p_regions.tests.test_azp_reactive_tabu.test_graph_dict_multi_attr()
region.p_regions.tests.test_azp_reactive_tabu.test_graph_str_basic()
region.p_regions.tests.test_azp_reactive_tabu.test_graph_str_multi_attr()
region.p_regions.tests.test_azp_reactive_tabu.test_scipy_sparse_matrix()
region.p_regions.tests.test_azp_reactive_tabu.test_scipy_sparse_matrix_multi_attr()
region.p_regions.tests.test_azp_reactive_tabu.test_w_basic()
region.p_regions.tests.test_azp_reactive_tabu.test_w_multi_attr()
region.p_regions.tests.test_azp_simulated_annealing module
region.p_regions.tests.test_azp_simulated_annealing.test_dict()
region.p_regions.tests.test_azp_simulated_annealing.test_dict_multi_attr()
region.p_regions.tests.test_azp_simulated_annealing.test_geodataframe()
region.p_regions.tests.test_azp_simulated_annealing.test_geodataframe_multi_attr()
region.p_regions.tests.test_azp_simulated_annealing.test_graph_dict_basic()
region.p_regions.tests.test_azp_simulated_annealing.test_graph_dict_multi_attr()
region.p_regions.tests.test_azp_simulated_annealing.test_graph_str_basic()
region.p_regions.tests.test_azp_simulated_annealing.test_graph_str_multi_attr()
region.p_regions.tests.test_azp_simulated_annealing.test_scipy_sparse_matrix()
region.p_regions.tests.test_azp_simulated_annealing.test_scipy_sparse_matrix_multi_attr()
region.p_regions.tests.test_azp_simulated_annealing.test_w_basic()
region.p_regions.tests.test_azp_simulated_annealing.test_w_multi_attr()
region.p_regions.tests.test_exact module
region.p_regions.tests.test_exact.method(request)
region.p_regions.tests.test_exact.test_dict(method)
region.p_regions.tests.test_exact.test_dict_multi_attr(method)
region.p_regions.tests.test_exact.test_geodataframe(method)
region.p_regions.tests.test_exact.test_geodataframe_multi_attr(method)
region.p_regions.tests.test_exact.test_graph_dict_basic(method)
region.p_regions.tests.test_exact.test_graph_dict_multi_attr(method)
region.p_regions.tests.test_exact.test_graph_str_basic(method)
region.p_regions.tests.test_exact.test_graph_str_multi_attr(method)
region.p_regions.tests.test_exact.test_scipy_sparse_matrix(method)
region.p_regions.tests.test_exact.test_scipy_sparse_matrix_multi_attr(method)
region.p_regions.tests.test_exact.test_w_basic(method)
region.p_regions.tests.test_exact.test_w_multi_attr(method)
Submodules
region.p_regions.azp module
class region.p_regions.azp.AZP(allow_move_strategy=None, random_state=None)

Bases: object

Class offering the implementation of the AZP algorithm (see [OR1995]).

labels_

numpy.ndarray – Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.

fit(adj, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Alias for fit_from_scipy_sparse_matrix().

Perform the AZP algorithm as described in [OR1995].

The resulting region labels are assigned to the instance’s labels_ attribute.

Parameters:
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the contiguity relation.
  • attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering.
  • n_regions (int) – Number of desired regions.
  • initial_labels (numpy.ndarray or None, default: None) – One-dimensional array of labels at the beginning of the algorithm. If None, then a random initial clustering will be generated.
  • objective_func (region.objective_function.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – The objective function to use.
fit_from_dict(neighbor_dict, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • neighbor_dict (dict) – Each key is an area and each value is an iterable of the key area’s neighbors.
  • attr (dict) – Each key is an area and each value is the corresponding clustering-relevant attribute.
  • n_regions (int) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • initial_labels (dict or None, default: None) – Each key represents an area. Each value represents the region, the corresponding area is assigned to at the beginning of the algorithm. If None, then a random initial clustering will be generated.
  • objective_func (region.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
fit_from_geodataframe(gdf, attr, n_regions, contiguity='rook', initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • gdf (geopandas.GeoDataFrame) –
  • attr (str or list) – The clustering-relevant attributes (columns of the GeoDataFrame gdf) are specified as string (for one column) or list of strings (for multiple columns).
  • n_regions (int) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • contiguity ({"rook", "queen"}, default: "rook") –

    Defines the contiguity relationship between areas. Possible contiguity definitions are:

    • ”rook” - Rook contiguity.
    • ”queen” - Queen contiguity.
  • initial_labels (numpy.ndarray or None, default: None) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • objective_func (region.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
fit_from_networkx(graph, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • graph (networkx.Graph) – Graph representing the contiguity relation.
  • attr (str, list or dict) – If the clustering criteria are present in the networkx.Graph graph as node attributes, then they can be specified as a string (for one criterion) or as a list of strings (for multiple criteria). Alternatively, a dict can be used with each key being a node of the networkx.Graph graph and each value being the corresponding clustering criterion (a scalar (e.g. float or int) or a numpy.ndarray). If there are no clustering criteria present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument in fit_from_dict() for more details about the expected dict.
  • n_regions (int) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
  • initial_labels (str or dict or None, default: None) – If str, then the string names the graph’s attribute holding the information about the initial clustering. If dict, then each key is a node and each value is the region the key area is assigned to at the beginning of the algorithm. If None, then a random initial clustering will be generated.
  • objective_func (region.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument in fit_from_scipy_sparse_matrix().
fit_from_scipy_sparse_matrix(adj, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Perform the AZP algorithm as described in [OR1995].

The resulting region labels are assigned to the instance’s labels_ attribute.

Parameters:
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the contiguity relation.
  • attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering.
  • n_regions (int) – Number of desired regions.
  • initial_labels (numpy.ndarray or None, default: None) – One-dimensional array of labels at the beginning of the algorithm. If None, then a random initial clustering will be generated.
  • objective_func (region.objective_function.ObjectiveFunction, default: ObjectiveFunctionPairwise()) – The objective function to use.
fit_from_w(w, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
class region.p_regions.azp.AZPBasicTabu(tabu_length=None, repetitions_before_termination=5, random_state=None)

Bases: region.p_regions.azp.AZPTabu

Implementation of the AZP with basic tabu (refer to [OR1995]).

labels_

numpy.ndarray – Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.

class region.p_regions.azp.AZPReactiveTabu(max_iterations, k1, k2, random_state=None)

Bases: region.p_regions.azp.AZPTabu

Implementation of the AZP with reactive tabu (refer to [OR1995]).

labels_

numpy.ndarray – Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.

class region.p_regions.azp.AZPSimulatedAnnealing(init_temperature=None, max_iterations=inf, sa_moves_term=10, nonmoving_steps_before_stop=3, repetitions_before_termination=5, random_state=None)

Bases: object

Class offering the implementation of the AZP-SA algorithm (see [OR1995]).

labels_

numpy.ndarray – Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.

fit_from_dict(neighbor_dict, attr, n_regions, initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)
Parameters:
fit_from_geodataframe(gdf, attr, n_regions, contiguity='rook', initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)
Parameters:
fit_from_networkx(graph, attr, n_regions, initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)
Parameters:
fit_from_scipy_sparse_matrix(adj, attr, n_regions, initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)
Parameters:
fit_from_w(w, attr, n_regions, initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)
Parameters:
move_made_alert()
sa_moves_alert()
class region.p_regions.azp.AZPTabu(allow_move_strategy=None, random_state=None)

Bases: region.p_regions.azp.AZP, abc.ABC

Superclass for tabu variants of the AZP.

reset_tabu(tabu_len=None)
region.p_regions.azp_util module
class region.p_regions.azp_util.AllowMoveAZP

Bases: region.p_regions.azp_util.AllowMoveStrategy

class region.p_regions.azp_util.AllowMoveAZPMaxPRegions(spatially_extensive_attr, threshold, decorated_strategy)

Bases: region.p_regions.azp_util.AllowMoveStrategy

Ensures that the spatially extensive attribute adds up to a given threshold in each region. Only moves preserving this condition in both the donor as well as the recipient region are allowed. The check for the recipient region is necessary in case there is an area with a negative spatially extensive attribute.

start_new_component(initial_labels, attr, objective_func, comp_idx)
class region.p_regions.azp_util.AllowMoveAZPSimulatedAnnealing(init_temperature, sa_moves_term=inf)

Bases: region.p_regions.azp_util.AllowMoveStrategy

notify_min_sa_moves()
notify_move_made()
register_move_made(observer_func)
Parameters:observer_func (callable) – A function to call when a move is allowed.
register_sa_moves_term(observer_func)
Parameters:observer_func (callable) – A function to call when the certain number of SA-moves has been reached. This number is called Q in [OR1995] (page 431).
reset()
update_temperature(temp)
class region.p_regions.azp_util.AllowMoveStrategy

Bases: abc.ABC

start_new_component(initial_labels, attr, objective_func, comp_idx)

This method should be called whenever a new connected component is clustered.

Parameters:
  • initial_labels (numpy.ndarray) – The region labels of the areas in the currently considered connected component. Shape: number of areas in the currently considered component.
  • attr (numpy.ndarray) – The areas’ attributes. Shape: number of areas in the currently considered component.
  • objective_func (region.objective_function.ObjectiveFunction) – The objective function to use.
  • comp_idx (numpy.ndarray) – The indices of those areas belonging to the component clustered next.
region.p_regions.exact module
class region.p_regions.exact.PRegionsExact

Bases: object

A class for solving the p-regions problem by transforming it into a mixed-integer-programming problem (MIP) as described in [DCM2011].

labels_

numpy.ndarray – Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.

method

str – The method used in the last call of a fit-method for translating the p-regions problem into an MIP.

metric

function – The distance metric specified in the last call of a fit-method.

n_regions

int – The number of regions the areas were clustered into by the last run of a fit-method.

solver

str – The solver used in the last call of a fit-method.

fit(adj, attr, n_regions, method='flow', solver='cbc', metric='euclidean')

Alias for fit_from_scipy_sparse_matrix().

Solve the p-regions problem as MIP as described in [DCM2011].

The resulting region labels are assigned to the instance’s labels_ attribute.

Parameters:
  • adj (class:scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
  • attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering.
  • n_regions (int) – Number of desired regions.
  • method ({"flow", "order", "tree"}, default: "flow") –

    The method to translate the clustering problem into an exact optimization model.

    • ”flow” - Flow model on p. 112-113 in [DCM2011]
    • ”order” - Order model on p. 110-112 in [DCM2011]
    • ”tree” - Tree model on p. 108-110 in [DCM2011]
  • solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") –

    The solver to use. Unless the default solver is used, the user has to make sure that the specified solver is installed.

    • ”cbc” - the Cbc (Coin-or branch and cut) solver
    • ”cplex” - the CPLEX solver
    • ”glpk” - the GLPK (GNU Linear Programming Kit) solver
    • ”gurobi” - the Gurobi Optimizer
  • metric (str or function, default: "euclidean") – See the metric argument in region.util.get_metric_function().
fit_from_dict(neighbors_dict, attr, n_regions, method='flow', solver='cbc', metric='euclidean')

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
fit_from_geodataframe(gdf, attr, n_regions, method='flow', solver='cbc', metric='euclidean', contiguity='rook')

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • gdf (GeoDataFrame) –
  • attr (str or list) – The clustering criteria (columns of the GeoDataFrame gdf) are specified as string (for one column) or list of strings (for multiple columns).
  • n_regions (int) – See the corresponding argument in fit_from_scipy_sparse_matrix().
  • method (str) – See the corresponding argument in fit_from_scipy_sparse_matrix().
  • solver (str) – See the corresponding argument in fit_from_scipy_sparse_matrix().
  • contiguity ({"rook", "queen"}, default: "rook") –

    Defines the contiguity relationship between areas. Possible contiguity definitions are:

    • ”rook” - Rook contiguity.
    • ”queen” - Queen contiguity.
  • metric (str or function, default: "euclidean") – See the corresponding argument in fit_from_scipy_sparse_matrix().
fit_from_networkx(graph, attr, n_regions, method='flow', solver='cbc', metric='euclidean')

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
  • graph (networkx.Graph) – Graph representing the areas’ contiguity relation.
  • attr (str, list or dict) – If the clustering criteria are present in the networkx.Graph graph as node attributes, then they can be specified as a string (for one criterion) or as a list of strings (for multiple criteria). Alternatively, a dict can be used with each key being a node of the networkx.Graph graph and each value being the corresponding clustering criterion (a scalar (e.g. float or int) or a numpy.ndarray). If there are no clustering criteria present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument in fit_from_dict() for more details about the expected dict.
  • n_regions (int) – See the corresponding argument in fit_from_scipy_sparse_matrix().
  • method (str) – See the corresponding argument in fit_from_scipy_sparse_matrix().
  • solver (str) – See the corresponding argument in fit_from_scipy_sparse_matrix().
  • metric (str or function, default: "euclidean") – See the corresponding argument in fit_from_scipy_sparse_matrix().
fit_from_scipy_sparse_matrix(adj, attr, n_regions, method='flow', solver='cbc', metric='euclidean')

Solve the p-regions problem as MIP as described in [DCM2011].

The resulting region labels are assigned to the instance’s labels_ attribute.

Parameters:
  • adj (class:scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
  • attr (numpy.ndarray) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering.
  • n_regions (int) – Number of desired regions.
  • method ({"flow", "order", "tree"}, default: "flow") –

    The method to translate the clustering problem into an exact optimization model.

    • ”flow” - Flow model on p. 112-113 in [DCM2011]
    • ”order” - Order model on p. 110-112 in [DCM2011]
    • ”tree” - Tree model on p. 108-110 in [DCM2011]
  • solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") –

    The solver to use. Unless the default solver is used, the user has to make sure that the specified solver is installed.

    • ”cbc” - the Cbc (Coin-or branch and cut) solver
    • ”cplex” - the CPLEX solver
    • ”glpk” - the GLPK (GNU Linear Programming Kit) solver
    • ”gurobi” - the Gurobi Optimizer
  • metric (str or function, default: "euclidean") – See the metric argument in region.util.get_metric_function().
fit_from_w(w, attr, n_regions, method='flow', solver='cbc', metric='euclidean')

Alternative API for fit_from_scipy_sparse_matrix().

Parameters:
region.tests package
Submodules
region.tests.test_util module
region.tests.test_util.all_elements_equal(iterable)
region.tests.test_util.not_all_elements_equal(iterable)
region.tests.test_util.test_AZP_azp_connected_component__one_area()
region.tests.test_util.test_assert_feasible__contiguity()
region.tests.test_util.test_assert_feasible__number_of_regions()
region.tests.test_util.test_assert_feasible__pass_connected()
region.tests.test_util.test_assert_feasible__pass_disconnected()
region.tests.test_util.test_pop_randomly_from()
region.tests.test_util.test_random_element_from()
region.tests.util module
region.tests.util.compare_region_lists(actual, desired)
Parameters:
  • actual (list) – Every element (of type set) represents a region.
  • desired (list) – Every element (of type set) represents a region.
Raises:

AssertionError – If the two arguments don’t represent the same clustering.

region.tests.util.convert_from_geodataframe(gdf)

Convert a GeoDataFrame to other types representing the contiguity relation of the GeoDataFrame’s areas.

Parameters:gdf (GeoDataFrame) –
Returns:other_formats – The 1st entry is a sparse adjacency matrix of type scipy.sparse.csr_matrix. The 2nd entry is a networkx graph. The 3rd entry is a dict. Each key is an area and each value is an iterable of the key area’s neighbors. The 4th entry is a PySAL W object.
Return type:tuple
region.tests.util.region_list_from_array(labels)
Parameters:labels (numpy.ndarray) – One-dimensional array of the areas’ region labels.
Returns:region_list – Each element is a set of areas belonging to the same region. The list’s elements (the sets) are sorted by their smallest entry.
Return type:list

Examples

>>> import numpy as np
>>> labels = np.array([0, 0, 0,
...                    1, 1, 0,
...                    1, 1, 1])
>>> desired = [{0, 1, 2, 5}, {3, 4, 6, 7, 8}]
>>> all(r1 == r2
...     for r1, r2 in zip(region_list_from_array(labels), desired))
True

Submodules

region.csgraph_utils module

Utility functions for graph-related operations with sparse adjacency matrices (scipy.sparse.csr_matrix) using and supplementing scipy’s [compressed sparse graph routines]( https://docs.scipy.org/doc/scipy/reference/sparse.csgraph.html).

region.csgraph_utils.is_connected(adj)
Parameters:adj (scipy.sparse.csr_matrix) – Adjacency matrix.
Returns:connectedTrue if graph defined by adjecency matrix adj is connected. False otherwise.
Return type:bool

Examples

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> connected = csr_matrix(np.array([[0, 1],
...                                  [1, 0]]))
>>> is_connected(connected)
True
>>> disconnected = csr_matrix(np.array([[0, 0],
...                                     [0, 0]]))
>>> is_connected(disconnected)
False
region.csgraph_utils.neighbors(adj, area)
Parameters:
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix.
  • area (int) – An area specified as index in the graph represented by the adjacency matrix adj. The integer must be in {0, 1, …, adj.shape[0]-1}.
Returns:

neighs – The neighbors of area.

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> adjacency_matrix = csr_matrix(np.array([[0, 1, 1],
...                                         [1, 0, 0],
...                                         [1, 0, 0]]))
>>> (neighbors(adjacency_matrix, 0) == np.array([1, 2])).all()
True
>>> (neighbors(adjacency_matrix, 1) == np.array([0])).all()
True
region.csgraph_utils.sub_adj_matrix(adj, nodes, wo_nodes=None)
Parameters:
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix.
  • nodes (numpy.ndarray) – 1-dimensional array of nodes in the graph represented by the adj argument. The elements in nodes are (distinct) integers in {0, 1, …, nodes.shape[0]}.
  • wo_nodes (slice, int, array of ints, or None) – Nodes to neglect from the nodes argument.
Returns:

sub_adj – Adjacency matrix of the subgraph consisting of only the nodes in the nodes argument.

Return type:

scipy.sparse.csr_matrix

Examples

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> adjacency_matrix = csr_matrix(np.array([[0, 1, 1],
...                                         [1, 0, 0],
...                                         [1, 0, 0]]))
>>> nodes = np.array([0, 2])
>>> obtained = sub_adj_matrix(adjacency_matrix, nodes)
>>> desired = np.array([[0, 1],
...                     [1, 0]])
>>> (obtained.todense() == desired).all()
True
>>> nodes = np.array([1, 2])
>>> obtained = sub_adj_matrix(adjacency_matrix, nodes)
>>> desired = np.array([[0, 0],
...                     [0, 0]])
>>> (obtained.todense() == desired).all()
True
>>> type(obtained) == csr_matrix
True
>>> # tests for `wo_nodes` argument
>>> all_nodes = np.arange(adjacency_matrix.shape[0])
>>> neglected_nodes = np.array([1])
>>> obtained = sub_adj_matrix(adjacency_matrix, all_nodes, neglected_nodes)
>>> desired = np.array([[0, 1],
...                     [1, 0]])
>>> (obtained.todense() == desired).all()
True
region.objective_function module
class region.objective_function.ObjectiveFunction(metric=None)

Bases: abc.ABC

update(moving_area, recipient_region, labels, attr)

Calculate the difference in the objective value caused by moving moving_area to recipient_region.

Parameters:
  • moving_area (int) – The area to move.
  • recipient_region (int) – The recipient region.
  • labels (numpy.ndarray) – The areas’ region labels before the move. Shape: number of areas.
  • attr (numpy.ndarray) – The areas’ attributes. Shape: number of areas.
Returns:

diff – The change in the objective function caused by moving moving_area to recipient_region.

Return type:

float

class region.objective_function.ObjectiveFunctionCenter(metric=None, center=<function mean>, reduction=<function sum>)

Bases: region.objective_function.ObjectiveFunction

update(moving_area, recipient_region, labels, attr)
class region.objective_function.ObjectiveFunctionPairwise(metric=None)

Bases: region.objective_function.ObjectiveFunction

update(moving_area, recipient_region, labels, attr)
region.util module
exception region.util.MissingMetric

Bases: RuntimeError

Raised when a distance metric is required but was not set.

region.util.Move

alias of move

region.util.all_elements_equal(array)
region.util.array_from_df_col(df, attr)

Extract one or more columns from a DataFrame as numpy array.

Parameters:
  • df (Union[DataFrame, GeoDataFrame]) –
  • attr (Union[str, Sequence[str]]) – The columns’ names to extract.
Returns:

col – The specified column(s) of the array.

Return type:

numpy.ndarray

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"col1": [1, 2, 3],
...                    "col2": [7, 8, 9]})
>>> (array_from_df_col(df, "col1") == np.array([[1],
...                                         [2],
...                                         [3]])).all()
True
>>> (array_from_df_col(df, ["col1"]) == np.array([[1],
...                                           [2],
...                                           [3]])).all()
True
>>> (array_from_df_col(df, ["col1", "col2"]) == np.array([[1, 7],
...                                                   [2, 8],
...                                                   [3, 9]])).all()
True
region.util.array_from_dict_values(dct, sorted_keys=None, flat_output=False, dtype=<class 'float'>)

Return values of the dictionary passed as dct argument as an numpy array. The values in the returned array are sorted by the keys of dct.

Parameters:
  • dct (dict) –
  • sorted_keys (iterable, optional) – If passed, then the elements of the returned array will be sorted by this argument. Thus, this argument can be passed to suppress the sorting, or for getting a subset of the dictionary’s values or to get repeated values.
  • flat_output (bool, default: False) – If True, the returned array will be one-dimensional. If False, the returned array will be two-dimensional with one row per key in dct.
  • dtype (default: np.float64) – The dtype of the returned array.
Returns:

array

Return type:

numpy.ndarray

Examples

>>> dict_flat = {0: 0, 1: 10}
>>> dict_it = {0: [0], 1: [10]}
>>> desired_flat = np.array([0, 10])
>>> desired_2d = np.array([[0],
...                        [10]])
>>> flat_flat = array_from_dict_values(dict_flat, flat_output=True)
>>> (flat_flat == desired_flat).all()
True
>>> flat_2d = array_from_dict_values(dict_flat)
>>> (flat_2d == desired_2d).all()
True
>>> it_flat = array_from_dict_values(dict_it, flat_output=True)
>>> (it_flat == desired_flat).all()
True
>>> it_2d = array_from_dict_values(dict_it)
>>> (it_2d == desired_2d).all()
True
region.util.array_from_graph(graph, attr)
Parameters:
  • graph (networkx.Graph) –
  • attr (str or iterable) – If str, then it specifies the an attribute of the graph’s nodes. If iterable of strings, then multiple attributes of the graph’s nodes are specified.
Returns:

array – Array with one row for each node in graph.

Return type:

numpy.ndarray

Examples

>>> import networkx as nx
>>> edges = [(0, 1), (1, 2),          # 0 | 1 | 2
...          (0, 3), (1, 4), (2, 5),  # ---------
...          (3, 4), (4,5)]           # 3 | 4 | 5
>>> graph = nx.Graph(edges)
>>> data_dict = {node: 10*node for node in graph}
>>> nx.set_node_attributes(graph, "test_data", data_dict)
>>> desired = np.array([[0],
...                     [10],
...                     [20],
...                     [30],
...                     [40],
...                     [50]])
>>> (array_from_graph(graph, "test_data") == desired).all()
True
>>> (array_from_graph(graph, ["test_data"]) == desired).all()
True
>>> (array_from_graph(graph, ["test_data", "test_data"]) ==
...  np.hstack((desired, desired))).all()
True
region.util.array_from_graph_or_dict(graph, attr)
region.util.array_from_region_list(region_list)
Parameters:region_list (list) – Each list element is an iterable of a region’s areas.
Returns:labels – Each element specifies the region of the corresponding area.
Return type:numpy.ndarray

Examples

>>> import numpy as np
>>> obtained = array_from_region_list([{0, 1, 2, 5}, {3, 4}])
>>> desired = np.array([ 0, 0, 0, 1, 1, 0])
>>> (obtained == desired).all()
True
region.util.assert_feasible(solution, adj, n_regions=None)
Parameters:
  • solution (numpy.ndarray) – Array of region labels.
  • adj (scipy.sparse.csr_matrix) – Adjacency matrix representing the contiguity relation.
  • n_regions (int or None) – An int represents the desired number of regions. If None, then the number of regions is not checked.
Raises:

exc : `ValueError` – A ValueError is raised if clustering is not spatially contiguous. Given the n_regions argument is not None, a ValueError is raised also if the number of regions is not equal to the n_regions argument.

region.util.check_solver(solver)
region.util.copy_func(f)

Return a copy of a function. This is useful e.g. to create aliases (whose docstrings can be changed without affecting the original function). The implementation is taken from https://stackoverflow.com/a/13503277.

Parameters:f (function) –
Returns:g – Copy of f.
Return type:function
region.util.count(arr, el)
Parameters:
Returns:

result – The number of occurences of el in arr.

Return type:

numpy.ndarray

Examples

>>> arr = np.array([0, 0, 0, 1, 1])
>>> count(arr, 0)
3
>>> count(arr, 1)
2
>>> count(arr, 2)
0
region.util.dataframe_to_dict(df, cols)
Parameters:
  • df (Union[pandas.DataFrame, geopandas.GeoDataFrame]) –
  • cols (Union[str, list]) – If str, then it is the name of a column of df. If list, then it is a list of strings. Each string is the name of a column of df.
Returns:

result – The keys are the elements of the DataFrame’s index. Each value is a numpy.ndarray holding the corresponding values in the columns specified by cols.

Return type:

dict

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"data": [100, 120, 115]})
>>> result = dataframe_to_dict(df, "data")
>>> result == {0: 100, 1: 120, 2: 115}
True
>>> import numpy as np
>>> df = pd.DataFrame({"data": [100, 120],
...                    "other": [1, 2]})
>>> actual = dataframe_to_dict(df, ["data", "other"])
>>> desired = {0: np.array([100, 1]), 1: np.array([120, 2])}
>>> all(np.array_equal(actual[i], desired[i]) for i in desired)
True
region.util.dict_from_graph_attr(graph, attr, array_values=False)
Parameters:
  • graph (networkx.Graph) –
  • attr (str, iterable, or dict) – If str, then it specifies the an attribute of the graph’s nodes. If iterable of strings, then multiple attributes of the graph’s nodes are specified. If dict, then each key is a node and each value the corresponding attribute value. (This format is also this function’s return format.)
  • array_values (bool, default: False) – If True, then each value is transformed into a numpy.ndarray.
Returns:

result_dict – Each key is a node in the graph. If array_values is False, then each value is a list of attribute values corresponding to the key node. If array_values is True, then each value this list of attribute values is turned into a numpy.ndarray. That requires the values to be shape-compatible for stacking.

Return type:

dict

Examples

>>> import networkx as nx
>>> edges = [(0, 1), (1, 2),          # 0 | 1 | 2
...          (0, 3), (1, 4), (2, 5),  # ---------
...          (3, 4), (4,5)]           # 3 | 4 | 5
>>> graph = nx.Graph(edges)
>>> data_dict = {node: 10*node for node in graph}
>>> nx.set_node_attributes(graph, "test_data", data_dict)
>>> desired = {key: [value] for key, value in data_dict.items()}
>>> dict_from_graph_attr(graph, "test_data") == desired
True
>>> dict_from_graph_attr(graph, ["test_data"]) == desired
True
region.util.distribute_regions_among_components(component_labels, n_regions)
Parameters:
  • component_labels (list) –

    Each element specifies to which connected component an area belongs. An example would be [0, 0, 1, 0, 0, 1] for the following two islands:

    island one        island two
    .-------.         .---.
    | 0 | 1 |         | 2 |
    | - - - |         | - |
    | 3 | 4 |         | 5 |
    `-------´         `---´
    
  • n_regions (int) –
Returns:

result_dict – Each key is a label of a connected component. Each value specifies into how many regions the component is to be clustered.

Return type:

Dict[int, int]

region.util.find_sublist_containing(el, lst, index=False)
Parameters:
  • el – The element to search for in the sublists of lst.
  • lst (collections.Sequence) – A sequence of sequences or sets.
  • index (bool, default: False) – If False (default), the subsequence or subset containing el is returned. If True, the index of the subsequence or subset in lst is returned.
Returns:

result – See the index argument for more information.

Return type:

collections.Sequence, collections.Set or int

Raises:

exc : LookupError – If el is not in any of the elements of lst.

Examples

>>> lst = [{0, 1}, {2}]
>>> find_sublist_containing(0, lst, index=False) == {0, 1}
True
>>> find_sublist_containing(0, lst, index=True) == 0
True
>>> find_sublist_containing(2, lst, index=False) == {2}
True
>>> find_sublist_containing(2, lst, index=True) == 1
True
region.util.generate_initial_sol(adj, n_regions)

Generate a random initial clustering.

Parameters:
Yields:

region_labels (numpy.ndarray) – An array with -1 for areas which are not part of the yielded component and an integer >= 0 specifying the region of areas within the yielded component.

region.util.get_metric_function(metric=None)
Parameters:metric (str or function or None, default: None) –

Using None is equivalent to using “euclidean”.

If str, then this string specifies the distance metric (from scikit-learn) to use for calculating the objective function. Possible values are:

  • ”cityblock” for sklearn.metrics.pairwise.manhattan_distances
  • ”cosine” for sklearn.metrics.pairwise.cosine_distances
  • ”euclidean” for sklearn.metrics.pairwise.euclidean_distances
  • ”l1” for sklearn.metrics.pairwise.manhattan_distances
  • ”l2” for sklearn.metrics.pairwise.euclidean_distances
  • ”manhattan” for sklearn.metrics.pairwise.manhattan_distances

If function, then this function should take two arguments and return a scalar value. Furthermore, the following conditions must be fulfilled:

  1. d(a, b) >= 0, for all a and b
  2. d(a, b) == 0, if and only if a = b, positive definiteness
  3. d(a, b) == d(b, a), symmetry
  4. d(a, c) <= d(a, b) + d(b, c), the triangle inequality
Returns:metric_func – If the metric argument is a function, it is returned. If the metric argument is a string, then the corresponding distance metric function from sklearn.metrics.pairwise is returned.
Return type:function
region.util.get_solver_instance(solver_string)
region.util.make_move(moving_area, new_label, labels)

Modify the labels argument in place (no return value!) such that the area moving_area has the new region label new_label.

Parameters:
  • moving_area – The area to be moved (assigned to a new region).
  • new_label (int) – The new region label of area moving_area.
  • labels (numpy.ndarray) – Each element is a region label of the area corresponding array index.

Examples

>>> import numpy as np
>>> labels = np.array([0, 0, 0, 0, 1, 1])
>>> make_move(3, 1, labels)
>>> (labels == np.array([0, 0, 0, 1, 1, 1])).all()
True
region.util.pop_randomly_from(lst)
region.util.raise_distance_metric_not_set(x, y)
region.util.random_element_from(lst)
region.util.scipy_sparse_matrix_from_dict(neighbors)
Parameters:neighbors (dict) – Each key represents an area. The corresponding value contains the area’s neighbors.
Returns:adj – Adjacency matrix representing the areas’ contiguity relation.
Return type:scipy.sparse.csr_matrix

Examples

>>> neighbors = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5},
...              3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4}}
>>> obtained = scipy_sparse_matrix_from_dict(neighbors)
>>> desired = np.array([[0, 1, 0, 1, 0, 0],
...                     [1, 0, 1, 0, 1, 0],
...                     [0, 1, 0, 0, 0, 1],
...                     [1, 0, 0, 0, 1, 0],
...                     [0, 1, 0, 1, 0, 1],
...                     [0, 0, 1, 0, 1, 0]])
>>> (obtained.todense() == desired).all()
True
>>> neighbors = {"left": {"middle"},
...              "middle": {"left", "right"},
...              "right": {"middle"}}
>>> obtained = scipy_sparse_matrix_from_dict(neighbors)
>>> desired = np.array([[0, 1, 0],
...                     [1, 0, 1],
...                     [0, 1, 0]])
>>> (obtained.todense() == desired).all()
True
region.util.scipy_sparse_matrix_from_w(w)
Parameters:w (libpysal.weights.weights.W) – A W object representing the areas’ contiguity relation.
Returns:adj – Adjacency matrix representing the areas’ contiguity relation.
Return type:scipy.sparse.csr_matrix

Examples

>>> import libpysal as ps
>>> neighbor_dict = {0: {1}, 1: {0, 2}, 2: {1}}
>>> w = ps.weights.W(neighbor_dict)
>>> obtained = scipy_sparse_matrix_from_w(w)
>>> desired = np.array([[0, 1, 0],
...                     [1, 0, 1],
...                     [0, 1, 0]])
>>> (obtained.todense() == desired).all()
True
region.util.separate_components(adj, labels)

Take a labels array and yield modifications of it (one modified array per connected component). The modified array will be unchanged at those indices belonging to the current connected component. Thus it will have integers >= 0 there. At all other indices the Yielded array will be -1.

Parameters:
Yields:

comp_dict (numpy.ndarray) – Each yielded dict represents one connected component of the graph specified by the adj argument. In a yielded dict, each key is an area and each value is the corresponding region-ID.

Examples

>>> edges_island1 = [(0, 1), (1, 2),          # 0 | 1 | 2
...                  (0, 3), (1, 4), (2, 5),  # ---------
...                  (3, 4), (4,5)]           # 3 | 4 | 5
>>>
>>> edges_island2 = [(6, 7),                  # 6 | 7
...                  (6, 8), (7, 9),          # -----
...                  (8, 9)]                  # 8 | 9
>>>
>>> graph = nx.Graph(edges_island1 + edges_island2)
>>> adj = nx.to_scipy_sparse_matrix(graph)
>>>
>>> # island 1: island divided into regions 0, 1, and 2
>>> sol_island1 = [area%3 for area in range(6)]
>>> # island 2: all areas are in region 3
>>> sol_island2 = [3 for area in range(6, 10)]
>>> labels = np.array(sol_island1 + sol_island2)
>>>
>>> yielded = list(separate_components(adj, labels))
>>> yielded.sort(key=lambda arr: arr[0], reverse=True)
>>> (yielded[0] == np.array([0, 1, 2, 0, 1, 2, -1, -1, -1, -1])).all()
True
>>> (yielded[1] == np.array([-1, -1, -1, -1, -1, -1, 3, 3, 3, 3])).all()
True
region.util.w_from_gdf(gdf, contiguity)

Get a W object from a GeoDataFrame.

Parameters:
  • gdf (GeoDataFrame) –
  • contiguity ({"rook", "queen"}) –
Returns:

weights – The contiguity information contained in the gdf argument in the form of a W object.

Return type:

W

Indices and tables