Welcome to region’s documentation!¶
Region¶
This is a package for spatially-explicit regionalization. It is currently under active development.
Installation¶
Download & install¶
You can clone the github-repo and install region using:
git clone https://github.com/pysal/region.git
cd region
python3 setup.py
Check your installation¶
To see whether the installation was successful, start a Python 3 session and run
>>> import region
If this command doesn’t raise an ImportError
, then region has been installed properly.
User Guide¶
Introduction¶
Regionalization or spatial clustering¶
Let’s assume we have a map of areas and each area has an attribute (or a set of attributes) attached to it. Let’s further assume we want to cluster the areas according to the attribute(s) into larger spatial units – let’s call these larger spatial units regions. If we used a conventional clustering algorithm like the K-Means, we would often end up with regions that are not spatially contiguous (that means the regions wouldn’t form a connected set of areas). Sometimes this is not what we want and we are rather interested in a clustering under contiguity restrictions. Other terms for this task are regionalization and spatial clustering.
This sort of clustering is often one of several steps when analysing geographical data. For example, it helps condense large amounts of data, thus facilitating calculations and improving the readability of visualizations.
region¶
The region package offers various clustering algorithms which can be used with Python 3. The algorithms work with a range of common representations of your map, this means you can use geopandas’ GeoDataFrame
s, PySAL’s W
objects, networkx’ Graph
s, sparse (adjacency) matrices, or simple dictionaries.
With the p-regions problem and the max-p-regions problem, region addresses different regionalization tasks.
The p-Regions Problem¶
The task¶
The p-regions problem is defined as a regionalization task where the number of regions (clusters) is known. This number is often denoted by p in the literature, hence the name of the problem. The goal is to construct regions with a large degree of similarity among a region’s areas and with large dissimilarities between regions. Furthermore, each region has to consist of spatially contiguous areas. This condition distinguishes regionalization algorithms from conventional clustering algorithms like the k-Means-algorithm.
There are many approaches to solving the p-regions problem. One way to solve it is to translate it into an mixed integer program and solve it exactly while another category of algorithms comprises heuristic approaches.
An example¶
Let’s have a look at an example. Assume we have a map of 3 x 3 areas. Each area has an attribute and the situation looks as follows:
726.7 | 623.6 | 487.3
----------------------
200.4 | 245.0 | 481.0
----------------------
170.9 | 225.9 | 226.9
Furthermore, assume we want to cluster the areas into two regions. A possible solution could have all areas with an attribute value greater than 300 clustered into one region. The remaining areas would then form the other region. Here we see our nine areas again but this time each number denotes what region an area belongs to:
1 | 1 | 1
----------------------
0 | 0 | 1
----------------------
0 | 0 | 0
Defining the above input data, calculating the solution and plotting two maps (with the color once indicating the areas’ attribute values and once their calculated region) takes only a few lines of code – check it out.
Exact methods¶
[DCM2011] describe three different ways to translate a p-regions problem into exact optimization models and call these approaches Flow, Order, and Tree respectively. While the objective function of these models ensures maximum homogeneity within the regions, the models’ constraints ensure that the resulting regions are spatially contiguous. Implementations for all of the three approaches are available through the region.p_regions.exact.PRegionsExact
class. For detailed information on the three different methods please refer to [DCM2011].
Heuristic methods¶
While exact approaches ensure optimality, they can be rather slow in delivering the solution. That’s why – especially for clustering tasks including many areas – it may be preferable to use an heuristic approach. Many heuristic algorithms have been designed aiming for a high probability of a good solution while keeping the run times low.
This category of algorithms includes the AZP described in [OR1995]. One drawback of this algorithm is that it can get caught in a local optimum and thus may fail to reach the global optimum. That’s why [OR1995] also present three more algorithms which aim to improve results. These algorithms use simulated annealing, a basic tabu queue, or a reactive tabu queue. You can find the implementations in the region.p_regions.azp
module. For detailed information on the four different algorithms please refer to [OR1995].
The Max-p-Regions Problem¶
The task¶
As the p-regions problem the max-p-regions problem is about finding a clustering which satisfies the spatial contiguity condition. A key difference is that in the max-p-regions problem there is no predefined number of regions. The primary goal is to find a clustering that has as many regions as possible. In order to avoid a clustering in which every area forms its own region there is another condition. It is stated by using so called spatial extensive attributes which each area has. The condition requires that the sum of these spatial extensive attributes reaches or exceeds a predefined threshold.
When the maximum number of regions is found, a second goal has to be met, namely to find the best clustering with the maximum number of regions. Here, optimality is defined by the areas’ attributes. (These attributes may – and in most applications will – be different from the spatial extensive attributes mentioned above.) The max-p-regions problem can be solved using either an exact or a heuristic approach. For a detailed description please refer to [DAR2012].
Exact methods¶
[DAR2012] shows a way to translate the max-p-regions problem into an integer programming problem. It is implemented in the region.max_p_regions.exact.MaxPRegionsExact
class.
Heuristic methods¶
Since solving the problem exactly may be very slow, [DAR2012] also suggests an heuristic approach. This algorithm involves two steps - a so called construction phase and a local search phase. You can find the implementation in the region.max_p_regions.heuristics.MaxPRegionsHeu
class. Please note that region’s implementation uses a modified heuristic p-regions-algorithm for the local search phase. The modification ensures the threshold condition on the spatial extensive attributes is met.
region - package overview¶
region package¶
Subpackages¶
region.max_p_regions package¶
Subpackages¶
The data in this file describes the max-p-regions problem depicted in figure 2 on p. 402 in [DAR2012].
-
region.max_p_regions.tests.test_exact.
test_dict_basic
()¶
-
region.max_p_regions.tests.test_exact.
test_dict_multi_attr
()¶
-
region.max_p_regions.tests.test_exact.
test_geodataframe_basic
()¶
-
region.max_p_regions.tests.test_exact.
test_geodataframe_multi_attr
()¶
-
region.max_p_regions.tests.test_exact.
test_graph_dict_basic
()¶
-
region.max_p_regions.tests.test_exact.
test_graph_dict_multi_attr
()¶
-
region.max_p_regions.tests.test_exact.
test_graph_str_basic
()¶
-
region.max_p_regions.tests.test_exact.
test_graph_str_multi_attr
()¶
-
region.max_p_regions.tests.test_exact.
test_scipy_sparse_matrix
()¶
-
region.max_p_regions.tests.test_exact.
test_scipy_sparse_matrix_multi_attr
()¶
-
region.max_p_regions.tests.test_exact.
test_w_basic
()¶
-
region.max_p_regions.tests.test_exact.
test_w_multi_attr
()¶
-
region.max_p_regions.tests.test_heu.
test_dict_basic
()¶
-
region.max_p_regions.tests.test_heu.
test_dict_multi_attr
()¶
-
region.max_p_regions.tests.test_heu.
test_geodataframe_basic
()¶
-
region.max_p_regions.tests.test_heu.
test_geodataframe_multi_attr
()¶
-
region.max_p_regions.tests.test_heu.
test_graph_dict_basic
()¶
-
region.max_p_regions.tests.test_heu.
test_graph_dict_multi_attr
()¶
-
region.max_p_regions.tests.test_heu.
test_graph_str_basic
()¶
-
region.max_p_regions.tests.test_heu.
test_graph_str_multi_attr
()¶
-
region.max_p_regions.tests.test_heu.
test_scipy_sparse_matrix
()¶
-
region.max_p_regions.tests.test_heu.
test_scipy_sparse_matrix_multi_attr
()¶
-
region.max_p_regions.tests.test_heu.
test_w_basic
()¶
-
region.max_p_regions.tests.test_heu.
test_w_multi_attr
()¶
Submodules¶
-
class
region.max_p_regions.exact.
MaxPRegionsExact
¶ Bases:
object
A class for solving the max-p-regions problem by transforming it into a mixed-integer-programming problem (MIP) as described in [DAR2012].
-
labels_
¶ dict – Each key is an area and each value the region it has been assigned to.
-
fit
(adj, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')¶ Alias for
fit_from_scipy_sparse_matrix()
.Solve the max-p-regions problem as MIP as described in [DAR2012].
The resulting region labels are assigned to the instance’s
labels_
attribute.Parameters: - adj (class:scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
- attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering. - spatially_extensive_attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to ensuring the threshold condition. - threshold (numbers.Real or
numpy.ndarray
) – The lower bound for a region’s sum of spatially extensive attributes. The argument’s type is numbers.Real if there is only one spatially extensive attribute per area, otherwise it is a one-dimensional array with as many entries as there are spatially extensive attributes per area. - solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") –
The solver to use. Unless the default solver is used, the user has to make sure that the specified solver is installed.
- ”cbc” - the Cbc (Coin-or branch and cut) solver
- ”cplex” - the CPLEX solver
- ”glpk” - the GLPK (GNU Linear Programming Kit) solver
- ”gurobi” - the Gurobi Optimizer
- metric (str or function, default: "euclidean") – See the metric argument in
region.util.get_metric_function()
.
-
fit_from_dict
(neighbors_dict, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - neighbors_dict (dict) – Each key represents an area and each value is an iterable of neighbors of this area.
- attr (dict) – A dict with the same keys as neighbors_dict and values
representing the attributes for calculating homo-/heterogeneity. A
value can be scalar (e.g. float or int) or a
numpy.ndarray
. - spatially_extensive_attr (dict) – A dict with the same keys as neighbors_dict and values representing the spatially extensive attribute (scalar or iterable of scalars). In the max-p-regions problem each region’s sum of spatially extensive attributes must be greater than a specified threshold. In case of iterables of scalars as dict-values all elements of the iterable have to fulfill the condition.
- threshold (numbers.Real or
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - metric (str or function, default: "euclidean") – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
.
-
fit_from_geodataframe
(gdf, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean', contiguity='rook')¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - gdf (GeoDataFrame) –
- attr (str or list) – The clustering criteria (columns of the GeoDataFrame gdf) are specified as string (for one column) or list of strings (for multiple columns).
- spatially_extensive_attr (str or list) – The name (str) or names (list of strings) of column(s) in gdf containing the spatially extensive attributes.
- threshold (numbers.Real or
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - metric (str or function, default: "euclidean") – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - contiguity ({"rook", "queen"}, default: "rook") –
Defines the contiguity relationship between areas. Possible contiguity definitions are:
- ”rook” - Rook contiguity.
- ”queen” - Queen contiguity.
-
fit_from_networkx
(graph, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - graph (networkx.Graph) –
- attr (str, list or dict) – If the clustering criteria are present in the networkx.Graph
graph as node attributes, then they can be specified as a string
(for one criterion) or as a list of strings (for multiple
criteria).
Alternatively, a dict can be used with each key being a node of the
networkx.Graph graph and each value being the corresponding
clustering criterion (a scalar (e.g. float or int) or a
numpy.ndarray
). If there are no clustering criteria present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument infit_from_dict()
for more details about the expected dict. - spatially_extensive_attr (str, list or dict) – If the spatially extensive attribute is present in the
networkx.Graph graph as node attributes, then they can be
specified as a string (for one attribute) or as a list of
strings (for multiple attributes).
Alternatively, a dict can be used with each key being a node of the
networkx.Graph graph and each value being the corresponding
spatially extensive attribute (a scalar (e.g. float or int) or
a
numpy.ndarray
). If there are no spatially extensive attributes present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument infit_from_dict()
for more details about the expected dict. - threshold (numbers.Real or
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - metric (str or function, default: "euclidean") – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
.
-
fit_from_scipy_sparse_matrix
(adj, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')¶ Solve the max-p-regions problem as MIP as described in [DAR2012].
The resulting region labels are assigned to the instance’s
labels_
attribute.Parameters: - adj (class:scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
- attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering. - spatially_extensive_attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to ensuring the threshold condition. - threshold (numbers.Real or
numpy.ndarray
) – The lower bound for a region’s sum of spatially extensive attributes. The argument’s type is numbers.Real if there is only one spatially extensive attribute per area, otherwise it is a one-dimensional array with as many entries as there are spatially extensive attributes per area. - solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") –
The solver to use. Unless the default solver is used, the user has to make sure that the specified solver is installed.
- ”cbc” - the Cbc (Coin-or branch and cut) solver
- ”cplex” - the CPLEX solver
- ”glpk” - the GLPK (GNU Linear Programming Kit) solver
- ”gurobi” - the Gurobi Optimizer
- metric (str or function, default: "euclidean") – See the metric argument in
region.util.get_metric_function()
.
-
fit_from_w
(w, attr, spatially_extensive_attr, threshold, solver='cbc', metric='euclidean')¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - w (libpysal.weights.W) – W object representing the areas’ contiguity relation.
- attr (
numpy.ndarray
) – See the corresponding argument infit_from_scipy_sparse_matrix()
. - spatially_extensive_attr (
numpy.ndarray
) – See the corresponding argument infit_from_scipy_sparse_matrix()
. - threshold (numbers.Real or
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - metric (str or function, default: "euclidean") – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
.
-
-
class
region.max_p_regions.heuristics.
MaxPRegionsHeu
(local_search=None, random_state=None)¶ Bases:
object
-
assign_enclaves
(partition, enclave_areas, neigh_dict, attr)¶ Start with a partial partition (not all areas are assigned to a region) and a list of enclave areas (i.e. areas not present in the partial partition). Then assign all enclave areas to regions in the partial partition and return the resulting partition.
Parameters: - partition (list) – Each element (of type set) represents a region.
- enclave_areas (list) – Each element represents an area.
- neigh_dict (dict) – Each key represents an area. Each value is an iterable of the corresponding neighbors.
- attr (
numpy.ndarray
) – See the corresponding argument infit_from_scipy_sparse_matrix()
.
Returns: partition – Each element (of type set) represents a region.
Return type: list
-
find_best_area
(region, candidates, attr)¶ Parameters: - region (iterable) – Each element represents an area.
- candidates (iterable) – Each element represents an area bordering on region.
- attr (
numpy.ndarray
) – See the corresponding argument infit_from_scipy_sparse_matrix()
.
Returns: An element of candidates with minimal dissimilarity when being moved to the region region.
Return type: best_area
-
find_best_region_idx
(area, partition, candidate_regions_idx, attr)¶ Parameters: - area – The area to be moved to one of the regions specified by candidate_regions_idx.
- partition (list) – Each element (of type set) represents a region.
- candidate_regions_idx (iterable) – Each element is the index of a region in the partition list.
- attr (
numpy.ndarray
) – See the corresponding argument infit_from_scipy_sparse_matrix()
.
Returns: best_idx – The index of a region (w.r.t. partition) which has the smallest sum of dissimilarities after area area is moved to the region.
Return type:
-
fit
(adj, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Alias for
fit_from_scipy_sparse_matrix()
.Solve the max-p-regions problem in a heuristic way (see [DAR2012]).
The resulting region labels are assigned to the instance’s
labels_
attribute.Parameters: - adj (
scipy.sparse.csr_matrix
) – Adjacency matrix representing the areas’ contiguity relation. - attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering. - spatially_extensive_attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to ensuring the threshold condition. - threshold (numbers.Real or
numpy.ndarray
) – The lower bound for a region’s sum of spatially extensive attributes. The argument’s type is numbers.Real if there is only one spatially extensive attribute per area, otherwise it is a one-dimensional array with as many entries as there are spatially extensive attributes per area. - max_it (int, default: 10) – The maximum number of partitions produced in the algorithm’s construction phase.
- objective_func (
region.objective_function.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – The objective function to use.
- adj (
-
fit_from_dict
(neighbors_dict, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Solve the max-p-regions problem in a heuristic way (see [DAR2012]).
The resulting region labels are assigned to the instance’s
labels_
attribute.Parameters: - neighbors_dict (dict) – Each key represents an area and each value is an iterable of neighbors of this area.
- attr (dict) – A dict with the same keys as neighbors_dict and values
representing the attributes for calculating homo-/heterogeneity. A
value can be scalar (e.g. float or int) or a
numpy.ndarray
. - spatially_extensive_attr (dict) – A dict with the same keys as neighbors_dict and values representing the spatially extensive attribute (scalar or iterable of scalars). In the max-p-regions problem each region’s sum of spatially extensive attributes must be greater than a specified threshold. In case of iterables of scalars as dict-values all elements of the iterable have to fulfill the condition.
- threshold (numbers.Real or
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - max_it (int, default: 10) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
-
fit_from_geodataframe
(gdf, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>, contiguity='rook')¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - gdf (
geopandas.GeoDataFrame
) – - attr (str or list) – The clustering criteria (columns of the GeoDataFrame gdf) are specified as string (for one column) or list of strings (for multiple columns).
- spatially_extensive_attr (str or list) – The name (str) or names (list of strings) of column(s) in gdf containing the spatially extensive attributes.
- threshold (numbers.Real or
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - max_it (int, default: 10) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - contiguity ({"rook", "queen"}, default: "rook") –
Defines the contiguity relationship between areas. Possible contiguity definitions are:
- ”rook” - Rook contiguity.
- ”queen” - Queen contiguity.
- gdf (
-
fit_from_networkx
(graph, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - graph (networkx.Graph) –
- attr (str, list or dict) – If the clustering criteria are present in the networkx.Graph
graph as node attributes, then they can be specified as a string
(for one criterion) or as a list of strings (for multiple
criteria).
Alternatively, a dict can be used with each key being a node of the
networkx.Graph graph and each value being the corresponding
clustering criterion (a scalar (e.g. float or int) or a
numpy.ndarray
). If there are no clustering criteria present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument infit_from_dict()
for more details about the expected dict. - spatially_extensive_attr (str, list or dict) – If the spatially extensive attribute is present in the
networkx.Graph graph as node attributes, then they can be
specified as a string (for one attribute) or as a list of
strings (for multiple attributes).
Alternatively, a dict can be used with each key being a node of the
networkx.Graph graph and each value being the corresponding
spatially extensive attribute (a scalar (e.g. float or int) or
a
numpy.ndarray
). If there are no spatially extensive attributes present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument infit_from_dict()
for more details about the expected dict. - threshold (numbers.Real or
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - max_it (int, default: 10) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
-
fit_from_scipy_sparse_matrix
(adj, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Solve the max-p-regions problem in a heuristic way (see [DAR2012]).
The resulting region labels are assigned to the instance’s
labels_
attribute.Parameters: - adj (
scipy.sparse.csr_matrix
) – Adjacency matrix representing the areas’ contiguity relation. - attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering. - spatially_extensive_attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to ensuring the threshold condition. - threshold (numbers.Real or
numpy.ndarray
) – The lower bound for a region’s sum of spatially extensive attributes. The argument’s type is numbers.Real if there is only one spatially extensive attribute per area, otherwise it is a one-dimensional array with as many entries as there are spatially extensive attributes per area. - max_it (int, default: 10) – The maximum number of partitions produced in the algorithm’s construction phase.
- objective_func (
region.objective_function.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – The objective function to use.
- adj (
-
fit_from_w
(w, attr, spatially_extensive_attr, threshold, max_it=10, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - w (
libpysal.weights.weights.W
) – W object representing the contiguity relation. - attr (
numpy.ndarray
) – Each element specifies an area’s attribute which is used for calculating the objective function. - spatially_extensive_attr (
numpy.ndarray
) – Each element specifies an area’s spatially extensive attribute which is used to ensure that the sum of spatially extensive attributes in each region adds up to a threshold defined by the threshold argument. - threshold (numbers.Real or
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - max_it (int, default: 10) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
- w (
-
grow_regions
(adj, attr, spatially_extensive_attr, threshold)¶ Parameters: - adj (
scipy.sparse.csr_matrix
) – See the corresponding argument infit_from_scipy_sparse_matrix()
. - attr (
numpy.ndarray
) – See the corresponding argument infit_from_scipy_sparse_matrix()
. - spatially_extensive_attr (
numpy.ndarray
) – See the corresponding argument infit_from_scipy_sparse_matrix()
. - threshold (numbers.Real or
numpy.ndarray
) – See the corresponding argument infit_from_scipy_sparse_matrix()
.
Returns: result – result[0] is a list. Each list element is a set of a region’s areas. Note that not every area is assigned to a region after this function has terminated, so they won’t be in any of the set`s in `result[0]. result[1] is a list of areas not assigned to any region.
Return type: tuple
- adj (
-
region.p_regions package¶
Subpackages¶
The data in this file describes the p-regions problem depicted in figure 1 on p. 106 in [DCM2011].
-
region.p_regions.tests.test_azp.
test_dict
()¶
-
region.p_regions.tests.test_azp.
test_dict_multi_attr
()¶
-
region.p_regions.tests.test_azp.
test_geodataframe
()¶
-
region.p_regions.tests.test_azp.
test_geodataframe_multi_attr
()¶
-
region.p_regions.tests.test_azp.
test_graph_dict_basic
()¶
-
region.p_regions.tests.test_azp.
test_graph_dict_multi_attr
()¶
-
region.p_regions.tests.test_azp.
test_graph_str_basic
()¶
-
region.p_regions.tests.test_azp.
test_graph_str_multi_attr
()¶
-
region.p_regions.tests.test_azp.
test_scipy_sparse_matrix
()¶
-
region.p_regions.tests.test_azp.
test_scipy_sparse_matrix_multi_attr
()¶
-
region.p_regions.tests.test_azp.
test_w_basic
()¶
-
region.p_regions.tests.test_azp.
test_w_multi_attr
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_dict
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_dict_multi_attr
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_geodataframe
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_geodataframe_multi_attr
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_graph_dict_basic
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_graph_dict_multi_attr
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_graph_str_basic
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_graph_str_multi_attr
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_scipy_sparse_matrix
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_scipy_sparse_matrix_multi_attr
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_w_basic
()¶
-
region.p_regions.tests.test_azp_basic_tabu.
test_w_multi_attr
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_dict
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_dict_multi_attr
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_geodataframe
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_geodataframe_multi_attr
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_graph_dict_basic
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_graph_dict_multi_attr
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_graph_str_basic
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_graph_str_multi_attr
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_scipy_sparse_matrix
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_scipy_sparse_matrix_multi_attr
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_w_basic
()¶
-
region.p_regions.tests.test_azp_reactive_tabu.
test_w_multi_attr
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_dict
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_dict_multi_attr
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_geodataframe
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_geodataframe_multi_attr
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_graph_dict_basic
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_graph_dict_multi_attr
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_graph_str_basic
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_graph_str_multi_attr
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_scipy_sparse_matrix
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_scipy_sparse_matrix_multi_attr
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_w_basic
()¶
-
region.p_regions.tests.test_azp_simulated_annealing.
test_w_multi_attr
()¶
-
region.p_regions.tests.test_exact.
method
(request)¶
-
region.p_regions.tests.test_exact.
test_dict
(method)¶
-
region.p_regions.tests.test_exact.
test_dict_multi_attr
(method)¶
-
region.p_regions.tests.test_exact.
test_geodataframe
(method)¶
-
region.p_regions.tests.test_exact.
test_geodataframe_multi_attr
(method)¶
-
region.p_regions.tests.test_exact.
test_graph_dict_basic
(method)¶
-
region.p_regions.tests.test_exact.
test_graph_dict_multi_attr
(method)¶
-
region.p_regions.tests.test_exact.
test_graph_str_basic
(method)¶
-
region.p_regions.tests.test_exact.
test_graph_str_multi_attr
(method)¶
-
region.p_regions.tests.test_exact.
test_scipy_sparse_matrix
(method)¶
-
region.p_regions.tests.test_exact.
test_scipy_sparse_matrix_multi_attr
(method)¶
-
region.p_regions.tests.test_exact.
test_w_basic
(method)¶
-
region.p_regions.tests.test_exact.
test_w_multi_attr
(method)¶
Submodules¶
-
class
region.p_regions.azp.
AZP
(allow_move_strategy=None, random_state=None)¶ Bases:
object
Class offering the implementation of the AZP algorithm (see [OR1995]).
-
labels_
¶ numpy.ndarray
– Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.
-
fit
(adj, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Alias for
fit_from_scipy_sparse_matrix()
.Perform the AZP algorithm as described in [OR1995].
The resulting region labels are assigned to the instance’s
labels_
attribute.Parameters: - adj (
scipy.sparse.csr_matrix
) – Adjacency matrix representing the contiguity relation. - attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering. - n_regions (int) – Number of desired regions.
- initial_labels (
numpy.ndarray
or None, default: None) – One-dimensional array of labels at the beginning of the algorithm. If None, then a random initial clustering will be generated. - objective_func (
region.objective_function.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – The objective function to use.
- adj (
-
fit_from_dict
(neighbor_dict, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - neighbor_dict (dict) – Each key is an area and each value is an iterable of the key area’s neighbors.
- attr (dict) – Each key is an area and each value is the corresponding clustering-relevant attribute.
- n_regions (int) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - initial_labels (dict or None, default: None) – Each key represents an area. Each value represents the region, the corresponding area is assigned to at the beginning of the algorithm. If None, then a random initial clustering will be generated.
- objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
-
fit_from_geodataframe
(gdf, attr, n_regions, contiguity='rook', initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - gdf (
geopandas.GeoDataFrame
) – - attr (str or list) – The clustering-relevant attributes (columns of the GeoDataFrame gdf) are specified as string (for one column) or list of strings (for multiple columns).
- n_regions (int) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - contiguity ({"rook", "queen"}, default: "rook") –
Defines the contiguity relationship between areas. Possible contiguity definitions are:
- ”rook” - Rook contiguity.
- ”queen” - Queen contiguity.
- initial_labels (
numpy.ndarray
or None, default: None) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
- gdf (
-
fit_from_networkx
(graph, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - graph (networkx.Graph) – Graph representing the contiguity relation.
- attr (str, list or dict) – If the clustering criteria are present in the networkx.Graph
graph as node attributes, then they can be specified as a string
(for one criterion) or as a list of strings (for multiple
criteria).
Alternatively, a dict can be used with each key being a node of the
networkx.Graph graph and each value being the corresponding
clustering criterion (a scalar (e.g. float or int) or a
numpy.ndarray
). If there are no clustering criteria present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument infit_from_dict()
for more details about the expected dict. - n_regions (int) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - initial_labels (str or dict or None, default: None) – If str, then the string names the graph’s attribute holding the information about the initial clustering. If dict, then each key is a node and each value is the region the key area is assigned to at the beginning of the algorithm. If None, then a random initial clustering will be generated.
- objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
-
fit_from_scipy_sparse_matrix
(adj, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Perform the AZP algorithm as described in [OR1995].
The resulting region labels are assigned to the instance’s
labels_
attribute.Parameters: - adj (
scipy.sparse.csr_matrix
) – Adjacency matrix representing the contiguity relation. - attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering. - n_regions (int) – Number of desired regions.
- initial_labels (
numpy.ndarray
or None, default: None) – One-dimensional array of labels at the beginning of the algorithm. If None, then a random initial clustering will be generated. - objective_func (
region.objective_function.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – The objective function to use.
- adj (
-
fit_from_w
(w, attr, n_regions, initial_labels=None, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - w (
libpysal.weights.weights.W
) – W object representing the contiguity relation. - attr (
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - n_regions (int) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - initial_labels (
numpy.ndarray
or None, default: None) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
- w (
-
-
class
region.p_regions.azp.
AZPBasicTabu
(tabu_length=None, repetitions_before_termination=5, random_state=None)¶ Bases:
region.p_regions.azp.AZPTabu
Implementation of the AZP with basic tabu (refer to [OR1995]).
-
labels_
¶ numpy.ndarray
– Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.
-
-
class
region.p_regions.azp.
AZPReactiveTabu
(max_iterations, k1, k2, random_state=None)¶ Bases:
region.p_regions.azp.AZPTabu
Implementation of the AZP with reactive tabu (refer to [OR1995]).
-
labels_
¶ numpy.ndarray
– Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.
-
-
class
region.p_regions.azp.
AZPSimulatedAnnealing
(init_temperature=None, max_iterations=inf, sa_moves_term=10, nonmoving_steps_before_stop=3, repetitions_before_termination=5, random_state=None)¶ Bases:
object
Class offering the implementation of the AZP-SA algorithm (see [OR1995]).
-
labels_
¶ numpy.ndarray
– Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.
-
fit_from_dict
(neighbor_dict, attr, n_regions, initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Parameters: - neighbor_dict (dict) – Refer to the corresponding argument in
AZP.fit_from_dict()
. - attr (dict) – Refer to the corresponding argument in
AZP.fit_from_dict()
. - n_regions (int) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - initial_labels (dict or None, default: None) – Refer to the corresponding argument in
AZP.fit_from_dict()
. - cooling_factor (float, default: 0.85) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
- neighbor_dict (dict) – Refer to the corresponding argument in
-
fit_from_geodataframe
(gdf, attr, n_regions, contiguity='rook', initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Parameters: - gdf (
geopandas.GeoDataFrame
) – Refer to the corresponding argument inAZP.fit_from_geodataframe()
. - attr (str or list) – Refer to the corresponding argument in
AZP.fit_from_geodataframe()
. - n_regions (int) – Refer to the corresponding argument in
AZP.fit_from_geodataframe()
. - contiguity (str) – Refer to the corresponding argument in
AZP.fit_from_geodataframe()
. - initial_labels (
numpy.ndarray
or None, default: None) – Refer to the corresponding argument inAZP.fit_from_geodataframe()
. - cooling_factor (float, default: 0.85) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
- gdf (
-
fit_from_networkx
(graph, attr, n_regions, initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Parameters: - graph (networkx.Graph) – Refer to the corresponding argument in
AZP.fit_from_networkx()
. - attr (str, list or dict) – Refer to the corresponding argument in
AZP.fit_from_networkx()
. - n_regions (int) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - initial_labels (str or dict or None, default: None) – Refer to the corresponding argument in
AZP.fit_from_networkx()
. - cooling_factor (float, default: 0.85) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
- graph (networkx.Graph) – Refer to the corresponding argument in
-
fit_from_scipy_sparse_matrix
(adj, attr, n_regions, initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Parameters: - adj (
scipy.sparse.csr_matrix
) – Refer to the corresponding argument inAZP.fit_from_scipy_sparse_matrix()
. - attr (
numpy.ndarray
) – Refer to the corresponding argument inAZP.fit_from_scipy_sparse_matrix()
. - n_regions (int) – Refer to the corresponding argument in
AZP.fit_from_scipy_sparse_matrix()
. - initial_labels (
numpy.ndarray
or None, default: None) – Refer to the corresponding argument inAZP.fit_from_scipy_sparse_matrix()
. - cooling_factor (float, default: 0.85) – Float \(\in (0, 1)\) specifying the cooling factor for the simulated annealing.
- objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
- adj (
-
fit_from_w
(w, attr, n_regions, initial_labels=None, cooling_factor=0.85, objective_func=<region.objective_function.ObjectiveFunctionPairwise object>)¶ Parameters: - w (
libpysal.weights.weights.W
) – Refer to the corresponding argument inAZP.fit_from_w()
. - attr (
numpy.ndarray
) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - n_regions (int) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - initial_labels (
numpy.ndarray
or None, default: None) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
. - cooling_factor (float, default: 0.85) – Refer to the corresponding argument in
fit_from_scipy_sparse_matrix()
. - objective_func (
region.ObjectiveFunction
, default: ObjectiveFunctionPairwise()) – Refer to the corresponding argument infit_from_scipy_sparse_matrix()
.
- w (
-
move_made_alert
()¶
-
sa_moves_alert
()¶
-
-
class
region.p_regions.azp.
AZPTabu
(allow_move_strategy=None, random_state=None)¶ Bases:
region.p_regions.azp.AZP
,abc.ABC
Superclass for tabu variants of the AZP.
-
reset_tabu
(tabu_len=None)¶
-
-
class
region.p_regions.azp_util.
AllowMoveAZP
¶
-
class
region.p_regions.azp_util.
AllowMoveAZPMaxPRegions
(spatially_extensive_attr, threshold, decorated_strategy)¶ Bases:
region.p_regions.azp_util.AllowMoveStrategy
Ensures that the spatially extensive attribute adds up to a given threshold in each region. Only moves preserving this condition in both the donor as well as the recipient region are allowed. The check for the recipient region is necessary in case there is an area with a negative spatially extensive attribute.
-
start_new_component
(initial_labels, attr, objective_func, comp_idx)¶
-
-
class
region.p_regions.azp_util.
AllowMoveAZPSimulatedAnnealing
(init_temperature, sa_moves_term=inf)¶ Bases:
region.p_regions.azp_util.AllowMoveStrategy
-
notify_min_sa_moves
()¶
-
notify_move_made
()¶
-
register_move_made
(observer_func)¶ Parameters: observer_func (callable) – A function to call when a move is allowed.
-
register_sa_moves_term
(observer_func)¶ Parameters: observer_func (callable) – A function to call when the certain number of SA-moves has been reached. This number is called Q in [OR1995] (page 431).
-
reset
()¶
-
update_temperature
(temp)¶
-
-
class
region.p_regions.azp_util.
AllowMoveStrategy
¶ Bases:
abc.ABC
-
start_new_component
(initial_labels, attr, objective_func, comp_idx)¶ This method should be called whenever a new connected component is clustered.
Parameters: - initial_labels (
numpy.ndarray
) – The region labels of the areas in the currently considered connected component. Shape: number of areas in the currently considered component. - attr (
numpy.ndarray
) – The areas’ attributes. Shape: number of areas in the currently considered component. - objective_func (
region.objective_function.ObjectiveFunction
) – The objective function to use. - comp_idx (
numpy.ndarray
) – The indices of those areas belonging to the component clustered next.
- initial_labels (
-
-
class
region.p_regions.exact.
PRegionsExact
¶ Bases:
object
A class for solving the p-regions problem by transforming it into a mixed-integer-programming problem (MIP) as described in [DCM2011].
-
labels_
¶ numpy.ndarray
– Each element is a region label specifying to which region the corresponding area was assigned to by the last run of a fit-method.
-
method
¶ str – The method used in the last call of a fit-method for translating the p-regions problem into an MIP.
-
metric
¶ function – The distance metric specified in the last call of a fit-method.
-
n_regions
¶ int – The number of regions the areas were clustered into by the last run of a fit-method.
-
solver
¶ str – The solver used in the last call of a fit-method.
-
fit
(adj, attr, n_regions, method='flow', solver='cbc', metric='euclidean')¶ Alias for
fit_from_scipy_sparse_matrix()
.Solve the p-regions problem as MIP as described in [DCM2011].
The resulting region labels are assigned to the instance’s
labels_
attribute.Parameters: - adj (class:scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
- attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering. - n_regions (int) – Number of desired regions.
- method ({"flow", "order", "tree"}, default: "flow") –
The method to translate the clustering problem into an exact optimization model.
- solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") –
The solver to use. Unless the default solver is used, the user has to make sure that the specified solver is installed.
- ”cbc” - the Cbc (Coin-or branch and cut) solver
- ”cplex” - the CPLEX solver
- ”glpk” - the GLPK (GNU Linear Programming Kit) solver
- ”gurobi” - the Gurobi Optimizer
- metric (str or function, default: "euclidean") – See the metric argument in
region.util.get_metric_function()
.
-
fit_from_dict
(neighbors_dict, attr, n_regions, method='flow', solver='cbc', metric='euclidean')¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - neighbors_dict (dict) – Each key represents an area and each value is an iterable of neighbors of this area.
- attr (dict) – A dict with the same keys as neighbors_dict and values
representing the clustering criteria. A value can be scalar (e.g.
float or int) or a
numpy.ndarray
. - n_regions (int) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - method (str) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - solver (str) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - metric (str or function, default: "euclidean") – See the corresponding argument in
fit_from_scipy_sparse_matrix()
.
-
fit_from_geodataframe
(gdf, attr, n_regions, method='flow', solver='cbc', metric='euclidean', contiguity='rook')¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - gdf (GeoDataFrame) –
- attr (str or list) – The clustering criteria (columns of the GeoDataFrame gdf) are specified as string (for one column) or list of strings (for multiple columns).
- n_regions (int) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - method (str) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - solver (str) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - contiguity ({"rook", "queen"}, default: "rook") –
Defines the contiguity relationship between areas. Possible contiguity definitions are:
- ”rook” - Rook contiguity.
- ”queen” - Queen contiguity.
- metric (str or function, default: "euclidean") – See the corresponding argument in
fit_from_scipy_sparse_matrix()
.
-
fit_from_networkx
(graph, attr, n_regions, method='flow', solver='cbc', metric='euclidean')¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - graph (networkx.Graph) – Graph representing the areas’ contiguity relation.
- attr (str, list or dict) – If the clustering criteria are present in the networkx.Graph
graph as node attributes, then they can be specified as a string
(for one criterion) or as a list of strings (for multiple
criteria).
Alternatively, a dict can be used with each key being a node of the
networkx.Graph graph and each value being the corresponding
clustering criterion (a scalar (e.g. float or int) or a
numpy.ndarray
). If there are no clustering criteria present in the networkx.Graph graph as node attributes, then a dictionary must be used for this argument. Refer to the corresponding argument infit_from_dict()
for more details about the expected dict. - n_regions (int) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - method (str) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - solver (str) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - metric (str or function, default: "euclidean") – See the corresponding argument in
fit_from_scipy_sparse_matrix()
.
-
fit_from_scipy_sparse_matrix
(adj, attr, n_regions, method='flow', solver='cbc', metric='euclidean')¶ Solve the p-regions problem as MIP as described in [DCM2011].
The resulting region labels are assigned to the instance’s
labels_
attribute.Parameters: - adj (class:scipy.sparse.csr_matrix) – Adjacency matrix representing the areas’ contiguity relation.
- attr (
numpy.ndarray
) – Array (number of areas x number of attributes) of areas’ attributes relevant to clustering. - n_regions (int) – Number of desired regions.
- method ({"flow", "order", "tree"}, default: "flow") –
The method to translate the clustering problem into an exact optimization model.
- solver ({"cbc", "cplex", "glpk", "gurobi"}, default: "cbc") –
The solver to use. Unless the default solver is used, the user has to make sure that the specified solver is installed.
- ”cbc” - the Cbc (Coin-or branch and cut) solver
- ”cplex” - the CPLEX solver
- ”glpk” - the GLPK (GNU Linear Programming Kit) solver
- ”gurobi” - the Gurobi Optimizer
- metric (str or function, default: "euclidean") – See the metric argument in
region.util.get_metric_function()
.
-
fit_from_w
(w, attr, n_regions, method='flow', solver='cbc', metric='euclidean')¶ Alternative API for
fit_from_scipy_sparse_matrix()
.Parameters: - w (libpysal.weights.W) – W object representing the areas’ contiguity relation.
- attr (
numpy.ndarray
) – See the corresponding argument infit_from_scipy_sparse_matrix()
. - n_regions (int) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - method (str) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - solver (str) – See the corresponding argument in
fit_from_scipy_sparse_matrix()
. - metric (str or function, default: "euclidean") – See the corresponding argument in
fit_from_scipy_sparse_matrix()
.
-
region.tests package¶
Submodules¶
-
region.tests.test_util.
all_elements_equal
(iterable)¶
-
region.tests.test_util.
not_all_elements_equal
(iterable)¶
-
region.tests.test_util.
test_AZP_azp_connected_component__one_area
()¶
-
region.tests.test_util.
test_assert_feasible__contiguity
()¶
-
region.tests.test_util.
test_assert_feasible__number_of_regions
()¶
-
region.tests.test_util.
test_assert_feasible__pass_connected
()¶
-
region.tests.test_util.
test_assert_feasible__pass_disconnected
()¶
-
region.tests.test_util.
test_pop_randomly_from
()¶
-
region.tests.test_util.
test_random_element_from
()¶
-
region.tests.util.
compare_region_lists
(actual, desired)¶ Parameters: Raises: AssertionError
– If the two arguments don’t represent the same clustering.
-
region.tests.util.
convert_from_geodataframe
(gdf)¶ Convert a GeoDataFrame to other types representing the contiguity relation of the GeoDataFrame’s areas.
Parameters: gdf (GeoDataFrame) – Returns: other_formats – The 1st entry is a sparse adjacency matrix of type scipy.sparse.csr_matrix
. The 2nd entry is a networkx graph. The 3rd entry is a dict. Each key is an area and each value is an iterable of the key area’s neighbors. The 4th entry is a PySAL W object.Return type: tuple
-
region.tests.util.
region_list_from_array
(labels)¶ Parameters: labels ( numpy.ndarray
) – One-dimensional array of the areas’ region labels.Returns: region_list – Each element is a set of areas belonging to the same region. The list’s elements (the sets) are sorted by their smallest entry. Return type: list Examples
>>> import numpy as np >>> labels = np.array([0, 0, 0, ... 1, 1, 0, ... 1, 1, 1]) >>> desired = [{0, 1, 2, 5}, {3, 4, 6, 7, 8}] >>> all(r1 == r2 ... for r1, r2 in zip(region_list_from_array(labels), desired)) True
Submodules¶
region.csgraph_utils module¶
Utility functions for graph-related operations with sparse adjacency matrices (scipy.sparse.csr_matrix) using and supplementing scipy’s [compressed sparse graph routines]( https://docs.scipy.org/doc/scipy/reference/sparse.csgraph.html).
-
region.csgraph_utils.
is_connected
(adj)¶ Parameters: adj ( scipy.sparse.csr_matrix
) – Adjacency matrix.Returns: connected – True if graph defined by adjecency matrix adj is connected. False otherwise. Return type: bool Examples
>>> import numpy as np >>> from scipy.sparse import csr_matrix >>> connected = csr_matrix(np.array([[0, 1], ... [1, 0]])) >>> is_connected(connected) True >>> disconnected = csr_matrix(np.array([[0, 0], ... [0, 0]])) >>> is_connected(disconnected) False
-
region.csgraph_utils.
neighbors
(adj, area)¶ Parameters: - adj (
scipy.sparse.csr_matrix
) – Adjacency matrix. - area (int) – An area specified as index in the graph represented by the adjacency matrix adj. The integer must be in {0, 1, …, adj.shape[0]-1}.
Returns: neighs – The neighbors of area.
Return type: Examples
>>> import numpy as np >>> from scipy.sparse import csr_matrix >>> adjacency_matrix = csr_matrix(np.array([[0, 1, 1], ... [1, 0, 0], ... [1, 0, 0]])) >>> (neighbors(adjacency_matrix, 0) == np.array([1, 2])).all() True >>> (neighbors(adjacency_matrix, 1) == np.array([0])).all() True
- adj (
-
region.csgraph_utils.
sub_adj_matrix
(adj, nodes, wo_nodes=None)¶ Parameters: - adj (
scipy.sparse.csr_matrix
) – Adjacency matrix. - nodes (
numpy.ndarray
) – 1-dimensional array of nodes in the graph represented by the adj argument. The elements in nodes are (distinct) integers in {0, 1, …, nodes.shape[0]}. - wo_nodes (slice, int, array of ints, or None) – Nodes to neglect from the nodes argument.
Returns: sub_adj – Adjacency matrix of the subgraph consisting of only the nodes in the nodes argument.
Return type: Examples
>>> import numpy as np >>> from scipy.sparse import csr_matrix >>> adjacency_matrix = csr_matrix(np.array([[0, 1, 1], ... [1, 0, 0], ... [1, 0, 0]])) >>> nodes = np.array([0, 2]) >>> obtained = sub_adj_matrix(adjacency_matrix, nodes) >>> desired = np.array([[0, 1], ... [1, 0]]) >>> (obtained.todense() == desired).all() True >>> nodes = np.array([1, 2]) >>> obtained = sub_adj_matrix(adjacency_matrix, nodes) >>> desired = np.array([[0, 0], ... [0, 0]]) >>> (obtained.todense() == desired).all() True >>> type(obtained) == csr_matrix True >>> # tests for `wo_nodes` argument >>> all_nodes = np.arange(adjacency_matrix.shape[0]) >>> neglected_nodes = np.array([1]) >>> obtained = sub_adj_matrix(adjacency_matrix, all_nodes, neglected_nodes) >>> desired = np.array([[0, 1], ... [1, 0]]) >>> (obtained.todense() == desired).all() True
- adj (
region.objective_function module¶
-
class
region.objective_function.
ObjectiveFunction
(metric=None)¶ Bases:
abc.ABC
-
update
(moving_area, recipient_region, labels, attr)¶ Calculate the difference in the objective value caused by moving moving_area to recipient_region.
Parameters: - moving_area (int) – The area to move.
- recipient_region (int) – The recipient region.
- labels (
numpy.ndarray
) – The areas’ region labels before the move. Shape: number of areas. - attr (
numpy.ndarray
) – The areas’ attributes. Shape: number of areas.
Returns: diff – The change in the objective function caused by moving moving_area to recipient_region.
Return type:
-
-
class
region.objective_function.
ObjectiveFunctionCenter
(metric=None, center=<function mean>, reduction=<function sum>)¶ Bases:
region.objective_function.ObjectiveFunction
-
update
(moving_area, recipient_region, labels, attr)¶
-
-
class
region.objective_function.
ObjectiveFunctionPairwise
(metric=None)¶ Bases:
region.objective_function.ObjectiveFunction
-
update
(moving_area, recipient_region, labels, attr)¶
-
region.util module¶
-
exception
region.util.
MissingMetric
¶ Bases:
RuntimeError
Raised when a distance metric is required but was not set.
-
region.util.
Move
¶ alias of
move
-
region.util.
all_elements_equal
(array)¶
-
region.util.
array_from_df_col
(df, attr)¶ Extract one or more columns from a DataFrame as numpy array.
Parameters: Returns: col – The specified column(s) of the array.
Return type: Examples
>>> import pandas as pd >>> df = pd.DataFrame({"col1": [1, 2, 3], ... "col2": [7, 8, 9]}) >>> (array_from_df_col(df, "col1") == np.array([[1], ... [2], ... [3]])).all() True >>> (array_from_df_col(df, ["col1"]) == np.array([[1], ... [2], ... [3]])).all() True >>> (array_from_df_col(df, ["col1", "col2"]) == np.array([[1, 7], ... [2, 8], ... [3, 9]])).all() True
-
region.util.
array_from_dict_values
(dct, sorted_keys=None, flat_output=False, dtype=<class 'float'>)¶ Return values of the dictionary passed as dct argument as an numpy array. The values in the returned array are sorted by the keys of dct.
Parameters: - dct (dict) –
- sorted_keys (iterable, optional) – If passed, then the elements of the returned array will be sorted by this argument. Thus, this argument can be passed to suppress the sorting, or for getting a subset of the dictionary’s values or to get repeated values.
- flat_output (bool, default: False) – If True, the returned array will be one-dimensional. If False, the returned array will be two-dimensional with one row per key in dct.
- dtype (default: np.float64) – The dtype of the returned array.
Returns: array
Return type: Examples
>>> dict_flat = {0: 0, 1: 10} >>> dict_it = {0: [0], 1: [10]} >>> desired_flat = np.array([0, 10]) >>> desired_2d = np.array([[0], ... [10]]) >>> flat_flat = array_from_dict_values(dict_flat, flat_output=True) >>> (flat_flat == desired_flat).all() True >>> flat_2d = array_from_dict_values(dict_flat) >>> (flat_2d == desired_2d).all() True >>> it_flat = array_from_dict_values(dict_it, flat_output=True) >>> (it_flat == desired_flat).all() True >>> it_2d = array_from_dict_values(dict_it) >>> (it_2d == desired_2d).all() True
-
region.util.
array_from_graph
(graph, attr)¶ Parameters: - graph (networkx.Graph) –
- attr (str or iterable) – If str, then it specifies the an attribute of the graph’s nodes. If iterable of strings, then multiple attributes of the graph’s nodes are specified.
Returns: array – Array with one row for each node in graph.
Return type: Examples
>>> import networkx as nx >>> edges = [(0, 1), (1, 2), # 0 | 1 | 2 ... (0, 3), (1, 4), (2, 5), # --------- ... (3, 4), (4,5)] # 3 | 4 | 5 >>> graph = nx.Graph(edges) >>> data_dict = {node: 10*node for node in graph} >>> nx.set_node_attributes(graph, "test_data", data_dict) >>> desired = np.array([[0], ... [10], ... [20], ... [30], ... [40], ... [50]]) >>> (array_from_graph(graph, "test_data") == desired).all() True >>> (array_from_graph(graph, ["test_data"]) == desired).all() True >>> (array_from_graph(graph, ["test_data", "test_data"]) == ... np.hstack((desired, desired))).all() True
-
region.util.
array_from_graph_or_dict
(graph, attr)¶
-
region.util.
array_from_region_list
(region_list)¶ Parameters: region_list (list) – Each list element is an iterable of a region’s areas. Returns: labels – Each element specifies the region of the corresponding area. Return type: numpy.ndarray
Examples
>>> import numpy as np >>> obtained = array_from_region_list([{0, 1, 2, 5}, {3, 4}]) >>> desired = np.array([ 0, 0, 0, 1, 1, 0]) >>> (obtained == desired).all() True
-
region.util.
assert_feasible
(solution, adj, n_regions=None)¶ Parameters: - solution (
numpy.ndarray
) – Array of region labels. - adj (
scipy.sparse.csr_matrix
) – Adjacency matrix representing the contiguity relation. - n_regions (int or None) – An int represents the desired number of regions. If None, then the number of regions is not checked.
Raises: exc : `ValueError` – A ValueError is raised if clustering is not spatially contiguous. Given the n_regions argument is not None, a ValueError is raised also if the number of regions is not equal to the n_regions argument.
- solution (
-
region.util.
check_solver
(solver)¶
-
region.util.
copy_func
(f)¶ Return a copy of a function. This is useful e.g. to create aliases (whose docstrings can be changed without affecting the original function). The implementation is taken from https://stackoverflow.com/a/13503277.
Parameters: f (function) – Returns: g – Copy of f. Return type: function
-
region.util.
count
(arr, el)¶ Parameters: - arr (
numpy.ndarray
) – - el (object) –
Returns: result – The number of occurences of el in arr.
Return type: Examples
>>> arr = np.array([0, 0, 0, 1, 1]) >>> count(arr, 0) 3 >>> count(arr, 1) 2 >>> count(arr, 2) 0
- arr (
-
region.util.
dataframe_to_dict
(df, cols)¶ Parameters: - df (Union[
pandas.DataFrame
,geopandas.GeoDataFrame
]) – - cols (Union[str, list]) – If str, then it is the name of a column of df. If list, then it is a list of strings. Each string is the name of a column of df.
Returns: result – The keys are the elements of the DataFrame’s index. Each value is a
numpy.ndarray
holding the corresponding values in the columns specified by cols.Return type: Examples
>>> import pandas as pd >>> df = pd.DataFrame({"data": [100, 120, 115]}) >>> result = dataframe_to_dict(df, "data") >>> result == {0: 100, 1: 120, 2: 115} True >>> import numpy as np >>> df = pd.DataFrame({"data": [100, 120], ... "other": [1, 2]}) >>> actual = dataframe_to_dict(df, ["data", "other"]) >>> desired = {0: np.array([100, 1]), 1: np.array([120, 2])} >>> all(np.array_equal(actual[i], desired[i]) for i in desired) True
- df (Union[
-
region.util.
dict_from_graph_attr
(graph, attr, array_values=False)¶ Parameters: - graph (networkx.Graph) –
- attr (str, iterable, or dict) – If str, then it specifies the an attribute of the graph’s nodes. If iterable of strings, then multiple attributes of the graph’s nodes are specified. If dict, then each key is a node and each value the corresponding attribute value. (This format is also this function’s return format.)
- array_values (bool, default: False) – If True, then each value is transformed into a
numpy.ndarray
.
Returns: result_dict – Each key is a node in the graph. If array_values is False, then each value is a list of attribute values corresponding to the key node. If array_values is True, then each value this list of attribute values is turned into a
numpy.ndarray
. That requires the values to be shape-compatible for stacking.Return type: Examples
>>> import networkx as nx >>> edges = [(0, 1), (1, 2), # 0 | 1 | 2 ... (0, 3), (1, 4), (2, 5), # --------- ... (3, 4), (4,5)] # 3 | 4 | 5 >>> graph = nx.Graph(edges) >>> data_dict = {node: 10*node for node in graph} >>> nx.set_node_attributes(graph, "test_data", data_dict) >>> desired = {key: [value] for key, value in data_dict.items()} >>> dict_from_graph_attr(graph, "test_data") == desired True >>> dict_from_graph_attr(graph, ["test_data"]) == desired True
-
region.util.
distribute_regions_among_components
(component_labels, n_regions)¶ Parameters: Returns: result_dict – Each key is a label of a connected component. Each value specifies into how many regions the component is to be clustered.
Return type:
-
region.util.
find_sublist_containing
(el, lst, index=False)¶ Parameters: - el – The element to search for in the sublists of lst.
- lst (collections.Sequence) – A sequence of sequences or sets.
- index (bool, default: False) – If False (default), the subsequence or subset containing el is returned. If True, the index of the subsequence or subset in lst is returned.
Returns: result – See the index argument for more information.
Return type: collections.Sequence, collections.Set or int
Raises: exc : LookupError – If el is not in any of the elements of lst.
Examples
>>> lst = [{0, 1}, {2}] >>> find_sublist_containing(0, lst, index=False) == {0, 1} True >>> find_sublist_containing(0, lst, index=True) == 0 True >>> find_sublist_containing(2, lst, index=False) == {2} True >>> find_sublist_containing(2, lst, index=True) == 1 True
-
region.util.
generate_initial_sol
(adj, n_regions)¶ Generate a random initial clustering.
Parameters: - adj (
scipy.sparse.csr_matrix
) – - n_regions (int) –
Yields: region_labels (
numpy.ndarray
) – An array with -1 for areas which are not part of the yielded component and an integer >= 0 specifying the region of areas within the yielded component.- adj (
-
region.util.
get_metric_function
(metric=None)¶ Parameters: metric (str or function or None, default: None) – Using None is equivalent to using “euclidean”.
If str, then this string specifies the distance metric (from scikit-learn) to use for calculating the objective function. Possible values are:
- ”cityblock” for sklearn.metrics.pairwise.manhattan_distances
- ”cosine” for sklearn.metrics.pairwise.cosine_distances
- ”euclidean” for sklearn.metrics.pairwise.euclidean_distances
- ”l1” for sklearn.metrics.pairwise.manhattan_distances
- ”l2” for sklearn.metrics.pairwise.euclidean_distances
- ”manhattan” for sklearn.metrics.pairwise.manhattan_distances
If function, then this function should take two arguments and return a scalar value. Furthermore, the following conditions must be fulfilled:
- d(a, b) >= 0, for all a and b
- d(a, b) == 0, if and only if a = b, positive definiteness
- d(a, b) == d(b, a), symmetry
- d(a, c) <= d(a, b) + d(b, c), the triangle inequality
Returns: metric_func – If the metric argument is a function, it is returned. If the metric argument is a string, then the corresponding distance metric function from sklearn.metrics.pairwise is returned. Return type: function
-
region.util.
get_solver_instance
(solver_string)¶
-
region.util.
make_move
(moving_area, new_label, labels)¶ Modify the labels argument in place (no return value!) such that the area moving_area has the new region label new_label.
Parameters: - moving_area – The area to be moved (assigned to a new region).
- new_label (int) – The new region label of area moving_area.
- labels (
numpy.ndarray
) – Each element is a region label of the area corresponding array index.
Examples
>>> import numpy as np >>> labels = np.array([0, 0, 0, 0, 1, 1]) >>> make_move(3, 1, labels) >>> (labels == np.array([0, 0, 0, 1, 1, 1])).all() True
-
region.util.
pop_randomly_from
(lst)¶
-
region.util.
raise_distance_metric_not_set
(x, y)¶
-
region.util.
random_element_from
(lst)¶
-
region.util.
scipy_sparse_matrix_from_dict
(neighbors)¶ Parameters: neighbors (dict) – Each key represents an area. The corresponding value contains the area’s neighbors. Returns: adj – Adjacency matrix representing the areas’ contiguity relation. Return type: scipy.sparse.csr_matrix
Examples
>>> neighbors = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5}, ... 3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4}} >>> obtained = scipy_sparse_matrix_from_dict(neighbors) >>> desired = np.array([[0, 1, 0, 1, 0, 0], ... [1, 0, 1, 0, 1, 0], ... [0, 1, 0, 0, 0, 1], ... [1, 0, 0, 0, 1, 0], ... [0, 1, 0, 1, 0, 1], ... [0, 0, 1, 0, 1, 0]]) >>> (obtained.todense() == desired).all() True >>> neighbors = {"left": {"middle"}, ... "middle": {"left", "right"}, ... "right": {"middle"}} >>> obtained = scipy_sparse_matrix_from_dict(neighbors) >>> desired = np.array([[0, 1, 0], ... [1, 0, 1], ... [0, 1, 0]]) >>> (obtained.todense() == desired).all() True
-
region.util.
scipy_sparse_matrix_from_w
(w)¶ Parameters: w ( libpysal.weights.weights.W
) – A W object representing the areas’ contiguity relation.Returns: adj – Adjacency matrix representing the areas’ contiguity relation. Return type: scipy.sparse.csr_matrix
Examples
>>> import libpysal as ps >>> neighbor_dict = {0: {1}, 1: {0, 2}, 2: {1}} >>> w = ps.weights.W(neighbor_dict) >>> obtained = scipy_sparse_matrix_from_w(w) >>> desired = np.array([[0, 1, 0], ... [1, 0, 1], ... [0, 1, 0]]) >>> (obtained.todense() == desired).all() True
-
region.util.
separate_components
(adj, labels)¶ Take a labels array and yield modifications of it (one modified array per connected component). The modified array will be unchanged at those indices belonging to the current connected component. Thus it will have integers >= 0 there. At all other indices the Yielded array will be -1.
Parameters: - adj (
scipy.sparse.csr_matrix
) – Adjacency matrix representing the contiguity relation. - labels (
numpy.ndarray
) –
Yields: comp_dict (
numpy.ndarray
) – Each yielded dict represents one connected component of the graph specified by the adj argument. In a yielded dict, each key is an area and each value is the corresponding region-ID.Examples
>>> edges_island1 = [(0, 1), (1, 2), # 0 | 1 | 2 ... (0, 3), (1, 4), (2, 5), # --------- ... (3, 4), (4,5)] # 3 | 4 | 5 >>> >>> edges_island2 = [(6, 7), # 6 | 7 ... (6, 8), (7, 9), # ----- ... (8, 9)] # 8 | 9 >>> >>> graph = nx.Graph(edges_island1 + edges_island2) >>> adj = nx.to_scipy_sparse_matrix(graph) >>> >>> # island 1: island divided into regions 0, 1, and 2 >>> sol_island1 = [area%3 for area in range(6)] >>> # island 2: all areas are in region 3 >>> sol_island2 = [3 for area in range(6, 10)] >>> labels = np.array(sol_island1 + sol_island2) >>> >>> yielded = list(separate_components(adj, labels)) >>> yielded.sort(key=lambda arr: arr[0], reverse=True) >>> (yielded[0] == np.array([0, 1, 2, 0, 1, 2, -1, -1, -1, -1])).all() True >>> (yielded[1] == np.array([-1, -1, -1, -1, -1, -1, 3, 3, 3, 3])).all() True
- adj (
-
region.util.
w_from_gdf
(gdf, contiguity)¶ Get a W object from a GeoDataFrame.
Parameters: - gdf (GeoDataFrame) –
- contiguity ({"rook", "queen"}) –
Returns: weights – The contiguity information contained in the gdf argument in the form of a W object.
Return type: W