Spaceland: access shapefiles in Python¶
Spaceland is a modern Python library for fast, Pythonic access to ESRI shapefiles.
Or, at least, that’s what it will be. Spaceland is in active development, but it’s still early days and the library isn’t yet feature-complete. What it does support is reading dBase III files (for why this matters, see What’s a shapefile?).
Spaceland is developed on GitHub and contributions are welcome.
What’s a shapefile?¶
Created by ESRI in the early 1990s, the shapefile is a historical data format that has outlived its technical usefulness and yet persists as the lingua franca of most geospatial tools.
Aim of the library¶
The aim of Spaceland is to provide the fastest and most idiomatic method of reading ESRI shapefiles in Python 3. To support that aim, the objectives are:
- Read all shapes and records from shapefiles as fast as possible
- Be written in idiomatic Python 3 and provide a modern Python 3 interface to shapefiles
- Use built-in types as much as possible (you shouldn’t have to know about shapefile/dBase internals to use the data)
- Provide a high-level interface to data in zipped shapefiles
- Provide a low-level interface for those that need it
- Let people convert shapefile data into more modern formats
- Integrate with other Python geospatial libraries
- Include as close to 100% test coverage as possible
- Include high-quality documentation on the library and the shapefile format
For further details, see the roadmap.
What it won’t do¶
Spaceland is read-only. The shapefile should be considered a historical data format since it’s not well-suited to our Web-focussed world, and more suitable formats are now available (e.g. GeoJSON, TopoJSON, geospatial databases).
Spaceland won’t convert between coordinate systems, nor will it manipulate or analyse the data. But it should integrate with packages that do.
Roadmap¶
The current version is 0.1.0-dev. Here’s the roadmap for getting to version 1.0.0.
Version 0.1.0¶
- Read two-dimensional points, poly lines and polygons from shapefiles
- Publish package on PyPI
Version 0.2.0¶
- Read from ZIP files directly
- High-level interface for reading shapes and records
- Installation and quick start documentation
Version 0.3.0¶
- Support shape indexes (.shx files)
- Include projection metadata from .prj files
- Automatically select dBase III character-encoding from .cpg files
Version 0.4.0¶
- Convert shapefiles to GeoJSON
- Command-line interface for converting shapefiles to GeoJSON
- Documentation on the command-line interface
- Integrate with Shapely
Version 0.5.0¶
- Support multi points
- Support three-dimensional points, poly lines, polygons, and multi points
Version 0.6.0¶
- Support measures on points, poly lines, polygons, and multi points
Version 1.0.0¶
- Support surface patches (MultiPatch)
Installation¶
To do.
Quick start¶
To do.
Command-line interface¶
The dbfr command allows you to read records from a dBase III file:
    usage: dbfr [-h] [--encoding ENCODING] [--delimiter DELIMITER] [--quote QUOTE]
                [--quote-always] [--escape ESCAPE] [--no-header] [--crlf]
                filename

    Convert a dBase III file to CSV.

    positional arguments:
      filename

    optional arguments:
      -h, --help            show this help message and exit
      --encoding ENCODING, -e ENCODING
                            set encoding used to decode the DBF input
      --delimiter DELIMITER, -d DELIMITER
                            set field separator for CSV output
      --quote QUOTE, -q QUOTE
                            set quote character for CSV output
      --quote-always        quote all fields in output
      --escape ESCAPE       set character used to escape a quote character
      --no-header, -n       don't output column names in the first row
      --crlf                use '\r\n' line endings in the output
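The CSV options listed in the usage text correspond closely to parameters of the csv.writer in Python's standard library. The following is a minimal sketch of one way such flags could map onto a writer. The function write_rows and its parameter names are hypothetical, chosen for illustration, and this is not necessarily how dbfr is implemented.

```python
import csv
import io


def write_rows(rows, delimiter=",", quote='"', quote_always=False,
               escape=None, crlf=False):
    """Write rows to a CSV string using options that mirror the dbfr flags.

    Hypothetical helper: the mapping from flags to csv.writer parameters
    is an assumption made for this example.
    """
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=delimiter,
        quotechar=quote,
        quoting=csv.QUOTE_ALL if quote_always else csv.QUOTE_MINIMAL,
        escapechar=escape,
        # With an escape character, escape embedded quotes instead of doubling them.
        doublequote=escape is None,
        lineterminator="\r\n" if crlf else "\n",
    )
    writer.writerows(rows)
    return out.getvalue()


csv_text = write_rows([("name", "pop"), ("Oslo", 700000)],
                      delimiter=";", quote_always=True)
print(csv_text, end="")
```

Building the output in memory keeps the sketch easy to test. A command-line tool would more likely write straight to stdout.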
Reading shapefile attributes from a DBF file¶
To do.
The spaceland package¶
The spaceland package, named after the three-dimensional world in Edwin Abbott’s book Flatland: A Romance of Many Dimensions, contains everything required to read ESRI shapefiles. It’s broken down into several core modules:
The spaceland.shp module¶
Read non-topological geometric records from the ESRI Shapefile format.
The Shapefile format was documented by ESRI in 1998 and is available in a document titled ESRI Shapefile Technical Description.
class spaceland.shp.Shapefile(shp: typing.IO[bytes]) → None¶
Read records from an ESRI shapefile.
A shapefile is a binary format created by ESRI in the early 1990s for storing non-topological geometries. After a short header containing file metadata, the geometries are stored in a sequence of individual records. The format is compact and fast to read, but because it can’t contain indexes, details of the projection used, or metadata on individual shapes, it’s commonly accompanied by other files (e.g. a dBase III database for geometry metadata).
Class objects allow for iteration and can be used as context managers.
get_parse_function()¶
Return a function capable of parsing a particular type of shape.
The function returned will be suitable for parsing shapefile records of one type (e.g. two-dimensional points). The type is defined in the header of the shapefile, and so the returned function will handle all non-null records within a single shapefile.
Return type: Callable[[bytes], tuple]
records()¶
Yield all geometric records in the shapefile, one-by-one.
Records are returned in file order. Each record is a tuple whose structure depends on the shape type. The structure of each shape type’s tuple is detailed in the shape parsing functions:
- parse_null_record() for null shapes
- parse_point_record() for two-dimensional points
The appropriate parsing function for a file can be found using Shapefile.get_parse_function().
Return type: Iterable[tuple]
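To make "file order" concrete, here is a rough sketch of how records are laid out after the 100-byte file header, following the ESRI Shapefile Technical Description. Each record has an 8-byte big-endian header (record number, then content length in 16-bit words), followed by content that begins with a little-endian shape type. The function iter_record_contents is a hypothetical name for this example and is not part of Spaceland's API.

```python
import struct


def iter_record_contents(data: bytes):
    """Yield (record_number, shape_type, payload) for each record in .shp bytes.

    Hypothetical helper based on the published format, not Spaceland's code.
    """
    offset = 100  # the main file header is a fixed 100 bytes
    while offset + 8 <= len(data):
        # Record header: record number and content length, both big-endian int32.
        number, length_words = struct.unpack(">2i", data[offset:offset + 8])
        start = offset + 8
        end = start + length_words * 2  # lengths are counted in 16-bit words
        if length_words < 2 or end > len(data):
            raise ValueError("malformed or truncated record")
        # Record content starts with a little-endian int32 shape type.
        (shape_type,) = struct.unpack("<i", data[start:start + 4])
        yield number, shape_type, data[start + 4:end]
        offset = end


# Build a tiny in-memory example: a zeroed header, one point and one null shape.
point = struct.pack(">2i", 1, 10) + struct.pack("<i2d", 1, 3.0, 4.0)
null = struct.pack(">2i", 2, 2) + struct.pack("<i", 0)
demo_records = list(iter_record_contents(bytes(100) + point + null))
```

In this sketch the payload for the point record is the 16 bytes that a point-parsing function would receive, and the null record's payload is empty.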
class spaceland.shp.ShapefileMeta(shape_type, x_min, y_min, x_max, y_max, z_min, z_max, m_min, m_max)¶
- m_max: Alias for field number 8
- m_min: Alias for field number 7
- shape_type: Alias for field number 0
- x_max: Alias for field number 3
- x_min: Alias for field number 1
- y_max: Alias for field number 4
- y_min: Alias for field number 2
- z_max: Alias for field number 6
- z_min: Alias for field number 5
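The fields of ShapefileMeta appear in the same order as the shape type and bounding box values in the 100-byte shapefile header described in the ESRI Shapefile Technical Description. As an illustration only, the sketch below unpacks those values with the struct module. HeaderSketch and read_header are hypothetical names, and Spaceland's own header parsing may differ.

```python
import struct
from collections import namedtuple

# Hypothetical stand-in for ShapefileMeta, with the same field order.
HeaderSketch = namedtuple(
    "HeaderSketch",
    "shape_type x_min y_min x_max y_max z_min z_max m_min m_max",
)


def read_header(data: bytes) -> HeaderSketch:
    """Unpack the shape type and bounding box from a 100-byte .shp header."""
    if len(data) < 100:
        raise ValueError("a shapefile header is 100 bytes long")
    # Bytes 0-3 hold the file code 9994 as a big-endian int32.
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    # Bytes 32-99: shape type (int32) and eight doubles, all little-endian.
    return HeaderSketch(*struct.unpack("<i8d", data[32:100]))


# A synthetic header: file code, five unused ints, file length, version 1000,
# shape type 1 (point), then the bounding box and unused Z/M ranges.
demo_header = (
    struct.pack(">7i", 9994, 0, 0, 0, 0, 0, 50)
    + struct.pack("<2i8d", 1000, 1, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0)
)
meta = read_header(demo_header)
```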
spaceland.shp.parse_null_record(content)¶
Parse a null shape record from a shapefile.
A null shape is an empty record with no geometric data. It can be used as a shape type for a shapefile but it’s also valid as a placeholder in a shapefile of any other type. That is, a shapefile of polygons can also include null shape records. This is the only valid way a shapefile can contain multiple shape types.
Parameters: content (bytes) – An empty byte string
Return type: tuple
Returns: An empty tuple.
spaceland.shp.parse_point_record(content)¶
Parse a point shape record from a shapefile.
A point consists of a pair of double-precision coordinates ordered x, y.
Parameters: content (bytes) – 16 bytes containing two 64-bit IEEE double-precision floating-point numbers, in little-endian byte order.
Return type: tuple
Returns: A tuple containing a point in x, y order.
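The byte layout described for a point record can be decoded with a single struct call. This is a minimal sketch under that description. The name parse_point is hypothetical and the real parse_point_record() may be written differently.

```python
import struct


def parse_point(content: bytes) -> tuple:
    """Decode 16 bytes into an (x, y) tuple of floats."""
    if len(content) != 16:
        raise ValueError("a point record needs exactly 16 bytes")
    # "<2d" means two little-endian 64-bit IEEE doubles.
    return struct.unpack("<2d", content)


point = parse_point(struct.pack("<2d", 1.5, -2.25))
```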
The spaceland.dbf module¶
Reads the subset of the dBase III file format used by ESRI shapefiles.
The dBase III format was never specified publicly but it has been reverse-engineered. The best documentation on the subject can be found at http://www.clicketyclick.dk/databases/xbase/format/dbf.html.
class spaceland.dbf.DbaseFile(dbf: typing.IO[bytes], encoding: str = 'ascii') → None¶
Read fields and records from a dBase III binary file.
A dBase III file is a simple tabular data format consisting of a header, fields (columns), and records (rows). Fields are typed; as used in the ESRI shapefile format, each field in a dBase III file must have one of five types: string, float, integer, date, or boolean. All types allow null values.
Class objects allow for iteration and slicing, and they also work as context managers.
record(index)¶
Return the record at the given index.
Parameters: index (int) – The position of the record relative to the beginning of the file.
Return type: tuple
Returns: A namedtuple, each item matching one field in the record.
records(start=0)¶
Yield the records in the file.
A record is a set of fields and their values. The field names, types, and order are consistent across all records in the file.
It’s possible that a field has an invalid value (e.g. a non-numeric value in an integer field). When this happens the value becomes None and no error is raised.
Parameters: start (int) – The record from which to start iteration. By default starts with the first record in the file.
Yields: A namedtuple, each item matching one field in the record. Item names and order are consistent across records within the same file, but will differ between files.
Return type: Iterable[tuple]
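The "invalid values become None" behaviour can be pictured with a small lenient parser feeding a namedtuple. This is only a sketch of the idea. The helper parse_int_or_none, the Record type, and the sample field values are all invented for illustration and are not taken from Spaceland.

```python
from collections import namedtuple


def parse_int_or_none(value: bytes):
    """Return an int, or None when the bytes aren't a valid integer."""
    try:
        return int(value.decode("ascii").strip())
    except ValueError:  # also covers UnicodeDecodeError, a ValueError subclass
        return None


# Hypothetical record type; in practice field names come from the file header.
Record = namedtuple("Record", ["name", "population"])

# Made-up raw field bytes: the second field holds a non-numeric placeholder.
raw = (b"Oslo      ", b"   ***")
record = Record(raw[0].decode("ascii").rstrip(), parse_int_or_none(raw[1]))
```

Returning None instead of raising lets iteration continue past a single bad value, at the cost of the caller needing to check for None.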
spaceland.dbf.get_parse_str(encoding)¶
Return a function that decodes bytes to strings.
The returned function decodes the bytes using the character encoding passed to this function.
>>> utf8 = get_parse_str("UTF-8")
>>> utf8(b'\xf0\x9f\x91\x8d')
'👍'
Parameters: encoding (str) – The name of a character encoding that can be used to decode the bytes to a string.
Return type: Callable[[bytes], str]
Returns: A function that uses the given character encoding to convert bytes to strings.
spaceland.dbf.parse_bool(value)¶
Convert bytes to a boolean value.
Parameters: value (bytes) – A bytes value to be converted to a boolean value.
Return type: Optional[bool]
Returns: True if the bytes value is Y, y, T, or t; False if the bytes value is N, n, F, or f; None otherwise.
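The mapping described for parse_bool() is simple enough to sketch directly. The function below, under the hypothetical name parse_bool_sketch, follows the documented return values and is not copied from the library.

```python
def parse_bool_sketch(value: bytes):
    """Map a one-byte dBase logical value to True, False, or None."""
    if value in (b"Y", b"y", b"T", b"t"):
        return True
    if value in (b"N", b"n", b"F", b"f"):
        return False
    # Anything else (e.g. b"?" or b" ") is treated as a null value.
    return None


results = [parse_bool_sketch(v) for v in (b"Y", b"f", b"?")]
```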
spaceland.dbf.parse_date(value)¶
Convert bytes in the format YYYYMMDD to a datetime.date object.
Parameters: value (bytes) – A bytes value to be converted to a date.
Return type: Optional[date]
Returns: A datetime.date object if the bytes value is a valid date, but None otherwise.
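One way to get the "valid date or None" behaviour is to check the shape of the input first and let datetime.date reject impossible dates. This is a sketch under that assumption. parse_date_sketch is a hypothetical name and the library's own implementation may differ.

```python
from datetime import date


def parse_date_sketch(value: bytes):
    """Convert b"YYYYMMDD" to a date, or return None if it isn't a valid date."""
    # Require exactly eight ASCII digits before slicing out the parts.
    if len(value) != 8 or not value.isdigit():
        return None
    try:
        return date(int(value[0:4]), int(value[4:6]), int(value[6:8]))
    except ValueError:  # e.g. month 13 or 30 February
        return None


leap_day = parse_date_sketch(b"20240229")
impossible = parse_date_sketch(b"20230230")
```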
The spaceland.cli module¶
Command-line interface to the library’s functionality.
This module provides the following functions that are registered as ‘console script’ entry points in setup.py:
- dbf_to_csv(): convert dBase III files to CSVs (as command dbfr)
When the package is installed via setuptools (e.g. using pip install) the commands are immediately available to the user.
spaceland.cli.dbf_to_csv()¶
Read a dBase III file and convert it to a CSV.
Used as a ‘console script’ entry point in setup.py and available on the command-line as dbfr. The dBase III file named as an argument is parsed and converted to CSV, and output to stdout. The CSV dialect used can be configured using command-line options, as can the character-encoding used when reading the dBase file.
Return type: None
spaceland.cli.extant_file(arg)¶
Type-check an argument to ensure it names an existing file.
Return type: Path
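A function like this is typically passed to argparse as the type of an argument, so that a bad filename produces a usage error before any parsing work starts. The sketch below shows that general pattern. The name existing_file and the error message are hypothetical, and the real extant_file() may behave differently in detail.

```python
import argparse
from pathlib import Path


def existing_file(arg: str) -> Path:
    """Return arg as a Path, or raise an argparse error if no such file exists."""
    path = Path(arg)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"{arg} is not an existing file")
    return path


# argparse calls the type function on the raw string and reports
# ArgumentTypeError as a normal command-line usage error.
parser = argparse.ArgumentParser(description="Sketch of a file-checking argument.")
parser.add_argument("filename", type=existing_file)
```

With this in place, parser.parse_args(["data.dbf"]).filename would be a Path when the file exists, and the parser would exit with an error message otherwise.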