Pygaarst: geospatial analysis and remote sensing tools for Python¶
Pygaarst is a Python package that is designed to make it easy to access geospatial raster data such as remote sensing imagery, and perform frequently needed processing steps in a human-friendly way.
Loading data, accessing the intended band or dataset, converting it to meaningful units and calculating standardised indices should be easy. As should be plotting the imagery on a map or combining raster with vector datasets, such as ESRI shapefiles.
Some examples:
>>> from pygaarst import raster
>>> basedir = "path/to/data"
>>> landsatscene = "LE70690142004201EDC01"
>>> sc = raster.Landsatscene(os.path.join(basedir, landsatscene))
>>> swirband = sc.band7
>>> type(swirband.data)
<type 'numpy.ndarray'>
>>> type(swirband.radiance)
<type 'numpy.ndarray'>
>>> swirband.radiance.shape
(8021, 8501)
>>> ndvi = sc.ndvi
As of the moment this documentation is written, pygaarst is under development and not all of the API design should be considered stable. The following capabilities are supported:
- Python 2 (2.6+)
- GeoTiff base class (in pygaarst.raster), with Landsatband, ALIband and Hyperionband inheriting from it
- Landsatscene, Hyperionscene and ALIscene to represent a scene directory as retrieved from the USGS data portal
- VIIRSHDF5 for NPP VIIRS SDS dataset files in HDF5 format, as retrieved from NOAA’s portals
- Reading georeference files or metadata and calculation of pixel-center and, for gridded data, pixel-corner coordinate references
- A metadata parser for USGS-style MTL files in pygaarst.mtlutils
- Multiple helper methods, functions and properties for frequently repeated tasks: transformation to at-sensor radiance and reflectance, NDVI and NBR (for Landsat) as well as generic normalized difference indices, radiant temperature from thermal infrared bands, calculation and export of radiance spectra for Hyperion, LTK cloud masking algorithm for Landsat...
- A client for the NASA’s modaps data download API in pygaarst.modapsclient
The following capabilities are planned, roughly in order of priority:
- HDF4 (EOS) swath datasets (MODIS and ASTER Level 1B)
- Basic geometric and statistical operations involving raster and vector structures such as overlays, find-nearest, is-in...
- Python 3 support
- MODIS gridded data products
A step-by-step example of using pygaarst.raster to access and plot VIIRS and Landsat data is worked through in this IPython Notebook.
Release: | 0.1 |
---|---|
Date: | March 04, 2015 |
Contents:
Getting started¶
Prerequisites¶
As of now, Pygaarst requires Python 2.7. Python 3 (probably >= 3.3) is on the roadmap.
In case you only need the Landsat/Hyperion/ALI metadata parsing features or the client for the NASA MODAPS API there are no dependencies on external libraries.
The following Python packages are required for using any raster manipulation functionality. This includes having the corresponding binary libraries installed:
- numpy
- pyproj
- GDAL
The following Python packages are prerequisites for optional functionality:
- h5py (for reading and processing HDF5 files such as VIIRS scenes)
- python-hdf4 (for reading and processing HDF4 files such as MODIS scenes – but in its absence, GDAL will be tried)
- matplotlib (for plotting)
- mpl_toolkits.basemap (for plotting on a map)
- pytest (for unit tests)
- fiona (for reading and processing GIS vector files)
- shapely (for operations on vector data)
Future functionality is expected to require the following packages:
- netCDF4 (for reading and processing NetCDF files)
Please note that installing the prerequisites may, depending on your configuration, require some planning:
- Installing Numpy via pip will require a C and a Fortran 77 compiler to build the extension modules. A good alternative is to use a binary distribution. The same is true for Matplotlib.
- GDAL, pyproj, h5py, python-hdf4, netCDF4 and GEOS (which is a prerequisite for shapely and the Basemap toolkit) require libraries to be installed before the Python bindings can be installed. The easiest way to do this in a consistent manner is often to use a package manager on GNU/Linux or OS X. For Windows, binary packages are available.
- The Basemap toolkit for Matplotlib is a very large install. For Windows, a binary package is available. For GNU/Linux and OS X, it needs to be built from source. If you don’t need to plot on maps, you don’t need it!
- In many cases, it may be easiest to install a complete scientific Python
software package such as the Enthought Python Distribution or Anaconda, or to use a package manager (on GNU/Linux or OS X) There is one exception to this: Pygaarst can use GDAL to access HDF-EOS (HDF4) files such as MODIS and ASTER L1B data and georeference files. GDAL is, however, not by default compiled with HDF4 support. On OS X, it is rather difficult to install a GDAL with HDF4 support via either Homebrew or the Anaconda Python distribution. The frameworks provided by kyngchaos, while otherwise a sub-optimal way to provide the necessary environment, however, have the required support.
To account for individual user preferences, dependencies are listed in setup.py as optional (using the extras_required key) for automatic install. The requirements.txt file lists them, however, and can be used to install them via pip. You may edit requirements.txt to comment out the optional dependencies you know you won’t need.
My personal installation flow, which works well on OS X is to:
- first install all the system libraries with the correct support options
- then install the Python packages via pip
Pygaarst is distributed under the terms of the MIT License.
Installation¶
The use of a virtualenv is recommended.
At the current stage, there is no stable release for Pygaarst yet. The latest code from the master branch can be installed via:
$ pip install git+https://github.com/chryss/pygaarst.git
Usage example¶
A step-by-step example of using pygaarst.raster to access and plot VIIRS and Landsat data is worked through in this IPython Notebook.
Working with generic raster formats¶
GeoTIFF¶
Pygaarst raster module provides a custom GeoTIFF class with methods and properties that aim at making it easy to open and process GeoTIFF files. It works with both single and multi-band GeoTIFF files. Under the hood, the GeoTIFF class uses GDAL, so the instance attributes it makes available are will look familiar to those who work with GDAL. The default public methods and attributes are:
- class pygaarst.raster.GeoTIFF(filepath)¶
Represents a GeoTIFF file for data access and processing and provides a number of useful methods and attributes.
- Arguments:
- filepath (str): the full or relative file path
- clone(newpath, newdata)¶
Creates new GeoTIFF object from existing: new data, same georeference.
- Arguments:
- newpath: valid file path newdata: numpy array, 2 or 3-dim
- Returns:
- A raster.GeoTIFF object
- ij2xy(i, j)¶
Converts array index pair(s) to easting/northing coordinate pairs(s).
NOTE: array coordinate origin is in the top left corner whereas easting/northing origin is in the bottom left corner. Easting and northing are floating point numbers, and refer to the top-left corner coordinate of the pixel. i runs from 0 to nrow-1, j from 0 to ncol-1. For i=nrow and j=ncol, the bottom-right corner coordinate of the bottom-right pixel will be returned. This is identical to the bottom- right corner.
- Arguments:
- i (int): scalar or array of row coordinate index j (int): scalar or array of column coordinate index
- Returns:
- x (float): scalar or array of easting coordinates y (float): scalar or array of northing coordinates
- simpleplot()¶
Quick and dirty plot of each band (channel, dataset) in the image. Requires Matplotlib.
- xy2ij(x, y, precise=False)¶
Convert easting/northing coordinate pair(s) to array coordinate pairs(s).
NOTE: see note at ij2xy()
- Arguments:
- x (float): scalar or array of easting coordinates y (float): scalar or array of northing coordinates precise (bool): if true, return fractional array coordinates
- Returns:
- i (int, or float): scalar or array of row coordinate index j (int, or float): scalar or array of column coordinate index
- Lat¶
Latitude coordinate of each pixel corner, as an array
- Lat_pxcenter¶
Latitude coordinate of each pixel center, as an array
- Lon¶
Longitude coordinate of each pixel corner, as an array
- Lon_pxcenter¶
Longitude coordinate of each pixel center, as an array
- coordtrans¶
A PROJ4 Proj object, which is able to perform coordinate transformations
- data¶
2D numpy array for single-band GeoTIFF file data. Otherwise, 3D.
- delx¶
The sampling distance in x-direction, in physical units (eg metres)
- dely¶
The sampling distance in y-direction, in physical units (eg metres). Negative in northern hemisphere.
- easting¶
The x-coordinates of first row pixel corners, as a numpy array: upper-left corner of upper-left pixel to upper-right corner of upper-right pixel (ncol+1).
- northing¶
The y-coordinates of first column pixel corners, as a numpy array: lower-left corner of lower-left pixel to upper-left corner of upper-left pixel (nrow+1).
- proj4¶
The dataset’s coordinate reference system as a PROJ4 string
- projection¶
The dataset’s coordinate reference system as a Well-Known String
- x_pxcenter¶
The x-coordinates of pixel centers, as a numpy array ncol.
- y_pxcenter¶
y-coordinates of pixel centers, nrow.
HDF4 and HDF-EOS¶
[TBC]
Working with specific remote sensing data¶
Landsat¶
Usage¶
Pygaarst offers an intuitive and friendly way to access Landsat scene data and metadata as distributed as at-sensor scaled radiance data files (“Level1”) in GeoTIFF format, one file per imaging band, as zipped tar archives.
The work is done by two classes: Landsatscene, which takes the name of the unzipped data directory, and Landsatband, which can be instantiated from a Landsatscene object:
>>> from pygaarst import raster
>>> basedir = "path/to/data"
>>> landsatscene = "LE70690142004201EDC01"
>>> sc = raster.Landsatscene(os.path.join(basedir, landsatscene))
>>> type(sc)
<class 'pygaarst.raster.Landsatscene'>
>>> b2 = sc.band2
>>> type(b2.data)
<type 'numpy.ndarray'>
>>> b2.data.shape
(8021, 8501)
The Landsatscene knows how to calculate often used radiometric quantities such as vegetation indices, or access the Landsat metadata. Both old and new metadata format is supported:
>>> ndvi = sc.NDVI
>>> type(ndvi)
<type 'numpy.ndarray'>
>>> sc.meta['IMAGE_ATTRIBUTES']['SUN_ELEVATION']
44.26873508
>>> sc.meta['PRODUCT_METADATA']['DATE_ACQUIRED']
datetime.date(2004, 7, 19)
Landsatscene also supports a filename infix in case all scene files have been pre-processed. A typical example would be if the GeoTIFF band files have been sub-setted and saved under a new file name that adds a label at the end of the filename, before the extension. For example:
>>> b2.data.shape
(8021, 8501)
>>> sc.infix = '_CLIP'
>>> b3 = sc.band3
>>> b3.data.shape
(347, 373)
>>> b2.filepath
'path/to/data/LE70690142004201EDC01_B2.TIF'
>>> b3.filepath
'path/to/data/LE70690142004201EDC01_B3_CLIP.TIF'
The Landsatband object knows its own geographic attributes and can translate the raw data into radiance and reflectance. For thermal IR bands, calculation of radiant (brightness) temperatures is supported:
>>> b3.radiance
array([[ 23.59606299, 24.83937008, 23.59606299, ..., 29.81259843,
29.81259843, 29.81259843],
[ 25.46102362, 24.21771654, 23.59606299, ..., 30.43425197,
30.43425197, 29.19094488],
...,
[ 41.0023622 , 39.75905512, 40.38070866, ..., 90.11299213,
91.35629921, 92.5996063 ],
[ 40.38070866, 38.51574803, 41.0023622 , ..., 90.11299213,
90.73464567, 90.73464567]])
>>> b3.Lat
array([[ 65.23978665, 65.23979005, 65.23979346, ..., 65.24086228,
65.24086467, 65.24086706],
[ 65.2400558 , 65.2400592 , 65.24006261, ..., 65.24113144,
65.24113383, 65.24113622],
...,
[ 65.33291157, 65.332915 , 65.33291841, ..., 65.3339918 ,
65.3339942 , 65.3339966 ],
[ 65.33318072, 65.33318414, 65.33318756, ..., 65.33426096,
65.33426336, 65.33426576]])
>>> b6 = sc.band6L
>>> b6.tKelvin
array([[ 298.51857637, 298.51857637, 299.01776653, ..., 301.97175896,
301.48420756, 301.48420756],
[ 297.51409689, 298.01736166, 298.51857637, ..., 302.45745107,
301.97175896, 301.48420756],
...,
[ 294.96609241, 294.96609241, 294.96609241, ..., 290.23752626,
290.77248753, 290.23752626],
[ 294.96609241, 294.96609241, 294.96609241, ..., 290.77248753,
290.77248753, 290.77248753]])
All parameters are, where provided, taken from the scene-specific metadata. Where not, they are taken from the published literature.
The Landsatscene class¶
- class pygaarst.raster.Landsatscene(dirname)¶
A container object for TM/ETM+ L5/7 and OLI/TIRS L8 scenes
- Arguments:
- dirname (str): the full or relative file path that contains all files
- NBR¶
The Normalized Burn Index
- NDVI¶
The Normalized Difference Vegetation Index
- TIRband¶
Label of a suitable thermal infrared band for the scene. Used in loops over Landsat scenes from multiple platform/sensor combinations. Will return B6 for L4/5, B6H for L7, B10 for L8.
- ltkcloud¶
Cloud masking and landcover classification with the Luo–Trishchenko–Khlopenkov algorithm (doi:10.1109/LGRS.2010.2095409).
- Returns:
- A numpy array of the same shape as the input data The classes signify: - 1.0 - bare ground - 2.0 - ice/snow - 3.0 - water - 4.0 - cloud - 5.0 - vegetated soil
- naivecloud¶
Heuristic cloud masking via thermal IR threshold
The Landsatband class¶
- class pygaarst.raster.Landsatband(filepath, band=None, scene=None)¶
Represents a band of a Landsat scene.
Implemented: TM/ETM+ L5/7 and OLI/TIRS L8, old and new metadata format
- newmetaformat¶
Checks if band uses old or new metadata format
- radiance¶
Radiance in W/um/m^2/sr derived from DN and metadata, as numpy array
- reflectance¶
Reflectance (0 .. 1) derived from DN and metadata, as numpy array
- tKelvin¶
Radiant (brightness) temperature at the sensor in K, implemented for Landsat thermal infrared bands.
EO-1 Hyperion, ALI and general USGS Level 1 data¶
Hyperion and ALI¶
Hyperion imaging spectroscopy data and ALI multi-spectral imagery, processed to a Level 1 product, are distributed by the USGS in a format very similar to Landsat data. Pygaarst provides classes similar to the Landsat-specific classes for these sensors.
- class pygaarst.raster.Hyperionscene(dirname)¶
A container object for EO-1 Hyperion scenes. Input: directory name, which is expected to contain all scene files.
- get_datacube(outfn, bandlist, islice=None, jslice=None, set_fh=False)¶
Creates a rasterhelpers.Datacube object from a bandlist and the radiance data from the whole Hyperion scene.
- Arguments:
- outfn (str): file path of the HDF5 file that stores the cube bandlist (sequence of str): a list or array of band names islice (sequence of int): list or array of row indices jslice (sequence of int): list or array of column indices set_fh (bool): should an open filehandle be set as an argument?
- spectrum(i_idx, j_idx, bands='calibrated', bdsel=[])¶
Calculates the radiance spectrum for one pixel.
- Arguments:
i_idx (int): first coordinate index of the pixel j_idx (int): second coordinate index of the pixel bands (str): indicates the bands that are used
‘calibrated’ (default): only use calibrated bands ‘high’: use uncalibrated bands 225-242 ‘low’: use uncalibrated bands 1-7 ‘all’: use all available bands ‘selected’: use bdsel attribute or argumentbdsel: sequence data type containing band indices to select
- class pygaarst.raster.Hyperionband(filepath, band=None, scene=None)¶
Represents a band of an EO-1 Hyperion scene.
- radiance¶
Radiance in W / um / m^2 / sr derived from digital number and metadata, as numpy array
- class pygaarst.raster.ALIscene(dirname)¶
A container object for EO-1 ALI scenes. Input: directory name, which is expected to contain all scene files.
The USGSL1 parent classes¶
All of them inherit from parent classes called USGSL1scene and USGSL1band respectively.
- class pygaarst.raster.USGSL1scene(dirname)¶
A container object for multi- and hyperspectral satellite imagery scenes as provided as Level 1 (at-sensor calibrated scaled radiance data) by various USGS data portals: Landsat 4/5 TM, Landsat 7 ETM+, Landsat 7 OLI/TIRS, EO1 ALI and EO1 Hyperion
- Arguments:
- dirname (str): name of directory that contains all scene files.
- get_normdiff(label1, label2)¶
Calculate a generic normalized difference index
- Arguments:
- label1, label2 (str): valid band labels, usually of the form ‘bandN’
The metatadata parser¶
All these USGS-provided satellite remote sensing data use essentially the same metadata format. It is provided in a text file that can be recognized by the letters MTL in the filename. The data structure is nested naturally maps onto a dictionary of dictionaries, with unlimited nesting depth. Unfortunately it appears to be quite badly documented, or in any event I have been unable to locate a data description or specification.
For objects of type pygaarst.raster.USGSL1scene or its children (pygaarst.raster.Landsatscene, pygaarst.raster.ALIscene or pygaarst.raster.Hyperionscene), the parsed metadata dictionary is provided in the meta attribute:
>>> print sc.meta
{'IMAGE_ATTRIBUTES': {'GEOMETRIC_RMSE_MODEL': 2.868, 'CLOUD_COVER': 31.0, 'GEOMETRIC_RMSE_MODEL_X': 1.73, 'SUN_AZIMUTH': 162.9104406, 'SUN_ELEVATION': 44.26873508, 'GROUND_CONTROL_POINTS_MODEL': 210, 'IMAGE_QUALITY': 9, 'GEOMETRIC_RMSE_MODEL_Y': 2.287}, 'RADIOMETRIC_RESCALING': {'RADIANCE_MULT_BAND_7': 0.044, 'RADIANCE_MULT_BAND_5': 0.126, 'RADIANCE_MULT_BAND_4': 0.969, ...
The parser is implemented via the module pygaarst.mtlutils. An arbitrary MTL file can be parsed by calling pygaarst.mtlutils.parsemeta():
pygaarst.mtlutils
Helpers for parsing MTL metadata files used by USGS Landsat and EO-1 (Hyperion and ALI) Level 1 GeoTIFF scene data archives
Created by Chris Waigl on 2014-04-20.
[2014-04-20] Refactoring original landsatutils.py, as MTL file format is also used by Hyperion and ALI.
- exception pygaarst.mtlutils.MTLParseError¶
Custom exception: parse errors in Landsat or EO-1 MTL metadata files
- pygaarst.mtlutils.parsemeta(metadataloc)¶
Parses the metadata.
- Arguments:
- metadataloc: a filename or a directory.
Returns metadata dictionary
MODIS swath data¶
[This is a little harder...]
Suomi/NPP VIIRS¶
- class pygaarst.raster.VIIRSHDF5(filepath, geofilepath=None, variable=None)¶
A class providing access to a VIIRS SDS HDF5 file or dataset Parameters: filepath: full or relative path to the data file geofilepath (optional): override georeference array file from metadata; full or relative path to georeference file variable (optional): name of a variable to access
- geodata¶
Object representing the georeference data, in its entirety
- lats¶
Latitudes as provided by georeference array
- lons¶
Longitudes as provided by georeference array
MODAPS data download API client¶
This is a REST-full web service client that implements the NASA LAADSWEB data API (http://ladsweb.nascom.nasa.gov/data/api.html). Usage:
>>> from pygaarst import modapsclient as m
>>> a = m.ModapsClient()
>>> retvar = a.[methodname](args)
Module pygaarst.modapsclient
A client to do talk to MODAPS web services. See http://ladsweb.nascom.nasa.gov/data/web_services.html
Created by Chris Waigl on 2013-10-22.
- class pygaarst.modapsclient.ModapsClient¶
Implements a client for MODAPS web service retrieval of satellite data, without post-processing
See http://ladsweb.nascom.nasa.gov/data/quickstart.html
- getAllOrders(email)¶
All orders for an email address
- getBands(product)¶
Available bands for a product
- getBrowse(fileId)¶
fileIds is a single file-ID
- getCollections(product)¶
Available collections for a product
- getDataLayers(product)¶
Available data layers for a product
- getDateCoverage(collection, product)¶
Available dates for a collection/product combination
TODO: add some result postprocessing - not a good format
- getFileOnlineStatuses(fileIds)¶
fileIds is a comma-separated list of file-IDs
- getFileProperties(fileIds)¶
fileIds is a comma-separated list of file-IDs
- getFileUrls(fileIds)¶
fileIds is a comma-separated list of file-IDs
- getMaxSearchResults()¶
Max number of search results that can be returned
- getOrderStatus(OrderID)¶
Order status for an order ID. TODO: implement
- getOrderUrl(OrderID)¶
Order URL(?) for order ID. TODO: implement
- getPostProcessingTypes(products)¶
Products: comma-concatenated string of valid product labels
- listCollections()¶
Lists all collections. Deprecated: use getCollections
- listMapProjections()¶
Lists all available map projections
- listProductGroups(instrument)¶
Lists all available product groups
- listProducts()¶
Lists all available products
- listProductsByInstrument(instrument, group=None)¶
Lists all available products for an instrument
- listReprojectionParameters(reprojectionName)¶
Lists reprojection parameters for a reprojection
- listSatelliteInstruments()¶
Lists all available satellite instruments
- orderFiles(FileIDs)¶
Submits an order. TODO: implement
- searchForFiles(products, startTime, endTime, north, south, east, west, coordsOrTiles=u'coords', dayNightBoth=u'DNB', collection=5)¶
Submits a search based on product, geography and time
- searchForFilesByName(collection, pattern)¶
Submits a search based on a file name pattern