Australian Geoscience Data Cube¶
Overview¶
The Australian Geoscience Data Cube provides an integrated gridded data analysis environment for decades of analysis-ready Earth observation data from multiple satellite and other acquisition systems.
In 2014, Geoscience Australia, CSIRO and the NCI established the Australian Geoscience Data Cube, building on earlier work by Geoscience Australia and expanding it to include additional Earth observation satellite and other gridded data collections (e.g. MODIS, DEM) to broaden the range of integrated data analysis capabilities available. The complete software stack and petabytes of EO data are deployed at the NCI petascale computing facility for use by NCI users.
Contents:
Installation¶
These installation instructions are tested on:
- Ubuntu 14.04 or newer
- Mac OS X 10.11.3
- Windows 7 Enterprise SP1 64-bit (Intel)
Microsoft Windows¶
Python 2.7 or Python 3.5 environment¶
1. Download and install a standard Python release from http://www.python.org/. The AGDC supports versions 2.7 and 3.5.
Note
If you are in a restricted environment with no local administrator access, Python can be installed by running:
msiexec /a python-2.7.11.msi TARGETDIR=C:\Python27
Or launch the version 3.5 installer and select not to install for all users (single-user install only).
Ensure pip is installed:
cd C:\Python27
python -m ensurepip
Upgrade pip and install setuptools and virtualenv:
python -m pip install --upgrade pip setuptools virtualenv
Create an AGDC virtualenv:
mkdir C:\envs
Scripts\virtualenv C:\envs\agdcv2
Note
Python 3.5 only workaround: copy vcruntime140.dll from the Python install directory into the virtualenv Scripts folder.
Activate virtualenv:
C:\envs\agdcv2\Scripts\activate
The Python virtual environment isolates this Python installation from other Python installations (which may be in use by other application software) to prevent conflicts between different Python module versions.
Python modules¶
Windows systems have no compilers configured by default, so libraries needed by some Python modules must be obtained in precompiled (binary) form.
Download and install binary wheels from http://www.lfd.uci.edu/~gohlke/pythonlibs/
You will need to download at least:
- GDAL
- rasterio
- numpy
- netCDF4
- psycopg2
- numexpr
- scipy
- pandas
- matplotlib
The following may also be useful:
- lxml
- pyzmq
- udunits2
Install these packages by running the following in your Downloads directory:
pip install *.whl
Note
It may be necessary to manually replace *.whl with the full filenames of each .whl file (unless using a unix-like shell instead of the standard Windows command line console).
Note
For Python 3.5 only: if there are problems loading libraries, try:
copy site-packages/matplotlib/msvcp140.dll site-packages/osgeo/
Also install the Jupyter notebook interface for working with the datacube example notebooks:
pip install jupyter
PostgreSQL Portable¶
An easy-to-install version of PostgreSQL can be downloaded from http://sourceforge.net/projects/postgresqlportable/. It can be installed and run by an unprivileged Windows user.
After installing, launch PostgreSQLPortable.exe (and place a shortcut in the Windows Startup menu).
To prepare the database for first use, enter the following commands in the PostgreSQL Portable window, substituting "u12345" with your Windows login user ID:
create role u12345 superuser login;
create database datacube;
Datacube installation¶
Obtain a current copy of the datacube source code from GitHub. A simple way is to extract https://github.com/data-cube/agdc-v2/archive/develop.zip into a subdirectory of the python environment.
Install the datacube module by running:
cd agdc-v2-develop
python setup.py install
Extra instructions for installing Compliance Checker¶
pip install cf_units
- Download and install udunits2 from the Gohlke site linked above
- Edit site-packages/cf_units/etc/site.cfg with the path to udunits2.dll, which should be venv/share/udunits/udunits2.dll
Ubuntu¶
Required software¶
Ubuntu 16.04 includes packages for PostgreSQL 9.5. On earlier versions of Ubuntu you can use the postgresql.org repository, as described on their download page: http://www.postgresql.org/download/linux/ubuntu/
PostgreSQL:
apt-get install postgresql-9.5 postgresql-client-9.5 postgresql-contrib-9.5
HDF5, and netCDF4:
apt-get install libhdf5-serial-dev libnetcdf-dev
GDAL:
apt-get install libgdal1-dev
Optional packages (useful utilities, docs):
apt-get install postgresql-doc-9.5 libhdf5-doc netcdf-doc libgdal1-doc
apt-get install hdf5-tools netcdf-bin gdal-bin pgadmin3
Python and packages¶
Python 2.7 and 3.4+ are supported.
Download the latest version of the software from the repository and install it:
git clone https://github.com/data-cube/agdc-v2
cd agdc-v2
git checkout develop
python setup.py install
It may be useful to use conda to install binary packages:
conda install psycopg2 gdal libgdal hdf5 rasterio netcdf4 libnetcdf pandas
Note
Using a virtual environment is recommended.
Mac OS X¶
Note
This section was typed up from memory. Verification and input would be appreciated.
Required software¶
Homebrew:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Postgres.app:
http://postgresapp.com/
HDF5, netCDF4, and GDAL:
brew install hdf5 netcdf gdal
Python and packages¶
Python 2.7 and 3.4+ are supported.
Download the latest version of the software from the repository and install it:
git clone https://github.com/data-cube/agdc-v2
cd agdc-v2
git checkout develop
python setup.py install
It may be useful to use conda to install binary packages:
conda install psycopg2 gdal libgdal hdf5 rasterio netcdf4 libnetcdf pandas
Note
Using a virtual environment is recommended.
AGDC Database Setup¶
Attention
You must have a properly configured Postgres installation for this to work. If you have a fresh install of Postgres on Ubuntu, you may want to configure the postgres user password to complete the Postgres setup.
Create Database¶
If you have existing Postgres authentication:
createdb datacube
or specify connection details manually:
createdb -h <hostname> -U <username> datacube
Note
You can also delete the database by running dropdb datacube. This step is not reversible.
Create Configuration File¶
Datacube looks for a configuration file at ~/.datacube.conf:
[datacube]
db_database: datacube
# A blank host will use a local socket. Specify a hostname to use TCP.
db_hostname:
# Credentials are optional: you might have other Postgres authentication configured.
# The default username otherwise is the current user id.
# db_username:
# db_password:
[locations]
# Where to reach storage locations from the current machine.
# -> Location names are arbitrary, but correspond to names used in the
# storage type files.
# -> We may eventually support remote protocols (http, S3, etc) to lazily fetch remote data.
eotiles: file:///short/public/democube/
Change eotiles to point to the location where the datacube should store the storage units. Note the URI syntax (file:// prefix is required).
See also Runtime Config
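If you want to sanity-check the file before running any datacube commands, the INI-style format can be read with Python's standard configparser. This is purely illustrative; the datacube library parses the file itself:
# Illustrative only: confirm ~/.datacube.conf parses and inspect its values.
# The datacube library reads this file itself at runtime.
import os
from configparser import ConfigParser   # on Python 2: from ConfigParser import ConfigParser

config = ConfigParser()
config.read(os.path.expanduser('~/.datacube.conf'))

print(config.get('datacube', 'db_database'))   # e.g. datacube
print(dict(config.items('locations')))         # e.g. {'eotiles': 'file:///short/public/democube/'}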
Create the Database Schema¶
datacube-config can create and populate the datacube schema (agdc):
datacube-config -v database init
Ingestion Configuration¶
Dataset Preparation¶
Dataset metadata must accompany the dataset for it to be recognised by the Data Cube. It defines critical properties of the dataset such as:
- measurements
- platform and sensor names
- geospatial extents and projection
- acquisition time
Note
Some metadata requires cleanup before it is ready to be loaded.
For more information see Dataset Metadata.
Storage Types¶
A Storage Type is a document that defines the way an input dataset is stored inside the Data Cube.
It controls things like:
- which measurements are stored
- what projection the data is stored in
- what resolution the data is stored in
- how data is tiled
- where the data is stored
Multiple storage type definitions can be used to ingest datasets into different projections, resolutions, etc.
For more information see Storage Type.
datacube-config can be used to add storage types. Sample configs are in docs/config_samples.
datacube-config storage add docs/config_samples/ga_landsat_7/ls7_albers.yaml
Note
You should refer to the platform code in your metadata file to determine which kind of mapping to configure. For example, LANDSAT_5 means you should use the Landsat 5 configuration.
Configuration samples are available as part of the open-source GitHub repository.
Ingestion¶
datacube-ingest can be used to ingest prepared datasets:
datacube-ingest -v ingest packages/nbar/LS8_OLITIRS_TNBAR_P54_GALPGS01-002_112_079_20140126 packages/pq/LS8_OLITIRS_PQ_P55_GAPQ01-002_112_079_20140126
Configuration Files¶
Dataset Metadata¶
Dataset Metadata is a document that defines critical metadata of the dataset such as:
- measurements
- platform and sensor names
- geospatial extents and projection
- acquisition time
id: 4678bf44-82b5-11e5-9264-a0000100fe80
ga_label: LS5_TM_NBAR_P54_GANBAR01-002_090_085_19900403
ga_level: P54
product_type: NBAR
creation_dt: 2015-03-22 01:37:41
checksum_path: package.sha1
platform:
    code: LANDSAT_5
instrument:
    name: TM
format:
    name: GeoTiff
acquisition:
    aos: 1990-04-03 23:05:30
    los: 1990-04-03 23:13:06
    groundstation:
        code: ASA
        label: Alice Springs
        eods_domain_code: '002'
extent:
    coord:
        ul:
            lat: -35.04885921004133
            lon: 148.08553520367545
        ur:
            lat: -34.996165736608994
            lon: 150.7361052128533
        ll:
            lat: -37.014186845449004
            lon: 148.11284610299305
        lr:
            lat: -36.95758002539804
            lon: 150.829848574551
    from_dt: 1990-04-03 23:10:30
    center_dt: 1990-04-03 23:10:42
    to_dt: 1990-04-03 23:10:54
grid_spatial:
    projection:
        geo_ref_points:
            ul:
                x: 599000.0
                y: 6121000.0
            ur:
                x: 841025.0
                y: 6121000.0
            ll:
                x: 599000.0
                y: 5902975.0
            lr:
                x: 841025.0
                y: 5902975.0
        datum: GDA94
        ellipsoid: GRS80
        zone: -55
        unit: metre
image:
    satellite_ref_point_start:
        x: 90
        y: 85
    satellite_ref_point_end:
        x: 90
        y: 85
    bands:
        '10':
            path: product/scene01/LS5_TM_NBAR_P54_GANBAR01-002_090_085_19900403_B10.tif
lineage:
    machine: {}
    source_datasets: {}
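Such a document can be loaded and inspected with any YAML parser (PyYAML here). A minimal sketch, assuming the document above is saved as ga-metadata.yaml (the file name is illustrative):
# Illustrative only: load a dataset metadata document and read a few fields.
# 'ga-metadata.yaml' is an assumed file name for the document shown above.
import yaml

with open('ga-metadata.yaml') as f:
    doc = yaml.safe_load(f)

print(doc['platform']['code'])       # LANDSAT_5
print(doc['extent']['center_dt'])    # acquisition centre time
print(list(doc['image']['bands']))   # ['10']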
Storage Type¶
A Storage Type is a document that defines the way an input dataset is stored inside the Data Cube.
It controls things like:
- which measurements are stored
- what projection the data is stored in
- what resolution the data is stored in
- how data is tiled
- where the data is stored
name: ls5_nbar
description: LS5 NBAR 25 metre, 1 degree tile
# Any datasets matching these metadata properties.
match:
    metadata:
        platform:
            code: LANDSAT_5
        instrument:
            name: TM
        product_type: NBAR
location_name: eotiles
file_path_template: '{platform[code]}_{instrument[name]}_{tile_index[0]}_{tile_index[1]}_NBAR_{start_time}.nc'
global_attributes:
    title: Experimental Data files From the Australian Geoscience Data Cube - DO NOT USE
    summary: These files are experimental, short lived, and the format will change.
    source: This data is a reprojection and retile of Landsat surface reflectance scene data available from /g/data/rs0/scenes/
    product_version: '0.0.0'
    license: Creative Commons Attribution 4.0 International CC BY 4.0
storage:
    driver: NetCDF CF
    crs: |
        GEOGCS["WGS 84",
            DATUM["WGS_1984",
                SPHEROID["WGS 84",6378137,298.257223563,
                    AUTHORITY["EPSG","7030"]],
                AUTHORITY["EPSG","6326"]],
            PRIMEM["Greenwich",0,
                AUTHORITY["EPSG","8901"]],
            UNIT["degree",0.0174532925199433,
                AUTHORITY["EPSG","9122"]],
            AUTHORITY["EPSG","4326"]]
    tile_size:
        longitude: 1.0
        latitude: 1.0
    resolution:
        longitude: 0.00025
        latitude: -0.00025
    chunking:
        longitude: 500
        latitude: 500
        time: 1
    dimension_order: ['time', 'latitude', 'longitude']
    aggregation_period: year
    roi:
        longitude: [110, 120]
        latitude: [10, 20]
measurements:
    '10':
        dtype: int16
        nodata: -999
        resampling_method: cubic
        varname: band_10
    '20':
        dtype: int16
        nodata: -999
        resampling_method: cubic
        varname: band_20
- name
- Name of the storage type. It's used as a human-readable identifier. Must be unique and consist of alphanumeric characters and/or underscores.
- description (optional)
- A human-readable description of the storage type.
- location_name
- Name of the location where the storage units go. See Runtime Config.
- file_path_template
- File path pattern defining the names of the storage unit files (see the illustrative sketch after this list).
- TODO: list available substitutions
- match/metadata
- TODO
- global_attributes
- TODO: list useful attributes
- storage
- driver
- Storage type format. Currently only 'NetCDF CF' is supported.
- crs
- WKT defining the coordinate reference system for the data to be stored in.
- TODO: support EPSG codes?
- tile_size
- Size of the tiles for the data to be stored in, specified in projection units.
- Use 'latitude' and 'longitude' if the projection is geographic, else use 'x' and 'y'.
- aggregation_period
- Storage unit aggregation period. One of 'month', 'year'.
- resolution
- Resolution for the data to be stored in, specified in projection units. Negative values flip the axis.
- Use 'latitude' and 'longitude' if the projection is geographic, else use 'x' and 'y'.
- chunking
- Size of the internal NetCDF chunks, in 'pixels'.
- dimension_order
- Order of the dimensions for the data to be stored in.
- Use 'latitude' and 'longitude' if the projection is geographic, else use 'x' and 'y'.
- TODO: currently ignored. Is it really needed?
- roi (optional)
- Defines a region of interest for the subset of the data to be ingested. Currently only a bounding box specified in projection units is supported.
- measurements
- Mapping of the input measurement names, as specified in Dataset Metadata, to the per-measurement ingestion parameters:
- dtype
- Data type to store the data in. One of (u)int(8,16,32,64), float32, float64.
- resampling_method
- Resampling method. One of nearest, cubic, bilinear, cubic_spline, lanczos, average.
- varname
- Name of the NetCDF variable to store the data in.
- nodata (optional)
- No data value.
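The file_path_template placeholders follow Python's str.format syntax. The sketch below only illustrates how such a template might expand; the actual substitution is performed by the ingester, and the values are invented:
# Illustrative only: the ingester performs this substitution internally.
# The placeholder values below are invented for the example.
template = ('{platform[code]}_{instrument[name]}_'
            '{tile_index[0]}_{tile_index[1]}_NBAR_{start_time}.nc')

filename = template.format(platform={'code': 'LANDSAT_5'},
                           instrument={'name': 'TM'},
                           tile_index=(15, -40),
                           start_time='19900403230530')
print(filename)   # LANDSAT_5_TM_15_-40_NBAR_19900403230530.nc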
Runtime Config¶
The Runtime Config document specifies various runtime configuration options, such as database connection parameters and location mappings.
[Data Cube]
db_hostname: 130.56.244.227
db_database: democube
db_username: cube_user
[locations]
eotiles: file:///short/public/democube/
v1tiles: file:///g/data/rs0/tiles/EPSG4326_1deg_0.00025pixel/
- locations
- Mapping of location names to URI prefixes; defines how to reach each location from the current machine.
Note: You may want to change the eotiles path to a location you can modify. Storage will be created there.
Tools¶
datacube-config¶
Usage: datacube-config [OPTIONS] COMMAND [ARGS]...
Configure the Data Cube
Options:
--version
-v, --verbose Use multiple times for more verbosity
-C, --config_file TEXT
--log-queries Print database queries.
-h, --help Show this message and exit.
Commands:
check Verify & view current configuration.
collections Dataset collections
database Initialise the database
storage Storage types
datacube-ingest¶
Usage: datacube-ingest [OPTIONS] COMMAND [DATASETS]...
Data Management Tool
Options:
--version
-v, --verbose Use multiple times for more verbosity
-C, --config_file TEXT
--log-queries Print database queries.
-h, --help Show this message and exit.
Commands:
ingest Ingest datasets into the Data Cube.
stack Stack storage units
datacube-search¶
Usage: datacube-search [OPTIONS] COMMAND [ARGS]...
Search the Data Cube
Options:
--version
-v, --verbose Use multiple times for more verbosity
-C, --config_file TEXT
--log-queries Print database queries.
-f [csv|pretty] Output format [default: pretty]
-h, --help Show this message and exit.
Commands:
datasets Datasets
units Storage units
Developers Guide¶
This documentation applies to version: 1.0.4+dirty
Search API¶
Coming soon
Data Access API¶
See Data Access Application Programming Interface for more details
Create API¶
- API.__init__
List Properties¶
- API.list_products
- API.list_fields
- API.list_field_values
- API.list_all_field_values
- API.list_storage_units
Data Access¶
- API.get_dataset
- API.get_data_array
Cell-based Processing¶
- API.list_cells
- API.list_tiles
- API.get_dataset_by_cell
- API.get_data_array_by_cell
Analytics Functions¶
- API.get_descriptor
- API.get_data
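A minimal usage sketch of the Data Access API, assuming a configured database with ingested data; the keyword arguments shown are illustrative rather than a definitive signature (see the generated API reference for exact parameters):
# Minimal sketch only: assumes ~/.datacube.conf is configured and data has
# been ingested. Keyword arguments are illustrative, not definitive.
from datacube.api import API

api = API()                      # reads the runtime config (~/.datacube.conf)

print(api.list_products())       # available storage types / products
print(api.list_fields())         # searchable metadata fields

# Retrieve a small region as a multi-dimensional array (illustrative query).
data = api.get_data_array(product='NBAR', platform='LANDSAT_5',
                          longitude=(148.0, 148.2), latitude=(-35.2, -35.0))
print(data.dims)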
Exploratory Data Analysis¶
Coming soon
Datacube Storage¶
Datacube Index¶
Apache License¶
Version: 2.0
Date: January 2004
URL: http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION¶
1. Definitions.¶
“License” shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
“Licensor” shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
“Legal Entity” shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, “control” means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
“You” (or “Your”) shall mean an individual or Legal Entity exercising permissions granted by this License.
“Source” form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
“Object” form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
“Work” shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
“Derivative Works” shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
“Contribution” shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, “submitted” means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as “Not a Contribution.”
“Contributor” shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
2. Grant of Copyright License.¶
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
3. Grant of Patent License.¶
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
4. Redistribution.¶
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
- You must give any other recipients of the Work or Derivative Works a copy of this License; and
- You must cause any modified files to carry prominent notices stating that You changed the files; and
- You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
- If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
5. Submission of Contributions.¶
Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
6. Trademarks.¶
This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty.¶
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
8. Limitation of Liability.¶
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability.¶
While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work¶
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets “[]” replaced with your own identifying information. (Don’t include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same “printed page” as the copyright notice for easier identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.