Welcome to intake_xarray’s documentation!

This package enables the set of data-loading methods from Xarray to be used within the Intake data access and cataloging system.

Quickstart

intake-xarray provides quick and easy access to n-dimensional data suitable for reading with xarray.

Installation

To use this plugin for intake, install with the following command:

conda install -c conda-forge intake-xarray

Usage

Inline use

After installation, the functions intake.open_netcdf, intake.open_rasterio, intake.open_zarr, intake.open_xarray_image, and intake.open_opendap become available. They can be used to open data files as xarray objects.
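For example (the file name here is hypothetical):

import intake

# lazily open a local NetCDF file as an xarray.Dataset backed by dask
source = intake.open_netcdf("air.nc")
ds = source.to_dask()

# or load everything into memory at once
ds = source.read()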

Creating Catalog Entries

Catalog entries must specify driver: netcdf, driver: rasterio, driver: zarr, driver: xarray_image, or driver: opendap as appropriate.

The zarr and image drivers allow access to remote data stores (s3 and gcs); settings relevant to those stores should be passed in using the storage_options parameter.
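A minimal catalog might look like the following (entry names, paths, and the bucket are hypothetical):

sources:
  air_temperature:
    description: Example NetCDF entry
    driver: netcdf
    args:
      urlpath: "{{ CATALOG_DIR }}/data/air.nc"
      chunks: {}
  zarr_on_s3:
    description: Example Zarr store on S3
    driver: zarr
    args:
      urlpath: "s3://mybucket/data.zarr"
      storage_options:
        anon: true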

Choosing a Driver

While all the drivers in the intake-xarray plugin yield xarray objects, they do not all accept the same file formats.

netcdf/grib/tif

Supports any local or downloadable file that can be passed to xarray.open_mfdataset.
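For instance, a glob over hypothetical yearly files can be opened as a single dataset (combine and xarray_kwargs are optional and shown for illustration):

import intake

source = intake.open_netcdf(
    "data/air_*.nc",                       # hypothetical yearly files
    combine="by_coords",                   # how open_mfdataset concatenates them
    xarray_kwargs={"engine": "netcdf4"},   # extra kwargs forwarded to xarray
)
ds = source.to_dask()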

opendap

Supports OPeNDAP URLs, optionally with esgf, urs or generic_http authentication.
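A sketch of anonymous access to a hypothetical OPeNDAP endpoint:

import intake

source = intake.open_opendap(
    "https://opendap.example.com/path/to/dataset",  # hypothetical URL
    chunks={},
    auth=None,          # anonymous; see the OpenDapSource reference for the other modes
    engine="netcdf4",
)
ds = source.to_dask()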

zarr

Supports .zarr directories. See https://zarr.readthedocs.io/ for more information.
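For example, a hypothetical public store on S3 (storage_options is forwarded to the s3 filesystem):

import intake

source = intake.open_zarr("s3://mybucket/data.zarr", storage_options={"anon": True})
ds = source.to_dask()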

rasterio

Supports any file format supported by rasterio.open, most commonly GeoTIFFs.

Note: Consider installing rioxarray and using the netcdf driver with engine="rasterio".
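With rioxarray installed, that combination looks roughly like this (the file name is hypothetical):

import intake

source = intake.open_netcdf("RGB.tif", xarray_kwargs={"engine": "rasterio"})
ds = source.to_dask()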

xarray_image

Supports any file format that can be passed to skimage.io.imread, which includes all the common image formats (jpg, png, tif, …).
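A minimal sketch (the path is hypothetical):

import intake

source = intake.open_xarray_image("images/photo.png")
da = source.read()   # an xarray object with y/x (and channel) dimensions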

Caching

Remote files can be cached locally by fsspec (see https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining). Note that opendap does not support caching, as the URL does not refer to a downloadable file.
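For example, prefixing a URL with simplecache:: makes fsspec keep a local copy after the first read; the bucket is hypothetical and the nested storage_options layout follows fsspec's URL-chaining conventions:

import intake

source = intake.open_netcdf(
    "simplecache::s3://mybucket/data/air.nc",  # hypothetical remote file
    storage_options={"s3": {"anon": True}},    # options keyed by protocol
)
ds = source.read()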

API Reference

intake_xarray.netcdf.NetCDFSource(*args, ...)

Open an xarray file.

intake_xarray.opendap.OpenDapSource(*args, ...)

Open an OPeNDAP source.

intake_xarray.xzarr.ZarrSource(*args, **kwargs)

Open an xarray dataset.

intake_xarray.raster.RasterIOSource(*args, ...)

Open an xarray dataset via RasterIO.

intake_xarray.image.ImageSource(*args, **kwargs)

Open an xarray dataset from image files.

class intake_xarray.netcdf.NetCDFSource(*args, **kwargs)[source]

Open an xarray file.

Parameters
urlpath: str or List[str]

Path to source file. May include glob “*” characters, format pattern strings, or list. Some examples:

  • {{ CATALOG_DIR }}/data/air.nc

  • {{ CATALOG_DIR }}/data/*.nc

  • {{ CATALOG_DIR }}/data/air_{year}.nc

chunks: int or dict, optional

Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.

combine: {'by_coords', 'nested'}, optional

Which function is used to concatenate all the files when urlpath contains a wildcard. It is recommended to set this argument explicitly in all your catalogs, because the default has changed over time: it was "nested", currently follows the xarray.open_mfdataset default ("auto"), and is planned to change to "by_coords" in the near future.

concat_dim: str, optional

Name of dimension along which to concatenate the files. Can be new or pre-existing if combine is “nested”. Must be None or new if combine is “by_coords”.

path_as_pattern: bool or str, optional

Whether to treat the path as a pattern (i.e. data_{field}.nc) and create new coordinates in the output corresponding to pattern fields. If str, it is treated as the pattern to match on. Default is True.

xarray_kwargs: dict

Additional xarray kwargs for xr.open_dataset() or xr.open_mfdataset().

storage_options: dict

If using a remote filesystem (whether caching locally or not), these are the kwargs to pass to that filesystem.

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
path_as_pattern
pattern
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape
urlpath

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir
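A hedged usage sketch showing path_as_pattern in action; the file names are hypothetical, and the pattern field becomes a coordinate in the result:

import intake

source = intake.open_netcdf(
    "data/air_{year}.nc",    # matches e.g. air_1980.nc, air_1981.nc, ...
    combine="nested",
    concat_dim="year",       # new dimension to concatenate along
)
ds = source.to_dask()        # ds gains a "year" coordinate parsed from the file names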

class intake_xarray.opendap.OpenDapSource(*args, **kwargs)[source]

Open an OPeNDAP source.

Parameters
urlpath: str

Path to source file.

chunks: None, int or dict

Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.

auth: None, "esgf", "urs" or "generic_http"

Method of authenticating to the OPeNDAP server. Choose one of the following:

  • None - [Default] Anonymous access, no authentication.

  • 'esgf' - Earth System Grid Federation.

  • 'urs' - NASA Earthdata Login, also known as URS.

  • 'generic_http' - OPeNDAP servers which support plain HTTP authentication.

Note that for the authenticated methods you will need to set your username and password using the environment variables DAP_USER and DAP_PASSWORD, respectively.

engine: str

Engine used for reading OPeNDAP URL. Should be one of ‘pydap’ or ‘netcdf4’.

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir
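A sketch of authenticated access; the URL is hypothetical and the credentials are read from the environment variables named above:

import os
import intake

os.environ["DAP_USER"] = "my-username"     # hypothetical credentials
os.environ["DAP_PASSWORD"] = "my-password"

source = intake.open_opendap(
    "https://opendap.example.com/protected/dataset",
    chunks={},
    auth="generic_http",
    engine="pydap",
)
ds = source.to_dask()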

class intake_xarray.xzarr.ZarrSource(*args, **kwargs)[source]

Open an xarray dataset.

Parameters
urlpath: str

Path to source. This can be a local directory or a remote data service (i.e., with a protocol specifier like 's3://').

storage_options: dict

Parameters passed to the backend file-system

kwargs:

Further parameters are passed to xr.open_zarr

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir

close()[source]

Delete open file from memory
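A usage sketch; the store path is hypothetical, and consolidated=True assumes the store was written with consolidated metadata (the keyword is passed through to xr.open_zarr):

import intake

source = intake.open_zarr("data/store.zarr", consolidated=True)
ds = source.to_dask()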

class intake_xarray.raster.RasterIOSource(*args, **kwargs)[source]

Open an xarray dataset via RasterIO.

This creates an xarray.DataArray, not a Dataset (i.e., there is exactly one variable).

See https://rasterio.readthedocs.io/en/latest/ for the file formats supported, particularly GeoTIFF, and http://xarray.pydata.org/en/stable/generated/xarray.open_rasterio.html#xarray.open_rasterio for possible extra arguments

Parameters
urlpath: str or iterable, location of data

May be a local path, or remote path if including a protocol specifier such as 's3://'. May include glob wildcards or format pattern strings. Must be a format supported by rasterio (normally GeoTIFF). Some examples:

  • {{ CATALOG_DIR }}data/RGB.tif

  • s3://data/*.tif

  • s3://data/landsat8_band{band}.tif

  • s3://data/{location}/landsat8_band{band}.tif

  • {{ CATALOG_DIR }}data/landsat8_{start_date:%Y%m%d}_band{band}.tif

chunks: None or int or dict, optional

Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays. The default, None, loads numpy arrays.

path_as_pattern: bool or str, optional

Whether to treat the path as a pattern (i.e. data_{field}.tif) and create new coordinates in the output corresponding to pattern fields. If str, it is treated as the pattern to match on. Default is True.

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
path_as_pattern
pattern
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape
urlpath

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir
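A usage sketch; the file names are hypothetical, and the {band} pattern field becomes a coordinate because path_as_pattern defaults to True:

import intake

source = intake.open_rasterio("data/landsat8_band{band}.tif", chunks={})
da = source.to_dask()   # "band" appears as a coordinate parsed from the file names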

class intake_xarray.image.ImageSource(*args, **kwargs)[source]

Open an xarray dataset from image files.

This creates an xarray.DataArray or an xarray.Dataset. See http://scikit-image.org/docs/dev/api/skimage.io.html#skimage.io.imread for the file formats supported.

NOTE: Although skimage.io.imread is used by default, any reader function which accepts a file object and outputs a numpy array can be used instead.

Parameters
urlpath: str or iterable, location of data

May be a local path, or remote path if including a protocol specifier such as 's3://'. May include glob wildcards or format pattern strings. Must be a format supported by skimage.io.imread or user-supplied imread. Some examples:

  • {{ CATALOG_DIR }}/data/RGB.tif

  • s3://data/*.jpeg

  • https://example.com/image.png

  • s3://data/Images/{{ landuse }}/{{ '%02d' % id }}.tif

chunks: int or dict

Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.

path_as_pattern: bool or str, optional

Whether to treat the path as a pattern (i.e. data_{field}.tif) and create new coordinates in the output corresponding to pattern fields. If str, it is treated as the pattern to match on. Default is True.

concat_dim: str or iterable

Dimension over which to concatenate. If iterable, all fields must be part of the pattern.

imread: function, optional

Optionally provide custom imread function. Function should expect a file object and produce a numpy array. Defaults to skimage.io.imread.

preprocess: function, optional

Optionally provide custom function to preprocess the image. Function should expect a numpy array for a single image and return a numpy array.

coerce_shape: iterable of len 2, optional

Optionally coerce the shape of the height and width of the image by setting coerce_shape to desired shape.

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
path_as_pattern
pattern
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape
urlpath

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir
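A sketch combining a custom reader with a preprocessing step; the glob, function names, and target shape are hypothetical:

import intake
from skimage.io import imread

def my_imread(f):
    # any callable taking a file object and returning a numpy array works here
    return imread(f)

def to_grayscale(arr):
    # average the channel axis of a single image
    return arr.mean(axis=-1)

source = intake.open_xarray_image(
    "images/*.png",            # hypothetical glob
    concat_dim="image",        # dimension to stack the images along
    imread=my_imread,
    preprocess=to_grayscale,
    coerce_shape=(256, 256),   # force every image to a common height/width
)
da = source.to_dask()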

Contributing to intake-xarray

Contributions are highly welcomed and appreciated. Every little help counts, so do not hesitate!

Feature requests and feedback

Do you like intake-xarray? Share some love on Twitter or in your blog posts!

We’d also like to hear about your propositions and suggestions. Feel free to submit them as issues and:

  • Explain in detail how they should work.

  • Keep the scope as narrow as possible. This will make it easier to implement.

Report bugs

Report bugs for intake-xarray in the issue tracker.

If you are reporting a bug, please include:

  • Your operating system name and version.

  • Any details about your local setup that might be helpful in troubleshooting, specifically the Python interpreter version, installed libraries, and intake-xarray version.

  • Detailed steps to reproduce the bug.

If you can write a demonstration test that currently fails but should pass (xfail), that is a very useful commit to make as well, even if you cannot fix the bug itself.

Fix bugs

Look through the GitHub issues for bugs.

Talk to developers to find out how you can fix specific bugs.

Write documentation

intake-xarray could always use more documentation. What exactly is needed?

  • More complementary documentation. Have you perhaps found something unclear?

  • Docstrings. There can never be too many of them.

  • Blog posts, articles and such – they’re all very appreciated.

You can also edit documentation files directly in the GitHub web interface, without using a local copy. This can be convenient for small fixes.

Note

Build the documentation locally with the following command:

$ conda env create -f docs/environment.yml
$ cd docs
$ make html

The built documentation will be available in docs/_build/.

Preparing Pull Requests

  1. Fork the intake-xarray GitHub repository. It’s fine to use intake-xarray as your fork repository name because it will live under your user.

  2. Clone your fork locally using git and create a branch:

    $ git clone git@github.com:YOUR_GITHUB_USERNAME/intake-xarray.git
    $ cd intake-xarray
    
    # now, to fix a bug or add feature create your own branch off "master":
    
    $ git checkout -b your-bugfix-feature-branch-name master
    
  3. Install development version in a conda environment:

    $ conda env create -f ci/environment-py39.yml
    $ conda activate test_env
    $ pip install -e .
    
  4. Run all the tests

    Now running tests is as simple as issuing this command:

    $ pytest --verbose
    

    This command will run tests via the “pytest” tool

  5. Commit and push once your tests pass and you are happy with your change(s):

    $ git commit -a -m "<commit message>"
    $ git push -u origin your-bugfix-feature-branch-name
    
  6. Finally, submit a pull request through the GitHub website using this data:

    head-fork: YOUR_GITHUB_USERNAME/intake-xarray
    compare: your-branch-name
    
    base-fork: intake/intake-xarray
    base: master
    

Release a new version

intake-xarray uses the pypipublish GitHub action to publish new versions on PyPI. Just create a new tag (git tag 0.4.1), push it with git push upstream --tags, then create a release by visiting https://github.com/intake/intake-xarray/releases/new. When the release is created, the version will automatically be uploaded to https://pypi.org/project/intake-xarray/.
