Welcome to H5Z-ZFP¶
H5Z-ZFP is a compression filter for HDF5 using the ZFP compression library, supporting lossy and lossless compression of floating point and integer data to meet bitrate, accuracy, and/or precision targets. The filter uses the registered HDF5 filter ID 32013.
It supports single and double precision floating point and integer data chunked in 1, 2, or 3 dimensions. The filter will function on datasets of more than 3 dimensions (or 4 dimensions for ZFP versions 0.5.4 and newer), albeit at the possible expense of compression performance, as long as no more than 3 (or 4) dimensions of the HDF5 dataset chunking are of size greater than 1.
Contents:
Installation¶
Installing via Spack¶
The HDF5 and ZFP libraries and the H5Z-ZFP plugin are all now part of the Spack package manager. If you already have Spack installed, the easiest way to install H5Z-ZFP is to simply use the Spack command spack install h5z-zfp.
If you do not have Spack installed, it is very easy to install.
git clone https://github.com/llnl/spack.git
. spack/share/spack/setup-env.sh
spack install h5z-zfp
By default, H5Z-ZFP will attempt to build with Fortran support which requires a Fortran compiler. If you wish to exclude support for Fortran, use the command
spack install h5z-zfp~fortran
Note that these commands will build H5Z-ZFP and all of its dependencies, including the HDF5 library (as well as a number of other dependencies you may not initially expect), so be patient and let the build complete. In addition, by default, Spack installs packages to directory hashes within the cloned Spack repository's directory tree, $spack/opt/spack. You can find the resulting installed HDF5 library with the command
spack find -vp hdf5
and your resulting H5Z-ZFP plugin installation with the command
spack find -vp h5z-zfp
If you wish to exercise more control over where Spack installs things, have a look at configuring Spack.
Manual Installation¶
If Spack is not an option for you, information on manually installing is provided here.
Prerequisites¶
- ZFP Library (or from GitHub)
- HDF5 Library
- H5Z-ZFP filter plugin
Compiling ZFP¶
- There is a Config file in the top-level directory of the ZFP distribution that holds make variables the ZFP Makefiles use. By default, this file is set up for a vanilla GNU compiler. If this is not the appropriate compiler, edit Config as necessary to adjust the compiler and compilation flags.
- An important flag you will need to adjust in order to use the ZFP library with this HDF5 filter is the BIT_STREAM_WORD_TYPE CPP flag. To use ZFP with H5Z-ZFP, the ZFP library must be compiled with a BIT_STREAM_WORD_TYPE of uint8. Typically, this is achieved by including a line in Config of the form DEFS += -DBIT_STREAM_WORD_TYPE=uint8. If you attempt to use this filter with a ZFP library compiled differently, the filter's can_apply method will always return false. This will result in silently ignoring an HDF5 client's request to compress data with ZFP (a runtime check is sketched after this list). Also, be sure to see Endian Issues.
- After you have set up Config, simply run make and it will build the ZFP library, placing the library in a lib sub-directory and the necessary include files in an inc[lude] sub-directory.
- For more information and details, please see the ZFP README.
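Because a mismatched BIT_STREAM_WORD_TYPE fails silently (the filter simply declines to apply), it can be useful to verify the linked ZFP library directly. Below is a minimal sketch, not part of H5Z-ZFP, that checks the stream_word_bits constant exported by ZFP's bitstream.h; compile it against your ZFP install (e.g. with -I<path-to-zfp>/inc[lude] -L<path-to-zfp>/lib -lzfp).

/* Sketch: verify the linked ZFP library was built with
 * BIT_STREAM_WORD_TYPE=uint8 (8-bit stream words), as H5Z-ZFP requires. */
#include <stdio.h>
#include "bitstream.h" /* exports: const size_t stream_word_bits */

int main(void)
{
    if (stream_word_bits != 8)
    {
        fprintf(stderr, "ZFP built with %u-bit stream words; "
                "rebuild ZFP with -DBIT_STREAM_WORD_TYPE=uint8\n",
                (unsigned) stream_word_bits);
        return 1;
    }
    printf("ZFP bit stream word type is compatible with H5Z-ZFP\n");
    return 0;
}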
Compiling HDF5¶
- If you want to be able to run the Fortran tests for this filter, HDF5 must be configured with both the --enable-fortran and --enable-fortran2003 configuration switches. Otherwise, any vanilla installation of HDF5 is acceptable.
- The Fortran interface to this filter requires a Fortran 2003 compiler because it uses ISO_C_BINDING to define the Fortran interface.
Compiling H5Z-ZFP¶
H5Z-ZFP is designed to be compiled both as a standalone HDF5 plugin and as a separate library an application can explicitly link. See Plugin vs. Library Operation.
Once you have installed the prerequisites, you can compile H5Z-ZFP using a command-line…
make [FC=<Fortran-compiler>] CC=<C-compiler>
ZFP_HOME=<path-to-zfp> HDF5_HOME=<path-to-hdf5>
PREFIX=<path-to-install>
where <path-to-zfp> is a directory containing ZFP inc[lude] and lib dirs and <path-to-hdf5> is a directory containing HDF5 include and lib dirs.
If you don't specify a C compiler, it will try to guess one from your path. Fortran compilation is optional. If you do not specify a Fortran compiler, it will not attempt to build the Fortran interface. However, if the variable FC is already defined in your environment (as in Spack, for example), then H5Z-ZFP will attempt to build Fortran. If this is not desired, the solution is to pass an empty FC on the make command line as in…
make FC= CC=<C-compiler>
ZFP_HOME=<path-to-zfp> HDF5_HOME=<path-to-hdf5>
PREFIX=<path-to-install>
The Makefile uses GNU Make syntax and is designed to work on OSX and Linux. The filter has been tested with the gcc, clang, xlc, icc, and pgcc compilers and checked with valgrind.
The command make help
will print useful information
about various make targets and variables. make check
will compile everything
and run a handful of tests.
If you don't specify a PREFIX, it will install to ./install. The installed package will look like…
$(PREFIX)/include/{H5Zzfp.h,H5Zzfp_plugin.h,H5Zzfp_props.h,H5Zzfp_lib.h}
$(PREFIX)/plugin/libh5zzfp.{so,dylib}
$(PREFIX)/lib/libh5zzfp.a
where $(PREFIX)
resolves to whatever the full path of the installation is.
To use the installed filter as an HDF5 plugin, you would specify, for example,
setenv HDF5_PLUGIN_PATH $(PREFIX)/plugin
H5Z-ZFP Source Code Organization¶
The source code is in two separate directories:
src includes the ZFP filter and a few header files (see the include sketch following this list):
- H5Zzfp_plugin.h is an optional header file applications may wish to include because it contains several convenient macros for easily controlling various compression modes of the ZFP library (rate, precision, accuracy, expert) via the Generic Interface.
- H5Zzfp_props.h is a header file that contains functions to control the filter using the temporary Properties Interface. Fortran callers are required to use this interface.
- H5Zzfp_lib.h is a header file for applications that wish to use the filter explicitly as a library rather than a plugin.
- H5Zzfp.h is an all-of-the-above header file for applications that don't care too much about separating out the above functionalities.
test includes various tests. In particular, test_write.c includes examples of using both the Generic Interface and Properties Interface. In addition, there is an example of how to use the filter from Fortran in test_rw_fortran.F90.
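As a quick orientation, here is a hypothetical sketch of which header a C application might include for each use case; including more than you need is harmless, and H5Zzfp.h simply pulls in all of them.

/* Sketch: choosing H5Z-ZFP headers by use case. */
#include "hdf5.h"
#include "H5Zzfp_plugin.h" /* cd_values macros for the Generic Interface (plugin use) */
#include "H5Zzfp_props.h"  /* Properties Interface (requires linking libh5zzfp) */
#include "H5Zzfp_lib.h"    /* H5Z_zfp_initialize()/H5Z_zfp_finalize() (library use) */

int main(void)
{
    /* When linking the filter as a library, register it before use. */
    H5Z_zfp_initialize();
    /* ... create property lists, datasets, and write data here ... */
    H5Z_zfp_finalize();
    return 0;
}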
Silo Integration¶
This filter is also built into the Silo library. In particular, the ZFP library itself is also embedded in Silo but is protected from appearing in Silo's global namespace through a struct of function pointers (see Namespaces in C). If you happen to examine the source code for H5Z-ZFP, you will see some logic there that is specific to using this plugin within Silo and dealing with ZFP as an embedded library through this struct-of-function-pointers wrapper. Just ignore this.
Interfaces¶
There are two interfaces to control the filter. One uses HDF5's generic interface via an array of unsigned int cd_values as is used in H5Pset_filter(). The other uses HDF5 properties added to the dataset creation property list used when the dataset to be compressed is being created. You can find examples of writing HDF5 data using both the generic and properties interfaces in test_write.c.
The filter itself supports either interface. The filter also supports all of the standard ZFP controls for affecting compression including rate, precision, accuracy, expert and reversible modes. For more information and details about these modes of controlling ZFP compression, please see the ZFP README.
Finally, you should not attempt to combine the ZFP filter with any other byte order altering filter such as, for example, HDF5’s shuffle filter. Space-performance will be ruined. This is in contrast to HDF5’s deflate filter which often performs better when used in conjunction with the shuffle filter.
Generic Interface¶
The generic interface is the only means of controlling the H5Z-ZFP filter when it is used as a dynamically loaded HDF5 plugin.
For the generic interface, the following CPP macros are defined in
the H5Zzfp_plugin.h
header file:
H5Pset_zfp_rate_cdata(double rate, size_t cd_nelmts, unsigned int *cd_vals);
H5Pset_zfp_precision_cdata(unsigned int prec, size_t cd_nelmts, unsigned int *cd_vals);
H5Pset_zfp_accuracy_cdata(double acc, size_t cd_nelmts, unsigned int *cd_vals);
H5Pset_zfp_expert_cdata(unsigned int minbits, unsigned int maxbits,
unsigned int maxprec, int minexp,
size_t cd_nelmts, unsigned int *cd_vals);
H5Pset_zfp_reversible_cdata(size_t cd_nelmts, unsigned int *cd_vals);
These macros utilize type punning to store the relevant ZFP parameters into a sufficiently large array (>= 6) of unsigned int cd_values. It is then up to the caller to call H5Pset_filter() with the array of cd_values constructed by one of these macros.
Here is example code from test_write.c…
if (zfpmode == H5Z_ZFP_MODE_RATE)
    H5Pset_zfp_rate_cdata(rate, cd_nelmts, cd_values);
else if (zfpmode == H5Z_ZFP_MODE_PRECISION)
    H5Pset_zfp_precision_cdata(prec, cd_nelmts, cd_values);
else if (zfpmode == H5Z_ZFP_MODE_ACCURACY)
    H5Pset_zfp_accuracy_cdata(acc, cd_nelmts, cd_values);
else if (zfpmode == H5Z_ZFP_MODE_EXPERT)
    H5Pset_zfp_expert_cdata(minbits, maxbits, maxprec, minexp, cd_nelmts, cd_values);
else if (zfpmode == H5Z_ZFP_MODE_REVERSIBLE)
    H5Pset_zfp_reversible_cdata(cd_nelmts, cd_values);
else
    cd_nelmts = 0; /* causes default behavior of ZFP library */

/* print cd-values array used for filter */
printf("%u cd_values= ", (unsigned) cd_nelmts);
for (i = 0; i < cd_nelmts; i++)
    printf("%u,", cd_values[i]);
printf("\n");

/* Add filter to the pipeline via generic interface */
if (0 > H5Pset_filter(cpid, H5Z_FILTER_ZFP, H5Z_FLAG_MANDATORY, cd_nelmts, cd_values))
    ERROR(H5Pset_filter);
However, these macros are only a convenience; you do not need the H5Zzfp_plugin.h header file if you want to avoid using it. But you are then responsible for setting up the cd_values array correctly for the filter (a hand-rolled sketch follows the table below). For reference, the cd_values array for this ZFP filter is defined like so…
             |                       cd_values index
ZFP mode     |  0  |  1       |  2       |  3       |  4       |  5
rate         |  1  |  unused  |  rateA   |  rateB   |  unused  |  unused
precision    |  2  |  unused  |  prec    |  unused  |  unused  |  unused
accuracy     |  3  |  unused  |  accA    |  accB    |  unused  |  unused
expert       |  4  |  unused  |  minbits |  maxbits |  maxprec |  minexp
reversible   |  5  |  unused  |  unused  |  unused  |  unused  |  unused
A/B are high/low 32-bit words of a double.
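For illustration, here is a minimal, hand-rolled sketch of building the rate-mode cd_values per the table above without H5Zzfp_plugin.h; the helper name set_zfp_rate_by_hand is invented for this example, and memcpy performs the same double-to-words type punning the convenience macro does.

/* Sketch: hand-build cd_values for ZFP rate mode (no H5Zzfp_plugin.h). */
#include <string.h>
#include "hdf5.h"

static herr_t set_zfp_rate_by_hand(hid_t dcpl_id, double rate)
{
    unsigned int cd_values[6] = {0, 0, 0, 0, 0, 0};
    cd_values[0] = 1;                          /* mode 1 == rate (see table) */
    memcpy(&cd_values[2], &rate, sizeof rate); /* rateA/rateB words at indices 2,3 */
    /* 32013 is the registered H5Z-ZFP filter ID; 4 values suffice for rate mode */
    return H5Pset_filter(dcpl_id, 32013, H5Z_FLAG_MANDATORY, 4, cd_values);
}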
Note that the cd_values used in the generic interface to H5Pset_filter() are not the same cd_values ultimately stored to the HDF5 dataset header for a compressed dataset. The values are transformed in the set_local method to use ZFP's internal routines for 'meta' and 'mode' data. So, don't make the mistake of examining the values you find in a file and thinking you can use those same values, for example, in an invocation of h5repack.
Because of the type punning involved, the generic interface is not suitable for Fortran callers.
Properties Interface¶
For the properties interface, the following functions are defined in
the H5Zzfp_props.h
header file:
herr_t H5Pset_zfp_rate(hid_t dcpl_id, double rate);
herr_t H5Pset_zfp_precision(hid_t dcpl_id, unsigned int prec);
herr_t H5Pset_zfp_accuracy(hid_t dcpl_id, double acc);
herr_t H5Pset_zfp_expert(hid_t dcpl_id,
unsigned int minbits, unsigned int maxbits,
unsigned int maxprec, int minexp);
herr_t H5Pset_zfp_reversible(hid_t dcpl_id);
These functions take a dataset creation property list, hid_t dcpl_id, and create temporary HDF5 property list entries to control the ZFP filter. Calling any of these functions removes the effects of any previous call to any one of these functions. In addition, calling any one of these functions also has the effect of adding the filter to the pipeline.
Here is example code from test_write.c…
H5Z_zfp_initialize();

/* Setup the filter using properties interface. These calls also add
   the filter to the pipeline */
if (zfpmode == H5Z_ZFP_MODE_RATE)
    H5Pset_zfp_rate(cpid, rate);
else if (zfpmode == H5Z_ZFP_MODE_PRECISION)
    H5Pset_zfp_precision(cpid, prec);
else if (zfpmode == H5Z_ZFP_MODE_ACCURACY)
    H5Pset_zfp_accuracy(cpid, acc);
else if (zfpmode == H5Z_ZFP_MODE_EXPERT)
    H5Pset_zfp_expert(cpid, minbits, maxbits, maxprec, minexp);
else if (zfpmode == H5Z_ZFP_MODE_REVERSIBLE)
    H5Pset_zfp_reversible(cpid);
The properties interface is more type-safe than the generic interface. However, there is no way for the implementation of the properties interface to reside within the filter plugin itself; the properties interface requires that the caller link with the filter as a library, libh5zzfp.a. The generic interface does not require this.
Note that either interface can be used whether the
filter is used as a plugin or as a library. The difference
is whether the application calls H5Z_zfp_initialize()
or not.
Fortran Interface¶
A Fortran interface based on the properties interface, described above, has been added by Scot Breitenfeld of The HDF Group. The code that implements the Fortran interface is in the file H5Zzfp_props_f.F90. An example of its use is in test/test_rw_fortran.F90. The properties interface is the only interface available for Fortran callers.
Plugin vs. Library Operation¶
The filter is designed to be compiled for use both as a standalone, dynamically loaded HDF5 plugin and as an explicitly linked library. When it is used as a plugin, it is a best practice to link the ZFP library into the plugin dynamic/shared object as a static library. Why? In so doing, we ensure that all ZFP public namespace symbols remain confined to the plugin so as not to interfere with any application that may be directly and explicitly linking to the ZFP library for other reasons.
When used as a plugin, all HDF5 applications are required to find the plugin dynamic library (named lib*.{so,dylib}) in a directory specified by the environment variable HDF5_PLUGIN_PATH. Currently, the HDF5 library offers no mechanism for applications themselves to have pre-programmed paths in which to search for a plugin. Applications are therefore always vulnerable to an incorrectly specified or unspecified HDF5_PLUGIN_PATH environment variable.
However, the plugin can also be used explicitly as a library. In this case, do not specify the HDF5_PLUGIN_PATH environment variable; instead, have the application link to libh5zzfp.a in the lib dir of the installation. Two initialization and finalization routines are defined:
int H5Z_zfp_initialize(void);
int H5Z_zfp_finalize(void);
These functions are defined in the H5Zzfp_lib.h header file. Any application that wishes to use the filter as a library is required to call the initialization routine, H5Z_zfp_initialize(), before the filter can be referenced. In addition, to free up resources used by the filter, applications may call H5Z_zfp_finalize() when they are done using the filter.
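Putting the pieces together, here is a minimal sketch of library (non-plugin) use; the file and dataset names, sizes, and the rate of 4.0 bits per value are invented for the example, and error checking is omitted for brevity.

/* Sketch: explicit library use. Link against libh5zzfp.a (and HDF5/ZFP). */
#include <stdlib.h>
#include "hdf5.h"
#include "H5Zzfp_lib.h"   /* H5Z_zfp_initialize/H5Z_zfp_finalize */
#include "H5Zzfp_props.h" /* properties interface */

int main(void)
{
    hsize_t dims[] = {256, 256}, chunk[] = {64, 64}; /* chunk dims are multiples of 4 */
    double *buf = calloc(256 * 256, sizeof(double));

    H5Z_zfp_initialize(); /* required before the filter can be referenced */

    hid_t fid  = H5Fcreate("zfp_lib_example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t cpid = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(cpid, 2, chunk); /* filters require chunked datasets */
    H5Pset_zfp_rate(cpid, 4.0);   /* sets rate mode AND adds the filter to the pipeline */

    hid_t sid  = H5Screate_simple(2, dims, 0);
    hid_t dsid = H5Dcreate(fid, "data", H5T_NATIVE_DOUBLE, sid,
                           H5P_DEFAULT, cpid, H5P_DEFAULT);
    H5Dwrite(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Dclose(dsid); H5Sclose(sid); H5Pclose(cpid); H5Fclose(fid);
    H5Z_zfp_finalize(); /* optional: free filter resources */
    free(buf);
    return 0;
}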
HDF5 Chunking¶
HDF5's dataset chunking feature is a way to optimize data layout on disk to support partial dataset reads by downstream consumers. This is all the more important when compression filters are applied to datasets, since chunking frees a consumer from having to decompress an entire dataset only to read a portion of it.
ZFP Chunklets¶
When using HDF5 chunking with ZFP compression, it is important to account for the fact that ZFP does its work in tiny 4^d chunklets of its own, where d is the dataset dimension (rank in HDF5 parlance). This means that, whenever possible, the chunking dimensions you select in HDF5 should be multiples of 4. When a chunk dimension is not a multiple of 4, ZFP winds up with partial chunklets that it pads with useless data, reducing the overall time and space efficiency of the results.
The degree to which this may degrade performance depends on the percentage of a chunk that is padded. Suppose we have a 2D chunk of dimensions 27 x 101. ZFP will have to treat it as 28 x 104 by padding each dimension out to the next multiple of 4. The fraction of space wasted due to ZFP chunklet padding will be (28x104 - 27x101) / (28x104), which is about 6.4%. On the other hand, consider a 3D chunk that is 1024 x 1024 x 2. ZFP will have to treat it as 1024 x 1024 x 4, resulting in 50% waste.
The latter example is potentially very relevant when attempting to apply ZFP to compress data along the time dimension in a large, 3D simulation. Ordinarily, a simulation advances one time step at a time and so needs to store in memory only the current timestep. In order to give ZFP enough width in the time dimension to satisfy the minimum chunklet dimension size of 4, the simulation needs to keep 4 timesteps in memory. This is demonstrated in the example below.
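To make the chunklet-padding arithmetic above concrete, here is a small, self-contained sketch (not part of H5Z-ZFP) that rounds each chunk dimension up to the next multiple of 4 and reports the wasted fraction:

/* Sketch: fraction of a chunk wasted by ZFP's 4^d chunklet padding. */
#include <stdio.h>

static double zfp_pad_waste(int ndims, int const *chunk)
{
    double real = 1, padded = 1;
    int i;
    for (i = 0; i < ndims; i++)
    {
        real   *= chunk[i];
        padded *= (chunk[i] + 3) / 4 * 4; /* round up to next multiple of 4 */
    }
    return (padded - real) / padded;
}

int main(void)
{
    int c2[] = {27, 101}, c3[] = {1024, 1024, 2};
    printf("27x101      waste = %.1f%%\n", 100 * zfp_pad_waste(2, c2)); /* ~6.4% */
    printf("1024x1024x2 waste = %.1f%%\n", 100 * zfp_pad_waste(3, c3)); /* 50.0% */
    return 0;
}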
More Than 3 (or 4) Dimensions¶
Versions of ZFP 0.5.3 and older support compression in only 1, 2, or 3 dimensions. Versions of ZFP 0.5.4 and newer also support 4 dimensions.
What if you have a dataset with more dimensions than ZFP can compress? You can still use the H5Z-ZFP filter. But in order to do so, you are required to chunk the dataset [1]. Furthermore, you must select a chunk size such that no more than 3 (or 4 for ZFP 0.5.4 and newer) dimensions are non-unity (that is, of size greater than one).
For example, what if you are using ZFP 0.5.3 and have a 4D HDF5 dataset you want to compress? You will need to chunk the dataset, and when you define the chunk size and shape, you will need to select which of the 4 dimensions of the chunk you do not intend to have ZFP compress along by setting the size of the chunk in those dimensions to unity (1). When you do this, as HDF5 processes writes and reads, it will organize the data so that all the H5Z-ZFP filter sees are chunks that have extent only in the non-unity dimensions of the chunk.
In the example below, we have a 4D array of shape int dims[] = {256,128,32,16}; that we have intentionally constructed to be smooth in only 2 of its 4 dimensions (e.g. correlation is high in those dimensions). Because of that, we expect ZFP compression to do well along those dimensions, and we do not want ZFP to compress along the other 2 dimensions. The uncorrelated dimensions here are the dimensions with indices 1 (128 in dims[]) and 3 (16 in dims[]). Thus, our chunk size and shape is chosen to set the size for those dimension indices to 1: hsize_t hchunk[] = {256,1,32,1};
if (highd)
{
    /* dimension indices    0   1  2  3 */
    int fd, dims[] = {256,128,32,16};
    int ucdims[] = {1,3}; /* UNcorrelated dimension indices */
    hsize_t hdims[] = {256,128,32,16};
    hsize_t hchunk[] = {256,1,32,1};

    buf = gen_random_correlated_array(TYPDBL, 4, dims, 2, ucdims);

    cpid = setup_filter(4, hchunk, zfpmode, rate, acc, prec, minbits, maxbits, maxprec, minexp);

    if (0 > (sid = H5Screate_simple(4, hdims, 0))) ERROR(H5Screate_simple);

    /* write the data WITHOUT compression */
    if (0 > (dsid = H5Dcreate(fid, "highD_original", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT))) ERROR(H5Dcreate);
    if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf)) ERROR(H5Dwrite);
    if (0 > H5Dclose(dsid)) ERROR(H5Dclose);

    /* write the data with compression */
    if (0 > (dsid = H5Dcreate(fid, "highD_compressed", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, cpid, H5P_DEFAULT))) ERROR(H5Dcreate);
    if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf)) ERROR(H5Dwrite);
    if (0 > H5Dclose(dsid)) ERROR(H5Dclose);

    /* clean up from high dimensional test */
    if (0 > H5Sclose(sid)) ERROR(H5Sclose);
    if (0 > H5Pclose(cpid)) ERROR(H5Pclose);
    free(buf);
}
What analysis process should you use to select the chunk shape? Depending on what you expect in the way of access patterns in downstream consumers, this can be a challenging question to answer. There are potentially two competing interests. One is optimizing the chunk size and shape for access patterns anticipated by downstream consumers. The other is optimizing the chunk size and shape for compression. These two interests may not be compatible, and you may have to compromise between them. We illustrate the issues and tradeoffs with an example.
Compression Along the State Iteration Dimension¶
By state iteration dimension, we are referring to the main iteration loop(s) of the data producer. For many PDE-based simulations, the main iteration dimension is time. But for some outer-loop methods, the main iteration dimension(s) might be some kind of parameter study involving multiple parameters.
The challenge here is to manage the data in a way that meets ZFP's chunklet size and shape minimum requirements. In any H5Dwrite, at least 4 samples along a ZFP compression dimension are needed, or there will be wasted space due to padding. This means that data must be buffered along those dimensions before H5Dwrite calls can be issued.
For example, suppose you have a tensor valued field (e.g. a 3x3 matrix at every point) over a 4D (3 spatial dimensions and 1 time dimension), regularly sampled domain. Conceptually, this is a 6 dimensional dataset in HDF5 with one of the dimensions (the time dimension) extendible. You are free to define this as a 6 dimensional dataset in HDF5. But you will also have to chunk the dataset. You can select any chunk shape you want, except that no more than 3 (or 4 for ZFP versions 0.5.4 and newer) dimensions of the chunk can be non-unity.
In the code snippet below, we demonstrate this case. A key issue to deal with is that because we will use ZFP to compress along the time dimension, we are forced to keep in memory a sufficient number of timesteps to match ZFP's chunklet size of 4.
The code below iterates over 9 timesteps. Each of the first two groups of 4 timesteps is buffered in memory in tbuf. Once 4 timesteps have been buffered, we can issue an H5Dwrite call doing hyperslab partial I/O on the 6D, extendible dataset. But notice that the chunk dimensions (hchunk) are such that only 4 of the 6 dimensions are non-unity. This means ZFP will only ever see something to compress that is essentially 4D.
On the last iteration, we have only one new timestep. So, when we write it, 75% of that write will be wasted due to ZFP chunklet padding. However, if the application were to restart from this time and continue forward, this waste would ultimately get overwritten with new timesteps.
/* Test six dimensional, time varying array...
   ...a 3x3 tensor valued variable
   ...over a 3D+time domain.
   Dimension sizes are chosen to miss perfect ZFP block alignment.
*/
if (sixd)
{
    void *tbuf;
    int t, fd, dims[] = {31,31,31,3,3}; /* a single time instance */
    int ucdims[] = {3,4}; /* indices of UNcorrelated dimensions in dims (tensor components) */
    hsize_t hdims[] = {31,31,31,3,3,H5S_UNLIMITED};
    hsize_t hchunk[] = {31,31,31,1,1,4}; /* 4 non-unity, requires >= ZFP 0.5.4 */
    hsize_t hwrite[] = {31,31,31,3,3,4}; /* size/shape of any given H5Dwrite */

    /* Setup the filter properties and create the dataset */
    cpid = setup_filter(6, hchunk, zfpmode, rate, acc, prec, minbits, maxbits, maxprec, minexp);

    /* Create the time-varying, 6D dataset */
    if (0 > (sid = H5Screate_simple(6, hwrite, hdims))) ERROR(H5Screate_simple);
    if (0 > (dsid = H5Dcreate(fid, "6D_extendible", H5T_NATIVE_DOUBLE, sid, H5P_DEFAULT, cpid, H5P_DEFAULT))) ERROR(H5Dcreate);
    if (0 > H5Sclose(sid)) ERROR(H5Sclose);
    if (0 > H5Pclose(cpid)) ERROR(H5Pclose);

    /* Generate a single buffer which we'll modulate by a time-varying function
       to represent each timestep */
    buf = gen_random_correlated_array(TYPDBL, 5, dims, 2, ucdims);

    /* Allocate the "time" buffer where we will buffer up each time step
       until we have enough to span a width of 4 */
    tbuf = malloc(31*31*31*3*3*4*sizeof(double));

    /* Iterate, writing 9 timesteps by buffering in time 4x. The last
       write will contain just one timestep causing ZFP to wind up
       padding all those blocks by 3x along the time dimension. */
    for (t = 1; t < 10; t++)
    {
        hid_t msid, fsid;
        hsize_t hstart[] = {0,0,0,0,0,t-4};   /* starting offset of this H5Dwrite */
        hsize_t hcount[] = {31,31,31,3,3,4};  /* size/shape of this H5Dwrite */
        hsize_t hextend[] = {31,31,31,3,3,t}; /* extended dataset size at step t */

        /* Update (e.g. modulate) the buf data for the current time step */
        modulate_by_time(buf, TYPDBL, 5, dims, t);

        /* Buffer this timestep in memory. Since chunk size in time dimension is 4,
           we need to buffer up 4 time steps before we can issue any writes */
        buffer_time_step(tbuf, buf, TYPDBL, 5, dims, t);

        /* If the buffer isn't full, just continue updating it */
        if (t%4 && t!=9) continue;

        /* For last step, adjust time dim of this write down from 4 to just 1 */
        if (t == 9)
        {
            /* last timestep, write a partial buffer at the 9th time index */
            hwrite[5] = 1;
            hcount[5] = 1;
            hstart[5] = t-1;
        }

        /* extend the dataset in time */
        if (t > 4)
            H5Dextend(dsid, hextend);

        /* Create the memory dataspace */
        if (0 > (msid = H5Screate_simple(6, hwrite, 0))) ERROR(H5Screate_simple);

        /* Get the file dataspace to use for this H5Dwrite call */
        if (0 > (fsid = H5Dget_space(dsid))) ERROR(H5Dget_space);

        /* Do a hyperslab selection on the file dataspace for this write */
        if (0 > H5Sselect_hyperslab(fsid, H5S_SELECT_SET, hstart, 0, hcount, 0)) ERROR(H5Sselect_hyperslab);

        /* Write this iteration to the dataset */
        if (0 > H5Dwrite(dsid, H5T_NATIVE_DOUBLE, msid, fsid, H5P_DEFAULT, tbuf)) ERROR(H5Dwrite);
        if (0 > H5Sclose(msid)) ERROR(H5Sclose);
        if (0 > H5Sclose(fsid)) ERROR(H5Sclose);
    }
    if (0 > H5Dclose(dsid)) ERROR(H5Dclose);
    free(buf);
    free(tbuf);
}
[1] The HDF5 library currently requires dataset chunking anyway for any dataset that has any kind of filter applied.
Using H5Z-ZFP Plugin with H5Repack¶
A convenient way to use and play with the ZFP filter is as a plugin with the HDF5 h5repack utility, using the -f filter argument to apply ZFP to existing data in a file.
Patching h5repack¶
Some versions of HDF5's h5repack utility contain a bug that prevents proper parsing of the -f argument's option. In order to use h5repack with the -f argument as described here, you need to apply the patch from h5repack_parse.patch.
To do so, after you’ve downloaded and untar’d HDF5 but before you’ve built
it, do something like the following using HDF5-1.8.14 as an example:
gunzip < hdf5-1.8.14.tar.gz | tar xvf -
cd hdf5-1.8.14
patch ./tools/h5repack/h5repack_parse.c <path-to-H5Z-ZFP-test-dir>/h5repack_parse.patch
Constructing an HDF5 cd_values array¶
HDF5's h5repack utility uses only the generic interface to HDF5 filters. Another challenge in using h5repack as described here is constructing the set of unsigned int cd_values used in H5Pset_filter() and required by the generic HDF5 filter interface, especially because of the type punning (doubles as unsigned ints) involved.
Note: Querying an existing dataset using h5dump or h5ls to obtain the cd_values stored with a ZFP compressed dataset will not provide the correct cd_values. This is because the cd_values stored in the file are different from those used in the generic interface to invoke the ZFP filter.
To facilitate constructing a valid -f argument to h5repack, we have created a utility program, print_h5repack_farg, which is presently in the test directory and is built when tests are built.
You can use the print_h5repack_farg utility to read a command-line consisting of the ZFP filter parameters you wish to use, and it will output the part of the command-line needed for the -f argument to h5repack.
Examples¶
In the examples below, we use h5repack with the example data file, mesh.h5, in the tests directory.
To use the ZFP filter in rate mode with a rate of 4.5 bits per value, first use print_h5repack_farg:
% ./print_h5repack_farg zfpmode=1 rate=4.5
Print cdvals for set of ZFP compression parameters...
zfpmode=1 set zfp mode (1=rate,2=prec,3=acc,4=expert)
rate=4.5 set rate for rate mode of filter
acc=0 set accuracy for accuracy mode of filter
prec=11 set precision for precision mode of zfp filter
minbits=0 set minbits for expert mode of zfp filter
maxbits=4171 set maxbits for expert mode of zfp filter
maxprec=64 set maxprec for expert mode of zfp filter
minexp=-1074 set minexp for expert mode of zfp filter
help=0 this help message
h5repack -f argument...
-f UD=32013,4,1,0,0,1074921472
Next, cut-n-paste the -f UD=32013,4,1,0,0,1074921472 argument into a command to h5repack like so:
env LD_LIBRARY_PATH=<path-to-dir-with-libhdf5.so>:$(LD_LIBRARY_PATH) \
HDF5_PLUGIN_PATH=<path-to-dir-with-libh5zzfp.so> \
$(HDF5_BIN)/h5repack -f UD=32013,4,1,0,0,1074921472 \
-l Pressure,Pressure2,Pressure3:CHUNK=10x20x5 \
-l Velocity,Velocity2,Velocity3,VelocityZ,VelocityZ2,VelocityZ3:CHUNK=11x21x1x1 \
-l VelocityX_2D:CHUNK=21x31 \
mesh.h5 mesh_repack.h5
where the -l
arguments indicate the dataset(s) to be re-packed as well
as their (new) chunking.
To use the ZFP filter in accuracy mode with an accuracy of 0.075, first use print_h5repack_farg:
% ./print_h5repack_farg zfpmode=3 acc=0.075
Print cdvals for set of ZFP compression parameters...
zfpmode=3 set zfp mode (1=rate,2=prec,3=acc,4=expert)
rate=4 set rate for rate mode of filter
acc=0.075 set accuracy for accuracy mode of filter
prec=11 set precision for precision mode of zfp filter
minbits=0 set minbits for expert mode of zfp filter
maxbits=4171 set maxbits for expert mode of zfp filter
maxprec=64 set maxprec for expert mode of zfp filter
minexp=-1074 set minexp for expert mode of zfp filter
help=0 this help message
h5repack -f argument...
-f UD=32013,4,3,0,858993459,1068708659
Next, cut-n-paste the -f UD=32013,4,3,0,858993459,1068708659 argument into a command to h5repack like so:
env LD_LIBRARY_PATH=<path-to-dir-with-libhdf5.so>:$(LD_LIBRARY_PATH) \
HDF5_PLUGIN_PATH=<path-to-dir-with-libh5zzfp.so> \
$(HDF5_BIN)/h5repack -f UD=32013,4,3,0,858993459,1068708659 \
-l Pressure,Pressure2,Pressure3:CHUNK=10x20x5 \
-l Velocity,Velocity2,Velocity3,VelocityZ,VelocityZ2,VelocityZ3:CHUNK=11x21x1x1 \
-l VelocityX_2D:CHUNK=21x31 \
mesh.h5 mesh_repack.h5
Endian Issues¶
The ZFP library writes an endian-independent stream.
When reading ZFP compressed data on a machine with a different endian-ness than the writer, there is an unavoidable inefficiency. Upon reading data from disk and decompressing the read stream with ZFP, the correct endian-ness is returned in the result from ZFP before the buffer is handed back to HDF5 from the decompression filter. This happens regardless of reader and writer endian-ness incompatibility. However, the HDF5 library expects to get from the decompression filter the endian-ness of the data as it was stored to the file (typically that of the writer machine) and expects to have to byte-swap that buffer before returning it to any endian-incompatible caller. So, in the H5Z-ZFP plugin, we wind up having to un-byte-swap an already correct result read in a cross-endian context. That way, when HDF5 gets the data and byte-swaps it, it produces the correct result. There is an endian-ness test in the Makefile and two ZFP compressed example datasets, for big-endian and little-endian machines, to test that cross-endian reads/writes work correctly.
Finally, endian-targetting, that is, setting the file datatype for an endian-ness that is possibly different from the native endian-ness of the writer (for example, to alleviate downstream consumers from having to byte-swap due to endian incompatibility between writer and reader), is explicitly disallowed because it is not an operation currently supported by the HDF5 library.
Tests and Examples¶
The tests directory contains a few simple tests of the H5Z-ZFP filter.
The test client, test_write.c, is compiled in a couple of different ways.
One target is test_write_plugin
which demonstrates the use of this filter as
a standalone plugin. The other target, test_write_lib
, demonstrates the use
of the filter as an explicitly linked library. These test a simple 1D array with
and without ZFP compression using either the Generic Interface (for plugin)
or the Properties Interface (for library). You can use the code there as an
example of using the ZFP filter either as a plugin or as a library.
The command test_write_lib help
or test_write_plugin help
will print a
list of the example’s options and how to use them.
Write Test Options¶
./test/test_write_lib --help
ifile="" set input filename
ofile="test_zfp.h5" set output filename
1D dataset generation arguments...
npoints=1024 set number of points for 1D dataset
noise=0.001 set amount of random noise in 1D dataset
amp=17.7 set amplitude of sinusoid in 1D dataset
chunk=256 set chunk size for 1D dataset
doint=0 also do integer 1D data
ZFP compression parameters...
zfpmode=3 (1=rate,2=prec,3=acc,4=expert,5=reversible)
rate=4 set rate for rate mode of filter
acc=0 set accuracy for accuracy mode of filter
prec=11 set precision for precision mode of zfp filter
minbits=0 set minbits for expert mode of zfp filter
maxbits=4171 set maxbits for expert mode of zfp filter
maxprec=64 set maxprec for expert mode of zfp filter
minexp=-1074 set minexp for expert mode of zfp filter
Advanced cases...
highd=0 run 4D case
sixd=0 run 6D extendible case (requires ZFP>=0.5.4)
help=0 this help message
The test normally just tests compression of a 1D array of integer and double precision data: a sinusoid with a small amount of additive random noise. The highd test runs a test on a 4D dataset where two of the 4 dimensions are not correlated. This tests the plugin's ability to properly set chunking for HDF5 such that chunks span only correlated dimensions and have non-unity sizes in 3 or fewer dimensions. The sixd test runs a test on a 6D, extendible dataset representing an example of using ZFP for compression along the time axis.
There is a companion, test_read.c, which is compiled into test_read_plugin and test_read_lib and demonstrates use of the filter reading data as a plugin or library. Also, the commands test_read_lib help and test_read_plugin help will print a list of the command line options.
To use the plugin examples, you need to tell the HDF5 library where to find the
H5Z-ZFP plugin with the HDF5_PLUGIN_PATH
environment variable. The value you
pass is the path to the directory containing the plugin shared library.
Finally, there is a Fortran test example, test_rw_fortran.F90. The Fortran test writes and reads a 2D dataset. However, the Fortran test is designed to use the filter only as a library and not as a plugin. The reason for this is that the filter controls involve passing combinations of integer and floating point data from Fortran callers, and this can be done only through the Properties Interface, which by its nature requires any Fortran application to link with an implementation of that interface. Since we need to link extra code for Fortran anyway, we may as well also link to the filter itself, alleviating the need to use the filter as a plugin. Also, if you want to use Fortran support, the HDF5 library must of course have been configured and built with Fortran support as well.
In addition, a number of tests are performed in the Makefile which test the plugin by using some of the HDF5 tools such as h5dump and h5repack. Again, to use these tools to read data compressed with the H5Z-ZFP filter, you will need to inform the HDF5 library where to find the filter plugin. For example:
env HDF5_PLUGIN_PATH=<dir> h5ls test_zfp.h5
where <dir> is the relative or absolute path to a directory containing the filter plugin shared library.