Welcome to HydroBox’s documentation!¶
Development branch¶
Warning
This documentation is still under development and far from complete. Almost everything here is subject to change.
About¶
The HydroBox package is a toolbox for hydrological data analysis developed at the Chair of Hydrology at the Karlsruhe Institute of Technology (KIT). HydroBox has a submodule called toolbox, a collection of functions and classes that accept common numpy and pandas input formats and wrap scipy functionality. Its purpose is:
- to speed up common hydrological data analysis tasks
- to integrate fully with custom numpy/pandas/scipy code
Jump directly to the installation section or get started.
Installation Guide¶
PyPI¶
The latest HydroBox release on PyPI can be installed using pip:
pip install hydrobox
GitHub¶
A more recent version might be available on GitHub. It can be installed as follows:
git clone https://github.com/mmaelicke/hydrobox.git
cd hydrobox
pip install -r requirements.txt
pip install -e .
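After either installation route, a quick check from Python can confirm that the package is visible to the interpreter. The following is a minimal sketch using only the standard library (Python 3.8 or newer); the helper name installed_version is hypothetical and not part of HydroBox, and the distribution name hydrobox is taken from the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("hydrobox"))
```

If this prints None, pip most likely installed into a different environment than the one running the interpreter.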
Examples¶
Important
These examples should help you to get started with most of the functionality. However, some examples and tools might need a specific database backend or service running on the machine. In that case, you have to install those requirements first. The reference section should guide you to the correct function with more detailed information on the setup.
Discharge Tools¶
FDC from random data¶
Workflow¶
The workflow in this example generates some random data and applies two processing steps to illustrate the general idea. All tools are designed to fit seamlessly into automated processing environments like WPS servers or other workflow engines.
The workflow in this example:
- generates a ten year random discharge time series from a gamma distribution
- aggregates the data to daily maximum values
- creates a flow duration curve
- uses Python to visualize the flow duration curve
Generate the data¶
# use the ggplot plotting style
In [1]: import matplotlib as mpl
In [2]: mpl.style.use('ggplot')
In [3]: from hydrobox import toolbox
# Step 1:
In [4]: series = toolbox.io.timeseries_from_distribution(
   ...:     distribution='gamma',
   ...:     distribution_args=[2, 0.5],  # [shape, scale]
   ...:     start='200001010000',        # start date
   ...:     end='201001010000',          # end date
   ...:     freq='15min',                # temporal resolution
   ...:     size=None,                   # None: infer the number of values
   ...:     seed=42                      # set a random seed
   ...: )
   ...:
In [5]: print(series.head())
2000-01-01 00:00:00 1.196840
2000-01-01 00:15:00 0.747232
2000-01-01 00:30:00 0.691142
2000-01-01 00:45:00 0.691151
2000-01-01 01:00:00 2.324857
Freq: 15T, dtype: float64
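For readers who want to see what such a series looks like without HydroBox, the step above can be approximated with plain numpy and pandas. This is a sketch under assumptions, not the toolbox implementation: the use of a seeded RandomState and the order of the gamma arguments (shape, then scale) are assumptions, so the numbers should not be expected to match the output above exactly.

```python
import numpy as np
import pandas as pd

# Build the 15-minute index first; its length determines the sample size,
# which mirrors the idea of size=None in the toolbox call.
index = pd.date_range(start="2000-01-01 00:00", end="2010-01-01 00:00", freq="15min")

# Assumption: a seeded legacy RandomState; HydroBox's own seeding may differ.
rng = np.random.RandomState(42)
values = rng.gamma(2, 0.5, size=len(index))  # shape=2, scale=0.5

# Attach the DatetimeIndex so that time-based aggregation works later on.
series = pd.Series(values, index=index)
print(series.head())
```

Ten years at a 15-minute resolution, with both endpoints included, yield 350689 values.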
Apply the aggregation¶
In [6]: import numpy as np
In [7]: series_daily = toolbox.aggregate(series, by='1D', func=np.max)
In [8]: print(series_daily.head())
2000-01-01 3.648999
2000-01-02 3.398266
2000-01-03 3.196676
2000-01-04 3.842573
2000-01-05 2.578654
Freq: D, dtype: float64
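The same kind of aggregation can be written directly in pandas, which may help when mixing toolbox calls with custom code. The sketch below uses a small stand-in series with made-up values and pandas' resample; whether toolbox.aggregate is built on resample internally is not stated here and is not assumed.

```python
import numpy as np
import pandas as pd

# Stand-in data (hypothetical): two days of 15-minute values, 0.0 to 191.0.
index = pd.date_range("2000-01-01", periods=2 * 96, freq="15min")
series = pd.Series(np.arange(len(index), dtype=float), index=index)

# Group the values into calendar days and keep the largest value of each day.
series_daily = series.resample("1D").max()
print(series_daily)
```

Each day holds 96 values, so the daily maxima of this stand-in series are 95.0 and 191.0.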
Calculate the flow duration curve (FDC)¶
# the FDC is calculated on the values only
In [9]: fdc = toolbox.flow_duration_curve(
   ...:     x=series_daily.values,  # an FDC does not need a DatetimeIndex
   ...:     plot=False              # return values, not a plot
   ...: )
   ...:
In [10]: print(fdc[:5])
[0.0002736 0.0005472 0.00082079 0.00109439 0.00136799]
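The printed values are consistent with a Weibull plotting position, rank / (n + 1), applied to 3654 daily values, since 1 / 3655 ≈ 0.0002736. That reading is inferred from the output and is not a statement about HydroBox's internals. Under that assumption, a flow duration curve can be sketched in plain numpy as follows; the function name flow_duration is hypothetical, and ties are not treated specially.

```python
import numpy as np


def flow_duration(x):
    """Return (exceedance probability, discharge) pairs for a flow duration curve."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Sort discharge from high to low: the largest flow is exceeded least often.
    q = np.sort(x)[::-1]
    # Weibull plotting position: rank / (n + 1), with rank 1 for the largest flow.
    p = np.arange(1, n + 1) / (n + 1)
    return p, q


p, q = flow_duration([3.0, 1.0, 2.0, 5.0])
print(p)  # exceedance probabilities, ascending
print(q)  # matching discharge values, descending
```

Plotting q against p gives the usual descending curve, with high flows at low exceedance probabilities.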
In [11]: print(fdc[-5:])