Welcome to pyPhenology’s documentation!

Installation

Required dependencies

  • Python 3.4 or later
  • scipy (1.0 or later)
  • numpy (1.8 or later)
  • pandas (0.18.0 or later)

Instructions

pyPhenology is currently available only by installing it directly from GitHub using pip:

pip install git+git://github.com/sdtaylor/pyPhenology

Quickstart

A Thermal Time Model

An example dataset of blueberry (Vaccinium corymbosum) flower and leaf phenology from Harvard Forest is included:

from pyPhenology import models, utils
observations, temp = utils.load_test_data(name='vaccinium')

The two objects, observations and temp, are pandas DataFrames. The observations DataFrame contains the direct phenology observations, along with the associated site and year for each. temp contains the daily temperature measurements for each site and year represented in observations. Both are required for building models. Read more about how phenology data is structured in this package in the Data Structure section.

Initialize and fit a Thermal Time model, which has three parameters (t1, T, and F):

model = models.ThermalTime()
model.fit(observations, temp)
model.get_params()

A model can also be loaded via a text string:

Model = utils.load_model('ThermalTime')
model = Model()
model.fit(observations, temp)

Calling predict with no arguments gives predictions for the same data that was used for fitting:

model.predict()

New predictions can be made by passing new observations and temp DataFrames; the doy column in observations is not required. For example, here we fit the model on part of the data and make predictions on the held-out rows:

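# Hold out the first 10 rows for testing and fit on the rest
observations_test = observations.iloc[:10].copy()  # .copy() so a new column can be added safely
observations_train = observations.iloc[10:]

model.fit(observations_train, temp)
observations_test['prediction_doy'] = model.predict(observations_test, temp)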
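Splitting by row position is the simplest option; a random hold-out split is a common alternative. A minimal sketch of a random split plus a root mean squared error (RMSE) score, using only pandas and numpy, is below. The helper names `random_split` and `rmse` are hypothetical and not part of pyPhenology, and the toy observation values are made up for illustration.

```python
import numpy as np
import pandas as pd


def random_split(observations, test_fraction=0.2, seed=0):
    """Hypothetical helper: randomly hold out a fraction of rows for testing."""
    test = observations.sample(frac=test_fraction, random_state=seed)
    train = observations.drop(test.index)
    return train, test.copy()  # copy so a prediction column can be added safely


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted doy values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


# Toy observations with the required columns (illustrative values only)
toy = pd.DataFrame({
    'site_id': [1] * 10,
    'year': [1991, 1991, 1991, 1998, 1998, 1999, 1999, 2000, 2000, 2001],
    'doy': [100, 100, 104, 106, 106, 110, 108, 103, 101, 107],
})
train, test = random_split(toy, test_fraction=0.3)
```

With a fitted model, a held-out score could then be computed as `rmse(test['doy'], model.predict(test, temp))`.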

Data Structure

Your data must be structured in a specific way to be used in the package.

Phenology Observation Data

Observation data consists of the following columns:

  • doy: The Julian date (1-365) on which a specific phenological event occurred.
  • site_id: A site identifier for each doy observation
  • year: A year identifier for each doy observation

These should be columns in a pandas DataFrame, where every row is a single observation. For example, the built-in vaccinium dataset looks like this:

from pyPhenology import models, utils
observations, temp = utils.load_test_data(name='vaccinium')

observations.head()

                species  site_id  year  doy  phenophase
0  vaccinium corymbosum        1  1991  100         371
1  vaccinium corymbosum        1  1991  100         371
2  vaccinium corymbosum        1  1991  104         371
3  vaccinium corymbosum        1  1998  106         371
4  vaccinium corymbosum        1  1998  106         371

There are extra columns here for species and phenophase; these are ignored by the pyPhenology package.
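If you are assembling your own observations, a minimal pandas sketch might look like the following. The values and the `check_columns` helper are hypothetical; the helper simply verifies that the three required columns are present and returns only those, dropping extras such as species.

```python
import pandas as pd

REQUIRED_OBSERVATION_COLUMNS = ['site_id', 'year', 'doy']


def check_columns(df, required):
    """Hypothetical helper: raise if a required column is missing, else keep only those columns."""
    missing = [column for column in required if column not in df.columns]
    if missing:
        raise ValueError('missing required columns: {}'.format(missing))
    return df[required]


# One row per observation; 'species' is an extra column (illustrative values)
my_observations = pd.DataFrame({
    'species': ['vaccinium corymbosum'] * 3,
    'site_id': [1, 1, 2],
    'year': [1991, 1998, 1991],
    'doy': [100, 106, 112],
})

cleaned = check_columns(my_observations, REQUIRED_OBSERVATION_COLUMNS)
```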

Phenology Environmental Data

The current models support only daily mean temperature as a driver. Models require the daily temperature for every day of the winter and spring leading up to the phenophase event, in the following columns:

  • site_id: A site identifier for each location.
  • year: The year of the temperature time series.
  • temperature: The observed daily mean temperature in degrees Celsius.
  • doy: The Julian date of the mean temperature.

These should be columns in a DataFrame, like the observations. The example vaccinium dataset includes temperature observations:

temp.head()

   site_id  temperature    year  doy
0        1        -3.86  1989.0  0.0
1        1        -4.71  1989.0  1.0
2        1        -1.56  1989.0  2.0
3        1        -7.88  1989.0  3.0
4        1       -15.24  1989.0  4.0
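Because every observation needs matching temperature data, it can help to look for site and year combinations that have observations but no temperatures before fitting. A minimal sketch using a pandas left merge is below; `missing_site_years` is a hypothetical helper, and the toy temperatures are the first rows of the preview above (with year stored as an integer). It only checks site-year coverage, not that every required day is present.

```python
import pandas as pd


def missing_site_years(observations, temp):
    """Hypothetical helper: (site_id, year) pairs in observations with no temperature rows."""
    site_years = temp[['site_id', 'year']].drop_duplicates()
    merged = observations.merge(site_years, on=['site_id', 'year'], how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only', ['site_id', 'year']].drop_duplicates()


temp = pd.DataFrame({
    'site_id': [1, 1, 1, 1, 1],
    'temperature': [-3.86, -4.71, -1.56, -7.88, -15.24],
    'year': [1989, 1989, 1989, 1989, 1989],
    'doy': [0, 1, 2, 3, 4],
})
observations = pd.DataFrame({'site_id': [1, 1], 'year': [1989, 1990], 'doy': [3, 2]})

gaps = missing_site_years(observations, temp)  # one row: site_id 1, year 1990
```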

On the Julian Date

Jan. 1 is doy 0, and earlier dates in the same winter are negative numbers.
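A minimal sketch of this convention is below. The helper `to_phenology_doy` is hypothetical, and the `season_start_month` cutoff (dates in or after September count toward the following year's winter) is an assumption made for illustration, not something this document specifies.

```python
from datetime import date


def to_phenology_doy(day, season_start_month=9):
    """Hypothetical helper: map a calendar date to (year, doy), where Jan. 1 is doy 0
    and earlier dates of the same winter are negative.

    season_start_month is an assumed cutoff: dates in or after this month are
    assigned to the following year's season.
    """
    year = day.year + 1 if day.month >= season_start_month else day.year
    doy = (day - date(year, 1, 1)).days
    return year, doy


to_phenology_doy(date(1990, 1, 1))    # (1990, 0)
to_phenology_doy(date(1989, 12, 31))  # (1990, -1)
to_phenology_doy(date(1990, 4, 10))   # (1990, 99)
```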

Notes

  • If you have only a single site, make a “dummy” site_id column set to 1 for both temperature and observation dataframes.
  • If you have only a single year
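For the single-site case, adding the dummy column is a one-line assignment per DataFrame in pandas. A minimal sketch is below; the data values are made up for illustration.

```python
import pandas as pd

# Single-site data without a site_id column (illustrative values)
observations = pd.DataFrame({'year': [2001, 2002], 'doy': [110, 104]})
temp = pd.DataFrame({
    'year': [2001, 2001, 2001],
    'doy': [0, 1, 2],
    'temperature': [1.2, -0.5, 3.1],
})

# Give both DataFrames the same dummy site identifier so rows can be matched by site
observations['site_id'] = 1
temp['site_id'] = 1
```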

Models

Primary Models

ThermalTime([parameters]) Thermal Time Model
Alternating([parameters]) Alternating model, originally defined in Cannell & Smith 1983.
Uniforc([parameters]) Uniforc model
Unichill([parameters]) Two-phase forcing model using a sigmoid function for forcing units and chilling.
Linear([parameters]) A linear regression where DOY ~ mean_spring_temperature
MSB([parameters]) Macroscale Species-specific Budburst model.
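As a rough illustration of what the Linear model's formula DOY ~ mean_spring_temperature expresses, here is a sketch with pandas and numpy. The spring window (`start_doy`, `end_doy`) and both helper names are assumptions made for illustration, not pyPhenology's implementation.

```python
import numpy as np
import pandas as pd


def mean_spring_temperature(temp, start_doy=60, end_doy=151):
    """Mean temperature per site and year within an assumed spring window (inclusive)."""
    spring = temp[temp['doy'].between(start_doy, end_doy)]
    return spring.groupby(['site_id', 'year'], as_index=False)['temperature'].mean()


def fit_linear(observations, temp, start_doy=60, end_doy=151):
    """Ordinary least squares fit of doy against mean spring temperature."""
    spring = mean_spring_temperature(temp, start_doy, end_doy)
    data = observations.merge(spring, on=['site_id', 'year'])
    slope, intercept = np.polyfit(data['temperature'], data['doy'], deg=1)
    return intercept, slope


# Synthetic data: two site-years with constant temperatures of 5 and 10 degrees C
days = np.arange(0, 200)
temp = pd.DataFrame({
    'site_id': 1,
    'year': np.repeat([2000, 2001], len(days)),
    'doy': np.tile(days, 2),
    'temperature': np.repeat([5.0, 10.0], len(days)),
})
observations = pd.DataFrame({'site_id': [1, 1], 'year': [2000, 2001], 'doy': [135, 120]})
intercept, slope = fit_linear(observations, temp)  # about 150 and -3
```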

Ensemble Models

BootstrapModel(core_model, num_bootstraps[, …]) Fit a model using bootstrapping of the data.
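The general idea of bootstrapping can be sketched generically: resample the observations with replacement many times, refit on each resample, and keep the spread of parameter estimates. The sketch below uses a straight-line fit from numpy as a stand-in model on synthetic data; `bootstrap_fit` is hypothetical and does not describe how BootstrapModel is implemented internally.

```python
import numpy as np


def bootstrap_fit(x, y, num_bootstraps=100, seed=0):
    """Refit a straight line on resampled data; one (intercept, slope) row per bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty((num_bootstraps, 2))
    for i in range(num_bootstraps):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        estimates[i] = (intercept, slope)
    return estimates


# Synthetic data: doy falls 3 days per degree, plus noise
rng = np.random.default_rng(1)
x = np.linspace(0, 15, 30)
y = 150 - 3 * x + rng.normal(0, 2, size=x.size)

estimates = bootstrap_fit(x, y)
low, high = np.percentile(estimates[:, 1], [2.5, 97.5])  # 95% percentile interval for the slope
```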

Controlling Parameter Estimation

By default, all model parameters are free to vary within predefined search ranges, which were chosen to be applicable to a wide variety of plants. Parameters can be adjusted in two ways:

  • The search range can be adjusted.
  • Any or all parameters can be set to a fixed value.

Both are done via the parameters argument when initializing the model.

Setting parameters to fixed values

Here is a common example: in the Thermal Time model, t1, the day when warming accumulation begins, is fixed at Jan. 1 (doy 1) by setting it to an integer. The other two parameters, F and T, are then estimated:

from pyPhenology import models, utils
observations, temp = utils.load_test_data(name='vaccinium')

model = models.ThermalTime(parameters={'t1':1})
model.fit(observations, temp)
model.get_params()

{'T': 8.6286577557177608, 'F': 156.76212563809247, 't1': 1}

Similarly, we can also set the temperature threshold T to a fixed value. Then only F, the total degree days required, is estimated:

model = models.ThermalTime(parameters={'t1':1,'T':5})
model.fit(observations, temp)
model.get_params()

{'F': 274.29110894742541, 't1': 1, 'T': 5}
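To make the roles of t1, T, and F concrete, here is a minimal sketch of how a thermal time prediction is commonly computed: daily degree days above T are summed starting on day t1, and the predicted doy is the first day the sum reaches F. This is a generic growing-degree-day formulation written for illustration, assumed rather than taken from pyPhenology's source.

```python
import numpy as np


def thermal_time_doy(doy, temperature, t1, T, F):
    """First doy on which degree days above T, summed from day t1, reach F.

    Assumed generic formulation for illustration, not pyPhenology's source.
    """
    doy = np.asarray(doy)
    temperature = np.asarray(temperature, dtype=float)
    degree_days = np.maximum(temperature - T, 0.0)       # daily forcing above threshold T
    degree_days = np.where(doy >= t1, degree_days, 0.0)  # accumulate only from day t1
    cumulative = np.cumsum(degree_days)
    reached = np.flatnonzero(cumulative >= F)
    return doy[reached[0]] if reached.size else np.nan   # nan if F is never reached


days = np.arange(-5, 20)                # includes days before Jan. 1
temperature = np.full(days.size, 10.0)  # constant 10 degrees C
thermal_time_doy(days, temperature, t1=0, T=5, F=20)  # 3: five degree days per day from doy 0
```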

Note that if all the parameters of a model are set to fixed values, no fitting can be done:

model = models.ThermalTime(parameters={'t1':1,'T':5, 'F':50})
model.fit(observations, temp)

RuntimeError: No parameters to estimate

One more example, where the Uniforc model has t1 fixed at 1 (Jan. 1) and the other parameters are estimated:

model = models.Uniforc(parameters={'t1':1})
model.fit(observations, temp)
model.get_params()

{'F': 11.050063297905695, 'b': -2.0395193186815908, 'c': 9.3016675933620956, 't1': 1}
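The Uniforc parameters b and c shape a sigmoid forcing function. A sketch using one commonly used form, 1 / (1 + exp(b * (temperature - c))), is below; treat the exact formula as an assumption rather than pyPhenology's definition. With a negative b, as in the fitted values above, daily forcing rises with temperature and equals 0.5 when the temperature equals c.

```python
import numpy as np


def sigmoid_forcing(temperature, b, c):
    """Daily forcing units from a sigmoid of temperature (assumed functional form)."""
    temperature = np.asarray(temperature, dtype=float)
    return 1.0 / (1.0 + np.exp(b * (temperature - c)))


def uniforc_doy(doy, temperature, t1, F, b, c):
    """First doy on which sigmoid forcing summed from t1 onward reaches F."""
    doy = np.asarray(doy)
    forcing = np.where(doy >= t1, sigmoid_forcing(temperature, b, c), 0.0)
    cumulative = np.cumsum(forcing)
    reached = np.flatnonzero(cumulative >= F)
    return doy[reached[0]] if reached.size else np.nan


days = np.arange(0, 10)
temperature = np.full(days.size, 9.0)  # held exactly at c, so forcing is 0.5 per day
uniforc_doy(days, temperature, t1=0, F=2.0, b=-2.0, c=9.0)  # 3
```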

Setting a search range for parameters

To specify a different search range for a parameter, use a tuple of (low, high) values. Search ranges can be mixed with fixed values.

For example, the Thermal Time model with narrow search ranges for t1 and F, and T fixed at 5 °C:

model = models.ThermalTime(parameters={'t1':(-10,10), 'F':(100,500),'T':5})
model.fit(observations, temp)
model.get_params()

{'t1': 4.9538373877994291, 'F': 270.006971948699, 'T': 5}
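A minimal sketch of how fixed values and (low, high) ranges could be separated and handed to scipy's differential evolution is below. `split_parameters` and the toy objective are hypothetical stand-ins, not pyPhenology internals; the `RuntimeError` mirrors the all-fixed case shown earlier.

```python
from scipy.optimize import differential_evolution


def split_parameters(parameters):
    """Hypothetical helper: separate fixed values from (low, high) search ranges."""
    fixed, bounds = {}, {}
    for name, value in parameters.items():
        if isinstance(value, tuple):
            bounds[name] = value  # free parameter with a search range
        else:
            fixed[name] = value   # parameter held at a fixed value
    if not bounds:
        raise RuntimeError('No parameters to estimate')
    return fixed, bounds


fixed, bounds = split_parameters({'t1': (-10, 10), 'F': (100, 500), 'T': 5})
names = list(bounds)  # ['t1', 'F']


def objective(x):
    # Recombine free and fixed values; a toy error surface stands in for model error
    params = dict(fixed, **dict(zip(names, x)))
    return (params['t1'] - 2) ** 2 + (params['F'] - 54 * params['T']) ** 2


result = differential_evolution(objective, [bounds[name] for name in names])
```

With T fixed at 5, the toy minimum sits at t1 = 2 and F = 270, so `result.x` should land close to those values.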

Tuple search ranges work with the Differential Evolution (the default), Basin Hopping, and Simulated Annealing optimization methods. For the brute force method you must instead specify each search range as a slice.
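For reference, scipy's `scipy.optimize.brute` takes its grid as `slice(start, stop, step)` objects, where the stop value is excluded. The sketch below calls it directly on a toy objective with made-up ranges; it illustrates scipy's interface, not pyPhenology's parameter syntax for brute force.

```python
from scipy import optimize


def objective(x):
    t1, F = x
    # Toy error surface with its minimum at t1 = 2, F = 270
    return (t1 - 2) ** 2 + ((F - 270) / 10.0) ** 2


ranges = (slice(-10, 10, 1), slice(100, 500, 10))  # grid: t1 in -10..9, F in 100..490
best = optimize.brute(objective, ranges, finish=None)  # best grid point, no local polishing
```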


Optimizer Methods

To estimate model parameters, pyPhenology uses optimizers built into scipy. The most popular optimization technique in phenology is simulated annealing, which was previously implemented in scipy but was dropped in favor of the basin hopping algorithm. The available optimizers are:

  • Differential evolution (the default)
  • Basin hopping
  • Brute force
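Differential evolution and basin hopping are both available directly in scipy.optimize. A minimal sketch calling each on the same toy one-parameter objective is below; the objective is made up for illustration. Differential evolution searches within bounds, while basin hopping starts from an initial guess `x0` and repeatedly perturbs it and runs a local minimization.

```python
from scipy.optimize import basinhopping, differential_evolution


def objective(x):
    # Toy objective with its minimum value of 1.0 at x = 3
    return (x[0] - 3.0) ** 2 + 1.0


de_result = differential_evolution(objective, bounds=[(-10, 10)])
bh_result = basinhopping(objective, x0=[0.0], niter=20)
```

Both methods involve random steps, so results on real model objectives can differ slightly between runs.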

Examples
