Welcome to pyPhenology’s documentation!¶
Installation¶
Required dependencies¶
Instructions¶
pyPhenology is currently available only by installing directly from GitHub using pip:
pip install git+https://github.com/sdtaylor/pyPhenology
Quickstart¶
A Thermal Time Model¶
An example dataset of blueberry (Vaccinium corymbosum) flower and leaf phenology from Harvard Forest is available:
from pyPhenology import models, utils
observations, temp = utils.load_test_data(name='vaccinium')
The two objects observations and temp are pandas DataFrames. The observations DataFrame contains the direct phenology observations as well as an associated site and year for each. The temp DataFrame contains the daily temperature measurements for each site and year represented in observations. Both are required for building models. Read more about how phenology data is structured in the Data Structure section.
Initialize and fit a Thermal Time model, which has 3 parameters:
model = models.ThermalTime()
model.fit(observations, temp)
model.get_params()
A model can also be loaded via a text string:
Model = utils.load_model('ThermalTime')
model = Model()
model.fit(observations, temp)
Calling predict with no arguments returns predictions for the same data that was used for fitting:
model.predict()
New predictions can be made by passing new observations and temp DataFrames, where the ‘doy’ column in observations is not required. For example, here we fit the model on one subset and predict on held-out data:
# Hold out the first 10 rows; .copy() avoids a pandas SettingWithCopyWarning below
observations_test = observations[:10].copy()
observations_train = observations[10:]
model.fit(observations_train, temp)
observations_test['prediction_doy'] = model.predict(observations_test, temp)
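Once held-out predictions exist, a common next step is to compare them with the observed doy values. The sketch below computes the root mean squared error in plain Python. The helper name rmse and the sample numbers are illustrative assumptions and are not part of pyPhenology.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError('observed and predicted must have the same length')
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only.
observed_doy = [100, 104, 106]
predicted_doy = [102, 104, 103]
error = rmse(observed_doy, predicted_doy)
```

With the held-out DataFrame above, the same helper could presumably be called as rmse(list(observations_test['doy']), list(observations_test['prediction_doy'])).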
Data Structure¶
Your data must be structured in a specific way to be used in the package.
Phenology Observation Data¶
Observation data consists of the following:
- doy: The Julian date (day of year, 1-365) on which a specific phenological event happened.
- site_id: A site identifier for each doy observation
- year: A year identifier for each doy observation
These should be structured in columns in a pandas DataFrame, where every row is a single observation. For example, the built-in vaccinium dataset looks like this:
from pyPhenology import models, utils
observations, temp = utils.load_test_data(name='vaccinium')
observations.head()
                species  site_id  year  doy  phenophase
0  vaccinium corymbosum        1  1991  100         371
1  vaccinium corymbosum        1  1991  100         371
2  vaccinium corymbosum        1  1991  104         371
3  vaccinium corymbosum        1  1998  106         371
4  vaccinium corymbosum        1  1998  106         371
There are extra columns here for the species and phenophase; these are ignored by the pyPhenology package.
Phenology Environmental Data¶
The current models only support daily mean temperature as a driver. Models require the daily temperature for every day of the winter and spring leading up to the phenophase event. Temperature data consists of the following:
- site_id: A site identifier for each location.
- year: The year of the temperature timeseries
- temperature: The observed daily mean temperature in degrees Celsius.
- doy: The Julian date of the mean temperature
These should be columns in a pandas DataFrame, like the observations. The example vaccinium dataset has temperature observations:
temp.head()
   site_id  temperature    year  doy
0        1        -3.86  1989.0  0.0
1        1        -4.71  1989.0  1.0
2        1        -1.56  1989.0  2.0
3        1        -7.88  1989.0  3.0
4        1       -15.24  1989.0  4.0
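As a minimal sketch of the two structures described above, the following builds an observations DataFrame and a temp DataFrame by hand for a single site and checks that the required columns are present. All values are made up for illustration, and the column check is a hypothetical helper step, not something pyPhenology is documented to provide.

```python
import pandas as pd

# Made-up values for illustration only; not real measurements.
observations = pd.DataFrame({
    'site_id': [1, 1],          # constant site_id for a single site
    'year': [2001, 2002],
    'doy': [120, 115],
})

temp = pd.DataFrame({
    'site_id': [1, 1, 1],
    'year': [2001, 2001, 2001],
    'doy': [0, 1, 2],
    'temperature': [-3.2, -1.0, 0.5],   # daily mean, degrees Celsius
})

# Column names taken from the lists above.
required_obs = {'site_id', 'year', 'doy'}
required_temp = {'site_id', 'year', 'doy', 'temperature'}

missing_obs = required_obs - set(observations.columns)
missing_temp = required_temp - set(temp.columns)
```

Empty missing_obs and missing_temp sets indicate that both DataFrames carry the columns the package expects.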
On the Julian Date¶
TODO Jan. 1 is 0, but prior dates of the same winter are negative numbers.
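The convention in the note above can be sketched with the standard library: count days relative to Jan. 1 of the year in which the phenological event occurs, so that dates in the preceding autumn and winter come out negative. The function name doy_for_season is a hypothetical helper, and this assumes the Jan. 1 = 0 convention stated in the note rather than a confirmed pyPhenology behavior.

```python
from datetime import date


def doy_for_season(day, season_year):
    """Days elapsed since Jan. 1 of season_year (negative before that date)."""
    return (day - date(season_year, 1, 1)).days


jan_first = doy_for_season(date(1989, 1, 1), 1989)        # 0
prior_dec_31 = doy_for_season(date(1988, 12, 31), 1989)   # -1
prior_nov_1 = doy_for_season(date(1988, 11, 1), 1989)     # -61
```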
Notes¶
- If you have only a single site, make a “dummy” site_id column set to 1 for both temperature and observation dataframes.
- If you have only a single year
Models¶
Primary Models¶
ThermalTime([parameters]) | Thermal Time Model
Alternating([parameters]) | Alternating model, originally defined in Cannell & Smith 1983.
Uniforc([parameters]) | Uniforc model
Unichill([parameters]) | Two-phase forcing model using a sigmoid function for forcing units and chilling.
Linear([parameters]) | A linear regression where DOY ~ mean_spring_temperature
MSB([parameters]) | Macroscale Species-specific Budburst model.
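To illustrate the idea behind the Linear model listed above, here is a minimal ordinary least squares sketch that regresses doy on mean spring temperature. The function fit_line and the sample values are assumptions for illustration; how pyPhenology defines the spring averaging window is not covered here.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up values: warmer springs are paired with earlier events.
mean_spring_temperature = [4.0, 6.0, 8.0]
doy = [120.0, 112.0, 104.0]

intercept, slope = fit_line(mean_spring_temperature, doy)
predicted_doy = intercept + slope * 7.0   # prediction for a 7 degree spring
```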
Ensemble Models¶
BootstrapModel(core_model, num_bootstraps[, …]) | Fit a model using bootstrapping of the data.
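The general idea of bootstrapping can be sketched as follows: resample the observations with replacement, fit once per resample, and combine the fitted results. In this sketch the "model" is simply the mean doy, and bootstrap_estimates is a hypothetical helper. The resampling and aggregation that BootstrapModel actually performs may differ.

```python
import random
import statistics


def bootstrap_estimates(values, num_bootstraps, fit, seed=0):
    """Fit `fit` on num_bootstraps resamples (with replacement) of values."""
    rng = random.Random(seed)   # seeded so the sketch is reproducible
    estimates = []
    for _ in range(num_bootstraps):
        sample = rng.choices(values, k=len(values))
        estimates.append(fit(sample))
    return estimates


# Made-up doy observations for illustration only.
observed_doy = [100, 100, 104, 106, 106]
estimates = bootstrap_estimates(observed_doy, num_bootstraps=50,
                                fit=statistics.mean)

# Combine the per-resample fits into a single ensemble estimate.
ensemble_prediction = statistics.mean(estimates)
```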
Controlling Parameter Estimation¶
By default, all parameters in the models are free to vary within their predefined search ranges, which were chosen to be applicable to a wide variety of plants. Parameter estimation can be adjusted in two ways:
- The search range can be adjusted.
- Any or all parameters can be set to a fixed value.
Both of these are done via the parameters argument in the initial model call.
Setting parameters to fixed values¶
Here is a common example, where in the Thermal Time model t1, the day when warming accumulation begins, is set to Jan. 1 (doy 1) by setting it to an integer. The other two parameters, F and T, are then estimated:
from pyPhenology import models, utils
observations, temp = utils.load_test_data(name='vaccinium')
model = models.ThermalTime(parameters={'t1':1})
model.fit(observations, temp)
model.get_params()
{'T': 8.6286577557177608, 'F': 156.76212563809247, 't1': 1}
Similarly, we can also set the temperature threshold T to a fixed value. Then only F, the total degree days required, is estimated:
model = models.ThermalTime(parameters={'t1':1,'T':5})
model.fit(observations, temp)
model.get_params()
{'F': 274.29110894742541, 't1': 1, 'T': 5}
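To make the roles of the three parameters concrete, the following is a simplified degree-day sketch: starting on day t1, the amount by which the daily mean temperature exceeds the threshold T is accumulated, and the predicted doy is the first day on which the total reaches F. The function thermal_time_doy is an illustrative assumption and not pyPhenology's actual implementation.

```python
def thermal_time_doy(daily_temps, t1, T, F):
    """Return the first doy at which accumulated forcing reaches F.

    daily_temps maps doy -> daily mean temperature (degrees C).
    Returns None if F is never reached.
    """
    total = 0.0
    for doy in sorted(daily_temps):
        if doy < t1:
            continue                                # accumulation starts at t1
        total += max(daily_temps[doy] - T, 0.0)     # only warmth above T counts
        if total >= F:
            return doy
    return None


# Made-up series: a constant 10 degrees C on every day from doy 1 to 100.
daily_temps = {doy: 10.0 for doy in range(1, 101)}

# Each day adds 10 - 5 = 5 degree days, so F = 50 is reached on doy 10.
predicted = thermal_time_doy(daily_temps, t1=1, T=5, F=50)
```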
Note that if you set all the parameters of a model to fixed values then no fitting can be done:
model = models.ThermalTime(parameters={'t1':1,'T':5, 'F':50})
model.fit(observations, temp)
RuntimeError: No parameters to estimate
One more example, where the Uniforc model is set to a t1 of 1 (Jan. 1) and the other parameters are estimated:
model = models.Uniforc(parameters={'t1':1})
model.fit(observations, temp)
model.get_params()
{'F': 11.050063297905695, 'b': -2.0395193186815908, 'c': 9.3016675933620956, 't1': 1}
Setting a search range for parameters¶
To specify a different search range for a parameter, use a tuple with a low and a high value. These can be mixed and matched with fixed values.
For example, the Thermal Time model with a narrow search range for t1 and F, but T fixed at 5 degrees C:
model = models.ThermalTime(parameters={'t1':(-10,10), 'F':(100,500),'T':5})
model.fit(observations, temp)
model.get_params()
{'t1': 4.9538373877994291, 'F': 270.006971948699, 'T': 5}
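The convention described above, where a single number fixes a parameter and a tuple gives a search range, can be sketched as a small helper that separates the two. The function split_parameters and the default ranges shown are hypothetical and chosen only for illustration; they are not pyPhenology's internals or its real defaults.

```python
def split_parameters(parameters, defaults):
    """Separate fixed values from (low, high) search ranges."""
    fixed, ranges = {}, {}
    for name, default_range in defaults.items():
        value = parameters.get(name, default_range)
        if isinstance(value, tuple):
            low, high = value
            ranges[name] = (low, high)   # parameter will be estimated
        else:
            fixed[name] = value          # parameter is held constant
    if not ranges:
        raise RuntimeError('No parameters to estimate')
    return fixed, ranges


# Illustrative default ranges only; the real defaults may differ.
illustrative_defaults = {'t1': (-60, 180), 'T': (-20, 20), 'F': (0, 1000)}

fixed, ranges = split_parameters(
    {'t1': (-10, 10), 'F': (100, 500), 'T': 5},
    illustrative_defaults,
)
```

Here fixed holds only T, while t1 and F keep their narrowed ranges. Passing a dictionary in which every parameter is a single number raises the RuntimeError in this sketch, mirroring the error shown earlier.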
The above works for the optimization methods Differential Evolution (the default), Basin Hopping, and Simulated Annealing. For the brute force method you must specify a slice instead.
TODO
Optimizer Methods¶
To estimate model parameters, pyPhenology uses optimizers built into SciPy. The most popular optimization technique in phenology is simulated annealing. It was previously implemented in SciPy but was dropped in favor of the basin hopping algorithm. The available optimizers are:
- Differential evolution (the default)
- Basin hopping
- Brute force
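As a rough illustration of what the brute force approach does, the sketch below evaluates every combination on a small parameter grid and keeps the one with the lowest squared error. It is written in plain Python with hypothetical helper names and made-up data. SciPy's own brute optimizer works on the same principle, taking each parameter's grid as a slice object or a (low, high) tuple, but how pyPhenology passes these through is not shown here.

```python
from itertools import product


def thermal_time_doy(daily_temps, t1, T, F):
    """Simplified degree-day model: first doy where forcing reaches F."""
    total = 0.0
    for doy in sorted(daily_temps):
        if doy < t1:
            continue
        total += max(daily_temps[doy] - T, 0.0)
        if total >= F:
            return doy
    return None


def brute_force_fit(observed_doy, daily_temps, T_grid, F_grid, t1=1):
    """Try every (T, F) pair and keep the lowest sum of squared errors."""
    best_params, best_sse = None, float('inf')
    for T, F in product(T_grid, F_grid):
        predicted = thermal_time_doy(daily_temps, t1, T, F)
        if predicted is None:
            continue   # F never reached for this combination
        sse = sum((obs - predicted) ** 2 for obs in observed_doy)
        if sse < best_sse:
            best_params, best_sse = {'T': T, 'F': F}, sse
    return best_params, best_sse


# Made-up data: constant 10 degrees C and three observations at one site.
daily_temps = {doy: 10.0 for doy in range(1, 101)}
observed_doy = [10, 10, 11]

best_params, best_sse = brute_force_fit(
    observed_doy, daily_temps, T_grid=[0, 5], F_grid=[25, 50, 75])
```

Grid search is exhaustive, so its cost grows quickly with the number of parameters and grid points, which is one reason stochastic methods such as differential evolution are often preferred.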