Divebomb
Divebomb is a Python package that uses pandas to divide a time series of depths into individual dives. Each dive is profiled as a Dive or a DeepDive, depending on the animal. The Dive class is used for animals that surface frequently, such as seals and whales. The DeepDive class is used for animals that surface infrequently, such as sharks.
The dive profiles are reduced to 8 dimensions using Principal Component Analysis. Gaussian Mixture Models are fitted on these variables, and the minimal Bayesian Information Criterion is used to determine the optimal number of clusters. The dives are then split into clusters using Agglomerative Hierarchical Clustering (from sklearn). Finally, the dives are displayed in IPython notebooks or saved to netCDF files organized by cluster.
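The clustering steps described above can be sketched with scikit-learn. This is a minimal illustration and not divebomb's internal code. The helper name sketch_cluster_dives, the standardization step, and the max_clusters search range are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def sketch_cluster_dives(features, pca_components=8, max_clusters=10, random_state=0):
    """Illustrative PCA -> GMM/BIC -> agglomerative pipeline (hypothetical helper)."""
    X = np.asarray(features, dtype=float)
    # Assumption: attributes are standardized first because they mix units
    X = StandardScaler().fit_transform(X)
    # PCA cannot keep more components than there are samples or features
    n_components = min(pca_components, X.shape[0], X.shape[1])
    reduced = PCA(n_components=n_components).fit_transform(X)

    # Fit one Gaussian mixture per candidate cluster count and keep the lowest BIC
    candidates = list(range(1, min(max_clusters, len(reduced)) + 1))
    bics = [
        GaussianMixture(n_components=k, random_state=random_state).fit(reduced).bic(reduced)
        for k in candidates
    ]
    best_k = candidates[int(np.argmin(bics))]

    # The BIC-selected cluster count drives the agglomerative clustering
    labels = AgglomerativeClustering(n_clusters=best_k).fit_predict(reduced)
    return best_k, labels


# Two synthetic groups of 4-attribute "dives"
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(8, 1, (30, 4))])
best_k, labels = sketch_cluster_dives(features, pca_components=2, max_clusters=5)
```

In this sketch, the Gaussian mixtures only choose how many clusters to use. The final assignment comes from the agglomerative step, which matches the order the text describes.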
Dive
A Dive is profiled with the following attributes:
- max_depth – the maximum depth in the dive
- dive_start – the timestamp of the first point in the dive
- dive_end – the timestamp of the last point in the dive
- bottom_start – the timestamp of the first point in the dive when the animal is at depth
- td_bottom_duration – a timedelta object containing the duration of the time the animal is at depth, in seconds
- td_descent_duration – a timedelta object containing the duration of the time the animal is descending, in seconds
- td_ascent_duration – a timedelta object containing the duration of the time the animal is ascending, in seconds
- td_surface_duration – a timedelta object containing the duration of the time the animal is at the surface, in seconds
- bottom_variance – the variance of the depth while the animal is at the bottom of the dive
- dive_variance – the variance of the depth for the entire dive
- descent_velocity – the average velocity of the descent
- ascent_velocity – the average velocity of the ascent
- peaks – the number of peaks found in the dive profile
- left_skew – a boolean of 1 or 0 indicating whether the dive is left skewed
- right_skew – a boolean of 1 or 0 indicating whether the dive is right skewed
- no_skew – a boolean of 1 or 0 indicating whether the dive is not skewed
DeepDive
A DeepDive is profiled with the following attributes:
- max_depth – the maximum depth in the dive
- min_depth – the minimum depth in the dive
- dive_start – the timestamp of the first point in the dive
- dive_end – the timestamp of the last point in the dive
- td_total_duration – a timedelta containing the duration of the dive, in seconds
- depth_variance – the variance of the depth for the entire dive
- average_vertical_velocity – the mean velocity of the animal over the entire dive, with a negative value indicating upward movement
- average_descent_velocity – the average velocity of any downward movement, as a positive value
- average_ascent_velocity – the average velocity of any upward movement, as a positive value
- number_of_descent_transitions – the number of times the animal descends any distance in the dive period
- number_of_ascent_transitions – the number of times the animal ascends any distance in the dive period
- total_descent_distance_traveled – the total absolute distance, in meters, that the animal moves down
- total_ascent_distance_traveled – the total absolute distance, in meters, that the animal moves up
- overall_change_in_depth – the difference between the minimum and maximum depth within the dive period
- td_time_at_depth – the duration, in seconds, that the animal spends in the deepest part of the vertical movement (the bottom 15% of the relative depth with the default at_depth_threshold)
- td_time_pre_depth – the duration, in seconds, before the deepest part of the vertical movement
- td_time_post_depth – the duration, in seconds, after the deepest part of the vertical movement
- peaks – the number of peaks found in the dive profile
- left_skew – a boolean of 1 or 0 indicating whether the dive is left skewed
- right_skew – a boolean of 1 or 0 indicating whether the dive is right skewed
- no_skew – a boolean of 1 or 0 indicating whether the dive is not skewed
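The movement attributes above can be illustrated using sample-to-sample depth changes. This sketch makes three assumptions: depth is positive and increases downward, every positive or negative step counts as one descent or ascent transition, and the helper name is hypothetical. Divebomb's own counting rules may differ.

```python
import numpy as np
import pandas as pd


def vertical_movement_summary(data):
    """Hypothetical helper computing DeepDive-style movement attributes for one dive."""
    depth = data["depth"].to_numpy(dtype=float)  # positive meters, increasing downward
    times = pd.to_datetime(data["time"])
    seconds = (times - times.iloc[0]).dt.total_seconds().to_numpy()

    step = np.diff(depth)  # positive = moved down, negative = moved up
    velocity = step / np.diff(seconds)
    down, up = step > 0, step < 0

    return {
        "overall_change_in_depth": depth.max() - depth.min(),
        "total_descent_distance_traveled": step[down].sum(),
        "total_ascent_distance_traveled": -step[up].sum(),
        # Assumption: every downward (upward) step counts as one transition
        "number_of_descent_transitions": int(down.sum()),
        "number_of_ascent_transitions": int(up.sum()),
        # Mean of the step velocities; a negative value means net upward movement
        "average_vertical_velocity": velocity.mean(),
        "average_descent_velocity": velocity[down].mean() if down.any() else 0.0,
        "average_ascent_velocity": -velocity[up].mean() if up.any() else 0.0,
    }


dive = pd.DataFrame({
    "time": pd.to_datetime("2017-07-01") + pd.to_timedelta(range(0, 360, 60), unit="s"),
    "depth": [100, 150, 200, 180, 220, 120],
})
summary = vertical_movement_summary(dive)
```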
Surface Thresholds
A surface threshold is used for surfacing animals to define the depth window that is considered to be at the surface. The surface_threshold argument defaults to 0 but can be changed in the profile_dives() function. For example, surface_threshold=2 might be passed for an animal that is about 2 meters long. surface_threshold is always passed in meters.
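As a minimal sketch of how such a window can be applied, the snippet below flags samples that fall inside the surface window. It assumes a sample counts as "at surface" when its depth is less than or equal to surface_threshold. Whether divebomb uses an inclusive or exclusive comparison is an assumption, and the helper name is hypothetical.

```python
import pandas as pd


def in_surface_window(depth, surface_threshold=0):
    """Hypothetical helper: True where a depth (positive meters) is inside the surface window."""
    # Inclusive comparison is an assumption made for this illustration
    return depth <= surface_threshold


depth = pd.Series([0.5, 1.8, 6.0, 12.0, 2.5, 0.0])  # positive meters
# With surface_threshold=2, the 0.5 m, 1.8 m, and 0.0 m readings count as surface
at_surface = in_surface_window(depth, surface_threshold=2)
```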
At Depth Thresholds
An at depth threshold is used in both the Dive and the DeepDive classes. The at_depth_threshold argument is a value between 0 and 1 that determines the window in which an animal is considered to be at the bottom of its dive. The default value is 0.15, which means the bottom 15% of the relative depth is considered to be at the bottom. at_depth_threshold is always a value between 0 and 1 expressing a fraction of the relative depth.
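A minimal sketch of the bottom window follows. It assumes relative depth runs from 0 m to the dive's maximum depth, so the window starts at (1 - at_depth_threshold) of the maximum. A DeepDive, which need not start at the surface, may measure it differently. The helper name is hypothetical.

```python
import numpy as np


def at_depth_mask(depth, at_depth_threshold=0.15):
    """Hypothetical helper: True where a sample is inside the bottom window of a dive."""
    depth = np.asarray(depth, dtype=float)
    # Assumption: the window is measured from 0 m to the maximum depth of the dive
    cutoff = depth.max() * (1 - at_depth_threshold)
    return depth >= cutoff


depths = [0, 10, 40, 90, 100, 95, 60, 5]
# With the default 0.15, the cutoff is 85 m, so only the 90, 100, and 95 m samples are at depth
mask = at_depth_mask(depths)
```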
Dive Detection
Two arguments help determine dives for any animal: dive_detection_sensitivity and minimal_time_between_dives. The dive_detection_sensitivity argument is a value between 0 and 1. The default is 0.98 for surfacing animals and 0.5 for non-surfacing animals. dive_detection_sensitivity sets the peak detection threshold that determines where dive starts can be detected. The lower the value, the deeper the threshold.
The minimal_time_between_dives argument is the minimum time (in seconds) that must pass before a new dive can start. The default value is 120 seconds.
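To illustrate how a minimum gap suppresses closely spaced dive starts, here is a deliberately simplified sketch. It is not divebomb's peak-based detection and does not model dive_detection_sensitivity. It treats a dive start as the first sample deeper than surface_threshold. It also assumes the gap is measured from the previous accepted start. The helper name is hypothetical.

```python
import pandas as pd


def sketch_dive_starts(data, surface_threshold=0, minimal_time_between_dives=120):
    """Simplified illustration: a dive starts where depth first leaves the surface window."""
    below = data["depth"] > surface_threshold
    previous = below.shift(1, fill_value=False).astype(bool)
    candidates = data.index[(below & ~previous).to_numpy()]

    starts, last_start = [], None
    for idx in candidates:
        t = data.at[idx, "time"]
        # Assumption: the gap is measured from the last accepted dive start
        if last_start is None or (t - last_start).total_seconds() >= minimal_time_between_dives:
            starts.append(int(idx))
            last_start = t
    return starts


data = pd.DataFrame({
    "time": pd.to_datetime("2017-07-01") + pd.to_timedelta(range(0, 360, 30), unit="s"),
    "depth": [0, 5, 20, 0, 10, 0, 0, 0, 0, 30, 40, 0],
})
starts = sketch_dive_starts(data, minimal_time_between_dives=120)
```

With a 120-second minimum, the short excursion at row 4 (90 seconds after the first start) is not treated as a new dive.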
Skews
A skew is defined as any difference, in either direction, between the descent and ascent times for the Dive class, or between the pre-depth and post-depth times for a DeepDive. This definition was chosen because researchers found that skew was most accurately represented when any difference between the values counted as skew.
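The rule "any difference counts" can be sketched as a comparison of the two phase durations. The helper name is hypothetical. Mapping a longer first phase to left_skew follows the statistical "long left tail" convention. This mapping is an assumption, not confirmed divebomb behaviour.

```python
def classify_skew(first_phase_seconds, last_phase_seconds):
    """Hypothetical helper returning the three skew flags.

    first phase: descent (Dive) or pre-depth time (DeepDive)
    last phase: ascent (Dive) or post-depth time (DeepDive)
    """
    flags = {"left_skew": 0, "right_skew": 0, "no_skew": 0}
    if first_phase_seconds == last_phase_seconds:
        flags["no_skew"] = 1
    # Assumption: a longer first phase leaves a long "left tail", as in statistical skew
    elif first_phase_seconds > last_phase_seconds:
        flags["left_skew"] = 1
    else:
        flags["right_skew"] = 1
    return flags


# Any difference counts, even a single second
skew = classify_skew(61, 60)
```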
Timestamps
The input timestamps are expected to be in a datetime format. The output timestamps are in seconds since 1970-01-01.
Every netCDF file stores the time unit as an attribute as a reminder. All dive attributes that start with td_ are durations in seconds. The time, dive_start, dive_end, and bottom_start values use the units mentioned above.
The netCDF4 library has a num2date function that converts the values back to datetime objects.
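The unit convention above can be reproduced with pandas alone. This sketch converts datetimes to seconds since 1970-01-01 and back. It uses pandas rather than netCDF4.num2date so that it runs without a netCDF file, and the timestamps are made-up sample values.

```python
import pandas as pd

# Input side: datetime timestamps, as they might be parsed from a CSV
times = pd.to_datetime(pd.Series(["2017-07-01 12:00:00", "2017-07-01 12:00:30"]))

# Output side: whole seconds since 1970-01-01, the unit described above
seconds = (times - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)

# Converting back to datetimes (netCDF4.num2date plays this role for values read from netCDF)
restored = pd.to_datetime(seconds, unit="s")
```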
Installation
Divebomb can be installed using pip or through a Conda environment.
Conda
conda config --add channels conda-forge
conda install divebomb
Pip
pip install divebomb
Code Examples
Divebomb
The example data set below is dive data from a grey seal over the course of a few days.
Example data set: Seal Dives
Dives
Pass a pandas DataFrame with a time column and a depth column (in positive meters) to the function. Provide the surface threshold using surface_threshold (in meters), and refine the other arguments as needed.
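from divebomb import profile_cluster_export
import pandas as pd
# Load a CSV with a time column and a depth column (positive meters)
data = pd.read_csv('/path/to/data.csv')
surface_threshold = 3  # in meters
# Profile, cluster, and export the dives into the 'results' folder
profile_cluster_export(data, folder='results', surface_threshold=surface_threshold, columns={'depth': 'depth', 'time': 'time'})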
DeepDives
To run the profile_cluster_export() function on an animal that does not surface between dives, such as a shark, set is_surfacing_animal=False. This makes the function use the DeepDive class instead. DeepDives do not depend on the animal surfacing again.
import pandas as pd
from divebomb import profile_cluster_export
df = pd.read_csv('/path/to/data.csv')
dives = profile_cluster_export(df, folder='results', is_surfacing_animal=False)
Changing Surface Threshold
A surface threshold is used for surfacing animals to define the depth window that is considered to be at the surface. The surface_threshold argument defaults to 0 but can be changed in the profile_cluster_export() function. For example, surface_threshold=2 might be passed for an animal that is about 2 meters long. surface_threshold is always passed in meters.
Example:
import pandas as pd
from divebomb import profile_cluster_export
data = pd.read_csv('data.csv')
surface_threshold = 3 # in meters
dives = profile_cluster_export(data, folder='results', surface_threshold=surface_threshold)
Changing At Depth Threshold
An at depth threshold is used in both the Dive and the DeepDive classes. The at_depth_threshold argument is a value between 0 and 1 that determines the window in which an animal is considered to be at the bottom of its dive. The default value is 0.15, which means the bottom 15% of the relative depth is considered to be at the bottom. at_depth_threshold is always a value between 0 and 1 expressing a fraction of the relative depth.
Example:
import pandas as pd
from divebomb import profile_cluster_export
data = pd.read_csv('data.csv')
at_depth_threshold = 0.2  # a value between 0 and 1
dives = profile_cluster_export(data, folder='results', at_depth_threshold=at_depth_threshold)
Changing Dive Detection Sensitivity
The dive_detection_sensitivity argument is a value between 0 and 1. The default is 0.98 for surfacing animals and 0.5 for non-surfacing animals. dive_detection_sensitivity sets the peak detection threshold that determines where dive starts can be detected. The lower the value, the deeper the threshold.
Example:
import pandas as pd
from divebomb import profile_cluster_export
data = pd.read_csv('data.csv')
dive_detection_sensitivity = 0.95
dives = profile_cluster_export(data, folder='results', dive_detection_sensitivity=dive_detection_sensitivity)
Changing Minimal Time Between Dives
The minimal_time_between_dives argument is the minimum time (in seconds) that must pass before a new dive can start. The default value is 120 seconds.
Example:
import pandas as pd
from divebomb import profile_cluster_export
data = pd.read_csv('data.csv')
minimal_time_between_dives = 600 # in seconds
dives = profile_cluster_export(data, folder='results', minimal_time_between_dives=minimal_time_between_dives)
Separating Out Components
Each component of profile_cluster_export() can be run separately, but each component's input may rely on the output of the previous one. Below is how to run each component separately, for example to modify the clustering or to export to CSVs.
Profile Dives
The profile_dives() function only profiles the dives. It finds the start points of the dives and then computes the dive attributes. profile_dives() takes the surface_threshold, dive_detection_sensitivity, at_depth_threshold, and is_surfacing_animal arguments, just like profile_cluster_export(). It returns three datasets: the profiled dives, any insufficient dives, and the original data.
from divebomb import profile_dives
import pandas as pd
data = pd.read_csv('/path/to/data.csv')
surface_threshold=3
# Profile dives and save the 3 outputs
dives, insufficient_dives, data = profile_dives(data, surface_threshold=surface_threshold)
profile_dives() also takes an argument to display the dives in a Jupyter Notebook. If ipython_display_mode=True, the dives are displayed with a slider to choose the dive.
from divebomb import profile_dives
import pandas as pd
data = pd.read_csv('/path/to/data.csv')
surface_threshold=3
profile_dives(data, surface_threshold=surface_threshold, ipython_display_mode=True)
Cluster Dives
The cluster_dives() function takes a DataFrame of profiled dives and clusters them according to the arguments passed. Through the function's arguments, you can adjust the number of clusters, the number of principal component analysis (PCA) components, and which attributes are used. cluster_dives() returns three datasets: the dives with their cluster numbers, the loadings matrix for the PCA, and the PCA matrix. Below are some examples.
from divebomb import profile_dives, cluster_dives
import pandas as pd
data = pd.read_csv('/path/to/data.csv')
surface_threshold=3
dives, insufficient_dives, data = profile_dives(data, surface_threshold=surface_threshold)
# Get the profiled dives from the profile_dives function above and
# assign the 3 datasets to variables
clustered_dives, loadings, pca_output_matrix = cluster_dives(dives)
Below is an example of overriding the number of clusters generated.
clustered_dives, loadings, pca_output_matrix = cluster_dives(dives, n_clusters=4)
Below is an example of overriding the dimensionality reduction in the PCA (the default is 8 components). pca_components must be less than or equal to the number of columns/attributes used for the clustering (dive_start, dive_end, surface_threshold, and insufficient_data do not count towards the number of columns/attributes).
clustered_dives, loadings, pca_output_matrix = cluster_dives(dives, pca_components=4)
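The column-count rule above can be checked before calling cluster_dives(). The helper below is hypothetical. It assumes that only the four columns named above are excluded, which may not match every column divebomb skips internally.

```python
import pandas as pd

# Columns that the text says do not count towards the clustering attributes
EXCLUDED_COLUMNS = {"dive_start", "dive_end", "surface_threshold", "insufficient_data"}


def max_pca_components(dives, attributes=None):
    """Hypothetical helper: the largest pca_components value the rule allows."""
    if attributes is not None:
        return len(attributes)
    return len([c for c in dives.columns if c not in EXCLUDED_COLUMNS])


dives = pd.DataFrame({
    "dive_start": [0, 100], "dive_end": [90, 210],
    "max_depth": [40.0, 55.0], "td_bottom_duration": [30, 45],
    "insufficient_data": [False, False],
})
limit = max_pca_components(dives)  # max_depth and td_bottom_duration remain, so 2
```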
Below is an example of selecting which attributes are used in the clustering. The code only clusters on td_ascent_duration, td_bottom_duration, td_descent_duration, and td_dive_duration. We choose pca_components=2 to reduce the dimensionality from 4 to 2.
clustered_dives, loadings, pca_output_matrix = cluster_dives(dives,
pca_components=2,
attributes=['td_ascent_duration',
'td_bottom_duration',
'td_descent_duration',
'td_dive_duration'])
Export Dives
Dives can be exported to either netCDF or CSV. Both profile_dives() and cluster_dives() need to be run, with their outputs assigned to variables, to get all of the datasets created in the process. export_to_netcdf() takes all of the datasets and saves them to a .nc file. It also saves a .nc file for each individual dive in folders sorted by cluster.
from divebomb import profile_dives, cluster_dives, export_to_csv, export_to_netcdf
import pandas as pd
data = pd.read_csv('/path/to/data.csv')
surface_threshold=3
dives, insufficient_dives, data = profile_dives(data, surface_threshold=surface_threshold)
# Get the profiled dives from the profile_dives function above
clustered_dives, loadings, pca_output_matrix = cluster_dives(dives)
# Export to netcdf
export_to_netcdf(folder = "nc_results",
data = data,
dives=clustered_dives,
loadings=loadings,
pca_output_matrix=pca_output_matrix,
insufficient_dives=insufficient_dives)
export_to_csv() takes the inputs and saves the clustered dives, loadings, PCA matrix, and insufficient dives to a folder as CSVs.
# Export to CSV (no individual dive files)
export_to_csv(folder = "csv_results",
dives=clustered_dives,
loadings=loadings,
pca_output_matrix=pca_output_matrix,
insufficient_dives=insufficient_dives)
All outputs are DataFrames and can be saved individually by calling .to_csv('filename.csv', index=False) on the variable. For example, the code below saves the profiled dives (no clustering) to a CSV.
from divebomb import profile_dives
import pandas as pd
data = pd.read_csv('/path/to/data.csv')
surface_threshold=3
# Profile dives and save the 3 outputs
dives, insufficient_dives, data = profile_dives(data, surface_threshold=surface_threshold)
dives.to_csv('profile_dives.csv', index=False)
Plotting Results
Divebomb includes two functions to plot dives. The first, plot_from_nc(), plots a single dive with its phases distinguished. plot_from_nc() includes a type argument that can be either dive or deepdive.
The second function, cluster_summary_plot(), plots the minimum, maximum, and mean depth for each cluster. Time is adjusted to be the number of seconds into the dive rather than a timestamp. Both axes can be individually scaled relative to the maximum values of the clusters. For example, time can be scaled to be a progress percentage through the dive. Scaling is applied by passing the following: scale={'depth': True, 'time': True}
Below are examples of how they can be applied.
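The time adjustment and scaling described above amount to simple arithmetic. The sketch below shows one plausible version in pandas. The helper names are hypothetical. It assumes scaling divides by a reference maximum (such as the longest dive in a cluster) and multiplies by 100, which may differ from cluster_summary_plot()'s internals.

```python
import pandas as pd


def seconds_into_dive(times):
    """Hypothetical helper: convert timestamps to seconds since the first sample."""
    times = pd.to_datetime(pd.Series(times))
    return (times - times.iloc[0]).dt.total_seconds()


def scale_to_percent(values, reference_max):
    """Express values as a percentage of a reference maximum."""
    return values / reference_max * 100


elapsed = seconds_into_dive(["2017-07-01 12:00:00", "2017-07-01 12:00:30",
                             "2017-07-01 12:01:00", "2017-07-01 12:02:00"])
progress = scale_to_percent(elapsed, reference_max=elapsed.max())
```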
Single Dive
Below is an example of a single dive from a surfacing animal.
from divebomb.plotting import plot_from_nc, cluster_summary_plot
path = '/path/to/results_folder'
cluster = 2
dive_id = 555
# Plot inside a notebook
plot_from_nc(path, cluster, dive_id, ipython_display=True)
# Plot out to an HTML file
plot_from_nc(path, cluster, dive_id, ipython_display=False, filename="dive.html")
Dive Clusters
Below is an example of the clusters from a surfacing animal.
from divebomb.plotting import cluster_summary_plot
path = '/path/to/results_folder'
# Plot inside a notebook
cluster_summary_plot(path, ipython_display=True)
# Plot out to an HTML file
cluster_summary_plot(path, ipython_display=False, filename="clusters.html", scale={'depth':False, 'time':True})
Single DeepDive
Below is an example of a dive from a non-surfacing animal. This example also uses a sparser dataset, with 10 minutes between data points.
from divebomb.plotting import plot_from_nc, cluster_summary_plot
path = '/path/to/results_folder'
cluster = 3
dive_id = 68
# Plot inside a notebook
plot_from_nc(path, cluster, dive_id, ipython_display=True, type='deepdive')
# Plot out to an HTML file
plot_from_nc(path, cluster, dive_id, ipython_display=False, filename='single_deepdive.html', type='deepdive')
Clustered DeepDives
Below is an example of the clusters from a non-surfacing animal. This example also uses a sparser dataset, with 10 minutes between data points.
from divebomb.plotting import cluster_summary_plot
path = '/path/to/results_folder'
# Plot inside a notebook
cluster_summary_plot(path, ipython_display=True)
# Plot out to an HTML file
cluster_summary_plot(path, ipython_display=False, filename='deepdive_clusters.html', title='DeepDive Clusters')
Correcting Depth on Surfacing Animals
Depth recordings can be uncalibrated or can drift over time. The following are two ways to use divebomb’s preprocessing module to correct for the offset on a surfacing animal. The data passed to the function must have time and depth (in positive meters) columns.
The first uses a local max:
from divebomb.preprocessing import correct_depth_offset
import pandas as pd
window = 3600 #seconds
data = pd.read_csv('/path/to/data.csv')
corrected_depth_data = correct_depth_offset(data, window=window, aux_file='results/aux_file.nc')
The second method uses a rolling average of all surface and near-surface values in the time window:
from divebomb.preprocessing import correct_depth_offset
import pandas as pd
window = 3600 # seconds
surface_threshold = 4 # meters
data = pd.read_csv('/path/to/data.csv')
corrected_depth_data = correct_depth_offset(data, window=window, method='mean', surface_threshold=surface_threshold, aux_file='results/aux_file.nc')
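To show roughly what the mean method does, here is a standalone pandas sketch. It is not correct_depth_offset(), and the helper name is hypothetical. It assumes a trailing (not centered) time window. It treats every reading no deeper than surface_threshold as a surface reading and subtracts their average as the offset.

```python
import pandas as pd


def sketch_mean_offset_correction(data, window=3600, surface_threshold=4):
    """Hypothetical helper: subtract a rolling mean of near-surface readings from depth."""
    df = data.copy()
    df["time"] = pd.to_datetime(df["time"])
    df = df.set_index("time").sort_index()
    # Keep only near-surface readings; deeper readings become NaN and are ignored
    near_surface = df["depth"].where(df["depth"] <= surface_threshold)
    # Trailing time-based window; the mean skips NaN values
    offset = near_surface.rolling(f"{window}s", min_periods=1).mean()
    # Carry the last known offset forward; use 0 before any surface reading is seen
    offset = offset.ffill().fillna(0)
    df["depth"] = df["depth"] - offset
    return df.reset_index()


data = pd.DataFrame({
    "time": pd.to_datetime("2017-07-01") + pd.to_timedelta([0, 600, 1200, 1800, 2400], unit="s"),
    "depth": [1.0, 30.0, 1.0, 50.0, 1.0],  # the sensor reads 1 m at the surface
})
corrected = sketch_mean_offset_correction(data, window=3600, surface_threshold=4)
```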
Divebomb Functions
The following are the primary functions divebomb provides to process the dives. The main function is profile_dives(), and the other functions (display_dive(), cluster_dives(), and export_dives()) are used as helper functions inside profile_dives().
divebomb.clean_dive_data(data, columns={'depth': 'depth', 'time': 'time'})
Parameters:
- data – a pandas DataFrame consisting of a time and a depth column
- columns – column renaming dictionary if needed
Returns: a pandas DataFrame with time in seconds since 1970-01-01 and depth
divebomb.cluster_dives(dives, pca_components=8, n_clusters=None, attributes=None)
This function uses sklearn to reduce the dimensionality with Principal Component Analysis, find the optimal number of clusters using Gaussian Mixture Models and the Bayesian Information Criterion, and then group the dive profiles using Agglomerative Clustering.
Parameters:
- dives – a pandas DataFrame of dive attributes
- pca_components – the number of components for dimensionality reduction. Should be fewer than the number of columns in the dataset.
- n_clusters – an override for the number of clusters to find when clustering
- attributes – a list of variables/columns to use during the process. This can be a subset of the columns in the data.
Returns: the clustered dives, the PCA loadings matrix, and the PCA output matrix
divebomb.display_dive(index, data, starts, type='dive', surface_threshold=0, at_depth_threshold=0.15)
This function takes the index, the data, and the starts and displays the dive using plotly. It is used as a helper method for viewing the dives if ipython_display_mode is True in profile_dives().
Parameters:
- index – the index of the dive profile to plot
- data – the DataFrame of the original dive data
- starts – the DataFrame of the dive starts
- type – a string that indicates using either the Dive or DeepDive class
- surface_threshold – the calculated surface threshold based on animal length
- at_depth_threshold – a value from 0 to 1 indicating the distance from the bottom of the dive at which the animal is considered to be at depth
Returns: a dive plot from plotly
divebomb.export_dives(dives, data, folder, is_surface_events=False)
This function exports each dive to its own netCDF file, grouped by cluster.
Parameters:
- dives – a pandas DataFrame of dive profiles to export
- data – a pandas DataFrame of the original dive data
- folder – a string indicating the parent folder for the files and subfolders
- is_surface_events – a boolean indicating whether the dive profiles are entirely surface events
divebomb.export_to_csv(folder, dives, loadings, pca_output_matrix, insufficient_dives=None)
Outputs the dive profiles, loadings, PCA matrix, and insufficient dives into the indicated folder as CSVs.
Parameters:
- folder – the path to export all files to; the folder will be overwritten
- dives – a pandas DataFrame of the dive profiles and clusters, usually generated from cluster_dives()
- loadings – a pandas DataFrame of the Principal Component Analysis loadings from cluster_dives()
- pca_output_matrix – a pandas DataFrame of the Principal Component Analysis results from cluster_dives()
- insufficient_dives – a pandas DataFrame of dives that could not be profiled, from profile_dives()
divebomb.export_to_netcdf(folder, data, dives, loadings, pca_output_matrix, insufficient_dives=None)
Outputs the dive profiles, loadings, PCA matrix, and insufficient dives into the indicated folder as netCDF files. Subfolders are also written for each cluster, with a separate file for each dive.
Parameters:
- folder – the path to export all files to; the folder will be overwritten
- data – a pandas DataFrame of the original dive data
- dives – a pandas DataFrame of the dive profiles and clusters, usually generated from cluster_dives()
- loadings – a pandas DataFrame of the Principal Component Analysis loadings from cluster_dives()
- pca_output_matrix – a pandas DataFrame of the Principal Component Analysis results from cluster_dives()
- insufficient_dives – a pandas DataFrame of dives that could not be profiled, from profile_dives()
divebomb.get_dive_starting_points(data, dive_detection_sensitivity, is_surfacing_animal=True, minimal_time_between_dives=120, surface_threshold=0, columns={'depth': 'depth', 'time': 'time'})
Parameters:
- data – a DataFrame needing a time and a depth column
- is_surfacing_animal – a boolean indicating whether the animal is guaranteed to surface between dives
- dive_detection_sensitivity – a value between 0 and 1 indicating the peak detection threshold; the lower the value, the deeper the threshold
- minimal_time_between_dives – the minimum time in seconds that needs to pass before there can be a new dive segment
- surface_threshold – the depth considered to be the surface for surfacing animals; the default is 0
- columns – column renaming dictionary if needed
divebomb.profile_cluster_export(data, folder=None, columns={'depth': 'depth', 'time': 'time'}, is_surfacing_animal=True, dive_detection_sensitivity=None, minimal_time_between_dives=120, surface_threshold=0, at_depth_threshold=0.15)
Calls profile_dives, cluster_dives, and export_to_netcdf.
Parameters:
- data – a DataFrame needing a time and a depth column
- folder – a parent folder to write out to
- columns – column renaming dictionary if needed
- is_surfacing_animal – a boolean indicating whether the animal is guaranteed to surface between dives
- dive_detection_sensitivity – a value between 0 and 1 indicating the peak detection threshold; the lower the value, the deeper the threshold
- minimal_time_between_dives – the minimum time in seconds that needs to pass before there can be a new dive segment
- surface_threshold – the depth considered to be the surface for surfacing animals; the default is 0
Returns: two DataFrames, for the dive profiles and the original data
divebomb.profile_dives(data, columns={'depth': 'depth', 'time': 'time'}, is_surfacing_animal=True, dive_detection_sensitivity=None, minimal_time_between_dives=120, surface_threshold=0, ipython_display_mode=False, at_depth_threshold=0.15)
Calls the other functions to split and profile each dive. This function uses the divebomb.Dive or divebomb.DeepDive class to profile the dives.
Parameters:
- data – a DataFrame needing a time and a depth column
- columns – column renaming dictionary if needed
- is_surfacing_animal – a boolean indicating whether the animal is guaranteed to surface between dives
- dive_detection_sensitivity – a value between 0 and 1 indicating the peak detection threshold; the lower the value, the deeper the threshold
- minimal_time_between_dives – the minimum time in seconds that needs to pass before there can be a new dive segment
- surface_threshold – the depth considered to be the surface for surfacing animals; the default is 0
- ipython_display_mode – whether or not to display the dives
Returns: three DataFrames, for the dive profiles, the insufficient dives, and the original data
Dive Class
The Dive class is used to encapsulate all the attributes of a dive and the data needed to reconstruct a plot of the dive. at_depth_threshold defaults to 0.15, which means anything deeper than 85% of the depth of the dive is considered to be at depth.
surface_threshold is used to determine how shallow the animal must be before it is considered to be at the surface. It defaults to 0 but can be adjusted if the animal is large or if you want a larger depth window for surface behaviours. surface_threshold should be passed in meters.
class Dive.Dive(data, columns={'depth': 'depth', 'time': 'time'}, surface_threshold=0, at_depth_threshold=0.15)
Variables:
- max_depth – the maximum depth in the dive
- dive_start – the timestamp of the first point in the dive
- dive_end – the timestamp of the last point in the dive
- bottom_start – the timestamp of the first point in the dive when the animal is at depth
- td_bottom_duration – a timedelta object containing the duration of the time the animal is at depth in seconds
- td_descent_duration – a timedelta object containing the duration of the time the animal is descending in seconds
- td_ascent_duration – a timedelta object containing the duration of the time the animal is ascending in seconds
- td_surface_duration – a timedelta object containing the duration of the time the animal is at the surface in seconds
- bottom_variance – the variance of the depth while the animal is at the bottom of the dive
- dive_variance – the variance of the depth for the entire dive.
- descent_velocity – the average velocity of the descent
- ascent_velocity – the average velocity of the ascent
- peaks – the number of peaks found in the dive profile
- left_skew – a boolean of 1 or 0 indicating if the dive is left skewed
- right_skew – a boolean of 1 or 0 indicating if the dive is right skewed
- no_skew – a boolean of 1 or 0 indicating if the dive is not skewed
- insufficient_data – a boolean indicating whether or not the profile could be completed
get_ascent_duration(at_depth_threshold=0.15)
This function also sets the bottom duration.
Parameters: at_depth_threshold – a value from 0 to 1 indicating the distance from the bottom of the dive at which the animal is considered to be at depth
Returns: the ascent duration in seconds
get_ascent_velocity()
Returns: the ascent velocity in m/s
get_descent_duration(at_depth_threshold=0.15)
Parameters: at_depth_threshold – a value from 0 to 1 indicating the distance from the bottom of the dive at which the animal is considered to be at depth
Returns: the descent duration in seconds
get_descent_velocity()
Returns: the descent velocity in m/s
get_peaks(surface_threshold=0)
Returns: the number of peaks found within a dive
get_surface_duration()
Returns: the surface duration in seconds
plot()
Returns: a plotly graph showing the phases of the dive
set_bottom_variance()
This function also sets the total dive variance.
Returns: the standard variance in depth during the bottom portion of the dive in meters
set_dive_variance()
Returns: the standard variance in depth during the dive in meters
set_skew()
Sets the object's skew as left, right, or no skew.
to_dict()
Returns: a dictionary of the dive profile
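Several of the Dive attributes can be approximated from a single dive's samples. The sketch below is hypothetical (profile_dive_sketch is not part of divebomb) and rests on assumptions the documentation does not state: times are seconds since the start of the dive, depths are positive downward in meters, the bottom phase runs from the first to the last sample within at_depth_threshold of the maximum depth, velocities are net depth change divided by phase duration, variances are population variances (statistics.pvariance), and a peak is any sample deeper than both of its neighbours. divebomb's own definitions may differ.

```python
from statistics import pvariance
from typing import Dict, List


def profile_dive_sketch(
    times_s: List[float],
    depths: List[float],
    at_depth_threshold: float = 0.15,
    surface_threshold: float = 0.0,
) -> Dict[str, float]:
    """Approximate a few Dive attributes for one dive (depth positive down, m)."""
    max_depth = max(depths)
    cutoff = max_depth * (1 - at_depth_threshold)
    at_depth = [i for i, d in enumerate(depths) if d >= cutoff]
    first, last = at_depth[0], at_depth[-1]

    # Phase durations in seconds, split at the first and last at-depth samples.
    descent_s = times_s[first] - times_s[0]
    bottom_s = times_s[last] - times_s[first]
    ascent_s = times_s[-1] - times_s[last]

    # Peaks: samples deeper than both neighbours and below the surface band.
    peaks = sum(
        1
        for prev, cur, nxt in zip(depths, depths[1:], depths[2:])
        if cur > surface_threshold and cur > prev and cur > nxt
    )

    return {
        "max_depth": max_depth,
        "descent_duration": descent_s,
        "bottom_duration": bottom_s,
        "ascent_duration": ascent_s,
        # Net depth change divided by phase duration, in m/s.
        "descent_velocity": (depths[first] - depths[0]) / descent_s if descent_s else 0.0,
        "ascent_velocity": (depths[last] - depths[-1]) / ascent_s if ascent_s else 0.0,
        "bottom_variance": pvariance(depths[first:last + 1]),
        "dive_variance": pvariance(depths),
        "peaks": peaks,
    }
```

The naive neighbour comparison undercounts flat-topped peaks and overcounts noise; a library routine such as scipy.signal.find_peaks, with a prominence setting, is a more robust choice for real tag data.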
DeepDive Class
The DeepDive class is used to encapsulate all the attributes of a dive from a non-surfacing animal and the data needed to reconstruct a plot of the dive. The at_depth_threshold defaults to 0.15, which means anything deeper than 85% of the relative depth of the dive is considered to be at depth.
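Because a non-surfacing animal never returns to zero depth, the at-depth cutoff has to be measured within the dive's own depth range. A minimal sketch, assuming "relative depth" means the span between the dive's minimum and maximum depth; time_around_depth is a hypothetical helper that also derives the pre-depth, at-depth and post-depth durations from that cutoff.

```python
from typing import Dict, List


def time_around_depth(
    times_s: List[float],
    depths: List[float],
    at_depth_threshold: float = 0.15,
) -> Dict[str, float]:
    """Seconds before, at, and after depth for a non-surfacing animal."""
    lo, hi = min(depths), max(depths)
    # Relative threshold: measured within this dive's depth range,
    # not from the surface.
    cutoff = lo + (1 - at_depth_threshold) * (hi - lo)
    at_depth = [t for t, d in zip(times_s, depths) if d >= cutoff]
    return {
        "cutoff": cutoff,
        "time_pre_depth": at_depth[0] - times_s[0],
        "time_at_depth": at_depth[-1] - at_depth[0],
        "time_post_depth": times_s[-1] - at_depth[-1],
    }
```

For depths between 100 m and 195 m the relative cutoff is 100 + 0.85 × 95 = 180.75 m, whereas measuring 85% of the maximum depth from the surface would give 0.85 × 195 = 165.75 m.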
class DeepDive.DeepDive(data, columns={'depth': 'depth', 'time': 'time'}, at_depth_threshold=0.15)
Variables:
- max_depth – the max depth in the dive
- min_depth – the min depth in the dive
- dive_start – the timestamp of the first point in the dive
- dive_end – the timestamp of the last point in the dive
- td_total_duration – a timedelta (in seconds since 1970-01-01) containing the duration of the dive
- depth_variance – the variance of the depth for the entire dive.
- average_vertical_velocity – the mean velocity of the animal over the entire dive with negative value indicating upward movement
- average_descent_velocity – the average velocity of any downward movement as positive value
- average_ascent_velocity – the average velocity of any upward movement as positive value
- number_of_descent_transitions – the number of times the animal descends any distance in the dive period
- number_of_ascent_transitions – the number of times the animal ascends any distance in the dive period
- total_descent_distance_traveled – the total absolute distance in meters that the animal moves down
- total_ascent_distance_traveled – the total absolute distance in meters that the animal moves up
- overall_change_in_depth – the difference between the minimum and maximum depth within the dive period
- td_time_at_depth – the duration in seconds that the animal spends in the deepest part of the vertical movement (< 85% depth)
- td_time_pre_depth – the duration in seconds before the deepest part of the vertical movement (< 85% depth)
- td_time_post_depth – the duration in seconds after the deepest part of the vertical movement (< 85% depth)
- peaks – the number of peaks found in the dive profile
- left_skew – a boolean of 1 or 0 indicating if the dive is left skewed
- right_skew – a boolean of 1 or 0 indicating if the dive is right skewed
- no_skew – a boolean of 1 or 0 indicating if the dive is not skewed
get_ascent_vertical_distance()
Returns: the total vertical distance travelled upwards in meters
get_average_ascent_velocity()
Returns: the average upwards velocity in m/s
get_average_descent_velocity()
Returns: the average downwards velocity in m/s
get_descent_vertical_distance()
Returns: the total vertical distance travelled downwards in meters
get_peaks()
Returns: the number of peaks found within a dive
get_time_at_depth(at_depth_threshold=0.15)
Parameters: at_depth_threshold – a value from 0 to 1 indicating the distance from the bottom of the dive at which the animal is considered to be at depth
Returns: the duration at depth in seconds
get_time_post_depth(at_depth_threshold=0.15)
Parameters: at_depth_threshold – a value from 0 to 1 indicating the distance from the bottom of the dive at which the animal is considered to be at depth
Returns: the duration after depth in seconds
get_time_pre_depth(at_depth_threshold=0.15)
Parameters: at_depth_threshold – a value from 0 to 1 indicating the distance from the bottom of the dive at which the animal is considered to be at depth
Returns: the duration before depth in seconds
plot()
Returns: a plotly graph showing the phases of the dive
set_skew()
Sets the object's skew as left, right, or no skew.
to_dict()
Returns: a dictionary of the dive profile
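The DeepDive distance and transition attributes can all be derived from consecutive depth differences. A minimal sketch, assuming depth is positive downward in meters and reading a "transition" as the start of each separate descending or ascending stretch, with zero-change steps not ending a stretch. vertical_movement_summary is hypothetical, and divebomb's counting rule may differ.

```python
from typing import Dict, List


def vertical_movement_summary(depths: List[float]) -> Dict[str, float]:
    """Summarise up/down movement from consecutive depth differences."""
    descent = ascent = 0.0
    descent_runs = ascent_runs = 0
    direction = 0  # +1 while descending, -1 while ascending, 0 before any movement
    for prev, cur in zip(depths, depths[1:]):
        step = cur - prev
        if step > 0:
            descent += step
            if direction != 1:
                descent_runs += 1  # a new descending stretch begins
            direction = 1
        elif step < 0:
            ascent -= step  # store upward distance as a positive value
            if direction != -1:
                ascent_runs += 1  # a new ascending stretch begins
            direction = -1
    return {
        "total_descent_distance_traveled": descent,
        "total_ascent_distance_traveled": ascent,
        "number_of_descent_transitions": descent_runs,
        "number_of_ascent_transitions": ascent_runs,
        "overall_change_in_depth": max(depths) - min(depths),
    }
```

Counting stretches rather than individual downward samples keeps the transition counts insensitive to the sampling rate of the tag.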
Preprocessing Functions
The preprocessing module is used to help correct dive drift and offsets. The offset is calculated using a rolling time window, similar to what is explained here.
There are two methods for the main function, correct_depth_offset():
- max: zeros the local maximum and uses the difference as the offset for the rest
- mean: uses the time window and a maximum depth to look for the average offset within the window
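The two methods can be sketched as follows. This is a simplified, hypothetical illustration (correct_offsets_sketch is not divebomb's code): it uses fixed, non-overlapping windows instead of a true rolling window, it reads "local maximum" as the shallowest reading in the window, and it reads the mean method as averaging readings no deeper than surface_threshold. The defaults window=3600, method='max' and surface_threshold=4 mirror the documented defaults of correct_depth_offset.

```python
from typing import Dict, List


def correct_offsets_sketch(
    times_s: List[float],
    depths: List[float],
    window: float = 3600,
    method: str = "max",
    surface_threshold: float = 4,
) -> List[float]:
    """Subtract a per-window surface offset from every depth reading.

    Simplification: fixed, non-overlapping windows of `window` seconds are
    used instead of a true rolling window.
    """
    # Group sample indices by the window they fall into.
    windows: Dict[int, List[int]] = {}
    for i, t in enumerate(times_s):
        windows.setdefault(int((t - times_s[0]) // window), []).append(i)

    corrected = list(depths)
    for idxs in windows.values():
        values = [depths[i] for i in idxs]
        if method == "max":
            # Treat the shallowest reading (highest point) as the true surface.
            offset = min(values)
        elif method == "mean":
            # Average only readings shallow enough to plausibly be at the surface.
            near_surface = [v for v in values if v <= surface_threshold]
            offset = sum(near_surface) / len(near_surface) if near_surface else 0.0
        else:
            raise ValueError("method must be 'max' or 'mean'")
        for i in idxs:
            corrected[i] = depths[i] - offset
    return corrected
```

The max method pins the shallowest point of each window to exactly 0 m, while the mean method can leave individual surface readings slightly negative, which is less sensitive to a single spurious shallow sample.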
divebomb.preprocessing.calculate_window_mean(window, surface_threshold, df)
Parameters:
- window – an int to determine the size for a rolling median
- surface_threshold – the maximum depth that will be considered for the offset
- df – Pandas DataFrame of the dive data
Returns: An average offset in meters using the defined window
divebomb.preprocessing.correct_depth_offset(data, window=3600, columns={'depth': 'depth', 'time': 'time'}, aux_file='corrected_depth_auxillary_data.nc', method='max', surface_threshold=4)
Parameters:
- data – The dataset consisting of a time and a depth column
- window – time window (in seconds) to use in the calculation
- aux_file – A netCDF file to write all of the calculated offsets and window size to
- columns – column renaming dictionary if needed
- method – either ‘max’ or ‘mean’ declaring the calculation method, default is max
- surface_threshold – the maximum depth (in meters) to consider when using the mean method
Returns: A DataFrame with a corrected depth
divebomb.preprocessing.zlib_encoding(ds)
This is a helper function for xarray to compress all variables going to netCDF.
Parameters: ds – an xarray Dataset
Returns: A dictionary indicating zlib compression for all variables
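xarray's Dataset.to_netcdf accepts an encoding argument that maps each variable name to its settings, and {'zlib': True} switches on compression for that variable. A minimal sketch of a helper in the spirit of zlib_encoding, written against a plain iterable of variable names so it runs without xarray; the name zlib_encoding_sketch is hypothetical.

```python
from typing import Dict, Iterable


def zlib_encoding_sketch(variable_names: Iterable[str]) -> Dict[str, Dict[str, bool]]:
    """Map every variable name to {'zlib': True} for netCDF compression."""
    return {name: {"zlib": True} for name in variable_names}
```

With xarray installed, iterating ds.variables yields every variable name (coordinates included), so the result can be passed on as ds.to_netcdf(path, encoding=zlib_encoding_sketch(ds.variables)).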
Plotting Functions
The plotting module can be used to plot the dives from the output netCDF files. plot_from_nc() will plot a single dive separated into its phases, and cluster_summary_plot() will give the minimum, maximum, and average depth at each time (in seconds) into the dive for each cluster.
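The aggregation behind a per-cluster summary of this kind can be sketched without the netCDF reading or the plotly drawing. The sketch assumes each dive is already available as a list of (seconds since dive start, depth) pairs grouped by cluster label; summarise_clusters and the DiveSamples alias are hypothetical names, not part of divebomb.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DiveSamples = List[Tuple[int, float]]  # (seconds since dive start, depth in m)


def summarise_clusters(
    clusters: Dict[int, List[DiveSamples]],
) -> Dict[int, Dict[int, Tuple[float, float, float]]]:
    """Return {cluster: {second: (min_depth, max_depth, mean_depth)}}."""
    summary = {}
    for cluster, dives in clusters.items():
        # Collect every depth observed at each elapsed second across the cluster.
        by_time: Dict[int, List[float]] = defaultdict(list)
        for dive in dives:
            for second, depth in dive:
                by_time[second].append(depth)
        summary[cluster] = {
            second: (min(ds), max(ds), sum(ds) / len(ds))
            for second, ds in sorted(by_time.items())
        }
    return summary
```

Because dives in a cluster have different lengths, later seconds are summarised from fewer dives; the min and max envelopes then narrow to the few long dives that remain.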
divebomb.plotting.cluster_summary_plot(folder, ipython_display=True, filename='index.html', title='Dive Cluster Summary', scale={'depth': False, 'time': False})
Parameters:
- folder – the path to the results folder containing the cluster folders
- ipython_display – a boolean indicating whether or not to show the plot in a notebook
- filename – the filename to save the plot to if it is not shown in a notebook
- title – the displayed title of the plot
Returns: a plotly graph summary of all of the dive clusters
divebomb.plotting.plot_deepdive_from_nc(folder, cluster, dive_id, ipython_display=True, filename='index.html', at_depth_threshold=0.15)
Parameters:
- folder – the path to the results folder containing the cluster folders
- cluster – the number of the cluster of the dive
- dive_id – the number of the dive
- ipython_display – a boolean indicating whether or not to show the dive in a notebook
- filename – the filename to save the dive to if it is not shown in a notebook
- at_depth_threshold – a value from 0 to 1 indicating the distance from the bottom of the dive at which the animal is considered to be at depth
Returns: a plotly line chart of the dive
divebomb.plotting.plot_dive_from_nc(folder, cluster, dive_id, ipython_display=True, filename='index.html', at_depth_threshold=0.15, title='Clusters')
Parameters:
- folder – the path to the results folder containing the cluster folders
- cluster – the number of the cluster of the dive
- dive_id – the number of the dive
- ipython_display – a boolean indicating whether or not to show the dive in a notebook
- filename – the filename to save the dive to if it is not shown in a notebook
- at_depth_threshold – a value from 0 to 1 indicating the distance from the bottom of the dive at which the animal is considered to be at depth
- title – string title of plot
Returns: a plotly line chart of the dive
divebomb.plotting.plot_from_nc(folder, cluster, dive_id, ipython_display=True, type='dive', filename='index.html', at_depth_threshold=0.15)
Parameters:
- folder – the path to the results folder containing the cluster folders
- cluster – the number of the cluster of the dive
- dive_id – the number of the dive
- type – a string, either dive or deepdive
- ipython_display – a boolean indicating whether or not to show the dive in a notebook
- filename – the filename to save the dive to if it is not shown in a notebook
- at_depth_threshold – a value from 0 to 1 indicating the distance from the bottom of the dive at which the animal is considered to be at depth
Returns: a plotly line chart of the dive