tunacell: Time-lapse UNiCELLular Analyzer

Welcome to tunacell’s documentation. It is divided into four sections: introduction, user manual, advanced user manual, and API documentation generated from docstrings.

Introduction to tunacell

tunacell is a computational tool to study the dynamic properties of growing and dividing cells. It uses raw data extracted from time-lapse microscopy and provides the user with a set of functions to explore and visualize data, and to compute the statistics of the dynamics.

It has been designed with the purpose of studying growth and gene expression from fluorescent reporters in E. coli, but should be applicable to any growing and dividing micro-organism.

tunacell operates downstream of image analysis: segmentation and tracking must have been performed on images first. Many tools are available for this first step (see Segmentation and tracking tools for a short list), and we will try in the future to streamline the pipeline from the segmentation output of some of these tools to tunacell’s input.

What it does

tunacell reads segmented/tracked output and is able to reconstruct lineages and colony structures from which it can directly

  1. plot the trajectories of user-defined observables for small samples, to gain qualitative intuition about the dynamic processes;
  2. perform statistical analysis of the dynamics, to gain some quantitative insights about these processes.

It also provides a Python API so that users can tailor their own specific analyses.

How it works

tunacell is not as smart as you are, but with the appropriate parameters it will compute information faster than you can.

In order to perform any analysis, the user has to define the following quantities:

  • the particular observable to look at, whether it is a raw time-lapse value (e.g. length, total cell fluorescence), a quantity differentiated from it (e.g. growth rate, production rate), or a quantity defined over the cell cycle (e.g. birth length, cell-cycle increase in fluorescence);
  • the set of filters that will define the ensemble over which statistics are computed;
  • the set of conditions that may define subgroups of samples over which comparison of results is relevant.

These steps are described in Setting up your analysis.

After these quantities are defined, the highest-level API functions are designed to perform the pre-defined statistical analyses.

Lower-level API-functions may be used by experienced users to tailor new, specific analyses.

Why should I use tunacell?

Because, as a Python enthusiast, how cool will it be to tell your colleagues at the next conference you attend: “I use both Python and tuna to analyze data about how bacteria struggle in life” [1].

One of the novelties of tunacell is to provide a powerful tool to perform conditional analysis of the dynamics. By conditional, we mean performing statistical computations over user-defined subgroups of the original sample ensemble.

A first set of functions to build these subgroups is already defined. Although not exhaustive, this pre-defined set of subgrouping functions already covers a wide range of possibilities.

Import/export functions have been implemented to save analyses along with their parameters, allowing the user to keep a structured track of what has been done and making it easier to collaborate on the analysis step.

Finally, experienced users will find it useful to be able to extend tunacell’s framework, by designing new filtering functions, or by implementing statistical analyses tailored to their particular project.

Where to start then?

We encourage readers to start with the 10 minute tutorial that will present the features described above on a simple, numerically generated dataset.

Then plug in your data, check the documentation, and discover how cool micro-organisms are (from a dynamical point of view).

Contribute

If you find any bug, feel free to report it.

We also welcome contributions that point directly to solutions for bugs, or that implement other functions useful for analysing the dynamics of growing micro-organisms.

Segmentation and tracking tools

Before using tunacell you need segmented and tracked images from your time-lapse movie. Many different software tools exist, depending on your experimental setup. Some examples are listed (non-exhaustively):

  • SuperSegger: a Matlab-based software with a GUI. It uses machine-learning principles to detect false divisions and is adapted to micro-colony growth in agar-pad-like setups. It segments brightfield images (fluorescence images can be inverted).
  • oufti is also a Matlab-based software, following the previous microbetracker software developed by the same group.
  • moma is a Java-based software particularly adapted to mother-machine-like setups (see also their paper).
  • ieee_seg is another software adapted to mother machine-like setups.

Footnotes

[1]It is highly recommended to double check the conference topic beforehand.

Install

The easiest way to install tunacell is from wheels:

pip install tunacell

However, some introductory tutorials and scripts are not shipped with the library. To get them, visit the GitHub repository:

https://github.com/LeBarbouze/tunacell

where you can copy/paste these scripts (look into the scripts folder).

To get everything, a good solution is to fork the repository to your own account and/or clone it on your computer. Change directory into your local repo and perform a local install:

pip install -e .

(the -e option stands for editable). With such a clone install, the scripts are in the same place, and you can use the Makefile to run tutorials/demos.

Local install

If Python is installed system-wide, you may have to sudo the command above. When that is not possible, you may pass the option to install in the user directory:

pip install --user -e .

Virtual environment

A better solution is to create a virtual environment where you plan to work with tunacell. It requires pip and virtualenv to be installed on your machine. Then the Makefile does the job; run the command:

make virtualenv

that will set up the virtual environment and install pytest and flake8 locally. Activate the virtual environment with:

source venv/bin/activate

Then you can run pip install -e ., or the make pipinstall command, without worrying about permissions, since everything will be installed locally and will be accessible only when your virtual environment is active. When you finish working with tunacell, type:

deactivate

and that’s it.

Dependencies

tunacell depends on a few libraries that are automatically installed if you are using pip.

NumPy, SciPy, and matplotlib are classic libraries; pandas is used to provide the user with DataFrame objects for some statistical analyses.

The tree-like structure arising from dividing cells has been implemented using the treelib library.

We use PyYAML to parse YAML files such as metadata or other library-created files, the tqdm package for progress bars, and tabulate for fancy tabular printing.

New to Python

Python is a computer programming language and to use it, you need a Python interpreter. To check whether you have the Python interpreter installed on your system, run the following command in a terminal:

python -V

If the answer shows something like Python 2.7.x, or Python 3.6.y, you’re good to go. Otherwise you should install it, either directly downloading Python, or using a friendlier package that will guide you, such as anaconda.

After that you should be ready, and pip should be automatically installed. Again try:

pip -V

If it is not installed, check pip’s installation instructions.

Then get back to install instructions above.

10 minute tutorial

This tutorial is an introduction to tunacell’s features. A companion script is available in the GitHub repository at scripts/tutorial.py. If you cloned the repo, you can use the Makefile recipe make tuto.

You do not need to plug in your data yet, as we will use numerically simulated data.

Generating data

To generate data, you can use the tunasimu command from a terminal:

$ tunasimu -s 42

where the seed option is set to generate data identical to what is shown on this page. The terminal output should resemble:

Path: /home/joachim/tmptunacell
Label: simutest
simulation: 100%|██████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 113.94it/s]

The progress bar indicates the time it takes to run the numerical simulations. A new folder tmptunacell is created in your home directory; it contains a fresh new experiment called simutest, composed of numerically generated data, plus everything needed for tunacell’s reader functions to work properly. The name simutest is set by default; it can be changed with the -l option, though keep in mind that other scripts read simutest by default.

Note

If you are not familiar with the term “numerical simulations”, it means generating “fake” data that look like what would be observed in actual experiments, but are constructed with controlled, specific assumptions. Our assumptions are described in Numerical simulations in tunacell.

In a nutshell, we generate fake, controlled data about cells that grow and divide.

By typing:

$ cd
$ cd tmptunacell
$ ls
simutest

you should see this new subfolder. If you cd in this subfolder you will see:

$ cd simutest
$ ls
containers descriptor.csv metadata.yml

The containers folder contains data, the descriptor.csv describes the columns in data files, and finally the metadata.yml file associates metadata to current experiment.

We’ll discuss more details about the folder/files organization in Input file format.

Loading data

tunacell is able to read one time-lapse experiment at a time.

Let’s open a Python session (or better, IPython) and open our recently simulated experiment. We are still in the tmptunacell folder when we start our session. To load the experiment simutest in tunacell, type in:

>>> from tunacell import Experiment
>>> exp = Experiment('simutest')

(note that if your Python console is launched elsewhere, you should rather provide the path to the simutest folder, e.g. /home/joachim/tmptunacell/simutest)

Ok, we’re diving in. Let’s inspect this object:

>>> exp
Experiment root: /home/joachim/tmptunacell/simutest
Containers:
        container_001
        container_002
        container_003
        container_004
        container_005
        ...
        (100 containers)
birth_size_params:
  birth_size_mean: 1.0
  birth_size_mode: fixed
  birth_size_sd_to_mean: 0.1
date: '2018-01-27'
division_params:
  div_lambda: 0
  div_mode: gamma
  div_sd_to_mean: 0.1
  div_size_target: 2.0
  use_growth_rate: parameter
label: simutest
level: top
ornstein_uhlenbeck_params:
  noise: 8.897278035522248e-08
  spring: 0.03333333333333333
  target: 0.011552453009332421
period: 5.0
simu_params:
  nbr_colony_per_container: 2
  nbr_container: 100
  period: 5.0
  seed: 42
  start: 0.0
  stop: 180.0

The exp object shows up:

  • the absolute path to the folder corresponding to our experiment;
  • the list of container files;
  • the experiment metadata, which summarizes the content of the metadata.yml file. When data is numerically simulated, parameters from the simulation are automatically exported in the metadata file.

To visualize something we need to select/add some samples.

Selecting small samples

To work with hand-picked, or randomly selected samples, we use:

>>> from tunacell import Parser
>>> parser = Parser(exp)

Let’s add a couple of samples: a first one that we know exists if you used default settings (in particular the seed parameter), and a random sample that will likely differ from the one printed below:

>>> parser.add_sample(('container_079', 4))  # this one works out on default settings
>>> parser.add_sample(1)  # add 1 random sample; yours will likely differ
  index  container        cell
-------  -------------  ------
      0  container_079       4  # you should see this sample
      1  container_019      21  # this one may differ

The particular container label and cell number you got on your screen are unlikely to be the same as the ones shown above. The container label indicates which container file has been opened, and the cell identifier indicates which cell has been randomly selected in this container file. The first entry is associated with index 0, the starting index in Python.

Inspecting small samples

Cell

Type in:

>>> cell = parser.get_cell(0)
>>> print(cell)
4;p:2;ch:-

cell is the Cell instance associated with our cell sample. The print call shows three fields separated by semicolons. The first field is the cell’s identifier; the second indicates the parent cell’s identifier if it exists (otherwise a - minus sign); the third indicates the offspring identifiers (again, a - minus sign means this cell has no descendants).
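This summary format can be decoded with a few lines of Python (a hypothetical helper, not part of tunacell’s API; we assume multiple descendants would be comma-separated):

```python
def parse_cell_summary(summary):
    """Decode the 'id;p:<parent>;ch:<children>' print format of a Cell.

    A '-' marks a missing parent or an absence of descendants.
    """
    ident, parent, children = summary.split(';')
    parent = parent[len('p:'):]
    children = children[len('ch:'):]
    return {
        'id': ident,
        'parent': None if parent == '-' else parent,
        'children': [] if children == '-' else children.split(','),
    }

print(parse_cell_summary('4;p:2;ch:-'))
```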

Raw data is stored under the .data attribute:

>>> print(cell.data)
[(130., 0.01094928, 0.08573655, 1.08951925, 4, 2)
 (135., 0.01019836, 0.13886413, 1.14896798, 4, 2)
 (140., 0.01016952, 0.18872295, 1.20770631, 4, 2)
 (145., 0.00969319, 0.23829265, 1.26908054, 4, 2)
 (150., 0.01036414, 0.28900263, 1.33509524, 4, 2)
 (155., 0.01128818, 0.34241417, 1.40834348, 4, 2)
 (160., 0.01110128, 0.39900286, 1.49033787, 4, 2)
 (165., 0.01161387, 0.45529213, 1.57663389, 4, 2)
 (170., 0.0111819 , 0.5127363 , 1.66985418, 4, 2)
 (175., 0.0117796 , 0.56991253, 1.76811239, 4, 2)]

This is a structured array with column names:

>>> print(cell.data.dtype.names)
('time', 'ou', 'ou_int', 'exp_ou_int', 'cellID', 'parentID')

We can spot three recognizable names: time, cellID, and parentID. They give the acquisition time of the current row frame, the cell identifier, and its parent cell identifier (0 is reserved to mean ‘no parent cell’). The other column names are a bit cryptic because they come from numerical simulations (see Numerical simulations in tunacell for more information). What we need to know so far is that exp_ou_int is synonymous with “cell size”, and ou is synonymous with “instantaneous cell size growth rate”. We intentionally keep these cryptic names to remember we are dealing with “fake data”.
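Columns of such a structured array are accessed by name. The following numpy sketch rebuilds a small array with the same dtype to show the access pattern (values abbreviated from the output above):

```python
import numpy as np

# rebuild a tiny structured array with the same column names as cell.data
dtype = [('time', 'f8'), ('ou', 'f8'), ('ou_int', 'f8'),
         ('exp_ou_int', 'f8'), ('cellID', 'u2'), ('parentID', 'u2')]
data = np.array([(130., 0.0109, 0.0857, 1.0895, 4, 2),
                 (135., 0.0102, 0.1389, 1.1490, 4, 2)], dtype=dtype)

times = data['time']        # acquisition times
sizes = data['exp_ou_int']  # the "cell size" column
```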

Colony

Type in:

>>> colony = parser.get_colony(0)
>>> colony.show()
1
├── 2
│   ├── 3
│   └── 4
└── 5
    ├── 6
    └── 7

This feature comes from the excellent treelib package that handles the tree structure of dividing cells.

You should find in your tree your randomly selected cell.

Plotting small samples

To plot some quantity, we first need to define the observable from raw data. Raw data is presented as columns, and column names are what we call raw observables.

Defining the observable to plot

To define the observable to plot, we are using the Observable object located in tunacell’s main scope:

>>> from tunacell import Observable

and we will choose the cryptic exp_ou_int raw column in our simulated data, associating it with the “size” variable:

>>> obs = Observable(name='size', raw='exp_ou_int')
>>> print(obs)
Observable(name='size', raw='exp_ou_int', scale='linear', differentiate=False, local_fit=False, time_window=0.0, join_points=3, mode='dynamics', timing='t', tref=None, )

The output of the print statement recapitulates all parameters of the observable. A more human-readable output is obtained with the following method:

>>> print(obs.as_string_table())
parameter      value
-------------  ----------
name           size
raw            exp_ou_int
scale          linear
differentiate  False
local_fit      False
time_window    0.0
join_points    3
mode           dynamics
timing         t
tref

We’ll review the details of Observable object in the Observable section.

Calling the plotting function

Now that we have a colony, we would like to inspect the timeseries of our chosen observable in the different lineages. To do this, we import the main object to plot samples:

>>> from tunacell.plotting.samples import SamplePlot

And we instantiate it with our settings:

>>> myplot = SamplePlot([colony, ], parser=parser)
>>> myplot.make_plot(obs)
>>> myplot.save(user_bname='tutorial_sample', add_obs=False)

To print out the figure:

>>> myplot.fig.show()

or when inline plotting is active just type:

>>> myplot.fig

If none of these commands worked out (that would be fairly strange), you can open the file that has been saved in:

~/tmptunacell/simutest/sampleplots/

as tutorial_sample-plot.png.

You should see something that looks like:

_images/tutorial_sample-plot.png

Timeseries of a given observable (here size) in all cells from a colony.

Our cryptic exp_ou_int raw data stands in fact for a quantifier of the size of our simulated growing cells. The plot shows how cell size evolves through time and rounds of divisions in the same colony, as well as the tree structure the divisions are making.

Further exploration of how to plot timeseries of small samples is described in Plotting samples.

Statistical analysis of the dynamics

The core of tunacell is to analyze the dynamics through statistics.

Warning

It gets a bit tougher to understand the following points if you’re not familiar with concepts of random processes.

Let’s briefly see how one can perform pre-defined analysis.

First, instead of looking at our previous observable, we will look at the basic ou observable:

>>> ou = Observable(name='growth-rate', raw='ou')

It describes a quantity that fluctuates in time around a given average value. One is then interested in three main questions: what is the average value at each time-point of the experiment? How large are the typical deviations from this average value at each time-point? And how far do these fluctuations propagate in time?

We load a high-level API function that performs the pre-defined analysis on a single observable, in order to answer these 3 main questions:

>>> from tunacell.stats.api import compute_univariate
>>> univariate = compute_univariate(exp, ou)

The first time such a command is run on the current exp instance, tunacell parses all data and counts how many containers, cells, colonies, and lineages are present. The count is printed and should read:

Count summary:
 - cells : 2834
 - lineages : 1517
 - colonies : 200
 - containers : 100

After the count is performed, a progress bar informs about the time needed to parse the data and compute the univariate statistics. Results can be exported to a structured folder using:

>>> univariate.export_text()

The univariate object stores our statistical quantifiers for the single observable ou. There are functions to plot the results stored in such a univariate object:

>>> from tunacell.plotting.dynamics import plot_onepoint, plot_twopoints

We make the plots by typing:

>>> fig = plot_onepoint(univariate, show_ci=True, save=True)
>>> fig2 = plot_twopoints(univariate, save=True)

This generates two plots. If they have not been displayed automatically, you can open the pdf files saved thanks to the save=True keyword. They are stored in a new set of folders:

~/tmptunacell/simutest/analysis/filterset_01/growth-rate

The first one to look at is plot_onepoint_growth-rate_ALL.png, or by typing:

>>> fig.show()

It should print out like this:

_images/plot_onepoint_growth-rate_ALL.png

Plot of one-point functions: counts, average, and variance vs. time.

This plot is divided into three panels. All panels share the same x-axis, time (expressed here in minutes).

  • The top panel y-axis, Counts, is the number of samples at each time-point (number of cells at each time-point); through divisions, this number of cells should increase, roughly exponentially;
  • The middle panel y-axis, Average, is the sample average of our observable ou (remember, this is our simulated stochastic process), the shadowed region is the 99% confidence interval; here the average value is stable, because our stochastic process is made like this;
  • The bottom panel y-axis, Variance, is the sample variance of the data (this is the square of the standard deviation shadowed on the middle panel, replotted here for convenience); again the variance is stable, up to estimation fluctuations due to finite-size sampling.
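The quantities in these three panels are simple per-time-point sample statistics. As an illustration of what is being computed (a stdlib sketch, not tunacell’s estimator, which also handles confidence intervals):

```python
from collections import defaultdict
from statistics import mean, pvariance

def one_point(samples):
    """Compute counts, average, and variance per time-point.

    `samples` is an iterable of (time, value) pairs pooled over all cells.
    Returns {time: (count, average, variance)}.
    """
    by_time = defaultdict(list)
    for t, value in samples:
        by_time[t].append(value)
    return {t: (len(v), mean(v), pvariance(v))
            for t, v in sorted(by_time.items())}

stats = one_point([(0, 1.0), (0, 3.0), (5, 2.0), (5, 2.0), (5, 5.0)])
```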

The second plot to look at is plot_twopoints_growth-rate_ALL.png, or:

>>> fig2.show()

which should print like this:

_images/plot_twopoints_growth-rate_ALL.png

Plot of two-point functions: counts, autocorrelation functions, and centered superimposition of autocorrelation functions vs. time.

This plot is again divided into three panels. For each panel, there are 4 curves representing the autocorrelation function \(a(s, t)\) for four values of the first argument \(s = t_{\mathrm{ref}}\). The top two panels share the same x-axis:

  • top-panel y-axis, Counts, is the number of independent lineages connecting \(t\) to \(t_{\mathrm{ref}}\) (one colour per \(t_{\mathrm{ref}}\));
  • mid-panel y-axis, Autocorr., is the autocorrelation functions;
  • the bottom panel superimposes the four autocorrelation functions \(a(t_{\mathrm{ref}}, t)\), centered on their respective \(t_{\mathrm{ref}}\).

Auto-correlation functions obtained directly by reading the auto-covariance matrix, as represented above, are quite noisy, since the number of samples, i.e. the number of lineages connecting a cell at time \(t_{\mathrm{ref}}\) to a cell at time \(t\), is experimentally limited (in our numerical experiment we reach \(10^3\) for the red curve when \(t\) is close to \(t_{\mathrm{ref}}=150\) mins, which begins to be acceptable). tunacell provides tools to compute smoother auto- and cross-correlation functions when some conditions are met. It goes beyond the purpose of this introductory tutorial to expose these tools: you can learn more in the specific tutorial on how to compute the statistics of the dynamics, or in the paper.
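For intuition, such an autocorrelation function \(a(t_{\mathrm{ref}}, t)\) can be estimated from an array of lineage time-series as in the following numpy sketch (an illustrative textbook estimator, not tunacell’s internal code; real lineages have unequal lengths, which this sketch ignores):

```python
import numpy as np

def autocorr(timeseries, i_ref):
    """Estimate a(t_ref, t) from an array of lineage time-series.

    `timeseries` has shape (n_lineages, n_timepoints); `i_ref` is the
    index of the reference time t_ref. Each lineage contributes one
    sample per time-point.
    """
    dev = timeseries - timeseries.mean(axis=0)        # deviations from time-point averages
    cov = (dev[:, i_ref][:, None] * dev).mean(axis=0)  # covariance with t_ref
    std = dev.std(axis=0)
    return cov / (std[i_ref] * std)                   # normalized: equals 1 at t = t_ref

rng = np.random.default_rng(42)
ts = rng.normal(size=(200, 10))   # 200 fake lineages, 10 time-points
a = autocorr(ts, i_ref=0)
```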

What to do next?

If you are eager to explore your dataset, check first how your data should be structured so that tunacell can read it. Then you may check how to set your analysis, how to customize sample plots, and finally how to compute the statistics of the dynamics, in particular with respect to conditional analysis.

Enjoy!

Input file format

This section discusses how raw data should be organized so that tunacell is able to read it.

Two types of input are possible:

  • plain text format (full compatibility): data output from any segmentation software can be translated to plain-text format; its format is explained thoroughly below
  • SuperSegger output format (experimental): data output is read directly from the output of the software (stored in a number of Matlab .mat files under a specific folder structure).

A given experiment is stored in a main folder

The name of the main folder is taken as the label of the experiment, i.e. as a unique name that identifies the experiment.

The scaffold to be used in the main folder is:

<experiment_label>/
    containers/
    descriptor.csv
    metadata.yml

If you executed the tunasimu script (see 10 minute tutorial) you can look in the newly created directory tmptunacell in your home directory: there should be a folder simutest storing data from the numerical simulations:

$ cd simutest
$ ls

and check that the structure matches the scaffold above.

There is a subfolder called containers where raw data files are stored, and two text files: descriptor.csv describes the column organization of raw data text files (see Raw data description), while metadata.yml stores metadata about the experiment (see Metadata description). Both files are needed for tunacell to run properly.

Data is stored in container files in the containers subfolder

Time-lapse data is stored in the containers folder. If you ran the 10 minute tutorial you can check what you find in this folder:

$ cd containers
$ ls

You should see a bunch of .txt files (exactly 100 such files if you stuck to default values for the simulation).

Each file in the containers folder recapitulates the raw data of cells observed in one field of view of your experiment, as reported by your image analysis process.

Your experiment may consist of multiple fields of view (or even subsets thereof), and we call each of these files a container file. Within a given container file, cell identifiers are unique: there cannot be two different cells with the same identifier.

A container file holds tab-separated values, where each column corresponds to a cell quantifier exported by the image analysis process. Each row represents one acquisition frame for a given cell. Rows are grouped by cell: if cell ‘1’ was imaged on 5 successive frames, there should be 5 successive rows in the container file reporting raw data about cell ‘1’.
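As an illustration of this layout, a hypothetical stdlib reader (not part of tunacell) could group the rows of a container file by cell identifier:

```python
import csv

def read_container(path, columns):
    """Group rows of a (headerless) tab-separated container file by cellID.

    `columns` lists the column names, in order, as given by descriptor.csv.
    Returns a dict cellID -> list of row dicts; rows of a given cell are
    stored consecutively in the file.
    """
    idx = columns.index('cellID')
    cells = {}
    with open(path, newline='') as f:
        for row in csv.reader(f, delimiter='\t'):
            cells.setdefault(row[idx], []).append(dict(zip(columns, row)))
    return cells
```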

Raw data description

The column name and the type of data for each column are reported in the descriptor.csv file, a comma-separated values file where each line consists of <column-name>,<column-type>.

The column name is arbitrary except for 3 mandatory quantifiers (see Mandatory raw data columns). The column type must be given as a numpy datatype; the most used datatypes are:

  • f8 are floating point numbers coded on 8 bytes (this should be your default datatype for most quantifiers, except cell identifiers),
  • i4 means integer coded on 4 bytes,
  • u2 usually refers to the Irish band. For our purposes it also means unsigned integer coded on 2 bytes (this is the default for cell identifiers; it counts cells up to 65535, and can be upgraded to u4, pushing the limit to 4294967295 cells; after that, let me know if you still haven’t found what you’re looking for).

Mandatory raw data columns

  • cellID: the identifier of a given cell. In our example, cells are labeled numerically by integers, hence the type is u2 (Numpy shortname that means unsigned integer coded on 2 bytes);
  • parentID: the identifier of the parent of given cell. This is mandatory for tunacell to reconstruct lineages and colonies;
  • time: time at which acquisition has been made. Its type should be f8, that means floating type coded on 8 bytes. The unit is left to the user’s appreciation (minutes, hours, or it can even be frame acquisition number—though this is discouraged since physical processes are independent of the period of acquisition).

All other fields are left to the user’s discretion.

Example

In our simutest experiment, one could inspect descriptor.csv:

time,f8
ou,f8
ou_int,f8
exp_ou_int,f8
cellID,u2
parentID,u2

In addition to the mandatory fields listed above one can find the following cryptic names: ou, ou_int, exp_ou_int. These are explained in Numerical simulations in tunacell.
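Such a descriptor file maps directly onto a numpy-style dtype specification; a minimal stdlib sketch (not tunacell’s own reader) could be:

```python
def read_descriptor(path):
    """Parse a descriptor.csv file into a list of (name, numpy-type) pairs.

    The returned list can be passed as-is to numpy as a structured dtype.
    """
    dtype = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            name, np_type = line.split(',')
            dtype.append((name, np_type))
    return dtype
```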

Metadata description

YAML format

Experiment metadata is stored in the metadata.yml file, which is parsed using the YAML syntax. The file can be separated into documents (separated by ‘---’). Each document is organized as a list of parameters (parsed as a dictionary). There must be at least one document where the entry level is set to experiment (or, synonymously, top). It holds the top-level experimental metadata (date of experiment, strain used, medium, etc.). A minimal example would be:

level: experiment
period: 3

which indicates that the acquisition time period is 3 minutes. A more complete metadata file could be:

level: experiment
period: 3
strain: E. coli
medium: M9 Glucose
temperature: 37
author: John
date : 2018-01-20

When the experiment has been designed such that metadata is heterogeneous, i.e. some fields of view get a different set of parameters, and one later needs to distinguish these fields of view, insert as many new documents as there are different types of fields of view. For example, assume our experiment compares the growth of two strains: fields of view 01 and 02 get one strain while field of view 03 gets the other. One way to encode this is:

level: experiment
period: 3
---
level:
   - container_01
   - container_02
strain: E. coli MG1655
---
level: container_03
strain: E. coli BW25113

A parameter given in a lower-level document overrides the same experiment-level parameter, which means that such a metadata file can be shortened:

level: experiment
period: 3
strain: E. coli MG1655
---
level: container_03
strain: E. coli BW25113

so it is assumed that the strain is E. coli MG1655 for all container files unless indicated otherwise, which is the case here for container_03, which gets the BW25113 strain.
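The override rule can be sketched in a few lines of Python, operating on the documents once parsed (e.g. with yaml.safe_load_all); this is only an illustration of the semantics, not tunacell’s implementation:

```python
def metadata_for(container, documents):
    """Resolve the metadata of one container from a list of parsed documents.

    Experiment-level values serve as defaults; a document whose 'level'
    names (or lists) the container overrides them.
    """
    meta = {}
    # start from the experiment-level document(s)
    for doc in documents:
        if doc.get('level') in ('experiment', 'top'):
            meta.update({k: v for k, v in doc.items() if k != 'level'})
    # then apply container-specific documents
    for doc in documents:
        levels = doc.get('level')
        if levels == container or (isinstance(levels, list) and container in levels):
            meta.update({k: v for k, v in doc.items() if k != 'level'})
    return meta

docs = [
    {'level': 'experiment', 'period': 3, 'strain': 'E. coli MG1655'},
    {'level': 'container_03', 'strain': 'E. coli BW25113'},
]
print(metadata_for('container_01', docs))
print(metadata_for('container_03', docs))
```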

Tabular format (.csv)

Another option is to store metadata in a tabular file, such as comma-separated values. The header should contain at least level and period. The first row after the header is usually reserved for the experiment-level metadata, and following rows may be populated for particular fields of view. For example, the csv file corresponding to our last example reads:

level,period,strain
experiment,3,E. coli MG1655
container_03,,E.coli BW25113

Although more compact, it can be harder to read or fill in from a text file.

Note

When a container is not listed, its metadata is read from the experiment metadata. Missing values in a container row are filled with experiment-level values.

Supersegger output

The SuperSegger output is stored in numerous subfolders under a main folder. The metadata file (see Metadata description) needs to be added under this main folder as well.

What to do next?

If you’d like to start analysing your dataset, your first task is to organize data in the presented structure. When it’s done, you can try to adapt the commands from the 10 minute tutorial to your dataset. When you want to get more control about your analysis, have a look at Setting up your analysis which presents you how to set up the analysis, in particular how to define the statistical ensemble and how to create subgroups for statistical analysis. Then you can refer to Plotting samples to customize your qualitative exploration of data, and then dive in Statistics of the dynamics to start the quantitative analysis.

Setting up your analysis

Once raw data files are organized following requirements in Input file format, analysis can get started. A first step is to follow the guidelines in 10 minute tutorial. Here we go into more detail about:

  • how to parse your data,
  • how to define the observable to look at, and
  • how to define conditions.

Experiment and filters

To start the analysis, you need to tell tunacell which experiment to analyse, and whether to apply filters.

Loading an experiment

To set up the experiment, you have to give the path to the experiment folder on your computer. We will denote this path as <path-to-exp>, then use:

from tunacell import Experiment
exp = Experiment(<path-to-exp>)

By default, no filter is applied. But it is possible to associate a set of filters to an experiment, giving instructions for how data structures will be parsed.

Defining the statistical ensemble

The statistical ensemble is the set of cells, lineages, colonies, and containers that are parsed to compute statistics. In some cases, you may want to remove outliers, such as cells carrying anomalous values.

To do so, a FilterSet instance must be defined and associated with the Experiment object. A detailed description of how to define filters and filter sets is in Filters. Here we give a simple, concrete example. Suppose you’d like to filter out cells that didn’t divide symmetrically. First, instantiate the FilterSymmetricDivision class:

from tunacell.filters.cells import FilterSymmetricDivision
myfilter = FilterSymmetricDivision(raw='length', lower_bound=0.4, upper_bound=0.6)

length is used as the raw quantifier (assuming your data files have a length column). Such a filter requires that the daughter cell’s length at birth lies between 40 and 60 percent of the mother cell’s length at division. Then:

from tunacell import FilterSet
myfset = FilterSet(filtercells=myfilter)

In the last line, the keyword argument filtercells is used since our filter myfilter acts on Cell instances. You can define one filter for each type of structure: Cell, Colony, Lineage, and Container.

Once the FilterSet instance is defined, load it with:

exp.set_filter(myfset)

Note

Filtering out cell outliers may affect the tree structure: removing an outlier node decomposes the original tree into multiple subtrees. Hence the number of trees generated from one container file depends on the filters applied to cells.
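As a toy illustration of this note (not tunacell’s internal code), removing an outlier node from a child-list tree yields one root per resulting subtree:

```python
def split_into_subtrees(children, outliers):
    """Decompose a tree into subtrees obtained after removing outlier nodes.

    `children` maps each node to its list of child nodes; `outliers` is a set
    of nodes to remove.  Returns the sorted list of subtree root nodes.
    """
    all_children = {c for kids in children.values() for c in kids}
    all_nodes = set(children) | all_children
    roots = []
    for node in all_nodes:
        if node in outliers:
            continue
        parents = [p for p, kids in children.items() if node in kids]
        # a node becomes a root if it has no parent, or its parent was removed
        if not parents or parents[0] in outliers:
            roots.append(node)
    return sorted(roots)

# Tree: 1 -> (2, 3), 2 -> (4, 5); removing cell 2 yields three subtrees.
tree = {1: [2, 3], 2: [4, 5]}
print(split_into_subtrees(tree, outliers={2}))  # [1, 4, 5]
```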

Defining particular samples

All samples from an experiment are used for statistics, under the filtering assumption discussed above. However, visualization of trajectories is performed over a subset of reasonable size: this is what we’ll be calling small samples.

Small samples can be chosen specifically by the user (“I am intrigued by this cell, let’s have a look at its trajectory”), or randomly. To do so:

from tunacell import Parser
parser = Parser(exp)

Note that a sample is identified by a pair of labels: the container label and the cell identifier. For example:

parser.add_sample({'container_label': 'FOV_001', 'cellID': 12})

or synonymously:

parser.add_sample(('FOV_001', 12))

This information is stored under the samples attribute, and you can print the registered samples with:

print(parser.info_samples())

You can also add randomly chosen samples:

parser.add_sample(10)

adds 10 such samples.

Please refer to Parser for more information about how to use it.

Iterating through samples

The exp object provides a set of iterators to parse data at each level, with the appropriate applied filters:

  • Container level with the method iter_containers(), filtered at the container level,
  • Colony level with the method iter_colonies(), filtered at the container, cell, and colony levels,
  • Lineage level with the method iter_lineages(), filtered at the container, cell, colony, and lineage levels,
  • Cell level with the method iter_cells(), filtered at the container, cell, colony, and lineage levels.

The idea behind tunacell is to decompose colonies into sets of lineages, i.e. into sets of sequences of parentally linked cells. This way, it is possible to extract time-series that span time ranges larger than single cell cycles.
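The lineage decomposition sketched above can be illustrated with toy code (a child-list colony representation is assumed; this is not tunacell’s implementation):

```python
import random

def decompose_lineages(children, root, rng=random.Random(0)):
    """Randomly decompose a colony tree into independent lineages.

    At each division one daughter is chosen at random to extend the current
    lineage; the other daughter seeds a new lineage.  Every cell ends up in
    exactly one lineage.
    """
    lineages = []
    stack = [root]
    while stack:
        cell = stack.pop()
        lineage = [cell]
        while children.get(cell):
            daughters = list(children[cell])
            cell = daughters.pop(rng.randrange(len(daughters)))
            stack.extend(daughters)  # remaining daughters start new lineages
            lineage.append(cell)
        lineages.append(lineage)
    return lineages

# a 7-cell colony: root 1 divides into 2 and 3, which each divide once
colony = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
lineages = decompose_lineages(colony, root=1)
# each cell belongs to one, and only one, lineage
assert sorted(c for lin in lineages for c in lin) == [1, 2, 3, 4, 5, 6, 7]
```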

Note

Decomposition in lineages is performed randomly: at cell division, one daughter cell is chosen randomly to be the next step in the lineage. This way, lineages are independent: a given cell belongs to one, and only one independent lineage.

Iterating over listed samples

Use the above-mentioned methods on the Parser instance.

See Parser for more details.

Iterating over all samples

Use the above-mentioned methods on the Experiment instance.

See Experiment for more details.

Defining the observable

To define an observable, i.e. a measurable quantity that evolves through time, use the Observable class:

from tunacell import Observable

and instantiate it with parameters to define a particular observable.

The first parameter is the name to give to the observable (so that it can be found back later in the analysis process).

The second, mandatory parameter is the column to use as raw data (e.g. ‘length’, ‘size’, ‘fluo’, …).

Then, it is possible to use time-lapse data (as stored in data files, or processed using a time-derivative estimate) or to determine the value of said raw observable at a particular cell cycle stage, for example length at birth.

Indicating raw data

First, one needs to indicate which column to be used in the raw data file, by specifying raw='<column-name>'.

When raw data is expected to be steady, or to be a linear function of time within cell cycle, then use scale='linear' (default setting). When it is expected to be an exponential function of time within cell cycle, use scale='log'. We will mention below how this parameter affects some procedures.

Raw data can be used as is, or further processed to provide a user-defined observable. Two main modes are used to process raw data:

  • The dynamics mode is used when one wants to analyze observables for all time points; examples are: length, growth rate, …
  • The cell-cycle mode indicates observables that are defined as a single value per cell cycle; examples are: length at birth, average growth rate, …

Dynamic mode

This mode corresponds to the parameter mode='dynamics'. It automatically sets the timing parameter to timing='t', where t stands for time-lapse timing. It is meant to study observables for all time points (time-lapse, dynamic analysis).

Cell-cycle modes

Cell-cycle modes are used when an observable needs to be quantified at the cell-cycle level, i.e. once per cell cycle. There are a few cell-cycle modes:

  • mode='birth': extrapolates values to estimate observable at cell birth;
  • mode='division': extrapolates values to estimate observable at cell division;
  • mode='net-increase-additive': returns the difference between division and birth values of observable;
  • mode='net-increase-multiplicative': returns the ratio between division and birth values of observable;
  • mode='average': returns the average value of observable along cell cycle;
  • mode='rate': performs a linear/exponential fit of the observable depending on the chosen scale parameter. In fact the procedure always performs a linear fit: when scale='log', the log of the raw data is used, which amounts to an exponential fit on the raw data.

Choosing the timing

For the dynamic mode, the only associated timing is t (which stands for “time-lapse”). The parameter tref may be used to align time points: when provided as a number, it is subtracted from the acquisition time. The string code 'root' can also be given, which aligns data with the colony root cell’s division time (caution: when filtering happens, cells acquired in the middle of your experiment can become root cells if their parent cell is an outlier; this may dangerously affect the alignment of your time series).

For cell-cycle modes, a time point is associated to the estimated observable, to be chosen among:

  • b: time at birth, when known;
  • d: time at division, when known;
  • m: time at mid-point through the cell cycle;
  • g: generation index, which can be used in conjunction with the parameter tref. When the latter is set to a floating-point number, the generation index is offset to that of the cell’s ancestor living at this time of reference, if it exists; otherwise, data from this lineage is discarded from the analysis. When tref=None, the generation index is relative to the colony to which the current cell belongs.

End-point values are estimated by extrapolation, because cell divisions are recorded halfway between the parent cell’s last frame and the daughter cell’s first frame. The extrapolation uses local fits over join_points points.
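As an illustration of the idea (not tunacell’s exact code), a birth value can be estimated by fitting the first join_points frames linearly and extrapolating back to the recorded birth time:

```python
import numpy as np

def extrapolate_at_birth(times, values, birth_time, join_points=3):
    """Estimate an observable at birth by fitting the first `join_points`
    frames linearly and evaluating the fit at `birth_time`."""
    slope, intercept = np.polyfit(times[:join_points], values[:join_points], 1)
    return slope * birth_time + intercept

times = np.array([2., 4., 6., 8.])
values = np.array([1.2, 1.4, 1.6, 1.8])   # linear growth, slope 0.1
# birth is recorded halfway between parent's last frame (0.) and first frame (2.)
print(extrapolate_at_birth(times, values, birth_time=1.))  # ~1.1
```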

Warning

The generation index must be used with care in statistical estimates over the dynamics of dividing cells, since generation 0 of a given colony does not necessarily correspond to generation 0 of another colony.

Differentiation

In dynamics mode, differentiation is obtained either by finite differences over two consecutive points (the default), or by a sliding-window fit. For an observable \(x(t)\), depending on the chosen scale, linear or log, it returns the estimate of \(\frac{dx}{dt}\) or \(\frac{d}{dt} \log x(t)\), respectively.
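A minimal sketch of the finite-difference estimates for both scale settings (illustration only):

```python
import numpy as np

t = np.array([0., 5., 10., 15.])
x = np.exp(0.02 * t)          # exponentially growing observable

# scale='linear': estimate dx/dt
dxdt = np.diff(x) / np.diff(t)

# scale='log': estimate d(log x)/dt, i.e. the exponential growth rate
rate = np.diff(np.log(x)) / np.diff(t)
print(rate)  # ~[0.02, 0.02, 0.02]
```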

Local fit estimates

As finite difference estimates of derivatives are very sensitive to measurement precision, the user can opt for a local fitting procedure.

This procedure can be applied to estimate derivatives, or values of the observables, by performing a local linear fit of the scaled observable over a given time window. To use this option, the user needs to provide the time-window extent: e.g. time_window=15 performs a local fit over a time window of 15 units of time (usually minutes).

Such a local fit procedure, restricted to cell-cycle time segments, would lose exploitable time points, as many as the time window spans, for each cell. To deal with this, the procedure provides a way to use daughter cell information to “fill in” estimates towards the end of the cell cycle. The parameter join_points=3 indicates that end-point values are estimated using the first 3, or last 3, frames.
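A minimal sketch of a sliding-window derivative estimate (illustration only; tunacell’s actual procedure also handles cell-cycle boundaries via join_points):

```python
import numpy as np

def window_derivative(t, x, time_window):
    """Estimate dx/dt at each time point by a linear fit of x over a
    centred time window (sliding-window fit)."""
    half = time_window / 2.
    slopes = np.empty_like(x)
    for i, ti in enumerate(t):
        mask = np.abs(t - ti) <= half          # points inside the window
        slopes[i] = np.polyfit(t[mask], x[mask], 1)[0]
    return slopes

t = np.arange(0., 40., 4.)        # acquisition period: 4 minutes
x = 0.05 * t + 1.                 # linear signal, true slope 0.05
print(window_derivative(t, x, time_window=12.))  # ~0.05 everywhere
```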

Warning

Using the local fitting procedure is likely to artificially correlate time points over the time-window range. This option can help with data visualization since it smooths out measurement errors, but extreme caution is advised when this feature is used in statistical analysis.

Examples

Let’s assume that raw data column names include 'length' and 'fluo'.

Example 1: length vs. time

This is the trivial example. We stay in dynamic mode, and we do not associate any further processing to collected data:

>>> length = Observable(name='length', raw='length')

Example 2: length at birth

We go to the corresponding cell-cycle mode with the appropriate timing:

>>> length_at_birth = Observable(name='birth-length', raw='length', mode='birth', timing='b')

Note

one could associate the observable length at birth with another timing, e.g. time at mid cell cycle.

Example 3: Fluorescence production rate (finite differences)

>>> k = Observable(name='prod-rate', raw='fluo', differentiate=True)

Example 4: Fluorescence production rate (local fit)

We found that the latter leads to very noisy timeseries, so we choose to produce local estimates over 3 points; in an experiment where the acquisition period is 4 minutes, this means a 12-minute time window:

>>> kw = Observable(name='window-prod-rate', raw='fluo', differentiate=True, scale='linear',
                    local_fit=True, time_window=12.)

It computes

\[\frac{d}{dt} \mathrm{fluo}(t)\]

using 12 minutes time windows.

Example 5: Fluorescence production rate (using local fits) at birth

Now we want it as a function of the generation index, setting generation 0 for cells living at time 300 minutes:

>>> kw = Observable(name='window-prod-rate-at-birth', raw='fluo', differentiate=True, scale='linear',
                    local_fit=True, time_window=12.,
                    mode='birth', timing='g', tref=300.)

Conditional analysis

We saw in Defining the statistical ensemble that one can define filters that act on cells, or colonies, and to group them in a FilterSet instance that essentially sets the statistical ensemble over which analysis is performed.

There is another utility of these FilterSet objects: they may define sub-ensembles over which analysis is performed in order to compare results over chosen sub-populations. One example is to “gate” cell-cycle quantifiers and observe the statistics of the different sub-populations. Here we extend the gating procedure to analyse any dynamic observable.

To do so, a list of FilterSet instances, one per condition, can be provided to our analysis functions. We refer to the user page Filters for further reading on how to use filters, and to Statistics of the dynamics for how to run statistical analysis.

Plotting samples

To gain qualitative intuition about a dataset, it is common to visualize trajectories among a few samples. tunacell provides a matplotlib-based framework to visualize timeseries as well as the underlying colony/lineage structures arising from dividing cells.

Note

In order for the colour-code to work properly, matplotlib must be updated to a version >=2.

In this document we will describe how to use the set of tools defined in tunacell.plotting.samples.

We already saw in the 10 minute tutorial a simple plot of length vs. time in a colony from our numerical simulations. Here we will review the basics of plotting small samples in few test cases.

Note

If you cloned the tunacell repository, there are two ways of quickly executing the following tutorial.

You may run the script plotting-samples.py with the following command:

python plotting-samples.py -i --seed 951

The seed is used to select the same samples as the ones printed below.

Alternatively it can be run from the root folder using the Makefile:

make plotting-demo

If you executed one of the commands above, there is no need to run the commands below; follow the command-line explanations and cross-reference them with the following commands to understand how it works. Otherwise, you can run the commands below sequentially.

Setting up samples and observables

For plotting demonstration, we will create a numerically simulated experiment, where the dynamics is sampled on a time interval short enough for the colonies to be of reasonable size. Call from a terminal:

tunasimu -l simushort --stop 120 --seed 167389

In a Python script/shell, we load data with the usual:

import numpy as np

from tunacell import Experiment, Parser, Observable, FilterSet
from tunacell.filters.cells import FilterCellIDparity
from tunacell.plotting.samples import SamplePlot

exp = Experiment('~/tmptunacell/simushort')
parser = Parser(exp)
np.random.seed(seed=951)  # to match the samples/plots shown below
parser.add_sample(10)

# define a condition
even = FilterCellIDparity('even')
condition = FilterSet(filtercell=even)

# define observable
length = Observable(name='length', raw='exp_ou_int')
ou = Observable(name='growth-rate', raw='ou')

We have defined two observables and one condition used as a toy example. With these preliminary lines, we are ready to plot timeseries. The main object to call is SamplePlot, which accepts the following parameters:

  • samples, an iterable over Colony or Lineage instances
  • the Parser instance used to parse data,
  • the list of conditions (optional).

We already saw how to define instances of the class Observable. Samples can be hand-picked, or drawn randomly from the experiment. We will review below the different cases with concrete examples from our settings.

We have 10 samples in our parser, that have been chosen randomly. Remember that they can also be specified on purpose with the container and cell identifiers. Once stored in the parser object, they can be addressed by their index in the table; to check the table of samples, call:

print(parser)

If you used the default settings, you should observe:

  index  container        cell
-------  -------------  ------
      0  container_015       3
      1  container_087      14
      2  container_002       6
      3  container_012      12
      4  container_096      15
      5  container_040       8
      6  container_088      14
      7  container_007       1
      8  container_042       2
      9  container_013       5

How to plot a colony sample

We start from the basic example initiated in the 10 minute tutorial:

colony = parser.get_colony(0)  # any index between 0 and 9 would do

and we call our plotting environment:

colplt = SamplePlot([colony, ], parser=parser, conditions=[condition, ])

The first argument is the sample(s) to be plotted; the remaining parameters are explicit keyword arguments. Conditions must be given as a list of FilterSet instances (the list can be left empty).

Using default settings

We start with the default settings and will inspect the role of each parameter:

colplt.make_plot(length)

The figure is stored as the fig attribute of colplt:

colplt.fig.show()  # in non-interactive mode, colplt.fig in interactive mode

This kind of plot should be produced:

_images/colony0-plot.png

Timeseries of length vs time for one colony, default settings.

The default settings for a colony plot display:

  • one lineage per row (it comes from keyword parameter superimpose='none'),
  • cell identifiers on top of each cell (report_cids=True),
  • container and colony root identifiers when they change,
  • vertical lines to follow divisions (report_divisions=True).

Data points are represented by plain markers (show_markers=True), with underlying transparent connecting lines for visual help (show_lines=True). The plot title is made from the Observable.as_latex_string() method.

Visualization of a given condition

The first feature we explore is to visualize whether samples verify a given condition. To do so, use the report_condition keyword parameter:

colplt.make_plot(length, report_condition=repr(condition))

Conditions are labeled according to their representation, this is why we used the repr() call.

Now the fig attribute should store the following result:

_images/colony0-even-plot.png

Timeseries of length vs time for one colony. Plain markers are used for samples that verify the condition (cell identifier is even), empty markers point to samples that do not verify the condition.

Colouring options

Colour can be changed for distinct cells, lineages, colonies, or containers (given in order of priority), or not changed at all.

Changing cell colour
colplt.make_plot(length, report_condition=repr(condition), change_cell_color=True)
_images/colony0-even-cell-color-plot.png

Colour is changed for each cell and assigned with respect to the generation index of the cell in the colony. This makes it possible to investigate how generations desynchronize through time.

Changing lineage colour
colplt.make_plot(length, report_condition=repr(condition), change_lineage_color=True)
_images/colony0-even-lineage-color-plot.png

Colour is changed for each lineage, i.e. each row in this colony plot.

Superimposition options

The default setting is not to superimpose lineages. It is possible to change this behaviour by changing the superimpose keyword parameter. Some keywords are reserved:

  • 'none': do not superimpose timeseries,
  • 'all': superimpose all timeseries into a single-row plot,
  • 'colony': superimpose all timeseries from the same colony, thereby making as many rows as there are different colonies in the list of samples,
  • 'container': idem at the container level,

and when an integer is given, each row will be filled with at most that number of lineages.

For example, if we superimpose at most 3 lineages:

colplt.make_plot(length, report_condition=repr(condition), change_lineage_color=True,
             superimpose=3)
_images/colony0-even-super3-plot.png

Superimposition of at most 3 lineages with superimpose=3. Once superimpose is different from 'none' (or 1), the vertical lines showing cell divisions and the cell identifiers are not shown (the options report_cids and report_divisions are overridden to False).

Plotting few colonies

So far our sample was a unique colony. It is possible to plot multiple colonies in the same plot; they can be given as an iterable over colonies:

splt = SamplePlot(parser.iter_colonies(mode='samples', size=2),
               parser=parser, conditions=[condition, ])
splt.make_plot(length, report_condition=repr(condition), change_colony_color=True)

Here we iterated over colonies from the samples defined in parser.samples.

_images/colonies-even-plot.png

First two colonies from parser.samples, with changing colony colour option.

Now we will switch to the other observable, ou, which is the instantaneous growth rate:

splt.make_plot(ou, report_condition=repr(condition), change_colony_color=True,
               superimpose=2)
_images/colonies-ou-even-plot.png

Same samples as above, but we changed the observable to growth rate.

We can also iterate over unselected samples: iteration goes through container files:

splt = SamplePlot(parser.iter_colonies(size=5), parser=parser,
                   conditions=[condition, ])
splt.make_plot(ou, report_condition=repr(condition), change_colony_color=True,
                superimpose=2)
_images/colonies5-ou-even-plot.png

Two lineages are superimposed on each row. Colour is changed for each new colony.

To get an idea of the divergence of growth rate, it is better to plot all timeseries in a single-row plot. We mask markers and set transparency to better distinguish individual timeseries:

splt.make_plot(ou, change_colony_color=True, superimpose='all', show_markers=False,
                alpha=.6)
_images/lineages-from-colonies5-plot.png

Lineages from the 5 colonies superimposed on a single row plot.

Plotting few lineages

Instead of a colony, or an iterable over colonies, one can use a lineage or an iterable over lineages as argument of the plotting environment:

splt = SamplePlot(parser.iter_lineages(size=10), parser=parser,
                   conditions=[condition, ])
splt.make_plot(ou, report_condition=repr(condition), change_lineage_color=True,
                superimpose='all', alpha=.6)
_images/lineages10-plot.png

10 lineages from an iterator on a single row plot.

Adding reference values

One can add reference values for the mean and the variance, plotted as a line for the mean and as lines at +/- one standard deviation.

From the numerical simulation metadata, it is possible to compute the mean value and the variance of the process:

md = parser.experiment.metadata
# ou expectation values
ref_mean = float(md.target)
ref_var = float(md.noise)/(2 * float(md.spring))

and then to plot it to check how our timeseries compare to these theoretical values:

splt.make_plot(ou, report_condition=repr(condition), change_lineage_color=True,
                superimpose='all', alpha=.5, show_markers=False,
                ref_mean=ref_mean, ref_var=ref_var)
_images/lineages10-with-ref-plot.png

Timeseries from lineages are reported together with theoretical mean value (dash-dotted horizontal line) +/- one standard deviation (dotted lines).

Adding information from computed statistics

We will review the computation of statistics in the next document; here we assume it has been performed for our observable ou. The data_statistics option is used to display results of statistics, which is useful when no theoretical values exist (most of the time):

splt.make_plot(ou, report_condition=repr(condition), change_lineage_color=True,
            superimpose='all', alpha=.5, show_markers=False,
            data_statistics=True)
_images/lineages10-with-stats-plot.png

Data statistics have been added: grey line shows the estimated mean value and shadows show +/- one estimated standard deviation. Note that these values have been estimated over the entire statistical ensemble, not just the plotted timeseries.

Statistics of the dynamics

Once qualitative intuition has been gained by plotting time series from a few samples (see Plotting samples) one can inspect quantitatively the dynamics of a dataset by using tunacell’s pre-defined tools.

Univariate and bivariate analysis tools are coded in tunacell to describe the statistics of a single observable, resp. a couple of observables.

We start with a background introduction to those concepts. Then we set the session, and show tunacell’s procedures.

Note

This manual page is rather tedious. To be more practical, open, read, and run the following scripts: univariate-analysis.py, univariate-analysis-2.py, and bivariate-analysis.py. The background information and figures on this page may serve as side help.

Background

We consider a stochastic process \(x(t)\).

One-point functions

One-point functions are statistical estimates of functions of a single time-point. Typical one-point functions are the average at a given time

\[\langle x(t) \rangle\]

where the notation \(\langle \cdot \rangle\) means taking the ensemble average of the quantity; or the variance

\[\sigma^2 (t) = \langle (x(t) - \langle x(t) \rangle )^2 \rangle\]

Inspecting these functions provides a first quantitative approach of the studied process.

In our case, time series are indexed to acquisition times \(t_i\), where \(i=0,\ 1,\ 2,\ \dots\). Usually

\[t_i = t_{\mathrm{offset}} + i \times \Delta t\]

where \(\Delta t\) is the time interval between two frame acquisitions, and \(t_{\mathrm{offset}}\) is an offset that sets the origin of times.

Then if we denote \(s^{(1)}_i\) as the number of cells acquired at time index \(i\), the average value at the same time of observable \(x\) is evaluated in tunacell as:

\[\langle x(t_i) \rangle = \frac{1}{s^{(1)}_i} \sum_{k=1}^{s^{(1)}_i} x^{(k)}(t_i)\]

where \(x^{(k)}(t)\) is the value of observable \(x\) in cell \(k\) at time \(t\).
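This estimator can be sketched with toy data, using NaN to mark times at which a cell is not observed (illustration only):

```python
import numpy as np

# rows: cells, columns: acquisition times; NaN when the cell is not observed
x = np.array([[1.0, 1.2, np.nan],
              [0.8, 1.0, 1.2],
              [np.nan, 1.1, 1.3]])

counts = np.sum(~np.isnan(x), axis=0)   # s^(1)_i, sample count per time point
average = np.nanmean(x, axis=0)         # <x(t_i)>, mean over observed cells
print(counts)   # [2 3 2]
print(average)  # ~[0.9, 1.1, 1.25]
```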

Two-point functions

Two-point functions are statistical estimates of functions of two time-points. The typical two-point function is the auto-correlation function, defined as:

\[a(s, t) = \langle (x(s) - \langle x(s) \rangle) \times (x(t) - \langle x(t) \rangle) \rangle\]

In tunacell it is estimated using:

\[a_{ij} = \frac{1}{s^{(2)}_{ij}} \sum_{k=1}^{s^{(2)}_{ij}} x^{(k)}(t_i) \times x^{(k)}(t_j) - \langle x(t_i) \rangle \langle x(t_j) \rangle \equiv a(t_i, t_j)\]

where the sum over \(k\) means over lineages connecting times \(t_i\) to \(t_j\) (there are \(s^{(2)}_{ij}\) such lineages).

With our method, for a cell living at time \(t_i\), there will be at most one associated descendant cell at time \(t_j\). There may be more descendants living at \(t_j\), but only one is picked at random according to our lineage decomposition procedure.

For identical times, the auto-correlation coefficient reduces to the variance:

\[a_{ii} \equiv \langle \left( x(t_i) - \langle x(t_i) \rangle \right)^2 \rangle = \sigma^2(t_i)\]

Under stationarity hypothesis, the auto-correlation function depends only on time differences such that:

\[a(s, t) = \tilde{a}(s-t)\]

(and the function \(\tilde{a}\) is symmetric: \(\tilde{a}(-u)=\tilde{a}(u)\)).

In tunacell, a special procedure estimates \(\tilde{a}\): the lineage decomposition is used to generate time series; then, for a given time interval \(u\), we collect all couples of points \((t_i, t_j)\) such that \(u = t_i - t_j\), and perform the average over all these samples.
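This pooling procedure can be sketched as follows (an illustration under stationarity assumptions, not tunacell’s implementation):

```python
import numpy as np
from collections import defaultdict

def stationary_autocorr(lineage_series, dt, mean):
    """Estimate a~(u) by pooling, over all lineage time series, every pair
    of points separated by the same time lag u (a multiple of the period dt)."""
    pools = defaultdict(list)
    for series in lineage_series:
        dev = np.asarray(series) - mean        # deviations from the mean
        for i in range(len(dev)):
            for j in range(i, len(dev)):
                pools[(j - i) * dt].append(dev[i] * dev[j])
    lags = sorted(pools)
    return np.array(lags), np.array([np.mean(pools[u]) for u in lags])

# two toy lineage time series sampled every 5 minutes, process mean 0
series = [[0.1, -0.1, 0.1], [-0.1, 0.1, -0.1]]
lags, acorr = stationary_autocorr(series, dt=5., mean=0.)
print(lags)    # [ 0.  5. 10.]
print(acorr)   # ~[0.01, -0.01, 0.01]
```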

We can extend such a computation to two observables \(x(t)\) and \(y(t)\). The relevant quantity is the cross-correlation function

\[c(s, t) = \langle (x(s) - \langle x(s) \rangle ) \times (y(t) - \langle y(t) \rangle ) \rangle\]

which we estimate through cross-correlation coefficients

\[c_{ij} = c(t_i, t_j)\]

Again under stationarity hypothesis, the cross-correlation function depends only on time differences:

\[c(s, t) = \tilde{c}(s - t)\]

though now the function \(\tilde{c}\) may not be symmetric.

Note

At this stage of development, extra care has not been taken to ensure ideal properties for our statistical estimates such as unbiasedness. Hence caution should be taken for the interpretation of such estimates.

Warming-up [1]

We start with:

from tunacell import Experiment, Observable, FilterSet
from tunacell.filters.cells import FilterCellIDparity

exp = Experiment('~/tmptunacell/simutest')

# define a condition
even = FilterCellIDparity('even')
condition = FilterSet(label='evenID', filtercell=even)

Note

The condition we are using in this example serves only as a test; we do not expect that the subgroup of cells with even identifiers differ from all cells, though we expect to halve the samples and thus we can appreciate the finite-size effects.

In this example, we look at the following dynamic observables:

ou = Observable(name='exact-growth-rate', raw='ou')

The ou (Ornstein-Uhlenbeck) observable models the instantaneous growth rate. As it is a numerical simulation, we have some knowledge of the statistics of this process. We import some of them from the metadata:

md = exp.metadata
params = md['ornstein_uhlenbeck_params']
ref_mean = params['target']
ref_var = params['noise']/(2 * params['spring'])
ref_decayrate = params['spring']

Starting with the univariate analysis

To investigate the statistics of a single observable over time, tunacell uses the lineage decomposition to parse samples and computes incrementally one- and two-point functions.

Estimated one-point functions are the number of samples and the average value at each time-point. Estimated two-point functions are the correlation matrix between any couple of time-points, which reduces to the variance for identical times.

The module tunacell.stats.api stores most of the functions to be used.

To perform the computations, we import tunacell.stats.api.compute_univariate() and call it:

from tunacell.stats.api import compute_univariate
univ = compute_univariate(exp, ou, cset=[condition, ])

This function computes one-point and two-point functions as described above and stores the results in univ, a tunacell.stats.single.Univariate instance. Results are reported for the unconditioned data, under the master label, and for each of the conditions provided in the cset list. Each individual group is an instance of tunacell.stats.single.UnivariateConditioned, whose attributes point directly to the estimated one- and two-point functions. These items can be accessed as values of a dictionary:

result = univ['master']
result_conditioned = univ[repr(condition)]

As the master is always defined, one can alternatively use the attribute:

result = univ.master

Inspecting univariate results

The objects result and result_conditioned are instances of the UnivariateConditioned class, where one can find the following attributes: time, count_one, average, count_two, and autocorr; these are Numpy arrays.

To be explicit, the time array is the array of each \(t_i\) where observables have been evaluated. The count_one array stores the corresponding number of samples \(s^{(1)}_i\) (see Background), and the average array stores the \(\langle x(t_i) \rangle\) average values.

One can see an excerpt of the table of one-point functions by typing:

result.display_onepoint(10)  # 10 lines excerpt

which should be like:

   time  counts   average   std_dev
0   0.0     200  0.011725  0.001101
1   5.0     207  0.011770  0.001175
2  10.0     225  0.011780  0.001201
3  15.0     253  0.011766  0.001115
4  20.0     265  0.011694  0.001119
5  25.0     286  0.011635  0.001149
6  30.0     301  0.011627  0.001147
7  35.0     318  0.011592  0.001173
8  40.0     337  0.011564  0.001189
9  45.0     354  0.011578  0.001150

The count_two 2d array stores matrix elements \(s^{(2)}_{ij}\) corresponding to the number of independent lineages connecting time \(t_i\) to \(t_j\), and the attribute autocorr stores the matrix elements \(a_{ij}\) (auto-covariance coefficients). The std_dev column of the one-point table above is in fact computed as the square root of the diagonal of this auto-covariance matrix (the diagonal is the variance at each time point).

An excerpt of the auto-covariance function can be printed:

result.display_twopoint(10)

which should produce something like:

   time-row  time-col  counts  autocovariance
0       0.0       0.0     200    1.211721e-06
1       0.0       5.0     200    1.093628e-06
2       0.0      10.0     200    7.116838e-07
3       0.0      15.0     200    3.415255e-07
4       0.0      20.0     200    6.881773e-07
5       0.0      25.0     200    1.027559e-06
6       0.0      30.0     200    1.053278e-06
7       0.0      35.0     200    5.925049e-07
8       0.0      40.0     200   -7.884958e-08
9       0.0      45.0     200   -8.413113e-08

Examples

To fix ideas, here is how to plot the sample average as a function of time for the whole statistical ensemble:

import matplotlib.pyplot as plt
plt.plot(univ.master.time, univ.master.average)
plt.show()

If one wants to plot the variance as a function of time for the condition results:

import numpy as np
res = univ[repr(condition)]
plt.plot(res.time, np.diag(res.autocorr))

To obtain a representation of the auto-correlation function, we set a time of reference and find the closest index in the time array:

tref = 80.
iref = np.argmin(np.abs(res.time - tref))  # index in time array
plt.plot(res.time, res.autocorr[iref, :])

Such a plot represents the autocorrelation \(a(t_{\mathrm{ref}}, t)\) as a function of \(t\).

We will see below some pre-defined plotting capabilities.

Computations can be exported as text files

To save the computations, just type:

univ.export_text()

This convenient function exports computations as text files, under a folder structure that stores the context of the computation such as the filter set, the various conditions that have been applied, and the different observables over which computation has been performed:

simutest/analysis/filterset/observable/condition

The advantage of such export is that it is possible to re-load parameters from an analysis in a different session.

Plotting results

tunacell comes with the following plotting functions:

from tunacell.plotting.dynamics import plot_onepoint, plot_twopoints

that work with tunacell.stats.single.Univariate instances such as our results stored in univ:

fig = plot_onepoint(univ, mean_ref=ref_mean, var_ref=ref_var, show_ci=True, save=True)

One point plots are saved in the simutest/analysis/filterset/observable folder since all conditions are represented.

The first figure, stored in fig, looks like:

_images/plot_onepoint_exact-growth-rate_ALL.png

Plot of one-point functions computed by tunacell. The first row shows the sample counts vs. time, \(s^{(1)}_i\) vs. \(t_i\). The middle row shows the sample average \(\langle x(t_i) \rangle\) vs. time. Shaded regions show the 99% confidence interval, computed in the large sample size limit with the empirical standard deviation. The bottom row shows the variance \(\sigma^2(t_i)\). The blue line shows results for the whole statistical ensemble, whereas the orange line shows results for the conditioned sub-population (cells with an even identifier).

We can represent two point functions:

fig2 = plot_twopoints(univ, condition_label='master', trefs=[40., 80., 150.],
                      show_exp_decay=ref_decayrate)

The second figure, stored in fig2, looks like so:

_images/plot_twopoints_exact-growth-rate_ALL.png

Plot of two-point functions. Three times of reference are chosen to display the associated functions. The top row shows the sample counts, i.e. the number of independent lineages used in the computation that connect \(t_{\mathrm{ref}}\) to \(t\). The middle row shows the associated auto-correlation functions \(a(t_{\mathrm{ref}}, t)/\sigma^2(t_{\mathrm{ref}})\). The bottom row shows the translated functions \(a(t_{\mathrm{ref}}, t-t_{\mathrm{ref}})/\sigma^2(t_{\mathrm{ref}})\). One can see that they peak at \(t-t_{\mathrm{ref}} \approx 0\), though the decay on both sides is quite irregular compared to the expected behaviour, due to the low sample size.

The view proposed on auto-correlation functions for specific times of reference is not enough to quantify the decay and associate a correlation time. A clever trick to gain statistics is to pool all data where the process is stationary and numerically evaluate \(\tilde{a}\).

Computing the auto-correlation function under stationarity

By inspecting the average and variance in the one-point function figure above, the user can estimate whether the process is stationary and where (over the whole time course, or just over a subset of it). The user is prompted to define regions where the studied process is (or might be) stationary. These regions are saved automatically:

# %% define region(s) for steady state analysis
# call the Regions object initialized on parser
regs = Regions(exp)
# this call reads previously defined regions, show them with
print(regs)

# then use one of the defined regions
region = regs.get('ALL')  # we take the entire time course

Computation options need to be provided. They dictate how the mean value must be subtracted: either the global mean over all time-points within the region, or the local, time-dependent average value; and how segments should be sampled: disjointly or not. Default settings use the global mean value and disjoint segments:

# define computation options
options = CompuParams()

To compute the stationary auto-correlation function \(\tilde{a}\) use:

from tunacell.stats.api import compute_stationary
stat = compute_stationary(univ, region, options)

The first argument is the Univariate instance univ, the second argument is the time region over which to accept samples, and the third are the computation options.

Here our process is stationary by construction over the whole time period of the simulation, so we choose the ‘ALL’ region. Our options are to subtract the global average value of the process, and to accept only disjoint segments for a given time interval: this ensures that samples used for a given time interval are independent (as long as the process is Markovian), so that the confidence interval can be estimated from the standard deviation of all samples for that time interval.
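The pooling idea can be sketched in plain NumPy (a toy illustration, not tunacell's implementation): subtract the global mean, then, for each time interval of k acquisition steps, average the products of fluctuations separated by k steps over all lineages. For simplicity this sketch pools all (possibly overlapping) pairs, whereas the disjoint-segments option described above keeps only non-overlapping ones.

```python
import numpy as np

def stationary_autocorr(lineages, max_lag):
    """Toy estimate of the stationary auto-correlation function.

    lineages: list of 1d arrays, time series along independent lineages.
    Returns an array of length max_lag + 1 with the estimate for each
    lag (in units of the acquisition period).
    """
    # global mean over all time-points (one of the options above)
    mean = np.concatenate(lineages).mean()
    estimates = []
    for lag in range(max_lag + 1):
        products = []
        for series in lineages:
            x = series - mean
            # pairs of fluctuations separated by `lag` steps
            products.extend(x[:len(x) - lag] * x[lag:])
        estimates.append(np.mean(products))
    return np.array(estimates)
```

For lag 0 this returns the variance estimate; the decay with increasing lag then gives access to the correlation time.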

stat is an instance of tunacell.stats.single.StationaryUnivariate which is structured in the same way with respect to master and conditions. Each of its items (e.g. stat.master, or stat[repr(condition)]) is an instance of tunacell.stats.single.StationaryUnivariateConditioned and stores information in the following attributes:

  • time: the 1d array storing time interval values,
  • counts: the 1d array storing the corresponding sample counts,
  • autocorr: the 1d array storing the value of the auto-correlation function \(\tilde{a}\) for corresponding time intervals.
  • dataframe: a pandas.DataFrame instance that collects the data points used in the computation; each row corresponds to a single data point (in a single cell), with information on the acquisition time, the cell identifier, the value of the observable, and as many boolean columns as there are conditions, plus the master (no condition), indicating whether a sample has been taken or not. This dataframe is convenient to draw e.g. marginal distributions.
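Such a dataframe makes sub-population selections straightforward. Here is a minimal sketch with made-up data (the column names other than master are hypothetical; only the layout, with one boolean column per condition, follows the description above):

```python
import pandas as pd

# hypothetical single time-point data with one boolean column per condition
df = pd.DataFrame({
    'time':    [0.0, 5.0, 10.0, 15.0],
    'cellID':  [1, 2, 3, 4],
    'value':   [0.012, 0.010, 0.013, 0.011],
    'master':  [True, True, True, True],    # no condition: all samples
    'even_id': [False, True, False, True],  # hypothetical condition
})

# select the sub-population verifying the condition, e.g. to draw the
# marginal distribution of 'value' in that sub-population
sub = df[df['even_id']]
```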

Plotting results

tunacell provides a plotting function that returns a Figure instance:

from tunacell.plotting.dynamics import plot_stationary
fig = plot_stationary(stat, show_exp_decay=ref_decayrate, save=True)

The first argument must be a tunacell.stats.single.StationaryUnivariate instance. The second parameter displays an exponential decay (to compare with data).

_images/plot_stationary_exact-growth-rate_ALL.png

Plot of stationary autocorrelation function. Top row is the number of samples, i.e. the number of (disjoint) segments of size \(\Delta t\) found in the decomposed lineage time series. Middle row is the auto-correlation function \(\tilde{a}(\Delta t)/\sigma^2(0)\). Confidence intervals are computed independently for each time interval, in the large sample size limit.

Exporting results as text files

Again it is possible to export results as text files under the same folder structure by typing:

stat.export_text()

This will create a tab-separated text file called stationary_<region.name>.tsv that can be read with any spreadsheet reader.

In addition, the dataframe of single time point values is exported as a csv file under the filterset folder as data_<region.name>_<observable.label()>.csv.

A note on loading results

As described above, results can be saved in a specific folder structure that stores not only the numerical results but also the context (filterset, conditions, observables, regions).

Then it is possible to load results by parsing the folder structure and reading the text files. To do so, initialize an analysis object with the same settings, and try to read results from files:

from tunacell.stats.api import load_univariate
# load univariate analysis of experiment defined in parser
univ = load_univariate(exp, ou, cset=[condition, ])

The last call will work only if the analysis has been performed and exported to text files before. Hence a convenient way to work is:

try:
    univ = load_univariate(exp, ou, cset=[condition, ])
except UnivariateIOError as uerr:
    print('Impossible to load univariate {}'.format(uerr))
    print('Launching computation')
    univ = compute_univariate(exp, ou, cset=[condition, ])
    univ.export_text()

Bivariate analysis: cross-correlations

Key questions are which observables correlate, and how they correlate in time. The appropriate quantities to look at are the cross-correlation function \(c(s, t)\) and the stationary cross-correlation function \(\tilde{c}(\Delta t)\) defined above (see Background).

To estimate these functions, one first needs to have run the univariate analyses on the corresponding observables. We take the univariate objects corresponding to the ou and gr observables:

# local estimate of growth rate by using the differentiation of size measurement
# (the raw column 'exp_ou_int' plays the role of cell size in our simulations)
gr = Observable(name='approx-growth-rate', raw='exp_ou_int',
                differentiate=True, scale='log',
                local_fit=True, time_window=15.)
univ_gr = compute_univariate(exp, gr, [condition, ])

# import api functions
from tunacell.stats.api import (compute_bivariate,
                                compute_stationary_bivariate)
# compute cross-correlation matrix
biv = compute_bivariate(univ, univ_gr)
biv.export_text()

# compute cross-correlation function under stationarity hypothesis
sbiv = compute_stationary_bivariate(univ, univ_gr, region, options)
sbiv.export_text()

These objects again point to items corresponding to the unconditioned data and each of the conditions.

Again, for cross-correlation functions of two time-points (results stored in biv), the low sample size prevents a smooth numerical estimate, and we turn to the estimate under the stationarity hypothesis in order to pool all samples.

Inspecting cross-correlation results

We can inspect the master result:

master = biv.master

or any of the conditioned dataset:

cdt = biv[repr(condition)]

where condition is an item of each of the cset lists (one for each single object). Important attributes are:

  • times: a couple of sequences of times, corresponding respectively to the times where each item in singles has been evaluated, i.e. \(\{ \{s_i\}_i, \{t_j\}_j \}\) where \(\{s_i\}_i\) is the sequence of times where the first single item has been evaluated, and \(\{t_j\}_j\) the sequence of times where the second has been. Note that the lengths \(p\) and \(q\) of these sequences may not coincide.
  • counts: the \((p, q)\) matrix giving for entry \((i, j)\) the number of samples in data where an independent lineage has been drawn between times \(s_i\) and \(t_j\).
  • corr: the \((p, q)\) matrix giving for entry \((i, j)\) the value of the estimated correlation \(c(s_i, t_j)\).
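To make the construction of such matrices concrete, here is a NumPy sketch of the estimation (not tunacell's code): given samples of the two observables along independent lineages, entry \((i, j)\) averages the product of fluctuations at times \(s_i\) and \(t_j\).

```python
import numpy as np

def cross_covariance(xs, ys):
    """Toy estimate of the cross-covariance matrix c(s_i, t_j).

    xs: (n_lineages, p) array, first observable sampled at times s_i
    ys: (n_lineages, q) array, second observable sampled at times t_j
    Returns the (p, q) matrix of covariances.
    """
    # fluctuations around the ensemble average at each time-point
    dx = xs - xs.mean(axis=0)
    dy = ys - ys.mean(axis=0)
    # average the products over lineages
    return dx.T @ dy / len(xs)
```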

It is possible to export data in text format using:

biv.export_text()

It will create a new folder <obs1>_<obs2> under each condition folder and store the items listed above in text files.

Inspecting cross-correlation function at stationarity

In the same spirit:

master = sbiv.master

gets among its attributes array, which stores time intervals, counts, and correlation values as a NumPy structured array. The dataframe attribute points to a pandas.DataFrame that recapitulates single time point data in a table, with boolean columns for each condition.

It is possible to use the same plotting function used for stationary autocorrelation functions:

plot_stationary(sbiv, show_exp_decay=ref_decayrate)

which should plot something like:

_images/plot_stationary_exact-growth-rate---approx-growth-rate_ALL.png

Plot of the stationary cross-correlation function of the Ornstein-Uhlenbeck process with the local growth rate estimate using the exponential of the integrated process. It is symmetric and not very informative since it should more or less collapse with the auto-correlation of one of the two observables, since the second is merely a local approximation of the first.

Other examples

If one performs a similar analysis with the two cell-cycle observables, for example:

_images/plot_stationary_average-growth-rate---division-size_ALL.png

Plot of the stationary cross-correlation function of the cell-cycle average growth rate with the cell length at division, with respect to the number of generations. We expect that a fluctuation in cell-cycle average growth rate influences length at division in the same, or in later generations. This is why we observe the highest values of correlation for \(\Delta t = 0,\ 1,\ 2\) generations, and nearly zero correlation for previous generations (there is no size control mechanism in this simulation).

Footnotes

[1]This document has been written during Roland Garros tournament…

Filters

Outliers may have escaped the quality controls of segmentation/tracking tools, and thus further filtering may be needed when tunacell analyses their output data. For example, filamentous cells may have been reported in the data, but one might want to exclude them from a given analysis.

tunacell provides a set of user-customisable filters that allow the user to define properly the statistical ensemble of samples over which the analysis will be performed.

In addition to removing outliers, filters are also used for conditional analysis, as they allow dividing the statistical ensemble into sub-populations of samples that verify certain rules.

Series of filters are already defined for each of the following types: cell, lineage, colony, and container. In addition, boolean operations AND, OR, NOT can be used within each type. Filters of different types are then combined in FilterSet instances: one is used to define the statistical ensemble (remove outliers), and optionally, others may be used to create sub-populations of samples for comparative analyses.

How individual filters work

Filters are instances of the FilterGeneral class. A given filter class is instantiated (possibly) with parameters that define how the filter works. The instantiated object is then callable on the object to be filtered: it returns either True (the object is valid) or False (the object is rejected).

Four main subclasses are derived from FilterGeneral, one for each structure that tunacell recognizes: FilterCell for Cell objects, FilterTree for Colony objects, FilterLineage for Lineage objects, FilterContainer for Container objects.

Example: testing the parity of cell identifier

The filter FilterCellIDparity has been designed for illustration: it tests whether the cell identifier is even (or odd).

First we set the filter by instantiating its class with appropriate parameter:

>>> from tunacell.filters.cells import FilterCellIDparity
>>> filter_even = FilterCellIDparity(parity='even')

For this filter class, there is only one keyword parameter, parity, which we have set to 'even': accept cells with an even identifier, reject cells with an odd identifier.

First, we can print the string representation:

>>> print(str(filter_even))
CELL, Cell identifier is even

The first uppercase word in the message indicates the type of objects the filter acts upon. The rest of the message is a label that has been defined in the class definition.

We set up two Cell instances, one with an even identifier, one with an odd identifier:

>>> from tunacell.base.cell import Cell
>>> mygoodcell = Cell(identifier=12)
>>> mybadcell = Cell(identifier=47)

Then we can perform the test over both objects:

>>> print(filter_even(mygoodcell))
True
>>> print(filter_even(mybadcell))
False

We also mention another feature implemented in the representation of such filters:

>>> print(repr(filter_even))
FilterCellIDparity(parity='even', )

Such a representation is the string one would type to re-instantiate the filter. tunacell uses this representation when a data analysis is exported to text files: when tunacell reads back these exported files, it can re-load the objects defined in the exported session. Hence there is no need to remember the precise parameters adopted in a particular analysis: if it’s exported, it can be loaded later on.
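The round-trip can be demonstrated with a minimal stand-in class that follows the same pattern (a sketch mimicking tunacell's filters, not the actual class):

```python
class ParityFilter:
    """Minimal stand-in showing the repr()/eval() round-trip."""

    def __init__(self, parity='even'):
        self.parity = parity

    def __repr__(self):
        # the string one would type to re-instantiate the filter
        return "ParityFilter(parity={!r})".format(self.parity)

    def __call__(self, identifier):
        even = int(identifier) % 2 == 0
        return even if self.parity == 'even' else not even


original = ParityFilter(parity='odd')
# evaluating the representation re-creates an equivalent filter
clone = eval(repr(original))
```

This is the mechanism that allows an exported analysis to be re-loaded in a later session.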

Creating a new filter

A few filters are already defined in the following modules:

  • tunacell.filters.cells for filters acting on cells,
  • tunacell.filters.lineages for filters acting on lineages,
  • tunacell.filters.trees for filters acting on colonies,
  • tunacell.filters.containers for filters acting on containers.

Within each type, filters can be combined with boolean operations (see below), which allows the user to explore a range of filters. However, users may need to define their own filter(s), and they are encouraged to do so following these general guidelines:

  • define a label attribute (human-readable message, which was 'Cell identifier is even' in our previous example),
  • define the func() method that performs the boolean testing.

From the module tunacell.filters.cells we copied below the class definition of the filter used in our previous example:

class FilterCellIDparity(FilterCell):
    """Test whether identifier is odd or even"""

    def __init__(self, parity='even'):
        self.parity = parity
        self.label = 'Cell identifier is {}'.format(parity)
        return

    def func(self, cell):
        # test if even
        try:
            even = int(cell.identifier) % 2 == 0
            if self.parity == 'even':
                return even
            elif self.parity == 'odd':
                return not even
            else:
                raise ValueError("Parity must be 'even' or 'odd'")
        except ValueError as ve:
            print(ve)
            return False

Although this filter may well be useless in actual analyses, it shows how to define a filter class. Also have a look at filters defined in the above-mentioned modules.

How to combine individual filters together with boolean operations

Filters already implemented are “atomic” filters, i.e. they perform one test. It is possible to combine many atomic filters of the same type (type refers to the object type the filter applies to: cell, lineage, colony, container) by using boolean filter types.

There are 3 of them, defined in tunacell.filters.main: FilterAND, FilterOR, FilterNOT. The first two accept any number of filters, which are combined with AND/OR logic respectively; the third accepts one filter as argument.

With these boolean operations, complex combinations of atomic filters can be created.
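The semantics of these boolean types can be sketched in pure Python (an illustration of the AND/NOT logic only, not tunacell's implementation in tunacell.filters.main):

```python
class FilterAND:
    """Accept an object only if every sub-filter accepts it."""
    def __init__(self, *filters):
        self.filters = filters

    def __call__(self, obj):
        return all(f(obj) for f in self.filters)


class FilterNOT:
    """Accept an object only if the wrapped filter rejects it."""
    def __init__(self, filt):
        self.filt = filt

    def __call__(self, obj):
        return not self.filt(obj)


# two atomic filters acting on plain integers, for illustration
is_even = lambda n: n % 2 == 0
is_small = lambda n: n < 10

# accept even objects that are NOT small
combined = FilterAND(is_even, FilterNOT(is_small))
```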

How to define a FilterSet instance

So far we saw how to use filters for each type of structures, independently: cell, lineage, colony, and container.

The FilterSet registers filters to be applied on each of these types. It is used to define the statistical ensemble of valid samples, or to define a condition (rules to define a sub-population from the statistical ensemble).

Explicitly, if we wanted to use filter_even from the example above as the only filter defining the statistical ensemble, we would define:

from tunacell.filters.main import FilterSet
fset = FilterSet(filtercell=filter_even)

(the other keyword parameters are filterlineage, filtertree, and filtercontainer)

Tunacell’s data model

tunacell’s top level data structure matches the input file scaffold. Raw data is stored in Cell instances, connected through a tree structure arising from cell divisions.

Top level structures: Experiment and Container

tunacell’s top level structure is Experiment, which handles the experiment as a whole. We refer to the API documentation for details about attributes and methods. In particular, it stores the list of container files, which allows opening/reading such containers.

These are stored as Container instances, whose label is set by the file name. Such objects get two major attributes: cells and trees. The former is the list of Cell instances imported from the raw data file; the latter is the list of reconstructed trees formed by dividing cells, stored as Colony instances.

Low-level structures: Cell and Colony

These classes are derived from the treelib package Node and Tree classes respectively.

Raw data is stored under the data attribute of Cell instances.

Methods are defined at the Container level to retrieve objects corresponding to an identifier. More importantly there is an iterator over colonies that can be used when parsing data for statistical analysis.

Tree decomposition: Lineage

For studying dynamics over times longer than one or a few cell cycles, it is necessary to build time series of observables over sequences of more than one cell.

We use features from the treelib package to decompose trees into independent lineages. A lineage is a sequence \(\{c_i\}_i\) of cells related through successive divisions: cell \(c_i\) is a daughter of cell \(c_{i-1}\), and the mother of cell \(c_{i+1}\).

One way to decompose a tree into lineages is to build the set of lineages from the root to all leaves. Such a decomposition implies that some cells may belong to more than one lineage, and using it requires some statistical weighting procedure.

To avoid such a weighting procedure, we use a decomposition into independent lineages, which ensures that each cell is counted once and only once. More specifically, our method to decompose a tree into independent lineages is to repeatedly traverse the tree, starting from the root and choosing randomly one daughter cell at each division until a leaf is reached.
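This decomposition can be sketched in a few lines of Python (a toy version working on a dict-encoded tree, not tunacell's code, which operates on treelib objects): each pass walks down from an unvisited starting cell, choosing one unvisited daughter at random, and the daughters that were not chosen become starting points of later lineages.

```python
import random

def decompose(tree, root):
    """Toy independent-lineage decomposition.

    tree maps each cell identifier to the list of its daughters.
    Each cell ends up in exactly one lineage.
    """
    visited = set()
    starts = [root]            # cells where a new lineage may start
    lineages = []
    while starts:
        cell = starts.pop()
        if cell in visited:
            continue
        lineage = []
        while cell is not None:
            visited.add(cell)
            lineage.append(cell)
            daughters = [d for d in tree.get(cell, []) if d not in visited]
            if daughters:
                # continue along one daughter chosen at random...
                cell = random.choice(daughters)
                # ...and keep the others as future lineage starts
                starts.extend(d for d in daughters if d != cell)
            else:
                cell = None
        lineages.append(lineage)
    return lineages
```

Whatever the random choices, every cell appears in exactly one lineage, which is the property that spares the statistical weighting procedure.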

A lineage is defined as a Lineage instance. Such an object gets methods to build the corresponding time series for a given observable.

Numerical simulations in tunacell

Using the script

When installed using pip, a tunasimu executable is provided to run numerical simulations and save results on local directories. These files can be used to try tunacell’s analysis tools.

Such a command comes with various parameters, which will be printed upon calling:

$ tunasimu -h

There is a list of (optional) parameters.

What are these simulations about?

ou is the value of the Ornstein-Uhlenbeck random process simulated in each cell, ou_int is the integrated value of the random process, reset to zero at each cell birth, and exp_ou_int is the exponential of the latter value. One can think of the Ornstein-Uhlenbeck process as the instantaneous growth rate of the cell, so that exp_ou_int can be associated with cell length.

Experiment

This module implements the first core class, Experiment, and functions to parse containers, retrieve and build data.

Each experiment consists of multiple containers where data is stored under container folders. A container may correspond to a single field of view, or to a subset thereof (e.g. a single channel in microfluidic experiments).

Such containers must meet two requirements:
  1. Cell identifiers are unique within a container;
  2. Lineage reconstruction is defined and performed within a single container.
This module stores classes and functions that allow to:
  • explore data structure (and metadata if provided)
  • keep track of every container where to look for data
  • extract data in a Container instance from text containers
  • build cells filiation, store time-lapse microscopy data, build trees
class tunacell.base.experiment.Experiment(path='.', filetype=None, filter_set=None, count_items=False)

General class that stores experiment details.

Creates an Experiment instance from reading a file, records path, filetype, reads metadata, stores the list of containers.

Parameters:
  • path (str) – path to experiment root file
  • filetype (str {None, 'text', 'supersegger'}) – leave to None for automatic detection.
abspath

absolute path on disk of main directory for text containers

Type:str
label

experiment label

Type:str
filetype

one of the available file type (‘simu’ is not a filetype per se…)

Type:str {‘text’, ‘supersegger’}
fset

filterset to be applied when parsing data

Type:FilterSet instance
datatype

provides the datatype of raw data stored in each Cell instance (.data). This attribute is defined only for the text filetype, when a descriptor file is associated to the experiment.

Type:Numpy.dtype instance
metadata

experiment metadata

Type:Metadata instance
containers

list of absolute paths to containers

Type:list of pathlib.Path
period

time interval between two successive acquisitions (this should be defined in the experiment metadata)

Type:float
iter_containers(self, read=True, build=True, prefilt=None,
extend_observables=False, report_NaNs=True, size=None, shuffle=False)

browse containers

analysis_path

Get analysis path (with appropriate filterset path)

count_items(independent=True, seed=None, read=True)

Parse data to count items: cells, colonies, lineages, containers

Parameters:
  • independent (bool {True, False}) – lineage decomposition parameter
  • seed (int, or None) – lineage decomposition parameter
  • read (bool {True, False}) – try to read it in analysis folder
fset

Get current FilterSet

get_container(label, read=True, build=True, prefilt=None, extend_observables=False, report_NaNs=True)

Open specified container from this experiment.

Parameters:
  • label (str) – name of the container file to be opened
  • read (bool (default True)) – whether to read data and extract Cell instances list
  • build (bool (default True)) – when read option is active, whether to build Colony instances
  • extend_observables (bool (default False)) – whether to compute secondary observables from raw data
  • report_NaNs (bool (default True)) – whether to report for NaNs found in data
Returns:

container

Return type:

Container instance

Raises:
  • ParsingExperimentError – when no container corresponds in this experiment
  • ParsingContainerError – when, despite an existing container filename, parsing of the container failed and nothing is loaded
info()

Show information about the experiment

iter_cells(size=None, shuffle=False)

Iterate through valid cells.

Applies all filters defined in fset.

Parameters:
  • size (int (default None)) – limit the number of lineages to size. Works only in mode=’all’
  • shuffle (bool (default False)) – whether to shuffle the ordering of lineages when mode=’all’
Yields:

cell (Cell instance) – filtering removed outlier cells, containers, colonies, and lineages

iter_colonies(filter_for_colonies='from_fset', size=None, shuffle=False)

Iterate through valid colonies.

Parameters:
  • filter_for_colonies (FilterTree instance or str {'from_fset', 'none'}) –
  • size (int (default None)) – limit the number of colonies to size. Works only in mode=’all’
  • shuffle (bool (default False)) – whether to shuffle the ordering of colonies when mode=’all’
Yields:

colony (Colony instance) – filtering removed outlier cells, containers, and colonies

iter_containers(read=True, build=True, filter_for_cells='from_fset', filter_for_containers='from_fset', apply_container_filter=True, extend_observables=False, report_NaNs=True, size=None, shuffle=False)

Iterator over containers.

Parameters:
  • size (int (default None)) – number of containers to be parsed
  • read (bool (default True)) – whether to read data and extract Cell instances
  • build (bool (default True), called only if read is True) – whether to build colonies
  • filter_for_cells (FilterCell instance, or str {'from_fset', 'none'}) – filter applied to cells when data files are parsed
  • filter_for_containers (FilterContainer instance or str {'from_fset', 'none'}) – filter applied to containers when data files are parsed
  • extend_observables (bool (default False)) – whether to construct secondary observables from raw data
  • report_NaNs (bool (default True)) – whether to report for NaNs found in data
  • shuffle (bool (default False)) – when size is set to a number, whether to randomize ordering of upcoming containers
Returns:

Return type:

iterator over Container instances of current Experiment instance.

iter_lineages(filter_for_lineages='from_fset', size=None, shuffle=False)

Iterate through valid lineages.

Parameters:
  • filter_for_lineages (FilterLineage instance or str {'from_fset', 'none'}) – filter lineages
  • size (int (default None)) – limit the number of lineages to size. Works only in mode=’all’
  • shuffle (bool (default False)) – whether to shuffle the ordering of lineages when mode=’all’
Yields:

lineage (Lineage instance) – filtering removed outlier cells, containers, colonies, and lineages

period

Return the experimental level period

The experimental level period is defined as the smallest acquisition period over all containers.

raw_text_export(path='.', metadata_extension='.yml')

Export raw data as text containers in correct directory structure.

Parameters:
  • path (str) – path to experiment root directory
  • metadata_extension (str (default '.yml')) – type of metadata file (now only yaml file works)
exception tunacell.base.experiment.FiletypeError
exception tunacell.base.experiment.ParsingExperimentError
tunacell.base.experiment.count_items(exp, independent_decomposition=True, seed=None)

Parse the experiment, with associated FilterSet, and count items

Parser

This module provides elements for parsing data manually, i.e. getting a handful of samples and extracting specific structures (colony, lineage, cell) from them.

  • Parser: handles how to parse an experiment with a given filterset
class tunacell.base.parser.Parser(exp=None, filter_set=None)

Defines how user wants to parse data.

Parameters:
  • exp (Experiment instance) –
  • filter_set (FilterSet instance) – this is the set of filters used to read/build data and for iterators (usually, only .cell_filter and .container_filter are used)
add_sample(*args)

Add sample to sample list.

Parameters:args (list) – list of items such as: integer, string, couple, and/or dict:
  • an integer denotes the number of sample_ids to be chosen randomly (if many integers are given, only the first is used);
  • a string will result in loading the corresponding Container, with a cell identifier chosen randomly;
  • a couple (str, cellID) denotes (container_label, cell_identifier);
  • a dictionary should provide a ‘container’ key and a ‘cellID’ key.
clear_samples()

Erase all samples.

get_cell(sample_id)

Get Cell instance corresponding to sample_id.

Parameters:sample_id (dict) – element of self.samples
Returns:cell – corresponding to sample_id
Return type:Cell instance
get_colony(sample_id)

Get Colony instance corresponding to sample_id.

Parameters:sample_id (dict) – element of self.samples
Returns:colony – corresponding to sample_id
Return type:Colony instance
get_container(sample_id)

Get Container instance corresponding to sample_id.

Parameters:sample_id (dict) – element of self.samples
Returns:container
Return type:Container instance
get_lineage(sample_id)

Get Lineage instance corresponding to sample_id.

Parameters:sample_id (dict) – element of self.samples
Returns:lineage – corresponding to sample_id
Return type:Lineage instance
get_sample(index, level='cell')

Return sample corresponding to index.

Parameters:
  • index (int) – index of sample id in self.samples
  • level (str {'cell'|'lineage'|'colony'}) –
Returns:

structure level corresponding to sample id

Return type:

out

info_samples()

Table output showing stored samples.

iter_cells(mode='samples', size=None)

Iterate through valid cells.

Parameters:
  • mode (str {'samples'} (default 'samples')) – whether to iterate over all cells (up to number limitation), or over registered samples
  • size (int (default None)) – limit the number of lineages to size. Works only in mode=’all’
Yields:

cell (Cell instance) – filtering removed outlier cells, containers, colonies, and lineages

iter_colonies(mode='samples', size=None)

Iterate through valid colonies.

Parameters:
  • mode (str {'samples'} (default 'samples')) – whether to iterate over all colonies (up to number limitation), or over registered samples
  • size (int (default None)) – limit the number of colonies to size.
Yields:

colony (Colony instance) – filtering removed outlier cells, containers, and colonies

iter_containers(mode='samples', size=None)

Iterate through valid containers.

Parameters:
  • mode (str {'samples'} (default 'samples')) – iterates over containers pointed by parser.samples
  • size (int (default None)) – number of containers to be parsed
Yields:

container (tunacell.core.Container instance) – filtering removed outlier cells, containers

iter_lineages(mode='samples', size=None)

Iterate through valid lineages.

Parameters:
  • mode (str {'samples'} (default 'samples')) – whether to iterate over all lineages (up to number limitation), or over registered samples
  • size (int (default None)) – limit the number of lineages to size.
Yields:

lineage (Lineage instance) – filtering removed outlier cells, containers, colonies, and lineages

load_experiment(path, filetype='text')

Loads an experiment from path to file.

Parameters:
  • path (str) – path to root directory (‘text’), or to datafile (‘h5’)
  • filetype (str {'text', 'h5'}) –
remove_sample(index, verbose=True)

Remove sample of index in sample list.

set_filter(fset)

Set build filter.

Parameters:fset (FilterSet instance) –

Observable

This module provides API class definition to define observables.

Classes

class tunacell.base.observable.FunctionalObservable(name=None, f=None, observables=[])

Combination of Observable instances

Parameters:
  • name (str) – user defined name for this observable
  • f (callable) – the function to apply to observables
  • observables (list of Observable instances) – parameters of the function f to be applied

Warning

Contrary to Observable, instances of FunctionalObservable cannot be represented as a string using repr() such that eval() on that string would return a new instance with identical parameters. This is because the applied function is difficult to serialize as a string while keeping its definition human-readable.

label

get unique string identifier

latexify(show_variable=True, plus_delta=False, shorten_time_variable=False, prime_time=False, as_description=False, use_name=None)

Latexify observable name

mode

Returns mode depending on observables passed as parameters

timing

Return timing depending on observables passed as parameters

class tunacell.base.observable.Observable(name=None, from_string=None, raw=None, differentiate=False, scale='linear', local_fit=False, time_window=0.0, join_points=3, mode='dynamics', timing='t', tref=None)

Defines how to retrieve observables.

Parameters:
  • name (str) – user name for this observable (can be one of the raw observable)
  • raw (str (default None)) – raw name of the observable: must be a column name of raw data, i.e. first element of one entry of Experiment.datatype
  • differentiate (boolean (default False)) – whether to differentiate raw observable
  • scale (str {'linear', 'log'}) – expected scaling form as a function of time (used for extrapolating values at boundaries, including in the local_fit procedure)
  • local_fit (boolean (default False)) – whether to perform local fit procedure
  • time_window (float (default 0.)) – time window over which the local fit procedure is applied (used only when local_fit is activated)
  • join_points (int (default 3)) – number of points over which extrapolation procedure is led by fitting linearly [the scale of] the observable w.r.t time
  • mode (str {'dynamics', 'birth', 'division', 'net-increase-additive', 'net-increase-multiplicative', 'rate', 'average'}) –

    mode used to retrieve data:

    • ’dynamics’: all timepoints are retrieved
    • ’birth’: only birth value is retrieved
    • ’division’: only division value is retrieved
    • ’net-increase-additive’: difference between division value
      and birth value
    • ’net-increase-multiplicative’: ratio between division value
      and birth value
    • ’rate’: rate of linear fit of [scale of] observable
    • ’average’: average of observable over cell cycle
  • timing (str {'t', 'b', 'd', 'm', 'g'}) –

    set the time at which cell cycle observable is associated:

    • ’t’ : time-lapse timing (associated to mode ‘dynamics’)
    • ’b’ : cell cycle birth time
    • ’d’ : cell cycle division time
    • ’m’ : cell cycle midpoint (half time)
    • ’g’ : cell cycle generation index
  • tref (float or 'root' (default None)) – when timing is set to 'g', sets the 0th generation to the cell that bounds this reference time; when timing is set to 't' (time-lapse timing), time values are translated by subtracting tref (when given as a float), or by taking the colony root cell's last time value as origin (when 'root').
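The cell-cycle modes listed above can be illustrated with a plain-Python sketch. The helper below is hypothetical, not tunacell code; it only mimics the documented semantics on a toy cell-cycle time series:

```python
import math

def cell_cycle_value(times, values, mode, scale='linear'):
    """Toy illustration of the documented cell-cycle modes."""
    if mode == 'birth':
        return values[0]                      # value at birth
    if mode == 'division':
        return values[-1]                     # value at division
    if mode == 'net-increase-additive':
        return values[-1] - values[0]         # division minus birth
    if mode == 'net-increase-multiplicative':
        return values[-1] / values[0]         # division over birth
    if mode == 'average':
        return sum(values) / len(values)      # cell-cycle average
    if mode == 'rate':
        # slope of a linear fit of [the scale of] the observable w.r.t. time
        ys = [math.log(v) for v in values] if scale == 'log' else list(values)
        n = len(times)
        tbar = sum(times) / n
        ybar = sum(ys) / n
        num = sum((t - tbar) * (y - ybar) for t, y in zip(times, ys))
        den = sum((t - tbar) ** 2 for t in times)
        return num / den
    raise ValueError(mode)

times = [0.0, 10.0, 20.0]
lengths = [2.0, 3.0, 4.0]
print(cell_cycle_value(times, lengths, 'birth'))                       # 2.0
print(cell_cycle_value(times, lengths, 'net-increase-multiplicative')) # 2.0
print(cell_cycle_value(times, lengths, 'rate'))                        # 0.1
```

With scale='log', the 'rate' mode would fit the log of the values instead, matching the documented use of scale for exponentially growing quantities.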
as_latex_string

Export as LaTeX string. Old format, replaced by latexify

as_string_table()

Human readable output as a table.

as_timelapse()

Convert current observable to its dynamic counterpart

This is needed when computing cell-cycle observables.

label

Outputs a unique string representation

This method creates a string label that specifies each parameter needed to re-construct the Observable. The output string is (kind of) human readable. More importantly, it is suitable as a filename (only alphanumeric characters and underscores), and in fact serves to name directories in the analysis folder.

Note

__repr__() : returns another string representation, that can be called by the built-in eval(), to instantiate a new object with identical functional parameters.

latexify(show_variable=True, plus_delta=False, shorten_time_variable=False, prime_time=False, as_description=False, use_name=None)

Returns a latexified string for observable

Parameters:
  • show_variable (bool) – whether to print out time/generation variable
  • plus_delta (bool) – whether to add a $\Delta$ to time/generation variable; used for auto- and cross-correlation labeling
  • shorten_time_variable (bool) – when active, will display only $t$/$g$
  • prime_time (bool) – whether to set a prime on the time/generation variable
  • as_description (bool (default False)) – sets up the description of the observable from rules to compute it (derivatives, log, and raw label)
  • use_name (str (default None)) – when the observable name is too cumbersome to be printed, and the user wants to choose a specific name for such a printout
load_from_string(codestring)

Set Observable instance from string code created by label method

Parameters:codestring (str) – must follow some rules for parsing
exception tunacell.base.observable.ObservableError
exception tunacell.base.observable.ObservableNameError
exception tunacell.base.observable.ObservableStringError
tunacell.base.observable.set_observable_list(*args, **kwargs)

Make raw and functional observable lists for running analyses

Parameters:
  • *args – Variable length argument list of Observable or FunctionalObservable instances
  • **kwargs – Accepted keyword arguments: ‘filters=[]’ with a list of FilterSet or FilterGeneral instance (must have a .obs attribute)
Returns:

lists of raw observables, functional observables (correctly ordered)

Return type:

raw_obs, func_obs

tunacell.base.observable.unroll_func_obs(obs)

Generator over flattened FunctionalObservable instances

It recursively inspects the observable content of the argument to yield all nested FunctionalObservable instances. They are ordered from the innermost to the outermost layer of nesting. If you need to compute f(g(h(x))), where x is a raw Observable, the generator yields h, then g, then f, so that evaluation can be performed in that order.

Parameters:obs (FunctionalObservable instance) – the observable to inspect
Yields:FunctionalObservable instance – The generator yields funcObs instance in appropriate order (from lower to higher level in nested-ness).
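The inner-first ordering described above can be illustrated with a plain-Python analogue. The Node class below is a hypothetical stand-in for FunctionalObservable, not tunacell's implementation; leaves stand for raw Observables and are skipped, as in unroll_func_obs:

```python
class Node:
    """Hypothetical stand-in for a functional observable: a named function node."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # raw observables are leaves (no children)

def unroll(node):
    """Yield functional nodes from deepest to shallowest (post-order traversal)."""
    for child in node.children:
        yield from unroll(child)
    if node.children:  # leaves stand for raw Observables: skip them
        yield node

# f(g(h(x))) with x a raw observable
x = Node('x')
h = Node('h', [x])
g = Node('g', [h])
f = Node('f', [g])
print([n.name for n in unroll(f)])  # ['h', 'g', 'f']
```

Evaluating in the yielded order guarantees that each function's inputs are already computed when it is reached.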
tunacell.base.observable.unroll_raw_obs(obs)

Returns a generator over flattened list of Observable instances

Parameters:obs ((list of) Observable or FunctionalObservable instances) –
Yields:Observable instance – flattened Observable instances found in the argument, descending into nested lists and into FunctionalObservable instances

Filters

This module defines the structure for filters objects (see FilterGeneral).

Various filters are then defined in submodules, by subclassing.

The subclass needs to define at least two things:
  • the attribute _label (string or unicode) which defines filter operation
  • the func method, that performs and outputs the boolean test

Some useful functions are defined here as well.
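As a sketch of the subclassing contract above (plain Python mirroring, but not importing, tunacell's classes): a subclass supplies a _label attribute and a boolean func method, and instances are callable:

```python
class FilterSketch:
    """Minimal stand-in for FilterGeneral: a callable, labelled boolean test."""
    _label = 'TRUE'

    def func(self, target):
        # default operation returns True; subclasses override this
        return True

    def __call__(self, target):
        return self.func(target)

    @property
    def label(self):
        return self._label

class EvenID(FilterSketch):
    """Keep targets with an even integer identifier (cf. FilterCellIDparity)."""
    _label = 'even identifier'

    def func(self, target):
        return target % 2 == 0

keep_even = EvenID()
print(keep_even(4), keep_even(7), keep_even.label)  # True False even identifier
```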

class tunacell.filters.main.FilterAND(*filters)

Defines boolean AND operation between same type filters.

Parameters:filters (sequence of FilterGeneral instances) –
Returns:will perform AND boolean operation between the various filters passed as arguments.
Return type:FilterGeneral instance

Notes

Defines a FilterTrue

func(target)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

exception tunacell.filters.main.FilterArgError

Exception raised when Argument format is not suitable.

class tunacell.filters.main.FilterBoolean

General class to implement Boolean operations between filters

label

Get label of applied filter(s)

exception tunacell.filters.main.FilterError

Superclass for errors while filtering

class tunacell.filters.main.FilterGeneral

General class for filtering cell (i.e. tunacell.base.cell.Cell) instances.

An important property is that instances are callable, and that the action of a filter is described by a human-readable label.

func(*args)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

label

Get label of applied filter(s)

obs

Provides the list of hidden observables

exception tunacell.filters.main.FilterLabelError

Exception when label is not (properly) set.

class tunacell.filters.main.FilterNOT(filt)

Defines boolean NOT operation on filter.

Parameters:filter (FilterGeneral instance) –
func(target)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.main.FilterOR(*filters)

Defines boolean OR operation between same type filters.

Parameters:filters (sequence of FilterGeneral instances) –
func(target)

This is the boolean operation to define in specific Filter instances

Default operation returns True.
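The AND, OR, and NOT combinators can be sketched the same way with plain callables (illustrative only; the real tunacell classes additionally check that combined filters share the same type and compose their labels):

```python
class And:
    """Keep targets accepted by every filter (cf. FilterAND)."""
    def __init__(self, *filters):
        self.filters = filters
    def __call__(self, target):
        return all(f(target) for f in self.filters)

class Or:
    """Keep targets accepted by at least one filter (cf. FilterOR)."""
    def __init__(self, *filters):
        self.filters = filters
    def __call__(self, target):
        return any(f(target) for f in self.filters)

class Not:
    """Keep targets rejected by the filter (cf. FilterNOT)."""
    def __init__(self, filt):
        self.filt = filt
    def __call__(self, target):
        return not self.filt(target)

is_even = lambda n: n % 2 == 0
is_small = lambda n: n < 10
print(And(is_even, is_small)(4))   # True
print(Or(is_even, is_small)(11))   # False
print(Not(is_even)(3))             # True
```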

exception tunacell.filters.main.FilterParamsError

Exception when parameters are not set for a given test.

class tunacell.filters.main.FilterSet(label=None, filtercell=FilterTRUE(), filterlineage=FilterTRUE(), filtertree=FilterTRUE(), filtercontainer=FilterTRUE())

Collects filters of each type in a single object

obs

Provides the list of hidden observables

class tunacell.filters.main.FilterTRUE

Returns True for an argument of any type; can be used in Boolean filters

We need this artificial filter to plug into default FilterSets.

func(*args)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

exception tunacell.filters.main.FilterTypeError

Error raised when one tries to combine different types of filters

tunacell.filters.main.bounded(arg, lower_bound=None, upper_bound=None)

Function that tests whether the argument is bounded.

By convention, the lower bound is included and the upper bound is excluded; the test is lower_bound <= arg < upper_bound. If arg is an iterable, the test must be satisfied for every element (hence the minimal value must be greater than or equal to lower_bound and the maximal value must be lower than the upper bound).

Parameters:
  • arg (int or float) – quantity to be tested for bounds
  • lower_bound (int or float (default None)) –
  • upper_bound (int or float (default None)) –
Returns:

Return type:

Boolean
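A plain-Python sketch of this convention (lower bound included, upper bound excluded, iterables tested element-wise) might read as follows; this is an illustrative re-implementation, not tunacell's code:

```python
def bounded_sketch(arg, lower_bound=None, upper_bound=None):
    """Illustrative test: lower_bound <= arg < upper_bound, element-wise for iterables."""
    try:
        values = list(arg)   # iterable: every element must pass
    except TypeError:
        values = [arg]       # scalar
    for v in values:
        if lower_bound is not None and v < lower_bound:
            return False
        if upper_bound is not None and v >= upper_bound:
            return False
    return True

print(bounded_sketch(5, 0, 10))          # True  (0 <= 5 < 10)
print(bounded_sketch(10, 0, 10))         # False (upper bound excluded)
print(bounded_sketch([1, 2, 9], 1, 10))  # True  (all elements pass)
```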

This module defines filters for Cell instances

class tunacell.filters.cells.FilterCell

General class for filtering cell objects (reader.Cell instances)

class tunacell.filters.cells.FilterCellAny

Class that does not filter anything.

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterCellIDbound(lower_bound=None, upper_bound=None)

Test class

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterCellIDparity(parity='even')

Test whether identifier is odd or even

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterCompleteCycle(daughter_min=1)

Test whether a cell has a given parent and at least one daughter.

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterCycleFrames(lower_bound=None, upper_bound=None)

Check whether a cell has a minimal number of data points.

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterCycleSpanIncluded(lower_bound=None, upper_bound=None)

Check that cell cycle time interval is within valid bounds.

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterData

Default filter: tests only whether the cell exists and cell.data is non-empty.

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterDaughters(daughter_min=1, daughter_max=2)

Test whether a given cell has at least one daughter cell

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterHasParent

Test whether a cell has an identified parent cell

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterLengthIncrement(lower_bound=None, upper_bound=None)

Check increments are bounded.

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterObservableBound(obs=Observable(name='undefined', raw='undefined', scale='linear', differentiate=False, local_fit=False, time_window=0.0, join_points=3, mode='dynamics', timing='t', tref=None, ), tref=None, lower_bound=None, upper_bound=None)

Check that a given observable is bounded.

Parameters:
  • obs (Observable instance) – observable that will be tested for bounds works only for continuous observable (mode=’dynamics’)
  • tref (float (default None)) – Time of reference at which to test dynamics observable value
  • lower_bound (float (default None)) –
  • upper_bound (float (default None)) –
func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterSymmetricDivision(raw='area', lower_bound=0.4, upper_bound=0.6)

Check that cell division is (roughly) symmetric.

Parameters:raw (str) – column label of raw observable to test for symmetric division (usually one of ‘length’, ‘area’). This quantity will be approximated
func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

class tunacell.filters.cells.FilterTimeInCycle(tref=0.0)

Check that tref is within cell birth and division time

func(cell)

This is the boolean operation to define in specific Filter instances

Default operation returns True.

Analysis of the dynamics

This module sets up api functions for dynamical correlation analysis.

exception tunacell.stats.api.ParamsError
tunacell.stats.api.compute_bivariate(row_univariate, col_univariate, size=None)

Computes cross-correlation between observables defined in row_univariate and col_univariate.

This functions handles conditions and time-window binning:

  • all conditions provided in cset are applied independently, in addition to the computation with unconditioned data (labelled ‘master’)
  • A time-binning window is provided with a given offset and a period. Explicitly, a given time value t found in data is rounded to the closest offset_t + k * delta_t, where k is an integer.
Parameters:
  • row_univariate, col_univariate (Univariate instances) –
  • size (int (default None)) – limit the iterator to size Lineage instances (used for testing)
Returns:

Return type:

TwoObservable instance

tunacell.stats.api.compute_stationary(univ, region, options, size=None)

Computes stationary autocorrelation. API level.

Parameters:
  • univ (Univariate instance) – the stationary autocorr is based on this object
  • region (Region instance) –
  • options (CompuParams instance) – set the ‘adjust_mean’ and ‘disjoint’ options
  • size (int (default None)) – limit number of parsed Lineages
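Conceptually, the stationary autocorrelation estimated here is the lag-dependent covariance normalized by the variance. A minimal sketch of that estimator follows (illustrative only; tunacell's implementation additionally handles conditions, regions, the 'adjust_mean' and 'disjoint' options, and lineage iteration):

```python
def autocorr(series, lag):
    """Sample autocorrelation of a 1-D sequence at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    # covariance between the series and its lag-shifted copy
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag)) / (n - lag)
    return cov / var

x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # perfectly anti-correlated toy signal
print(autocorr(x, 0))  # 1.0
print(autocorr(x, 1))  # -1.0
```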
tunacell.stats.api.compute_stationary_bivariate(row_univariate, col_univariate, region, options, size=None)

Computes stationary cross-correlation function from a couple of Univariate instances

Need to compute stationary univariates as well.

tunacell.stats.api.compute_univariate(exp, obs, region='ALL', cset=[], times=None, size=None)

Computes one-point and two-point functions of statistical analysis.

This functions handles conditions and time-window binning:

  • all conditions provided in cset are applied independently, in addition to the computation with unconditioned data (labelled ‘master’)
  • A time-binning window is provided with a given offset and a period. Explicitly, a given time value t found in data is rounded to the closest offset_t + k * delta_t, where k is an integer.
Parameters:
  • exp (Experiment instance) –
  • obs (Observable instance) –
  • region (Region instance or str (default ‘ALL’)) – in case of str, must be the name of a registered region
  • cset (list of FilterSet instances) –
  • times (1d ndarray, or str (default None)) – array of times at which process is evaluated. Default is to use the ‘ALL’ region with the period taken from experiment metadata. User can opt for a specific time array, or for the label of a region as a string
  • size (int (default None)) – limit the iterator to size Lineage instances (used for testing)
Returns:

Return type:

Univariate instance
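The time-binning rule stated above (each time t mapped to the closest offset_t + k * delta_t) can be sketched as a standalone helper; parameter names here are illustrative, not tunacell's API:

```python
def bin_time(t, offset=0.0, period=5.0):
    """Round t to the closest offset + k * period, with k an integer."""
    k = round((t - offset) / period)
    return offset + k * period

print(bin_time(12.4, offset=0.0, period=5.0))  # 10.0
print(bin_time(12.6, offset=0.0, period=5.0))  # 15.0
print(bin_time(3.2, offset=1.0, period=4.0))   # 5.0
```

Binning this way lets data points recorded at slightly different acquisition times contribute to the same statistical time point.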

tunacell.stats.api.load_bivariate(row_univariate, col_univariate)

Initialize a StationaryBivariate instance from its dynamical one.

Parameters:
  • row_univariate (Univariate instance) –
  • col_univariate (Univariate instance) –
Returns:

set up with empty arrays

Return type:

Bivariate instance

tunacell.stats.api.load_stationary(univ, region, options)

Initialize a StationaryUnivariate instance from its dynamical one.

Parameters:
  • univ (Univariate instance) –
  • region (Region instance) –
  • options (CompuParams instance) –
Returns:

set up with empty arrays

Return type:

StationaryUnivariate instance

tunacell.stats.api.load_stationary_bivariate(row_univariate, col_univariate, region, options)

Initialize a StationaryBivariate instance from its dynamical one.

Parameters:
  • row_univariate (Univariate instance) –
  • col_univariate (Univariate instance) –
  • region (Region instance) –
  • options (CompuParams instance) –
Returns:

set up with empty arrays

Return type:

StationaryBivariate instance

tunacell.stats.api.load_univariate(exp, obs, region='ALL', cset=[])

Initialize an empty Univariate instance.

Such a Univariate instance is bound to an experiment (through exp), an observable, and a set of conditions.

Parameters:
  • exp (Experiment instance) –
  • obs (Observable instance) –
  • region (Region instance or str (default ‘ALL’)) – in case of str, must be the name of a registered region
  • cset (sequence of FilterSet instances) –
Returns:

initialized, nothing computed yet

Return type:

Univariate instance

Raises:

UnivariateIOError – when importing fails (no data corresponds to input params)

Plotting dynamic analysis

This module defines plotting functions for the statistics of the dynamics.

tunacell.plotting.dynamics.plot_onepoint(univariate, show_cdts='all', show_ci=False, mean_ref=None, var_ref=None, axe_xsize=6.0, axe_ysize=2.0, time_range=(None, None), time_fractional_pad=0.1, counts_range=(None, None), counts_fractional_pad=0.1, average_range=(None, None), average_fractional_pad=0.1, variance_range=(None, None), variance_fractional_pad=0.1, show_legend=True, show_cdt_details_in_legend=False, use_obs_name=None, save=False, user_path=None, ext='.png', verbose=False)

Plot one point statistics: counts, average, and variance.

One point functions are plotted for each condition set up in the show_cdts argument: ‘all’ for all conditions, or the string representation (or label) of a particular condition (or a list thereof).

Parameters:
  • univariate (Univariate instance) –
  • show_cdts (str (default 'all')) – must be either ‘all’, or ‘master’, or the repr of a condition, or a list thereof
  • show_ci (bool {False, True}) – whether to show 99% confidence interval
  • mean_ref (float) – reference mean value: what the user expects to see as sample average, to compare with data
  • var_ref (float) – reference variance value: what the user expects to see as sample variance, to compare with data
  • axe_xsize (float (default 6)) – size of the x-axis (inches)
  • axe_ysize (float (default 2.)) – size of a single ax’s y-axis (inches)
  • time_range (couple of floats (default (None, None))) – specifies (left, right) bounds
  • time_fractional_pad (float (default .1)) – fraction of x-range to add as padding
  • counts_range (couple of floats (default (None, None))) – specifies range for the Counts y-axis
  • counts_fractional_pad (float (default .1)) – fractional amount of y-range to add as padding
  • average_range (couple of floats (default (None, None))) – specifies range for the Average y-axis
  • average_fractional_pad (float (default .1)) – fractional amount of y-range to add as padding
  • variance_range (couple of floats (default (None, None))) – specifies range for the Variance y-axis
  • variance_fractional_pad (float (default .1)) – fractional amount of y-range to add as padding
  • show_legend (bool {True, False}) – print out legend
  • show_cdt_details_in_legend (bool {False, True}) – show details about filters
  • use_obs_name (str (default None)) – when filled, the plot title will use this observable name instead of looking for the observable registered name
  • save (bool {False, True}) – whether to save plot
  • user_path (str (default None)) – user defined path where to save figure; default is canonical path (encouraged)
  • ext (str {'.png', '.pdf'}) – extension to be used when saving file
  • verbose (bool {False, True}) –
tunacell.plotting.dynamics.plot_stationary(stationary, show_cdts='all', axe_xsize=6.0, axe_ysize=2.0, time_range=(None, None), time_fractional_pad=0.1, time_guides=[0.0], counts_range=(None, None), counts_fractional_pad=0.1, corr_range=(None, None), counts_logscale=False, corr_fractional_pad=0.1, corr_logscale=False, corr_guides=[0.0], show_exp_decay=None, show_legend=True, show_cdt_details_in_legend=False, use_obs_name=None, save=False, ext='.png', verbose=False)

Plot stationary autocorrelation.

Parameters:
  • stationary (StationaryUnivariate or StationaryBivariate instance) –
  • axe_xsize (float (default 6)) – size (in inches) of the x-axis
  • axe_ysize (float (default 2)) – size (in inches) of the individual y-axis
  • time_range (couple of floats) – bounds for time (x-axis)
  • time_fractional_pad (float) – fractional padding for x-axis
  • counts_range (couple of ints) – bounds for counts axis
  • counts_fractional_pad (float) – fractional padding for counts axis
  • corr_range (couple of floats) – bounds for correlation values
  • counts_logscale (bool {False, True}) – use logscale for counts axis
  • corr_fractional_pad (float) – fractional padding for correlation values
  • corr_logscale (bool {False, True}) – use logscale for correlation values (symlog is used to display symmetrically negative values)
  • corr_guides (list of float) – values where to plot shaded grey horizontal lines
  • show_exp_decay (float (default None)) – whether to plot an exponential decay with corresponding rate exp(-rate * t)
  • save (bool {False, True}) – whether to save plot at canonical path
  • use_obs_name (str (default None)) – when filled, the plot title will use this observable name instead of looking for the observable registered name
  • ext (str {'.png', '.pdf'}) – extension used for file
Returns:

fig

Return type:

Figure instance

tunacell.plotting.dynamics.plot_twopoints(univariate, condition_label=None, trefs=[], ntrefs=4, axe_xsize=6.0, axe_ysize=2.0, time_range=(None, None), time_fractional_pad=0.1, counts_range=(None, None), counts_fractional_pad=0.1, corr_range=(None, None), corr_fractional_pad=0.1, delta_t_max=None, show_exp_decay=None, show_legend=True, show_cdt_details_in_legend=False, use_obs_name=None, save=False, ext='.png', verbose=False)

Plot two-point functions: counts and autocorrelation functions.

These plots can show only one condition in addition to ‘master’, and are plotted for a set of reference times.

Parameters:
  • univariate (Univariate instance) –
  • condition_label (str (default None)) – must be the repr of a given FilterSet
  • trefs (list of floats) – indicate the times you would like to use as references; if left empty, reference times are computed automatically
  • ntrefs (int) – if trefs is empty, number of times of reference to display
  • axe_xsize (float (default 6)) – size of the x-axis (inches)
  • axe_ysize (float (default 2.)) – size of a single ax’s y-axis (inches)
  • time_range (couple of floats (default (None, None))) – specifies (left, right) bounds
  • time_fractional_pad (float (default .1)) – fraction of x-range to add as padding
  • counts_range (couple of floats (default (None, None))) – specifies range for the Counts y-axis
  • counts_fractional_pad (float (default .2)) – fractional amount of y-range to add as padding
  • corr_range (couple of floats (default (None, None))) – specifies range for the correlation y-axis
  • corr_fractional_pad (float (default .1)) – fractional amount of y-range to add as padding
  • delta_t_max (float (default None)) – when given, bottom plot will be using this max range symmetrically; otherwise, will use the largest intervals found in data (often too large to see something)
  • show_exp_decay (float (default None)) – when a floating point number is passed, a light exponential decay curve is plotted for each tref
  • show_legend (bool {True, False}) – print out legend
  • show_cdt_details_in_legend (bool {False, True}) – show details about filters
  • use_obs_name (str (default None)) – when filled, the plot title will use this observable name instead of looking for the observable registered name
  • save (bool {False, True}) – whether to save figure at canonical path
  • ext (str {'.png', '.pdf'}) – extension to be used when saving figure
  • verbose (bool {False, True}) –

Indices and tables