Welcome to FITS¶
Inference of population-genetics parameters from time-series data.
FITS (Flexible Inference from Time-Series data) comes in two distributions:
- Graphical user interface (GUI) distribution for Windows and MacOS
- Command line interface (CLI) distribution for Linux, Windows and MacOS
Before you start¶
Compiling from source¶
The most recent version of FITS is available here. To compile FITS, you need the Boost library 1.69 and a compiler that supports C++11. We used gcc 8.2 on Linux (CentOS), Clang provided with Xcode 9 on MacOS (High Sierra) and MinGW provided with Qt 5.12 on Windows 10 and 7.
Note
Before you start, make sure GCC is in the PATH.
Compiling the command line interface¶
Download and extract the Boost library 1.69 to a convenient location.
Compile all *.cpp files, referring the compiler to the Boost libraries, e.g.:
g++ -std=c++11 -O3 -o fits1.3.2 -I/path/to/boost/ -L/path/to/boost/libs *.cpp
(here we tell gcc to use the C++11 standard, enable optimization (-O3) and name the output file (-o) fits1.3.2).
Run FITS with no command line arguments to get the help text.
Run FITS with the proper syntax (see Using the command line interface) in order to generate data or infer the required parameter.
Compiling the graphical user interface¶
In order to compile the GUI, you will need (in addition to Boost) the Qt framework, along with Qt Creator. Both are available under an open-source license here.
Note
When installing the Qt framework, make sure you also install the included MinGW compiler.
- Extract the source code zip folder in your favorite location.
- Open Qt Creator.
- Click File>Open file or project and locate the project file (fits_gui.pro).
- Within the project file, replace the placeholder text next to __INCLUDEPATH__ with the path to the Boost library.
- Click “fits_gui” in the left toolbox and make sure the configuration is set to Release.
- Click Build>Build All. After the build completes, the FITS executable will be found in the build directory (its location is shown under Projects>Build directory, available from the left toolbar).
- Move fits_gui.exe to a new folder. Open the console and navigate to that folder.
- For Windows: Locate windeployqt.exe under the bin directory in the Qt installation folder. Run windeployqt.exe with fits_gui.exe as its sole argument:
path/to/windeployqt.exe fits_gui.exe
- For MacOS: Locate macdeployqt under the bin directory in the Qt installation folder. Run macdeployqt with fits_gui as its sole argument:
path/to/macdeployqt fits_gui
FITS input¶
FITS requires two types of input: Data file and Parameters file.
Data file¶
This file is expected to hold observed allele information from the system under study. FITS expects a tab-delimited text file with the following columns:
- gen for the generation of the observation
- allele for the observed state
- freq for the measured frequency for that state
- position for the position number for which the frequency data is given (optional)
Note
FITS assumes the columns to appear in the above order.
Note
The allele with the highest frequency at the first available time point will be defined as WT (w=1).
gen | allele | freq | position |
---|---|---|---|
0 | 0 | 1 | 1 |
0 | 1 | 0 | 1 |
1 | 0 | 1 | 1 |
1 | 1 | 0 | 1 |
2 | 0 | 0.99999 | 1 |
2 | 1 | 1e-05 | 1 |
3 | 0 | 0.9999899998 | 1 |
3 | 1 | 1.00002e-05 | 1 |
4 | 0 | 0.9999899998 | 1 |
4 | 1 | 1.00002e-05 | 1 |
5 | 0 | 0.9999600016 | 1 |
5 | 1 | 3.99984e-05 | 1 |
You can also download an example.
Note
For each generation, the sum of frequencies for the different alleles should be 1.
Note
FITS accepts allele frequencies at a given locus. Sequencing techniques vary in their accuracy, so the provided allele frequencies may be inaccurate. If the input is inaccurate, FITS inferences may be inaccurate as well. Specific examples include:
- Inference of fitness of highly deleterious mutations where the accuracy threshold of sequencing is worse than the mutation rate.
- Inference of mutation rate from neutral alleles when the number of generations multiplied by the mutation rate is lower than the accuracy threshold of the sequencing.
- Inference of mutation rate or fitness when only very shallow sequencing is available (due to limited sampling or limited sequence coverage).
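
As a rough illustration of the data file layout described in this section, the following Python sketch writes rows in the gen, allele, freq, position order as tab-delimited text and checks that the frequencies of each generation sum to 1. The helper names (`check_frequencies`, `to_tsv`), the tolerance value and the presence of a header line are assumptions made for this example, not part of FITS.

```python
import csv
import io
from collections import defaultdict

# Rows follow the column order gen, allele, freq, position.
rows = [
    (0, 0, 1.0, 1),
    (0, 1, 0.0, 1),
    (1, 0, 0.99999, 1),
    (1, 1, 1e-05, 1),
]

def check_frequencies(rows, tol=1e-6):
    """Return (position, gen) pairs whose frequencies do not sum to 1."""
    totals = defaultdict(float)
    for gen, allele, freq, position in rows:
        totals[(position, gen)] += freq
    return [key for key, total in totals.items() if abs(total - 1.0) > tol]

def to_tsv(rows):
    """Serialize rows as tab-delimited text (header line is an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gen", "allele", "freq", "position"])
    writer.writerows(rows)
    return buffer.getvalue()

bad = check_frequencies(rows)   # empty list when every generation sums to 1
tsv_text = to_tsv(rows)         # text that could be saved as the data file
```

Running such a check before inference can catch rounding or export problems early, since FITS expects the frequencies of each generation to sum to 1.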
Parameters file¶
This file provides FITS with the population-genetics parameters of the system under study.
Each line in this file sets one parameter, with a space between the name of the parameter and its value: <parameter_name> <parameter value>.
Note
To put comments within the parameters file, add # at the beginning of each comment line.
You can also download an example.
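
The line format described above (a parameter name, a space, then its value, with # marking comment lines) can be read with a few lines of Python. This is only a sketch of the format, not the parser FITS uses; the function name `parse_parameters` is hypothetical and values are kept as strings.

```python
def parse_parameters(text):
    """Parse '<parameter_name> <parameter value>' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines that begin with '#'.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[name] = value
    return params

example = """# population settings
N 100000
mutation_rate0_1 1e-05
fitness_prior smoothed_composite
"""
params = parse_parameters(example)
```

Here `params["N"]` holds the string "100000"; converting values to Integer, Float or Text according to the tables below is left to the caller.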
General parameters¶
Parameter name | Type | Description |
---|---|---|
N | Integer | Size of population |
sample_size | Integer | Size of observed population (e.g., sequenced genomes) |
bottleneck_size | Integer | Size of the population transferred on a bottleneck event |
bottleneck_interval* | Integer | Number of generations separating between bottleneck events (default: 0) |
num_alleles | Integer | Number of alleles observed in all loci |
mutation_rateX_Y | Float | Rate of mutation of allele X to allele Y. Not required if mutation rate is to be inferred |
fitness_alleleX | Float | Fitness value assigned to allele X. Not required if fitness is to be inferred |
logistic_growth* | Float | 1: model the population growth throughout the generations with a logistic growth model (default: 0) |
logistic_growth_K | Float | Logistic model - upper bound |
logistic_growth_r | Float | Logistic model - proportionality constant |
*parameter value of 0 means disabled/off; positive values mean enabled/on.
ABC parameters¶
Parameter name | Type | Description |
---|---|---|
num_samples_from_prior | Integer | How many simulations to perform |
acceptance_rate | Float | Fraction of best simulations to utilize for the inference of the parameter. |
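
To show how these two parameters interact, here is a generic ABC rejection sketch in Python: it draws `num_samples_from_prior` candidate values, simulates each one, and keeps the `acceptance_rate` fraction closest to the observation. The simulator, prior and distance below are toy stand-ins chosen for this example (assumptions); they are not the models or distance measure used by FITS.

```python
import random
import statistics

def abc_rejection(observed, simulate, sample_prior, distance,
                  num_samples_from_prior, acceptance_rate, rng):
    """Keep the fraction of prior draws whose simulations lie closest to the data."""
    scored = []
    for _ in range(num_samples_from_prior):
        theta = sample_prior(rng)
        scored.append((distance(simulate(theta, rng), observed), theta))
    scored.sort(key=lambda pair: pair[0])
    num_accepted = max(1, round(num_samples_from_prior * acceptance_rate))
    return [theta for _, theta in scored[:num_accepted]]

# Toy problem (hypothetical): recover the mean of a noisy measurement.
rng = random.Random(0)
posterior = abc_rejection(
    observed=1.0,
    simulate=lambda theta, rng: theta + rng.gauss(0.0, 0.05),
    sample_prior=lambda rng: rng.uniform(0.0, 2.0),
    distance=lambda simulated, obs: abs(simulated - obs),
    num_samples_from_prior=10000,
    acceptance_rate=0.01,
    rng=rng,
)
inferred = statistics.median(posterior)  # median of the accepted values
```

With 10,000 draws and an acceptance rate of 0.01, the 100 best draws form the approximate posterior. More simulations with a smaller acceptance rate generally tighten the posterior at the cost of run time.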
Single simulation¶
Parameter name | Type | Description |
---|---|---|
num_generations | Integer | Number of generations to simulate |
init_freq_alleleX | Float | Initial frequency of allele X |
Fitness inference parameters¶
Parameter name | Type | Description |
---|---|---|
fitness_prior | Text | One of the following: uniform (for Uniform distribution); log_normal (based on Bons et al. 2018); fitness_composite; smoothed_composite (default). See the distribution of the above priors on a (0,2) fitness here. |
min_fitness_alleleX | Float | The minimum fitness value (inclusive) that may be assigned to allele X |
max_fitness_alleleX | Float | The maximum fitness value (exclusive) that may be assigned to allele X |
Mutation rate inference parameters¶
X and Y are alleles defined in the data file (i.e., 0 and 1).
Parameter name | Type | Description |
---|---|---|
min_log_mutation_rateX_Y | Float | Minimum (inclusive) log mutation rate of allele X to allele Y |
max_log_mutation_rateX_Y | Float | Maximum (exclusive) log mutation rate of allele X to allele Y |
Population size inference parameters¶
Parameter name | Type | Description |
---|---|---|
Nlog_min | Float | Minimum (inclusive) exponent x of the population size, N = 10^x |
Nlog_max | Float | Maximum (exclusive) exponent x of the population size, N = 10^x |
FITS output¶
General¶
FITS infers population genetics parameters using the Approximate Bayesian Computation (ABC) method. The output of this method is a distribution of values that explain the observed allele frequencies with the highest probabilities (also called the posterior distribution). A common practice is to take the median of this distribution as the inferred value of the parameter under study.
The results below are reported for all inferences.
Result header | Description |
---|---|
median | The median value of the posterior distribution. This is practically the inferred population genetics parameter. |
MAD | Median Absolute Deviation index (MAD) of the posterior distribution. |
min | The minimum value in the posterior distribution. |
max | The maximum value in the posterior distribution. |
pval | The result of a statistical test about the informativeness of the posterior distribution, with a null hypothesis that the posterior distribution is as informative as the prior distribution. |
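
The median, MAD, min and max columns can be reproduced from a list of posterior values with the Python standard library, as sketched below. The sketch uses the plain (unscaled) median absolute deviation; whether FITS applies a scaling constant is not stated here, so treat that as an assumption. The pval column is not covered.

```python
import statistics

def summarize(posterior):
    """Return the median, MAD, min and max of a list of posterior values."""
    median = statistics.median(posterior)
    # Median absolute deviation: the median of |x - median|.
    mad = statistics.median([abs(x - median) for x in posterior])
    return {"median": median, "MAD": mad,
            "min": min(posterior), "max": max(posterior)}

summary = summarize([0.98, 1.00, 1.01, 1.02, 1.10])
```

For this small example the median is 1.01 and the MAD is approximately 0.01, which shows how the MAD stays small even when one value (1.10) lies far from the rest.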
Fitness inference results¶
In addition to the general values described above, fitness inference reports the following:
Result header | Description |
---|---|
allele | The allele for which the results are reported |
DEL(%) | The proportion of the posterior distribution with values below 1. |
NEU(%) | The proportion of the posterior distribution with values equal to 1. |
ADV(%) | The proportion of the posterior distribution with values above 1. |
category | A possible classification of the allele into {LETHAL,DEL,NEU,ADV}, based on the inferred fitness value. |
Note
Some fitness priors rarely choose the exact value of 1 and therefore NEU(%) will approach zero, even for neutral alleles.
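
The three proportions can be computed from posterior fitness values as in the following Python sketch. The function name is hypothetical, and the rule FITS uses to derive the category column is not described here, so it is not reproduced.

```python
def fitness_proportions(posterior):
    """Percentage of posterior fitness values below, equal to, and above 1."""
    total = len(posterior)
    below = sum(1 for w in posterior if w < 1.0)
    equal = sum(1 for w in posterior if w == 1.0)
    above = total - below - equal
    return {
        "DEL(%)": 100.0 * below / total,
        "NEU(%)": 100.0 * equal / total,
        "ADV(%)": 100.0 * above / total,
    }

proportions = fitness_proportions([0.5, 0.9, 1.0, 1.0, 1.2])
```

Because NEU(%) counts only values exactly equal to 1, a continuous prior that almost never draws exactly 1 yields a NEU(%) near zero, as the note above points out.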
Mutation rate inference results¶
FITS infers the mutation rates between all defined alleles. Accordingly, the output table contains the target allele in the first row and the source allele in the first column.
Note
In Evolve & Resequence (E&R) studies, when the population is homogeneous at the first generation and no further information is available, the inferred rate from the minor allele to the major allele will be insignificant, so the pval should be taken into account.
Using FITS¶
Using the graphical interface¶
After opening FITS, the following screen will be visible:
Click the Browse... button near the Parameters label to load a parameters file (example Parameters file).
Note
The loaded parameters may be viewed using the View button.
From the given parameters, FITS will automatically identify the possible inference mode (in the example below, Fitness inference mode).
To load the data file, click Browse... near the Data label just below the Parameters label. Locate and select the Data file.
Note
FITS expects the data file to be tab-delimited. If using Excel, save your worksheet as a tab-delimited file.
Verify the content and the format of the file if FITS fails to run.
Within the Actions area, FITS will automatically suggest available actions according to the parameters available in the parameters file.
Press Go! to perform the selected action. FITS will show a progress bar and estimated time to completion.
The inference results are given in the Output area. They may be copied to the clipboard (for example, to be pasted into a spreadsheet). Inference output, prior and posterior distributions may be exported to text files.
The inference results are explained in the FITS output page.
Using the command line interface¶
Running fits with no parameters prints the help screen to the console, listing possible usage syntaxes. For fitness inference, as an example, the syntax is:
fits -fitness <param_file> <actual_data_file> <posterior_file> <summary_file> (optional: <prior_file>)
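
When running many inferences, the fitness syntax above can be assembled programmatically. The Python sketch below only builds the argument list; the file names, the helper name `build_fitness_command` and the executable name `fits` on the PATH are assumptions for illustration.

```python
import subprocess

def build_fitness_command(param_file, data_file, posterior_file,
                          summary_file, prior_file=None, executable="fits"):
    """Assemble the argument list for the fitness-inference syntax."""
    command = [executable, "-fitness", param_file, data_file,
               posterior_file, summary_file]
    if prior_file is not None:
        command.append(prior_file)  # optional fifth file argument
    return command

# Hypothetical file names, used only for illustration.
command = build_fitness_command("params.txt", "data.txt",
                                "posterior.txt", "summary.txt")

# Uncomment to run FITS (requires the executable to be on the PATH):
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.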
Use cases¶
Fitness inference¶
- N 100000 was set.
- mutation_rate0_1 1e-05 was set.
- min_fitness_allele1 0.0 was set, to indicate zero minimal expected fitness, and max_fitness_allele1 2.0 as well, to indicate the maximal possible fitness value of two.
- fitness_prior smoothed_composite was set.
- num_samples_from_prior 100000 was set, to indicate 100,000 simulations, and the parameter acceptance_rate 0.01 was set, to indicate that the top 1% of simulations will be used to decide on the fitness value of this allele.

The data file for a simulated neutral allele (fitness of 1) under a population size of 10^5 and a mutation rate of 10^-5 is available here. The corresponding parameters file is available here.
The fitness value inferred by FITS was practically 1:
Mutation rate inference¶
- For providing the minimal log mutation rate between the wildtype allele and the mutant we set min_log_mutation_rate0_1 -7 and its reciprocal min_log_mutation_rate1_0 -7.
- For providing the maximal log mutation rate between the wildtype allele and the mutant we set max_log_mutation_rate0_1 -3 and its reciprocal max_log_mutation_rate1_0 -3.
- We set fitness_allele0 1.0 and fitness_allele1 1.0.

The data file for simulated neutral alleles (fitness of 1) under a population size of 10^5 and a mutation rate of 10^-5 is available here. The corresponding parameters file is available here.
Population size inference¶
- We set the parameter Nlog_min 4 to indicate a minimum population size of 10^4. We also set the parameter Nlog_max 7 to indicate the maximum population size of 10^7.
- We set fitness_allele0 1.0 and fitness_allele1 1.0.

The data file is available here. The corresponding parameters file is available here.
Trajectory simulations¶
The following parameters were set:
- num_alleles 2, to indicate two alleles.
- fitness_allele0 1.0 and fitness_allele1 1.02.
- mutation_rate0_1 0.001 and mutation_rate1_0 0.001.
- N 100000.
- init_freq_allele0 1 and init_freq_allele1 0.
- To control for the number of generations (100 in our example) we set num_generations 100.

Note
There’s no need to load a Data file in order to perform the simulations.
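
To give a feel for what these parameter values imply, the Python sketch below computes the expected frequency of allele 1 over the generations using a generic selection-then-mutation recursion. It ignores genetic drift (so N is not used), and both the function name and the order of selection and mutation are assumptions for this example; it is not the simulation code of FITS.

```python
def expected_trajectory(init_freq_allele1, fitness_allele0, fitness_allele1,
                        mutation_rate0_1, mutation_rate1_0, num_generations):
    """Expected frequency of allele 1 per generation, ignoring genetic drift."""
    freq = init_freq_allele1
    trajectory = [freq]
    for _ in range(num_generations):
        # Selection: weight each allele by its fitness and renormalize.
        mean_fitness = (1.0 - freq) * fitness_allele0 + freq * fitness_allele1
        selected = freq * fitness_allele1 / mean_fitness
        # Mutation: allele 0 mutates to 1, and allele 1 mutates back to 0.
        freq = ((1.0 - selected) * mutation_rate0_1
                + selected * (1.0 - mutation_rate1_0))
        trajectory.append(freq)
    return trajectory

# Values taken from the parameters listed above.
trajectory = expected_trajectory(
    init_freq_allele1=0.0,
    fitness_allele0=1.0,
    fitness_allele1=1.02,
    mutation_rate0_1=0.001,
    mutation_rate1_0=0.001,
    num_generations=100,
)
```

The list holds 101 values (the initial frequency plus one per generation). Under these settings allele 1 first appears through mutation and then rises steadily because of its 2% fitness advantage.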
Considering Sample Effect¶
bottleneck_size, bottleneck_interval and sample_size, as illustrated below: