Computer simulations are powerful tools for understanding complex evolutionary and genetic processes and their relationships to ecological processes. Simulations can be used, for example, to predict complex scenarios involving the interaction between evolutionary forces or evaluate the plausibility of alternative hypotheses or, validate and evaluate genetic methods, see Hoban et al., 2012 for more examples.
We developed in dartR
user-friendly functions to simulate in silico the five evolutionary forces, namely: natural selection, genetic drift, mutation, gene flow and recombination. These simulations follow the Wright–Fisher model i.e. generations are forward in time, discrete (birth, reproduction and death is simultaneous for all individuals and synchronous among all individuals), and not overlapping (parents and offspring do not coexist), population size is constant (in each generation the entire population is replaced by sampling the same number of offspring as there were parents in the previous generation), mating is random within populations, all individuals reproduce and sex ratio is equal to one.
The simulation model can be customised by using more than 100 different parameters. These simulation parameters can be specified using three different options:
Shiny
(a web application framework for R).Description of the simulation parameters are included in the CSV file and will be displayed in the interactive window by hovering over the parameter name. The same parameter names are used in the documentation, tutorials and in the actual code of the simulation model.
Some of the main characteristics of the simulation model are:
The simulation model has been rigorously tested, validated and documented (see here).
The obvious first step is to have installed R, RStudio and dartR on your computer. Check out our installation tutorial for this process here.
Running the basic simulations involves two steps:
gl.sim.WF.table()
.gl.sim.WF.run()
.We use the function gl.sim.WF.table()
to produce the reference table. This table contains the 8 following variables for each of the loci to be simulated:
An example of this table is below.
There are three options in which we can submit the values of the parameters into the function to produce this reference table.
1. Interactive option
If the function is run without any parameter (i.e. using the default values), an interactive window will be displayed in which we can specify the values of the parameters to produce the reference table. When you are finished selecting the parameters values, click on the RUN button at the bottom of the window.
Information about each parameter will be displayed if you hover over the variable name. Note that it might be necessary to call first library(shinyBS)
for the information to be displayed.
library(dartR) library(shinyBS) ref_table <- gl.sim.WF.table()
The interactive window of the function gl.sim.WF.table()
looks like this.
2. CSV file option
In this option a CSV file can be used as template to specify the values of the parameters to produce the reference table. The name of this template is ref_variables.csv
.
This template can be downloaded from here or you can locate the file included with dartR by typing in the R console:
system.file('extdata', 'ref_variables.csv', package ='dartR')
The values of the parameters can be modified using the third column (“value”) of this file. To make things easier, this file should be located in your working directory.
In this option, we use the function parameter file_var
to specify that we are using the file ref_variables.csv
as input and turn off the interactive window by using interactive_vars = FALSE
.
ref_table <- gl.sim.WF.table(file_var = "ref_variables.csv", interactive_vars = FALSE)
3. CSV file and parameters within the function option
In this option we use the CSV file ref_variables.csv
as template, however value parameters can be specified within the function which will overwrite the values used in the CSV file. For example, in the code below, we specify that we are using the file ref_variables.csv
as input, turn off the interactive window and that we are going to simulate 20 loci with deleterious alleles.
ref_table <- gl.sim.WF.table(file_var="ref_variables.csv", interactive_vars = FALSE, loci_deleterious = 20)
The reference table can be modified "by hand" if required.
Running the actual simulationsWe use the function gl.sim.WF.run()
to run the simulations and use the output of the function gl.sim.WF.table()
to specify the 8 variables for each locus.
Similar to the function gl.sim.WF.table()
, we can submit the values of the parameters into the function in three different ways.
Take into account that the simulations will take a little bit longer the first time you use the function gl.sim.WF.run()
because C++ functions must be compiled.
1. Interactive option
We first specify the variable in which we stored the output of the function gl.sim.WF.table()
using the parameter ref_table
. An interactive window will be displayed if we use the default value (interactive_vars = TRUE
). In this window, we can specify the value of the parameters to run the simulations. When you are finished selecting the parameter values, click on the RUN button at the bottom of the window.
res_sim <- gl.sim.WF.run(ref_table = ref_table)
The interactive window of the function gl.sim.WF.run()
looks like this.
2. CSV File option
In this option we used the CSV file sim_variables.csv
to specify the values of the simulation variables. This template can be downloaded from here or you can locate the file included with dartR by typing in the R console:
system.file('extdata', 'sim_variables.csv', package ='dartR')
The values of the variables can be modified using the third column (“value”) of this file. To make things easier, this file should be located in your working directory.
In this option, we use the function parameter file_var
to specify that we are using the file sim_variables.csv
as input, turn off the interactive window by using interactive_vars = FALSE
and the parameter ref_table
to indicate where we store the output of the function gl.sim.WF.table()
.
res_sim <- gl.sim.WF.run(file_var = "sim_variables.csv", ref_table = ref_table, interactive_vars = FALSE)
3. CSV File and parameters within the function option
In this option we use the CSV file sim_variables.csv
as template, however value parameters can be specified within the function which will overwrite the values used in the CSV file. For example, in the code below, we specify that we are using the file sim_variables.csv
as input, turn off the interactive window and that we are going to simulate 2 populations.
res_sim <- gl.sim.WF.run(file_var = "sim_variables.csv", ref_table = ref_table, interactive_vars = FALSE, number_pops_phase2 = 2)
That's it!!!
The output of the simulations are genlight objects which you can analise right away with several of dartR functions.
Simulations with two phasesAlthough in the simulations population size remain constant across generations, simulations can have 2 phases (phase 1 and phase 2). Simulation parameter values are constant across generations in each phase but parameter values can be different in each phase. Parameters that can be specified for each phase end either with the words phase1
or phase2
. The default is to run just phase 2. To include phase 1 in the simulations, select Phase 1
in the interactive window or set to TRUE
the value column of the parameter phase1
if the CSV file is used as input. Alternatively, use the parameter phase1 = TRUE
within the function in combination with the CSV file, as shown below.
library(dartR) library(shinyBS) ref_table <- gl.sim.WF.table(file_var = "ref_variables.csv", interactive_vars = FALSE) res_sim <- gl.sim.WF.run(file_var = "sim_variables.csv", ref_table = ref_table, interactive_vars = FALSE, phase1 = TRUE)Simulating Gene Flow (dispersal)
In the basic simulations, dispersal is symmetric, constant and equal. These parameters can be customised using the function gl.sim.create_dispersal()
(see subtitle "Customising gene flow" below). Dispersing individuals are sampled at random and the sex of the dispersing individual is alternated every time a dispersal event occurs. If two or more individuals are exchanged the same generation, half are males and half females.
Dispersal is controlled by three parameters:
number_transfers
determines the number of dispersing individuals in each dispersal event.transfer_each_gen
determines the interval of number of generations in which a dispersal event occurs.dispersal_type
determines the type of dispersal.Dispersal is modeled in three different ways which can be specified using the dispersal_type
(see image below):
circle
all_connected
line
In the figure below, blue circles represent populations and green lines represent gene flow between populations.
As you might remember, the simulations can have two phases (the default is using just phase 2) and some parameters values can be different in each phase. To identify the parameters for phase 1, parameter names end with the words phase1
and for phase 2 end with phase2
. For example, if you want to change the dispersal type for phase 1, you would use the parameter dispersal_type_phase1
and for phase 2 you would use the parameter dispersal_type_phase2
.
Below is an example in which we specify that we want 2 individuals to disperse every 2 generations. Note that we didn't specify the type of dispersal, in this case the simulations will use the default value (i.e. all_connected
).
ref_table <- gl.sim.WF.table(file_var = "ref_variables.csv", interactive_vars = FALSE) res_sim <- gl.sim.WF.run(file_var = "sim_variables.csv", ref_table = ref_table, interactive_vars = FALSE, number_transfers_phase2 = 2, transfer_each_gen_phase2 = 2)
Customising dispersal
To further customise dispersal, we can use the function gl.sim.create_dispersal()
. This function produces a CSV file which contains the dispersal parameters values for each pair of populations. This CSV file is to be used as input for the function gl.sim.WF.run()
.
In the example below, we specify that we want to simulate four populations (number_pops = 4
) which are all connected (dispersal_type = "all_connected"
), one individual will disperse to each population (number_transfers = 1
) every generation (transfer_each_gen = 1
). Finally, we specify that we want to write the CSV file in the working directory using the R function getwd()
and the parameter outpath
.
gl.sim.create_dispersal(number_pops = 4, dispersal_type = "all_connected", number_transfers = 1, transfer_each_gen = 1, outpath = getwd() )
The default name for this file is dispersal_table.csv
, but you can change its name using the parameter outfile
. If you open this CSV file in Excel, it looks like this:
Once that the CSV file is produced, you can specify the values of the parameters number_transfers
and transfer_each_gen
for each population pair by opening the CSV file using Excel for example.
For example, in the code below, we will simulate four populations (number_pops_phase2
) with 50 individuals each (population_size_phase2
). Note that the values for the parameter population_size_phase2
must be quoted, be even, separated by a space and have the same number of values as there are populations to simulate. Finally, to specify that we want to use the CSV file dispersal_table.csv
as input for the function gl.sim.WF.run()
, we use the parameter file_dispersal
.
ref_table <- gl.sim.WF.table(file_var = "ref_variables.csv", interactive_vars = FALSE) res_sim <- gl.sim.WF.run(file_var = "sim_variables.csv", ref_table = ref_table, interactive_vars = FALSE, number_pops_phase2 = 4, population_size_phase2 = "50 50 50 50", file_dispersal = "dispersal_table.csv" )
Genetic drift is the effect, or the tendency to reduce genetic diversity and increase allelic fixation over time, resulting from sampling alleles from generation to generation in finite populations. Genetic drift is an emergence behavior of the simulations. Emergence can be defined as that system behavior that can emerge from the agents behavior and from their environment instead of being imposed by equations or rules (Grimm et al., 2020).
The effects of genetic drift depend on effective population size
(Ne) and not on the actual or census population size (Nc). The smaller Ne is, the stronger the effects of genetic drift.
The simulation with the default values, simulates populations with a ratio Ne/Nc = 1 (see "Validation" section below). This ratio can be modified by the user using the parameter variance_offspring
(see "Calibration" section below).
In the simulations, selection is directional (replaces one allele by another) and it is used to and individual fitness is calculated using the multiplicative model (cumulative multiplication of the fitness of each locus in the individual).
Selection For loci under selection with a deleterious allele “a” and an alternative allele “A”, the relative fitness (w) of each allele combination within a locus within an individual (i.e. genotype) are: wAA = 1, wAa = 1 − hs, and waa = 1 – s, where h is the dominance coefficient of the deleterious allele and s is the selection coefficient of the deleterious allele.
For loci under selection with a beneficial allele “a” and an alternative allele “A”, the relative fitness (w) of each allele combination within a locus within an individual (i.e. genotype) are: wAA = 1, wAa = 1 + hs, and waa = 1 + s, where h is the dominance coefficient of the beneficial allele and s is the selection coefficient of the beneficial allele.
There are seven types of locus:
real
locus based on a real location using as input a genlight object (see section "Using real information as input"). This locus type has alleles with no effects on fitness (w = 1 for all genotypes).neutral
locus with alleles with no effects on fitness (w = 1 for all genotypes).deleterious
locus with one allele with detrimental effects on fitness (wAA = 1, wAa = 1 − hs, and waa = 1 – s).advantageous
locus with one allele with beneficial effects on fitness (wAA = 1, wAa = 1 + hs, and waa = 1 + s).mutation_neu
locus with mutation with alleles with no effects on fitness (w = 1 for all genotypes).mutation_del
locus with mutation with one allele with detrimental effects on fitness (wAA = 1, wAa = 1 − hs, and waa = 1 – s).mutation_adv
locus with with one allele with beneficial effects on fitness (wAA = 1, wAa = 1 + hs, and waa = 1 + s).In a relative fitness model (selection_model = ”relative”), also called soft selection or density dependent selection (Wallace, 1975; Lesecque et al., 2012), the fitness of each individual (W) is dependent on the fitness of other individuals in the population (i.e., based on competition). In this model, first the fitness of each offspring (W) is recalculated as Wrelative = W / summation of the fitness of all offspring. Afterwards, offspring are selected (equal sex ratio) to become parents of the next generation with a probability equal to their recalculated fitness (Wrelative; Lesecque et al., 2012). For example, if we had four individuals with fitnesses (W) of 0.1, 0.2, 0.3, and 0.2 the first individual would be selected on average Wrelative = 0.1 / (0.1 + 0.2 + 0.3 + 0.2) = 0.125 of the time to become parent of the next generation.
In an absolute fitness model (selection_model = ”absolute”), also called hard selection or density independent selection (Wallace, 1975; Lesecque et al., 2012), genetic load is defined as “the loss of fitness resulting from deleterious alleles maintained by mutation- selection balance” (Agrawal & Whitlock, 2012). Genetic load effectively measures the fraction of the population that fails to survive or reproduce (Charlesworth & Charlesworth, 2010). The fitness of each individual (W) is independent of the fitness of other individuals in the population (Wallace, 1975). In this model, first a random number is sampled from a uniform distribution within an interval between 0 and genetic_load, which is an approximation of the genetic load of the proportion of the genome that is simulated. This approximation can be guided by equations to calculate genetic load described in the Section 3.2.1.7.4. Afterwards, individuals are set to death if the random number is larger than the total fitness of the individual (W; Wang et al., 1999). Finally, offspring are sampled at random (equal sex ratio) from the remaining live offspring to become parents of the next generation.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4