Showing content from https://github.com/damiendevienne/phylter below:

damiendevienne/phylter: Detection of outlier genes and species in phylogenomics

PhylteR, a tool for analyzing, visualizing and filtering phylogenomics datasets

phylter is a tool for detecting, removing and visualizing outliers in phylogenomics datasets by iteratively removing taxa from gene families (gene trees) and optimizing a score of concordance between individual matrices.
phylter relies on DISTATIS (Abdi et al., 2005), an extension of multidimensional scaling to 3 dimensions, to compare multiple distance matrices at once.
phylter builds on Phylo-MCOA (de Vienne et al., 2012) but is much faster and more accurate.
phylter takes as input either a collection of phylogenetic trees (that are converted to distance matrices by phylter), or a collection of pairwise distance matrices (obtained from multiple sequence alignments, for instance).
phylter accepts data with missing values (missing taxa in some genes).
phylter detects outliers with a method proposed by Hubert & Vandervieren (2008) for skewed data.
phylter does not accept the same taxon being present multiple times in the same gene.
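
To give a feel for the kind of outlier rule mentioned above, here is a minimal sketch of an adjusted-boxplot upper fence for right-skewed data, in the spirit of Hubert & Vandervieren (2008). It is not phylter's internal code. The helper names `naive_medcouple` and `upper_fence` are hypothetical, the medcouple is computed naively (pairs tied at the median are simply skipped), the scores are made up, and using `k = 3` as the multiplier is an assumption borrowed from the default `k` in phylter's signature.

```r
# Naive O(n^2) medcouple: a robust skewness measure in [-1, 1].
# Simplification (assumption): pairs where both values equal the median are skipped.
naive_medcouple <- function(x) {
  m <- median(x)
  lo <- x[x <= m]
  hi <- x[x >= m]
  h <- outer(hi, lo, function(xj, xi) ((xj - m) - (m - xi)) / (xj - xi))
  median(h[is.finite(h)])
}

# Upper fence of the adjusted boxplot: Q3 + k * exp(a * MC) * IQR,
# with a = 3 when MC >= 0 and a = 4 when MC < 0.
upper_fence <- function(x, k = 3) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  mc <- naive_medcouple(x)
  a <- if (mc >= 0) 3 else 4
  q[2] + k * exp(a * mc) * (q[2] - q[1])
}

# Made-up, right-skewed scores with one extreme value.
scores <- c(1.0, 1.1, 1.2, 1.3, 1.5, 1.8, 2.2, 15.0)
fence <- upper_fence(scores)
print(scores[scores > fence])
```

Because the fence widens with the skewness of the data, moderately large values in a long right tail are not flagged, while values far beyond the tail still are.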

phylter is written in R.

For details about the functions, their usage, and an in-depth, step-by-step description of the use of phylter on a biological dataset, please visit the phylter web page: https://damiendevienne.github.io/phylter.

Note: if you don't use R or don't want to use R, containerized versions of phylter are also available (Docker and Singularity): https://damiendevienne.github.io/phylter/articles/phyltercontainer.html

If you use phylter, please cite: Comte, A., Tricou, T., Tannier, E., Joseph, J., Siberchicot, A., Penel, S., Allio, R., Delsuc, F., Dray, S., de Vienne, D.M. (2023). PhylteR: Efficient Identification of Outlier Sequences in Phylogenomic Datasets, Molecular Biology and Evolution, 40(11) msad234, https://doi.org/10.1093/molbev/msad234

phylter is now on CRAN.

Installation is as easy as typing what follows at the R command prompt:

install.packages("phylter")

If you want the latest version, you can also install the development version of phylter:

  1. Install the release version of remotes from CRAN:
install.packages("remotes")
  2. Install the development version of phylter from GitHub:
remotes::install_github("damiendevienne/phylter")
  3. Once installed, the package can be loaded:
library("phylter")

Note: phylter requires R version > 4.0, otherwise it cannot be installed. Also, R uses the GNU Scientific Library. On Ubuntu, this can be installed prior to the installation of the phylter package by typing sudo apt install libgsl-dev in a terminal.
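
As a quick pre-installation check, the running R version can be compared with the requirement stated in the note above. This is only a sketch: reading "R version > 4.0" as "R 4.0.0 or later" is an assumption, and the variable name `r_ok` is made up for illustration.

```r
# Compare the running R version with the minimum stated in the note above.
r_ok <- getRversion() >= "4.0.0"
if (!r_ok) {
  message("R ", as.character(getRversion()),
          " is older than 4.0; phylter may not install.")
}
print(r_ok)
```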

Here is a brief introduction to the use of phylter on a collection of gene trees. For more detailed explanations and a use case example, please visit https://damiendevienne.github.io/phylter/.

1. With the read.tree function from the ape package, read trees from an external file and save them as a list called trees.

if (!requireNamespace("ape", quietly = TRUE))
   install.packages("ape")
trees <- ape::read.tree("treefile.tre")
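
phylter converts the trees to distance matrices itself, so the following is not a required step. It is a small sketch of what a patristic distance matrix looks like, assuming that the default `distance = "patristic"` means the sum of branch lengths on the path between two tips, which is what `ape::cophenetic.phylo` computes. The toy tree and the names `toy` and `d` are made up for illustration.

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # A toy tree in Newick format (made up for illustration).
  toy <- ape::read.tree(text = "((A:1,B:2):1,(C:1,D:3):2);")
  # Tip-to-tip distances: sum of branch lengths along the connecting path.
  d <- ape::cophenetic.phylo(toy)
  print(d)
}
```

Each gene tree yields one such species x species matrix, and comparing these matrices across genes is what allows discordant gene x species pairs to stand out.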

2. (optional) Read or get gene names somewhere (in the same order as the trees) and save them as a vector called names.
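
One possible way to build such a vector, sketched under the assumption that each tree came from its own file: derive the gene names from the file names. The file names and the stand-in list `toy_trees` are hypothetical.

```r
# Stand-in for a list of three gene trees (the contents are irrelevant here).
toy_trees <- list(NULL, NULL, NULL)
# Hypothetical file names, one per tree, in the same order as the trees.
tree_files <- c("geneA.tre", "geneB.tre", "geneC.tre")
# Strip directory and extension to obtain one label per gene.
names <- tools::file_path_sans_ext(basename(tree_files))
# The vector must line up with the trees one-to-one, without duplicates.
stopifnot(length(names) == length(toy_trees), !anyDuplicated(names))
print(names)
```

Naming the vector `names` follows the text above. It shadows the name of the base function `names()`, but calls to that function still work, because R skips non-function objects when it looks up a function.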

3. Run phylter on your trees (see details below for possible options).

results <- phylter(trees, gene.names = names)

The phylter function is called as follows by default:

phylter(X, bvalue = 0, distance = "patristic", k = 3, k2 = k, Norm = "median", 
 Norm.cutoff = 0.001, gene.names = NULL, test.island = TRUE, 
 verbose = TRUE, stop.criteria = 1e-5, InitialOnly = FALSE, normalizeby = "row", 
 parallel = TRUE)
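
As an illustration of overriding some of these defaults, the sketch below runs phylter with a different `k` and without parallelization or progress messages. It assumes that the package is installed and that it ships an example dataset of gene trees named `carnivora`. The argument values are arbitrary, and `res_custom` is a made-up name.

```r
if (requireNamespace("phylter", quietly = TRUE)) {
  # Example gene trees assumed to be shipped with the package.
  data("carnivora", package = "phylter")
  # Override a few defaults from the signature above; the values are arbitrary.
  res_custom <- phylter::phylter(carnivora, k = 2, parallel = FALSE,
                                 verbose = FALSE)
  print(summary(res_custom))
}
```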

Arguments are as follows:

4. Analyze the results

To get the list of outliers detected by phylter, simply type:

results$Final$Outliers

In addition, many functions allow looking at the outliers detected and comparing before and after phyltering.

# Get a summary: nb of outliers, gain in concordance, etc.
summary(results)

# Show the number of species in each gene, and how many per gene are outliers
plot(results, "genes") 

# Show the number of genes where each species is found, and how many are outliers
plot(results, "species") 

# Compare before and after genes x species matrices, highlighting missing data and outliers 
# identified (not efficient for large datasets)
plot2WR(results) 

# Plot the dispersion of data before and after outlier removal. One dot represents one 
# gene x species association
plotDispersion(results) 

# Plot the genes x genes matrix showing pairwise correlation between genes
plotRV(results) 

# Plot optimization scores during optimization
plotopti(results) 

5. Save the results of the analysis to an external file, for example to clean raw alignments or prune gene trees based on the results from phylter.

write.phylter(results, file = "phylter.out")
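
The pruning mentioned in this step could look like the sketch below, which removes flagged species from the matching gene trees with `ape::drop.tip`. The trees and the outlier table are made up, and the two-column layout (gene, species) is an assumption about how the outliers are represented. The layout of the file written by write.phylter is not described here, so the sketch does not parse it.

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # Toy gene trees, named by gene (made up for illustration).
  toy_gene_trees <- list(
    gene1 = ape::read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);"),
    gene2 = ape::read.tree(text = "((A:1,C:1):1,(B:1,D:1):1);")
  )
  # Hypothetical outlier table: one row per (gene, species) pair to remove.
  outliers <- data.frame(gene = c("gene1", "gene2"),
                         species = c("D", "A"),
                         stringsAsFactors = FALSE)
  # Drop the flagged species from each gene tree; other trees stay untouched.
  pruned <- lapply(names(toy_gene_trees), function(g) {
    bad <- outliers$species[outliers$gene == g]
    if (length(bad) > 0) ape::drop.tip(toy_gene_trees[[g]], bad) else toy_gene_trees[[g]]
  })
  names(pruned) <- names(toy_gene_trees)
  print(lapply(pruned, function(tr) tr$tip.label))
}
```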

For comments, suggestions and bug reports, please open an issue on this GitHub repository.

