Python code (using ete2) - plotTree.py
This is rough code that we use sometimes for making figures in my lab (http://holtlab.net).
There will definitely be bugs, there are lots more features one could add, and there are probably many ways these features could be implemented differently/better.
So basically, I am putting this up here to share the love and save other people's time messing around with R, Python and ETE2, and so that others can use the code as the basis for their own functions or to learn how to do various handy things with these packages. If you find it useful and expand on it - please share!
Kat Holt - @DrKatHolt - http://holtlab.net
This is R code for plotting a phylogenetic tree and annotating the leaves with various information, including:
There are also options to:
(ii) Numeric values to plot as a heatmap (to be passed to the function via heatmapData="data.csv")
(iii) One column of numeric values to plot as a barplot (to be passed to the function via barData="bar.csv")
(iv) SNP allele table (to be passed to the function via snpFile="alleles.csv"), which will be plotted to indicate the position of SNPs in each strain, where SNPs are defined as differences COMPARED TO THE ALLELES IN COLUMN 1. So, your alleles in column 1 should be the inferred ancestral alleles (e.g. those of an outgroup).
(v) Blocks file (to be passed to the function via blockFile="blocksByStrain.txt")
p <- plotTree(tree="tree.nwk",heatmapData="data.csv",infoFile="info.csv",barData="bar.csv",snpFile="alleles.csv", blockFile="blocksByStrain.txt")
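To make the SNP definition in (iv) concrete, here is a minimal Python sketch (it is not the R code used by plotTree). It lists, for each strain, the positions where its allele differs from the ancestral column. The CSV layout (a header row, positions in the first field, ancestral alleles in the second, then one field per strain), the function name snp_positions and the choice to skip sites whose ancestral allele is itself a gap are all assumptions made for illustration.

```python
import csv
import io


def snp_positions(allele_csv_text, gap_char="-"):
    """Return {strain: [positions]} where the strain differs from the ancestral allele."""
    rows = list(csv.reader(io.StringIO(allele_csv_text)))
    header, body = rows[0], rows[1:]
    strains = header[2:]  # assumed layout: position, ancestral, then strains
    snps = {strain: [] for strain in strains}
    for row in body:
        position, ancestral = int(row[0]), row[1]
        if ancestral == gap_char:
            continue  # assumption: no call possible without an ancestral allele
        for strain, allele in zip(strains, row[2:]):
            # Gaps/unknown alleles are not counted as SNPs.
            if allele != gap_char and allele != ancestral:
                snps[strain].append(position)
    return snps


# Hypothetical three-site table with an outgroup as the ancestral column.
example = """pos,outgroup,strainA,strainB
101,A,A,G
250,C,T,-
399,G,G,G
"""
print(snp_positions(example))
```

With this toy table, strainA has one SNP (position 250) and strainB has one (position 101); the gap in strainB at 250 is ignored.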
Optionally, output to PDF:
(specify width in inches via w=X, specify height in inches via h=X)
OR output to PNG:
(specify width in pixels via w=X, specify height in pixels via h=X)
You can provide any or all of strain info, data to be plotted as a heatmap, data to be plotted as a bar chart, SNPs and/or blocks.
The order will be:
[ tree | info | heatmap | barplot | SNPs/blocks]
• Relative widths of the components can be changed in the function; by default they are:
left & right spacing framing the whole page: edgeWidth = 1
tree plotting space: treeWidth = 10
info printing space: infoWidth = 10
heatmap printing space: dataWidth = 30
barplot plotting space: barDataWidth = 10
SNP/blocks plotting space: blockPlotWidth = 10
• Relative heights of the components can be changed in the function; by default they are:
height of plotting spaces: mainHeight = 100
top & bottom spacing: labelHeight = 10
if heatmap provided, this will be the height of the area in which the column names are printed above the heatmap; otherwise the top edge height will be taken from edgeWidth
if barplot provided, this will be the height of the area in which the x-axis is printed below the barplot; otherwise the bottom edge height will be taken from edgeWidth
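As a rough illustration of what "relative" widths mean, the Python sketch below turns the default values listed above into fractions of the page width. It assumes the widths are simply normalised by their sum (edges included) and that only the components you supply take up space; the component keys and the function name width_fractions are hypothetical, not part of plotTree.

```python
def width_fractions(components, widths=None, edge_width=1):
    """Return the fraction of the page width taken by each plotted component."""
    # Defaults copied from the list above (treeWidth, infoWidth, dataWidth, ...).
    defaults = {"tree": 10, "info": 10, "heatmap": 30, "barplot": 10, "blocks": 10}
    merged = dict(defaults)
    merged.update(widths or {})
    # Left and right edges frame the whole page, so edge_width counts twice.
    total = float(2 * edge_width + sum(merged[c] for c in components))
    return {c: merged[c] / total for c in components}


# Tree + info + heatmap only: total relative width is 1 + 10 + 10 + 30 + 1 = 52.
fractions = width_fractions(["tree", "info", "heatmap"])
for name in ["tree", "info", "heatmap"]:
    print("%-8s %.3f" % (name, fractions[name]))
```

Under these assumptions, doubling treeWidth does not double the tree's share of the page, because the total grows as well.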
(see ?plot.phylo in R for more info)
• tip.labels = T turns on printing strain names at the tips
• tipLabelSize = 1 change the size of printed strain names (only relevant if tip.labels=T)
• offset=0 change the spacing between the end of the tip and the printed strain name (only relevant if tip.labels=T)
• tip.colour.cex=0.5 change the size of the coloured circles at the tips (only relevant if infoFile is provided AND colourNodesBy is specified)
• tipColours = c("blue","red","black") specify colours to use for colouring nodes (otherwise will use rainbow(n)). RColorBrewer palettes are a good option to help with colour selection
• lwd=1.5 change the line width of tree branches
• edge.color="black" change the colour of the tree branches
• axis=F,axisPos=3 add and position an axis for branch lengths
• colourNodesBy = "column name" colour the nodes according to the discrete values in this column. additional options:
legend=T, legend.pos="bottomleft" plot legend of node colour values, specify location (possible values: "topleft","topright","bottomleft" or "bottomright")
ancestral.reconstruction=T reconstruct ancestral states for this discrete variable, results will be returned as $mat and plotted as pie graphs on the tree
• infoCex=0.8 Change the size of the printed text
• heatmap.colours=
if not specified, uses white -> black
colorRampPalette is a good option, eg:
heatmap.colours=colorRampPalette(c("white","yellow","blue"),space="rgb")(100)
note the legend/scale will be plotted above the tree
• colLabelCex=0.8 change the size of the column labels
• cluster Cluster matrix columns? (Default is no clustering.)
Set cluster=T to use default hclust clustering method ("ward.D"), or specify a different method to pass to hclust (see ?hclust for options).
Alternatively, if you have a square matrix (i.e. strain x strain) and you want to order columns the same as rows to keep it square, set cluster="square"
• barDataCol=2 Colour for the barplot (can be numeric, 1=black, 2=red, etc; or text, "red", "black", etc)
• genome_size Sets the length of the x-axis that represents the length of the genome. This is REQUIRED when plotting SNPs/blocks.
• gapChar="-" Character used to indicate gaps/unknown alleles in the SNP file (will not be counted as SNPs).
• snp_colour Sets the colour of the lines indicating SNPs (default is red)
• genome_size Sets the length of the x-axis that represents the length of the genome. This is REQUIRED when plotting SNPs/blocks.
• block_colour Sets the colour of the lines indicating blocks (default is black). Blocks are drawn after SNPs, so may obscure SNPs.
• blwd Sets the height of the lines indicating blocks (default is 5).
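The cluster="square" option described above keeps a strain x strain matrix square by ordering its columns the same way as its rows. A minimal Python sketch of that idea follows; it is not the R implementation, and the function name reorder_square and the toy matrix are assumptions for illustration.

```python
def reorder_square(matrix, names, strain_order):
    """Reorder rows AND columns of a square matrix to follow strain_order."""
    index = {name: i for i, name in enumerate(names)}
    picks = [index[strain] for strain in strain_order]
    # Applying the same permutation to both axes keeps the diagonal on the diagonal.
    return [[matrix[r][c] for c in picks] for r in picks]


# Hypothetical pairwise-distance matrix for three strains.
names = ["A", "B", "C"]
distances = [
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
]
tree_order = ["C", "A", "B"]  # e.g. the order of leaves in the tree
for row in reorder_square(distances, names, tree_order):
    print(row)
```

Because one permutation is applied to both axes, each strain's self-comparison stays on the diagonal, which is what keeps the plotted matrix readable as a square.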
Ancestral trait reconstruction
To perform ancestral discrete trait reconstruction using ace, and plot the results as pie graphs on each node of the tree:
(i) specify the variable in the infoFile that you want to analyse: colourNodesBy="Variable_name"
(ii) set ancestral.reconstruction = T
(iii) to change the size of the pie graphs, change pie.cex (default value is 0.5)
Primary output is the rendered tree figure (in the R drawing device, or in a PDF/PNG file if specified). The plotTree() function also returns an R object with the following:
$info: infoFile input file, re-ordered as per tree
$anc: result of ancestral discrete trait reconstruction using ace
$mat: heatmap data file, with rows re-ordered as per tree and columns re-ordered as per clustering (if cluster=T)
$strain_order: order of leaves in the tree
Data (trees and tables) used in this example are available in the subdirectory /tree_example_april2015
Plot tree, colour tips by city of isolation, specify colours for each city manually, print strain details as table next to tree.
v <- plotTree(tree="tree.nwk",ancestral.reconstruction=F,tip.colour.cex=1,cluster=T,tipColours=c("black","purple2","skyblue2","grey"),lwd=1,infoFile="info.csv",colourNodesBy="location",treeWidth=10,infoWidth=10,infoCols=c("name","location","year"))
Plot tree, colour tips by location (as above), cluster a gene content matrix and plot as heatmap next to the tree (white = 0% coverage of gene, black = 100% coverage of the gene).
v <- plotTree(tree="tree.nwk",heatmapData="pan.csv",ancestral.reconstruction=F,tip.colour.cex=1,cluster=T,tipColours=c("black","purple2","skyblue2","grey"),lwd=1,infoFile="info.csv",colourNodesBy="location",treeWidth=5,dataWidth=20,infoCols=NA)
Plot tree, colour tips by location (as above), plot curated resistance gene information next to the tree as a heatmap...
Here the gene information in the heatmapData file is coded so that 0 represents absence, and different numbers are used to indicate presence of each gene/variant (e.g. in the gyrA column, one mutation is coded as 2 and the other is coded as 4).
We then specify which colour to use for each number, using heatmap.colours... here 0 (ie absent) is white; 2 (ie gyrA mutant 1) is "seagreen3"; 4 (ie gyrA mutant 2) is "darkgreen", etc etc.
heatmap.colours=c("white","grey","seagreen3","darkgreen","green","brown","tan","red","orange","pink","magenta","purple","blue","skyblue3","blue","skyblue2")
v <- plotTree(tree="tree.nwk",heatmapData="res_genes.csv",ancestral.reconstruction=F,tip.colour.cex=1,cluster=F,heatmap.colours=c("white","grey","seagreen3","darkgreen","green","brown","tan","red","orange","pink","magenta","purple","blue","skyblue3","blue","skyblue2"),tipColours=c("black","purple2","skyblue2","grey"),lwd=1,infoFile="info.csv",colourNodesBy="location",treeWidth=10,dataWidth=10,infoCols=c("name","year"),infoWidth=8)
This is a Python script that uses the Python package ETE2 (see http://ete.cgenomics.org/), which in turn requires PyQt4 and numpy. If you have BioPython then you will already have numpy. To install the rest, follow these steps:
Download sip from http://www.riverbankcomputing.com/software/sip/download
Unpack it (tar -xzvf sip.tar.gz) and cd into the directory (cd sip...)
Run these commands:
python configure.py -d /Library/Python/2.7/site-packages --arch x86_64
make
sudo make install
Download the Qt4 binary from http://qt.nokia.com/downloads/sdk-mac-os-cpp
Download PyQt4 from http://pyqt.sourceforge.net/Docs/PyQt4/installation.html
Unpack it and cd into the directory
Run these commands:
python configure.py -q /usr/bin/qmake-4.8 -d /Library/Python/2.7/site-packages/ --use-arch x86_64
make
sudo make install
sudo easy_install -U ete2
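After installing, it can help to confirm that Python actually sees each dependency before running the script. The sketch below simply tries to import each module by name; check_modules is a hypothetical helper written for this README, not part of plotTree or ETE2.

```python
def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be imported."""
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Import names of the dependencies mentioned in the steps above.
    required = ["numpy", "sip", "PyQt4", "ete2"]
    for name, ok in sorted(check_modules(required).items()):
        print("%-6s %s" % (name, "found" if ok else "MISSING"))
```

Run it with the same python executable you plan to use for plotTree.py, since each Python installation has its own site-packages.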
To use ETE2 on your own, see http://ete.cgenomics.org/
Running on contagion or merri
The script will work on a remote server, but ONLY IF you add "-Y" to the ssh login command. Eg:
ssh -Y you@server.com
--output tree.pdf
--tree tree.nwk
--info info.csv
--labels LABELS [LABELS ...]
--tags
--padding PADDING
--colour_nodes_by COLOUR_NODES_BY
--node_size NODE_SIZE
Size for node shapes (default 10)
(iii) Use colour blocks to represent column values
--colour_tags COLOUR_TAGS [COLOUR_TAGS ...]
--colour_dict COLOUR_DICT
--data DATA
--data_type DATA_TYPE
--data_width DATA_WIDTH
Total width of data plot for each strain (mm, default 200)
--data_height DATA_HEIGHT
--mindata MINDATA Minimum data value for plotting scale (-1)
--maxdata MAXDATA Maximum data value for plotting scale (1)
--centervalue CENTERVALUE
--midpoint Midpoint root the tree
--outgroup OUTGROUP Outgroup to root the tree
--no_ladderize Switch off ladderizing
--show_leaf_names Print leaf names as well as labels
--branch_support Print branch supports
--no_guiding_lines Turn off linking nodes to data with guiding lines
--fan Plot tree as fan dendrogram
--length_scale LENGTH_SCALE scale (pixels per branch length unit)
--branch_padding BRANCH_PADDING branch (pixels between each branch, ie vertical padding)
--branch_support_print Print branch supports
--branch_support_colour Colour branch leading to node by branch supports (scale is 0=red -> 100=black)
--branch_support_cutoff BRANCH_SUPPORT_CUTOFF
--colour_branches_by COLOUR_BRANCHES_BY variable to use for colouring branches
--colour_backgrounds_by COLOUR_BACKGROUNDS_BY variable to use for colouring clade backgrounds
--title TITLE Title for plot
--width WIDTH width of output image file (mm, default 200)
--interactive Switch on interactive view (after printing tree to file)
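As an example of combining the options above, this Python sketch assembles a command line for plotTree.py as an argument list. Only flags from the listing above are used; the helper name build_plottree_command, the file names and the info.csv column names ("name", "year", "location") are placeholders chosen for illustration, and the script path and python executable are assumptions about your setup.

```python
def build_plottree_command(tree, output, info=None, labels=(),
                           colour_nodes_by=None, midpoint=False,
                           script="plotTree.py", python="python"):
    """Assemble an argument list for plotTree.py from a few common options."""
    cmd = [python, script, "--tree", tree, "--output", output]
    if info:
        cmd += ["--info", info]
    if labels:
        # --labels takes one or more column names
        cmd += ["--labels"] + list(labels)
    if colour_nodes_by:
        cmd += ["--colour_nodes_by", colour_nodes_by]
    if midpoint:
        cmd.append("--midpoint")
    return cmd


cmd = build_plottree_command(
    "tree.nwk", "tree.pdf",
    info="info.csv",
    labels=["name", "year"],
    colour_nodes_by="location",
    midpoint=True,
)
print(" ".join(cmd))
```

The list can be passed to subprocess.check_call(cmd) to run the script; passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.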
plotTree script - examples
COMING SOON