The ggtree package should not be viewed solely as a standalone software. While it is useful for viewing, annotating and manipulating phylogenetic trees, it is also an infrastructure that enables evolutionary evidences that inferred by commonly used software packages in the field to be used in R. For instance, dN/dS values or ancestral sequences inferred by CODEML1, clade support values (posterior) inferred by BEAST2 and short read placement by EPA3 and pplacer4. These evolutionary evidences are not only used in annotating phylogenetic tree in ggtree but can also be further analyzed in R.

Supported File Formats

Most of the tree viewer software (including R packages) focus on Newick and Nexus file formats, while there are file formats from different evolution analysis software that contain supporting evidences within the file that are ready for annotating a phylogenetic tree. The ggtree package define several parser functions and S4 classes to store statistical evidences inferred by commonly used software packages. It supports several file formats, including:

and software output from:

Parser functions

The ggtree package implement several parser functions, including:

S4 classes

Correspondingly, ggtree defines several S4 classes to store evolutionary evidences inferred by these software packages, including:

The jplace class is also designed to store user specified annotation data.

Here is an overview of these S4 classes:

In addition, ggtree also supports phylo, multiPhylo (defined by ape10), phylo4, phylo4d (defined by phylobase) obkData (defined in OutbreakTools) and phyloseq (defined in phyloseq).

In ggtree, tree objects can be merged and evidences inferred from different phylogenetic analyses can be combined or compared and visualized.

Viewing a phylogenetic tree in ggtree is easy by using the command ggtree(tree_object) and annotating a phylogenetic tree is simple by adding graphic layers using the grammar of graphics.

For each class, we defined get.fields method to get the annotation features that available in the object that can be used to annotate a phylogenetic tree directly in ggtree. A get.tree method can be used to convert tree object to phylo (or multiPhylo for r8s) object that are widely supported by other R packages.

The groupOTU method is used for clustering related OTUs (from tips to their most recent common ancestor). Related OTUs are not necessarily within a clade, they can be distantly related. groupOTU works fine for monophyletic (clade), polyphyletic and paraphyletic, while groupClade only works for clade (monophyletic). These methods are useful for clustering related OTUs or clades.

The fortify method is used to convert tree object to a data.frame which is familiar by R users and easy to manipulate. The output data.frame contains tree information and all evolutionary evidences (if available, e.g. dN/dS in codeml object).

Detail descriptions of slots defined in each class are documented in class man pages. Users can use class?className (e.g. class?beast) to access man page of a class.

Getting Tree Data into R

Parsing BEAST output

file <- system.file("extdata/BEAST", "beast_mcc.tree", package="ggtree")
beast <- read.beast(file)
beast
## 'beast' S4 object that stored information of
##   '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/BEAST/beast_mcc.tree'.
## 
## ...@ tree: 
## Phylogenetic tree with 15 tips and 14 internal nodes.
## 
## Tip labels:
##  A_1995, B_1996, C_1995, D_1987, E_1996, F_1997, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##   'height',  'height_0.95_HPD',  'height_median',    'height_range', 'length',
##   'length_0.95_HPD', 'length_median',    'length_range', 'posterior',    'rate',
##   'rate_0.95_HPD',   'rate_median',  'rate_range'.

Since % is not a valid character in names, all the feature names that contain x% will convert to 0.x. For example, length_95%_HPD will be changed to length_0.95_HPD.

The get.fields method return all available features that can be used for annotation.

get.fields(beast)
##  [1] "height"          "height_0.95_HPD" "height_median"  
##  [4] "height_range"    "length"          "length_0.95_HPD"
##  [7] "length_median"   "length_range"    "posterior"      
## [10] "rate"            "rate_0.95_HPD"   "rate_median"    
## [13] "rate_range"

Users can use ggtree(beast) to visualize the tree and add layer to annotate it.

ggtree(beast, ndigits=2, branch.length = 'none') + geom_text(aes(x=branch, label=length_0.95_HPD), vjust=-.5, color='firebrick')

ggtree provides geom_range layer to display uncertainty of branch length.

ggtree(beast) + geom_range(range='length_0.95_HPD', color='red', alpha=.6, size=2)

In FigTree, only heigh_0.95_HPD is meaningful since the branch is scaled by height. In ggtree we can display HPD of rate, height or other variable if available since ggtree can rescale a tree using rescale_tree function or by specifing branch.length in ggtree function.

ggtree(beast, branch.length = 'rate') + geom_range(range='rate_0.95_HPD', color='red', alpha=.6, size=2)

With ggtree, evolutionary evidences inferred by commonly used software packages (BEAST in this example) can be easily transformed to a tidy data.frame by fortify method.

beast_data <- fortify(beast)
head(beast_data)
##   node parent branch.length        x  y  label isTip    branch angle
## 1    1     26     1.3203366 19.92609 13 A_1995  TRUE 19.265920   312
## 2    2     25     3.2619545 20.92609 12 B_1996  TRUE 19.295111   288
## 3    3     28     3.3119924 19.92609  7 C_1995  TRUE 18.270092   168
## 4    4     17     8.8216633 11.92609  1 D_1987  TRUE  7.515257    24
## 5    5     29     3.2710481 20.92609  9 E_1996  TRUE 19.290565   216
## 6    6     27     0.8311578 21.92609 14 F_1997  TRUE 21.510510   336
##   height height_0.95_HPD height_median height_range    length
## 1     18         [18,18]            18      [18,18] 1.3441569
## 2     17         [17,17]            17      [17,17] 3.3017477
## 3     18         [18,18]            18      [18,18] 3.3286573
## 4     26         [26,26]            26      [26,26] 9.1413106
## 5     17         [17,17]            17      [17,17] 3.2862156
## 6     16         [16,16]            16      [16,16] 0.8488314
##                        length_0.95_HPD length_median
## 1  [0.68517891886869,2.10354483714556]     1.3212658
## 2   [2.53358832907781,4.1401029285715]     3.2820598
## 3  [2.27307022364686,4.31554438343149]     3.3119924
## 4  [5.30884002039044,13.4387641030349]     8.9748680
## 5  [2.22106540956613,4.33254269801286]     3.2713107
## 6 [0.400580889894957,1.31579833143232]     0.8311578
##                           length_range posterior        rate
## 1 [0.265324005330331,3.07384777043179]        NA 0.002910248
## 2   [2.0792675983921,5.06630544445367]        NA 0.002931375
## 3  [1.71975701694686,6.50816017066041]        NA 0.002897274
## 4  [3.19947741596854,18.8278732882232]        NA 0.002924273
## 5  [1.64732407690073,5.67293898900339]        NA 0.002930741
## 6  [0.24483144082356,1.97014285250898]        NA 0.002908179
##                               rate_0.95_HPD rate_median
## 1 [0.00223113475668131,0.00356451343607832] 0.002894327
## 2  [0.0022724788824011,0.00354123504386722] 0.002911165
## 3 [0.00227453143856834,0.00351910617601055] 0.002891624
## 4 [0.00215956295110437,0.00357908783275133] 0.002906094
## 5 [0.00228866191654885,0.00356083209119594] 0.002916350
## 6  [0.00222013461432246,0.0035242778400976] 0.002894097
##                                  rate_range
## 1 [0.00139151638500062,0.00527869901952313]
## 2 [0.00175983928116987,0.00501640955442468]
## 3 [0.00137369890223726,0.00491233632681979]
## 4 [0.00129525039729045,0.00607832777082667]
## 5 [0.00141612851412156,0.00557985134567404]
## 6 [0.00118946556091268,0.00607832777082667]

Parsing PAML output

rst file

The read.paml_rst function can parse rst file from BASEML and CODEML. The only difference is the space in the sequences. For BASEML, each ten bases are separated by one space, while for CODEML, each three bases (triplet) are separated by one space.

brstfile <- system.file("extdata/PAML_Baseml", "rst", package="ggtree")
brst <- read.paml_rst(brstfile)
brst
## 'paml_rst' S4 object that stored information of
##   '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/PAML_Baseml/rst'.
## 
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## Node labels:
##  16, 17, 18, 19, 20, 21, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##   'marginal_subs',   'joint_subs',   'marginal_AA_subs', 'joint_AA_subs'.
p <- ggtree(brst) +
    geom_text(aes(x=branch, label=marginal_AA_subs), vjust=-.5, color='steelblue') +
    theme_tree2()
print(p)

Similarly, we can parse the rst file from CODEML and visualize the tree with amino acid substitution.

crstfile <- system.file("extdata/PAML_Codeml", "rst", package="ggtree")
crst <- read.paml_rst(crstfile)
crst
## 'paml_rst' S4 object that stored information of
##   '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/PAML_Codeml/rst'.
## 
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## Node labels:
##  16, 17, 18, 19, 20, 21, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##   'marginal_subs',   'joint_subs',   'marginal_AA_subs', 'joint_AA_subs'.

ggtree defines the update operator, %<%, that can update a tree view (applying the same pattern of visualization and annotation) with another tree object.

p %<% crst

We can find that these two figures have different evolution distances and subsitutions inferred by BASEML and CODEML are slightly different.

CODEML

mlc file

The mlc file output by CODEML contains dN/dS estimation.

mlcfile <- system.file("extdata/PAML_Codeml", "mlc", package="ggtree")
mlc <- read.codeml_mlc(mlcfile)
mlc
## 'codeml_mlc' S4 object that stored information of
##   '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/PAML_Codeml/mlc'. 
## 
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## Node labels:
##  16, 17, 18, 19, 20, 21, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##   't',   'N',    'S',    'dN_vs_dS', 'dN',   'dS',   'N_x_dN',   'S_x_dS'.

Please aware that / and * are not valid characters in names, they were changed to _vs_ and _x_ respectively.

So dN_vs_dS is dN/dS, N_x_dN is N*dN, and S_x_dS is S*dS.

ggtree(mlc) + geom_text(aes(x=branch, label=dN_vs_dS), color='blue', vjust=-.2)

The paramter branch.length can be one of available annotations:

get.fields(mlc)
## [1] "t"        "N"        "S"        "dN_vs_dS" "dN"       "dS"      
## [7] "N_x_dN"   "S_x_dS"

For example, if we set branch.length to dN_vs_dS, the tree view will be re-scaled based on \(\omega\) (dN/dS).

ggtree(mlc, branch.length = "dN_vs_dS", aes(color=dN_vs_dS)) +
    scale_color_continuous(name='dN/dS', limits=c(0, 1.5),
                           oob=scales::squish, low="darkgreen", high="red")+
    theme_tree2(legend.position=c(.9, .5))

We can also plot the dN or dS tree and others. More examples (including evidences inferred by BEAST) can be found in the Tree Annotation vignette.

CODEML output: rst and mlc files

We annotate the tree with information presented in rst and mlc file separately as demonstrated in previous sessions.

We can also use both of them which is highly recommended.

ml <- read.codeml(crstfile, mlcfile)
ml
## 'codeml' S4 object that stored information of
##   '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/PAML_Codeml/rst' and 
##  '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/PAML_Codeml/mlc'. 
## 
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## Node labels:
##  16, 17, 18, 19, 20, 21, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##   'marginal_subs',   'joint_subs',   'marginal_AA_subs', 'joint_AA_subs',
##   't',   'N',    'S',    'dN_vs_dS',
##   'dN',  'dS',   'N_x_dN',   'S_x_dS',
## .
head(fortify(ml))
##    label node parent branch.length        x  y isTip    branch angle     t
## 14     A    1     27      0.010340 0.417866 13  TRUE 0.4126960   312 0.010
## 15     B    2     26      0.031643 0.431097 12  TRUE 0.4152755   288 0.032
## 16     C    3     22      0.027730 0.418729  7  TRUE 0.4048640   168 0.028
## 17     D    4     17      0.082021 0.313649  3  TRUE 0.2726385    72 0.082
## 18     E    5     23      0.031104 0.430431  8  TRUE 0.4148790   192 0.031
## 19     F    6     28      0.006649 0.442478 14  TRUE 0.4391535   336 0.007
##         N     S dN_vs_dS     dN     dS N_x_dN S_x_dS
## 14 1514.9 633.1   0.0646 0.0007 0.0101      1    6.4
## 15 1514.9 633.1   0.0001 0.0000 0.0358      0   22.7
## 16 1514.9 633.1   0.0461 0.0013 0.0282      2   17.9
## 17 1514.9 633.1   0.0385 0.0033 0.0849      5   53.8
## 18 1514.9 633.1   0.0641 0.0020 0.0305      3   19.3
## 19 1514.9 633.1   0.2980 0.0013 0.0044      2    2.8
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                marginal_subs
## 14                                                                                                                                                                                                                                                                                                                                                                                                                C214T / G294T / T966C \n C1641T / A1808G / C1941T / C2070T
## 15                                                                                                                                                                                                                                                                                     C222T / G555A / C732A / C828T / G939T / T999C / A1047G / G1083A / C1263T / G1329A \n A1341G / A1344T / T1405C / T1440C / C1752T / A1827G / C1911T / C1941T / G2013A / T2022C / C2043T
## 16                                                                                                                                                                                                                                                                                                               A204G / C222T / G504A / A815G / C891T / G1023T / T1061C / G1101A \n G1176A / C1299T / C1545T / C1584T / T1626C / A1668G / G1878A / G1884T / C1944T / A1947G
## 17 A111G / A168G / T211C / A213G / T214C / G243A / A297G / A311G / T315A / G369A / G423A / A477C / G504A / G555A / G585A / C621T / C636T / A768G / G774A / T780C / C825T / A867G / A870G / G885A / C888T / G897A \n C987T / C1006T / G1017A / A1047G / C1062T / A1146T / G1197A / A1230G / A1281G / T1365C / C1392T / T1439C / A1443G / A1470G / A1482G / T1512C / T1530C / T1539C / T1560C / A1608G / G1671A / G1689A / G1695A / A1761G / T1776C / T1787C / A1914G / G2052A
## 18                                                                                                                                                                                                                                                                                                 C174T / G243A / G321A / A622T / T831C / A894G / G948T / C960T / A984G \n C1041T / G1135A / T1215C / T1249C / A1401G / T1527A / T1594C / A1826G / A1848G / T1950C / G1986A
## 19                                                                                                                                                                                                                                                                                                                                                                                                                                 T829C / C1353T / A1443G / T1548C / C1645A
##                                                                                                                                                                                                                                                                                                                                                                                                                                                 joint_subs
## 14                                                                                                                                                                                                                                                                                                                                                                                              C214T / G294T / T966C \n C1641T / A1808G / C1941T / C2070T
## 15                                                                                                                                                                                                                                                                   C222T / G555A / C732A / C828T / G939T / T999C / A1047G / G1083A / C1263T / G1329A \n A1341G / A1344T / T1405C / T1440C / C1752T / A1827G / C1911T / C1941T / G2013A / T2022C / C2043T
## 16                                                                                                                                                                                                                                                                                             A204G / C222T / G504A / A815G / C891T / G1023T / T1061C / G1101A \n G1176A / C1299T / C1545T / C1584T / T1626C / A1668G / G1878A / G1884T / C1944T / A1947G
## 17 A111G / A168G / T211C / A213G / T214C / G243A / A311G / T315A / G369A / G423A / A477C / G504A / G555A / G585A / C621T / C636T / A768G / T771C / G774A / T780C / C825T / A867G / A870G / G885A / C888T \n G897A / C987T / C1006T / G1017A / A1047G / C1062T / A1146T / G1197A / A1230G / A1281G / T1365C / C1392T / T1439C / A1443G / T1512C / T1530C / T1539C / T1560C / A1608G / G1671A / G1689A / G1695A / A1761G / T1776C / T1787C / A1914G / G2052A
## 18                                                                                                                                                                                                                                                                               C174T / G243A / G321A / A622T / T831C / A894G / G948T / C960T / A984G \n C1041T / G1135A / T1215C / T1249C / A1401G / T1527A / T1594C / A1826G / A1848G / T1950C / G1986A
## 19                                                                                                                                                                                                                                                                                                                                                                                                               T829C / C1353T / A1443G / T1548C / C1645A
##                         marginal_AA_subs
## 14                                 K603R
## 15                                      
## 16                         N272S / I354T
## 17 K104R / F105L / E382D / F480S / I596T
## 18                 T208S / V379I / K609R
## 19                         S277P / L549I
##                            joint_AA_subs
## 14                                 K603R
## 15                                      
## 16                         N272S / I354T
## 17 K104R / F105L / E382D / F480S / I596T
## 18                 T208S / V379I / K609R
## 19                         S277P / L549I

All the features in both files are available for annotation. For example, we can annotate dN/dS with the tree in rstfile and amino acid substitutions with the tree in mlcfile.

Parsing HYPHY output

nwk <- system.file("extdata/HYPHY", "labelledtree.tree", package="ggtree")
ancseq <- system.file("extdata/HYPHY", "ancseq.nex", package="ggtree")
tipfas <- system.file("extdata", "pa.fas", package="ggtree")
hy <- read.hyphy(nwk, ancseq, tipfas)
hy
## 'hyphy' S4 object that stored information of 
##   '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/HYPHY/labelledtree.tree', 
##  '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/HYPHY/ancseq.nex' and 
##  '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/pa.fas'. 
## 
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  K, N, D, L, J, G, ...
## Node labels:
##  Node1, Node2, Node3, Node4, Node5, Node12, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##   'subs',    'AA_subs'.
ggtree(hy) + geom_text(aes(x=branch, label=AA_subs), vjust=-.5)

Parsing r8s output

r8s output contains 3 trees, namely TREE, RATO and PHYLO for time tree, rate tree and absolute substitution tree respectively.

r8s <- read.r8s(system.file("extdata/r8s", "H3_r8s_output.log", package="ggtree"))
ggtree(r8s, branch.length="TREE", mrsd="2014-01-01") + theme_tree2()

branch.length should be one of TREE, RATO or PHYLO for selecting the corresponding tree.

User can also view 3 trees simultaneously.

ggtree(get.tree(r8s), aes(color=.id)) + facet_wrap(~.id, scales="free_x")

Parsing output of RAxML bootstraping analysis

raxml_file <- system.file("extdata/RAxML", "RAxML_bipartitionsBranchLabels.H3", package="ggtree")
raxml <- read.raxml(raxml_file)
ggtree(raxml) + geom_label(aes(label=bootstrap, fill=bootstrap)) + geom_tiplab() +
    scale_fill_continuous(low='darkgreen', high='red') + theme_tree2(legend.position='right')

Parsing NHX tree

NHX (New Hampshire eXtended) format is commonly used in phylogenetics software (e.g. PHYLDOG6, RevBayes9, etc.)

nhxfile <- system.file("extdata", "ADH.nhx", package="ggtree")
nhx <- read.nhx(nhxfile)
ggtree(nhx) + geom_tiplab() + geom_point(aes(color=S), size=5, alpha=.5) + 
    theme(legend.position="right") +
    geom_text(aes(label=branch.length, x=branch), vjust=-.5) + 
    xlim(NA, 0.3)

Parsing Phylip tree

Phylip format contains taxa sequences with Newick tree text.

phyfile <- system.file("extdata", "sample.phy", package="ggtree")
phylip <- read.phylip(phyfile)
phylip
## 'phylip' S4 object that stored information of
##   '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/sample.phy'.
## 
## ...@ tree: 
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  K, N, D, L, J, G, ...
## 
## Unrooted; no branch lengths.
## 
## with sequence alignment available (15 sequences of length 2148)
ggtree(phylip) + geom_tiplab()

User can view sequence alignment with the tree via msaplot() function.

msaplot(phylip, offset=1)
## Warning: Ignoring unknown aesthetics: x, y

Parsing EPA and pplacer output

EPA3 and PPLACER4 have common output file format, jplace, which can be parsed by read.jplace() function.

jpf <- system.file("extdata/sample.jplace",  package="ggtree")
jp <- read.jplace(jpf)
print(jp)
## 'jplace' S4 object that stored information of
##   '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/sample.jplace'. 
## 
## ...@ tree: 
## Phylogenetic tree with 13 tips and 12 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features availables:
##   'edge_num',    'likelihood',   'like_weight_ratio',    'distal_length',    'pendant_length'.

In ggtree, we provide get.placements method to access the placement.

## get only best hit
get.placements(jp, by="best")
##   name edge_num likelihood like_weight_ratio distal_length pendant_length
## 1   AA       24  -61371.30          0.333344         3e-06       0.003887
## 2   BB        1  -61312.21          0.333335         1e-06       0.000003
## 3   CC        8  -61312.23          0.200011         1e-06       0.000003
## get all placement
get.placements(jp, by="all")
##   name edge_num likelihood like_weight_ratio distal_length pendant_length
## 1   AA       24  -61371.30          0.333344      0.000003       0.003887
## 2   BB        1  -61312.21          0.333335      0.000001       0.000003
## 3   BB        2  -61322.21          0.333322      0.000003       0.000003
## 4   BB      550  -61352.21          0.333322      0.000961       0.000003
## 5   CC        8  -61312.23          0.200011      0.000001       0.000003
## 6   CC        9  -61322.23          0.200000      0.000003       0.000003
## 7   CC       10  -61342.23          0.199992      0.000003       0.000003

This is only a tiny sample file. In reality, EPA and PPLACER may place thousands of short reads on a reference tree.

We may, for example, count the number of placement and annotate this information in the tree.

merging tree objects

In ggtree, tree objects can be merged and evidences inferred from different phylogenetic analyses can be combined or compared and visualized.

User can use the command like:

tree <- merge_tree(tree_object_1, tree_object_2)
## it's chainable, and serveral tree objects can be merged into one tree object
tree <- merge_tree(tree_object_1, tree_object_2) %>% merge_tree(tree_object_3) %>% merge_tree(tree_object_4)

Here we use a tree analyzed by BEAST and CodeML and merge them into one.

beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)

rst_file <- system.file("examples/rst", package="ggtree")
mlc_file <- system.file("examples/mlc", package="ggtree")
codeml_tree <- read.codeml(rst_file, mlc_file)

merged_tree <- merge_tree(beast_tree, codeml_tree)

merged_tree
## 'beast' S4 object that stored information of
##   '/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/examples/MCC_FluA_H3.tree'.
## 
## ...@ tree: 
## Phylogenetic tree with 76 tips and 75 internal nodes.
## 
## Tip labels:
##  A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##   'height',  'height_0.95_HPD',  'height_median',    'height_range', 'length',
##   'length_0.95_HPD', 'length_median',    'length_range', 'posterior',    'rate',
##   'rate_0.95_HPD',   'rate_median',  'rate_range',   't',    'N',
##   'S',   'dN_vs_dS', 'dN',   'dS',   'N_x_dN',
##   'S_x_dS',  'marginal_subs',    'joint_subs',   'marginal_AA_subs', 'joint_AA_subs',
## .
head(fortify(merged_tree))
##     node parent branch.length         x  y                  label isTip
## 1      1    115     1.7132967 21.273403 42 A/Hokkaido/30-1-a/2013  TRUE
## 64     2     94     0.4987642 12.273403 19    A/New_York/334/2004  TRUE
## 75     3    149     1.1007898 13.273403 20    A/New_York/463/2005  TRUE
## 86     4     81     1.4515245  7.273403  5    A/New_York/452/1999  TRUE
## 97     5    150     0.7584041 13.273403 22    A/New_York/238/2005  TRUE
## 108    6     80     0.9862567  6.273403  4    A/New_York/523/1998  TRUE
##        branch     angle height      height_0.95_HPD height_median
## 1   20.416754 198.94737      0                 <NA>            NA
## 64  12.024021  90.00000      9 [8.99999999999999,9]             9
## 75  12.723008  94.73684      8 [7.99999999999999,8]             8
## 86   6.547640  23.68421     14              [14,14]            14
## 97  12.894201 104.21053      8 [7.99999999999999,8]             8
## 108  5.780274  18.94737     15              [15,15]            15
##                            height_range    length
## 1                                  <NA> 1.7488457
## 64  [8.99999999999999,9.00000000000001] 0.4142844
## 75  [7.99999999999999,8.00000000000001] 1.2150863
## 86                              [14,14] 1.4825054
## 97  [7.99999999999999,8.00000000000001] 0.8352450
## 108                             [15,15] 1.0263708
##                          length_0.95_HPD length_median
## 1     [1.2419245785787,2.35253528554965]     1.7134859
## 64  [0.11087381773155,0.763503094778329]     0.3910469
## 75   [0.668322260746544,1.7737712171576]     1.2219990
## 86  [0.853931395208569,2.15937558137689]     1.4515996
## 97  [0.174538841541212,1.44334731036665]     0.8340485
## 108 [0.297308827683315,1.81660844839582]     0.9859892
##                              length_range posterior        rate
## 1     [1.08469725616582,3.35118282047009]        NA 0.003922044
## 64  [0.0335456372102882,1.29678831285928]        NA 0.005388027
## 75   [0.162345251796054,2.28802468225478]        NA 0.004054826
## 86   [0.425541574416807,3.40177019752132]        NA 0.002404793
## 97   [0.0177955758220367,2.3938807335679]        NA 0.003345623
## 108  [0.124258872802288,3.29822995743692]        NA 0.003278158
##                                  rate_0.95_HPD rate_median
## 1    [0.00198424955162117,0.00613179409091572] 0.003798885
## 64    [0.00193310629993817,0.0100837365559632] 0.004924431
## 75   [0.00176457626785999,0.00666684842413834] 0.003832351
## 86  [0.000967813813657923,0.00399429523696937] 0.002291148
## 97    [0.00116683553294069,0.0060052602738473] 0.003097797
## 108  [0.00117931453028289,0.00590429993517813] 0.003051411
##                                    rate_range     t      N     S dN_vs_dS
## 1    [0.00116560599218871,0.0116327877779378] 0.021 1220.7 465.3   0.0816
## 64  [0.000981041342925409,0.0249303086260615] 0.009 1220.7 465.3   0.0936
## 75   [0.00107720583988351,0.0237616091905446] 0.015 1220.7 465.3   0.2242
## 86  [0.000458425175436178,0.0082332992806142] 0.005 1220.7 465.3   0.1876
## 97   [0.00064526720263127,0.0145675373060715] 0.005 1220.7 465.3   0.7543
## 108  [0.000648648207991904,0.016590355843508] 0.007 1220.7 465.3   1.1624
##         dN     dS N_x_dN S_x_dS
## 1   0.0017 0.0205    2.0    9.5
## 64  0.0008 0.0089    1.0    4.1
## 75  0.0025 0.0111    3.0    5.2
## 86  0.0008 0.0044    1.0    2.1
## 97  0.0017 0.0022    2.0    1.0
## 108 0.0025 0.0022    3.1    1.0
##                                                                                marginal_subs
## 1   C72T / A146G / C285T / C369A / G501A \n T606C / T873G / C885A / C1242A / A1506G / A1512G
## 64                                                     T210C / T569A / C720A / C790T / G948A
## 75                        A52G / T432A / T623C \n C1098T / G1194A / G1206A / T1380C / C1562T
## 86                                                                   C543T / C1038T / C1615T
## 97                                                                      A13T / A567T / G591A
## 108                                                             T32C / G74A / C399T / A1559G
##                                                                                   joint_subs
## 1   C72T / A146G / C285T / C369A / G501A \n T606C / T873G / C885A / C1242A / A1506G / A1512G
## 64                                                     T210C / T569A / C720A / C790T / G948A
## 75                        A52G / T432A / T623C \n C1098T / G1194A / G1206A / T1380C / C1562T
## 86                                                                   C543T / C1038T / C1615T
## 97                                                                      A13T / A567T / G591A
## 108                                                             T32C / G74A / C399T / A1559G
##         marginal_AA_subs        joint_AA_subs
## 1           Q49R / N291K         Q49R / N291K
## 64                 F190Y                F190Y
## 75  K18E / I208T / S521L K18E / I208T / S521L
## 86                 L539F                L539F
## 97           I5F / K189N          I5F / K189N
## 108  L11S / S25N / K520R  L11S / S25N / K520R

After merging, all evidences inferred from different tools can be used to annotate the tree simultaneously. In this example, we used dN/dS inferred by CodeML to color the tree and annotate the tree with posterior inferred by BEAST.

ggtree(merged_tree, aes(color=dN_vs_dS), mrsd="2013-01-01", ndigits = 3) +
    geom_text(aes(label=posterior), vjust=.1, hjust=-.1, size=5, color="black") + 
    scale_color_continuous(name='dN/dS', limits=c(0, 1.5),
                           oob=scales::squish, low="green", high="red")+
    theme_tree2(legend.position="right")

jplace file format

The jplace file format was defined by Masten12 for phylogenetic placements. We employed this file format to store phylogenetic tree and user specified annotation data. Suppose we have a tree, and the associated data as shown below:

tree <- system.file("extdata", "pa.nwk", package="ggtree")
data <- read.csv(system.file("extdata", "pa_subs.csv", package="ggtree"), stringsAsFactor=FALSE)
print(tree)
## [1] "/tmp/RtmpNWH8ce/Rinst59a4280360d/ggtree/extdata/pa.nwk"
head(data)
##   label                          subs    gc
## 1     A                   R262K/K603R 0.444
## 2     B                         R262K 0.442
## 3     C                   N272S/I354T 0.439
## 4     D K104R/F105L/E382D/F480S/I596T 0.449
## 5     E             T208S/V379I/K609R 0.443
## 6     F                   S277P/L549I 0.444

The data contains amino acid substitutions from parent node to child node and GC contents of each node. We can annotate the tree as demonstrated in User specified annotation session of Tree Annotation vignette.

ggtree provides a function, write.jplace, to combine a tree and an associated data and store them to a single jplace file.

outfile <- tempfile()
write.jplace(tree, data, outfile)

The read.jplace function was designed to read the jplace file and store the information to a jplace object.

jp <- read.jplace(outfile)
print(jp)
## 'jplace' S4 object that stored information of
##   '/tmp/Rtmpl2ibsg/file6f89a9ad2d'. 
## 
## ...@ tree: 
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  K, N, D, L, J, G, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features availables:
##   'label',   'subs', 'gc'.

Now we know the jp object that stores the tree and the associated amino acid substitution and GC content information, we can view the tree and display the associated annotation data on it directly by ggtree.

ggtree(jp) + 
    geom_text(aes(x=branch, label=subs), color="purple", vjust=-1, size=3) + 
    geom_text(aes(label=gc), color="steelblue", hjust=-.6, size=3) +
    geom_tiplab()

References

1. Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586–1591 (2007).

2. Bouckaert, R. et al. BEAST 2: A software platform for bayesian evolutionary analysis. PLoS Comput Biol 10, e1003537 (2014).

3. Berger, S. A., Krompass, D. & Stamatakis, A. Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Systematic Biology 60, 291–302 (2011).

4. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. Pplacer: Linear time maximum-likelihood and bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).

5. Pond, S. L. K., Frost, S. D. W. & Muse, S. V. HyPhy: Hypothesis testing using phylogenies. Bioinformatics 21, 676–679 (2005).

6. Boussau, B. et al. Genome-scale coestimation of species and gene trees. Genome Res. 23, 323–330 (2013).

7. Marazzi, B. et al. Locating evolutionary precursors on a phylogenetic tree. Evolution 66, 3918–3930 (2012).

8. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics btu033 (2014). doi:10.1093/bioinformatics/btu033

9. Höhna, S. et al. Probabilistic graphical model representation in phylogenetics. Syst Biol 63, 753–771 (2014).

10. Paradis, E., Claude, J. & Strimmer, K. APE: Analyses of phylogenetics and evolution in r language. Bioinformatics 20, 289–290 (2004).

11. Schliep, K. P. Phangorn: Phylogenetic analysis in r. Bioinformatics 27, 592–593 (2011).

12. Matsen, F. A., Hoffman, N. G., Gallagher, A. & Stamatakis, A. A format for phylogenetic placements. PLoS ONE 7, e31009 (2012).