Plotting a phylogeny with the package ggtree
(5 min)
Installing the package ggtree
(10 min)
- ggtree is an extension of the ggplot2 package, developed specifically for phylogenetic tree visualization.
- The author has made available an extensive book with examples.
- ggtree is hosted in Bioconductor (not CRAN).
- Run length(BiocManager::available()) if you want to know the number of R packages available for installation from both CRAN and Bioconductor.
- “CRAN hosts over 15000 packages and is the official repository for user contributed R packages. Bioconductor provides open source software oriented towards bioinformatics and hosts over 1800 R packages”, from An Introduction to R.
- Bioconductor Vs CRAN
- The function install.packages(), which we know well, only works for CRAN packages.
- To install an R package from Bioconductor, use the function install() from the package BiocManager:
  - Install BiocManager from CRAN with install.packages("BiocManager")
  - Then install ggtree from Bioconductor with BiocManager::install("ggtree")
Exercise 1
- Download this phylogenetic tree of species from the Portal Project Teaching Database, by clicking on the link and saving it to your data-raw folder.
- Open the file by clicking on its name on the Files tab of RStudio’s Plots pane. It should look like this:
- Use the main function of the package ggtree (it is also called ggtree()) to visualize portal-tree.tre: ggtree(portal_tree)
- What differences can you note between a ggtree() plot and one generated with plot.phylo()?
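To compare the two plotting systems side by side, here is a minimal sketch. It assumes ape and ggtree are installed, and it uses a small hypothetical Newick string (three Portal genera) instead of portal-tree.tre so that it runs on its own; for the exercise, you would read the file with read.tree("../data-raw/portal-tree.tre") instead.

```r
library(ape)     # read.tree() and plot.phylo()
library(ggtree)  # ggtree() and geom_tiplab()

# Hypothetical toy tree in Newick format, standing in for portal-tree.tre
toy_tree <- read.tree(text = "((Dipodomys,Perognathus),Sylvilagus);")

# ape: base-graphics plot; plot() dispatches to plot.phylo() for "phylo" objects
plot(toy_tree)

# ggtree: returns a ggplot-style object, so layers are added with +
p <- ggtree(toy_tree) + geom_tiplab()
print(p)
```

One difference shows up right away: plot.phylo() draws tip labels by default, while ggtree() only draws the branches until you add a geom_tiplab() layer.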
The "phylo" class structure (10 min)
- Type the name of the tree you just created and look at the output
- What information is printed to screen?
- Use functions to explore the structure of the object you just created:
  - class(portal_tree): portal_tree is an object of class "phylo"
  - length(portal_tree): it has length 4
  - names(portal_tree): its elements have names
- Just as with data frames, we can access the named elements of a "phylo" object using the dollar sign $:
  - portal_tree$edge, class(portal_tree$edge)
  - portal_tree$Nnode, portal_tree$tip.label
  - portal_tree$node.label
- str(portal_tree) shows a summary of the elements of the "phylo" object
- typeof(portal_tree): objects of class "phylo" are of type "list"
- 🎗️ classes and types are data structures that R uses to store/extract information
- A "list" is a data type (or object type) that can hold one or more objects of different types.
- The class "phylo" is a list that combines a matrix, a numeric vector of length one and two character vectors.
- The "phylo" class provides R with all the information it needs to represent a phylogenetic tree.
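To make the list structure concrete, the sketch below builds a "phylo" object by hand in base R, with no packages. The toy tree ((A,B)n2,C)n1; and its node numbering are illustrative assumptions: tips are numbered 1 to 3 and internal nodes 4 (the root) and 5. When you read a tree from a file, ape's read.tree() fills in these elements for you, and adds an edge.length element when the tree has branch lengths.

```r
# Hand-built "phylo" object for the toy tree ((A,B)n2,C)n1;
toy_phylo <- structure(
  list(
    # Each row is one branch: parent node -> child node
    edge = matrix(c(4L, 5L,
                    5L, 1L,
                    5L, 2L,
                    4L, 3L), ncol = 2, byrow = TRUE),
    Nnode = 2L,                     # number of internal nodes (length-one vector)
    tip.label = c("A", "B", "C"),   # labels of tips 1, 2 and 3
    node.label = c("n1", "n2")      # labels of internal nodes 4 and 5
  ),
  class = "phylo"
)

class(toy_phylo)   # "phylo"
typeof(toy_phylo)  # "list"
length(toy_phylo)  # 4
names(toy_phylo)
str(toy_phylo)
```

This object has the same four kinds of elements described above: one matrix, one numeric vector of length one and two character vectors.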
Connecting a phylogeny with data from a table
- Preparation:
  - Download a data table of the species from the Portal database that includes taxonomy.
  - Save it in your data-raw folder.
  - Read it into R with read.csv(), and assign it to an object called taxonomy.
- To join a tree and a data table, we will use the *_join() functions that we used previously to join tables.
- Mini review; example with surveys and species:
    library(dplyr)
    species <- read.csv("../data-raw/species.csv")
    surveys <- read.csv("../data-raw/surveys.csv")
    intersect(colnames(species), colnames(surveys))  # the shared column is "species_id"
    joined_left <- left_join(surveys, species, by = "species_id")
    joined_inner <- inner_join(surveys, species, by = "species_id")
- What is the difference between left_join and inner_join?
- To link a tree and a data table, the tree has to be the first argument and the table the second.
- Also, the column that we will be joining by is always "label".
- Make sure your table has a "label" column that contains some of the tip names of your tree:
    tree_table <- left_join(portal_tree, taxonomy_matched, by = "label")
- What is the structure of the object?
- Attention! A full join, such as full_join(portal_tree, taxonomy, by = "label"), does not work later in the analysis workflow; we need a left join to drop the table rows that do not match any tip.
- We can still plot our tree normally with ggtree(tree_table).
- But now we can use aesthetics to color the tip labels by a variable of choice. Note: the plot freezes if there are any unmatched or NA labels in the data table!
    ggtree(tree_table, aes(color = taxa)) +
      geom_tiplab(fontface = "italic") +
      xlim(0, 20)
Exercise 2: A taxonomy table for small_tree
- Find the appropriate scientific group labels for each genus in small_tree, using this tree as a guide.
- Create a data frame with 3 columns:
  - a "label" column with the tip labels of small_tree. Tip: extract the element "tip.label" from your phylo object to get a vector of tip labels that you can then combine with the other vectors to create a data frame.
  - a "taxa" column with the scientific names of the group that each genus belongs to.
  - a "common_name" column with the common names of the group that each genus belongs to. Tip: use the function c() to create the vectors that will become the columns "taxa" and "common_name".
- Join your tree and your table using left_join().
- Create two different tree plots, using taxa and common_name to color the tips of the tree.
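The shape of the data frame asked for in Exercise 2 can be sketched in base R. The three genera below are a hypothetical stand-in for small_tree$tip.label (use your own tree's tip labels for the exercise), and the group names are only examples for these three genera.

```r
# Hypothetical stand-in for small_tree$tip.label
tip_labels <- c("Dipodomys", "Perognathus", "Sylvilagus")

# The vectors built with c() must follow the same order as the tip labels
small_taxonomy <- data.frame(
  label = tip_labels,
  taxa = c("Rodentia", "Rodentia", "Lagomorpha"),
  common_name = c("rodents", "rodents", "rabbits and hares")
)
small_taxonomy
```

Because data.frame() pairs values by position, a mismatch in order between tip_labels and the c() vectors silently assigns the wrong group to a genus, so it is worth printing the table and checking it before joining.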
Exercise 3: Connecting a tree and a table for your final project
- Connect the data table you chose for your final project to a tree of the taxa in your table that you obtained from Open Tree of Life.
- Plot the tree and color the tips following one variable on the data table.
Homework - Exercise 4: Mapping weight data from surveys CSV table to the portal tree (10 min)
- Get the average weight and hindfoot length per species.
- Create a new data frame that contains the taxonomy data plus the averaged data per species that you got in the previous question.
- Create two plots with data on the tips, one with the average weight and the other with average hindfoot length. Make sure to also add tip labels, formatted in italics.
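The per-species averaging step follows the usual dplyr group-and-summarise pattern. This is a minimal sketch, assuming dplyr is installed; surveys_toy and taxonomy_toy are hypothetical tables shaped like surveys.csv and the taxonomy table, and joining on a species_id column assumes your taxonomy table has one.

```r
library(dplyr)

# Hypothetical rows shaped like surveys.csv
surveys_toy <- data.frame(
  species_id = c("DM", "DM", "PE", "PE"),
  weight = c(40, 50, 20, NA),
  hindfoot_length = c(35, 37, 20, 22)
)

# One row per species; na.rm = TRUE skips missing measurements
averages <- surveys_toy %>%
  group_by(species_id) %>%
  summarise(
    mean_weight = mean(weight, na.rm = TRUE),
    mean_hindfoot_length = mean(hindfoot_length, na.rm = TRUE)
  )
averages

# Hypothetical taxonomy table with a species_id column to join on
taxonomy_toy <- data.frame(
  species_id = c("DM", "PE"),
  label = c("Dipodomys_merriami", "Peromyscus_eremicus")
)
taxonomy_averages <- left_join(taxonomy_toy, averages, by = "species_id")
taxonomy_averages
```

The resulting table keeps a "label" column, so it can be joined to a tree by "label" in the same way as the taxonomy table.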