3. Dealing with "broken" and "invalid" taxa
Overview
Teaching: 5 min
Exercises: 5 min
Questions
How do I detect a broken taxon?
Objectives
Use the functions is_in_tree() and tol_node_info()
Understand the outputs of those functions
We say that a taxon is “broken” when its OTT id is not assigned to a node in the synthetic tree. As mentioned before, this happens when the taxon is not monophyletic in the synthetic tree. This is why we get an error when we try to get a synthetic subtree from that OTT id: it is not in the tree.
We can check whether a taxon is “broken” before requesting its subtree, and so avoid the error, with the function is_in_tree():
rotl::is_in_tree(resolved_names["Canis",]$ott_id)
[1] FALSE
Indeed, the OTT id of Canis is not in the synthetic OTOL. There are several ways to extract a subtree for a “broken” taxon; here we will focus on one.
Getting the MRCA of a taxon
The function tol_node_info() retrieves all the relevant information about the node that corresponds to a taxon, or to its MRCA when the taxon is broken. This includes the node id itself.
canis_node_info <- rotl::tol_node_info(resolved_names["Canis",]$ott_id)
canis_node_info
OpenTree node.
Node id: mrcaott47497ott110766
Number of terminal descendants: 85
Is taxon: FALSE
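The node id itself says something about the node. As far as we understand Open Tree's naming convention, a node that matches an OTT taxon is named `ott` followed by that OTT id. A node with no matching taxon is named `mrca` followed by the OTT ids of two of its descendants, so “mrcaott47497ott110766” is the MRCA of ott47497 and ott110766. Here is a minimal base-R sketch that splits a node id along those lines. The helper `parse_node_id()` is hypothetical and is not part of rotl:

```r
# parse_node_id() is a hypothetical helper, not part of rotl.
# It assumes node ids look like "ott<id>" (taxon nodes) or "mrcaott<id>ott<id>" (MRCA nodes).
parse_node_id <- function(node_id) {
  if (grepl("^mrcaott[0-9]+ott[0-9]+$", node_id)) {
    # Pull out both runs of digits: the two OTT ids that define the MRCA
    ids <- regmatches(node_id, gregexpr("[0-9]+", node_id))[[1]]
    list(is_taxon = FALSE, ott_ids = as.numeric(ids))
  } else if (grepl("^ott[0-9]+$", node_id)) {
    # The node is itself a taxon: strip the "ott" prefix
    list(is_taxon = TRUE, ott_ids = as.numeric(sub("^ott", "", node_id)))
  } else {
    stop("Unrecognised node id: ", node_id)
  }
}

parse_node_id("mrcaott47497ott110766")  # the Canis MRCA node
parse_node_id("ott372706")              # a node id built from the Canis OTT id
```

For the Canis node this gives is_taxon = FALSE, consistent with the “Is taxon: FALSE” line printed above.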
Let’s explore the class of the output.
class(canis_node_info)
[1] "tol_node" "list"
So we have an object of classes ‘tol_node’ and ‘list’. Printing it showed us some information, but we do not know how much more it contains that was not printed to the screen.
Let’s use the functions str() or ls() to check out the data structure of our ‘tol_node’ object.
str(canis_node_info)
List of 8
$ node_id : chr "mrcaott47497ott110766"
$ num_tips : int 85
$ query : chr "ott372706"
$ resolves :List of 1
..$ pg_2812@tree6545: chr "node1135827"
$ source_id_map:List of 5
..$ ot_278@tree1 :List of 3
.. ..$ git_sha : chr "3008105691283414a18a6c8a728263b2aa8e7960"
.. ..$ study_id: chr "ot_278"
.. ..$ tree_id : chr "tree1"
..$ ot_328@tree1 :List of 3
.. ..$ git_sha : chr "3008105691283414a18a6c8a728263b2aa8e7960"
.. ..$ study_id: chr "ot_328"
.. ..$ tree_id : chr "tree1"
..$ pg_1428@tree2855:List of 3
.. ..$ git_sha : chr "3008105691283414a18a6c8a728263b2aa8e7960"
.. ..$ study_id: chr "pg_1428"
.. ..$ tree_id : chr "tree2855"
..$ pg_2647@tree6169:List of 3
.. ..$ git_sha : chr "3008105691283414a18a6c8a728263b2aa8e7960"
.. ..$ study_id: chr "pg_2647"
.. ..$ tree_id : chr "tree6169"
..$ pg_2812@tree6545:List of 3
.. ..$ git_sha : chr "3008105691283414a18a6c8a728263b2aa8e7960"
.. ..$ study_id: chr "pg_2812"
.. ..$ tree_id : chr "tree6545"
$ supported_by :List of 2
..$ ot_278@tree1: chr "node233"
..$ ot_328@tree1: chr "node495"
$ synth_id : chr "opentree12.3"
$ terminal :List of 2
..$ pg_1428@tree2855: chr "node610132"
..$ pg_2647@tree6169: chr "ott247333"
- attr(*, "class")= chr [1:2] "tol_node" "list"
This tells us that tol_node_info() extracted 8 different pieces of information about our node.
Right now we are only interested in the node id. Where do you think it is?
Hands on! Get the node id of the Canis MRCA
Extract it from your canis_node_info object and call it canis_node_id.
canis_node_id <- canis_node_info$node_id
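A ‘tol_node’ object is also a plain R list, so the usual list tools apply to it. The sketch below uses a hand-built, hypothetical stand-in (`mock_node_info`) that holds a few of the fields and values shown by str() above, so it runs without querying Open Tree. `names()` lists the components. `[[` extracts a component just like `$`, but it also accepts a name stored in a variable:

```r
# mock_node_info is a hypothetical stand-in for canis_node_info, built by hand
# from some of the fields and values shown by str() above (no API call involved).
mock_node_info <- structure(
  list(
    node_id  = "mrcaott47497ott110766",
    num_tips = 85L,
    query    = "ott372706",
    synth_id = "opentree12.3"
  ),
  class = c("tol_node", "list")
)

names(mock_node_info)         # component names, without the nested detail of str()
mock_node_info$node_id        # "$" extraction, as in the solution above
mock_node_info[["node_id"]]   # "[[" extraction gives the same value

field <- "num_tips"           # "[[" also works with a name stored in a variable
mock_node_info[[field]]
```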
Pro tip 3.1: Get the node id of the MRCA of a group of OTT ids
Sometimes you want the MRCA of several lineages. The function tol_mrca() gets the node id of the MRCA of a group of OTT ids. Can you use it to get the MRCA of Canis?
The node that contains Canis is “mrcaott47497ott110766”.
Getting a subtree using a node id instead of the taxon OTT id
Now that we have a node id, we can use it to get a subtree with tol_subtree(), using the argument node_id.
canis_node_subtree <- rotl::tol_subtree(node_id = canis_node_id)
ape::plot.phylo(canis_node_subtree, cex = 1.2)
Nice! We got a subtree with 85 tips, containing all descendants of the node that also contains Canis.
This includes species assigned to genera other than Canis.
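One way to see which genera ended up in the subtree is to read the genus off each tip label. Here is a minimal base-R sketch. It assumes that labels follow the Genus_species_ottID pattern seen elsewhere in this episode, so quoted or malformed labels would not split cleanly. The helper `genus_of()` and the label "Othergenus_species" are hypothetical. In a live session you would pass `canis_node_subtree$tip.label` instead:

```r
# genus_of() is a hypothetical helper: it takes the text before the first underscore,
# which is the genus for labels shaped like "Genus_species_ott123".
genus_of <- function(tip_labels) sub("_.*$", "", tip_labels)

# Illustrative labels only ("Othergenus_species" is made up);
# with real data use canis_node_subtree$tip.label.
tip_labels <- c("Canis_lupus_familiaris_ott247333", "Canis_dirus_ott3612500",
                "Othergenus_species")
genera <- genus_of(tip_labels)

table(genera)                  # number of tips per genus
tip_labels[genera != "Canis"]  # tips assigned to genera other than Canis
```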
Note: Get an induced subtree of taxonomic children
It might seem non-phylogenetic, but what if I really, really need a tree containing only species within the genus Canis?
We can get the OTT ids of the taxonomic children of our taxon of interest and pass them to the function tol_induced_subtree(). So, here is my hack, enjoy!
First, get the taxonomic children.
canis_taxonomy <- rotl::taxonomy_subtree(resolved_names["Canis",]$ott_id)
canis_taxonomy
$tip_label
 [1] "Canis_dirus_ott3612500"
 [2] "Canis_anthus_ott5835572"
 [3] "Canis_rufus_ott113383"
 [4] "Canis_simensis_ott752755"
 [5] "Canis_aureus_ott621168"
 [6] "Canis_mesomelas_elongae_ott576165"
 [7] "Canis_adustus_ott621176"
 [8] "unclassified_Canis_ott7655955"
 [9] "Canis_latrans_ott247331"
[10] "Canis_lupus_baileyi_ott67371"
[11] "Canis_lupus_laniger_ott80830"
[12] "Canis_lupus_orion_ott7067596"
[13] "Canis_lupus_hodophilax_ott318630"
[14] "Canis_lupus_signatus_ott545727"
[15] "Canis_lupus_arctos_ott5340002"
[16] "Canis_lupus_mogollonensis_ott263524"
[17] "Canis_lupus_variabilis_ott5839539"
[18] "Canis_lupus_lupus_ott883675"
[19] "Canis_lupus_campestris_ott4941916"
[20] "Canis_lupus_lycaon_ott948004"
[21] "Canis_lupus_pallipes_ott47497"
[22] "Canis_lupus_chanco_ott47500"
[23] "Canis_lupus_x_Canis_lupus_familiaris_ott4941915"
[24] "Canis_lupus_desertorum_ott234374"
[25] "Canis_lupus_familiaris_ott247333"
[26] "Canis_lupus_dingo_ott380529"
[27] "Canis_lupus_labradorius_ott531973"
[28] "Canis_lupus_hattai_ott83897"
[29] "Canis_lupus_lupaster_ott987895"
[30] "Canis_himalayensis_ott346723"
[31] "Canis_indica_ott346728"
[32] "Canis_environmental_samples_ott4941917"
[33] "Canissp.KEB-2016ott5925604"
[34] "Canis_sp._CANInt1_ott470950"
[35] "'Canissp.Russia/33"
[36] "500ott5338950'"
[37] "Canis_sp._ott247325"
[38] "'Canissp.Belgium/36"
[39] "000ott5338951'"
[40] "Canis_environmental_sample_ott4941918"
[41] "Canis_morenis_ott6145387"
[42] "Canis_niger_ott6145388"
[43] "Canis_palaeoplatensis_ott6145390"
[44] "Canis_osorum_ott6145389"
[45] "Canis_thooides_ott6145392"
[46] "Canis_antarcticus_ott6145381"
[47] "Canis_proplatensis_ott6145391"
[48] "Canis_feneus_ott6145384"
[49] "Canis_geismarianus_ott6145385"
[50] "Canis_ameghinoi_ott7655930"
[51] "Canis_nehringi_ott7655947"
[52] "Canis_palustris_ott7655949"
[53] "Canis_lanka_ott7655942"
[54] "Canis_pallipes_ott7655948"
[55] "Canis_gezi_ott7655939"
[56] "Canis_montanus_ott7655945"
[57] "Canis_primaevus_ott7655951"
[58] "Canis_chrysurus_ott7655935"
[59] "Canis_dukhunensis_ott7655937"
[60] "Canis_kokree_ott7655941"
[61] "Canis_sladeni_ott7655952"
[62] "Canis_himalaicus_ott7655940"
[63] "Canis_chanco_ott7655934"
[64] "Canis_curvipalatus_ott7655936"
[65] "Canis_lateralis_ott7655943"
[66] "Canis_argentinus_ott7655931"
[67] "Canis_tarijensis_ott7655953"
[68] "Canis_naria_ott7655946"
[69] "Canis_peruanus_ott7655950"
[70] "Canis_cautleyi_ott7655933"
[71] "Canis_ursinus_ott7655954"
[72] "Canis_armbrusteri_ott3612502"
[73] "Canis_ferox_ott3612501"
[74] "Canis_lepophagus_ott3612503"
[75] "Canis_edwardii_ott3612509"
[76] "Canis_apolloniensis_ott3612508"
[77] "Canis_cedazoensis_ott3612507"
[78] "Canis_primigenius_ott3612506"
[79] "Canis_lydekkeri_ott7655944"
[80] "Canis_arnensis_ott7655932"
[81] "Canis_antarticus_ott6145382"
[82] "Canis_dingo_ott6145383"
[83] "Canis_etruscus_ott7655938"
[84] "Canis_spelaeus_ott3612504"

$edge_label
[1] "Canis_mesomelas_ott666235"
[2] "Canis_lupus_ott247341"
[3] "Canis_ott372706"
Now, extract the OTT ids.
canis_taxonomy_ott_ids <- datelife::extract_ott_ids(x = canis_taxonomy$tip_label)
After extracting ott ids, there are some non numeric elements:
Canissp.KEB-2016ott5925604 'Canissp.Russia/33 500ott5338950' 'Canissp.Belgium/36 000ott5338951'
NAs removed.
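If you would rather not depend on datelife for this step, you can sketch a similar extraction in base R. This is a swapped-in approach, not a description of how datelife::extract_ott_ids() works internally. The helper name `extract_ott_ids_sketch()` is hypothetical. It assumes that usable labels end in `_ott` followed by digits, so malformed labels like the ones reported above are dropped:

```r
# extract_ott_ids_sketch() is a hypothetical base-R helper, not part of datelife or rotl.
# It keeps only labels ending in "_ott<digits>" and returns those digits as numbers.
extract_ott_ids_sketch <- function(tip_labels) {
  has_id <- grepl("_ott[0-9]+$", tip_labels)
  ids <- as.numeric(sub("^.*_ott([0-9]+)$", "\\1", tip_labels[has_id]))
  names(ids) <- tip_labels[has_id]  # keep each label next to its id for traceability
  ids
}

# A few labels copied from the taxonomy output above, including malformed ones
example_labels <- c("Canis_dirus_ott3612500", "Canissp.KEB-2016ott5925604",
                    "'Canissp.Russia/33", "500ott5338950'",
                    "Canis_lupus_familiaris_ott247333")
extract_ott_ids_sketch(example_labels)
```

Unlike datelife, this sketch skips the malformed labels silently rather than printing a message about them.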
Try to get an induced subtree of Canis taxonomic children.
canis_taxonomy_subtree <- rotl::tol_induced_subtree(canis_taxonomy_ott_ids)
Error: HTTP failure: 400 [/v3/tree_of_life/induced_subtree] Error: node_id 'ott3612504' was not found!list(ott247325 = "pruned_ott_id", ott3612504 = "pruned_ott_id", ott3612506 = "pruned_ott_id", ott3612508 = "pruned_ott_id", ott470950 = "pruned_ott_id", ott4941915 = "pruned_ott_id", ott4941917 = "pruned_ott_id", ott6145381 = "pruned_ott_id", ott6145384 = "pruned_ott_id", ott6145385 = "pruned_ott_id", ott6145387 = "pruned_ott_id", ott6145388 = "pruned_ott_id", ott6145389 = "pruned_ott_id", ott6145390 = "pruned_ott_id", ott6145391 = "pruned_ott_id", ott6145392 = "pruned_ott_id", ott7655932 = "pruned_ott_id", ott7655944 = "pruned_ott_id", ott7655945 = "pruned_ott_id", ott7655955 = "pruned_ott_id")
It is often not possible to get an induced subtree of all the taxonomic children of a taxon, because some of them are not in the synthetic tree (the error above tags them as "pruned_ott_id").
To find out which ones are giving us trouble, we can use the function is_in_tree() again.
canis_in_tree <- sapply(canis_taxonomy_ott_ids, rotl::is_in_tree) # logical vector
canis_taxonomy_ott_ids_intree <- canis_taxonomy_ott_ids[canis_in_tree] # extract ott ids in tree
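The error message from the first tol_induced_subtree() attempt already lists the offending OTT ids, each tagged as "pruned_ott_id". As an alternative to checking every id with is_in_tree(), here is a sketch that pulls those ids out of the message text. The helper `pruned_ott_ids()` is hypothetical. Because the message format is not a documented interface, this approach is fragile. In a live session the message could be captured with `tryCatch(..., error = function(e) conditionMessage(e))` rather than pasted in:

```r
# pruned_ott_ids() is a hypothetical helper, not part of rotl. It relies on the
# 'ott<digits> = "pruned_ott_id"' pattern seen in the error message above,
# which is not a documented interface and may change.
pruned_ott_ids <- function(msg) {
  # perl = TRUE enables the lookahead, so only ids tagged "pruned_ott_id" match
  hits <- regmatches(msg, gregexpr('ott[0-9]+(?= = "pruned_ott_id")', msg, perl = TRUE))[[1]]
  as.numeric(sub("^ott", "", hits))
}

# The error message text from above, pasted in so the sketch runs offline
msg <- paste0(
  "HTTP failure: 400 [/v3/tree_of_life/induced_subtree] Error: node_id 'ott3612504' was not found!",
  'list(ott247325 = "pruned_ott_id", ott3612504 = "pruned_ott_id", ott3612506 = "pruned_ott_id", ',
  'ott3612508 = "pruned_ott_id", ott470950 = "pruned_ott_id", ott4941915 = "pruned_ott_id", ',
  'ott4941917 = "pruned_ott_id", ott6145381 = "pruned_ott_id", ott6145384 = "pruned_ott_id", ',
  'ott6145385 = "pruned_ott_id", ott6145387 = "pruned_ott_id", ott6145388 = "pruned_ott_id", ',
  'ott6145389 = "pruned_ott_id", ott6145390 = "pruned_ott_id", ott6145391 = "pruned_ott_id", ',
  'ott6145392 = "pruned_ott_id", ott7655932 = "pruned_ott_id", ott7655944 = "pruned_ott_id", ',
  'ott7655945 = "pruned_ott_id", ott7655955 = "pruned_ott_id")'
)

pruned <- pruned_ott_ids(msg)
length(pruned)  # 20 for the message above

# Keep only the taxonomic children that were not pruned
# (canis_taxonomy_ott_ids comes from the datelife step above):
# canis_taxonomy_ott_ids_intree <- canis_taxonomy_ott_ids[!canis_taxonomy_ott_ids %in% pruned]
```

The commented-out last line could replace the is_in_tree() filter, assuming the message lists every pruned id.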
Now get the tree.
canis_taxonomy_subtree <- rotl::tol_induced_subtree(canis_taxonomy_ott_ids_intree)
Plot it.
ape::plot.phylo(canis_taxonomy_subtree, cex = 1.2)
There! We have a synthetic subtree (derived from phylogenetic information) containing only the taxonomic children of Canis.
What if I want a subtree of certain taxonomic ranks within my group? Go to the next episode to find out how you can do this!
Key Points
It is not possible to get a subtree from an OTT id that is not in the synthetic tree.
OTT ids and node ids allow us to interact with the synthetic OTOL.