Latest issue

Systematic Biology - RSS feed of current issue

URL

XML feed
http://sysbio.oxfordjournals.org

October 13, 2014

21:34

When doing a bootstrap analysis with a single tree saved per pseudoreplicate, biased search algorithms may influence support values more than actual properties of the data set. Two methods commonly used for finding phylogenetic trees consist of randomizing the input order of species in multiple addition sequences followed by branch swapping, or using random trees as the starting point for branch swapping. The randomness inherent to such methods is assumed to eliminate any consistent preferences for some trees or unsupported groups of taxa, but both methods can be significantly biased. In the case of trees created by sequentially adding taxa, a bias may occur even if every addition sequence is equiprobable, and if one of the equally optimal positions for each terminal to add to the tree is selected equiprobably. In the case of branch swapping, the bias can happen even when branch swapping equiprobably selects any of the trees of better score in the subtree-pruning-regrafting-neighborhood or tree-bisection-reconnection-neighborhood. Consequently, when the data set is ambiguous, both random-addition sequences and branch swapping from random trees may (i) find some of the optimal trees much more frequently than others and (ii) find some groups with a frequency that differs from their frequency among all optimal trees. When the data set defines a single optimal tree, the groups present in that tree may have a different probability of being found by a search, even if supported by equal amounts of evidence. This may happen in both parsimony and maximum-likelihood analyses, and even in small data sets without incongruence.

21:34

Phylogenetic analyses using concatenation of genomic-scale data have been seen as the panacea for resolving the incongruences among inferences from few or single genes. However, phylogenomics may also suffer from systematic errors, due to the, perhaps cumulative, effects of saturation, among-taxa compositional (GC content) heterogeneity, or codon-usage bias plaguing the individual nucleotide loci that are concatenated. Here, we provide an example of how these factors affect the inferences of the phylogeny of early land plants based on mitochondrial genomic data. Mitochondrial sequences evolve slowly in plants and hence are thought to be suitable for resolving deep relationships. We newly assembled mitochondrial genomes from 20 bryophytes, complemented these with 40 other streptophytes (land plants plus algal outgroups), compiling a data matrix of 60 taxa and 41 mitochondrial genes. Homogeneous analyses of the concatenated nucleotide data resolve mosses as sister-group to the remaining land plants. However, the corresponding translated amino acid data support the liverwort lineage in this position. Both results receive weak to moderate support in maximum-likelihood analyses, but strong support in Bayesian inferences. Tests of alternative hypotheses using either nucleotide or amino acid data provide implicit support for their respective optimal topologies, and clearly reject the hypotheses that bryophytes are monophyletic, liverworts and mosses share a unique common ancestor, or hornworts are sister to the remaining land plants. We determined that land plant lineages differ in their nucleotide composition, and in their usage of synonymous codon variants. 
Composition-heterogeneous Bayesian analyses employing a nonstationary model that accounts for variation in among-lineage composition, and inferences from degenerated nucleotide data that avoid the effects of synonymous substitutions that underlie codon-usage bias, again recovered liverworts as sister to the remaining land plants but without support. These analyses indicate that the inference of an early-branching moss lineage based on the nucleotide data is caused by convergent compositional biases. Accommodating among-site amino acid compositional heterogeneity (CAT-model) yields no support for the optimal resolution of liverworts as sister to the rest of land plants, suggesting that the robust inference of the liverwort position in homogeneous analyses may be due in part to compositional biases among sites. All analyses support paraphyletic bryophytes with hornworts composing the sister-group to tracheophytes. We conclude that while genomic data may generate highly supported phylogenetic trees, these inferences may be artifacts. We suggest that phylogenomic analyses should assess the possible impact of potential biases through comparisons of protein-coding gene data and their amino acid translations by evaluating the impact of substitutional saturation, synonymous substitutions, and compositional biases through data deletion strategies and by analyzing the data using heterogeneous composition models. We caution against relying on any one presentation of the data (nucleotide or amino acid) or any one type of analysis even when analyzing large-scale data sets, no matter how well-supported, without fully exploring the effects of substitution models.

21:34

Tropical Southeast (SE) Asia harbors extraordinary species richness and in its entirety comprises four of the Earth's 34 biodiversity hotspots. Here, we examine the assembly of the SE Asian biota through time and space. We conduct meta-analyses of geological, climatic, and biological (including 61 phylogenetic) data sets to test which areas have been the sources of long-term biological diversity in SE Asia, particularly in the pre-Miocene, Miocene, and Plio-Pleistocene, and whether the respective biota have been dominated by in situ diversification, immigration and/or emigration, or equilibrium dynamics. We identify Borneo and Indochina, in particular, as major "evolutionary hotspots" for a diverse range of fauna and flora. Although most of the region's biodiversity is a result of both the accumulation of immigrants and in situ diversification, within-area diversification and subsequent emigration have been the predominant signals characterizing Indochina and Borneo's biota since at least the early Miocene. In contrast, colonization events are comparatively rare from younger volcanically active emergent islands such as Java, which show increased levels of immigration events. Few dispersal events were observed across the major biogeographic barrier of Wallace's Line. Accelerated efforts to conserve Borneo's flora and fauna in particular, currently housing the highest levels of SE Asian plant and mammal species richness, are critically required.

21:34

Our understanding of macroevolutionary patterns of adaptive evolution has greatly increased with the advent of large-scale phylogenetic comparative methods. Widely used Ornstein–Uhlenbeck (OU) models can describe an adaptive process of divergence and selection. However, inference of the dynamics of adaptive landscapes from comparative data is complicated by interpretational difficulties, lack of identifiability among parameter values, and the common requirement that adaptive hypotheses must be assigned a priori. Here, we develop a reversible-jump Bayesian method of fitting multi-optima OU models to phylogenetic comparative data that estimates the placement and magnitude of adaptive shifts directly from the data. We show how biologically informed hypotheses can be tested against this inferred posterior of shift locations using Bayes Factors to establish whether our a priori models adequately describe the dynamics of adaptive peak shifts. Furthermore, we show how the inclusion of informative priors can be used to restrict models to biologically realistic parameter space and test particular biological interpretations of evolutionary models. We argue that Bayesian model fitting of OU models to comparative data provides a framework for integrating multiple sources of biological data (such as microevolutionary estimates of selection parameters and paleontological time series), allowing inference of adaptive landscape dynamics with explicit, process-based biological interpretations.
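As a rough illustration of what a multi-optima OU model implies along a single lineage, the sketch below computes the expected trait value and variance under the standard OU transition formulas. It is not the reversible-jump method described in the abstract. The function names, the parameter names (`alpha` for the strength of pull toward the optimum, `theta` for the optimum, `sigma2` for the diffusion rate) and all numeric values are assumptions made for this example.

```python
import math


def ou_mean(x0, theta, alpha, t):
    """Expected trait value after time t, starting at x0, under an OU
    process with optimum theta and pull strength alpha."""
    return theta + (x0 - theta) * math.exp(-alpha * t)


def ou_variance(sigma2, alpha, t):
    """Variance of the trait after time t for an OU process started at a
    fixed value; it approaches sigma2 / (2 * alpha) as t grows."""
    return sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))


def expected_trait_after_regimes(x0, alpha, regimes):
    """Expected trait value along one lineage that passes through a series
    of adaptive regimes, given as (theta, duration) pairs in order.
    Each shift changes only the optimum; alpha is shared (an assumption)."""
    x = x0
    for theta, duration in regimes:
        x = ou_mean(x, theta, alpha, duration)
    return x


# Hypothetical lineage: ancestral value 0, one shift in optimum from 1 to 3.
lineage_mean = expected_trait_after_regimes(0.0, 1.5, [(1.0, 0.4), (3.0, 0.6)])
lineage_var = ou_variance(0.5, 1.5, 1.0)
```

Because the expected value depends only on the sequence of optima and the time spent under each, very different shift placements can yield similar tip values, which gives some intuition for why shift locations can be hard to identify from comparative data alone.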

21:34

The molecular era has fundamentally reshaped our knowledge of the evolution and diversification of angiosperms. One outstanding question is the phylogenetic placement of Amborella trichopoda Baill., commonly thought to represent the first lineage of extant angiosperms. Here, we leverage publicly available data and provide a broad coalescent-based species tree estimation of 45 seed plants. By incorporating 310 nuclear genes, our coalescent analyses strongly support a clade containing Amborella plus water lilies (i.e., Nymphaeales) that is sister to all other angiosperms across different nucleotide rate partitions. Our results also show that commonly applied concatenation methods produce strongly supported, but incongruent placements of Amborella: slow-evolving nucleotide sites corroborate results from coalescent analyses, whereas fast-evolving sites place Amborella alone as the first lineage of extant angiosperms. We further explored the performance of coalescent versus concatenation methods using nucleotide sequences simulated on (i) the two alternate placements of Amborella with branch lengths and substitution model parameters estimated from each of the 310 nuclear genes and (ii) three hypothetical species trees that are topologically identical except with respect to the degree of deep coalescence and branch lengths. Our results collectively suggest that the Amborella alone placement inferred using concatenation methods is likely misled by fast-evolving sites. This appears to be exacerbated by the combination of long branches in stem group angiosperms, Amborella, and Nymphaeales with the short internal branch separating Amborella and Nymphaeales. In contrast, coalescent methods appear to be more robust to elevated substitution rates.

21:34

The temperate woody bamboos constitute a distinct tribe Arundinarieae (Poaceae: Bambusoideae) with high species diversity. Estimating phylogenetic relationships among the 11 major lineages of Arundinarieae has been particularly difficult, owing to a possible rapid radiation and the extremely low rate of sequence divergence. Here, we explore the use of chloroplast genome sequencing for phylogenetic inference. We sampled 25 species (22 temperate bamboos and 3 outgroups) for the complete genome representing eight major lineages of Arundinarieae in an attempt to resolve backbone relationships. Phylogenetic analyses of coding versus noncoding sequences, and of different regions of the genome (large single copy and small single copy, and inverted repeat regions) yielded no well-supported contradicting topologies but potential incongruence was found between the coding and noncoding sequences. The use of various data partitioning schemes in analysis of the complete sequences resulted in nearly identical topologies and node support values, although the partitioning schemes were decisively different from each other as to the fit to the data. Our full genomic data set substantially increased resolution along the backbone and provided strong support for most relationships despite the very short internodes and long branches in the tree. The inferred relationships were also robust to potential confounding factors (e.g., long-branch attraction) and received support from independent indels in the genome. We then added taxa from the three Arundinarieae lineages that were not included in the full-genome data set; each of these was sampled for more than 50% of its genome sequence. The resulting trees not only corroborated the reconstructed deep-level relationships but also largely resolved the phylogenetic placements of these three additional lineages.
Furthermore, adding 129 additional taxa sampled for only eight chloroplast loci to the combined data set yielded almost identical relationships, albeit with low support values. We believe that the inferred phylogeny is robust to taxon sampling. Having resolved the deep-level relationships of Arundinarieae, we illuminate how chloroplast phylogenomics can be used for elucidating difficult phylogeny at low taxonomic levels in intractable plant groups.

21:34

Founder-event speciation, where a rare jump dispersal event founds a new genetically isolated lineage, has long been considered crucial by many historical biogeographers, but its importance is disputed within the vicariance school. Probabilistic modeling of geographic range evolution creates the potential to test different biogeographical models against data using standard statistical model choice procedures, as long as multiple models are available. I re-implement the Dispersal–Extinction–Cladogenesis (DEC) model of LAGRANGE in the R package BioGeoBEARS, and modify it to create a new model, DEC + J, which adds founder-event speciation, the importance of which is governed by a new free parameter, j. The identifiability of DEC and DEC + J is tested on data sets simulated under a wide range of macroevolutionary models where geography evolves jointly with lineage birth/death events. The results confirm that DEC and DEC + J are identifiable even though these models ignore the fact that molecular phylogenies are missing many cladogenesis and extinction events. The simulations also indicate that DEC will have substantially increased errors in ancestral range estimation and parameter inference when the true model includes + J. DEC and DEC + J are compared on 13 empirical data sets drawn from studies of island clades. Likelihood-ratio tests indicate that all clades reject DEC, and AICc model weights show large to overwhelming support for DEC + J, for the first time verifying the importance of founder-event speciation in island clades via statistical model choice. Under DEC + J, ancestral nodes are usually estimated to have ranges occupying only one island, rather than the widespread ancestors often favored by DEC. These results indicate that the assumptions of historical biogeography models can have large impacts on inference and require testing and comparison with statistical methods.
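The model-choice step mentioned above (a likelihood-ratio test plus AICc weights for a pair of nested models) can be sketched in a few lines. This is a generic illustration and not BioGeoBEARS code. The log-likelihoods and sample size are made-up numbers, the parameter counts (2 for DEC, 3 for DEC + J) are assumed here, and the test statistic is compared against a chi-square distribution with one degree of freedom as a simplifying assumption.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k free parameters
    fitted to n observations (requires n > k + 1)."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a list of AIC or AICc scores into relative model weights."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]


def lrt_1df(log_lik_null, log_lik_alt):
    """Likelihood-ratio test for nested models differing by one parameter.
    Returns (statistic, p_value); the chi-square(1) survival function
    equals erfc(sqrt(statistic / 2))."""
    stat = 2.0 * (log_lik_alt - log_lik_null)
    if stat <= 0.0:
        return stat, 1.0
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical values for one clade (not taken from the study).
lnl_dec, lnl_decj = -60.0, -48.0   # maximized log-likelihoods
n_tips = 30                        # sample size used for AICc (assumption)

stat, p_value = lrt_1df(lnl_dec, lnl_decj)
weights = akaike_weights([aicc(lnl_dec, 2, n_tips), aicc(lnl_decj, 3, n_tips)])
```

Here `weights[1]` is the relative support for the three-parameter model; a value near 1 would correspond to what the abstract calls large to overwhelming support.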

21:34

Ancient oceanic archipelagos of similar geological age are expected to accrue comparable numbers of endemic lineages with identical life history strategies, especially if the islands exhibit analogous habitats. We tested this hypothesis using marine snails of the genus Conus from the Atlantic archipelagos of Cape Verde and Canary Islands. Together with Azores and Madeira, these archipelagos comprise the Macaronesia biogeographic region and differ remarkably in the diversity of this group. More than 50 endemic Conus species have been described from Cape Verde, whereas prior to this study, only two nonendemic species, including a putative species complex, were thought to occur in the Canary Islands. We combined molecular phylogenetic data and geometric morphometrics with bathymetric and paleoclimatic reconstructions to understand the contrasting diversification patterns found in these regions. Our results suggest that species diversity is even lower than previously thought in the Canary Islands, with the putative species complex corresponding to a single species, Conus guanche. One explanation for the enormous disparity in Conus diversity is that the amount of available habitat may differ, or may have differed in the past due to eustatic (global) sea level changes. Historical bathymetric data, however, indicated that sea level fluctuations since the Miocene have had a similar impact on the available habitat area in both Cape Verde and Canary archipelagos and therefore do not explain this disparity. We suggest that recurrent gene flow between the Canary Islands and West Africa, habitat losses due to intense volcanic activity in combination with unsuccessful colonization of new Conus species from more diverse regions, were all determinant in shaping diversity patterns within the Canarian archipelago. Worldwide Conus species diversity follows the well-established pattern of latitudinal increase of species richness from the poles towards the tropics. 
However, the eastern Atlantic revealed a striking pattern with two main peaks of Conus species richness in the subtropical area and decreasing diversities toward the tropical western African coast. A Random Forests model using 12 oceanographic variables suggested that sea surface temperature is the main determinant of Conus diversity both at continental scales (eastern Atlantic coast) and in a broader context (worldwide). Other factors such as availability of suitable habitat and reduced salinity due to the influx of large rivers in the tropical area also play an important role in shaping Conus diversity patterns on the western coast of Africa.