news aggregator

April 14, 2015

02:44

Here we introduce a general class of multiple calibration birth–death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of "calibrated nodes" and "uncalibrated nodes" such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth–death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. Although the first of these formulations has some attractive properties, the algorithm we present for computing its prior density is computationally intensive. However, the second formulation is always faster and computationally efficient for up to six calibrations. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this article offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree priors in Bayesian phylogenetic inference.

Source: Latest issue
02:44

Mollusks are the most morphologically disparate living animal phylum; they have diversified into all habitats and have a deep fossil record. Monophyly and identity of their eight living classes are undisputed, but relationships between these groups and patterns of their early radiation have remained elusive. Arguments about traditional morphological phylogeny focus on a small number of topological concepts but often without regard to proximity of the individual classes. In contrast, molecular studies have proposed a number of radically different, inherently contradictory, and controversial sister relationships. Here, we assembled a data set of 42 unique published trees describing molluscan interrelationships. We used these data to ask several questions about the state of resolution of molluscan phylogeny compared with a null model of the variation possible in random trees constructed from a monophyletic assemblage of eight terminals. Although 27 different unique trees have been proposed from morphological inference, the majority of these are not statistically different from each other. Within the available molecular topologies, only four studies to date have included the deep sea class Monoplacophora; but 36.4% of all trees are not significantly different. We also present supertrees derived from two data partitions and three methods, including all available molecular molluscan phylogenies, which will form the basis for future hypothesis testing. The supertrees presented here were not constructed to provide yet another hypothesis of molluscan relationships, but rather to algorithmically evaluate the relationships present in the disparate published topologies. Based on the totality of available evidence, certain patterns of relatedness among constituent taxa become clear. The internodal distance is consistently short between a few taxon pairs, particularly supporting the relatedness of Monoplacophora and the chitons, Polyplacophora. Other taxon pairs are rarely or never found in close proximity, such as the vermiform Caudofoveata and Bivalvia. Our results have specific utility for guiding constructive research planning to better test relationships in Mollusca as well as other problematic groups. Taxa with consistently proximate relationships should be the focus of a combined approach in a concerted assessment of potential genetic and anatomical homology, whereas unequivocally distant taxa will make the most constructive choices for exemplar selection in higher level phylogenomic analyses.

Source: Latest issue
02:44

A major concern in molecular clock dating is how to use information from the fossil record to calibrate genetic distances from DNA sequences. Here we apply three Bayesian dating methods that differ in how calibration is achieved—"node dating" (ND) in BEAST, "total evidence" (TE) dating in MrBayes, and the "fossilized birth–death" (FBD) in FDPPDiv—to infer divergence times in the royal ferns. Osmundaceae have 16–17 species in four genera, two mainly in the Northern Hemisphere and two in South Africa and Australasia; they are the sister clade to the remaining leptosporangiate ferns. Their fossil record consists of at least 150 species in ~17 genera. For ND, we used the five oldest fossils, whereas for TE and FBD dating, which do not require forcing fossils to nodes and thus can use more fossils, we included up to 36 rhizomes and frond compression/impression fossils, which for TE dating were scored for 33 morphological characters. We also subsampled 10%, 25%, and 50% of the 36 fossils to assess model sensitivity. FBD-derived divergence ages were generally greater than those inferred from ND; two of seven TE-derived ages agreed with FBD-obtained ages, the others were much younger or much older than ND or FBD ages. We prefer the FBD-derived ages because they best fit the Osmundales fossil record (including Triassic fossils not used in our study). Under the preferred model, the clade encompassing extant Osmundaceae (and many fossils) dates to the latest Paleozoic to Early Triassic; divergences of the extant species occurred during the Neogene. Under the assumption of constant speciation and extinction rates, the FBD approach yielded speciation and extinction rates that overlapped those obtained from just neontological data. However, FBD estimates of speciation and extinction are sensitive to violations in the assumption of continuous fossil sampling; therefore, these estimates should be treated with caution.

Source: Latest issue
02:44

Most eukaryotic lineages are microbial, and many have only recently been sampled for phylogenetic studies or remain in the "dark area" of the tree of life where there are no molecular data. To assess relationships among eukaryotic lineages, we perform a taxon-rich phylogenomic analysis including 232 eukaryotes selected to maximize taxonomic diversity and up to 1554 genes chosen as vertically inherited based on their broad distribution among eukaryotes. We also include sequences from 486 bacteria and 84 archaea to assess the impact of endosymbiotic gene transfer (EGT) from plastids and to detect contamination. Overall, our analyses are consistent with other less taxon-rich estimates of the eukaryotic tree of life, and we recover strong support for five major clades: Amoebozoa, Excavata (without the genus Malawimonas), Opisthokonta, Archaeplastida, and SAR (Stramenopila, Alveolata, and Rhizaria). Our analyses also highlight the existence of "orphan" lineages, lineages that lack robust placement in the eukaryotic tree of life, and indicate the possibility of as yet undiscovered diversity. In analyses including bacteria and archaea, we find that approximately 10% of the 1554 genes, which we choose because they are found in four or five of the five major eukaryotic clades and hence may be more likely to be inherited vertically, appear to have been acquired from cyanobacteria through EGT in photosynthetic lineages. Removing these EGT genes places the green algae as sister to the glaucophytes instead of the red algae, suggesting that unknowingly including genes of plastid origin, and combining them with genes of nuclear origin, may mislead phylogenetic estimates. Finally, the large size of our data set allows comparative analyses of subsets of data; alignments built from randomly sampled sites provide greater support, particularly for deep relationships, than do equivalent-sized data sets built from randomly sampled genes.

Source: Latest issue
02:44

Despite an increasingly vast literature on cophylogenetic reconstructions for studying host–parasite associations, understanding the common evolutionary history of such systems remains a problem that is far from being solved. Most algorithms for host–parasite reconciliation use an event-based model, where the events include in general (a subset of) cospeciation, duplication, loss, and host switch. All known parsimonious event-based methods then assign a cost to each type of event in order to find a reconstruction of minimum cost. The main problem with this approach is that the cost of the events strongly influences the reconciliation obtained. Some earlier approaches attempt to avoid this problem by finding a Pareto set of solutions and hence by considering event costs under some minimization constraints. To deal with this problem, we developed an algorithm, called Coala, for estimating the frequency of the events based on an approximate Bayesian computation approach. The benefits of this method are 2-fold: (i) it provides more confidence in the set of costs to be used in a reconciliation, and (ii) it allows estimation of the frequency of the events in cases where the data set consists of trees with a large number of taxa. We evaluate our method on simulated and on biological data sets. We show that in both cases, for the same pair of host and parasite trees, different sets of frequencies for the events lead to equally probable solutions. Moreover, often these solutions differ greatly in terms of the number of inferred events. It appears crucial to take this into account before attempting any further biological interpretation of such reconciliations. More generally, we also show that the set of frequencies can vary widely depending on the input host and parasite trees. Indiscriminately applying a standard vector of costs may thus not be a good strategy.

Source: Latest issue
02:44

Tens of thousands of phylogenetic trees, describing the evolutionary relationships between hundreds of thousands of taxa, are readily obtainable from various databases. From such trees, inferences can be made about the underlying macroevolutionary processes, yet remarkably these processes are still poorly understood. Simple and widely used evolutionary null models are problematic: Empirical trees show very different imbalance between the sizes of the daughter clades of ancestral taxa compared to what models predict. Obtaining a simple evolutionary model that is both biologically plausible and produces the imbalance seen in empirical trees is a challenging problem, to which none of the existing models provide a satisfying answer. Here we propose a simple, biologically plausible macroevolutionary model in which the rate of speciation decreases with species age, whereas extinction rates can vary quite generally. We show that this model provides a remarkable fit to the thousands of trees stored in the online database TreeBase. The biological motivation for the identified age-dependent speciation process may be that recently evolved taxa often colonize new regions or niches and may initially experience little competition. These new taxa are thus more likely to give rise to further new taxa than a taxon that has remained largely unchanged and is, therefore, well adapted to its niche. We show that age-dependent speciation may also be the result of different within-species populations following the same laws of lineage splitting to produce new species. As the fit of our model to the tree database shows, this simple biological motivation provides an explanation for a long standing problem in macroevolution.

Source: Latest issue
02:44

Prior distributions can have a strong effect on the results of Bayesian analyses. However, no general consensus exists for how priors should be set in all circumstances. Branch-length priors are of particular interest for phylogenetics, because they affect many parameters and biologically relevant inferences have been shown to be sensitive to the chosen prior distribution. Here, we explore the use of outside information to set informed branch-length priors and compare inferences from these informed analyses to those using default settings. For both the commonly used exponential and the newly proposed compound Dirichlet prior distributions, the incorporation of relevant outside information improves inferences for data sets that have produced problematic branch- and tree-length estimates under default settings. We suggest that informed priors are worthy of further exploration for phylogenetics.
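One way to see why default branch-length priors can distort tree-length estimates is a minimal sketch under stated assumptions: every branch of an unrooted binary tree gets an independent exponential prior with a shared rate. An unrooted binary tree with n taxa has 2n - 3 branches, so the induced prior mean of the tree length is (2n - 3) / rate and grows with the number of taxa. The rate of 10 and the function names below are illustrative choices, not values or code from the study.

```python
import random


def n_branches_unrooted(n_taxa):
    """Number of branches in an unrooted, fully resolved tree with n_taxa tips."""
    return 2 * n_taxa - 3


def expected_tree_length(n_taxa, rate):
    """Prior mean of the tree length when each branch is i.i.d. Exponential(rate)."""
    return n_branches_unrooted(n_taxa) / rate


def sample_tree_length(n_taxa, rate, rng):
    """Draw one tree length from the induced prior by summing branch-length draws."""
    return sum(rng.expovariate(rate) for _ in range(n_branches_unrooted(n_taxa)))


if __name__ == "__main__":
    rng = random.Random(1)
    rate = 10.0  # illustrative rate (prior mean branch length 0.1); an assumption
    for n_taxa in (10, 100, 1000):
        draws = [sample_tree_length(n_taxa, rate, rng) for _ in range(1000)]
        print(n_taxa, expected_tree_length(n_taxa, rate), sum(draws) / len(draws))
```

Because the induced tree-length prior becomes tightly concentrated around a value that depends only on the number of taxa and the rate, a fixed default rate can be strongly informative about tree length for large data sets, which is the kind of effect that informed priors aim to mitigate.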

Source: Latest issue
02:44

There is a rising awareness that species trees are best inferred from multiple loci while taking into account processes affecting individual gene trees, such as substitution model error (failure of the model to account for the complexity of the data) and coalescent stochasticity (presence of incomplete lineage sorting [ILS]). Although most studies have been carried out in the context of dichotomous species trees, these processes operate also in more complex evolutionary histories involving multiple hybridizations and polyploidy. Recently, methods have been developed that accurately handle ILS in allopolyploids, but they are thus far restricted to networks of diploids and tetraploids. We propose a procedure that improves on this limitation by designing a workflow that assigns homoeologs to hypothetical diploid ancestral genomes prior to genome tree construction. Conflicting assignment hypotheses are evaluated against substitution model error and coalescent stochasticity. Incongruence that cannot be explained by stochastic mechanisms needs to be explained by other processes (e.g., homoploid hybridization or paralogy). The data can then be filtered to build multilabeled genome phylogenies using inference methods that can recover species trees, either in the face of substitution model error and coalescent stochasticity alone, or while simultaneously accounting for hybridization. Methods are already available for folding the resulting multilabeled genome phylogeny into a network. We apply the workflow to the reconstruction of the reticulate phylogeny of the plant genus Fumaria (Papaveraceae) with ploidal levels ranging from 2x to 14x. We describe the challenges in recovering nuclear NRPB2 homoeologs in high ploidy species while combining in vivo cloning and direct sequencing techniques. Using parametric bootstrapping simulations we assign nuclear homoeologs and chloroplast sequences (four concatenated loci) to their common hypothetical diploid ancestral genomes. As these assignments hinge on effective population size assumptions, we investigate how varying these assumptions impacts the recovered multilabeled genome phylogeny.

Source: Latest issue
02:44

In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this article, we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so, we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks.

Source: Latest issue
02:44

Phylogenetic methods typically rely on an appropriate model of how data evolved in order to infer an accurate phylogenetic tree. For molecular data, standard statistical methods have provided an effective strategy for extracting phylogenetic information from aligned sequence data when each site (character) is subject to a common process. However, for other types of data (e.g., morphological data), characters can be too ambiguous, homoplastic, or saturated to develop models that are effective at capturing the underlying process of change. To address this, we examine the properties of a classic but neglected method for inferring splits in an underlying tree, namely, maximum compatibility. By adopting a simple and extreme model in which each character either fits perfectly on some tree, or is entirely random (but it is not known which class any character belongs to) we are able to derive exact and explicit formulae regarding the performance of maximum compatibility. We show that this method is able to identify a set of non-trivial homoplasy-free characters, when the number n of taxa is large, even when the number of random characters is large. In contrast, we show that a method that makes more uniform use of all the data, maximum parsimony, can provably estimate trees in which none of the original homoplasy-free characters support splits.
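The idea of maximum compatibility can be illustrated with a minimal sketch, assuming binary (0/1) characters scored for the same ordered list of taxa. Two binary characters are compatible unless all four state combinations occur across taxa (the four-gamete test), and for binary characters a set that is pairwise compatible fits a single tree. The exhaustive search and the toy data below are illustrative assumptions, practical only for a handful of characters, and are not the analysis from the article.

```python
from itertools import combinations


def compatible(a, b):
    """Four-gamete test: two binary characters are compatible unless all
    four state pairs (0,0), (0,1), (1,0), (1,1) occur across the taxa."""
    return len(set(zip(a, b))) < 4


def max_compatible_subset(chars):
    """Return indices of a largest pairwise-compatible subset of characters.

    Exhaustive search from the largest subset size downward; the cost is
    exponential in the number of characters, so this suits toy inputs only.
    """
    for size in range(len(chars), 0, -1):
        for subset in combinations(range(len(chars)), size):
            if all(compatible(chars[i], chars[j])
                   for i, j in combinations(subset, 2)):
                return list(subset)
    return []


# Hypothetical toy data: four characters scored for five taxa.
chars = [
    (1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0),
    (1, 0, 1, 0, 0),  # conflicts with character 0
    (0, 0, 0, 1, 1),
]

if __name__ == "__main__":
    print(max_compatible_subset(chars))
```

In this toy input, characters 0 and 2 show all four state combinations and so cannot both be retained; the search keeps a largest conflict-free set and discards the rest, which is the sense in which the method filters out characters that do not fit the tree.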

Source: Latest issue
02:44

Müllerian mimicry among Neotropical Heliconiini butterflies is an excellent example of natural selection, associated with the diversification of a large continental-scale radiation. Some of the processes driving the evolution of mimicry rings are likely to generate incongruent phylogenetic signals across the assemblage, and thus pose a challenge for systematics. We use a data set of 22 mitochondrial and nuclear markers from 92% of species in the tribe, obtained by Sanger sequencing and de novo assembly of short read data, to re-examine the phylogeny of Heliconiini with both supermatrix and multispecies coalescent approaches, characterize the patterns of conflicting signal, and compare the performance of various methodological approaches to reflect the heterogeneity across the data. Despite the large extent of reticulate signal and strong conflict between markers, nearly identical topologies are consistently recovered by most of the analyses, although the supermatrix approach failed to reflect the underlying variation in the history of individual loci. However, the supermatrix represents a useful approximation where multiple rare species represented by short sequences can be incorporated easily. The first comprehensive, time-calibrated phylogeny of this group is used to test the hypotheses of a diversification rate increase driven by the dramatic environmental changes in the Neotropics over the past 23 myr, or changes caused by diversity-dependent effects on the rate of diversification. We find that the rate of diversification has increased on the branch leading to the presently most species-rich genus Heliconius, but the change occurred gradually and cannot be unequivocally attributed to a specific environmental driver. Our study provides comprehensive comparison of philosophically distinct species tree reconstruction methods and provides insights into the diversification of an important insect radiation in the most biodiverse region of the planet.

Source: Latest issue
02:44

Phycas is open source, freely available Bayesian phylogenetics software written primarily in C++ but with a Python interface. Phycas specializes in Bayesian model selection for nucleotide sequence data, particularly the estimation of marginal likelihoods, central to computing Bayes Factors. Marginal likelihoods can be estimated using newer methods (Thermodynamic Integration and Generalized Steppingstone) that are more accurate than the widely used Harmonic Mean estimator. In addition, Phycas supports two posterior predictive approaches to model selection: Gelfand–Ghosh and Conditional Predictive Ordinates. The General Time Reversible family of substitution models, as well as a codon model, are available, and data can be partitioned with all parameters unlinked except tree topology and edge lengths. Phycas provides for analyses in which the prior on tree topologies allows polytomous trees as well as fully resolved trees, and provides for several choices for edge length priors, including a hierarchical model as well as the recently described compound Dirichlet prior, which helps avoid overly informative induced priors on tree length.

Source: Latest issue
02:44

Virtually all models for reconstructing ancestral states for discrete characters make the crucial assumption that the trait of interest evolves at a uniform rate across the entire tree. However, this assumption is unlikely to hold in many situations, particularly as ancestral state reconstructions are being performed on increasingly large phylogenies. Here, we show how failure to account for such variable evolutionary rates can cause highly anomalous (and likely incorrect) results, while three methods that accommodate rate variability yield the opposite, more plausible, and more robust reconstructions. The random local clock method, implemented in BEAST, estimates the position and magnitude of rate changes on the tree; split BiSSE estimates separate rate parameters for pre-specified clades; and the hidden rates model partitions each character state into a number of rate categories. Simulations show the inadequacy of traditional models when characters evolve with both asymmetry (different rates of change between states within a character) and heterotachy (different rates of character evolution across different clades). The importance of accounting for rate heterogeneity in ancestral state reconstruction is highlighted empirically with a new analysis of the evolution of viviparity in squamate reptiles, which reveals a predominance of forward (oviparous-to-viviparous) transitions and very few reversals.

Source: Latest issue
01:25
Greetings Colleagues, We are hoping to hire a research assistant to work on the biodiversity of marine ciliates within the Biological Sciences department at Smith College, USA. The research project involves the use of molecular tools to explore biodiversity of eukaryotic microorganisms, particularly ciliates, in marine environments. The RA will also maintain data in a well-organized database, with attention paid to details of completeness and quality. This research project focuses on characterization of near-shore communities of oligotrich and choreotrich ciliates using DGGE and next generation sequencing technologies. This is a grant-funded, limited-term position with the possibility of renewal. You can find more details and submit your materials through: http://bit.ly/1CDFVAq (search for "research assistant"). Please forward this email and attachment to anyone you think might be interested in this position. Thank you, Dr. Jean-David Grattepanche and Dr. Laura A. Katz. Jean-David GRATTEPANCHE, Postdoctoral fellow, PhD in Biological Oceanography, jgrattepanche@smith.edu, http://bit.ly/1GGHeU0, Katz lab, Department of Biological Sciences, Smith College, 44 College Lane, Northampton, MA 01063
Source: EVOLDIR