Systematic Biology - RSS feed of current issue
February 17, 2014
Species Delimitation Using Bayes Factors: Simulations and Application to the Sceloporus scalaris Species Group (Squamata: Phrynosomatidae)
Current molecular methods of species delimitation are limited by the types of species delimitation models and scenarios that can be tested. Bayes factors allow for more flexibility in testing non-nested species delimitation models and hypotheses of individual assignment to alternative lineages. Here, we examined the efficacy of Bayes factors in delimiting species through simulations and empirical data from the Sceloporus scalaris species group. Marginal-likelihood scores of competing species delimitation models, from which Bayes factor values were compared, were estimated with four different methods: harmonic mean estimation (HME), smoothed harmonic mean estimation (sHME), path-sampling/thermodynamic integration (PS), and stepping-stone (SS) analysis. We also performed model selection using a posterior simulation-based analog of the Akaike information criterion through Markov chain Monte Carlo analysis (AICM). Bayes factor species delimitation results from the empirical data were then compared with results from the reversible-jump MCMC (rjMCMC) coalescent-based species delimitation method Bayesian Phylogenetics and Phylogeography (BP&P). Simulation results show that HME and sHME perform poorly compared with PS and SS marginal-likelihood estimators when identifying the true species delimitation model. Furthermore, Bayes factor delimitation (BFD) of species showed improved performance when species limits are tested by reassigning individuals between species, as opposed to either lumping or splitting lineages. In the empirical data, BFD through PS and SS analyses, as well as the rjMCMC method, each provide support for the recognition of all scalaris group taxa as independent evolutionary lineages. Bayes factor species delimitation and BP&P also support the recognition of three previously undescribed lineages. 
In both simulated and empirical data sets, harmonic and smoothed harmonic mean marginal-likelihood estimators provided much higher marginal-likelihood estimates than PS and SS estimators. The AICM displayed poor repeatability in both simulated and empirical data sets, and produced inconsistent model rankings across replicate runs with the empirical data. Our results suggest that species delimitation through the use of Bayes factors with marginal-likelihood estimates via PS or SS analyses provides a useful and complementary alternative to existing species delimitation methods.
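As a minimal illustration of how competing delimitation models can be compared once marginal likelihoods are in hand, the sketch below computes 2 ln(BF) between the best-scoring model and each alternative. The model names and log marginal-likelihood values are hypothetical placeholders, not results from the study; in practice they would come from PS or SS runs.

```python
def two_ln_bf(ln_ml_a, ln_ml_b):
    """Return 2 * ln(Bayes factor) in favour of model A over model B.

    Inputs are log marginal likelihoods, so the log Bayes factor is
    simply their difference.
    """
    return 2.0 * (ln_ml_a - ln_ml_b)


def rank_models(ln_marginal_likelihoods):
    """Rank models from best to worst by log marginal likelihood.

    Returns (model name, 2lnBF of the best model against that model) pairs;
    the best model therefore scores 0.0 against itself.
    """
    best = max(ln_marginal_likelihoods.values())
    ranked = sorted(
        ln_marginal_likelihoods.items(), key=lambda kv: kv[1], reverse=True
    )
    return [(name, two_ln_bf(best, ln_ml)) for name, ln_ml in ranked]


# Hypothetical log marginal likelihoods for three delimitation models.
example = {"lump": -10250.0, "split": -10235.5, "reassign": -10241.0}
ranking = rank_models(example)

for name, score in ranking:
    print(f"{name}: 2lnBF against best = {score:.1f}")
```

Because the models are compared only through their marginal likelihoods, they do not need to be nested, which is the flexibility the abstract emphasizes.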
Introgression and Phenotypic Assimilation in Zimmerius Flycatchers (Tyrannidae): Population Genetic and Phylogenetic Inferences from Genome-Wide SNPs
Genetic introgression is pervasive in nature and may lead to large-scale phenotypic assimilation and/or admixture of populations, but there is limited knowledge on whether large phenotypic changes are typically accompanied by high levels of introgression throughout the genome. Using bioacoustic, biometric, and spectrophotometric data from a flycatcher (Tyrannidae) system in the Neotropical genus Zimmerius, we document a mosaic pattern of phenotypic admixture in which a population of Zimmerius viridiflavus in northern Peru (henceforth "mosaic") is vocally and biometrically similar to conspecifics to the south but shares plumage characteristics with a different species (Zimmerius chrysops) to the north. To clarify the origins of the mosaic population, we used the RAD-seq approach to generate a data set of 37,361 genome-wide single nucleotide polymorphisms (SNPs). A range of population-genetic diagnostics shows that the genome of the mosaic population is largely indistinguishable from southern Z. viridiflavus and distinct from northern Z. chrysops, and the application of parsimony and species tree methods to the genome-wide SNP data set confirms the close affinity of the mosaic population with southern Z. viridiflavus. Even so, using a subset of 2710 SNPs found across all sampled lineages in configurations appropriate for a recently proposed statistical ("ABBA/BABA") test that distinguishes gene flow from incomplete lineage sorting, we detected low levels of gene flow from northern Z. chrysops into the mosaic population. Mapping the candidate loci for introgression from Z. chrysops into the mosaic population to the zebra finch genome reveals close linkage with genes significantly enriched in functions involving cell projection and plasma membranes. Introgression of key alleles may have led to phenotypic assimilation in the plumage of mosaic birds, suggesting that selection may have been a key factor facilitating introgression.
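The abstract names the ABBA/BABA test without giving its formula. A minimal sketch of the commonly used D statistic follows, assuming biallelic sites scored for four lineages in the order (P1, P2, P3, outgroup), with "A" the ancestral allele and "B" the derived allele. The site counts are invented for illustration, and the assignment of populations to P1, P2, and P3 is an assumption rather than the authors' stated design.

```python
def d_statistic(sites):
    """Compute D = (nABBA - nBABA) / (nABBA + nBABA).

    Each site is a 4-character string of alleles for (P1, P2, P3, outgroup).
    Under incomplete lineage sorting alone, ABBA and BABA patterns are
    expected in equal numbers, so D is expected to be near zero.
    """
    abba = sum(1 for s in sites if s == "ABBA")
    baba = sum(1 for s in sites if s == "BABA")
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Invented example: 6 ABBA sites, 4 BABA sites, 10 uninformative BBAA sites.
sites = ["ABBA"] * 6 + ["BABA"] * 4 + ["BBAA"] * 10
d = d_statistic(sites)
print(f"D = {d:.2f}")
```

A positive D indicates an excess of derived alleles shared between P2 and P3. Whether that excess is significant is usually judged with a resampling procedure such as a block jackknife, which is not shown here.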
Many questions in evolutionary biology require an estimate of divergence times but, for groups with a sparse fossil record, such estimates rely heavily on molecular dating methods. The accuracy of these methods depends on both an adequate underlying model and the appropriate implementation of fossil evidence as calibration points. We explore the effect of these in Poaceae (grasses), a diverse plant lineage with a very limited fossil record, focusing particularly on dating the early divergences in the group. We show that molecular dating based on a data set of plastid markers is strongly dependent on the model assumptions. In particular, an acceleration of evolutionary rates at the base of Poaceae followed by a deceleration in the descendants strongly biases methods that assume an autocorrelation of rates. This problem can be circumvented by using markers that have lower rate variation, and we show that phylogenetic markers extracted from complete nuclear genomes can be a useful complement to the more commonly used plastid markers. However, estimates of divergence times remain strongly affected by different implementations of fossil calibration points. Analyses calibrated with only macrofossils lead to estimates for the age of core Poaceae of ~51–55 Ma, but the inclusion of microfossil evidence pushes this age to 74–82 Ma and leads to lower estimated evolutionary rates in grasses. These results emphasize the importance of considering markers from multiple genomes and alternative fossil placements when addressing evolutionary issues that depend on ages estimated for important groups.
Quantifying and Comparing Phylogenetic Evolutionary Rates for Shape and Other High-Dimensional Phenotypic Data
Many questions in evolutionary biology require the quantification and comparison of rates of phenotypic evolution. Recently, phylogenetic comparative methods have been developed for comparing evolutionary rates on a phylogeny for single, univariate traits (σ²), and evolutionary rate matrices (R) for sets of traits treated simultaneously. However, high-dimensional traits like shape remain under-examined with this framework, because methods suited for such data have not been fully developed. In this article, I describe a method to quantify phylogenetic evolutionary rates for high-dimensional multivariate data (σ²_mult), found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices (R-mode and Q-mode methods). I then use simulations to evaluate the statistical performance of hypothesis-testing procedures that compare σ²_mult for two or more groups of species on a phylogeny. Under both isotropic and non-isotropic conditions, and for differing numbers of trait dimensions, the proposed method displays appropriate Type I error and high statistical power for detecting known differences in σ²_mult among groups. In contrast, the Type I error rate of likelihood tests based on the evolutionary rate matrix (R) increases as the number of trait dimensions (p) increases, and becomes unacceptably large when only a few trait dimensions are considered. Further, likelihood tests based on R cannot be computed when the number of trait dimensions equals or exceeds the number of taxa in the phylogeny (i.e., when p ≥ N). These results demonstrate that tests based on σ²_mult provide a useful means of comparing evolutionary rates for high-dimensional data that are otherwise not analytically accessible to methods based on the evolutionary rate matrix. This advance thus expands the phylogenetic comparative toolkit for high-dimensional phenotypic traits like shape.
Finally, I illustrate the utility of the new approach by evaluating rates of head shape evolution in a lineage of Plethodon salamanders.
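The equivalency between covariance-based (R-mode) and distance-based (Q-mode) calculations that the abstract refers to can be illustrated with a small, non-phylogenetic sketch. It shows only the general identity that the total sum of squared deviations from the centroid equals the sum of squared pairwise Euclidean distances divided by the number of observations. It ignores phylogenetic covariance entirely and is not the author's rate estimator; the function names and data are hypothetical.

```python
from itertools import combinations


def total_ss_r_mode(rows):
    """Total sum of squared deviations from the centroid.

    This equals the trace of the sums-of-squares-and-cross-products matrix,
    i.e. the covariance-based (R-mode) route.
    """
    n = len(rows)
    p = len(rows[0])
    centroid = [sum(r[j] for r in rows) / n for j in range(p)]
    return sum((r[j] - centroid[j]) ** 2 for r in rows for j in range(p))


def total_ss_q_mode(rows):
    """The same quantity from pairwise squared Euclidean distances (Q-mode)."""
    n = len(rows)
    squared_distances = (
        sum((a - b) ** 2 for a, b in zip(x, y))
        for x, y in combinations(rows, 2)
    )
    return sum(squared_distances) / n


# Four hypothetical "species" measured on two trait dimensions.
square = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(total_ss_r_mode(square), total_ss_q_mode(square))

# Three observations on five dimensions: more dimensions than observations.
wide = [[1.0, 0.0, 2.0, 5.0, 3.0], [0.0, 1.0, 4.0, 2.0, 2.0], [3.0, 3.0, 0.0, 1.0, 1.0]]
print(total_ss_r_mode(wide), total_ss_q_mode(wide))
```

The distance-based route never needs to invert a p × p matrix, which gives some intuition for why a distance-based statistic remains computable when the number of trait dimensions equals or exceeds the number of taxa.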
Reconstructing the biogeographic history of groups present in continuous arid landscapes is challenging due to the difficulties in defining discrete areas for analyses, and even more so when species largely overlap both in terms of geography and habitat preference. In this study, we use a novel approach to estimate ancestral areas for the small plant genus Centipeda. We apply continuous diffusion of geography by a relaxed random walk where each species is sampled from its extant distribution on an empirical distribution of time-calibrated species-trees. Using a distribution of previously published substitution rates of the internal transcribed spacer (ITS) for Asteraceae, we show how the evolution of Centipeda correlates with the temporal increase of aridity in the arid zone since the Pliocene. Geographic estimates of ancestral species show a consistent pattern of speciation of early lineages in the Lake Eyre region, with a division into more northerly and southerly groups since ~840 ka. A summary of the geographic slices of species-trees at the time of the latest speciation event (~20 ka) indicates no presence of the genus in Australia west of the combined desert belt of the Nullarbor Plain, the Great Victoria Desert, the Gibson Desert, and the Great Sandy Desert, or beyond the main continental shelf of Australia. The result indicates all western occurrences of the genus to be a result of recent dispersal rather than ancient vicariance. This study contributes to our understanding of the spatiotemporal processes shaping the flora of the arid zone, and offers a significant improvement in inference of ancestral areas for any organismal group distributed where it remains difficult to describe geography in terms of discrete areas.
Adaptive radiations such as Darwin's finches in the Galapagos or the cichlid fishes from the Eastern African Great Lakes have been a constant source of inspiration for biologists and a stimulus for evolutionary thinking. A central concept behind adaptive radiation is that of evolution by niche shifts, or ecological speciation. Evidence for adaptive radiations generally requires a strong correlation between phenotypic traits and the environment. But adaptive traits are often cryptic, making this phenotype-environment approach difficult to implement. Here we propose a procedure for detecting adaptive radiation that focuses on species' ecological niche comparisons. It evaluates whether past ecological disparity in a group better fits a neutral Brownian motion model of ecological divergence or a niche shift model. We have evaluated this approach on New Zealand rockcresses (Pachycladon) that recently radiated in the New Zealand Alps. We show that the pattern of ecological divergence rejects the neutral model and is consistent with that of a niche shift model. Our approach to detecting adaptive radiation has the advantage over alternative approaches that it focuses on ecological niches, a key concept behind adaptive radiation. It also provides a way to evaluate the importance of ecological speciation in adaptive radiations and will have general application in evolutionary studies. In the case of Pachycladon, the high estimated diversification rate, the distinctive ecological niches of species, and the evidence for ecological speciation suggest a remarkable example of adaptive radiation.
Fossil-based estimates of diversity and evolutionary dynamics mainly rely on the study of morphological variation. Unfortunately, organism remains are often altered by post-mortem taphonomic processes such as weathering or distortion. Such a loss of information often prevents quantitative multivariate description and statistically controlled comparisons of extinct species based on morphometric data. A common way to deal with missing data involves imputation methods that directly fill the missing cases with model estimates. In recent years, several empirically determined thresholds for the maximum acceptable proportion of missing values have been proposed in the literature, whereas other studies showed that this limit actually depends on various properties of the study data set and of the selected imputation method, and is in no way generalizable. We evaluate the relative performances of seven multiple imputation (MI) techniques through a simulation-based analysis under three distinct patterns of missing data distribution. Overall, Fully Conditional Specification and Expectation–Maximization algorithms provide the best compromises between imputation accuracy and coverage probability. MI techniques appear remarkably robust to the violation of basic assumptions such as the occurrence of taxonomically or anatomically biased patterns of missing data distribution, making differences in simulation results between the three patterns of missing data distribution much smaller than differences between the individual MI techniques. Based on these results, rather than proposing a new (set of) threshold value(s), we develop an approach combining the use of MIs with procrustean superimposition of principal component analysis results, in order to directly visualize the effect of individual missing data imputation on an ordinated space. We provide an R function for users to implement the proposed procedure.
Partial Sequence Homogenization in the 5S Multigene Families May Generate Sequence Chimeras and Spurious Results in Phylogenetic Reconstructions
Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Evolutionary studies that used the nuclear 5S rDNA gene family have rarely used contiguous 5S coding sequences, due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within-array sequence homogenization is only partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.
Coalescent Species Delimitation in Milksnakes (Genus Lampropeltis) and Impacts on Phylogenetic Comparative Analyses
Both gene-tree discordance and unrecognized diversity are sources of error for accurate estimation of species trees, and can affect downstream diversification analyses by obscuring the correct number of nodes, their density, and the lengths of the branches subtending them. Although the theoretical impact of gene-tree discordance on evolutionary analyses has been examined previously, the effect of unsampled and cryptic diversity has not. Here, we examine how delimitation of previously unrecognized diversity in the milksnake (Lampropeltis triangulum) and use of a species-tree approach affect both estimation of the Lampropeltis phylogeny and comparative analyses with respect to the timing of diversification. Coalescent species delimitation indicates that L. triangulum is not monophyletic and that there are multiple species of milksnake, which increases the known species diversity in the genus Lampropeltis by 40%. Both genealogical and temporal discordance occur between gene trees and the species tree, with evidence that mitochondrial DNA (mtDNA) introgression is a main factor. This discordance is further manifested in the preferred models of diversification, where the concatenated gene tree strongly supports an early burst of speciation during the Miocene, in contrast to species-tree estimates where diversification follows a birth–death model and speciation occurs mostly in the Pliocene and Pleistocene. This study highlights the crucial interaction among coalescent-based phylogeography and species delimitation, systematics, and species diversification analyses.