January 27, 2015
Guangchuang Yu wrote:
I developed an R/Bioconductor package, ggtree, for phylogenetic tree visualization.
See the online documentation: http://www.bioconductor.org/packages/3.1/bioc/vignettes/ggtree/inst/doc/ggtree.html
January 26, 2015
Erick Matsen wrote:
This paper appeared in a journal I don't commonly read, so I wanted to highlight it. The ideas are not new (as they acknowledge) but it's a good reminder that we should be fighting model mis-specification on all fronts. Comments, anyone?
Correcting for sequencing error in maximum likelihood phylogeny inference. MK Kuhner and J McGill, G3 (Bethesda, Md.), 2014 (www.ncbi.nlm.nih.gov)
Accurate phylogenies are critical to taxonomy as well as studies of speciation processes and other evolutionary patterns. Accurate branch lengths in phylogenies are critical for dating and rate measurements. Such accuracy may be jeopardized by unacknowledged sequencing error. We use simulated data to test a correction for DNA sequencing error in maximum likelihood phylogeny inference. Over a wide range of data polymorphism and true error rate, we found that correcting for sequencing error improves recovery of the branch lengths, even if the assumed error rate is up to twice the true error rate. Low error rates have little effect on recovery of the topology. When error is high, correction improves topological inference; however, when error is extremely high, using an assumed error rate greater than the true error rate leads to poor recovery of both topology and branch lengths. The error correction approach tested here was proposed in 2004 but has not been widely used, perhaps because researchers do not want to commit to an estimate of the error rate. This study shows that correction with an approximate error rate is generally preferable to ignoring the issue.
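To make the idea in the abstract concrete, here is a minimal Python sketch of one simple way to fold an assumed error rate into the likelihood. It assumes the simplest possible error model: each observed base is wrong with probability `eps`, and an error is equally likely to produce any of the other three bases. The function name and this uniform-error assumption are illustrative only and may differ from what the paper tests. Under this assumption, the tip vector used by the pruning algorithm changes from (1, 0, 0, 0) to (1 - eps, eps/3, eps/3, eps/3).

```python
BASES = "ACGT"


def tip_partials(observed, eps=0.0):
    """Likelihood of the observed base given each possible true base.

    Assumed model (illustrative): the read is correct with probability
    1 - eps, and an error yields each of the other three bases with
    probability eps / 3.
    """
    if observed not in BASES:
        raise ValueError("expected one of A, C, G, T")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be a probability")
    return [1.0 - eps if base == observed else eps / 3.0 for base in BASES]


if __name__ == "__main__":
    # Without correction the tip is treated as known exactly.
    print(tip_partials("C"))
    # With an assumed 3% error rate the other bases get a little weight.
    print(tip_partials("C", eps=0.03))
```

The rest of the likelihood calculation is unchanged; only the vectors at the tips differ, which is part of why an approximate error rate is cheap to try.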
January 24, 2015
Erick Matsen wrote:
MCEB - Mathematical and Computational Evolutionary Biology 21-25 June 2015 - Porquerolles Island, South of France.
Pre-registration deadline: February 10th Notification to applicants: February 28th Final list of attendees: April 1st
Scope: Mathematical and computational tools and concepts form an essential basis for modern evolutionary studies. The goal of the MCEB conference (at its 7th edition) is to bring together scientists with diverse backgrounds to present recent advances and discuss open problems in the field of mathematical and computational evolutionary biology. The theme of this year's edition will be new data, new questions, new methods. New generation sequencing techniques have multiplied not just the amount, but also the types of genetic data produced, giving rise to new questions, and new methodologies to answer them. These methodologies are often cross-disciplinary, with applications to diverse research topics. General concepts, models, methods and algorithms will also be presented and discussed, just as during the previous conference editions.
Where and when: Porquerolles Island, near Hyères, in the South of France, 21-25 June 2015.
Cost: Conference fees including accommodation for four nights, meals, coffee breaks, etc., will be between 300€ and 630€, all inclusive, and will vary depending on the room. PhD students and postdocs will benefit from the cheapest rooms.
David Bryant - http://www.maths.otago.ac.nz/~dbryant/ University of Otago, NZ Recovering phylogeny and demographics from SNPs: prospects and limitations
Jukka Corander - http://www.helsinki.fi/bsg/ Bayesian Statistics Group, University of Helsinki, FI ABC meets machine learning - fitting intractable models to genome data
Asger Hobolth - http://www.daimi.au.dk/~asger/ Bioinformatics Research Center (BiRC), Aarhus University, DK Modelling DNA sequence evolution within and between species
Philippe Lemey - https://rega.kuleuven.be/cev/ecv/lab-members/PhilippeLemey.html Rega Institute, Clinical and Epidemiological Virology, BE Data integrating in viral evolutionary inference: from spatial dynamics to trait evolution
Bernard Moret - http://lcbb.epfl.ch/ Laboratory for Computational Biology and Bioinformatics, EPFL, CH Phylogenetic Transfer of Knowledge
Ludovic Orlando - http://geogenetics.ku.dk/research_groups/palaeomix_group/ Center for GeoGenetics, Natural History Museum of Denmark, DK Ancient DNA: from very old molecules to genomes and epigenomes
Molly Przeworski - http://przeworski.c2b2.columbia.edu/ Columbia University, New York, USA A population-genetic approach to the study of mutation and recombination in humans
For more information, visit the website at: http://www.lirmm.fr/mceb2015/
January 22, 2015
Roderic D M Page wrote:
I've been experimenting with drawing "geophylogenies" on web maps and have created a live demo at http://iphylo.org/~rpage/geojson-phylogeny-demo/ All a bit crude, but the idea is to take either a NEXUS tree file with added geographical coordinates, or query the BOLD database of DNA barcodes (most of which are geotagged) and create an interactive phylogeny on a map. The demo uses OpenStreetMap; I have code for Google Maps as well (hope to add this to the demo shortly). The layout borrows from GenGIS (see http://dx.doi.org/10.1101/gr.095612.109 and http://kiwi.cs.dal.ca/GenGIS/Main_Page ) but doesn't need a standalone program. There are some other advantages to using GeoJSON, such as storage in document databases like CouchDB, but I'll blog about these.
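For readers who have not used GeoJSON, the following Python sketch shows roughly what encoding a tree with geotagged tips could look like. It is not the code behind the demo and does not reproduce the GenGIS-style layout: the function name, the `name` and `role` properties, and the crude rule of placing each internal node at the mean of its children's coordinates are all assumptions made for illustration. The one firm detail is that GeoJSON positions are written as [longitude, latitude].

```python
import json


def tree_to_geojson(tree, coords):
    """Encode a tree as a GeoJSON FeatureCollection.

    tree   -- nested tuples whose leaves are tip names (strings)
    coords -- dict mapping tip name to (latitude, longitude)

    Tips become Point features; each parent-child edge becomes a
    LineString. Internal nodes are placed at the mean of their children,
    which is a crude placeholder layout (it ignores the antimeridian).
    """
    features = []

    def place(node):
        # Returns the (lon, lat) position of this node.
        if isinstance(node, str):
            lat, lon = coords[node]
            features.append({
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"name": node, "role": "tip"},
            })
            return (lon, lat)
        child_positions = [place(child) for child in node]
        lon = sum(p[0] for p in child_positions) / len(child_positions)
        lat = sum(p[1] for p in child_positions) / len(child_positions)
        for child_lon, child_lat in child_positions:
            features.append({
                "type": "Feature",
                "geometry": {
                    "type": "LineString",
                    "coordinates": [[lon, lat], [child_lon, child_lat]],
                },
                "properties": {"role": "edge"},
            })
        return (lon, lat)

    place(tree)
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Made-up tips and coordinates, purely for illustration.
    example_tree = (("tipA", "tipB"), "tipC")
    example_coords = {"tipA": (0.0, 0.0), "tipB": (0.0, 2.0), "tipC": (4.0, 4.0)}
    print(json.dumps(tree_to_geojson(example_tree, example_coords), indent=2))
```

Because the result is ordinary JSON, it can be handed to a web map library or stored as a document without any further conversion.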
To give you a sense of the visualisation, below is the classic Banza example that I used in 2007 when I started playing with trees on Google Earth http://iphylo.blogspot.co.uk/2007/06/google-earth-phylogenies.html, inspired by Bill Piel's pioneering experiments.
January 13, 2015
Erick Matsen wrote:
I had no idea that this was coming down the pipe, but was excited to see a paper come out describing Phycas (http://www.phycas.org) from its authors, including @mtholder. Code is at https://github.com/plewis/phycas.
Phycas: Software for Bayesian Phylogenetic Analysis. PO Lewis, MT Holder and DL Swofford, Systematic biology, Jan 9 2015 (www.ncbi.nlm.nih.gov)
Phycas is open source, freely available Bayesian phylogenetics software written primarily in C++ but with a Python interface. Phycas specializes in Bayesian model selection for nucleotide sequence data, particularly the estimation of marginal likelihoods, central to computing Bayes Factors. Marginal likelihoods can be estimated using newer methods (Thermodynamic Integration and Generalized Steppingstone) that are more accurate than the widely used harmonic mean estimator. In addition, Phycas supports two posterior predictive approaches to model selection: Gelfand-Ghosh and Conditional Predictive Ordinates. The GTR family of substitution models, as well as a codon model, are available, and data can be partitioned with all parameters unlinked except tree topology and edge lengths. Phycas provides for analyses in which the prior on tree topologies allows polytomous trees as well as fully resolved trees, and provides for several choices for edge length priors, including a hierarchical model as well as the recently described compound Dirichlet prior, which helps avoid overly informative induced priors on tree length.
I haven't played with Phycas, but I can describe what I think of as its "killer feature," which is the ability to use a prior incorporating polytomous trees. That means that the software can return trees that look like this:
where I've put an arrow pointing at the polytomy. In this case, the polytomy shows three descendants of a given lineage.
I would argue that such a representation is a more honest one (a "shrunken" estimate for @nicolas_lartill and friends). That is, if there is no information to resolve an internal node, then an unresolved tree is returned. One often sees such nodes when people collapse nodes in ML trees that have low bootstrap support. However, I think that the Phycas approach is better than that because it's rolled into the actual inference, meaning that the overall likelihoods are properly estimated. In a field where we are already in statistically tenuous territory, with the number of discrete parameters being on the order of the number of independent data points (to say nothing of the discrete estimation part), having fewer parameters when appropriate is refreshing.
In the Bayesian phylogenetics world, not allowing such multifurcations can cause some trouble, as was the subject of some research in the mid-2000s. A good culmination of that work is this paper by Z Yang:
Fair-balance paradox, star-tree paradox, and Bayesian phylogenetics. Z Yang, Molecular biology and evolution, Aug 2007 (www.ncbi.nlm.nih.gov)
The star-tree paradox refers to the conjecture that the posterior probabilities for the three unrooted trees for four species (or the three rooted trees for three species if the molecular clock is assumed) do not approach 1/3 when the data are generated using the star tree and when the amount of data approaches infinity. It reflects the more general phenomenon of high and presumably spurious posterior probabilities for trees or clades produced by the Bayesian method of phylogenetic reconstruction, and it is perceived to be a manifestation of the deeper problem of the extreme sensitivity of Bayesian model selection to the prior on parameters. Analysis of the star-tree paradox has been hampered by the intractability of the integrals involved. In this article, I use Laplacian expansion to approximate the posterior probabilities for the three rooted trees for three species using binary characters evolving at a constant rate. The approximation enables calculation of posterior tree probabilities for arbitrarily large data sets. Both theoretical analysis of the analogous fair-coin and fair-balance problems and computer simulation for the tree problem confirmed the existence of the star-tree paradox. When the data size n --> infinity, the posterior tree probabilities do not converge to 1/3 each, but they vary among data sets according to a statistical distribution. This distribution is characterized. Two strategies for resolving the star-tree paradox are explored: (1) a nonzero prior probability for the degenerate star tree and (2) an increasingly informative prior forcing the internal branch length toward zero. Both appear to be effective in resolving the paradox, but the latter is simpler to implement. The posterior tree probabilities are found to be very sensitive to the prior.
The point of this work is that if the data truly doesn't have any signal concerning an unresolved node, and we use a Bayesian phylogenetic inference package that doesn't allow multifurcations (every one except for Phycas, as far as I know), the data set may (with non-vanishing probability) give very high confidence that one of the resolutions is correct. This is shown in this figure from Yang's paper:
See the deep red in the corners? That's showing that in replicate data sets there is a substantial probability that one of the resolutions will be very highly supported.
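The same effect can be seen in a toy version of the fair-coin problem mentioned in the abstract. The Python sketch below assumes a simplified setup that may not match the paper's exact formulation: the coin is truly fair, the prior on the heads probability theta is uniform, and we ask for the posterior probability that theta < 1/2 (the "wrong resolution" analogue has no true answer here). With a uniform prior the posterior is Beta(x + 1, n - x + 1), and for integer parameters its CDF at 1/2 equals a binomial tail sum, so only the standard library is needed. The function names are made up for this example.

```python
import math
import random


def posterior_below_half(n):
    """Return probs where probs[x] = P(theta < 1/2 | x heads in n tosses).

    Assumes a uniform prior on theta. The posterior is Beta(x+1, n-x+1),
    whose CDF at 1/2 equals P(Binomial(n+1, 1/2) >= x+1).
    """
    m = n + 1
    total = 2 ** m
    probs = [0.0] * (n + 1)
    tail = 0
    for j in range(m, 0, -1):
        tail += math.comb(m, j)       # tail = sum of C(m, k) for k >= j
        probs[j - 1] = tail / total   # corresponds to x = j - 1 heads
    return probs


def extreme_fraction(n, replicates=2000, cutoff=0.95, seed=1):
    """Fraction of fair-coin data sets whose posterior strongly favours one side."""
    rng = random.Random(seed)
    probs = posterior_below_half(n)
    extreme = 0
    for _ in range(replicates):
        # The number of 1 bits in n random bits is Binomial(n, 1/2).
        heads = bin(rng.getrandbits(n)).count("1")
        p = probs[heads]
        if p > cutoff or p < 1.0 - cutoff:
            extreme += 1
    return extreme / replicates


if __name__ == "__main__":
    for n in (100, 1000, 2000):
        print(n, extreme_fraction(n))
```

If the posterior concentrated on "no preference" as data accumulated, the printed fraction would shrink with n. In this toy setup it should instead stay at a similar level, which is the fair-coin counterpart of the red corners in the figure.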
So, I can't comment on Phycas' usability, runtimes, etc., and Conditional Predictive Ordinates sound nice, but for me, proper support for polytomies is the headlining feature of Phycas. If anyone tries it out, please post here!
December 28, 2014
Erick Matsen wrote:
This morning I stumbled across this paper, which uses discrete Morse theory to analyze a space of "trees with symmetry". This paper has gotten almost no citations in the phylogenetics literature, but I think it shows some methods that may be interesting for folks interested in the large-scale structure of tree space:
The topology of spaces of phylogenetic trees with symmetry. Axel Hultman
Natural Dowling analogues of the complex of phylogenetic trees are studied. Using discrete Morse theory, we find their homotopy types. In the process, the homotopy types of certain subposets of Dowling lattices are determined.
December 16, 2014
Erick Matsen wrote:
Come do an independent postdoc collaborating with @trvrb, me, or another computational biologist at the Fred Hutch.
Mahan Postdoctoral Fellowship
The Computational Biology Program of Fred Hutchinson Cancer Research Center in Seattle, Washington invites applications for the 2-year Mahan Postdoctoral fellowship. The fellowship will provide an exceptional individual with an early start on their career as an independent scientist by providing two years of salary and other support to complete their proposed research project in the laboratory of a Fred Hutch Computational Biologist mentor who is at the assistant or associate rank.
Faculty of any discipline or rank from the Fred Hutch, UW, or any other institute may be proposed as co-mentors. The fellowship must begin in the lab of the primary Fred Hutch mentor but it may move to another location as long as it benefits the science or career growth opportunities. The project must be focused on learning about biology, must involve a computational or mathematical component, and may include an experimental component. A laboratory trained scientist may satisfy the computational and mathematical requirement by including a training component in their proposal. Computationally strong candidates may include a laboratory training component as well. The research direction should reflect the interests and ideas of the applicant, although the final research proposal may be jointly designed; see below for more detail on fellowship rules and for a list of potential mentors.
See official announcement for more details, and get in touch if you are interested.
December 11, 2014
Andrew Roger wrote:
Does anyone know how to export the ancestral state reconstructions at nodes in trees estimated by PAUP*? The "Reconstruct" command works to generate them, but there doesn't seem to be any straightforward way to export a list of nodes with the ancestral 'sequences' at each node... or maybe there is, but it's not obvious (help?)
I am running a programming practical in a CS department with some students at the Master's level. The students essentially re-implemented J. Huelsenbeck's paper that tests all 200-something possible time-reversible models (see http://mbe.oxfordjournals.org/content/21/6/1123.full) but under Maximum Likelihood using our likelihood library.
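For anyone wondering where the "200-something" comes from: assuming the model family meant is every way of grouping the six exchangeability rates of a time-reversible nucleotide model into classes that share a value, the count is the number of set partitions of six items, which is 203. The short Python sketch below enumerates them; the rate labels and function name are illustrative, and it says nothing about how the students' tool is implemented.

```python
# The six exchangeability rates of a time-reversible nucleotide model.
RATES = ["AC", "AG", "AT", "CG", "CT", "GT"]


def set_partitions(items):
    """Yield every way to split items into non-empty groups, each exactly once."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put the first item into each existing group in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + partition


if __name__ == "__main__":
    models = list(set_partitions(RATES))
    print(len(models), "rate groupings")
    # One group for all rates is the equal-rates case; six groups is GTR.
    print(sum(1 for m in models if len(m) == 1), "with a single rate class")
    print(sum(1 for m in models if len(m) == 6), "with six rate classes")
```

Each grouping can then be turned into a constrained rate matrix and scored, so the enumeration itself is the easy part of such a practical.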
Instead of having them write a report, I'd like to publish a note announcing this little tool somewhere, with a focus on what was done and also on the teaching aspects.
Do you have any suggestions where this could be submitted?
December 3, 2014
December 2, 2014
Brian Foley wrote:
For one example, the molecular phylogeny of primates in this paper is stored in TreeBase: http://www.treebase.org/treebase-web/search/study/summary.html?id=12186
The tree is available in several formats: http://www.treebase.org/treebase-web/search/study/trees.html?id=12186
But I cannot get FigTree (version 1.4.0, on a Mac OSX-8 machine) to open any of those file formats.
It says: Error: Unknown command "TITLE" in TREES block.
I can open the NEXUS Treefile and edit it by hand, but it seems to me that there should be a way to have FigTree ignore commands that it does not recognize, with a warning that it is doing so, rather than just refusing to open the tree altogether.
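Until FigTree does this itself, the hand edit can be scripted. The Python sketch below removes selected commands from the TREES block only and reports what it dropped. It assumes each command sits on its own line ending in a semicolon; `TITLE` comes from the error message above, while including `LINK` in the default list is a guess about what else such files may contain. The function name is made up for this example.

```python
import re


def strip_trees_commands(nexus_text, unwanted=("TITLE", "LINK")):
    """Remove unwanted one-line commands from the TREES block of a NEXUS file.

    Returns (cleaned_text, removed_lines). Commands outside the TREES
    block are left alone. Assumes one command per line.
    """
    kept, removed = [], []
    in_trees = False
    for line in nexus_text.splitlines():
        stripped = line.strip()
        upper = stripped.upper()
        first_word = upper.split()[0] if upper else ""
        if re.match(r"BEGIN\s+TREES\s*;", upper):
            in_trees = True
        elif in_trees and re.match(r"END(BLOCK)?\s*;", upper):
            in_trees = False
        elif in_trees and first_word in unwanted and stripped.endswith(";"):
            removed.append(stripped)
            continue
        kept.append(line)
    return "\n".join(kept) + "\n", removed


if __name__ == "__main__":
    # A tiny made-up file, not the TreeBASE study itself.
    example = """#NEXUS
BEGIN TAXA;
    TITLE Taxa1;
    DIMENSIONS NTAX=3;
    TAXLABELS A B C;
END;
BEGIN TREES;
    TITLE Example_tree_block;
    LINK TAXA = Taxa1;
    TREE tree1 = ((A,B),C);
END;
"""
    cleaned, dropped = strip_trees_commands(example)
    for command in dropped:
        print("Warning: ignoring", command)
    print(cleaned)
```

Printing the dropped commands gives the "ignore with a warning" behaviour suggested above, without touching the tree descriptions themselves.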
Roderic D M Page wrote:
Members might be interested in the GBIF Ebbe Nielsen Challenge which has just been launched http://www.gbif.org/page/62262 (to actually enter see http://gbif.challengepost.com/ ). There's €30,000 in total prize money, and three months to develop an application, tool, or visualisation that makes use of GBIF-mediated data (if you make it through the first round at the end of three months, you get a few more months to perfect it in time for the finals). The judging panel includes a mix of GBIF-associated people like myself (I chair the GBIF Science Committee), and non-GBIF people such as Lucas Joppa from Microsoft, and Mark Klein from NatureServe.
Those of you who've worked with GBIF data may think of it as solely about distributional data for organisms, but I personally think that there's a lot of scope for adding a phylogenetic perspective to that data. GBIF has now also imported several million geotagged GenBank sequences (albeit not without some issues). So if you have some cool ideas on how to link phylogeny, genomics, and geography, here's a chance to realise those ideas (and maybe win some prize money).
November 27, 2014
Erick Matsen wrote:
Up to about a week ago, we were using the old version of the Google OAuth API. This is being phased out. We have now moved to OAuth2.
A user reported having trouble logging in and getting an "Error: redirect_uri_mismatch" message. However, after reloading a number of times the problem was resolved.
Let me know if you have persistent problems. Thanks.
November 25, 2014
Andrew Roger wrote:
I'm wondering if those of you developing phylogenetic software tools have thought much about the problem of how we will develop software tools that are able to handle both large numbers of taxa (>100) and large numbers of sites (super-matrices of 50,000 positions and more) but at the same time implement complex substitution models. I think the problem is especially acute for complex 'site-heterogeneous' mixture models such as Lartillot's CAT models or even ones with a set number of classes (e.g. 10-60 as in Lartillot and Gascuel's C-series models) or models that involve large matrices such as codon models or the covarion-type models. Additional computational complexity is introduced by partitioned models where different parameters are allowed to be estimated for different partitions (e.g. edge lengths for different genes in a super-matrix). My concern is that large, complex data sets currently cannot be analyzed with the best substitution models because the computational time required to evaluate the likelihoods is prohibitive, even with relatively small data sets. Tree-searching becomes nearly impossible under these conditions because of the time required.
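To put rough numbers on this, here is a back-of-envelope Python sketch of the cost of a single likelihood evaluation with the pruning algorithm. It assumes a rooted binary tree (n - 1 internal nodes) and counts about 2 x states squared multiply-adds per internal node, per site, per mixture class. It ignores site-pattern compression, matrix exponentiation, and the many evaluations needed during optimization or tree search, so treat the output as an order of magnitude only. The function name is made up for this example.

```python
def pruning_ops(n_taxa, n_sites, n_states, n_classes=1):
    """Rough multiply-add count for one likelihood evaluation.

    Assumes a rooted binary tree and no site-pattern compression.
    """
    internal_nodes = n_taxa - 1
    # Each internal node combines two children; each child needs a
    # states-by-states matrix-vector product.
    per_node = 2 * n_states ** 2
    return internal_nodes * n_sites * n_classes * per_node


if __name__ == "__main__":
    taxa, sites = 100, 50_000
    scenarios = [
        ("nucleotide, single class", 4, 1),
        ("amino acid, single class", 20, 1),
        ("amino acid, 60-class mixture", 20, 60),
        ("codon, single class", 61, 1),
    ]
    for label, states, classes in scenarios:
        print(f"{label}: {pruning_ops(taxa, sites, states, classes):.2e}")
```

The quadratic dependence on the number of states and the linear dependence on the number of classes are what make codon and many-class mixture models so much heavier than a plain nucleotide model on the same alignment.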
I have been using DIVERGE (DetectIng Variability in Evolutionary Rates among GEnes) to test for divergence in function across different clades of a protein superfamily.
I have noticed that when using the 2001 algorithm and the 1999 algorithm (although to a lesser extent) I get negative values for thetaML = the Maximum likelihood estimate of the coefficient of functional divergence (normally lies between 0 and 1). I then get an LRT theta value (likelihood ratio test against the null) of '-1.#IND00' or 'not a number'.
I emailed the developer but have had no response yet. I suspect it may have something to do with short branch lengths or something like that...
Has anyone used DIVERGE and have any insight into why this might happen?
November 21, 2014
Meeting: NEXTGEN BIOINFORMATICS USER GROUP and SCOTTISH PHYLOGENY DISCUSSION GROUP
University of St Andrews, UK, 8 December 2014
Invited speaker -
Dr Jo DICKS (National Collection of Yeast Cultures http://www.ncyc.co.uk, Institute of Food Research):
"Estimating and exploiting yeast NGS-based phylogenies for industrial biotechnology".
Contributed talks -
Emma CARROLL: "Assessing the influence of migratory culture on connectivity in the southern right whale".
Deepali BASOYA: "Viral/host gene expression profiles in lymphoid and feather follicle epithelial (FFE) cells infected with Marek's disease virus".
Miguel PINHEIRO: "Determine dimorphic nature of the zoonotic parasite Plasmodium knowlesi".
Georgios KOUTSOVOULOS: "Reconstructing the phylogenetic relationships of nematodes using draft genomes and transcriptomes".
Joanne TAYLOR: "Environment and host genotype influence on fungal endophyte assemblages of Scots Pine".
Attendance is free, but please register in advance.
DETAILS AND REGISTRATION:
Daniel Barker email@example.com
November 17, 2014
November 9, 2014
[Paper] Towards more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees
Erick Matsen wrote:
Using species tree-aware gene trees for ancestral reconstruction is a good thing!
Towards more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees. M Groussin, JK Hobbs, GJ Szöllősi, S Gribaldo, VL Arcus and M Gouy, Molecular biology and evolution, Nov 4 2014 (www.ncbi.nlm.nih.gov)
The resurrection of ancestral proteins provides direct insight into how natural selection has shaped proteins found in nature. By tracing substitutions along a gene phylogeny, ancestral proteins can be reconstructed in silico and subsequently synthesized in vitro. This elegant strategy reveals the complex mechanisms responsible for the evolution of protein functions and structures. However, to date, all protein resurrection studies have used simplistic approaches for ancestral sequence reconstruction (ASR), including the assumption that a single sequence alignment alone is sufficient to accurately reconstruct the history of the gene family. The impact of such shortcuts on conclusions about ancestral functions has not been investigated. Here, we show with simulations that utilizing information on species history using a model that accounts for the duplication, horizontal transfer and loss (DTL) of genes statistically increases ASR accuracy. This underscores the importance of the tree topology in the inference of putative ancestors. We validate our in silico predictions using in vitro resurrection of the LeuB enzyme for the ancestor of the Firmicutes, a major and ancient bacterial phylum. With this particular protein, our experimental results demonstrate that information on the species phylogeny results in a biochemically more realistic and kinetically more stable ancestral protein. Additional resurrection experiments with different proteins are necessary to statistically quantify the impact of using species tree-aware gene trees on ancestral protein phenotypes. Nonetheless, our results suggest the need for incorporating both sequence and DTL information in future studies of protein resurrections to accurately define the genotype-phenotype space in which proteins diversify.
October 28, 2014
Erick Matsen wrote:
@cwhidden and I would like to sample from the subtree-prune-regraft (SPR) random walk on rooted phylogenetic trees. Does anyone know an easy way to do this? Chris could roll his own, but I'll bet that there is an easy solution out there.
If we wanted to sample from the random walk on unrooted trees, we could sample from the MrBayes prior. BEAST is rooted, which is nice, but has non-uniform priors on topologies. @mlandis would this be easy with RevBayes?
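In case rolling your own turns out to be the way to go, here is a minimal Python sketch of one step of a rooted SPR walk. It is one plausible reading of the walk, not necessarily the one intended above: it picks the pruned subtree uniformly among non-root nodes, then picks the regraft edge uniformly among the remaining edges, including the edge above the root and the original position (so the walk can stay where it is). Trees are stored as a child-to-parent dict with string labels, and the function name is made up for this example.

```python
import random


def spr_step(parent, rng):
    """Return a new child->parent map after one random rooted SPR move.

    parent maps every node label to its parent label (None for the root).
    The tree is assumed to be rooted and binary with at least two leaves.
    """
    parent = dict(parent)
    root = next(node for node, par in parent.items() if par is None)

    # 1. Choose the subtree to prune: the one hanging below node v.
    v = rng.choice(sorted(node for node in parent if node != root))
    p = parent[v]
    sibling = next(node for node, par in parent.items() if par == p and node != v)

    # 2. Detach: the sibling takes over p's place in the tree.
    parent[sibling] = parent[p]

    # 3. Collect the pruned subtree so we never regraft inside it.
    pruned = {v}
    grew = True
    while grew:
        grew = False
        for node, par in parent.items():
            if par in pruned and node not in pruned:
                pruned.add(node)
                grew = True

    # 4. Choose the edge above node t and re-insert p on that edge.
    #    If t is the current root, p becomes the new root.
    targets = sorted(node for node in parent if node not in pruned and node != p)
    t = rng.choice(targets)
    parent[p] = parent[t]
    parent[t] = p
    return parent


if __name__ == "__main__":
    # The tree (((A,B),C),D) with internal nodes x, y and root r.
    tree = {"A": "x", "B": "x", "x": "y", "C": "y", "y": "r", "D": "r", "r": None}
    rng = random.Random(0)
    for _ in range(5):
        tree = spr_step(tree, rng)
        print(tree)
```

Note that distinct (subtree, edge) choices can lead to the same tree, so this proposal is not uniform over neighbouring topologies; whether that matters depends on which walk you want to sample from.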