March 25, 2015
Interesting blog post on single precision arithmetic, compiler flags, and FastTree via Jonathan Eisen on Facebook (no, really) http://darlinglab.org/blog/2015/03/23/not-so-fast-fasttree.html
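The linked post is not reproduced here, but as a generic illustration of why single precision and evaluation order (which compiler flags can change) may matter, here is a small Python sketch. It emulates single-precision addition by rounding through the `struct` module; the helper names `f32` and `add32` and the three numbers are made up for the example and have nothing to do with FastTree's code.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add32(x, y):
    """Emulate single-precision addition: add in double, then round to single."""
    return f32(f32(x) + f32(y))


# Hypothetical values chosen so that the rounding is easy to follow.
a, b, c = 1.0e8, -1.0e8, 1.0

# Same three numbers, two evaluation orders, single precision.
left_to_right = add32(add32(a, b), c)  # (a + b) + c
right_to_left = add32(a, add32(b, c))  # a + (b + c)

# The same two orders in double precision, for comparison.
double_left = (a + b) + c
double_right = a + (b + c)

if __name__ == "__main__":
    print("single:", left_to_right, right_to_left)
    print("double:", double_left, double_right)
```

Near 1e8 the gap between adjacent single-precision numbers is 8, so adding 1.0 to -1e8 is lost to rounding and the two orders disagree, while in double precision they agree.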
March 4, 2015
It looks like David Maddison, Rob Knight, and some others got an NSF grant to build out a website and have some meetings.
They are trying to stimulate some discussion around these topics:
March 3, 2015
20th International Bioinformatics Workshop on Virus Evolution and Molecular Epidemiology
The University of the West Indies (UWI), St. Augustine Campus, Trinidad and Tobago Sunday, August 9 - Friday, August 14, 2015
We are announcing the organization of the international workshop on Virus Evolution and Molecular Epidemiology (VEME) in 2015, hosted by the University of the West Indies (UWI), St. Augustine, Trinidad and Tobago on behalf of our main sponsor the International Committee for Genetic Engineering and Biotechnology. The workshop is co-organised by the University of Leuven (Belgium) and the J. Craig Venter Institute (USA).
We plan to organize a 'Phylogenetic Inference' module that offers the theoretical background and hands-on experience in phylogenetic analysis for those who have little or no prior expertise in sequence analysis. An 'Evolutionary Hypothesis Testing' module is targeted at participants who are familiar with alignments and phylogenetic trees, and would like to extend their expertise to likelihood and Bayesian inference in phylogenetics, coalescent and phylogeographic analyses ('phylodynamics') and molecular adaptation. A 'Large Dataset Analysis' module will cover the more complex analysis of full genomes, huge datasets of pathogens including Next Generation Sequencing data, and combined analyses of pathogen and host. Practical sessions in these modules will involve software such as PHYLIP, PAUP*, PHYML, MEGA, PAML or HYPHY, TREE-PUZZLE, SplitsTree, BEAST, MrBayes, Simplot and RDP3.
We recommend that participants buy The Phylogenetic Handbook as a guide during the workshop, and bring their own data set.
The abstract and application deadline is March 15th
Selections will be made by the beginning of May.
The registration fee of 850 Euro covers attendance, lunches and coffee breaks. Participation is limited to 30 scientists in each module and is dependent on a selection procedure based on the submitted abstract and statement of motivation. A limited number of grants are available for scientists who would have difficulty attending for financial reasons.
Selection criteria (in order of importance):
1. Quality of the abstract: abstracts will be reviewed and priority will be given to applicants who are first author on the abstract.
2. Letter of motivation: how urgent/important is your need for training?
3. Each module is preferably restricted to 1 participant from the same lab.
4. Priority will be given to participants from countries with limited resources.
Grant criteria (in order of importance):
1. Priority to countries with limited resources.
2. Ranking according to the abstract quality.
Additional information and application forms are available on our website: http://www.rega.kuleuven.be/cev/veme-workshop/2015
We are confident that this course meets the needs of many molecular virologists and epidemiologists, and hope we can assist you in your search for training in Bioinformatics methods.
Christine Carrington, Karen Nelson and Annemie Vandamme Organizers of the workshop
March 2, 2015
January 27, 2015
I developed an R/Bioconductor package, ggtree, for phylogenetic tree visualization.
You can refer to the online document: http://www.bioconductor.org/packages/3.1/bioc/vignettes/ggtree/inst/doc/ggtree.html
January 26, 2015
This paper appeared in a journal I don't commonly read, so I wanted to highlight it. The ideas are not new (as they acknowledge) but it's a good reminder that we should be fighting model mis-specification on all fronts. Comments, anyone?
Correcting for sequencing error in maximum likelihood phylogeny inference. MK Kuhner and J McGill, G3 (Bethesda, Md.), 2014
Accurate phylogenies are critical to taxonomy as well as studies of speciation processes and other evolutionary patterns. Accurate branch lengths in phylogenies are critical for dating and rate measurements. Such accuracy may be jeopardized by unacknowledged sequencing error. We use simulated data to test a correction for DNA sequencing error in maximum likelihood phylogeny inference. Over a wide range of data polymorphism and true error rate, we found that correcting for sequencing error improves recovery of the branch lengths, even if the assumed error rate is up to twice the true error rate. Low error rates have little effect on recovery of the topology. When error is high, correction improves topological inference; however, when error is extremely high, using an assumed error rate greater than the true error rate leads to poor recovery of both topology and branch lengths. The error correction approach tested here was proposed in 2004 but has not been widely used, perhaps because researchers do not want to commit to an estimate of the error rate. This study shows that correction with an approximate error rate is generally preferable to ignoring the issue.
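To make the idea concrete, here is a minimal Python sketch of how I understand this kind of correction to enter the likelihood: instead of giving a tip a conditional likelihood of 1 for the observed base and 0 for the others, the observed base gets 1 - e and each other base gets e/3, where e is the assumed error rate. The function name `tip_partials`, the uniform e/3 split, and the example error rate are my assumptions for illustration; the paper's exact parameterisation may differ.

```python
BASES = "ACGT"


def tip_partials(observed_base, error_rate=0.0):
    """Conditional likelihoods of the observed base given each possible true base.

    Assumes (for illustration) that a sequencing error replaces the true base
    with each of the three other bases with equal probability error_rate / 3.
    """
    if observed_base not in BASES:
        raise ValueError("observed_base must be one of A, C, G, T")
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    return [
        1.0 - error_rate if base == observed_base else error_rate / 3.0
        for base in BASES
    ]


if __name__ == "__main__":
    # No correction: the usual 0/1 tip vector.
    print(tip_partials("G"))
    # Hypothetical assumed error rate of 3%.
    print(tip_partials("G", error_rate=0.03))
```

These vectors would then feed into the usual pruning algorithm unchanged, which is why the correction is cheap to add once an error rate has been chosen.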
January 24, 2015
MCEB - Mathematical and Computational Evolutionary Biology 21-25 June 2015 - Porquerolles Island, South of France.
Pre-registration deadline: February 10th
Notification to applicants: February 28th
Final list of attendees: April 1st
Scope: Mathematical and computational tools and concepts form an essential basis for modern evolutionary studies. The goal of the MCEB conference (at its 7th edition) is to bring together scientists with diverse backgrounds to present recent advances and discuss open problems in the field of mathematical and computational evolutionary biology. The theme of this year's edition will be new data, new questions, new methods. New generation sequencing techniques have multiplied not just the amount, but also the types of genetic data produced, giving rise to new questions, and new methodologies to answer them. These methodologies are often cross-disciplinary, with applications to diverse research topics. General concepts, models, methods and algorithms will also be presented and discussed, just as during the previous conference editions.
Where and when: Porquerolles Island, near Hyères, in the South of France, 21-25 June 2015.
Cost: Conference fees including accommodation for four nights, meals, coffee breaks, etc., will be between 300€ and 630€, all inclusive, and will vary depending on the room. PhD students and postdocs will benefit from the cheapest rooms.
David Bryant - http://www.maths.otago.ac.nz/~dbryant/ University of Otago, NZ Recovering phylogeny and demographics from SNPs: prospects and limitations
Jukka Corander - http://www.helsinki.fi/bsg/ Bayesian Statistics Group, University of Helsinki, FI ABC meets machine learning - fitting intractable models to genome data
Asger Hobolth - http://www.daimi.au.dk/~asger/ Bioinformatics Research Center (BiRC), Aarhus University, DK Modelling DNA sequence evolution within and between species
Philippe Lemey - https://rega.kuleuven.be/cev/ecv/lab-members/PhilippeLemey.html Rega Institute, Clinical and Epidemiological Virology, BE Data integrating in viral evolutionary inference: from spatial dynamics to trait evolution
Bernard Moret - http://lcbb.epfl.ch/ Laboratory for Computational Biology and Bioinformatics, EPFL, CH Phylogenetic Transfer of Knowledge
Ludovic Orlando - http://geogenetics.ku.dk/research_groups/palaeomix_group/ Center for GeoGenetics, Natural History Museum of Denmark, DK Ancient DNA: from very old molecules to genomes and epigenomes
Molly Przeworski - http://przeworski.c2b2.columbia.edu/ Columbia University, New York, USA A population-genetic approach to the study of mutation and recombination in humans
For more information, visit the website at: http://www.lirmm.fr/mceb2015/
January 22, 2015
I've been experimenting with drawing "geophylogenies" on web maps and have created a live demo at http://iphylo.org/~rpage/geojson-phylogeny-demo/ All a bit crude, but the idea is to take either a NEXUS tree file with added geographical coordinates, or query the BOLD database of DNA barcodes (most of which are geotagged) and create an interactive phylogeny on a map. The demo uses OpenStreetMap; I have code for Google Maps as well (hope to add this to the demo shortly). The layout borrows from GenGIS (see http://dx.doi.org/10.1101/gr.095612.109 and http://kiwi.cs.dal.ca/GenGIS/Main_Page ) but doesn't need a standalone program. There are some other advantages to using GeoJSON, such as storage in document databases like CouchDB, but I'll blog about these.
To give you a sense of the visualisation, below is the classic Banza example that I used in 2007 when I started playing with trees on Google Earth http://iphylo.blogspot.co.uk/2007/06/google-earth-phylogenies.html, inspired by Bill Piel's pioneering experiments.
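As a rough illustration of the GeoJSON idea described above (this is not the demo's actual code), here is a Python sketch that turns a tiny tree with geotagged tips into a GeoJSON FeatureCollection. The taxon names, coordinates, tree shape, and the crude layout rule (each internal node sits at the mean of its children) are all invented for the example. The one firm detail is that GeoJSON positions are ordered longitude first, then latitude.

```python
import json

# Hypothetical tips with (latitude, longitude); not real specimen data.
TIPS = {"taxonA": (10.0, 20.0), "taxonB": (12.0, 24.0), "taxonC": (30.0, 40.0)}
# Hypothetical tree: root -> (node1, taxonC), node1 -> (taxonA, taxonB).
CHILDREN = {"root": ["node1", "taxonC"], "node1": ["taxonA", "taxonB"]}


def node_position(node, tips, children):
    """Place tips at their coordinates and internal nodes at the mean of their children."""
    if node in tips:
        return tips[node]
    points = [node_position(child, tips, children) for child in children[node]]
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return (lat, lon)


def tree_to_geojson(tips, children):
    """Build a FeatureCollection: one Point per tip, one LineString per branch."""
    features = []
    for name, (lat, lon) in tips.items():
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    for parent, kids in children.items():
        p_lat, p_lon = node_position(parent, tips, children)
        for kid in kids:
            k_lat, k_lon = node_position(kid, tips, children)
            features.append({
                "type": "Feature",
                "geometry": {
                    "type": "LineString",
                    "coordinates": [[p_lon, p_lat], [k_lon, k_lat]],
                },
                "properties": {"parent": parent, "child": kid},
            })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    print(json.dumps(tree_to_geojson(TIPS, CHILDREN), indent=2))
```

Because the result is plain JSON, it could be handed to a web map library or stored as a document in something like CouchDB without further conversion.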
January 13, 2015
I had no idea that this was coming down the pipe, but was excited to see a paper come out describing Phycas (http://www.phycas.org) from its authors, including @mtholder. Code is at https://github.com/plewis/phycas.
Phycas: Software for Bayesian Phylogenetic Analysis. PO Lewis, MT Holder and DL Swofford, Systematic biology, Jan 9 2015
Phycas is open source, freely available Bayesian phylogenetics software written primarily in C++ but with a Python interface. Phycas specializes in Bayesian model selection for nucleotide sequence data, particularly the estimation of marginal likelihoods, central to computing Bayes Factors. Marginal likelihoods can be estimated using newer methods (Thermodynamic Integration and Generalized Steppingstone) that are more accurate than the widely used harmonic mean estimator. In addition, Phycas supports two posterior predictive approaches to model selection: Gelfand-Ghosh and Conditional Predictive Ordinates. The GTR family of substitution models, as well as a codon model, are available, and data can be partitioned with all parameters unlinked except tree topology and edge lengths. Phycas provides for analyses in which the prior on tree topologies allows polytomous trees as well as fully resolved trees, and provides for several choices for edge length priors, including a hierarchical model as well as the recently described compound Dirichlet prior, which helps avoid overly informative induced priors on tree length.
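For readers who haven't met steppingstone-style marginal likelihood estimators, here is a toy Python sketch of the basic (not generalized) steppingstone idea on a coin-flip model, where the exact answer is known. This is emphatically not Phycas code or its API. The model (uniform prior on the heads probability, so every power posterior is a Beta distribution that can be sampled directly), the data (14 heads in 20 flips), the number of stones, and all function names are assumptions made for the example.

```python
import math
import random


def log_likelihood(theta, heads, flips):
    """Log-likelihood of a particular sequence of flips given heads probability theta."""
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def steppingstone_log_marginal(heads, flips, n_stones=20, n_samples=2000, seed=1):
    """Estimate the log marginal likelihood as a sum of log ratios between power posteriors."""
    rng = random.Random(seed)
    # Powers run from 0 (prior) to 1 (posterior), crowded near 0.
    betas = [(i / n_stones) ** (1.0 / 0.3) for i in range(n_stones + 1)]
    log_z = 0.0
    for b_prev, b_next in zip(betas, betas[1:]):
        # With a uniform prior, the power posterior at b_prev is a Beta distribution.
        shape_a = b_prev * heads + 1.0
        shape_b = b_prev * (flips - heads) + 1.0
        terms = []
        for _ in range(n_samples):
            theta = rng.betavariate(shape_a, shape_b)
            theta = min(max(theta, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            terms.append((b_next - b_prev) * log_likelihood(theta, heads, flips))
        log_z += log_mean_exp(terms)
    return log_z


def exact_log_marginal(heads, flips):
    """Exact answer for this toy model: log of the Beta function B(heads+1, tails+1)."""
    return (math.lgamma(heads + 1) + math.lgamma(flips - heads + 1)
            - math.lgamma(flips + 2))


if __name__ == "__main__":
    print("steppingstone:", steppingstone_log_marginal(14, 20))
    print("exact:        ", exact_log_marginal(14, 20))
```

In a real phylogenetic analysis the power posteriors cannot be sampled directly, so each stone needs its own MCMC run; the toy model sidesteps that to keep the estimator itself visible.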
I haven't played with Phycas, but I can describe what I think of as its "killer feature," which is the ability to use a prior incorporating polytomous trees. That means that the software can return trees that look like this:
where I've put an arrow pointing at the polytomy. In this case, the polytomy shows three descendants of a given lineage.
I would argue that such a representation is a more honest one (a "shrunken" estimate for @nicolas_lartill and friends). That is, if there is not information to resolve an internal node, then an unresolved tree is returned. One often sees such nodes when people collapse nodes in ML trees that have low bootstrap support. However, I think that it's better than that because it's rolled into the actual inference, meaning that the overall likelihoods are properly estimated. In a field where we are already in statistically tenuous territory with the number of discrete parameters being on the order of the number of independent data points (to say nothing of the discrete estimation part) having fewer parameters when appropriate is refreshing.
In the Bayesian phylogenetics world, not allowing such multifurcations can cause some trouble, as was the subject of some research in the mid-2000s. A good culmination of that work is this paper by Z Yang:
Fair-balance paradox, star-tree paradox, and Bayesian phylogenetics. Z Yang, Molecular biology and evolution, Aug 2007
The star-tree paradox refers to the conjecture that the posterior probabilities for the three unrooted trees for four species (or the three rooted trees for three species if the molecular clock is assumed) do not approach 1/3 when the data are generated using the star tree and when the amount of data approaches infinity. It reflects the more general phenomenon of high and presumably spurious posterior probabilities for trees or clades produced by the Bayesian method of phylogenetic reconstruction, and it is perceived to be a manifestation of the deeper problem of the extreme sensitivity of Bayesian model selection to the prior on parameters. Analysis of the star-tree paradox has been hampered by the intractability of the integrals involved. In this article, I use Laplacian expansion to approximate the posterior probabilities for the three rooted trees for three species using binary characters evolving at a constant rate. The approximation enables calculation of posterior tree probabilities for arbitrarily large data sets. Both theoretical analysis of the analogous fair-coin and fair-balance problems and computer simulation for the tree problem confirmed the existence of the star-tree paradox. When the data size n --> infinity, the posterior tree probabilities do not converge to 1/3 each, but they vary among data sets according to a statistical distribution. This distribution is characterized. Two strategies for resolving the star-tree paradox are explored: (1) a nonzero prior probability for the degenerate star tree and (2) an increasingly informative prior forcing the internal branch length toward zero. Both appear to be effective in resolving the paradox, but the latter is simpler to implement. The posterior tree probabilities are found to be very sensitive to the prior.
The point of this work is that if the data truly doesn't have any signal concerning an unresolved node, and we use a Bayesian phylogenetic inference package that doesn't allow multifurcations (every one except for Phycas, as far as I know), the data set may (with non-vanishing probability) give very high confidence that one of the resolutions is correct. This is shown in this figure from Yang's paper:
See the deep red in the corners? That's showing that in replicate data sets there is a substantial probability that one of the resolutions will be very highly supported.
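The fair-coin analogue mentioned in the abstract is easy to simulate, so here is a toy Python sketch of it (my own simplification, not code from the paper). Data come from a perfectly fair coin, which plays the role of the star tree, but the analysis only allows "heads probability below 1/2" or "above 1/2", with a uniform prior. The 0.95 threshold, the number of flips and replicates, and the function names are assumptions for the example. The posterior is computed exactly using the identity that a Beta(k+1, n-k+1) probability below 1/2 equals a Binomial(n+1, 1/2) upper tail.

```python
import random
from math import comb


def posterior_below_half_table(n_flips):
    """table[k] = P(theta < 1/2 | k heads in n_flips) under a uniform prior on theta.

    Uses P(theta < 1/2 | k) = P(Binomial(n_flips + 1, 1/2) >= k + 1).
    """
    m = n_flips + 1
    total = 2 ** m
    table = [0.0] * (n_flips + 1)
    tail = 0
    for j in range(m, 0, -1):
        tail += comb(m, j)           # running count for P(Binomial >= j)
        table[j - 1] = tail / total  # corresponds to k = j - 1 heads
    return table


def fraction_strongly_supported(n_flips, n_datasets=2000, threshold=0.95, seed=1):
    """Fraction of fair-coin data sets in which one side of 1/2 gets posterior > threshold."""
    rng = random.Random(seed)
    table = posterior_below_half_table(n_flips)
    strong = 0
    for _ in range(n_datasets):
        # Count the ones among n_flips fair random bits to get the number of heads.
        heads = bin(rng.getrandbits(n_flips)).count("1")
        p_below = table[heads]
        if max(p_below, 1.0 - p_below) > threshold:
            strong += 1
    return strong / n_datasets


if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, fraction_strongly_supported(n))
```

If the posterior behaved as one might hope, the fraction would shrink toward zero as the number of flips grows; instead it stays at roughly one data set in ten, which is the coin-flip version of the red corners in the figure.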
So, I can't comment on Phycas' usability, runtimes, etc., and Conditional Predictive Ordinates sound nice, but for me, proper support for polytomies is the headlining feature of Phycas. If anyone tries it out, please post here!
December 28, 2014
This morning I stumbled across this paper, which uses discrete Morse theory to analyze spaces of "trees with symmetry". This paper has gotten almost no citations in the phylogenetics literature, but I think it shows some methods that may be interesting for folks interested in the large scale structure of tree space:
The topology of spaces of phylogenetic trees with symmetry. Axel Hultman
Natural Dowling analogues of the complex of phylogenetic trees are studied. Using discrete Morse theory, we find their homotopy types. In the process, the homotopy types of certain subposets of Dowling lattices are determined.