December 16, 2014
Erick Matsen wrote:
Come do an independent postdoc collaborating with @trvrb, me, or another computational biologist at the Fred Hutch.
Mahan Postdoctoral Fellowship
The Computational Biology Program of Fred Hutchinson Cancer Research Center in Seattle, Washington invites applications for the 2-year Mahan Postdoctoral fellowship. The fellowship will provide an exceptional individual with an early start on their career as an independent scientist by providing two years of salary and other support to complete their proposed research project in the laboratory of a Fred Hutch Computational Biologist mentor who is at the assistant or associate rank.
Faculty of any discipline or rank from the Fred Hutch, UW, or any other institute may be proposed as co-mentors. The fellowship must begin in the lab of the primary Fred Hutch mentor but it may move to another location as long as it benefits the science or career growth opportunities. The project must be focused on learning about biology, must involve a computational or mathematical component, and may include an experimental component. A laboratory trained scientist may satisfy the computational and mathematical requirement by including a training component in their proposal. Computationally strong candidates may include a laboratory training component as well. The research direction should reflect the interests and ideas of the applicant, although the final research proposal may be jointly designed; see below for more detail on fellowship rules and for a list of potential mentors.
See official announcement for more details, and get in touch if you are interested.
December 11, 2014
Andrew Roger wrote:
Does anyone know how to export the ancestral state reconstructions at nodes in trees estimated by PAUP*? The "Reconstruct" command works to generate them, but there doesn't seem to be any straightforward way to export a list of nodes with the ancestral 'sequences' at each node...or maybe there is, but it's not obvious (help?)
I am running a programming practical in a CS department with some students at the Master's level. The students essentially re-implemented J. Huelsenbeck's paper that tests all 200-something possible time-reversible models (see http://mbe.oxfordjournals.org/content/21/6/1123.full) but under Maximum Likelihood using our likelihood library.
Instead of having them write a report, I'd rather like to publish the availability of this little tool somewhere, with a focus on what was done and also on the teaching aspects.
Do you have any suggestions where this could be submitted?
December 3, 2014
December 2, 2014
Brian Foley wrote:
For one example, the molecular phylogeny of primates in this paper is stored in TreeBase: http://www.treebase.org/treebase-web/search/study/summary.html?id=12186
The tree is available in several formats: http://www.treebase.org/treebase-web/search/study/trees.html?id=12186
But I cannot get FigTree (version 1.4.0, on a Mac OSX-8 machine) to open any of those file formats.
It says: Error: Unknown command "TITLE" in TREES block.
I can open the NEXUS Treefile and edit it by hand, but it seems to me that there should be a way to have FigTree ignore commands that it does not recognize, with a warning that it is doing so, rather than just refusing to open the tree altogether.
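Until a viewer does this itself, the hand edit can be scripted. The following is only a minimal sketch, not part of FigTree or TreeBASE: the function name `strip_unknown_tree_commands` and the list of commands to keep are assumptions made for illustration. It drops every command in a TREES block other than TRANSLATE and TREE, and reports what it dropped so that the caller can print a warning.

```python
import re

# Commands assumed safe to keep inside a TREES block; everything else is dropped.
KEEP_COMMANDS = ("TRANSLATE", "TREE")

TREES_BLOCK = re.compile(
    r"(BEGIN\s+TREES\s*;)(.*?)(END\s*;)", re.IGNORECASE | re.DOTALL
)


def strip_unknown_tree_commands(nexus_text, keep=KEEP_COMMANDS):
    """Return (cleaned_text, dropped), where dropped lists the removed commands."""
    dropped = []

    def clean_block(match):
        header, body, footer = match.groups()
        kept = []
        # Naive split: assumes no ';' inside comments or quoted labels.
        for command in body.split(";"):
            command = command.strip()
            if not command:
                continue
            keyword = command.split(None, 1)[0].upper()
            if keyword in keep:
                kept.append("    " + command + ";")
            else:
                dropped.append(command)
        return "\n".join([header] + kept + [footer])

    return TREES_BLOCK.sub(clean_block, nexus_text), dropped


# Hypothetical input, made up for illustration.
sample = """#NEXUS
BEGIN TREES;
    TITLE Example_Trees;
    TREE t1 = (A,(B,C));
END;
"""
cleaned, dropped = strip_unknown_tree_commands(sample)
for command in dropped:
    print("Warning: ignoring unrecognised command:", command)
```

The split on semicolons is deliberately simple; a file with semicolons inside comments or quoted taxon labels would need a real NEXUS tokenizer.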
Roderic D M Page wrote:
Members might be interested in the GBIF Ebbe Nielsen Challenge which has just been launched http://www.gbif.org/page/62262 (to actually enter see http://gbif.challengepost.com/ ). There's €30,000 in total prize money, and three months to develop an application, tool, or visualisation that makes use of GBIF-mediated data (if you make it through the first round at the end of three months, you get a few more months to perfect it in time for the finals). The judging panel includes a mix of GBIF-associated people like myself (I chair the GBIF Science Committee), and non-GBIF people such as Lucas Joppa from Microsoft, and Mark Klein from NatureServe.
Those of you who've worked with GBIF data may think of it as solely about distributional data for organisms, but I personally think that there's a lot of scope for adding a phylogenetic perspective to that data. GBIF has now also imported several million geotagged GenBank sequences (albeit not without some issues). So if you have some cool ideas on how to link phylogeny, genomics, and geography, here's a chance to realise those ideas (and maybe win some prize money).
November 27, 2014
Erick Matsen wrote:
Up to about a week ago, we were using the old version of the Google OAuth API. This is being phased out. We have now moved to OAuth2.
A user reported having trouble logging in and getting an "Error: redirect_uri_mismatch" message. However, after reloading a number of times the problem was resolved.
Let me know if you have persistent problems. Thanks.
November 25, 2014
Andrew Roger wrote:
I'm wondering if those of you developing phylogenetic software tools have thought much about the problem of how we will develop software tools that are able to handle both large numbers of taxa (>100) and large numbers of sites (super-matrices of 50,000 positions and more) but at the same time implement complex substitution models. I think the problem is especially acute for complex 'site-heterogeneous' mixture models such as Lartillot's CAT models or even ones with a set number of classes (e.g. 10-60 as in Lartillot and Gascuel's C-series models) or models that involve large matrices such as codon models or the covarion-type models. Additional computational complexity is introduced by partitioned models where different parameters are allowed to be estimated for different partitions (e.g. edge lengths for different genes in a super-matrix). My concern is that large complex data sets currently cannot be analyzed with the best substitution models because the computational time required to evaluate the likelihoods is prohibitive even with relatively small data sets. Tree-searching becomes nearly impossible under these conditions because of the time required.
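To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes a rooted binary tree and dense matrix-vector products in Felsenstein's pruning algorithm, and it ignores site-pattern compression, rate categories and any tip-level shortcuts; the function name is made up for illustration.

```python
def pruning_multiply_adds(n_taxa, n_sites, n_states, n_classes=1):
    """Rough multiply-add count for one likelihood evaluation.

    Assumes a rooted binary tree and a dense n_states x n_states
    matrix-vector product per branch, per site and per mixture class.
    """
    n_branches = 2 * (n_taxa - 1)   # each of the n_taxa - 1 internal nodes has two children
    per_branch = n_states ** 2      # one dense matrix-vector product
    return n_branches * n_sites * n_classes * per_branch


single = pruning_multiply_adds(100, 50_000, 20)        # one 20-state matrix
mixture = pruning_multiply_adds(100, 50_000, 20, 60)   # 60-class mixture
print(f"single matrix: {single:.2e} multiply-adds")
print(f"60-class mixture: {mixture:.2e} multiply-adds")
```

Under these assumptions the cost grows linearly with the number of mixture classes and quadratically with the number of states, and this is the cost of a single likelihood evaluation, of which a tree search needs very many.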
I have been using DIVERGE (DetectIng Variability in Evolutionary Rates among GEnes) to test for divergence in function across different clades of a protein superfamily.
I have noticed that when using the 2001 algorithm, and to a lesser extent the 1999 algorithm, I get negative values for thetaML, the maximum likelihood estimate of the coefficient of functional divergence (which normally lies between 0 and 1). I then get an LRT theta value (likelihood ratio test against the null) of '-1.#IND00', i.e. 'not a number'.
I emailed the developer but have had no response yet. I suspect it may have something to do with short branch lengths or something like that...
Has anyone used DIVERGE who has some insight as to why this might happen?
November 21, 2014
Meeting: NEXTGEN BIOINFORMATICS USER GROUP and SCOTTISH PHYLOGENY DISCUSSION GROUP
University of St Andrews, UK, 8 December 2014
Invited speaker -
Dr Jo DICKS (National Collection of Yeast Cultures http://www.ncyc.co.uk, Institute of Food Research):
"Estimating and exploiting yeast NGS-based phylogenies for industrial biotechnology".
Contributed talks -
Emma CARROLL: "Assessing the influence of migratory culture on connectivity in the southern right whale".
Deepali BASOYA: "Viral/host gene expression profiles in lymphoid and feather follicle epithelial (FFE) cells infected with Marek's disease virus".
Miguel PINHEIRO: "Determine dimorphic nature of the zoonotic parasite Plasmodium knowlesi".
Georgios KOUTSOVOULOS: "Reconstructing the phylogenetic relationships of nematodes using draft genomes and transcriptomes".
Joanne TAYLOR: "Environment and host genotype influence on fungal endophyte assemblages of Scots Pine".
Attendance is free, but please register in advance.
DETAILS AND REGISTRATION:
Daniel Barker firstname.lastname@example.org
November 17, 2014
November 9, 2014
[Paper] Towards more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees
Erick Matsen wrote:
Using species-tree aware gene trees for ancestral reconstruction is a good thing!
www.ncbi.nlm.nih.gov: Towards more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees. M Groussin, JK Hobbs, GJ Szöllősi, S Gribaldo, VL Arcus and M Gouy, Molecular Biology and Evolution, Nov 4 2014
The resurrection of ancestral proteins provides direct insight into how natural selection has shaped proteins found in nature. By tracing substitutions along a gene phylogeny, ancestral proteins can be reconstructed in silico and subsequently synthesized in vitro. This elegant strategy reveals the complex mechanisms responsible for the evolution of protein functions and structures. However, to date, all protein resurrection studies have used simplistic approaches for ancestral sequence reconstruction (ASR), including the assumption that a single sequence alignment alone is sufficient to accurately reconstruct the history of the gene family. The impact of such shortcuts on conclusions about ancestral functions has not been investigated. Here, we show with simulations that utilizing information on species history using a model that accounts for the duplication, horizontal transfer and loss (DTL) of genes statistically increases ASR accuracy. This underscores the importance of the tree topology in the inference of putative ancestors. We validate our in silico predictions using in vitro resurrection of the LeuB enzyme for the ancestor of the Firmicutes, a major and ancient bacterial phylum. With this particular protein, our experimental results demonstrate that information on the species phylogeny results in a biochemically more realistic and kinetically more stable ancestral protein. Additional resurrection experiments with different proteins are necessary to statistically quantify the impact of using species tree-aware gene trees on ancestral protein phenotypes. Nonetheless, our results suggest the need for incorporating both sequence and DTL information in future studies of protein resurrections to accurately define the genotype-phenotype space in which proteins diversify.
October 28, 2014
Erick Matsen wrote:
@cwhidden and I would like to sample from the subtree-prune-regraft (SPR) random walk on rooted phylogenetic trees. Does anyone know an easy way to do this? Chris could roll his own, but I'll bet that there is an easy solution out there.
If we wanted to sample from the random walk on unrooted trees, we could sample from the MrBayes prior. BEAST is rooted, which is nice, but has non-uniform priors on topologies. @mlandis, would this be easy with RevBayes?
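For reference, a roll-your-own version is short. This is only a minimal sketch under stated assumptions, not anyone's existing implementation: trees are nested tuples (a leaf is a string, an internal node is a pair of subtrees), the function names are made up for illustration, and each step picks a prune point and a regraft edge uniformly at random. That is uniform over moves, not over neighbouring topologies, and some moves return the same topology, so it should not be read as sampling from any particular stationary distribution.

```python
import random


def node_paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node, root first."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree):
            yield from node_paths(child, prefix + (i,))


def prune(tree, path):
    """Detach the subtree at path; return (remaining_tree, pruned_subtree)."""
    i = path[0]
    if len(path) == 1:
        return tree[1 - i], tree[i]  # the sibling replaces the suppressed parent
    rest, pruned = prune(tree[i], path[1:])
    return ((rest, tree[1]) if i == 0 else (tree[0], rest)), pruned


def regraft(tree, path, subtree):
    """Attach subtree on the edge above the node at path (() means above the root)."""
    if not path:
        return (tree, subtree)
    i = path[0]
    child = regraft(tree[i], path[1:], subtree)
    return (child, tree[1]) if i == 0 else (tree[0], child)


def spr_step(tree, rng):
    """One rooted SPR move; assumes the tree has at least two leaves."""
    prune_path = rng.choice([p for p in node_paths(tree) if p])
    rest, pruned = prune(tree, prune_path)
    graft_path = rng.choice(list(node_paths(rest)))
    return regraft(rest, graft_path, pruned)


def spr_walk(tree, n_steps, seed=None):
    """Yield the successive trees visited by n_steps random SPR moves."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        tree = spr_step(tree, rng)
        yield tree


def leaves(tree):
    """Return the leaf names of a tree in left-to-right order."""
    return [tree] if isinstance(tree, str) else [x for c in tree for x in leaves(c)]


start = ((("A", "B"), "C"), ("D", "E"))
for visited in spr_walk(start, 5, seed=1):
    print(visited)
```

Regrafting above the root is allowed, which is what makes this the rooted version of the move. To get a walk on the SPR graph proper, one would canonicalise each result (for example by sorting children) and choose uniformly among the distinct neighbours.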
October 15, 2014
Erick Matsen wrote:
New from @alexei_drummond and his postdoc:
The space of ultrametric phylogenetic trees by Alex Gavruskin, Alexei J. Drummond
We introduce two metric spaces on ultrametric phylogenetic trees and compare them with existing models of tree space. We formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered.
I haven't read it in detail, but it seems that the version of the space that is most natural for time-trees (their t-space) has properties that make it mathematically difficult to analyze. The combinatorial machinery that helped out with the BHV space doesn't help here.
Theorem 8. The problem of computing geodesics in t-space is NP-hard. We will reduce the problem of computing NNI-distance to the problem of computing geodesics in t-space, but before going on to the proof of this result, we would like to develop some intuition of why t-space is so different from both BHV and τ-space. The key property for this difference is that the cone-path is rarely a geodesic in t-space. Indeed, in both BHV and τ-space the position of two cubes can result in a cone-path being the geodesic between any pair of trees from these cubes. Particularly, the measure of the set of pairs of trees between which the cone-path is a geodesic is positive. For example, if two trees T and R have topologies with no compatible splits, then the geodesic between T and R is a cone-path. A property such as this does not present in t-space. It will follow from the observations below that the measure of the set of pairs of trees between which the geodesic is a cone-path in t-space has measure 0.
I know @cwhidden has been reading it so perhaps he'll post some observations.
October 10, 2014
Erick Matsen wrote:
From Gascuel & co:
www.ncbi.nlm.nih.gov: Searching for virus phylotypes. F Chevenet, M Jung, M Peeters, T de Oliveira and O Gascuel, Bioinformatics (Oxford, England), Mar 2013
Large phylogenies are being built today to study virus evolution, trace the origin of epidemics, establish the mode of transmission and survey the appearance of drug resistance. However, no tool is available to quickly inspect these phylogenies and combine them with extrinsic traits (e.g. geographic location, risk group, presence of a given resistance mutation), seeking to extract strain groups of specific interest or requiring surveillance.
News to me!
October 6, 2014
Rob Lanfear wrote:
I'm wondering what software folks use for automatically aligning sequences and then manually editing those alignments on macs?
I know there's lots of software out there, but I'm wondering if there's something I've missed. In principle I like the offerings in Geneious (it includes various plug-ins for automated alignment, and a very serviceable manual editor), but the price tag is a bit steep if that's all you want it for...
Fast log likelihood given a fixed alignment, fixed tree, and fixed huge arbitrary sparse transition rate matrix
I know some ways to compute this, but I wonder who has the best current implementation? This would just be a tool for methods development testing rather than for doing anything practical, for example it wouldn't estimate anything and it wouldn't need to know anything about biology.
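As a baseline to compare implementations against, here is a minimal pure-Python sketch of one of those ways. It is not a claim about the best current implementation: it combines Felsenstein pruning with uniformization, so that exp(Qt) is only ever applied to a vector and never formed. The data layout is an assumption made for illustration: `rates[i]` maps j to the nonzero off-diagonal rate q_ij, a leaf is a name, an internal node is a list of (child, branch_length) pairs, and the alignment maps names to lists of state indices. There is no rescaling against underflow, so it only suits small trees.

```python
import math


def expm_action(rates, t, v, tol=1e-12, max_terms=100_000):
    """Approximate exp(Q*t) @ v by uniformization.

    rates[i] maps j -> q_ij for the nonzero off-diagonal entries of row i;
    the diagonal is implied by rows of Q summing to zero.
    Assumes mu*t is small enough that exp(-mu*t) does not underflow.
    """
    mu = max((sum(row.values()) for row in rates.values()), default=0.0)
    if mu == 0.0 or t == 0.0:
        return list(v)
    weight = math.exp(-mu * t)   # Poisson(mu*t) probability of k = 0
    covered = weight             # Poisson mass accounted for so far
    term = list(v)               # holds P^k v, with P = I + Q/mu
    result = [weight * x for x in term]
    for k in range(1, max_terms + 1):
        if 1.0 - covered <= tol:
            break
        nxt = list(term)
        for i, row in rates.items():
            nxt[i] += sum(q * (term[j] - term[i]) for j, q in row.items()) / mu
        term = nxt
        weight *= mu * t / k
        covered += weight
        result = [r + weight * x for r, x in zip(result, term)]
    return result


def partial(node, states, rates, n_states):
    """Partial likelihood vector at node for one site."""
    if isinstance(node, str):
        vec = [0.0] * n_states
        vec[states[node]] = 1.0
        return vec
    out = [1.0] * n_states
    for child, t in node:
        msg = expm_action(rates, t, partial(child, states, rates, n_states))
        out = [a * b for a, b in zip(out, msg)]
    return out


def log_likelihood(tree, alignment, rates, root_freqs):
    """Sum of per-site log likelihoods for a fixed tree and alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for s in range(n_sites):
        states = {name: seq[s] for name, seq in alignment.items()}
        vec = partial(tree, states, rates, len(root_freqs))
        total += math.log(sum(p * x for p, x in zip(root_freqs, vec)))
    return total


# Toy check: two states with rate 1 in each direction, two tips.
rates = {0: {1: 1.0}, 1: {0: 1.0}}
tree = [("A", 0.1), ("B", 0.2)]
alignment = {"A": [0, 0], "B": [0, 1]}
print(log_likelihood(tree, alignment, rates, [0.5, 0.5]))
```

Each uniformization term costs one pass over the nonzero rates, so the work per branch and site scales with the number of nonzeros times roughly mu*t terms. For a genuinely huge matrix one would replace the Python loops with a sparse matrix library and vectorise over sites.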
September 30, 2014
Erick Matsen wrote:
New from @tanja_stadler and co:
www.ncbi.nlm.nih.gov: On age and species richness of higher taxa. T Stadler, DL Rabosky, RE Ricklefs and F Bokma, The American Naturalist, Oct 2014
Abstract: Many studies have tried to identify factors that explain differences in numbers of species between clades against the background assumption that older clades contain more species because they have had more time for diversity to accumulate. The finding in several recent studies that species richness of clades is decoupled from stem age has been interpreted as evidence for ecological limits to species richness. Here we demonstrate that the absence of a positive age-diversity relationship, or even a negative relationship, may also occur when taxa are defined based on time or some correlate of time such as genetic distance or perhaps morphological distinctness. Thus, inferring underlying processes from distributions of species across higher taxa requires caution concerning the way in which higher taxa are defined. When this definition is unclear, crown age is superior to stem age as a measure of clade age.
They were thinking about what models might not have a monotonically positive age-diversity relationship for clades:
Several studies have investigated relations between species richness and ages of higher taxa. Three methodological articles (Magallón and Sanderson 2001; Bokma 2003; Paradis 2003) prominently featuring the idea that E[n] = e^{(λ − μ)t} have together been cited by more than 500 articles. Furthermore, Rabosky et al. (2012) investigated the behavior of a simple model where higher taxa originate under a Poisson process (see also Aldous et al. 2008; Maruvka et al. 2013). They found that such a model was expected to result in positive relationships between stem clade age and species richness, even when rates of species diversification varied among clades, provided that rates within clades were constant through time. As we have shown here, the expectation of a positive relationship between stem age and species richness may be incorrect, as it depends on the particular model of diversification and definition of higher taxa.
Many studies have identified young taxa as “unexpectedly” species rich, but our results show that such patterns can result from the manner in which higher taxa are delimited. For example, under scenarios i-b and ii-b, clades with young stem ages are expected to contain not fewer but more species than clades with old stem ages (table 1). In other words, studies may have incorrectly identified young taxa as unexpectedly species rich because they neglected how taxa were defined, and consequently incorrectly expected young taxa to be species poor.
Here is the model they consider:
September 29, 2014
Erick Matsen wrote:
haldanessieve.org: Author post: Predicting evolution from the shape of genealogical trees
Here's what I see as the essentials of their model:
September 26, 2014
The Genealogical World of Phylogenetic Networks
BMC Evolutionary Biology