phylobabble.org

Latest topics

URL

XML feed
http://www.phylobabble.org/latest

December 16, 2014

06:45

Erick Matsen wrote:

Come do an independent postdoc collaborating with @trvrb, me, or another computational biologist at the Fred Hutch.

Mahan Postdoctoral Fellowship

The Computational Biology Program of Fred Hutchinson Cancer Research Center in Seattle, Washington invites applications for the 2-year Mahan Postdoctoral fellowship. The fellowship will provide an exceptional individual with an early start on their career as an independent scientist by providing two years of salary and other support to complete their proposed research project in the laboratory of a Fred Hutch Computational Biologist mentor who is at the assistant or associate rank.

Faculty of any discipline or rank from the Fred Hutch, UW, or any other institute may be proposed as co-mentors. The fellowship must begin in the lab of the primary Fred Hutch mentor but it may move to another location as long as it benefits the science or career growth opportunities. The project must be focused on learning about biology, must involve a computational or mathematical component, and may include an experimental component. A laboratory-trained scientist may satisfy the computational and mathematical requirement by including a training component in their proposal. Computationally strong candidates may include a laboratory training component as well. The research direction should reflect the interests and ideas of the applicant, although the final research proposal may be jointly designed; see below for more detail on fellowship rules and for a list of potential mentors.

See official announcement for more details, and get in touch if you are interested.

Posts: 1

Participants: 1

Read full topic

December 11, 2014

12:33

Andrew Roger wrote:

Does anyone know how to export the ancestral state reconstructions at nodes in trees estimated by PAUP*? The "Reconstruct" command works to generate them, but there doesn't seem to be any straightforward way to export a list of nodes with the ancestral 'sequences' at each node...or maybe there is, but it's not obvious (help?)

Posts: 1

Participants: 1

Read full topic

09:31

Alexandros_Stam wrote:

Dear All,

I am running a programming practical in a CS department with some students at the Master's level. The students essentially re-implemented J. Huelsenbeck's paper that tests all 200-something possible time-reversible models (see http://mbe.oxfordjournals.org/content/21/6/1123.full) but under Maximum Likelihood using our likelihood library.

Instead of having them write a report, I'd rather like to publish the availability of this little tool somewhere, with a focus on what was done and also on the teaching aspects.

Do you have any suggestions where this could be submitted?

Thank you,

Alexis

Posts: 2

Participants: 2

Read full topic

December 3, 2014

08:48

Roderic D M Page wrote:

Katie Davis and Jon Hill are holding a workshop on their supertree toolkit at Bath (UK). See announcement on Syst Biol web site http://systbio.org/?q=node/439

Posts: 1

Participants: 1

Read full topic

December 2, 2014

13:31

Brian Foley wrote:

For one example, the molecular phylogeny of primates in this paper is stored in TreeBase: http://www.treebase.org/treebase-web/search/study/summary.html?id=12186

The tree is available in several formats: http://www.treebase.org/treebase-web/search/study/trees.html?id=12186

But I cannot get FigTree (version 1.4.0, on a Mac OSX-8 machine) to open any of those file formats.

It says: Error: Unknown command "TITLE" in TREES block.

I can open the NEXUS Treefile and edit it by hand, but it seems to me that there should be a way to have FigTree ignore commands that it does not recognize, with a warning that it is doing so, rather than just refusing to open the tree altogether.
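Until FigTree tolerates unrecognised commands, the hand edit described above can be scripted. What follows is only a minimal sketch, not anything FigTree or TreeBASE provides: the function name `strip_unknown_tree_commands` is made up for illustration, and it assumes each command in the TREES block sits on its own line, which may not hold for every exported file.

```python
import re

def strip_unknown_tree_commands(nexus_text, unwanted=("TITLE",)):
    """Drop the named commands from TREES blocks; leave every other line as-is.

    Assumption: each NEXUS command occupies a single line of its own.
    """
    patterns = [re.compile(re.escape(cmd.upper()) + r"\b") for cmd in unwanted]
    kept = []
    in_trees = False
    for line in nexus_text.splitlines():
        upper = line.strip().upper()
        if re.match(r"BEGIN\s+TREES\s*;", upper):
            in_trees = True
        elif in_trees and re.match(r"END(BLOCK)?\s*;", upper):
            in_trees = False
        elif in_trees and any(p.match(upper) for p in patterns):
            continue  # skip the command the tree viewer rejects
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Calling it on the text of the downloaded file and writing the result to a new file leaves the original untouched. If the viewer then complains about another command, its name can be added to `unwanted`.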

Posts: 2

Participants: 2

Read full topic

09:14

Roderic D M Page wrote:

Members might be interested in the GBIF Ebbe Nielsen Challenge which has just been launched http://www.gbif.org/page/62262 (to actually enter see http://gbif.challengepost.com/ ). There's €30,000 in total prize money, and three months to develop an application, tool, or visualisation that makes use of GBIF-mediated data (if you make it through the first round at the end of three months, you get a few more months to perfect it in time for the finals). The judging panel includes a mix of GBIF-associated people like myself (I chair the GBIF Science Committee), and non-GBIF people such as Lucas Joppa from Microsoft, and Mark Klein from NatureServe.

Those of you who've worked with GBIF data may think of it as solely about distributional data for organisms, but I personally think that there's a lot of scope for adding a phylogenetic perspective to that data. GBIF has now also imported several million geotagged GenBank sequences (albeit not without some issues). So if you have some cool ideas on how to link phylogeny, genomics, and geography, here's a chance to realise those ideas (and maybe win some prize money).

Posts: 1

Participants: 1

Read full topic

November 27, 2014

08:21

Erick Matsen wrote:

Hello Babblers--

Up to about a week ago, we were using the old version of the Google OAuth API. This is being phased out. We have now moved to OAuth2.

A user reported having trouble logging in and getting an Error: redirect_uri_mismatch error. However, after reloading a number of times the problem was resolved.

Let me know if you have persistent problems. Thanks.

Posts: 1

Participants: 1

Read full topic

November 25, 2014

11:54

Andrew Roger wrote:

I'm wondering if those of you developing phylogenetic software tools have thought much about the problem of how we will develop software tools that are able to handle both large numbers of taxa (>100) and large numbers of sites (super-matrices of 50,000 positions and more) but at the same time implement complex substitution models. I think the problem is especially acute for complex 'site-heterogeneous' mixture models such as Lartillot's CAT models or even ones with a set number of classes (e.g. 10-60 as in Lartillot and Gascuel's C-series models) or models that involve large matrices such as codon models or the covarion-type models. Additional computational complexity is introduced by partitioned models where different parameters are allowed to be estimated for different partitions (e.g. edge lengths for different genes in a super-matrix). My concern is that large, complex data sets currently cannot be analyzed with the best substitution models because the computational time required to evaluate the likelihoods is prohibitive even for relatively small data sets. Tree-searching becomes nearly impossible under these conditions because of the time required.
Does anyone have good ideas of how to get around these problems with large data sets?
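For a rough sense of scale, here is a back-of-the-envelope sketch of the cost of one full likelihood evaluation. It assumes the standard Felsenstein pruning algorithm on a rooted binary tree. It ignores site-pattern compression, partial recomputation after local tree changes, and the cost of computing transition matrices. The function name and the example scenarios are illustrative only.

```python
def pruning_cost(n_taxa, n_sites, n_classes, n_states):
    """Approximate multiply-add count for one full likelihood evaluation.

    Assumption: a rooted binary tree (n_taxa - 1 internal nodes), with two
    children per node and a states-by-states sum for each child.
    """
    internal_nodes = n_taxa - 1
    per_node_site_class = 2 * n_states ** 2
    return internal_nodes * n_sites * n_classes * per_node_site_class

# Hypothetical scenarios: (label, taxa, sites, mixture classes x rate categories, states)
scenarios = [
    ("amino acids, 4 rate categories", 100, 50_000, 4, 20),
    ("amino acids, 60 classes x 4 categories", 100, 50_000, 240, 20),
    ("codons, 4 rate categories", 100, 50_000, 4, 61),
]
for label, taxa, sites, classes, states in scenarios:
    print(f"{label}: {pruning_cost(taxa, sites, classes, states):.3e} multiply-adds")
```

The count grows linearly in taxa, sites, and classes but quadratically in the number of states. That is one way to see why codon and covarion-type models are so much heavier per evaluation, before any tree search multiplies the total.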

Posts: 13

Participants: 5

Read full topic

10:37

wrote:

I have been using DIVERGE (DetectIng Variability in Evolutionary Rates among GEnes) to test for divergence in function across different clades of a protein superfamily.

I have noticed that when using the 2001 algorithm and the 1999 algorithm (although to a lesser extent) I get negative values for thetaML = the Maximum likelihood estimate of the coefficient of functional divergence (normally lies between 0 and 1). I then get an LRT theta value (likelihood ratio test against the null) of '-1.#IND00' or 'not a number'.

I emailed the developer but have had no response yet. I suspect it may have something to do with short branch lengths or something like that...

Does anyone who has used DIVERGE have some insight as to why this might happen?

Much appreciated!

Rosie

Posts: 1

Participants: 1

Read full topic

November 21, 2014

10:14

Db60 wrote:

Meeting: NEXTGEN BIOINFORMATICS USER GROUP and SCOTTISH PHYLOGENY DISCUSSION GROUP

University of St Andrews, UK, 8 December 2014

https://genomics.ed.ac.uk/ngbug/next-meeting-st-andrews

Invited speaker -

Dr Jo DICKS (National Collection of Yeast Cultures http://www.ncyc.co.uk, Institute of Food Research):

"Estimating and exploiting yeast NGS-based phylogenies for industrial biotechnology".

Contributed talks -

Emma CARROLL: "Assessing the influence of migratory culture on connectivity in the southern right whale".

Deepali BASOYA: "Viral/host gene expression profiles in lymphoid and feather follicle epithelial (FFE) cells infected with Marek's disease virus".

Miguel PINHEIRO: "Determine dimorphic nature of the zoonotic parasite Plasmodium knowlesi".

Georgios KOUTSOVOULOS: "Reconstructing the phylogenetic relationships of nematodes using draft genomes and transcriptomes".

Joanne TAYLOR: "Environment and host genotype influence on fungal endophyte assemblages of Scots Pine".

Attendance is free, but please register in advance.

DETAILS AND REGISTRATION:

https://genomics.ed.ac.uk/ngbug/next-meeting-st-andrews

Daniel Barker db60@st-andrews.ac.uk

Posts: 1

Participants: 1

Read full topic

November 17, 2014

18:52

Erick Matsen wrote:

The next series on http://phyloseminar.org will be about ancestral recombination graphs.

What would people like to see after that?

Posts: 1

Participants: 1

Read full topic

November 9, 2014

07:48

Erick Matsen wrote:

Using species-tree aware gene trees for ancestral reconstruction is a good thing!

www.ncbi.nlm.nih.gov Towards more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees. M Groussin, JK Hobbs, GJ Szöllősi, S Gribaldo, VL Arcus and M Gouy, Molecular biology and evolution, Nov 4 2014

The resurrection of ancestral proteins provides direct insight into how natural selection has shaped proteins found in nature. By tracing substitutions along a gene phylogeny, ancestral proteins can be reconstructed in silico and subsequently synthesized in vitro. This elegant strategy reveals the complex mechanisms responsible for the evolution of protein functions and structures. However, to date, all protein resurrection studies have used simplistic approaches for ancestral sequence reconstruction (ASR), including the assumption that a single sequence alignment alone is sufficient to accurately reconstruct the history of the gene family. The impact of such shortcuts on conclusions about ancestral functions has not been investigated. Here, we show with simulations that utilizing information on species history using a model that accounts for the duplication, horizontal transfer and loss (DTL) of genes statistically increases ASR accuracy. This underscores the importance of the tree topology in the inference of putative ancestors. We validate our in silico predictions using in vitro resurrection of the LeuB enzyme for the ancestor of the Firmicutes, a major and ancient bacterial phylum. With this particular protein, our experimental results demonstrate that information on the species phylogeny results in a biochemically more realistic and kinetically more stable ancestral protein. Additional resurrection experiments with different proteins are necessary to statistically quantify the impact of using species tree-aware gene trees on ancestral protein phenotypes. Nonetheless, our results suggest the need for incorporating both sequence and DTL information in future studies of protein resurrections to accurately define the genotype-phenotype space in which proteins diversify.

Posts: 1

Participants: 1

Read full topic

October 28, 2014

13:16

Erick Matsen wrote:

Fellow babblers,

@cwhidden and I would like to sample from the subtree-prune-regraft (SPR) random walk on rooted phylogenetic trees. Does anyone know an easy way to do this? Chris could roll his own, but I'll bet that there is an easy solution out there.

If we wanted to sample from the random walk on unrooted trees, we could sample from the MrBayes prior. BEAST is rooted, which is nice, but has non-uniform priors on topologies. @mlandis would this be easy with revBayes?
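In case rolling your own turns out to be the way to go, here is a minimal sketch of a single random rooted SPR move. It assumes trees are stored as nested 2-tuples with string leaf labels, and that the tree has at least two leaves. All names are made up for illustration, not taken from MrBayes, BEAST, or RevBayes. Note that it picks uniformly over (prune, regraft) pairs, which is not the same as picking uniformly over distinct SPR neighbours: several pairs can give the same tree, and one regraft choice returns the original tree.

```python
import random

def node_paths(tree, path=()):
    """Yield the path (tuple of 0/1 child indices) to every node, root included."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree):
            yield from node_paths(child, path + (i,))

def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_node(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_node(children[path[0]], path[1:], new)
    return tuple(children)

def random_spr(tree, rng=random):
    """Apply one random subtree-prune-regraft move to a rooted binary tree."""
    # Prune: pick any non-root node and detach the subtree below it.
    prune_path = rng.choice([p for p in node_paths(tree) if p])
    pruned = get_node(tree, prune_path)
    sibling = get_node(tree, prune_path[:-1] + (1 - prune_path[-1],))
    # The pruned subtree's parent is suppressed: its sibling takes the parent's place.
    rest = replace_node(tree, prune_path[:-1], sibling)
    # Regraft: pick the edge above any remaining node (the empty path is the root edge).
    graft_path = rng.choice(list(node_paths(rest)))
    target = get_node(rest, graft_path)
    return replace_node(rest, graft_path, (target, pruned))

def spr_walk(tree, steps, seed=None):
    """Yield the successive trees of a walk made of random SPR moves."""
    rng = random.Random(seed)
    for _ in range(steps):
        tree = random_spr(tree, rng)
        yield tree
```

For example, `list(spr_walk(((("A", "B"), "C"), ("D", "E")), steps=5, seed=1))` gives five successive trees. To walk on distinct neighbours with equal probability, one would need to enumerate and deduplicate the neighbours of the current tree first, and that requires a canonical form for the tuples.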

Thanks!

Posts: 10

Participants: 4

Read full topic

October 15, 2014

17:39

Erick Matsen wrote:

New from @alexei_drummond and his postdoc:

The space of ultrametric phylogenetic trees by Alex Gavruskin, Alexei J. Drummond

We introduce two metric spaces on ultrametric phylogenetic trees and compare them with existing models of tree space. We formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered.

http://arxiv.org/abs/1410.3544

I haven't read it in detail, but it seems that the version of the space that is most natural for time-trees (their t-space) has properties that make it mathematically difficult to analyze. The combinatorial machinery that helped out with the BHV space doesn't help here.

Theorem 8. The problem of computing geodesics in t-space is NP-hard.

We will reduce the problem of computing NNI-distance to the problem of computing geodesics in t-space, but before going on to the proof of this result, we would like to develop some intuition of why t-space is so different from both BHV and τ-space. The key property for this difference is that the cone-path is rarely a geodesic in t-space. Indeed, in both BHV and τ-space the position of two cubes can result in a cone-path being the geodesic between any pair of trees from these cubes. Particularly, the measure of the set of pairs of trees between which the cone-path is a geodesic is positive. For example, if two trees T and R have topologies with no compatible splits, then the geodesic between T and R is a cone-path [3]. A property such as this does not present in t-space. It will follow from the observations below that the measure of the set of pairs of trees between which the geodesic is a cone-path in t-space has measure 0.

I know @cwhidden has been reading it so perhaps he'll post some observations.

Posts: 1

Participants: 1

Read full topic

October 10, 2014

15:17

Erick Matsen wrote:

From Gascuel & co--

www.ncbi.nlm.nih.gov Searching for virus phylotypes. F Chevenet, M Jung, M Peeters, T de Oliveira and O Gascuel, Bioinformatics (Oxford, England), Mar 2013

Large phylogenies are being built today to study virus evolution, trace the origin of epidemics, establish the mode of transmission and survey the appearance of drug resistance. However, no tool is available to quickly inspect these phylogenies and combine them with extrinsic traits (e.g. geographic location, risk group, presence of a given resistance mutation), seeking to extract strain groups of specific interest or requiring surveillance.

http://lamarck.lirmm.fr/phylotype/

News to me!

Posts: 1

Participants: 1

Read full topic

October 6, 2014

22:41

Rob Lanfear wrote:

Hi All,

I'm wondering what software folks use for automatically aligning sequences and then manually editing those alignments on Macs?

I know there's lots of software out there, but I'm wondering if there's something I've missed. In principle I like the offerings in Geneious (it includes various plug-ins for automated alignment, and a very serviceable manual editor), but the price tag is a bit steep if that's all you want it for...

Cheers,

Rob

Posts: 7

Participants: 6

Read full topic

12:27

argriffing wrote:

I know some ways to compute this, but I wonder who has the best current implementation? This would just be a tool for methods development testing rather than for doing anything practical, for example it wouldn't estimate anything and it wouldn't need to know anything about biology.

Posts: 3

Participants: 3

Read full topic

September 30, 2014

14:28

Erick Matsen wrote:

New from @tanja_stadler and co:

www.ncbi.nlm.nih.gov On age and species richness of higher taxa. T Stadler, DL Rabosky, RE Ricklefs and F Bokma, The American naturalist, Oct 2014

Abstract: Many studies have tried to identify factors that explain differences in numbers of species between clades against the background assumption that older clades contain more species because they have had more time for diversity to accumulate. The finding in several recent studies that species richness of clades is decoupled from stem age has been interpreted as evidence for ecological limits to species richness. Here we demonstrate that the absence of a positive age-diversity relationship, or even a negative relationship, may also occur when taxa are defined based on time or some correlate of time such as genetic distance or perhaps morphological distinctness. Thus, inferring underlying processes from distributions of species across higher taxa requires caution concerning the way in which higher taxa are defined. When this definition is unclear, crown age is superior to stem age as a measure of clade age.

They were thinking about what models might not have a monotonically positive age-diversity relationship for clades:

Several studies have investigated relations between species richness and ages of higher taxa. Three methodological articles (Magallón and Sanderson 2001; Bokma 2003; Paradis 2003) prominently featuring the idea that E[n] = e^{(λ − μ)t} have together been cited by more than 500 articles. Furthermore, Rabosky et al. (2012) investigated the behavior of a simple model where higher taxa originate under a Poisson process (see also Aldous et al. 2008; Maruvka et al. 2013). They found that such a model was expected to result in positive relationships between stem clade age and species richness, even when rates of species diversification varied among clades, provided that rates within clades were constant through time. As we have shown here, the expectation of a positive relationship between stem age and species richness may be incorrect, as it depends on the particular model of diversification and definition of higher taxa.

Many studies have identified young taxa as “unexpectedly” species rich, but our results show that such patterns can result from the manner in which higher taxa are delimited. For example, under scenarios i-b and ii-b, clades with young stem ages are expected to contain not fewer but more species than clades with old stem ages (table 1). In other words, studies may have incorrectly identified young taxa as unexpectedly species rich because they neglected how taxa were defined, and consequently incorrectly expected young taxa to be species poor.
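To make the background expectation in the quoted passage concrete: under a constant-rate birth-death process started from a single lineage, expected richness is the exponential of (λ − μ) times the clade age t, so it can only increase with age whenever λ > μ. A tiny sketch follows. The rate values are arbitrary illustrations, not taken from the paper.

```python
import math

def expected_richness(speciation_rate, extinction_rate, age):
    """E[n] = exp((lambda - mu) * t) for a constant-rate birth-death process
    started from one lineage (not conditioned on survival)."""
    return math.exp((speciation_rate - extinction_rate) * age)

# Arbitrary illustrative rates: lambda = 0.2 and mu = 0.1 per unit time.
for age in (5.0, 10.0, 20.0):
    print(f"age {age:>4}: expected richness {expected_richness(0.2, 0.1, age):.2f}")
```

This is the monotone age-richness relationship that the paper argues can fail once higher taxa are delimited by time or a correlate of time.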

Here is the model they consider:

[Pasted image of the model, not reproduced here]

Posts: 7

Participants: 3

Read full topic

September 29, 2014

September 26, 2014

14:57

Erick Matsen wrote:

Hopefully we will get some nice simple shells out of this mess.

Posts: 3

Participants: 1

Read full topic