phylobabble.org

Latest topics

URL

XML feed
http://www.phylobabble.org/latest

June 25, 2015

09:01

@ematsen wrote:

Interoperability, reproducibility, sharing, and reuse of phylogenetic data and software.

Posts: 1

Participants: 1

Read full topic

June 18, 2015

07:38

@arlin wrote:

Hi everyone. One of the consequences of NESCent closing is that its mailing lists will be going away in the coming months. I was part of 2 working groups that maintained email lists. The 2 lists include pretty much everyone who has attended a hackathon at NESCent (wg-phyloinformatics, which began in 2006, and hip, which began in 2011).

We considered just migrating to a new non-NESCent email list, but an alternative would be to encourage current list members to sign up on phylobabble. If we could start a topic on phyloinformatics, then presumably users could set up a feed and it would have the same immediacy as an email list.

What do you think? I welcome your thoughts on that idea.

Posts: 6

Participants: 3

Read full topic

June 11, 2015

14:45

@BrianFoley wrote:

Are there any distance calculators, such as DNAdist in PHYLIP, that can be set to treat ambiguity codes as a partial match? For example, I want an R to be counted as half a match to A or G. I believe that PHYLIP DNAdist counts R as a full match to either A or G.

For diploid organisms an "R" is usually indicating that one allele had A and the other G. But for populations such as a swarm of HIV-1 in a single patient, the R usually means that part of the population had A and the other part G.
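To make the "half a match" idea concrete, here is a minimal sketch of an uncorrected p-distance that scores ambiguity codes fractionally. It is not how DNAdist or any other existing program works; the function names `match_fraction` and `partial_p_distance` are made up for illustration, and the scoring rule (each symbol treated as an equal mixture of the bases it can stand for) is an assumption that matches the population interpretation above.

```python
# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_fraction(x, y):
    """Probability that one base drawn from x equals one base drawn from y.

    Assumption: each symbol is an equal mixture of the bases it encodes,
    so R vs A scores 0.5 and R vs C scores 0.0.
    """
    a, b = IUPAC[x.upper()], IUPAC[y.upper()]
    shared = len(set(a) & set(b))
    return shared / (len(a) * len(b))


def partial_p_distance(seq1, seq2):
    """Uncorrected distance: 1 minus the mean match fraction over compared sites.

    Sites where either sequence has a gap ('-') or missing data ('?') are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    scores = [
        match_fraction(x, y)
        for x, y in zip(seq1, seq2)
        if x not in "-?" and y not in "-?"
    ]
    if not scores:
        raise ValueError("no comparable sites")
    return 1.0 - sum(scores) / len(scores)


if __name__ == "__main__":
    print(partial_p_distance("ACGR", "ACGA"))
```

One consequence of the mixture rule is that R compared with R also scores 0.5, not 1.0, which may or may not be what you want for diploid genotype calls. No substitution-model correction is applied here.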

Posts: 4

Participants: 4

Read full topic

June 10, 2015

09:16

@erikvolz wrote:

We are seeking a software developer to assist with incorporating new models into BEAST:
https://goo.gl/Z1m9XC
This is a fixed 2-year position based at Imperial College. The focus is on software development, but it could potentially be a good fit for someone with a scientific background and extensive programming experience. Please circulate to anyone you feel would be interested & qualified.

Posts: 1

Participants: 1

Read full topic

June 9, 2015

06:54

@max wrote:

Dear phylobabblers,

I got invited to give a talk at the Brazilian Mathematics Colloquium (end of July) on the interface of Maths and Stats with phylogenetics/phylodynamics.

I'm now gathering ideas from fellow mathematicians and mathematical biologists on what exactly would catch the attention of an audience of mathematicians. I have spoken to statisticians before, but never to hardcore pure mathematicians, which is why I'm asking.

My initial ideas involve talking about the Kingman coalescent, and/or some of Susan Holmes's work on the geometry of tree space and/or the connections with macroscopic ODE-based models such as the SIR.

I'm looking especially at you, @cwhidden, @ematsen and @mathmomike.

Best,

Luiz

Posts: 5

Participants: 4

Read full topic

05:22

@Gadget wrote:

Hi Folks, I am generating phylogenies for a large number of GPCR gene families from a variety of organisms. A lot of these gene families contain 2,000-3,000 genes, and I can run ProtTest on them. As they are transmembrane, the JTT+G model has the best fit for some of the alignments. However, there are some with more than 4,000 sequences, which ProtTest cannot handle. Does anyone know of an alternative multithreaded program to determine the best-fit amino acid model of sequence evolution for large datasets? The 4,000+ gene families are close orthologues of the smaller families, but this is surely not enough to justify assuming the larger families must be JTT too. Can anyone share some advice on how to proceed? Thank you!

Posts: 2

Participants: 2

Read full topic

June 4, 2015

16:45

@jeetsukumaran wrote:

The fourth major version series of the DendroPy Phylogenetic Computing Library has been released!

http://dendropy.org

Get it now with:

$ sudo pip install -U dendropy
  • DendroPy 4 runs under Python 2.7 and Python 3.x
  • Re-architected and re-engineered from the ground up, yet preserving as much as possible (though certainly not all) of the public API of DendroPy 3.x.
  • MAJOR, MAJOR, MAJOR performance improvements in data file reading and processing! Newick and Nexus tree file parsing crazily optimized, with performance scaling at O(N) rather than O(N^2) (i.e., in practical terms, the bigger the tree, the bigger the improvement you will see when comparing DendroPy 4 vs. DendroPy 3). A thousand-tip tree can be parsed in 0.1 seconds with DendroPy 4 vs. 0.2 seconds with DendroPy 3, while a one million-tip tree can be parsed in under two minutes with DendroPy 4, vs. over 4 days with DendroPy 3. These performance improvements will percolate down to all applications based on DendroPy, including, for example, SumTrees.
  • Tests, tests, tests, tests, and more tests! The core library has a stupendous amount of new tests added, and with each one the ability to zero in and identify, isolate, and deal with bugs is improved.
  • Related to the above: dozens of nasty bugs have been dealt with. No, not killed, because we are not that kind of organization. Rather, they have been taken to the big testing farm in the quarantine zone where they can lead healthy lives munching on mock constructs and helping us test the library to ensure that it works as advertised so that your code works as advertised.
  • Documentation, documentation, documentation! The goal is to have every public method, function, or class fully-documented.
  • Many, many, many, many new features: e.g., a high-performance TreeArray class, calculation of MCCT topologies, new simulation models, new tree statistics, new tree manipulation routines.
  • SumTrees works faster than ever before thanks to the above improvements, and also allows for many new operations such as rerooting the target tree, using an MCCT tree as the target topology, extensive extra information summarized, auto-detection of number of parallel processors etc.: http://dendropy.org/programs/sumtrees.html .
  • The newly rewritten DendroPy primer is just full of information to get you started: http://dendropy.org/primer/index.html .
  • The "work-in-progress" migration primer will help ease the transition from 3 to 4: http://dendropy.org/migration.html .
  • Comprehensive documentation of all the data formats supported, plus all the keyword arguments you can use to control and customize reading and writing in all these different formats: http://dendropy.org/schemas/index.html .
  • A glossary of terms, to clarify the simultaneously redundant and oversubscribed/conflicting terminological soup that characterizes a lot of phylogenetics: http://dendropy.org/glossary.html .
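As a rough check on the parse times quoted above, the scaling exponent can be estimated from the two tree sizes by taking the slope on a log-log scale. This is only a back-of-the-envelope sketch: `scaling_exponent` is a made-up helper, and because the million-tip figures are bounds ("under two minutes", "over 4 days"), the DendroPy 4 value is an upper bound and the DendroPy 3 value a lower bound.

```python
import math


def scaling_exponent(n_small, t_small, n_large, t_large):
    """Estimate k in t ~ N**k from two (size, time) measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


TWO_MINUTES = 2 * 60           # seconds; quoted as "under two minutes"
FOUR_DAYS = 4 * 24 * 60 * 60   # seconds; quoted as "over 4 days"

# Timings taken from the announcement: 1,000 tips vs. 1,000,000 tips.
dendropy4 = scaling_exponent(1_000, 0.1, 1_000_000, TWO_MINUTES)
dendropy3 = scaling_exponent(1_000, 0.2, 1_000_000, FOUR_DAYS)

if __name__ == "__main__":
    print(f"DendroPy 4 exponent: at most about {dendropy4:.2f}")
    print(f"DendroPy 3 exponent: at least about {dendropy3:.2f}")
```

The two estimates come out close to 1 and close to 2 respectively, i.e. roughly linear versus roughly quadratic growth in parse time with the number of tips.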

Posts: 1

Participants: 1

Read full topic

June 1, 2015

13:52

@db60 wrote:

THIRD INTERNATIONAL ENVIRONMENTAL 'OMICS SYNTHESIS CONFERENCE - IEOS2015

University of St Andrews 6-8 July 2015

http://environmentalomics.org/ieos2015

DEADLINE for registration and abstract submission (talk/poster): 6 June 2015

DEADLINE for application for postgraduate student bursaries: 6 June 2015

KEYNOTE SPEAKERS:

o Professor ELIZABETH THOMPSON, University of Washington

o Professor MARK BLAXTER, University of Edinburgh

o Professor BARBARA METHE, J Craig Venter Institute

INVITED SPEAKERS:

Dr LOGAN KISTLER, University of Warwick

Dr UMER ZEESHAN IJAZ, University of Glasgow

Professor JIANQUAN LIU, Lanzhou University / Sichuan University

Dr NATHAN BAILEY, University of St Andrews

STUDENT BURSARIES:

Postgraduate students who submit abstracts are eligible for a bursary, covering 100% of the registration fee. When submitting an abstract, students should indicate they wish to be considered for a bursary. Successful applicants will receive a code for FREE registration.

IEOS2015:

The aim of this conference is to bring together researchers and organisations from a range of disciplines with shared interests in the development of new approaches for data handling, generation and analysis in environmental omics. Science areas of interest include bioinformatics, DNA-barcoding, genomics, metagenomics, metabarcoding, transcriptomics, proteomics, metabolomics, epigenetics, evolutionary and ecological omics, phylogenetics, study of ancient DNA and anthropology, new tools, resources and training, and beyond, as applied to the study of the natural environment and environmentally relevant organisms and systems. It is our hope that the resulting interaction and exchange of ideas will lead to novel approaches, new collaborations and the consolidation of a wider integrated environmental 'omics community.

EOS and this conference are supported by the Natural Environment Research Council (NERC) through its Mathematics and Informatics for Environmental 'Omics Data Synthesis programme and the UK Science and Technology Facilities Council (STFC) Global Challenges programme.

SUMMER OF V'S

IEOS2015 attendees are also welcome at a separate meeting on Data Science, The Summer of V's, immediately preceding the main registration event for IEOS. Separate registration is required for the Summer of V's: http://www.idir.st-andrews.ac.uk/vs

IEOS2015: http://environmentalomics.org/ieos2015

With best wishes, The IEOS Conference Organising Committee http://environmentalomics.org/ieos2015-committee

The University of St Andrews is a charity registered in Scotland : No SC013532

Posts: 2

Participants: 1

Read full topic

May 28, 2015

10:16

@karen_cranston wrote:

The FuturePhy folks have funds for (among other things) a series of hackathons. Given that focused events seem more productive than general "hacking on phylogenetics" events, what topics in phylogenetics could use a hackathon to move things forward?

Posts: 6

Participants: 5

Read full topic

May 27, 2015

06:25

@josephwb wrote:

Fellow phylo-dorks,

I was wondering if anyone had hard numbers (or a paper reference) for the energy consumption required by large phylogenetic analyses (or comparable computational problems from other fields). Perhaps @rdmpage, @ematsen, @alexei_drummond, @Alexis_RAxML, @mtholder, @phylorich, or @beerli might be able to help me out?

Thanks! JWB.

Posts: 8

Participants: 5

Read full topic

May 13, 2015

19:56

@rainbowgoblin wrote:

I'm running into a hitch putting the finishing touches on a paper that's been accepted by Methods in Ecology and Evolution. The journal requires that data, etc. be available in some repository that they deem acceptable, and since mine is a software paper I stupidly assumed that GitHub would meet their criteria. It doesn't, so I made a tar archive of the current version I have on GitHub, put it up on Dryad (their preferred repo), and figured THAT would be ok.

Still no good, and now it starts to get complicated. Dryad can't deal with GPL. They require that software be covered by a Creative Commons (CC0) license. Since my software is already on both GitHub and its own website, and has been distributed under GPL up to this point, I don't know whether it makes sense to change the licence I use now, or whether I can just distribute the version on Dryad under CC0, but keep using GPL elsewhere.

Dryad also has an option to link to materials on GitHub, which sounds like a better option, but when I contacted the journal to ask about it, they said probably not, that this was all highly unusual, and that they'd never encountered this problem before.

Has anyone else run into anything like this before? Is ecology software usually not GPL?

Posts: 5

Participants: 4

Read full topic

April 16, 2015

17:43

@rob_lanfear wrote:

Hi All,

I am putting together a collection of alignments with metadata (https://github.com/roblanf/PartitionedAlignments), and I'm looking for advice on file formats. The point of the collection is to make it simpler to test software and compare methods, by providing a well-annotated, tested set of published alignments that are all CC0.

The problem is formats. Each dataset has an alignment, various definitions of sites (i.e. which locus and genome each site comes from), taxon sets (e.g. outgroups), and other metadata (e.g. DOIs for the original study and data set, estimate of the age of the root of the tree, etc). Alignment formats are notoriously varied, so I'd like to stick with one of the standard formats (Nexus, phylip, FASTA), plus at most one more file for metadata (e.g. YAML, CSV).

I'd be happy to hear anyone's thoughts on the various pros and cons of any options.

Cheers,

Rob

Posts: 3

Participants: 3

Read full topic

April 8, 2015

04:46

@db60 wrote:

Would someone be able to point me to a generalisation of the Fitch algorithm to calculate parsimonious length for a topology, but which works for non-binary trees?

Actually our goal is to calculate consistency index correctly, in the face of (possibly) ambiguous DNA base symbols (R, Y, S, V, M, etc).

Getting consistency index involves knowing the minimum conceivable length of a tree, calculated for each character individually.

This part of the problem seems equivalent to calculating tree length, for each character separately, for a 'bush' topology in which all terminals are connected directly to the root.

I'm just not quite sure how to do that. But I'm sure it is known, in general.
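For what it's worth, here is a minimal sketch of the two pieces, under stated assumptions. `parsimony_length` is a bottom-up count in the style of Hartigan's generalisation of Fitch to multifurcating nodes (polytomies treated as hard, unit-cost unordered states, ambiguity codes as state sets). `minimum_steps` computes the per-character minimum over all conceivable trees as the size of the smallest set of bases that "hits" every terminal's state set, minus one. The function names and the nested-tuple tree representation are made up for illustration, and gaps are not handled.

```python
from itertools import combinations

# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parsimony_length(tree, states):
    """Length of one character on a possibly non-binary tree.

    tree:   nested tuples/lists of tip names, e.g. (("a", "b"), "c", "d")
    states: dict mapping tip name -> IUPAC symbol
    """
    def visit(node):
        if isinstance(node, str):  # a terminal: its state set, zero steps
            return set(IUPAC[states[node].upper()]), 0
        results = [visit(child) for child in node]
        counts = {}
        for state_set, _ in results:
            for base in state_set:
                counts[base] = counts.get(base, 0) + 1
        best = max(counts.values())
        # Every child lacking the chosen base costs one extra step.
        steps = sum(n for _, n in results) + len(results) - best
        return {b for b, c in counts.items() if c == best}, steps

    return visit(tree)[1]


def minimum_steps(symbols):
    """Minimum conceivable steps for one character on any tree."""
    tip_sets = [set(IUPAC[s.upper()]) for s in symbols]
    for k in range(1, 5):
        for chosen in combinations("ACGT", k):
            if all(tip & set(chosen) for tip in tip_sets):
                return k - 1
    raise ValueError("unreachable: ACGT hits every non-empty state set")


if __name__ == "__main__":
    column = {"a": "A", "b": "A", "c": "G", "d": "G", "e": "R"}
    print(parsimony_length(("a", "b", "c", "d", "e"), column))      # star tree
    print(parsimony_length((("a", "b"), ("c", "d"), "e"), column))  # resolved
    print(minimum_steps(column.values()))
```

In this example the star topology, scored as a hard polytomy, comes out longer than the resolved tree, so the bush length is not always the minimum; that is why the sketch computes the minimum separately. For a binary node the count reduces to the usual Fitch intersection/union rule.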

Thanks a lot,

Daniel

Posts: 2

Participants: 2

Read full topic

00:15

@juliofdiaz wrote:

Hello, I am trying to use BEAST to infer the ancestry of intra-specific isolates (we have ~200). I currently have a list of SNPs called from Illumina sequencing data. Since I'm basing my BEAST analysis on these SNPs, is there a way to normalize the number of mutational events observed in each isolate to the size of the genome over which a high-quality call is possible? Would this even be necessary? I assume one way of doing this would be to do some bootstrapping, but that would take too much compute time.

Posts: 1

Participants: 1

Read full topic
