news aggregator

July 22, 2015


Browsing JSTOR's Global Plants database I was struck by the number of comments people have made on individual plant specimens. For example, for the Holotype of Scorodoxylum hartwegianum Nees (K000534285) there is a comment from Håkan Wittzell that the "Collection number should read 1269 according to Plantae Hartwegianae". In JSTOR the collection number is 1209.

Now, many (if not all) of these specimens will also be in GBIF. Indeed, K000534285 is in GBIF, also with collection number 1209. A GBIF user will have no idea that there is some doubt about one item of metadata about this specimen.

So, an obvious thing to do would be to make the link between the JSTOR and GBIF records. Implementing this would need some fussing because (sigh), unlike DOIs for articles, we don't have agreed-upon identifiers for specimens. So we'd need to do some mapping between the specimen barcode K000534285, the JSTOR URL, and the GBIF record.
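
As a rough sketch of that mapping, the Python below treats the herbarium barcode as the join key. It assumes (this is not confirmed by the post) that JSTOR Global Plants specimen URLs end in a lower-cased barcode such as `...al.ap.specimen.k000534285`, and it builds a GBIF occurrence search URL using the `catalogNumber` filter of the GBIF occurrence API. The function names are hypothetical and no network requests are made.

```python
import re
from urllib.parse import urlencode

# Assumption: JSTOR Global Plants specimen URLs end in "specimen.<barcode>",
# with the barcode lower-cased (e.g. "...al.ap.specimen.k000534285").
JSTOR_BARCODE = re.compile(r"specimen\.([a-z]+\d+)$", re.IGNORECASE)


def barcode_from_jstor_url(url):
    """Return the upper-cased herbarium barcode in a JSTOR specimen URL, or None."""
    match = JSTOR_BARCODE.search(url.strip())
    return match.group(1).upper() if match else None


def gbif_search_url(barcode):
    """GBIF occurrence search URL that filters on the catalogNumber field."""
    return "https://api.gbif.org/v1/occurrence/search?" + urlencode({"catalogNumber": barcode})


def build_crosswalk(jstor_urls, gbif_records):
    """Map barcode -> {"jstor": url, "gbif": [occurrence keys]}.

    gbif_records: dicts with "key" and "catalogNumber", shaped like the
    "results" list of a GBIF occurrence search response.
    """
    crosswalk = {}
    for url in jstor_urls:
        barcode = barcode_from_jstor_url(url)
        if barcode:
            crosswalk.setdefault(barcode, {"jstor": None, "gbif": []})["jstor"] = url
    for record in gbif_records:
        barcode = str(record.get("catalogNumber", "")).strip().upper()
        if barcode in crosswalk:
            crosswalk[barcode]["gbif"].append(record["key"])
    return crosswalk
```

Exact string matching on barcodes is the simplest possible join; barcodes recorded with extra prefixes, spaces, or different casing conventions in either source would still need the manual fussing described above.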

In addition to providing users with more information, it might also be useful in kickstarting annotation on the GBIF site. At the moment GBIF has no mechanism for annotating data, and if it did, then it would have to start from scratch. Imagine that a person visiting occurrence 912442645 sees that it has already attracted attention elsewhere (e.g., JSTOR). They might be encouraged to take part in that conversation (because at least one person cared enough to comment already). Likewise, we could feed annotations on the GBIF site to JSTOR.

A variation on this idea is to think of annotations such as those in the JSTOR database as being analogous to the tweets, blog posts, and bookmarking that altmetric tracks for academic papers. Imagine if we applied the same logic to GBIF and had a way to show users that a specimen has been commented on in JSTOR Plants. Thinking further down the track, we could imagine adding other sorts of "attention", such as citations by papers, vouchers for DNA sequences, etc.

It would be a fun project to see whether the Disqus API enabled us to create a tool that could match JSTOR Global Plants comments to GBIF occurrences.

Source: iPhylo
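
A minimal sketch of the altmetric-style tally proposed in this post might look like the Python below. It assumes the comments have already been exported (for example through the Disqus API, whose calls are not shown here) into dicts with hypothetical `source` and `page_url` fields, that specimen page URLs end in `specimen.<barcode>`, and that a barcode-to-GBIF-occurrence mapping already exists.

```python
import re
from collections import Counter, defaultdict

# Assumption: specimen page URLs end in "specimen.<barcode>".
BARCODE = re.compile(r"specimen\.([a-z]+\d+)$", re.IGNORECASE)


def attention_by_occurrence(comments, barcode_to_gbif):
    """Count comments ("attention") per GBIF occurrence key, grouped by source.

    comments: dicts with hypothetical "source" and "page_url" fields.
    barcode_to_gbif: dict mapping upper-cased barcode -> GBIF occurrence key.
    """
    tally = defaultdict(Counter)
    for comment in comments:
        match = BARCODE.search(comment["page_url"])
        if not match:
            continue  # comment is not attached to a specimen page
        key = barcode_to_gbif.get(match.group(1).upper())
        if key is not None:
            tally[key][comment["source"]] += 1
    return {key: dict(counts) for key, counts in tally.items()}
```

Keeping the counts split by source means other kinds of attention (citations, sequence vouchers) could later be added as new source labels without changing the structure.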

Steve Baskauf has concluded a thoughtful series of blog posts on RDF and biodiversity informatics with a post discussing the "Rod Page Challenge". This was a series of grumpy posts I wrote (starting with this one) where I claimed RDF basically sucked, and to illustrate this I issued a challenge for people to do something interesting with some RDF I provided. Since this RDF didn't have a stable home, I've put it on GitHub, and it has a DOI courtesy of GitHub's integration with Zenodo.

I argued that the RDF typically available was basically useless because it wasn't adequately linked (see Reflections on the TDWG RDF "Challenge"). Two of the RDF files I provided were created specifically to tackle this problem (derived from my projects iPhylo Linkout and the precursor to BioNames). This marked pretty much the end of any interest I had in pursuing RDF.

Towards the end of Steve's post he writes:

At the close of my previous blog post, in addition to revisiting the Rod Page Challenge, I also promised to talk about what it would take to turn me from an RDF Agnostic into an RDF Believer. I will recap the main points about what I think it will take in order for the Rod Page Challenge to REALLY be met (i.e. for machines to make interesting inferences and provide humans with information about biodiversity that would not be obvious otherwise):

  1. Resource descriptions in RDF need to be rich in triples containing object properties that link to other IRI-identified resources.
  2. "Discovery" of IRI-identified resources is more likely to lead to interesting information when the linked IRIs are from Internet domains controlled by different providers.
  3. Materialized entailed triples do not necessarily lead to "learning" useful things. Materialized entailed triples are useful if they allow the construction of more clever or meaningful queries, or if they state relationships that would not be obvious to humans.
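
To make point 3 concrete, here is a small Python sketch of "materializing" entailed triples by forward-chaining two RDFS rules (transitivity of `rdfs:subClassOf`, and propagation of `rdf:type` up the class hierarchy). The class names are hypothetical, and real triple stores use far more efficient reasoners; this only shows what the materialized triples are.

```python
def materialize(triples):
    """Forward-chain two RDFS entailment rules until nothing new appears.

    rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       -> (x type B)
    """
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"}
        new = set()
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, "rdfs:subClassOf", c))
        for x, p, a in inferred:
            if p == "rdf:type":
                for a2, b in sub:
                    if a == a2:
                        new.add((x, "rdf:type", b))
        if new <= inferred:  # fixed point reached
            return inferred
        inferred |= new


# Hypothetical class hierarchy for a specimen.
data = {
    ("ex:K000534285", "rdf:type", "ex:HolotypeSpecimen"),
    ("ex:HolotypeSpecimen", "rdfs:subClassOf", "ex:TypeSpecimen"),
    ("ex:TypeSpecimen", "rdfs:subClassOf", "ex:Specimen"),
}
closed = materialize(data)
```

The three extra triples produced here only become useful in Steve's sense if someone writes a query that needs them, for example asking for every `ex:Specimen` and getting the holotype back without spelling out the hierarchy.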

Steve's point 1 is essentially the point I was making with the challenge. At the time of the challenge, RDF from major biodiversity informatics projects was in silos, with few (if any) links to external resources (the kinds of things Steve refers to in his point 2). As a result, the promised benefits from RDF simply haven't materialised. The lesson I took from this is that we need rich, dense cross-links between data sources (the "biodiversity knowledge graph"), and that's one reason I've been obsessed with populating BioNames, which links animal names to the primary literature (I'm planning to extend this to plants as well). Turns out, creating lots of cross-links is really hard work, much harder than simply pumping out a bunch of RDF and waiting for it to automagically coalesce into an all-connected knowledge graph.
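
One crude way to check whether a set of triples is "linked" in the sense of points 1 and 2 is to count how many objects are literals, how many are IRIs in the subject's own domain, and how many point to another provider's domain. The Python sketch below does this with hypothetical IRIs; it is an illustrative metric, not one proposed in either post.

```python
from urllib.parse import urlparse


def is_iri(term):
    """Treat http(s) strings as IRIs and everything else as a literal."""
    return isinstance(term, str) and term.startswith(("http://", "https://"))


def link_profile(triples):
    """Count literal objects, same-domain IRI objects, and cross-domain IRI objects."""
    profile = {"literal": 0, "same_domain": 0, "cross_domain": 0}
    for subject, _, obj in triples:
        if not is_iri(obj):
            profile["literal"] += 1
        elif urlparse(obj).netloc == urlparse(subject).netloc:
            profile["same_domain"] += 1
        else:
            profile["cross_domain"] += 1
    return profile


# Hypothetical triples about a name record.
triples = [
    ("https://example.org/name/1", "ex:label", "Scorodoxylum hartwegianum"),
    ("https://example.org/name/1", "ex:basionym", "https://example.org/name/2"),
    ("https://example.org/name/1", "ex:publishedIn", "https://doi.org/10.1234/example"),
]
print(link_profile(triples))
```

A silo in the sense described above would show mostly literals and same-domain links, with the `cross_domain` count near zero.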

I posed the challenge back in 2011, and since then I think the landscape has changed to the extent that I wonder if trying to "fix" RDF is really the way forward.

XML is dead

Anyone (sane) developing for the web and wanting to move data around is using JSON; XML is hideous and best avoided. Much of the early work on RDF used XML, which only made things even harder than they already were. JSON beats XML, to the extent that RDF itself now has a JSON serialisation, JSON-LD. But JSON-LD is about more than the semantic web (see JSON-LD and Why I Hate the Semantic Web), and has the great advantage that you can actually ignore all the RDF cruft (i.e., the namespaces) and simply treat the data as key-value pairs (yay!). Once you do that, then you can have fun with the data, especially with databases such as CouchDB ("fun" and "database" in the same sentence, I know!).
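
To illustrate "ignore the RDF cruft and treat the data as key-value pairs", the Python below parses a small, hypothetical JSON-LD document with the standard `json` module and drops the `@`-prefixed JSON-LD keywords. No RDF processing happens at all; the `@context` is simply discarded.

```python
import json

# A small, hypothetical JSON-LD document; @context maps short keys to IRIs.
JSON_LD = """
{
  "@context": {
    "name": "http://schema.org/name",
    "identifier": "http://schema.org/identifier"
  },
  "@id": "https://example.org/specimen/K000534285",
  "name": "Scorodoxylum hartwegianum Nees",
  "identifier": "K000534285"
}
"""


def as_key_values(text):
    """Parse JSON-LD as plain JSON and keep only the non-'@' keys."""
    data = json.loads(text)
    return {key: value for key, value in data.items() if not key.startswith("@")}


record = as_key_values(JSON_LD)
print(record["name"])  # Scorodoxylum hartwegianum Nees
```

The same document, unchanged, can still be fed to a JSON-LD processor by anyone who does want the triples, which is the point of the format.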

Key-value pairs, document stores, and graph databases

The NoSQL "movement" has thrown up all sorts of new ways to handle data and to think about databases. We can think of RDF as describing a graph, but it carries the burden of all the namespaces, vocabularies, and ontologies that come with it. Compare that with the fun (there's that word again) of graph databases such as Neo4J with its graph gists. The Neo4J folks have done a great job of publicising their approach, and of making it easy and attractive to play with.

So, we're in an interesting time when there are a bunch of technologies available, and I think maybe it's time to ask whether the community's allegiance to RDF and the Semantic Web has been somewhat misplaced...

Source: iPhylo

July 21, 2015


Next week there will be a gathering in Singapore for a Phylogenetic Network Workshop. This is being hosted by the Institute for Mathematical Sciences, at the National University of Singapore.

The workshop has been organized under the guidance of Louxin Zhang. The program and abstracts are available online. It runs for the whole week, 27 – 31 July 2015.

The workshop is actually the final part of a much larger, 2-month programme, called Networks in Biological Sciences (1 June – 31 July 2015). This programme is focused on mathematics for network models in biology, including complex networks and systems biology. Network modeling is extremely challenging, and so it offers outstanding opportunities for mathematicians and statisticians. The phylogenetics workshop will focus on the mathematics needed to develop fast and robust computer programs for inferring evolutionary network models from biological sequence data.

The participants are principally from the computational sciences, of course, including many who have attended the previous network workshops in Leiden, in the Netherlands, in October 2012 and July 2014. There are, however, a few biologists to round out the field, including myself.

Singapore is hot and humid for most of the year, and July is no exception. So, I am expecting the unacclimatized participants to spend most of their time indoors, avoiding the daily thunderstorms.

I am hoping to add some blog posts based on what happens at the workshop, as it proceeds.

Background: Among the understudied fungi found in nature are those living in close association with social and solitary bees. The bee-specialist genera Bettsia, Ascosphaera and Eremascus are remarkable not only for their specialized niche but also for their simple fruiting bodies or ascocarps, which are morphologically anomalous in Pezizomycotina. Bettsia and Ascosphaera are characterized by a unicellular cyst-like cleistothecium known as a spore cyst, while Eremascus is characterized by completely naked asci, that is, asci not formed within a protective ascocarp. Before molecular phylogenetics, the placement of these genera within Pezizomycotina remained tentative; morphological characters were misleading because these fungi do not produce multicellular ascocarps, a defining character of Pezizomycotina. Because of their unique fruiting bodies, the close relationship of these bee-specialist fungi and their monophyly appeared certain. However, recent molecular studies have shown that Bettsia is not closely related to Ascosphaera. In this study, I isolated the very rare fungus Eremascus fertilis (Ascomycota, Pezizomycotina) from the bee bread of honey bees. These isolates represent the second report of E. fertilis both in nature and in the honey bee hive. To establish the systematic position of E. fertilis and Bettsia alvei, I performed phylogenetic analyses of nuclear ribosomal LSU + SSU DNA sequences from these species and 63 additional ascomycetes.

Results: The phylogenetic analyses revealed that Eremascus is not monophyletic. Eremascus albus is closely related to Ascosphaera in Eurotiomycetes, while E. fertilis belongs in Myxotrichaceae, a putative member of Leotiomycetes. Bettsia is not closely related to Ascosphaera and, like E. fertilis, apparently belongs in Leotiomycetes. These results indicate that both the naked ascus and the spore cyst evolved twice in the Pezizomycotina, in distantly related lineages. The new genus Skoua is described to accommodate E. fertilis.
Conclusions: The naked ascus and spore cyst are both shown to have evolved convergently within the bee habitat. The convergent evolution of these unusual ascocarps is hypothesized to be adaptive for bee-mediated dispersal. Elucidating the dispersal strategies of these fungal symbionts contributes to our understanding of their interaction with bees and provides insight into the factors which potentially drive the evolution of reduced ascocarps in Pezizomycotina.

July 19, 2015

Analysis of sequence data using time-reversible substitution models and maximum likelihood (ML) algorithms is currently the most popular method to infer phylogenies, despite the fact that results often contradict each other. Searching for sources of error we focus on a hitherto neglected feature of these methods: character polarity is usually thought to be irrelevant in ML analyses. Mechanisms that lead to wrong tree topologies were analysed at the level of split-supporting site patterns. In simulations, plesiomorphic site patterns can be identified by comparison with known root sequences. These patterns cause some surprising effects: Using data sets generated with simulations of sequence evolution along a variety of topologies and inferring trees using the same (correct) model, we show for cases of branch-length heterogeneity that (i) as already known, ML analyses can fail to recover the correct tree even when the correct substitution model is used, but also that (ii) plesiomorphic character states cause substantial mistakes and therefore character polarity is relevant, and (iii) accumulating chance similarities on long branches are far less misleading than plesiomorphic states accumulating on shorter branches. The artefacts occur when branch lengths are heterogeneous. The systematic errors disappear for the most part when the sites with symplesiomorphies supporting false clades are deleted from the data set. We conclude that many of the phylogenies published during the past decades may be false due to the neglected effects of symplesiomorphies.
Source: Cladistics
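
The idea of identifying plesiomorphic split-supporting site patterns in simulated data can be sketched in a few lines of Python. This is an illustration of the concept described in the abstract, not the authors' software: given an alignment, a candidate clade, and the known root sequence (available only in simulations), it finds columns that cleanly separate the clade from the other taxa and asks whether the clade is united by the ancestral (root) state or by a derived one. The function names and toy alignment are hypothetical.

```python
def classify_split_sites(alignment, group, root):
    """Classify alignment columns supporting the split `group | rest`.

    alignment: dict taxon -> aligned sequence (all the same length as root)
    group:     set of taxa forming the candidate clade
    root:      known root sequence
    """
    rest = [t for t in alignment if t not in group]
    result = {"plesiomorphic": [], "apomorphic": [], "other": []}
    for i, root_state in enumerate(root):
        inside = {alignment[t][i] for t in group}
        outside = {alignment[t][i] for t in rest}
        if len(inside) != 1 or len(outside) != 1 or inside == outside:
            continue  # column does not support this split
        if inside == {root_state}:
            result["plesiomorphic"].append(i)  # clade united by the ancestral state
        elif outside == {root_state}:
            result["apomorphic"].append(i)  # clade united by a derived state
        else:
            result["other"].append(i)  # neither side matches the root
    return result


def drop_sites(alignment, sites):
    """Return a copy of the alignment with the given column indices removed."""
    drop = set(sites)
    return {t: "".join(c for i, c in enumerate(seq) if i not in drop)
            for t, seq in alignment.items()}


alignment = {"A": "AAGG", "B": "AAGG", "C": "CTGT", "D": "CTAT"}
sites = classify_split_sites(alignment, {"A", "B"}, root="AAAT")
pruned = drop_sites(alignment, sites["plesiomorphic"])
```

Deleting the plesiomorphic columns before re-running tree inference mirrors the check reported in the abstract, where removing sites with symplesiomorphies that support false clades largely removed the systematic errors.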

The following diagrams are taken from the book A History of Architecture on the Comparative Method for the Student, Craftsman, and Amateur. This book is considered to be "a canonical text that has played a formative role in the education of generations of architects" because it really does "cram everything into a single volume". The first edition of the book appeared in 1896, with the 20th edition appearing in 1996.

The first picture is from the 5th edition (1905), and the second one is from the 16th edition (1954).

As noted in the first figure, these trees purport to show the "evolution" of the various architectural styles. However, they do no such thing.

At the base of the tree trunk is a set of individual architectural styles that apparently led nowhere, while at the crown of the tree several styles are repeated. Each of the latter styles exists on two side-branches from the main trunk, each pair connected by vertical tendrils. So, this is a network, at least. However, the meaning of this network is not immediately obvious. Indeed, even a short perusal of the diagram should lead you to the idea that the meaning is contained more in cultural bias than in the actual history of architecture.

The history of the book itself is somewhat complex. The first edition was written by the father and son team of Banister Fletcher & Banister F. Fletcher. Subsequent editions were revised by Banister F. Fletcher (the son), with the 6th edition (1921) being rewritten by Fletcher and his first wife (who got no credit, even though the father's name was then dropped). After Fletcher's death in 1953, the 17th edition (1961) was revised by R.A. Cordingley, the 18th (1975) by James Palme, the 19th (1984) by John Musgrove, and the 20th (1996) by Dan Cruickshank. The tone and arrangement of the book were changed with each edition.

The tree has been analyzed in detail by Gülsüm Baydar Nalbantoglu (1998. Toward postcolonial openings: rereading Sir Banister Fletcher's "History of Architecture". Assemblage 35: 6-17). She notes the following:
Until the fourth edition of 1901, A History of Architecture had been a relatively modest survey of European styles. The fourth edition, however, appeared with an important difference: this time the book was divided into two sections, "The Historical Styles", which covered all the material from earlier editions, and "The Non-Historical Styles", which included Indian, Chinese, Japanese, Central American, and Saracenic architecture. The "Tree of Architecture" has a very solid upright trunk that is inscribed with the names of European styles and that branches out to hold various cultural / geographical locations. The nonhistorical styles, which unlike others remain undated, are supported by the "Western" trunk of the tree with no room to grow beyond the seventh-century mark. European architecture is the visible support for nonhistorical styles. Nonhistorical styles, grouped together, are decorative additions, they supplement the proper history of architecture that is based on the logic of construction. In the posthumously published seventeenth edition of 1961, the two parts were renamed "Ancient Architecture and the Western Succession" and "Architecture in the East", respectively. The nineteenth edition of 1987, on the other hand, consisted of seven parts based on chronology and geographical location. Cultures outside of Europe included "The Architecture of the Pre-Colonial Cultures outside Europe" and "The Architecture of the Colonial and Post-Colonial Periods outside Europe".
That is, "architecture" for the Banisters was defined as being about a building's construction, not its decoration. European cultures focused on construction, and they developed their styles through time. Other cultures focused on decoration, and were therefore not a proper part of architecture, and had no historical development. This is what the tree attempts to show.

This cultural bigotry was corrected in the final few editions of the book (after the Fletchers were no longer involved), where all architectural styles were considered more-or-less equal.

July 18, 2015


@mathmomike wrote:

Phylomania - November - Hobart, Tasmania, Australia and then the 20th annual NZ phylo meeting - February, Tongariro National Park, NZ

July 17, 2015


Eukaryotes were born of a chimeric union between two prokaryotes—the progenitors of the mitochondrial and nuclear genomes. Early in eukaryote evolution, most mitochondrial genes were lost or transferred to the nucleus, but a core set of genes that code exclusively for products associated with the electron transport system remained in the mitochondrion. The products of these mitochondrial genes work in intimate association with the products of nuclear genes to enable oxidative phosphorylation and core energy production. The need for coadaptation, the challenge of cotransmission, and the possibility of genomic conflict between mitochondrial and nuclear genes have profound consequences for the ecology and evolution of eukaryotic life. An emerging interdisciplinary field that I call "mitonuclear ecology" is reassessing core concepts in evolutionary ecology including sexual reproduction, two sexes, sexual selection, adaptation, and speciation in light of the interactions of mitochondrial and nuclear genomes.


The cnidarian freshwater polyp Hydra sp. exhibits an unparalleled regeneration capacity in the animal kingdom. Using an integrative transcriptomic and stable isotope labeling by amino acids in cell culture proteomic/phosphoproteomic approach, we studied stem cell-based regeneration in Hydra polyps. As major contributors to head regeneration, we identified diverse signaling pathways adopted for the regeneration response as well as enriched novel genes. Our global analysis reveals two distinct molecular cascades: an early injury response and a subsequent, signaling driven patterning of the regenerating tissue. A key factor of the initial injury response is a general stabilization of proteins and a net upregulation of transcripts, which is followed by a subsequent activation cascade of signaling molecules including Wnts and transforming growth factor (TGF) beta-related factors. We observed moderate overlap between the factors contributing to proteomic and transcriptomic responses suggesting a decoupled regulation between the transcriptional and translational levels. Our data also indicate that interstitial stem cells and their derivatives (e.g., neurons) have no major role in Hydra head regeneration. Remarkably, we found an enrichment of evolutionarily more recent genes in the early regeneration response, whereas conserved genes are more enriched in the late phase. In addition, genes specific to the early injury response were enriched in transposon insertions. Genetic dynamicity and taxon-specific factors might therefore play a hitherto underestimated role in Hydra regeneration.


Rates of molecular evolution can vary over time. Diverse statistical techniques for divergence time estimation have been developed to accommodate this variation. These typically require that all sequence (or codon) positions at a locus change independently of one another. They also generally assume that the rates of different types of nucleotide substitutions vary across a phylogeny in the same way. This permits divergence time estimation procedures to employ an instantaneous rate matrix with relative rates that do not differ among branches. However, previous studies have suggested that some substitution types (e.g., CpG to TpG changes in mammals) are more clock-like than others. As has been previously noted, this is biologically plausible given the mutational mechanism of CpG to TpG changes. Through stochastic mapping of sequence histories from context-independent substitution models, our approach allows for context-dependent nucleotide substitutions to change their relative rates over time. We apply our approach to the analysis of a 0.15 Mb intergenic region from eight primates. In accord with previous findings, we find comparatively little rate variation over time for CpG to TpG substitutions but we find more for other substitution types. We conclude by discussing the limitations and prospects of our approach.
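
To show what separating "CpG to TpG" changes from other substitution types involves, here is a simple Python sketch that classifies differences between a known ancestral sequence and a descendant. In the study itself, sequence histories are obtained by stochastic mapping rather than assumed known; this sketch only illustrates the context rule. A G-to-A change preceded by C is counted as well, because that is the same CpG change seen on the opposite strand.

```python
def classify_substitutions(ancestor, descendant):
    """Count CpG-context transitions versus all other substitutions.

    A C->T at position i is in CpG context if the ancestor has G at i+1;
    a G->A at position i is in CpG context if the ancestor has C at i-1.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"CpG_transition": 0, "other": 0}
    n = len(ancestor)
    for i, (a, d) in enumerate(zip(ancestor, descendant)):
        if a == d:
            continue
        cpg = (a == "C" and d == "T" and i + 1 < n and ancestor[i + 1] == "G") or (
            a == "G" and d == "A" and i > 0 and ancestor[i - 1] == "C"
        )
        counts["CpG_transition" if cpg else "other"] += 1
    return counts


print(classify_substitutions("ACGTACGA", "ATGCACAA"))
```

Tracking these two counts separately along each branch is what lets their relative rates differ over time, rather than being tied together by a single shared rate matrix.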


At high-altitude, small mammals are faced with the energetic challenge of sustaining thermogenesis and aerobic exercise in spite of the reduced O2 availability. Under conditions of hypoxic cold stress, metabolic demands of shivering thermogenesis and locomotion may require enhancements in the oxidative capacity and O2 diffusion capacity of skeletal muscle to compensate for the diminished tissue O2 supply. We used common-garden experiments involving highland and lowland deer mice (Peromyscus maniculatus) to investigate the transcriptional underpinnings of genetically based population differences and plasticity in muscle phenotype. We tested highland and lowland mice that were sampled in their native environments as well as lab-raised F1 progeny of wild-caught mice. Experiments revealed that highland natives had consistently greater oxidative fiber density and capillarity in the gastrocnemius muscle. RNA sequencing analyses revealed population differences in transcript abundance for 68 genes that clustered into two discrete transcriptional modules, and a large suite of transcripts (589 genes) with plastic expression patterns that clustered into five modules. The expression of two transcriptional modules was correlated with the oxidative phenotype and capillarity of the muscle, and these phenotype-associated modules were enriched for genes involved in energy metabolism, muscle plasticity, vascular development, and cell stress response. Although most of the individual transcripts that were differentially expressed between populations were negatively correlated with muscle phenotype, several genes involved in energy metabolism (e.g., Ckmt1, Ehhadh, Acaa1a) and angiogenesis (Notch4) were more highly expressed in highlanders, and the regulators of mitochondrial biogenesis, PGC-1α (Ppargc1a) and mitochondrial transcription factor A (Tfam), were positively correlated with muscle oxidative phenotype. 
These results suggest that evolved population differences in the oxidative capacity and capillarity of skeletal muscle involved expression changes in a small suite of coregulated genes.


Sodalis glossinidius, a maternally inherited secondary symbiont of the tsetse fly, is a bacterium in the early/intermediate state of the transition toward symbiosis, representing an important model for investigating establishment and evolution of insect–bacteria symbiosis. The absence of phylogenetic congruence in tsetse-Sodalis coevolution and the existence of Sodalis genotypic diversity in field flies are suggestive of a horizontal transmission route. However, to date no natural mechanism for the horizontal transfer of this symbiont has been identified. Using novel methodologies for the stable fluorescent-labeling and introduction of modified Sodalis in tsetse flies, we unambiguously show that male-borne Sodalis is 1) horizontally transferred to females during mating and 2) subsequently vertically transmitted to the progeny, that is, paternal transmission. This mixed mode of transmission has major consequences regarding Sodalis’ genome evolution as it can lead to coinfections creating opportunities for lateral gene transfer which in turn could affect the interaction with the tsetse host.


Sex chromosomes are subject to unique evolutionary forces that cause suppression of recombination, leading to sequence degeneration and the formation of heteromorphic chromosome pairs (i.e., XY or ZW). Although progress has been made in characterizing the outcomes of these evolutionary processes on vertebrate sex chromosomes, it is still unclear how recombination suppression and sequence divergence typically occur and how gene dosage imbalances are resolved in the heterogametic sex. The threespine stickleback fish (Gasterosteus aculeatus) is a powerful model system to explore vertebrate sex chromosome evolution, as it possesses an XY sex chromosome pair at relatively early stages of differentiation. Using a combination of whole-genome and transcriptome sequencing, we characterized sequence evolution and gene expression across the sex chromosomes. We uncovered two distinct evolutionary strata that correspond with known structural rearrangements on the Y chromosome. In the oldest stratum, only a handful of genes remain, and these genes are under strong purifying selection. By comparing sex-linked gene expression with expression of autosomal orthologs in an outgroup, we show that dosage compensation has not evolved in threespine sticklebacks through upregulation of the X chromosome in males. Instead, in the oldest stratum, the genes that still possess a Y chromosome allele are enriched for genes predicted to be dosage sensitive in mammals and yeast. Our results suggest that dosage imbalances may have been avoided at haploinsufficient genes by retaining function of the Y chromosome allele through strong purifying selection.
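
The logic of comparing sex-linked expression with autosomal orthologs in an outgroup can be sketched as a per-gene log2 ratio. This is a simplified illustration under stated assumptions, not the study's pipeline: it assumes expression values are already normalized to be comparable across species, and it is meant for X-linked genes that have lost their Y-linked copy. For such genes, a median ratio near 0 would suggest ancestral expression has been restored (compensation by X upregulation), while a median near -1 would suggest halved dosage without compensation.

```python
import math
from statistics import median


def dosage_log2_ratios(male_x, outgroup_auto, pseudocount=1.0):
    """Per-gene log2(male X-linked expression / outgroup autosomal ortholog).

    Both inputs map gene -> normalized expression; genes absent from
    either map are skipped.
    """
    return {
        gene: math.log2((male_x[gene] + pseudocount) / (outgroup_auto[gene] + pseudocount))
        for gene in male_x
        if gene in outgroup_auto
    }


def median_log2_ratio(ratios):
    """Summarize the per-gene ratios by their median."""
    return median(list(ratios.values()))


ratios = dosage_log2_ratios({"g1": 50, "g2": 100}, {"g1": 100, "g2": 100, "g3": 5}, pseudocount=0)
```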


LBD (LATERAL ORGAN BOUNDARIES DOMAIN) genes are essential to the developmental programs of many fundamental plant organs and function in some of the basic metabolic pathways of plants. However, our historical perspective on the roles of LBD genes during plant evolution has, heretofore, been fragmentary. Here, we show that the LBD gene family underwent an initial radiation that established five gene lineages in the ancestral genome of most charophyte algae and land plants. By inference, the LBD gene family originated after the emergence of the green plants (Viridiplantae), but prior to the diversification of most extant streptophytes. After this initial radiation, we find limited instances of gene family diversification in land plants until successive rounds of expansion in the ancestors of seed plants and flowering plants. The most dynamic phases of LBD gene evolution, therefore, trace to the aquatic ancestors of embryophytes followed by relatively recent lineage-specific expansions on land.


Many phylogenomic studies based on transcriptomes have been limited to "single-copy" genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole-genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, phylogenetic locations for 13 putative genome duplication events were identified. Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our study demonstrates a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups.
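
Two of the quantities in this abstract are simple to compute once gene groups are known: the minimum-taxa filter ("represented by at least ten ingroup taxa") and the gene occupancy of the final supermatrix (92.1% here). The Python sketch below shows both on toy data; the function names are hypothetical and the data are not from the study.

```python
def keep_groups(groups, ingroup, min_taxa):
    """Keep gene groups represented by at least `min_taxa` ingroup taxa.

    groups: dict gene -> set of taxa with a sequence for that gene
    """
    return {gene: taxa for gene, taxa in groups.items() if len(taxa & ingroup) >= min_taxa}


def gene_occupancy(groups, taxa):
    """Fraction of filled (taxon, gene) cells in the concatenated supermatrix."""
    taxa = set(taxa)
    filled = sum(len(present & taxa) for present in groups.values())
    total = len(groups) * len(taxa)
    return filled / total if total else 0.0


groups = {"g1": {"A", "B", "C", "D"}, "g2": {"A", "B", "C"}}
occupancy = gene_occupancy(groups, ["A", "B", "C", "D"])
```

With four taxa and two genes there are eight cells, seven of which are filled, so the occupancy is 0.875.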


The genus Citrus includes some of the most important cultivated fruit trees worldwide. Despite the genus being extensively studied because of its commercial relevance, the origin of cultivated citrus species and the history of their domestication still remain open questions. Here, we present a phylogenetic analysis of the chloroplast genomes of 34 citrus genotypes, which constitutes the most comprehensive and detailed study to date on the evolution and variability of the genus Citrus. A statistical model was used to estimate divergence times between the major citrus groups. Additionally, a complete map of the variability across the genome of different citrus species was produced, including single nucleotide variants, heteroplasmic positions, indels (insertions and deletions), and large structural variants. The distribution of all these variants provided further independent support to the phylogeny obtained. An unexpected finding was the high level of heteroplasmy found in several of the analyzed genomes. The use of the complete chloroplast DNA not only paves the way for a better understanding of the phylogenetic relationships within the Citrus genus but also provides original insights into other elusive evolutionary processes, such as chloroplast inheritance, heteroplasmy, and gene selection.