Phyloseminar

phyloseminar -- a free online seminar about phylogenetics

URL

http://phyloseminar.org/

June 6, 2014

01:00

Tumour heterogeneity, i.e. the genomic diversity of cancer cells within a single tumour, is thought to be the source of chemotherapy resistance. In many cancers, this heterogeneity is not limited to point mutations but includes large-scale genomic rearrangements and endoreduplications that lead to aberrant copy number (CN) profiles. Reconstructing the evolutionary tree of the cancer within the patient allows us to quantify tumour heterogeneity and understand its aetiology. In some cancers, such as high-grade serous ovarian cancer (HGSOC), CN changes predominate. However, tree inference is hindered by the unknown phasing of major and minor CNs, by horizontal dependencies between adjacent genomic loci, and by the lack of curated CN profile databases to use as a reference for probabilistic inference.

We recently developed MEDICC (Minimum Event Distance for Intra-tumour Copy number Comparisons), an algorithm for phylogenetic reconstruction based on CN profiles. MEDICC uses finite-state transducers (FSTs) to encode a minimum evolution criterion that determines pairwise evolutionary distances between CN profiles. This minimum-event distance is the smallest number of amplifications and deletions of arbitrary length needed to transform one genomic profile into another. The FST-based approach thereby allows us to model dependencies between sites, similar to the problem of modelling indels on trees in traditional phylogenetics. Using this approach we are able to phase major and minor CN profiles to the parental alleles and infer trees and ancestral genomes, while minimizing the overall tree length. The distance measure is formulated such that the resulting matrix of pairwise distances has a direct mapping to a positive semi-definite kernel matrix. This allows us to perform principal component analysis in evolutionary space and to use this embedding, together with spatial statistics, to numerically quantify tumour heterogeneity and other quantities of interest, such as the degree of clonal expansion.
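To make the idea of a minimum-event distance concrete, here is a minimal sketch in Python. It is not MEDICC and does not use FSTs: it assumes a single unphased integer CN profile per sample and ignores any biological constraints the real method imposes (for example, how a fully lost segment is treated). Under those assumptions, the smallest number of single-copy gains or losses over contiguous runs of segments has a closed form. The function name `min_event_distance` is hypothetical.

```python
def min_event_distance(source, target):
    """Smallest number of +1/-1 events on contiguous runs of segments
    that turn `source` into `target` (simplified, unconstrained variant)."""
    if len(source) != len(target):
        raise ValueError("profiles must cover the same segments")
    # Per-segment change in copy number, padded with zeros at both ends.
    diff = [0] + [t - s for s, t in zip(source, target)] + [0]
    # Every event on a contiguous run creates exactly one upward and one
    # downward step in `diff`, so counting the total upward step height
    # gives the minimum number of events.
    return sum(max(0, diff[i] - diff[i - 1]) for i in range(1, len(diff)))


# One amplification spanning the two middle segments.
print(min_event_distance([1, 1, 1, 1], [1, 2, 2, 1]))
# One deletion across all three segments plus two gains of the middle one.
print(min_event_distance([2, 2, 2], [1, 3, 1]))
```

Because an event may span any number of adjacent segments, the distance depends on how changes are arranged along the genome, not only on how many segments differ. This is the horizontal dependency that a per-site model would miss.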

I will talk about the basics of FST-based phylogenetic inference and explain how FSTs can be used to model genomic rearrangement events with horizontal dependencies. I will explain how this approach implicitly maps genomes into a feature space in which we can quantify heterogeneity. Finally, I will present clinical results that show how this quantification of intra-tumour heterogeneity (ITH) can predict the development of resistance in the clinic.

March 17, 2014

19:00

Metastasis is the main cause of cancer morbidity and mortality. Despite its clinical significance, several fundamental questions about the metastatic process in humans remain unanswered. Does metastasis occur early or late in cancer progression? Do metastases emanate directly from the primary tumor or give rise to each other? How does heterogeneity in the primary tumor relate to the genetic composition of secondary lesions? Addressing these questions, ideally by examining the genetic makeup of tumor cells in distinct anatomic locations and reconstructing their evolutionary relationships, is crucial to improving our understanding of metastasis.

I will give an overview of a simple PCR-based assay that enables the tracing of tumor lineage in patient tissue specimens. The methodology relies on somatic variation in highly mutable polyguanine (poly-G) repeats located in non-coding genomic regions. Poly-G mutations are present in a variety of human cancers. In colon carcinoma, an association exists between patient age at diagnosis and tumor mutational burden, suggesting that poly-G variants accumulate during normal division in colonic stem cells. Poorly differentiated colon carcinomas (which have a worse prognosis) have fewer mutations than well-differentiated tumors, possibly indicating a shorter mitotic history of the founder cell in these cancers. By presenting several patient case studies, I will describe how poly-G fingerprints can be used to construct phylogenetic trees that reflect the evolution of metastatic colon cancer, with an emphasis on how biological considerations inform analysis strategies.
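As a rough illustration of how poly-G fingerprints could feed a distance-based tree method, the sketch below treats each sample as a list of repeat lengths at the same ordered set of loci and computes pairwise distances. The abstract does not specify the actual analysis strategy, so the distance measure (mean absolute difference in repeat length), the function names, and the sample values are all assumptions made up for illustration.

```python
from itertools import combinations


def fingerprint_distance(x, y):
    """Mean absolute difference in repeat length across loci (assumed measure)."""
    if len(x) != len(y):
        raise ValueError("fingerprints must cover the same loci")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def distance_matrix(samples):
    """Pairwise distances keyed by (name, name), with names in sorted order."""
    names = sorted(samples)
    return {
        (p, q): fingerprint_distance(samples[p], samples[q])
        for p, q in combinations(names, 2)
    }


def closest_pair(samples):
    """The two samples a clustering method would typically join first."""
    distances = distance_matrix(samples)
    return min(distances, key=distances.get)


# Made-up toy repeat lengths at four hypothetical poly-G loci.
toy_samples = {
    "normal": [14, 16, 12, 15],
    "primary": [15, 16, 13, 15],
    "met_liver": [15, 17, 13, 16],
    "met_lung": [16, 17, 13, 16],
}
print(closest_pair(toy_samples))
```

A matrix like this could then be passed to a standard distance-based method such as neighbour joining. Which distance and which tree-building method are appropriate is exactly the kind of choice that biological considerations should inform.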

19:00

Genome rearrangements were discovered and used to build molecular phylogenies in the 1930s. They are implicated in many cancers and their evolutionary role might be of primary importance. But the mathematical and computational tools to model rearrangements are still not as efficient as the ones developed later for local mutations such as nucleotide or amino-acid substitutions. In this seminar I will report on attempts to integrate genome organisation into the usual models of genome evolution. I will explain how this can improve the inference of phylogenies, as well as of ancestral genomes.

March 16, 2014

19:00

In this second talk of our series on genome-scale phylogeny, I build upon Gergely's introduction and present the modelling assumptions and algorithmic details behind some of the methods we and others have developed. There will be two parts to this talk. I start with the model of gene duplications and losses implemented in PHYLDOG. I present the assumptions we make and the shortcuts we take to improve the program's efficiency, and show some results on real and simulated sequence data. In particular, I show problems that arise when the program is confronted with data generated under a model of incomplete lineage sorting (Rasmussen and Kellis, 2012), and present avenues of research to find solutions to these problems. In the second part, I present our current efforts to use our model of gene duplication, loss, and transfer (Szöllősi et al., 2013) to infer a species tree in which speciation nodes are ordered in time. I briefly remind the forgetful viewer of what this model does and how it works, and I then explain how we devise a new MCMC algorithm to use it on data sets containing dozens of species and thousands of gene families. I finish with some perspectives on our plans to unite gene tree-species tree models with databases of gene families and phylogenetic trees.

March 15, 2014

19:00

Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost, or are horizontally transferred. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees.

I introduce models that describe the relationship between gene trees and species trees. I begin with models that account for gene duplication and loss, and subsequently introduce models that account for the horizontal transfer of genes. I review results from simulations as well as empirical studies on genomic data that show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. I also discuss the possibility of extracting information on the timing of speciation events from ancient horizontal transfer events.
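The simplest way to relate a gene tree to a species tree is the classical parsimony-style reconciliation by least common ancestor (LCA) mapping, sketched below. This is not the probabilistic model discussed in the talk; it only counts duplications and ignores losses, transfers, and sequence evolution. Trees are written as nested tuples, gene tree leaves are labelled directly with their species name, and all function names are assumptions chosen for this illustration.

```python
def clades(tree):
    """Return (leaf set, list of all clades) for a nested-tuple tree."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def lca(species_clades, species):
    """Smallest species-tree clade that contains every species in `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def _reconcile(gene_tree, species_clades):
    """Return (species below node, mapped clade, duplications below node)."""
    if isinstance(gene_tree, str):
        species = frozenset([gene_tree])
        return species, lca(species_clades, species), 0
    species, child_maps, dups = frozenset(), [], 0
    for child in gene_tree:
        child_species, child_map, child_dups = _reconcile(child, species_clades)
        species |= child_species
        child_maps.append(child_map)
        dups += child_dups
    mapped = lca(species_clades, species)
    # A node mapped to the same clade as one of its children cannot be a
    # speciation, so LCA reconciliation calls it a duplication.
    if mapped in child_maps:
        dups += 1
    return species, mapped, dups


def count_duplications(gene_tree, species_tree):
    """Number of duplications implied by LCA reconciliation."""
    _, species_clades = clades(species_tree)
    return _reconcile(gene_tree, species_clades)[2]


species_tree = (("A", "B"), "C")
print(count_duplications((("A", "B"), "C"), species_tree))         # congruent tree
print(count_duplications((("A", "B"), ("A", "C")), species_tree))  # conflicting tree
```

A gene tree that matches the species tree needs no duplications, while a conflicting one forces at least one. Probabilistic models of duplication, loss, and transfer replace this hard count with a likelihood, which is what allows them to be combined with models of sequence evolution.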