Combinatorial Models of Synteny Conservation in Genomes

Cedric Chauve

Abstract

Genomic rearrangements (reversals, transpositions, deletions, insertions, duplications and translocations, among others) are large-scale evolutionary events that disrupt gene order along chromosomes. The computational analysis of gene orders, their structure and their evolution relies on combinatorial models and algorithms designed in terms of signed sequences. The last 15 years have seen a wealth of research in this field, which has provided widely used computational tools as well as new insights into the evolution and structure of several sets of genomes. The proposed program is centered on the detection of conserved gene clusters, the assignment of evolutionary relations in the presence of multigene families, and the computation of evolutionary hypotheses. Gene clusters, like operons in prokaryotic genomes, are fundamental functional genomic elements that are characterized, among other features, by conserved gene content and order, sometimes up to local rearrangements. Detecting such clusters is a difficult problem with applications in applied domains such as pathogenomics. This part of the project aims at designing methods to detect such clusters that will (1) detect highly rearranged clusters, (2) discriminate between clusters conserved due to evolutionary pressure and clusters conserved due to phylogenetic proximity, and (3) be computationally efficient in order to process large datasets. Most algorithms aimed at analyzing genome rearrangements have been designed for signed permutations, which correspond to genomes with trivial gene families. A recent approach to overcome this limitation assigns evolutionary relations between members of multigene families, using a gene matching strategy in a parsimony framework.
Our goal in this project is to develop a gene matching strategy that is based not on an evolutionary model but on the conservation of local synteny, and that also takes into account the sequence alignment results used to define gene families and a statistical model of the significance of synteny conservation. The third main part of the proposed project deals with the phylogenomic analysis of gene order datasets produced using the methods developed in the two previous sub-projects, including computing gene order phylogenies, ancestral gene orders and statistics on genome rearrangements. Finally, some attention will be given to the problem of generating "gene order" datasets for eukaryotic genomes, where genes alone do not cover enough of the genome to be reliable markers. We will investigate two classical approaches, whole-genome alignment and comparative mapping, and a new method based on virtual hybridization. Our approach to most of the above problems will rely on sound and well-understood combinatorial models for the analysis of signed permutations and sequences, such as, but not limited to, common intervals and max-gap clusters. An important focus will be on designing and implementing efficient algorithms based on these models.
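
To make the notion of common intervals concrete, the following is a minimal Python sketch for two signed permutations over the same gene set. A common interval is taken here to be a set of at least two genes that is contiguous in both gene orders, with signs ignored. The function name common_intervals and the example gene orders are hypothetical, chosen only for illustration, and the quadratic enumeration is a simple baseline rather than one of the efficient algorithms the project aims to design.

```python
def common_intervals(a, b):
    """Return the gene sets (size >= 2) that are contiguous in both gene orders.

    a and b are signed permutations of the same genes, given as lists of
    non-zero integers. Signs (gene orientations) are ignored.
    Assumption: no duplicated genes, i.e. trivial gene families.
    """
    # Position of each (unsigned) gene in the second gene order.
    pos_b = {abs(g): k for k, g in enumerate(b)}
    result = []
    for i in range(len(a)):
        lo = hi = pos_b[abs(a[i])]
        for j in range(i + 1, len(a)):
            p = pos_b[abs(a[j])]
            lo, hi = min(lo, p), max(hi, p)
            # a[i..j] holds j - i + 1 distinct genes; they are contiguous
            # in b exactly when their positions in b span j - i + 1 slots.
            if hi - lo == j - i:
                result.append(frozenset(abs(g) for g in a[i:j + 1]))
    return result


# Hypothetical example: gene 1 is reversed in the second genome.
clusters = common_intervals([1, 2, 3, 4, 5], [3, -1, 2, 5, 4])
for cluster in clusters:
    print(sorted(cluster))
```

Max-gap clusters relax this definition by allowing a bounded number of intervening genes between consecutive cluster members, which is one way to accommodate the highly rearranged clusters mentioned above.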