Fields of research
There has been a fruitful interplay between several fields of discrete mathematics and computer sciences.
The field of formal models for computation, including automata and formal languages, is one of the best established of theoretical computer science. The French school in this field has a visible international leading position. Far from being a closed subject, it has continued to develop thanks to new connexions or points of view and to new fields of applications:
- the link with formal logic and game theory and the application to software application;
- the connection with the field of symbolic dynamics which has itself important applications in the field of modular coding;
- the development of the field of weighted automata, with applications to speech processing.
Computational genomics and bioinformatics are two active fields of computational biology. The former studies the genomes of cells and organisms (high-throughput genome sequencing produces lots of data, which requires extensive post-processing) to help finding genes of interest for certain diseases or conditions, whereas the latter applies algorithms to the interpretation, classification and understanding of biological datasets. These typically consist of large numbers of DNA, RNA, or protein sequences/graphs.
Algebraic combinatorics is nowadays a well-developed subject with general theories (such as combinatorial Hopf algebras) connected to many topics of pure mathematics or mathematical physics (representation theory, quantum groups, operads, random matrices, resurgence, resummation and renormalization, etc). The investigation of these theories rely in a crucial way upon the development of powerful software (based upon general purpose mathematical systems) and now open source Sage system for experimentation and even computer assisted proofs. The challenge is to build an efficient and flexible system, yet easily usable by mathematicians, and to foster code sharing.
Scientific program of the Bézout Labex
Algorithms were originally designed for a flawless world, somewhat perturbed by random noise. But the real world (for instance for web data processing, natural language processing, biological interaction graphs processing, biological datasets classifications) is often far from that perfect ideal. In practice, data are poorly labeled or not labeled at all, they can be biased or drawn from slightly different problems than what one is trying to learn, and the distributions may drift with time.
Another challenge facing applications is the staggering size of the datasets, which exceeds several million and can reach billions of high-dimensional data points. Scaling exiting algorithms to deal with such magnitudes using massively distributed systems, or devising novel techniques taking advantage of these systems while not sacrificing learning guarantees, is an important question in many applications.
In this situation, the Bézout Labex’s proposal is to use syntactic methods to treat these sets of data to a large extent unstructured. These methods are related with algebraic models (automata, graphs, formal grammars), which have already proved to be efficient in a large number of domains. The backbone of the subjects is the rich domain of combinatorics on words, that provides the foundation of further studies, applications and algorithm design for questions like pattern matching, data mining and sequence analysis. The domain involves combinatorial structures (like permutations and trees), whose accurate analysis requires algebraic and enumeration methods.
Valorization of results, transfer, and expertise
The valorization of research in this theme will result in a number of softwares. These softwares fall into two categories, those that are directly dedicated to applications in bioinformatics, linguistics, operations research, or computer code analysis, and those that are tools for research, such as the platform Sage-Combinat.
The Bézout Labex is thus involved in several projects that have to do with the concept of “sustainable city”, in particular with city planning and urban transport. Using tools in the field of discrete mathematics and operations research, the Bézout Labex’s teams plan for instance to design efficient algorithms to operate self service bike/car hiring system like the Vélib’ system in Paris. Such algorithms can help to route efficiently the trucks that rebalance the bikes among the stations.
In the bioinformatics area, the objective is to develop the first software prototype devoted to new generation multi-leaf collimator settings (rotating beam head and orthogonal multi-leaf collimators) that are used for radiation treatment. The Bézout Labex also plans to develop a benchmark solution for functional pattern matching in biological networks. Protein interactions identified on a genome-wide scale are indeed commonly visualized as protein interaction graphs, where proteins are vertices and interactions are edges. In this context, searching for motifs in graphs has become a crucial problem in the analysis of these networks.
Lastly, the Sage project is clearly the future of mathematical software. Being free and open source, it is being adopted as a teaching tool in many universities, with some 10 000 users and 200 developers in the academic community. With 25 developers Sage-Combinat is, with number theory, its largest organized sub-community. The algebraic combinatorics framework has become an important part of its architecture, and it is expected that it will be widely used by various communities. The objective is to add an efficient implementation of multivariate polynomials, which are not already present in other systems. This development will be done in collaboration with several universities is USA and Canada.