Working group “Machine learning and optimization”

This new working group will be held on Tuesday mornings. It is organized by Romual Elie, Walid Hachem and Gabriel Stoltz. There is a mailing list if you want (gt-appr-opt followed by

The sessions of the working group will be held in the seminar room of CERMICS (2nd floor of Coriolis building, left wing, see here for instructions to reach the building).

May, 14th, 2019

  • 10h-11h Lenka Zdeborova (Institut de Physique Theorique, CEA Saclay), Statistical physics of computational problems [slides]
    What are the problems we can solve using a computer? is one of the very fundamental question in science. We will describe how do we use statistical physics to address this question. We will discuss what insights does physics bring to the field of algorithmic hardness and how is this insight used to develop better algorithms. We will describe examples of applications in artificial neural networks, compressed sensing or data clustering.
  • 11h15-12h15 Etienne Come (Ifsttar), Clustering et données massives de mobilités quelques problèmes, and Jean-Philippe Tarel (Ifsttar), Combined stereo reconstruction and defogging [slides for first talk, slides for the second one]
    Stereo reconstruction serves many outdoor applications, and thus sometimes faces difficulties with foggy weather. However, fog provides depth cues for far away objects. By taking advantages of both stereo and fog cues, stereo reconstruction can be improved in presence of fog. We thus propose an algorithm for combined stereo reconstruction and defogging, based on minimizing adequate energy.

April, 2nd, 2019

  • 10h-11h Bruno Scherrer (Université de Lorraine et Inria), On the use of non-stationary policies for stationary sequential decision problems [slides, notes]
    We consider the problem of optimal control in discrete time, formalized by Markov decision processes. We will start by describing this model and some standard results: computation of an optimal non-stationary policy by dynamic programming, existenceof optimal stationary policy for stationary problems and related algorithms. We will then focus on the situation where the control problem that we want to solve is so large that we can only consider approximate forms of dynamic programming, which rely, for example, on solving a sequence of regression problems. Although there is a stationary solution to the stationary control problem, it will be shown (via optimal performance guarantees) that it is, in general, preferable to construct non-stationary approximate solutions.
  • 11h10-12h10 Loic Landrieu (IGN), Semantic Segmentation of 3D Point Clouds  [presentation]
    Recent breakthroughs in 3D data analysis have lead to significant improvements in the quality of 3D data semantic analysis. In this talk, we present an overview of the state-of-the-art of the field, starting from the traditional pointwise classification/regularization approach to modern deep learning methods. In particular, we will discuss the use of neural networks structured by large-scale, nonconvex, graph-structured optimization problems. This approach allows for the use of powerful general graph convolutions and graph-structured metric learning schemes on point clouds.

February 19th, 2019

  • 10h-11h Emilie Kaufmann (CNRS & Université de Lille & Inria Lille), New tools for Adaptive Testing and Applications to Bandit Problems
    I will introduce a general framework for sequential, adaptive testing of multiple composite hypotheses, that are possibly overlapping. This framework is motivated by several identification problems in multi-armed bandit models, whose applications range from (adaptive) A/B/C Testing to (adaptive) game tree exploration, a.k.a. Monte-Carlo Tree Search.
    I will first introduce a generic stopping rule for those tests, based on Generalized Likelihood Ratio statistics, and prove its correctness in some cases using new self-normalized concentration inequalities. I will then discuss the sample complexity of this stopping rule when coupled with a good sampling rule, that is the minimal number of samples needed before stopping the test. In particular, we will propose an optimal strategy for (epsilon)-best arm identification in a bandit model.
  • 11h10-12h10 Michel de Lara (CERMICS), Design of lower bound convex programs for exact sparse optimization [presentation]
    (Joint work with Jean-Philippe Chancelier) In exact sparse optimization problems, one looks for solution that have few non-zero components. We consider problems where sparsity is exactly measured by the non convex $l_0$ pseudo norm (and not by susbtitute penalizing terms). First, we display a suitable conjugacy for which we show that the $l_0$ pseudo norm is “convex” in the sense of generalized convexity (equal to its biconjugate). As a corollary, we also show that the $l_0$ pseudo norm coincides, on the sphere, with a convex lsc function. Second, thus equipped, we display a lower bound for the original exact sparse optimization problem. Under mild additional assumptions, this bound is a convex minimization program over the unit ball of a so-called support norm. Third, we introduce generalized sparse optimization, where the solution is searched among a union of subspaces. We provide a systematic way to design norms and lower bound convex minimization programs over their unit ball. Thus, we yield an interpretation for most of the sparsity-inducing norms used in machine learning

    Jean-Philippe Chancelier, Michel De Lara. Lower Bound Convex Programs for Exact Sparse Optimization, HAL, 02/2019, 32 pages.

    Jean-Philippe Chancelier, Michel De Lara. A Suitable Conjugacy for the l0 Pseudonorm, HAL, 02/2019, 20 pages.

January 8th, 2019

  • 10h-11h Olivier Fercoq (Telecom Paris), Smooth Primal-Dual Coordinate Descent Algorithms for Nonsmooth Convex Optimization [pdf]
    We propose a new randomized coordinate descent method for a convex optimization template with broad applications. Our analysis relies on a novel combination of four ideas applied to the primal-dual gap function: smoothing, acceleration, homotopy, and coordinate descent with non-uniform sampling. As a result, our method features the first convergence rate guarantees among the coordinate descent methods, that are the best-known under a variety of common structure assumptions on the template. We provide numerical evidence to support the theoretical results with a comparison to state-of-the-art algorithms.
  • 11h15-12h15, Laurent Najman (ESIEE), A morphological point of view on graph-based classification and clustering [pdf]
    In this talk, we will introduce the maximum margin partition problem on graphs. This problem can be seen as the “dual” of the maximum margin boundary problem, of which SVM is a well-known solution. We will see that the maximum margin partition problem naturally leads to different variants of the watershed, the main tool developed in mathematical morphology for image segmentation. We will survey the main ideas underlying the watershed framework for data analysis and regularization: distances, minimum spanning tree, connected filtering and ensemble techniques. We will then present some applications (including for example GIS data) to clustering and semi-supervised learning. If time permits, we will clarify (with the help of the Gamma-convergence framework) the link between minimum-spanning tree optimization and other more classical optimization framework such as, for example, the random-walk approach based on graph-Laplacian.

November 6th, 2018

  • 10h-11h, Alexandre d’Aspremont (ENS Paris), Restarting Frank-Wolfe [pdf]
    Conditional Gradients (aka Frank-Wolfe algorithms) form a classical set of methods for constrained smooth convex minimization due to their simplicity, the absence of projection step, and competitive numerical performance. While the vanilla Frank-Wolfe algorithm only ensures a worst-case rate of O(1/\epsilon), various recent results have shown that for strongly convex functions, the method can be slightly modified to achieve linear convergence. However, this still leaves a huge gap between sublinear O(1/\epsilon) convergence and linear O(\log 1/\epsilon) convergence to reach an $\epsilon$-approximate solution.
    Here, we present a new variant of Conditional Gradients, that can dynamically adapt to the function’s geometric properties using restarts and thus smoothly interpolates between the sublinear and linear regimes.
  • 11h15-12h15, Guillaume Obozinski (Ecole des Ponts), Structured prediction
    I will make an overview of structured prediction a.k.a. structured output learning. A number of problems in statistics and machine learning go beyond regression or multi-class classification, and require to make a more structured decision in the presence of uncertainty: some example include historically image segmentation, pose estimation, and a number of task in NLP such as part of speech tagging, word alignment of sentences from a source language to a target language in machine translation; it is still relevant today for alignment of knowledge bases, for applications in computational biology such as protein function prediction, or side-chain prediction, etc. These problems are very similar in structure to problems considered in operations research, except that the cost defining the objective, which is often a function of input data, is not known and must be learned. This type of formulation could be relevant for operations research problems in the presence of uncertainty when the distribution of the random phenomena to take into account is not known. Several frameworks and formulation have been proposed for structured prediction. I will present the main ones: graphical models and structured SVMs, and discuss some of the algorithms used for learning and prediction. Time permitting I will also talk about the “learn to search” paradigm and deep learning approaches.

There are other events/working groups/seminars on related topics on campus:

  • Friday October 16th, 2018, Christian Soize, Approche probabiliste de machine learning pour les grandes simulations numériques (more details)
  • Thursday, November 22nd, 2018, Robert Gower, Stochastic quasi-gradient methods (more details)
  • Thursday, November 22nd, 2018, Maxime Sangnier, What can a statistician expect from GANs? (more details)
  • Thursday, November 31rd, 2018, Niolas Flammarion, Gen-Oja: A Simple and Efficient Algorithm for Streaming Generalized Eigenvector Computation (more details)
  • Thursday, January 10th, 2019, Nicolas Keriven, Scalable model-free online change-point detection with NEWMA (more details)
  • Thursday, January 17th, 2019, Pierre Bellec, Formule de Stein d’ordre 2 et quelques conséquences en optimisation et statistique avec bruit Gaussien (more details)
  • Thursday, January 31rd, 2019, Robert Gower, Optimal minibatch size for stochastic average gradient descent (more details)
  • Thursday, February 21th, 2019, Damien Garreau, Consistent change-point detection with kernels; Arthur Mensch, Differentiable Dynamic Programming for structured prediction and attention (more details)
  • Thursday, March 14th, 2019, Guillaume Perrin, TBA.