Working group “Machine learning and optimization”

This working group, initiated in 2018, resumes in 2019. It will be held on Tuesday mornings. It is organized by Romuald Elie, Walid Hachem, Geneviève Robin and Gabriel Stoltz. A mailing list is available (gt-appr-opt followed by @liste.enpc.fr; write to Gabriel Stoltz to subscribe). The working group is open in particular to all members of Université Paris-Est (and beyond) interested in these themes.

The sessions of the working group will be held in the seminar room of CERMICS (2nd floor of Coriolis building, left wing, see here for instructions to reach the building).

May 12th, 2020

  • 10h-11h Alessandro Rudi (INRIA Paris), Introduction on reproducing kernel Hilbert spaces, with application to machine learning
    In this talk, I will introduce the machinery of reproducing kernel Hilbert spaces (RKHS), a key tool when we need to represent a rich space of functions over an arbitrary domain while retaining good approximation properties. A key example where RKHSs are used is supervised learning; this example will be developed during the talk, starting from the definition of the supervised learning problem and its solution via empirical risk minimization over suitable RKHSs.
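As a minimal illustration of empirical risk minimization over an RKHS (a hedged sketch, not taken from the talk): by the representer theorem, the kernel ridge regression estimator is a linear combination of kernel evaluations at the training points, and its coefficients solve a linear system in the Gram matrix. The Gaussian kernel and the toy data below are illustrative choices.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), a classical RKHS-generating kernel
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def kernel_ridge_fit(X, y, lam=1e-2, sigma=1.0):
    # Regularized empirical risk minimization over the RKHS: by the
    # representer theorem the minimizer is f(x) = sum_i alpha_i k(x_i, x),
    # with alpha = (K + n*lam*I)^{-1} y.
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def kernel_ridge_predict(X_train, alpha, X_test, sigma=1.0):
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

# toy 1D regression problem
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = kernel_ridge_fit(X, y)
y_hat = kernel_ridge_predict(X, alpha, X)
```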

March 31st, 2020 CANCELLED

  • 10h-11h Alessandro Rudi (INRIA Paris), CANCELLED
  • 11h15-12h15 Tiffany Vlaar (University of Edinburgh), Thermodynamic sampling algorithms for neural network parametrization CANCELLED

Motivated by intuitions from molecular dynamics, we discuss a variety of methods for sampling of high-dimensional distributions, which rely on discretised stochastic differential equations. These methods have the potential to improve upon existing optimisation procedures for neural network parameterisation, such as stochastic gradient descent and stochastic gradient Langevin dynamics. In particular, I will discuss Adaptive Langevin (AdL) dynamics [1,2,3] and our AdLaLa algorithm [4], which uses a partitioning of the neural network parameters based on natural layer structure to obtain rapid convergence for certain classification problems.

[1] A. Jones and B. Leimkuhler. Adaptive stochastic methods for sampling driven molecular systems, Journal of Chemical Physics 135, 2011.

[2] N. Ding, Y. Fang, R. Babbush, C. Chen, R. D. Skeel, and H. Neven. Bayesian sampling using stochastic gradient thermostats, NIPS, 2014.

[3] B. Leimkuhler, M. Sachs, and G. Stoltz. Hypocoercivity properties of adaptive Langevin dynamics, preprint arXiv:1908.09363, 2019.

[4] B. Leimkuhler, C. Matthews, and T. Vlaar. Partitioned Integrators for Thermodynamic Parameterisation of Neural Networks, Foundations of Data Science, 1(4), 2019. Available as arXiv:1908.11843.
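As background for the Langevin-based methods above, here is a minimal sketch of vanilla stochastic gradient Langevin dynamics, i.e. a plain Euler-Maruyama discretisation of overdamped Langevin dynamics; this is not the AdL or AdLaLa algorithms of the talk, only the baseline they improve upon.

```python
import numpy as np

def sgld(grad_log_post, theta0, n_steps=1000, step=1e-3, rng=None):
    # Euler-Maruyama discretisation of overdamped Langevin dynamics:
    #   theta_{k+1} = theta_k + step * grad log pi(theta_k)
    #                 + sqrt(2 * step) * N(0, I)
    # whose invariant measure approximates the target pi.
    if rng is None:
        rng = np.random.default_rng(0)
    theta = np.array(theta0, dtype=float)
    samples = []
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta + step * grad_log_post(theta) + np.sqrt(2 * step) * noise
        samples.append(theta.copy())
    return np.array(samples)

# target: standard Gaussian, so grad log pi(theta) = -theta
samples = sgld(lambda th: -th, theta0=np.array([3.0]), n_steps=20000, step=1e-2)
```

In practice the exact gradient is replaced by a minibatch estimate, which is where the "stochastic gradient" part of the name comes from.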

 

February 4th, 2020

  • 10h-11h Steve Oudot (INRIA Saclay, École Polytechnique), The pre-image problem from a topological perspective   

This talk will be a review of the efforts of the Topological Data Analysis (TDA) community to tackle the preimage problem. After a general introduction on TDA, the main focus will be on recent attempts to invert the TDA operator. While this line of work is still in its infancy, the hope in the long run is to use such inverses for feature interpretation. The mathematical tools involved in the analysis come mainly from metric geometry, spectral theory, and the theory of constructible functions; specific pointers will be given in the course of the exposition.

  • 11h15-12h15 Marie-Liesse Cauwet (ESIEE), Noisy black-box optimization

In this talk, we will first look at a few examples that motivate the study of numerical black-box optimization in the presence of noise, which occurs when the evaluation returned by the oracle at a point is inaccurate. We will then specify the noise model considered and review different approaches to tackle this type of problem. In particular, we will be interested in the dichotomy between the class of so-called comparison-based algorithms, which do not rely on the function value itself, and that of value-based algorithms, such as algorithms that approximate the gradient by finite differences. Which of these classes is more robust against noise in a black-box problem? Concretely, for a given real-world optimization problem, which algorithm should one select?
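To make the comparison-based versus value-based dichotomy concrete, here is a hedged sketch contrasting a finite-difference gradient step (value-based) with a (1+1) evolution strategy step (comparison-based) on a noisy sphere function; the toy problem and parameter values are illustrative, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_oracle(x, noise=0.01):
    # black box returning f(x) plus additive Gaussian noise, f = sphere function
    return np.sum(x ** 2) + noise * rng.standard_normal()

def fd_gradient_step(x, h=0.1, lr=0.1):
    # value-based: approximate the gradient by central finite differences
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (noisy_oracle(x + e) - noisy_oracle(x - e)) / (2 * h)
    return x - lr * g

def one_plus_one_es_step(x, sigma=0.3):
    # comparison-based: keep a random perturbation only if its (noisy)
    # value compares favourably -- a (1+1) evolution strategy step
    cand = x + sigma * rng.standard_normal(x.shape)
    return cand if noisy_oracle(cand) < noisy_oracle(x) else x

x_fd = np.array([2.0, -1.5])
x_es = x_fd.copy()
for _ in range(200):
    x_fd = fd_gradient_step(x_fd)
    x_es = one_plus_one_es_step(x_es)
```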

 

December 17th, 2019 CANCELLED

  • 10h-11h Steve Oudot (INRIA Saclay, École Polytechnique), The pre-image problem from a topological perspective   

This talk will be a review of the efforts of the Topological Data Analysis (TDA) community to tackle the preimage problem. After a general introduction on TDA, the main focus will be on recent attempts to invert the TDA operator. While this line of work is still in its infancy, the hope in the long run is to use such inverses for feature interpretation. The mathematical tools involved in the analysis come mainly from metric geometry, spectral theory, and the theory of constructible functions; specific pointers will be given in the course of the exposition.

  • 11h15-12h15 David Picard (ENPC, LIGM), Image Similarity: from Matching Kernels to Deep Metric Learning

Being able to define a similarity between images is a key task in computer vision, as it is a necessary step to solve many popular problems such as image retrieval, automatic labeling, segmentation, etc. A historic approach inspired by stereo analysis consists in counting the number of nearly identical regions of interest between two images. However, this approach is not compatible with machine learning tools and has to be adapted using what are known as matching kernels. In this talk, we show examples of these matching kernels and how their linearization leads to powerful representation learning techniques. We draw a parallel between these approaches and recent deep metric learning developments, and we show that both are trying to solve a similar problem of distribution matching. We finally propose to solve this distribution matching problem in deep metric learning by introducing a high-order moment-based regularization criterion.

October 1st, 2019

  • 10h-11h Romain Couillet (CentraleSupélec and GIPSA-lab), Random Matrix Advances in Machine Learning [slides]
    Machine learning algorithms, starting from elementary yet popular ones, are difficult to analyze theoretically as (i) they are data-driven, and (ii) they rely on non-linear tools (kernels, activation functions). These theoretical limitations are exacerbated in large-dimensional datasets, where standard algorithms behave quite differently than predicted, if they do not fail completely.
    In this talk, we will show how random matrix theory (RMT) addresses these problems. We will show precisely that RMT provides a new understanding and various directions of improvement for kernel methods, semi-supervised learning, SVMs, community detection on graphs, spectral clustering, etc. Besides, we will show that RMT can explain observations made on real, complex datasets with methods as advanced as deep neural networks.
  • 11h15-12h15 Viet Chi Tran (LAMA, UPEM), Random walks on simplicial complexes [slides]
    (joint work with T. Bonis, L. Decreusefond and Z. Zhang)
    A natural and well-known way to discover the topology of random structures (such as a random graph G) is to have them explored by random walks. The usual random walk jumps from a vertex of G to a neighboring vertex, providing information on the connected components of the graph G. The number of these connected components is the Betti number \beta_0. To gather further information on the higher Betti numbers that describe the topology of the graph, we can consider the simplicial complex C associated to the graph G: a k-simplex (edge for k=1, triangle for k=2, tetrahedron for k=3, etc.) belongs to C if all the (k-1)-simplices that constitute it also belong to C. For example, a triangle belongs to C if its three edges are in the graph G. Several random walks have recently been proposed to explore these structures, mostly in Computer Science. We propose a new random walk, whose generator is related to a higher-order Laplacian of the graph and to the Betti number \beta_k. A rescaling of the walk is also proposed.
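As background for the second talk: the classical vertex random walk and the Betti number \beta_0 can both be read off the order-0 graph Laplacian, whose kernel dimension counts connected components. The sketch below illustrates this on a toy graph; the higher-order walk proposed in the talk is not reproduced here.

```python
import numpy as np

# toy graph: two disjoint triangles, so beta_0 = 2
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

deg = A.sum(axis=1)
L = np.diag(deg) - A          # order-0 graph Laplacian
P = A / deg[:, None]          # transition matrix of the usual vertex random walk

# beta_0 = dim ker L = number of connected components
eigvals = np.linalg.eigvalsh(L)
beta_0 = int(np.sum(np.abs(eigvals) < 1e-9))
```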

May 14th, 2019

  • 10h-11h Lenka Zdeborova (Institut de Physique Theorique, CEA Saclay), Statistical physics of computational problems [slides]
    What are the problems we can solve using a computer? This is one of the most fundamental questions in science. We will describe how statistical physics can be used to address this question. We will discuss what insights physics brings to the field of algorithmic hardness and how this insight is used to develop better algorithms. We will describe examples of applications in artificial neural networks, compressed sensing, and data clustering.
  • 11h15-12h15 Etienne Come (Ifsttar), Clustering and massive mobility data: some problems, and Jean-Philippe Tarel (Ifsttar), Combined stereo reconstruction and defogging [slides for first talk, slides for the second one]
    Stereo reconstruction serves many outdoor applications, and thus sometimes faces difficulties in foggy weather. However, fog provides depth cues for faraway objects. By taking advantage of both stereo and fog cues, stereo reconstruction can be improved in the presence of fog. We thus propose an algorithm for combined stereo reconstruction and defogging, based on minimizing an adequate energy.

April 2nd, 2019

  • 10h-11h Bruno Scherrer (Université de Lorraine et Inria), On the use of non-stationary policies for stationary sequential decision problems [slides, notes]
    We consider the problem of optimal control in discrete time, formalized by Markov decision processes. We will start by describing this model and some standard results: computation of an optimal non-stationary policy by dynamic programming, existence of an optimal stationary policy for stationary problems, and related algorithms. We will then focus on the situation where the control problem that we want to solve is so large that we can only consider approximate forms of dynamic programming, which rely, for example, on solving a sequence of regression problems. Although there is a stationary solution to the stationary control problem, it will be shown (via optimal performance guarantees) that it is, in general, preferable to construct non-stationary approximate solutions.
  • 11h10-12h10 Loic Landrieu (IGN), Semantic Segmentation of 3D Point Clouds  [presentation]
    Recent breakthroughs in 3D data analysis have led to significant improvements in the quality of 3D data semantic analysis. In this talk, we present an overview of the state of the art of the field, starting from the traditional pointwise classification/regularization approach and moving to modern deep learning methods. In particular, we will discuss the use of neural networks structured by large-scale, nonconvex, graph-structured optimization problems. This approach allows for the use of powerful general graph convolutions and graph-structured metric learning schemes on point clouds.
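The dynamic programming computation of an optimal stationary policy mentioned in the first talk above can be sketched with value iteration on a toy Markov decision process (an illustrative example, not taken from the talk):

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, tol=1e-8):
    # P[a] is the |S| x |S| transition matrix of action a, r[a] its reward
    # vector; iterate the Bellman optimality operator to its fixed point.
    n_actions, n_states = len(P), P[0].shape[0]
    V = np.zeros(n_states)
    while True:
        Q = np.array([r[a] + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal value and stationary policy
        V = V_new

# two-state, two-action toy MDP
P = [np.array([[0.9, 0.1], [0.1, 0.9]]),   # action 0: mostly stay
     np.array([[0.1, 0.9], [0.9, 0.1]])]   # action 1: mostly switch
r = [np.array([1.0, 0.0]), np.array([0.0, 0.5])]
V, policy = value_iteration(P, r)
```

The approximate dynamic programming setting of the talk arises when the state space is too large to represent V exactly and each iterate is fitted by regression instead.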

February 19th, 2019

  • 10h-11h Emilie Kaufmann (CNRS & Université de Lille & Inria Lille), New tools for Adaptive Testing and Applications to Bandit Problems
    I will introduce a general framework for sequential, adaptive testing of multiple composite hypotheses, that are possibly overlapping. This framework is motivated by several identification problems in multi-armed bandit models, whose applications range from (adaptive) A/B/C Testing to (adaptive) game tree exploration, a.k.a. Monte-Carlo Tree Search.
    I will first introduce a generic stopping rule for those tests, based on Generalized Likelihood Ratio statistics, and prove its correctness in some cases using new self-normalized concentration inequalities. I will then discuss the sample complexity of this stopping rule when coupled with a good sampling rule, that is, the minimal number of samples needed before stopping the test. In particular, we will propose an optimal strategy for epsilon-best arm identification in a bandit model.
  • 11h10-12h10 Michel de Lara (CERMICS), Design of lower bound convex programs for exact sparse optimization [presentation]
    (Joint work with Jean-Philippe Chancelier) In exact sparse optimization problems, one looks for solutions that have few non-zero components. We consider problems where sparsity is exactly measured by the nonconvex $l_0$ pseudo norm (and not by substitute penalizing terms). First, we display a suitable conjugacy for which we show that the $l_0$ pseudo norm is “convex” in the sense of generalized convexity (equal to its biconjugate). As a corollary, we also show that the $l_0$ pseudo norm coincides, on the sphere, with a convex lsc function. Second, thus equipped, we display a lower bound for the original exact sparse optimization problem. Under mild additional assumptions, this bound is a convex minimization program over the unit ball of a so-called support norm. Third, we introduce generalized sparse optimization, where the solution is searched among a union of subspaces. We provide a systematic way to design norms and lower bound convex minimization programs over their unit ball. Thus, we yield an interpretation for most of the sparsity-inducing norms used in machine learning.

     

    Jean-Philippe Chancelier, Michel De Lara. Lower Bound Convex Programs for Exact Sparse Optimization, HAL, 02/2019, 32 pages.

    Jean-Philippe Chancelier, Michel De Lara. A Suitable Conjugacy for the l0 Pseudonorm, HAL, 02/2019, 20 pages.

     

January 8th, 2019

  • 10h-11h Olivier Fercoq (Telecom Paris), Smooth Primal-Dual Coordinate Descent Algorithms for Nonsmooth Convex Optimization [pdf]
    We propose a new randomized coordinate descent method for a convex optimization template with broad applications. Our analysis relies on a novel combination of four ideas applied to the primal-dual gap function: smoothing, acceleration, homotopy, and coordinate descent with non-uniform sampling. As a result, our method features the first convergence rate guarantees among coordinate descent methods that are the best known under a variety of common structure assumptions on the template. We provide numerical evidence to support the theoretical results with a comparison to state-of-the-art algorithms.
  • 11h15-12h15, Laurent Najman (ESIEE), A morphological point of view on graph-based classification and clustering [pdf]
    In this talk, we will introduce the maximum margin partition problem on graphs. This problem can be seen as the “dual” of the maximum margin boundary problem, of which SVM is a well-known solution. We will see that the maximum margin partition problem naturally leads to different variants of the watershed, the main tool developed in mathematical morphology for image segmentation. We will survey the main ideas underlying the watershed framework for data analysis and regularization: distances, minimum spanning tree, connected filtering and ensemble techniques. We will then present some applications (including, for example, GIS data) to clustering and semi-supervised learning. If time permits, we will clarify (with the help of the Gamma-convergence framework) the link between minimum-spanning-tree optimization and other, more classical optimization frameworks such as, for example, the random-walk approach based on the graph Laplacian.
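As a minimal illustration of the coordinate descent idea from the first talk (plain randomized coordinate descent on a least-squares problem, not the primal-dual method of the paper): each step minimizes exactly in one randomly chosen coordinate, at the cost of a single column-residual inner product instead of a full gradient.

```python
import numpy as np

def coordinate_descent(A, b, n_iters=5000, rng=None):
    # randomized coordinate descent on f(x) = 0.5 * ||Ax - b||^2
    if rng is None:
        rng = np.random.default_rng(0)
    n = A.shape[1]
    x = np.zeros(n)
    residual = A @ x - b
    col_sqnorm = (A ** 2).sum(axis=0)
    for _ in range(n_iters):
        i = rng.integers(n)
        # exact minimization in coordinate i: grad_i f(x) / ||A[:, i]||^2
        step = (A[:, i] @ residual) / col_sqnorm[i]
        x[i] -= step
        residual -= step * A[:, i]   # keep the residual up to date cheaply
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
x_cd = coordinate_descent(A, b)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
```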

November 6th, 2018

  • 10h-11h, Alexandre d’Aspremont (ENS Paris), Restarting Frank-Wolfe [pdf]
    Conditional Gradients (a.k.a. Frank-Wolfe algorithms) form a classical set of methods for constrained smooth convex minimization due to their simplicity, the absence of a projection step, and competitive numerical performance. While the vanilla Frank-Wolfe algorithm only ensures a worst-case complexity of O(1/\epsilon) iterations, various recent results have shown that for strongly convex functions, the method can be slightly modified to achieve linear convergence. However, this still leaves a huge gap between sublinear O(1/\epsilon) convergence and linear O(\log 1/\epsilon) convergence to reach an $\epsilon$-approximate solution.
    Here, we present a new variant of Conditional Gradients, that can dynamically adapt to the function’s geometric properties using restarts and thus smoothly interpolates between the sublinear and linear regimes.
  • 11h15-12h15, Guillaume Obozinski (Ecole des Ponts), Structured prediction
    I will give an overview of structured prediction, a.k.a. structured output learning. A number of problems in statistics and machine learning go beyond regression or multi-class classification and require making a more structured decision in the presence of uncertainty: historical examples include image segmentation, pose estimation, and a number of tasks in NLP such as part-of-speech tagging and word alignment of sentences from a source language to a target language in machine translation; the topic is still relevant today for alignment of knowledge bases and for applications in computational biology such as protein function prediction or side-chain prediction. These problems are very similar in structure to problems considered in operations research, except that the cost defining the objective, which is often a function of input data, is not known and must be learned. This type of formulation could be relevant for operations research problems in the presence of uncertainty when the distribution of the random phenomena to take into account is not known. Several frameworks and formulations have been proposed for structured prediction. I will present the main ones: graphical models and structured SVMs, and discuss some of the algorithms used for learning and prediction. Time permitting, I will also talk about the “learn to search” paradigm and deep learning approaches.
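The vanilla Frank-Wolfe algorithm discussed in the first talk above can be sketched as follows (a minimal version over the probability simplex with the classic 2/(k+2) step size, without the restart scheme of the talk): the linear minimization oracle returns a vertex, so iterates remain feasible with no projection step.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=2000):
    # vanilla conditional gradient over the probability simplex
    x = np.array(x0, dtype=float)
    for k in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # LMO: best vertex of the simplex
        gamma = 2.0 / (k + 2.0)        # classic step size, O(1/k) rate
        x = (1 - gamma) * x + gamma * s
    return x

# minimise f(x) = 0.5 * ||x - c||^2 over the simplex; c lies in the simplex,
# so the minimizer is c itself
c = np.array([0.2, 0.5, 0.3])
x_star = frank_wolfe_simplex(lambda x: x - c, np.array([1.0, 0.0, 0.0]))
```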

There are other events/working groups/seminars on related topics on campus:

  • Friday October 16th, 2018, Christian Soize, A probabilistic machine learning approach for large-scale numerical simulations (more details)
  • Thursday, November 22nd, 2018, Robert Gower, Stochastic quasi-gradient methods (more details)
  • Thursday, November 22nd, 2018, Maxime Sangnier, What can a statistician expect from GANs? (more details)
  • Thursday, November 31rd, 2018, Nicolas Flammarion, Gen-Oja: A Simple and Efficient Algorithm for Streaming Generalized Eigenvector Computation (more details)
  • Thursday, January 10th, 2019, Nicolas Keriven, Scalable model-free online change-point detection with NEWMA (more details)
  • Thursday, January 17th, 2019, Pierre Bellec, Second-order Stein formula and some consequences in optimization and statistics with Gaussian noise (more details)
  • Thursday, January 31st, 2019, Robert Gower, Optimal minibatch size for stochastic average gradient descent (more details)
  • Thursday, February 21st, 2019, Damien Garreau, Consistent change-point detection with kernels; Arthur Mensch, Differentiable Dynamic Programming for structured prediction and attention (more details)
  • Thursday, March 14th, 2019, Guillaume Perrin, TBA.