Gabriel Peyré (CNRS and Ecole Normale Supérieure)
Scaling Optimal Transport for High dimensional Learning
Optimal transport (OT) has recently gained lot of interest in machine learning. It is a natural tool to compare in a geometrically faithful way probability distributions. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is however plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will explain how to leverage entropic regularization methods to define computationally efficient loss functions, approximating OT with a better sample complexity. More information and references can be found on the website of our book “Computational Optimal Transport” https://optimaltransport.github.io/
Samory Kpotufe (Columbia University)
Some Recent Insights on Transfer and Multitask Learning
A common situation in Machine Learning is one where training data is not fully representative of a target population due to bias in the sampling mechanism or due to prohibitive target sampling costs. In such situations, we aim to ‘transfer’ relevant information from the training data (a.k.a. source data) to the target application. How much information is in the source data about the target application? Would some amount of target data improve transfer? These are all practical questions that depend crucially on ‘how far’ the source domain is from the target. However, how to properly measure ‘distance’ between source and target domains remains largely unclear.
In this talk we will argue that much of the traditional notions of ‘distance’ (e.g. KL-divergence, extensions of TV such as D_A discrepancy, density-ratios, Wasserstein distance) can yield an over-pessimistic picture of transferability. Instead, we show that some new notions of ‘relative dimension’ between source and target (which we simply term ‘transfer-exponents’) capture a continuum from easy to hard transfer. Transfer-exponents uncover a rich set of situations where transfer is possible even at fast rates; they encode relative benefits of source and target samples, and have interesting implications for related problems such as ‘multi-task or multi-source learning’.
In particular, in the case of transfer from multiple sources, we will discuss (if time permits) a strange phenomena: no procedure can achieve a rate better than that of having a single data source, even in seemingly mild situations where multiple sources are informative about the target.
The talk is based on earlier work with Guillaume Martinet, and ongoing work with Steve Hanneke.