## Opening

**Program Chair**: Issei Sato (The University of Tokyo)

## Invited Talk 1: Nathan Srebro (TTI-Chicago) (10:10 – 10:40)

### Supervised Learning without Discrimination

**Abstract**: As machine learning is increasingly being used in areas protected by anti-discrimination law, or in other domains which are socially and morally sensitive, the problem of algorithmically measuring and avoiding prohibited discrimination in machine learning is pressing. What does it mean for a predictor not to discriminate with respect to a protected group (e.g., by race, gender, etc.)? We propose a notion of non-discrimination that can be measured statistically, used algorithmically, and avoids many of the pitfalls of previous definitions.

Joint work with Suriya Gunasekar, Moritz Hardt, Mesrob Ohannessian, Eric Price, and Blake Woodworth.
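
As one concrete illustration of a statistically measurable notion of non-discrimination (a minimal sketch under my own simplifications, not necessarily the talk's exact definition), one can compare a binary predictor's true- and false-positive rates across protected groups:

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group true-positive and false-positive rates of a binary predictor."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        tpr = float(np.mean(y_pred[m & (y_true == 1)]))
        fpr = float(np.mean(y_pred[m & (y_true == 0)]))
        rates[g] = (tpr, fpr)
    return rates

def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group disparity in TPR or FPR; 0 means the
    predictor's error rates are balanced across the groups."""
    tprs, fprs = zip(*group_rates(y_true, y_pred, group).values())
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

A gap of zero says the predictor's error rates carry no group-specific bias, which can be checked from held-out data alone.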

## Invited Talk 2: Edward Albert Feigenbaum (10:40 – 11:40)

### Advice to Young and New AI Scientists

**Abstract**: The science and engineering of AI is the understanding and computer modeling of I (intelligence). The spectrum of behaviors that we call “I” is very large, and many of these behaviors have not been well understood or modeled. This is very good news for young and new AI scientists and engineers, because most of the interesting problems of “I” are still open for innovation, “breakthrough”, and most importantly “paradigm shift.” AI may be the destiny of computer science and information applications, but that is a future for the young scientists and engineers to invent.

## Organized Session: Papers Accepted at International Conferences (Part 1) (13:00 – 14:30)

### Budgeted stream-based active learning via adaptive submodular maximization (NIPS 2016) [Slides]

**Authors**: Kaito Fujii (Kyoto University) · Hisashi Kashima (Kyoto University)

**Speaker**: Kaito Fujii (The University of Tokyo)

**Abstract**: Active learning enables us to reduce the annotation cost by adaptively selecting unlabeled instances to be labeled. For pool-based active learning, several effective methods with theoretical guarantees have been developed by maximizing a utility function that satisfies adaptive submodularity. In contrast, there have been few methods for stream-based active learning based on adaptive submodularity. In this paper, we propose a new class of utility functions, policy-adaptive submodular functions, and prove that this class includes many existing adaptive submodular functions appearing in real-world problems. We provide a general framework based on policy-adaptive submodularity that makes it possible to convert existing pool-based methods to stream-based methods and gives theoretical guarantees on their performance. In addition, we empirically demonstrate their effectiveness in comparison with existing heuristics on common benchmark datasets.
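
To make the stream-based setting concrete, here is a toy baseline (my own illustration, not the paper's policy-adaptive algorithm): instances arrive one at a time, and we query a label only when the instance's utility clears a fixed threshold and budget remains.

```python
def stream_select(utilities, budget, threshold):
    """Toy stream-based selection: label an arriving instance when its
    utility (e.g., predictive uncertainty) clears a fixed threshold,
    until the labeling budget is spent. The paper's policy-adaptive
    submodular framework instead adapts the policy to labels seen so far."""
    chosen = []
    for i, u in enumerate(utilities):
        if len(chosen) >= budget:
            break                    # budget exhausted: stop labeling
        if u >= threshold:
            chosen.append(i)         # query the label of instance i
    return chosen
```

Unlike the pool-based setting, a skipped instance never comes back, which is what makes adaptivity guarantees harder to obtain here.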

### Differential Privacy without Sensitivity (NIPS 2016) [Slides]

**Authors**: Kentaro Minami (The University of Tokyo) · Hiromi Arai (The University of Tokyo) · Issei Sato (The University of Tokyo) · Hiroshi Nakagawa (The University of Tokyo)

**Speaker**: Kentaro Minami (The University of Tokyo)

**Abstract**: The exponential mechanism is a general method to construct a randomized estimator that satisfies (ε, 0)-differential privacy.
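
For reference, the exponential mechanism itself can be sketched in a few lines (a generic illustration, not the paper's sensitivity-free construction):

```python
import numpy as np

def exponential_mechanism(candidates, utility, eps, sensitivity, rng):
    """Sample a candidate with probability proportional to
    exp(eps * u / (2 * sensitivity)). When `sensitivity` bounds how much
    any candidate's utility can change between neighboring datasets,
    the output satisfies (eps, 0)-differential privacy."""
    scores = np.array([utility(c) for c in candidates], dtype=float)
    logits = eps * scores / (2.0 * sensitivity)
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```

The need for a global sensitivity bound is exactly the requirement the paper's title refers to removing.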

### Learning Koopman Invariant Subspaces for Dynamic Mode Decomposition (NIPS 2017) [Slides]

**Authors**: Naoya Takeishi (The University of Tokyo) · Yoshinobu Kawahara (Osaka University) · Takehisa Yairi (The University of Tokyo)

**Speaker**: Naoya Takeishi (The University of Tokyo)

**Abstract**: Spectral decomposition of the Koopman operator is attracting attention as a tool for the analysis of nonlinear dynamical systems. Dynamic mode decomposition is a popular numerical algorithm for Koopman spectral analysis; however, we often need to prepare nonlinear observables manually according to the underlying dynamics, which is not always possible since we may not have any a priori knowledge about them. In this paper, we propose a fully data-driven method for Koopman spectral analysis based on the principle of learning Koopman invariant subspaces from observed data. To this end, we propose minimization of the residual sum of squares of linear least-squares regression to estimate a set of functions that transforms data into a form in which the linear regression fits well. We introduce an implementation with neural networks and evaluate performance empirically using nonlinear dynamical systems and applications.
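
The linear least-squares step the abstract builds on is standard dynamic mode decomposition, which fits a linear map between consecutive snapshots (this sketch omits the paper's contribution, the neural network that learns the observables):

```python
import numpy as np

def dmd(X, Y):
    """Standard DMD: fit A with Y ≈ A X by least squares, then
    eigendecompose A. The eigenvalues approximate Koopman eigenvalues
    when the observables span a Koopman invariant subspace."""
    A = Y @ np.linalg.pinv(X)
    eigvals, modes = np.linalg.eig(A)
    return A, eigvals, modes
```

On data generated by a linear system, DMD recovers the system matrix and its spectrum exactly; the paper's method learns a transformation of the data under which this linear fit becomes accurate for nonlinear dynamics.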

### Positive-Unlabeled Learning with Non-Negative Risk Estimator (NIPS 2017) [Slides]

**Authors**: Ryuichi Kiryo (UTokyo/RIKEN) · Gang Niu (The University of Tokyo) · Marthinus C du Plessis (The University of Tokyo) · Masashi Sugiyama (RIKEN / University of Tokyo)

**Speaker**: Ryuichi Kiryo (The University of Tokyo)

**Abstract**: From only *positive* (P) and *unlabeled* (U) data, a binary classifier can be trained with PU learning, in which the state of the art is *unbiased PU learning*. However, if its model is very flexible, its empirical risk on training data will go negative and we will suffer from serious overfitting. In this paper, we propose a *non-negative risk estimator* for PU learning. When it is minimized, it is more robust against overfitting, and thus we are able to train very flexible models given limited P data. Moreover, we analyze the *bias*, *consistency*, and *mean-squared-error reduction* of the proposed risk estimator, and the *estimation error* of the corresponding risk minimizer. Experiments show that the proposed risk estimator successfully fixes the overfitting problem of its unbiased counterparts.
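
The fix can be stated in one line: the unbiased estimator's unlabeled term can go negative, and the non-negative estimator clips it at zero. A small sketch (function and argument names are mine):

```python
import numpy as np

def pu_risks(loss_pos, loss_neg_on_p, loss_neg_on_u, pi):
    """Unbiased and non-negative PU risk estimates.
    loss_pos:       l(g(x), +1) on positive samples
    loss_neg_on_p:  l(g(x), -1) on positive samples
    loss_neg_on_u:  l(g(x), -1) on unlabeled samples
    pi:             class prior p(y = +1)"""
    neg_part = np.mean(loss_neg_on_u) - pi * np.mean(loss_neg_on_p)
    unbiased = pi * np.mean(loss_pos) + neg_part
    non_negative = pi * np.mean(loss_pos) + max(0.0, neg_part)
    return unbiased, non_negative
```

The clipped negative-class term is what a flexible model would otherwise drive below zero to overfit the P data.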

### Expectation Propagation for t-Exponential Family Using Q-Algebra (NIPS 2017) [Slides]

**Authors**: Futoshi Futami (University of Tokyo/RIKEN) · Issei Sato (The University of Tokyo/RIKEN) · Masashi Sugiyama (RIKEN / University of Tokyo)

**Speaker**: Futoshi Futami (The University of Tokyo)

**Abstract**: Exponential family distributions are highly useful in machine learning since their calculation can be performed efficiently through natural parameters. The exponential family has recently been extended to the *t-exponential family*, which contains Student-t distributions as family members and thus allows us to handle noisy data well. However, since the t-exponential family is defined by the *deformed exponential*, we cannot derive an efficient learning algorithm for the t-exponential family such as expectation propagation (EP). In this paper, we borrow the mathematical tools of *q-algebra* from statistical physics and show that the pseudo-additivity of distributions allows us to perform calculation of t-exponential family distributions through natural parameters. We then develop an expectation propagation (EP) algorithm for the t-exponential family, which provides a deterministic approximation to the posterior or predictive distribution with simple moment matching. We finally apply the proposed EP algorithm to the Bayes point machine and Student-t process classification, and demonstrate their performance numerically.
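
The deformed exponential and the q-product from Tsallis statistics, and the pseudo-additivity that makes natural-parameter computations go through, can be checked numerically (a background sketch, not the paper's EP algorithm):

```python
import numpy as np

def exp_q(x, q):
    """Deformed exponential; reduces to exp as q -> 1."""
    if q == 1.0:
        return np.exp(x)
    base = np.maximum(1.0 + (1.0 - q) * x, 0.0)
    return base ** (1.0 / (1.0 - q))

def log_q(x, q):
    """Deformed logarithm, the inverse of exp_q on its range."""
    if q == 1.0:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_product(x, y, q):
    """q-product: log_q(q_product(x, y, q)) = log_q(x) + log_q(y),
    the pseudo-additivity the abstract refers to."""
    inner = np.maximum(x ** (1.0 - q) + y ** (1.0 - q) - 1.0, 0.0)
    return inner ** (1.0 / (1.0 - q))
```

Under the ordinary product, log_q of a product does not split into a sum; replacing the product with the q-product restores that additivity for the deformed logarithm.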

### Learning from Complementary Labels (NIPS 2017)

**Authors**: Takashi Ishida (Sumitomo Mitsui Asset Management, The University of Tokyo, RIKEN) · Gang Niu (The University of Tokyo, RIKEN) · Masashi Sugiyama (RIKEN, The University of Tokyo)

**Speaker**: Takashi Ishida (Sumitomo Mitsui Asset Management / The University of Tokyo)

**Abstract**: Collecting labeled data is costly and thus is a critical bottleneck in real-world classification tasks. To mitigate the problem, we consider a complementary label, which specifies a class that a pattern does not belong to. Collecting complementary labels would be less laborious than ordinary labels since users do not have to carefully choose the correct class from many candidate classes. However, complementary labels are less informative than ordinary labels and thus a suitable approach is needed to better learn from complementary labels. In this paper, we show that an unbiased estimator of the classification risk can be obtained only from complementary labels, if a loss function satisfies a particular symmetric condition. We theoretically prove the estimation error bounds for the proposed method, and experimentally demonstrate the usefulness of the proposed algorithms.
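
A complementary label simply rules out one class. The following toy sketch (the uniform sampling assumption is mine; the paper works with the general estimator) shows both the labeling process and why a single complementary label is much less informative than an ordinary one:

```python
import random

def complementary_label(true_label, num_classes, rng):
    """Draw a complementary label: a class the pattern does NOT belong
    to, chosen uniformly from the other classes."""
    return rng.choice([c for c in range(num_classes) if c != true_label])

def recover_label(comp_labels, num_classes):
    """With K - 1 distinct complementary labels the true class is the
    only one never ruled out; one complementary label alone narrows
    the candidates by just a single class."""
    remaining = set(range(num_classes)) - set(comp_labels)
    assert len(remaining) == 1, "need K - 1 distinct complementary labels"
    return remaining.pop()
```

The paper's contribution is to learn from one complementary label per pattern anyway, via an unbiased risk estimator for losses satisfying a symmetric condition.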

## Organized Session: Papers Accepted at International Conferences (Part 2) (15:30 – 17:00)

### Learning Discrete Representations via Information Maximizing Self-Augmented Training (ICML 2017) [Slides]

**Authors**: Weihua Hu (The University of Tokyo / RIKEN) · Takeru Miyato (Preferred Networks, Inc., ATR) · Seiya Tokui (Preferred Networks / The University of Tokyo) · Eiichi Matsumoto (Preferred Networks Inc.) · Masashi Sugiyama (RIKEN / The University of Tokyo)

**Speaker**: Weihua Hu (The University of Tokyo)

**Abstract**: Learning discrete representations of data is a central machine learning task because of the compactness of the representations and ease of interpretation. The task includes clustering and hash learning as special cases. Deep neural networks are promising for this task because they can model the non-linearity of data and scale to large datasets. However, their model complexity is huge, and therefore, we need to carefully regularize the networks in order to learn useful representations that exhibit intended invariance for applications of interest. To this end, we propose a method called Information Maximizing Self-Augmented Training (IMSAT). In IMSAT, we use data augmentation to impose the invariance on discrete representations. More specifically, we encourage the predicted representations of augmented data points to be close to those of the original data points in an end-to-end fashion. At the same time, we maximize the information-theoretic dependency between data and their predicted discrete representations. Extensive experiments on benchmark datasets show that IMSAT produces state-of-the-art results for both clustering and unsupervised hash learning.
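
The two ingredients described above can be written as one objective on predicted class probabilities (a simplified sketch; the weighting `lam` and this exact form are my notation, not the paper's):

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy along `axis`."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def imsat_objective(p, p_aug, lam=1.0):
    """IMSAT-style objective: maximize mutual information
    I(X; Y) = H(mean of p) - mean of H(p), while penalizing the KL
    divergence between predictions on original (p) and augmented
    (p_aug) data points."""
    mi = entropy(p.mean(axis=0)) - entropy(p).mean()
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(p_aug + 1e-12)), axis=1).mean()
    return mi - lam * kl
```

The mutual-information term rewards confident, balanced cluster assignments; the KL term enforces the augmentation invariance.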

### Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data (ICML 2017) [Slides]

**Authors**: Tomoya Sakai (The University of Tokyo / RIKEN) · Marthinus du Plessis (N/A) · Gang Niu (University of Tokyo) · Masashi Sugiyama (RIKEN / The University of Tokyo)

**Speaker**: Tomoya Sakai (The University of Tokyo)

**Abstract**: Most of the semi-supervised classification methods developed so far use unlabeled data for regularization purposes under particular distributional assumptions such as the cluster assumption. In contrast, recently developed methods of classification from positive and unlabeled data (PU classification) use unlabeled data for risk evaluation, i.e., label information is directly extracted from unlabeled data. In this paper, we extend PU classification to also incorporate negative data and propose a novel semi-supervised learning approach. We establish generalization error bounds for our novel methods and show that the bounds decrease with respect to the number of unlabeled data without the distributional assumptions that are required in existing semi-supervised learning methods. Through experiments, we demonstrate the usefulness of the proposed methods.
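The distinction between "unlabeled data for regularization" and "unlabeled data for risk evaluation" can be caricatured in a few lines (a heavy simplification of the paper's family of estimators; names and the convex-combination form are mine):

```python
import numpy as np

def pn_risk(loss_pos, loss_neg, pi):
    """Ordinary supervised risk estimate from labeled P and N losses,
    with class prior pi = p(y = +1)."""
    return pi * np.mean(loss_pos) + (1.0 - pi) * np.mean(loss_neg)

def pnu_risk(risk_pn, risk_pu, eta):
    """Combine the supervised (PN) risk with a PU risk estimated partly
    from unlabeled data: unlabeled points enter the objective through
    risk evaluation, not through a cluster-type regularizer."""
    return (1.0 - eta) * risk_pn + eta * risk_pu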

### Asymmetric Tri-training for Unsupervised Domain Adaptation (ICML 2017) [Slides]

**Authors**: Kuniaki Saito (The University of Tokyo) · Yoshitaka Ushiku (The University of Tokyo) · Tatsuya Harada (The University of Tokyo / RIKEN)

**Speaker**: Kuniaki Saito (The University of Tokyo)

**Abstract**: It is important to apply models trained on a large number of labeled samples to different domains because collecting many labeled samples in various domains is expensive. To learn discriminative representations for the target domain, we assume that artificially labeling the target samples can result in a good representation. Tri-training leverages three classifiers equally to provide pseudo-labels to unlabeled samples; however, the method does not assume labeling samples generated from a different domain. In this paper, we propose the use of an *asymmetric* tri-training method for unsupervised domain adaptation, where we assign pseudo-labels to unlabeled samples and train the neural networks as if they are true labels. In our work, we use three networks *asymmetrically*, and by *asymmetric*, we mean that two networks are used to label unlabeled target samples, and one network is trained by the pseudo-labeled samples to obtain target-discriminative representations. Our proposed method was shown to achieve a state-of-the-art performance on the benchmark digit recognition datasets for domain adaptation.
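
The pseudo-labeling step by the two labeling networks can be sketched as an agreement-plus-confidence filter (the threshold and the "at least one network confident" rule are simplifications of the paper's procedure):

```python
import numpy as np

def agree_pseudo_labels(probs1, probs2, threshold=0.9):
    """Keep a target sample when the two labeling networks agree on the
    class and at least one is confident; the retained pseudo-labels
    then train the third, target-specific network."""
    y1, y2 = probs1.argmax(axis=1), probs2.argmax(axis=1)
    conf = np.maximum(probs1.max(axis=1), probs2.max(axis=1))
    keep = (y1 == y2) & (conf >= threshold)
    return np.where(keep)[0], y1[keep]
```

Filtering out disagreements is what keeps noisy target pseudo-labels from dominating training of the third network.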

### Multichannel End-to-end Speech Recognition (ICML 2017)

**Authors**: Tsubasa Ochiai (Doshisha University) · Shinji Watanabe (Mitsubishi Electric Research Laboratories) · Takaaki Hori (Mitsubishi Electric Research Laboratories) · John Hershey (Mitsubishi Electric Research Laboratories)

**Speaker**: Tsubasa Ochiai (Doshisha University)

**Abstract**: The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. This allows the beamforming components to be optimized jointly within the recognition architecture to improve the end-to-end speech recognition objective. Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer.

### Selective Inference for Sparse High-Order Interaction Models (ICML 2017)

**Authors**: Shinya Suzumura (Nagoya Institute of Technology) · Kazuya Nakagawa (Nagoya Institute of Technology) · Yuta Umezu (Nagoya Institute of Technology) · Koji Tsuda (University of Tokyo / RIKEN) · Ichiro Takeuchi (Nagoya Institute of Technology / RIKEN)

**Speaker**: Shinya Suzumura (Nagoya Institute of Technology)

**Abstract**: Finding statistically significant high-order interactions in predictive modeling is an important but challenging task because the possible number of high-order interactions is extremely large (e.g., > 10^17). In this paper we study feature selection and statistical inference for sparse high-order interaction models. Our main contribution is to extend the recently developed selective inference framework for linear models to high-order interaction models by developing a novel algorithm for efficiently characterizing the selection event for the selective inference of high-order interactions. We demonstrate the effectiveness of the proposed algorithm by applying it to an HIV drug response prediction problem.

### Differentially Private Chi-squared Test by Unit Circle Mechanism (ICML 2017)

**Authors**: Kazuya Kakizaki (University of Tsukuba) · Jun Sakuma (University of Tsukuba / RIKEN AIP) · Kazuto Fukuchi (University of Tsukuba)

**Speaker**: Kazuya Kakizaki (NEC Security Research Laboratories)

**Abstract**: This paper develops differentially private mechanisms for the χ² test of independence. While existing works put their effort into properly controlling the type-I error, in addition to that, we investigate the type-II error of differentially private mechanisms. Based on the analysis, we present the unit circle mechanism: a novel differentially private mechanism based on the geometrical property of the test statistics. Compared to existing output perturbation mechanisms, our mechanism improves the dominant term of the type-II error from O(1) to O(exp(−√N)), where N is the sample size.

### Evaluating the Variance of Likelihood-Ratio Gradient Estimators (ICML 2017) [Slides]

**Authors**: Seiya Tokui (Preferred Networks / The University of Tokyo) · Issei Sato (The University of Tokyo / RIKEN)

**Speaker**: Seiya Tokui (Preferred Networks, Inc.)

**Abstract**: The likelihood-ratio method is often used to estimate gradients of stochastic computations, for which baselines are required to reduce the estimation variance. Many types of baselines have been proposed, although their degree of optimality is not well understood. In this study, we establish a novel framework of gradient estimation that includes most of the common gradient estimators as special cases. The framework gives a natural derivation of the optimal estimator that can be interpreted as a special case of the likelihood-ratio method so that we can evaluate the optimal degree of practical techniques with it. It bridges the likelihood-ratio method and the reparameterization trick while still supporting discrete variables. It is derived from the exchange property of the differentiation and integration. To be more specific, it is derived by the reparameterization trick and local marginalization analogous to the local expectation gradient. We evaluate various baselines and the optimal estimator for variational learning and show that the performance of the modern estimators is close to the optimal estimator.
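
The role of the baseline is easy to see on a toy Bernoulli objective (my own illustration, not the paper's framework): any constant baseline leaves the likelihood-ratio estimator unbiased because the score function has zero mean, but the variance changes; for this particular f and θ, the optimal constant baseline works out to 0.3 and drives the per-sample estimate to a constant.

```python
import numpy as np

def lr_gradient_samples(theta, f, baseline, n, rng):
    """Per-sample likelihood-ratio (REINFORCE) estimates of
    d/dtheta E_{z ~ Bernoulli(theta)}[f(z)], i.e.
    (f(z) - baseline) * d/dtheta log p(z; theta)."""
    z = (rng.random(n) < theta).astype(float)
    score = z / theta - (1.0 - z) / (1.0 - theta)   # d/dtheta log p(z; theta)
    fz = np.where(z == 1.0, f(1.0), f(0.0))
    return (fz - baseline) * score

rng = np.random.default_rng(0)
f = lambda z: (z - 1.0) ** 2        # E[f] = 1 - theta, so the true gradient is -1
g_plain = lr_gradient_samples(0.3, f, 0.0, 50000, rng)
g_base = lr_gradient_samples(0.3, f, 0.3, 50000, rng)
```

Both sample means estimate −1; the baselined version has (numerically) zero variance here, which is the kind of gap between practical and optimal estimators the paper quantifies.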