Boosted CVaR Classification
Runtian Zhai, Chen Dan, Arun Sai Suggala, Zico Kolter, Pradeep Ravikumar
To appear in NeurIPS 2021
Paper   Code   Poster   Slides
In this work, we focus on improving the tail performance of machine learning models, a problem widely studied in fields such as algorithmic fairness, class imbalance, and risk-sensitive decision making. A common approach is to minimize the CVaR (Conditional Value at Risk) loss, which is the average loss over the tail of the loss distribution. However, we find that in classification tasks, for deterministic models, ERM already achieves the minimum CVaR loss, so there is no gain in using CVaR. To circumvent this negative result, we minimize the CVaR loss over randomized classifiers, and in particular we propose to train ensemble models with Boosting. In contrast to the original Boosting, which boosts a weak learner to a strong learner, our method boosts an unfair learner to a fair learner. Our method is motivated by a direct relationship between CVaR and LPBoost, which shows that minimizing the CVaR loss of the ensemble model is equivalent to maximizing the objective of LPBoost.
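As a rough illustration of the CVaR loss described above, the following sketch computes the empirical CVaR at level alpha as the average of the worst alpha-fraction of per-sample losses. The function name `cvar_loss` and the example numbers are hypothetical and are not taken from the paper's code. Assuming the classification loss is the zero-one loss (the abstract does not name the loss), the value reduces to min(1, error / alpha), which is monotone in the error rate; this is consistent with the observation that ERM already minimizes CVaR for deterministic classifiers.

```python
import numpy as np

def cvar_loss(losses, alpha):
    """Empirical CVaR at level alpha (0 < alpha <= 1).

    Places weight at most 1 / (alpha * n) on each sample, filling the
    largest losses first, so the result is the mean of the worst
    alpha-fraction of the losses.
    """
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    n = losses.size
    cap = 1.0 / (alpha * n)      # maximum weight per sample
    weights = np.zeros(n)
    remaining = 1.0              # total weight still to assign
    for i in range(n):
        weights[i] = min(cap, remaining)
        remaining -= weights[i]
        if remaining <= 0:
            break
    return float(weights @ losses)

# Hypothetical zero-one losses of a deterministic classifier (error rate 0.25).
zero_one = [1, 0, 0, 0]
tail_risk = cvar_loss(zero_one, alpha=0.5)
closed_form = min(1.0, np.mean(zero_one) / 0.5)
```

With alpha = 1 the function returns the ordinary average loss, so ERM is the special case in which the whole distribution counts as the tail.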

DORO: Distributional and Outlier Robust Optimization
Runtian Zhai*, Chen Dan*, Zico Kolter, Pradeep Ravikumar
In ICML 2021
Paper   Code   Poster   Slides
Many machine learning tasks involve subpopulation shift, where the testing data distribution is a subpopulation of the training distribution. For such settings, a line of recent work has proposed the use of a variant of empirical risk minimization (ERM) known as distributionally robust optimization (DRO). In this work, we apply DRO to real, large-scale tasks with subpopulation shift, and observe that DRO performs relatively poorly and is severely unstable. We identify one direct cause of this phenomenon: the sensitivity of DRO to outliers in the datasets. To resolve this issue, we propose the framework of DORO, for Distributional and Outlier Robust Optimization. At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers. We theoretically prove the effectiveness of the proposed method, and empirically show that DORO improves the performance and stability of DRO with experiments on large modern datasets.
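To make the outlier sensitivity concrete, here is a minimal sketch, not the authors' implementation. It assumes a CVaR-style tail average as the DRO risk and assumes that the refinement discards the eps-fraction of largest losses as potential outliers before averaging the tail; the exact DORO risk is defined in the paper. The function name `trimmed_tail_risk` and the numbers are hypothetical.

```python
import numpy as np

def trimmed_tail_risk(losses, alpha, eps):
    """Tail-average risk that ignores the eps-fraction of largest losses.

    eps = 0 gives a plain tail average over the worst alpha-fraction.
    This is an illustrative assumption, not the paper's exact risk.
    """
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    n_drop = int(eps * losses.size)      # samples treated as outliers
    kept = losses[n_drop:]
    k = max(1, int(np.ceil(alpha * kept.size)))
    return float(kept[:k].mean())

# One corrupted sample (loss 100) among otherwise small losses.
losses = [100, 3, 2, 1, 1, 1, 1, 1, 1, 1]
plain = trimmed_tail_risk(losses, alpha=0.2, eps=0.0)   # dominated by the outlier
robust = trimmed_tail_risk(losses, alpha=0.2, eps=0.1)  # outlier discarded
```

In this toy example the plain tail average is driven almost entirely by the single corrupted sample, while the trimmed version reflects the genuinely hard samples.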

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius
Runtian Zhai*, Chen Dan*, Di He*, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang
In ICLR 2020
Paper   Code   Slides
Adversarial training is one of the most popular ways to learn robust models, but it is usually attack-dependent and time-consuming. In this paper, we propose the MACER algorithm, which learns robust models without using adversarial training but performs better than all existing provable l2-defenses. Recent work shows that randomized smoothing can be used to provide a certified l2 radius for smoothed classifiers, and our algorithm trains provably robust smoothed classifiers via MAximizing the CErtified Radius (MACER). The attack-free characteristic makes MACER faster to train and easier to optimize. Our experiments show that MACER runs faster than state-of-the-art adversarial training algorithms, and the learned models achieve a larger average certified radius.
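For reference, the certified l2 radius mentioned above can be sketched as follows. The formula is the standard one from the randomized smoothing literature for Gaussian noise with standard deviation sigma, and it is not stated in this abstract. The function name and the inputs `p_a` and `p_b` (the smoothed classifier's probabilities for the top class and the runner-up class) are assumptions for illustration; in practice these probabilities are estimated by sampling.

```python
from statistics import NormalDist

def certified_radius(p_a, p_b, sigma):
    """Certified l2 radius of a Gaussian-smoothed classifier.

    R = sigma / 2 * (Phi^{-1}(p_a) - Phi^{-1}(p_b)), where Phi^{-1} is the
    inverse standard normal CDF. Both probabilities must lie strictly
    between 0 and 1. Returns 0 when the top class does not lead.
    """
    inv = NormalDist().inv_cdf
    return max(0.0, 0.5 * sigma * (inv(p_a) - inv(p_b)))

# Hypothetical class probabilities under noise with sigma = 1.
radius = certified_radius(p_a=0.9, p_b=0.1, sigma=1.0)
```

The radius grows as the gap between the two probabilities widens, which is why training to maximize it pushes the smoothed classifier toward confident, stable predictions.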

Adversarially Robust Generalization Just Requires More Unlabeled Data
Runtian Zhai*, Tianle Cai*, Di He*, Chen Dan, Kun He, John E. Hopcroft, Liwei Wang
arXiv: 1906.00555   Code
Previous works show that significantly more labeled data is required to achieve adversarially robust generalization. In this paper, we show that just more unlabeled data is required. The key insight is based on a risk decomposition theorem, in which the expected robust risk is separated into two parts: the stability part, which measures the prediction stability in the presence of perturbations, and the accuracy part, which evaluates the standard classification accuracy. As the stability part does not depend on any label information, we can optimize this part using unlabeled data. Inspired by the theoretical findings, we further show that a practical adversarial training algorithm that leverages unlabeled data can improve adversarially robust generalization on MNIST and CIFAR-10.
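The two parts can be illustrated on a toy problem. The sketch below uses a hypothetical one-dimensional threshold classifier and a grid of perturbations; none of it comes from the paper's code. It shows that the stability term needs no labels, and it checks only the elementary bound that the robust error is at most the sum of the two terms. The precise form of the decomposition theorem is given in the paper.

```python
import numpy as np

def predict(x, threshold=0.0):
    """Toy classifier: label 1 if x exceeds the threshold, else 0."""
    return (x > threshold).astype(int)

def risk_terms(x, y, eps, n_grid=21):
    """Return (stability, accuracy, robust) error rates on a sample.

    Perturbations are searched on a grid in [-eps, eps], which is an
    approximation of the worst case chosen for this sketch.
    """
    deltas = np.linspace(-eps, eps, n_grid)
    clean = predict(x)
    perturbed = predict(x[:, None] + deltas[None, :])          # (n, n_grid)
    unstable = (perturbed != clean[:, None]).any(axis=1)       # uses no labels
    wrong = clean != y                                         # needs labels
    robust_wrong = (perturbed != y[:, None]).any(axis=1)
    return unstable.mean(), wrong.mean(), robust_wrong.mean()

x = np.array([-1.0, -0.05, 0.05, 1.0])
y = np.array([0, 0, 1, 0])
stability, accuracy, robust = risk_terms(x, y, eps=0.1)
```

Because `unstable` is computed from predictions alone, the same quantity could be evaluated on additional unlabeled inputs, which is the property the abstract relies on.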