My primary research interests lie at the interface of machine learning and optimization, with a focus on developing theoretically principled and practically efficient algorithms for learning from massive datasets and complex domains. The variety of results in my research is rooted in a diverse set of insights and techniques from optimization, statistics, concentration of measure, empirical processes, and geometric functional analysis.
Optimization for machine learning
The growth of modern datasets to terabyte and petabyte scales has made massive data increasingly prevalent. This trend poses new challenges for the computational and statistical sciences, as conventional learning algorithms are often entirely infeasible at this volume of data.
A significant strand of my research has focused on developing new scalable and faster stochastic optimization algorithms for machine learning problems. In particular, over the years we have developed a suite of stochastic optimization algorithms whose per-iteration cost is independent of the sample size and which can be applied to a wide spectrum of real-world applications with huge amounts of training data or complex domains. My research on stochastic optimization has addressed several important issues that were overlooked by previous studies, including how to reduce the number of expensive projection steps when learning over complex domains; how to simultaneously optimize multiple stochastic objectives; how to exploit the smoothness of the loss function to reduce sample complexity; how to reduce variance and obtain linear convergence rates for stochastic optimization by exploiting the finite-sum structure of empirical risk minimization; how to optimize non-smooth functions with structural constraints; how to optimize non-smooth functions with complex non-separable constraints; and how to establish high-probability upper and lower bounds for optimizing exp-concave losses in machine learning.
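The two ingredients behind these algorithms, sample-size-independent iterations and projection steps over a constrained domain, can be illustrated with a minimal sketch of projected stochastic gradient descent for least squares over an l2 ball. This is my own toy instantiation, not any specific algorithm from the work described above; function names, step sizes, and constants are illustrative.

```python
import numpy as np

def projected_sgd(X, y, T=3000, radius=1.0, seed=0):
    """SGD for least squares constrained to an l2 ball.

    Each iteration touches a single randomly drawn example, so the
    per-iteration cost is O(d), independent of the sample size n,
    and the projection step keeps the iterate inside the ball.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)                    # sample one example
        grad = (X[i] @ w - y[i]) * X[i]        # stochastic gradient
        w -= grad / np.sqrt(t)                 # decaying step size
        norm = np.linalg.norm(w)
        if norm > radius:                      # Euclidean projection
            w *= radius / norm                 # onto the l2 ball
    return w

# Toy problem whose true parameter lies inside the unit ball.
rng = np.random.default_rng(1)
w_true = np.array([0.6, -0.3, 0.5])
X = rng.normal(size=(500, 3))
y = X @ w_true + 0.01 * rng.normal(size=500)
w_hat = projected_sgd(X, y)
```

For the l2 ball the projection is a cheap rescaling, but for complex domains (e.g., bounded trace-norm matrices) it can dominate the cost of every iteration, which is precisely what motivates projection-reduction techniques.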
Randomized methods in machine learning
Random projection and sketching methods have been widely used in data classification. These methods map high-dimensional data into a low-dimensional subspace in order to reduce the computational cost of solving the associated optimization problem. While previous studies focused on analyzing classification performance in the low-dimensional space, we raised the following recovery problem: is it possible to solve a small-scale sketched version of the original high-dimensional optimization problem and accurately recover the solution to the original problem from the solution learned after sketching?
Answering this question addresses the high dimensionality of the data, as we can efficiently solve the problem in a low-dimensional space and then recover the solution to the original problem. We first proposed a simple two-stage algorithm that uses the dual solution of the low-dimensional optimization problem obtained by random projection to recover the optimal solution to the original problem with high accuracy. Recently, we generalized this approach to reduce the sample size and the dimensionality simultaneously by establishing a duality relation between random projection based sketching (sample reduction) and dimensionality reduction.
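The two-stage idea can be sketched concretely for ridge regression, where the dual solution is available in closed form. This is my own toy instantiation under illustrative parameter choices, not the exact setting or guarantees of the work above: solve the sketched problem in k ≪ d dimensions, read off its dual variables, and map them back through the original high-dimensional data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, lam = 20, 2000, 800, 1.0

# High-dimensional data: n samples with d features, plus targets.
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = rng.normal(size=n)

# Exact ridge solution via its dual:
#   w* = X^T alpha*,  alpha* = (X X^T + lam I)^{-1} y.
alpha_star = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
w_star = X.T @ alpha_star

# Stage 1: sketch the features from d down to k dimensions and
# solve the small ridge problem, keeping its dual solution.
R = rng.normal(size=(d, k)) / np.sqrt(k)      # random projection
Xs = X @ R                                    # sketched data, n x k
alpha_hat = np.linalg.solve(Xs @ Xs.T + lam * np.eye(n), y)

# Stage 2: recover a high-dimensional solution from the dual
# variables through the original (unsketched) data matrix.
w_rec = X.T @ alpha_hat

rel_err = np.linalg.norm(w_rec - w_star) / np.linalg.norm(w_star)
```

The recovery works because the sketched Gram matrix X R R^T X^T concentrates around X X^T, so the sketched dual variables are close to the exact ones even though the sketched primal solution lives in only k dimensions.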
Online (adversarial) learning
I have been interested in problems in the popular area of sequential decision making, which is often modeled as a game between an algorithm and an adversary. The goal is to minimize the regret, defined as the difference between the loss of the learner and the best possible loss that could have been accumulated in hindsight had the algorithm known all the moves of the adversary in advance. As part of my thesis, we introduced the gradual variation, measured by the sum of the distances between every two consecutive loss functions, to assess the performance of online learning algorithms in gradually evolving environments such as stock prediction. We also proposed algorithms that achieve a regret bound which scales only as the square root of the gradual variation. This work received the Best Student Paper Award at the Conference on Learning Theory (COLT). More recently, I have been interested in online learning problems with many correlated users or adversaries.
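In symbols (notation mine, with $f_t$ the loss revealed at round $t$ and $\mathcal{X}$ the decision set), the regret and one common gradient-based way to measure the distance between consecutive losses read:

```latex
\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x),
\qquad
V_T = \sum_{t=2}^{T} \max_{x \in \mathcal{X}} \bigl\| \nabla f_t(x) - \nabla f_{t-1}(x) \bigr\|^2 ,
```

so a bound of the form $\mathrm{Regret}_T = O(\sqrt{V_T})$ degrades gracefully: when the environment does not change at all ($f_t = f_{t-1}$ for every $t$), the gradual variation vanishes and the regret is constant.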
Statistical learning theory
The advent of massive datasets requires that, beyond sample complexity, which is the main focus of traditional statistical learning theory, the computational complexity of learning algorithms also be considered in the analysis of large-scale learning. Such constraints include memory and sequential-access constraints in streaming applications, communication constraints in distributed learning, missing features and noisy labels when the available data is corrupted, and memory constraints in applications such as principal component analysis and kernel learning with super-linear memory requirements, to name a few. However, we currently have little understanding of how these constraints fundamentally affect performance. One of my research ambitions is to explore the intimate relationship between performance and imposed constraints, e.g., my work on learning from benign sequences in online learning, or passive learning with a known target risk.
Besides fundamental research in machine learning, I have been actively involved in a number of other projects, e.g., learning from noisy labels, as well as projects that apply machine learning techniques to big data analysis, ranging from social network analysis and image retrieval to gene expression data analysis.