Theoretical Computer Science Group

Theory Seminar Schedule for Fall 2014

The weekly Theory seminar provides an opportunity to get exposed to cutting edge research in theoretical computer science, to learn to understand technical material from talks and to give technical presentations. Graduate students interested in doing research in algorithms, complexity theory, cryptography, quantum computing and related areas are strongly encouraged to attend.

Date Speaker Title
August 25 Sofya Raskhodnikova L_p testing
September 1 Labor day (no talk)
September 8 Raef Bassily Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds
September 15 Kashyap Dixit Optimal Property Testers Over Product Distributions
September 22 Nithin Varma Pairwise spanners with additive distortion
September 29 (in IST 333) Krzysztof Onak (IBM Watson) A Near-Optimal Algorithm for Testing Graph Isomorphism
October 6 Meiram Murzabulatov TBA
October 13 (in IST 333) Abhradeep Thakurta(Yahoo Research) TBA
October 20 FOCS (no talk)
October 27 Om Thakkar TBA
November 3 TBA TBA
November 10 (in IST 333) Aaron Roth (University of Pennsilvania) TBA
November 17 (in IST 333) Jon Ullman (Columbia University) TBA
November 24 Thanksgiving (no talk)
December 1 TBA TBA
December 8 Ramesh Krishnan TBA

Spring 2007     Fall 2007     Spring 2008     Summer 2008     Fall 2008     Spring 2009     Fall 2009     Spring 2010     Fall 2010     Spring 2011     Fall 2011

    Spring 2012     Fall 2012     Spring 2013     Fall 2013

L_P Testing

We initiate a systematic study of sublinear-time algorithms for approximately testing properties of real-valued data, where the quality of approximation is measured with respect to L_p distances. L_p distances generalize the standard Euclidian distance (L_2) and the Hamming distance (L_0). By real-valued data, we mean datasets that are meaningfully represented as real-valued functions. Our algorithms, called L_p-testers, are required to distinguish datasets that have a required property from those that are far in L_p distance from any dataset with the required property. This is a generalization of standard property testers, which are defined with respect to the Hamming distance. Using general L_p distances is more appropriate for many computational tasks on real-valued data.

We use our framework to design simple and fast algorithms for classic problems, such as testing monotonicity, convexity and the Lipschitz property, and also approximating the distance to monotonicity. For some of these problems, our L_p-testers for p>=1 are faster than possible in the standard property testing framework.

In the talk, we will explain and motivate the new framework, give an overview of the results, and explain some of the techniques.

Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

The volume of high-quality data that we can access, analyze, and learn from has reached astronomical figures. Since these data often contain sensitive or personally identifiable information, privacy has become a subject of a major concern. As it is hard to intuitively reason about the notion of privacy, we were compelled to use the rigorous mathematical approach for this matter. The notion of differential privacy [Dwork-McSherry-Nissim-Smith, 2006] is one of the most powerful and widely-accepted mathematical definitions of privacy. Roughly speaking, a (randomized) algorithm that takes a data set as an input is said to be differentially private if, for any pair of data sets D and D’ that differ only in one record, the distribution of the algorithm’s output remains ``almost’’ the same whether it is run on D or D’.

In this talk, we consider the problem of Empirical Risk Minimization (ERM) under the differential privacy constraint. The notion of ERM is at the heart of learning theory and statistics, and is a central tool in machine learning algorithms. In the traditional convex ERM, we are given a data set {d_1, …, d_n}, together with a convex set C known as the parameter set. For a given parameter w in C, each data point d_i contributes a loss L(w; d_i). Given a data set {d_1,..., d_n}, the algorithm's goal is to find a parameter w that minimizes the empirical risk, defined as \sum_{i=1}^n L(w;d_i).

In this talk, we show how to construct efficient differentially private algorithms with optimal excess risk (i.e., the least amount of error necessary to satisfy the differential privacy constraint). Namely, we provide efficient algorithms and matching lower bounds for (i) the "general" setting where we only assume the loss function is convex and Lipschitz-continuous and the convex constraint set is bounded; and (ii) the setting where the loss function is also known to be strongly convex. We give separate algorithms (and lower bounds) for the two standard cases of differential privacy: pure $\epsilon$ and $(\epsilon, \delta)$ differential privacy. Perhaps, surprisingly, the techniques used for designing optimal algorithms in the two cases are completely different.

Optimal Property Testers Over Product Distributions

We study algorithms that, given access to a small number of samples from a large dataset, approximately determine whether the dataset satisfies a desired property. More specifically, our algorithms accept if the dataset has the property and reject with high probability if the dataset is far from having the property. The distance to having the property is measured with respect to a known or an unknown distribution on the datasets. Since any dataset can be represented as a function, we study testing properties of functions. In this work, we focus on functions over domains of the form {1,2,..,n}^d (that is, d-dimensional hypergrids). We look at a general class of properties of such functions, called bounded derivative properties (BDP), which includes monotonicity and the Lipschitz property. These two properties are fundamental and have applications in processing database queries and data-privacy, respectively.

We give an optimal tester for BDPs for the case when the distance to the property is measured with respect to a product distribution, that is, a distribution where each coordinate is chosen independently. Our main tool here is a novel dimension reduction which reduces testing properties of functions over {1,2,..,n}^d to testing functions over {1,2,..,n}. This dimension reduction is optimal up to a constant factor. For BDPs of functions over {1,2,...,n}, we design an optimal tester for the case when the distribution is known. Our tester is based on Knuth's construction of binary search trees over {1,2,..,n} with minimum expected depth. As a special case, we obtain an optimal monotonicity tester for {1,2,..,n}, thus improving the tester given by Ailon and Chazelle (Information and Computation, 2006). Our work resolves two open problems given in their work.

Joint work with Deeparnab Chakrabarty, Madhav Jha, and C.Seshadhri (to appear in SODA 2015)

Pairwise spanners with additive distortion

An additive pairwise spanner of an undirected graph is a subgraph in which the shortest distances between certain specified pairs of nodes are larger than the corresponding distances in the original graph by at most an additive term. This additive term is called the distortion of that spanner. In the case when the approximation requirement applies to all pairs of nodes in the graph, the subgraph is simply called an additive spanner.

Additive spanners are very well studied. They have applications in compact routing schemes, near shortest path algorithms, approximate distance oracles, etc. It is NP-hard to compute the sparsest additive pairwise spanner of a graph. The typical goal is to design algorithms which for specific values of distortion construct pairwise spanners which are as sparse as possible.

In the talk we will present a brief survey of the work done in the area of graph spanners, the important algorithmic techniques used in constructing them and high level ideas of some of our algorithms for additive pairwise spanners.

Joint work with T. Kavitha.

A Near-Optimal Algorithm for Testing Graph Isomorphism

The goal of the graph isomorphism problem is to check if two finite graphs are identical if their labels are ignored. Apart from practical applications such as chemical compound identification and circuit verification, the problem is interesting due to its challenging open computational status.

In the property testing framework, Fischer and Matsliah (SICOMP 2008) showed that the number of queries necessary to (approximately) test isomorphism of two unknown graphs in the dense graph model is at least Omega(n) and at most O(n^(5/4)). We essentially close this gap by showing an algorithm that makes n * 2^O(sqrt(log n)) queries.

Distribution testing (Batu et al. [JACM 2013]; Batu et al. [FOCS 2001]) is one of the main tools in this line of research. A major obstacle in the quest for an efficient graph isomorphism tester turns out to be the Omega(n^(2/3)) distribution testing lower bound due to Valiant (SICOMP 2011). We bypass this lower bound by designing a more efficient algorithm that for two distributions on near-isomorphic sets of points is required to reject only if the Earth-Mover Distance between them is large.

Joint work with Xiaorui Sun (Columbia University).