Theoretical Computer Science Group

Theory Seminar Schedule for Fall 2010

The weekly Theory seminar provides an opportunity to get exposed to cutting edge research in theoretical computer science, to learn to understand technical material from talks and to give technical presentations. Graduate students interested in doing research in algorithms, complexity theory, cryptography, quantum computing and related areas are strongly encouraged to attend.

Date Speaker Title
Aug 23 Madhav Jha Lower Bounds on Local Monotonicity Reconstruction from Transitive Closure Spanners
Sofya Raskhodnikova Approximation Algorithms for Min-Max Generalization Problems
Aug 30 No Seminar
Sep 6 Labor Day
Sep 13 Adam Smith Codes for Computationally Limited Channels
Sep 20 Fang Song Zero-Knowlede in a Quantum World.
Sep 27 Ishan Behoora A Local Heuristic for the k-Medians Problem and Its Applications
Oct 4 Swarat Chauduri A Hands-On Introduction to Automated Theorem-Proving
Oct 11 Kashyap Dixit Local Ratio Approximation Algorithms: An Introduction
Oct 18 Krzysztof Onak Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity
Oct 25 No Seminar
Nov 1 Grigory Yaroslavtsev Two-Party Differential Privacy and Deterministic Extraction from Santha-Vazirani Sources
Nov 8 Abhradeep Guha Thakurta Introduction to Fourier Analysis of Boolean Functions
Nov 15 Youngtae Youn Computing Market Equilibrium Prices for Linear Fisher Markets
Nov 22 Thanksgiving Break
Nov 29 Huiwen Yu Finding, Minimizing and Counting Weighted Subgraphs
Dec 6 Dayu Yuan Indexing Feature Selection for Graph Querying

Spring 2007     Fall 2007     Spring 2008     Summer 2008     Fall 2008     Spring 2009     Fall 2009     Spring 2010

Lower Bounds on Local Monotonicity Reconstruction from Transitive Closure Spanners

Consider an algorithm A computing on a large data set D, say, an array of numbers. s correctness (and sometimes efficiency) may be contingent upon the data set satisfying a certain structural property. For example, A may require that D is sorted. In such situations, it may be desirable that A access D via a "filter". The filter ensures that data seen by A always satisfy the required property, modifying D at few on the if required. This model of property-preserving data reconstruction was introduced in [Ailon Chazelle Comandur Liu, Algorithmica 2008]. Extending this notion, Saks and Seshadhri (SIAM Journal on Computing, 2010) gave a filter for the property of monotonicity (aka sortedness). Specifically, given access to an oracle for an almost monotone function f : [m]^d -> R, the filter can quickly evaluate a related function g : [m]^d -> R which is guaranteed to be monotone. Furthermore, the filter can be implemented in a distributed manner.
In this work, we show a new connection between efficiency of a monotonicity filter and the size of 2-transitive-closure spanner (2-TC-spanner). The latter were introduced in [Bhattacharyya Grigorescu Jung Raskhodnikova Woodruff, SODA 2009]. Given a graph G = (V,E), a graph of H on the same vertex set is a 2-TC-spanner of G if distance between vertices u and v in H is at most 2 iff G has a path from u to v. We show that an efficient local monotonicity filter implies a sparse 2-TC-spanner of the directed hypergrid (hypercube), providing a new technique for proving lower bounds for such filters. We present tight upper and lower bounds on the size of the sparsest 2-TC-spanners of the directed hypercube and hypergrid. These bounds imply tighter lower bounds for local monotonicity filters that nearly match the known upper bounds.
Joint work with Arnab Bhattacharyya, Elena Grigorescu, Kyomin Jung, Sofya Raskhodnikova and David Woodruff.

Approximation Algorithms for Min-Max Generalization Problems

We provide improved approximation algorithms for the min-max generalization problems considered by Du, Eppstein, Goodrich, and Lueker. In min-max generalization problems, the input consists of data items with weights and a lower bound w_lb, and the goal is to partition individual items into groups of weight at least w_lb, while minimizing the maximum weight of a group. The rules of legal partitioning are specic to a problem. Du et al. consider several problems in this vein:
  1. partitioning a graph into connected subgraphs,
  2. partitioning unstructured data into arbitrary classes and
  3. partitioning a 2-dimensional array into non-overlapping contiguous rectangles (subarrays) that satisfy the above size requirements.

We significantly improve approximation ratios for all the problems considered by Du et al., and provide additional motivation for these problems. Moreover, for the first problem, while Du et al. give approximation algorithms for specific graph families, namely, 3-connected and 4-connected planar graphs, no polynomial time approximation algorithm that works for all graphs was known prior to this work.
Joint work with Piotr Berman.

Codes for Computationally Limited Channels

There are, crudely, two popular ways to model the noise that can occur when transmitting over an imperfect channel: "stochastic" models, which assume a specific probability distribution on the patterns of errors, and "adversarial" models, which limit only the number (or magnitude) of errors that can occur. Codes that can transmit reliably in the presence of adversarial errors are useful since they work in a wide variety of conditions. However, designing such codes is hard since the errors can be tailored to the particular codeword being transmitted.

Claude Shannon, in his seminal work in information theory, developed tools for determining the maximal rate at which information can be transmitted in stochastic models. For example, we know that for the binary symmetric channel $BSC_p$ which flips each transmitted bit independently with probability $p$, there exist binary codes of rate arbitrarily close to $1-H(p)$ that enable reliable information transmission with exponentially small probability of mistaken decoding. Here $H(\cdot)$ is the binary entropy function. The quantity $1-H(p)$ is called the (Shannon) capacity of the $\bsc_p$ channel. But what if the errors are adversarial? If the channel can corrupt up to a fraction $p$ of symbols in an arbitrary manner *after* seeing the codeword, it is known that for error-free communication to be possible, the rate has to be much smaller than the Shannon capacity $1-H(p)$.

In this talk, we survey a line of work which considers relaxations of the model that enable achieving capacity even against adversarial errors. The talk will focus on models which limit the channel to *computationally simple behavior* and use ideas from cryptography to reason about the channel. Such settings were considered in several previous works, notably those of Lipton (STACS 1994) and Micali et al.~(TCC 2005); both of those works showed how to achieve the Shannon capacity in the presence of arbitrary polynomial-time channels, at the cost of additional setup assumptions (either shared randomness or a public-key infrastructure).

We conclude with some recent results (joint with V. Guruswami, to appear at FOCS 2010) on removing these set-up assumptions.

Zero-Knowlede in a Quantum World

Fang will give us a brief introduction to zero-knowledge protocols, explain why they are problematic if the verifier ("adversary") possesses even very limited quantum computational power, and sketch a result of Watrous' (STOC 2006) that gives us some initial tools for addressing the problems.

J. Watrous. Zero-knowledge against quantum attacks. SIAM Journal on Computing 39(1): 25

A Local Heuristic for the k-Medians Problem and Its Applications

Given a graph G, the goal in the k-medians problem is to find a set of k distinguished vertices ("centers") which minimize the sum, over all vertices in G, of the distance from a vertex to its nearest center. We will look at a local search heuristic for the k-medians problem and prove that the heuristic provides a 3-approximation to the global optimum (a result due to Arya et al., STOC 2001). Afterwards, we will look at how we can use results from the proof to generate differentially private approximation algorithms for k-median (following Gupta et al., SODA 2010).

Papers we will discuss:

A Hands-On Introduction to Automated Theorem-Proving

What is a proof, and what makes some proofs harder to find, write and understand than others? Swarat won't answer any of these questions during the talk. We will, however, acquire some fodder for thinking about these topics by learning about software that helps a human generate machine-verifiable proofs. We'll see a few examples of proofs that are easy to automate, and also get a sense of which types of proofs require more human help (at least, with this system).

The purpose of this seminar is (a) to educate attendees about an interesting topic at the intersection of logic and computer science and (b) take a first step towards developing some course materials to help students understand rigorous reasoning.

In order to make it more interactive, we would like as many people as possible to get the software up and running on their laptops before the seminar. See instructions below.

Two of the examples we will work through will consist of verifying the solutions to two short puzzles (below). Other examples will come from basic discrete math and algorithms.

Local Ratio Approximation Algorithms: An Introduction

The "local ratio" framework for the design of approximation algorithms simplifies the analysis of many classic approximation algorithms. It has also led to improvements in the state of the art for many problems, in terms of both running time and approximation ratio. We'll see the basic framework and how it applies to example problems in several domains: partial vertex cover, partial hitting set and interval scheduling.

Material for the talk is drawn from the following survey:

Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity

We present a near-linear time algorithm for approximating the edit distance between two strings to within a significantly better factor than previously known. This result emerges in our investigation of edit distance from a new perspective, namely a model of asymmetric queries, for which we present near-tight query complexity bounds. Another consequence of this new model is the first rigorous separation between edit distance and Ulam distance, which we obtain by tracing the hardness of edit distance back to a phenomenon that was not used before.

Joint work with Alexandr Andoni and Robert Krauthgamer.

Paper link:

Two-Party Differential Privacy and Deterministic Extraction from Santha-Vazirani Sources

The talk presents recent results (from a paper at FOCS 2010) on "differential privacy" in a distributed setting: two parties holding sensitive data would like to analyze the union of their data sets while preserving the privacy of the individuals whose data they hold.

The paper give almost tight bounds on the accuracy of such analysis for the Hamming distance function. These bounds show the contrast between information theoretic and computational differential privacy in the two-party setting and also between the two-party and the client-server setting. The proof technique introduces a connection between differential privacy and deterministic extraction from independent Santha-Vazirani sources.

The talk is mostly based on the paper "The Limits of Two-Party Differential Privacy" by Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar and Salil Vadhan (

Introduction to Fourier Analysis of Boolean Functions

In this talk we give a brief introduction to Fourier analysis of boolean functions. We first discuss some the basic concepts like Parseval's theorem, bias, influence, energy, and noise stability. We then move on to discuss Condorcet's voting paradox and Arrow's Theorem (which says that dictatorship is the only way to avoid Condorcet's paradox). We also discuss Kalai's Fourier analytic proof of Arrow's theorem. We follow the proof with a brief discussion about the "robustness" of his proof.

(Adapted from the tutorial talk by Ryan O'Donnell at STOC'08)

Computing Market Equilibrium Prices for Linear Fisher Markets

Consider a Fisher market where there are n buyers with fixed budget and m divisible goods. Given prices for the goods, each buyer wants to buy a bundle of goods that maximizes her utility. The market is said to be at an equilibrium when each buyer is allocated with a utility-maximizing bundle of goods and there is no deficit or surplus of goods. Arrow and Debreu proved the equilibrium price exists for a wide range of markets under a mild assumption, but polynomial-time algorithms for finding such a price have drawn attention only relatively recently. In this talk, we will see the basic framework to find such a price in polynomial-time for a special case of markets with linear utility.

The talk is mostly based on the paper "Improved Algorithms for Computing Fisher's Market Clearing Prices" by James B. Orlin in STOC 2010.

Finding, Minimizing and Counting Weighted Subgraphs

We will present several recent algorithms of Vassilevska and Williams (STOC 2009) for finding minimum-weight copies of a given graph H in a larger graph G with node and/or edge weights. Specifically, we'll discuss a general reduction from the problem of finding arbitrary graphs H to the problem of finding triangles of a given weight. We will also present two algorithms for finding weighted triangles; one of the algorithms works well in dense graphs and the other, in sparse ones.

Indexing Feature Selection for Graph Querying

Graph data are becoming more common in many scientific and commercial fields. One of the most important computational problems with databases of graphs is *subgraph querying*: given a particular query graph and a database, the goal is to return all graphs in the database that are supergraphs of the query. "Feature-based graph indexes" are commonly used to filter out all but a (hopefully) small set of candidate graphs which are then verified with subgraph isomorphism tests.

In this talk, I will explain the motivation and basic framework for these indexes and discuss several algorithms for selecting the features used to build indexes. I will explain the principles behind the algorithms' design and compare their empirical performance.