## Theory Seminar Schedule for Fall 2011

The weekly Theory seminar provides an opportunity to get exposed to cutting edge research in theoretical computer science, to learn to understand technical material from talks and to give technical presentations. Graduate students interested in doing research in algorithms, complexity theory, cryptography, quantum computing and related areas are strongly encouraged to attend.- All talks will be at 3:30 pm on Mondays in IST 333 (unless they are part of the departmental colloquium).
- If you would like to speak, please email us at: theory [at] cse [dot] psu [dot] edu.
- If you want to receive announcements of upcoming talks, please
join the
**theory mailing list**.

*Archives*Spring 2007 Fall 2007 Spring 2008 Summer 2008 Fall 2008 Spring 2009 Fall 2009 Spring 2010 Fall 2010 Spring 2011

## Degree Bounded Network Design with Metric Costs

Given a complete undirected graph, a cost function on edges, and a degree bound*B*, the degree bounded network design problem is to find a minimum cost simple subgraph with maximum degree $B$ satisfying given connectivity requirements. Even for a simple connectivity requirement such as finding a spanning tree, computing a feasible solution for the degree bounded network design problem is already NP-hard, and thus there is no known polynomial time algorithm for this problem. In this talk, I will show the proof that when the cost function satisfies the triangle inequality, there are constant factor approximation algorithms for various degree bounded network design problems particularly

*k*-connected minimum cost sub-graph with bounded degree. The algorithm can be seen as a generalization of Christofides' algorithm for the metric traveling salesman problem. The main technical tool is a simplicity-preserving edge splitting-off operation, which is used to “short-cut” vertices (i.e. to remove edges $xu$ and

*xv*and adding the edge

*uv*) with high degree while maintaining connectivity requirements and preserving simplicity (i.e. no new parallel edges are formed) of the solutions.

## Estimating the Longest Increasing Subsequence in Polylogarithmic Time

Finding the length of the longest increasing subsequence (LIS) is a classic algorithmic problem. A small example should explain the problem. In the array 1 9 4 10 6 7, the LIS has length 4 and is 1 4 6 7. Let n denote the size of the input array. Simple textbook solutions achieve an O (n log n) running time, using dynamic programming. What can a sublinear time algorithm say about the LIS? For any constant delta > 0, we construct a polylogarithmic time randomized algorithm that estimates the length of the LIS within an additive error of (delta n). Previously, the best known polylogarithmic time algorithms could only achieve an additive n/2 approximation. Why is this problem challenging? The LIS length is the output of a dynamic program, which means unless we solve many (read linear) sub problems, we cannot get the exact LIS length. We are looking to get extremely accurate (read delta n) approximations in *polylogarithmic time*. The algorithm we construct attempts to follow the progress of a(top-down) dynamic program. In polylogarithmic time, it zeroes in on the "important" sub-problems to solve. In essence, it is a sublinear-time divide and conquer algorithm. This requires some new sublinear algorithm ideas, which we hope will have applications for approximating other dynamic programs. Joint work with Michael Saks.## Testing and Reconstruction of Lipschitz Functions with Applications to Data Privacy

paper link: Madhav Jha, Sofya Raskhodnikova Testing and Reconstruction of Lipschitz Functions with Applications to Data Privacy FOCS 2011## The Complexity of a Line of 9-state Qudits

The Local Hamiltonian problem (approximating the ground energy of a local Hamiltonian) can be thought of as the quantum analogue of the Constraint Satisfaction Problem. We describe Kitaev's reduction from an arbitrary language in the complexity class QMA (the quantum analogue of NP) to the local Hamiltonian problem, showing that k-Local Hamiltonians is QMA-complete. We adapt this reduction to show that the Local Hamiltonian problem restricted to nearest-neighbor constraints on a line of 9-dimensional qudits is QMA-complete## Correlation clustering: a survey

We will consider the problem of finding clustering of a graph, based on qualitative information about similarities between vertices. This problem is known as "correlation clustering" and was introduced by Bansal, Blum and Chawla (FOCS 2002). One important aspect that makes "correlation clustering" different from other clustering problems (such as k-medians, k-means, etc) is that it doesn't assume the input to be embedded into a metric space. Another interesting property is that the number of clusters does not have to be fixed, so the algorithm can choose the number of clusters, which works best for a specific instance. The problem is NP-hard and significant progress in the study of its approximability was made by Charikar, Guruswami and Wirth ("Clustering with qualitative information", FOCS 2003), Ailon, Charikar and Newman ("Aggregating inconsistent information: ranking and clustering", STOC 2005) and Guruswami and Giotis ("Correlation clustering with a fixed number of clusters", SODA 2006). We will survey the main results, obtained in this line of work.## Group Testing and Coding Theory

Group testing was formalized by Dorfman in his 1943 paper and was originally used in WW-II to identify soldiers with syphilis. The main insight in this application is that blood samples from different soldiers can be combined to check if at least one of soldiers in the pool has the disease. Since then group testing has found numerous applications in many areas such as (computational) biology, combinatorics and (theoretical) computer science.Theory of error-correcting codes, or coding theory, was born in the works of Shannon in 1948 and Hamming in 1950. Codes are ubiquitous in our daily life and have also found numerous applications in theoretical computer science in general and computational complexity in particular.

Kautz and Singleton connected these two areas in their 1964 paper by using "code concatenation" to design good group testing schemes. All of the (asymptotically) best known explicit constructions of group testing schemes use the code concatenation paradigm. In this talk, we will focus on the "decoding" problem for group testing: i.e. given the outcomes of the tests on the pools identify the infected soldiers.

Recent applications of group testing in data stream algorithm require sub-linear time decoding, which is not guaranteed by the traditional constructions.

We will show that recent developments in list decoding of codes lead in a modular way to sub-linear time decodable group testing schemes that match the best known parameters of group testing schemes (which might not be efficiently decodable).

## Hilbert's Nullstellensatz and Linear Algebra: Investigating coNP Certificates from a Computational Perspective

Unlike linear models, systems of multivariate polynomial equations over the complex numbers or finite fields can be compactly used to model combinatorial problems. In this way, a problem is feasible (e.g. a graph is 3-colorable, Hamiltonian, etc.) if and only if a given system of polynomial equations has a solution. Via Hilbert's Nullstellensatz, we generate a sequence of large-scale, sparse linear algebra computations from these non-linear models to describe an algorithm for solving the combinatorial problems. As a byproduct of this algorithm, we produce algebraic certificates for the non-3-colorability, non-Hamiltonicity, or non-existence of the underlying combinatorial property modeled by these systems of polynomial equations.In this talk, we present theoretical and experimental results on the size of these sequences. For the independent set problem, we show that the size of the smallest Nullstellensatz-based linear algebra system certifying that there is no independent set of a given size is the independence number of the graph. For non-3-colorability over a finite field, we utilize this method to successfully solve graph problem instances having thousands of nodes and tens of thousands of edges. We also describe methods of optimizing this method, such as finding alternative forms of the Nullstellensatz, adding carefully-constructed polynomials to the system, branching and exploiting symmetry.

## Online Stochastic Matching: Beating 1 - 1/e

paper link: Jon Feldman, Aranyak Mehta, Vahab Mirrokni, S. Muthukrishnan Online Stochastic Matching: Beating 1-1/e FOCS 2009## Approximating Linear Threshold Predicates

We study constraint satisfaction problems with constraints defined by homogeneous linear threshold predicates. Specifically, we consider the standard optimization problem Max-CSP(P) where theobjective is to satisfy as many constraints of the same type P on a number of Boolean variables as possible. We assume that the constraints are defined by a fixed linear threshold predicate P: {-1,+1}^n ? {-1,+1} of the form P(x_1 , ..., x_n ) = sgn(w_1 x_1 + ... + w_n x_n ), for positive integer weights w_i (i=1, ..., n). Our focus is on a range of threshold predicates for which the problem does not become approximation resistant, and we study the approximation curve of this class of problems. For the special case of the majority function with w_1 = ... = w_n = 1 and n odd, we obtain almost-matching approximability and hardness results that can be summarized as follows:* Approximation: Using linear programming, we design a polynomial-time algorithm that, given a 1-?/(n+1) satisfiable instance for any ? < 1, satisfies a 1/2 + (1-?)^7 ?(1/sqrt(n)) fraction of theconstraints.

* Hardness: Assuming the Unique Games Conjecture (Khot, STOC 2002), for any ?>0 and ? in [0,1], given a (1-?/(n+1) - ?)-satisfiable instance it is NP-hard to satisfy a 1/2 + (1-?)?(1/sqrt(n)) + ? fraction of the constraints.

We extend the above results to a more general class of "majority-like" predicates and obtain parallel results for them. Loosely speaking, this class of predicates can be defined using weights w_i that are generally small. Towards this, we introduce the notion of Chow-robustness that might be of independent interest.

[Joint work with Johan Håstad, Marcus Isaksson, and Ola Svensson]

## Sampling classical and quantum populations

Given a classical n-bit string, we can sample part of it and approximate its mean by the sample mean. It is known that random sampling gives good approximation due to Chernoff-type tail inequalities. Consider we are instead given n qubits, and we pick some of them to measure. A natural question is: what can we infer from the measurement outcomes, and how accurate the inference could be?In this talk, I will discuss one of the possible interpretations of sampling a quantum population, appeared in [BF'10], which is in particular suitable for quantum cryptographic applications. A formal framework will be derived, capturing what information can be obtained from a sampling strategy and how accurately it performs. As an application, I will show a quantum oblivious transfer protocol and prove its security using the quantum sampling techniques.

Reference: [BF'10] "Sampling in a quantum population, and applications". Niek Bouman and Serg Fehr. Crypto'2010.

## Emerging Topic Detection Using Dictionary Learning

Streaming user-generated content in the form of blogs, microblogs, forums, and multimedia sharing sites, provides a rich source of data from which invaluable information and insights may be gleaned. Given the vast volume of such social media data being continually generated, one of the challenges is to automatically tease apart the emerging topics of discussion from the constant background chatter. Such emerging topics can be identified by the appearance of multiple posts on a unique subject matter, which is distinct from previous online discourse. We address the problem of identifying emerging topics through the use of dictionary learning.Our objective function is a combination of the L1-norms of a sparse error (robust reconstruction) and a sparse code, which appears well suited for sparse high-dimensional datasets such as those that arise in text applications. Additionally, we have non-negativity constraints on the sparse code and dictionary, to maintain interpretability. We derive a scalable approach by using the alternating directions method to solve the resulting optimization problems. Empirical results on various datasets show that our proposed approach is more effective than several baselines in detecting emerging topics.

Joint work with Prem Melville, Arindam Banerjee, and Vikas Sindhwani.

## Engineering high-performance parallel graph algorithms

The goal of this talk is to get the CSE theory community excited about parallel computing and algorithm design for 'irregular' problems (irregular = memory-intensive, non dense-matrix computations). I will list a series of problems, which, in my opinion, will immensely benefit from new theoretical computer science contributions. My research is on 'engineering' algorithms for real-world data-intensive computations (engineering = algorithm -> efficient implementation with real-world data and experimentation). I will present some motivating applications from genomics and social network analysis, and discuss some of my contributions to recent DIMACS computational challenges on shortest paths and clustering.Here are links to recent parallel computing workshops with talks of a similar nature:

http://dimacs.rutgers.edu/Workshops/Parallel/slides/slides.html

http://www.umiacs.umd.edu/conferences/tmc2009/