## Theory Seminar Schedule for Spring 2011

The weekly Theory seminar provides an opportunity to get exposed to cutting edge research in theoretical computer science, to learn to understand technical material from talks and to give technical presentations. Graduate students interested in doing research in algorithms, complexity theory, cryptography, quantum computing and related areas are strongly encouraged to attend.- All talks will be at 1 pm on Tuesdays in IST 223B (unless they are part of the departmental colloquium).
- If you would like to speak, please email us at: theory [at] cse [dot] psu [dot] edu.
- If you want to receive announcements of upcoming talks, please
join the
**theory mailing list**.

*Archives*Spring 2007 Fall 2007 Spring 2008 Summer 2008 Fall 2008 Spring 2009 Fall 2009 Spring 2010 Fall 2010

## Local Graph Partitioning using PageRank Vectors

paper link: Reid Andersen, Fan Chung, and Kevin Lang Local Graph Partitioning using PageRank Vectors in STOC 2006## Design of Directed Networks to Minimize Branching

Kashyap Dixit will discuss two problem on design of directed networks where we minimize (or limit) the branching in the network.In the simplest version, we want to design a directed tree that reaches all target nodes from a source node, while each node has a bound on the number of edges that exit that nodes and are used in the tree. Verifying that a solution satisfying such bound exists is NP-complete, but a polynomial time algorithm can find a solution that increases sufficient bounds by 2.

Other version minimize the cost of an arborescence that satisfies such requirements and more general connectivity requirements.

The results discussed in this talked were published as Additive Guarantees for Degree Bounded Directed Network Design by N. Bansal, R. Kandhekar and V. Nagarajan in STOC 2008.

## Improved Approximation Algorithms for Directed Steiner Forest

paper link: Moran Feldman, Guy Kortsarz, and Zeev Nutov Improved Approximation Algorithms for Directed Steiner Forest SODA 2006## Improved Approximation for the Directed Spanner Problem

Grigory Yaroslavtsev will present an algorithm for k-spanner in directed graph with approximation ratio $\sqrt{n}\log n$. The technique involves randomized rounding of a linear programming solution. This is a recent joint work with Berman, Bhattacharyya, Makarychev and Raskhodnikova.## Constant-Round Black-Box Zero-Knowledge Proofs

In this talk, I will discuss the question of round complexity in zero-knowledge (ZK) proofs in the sense of black-box simulation. Both impossibility and possibility results will be covered. Specifically, I will review the result of Goldreich and Krawczyk that only languages in BPP have 3-round black-box ZK proofs. This in particular implies parallel composing Blum's basic ZK proof for NP languages will no longer be black-box ZK. On the other hand, based on reasonable computational assumptions, there exist 5-round ZK proofs for any NP language (due to Goldreich and Kahan). Finally, I will conclude with brief discussions on round complexity of ZK proofs where verifiers may be quantum.Reference:

- Oded Goldreich, Hugo Krawczyk: On the Composition of Zero-Knowledge Proof Systems.SIAM J. Comput. 25(1): 169-192 (1996)
- Oded Goldreich, Ariel Kahan: How to Construct Constant-Round Zero-Knowledge Proof Systems for NP. J. Cryptology 9(3): 167-190 (1996)
- Jon Katz: Which languages have 4-round ZK proofs? in TCC 2008
- Rahul Jain, Alexandra Kolla, Gatis Midrijanis, Ben W. Reichardt: On parallel composition of zero-knowledge proofs with black-box quantum simulators. Quant. Inf. Comp. 9:513-532, 2009

## Privately Releasing Conjunctions and the Statistical Query Barrier

Suppose we would like to know all answers to a set of statistical queries*C*on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in

*C*. Can we do any better?

We show that the number of statistical queries necessary and sufficient for this task is -- up to polynomial factors -- equal to the agnostic learning complexity of

*C*in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern, and has important implications for private data release. First, it completely characterizes what classes of queries can be released in the randomized response based "Local Privacy" model of Kasiviswanathan et al, and shows that in terms of query release, the local privacy model is substantially weaker than the centralized model. In particular, in the local privacy model, it is not possible to release even monotone conjunctions (contingency tables) to sublinear error. It also provides unconditional running time lower bounds on any algorithm that can be implemented using only statistical queries, independent of its privacy guarantees. This class of algorithms encompasses almost all techniques for private data release that we are aware of. Specifically, no such mechanism can release any datastructure representing even monotone conjunctions to sublinear average error in time polynomial in the dimension of the data.

We then show that if the class of queries can be jointly described by a submodular function, than it is possible to release them to average error

*epsilon*n*using

*d*statistical queries. Thus, without escaping the statistical query barrier, we can release conjunctions to linear error in time polynomial in

^{(1/epsilon)}*d*(the dimension of the data).

This talk is based on joint work with Anupam Gupta, Moritz Hardt, and Jonathan Ullman.

## Secure Multiparty Regression Based on Homomorphic Encryption

In this work we develop techniques to securely compute regression coefficients for a dataset that is shared between a number of distinct parties (e.g., institutions, government branches etc.). We conceptualize the existence of a single combined database containing all of the information for the individuals in the separate databases and for the union of the variables. We propose an approach that gives full statistical calculation on this combined database without actually combining information sources. We focus on computing linear regression and logistic regression estimates, as well as certain goodness of fit statistics. We make use of homomorphic encryption in constructing a protocol for regression analysis which adheres to the definitions of security laid out in the cryptography literature. Our approach provides only the final result of the calculations compared with other methods that share intermediate values and thus present an opportunity for compromise of privacy. We perform an experiment on data taken from the current population survey, with 50000 cases and 22 covariates, to show that our approach is practical for moderate sized problems.## Communication Complexity

In two-party communication complexity, Alice and Bob wish to evaluate a function*f(x, y)*, where

*x*is Alice's input and

*y*is Bob's input. To compute

*f*, they exchange messages according to a pre-specified protocol and the goal is to compute the correct answer with as little communication as possible. Communication Complexity is increasingly becoming an important tool for proving lower bounds in a variety of applications. In this talk, we give an introduction to the area of Communication Complexity, describing some important models, lower bound techniques and some applications.

References: Eyal Kushilevitz and Noam Nisan. Communication Complexity. Cambridge University Press, Cambridge, 1997.

## Analysis of Differentially-Private Synthetic Data Sets

In the hope of limiting the risks of disclosure, statistical agencies use a host of data reduction and data perturbation methods before publishing data sets. Most of them heuristically provide protection, but none offers any formal guarantee of confidentiality protection. One way of ensuring and quantifying confidentiality protection is with differential privacy, a powerful criterion which provides a strict, measurable, guarantee of confidentiality. Several techniques have now been proposed to create completely synthetic data sets which satisfy differential privacy. Our interest is in methods for users to analyze such synthetic data sets. We will show that inferences from usual statistical methods are not necessarily valid when the data sets are created to achieve differential privacy. We also propose a solution to analyze differentially-private synthetic data sets by directly modeling the data generation process in a Bayesian framework.## Fast Approximation of the Permanent for Very Dense Problems

Approximation of the permanent of a matrix with nonnegative entries is a well studied problem. The most successful approach to date for general matrices uses Markov chains to approximately sample from a distribution on weighted permutations, and Jerrum, Sinclair, and Vigoda developed such a method they proved runs in polynomial time in the input. The current bound on the running time of their method is O(n7(log n)4). Here we present a very different approach using sequential acceptance/rejection, and show that for a class of dense problems this method has an O(n4 log n) expected running time.## Compressibility of Behavioral Graphs

Graphs resulting from human behavior (the web graph, social networks, etc.) have hitherto been viewed as a monolithic class with similar characteristics. Our focus here is on the compressibility of such graphs. It has been empirically shown that Web graphs can be compressed down to three bits of storage per edge. In light of this, we address two basic questions: are social networks as compressible as Web graphs and are there tractable and realistic models that yield highly compressible graphs?## Firefighter Problem

In a graph fire starts at node r, and afterwards the firefighter in every step can select one node, if it is not on fire, so this node becomes fire resistance. In the second part of the step the fire continues on the nodes that are on fire, and it also spreads to the adjacent nodes that are not fire resistant. The goal is to maximize the number of nodes that are saved from fire.This talk will present algorithms for the case when the graph is a tree. This case is NP-hard even if the maximum node degree is 3.

## Knowledge States: A Tool in Randomized Online Algorithms

We introduce the concept of knowledge states; many well known algorithms can be viewed as knowledge state algorithms. The knowledge state approach can be used to construct competitive randomized online algorithms and study the tradeoff between competitiveness and memory.We give new results for the two server problem and the paging problem. We present a knowledge state algorithm for the two server problem over Cross Polytope Spaces with a competitive ratio of 19/12 and show that it is optimal against the oblivious adversary.