# Theoretical Computer Science Group

## Theory Seminar Schedule for Spring 2013

The weekly Theory seminar provides an opportunity to get exposed to cutting edge research in theoretical computer science, to learn to understand technical material from talks and to give technical presentations. Graduate students interested in doing research in algorithms, complexity theory, cryptography, quantum computing and related areas are strongly encouraged to attend.

Date Speaker Title
Jan 13 Sofya Raskhodnikova Analyzing Graphs with Node Differential Privacy
Jan 21 PSU holiday, no seminar
Jan 28 Grigory Yaroslavtsev Beating the Direct Sum Theorem in Communication Complexity
Feb 4 Adam Smith The Regularity of Lossy RSA on Subdomains and its Applications
Feb 15 2:30 PM, IST 222
Shubhangi Saraf The Method of Multiplicities
Feb 18 2:30 PM, IST 222
Andrew McGregor Graph Sketching and Homomorphic Compression
Feb 25 Meiram Murzabulatov The Discrepancy Method in Communication Complexity
Mar 18 Kashyap Dixit Best known approximation algorithm for the s-t Path Traveling Salesman Problem
Mar 25 Megan Heysham Load Bounds for Two-Choice Hashing
Apr 1 Madhav Jha Testing monotonicity of Boolean functions
Apr 5 2:30 PM, IST 222
Ye Zhang A Subexponential Algorithm to Solve Discrete Logarithm Problem
Apr 8 Rayan Chikhi TBD
Apr 15 Fang Song TBD
Apr 19 2:30 PM, IST 113
Richard Lipton The P vs NP Question: Past, Present, Future
As part of "The James F. Kelly Distinguished Lecture Series"
Apr 22 Nai-Hui Chia TBD
• All talks will be at 2:30 pm on Mondays in 118 Earth and Engineering Science Building (unless they are part of the departmental colloquium). Any unusual date/time/location is highlighted in the schedule.
• If you would like to speak, please email us at: theory [at] cse [dot] psu [dot] edu.
• If you want to receive announcements of upcoming talks, please join the theory mailing list.
Archives

## Analyzing Graphs with Node Differential Privacy

We develop algorithms for the private analysis of network data that provide accurate analysis of realistic networks while satisfying stronger privacy guarantees than those of previous work. We present several techniques for designing node differentially private algorithms, that is, algorithms whose output distribution does not change significantly when a node and all its adjacent edges are added to a graph. We also develop methodology for analyzing the accuracy of such algorithms on realistic networks.

The main idea behind our techniques is to “project” (in one of several senses) the input graph onto the set of graphs with maximum degree below a certain threshold. We design projection operators, tailored to specific statistics that have low sensitivity and preserve information about the original statistic. These operators can be viewed as giving a fractional (low-degree) graph that is a solution to an optimization problem described as a maximum flow instance, linear program, or convex program. In addition, we derive a generic, efficient reduction that allows us to apply any differentially private algorithm for bounded-degree graphs to an arbitrary graph. This reduction is based on analyzing the smooth sensitivity of the “naive” truncation that simply discards nodes of high degree.

## Beating the Direct Sum Theorem in Communication Complexity

A two-party communication problem is given by a function f(x,y), where x and y are held by two different parties (Alice and Bob) connected via a communication channel. The goal of the parties is to compute f(x,y) while minimizing the communication through the channel on the worst-case pair of inputs (x,y). Randomized communication complexity of f with error probability \eps, denoted R_{\eps}(f), is the minimum communication required for a randomized protocol which computes f with error probability at most \eps. Results in the area of communication complexity are used to prove strong information-theoretic lower bounds in different areas of computer science, ranging from data structures to circuit complexity. A direct sum theorem for two parties and a function f states that the communication cost of solving k instances of f simultaneously (computing a vector f(x_1,y_1), ... f(x_k, y_k)) with error probability 1/3 is at least k * R_{1/3}(f), where R_{1/3}(f) is the communication required to solve a single instance of f with error probability 1/3. We improve this for a specific family of functions f, showing that the communication required to solve k instances of f simultaneously with error probability 1/3 is (k R_{1/k}(f)). Since R_{1/k}(f) may be as large as R_{1/3}(f) * log k, we asymptotically beat the standard direct sum bound for such functions. This shows that the trivial upper bound which solves each of the k instances of f with probability 1 - O(1/k) is optimal (correctness follows from the union bound). Our results imply optimal communication/space lower bounds for several sketching problems (e.g. Johnson-Lindenstrauss transform) in a setting when the algorithm should be correct on a sequence of k queries. Joint work with Marco Molinaro and David Woodruff.

## The Regularity of Lossy RSA on Subdomains and its Applications

Rivest, Shamir and Adleman's conjecture, that exponentiation modulo composite numbers is hard to invert, is the basis of many modern encryption and signature schemes. However, the security of most of the schemes resides on (apparently) stronger assumptions than simply the difficulty of inverting exponentiation. For example, it is often assumed that certain specific bits of the input are hard to compute from the output. In this paper, we investigate the relationship between several of these stronger variants of the RSA assumption. Under the "phi-hiding assumption" (which we define in the talk), we show that large consecutive runs of input bits are hard to distinguish from random given the output, and that "PKCS #1 v1.5" and "RSA-OAEP" (two particular encryption schemes based on RSA) are secure under chosen plaintext attacks. We obtain these results by deriving new bounds on how evenly the primitive e-th roots of unity mod N are distributed, for exponents e that divide phi(N). The new bounds use both elementary techniques and recent estimates of Gauss sums over finite subgroups. Our results deepen the connection between combinatorial'' properties of exponentiation in Z_N and the security of RSA-based constructions. The talk will not assume any special background except familiarity with modular arithmetic and basic group theory. Joint work with Eike Kiltz, Mark Lewko and Adam O'Neill. Based on a papers at Crypto 2010 and Eurocrypt 2013.

Two-party (or multi-party) computation allows users who each hold a different part of the input to compute some joint function without sharing their inputs with each other. Traditional security notions have focused on guaranteeing that the output is correct and the inputs are kept secret even if a party behaves maliciously. Fairness, the requirement that no party can abort after learning the output and prevent others from also learning the output, is impossible to achieve in general. In this work, we model the interaction as a game, and model the players as rational utility-maximizing agents. We find that in this setting fairness can be achieved in as broad a set of situations as one could hope for. This gives a new situation in which a game-theoretic setting can be used to circumvent known impossibility results. Joint work with Jonathan Katz.

## The Method of Multiplicities

Polynomials have played a fundamental role in the construction of objects with interesting combinatorial properties, such as error correcting codes, pseudorandom generators, randomness extractors etc. Somewhat strikingly, polynomials have also been found to be a powerful tool in the analysis of combinatorial parameters of objects that have some algebraic structure. This method of analysis has found applications in works on list-decoding of error correcting codes, constructions of randomness extractors, and in obtaining strong bounds for the size of Kakeya Sets. Remarkably, all these applications have relied on very simple and elementary properties of polynomials such as the sparsity of the zero sets of low degree polynomials. In this talk we will discuss improvements on several of the results mentioned above by a more powerful application of polynomials that takes into account the information contained in the *derivatives* of the polynomials. We call this technique the method of multiplicities". The information about higher multiplicity vanishings of polynomials, which is encoded in the derivative polynomials, enables us to meaningfully reason about the zero sets of polynomials of degree much higher than the underlying field size. We will discuss some of these applications of the method of multiplicities, to obtain improved constructions of error correcting codes, and qualitatively tighter analyses of Kakeya sets, and randomness extractors. (Based on joint works with Zeev Dvir, Swastik Kopparty, Madhu Sudan, Sergey Yekhanin)

## Graph Sketching and Homomorphic Compression

Applying random linear projections, a.k.a. "sketching", has become a standard technique when analyzing high-dimensional data sets. The resulting algorithms are embarrassingly parallelizable and suitable for stream processing. However, existing results have largely focused on vector-based problems (e.g., estimating norms and reconstructing sparse signals) and linear-algebraic problems (e.g., regression and low-rank approximation). In this work, we ask whether the richer combinatorial structure of graphs can also be analyzed via small sketches. We present a range of algorithmic results and applications including the first single-pass algorithm for constructing graph sparsifiers of fully-dynamic graph streams (i.e., where edges can be added and deleted in the underlying graph). Based on joint work with Kook Jin Ahn and Sudipto Guha (SODA 12, PODS 12).

## The Discrepancy Method in Communication Complexity

The most commonly studied model in communication complexity theory is the following. There are two players, Alice and Bob. Alice holds input x, Bob holds input y, and neither knows the other's input. They want to compute some joint function f(x,y). In order to do that, they exchange information (bit by bit) according to some protocol chosen beforehand. Their goal is to minimize the communication complexity (that is, the number of bits communicated between them). Lower bounds on communication complexity are used in various fields of computer science, e.g., VLSI circuit design, data structures, optimization of computer networks and property testing. In this talk, we present the discrepancy method that can be used to prove lower bounds on randomized communication complexity, that is, communication complexity over randomized protocols.
The material for this talk is taken from the book "Communication Complexity" by Eyal Kushilevitz and Noam Nisan.

## Best known approximation algorithm for the s-t Path Traveling Salesman Problem

This talk is based on the paper by C. H. An, R. Kleinberg and D. Shmoys called "Improving Christofides algorithm for the s-t path TSP" from STOC 2012. It gives the best known approximation algorithm for a classical NP-complete problem: the s-t Path Traveling Salesman Problem (TSP) over an arbitrary metric. The best approximation algorithm for this problem prior to the work of An, Kleinberg and Shmoys was presented by Hoogeveen in 1991. It was based on Christofides' algorithm (for Circuit TSP) and had approximation ratio 5/3. The approach in [AKS12] uses a linear programming relaxation of the problem.
I will start by explaining Christofides' algorithm. Then I will state the algorithm from [AKS12]. I will present the analysis that gives an approximation ratio of 5/3 for the new algorithm. Then I will give a short overview of the improved analysis that achieves a 1.6577 approximation ratio for this algorithm. (In the paper, this analysis is further improved to get the best known approximation ratio of (1+sqrt(5))/2.)

## Load Bounds for Two-Choice Hashing

When placing n balls in n bins sequentially, if each ball selects a bin uniformly at random, the resulting maximum load of any bin is Theta(ln n/ln ln n) with high probability. If each ball instead selects two bins and is placed in the least full bin, we can achieve better load-balancing. In this talk, I will prove that if each ball selects d >=2 bins, the resulting maximum load of any bin is Theta(ln ln n). This talk is based on a proof from the textbook "Probability and Computing" by Michael Mitzenmacher and Eli Upfal.

## Testing monotonicity of Boolean functions

In this talk, we present the exciting new progress made by Chakrabarty and Seshadhri (STOC 2013) on the problem of Boolean monotonicity testing. This is a classic problem in the study of property testing of functions and has been studied extensively since 1998. In this work, the authors give an O(n^(5/6)) time algorithm for distinguishing Boolean monotone functions from functions which are "far" from monotone. In particular, this upcoming STOC 2013 paper establishes (the long conjectured result) that complexity of Boolean monotonicity testing is sublinear in dimension n.
Boolean monotonicity testing is a generalization of the problem of determining if an array is sorted to higher dimensions. More precisely, a function f : {0,1}^n - > {0,1} is monotone if for every pair of inputs x = (x_1, ..., x_n) and y = (y_1, ..., y_n) satisfying x_i <= y_i for all i, the following holds: f(x) <= f(y). The goal is to design a randomized algorithm which gets oracle access to f and has the following behavior. It accepts monotone functions and rejects (with probability at least 2/3) functions which are epsilon-far from monotone. A function is epsilon-far from monotone if it must be modified in epsilon-fraction of places to make it monotone.

## A Subexponential Algorithm to Solve Discrete Logarithm Problem

Let G be a cyclic group generated by element g. The discrete logarithm problem (DLP) is: given g and an element t in G, find an integer s such that g^s=t. DLP is thought to be a hard problem in many groups, and has many applications in cryptography (encryption and signature schemes, zero-knowledge proof systems, and others). Assuming G has size N, the brute force algorithm to search for DLP takes O(N) time. We can improve the running time to O(\sqrt{N}) without any restriction on G. However, both of these algorithms take time exponential in log(N). In this talk, I will present a randomized algorithm to solve DLP over Z*_p (for prime p) that runs in time exp(\sqrt{log p log log p}), which is subexponential in N=log p. The content of this talk is based on Chapter 16.2 of Shoup's textbook, "A Computational Introduction to Number Theory and Algebra".

## The P vs NP Question: Past, Present, Future

In this talk I will give some history of the great P vs NP question. I will also discuss the problem's current status, some current claims about it, and talk about its future. The talk should be fun for those who are not experts, and also fun for those who are experts (though with the P=NP question, it is unclear who is really an "expert", since we know so little).
Finally, I will outline an approach to the question that I have been actively pursuing for several years now. The end is not in sight, but progress appears possible.