# Theoretical Computer Science Group

## Theory Seminar Schedule for Fall 2016

The weekly Theory seminar provides an opportunity to get exposed to cutting edge research in theoretical computer science, to learn to understand technical material from talks and to give technical presentations. Graduate students interested in doing research in algorithms, complexity theory, cryptography, quantum computing and related areas are strongly encouraged to attend.

DateSpeakerTitle
August 26
10:45 am
Organizational Meeting (Location: 357, IST Building)
September 9 Greg Bodwin (Stanford) The 4/3 Additive Spanner Exponent is Tight
September 23 Nai-Hui Chia (PSU) How hard is deciding trivial versus nontrivial in the dihedral coset problem?
September 23 (Colloquium) Jan Holub (CVUT) Approximate String Matching for Self-Indexes
September 30 Nima Haghpanah (PSU, Economics) Sequential Mechanisms with ex-post Participation Guarantees
October 7
10:30-11:45 a.m.
Ramesh Krishnan (PSU) An Isoperimetric Theorem for the Boolean Hypercube
October 14 Om Thakkar (PSU) Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing
October 21 Justin Hsu (UPenn) Differential Privacy as an Approximate Coupling
October 21 (Colloquium) Paul Medvedev (PSU) Assembly of Big Genomic Data
October 28 (Colloquium) Shafi Goldwasser (MIT) Modern Cryptography in the Age of Cloud Computing
(Kelly Distinguished lecture)
November 4 Roksana Baleshzar (PSU) Local Computation Algorithms with Query Complexity Independent of the Input Size
November 11 Nithin Varma (PSU) A Subcubic-time Algorithm for a Special Matrix Product and its Applications
November 18 Jiayu Zhang (PSU) Privacy Amplification and Randomness Extractors
November 18 (Colloquium) Vladimir Braverman (Johns Hopkins) Streaming and Sketching Algorithms and Their Applications
November 25 Thanksgiving week, no seminar
December 2 Ishan Behoora (PSU) Progressive Algorithms
December 9 Meiram Murzabulatov (PSU) The Power and Limitations of Uniform Samples in Testing Properties of Figures
• Talks will generally be held on Fridays from 10:45-11:45 a.m. in 219 Hammond Building (unless they are part of the departmental colloquium, in which case they will be held from 2-3 p.m. in 113 IST Building (Cybertorium)). Any unusual date/time/location is highlighted in the schedule.
• If you would like to speak, please email us at: theory [at] cse [dot] psu [dot] edu.
• If you want to receive announcements of upcoming talks, please join the theory mailing list.
Archives

## The 4/3 Additive Spanner Exponent is Tight

A $+k$ additive spanner is a sparse subgraph that preserves all pairwise distances up to additive error $+k$. A classic result in this field states that all graphs have +6 spanners on $n^{4/3}$ edges, but no sparser spanners are known for any constant $k$. Meanwhile, current lower bound allow for the possibility of constant-error additive spanners on $n^{1 + \epsilon}$ edges for any $\epsilon > 0$. It is considered a central open problem to prove the existence/nonexistence of these nearly-linear sized additive spanners.

We resolve this question in the negative by constructing a family of graphs that provably have no $+n^{o(1)}$ additive spanners on $n^{4/3 - \epsilon}$ edges. In this talk, we will describe the construction technique. We will then discuss some extensions and implications of this new lower bound.

Joint work with Amir Abboud.

## How hard is deciding trivial versus nontrivial in the dihedral coset problem?

We study the hardness of the dihedral hidden subgroup problem. It is known that lattice problems reduce to it, and that it reduces to random subset sum with density $> 1$ and also to quantum sampling subset sum solutions. We examine a decision version of the problem where the question asks whether the hidden subgroup is trivial or order two. The decision problem essentially asks if a given vector is in the span of all coset states. We approach this by first computing an explicit basis for the coset space and the perpendicular space. We then look at the consequences of having efficient unitaries that use this basis. We show that if a unitary maps the basis to the standard basis in any way, then that unitary can be used to solve random subset sum with constant density $>1$. We also show that if a unitary can exactly decide membership in the coset subspace, then the collision problem for subset sum can be solved for density $>1$ but approaching $1$ as the problem size increases. This strengthens the previous hardness result that implementing the optimal POVM in a specific way is as hard as quantum sampling subset sum solutions.

This is a joint work with Sean Hallgren and is going to be presented in TQC2016.

## Approximate String Matching for Self-Indexes

String matching is very frequent task of locating all occurrence of pattern P in text T. When we have text T in advance we can construct an index for the text that allows to search for the pattern in time linear to the length of the pattern. Self-index is an index which in addition allows to replace the original text. So it allows queries retrieving the text. However, only exact string matching is allowed by self-indexes. The lecture presents an interface between approximate string matching and self-indexes.

## Sequential Mechanisms with ex-post Participation Guarantees

We provide a characterization of revenue-optimal dynamic mechanisms in settings where a monopolist sells $k$ items over $k$ periods to a buyer who realizes his value for item $i$ in the beginning of period $i$. We require that the mechanism satisfies a strong individual rationality constraint, requiring that the stage utility of each agent be positive during each period. We show that the optimum mechanism can be computed by solving a nested sequence of static (single-period) mechanisms that optimize a tradeoff between the surplus of the allocation and the buyer's utility. We also provide a simple dynamic mechanism that obtains at least half of the optimal revenue. The mechanism either ignores history and posts the optimal monopoly price in each period, or allocates with a probability that is independent of the current report of the agent and is based only on previous reports. Our characterization extends to multi-agent auctions. We also formulate a discounted infinite horizon version of the problem, where we study the performance of "Markov mechanisms."

Joint work with Itai Ashlagi and Costis Daskalakis.

## An Isoperimetric Theorem for the Boolean Hypercube

An Isoperimetric theorem is a geometric inequality that establishes the relationship between the circumference of a closed curve in the plane and the area it encloses. For functions $f:\{0,1\}^d\rightarrow \{0,1\}$, there are analogous isoperimetric theorems connecting the cardinality of $\{x\in \{0,1\}^d:f(x)=1\}$ and that of $\{(x,y):f(x)=1, f(y)=0;x,y \text{ differ in exactly one coordinate}\}$. In this talk, we will look at the proof of an isoperimetric theorem due to Khot et al. (2015), which is used to design a monotonicity tester for Boolean functions on {0,1}^d with optimal query complexity.

Based on the paper “On Monotonicity Testing and Boolean Isoperimetric type Theorems” by Khot, Minzer and Safra (2015).

## Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing

Hypothesis tests are run on data to indicate if the dataset is likely to have been drawn from one of a particular set of distributions (called the “null hypothesis”). Classical statistical theory assumes that the test to be performed is chosen independently of the data. In practice, however, tests are often selected as a function of the data. As a result, the false positive rate for such ‘post-selection’ hypothesis tests is much higher than what classical statistical theory predicts.

‘Adaptive data analysis’ refers to the reuse of data for analyses that were suggested by previously computed statistics on the same data. In addition to post-selection hypothesis testing, adaptive data analysis captures many real-world settings: machine learning competitions where a validation set is used to evaluate multiple submissions from participants, learning algorithms in which feature selection is followed by regression, and when a research paper inspires subsequent work based on the same dataset. In each of these settings, adaptivity can easily lead to overfitting and incorrect conclusions such as false positives.

In this talk, we look at how to prevent overfitting in such settings by restricting the class of analyses that are allowed on the data. We observe that limiting the ‘max-information’ leaked by an analysis limits overfitting in subsequent analyses. We show bounds on the max-information leaked by algorithms that satisfy ‘approximate differential privacy’ (also known as $(\epsilon, \delta)$-differential privacy), when the dataset is drawn from a product distribution. This demonstrates that sequences of differentially private analyses can use a single dataset while providing statistically valid answers. Our techniques generalize and partly unify results of previous works (Dwork et al. (2015), Bassily et al. (2016), Russo and Zou (2016)).

Based on joint work with Ryan Rogers, Aaron Roth, and Adam Smith (FOCS 2016).

## Differential Privacy as an Approximate Coupling

Approximate lifting is a concept from formal verification for reasoning about pairs of probabilistic programs, similar to an approximate notion of probabilistic couplings. Recently, we have explored an interesting connection: differential privacy is a consequence of a particular form of approximate lifting. This view yields a composition principle for differential privacy that is more general than standard composition, so that privacy for algorithms like Sparse Vector and the Exponential mechanism follow from an intuitive composition argument. In this talk we will present the connection in more detail and describe its implications both for formal verification of differential privacy and for the theory of differential privacy.

Joint with Gilles Barthe, Marco Gaboradi, Benjamin Gregoire, and Pierre-Yves Strub.

## Assembly of Big Genomic Data

As genome sequencing technologies continue to facilitate the generation of large datasets, developing scalable algorithms has come to the forefront as a crucial step in analyzing these datasets. In this talk, I will discuss several recent advances, with a focus on the problem of reconstructing a genome from a set of reads (genome assembly). I will describe low-memory and scalable algorithms for automatic parameter selection and de Bruijn graph compaction, recently implemented in two tools KmerGenie and bcalm. I will also present recent advances in the theoretical foundations of genome assemblers.

## Modern Cryptography in the Age of Cloud Computing

Going beyond the basic challenge of private communication, in the last 35 years, cryptography has become the general study of correctness and privacy of computation in the presence of a computationally bounded adversary, and as such has changed how we think of proofs, secrets, and information.

In this talk I will discuss some beautiful developments in the theory of cryptography, and the role it can play in the next successful shift from local to global computation.

Short Bio: Goldwasser is the RSA Professor of Electrical Engineering and Computer Science at MIT. She is also a professor of computer science and applied mathematics at the Weizmann Institute of Science in Israel. Goldwasser received a BS degree in applied mathematics from Carnegie Mellon University in 1979, and MS and PhD degrees in computer science from the University of California, Berkeley, in 1984.

Goldwasser's pioneering contributions include the introduction of probabilistic methods in cryptography, interactive proofs, zero knowledge protocols, pseudo random functions, hardness of approximation proofs for combinatorial problems, and multi-party secure protocols.

She was the recipient of the ACM Turing Award for 2012, the Gödel Prize in 1993 and another in 2001, the ACM Grace Murray Hopper award, the RSA award in mathematics, the ACM Athena award for women in computer science, the Benjamin Franklin Medal and the IEEE Emanuel R. Piore award. She is a member of the AAAS, NAS and NAE.

## Local Computation Algorithms with Query Complexity Independent of the Input Size

Suppose you are computing on a very large input and, moreover, the answer to your computational problem is long. Local Computation Algorithms (LCAs) allow you to quickly answer queries about parts of one arbitrary legal solution to the problem. For example, such an algorithm can answer whether a given node is in a vertex cover of the input graph. Replies about different nodes have to be consistent with one vertex cover.

In this talk, we will discuss LCAs from "Constant-Time Local Computation Algorithms" by Yishay Mansour, Boaz Patt-Shamir and Shai Vardi (2014). We will consider the following problems: (1) Maximum Forest in a weighted graph and (2) Multicommodity Flow and Multicut on trees with edges with integer capacities. We will give approximation LCAs for these problems with time complexity independent of the size of the input.

## A Subcubic-time Algorithm for a Special Matrix Product and its Applications

Given two real-valued matrices $A$ and $B$, the (min,+) product of $A$ and $B$ is the matrix $C$ such that $C_{ij}$ is equal to the minimum entry in the sum of the i-th row of A and j-th column of B. A Bounded Differences matrix (BD matrix) is one in which any two consecutive entries differ by at most a constant. Several classical problems, such as the language edit distance problem, RNA folding, and optimum stack generation, can be cast as computing the (min,+) product of appropriate BD matrices. Until recently, none of these problems had algorithms that ran in time $O(n^k)$ where $k < 3$.

In this talk, we will discuss a recent $O(n^{2.8244})$ time algorithm, due to Bringmann, Grandoni, Saha, and Vassilevska Williams, for computing the (min,+) product of BD matrices, and also touch upon some of the applications.

## Privacy Amplification and Randomness Extractors

Privacy amplification, also called randomness extraction, is used to extract a uniformly random string from a somewhat random string. In this talk, we will prove the leftover hash lemma (LOHL), which gives an explicit construction and analysis of a randomness extractor. The LOHL says that if we apply a pairwise independent hash function to the somewhat random string, then the statistical distance from the distribution of the hash result to the uniform distribution will be bounded. As long as the output length is slightly smaller than the min entropy in the input, we get a sufficiently random output.

We will also generalize this result to a conditional version (useful in applications). Finally, we will show another construction of a randomness extractor which uses a much shorter seed to achieve output length nearly as good as achieved by the LOHL.

## Streaming and Sketching Algorithms and Their Applications

Streaming and sketching algorithms have found applications in machine learning, astronomy, medicine, networking, natural language processing and other disciplines. The practicality of streaming and sketching algorithms stems from (1) simplicity and generality of the streaming model and (2) the ability to provide results in real time (e.g., in network monitoring) and represent Big Data with small “sketches” (e.g., in astronomy, statistics).

In this talk we will give a gentle introduction to streaming and sketching algorithms and demonstrate their applicability in cosmological N-body simulations, network monitoring, and statistical inference on massive graphs. No prior knowledge of streaming or sketching algorithms is required.

Short Bio: Vladimir Braverman is an Assistant Professor in the Department of Computer Science in the Whiting School of Engineering at the Johns Hopkins University. He received his PhD from UCLA in 2011. His main research interests include streaming and sketching algorithms, in particular, universal sketches for norms and functions. His research is supported by DARPA, NSF, Google, and Nvidia.

## Privacy Amplification and Randomness Extractors

Progressive algorithms are algorithms which, on the way to computing a complete solution to a given problem, output intermediate solutions that approximate the final solution increasing well.

We go over a framework for analyzing such algorithms. We begin initially within the context of sorting and thereafter utilise the framework to provide a progressive algorithm for the $k$-popular places problem.

The popular places problem asks which regions are visited by at least a certain number of entities. More formally, you are given $k$ and a set of trajectories of entities. The trajectories are in the form of points visited in $\mathbb{R}2$. Now we want to find regions in $\mathbb{R}2$ such that at least $k$ of the entities visited those regions.

## The Power and Limitations of Uniform Samples in Testing Properties of Figures

We study property testing of 2-dimensional figures $(U,C)$ (where $U$ is a 2-dimensional set and $C$ is a subset of $U$) that represent a black object $C$ on a white background $U\backslashC$. For $\epsilon$ in $(0,1/2)$, a figure $(U,C)$ is $\epsilon$-far from a given property if for every figure $(U,C')$ over the same universe that has the property, the probability of the symmetric difference between $(U,C)$ and $(U,C')$ under the uniform distribution on $U$ is at least $\epsilon$. A property tester has to accept the input figure with probability at least $2/3$ if it satisfies a given property and reject the figure with probability at least $2/3$ if the figure is $\epsilon$-far from satisfying the property. Generally, property testers can query points of their choice from the input figure and obtain the colors of those points.

In this talk, we prove that for the property of being a half-plane, uniform testers can be as powerful as general testers: $O(1/\epsilon)$ uniform samples are enough for them. However, for the convexity property, we show that it can be tested with only $O(1/\epsilon)$ samples by general testers but every uniform tester for that property requires $O(\epsilon^{-5/4})$ samples.

Joint work with Sofya Raskhodnikova and Piotr Berman.