# Theoretical Computer Science Group

## Theory Seminar Schedule for Fall 2016

The weekly Theory seminar provides an opportunity to get exposed to cutting edge research in theoretical computer science, to learn to understand technical material from talks and to give technical presentations. Graduate students interested in doing research in algorithms, complexity theory, cryptography, quantum computing and related areas are strongly encouraged to attend.

DateSpeakerTitle
August 26
10:45 am
Organizational Meeting (Location: 357, IST Building)
September 9 Greg Bodwin (Stanford) The 4/3 Additive Spanner Exponent is Tight
September 23 Nai-Hui Chia (PSU) How hard is deciding trivial versus nontrivial in the dihedral coset problem?
September 23 (Colloquium) Jan Holub (CVUT) Approximate String Matching for Self-Indexes
September 30 Nima Haghpanah (PSU, Economics) Sequential Mechanisms with ex-post Participation Guarantees
October 7
10:30-11:45 a.m.
Ramesh Krishnan (PSU) An Isoperimetric Theorem for the Boolean Hypercube
October 14 Om Thakkar (PSU) Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing
October 21 Justin Hsu (UPenn) Differential Privacy as an Approximate Coupling
October 21 (Colloquium) Paul Medvedev (PSU) Assembly of Big Genomic Data
October 28 (Colloquium) Shafi Goldwasser (MIT) Modern Cryptography in the Age of Cloud Computing
(Kelly Distinguished lecture)
November 4 Roksana Baleshzar (PSU) TBD
November 11 Nithin Varma (PSU) TBD
November 18 Jiayu Zhang (PSU) TBD
November 18 (Colloquium) Vladimir Braverman (Johns Hopkins) TBD
November 25 Thanksgiving week, no seminar
December 2 Ishan Behoora (PSU) TBD
December 9 Meiram Murzabulatov (PSU) TBD
• Talks will generally be held on Fridays from 10:45-11:45 a.m. in 219 Hammond Building (unless they are part of the departmental colloquium, in which case they will be held from 2-3 p.m. in 113 IST Building (Cybertorium)). Any unusual date/time/location is highlighted in the schedule.
• If you would like to speak, please email us at: theory [at] cse [dot] psu [dot] edu.
• If you want to receive announcements of upcoming talks, please join the theory mailing list.
Archives

## The 4/3 Additive Spanner Exponent is Tight

A $+k$ additive spanner is a sparse subgraph that preserves all pairwise distances up to additive error $+k$. A classic result in this field states that all graphs have +6 spanners on $n^{4/3}$ edges, but no sparser spanners are known for any constant $k$. Meanwhile, current lower bound allow for the possibility of constant-error additive spanners on $n^{1 + \epsilon}$ edges for any $\epsilon > 0$. It is considered a central open problem to prove the existence/nonexistence of these nearly-linear sized additive spanners.

We resolve this question in the negative by constructing a family of graphs that provably have no $+n^{o(1)}$ additive spanners on $n^{4/3 - \epsilon}$ edges. In this talk, we will describe the construction technique. We will then discuss some extensions and implications of this new lower bound.

Joint work with Amir Abboud.

## How hard is deciding trivial versus nontrivial in the dihedral coset problem?

We study the hardness of the dihedral hidden subgroup problem. It is known that lattice problems reduce to it, and that it reduces to random subset sum with density $> 1$ and also to quantum sampling subset sum solutions. We examine a decision version of the problem where the question asks whether the hidden subgroup is trivial or order two. The decision problem essentially asks if a given vector is in the span of all coset states. We approach this by first computing an explicit basis for the coset space and the perpendicular space. We then look at the consequences of having efficient unitaries that use this basis. We show that if a unitary maps the basis to the standard basis in any way, then that unitary can be used to solve random subset sum with constant density $>1$. We also show that if a unitary can exactly decide membership in the coset subspace, then the collision problem for subset sum can be solved for density $>1$ but approaching $1$ as the problem size increases. This strengthens the previous hardness result that implementing the optimal POVM in a specific way is as hard as quantum sampling subset sum solutions.

This is a joint work with Sean Hallgren and is going to be presented in TQC2016.

## Approximate String Matching for Self-Indexes

String matching is very frequent task of locating all occurrence of pattern P in text T. When we have text T in advance we can construct an index for the text that allows to search for the pattern in time linear to the length of the pattern. Self-index is an index which in addition allows to replace the original text. So it allows queries retrieving the text. However, only exact string matching is allowed by self-indexes. The lecture presents an interface between approximate string matching and self-indexes.

## Sequential Mechanisms with ex-post Participation Guarantees

We provide a characterization of revenue-optimal dynamic mechanisms in settings where a monopolist sells $k$ items over $k$ periods to a buyer who realizes his value for item $i$ in the beginning of period $i$. We require that the mechanism satisfies a strong individual rationality constraint, requiring that the stage utility of each agent be positive during each period. We show that the optimum mechanism can be computed by solving a nested sequence of static (single-period) mechanisms that optimize a tradeoff between the surplus of the allocation and the buyer's utility. We also provide a simple dynamic mechanism that obtains at least half of the optimal revenue. The mechanism either ignores history and posts the optimal monopoly price in each period, or allocates with a probability that is independent of the current report of the agent and is based only on previous reports. Our characterization extends to multi-agent auctions. We also formulate a discounted infinite horizon version of the problem, where we study the performance of "Markov mechanisms."

Joint work with Itai Ashlagi and Costis Daskalakis.

## An Isoperimetric Theorem for the Boolean Hypercube

An Isoperimetric theorem is a geometric inequality that establishes the relationship between the circumference of a closed curve in the plane and the area it encloses. For functions $f:\{0,1\}^d\rightarrow \{0,1\}$, there are analogous isoperimetric theorems connecting the cardinality of $\{x\in \{0,1\}^d:f(x)=1\}$ and that of $\{(x,y):f(x)=1, f(y)=0;x,y \text{ differ in exactly one coordinate}\}$. In this talk, we will look at the proof of an isoperimetric theorem due to Khot et al. (2015), which is used to design a monotonicity tester for Boolean functions on {0,1}^d with optimal query complexity.

Based on the paper “On Monotonicity Testing and Boolean Isoperimetric type Theorems” by Khot, Minzer and Safra (2015).

## Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing

Hypothesis tests are run on data to indicate if the dataset is likely to have been drawn from one of a particular set of distributions (called the “null hypothesis”). Classical statistical theory assumes that the test to be performed is chosen independently of the data. In practice, however, tests are often selected as a function of the data. As a result, the false positive rate for such ‘post-selection’ hypothesis tests is much higher than what classical statistical theory predicts.

‘Adaptive data analysis’ refers to the reuse of data for analyses that were suggested by previously computed statistics on the same data. In addition to post-selection hypothesis testing, adaptive data analysis captures many real-world settings: machine learning competitions where a validation set is used to evaluate multiple submissions from participants, learning algorithms in which feature selection is followed by regression, and when a research paper inspires subsequent work based on the same dataset. In each of these settings, adaptivity can easily lead to overfitting and incorrect conclusions such as false positives.

In this talk, we look at how to prevent overfitting in such settings by restricting the class of analyses that are allowed on the data. We observe that limiting the ‘max-information’ leaked by an analysis limits overfitting in subsequent analyses. We show bounds on the max-information leaked by algorithms that satisfy ‘approximate differential privacy’ (also known as $(\epsilon, \delta)$-differential privacy), when the dataset is drawn from a product distribution. This demonstrates that sequences of differentially private analyses can use a single dataset while providing statistically valid answers. Our techniques generalize and partly unify results of previous works (Dwork et al. (2015), Bassily et al. (2016), Russo and Zou (2016)).

Based on joint work with Ryan Rogers, Aaron Roth, and Adam Smith (FOCS 2016).

## Differential Privacy as an Approximate Coupling

Approximate lifting is a concept from formal verification for reasoning about pairs of probabilistic programs, similar to an approximate notion of probabilistic couplings. Recently, we have explored an interesting connection: differential privacy is a consequence of a particular form of approximate lifting. This view yields a composition principle for differential privacy that is more general than standard composition, so that privacy for algorithms like Sparse Vector and the Exponential mechanism follow from an intuitive composition argument. In this talk we will present the connection in more detail and describe its implications both for formal verification of differential privacy and for the theory of differential privacy.

Joint with Gilles Barthe, Marco Gaboradi, Benjamin Gregoire, and Pierre-Yves Strub.

## Assembly of Big Genomic Data

As genome sequencing technologies continue to facilitate the generation of large datasets, developing scalable algorithms has come to the forefront as a crucial step in analyzing these datasets. In this talk, I will discuss several recent advances, with a focus on the problem of reconstructing a genome from a set of reads (genome assembly). I will describe low-memory and scalable algorithms for automatic parameter selection and de Bruijn graph compaction, recently implemented in two tools KmerGenie and bcalm. I will also present recent advances in the theoretical foundations of genome assemblers.

## Modern Cryptography in the Age of Cloud Computing

Going beyond the basic challenge of private communication, in the last 35 years, cryptography has become the general study of correctness and privacy of computation in the presence of a computationally bounded adversary, and as such has changed how we think of proofs, secrets, and information.

In this talk I will discuss some beautiful developments in the theory of cryptography, and the role it can play in the next successful shift from local to global computation.

Short Bio: Goldwasser is the RSA Professor of Electrical Engineering and Computer Science at MIT. She is also a professor of computer science and applied mathematics at the Weizmann Institute of Science in Israel. Goldwasser received a BS degree in applied mathematics from Carnegie Mellon University in 1979, and MS and PhD degrees in computer science from the University of California, Berkeley, in 1984.

Goldwasser's pioneering contributions include the introduction of probabilistic methods in cryptography, interactive proofs, zero knowledge protocols, pseudo random functions, hardness of approximation proofs for combinatorial problems, and multi-party secure protocols.

She was the recipient of the ACM Turing Award for 2012, the Gödel Prize in 1993 and another in 2001, the ACM Grace Murray Hopper award, the RSA award in mathematics, the ACM Athena award for women in computer science, the Benjamin Franklin Medal and the IEEE Emanuel R. Piore award. She is a member of the AAAS, NAS and NAE.