The weekly Theory seminar provides an opportunity to get exposed to
cutting edge research in theoretical computer science, to learn to
understand technical material from talks and to give technical
presentations. Graduate students interested in doing research in
algorithms, complexity theory, cryptography, quantum
computing and related areas are strongly encouraged to attend.

All talks will be at 2:30 pm on Mondays in 118 Earth and Engineering Science Building (unless they
are part of the departmental colloquium). Any unusual date/time/location is highlighted in the schedule.

If you would like to speak,
please email us at: theory [at] cse [dot] psu [dot] edu.

If you want to receive announcements of upcoming talks, please
join the theory
mailing list.

We initiate a systematic study of sublinear-time algorithms for approximately testing properties of real-valued data, where the quality of
approximation is measured with respect to L_p distances. L_p distances generalize the standard Euclidian distance (L_2) and the Hamming distance (L_0). By real-valued data,
we mean datasets that are meaningfully represented as real-valued functions. Our algorithms, called L_p-testers, are required to distinguish datasets that have a required property
from those that are far in L_p distance from any dataset with the required property. This is a generalization of standard property testers, which are defined with respect to the
Hamming distance. Using general L_p distances is more appropriate for many computational tasks on real-valued data.

We use our framework to design simple and fast algorithms for classic problems, such as testing monotonicity, convexity and the Lipschitz property, and also approximating
the distance to monotonicity. For some of these problems, our L_p-testers for p>=1 are faster than possible in the standard property testing framework.

In the talk, we will explain and motivate the new framework, give an overview of the results, and explain some of the techniques.

Joint work with Piotr Berman and Grigory Yaroslavtsev, STOC 2014.

Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

The volume of high-quality
data that we can access, analyze, and learn from has reached astronomical figures. Since these data often contain sensitive or
personally identifiable information, privacy has become a subject of a major concern. As it is hard to intuitively reason about
the notion of privacy, we were compelled to use the rigorous mathematical approach for this matter. The notion of differential
privacy [Dwork-McSherry-Nissim-Smith, 2006] is one of the most powerful and widely-accepted mathematical definitions of privacy.
Roughly speaking, a (randomized) algorithm that takes a data set as an input is said to be differentially private if, for any pair
of data sets D and D’ that differ only in one record, the distribution of the algorithm’s output remains ``almost’’ the same
whether it is run on D or D’.

In this talk, we consider the problem of Empirical Risk Minimization (ERM) under the differential privacy constraint. The notion of ERM is at the heart of learning theory and statistics, and is a central tool in machine learning algorithms. In the traditional convex ERM, we are given a data set {d_1, …, d_n}, together with a convex set C known as the parameter set. For a given
parameter w in C, each data point d_i contributes a loss L(w; d_i). Given a data set {d_1,..., d_n}, the algorithm's goal is to find a parameter w that minimizes the empirical risk, defined as \sum_{i=1}^n L(w;d_i).

In this talk, we show how to construct efficient differentially private algorithms with optimal excess risk
(i.e., the least amount of error necessary to satisfy the differential privacy constraint). Namely, we provide efficient
algorithms and matching lower bounds for (i) the "general" setting where we only assume the loss function is convex and
Lipschitz-continuous and the convex constraint set is bounded; and (ii) the setting where the loss function is also known to be
strongly convex. We give separate algorithms (and lower bounds) for the two standard cases of differential privacy: pure $\epsilon$
and $(\epsilon, \delta)$ differential privacy. Perhaps, surprisingly, the techniques used for designing optimal algorithms in the
two cases are completely different.

*Based on joint work with Adam Smith and Abhradeep Thakurta - To appear in FOCS 2014.

Optimal Property Testers Over Product Distributions

We study algorithms that, given access to a small number
of samples from a large dataset, approximately determine whether the dataset satisfies a desired property. More specifically,
our algorithms accept if the dataset has the property and reject with high probability if the dataset is far from having the
property. The distance to having the property is measured with respect to a known or an unknown distribution on the datasets.
Since any dataset can be represented as a function, we study testing properties of functions. In this work, we focus on functions
over domains of the form {1,2,..,n}^d (that is, d-dimensional hypergrids). We look at a general class of properties of such
functions, called bounded derivative properties (BDP), which includes monotonicity and the Lipschitz property. These two
properties are fundamental and have applications in processing database queries and data-privacy, respectively.

We give an optimal tester for BDPs for the case when the distance to the property is measured with respect to a product
distribution, that is, a distribution where each coordinate is chosen independently. Our main tool here is a novel dimension
reduction which reduces testing properties of functions over {1,2,..,n}^d to testing functions over {1,2,..,n}. This dimension
reduction is optimal up to a constant factor. For BDPs of functions over {1,2,...,n}, we design an optimal tester for the case
when the distribution is known. Our tester is based on Knuth's construction of binary search trees over {1,2,..,n} with minimum
expected depth. As a special case, we obtain an optimal monotonicity tester for {1,2,..,n}, thus improving the tester given by
Ailon and Chazelle (Information and Computation, 2006). Our work resolves two open problems given in their work.

Joint work with Deeparnab Chakrabarty, Madhav Jha, and C.Seshadhri (to appear in SODA 2015)

Pairwise spanners with additive distortion

An additive pairwise spanner of an undirected graph is a subgraph
in which the shortest distances between certain specified pairs of nodes are larger than the corresponding distances in the
original graph by at most an additive term. This additive term is called the distortion of that spanner. In the case when the
approximation requirement applies to all pairs of nodes in the graph, the subgraph is simply called an additive spanner.

Additive spanners are very well studied. They have applications in compact routing schemes, near shortest path algorithms,
approximate distance oracles, etc. It is NP-hard to compute the sparsest additive pairwise spanner of a graph. The typical goal
is to design algorithms which for specific values of distortion construct pairwise spanners which are as sparse as possible.

In the talk we will present a brief survey of the work done in the area of graph spanners, the important algorithmic techniques
used in constructing them and high level ideas of some of our algorithms for additive pairwise spanners.

Joint work with T. Kavitha.

A Near-Optimal Algorithm for Testing Graph Isomorphism

The goal of the graph isomorphism problem is to check
if two finite graphs are identical if their labels are ignored. Apart from practical applications such as chemical compound
identification and circuit verification, the problem is interesting due to its challenging open computational status.

In the property testing framework, Fischer and Matsliah (SICOMP 2008) showed that the number of queries necessary to
(approximately) test isomorphism of two unknown graphs in the dense graph model is at least Omega(n) and at most O(n^(5/4)).
We essentially close this gap by showing an algorithm that makes n * 2^O(sqrt(log n)) queries.

Distribution testing (Batu et al. [JACM 2013]; Batu et al. [FOCS 2001]) is one of the main tools in this line of research.
A major obstacle in the quest for an efficient graph isomorphism tester turns out to be the Omega(n^(2/3)) distribution testing
lower bound due to Valiant (SICOMP 2011). We bypass this lower bound by designing a more efficient algorithm that for two
distributions on near-isomorphic sets of points is required to reject only if the Earth-Mover Distance between them is large.

Joint work with Xiaorui Sun (Columbia University).

Sample-Based Constant-Time Algorithms for Properties of Images

We initiate a systematic study of sublinear-time algorithms for image analysis that have access
only to labeled random samples from the input. Most previous sublinear-time algorithms for image analysis were query-based, that is, they could query pixels of their choice.
We consider algorithms with two types of input access: sample-based algorithms that draw independently random pixels, and block-sample-based algorithms that draw pixels from
independently random square blocks of the image. We investigate three basic properties of black-and-white images: being a half-plane, convexity and connectedness. For the first
two properties, all our algorithms are sample-based, and for connectedness they are block-sample-based. All algorithms we present have low sample complexity that depends only
on the error parameter, but not on the input size.

We design algorithms that approximate the distance to the three properties within a small additive error or, equivalently, tolerant testers for being a half-plane,
convexity and connectedness. Tolerant testers for these properties, even with query access to the image, were not investigated previously. Tolerance is important in image
processing applications because it allows algorithms to be robust to noise in the image. We also give (non-tolerant) testers for these properties with better complexity than
implied by our distance approximation algorithms. For convexity and connectedness, our testers are faster than previously known query-based testers.

To obtain our algorithms for convexity, we design two fast proper PAC learners of convex sets in two dimensions that work under the uniform distributions: non-agnostic and agnostic.

Joint work with Piotr Berman and Sofya Raskhodnikova.

On the use of regularization in deep neural networks

A deep neural network (DNN) is a classification algorithm described by many layers of threshold
gates arranged in sequence. DDNs have recently become hugely popular, both in industry and among academics, mostly due to their stellar success in computer vision and speech
processing. Unfortunately, training these DNNs for a new learning application is notoriously hard, partly because of the nonconvex optimization problems that arise. In contrast,
'the optimization problems that arise in training more traditional classification models are convex, and hence relatively easy to solve.

Dropout regularization (Hinton et al, 2012) is a heuristic designed to help in the training of DNNs. Dropout significantly reduces the classification error of DNNs on many benchmark
data sets, and is now widely used in practice. However, despite recent efforts (Baldi and Sadowski, 2013; Wager et al., 2013), dropout’s theoretical properties remain poorly
understood. Until this work there were no rigorous guarantees on its effect on classification error.

We provide the first proof that the generalization error of a neural network trained using dropout regularization goes to 0 as the number of training examples tends to infinity.
Our proof applies to a one-layer neural network. Moreover, we provide strong stability (robustness) guarantees for dropout and relate them to differential privacy, a definition of
privacy for statistical databases. Finally, we provide experimental evidence supporting our theoretical claims.

In this talk, I will introduce deep neural networks and the dropout heuristic in its general form. I will explain our results on the error and the robustness of dropout in the
context of one-layer neural networks, and relate them to differential privacy. No prior knowledge of neural networks or differential privacy is assumed.

Joint work with Prateek Jain [Microsoft Research India], Vivek Kulkarni [SUNY, Stony Brook], and Oliver Williams.

RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis

RSA is one of the most popular asymmetric crypto schemes, which is considered a very strong and secure
crypto scheme given long keys (e.g., 2048 bits). However, Daniel Genkin, Adi Shamir, and Eran Tromer found that due to the vibration in some of the computer devices' electronic
components, a high-pitched noise is emitted during the RSA's decryption/signature process. In this talk, we will discuss how to leverage these acoustic emanations to "hear" the sensitive
information about security-related computations.

In the discussed paper, a new acoustic cryptanalysis key extraction method based on chosen ciphertext attack is designed and implemented against GnuPG’s current implementation of RSA.
The attack can extract full 4096-bit RSA decryption keys from laptop computers by sending email to the Enigmail software which uses GnuPG RSA for email content decryptions/signature.
Their experiment result shows that the designed attacks can be carried out, using either a normal mobile phone placed next to the computer, or a more sensitive microphone placed 4 meters
away.

As background knowledge, we will also explain the RSA decryption / signature procedure and its relevant optimization approach based on the Chinese Remainder Theorem.

The Multiplicative Weights Update Method and an Application to Solving Zero-Sum Games Approximately

Consider the following situation: Given the unpredictable
weather in State College, every morning you have to predict whether it will rain that day or not. To do this, you can look at the predictions of a handful of experts and compute your
answer based on those. However, you have to ensure that, over any period of time, the number of mispredictions that you make is a “good” approximation to that made by the best expert.
The weighted majority algorithm solves this and other similar prediction problems with just two outcomes.

In this talk, we will first discuss the weighted majority algorithm. We will then introduce a meta algorithm, called the Multiplicative Weights Update Method, which generalizes the
weighted majority algorithm. It solves prediction problems even when the set of outcomes to predict from is infinite. Finally, we will see an application of this method to two-player
zero-sum games. Such games have an optimal solution if both players are allowed to use mixed strategies. By applying the Multiplicative Weights Update Method, we will find a solution
whose value is bounded within an additive term from the optimum.

Consider the following problem: a large public electricity provider
(say, in California) faces a situation where demand for electricity might, without intervention, rise above the utility's ability to generate power. Rather than resorting to rolling
brown-outs, however, a forward-thinking ballot initiative has given the utility the ability to shut off the air-conditioners of individual buildings remotely. Using this ability, the
utility hopes, they might be able to coordinate shut-offs so that nobody is ever uncomfortable (say, guaranteeing that every apartment's air conditioner runs during some 10 minute
interval of every hour in which the apartment is occupied), but so that peak electricity usage never rises above peak power production.

While this combinatorial optimization approach to the problem might be preferable to rolling blackouts, it introduces a privacy concern: the utility will now be making decisions as a
function of when customers report they are at home, which can be highly sensitive information. Is there a way to solve this problem so that no coalition of customers can learn about
the schedule of any customer not in the coalition? We can ask the same question about many other combinatorial optimization problems based on private data: routing agents in a capacity
constrained network between private s_i-t_i pairs, scheduling jobs with secret resource demands on machines with different resource constraints, or finding the cheapest way to outfit
hospital wings with shared resources, to meet heterogenous demand.

We show that the answer is "yes" to these problems, and to a broad class of convex programming problems. Moreover, we show that when the objective being maximized is social welfare,
it is possible to pair the nearly optimal primal solutions we obtain with Walrasian-equilibrium like prices on items/resources (corresponding to a near optimal dual solution) which
results in truth-telling as an asymptotically dominant strategy, as the size of the problem grows large in a mild way.

Preventing False Discovery in Interactive Data Analysis is Hard

We show that, under a standard hardness assumption, there is no computationally
efficient algorithm that given n samples from an unknown distribution can give valid answers to O(n^2) adaptively chosen statistical queries. A statistical query asks for
the expectation of a predicate over the underlying distribution, and an answer to a statistical query is valid if it is "close" to the correct expectation over the distribution.

Our result stands in stark contrast to the well known fact that exponentially many statistical queries can be answered validly and efficiently if the queries are chosen non-adaptively
(no query may depend on the answers to previous queries). Moreover, a recent work of Dwork et al. shows how to accurately answer exponentially many adaptively chosen statistical queries
via a computationally inefficient algorithm; and how to answer a nearly quadratic number of adaptive queries via a computationally efficient algorithm. Thus, our hardness results are
essentially optimal.

Conceptually, our result demonstrates that achieving statistical validity alone can be a source of computational intractability in adaptive settings. For example, in
the modern large collaborative research environment, data analysts typically choose a particular approach based on previous findings. False discovery occurs if a research
finding is supported by the data but not by the underlying distribution. While the study of preventing false discovery in Statistics is decades old, to the best of our knowledge our
result is the first to demonstrate a computational barrier. In particular, our result suggests that the perceived difficulty of preventing false discovery in today's collaborative
research environment may be inherent.

This talk is based on joint works with Moritz Hardt and Thomas Steinke.

Statistical Learning in Complex Prediction Spaces: What Do We Know?

In an increasing number of application domains, supervised learning methods are
' needed for making predictions in complex spaces: predicting sequences in speech recognition, predicting rankings in information retrieval and recommender systems, predicting
assignments in image segmentation, and so on. How should we design statistically consistent learning algorithms for such problems? A popular learning approach is to minimize a
convex surrogate loss, but how does one design surrogates that lead to consistent algorithms? While the answer to this question is well understood for simple binary classification
tasks and a few other specific learning problems, a general understanding has been missing. In this talk I will describe some of our recent work on developing a unified framework for
designing statistically consistent learning algorithms for complex prediction problems defined by an arbitrary loss matrix. I will introduce the notion of convex calibration dimension
of a general loss matrix, which measures the smallest dimension of a surrogate space in which one can design a convex calibrated surrogate loss for a given loss matrix, and will
describe some ways to design explicit convex calibrated surrogates for any given loss matrix. I will also discuss ways to achieve approximate statistical consistency when exact
consistency may be computationally hard to achieve. I will conclude by describing some fundamental open questions.

Quantum strategies in games

In the usual prisoner's dilemma situation, the optimal strategy---where both players cooperate---is not reached,
because one player would always gain more by defecting. What will happen if quantum strategies are allowed? In this talk, we will clarify what quantum strategies mean and
show that the dilemma is removed.

The talk is based on "Quantum Games and Quantum Strategies" by J. Eisert, M. Wilkens, and M. Lewenstein.

Bounds on sample complexity in PAC learning

In this talk, we will define Valiant's Probably Approximately Correct (PAC) learning model and
study the number of samples sufficient for learning in this model. In PAC learning, we wish to approximate an unknown function from some specified class
(for example, the class of functions of the form R^2 -> {0,1} that represent half-planes) after seeing a small number of samples from domain labeled with the value
of the input function. We will first bound the complexity in terms of the concept class. Then we will look at the definition of Vapnik-Chervonenkis (VC) dimension,
using which we get a much better bound on the sample complexity.