The weekly Theory seminar provides an opportunity to get exposed to
cutting edge research in theoretical computer science, to learn to
understand technical material from talks and to give technical
presentations. Graduate students interested in doing research in
algorithms, complexity theory, cryptography, quantum
computing and related areas are strongly encouraged to attend.

All talks will be at 2:30 pm on Mondays in 118 Earth and Engineering Science Building (unless they
are part of the departmental colloquium). Any unusual date/time/location is highlighted in the schedule.

If you would like to speak,
please email us at: theory [at] cse [dot] psu [dot] edu.

If you want to receive announcements of upcoming talks, please
join the theory
mailing list.

We initiate a systematic study of sublinear-time algorithms for approximately testing properties of real-valued data, where the quality of
approximation is measured with respect to L_p distances. L_p distances generalize the standard Euclidian distance (L_2) and the Hamming distance (L_0). By real-valued data,
we mean datasets that are meaningfully represented as real-valued functions. Our algorithms, called L_p-testers, are required to distinguish datasets that have a required property
from those that are far in L_p distance from any dataset with the required property. This is a generalization of standard property testers, which are defined with respect to the
Hamming distance. Using general L_p distances is more appropriate for many computational tasks on real-valued data.

We use our framework to design simple and fast algorithms for classic problems, such as testing monotonicity, convexity and the Lipschitz property, and also approximating
the distance to monotonicity. For some of these problems, our L_p-testers for p>=1 are faster than possible in the standard property testing framework.

In the talk, we will explain and motivate the new framework, give an overview of the results, and explain some of the techniques.

Joint work with Piotr Berman and Grigory Yaroslavtsev, STOC 2014.

Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

The volume of high-quality
data that we can access, analyze, and learn from has reached astronomical figures. Since these data often contain sensitive or
personally identifiable information, privacy has become a subject of a major concern. As it is hard to intuitively reason about
the notion of privacy, we were compelled to use the rigorous mathematical approach for this matter. The notion of differential
privacy [Dwork-McSherry-Nissim-Smith, 2006] is one of the most powerful and widely-accepted mathematical definitions of privacy.
Roughly speaking, a (randomized) algorithm that takes a data set as an input is said to be differentially private if, for any pair
of data sets D and D’ that differ only in one record, the distribution of the algorithm’s output remains ``almost’’ the same
whether it is run on D or D’.

In this talk, we consider the problem of Empirical Risk Minimization (ERM) under the differential privacy constraint. The notion of ERM is at the heart of learning theory and statistics, and is a central tool in machine learning algorithms. In the traditional convex ERM, we are given a data set {d_1, …, d_n}, together with a convex set C known as the parameter set. For a given
parameter w in C, each data point d_i contributes a loss L(w; d_i). Given a data set {d_1,..., d_n}, the algorithm's goal is to find a parameter w that minimizes the empirical risk, defined as \sum_{i=1}^n L(w;d_i).

In this talk, we show how to construct efficient differentially private algorithms with optimal excess risk
(i.e., the least amount of error necessary to satisfy the differential privacy constraint). Namely, we provide efficient
algorithms and matching lower bounds for (i) the "general" setting where we only assume the loss function is convex and
Lipschitz-continuous and the convex constraint set is bounded; and (ii) the setting where the loss function is also known to be
strongly convex. We give separate algorithms (and lower bounds) for the two standard cases of differential privacy: pure $\epsilon$
and $(\epsilon, \delta)$ differential privacy. Perhaps, surprisingly, the techniques used for designing optimal algorithms in the
two cases are completely different.

*Based on joint work with Adam Smith and Abhradeep Thakurta - To appear in FOCS 2014.

Optimal Property Testers Over Product Distributions

We study algorithms that, given access to a small number
of samples from a large dataset, approximately determine whether the dataset satisfies a desired property. More specifically,
our algorithms accept if the dataset has the property and reject with high probability if the dataset is far from having the
property. The distance to having the property is measured with respect to a known or an unknown distribution on the datasets.
Since any dataset can be represented as a function, we study testing properties of functions. In this work, we focus on functions
over domains of the form {1,2,..,n}^d (that is, d-dimensional hypergrids). We look at a general class of properties of such
functions, called bounded derivative properties (BDP), which includes monotonicity and the Lipschitz property. These two
properties are fundamental and have applications in processing database queries and data-privacy, respectively.

We give an optimal tester for BDPs for the case when the distance to the property is measured with respect to a product
distribution, that is, a distribution where each coordinate is chosen independently. Our main tool here is a novel dimension
reduction which reduces testing properties of functions over {1,2,..,n}^d to testing functions over {1,2,..,n}. This dimension
reduction is optimal up to a constant factor. For BDPs of functions over {1,2,...,n}, we design an optimal tester for the case
when the distribution is known. Our tester is based on Knuth's construction of binary search trees over {1,2,..,n} with minimum
expected depth. As a special case, we obtain an optimal monotonicity tester for {1,2,..,n}, thus improving the tester given by
Ailon and Chazelle (Information and Computation, 2006). Our work resolves two open problems given in their work.

Joint work with Deeparnab Chakrabarty, Madhav Jha, and C.Seshadhri (to appear in SODA 2015)

Pairwise spanners with additive distortion

An additive pairwise spanner of an undirected graph is a subgraph
in which the shortest distances between certain specified pairs of nodes are larger than the corresponding distances in the
original graph by at most an additive term. This additive term is called the distortion of that spanner. In the case when the
approximation requirement applies to all pairs of nodes in the graph, the subgraph is simply called an additive spanner.

Additive spanners are very well studied. They have applications in compact routing schemes, near shortest path algorithms,
approximate distance oracles, etc. It is NP-hard to compute the sparsest additive pairwise spanner of a graph. The typical goal
is to design algorithms which for specific values of distortion construct pairwise spanners which are as sparse as possible.

In the talk we will present a brief survey of the work done in the area of graph spanners, the important algorithmic techniques
used in constructing them and high level ideas of some of our algorithms for additive pairwise spanners.

Joint work with T. Kavitha.

A Near-Optimal Algorithm for Testing Graph Isomorphism

The goal of the graph isomorphism problem is to check
if two finite graphs are identical if their labels are ignored. Apart from practical applications such as chemical compound
identification and circuit verification, the problem is interesting due to its challenging open computational status.

In the property testing framework, Fischer and Matsliah (SICOMP 2008) showed that the number of queries necessary to
(approximately) test isomorphism of two unknown graphs in the dense graph model is at least Omega(n) and at most O(n^(5/4)).
We essentially close this gap by showing an algorithm that makes n * 2^O(sqrt(log n)) queries.

Distribution testing (Batu et al. [JACM 2013]; Batu et al. [FOCS 2001]) is one of the main tools in this line of research.
A major obstacle in the quest for an efficient graph isomorphism tester turns out to be the Omega(n^(2/3)) distribution testing
lower bound due to Valiant (SICOMP 2011). We bypass this lower bound by designing a more efficient algorithm that for two
distributions on near-isomorphic sets of points is required to reject only if the Earth-Mover Distance between them is large.

Joint work with Xiaorui Sun (Columbia University).

Sample-Based Constant-Time Algorithms for Properties of Images

We initiate a systematic study of sublinear-time algorithms for image analysis that have access
only to labeled random samples from the input. Most previous sublinear-time algorithms for image analysis were query-based, that is, they could query pixels of their choice.
We consider algorithms with two types of input access: sample-based algorithms that draw independently random pixels, and block-sample-based algorithms that draw pixels from
independently random square blocks of the image. We investigate three basic properties of black-and-white images: being a half-plane, convexity and connectedness. For the first
two properties, all our algorithms are sample-based, and for connectedness they are block-sample-based. All algorithms we present have low sample complexity that depends only
on the error parameter, but not on the input size.

We design algorithms that approximate the distance to the three properties within a small additive error or, equivalently, tolerant testers for being a half-plane,
convexity and connectedness. Tolerant testers for these properties, even with query access to the image, were not investigated previously. Tolerance is important in image
processing applications because it allows algorithms to be robust to noise in the image. We also give (non-tolerant) testers for these properties with better complexity than
implied by our distance approximation algorithms. For convexity and connectedness, our testers are faster than previously known query-based testers.

To obtain our algorithms for convexity, we design two fast proper PAC learners of convex sets in two dimensions that work under the uniform distributions: non-agnostic and agnostic.

Joint work with Piotr Berman and Sofya Raskhodnikova.

On the use of regularization in deep neural networks

A deep neural network (DNN) is a classification algorithm described by many layers of threshold
gates arranged in sequence. DDNs have recently become hugely popular, both in industry and among academics, mostly due to their stellar success in computer vision and speech
processing. Unfortunately, training these DNNs for a new learning application is notoriously hard, partly because of the nonconvex optimization problems that arise. In contrast,
'the optimization problems that arise in training more traditional classification models are convex, and hence relatively easy to solve.

Dropout regularization (Hinton et al, 2012) is a heuristic designed to help in the training of DNNs. Dropout significantly reduces the classification error of DNNs on many benchmark
data sets, and is now widely used in practice. However, despite recent efforts (Baldi and Sadowski, 2013; Wager et al., 2013), dropout’s theoretical properties remain poorly
understood. Until this work there were no rigorous guarantees on its effect on classification error.

We provide the first proof that the generalization error of a neural network trained using dropout regularization goes to 0 as the number of training examples tends to infinity.
Our proof applies to a one-layer neural network. Moreover, we provide strong stability (robustness) guarantees for dropout and relate them to differential privacy, a definition of
privacy for statistical databases. Finally, we provide experimental evidence supporting our theoretical claims.

In this talk, I will introduce deep neural networks and the dropout heuristic in its general form. I will explain our results on the error and the robustness of dropout in the
context of one-layer neural networks, and relate them to differential privacy. No prior knowledge of neural networks or differential privacy is assumed.

Joint work with Prateek Jain [Microsoft Research India], Vivek Kulkarni [SUNY, Stony Brook], and Oliver Williams.

RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis

RSA is one of the most popular asymmetric crypto schemes, which is considered a very strong and secure
crypto scheme given long keys (e.g., 2048 bits). However, Daniel Genkin, Adi Shamir, and Eran Tromer found that due to the vibration in some of the computer devices' electronic
components, a high-pitched noise is emitted during the RSA's decryption/signature process. In this talk, we will discuss how to leverage these acoustic emanations to "hear" the sensitive
information about security-related computations.

In the discussed paper, a new acoustic cryptanalysis key extraction method based on chosen ciphertext attack is designed and implemented against GnuPG’s current implementation of RSA.
The attack can extract full 4096-bit RSA decryption keys from laptop computers by sending email to the Enigmail software which uses GnuPG RSA for email content decryptions/signature.
Their experiment result shows that the designed attacks can be carried out, using either a normal mobile phone placed next to the computer, or a more sensitive microphone placed 4 meters
away.

As background knowledge, we will also explain the RSA decryption / signature procedure and its relevant optimization approach based on the Chinese Remainder Theorem.