Theoretical Computer Science Group

Theory Seminar Schedule for Fall 2015

The weekly Theory Seminar provides an opportunity to get exposed to cutting-edge research in theoretical computer science, to learn to understand technical material from talks, and to give technical presentations. Graduate students interested in doing research in algorithms, complexity theory, cryptography, quantum computing, and related areas are strongly encouraged to attend.

Date Speaker Title
August 24 Paul Medvedev (PSU) On the Readability of Overlap Digraphs
August 31 Yanxi Liu (PSU) Wallpaper Group-based Human Perception of Regular Patterns from EEG Data
September 9 Eunou Lee (PSU) A nearly optimal approximation algorithm for 3LIN
September 14 Nithin Varma (PSU) Erasure-Resilient Property Testing
September 21 (222 IST) Mary Wootters (CMU) Repairing Reed-Solomon Codes
September 28 Jamie Morgenstern (UPenn) The Pseudo-Dimension of Near-Optimal Auctions
October 5 Aaron Roth (UPenn) How Well do Prices Coordinate Markets
October 12 Sampath Kannan (UPenn) Graph Verification and Reconstruction
October 19 FOCS, no seminar
October 28 Bobby Kleinberg (Cornell) Secretary Problems with Non-Uniform Arrival Order
November 2 Chandra Chekuri (Illinois) Routing and Treewidth: Recent Developments
November 9 Ramesh Krishnan (PSU) A lower bound on the time complexity of Edit Distance
November 16 Ryan Rogers (UPenn) Differentially Private Chi-Squared Hypothesis Tests
November 23 Thanksgiving week, no seminar
November 30 No seminar
December 2 Michael Langberg (SUNY Buffalo) Coding for online channels
December 7 No seminar


On the Readability of Overlap Digraphs

We introduce the graph parameter readability and study it as a function of the number of vertices in a graph. Given a digraph D, an injective overlap labeling assigns a unique string to each vertex such that there is an arc from x to y if and only if x properly overlaps y. The readability of D is the minimum string length for which an injective overlap labeling exists. In applications that utilize overlap digraphs (e.g., in bioinformatics), readability reflects the length of the strings from which the overlap digraph is constructed.
We study the asymptotic behavior of readability by casting it in purely graph theoretic terms (without any reference to strings). We prove upper and lower bounds on readability for certain graph families and general graphs.
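The overlap relation in the definition above can be made concrete with a small sketch. The helper below is hypothetical (not from the paper) and uses one standard convention: x properly overlaps y when some proper, nonempty suffix of x equals a proper, nonempty prefix of y.

```python
def properly_overlaps(x: str, y: str) -> bool:
    """Check whether some proper, nonempty suffix of x equals a
    proper, nonempty prefix of y (one standard notion of overlap)."""
    for k in range(1, min(len(x), len(y))):
        if x[-k:] == y[:k]:
            return True
    return False

def overlap_digraph(labels: dict) -> set:
    """Arcs of the overlap digraph induced by a vertex -> string
    labeling (loops omitted for simplicity)."""
    return {(u, v) for u in labels for v in labels
            if u != v and properly_overlaps(labels[u], labels[v])}

# "tag" overlaps "agc" (suffix "ag" = prefix "ag"), but not vice versa.
print(overlap_digraph({"a": "tag", "b": "agc"}))  # {('a', 'b')}
```

An injective overlap labeling must realize exactly the arcs of the given digraph this way; readability asks how long the strings must be.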

This is joint work with Rayan Chikhi, Martin Milanic, and Sofya Raskhodnikova.

Wallpaper Group-based Human Perception of Regular Patterns from EEG Data

Mathematicians long ago proved that there is a finite number of crystallographic groups describing all possible periodic patterns in N-dimensional Euclidean space, yet little is known about how humans perceive such patterns and how regular patterns are organized in the human brain. We systematically explore human perception using wallpaper patterns as stimuli and directly collect electroencephalogram (EEG) responses of the human brain. Our initial analysis of the EEG data through machine learning, MDS, and minimum/maximum spanning tree extraction reveals a surprising graph-representation resemblance between the subgroup hierarchy of the 17 wallpaper groups and the human neural responses (EEG data) to the corresponding wallpaper patterns.

Joint work with Chris Funk (PSU), Peter Jes Kohler (Stanford), and Anthony Norcia (Stanford).

Helpful links:
- For a quick look at wallpaper groups, see Chapter 2 of this survey paper on "computational symmetry".
- For background and related information on our NSF grant 'Symmetry Group-based Regularity Perception in Human and Computer Vision', check out this project page.

A nearly optimal approximation algorithm for 3LIN

Suppose we are given a system of linear equations over Z_2, each involving 3 variables. The problem 3LIN is to find an assignment to the variables that satisfies the maximum fraction of equations. A uniformly random assignment satisfies half of the equations in expectation. Can we do better? This talk presents an algorithm, due to Håstad, whose approximation ratio nearly matches the known NP-hardness of approximation. For those who are interested, there is an interesting history behind this result.
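The random-guessing baseline is easy to see empirically. Below is a minimal sketch (the triple-plus-bit encoding of equations and the random instance are assumptions for illustration): each equation x_i + x_j + x_k = b (mod 2) holds with probability 1/2 under a uniformly random assignment, so the satisfied fraction concentrates near 1/2.

```python
import random

def satisfied_fraction(equations, assignment):
    """Fraction of equations x_i + x_j + x_k = b (mod 2) satisfied,
    where each equation is ((i, j, k), b)."""
    ok = sum((assignment[i] ^ assignment[j] ^ assignment[k]) == b
             for (i, j, k), b in equations)
    return ok / len(equations)

random.seed(0)
n = 50
# Hypothetical random instance: 1000 equations over 50 variables.
eqs = [((random.randrange(n), random.randrange(n), random.randrange(n)),
        random.randrange(2)) for _ in range(1000)]
assignment = [random.randrange(2) for _ in range(n)]
print(satisfied_fraction(eqs, assignment))  # close to 1/2 in expectation
```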

Erasure-Resilient Property Testing

In one of the most widely studied models of property testing [Goldreich, Goldwasser and Ron 1998], the testing algorithm has oracle access to an input object and has to distinguish, with high probability, objects that satisfy a property from those that need to be changed in at least an epsilon fraction of their positions to satisfy the property. Useful properties with efficient testers in this model include monotonicity, convexity, and the Lipschitz property. However, in real-life databases, several of the entries might be unavailable to the tester due to adversarial erasures or privacy requirements. The tester learns nothing about the input by querying such erased points. Moreover, knowledge of the tester's algorithm may enable an adversary to erase some of the points so as to increase the query complexity of the tester arbitrarily or, in some cases, render the tester entirely useless. In the talk, I will define the erasure-resilient property testing model, which formalizes these impediments to testing in real life. I will then discuss the main aspects of our erasure-resilient testers for monotonicity, convexity, and the Lipschitz property, and describe in detail our nearly optimal erasure-resilient tester for convexity.

Joint work with Kashyap Dixit, Sofya Raskhodnikova and Abhradeep Thakurta.

Repairing Reed-Solomon Codes

A fundamental fact about polynomial interpolation is that k evaluations of a degree-(k-1) polynomial f(x) are sufficient to determine f(x). This is also necessary in a strong sense: given only k-1 evaluations of f, we learn nothing about f(a) for any additional point a. We study a variant of this polynomial interpolation problem. Instead of receiving complete evaluations of f(x) -- which may live in a large finite field F -- we are allowed to ask for *partial* evaluations, that is, a few symbols from a subfield of F rather than one symbol from F itself. The goal is again to determine f(a) for some point a that was not queried, while minimizing the total amount of information gathered from the evaluations.
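The basic interpolation fact can be illustrated with Lagrange interpolation. The sketch below uses exact rationals for readability; the talk's setting is a large finite field, but the formula is the same.

```python
from fractions import Fraction

def interpolate(points):
    """Return a function evaluating the unique polynomial of degree
    at most k-1 through the k given (x, y) points, via the Lagrange
    formula f(a) = sum_i y_i * prod_{j != i} (a - x_j) / (x_i - x_j)."""
    pts = [(Fraction(x), Fraction(y)) for x, y in points]
    def f(a):
        a = Fraction(a)
        total = Fraction(0)
        for i, (xi, yi) in enumerate(pts):
            term = yi
            for j, (xj, _) in enumerate(pts):
                if j != i:
                    term *= (a - xj) / (xi - xj)
            total += term
        return total
    return f

# f(x) = x^2 + 1 is determined by any 3 of its evaluations.
f = interpolate([(0, 1), (1, 2), (2, 5)])
print(f(3))  # 10
```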

We show that in this model, one can do significantly better than the traditional setting, in terms of the amount of information required to determine f(a). Our motivation comes from distributed storage systems, and from the exact repair problem for Reed-Solomon codes. The traditional use of Reed-Solomon codes for distributed storage is analogous to the standard interpolation problem: each node in a system stores an evaluation f(a), and if one node fails we can recover it by reading k other nodes. However, each node is perfectly free to send less information, leading to the modified interpolation problem above. The quickly-developing field of regenerating codes has yielded several codes which take advantage of this freedom. However, these regenerating codes are not Reed-Solomon codes, and RS codes are still often used in practice. Our results imply that, in fact, one can use Reed-Solomon codes to also take advantage of this freedom to download partial symbols. In some (disclaimer: less standard) parameter regimes, our scheme for Reed-Solomon codes outperforms all known regenerating codes.

This is joint work with Venkat Guruswami.

Short Bio: Mary Wootters is an NSF post-doctoral fellow at Carnegie Mellon University. She received her Ph.D. in math from the University of Michigan in 2014, and her B.A. in math and computer science from Swarthmore College in 2008. Her research focuses on coding theory and randomized algorithms.

The Pseudo-Dimension of Near-Optimal Auctions

In the traditional economic approach to identifying a revenue-maximizing auction, one first posits a prior distribution over all unknown information, in particular, the values of buyers in the market, and then solves for the auction that maximizes expected revenue with respect to this distribution. The first obstacle to making this approach operational is the difficulty of formulating an appropriate prior. The second obstacle is that, even if an appropriate prior distribution is available, the corresponding optimal auction can be far too complex and unintuitive for practical use. This motivates the goal of identifying auctions that are “simple” and yet nearly-optimal in terms of expected revenue.

In this talk, I will describe recent work that simultaneously addresses these two difficulties, using techniques from statistical learning theory to design auction classes. Formally, we design a parametric class of auctions which trades off between the pseudo-dimension of the class and the revenue optimality of the best auction within that class. This approach allows one to state precisely how much data is needed about buyers in order to learn a revenue-optimal auction from a class of auctions, and also gives a precise measure of the relative simplicity of a class of auctions.

This work is joint with Tim Roughgarden of Stanford University.

Short Bio: Jamie Morgenstern is a Warren fellow in computer science at the University of Pennsylvania, hosted by Michael Kearns, Aaron Roth, and Rakesh Vohra. She recently completed her PhD working with Avrim Blum at Carnegie Mellon University. Her interests lie within the area of theoretical computer science; more specifically, she spends a lot of her time thinking about mechanism design, learning, algorithmic game theory, privacy, and approximation algorithms.

How Well do Prices Coordinate Markets

In discrete allocation problems, Walrasian equilibrium prices have a remarkable property: they allow each agent in the market to purchase a bundle of goods that he independently judges to be the most desirable, while guaranteeing that the jointly induced allocation will globally be social welfare maximizing. However, this otherwise clean story has two caveats:
-- First, the prices themselves may induce a large number of indifferences among agents that need to be resolved in a coordinated manner; hence the prices alone are not sufficient to coordinate the market. In fact, it is not hard to see that the minimal Walrasian equilibrium prices always induce indifferences, for any set of bidder valuations, so we cannot simply appeal to "genericity" to eliminate indifferences.
-- Second, although we know natural procedures which converge to Walrasian equilibrium prices when used on a fixed population, in practice, we observe and interact with prices that were arrived at independently of our own participation in a tâtonnement process. We expect that these prices therefore are not perfect Walrasian equilibrium prices tailored exactly to the individuals in the market, but rather, that they result from some kind of "distributional" information about the market. This exacerbates the issue of coordination.

We give results of two sorts. First, we show a genericity condition under which the (exact) minimal Walrasian equilibrium prices in a commodity market induce allocations which result in vanishing overdemand for any tie breaking rule that buyers may use to resolve their indifferences. Second, we use techniques from learning theory to argue that the overdemand and welfare induced by a price vector converges to its expectation uniformly over the class of all price vectors, with sample complexity only linear in the number of goods in the market (without placing any assumption on the form of the valuation functions of the buyers).

Combining these two results implies that the exact Walrasian equilibrium prices computed in a commodity market (under a mild genericity condition) are guaranteed to induce both low overdemand and high welfare when used in a new market, in which agents are sampled independently from the same distribution, whenever the number of agents is larger than the number of commodities in the market.

Based on joint work and discussion with Justin Hsu, Jamie Morgenstern, Ryan Rogers, and Rakesh Vohra.

Short Bio: Aaron Roth is the Raj and Neera Singh Assistant Professor of Computer and Information Sciences at the University of Pennsylvania, affiliated with the Warren Center for Network and Data Science. Previously, he received his PhD from Carnegie Mellon University and spent a year as a postdoctoral researcher at Microsoft Research New England. He is the recipient of an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and a Yahoo! ACE award. His research focuses on the algorithmic foundations of data privacy, game theory and mechanism design, learning theory, and the intersection of these topics.

Graph Verification and Reconstruction

Given a mental picture of some unknown graph, how can we verify that this picture is correct, when the unknown graph is only accessible by querying pairs of vertices and getting their shortest path distance in the graph? For graphs of bounded degree, we present a simple algorithm that achieves an O(log n) approximation to the minimum number of queries needed. For graphs that also have bounded treewidth, we give an algorithm that uses a number of queries that is nearly linear in the number of vertices. When we do not have an a priori mental image, and the goal is to reconstruct the unknown graph, we give a nearly linear query bound for some special classes of graphs. We also relate the query complexity under distance queries to the query complexity under shortest path queries, where the answer is an actual shortest path between the vertices queried.

Based on joint work with Claire Mathieu and Hang Zhou.

Short Bio: Sampath Kannan is the Henry Salvatori Professor and Department Chair in the Department of Computer and Information Science at the University of Pennsylvania. His research spans several subfields of algorithms. In his work on massive data set algorithms, Sampath explores what can and cannot be computed efficiently. He is also interested in program checking, a paradigm for ensuring the correctness of a program by observing its behavior at run time, and in algorithmic problems in computational biology, particularly the problem of reconstructing the evolutionary history of a set of species from phenotypic and molecular sequence observations.

Secretary Problems with Non-Uniform Arrival Order

For a number of problems in the theory of online algorithms, the assumption that elements arrive in uniformly random order enables the design of algorithms with much better performance guarantees than under worst-case assumptions. The quintessential example of this phenomenon is the secretary problem, in which an algorithm attempts to stop a sequence at the moment it observes the element of maximum value. As is well known, if the sequence is presented in uniformly random order there is an algorithm that succeeds with probability 1/e, whereas no non-trivial performance guarantee is possible if the elements arrive in worst-case order.
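The 1/e guarantee comes from the classical wait-then-commit rule: reject the first roughly n/e elements, then accept the first element that beats everything seen so far. A small simulation (a hypothetical setup for illustration, not from the talk) suggests the success probability under uniformly random arrival order:

```python
import math
import random

def secretary_succeeds(n: int, rng: random.Random) -> bool:
    """One trial of the classical secretary rule on a uniformly
    random permutation of n distinct values."""
    values = list(range(n))
    rng.shuffle(values)
    r = int(n / math.e)
    best_seen = max(values[:r], default=-1)
    for v in values[r:]:
        if v > best_seen:
            return v == n - 1  # accepted; success iff it is the maximum
    return False  # never accepted anything

rng = random.Random(1)
trials = 20000
wins = sum(secretary_succeeds(100, rng) for _ in range(trials))
print(wins / trials)  # roughly 1/e ~ 0.368
```

Under worst-case order (e.g., strictly decreasing values), the same rule fails almost always, which is the contrast the abstract draws.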

In many applications of online algorithms, it is reasonable to assume there is some randomness in the input sequence, but unreasonable to assume that the arrival ordering is uniformly random. This work initiates an investigation into relaxations of the random-ordering hypothesis, by focusing on the secretary problem and asking what performance guarantees one can prove under relaxed assumptions. Along the way to answering this question we will encounter some tools, such as coding theory and approximation theory, not normally associated with the analysis of online algorithms.

Joint work with Thomas Kesselheim and Rad Niazadeh.

Short Bio: Bobby Kleinberg is an Associate Professor of Computer Science at Cornell University and a Principal Researcher at Microsoft Research New England. His research studies the design and analysis of algorithms, and their applications to economics, networking, information retrieval, and other areas. Prior to receiving his doctorate from MIT in 2005, Kleinberg spent three years at Akamai Technologies, where he assisted in designing the world's largest Internet Content Delivery Network. He is the recipient of a Microsoft Research New Faculty Fellowship, an Alfred P. Sloan Foundation Fellowship, and an NSF CAREER Award.

Routing and Treewidth: Recent Developments

The study of approximation algorithms for the maximum edge and node disjoint paths problems raised an interesting question on the structure of graphs with large treewidth. The breakthrough work of Chuzhoy in 2011 obtained a poly-logarithmic approximation for the maximum edge disjoint paths problem with constant congestion. Several new results have been obtained by building on her insights and other ideas. One of the highlights is a significant quantitative improvement from exponential to polynomial in the grid-minor theorem of Robertson and Seymour. We will survey some of these developments.

Short Bio: Chandra Chekuri is a Professor of Computer Science at the University of Illinois, Urbana-Champaign. He received his PhD from the Department of Computer Science at Stanford University in 1998, then worked as a Member of Technical Staff at Lucent Bell Labs from 1998 until 2006 before moving to Illinois. He is broadly interested in theoretical computer science, algorithms, and combinatorial optimization. His main contributions have been to approximation algorithms for NP-Hard problems, related technical tools, and some applications. His recent work has focused on constrained submodular function optimization, flows and cuts in graphs, and connections to topics in graph theory.

A lower bound on the time complexity of Edit Distance

The edit distance between two strings is defined as the minimum number of insertions, deletions, or substitutions of symbols needed to transform one string into the other. Computing edit distance is a classic computational task. Unfortunately, all known algorithms that compute it exactly run in time nearly quadratic in the length of the strings.
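For reference, the near-quadratic upper bound is achieved by the classical Wagner-Fischer dynamic program, sketched here with a rolling one-dimensional table:

```python
def edit_distance(s: str, t: str) -> int:
    """Classic O(|s|*|t|) dynamic program (Wagner-Fischer).
    dp[j] holds the edit distance between the current prefix of s
    and t[:j]; prev carries the diagonal entry from the last row."""
    m, n = len(s), len(t)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # delete s[i-1]
                        dp[j - 1] + 1,                      # insert t[j-1]
                        prev + (s[i - 1] != t[j - 1]))      # substitute/match
            prev = cur
    return dp[n]

print(edit_distance("kitten", "sitting"))  # 3
```

The lower bound in the talk says that, under SETH, no algorithm can improve on this quadratic behavior by a polynomial factor.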

In this talk, I will present a result of Backurs and Indyk (2015), which gives evidence that the near-quadratic running time for this problem is tight. They prove a near-quadratic lower bound based on the "Strong Exponential Time Hypothesis", which states that CNF-SAT on $N$ variables cannot be solved in time $O(2^{N(1-c)})$ for any constant $c > 0$.

Link to the original work.
STOC'15 link.

Differentially Private Chi-Squared Hypothesis Tests

We consider the problem of performing statistical hypothesis testing when the data must remain private. In this work we focus on multinomial data, performing goodness-of-fit testing and independence testing in contingency tables, which typically use a chi-squared statistic. We develop new statistical tests that maintain high significance with a small loss in power when the data analyst is given access to the noisy cell counts (due to privacy) rather than the exact counts.
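For context, the classical (non-private) chi-squared goodness-of-fit statistic underlying these tests is sum_i (O_i - E_i)^2 / E_i over the multinomial cells; the paper's private variants, which account for noise added to the counts, are not reproduced here. A minimal sketch of the classical statistic:

```python
def chi_squared_gof(observed, expected):
    """Classical chi-squared goodness-of-fit statistic
    sum((O_i - E_i)^2 / E_i) over the multinomial cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 60 coin flips against a fair-coin null: expected 30 heads, 30 tails.
stat = chi_squared_gof([36, 24], [30, 30])
print(stat)  # 2.4
```

Under the null hypothesis this statistic is asymptotically chi-squared distributed, which is what noisy (privatized) counts perturb.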

Coding for online channels

In the online (or "causal") channel coding model, a sender wishes to communicate a message to a receiver by transmitting a codeword $x = (x_1, \ldots, x_n)$ character by character over a channel limited to at most $pn$ corruptions. The channel is "online" in the sense that at the $i$-th step of communication it decides whether to corrupt the $i$-th character based on its view so far, i.e., its decision depends only on the transmission $(x_1, \ldots, x_i)$. This is in contrast to the classical adversarial channel, in which the errors are chosen by a jammer with full knowledge of the entire codeword $x$.

In this talk, I will present a spectrum of results on the achievable communication rate in the online setting, including a recent characterization of the capacity of the online binary channel (STOC 2015). No prior knowledge of coding theory will be assumed.

The talk will be based on works that are joint with: Z. Chen, B. K. Dey, S. Jaggi, and A. D. Sarwate.

Short Bio: Prof. Langberg received his B.Sc. in mathematics and computer science from Tel-Aviv University in 1996, and his M.Sc. and Ph.D. in computer science from the Weizmann Institute of Science in 1998 and 2003 respectively. Between 2003 and 2006, he was a postdoctoral scholar in the Electrical Engineering and Computer Science departments at the California Institute of Technology. Prof. Langberg is currently an associate professor in the Department of Electrical Engineering at the State University of New York in Buffalo after holding a position in the department of Mathematics and Computer Science at the Open University of Israel.

Prof. Langberg's research addresses the algorithmic and combinatorial aspects of information in communication, management, and storage; focusing on the study of information theory, coding theory, network communication and network coding, big data in the form of succinct data representation, and probabilistic methods in combinatorics. Prof. Langberg is an associate editor for the IEEE Transactions on Information Theory and editor for the Information Theory Society Newsletter.