On the Readability of Overlap Digraphs

We introduce the graph parameter readability and study it as a function of the number of vertices in a graph. Given a digraph D, an injective overlap labeling assigns a unique string to each vertex such that there is an arc from x to y if and only if x properly overlaps y. The readability of D is the minimum string length for which an injective overlap labeling exists. In applications that utilize overlap digraphs (e.g., in bioinformatics), readability reflects the length of the strings from which the overlap digraph is constructed. We study the asymptotic behavior of readability by casting it in purely graph-theoretic terms (without any reference to strings). We prove upper and lower bounds on readability for certain graph families and for general graphs. This is joint work with Rayan Chikhi, Martin Milanic, and Sofya Raskhodnikova.
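As an illustration of the overlap relation, here is a minimal sketch of a proper-overlap check, under the common formalization that x properly overlaps y when some proper suffix of x equals a proper prefix of y (the talk's precise definition may differ):

```python
def properly_overlaps(x, y):
    """True iff some proper suffix of x equals a proper prefix of y
    (overlap length strictly between 0 and min(len(x), len(y)))."""
    return any(x[-l:] == y[:l] for l in range(1, min(len(x), len(y))))
```

In an overlap labeling of D, the arc set must coincide exactly with the vertex pairs for which this predicate holds on their labels.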
Wallpaper Group-based Human Perception of Regular Patterns from EEG Data

Mathematicians have long proven that there is a finite number of crystallographic groups associated with all possible periodic patterns in N-dimensional Euclidean space, yet little is known about how humans perceive such patterns or how regular patterns are organized in the human brain. We explore human perception systematically, using wallpaper patterns as stimuli and collecting electroencephalogram (EEG) responses of the human brain directly. Our initial analysis of the EEG data through machine learning, MDS, and minimum/maximum spanning tree extraction reveals a surprising resemblance between the graph representation of the subgroup hierarchy of the 17 wallpaper groups and that of the human neural responses (EEG data) to the corresponding wallpaper patterns. Joint work with Chris Funk (PSU), Peter Jes Kohler (Stanford), and Anthony Norcia (Stanford). Helpful links:
- For a quick look at wallpaper groups, see chapter 2 of this survey paper on "computational symmetry".
- For some background/related information on our NSF grant "Symmetry Group-based Regularity Perception in Human and Computer Vision", check out this project page.
A nearly optimal approximation algorithm for 3LIN

Suppose we are given a system of linear equations over Z_2, each containing 3 variables. The problem 3LIN is to come up with an assignment to the variables that satisfies the maximum fraction of the equations. Random guessing will satisfy half of the equations in expectation. Can we do better? This talk will be about an algorithm whose approximation ratio nearly matches the NP-hardness result. The algorithm is from Håstad's work. For those who are interested, there is an interesting history behind this result.
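The random-guessing baseline is easy to verify exhaustively on small instances; a minimal sketch, where the equation encoding `((i, j, k), b)` for x_i + x_j + x_k = b over Z_2 is my own convention:

```python
from itertools import product

def avg_satisfied_fraction(equations, n):
    """Average, over all 2^n assignments to x_0..x_{n-1}, of the fraction
    of 3LIN equations satisfied; each equation ((i, j, k), b) encodes
    x_i + x_j + x_k = b (mod 2)."""
    total = 0
    for x in product((0, 1), repeat=n):
        total += sum((x[i] ^ x[j] ^ x[k]) == b for (i, j, k), b in equations)
    return total / (2 ** n * len(equations))
```

Since each equation on three distinct variables is satisfied by exactly half of all assignments, the average is exactly 1/2, which is what a uniformly random assignment achieves in expectation.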
Erasure-Resilient Property Testing

In one of the most widely studied models of property testing [Goldreich, Goldwasser and Ron 1998], the testing algorithm has oracle access to an input object and has to distinguish, with high probability, objects that satisfy a property from those that need to be changed in at least an epsilon fraction of their positions to satisfy the property. Common properties with efficient testers in this model include monotonicity, convexity, and the Lipschitz property. However, in real-life databases, several of the entries might be unavailable to the tester due to adversarial erasures or privacy requirements. The tester does not learn anything about the input by querying such erased points. Moreover, knowledge of the tester may enable an adversary to erase some of the points so as to increase the query complexity of the tester arbitrarily or, in some cases, make the tester entirely useless. In the talk, I will define the erasure-resilient property testing model that formalizes these impediments to testing in real life. I will then discuss the main aspects of our erasure-resilient testers for monotonicity, convexity, and the Lipschitz property. I will also describe in detail our nearly optimal erasure-resilient tester for convexity.
Joint work with Kashyap Dixit, Sofya Raskhodnikova and Abhradeep Thakurta.
Repairing Reed-Solomon Codes

A fundamental fact about polynomial interpolation is that k evaluations of a degree-(k-1) polynomial f(x) are sufficient to determine f(x). This is also necessary in a strong sense: given only k-1 evaluations of f, we learn nothing about f(a) at any other point a. We study a variant of this polynomial interpolation problem. Instead of reading complete evaluations of f(x) -- which may live in a large finite field F -- we are allowed to ask for *partial* evaluations -- that is, a few symbols from a subfield of F, rather than one symbol from F itself. The goal is again to determine f(a) for some point a that was not queried, while minimizing the total amount of information gathered from the evaluations. We show that in this model, one can do significantly better than in the traditional setting, in terms of the amount of information required to determine f(a). Our motivation comes from distributed storage systems, and from the exact repair problem for Reed-Solomon codes. The traditional use of Reed-Solomon codes for distributed storage is analogous to the standard interpolation problem: each node in a system stores an evaluation f(a), and if one node fails we can recover it by reading k other nodes. However, each node is perfectly free to send less information, leading to the modified interpolation problem above. The quickly developing field of regenerating codes has yielded several codes which take advantage of this freedom. However, these regenerating codes are not Reed-Solomon codes, and RS codes are still often used in practice. Our results imply that, in fact, one can use Reed-Solomon codes to take advantage of this freedom to download partial symbols. In some (disclaimer: less standard) parameter regimes, our scheme for Reed-Solomon codes outperforms all known regenerating codes. This is joint work with Venkat Guruswami. Short Bio: Mary Wootters is an NSF post-doctoral fellow at Carnegie Mellon University.
She received her Ph.D. in math from the University of Michigan in 2014, and her B.A. in math and computer science from Swarthmore College in 2008. Her research focuses on coding theory and randomized algorithms.
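The baseline interpolation fact is easy to state in code; a minimal sketch using Lagrange interpolation over a prime field GF(p) (the helper name is mine, and the talk's partial-evaluation schemes over subfields are substantially more involved):

```python
def interpolate_at(points, a, p):
    """Recover f(a) from k evaluations (x_i, y_i) of a polynomial f of
    degree < k over GF(p), via Lagrange interpolation:
    f(a) = sum_i y_i * prod_{j != i} (a - x_j) / (x_i - x_j)  (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (a - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total
```

For example, the k = 3 evaluations (1, 6), (2, 17), (3, 34) of f(x) = 3x^2 + 2x + 1 over GF(101) determine f(5) = 86, whereas any two of them leave f(5) completely undetermined.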
The Pseudo-Dimension of Near-Optimal Auctions

In the traditional economic approach to identifying a revenue-maximizing auction, one first posits a prior distribution over all unknown information, in particular, the values of buyers in the market, and then solves for the auction that maximizes expected revenue with respect to this distribution. The first obstacle to making this approach operational is the difficulty of formulating an appropriate prior. The second obstacle is that, even if an appropriate prior distribution is available, the corresponding optimal auction can be far too complex and unintuitive for practical use. This motivates the goal of identifying auctions that are “simple” and yet nearly-optimal in terms of expected revenue. In this talk, I will describe recent work that simultaneously addresses these two difficulties, using techniques from statistical learning theory to design auction classes. Formally, we design a parametric class of auctions which trades off between the pseudo-dimension of the class and the revenue optimality of the best auction within that class. This approach allows one to state precisely how much data is needed about buyers in order to learn a revenue-optimal auction from a class of auctions, and also gives a precise measure of the relative simplicity of a class of auctions. This work is joint with Tim Roughgarden of Stanford University. Short Bio: Jamie Morgenstern is a Warren fellow in computer science at the University of Pennsylvania, hosted by Michael Kearns, Aaron Roth, and Rakesh Vohra. She recently completed her PhD working with Avrim Blum at Carnegie Mellon University. Her interests lie within the area of theoretical computer science; more specifically, she spends a lot of her time thinking about mechanism design, learning, algorithmic game theory, privacy, and approximation algorithms.
How Well do Prices Coordinate Markets?

In discrete allocation problems, Walrasian equilibrium prices have a remarkable property: they allow each agent in the market to purchase a bundle of goods that he independently judges to be the most desirable, while guaranteeing that the jointly induced allocation will be globally welfare-maximizing. However, this otherwise clean story has two caveats:
-- First, the prices themselves may induce a large number of indifferences among agents that need to be resolved in a coordinated manner. Hence, the prices alone are not sufficient to coordinate the market. In fact, it is not hard to see that the minimal Walrasian equilibrium prices necessarily *always* induce indifferences, for any set of bidder valuations, so we cannot simply appeal to "genericity" to eliminate indifferences.
-- Second, although we know natural procedures which converge to Walrasian equilibrium prices when used on a fixed population, in practice, we observe and interact with prices that were arrived at independently of our own participation in a tâtonnement process. We expect that these prices therefore are not perfect Walrasian equilibrium prices tailored exactly to the individuals in the market, but rather, that they result from some kind of "distributional" information about the market. This exacerbates the issue of coordination.
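The first caveat can be seen already in the simplest commodity market, k identical units of a single good sold to unit-demand buyers with distinct values; a toy sketch (the function name is mine, and in this setting the minimal Walrasian price is the (k+1)-st highest value):

```python
def minimal_walrasian_price(values, supply):
    """Minimal market-clearing price for `supply` identical units with
    unit-demand buyers: the highest losing value."""
    vs = sorted(values, reverse=True)
    return vs[supply] if supply < len(vs) else 0

values = [5, 4, 3]
p = minimal_walrasian_price(values, supply=2)  # p = 3
strict = [v for v in values if v > p]          # buyers who strictly demand a unit
indifferent = [v for v in values if v == p]    # buyers indifferent at price p
# The buyer with value exactly p is indifferent between buying and not:
# if she chooses to buy, demand is 3 against a supply of 2 (overdemand),
# so the price alone does not coordinate the market.
```

At any price below p, demand strictly exceeds supply, so the indifference at the minimal Walrasian price is unavoidable, matching the first caveat above.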
We give results of two sorts. First, we show a genericity condition under which the (exact) minimal Walrasian equilibrium prices in a commodity market induce allocations which result in vanishing overdemand for any tie breaking rule that buyers may use to resolve their indifferences. Second, we use techniques from learning theory to argue that the overdemand and welfare induced by a price vector converges to its expectation uniformly over the class of all price vectors, with sample complexity only linear in the number of goods in the market (without placing any assumption on the form of the valuation functions of the buyers).
Combining these two results implies that the exact Walrasian equilibrium prices computed in a commodity market (under a mild genericity condition) are guaranteed to induce both low overdemand and high welfare when used in a new market, in which agents are sampled independently from the same distribution, whenever the number of agents is larger than the number of commodities in the market.
Based on joint work and discussion with Justin Hsu, Jamie Morgenstern, Ryan Rogers, and Rakesh Vohra. Short Bio: Aaron Roth is the Raj and Neera Singh assistant professor of Computer and Information Sciences at the University of Pennsylvania, affiliated with the Warren Center for Network and Data Science. Previously, he received his PhD from Carnegie Mellon University and spent a year as a postdoctoral researcher at Microsoft Research New England. He is the recipient of an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and a Yahoo! ACE award. His research focuses on the algorithmic foundations of data privacy, game theory and mechanism design, learning theory, and the intersection of these topics.
Graph Verification and Reconstruction

Given a mental picture of some unknown graph, how can we verify that this picture is correct, when the unknown graph is accessible only by querying pairs of vertices and receiving their shortest-path distance in the graph? For graphs of bounded degree, we present a simple algorithm that achieves an O(log n) approximation to the minimum number of queries needed. For graphs that also have bounded treewidth, we give an algorithm that uses a number of queries that is nearly linear in the number of vertices. When we do not have an a priori mental image, and the goal is to reconstruct the unknown graph, we give a nearly linear query bound for some special classes of graphs. We also relate the query complexity under distance queries to the query complexity under shortest path queries, where the answer is an actual shortest path between the queried vertices.
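To make the query model concrete, here is a sketch of a BFS distance oracle together with the naive verifier that queries all O(n^2) vertex pairs, using the fact that dist(u, v) = 1 exactly when uv is an edge (the function names are my own):

```python
def bfs_distance(adj, u, v):
    """Shortest-path distance between u and v in an unweighted graph
    given as an adjacency dict {vertex: set of neighbours}."""
    if u == v:
        return 0
    seen, frontier, d = {u}, [u], 0
    while frontier:
        d += 1
        nxt = []
        for x in frontier:
            for y in adj[x]:
                if y == v:
                    return d
                if y not in seen:
                    seen.add(y)
                    nxt.append(y)
        frontier = nxt
    return float("inf")

def naive_verify(conjecture, oracle):
    """Check a conjectured graph with one distance query per vertex pair:
    uv must be an edge exactly when the oracle reports distance 1."""
    vs = list(conjecture)
    return all((oracle(u, v) == 1) == (v in conjecture[u])
               for i, u in enumerate(vs) for v in vs[i + 1:])
```

The verification algorithms in the talk aim to certify a conjectured graph with far fewer than all n(n-1)/2 pairwise queries.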
Based on joint work with Claire Mathieu and Hang Zhou. Short Bio: Sampath is a Henry Salvatori Professor and Department Chair in the Department of Computer and Information Science at the University of Pennsylvania. His research spans several subfields in algorithms. In his work on massive data set algorithms, Sampath explores what can be computed efficiently, and what is not computable. He is also interested in program checking, a paradigm for ensuring the correctness of a program by observing its behavior at run-time, and in algorithmic problems in computational biology, particularly the problem of reconstructing the evolutionary history of a set of species from phenotypic and molecular sequence observations.