### Sublinear Algorithms for Approximating String Compressibility

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and Lempel-Ziv (LZ), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds that show that our algorithms for both schemes cannot be improved significantly.

Our investigation of LZ yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to Lempel-Ziv to the number of distinct short substrings contained in it. In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a distribution.

Based on joint work with Dana Ron, Ronitt Rubinfeld and Adam Smith.

### Packet-symbol decoding for reliable multipath reception with no sequence numbers (slides in ppt)

This talk describes a new decoding method based on two prior ideas:
1) Mitzenmacher's idea in 2006 of adding a different pseudo-random
number to each packet in a packet-symbol (n, k) code; 2) Metzner's
idea in 1990 of
vector symbol decoding based on symbol set verifications. The new
method allows one to derive the data despite packet deletions, packet
errors, out-of-order receptions, and no sequence numbers and no in-
packet error
detection. It has less decoding complexity than the method described
by Mitzenmacher, and is not restricted to low density codes. The
method appears most useful in wireless communication that might use
multiple path
reception, where packets are small and may be received out of order,
with error, more than once, correctly or incorrectly, or deleted.
Despite the saving of using no sequence numbers and n in-packet error
detection
overhead, the decoder can decode the packet-based code almost as well
as if the error positions and deletion positions were known in
advance. Error detection in the overall multi-packet code should be
used, but this can
amount to less than one bit per data packet.

### Improvements on the minimum digraph problem

It is an old and natural problem: given a set of arcs of a strongly
connected graph, find a subset of minimum size that still makes the
graph connected.

In one version of the problem, we have data on dependence of various
parameters in a natural system, typically, so-called expression
level of proteins in cells and enviromental factors (light, temperature,
nutrients). Data includes indirect evidence and direct evidence.
We want to keep all edges that correspond to direct evidence and
minimize the number of edges that correspond to indirect evidence.
In terms of the minimum digraph problem, the new feature is that
we are forced to ``pay'' for certain edges.

We show approximation algorithm which guarantees that we pay less
than 1.5 times the optimum. This was published already in 2001,
but in a conference, and it seems that the method description and
the proof were both incomplete. Moreover, we have the same
approximation
ratio for the second version of the problem.

Joint work with Bhaskar DasGupta from UIC.

### Regularity Discovery as a Graph Generation Problem

Automatic regularity discovery in real world data is an important and computationally challenging task. Our previous work on "Discovering Texture Regularity as a Higher-Order Correspondence Problem" uses a spectral method to generate degree-4 graph from monochrome imagery. Even though the initial results suggest the feasibility of the approach, the problem formulation and algorithmic considerations of the degree-4 lattice/graph generation process remain unexplored. In addition, we have not taken the potential advantage of propagating constraints bottom-up and top-down between raw image data and an effective abstract graphic representation. In this talk, I shall motivate the problem from its source of machine perception of near-regular patterns based on crystallographic groups, lay out the current method and considerations and experimental results, and finally, with your help, explore a new graph theoretical formalization for effective and (hopefully) efficient regularity discovery in real world image data.

References:

Discovering Texture Regularity as a Higher-Order Correspondence Problem 9th European Conference on Computer Vision, May, 2006.
http://www.ri.cmu.edu/pubs/pub_5295.html

A Computational Model for Periodic Pattern Perception Based on Frieze and Wallpaper Groups IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 3, March, 2004, pp. 354 - 371.
http://www.ri.cmu.edu/pubs/pub_4468.html

### The Communication Complexity of Correlation

In this talk, we examine the communication required for generating
correlated random variables remotely. Consider a pair of joint random
variables (X,Y). Suppose Alice is given a sample x distributed
according to X, and she has to send a message to Bob who is then
required to generate a value with distribution exactly Y|_{X=x} (i.e,
the conditional distribution of Y given X=x). We consider the minimum
expected number of bits (which we call C(X:Y)) Alice needs to send Bob
to achieve this. We show that if Alice and Bob are allowed to share
random bits (independent of Alice's input x), then Alice need send no
more than approximately the mutual information number of bits. More
formally,

I[X:Y] <= C(X:Y) <= I[X:Y] + O(I[X:Y]) + O(1)

where I[X:Y] = H[X] + H[Y] - H[X,Y] is the mutual information between
the random variables X and Y.

As an application of this characterization of mutual information, we
derive a direct sum
theorem in communication complexity that substantially improves the
previous such result shown by Jain et al 2003.

Our proofs are based on a rejection sampling procedure that relates
the relative entropy between two distributions to the communication

### Authentication for Pervasive Communications

Imagine a world where devices from vehicles to dog collars are constantly communicating with their environments. On the one hand, we need a method for authenticating these messages to hold parties spreading false information accountable. On the other hand, we need these messages to remain anonymous, so as not to create a privacy crisis. In this work, we propose a system that balances these two (seemingly contradictory) requirements. We create a credential system that lets a user anonymously authenticate at most k times in a single time period. A user withdraws a dispenser of k authentication tokens, which automatically refreshes every time period. Thus, parties obeying the rules remain anonymous and there is a limit to how much false information one party can spread. Our construction is based on e-cash, which allows us to propose several interesting extensions including optional tracing, revocation, and glitch protection. We\'ll discuss how efficient our scheme is in practice, as well as techniques for making it even more practical. A background in cryptography will be helpful, but not necessary for this talk. This is based on a collection of joint work with Mira (Meyerovich) Belenkiy, Jan Camenisch, Markulf Kohlweiss, Anna Lysyanskaya, and Michael Ostergaard Pedersen.

### Secure Multiparty Computation of Approximations: A Survey

Suppose two or more parties have pieces of a large database and they want to do joint datamining. Typically, they will want their computation to be efficient, at least approximately correct (often exact computations cannot be performed efficiently), and private, in the sense that the players do not want to reveal more information than necessary. Although approximation algorithms and private multiparty computation (as well as a stronger notion of secure multiparty computation) have been studied for decades, it is not straightforward to combine existing techniques. We discuss the issues, give a general framework for secure approximations, and survey some results in
the area.

### Trapdoors for Hard Lattices, with Cryptographic Applications

Cryptographic schemes based on *lattices* are attractive for several reasons: their computations are simple and of low complexity, they have thus far resisted quantum attacks, and their security can be based on the *worst-case* hardness of lattice problems. To date, constructions of such primitives have unfortunately been limited mainly to collision-resistant hashing and public-key encryption. This talk will describe how to securely generate and exploit \"trapdoors\" in lattices. The cryptographic applications include various kinds of trapdoor functions, simple \"hash-and-sign\" digital signature schemes, and identity-based encryption, all based on worst-case lattice assumptions. The talk will be self-contained. Joint work with Craig Gentry and Vinod Vaikuntanathan.

### New Possibilities for Program Obfuscation

A program\'s code is often instructive---using a debugger with the code, a determined student can usually understand how the program works and learn any of its secrets. A recent line of research in program obfuscation aims to address whether this property is fundamental: Is it possible to obfuscate a program\'s code so that its functionality is preserved, but the code reveals no other information? Unfortunately, for a variety of definitions and models, general program obfuscation is not possible [H99, BGI+01, W05, GK06]. The grim situation stems from the fact that some programs necessarily leak secrets about how they work. In this talk, we show that an interesting and natural encryption functionality can be obfuscated. Whereas the only other positive results of obfuscation apply to very simple point functions [C97,CMR99,W05], we are able to obfuscate the much richer re- encryption functionality. This functionality transforms a ciphertext for message m encrypted under Alice\'s public key into a ciphertext for the same message m under Bob\'s public key. It is useful in many practical applications such as secure email forwarding, encrypted file servers, and certain implementations of iTunes. Our positive evidence that some complex programs can be meaningfully obfuscated introduces a new challenge in this area: can we come to a better understanding of which types of programs can be obfuscated?This is joint work with Susan Hohenberger, Guy Rothblum, and Vinod Vaikuntanathan.

### Exponential time algorithms for hard problems

In this talk, I discuss fast exponential time algorithms for NP-complete
problems. Every NP-complete problem can be solved by exhaustive search.
But for some problems, it is possible to obtain significantly faster
algorithms, though still exponential in time.
I present some of the most important techniques to design and analyze
fast exponential time algorithms: Dynamic Programming across Subsets,
Branch & Reduce, Measure & Conquer, Memorization, Treewidth based
Algorithms, Treewidth combined with Branch & Reduce,
Inclusion-Exclusion, and Iterative Compression.
Each technique is illustrated with an algorithm for a specific problem.
The problems considered in this talk are:
- Maximum Independent Set (MIS). Given a graph G = (V,E), find an
independent set of G of maximum size. An independent set of a graph is a
set of pairwise non-adjacent vertices of the graph.
- Travelling Salesman Problem (TSP). Given a set of cities and the
distance between every two cities, find a tour visiting all cities with
minimum total distance. A tour is a permutation of the cities, starting
and ending in a given city.
- Coloring (COL): Given a graph G = (V,E), find a coloring of V with the
smallest number of colors. A coloring f : V -> {1, 2, ..., k} is a
function assigning colors to V such that no two adjacent vertices are
assigned the same color.
- k-Hitting Set (k-HS). Given an instance (U, S) where U is a universe
of elements and S is a set of subsets of size at most k of U, find a
hitting set for (U, S) of minimum size. A hitting set for (U, S) is a
set of elements H such that for each S' in S, S' and H contain at least
one common element.

Some References:

[1] F. V. Fomin, F. Grandoni, and D. Kratsch. Some new techniques in
design and analysis of exact (exponential) algorithms. Bulletin of the
European Association for Theoretical Computer Science 87, pp. 47-77 (2005).

[2] G. J.Woeginger. Exact algorithms for NP-hard problems: a survey.
Combinatorial Optimization - Eureka, You Shrink!, LNCS. Springer-Verlag
New York, pp. 185-207 (2003).

[3] G. J.Woeginger. Space and time complexity of exact algorithms : some
open problems. Proceedings of the First International Workshop on
Parameterized and Exact Computation (IWPEC 2004), LNCS 3162.
Springer-Verlag Berlin, pp. 281-290 (2004).

### Learning, Testing, and Approximation

Roughly speaking, in learning theory the goal is to use limited data to construct an approximation to an unknown function, and in property testing the goal is to determine whether an unknown function can be approximated in a particular way using very limited query access to the function.Thus it is clear that (at least on a superficial level) both learning and testing are closely connected to approximation.The high-level goal of this talk is to demonstrate that in fact deeper connections exist between these topics. Each of these three tasks -- learning, testing, and approximation -- can be viewed as a type of \"lens\" for gaining insights into classes of functions, and the insights gained from one viewpoint can quite often be profitably transferred to the other tasks. We demonstrate this by describing recent results that connect learning, testing, and approximation for many different well-studied types of Boolean functions, such as DNF formulas, decision trees, sparse polynomials, and halfspaces. The talk is based in part on joint work with I. Diakonikolas (Columbia), H. Lee (Columbia), K. Matulef (MIT), K. Onak (MIT), R. Rubinfeld (MIT) and A. Wan (Columbia); with K. Matulef, R. Rubinfeld and R. O\'Donnell (CMU); and with R. O\'Donnell.

### Internet Measurement in the Presence of Adversaries

Edge networks connected to the Internet need effective monitoring techniques to drive routing decisions and detect violations of Service Level Agreements (SLAs). However, existing measurement tools, (like ping and traceroute), are vulnerable to attacks that make a path look better than it really is. In this paper, we design and analyze path-quality monitoring protocols that robustly raise an alarm when packet-loss rate and delay exceeds a threshold, even when adversary tries to bias monitoring results by selectively delaying, dropping, modifying, injecting, or preferentially treating packets. Despite this strong threat model we consider here, we are able to develop protocols that are efficient enough to run at line rate on high-speed routers.We take a rigorous approach to the problem by applying techniques from theoretical cryptography. We start by presenting an extremely lightweight secure sketching protocol for identifying when packet loss and delay degrade beyond a threshold that requiring only 250-600 bytes of storage in a router and periodic transmission of a comparably sized IP packet. We then present a technique for composing this protocol in a manner that allows the source to securely localize faulty links on a single path through a network to a receiver. Finally, we present a set of negative results that show that any protocol that has the security properties we require must also have a key infrastructure and invoke cryptographic operations.

### Bertrand Competition in Networks

The Internet is a unique modern artifact given its sheer size and the number of its users. Given its continuing distributed and ad-hoc evolution, and emerging applications, there have been growing concerns about the effectiveness of its current routing protocols in finding good routes and ensuring quality of service. Imposing congestion-based and QoS-based prices on packets has been suggested as a way of combating the ills of this distributed growth and selfish use of resources. Unfortunately, the effectiveness of such approaches relies on the cooperation of the multiple entities implementing them, namely the ISPs or Internet service providers. The goals of the ISPs do not necessarily align with the social objectives of efficiency and quality of service; their primary objective is to maximize their own profit. It is therefore imperative to study the following question: given a large combinatorial market such as the Internet, suppose that the owners of resources selfishly price their product to maximize their own profit, and consumers selfishly purchase resources to maximize their own utility, how does this effect the functioning of the market as a whole? We study this problem in the context of a simple network pricing game, and analyze the performance of equilibria arising in this game as a function of the degree of competition in the game, the network topology, and the demand structure. Economists have traditionally studied such questions in single-item markets. It is well known, for example, that monopolies cause inefficiency in a market by charging high prices, whereas competition has the effect of driving prices down and operating efficiently. Our work extends the classical Bertrand model of competition from economics to the network setting. For example, we ask (and answer): is competition in a network enough to ensure efficient operation? does performance worsen as the number of monopolies grows? does the answer depend in an interesting way on the network topology and/or demand distribution? This is joint work with Tim Roughgarden.

### Approximate Online Optimization

Imagine a FedEx driver who must repeatedly choose a route to visit a set of cities. Each day, she chooses a tour and then finds out the total time that it took. The time a street takes may vary unpredictably from day to day due to traffic, construction, accidents, or any other cause. Her goal is to achieve low average time over many days. More generally, in an online sequential linear optimization problem, a decision-maker selects daily from a known (possibly infinite) set of decisions, and experiences a cost determined by the next function in an unknown sequence of cost functions chosen from a known family of linear functions. The offline version of the FedEx driver\'s problem is simply the Traveling Salesman Problem, where the edge costs are known in advance. In previous work, Kalai and Vempala (COLT \'03) showed how to convert any exact offline algorithm for a linear optimization problem into an algorithm for the online sequential version of the problem. They also applied their technique to a special class of approximation algorithms. Unfortunately, many problems (e.g., the TSP above) cannot be solved efficiently offline, and many approximation algorithms do not fall into this special class. In this talk, we show how to convert any approximation algorithm for a linear optimization problem into an algorithm for the online sequential version of the problem. Our reduction maintains the asymptotic approximation guarantee of the original algorithm, relative to the average performance of the best static decision in hindsight (e.g. best TSP tour). Our approach is based on a geometric interpretation of this class of problems. Joint work with Sham Kakade and Adam Tauman Kalai.