One of the key challenges of the information age is to actually get access to information. Many important datasets, such as collections of medical records, are not accessible to the general public due to privacy concerns. This limits our ability to analyze such data to derive information that could benefit the general public (for example, such analysis may reveal the beginnings of disease outbreaks).

The goal of statistical privacy is to overcome these limitations. Statistical privacy is the art of computing statistics, answering queries, or performing other processing of sensitive datasets without revealing sensitive information. This would let researchers learn about disease outbreaks without being able to learn the personal details of any individual. In other words, the goal is to precisely control which inferences are possible.

Research in statistical privacy focuses on creating privacy definitions (restrictions on how data can be processed) and privacy mechanisms (algorithms that process data according to those restrictions). One of the difficulties of statistical privacy is that intuition about privacy is often faulty. The literature on privacy (including statistical databases, disclosure control, etc.) contains many examples of privacy definitions and mechanisms that were later found to be faulty or inapplicable to different kinds of datasets.

Recent work has shown that privacy definitions can be formalized and analyzed as mathematical objects. This approach offers the possibility of minimizing the role of intuition. The goal is to develop a set of privacy axioms that are simple and easy to understand. This would allow privacy definitions to be generated in a modular way: for a given application, an appropriate subset of axioms is chosen, and that subset determines the privacy definition. Since privacy and its axioms can be analyzed mathematically, the properties and consequences of such privacy definitions would be easier to examine.

This research is funded by NSF award #1054389.