Commercial software is fraught with bugs, and attackers constantly look to identify and exploit the resulting vulnerabilities, both for fun and for profit. In recent times, we have seen several instances where attackers gained unauthorized access to systems and stole large amounts of security-sensitive user information that companies store, such as their customers' credit card information. Years of research have gone into studying such bugs and prescribing solutions, without much change in the quality of deployed software. The main problem is that while the software security community prescribes a proactive approach of incorporating security into the design of software, programmers tend to view software security as a reactive process. Economic and practical considerations force programmers to focus mainly on functionality, leaving security to be retrofitted as the need arises. However, the existing techniques for retrofitting software for security are manual and ad hoc.
In this work, we focus on the problem of retrofitting legacy code in order to enforce an authorization policy. An authorization policy specifies who can access a resource in a system (e.g., files, sockets, etc.) and what rights (actions such as reading, writing, etc.) they have when they do so. A mechanism that enforces such a policy is called a reference monitor and must satisfy certain guarantees. One such guarantee is that every potentially security-sensitive action on a resource is mediated by a call to the reference monitor. These calls are called authorization hook calls. Complete mediation ensures that an unauthorized user cannot gain access to security-sensitive operations in a program. To date, authorization hook placement in code bases such as the X server and PostgreSQL has largely been a manual procedure, driven by informal analysis of server code and discussions on developer forums. There is even a lack of consensus on basic concepts, such as the definition of what constitutes a security-sensitive operation. Consequently, past efforts have taken several years to accomplish this task manually. The goal of this thesis is to solve this problem by retrofitting a legacy program with such a reference validation mechanism in a largely automated fashion, with little effort from a programmer. Past research efforts have attempted to solve this problem, but in a largely ad hoc fashion involving significant programmer input and domain knowledge. As a result, none of them has seen widespread adoption.

There are two main challenges in the placement of authorization hooks. The first is the identification of the objects to be protected, the subjects that execute activities and request access to objects, and the operations that can be executed on the objects and must be controlled. Subjects, objects, and operations may be different in different systems or application contexts. For instance, in the protection of operating systems, objects are typically files, directories, or programs; in database systems, objects can be relations, views, stored procedures, and so on. In this work, we tackle generic user-space servers that manage resources for and provide services to multiple, mutually distrusting clients. Therefore, we need to infer program-specific subjects, objects, and operations. Second, once we have an initial set of specifications, we need to determine how to place hooks that satisfy some important guarantees. The hook placement must satisfy the complete mediation guarantee of a reference monitor. It must also satisfy the principle of least privilege, i.e., a subject must be authorized only for the permissions required to perform the operation requested. Finally, it should minimize the total number of hooks placed, to allow for easy verification of hook code and to reduce performance penalties at run-time. The granularity of operation identification plays an important role in the number of hooks placed in the program. Once an initial placement is suggested and deployed, the access control policy specification may also provide additional clues that help guide the search for the correct granularity of hook placement. Therefore, any authorization hook placement effort must also have an iterative component that allows for incorporating lessons learned from the deployment of the system.


The main contribution of this work is a novel automated method for placing authorization hooks in server code that significantly reduces the burden on developers. The technique can identify optimal hook placements, that is, placements that minimize both the number of authorization hooks placed and the number of authorization queries generated at runtime, while providing complete mediation. To develop the technique, we rely on a key observation that we gleaned by studying server code.

In a server, clients make requests, which identify the objects manipulated and the security-sensitive operations performed on them. When a client makes a deliberate choice of an object from a collection of objects managed by the server, that choice automatically signals the need for authorization. Which security-sensitive operations are performed on the retrieved object(s) is determined by the code path that the server takes, which is also a consequence of the user's choice.

Based upon this observation, we design a static program analysis that tracks user choice to identify both security-sensitive objects and the operations that the server performs on them. Our analysis only requires a specification of the statements from which client input may be obtained (e.g., socket reads), and a language-specific definition of object containers (e.g., arrays, lists), to generate a complete authorization hook placement. It uses context-sensitive, flow-insensitive data flow analysis to track how client input influences the selection of objects from containers: these are marked security-sensitive objects. The analysis also tracks how control flow decisions in code influence how the objects are manipulated: these manipulations are security-sensitive operations. The output of this analysis is a set of program locations where mediation is necessary. However, placing hooks at all these locations may be suboptimal, both in terms of the number of hooks placed (e.g., a large number of hooks complicates code maintenance and understanding) and the number of authorization queries generated at runtime. We therefore use the control structure of the program to further optimize the placement of authorization hooks.

Figure 1: Example of a program to demonstrate our approach

Figure 1 illustrates the insights that motivate our solution approach. Authorization is needed when only a subset of subjects should be allowed to access particular program objects or perform particular accesses on those objects. Figure 1 shows that objects o1 and o2 are accessible to a subject User A, but o3 and o4 are not. Further, the subject may not be allowed to perform all operations on her accessible objects. For example, User A may only be allowed to perform a read operation on object o1, while she can both read and write object o2. The choice of authorization hook placement must ensure that the program can only perform an operation after it is mediated for that operation, while ensuring that a subject is authorized only for the operation she has requested to perform. Thus, the statements F and H should only be mediated for read operations, whereas the statements K and L should only be mediated for write operations. The statements I and J do not require mediation, as they perform no security-sensitive operations. The program's behaviors on behalf of subjects are determined by the subjects' user requests. We find that by tracking user requests, we can identify the set of objects that require mediation, because such inputs guide the selection of objects for processing. Also, by tracking user requests, we can identify the operations the program performs, because such inputs choose the program statements that manipulate objects.
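As a minimal sketch of the policy described for User A, the following Python fragment shows read and write operations that each run only after a hook has authorized exactly that operation. The names `POLICY`, `OBJECTS`, `authorize`, `read_object`, and `write_object`, as well as the stored values, are assumptions made for illustration; they are not taken from Figure 1 or from the implementation.

```python
# Hypothetical policy mirroring the description of User A:
# read-only on o1, read and write on o2, no access to o3 and o4.
POLICY = {
    ("UserA", "o1"): {"read"},
    ("UserA", "o2"): {"read", "write"},
}

# Hypothetical container of server-managed objects (values are made up).
OBJECTS = {"o1": "alpha", "o2": "beta", "o3": "gamma", "o4": "delta"}


def authorize(subject, obj, op):
    """Authorization hook: True only if the policy grants `op` on `obj`."""
    return op in POLICY.get((subject, obj), set())


def read_object(subject, obj):
    # The hook mediates the read, and only the read (least privilege).
    if not authorize(subject, obj, "read"):
        raise PermissionError(f"{subject} may not read {obj}")
    return OBJECTS[obj]


def write_object(subject, obj, value):
    # A separate hook mediates the write.
    if not authorize(subject, obj, "write"):
        raise PermissionError(f"{subject} may not write {obj}")
    OBJECTS[obj] = value
```

Because each hook names the single operation it guards, a request to read o1 never requires User A to hold write permission on it.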

Programs that manage objects on behalf of multiple users typically store them in containers. When a subject makes a request, the program may use the request input to choose the objects to retrieve from some containers, potentially resulting in access to data used by another subject. As the retrieved values are assigned to program variables, these variables represent the program's security-sensitive objects. In Figure 1, variable v in statement C is security-sensitive because the user request input i is used to retrieve an object from the container. By tracking the dataflow from user requests to the selection of objects in containers, we can identify the variables that hold these objects in the program.
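The idea of tracking dataflow from request input to container lookups can be sketched on a toy program representation. The sketch below is a simplified, intraprocedural, flow-insensitive taint propagation and is not the context-sensitive analysis of the actual tool; the three statement forms (`input`, `assign`, `lookup`) and every variable name other than v and i are assumptions made for illustration.

```python
def find_sensitive_variables(statements):
    """Return the variables that hold objects selected by client input.

    Statement forms (hypothetical toy IR):
      ("input", dst)                     dst is read from the client
      ("assign", dst, src)               dst = src
      ("lookup", dst, container, index)  dst = container[index]
    """
    tainted = set()    # variables influenced by client input
    sensitive = set()  # variables holding user-selected objects
    changed = True
    while changed:     # flow-insensitive: iterate until a fixpoint
        changed = False
        for stmt in statements:
            new_tainted, new_sensitive = set(), set()
            if stmt[0] == "input":
                new_tainted.add(stmt[1])
            elif stmt[0] == "assign":
                _, dst, src = stmt
                if src in tainted:
                    new_tainted.add(dst)
                if src in sensitive:
                    new_sensitive.add(dst)
            elif stmt[0] == "lookup":
                _, dst, _container, index = stmt
                if index in tainted:
                    new_sensitive.add(dst)
            if not new_tainted <= tainted or not new_sensitive <= sensitive:
                tainted |= new_tainted
                sensitive |= new_sensitive
                changed = True
    return sensitive


# Statement order is deliberately scrambled: a flow-insensitive analysis
# must reach the same answer regardless of order.
program = [
    ("lookup", "v", "objects", "i"),  # v = objects[i]
    ("assign", "i", "req"),           # i = req
    ("input", "req"),                 # req comes from the client
    ("lookup", "w", "objects", "k"),  # k is not client-controlled
    ("assign", "u", "v"),             # u aliases the selected object
]
sensitive_vars = find_sensitive_variables(program)
```

Here `v` is marked because its index derives from the request, `u` because it is copied from `v`, and `w` is not marked because its index `k` never depends on client input.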

The program executes statements chosen by its control statements. If a user request affects the values of the variables used in a control statement's predicate, then the subject can choose the program statements that may access security-sensitive objects. We call the sets of program statements that may be chosen by subjects user-choice operations. In Figure 1, statement E is a control statement, and three user-choice operations result if E's predicate depends on the user request. In addition, the choice of an object from a container is also a user-choice operation. Only the user-choice operations that contain accesses to variables that hold security-sensitive objects represent security-sensitive operations. In the example in Figure 1, these are the operations read v and write v. These are the operations that require mediation via authorization hooks.
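The filtering step described above can be sketched as follows: a control statement yields user-choice operations only if its predicate depends on client input, and a user-choice operation is security-sensitive only if it accesses a security-sensitive variable. The function name, the branch labels `b1` to `b3`, and the contents of each branch are hypothetical; they do not reproduce the statements of Figure 1.

```python
def security_sensitive_operations(branches, predicate_vars, tainted, sensitive):
    """Select the branches of one control statement that need mediation.

    branches:       dict mapping a branch label to a list of (op, var) accesses
    predicate_vars: variables used in the control statement's predicate
    tainted:        variables influenced by client input
    sensitive:      variables holding security-sensitive objects
    """
    # If the client cannot influence the predicate, the branches are not
    # user-choice operations at all.
    if not set(predicate_vars) & set(tainted):
        return {}
    result = {}
    for label, accesses in branches.items():
        ops = {(op, var) for op, var in accesses if var in sensitive}
        if ops:  # branches that never touch a sensitive variable are skipped
            result[label] = ops
    return result


# Hypothetical branches of a control statement whose predicate uses `req`.
branches_at_e = {
    "b1": [("read", "v")],
    "b2": [("read", "tmp")],            # touches no sensitive variable
    "b3": [("write", "v"), ("read", "tmp")],
}
needs_hooks = security_sensitive_operations(
    branches_at_e, predicate_vars=["req"], tainted={"req", "i"}, sensitive={"v"}
)
```

In this example only `b1` and `b3` are reported, requiring mediation for reading and writing `v` respectively.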

Naively placing one authorization hook per security-sensitive operation may lead to sub-optimal hook placement. For instance, if all three user-choice operations at E perform the same security-sensitive operation, then we could place a single hook at E to the same effect as placing a separate hook at each branch. Also, if the same security-sensitive operation is performed twice as part of a single request, we need to authorize it only once. Our solution to the authorization hook placement problem optimizes hook placement by removing redundant mediation in two ways. First, we remove hooks from sibling operations (i.e., user-choice operations that result from the same control statement) if they mediate the same operations, which we call hoisting common operations. Second, we remove any mediation that is already performed by existing hooks that dominate the operation, which we call removing redundant mediation.
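The two optimizations can be sketched on a tree of nested user-choice operations, where a parent node stands for a control statement and therefore dominates its children. This is a simplification assumed for illustration: the actual analysis works on the program's control structure rather than on a tree, and the `Node` class, function names, and example tree are hypothetical.

```python
class Node:
    """A control statement or user-choice operation with its required hooks."""

    def __init__(self, name, hooks=(), children=()):
        self.name = name
        self.hooks = set(hooks)        # operations authorized at this point
        self.children = list(children)


def hoist_common(node):
    """Move hooks shared by ALL sibling operations up to their parent."""
    for child in node.children:
        hoist_common(child)
    if node.children:
        common = set.intersection(*(c.hooks for c in node.children))
        node.hooks |= common
        for child in node.children:
            child.hooks -= common


def remove_redundant(node, authorized=frozenset()):
    """Drop hooks already performed by a dominating (ancestor) hook."""
    node.hooks -= authorized
    covered = authorized | node.hooks
    for child in node.children:
        remove_redundant(child, covered)


def placement(node):
    """Collect the non-empty hook sets, keyed by node name."""
    result = {node.name: set(node.hooks)} if node.hooks else {}
    for child in node.children:
        result.update(placement(child))
    return result


# Hypothetical example: every branch at E reads v, one branch also writes v,
# and a nested operation g1 repeats the write.
tree = Node("E", children=[
    Node("b1", hooks={"read v"}),
    Node("b2", hooks={"read v"}),
    Node("b3", hooks={"read v", "write v"}, children=[
        Node("g1", hooks={"write v"}),
        Node("g2"),
    ]),
])
hoist_common(tree)
remove_redundant(tree)
final_placement = placement(tree)
```

The naive placement in this example uses five hooks; after both passes, one hook for reading `v` remains at `E` and one for writing `v` at `b3`. Hoisting takes the intersection over all siblings, so a subject is never authorized for an operation that some branch does not perform.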


We implemented our solution using the CIL C source code analysis framework. We evaluated our technique on four programs: the X server, PostgreSQL, PennMUSH, and memcached. The first two have manually placed hooks, and we found that our automatically placed hooks cover the manually placed hooks, although the granularity of the automatically generated hooks is finer. The source code will be posted here in a few months. For the design, implementation, and results of our work, please refer to our CCS2012 paper.


[CCS2012] Leveraging `Choice' to Automate Authorization Hook Placement, Divya Muthukumaran; Trent Jaeger; Vinod Ganapathy; in proceedings of CCS 2012, Raleigh, North Carolina, Oct 2012.