Analyzing and transforming commercial off-the-shelf (COTS) binaries to improve their security is arguably the Holy Grail of software-security research. For a vast number of programs, the source code is either unavailable or cannot be recompiled. Yet these programs run in deployed systems and provide critical functionality, so we must protect them from exploits. Binary-level techniques that analyze and transform stripped COTS binaries for security have therefore been studied intensively.
Despite decades of research, however, most state-of-the-art binary-level defense techniques are based on heuristics, make strong assumptions about compilation toolchains, and do not handle code obfuscation well.
In this project, we focus on binary-level reverse engineering. Before we can analyze or transform a binary, we must reverse engineer it to recover its basic information, including its instructions, its control-flow graph, and basic dataflow information. Previous reverse-engineering techniques are often ad hoc and lack a formal basis. There has also been no systematic evaluation of which reverse-engineering algorithms perform best in terms of precision and performance. We plan to construct a reverse-engineering tool that makes principled exploration of the design space of reverse-engineering algorithms easy.
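To make "recovering a control-flow graph" concrete, the following is a minimal sketch of the textbook leader-based construction of basic blocks and edges. It is not the tool proposed here. It uses a hypothetical toy instruction set (`jmp`, `jcc`, `ret`, and everything else treated as straight-line code), and the names `Insn` and `build_cfg` are illustrative. It also assumes that instructions have already been disassembled correctly and that every jump target is direct and known, which are exactly the assumptions that stripped or obfuscated binaries tend to violate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    """One instruction of a hypothetical toy ISA (not a real architecture)."""
    addr: int
    op: str                       # "jmp", "jcc", "ret", or any other mnemonic
    target: Optional[int] = None  # direct jump target, if any


def build_cfg(insns):
    """Split instructions into basic blocks and connect them.

    Returns (blocks, edges): blocks maps a leader address to the list of
    instructions in that block; edges is a set of (src_leader, dst_leader).
    """
    insns = sorted(insns, key=lambda i: i.addr)
    if not insns:
        return {}, set()
    addrs = [i.addr for i in insns]
    known = set(addrs)

    # A leader is the first instruction, any jump target, and any
    # instruction that immediately follows a control transfer.
    leaders = {addrs[0]}
    for idx, insn in enumerate(insns):
        if insn.op in ("jmp", "jcc", "ret"):
            if insn.target in known:
                leaders.add(insn.target)
            if idx + 1 < len(insns):
                leaders.add(addrs[idx + 1])

    # Group instructions into blocks, starting a new block at each leader.
    blocks = {}
    current = None
    for insn in insns:
        if insn.addr in leaders:
            current = insn.addr
            blocks[current] = []
        blocks[current].append(insn)

    # Add a taken edge for jumps and a fall-through edge where execution
    # can continue into the next block.
    edges = set()
    ordered = sorted(blocks)
    for pos, leader in enumerate(ordered):
        last = blocks[leader][-1]
        if last.op in ("jmp", "jcc") and last.target in blocks:
            edges.add((leader, last.target))
        if last.op not in ("jmp", "ret") and pos + 1 < len(ordered):
            edges.add((leader, ordered[pos + 1]))
    return blocks, edges


# A small if/else shape: the conditional jump at 1 skips to 4.
program = [
    Insn(0, "cmp"), Insn(1, "jcc", 4),
    Insn(2, "add"), Insn(3, "jmp", 5),
    Insn(4, "sub"), Insn(5, "ret"),
]
blocks, edges = build_cfg(program)
print(sorted(blocks))  # block leader addresses
print(sorted(edges))   # control-flow edges between blocks
```

Even in this simplified setting the result depends on design choices, such as how leaders are chosen and how unresolved targets are treated. Choices of this kind are what a principled exploration of the design space would need to compare.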