Research

Research Overview

Last modified: Fri Jun 17 16:06:37 2011 GMT.

Very broadly, we are interested in building reliable and secure computer software. The computer industry has shown explosive growth in the past thirty years. This trend is not likely to slow down. However, the lack of dependable software is becoming a bottleneck for such growth. Software dependability falls far behind everything else because we do not have good languages and tools that can take precise control of the ever-growing complexity in modern computer systems.

Software today is still developed without any precise specification. A developer often starts the programming task with a rather informal specification S. After careful engineering, the developer delivers a program P that may not satisfy S. Extensive testing and debugging may shrink the gap between the two, but there is no assurance that P accurately follows S. Such inaccuracy may not sound like a big deal, but when developers link a large number of such modules together, these "noises" may multiply, leading to a system that nobody can understand and manage.

Our research interest is to develop new programming languages and tools that can make programming precise. We are particularly interested in making system programming precise, for the following reasons: (1) system software forms the backbone of all computer systems; without a reliable OS core, all claims we can make at the application level are meaningless; (2) OS kernels and runtime systems contain many low-level features that are still not well-understood; (3) emerging computer systems require novel software support; making system programming precise will lead to new insights and capabilities for fulfilling such challenges.

Certified Programming

Certified OS Kernels

Formal Semantics

Concurrency and Coordination

VeriML and Proof Automation

Certifying and Certified Compilation

Language-Based Security and Resilence

Certified Programming

We use the term certified programming to denote the process of developing a program P that precisely follows a specification S. The result is what we call certified software. To build certified code, the very first thing we need is a logic (or a specification language) that can help programmers write rigorous specifications. Such logic must be both expressive and formal so that it can describe everything which programmers may say in natural languages.

The next step is to produce the program itself and show that it indeed satisfies a desirable specification. There are multiple ways to accomplish this. One is to have programmers still write their code using their favorite programming languages (e.g., C, Java, C#) and then apply post-hoc program verification. Another is to build tools that synthesize the resulting program automatically from the specification. The most promising approach will likely fall somewhere in between. This requires new languages and tools that support co-development of formal specifications, proofs, and actual code in an integrated framework.

In the past few years, we have used the Coq proof assistant to serve as our test-bed for developing certified programs. Because we are particularly interested in system software, we focused our effort on certifying low-level programs. This includes our initial work on certified binaries and proof-carrying code as well as a series of subsequent efforts on certified assembly programming. To certify realistic system code, we took a swiss-cheese approach and attempted to knock down as many open challenges as we can: embedded code pointers, stack-based control abstraction, garbage collection (with or without read/write barriers), self-modifying code (including OS bootloaders), machine context-switch libaries, preemptive thread libraries (with hardware interrupts), relaxed memory models, and optimistic concurrency algorithms. We also developed a new open framework (named OCAP) for linking certified heterogeneous components and building end-to-end certified software

Making certified programming both practical and scalable requires much more work. Existing languages and tools are clearly insufficient. The metalogic we used so far (CiC in Coq) is unnecessarily complex and rigid. Domain-specific logics (DSLs) we developed for certifying various systems components are great, but there are many more that are waiting to be discovered.

Certified OS Kernels

We recently started a new effort on the advanced development of certified OS kernels. More details about this effort can be found in this white paper.

Formal Semantics

A big part of certified programming is to uncover the deep insights hidden inside various programming patterns, idioms, or abstractions and to understand the subtle interaction among them. Many of the recent advances in certified software are powered by the large body of work on formal semantics and type systems in the PL community (see Harper's book for a good survey).

Motivated by our interest on system software, we are particularly keen on developing semantic models for low-level languages. Although different applications and OS components may involve completely different programming abstractions, they all get compiled into low-level programs (on von Neumann machines). In some cases, the machine language is the only place where we can reason about interoperation and make end-to-end certification work.

We are interested in developing simple semantic models for side-effects, concurrency, and first-class code pointers, especially in the context of system programming. Ideally, the semantic model we use must be both intuitive (to system programmers) and general. Recent results on linear logic, game semantics, and step-indexing hold great promise, but it is still not clear what are the most fundamental intuitions and how they can be used to support certified low-level programming.

Concurrency and Coordination

We have done much work on reasoning about shared memory concurrency in the past few years. We first applied the rely-guarantee method to certify concurrent assembly code. To support nonpreemptive code, we developed a local gurantee technique. We have also developed a new program logic to support dynamic thread creation and termination.

Our group is the first to successfully combine the Concurrent Separation Logic (CSL) with the Rely-Guarantee mechanism. We later developed an extension of this logic to certify concurrent code with hardware interrupts.

Recently we formalized relaxed memory models by giving a parameterized operational semantics to a concurrent programming language. Behaviors of a program under a relaxed memory model are defined as behaviors of a set of related programs under the sequentially consistent model. This semantics is parameterized in the sense that different memory models can be obtained by using different relations between programs. We present one particular relation that is weaker than many memory models and accounts for the majority of sequential optimizations. We then show that the derived semantics has the DRF-guarantee, using a notion of race-freedom captured by an operational grainless semantics. Our grainless semantics bridges concurrent separation logic (CSL) and relaxed memory models naturally, which allows us to finally prove the folklore theorem that CSL is sound with relaxed memory models.

We are particularly interested in certifying optimistic concurrent programs. We recently developed a history logic which is an extension of Xinyu Feng's local rely guarantee logic with support of temporal reasoning. We have successfully certified a few optimistic algorithms using this new logic.

VeriML and Proof Automation

We want to combine automated provers and decision procedures with certified programming and proof development. Existing automated verifiers often depend on a rather restricted logic. Proof assistants with a richer meta logic (e.g., Coq), however, provide poor support for automation.

We have recently developed a prototype implementation of VeriML, a novel language design that couples a type-safe effectful computational language with first-class support for manipulating logical terms such as propositions and proofs. Our main idea is to integrate a rich logical framework inside a computational language inspired by ML. This way, the OCAP-style certified linking can be supported and the trusted computing base of the verification process is kept at a minimum. The design for VeriML is such that the added features are orthogonal to the rest of the computational language, and also do not require significant additions to the logic language, so soundness is guaranteed.

VeriML can programmatically reflect on logical terms so that we can write a large number of procedures (e.g., tactics, decision procedures, and automated provers) tailored to solving different proof obligations. VeriML also provides an unrestricted programming model for developing these procedures, that permits the use of features such as non-termination and mutable references. The reason for this is that even simple decision procedures might make essential use of imperative data structures and might have complex termination arguments. One such example are decision procedures for the theory of equality with uninterpreted functions. By enabling an unrestricted programming model, porting such procedures does not require significant re-engineering.

Certifying and Certified Compilation

Much work in the program-verification community concentrates on source-level programs written in high-level languages (such as C, Java, C#). In order to turn these programs into certified assembly components suitable for linking in the OCAP framework, OCAP developers must show that their corresponding compiler is also trustworthy.

A certified compiler proves the compiler itself "correct". One such example is CompCert. It specifies formal operational semantics for the C subset as well as for the machine language, and provides a machine-checkable proof in Coq that the compiler preserves behavior from one operational semantics to another.

A certifying compiler, on the other hand, is not necessarily correct but will take a (certified) source program and generate certified assembly code. Previous work on certifying compilation focuses on type-safe source languages and can preserve only type-safety properties. A challenging open problem is to extend certifying compilation to preserve deep correctness and security properties.

Language-Based Security and Resilience

To provide strong mechanisms for controlling information propagation via inter-process communication (IPC) and other interaction mechanisms, we will incorporate and build on recent work on information flow control, which enable the explicit labelling of information and control over its propagation through and out of a system. To support the analysis of and recovery from software and hardware failures of all kinds, we are interested in incorporating history-based accountability and recovery as a basic primitive.

colophon