End-to-End Verification of Information-Flow Security for C and Assembly Programs

Last modified: Feb 5th 2016.

Authors

Abstract

Protecting the confidentiality of information manipulated by a computing system is one of the most important challenges facing today's cybersecurity community. A promising step toward conquering this challenge is to formally verify that the end-to-end behavior of the computing system really satisfies various information-flow policies. Unfortunately, because today's system software still consists of both C and assembly programs, the end-to-end verification necessarily requires that we not only prove the security properties of individual components, but also carefully preserve these properties through compilation and cross-language linking.

In this paper, we present a practical, general, and novel methodology for formally verifying end-to-end security of a software system that consists of both C and assembly programs. We introduce a general definition of observation function that unifies the concepts of policy specification, state indistinguishability, and whole-execution behaviors. We show how to use different observation functions for different levels of abstraction, and how to link different security proofs across abstraction levels using a special kind of simulation that is guaranteed to preserve state indistinguishability. To demonstrate the effectiveness of our new methodology, we have successfully constructed an end-to- end security proof, fully formalized in the Coq proof assistant, of a nontrivial operating system. Some parts of the operating system are written in C and some are written in assembly; we verify all of the code, regardless of language.

Published

Paper: [PDF] (Extended TR: [PDF])
Artifact: [TAR]
Virtual Box Image: [OVA]

Instructions for PLDI'16 Artifact Evaluation

This document contains the instructions for compiling and running the system described in our PLDI submission: a certified operating system kernel that guarantees security. We intend to clarify in this document every detail the AEC members may need to consider. The first four sections of this document describe how to compile and execute the secure version of mCertiKOS, while Section 5 describes the security proof and how the Coq files correspond to our paper. If you have any questions, please do not hesitate to send an email to:

David Costanzo (david.costanzo@yale.edu)
Ronghui Gu (ronghui.gu@yale.edu)

1. The structure of the package

We submit our aritifact together with a Virtual Box image.


    +-- certikos.tar.gz                    # source code of mCertiKOS
    +-- PLDI16AE_Ubuntu15.04_x86_64.ova    # Virtual Box image

We will explain each of the files in the main directory in more detail below.

1.1 `certikos.tar.gz`

The source code of our version of mCertiKOS. It includes:

Folder	Description
compcert	the Compcert compiler
compcertx	the extended Compcert
liblayers	the layer library
mcertikos	the Coq proofs for the 32 layers of mCertiKOS
kernel	the directory where extracted mCertiKOS code can be executed and tested

1.2 `PLDI16AE_Ubuntu15.04_x86_64.ova`

Virtual Box image of the environment to build the mCertiKOS image.

2. Preparation for the environment

Inside the Virtual Box image PLDI16AE_Ubuntu15.04_x86_64.ova, we have installed the Ubuntu 15.04, and downloaded the full source code of mCertiKOS (in /home/certikos/workspace/certikos) together with the extracted verified assembly code (in /home/certikos/workspace/certikos/kernel/sys/kern/certikos.S). While building the whole proof and extraction can be fully done within the virtual guest machine, it will take a large amount of time to complete. Thus, we strongly recommend building the proof directly on a Linux workstation. To give you a rough idea, it takes about 5 hours to compile the full proof and extract the certified assembly code on an 8 core (with hyperthreading) machine with 32 GB of memory, with maximum of 16 parallel threads (make -j16).

To compile and extract mCertiKOS, the following tools are needed:

Operating System: Linux (Ubuntu 15.04 is recommended).
Coq: version 8.4pl4. (Coq 8.5 is not supported because of compatibility issues with CompCert).
OCaml: version 4 (used for code extraction)

To compile the kernel with extracted mCertiKOS assembly and to build the mCertiKOS kernel image, the following tools are needed:

gcc: (32 bit version)
libc6: (32 bit version)
build-essential: (for 'make' system)

mCertiKOS is used in a special testbed platform, which requires certain hardware. As a result, it cannot run on top of an arbitrary computer. There are two ways to test the built mCertiKOS image:

A bare-metal machine satisfying the following specification:
- ITX Motherboard Chassis System (1407664/2809001)
- 120 Watt AC/DC Adapter 120 Watt DC-DC ATX Pico Power Supply
- Intel® Desktop Board DQ67EP / DQ77KB
- Intel Quad Core i7-2600S 2.8GHz (Max Turbo Frequency 3.8GHz) CPU
- 8GB DDR3 Memory Modules
- Bootable Compact Flash Converter
- PCI-Express Mini Card, 120GB 2.5" MLC SSD
- 9" 9-Pin to 10-Pin Adapter with Applicable Cables
- A USB stick with 32.0+ GB space
An easier way is to use 'qemu'.

We recommend starting the evaluation process with the following three steps:

Install the necessary software to compile and extract mCertiKOS assembly on either a local Linux workstation or on a Linux Cloud machine. And follow necessary instructions in the next sections to make sure all of the proofs compile and the code extraction works.
Start the Virtualbox image PLDI16AE_Ubuntu15.04_x86_64.ova with a relatively new version of Virtualbox. In the guest Linux, we have installed all the tools above. The password of the internal Linux is a single letter "a". Compare and verify that the extracted code is indeed identical to the one in /home/certikos/workspace/certikos/kernel/sys/kern/certikos.S. If you are unsure about it, you can set up the necessary shared folder with Virtualbox to copy your newly extracted code to the guest, overwriting the old one. Build mCertiKOS and verify that the small example setup we have prepared for you works as expected.

In case you want to build everything locally, we will explain how you can install all the necessary tools on a clean amd64 version of Ubuntu 15.04 (which can be downloaded from here).

install all the essential tools by:

sudo apt-get install gcc-multilib libc6-i386 build-essential \
m4 ocaml camlp4-extra qemu-utils qemu-system-x86 git netcat menhir ruby

download and install coq

# download Coq 
wget https://coq.inria.fr/distrib/V8.4pl4/files/coq-8.4pl4.tar.gz
tar -vxzf coq-8.4pl4.tar.gz
rm coq-8.4pl4.tar.gz
 
# compile and install Coq 
cd coq-8.4pl4
./configure
make
sudo make install

3. Proof checking and code extraction

To build the proof, first unzip the certikos directory and configure it against the platform:

tar vxzf certikos.tar.gz
cd certikos
 
# for linux 
./configure ia32-linux

Then use make to do the proof checking and code extraction. We recommend forking multiple jobs when making mCertiKOS to reduce the compilation time. It takes approximately 5 hrs to finish the proof checking when running on an 8-core machine with 32GB memory with 16 parallel jobs (make -j16).

The code extraction will start automatically in the middle of the proof checking process. It prompts the following message:

make -j16
 
>>> coqc ...
>>> ...
>>> coqtop -batch -load-vernac-source ./extraction/extraction.v
>>> Welcome to Coq 8.4pl4 (April 2015)
>>>
>>> cd .. && \ ocamlbuild -no-hygiene -no-links -j 2 -I mcertikos/extraction -I
>>> mcertikos/driver -I compcert/lib -I compcert/common -I compcert/ia32/standard -I
>>> compcert/ia32 -I compcert/backend -I compcert/cfrontend -I compcert/flocq/Core
>>> -I compcert/flocq/Prop -I compcert/flocq/Calc -I compcert/flocq/Appli -I
>>> compcert/driver -I compcert/cparser  CertiKOSDriver.native && \ mv
>>> _build/mcertikos/driver/CertiKOSDriver.native mcertikos/ccertikos Finished, 0
>>> targets (0 cached) in 00:00:00.
>>>
>>> Finished, 1652 targets (1316 cached) in 00:00:31.

When it is done, an executable named mcertikos/ccertikos will be generated. Finally, the mCertiKOS assembly (certikos.s) can be generated by executing the produced binary:

./mcertikos/ccertikos -S

To compile and execute the CertiKOS kernel, you need to insert it into the bootstrapping environment by:

mv certikos.s kernel/sys/kern/certikos.S

You can also use make install to achieve the above steps.

4. Execution of the certified code

4.1 Build the mCertiKOS kernel image

To compile the mCertiKOS kernel, you should first switch the working directory to kernel/, and then use sudo ./make.sh to compile the kernel.

cd kernel
sudo ./make.sh
# password: a

An illustrating video recorded in the Virtual Box can be accessed below.

If everything goes well, the following messages will be shown on the screen:

 
*** make certikos kernel (DEBUG_MSG=1 SERIAL_DEBUG=1 ):  ***
 *** [as: 13, cc: 73, ccomp: 0, ld: 11] ***
 *** [boot0: 238, boot1: 11836, loader: 61964, kernel: 935496] ***
 >>> done! <<<
 *** build image ***
Building Certikos Image...
 
creating disk...
131040+0 records in
131040+0 records out
67092480 bytes (67 MB) copied, 0.372211 s, 180 MB/s
done.
 
writing mbr...
0+1 records in
0+1 records out
238 bytes (238 B) copied, 0.000103436 s, 2.3 MB/s
23+1 records in
23+1 records out
11836 bytes (12 kB) copied, 0.000115659 s, 102 MB/s
done.
 
copying kernel files...
kernel starts at sector 2048
 
make file system...
[sudo] password for hao:
mke2fs 1.42.12 (29-Aug-2014)
Discarding device blocks: done                            
Creating filesystem with 64496 1k blocks and 16128 inodes
Filesystem UUID: c53516ec-9457-416d-8875-3130eec8e927
Superblock backups stored on blocks:
    8193, 24577, 40961, 57345
 
Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
 
 
mount disk...
 
copy kernel...
mkdir: created directory '/mnt/boot'
'obj/boot/loader' -> '/mnt/boot/loader'
'obj/sys/kernel' -> '/mnt/boot/kernel'
 
unmount...
done
 
All done.
 >>> done! <<<
 *** check image ***
 >>> done! <<<
 *** all done! ***

NOTE: building the image requires sudo privilege. Recall that the account password for the Virtualbox guest Linux is a single letter "a".

The command generates a raw disk image certikos.img.

4.2 Run mCertiKOS kernel

mCertiKOS can be directly run by ./run.sh using qemu. On the terminal, you will see the serial outputs from our simple security example, where a hacker process is attempting to learn information about another process's computations by exhausting memory.

./run.sh
# password: a

An illustrating video recorded in the Virtual Box can be accessed below.

If you would like to play around with the user processes, we suggest only modifying the files Alice.c, Hacker.c, and Bob.c in the directory kernel/user/pingpong. These files correspond to the Alice, Hacker, and Bob principals in our security example, respectively. They are compiled by CompCert when running the make.sh script mentioned above. The root CertiKOS process (ID 0) is hardcoded to spawn the idle process (ID 1), which in turn is hardcoded to spawn these three user processes. It may be helpful to take a look at the idle process code (kernel/user/idle/idle.c), and you may wish to allocate more quota to the three child processes if you want them to use more memory (variables aliceq, hackerq, and bobq represent the quotas, which are the maximum number of 4096-byte pages that each process is allowed to allocate or distribute to children). Make sure that the sum of these quotas is no more than the ROOT_QUOTA value defined in idle.c.

5. The Security Proof

5.1 Relevant Coq files

The Coq proofs for mCertiKOS can mostly be found in the mcertikos directory. We will state all file/directory names in this section assuming we are rooted at mcertikos. If you would like to interactively step through any of our proofs, you can open a file in Proof General with the pg script (e.g., ./pg security/Security.v, executed from the mcertikos directory). This script sets up COQPATH so that Coq knows where to look for imported files. If you wish to use CoqIDE instead, you'll need to modify the script or write your own.

We made three major changes to the previous version of mCertiKOS in our work:

Added the entire security directory. The files in this directory correspond primarily to Sections 4 and 5 of the paper; we will discuss them in more detail below.
Made many changes throughout the layer libraries (the "framework" code referred to in Figure 5 of the paper), including adding general notions of observation (../liblayers/compcertx/Observation.v and its mCertiKOS output-buffer instantiation layerlib/ObservationImpl.v), as well as observation-aware simulations and whole-execution behaviors (layerlib/Behavior.v). These concepts are discussed in detail in Sections 2 and 3 of the paper. Note that Observation.v is not in the mcertikos directory due to compilation issues.
Prior to our security verification, we added the notion of containers to mCertiKOS (layer mm/MContainer.v), which keep track of memory quotas for each process and child-parent relationships. This aspect is not mentioned in the paper as a contribution, but containers do play an important role in getting the security proof to work (see, for example, the "Security of Page Fault" paragraph of Section 5 in the paper).

We have placed detailed comments throughout the files mentioned in the first two bullets; we strongly encourage evaluators to read through these comments to understand how everything works.

Additional notes about the mCertiKOS structure that may be useful for navigating the Coq proofs:

This version of mCertiKOS consists of 32 linearly-ordered layers. The layer files are spread out across the mm, proc, and trap directories, and there are 7 different files associated with each layer. The easiest way to see how the layers are ordered is to look at the imports of the file driver/CertiKOSproof.v. This file imports a single file for each layer in the correct order, starting from MBoot and ending at TSysCall.
As mentioned in the paper, each layer of mCertiKOS has its own type for abstract data, and a full program state consists of the abstract data, the machine registers, and some concrete CompCert-style memory that has not yet been abstracted. There is actually a single shared type for the abstract data of all layers, defined in the file objects/AbstractDataType.v. This type is a record with many fields; lower layers only make use of a few of these fields, while higher layers use most or all of them. Most of the fields themselves are of a specific object type that is defined in the file layerlib/AuxStateDataType.v.
You can take a look at the file trap/TSysCall.v to see which primitives are exposed to user processes (the paper also mentions these primitives in Section 4). The primitives are expressed using a finite-map syntax, in the definitions tsyscall_fresh and tsyscall_passthrough near the bottom of the file. The full machine semantics of the TSysCall layer is the step semantics defined in file layerlib/LAsm.v, specialized to be able to call the TSysCall primitives (see the exec_step_prim_call case of the step definition).
Our Coq proofs are available in browsable, html format at this website

5.2 The Security Directory

The security directory contains a large portion of our contribution, including the top-level security theorem. We will discuss the overall structure and the most important points here.

The bottom-tier (in terms of dependency) files are:

File Description

SecurityCommon.v The baseline security file, defining helpful tactics and a few common lemmas.

ObsEq.v The definition of the mCertiKOS observation function (see Section 4.2 of the paper).

SecurityInv1.v, SecurityInv2.v, SecurityInv.v New invariants over the top layer (TSysCall) needed by the security proof. See the beginning of Section 5 of the paper. The invariants are divided into three files to allow for parallel compilation.

SecurityExtra.v A couple of extra proofs needed to plug everything together, including a proof of behaviorality (see Definition 3 of Section 3 in the paper).

The second-tier files are:

File	Description
`Integrity.v`	The integrity lemma pictured in Figure 6 of the paper.
`Confidentiality.v`	An extension of the confidentiality lemma pictured in Figure 6 of the paper, that applies to any observably equivalent active states (rather than just those that step to inactive states). In other words, this is the unwinding condition for all primitives being called by the observing principal, not just the yield primitive.
`ConfidentialityRestore.v`	The confidentiality restore lemma pictured in Figure 6 of the paper.
`SecurityBigstep.v`	The framework for converting the TSysCall semantics into the TSysCall-local semantics, as described in Section 4.2 of the paper and pictured in Figure 4. It takes as parameters proofs of various lemmas over the TSysCall semantics including the three mentioned above, and first produces an unwinding-condition security proof (high-level security property defined in Sections 2 and 3 of the paper) over TSysCall-local. Then it applies the same strategy as Theorem 1 in the paper to produce a whole-execution security proof (called the low-level security property in the paper) over the TSysCall-local semantics. Note that in our Coq code, we refer to the bigstep semantics as TSysCall-secure; we later changed the name to TSysCall-local for the paper because we felt it was more descriptive.

The top-tier files are:

File Description

Security.v The top-level file that plugs everything together. It combines all of the second-tier files together to obtain a whole-execution (low-level) security proof for the TSysCall-local semantics and a simulation from TSysCall-local to TSysCall, and then chains the simulation with the one from TSysCall to MBoot that mCertiKOS already provided. Finally, it exploits relationships between simulations and whole-execution behaviors (proved in layerlib/Behavior.v) to obtain the final end-to-end guarantee, a whole-execution security proof over the MBoot semantics.

SafeSecure.v An extra, somewhat disconnected proof that we added recently (after the paper acceptance notification). It proves a lemma over the TSysCall layer saying that whenever you have two observably equivalent states and one is known to safely take a step, then the other also can take a step (i.e., it doesn't get stuck). See below for discussion on why we added this proof.

5.3 Process Safety

When looking through the security proof, there is one assumed hypothesis that may stand out. In the paper, we developed our theory under the assumption that the top-level semantics is safe. That is, the semantics preserves some invariant I, and the invariant guarantees that the semantics never gets stuck (standard preservation and progress). The TSysCall-local semantics of the current version of mCertiKOS, however, can actually get stuck in two ways:

The top-level semantics does not currently specify what should happen when a user process attempts to execute an assembly instruction with undefined semantics (e.g., divide by zero). Ideally, an operating system should provide a sandbox environment for user processes, where any undefined instruction causes a trap into the kernel, and is handled by either killing the offending process or yielding to a different process. mCertiKOS does not yet do this, but there are plans to add this feature in the near future.
The big steps of the TSysCall-local semantics could get stuck if a process yields but is never scheduled again. Even if we proved that the kernel scheduler is fair (which would not be difficult as it currently only does round-robin scheduling), we would still need to assume that user processes always eventually call yield. This is a fundamental limitation of a non-preemptive kernel. There are plans to make mCertiKOS preemptive in the future, but this requires a significant amount of effort.

Because of this potential for the top-level semantics to get stuck, we assume a significant hypothesis in Security.v, called active_forever. The hypothesis essentially states that neither of the two situations above happens. Once mCertiKOS adds a sandbox feature and preemption, we will remove this hypothesis from the security proof.

In order to mitigate the potential harm that such an assumption could imply for the end-to-end guarantee, we chose to add the SafeSecure.v proof to this artifact. This proof guarantees that, even if we do not assume that user processes execute safely, they still will never be able to exploit safety as a side channel for information flow. This proof will become pointless once mCertiKOS is upgraded to have a sandbox environment and preemption, but in the meantime it serves as a reasonable compromise.

Note that the submitted version of the paper does not discuss this safety assumption, mainly for reasons of space. We will add some discussion for the camera-ready version, however.