End-to-End Verification of Information-Flow Security for C and Assembly ProgramsLast modified: Feb 5th 2016. AuthorsDavid CostanzoZhong Shao Ronghui Gu AbstractProtecting the confidentiality of information manipulated by a computing system is one of the most important challenges facing today's cybersecurity community. A promising step toward conquering this challenge is to formally verify that the end-to-end behavior of the computing system really satisfies various information-flow policies. Unfortunately, because today's system software still consists of both C and assembly programs, the end-to-end verification necessarily requires that we not only prove the security properties of individual components, but also carefully preserve these properties through compilation and cross-language linking.In this paper, we present a practical, general, and novel methodology for formally verifying end-to-end security of a software system that consists of both C and assembly programs. We introduce a general definition of observation function that unifies the concepts of policy specification, state indistinguishability, and whole-execution behaviors. We show how to use different observation functions for different levels of abstraction, and how to link different security proofs across abstraction levels using a special kind of simulation that is guaranteed to preserve state indistinguishability. To demonstrate the effectiveness of our new methodology, we have successfully constructed an end-to- end security proof, fully formalized in the Coq proof assistant, of a nontrivial operating system. Some parts of the operating system are written in C and some are written in assembly; we verify all of the code, regardless of language. PublishedInstructions for PLDI'16 Artifact EvaluationThis document contains the instructions for compiling and running the system described in our PLDI submission: a certified operating system kernel that guarantees security. We intend to clarify in this document every detail the AEC members may need to consider. The first four sections of this document describe how to compile and execute the secure version of mCertiKOS, while Section 5 describes the security proof and how the Coq files correspond to our paper. If you have any questions, please do not hesitate to send an email to:
1. The structure of the packageWe submit our aritifact together with a Virtual Box image. We will explain each of the files in the main directory in more detail below. 1.1
|
Folder | Description |
---|---|
compcert | the Compcert compiler |
compcertx | the extended Compcert |
liblayers | the layer library |
mcertikos | the Coq proofs for the 32 layers of mCertiKOS |
kernel | the directory where extracted mCertiKOS code can be executed and tested |
PLDI16AE_Ubuntu15.04_x86_64.ova
Virtual Box image of the environment to build the mCertiKOS image.
Inside the Virtual Box image PLDI16AE_Ubuntu15.04_x86_64.ova
, we have installed the Ubuntu 15.04, and downloaded the full source code of mCertiKOS (in /home/certikos/workspace/certikos
) together with the extracted verified assembly code (in /home/certikos/workspace/certikos/kernel/sys/kern/certikos.S
). While building the whole proof and extraction can be fully done within the virtual guest machine, it will take a large amount of time to complete. Thus, we strongly recommend building the proof directly on a Linux workstation. To give you a rough idea, it takes about 5 hours to compile the full proof and extract the certified assembly code on an 8 core (with hyperthreading) machine with 32 GB of memory, with maximum of 16 parallel threads (make -j16
).
To compile and extract mCertiKOS, the following tools are needed:
Coq
: version 8.4pl4. (Coq 8.5 is not supported because of compatibility issues with CompCert).OCaml
: version 4 (used for code extraction)To compile the kernel with extracted mCertiKOS assembly and to build the mCertiKOS kernel image, the following tools are needed:
gcc
: (32 bit version)libc6
: (32 bit version)build-essential
: (for 'make' system)mCertiKOS is used in a special testbed platform, which requires certain hardware. As a result, it cannot run on top of an arbitrary computer. There are two ways to test the built mCertiKOS image:
A bare-metal machine satisfying the following specification:
An easier way is to use 'qemu'.
We recommend starting the evaluation process with the following three steps:
PLDI16AE_Ubuntu15.04_x86_64.ova
with a relatively new version of Virtualbox. In the guest Linux, we have installed all the tools above. The password of the internal Linux is a single letter "a". Compare and verify that the extracted code is indeed identical to the one in /home/certikos/workspace/certikos/kernel/sys/kern/certikos.S
. If you are unsure about it, you can set up the necessary shared folder with Virtualbox to copy your newly extracted code to the guest, overwriting the old one. Build mCertiKOS and verify that the small example setup we have prepared for you works as expected.In case you want to build everything locally, we will explain how you can install all the necessary tools on a clean amd64 version of Ubuntu 15.04 (which can be downloaded from here).
install all the essential tools by:
sudo apt-get install gcc-multilib libc6-i386 build-essential \m4 ocaml camlp4-extra qemu-utils qemu-system-x86 git netcat menhir ruby
download and install coq
# download Coqwget https://coq.inria.fr/distrib/V8.4pl4/files/coq-8.4pl4.tar.gztar -vxzf coq-8.4pl4.tar.gzrm coq-8.4pl4.tar.gz# compile and install Coqcd coq-8.4pl4./configuremakesudo make install
To build the proof, first unzip the certikos directory and configure it against the platform:
tar vxzf certikos.tar.gzcd certikos# for linux./configure ia32-linux
Then use make
to do the proof checking and code extraction. We recommend forking multiple jobs when making mCertiKOS to reduce the compilation time. It takes approximately 5 hrs to finish the proof checking when running on an 8-core machine with 32GB memory with 16 parallel jobs (make -j16
).
The code extraction will start automatically in the middle of the proof checking process. It prompts the following message:
make -j16>>> coqc ...>>> ...>>> coqtop -batch -load-vernac-source ./extraction/extraction.v>>> Welcome to Coq 8.4pl4>>>>>> cd .. && \ ocamlbuild -no-hygiene -no-links -j 2 -I mcertikos/extraction -I>>> mcertikos/driver -I compcert/lib -I compcert/common -I compcert/ia32/standard -I>>> compcert/ia32 -I compcert/backend -I compcert/cfrontend -I compcert/flocq/Core>>> -I compcert/flocq/Prop -I compcert/flocq/Calc -I compcert/flocq/Appli -I>>> compcert/driver -I compcert/cparser CertiKOSDriver.native && \ mv>>> _build/mcertikos/driver/CertiKOSDriver.native mcertikos/ccertikos Finished, 0>>> targets in 00:00:00.>>>>>> Finished, 1652 targets in 00:00:31.
When it is done, an executable named mcertikos/ccertikos
will be generated. Finally, the mCertiKOS assembly (certikos.s) can be generated by executing the produced binary:
./mcertikos/ccertikos -S
To compile and execute the CertiKOS kernel, you need to insert it into the bootstrapping environment by:
mv certikos.s kernel/sys/kern/certikos.S
You can also use make install
to achieve the above steps.
To compile the mCertiKOS kernel, you should first switch the working directory to kernel/
, and then use sudo ./make.sh
to compile the kernel.
cd kernelsudo ./make.sh# password: a
An illustrating video recorded in the Virtual Box can be accessed below.
If everything goes well, the following messages will be shown on the screen:
*** make certikos kernel : ****** ****** ***>>> done! <<<*** build image ***Building Certikos Image...creating disk...131040+0 records in131040+0 records out67092480 bytes copied, 0.372211 s, 180 MB/sdone.writing mbr...0+1 records in0+1 records out238 bytes copied, 0.000103436 s, 2.3 MB/s23+1 records in23+1 records out11836 bytes copied, 0.000115659 s, 102 MB/sdone.copying kernel files...kernel starts at sector 2048make file system...password for hao:mke2fs 1.42.12Discarding device blocks: doneCreating filesystem with 64496 1k blocks and 16128 inodesFilesystem UUID: c53516ec-9457-416d-8875-3130eec8e927Superblock backups stored on blocks:8193, 24577, 40961, 57345Allocating group tables: doneWriting inode tables: doneCreating journal : doneWriting superblocks and filesystem accounting information: donemount disk...copy kernel...mkdir: created directory '/mnt/boot''obj/boot/loader' -> '/mnt/boot/loader''obj/sys/kernel' -> '/mnt/boot/kernel'unmount...doneAll done.>>> done! <<<*** check image ***>>> done! <<<*** all done! ***
NOTE: building the image requires sudo
privilege. Recall that the account password for the Virtualbox guest Linux is a single letter "a".
The command generates a raw disk image certikos.img
.
mCertiKOS can be directly run by ./run.sh
using qemu
. On the terminal, you will see the serial outputs from our simple security example, where a hacker process is attempting to learn information about another process's computations by exhausting memory.
./run.sh# password: a
An illustrating video recorded in the Virtual Box can be accessed below.
If you would like to play around with the user processes, we suggest only modifying the files Alice.c
, Hacker.c
, and Bob.c
in the directory kernel/user/pingpong
. These files correspond to the Alice, Hacker, and Bob principals in our security example, respectively. They are compiled by CompCert when running the make.sh
script mentioned above. The root CertiKOS process (ID 0) is hardcoded to spawn the idle process (ID 1), which in turn is hardcoded to spawn these three user processes. It may be helpful to take a look at the idle process code (kernel/user/idle/idle.c
), and you may wish to allocate more quota to the three child processes if you want them to use more memory (variables aliceq
, hackerq
, and bobq
represent the quotas, which are the maximum number of 4096-byte pages that each process is allowed to allocate or distribute to children). Make sure that the sum of these quotas is no more than the ROOT_QUOTA
value defined in idle.c
.
The Coq proofs for mCertiKOS can mostly be found in the mcertikos
directory. We will state all file/directory names in this section assuming we are rooted at mcertikos
. If you
would like to interactively step through any of our proofs, you can open a file in Proof General
with the pg
script (e.g., ./pg security/Security.v
, executed from the
mcertikos
directory). This script sets up COQPATH
so that Coq knows where
to look for imported files. If you wish to use CoqIDE instead, you'll need to modify the script
or write your own.
We made three major changes to the previous version of mCertiKOS in our work:
security
directory. The files in this directory correspond primarily to Sections 4 and 5 of the paper; we will discuss them in more detail below.../liblayers/compcertx/Observation.v
and its mCertiKOS output-buffer instantiation layerlib/ObservationImpl.v
), as well as observation-aware simulations and whole-execution behaviors (layerlib/Behavior.v
). These concepts are discussed in detail in Sections 2 and 3 of the paper. Note that Observation.v
is not in the mcertikos
directory due to compilation issues.mm/MContainer.v
), which keep track of memory quotas for each process and child-parent relationships. This aspect is not mentioned in the paper as a contribution, but containers do play an important role in getting the security proof to work (see, for example, the "Security of Page Fault" paragraph of Section 5 in the paper).We have placed detailed comments throughout the files mentioned in the first two bullets; we strongly encourage evaluators to read through these comments to understand how everything works.
Additional notes about the mCertiKOS structure that may be useful for navigating the Coq proofs:
mm
, proc
, and trap
directories,
and there are 7 different files associated with each layer. The easiest way to see how the
layers are ordered is to look at the imports of the file driver/CertiKOSproof.v
.
This file imports a single file for each layer in the correct order, starting from MBoot
and ending at TSysCall.objects/AbstractDataType.v
. This type is a record with many fields; lower
layers only make use of a few of these fields, while higher layers use most or all of them.
Most of the fields themselves are of a specific object type that is defined in the file
layerlib/AuxStateDataType.v
.
trap/TSysCall.v
to see which primitives
are exposed to user processes (the paper also mentions these primitives in Section 4).
The primitives are expressed using a finite-map syntax, in the definitions
tsyscall_fresh
and tsyscall_passthrough
near the bottom of
the file. The full machine semantics of the TSysCall layer is the step
semantics
defined in file layerlib/LAsm.v
, specialized to be able to call the TSysCall
primitives (see the exec_step_prim_call
case of the step
definition).
The security
directory contains a large portion of our contribution, including the top-level security theorem. We will discuss the overall structure and the most important points here.
The bottom-tier (in terms of dependency) files are:
File | Description |
---|---|
SecurityCommon.v |
The baseline security file, defining helpful tactics and a few common lemmas. |
ObsEq.v |
The definition of the mCertiKOS observation function (see Section 4.2 of the paper). |
SecurityInv1.v , SecurityInv2.v , SecurityInv.v |
New invariants over the top layer (TSysCall) needed by the security proof. See the beginning of Section 5 of the paper. The invariants are divided into three files to allow for parallel compilation. |
SecurityExtra.v |
A couple of extra proofs needed to plug everything together, including a proof of behaviorality (see Definition 3 of Section 3 in the paper). |
The second-tier files are:
File | Description |
---|---|
Integrity.v |
The integrity lemma pictured in Figure 6 of the paper. |
Confidentiality.v |
An extension of the confidentiality lemma pictured in Figure 6 of the paper, that applies to any observably equivalent active states (rather than just those that step to inactive states). In other words, this is the unwinding condition for all primitives being called by the observing principal, not just the yield primitive. |
ConfidentialityRestore.v |
The confidentiality restore lemma pictured in Figure 6 of the paper. |
SecurityBigstep.v |
The framework for converting the TSysCall semantics into the TSysCall-local semantics, as described in Section 4.2 of the paper and pictured in Figure 4. It takes as parameters proofs of various lemmas over the TSysCall semantics including the three mentioned above, and first produces an unwinding-condition security proof (high-level security property defined in Sections 2 and 3 of the paper) over TSysCall-local. Then it applies the same strategy as Theorem 1 in the paper to produce a whole-execution security proof (called the low-level security property in the paper) over the TSysCall-local semantics. Note that in our Coq code, we refer to the bigstep semantics as TSysCall-secure; we later changed the name to TSysCall-local for the paper because we felt it was more descriptive. |
The top-tier files are:
File | Description |
---|---|
Security.v |
The top-level file that plugs everything together. It combines all of the second-tier
files together to obtain a whole-execution (low-level) security proof for the TSysCall-local
semantics and a simulation from TSysCall-local to TSysCall, and then chains the simulation
with the one from TSysCall to MBoot that mCertiKOS already provided. Finally, it exploits
relationships between simulations and whole-execution behaviors (proved in
layerlib/Behavior.v ) to obtain the final end-to-end guarantee, a whole-execution
security proof over the MBoot semantics. |
SafeSecure.v |
An extra, somewhat disconnected proof that we added recently (after the paper acceptance notification). It proves a lemma over the TSysCall layer saying that whenever you have two observably equivalent states and one is known to safely take a step, then the other also can take a step (i.e., it doesn't get stuck). See below for discussion on why we added this proof. |
When looking through the security proof, there is one assumed hypothesis that may stand out. In the paper, we developed our theory under the assumption that the top-level semantics is safe. That is, the semantics preserves some invariant I, and the invariant guarantees that the semantics never gets stuck (standard preservation and progress). The TSysCall-local semantics of the current version of mCertiKOS, however, can actually get stuck in two ways:
Because of this potential for the top-level semantics to get stuck, we assume a
significant hypothesis in Security.v
, called active_forever
.
The hypothesis essentially states that neither of the two situations above happens. Once
mCertiKOS adds a sandbox feature and preemption, we will remove this hypothesis from
the security proof.
In order to mitigate the potential harm that such an assumption could imply for
the end-to-end guarantee, we chose to add the SafeSecure.v
proof to
this artifact. This proof guarantees that, even if we do not assume that user processes
execute safely, they still will never be able to exploit safety as a side channel for
information flow. This proof will become pointless once mCertiKOS is upgraded to
have a sandbox environment and preemption, but in the meantime it serves as a reasonable
compromise.
Note that the submitted version of the paper does not discuss this safety assumption, mainly for reasons of space. We will add some discussion for the camera-ready version, however.
Copyright © 1996-2017 The FLINT Group
<flint at cs dot yale dot edu>
Yale University Department of Computer Science |