Research

I study programming languages and formal verification via interactive theorem-proving: I consider mathematically rigorous specifications of programming languages, and try to prove interesting properties about them using machine-assisted reasoning.

My focus is CakeML: an open-source, ML-like programming language, verified compiler, and proof ecosystem. The CakeML language, semantics, and compiler algorithm are all specified in higher-order logic (HOL), using the interactive theorem-prover HOL4. This allows both end-to-end verification of the compiler, and bootstrapping within the logic.

Project: Building Verified Applications in CakeML, funded by the UK Research Institute in Verified Trustworthy Software Systems (VeTSS).


On this page:


Publications



.
, (), In , PhD thesis. , .
Slides



PureCake

This work has been published.

Higher-level programming languages

In recent years, the programming language community has seen an uptake of high-level programming languages which offer stronger static guarantees to the programmer. Languages such as Rust, Hack, and Go have received strong support from industry (Mozilla, Facebook, and Google respectively). As such guarantees become stronger, it becomes harder to ensure they are enforced by the language ecosystems - the smallest bug can introduce unexpected issues in user code.

Purity and laziness

These two particular features of interest are perhaps best known due to Haskell.

Purity refers to a lack of side-effects (such as reading/writing to a set of global variables). Side effects are difficult to reason about, so lack of purity leads to many bugs. As a result Haskell largely eliminates side-effects - though fortunately users can still produce effectful computations using monads. Purity leads to referential transparency: the idea that any expression in a program can be replaced with its definition, without changing the behaviour of that program. In other words, the behaviour of any given expression depends only on its inputs.

Referential transparency allows expressions to be evaluated in any order. Haskell leverages this to be non-strict (or lazy): no expression is evaluated until it has to be. This prevents unnecessary divergence in a program - only values that are needed will be computed. This is sometimes viewed as another form of purity: according to some, termination is a side-effect!

Our work

I am collaborating with several other CakeML developers to develop PureCake: a language which brings the end-to-end guarantees of CakeML to the pure, lazy, functional programming world.

We are developing the following features for PureCake:



Verifying Arm ISA models

This work has been published.

Several key proofs rely on formal models of instruction set architectures (ISAs) to provide a reference semantics. For example, CakeML’s powerful end-to-end correctness result relies on such a model for each of its compilation targets. Another example is the sel4 project, which has created a verified operating system microkernel, proving that source C code enforces isolation between system components and processes. The project extends its guarantees down the the compiled binary using translation validation: sel4 is compiled using an off-the-shelf compiler, and automated SMT proves equivalent the semantics of the original C code and the compiled target code, according to the ISA model.

L3

L3 is an architecture specification language created by Anthony Fox, and is state of the art in formalising ISA models - including those used in CakeML/seL4. It can be thought of as a DSL for ISA specification: L3 provides features to streamline specification writing, and generates clean definitions and useful automation in the interactive theorem-provers Isabelle/HOL and HOL4 (as well as executable SML for testing).

The canonical L3 Arm models are hand-written - though they are well-validated with thorough testing, there is still no formal link to Arm specifications. They also selectively abstract over the ISA in the interest of streamlining theorem-proving. We would like to advance the state of the art by using more complete models, derived systematically from Arm specifications.

ASL and Sail

ASL is the Arm-internal architecture specification language - it began as informal pseudocode for concise human-readable documentation, but has since become a well-specified language used extensively within Arm. Arm architecture specifications are now released with machine-executable, ASL documentation, permitting faithful specification and testing.

The Sail language (REMS project, University of Cambridge) is another architecture-specification language. A key aim of the Sail project is to translate ASL to Sail automatically, allowing ASL specifications to be used with Sail’s wider ecosystem. In particular, Sail models can be translated to various backends: executable code for simulation in OCaml and C, and the interactive theorem-provers Coq, Isabelle/HOL, and HOL4. These latter extractions allow us to reason about the Arm ISA formally in far more detail than ever before.

Strengthening assurances in CakeML and sel4

Replacing the existing L3 Arm models used in CakeML and sel4 with their ASL-derived, Sail-translated versions would strengthen the results of the two projects greatly. However, these ASL-derived models are extremely complex, and not well-suited to interactive proof.

Instead, we can prove that the L3-derived models simulate the ASL-derived ones, effectively using the L3 models as stepping-stones. This links the two models formally and increases assurances in the two projects, but avoids the complexity of repeating arduous proofs over a more complex model.



Semantic type soundness for CakeML

As the CakeML project matures, my project aims to create practical techniques for writing useful, verified applications in CakeML. I am particularly interested in improving performance in imperative compiler benchmarks, without compromising the guarantees provided by the CakeML ecosystem – currently, we suffer as we cannot straightforwardly use array accesses which are not bounds-checked. This is due to the formulation of our compiler correctness result.

Compiler correctness in CakeML

The CakeML compiler is proved to preserve the semantics of input source programs in output machine code, according to validated models of instruction set architectures. However, our compiler correctness result requires the input source program to be well-formed - it must have a well-defined semantics (i.e. it does not get stuck, or cause a runtime type error).

Currently, this condition is enforced in one of three ways:

However, each of these has its own limitations:

We would like a trade-off between these methods, so that we can combine the compositionality and ease-of-use of syntactic typing or translation from HOL, with the expressivity of proof in separation logic - but while avoiding the restrictions of the former and the effort of the latter. We believe semantic type soundness holds the key.

What is semantic type soundness?

My current aim is to produce a semantic type soundness result for CakeML. This would permit reasoning about programs that are safe to use as if they have a certain type, but are not syntactically typeable.

Semantic type soundness is well-studied (see here for a presentation), and requires construction of a unary, step-indexed logical relation which expresses “semantic typing”. Any notion of semantic typing must be proved compatible with syntactic typing: for every syntactic typing rule, we should be able to derive a semantic equivalent. However, expressing a suitable logical relation in the simply-typed logic of HOL4 is not at all straightforward.

Ideally, our notion of semantic typing could integrate with the existing CakeML proof ecosystem in some way (in particular, with refinement invariants and characteristic formulae). This would permit the re-use of existing proofs in these frameworks to derive semantic typing results.

Uses in CakeML

Other work

Translation from higher-order logic to stateful CakeML

This work has been published.

I worked with Oskar Abrahamsson and other developers to improve CakeML’s monadic translator. This machinery augments CakeML’s pure translator, translating HOL functions written in a state-and-exception monad to stateful CakeML.

In particular, I contributed the following: