I study programming languages and formal verification via interactive theorem-proving: I consider mathematically rigorous specifications of programming languages, and try to prove interesting properties about them using machine-assisted reasoning.
My focus is CakeML: an open-source, ML-like programming language, verified compiler, and proof ecosystem. The CakeML language, semantics, and compiler algorithm are all specified in higher-order logic (HOL), using the interactive theorem-prover HOL4. This allows both end-to-end verification of the compiler, and bootstrapping within the logic.
My project is titled “Building Verified Applications in CakeML” and is funded by the UK Research Institute in Verified Trustworthy Software Systems (VeTSS).
I gave a talk on my current work in Dec 2020 - contact me if you would like to see a recording.
As the CakeML project matures, I aim to create practical techniques for writing useful, verified applications in CakeML. I am particularly interested in improving performance in imperative compiler benchmarks, without compromising the guarantees provided by the CakeML ecosystem – currently, we suffer as we cannot straightforwardly use array accesses which are not bounds-checked. This is due to the formulation of our compiler correctness result.
The CakeML compiler is proved to preserve the semantics of input source programs in output machine code, according to validated models of instruction set architectures. However, our compiler correctness result requires the input source program to be well-formed - it must have a well-defined semantics (i.e. it does not get stuck, or cause a runtime type error).
Currently, this condition is enforced in one of three ways:
However, each of these has its own limitations:
We would like a trade-off between these methods, so that we can combine the compositionality and ease-of-use of syntactic typing or translation from HOL, with the expressivity of proof in separation logic - but while avoiding the restrictions of the former and the effort of the latter. We believe semantic type soundness holds the key.
My current aim is to produce a semantic type soundness result for CakeML. This would permit reasoning about programs that are safe to use as if they have a certain type, but are not syntactically typeable.
Semantic type soundness is well-studied (see here for a presentation), and requires construction of a unary, step-indexed logical relation which expresses “semantic typing”. Any notion of semantic typing must be proved compatible with syntactic typing: for every syntactic typing rule, we should be able to derive a semantic equivalent. However, expressing a suitable logical relation in the simply-typed logic of HOL4 is not at all straightforward.
Ideally, our notion of semantic typing could integrate with the existing CakeML proof ecosystem in some way (in particular, with refinement invariants and characteristic formulae). This would permit the re-use of existing proofs in these frameworks to derive semantic typing results.
Fast, safe, imperative libraries.
We can improve imperative compiler benchmarks by safely encapsulating inherently unsafe features (such as array accesses which are not bounds-checked). Semantic typing provides a verification condition, describing exactly what it means to be safely-encapsulated – i.e. a user of the library can only observe safe behaviour as long as they write syntactically well-typed code. Critically, compatibility of semantic typing places the proof burden on the library writer only - we can compose their proof of semantic typing with type inference over the user’s code, to produce an overall result of semantic typing.
Reasoning about type abstraction.
Many modules hide their implementations using type abstraction, permitting module-writers to maintain internal invariants. Semantic typing is a well-known way of proving that such invariants are preserved.
We aim to (re-)prove soundness for Candle, a HOL theorem-proving kernel implemented in CakeML. Like all LCF-style kernels, Candle hides its theorem datatype in a module which exposes only primitive inference rules - soundness follows because only this module can construct theorems.
CakeML can be targeted as a mature, verified compiler backend by any language aiming for end-to-end compiler verification (cf. LLVM IR for non-verified compilers). Such languages may have type systems incompatible with that of CakeML (e.g. dependent typing, linear types, etc.). However, in certain cases we may safely compose code compiled from these languages with code written directly in CakeML, by proving that the compilation to CakeML produces semantically well-typed code.
One candidate for this approach is PureCake: an in-development, call-by-need language (similar to the Glasgow Haskell Compiler Core language) which will compile to CakeML. A more exotic possibility is verified extraction to CakeML from the Coq proof assistant - currently Coq can extract code to OCaml, with the insertion of unsafe casts.
I received a Palantir Prize for a highly commended undergraduate dissertation in 2018. I implemented and verified a compiler optimsation for the CakeML compiler, supervised by Magnus Myreen (technical - CakeML), Stephen Kell (project planning and dissertation), and Anthony Fox (technical - HOL4). This is what led me to my current PhD work!
This work has been published.
I worked with Oskar Abrahamsson and other developers to improve CakeML’s monadic translator. This machinery augments CakeML’s pure translator, translating HOL functions written in a state-and-exception monad to stateful CakeML.
In particular, I contributed the following: