patternMinor
Proof Carrying LLVM?
Viewed 0 times
llvmcarryingproof
Problem
I am intrigued by and understand the very basics of Proof Carrying Code (PCC) and I recognize that LLVM is a machine-independent intermediate language. LLVM is the intermediate form of many languages, including the new Rust language. Typed Assembly is another style of intermediate form which can be used to generate PCC proofs but it has not gained the same popularity/utility. I've heard of Singularity, Verve and JX.
As I understand it, generating PCC proofs are within practicality even for large programs, but that their resulting size and verification burden (necessary only once per "installation") are non-negligible. I don't care about 10x (discardable) proofs nor several hours of verification - I'm very interested in a provably-safe operating environment.
Is there anything missing from the LLVM language or type system that would prevent us from emitting such proofs? Or can you tell that am I missing something about the bigger picture?
More
PCC for what properties? Memory safety?
Yes, memory safety which I believe is a minimum policy required to ensure that a process cannot take over the kernel or another application. Seems to me that (popularly) a message passing interprocess-communication scheme restores some of the functionality taken away by direct memory access (Erlang, Singularity OS).
Do you have a citation/reference for your statement about practicality?
Here's something that suggests its practicality to me: "It has been shown that proofs for Java type safety can be compressed to 12%–20% of the size of x86 machine code ... but unfortunately this increases the proof checking time by a factor of 3".
-- Interpretation of Necula, Oracle-Based Checking of Untrusted Software by Franz
What research have you done?
I've been following this topic (including the names I mentioned previously) over the years reading and absorbing what I can comprehend.
I think it helps to separate "is it possible to do PCC for LLVM bitcode?" from "why doesn't
As I understand it, generating PCC proofs are within practicality even for large programs, but that their resulting size and verification burden (necessary only once per "installation") are non-negligible. I don't care about 10x (discardable) proofs nor several hours of verification - I'm very interested in a provably-safe operating environment.
Is there anything missing from the LLVM language or type system that would prevent us from emitting such proofs? Or can you tell that am I missing something about the bigger picture?
More
PCC for what properties? Memory safety?
Yes, memory safety which I believe is a minimum policy required to ensure that a process cannot take over the kernel or another application. Seems to me that (popularly) a message passing interprocess-communication scheme restores some of the functionality taken away by direct memory access (Erlang, Singularity OS).
Do you have a citation/reference for your statement about practicality?
Here's something that suggests its practicality to me: "It has been shown that proofs for Java type safety can be compressed to 12%–20% of the size of x86 machine code ... but unfortunately this increases the proof checking time by a factor of 3".
-- Interpretation of Necula, Oracle-Based Checking of Untrusted Software by Franz
What research have you done?
I've been following this topic (including the names I mentioned previously) over the years reading and absorbing what I can comprehend.
I think it helps to separate "is it possible to do PCC for LLVM bitcode?" from "why doesn't
Solution
This is a project I've been thinking of venturing into for a few years now, and I was fortunate to have a very useful discussion about it at the time I started planning it with Chris Lattner (at the time, one of the main architects of LLVM, although better known these days perhaps for his work on Swift).
It does seem that there is a possibility of producing correctness proofs for a subset of C (or potentially C++) using LLVM. There has already been some work done which would be useful:
-
As discussed in comments, memory safety is critical. Without memory safety, type safety is essentially irrelevant. A number of memory-safety related analyses and transformations have already been implemented for LLVM that could help. These include:
-
"fat pointers", i.e. transformation of C style pointers into pointers into areas of defined size, in order to catch bound errors
-
a transformation applied to dynamic memory allocation that enables detection of use of pointers after they have been freed. These two features are described in this paper.
For the subset of C programs that perform no type casts, applying these transformations may be adequate. Of course, real C and/or C++ code often uses typecasts, and adding this facility is more challenging. For performance sake, as many of these cases as possible must be caught at verification time rather than runtime, although in the end of the day the use of runtime type information and a dynamic casting implementation would be enough to allow for entirely safe C code.
It does seem that there is a possibility of producing correctness proofs for a subset of C (or potentially C++) using LLVM. There has already been some work done which would be useful:
-
As discussed in comments, memory safety is critical. Without memory safety, type safety is essentially irrelevant. A number of memory-safety related analyses and transformations have already been implemented for LLVM that could help. These include:
-
"fat pointers", i.e. transformation of C style pointers into pointers into areas of defined size, in order to catch bound errors
-
a transformation applied to dynamic memory allocation that enables detection of use of pointers after they have been freed. These two features are described in this paper.
For the subset of C programs that perform no type casts, applying these transformations may be adequate. Of course, real C and/or C++ code often uses typecasts, and adding this facility is more challenging. For performance sake, as many of these cases as possible must be caught at verification time rather than runtime, although in the end of the day the use of runtime type information and a dynamic casting implementation would be enough to allow for entirely safe C code.
Context
StackExchange Computer Science Q#49357, answer score: 3
Revisions (0)
No revisions yet.