Theory Compiler

(*  Title:       A Shorter Compiler Correctness Proof for Language IMP
    Author:      Pasquale Noce
                 Software Engineer at HID Global, Italy
                 pasquale dot noce dot lavoro at gmail dot com
                 pasquale dot noce at hidglobal dot com
*)

section "Compiler formalization"

theory Compiler
  imports
    "HOL-IMP.BExp"
    "HOL-IMP.Star"
begin


subsection "Introduction"

text ‹
This paper presents a compiler correctness proof for the didactic imperative programming language
IMP, introduced in cite"Nipkow14", shorter than the proof described in cite"Nipkow14" and
included in the Isabelle2021 distribution cite"Klein21". Actually, the size of the presented
proof is just two thirds of the book's proof in the number of formal text lines, and as such it
promises to constitute a further enhanced reference for the formal verification of compilers meant
for larger, real-world programming languages.

Given compiler \emph{completeness}, viz. the simulation of source code execution by compiled code,
"in a deterministic language like IMP", compiler correctness "reduces to preserving termination: if
the machine program terminates, so must the source program", even though proving this "is not much
easier" (cite"Nipkow14", section 8.4). However, the presented proof does not depend on language
determinism, so that the proposed approach is applicable to non-deterministic languages as well.

As a confirmation, this paper extends IMP with an additional command @{text "c1 OR c2"}, standing
for the non-deterministic choice between commands @{text c1} and @{text c2}, and proves compiler
\emph{correctness}, viz. the simulation of compiled code execution by source code, for such extended
language. Of course, the aforesaid comparison between proof sizes does not consider the lines in the
proof of lemma @{text ccomp_correct} (which proves compiler correctness for commands) pertaining to
non-deterministic choice, since this command is not included in the original language IMP. Anyway,
non-deterministic choice turns out to extend that proof just by a modest number of lines.

For further information about the formal definitions and proofs contained in this paper, see
Isabelle documentation, particularly cite"Paulson21", cite"Nipkow21", cite"Krauss21",
and cite"Nipkow11".
›


subsection "Definitions"

text ‹
Here below are the definitions of IMP commands, extended with non-deterministic choice, as well as
of their big-step semantics.

As in the original theory file cite"Klein21", program counter's values are modeled using type
@{typ int} rather than @{typ nat}. As a result, the same declarations and definitions used in
cite"Klein21" to deal with this modeling choice are adopted here as well.

\null
›

declare [[coercion_enabled]]
declare [[coercion "int :: nat  int"]]
declare [[syntax_ambiguity_warning = false]]

datatype com =
  SKIP |
  Assign vname aexp  ("_ ::= _" [1000, 61] 61) |
  Seq com com  ("_;;/ _" [60, 61] 60) |
  If bexp com com  ("(IF _/ THEN _/ ELSE _)" [0, 0, 61] 61) |
  Or com com  ("(_ OR _)" [60, 61] 61) |
  While bexp com  ("(WHILE _/ DO _)" [0, 61] 61)

inductive big_step :: "com × state  state  bool" (infix "" 55) where
Skip:  "(SKIP, s)  s" |
Assign:  "(x ::= a, s)  s(x := aval a s)" |
Seq:  "(c1, s1)  s2; (c2, s2)  s3  (c1;; c2, s1)  s3" |
IfTrue:  "bval b s; (c1, s)  t  (IF b THEN c1 ELSE c2, s)  t" |
IfFalse:  "¬ bval b s; (c2, s)  t  (IF b THEN c1 ELSE c2, s)  t" |
Or1:  "(c1, s)  t  (c1 OR c2, s)  t" |
Or2:  "(c2, s)  t  (c1 OR c2, s)  t" |
WhileFalse:  "¬ bval b s  (WHILE b DO c, s)  s" |
WhileTrue:  "bval b s1; (c, s1)  s2; (WHILE b DO c, s2)  s3 
  (WHILE b DO c, s1)  s3"

declare big_step.intros [intro]

abbreviation (output)
"isize xs  int (length xs)"

notation isize ("size")

primrec (nonexhaustive) inth :: "'a list  int  'a" (infixl "!!" 100) where
"(x # xs) !! i = (if i = 0 then x else xs !! (i - 1))"

lemma inth_append [simp]:
 "0  i 
    (xs @ ys) !! i = (if i < size xs then xs !! i else ys !! (i - size xs))"
by (induction xs arbitrary: i, auto simp: algebra_simps)

text ‹
\null

Next, the instruction set and its semantics are defined. Particularly, to allow for the compilation
of non-deterministic choice commands, the instruction set is extended with an additional instruction
@{text JMPND} performing a non-deterministic jump -- viz. as a result of its execution, the program
counter unconditionally either jumps by the specified offset, or just moves to the next instruction.

As instruction execution can be non-deterministic, an inductively defined predicate @{text iexec},
rather than a simple non-recursive function as the one used in cite"Klein21", must be introduced
to define instruction semantics.

\null
›

datatype instr = 
  LOADI int | LOAD vname | ADD | STORE vname |
  JMP int | JMPLESS int | JMPGE int | JMPND int

type_synonym stack = "val list"
type_synonym config = "int × state × stack"

abbreviation "hd2 xs  hd (tl xs)"
abbreviation "tl2 xs  tl (tl xs)"

inductive iexec :: "instr × config  config  bool" (infix "" 55) where
LoadI:  "(LOADI i, pc, s, stk)  (pc + 1, s, i # stk)" |
Load:  "(LOAD x, pc, s, stk)  (pc + 1, s, s x # stk)" |
Add:  "(ADD, pc, s, stk)  (pc + 1, s, (hd2 stk + hd stk) # tl2 stk)" |
Store:  "(STORE x, pc, s, stk)  (pc + 1, s(x := hd stk), tl stk)" |
Jmp:  "(JMP i, pc, s, stk)  (pc + i + 1, s, stk)" |
JmpLessY:  "hd2 stk < hd stk 
  (JMPLESS i, pc, s, stk)  (pc + i + 1, s, tl2 stk)" |
JmpLessN:  "hd stk  hd2 stk 
  (JMPLESS i, pc, s, stk)  (pc + 1, s, tl2 stk)" |
JmpGeY:  "hd stk  hd2 stk 
  (JMPGE i, pc, s, stk)  (pc + i + 1, s, tl2 stk)" |
JmpGeN:  "hd2 stk < hd stk 
  (JMPGE i, pc, s, stk)  (pc + 1, s, tl2 stk)" |
JmpNdY:  "(JMPND i, pc, s, stk)  (pc + i + 1, s, stk)" |
JmpNdN:  "(JMPND i, pc, s, stk)  (pc + 1, s, stk)"

declare iexec.intros [intro]

inductive_cases LoadIE  [elim!]:  "(LOADI i, pc, s, stk)  cf"
inductive_cases LoadE  [elim!]:  "(LOAD x, pc, s, stk)  cf"
inductive_cases AddE  [elim!]:  "(ADD, pc, s, stk)  cf"
inductive_cases StoreE  [elim!]:  "(STORE x, pc, s, stk)  cf"
inductive_cases JmpE  [elim!]:  "(JMP i, pc, s, stk)  cf"
inductive_cases JmpLessE  [elim!]:  "(JMPLESS i, pc, s, stk)  cf"
inductive_cases JmpGeE  [elim!]:  "(JMPGE i, pc, s, stk)  cf"
inductive_cases JmpNdE  [elim!]:  "(JMPND i, pc, s, stk)  cf"

definition exec1 :: "instr list  config  config  bool"
  ("(_/ / _/ / _)" 55) where
"P  cf  cf'  (P !! fst cf, cf)  cf'  0  fst cf  fst cf < size P"

abbreviation exec :: "instr list  config  config  bool"
  ("(_/ / _/ →*/ _)" 55) where
"exec P  star (exec1 P)"

text ‹
\null

Next, compilation is formalized for arithmetic and boolean expressions (functions @{text acomp} and
@{text bcomp}), as well as for commands (function @{text ccomp}). Particularly, as opposed to what
happens in cite"Klein21", here @{text bcomp} takes a single input, viz. a 3-tuple comprised of
a boolean expression, a flag, and a jump offset. In this way, all three functions accept a single
input, which enables to streamline the compiler correctness proof developed in what follows.

\null
›

primrec acomp :: "aexp  instr list" where
"acomp (N i) = [LOADI i]" |
"acomp (V x) = [LOAD x]" |
"acomp (Plus a1 a2) = acomp a1 @ acomp a2 @ [ADD]"

fun bcomp :: "bexp × bool × int  instr list" where
"bcomp (Bc v, f, i) = (if v = f then [JMP i] else [])" |
"bcomp (Not b, f, i) = bcomp (b, ¬ f, i)" |
"bcomp (And b1 b2, f, i) =
  (let cb2 = bcomp (b2, f, i);
     cb1 = bcomp (b1, False, size cb2 + (if f then 0 else i))
   in cb1 @ cb2)" |
"bcomp (Less a1 a2, f, i) =
  acomp a1 @ acomp a2 @ (if f then [JMPLESS i] else [JMPGE i])"

primrec ccomp :: "com  instr list" where
"ccomp SKIP = []" |
"ccomp (x ::= a) = acomp a @ [STORE x]" |
"ccomp (c1;; c2) = ccomp c1 @ ccomp c2" |
"ccomp (IF b THEN c1 ELSE c2) =
  (let cc1 = ccomp c1; cc2 = ccomp c2; cb = bcomp (b, False, size cc1 + 1)
   in cb @ cc1 @ JMP (size cc2) # cc2)" |
"ccomp (c1 OR c2) =
  (let cc1 = ccomp c1; cc2 = ccomp c2
   in JMPND (size cc1 + 1) # cc1 @ JMP (size cc2) # cc2)" |
"ccomp (WHILE b DO c) =
  (let cc = ccomp c; cb = bcomp (b, False, size cc + 1)
   in cb @ cc @ [JMP (- (size cb + size cc + 1))])"

text ‹
\null

Finally, two lemmas are proven automatically (both seem not to be included in the standard library,
though being quite basic) and registered for use by automatic proof tactics. In more detail:

   The former lemma is an elimination rule similar to @{thm [source] impCE}, with the difference
that it retains the antecedent of the implication in the premise where the consequent is assumed to
hold. This rule permits to have both assumptions @{prop "¬ bval b s"} and @{prop "bval b s"} in the
respective cases resulting from the execution of boolean expression @{text b} in state @{text s}.

   The latter one is an introduction rule similar to @{thm [source] Suc_lessI}, with the difference
that its second assumption is more convenient for proving statements of the form @{prop "Suc m < n"}
arising from the compiler correctness proof developed in what follows.

\null
›

lemma impCE2 [elim!]:
 "P  Q; ¬ P  R; P  Q  R  R"
by blast

lemma Suc_lessI2 [intro!]:
 "m < n; m  n - 1  Suc m < n"
by simp

end