References: Typing Mutable References

(* $Date: 2013-07-17 16:19:11 -0400 (Wed, 17 Jul 2013) $ *)

Require Export Smallstep.

Most real languages include impure features ("computational effects")...

mutable pointer structures
non-local control constructs (exceptions, continuations, etc.)
process synchronization and communication
etc.

Goal for this chapter: formalize pointers.

Definitions

In most real-world programming languages, the mechanisms of name binding and storage allocation are intentionally conflated: every name refers to a mutable piece of storage.

Conceptually, it's cleaner to separate the two:

keep the mechanisms for name binding (abstraction, let) the same;
introduce new, explicit operations for allocating, changing, and looking up the contents of references (pointers).

Syntax

Module STLCRef.

The basic operations on references are allocation, dereferencing, and assignment.

To allocate a reference, we use the ref operator, providing an initial value for the new cell. For example, ref 5 creates a new cell containing the value 5, and evaluates to a reference to that cell.
To read the current value of this cell, we use the dereferencing operator !; for example, !(ref 5) evaluates to 5.
To change the value stored in a cell, we use the assignment operator. If r is a reference, r := 7 will store the value 7 in the cell referenced by r. However, r := 7 evaluates to the trivial value unit; it exists only to have the side effect of modifying the contents of a cell.

Types

If T is a type, then Ref T is the type of references which point to a cell holding values of type T.

      T ::= Nat
          | Unit
          | T → T
          | Ref T

Inductive ty : Type :=
  | TNat : ty
  | TUnit : ty
  | TArrow : ty → ty → ty
  | TRef : ty → ty.

Terms

Besides variables, abstractions, applications, natural-number-related terms, and unit, we need four more sorts of terms in order to handle mutable references:

      t ::= ...              Terms
          | ref t              allocation
          | !t                 dereference
          | t := t             assignment
          | l                  location
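The grammar above can be transcribed into Coq roughly as follows. This is only a sketch, reconstructed from the constructor names that appear in subst and step later in the chapter; the chapter's actual definition may differ in details. The wrapper module name TmSketch and the definition of id as a wrapped natural number (suggested by the use of Id 0 in tseq) are assumptions, and the code uses plain ASCII arrows so that it can be pasted into Coq directly.

```coq
Module TmSketch.

(* Assumption: identifiers are wrapped natural numbers. *)
Inductive id : Type :=
  | Id : nat -> id.

Inductive ty : Type :=
  | TNat : ty
  | TUnit : ty
  | TArrow : ty -> ty -> ty
  | TRef : ty -> ty.

Inductive tm : Type :=
  (* the pure STLC with numbers and unit *)
  | tvar : id -> tm
  | tapp : tm -> tm -> tm
  | tabs : id -> ty -> tm -> tm
  | tnat : nat -> tm
  | tsucc : tm -> tm
  | tpred : tm -> tm
  | tmult : tm -> tm -> tm
  | tif0 : tm -> tm -> tm -> tm
  | tunit : tm
  (* the four new forms for references *)
  | tref : tm -> tm
  | tderef : tm -> tm
  | tassign : tm -> tm -> tm
  | tloc : nat -> tm.

(* The term !(ref 5), written with the constructors above. *)
Definition deref_ref_5 : tm := tderef (tref (tnat 5)).

End TmSketch.
```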

Typing (Preview)

Informally, the typing rules for allocation, dereferencing, and assignment will look like this:

      Γ ⊢ t₁ : T₁
      ----------------------          (T_Ref)
      Γ ⊢ ref t₁ : Ref T₁

      Γ ⊢ t₁ : Ref T₁₁
      ----------------------          (T_Deref)
      Γ ⊢ !t₁ : T₁₁

      Γ ⊢ t₁ : Ref T₁₁
      Γ ⊢ t₂ : T₁₁
      ----------------------          (T_Assign)
      Γ ⊢ t₁ := t₂ : Unit

The rule for locations will require a bit more machinery, and this will motivate some changes to the other rules; we'll come back to this later.

Values and Substitution

Besides abstractions and numbers, we have two new kinds of values: the unit value and locations.

Inductive value : tm → Prop :=
  | v_abs : ∀x T t,
      value (tabs x T t)
  | v_nat : ∀n,
      value (tnat n)
  | v_unit :
      value tunit
  | v_loc : ∀l,
      value (tloc l).

Hint Constructors value.

Extending substitution to handle the new syntax of terms is straightforward.

Fixpoint subst (x:id) (s:tm) (t:tm) : tm :=
  match t with
  | tvar x' ⇒
      if eq_id_dec x x' then s else t
  | tapp t₁ t₂ ⇒
      tapp (subst x s t₁) (subst x s t₂)
  | tabs x' T t₁ ⇒
      if eq_id_dec x x' then t else tabs x' T (subst x s t₁)
  | tnat n ⇒
      t
  | tsucc t₁ ⇒
      tsucc (subst x s t₁)
  | tpred t₁ ⇒
      tpred (subst x s t₁)
  | tmult t₁ t₂ ⇒
      tmult (subst x s t₁) (subst x s t₂)
  | tif0 t₁ t₂ t₃ ⇒
      tif0 (subst x s t₁) (subst x s t₂) (subst x s t₃)
  | tunit ⇒
      t
  | tref t₁ ⇒
      tref (subst x s t₁)
  | tderef t₁ ⇒
      tderef (subst x s t₁)
  | tassign t₁ t₂ ⇒
      tassign (subst x s t₁) (subst x s t₂)
  | tloc _ ⇒
      t
  end.

Notation "'[' x ':=' s ']' t" := (subst x s t) (at level 20).

Pragmatics

Side Effects and Sequencing

We can write

       r:=succ(!r); !r

as an abbreviation for

       (λx:Unit. !r) (r := succ(!r)).

Definition tseq t₁ t₂ :=
tapp (tabs (Id 0) TUnit t₂) t₁.

References and Aliasing

It is important to bear in mind the difference between the reference that is bound to r and the cell in the store that is pointed to by this reference.

If we make a copy of r, for example by binding its value to another variable s, what gets copied is only the reference, not the contents of the cell itself.

For example, after evaluating

      let r = ref 5 in
      let s = r in
      s := 82;
      (!r)+1

the cell referenced by r will contain the value 82, while the result of the whole expression will be 83. The references r and s are said to be aliases for the same cell.

The possibility of aliasing can make programs with references quite tricky to reason about. For example, the expression

      r := 5; r := !s

assigns 5 to r and then immediately overwrites it with s's current value; this has exactly the same effect as the single assignment

      r := !s

unless we happen to do it in a context where r and s are aliases for the same cell!
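The aliasing caveat can be checked on a toy model. The sketch below is an illustration, not part of the chapter's development: it models a store as a list of numbers and a reference as an index into it, and the names AliasingSketch, assign, deref, two_step and one_step are hypothetical. The two programs agree when r and s are distinct cells and disagree when they are the same cell.

```coq
Require Import List.
Import ListNotations.

Module AliasingSketch.

(* Overwrite the cell at index n (same shape as the chapter's replace). *)
Fixpoint replace {A:Type} (n:nat) (x:A) (l:list A) : list A :=
  match l with
  | nil => nil
  | h :: t =>
    match n with
    | O => x :: t
    | S n' => h :: replace n' x t
    end
  end.

(* Hypothetical helpers: references are indices into a list of numbers. *)
Definition assign (r v : nat) (st : list nat) := replace r v st.
Definition deref (r : nat) (st : list nat) := nth r st 0.

(* r := 5; r := !s *)
Definition two_step (r s : nat) (st : list nat) :=
  let st1 := assign r 5 st in
  assign r (deref s st1) st1.

(* r := !s *)
Definition one_step (r s : nat) (st : list nat) :=
  assign r (deref s st) st.

(* Distinct cells: both programs leave the same store. *)
Example distinct_cells :
  two_step 0 1 [7; 9] = one_step 0 1 [7; 9].
Proof. reflexivity. Qed.

(* Aliased cells: the first program leaves 5, the second leaves 7. *)
Example aliased_cells :
  two_step 0 0 [7] = [5] /\ one_step 0 0 [7] = [7].
Proof. split; reflexivity. Qed.

End AliasingSketch.
```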

Shared State

Of course, aliasing is also a large part of what makes references useful. In particular, it allows us to set up "implicit communication channels" — shared state — between different parts of a program. For example, suppose we define a reference cell and two functions that manipulate its contents:

    let c = ref 0 in
    let incc = λ_:Unit. (c := succ (!c); !c) in
    let decc = λ_:Unit. (c := pred (!c); !c) in
    ...

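Wrapping each body in a Unit-abstraction (a "thunk") delays its evaluation until the function is applied to unit, so the update to c happens on every call rather than once, when incc or decc is defined.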

Objects

We can go a step further and write a function that creates c, incc, and decc, packages incc and decc together into a record, and returns this record:

    newcounter = 
        λ_:Unit.
           let c = ref 0 in
           let incc = λ_:Unit. (c := succ (!c); !c) in
           let decc = λ_:Unit. (c := pred (!c); !c) in
           {i=incc, d=decc}

Now, each time we call newcounter, we get a new record of functions that share access to the same storage cell c. The caller of newcounter can't get at this storage cell directly, but can affect it indirectly by calling the two functions. In other words, we've created a simple form of object.

    let c1 = newcounter unit in
    let c2 = newcounter unit in
    // Note that we've allocated two separate storage cells now!
    let r1 = c1.i unit in
    let r2 = c2.i unit in
    r2  // yields 1, not 2!

References to Compound Types

A reference cell need not contain just a number: the primitives we've defined above allow us to create references to values of any type, including functions. For example, we can use references to functions to give a (not very efficient) implementation of arrays of numbers, as follows. Write NatArray for the type Ref (Nat→Nat).

Recall the equal function from the MoreStlc chapter:

    equal = 
      fix 
        (λeq:Nat→Nat→Bool.
           λm:Nat. λn:Nat.
             if m=0 then iszero n 
             else if n=0 then false
             else eq (pred m) (pred n))

Now, to build a new array, we allocate a reference cell and fill it with a function that, when given an index, always returns 0.

    newarray = λ_:Unit. ref (λn:Nat.0)

To look up an element of an array, we simply apply the function to the desired index.

    lookup = λa:NatArray. λn:Nat. (!a) n

The interesting part of the encoding is the update function. It takes an array, an index, and a new value to be stored at that index, and does its job by creating (and storing in the reference) a new function that, when it is asked for the value at this very index, returns the new value that was given to update, and on all other indices passes the lookup to the function that was previously stored in the reference.

    update = λa:NatArray. λm:Nat. λv:Nat. 
                 let oldf = !a in
                 a := (λn:Nat. if equal m n then v else oldf n)
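The function-building part of this encoding can be tried out directly in Coq. The sketch below models only the function that would be stored in the cell, not the reference around it, so update returns a new function instead of assigning to a. The module name FunArraySketch is hypothetical, and Nat.eqb stands in for the object-language equal.

```coq
Require Import Arith.

Module FunArraySketch.

(* An "array" is just a function from indices to contents. *)
Definition natarray := nat -> nat.

(* A fresh array returns 0 at every index. *)
Definition newarray : natarray := fun _ => 0.

Definition lookup (a : natarray) (n : nat) : nat := a n.

(* Build a function that answers v at index m and defers to the
   old function everywhere else. *)
Definition update (a : natarray) (m v : nat) : natarray :=
  fun n => if Nat.eqb m n then v else a n.

Definition example_array : natarray :=
  update (update newarray 2 10) 5 20.

Example lookup_updated_2 : lookup example_array 2 = 10.
Proof. reflexivity. Qed.

Example lookup_updated_5 : lookup example_array 5 = 20.
Proof. reflexivity. Qed.

Example lookup_untouched : lookup example_array 3 = 0.
Proof. reflexivity. Qed.

End FunArraySketch.
```

Each update wraps the previous function in one more test, so a lookup takes time proportional to the number of updates performed so far, which is why the encoding is not very efficient.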

References to values containing other references can also be very useful, allowing us to define data structures such as mutable lists and trees.

Null References

One more difference between our references and C-style mutable variables: null pointers.

In C, a pointer variable can contain either a valid pointer into the heap or the special value NULL.
This is a source of many errors and much tricky reasoning (any pointer may potentially be "not there"), but it is occasionally useful.
Nullable references are easy to implement here using references plus options (which can be built out of disjoint sum types):
```
            Option T       =  Unit + T
            NullableRef T  =  Option (Ref T)
```
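The same idea can be mimicked at the Coq level, using Coq's own option type in place of the object-language sum Unit + T. This is an illustration under that assumption, with a store modelled as a list of numbers; the names NullableSketch, nullable_loc and deref_nullable are hypothetical.

```coq
Require Import List.
Import ListNotations.

Module NullableSketch.

(* A nullable reference: either nothing, or a location index. *)
Definition nullable_loc := option nat.

(* Dereferencing has to handle the "not there" case explicitly. *)
Definition deref_nullable (r : nullable_loc) (st : list nat) : option nat :=
  match r with
  | None => None
  | Some l => nth_error st l
  end.

Example deref_present : deref_nullable (Some 1) [4; 8] = Some 8.
Proof. reflexivity. Qed.

Example deref_null : deref_nullable None [4; 8] = None.
Proof. reflexivity. Qed.

End NullableSketch.
```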

Garbage Collection

A last issue that we should mention before we move on with formalizing references is storage de-allocation. We have not provided any primitives for freeing reference cells when they are no longer needed. Instead, like many modern languages (including ML and Java) we rely on the run-time system to perform garbage collection, collecting and reusing cells that can no longer be reached by the program.

This is not just a question of taste in language design: it is extremely difficult to achieve type safety in the presence of an explicit deallocation operation. The reason for this is the familiar dangling reference problem: we allocate a cell holding a number, save a reference to it in some data structure, use it for a while, then deallocate it and allocate a new cell holding a boolean, possibly reusing the same storage. Now we can have two names for the same storage cell — one with type Ref Nat and the other with type Ref Bool.

Exercise: 1 star (type_safety_violation)

Show how this can lead to a violation of type safety.

(* FILL IN HERE *)

☐

Operational Semantics

Locations

A reference names a location in the store (also known as the heap or just the memory).

What is the store?

Concretely: An array of 8-bit bytes, indexed by 32-bit integers.
More abstractly: a list (or array) of values
Even more abstractly: a partial function from locations to values.

We'll choose the middle way here: A store is a list of values, and a location is a natural number index into this list.

Stores

A store is just a list of values. (This more concrete representation will be more convenient for proofs than the functional representation we used in IMP.)

Definition store := list tm.

We use store_lookup n st to retrieve the value of the reference cell at location n in the store st. Note that we must give a default value to nth in case we try looking up an index which is too large. (In fact, we will never actually do this, but proving it will of course require some work!)

Definition store_lookup (n:nat) (st:store) :=
nth n st tunit.

To add a new reference cell to the store, we use snoc.

Fixpoint snoc {A:Type} (l:list A) (x:A) : list A :=
  match l with
  | nil ⇒ x :: nil
  | h :: t ⇒ h :: snoc t x
  end.

Lemma length_snoc : ∀A (l:list A) x,
  length (snoc l x) = S (length l).
Proof.
(* ELIDED *) Admitted.

Lemma nth_lt_snoc : ∀A (l:list A) x d n,
  n < length l →
  nth n l d = nth n (snoc l x) d.
Proof.
(* ELIDED *) Admitted.

Lemma nth_eq_snoc : ∀A (l:list A) x d,
  nth (length l) (snoc l x) d = x.
Proof.
(* ELIDED *) Admitted.
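Concrete instances of the three lemmas may help fix the intuition. The following sketch repeats the definition of snoc in ASCII syntax inside a hypothetical module SnocSketch and checks each property by computation on one small list; it illustrates the statements and does not replace the elided general proofs.

```coq
Require Import List.
Import ListNotations.

Module SnocSketch.

Fixpoint snoc {A:Type} (l:list A) (x:A) : list A :=
  match l with
  | nil => x :: nil
  | h :: t => h :: snoc t x
  end.

(* Allocation appends at the end of the store. *)
Example snoc_appends : snoc [10; 20] 30 = [10; 20; 30].
Proof. reflexivity. Qed.

(* Instance of length_snoc. *)
Example length_grows :
  length (snoc [10; 20] 30) = S (length [10; 20]).
Proof. reflexivity. Qed.

(* Instance of nth_lt_snoc: existing cells are undisturbed. *)
Example old_cell_unchanged :
  nth 1 [10; 20] 0 = nth 1 (snoc [10; 20] 30) 0.
Proof. reflexivity. Qed.

(* Instance of nth_eq_snoc: the new cell sits at the old length. *)
Example new_cell_at_old_length :
  nth (length [10; 20]) (snoc [10; 20] 30) 0 = 30.
Proof. reflexivity. Qed.

End SnocSketch.
```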

To update the store, we use the replace function, which replaces the contents of a cell at a particular index.

Fixpoint replace {A:Type} (n:nat) (x:A) (l:list A) : list A :=
  match l with
  | nil ⇒ nil
  | h :: t ⇒
    match n with
    | O ⇒ x :: t
    | S n' ⇒ h :: replace n' x t
    end
  end.

Lemma replace_nil : ∀A n (x:A),
  replace n x nil = nil.
Proof.
(* ELIDED *) Admitted.

Lemma length_replace : ∀A n x (l:list A),
  length (replace n x l) = length l.
Proof with auto.
(* ELIDED *) Admitted.

Lemma lookup_replace_eq : ∀l t st,
  l < length st →
  store_lookup l (replace l t st) = t.
Proof with auto.
(* ELIDED *) Admitted.

Lemma lookup_replace_neq : ∀l1 l2 t st,
  l1 ≠ l2 →
  store_lookup l1 (replace l2 t st) = store_lookup l1 st.
Proof with auto.
(* ELIDED *) Admitted.
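A few concrete instances show how replace behaves, including the out-of-range case that explains the side condition l < length st in lookup_replace_eq. The sketch repeats the definition in ASCII syntax inside a hypothetical module ReplaceSketch and uses nth with default 0 on a list of numbers in place of store_lookup.

```coq
Require Import List.
Import ListNotations.

Module ReplaceSketch.

Fixpoint replace {A:Type} (n:nat) (x:A) (l:list A) : list A :=
  match l with
  | nil => nil
  | h :: t =>
    match n with
    | O => x :: t
    | S n' => h :: replace n' x t
    end
  end.

(* An in-range index overwrites exactly one cell. *)
Example in_range : replace 1 9 [4; 5; 6] = [4; 9; 6].
Proof. reflexivity. Qed.

(* An out-of-range index leaves the list unchanged. *)
Example out_of_range : replace 7 9 [4; 5; 6] = [4; 5; 6].
Proof. reflexivity. Qed.

(* Instance of lookup_replace_eq. *)
Example lookup_same_cell : nth 1 (replace 1 9 [4; 5; 6]) 0 = 9.
Proof. reflexivity. Qed.

(* Instance of lookup_replace_neq. *)
Example lookup_other_cell :
  nth 0 (replace 1 9 [4; 5; 6]) 0 = nth 0 [4; 5; 6] 0.
Proof. reflexivity. Qed.

End ReplaceSketch.
```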

Reduction

First, we augment existing evaluation rules with stores:

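      value v₂
      ----------------------------------------     (ST_AppAbs)
      (λa:T.t₁₂) v₂ / st ⇒ [a:=v₂]t₁₂ / st

      t₁ / st ⇒ t₁' / st'
      ----------------------------------------     (ST_App1)
      t₁ t₂ / st ⇒ t₁' t₂ / st'

      value v₁     t₂ / st ⇒ t₂' / st'
      ----------------------------------------     (ST_App2)
      v₁ t₂ / st ⇒ v₁ t₂' / st'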

Now we can give the rules for the new constructs:

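      -------------------------------------------   (ST_RefValue)
      ref v₁ / st ⇒ loc |st| / st,v₁

      t₁ / st ⇒ t₁' / st'
      -------------------------------------------   (ST_Ref)
      ref t₁ / st ⇒ ref t₁' / st'

      l < |st|
      -------------------------------------------   (ST_DerefLoc)
      !(loc l) / st ⇒ lookup l st / st

      t₁ / st ⇒ t₁' / st'
      -------------------------------------------   (ST_Deref)
      !t₁ / st ⇒ !t₁' / st'

      l < |st|
      -------------------------------------------   (ST_Assign)
      loc l := v₂ / st ⇒ unit / replace l v₂ st

      t₁ / st ⇒ t₁' / st'
      -------------------------------------------   (ST_Assign1)
      t₁ := t₂ / st ⇒ t₁' := t₂ / st'

      t₂ / st ⇒ t₂' / st'
      -------------------------------------------   (ST_Assign2)
      v₁ := t₂ / st ⇒ v₁ := t₂' / st'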

Reserved Notation "t₁ '/' st1 '⇒' t₂ '/' st2"
  (at level 40, st1 at level 39, t₂ at level 39).

Inductive step : tm × store → tm × store → Prop :=
  | ST_AppAbs : ∀x T t₁₂ v₂ st,
         value v₂ →
         tapp (tabs x T t₁₂) v₂ / st ⇒ [x:=v₂]t₁₂ / st
  | ST_App1 : ∀t₁ t₁' t₂ st st',
         t₁ / st ⇒ t₁' / st' →
         tapp t₁ t₂ / st ⇒ tapp t₁' t₂ / st'
  | ST_App2 : ∀v₁ t₂ t₂' st st',
         value v₁ →
         t₂ / st ⇒ t₂' / st' →
         tapp v₁ t₂ / st ⇒ tapp v₁ t₂'/ st'
  | ST_SuccNat : ∀n st,
         tsucc (tnat n) / st ⇒ tnat (S n) / st
  | ST_Succ : ∀t₁ t₁' st st',
         t₁ / st ⇒ t₁' / st' →
         tsucc t₁ / st ⇒ tsucc t₁' / st'
  | ST_PredNat : ∀n st,
         tpred (tnat n) / st ⇒ tnat (pred n) / st
  | ST_Pred : ∀t₁ t₁' st st',
         t₁ / st ⇒ t₁' / st' →
         tpred t₁ / st ⇒ tpred t₁' / st'
  | ST_MultNats : ∀n1 n2 st,
         tmult (tnat n1) (tnat n2) / st ⇒ tnat (mult n1 n2) / st
  | ST_Mult1 : ∀t₁ t₂ t₁' st st',
         t₁ / st ⇒ t₁' / st' →
         tmult t₁ t₂ / st ⇒ tmult t₁' t₂ / st'
  | ST_Mult2 : ∀v₁ t₂ t₂' st st',
         value v₁ →
         t₂ / st ⇒ t₂' / st' →
         tmult v₁ t₂ / st ⇒ tmult v₁ t₂' / st'
  | ST_If0 : ∀t₁ t₁' t₂ t₃ st st',
         t₁ / st ⇒ t₁' / st' →
         tif0 t₁ t₂ t₃ / st ⇒ tif0 t₁' t₂ t₃ / st'
  | ST_If0_Zero : ∀t₂ t₃ st,
         tif0 (tnat 0) t₂ t₃ / st ⇒ t₂ / st
  | ST_If0_Nonzero : ∀n t₂ t₃ st,
         tif0 (tnat (S n)) t₂ t₃ / st ⇒ t₃ / st
  | ST_RefValue : ∀v₁ st,
         value v₁ →
         tref v₁ / st ⇒ tloc (length st) / snoc st v₁
  | ST_Ref : ∀t₁ t₁' st st',
         t₁ / st ⇒ t₁' / st' →
         tref t₁ / st ⇒ tref t₁' / st'
  | ST_DerefLoc : ∀st l,
         l < length st →
         tderef (tloc l) / st ⇒ store_lookup l st / st
  | ST_Deref : ∀t₁ t₁' st st',
         t₁ / st ⇒ t₁' / st' →
         tderef t₁ / st ⇒ tderef t₁' / st'
  | ST_Assign : ∀v₂ l st,
         value v₂ →
         l < length st →
         tassign (tloc l) v₂ / st ⇒ tunit / replace l v₂ st
  | ST_Assign1 : ∀t₁ t₁' t₂ st st',
         t₁ / st ⇒ t₁' / st' →
         tassign t₁ t₂ / st ⇒ tassign t₁' t₂ / st'
  | ST_Assign2 : ∀v₁ t₂ t₂' st st',
         value v₁ →
         t₂ / st ⇒ t₂' / st' →
         tassign v₁ t₂ / st ⇒ tassign v₁ t₂' / st'

where "t₁ '/' st1 '⇒' t₂ '/' st2" := (step (t₁,st1) (t₂,st2)).

Tactic Notation "step_cases" tactic(first) ident(c) :=
  first;
  [ Case_aux c "ST_AppAbs" | Case_aux c "ST_App1"
  | Case_aux c "ST_App2" | Case_aux c "ST_SuccNat"
  | Case_aux c "ST_Succ" | Case_aux c "ST_PredNat"
  | Case_aux c "ST_Pred" | Case_aux c "ST_MultNats"
  | Case_aux c "ST_Mult1" | Case_aux c "ST_Mult2"
  | Case_aux c "ST_If0" | Case_aux c "ST_If0_Zero"
  | Case_aux c "ST_If0_Nonzero" | Case_aux c "ST_RefValue"
  | Case_aux c "ST_Ref" | Case_aux c "ST_DerefLoc"
  | Case_aux c "ST_Deref" | Case_aux c "ST_Assign"
  | Case_aux c "ST_Assign1" | Case_aux c "ST_Assign2" ].

Hint Constructors step.

Definition multistep := (multi step).
Notation "t₁ '/' st '⇒*' t₂ '/' st'" := (multistep (t₁,st) (t₂,st'))
  (at level 40, st at level 39, t₂ at level 39).
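To watch the reference rules fire on a concrete term, it can help to have something executable. The sketch below is not the chapter's step relation: it is a hypothetical step function, stepf, over a cut-down term language (numbers, unit, and the four reference forms only) that returns the next configuration or None when no rule applies. The names StepSketch, is_value and stepf are assumptions, and st ++ [v] is used in place of snoc st v.

```coq
Require Import List Arith.
Import ListNotations.

Module StepSketch.

(* Only the fragment needed to exercise the reference rules. *)
Inductive tm : Type :=
  | tnat : nat -> tm
  | tunit : tm
  | tref : tm -> tm
  | tderef : tm -> tm
  | tassign : tm -> tm -> tm
  | tloc : nat -> tm.

Definition store := list tm.

Definition is_value (t : tm) : bool :=
  match t with
  | tnat _ | tunit | tloc _ => true
  | _ => false
  end.

Fixpoint replace {A:Type} (n:nat) (x:A) (l:list A) : list A :=
  match l with
  | nil => nil
  | h :: t =>
    match n with
    | O => x :: t
    | S n' => h :: replace n' x t
    end
  end.

Fixpoint stepf (t : tm) (st : store) : option (tm * store) :=
  match t with
  | tref t1 =>
      if is_value t1
      then Some (tloc (length st), st ++ [t1])        (* ST_RefValue *)
      else match stepf t1 st with                     (* ST_Ref *)
           | Some (t1', st') => Some (tref t1', st')
           | None => None
           end
  | tderef t1 =>
      match t1 with
      | tloc l =>                                     (* ST_DerefLoc *)
          if Nat.ltb l (length st)
          then Some (nth l st tunit, st) else None
      | _ => match stepf t1 st with                   (* ST_Deref *)
             | Some (t1', st') => Some (tderef t1', st')
             | None => None
             end
      end
  | tassign t1 t2 =>
      if is_value t1 then
        if is_value t2 then
          match t1 with
          | tloc l =>                                 (* ST_Assign *)
              if Nat.ltb l (length st)
              then Some (tunit, replace l t2 st) else None
          | _ => None
          end
        else match stepf t2 st with                   (* ST_Assign2 *)
             | Some (t2', st') => Some (tassign t1 t2', st')
             | None => None
             end
      else match stepf t1 st with                     (* ST_Assign1 *)
           | Some (t1', st') => Some (tassign t1' t2, st')
           | None => None
           end
  | _ => None
  end.

(* !(ref 5) / []  steps to  !(loc 0) / [5]  and then to  5 / [5]. *)
Example alloc_step :
  stepf (tderef (tref (tnat 5))) [] = Some (tderef (tloc 0), [tnat 5]).
Proof. reflexivity. Qed.

Example deref_step :
  stepf (tderef (tloc 0)) [tnat 5] = Some (tnat 5, [tnat 5]).
Proof. reflexivity. Qed.

(* loc 0 := 7 / [5]  steps to  unit / [7]. *)
Example assign_step :
  stepf (tassign (tloc 0) (tnat 7)) [tnat 5] = Some (tunit, [tnat 7]).
Proof. reflexivity. Qed.

End StepSketch.
```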

Typing

Our contexts for free variables will be exactly the same as for the STLC, partial maps from identifiers to types.

Definition context := partial_map ty.

Store typings

Having extended our syntax and evaluation rules to accommodate references, our last job is to write down typing rules for the new constructs — and, of course, to check that they are sound. Naturally, the key question is, "What is the type of a location?"

First of all, notice that we do not need to answer this question for purposes of typechecking the terms that programmers actually write. Concrete location constants arise only in terms that are the intermediate results of evaluation; they are not in the language that programmers write. So we only need to determine the type of a location when we're in the middle of an evaluation sequence, e.g. trying to apply the progress or preservation lemmas. Thus, even though we normally think of typing as a static program property, it makes sense for the typing of locations to depend on the dynamic progress of the program too.

As a first try, note that when we evaluate a term containing concrete locations, the type of the result depends on the contents of the store that we start with. For example, if we evaluate the term !(loc 1) in the store [unit, unit], the result is unit; if we evaluate the same term in the store [unit, λx:Unit.x], the result is λx:Unit.x. With respect to the former store, the location 1 has type Unit, and with respect to the latter it has type Unit→Unit. This observation leads us immediately to a first attempt at a typing rule for locations:

      Γ ⊢ lookup l st : T₁
      ----------------------
      Γ ⊢ loc l : Ref T₁

That is, to find the type of a location l, we look up the current contents of l in the store and calculate the type T₁ of the contents. The type of the location is then Ref T₁.

Having begun in this way, we need to go a little further to reach a consistent state. In effect, by making the type of a term depend on the store, we have changed the typing relation from a three-place relation (between contexts, terms, and types) to a four-place relation (between contexts, stores, terms, and types). Since the store is, intuitively, part of the context in which we calculate the type of a term, let's write this four-place relation with the store to the left of the turnstile: Γ; st ⊢ t : T. Our rule for typing references now has the form

      Γ; st ⊢ lookup l st : T₁
      --------------------------
      Γ; st ⊢ loc l : Ref T₁

and all the rest of the typing rules in the system are extended similarly with stores. The other rules do not need to do anything interesting with their stores — just pass them from premise to conclusion.

However, there are two problems with this rule. First, typechecking is rather inefficient, since calculating the type of a location l involves calculating the type of the current contents v of l. If l appears many times in a term t, we will re-calculate the type of v many times in the course of constructing a typing derivation for t. Worse, if v itself contains locations, then we will have to recalculate their types each time they appear.

Second, the proposed typing rule for locations may not allow us to derive anything at all, if the store contains a cycle. For example, there is no finite typing derivation for the location 0 with respect to this store:

   [λx:Nat. (!(loc 1)) x, λx:Nat. (!(loc 0)) x]

Exercise: 2 stars (cyclic_store)

Can you find a term whose evaluation will create this particular cyclic store?

☐

Both of these problems arise from the fact that our proposed typing rule for locations requires us to recalculate the type of a location every time we mention it in a term. But this, intuitively, should not be necessary. After all, when a location is first created, we know the type of the initial value that we are storing into it. Suppose we are willing to enforce the invariant that the type of the value contained in a given location never changes; that is, although we may later store other values into this location, those other values will always have the same type as the initial one. In other words, we always have in mind a single, definite type for every location in the store, which is fixed when the location is allocated. Then these intended types can be collected together as a store typing: a finite function mapping locations to types.

As usual, this conservative typing restriction on allowed updates means that we will rule out as ill-typed some programs that could evaluate perfectly well without getting stuck.

Just as we did for stores, we will represent a store typing simply as a list of types: the type at index i records the type of the value stored in cell i.

Definition store_ty := list ty.

The store_Tlookup function retrieves the type at a particular index.

Definition store_Tlookup (n:nat) (ST:store_ty) :=
nth n ST TUnit.

Suppose we are given a store typing ST describing the store st in which some term t will be evaluated. Then we can use ST to calculate the type of the result of t without ever looking directly at st. For example, if ST is [Unit, Unit→Unit], then we may immediately infer that !(loc 1) has type Unit→Unit. More generally, the typing rule for locations can be reformulated in terms of store typings like this:

      l < |ST|
      -------------------------------------
      Γ; ST ⊢ loc l : Ref (lookup l ST)

That is, as long as l is a valid location (it is less than the length of ST), we can compute the type of l just by looking it up in ST. Typing is again a four-place relation, but it is parameterized on a store typing rather than a concrete store. The rest of the typing rules are analogously augmented with store typings.
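The example store typing from the text can be checked by computation. The sketch below repeats ty, store_ty and store_Tlookup in ASCII syntax inside a hypothetical module StoreTypingSketch; the name ST_example is also an assumption. The second example shows why the rule carries the premise l < |ST|: an out-of-range lookup silently falls back to the default type.

```coq
Require Import List.
Import ListNotations.

Module StoreTypingSketch.

Inductive ty : Type :=
  | TNat : ty
  | TUnit : ty
  | TArrow : ty -> ty -> ty
  | TRef : ty -> ty.

Definition store_ty := list ty.

Definition store_Tlookup (n:nat) (ST:store_ty) :=
  nth n ST TUnit.

(* The store typing [Unit, Unit -> Unit] from the text. *)
Definition ST_example : store_ty := [TUnit; TArrow TUnit TUnit].

(* Location 1 holds a Unit -> Unit, so loc 1 has type Ref (Unit -> Unit)
   and !(loc 1) has type Unit -> Unit. *)
Example loc1_contents :
  store_Tlookup 1 ST_example = TArrow TUnit TUnit.
Proof. reflexivity. Qed.

(* Out of range: nth returns the default, which says nothing about
   any real cell. *)
Example out_of_range :
  store_Tlookup 5 ST_example = TUnit.
Proof. reflexivity. Qed.

End StoreTypingSketch.
```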

The Typing Relation

      l < |ST|
      -------------------------------------     (T_Loc)
      Γ; ST ⊢ loc l : Ref (lookup l ST)

      Γ; ST ⊢ t₁ : T₁
      -------------------------------------     (T_Ref)
      Γ; ST ⊢ ref t₁ : Ref T₁

      Γ; ST ⊢ t₁ : Ref T₁₁
      -------------------------------------     (T_Deref)
      Γ; ST ⊢ !t₁ : T₁₁

      Γ; ST ⊢ t₁ : Ref T₁₁
      Γ; ST ⊢ t₂ : T₁₁
      -------------------------------------     (T_Assign)
      Γ; ST ⊢ t₁ := t₂ : Unit

Reserved Notation "Gamma ';' ST '⊢' t '∈' T" (at level 40).

Inductive has_type : context → store_ty → tm → ty → Prop :=
  | T_Var : ∀Γ ST x T,
      Γ x = Some T →
      Γ; ST ⊢ (tvar x) ∈ T
  | T_Abs : ∀Γ ST x T₁₁ T₁₂ t₁₂,
      (extend Γ x T₁₁); ST ⊢ t₁₂ ∈ T₁₂ →
      Γ; ST ⊢ (tabs x T₁₁ t₁₂) ∈ (TArrow T₁₁ T₁₂)
  | T_App : ∀T₁ T₂ Γ ST t₁ t₂,
      Γ; ST ⊢ t₁ ∈ (TArrow T₁ T₂) →
      Γ; ST ⊢ t₂ ∈ T₁ →
      Γ; ST ⊢ (tapp t₁ t₂) ∈ T₂
  | T_Nat : ∀Γ ST n,
      Γ; ST ⊢ (tnat n) ∈ TNat
  | T_Succ : ∀Γ ST t₁,
      Γ; ST ⊢ t₁ ∈ TNat →
      Γ; ST ⊢ (tsucc t₁) ∈ TNat
  | T_Pred : ∀Γ ST t₁,
      Γ; ST ⊢ t₁ ∈ TNat →
      Γ; ST ⊢ (tpred t₁) ∈ TNat
  | T_Mult : ∀Γ ST t₁ t₂,
      Γ; ST ⊢ t₁ ∈ TNat →
      Γ; ST ⊢ t₂ ∈ TNat →
      Γ; ST ⊢ (tmult t₁ t₂) ∈ TNat
  | T_If0 : ∀Γ ST t₁ t₂ t₃ T,
      Γ; ST ⊢ t₁ ∈ TNat →
      Γ; ST ⊢ t₂ ∈ T →
      Γ; ST ⊢ t₃ ∈ T →
      Γ; ST ⊢ (tif0 t₁ t₂ t₃) ∈ T
  | T_Unit : ∀Γ ST,
      Γ; ST ⊢ tunit ∈ TUnit
  | T_Loc : ∀Γ ST l,
      l < length ST →
      Γ; ST ⊢ (tloc l) ∈ (TRef (store_Tlookup l ST))
  | T_Ref : ∀Γ ST t₁ T₁,
      Γ; ST ⊢ t₁ ∈ T₁ →
      Γ; ST ⊢ (tref t₁) ∈ (TRef T₁)
  | T_Deref : ∀Γ ST t₁ T₁₁,
      Γ; ST ⊢ t₁ ∈ (TRef T₁₁) →
      Γ; ST ⊢ (tderef t₁) ∈ T₁₁
  | T_Assign : ∀Γ ST t₁ t₂ T₁₁,
      Γ; ST ⊢ t₁ ∈ (TRef T₁₁) →
      Γ; ST ⊢ t₂ ∈ T₁₁ →
      Γ; ST ⊢ (tassign t₁ t₂) ∈ TUnit

where "Gamma ';' ST '⊢' t '∈' T" := (has_type Gamma ST t T).

Hint Constructors has_type.

Tactic Notation "has_type_cases" tactic(first) ident(c) :=
  first;
  [ Case_aux c "T_Var" | Case_aux c "T_Abs" | Case_aux c "T_App"
  | Case_aux c "T_Nat" | Case_aux c "T_Succ" | Case_aux c "T_Pred"
  | Case_aux c "T_Mult" | Case_aux c "T_If0"
  | Case_aux c "T_Unit" | Case_aux c "T_Loc"
  | Case_aux c "T_Ref" | Case_aux c "T_Deref"
  | Case_aux c "T_Assign" ].

Of course, these typing rules will accurately predict the results of evaluation only if the concrete store used during evaluation actually conforms to the store typing that we assume for purposes of typechecking. This proviso exactly parallels the situation with free variables in the STLC: the substitution lemma promises us that, if Γ ⊢ t : T, then we can replace the free variables in t with values of the types listed in Γ to obtain a closed term of type T, which, by the type preservation theorem will evaluate to a final result of type T if it yields any result at all. (We will see later how to formalize an analogous intuition for stores and store typings.)

However, for purposes of typechecking the terms that programmers actually write, we do not need to do anything tricky to guess what store typing we should use. Recall that concrete location constants arise only in terms that are the intermediate results of evaluation; they are not in the language that programmers write. Thus, we can simply typecheck the programmer's terms with respect to the empty store typing. As evaluation proceeds and new locations are created, we will always be able to see how to extend the store typing by looking at the type of the initial values being placed in newly allocated cells; this intuition is formalized in the statement of the type preservation theorem below.

Properties

Standard theorems...

Progress — pretty much same as always
Preservation — needs to be stated more carefully!

Well-Typed Stores

Evaluation and typing relations take more parameters now, so at a minimum we have to add these to the statement of preservation...

Theorem preservation_wrong1 : ∀ST T t st t' st',
  empty; ST ⊢ t ∈ T →
  t / st ⇒ t' / st' →
  empty; ST ⊢ t' ∈ T.
Abort.

Obviously wrong: no relation between assumed store typing and provided store!

We need a way of saying "this store satisfies the assumptions of that store typing"...

Definition store_well_typed (ST:store_ty) (st:store) :=
  length ST = length st ∧
  (∀l, l < length st →
     empty; ST ⊢ (store_lookup l st) ∈ (store_Tlookup l ST)).

Informally, we will write ST ⊢ st for store_well_typed ST st.
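For intuition, the two conditions can be turned into a boolean check on a drastically simplified language. The sketch below is an assumption-laden illustration, not the chapter's definition: stored values are restricted to numbers, unit and locations, types to Nat, Unit and Ref, and the names StoreOkSketch, ty_eqb, type_of, cell_ok, cells_ok and store_ok are all hypothetical. Walking the two lists in lockstep enforces both the length condition and the per-cell typing condition.

```coq
Require Import List Arith.
Import ListNotations.

Module StoreOkSketch.

Inductive ty : Type :=
  | TNat : ty
  | TUnit : ty
  | TRef : ty -> ty.

(* Only closed values that need no typing context. *)
Inductive tm : Type :=
  | tnat : nat -> tm
  | tunit : tm
  | tloc : nat -> tm.

Fixpoint ty_eqb (T1 T2 : ty) : bool :=
  match T1, T2 with
  | TNat, TNat => true
  | TUnit, TUnit => true
  | TRef a, TRef b => ty_eqb a b
  | _, _ => false
  end.

(* The type of a value under store typing ST; locations are typed
   by looking them up in ST, never in the store itself. *)
Definition type_of (ST : list ty) (t : tm) : option ty :=
  match t with
  | tnat _ => Some TNat
  | tunit => Some TUnit
  | tloc l =>
      if Nat.ltb l (length ST)
      then Some (TRef (nth l ST TUnit)) else None
  end.

Definition cell_ok (ST : list ty) (t : tm) (T : ty) : bool :=
  match type_of ST t with
  | Some T' => ty_eqb T' T
  | None => false
  end.

(* Succeeds only if the lists have equal length and every cell has
   the type recorded for it. *)
Fixpoint cells_ok (ST : list ty) (st : list tm) (Ts : list ty) : bool :=
  match st, Ts with
  | nil, nil => true
  | t :: st', T :: Ts' => andb (cell_ok ST t T) (cells_ok ST st' Ts')
  | _, _ => false
  end.

Definition store_ok (ST : list ty) (st : list tm) : bool :=
  cells_ok ST st ST.

Example good_store :
  store_ok [TNat; TRef TNat] [tnat 3; tloc 0] = true.
Proof. reflexivity. Qed.

Example wrong_cell_type : store_ok [TNat] [tunit] = false.
Proof. reflexivity. Qed.

Example wrong_length : store_ok [TNat] [] = false.
Proof. reflexivity. Qed.

End StoreOkSketch.
```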

We can now state something closer to the desired preservation property:

Theorem preservation_wrong2 : ∀ST T t st t' st',
  empty; ST ⊢ t ∈ T →
  t / st ⇒ t' / st' →
  store_well_typed ST st →
  empty; ST ⊢ t' ∈ T.
Abort.

This works... for all but one of the evaluation rules!

Extending Store Typings

Intuition: Since the store can grow during evaluation, we need to let the store typing grow too...

Inductive extends : store_ty → store_ty → Prop :=
  | extends_nil : ∀ST',
      extends ST' nil
  | extends_cons : ∀x ST' ST,
      extends ST' ST →
      extends (x::ST') (x::ST).

Hint Constructors extends.
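Two small instances show what the relation does and does not allow. The sketch repeats extends in ASCII syntax inside a hypothetical module ExtendsSketch, over a two-constructor stand-in for ty: a store typing may grow at the end, but the type recorded for an existing location may not change.

```coq
Require Import List.
Import ListNotations.

Module ExtendsSketch.

(* A cut-down stand-in for the chapter's ty. *)
Inductive ty : Type :=
  | TNat : ty
  | TUnit : ty.

Definition store_ty := list ty.

Inductive extends : store_ty -> store_ty -> Prop :=
  | extends_nil : forall ST',
      extends ST' nil
  | extends_cons : forall x ST' ST,
      extends ST' ST ->
      extends (x::ST') (x::ST).

(* Adding a type for a newly allocated cell is an extension. *)
Example grows_at_end : extends [TNat; TUnit] [TNat].
Proof. apply extends_cons. apply extends_nil. Qed.

(* Changing the type of an existing location is not. *)
Example cannot_retype : ~ extends [TUnit] [TNat].
Proof. intros H. inversion H. Qed.

End ExtendsSketch.
```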

We'll need a few technical lemmas about extended store typings.

First, looking up a type in an extended store typing yields the same result as in the original:

Lemma extends_lookup : ∀l ST ST',
  l < length ST →
  extends ST' ST →
  store_Tlookup l ST' = store_Tlookup l ST.
Proof with auto.
(* ELIDED *) Admitted.

Next, if ST' extends ST, the length of ST' is at least that of ST.

Lemma length_extends : ∀l ST ST',
  l < length ST →
  extends ST' ST →
  l < length ST'.
Proof with eauto.
(* ELIDED *) Admitted.

Finally, snoc ST T extends ST, and extends is reflexive.

Lemma extends_snoc : ∀ST T,
extends (snoc ST T) ST.
Proof with auto.
(* ELIDED *) Admitted.

Lemma extends_refl : ∀ST,
extends ST ST.
Proof.
(* ELIDED *) Admitted.

Preservation, Finally

We can now give the final, correct statement of the type preservation property:

Definition preservation_theorem := ∀ST t t' T st st',
  empty; ST ⊢ t ∈ T →
  store_well_typed ST st →
  t / st ⇒ t' / st' →
  ∃ST',
    (extends ST' ST ∧
     empty; ST' ⊢ t' ∈ T ∧
     store_well_typed ST' st').

Note that this gives us just what we need to "turn the crank" when applying the theorem to multi-step reduction sequences.

Substitution lemma

To prove preservation, we need to re-develop the rest of the machinery that we saw for the pure STLC (plus a couple of new things about store typings and extension)...

Inductive appears_free_in : id → tm → Prop :=
  | afi_var : ∀x,
      appears_free_in x (tvar x)
  | afi_app1 : ∀x t₁ t₂,
      appears_free_in x t₁ → appears_free_in x (tapp t₁ t₂)
  | afi_app2 : ∀x t₁ t₂,
      appears_free_in x t₂ → appears_free_in x (tapp t₁ t₂)
  | afi_abs : ∀x y T₁₁ t₁₂,
      y ≠ x →
      appears_free_in x t₁₂ →
      appears_free_in x (tabs y T₁₁ t₁₂)
  | afi_succ : ∀x t₁,
      appears_free_in x t₁ →
      appears_free_in x (tsucc t₁)
  | afi_pred : ∀x t₁,
      appears_free_in x t₁ →
      appears_free_in x (tpred t₁)
  | afi_mult1 : ∀x t₁ t₂,
      appears_free_in x t₁ →
      appears_free_in x (tmult t₁ t₂)
  | afi_mult2 : ∀x t₁ t₂,
      appears_free_in x t₂ →
      appears_free_in x (tmult t₁ t₂)
  | afi_if0_1 : ∀x t₁ t₂ t₃,
      appears_free_in x t₁ →
      appears_free_in x (tif0 t₁ t₂ t₃)
  | afi_if0_2 : ∀x t₁ t₂ t₃,
      appears_free_in x t₂ →
      appears_free_in x (tif0 t₁ t₂ t₃)
  | afi_if0_3 : ∀x t₁ t₂ t₃,
      appears_free_in x t₃ →
      appears_free_in x (tif0 t₁ t₂ t₃)
  | afi_ref : ∀x t₁,
      appears_free_in x t₁ → appears_free_in x (tref t₁)
  | afi_deref : ∀x t₁,
      appears_free_in x t₁ → appears_free_in x (tderef t₁)
  | afi_assign1 : ∀x t₁ t₂,
      appears_free_in x t₁ → appears_free_in x (tassign t₁ t₂)
  | afi_assign2 : ∀x t₁ t₂,
      appears_free_in x t₂ → appears_free_in x (tassign t₁ t₂).

Tactic Notation "afi_cases" tactic(first) ident(c) :=
  first;
  [ Case_aux c "afi_var"
  | Case_aux c "afi_app1" | Case_aux c "afi_app2" | Case_aux c "afi_abs"
  | Case_aux c "afi_succ" | Case_aux c "afi_pred"
  | Case_aux c "afi_mult1" | Case_aux c "afi_mult2"
  | Case_aux c "afi_if0_1" | Case_aux c "afi_if0_2" | Case_aux c "afi_if0_3"
  | Case_aux c "afi_ref" | Case_aux c "afi_deref"
  | Case_aux c "afi_assign1" | Case_aux c "afi_assign2" ].

Hint Constructors appears_free_in.

Lemma free_in_context : ∀x t T Γ ST,
   appears_free_in x t →
   Γ; ST ⊢ t ∈ T →
   ∃T', Γ x = Some T'.
Proof with eauto.
  intros. generalize dependent Γ. generalize dependent T.
  afi_cases (induction H) Case;
        intros; (try solve [ inversion H0; subst; eauto ]).
  Case "afi_abs".
    inversion H1; subst.
    apply IHappears_free_in in H8.
    rewrite extend_neq in H8; assumption.
Qed.

Lemma context_invariance : ∀Γ Γ' ST t T,
  Γ; ST ⊢ t ∈ T →
  (∀x, appears_free_in x t → Γ x = Γ' x) →
  Γ'; ST ⊢ t ∈ T.
Proof with eauto.
  intros.
  generalize dependent Γ'.
  has_type_cases (induction H) Case; intros...
  Case "T_Var".
    apply T_Var. symmetry. rewrite ← H...
  Case "T_Abs".
    apply T_Abs. apply IHhas_type; intros.
    unfold extend.
    destruct (eq_id_dec x x0)...
  Case "T_App".
    eapply T_App.
      apply IHhas_type1...
      apply IHhas_type2...
  Case "T_Mult".
    eapply T_Mult.
      apply IHhas_type1...
      apply IHhas_type2...
  Case "T_If0".
    eapply T_If0.
      apply IHhas_type1...
      apply IHhas_type2...
      apply IHhas_type3...
  Case "T_Assign".
    eapply T_Assign.
      apply IHhas_type1...
      apply IHhas_type2...
Qed.

Lemma substitution_preserves_typing : ∀Γ ST x s S t T,
  empty; ST ⊢ s ∈ S →
  (extend Γ x S); ST ⊢ t ∈ T →
  Γ; ST ⊢ ([x:=s]t) ∈ T.
Proof with eauto.
  intros Γ ST x s S t T Hs Ht.
  generalize dependent Γ. generalize dependent T.
  t_cases (induction t) Case; intros T Γ H;
    inversion H; subst; simpl...
  Case "tvar".
    rename i into y.
    destruct (eq_id_dec x y).
    SCase "x = y".
      subst.
      rewrite extend_eq in H3.
      inversion H3; subst.
      eapply context_invariance...
      intros x Hcontra.
      destruct (free_in_context _ _ _ _ _ Hcontra Hs) as [T' HT'].
      inversion HT'.
    SCase "x ≠ y".
      apply T_Var.
      rewrite extend_neq in H3...
  Case "tabs". subst.
    rename i into y.
    destruct (eq_id_dec x y).
    SCase "x = y".
      subst.
      apply T_Abs. eapply context_invariance...
      intros. apply extend_shadow.
    SCase "x ≠ y".
      apply T_Abs. apply IHt.
      eapply context_invariance...
      intros. unfold extend.
      destruct (eq_id_dec y x0)...
      subst.
      rewrite neq_id...
Qed.

Assignment Preserves Store Typing

Next, we must show that replacing the contents of a cell in the store with a new value of appropriate type does not change the overall type of the store. (This is needed for the ST_Assign rule.)

Lemma assign_pres_store_typing : ∀ST st l t,
  l < length st →
  store_well_typed ST st →
  empty; ST ⊢ t ∈ (store_Tlookup l ST) →
  store_well_typed ST (replace l t st).
Proof with auto.
  intros ST st l t Hlen HST Ht.
  inversion HST; subst.
  split. rewrite length_replace...
  intros l' Hl'.
  destruct (beq_nat l' l) eqn: Heqll'.
  Case "l' = l".
    apply beq_nat_true in Heqll'; subst.
    rewrite lookup_replace_eq...
  Case "l' ≠ l".
    apply beq_nat_false in Heqll'.
    rewrite lookup_replace_neq...
    rewrite length_replace in Hl'.
    apply H0...
Qed.
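
The proof above leans on a few list-level facts about replace. As a rough illustration, here is a minimal standalone sketch, assuming a replace function shaped like the one defined earlier in the chapter. The names replace_sketch and length_replace_sketch are hypothetical stand-ins, and the block is written in plain ASCII Coq for a separate file rather than as part of this development.

```coq
Require Import List.
Import ListNotations.

(* Hypothetical stand-in for the chapter's [replace]: overwrite the
   element at index [n], leaving the list unchanged if [n] is out of range. *)
Fixpoint replace_sketch {A : Type} (n : nat) (x : A) (l : list A)
    {struct l} : list A :=
  match l with
  | [] => []
  | h :: t =>
      match n with
      | O => x :: t
      | S n' => h :: replace_sketch n' x t
      end
  end.

(* Replacing an in-range cell changes only that cell. *)
Example replace_in_range :
  replace_sketch 1 7 [1; 2; 3] = [1; 7; 3].
Proof. reflexivity. Qed.

(* An out-of-range index leaves the list as it was. *)
Example replace_out_of_range :
  replace_sketch 5 7 [1; 2; 3] = [1; 2; 3].
Proof. reflexivity. Qed.

(* The length never changes, whatever the index. *)
Lemma length_replace_sketch :
  forall (A : Type) (n : nat) (x : A) (l : list A),
    length (replace_sketch n x l) = length l.
Proof.
  intros A n x l. revert n.
  induction l as [| h t IH]; intros n.
  - reflexivity.
  - destruct n as [| n'].
    + reflexivity.
    + simpl. rewrite IH. reflexivity.
Qed.
```

Length preservation is the property used by the rewrite with length_replace in the proof above; the two examples correspond to the in-range and out-of-range behaviours of the index.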

Weakening for Stores

Finally, we need a lemma on store typings, stating that, if a store typing is extended with a new location, the extended one still allows us to assign the same types to the same terms as the original.

(The lemma is called store_weakening because it resembles the "weakening" lemmas found in proof theory, which show that adding a new assumption to some logical theory does not decrease the set of provable theorems.)

Lemma store_weakening : ∀Γ ST ST' t T,
  extends ST' ST →
  Γ; ST ⊢ t ∈ T →
  Γ; ST' ⊢ t ∈ T.
Proof with eauto.
  intros. has_type_cases (induction H0) Case; eauto.
  Case "T_Loc".
    erewrite ← extends_lookup...
    apply T_Loc.
    eapply length_extends...
Qed.
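
To make the shape of the extends hypothesis concrete, the following standalone sketch models store typings as lists of numbers. It assumes that extends is a prefix-style relation (the larger list agrees with the smaller one on every position the smaller one has); extends_sketch and its constructor names are hypothetical, not the chapter's definitions.

```coq
Require Import List.
Import ListNotations.

(* Hypothetical stand-in for [extends]: [extends_sketch l' l] holds when
   [l] is a prefix of [l']. *)
Inductive extends_sketch : list nat -> list nat -> Prop :=
  | ext_nil : forall l', extends_sketch l' []
  | ext_cons : forall x l' l,
      extends_sketch l' l -> extends_sketch (x :: l') (x :: l).

(* Adding one new entry at the end yields an extension. *)
Example extends_by_one : extends_sketch [1; 2; 3] [1; 2].
Proof. apply ext_cons. apply ext_cons. apply ext_nil. Qed.

(* Every list extends itself, which is the situation in reduction
   steps that allocate nothing. *)
Example extends_same : extends_sketch [1; 2] [1; 2].
Proof. apply ext_cons. apply ext_cons. apply ext_nil. Qed.
```

Under this reading, every location that was valid in the smaller store typing is still valid, with the same type, in the larger one, which is what the T_Loc case of the proof needs.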

We can use the store_weakening lemma to prove that if a store is well typed with respect to a store typing, then the store extended with a new term t will still be well typed with respect to the store typing extended with t's type.

Lemma store_well_typed_snoc : ∀ST st t₁ T₁,
  store_well_typed ST st →
  empty; ST ⊢ t₁ ∈ T₁ →
  store_well_typed (snoc ST T₁) (snoc st t₁).
Proof with auto.
  intros.
  unfold store_well_typed in *.
  inversion H as [Hlen Hmatch]; clear H.
  rewrite !length_snoc.
  split...
  Case "types match.".
    intros l Hl.
    unfold store_lookup, store_Tlookup.
    apply le_lt_eq_dec in Hl; inversion Hl as [Hlt | Heq].
    SCase "l < length st".
      apply lt_S_n in Hlt.
      rewrite ← !nth_lt_snoc...
      apply store_weakening with ST. apply extends_snoc.
      apply Hmatch...
      rewrite Hlen...
    SCase "l = length st".
      inversion Heq.
      rewrite nth_eq_snoc.
      rewrite ← Hlen. rewrite nth_eq_snoc...
      apply store_weakening with ST... apply extends_snoc.
Qed.
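
The two subcases of this proof (l < length st and l = length st) mirror two simple facts about appending to the end of a list. The standalone sketch below illustrates them on concrete data, assuming snoc appends a single element at the end; snoc_sketch and length_snoc_sketch are hypothetical names for illustration only.

```coq
Require Import List.
Import ListNotations.

(* Hypothetical stand-in for [snoc]: append one element at the end. *)
Fixpoint snoc_sketch {A : Type} (l : list A) (x : A) : list A :=
  match l with
  | [] => [x]
  | h :: t => h :: snoc_sketch t x
  end.

(* The new cell lives at the index equal to the old length. *)
Example new_cell_at_old_length :
  nth (length [10; 20]) (snoc_sketch [10; 20] 30) 0 = 30.
Proof. reflexivity. Qed.

(* Lookups below the old length are unaffected. *)
Example old_cells_unchanged :
  nth 1 (snoc_sketch [10; 20] 30) 0 = nth 1 [10; 20] 0.
Proof. reflexivity. Qed.

(* Appending grows the length by exactly one. *)
Lemma length_snoc_sketch : forall (A : Type) (l : list A) (x : A),
  length (snoc_sketch l x) = S (length l).
Proof.
  intros A l x.
  induction l as [| h t IH].
  - reflexivity.
  - simpl. rewrite IH. reflexivity.
Qed.
```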

Preservation!

Now that we've got everything set up right, the proof of preservation is actually quite straightforward.

Theorem preservation : ∀ST t t' T st st',
  empty; ST ⊢ t ∈ T →
  store_well_typed ST st →
  t / st ⇒ t' / st' →
  ∃ST',
    (extends ST' ST ∧
     empty; ST' ⊢ t' ∈ T ∧
     store_well_typed ST' st').
Proof with eauto using store_weakening, extends_refl.
  (* ELIDED *) Admitted.

Exercise: 3 stars (preservation_informal)

Write a careful informal proof of the preservation theorem, concentrating on the T_App, T_Deref, T_Assign, and T_Ref cases.

(* FILL IN HERE *)
☐

Progress

Fortunately, progress for this system is pretty easy to prove; the proof is very similar to the proof of progress for the STLC, with a few new cases for the new syntactic constructs.

Theorem progress : ∀ST t T st,
  empty; ST ⊢ t ∈ T →
  store_well_typed ST st →
  (value t ∨ ∃t', ∃st', t / st ⇒ t' / st').
Proof with eauto.
  (* ELIDED *) Admitted.

References and Nontermination

Section RefsAndNontermination.
Import ExampleVariables.

We know that the simply typed lambda calculus is normalizing, that is, every well-typed term can be reduced to a value in a finite number of steps. What about STLC + references? Surprisingly, adding references causes us to lose the normalization property: there exist well-typed terms in the STLC + references which can continue to reduce forever, without ever reaching a normal form!

How can we construct such a term? The main idea is to make a function which calls itself. We first make a function which calls another function stored in a reference cell; the trick is that we then smuggle in a reference to itself!

   (λr:Ref (Unit → Unit). 
        r := (λx:Unit.(!r) unit); (!r) unit) 
   (ref (λx:Unit.unit))

First, ref (λx:Unit.unit) creates a reference to a cell of type Unit → Unit. We then pass this reference as the argument to a function which binds it to the name r, and assigns to it the function (λx:Unit.(!r) unit) — that is, the function which ignores its argument and calls the function stored in r on the argument unit; but of course, that function is itself! To get the ball rolling we finally execute this function with (!r) unit.

Definition loop_fun :=
  tabs x TUnit (tapp (tderef (tvar r)) tunit).

Definition loop :=
  tapp
  (tabs r (TRef (TArrow TUnit TUnit))
    (tseq (tassign (tvar r) loop_fun)
            (tapp (tderef (tvar r)) tunit)))
  (tref (tabs x TUnit tunit)).

This term is well typed:

Lemma loop_typeable : ∃T, empty; nil ⊢ loop ∈ T.
Proof with eauto.
  eexists. unfold loop. unfold loop_fun.
  eapply T_App...
  eapply T_Abs...
  eapply T_App...
    eapply T_Abs. eapply T_App. eapply T_Deref. eapply T_Var.
    unfold extend. simpl. reflexivity. auto.
  eapply T_Assign.
    eapply T_Var. unfold extend. simpl. reflexivity.
  eapply T_Abs.
    eapply T_App...
      eapply T_Deref. eapply T_Var. reflexivity.
Qed.

To show formally that the term diverges, we first define the step_closure of the single-step reduction relation, written ⇒+. This is just like the reflexive, transitive closure of single-step reduction (which we've been writing ⇒*), except that it is not reflexive: t ⇒+ t' means that t can reach t' by one or more steps of reduction.

Inductive step_closure {X:Type} (R: relation X) : X → X → Prop :=
  | sc_one : ∀(x y : X),
                R x y → step_closure R x y
  | sc_step : ∀(x y z : X),
                R x y →
                step_closure R y z →
                step_closure R x z.

Definition multistep1 := (step_closure step).
Notation "t₁ '/' st '⇒+' t₂ '/' st'" := (multistep1 (t₁,st) (t₂,st'))
  (at level 40, st at level 39, t₂ at level 39).
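
To see how the two constructors are used, here is a small standalone sketch on toy relations. It repeats the definition under the hypothetical name closure_sketch (with constructors cs_one and cs_step) so that it can be checked in a separate file; the relations next and flip are assumptions introduced purely for illustration.

```coq
(* Standalone copy of the closure, renamed to avoid clashing with
   the chapter's [step_closure]. *)
Inductive closure_sketch {X : Type} (R : X -> X -> Prop) : X -> X -> Prop :=
  | cs_one : forall (x y : X), R x y -> closure_sketch R x y
  | cs_step : forall (x y z : X),
      R x y -> closure_sketch R y z -> closure_sketch R x z.

(* Toy relation: [next n m] holds when [m] is the successor of [n]. *)
Definition next (n m : nat) : Prop := m = S n.

(* 0 reaches 2 in two steps: one [cs_step], then a final [cs_one]. *)
Example zero_reaches_two : closure_sketch next 0 2.
Proof.
  apply cs_step with (y := 1).
  - unfold next. reflexivity.
  - apply cs_one. unfold next. reflexivity.
Qed.

(* Toy relation: [flip a b] holds when [b] is the negation of [a]. *)
Definition flip (a b : bool) : Prop := b = negb a.

(* A state can be related to itself, but only by actually taking steps. *)
Example true_reaches_itself : closure_sketch flip true true.
Proof.
  apply cs_step with (y := false).
  - unfold flip. reflexivity.
  - apply cs_one. unfold flip. reflexivity.
Qed.
```

Because there is no reflexivity constructor, the only way to relate a state to itself is to exhibit a genuine cycle of one or more steps, as in the second example.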

Now we can show that the expression loop reduces to the expression !(loc 0) unit, paired with a size-one store whose only cell holds [r:=(loc 0)] loop_fun.

As a convenience, we introduce a slight variant of the normalize tactic, called reduce, which tries solving the goal with multi_refl at each step, instead of waiting until the goal can't be reduced any more. Of course, the whole point is that loop doesn't normalize, so the old normalize tactic would just go into an infinite loop reducing it forever!

Ltac print_goal := match goal with ⊢ ?x ⇒ idtac x end.
Ltac reduce :=
    repeat (print_goal; eapply multi_step ;
            [ (eauto 10; fail) | (instantiate; compute)];
            try solve [apply multi_refl]).

Lemma loop_steps_to_loop_fun :
  loop / nil ⇒*
  tapp (tderef (tloc 0)) tunit / cons ([r:=tloc 0]loop_fun) nil.
Proof with eauto.
  unfold loop.
  reduce.
Qed.

Finally, the latter expression reduces in two steps to itself!

Lemma loop_fun_step_self :
  tapp (tderef (tloc 0)) tunit / cons ([r:=tloc 0]loop_fun) nil ⇒+
  tapp (tderef (tloc 0)) tunit / cons ([r:=tloc 0]loop_fun) nil.
Proof with eauto.
  unfold loop_fun; simpl.
  eapply sc_step. apply ST_App1...
  eapply sc_one. compute. apply ST_AppAbs...
Qed.

Exercise: 4 stars (factorial_ref)

Use the above ideas to implement a factorial function in STLC with references. (There is no need to prove formally that it really behaves like the factorial. Just use the example below to make sure it gives the correct result when applied to the argument 4.)

Definition factorial : tm :=
(* FILL IN HERE *) admit.

Lemma factorial_type : empty; nil ⊢ factorial ∈ (TArrow TNat TNat).
Proof with eauto.
(* FILL IN HERE *) Admitted.

If your definition is correct, you should be able to just uncomment the example below; the proof should be fully automatic using the reduce tactic.

(*
Lemma factorial_4 : exists st,
tapp factorial (tnat 4) / nil ==>* tnat 24 / st.
Proof.
eexists. unfold factorial. reduce.
Qed.
*)

☐

Additional Exercises

Exercise: 5 stars, optional (garbage_collector)

Challenge problem: modify our formalization to include an account of garbage collection, and prove that it satisfies whatever nice properties you can think to prove about it.

☐

End RefsAndNontermination.
End STLCRef.