Type Inference System

Practical Session - Week #8

The Type Inference System is a TypeScript Implementation of the algorithm for Type Checking and Inference using Type Equations.

System Architecture

The layers of the system are:

Abstract Syntax for L5 Expressions and for Type Expressions.
The key data types manipulated in the algorithm, Type Equations and Type Substitutions.
The logic of the algorithm is implemented in three functional packages: Exp-to-Pool, Exp-to-Equations and the top-level algorithm Solve.

Layer 1: Languages Definitions

The Type Inference algorithm maps expressions in one language (L5 expressions) into expressions in another language (Type Expressions).

The lower level of the Type Inference system implements the Abstract Syntax interfaces for expressions in these two languages.

L5-AST

Implements parser and abstract syntax for the following BNF concrete syntax:

/*
<program> ::= (L5 <exp>+)                  / Program(exps:List(exp))
<exp> ::= <define> | <cexp>                / DefExp | CExp
<define> ::= ( define <var-decl> <cexp> )  / DefExp(var:VarDecl, val:CExp)
<var> ::= <identifier>                     / VarRef(var:string)
<cexp> ::= <number>                        / NumExp(val:number)
        |  <boolean>                       / BoolExp(val:boolean)
        |  <string>                        / StrExp(val:string)
        |  <var-ref>
        |  ( lambda ( <var-decl>* ) <cexp>+ ) | ( lambda ( <var-decl>* ) : <TExp> <cexp>+ ) / ProcExp(args:VarDecl[], body:CExp[], returnTE: TExp))
        |  ( if <cexp> <cexp> <cexp> )     / IfExp(test: CExp, then: CExp, alt: CExp)
        |  ( quote <sexp> )                / LitExp(val:SExp)
        |  ( <cexp> <cexp>* )              / AppExp(operator:CExp, operands:CExp[]))
        |  ( let ( <binding>* ) <cexp>+ )  / LetExp(bindings:Binding[], body:CExp[]))
        |  ( letrec ( binding*) <cexp>+ )  / LetrecExp(bindings:Bindings[], body: CExp)
        |  ( set! <var> <cexp>)            / SetExp(var: varRef, val: CExp)
<binding>  ::= ( <var-decl> <cexp> )            / Binding(var:VarDecl, val:Cexp)
<prim-op>  ::= + | - | * | / | < | > | = | not |  eq? | string=?
                 | cons | car | cdr | list? | number?
                 | boolean? | symbol? | string?
                 | display | newline
<num-exp>  ::= a number token
<bool-exp> ::= #t | #f
<var-ref>  ::= an identifier token         / VarRef(var)
<var-decl> ::= an identifier token | (var : TExp) / VarRef(var, TE: TExp) ##### L5
<sexp>     ::= symbol | number | bool | string | ( <sexp>* )              ##### L3
*/

Note that we introduce optional type annotations in the L5 syntax in 2 places only:

As part of VarDecl in the form (var : type-expression)
As part of ProcExp in the return type of the procedure in the form: (lambda ((x : number) (y : number)) : number ...)

TExp-AST

/*
Type checking language
Syntax with optional type annotations for var declarations and function return types.

<texp>         ::= <atomic-te> | <compound-te> | <tvar>
<atomic-te>    ::= <num-te> | <bool-te> | <void-te> | <str-te>
<num-te>       ::= number   // num-te()
<bool-te>      ::= boolean  // bool-te()
<str-te>       ::= string   // str-te()
<void-te>      ::= void     // void-te()
<compound-te>  ::= <proc-te> | <tuple-te>
<non-tuple-te> ::= <atomic-te> | <proc-te> | <tvar>
<proc-te>      ::= [ <tuple-te> -> <non-tuple-te> ] // proc-te(param-tes: list(te), return-te: te)
<tuple-te>     ::= <non-empty-tuple-te> | <empty-te>
<non-empty-tuple-te> ::= ( <non-tuple-te> *)* <non-tuple-te> // tuple-te(tes: list(te))
<empty-te>     ::= Empty
<tvar>         ::= a symbol starting with T // tvar(id: Symbol, contents; Box(string|boolean))
*/

The definition of the TExp data types, the parser of TExp is provided in file TExp.ts

Examples of Types Expressions:

number
boolean
void
number -> boolean
number * number -> boolean
number -> (number -> boolean)
empty -> number
empty -> void

// Purpose: Compute the type of an expression
// Traverse the AST and check the type according to the exp type.
export const typeofExp = (exp: A.Parsed, tenv: E.TEnv): Result<T.TExp> =>
    A.isNumExp(exp) ? makeOk(T.makeNumTExp()) :
    A.isBoolExp(exp) ? makeOk(T.makeBoolTExp()) :
    A.isStrExp(exp) ? makeOk(T.makeStrTExp()) :
    A.isPrimOp(exp) ? TC.typeofPrim(exp) :
    A.isVarRef(exp) ? E.applyTEnv(tenv, exp.var) :
    A.isIfExp(exp) ? typeofIf(exp, tenv) :
    A.isProcExp(exp) ? typeofProc(exp, tenv) :
    A.isAppExp(exp) ? typeofApp(exp, tenv) :
    A.isLetExp(exp) ? typeofLet(exp, tenv) :
    A.isLetrecExp(exp) ? typeofLetrec(exp, tenv) :
    A.isDefineExp(exp) ? typeofDefine(exp, tenv) :
    A.isProgram(exp) ? typeofProgram(exp, tenv) :
    makeFailure(`Unknown type: ${JSON.stringify(exp, null, 2)}`);

Layer 2: Substitution and Equation ADTs

The Type Inference algorithm is defined in terms of 2 basic formal tools:

Type Substitutions are finite mappings of Type Variables to Type Expressions.
Equations are pairs that state that a left Type Expression is to be held equivalent to a right Type Expression.

Equations are the tool we use to represent a typing statement in the algorithm.

Substitution ADT

export interface Sub {tag: "Sub"; vars: TVar[]; tes: TExp[]; }
export const isSub = (x: any): x is Sub => x.tag === "Sub";

The interface of the Substitution ADT includes:

makeSub(variables, type-expressions)
subGet(sub, var) - lookup a variable in a substitution
extendSub(sub, var, te) - create a new substitution by adding a single pair {var:te} to sub
applySub(sub, te) - replace type variables inside te according to sub
combineSub(sub1, sub2) - compute a new sub such that: applySub(sub, te) = applySub(sub2, applySub(sub1, te)) for all te

Equation ADT

export interface Equation {left: T.TExp, right: T.TExp}
export const makeEquation = (l: T.TExp, r: T.TExp): Equation => ({left: l, right: r});

Layer 3: Exp-to-Pool, Exp-to-Equations, Solve

expToPool(exp) - traverses the AST of a Scheme Expression and maps each sub-expression in the AST to distinct Type Variables

What is Pool?

  export interface PoolItem {e: A.Exp, te: T.TExp}
  export type Pool = PoolItem[];

  // Purpose: Traverse the abstract syntax tree L5-exp
  //          and collect all sub-expressions into a Pool of fresh type variables.
  // Example:
  // bind(p('(+ x 1)'), (s) => mapv(parseL5Exp(s), (e) => TE.expToPool(e))) =>
  // Ok([[AppExp(PrimOp(+), [VarRef(x), NumExp(1)]), TVar(16)],
  //     [NumExp(1), TVar(15)],
  //     [VarRef(x), TVar(14)],
  //     [PrimOp(+), TVar(13)]])
  export const expToPool = (exp: A.Exp): Pool => {
      const findVars = (e: A.Exp, pool: Pool): Pool =>
          A.isAtomicExp(e) ? extendPool(e, pool) :
          A.isProcExp(e) ? extendPool(e, reducePool(findVars, e.body, reducePoolVarDecls(extendPoolVarDecl, e.args, pool))) :
          A.isCompoundExp(e) ? extendPool(e, reducePool(findVars, A.expComponents(e), pool)) :
          makeEmptyPool();
      return findVars(exp, makeEmptyPool());
  };

poolToEquations(pool) applies type formulae to propagate constraints from the syntactic structure of the L5 Expression to constraints (equations) among the type variables of the pool.

There are specific constraints derived for each type of syntactic construct according to the semantics of the L5 programming language

  // Purpose: Return a set of equations for a given Exp encoded as a pool
  // @Pre: pool is the result of expToPool(exp)
  export const poolToEquations = (pool: Pool): Opt.Optional<Equation[]> => {
      // VarRef generate no equations beyond that of var-decl - remove them.
      const poolWithoutVars: Pool = R.filter(R.propSatisfies(R.complement(A.isVarRef), 'e'), pool);
      return Opt.mapv(Opt.mapOptional((e: A.Exp) => makeEquationsFromExp(e, pool), R.map(R.prop('e'), poolWithoutVars)), (eqns: Equation[][]) => 
                  flatten(eqns));
  };

inferType(L5expr) - applies the whole logic of the type inference algorithm.

The main steps of the algorithm are reflected in this top-level fragment:

  // Signature: inferType(exp)
  // Purpose: Infer the type of an expression using the equations method
  // Example: unparseTExp(inferType(parse('(lambda (f x) (f (f x)))')))
  //          ==> '((T_1 -> T_1) * T_1 -> T_1)'
  export const inferType = (exp: A.Exp): Opt.Optional<T.TExp> => {
      const pool = expToPool(exp);
      const equations = poolToEquations(pool);
      const sub = Opt.bind(equations, (eqns: Equation[]) => Res.resultToOptional(solveEquations(eqns)));
      // Extract the computed type of the root expression from the pool
      const texp = inPool(pool, exp);
      // Replace all TVars in the computed type by their type expression
      return Opt.bind(sub, (sub: S.Sub) =>
                  Opt.mapv(texp, (texp: T.TExp) =>
                      T.isTVar(texp) ? S.subGet(sub, texp) : texp))
  };

The key logic of the algorithm lies in the solveEquations algorithm – which turns a list of Type Equations into a single coherent Type Substitution which satisfies all the constraints.

solveEquations relies on the computation of unification between two type expressions – as shown in the following fragment:

export const solveEquations = (equations: Equation[]): Res.Result<S.Sub> =>
    solve(equations, S.makeEmptySub());

// Purpose: Solve the equations, starting from a given substitution.
//          Returns the resulting substitution, or error, if not solvable
const solve = (equations: Equation[], sub: S.Sub): Res.Result<S.Sub> => {
    const solveVarEq = (tvar: T.TVar, texp: T.TExp): Res.Result<S.Sub> =>
        Res.bind(S.extendSub(sub, tvar, texp), sub2 => solve(rest(equations), sub2));

    const bothSidesAtomic = (eq: Equation): boolean =>
        T.isAtomicTExp(eq.left) && T.isAtomicTExp(eq.right);

    const handleBothSidesAtomic = (eq: Equation): Res.Result<S.Sub> =>
        T.isAtomicTExp(eq.left) && T.isAtomicTExp(eq.right) && T.eqAtomicTExp(eq.left, eq.right)
        ? solve(rest(equations), sub)
        : Res.makeFailure(`Equation with non-equal atomic type ${JSON.stringify(eq, null, 2)}`);

    if (isEmpty(equations)) {
        return Res.makeOk(sub);
    }

    const eq = makeEquation(S.applySub(sub, first(equations).left),
                            S.applySub(sub, first(equations).right));

    return T.isTVar(eq.left) ? solveVarEq(eq.left, eq.right) :
           T.isTVar(eq.right) ? solveVarEq(eq.right, eq.left) :
           bothSidesAtomic(eq) ? handleBothSidesAtomic(eq) :
           T.isCompoundTExp(eq.left) && T.isCompoundTExp(eq.right) && canUnify(eq) ?
                solve(R.concat(rest(equations), splitEquation(eq)), sub) :
           Res.makeFailure(`Equation contains incompatible types ${JSON.stringify(eq, null, 2)}`);
};

Compound type expressions are unified by unifying their components one by one.

This is implemented by these 2 functions:

// Signature: canUnify(equation)
// Purpose: Compare the structure of the type expressions of the equation
const canUnify = (eq: Equation): boolean =>
    T.isProcTExp(eq.left) && T.isProcTExp(eq.right) &&
    (eq.left.paramTEs.length === eq.right.paramTEs.length);

// Signature: splitEquation(equation)
// Purpose: For an equation with unifyable type expressions,
//          create equations for corresponding components.
// Type: [Equation -> List(Equation)]
// Example: splitEquation(
//            makeEquation(parseTExp('(T1 -> T2)'),
//                         parseTExp('(T3 -> (T4 -> T4))')) =>
//            [ {left:T2, right: (T4 -> T4)},
//              {left:T3, right: T1)} ]
// @Pre: isCompoundExp(eq.left) && isCompoundExp(eq.right) && canUnify(eq)
const splitEquation = (eq: Equation): Equation[] =>
    (T.isProcTExp(eq.left) && T.isProcTExp(eq.right)) ?
        R.zipWith(makeEquation,
                  cons(eq.left.returnTE, eq.left.paramTEs),
                  cons(eq.right.returnTE, eq.right.paramTEs)) :
    [];

Exercise: Extend the Type Inference system to support `if` expressions

The part of the Type Inference System which “knows” L5 Expressions is the function makeEquationsFromExp(exp, pool).

The structure of this function is a case for each type of L5 expression.

The code we provide supports:

Procedure
Application
Number and Boolean
Primitive Procedures extend the code to support Scheme expressions of the type if.

From Typing Rule for if-expressions the type equations derived from a composite if-expression: (if _p _c _a) we derive 3 equations:

Tp = boolean
Tc = Ta
Tif = Ta

// Purpose: compute the type of an if-exp
// Typing rule:
//   if type<test>(tenv) = boolean
//      type<then>(tenv) = t1
//      type<else>(tenv) = t1
// then type<(if test then else)>(tenv) = t1
const typeofIf = (ifExp: A.IfExp, tenv: E.TEnv): Result<T.TExp> => {
    const testTE = typeofExp(ifExp.test, tenv);
    const thenTE = typeofExp(ifExp.then, tenv);
    const altTE = typeofExp(ifExp.alt, tenv);
    const constraint1 = bind(testTE, (testTE: T.TExp) => checkEqualType(testTE, T.makeBoolTExp(), ifExp));
    const constraint2 = bind(thenTE, (thenTE: T.TExp) => 
                            bind(altTE, (altTE: T.TExp) => 
                                checkEqualType(thenTE, altTE, ifExp)));
    return bind(constraint1, (_c1: true) =>
                bind(constraint2, (_c2: true) => thenTE));
};

Substitution

// Implementation of the Substitution ADT
// ========================================================
// A substitution is represented as a 2 element list of equal length
// lists of variables and type expression.
// The empty substitution is [[], []]

export interface Sub {tag: "Sub"; vars: TVar[]; tes: TExp[]; }
export const isSub = (x: any): x is Sub => x.tag === "Sub";

// Constructors:
// Signature: makeSub(vars, tes)
// Purpose: Create a substitution in which the i-th element of 'variables'
//          is mapped to the i-th element of 'tes'.
// Example: makeSub(
//             map(parseTE, ["x", "y", "z"]),
//             map(parseTE, ["number", "boolean", "(number -> number)"])
//          => {tag: "Sub", vars: [x y z], [numTexp, boolTexp, ProcTexp([NumTexp, NumTexp])]}
//          makeSub(map(parseTE, ["x", "y", "z"]),
//                  map(parseTE, ["number", "boolean", "(z -> number)"]))
//          => error makeSub: circular substitution
// Pre-condition: (length variables) = (length tes)
//                variables has no repetitions (set)
export const makeSub = (vars: TVar[], tes: TExp[]): Result<Sub> =>
    mapv(zipWithResult(checkNoOccurrence, vars, tes), _ => ({ tag: "Sub", vars: vars, tes: tes }));

export const makeEmptySub = (): Sub => ({tag: "Sub", vars: [], tes: []});

Whenever creating a substitution, we verify that the invariant holds that when a TVar is associated to a TExp, it does not occur inside the TExp.

This is enforced by this function:

// Purpose: when attempting to bind tvar to te in a sub - check whether tvar occurs in te.
// Return error if a circular reference is found.
export const checkNoOccurrence = (tvar: TVar, te: TExp): Result<true> => {
    const check = (e: TExp): Result<true> =>
        isTVar(e) ? ((e.var === tvar.var) ? bind(unparseTExp(te), up => makeFailure(`Occur check error - circular sub ${tvar.var} in ${JSON.stringify(up, null, 2)}`)) : 
                                            makeOk(true)) :
        isAtomicTExp(e) ? makeOk(true) :
        isProcTExp(e) ? bind(mapResult(check, e.paramTEs), _ => check(e.returnTE)) :
        makeFailure(`Bad type expression ${e} in ${JSON.stringify(te, null, 2)}`);
    return check(te);
};

Note on how we compare two substitutions to perform tests In order to compare two substitutions, we “normalize” them by sorting their string representation and then compare the strings.

This in effect achieves set equality:

export const subToStr = (sub: Sub): Result<string> =>
    bindResult(zipWithResult((v, t) => bindResult(unparseTExp(t), up => makeOk(`${v.var}:${up}`)), sub.vars, sub.tes),
               (vts: string[]) => makeOk(vts.sort().join(", ")));

Example of applySub from L5-substitutions-ADT-tests.ts:

export const applySub = (sub: Sub, te: TExp): TExp =>
    isEmptySub(sub) ? te :
    isAtomicTExp(te) ? te :
    isTVar(te) ? subGet(sub, te) :
    isProcTExp(te) ? makeProcTExp(map((te) => applySub(sub, te), te.paramTEs), applySub(sub, te.returnTE)) :
    te;

describe('applySub', () => {
        it('applies a substitution on a type expression', () => {
            const sub1 = sub(["T1", "T2"], ["number", "boolean"]);
            const te1 = parseTE("(T1 * T2 -> T1)");
            const unparsed = bind(sub1, (sub: S.Sub) =>
                                bind(te1, (te: TExp) =>
                                    unparseTExp(S.applySub(sub, te))));
            expect(unparsed).toEqual(makeOk("(number * boolean -> number)"));
        });
    });

Example of combineSub From L5-substitutions-ADT-tests.ts:

    describe('combineSub', () => {
        it('combines substitutions (the o operator)', () => {
            // {T1:(number -> S1), T2:(number -> S4)} o {T3:(number -> S2)} =>
            // {T1:(number -> S1), T2:(number -> S4), T3:(number -> S2)}
            const sub11 = sub(["T1", "T2"], ["(number -> S1)", "(number -> S4)"]);
            const sub12 = sub(["T3"], ["(number -> S2)"]);
            const expected1 = bind(sub(["T1", "T2", "T3"], ["(number -> S1)", "(number -> S4)", "(number -> S2)"]), subToStr);
            const res1 = bind(safe2(S.combineSub)(sub11, sub12), subToStr);
            expect(res1).toEqual(expected1);
        });
    });

Principles of Programming Languages

Course Website

Type Inference System

Practical Session - Week #8

System Architecture

Layer 1: Languages Definitions

L5-AST

TExp-AST

Layer 2: Substitution and Equation ADTs

Substitution ADT

Equation ADT

Layer 3: Exp-to-Pool, Exp-to-Equations, Solve

Exercise: Extend the Type Inference system to support `if` expressions

Substitution

Type Inference System

Practical Session - Week #8

System Architecture

Layer 1: Languages Definitions

L5-AST

TExp-AST

Layer 2: Substitution and Equation ADTs

Substitution ADT

Equation ADT

Layer 3: Exp-to-Pool, Exp-to-Equations, Solve

Exercise: Extend the Type Inference system to support if expressions

Substitution

Exercise: Extend the Type Inference system to support `if` expressions