Operational Semantics

PPL 2023

The operational semantics of a programming language is specified by a set of formal evaluation rules that operate on the AST of an expression. The evaluation process can be specified as an algorithm eval(exp) which maps an AST to a Value.

In this lecture, we go back to the definition of languages \(L1\) to \(L3\) (which are all subsets of Scheme) and present their operational semantics in a more formal manner, based on the definition of the AST of the languages and the method of structural induction. For each language, we also specify the set of Values that can be computed by the language and review different implementation options for the domain of Values.

In Sections 1 to 4, we studied the formal description of the syntax of programming languages. In Sections 5 to 8, we study the formal description of their semantics - using the operational semantics approach. The tool we use is to describe the implementation of interpreters for these languages using a functional subset of TypeScript as a metalanguage.

The pipeline of operations we describe is:

Concrete Syntax (string) ==> (Parser) ==> Abstract Syntax (AST) ==> (Interpreter) ==> Value

\(L1\) Operational Semantics

\(L1\) is a language in which primitive operators and primitive values can be combined recursively. In addition, composite expressions can be given a name and bound to variables using the define special form.

For example, the following is a program in \(L1\):

(L1
  (define x (+ (* 2 3) (* 4 5)))
  (+ x (* 2 2)))

\(L1\) Syntax

Let us summarize the syntax of \(L1\) using the BNF + Abstract Syntax specification we have developed in the previous lectures:

<program> ::= (L1 <exp>+) // program(exps:List(exp))
<exp> ::= <define-exp> | <cexp>
<define-exp> ::= (define <var-decl> <cexp>) // def-exp(var:var-decl, val:cexp)
<cexp> ::= <num-exp> // num-exp(val:Number)
       | <bool-exp>  // bool-exp(val:Boolean)
       | <prim-op>   // prim-op(op:String)
       | <var-ref>   // var-ref(var:String)
       | (<cexp> <cexp>*) // app-exp(rator:cexp, rands:List(cexp))
<prim-op> ::= + | - | * | / | < | > | = | not
<num-exp> ::= a number token
<bool-exp> ::= #t | #f
<var-ref> ::= an identifier token
<var-decl> ::= an identifier token

The corresponding TypeScript implementation of this AST using the Disjoint-Union pattern and with the parseL1 AST factory function is provided in L1-ast.ts which maps a string containing the concrete syntax of a program in \(L1\) to its AST representation.

\(L1\) Values

To determine which values can be computed by \(L1\) programs, we proceed inductively on the structure of \(L1\)-ASTs.

Altogether, we conclude that the set of all possible values computed by \(L1\) programs is:

Value = Number | Boolean | PrimOp | Void

\(L1\) Value Type

We decide to use the following type in the metalanguage (TypeScript) to represent the values that can be computed by \(L1\) expressions:

Value = Number | Boolean | PrimOp

Representing Primitive Operators

The decision to represent Number and Boolean using the corresponding value types in the meta-language (in our case, in TypeScript) is natural. Note, however, that there are many options to represent Numbers in different languages. For example, the way Scheme represents integers and rational numbers is very different from the way they are represented in JavaScript or C (experiment in Racket to see the differences). In this implementation of the interpreter, we will ignore these differences.

We must then decide how to represent primitive operators as computed values - so that we can decide which value to return when we compute the expression +.

In Scheme, when we compute this expression, we get:

> +
#<procedure:+>

That is, the value of the + expression (which is an expression of type Symbol) is a procedure in Scheme. If we implement primitive values as procedures, we rely on the fact that our meta-language (TypeScript) is a functional language which supports fist-class procedures: a procedure value can be bound to a variable.

In JavaScript (and TypeScript), primitive operators are not variables bound to procedures. This can be verified by the following test:

const plus = (x,y) => x + y;
plus
// --> [Function]
+
// --> Error

We observe that if we bind a function to a variable, then its value is returned as an object of type [Function]. In contrast, the expression + (where + can be any primitive operator) is not a well formed expression in TypeScript.

We decide in our language \(L1\) to represent primitive operators as strings, and in the code of the interpreter, map each primitive operator to the underlying primitive operation in the meta-language. We will adopt this solution in our first model and in our implementation.

We represent primitive operators as a specify expression type in the AST PrimOp(op:PrimOpKeyword), where primOpKeyword is an enumerated type of all the defined primitive operators. The value of a PrimOp expression is itself when we evaluate the expression. When we apply a primitive operator to arguments, we explicitly dispatch to each known primitive operator in the language and apply the corresponding primitive operation in the meta-language. The exhaustive enumeration of operators is terminated by a never case.

// There are type errors which we will address in L3
const applyPrimitive = (proc: PrimOp, args: Value[]): Result<Value> =>
    proc.op === "+" ? makeOk(reduce((x, y) => x + y, 0, args)) :
    proc.op === "-" ? makeOk(reduce((x, y) => x - y, 0, args)) :
    proc.op === "*" ? makeOk(reduce((x, y) => x * y, 1, args)) :
    proc.op === "/" ? makeOk(reduce((x, y) => x / y, 1, args)) :
    proc.op === ">" ? makeOk(args[0] > args[1]) :
    proc.op === "<" ? makeOk(args[0] < args[1]) :
    proc.op === "=" ? makeOk(args[0] === args[1]) :
    proc.op === "not" ? makeOk(!args[0]) :
    proc.op;  // never

Variadic Primitives

We adopt Scheme’s model for the arithmetic operators: they are variadic - meaning that they can accept any number of arguments (from 0 and up).
Since the AST of application forms (app-exp) supports any number of arguments, the syntax of \(L1\) also supports expressions of the form: (+ 1 2), (+ 1 2 3 4) and even (+ 1) and (+). In the meta-language, to apply a procedure to a list of arguments, we use reduce.

Think about what the value of (+) and (*) without parameters should be (compare with the general problem of finding the initial value passed to a reduce operation).

It turns out the handling of ‘-‘ and ‘/’ is more complex than ‘+’ and * because they are not associative operators. The code above does not reflect correctly the way - works in Scheme. We will revise this problem in the interpreter for \(L3\).

Check in Scheme what is the value returned by (-) and (/).

Type Strict Primitives

We have already pointed at the difference in behavior of primitives in Scheme and in JavaScript:

In the implementation of the \(L1\) interpreter, we do not test the type of arguments at runtime.
We also do not attempt to “do the right thing” when given unexpected arguments. Instead we silently compute “junk”.

In the implementation of \(L3\), we will properly implement type checking for primitives.

Representing the value returned by define

We need to decide how to handle the case of define expressions, because these expressions behave in a special way: In Scheme, a define expression does not return any value:

> (define x 1)
> 

This is unusual for a functional language - where all expressions are expected to return a value. The reason define does not return a usable value is that it is only used for its side-effect (which is to create a new binding for the variable with universal scope).

In non-functional languages, such expressions are called statements - as opposed to expressions which return a value.

To avoid the dichotomy expression/statement, we will adopt Scheme’s model which is that expressions that have a side-effect return a special type of value - which is called void.
The void type contains a single value (also called void). In TypeScript, there are multiple strange values that can be used to encode the void value. We choose to use the undefined TypeScript value for this purpose.

We thus adopt the convention that the value of a define expression will be the undefined value.

Value = number | boolean | string | PrimOp | undefined

\(L1\) Evaluation Rules

The operational semantics of \(L1\) is a function which maps inductively any Expression in \(L1\) to a Value:

eval: Expression -> Value

We have now established how to represent a Value. Let us define the eval algorithm in an inductive manner - by starting from the base cases - atomic expressions, and then moving up to composite expressions.

1. Evaluation of Atomic Expressions

// Number atomic literal expressions evaluate to number values.
1. eval(NumExp(val)) => val         
// Boolean atomic literal expressions evaluate to boolean values true and false.
2. eval(BoolExp(val)) => val        
// Primitive procedures evaluate to the primitive operation 
3. eval(PrimOp(op)) => PrimOp(op)   
// Variables are evaluated by looking up their value in the global environment - see below definition.
4. eval(VarRef(var)) => applyEnv(env, var) 

2. Evaluation of Compound Forms

The evaluation of compound forms involves the recursive evaluation of parts of the compound forms, followed by a rule that determines how to combine the resulting values to obtain the value of the compound form.

For special forms, not all the parts of the compound form are always evaluated.
The order in which the parts is evaluated is determined by the computation rule of the compound form type.
In \(L1\) there is a single special form - define.

1. eval(DefineExp(var, val)) => 
    // var is of type VarDecl
    // val is of type Cexp
    let val:Value = eval(val)
        add the binding <(var-decl->var var), val> to the global environment
        return undefined.
2. eval(AppExp(rator, rands)) =>
    // rator is of type Cexp
    // rands is of type List(Cexp)
    let proc = eval(rator)
        args = [eval(r) for r in rands]
        return applyProc(proc, args)

Global Environment, Variable References and Define Expressions

In the specification of the eval algorithm above, we need to clarify two clauses:

// Variables are evaluated by looking up their value in the global environment.
eval(VarRef(var)) => applyEnv(env, var) 
eval(DefineExp(var,val)=> ... add the binding <(var-decl->var var), val> to the global environment ...

These evaluation rules help us deal with the following cases:

The answer is that it depends on the context. If the variable x has been previously defined (using a define expression), then x should remember this value and return it. Else, the evaluation should fail.
In other words, the evaluation of variable references requires us to maintain a memory: the memory is filled when we define a variable, it is accessed when we evaluate a variable reference.

We implement this memory with an object we call the global environment.
We model an environment as a partial function which maps variable references to values. A function is partial when it is defined on a restricted domain - in our case, not all variable references will be defined in a given environment, and the function is actually a finite function (defined on a finite set of values). When the function is invoked outside of its domain, it returns a failure.

We model environments as an inductive data type, to reflect the fact that environments can be extended (this is what happens when we define a new variable and bind it to a value). We adopt the method discussed in Chapter 1.5 to model environments (which are mutable data structures) in a functional manner.

Recall that the strategy to implement mutable data structures in a functional manner consists of the following steps:

In the case of the environment data structure, we obtain the following inductive data type definition:

env ::= empty-env | extended-env
empty-env // empty-env()
extended-env(var, val, env) // extended-env(var:string, val:Value, next-env:Env)

That is, we define an environment as either:

On the basis of this inductive definition, we define a single value accessor for environment, which we call applyEnv (see full source code in L1-eval.ts):

// ========================================================
// Environment data type for L1

export type Env = EmptyEnv | NonEmptyEnv;
export type EmptyEnv = {tag: "EmptyEnv" };
export type NonEmptyEnv = {
    tag: "Env";
    var: string;
    val: Value;
    nextEnv: Env;
};
export const makeEmptyEnv = (): EmptyEnv => ({tag: "EmptyEnv"});
export const makeEnv = (v: string, val: Value, env: Env): NonEmptyEnv =>
    ({tag: "Env", var: v, val: val, nextEnv: env});
const isEmptyEnv = (x: any): x is EmptyEnv => x.tag === "EmptyEnv";
const isNonEmptyEnv = (x: any): x is NonEmptyEnv => x.tag === "Env";
const isEnv = (x: any): x is Env => isEmptyEnv(x) || isNonEmptyEnv(x);

export const applyEnv = (env: Env, v: string): Result<Value> =>
    isEmptyEnv(env) ? makeFailure("var not found " + v) :
    env.var === v ? makeOk(env.val) :
    applyEnv(env.nextEnv, v);

To lookup a variable v in an environment env, we apply the env function on the variable. This is implemented recursively:

As usual - this computation is an instance of the structural induction principle (we exhaustively traverse all the cases of the env type definition).

Here is an example using this data structure:

// Lookup of any variable in an empty env fails
isFailure(applyEnv(makeEmptyEnv(), "x"))
// --> true
// Lookup of a variable defined in an extended env succeeds
applyEnv(makeEnv("x", 1, makeEmptyEnv()), "x")
// --> 1
// Lookup of a variable that is not defined in an extended env fails
isFailure(applyEnv(makeEnv("x", 1, makeEmptyEnv()), "y"))
// --> true
// Lookup of a variable that is defined in a deeper env is retrieved
// Here we have 2 levels: (<y 2> <x 1> <empty>)
applyEnv(makeEnv("y", 2, 
         makeEnv("x", 1, 
         makeEmptyEnv())), "x")
// --> 1        
// Lookup of a variable that is defined in a deeper env is overridden by a newer binding
// Here we have 2 levels: (<x 2> <x 1> <empty>)
applyEnv(makeEnv("x", 2, 
         makeEnv("x", 1, 
         makeEmptyEnv())), "x")
--> 2         

Handling Variable References

Given this definition of the environment data type, let us review how to handle variable references and define expressions. We first observe that mutation to the environment are necessary when evaluating a program (that is, a sequence of expressions that can include define expressions), while the evaluation of cexp expressions does not require any mutation. Yet, the evaluation of variable references (var-ref expressions) requires access to an environment.

We split the implementation of the eval algorithm in two cases:

L1eval handles the case of evaluating a variable reference with respect to a given environment. The evaluation rule for VarRef expressions is now clarified:

// Variables are evaluated by looking up their value in the global environment.
L1eval(VarRef(var)) => applyEnv(env, var) 

We still need to determine how the value of env is obtained so that it can be used in this clause.

Handling DefineExp and Evaluating Programs

Let us now address the issue of evaluating a program - which is a sequence of expressions, which can be either define expressions, which update the current environment (and return void), or cexp expressions, which return a value.

L1evalProgram(program) receives a program, which includes an ordered sequence of expressions. It iterates over the expressions and depending on the type of each expression, it performs the following:

DefineExp(var, val) =>
    let value = L1eval(val, env)
        if there are more expressions in the program:
            let newEnv = extendEnv(var, val, env)
                continue evaluating remaining expressions in newEnv
        else
            return void

Cexp =>
    let value = L1eval(cexp, env)
        if there are more expressions in the program
            continue evaluating remaining expressions in env
        else
            return value

This algorithm is implemented in L1-eval.ts.

The function L1applicativeEval has the typical structure of a syntax-driven function, as was reviewed in Section 2.4 with a conditional clause for each type of AST expression. It returns a Result<Value> given a CExp and the current Env.

const L1applicativeEval = (exp: CExp, env: Env): Result<Value> =>
    isNumExp(exp) ? makeOk(exp.val) :
    isBoolExp(exp) ? makeOk(exp.val) :
    isPrimOp(exp) ? makeOk(exp) :
    isVarRef(exp) ? applyEnv(env, exp.var) :
    isAppExp(exp) ? bind(mapResult((rand: CExp) =>  L1applicativeEval(rand, env), exp.rands), (rands: Value[]) => 
                         L1applyProcedure(exp.rator, rands)) :
    exp;

The L1evalProgram invokes evalExps to evaluate the sequence of expressions that appear inside the program with an initially empty environment. The expressions can either be of type DefineExp (which modify the current environment) or CExp (which have no side effect). To model the updates of the current environment in a functional manner, we implement in the evalDefineExps the process that creates a new environment after each define, and passes the new environment to the evaluation of the next expressions.

// Purpose: evaluate a program made up of a sequence of expressions.
// When def-exp expressions are executed, thread an updated env to the continuation.
// For other expressions (that have no side-effect), execute the expressions sequentially.
export const L1evalProgram = (program: Program): Result<Value> =>
    evalSequence(program.exps, makeEmptyEnv());

// Evaluate a sequence of expressions (in a program)
export const evalSequence = (seq: Exp[], env: Env): Result<Value> =>
    isEmpty(seq) ? makeFailure("Empty sequence") :
    evalSequenceFirst(first(seq), rest(seq), env);

const evalSequenceFirst = (first: Exp, rest: Exp[], env: Env): Result<Value> =>
    isDefineExp(first) ? evalDefineExps(first, rest, env) :
    isEmpty(rest) ? L1applicativeEval(first, env) :
    bind(L1applicativeEval(first, env), _ => evalSequence(rest, env));

// Eval a sequence of expressions when the first exp is a Define.
// Compute the rhs of the define, extend the env with the new binding
// then compute the rest of the exps in the new env.
const evalDefineExps = (def: DefineExp, exps: Exp[], env: Env): Result<Value> =>
    bind(L1applicativeEval(def.val, env),
         (rhs: Value) => evalSequence(exps, makeEnv(def.var.var, rhs, env)));

Interestingly, we implemented a form of mutation in \(L1\) (the evaluation of define) without mutation in the interpreter. This is obtained by using a functional implementation of the environment and threading the updated value of this environment (which is a new constructed value, obtained without mutation) at each step of the evaluation process.

Procedure Calls

The last aspect of the operational semantics that is left to clarify is how procedure calls are handled. In \(L1\), the only procedures that can be applied are primitives, since we have not yet provided a way to define user procedures (we will do this next in \(L2\)).

Consider for example the computation of this \(L1\)-expression:

(+ (* 2 3) (- 3 2))

The way this expression is evaluated according to the computation rule for AppExp(rator, rands) expressions is:

    isAppExp(exp) ? bind(mapResult((rand: CExp) =>  L1applicativeEval(rand, env), exp.rands), (rands: Value[]) => 
                         L1applyProcedure(exp.rator, rands)) :

We first evaluate all the arguments, then we invoke the procedure on the computed values. In the example above, it means that we compute the expression in this order:

  1. Compute (* 2 3) and (- 3 2)
  2. Compute (+ 6 1)

Recursively, if we have an expression that is nested deeper, we start by computing the inner-most sub-expression, and then move up towards the root of the AST.

The operational semantics does not specify the order of execution among the arguments - we could compute (* 2 3) first and (- 3 2) next, or in reverse, or even together (in parallel).

The procedure apply-proc does not need an env parameter because it receives only values, not expressions, and in particular, it does not receive any variable reference or any object that may contain a variable reference.

\(L1\) Operational Semantics Summary

The complete code of the \(L1\) interpreter which implements the operational semantics described here is available in the following program:
L1-eval.ts

Tests are available in L1-eval.test.ts

In general, the operational semantics is defined as a syntax-driven traversal of an input AST. The algorithm of this AST traversal is specified inductively:

  1. Evaluation of atomic expressions
    // Number atomic literal expressions evaluate to number values.
    eval(NumExp(val), env)  => val 
    // Boolean atomic literal expressions evaluate to boolean values true and false.
    eval(BoolExp(val), env) => val 
    // Primitive procedures evaluate to the primitive operation 
    eval(PrimOp(op), env)   => PrimOp(op) 
    // Variables are evaluated by looking up their value in the global environment.
    eval(VarRef(var), env)  => applyEnv(env, var) 
    
  2. Evaluation of compound forms
    eval(DefineExp(var, val)) =>
     // var is of type var-decl
     // val is of type cexp
     let val:Value = eval(val)
         add the binding [var, val] to the global environment
         return void.
    eval(AppExp(rator, rands)) =>
     // rator is of type cexp
     // rands is of type List(cexp)
     let  proc = eval(rator, env)
          args = [eval(r,env) for r in rands]
          return applyProc(proc, args)
    

The key decisions we made are:

The set of computed values for \(L1\) is:

Value = Number | Boolean | Prim-ops | Void

Variables in \(L1\) are bound to Values.