Chapter 2 - Syntax and Semantics with Scheme
PPL 2023
We have introduced the following main concepts in Chapter 1 through examples in JavaScript and TypeScript:
- Different programming languages encourage different programming practices and make other practices difficult by providing programming tools and idioms. We compared the familiar procedural programming paradigm with the functional programming (FP) paradigm and identified the key tools each provides (for FP: first-class citizen functions, immutability and functional abstractions).
-
We distinguished two aspects that participate in the definition of a programming language: the syntax of expressions, on the one hand, and the set of values that can be computed when evaluating the expressions in the language on the other. The process of mapping from expression to value is called evaluation. The description of the evaluation algorithm for a given algorithm provides the operational semantics of the language.
- Atomic vs. Compound
- In the syntactic description of a language, we distinguished atomic expressions (which have no sub-components) and compound expressions (which have sub-expressions).
- Similarly, in the semantic domain, we distinguish atomic values (such as boolean and numbers) and compound values (such as arrays and maps)
- Data types denote specific subsets of values over which operations can be performed uniformly. Programming languages define primitive data types (sets of values with well defined operations). They also provide means to define user defined types
- A type system defines a type language to construct new data type expressions to denote useful sets of data structures. Relations of subtyping and disjoint types are derived from this perspective on types, and operations on types such as union, intersection and product provide ways to construct new types.
- A language with type checking allows programmers to annotate variables and functions with type annotations which express constraints on which values can be bound to these variables in any execution of the program. The type checker can establish that a given program satisfies the type declaration constraints.
- The structure of type definitions is important in general because it provides an abstract model of the structure of the data values expected by functions, and the type of values they can return. The structure of the code in the functions usually follows the structure of the data it receives.
In Chapter 2, we develop these concepts in a more systematic and in-depth manner.
The method we use is to design a new programming language, to specify exactly its syntax and its operational semantics, and to implement this specification into an executable interpreter.
Interpreters
In this chapter, we define a small but complete programming language. We demonstrate what are the elements necessary to define a programming language.
We first present these elements informally - to define a subset of the Scheme language.
This language is a simple functional language.
We then move on to define the elements formally in the form of:
- The syntax of the language (defining the set of expressions which belong to the language)
- The operational semantics of the language (defining how to map any expression in the language to a value in a recursive manner).
We implement these formal definitions into concrete programs - which together form an interpreter of the language.
As summarized by Hal Abelson in the introduction to Essentials of Programming Languages (Friedman, Wand and Haynes, 2000) - this chapter: brings us face to face with the most fundamental idea in computer programming: The interpreter for a computer language is just another program.
Interpreters are interesting because:
- They clarify what programs do when they are executed
- They illustrate how to build a wide class of programs which transform complex data from one form into another based on syntactic structure.
Object Language vs. Meta Language
We use interpreters to implement the formal definitions of a programming language.
The interpreter itself is written in a programming language, and it defines a programming language.
To distinguish between these two programming roles, we use the following terminology:
- The language we define and describe is called the object language
- The language we use to implement the interpreter is called the meta language
The choice of each of these languages (object and meta) depends on the objective we set for ourselves.
Scheme as an Object Language
We introduce a subset of Scheme to illustrate how to specify a programming language and how to implement a full interpreter because Scheme is a small language (it has few primitives, few data types, an extremely simple and regular syntax) and a simple language (the evaluation rules are consistent and simple, there are not many special cases). Yet, Scheme is an expressive language - it is Turing complete, and because it follows the functional programming paradigm, it allows the definition of powerful functional abstractions which make programming productive.
All of these elements stand in contrast with JavaScript - which has evolved into a large language with many primitives, a very complex syntax, and because it is a multi-paradigm language - supporting FP, OOP, procedural programming and more - JavaScript has a complex evaluation semantics.
We thus select to describe a small and simple but full language - a subset of Scheme - as the object language of this course.
TypeScript as a Meta Language
We select to use a subset of TypeScript used as a Functional Language to implement the interpreter.
One of the advantages of this decision is that we exploit the TypeScript type system to encode the objects manipulated in the interpreter - abstract syntax trees and values. If the algorithm of the operational semantics is well understood, it can be implemented as a pure functional model in a code that is surprisingly short and elegant - and ported to any functional language.
Let us engage thus in developing a programming language bottom up, starting from small blocks and building up into a fuller version of a functional programming language.
Elements of a Programming Language
How do we specify a programming language? What are the elements that together define a programming language?
The key elements are:
- Primitives: expressions whose evaluation is built-in in the interpreter and which are not explained by the semantics of the language. These include primitive operations (for example, arithmetic operations on number or comparison operators) and primitive literal values (for examples, numbers or boolean values).
- Combination means: ways to create compound expressions and compound values from simpler ones.
- Abstraction means: ways to manipulate compound objects (expressions or data values) as standalone units by giving them simple names.
The language definition is structured in two aspects:
- Expressions are the words and sentences of the language.
- Values are the results of the computation of expressions according to the evaluation rules of the language. Values belong to the semantic domain of the language.
To describe our object programming language, we first present all the types of expressions in the language (this is called the syntax of the language) on the one hand, and all the possible values that can be computed by the language on the other. The syntax is the input to the interpreter, the values are the output of the interpreter.
We will introduce the syntax and semantics of the Scheme subset in several iterations - starting from the simplest forms of expressions, then describing the rules to evaluate their value, then introducing more complex forms of expressions and their evaluation rules. We will give names to each of the stages, starting from a very small language \(L1\), and growing the language to larger stages, \(L2\) and \(L3\).
Expressions
We distinguish
- Atomic expressions
- Compound expressions
Atomic Expressions
The following are the types of atomic expressions in Scheme:
- Literal numbers: they are written as numbers such as
1
- Literal booleans: the true and false values are written
#t
and#f
. They are primitive literal atomic expressions. - Primitive procedures: are atomic expressions - they include arithmetic primitive operators
+
,-
,*
,/
and comparison operators<
,>
,=
.
Compound Expressions
The only syntactic form used to combine expressions into complex expressions in Scheme is to arrange them into parentheses.
(+ 45 78) ;--> 123
(- 56 9) ;--> 47
(* 6 50) ;--> 300
(/ 25 5) ;--> 5
These expressions are called forms.
In this structure, we always refer to the leftmost sub-expression of the form as an operator and the rest of the sub-expressions as operands.
Scheme compound expressions are always written in prefix notation.
Forms can be nested recursively:
(+ (* 3 15) (- 9 2)) ;--> 52
It helps to pretty print (or indent) such complex expressions to figure out their internal structure:
(+ (* 3
(+ 4 2)
(* 2 5))
7)
Expressions are evaluated by the interpreter and return a value. In Scheme, there are only expressions in the language. This is in contrast to JavaScript which contains expressions and statements (syntactic elements which, when evaluated, do not return a value, but simply execute a command).
Variables and Values
The programming language provides means to name objects. This is a fundamental form of abstraction: using a simple name instead of using a complex value or a complex expression.
In Scheme, define
is used to bind a name to the value of an expression.
(define size 6)
(* 2 size) ;--> 12 ;; size is now understood
size
in this context is called a variable.
The relation between a variable and the value it denotes is called a binding.
Variables can be used as atomic expressions.
They are evaluated to the value to which they were bound using define
.
define
is a form of abstraction because it allows the programmer to use names (variables) instead of complex operations.
(define area (* size size)) ;--> 36
In the syntax of a define
operation, define
is a special operator - it indicates that a special operation must be performed by the interpreter to evaluate the (define <var> <expression>)
form.
The result of evaluating a define
form is that the interpreter remembers that the variable is now bound to a value.
The steps of this evaluation are:
Evaluation rule for forms of the type: (define <var> <exp>)
1. Let val = Evaluate(<exp>)
2. Add the binding < <var>, val > to the global environment
The global environment is a function which maps variable names to values.
It is best to think of it as an object of type variable => value
.
Expression Types and Evaluation Rules - Round 1
We have now presented different expression types with the corresponding evaluation rules for each type of expression.
Let us call this language \(L1\) and summarize the rules to evaluate all of the expression types in \(L1\).
We also summarize the set of all possible values computed by \(L1\).
Expression Types
All the expression types presented so far are:
- Atomic expressions:
- number literal expression (0, 1, 2, …)
- boolean literal expression (#t and #f)
- primitive operation expression (+, -, *, /, <, >, =)
- variable expression (a well formed name - made up of letters and punctuations)
- Compound expressions:
- Special compound expressions:
(define <var> <exp>)
- where<var>
is a variable expression and<exp>
is any expression except a define expression. - Non-special compound expressions: \((exp_0 \ldots exp_n)\) - where each \(exp_i\) is any expression type except a define expression.
- Special compound expressions:
This inductive definition corresponds to the set \(Expression\) of all possible expressions in the language. (The definition is inductive because we use the term expression to define what are compound expressions.)
Evaluation Rules for Expressions
We define the function \(evaluate: Expression \rightarrow Value\) in an inductive manner:
1. Evaluation of atomic expressions:
- Variables are evaluated by looking up their value in the global environment.
- Primitive atomic expressions evaluate to their pre-defined denoted value.
- Number atomic literal expressions evaluate to number values.
- Boolean atomic literal expressions evaluate to boolean values true and false.
- Primitive procedures evaluate to the primitive function that performs the denoted operation.
2. Evaluation of compound special forms:
- For each special form - a special evaluation rule exists.
- The special form
(define <var> <exp>)
is evaluated according to this rule:- Let val =
evaluate(<exp>)
- Add the binding
< <var>, val >
to the global environment. - The form returns a special
void
value.
- Let val =
- The special form
In this rule, the sub-expression <var>
is not evaluated.
We have only introduced one special form (define
) so far - we will introduce more later.
3. Evaluation of compound non-special forms:
All compound forms are of the form \((exp_0 \ldots exp_n)\).
- Let \((val_0 \ldots val_n) = (evaluate(exp_0) \ldots evaluate(exp_n))\)
- Apply the procedure \(val_0\) to the values \((val_1 \ldots val_n)\).
NOTE: We assume that \(val_0\) evaluates into a primitive procedure - otherwise the second step above will not succeed. Can you think of which expressions are evaluated into primitive procedure values?
NOTE: Observe that this definition of the evaluate
function is inductive - and that it follows the inductive definition of the set of expressions.
This is an instance of the general principle we discussed in Section 1.4
which indicates that the structure of the types we define determines the structure of the code that processes values of the type.
Computed Values
Looking at all the evaluation rules, we can summarize the set of all possible values that can be returned by an invocation of evaluate:
- Number values
- Boolean values
- Primitive operations (the value of primitive operators like
+
or<
). - void (the value of
define
expressions)
Example Programs in \(L1\)
Let us write a few examples of programs in \(L1\):
5 ;--> 5
(* 3 2) ;--> 6
(+ (* 3 2) 4) --> 10
(> 2 3) ;--> #f
(= 2 (+ 1 1)) ;--> #t
Expressions in general are evaluated one by one. The order in which expressions are evaluated does not change their value.
Only in the case of the define
form, there is a side-effect which makes the sequence of expressions significant:
(define radius 12)
(define pi 3.14)
(define area (* (* radius radius) pi))
(+ area (* 2 3))
Sequences in \(L1\)
In order to make sense of a program that includes define
forms, we must define a compound expression which is a sequence of expressions and its evaluation rule. We will describe the details of the evaluation rule for sequences including define-expressions in more details later.
What is Not in \(L1\)
Let us try to assess what programs can be written in \(L1\) as defined.
On the side of the restrictions - we have:
- Few primitives - and no way to define other functions besides primitive functions (no functional abstraction means).
- The computed values can only be numbers or booleans - there are no way to build compound values (no value composition means).
- We can define global variables and no scoping mechanism
- There are no control structures: no conditionals, no loops (sequence and embedding are the only control composition means)
On the positive side:
- We can build expressions as deeply nested as required
- We can give names to complex expressions so that they can be reused to avoid repetition
NOTE: consider the claim that programs in \(L1\) always terminate. Can you prove it? (Using structural induction).
\(L2\): User Defined Procedures and Conditional Expressions
Let us introduce two new types of expression into \(L1\) - leading to a new language we will call \(L2\). We choose to add together user defined procedures and conditional expressions - because these two language facilities work well together. The reason is that we will develop recursive functions - and when we write a recursive function, it helps to be able to test for the base case vs. recursive case.
Compound Procedure Definition with Lambda
lambda
is a special operator which can be used in a special form of type lambda
.
The syntax is:
(lambda (<var> ...) <exp> ...)
For example:
(lambda (x) (* x x))
This expression is a procedure expression. It has three sub-expressions:
- The special operator
lambda
- The list of parameters - all of which are variables.
- The body of the procedure - which is an expression.
When this expression is evaluated, it creates a value whose type is called a closure.
We will denote such values as <Closure (x) (* x x)>
.
NOTE: (lambda (x) (* x x))
is an expression. <Closure (x) (* x x)>
is a value.
This is a similar distinction as: the literal expression 2
is an expression. Its value is the number 2.
NOTE: The expression (lambda (x) (* x x))
in Scheme is equivalent to (x) => x * x
in TypeScript.
The difference is syntactic only:
- Scheme prefers prefix notation (which is the general syntactic preference of Scheme for all syntactic constructs).
- The only delimiters in Scheme are parentheses - whereas in TypeScript there is a selection of {} and punctuation (=>);
- In Scheme every syntactic construct is an expression - in contrast in JavaScript there can be expressions or statements.
When statements are used, the special form
return
must be used - when they are not, it can be skipped. In general, Scheme’s syntax is more consistent - but it can be less readable to the untrained eye.
Closures: Composite or Atomic Values?
A closure value contains multiple parts - the parameters and the body. But there are no accessors to take apart these components from the value.
This leads to an interesting distinction:
- From the programmer perspective, a closure is an atomic value.
- From the interpreter perspective, a closure is a compound data structure with accessible sub-components.
When a lambda
expression is evaluated, the body
is not evaluated.
Naming User Procedures
To give a name to a procedure, we use the existing define
mechanism:
(define square (lambda (x) (* x x)))
We will see later in the course that the capability to name procedures is a big deal - as it allows the definition of recursive functions - and, in particular, it changes the expressive power of the programming language.
Compound Procedure Application
When a lambda
expression is computed, we obtain a closure. Closures can then be applied to values.
This is the mechanism used when applying a closure to values:
Recall the evaluation rule for non-special forms in \(L1\):
Given a compound forms of the form \((exp_0 \ldots exp_n)\) where \(exp_0\) is not a special operator:
- Let \((val_0 \ldots val_n) = (evaluate(exp_0) \ldots evaluate(exp_n))\)
- Apply the procedure \(val_0\) to the values \((val_1 \ldots val_n)\).
In \(L1\) - the only possible procedure values were primitive procedures (the value of the atomic expressions +
, *
etc).
In \(L2\) - we also have user created closures.
The way a closure \(val_0\) = <Closure <p1, ..., pn> <exp1> ... <expk>>
is applied to values \((val_1, \ldots, val_n)\) is according to the following rule:
- Replace all occurrences of \(p_1, \ldots, p_n\) in the expressions
<exp1> ... <expk>
of the body of the procedure with the corresponding values \(val_1, \ldots, val_n\). - Evaluate all resulting expressions.
- The value returned by the application is the value of the last expression
<expk>
.
In summary:
- We introduced a new syntactic form (lambda expressions) and a new type of values (closures).
- We defined the evaluation rule for lambda expressions - which yield closures - without performing any recursive evaluation.
- Finally, we expanded the evaluation rule of non-special forms to work for the case where the operator value is a closure type - in addition to the known case in \(L1\) where the operator value was a primitive operator.
Conditional Expressions
We introduce a third special form to the syntax of the language, in addition to define
and lambda
together with its specific evaluation rule to enable conditional evaluation.
The syntax of a Scheme conditional expression is:
(if <exp> <exp> <exp>)
For example:
(if (> x 2) x (* x 2))
if
is a special operator - it has a special evaluation rule.
The three other sub-expressions are called the test-part, then-part, and else-part of the compound if-expression.
They can recursively be any type of expression.
NOTE: if-expressions are expressions - not statements as they are in TypeScript. They are evaluated into a value.
They are equivalent to the ternary ?:
operator we used in TypeScript.
Example: abs
(define abs (lambda (x) (if (> x 0) x (- x))))
(abs 2) ;--> 2
(abs -3) ;--> 3
if-expressions can be nested as needed to define complicated decisions:
(if (= x y)
0
(if (> x y)
1
-1))
Scheme also includes an additional special form called cond
which allows a more general form of conditional expression:
(cond (<p1> <e11> <e12> ... <e1k1>)
(<p2> <e21> <e22> ... <e2k2>>)
...
(else <en1> <en2> ... <enkn>>))
The sub-expressions of the cond
form are called clauses - each clause starts with a predicate-expression and is followed by consequence-expressions.
Evaluation Rule for If-expressions
To evaluate an if-expression (if <test-exp> <then-exp> <else-exp>)
:
- Let p =
evaluate <test-exp>
- If p is true, then evaluate
<then-exp>
and return this value as the value of the if-expression. - else evaluate
<else-exp>
and return its value as the value of the if-expression.
NOTE: When evaluating an if-expression, the <test-exp>
is always evaluated, but only one of <then-exp>
or <else-exp>
is evaluated. This is in contrast to what happens when we evaluate a non-special form - where all the sub-expressions are always evaluated.
Example Program in \(L2\)
The language we have defined so far is quite expressive. Let us define an example program demonsrating this: this program implements Newton’s method for computing square roots.
Newton’s method is stated as this algorithm:
- If y is non-zero guess for a square root of x, then \((y + x/y)/2\) is a better approximation of \(\sqrt x\).
To start this computation, we provide a non-zero guess like 1, and we need to decide when to stop guessing.
This algorithm is iterative:
- Is the current guess good enough? if yes return it.
- Else improve the guess and try again.
Interestingly - we can implement this iterative algorithm even though we have no construct in the language to iterate.
(define sqrt (lambda (x) (sqrt-iter 1 x)))
(define sqrt-iter
(lambda (guess x)
(if (good-enough? guess x)
guess
(sqrt-iter (improve guess x)
x))))
(define abs (lambda (x) (if (< x 0) (- x) x)))
(define square (lambda (x) (* x x)))
(define epsilon 0.0001)
(define good-enough?
(lambda (guess x)
(< (abs (- (square guess) x)) epsilon)))
(define average
(lambda (x y) (/ (+ x y) 2.0)))
(define improve
(lambda (guess x)
(average guess (/ x guess))))
This program illustrates many of the “good properties” we associated with the Procedural Programming paradigm in Chapter 1:
- Encourage the use of small units of codes, called procedures, which encapsulate well-defined commands.
- Procedures interact through well-defined interfaces published by each procedure (the contract of the procedure, including the signature of which arguments it expects, and which return value it returns)
We haven’t discussed local variables (used inside each procedure without affecting the state of the program outside the scope of the procedures). We will see later in the chapter that even in \(L2\), we have enough semantic power to define local variables, but we have not provided syntactic constructs to encourage the use of this feature.
We have created a hierarchy of procedures, higher-level procedures call lower-level procedures:
Expression Types and Evaluation Rules - Round 2
Let us summarize the syntax (expression types) of the language \(L2\) and summarize the rules to evaluate all of the expression types in \(L2\).
Expression Types
All the expression types presented so far are:
- Atomic expressions:
- number literal expression (0, 1, 2, …)
- boolean literal expression (#t and #f)
- primitive operation expression (+, -, *, /, <, >, =)
- variable expression (a well formed name - made up of letters and punctuations)
- Compound expressions:
- Special compound expressions:
(define <var> <exp>)
(lambda (<var> ...) <exp> ...)
(if <exp> <exp> <exp>)
(cond (<exp> ...) ...)
- Non-special compound expressions: \((exp_0 \ldots exp_n)\)
- Special compound expressions:
This inductive definition corresponds to the set \(Expression\) of all possible expressions in the language.
Evaluation Rules for Expressions
We define the function \(evaluate: Expression \rightarrow Value\) in an inductive manner:
1. Evaluation of atomic expressions:
- Special operators are not evaluated (
define
,lambda
,if
,cond
andelse
are the special operators). - Variables are evaluated by looking up their value in the global environment.
- Primitive atomic expressions evaluate to their pre-defined denoted value.
- Number atomic literal expressions evaluate to number values.
- Boolean atomic literal expressions evaluate to boolean values true and false.
- Primitive procedures evaluate to the primitive function that performs the denoted operation.
2. Evaluation of compound special forms:
- The special form
(define <var> <exp>)
is evaluated according to this rule:- Let val =
evaluate(<exp>)
- Add the binding
< <var>, val >
to the global environment. - The form does not have any value (which we state as: it has a
void
value).
- Let val =
- The special form
(lambda (<p1> ..) <exp1> ...)
is evaluated into a value<Closure (<p1>...) <exp1>...>
. The sub-expressions of the lambda-form are not evaluated. - The special form
(if <test-exp> <then-exp> <else-exp>)
is evaluated as follows:- Let p = evaluate(
<test-exp>
) - If p is true - the value of the if-expression is evaluate(
<then-exp>
) - else it is evaluate(
<else-exp>
).
- Let p = evaluate(
A similar rule applies for cond
. For all special forms - not all sub-expressions are evaluated.
3. Evaluation of compound non-special forms:
All compound forms are of the form \((exp_0 \ldots exp_n)\).
- Let \((val_0, \ldots, val_n) = (evaluate(exp_0), \ldots, evaluate(exp_n))\)
- If \(val_0\) is a primitive procedure: Apply the procedure \(val_0\) to the values \((val_1, \ldots, val_n)\).
- Else if \(val_0\) is a closure
<Closure <p1...pn> <exp1>..<expk>)
:- Replace \(p_1 ... p_n\) by \(val_1 ... val_n\) in
<exp1> .. <expk>
- Evaluate the resulting expressions and return the value of
<expk>
.
- Replace \(p_1 ... p_n\) by \(val_1 ... val_n\) in
MISSING PARTS: In this specification, we did not explain yet:
- how the substitution operation works in step 3.
- how to behave for error cases (wrong number of arguments, wrong type of value for \(val_0\)).
- how the global environment works (to extend the environment when
define
is invoked and to apply the environment when a variable is evaluated).
These parts will become explicit and will need to be specified when we implement the evaluation rules in the code of the interpreter.
Computed Values
The set of all possible values computed by \(L2\) is:
- Numbers
- Booleans
- Primitive operations
- Void
- Closures
What is Not in \(L2\)
Note the programming constructs which are absent from \(L2\):
- No loop data sructures.
- No mutation of variables.
- No compound data structures.
- No local variables.
Termination
Can you prove that some programs in \(L2\) may not terminate?
Extending the Language with Compound Values: \(L3\)
We observe that the computed values of \(L2\) are atomic values (numbers, booleans) or closures (which are a compoound value but which has no accessors).
To introduce compound data values in the language, we need:
- Constructors for compound values
- Literal expressions that denote compound values
In JavaScript, for example, compound values are constructed with the array and map constructors and denoted by the [] and {} notations for literal expressions.
In the minimalistic spirit which characterizes Scheme, we will introduce into \(L3\) a single compound value constructor and the capability to use it recursively.
The Pair Compound Data Type
A pair is a minimal compound data type that combines two values together into a single new unit. The language supports this by introducing:
- A value constructor: this is called
cons
in Scheme - Accessors to take apart a compound pair value: these are called
car
andcdr
(for the first and second element of the pair). - A type predicate to check whether a value belongs to the set of pair values:
pair?
. - An equality predicate for pairs
equal?
.
We thus extend the language with 5 primitive functions: cons
, car
, cdr
, pair?
, equal?
.
In addition, Scheme defines a standard form to print pair values and literal expressions that denote pair values:
A literal pair expression is denoted as '( <exp> . <exp> )
. For example:
(define p1 '(1 . 5)) ;; literal pair expression
(define p2 (cons 1 5)) ;; constructor invocation
(car p1) ;--> 1
(cdr p1) ;--> 5
Pairs can be combined recursively into complex compound values:
(define p3 (cons p1 p2))
(define p4 (cons p3 p2))
(car p3) ;--> '(1 . 5)
(cdr p3) ;--> '(1 . 5)
NOTE: The cons
Pair constructor can receive parameters of any types.
The type of cons
is thus described as (first T1, second T2)=>Pair(T1,T2)
.
(cons #t 1) ;--> '(#t . 1)
The List Compound Data Type
In addition to the Pair data type, \(L3\) introduces a recursive data type - called List. We first introduce it inductively:
- The empty list denoted ‘() is the base case.
- A non empty list is built by combining a value \(v_0\) together with a list \((v_1, \ldots, v_n)\) to obtain a non empty list \((v_0 v_1 \ldots v_n)\) - which combines a head (\(v_0\)) and a tail which is a list.
Non empty lists are constructed from a value and a list. The size that characterizes lists in this inductive definition is the length of the list: a list of size \(n+1\) is constructed from a value and a list of size \(n\).
The definition of this inductive data type is a disjoint union between the empty list and non-empty lists.
Scheme implements List values by re-using the Pair data type for non-empty lists and a special value for the empty list. The additions to the language are:
- A new primitive value ‘() and a corresponding predicate
empty?
- A special type of literal compound expressions for list values:
'(e1 ... en)
(This is read"quote e1 ... en"
) - The predicate primitive
list?
to check that an object belongs to theList
data type (either empty or non empty).
(define l14 (cons 1 (cons 2 (cons 3 (cons 4 '()))))) ;--> '(1 2 3 4)
(list? l12) ;--> #t
(cdr l14) ;--> '(2 3 4)
(car (cdr l14)) ;--> 2
(cons 10 l14) ;--> '(10 1 2 3 4)
On the basis of this inductive definition, we can define functions over lists:
(define length
(lambda (l)
(if (empty? l)
0
(+ 1 (length (cdr l))))))
As usual, recursive functions operating over an inductive type have a structure similar to the inductive definition of the inductive compound data type: because List is a disjoint union over Empty and Non-Empty lists, the structure of a function operating over lists will be:
(if (empty? l)
<process empty list case>
<process non-empty list case>
For example:
(define nth
(lambda (n l)
(if (empty? l)
'()
(if (= n 0)
(car l)
(nth (- n 1) (cdr l))))))
(nth 2 '(0 1 2 3 4)) ;--> 2
(nth 3 '(0 1)) ;--> '()
The list constructor can also receive parameters of any types. In particular, we can create list of pairs and recursive tree structures using the compound list data type.
Another list constructor is available in Scheme - which avoids the need for nested calls to cons
:
(list <e1> ... <en>) ;--> '(v1 ... vn)
where vi
is the value of <ei>
.
Syntactically, list
is complex because it can take variable number of arguments.
NOTE: Compare (list 1 2)
and (cons 1 2)
- how different are they?
Expression Types and Evaluation Rules - Round 3
Let us summarize the syntax (expression types) of the language \(L3\) and the rules to evaluate all of the expression types in \(L3\).
Expression Types
All the expression types presented so far are:
- Atomic expressions:
- number literal expression (
0, 1, 2
, …) - boolean literal expression (
#t
and#f
) - primitive operation expression (+, -, *, /, <, >, =, cons, car, cdr, pair?, list? )
- variable expression (a well formed name - made up letters and punctuations)
- the empty list literal expression ‘()
- number literal expression (
- Compound expressions:
- Literal compound expressions: denoted
'(<lit> . <lit>)
for pairs or'(<lit> ... <lit>)
for lists. where<lit>
is an embedded literal expression (either atomic or compound). - Special compound expressions:
(define <var> <exp>)
(lambda (<var> ...) <exp> ...)
(if <exp> <exp> <exp>)
(cond (<exp> ...) ...)
- Non-special compound expressions: \((exp_0 \ldots exp_n)\)
- Literal compound expressions: denoted
This inductive definition corresponds to the set \(Expression\) of all possible expressions in the language.
Evaluation Rules for Expressions
We define the function \(evaluate: Expression \rightarrow Value\) in an inductive manner:
1. Evaluation of atomic expressions:
- Special operators are not evaluated (
define
,lambda
,if
,cond
andelse
are the special operators). - Variables are evaluated by looking up their value in the global environment.
- Primitive atomic expressions evaluate to their pre-defined denoted value.
- Number atomic literal expressions evaluate to number values.
- Boolean atomic literal expressions evaluate to boolean values true and false.
- Primitive procedures evaluate to the primitive function that performs the denoted operation.
- The Empty list atomic literal expression evaluates to an empty list value.
2. Evaluation of compound special forms:
- The special form
(define <var> <exp>)
is evaluated according to this rule:- Let val =
evaluate(<exp>)
- Add the binding
< <var>, val >
to the global environment. - The form does not have any value (which we state as: it has a
void
value). - Compound literal expressions evaluate to compound values (pair or list).
- Let val =
- The special form
(lambda (<p1> ..) <exp1> ...)
is evaluated into a value<Closure (<p1>...) <exp1>...>
. The sub-expressions of the lambda-form are not evaluated. - The special form
(if <test-exp> <then-exp> <else-exp>)
is evaluated as follows:- Let p = evaluate(
<test-exp>
) - If p is true - the value of the if-expression is evaluate(
<then-exp>
) - else it is evaluate(
<else-exp>
).
- Let p = evaluate(
A similar rule applies for cond
. For all special forms - not all sub-expressions are evaluated.
3. Evaluation of compound non-special forms:
All compound forms are of the form \((exp_0, \ldots, exp_n)\).
- Let \((val_0, \ldots, val_n) = (evaluate(exp_0), \ldots, evaluate(exp_n))\)
- If \(val_0\) is a primitive procedure: Apply the procedure \(val_0\) to the values \((val_1, \ldots, val_n)\).
- Else if \(val_0\) is a closure
<Closure <p1...pn> <exp1>..<expk>)
:- Replace \(p_1 ... p_n\) by \(val_1 ... val_n\) in
<exp1> .. <expk>
- Evaluate the resulting expressions and return the value of
<expk>
.
- Replace \(p_1 ... p_n\) by \(val_1 ... val_n\) in
Computed Values
The set of all possible values computed by \(L3\) is:
- Numbers
- Booleans
- Primitive operations
- Void
- Closures
- Pairs
- Lists (empty or non-empty lists).
Comparison Functional JavaScript and \(L3\)
Let us compare the language we have obtained in \(L3\) and the Functional JavaScript subset we used in Chapter 1:
- \(L3\) does not have local variables - JavaScript can define them with
let
. - \(L3\) has less primitive operations, less primitive data types (no strings), less compound data types (no maps - arrays are quite similar to Scheme lists).
- There is no mutation in \(L3\) of variables and of compound data values (in contrast to JavaScript).