Trefoil v4 Language Reference

Start by reading the informal description in hw7 if you haven't already.

The definition below describes the syntax and semantics of Trefoil v4 programs. Italicized words are technical terms being defined.

Many of the concepts are the same as Trefoil v3. We do not repeat them here. Instead, we describe only the changes and new features.

Syntax

The syntax has four levels: characters, tokens, parenthesized symbol trees, and abstract syntax trees (ASTs). Only the AST level changes compared to Trefoil v3.

Abstract Syntax Tree level

A program consists of a sequence of bindings.

A binding is one of the following:

  • Any of the binding forms available in Trefoil v3.
  • Struct binding: a node with head struct with at least one argument. All the arguments must be symbols. The first argument is called the struct name. The rest of the arguments (possibly none) are called the field names. The field names must be distinct (but it is allowed to have a field with the same name as the struct name(!)).
    • Example with two fields: (struct my-record name age)
    • Example with zero fields (struct my-empty-record)

An expression is one of the following:

  • Any of the expression forms available in Trefoil v3.
  • Trefoil-symbol literal: (Since the word "symbol" overlaps between PSTs and Trefoil expressions here, we will say "PST-symbol" and "Trefoil-symbol" to distinguish them.) The syntax of a Trefoil-symbol literal is a PST-symbol that starts with an apostrophe.
    • Example: 'hello-world
  • Match expression: a node with head match and at least one argument.
    • The first argument is an expression.
    • Any remaining arguments are match clauses.
    • A match clause is a node with exactly two children.
      • The first child is a pattern.
        • If the pattern reuses variable patterns with the same name more than once, it is an abstract syntax error.
      • The second child is an expression.
    • Example: The body of this function is a match expression with two clauses.
      (define (sum l)
        (match l
          (nil 0)
          ((cons x xs) (+ x (sum xs)))))
      
  • Lambda expression: a node with head lambda and exactly two arguments.
    • The first argument is a node with any number of arguments, all of which must be symbols (the parameter names).
    • The second argument is an expression.
    • Example: (lambda (x y) (+ x (- y 1))) is an anonymous function of two arguments.
  • Print expression: a node with head print and exactly one argument, which is an expression.

  • We change the syntax of function call expressions to allow any expression to be called, not just a symbol.

    • Example: ((f 1) 2)
      • Calls the function named f with the argument 1 and expects that to return a closure that is then called with argument 2.
    • Note that "normal" function calls are still supported, since f is just a variable that evaluates to a closure.

There are also three "internal" expressions used by the interpreter to implement structs. Trefoil users cannot write these expressions directly, so they have no concrete syntax as PSTs. But they do have purely abstract syntax as ASTs. So we describe them in terms of what fields their AST nodes have.

  • Struct constructor expression: represents a value built with a struct constructor. The AST node has two fields.
    • A (OCaml) string (the name of the struct)
    • A (OCaml) list of Trefoil v3 expression ASTs (the arguments to the constructor)
    • Since the expression has no concrete syntax, but sometimes we need to refer to it in this document, we will write it as StructConstructor(s, es) where s stands for any string and es stand for any list of expression ASTs.
  • Struct predicate expression: represents a predicate that checks whether a value was built by a struct's constructor. The AST node has two fields.
    • A (OCaml) string (the name of the struct)
    • A Trefoil v3 expression AST (the expression that will evaluate to the value the programmer wants to examine)
    • Since the expression has no concrete syntax, but sometimes we need to refer to it in this document, we will write it as StructPredicate(s, e) where s stands for any string and e stands for any expression AST.
  • Struct access expression: represents a field access to a struct value. The AST node has three fields.
    • A (OCaml) string (the name of the struct)
    • A (OCaml) integer (the index of the field to access)
    • A Trefoil v3 expression AST (the expression that will evaluate to the struct value whose fields the programmer wishes to access)
    • Since the expression has no concrete syntax, but sometimes we need to refer to it in this document, we will write it as StructAccess(s, i, e) where s stands for any string, i stands for any (OCaml) integer, and e stands for any expression AST.

Finally, there is one more "internal" expression used to implement first-class functions.

  • Closure expression: represents a function as a value. The AST node has four fields
    • An optional (Ocaml) string (the name of the function, or None if it is anonymous)
    • An OCaml list of strings (the names of the parameters)
    • A Trefoil v4 expression AST (the body of the function)
    • A dynamic environment (the defining environment)

A pattern is one of the following:

  • Wildcard pattern: the PST-symbol _ (the underscore character)
  • Variable pattern: a PST-symbol that is not any of the keywords used as stand-alone symbols anywhere in this section, nor _, nor any symbol that starts with apostrophe.
  • Integer literal pattern: a symbol consisting of an optional minus sign followed (without space) by a nonempty sequence of digits
  • Boolean literal pattern: the symbol true or the symbol false
  • Nil literal pattern: The symbol nil
  • Trefoil-symbol pattern: a PST-symbol that starts with an apostrophe
  • Cons pattern: a node with head cons and exactly two arguments, each of which is a pattern.
  • Struct pattern: a node with a head that is not any of the keywords used as the head of any node anywhere in this section, nor _, nor any symbol that starts with an apostrophe. The node can have any number of arguments after the head, each of which is a pattern.

A value is an expression that satisfies one of the following additional constraints:

  • The constraints of any Trefoil v3 value.
  • It is a Trefoil-symbol expression.
  • It is a struct constructor expression all of whose subexpressions (children) are values.
  • It is a closure.

List of symbol keywords (cannot be used as variable names)

  • true, false, nil, _, and any symbol that starts with an apostrophe

List of node head keywords (cannot be used as function names)

  • test, define, +, -, *, =, if, let, cons, nil?, cons?, car, cdr, cond, match, struct, _, lambda, print, and any symbol that starts with an apostrophe

Semantics

The meaning of a Trefoil v4 program is the same as in Trefoil v3. We only need to describe the semantics of the new binding and expression forms, as well as the semantics of patterns. We will also adjust the definition of the dynamic environment and the semantics of function calls.

The dynamic environment

We change the definition of a dynamic environment back to the version from Trefoil v2 (!) to map strings to values.

Semantics of bindings

We extend the semantics for the new kind of binding.

  • Consider a struct binding (struct s fs) where s is the name of the struct and fs stands for a possibly empty sequence of symbols representing the field names. The semantics is to extend the current dynamic environment with the following mappings:
    • s maps to a function defined as if by the binding (define (s xs) StructConstructor("s", xs))
    • s? maps to a function defined as if by the binding (define (s? x) StructPredicate("s", x))
    • for each field f in fs, s-f maps to a function defined as if by the binding (define (s-f x) StructAccess(s, i, x)) where i is the index of the field f in the sequence fs.
    • (Side note: the functions described in the previous bullet points have bodies which cannot be written by the Trefoil programmer, because there is (intentionally) no concrete syntax for StructConstructor, StructPredicate and StructAccess. We have used a semi-concrete syntax to describe what AST the interpreter should construct for these function bodies.)

Semantics of expressions

We change the semantics of the = operator so that it works on all values.

  • Consider an expression (= e1 e2), where e1 and e2 stand for any expressions. The semantics is to evaluate e1 to a value in the current dynamic environment. Call that value v1. Then evaluate e2 to a value in the current dynamic environment. Call that value v2. The result is a boolean literal indicating whether v1 and v2 are structurally equal.

We change the semantics of function calls to support calling arbitrary expressions:

  • Consider a function call (ecall args) where ecall stands for any expression, and args stands for any sequence (possibly empty) of expressions to be passed as arguments to the function. Call the current dynamic environment callenv. The semantics is to first evaluate ecall in callenv. Call the resulting value vcall. If vcall is anything besides a closure, signal an error. Otherwise, let f_opt be the optional function name, params be the parameter names, body be the body, and defenv be the defining environment. (All of this comes out of the closure.) If args and params are sequences of different lengths, signal an error. Next, evaluate each element of args in left-to-right order in callenv. This results in a sequence of values. Call that sequence vals. Then evaluate body in a new dynamic environment constructed by extending defenv with mappings param -> val for each param and corresponding value val in the lists params and values, and, if f_opt is of the form Some(f), also with another mapping from f to vcall (the whole closure) (this implements recursion). Return the result of evaluating body.

We also extend the semantics for the new kinds of user-facing expressions.

  • Trefoil-symbol literals evaluate to themselves.
  • Consider a match expression (match e clauses) where e stands for any expression and clauses stands for any sequence (possibly empty) of match clauses. Each match clause is of the form (pi bi) where pi is a pattern and bi is an expression. (Note how this is different from cond clauses, where both are expressions!!) The semantics is to first evaluate e in the current dynamic environment to a value v. Then iterate over the clauses in order. For each clause (pi bi), ask pi whether it matches v. If pi answers "no", continue the loop. If pi answers "yes, and B", then stop the loop and evaluate bi in the current dynamic environment extended by all the bindings in B. Return the value the bi evaluates to. If the loop gets to the end of the sequence of clauses without finding any pi that matches v, then signal an error.
  • Consider a lambda expression (lambda (params) body) where params stands for any list of symbols and body stands for any expression. The semantics is to return a closure expression with None name, params as the parameter names, body as the body, and the current dynamic environment as the defining environment.
  • Consider a print expresion (print e) where e stands for any expression. The semantics is to evaluate e to a value v and then print v in an implementation-defined format. (In other words, the specification does not say exactly how to print v, it's up to you.)

Even though the three "internal" struct expressions cannot be written directly by the Trefoil programmer, they can be created by the new kind of function call as well as the struct accessor and struct predicate functions introduced by struct bindings, so we need to give semantics to these "internal" expressions as well:

  • Consider a struct constructor expression StructConstructor(s, es), where s stands for any string and es stand for any list of expression ASTs. Evaluate the expressions in es in the current dynamic environment to a sequence of values vs. Return StructConstructor(s, vs).
  • Consider a struct predicate expression StructPredicate(s, e) where s stands for any string and e stands for any expression AST. Evaluate e in the current dynamic environment to a value v. If v is of the form StructConstructor(s', vs) and s and s' are equal as strings, return true. In all other cases, return false.
  • Consider a struct access expression StructAccess(s, i, e) where s stands for any string, i stands for any (OCaml) integer, and e stands for any expression AST. Evaluate e in the current dynamic environment to a value v. If v is of the form StructConstructor(s', vs) and s and s' are equal as strings and i is less than the length of vs, then return the ith element of vs. In all other cases, signal an error.

Finally, a closure evaluates to itself. (Note that programmers cannot write closures directly.)

Structural equality

Structural equality is defined recursively as follows. Comparing two values for structural equality can return true/false or signal a runtime error.

Let v1 and v2 be two values. We say that v1 is structurally equal to v2 if:

  • v1 and v2 are both integer literals and they are equal as integers
  • v1 and v2 are both true
  • v1 and v2 are both false
  • v1 and v2 are both nil
  • v1 and v2 are both Trefoil-symbol values, and the symbols are equal as strings
  • v1 and v2 are both cons values, say v1 is (cons v11 v12) and v2 is (cons v21 v22), and v11 is structurally equal to v21 and v12 is structurally equal to v22.
    • (Side note 1: by definition of what it means for a cons expression to be a value, v11, v12, v21, and v22 are all guaranteed to be values.)
    • (Side note 2: the definition of structural equality is recursive!)
  • v1 and v2 are both struct constructor values, say v1 is StructConstructor(s1, vs1) and v2 is StructConstructor(s2, vs2), and s1 and s2 are equal as strings, and vs1 and vs2 have the same length as lists, and each element of vs1 is structurally equal to the corresponding element of vs2.
    • (Side note: by definition of what it means for a struct constructor expression to be a value, all the elements of vs1 and vs2 are values.)
  • If v1 or v2 are closures, signal a runtime error.
  • In all other cases, return false.

Semantics of patterns

The semantics of a pattern takes a value v as an argument and returns one of the following:

  • "no", indicating that the pattern does not match v
  • "yes and B" (where B is a dynamic environment), indicating that the pattern matches v and introduces bindings B

Given v, we describe the semantics of each kind of pattern.

  • A wildcard pattern always answers "yes and []" (the empty dynamic environment)
  • A variable pattern x always answers "yes and [(x -> v)]" (the dynamic environment that maps x to v and has no other mappings)
  • An integer literal pattern n, where n stands for any integer, examines v:
    • If v is an integer literal with same value as n, answer "yes and []"
    • Otherwise, answer "no"
  • A boolean literal pattern b, where b stands for any boolean, examines v:
    • If v is a boolean literal with the same value as b, answer "yes and []"
    • Otherwise, answer "no"
  • A nil literal pattern examines v:
    • If v is a nil literal, answer "yes and []"
    • Otherwise, answer "no"
  • A Trefoil-symbol pattern s, where s stands for any symbol-string, examines v:
    • If v is a Trefoil-symbol literal with the same value as s, answer "yes and []"
    • Otherwise, answer "no"
  • A cons pattern (cons p1 p2), where p1 and p2 stand for any patterns, examines v:
    • If v is a cons expression, it can be written as (cons v1 v2) where v1 and v2 are values. Ask p1 if it matches v1
      • If "no", then answer "no".
      • If "yes and B1", ask p2 if it matches v2
        • If "no", then answer "no".
        • If "yes and B2", answer "yes and B1 @ B2", where @ denotes appending the two dynamic environments together, as in OCaml
    • Otherwise, if v is not a cons expression, answer "no".
  • A struct pattern (s ps), where s stands for any struct name PST-symbol and ps stands for a (possibly empty) sequence of patterns, examines v
    • If v is a struct constructor expression, it can be written StructConstructor(s', vs), where s' is a struct name PST-symbol and vs is a sequence of values.
      • If s and s' are not equal as strings, answer "no".
      • If ps and vs are sequences of different lengths, answer "no".
      • Otherwise, iterate through ps and vs considering the i-th element of each sequence. Call these elements pi and vi respectively. At each iteration of the loop, ask pi if it matches vi.
      • If "no", answer "no"
      • If "yes and Bi", continue the loop.
      • If you reach the end of the iteration, then all of the pi match vi. Collect all the Bi from each iteration of the loop and concatenate them all together. Call this concatenated environment B. Answer "yes and B".
    • Otherwise, if v is not a struct constructor expression, answer "no".