HW7: Trefoil v4

Due: Thursday, December 7, 11:59pm Pacific Time

This homework is the last step in our Trefoil journey as a whole class. (You may choose to keep going in the final project or beyond!) You will implement an interpreter for version 4 of Trefoil, which extends version 3 to support structs, pattern matching, and first-class functions.

Refactoring your previous interpreter.

You should start by making a copy of your hw6 solution and then making the following changes:

  • Rename the file trefoil3.ml to trefoil4.ml.
    • Change the line at the top from
      open Trefoil3lib
      
      to
      open Trefoil4lib
      
    • Change the welcome message near the bottom from saying "v3" to "v4"
  • Rename the file trefoil3test.ml to trefoil4test.ml
    • Change the line at the top from
      open Trefoil3lib
      
      to
      open Trefoil4lib
      
  • Replace the src/dune file with the updated file from the HW7 starter code.
  • Copy-paste the contents of the newtests.ml file into your trefoil4test.ml file. These are additional provided tests for the new features.
  • Copy-paste the contents of ast-starter.ml into the top of your ast.ml file, just after the open line.
  • Copy-paste the contents of interpreter-starter.ml into your interpreter.ml file, above interpret_expression.
  • Move your definitions of dynamic_env and env_entry to ast.ml, as you will need them as part of implementing closures/first-class functions. You will need to use OCaml's and syntax to make the definition of the dynamic_env simultaneous with expr.

Informal description of the language

The Trefoil v4 language is an extension of Trefoil v3. We only describe the differences.

Bindings

There is one new kind of binding, the struct binding, which declares a new struct type. For example, the binding (struct my-record name age) declares a struct called my-record with fields name and age. We have discussed this binding extensively in lecture. In short, it generates a constructor, a predicate (question-mark) function, and field accessor functions for each field. See the Trefoil v4 language spec for a formal description.

Expressions

There are several new kinds of expressions:

  • Trefoil symbols, which are kind of like strings without spaces in them. They start with an apostrophe: '.
  • match expression for pattern matching as discussed in lecture.
  • lambda expressions for anonymous functions
  • print expressions which evaluate a trefoil expression and then print it

There are also three new "internal" forms of expression, which you will use to implement struct bindings. By "internal", we mean that there is no way for the Trefoil programmer to write these expressions directly in their code. Instead, when the Trefoil programmer writes a struct binding, your interpreter will autogenerate some functions that use these internal expressions in their bodies. Then the Trefoil programmer can call those autogenerated functions to indirectly use these new expressions. Here they are:

  • struct constructor expressions, which represent the results of calling the struct constructor
  • struct access expressions, which allow accessing fields of a struct constructor value
  • struct predicate expressions, which tell you whether a value was built with a particular struct's constructor

There is one additional internal expression for closures, which store a function definition and its defining environment as a value.

We also make one change to the Trefoil v3 semantics of expressions:

  • The = operator now works on all values except for closures, not just integers.

Patterns

Patterns are a new syntactic category in Trefoil v4 that represent "the left hand side of a match clause", similar to patterns in OCaml that you are used to using.

There are 8 kinds of patterns, which we have discussed informally in lecture, and which are discussed in detail in the spec.

Your task

Extend your previous homework solution to implement the changes described in the Trefoil v4 language spec. Write tests that cover all the normal and error case behavior for all new language features.

We highly recommend keeping your tests from previous homeworks around to help you find bugs. You do not need to resubmit tests that you submitted for HW6, but you can if you want to. Note that some tests may require adjusting.

Suggested workflow

You can do anything in whatever order you want, but here is one suggested path.

  • Look at the Trefoil v4 language spec to get an idea of the list of expressions and bindings you have to add.
  • Refactor your HW6 code by following the instructions above for copying your HW6 code and changing a few 3s to 4s if you haven't already.

Implement symbols

  • Declare an expression AST node for symbols:
    ...
    | Symbol of string
    
  • Implement the expr_of_pst case for symbols. You may find the following pseudocode useful, but note that some things may change depending on your implementation.
    if String.get sym 0 = '\''
    then Symbol (String.sub sym 1 (String.length sym - 1))
    else Variable sym
    
  • Implement symbols in interpret_expression in interpreter.ml.
  • Add tests for symbols.

Implement print expressions

  • Declare AST.
  • Parse it.
  • Interpret it following the semantics.
  • Add tests.

Implement closures and the new semantics of function calls

  • Declare an expression AST node for closures.
  • Change the expression AST node for function calls to accept an arbitrary expression to call.
  • Change expr_of_pst to parse the new function call syntax correctly.
  • Delete FunctionEntry and refactor all uses of it to use Closure instead, as discussed in lecture 23 (11/29/23).
  • Update the interpreter case for function calls to evaluate the callee expression to a closure and then call it.
  • Add a case to the interpreter to evaluate closures to themselves.
  • Add tests for closures.

Implement lambdas

  • Declare AST.
  • Parse it.
  • Interpret it to a closure.
  • Add tests for lambdas.

Implement the new extended semantics of =

  • Implement a new helper function for checking structural equality. Be sure to raise a runtime error if you encounter any closures.
  • Call that function from your interpreter for =.
  • Add tests for =. Be sure to test the behavior when comparing closures, and when comparing closures nested inside other data structures such as cons (both these cases signal an error).

Implement struct bindings and the internal struct expressions

  • Declare an AST binding node for struct bindings.
  • Parse struct bindings in binding_of_pst.
    • Hint: Broadly, parsing structs is similar parsing function definitions, except that there are fewer parentheses and no function body expression.
  • Implement the case for struct bindings in interpret_binding as follows:
    • Declare StructConstructor AST node in expr following the spec.
    • Note: Since users cannot write StructValue in their programs, there is no need to parse it.
    • Implement StructConstructor in interpret_expression following the spec.
    • In the interpret_binding case for struct bindings, generate a function (closure) that calls StructConstructor. Be sure to pass the right number of arguments.
      • Hint: The function's definition is described in the spec "as if by the binding ..." with a binding that is not valid user-facing syntax because it uses StructConsturctor. So, you cannot use the parser to generate this binding, and you must instead construct it manually using the AST constructors for function bindings, StructConstructor, and variables.
      • Hint: You will need to generate variable names for all the parameters to this generated function. We recommend using x1, x2, ..., xN, where N is the number of fields in the struct. Note that N will be different for different structs.
    • Declare an AST node for StructPredicate in the expr type. Again, no need to parse it.
    • Implement StructPredicate in interpret_expression following the spec.
    • Back in the struct binding case for interpret_binding, generate a function binding for the struct's predicate function following the spec.
    • Declare an AST node for StructAccess in the expr type. No need to parse it.
    • Implement the semantics of StructAccess in interpret_expression.
      • Be sure you signal an error if the value is not built from the same struct name as the access.
    • Back in the struct binding case for interpret_binding, generate function bindings to access each of the structs fields.
  • Ensure the provided tests whose name contain the string "struct" pass. Note that some of the provided tests use features like cond from previous homeworks.

Add match expressions

  • The starter code from ast-starter.ml implements an AST for patterns. We have provided AST nodes for wildcard patterns (_) and cons patterns. There is also a parsing function as usual called pattern_of_pst, which we have implemented for the two provided patterns (plus given you some hints about other cases).
  • All of this provided code about patterns is currently unused in the starter code, because the only place patterns appear in Trefoil v4 is in match expressions, and the starter code does not include match expressions. Let's fix that now.
  • Declare a new AST node for match expressions by adding a constructor in the expr type.
    • Hint: The AST node should be relatively similar to Cond, except that match takes a "scrutinee" expression as its first argument, in addition to a list of clauses. Also, a cond clause is two expressions, but a match expression is a pattern and an expression.
  • Add a case to expr_of_pst to parse match expressions.
    • Hint: It is relatively similar to parsing cond, with the differences noted above.
    • Hint: You'll need to call pattern_of_pst somewhere.
  • Test the normal case and malformed case of parsing.
  • The semantics of patterns is implemented in interpret_pattern in the starter code. We have implemented the semantics of wildcard patterns and cons patterns for you.
  • Implement match expressions in interpret_expression.
    • Hint: It is relatively similar to interpreting cond, with the differences noted above. After evaluating the scrutinee to a value, loop through the clauses and check if the clause pattern matches the scrutinee's value by calling interpret_pattern. If yes, be sure to evaluate the right hand side of the clause in an appropriately extended environment.
  • Ensure the provided tests "match expression with wildcards and cons 1" and "... 2" pass.

Add the remaining patterns

  • You can do them in any order, but here's one order.
  • Declare a pattern AST node for integer literal patterns by adding a constructor to the pattern type in ast.ml.
    • Hint: The AST node will have the same arguments as the one for integer literal expressions. That's because integer literals (also boolean literals, nil, etc.) are syntactically valid as both expressions and patterns. Don't get confused though -- they have different semantics depending on whether they are an expression or pattern!
  • Parse integer literal patterns by fixing the first TODO in pattern_of_pst.
    • Hint: The starter code calls int_of_string_opt for you. All you have to do is pass the result to your constructor.
  • Skim the semantics of patterns in the spec.
    • In the semantics, patterns are described as returning "no" or "yes and B". In the code, we implement this using an option type, specifically
      dynamic_env option
      
      where "no" corresponds to None in the code, and "yes and B" corresponds to Some B in the code, where B is implemented as a list of string * value, or in other words, a dynamic environment.
    • Review the semantics of integer literal patterns in detail.
  • Implement integer literal patterns in the interpreter by adding a case to interpret_pattern.
    • Hint: You should not need to change interpret_expression at all, only interpret_pattern.
  • Ensure the provided tests "match expression with int literal patterns" and "match expression with int literal patterns and cons" pass.
  • Boolean literal patterns
    • Declare AST node in pattern type in ast.ml.
    • Parse by adding cases in pattern_of_pst.
      • Hint: The starter code shows you where to add the case for "true". "false" is similar.
    • Add a case for bool literal patterns to interpret_pattern in interpreter.ml.
    • Ensure provided tests "match expression with bool literal patterns 1" and "... 2" pass.
  • Nil literal patterns
    • Declare AST node in pattern type in ast.ml.
    • Parse by adding a case in pattern_of_pst.
    • Implement by adding a case to interpret_pattern in interpreter.ml.
    • Write a simple test involving a nil pattern and ensure it passes.
  • Trefoil symbol patterns
    • Declare AST node in pattern type in ast.ml.
    • Parse by fixing the corresponding TODO in pattern_of_pst.
    • Implement by adding a case to interpret_pattern in interpreter.ml.
    • Ensure provided test "match expression with symbol literal patterns" passes.
  • Variable patterns
    • Declare AST node in pattern type in ast.ml.
    • Parse by fixing the corresponding TODO in pattern_of_pst.
    • Implement by adding a case to interpret_pattern in interpreter.ml.
    • Hint: This is the first pattern you implement where the bindings list is nonempty!
    • Ensure provided test "match expression with variable patterns" passes.
  • Struct patterns
    • Declare AST node in pattern type in ast.ml.
      • Hint: This is the most complicated kind of pattern in the language, because it has a list of sub-patterns. It also has a string (the name of the struct).
    • Parse by fixing the corresponding TODO in pattern_of_pst.
      • Hint: Parsing is challenging because of the list of sub-patterns.
      • Hint: It's sort of similar to parsing a function call expression. You may want to review that code.
    • Implement by adding a case to interpret_pattern in interpreter.ml.
      • Hint: Be sure to check the struct name strings are equal!
      • Hint: The hard part again is the fact that the pattern has a list of sub-patterns, and the struct value has a list of sub-values.
    • Ensure provided test "match struct binding" pass.
  • According to the spec, it is an error if any pattern in a match expression repeats a variable name. Fix this in your interpreter so that the test named "sum_with_match_error" passes.
    • Hint: In ast.ml, write a function vars_of_pattern that takes a pattern and returns a list of strings containing all the variable names that appear anywhere in the pattern.
    • Hint: In expr_of_pst in the case for match expressions, first parse the pattern, then check that vars_of_pattern applied to the parsed pattern returns a list with no duplicates.

Submission

Submit these files to Gradescope:

  • ast.ml
  • interpreter.ml
  • trefoil4test.ml