CSE413 Notes for Friday, 2/23/24

I continued our discussion of recursive descent parsing. I reviewed the grammar we ended up with in the previous lecture:

        <options> ::= <sequence> {"&" <sequence>}
        <sequence> ::= <item> {"+" <item>}
        <item> ::= <number> | <symbol>

Recall that we wrote a procedure for each rule of the grammar. The parse-item procedure processed just a simple number or symbol. The parse-sequence procedure called parse-item and processed the addition operator. The parse-options procedure called parse-sequence to process the ampersand operators.

This grammar naturally gives higher precedence to the plus operator because parse-sequence finishes grouping the plus operators before parse-options deals with any ampersand operators. It is important to recognize that the grammar tells you what code to write.

        ; Parses one item (a number or a symbol). Returns a list whose car is
        ; the item converted to a string and whose cdr is the unconsumed tokens.
        (define (parse-item lst)
          (if (not (pair? lst))
              (error "item error")
              (let ([first (car lst)])
                (cond ((number? first) (cons (number->string first) (cdr lst)))
                      ((symbol? first) (cons (symbol->string first) (cdr lst)))
                      (else (error "item error"))))))

        ; Parses items separated by +. The recursive call on parse-sequence
        ; handles everything after the first +, so grouping is right-to-left.
        (define (parse-sequence lst)
          (let ([result (parse-item lst)])
            (cond ((and (> (length result) 1) (eq? '+ (cadr result)))
                   (let ([result2 (parse-sequence (cddr result))])
                     (cons (string-append "(" (car result) "," (car result2) ")")
                           (cdr result2))))
                  (else result))))

        ; Parses sequences separated by &, using the same right-to-left pattern.
        (define (parse-options lst)
          (let ([result (parse-sequence lst)])
            (cond ((and (> (length result) 1) (eq? '& (cadr result)))
                   (let ([result2 (parse-options (cddr result))])
                     (cons (string-append "(" (car result) "--" (car result2) ")")
                           (cdr result2))))
                  (else result))))

As we saw last time, these versions group operators from right-to-left. For example:

        > (parse-sequence '(2 + 3 + 5 + x & 4 & 9 + y & z))
        '("(2,(3,(5,x)))" & 4 & 9 + y & z)

Remember that these parsing procedures are supposed to be greedy in the sense that they consume as many tokens as they can that are considered part of the grammar. In this case, parse-sequence was able to process (2 + 3 + 5 + x) but then stopped because it encountered an ampersand. But it is grouping the values from right to left, putting together (5,x) first rather than (2,3).
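To see the precedence and the right-to-left grouping together, here is a small self-contained sketch. It repeats the three procedures from above unchanged so that it runs on its own. The two sample inputs at the bottom are made up for illustration.

```racket
#lang racket

; Copied from the notes above so this block runs by itself.
(define (parse-item lst)
  (if (not (pair? lst))
      (error "item error")
      (let ([first (car lst)])
        (cond ((number? first) (cons (number->string first) (cdr lst)))
              ((symbol? first) (cons (symbol->string first) (cdr lst)))
              (else (error "item error"))))))

(define (parse-sequence lst)
  (let ([result (parse-item lst)])
    (cond ((and (> (length result) 1) (eq? '+ (cadr result)))
           (let ([result2 (parse-sequence (cddr result))])
             (cons (string-append "(" (car result) "," (car result2) ")")
                   (cdr result2))))
          (else result))))

(define (parse-options lst)
  (let ([result (parse-sequence lst)])
    (cond ((and (> (length result) 1) (eq? '& (cadr result)))
           (let ([result2 (parse-options (cddr result))])
             (cons (string-append "(" (car result) "--" (car result2) ")")
                   (cdr result2))))
          (else result))))

; Plus binds tighter than ampersand: b + c is grouped before a & ...
(parse-options '(a & b + c))          ; => '("(a--(b,c))")

; Repeated operators nest to the right at both levels.
(parse-options '(a + b + c & d & e))  ; => '("((a,(b,c))--(d--e))")
```

Because parse-options consumes the whole list here, each result is a one-element list with no tokens left over.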

I asked how we could change the procedure to group the other way. I said that it can be helpful to imagine how we would do this iteratively. So suppose that you are processing (2 + 3 + 5 + x). What would you do? We can imagine a kind of while loop that keeps processing values as long as it keeps seeing plus operators:

        start with the first value
        while (you see a plus operator next) {
            include the next value in the expression
        }

If we think about each individual step, we get:

        start with 2, which we turn into "2"
        we see a plus, so incorporate 3: "(2,3)"
        we see another plus, so incorporate 5: "((2,3),5)"
        we see another plus, so incorporate x: "(((2,3),5),x)"
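The steps above can be sketched directly as a loop. This is only an illustration of the iterative idea, not code from the lecture: the names group-left and token->string are hypothetical, and the sketch assumes a well-formed token list in which every plus is followed by a number or symbol. It uses Racket's do loop so that the two changing values (the string built so far and the tokens left) are visible as loop variables.

```racket
#lang racket

; Hypothetical helper: converts a single number or symbol to a string.
(define (token->string tok)
  (if (number? tok) (number->string tok) (symbol->string tok)))

; Hypothetical iterative version of the trace above. Assumes tokens is
; non-empty and that every + is followed by a number or symbol.
(define (group-left tokens)
  (do ([acc (token->string (car tokens))            ; start with the first value
            (string-append "(" acc "," (token->string (cadr rest)) ")")]
       [rest (cdr tokens)                           ; tokens not yet processed
             (cddr rest)])                          ; skip the + and the value
      ; stop as soon as the next token is not a plus operator
      ((not (and (pair? rest) (eq? '+ (car rest))))
       (cons acc rest))))

(group-left '(2 + 3 + 5 + x))   ; => '("(((2,3),5),x)")
(group-left '(2 + 3 & 4))       ; => '("(2,3)" & 4)
```

The loop carries two pieces of state that change on every pass, which is exactly the information a recursive version has to pass along as parameters.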

The key is to think about how this kind of computational process can be translated into a recursive definition. I have made the point several times that if an iterative solution involves keeping track of some local variable that changes as you execute a loop, then the recursive version would require an extra parameter to store that information.

In fact what we want is what we have called an accumulator in other examples. We want a helper procedure that builds up the desired string on each call. We want to first call parse-item, but then we can use the resulting list to call our helper function. Remember that when we call parse-item, it processes the first value:

        > (parse-item '(2 + 3 + 5 + x & 4 & 9 + y & z))
        '("2" + 3 + 5 + x & 4 & 9 + y & z)

We can use the car of this list as the initial value for our accumulator and the cdr of this list as the tokens left to process:

        (define (parse-sequence lst)
          (define (helper acc rest)
            ...)
          (let ([result (parse-item lst)])
            (helper (car result) (cdr result))))

So what do we do in the helper procedure? We want to see if rest begins with a plus. If so, we have to process it:

        (define (parse-sequence lst)
          (define (helper acc rest)
            (if (and (pair? rest) (eq? '+ (car rest)))
                ...))
          (let ([result (parse-item lst)])
            (helper (car result) (cdr result))))

We need the test pair? because it's possible that rest will be empty. If we see a plus, then what? The other version made a recursive call on parse-sequence. That's going to process all of the plus operators that follow before it returns, which is going to give us right-to-left grouping. So instead we can call parse-item again:

        (define (parse-sequence lst)
          (define (helper acc rest)
            (if (and (pair? rest) (eq? '+ (car rest)))
                (let ([result2 (parse-item (cdr rest))])
                ...)))
          (let ([result (parse-item lst)])
            (helper (car result) (cdr result))))

What then? We can add this next item to our accumulator and call the helper procedure again to process what comes after:

        (define (parse-sequence lst)
          (define (helper acc rest)
            (if (and (pair? rest) (eq? '+ (car rest)))
                (let ([result2 (parse-item (cdr rest))])
                  (helper (string-append "(" acc "," (car result2) ")")
                          (cdr result2)))
                ...))
          (let ([result (parse-item lst)])
            (helper (car result) (cdr result))))

The only thing left to include is the base case when we run out of plus operators. In that case, we want to return the list we get by combining the accumulator with the rest of the list:

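        (define (parse-sequence lst)
          ; acc is the string built so far; rest is the unconsumed tokens
          (define (helper acc rest)
            (if (and (pair? rest) (eq? '+ (car rest)))
                ; parse just one more item and fold it into the accumulator
                (let ([result2 (parse-item (cdr rest))])
                  (helper (string-append "(" acc "," (car result2) ")")
                          (cdr result2)))
                ; no more plus operators: return the string and leftover tokens
                (cons acc rest)))
          (let ([result (parse-item lst)])
            (helper (car result) (cdr result))))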

That produced the left-to-right grouping we were looking for:

        > (parse-sequence '(2 + 3 + 5 + x & 4 & 9 + y & z))
        '("(((2,3),5),x)" & 4 & 9 + y & z)
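The same accumulator pattern carries over to the ampersand level. The sketch below is an assumption rather than something shown in lecture: it supposes we also want ampersands to group left to right, and the name parse-options-left is hypothetical. It repeats parse-item and the left-grouping parse-sequence so that it runs on its own.

```racket
#lang racket

; Copied from the notes so this block runs by itself.
(define (parse-item lst)
  (if (not (pair? lst))
      (error "item error")
      (let ([first (car lst)])
        (cond ((number? first) (cons (number->string first) (cdr lst)))
              ((symbol? first) (cons (symbol->string first) (cdr lst)))
              (else (error "item error"))))))

; Left-grouping parse-sequence from the notes.
(define (parse-sequence lst)
  (define (helper acc rest)
    (if (and (pair? rest) (eq? '+ (car rest)))
        (let ([result2 (parse-item (cdr rest))])
          (helper (string-append "(" acc "," (car result2) ")")
                  (cdr result2)))
        (cons acc rest)))
  (let ([result (parse-item lst)])
    (helper (car result) (cdr result))))

; Hypothetical left-grouping variant of parse-options. After each &, it
; parses exactly one more sequence instead of recursing on itself.
(define (parse-options-left lst)
  (define (helper acc rest)
    (if (and (pair? rest) (eq? '& (car rest)))
        (let ([result2 (parse-sequence (cdr rest))])
          (helper (string-append "(" acc "--" (car result2) ")")
                  (cdr result2)))
        (cons acc rest)))
  (let ([result (parse-sequence lst)])
    (helper (car result) (cdr result))))

(parse-options-left '(a + b & c & d))   ; => '("(((a,b)--c)--d)")
```

Note that the helper calls parse-sequence, not parse-item, after each ampersand. That keeps plus at higher precedence while the ampersands themselves nest to the left.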

As I mentioned in the previous lecture, this example should serve as a medium hint for the homework. In some ways the homework is easier to write because it doesn't involve this conversion from tokens to strings. But you will want to pay attention to the two different approaches here that get left-to-right versus right-to-left grouping. When a procedure like parse-sequence calls parse-item followed by parse-sequence, it is going to produce right-to-left grouping. If it instead calls parse-item followed by another call on parse-item, it can produce left-to-right grouping. In the homework one operator groups right-to-left and the others group left-to-right.

I spent the remainder of the lecture describing the grammar we will be using for the homework. It comes from Python and the assignment involves implementing a mini version of the Python interpreter known as Idle. That information is all in the assignment writeup, so I won't repeat it here.

Stuart Reges

Last modified: Sat Feb 24 11:29:27 PST 2024