CSE 401 Compilers Class (Fall '99)

[Home] [Admin] [Details] [Help]

Assignments / #2


Due Friday, 10/15/99

If you have formed a group, #4 - #7 may be done by your group; turn in only one solution per group. #1 - #3 should be done and turned in individually, not as a group.

  1. ASU 3.7abch. ("In order," or "in lexicographic order" means for example that "p" preceeds "r", with or without arbitrary intervening characters. You may not use the not operator except for the simple case where not(a, b, ..., c) means all single letters except the letters a, b, ... or c.)

  2. ASU 3.16cd

  3. ASU 3.17 (for 3.16cd only) Be sure to use Algorithm 3.2.

  4. Extend the description of the PL/0 lexical structure to include the new language features described in the PL/0 Project Description. Hand in your revised description (hardcopy only). In particular, note that the extensions will require the scanner to handle

  5. Project, Part A. Extend the PL/0 scanner to scan the extended language. Use the -T option to stop compiling after scanning. You will also likely be interested in the -t option that prints tokens as they are read, and perhaps the -c (print characters) and -l (print lines) options.

    For all implementation projects, you will be graded on correctness of your implementation, on clarity and good design of your implementation, and on sufficiency of your test cases.

  6. Project, Part B. Also extend the lex version of the scanner to handle the extended PL/0 language, i.e., duplicate the function of the handcoded project scanner. The directory /cse/courses/cse401/CurrentQtr/pl0_lex contains the few files that differ between the pl0_base scanner and the lex version. (A new scanner.c, Makefile, the lex source file pl0.l, and a lex library file ncform. You should not change ncform.)

    Set up a new directory "hw2b", copy pl0_base into it, then copy pl0_lex into it to replace those few files. Again, you can test your scanner via plzero -T, but note that the -c and -l options do not work in the lex version. (This is due the the kludgey way I grafted the C code from lex to the C++ code of PL/0.)

    Remaining parts of the project this quarter will not be based on the lex-scanner, so be careful to keep these Part B files separate from the main project files (Part A).

    To really understand lex fully, you should read the (hardcopy) documentation on it in the Unix manuals. Fortunately, Section 3.5 and Question 3.10 in ASU provides a sufficient overview for this assignment. To make things even simpler, I'll describe here just enough that you should be able to manage this assignment.

    The lex program you'll need looks like

    <regular expression0>	{return(token0_code);}
    <regular expression1>	{return(token1_code);}
    The regular expressions are just like you've seen in class, using the extensions like "[a-z]" and "+".  lex uses many punctuation marks to have special meanings in the regular expresssions; if you get a syntax error, you've probably inadvertently used one. To create a literal string, put it in quotes. For example, "<" will match the left-angle-bracket, whereas simply typing the left-angle-bracket as the pattern will result in a syntax error.

    What lex does is to generate a C program, always called lex.yy.c, that implements a finite state machine capable of recognizing the regular expressions you've input. When called at its main entry point yylex(), the lexer reads input until an expression is matched, then executes the C code given after that expression, which in the sample above just returns an integer code to the caller indicating which token was found. The caller can see the actual lexeme matched by looking in the char array yytext. (The finite state machine always makes the longest match possible from all the regular expressions, and in the case of multiple matches of identical length uses the rule given first in the lex program.)

    The supplied Makefile should take care of the details of building lex.yy.c from pl0.l, linking it to the scanner, etc.

    Electronically turn in your entire working directory, including both the part A and part B code, and your test case files, clearly labeled. Please "clean up" before you turn in: delete backup versions of source files, all .o files, executable files, etc. ("make clean" might be hepful here.) Please arrange your directories so that your RCS files are NOT turned in; see instructions on Turnin in the Help section of the web. Also, turn in hard copy printouts of the source files you changed, with changes marked, as described in the instructions on Printing Code in the Help section.

  7. Project, Part C. Imagine that you are employed by NetBeans, the new compiler startup, darling of Wall Street. Before you take your stock options to your yacht, your boss has asked you to do Parts A and B as a brief comparative study of the two main approaches to writing scanners: bared-handed, vs lex. Write a 1 page memo to her/him summarizing (a) the evaluation criteria you think are important, (b) the pros/cons of these two methods relative to your criteria, and (c) your recommendations for future NetBeans scanners. Include a brief discussion of some lexical extension that's more complex than those considered in Parts A & B, like float constants, say.


cse401-webmaster@cs.washington.edu (Last modified: 10/08/99)