Source Language | Target Language |
C++ | x86 assembly language |
Java | Java virtual machine |
Java virtual machine | SPARC or x86 machine code (JIT) |
Ada | Java virtual machine |
COBOL | IBM mainframe machine language |
TeX/LaTeX | dvi |
dvi | Postscript |
Postscript | |
Database query (SQL) | sequence of database engine commands |
VHDL | hardware model checker input |
There are many choices for how to structure a compiler. The GNU compiler collection (GCC) and many commercial systems have a suite of front ends (C/C++, Fortran, Pascal, Java, ...), a common middle end (optimization and transformations that are independent of the source language and machine), and multiple back ends (code generators for different targets - x86, MIPS, SPARC, PowerPC, Java VM, ...)
Interfaces between phases: Variety. Some may be function calls, others actual data structures giving explicit intermediate representation (streams, abstract syntax trees, etc.).
What we have:
/* a very short code fragment */ if (x > y) y = 42 * x; else // don't do this if x > y x = f(x+17);
What we want:
load x ; jump to L1 if x <= y cmp y ble L1 load 42 ; y = 42 * x; mul x store y jmp l2 L1: load x ; x = f(x+17) add 17 push call f store x L2:
How do we get from here to there? The first task is to analyze the input and reconstruct the program structure.
Lexical analysis: Break the program into a token stream, hiding the details of reading the input and source program layout, and discarding whitespace. This is often called scanning. Result:
IF LPAREN ID(x) GT ID(y) RPAREN ID(y) GETS INT(42) TIMES ID(x) SCOLON ELSE ID(x) GETS ID(f) LPAREN ID(x) PLUS INT(17) RPAREN SCOLON EOF
Parsing: Read the token stream and analyze the underlying grammatical structure of the input. Most production compilers build explicit parse trees. For the project in CSE 413, we won't build the parse tree explicitly, but we will still analyze the phrase structure of the program.
Usual interface with between parser and scanner: parser calls method in the scanner when it needs to read the next token.
When we've gotten this far, we can generate code based on the information discovered during the analysis.
There are many books about programming languages and compilers. Here are a couple you might find useful. Copies are on reserve in the Engineering library.
Compilers: Principles, Techniques, and Tools, Alfred Aho, Ravi Sethi, and Jeffrey Ullman; Addison-Wesley, 1986 (corrected printing 1988), new edition 2007. This has been the standard text for compiler courses for many years. Chapters 1 & 2 cover basically all that we will see in CSE 413.
Programming Language Pragmatics, Michael L. Scott; Morgan Kaufmann, 2nd ed, 2006. This would be an ideal textbook for CSE 413, but it covers far more, and in more depth, than we can possibly manage in 10 weeks. Good source for supplemental reading though.