CSE 401 Final Answer Key

What is the purpose of generating machine-independent intermediate code before generating machine code for the target machine?

(a) Make the task of generating machine code easier

(b) Make the code generator and the optimizer portable across several processors

Answer: (b). The answer (a) is not valid because machine code can be just as easily be generated straight from the AST (as our compiler does). The answer (c) is not always valid, and even if it were, that would not constitute much purpose anyway.

How many temporary variables can the intermediate code generator create?

(a) As many as there are registers in the target processor

(b) A number that's arbitrarily chosen, but fixed

Answer: (c). As we discussed, new temporaries are generated liberally.

What is the format of intermediate code?

(a) Tree-structured: mimics the AST of the source program because it was generated from it.

(b) Linear list of instructions

Answer: (b). We never discussed tree-structured intermediate code.

What is the most accurate definition of an l-value?

(a) An expression referring to an entity that has an address in memory.

(b) A variable name

(d) A function call

Answer: (a). Don't get fooled by (b); an array element reference (such as x[5]) is an l-value alright, and it's at the same time an expression and not a simple variable name.

In an optimized interpreter, could the activation record for procedures be merged with the symbol table of that procedure?

(a) No

(b) Yes, if the language disallows recursive calls

(d) Yes, but only for procedures that don't define local variables

Answer: (b). This is a subtle one. A procedure that's not recursive will always have one activation record (like in FORTRAN). In that case, it could be merged with the symbol table.

When accessing variables defined in the enclosing scope in a compiled PL/0 program, what can you say about the runtime access speed?

(a) Constant

(b) Linear in the depth of scoping; the deeper the nesting, the longer the access time

(d) Linear in the length of the identifier name

Answer: (b). We discussed in class how accessing variables in syntactically enclosing scopes involves fetching and dereferencing the frame pointer as many times as levels we go up. This fetching takes time that increases linearly with the depth of scoping. The variable names business is a red herring because you don't care to store variable names after compilation.

When calling a function in a given language, the order of pushing the arguments onto the stack (left-to-right versus right-to-left) is decided by:

(a) The complexity of the expressions that form the actual arguments

(b) Whether the function has a fixed or variable number of arguments

(d) The language and the implementor

Answer: (d) It's indeed up to the language definition and the implementation on which convention is chosen. The answer (b) was a red herring; if you implement right-to-left passing you make it easier to implement variable arguments functions, but a language could choose to implement variable arguments in a myriad of ways, many of which don't prescribe right-to-left passing.

Alignment issues are caused by:

(a) Different widths of the internal and external processor bus

(b) The quirks of transferring data between cache and main memory

(d) The Virtual Memory Manager (VMM)

Answer: (c). As we discussed in class, portions of the memory can only be addressed by certain address lines, which means that, for example, the least significant bit of an integer can only sit in a specific subset of addresses – such as, those divisible by 4.

Under what circumstances could an implementation of the C++ programming language store a variable of type bool in a single bit?

(a) That variable is a compile-time constant, i.e., it is initialized with a constant and then never written

(b) The address of that variable is never taken

(d) That variable is defined with the register qualifier

Answer: (b). If someone takes the address of the variable, the information about the datum must fit in a standard pointer, so there's no place to store the actual bit position. The answer (d) looks plausible; you'd think you wouldn't be allowed to take the address of a register, but, as we talked durring the raffles, in C++ you can.

Consider the following C++ code:

void Fun()
{
    char* p = new char[sizeof(int) + 1];
    int* pInt1 = (int*)p;
    ++p;
    int* pInt2 = (int*)p;
    *pInt1 = 1;
    *pInt2 = 2;
}

(a) The code likely has an alignment issue, and it depends on the system whether a run-time error/crash will occur

(b) The code does not have an alignment issue

(d) The code has an alignment issue and a buffer overrun issue

(e) The code only has a buffer overrun issue

Answer: (a). We discussed similar code in class. The two integers are one byte off from each other, and on most systems this engenders an alignment problem. There is no buffer overrun. Whether the code crashes or works (potentially slowly) depends on how the operating system handles the alignment exception generated by the processor.

In C and C++, can a type require more alignment than its own size?

(a) Yes; the typical example is float numbers. Although they might be only 4 bytes in size, they sometimes need 8 bytes of alignment due to the way some floating-point units work.

(b) No, because that would mean you can't compute the position of data in an array by multiplying the index with sizeof(element), and this is how array offsets are supposed to be computed

(c) Yes, because for example floating point numbers can internally (in the processor registers) hold more bits of precision than exposed externally

(d) No, because the type couldn't be stored properly in structures and classes

Answer: (b). Fundamentally, this is how all pointer arithmetic works in C and C++. If a type does not need all bytes but has stringent alignment requirements, the compiler will pad that type.

Which matrix format storage is better, row-major or column-major?

(a) Depends on the style of iteration that's most commonly used

(b) Column major because most scientific computations scan arrays columnwise

(c) Given that both styles of iteration are common, row-major has an edge because it allows the type system to naturally implement multidimensional arrays as arrays of arrays of arrays...

(d) Given that both styles of iterations are common, column-major has an edge because it allows the type system to naturally implement multidimensional arrays as arrays of arrays of arrays...

Answer: (c). We discussed that both column-wise and row-wise iterations are commonly used in ordinary operations such as matrix multiplication. Furthermore, compounding types naturally leads to a row-major storage scheme.

What disadvantages do fixed-size arrays have?

(a) Make static checking of program harder

(b) Impose bounds checking at each access in the array

(c) Make writing reusable code harder, because the length of the array is part of the type; therefore functions written for arrays of size 1024 won't simply work with arrays of other sizes

Answer: (c). This answer comes mostly by eliminating (a), which is not the case (on the contrary, the compiler has richer type information so static checking is actually easier), and (b), which is bogus: bounds checking is needed for both kinds of arrays. It could be even said that bounds checking is more stringent for dynamically-allocated arrays; for statically-sized arrays, the compiler can eliminate some bound checks if it can prove the index won't run over the statically-known size.

What does “static allocation” mean for a variable?

(a) It has statically-known size and is allocated on the program stack

(b) It is allocated in the static data area of the executable program

(d) It is invisible outside the function where it is defined

Answer: (b). This is the definition we discussed in class. Choice (a) is bogus, and (c) and (d) are two effects of the overloading of the static keyword in C and C++. (Ok, that was a bit cunning.)

Consider the following C++ function:

void Fun(int a, int b)
{
    if (a == b)
    {
        double d1 = a / 4;
    }
    double d2 = b / 4;
}

What can you say about the addresses of d1 and d2?

(a) They could be allocated at the same address, because their lifetimes are not overlapping

(b) They cannot be allocated at the same address

(c) They can be allocated at the same address because double is a primitive type. If in the code above you replace double with a class type that has constructors and destructor, then d1 and d2 cannot possibly be allocated at the same address.

Answer: (a). These variables have non-overlapping lifetimes, so the compiler could allocate them at the same address. There's no possible confusion about the identity of the value at that address. (This is not what most compilers do; most use a simple greedy algorithm that will allocate one double inside Fun's scope and one more double inside the if statement. In my tests, of three compilers – Metrowerks CodeWarrior 7.0, Microsoft Visual C++ .NET Everett Final Beta, and gcc 2.95.3-6 – only gcc saves stack space by allocating d1 and d2 at the same address.)

The issue on whether the type has cdtors or not is irrelevant.

When accessing variables defined in the enclosing scope in a compiled C or C++ program, what can you say about the runtime access speed?

(a) Constant because in C and C++ there is only the global scope and the local scope

(b) Linear in the depth of scoping; the deeper the nesting, the longer the access time

(d) Linear in the length of the identifier name

Answer: (a) In C and C++ you can't access variables in all hierarchically enclosing scopes. It's either your scope or the global scope. Again, identifier names will not be part of the compiled program.

What inherent limitations imposes stack allocation on language power?

(a) Stack variables cannot have their address taken

(b) Stack variables cannot escape their context, and therefore supporting first-class functions is difficult

(d) Stack variables make variable lookup rules overly complicated because of nested scopes

Answer: (b). Pretty much all other answers are fluff.

In PL/0, the dynamic link is conceptually needed:

(a) Solely for knowing where to return when the function exits (the dynamic link is practically the return address)

(b) For the purpose mentioned at (a), plus to access variables defined in caller's scope

Answer: (a). As we discussed again during the final review, PL/0 is just a syntactically scoped language; a procedure has only access to variables defined in its own scope and in the lexically-enclosing scopes. That's what the static link is used for. The dynamic link's only semantics is that of a return address.

In class we discussed a non-standard C library function called alloca bearing the following prototype:

void* alloca(size_t size);

What does alloca do? Select the answer that best describes alloca.

(a) Allocates memory on the stack of the executing function. To free that memory, you need to call the pair function dealloca.

(b) Allocates memory on the stack of the caller of the executing function. The memory will last after the current function returns, but will be automatically reclaimed when the caller returns.

(c) Allocates memory on the stack of the executing function. When the executing function returns, memory will be automatically reclaimed.

(d) Allows the programmer to control local variable allocation

(e) Allows functions to return arrays by value

Answer: (c). Simply put, alloca subtracts size from your stack pointer. The return sequence will take care of restoring the stack pointer without any intervention from the programmer.

In our PL/0 implementation, the static link is conceptually needed:

(a) For accessing variables defined in caller's scope

(b) For for knowing where to return when the function exits

Answer: (a), as discussed in answer 18.

What can you say about the relationship between the stack pointer and the frame pointer in a PL/0 implementation in which the stack grows downward?

(a) FP is always greater than SP

(b) FP is always smaller than SP

Answer: (a). FP always points to the current frame, which is within valid, allocated stack space.

Consider the following program written in pseudo-C:

int a[] = { 1, 2, 3, 4, 5 };
int i = 0;

int Fun(int x)
{
    x += a[0];
    i += 2;
    x += a[0];
    return x;
}

int main()
{
    cout << Fun(a[i]);
}

We said pseudo-C because we can tweak the calling convention used by this language. Under which calling convention will the program print the largest number?

(a) Call-by-value

(b) Call-by-reference

Answer: (c) Call-by-value returns 3, call-by-reference returns 4, call-by-name returns 5.

What is a closure as discussed in class?

(a) The set of variables accessible at a given point in a program

(b) The set of variables local to a given function

(d) The act of closing a deal with a used car salesman

Answer: (c).

What can be said about the stack space needed to execute a function?

(a) It is statically known, unless the function uses tricks such as alloca

(b) It can't be known statically because the exact stack space needed depends on the control flow, recursion etc.

Answer: (b). Only the stack frame size is known statically.

What is the quintessential advantage of a compiler over an interpreter?

(a) Better type checking

(b) Faster execution

(d) Adherence to binary standards

Answer: (b). An interpreter can do (a) and (d), and some interpreters beat compiled code when it comes about small size.

In our PL/0 implementation, where are constants' values held?

(a) In the symbol table

(b) In the activation record

(d) In the AST after parsing

Answer: (a).

In a compiled C++ program, is the symbol table loaded at runtime?

(a) Yes, because C++ allows definition of local classes which can define functions

(b) Only in programs that use non-local control flow, such as exceptions.

Answer: (c).

What is the structure of the symbol table of a program?

(a) Singly-linked list

(b) Doubly-linked list

(d) Tree with a different topology than the AST of the program

(e) Array

Answer: (d).

Consider the types int and double as the archetypal integer and floating point types. If you were to establish a relationship between these types, what would you say?

(a) int could inherit double

(b) double could inherit int

Answer: (c). None of these types is substitutable for the other.

Consider two types: Point (having as members two coordinates) and ColoredPoint (having as members two coordinates and a color). If you were to establish a relationship between these types, what would you say?

(a) Point could inherit ColoredPoint

(b) ColoredPoint could inherit Point

(d) Either of (a), (b)

Answer: (b).

In our PL/0 implementation, what is the relationship between the hierarchy rooted in Type and that rooted in TypeAST?

(a) TypeAST is the type as parsed, and Type is the type used for typechecking purposes

(b) Type is the type as parsed, and TypeAST is the type as stored in the AST.

(d) TypeAST inherits Type

Answer: (a).

How does PL/0 solve the dangling else problem?

(a) The if-else statement is defined in a manner that avoids the dangling else altogether

(b) The IfStmt class has a simple hack to take care of that

Answer: (a). By the language definition, after the then clause, you either see end (case in which you're done) or else (case in which there's an else branch). There's no ambiguity.

How is an assignment typechecked in the new (enhanced) PL/0?

(a) The left-hand-side and right-hand-side expressions must be of the same type

(b) The left-hand-side and right-hand-side expressions must be of the same type, and the left-hand-side expression must refer to an lvalue.

(c) The left-hand-side and right-hand-side expressions must be of the same type, and the left-hand-side expression must be a variable name.

(d) The right-hand-side type must be convertible to the left-hand side type, and the left-hand-side expression must refer to an lvalue.

Answer: (b). Arrays make answer (c) insufficient.

What can you say about downcasts and upcasts in C++?

(a) Downcasts are always safe, upcasts are sometimes unsafe

(b) Upcasts are always safe, downcasts are sometimes unsafe

(d) Both are safe

Answer: (b). Upcasts are derived-to-base conversions and are always safe. Downcasts (base-to-derived) are ok only if the pointer or reference actually refers to what the cast claims it refers.

A grammar is ambiguous when:

(a) A symbol name appears duplicated

(b) Two different textual inputs parse to the same AST

(d) There are valid inputs that cannot be parsed into a valid AST

Answer: (c).

In a recursive descent parser:

(a) Each terminal is a function

(b) Each nonterminal is a function

(d) Each nonterminal is a class

Answer: (b). In our compiler (d) is also true, but that's not the case in general.

In an LL(k) grammar, k stands for:

(a) The maximum number of nonterminals that can build a valid production

(b) The maximum number of nonterminals that a parser must look ahead to decide which production to use

Answer: (c), by definition.

For a string µ of terminals and non-terminals, FIRST(µ) is:

(a) The set of terminals that begin strings derived from µ, together with ε, if µ can derive ε.

(b) The first k elements in µ, assuming the grammar is LL(k)

Answer: (a), easy to guess given that the others are utter bogus :o}.

For an LL(1) grammar, the PREDICT function:

(a) Takes a terminal, returns a nonterminal

(b) Takes a nonterminal, returns a terminal

(d) Takes a nonterminal and a terminal, returns a set of productions

(e) Takes a terminal, returns a production

Answer: (c). PREDICT takes a nonterminal (the sentential structure we're looking for, initially the start symbol) and a terminal (the incoming symbol), and returns the production to choose (only one because we're dealing with an LL(1) grammar).

Which part of the build process does yacc automate?

(a) Scanning

(b) Parsing

(d) Code generation

Answer: (b).

In a compiler with a classic structure, would the code generator need to invoke the scanner?

(a) No

(b) Yes, but only when macro preprocessing is involved

Answer: (a). The code generator has no business invoking the scanner.

Consider the following routine for scanning integers in our compiler:

Token* Scanner::GetInt()
{
    int integer = 0;
    while (isdigit(CurrentCh)) 
    {
        integer += CurrentCh - '0';
        integer *= 10;
        GetCh();
    }
    return new IntegerToken(integer);
}

This routine has a bug, can you spot it?

(a) The subtraction of '0' from CurrentCh is not needed

(b) The integer must be multiplied by CurrentCh – '0' and must be added 10

Answer: (c). All others are visibly fluff, easy to detect with a couple of simple tests.

How would be class inheritance most accurately described?

(a) “Is-a”

(b) “Is substitutable for”

(d) “Extends”

Answer: (b). A derived class must be transparently substitutable for its base class for all code that manipulates base class objects. Anything else is more vague or just wrong.

When scanning a program, the scanner finding an integer will:

(a) Emit the same token for all integers

(b) Emit a distinct token for each integer

Answer: (a). There is one INTEGER token.

An “undefined identifier” error might be caught during:

(a) Scanning

(b) Parsing

(d) Space allocation

Answer: (c). Scanning and parsing don't have enough power; Space allocation occurs long after the kin of each identifier is known.

In C and C++, typedef introduces:

(a) A brand new type with the same behavior as the original type

(b) A type that's structurally equivalent with the original type

Answer: (c). This was a midterm question, too.

When calling a function, the responsibility for clearing up the arguments off the stack after the callee has returned falls on:

(a) The caller

(b) The callee

Answer: (c). Traditionally, Pascal lets the callee do it, C has the caller do it.

Can an implementation pass some arguments in registers (as opposed to the stack)?

(a) No, because the registers are destroyed during the issue of the call

(b) No, because that would mess up the stack frame of the function

(d) No, because the compiler lacks information in the case of separate compilation

Answer: (c). The compiler really has everything at its fingertips (assuming it has fingers).

In a program, in general variables should:

(a) Be as local as possible

(b) Be as global as possible

Answer: (a).

What is the major problem with passing arguments by reference?

(a) Simple/standard functions are hard to implement

(b) Aliasing

Answer: (b).

Hopefully you are still in a good mood as you reached this point... Happy Holidays!