CSE 351: The Hardware/Software Interface, Winter 2011
Homework 2 - z86 Programming and Implementation
Out: Friday January 13
Part A Due: Wednesday January 19, 11:59 PM
Part B Due: Wednesday January 26, 11:59 PM
Turnin: Online

Assignment Goals

  • Get a feel for what hardware instruction sets look like
  • Get a feel for what the hardware does
  • Get a feel for what the code that is actually running on the hardware is like
  • Get a feel for what an assembler does

Overview

This is a two part assignment. For Part A you will write a small program in machine code. There is no compiler or assembler, so you'll be writing in hex. For Part B you will implement a processor simulator in C. The simulator will be capable of running the program you wrote in Part A.

Both parts of the assignment are based on the z86 architecture, which is a version of the y86 architecture, which itself is a simplification of the x86 architecture. The y86 architecture is described (primarily) in Section 4.1 of the text, and you'll find that section helpful. The x86 architecture is described (primarily) in Section 3 of the book, and you shouldn't need to look at that for this assignment. Differences between the z86 and y86 architectures are described below.

z86 Architecture Overview

An instruction set architecture (ISA) defines the components of the hardware interface that are visible to a programmer. This includes things like the number of registers, the number of bits in each register, the instructions that are available, and how the instructions are encoded as bit strings.

Programmer Visible State

The "programmer visible state" of the z86 is nearly identical to that of the y86, shown in Figure 4.1 of the text (page 337):

  • There are eight 32-bit registers, which are programmer-controlled storage on the CPU chip.
  • There is a program counter (PC), which holds the memory address of the next instruction to be executed. The PC is incremented automatically by the hardware to the next sequential instruction each time an instruction is fetched from memory. Programmers can cause non-sequential execution using jump (sometimes called branch) instructions. Branches can be conditional.
    Instruction execution always begins at PC=0 on the z86.
  • There is a condition code register (CC) that contains a bit mask. The bits in the mask are set as a side effect of some instructions. Conditional branches either jump or not depending on the current value of the CC and the type of conditional branch. (See Table 3.12 of the text, page 190.) The z86 condition code register differs from the y86 one: there is no overflow bit (OF) in the z86.
  • There is a program status word. In the z86 this is almost not visible to the programmer, as it indicates simply whether or not the processor has been halted. (So, while instructions are executing, the status is "running." The only other status is "halted," at which point no further instructions are executed.)
  • There is main memory, which holds both program instructions and data. The z86 is little endian.
    While the maximum amount of memory that can be configured on a system is an aspect of the machine architecture, the amount that is actually configured is not. The z86 implementation we'll use is configured with 1024 bytes (1KB) of memory.
  • The z86 does not implement exceptions (Section 4.1.4 of the text).
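
As a rough sketch, the state listed above might be held in a C simulator along these lines. All names here (z86_state, NUM_REGS, DEMO_MEMSIZE, z86_reset, z86_set_cc) are hypothetical and are not part of the provided files. The sketch assumes the CC consists of the two y86 flags that remain once OF is dropped (ZF and SF), and it zeroes the registers at reset purely as a convenience.

```c
#include <stdint.h>
#include <string.h>

#define NUM_REGS     8      /* eight 32-bit registers */
#define DEMO_MEMSIZE 1024   /* 1 KB, as configured for this assignment */

/* Hypothetical container for the programmer-visible state. */
typedef struct {
    uint32_t reg[NUM_REGS];       /* general-purpose registers */
    uint32_t pc;                  /* address of the next instruction */
    int      zf;                  /* zero flag: last ALU result was 0 */
    int      sf;                  /* sign flag: last ALU result had bit 31 set */
    int      halted;              /* status: 0 = running, 1 = halted */
    uint8_t  mem[DEMO_MEMSIZE];   /* stand-in for main memory */
} z86_state;

/* Execution begins at PC = 0 with status "running".
 * Zeroing registers, flags, and memory is this sketch's assumption. */
void z86_reset(z86_state *s) {
    memset(s, 0, sizeof *s);
}

/* Update the condition codes from the result of addl/subl/andl/xorl. */
void z86_set_cc(z86_state *s, uint32_t result) {
    s->zf = (result == 0);
    s->sf = (int)((result >> 31) & 1u);
}
```

In the actual assignment the mem array is already declared in sim.c, so only the remaining pieces would need declarations of your own.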

Instruction Set

The z86 instruction set is primarily a subset of the y86, the sole exception being that the z86 includes a "print" instruction.

  • The z86 architecture includes the halt, nop, rrmovl, irmovl, rmmovl, mrmovl, addl, subl, andl, xorl, jmp, je, and jne instructions from the y86 ISA. The encoding and operation of these instructions are identical to those of the y86. (See Section 4.1 of the text.)
  • The z86 includes a print instruction, which prints the contents of a register in this format:
    [r5] 0x0000000c
    
    The instruction is encoded as C0Fr, where r is the number of the register to be printed. For example, the output just shown might result from executing C0F5.
  • Only the addl, subl, andl, and xorl instructions set the CC. They always set the CC. No other instruction modifies the CC.
  • The z86 halts when it fetches a bit string that is not a legal encoding of any instruction in its ISA.
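
One way to picture the encoding is as a mapping from the first byte of an instruction to its length in bytes, which is also what determines the next sequential PC. The sketch below is illustrative only: the function name z86_insn_length is hypothetical, and the opcode bytes that do not appear in the sample program later on this page (nop, rmmovl, andl, xorl, jne) are assumed from the y86 encodings in Section 4.1 of the text, so check them against the book.

```c
#include <stdint.h>

/* Returns the length in bytes of the instruction whose first byte is
 * `opcode`, or 0 if the byte is not a z86 opcode. */
int z86_insn_length(uint8_t opcode) {
    switch (opcode) {
    case 0x00:              /* halt */
    case 0x10:              /* nop */
        return 1;
    case 0x20:              /* rrmovl rA, rB */
    case 0x60:              /* addl rA, rB */
    case 0x61:              /* subl rA, rB */
    case 0x62:              /* andl rA, rB */
    case 0x63:              /* xorl rA, rB */
    case 0xC0:              /* print r (encoded C0Fr) */
        return 2;
    case 0x70:              /* jmp dest */
    case 0x73:              /* je dest */
    case 0x74:              /* jne dest */
        return 5;
    case 0x30:              /* irmovl V, rB */
    case 0x40:              /* rmmovl rA, D(rB) */
    case 0x50:              /* mrmovl D(rB), rA */
        return 6;
    default:
        return 0;           /* not a legal first byte */
    }
}
```

Only the first byte is checked here; a full validity test would also have to examine the register fields.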

Preliminaries

Fetch and unarchive hw2.tar.gz. Inside you will find a fully implemented z86 simulator (referenceSim), sample codefiles in the format the simulator accepts (e.g., printMem.pgm), and a set of .c and .h files and makefile used in Part B.

Part A Instructions

Implement in z86 machine code a program that iterates over an array of integers and computes and prints the difference between successive elements. Here is a rough equivalent in pseudo-C:
int array[];
int i;
for ( i=1; i < array[0]; i++ ) {
  printf("0x%08x", array[i+1] - array[i]);
}
Note that element 0 of the array gives the number of elements in the array (not counting the 0th one).
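
For reference, here is a compilable C version of the same loop, offered as a sketch rather than a specification of the required output. The function names and the separate diffs buffer are illustrative choices, and a newline has been added to the format string so each difference lands on its own line.

```c
#include <stdio.h>

/* Stores array[i+1] - array[i] for each pair of successive elements
 * into diffs and returns how many differences were produced.
 * array[0] holds the number of elements that follow it. */
int compute_differences(const int *array, int *diffs) {
    int n = 0;
    for (int i = 1; i < array[0]; i++) {
        diffs[n++] = array[i + 1] - array[i];
    }
    return n;
}

/* Prints each difference as eight hex digits, one per line. */
void print_differences(const int *array) {
    for (int i = 1; i < array[0]; i++) {
        printf("0x%08x\n", (unsigned)(array[i + 1] - array[i]));
    }
}
```

With a made-up array {3, 5, 9, 4} (three elements follow the count), the two differences are 4 and -5, which print as 0x00000004 and 0xfffffffb.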

Details of the required z86 implementation

  • You know at code time that the address of the array is stored in the last word of memory (i.e., the four bytes starting at decimal address 1020, which is 0x3FC). (It's your responsibility to put the address there, as part of your program.) The array itself can be located in any otherwise free memory.
  • The 0th element of the array is the number of array elements that follow.
  • You compute the difference between successive elements in some register, and then use the z86 print instruction to print them.
  • You execute a halt instruction when done.
  • You don't need to do any validity checking for the input - if the data is wrong somehow, the program gets undefined results (meaning anything it does is okay).
Here is a picture of the memory layout:

Debugging

You can use the reference simulator to test your program:

$ ./referenceSim <hw2.pgm
The code file, hw2.pgm, contains your machine code program.

Here is a little example (included in the file distribution) to show the syntax accepted by the simulator:

# printMem.pgm

# Prints the contents of memory between the addresses
#  registers 0 and 1 are initialized to.  The upper bound
#  address must be larger than the lower bound address by a multiple
#  of 4 bytes.
#  (Goes into an infinite loop if you make a grievous initialization error.)

# Author: jz
# Date: 1/13/2011

# Register usage:
#  0:  lower bound address
#  1:  upper bound address
#  2:  $4
#  3:  scratch register

30F000000000        # 0x00:     irmovl $0, r0      lower bound address = 0
30F120000000        # 0x06:     irmovl $0x20, r1   upper bound address = 32
30F204000000        # 0x0C:     irmovl $4, r2
2013                # 0x12: loop: rrmovl r1, r3    starting checking if we're done
6103                # 0x14:     subl r0, r3        next address == stop address?
732A000000          # 0x16:     je done            if yes, go to done
503000000000        # 0x1b:     mrmovl 0(r0),r3    fetch next word
C0F3                # 0x21:     print r3           print it
6020                # 0x23:     addl r2, r0        move pointer forward one word
7012000000          # 0x25:     jmp loop           do it again
00                  # 0x2A: done: halt

@C0        # Load what follows starting at address 0xC0
01020304   # This is not needed by this program.  It's here just to illustrate
           # the syntax the loader accepts.
A '#' character begins a comment, which ends at the end of the line.

Values to be stored into memory are given as strings of bytes, written in (case insensitive) hex. Successive bytes in the input file, reading left to right, are written to successive bytes of memory. In this sample file, byte 0 of memory gets value 0x30, byte 1 gets 0xF0, byte 2 gets 0x00, etc.

The special symbol '@' requests that the memory location about to be written be reset to the location that immediately follows, written in hex. For instance, the '@C0' results in memory byte 0xC0 having value 0x01, byte 0xC1 having value 0x02, etc.
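
Because the z86 is little endian, a 32-bit value has to be typed into the code file least significant byte first. The helper below is a small sketch of that conversion (the name word_to_pgm_hex is made up for illustration). For example, the immediate 0x20 in the second irmovl of the sample appears in the file as 20000000, and the je target 0x2A appears as 2A000000.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes the 8 hex digits that store `value` in little endian order
 * (least significant byte first) into `out`, followed by a NUL.
 * `out` must have room for at least 9 characters. */
void word_to_pgm_hex(uint32_t value, char out[9]) {
    snprintf(out, 9, "%02X%02X%02X%02X",
             (unsigned)(value & 0xFF),
             (unsigned)((value >> 8) & 0xFF),
             (unsigned)((value >> 16) & 0xFF),
             (unsigned)((value >> 24) & 0xFF));
}
```

The same byte order applies to data words, such as the array address you store in the last word of memory for Part A.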

To invoke:

$ ./referenceSim <printMem.pgm 
[r3] 0x0000f030
[r3] 0xf1300000
[r3] 0x00000020
[r3] 0x0004f230
[r3] 0x13200000
[r3] 0x2a730361
[r3] 0x50000000
[r3] 0x00000030
Halt instruction at PC = 0x0000002a
Halted.  63 instructions executed.

Note that most of my file is comments. Because there's a chance your first attempt to write the program won't work, you're likely to need to edit it. Trying to read the most spartan possible file (one containing just the hex encoding of the instructions) gets old very fast. The comments help me debug (although I have a tendency to edit the comment but forget to edit the actual instruction!).

One of the more tedious parts of writing the machine instructions is figuring out the addresses they're loaded at, which is needed for the various jump/branch instructions. I do that by cheating: I initially give target addresses of 00000000, then run the simulator and have it tell me what the addresses actually are (using a feature described next), then go back and fill in the actual branch/jmp target addresses.

A little reflection on the tasks you're having to do should make it totally clear what the purpose of an assembler is!

Tracing Execution

As a debugging aid, the reference simulator will print some minimal trace information as it fetches each instruction: the PC and the opcode byte. You request that using the -t switch:

$ ./referenceSim -t <printMem.pgm 
0: PC = 0x00000000  opcode = 0x30
1: PC = 0x00000006  opcode = 0x30
2: PC = 0x0000000c  opcode = 0x30
3: PC = 0x00000012  opcode = 0x20
4: PC = 0x00000014  opcode = 0x61
5: PC = 0x00000016  opcode = 0x73
6: PC = 0x0000001b  opcode = 0x50
7: PC = 0x00000021  opcode = 0xc0
[r3] 0x0000f030
8: PC = 0x00000023  opcode = 0x60
9: PC = 0x00000025  opcode = 0x70
10: PC = 0x00000012  opcode = 0x20
...
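
If you want your own simulator to produce matching trace lines, the format above can be reproduced with a single format string. This is a sketch only: the function name and signature are hypothetical, and whether the skeletal sim.c already prints these lines for you is something to check in the provided code.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats one trace line (without a trailing newline) into buf,
 * e.g. "7: PC = 0x00000021  opcode = 0xc0".
 * Returns the value returned by snprintf. */
int format_trace_line(char *buf, size_t size, unsigned count,
                      uint32_t pc, uint8_t opcode) {
    return snprintf(buf, size, "%u: PC = 0x%08x  opcode = 0x%02x",
                    count, (unsigned)pc, (unsigned)opcode);
}
```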

Turn-in

Hand in just your hw2.pgm file.

Part B Instructions

Implement the functionality of the reference simulator. Your work should be wholly contained in file sim.c.

You are provided with a skeletal sim.c, and a file that implements reads and writes to the simulated memory (memory.c), as well as the .h files C requires to glue them together. The skeletal sim.c contains just things that are hard for a beginning C programmer to write, and easy to get wrong. In particular, the provided code (a) deals with the -t switch, and (b) invokes the loader to read the input and load it into memory. Because the loader needs the simulated memory object, there is a declaration for it in sim.c: the mem array. You will need to declare other C variables for the other components of the z86 architecture (e.g., the registers).

memory.c implements the loader, as well as routines that let your C program read and write either a byte or a word from/to the simulated memory.

  • You should never read or write the mem array directly in your code. Always use the routines contained in memory.c.
  • Words are always read/written in little endian order, including 32-bit values contained within the encoding of an instruction.
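
To make the byte order concrete, here is a self-contained sketch of little endian word access built on top of byte access. It uses its own stand-in array and made-up names (demo_mem, demo_read_byte, and so on) purely for illustration; memory.c already provides word routines, and in your sim.c you should call those rather than anything like this.

```c
#include <stdint.h>

#define DEMO_MEMSIZE 1024
static uint8_t demo_mem[DEMO_MEMSIZE];   /* stand-in, not the real mem array */

/* Byte accessors; addresses wrap modulo the memory size. */
uint8_t demo_read_byte(uint32_t addr) {
    return demo_mem[addr % DEMO_MEMSIZE];
}

void demo_write_byte(uint32_t addr, uint8_t value) {
    demo_mem[addr % DEMO_MEMSIZE] = value;
}

/* Little endian: the byte at the lowest address is the least
 * significant byte of the word. */
uint32_t demo_read_word(uint32_t addr) {
    uint32_t word = 0;
    for (int i = 3; i >= 0; i--) {
        word = (word << 8) | demo_read_byte(addr + (uint32_t)i);
    }
    return word;
}

void demo_write_word(uint32_t addr, uint32_t value) {
    for (int i = 0; i < 4; i++) {
        demo_write_byte(addr + (uint32_t)i, (uint8_t)(value >> (8 * i)));
    }
}
```

Storing the bytes 30 F0 00 00 at addresses 0 through 3 and reading the word at address 0 yields 0x0000f030, which is the first line of output in the printMem.pgm run shown earlier.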

The output of print

For grading reasons, we are semi-picky about the format of the output of the print command. It should look like the output produced by the reference simulator, except that anywhere there is a space we're willing to accept any positive number of spaces or tabs.

To print in hex in C, you do something like this:

printf("0x%08x", myval);
The string argument is a format string. C prints it literally, except that the '%' character is special. In this example, the '08' means print 8 digits, and print leading 0's if necessary. The 'x' means print in hex. Other specifiers include %d, for decimal, and %f for float. More than one specifier can be given in the format string. Specifiers are matched, in order, with arguments that follow the format string. For example:
printf("An int: %d  A float: %f\n", myInt, myFloat);
might print "An int: 2  A float: 1.300000" followed by a newline (by default, %f prints six digits after the decimal point).

The Linux shell command "man 3 printf" will give you endless details.
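
Putting the pieces together, the output line of the print instruction shown earlier ("[r3] 0x0000f030") can be produced by one format string. The helper below is a sketch with a hypothetical name; it writes into a buffer so the result is easy to inspect, but calling printf with the same format string works equally well.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats the output of "print r" into buf, including the newline,
 * e.g. "[r3] 0x0000f030\n". Returns the value returned by snprintf. */
int format_print_output(char *buf, size_t size, unsigned reg, uint32_t value) {
    return snprintf(buf, size, "[r%u] 0x%08x\n", reg, (unsigned)value);
}
```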

Notes

  • I've set things up so that you can use the string byte as a type, even though it is not built into C. It is an alias for unsigned char.
  • The string MEMSIZE is an integer representing the size of the memory configured on the machine (1024). You can, and should, use it in place of the literal value if you find you need the memory size in your code.
  • The reference simulator tries to detect and complain about invalid instructions -- any bit string pointed at by the PC that isn't a completely valid instruction. To simplify your implementation, you can ignore invalid instructions by simply declaring that the behavior of the processor is "undefined" for invalid instructions. You should get correct results for all correct instructions, however.
  • It is not an error to reference a memory location that doesn't exist. All memory addresses are taken modulo the memory size. The string MEMADDRMASK is a hex constant, 0x3FF, that can be used as an and-mask for memory addresses (to compute the modulo).
  • Memory addresses are unsigned.
  • The reference implementation contains 84 more semi-colons than the skeletal code.
  • Don't modify anything except sim.c.
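
The address wrapping described in the notes above works because the memory size is a power of two, so and-ing with (size - 1) gives the same result as taking the address modulo the size. A minimal sketch, using locally defined stand-ins (DEMO_MEMSIZE, DEMO_MEMADDRMASK, wrap_address) in place of the real MEMSIZE and MEMADDRMASK from the provided headers:

```c
#include <stdint.h>

#define DEMO_MEMSIZE     1024u
#define DEMO_MEMADDRMASK 0x3FFu   /* DEMO_MEMSIZE - 1 */

/* Maps any unsigned 32-bit address onto a valid memory index. */
uint32_t wrap_address(uint32_t addr) {
    return addr & DEMO_MEMADDRMASK;
}
```

For example, address 1029 wraps to 5, and 0xFFFFFFFF wraps to 1023. Treating addresses as unsigned, as the notes require, is what keeps this well defined for large values.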

Turn-in

Turn in just your sim.c file.


Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to zahorjan at cs.washington.edu]