Homework 2 - z86 Programming and Implementation
Out: Friday January 13
Part A Due: Wednesday January 19, 11:59 PM
Part B Due: Wednesday January 26, 11:59 PM
Turnin: Online
Assignment Goals
- Get a feel for what hardware instruction sets look like
- Get a feel for what the hardware does
- Get a feel for what the code that is actually running on the hardware is like
- Get a feel for what an assembler does
Overview
This is a two part assignment. For Part A you will write a small
program in machine code. There is no compiler or assembler, so you'll
be writing in hex.
For Part B you will implement a processor simulator in C. The
simulator will be capable of running the program you wrote in Part A.
Both parts of the assignment are based on the z86 architecture, which is a version of the y86
architecture, which itself is a simplification of the x86 architecture.
The y86 architecture is described (primarily) in Section 4.1 of the text,
and you'll find that section helpful. The x86 architecture is described
(primarily) in Section 3 of the book, and you shouldn't need to look at that
for this assignment.
Differences between the z86 and y86 architectures are described here.
z86 Architecture Overview
An instruction set architecture (ISA) defines the components of the hardware interface
that are visible to a programmer. This includes things like the number of registers,
the number of bits in each register, the instructions that are available, and how
the instructions are encoded as bit strings.
Programmer Visible State
The "programmer visible state" of the z86 is nearly identical to that
of the y86, shown in Figure 4.1 of the text (page 337):
- There are eight 32-bit registers, which are programmer-controlled storage
on the CPU chip.
- There is a program counter (PC), which holds the memory address of the next
instruction to be executed. The PC is incremented automatically by the
hardware to the next sequential instruction each time an instruction
is fetched from memory. Programmers can cause non-sequential execution
using jump (sometimes called branch) instructions.
Branches can be conditional.
Instruction execution always begins at PC=0 on the z86.
- There is a condition code register (CC) that contains a bit mask.
The bits in the mask are set as a side effect of some instructions.
Conditional branches either jump or not depending on the current value of the CC
and the type of conditional branch.
(See Table 3.12 of the text, page 190.)
The z86 condition code register differs from the y86 one:
there is no overflow bit (OF) in the z86.
- There is a program status word. In the z86 this is almost not visible to
the programmer, as it indicates simply whether or not the processor has been halted.
(So, while instructions are executing, the status is "running." The only other status
is "halted," at which point no further instructions are executed.)
- There is main memory, which holds both program instructions and data.
The z86 is little endian.
While the maximum amount of memory that can be configured on a system is an aspect
of the machine architecture, the amount that is actually configured is not.
The z86 implementation we'll use is configured with 1024 bytes (1KB) of memory.
- The z86 does not implement exceptions (Section 4.1.4 of the text).
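The condition-code behavior described in the list above can be sketched in C. This is a minimal sketch, not the reference simulator's code: the type `cc_t`, the functions `cc_from_result` and `branch_taken`, and the choice to model the CC as two flags (ZF and SF, i.e. the y86 flags minus OF) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the z86 CC: the y86 flags without OF. */
typedef struct {
    bool zf; /* result was zero */
    bool sf; /* result was negative (bit 31 set) */
} cc_t;

/* Flags that an arithmetic/logic instruction would leave for `result`. */
cc_t cc_from_result(uint32_t result)
{
    cc_t cc;
    cc.zf = (result == 0);
    cc.sf = ((result >> 31) & 1u) != 0;
    return cc;
}

/* Branch decision for the three z86 jumps, keyed by opcode byte:
 * 0x70 jmp (always), 0x73 je (ZF set), 0x74 jne (ZF clear). */
bool branch_taken(uint8_t opcode, cc_t cc)
{
    switch (opcode) {
    case 0x70: return true;
    case 0x73: return cc.zf;
    case 0x74: return !cc.zf;
    default:   return false; /* not a z86 jump */
    }
}
```

For example, after a subl that produces 0, `branch_taken(0x73, cc)` is true and `branch_taken(0x74, cc)` is false.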
Instruction Set
The z86 instruction set is primarily a subset of the y86, the sole exception being that the
z86 includes a "print" instruction.
- The z86 architecture includes the halt, nop, rrmovl, irmovl, rmmovl, mrmovl, addl, subl, andl, xorl, jmp, je, and jne instructions from the y86 ISA.
Their encoding and operation are identical to the y86 versions. (See Section 4.1 of the text.)
- The z86 includes a print instruction, which prints the contents of a register in this format:
[r5] 0x0000000c
The instruction is encoded as C0Fr,
where r is the number of the register to be printed.
For example, the output just shown might result from executing C0F5.
- Only the addl, subl, andl, and xorl instructions set the CC. They always set the CC.
No other instruction modifies the CC.
- The z86 halts when it fetches a bit string that is not a legal encoding of
any instruction in its ISA.
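Since instructions have different lengths, a table of lengths keyed by the first byte is handy both for working out addresses by hand and for advancing the PC. The sketch below assumes the standard y86 encodings for the listed instructions plus the two-byte print encoding given above; the function name `instr_length` is hypothetical, and the check looks only at the first byte (a complete legality check would also inspect the register byte).

```c
#include <stdint.h>

/* Length in bytes of the instruction whose first byte is `opcode`,
 * or 0 if that byte does not begin a z86 instruction. */
int instr_length(uint8_t opcode)
{
    switch (opcode) {
    case 0x00: /* halt */
    case 0x10: /* nop */
        return 1;
    case 0x20: /* rrmovl */
    case 0x60: /* addl */
    case 0x61: /* subl */
    case 0x62: /* andl */
    case 0x63: /* xorl */
    case 0xC0: /* print */
        return 2;
    case 0x70: /* jmp */
    case 0x73: /* je */
    case 0x74: /* jne */
        return 5;
    case 0x30: /* irmovl */
    case 0x40: /* rmmovl */
    case 0x50: /* mrmovl */
        return 6;
    default:
        return 0; /* not part of the z86 ISA */
    }
}
```

Bytes such as 0x71 (jle in the y86) return 0 here because the z86 keeps only jmp, je, and jne.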
Preliminaries
Fetch and unarchive hw2.tar.gz.
Inside you will find a fully implemented z86 simulator (referenceSim),
sample code files in the format the simulator accepts (e.g., printMem.pgm),
and a set of .c and .h files and a makefile used in
Part B.
Part A Instructions
Implement in z86 machine code a program that iterates over an array of integers
and computes and prints the difference between successive elements.
Here is a rough equivalent in pseudo-C:
int array[];
int i;
for ( i=1; i < array[0]; i++ ) {
    printf("0x%08x", array[i+1] - array[i]);
}
Note that element 0 of the array gives the number of elements in the array (not counting the 0th one).
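For checking expected values before hand-coding the z86 version, the pseudo-C above can be turned into ordinary C along these lines. This is only a sketch: the function names `successive_differences` and `print_differences` are made up here, the differences are kept as 32-bit unsigned bit patterns so that wraparound matches what a 32-bit subl would produce, and a newline is added after each value for readability.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* array[0] holds the count n of elements that follow (array[1..n]).
 * Stores the n-1 successive differences in out and returns how many
 * were stored. No validity checking, as in the assignment. */
size_t successive_differences(const int32_t *array, uint32_t *out)
{
    size_t count = 0;
    for (int32_t i = 1; i < array[0]; i++) {
        /* Unsigned subtraction wraps modulo 2^32, like the hardware. */
        out[count++] = (uint32_t)array[i + 1] - (uint32_t)array[i];
    }
    return count;
}

/* Prints each difference as 8 hex digits, one per line. */
void print_differences(const int32_t *array)
{
    for (int32_t i = 1; i < array[0]; i++) {
        uint32_t diff = (uint32_t)array[i + 1] - (uint32_t)array[i];
        printf("0x%08x\n", (unsigned)diff);
    }
}
```

With the array {4, 10, 13, 20, 18} (four elements after the count), the differences are 3, 7, and -2, the last of which appears as 0xfffffffe.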
Details of the required z86 implementation
- You know at code time that the address of the array is stored in the last word of memory (i.e., the four
bytes starting at decimal address 1020, which is 0x3FC). (It's your responsibility to put the address there, as part
of your program.) The array itself can be located in any otherwise free memory.
- The 0th element of the array is the number of array elements that follow.
- You compute the difference between successive elements in some register, and then use the z86
print instruction to print them.
- You execute a halt instruction when done.
- You don't need to do any validity checking for the input - if the data is wrong somehow, the program gets
undefined results (meaning anything it does is okay).
Here is a picture of the memory layout:
Debugging
You can use the reference simulator to test your program:
$ ./referenceSim <hw2.pgm
The code file, hw2.pgm, contains your machine code program.
Here is a little example (included in the file distribution) to show the
syntax accepted by the simulator:
# printMem.pgm
# Prints the contents of memory between the addresses
# registers 0 and 1 are initialized to. The upper bound
# address must be larger than the lower bound address by a multiple
# of 4 bytes.
# (Goes into an infinite loop if you make a grievous initialization error.)
# Author: jz
# Date: 1/13/2011
# Register usage:
# 0: lower bound address
# 1: upper bound address
# 2: $4
# 3: scratch register
30F000000000 # 0x00: irmovl $0, r0 lower bound address = 0
30F120000000 # 0x06: irmovl $0x20, r1 upper bound address = 32
30F204000000 # 0x0C: irmovl $4, r2
2013 # 0x12: loop: rrmovl r1, r3 start checking if we're done
6103 # 0x14: subl r0, r3 next address == stop address?
732A000000 # 0x16: je done if yes, go to done
503000000000 # 0x1b: mrmovl 0(r0),r3 fetch next word
C0F3 # 0x21: print r3 print it
6020 # 0x23: addl r2, r0 move pointer forward one word
7012000000 # 0x25: jmp loop do it again
00 # 0x2A: done: halt
@C0 # Load what follows starting at address 0xC0
01020304 # This is not needed by this program. It's here just to illustrate
# the syntax the loader accepts.
A '#' character begins a comment, which ends at the end of the line.
Values to be stored into memory are given as strings of bytes, written in (case insensitive) hex.
Successive bytes in the input file, reading left to right, are written to successive bytes of memory.
In this sample file, byte 0 of memory gets value 0x30, byte 1 gets 0xF0, byte 2 gets 0x00, etc.
The special symbol '@' sets the address at which the next byte will be written.
The new address immediately follows the '@', written in hex.
For instance, the '@C0' results in memory byte 0xC0 having value 0x01, byte 0xC1 having value 0x02, etc.
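Because the loader stores bytes left to right and the z86 is little endian, a 32-bit immediate or address has to be typed least significant byte first (the sample writes $0x20 as 20000000). A small helper can produce those digits and save some hand reversal. This is a sketch only; the name `le32_hex` is hypothetical and not part of the provided files.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes the 8 hex digits that encode `value` as a little-endian
 * 32-bit word (least significant byte first) into out.
 * out must have room for 8 digits plus the terminating NUL. */
void le32_hex(uint32_t value, char out[9])
{
    snprintf(out, 9, "%02X%02X%02X%02X",
             (unsigned)(value & 0xFF),
             (unsigned)((value >> 8) & 0xFF),
             (unsigned)((value >> 16) & 0xFF),
             (unsigned)((value >> 24) & 0xFF));
}
```

For example, `le32_hex(0x2A, buf)` yields "2A000000", which matches the target field of the je instruction in the sample program.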
To invoke:
$ ./referenceSim <printMem.pgm
[r3] 0x0000f030
[r3] 0xf1300000
[r3] 0x00000020
[r3] 0x0004f230
[r3] 0x13200000
[r3] 0x2a730361
[r3] 0x50000000
[r3] 0x00000030
Halt instruction at PC = 0x0000002a
Halted. 63 instructions executed.
Note that most of my file is comments. Because there's a chance your first attempt to write the program
won't work, you're likely to need to edit it. Trying to read the most spartan possible file (one containing
just the hex encoding of the instructions) gets old very fast. The comments help me debug (although I have
a tendency to edit the comment but forget to edit the actual instruction!).
One of the more tedious parts of writing the machine instructions is figuring out the addresses they're loaded
at, which is needed for the various jump/branch instructions. I do that by cheating: I initially give target addresses of 00000000, then run the simulator and have it tell me what the addresses actually are (using a feature described
next), then go back and fill in the actual branch/jmp target addresses.
A little reflection on the tasks you're having to do should make it totally clear what the purpose of an assembler is!
Tracing Execution
As a debugging aid, the reference simulator will print some minimal trace information as it fetches each instruction:
the PC and the opcode byte. You request that using the -t switch:
$ ./referenceSim -t <printMem.pgm
0: PC = 0x00000000 opcode = 0x30
1: PC = 0x00000006 opcode = 0x30
2: PC = 0x0000000c opcode = 0x30
3: PC = 0x00000012 opcode = 0x20
4: PC = 0x00000014 opcode = 0x61
5: PC = 0x00000016 opcode = 0x73
6: PC = 0x0000001b opcode = 0x50
7: PC = 0x00000021 opcode = 0xc0
[r3] 0x0000f030
8: PC = 0x00000023 opcode = 0x60
9: PC = 0x00000025 opcode = 0x70
10: PC = 0x00000012 opcode = 0x20
...
Turn-in
Hand in just your hw2.pgm file.
Part B Instructions
Implement the functionality of the reference simulator.
Your work should be wholly contained in the file sim.c.
You are provided with a skeletal sim.c, and a
file that implements reads and writes to the simulated memory
(memory.c), as well as the .h files C requires
to glue them together.
The skeletal sim.c contains just things that are hard
for a beginning C programmer to write, and easy to get wrong.
In particular, the provided code (a) deals with the -t switch,
and (b) invokes the loader to read the input and load it into memory.
Because the loader needs the simulated memory object, there is a declaration
for it in sim.c: the mem array.
You will need to declare other C variables for the other components of the
z86 architecture (e.g., the registers).
memory.c implements the loader, as well as routines that let
your C program read and write either a byte or a word from/to the simulated
memory.
- You should never read or write the mem array directly in your code.
Always use the methods contained in memory.c.
- Words are always read/written in little endian order, including 32-bit values
contained within the encoding of an instruction.
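To make the byte order concrete, here is a sketch of assembling a little-endian word from four consecutive bytes. It works on a plain local buffer purely for illustration; the name `le32_at` is hypothetical, and in sim.c the word-access routines from memory.c should be used rather than anything like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Assembles a 32-bit word from four bytes stored least significant
 * byte first, starting at bytes[offset]. */
uint32_t le32_at(const uint8_t *bytes, size_t offset)
{
    return (uint32_t)bytes[offset]
         | ((uint32_t)bytes[offset + 1] << 8)
         | ((uint32_t)bytes[offset + 2] << 16)
         | ((uint32_t)bytes[offset + 3] << 24);
}
```

Applied to the bytes of 30F120000000 (irmovl $0x20, r1) at offset 2, this gives 0x20. Applied at offset 0 to the first four bytes of the sample program (30 F0 00 00), it gives 0x0000f030, the first value printMem.pgm prints.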
The output of print
For grading reasons, we are semi-picky about the format of the output of the print
command. It should look like the output produced by the reference simulator,
except that anywhere there is a space we're willing to accept any positive number of
spaces or tabs.
To print in hex in C, you do something like this:
printf("0x%08x", myval);
The string argument is a format string. C prints it literally, except that the '%' character
is special. In this example, the '08' means print at least 8 digits, padding with leading 0's if necessary.
The 'x' means print in hex (with lowercase letters). Other specifiers include %d, for decimal, and %f for float.
More than one specifier can be given in the format string. Specifiers are matched, in order, with arguments
that follow the format string. For example:
printf("An int: %d A float: %f\n", myInt, myFloat);
might print "An int: 2 A float: 1.300000\n" (%f prints six digits after the decimal point by default).
The Linux shell command "man 3 printf" will give you endless details.
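Putting the pieces together, one line of print output could be formatted roughly as follows. This is a sketch under assumptions: the helper name `format_print_line` is made up, and the single space after the bracket is inferred from the sample output, so compare against the reference simulator before relying on it.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats one line of print output into buf, e.g. "[r3] 0x0000f030".
 * Returns what snprintf returns: the number of characters needed,
 * not counting the terminating NUL. */
int format_print_line(char *buf, size_t size, int reg, uint32_t value)
{
    /* The cast keeps %x matched with an unsigned int argument. */
    return snprintf(buf, size, "[r%d] 0x%08x", reg, (unsigned)value);
}
```

Writing into a buffer rather than calling printf directly makes the format easy to check in isolation; the simulator itself would also need to end each line with a newline.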
Notes
- I've set things up so that you can use the string byte as a type,
even though it is not built into C.
It is an alias for unsigned char.
- The string MEMSIZE is an integer representing the size of the memory configured on the
machine (1024). You can, and should, use it in place of the literal value if you find you need
the memory size in your code.
- The reference simulator tries to detect and complain about invalid instructions -- any bit string
pointed at by the PC that isn't a completely valid instruction.
To simplify your implementation, you can ignore invalid instructions by simply declaring that the
behavior of the processor is "undefined" for invalid instructions. You should get correct results
for all correct instructions, however.
- It is not an error to reference a memory location that doesn't exist. All memory addresses are
taken modulo the memory size. The string MEMADDRMASK is a hex constant, 0x3FF, that can be used as
an AND mask for memory addresses (to compute the modulo).
- Memory addresses are unsigned.
- The reference implementation contains 84 more semi-colons than the skeletal code.
- Don't modify anything except sim.c.
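The modulo rule for addresses in the notes above works with a mask because the memory size is a power of two. The sketch below uses stand-in constants so that it compiles on its own; `SKETCH_MEMSIZE`, `SKETCH_MEMADDRMASK`, and `wrap_address` are hypothetical names, and real code should use MEMSIZE and MEMADDRMASK from the provided headers.

```c
#include <stdint.h>

/* Stand-ins for the handout's MEMSIZE (1024) and MEMADDRMASK (0x3FF). */
enum { SKETCH_MEMSIZE = 1024, SKETCH_MEMADDRMASK = 0x3FF };

/* Maps any unsigned 32-bit address into the range 0..1023.
 * Equivalent to addr % 1024 because 1024 is a power of two. */
uint32_t wrap_address(uint32_t addr)
{
    return addr & (uint32_t)SKETCH_MEMADDRMASK;
}
```

So address 1024 wraps to 0, and address 0xFFFFFFFF wraps to 1023.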
Turn-in
Turn in just your sim.c file.