The Assignment

You should be able to do this assignment using only the Arm Instruction Set Quick Reference Card and the following timing information extracted from the SA-1100 Developer's Manual. The "Delay Result" column indicates how many clock cycles the next instruction must wait if it depends on the result of the listed instruction. The "Issue cycles" column is the number of clock cycles the processor needs to issue the instruction before the next instruction can be issued into the pipeline. According to the table, any instruction following a data-processing instruction can issue one cycle after the data-processing instruction has issued. However, because of pipelining, a data-processing instruction that depends on the result of a load-single instruction must wait an extra cycle before starting, as if the load instruction really took two cycles.
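One way to apply these two rules when counting cycles by hand is to walk a straight-line instruction sequence, charge one issue cycle per instruction, and add one stall cycle whenever an instruction reads a register loaded by the instruction immediately before it. The Python sketch below does that. The `Insn` record and the sample ARM sequence are hypothetical, the model ignores branches, multi-cycle instructions (multiplies, LDM/STM) and everything else in the SA-1100 table not described above, and it applies the load-use penalty to any dependent instruction, not only data-processing ones, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Insn:
    text: str              # assembly text, for readability only
    is_load: bool          # True for a single-register load (e.g. LDR/LDRB)
    dest: Optional[str]    # register written, or None
    srcs: tuple = ()       # registers read


def count_cycles(prog: Sequence[Insn]) -> int:
    """Cycles for straight-line code: one issue cycle per instruction,
    plus one stall if it reads the result of the load just before it."""
    cycles = 0
    prev = None
    for insn in prog:
        cycles += 1  # issue cycle
        if prev is not None and prev.is_load and prev.dest in insn.srcs:
            cycles += 1  # load-use interlock
        prev = insn
    return cycles


prog = [
    Insn("LDR r0, [sp]",     True,  "r0", ("sp",)),
    Insn("ADD r1, r0, r0",   False, "r1", ("r0", "r0")),  # stalls: r0 was just loaded
    Insn("LDR r2, [sp, #4]", True,  "r2", ("sp",)),
    Insn("ADD r3, r3, #1",   False, "r3", ("r3",)),       # independent: fills the load delay
    Insn("ADD r1, r1, r2",   False, "r1", ("r1", "r2")),  # r2 is ready by now
]
print(count_cycles(prog))  # 6

# Same instructions without scheduling the independent ADD into the load delay.
unscheduled = [prog[0], prog[1], prog[2], prog[4], prog[3]]
print(count_cycles(unscheduled))  # 7
```

The one-cycle difference between the two orderings is the kind of saving that instruction scheduling buys in the optimization contest.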

Here is the question: what clock rate would an ARM-based software (SW) solution for the bright-spot cross-hairs require to match the performance of our FPGA solution running on a 24MHz clock?

  1. Show the ARM assembly code for my simplified version of the bright-spot solution below. Write it as a subroutine that returns an output pixel value given an input pixel value, the current X,Y location, the "blank" input, and the "vsynch" input. You may use any instructions you want from the instruction set. See how much you can optimize this. If anyone in the class can beat the best solution among Theodora, Deepak, and me, then we will have pizza in your name at the following meeting (this is easy!). You must implement my version of the bright-spot algorithm to qualify for the contest. Incidentally, there is something a little bit wrong with my implementation, though it would most likely not be noticeable. Can you identify the bug?
  2. Assumptions: The call stack contains the inputs in the following order: pixel, hcnt, vcnt, blank, vsynch. The return stack contains only the single output data value. Any static values needed by your subroutine should be stored in memory; you can make up labels for those locations.
  3. How many clock cycles are needed for execution of your subroutine?
  4. What clock rate is needed to compete with the FPGA solution (one data output for each 24MHz clock cycle)?
  5. Lite Option: If you are really pressed for time, you can write a pseudo-assembly version of this routine and give me an estimate for the answer above, assuming one clock cycle per instruction.
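Questions 3 and 4 connect through one multiplication. The FPGA emits one pixel per 24MHz clock, so a subroutine that needs N ARM cycles per pixel (including call and return overhead) has to run at N × 24MHz to keep up; for the Lite option, N is simply the instruction count. A minimal sketch, where the 40-cycle figure is a hypothetical placeholder, not a measured result:

```python
FPGA_CLOCK_HZ = 24_000_000  # one output pixel per FPGA clock, from the assignment


def required_arm_clock_hz(cycles_per_pixel: int, fpga_clock_hz: int = FPGA_CLOCK_HZ) -> int:
    """ARM clock needed to produce one pixel in the time the FPGA takes."""
    return cycles_per_pixel * fpga_clock_hz


# Hypothetical cycle count for illustration only.
print(required_arm_clock_hz(40) / 1_000_000, "MHz")  # 960.0 MHz
```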

/************** My Version of Bright-Spot ****************************/

module bright (HCNT, VCNT, PIXEL, FILTERED, BLANK, CLOCK, VSYNCH);
input         BLANK, CLOCK, VSYNCH;
input   [7:0] PIXEL ;
input   [7:0] HCNT, VCNT;
output  [7:0] FILTERED;

reg [7:0] ACC, MAX, V, PV, H, PH;
reg [7:0] p1, p2, p3;
reg synch;

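// Output: black while blanking, white on the cross-hair row/column, else the input pixel
assign FILTERED = BLANK ? 0 : (((PV==VCNT)||(PH==HCNT)) ? 255: PIXEL);

// Registered copy of VSYNCH, used to detect its rising edge
always @(posedge CLOCK) synch <= VSYNCH;

// ACC = average of the current pixel and the previous three (each divided by 4 first)
always @(posedge CLOCK) begin
      ACC <= (PIXEL>>2)+(p1>>2)+(p2>>2)+(p3>>2);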
      p1  <= PIXEL;
      p2  <= p1;
      p3  <= p2;
end

always @(posedge CLOCK) begin
      if (VSYNCH && !synch) begin // synch pulse is passed (just do this once/frame)
            MAX <= 0;
            PV <= V;
            PH <= H;
      end else if ((ACC > MAX) && !BLANK) begin
            MAX <= ACC;
            V <= VCNT;
            H <= HCNT;
      end
end

endmodule       
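To check an ARM version against the Verilog, it can help to have a cycle-by-cycle software model of the module. The Python sketch below is one such model, assuming each call represents one CLOCK period: it computes `FILTERED` from the current inputs and register values, then returns the registers as they would be after the rising edge, with every right-hand side reading pre-edge values (the behaviour of `<=`). `BrightRegs` and `bright_clock` are hypothetical names; their fields correspond to the static values the assumptions above say should live in memory. The model reproduces the Verilog as written, quirks included.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BrightRegs:
    acc: int = 0
    max_val: int = 0   # MAX in the Verilog
    v: int = 0
    h: int = 0
    pv: int = 0
    ph: int = 0
    p1: int = 0
    p2: int = 0
    p3: int = 0
    synch: int = 0


def bright_clock(r: BrightRegs, pixel: int, hcnt: int, vcnt: int,
                 blank: int, vsynch: int):
    """Return (FILTERED, registers after the next posedge CLOCK)."""
    # Combinational output, evaluated from the pre-edge registers.
    if blank:
        filtered = 0
    elif r.pv == vcnt or r.ph == hcnt:
        filtered = 255
    else:
        filtered = pixel

    # Posedge updates: every right-hand side reads the old register values.
    nxt = replace(r,
                  acc=(pixel >> 2) + (r.p1 >> 2) + (r.p2 >> 2) + (r.p3 >> 2),
                  p1=pixel, p2=r.p1, p3=r.p2, synch=vsynch)
    if vsynch and not r.synch:            # rising edge of VSYNCH
        nxt = replace(nxt, max_val=0, pv=r.v, ph=r.h)
    elif r.acc > r.max_val and not blank:
        nxt = replace(nxt, max_val=r.acc, v=vcnt, h=hcnt)
    return filtered, nxt


r = BrightRegs()
for h in range(4, 8):                      # a run of 200-valued pixels on row 3
    _, r = bright_clock(r, 200, h, 3, 0, 0)
print(r.acc, r.max_val, r.v, r.h)          # 200 150 3 7
_, r = bright_clock(r, 0, 0, 0, 1, 1)      # VSYNCH rising edge (blanked)
print(r.pv, r.ph, r.max_val)               # 3 7 0
print(bright_clock(r, 10, 7, 1, 0, 0)[0])  # 255: column 7 is on the cross-hair
```

Returning a new register set instead of mutating one keeps the "read old, write new" semantics of nonblocking assignment explicit.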

 

More Background Stuff

You might want to familiarize yourself with the following documents, but don't worry about them too much. We will step you through the tools. For now, it is important to learn about the processor. We will get to the development board and tools in lecture.

Once you register with ARM, you can download the Arm Instruction Set Quick Reference Card from ARM's website; registration is required for access. If you don't want to register, let me know and I will e-mail you the document. You can also get most of the instruction set information from the ARM SDT User's Guide.