CSE 477 -- Video Imaged Spatial Positioning Project

PROJECT DESIGN

Triangulation is the idea behind stereo-vision. Stereo-vision allows anything with two eyes (e.g., animals, or VISPS) to determine the X, Y, and Z coordinates of an arbitrary point in space. First, we will describe the theory and mathematics of triangulation and stereo-vision that make VISPS possible. Second, we will discuss how these calculations are implemented in hardware.

Theory

Figure 1 is an example of the visual data we will have to work with.



Figure 1 -- The view of our two cameras in an example scene

Two cameras are located one foot apart, both facing in the same direction as shown in Figure 6. From this data, our goal is to derive functions that give the X, Y, and Z coordinates of the white dot (the laser marker appears as a point of saturation). X is defined as the horizontal position of that dot relative to the point exactly between the two cameras, with positive X to the right of center and negative X to the left. Y is defined as the vertical component of the distance from that same point to the white dot. Z is defined as the distance of the white dot in front of the cameras.

To discover these coordinates, the important values we will need from the visual data are the horizontal and vertical components of the angles from each camera to the laser marker.

There are several basic techniques for extracting the angles from the data. All of them rely on the fact that a given pixel location always corresponds to a fixed angle from the camera. For example, a pixel 20 pixels in from the left edge of the frame always lies five degrees from that edge, no matter how near or far the object imaged at that pixel is.

We use the following algorithm to determine the angles between each camera and the laser marker.

First, we will determine the total viewing width (M) and viewing height (P) at a given distance (N). Figure 2 below shows the widths and distances we will predetermine.
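
As a rough guide (the camera's horizontal and vertical viewing angles are not specified here, so the angles a and b below are stand-ins), M and P at distance N would follow from simple trigonometry:

    M = 2 * N * tan(a / 2)
    P = 2 * N * tan(b / 2)

In practice, M and P can simply be measured directly at a known distance N.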



Figure 2 -- Finding the viewing width of the camera

After predetermining M, P, and N as shown in Figure 2, we then find the pixels that correspond to the laser marker. This is done by taking three successive frames, averaging the frames together to remove white noise, and then locating the set of brightest intensity values within a certain pixel distance of each other. This set of pixels corresponds to the marker. From this set, we identify the pixel at the center of the marker. For the rest of our computations, we use the row and column index of this pixel as the location of the laser marker.
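
As a small sketch of this step (the notation is ours, writing I1, I2, and I3 for the three successive frames; the exact center-finding rule is fixed empirically later), the averaged frame A and the marker location can be written as:

    A(r, c) = ( I1(r, c) + I2(r, c) + I3(r, c) ) / 3

    row    = mean of the row indices of the n brightest pixels of A
    column = mean of the column indices of the n brightest pixels of A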

Since an image is 356 pixels wide and 292 pixels high, the center of the image is located at column 178 and row 146. From this information, the height (vertical deviation) of a pixel at distance N is defined as:

    vertical deviation = (146 - row) * (P / 292)

We also know that the horizontal deviation of a pixel at distance N is defined as:

    horizontal deviation = (column - 178) * (M / 356)

From the laser marker's pixel coordinates we can find the horizontal and vertical deviation from the point halfway between the cameras. Once we find the two deviations, we can obtain the azimuth and elevation angles as shown in Figure 3.



Figure 3 - Calculating the azimuth and elevation angle from the vertical and horizontal deviations

The azimuth and elevation angles are determined by the following equations:

    azimuth   = arctan(horizontal deviation / N)
    elevation = arctan(vertical deviation / N)

We choose to determine the angles in this manner, rather than using a lookup table, for two reasons. First, a lookup table requires a considerable amount of memory, and we do not feel that the memory overhead is justified. Second, the number of entries in a lookup table would limit our precision. By actually performing the floating-point calculations, we expect to obtain better accuracy.

Now that we have the angles from each camera we have enough information to compute the spatial coordinates of the laser marker.

To find the X coordinate we use the two azimuths obtained from the pixel coordinates. Because the cameras are a fixed distance apart and each azimuth is associated with a specific camera, we can construct the triangle shown in Figure 4 below.



Figure 4 - Triangle created from camera distances and azimuths

In Figure 4 the location exactly halfway between the cameras will be used as the reference point. From the triangle in Figure 4, the value of C (in inches) can be found using the following equation:




After locating C, the X coordinate can be determined by using the equation:




To find the Z coordinate, the C value will be used again. The relationship between C and the Z coordinate is defined by the following equation:




Because the laser marker can never be behind the cameras, the value of Z will always be positive. To calculate the Y coordinate, we will use the following equation:




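To make the X, Y, and Z computation concrete, here is one self-consistent sketch of the triangulation step. It uses a ray-intersection form rather than the triangle of Figure 4, and its conventions (azimuths aL and aR measured from each camera's forward direction, positive to the right, with the left camera at X = -6 inches and the right camera at X = +6 inches) are illustrative assumptions, not necessarily the exact equations used:

    X + 6 = Z * tan(aL)        (ray from the left camera to the marker)
    X - 6 = Z * tan(aR)        (ray from the right camera to the marker)

    Z = 12 / (tan(aL) - tan(aR))
    X = Z * tan(aL) - 6
    Y = Z * tan(elevation)
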
Implementation

VISPS consists of four distinct subsystems:

  • Image Capture
  • Image Processor
  • Data Correlator
  • Data Presentation

Figure 5 below shows the relationships among these subsystems.


Figure 5 - Subsystem relationships in VISPS

Image Capture Subsystem

The image capture subsystem uses two cameras and a laser marker. The purpose of the cameras is to take digital snapshots of the environment containing the laser marker and to feed the resulting images to the image processor subsystem.

The two cameras are Spectronix RC-2BW RoboCams. The digital image is in CIF format with a size of 356 x 292. The cameras will be mounted so they face the same direction and the centers of the two cameras are one foot apart. Figure 6 below shows this configuration.


Figure 6 - VISPS camera setup

We initially considered using a high contrast sticker as the marker. However, if we use a sticker we can only find the marker by searching for the sticker's color or shape. Neither of these alternatives is practical. Firstly, the cameras can only produce black and white images so searching for color is impossible. Secondly, our initial research revealed that searching for a specific shape in an image is a difficult computational problem.

We found that the best marker option was to use a high-intensity laser pointer. A laser delivers a high-intensity, well-defined beam of light. This beam can easily be directed at an arbitrary point or redirected into a baffle as shown in Figure 7 to form a flare. To construct the laser flare, a white baffle of reflective material will cross the laser's beam to reflect light towards the cameras.


Figure 7 - Marker construction with baffle demonstration

This flexibility makes the laser attractive from a user-interface perspective. Furthermore, the laser's high intensity means that we can turn the overall brightness of the frame down to the point where only the beam saturates pixels in the frame. This makes finding the laser marker much simpler: we simply look for the brightest pixels in the frame, and once those pixels are located, we have found the marker.

Image Processor Subsystem

The purpose of the image processor is to take a pair of images (one from each camera) and find the pixels in each image that correspond to the marker. Once the set of pixels is found, the image processor will find the center point of the set of pixels and return the pixel coordinates to the data correlator. The pixel coordinates of the marker in the image frame will be defined as the pair: (row, column). Figure 8 below shows the coordinates that will be generated.


Figure 8 - Pixel coordinates from image frame

After the pixel coordinates are determined, the row coordinates from the two cameras are compared. Since both cameras are at the same height, they should produce approximately the same row coordinate (see Figure 1). We will use this property as an initial error-checking scheme: the image processor will make sure that the two row indices are within six pixels of each other. If this check fails, the image processor will resample the images from the cameras and recalculate the pixel locations. Otherwise, it will send the two column indices and a single row index, where the row index is the average of the row indices obtained from the two cameras.
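
A minimal sketch of this check in C follows (the function and variable names are illustrative only, not the actual image-processor logic):

    /* Row-consistency check: the two detections must agree to within six rows. */
    #include <stdlib.h>
    #include <stdint.h>

    #define MAX_ROW_DIFF 6

    /* Returns 1 and writes the averaged row if the two detections agree,
       or 0 if the frames should be resampled. */
    int check_rows(uint16_t row_left, uint16_t row_right, uint16_t *row_out)
    {
        if (abs((int)row_left - (int)row_right) > MAX_ROW_DIFF)
            return 0;                                  /* mismatch: resample */
        *row_out = (uint16_t)((row_left + row_right) / 2);
        return 1;
    }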

To implement the image processor, we will use the XS-40 development board and program the onboard FPGA. The FPGA will contain the logic needed to find the pixels that make up the laser marker. The communication between the XS-40 and the cameras is defined in the Project Requirements section.

Before we look for the laser marker we need to filter out noise. The image data is speckled with random points of saturation that would confuse the logic that looks for the laser marker. We will filter the image by averaging the intensity of every pixel over three frames before looking for the laser. Points that were saturated randomly will not appear saturated in all three frames, so the average dims the noise; the pixels representing the laser marker, by contrast, are saturated in all three frames and remain bright.

Unfortunately, the laser marker does not appear as a single pixel in the image data; the saturated region surrounding the laser marker is several pixels across. Not only do we want to locate this region, but we also want to determine a reasonable center for it to increase accuracy. Since the noise has already been filtered out and the image dimmed so that no saturated regions exist except the one produced by the laser marker, we can look for the n brightest points in the averaged frame and then average the column and row values of these points. The optimal value for n will be determined empirically once we have a working system. This gives us a sufficiently accurate estimate of the laser marker's center.
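
The following C sketch shows this marker-location step (the frame size comes from the camera specification; the value of n, the data layout, and all names are placeholders rather than the actual FPGA logic):

    /* Average three frames, then take the centroid of the n brightest pixels. */
    #include <stdint.h>

    #define WIDTH    356
    #define HEIGHT   292
    #define N_BRIGHT 16      /* "n" is to be determined empirically */

    void find_marker(const uint8_t f1[HEIGHT][WIDTH],
                     const uint8_t f2[HEIGHT][WIDTH],
                     const uint8_t f3[HEIGHT][WIDTH],
                     uint16_t *row_out, uint16_t *col_out)
    {
        /* The N_BRIGHT brightest averaged pixels seen so far. */
        uint16_t best_val[N_BRIGHT] = {0};
        uint16_t best_row[N_BRIGHT] = {0};
        uint16_t best_col[N_BRIGHT] = {0};

        for (int r = 0; r < HEIGHT; r++) {
            for (int c = 0; c < WIDTH; c++) {
                /* Three-frame average suppresses single-frame noise. */
                uint16_t avg = (f1[r][c] + f2[r][c] + f3[r][c]) / 3;

                /* Keep this pixel if it beats the weakest "brightest" entry. */
                int weakest = 0;
                for (int i = 1; i < N_BRIGHT; i++)
                    if (best_val[i] < best_val[weakest])
                        weakest = i;
                if (avg > best_val[weakest]) {
                    best_val[weakest] = avg;
                    best_row[weakest] = (uint16_t)r;
                    best_col[weakest] = (uint16_t)c;
                }
            }
        }

        /* Centroid of the brightest pixels approximates the marker center. */
        uint32_t rsum = 0, csum = 0;
        for (int i = 0; i < N_BRIGHT; i++) {
            rsum += best_row[i];
            csum += best_col[i];
        }
        *row_out = (uint16_t)(rsum / N_BRIGHT);
        *col_out = (uint16_t)(csum / N_BRIGHT);
    }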

The communication between the data correlator and the image processor will be handled through a four-way handshake as described in the Project Requirements section.

Data Correlator Subsystem

After the image processor produces the two column coordinates and the single row coordinate, the data correlator will find two azimuth angles and the elevation angle. Then the data correlator applies trigonometric identities as defined in the Math and Theory section to determine the spatial X, Y, Z coordinates of the laser marker. We will implement the data correlator on an Atmel 8051 Microcontroller.

We chose to implement the data correlator on an Atmel Microcontroller because of its processing capabilities. The data correlator must evaluate several trigonometric functions with reasonable accuracy, which requires floating-point arithmetic. Although the Atmel Microcontroller does not have native support for floating-point operations, it can emulate them in software at some cost in throughput. In VISPS this reduced throughput is not a concern, because the data streams out through the RS-232 port when it is ready rather than when it is demanded.
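
A minimal C sketch of this computation is shown below. It follows the deviation-and-arctangent approach from the Theory section and the ray-intersection form sketched there; the calibration constants M, P, and N, the angle conventions, and all names are placeholder assumptions rather than the actual calibrated values or firmware:

    #include <math.h>

    #define BASELINE_IN 12.0   /* camera separation, inches */
    #define IMG_W  356.0
    #define IMG_H  292.0

    /* M, P: viewing width/height (inches) at calibration distance N (inches).
       These are measured during calibration; the values here are placeholders. */
    static const double M = 48.0, P = 39.0, N = 60.0;

    /* col_l, col_r: marker column in the left/right image; row: averaged row. */
    void correlate(double col_l, double col_r, double row,
                   double *x, double *y, double *z)
    {
        /* Horizontal deviation (inches, at distance N) and azimuth per camera. */
        double dev_l = (col_l - IMG_W / 2.0) * (M / IMG_W);
        double dev_r = (col_r - IMG_W / 2.0) * (M / IMG_W);
        double az_l  = atan(dev_l / N);
        double az_r  = atan(dev_r / N);

        /* Vertical deviation and elevation from the averaged row index. */
        double dev_v = (IMG_H / 2.0 - row) * (P / IMG_H);
        double elev  = atan(dev_v / N);

        /* Intersect the two rays: left camera at x = -6 in, right at x = +6 in.
           The azimuths differ for any marker at a finite distance. */
        *z = BASELINE_IN / (tan(az_l) - tan(az_r));
        *x = *z * tan(az_l) - BASELINE_IN / 2.0;
        *y = *z * tan(elev);
    }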

Data Presentation Subsystem

To present the data, we chose to stream it using the RS-232 protocol. The primary reason is that the Atmel Microcontroller has native support for the protocol. We will use RS-232 with the parameters listed in Table 1 below.

Table 1 - RS-232 protocol parameters
  Parameter            Value
  Baud rate            19,200 bps
  Number of bits       8
  Parity               None
  Number of stop bits  1
  Flow control         None
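
As an illustration of these settings, here is a sketch of serial-port initialization for an 8051-family part in C. It assumes an 11.0592 MHz crystal and the standard register names from <reg51.h>; the actual VISPS firmware and clock may differ:

    #include <reg51.h>

    void uart_init(void)
    {
        SCON  = 0x50;     /* UART mode 1: 8 data bits, 1 stop bit, receiver enabled */
        TMOD |= 0x20;     /* Timer 1 in mode 2 (8-bit auto-reload) as baud generator */
        TH1   = 0xFD;     /* reload value for 9,600 bps at 11.0592 MHz ...           */
        PCON |= 0x80;     /* ... doubled to 19,200 bps by setting SMOD               */
        TR1   = 1;        /* start Timer 1 */
        TI    = 1;        /* mark the transmitter ready for the first byte */
    }

    /* Blocking transmit of one byte over the serial port. */
    void uart_send(unsigned char b)
    {
        while (!TI)       /* wait for the previous byte to finish */
            ;
        TI   = 0;
        SBUF = b;
    }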

The exact transfer protocol is discussed in detail in the Project Requirements section.