Computer Vision (CSE 490CV/EE 400B), Winter 2002

Project 2:  Single View Modeling

Assigned:  Wednesday, Feb 27, 2002
Due:  Tuesday, Mar 12, 2002 (by 11:59pm)

In this assignment you and your partner will create 3D texture-mapped models from a single image using the single view modeling method discussed in class.  You may find the following resources useful:

Note that all of the above describe slightly different methods that you can use to compute the same information.  Choose the one that you find the most natural and useful.

The steps of the project are:

  1. Image acquisition
  2. Calculate vanishing points
  3. Choose reference points
  4. Compute textures and 3-D positions and create a VRML model
  5. Submit results

Image Acquisition

For this assignment you should take high resolution (preferably at least 800x800) images or scans of at least two different scenes. One of your images should be a sketch or painting. For instance, a photo of a Greek temple and a painting of Leonardo da Vinci's "The Last Supper" might be interesting choices. (We don't want everyone in the class to do these objects, however.) Note also that the object you digitize need not be monumental, or be a building exterior. An office interior or desk is also a possibility. At the other extreme, aerial photographs of a section of a city could also be good source material (you might have more occlusion in this case, necessitating some manual fabrication of textures for occluded surfaces). Be sure to choose images that accurately model perspective projection without radial distortions. You'll want to choose images that are complex enough to create an interesting model with at least ten textured polygons, yet not so complex that the resulting model is hard to digitize or approximate.

Calculating Vanishing Points

Choose a scene coordinate frame by defining lines in the scene that are parallel to the X, Y, and Z axes. For each axis, specify more than two lines parallel to that axis. The intersection of these lines in the image defines the corresponding vanishing point. A vanishing point may be "at infinity". Since the accuracy of your model depends on the precision of the vanishing points, implement a robust technique for computing vanishing points that uses more than two lines. Here is a write-up for a recommended method that extends the cross-product method discussed in class to return the best intersection point of 3 or more lines in a least-squares sense, and helper code for solving symmetric matrix equations.
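One way this least-squares extension can be sketched: each line through homogeneous endpoints p1, p2 has coefficients l = p1 x p2, and the vanishing point v minimizing the sum of (l_i . v)^2 subject to ||v|| = 1 is the eigenvector of M = sum_i l_i l_i^T with the smallest eigenvalue. The sketch below (all names are our own, not the skeleton's) finds that eigenvector with Jacobi rotations; consult the linked write-up for the recommended formulation.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// Eigenvector of a symmetric 3x3 matrix M for its smallest eigenvalue,
// found with cyclic Jacobi rotations (M is destroyed in the process).
static Vec3 smallestEigenvector(double M[3][3]) {
    double V[3][3] = {{1,0,0},{0,1,0},{0,0,1}};     // accumulated rotations
    for (int sweep = 0; sweep < 50; ++sweep)
        for (int p = 0; p < 3; ++p)
            for (int q = p + 1; q < 3; ++q) {
                if (fabs(M[p][q]) < 1e-14) continue;
                double th = 0.5 * atan2(2*M[p][q], M[q][q] - M[p][p]);
                double c = cos(th), s = sin(th);
                for (int k = 0; k < 3; ++k) {       // M <- J^T M
                    double mp = M[p][k], mq = M[q][k];
                    M[p][k] = c*mp - s*mq;  M[q][k] = s*mp + c*mq;
                }
                for (int k = 0; k < 3; ++k) {       // M <- M J,  V <- V J
                    double mp = M[k][p], mq = M[k][q];
                    M[k][p] = c*mp - s*mq;  M[k][q] = s*mp + c*mq;
                    double vp = V[k][p], vq = V[k][q];
                    V[k][p] = c*vp - s*vq;  V[k][q] = s*vp + c*vq;
                }
            }
    int best = 0;                                   // the diagonal now
    for (int i = 1; i < 3; ++i)                     // holds the eigenvalues
        if (M[i][i] < M[best][best]) best = i;
    return { V[0][best], V[1][best], V[2][best] };
}

// Best vanishing point of n >= 2 line segments, given their endpoints
// in homogeneous image coordinates.
static Vec3 bestVanishingPoint(const Vec3 e1[], const Vec3 e2[], int n) {
    double M[3][3] = {};
    for (int i = 0; i < n; ++i) {
        Vec3 l = cross(e1[i], e2[i]);               // homogeneous line
        double nrm = sqrt(l.x*l.x + l.y*l.y + l.z*l.z);
        double c[3] = { l.x/nrm, l.y/nrm, l.z/nrm }; // equal line weights
        for (int r = 0; r < 3; ++r)
            for (int s = 0; s < 3; ++s) M[r][s] += c[r]*c[s];
    }
    return smallestEigenvector(M);
}
```

The returned point is homogeneous: a result with a z component near zero is exactly the "vanishing point at infinity" case mentioned above.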

To compute vanishing points, choose line segments that are as long as possible and far apart in the image. Use high resolution images, and use the zoom feature to specify line endpoints with sub-pixel accuracy. A small number of "good" lines is generally better than many inaccurate lines. Use the "save" feature in your program so that you don't have to recalculate vanishing points every time you load the same image.

Choose Reference Points

You will need to set the reference points as described in lecture and in the write-ups. One way of doing this is to measure, in 3-D, when you shoot the picture, the positions of 4 points on the reference plane and one point off of that plane. The 4 reference plane points and their image projections define a 3x3 matrix H that maps u-v points to X-Y positions on the plane. The fifth point determines the reference height R off of the plane, as described in lecture. Alternatively, you can specify H and R without physical measurement by identifying a regular structure such as a cube and choosing its dimensions to be unit lengths. This latter approach is necessary for paintings and other scenes in which physical measurements are not feasible.
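With H[2][2] fixed to 1, the 4 plane correspondences give 8 linear equations in the remaining 8 entries of H. A sketch of recovering H this way (helper names are our own, not the skeleton's, and the 4 points are assumed to be in general position):

```cpp
#include <cmath>
#include <cstring>
#include <algorithm>

// Solve the 8x8 system A x = b by Gaussian elimination with partial pivoting.
static void solve8x8(double A[8][8], double b[8], double x[8]) {
    for (int col = 0; col < 8; ++col) {
        int piv = col;
        for (int r = col + 1; r < 8; ++r)
            if (fabs(A[r][col]) > fabs(A[piv][col])) piv = r;
        for (int c = 0; c < 8; ++c) std::swap(A[col][c], A[piv][c]);
        std::swap(b[col], b[piv]);
        for (int r = col + 1; r < 8; ++r) {
            double f = A[r][col] / A[col][col];
            for (int c = col; c < 8; ++c) A[r][c] -= f * A[col][c];
            b[r] -= f * b[col];
        }
    }
    for (int r = 7; r >= 0; --r) {                  // back-substitution
        x[r] = b[r];
        for (int c = r + 1; c < 8; ++c) x[r] -= A[r][c] * x[c];
        x[r] /= A[r][r];
    }
}

// Apply H to (u, v):  (X, Y) = dehomogenized H (u, v, 1)^T.
static void applyH(const double H[3][3], double u, double v,
                   double &X, double &Y) {
    double w = H[2][0]*u + H[2][1]*v + H[2][2];
    X = (H[0][0]*u + H[0][1]*v + H[0][2]) / w;
    Y = (H[1][0]*u + H[1][1]*v + H[1][2]) / w;
}

// Recover H from 4 correspondences (u_i, v_i) -> (X_i, Y_i).
// Each correspondence contributes two rows of the linear system.
static void solveHomography(const double u[4], const double v[4],
                            const double X[4], const double Y[4],
                            double H[3][3]) {
    double A[8][8] = {}, b[8], h[8];
    for (int i = 0; i < 4; ++i) {
        double r0[8] = { u[i], v[i], 1, 0, 0, 0, -u[i]*X[i], -v[i]*X[i] };
        double r1[8] = { 0, 0, 0, u[i], v[i], 1, -u[i]*Y[i], -v[i]*Y[i] };
        std::memcpy(A[2*i],     r0, sizeof r0);
        std::memcpy(A[2*i + 1], r1, sizeof r1);
        b[2*i] = X[i];  b[2*i + 1] = Y[i];
    }
    solve8x8(A, b, h);
    double out[3][3] = {{h[0],h[1],h[2]},{h[3],h[4],h[5]},{h[6],h[7],1}};
    std::memcpy(H, out, sizeof out);
}
```

The same routine works whether the (X, Y) values come from physical measurement or from unit lengths assigned to a cube-like structure, as described above.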

Compute 3D Positions

There are two different approaches for computing distances: in-plane measurements and out-of-plane measurements. You can combine these techniques to make more measurements than either allows alone. For instance, once you have computed the height of one point X off of the reference plane P, you can compute the coordinates of any other point on the plane through X that is parallel to P (see the man-on-box slide from lecture). By choosing more than one reference plane, you can make even more measurements. Be creative, and describe on your web page what you did to make measurements.

Compute Texture Maps

Use the points you have measured to define several planar patches in the scene. Note that even though your measurements may be in horizontal or vertical directions, you can include planes that are slanted, such as a roof.

The last step is to compute texture maps for each of these patches. If the patch is a rectangle in the scene, e.g., a wall or door, all that is needed is to warp the quadrilateral image region into a rectangular texture image. You can use the technique described in class to identify the best homography warp between the original and texture image, using the constraints that the four points of the quadrilateral map to the corners of the texture image.  We recommend using inverse warping of the pixels in the texture image into pixels in the original image, and bilinear interpolation to evaluate fractional pixel values in the original image. It is best to choose the width and height of the texture image to be about the same as that of the original quadrilateral, to avoid loss of resolution.  You may either write your own inverse warping code, or modify the warping code from project 2 to compute homographies instead of cylindrical projections.
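The inverse warping with bilinear interpolation recommended above can be sketched as follows, assuming (per the skeleton's convention) a homography H mapping normalized [0,1] texture coordinates into the original image. The grayscale image layout and helper names here are our own, for illustration only:

```cpp
#include <vector>
#include <cmath>
#include <algorithm>

struct Image {
    int w, h;
    std::vector<double> pix;           // grayscale, row-major
    double at(int x, int y) const { return pix[y * w + x]; }
};

// Bilinear interpolation at fractional position (x, y); clamps at borders.
static double bilinear(const Image &im, double x, double y) {
    int x0 = (int)floor(x), y0 = (int)floor(y);
    double fx = x - x0, fy = y - y0;
    int x1 = std::max(0, std::min(x0 + 1, im.w - 1));
    int y1 = std::max(0, std::min(y0 + 1, im.h - 1));
    x0 = std::max(0, std::min(x0, im.w - 1));
    y0 = std::max(0, std::min(y0, im.h - 1));
    return (1-fy) * ((1-fx) * im.at(x0,y0) + fx * im.at(x1,y0))
         +    fy  * ((1-fx) * im.at(x0,y1) + fx * im.at(x1,y1));
}

// Fill a texW x texH texture by inverse-warping each of its pixels
// through H into the source image and sampling bilinearly.
static Image inverseWarp(const Image &src, const double H[3][3],
                         int texW, int texH) {
    Image tex { texW, texH, std::vector<double>(texW * texH, 0.0) };
    for (int t = 0; t < texH; ++t)
        for (int s = 0; s < texW; ++s) {
            double ns = (s + 0.5) / texW, nt = (t + 0.5) / texH; // in [0,1]
            double w = H[2][0]*ns + H[2][1]*nt + H[2][2];
            double u = (H[0][0]*ns + H[0][1]*nt + H[0][2]) / w;
            double v = (H[1][0]*ns + H[1][1]*nt + H[1][2]) / w;
            tex.pix[t * texW + s] = bilinear(src, u, v);
        }
    return tex;
}
```

Iterating over the destination texture (rather than the source quadrilateral) is what avoids holes in the output; this is the essential property of inverse warping.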

If the patch is a non-rectangular region such as the outline of a person, you will need to perform the following steps: (1) define a quadrilateral in the image containing the region you want, (2) warp this into a rectangular texture image, as before, and (3) edit the texture image and mark out "transparent" pixels using your project 1 code or other image editing software.

Create a VRML model

For each image you work from, create a VRML model (see documentation below) with at least 10 texture-mapped polygonal faces. The skeleton code will create the VRML file for you but you need to add texture map images and masks for each polygon, in .gif or .jpg format.

Submit Results

Put your code and executable in the project 3 turnin directory, and your images and VRML models in the artifact directory with a web page project3.htm that contains:

Skeleton Code

We provide skeleton code for you to start from. The skeleton code provides an interface and several basic data structures for you to work with. We hope it will save you some labor. However, you are not required to use it.

The interface allows you to load an image and add points, lines, and polygons. After you compute the 3D positions of those points, you can save the model and reload it for further editing. When you are done, you can dump the model in VRML 2.0 format and view it in a VRML viewer.

The file-IO-related functions are under the "File" submenu, as usual.

Under the "Edit" submenu, you have the following choices:

Point: add or delete points. To add a point, left-click. To delete a point, move the mouse over the point until it is highlighted in white, then press "Backspace". A point can only be deleted if it is not used by any lines or polygons.

X Line, Y Line, Z Line, Other Line: add or delete lines. To add a line, the first left-click defines the start point, and the second left-click defines the end point. If you want to reuse an existing point as the start/end point, just hold "Ctrl" when you left-click. To delete a line, move the mouse onto it until it turns white and press "Backspace". In "X Line" edit mode, the lines you add are supposed to be parallel to the X axis in 3D; likewise for the "Y Line" and "Z Line" modes. Lines added in "Other Line" mode may have any orientation.

Polygon: add or delete polygons. Each polygon consists of a list of points. To add a polygon, left-click sequentially on the desired positions and then press "Enter". A closed polygon will be drawn. (You don't have to click on the first point again to close the polygon; the system does it for you automatically.) To delete a polygon, move the mouse to the center of the polygon, shown as a white square, and press "Backspace". Every time you create a new polygon, you will give it a name, e.g., "ceiling" or "floor", which will be used as the texture file name when you save the model as VRML. The texture file name for a particular polygon is the polygon name with a ".gif" extension.

Under the "Draw" submenu, you can toggle the following options:

Points: draw points or not.

Lines: draw lines or not.

Polygons: draw polygons or not.

Draw 3D: draw in 2D or 3D mode.

When "Draw 3D" is not checked, the image and all points, lines, and polygons are drawn in the image plane. You can edit them, and:

zoom in/out: Ctrl+/-;

move image: drag with right button;

When "Draw 3D" is checked, all the points, lines, and polygons are drawn in 3D (based on your computation of X, Y, Z for each point). The image is texture-mapped onto the polygons (based on your estimates of the homographies H and invH). You cannot edit in this mode, but you can:

scale up/down: Ctrl+/-;

move model parallel to the viewing plane: drag with left button;

move model further/closer: drag with left button upwards/downwards, with Alt down;

rotate around X: drag with left button vertically, with Ctrl down;

rotate around Y: drag with left button horizontally, with Ctrl down;

rotate clockwise/counterclockwise: drag with left button to the right/left, with Shift down;

The skeleton code currently consists of the following C++ files: HelpPageUI.cpp/h, svmUI.cpp/h, ImgView.cpp/h, svmMain.cpp, svm.h, svmAux.h, PriorityQueue.h.

HelpPageUI.cpp/h, svmUI.cpp/h: define a help window and the main window, respectively.

svmAux.h, PriorityQueue.h: define some auxiliary data structures and functions.

svmMain.cpp: defines the "main" function, which is simply an event loop.

svm.h: includes several header files and defines the following important data structures:

    struct SVMPoint {
        double u, v, w;
        double X, Y, Z, W;
    };

    typedef CTypedPtrDblList<SVMPoint> PointList;

Here (u, v, w) are 2D homogeneous coordinates in the image plane, and (X, Y, Z, W) are 3D homogeneous coordinates in the world. If w = 1, (u, v) are image coordinates, ranging from 0 to the image width and from 0 to the image height, respectively. If w = 0, the point is at infinity. Otherwise, (u/w, v/w) gives the image coordinates. The same convention applies to X, Y, Z, W.
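A minimal illustration of this convention (re-declaring SVMPoint here only so the snippet stands alone; the helper name is our own, not the skeleton's):

```cpp
// Dehomogenize an SVMPoint's image coordinates, guarding against
// points at infinity (w == 0), e.g. vanishing points of lines that
// are parallel in the image.
struct SVMPoint { double u, v, w; double X, Y, Z, W; };

static bool imageCoords(const SVMPoint &p, double &u, double &v) {
    if (p.w == 0.0) return false;      // point at infinity: no finite (u, v)
    u = p.u / p.w;
    v = p.v / p.w;
    return true;
}
```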

    struct SVMLine {
        int orientation;
        SVMPoint *pnt1, *pnt2;
    };

    typedef CTypedPtrDblList<SVMLine> LineList;

Here orientation indicates whether the line is supposed to be parallel to the X, Y, or Z axis in 3D, or may have any orientation.

    struct SVMPolygon {
        CTypedPtrDblList<SVMPoint> pntList;
        double cntx, cnty;
        double H[3][3], invH[3][3];
        char name[256];
    };

    typedef CTypedPtrDblList<SVMPolygon> PolygonList;

Each polygon consists of a list of SVMPoints; pointers to the SVMPoints are saved in pntList. (cntx, cnty) is the mean of all points in the list, used for polygon selection in the UI. H is the homography from the normalized texture image of this polygon to the original image; that is, if the INVERSE of H is applied to the image coordinates (u, v, w) in pntList, the result is the texture coordinates, which range over [0, 1]. invH is the inverse matrix of H. H is used when generating texture images from the original image; invH is used to convert image coordinates in pntList to texture coordinates. Whenever you change H, please update invH using the Matrix3by3Inv function in svmAux.h.

name is the name of the polygon. name.gif will be used as the texture file name in the VRML file, as explained below.
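The skeleton's svmAux.h supplies Matrix3by3Inv for keeping invH in sync with H; as a reference, a 3x3 inversion routine of this kind can be written with the adjugate and determinant (the signature below is our own, not necessarily the skeleton's):

```cpp
#include <cmath>

// Invert a 3x3 matrix via the adjugate / determinant.  Returns false if
// the matrix is (numerically) singular, in which case invH is untouched.
static bool invert3x3(const double H[3][3], double invH[3][3]) {
    double c[3][3];                    // cofactor matrix, via cyclic indices
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            int i1 = (i+1)%3, i2 = (i+2)%3, j1 = (j+1)%3, j2 = (j+2)%3;
            c[i][j] = H[i1][j1]*H[i2][j2] - H[i1][j2]*H[i2][j1];
        }
    double det = H[0][0]*c[0][0] + H[0][1]*c[0][1] + H[0][2]*c[0][2];
    if (fabs(det) < 1e-12) return false;
    for (int i = 0; i < 3; ++i)        // inverse = adj(H)/det = c^T / det
        for (int j = 0; j < 3; ++j) invH[i][j] = c[j][i] / det;
    return true;
}
```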

ImgView.cpp/h: defines and implements the ImgView class, which handles most of the UI messages and drawing routines. You will work with the following member data:

PointList pntList;

LineList lineList;

PolygonList plyList;

which save all the points, lines, and polygons you create. 

This project is not like the previous ones, where you are given several TODOs, each of which you fill in with a few lines of code. Instead, you are given a goal: generate 3D textured models. To achieve it, we recommend that you follow the 5 steps mentioned above. It is up to you how to organize your code on top of our skeleton code. The 5 steps can be divided into two stages:

A. Estimate the geometry of the model. 

The goal of this stage is to compute the X, Y, Z, W fields of each SVMPoint. It involves calculating vanishing points and choosing reference points, as covered in Steve's lectures.

B. Compute the texture image for each polygon. 

Based on the 3D point positions, you need to compute the homography from the polygon plane to the image plane, and then resample the original image to generate the texture image. You need to fill in code in the skeleton to do this! As for the naming convention for texture images: if the polygon is named "wall", the texture image should be named "wall.tga". "wall.tga" may contain more than just the wall, so use your scissoring program to cut the wall out of its background. Based on the mask from your scissoring tool and wall.tga, generate a wall.gif with Photoshop in which the background is transparent and the foreground is opaque. Do this for all the polygons, then save the model as VRML. The skeleton code will generate a VRML file that uses each polygon's name with a ".gif" extension as its texture image filename. That's why you want to follow the naming convention: wall --> wall.tga --> wall.gif! If you put the VRML file and all *.gif texture images in the same directory, you can view the model with a VRML viewer. Here is a detailed document about computing homographies, with some helper code to solve linear equations Ax = b where A is symmetric.

The "Tool" submenu is empty. You can put whatever tools you invent to achieve the single view modeling goal there.

Bells and Whistles

Here is a list of suggestions for extending the program for extra credit. You are encouraged to come up with your own extensions. We're always interested in seeing new, unanticipated ways to use this program!

[whistle]Show the camera position in each VRML file, marked by a sphere or other shape.  We discussed how to obtain the height of the camera in lecture.  The X position can be obtained the exact same way, using the vanishing line between the Y and Z vanishing points and a reference length parallel to the X axis (and similarly for the Y position).

[bell] Merging models from multiple images. For instance, create a complete model of a building exterior from a few photographs that capture all four sides.

[bell][bell] Extend the method to create a 3D model from a cylindrical panorama.  Hint:  parallel lines in a panorama sweep out a curved path--you need to determine what this curve is.


Resources


Last modified February 28, 2002