Computer Vision (CSE 455), Winter 2008

Project 3:  Single View Modeling

Assigned:  Friday, Feb 15, 2008
Project Due (NEW):  Friday, Feb 29, 2008 (by 11:59pm).
Artifact Due (NEW):  Monday, Mar 3, 2008 (by 11:59pm)
BRING your physical models (forward and reverse) to class on Tuesday Mar 4.  We'll all gaze admiringly and vote on the favorites in class!!!

Head TA:  Noah Snavely (ask him first!)

In this assignment you will create 3D texture-mapped models from a single image using the single view modeling method discussed in class. 
The steps of the project are:

  1. Image acquisition
  2. Calculate vanishing points
  3. Choose reference points
  4. Compute 3D coordinates of several points in the scene
  5. Define polygons based on these points
  6. Compute texture maps for the polygons and output them to files
  7. Create a 3D texture-mapped VRML model
  8. Create a physical paper cutout of the object
  9. Create a reverse perspective paper cutout of the object
  10. Submit results

We provide a user interface for specifying points, polygons, and 3D boxes.

Image Acquisition

For this assignment you should take high resolution (preferably at least 800x800) images or scans of at least two different scenes. One of your images should be a sketch or painting. For instance, a photo of a Greek temple and a painting of Leonardo da Vinci's "The Last Supper" might be interesting choices. (We don't want everyone in the class to do these objects, however.) Note also that the object you digitize need not be monumental, or be a building exterior. An office interior or desk is also a possibility. At the other extreme, aerial photographs of a section of a city could also be good source material (you might have more occlusion in this case, necessitating some manual fabrication of textures for occluded surfaces). Be sure to choose images that accurately model perspective projection without radial distortions. You'll want to choose images that are complex enough to create an interesting model with at least ten textured polygons, yet not so complex that the resulting model is hard to digitize or approximate.

Calculating Vanishing Points

Choose a scene coordinate frame by defining lines in the scene that are parallel to the X, Y, and Z axes. For each axis, specify more than two lines parallel to that axis. The intersection of these lines in the image defines the corresponding vanishing point. A vanishing point may be "at infinity". Since the accuracy of your model depends on the precision of the vanishing points, implement a robust technique for computing vanishing points that uses more than two lines. Here is a write-up for a recommended method that extends the cross-product method discussed in class to return the best intersection point of 3 or more lines in a least squares sense, and helper code for eigen-decomposing symmetric matrices with example uses (it's already included in the skeleton .zip file, and you will be using a wrapper function that calls this routine).
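To make this concrete, here is a hedged sketch of the least squares computation in C++. The Vec3 type, function names, and the power-iteration eigensolver below are illustrative stand-ins, not the skeleton's actual interface; in the project you would call the provided eigen-decomposition wrapper instead.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };  // homogeneous image point/line

static Vec3 Cross(const Vec3 &a, const Vec3 &b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Eigenvector of the symmetric positive semidefinite 3x3 matrix M with
// the SMALLEST eigenvalue, via power iteration on B = tr(M)*I - M
// (whose dominant eigenvector is the one we want). This is only a
// stand-in for the skeleton's eigen-decomposition wrapper.
static Vec3 SmallestEigenvector(const double M[3][3]) {
    double tr = M[0][0] + M[1][1] + M[2][2];
    double B[3][3];
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            B[r][c] = (r == c ? tr : 0.0) - M[r][c];
    double v[3] = { 1.0, 1.0, 1.0 };           // arbitrary start vector
    for (int it = 0; it < 200; it++) {
        double w[3] = {
            B[0][0]*v[0] + B[0][1]*v[1] + B[0][2]*v[2],
            B[1][0]*v[0] + B[1][1]*v[1] + B[1][2]*v[2],
            B[2][0]*v[0] + B[2][1]*v[1] + B[2][2]*v[2] };
        double n = std::sqrt(w[0]*w[0] + w[1]*w[1] + w[2]*w[2]);
        if (n == 0) break;                     // degenerate input
        for (int k = 0; k < 3; k++) v[k] = w[k] / n;
    }
    return { v[0], v[1], v[2] };
}

// Least squares vanishing point of n >= 2 image lines, each given by
// segment endpoints e1[i], e2[i] (homogeneous, z = 1). We minimize
// sum_i (l_i . v)^2 over unit vectors v, i.e. take the smallest
// eigenvector of M = sum_i l_i l_i^T. Note that longer segments get
// more weight here; you may wish to normalize each l_i.
Vec3 ComputeVanishingPoint(const Vec3 *e1, const Vec3 *e2, int n) {
    double M[3][3] = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}};
    for (int i = 0; i < n; i++) {
        Vec3 l = Cross(e1[i], e2[i]);          // line through endpoints
        double li[3] = { l.x, l.y, l.z };
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                M[r][c] += li[r] * li[c];
    }
    // A result with z near 0 is a vanishing point "at infinity".
    return SmallestEigenvector(M);
}
```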

To compute vanishing points, choose line segments that are as long as possible and far apart in the image. Use high resolution images, and use the zoom feature to specify line endpoints with sub-pixel accuracy. A small number of "good" lines is generally better than many inaccurate lines. Use the "save" feature in your program so that you don't have to recalculate vanishing points every time you load the same image.

Choose Reference Points

You will need to set the reference points as described in lecture and in the write-ups. One way of doing this is to measure the 3D positions, when you shoot the picture, of 4 points on the reference (ground) plane and one point off of that plane. The 4 reference plane points and their image projections define a 3x3 homography matrix H that maps X-Y positions of points on the ground plane to u-v image coordinates. The fifth point determines the reference height R off of the plane, as described in lecture. Alternatively, you can specify H and R without physical measurement by identifying a regular structure such as a cube and choosing its dimensions to be unit lengths. This latter approach is necessary for paintings and other scenes in which physical measurements are not feasible. If you'd like, you can use the X and Y vanishing points as two of the reference points on the plane. In this case, you need to specify only 2 more points on the plane and one off the plane.
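In equation form (a sketch of the relation just described; the notation is assumed here, not taken from the write-ups): a ground-plane point (X, Y, 0) with image projection (u, v) satisfies

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = H \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}, \qquad H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$

Eliminating the unknown scale λ gives two linear equations in the entries of H per correspondence:

$$u\,(h_{31}X + h_{32}Y + h_{33}) = h_{11}X + h_{12}Y + h_{13}, \qquad v\,(h_{31}X + h_{32}Y + h_{33}) = h_{21}X + h_{22}Y + h_{23}$$

so the 4 reference points yield 8 equations, exactly determining the 8 degrees of freedom of H (which is only defined up to scale).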

Compute 3D Positions

You will use two different approaches for computing the 3D coordinates of an image point: in-plane measurements and out-of-plane measurements. You can combine these techniques to increase their power; both are summarized below.
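For quick reference, here are hedged summaries of the two relations (the write-ups are the authoritative description; the notation here is assumed). For in-plane measurements, an image point (u, v) known to lie on the ground plane is mapped back through the inverse of the ground-plane homography H:

$$\begin{bmatrix} X' \\ Y' \\ w \end{bmatrix} \sim H^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad (X, Y, Z) = (X'/w,\ Y'/w,\ 0)$$

For out-of-plane measurements, the height Z of a point above the ground plane follows from the cross-ratio construction discussed in lecture. One common form, assuming the image points b (base, on the ground plane), t (top), and r (the reference height R, transferred to the same vertical line via the horizon) are collinear with the Z vanishing point v_z, is

$$\frac{Z}{R} = \frac{\|t - b\|\;\|v_z - r\|}{\|r - b\|\;\|v_z - t\|}$$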

We provide a UI that lets you easily create box shapes and polygons that build on these methods.

Compute Camera Position and Projection Matrix

You can solve for the height of the camera using the horizon as discussed in class (Projective Geometry slide 33): the horizon crosses every vertical line in the scene at exactly the camera's height, so intersecting a vertical line with the horizon and measuring the height of that intersection point gives the camera height. You may find the SameXY routine useful here.

To solve for the X and Y world coordinates of the camera, first imagine a point C0 on the ground plane that lies directly below the camera (this point has the same X and Y coordinates as the camera), as shown in the figure below.

If we can find where C0 projects into the image, sameZplane will tell us its XY coordinates and we'll be done.  So where does C0 project?  The projection is the intersection of the ray from C0 through the camera center with the image plane.  Notice that this ray is vertical---it goes straight up towards the Z point at infinity, [0 0 1 0]^T.  Notice also that every point on this ray (including [0 0 1 0]^T) projects to the same point in the image.  Hence, the projection of C0 is the same as the Z vanishing point, vz (pretty neat, huh?).  vz in combination with sameZplane gives you the X and Y coordinates of the camera.
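In equation form (a one-line summary of the argument above, using the ground-plane homography H): since C0 lies on the ground plane and projects to vz,

$$\begin{bmatrix} X' \\ Y' \\ w \end{bmatrix} \sim H^{-1}\, v_z, \qquad (X_{cam},\ Y_{cam}) = (X'/w,\ Y'/w)$$

which is presumably what the sameZplane helper computes for you when handed vz.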

You can also solve for the projection matrix Π using the homography H of the ground plane.  If

$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$

and $v_z = [u_{vz}\ \ v_{vz}\ \ w_{vz}]^T$, then Π is given by

$$\Pi = \begin{bmatrix} h_{11} & h_{12} & c\,u_{vz} & h_{13} \\ h_{21} & h_{22} & c\,v_{vz} & h_{23} \\ h_{31} & h_{32} & c\,w_{vz} & h_{33} \end{bmatrix}$$

To see why this is true, consider what this projection matrix does to points [X Y 0 1]^T on the ground plane---it's easy to see that the result is just H [X Y 1]^T.  It remains to solve for the value of c.  To find c, use a reference point P = [X Y Z 1]^T that is not on the ground plane, and its known image position (u, v).  This point gives us two equations involving c:

$$u = \frac{s_u + c\,u_{vz} Z}{s_w + c\,w_{vz} Z}, \qquad v = \frac{s_v + c\,v_{vz} Z}{s_w + c\,w_{vz} Z}, \qquad \text{where } \begin{bmatrix} s_u \\ s_v \\ s_w \end{bmatrix} = H \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$
Now multiply both sides of the equations by the denominator of the fraction on the right hand side, and rearrange to get the equations in the form:

$$a_1\,c = b_1, \qquad a_2\,c = b_2$$

with

$$a_1 = Z\,(u\,w_{vz} - u_{vz}), \quad b_1 = s_u - u\,s_w, \qquad a_2 = Z\,(v\,w_{vz} - v_{vz}), \quad b_2 = s_v - v\,s_w$$
The least squares solution for c is then:

$$c = \frac{a_1 b_1 + a_2 b_2}{a_1^2 + a_2^2}$$
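Putting the last three equations together, here is a minimal sketch in C++ (the function name and argument layout are assumptions, not the skeleton's actual interface):

```cpp
// Solve for c in the least squares sense. H is the ground-plane
// homography, vz = (u_vz, v_vz, w_vz) the Z vanishing point, and
// (X, Y, Z) a reference point OFF the ground plane (Z != 0) with
// known image position (u, v).
double SolveForC(const double H[3][3], const double vz[3],
                 double X, double Y, double Z, double u, double v) {
    // (s_u, s_v, s_w)^T = H [X Y 1]^T
    double su = H[0][0]*X + H[0][1]*Y + H[0][2];
    double sv = H[1][0]*X + H[1][1]*Y + H[1][2];
    double sw = H[2][0]*X + H[2][1]*Y + H[2][2];
    // The two equations a1*c = b1 and a2*c = b2 from the text above.
    double a1 = Z * (u * vz[2] - vz[0]);
    double b1 = su - u * sw;
    double a2 = Z * (v * vz[2] - vz[1]);
    double b2 = sv - v * sw;
    // Least squares solution of the stacked 2x1 system.
    return (a1 * b1 + a2 * b2) / (a1 * a1 + a2 * a2);
}
```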
Compute Texture Maps

Use the points you have measured to define several planar patches (polygons) in the scene. Note that even though your measurements may be in horizontal or vertical directions, you can include planes that are slanted, such as a roof.

The last step is to compute texture map images for each of these polygons.  Your program will store a separate texture image for each polygon in the scene, created by applying a homography to the original photo.  You need to solve for the appropriate homography for each polygon.  If the polygon is a rectangle in the scene, e.g., a wall or door, all that is needed is to warp the quadrilateral image region into a rectangular texture image.  More generally, you will need to convert the coordinate system of the polygon to align with the texture image.  See this document for a more detailed explanation.
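Here is a hedged sketch of the inverse-warping step (the Image type and helpers below are illustrative stand-ins for the skeleton's image classes, not its actual API): each output texel is mapped through the polygon's texture-to-image homography Htex, which you can solve from corner correspondences just like the ground-plane homography, and the photo is resampled at the mapped position.

```cpp
#include <vector>

// Minimal RGB image (a stand-in for the skeleton's image class).
struct Image {
    int width, height;
    std::vector<double> data;  // 3 doubles per pixel, row major
    Image(int w, int h) : width(w), height(h), data(3 * w * h, 0.0) {}
    double *px(int x, int y) { return &data[3 * (y * width + x)]; }
    const double *px(int x, int y) const { return &data[3 * (y * width + x)]; }
};

// Bilinear resampling of src at real-valued position (u, v).
// Returns false (leaving rgb untouched) outside the image.
static bool BilinearSample(const Image &src, double u, double v, double rgb[3]) {
    int x0 = (int)u, y0 = (int)v;
    if (u < 0 || v < 0 || x0 + 1 >= src.width || y0 + 1 >= src.height)
        return false;
    double fx = u - x0, fy = v - y0;
    for (int k = 0; k < 3; k++)
        rgb[k] = (1-fx)*(1-fy)*src.px(x0,   y0  )[k] + fx*(1-fy)*src.px(x0+1, y0  )[k]
               + (1-fx)*fy    *src.px(x0,   y0+1)[k] + fx*fy    *src.px(x0+1, y0+1)[k];
    return true;
}

// Inverse warp: Htex maps texture pixel (s, t) to photo pixel (u, v).
// Texels that map outside the photo stay black; the per-polygon mask
// images mentioned below can record which texels are valid.
void ExtractTexture(const Image &src, const double Htex[3][3], Image &tex) {
    for (int t = 0; t < tex.height; t++)
        for (int s = 0; s < tex.width; s++) {
            double u = Htex[0][0]*s + Htex[0][1]*t + Htex[0][2];
            double v = Htex[1][0]*s + Htex[1][1]*t + Htex[1][2];
            double w = Htex[2][0]*s + Htex[2][1]*t + Htex[2][2];
            BilinearSample(src, u / w, v / w, tex.px(s, t));
        }
}
```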

Create a VRML model

For each image you work from, create a VRML model (see documentation below) with at least 10 texture-mapped polygonal faces. The skeleton code will create the VRML file for you but you need to add texture map images and masks for each polygon, in .gif or .jpg format.

Create 3D Paper Cutout

Create a physical replica of your scene by printing out the texture images onto paper, glue-sticking them to cardboard, cardstock, or another stiff material, and folding/taping the pieces into a 3D recreation of your scene.

Create Reverse Perspective

A reverse perspective is an optical illusion in which the 3D geometry is transformed so that depths are inverted--concave scenes become convex and vice versa.  However, it is done in such a way that the inverted 3D model appears correct from a certain viewpoint (or range of viewpoints), and it is not until you move your head that you notice that something is wrong.  The scene appears to move in the opposite way that it should, which serves to exaggerate the perception of 3D.  Click for a nice writeup of this illusion with some examples.  The artist Patrick Hughes produces stunning examples of this illusion.

While creating reverse perspectives is more of an art than a science, we propose the following approach.  First, we must choose the viewpoint from which we want the scene to appear "correct", i.e., identical to the photo from which it was created.  One option would be to use the viewpoint from which the photo was actually taken (computed as above).  However, this tends not to work well, since the model is typically reproduced at a smaller scale than the original scene, which also scales down the distance from the viewer to the model, requiring the viewer to stand uncomfortably close (the effect works best if you stand a few feet back).

Instead, we assume the viewer is far enough away to approximate an orthographic projection, i.e., (X, Y, Z) is projected to (X, Y).  We therefore wish to transform the shape of the 3D model so that it appears correct under an orthographic projection.  To see how to accomplish this transformation, note that the perspective projection formula

$$w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} I & -C \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

can be re-written as follows:

$$w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} I & -C \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$
The leftmost matrix above is the orthographic projection matrix.  Hence, we can accomplish the reversal by applying the rightmost three matrices to the original shape, as follows:

$$\begin{bmatrix} X' \\ Y' \\ Z' \\ w' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & S \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -C_X \\ 0 & 1 & 0 & -C_Y \\ 0 & 0 & 1 & -C_Z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

where the transformed vertex is (X'/w', Y'/w', Z'/w').
Here, (CX, CY, CZ) is the camera position (we described above how to compute this), and R is the camera rotation.  We will provide you with a function that computes R from the projection matrix.  We've also added a scale factor S--entry (3,4) in the leftmost matrix--that controls the depth scale of the result.  Changing this value stretches the model along the Z direction.  This doesn't change the appearance under orthographic projection when looking down the Z axis, but will affect its appearance when the observer moves.  You might want to experiment with different S values to find the one that produces the best illusion.  Patrick Hughes constructs his models so that sides meet at 45 degree angles--this may be a good rule of thumb to shoot for and will avoid models that are too shallow or too deep.
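A minimal sketch of applying this transform to one vertex (the function name and parameter layout are assumptions; R and (CX, CY, CZ) come from the computations described above):

```cpp
// Reverse perspective transform of a single model vertex P.
// Implements P' = D_S * [R 0; 0 1] * [I -C; 0 1] * P from the
// equation above, followed by the homogeneous divide.
void ReversePoint(const double R[3][3], const double C[3], double S,
                  const double P[3], double Pout[3]) {
    // Camera coordinates: Pc = R * (P - C).
    double d[3] = { P[0] - C[0], P[1] - C[1], P[2] - C[2] };
    double pc[3];
    for (int r = 0; r < 3; r++)
        pc[r] = R[r][0]*d[0] + R[r][1]*d[1] + R[r][2]*d[2];
    // D_S maps (x, y, z, 1) to (x, y, S, z); dividing by the
    // homogeneous coordinate z gives (x/z, y/z, S/z).
    double z = pc[2];
    Pout[0] = pc[0] / z;   // matches the perspective projection...
    Pout[1] = pc[1] / z;
    Pout[2] = S / z;       // ...but with inverted, scaled depth
}
```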

Apply this transformation to your 3D model.  Then create a physical replica of your reverse perspective by printing out the texture images onto paper, glue-sticking them to cardboard, cardstock, or another stiff material, and folding/taping the pieces into a 3D recreation of your scene.  In general, the larger the physical replica, the more convincing the illusion. Consider using the department's large format printer (the TAs can help you with this if you give them enough advance notice...).

Note that not all scenes will produce good reverse perspectives.  Study these pieces by Patrick Hughes for examples of scenes that work particularly well.  Relatively simple scenes composed of one or more simple box primitives tend to work quite well (use the box tool in the UI).

To Do

Click here for a list of specific routines that you need to write.

Submit Results

Put your code and executable in the project 3 turnin directory, and your images and VRML models in the artifact directory with a web page project3.htm that contains:

Remember to provide results for at least two different scenes.  Since not all scenes produce good reversals, we do not require that you do normal and reverse paper cutouts for both (one is sufficient).

Hints

Click here to see a list of useful hints for working on this project.

Skeleton Code

Download skeleton code and solution executable.
Update (02/24/08): new version of unwrap.cpp
Update (02/24/08): new version of svmui.cpp
Update (02/24/08): new version of WarpImage.cpp
Update (02/24/08): new version of ImgViewHandle.cpp
Update (02/26/08): new version of ImgView.cpp
The skeleton code provides a user interface and several basic data structures that you will be working with (be sure to read about these by clicking on the given links). It's based on code originally written by Li Zhang and significantly modified by Chris Twigg, Eugene Hsu, and Noah Snavely.

Bells and Whistles

Here is a list of suggestions for extending the program for extra credit. You are encouraged to come up with your own extensions. We're always interested in seeing new, unanticipated ways to use this program!

[whistle] Show the camera position in each VRML file, marked by a sphere or other shape.

[bell] Merging models from multiple images. For instance, create a complete model of a building exterior from a few photographs that capture all four sides.

[bell][bell] Extend the method to create a 3D model from a cylindrical panorama.  Hint:  parallel lines in a panorama sweep out a curved path--you need to determine what this curve is.


Resources


Last modified February 27, 2008