For this assignment, you will team up with one other classmate and implement a technique for estimating 3D depth information from digital photographs. The main components will be (1) choosing a method to implement, (2) coding it up, (3) deploying it in the lab, and (4), applying your method to create depth scans of a few objects.
We have purchased equipment and arranged support infrastructure for the following 7 shape scanning methods. By 5pm on Thursday, March 29, send us (seitz@cs.washington.edu and curless@cs.washington.edu) email with your preferences, as a rank-ordered list from 1 (top choice) to 7 (bottom choice). Please include all 7 projects. Based on this information, we will select about 5 projects and form groups of 2 students for each. The assignments will be announced on Friday.
We will soon provide more information about equipment, support code, and so forth. The purpose of this handout is to give you enough information to choose which method you want to implement. The project will be due Wednesday, April 18.
The possible projects are as follows:
This is a passive vision technique for creating voxel models from several images. The approach works by sweeping a plane (or other surface) through the scene volume and reconstructing voxels on the plane that are consistently colored in the input images. In particular, a voxel is deemed consistent if it projects to pixels whose color variance is below some threshold. Visibility is accounted for by sweeping the plane in a "front-to-back" order. The scene needs to lie outside the convex hull of the camera viewpoints.
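To make the consistency test concrete, here is a minimal Python/NumPy sketch of the per-voxel variance check, assuming a 3x4 projection matrix is available for each image; the front-to-back visibility bookkeeping, which is the heart of the algorithm, is omitted.

    import numpy as np

    def is_consistent(voxel_center, images, cameras, threshold):
        """images: list of HxWx3 float arrays; cameras: list of 3x4 projection matrices."""
        colors = []
        for img, P in zip(images, cameras):
            u, v, w = P @ np.append(voxel_center, 1.0)   # homogeneous projection
            if w <= 0:
                continue                                 # voxel is behind this camera
            x, y = int(round(u / w)), int(round(v / w))
            if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
                colors.append(img[y, x])
        if len(colors) < 2:
            return False                                 # too few observations to test
        # Consistent if the summed per-channel color variance is below threshold.
        return np.array(colors).var(axis=0).sum() < threshold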
Image Acquisition: a simple solution is to put the object on a rotary turntable and rotate it in front of a stationary video or still camera, taking several pictures (between 20 and 100 is reasonable for a 360 degree rotation). The camera would have to be positioned slightly above the object, looking down, to satisfy the convex hull constraint. The camera should be calibrated with respect to the turntable beforehand. One way to do this is to take a calibration object, such as a plane whose 3D geometry is known, place it on the turntable, and rotate it to three or four known orientations. From these images, you should be able to obtain the intrinsic and extrinsic parameters using any of a variety of camera calibration routines, and also solve for the rotation axis of the turntable.
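As a rough illustration only, the calibration step might look like the following if you use a planar checkerboard and a library such as OpenCV (the function names are OpenCV's, the file names and pattern size are placeholders, and recovering the turntable rotation axis from the per-view extrinsics is not shown):

    import cv2
    import numpy as np

    pattern = (8, 6)        # inner corners of the checkerboard (example value)
    square = 0.025          # square size in meters (example value)

    # 3D corner positions on the calibration plane (z = 0).
    obj = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    obj_pts, img_pts = [], []
    for fname in ["turntable_00.png", "turntable_01.png", "turntable_02.png"]:
        gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_pts.append(obj)
            img_pts.append(corners)

    # Intrinsics, distortion, and one extrinsic pair (rvec, tvec) per view.
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)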
References:
S. M. Seitz and C. R. Dyer, Photorealistic Scene Reconstruction by Voxel Coloring, International Journal of Computer Vision, 35(2), 1999, pp. 151-173. A shorter conference version appeared in Proc. Computer Vision and Pattern Recognition Conf., 1997, pp. 1067-1073.
K. N. Kutulakos and S. M. Seitz, A Theory of Shape by Space Carving, International Journal of Computer Vision, Marr Prize Special Issue, 2000. An earlier version appeared in Proc. Seventh International Conference on Computer Vision (ICCV), 1999, pp. 307-314.
Create a depth map from two views of a scene by correlating pixels between views. In the standard stereo pipeline, the cameras are first calibrated using any of a variety of camera calibration routines (you probably want to first place a known "calibration object" in the scene and use a standard routine to find camera parameters, then remove the object). The next step is to "rectify" the images by warping them so that the pixel displacement between images is purely horizontal. This rectification transformation is a planar projective (texture-map) warp and is described in standard vision textbooks such as the Faugeras text (see below). Next, the pixels in each row of the first image are matched with those of the second to determine disparity. There are several methods for performing this search. Since this step is the crucial one, be sure to implement a good technique, such as one of the three methods listed below in the references. It is straightforward to convert disparity into depth, given the known camera parameters.
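For orientation only, here is a sketch of the simplest possible matching step, a brute-force window-based SSD search over rectified grayscale images; the papers below describe the better techniques you should actually implement.

    import numpy as np

    def ssd_disparity(left, right, max_disp=64, win=5):
        """Return an integer disparity map; left/right are rectified HxW float arrays."""
        H, W = left.shape
        half = win // 2
        disp = np.zeros((H, W), np.int32)
        for y in range(half, H - half):
            for x in range(half, W - half):
                patch = left[y - half:y + half + 1, x - half:x + half + 1]
                best_cost, best_d = np.inf, 0
                # Search candidate disparities along the same scanline.
                for d in range(0, min(max_disp, x - half) + 1):
                    cand = right[y - half:y + half + 1,
                                 x - d - half:x - d + half + 1]
                    cost = np.sum((patch - cand) ** 2)
                    if cost < best_cost:
                        best_cost, best_d = cost, d
                disp[y, x] = best_d
        return disp

    # Depth then follows from disparity: Z = focal_length * baseline / disparity.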
References:
Three-Dimensional Computer Vision: A Geometric Viewpoint, by Olivier Faugeras, MIT Press.
C. L. Zitnick and T. Kanade, A Cooperative Algorithm for Stereo Matching and Occlusion Detection, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 7, July 2000. Web page: http://www.cs.cmu.edu/~clz/stereo.html
D. Scharstein and R. Szeliski, Stereo matching with nonlinear diffusion, International Journal of Computer Vision, 28(2):155-174, July 1998.
Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, unpublished manuscript, 2000.
Swept-stripe triangulation is a technique for capturing the shape of an object by sweeping a plane of light over a surface and observing the reflections with an off-axis camera. This approach has been demonstrated to be one of the most accurate range scanning techniques for applications such as static shape capture for engineering, film, and sculpture. The plane of light can be formed in several ways, including the use of a video projector, a slide projector, or a laser line source.
There are two primary choices for scanning motion: (1) fix the light source and camera and move the object on a linear or rotational stage, or (2) fix the object and camera and move the light source. The second choice can be implemented with a motion-controlled rotational or translational sweep of the light source, or by using a video projector and playing a swept-light video sequence.
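Whichever configuration you choose, the per-pixel triangulation reduces to intersecting the camera ray through an illuminated pixel with the known plane of light. A minimal sketch, assuming the intrinsic matrix K and the plane (expressed in camera coordinates as n . X + d = 0) are known from calibration:

    import numpy as np

    def triangulate_pixel(pixel, K, plane_n, plane_d):
        """Intersect the ray through `pixel` with the plane n.X + d = 0 (camera coords)."""
        u, v = pixel
        ray = np.linalg.solve(K, np.array([u, v, 1.0]))   # ray direction through the pixel
        t = -plane_d / np.dot(plane_n, ray)               # solve n.(t*ray) + d = 0 for t
        return t * ray                                    # 3D point on the surface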
Reference:
B. Curless and M. Levoy, "Better Optical Triangulation through Spacetime Analysis," Proc. of Intl. Conf. on Computer Vision, pp. 987-994, Boston, June 1995.
Hierarchical stripe scanning is a robust triangulation method for rapidly determining the shape of an object. The idea is to project a sequence of progressively finer vertical stripes onto the object and observe the reflections with an off-axis camera.
In particular, if you illuminate the object with a pattern that is half white and half black, then each pixel of the camera will observe either a white reflection or no reflection. White pixels are known to map to the left half of the object, and black pixels to the right half. You can now subdivide the projection so that there are four stripes (white, black, white, black) and take another image. If a pixel was observed to be white the first time and black the second, then that pixel's line of sight must intersect the second stripe, which is half as thick as the stripe used for the first image. Continuing this refinement process, each pixel records a bit code (a sequence of white and black observations) that ultimately corresponds to a particular stripe at the finest level. Triangulating with that stripe gives the desired range point for that pixel.
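Decoding the bit codes is straightforward; a minimal sketch, assuming each captured image has already been thresholded into a boolean stripe/no-stripe map:

    import numpy as np

    def decode_stripes(bits):
        """bits: list of HxW boolean arrays, coarsest pattern first."""
        index = np.zeros(bits[0].shape, np.int64)
        for b in bits:
            # Append each observation as the next bit of the per-pixel code.
            index = (index << 1) | b.astype(np.int64)
        return index      # stripe index in [0, 2**len(bits) - 1]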
Reference:
Sato, K. and Inokuchi, S., Three-dimensional surface measurement by space encoding range imaging, Journal of Robotic Systems 2 (1985), pp. 27-39.
"One-shot"
active triangulation is a method for projecting a single grid pattern onto an
object and determining its shape.
Making such a technique robust take some effort and a set of
assumptions, but the result is a method for capturing real-time deformations,
such as a human face making a variety of expressions.
The idea is to
project a grid pattern, e.g., a rectilinear grid, onto a surface and observe
the reflection with an off-axis camera.
Next, you process the image to detect horizontal and predominantly
vertical edge segments. You then link
the edge segments to form a graph that represents a deformed version of the
grid. The deformation (which may
include breaks in the grid due to occlusion) corresponds to the shape of the
object. In the end, the positions of
the extracted grid lines can be triangulated with the original projector grid
lines.
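As a rough illustration of the first processing step only, the horizontal and vertical grid-line responses can be separated with simple directional gradients (a hedged sketch; linking the segments into a graph and handling a colored grid are not shown):

    import numpy as np

    def grid_edge_maps(img, thresh=10.0):
        """img: HxW float image of the observed grid reflection."""
        gy, gx = np.gradient(img)     # row (vertical) and column (horizontal) derivatives
        # Horizontal grid lines produce strong vertical gradients, and vice versa.
        horizontal = (np.abs(gy) > thresh) & (np.abs(gy) > np.abs(gx))
        vertical = (np.abs(gx) > thresh) & (np.abs(gx) > np.abs(gy))
        return horizontal, vertical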
Note that Steve and Brian have a colored grid projection technique in mind that should substantially reduce the difficulty in determining the edge connectivity of the graph. Still, for reference, we have included pointers to a technique that uses a single color (white) grid.
References:
M. Proesmans, L. Van Gool, and A. Oosterlinck, Active acquisition of 3D shape for moving objects, Proc. International Conference on Image Processing, Lausanne, Switzerland, September 1996.
M. Proesmans, L. Van Gool, and A. Oosterlinck, One-shot active 3D shape acquisition, Proc. 13th IAPR International Conference on Pattern Recognition, vol. III C, pp. 336-340, August 25-26, 1996, Vienna, Austria.
Photometric stereo is a technique for determining the normals and reflectances over a surface. Under certain assumptions, it is a fairly simple technique and yields data which can be used, e.g., for bump-mapping surfaces.
Here's how it works. Consider taking three images of a surface in succession, where a single point light source illuminates the surface for each image. For a given pixel, you will observe three values. If you assume the surface is diffuse with reflectance R, then you will observe:
I_i = R * S_i * (l_i . n)
where S_i is the "intensity" of the i-th light source, l_i is the lighting direction, and n is the surface normal. We can rewrite this as:
I_i = L_i . N
where L_i = S_i * l_i and N = R * n. Using column vectors, our measurement equations can be combined into a matrix equation:
| (L_1)^T |
| (L_2)^T | N = I
| (L_3)^T |
where I = (I_1, I_2, I_3)^T. This equation can be solved for the normal and reflectance.
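A minimal per-pixel solver, assuming the three lighting vectors L_i have been stacked as the rows of a 3x3 matrix L:

    import numpy as np

    def photometric_stereo_pixel(L, I):
        """L: 3x3 matrix with rows L_i = S_i * l_i;  I: the 3 observed intensities."""
        N = np.linalg.solve(L, I)          # N = R * n
        R = np.linalg.norm(N)              # reflectance (albedo)
        n = N / R if R > 0 else np.zeros(3)
        return R, n

With more than three light sources, the same system becomes overdetermined and a least-squares solve (e.g., np.linalg.lstsq) takes the place of the exact solve.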
References:
Woodham, R. J., Photometric method for determining surface orientation from multiple images, Optical Engineering 19, 139-144 (1980).
H. Rushmeier, G. Taubin, and A. Gueziec, Applying shape from lighting variation to bump map capture, Eurographics Rendering Workshop 1997 (P. Slusallek and J. Dorsey, eds.), Springer Wien, June 1997, pp. 35-44.
In this approach, a stick is waved by hand under a light source, casting a shadow stripe onto the surface of the object to be scanned. As the stick moves, the shadow slides across the object and is imaged by a video camera. The image of the shadow reveals the shape of the object under the shadow. In particular, each “edge” of the shadow lies on a plane determined by the light source and the stick. This plane may be identified by ensuring that part of the shadow hits a known ground plane and calibrating the light source beforehand. Given the plane and the shadow edge in the image, the 3D profile of the shadow edge is readily computed.
Getting this working requires calibrating the light source, implementing a shadow edge detector, and setting up a good working environment where the shadows will show up well.
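One simple way to localize the shadow in time, sketched below under the assumption that the whole sweep has been captured as a grayscale video, is to threshold each pixel at the midpoint of its maximum and minimum intensities and record when it first falls into shadow; spatial edge localization and the ray-plane intersection then proceed much as in the stripe methods above.

    import numpy as np

    def shadow_times(frames):
        """frames: T x H x W float array holding the grayscale video of the sweep."""
        vmax = frames.max(axis=0)
        vmin = frames.min(axis=0)
        thresh = 0.5 * (vmax + vmin)              # per-pixel shadow threshold
        in_shadow = frames < thresh[None, :, :]   # T x H x W boolean
        # Index of the first frame in which each pixel is shadowed
        # (0 if the shadow never reaches that pixel).
        return in_shadow.argmax(axis=0)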
This technique was developed by Jean-Yves Bouguet and Pietro Perona. Check out Bouguet’s web page at Caltech for more information:
http://www.vision.caltech.edu/bouguetj/ICCV98/index.html