In this project, you will implement a system to construct a height field from a series of images of a diffuse object under different point light sources. Your software will calibrate the lighting directions, find the best-fit normal and albedo at each pixel, and then find a surface that best matches the solved normals.
To start your project, you will be supplied with some test images and skeleton code you can use as the basis of your project, and a sample solution executable you can use to compare with your program.
This is a new project, so if you find something that seems strange about the skeleton code, please report it!
Before we can extract normals from images, we have to calibrate our capture setup. This includes determining the lighting intensity and direction, as well as the camera response function. For this project, we have already taken care of two of these tasks: First, we have linearized the camera response function, so you can treat pixel values as intensities. Second, we have balanced all the light sources to each other, so that they all appear to have the same brightness. You'll be solving for albedos relative to this brightness, which you can just assume is 1 in some arbitrary units. In other words, you don't need to think too much about the intensity of the light sources.
The one remaining calibration step we have left for you is calibrating lighting directions. One method of determining the direction of point light sources is to photograph a shiny chrome sphere in the same location as all the other objects. Since we know the shape of this object, we can determine the normal at any given point on its surface, and therefore we can also compute the reflection direction for the brightest spot on the surface.
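For concreteness, here is a minimal sketch of that calculation, assuming the highlight center (px, py) has already been located and that the sphere's projected center (cx, cy) and radius r are known. All names here are hypothetical; the skeleton's own normalOnSphere and reflect functions (see ToDo below) define the real interfaces.

```cpp
// Minimal sketch (hypothetical names): recover the sphere normal under the
// highlight, then mirror the viewing direction about it.
#include <cmath>

struct Vec3f { float x, y, z; };

// Unit normal on a sphere with projected center (cx,cy) and radius r, at
// image position (px,py). Depending on the image's y-axis convention, dy
// may need a sign flip.
Vec3f normalOnSphereSketch(float px, float py, float cx, float cy, float r) {
    float dx = (px - cx) / r;
    float dy = (py - cy) / r;
    float dz = std::sqrt(1.0f - dx * dx - dy * dy); // dx^2 + dy^2 + dz^2 = 1
    return Vec3f{dx, dy, dz};
}

// Mirror reflection of v about the unit normal n: r = 2 (n . v) n - v.
// With the orthographic viewing direction v = (0,0,1), the result points
// toward the light.
Vec3f reflectSketch(const Vec3f &v, const Vec3f &n) {
    float d = n.x * v.x + n.y * v.y + n.z * v.z;
    return Vec3f{2 * d * n.x - v.x, 2 * d * n.y - v.y, 2 * d * n.z - v.z};
}
```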
The appearance of diffuse objects can be modeled as

I = kd * (L . n)

where I is the pixel intensity, kd is the albedo, L is the lighting direction (a unit vector), and n is the unit surface normal. (Since our images are already balanced as described above, we can assume the incoming radiance from each light is 1.) Assuming a single color channel, we can rewrite this as

I = L . (kd * n) = L . g

so that the unknowns kd and n are bundled together in the single vector g = kd * n. With three or more image samples under different lighting, we can solve for g as a linear least squares problem. The objective function is:

Q = sum_i ( Ii - Li . g )^2
To help deal with shadows and noise in dark pixels, it's helpful to weight each term by the pixel intensity; in other words, multiply the residual for image i by Ii:

Q = sum_i ( Ii*Ii - Ii*(Li . g) )^2
The objective Q is then minimized with respect to g. Once we have the vector g = kd * n, its length gives the albedo, kd = |g|, and normalizing it gives the normal, n = g / |g|.
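As a sketch of how this weighted system might be assembled for one pixel (the class names follow the skeleton description later on this page, but treat the exact constructors and signatures as assumptions):

```cpp
// Hypothetical sketch: one row per image, weighted by the intensity Ii.
// Row i of M is Ii * Li; entry i of b is Ii * Ii.
CMatrix M(numImages, 3);          // numImages x 3 (assumed constructor)
CVector b(numImages);             // "CVector" is a stand-in name
for (int i = 0; i < numImages; i++) {
    float I = intensity[i];       // pixel value in image i (already linearized)
    const Vec3f &L = lightDir[i]; // calibrated lighting direction i
    M(i, 0) = I * L.x;
    M(i, 1) = I * L.y;
    M(i, 2) = I * L.z;
    b(i)    = I * I;
}
// Solving the 3x3 normal equations M^T M g = M^T b yields g = kd * n;
// then kd = |g| and n = g / |g|.
```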
This gives a way to get the normal and albedo for one color channel. Once we have a normal n for each pixel, we can solve for the albedos by another least squares solution. But this one ends up being a simple projection. The objective function is

Q = sum_i ( Ii - kd * (Li . n) )^2
To minimize it, differentiate with respect to kd, and set to zero:

dQ/dkd = sum_i -2 * (Li . n) * ( Ii - kd * (Li . n) ) = 0

which gives

kd = ( sum_i Ii * (Li . n) ) / ( sum_i (Li . n)^2 )
Writing Ji = Li . n, we can also write this more concisely as a ratio of dot products:

kd = (I . J) / (J . J)

where I = (I1, ..., In) and J = (J1, ..., Jn) collect the intensities and Ji values over all n images. This can be done for each channel independently to obtain a per-channel albedo.
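A minimal sketch of that projection for one channel at one pixel (again, variable names are hypothetical):

```cpp
// Hypothetical sketch: kd = (I . J) / (J . J), with Ji = Li . n.
float num = 0.0f, den = 0.0f;
for (int i = 0; i < numImages; i++) {
    const Vec3f &L = lightDir[i];
    float J = L.x * n.x + L.y * n.y + L.z * n.z; // Ji = Li . n
    num += intensity[i] * J;                     // accumulates I . J
    den += J * J;                                // accumulates J . J
}
float kd = (den > 0.0f) ? num / den : 0.0f;      // guard an all-dark pixel
```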
Next we'll have to find a surface which has these normals. If a surface having exactly these normals exists, we could recover it by path integration, as described in Forsyth and Ponce. In practice the estimated normals are noisy and may not be exactly integrable, so instead we will find the surface that best fits them in a least squares sense.
If the normals are perpendicular to the surface, then they'll be perpendicular to any vector on the surface. We can construct vectors on the surface using the edges that will be formed by neighbouring pixels in the height map. Consider a pixel (i,j) and its neighbour to the right. They will have an edge with direction
(i+1, j, z(i+1,j)) - (i, j, z(i,j)) = (1, 0, z(i+1,j) - z(i,j))
This vector is perpendicular to the normal n, which means its dot product with n will be zero:
(1, 0, z(i+1,j) - z(i,j)) . n = nx + nz * (z(i+1,j) - z(i,j)) = 0

Similarly, in the vertical direction:

(0, 1, z(i,j+1) - z(i,j)) . n = ny + nz * (z(i,j+1) - z(i,j)) = 0
We can construct similar constraints for all of the pixels which have neighbours, which gives us roughly twice as many constraints as unknowns (the z values). These can be written as the matrix equation M z = v. The least squares solution satisfies the normal equations

M^T M z = M^T v

However, unlike in the Lucas-Kanade algorithm, the matrix M^T M will be really, really big! It has as many rows and columns as there are pixels in your image. Even for a small image of 100x100 pixels, that matrix will have 10^8 entries!
Fortunately, most of the entries are zero, and there are some clever ways of representing such matrices and solving linear systems with them. We are providing you with code to do this; all you have to do is fill in the nonzero entries of the matrix M and the vector v.
We have provided you with a skeleton interface for your program; you will just need to fill in the code fragments for a few key computations.
The tabs at the bottom of the main window allow you to switch between looking at different representations of the input and output data you'll be working with. At first they are all grayed out because there is no data loaded or computed, but as the data becomes available the tabs will become active.
To load a set of source images into the interface, open up one of the provided txt files containing names of image files using "Load Source Images..." from the File menu. (You may need to edit a local copy of these text files, so that the paths to the image files are relative to the directory where you run the program.) Note that the "Sources" tab is now active. You can look at different images in the sequence by clicking the arrows on the "Show Image #" counter. You can also pan these images by right-mouse-dragging, and zoom and unzoom using Control - and Control +. (If you get "lost", press the 0 (zero) key to reset the view to its default setting.) Each image sequence also comes with a mask image to limit computation to "interesting" regions of the images.
To solve for normals, you first need to compute the lighting directions. Using the solution code, load in the file chrome.txt to see the images of a chrome sphere. Choose "Solve->Lighting Directions...". This will compute an estimate of the reflection directions for the lights visible in these images. All the images in this data set were captured using the same set of lighting directions, so after you've computed this once you can save this out to an ASCII file (using "File->Save Lighting Directions...") and reuse it for later computations.
Now load in another set of images, for example buddha.txt. Once you have the images and lighting directions you want to use for your normal estimation, choose "Solve->Normals". When the computation is complete, the "Normals" tab will become active. The initial view is a color-coded view of the normals, in which the x, y, and z components of the normal are represented using red, green and blue. You can switch to an alternate representation by choosing "Needle Map" from the "Show As:" popup menu in the "Normals" tab. Normals can be saved out to a binary data file using "File->Save Normals...". (This data file can be read back into the program later, but it's not viewable in any other program.)
Now that the normals have been computed, we can solve for the color albedos using "Solve->Albedos". Again, when the computation is complete you can see the results by clicking on the "Albedos" tab. You can load and save albedos as TGA files.
Finally, you can solve for depths using "Solve->Depths". This computation will take longer than the others. Like normals, you can save these to a binary data file. Once the depths have been computed (or loaded in from a binary data file), you can spin around in 3D using the left mouse button. You can also move the lights using the middle mouse button, or pan just like in the other modes, using the right mouse button. To apply the albedo map to this rendering, click in the "Use Albedos" checkbox. The resulting images should look a lot like the source images. But now you can change the lighting and viewpoint!
(For your coding convenience, the skeleton code also includes a Makefile which can be used to compile your code under Linux or Mac OS X 10.3 (Panther). The only external dependency is FLTK 1.1.4. However, we prefer that you provide a Windows executable for your turnin.)
We have provided six datasets of diffuse (or nearly diffuse) objects for you to try out: gray, buddha, horse, cat, owl, and rock.
There are four major components for you to implement, spread across the following functions:
TODO a) Complete the function "normalOnSphere" in LightDirs.cpp. Given an image position and the center and radius of the chrome sphere, compute the normal at that position.
TODO b) Complete the function "computeOneLight" in LightDirs.cpp. The goal is to find the center of the highlight, from which you can compute the lighting direction. First find the "center of brightness" using a weighted average of pixel locations, where each location is weighted by its intensity. This gives you an "average" of the brightest pixel locations. Then compute the normal at that location using normalOnSphere, and reflect the viewing direction about it (see the next item) to obtain the lighting direction.
TODO c) Complete the function "reflect" in Vec.h. This will allow you to reflect the viewing vector (0,0,1) about the normal to obtain the lighting direction.
TODO d) Complete the function "computeOneNormal" in Normals.cpp. Construct the matrix M and vector b given the description above, and the skeleton code will solve the 3x3 system M^T M g = M^T b. Then normalize and return the result. (Some pixels may not be solvable; for instance, a pixel that is black in every image, or non-black in only one or two images, may yield "NaN" values from solveLinearSystem. In that case, just return the vector (0,0,1).)
TODO e) Complete the function "computeOneAlbedo" in Albedos.cpp. Using the description above, compute J for each lighting direction, then sum up the numerator and denominator and divide.
TODO f) Complete the functions "makeMmatrix" and "makeVvector" in Depths.cpp. Using the description above, fill in the entries of the M matrix and v vector for the least squares approximation to the surface.
Note that the matrix M has one row for every constraint and one column for every pixel in the normal image. To translate from image locations to indices in this matrix, read pixels from the image "zind", which contains an integer index at each pixel. Pixels in zind with the value -1 are outside the image mask, and you should not create constraints for them; in other words, in Depths.cpp, pixels outside the image mask can be identified as zind.Pixel(i,j,0) == -1.
Also note that the object "normals" is of type NormalImage &, which is defined in ImageTypes.h: its elements are of type Vec3f, and it has only one channel. So to access the z component of the normal at pixel (i,j) in Depths.cpp, write normals.Pixel(i,j,0).z, NOT normals.Pixel(i,j,2); the latter may compile correctly but cause runtime errors.
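Putting the pieces together, the fill loop in Depths.cpp might look roughly like the sketch below. It combines makeMmatrix and makeVvector for brevity, and the row-numbering scheme (one counter incremented per emitted constraint) is just one reasonable choice; only the zind and normals accessors are taken from the description above.

```cpp
// Hypothetical sketch: one sparse row per edge constraint.
// Horizontal: nz * z(i+1,j) - nz * z(i,j) = -nx
// Vertical:   nz * z(i,j+1) - nz * z(i,j) = -ny
int row = 0;
for (int j = 0; j < height; j++) {
    for (int i = 0; i < width; i++) {
        int k = zind.Pixel(i, j, 0);
        if (k == -1) continue;                    // outside the mask
        const Vec3f &n = normals.Pixel(i, j, 0);  // one Vec3f per pixel

        if (i + 1 < width && zind.Pixel(i + 1, j, 0) != -1) {
            M(row, zind.Pixel(i + 1, j, 0)) =  n.z;
            M(row, k)                       = -n.z;
            v(row)                          = -n.x;
            row++;
        }
        if (j + 1 < height && zind.Pixel(i, j + 1, 0) != -1) {
            M(row, zind.Pixel(i, j + 1, 0)) =  n.z;
            M(row, k)                       = -n.z;
            v(row)                          = -n.y;
            row++;
        }
    }
}
```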
In the sample images directory, we have provided a few files computed by the sample solution, for comparison with your own solution and also so that you can work on these tasks independently. The files are:
We have provided you with matrix and vector classes which handle most of the heavy lifting for solving linear systems. You'll just have to fill in the entries appropriately. You can get and set matrix and vector elements using function notation: mat(i,j), where i is the row and j is the column, or vec(i). There is also a somewhat more compact Vec3 template for triples like normals and light directions: you can access its members as v.x, v.y, v.z.
There are actually two different arbitrary-size matrix classes. The first one, CMatrix, is used in solving for normals, and it's very straightforward. The second, CSparseMatrix, is used in solving for depths, and it's designed particularly for problems like this one in which the matrices are very large but most of the entries are zero. Only nonzero entries are stored, in an efficient data structure. Any entry which is not specifically set is assumed to have a value of 0. We mention this explicitly because your code should NOT loop through the rows and columns of this matrix setting all the values to 0! If you do it will actually be considerably less efficient than the vanilla CMatrix! So, the moral of the story is: Only set elements that have nonzero entries.
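For instance (constructor signatures here are assumptions; check the headers for the real ones):

```cpp
CMatrix A(rows, cols);        // dense; fine to write every entry
A(0, 0) = 1.0;                // function notation: (row, column)

CSparseMatrix M(rows, cols);  // sparse; unset entries are implicitly 0
M(r, c) = n.z;                // set ONLY the entries that are nonzero

Vec3f nrm = normals.Pixel(i, j, 0);
float nz = nrm.z;             // Vec3 members: .x, .y, .z
```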
1. Turn in the executable (psm.exe).
2. Turn in the code that you wrote (just the .cpp files you modified and any new files you needed).
3. In the artifact directory, turn in a web page containing snapshots of the following, for at least three different example datasets:
o RGB-encoded normals
o needle map
o albedo map
o two or more views of the reconstructed surface without albedos
o two or more views of the reconstructed surface with albedos
Also in your web page, include a discussion of your results. What worked and what didn't? Which reconstructions were successful and which were not? What are some problems with this method and how might you improve it?
Here are a few suggestions for extra credit.
Modify the objective function for surface reconstruction and rewrite the matrix accordingly. One idea is to include a "regularization" (smoothing) term. This can sometimes help reduce noise in your solutions. Another idea is to weight each edge constraint by some measure of confidence. For instance, you might want the constraints for dark pixels where the normal estimate is bad to be weighted less heavily than bright pixels where the estimate is good.
In our formulation of solving for normals, we weighted each term by the image intensity to reduce the influence of shadowed regions. This approach has the drawback of overweighting specular highlights where they appear. Can you come up with an alternate weighting scheme that avoids this? Can you apply it to solving for albedos as well? Depths?
Be creative! Do something we didn't think of...
Capture your own source images and reconstruct them. (See a TA or Prof. Seitz early if you want to do this; there are a number of "gotchas", but it can be fun!)
Implement example-based photometric stereo, as described in class. We can provide some sample datasets if you're interested in this.