Computer Vision (CSE P576), Spring 2015

Project 3: 3D Reconstruction from Photometric Stereo

Assigned: Wednesday, May 6th, 2015

Due: Wednesday, May 20th, 2015 (by 6:30pm)

Project Head TA: Ricardo Martin (send your questions here first!)

Project Secondary TA: Igor Mordatch

Example of reconstruction using "Buddha" image set. From left to right: source image (1 of 12), normal vectors, albedo map, depth map, and a reconstructed view.

In this project, you will implement an algorithm to construct a height field from a series of 2D images. These images are taken of a (mostly) diffuse object under a series of point light sources. Your software will be able to calibrate the lighting directions, find the best-fit normal and albedo at each pixel, and then find a surface which best matches the solved normals.

To start your project, you will be supplied with some test images and skeleton code you can use as the basis of your project, and a sample solution executable you can use to compare with your program.

Project FILES

The files needed for this project include:

It should be possible to develop in other environments if you prefer. However, you are responsible for making sure that your project turn-ins work in the official development environment. Bugs created by incompatibilities with the testing environment are still your bugs.

Project Description

Calibration

Before we can extract normals from images, we have to calibrate our capture setup. This includes determining the lighting intensity and direction, as well as the camera response function. For this project, we have already taken care of two of these tasks: First, we have linearized the camera response function, so you can treat pixel values as intensities. Second, we have balanced all the light sources to each other, so that they all appear to have the same brightness. You'll be solving for albedos relative to this brightness, which you can just assume is 1 in some arbitrary units. In other words, you don't need to think too much about the intensity of the light sources.

The one calibration step we have left for you is determining the lighting directions. One method of determining the direction of point light sources is to photograph a shiny chrome sphere in the same location as all the other objects. Since we know the shape of this object, we can determine the normal at any given point on its surface, and therefore we can also compute the reflection direction for the brightest spot on the surface.

Normals from images

The appearance of diffuse objects can be modeled as I = kd (L . n), where I is the pixel intensity, kd is the albedo, L is the lighting direction (a unit vector), and n is the unit surface normal. (Since our images are already balanced as described above, we can assume the incoming radiance from each light is 1.) Assuming a single color channel, we can rewrite this as I = L . g, where g = kd * n, so the unknowns are together in a single vector. With three or more image samples taken under different lighting directions Li, we can solve for g as a linear least squares problem. The objective function is:

Q = sum_i ( Ii - Li . g )^2

To help deal with shadows and noise in dark pixels, it's helpful to weight each equation by the pixel intensity; in other words, multiply both sides of Ii = Li . g by Ii:

Q = sum_i ( Ii ( Ii - Li . g ) )^2

The objective Q is then minimized with respect to g. Once we have the vector g = kd * n, the length of the vector is kd and the normalized direction gives n.
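For concreteness, here is a minimal sketch of this per-pixel computation. The names numImages, I[], and L[] are hypothetical stand-ins; it forms the weighted 3x3 normal equations A g = b directly, whereas the skeleton has you fill in M and b and calls solveLinearSystem for you, so treat this as a sketch of the structure rather than the skeleton's exact API:

// Sketch: solve for g = kd * n at one pixel.
// I[i] is this pixel's intensity in image i; L[i] is the unit light
// direction for image i. Each equation Ii = Li . g is weighted by Ii.
double A[3][3] = {{0.0}};   // accumulates M^T M
double b[3] = {0.0};        // accumulates M^T (weighted intensities)
for (int i = 0; i < numImages; i++) {
    double w = I[i];        // intensity weight downplays shadowed samples
    double row[3] = { w * L[i].x, w * L[i].y, w * L[i].z };
    for (int r = 0; r < 3; r++) {
        for (int c = 0; c < 3; c++)
            A[r][c] += row[r] * row[c];
        b[r] += row[r] * (w * I[i]);
    }
}
// Solve the 3x3 system A g = b, then: kd = length(g), n = g / kd.
// If the solve yields NaN (see the notes below), fall back to (0,0,1).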

Solving for color albedo

This gives a way to get the normal and albedo for one color channel. Once we have a normal n for each pixel, we can solve for the albedos by another least squares solution. The objective function is:

Q = sum_i ( Ii - kd ( Li . n ) )^2

To minimize it, differentiate with respect to kd, and set to zero:

dQ/dkd = sum_i -2 ( Li . n ) ( Ii - kd ( Li . n ) ) = 0

Solving for kd:

kd = ( sum_i Ii ( Li . n ) ) / ( sum_i ( Li . n )^2 )
Writing Ji = Li . n, we can also write this more concisely as a ratio of dot products: kd = (I . J) / (J . J), where I and J are the vectors whose i-th entries are Ii and Ji. This can be done for each channel independently to obtain a per-channel albedo.
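A sketch of this computation, again with hypothetical names (I[i] and L[i] are this pixel's intensities and light directions for one channel, n the solved unit normal):

// Sketch: per-channel albedo as a ratio of dot products.
double num = 0.0, den = 0.0;
for (int i = 0; i < numImages; i++) {
    double J = L[i].x * n.x + L[i].y * n.y + L[i].z * n.z;  // Ji = Li . n
    num += I[i] * J;   // accumulates I . J
    den += J * J;      // accumulates J . J
}
double kd = (den > 0.0) ? num / den : 0.0;  // guard degenerate pixels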

Least-squares surface fitting

Next we'll have to find a surface which has these normals, assuming such a surface exists. We will again use a least-squares technique to find the surface that best fits these normals. Here's one way of posing this problem as a least squares optimization.

If the normals are perpendicular to the surface, then they'll be perpendicular to any vector on the surface. We can construct vectors on the surface using the edges that will be formed by neighbouring pixels in the height map. Consider a pixel (i,j) and its neighbour to the right. They will have an edge with direction:

(i+1, j, z(i+1,j)) - (i, j, z(i,j)) = (1, 0, z(i+1,j) - z(i,j))

This vector is perpendicular to the normal n, which means its dot product with n will be zero:

(1, 0, z(i+1,j) - z(i,j)) . n = 0

Similarly, in the vertical direction:

(0, 1, z(i,j+1) - z(i,j)) . n = 0

We can construct similar constraints for all of the pixels which have neighbours, which gives us roughly twice as many constraints as unknowns (the z values). Expanding the dot products, the two constraints for pixel (i,j) are nz (z(i+1,j) - z(i,j)) = -nx and nz (z(i,j+1) - z(i,j)) = -ny. These can be written as the matrix equation Mz = v. The least squares solution solves the normal equations M^T M z = M^T v. However, the matrix M^T M will still be really big! It will have as many rows and columns as there are pixels in your image. Even for a small image of 100x100 pixels, the matrix will have 10^8 entries!

Fortunately, most of the entries are zero, and there are some clever ways of representing such matrices and solving linear systems with them. We are providing you with code to do this; all you have to do is fill in the nonzero entries of the matrix M and the vector v.

Program Operation

When running the program, the tabs at the bottom of the main window allow you to switch between looking at different representations of the input and output data you'll be working with. At first they are all grayed out because there is no data loaded or computed, but as the data becomes available the tabs will become active.

To load a set of source images into the interface, open up one of the provided .txt files containing names of image files using "Load Source Images..." from the File menu. (You may need to edit a local copy of these text files, so that the paths to the image files are relative to the directory where you run the program.) Note that the "Sources" tab is now active. You can look at different images in the sequence by clicking the arrows on the "Show Image #" counter. You can also pan these images by right-mouse-dragging, and zoom and unzoom using Control - and Control +. (If you get "lost", press the 0 (zero) key to reset the view to its default setting.) Each image sequence also comes with a mask image to limit computation to "interesting" regions of the images.

To solve for normals, you first need to compute the lighting directions. Using the solution code, load in the file chrome.txt to see the images of a chrome sphere. Choose "Solve->Lighting Directions...". This will compute an estimate of the reflection directions for the lights visible in these images. All the images in this data set were captured using the same set of lighting directions, so after you've computed this once you can save this out to an ASCII file (using "File->Save Lighting Directions...") and reuse it for later computations.

Now load in another set of images, for example buddha.txt. Once you have the images and lighting directions you want to use for your normal estimation, choose "Solve->Normals". When the computation is complete, the "Normals" tab will become active. The initial view is a color-coded view of the normals, in which the x, y, and z components of the normal are represented using red, green and blue. You can switch to an alternate representation by choosing "Needle Map" from the "Show As:" popup menu in the "Normals" tab. Normals can be saved out to a binary data file using "File->Save Normals...". (This data file can be read back into the program later, but it's not viewable in any other program.)

Now that the normals have been computed, we can solve for the color albedos using "Solve->Albedos". Again, when the computation is complete you can see the results by clicking on the "Albedos" tab. You can load and save albedos as .tga files.

Finally, you can solve for depths using "Solve->Depths". This computation will take longer than the others. Like normals, you can save these to a binary data file. Once the depths have been computed (or loaded in from a binary data file), you can spin around in 3D using the left mouse button. You can also move the lights using the middle mouse button, or pan just like in the other modes, using the right mouse button. To apply the albedo map to this rendering, click in the "Use Albedos" checkbox. The resulting images should look a lot like the source images. But now you can change the lighting and viewpoint!

We have provided six different datasets of diffuse (or nearly diffuse) objects for you to try out:

gray

buddha

horse

cat

bowl

rock

Note that the code runs *MUCH* faster in Release mode.

TO DO

We have provided you with a project skeleton for your program. You will just need to fill in the code fragments for a few key computations.

In the sample images directory, we have provided a few files computed by the sample solution, for comparison with your own solution and also so that you can work on these tasks independently. The files are:

If you want to open the depth map elsewhere:

With appropriate scaling and resampling, it will give results like the one shown at right.

There are four major sections for you to write:

Lighting Calibration

Complete the function "normalOnSphere" in LightDirs.cpp. Given an image position and the center and radius of the chrome sphere, compute the normal at that position.

Complete the function "computeOneLight" in LightDirs.cpp. The goal is to find the center of the highlight, from which you can compute the lighting direction. First find the "center of brightness" of the pixels using a weighted average of pixel intesities, where each pixel location is weighted by its brightness. This gives you an "average" of the brightest pixel locations.

Complete the function "reflect" in Vec.h. This will allow you to reflect the viewing vector (0,0,1) about the normal to obtain the lighting direction.

Solve for Normals

Complete the function "computeOneNormal" in Normals.cpp. Construct the matrix M and vector b given the description above, and the skeleton code will solve the 3x3 system M^T M g = M^T b. Then normalize and return the result. Some pixels may not be solvable; for instance if a pixel is black in every image, or is non-black in only one or two images. In this case you may get "NaN" values from solveLinearSystem. In this case, just return the vector (0,0,1).

Solve for Albedos

Complete the function "computeOneAlbedo" in Albedos.cpp. Using the description above, compute J for each lighting direction, then sum up the numerator and denominator and divide.

Solve for Depths

Complete the functions "makeMmatrix" and "makeVvector" in Depths.cpp. Using the description above, fill in the entries of the M matrix and v vector for the least squares approximation to the surface.

Note that the matrix M has one row for every constraint, and one column for every pixel in the normal image. To translate from image locations to indices in this matrix, read pixels from the image "zind", which contains an integer index at each pixel. Pixels in zind with the value -1 are outside the image mask, and you should not create constraints for them. In other words, in Depths.cpp, pixels outside the image mask can be identified as zind.Pixel(i,j,0) == -1.
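As a sketch of filling one pixel's horizontal constraint (row is a running constraint counter, M the CSparseMatrix, v the vector; bounds checks are omitted and the exact sign conventions depend on your coordinate setup):

// Sketch: horizontal constraint nz * (z(i+1,j) - z(i,j)) = -nx,
// from (1, 0, z(i+1,j) - z(i,j)) . n = 0.
Vec3f nrm = normals.Pixel(i, j, 0);
int here  = zind.Pixel(i, j, 0);
int right = zind.Pixel(i + 1, j, 0);
if (here != -1 && right != -1) {
    M(row, here)  = -nrm.z;  // coefficient of z(i,j)
    M(row, right) =  nrm.z;  // coefficient of z(i+1,j)
    v(row) = -nrm.x;
    row++;  // note: only nonzero entries are set (see below)
}
// The vertical constraint for the neighbour below is analogous,
// with v(row) = -nrm.y.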

Also note that the object "normals" is of type NormalImage &, which is defined in ImageTypes.h: its elements are of type Vec3f, and it has only one channel. So to access the z component of the normal at pixel (i,j), write normals.Pixel(i,j,0).z, NOT normals.Pixel(i,j,2). The latter may compile correctly but will cause runtime errors.

Notes on Matrix classes

We have provided you with matrix and vector classes which handle most of the heavy lifting for solving linear systems. You'll just have to fill in the entries appropriately. You can get and set matrix and vector elements using function notation: mat(i,j), where i is the row and j is the column, or vec(i). There is also a somewhat more compact Vec3 template for triples like normals and light directions: you can access its members as v.x, v.y, v.z.

There are actually two different arbitrary-size matrix classes. The first one, CMatrix, is used in solving for normals, and it's very straightforward. The second, CSparseMatrix, is used in solving for depths, and it's designed particularly for problems like this one in which the matrices are very large but most of the entries are zero. Only nonzero entries are stored, in an efficient data structure. Any entry which is not specifically set is assumed to have a value of 0. We mention this explicitly because your code should NOT loop through the rows and columns of this matrix setting all the values to 0! If you do, it will actually be considerably less efficient than the vanilla CMatrix! So, the moral of the story is: only set elements that have nonzero entries.

Solving for some normals may sometimes result in a NaN value, such as when the determinant of the matrix passed to Solve3x3 is zero. This value may appear when using calculated normals to compute the depth.

The following code will allow you to check whether a variable is NaN. NaN is the only floating-point value that compares unequal to itself:

float variable;            // assigned by some earlier computation
if (variable != variable) {
    // This is NaN (Not a Number)
} else {
    // This is an ordinary number
}

Artifact / Write-Up

Your write-up should include results for at least three of the example data sets (excluding the "chrome" calibration set). Each data set should include the following:

Your web page should include a discussion of your results. What worked and what didn't? Which reconstructions were successful and which were not? What are some problems with this method and how might you improve it?

The main page of your web documentation must be called index.html and must be located in your turnin/artifact folder. You should not add additional folders unless one is created for you by a WYSIWYG HTML editor.

EXTRA CREDIT

To claim extra credit, your write-up should include a section which describes what you did for each extension, demonstrates your results, and explains why it is a useful or interesting extension of the base program.

Here are a few suggestions:

Capture your own source images and reconstruct them. See a TA or Prof. Seitz early if you want to do this; there are a number of "gotchas", but it can be fun!

Do 3D reconstruction using facial images, either from the internet or from your own collection.

Implement example-based photometric stereo. We can provide some sample datasets if you're interested in this.

Be creative! Come up with a project extension we didn't think of.

Links