Computer Vision (CSE P576), Spring 2015
Project 3: 3D Reconstruction from Photometric Stereo
Assigned: Wednesday, May 6th, 2015
Due: Wednesday, May 20th, 2015 (by 6:30pm)
Project Head TA: Ricardo Martin (send your questions here first!)
Project Secondary TA: Igor Mordatch
Example of reconstruction using the "Buddha" image set. From left to right: source image (1 of 12), normal vectors, albedo map, depth map, and a reconstructed view.
In this
project, you will implement an algorithm to construct a height field from a
series of 2D images. These images are taken of a (mostly) diffuse object under
a series of point light sources. Your software will be able to calibrate the
lighting directions, find the best fit normal and albedo at each pixel, then
find a surface which best matches the solved normals.
To start
your project, you will be supplied with some test images and skeleton code you
can use as the basis of your project, and a sample solution executable you can
use to compare with your program.
Project FILES
The
files needed for this project include:
It
should be possible to develop in other environments if you prefer. However, you
are responsible for making sure that your project turn-ins work in the official
development environment. Bugs created by incompatibilities
with the testing environment are still your bugs.
Project
Description
Before
we can extract normals from images, we have to
calibrate our capture setup. This includes determining the lighting intensity
and direction, as well as the camera response function. For this project, we
have already taken care of two of these tasks: First, we have linearized the
camera response function, so you can treat pixel values as intensities. Second,
we have balanced all the light sources to each other, so that they all appear
to have the same brightness. You'll be solving for albedos relative to this
brightness, which you can just assume is 1 in some arbitrary units. In other
words, you don't need to think too much about the intensity of the light sources.
The one
remaining calibration step we have left for you is calibrating lighting
directions. One method of determining the direction of point light sources is
to photograph a shiny chrome sphere in the same location as all the other
objects. Since we know the shape of this object, we can determine the normal at
any given point on its surface, and therefore we can also compute the
reflection direction for the brightest spot on the surface.
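Concretely, with the viewing direction v = (0, 0, 1) and the unit normal n at the brightest pixel, the standard mirror-reflection identity L = 2 * (n . v) * n - v gives the lighting direction.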
The appearance of diffuse objects can be modeled as

I = kd * (L . n)

where I is the pixel intensity, kd is the albedo, L is the lighting direction (a unit vector), and n is the unit surface normal. (Since our images are already balanced as described above, we can assume the incoming radiance from each light is 1.) Assuming a single color channel, we can rewrite this as

Ii = Li . g, where g = kd * n

so the unknowns are gathered into the single vector g. With three or more image samples under different lighting, we can solve for the product g by solving a linear least squares problem. The objective function is:

Q = sum_i (Ii - Li . g)^2
To help deal with shadows and noise in dark pixels, it's helpful to weight each constraint by the pixel intensity: in other words, multiply both sides by Ii:

Q = sum_i (Ii^2 - Ii * (Li . g))^2
The objective Q is then minimized with respect to g. Once we have the vector g = kd * n, its length gives the albedo, kd = |g|, and normalizing it gives the unit normal, n = g / |g|.
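As a concrete illustration, here is a minimal sketch of building the 3x3 weighted normal equations for one pixel. The Vec3f struct and function name are stand-ins, not the skeleton's actual types; in your implementation you would use the provided matrix classes and solveLinearSystem.

#include <cstddef>
#include <vector>

struct Vec3f { float x, y, z; };  // stand-in for the skeleton's Vec3f

// Build the 3x3 normal equations A*g = b for the weighted objective:
// each image sample contributes the row (Ii * Li) with right-hand side Ii^2.
// A and b are assumed to be zero-initialized by the caller.
void buildNormalEquations(const std::vector<float>& I,
                          const std::vector<Vec3f>& L,
                          float A[3][3], float b[3])
{
    for (std::size_t i = 0; i < I.size(); ++i) {
        const float w[3] = { I[i] * L[i].x, I[i] * L[i].y, I[i] * L[i].z };
        for (int r = 0; r < 3; ++r) {
            for (int c = 0; c < 3; ++c)
                A[r][c] += w[r] * w[c];   // accumulate (Ii*Li)(Ii*Li)^T
            b[r] += w[r] * I[i] * I[i];   // accumulate (Ii*Li) * Ii^2
        }
    }
    // Solving A*g = b (e.g. with the provided 3x3 solver) gives g = kd * n;
    // then kd = |g| and n = g / |g|.
}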
This gives a way to get the normal and albedo for one color channel. Once we have a normal n for each pixel, we can solve for the albedos by another least squares solution. The objective function is:

Q = sum_i (Ii - kd * (Li . n))^2

To minimize it, differentiate with respect to kd and set to zero:

kd = sum_i Ii * (Li . n) / sum_i (Li . n)^2

Writing Ji = Li . n, we can also write this more concisely as a ratio of dot products:

kd = (I . J) / (J . J)

This can be done for each channel independently to obtain a per-channel albedo.
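A minimal sketch of this computation, again with stand-in types rather than the skeleton's own:

#include <cstddef>
#include <vector>

struct Vec3f { float x, y, z; };  // stand-in for the skeleton's Vec3f

// Per-channel albedo as a ratio of dot products: kd = (I . J) / (J . J),
// where Ji = Li . n for each lighting direction Li.
float computeAlbedo(const std::vector<float>& I,
                    const std::vector<Vec3f>& L,
                    const Vec3f& n)
{
    float num = 0.0f, den = 0.0f;
    for (std::size_t i = 0; i < I.size(); ++i) {
        const float J = L[i].x * n.x + L[i].y * n.y + L[i].z * n.z; // Ji = Li . n
        num += I[i] * J;   // I . J
        den += J * J;      // J . J
    }
    return den > 0.0f ? num / den : 0.0f; // guard against degenerate pixels
}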
Next
we'll have to find a surface which has these normals,
assuming such a surface exists. We will again use a least-squares technique to
find the surface that best fits these normals. Here's
one way of posing this problem as a least squares optimization.
If the normals are perpendicular to the surface, then they'll be
perpendicular to any vector on the surface. We can construct vectors on the
surface using the edges that will be formed by neighbouring
pixels in the height map. Consider a pixel (i,j) and its neighbour to
the right. They will have an edge with direction:
(i+1, j, z(i+1,j)) - (i, j, z(i,j)) = (1, 0, z(i+1,j) - z(i,j))
This
vector is perpendicular to the normal n, which means its dot product with n
will be zero:
(1, 0, z(i+1,j) - z(i,j)) . n = 0
Similarly, in the vertical direction:

(0, 1, z(i,j+1) - z(i,j)) . n = 0
We can construct similar constraints for all of the pixels which have neighbours, which gives us roughly twice as many constraints as unknowns (the z values). These can be written as the matrix equation Mz = v. The least squares solution solves the normal equations

M^T M z = M^T v

However, the matrix M^T M will still be really big! It has as many rows and columns as there are pixels in your image. Even for a small image of 100x100 pixels, that matrix will have 10^8 entries!
Fortunately,
most of the entries are zero, and there are some clever ways of representing
such matrices and solving linear systems with them. We are providing you with
code to do this, all you have to do is fill in the nonzero entries of the
matrix M and the vector v.
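As a concrete illustration of the nonzero pattern: for a pixel p = (i,j) with right neighbour q = (i+1,j), expanding the horizontal constraint (1, 0, z(q) - z(p)) . n = 0 gives

nz * z(q) - nz * z(p) = -nx

so the corresponding row r of M has exactly two nonzero entries, -nz in column p and nz in column q, with right-hand side v(r) = -nx.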
Program Operation
When
running the program, the tabs at the bottom of the main window allow you to
switch between looking at different representations of the input and output
data you'll be working with. At first they are all grayed out because there is
no data loaded or computed, but as the data becomes available the tabs will
become active.
To load
a set of source images into the interface, open up one of the provided .txt
files containing names of image files using "Load Source Images..."
from the File menu. (You may need to edit a local copy of these text files, so
that the paths to the image files are relative to the directory where you run
the program.) Note that the "Sources" tab is now active. You can look
at different images in the sequence by clicking the arrows on the "Show
Image #" counter. You can also pan these images by right-mouse-dragging,
and zoom in and out using Control - and Control +.
(If you get "lost", press the 0 (zero) key to reset the view to its
default setting.) Each image sequence also comes with a mask image to limit
computation to "interesting" regions of the images.
To solve
for normals, you first need to compute the lighting
directions. Using the solution code, load in the file chrome.txt to see the
images of a chrome sphere. Choose "Solve->Lighting Directions...". This will compute an estimate of the reflection
directions for the lights visible in these images. All the images in this data
set were captured using the same set of lighting directions, so after you've
computed this once you can save this out to an ASCII file (using
"File->Save Lighting Directions...") and reuse it for later
computations.
Now load
in another set of images, for example buddha.txt. Once you have the images and
lighting directions you want to use for your normal estimation, choose
"Solve->Normals". When the computation
is complete, the "Normals" tab will become
active. The initial view is a color-coded view of the normals,
in which the x, y, and z components of the normal are represented using red,
green and blue. You can switch to an alternate representation by choosing
"Needle Map" from the "Show As:" popup menu in the "Normals" tab. Normals can be
saved out to a binary data file using "File->Save Normals...". (This data file can be read back into the program
later, but it's not viewable in any other program.)
Now that
the normals have been computed, we can solve for the
color albedos using "Solve->Albedos". Again, when the computation
is complete you can see the results by clicking on the "Albedos" tab.
You can load and save albedos as TGA files.
Finally,
you can solve for depths using "Solve->Depths". This computation
will take longer than the others. Like normals, you
can save these to a binary data file. Once the depths have been computed (or
loaded in from a binary data file), you can spin around in 3D using the left
mouse button. You can also move the lights using the middle mouse button, or
pan just like in the other modes, using the right mouse button. To apply the
albedo map to this rendering, click in the "Use Albedos" checkbox.
The resulting images should look a lot like the source images. But now you can
change the lighting and viewpoint!
We have
provided six different datasets of diffuse (or nearly diffuse) objects for you
to try out:
gray
buddha
horse
cat
bowl
rock
Note
that the code runs *MUCH* faster in Release mode.
TO DO
We have
provided you with a project skeleton for your program. You will just need to
fill in the code fragments for a few key computations.
In the sample
images directory, we have provided a few files computed by the sample solution,
for comparison with your own solution and also so that you can work on these
tasks independently. The files are:
If you
want to open the depth map elsewhere:
With
appropriate scaling and resampling, it will give results like the one shown at
right.
There are four major sections for you to write, covering lighting directions, normals, albedos, and depths:
Complete
the function "normalOnSphere" in
LightDirs.cpp. Given an image position and the center and radius of the chrome
sphere, compute the normal at that position.
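For this step, here is a minimal sketch with a stand-in Vec3f and assumed argument order; note that the sign of ny depends on the image's y-axis convention, so check against the skeleton:

#include <cmath>

struct Vec3f { float x, y, z; };  // stand-in for the skeleton's Vec3f

// Normal on the camera-facing half of the sphere at image position (x, y),
// given the sphere's projected center (cx, cy) and radius r.
Vec3f normalOnSphere(float x, float y, float cx, float cy, float r)
{
    const float nx = (x - cx) / r;
    const float ny = (y - cy) / r;
    const float nz2 = 1.0f - nx * nx - ny * ny;
    const float nz = nz2 > 0.0f ? std::sqrt(nz2) : 0.0f; // visible surface has nz >= 0
    return Vec3f{nx, ny, nz};
}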
Complete
the function "computeOneLight" in
LightDirs.cpp. The goal is to find the center of the highlight, from which you
can compute the lighting direction. First find the "center of brightness" of the pixels using a weighted average of pixel locations, where each location is weighted by its brightness. This gives you an "average" of the brightest pixel locations.
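A minimal sketch of this computation; the intensity accessor is hypothetical, and in the skeleton you would read pixel values (and respect the mask) through its image classes:

#include <functional>

// Weighted "center of brightness": an average of pixel positions, with each
// position weighted by its brightness. 'intensity(x, y)' is a stand-in for
// however the skeleton exposes pixel values.
void centerOfBrightness(int width, int height,
                        const std::function<float(int, int)>& intensity,
                        float& cx, float& cy)
{
    double sx = 0.0, sy = 0.0, sw = 0.0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const double w = intensity(x, y); // weight = pixel brightness
            sx += w * x;
            sy += w * y;
            sw += w;
        }
    cx = static_cast<float>(sx / sw);  // highlight center (x)
    cy = static_cast<float>(sy / sw);  // highlight center (y)
}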
Complete
the function "reflect" in Vec.h. This will
allow you to reflect the viewing vector (0,0,1) about
the normal to obtain the lighting direction.
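A minimal sketch of the reflection, with a stand-in Vec3f:

struct Vec3f { float x, y, z; };  // stand-in for the skeleton's Vec3f

// Mirror v about the unit normal n: r = 2 * (n . v) * n - v.
Vec3f reflect(const Vec3f& v, const Vec3f& n)
{
    const float d = n.x * v.x + n.y * v.y + n.z * v.z;  // n . v
    return Vec3f{ 2*d*n.x - v.x, 2*d*n.y - v.y, 2*d*n.z - v.z };
}

With v = (0, 0, 1), the result is the lighting direction for that image.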
Complete
the function "computeOneNormal" in
Normals.cpp. Construct the matrix M and vector b given the description above,
and the skeleton code will solve the 3x3 system M^T M g = M^T b. Then normalize
and return the result. Some pixels may not be solvable: if a pixel is black in every image, or is non-black in only one or two images, solveLinearSystem may return "NaN" values. In that case, just return the vector (0,0,1).
Complete
the function "computeOneAlbedo" in
Albedos.cpp. Using the description above, compute J for each lighting
direction, then sum up the numerator and denominator and divide.
Complete
the functions "makeMmatrix" and "makeVvector" in Depths.cpp. Using the description
above, fill in the entries of the M matrix and v vector for the least squares
approximation to the surface.
Note
that the matrix M has one row for every constraint, and one column for every
pixel in the normal image. To translate from image locations to indices in this
matrix, read pixels from the image "zind",
which contains an integer index at each pixel. Pixels in zind
with the value -1 are outside the image mask, and you should not create
constraints for them. In other words, in Depths.cpp, pixels outside the image
mask can be identified as zind.Pixel(i,j,0) == -1.
Also
note that the object "normals" is of type NormalImage &, which is defined in ImageTypes.h:
its elements are of type Vec3f, and it has only one channel. So to access the z
component of the normal at pixel i,j,
you would write: normals.Pixel(i,j,0).z . In other
words, in Depths.cpp, to get the z component of a normal, do normals.Pixel(i,j,0).z, NOT normals.Pixel(i,j,2).
The latter may compile correctly but cause runtime errors.
We have
provided you with matrix and vector classes which handle most of the heavy
lifting for solving linear systems. You'll just have to fill in the entries
appropriately. You can get and set matrix and vector elements using function
notation: mat(i,j), where i is the row and j is the column, or vec(i). There is also a somewhat more compact Vec3 template for
triples like normals and light directions: you can
access its members as v.x, v.y,
v.z.
There
are actually two different arbitrary-size matrix classes. The first one, CMatrix, is used in solving for normals,
and it's very straightforward. The second, CSparseMatrix,
is used in solving for depths, and it's designed particularly for problems like
this one in which the matrices are very large but most of the entries are zero.
Only nonzero entries are stored, in an efficient data structure. Any entry
which is not specifically set is assumed to have a value of 0. We mention this
explicitly because your code should NOT loop through the rows and columns of
this matrix setting all the values to 0! If you do, it will actually be considerably less efficient than the vanilla CMatrix! So, the moral of the story is: only set elements that have nonzero entries.
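Putting the pieces together, here is a sketch (a fragment, not a complete function) of emitting the two constraint rows for one pixel. It assumes M, v, zind, normals, and a running row counter are in scope, uses the accessor notation described above, and omits bounds checks at the image edges; treat the exact setter syntax for CSparseMatrix as an assumption to verify against the skeleton.

const int p  = zind.Pixel(i,     j,     0);  // column index of this pixel
const int pr = zind.Pixel(i + 1, j,     0);  // horizontal neighbour
const int pu = zind.Pixel(i,     j + 1, 0);  // vertical neighbour
const Vec3f n = normals.Pixel(i, j, 0);

if (p != -1 && pr != -1) {  // (1, 0, z(i+1,j) - z(i,j)) . n = 0
    M(row, p)  = -n.z;      // => n.z * z(i+1,j) - n.z * z(i,j) = -n.x
    M(row, pr) =  n.z;
    v(row)     = -n.x;
    ++row;
}
if (p != -1 && pu != -1) {  // (0, 1, z(i,j+1) - z(i,j)) . n = 0
    M(row, p)  = -n.z;      // => n.z * z(i,j+1) - n.z * z(i,j) = -n.y
    M(row, pu) =  n.z;
    v(row)     = -n.y;
    ++row;
}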
Solving for normals can produce NaN values, for example when the determinant of the matrix passed to Solve3x3 is zero. These NaNs can then propagate when the computed normals are used to solve for depths.
The following code will allow you to check whether a variable is NaN; the self-comparison works because NaN is the only value that compares unequal to itself.

float variable; // some previously computed value
if (variable != variable) {
    // This is NaN (Not a Number)
} else {
    // This is a number
}
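If your compiler supports C++11, std::isnan from <cmath> performs the same check:

#include <cmath>

if (std::isnan(variable)) {
    // This is NaN
}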
Artifact / Write-Up
Your
write-up should include results for at least three of the example data sets
(excluding the "chrome" calibration set). Each data set should
include the following:
Your web
page should include a discussion of your results. What worked and what didn't?
Which reconstructions were successful and which were not? What are some
problems with this method and how might you improve it?
The main
page of your web documentation must be called index.html and must be located in your turnin/artifact folder. You should not add additional
folders unless one is created for you by a WYSIWYG HTML editor.
EXTRA CREDIT
To claim
extra credit, your write-up should include a section that describes what you did for each extension, demonstrates your results, and explains why it is a useful or interesting extension of the base program.
Here are
a few suggestions:
Capture your own source images and reconstruct them. See a TA or Prof. Seitz early if you want to do this; there are a number of "gotchas", but it can be fun!
Do 3D reconstruction using facial images, either from
the internet or from your own collection.
Implement example-based
photometric stereo. We can provide some sample datasets if you're
interested in this.
Be
creative! Come up with a project extension we didn't think of.
Links