PART I. (Written answers).
- Word processing (20 points).
In Kleinberg and Tardos, Chapter 6, Exercise 6.
-
Longest Common Subsequences
(15 points).
The Longest Common Subsequence (LCS) of two strings:
Sa = sa1 sa2 ... sam
and
Sb = sb1 sb2 ... sbn
is a string
Sc = LCS(Sa, Sb) = sc1 sc2 ... scq
such that Sc is a subsequence of Sa, and Sc is a subsequence of Sb,
and for any other common subsequence Sc' of Sa and Sb,
length(Sc) >= length(Sc').
It is possible to compute the length of an LCS using Dynamic Programming.
(a) give a recurrence equation for computing M(Prefix(Sa, i), Prefix(Sb, j)) which is defined as
the length of a longest common subsequence of Prefix(Sa, i) and Prefix(Sb, j).
Here Prefix(S, k) is defined as the string consisting of the first k characters of S.
(b) Draw the 2D array of values for M(Prefix(Sa, i), Prefix(Sb, j)) for the following
pair of strings.
Sa = A L K W A R I Z M I
Sb = A L G O R I T H M
(c) Draw a backtrace on the array that shows how to recover LCS(Sa, Sb) from the M array.
Circle the entries of the array M that correspond to one particular LCS.
- Stereo Vision (15 points).
In this exercise, you'll apply dynamic programming to the problem
of stereo computer vision.
The full stereo computer vision problem consists of taking two
images of a scene (a left-eye image and a right-eye image, actually
taken with a camera or two cameras), and computing a depth map
from them, using the slight differences between the images.
We will focus on a small but key part of this problem: the problem
of taking a pair of scanlines (one from each image) and determining
the correspondence between pixels. For example, if the scene includes
a human face, there should be a pixel at the lower-left corner of
the person's nose in each of the images, and assuming that the
images are aligned properly, the two pixels should occur in corresponding
scanlines (at the same height in the images). The two pixels will probably
be in slightly different horizontal positions due to the difference
in the left and right views. The pixels can be expected to have approximately
the same brightness value. We want our method to put these two pixels into
correspondence (along with many other corresponding pairs).
Our procedure should accept as input two lists of numbers. The first
will represent the pixel values along the left scanline, and the
second will be for the right scanline. It should compute a minimum-cost
correspondence, which for each pixel of the scanlines gives it either
a displacement (to its corresponding pixel in the other scanline) or
marks it as "occluded" which means it has no corresponding pixel in the
other scanline.
Here is an example:
Left scanline L: [7, 6, 0, 12, 12, 4]
Right scanline R: [7, 0, 12, 13, 11, 4, 3]
With Dynamic Programming, we build an m by n array to hold the M values of the
subproblems. Each subproblem is of the form S(i,j) = minimum cost
of a correspondence between the first i pixels of L and the first
j pixels of R.
We assume that the cost of a correspondence is based on the following:
if pixel pi is matched with pixel pj, the cost for this pair is
(pi-pj)2.
If pixel pi is considered "occluded" (not matched), the cost is Coccl,
a constant that is set before the algorithm runs.
Let us use Coccl = 10 for our example.
(a) Draw the M array for the sample problem (with the two scanline sequences given above),
filling in the correct values at least on the main diagonal and three diagonals on each side of it. (5 points)
(b) Draw a backtrace in M that shows an optimal set of corresponding pixels. (5 points).
(c) Determine the sequence of disparity values that goes with the left scanline sequence. (5 points).
[Late clarifications: Since the scanlines here have different lengths, assume that we define the cyclopean to follow the main diagonal of the
matrix from the (0,0) entry to the (6,6) entry, not the (6,7) or (7,6) entry. To determine the disparity at row i of the table, find the
leftmost table cell in row i through which your chosen optimal path passes, and then subtract from i the value of j there. (Disparity values can
be either negative or positive or zero.) You should give a sequence of 6 disparity values, starting with that for row 0 and ending with that for row 6.]
- Signature Verification: (15 points).
The ACME Commercial Checkout Equipment Company has just gotten a contract to
design a new payment system for Trader Tom's Healthfood Chain.
To help prevent credit-card fraud, the client (Trader Tom's) wants the vendor (ACME) to do a quick
automatic signature verification every time a customer pays with plastic and
enters their signature on the checkout tablet. Depending upon the automatically computed
score, the checkout clerk can OK the purchase or ask to see the customer's card up close.
Needless to say, the algorithm will have to run quickly, because Trader Tom's already has
a reputation for long lines at the checkout counters, and they don't want that to get worse.
They've asked you, an algorithms expert, to come up with a fast means of computing the verification
score.
Some of the work as already been done by the previous consultants, and you are to pick up where
they left off. They have already identified some hardware that comes with a software library
making signature data available as a sequence of short strokes. Each stroke is a small vector
in one of eight directions. Not only that, but the hardware tracks stylus movements both during
contact with the tablet and when the stylus is lifted up slightly from the surface. So there are
two modes for vectors: down (touching and "inking") and up (movement of the stylus with no
touching or inking on the surface).
The first time a customer uses a credit card at one of Trader Tom's stores, their signature
is captured as a sequence of these basic vectors, and it is stored in a database associated
with their loyalty number (and credit card suffix). (The clerk also does an "evaluation" of
the customer looking for any signs that the customer might be using his or her credit card
improperly. If everything seems normal, the sale goes through without a hitch. Otherwise,
the credit card is checked carefully before the sale is completed.)
Whenever that customer returns to the
store and signs again, the new signature is to be compared to the original using your algorithm.
If the comparison score is zero or less than some low threshold, then the payment by credit
card is immediately accepted with no special interaction from the clerk. But if the comparison
score exceeds the threshold, the clerk is to ask to see the credit card and make a visual
comparison, with an eye to detecting fraudulent use.
The previous consultants have determined that some sort of sequence matching method is
needed, and that an algorithm that finds an optimal alignment of sequences is in order.
An alignment should put elements from the sequence into one-to-one correspondence for the
most part, but mismatches and gaps are allowed, at a price, of course.
There are 16 different possible elements in each sequence, shown in the figure below.
The costs for matches/mismatches and gaps are as follows. Let m(Va, Vb) be the cost for
pairing element Va with element Vb.
m(va, vb) = 0, if va = vb,
1 if va and vb have mode u, but their angles differ by pi/4,
2 if va and vb have mode d, but their angles differ by pi/4,
2 if va and vb have mode u, but their angles differ by pi/2,
4 if va and vb have mode d, but their angles differ by pi/2,
11 if va and vb have different modes,
4 if va and vb have mode u, but their angles differ by more than pi/2,
8 if va and vb have mode d, but their angles differ by more than pi/2.
gap cost: 5
The gap cost is the cost of leaving an element in one sequence unmatched with any element in the other sequence.
Let M(S1, S2) represent the cost of a minimum-cost alighment between two signatures, S1 and S2.
Give an algorithm that will compute M(S1, S2) given the two
two signature sequences S1 and S2.
(Also give a short proof of its correctness and a small example of its operation, say for a sequence of length 2 aligned with a sequence of length 3.)
PART II. (Code).
- Implementing Signature Verification (30 points).
Implement the signature verification algorithm in Python.
Below are some examples to try your program on. Turn in the results for the
basic examples and the advanced examples.
Basic examples:
v0d,v3d,v0u --> input
v0d,v3d,v2d,v1u --> Test for gaps and angle difference in the matching (min cost: 6)
v7u,v0d,v3d,v0u --> Test for gaps on the edge (min cost: 5)
v0d,v3d,v0u --> Test for cost=0 (min cost: 0)
v0u,v3d,v0d --> Test for mode difference (min cost: 20)
v7d,v3d,v0u --> Test for angle difference between 1 and 7 (min cost: 2)
v1d,v2u,v6d,v3u,v4d --> combination (min cost: 24)
Here is csv (comma-separated values) format file containing the basic examples:
basic.csv. It is convenient to use the csv format because
Python has a csv module for reading in and writing such data.
The data for the advanced examples is in a
John-sigs.csv.
In this file,
there are three sequences of elements that we can call john1, john2, and john3.
(The first sequence, john1, is based very roughly on the first name in a photograph of John Hancock's signature.)
Each sequence is on its own line of text.
Compute the following quantities:
M(john1, john2)
M(john1, john3)
|