1
SIFT
  • Guest Lecture by Jiwon Kim
  • http://www.cs.washington.edu/homes/jwkim/
2
SIFT Features and Their Applications
3
Autostitch Demo
4
Autostitch
  • Fully automatic panorama generation
    • Input: set of images
    • Output: panorama(s)
  • Uses SIFT (Scale-Invariant Feature Transform) to find/align images
5
1. Solve for homography
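Here is a minimal sketch of step 1. It is not Autostitch's own code. The direct linear transform (DLT) recovers a 3x3 homography H from four or more point correspondences by taking the null vector of a stacked linear system. In practice the correspondences would be SIFT matches filtered by RANSAC. That loop and coordinate normalization are omitted here, and the names `fit_homography` and `apply_homography` are hypothetical.

```python
import numpy as np


def fit_homography(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from N >= 4 point pairs via DLT.

    Hypothetical helper for illustration. src, dst: arrays of shape (N, 2).
    No coordinate normalization is done, which is acceptable for a sketch
    but less numerically stable on real pixel data.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + h9), and similarly for v,
        # gives two linear equations in the 9 entries of H per correspondence.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```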
8
2. Find connected sets of images
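Step 2 can be sketched by assuming that pairwise matching has already produced a list of verified image pairs. Treat images as graph nodes and verified pairs as edges, so each connected component is one panorama. The union-find helper below groups the image indices. `connected_sets` is a hypothetical name.

```python
def connected_sets(num_images, matched_pairs):
    """Group image indices into connected components of the match graph.

    Hypothetical helper. matched_pairs: iterable of (i, j) pairs assumed to
    have passed match verification in an earlier step. Returns sorted groups.
    """
    parent = list(range(num_images))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in matched_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    groups = {}
    for i in range(num_images):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For example, `connected_sets(6, [(0, 1), (1, 2), (4, 5)])` yields three groups. Images 0 to 2 form one panorama, image 3 stands alone, and images 4 and 5 form a second panorama.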
11
3. Solve for camera parameters
  • New images initialised with rotation, focal length of best matching image
13
4. Blending the panorama
  • Burt & Adelson 1983
    • Blend frequency bands over a spatial range ∝ λ (wavelength)
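The idea behind 2-band blending can be sketched under several simplifying assumptions. The images are grayscale and already warped into a common frame. A single Gaussian split replaces a full Burt & Adelson pyramid. The function names are hypothetical. Low frequencies are mixed with smoothly varying weights. High frequencies come from whichever image has the largest weight at each pixel, so fine detail is not ghosted.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur (kernel truncated at 3 sigma, reflect padding)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.pad(img, radius, mode="reflect")
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, out)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, out)


def two_band_blend(images, weights, sigma=2.0):
    """Hypothetical 2-band blend of aligned grayscale images.

    weights: per-pixel confidences (e.g. distance to the image border),
    zero where an image has no data. sigma sets the low/high band split.
    """
    images = [np.asarray(im, dtype=float) for im in images]
    W = np.stack([np.asarray(w, dtype=float) for w in weights])
    # High band: winner-take-all weights (ties split evenly).
    winner = (W == W.max(axis=0)) & (W > 0)
    hard = winner / np.maximum(winner.sum(axis=0), 1)
    # Low band: the same weights blurred, then renormalized to sum to 1.
    soft = np.stack([gaussian_blur(m, sigma) for m in hard])
    soft /= np.maximum(soft.sum(axis=0), 1e-12)
    low = [gaussian_blur(im, sigma) for im in images]
    high = [im - lo for im, lo in zip(images, low)]
    return (sum(s * lo for s, lo in zip(soft, low))
            + sum(h * hi for h, hi in zip(hard, high)))
```

Blending an image with itself returns it unchanged, because both weight sets sum to 1 at every covered pixel. This makes a quick sanity check that the two bands recombine correctly.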
14
2-band Blending
15
Linear Blending
16
2-band Blending
17
So, what is SIFT?
  • Scale-Invariant Feature Transform
  • David Lowe at UBC
  • Scale/rotation invariant
  • Currently best known feature descriptor
  • Many real-world applications
    • Object recognition
    • Panorama stitching
    • Robot localization
    • Video indexing
    • …
18
Example: object recognition
19
SIFT properties
  • Locality: features are local, so robust to occlusion and clutter
  • Distinctiveness: individual features can be matched to a large database of objects
  • Quantity: many features can be generated for even small objects
  • Efficiency: close to real-time performance
20
SIFT algorithm overview
  • Feature detection
    • Detect points that can be repeatably selected under location/scale change
  • Feature description
    • Assign orientation to detected feature points
    • Construct a descriptor for image patch around each feature point
  • Feature matching
21
1. Feature detection
  • Detect points stable under location/scale change
    • Build continuous space (x, y, scale)
    • Approximated by multi-scale Difference-of-Gaussian pyramid
    • Select maxima/minima in (x, y, scale)
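Here is a single-octave sketch of the Difference-of-Gaussian construction and the 3x3x3 extremum search. It assumes a grayscale float image with no prior blur. The defaults `sigma0=1.6` and three scales per octave follow Lowe (2004). Down-sampled octaves and all later filtering are omitted, and the function names are hypothetical.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur (kernel truncated at 3 sigma, reflect padding)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.pad(img, radius, mode="reflect")
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, out)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, out)


def dog_extrema(img, sigma0=1.6, scales_per_octave=3, threshold=0.0):
    """Hypothetical one-octave DoG detector.

    Returns (dog, keypoints) where dog has shape (S, H, W) and keypoints
    is a list of (scale_index, row, col) strict 3x3x3 extrema.
    """
    img = np.asarray(img, dtype=float)
    k = 2 ** (1.0 / scales_per_octave)
    # s + 3 Gaussian images give s + 2 DoG images, so s middle DoG levels
    # each have a level above and below for the scale comparison.
    gauss = np.stack([gaussian_blur(img, sigma0 * k**i)
                      for i in range(scales_per_octave + 3)])
    dog = gauss[1:] - gauss[:-1]
    keypoints = []
    S, H, W = dog.shape
    for s in range(1, S - 1):
        for r in range(1, H - 1):
            for c in range(1, W - 1):
                v = dog[s, r, c]
                cube = dog[s - 1:s + 2, r - 1:r + 2, c - 1:c + 2]
                # Strict extremum: larger (or smaller) than all 26 neighbours.
                is_max = (cube >= v).sum() == 1
                is_min = (cube <= v).sum() == 1
                if abs(v) > threshold and (is_max or is_min):
                    keypoints.append((s, r, c))
    return dog, keypoints
```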
22
1. Feature detection
23
1. Feature detection
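  • Localize extrema by fitting a quadratic
      • Sub-pixel/sub-scale interpolation using Taylor expansion of the DoG function D around the sample point, where $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from that point:
        $$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x} + \frac{1}{2}\mathbf{x}^{T}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\mathbf{x}$$
      • Take derivative and set to zero to get the extremum location:
        $$\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}}$$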
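The quadratic fit can be sketched on a 3x3x3 DoG neighbourhood. Central differences give the gradient g and Hessian H of D at the sample point. The extremum offset is -H^(-1) g. The interpolated value D + 0.5 g·offset feeds the low-contrast test. The axis order (scale, row, col) and the name `refine_extremum` are assumptions of this sketch. Lowe also moves to a neighbouring sample when any offset component exceeds 0.5, which is not shown.

```python
import numpy as np


def refine_extremum(cube):
    """Hypothetical quadratic refinement on a 3x3x3 DoG cube (scale, row, col).

    Returns (offset, value): offset is the sub-sample shift of the extremum
    from the centre sample; value is the interpolated DoG value there.
    """
    c = np.asarray(cube, dtype=float)
    D = c[1, 1, 1]
    # Central-difference gradient with respect to (scale, row, col).
    g = 0.5 * np.array([c[2, 1, 1] - c[0, 1, 1],
                        c[1, 2, 1] - c[1, 0, 1],
                        c[1, 1, 2] - c[1, 1, 0]])
    # Second derivatives: diagonal terms, then mixed terms.
    dss = c[2, 1, 1] - 2 * D + c[0, 1, 1]
    drr = c[1, 2, 1] - 2 * D + c[1, 0, 1]
    dcc = c[1, 1, 2] - 2 * D + c[1, 1, 0]
    dsr = 0.25 * (c[2, 2, 1] - c[2, 0, 1] - c[0, 2, 1] + c[0, 0, 1])
    dsc = 0.25 * (c[2, 1, 2] - c[2, 1, 0] - c[0, 1, 2] + c[0, 1, 0])
    drc = 0.25 * (c[1, 2, 2] - c[1, 2, 0] - c[1, 0, 2] + c[1, 0, 0])
    H = np.array([[dss, dsr, dsc],
                  [dsr, drr, drc],
                  [dsc, drc, dcc]])
    # Setting the derivative of the Taylor expansion to zero: H @ offset = -g.
    offset = -np.linalg.solve(H, g)
    value = D + 0.5 * g @ offset
    return offset, value
```

Central differences are exact for quadratics. So on a cube sampled from a quadratic, the sketch recovers the true peak location and height exactly.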
24
1. Feature detection
  • Discard low-contrast/edge points
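      • Low contrast: discard keypoints with $|D(\hat{\mathbf{x}})|$ < threshold, where $D(\hat{\mathbf{x}}) = D + \frac{1}{2}\frac{\partial D}{\partial \mathbf{x}}^{T}\hat{\mathbf{x}}$
      • Edge points: high contrast in one direction, low in the other → compute principal curvatures from eigenvalues of 2x2 Hessian matrix, and limit their ratio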
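Both rejection tests are sketched below. The edge test avoids computing eigenvalues. It uses the equivalent bound Tr(H)²/Det(H) < (r+1)²/r on the 2x2 Hessian from Lowe (2004), with his values r = 10 and a contrast threshold of 0.03 for pixel values in [0, 1]. The function name and argument layout are hypothetical.

```python
import numpy as np


def passes_edge_and_contrast_test(dog_slice, row, col, value,
                                  contrast_thresh=0.03, r=10.0):
    """Hypothetical keypoint filter.

    dog_slice: 2-D DoG image at the keypoint's scale.
    value: interpolated D(x_hat) from the quadratic fit.
    """
    if abs(value) < contrast_thresh:
        return False                     # low contrast
    d = np.asarray(dog_slice, dtype=float)
    dxx = d[row, col + 1] - 2 * d[row, col] + d[row, col - 1]
    dyy = d[row + 1, col] - 2 * d[row, col] + d[row - 1, col]
    dxy = 0.25 * (d[row + 1, col + 1] - d[row + 1, col - 1]
                  - d[row - 1, col + 1] + d[row - 1, col - 1])
    tr, det = dxx + dyy, dxx * dyy - dxy ** 2
    if det <= 0:
        return False                     # curvatures of opposite sign (or zero)
    # Tr^2 / Det = (ratio + 1)^2 / ratio grows with the curvature ratio.
    return tr ** 2 / det < (r + 1) ** 2 / r
```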
25
1. Feature detection
  • Example
26
2. Feature description
  • Assign orientation to detected feature points
    • Create histogram of local gradient directions computed at selected scale
    • Assign canonical orientation at peak of smoothed histogram
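Here is a sketch of orientation assignment. It assumes the patch is cut from the Gaussian-blurred image at the keypoint's scale and centred on the keypoint. Gradient magnitudes, weighted by a Gaussian window of width `sigma` (Lowe uses 1.5 times the keypoint scale), vote into 36 orientation bins as in Lowe (2004). The circular histogram is then smoothed, and the peak bin gives the canonical orientation. The [1, 2, 1]/4 smoothing kernel, the lack of parabolic peak interpolation, and the function name are simplifications or assumptions of this sketch.

```python
import numpy as np


def dominant_orientation(patch, sigma, num_bins=36):
    """Hypothetical helper: dominant gradient orientation (radians) of a patch."""
    patch = np.asarray(patch, dtype=float)
    dy, dx = np.gradient(patch)          # derivatives along rows (y) and columns (x)
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx) % (2 * np.pi)
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    yy = yy - (h - 1) / 2
    xx = xx - (w - 1) / 2
    weight = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    bins = (ang / (2 * np.pi) * num_bins).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=(mag * weight).ravel(),
                       minlength=num_bins)
    # Smooth the circular histogram with a [1, 2, 1] / 4 kernel.
    hist = 0.25 * np.roll(hist, 1) + 0.5 * hist + 0.25 * np.roll(hist, -1)
    peak = int(np.argmax(hist))
    return (peak + 0.5) * 2 * np.pi / num_bins   # centre of the peak bin
```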
27
2. Feature description
  • Construct SIFT descriptor
    • Create array of orientation histograms
    • 8 orientations x 4x4 histogram array = 128 dimensions
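The 4x4 x 8 layout can be sketched on a 16x16 patch that is assumed to be already rotated to the canonical orientation. Each pixel's Gaussian-weighted gradient magnitude votes into its 4x4 cell and one of 8 orientation bins. The vector is then normalized, clipped at 0.2 and renormalized, as Lowe (2004) describes. Unlike Lowe's version, this sketch uses hard binning rather than trilinear interpolation, and `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np


def sift_like_descriptor(patch):
    """Hypothetical 128-D descriptor: 4x4 cells x 8 orientation bins."""
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    dy, dx = np.gradient(patch)
    ang = np.arctan2(dy, dx) % (2 * np.pi)
    # Gaussian weight with sigma equal to half the 16-pixel window width.
    yy, xx = np.mgrid[0:16, 0:16] - 7.5
    mag = np.hypot(dx, dy) * np.exp(-(xx**2 + yy**2) / (2 * 8.0**2))
    obin = (ang / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            desc[r // 4, c // 4, obin[r, c]] += mag[r, c]
    desc = desc.ravel()
    # Normalize, clip large entries at 0.2, renormalize.
    desc /= max(np.linalg.norm(desc), 1e-12)
    desc = np.minimum(desc, 0.2)
    return desc / max(np.linalg.norm(desc), 1e-12)
```

Scaling and offsetting the patch intensities (for example `2 * patch + 10`) leaves the output unchanged. This illustrates why gradient-based descriptors are less sensitive to illumination change.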
28
2. Feature description
  • Advantage over simple correlation
    • Gradients less sensitive to illumination change
    • Gradients may shift: robust to deformation, viewpoint change
29
Performance: stability to noise
  • Match features after random change in image scale & orientation, with differing levels of image noise
  • Find nearest neighbor in database of 30,000 features
30
Performance: stability to affine change
  • Match features after random change in image scale & orientation, with 2% image noise, and affine distortion
  • Find nearest neighbor in database of 30,000 features
31
Performance: distinctiveness
  • Vary size of database of features, with 30 degree affine change, 2% image noise
  • Measure % correct for single nearest neighbor match
32
3. Feature matching
  • For each feature in A, find nearest neighbor in B
33
3. Feature matching
  • Nearest neighbor search too slow for large database of 128-dimensional data
  • Approximate nearest neighbor search:
    • Best-bin-first [Beis et al. 97]: modification to k-d tree algorithm
    • Use heap data structure to identify bins in order by their distance from query point
  • Result: Can give speedup by factor of 1000 while finding nearest neighbor (of interest) 95% of the time
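The best-bin-first idea can be illustrated with a compact sketch built on a plain median-split k-d tree. The split axis cycles with depth, a simplification chosen for this sketch. The search always expands the pending branch whose splitting plane is closest to the query. Pending branches are held in a min-heap, and the search stops after `max_checks` node visits. With an unlimited budget it returns the exact nearest neighbour. The class and function names are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class KDNode:
    point: Sequence[float]
    index: int
    axis: int
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(points, indices=None, depth=0):
    """Hypothetical median-split k-d tree over a list of equal-length tuples."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(points[indices[mid]], indices[mid], axis,
                  build_kdtree(points, indices[:mid], depth + 1),
                  build_kdtree(points, indices[mid + 1:], depth + 1))


def bbf_nearest(root, query, max_checks=200):
    """Best-bin-first search; returns (index, squared distance) of best point found."""
    best_idx, best_d2 = None, math.inf
    heap = [(0.0, 0, root)]   # (lower bound on squared distance, tie-breaker, subtree)
    pushed, checks = 1, 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d2:
            break             # every pending branch is at least this far away
        # Walk down to a leaf, queueing each skipped branch with its plane distance.
        while node is not None and checks < max_checks:
            checks += 1
            d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
            if d2 < best_d2:
                best_idx, best_d2 = node.index, d2
            diff = query[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            if far is not None:
                heapq.heappush(heap, (max(bound, diff * diff), pushed, far))
                pushed += 1
            node = near
    return best_idx, best_d2
```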
34
3. Feature matching
  • Reject false matches
    • Compare distance of nearest neighbor to second nearest neighbor
    • Common features aren’t distinctive, therefore bad
    • Threshold of 0.8 provides excellent separation
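The ratio test can be sketched with a brute-force search. An approximate search would replace it for large databases. A match is kept only when its nearest-neighbour distance is below 0.8 times the second-nearest distance. `ratio_test_matches` is a hypothetical name, and descriptors are plain sequences of floats.

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Hypothetical matcher: (i, j) pairs from A to B passing the ratio test."""
    matches = []
    for i, a in enumerate(desc_a):
        # Distances from a to every descriptor in B, closest first.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```

A feature with two nearly equidistant neighbours in B is dropped even if its nearest distance is small. This is how the test rejects common, non-distinctive features.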
35
3. Feature matching
  • Now, given feature matches…
    • Find an object in the scene
    • Solve for homography (panorama)
    • …
36
3. Feature matching
  • Example: 3D object recognition
37
3. Feature matching
  • 3D object recognition
    • Assume affine transform: clusters of size >=3
    • Looking for 3 matches out of 3000 that agree on same object and pose: too many outliers for RANSAC or LMS
    • Use Hough Transform
      • Each match votes for a hypothesis for object ID/pose
      • Voting for multiple bins & large bin size allow for error due to similarity approximation
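Here is a sketch of the Hough voting step. It assumes each match supplies an object ID plus the (x, y, scale, orientation) of its model and image keypoints. Each match predicts a similarity pose: rotation, scale, and the image position of the model origin. It then votes for the two nearest bins in each of the four pose dimensions, 16 bins in all. The bin sizes are the ones reported in Lowe (2004): 30 degrees, a factor of 2 in scale, and 0.25 times the projected model size in location. Function and argument names are hypothetical. Bins with at least 3 votes would go on to the pose solution.

```python
import math
from collections import defaultdict
from itertools import product


def hough_pose_votes(matches, model_size, ori_bin=math.radians(30)):
    """Hypothetical pose accumulator.

    matches: list of (object_id, model_kp, image_kp), keypoints given as
    (x, y, scale, orientation_radians). model_size: largest training-image
    dimension. Returns {bin key: [match indices]}.
    """
    n_ori = round(2 * math.pi / ori_bin)
    acc = defaultdict(list)
    for idx, (obj, (xm, ym, sm, om), (xi, yi, si, oi)) in enumerate(matches):
        scale = si / sm
        rot = (oi - om) % (2 * math.pi)
        cr, sr = math.cos(rot), math.sin(rot)
        # Image position of the model origin implied by this single match.
        tx = xi - scale * (cr * xm - sr * ym)
        ty = yi - scale * (sr * xm + cr * ym)
        loc_bin = 0.25 * model_size * scale
        coords = (rot / ori_bin, math.log2(scale), tx / loc_bin, ty / loc_bin)
        # Per dimension: the bin holding the value and the closer neighbouring bin.
        choices = []
        for v in coords:
            b = math.floor(v)
            choices.append((b, b + 1) if v - b >= 0.5 else (b - 1, b))
        for ob, sb, xb, yb in product(*choices):
            acc[(obj, ob % n_ori, sb, xb, yb)].append(idx)
    return acc
```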
38
3. Feature matching
  • 3D object recognition: solve for pose
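    • Affine transform of [x,y] to [u,v]:
        $$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$
    • Rewrite to solve for transform parameters:
        $$\begin{bmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & \cdots & & & \\ & & \cdots & & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} = \begin{bmatrix} u \\ v \\ \vdots \end{bmatrix}$$
      • Written as $A\mathbf{p} = \mathbf{b}$ (two rows per match), the least-squares solution is $\mathbf{p} = (A^{T}A)^{-1}A^{T}\mathbf{b}$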
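Solving that system in the least-squares sense can be sketched as follows. The sketch assumes at least 3 matches, given as (x, y) model points and (u, v) image points. Each match contributes the row (x, y, 0, 0, 1, 0) for u and the row (0, 0, x, y, 0, 1) for v. `fit_affine` is a hypothetical name, and numpy's `lstsq` stands in for the explicit normal equations.

```python
import numpy as np


def fit_affine(model_pts, image_pts):
    """Hypothetical least-squares affine fit: returns (M, t) with uv ~ M @ xy + t."""
    model_pts = np.asarray(model_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    n = len(model_pts)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = model_pts      # rows for u: x, y, 0, 0, 1, 0
    A[0::2, 4] = 1
    A[1::2, 2:4] = model_pts      # rows for v: 0, 0, x, y, 0, 1
    A[1::2, 5] = 1
    b = image_pts.reshape(-1)     # (u1, v1, u2, v2, ...)
    # Parameter vector p = (m1, m2, m3, m4, tx, ty).
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p[:4].reshape(2, 2), p[4:]
```

With more than 3 matches the system is overdetermined. The residuals of this fit are what the verification step would inspect when discarding outliers.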
39
3. Feature matching
  • 3D object recognition: verify model
    • Discard outliers for pose solution in previous step
    • Perform top-down check for additional features
    • Evaluate probability that match is correct
      • Use Bayesian model, with probability that features would arise by chance if object was not present
      • Takes account of object size in image, textured regions, model feature count in database, accuracy of fit [Lowe 01]
40
Planar recognition
  • Training images
41
Planar recognition
42
3D object recognition
  • Training images
43
3D object recognition
  • Only 3 keys are needed for recognition, so extra keys provide robustness
  • Affine model is no longer as accurate
44
Recognition under occlusion
45
Illumination invariance
46
Applications of SIFT
  • Object recognition
  • Panoramic image stitching
  • Robot localization
  • Video indexing
  • …


  • The Office of the Past
    • Document tracking and recognition
47
Location recognition
48
Robot Localization
49
Map continuously built over time
50
Locations of map features in 3D
51
 
52
The Office of the Past
  • Paper everywhere
53
Unify physical and electronic desktops
  • Recognize video of paper on physical desktop
    • Tracking
    • Recognition
    • Linking
54
Unify physical and electronic desktops
  • Applications
    • Find lost documents
    • Browse remote desktop
    • Find electronic version
    • History-based queries
55
Example input video
56
Demo – Remote desktop
57
System overview
65
Assumptions
  • Document
    • Corresponding electronic copy exists
    • No duplicates of same document
  • Motion
    • 3 event types: move/entry/exit
    • One document at a time
    • Only topmost document can move
67
Non-assumptions
  • Desk need not be initially empty
  • Stacks may overlap
69
Algorithm overview
75
Document tracking example
85
Document Recognition
  • Match against PDF image database
86
Document Recognition
  • Performance analysis
    • Tested 20 pages against database of 162 pages
    • ~200x300 pixels per document for reliable match
89
Results
  • Input video
    • ~40 minutes
    • 1024x768 @ 15 fps
    • 22 documents, 49 events
  • Running time
    • Video processed offline
    • No optimization
    • A few hours for entire video
90
Demo – Paper tracking
91
Photo sorting example
92
Photo sorting example
93
Demo – Photo sorting
94
Future work
  • Enhance realism
    • Handle more realistic desktops
    • Real-time performance
  • More applications
    • Support other document tasks
      • E.g., attach reminder, cluster documents
    • Beyond documents
      • Other 3D desktop objects, books/CDs
95
Summary
  • SIFT is:
    • Scale/rotation invariant local feature
    • Highly distinctive
    • Robust to occlusion, illumination change, 3D viewpoint change
    • Efficient (close to real-time performance)
    • Suitable for many useful applications
96
References
  • Distinctive image features from scale-invariant keypoints
    • David G. Lowe, International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
  • Recognising panoramas
    • Matthew Brown and David G. Lowe, International Conference on Computer Vision (ICCV 2003), Nice, France (October 2003), pp. 1218-1225
  • Video-Based Document Tracking: Unifying Your Physical and Electronic Desktops
    • Jiwon Kim, Steven M. Seitz and Maneesh Agrawala, ACM Symposium on User Interface Software and Technology (UIST 2004), pp. 99-107