Computer Vision

A Modern Approach

Paperback, English, 2012

4 789 kr

Made to order. Ships within 3-6 business days. Free shipping for members on purchases of at least 249 kr.

Computer Vision: A Modern Approach, 2e, is appropriate for upper-division undergraduate- and graduate-level courses in computer vision found in departments of Computer Science, Computer Engineering and Electrical Engineering.

This textbook provides the most complete treatment of modern computer vision methods by two of the leading authorities in the field. This accessible presentation gives a general view of the entire computer vision enterprise and offers sufficient detail for students to build useful applications. Students will learn techniques that have proven to be useful by first-hand experience and a wide range of mathematical methods.

Product information

Publication date: 2012-02-21
Dimensions: 10 x 10 x 10 mm
Weight: 1 400 g
Format: Paperback
Language: English
Number of pages: 800
Edition: 2
Publisher: Pearson Education
ISBN: 9780136085928

Belongs to the following categories

Systems science and AI in Computing and IT
Artificial intelligence in Computing and IT

David A. Forsyth received the D.Phil. degree in computer science from Oxford University. He is currently a Professor in the Computer Science Division at the University of California at Berkeley. He has co-authored over eighty technical papers on computer vision, computer graphics and machine learning and has co-edited two books.

Jean Ponce received the Ph.D. degree in Computer Science from the University of Paris Orsay. He is currently a Professor in the Department of Computer Science and the Beckman Institute at the University of Illinois at Urbana-Champaign. Professor Ponce has written over a hundred conference and journal papers and co-edited two books on a range of subjects including computer vision and robotics.

I IMAGE FORMATION
1 Geometric Camera Models
  1.1 Image Formation
    1.1.1 Pinhole Perspective
    1.1.2 Weak Perspective
    1.1.3 Cameras with Lenses
    1.1.4 The Human Eye
  1.2 Intrinsic and Extrinsic Parameters
    1.2.1 Rigid Transformations and Homogeneous Coordinates
    1.2.2 Intrinsic Parameters
    1.2.3 Extrinsic Parameters
    1.2.4 Perspective Projection Matrices
    1.2.5 Weak-Perspective Projection Matrices
  1.3 Geometric Camera Calibration
    1.3.1 A Linear Approach to Camera Calibration
    1.3.2 A Nonlinear Approach to Camera Calibration
  1.4 Notes
2 Light and Shading
  2.1 Modelling Pixel Brightness
    2.1.1 Reflection at Surfaces
    2.1.2 Sources and Their Effects
    2.1.3 The Lambertian+Specular Model
    2.1.4 Area Sources
  2.2 Inference from Shading
    2.2.1 Radiometric Calibration and High Dynamic Range Images
    2.2.2 The Shape of Specularities
    2.2.3 Inferring Lightness and Illumination
    2.2.4 Photometric Stereo: Shape from Multiple Shaded Images
  2.3 Modelling Interreflection
    2.3.1 The Illumination at a Patch Due to an Area Source
    2.3.2 Radiosity and Exitance
    2.3.3 An Interreflection Model
    2.3.4 Qualitative Properties of Interreflections
  2.4 Shape from One Shaded Image
  2.5 Notes
3 Color
  3.1 Human Color Perception
    3.1.1 Color Matching
    3.1.2 Color Receptors
  3.2 The Physics of Color
    3.2.1 The Color of Light Sources
    3.2.2 The Color of Surfaces
  3.3 Representing Color
    3.3.1 Linear Color Spaces
    3.3.2 Non-linear Color Spaces
  3.4 A Model of Image Color
    3.4.1 The Diffuse Term
    3.4.2 The Specular Term
  3.5 Inference from Color
    3.5.1 Finding Specularities Using Color
    3.5.2 Shadow Removal Using Color
    3.5.3 Color Constancy: Surface Color from Image Color
  3.6 Notes

II EARLY VISION: JUST ONE IMAGE
4 Linear Filters
  4.1 Linear Filters and Convolution
    4.1.1 Convolution
  4.2 Shift Invariant Linear Systems
    4.2.1 Discrete Convolution
    4.2.2 Continuous Convolution
    4.2.3 Edge Effects in Discrete Convolutions
  4.3 Spatial Frequency and Fourier Transforms
    4.3.1 Fourier Transforms
  4.4 Sampling and Aliasing
    4.4.1 Sampling
    4.4.2 Aliasing
    4.4.3 Smoothing and Resampling
  4.5 Filters as Templates
    4.5.1 Convolution as a Dot Product
    4.5.2 Changing Basis
  4.6 Technique: Normalized Correlation and Finding Patterns
    4.6.1 Controlling the Television by Finding Hands by Normalized Correlation
  4.7 Technique: Scale and Image Pyramids
    4.7.1 The Gaussian Pyramid
    4.7.2 Applications of Scaled Representations
  4.8 Notes
5 Local Image Features
  5.1 Computing the Image Gradient
    5.1.1 Derivative of Gaussian Filters
  5.2 Representing the Image Gradient
    5.2.1 Gradient-Based Edge Detectors
    5.2.2 Orientations
  5.3 Finding Corners and Building Neighborhoods
    5.3.1 Finding Corners
    5.3.2 Using Scale and Orientation to Build a Neighborhood
  5.4 Describing Neighborhoods with SIFT and HOG Features
    5.4.1 SIFT Features
    5.4.2 HOG Features
  5.5 Computing Local Features in Practice
  5.6 Notes
6 Texture
  6.1 Local Texture Representations Using Filters
    6.1.1 Spots and Bars
    6.1.2 From Filter Outputs to Texture Representation
    6.1.3 Local Texture Representations in Practice
  6.2 Pooled Texture Representations by Discovering Textons
    6.2.1 Vector Quantization and Textons
    6.2.2 K-means Clustering for Vector Quantization
  6.3 Synthesizing Textures and Filling Holes in Images
    6.3.1 Synthesis by Sampling Local Models
    6.3.2 Filling in Holes in Images
  6.4 Image Denoising
    6.4.1 Non-local Means
    6.4.2 Block Matching 3D (BM3D)
    6.4.3 Learned Sparse Coding
    6.4.4 Results
  6.5 Shape from Texture
    6.5.1 Shape from Texture for Planes
    6.5.2 Shape from Texture for Curved Surfaces
  6.6 Notes

III EARLY VISION: MULTIPLE IMAGES
7 Stereopsis
  7.1 Binocular Camera Geometry and the Epipolar Constraint
    7.1.1 Epipolar Geometry
    7.1.2 The Essential Matrix
    7.1.3 The Fundamental Matrix
  7.2 Binocular Reconstruction
    7.2.1 Image Rectification
  7.3 Human Stereopsis
  7.4 Local Methods for Binocular Fusion
    7.4.1 Correlation
    7.4.2 Multi-Scale Edge Matching
  7.5 Global Methods for Binocular Fusion
    7.5.1 Ordering Constraints and Dynamic Programming
    7.5.2 Smoothness and Graphs
  7.6 Using More Cameras
  7.7 Application: Robot Navigation
  7.8 Notes
8 Structure from Motion
  8.1 Internally Calibrated Perspective Cameras
    8.1.1 Natural Ambiguity of the Problem
    8.1.2 Euclidean Structure and Motion from Two Images
    8.1.3 Euclidean Structure and Motion from Multiple Images
  8.2 Uncalibrated Weak-Perspective Cameras
    8.2.1 Natural Ambiguity of the Problem
    8.2.2 Affine Structure and Motion from Two Images
    8.2.3 Affine Structure and Motion from Multiple Images
    8.2.4 From Affine to Euclidean Shape
  8.3 Uncalibrated Perspective Cameras
    8.3.1 Natural Ambiguity of the Problem
    8.3.2 Projective Structure and Motion from Two Images
    8.3.3 Projective Structure and Motion from Multiple Images
    8.3.4 From Projective to Euclidean Shape
  8.4 Notes

IV MID-LEVEL VISION
9 Segmentation by Clustering
  9.1 Human Vision: Grouping and Gestalt
  9.2 Important Applications
    9.2.1 Background Subtraction
    9.2.2 Shot Boundary Detection
    9.2.3 Interactive Segmentation
    9.2.4 Forming Image Regions
  9.3 Image Segmentation by Clustering Pixels
    9.3.1 Basic Clustering Methods
    9.3.2 The Watershed Algorithm
    9.3.3 Segmentation Using K-means
    9.3.4 Mean Shift: Finding Local Modes in Data
    9.3.5 Clustering and Segmentation with Mean Shift
  9.4 Segmentation, Clustering, and Graphs
    9.4.1 Terminology and Facts for Graphs
    9.4.2 Agglomerative Clustering with a Graph
    9.4.3 Divisive Clustering with a Graph
    9.4.4 Normalized Cuts
  9.5 Image Segmentation in Practice
    9.5.1 Evaluating Segmenters
  9.6 Notes
10 Grouping and Model Fitting
  10.1 The Hough Transform
    10.1.1 Fitting Lines with the Hough Transform
    10.1.2 Using the Hough Transform
  10.2 Fitting Lines and Planes
    10.2.1 Fitting a Single Line
    10.2.2 Fitting Planes
    10.2.3 Fitting Multiple Lines
  10.3 Fitting Curved Structures
  10.4 Robustness
    10.4.1 M-Estimators
    10.4.2 RANSAC: Searching for Good Points
  10.5 Fitting Using Probabilistic Models
    10.5.1 Missing Data Problems
    10.5.2 Mixture Models and Hidden Variables
    10.5.3 The EM Algorithm for Mixture Models
    10.5.4 Difficulties with the EM Algorithm
  10.6 Motion Segmentation by Parameter Estimation
    10.6.1 Optical Flow and Motion
    10.6.2 Flow Models
    10.6.3 Motion Segmentation with Layers
  10.7 Model Selection: Which Model Is the Best Fit?
    10.7.1 Model Selection Using Cross-Validation
  10.8 Notes
11 Tracking
  11.1 Simple Tracking Strategies
    11.1.1 Tracking by Detection
    11.1.2 Tracking Translations by Matching
    11.1.3 Using Affine Transformations to Confirm a Match
  11.2 Tracking Using Matching
    11.2.1 Matching Summary Representations
    11.2.2 Tracking Using Flow
  11.3 Tracking Linear Dynamical Models with Kalman Filters
    11.3.1 Linear Measurements and Linear Dynamics
    11.3.2 The Kalman Filter
    11.3.3 Forward-backward Smoothing
  11.4 Data Association
    11.4.1 Linking Kalman Filters with Detection Methods
    11.4.2 Key Methods of Data Association
  11.5 Particle Filtering
    11.5.1 Sampled Representations of Probability Distributions
    11.5.2 The Simplest Particle Filter
    11.5.3 The Tracking Algorithm
    11.5.4 A Workable Particle Filter
    11.5.5 Practical Issues in Particle Filters
  11.6 Notes

V HIGH-LEVEL VISION
12 Registration
  12.1 Registering Rigid Objects
    12.1.1 Iterated Closest Points
    12.1.2 Searching for Transformations via Correspondences
    12.1.3 Application: Building Image Mosaics
  12.2 Model-based Vision: Registering Rigid Objects with Projection
    12.2.1 Verification: Comparing Transformed and Rendered Source to Target
  12.3 Registering Deformable Objects
    12.3.1 Deforming Texture with Active Appearance Models
    12.3.2 Active Appearance Models in Practice
    12.3.3 Application: Registration in Medical Imaging Systems
  12.4 Notes
13 Smooth Surfaces and Their Outlines
  13.1 Elements of Differential Geometry
    13.1.1 Curves
    13.1.2 Surfaces
  13.2 Contour Geometry
    13.2.1 The Occluding Contour and the Image Contour
    13.2.2 The Cusps and Inflections of the Image Contour
    13.2.3 Koenderink’s Theorem
  13.3 Visual Events: More Differential Geometry
    13.3.1 The Geometry of the Gauss Map
    13.3.2 Asymptotic Curves
    13.3.3 The Asymptotic Spherical Map
    13.3.4 Local Visual Events
    13.3.5 The Bitangent Ray Manifold
    13.3.6 Multilocal Visual Events
    13.3.7 The Aspect Graph
  13.4 Notes
14 Range Data
  14.1 Active Range Sensors
  14.2 Range Data Segmentation
    14.2.1 Elements of Analytical Differential Geometry
    14.2.2 Finding Step and Roof Edges in Range Images
    14.2.3 Segmenting Range Images into Planar Regions
  14.3 Range Image Registration and Model Acquisition
    14.3.1 Quaternions
    14.3.2 Registering Range Images
    14.3.3 Fusing Multiple Range Images
  14.4 Object Recognition
    14.4.1 Matching Using Interpretation Trees
    14.4.2 Matching Free-Form Surfaces Using Spin Images
  14.5 Kinect
    14.5.1 Features
    14.5.2 Technique: Decision Trees and Random Forests
    14.5.3 Labeling Pixels
    14.5.4 Computing Joint Positions
  14.6 Notes
15 Learning to Classify
  15.1 Classification, Error, and Loss
    15.1.1 Using Loss to Determine Decisions
    15.1.2 Training Error, Test Error, and Overfitting
    15.1.3 Regularization
    15.1.4 Error Rate and Cross-Validation
    15.1.5 Receiver Operating Curves
  15.2 Major Classification Strategies
    15.2.1 Example: Mahalanobis Distance
    15.2.2 Example: Class-Conditional Histograms and Naive Bayes
    15.2.3 Example: Classification Using Nearest Neighbors
    15.2.4 Example: The Linear Support Vector Machine
    15.2.5 Example: Kernel Machines
    15.2.6 Example: Boosting and Adaboost
  15.3 Practical Methods for Building Classifiers
    15.3.1 Manipulating Training Data to Improve Performance
    15.3.2 Building Multi-Class Classifiers Out of Binary Classifiers
    15.3.3 Solving for SVMS and Kernel Machines
  15.4 Notes
16 Classifying Images
  16.1 Building Good Image Features
    16.1.1 Example Applications
    16.1.2 Encoding Layout with GIST Features
    16.1.3 Summarizing Images with Visual Words
    16.1.4 The Spatial Pyramid Kernel
    16.1.5 Dimension Reduction with Principal Components
    16.1.6 Dimension Reduction with Canonical Variates
    16.1.7 Example Application: Identifying Explicit Images
    16.1.8 Example Application: Classifying Materials
    16.1.9 Example Application: Classifying Scenes
  16.2 Classifying Images of Single Objects
    16.2.1 Image Classification Strategies
    16.2.2 Evaluating Image Classification Systems
    16.2.3 Fixed Sets of Classes
    16.2.4 Large Numbers of Classes
    16.2.5 Flowers, Leaves, and Birds: Some Specialized Problems
  16.3 Image Classification in Practice
    16.3.1 Codes for Image Features
    16.3.2 Image Classification Datasets
    16.3.3 Dataset Bias
    16.3.4 Crowdsourcing Dataset Collection
  16.4 Notes
17 Detecting Objects in Images
  17.1 The Sliding Window Method
    17.1.1 Face Detection
    17.1.2 Detecting Humans
    17.1.3 Detecting Boundaries
  17.2 Detecting Deformable Objects
  17.3 The State of the Art of Object Detection
    17.3.1 Datasets and Resources
  17.4 Notes
18 Topics in Object Recognition
  18.1 What Should Object Recognition Do?
    18.1.1 What Should an Object Recognition System Do?
    18.1.2 Current Strategies for Object Recognition
    18.1.3 What Is Categorization?
    18.1.4 Selection: What Should Be Described?
  18.2 Feature Questions
    18.2.1 Improving Current Image Features
    18.2.2 Other Kinds of Image Feature
  18.3 Geometric Questions
  18.4 Semantic Questions
    18.4.1 Attributes and the Unfamiliar
    18.4.2 Parts, Poselets and Consistency
    18.4.3 Chunks of Meaning

VI APPLICATIONS AND TOPICS
19 Image-Based Modeling and Rendering
  19.1 Visual Hulls
    19.1.1 Main Elements of the Visual Hull Model
    19.1.2 Tracing Intersection Curves
    19.1.3 Clipping Intersection Curves
    19.1.4 Triangulating Cone Strips
    19.1.5 Results
    19.1.6 Going Further: Carved Visual Hulls
  19.2 Patch-Based Multi-View Stereopsis
    19.2.1 Main Elements of the PMVS Model
    19.2.2 Initial Feature Matching
    19.2.3 Expansion
    19.2.4 Filtering
    19.2.5 Results
  19.3 The Light Field
  19.4 Notes
20 Looking at People
  20.1 HMM’s, Dynamic Programming, and Tree-Structured Models
    20.1.1 Hidden Markov Models
    20.1.2 Inference for an HMM
    20.1.3 Fitting an HMM with EM
    20.1.4 Tree-Structured Energy Models
  20.2 Parsing People in Images
    20.2.1 Parsing with Pictorial Structure Models
    20.2.2 Estimating the Appearance of Clothing
  20.3 Tracking People
    20.3.1 Why Human Tracking Is Hard
    20.3.2 Kinematic Tracking by Appearance
    20.3.3 Kinematic Human Tracking Using Templates
  20.4 3D from 2D: Lifting
    20.4.1 Reconstruction in an Orthographic View
    20.4.2 Exploiting Appearance for Unambiguous Reconstructions
    20.4.3 Exploiting Motion for Unambiguous Reconstructions
  20.5 Activity Recognition
    20.5.1 Background: Human Motion Data
    20.5.2 Body Configuration and Activity Recognition
    20.5.3 Recognizing Human Activities with Appearance Features
    20.5.4 Recognizing Human Activities with Compositional Models
  20.6 Resources
  20.7 Notes
21 Image Search and Retrieval
  21.1 The Application Context
    21.1.1 Applications
    21.1.2 User Needs
    21.1.3 Types of Image Query
    21.1.4 What Users Do with Image Collections
  21.2 Basic Technologies from Information Retrieval
    21.2.1 Word Counts
    21.2.2 Smoothing Word Counts
    21.2.3 Approximate Nearest Neighbors and Hashing
    21.2.4 Ranking Documents
  21.3 Images as Documents
    21.3.1 Matching Without Quantization
    21.3.2 Ranking Image Search Results
    21.3.3 Browsing and Layout
    21.3.4 Laying Out Images for Browsing
  21.4 Predicting Annotations for Pictures
    21.4.1 Annotations from Nearby Words
    21.4.2 Annotations from the Whole Image
    21.4.3 Predicting Correlated Words with Classifiers
    21.4.4 Names and Faces
    21.4.5 Generating Tags with Segments
  21.5 The State of the Art of Word Prediction
    21.5.1 Resources
    21.5.2 Comparing Methods
    21.5.3 Open Problems
  21.6 Notes

VII BACKGROUND MATERIAL
22 Optimization Techniques
  22.1 Linear Least-Squares Methods
    22.1.1 Normal Equations and the Pseudoinverse
    22.1.2 Homogeneous Systems and Eigenvalue Problems
    22.1.3 Generalized Eigenvalues Problems
    22.1.4 An Example: Fitting a Line to Points in a Plane
    22.1.5 Singular Value Decomposition
  22.2 Nonlinear Least-Squares Methods
    22.2.1 Newton’s Method: Square Systems of Nonlinear Equations
    22.2.2 Newton’s Method for Overconstrained Systems
    22.2.3 The Gauss-Newton and Levenberg-Marquardt Algorithms
  22.3 Sparse Coding and Dictionary Learning
    22.3.1 Sparse Coding
    22.3.2 Dictionary Learning
    22.3.3 Supervised Dictionary Learning
  22.4 Min-Cut/Max-Flow Problems and Combinatorial Optimization
    22.4.1 Min-Cut Problems
    22.4.2 Quadratic Pseudo-Boolean Functions
    22.4.3 Generalization to Integer Variables
  22.5 Notes

Bibliography
Index
List of Algorithms