CMU-CS-01-168
Computer Science Department
School of Computer Science, Carnegie Mellon University




A Subspace Approach to Layer Extraction and Its Application
to Patch-Based Structure from Motion and Video Compression

Qifa Ke, Takeo Kanade

December 2001
(Available August 2003)



Keywords: Subspace, layer extraction, layered representation, structure from motion, patch-based SFM, video compression, video representation, motion segmentation


Representing videos with layers has important applications such as video compression, motion analysis, and 3D modeling and rendering. This thesis proposes a subspace approach to extracting layers from video that takes advantage of the fact that the homographies induced by planar patches in the scene form a low-dimensional linear subspace. In this subspace, layers in the input images are mapped onto well-defined clusters and can be reliably identified by a standard clustering algorithm (e.g., mean-shift). Global optimality is achieved because spatial and temporal redundancy are taken into account simultaneously, and noise can be effectively reduced by enforcing the subspace constraint. The existence of the subspace also enables outlier detection, making the subspace computation robust. Based on the subspace constraint, we propose a patch-based scheme for affine structure from motion (SFM), which recovers the plane equation of each planar patch in the scene as well as the camera epipolar geometry. We propose two approaches to patch-based SFM: (1) a factorization approach; and (2) a layer-based approach. Patch-based SFM provides a compact video representation that can be used to construct a high-quality texture map for each layer.
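
The subspace-then-cluster idea described above can be illustrated with a small sketch. This is not the report's implementation: it uses synthetic per-patch motion parameter vectors in place of measured homographies, and the subspace rank (3), the noise levels, the mean-shift bandwidth, and all function names are assumptions chosen for illustration. It stacks one parameter vector per patch into a matrix, projects the columns onto the dominant subspace found by SVD, and groups the projected points with a plain flat-kernel mean shift.

```python
import numpy as np

def project_to_subspace(W, rank):
    """Project the columns of W (one column per patch) onto the
    top-`rank` left singular subspace of the mean-centered data."""
    centered = W - W.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    return (U[:, :rank].T @ centered).T  # shape: (n_patches, rank)

def mean_shift(points, bandwidth, n_iter=50):
    """Flat-kernel mean shift; returns one integer label per point."""
    modes = points.copy()
    for _ in range(n_iter):
        # Distance from every current mode to every original point.
        d = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        inside = (d < bandwidth).astype(float)
        # Move each mode to the mean of the points inside its window.
        modes = (inside @ points) / inside.sum(axis=1, keepdims=True)
    # Merge modes that converged to (nearly) the same location.
    labels = np.full(len(points), -1, dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

def cluster_patches_demo(seed=0):
    """Synthetic example: two layers, 40 patches each (all values assumed)."""
    rng = np.random.default_rng(seed)
    n_frames, rank, per_layer = 5, 3, 40
    dim = 6 * n_frames  # e.g. six affine parameters per frame (assumption)
    basis = rng.normal(size=(dim, rank))
    layer_centers = np.array([[2.0, 0.0, 0.0], [-2.0, 1.0, 0.0]])
    truth = np.repeat([0, 1], per_layer)
    coeffs = layer_centers[truth] + 0.05 * rng.normal(size=(2 * per_layer, rank))
    # Measurement matrix: low-rank signal plus small full-dimensional noise.
    W = basis @ coeffs.T + 0.01 * rng.normal(size=(dim, 2 * per_layer))
    labels = mean_shift(project_to_subspace(W, rank), bandwidth=5.0)
    return labels, truth

labels, truth = cluster_patches_demo()
print("clusters found:", len(set(labels.tolist())))
```

Projecting before clustering discards the noise components that lie outside the subspace, which is the noise-reduction effect the abstract attributes to enforcing the subspace constraint. With real data the bandwidth would need to be tuned or estimated rather than fixed as it is here.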

We plan to apply our approach to generating Video Object Planes (VOPs) as defined by the MPEG-4 standard. VOP generation is a critical but unspecified step in the MPEG-4 standard. Our motion model for each VOP consists of a global planar motion and localized deformations, and it has a closed-form solution. Our goals are: (1) combining different low-level cues to model VOPs; and (2) extracting VOPs that undergo more complicated (non-planar or non-rigid) motion.

37 pages

