Segmenting independent objects in video is a difficult problem. To begin with, how does one define what constitutes an object? One option is to learn the typical appearance of objects from training data and to look for it in new images, but this would only detect objects that look similar to the training examples and appear in similar poses. The appearance and shape of an object depend on its type, its position, the illumination of the scene, and many other factors. The motion of the parts that constitute an object, however, has a measurable consistency, even if the object is not rigid.
We are working on algorithms for detecting objects by measuring how their parts move with respect to each other.
Image features are tracked independently over time and then grouped according to a variety of criteria. Rather than looking for objects with a particular shape or appearance, the method identifies as objects those groups of features that form consistently across time intervals. As a result, the number of objects is determined automatically.
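As a rough illustration of the tracking stage, the sketch below tracks Shi-Tomasi corners independently with pyramidal Lucas-Kanade optical flow in OpenCV. The video path, detector parameters and the use of OpenCV itself are assumptions made for illustration, not the setup used in the paper.

import cv2
import numpy as np

def track_features(video_path='input.avi', max_corners=200):
    # Detect corners in the first frame and track each one independently.
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    tracks = {i: [tuple(p.ravel())] for i, p in enumerate(pts)}
    ids = np.arange(len(pts))
    while True:
        ok, frame = cap.read()
        if not ok or len(pts) == 0:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        keep = status.ravel() == 1
        pts, ids = nxt[keep], ids[keep]
        for i, p in zip(ids, pts):
            tracks[int(i)].append(tuple(p.ravel()))
        prev_gray = gray
    cap.release()
    return tracks  # feature_id -> list of (x, y) positions over time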
For each time interval, the mean velocity of each feature is estimated, and the features are segmented on that basis. A consensus algorithm then merges the per-interval segmentations.
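The per-interval step can be pictured as follows: estimate each feature's mean velocity over a window of frames and cluster the velocities. Single-linkage clustering with a distance threshold is an assumed stand-in for the segmentation criterion used in the paper, and the threshold value here is arbitrary.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment_interval(tracks, velocity_threshold=1.0):
    # tracks: feature_id -> sequence of (x, y) positions within the interval
    # (each track needs at least two positions to define a velocity).
    ids = sorted(tracks)
    v = np.array([np.diff(np.asarray(tracks[f], dtype=float), axis=0).mean(axis=0)
                  for f in ids])
    if len(ids) < 2:
        return {f: 0 for f in ids}
    labels = fcluster(linkage(v, method='single'),
                      t=velocity_threshold, criterion='distance')
    return dict(zip(ids, labels.tolist()))  # feature_id -> group label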
The merged segmentation is constrained by a bound on the ratio between pairwise agreements and contradictions with the partial equivalence relations defined by the contributing segmentations.
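One possible reading of the consensus step is sketched below, with the exact form of the bound and the greedy merge order assumed rather than taken from the paper: feature groups are merged only while the pairwise contradictions with the contributing (partial) segmentations remain small relative to the agreements.

from itertools import combinations

def pair_votes(a, b, segmentations):
    # Count contributing segmentations that place a and b together or apart;
    # segmentations that track neither or only one of them abstain.
    together = apart = 0
    for seg in segmentations:
        if a in seg and b in seg:
            if seg[a] == seg[b]:
                together += 1
            else:
                apart += 1
    return together, apart

def consensus_merge(feature_ids, segmentations, max_ratio=0.1):
    # segmentations: list of dicts {feature_id: group label}, one per interval.
    votes = {pair: pair_votes(*pair, segmentations)
             for pair in combinations(feature_ids, 2)}
    group = {f: i for i, f in enumerate(feature_ids)}  # start fully split
    # Try the most consistently co-moving pairs first.
    order = sorted(votes, key=lambda p: votes[p][0] - votes[p][1], reverse=True)
    for a, b in order:
        if group[a] == group[b] or votes[(a, b)][0] == 0:
            continue
        # Tentative merge of the two groups.
        merged = {f: (group[a] if g == group[b] else g) for f, g in group.items()}
        agree = contra = 0
        for (x, y), (together, apart) in votes.items():
            if merged[x] == merged[y]:
                agree, contra = agree + together, contra + apart
            else:
                agree, contra = agree + apart, contra + together
        # Keep the merge only while contradictions stay within the bound.
        if agree > 0 and contra <= max_ratio * agree:
            group = merged
    return group  # feature_id -> object label

On a full sequence, consensus_merge would be fed the outputs of segment_interval for every interval; the groups in its output correspond to the detected objects, so their number falls out of the merge rather than being fixed in advance.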
Published as: R. Fraile, D. Hogg and A. G. Cohn, "Motion Segmentation by Consensus", ICPR 2008.