Close-range imaging, long-range vision

  
2. Computer Vision background 
Our scheme for recognizing and tracking mobile 3D scenes from
a sequence of views involves several levels of analysis:
• Signal processing: design of filters and their computer
implementation to detect features and basic primitives
in binarized images associated with sampled images
extracted from a video sequence.
• Extraction of basic primitives: grouping of
minisegments located at boundaries, linearization of
auxiliary straight lines and auxiliary vanishing points.
• Generation of local structural elements: region
identification and grouping.
• Global matching of objects and scenes: identification
of adjacency relations, generation and tracking of
apparent
This scheme follows increasing information complexity, from the
pixel or "infinitesimal" local level, to the local level, and finally
to the global level. We have focused on some relations between the
local and global analysis levels for each view. For these, we have
implemented algorithms for compatibility criteria in the
local analysis, and algorithms for coherence evaluation criteria
in the global analysis. Next, we store and compare local and
global information for triplets of images. Compatibility criteria
work bottom-up, whereas coherence criteria work top-down: the
latter evaluate and validate the acquired information to decide
whether or not to integrate it into a knowledge scheme.
Robustness is reinforced by modifying the weight of data which
are already labelled as locally compatible and globally coherent
in previous evaluations.
At each level, we have local and global aspects. The feedback
between them is carried out by verifying local and
global constraints associated with spatio-temporal propagation
models. Spatial propagation models are used to analyze
the local compatibility between data. They can be represented as
coordinate changes induced by the projection of rigid
transformations. Temporal propagation models involve
locally symmetric dynamical systems defined on such a
support. If we select piecewise linear models, then the geometry
of line configurations, with the corresponding induced action,
provides a good candidate for temporal propagation models.
As we have a robust geometric support for the scene given by a
perspective model, we have chosen a kinematic propagation
model. So, in our work, propagation is performed along
comparable elements contained in the same image for the static
case, and along putative homologous elements in triplets of
images of a video sequence for the mobile case. The validation
of propagation models also includes a classical model of error
propagation and unbiased estimators for a grouping based on
trapezoids, extending previous constructions ([Coe92]). The
search for correspondences is performed between trapezoidal
regions rather than common features, which are more difficult
to identify and track. From a robust pair of
corresponding trapezoids in successive images of a video
sequence, we apply a propagation algorithm for prediction and
tracking of egomotion.
Extraction and grouping of basic primitives are performed in a
standard way. We have applied Canny's filter ([Can86]) to
obtain minisegments for the frames extracted from a digitized
video sequence (two per second). Canny's filter is easy to
implement, allows us to discriminate curved contours, and
gives a unique response per edge. Minisegments are grouped
along lines by applying Shin's algorithm ([SGB98]). In this
way we attenuate the striped effect which appears in the boundaries
and minisegments obtained by applying Canny's
detector. Furthermore, we eliminate short segments and apply
an active discrimination of vertical lines, owing to the
characteristics of the scene and to bad illumination (reflectances,
irradiance of the ground and floor, etc.).
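The last step of this paragraph (dropping short segments and singling out vertical lines) can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the length threshold `min_length` and the angular tolerance `vertical_tol_deg` are hypothetical values chosen only for illustration, and segments are assumed to be given as pairs of image points.

```python
import math

def segment_length(seg):
    """Euclidean length of a segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)

def angle_from_vertical_deg(seg):
    """Angle between the segment and the image vertical, in degrees.

    0 means a perfectly vertical segment, 90 a horizontal one.
    """
    (x1, y1), (x2, y2) = seg
    return math.degrees(math.atan2(abs(x2 - x1), abs(y2 - y1)))

def filter_segments(segments, min_length=20.0, vertical_tol_deg=5.0):
    """Drop short segments, then split the rest into vertical and other.

    Both thresholds are assumptions for this sketch, not values from the text.
    """
    kept = [s for s in segments if segment_length(s) >= min_length]
    vertical = [s for s in kept if angle_from_vertical_deg(s) <= vertical_tol_deg]
    other = [s for s in kept if angle_from_vertical_deg(s) > vertical_tol_deg]
    return vertical, other
```

In practice the thresholds would have to be tuned to the image resolution and to the scene.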
Perspective models are robust, provide an initialization for the
system, and allow image-to-image matching from easily
identifiable perspective elements (vanishing points, projection
lines, etc.). We adopt a paraperspective model for a coarse
planar representation of the scene, because it is easier to
maintain and update owing to its larger tolerance of small errors
arising from noisy data. In indoor scenes, because of illumination
problems (irradiance, shine, etc.), the true corners obtained
from Canny's detector are not well defined. Thus, we generate
perspective elements (projection lines and vanishing points) by
using regression methods around the apparent endpoints of long
central vertical segments. A weighted average of the intersections
of pairs of projection lines determines a
vanishing point. Next, the projection lines are redrawn to
recover a simplified version of the scene which is easier to work
with. In this way, we simplify the matching procedures by
avoiding a profusion of partial matches which could give us a
wrong representation of the paraperspective model for the 3D
scene.
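The vanishing-point estimate described above, a weighted average of pairwise intersections of projection lines, can be sketched as follows. The text does not specify the weights, so this sketch assumes that each line carries a hypothetical confidence weight (for instance the length of its supporting segment) and that each intersection is weighted by the product of the two line weights. All function names are illustrative.

```python
def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def intersection(l1, l2, eps=1e-9):
    """Intersection point of two lines, or None if they are (nearly) parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    return ((b1 * c2 - b2 * c1) / det, (c1 * a2 - c2 * a1) / det)

def vanishing_point(lines, weights):
    """Weighted average of all pairwise intersections of the given lines.

    The pair weight weights[i] * weights[j] is an assumption of this sketch.
    Returns None when no pair of lines intersects.
    """
    sum_w = sum_x = sum_y = 0.0
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            pt = intersection(lines[i], lines[j])
            if pt is None:
                continue
            w = weights[i] * weights[j]
            sum_w += w
            sum_x += w * pt[0]
            sum_y += w * pt[1]
    if sum_w == 0.0:
        return None
    return (sum_x / sum_w, sum_y / sum_w)
```

With noisy lines, nearly parallel pairs produce distant and unstable intersections, so a real system would probably also down-weight pairs that meet at a small angle.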
The general framework for establishing correspondences
between homologous points along a monocular video sequence is based
on epipolar geometry ([Fau93]). Reconstruction based on
3D points is feasible after identifying homologous points in pairs
of images in terms of lifted triangulations linked to the images.
Long segments are reconstructed by using collinearity
constraints and the Hough transform for partially occluded
segments. The epipolar constraints allow us to decouple the
estimation of 3D motion from the estimation of the structure of
the 3D scene. The global coherence of motion is based on the
identification and patching of homologous triangles along the
sequence of views. Next, homologous triangles are lifted to
prisms, and their intersection determines 3D triangles which
provide the basic pieces for the 3D reconstruction.
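For reference, the decoupling mentioned above can be read off the standard form of the epipolar constraint. The notation below is the usual textbook one and is an assumption of this note, not notation taken from this paper: x and x' are homologous image points in homogeneous coordinates, K is the calibration matrix of the single moving camera, and (R, t) is the rigid motion between the two views.

```latex
\[
  \mathbf{x}'^{\top} F \,\mathbf{x} = 0,
  \qquad
  F = K^{-\top} E \, K^{-1},
  \qquad
  E = [\mathbf{t}]_{\times} R .
\]
```

The constraint involves only the image points and the motion (R, t), not the depths of the 3D points, so motion can be estimated first and structure recovered afterwards.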
Triangulations are a standard tool in computer
packages, with applications to grouping ([Ber97]).
Triangulations are useful for 3D reconstruction because
homologous triangles determine the local homographies
associated with a projective transformation linking two views.
Updating triangulations is easy in terms of point
insertion/deletion algorithms ([Fau93], [Har00]). Hence, elementary
events in triangular data structures are linked to the
appearance or disappearance of points. Any triangulation T displays
a simple combinatorial structure which can be translated into a
graph G_T. Nodes of G_T represent simple triangular regions, and its
edges represent adjacency relations between triangles.
Furthermore, each point inserted inside a triangle splits the
original triangle into three triangles with the
corresponding adjacency relations. Often, the localization of
such points is corrupted by noise; in other words, they are not
easy to identify, or they are even eliminated during image
preprocessing.
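The combinatorial structure described above (triangles as nodes, shared edges as adjacency relations, and a point insertion that splits one triangle into three) can be sketched as a small data structure. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, vertices are plain labels, and no geometric test checks that the inserted point really lies inside the triangle.

```python
class Triangulation:
    """Minimal triangle store with an edge-adjacency graph."""

    def __init__(self):
        self.triangles = {}  # triangle id -> tuple of three vertex labels
        self._next_id = 0

    def add_triangle(self, a, b, c):
        tid = self._next_id
        self._next_id += 1
        self.triangles[tid] = (a, b, c)
        return tid

    def adjacency(self):
        """Return the graph: triangle id -> set of ids sharing an edge with it."""
        edge_owners = {}
        graph = {tid: set() for tid in self.triangles}
        for tid, (a, b, c) in self.triangles.items():
            for edge in (frozenset((a, b)), frozenset((b, c)), frozenset((c, a))):
                for other in edge_owners.setdefault(edge, []):
                    graph[tid].add(other)
                    graph[other].add(tid)
                edge_owners[edge].append(tid)
        return graph

    def insert_point(self, tid, p):
        """Split triangle `tid` into three triangles sharing the new vertex p."""
        a, b, c = self.triangles.pop(tid)
        return [
            self.add_triangle(a, b, p),
            self.add_triangle(b, c, p),
            self.add_triangle(c, a, p),
        ]
```

Recomputing the whole graph on each call keeps the sketch short; an incremental update touching only the split triangle and its neighbours would be the natural refinement.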
In real indoor scenes, because of illumination and partial occlusion
problems, long 2D segments are better determined than
real endpoints in the views. Instead of using points as in the
approaches described above, we identify and track segments along
some vertical directions and projection lines. The segments are 