In: Paparoditis N., Pierrot-Deseilligny M., Mallet C., Tournaire O. (Eds), IAPRS, Vol. XXXVIII, Part 3A - Saint-Mandé, France, September 1-3, 2010
2.2 Structure-from-Motion
Structure-from-Motion (Lhuillier, 2008a) is applied to estimate the
geometry (camera poses and a sparse cloud of 3D points) using the
calibration initialization of Section 2.1: estimate the 2-view and
3-view geometries of consecutive images from matched Harris points
(step 1), then estimate the whole sequence geometry using bundle
adjustment (BA) applied in a hierarchical framework (step 2). Another
BA is applied to refine simultaneously the radial distortion
parameters and the 3D geometry, assuming that the radial distortion
is constant over the whole sequence (step 3).
Although it is important to get a maximum number of reconstructed
features for 3D scene modeling, we noticed that there are many more
3D points than needed to initialize the geometry in our wide view
field context. Indeed, it is not uncommon to have more than 2000
features per outdoor image involved in BA, and this implies a waste
of computation time. So the number n_f of features per image is
limited to 500 in all BAs of steps 1-2-3: 3D points are randomly
selected and removed while n_f is larger than 500 in some image. In
practice, this simple scheme retains a good point distribution in
the view field. The 4th step is the following: step 3 is applied a
second time without the n_f limit to get a maximum number of
reconstructed features consistent with the poses and calibration.
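One plausible reading of this pruning rule can be sketched as follows
(the map from point id to the set of observing images is our own
hypothetical data layout, not the paper's):

```python
import random

def limit_features(points, n_f_max=500):
    """Randomly drop 3D points until no image observes more than
    n_f_max features. `points` maps a point id to the set of image
    indices observing it (hypothetical layout)."""
    count = {}                          # features currently kept per image
    for obs in points.values():
        for img in obs:
            count[img] = count.get(img, 0) + 1
    ids = list(points)
    random.shuffle(ids)                 # random selection order
    kept = {}
    for pid in ids:
        obs = points[pid]
        if any(count[img] > n_f_max for img in obs):
            for img in obs:             # drop: some image is over the limit
                count[img] -= 1
        else:
            kept[pid] = obs
    return kept
```

Since every point of an over-limit image is a removal candidate, each
image ends at or below the limit after a single shuffled pass.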
Our BA is the sparse Levenberg-Marquardt method assuming that there
are more structure parameters than camera ones: it includes a profile
Cholesky decomposition (Triggs et al., 2000) of the reduced camera
system.
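The reduced camera system can be illustrated by a dense numpy sketch
of one damped step: the (many) point parameters are eliminated by a
Schur complement, leaving a smaller system in the camera parameters.
Real BA exploits the block sparsity and factorizes S with a profile
Cholesky decomposition instead of the dense solves used here; all
names below are ours:

```python
import numpy as np

def reduced_camera_system(J_cam, J_pt, r, mu=1e-3):
    """One damped Gauss-Newton step of bundle adjustment, solved by
    eliminating the point parameters via a Schur complement.
    J_cam, J_pt: Jacobian blocks w.r.t. camera and point parameters,
    r: residual vector, mu: Levenberg-Marquardt damping."""
    U = J_cam.T @ J_cam + mu * np.eye(J_cam.shape[1])  # camera block
    V = J_pt.T @ J_pt + mu * np.eye(J_pt.shape[1])     # structure block
    W = J_cam.T @ J_pt                                 # coupling block
    b_c, b_p = -J_cam.T @ r, -J_pt.T @ r
    Vinv = np.linalg.inv(V)
    S = U - W @ Vinv @ W.T                 # reduced camera system
    d_cam = np.linalg.solve(S, b_c - W @ Vinv @ b_p)
    d_pt = Vinv @ (b_p - W.T @ d_cam)      # back-substitute for points
    return d_cam, d_pt
```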
2.3 Drift Removal
Drift, or error accumulation, is unavoidable in the geometry
estimation of a long sequence. Methods (Havlena et al., 2009, Anan
and Hartley, 2005, Cornelis et al., 2004) detect the drift between
two reconstructed images i and j if these images are taken at
similar locations. These methods also provide a list L_{i,j} of
point matches between i and j, which is used to remove the drift.
Without drift removal, the scene part visible in i and j is
reconstructed twice.
An adequate BA and its initialization are applied to remove
reconstruction duplicates while maintaining low re-projection errors
in the whole sequence of images {0 ... n-1}. Once the 2D feature
match list L_{i,j} is given for pair {i, j}, we remove the drift
between i and j as follows. First, we choose an integer k such that
the neighborhood N(i) of i is the list {i-k ... i ... i+k}. Second,
N(i) and its data (3D geometry and image features) are duplicated in
images N(n+k) = {n ... n+2k} such that images n+k and i are the
same. Third, we use RANSAC to fit the similarity transformation s of
the 3D points matched by L_{i,j} and apply s to obtain the N(n+k)
geometry in the same basis as the {0 ... n-1} geometry. Fourth, the
{0 ... n+2k} geometry is refined by BA taking into account L_{n+k,j}
(L_{n+k,j} is a copy of L_{i,j} with image index changes). Now the
N(n+k) geometry is the drift correction of the N(i) geometry. Fifth,
a constrained bundle adjustment (CBA) is applied to minimize the
global re-projection error subject to the constraint c(x) = 0, where
x concatenates the 3D parameters of {0 ... n+2k} and c(x)
concatenates the drifts between the poses of N(i) and N(n+k) (more
details in Appendix B). At this point, the drift between i and j is
removed but N(n+k) is redundant. Last, we remove the data involving
N(n+k) and apply BA to the {0 ... n-1} geometry by taking into
account L_{i,j}.
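The similarity fit of the third step can be sketched with Umeyama's
closed form; in the paper it would be wrapped in a RANSAC loop over
the matches of L_{i,j}, which we omit here (function name is ours):

```python
import numpy as np

def fit_similarity(X, Y):
    """Least-squares similarity s(x) = c * R @ x + t mapping the 3D
    points X onto their matches Y (Umeyama's closed form).
    X, Y: (n, 3) arrays of matched 3D points."""
    mx, my = X.mean(0), Y.mean(0)
    Xc, Yc = X - mx, Y - my
    U, D, Vt = np.linalg.svd(Yc.T @ Xc / len(X))   # cross-covariance
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                   # avoid a reflection
    R = U @ S @ Vt
    c = np.trace(np.diag(D) @ S) / Xc.var(0).sum()
    t = my - c * R @ mx
    return c, R, t
```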
This scheme is applied using a limit of n_f = 500 features per image
to avoid a waste of time as in Section 2.2, with the only difference
that L_{i,j} and L_{n+k,j} are not counted by this limit.
2.4 Over-Segmentation Mesh
This mesh has the following purposes. It provides an image-based
simplification of the scene such that the view field is uniformly
sampled. This is useful for the time and space complexities of
further processing and more adequate than storing depth maps for all
images. Furthermore, it segments the image into polygons such that
depth discontinuities are constrained to be at polygon borders.
These borders are selected among image contours. If contours are
lacking, borders are preferred on concentric circles or radial
segments of the donut image. This roughly corresponds to horizontal
and vertical depth discontinuities for a standard orientation of the
catadioptric camera (if its symmetry axis is vertical).
In short, the image mesh is built in four steps: initialization by a
checkerboard (rows are concentric rings, columns have radial
directions, cells are two Delaunay triangles), gradient edge
integration (perturb vertices to approximate the most prominent
image contours), optimization (perturb all vertices to minimize the
sum, over all triangles, of color variances, plus the sum, over all
vertices, of squared moduli of umbrella operators), and polygon
segmentation (triangles are regrouped into small and convex
polygons). In practice, many polygons are quadrilaterals similar to
those of the initialization checkerboard.
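The umbrella operator penalized in the optimization step is simply
the vector from a vertex to the centroid of its one-ring neighbours;
its squared modulus vanishes on a locally regular mesh. A minimal
sketch (function name is ours):

```python
import numpy as np

def umbrella(vertex, ring):
    """Umbrella (discrete Laplacian) operator at a mesh vertex:
    vector from the vertex to the centroid of its one-ring
    neighbours `ring`. The optimization penalizes its squared
    modulus as a smoothness term."""
    ring = np.asarray(ring, dtype=float)
    return ring.mean(axis=0) - np.asarray(vertex, dtype=float)
```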
2.5 View-Centered 3D Model
The view-centered 3D model is built from the image mesh (Section
2.4) assuming that the geometry is known (Sections 2.1, 2.2 and 2.3).
Depth Map in the Reference Image We reproject the catadioptric image
onto the 6 faces of a virtual cube and apply match propagation
(Lhuillier and Quan, 2002) to two parallel faces of two cubes. The
depth map in the i-th image is obtained by chaining matches between
consecutive images of N(i). In the next steps, the
over-segmentation mesh in the i-th image is back-projected to
approximate the depth map.
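The cube reprojection amounts to mapping every viewing ray of the
calibrated catadioptric model to one of six perspective faces. A
minimal sketch of the face selection (the face numbering and (u, v)
conventions are ours):

```python
import numpy as np

def cube_face(d):
    """Map a viewing ray direction d (3-vector) to one of the 6
    faces of a virtual cube and to (u, v) in [-1, 1]^2 on that face.
    Faces are numbered 0:+x 1:-x 2:+y 3:-y 4:+z 5:-z."""
    d = np.asarray(d, dtype=float)
    axis = int(np.argmax(np.abs(d)))   # dominant component picks the face
    sign = 1 if d[axis] > 0 else -1
    u, v = [d[i] / (sign * d[axis]) for i in range(3) if i != axis]
    face = 2 * axis + (0 if sign > 0 else 1)
    return face, u, v
```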
Mesh Initialization For all polygons in image i, a plane in 3D (or
nil on failure) is estimated by a RANSAC procedure applied to the
depths available inside the polygon. A depth is an inlier of a
tentative plane if the corresponding 3D point is in this plane up to
thresholding (Appendix A). The best plane π defines 3D points which
are the intersections between π and the observation rays of the
polygon vertices in the i-th image. These 3D points are called "3D
vertices of the polygon" although the polygon is 2D.
For all edges e in image i, we define a boolean b_e which will
determine the connection of the triangles on both sides of the edge.
Since depth discontinuity is prohibited inside polygons, we
initialize b_e = 1 if both triangles are in the same polygon (other
cases: b_e = 0).
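The per-polygon plane estimation might look as follows (the fixed
distance threshold stands in for the Appendix-A test, and all names
are ours):

```python
import random
import numpy as np

def ransac_plane(points, thresh=0.05, iters=100, min_inliers=6):
    """RANSAC plane fit to the 3D points backprojected from the
    depths inside one polygon. Returns (n, d) with n . x = d for
    inliers, or None on failure ("nil" in the text)."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return None
    best, best_in = None, min_inliers - 1
    for _ in range(iters):
        sample = pts[random.sample(range(len(pts)), 3)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue                    # degenerate sample
        n /= norm
        d = n @ sample[0]
        inliers = np.abs(pts @ n - d) < thresh
        if inliers.sum() > best_in:
            best, best_in = (n, d), int(inliers.sum())
    return best
```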
Connection Connections between polygons are needed to obtain a more
realistic 3D model. Thus edge booleans are forced to 1 if
neighboring polygons satisfy a coplanarity constraint. For all
polygons p with a plane in 3D, we collect in a list L_p the polygons
q in the p-neighborhood (including p) such that all 3D vertices of q
are in the plane of p up to thresholding (Appendix A). If the sum of
the solid angles of the polygons in L_p is greater than a threshold,
we have confidence in the coplanarity of all polygons in L_p and we
set b_e = 1 for all edges e between two polygons of L_p.
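Under our own hypothetical accessors (plane_of returning (n, d) with
n . x = d or None, verts_of giving a polygon's 3D vertices,
solid_angle_of its solid angle from the viewpoint, neighbors its
adjacent polygons), the grouping could be sketched as:

```python
import numpy as np

def coplanar_edges(polys, neighbors, plane_of, verts_of,
                   solid_angle_of, dist_thresh=0.05, angle_thresh=0.5):
    """Collect the polygon pairs whose shared edges get b_e = 1 by
    the coplanarity test; the distance threshold stands in for the
    Appendix-A test. Returns a set of frozenset pairs."""
    connected = set()
    for p in polys:
        if plane_of(p) is None:
            continue
        n, d = plane_of(p)
        Lp = [q for q in list(neighbors(p)) + [p]
              if plane_of(q) is not None and
              all(abs(np.dot(n, v) - d) < dist_thresh
                  for v in verts_of(q))]
        if sum(solid_angle_of(q) for q in Lp) > angle_thresh:
            for q in Lp:                # connect adjacent pairs of L_p
                for q2 in Lp:
                    if q != q2 and q2 in neighbors(q):
                        connected.add(frozenset((q, q2)))
    return connected
```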
Hole Filling We fill a hole H if its neighborhood N is coplanar.
Both H and N are polygon lists. The former is a connected component
of polygons without a plane in 3D and the latter contains polygons
with a plane in 3D. Neighborhood N is coplanar if there is a plane π
(generated by random samples of vertices of N) such that all 3D
vertices in N are in π up to thresholding (Appendix A). If N is
coplanar, all polygons of H get the plane π and we set b_e = 1 for
all edges between two triangles of H ∪ N.