Figure 1: Geometry estimation steps: (a) day 1 sequence, (b) remove drift, (c) merge day 1-2 sequences, (d-f) remove drifts, (g) use all features. All results are registered in rectangle [0, 1] x [0, 0.8] by enforcing constant coordinates on the two poses surrounded by gray disks in (a). Gray disks in (b,d,e,f) show poses where drift is corrected. Day 1-2 sequences are merged on the gray disk in (c).
remove drifts using k = 1. Cases (b,d,e,f) of Fig. 1 are trajectory loops with (424, 451, 1434, 216) images and are obtained by (16, 62, 39, 9) CBA iterations in (190, 2400, 1460, 370) seconds, respectively. We think that a large part of the drift in case (d) is due to the single view point approximation, which is inaccurate in the outdoor corridor (top right corner of Fig. 4) with small scene depth. A final BA is applied to refine the geometry (3D and intrinsic parameters) and to increase the list of reconstructed points. The final geometry (Fig. 1(g)) has 699410 points reconstructed from 3.9M Harris features; the means of track length and of 3D points visible in one view are 5.5 and 1721, respectively.
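The CBA itself is detailed elsewhere in the paper; as a hedged illustration of what loop-closure drift removal accomplishes, the Python sketch below spreads a translational closure residual linearly along a pose chain. This is a standard simplification, not the paper's CBA, and the names (correct_loop_drift, centers, closure_center) are hypothetical.

import numpy as np

def correct_loop_drift(centers, closure_center):
    """Distribute the translational loop-closure error along a trajectory.

    centers        : (N, 3) array of estimated camera centers forming a loop.
    closure_center : (3,) position the last camera should have (e.g. as
                     re-estimated from day-1/day-2 matches).

    Returns corrected (N, 3) centers. This linear spreading of the residual
    is a simpler stand-in for the paper's constrained bundle adjustment
    (CBA); it ignores rotation and scale drift.
    """
    centers = np.asarray(centers, dtype=float)
    drift = np.asarray(closure_center, dtype=float) - centers[-1]
    # Weight grows from 0 at the loop start to 1 at the loop end, so the
    # first pose stays fixed and the last pose lands on closure_center.
    weights = np.linspace(0.0, 1.0, len(centers))[:, None]
    return centers + weights * drift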
Then, 2256 view-centered models are reconstructed thanks to the methods in Sections 2.4 and 2.5 using k = 1. This is the most time-consuming part of the method since one view-centered model is computed in about 3 min 30 s. The first step of view-centered model computation is the mesh-based over-segmentation of the reference image. It samples the view field such that the super-pixels in the neighborhood of the horizontal plane projection are initialized as squares of 8x8 pixels in the images. The mean number of 3D triangles is 17547. Fig. 3 shows the super-pixels of a reference image and the resulting view-centered model.
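As a rough illustration of the 8x8 initialization (assuming a simple label grid; the paper's mesh-based super-pixels and their subsequent refinement are not reproduced), the over-segmentation could be seeded as follows. The helper init_superpixel_grid is a hypothetical name.

import numpy as np

def init_superpixel_grid(height, width, cell=8):
    """Label image pixels with an initial grid of cell x cell super-pixels.

    Sketch of the initialization step only: each super-pixel starts as an
    8x8 square (the size quoted in the text near the horizontal-plane
    projection).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cols = (width + cell - 1) // cell          # grid cells per image row
    labels = (ys // cell) * cols + (xs // cell)
    return labels  # (height, width) int array of super-pixel ids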
Figure 2: From top to bottom: 722 and 535 matches of L_{t,j} used to remove drift in cases (d) and (e) of Fig. 1. Images of days 1 and 2 are on the left and right, respectively.
Last, the methods in Section 2.6 are applied to filter the 39.6M triangles stored on the hard disk. A first filtering is done using the reliability (angle threshold of 5 degrees), prior knowledge and uncertainty (threshold of 1.1) filters: we obtain 6.5M triangles in 40 min and store them in RAM. Redundancy removal is the last filtering and selects 4.5M triangles in 44 min. Texture packing and VRML file saving take 9 min. Fig. 4 shows views of the final model. We note that the scene is curved as if it lay on the surface of a sphere whose diameter is several kilometers: a vertical component of drift remains.
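The exact filter definitions belong to Section 2.6; as a hedged sketch consistent with the 5-degree threshold above, an angle-based reliability test could reject triangles reconstructed under a too-small apex angle between two observing camera centers (small angles mean poorly constrained depth). All names below are illustrative, not the paper's implementation.

import numpy as np

def apex_angle_deg(point, c0, c1):
    """Angle (degrees) at `point` between the rays toward camera
    centers c0 and c1."""
    r0 = np.asarray(c0, float) - point
    r1 = np.asarray(c1, float) - point
    cosang = np.dot(r0, r1) / (np.linalg.norm(r0) * np.linalg.norm(r1))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def reliable(triangle_vertices, cam_a, cam_b, min_angle_deg=5.0):
    """Keep a triangle only if every vertex subtends at least
    min_angle_deg between the two observing cameras."""
    return all(apex_angle_deg(v, cam_a, cam_b) >= min_angle_deg
               for v in triangle_vertices)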
Another experiment is the quantitative evaluation of scene accuracy (discrepancy between scene reconstruction and ground truth) for a view-centered model using k = 1. A representative range of baselines is obtained with the following ground truth: the [0, 5]^3 cube and camera locations defined by c_i = (1, 1 + i/5, 1), i in {0, 1, 2} (numbers in meters). First, synthetic images are generated using ray-tracing and the knowledge of mirror/perspective camera/textured cube. Second, the methods in Sections 2.1, 2.2, 2.4 and 2.5 are applied. Third, a camera-based registration is applied to put the scene estimation in the coordinate frame of the ground truth. Last, the scene accuracy a_{0.9} is estimated using the distance e between vertex v of the model and the ground truth surface: the inequality |e(v)| < a_{0.9} ||v - c_1|| is true for 90% of vertices. We obtain a_{0.9} = 0.015.
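The a_{0.9} statistic defined above can be computed directly from the vertex errors. A minimal sketch follows, assuming the signed distances e(v) to the ground-truth surface have already been computed; scene_accuracy is a hypothetical helper name.

import numpy as np

def scene_accuracy(vertices, errors, c1, quantile=0.9):
    """Smallest a such that |e(v)| < a * ||v - c1|| holds for `quantile`
    of the vertices, matching the definition of a_{0.9} in the text.

    vertices : (N, 3) model vertices
    errors   : (N,) signed distances e(v) to the ground-truth surface
    c1       : (3,) camera center used to normalize the error
    """
    vertices = np.asarray(vertices, float)
    ratios = np.abs(np.asarray(errors, float)) \
        / np.linalg.norm(vertices - np.asarray(c1, float), axis=1)
    return np.quantile(ratios, quantile)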
4 CONCLUSION
We present an environment reconstruction system from images acquired by a $1000 camera. Several items are described: camera model, structure-from-motion, drift removal, view field sampling by super-pixels, view-centered models and triangle filtering. Unlike previous work, image meshes define both the super-pixels (convex polygons) and the triangles of the 3D models. The current system is fully automatic up to the loop detection step (which previous methods could solve). Last, it is demonstrated on a challenging sequence.
Future work includes loop detection integration, better use of
visibility and prior knowledge for scene reconstruction, joining