ISPRS Commission II, Vol.34, Part 3A „Photogrammetric Computer Vision“, Graz, 2002
Figure 4: 3D reconstruction of a Medusa head. (a) one
of the original video frames, (b) corresponding depth map,
(c) shaded and (d) textured view of the 3D model.
increased through the combination of multiple viewpoints
and a large global baseline, while the matching is simplified
through the small local baselines.
2.4 Building visual models
In the previous sections a dense structure and motion re-
covery approach was explained. This yields all the neces-
sary information to build photo-realistic virtual models.
3D models The 3D surface is approximated by a triangu-
lar mesh to reduce geometric complexity and to tailor the
model to the requirements of computer graphics visualiza-
tion systems. A simple approach consists of overlaying a
2D triangular mesh on top of one of the images and then
building a corresponding 3D mesh by placing the vertices of
the triangles in 3D space according to the values found in
the corresponding depth map. The image itself is used as
texture map. If no depth value is available or the confidence
is too low, the corresponding triangles are not reconstructed.
The same happens when triangles are placed over
discontinuities. This approach works well on dense depth
maps obtained from multiple stereo pairs.
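The mesh-building step described above can be sketched as follows. This is our illustrative reconstruction, not code from the paper: the function name, the grid spacing, and the confidence and discontinuity thresholds are all assumptions; it assumes a per-pixel depth map, a per-pixel confidence map, and known camera intrinsics K.

```python
import numpy as np

def depth_map_to_mesh(depth, confidence, K, step=4,
                      min_conf=0.5, max_jump=0.05):
    """Overlay a regular 2D triangle grid on the image and lift its
    vertices into 3D using the depth map.  Triangles are dropped when
    a vertex has low-confidence depth or when the relative depth jump
    across the triangle suggests a discontinuity (thresholds are
    illustrative)."""
    h, w = depth.shape
    K_inv = np.linalg.inv(K)
    # Vertex pixel coordinates on a regular grid.
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    gh, gw = ys.shape
    # Back-project each vertex: X = depth * K^-1 [u, v, 1]^T.
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    verts = (K_inv @ pix) * depth[ys.ravel(), xs.ravel()]
    conf = confidence[ys, xs]
    d = depth[ys, xs]
    tris = []
    for i in range(gh - 1):
        for j in range(gw - 1):
            # Two triangles per grid cell.
            for tri in ([(i, j), (i + 1, j), (i, j + 1)],
                        [(i + 1, j), (i + 1, j + 1), (i, j + 1)]):
                dv = [d[a, b] for a, b in tri]
                cv = [conf[a, b] for a, b in tri]
                if min(cv) < min_conf or max(dv) <= 0:
                    continue  # missing or unreliable depth
                if (max(dv) - min(dv)) / max(dv) > max_jump:
                    continue  # likely depth discontinuity
                tris.append([a * gw + b for a, b in tri])
    return verts.T, np.array(tris)
```

The image pixel coordinates of the grid vertices double as texture coordinates, so the image itself serves directly as the texture map.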
The texture itself can also be enhanced through the multi-
view linking scheme. A median or robust mean of the
corresponding texture values can be computed to discard
imaging artifacts like sensor noise, specular reflections and
highlights (Koch et al., 1998, Ofek et al., 1997).
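A minimal sketch of such a robust blend, assuming the per-view texture samples have already been brought into correspondence by the multi-view linking (the function name and array layout are our assumptions):

```python
import numpy as np

def robust_texture(samples):
    """Median-blend corresponding texture values from multiple views.

    `samples` is an (n_views, H, W, 3) array holding the same surface
    patch as seen in each linked view; the per-pixel median discards
    outliers such as specular highlights and sensor noise.
    """
    return np.median(samples, axis=0)
```

A trimmed or robust mean over the view axis would serve the same purpose; the median is simply the easiest estimator that ignores a minority of outlier views.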
To reconstruct more complex shapes it is necessary to com-
bine multiple depth maps. Since all depth maps are located
in a single metric frame, registration is not an issue. To
integrate the multiple depth maps into a single surface rep-
resentation, the volumetric technique proposed in (Curless
and Levoy, 1996) is used.
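The core of that volumetric technique is a cumulative weighted average of truncated signed distances in a voxel grid. The following is a much-simplified sketch of that update, not the authors' implementation: unit per-observation weights, nearest-pixel depth lookup, and all names are our illustrative assumptions.

```python
import numpy as np

def fuse_depth_maps(depth_maps, poses, K, grid_shape, voxel_size,
                    origin, trunc=0.05):
    """Average truncated signed distances from each registered depth
    map into a voxel grid; the fused surface is the zero level set of
    the returned `tsdf` volume (extractable with marching cubes)."""
    tsdf = np.zeros(grid_shape)
    weight = np.zeros(grid_shape)
    # World coordinates of every voxel centre.
    ii, jj, kk = np.indices(grid_shape)
    pts = origin + voxel_size * np.stack([ii, jj, kk], -1).reshape(-1, 3)
    for depth, (R, t) in zip(depth_maps, poses):
        cam = pts @ R.T + t                      # world -> camera frame
        z = cam[:, 2]
        uvw = cam @ K.T                          # project into the image
        u = np.round(uvw[:, 0] / z).astype(int)
        v = np.round(uvw[:, 1] / z).astype(int)
        h, w = depth.shape
        ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = np.where(ok, depth[v.clip(0, h - 1), u.clip(0, w - 1)], 0.0)
        sdf = d - z                              # positive in front of surface
        # Update only voxels with a measurement, lying in front of or
        # just behind the observed surface (truncation distance `trunc`).
        near = ok & (d > 0) & (sdf > -trunc)
        new = np.clip(sdf, -trunc, trunc) / trunc
        idx = np.unravel_index(near.nonzero()[0], grid_shape)
        # Cumulative weighted average (here: unit weight per observation).
        tsdf[idx] = (tsdf[idx] * weight[idx] + new[near]) / (weight[idx] + 1)
        weight[idx] += 1
    return tsdf, weight
```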
An important advantage of our approach compared to more
interactive techniques (Debevec et al., 1996, PhotoMod-
eler) is that much more complex objects can be dealt with.
Compared to non-image based techniques we have the im-
portant advantage that surface texture is directly extracted
from the images. This not only results in a much higher
degree of realism, but is also important for the authenticity
of the reconstruction. Therefore the reconstructions ob-
tained with this system can also be used as a scale model
on which measurements can be carried out or as a tool
for planning restorations. A disadvantage of our approach
(and, more generally, of most image-based approaches) is
that our technique cannot directly capture the photometric
properties of an object, but only their combination with
the lighting. It is therefore not possible to re-render the
3D model under different lighting. This is a topic of future
research.
Lightfield rendering Alternatively, when the purpose is
to render new views from similar viewpoints, image-based
approaches can be used (Levoy and Hanrahan, 1996, Gortler
et al., 1996). The approach we present here avoids the
difficult problem of obtaining a consistent 3D model by
using view-dependent texture and geometry. This also
makes it possible to take more complex visual effects, such
as reflections and highlights, into account. This approach
renders views directly from the calibrated sequence of
recorded images using local depth maps. The original images
are directly mapped onto one or more planes viewed by a
virtual camera.
To obtain a high-quality image-based scene representation,
we need many views of a scene from many directions. For
this, we can record an extended image sequence, moving
the camera in a zigzag-like manner. To obtain a good
quality structure-and-motion estimation from this type of
sequence and reduce error accumulation it can be impor-
tant to also match close views that are not predecessors or
successors in the image stream (Koch et al., 1999).
The simplest approach consists of approximating the scene
geometry by a single plane. The mapping from a recorded
image to a new view or vice-versa then corresponds to a
homography. To construct a specific view it is best to in-
terpolate between neighboring views. The color value for a
particular pixel is thus best obtained from those views
whose projection centers are close to the viewing ray of this
pixel or, equivalently, whose centers project closest to the
specified pixel.
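The plane-induced mapping has the standard closed form H = K_virt (R − t nᵀ/d) K_src⁻¹, for a plane with normal n at distance d in the source-camera frame and relative pose (R, t). A small sketch (function names are ours, not from the paper):

```python
import numpy as np

def plane_homography(K_src, K_virt, R, t, n, d):
    """Homography induced by the scene plane n.X = d, mapping pixels
    of the source view to the virtual view with relative pose (R, t):
    H = K_virt (R - t n^T / d) K_src^-1 (normalised so H[2,2] = 1)."""
    H = K_virt @ (R - np.outer(t, n) / d) @ np.linalg.inv(K_src)
    return H / H[2, 2]

def warp_point(H, u, v):
    """Map one pixel through the homography (homogeneous division)."""
    x = H @ np.array([u, v, 1.0])
    return x[0] / x[2], x[1] / x[2]
```

For whole images, the same H can be handed to any projective image-warping routine instead of mapping pixels one by one.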
For simplicity the support is restricted to the nearest three
cameras (see Figure 5). All camera centers are projected
into the virtual image and a 2D triangulation is performed.
The cameras corresponding to the corners of a triangle then
contribute to all pixels inside the triangle. The color val-
ues are blended using the baricentric coordinates on the
triangle as weights. The total image is built up as a mo-
saic of these triangles. Although this technique assumes a
very sparse approximation of geometry, the rendering re-
sults show only small ghosting artifacts (see experiments).
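The blending step above can be sketched for a single pixel and its three supporting cameras (in practice the triangulation over all projected camera centers could be computed with, e.g., scipy.spatial.Delaunay; the names here are illustrative, not the paper's):

```python
import numpy as np

def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c),
    obtained by solving the 2x2 linear system T w = p - c."""
    T = np.array([[a[0] - c[0], b[0] - c[0]],
                  [a[1] - c[1], b[1] - c[1]]])
    w = np.linalg.solve(T, np.asarray(p, float) - np.asarray(c, float))
    return np.array([w[0], w[1], 1.0 - w[0] - w[1]])

def blend_pixel(p, cam_centers_2d, colors):
    """Blend the colour a virtual-view pixel receives from the three
    cameras whose projected centres form its enclosing triangle,
    weighted by the pixel's barycentric coordinates."""
    w = barycentric_weights(p, *cam_centers_2d)
    return w @ np.asarray(colors, float)
```

At a triangle corner the weight of the corresponding camera is 1, so the rendered view degrades gracefully to the nearest recorded image as the virtual camera approaches a real one.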
The results can be improved further by using a different
approximating plane for each triangle. This increases the
accuracy, since the approximation is no longer made for
the whole scene but only for the part of the image seen
through the given triangle. The 3D position
of the triangle vertices can be obtained by looking up the