ISPRS Commission II, Vol.34, Part 3A „Photogrammetric Computer Vision“, Graz, 2002
Figure 4: 3D reconstruction of a Medusa head. (a) one
of the original video frames, (b) corresponding depth map,
(c) shaded and (d) textured view of the 3D model.
increased through the combination of multiple viewpoints
and a large global baseline, while the matching is simplified
through the small local baselines.
2.4 Building visual models
In the previous sections a dense structure and motion re-
covery approach was explained. This yields all the neces-
sary information to build photo-realistic virtual models.
3D models The 3D surface is approximated by a triangu-
lar mesh to reduce geometric complexity and to tailor the
model to the requirements of computer graphics visualiza-
tion systems. A simple approach consists of overlaying a
2D triangular mesh on top of one of the images and then
building a corresponding 3D mesh by placing the vertices of
the triangles in 3D space according to the values found in
the corresponding depth map. The image itself is used as
texture map. If no depth value is available or the confidence
is too low, the corresponding triangles are not reconstructed.
The same happens when triangles are placed over
discontinuities. This approach works well on dense depth
maps obtained from multiple stereo pairs.
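The mesh-building step described above can be sketched as follows. This is our illustrative reconstruction, not code from the paper: the function name, the grid spacing, and the confidence and discontinuity thresholds are all assumptions; it assumes a per-pixel depth map, a per-pixel confidence map, and known camera intrinsics K.

```python
import numpy as np

def depth_map_to_mesh(depth, confidence, K, step=4,
                      min_conf=0.5, max_jump=0.05):
    """Overlay a regular 2D triangle grid on the image and lift its
    vertices into 3D using the depth map.  Triangles are dropped when
    a vertex has low-confidence depth or when the relative depth jump
    across the triangle suggests a discontinuity (thresholds are
    illustrative)."""
    h, w = depth.shape
    K_inv = np.linalg.inv(K)
    # Vertex pixel coordinates on a regular grid.
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    gh, gw = ys.shape
    # Back-project each vertex: X = depth * K^-1 [u, v, 1]^T.
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    verts = (K_inv @ pix) * depth[ys.ravel(), xs.ravel()]
    conf = confidence[ys, xs]
    d = depth[ys, xs]
    tris = []
    for i in range(gh - 1):
        for j in range(gw - 1):
            # Two triangles per grid cell.
            for tri in ([(i, j), (i + 1, j), (i, j + 1)],
                        [(i + 1, j), (i + 1, j + 1), (i, j + 1)]):
                dv = [d[a, b] for a, b in tri]
                cv = [conf[a, b] for a, b in tri]
                if min(cv) < min_conf or max(dv) <= 0:
                    continue  # missing or unreliable depth
                if (max(dv) - min(dv)) / max(dv) > max_jump:
                    continue  # likely depth discontinuity
                tris.append([a * gw + b for a, b in tri])
    return verts.T, np.array(tris)
```

The image pixel coordinates of the grid vertices double as texture coordinates, so the image itself serves directly as the texture map.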
The texture itself can also be enhanced through the multi-
view linking scheme. A median or robust mean of the
corresponding texture values can be computed to discard
imaging artifacts like sensor noise, specular reflections and
highlights (Koch et al., 1998, Ofek et al., 1997).
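A minimal sketch of such a robust blend, assuming the per-view texture samples have already been brought into correspondence by the multi-view linking (the function name and array layout are our assumptions):

```python
import numpy as np

def robust_texture(samples):
    """Median-blend corresponding texture values from multiple views.

    `samples` is an (n_views, H, W, 3) array holding the same surface
    patch as seen in each linked view; the per-pixel median discards
    outliers such as specular highlights and sensor noise.
    """
    return np.median(samples, axis=0)
```

A trimmed or robust mean over the view axis would serve the same purpose; the median is simply the easiest estimator that ignores a minority of outlier views.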
To reconstruct more complex shapes it is necessary to com-
bine multiple depth maps. Since all depth maps are located
in a single metric frame, registration is not an issue. To
integrate the multiple depth maps into a single surface rep-
resentation, the volumetric technique proposed in (Curless
and Levoy, 1996) is used.
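The core of that volumetric technique is a cumulative weighted average of truncated signed distances in a voxel grid. The following is a much-simplified sketch of that update, not the authors' implementation: unit per-observation weights, nearest-pixel depth lookup, and all names are our illustrative assumptions.

```python
import numpy as np

def fuse_depth_maps(depth_maps, poses, K, grid_shape, voxel_size,
                    origin, trunc=0.05):
    """Average truncated signed distances from each registered depth
    map into a voxel grid; the fused surface is the zero level set of
    the returned `tsdf` volume (extractable with marching cubes)."""
    tsdf = np.zeros(grid_shape)
    weight = np.zeros(grid_shape)
    # World coordinates of every voxel centre.
    ii, jj, kk = np.indices(grid_shape)
    pts = origin + voxel_size * np.stack([ii, jj, kk], -1).reshape(-1, 3)
    for depth, (R, t) in zip(depth_maps, poses):
        cam = pts @ R.T + t                      # world -> camera frame
        z = cam[:, 2]
        uvw = cam @ K.T                          # project into the image
        u = np.round(uvw[:, 0] / z).astype(int)
        v = np.round(uvw[:, 1] / z).astype(int)
        h, w = depth.shape
        ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = np.where(ok, depth[v.clip(0, h - 1), u.clip(0, w - 1)], 0.0)
        sdf = d - z                              # positive in front of surface
        # Update only voxels with a measurement, lying in front of or
        # just behind the observed surface (truncation distance `trunc`).
        near = ok & (d > 0) & (sdf > -trunc)
        new = np.clip(sdf, -trunc, trunc) / trunc
        idx = np.unravel_index(near.nonzero()[0], grid_shape)
        # Cumulative weighted average (here: unit weight per observation).
        tsdf[idx] = (tsdf[idx] * weight[idx] + new[near]) / (weight[idx] + 1)
        weight[idx] += 1
    return tsdf, weight
```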
An important advantage of our approach compared to more
interactive techniques (Debevec et al., 1996, PhotoMod-
eler) is that much more complex objects can be dealt with.
Compared to non-image based techniques we have the im-
portant advantage that surface texture is directly extracted
from the images. This not only results in a much higher
degree of realism, but is also important for the authenticity
of the reconstruction. Therefore the reconstructions ob-
tained with this system can also be used as a scale model
on which measurements can be carried out or as a tool
for planning restorations. A disadvantage of our approach
(and, more generally, of most image-based approaches) is
that our technique cannot directly capture the photometric
properties of an object, but only their combination with
the lighting. It is therefore not possible to re-render the
3D model under different lighting. This is a topic of future
research.
Lightfield rendering Alternatively, when the purpose is
to render new views from similar viewpoints, image-based
approaches can be used (Levoy and Hanrahan, 1996, Gortler
et al., 1996). The approach we present here avoids the
difficult problem of obtaining a consistent 3D model by
using view-dependent texture and geometry. This also
makes it possible to take more complex visual effects, such
as reflections and highlights, into account. This approach
renders views directly from the calibrated sequence of
recorded images using local depth maps. The original images
are directly mapped onto one or more planes viewed by a
virtual camera.
To obtain a high-quality image-based scene representation,
we need many views of a scene from many directions. For
this, we can record an extended image sequence, moving
the camera in a zigzag-like manner. To obtain a good
quality structure-and-motion estimation from this type of
sequence and reduce error accumulation it can be impor-
tant to also match close views that are not predecessors or
successors in the image stream (Koch et al., 1999).
The simplest approach consists of approximating the scene
geometry by a single plane. The mapping from a recorded
image to a new view or vice-versa then corresponds to a
homography. To construct a specific view it is best to in-
terpolate between neighboring views. The color value for a
particular pixel is thus best obtained from those views
whose projection centers are close to the viewing ray of this
pixel or, equivalently, whose centers project closest to the
specified pixel.
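The plane-induced mapping has the standard closed form H = K_virt (R − t nᵀ/d) K_src⁻¹, for a plane with normal n at distance d in the source-camera frame and relative pose (R, t). A small sketch (function names are ours, not from the paper):

```python
import numpy as np

def plane_homography(K_src, K_virt, R, t, n, d):
    """Homography induced by the scene plane n.X = d, mapping pixels
    of the source view to the virtual view with relative pose (R, t):
    H = K_virt (R - t n^T / d) K_src^-1 (normalised so H[2,2] = 1)."""
    H = K_virt @ (R - np.outer(t, n) / d) @ np.linalg.inv(K_src)
    return H / H[2, 2]

def warp_point(H, u, v):
    """Map one pixel through the homography (homogeneous division)."""
    x = H @ np.array([u, v, 1.0])
    return x[0] / x[2], x[1] / x[2]
```

For whole images, the same H can be handed to any projective image-warping routine instead of mapping pixels one by one.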
For simplicity the support is restricted to the nearest three
cameras (see Figure 5). All camera centers are projected
into the virtual image and a 2D triangulation is performed.
The cameras corresponding to the corners of a triangle then
contribute to all pixels inside the triangle. The color val-
ues are blended using the baricentric coordinates on the
triangle as weights. The total image is built up as a mo-
saic of these triangles. Although this technique assumes a
very sparse approximation of geometry, the rendering re-
sults show only small ghosting artifacts (see experiments).
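The blending step above can be sketched for a single pixel and its three supporting cameras (in practice the triangulation over all projected camera centers could be computed with, e.g., scipy.spatial.Delaunay; the names here are illustrative, not the paper's):

```python
import numpy as np

def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c),
    obtained by solving the 2x2 linear system T w = p - c."""
    T = np.array([[a[0] - c[0], b[0] - c[0]],
                  [a[1] - c[1], b[1] - c[1]]])
    w = np.linalg.solve(T, np.asarray(p, float) - np.asarray(c, float))
    return np.array([w[0], w[1], 1.0 - w[0] - w[1]])

def blend_pixel(p, cam_centers_2d, colors):
    """Blend the colour a virtual-view pixel receives from the three
    cameras whose projected centres form its enclosing triangle,
    weighted by the pixel's barycentric coordinates."""
    w = barycentric_weights(p, *cam_centers_2d)
    return w @ np.asarray(colors, float)
```

At a triangle corner the weight of the corresponding camera is 1, so the rendered view degrades gracefully to the nearest recorded image as the virtual camera approaches a real one.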
The results can be improved further by using a different
approximating plane for each triangle. This increases the
accuracy, since the approximation is no longer made for
the whole scene but only for the part of the image seen
through the given triangle. The 3D position
of the triangle vertices can be obtained by looking up the