5.4 Minimization
The main problem of non-linear parameter estimation
is to find a method that guarantees convergence of
the cost function (eq. 20) to a global minimum.
Minimization with the Levenberg-Marquardt method
(see [20]), which combines Newton’s method and
gradient descent, converges to the nearest local
minimum. The global minimum is therefore found only
if the initial parameter values are good. However,
we do not have initial parameter estimates.
Thus, we divide the global model fitting problem
into three steps to enhance and monitor the param-
eter estimates.
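The way the Levenberg-Marquardt method blends a Newton-type step with gradient descent can be illustrated on a toy problem. The sketch below is not the implementation used here and does not reproduce the cost function of eq. 20; the two-parameter model y = a * exp(b * x), the function names, and the damping schedule (factor 10) are assumptions chosen for illustration. A small damping value lam gives a Gauss-Newton step, a large one gives a short gradient-descent step.

```python
import math


def residuals(p, xs, ys):
    """Residuals of the assumed toy model y = a * exp(b * x)."""
    a, b = p
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(p, xs):
    """Partial derivatives of each residual with respect to a and b."""
    a, b = p
    return [[math.exp(b * x), a * x * math.exp(b * x)] for x in xs]


def cost(r):
    return sum(ri * ri for ri in r)


def levenberg_marquardt(p, xs, ys, lam=1e-3, max_iter=200, tol=1e-10):
    r = residuals(p, xs, ys)
    for _ in range(max_iter):
        jac = jacobian(p, xs)
        # Entries of J^T J and of the gradient J^T r.
        a11 = sum(j[0] * j[0] for j in jac)
        a12 = sum(j[0] * j[1] for j in jac)
        a22 = sum(j[1] * j[1] for j in jac)
        g1 = sum(j[0] * ri for j, ri in zip(jac, r))
        g2 = sum(j[1] * ri for j, ri in zip(jac, r))
        # Solve (J^T J + lam * diag(J^T J)) d = -J^T r for the 2x2 case.
        m11, m22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = m11 * m22 - a12 * a12
        d1 = (-g1 * m22 + g2 * a12) / det
        d2 = (-g2 * m11 + g1 * a12) / det
        if abs(d1) + abs(d2) < tol:
            break  # parameter change below threshold
        trial = [p[0] + d1, p[1] + d2]
        r_trial = residuals(trial, xs, ys)
        if cost(r_trial) < cost(r):
            p, r, lam = trial, r_trial, lam * 0.1  # accept, trust Newton more
        else:
            lam *= 10.0  # reject, move towards gradient descent
    return p


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
fit = levenberg_marquardt([1.0, 0.0], xs, ys)
```

With noise-free data generated from a = 2 and b = 0.5, the iteration started at (1, 0) should return to these values. Like any local method, it only reaches the minimum nearest to the start, which is why the initial values matter.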
Step I: In the first step, the poses of all objects are
reconstructed individually, and separately for each
camera view. This procedural knowledge belongs to
the concept RC_OBJECT and is inherited by every
specialization. The projection of one object model
depends on 7 parameters. Since only a few parameters
have to be estimated, the individual reconstructions
are performed very quickly; however, the
minimizations have to be monitored so that they do
not converge to false local minima because of
inappropriate initial values. If the focal length
leaves the admissible range (10-100 mm in our case),
the object is rotated by negating two rotational
parameters, and the minimization is restarted with
the other parameters reset to their original initial
values. The cost function is also monitored during
minimization. If the process converges to a local
minimum with inadmissibly high cost, the
z-translation parameter is modified according to a
predefined scheme. This monitored
Levenberg-Marquardt iteration is stopped either if
the change of the parameter estimates from one
iteration step to the next is less than a given
threshold, or if the model fitting does not succeed,
i.e. if a maximum number of iterations is reached or
if the same local minimum is found despite modified
parameter values.
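The monitoring logic of step I can be sketched as a restart loop around a single minimization run. This is a sketch under stated assumptions, not the procedure attached to RC_OBJECT: the callable minimize, the parameter names (rot_x, rot_y, trans_z, focal), the choice of which two rotations are negated, the cost threshold, and the list of z offsets are all hypothetical. Only the admissible focal length range of 10-100 mm is taken from the text.

```python
FOCAL_RANGE_MM = (10.0, 100.0)  # admissible range quoted in the text


def monitored_fit(minimize, initial, max_cost, z_offsets,
                  flip=("rot_x", "rot_y"), max_restarts=10, same_tol=1e-6):
    """Run `minimize` repeatedly, restarting on implausible results.

    `minimize(params) -> (estimate, cost)` is a hypothetical stand-in for
    one Levenberg-Marquardt run. Returns the accepted estimate, or None
    if the model fitting does not succeed.
    """
    start = dict(initial)
    previous = None
    z_iter = iter(z_offsets)
    lo, hi = FOCAL_RANGE_MM
    for _ in range(max_restarts):
        estimate, cost = minimize(dict(start))
        if not lo <= estimate["focal"] <= hi:
            # Focal length left the admissible range: negate two rotation
            # parameters (assumed: of the current estimate) and reset all
            # other parameters to their original initial values.
            start = dict(initial)
            for name in flip:
                start[name] = -estimate[name]
            continue
        if cost <= max_cost:
            return estimate
        # Converged, but the cost is inadmissibly high.
        if previous is not None and all(
                abs(estimate[k] - previous[k]) < same_tol for k in estimate):
            return None  # same local minimum despite modified parameters
        previous = estimate
        offset = next(z_iter, None)
        if offset is None:
            return None  # assumed predefined scheme is exhausted
        start = dict(initial)
        start["trans_z"] = initial["trans_z"] + offset
    return None  # maximum number of restarts reached
```

Passing the minimizer as an argument keeps the monitoring separate from the numerical method, so the same loop could wrap any local optimizer.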
Step II: If an instance of a reconstructed object is
created successfully, it is added as a part of
RC_VIEW. This concept performs step II of the
minimization process. For a given camera view, the
focal length is fixed at the median of all focal
length estimates from step I, and this value is used
to reconstruct the pose of each object in the scene.
Thus, this step derives better initial estimates of
the objects’ poses for each view of the scene.
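The choice of the median in step II can be illustrated with a few lines. The dictionary layout and the sample values below are made up for illustration; the point is only that one failed individual reconstruction hardly affects the median, whereas it would shift the mean considerably.

```python
from statistics import mean, median


def view_focal_length(step1_estimates):
    """Median of the per-object focal length estimates of one camera view."""
    return median(e["focal"] for e in step1_estimates)


# Hypothetical step I results (focal length in mm) for one view;
# the last object converged to a poor estimate.
estimates = [{"focal": 34.0}, {"focal": 36.0}, {"focal": 35.0}, {"focal": 90.0}]
focal_fixed = view_focal_length(estimates)
focal_mean = mean(e["focal"] for e in estimates)
```

Here the median stays close to the three consistent estimates, while the mean is pulled towards the outlier.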
Step III: The median focal length and the resulting
objects’ poses of step II are used as initial values for
global model fitting. It is possible to estimate the
relative pose between different cameras from the ob-
ject correspondences. This step is part of the proce-
dural knowledge of the concept RC_SCENE. Within
this step it is possible to instantiate the concept
RC_CAM_PARAM.
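One way to obtain an initial relative camera pose from an object correspondence is to chain the two pose estimates of the same object. The sketch below assumes that each pose from step II is given as a 4x4 rigid transform from object coordinates to the respective camera coordinates; this representation and all function names are assumptions, not taken from the system described here. If T_a and T_b are the poses of one object seen by cameras A and B, the transform from camera A to camera B coordinates is T_b * inverse(T_a).

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t):
    """Inverse of a rigid transform [R | t], which is [R^T | -R^T t]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def relative_camera_pose(obj_in_cam_a, obj_in_cam_b):
    """Transform from camera A coordinates to camera B coordinates."""
    return matmul(obj_in_cam_b, rigid_inverse(obj_in_cam_a))


def pose(angle_z, tx, ty, tz):
    """Helper for the example: rotation about z followed by a translation."""
    c, s = math.cos(angle_z), math.sin(angle_z)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


# Synthetic check: construct the pose in camera B from a known camera offset.
cam_a_to_b = pose(0.3, 0.1, 0.0, 0.0)
obj_a = pose(1.0, 0.0, 0.0, 2.0)
obj_b = matmul(cam_a_to_b, obj_a)
recovered = relative_camera_pose(obj_a, obj_b)
```

With several corresponding objects, each pair yields one such estimate; these would serve only as initial values for the global fit.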
International Archives of Photogrammetry and Remote Sensing. Vol. XXXI, Part B3. Vienna 1996
5.5 Camera Parameter Estimation
Classical camera calibration methods (e.g. [28])
cannot be performed on-line because they require a
special calibration pattern. Depth estimation then
becomes a two-step process, which may lead to
suboptimal solutions. We have modeled the camera
parameters explicitly in our projection functions;
they are therefore estimated from the known 3D
structure of the objects in the scene, as part of
the procedural knowledge of the concepts RC_SCENE
and RC_CAM_PARAM. We estimate the external camera
parameters and the focal length. The results show
that the principal point and the scale factors are
stable enough for our off-the-shelf CCD cameras to
be assumed fixed. The influence of lens distortion
on the results of our approach is quite small.
Nevertheless, lens distortion could be estimated in
a manner similar to that of [10].
Tsai [28] shows that full camera calibration is
possible with five coplanar reference points. A
calibration solution derived from four coplanar
points is unique, because four coplanar points
determine a collineation in the plane, and any
further imaginary point in that plane can be derived
as an intersection of lines constructed from the
four points. Six non-coplanar points determine a
unique solution as well (see [30]).
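The statement that four coplanar points determine a collineation can be checked numerically. The sketch below is a standard direct linear solution with the last matrix entry fixed to 1, written in plain Python; it is an illustration under that normalization assumption and is not taken from [28] or [30]. The function names and the test matrix are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def collineation_from_points(src, dst):
    """3x3 collineation mapping four points src[i] to dst[i].

    Fails with a division by zero if three of the points are collinear.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_collineation(h, p):
    """Map a plane point p = (x, y) through the collineation h."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Four corners of a square and their images under a known test mapping.
true_h = [[1.2, 0.1, 5.0], [-0.2, 0.9, 3.0], [0.001, 0.002, 1.0]]
corners = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
images = [apply_collineation(true_h, p) for p in corners]
estimated_h = collineation_from_points(corners, images)
# The intersection of the diagonals is a derived fifth point.
center_image = apply_collineation(estimated_h, (5.0, 5.0))
```

The image of the diagonal intersection (5, 5) computed from the estimated mapping should agree with the one computed from the test mapping, which mirrors the argument that further points follow from the four given ones.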
Scene reconstruction is possible with one camera
view, but a stereo image leads to much more robust
results. Furthermore, the pose of a circle with
known radius cannot be computed uniquely from one
view (see [11]). With at least two images and known
focal lengths, the pose of a circle in space is
uniquely defined up to the sign of its normal vector
(see [4]). The sign of the normal can be determined
from the visibility of the projected ellipse.
5.6 Results
Fig. 8 shows the object recognition results and the
3D reconstruction for a stereo image typical of our
scenario. Fig. 8 a) and b) visualize the instances
of the corresponding specializations of the concept
PE_OBJECT (names in German) and their image regions
obtained by the color segmentation. All objects are
recognized correctly, except that the small ring is
missing in the right image. This is corrected in the
3D reconstruction process by using the left image.
Fig. 8 c) shows the final result of the 3D scene
reconstruction (instance of RC_SCENE). The geometric
object models are projected onto the right image.
The projected object models fit the objects in the
images very well.
6 Conclusion
Based on a detailed discussion of object modeling
for object recognition and scene interpretation, a