2. VISION-AIDED PEDESTRIAN NAVIGATION
Recently, with the increase in the resolution of digital cameras
and in the computing power of mobile devices, visual sensors have
gained great attention in the positioning research community.
Therefore, they have been used for motion detection, obstacle
avoidance, and relative and absolute localization. Vision-based
navigation has been used for decades in navigation of robots
(Corke et al. 2007); however, using it in pedestrian navigation
has become a research topic only in the last few years
(Ruotsalainen et al., 2011; Hide et al., 2011; Steinhoff et al.,
2007). The focus of vision-aided navigation research has mainly
been on systems that use a priori formed image databases. When a
match is found between images in the database and those taken by
the pedestrian, the absolute position can be obtained. This
procedure requires a priori preparation and depends heavily on the
availability of an image database for the area. On the other hand,
another group of algorithms with a wide range of applications
employs real-time motion estimation of a single camera moving
freely through an environment. This estimation can be helpful in
detecting displacement and orientation of the device and
estimating the user's turns (Hide et al., 2011). This information
can be incorporated in the position and heading estimation of
pedestrian navigation. However, several problems arise when
processing video frames from a hand-held device's camera. First,
the measurements are relative, so initialization of the parameters
is required to estimate absolute quantities. Moreover, the scale of
the observations cannot be obtained from vision alone; another
sensor or a reference of known dimensions has to be used to recover
it. Finally, the orientation of the mobile device
affects the heading and velocity information. In this paper, we
describe a low-cost context-aware personal navigation system
that is capable of localizing a pedestrian using fusion of GPS
and camera to robustly estimate frame-to-frame motion in real
time (also known as visual odometry).
2.4 Computer Vision Algorithm
Motion estimation from video is a well-studied problem in
computer vision. Approaches for motion estimation are based
on either dense optical flow or sparse feature tracks (Steinhoff
et al., 2007). In this paper, a computer vision algorithm is
developed to find the motion vector using the matched features
between successive frames. The detected motion vectors are
employed to estimate the forward motion velocity and the
azimuth rotation angle between the two frames. To detect the
motion vectors, interest points are detected in the frames using
the Speeded Up Robust Features (SURF) algorithm (Bay et al., 2008).
The detected interest points of two successive frames are matched
based on the Euclidean distance between the descriptors of these
points. The vectors starting from an interest point in one frame
and ending at the corresponding matched point in the next frame are
considered candidate motion vectors.
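For illustration, a minimal Python/OpenCV sketch of this matching step is given below. The function name, the Hessian threshold, the cap of 20 best matches, and the SIFT fallback are assumptions made for the example, not details taken from the paper.

```python
import cv2
import numpy as np

def candidate_motion_vectors(prev_gray, curr_gray, max_matches=20):
    """Detect interest points in two successive frames, match their
    descriptors by Euclidean distance, and return candidate motion
    vectors as (start point, end point) pairs."""
    # SURF is part of the OpenCV contrib modules; SIFT is used as a
    # fallback when the contrib build is not available (assumption).
    try:
        detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    except AttributeError:
        detector = cv2.SIFT_create()

    kp1, des1 = detector.detectAndCompute(prev_gray, None)
    kp2, des2 = detector.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return []

    # Brute-force matching on the L2 (Euclidean) distance between descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    vectors = []
    for m in matches[:max_matches]:
        p1 = np.array(kp1[m.queryIdx].pt)  # interest point in the previous frame
        p2 = np.array(kp2[m.trainIdx].pt)  # matched point in the current frame
        vectors.append((p1, p2))           # candidate motion vector p1 -> p2
    return vectors
```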
Figure 1. The matched features, candidate motion vectors (red), and
acceptable motion vectors using RANSAC in two different cases:
(a) forward motion and (b) change of the heading.
As shown in figure 1, some matches can be incorrect due to the
existence of repeated similar points in the frames. Therefore, the
candidate motion vectors should be filtered to remove inconsistent
vectors based on discrepancies in length or orientation (figure 1).
The RANdom SAmple Consensus (RANSAC) algorithm (Fischler et al.,
1981) is used to find the vector angle and vector length with the
maximum number of compatible vectors. The accepted motion vectors
are then averaged to obtain the average motion vector. The accuracy
of this average motion vector is highly dependent on the number of
compatible vectors and on the variance of the angles and lengths of
these vectors. Figure 2 shows the number of acceptable motion
vectors among the first 20 motion vectors detected as the best
matches in successive frames.
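A minimal sketch of this kind of angle/length consensus is shown below; the tolerances, the iteration count, and the function name are illustrative assumptions rather than parameters reported in the paper.

```python
import numpy as np

def average_motion_vector(vectors, angle_tol=np.deg2rad(10.0),
                          length_tol=0.2, iters=100, rng=None):
    """Find the largest set of mutually compatible candidate motion
    vectors (similar angle and length) in RANSAC fashion and return
    their average together with the consensus size."""
    rng = rng or np.random.default_rng()
    d = np.array([p2 - p1 for p1, p2 in vectors], dtype=float)  # (dx, dy) per candidate
    lengths = np.linalg.norm(d, axis=1)
    angles = np.arctan2(d[:, 1], d[:, 0])

    best = np.zeros(len(d), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(d))                                      # hypothesis vector
        d_ang = np.abs(np.angle(np.exp(1j * (angles - angles[i]))))   # wrapped angle difference
        d_len = np.abs(lengths - lengths[i]) / max(lengths[i], 1e-6)  # relative length difference
        inliers = (d_ang < angle_tol) & (d_len < length_tol)
        if inliers.sum() > best.sum():
            best = inliers

    return d[best].mean(axis=0), int(best.sum())  # average motion vector, number of inliers
```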
Under the assumption that context information about the hand-held
device alignment is available (texting mode and landscape/portrait
forward alignment), the vertical component of the average motion
vector is a measure of the forward motion speed between the two
frames, and the horizontal component is a measure of the azimuth
change between the two frames. To calibrate the scale between the
motion vector and both the forward velocity and the azimuth change,
a reference track is navigated using the motion vector only. The
transformation parameters between the motion vector and the forward
speed and azimuth change are computed so that the navigation
solution matches the reference solution. Using these transformation
parameters, the forward motion velocity and the azimuth change can
be approximated between any two successive frames from the average
motion vector.
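As a rough illustration of this mapping, the sketch below converts the average motion vector into a forward speed and an azimuth change using pre-calibrated scale factors; the linear form and the parameter names (k_speed, k_azimuth) are assumptions made for the example.

```python
def motion_from_vector(avg_vector, k_speed, k_azimuth, dt):
    """Map the average image-plane motion vector to forward speed and
    azimuth change with scale factors calibrated on a reference track.

    avg_vector : (dx, dy) average motion vector in pixels
    k_speed    : pixels-to-metres scale factor (hypothetical name)
    k_azimuth  : pixels-to-radians scale factor (hypothetical name)
    dt         : time between the two frames in seconds
    """
    dx, dy = avg_vector
    forward_speed = k_speed * dy / dt   # vertical component -> forward motion speed
    azimuth_change = k_azimuth * dx     # horizontal component -> heading change
    return forward_speed, azimuth_change
```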
The estimation of velocity from the camera can also be improved
using user-mode context information such as walking, climbing
stairs, or running. However, the relative measurements from the
computer vision algorithm tend to accumulate error over time,
resulting in long-term drift. To limit this drift, it is necessary
to augment such local pose estimation with global measurements such
as GPS.
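As a minimal illustration of such augmentation, the sketch below dead-reckons a position from the vision-derived speed and heading change and pulls it toward a GPS fix; the constant blending weight stands in for a proper fusion filter (e.g., a Kalman filter) and is an assumption, not the method used in the paper.

```python
import numpy as np

def dead_reckon_step(position, azimuth, forward_speed, azimuth_change, dt):
    """Propagate an East-North position using the vision-derived speed
    and heading change (azimuth measured clockwise from North)."""
    azimuth = azimuth + azimuth_change
    step = forward_speed * dt * np.array([np.sin(azimuth), np.cos(azimuth)])
    return np.asarray(position, float) + step, azimuth

def apply_gps_fix(position, gps_position, weight=0.3):
    """Blend the dead-reckoned position toward a GPS fix; the fixed
    weight is a placeholder for a filter gain (assumption)."""
    return (1.0 - weight) * np.asarray(position, float) + weight * np.asarray(gps_position, float)
```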
Figure 2. The number of acceptable motion vectors from the 20 best
matched features on consecutive frames.
3. CONTEXT INFORMATION IN PNS
In order to achieve a context-aware “vision-aided pedestrian
navigation” system, two important questions must be answered:
what type of context is important for such a system and how can
we extract it using the sensors on a hand-held device? The
following section discusses these issues and investigates
different methods for context extraction from a mobile device's
sensors.
Context may refer to any piece of information that can be used
to characterize the situation of an entity (person, place, or
object) that is relevant to the interaction between a user and an
application (Dey, 2001). While location information is by far
the most frequently used attribute of context, attempts to use