2. VISION-AIDED PEDESTRIAN NAVIGATION 
Recently, with the increase in the resolution of digital cameras 
and the computing power of mobile devices, visual sensors have 
gained considerable attention in the positioning research community. 
They have been used for motion detection, obstacle 
avoidance, and relative and absolute localization. Vision-based 
navigation has been used for decades in the navigation of robots 
(Corke et al., 2007); however, its use in pedestrian navigation 
has become a research topic only in the last few years 
(Ruotsalainen et al., 2011; Hide et al., 2011; Steinhoff et al., 
2007). The focus of vision-aided navigation research has 
mainly been on systems that use databases formed a priori. When a 
match is found between images in the database and those taken by a 
pedestrian, the absolute position can be obtained. This 
procedure requires a priori preparation and depends heavily on the 
availability of an image database for the area. Another group of 
algorithms, with a wide range of applications, instead relies on 
real-time motion estimation of a single camera moving 
freely through an environment. This estimation helps 
detect the displacement and orientation of the device and 
estimate the user's turns (Hide et al., 2011), and the resulting 
information can be incorporated into the position and heading 
estimation of pedestrian navigation. However, various problems arise 
when processing video frames from a hand-held device's 
camera. First, the measurements are relative; therefore, 
initialization of the parameters is required to estimate absolute 
quantities. Moreover, the scale of the observations cannot be 
obtained from vision alone, and another sensor or a reference of 
known dimensions has to be used to retrieve it. The orientation 
of the mobile device also affects the heading and velocity 
information. In this paper, we describe a low-cost, context-aware 
personal navigation system that is capable of localizing a 
pedestrian using a fusion of GPS and camera measurements to robustly 
estimate frame-to-frame motion in real time (also known as visual 
odometry). 
2.4 Computer Vision Algorithm 
Motion estimation from video is a well-studied problem in 
computer vision. Approaches for motion estimation are based 
on either dense optical flow or sparse feature tracks (Steinhoff 
et al., 2007). In this paper, a computer vision algorithm is 
developed to find the motion vector using the matched features 
between successive frames. The detected motion vectors are 
employed to estimate the forward motion velocity and the 
azimuth rotation angle between the two frames. To detect the 
motion vectors, interest points are extracted from the frames 
using the Speeded Up Robust Features (SURF) algorithm (Bay et 
al., 2008). The detected interest points of two successive frames 
are matched based on the Euclidean distance between the 
descriptors of these points. The vectors starting from an interest 
point in one frame and ending at the corresponding matched point in 
the next frame are considered candidate motion vectors. 
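As an illustration of this step, the sketch below is a minimal example assuming OpenCV; SURF requires the opencv-contrib build, so ORB is used only as a stand-in fallback, and the feature count and thresholds are placeholder values rather than those used in the paper. It detects interest points in two successive grayscale frames, matches their descriptors by distance, and returns the candidate motion vectors.

```python
import cv2
import numpy as np

def candidate_motion_vectors(prev_gray, curr_gray):
    """Detect interest points in two successive frames, match their
    descriptors by distance, and return candidate motion vectors as
    (start point, end point) pairs for the best matches."""
    try:
        # SURF lives in the opencv-contrib package (xfeatures2d).
        detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
        norm = cv2.NORM_L2          # SURF descriptors are floating point
    except (AttributeError, cv2.error):
        # Fallback if SURF is unavailable; ORB is only a stand-in here.
        detector = cv2.ORB_create(nfeatures=500)
        norm = cv2.NORM_HAMMING     # ORB descriptors are binary

    kp1, des1 = detector.detectAndCompute(prev_gray, None)
    kp2, des2 = detector.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return np.empty((0, 2, 2))

    matcher = cv2.BFMatcher(norm, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    # Keep the best matches (e.g. 20, as in Figure 2) as candidate vectors.
    vectors = []
    for m in matches[:20]:
        start = kp1[m.queryIdx].pt       # interest point in the previous frame
        end = kp2[m.trainIdx].pt         # matched point in the next frame
        vectors.append((start, end))
    return np.array(vectors).reshape(-1, 2, 2)
```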
  
Figure 1. The matched features, candidate motion vectors (red), and 
acceptable motion vectors using RANSAC in two different cases: 
(a) forward motion and (b) change of heading. 
As shown in Figure 1, some matches can be incorrect due to 
the presence of repeated, similar-looking points in the frames. Therefore, 
the candidate motion vectors should be filtered to remove 
inconsistent vectors based on discrepancies in their length or 
orientation (Figure 1). The RANdom SAmple 
Consensus (RANSAC) algorithm (Fischler et al., 1981) is used to find the 
vector angle and vector length with the maximum number of 
compatible vectors. The accepted motion vectors are then 
averaged to obtain the average motion vector. The accuracy of this 
average motion vector depends strongly on the number of 
compatible vectors and on the variance of the angles and lengths of 
these vectors. Figure 2 shows the number of acceptable motion 
vectors from the first 20 motion vectors detected as the best 
matches in successive frames. 
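The consistency filtering and averaging could be sketched as follows (NumPy only; the angle and length tolerances and the number of iterations are illustrative assumptions, not values reported in the paper): a randomly chosen vector serves as the hypothesis, vectors of similar length and orientation form its consensus set, and the largest consensus set is averaged.

```python
import numpy as np

def ransac_average_vector(vectors, n_iter=100,
                          ang_tol=np.deg2rad(10), len_tol=5.0):
    """RANSAC-style selection of mutually consistent motion vectors.
    `vectors` has shape (N, 2, 2): (start_xy, end_xy) per match.
    Returns the average of the largest consensus set, or None."""
    if len(vectors) == 0:
        return None
    disp = vectors[:, 1, :] - vectors[:, 0, :]       # displacement per match
    lengths = np.linalg.norm(disp, axis=1)
    angles = np.arctan2(disp[:, 1], disp[:, 0])

    rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(disp), dtype=bool)
    for _ in range(n_iter):
        i = rng.integers(len(disp))                  # random hypothesis vector
        d_ang = np.angle(np.exp(1j * (angles - angles[i])))  # wrapped difference
        inliers = (np.abs(d_ang) < ang_tol) & \
                  (np.abs(lengths - lengths[i]) < len_tol)
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers

    # Average motion vector of the compatible (accepted) vectors.
    return disp[best_inliers].mean(axis=0)           # (dx, dy) in pixels
```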
Under the assumption that context information about the hand-held 
device alignment is available (texting mode with landscape/portrait 
forward alignment), the vertical component of the average motion 
vector is a measure of the forward motion speed between the two 
frames, and the horizontal component is a measure of the azimuth 
change between the two frames. To calibrate the scale between 
the motion vector and both the forward velocity and the azimuth 
change, a reference track is navigated using the motion vector only. 
The transformation parameters between the motion vector and the 
forward speed and azimuth change are computed so that the 
navigation solution matches the reference solution. Using the 
computed transformation parameters, the forward motion velocity 
and the azimuth change can be approximated between any two 
successive frames with the help of the average motion vector. 
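A minimal sketch of this mapping is given below; the scale factors k_speed and k_azimuth stand in for the transformation parameters obtained from the reference track, and their names and the example values in the comment are purely illustrative assumptions.

```python
def motion_from_average_vector(avg_vec, dt, k_speed, k_azimuth):
    """Convert the average motion vector (dx, dy) between two frames into
    forward speed and azimuth change, assuming texting-mode alignment so
    that the image's vertical axis corresponds to the walking direction.

    k_speed   : metres per pixel of vertical image motion (from calibration)
    k_azimuth : degrees per pixel of horizontal image motion (from calibration)
    dt        : time between the two frames in seconds
    """
    dx, dy = avg_vec
    forward_speed = k_speed * abs(dy) / dt    # m/s from the vertical component
    azimuth_change = k_azimuth * dx           # degrees from the horizontal component
    return forward_speed, azimuth_change

# Illustrative use with placeholder calibration values:
# speed, d_az = motion_from_average_vector((3.0, -12.0), dt=1/15.0,
#                                          k_speed=0.02, k_azimuth=0.1)
```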
The estimation of the velocity from the camera can also be 
improved using user-mode context information such as walking, 
climbing stairs, or running. However, the relative measurements from 
the computer vision algorithm tend to accumulate error over 
time, resulting in long-term drift. To limit this drift, it is 
necessary to augment such local pose estimates with global 
estimates such as GPS. 
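One simple way such an augmentation could look is sketched below: the position is dead-reckoned in a local east/north frame from the camera-derived speed and heading, and pulled toward a GPS fix whenever one is available. This weighted-reset scheme and all parameter names are illustrative assumptions; the paper's actual fusion method is not specified in this section.

```python
import math

def propagate_and_fuse(pos_en, heading_deg, speed, d_heading_deg, dt,
                       gps_en=None, gps_weight=0.2):
    """Dead-reckon an east/north position from camera-derived speed and
    heading change, then blend with a GPS fix (if any) to limit drift.
    All names and the blending weight are illustrative assumptions."""
    heading_deg += d_heading_deg
    h = math.radians(heading_deg)
    e, n = pos_en
    e += speed * dt * math.sin(h)         # east displacement
    n += speed * dt * math.cos(h)         # north displacement
    if gps_en is not None:
        # Pull the dead-reckoned position toward the GPS fix.
        e = (1 - gps_weight) * e + gps_weight * gps_en[0]
        n = (1 - gps_weight) * n + gps_weight * gps_en[1]
    return (e, n), heading_deg
```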
Figure 2. The number of acceptable motion vectors from the 20 best 
matched features on consecutive frames. 
3. CONTEXT INFORMATION IN PNS 
In order to achieve a context-aware “vision-aided pedestrian 
navigation” system, two important questions must be answered: 
what type of context is important for such a system and how can 
we extract it using the sensors on a hand-held device? The 
following section discusses these issues and investigates 
different methods for extracting context from a mobile device's 
sensors. 
Context may refer to any piece of information that can be used 
to characterize the situation of an entity (person, place, or 
object) that is relevant to the interaction between a user and an 
application (Dey, 2001). While location information is by far 
the most frequently used attribute of context, attempts to use 