where E, N represent the absolute position in the East and North coordinates, both in meters, V (m/s) is the speed, ψ (radian) is the heading defined with the origin at North and clockwise positive, and ω (radian/s) is the heading change rate. The variable Δt represents the time between two epochs. The state vector in our system is:
$x_k = [E_k \;\; N_k \;\; \psi_k \;\; V_k \;\; \omega_k]^T$ (9)
To avoid linearization, the state transition matrix is simplified here as:

$$\Phi_k = \begin{bmatrix} 1 & 0 & 0 & \sin\psi_k\,\Delta t & 0 \\ 0 & 1 & 0 & \cos\psi_k\,\Delta t & 0 \\ 0 & 0 & 1 & 0 & \Delta t \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$ (10)
$\Phi_k$ is approximated as a constant matrix at every time epoch k.
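As an illustration, the following minimal sketch (Python/NumPy, with a hypothetical function name) shows how the simplified transition matrix of equation (10) could be built from the current heading estimate; it is an assumed implementation of the constant-matrix approximation, not the authors' code.

```python
import numpy as np

def transition_matrix(psi_k, dt):
    """Simplified, non-linearized transition matrix of eq. (10).

    State order: [E, N, psi, V, omega]. sin(psi) and cos(psi) are
    evaluated at the current heading estimate and then treated as
    constants for this epoch.
    """
    return np.array([
        [1.0, 0.0, 0.0, np.sin(psi_k) * dt, 0.0],   # East position
        [0.0, 1.0, 0.0, np.cos(psi_k) * dt, 0.0],   # North position
        [0.0, 0.0, 1.0, 0.0,                dt ],   # heading
        [0.0, 0.0, 0.0, 1.0,                0.0],   # speed
        [0.0, 0.0, 0.0, 0.0,                1.0],   # heading change rate
    ])
```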
The general form of the observation model is presented in equation (11) and is defined according to the information provided by the GPS and the visual sensor.
$z_k = H_k x_k + v_k$ (11)
where $z_k$ is the observation vector, $H_k$ is the observation model that maps the state space into the observed space, and $v_k$ is the observation noise, which is assumed to be zero-mean Gaussian white noise with covariance $R_k$ ($v_k \sim N(0, R_k)$).
The number of measurements fed to the filter varies on an epoch-to-epoch basis depending on the availability of the sensors and their data rates. The non-availability of the visual aiding depends on the matching accuracy and was discussed in the computer vision section of this paper. The accuracy of the GPS sensor is also available on Android smartphones. The full-scale measurement vector ($z_k$) is as follows:
$z_k = [E_{GPS} \;\; N_{GPS} \;\; V_{GPS} \;\; \psi_{cam} \;\; V_{cam}]^T$ (12)
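Because the available measurements change from epoch to epoch, the observation vector and matrix can be assembled by stacking only the rows that correspond to the sensors present at that epoch. The sketch below (an assumed Python/NumPy implementation with hypothetical argument names, not taken from the paper) illustrates this for the full-scale vector of equation (12).

```python
import numpy as np

def build_observation(gps=None, cam=None):
    """Assemble z_k, H_k and R_k from whichever sensors are available.

    gps: optional tuple (E_gps, N_gps, V_gps, var_gps)
    cam: optional tuple (psi_cam, V_cam, var_cam)
    State order: [E, N, psi, V, omega]. At least one sensor is assumed.
    """
    rows, z, r = [], [], []
    if gps is not None:
        e, n, v, var = gps
        rows += [[1, 0, 0, 0, 0],   # E_gps observes E
                 [0, 1, 0, 0, 0],   # N_gps observes N
                 [0, 0, 0, 1, 0]]   # V_gps observes V
        z += [e, n, v]
        r += [var] * 3
    if cam is not None:
        psi, v, var = cam
        rows += [[0, 0, 1, 0, 0],   # psi_cam observes heading
                 [0, 0, 0, 1, 0]]   # V_cam observes speed
        z += [psi, v]
        r += [var] * 2
    return np.array(z), np.array(rows, dtype=float), np.diag(r)
```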
The KF works in two phases: prediction and update. In the first phase, the filter propagates the states and their accuracies using the dynamic matrix $\Phi_{k-1}$ and $\hat{x}_{k-1}^{+}$ (estimated in the previous epoch), based on the equation $\hat{x}_k^{-} = \Phi_{k-1}\hat{x}_{k-1}^{+}$. Then the covariance matrix $P_k^{-}$ can be estimated using $P_{k-1}^{+}$. The usual equation to calculate $P_k^{-}$ is $P_k^{-} = \Phi_{k-1} P_{k-1}^{+} \Phi_{k-1}^T + Q_{k-1}$. In the update phase the state is corrected by blending the predicted solution with the new measurements based on the following equation:

$\hat{x}_k^{+} = \hat{x}_k^{-} + K_k (z_k - H_k \hat{x}_k^{-})$ (13)

where $K_k$ is the Kalman gain obtained by $K_k = P_k^{-} H_k^T (H_k P_k^{-} H_k^T + R_k)^{-1}$. The covariance is updated with the equation $P_k^{+} = P_k^{-} - K_k H_k P_k^{-}$.
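Putting the two phases together, a minimal sketch of one filter epoch (Python/NumPy, assuming the helper sketches above; function and argument names are illustrative) could look like this:

```python
import numpy as np

def kf_epoch(x, P, Phi, Q, z=None, H=None, R=None):
    """One Kalman filter epoch: predict with Phi/Q, update with z/H/R.

    x, P : posterior state and covariance from the previous epoch.
    Returns the new posterior state and covariance; if no measurement
    is available this epoch, the prediction is returned unchanged.
    """
    # Prediction: propagate state and covariance with the dynamic matrix
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q

    if z is None:
        return x_pred, P_pred

    # Update: blend the prediction with the measurements via the Kalman gain
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_post = x_pred + K @ (z - H @ x_pred)
    P_post = P_pred - K @ H @ P_pred
    return x_post, P_post
```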
5. EXPERIMENTS AND RESULTS
The potential of the proposed method is evaluated through comprehensive experimental tests conducted on a wide variety of datasets using a Samsung Galaxy Note smartphone. Multiple sensors are integrated on its circuit board, including a MEMS tri-axial accelerometer (STMicroelectronics K3DH), three orthogonal gyros (K3G), a back camera (Samsung S5K5BAF, 2 MP) that can record video frames in HD format, and a GPS receiver module. To gather data from the phone, an application called TPI android logger (developed by the MMSS research group at the University of Calgary) is used. This application can be used in real time and collects data with timestamps.
For the context recognition, extensive pedestrian field tests were performed. First, training datasets of accelerometer and gyro signals were collected for 10 minutes: three users were asked to walk around a tennis court repeatedly with different activities and device orientations, such as on-belt, in-pocket, carrying in the backpack, in-hand dangling, texting, and talking modes. After the activity recognition step, the classified results were compared with the known placement configurations, as shown in Figure 6, to evaluate the accuracy of the context recognition.
[Figure 6 is a bar chart of recognition rates (0-100%) for the activities Stationary, Driving, Walking, Running, Stairs, Elevator, Biking and the placements In hand (Dangling), In hand (Reading), Close to ear, In a pants pocket, On belt, In hand bag, In backpack, In a jacket pocket.]
Figure 6: Recognition rates for different activities using feature-level fusion algorithm (SVM)
Figure 6 shows the recognition rate for each activity using SVM. By investigating each activity's recognition rate, it can be inferred that user activities such as texting, driving, walking, running, taking stairs, and riding the elevator are recognized with an accuracy of 95%. In contrast, the classification models cannot distinguish between device placements such as in-pocket and on-belt. This is expected because the ways users put their navigators in pockets and bags are quite ambiguous. In the case of vision-aided pedestrian navigation, we only need the texting mode, and this mode can be detected from the accelerometer sensor with an accuracy of almost 82%. In this mode, the orientation of the device (i.e. landscape or portrait mode) can be detected with an accuracy of almost 93%.
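For reference, a feature-level SVM classifier of the kind described above can be prototyped in a few lines with scikit-learn; the window features, kernel, and parameters below are illustrative assumptions, not the exact configuration used in these tests.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def window_features(accel, gyro):
    """Illustrative time-domain features from one window of
    tri-axial accelerometer and gyro samples (N x 3 arrays)."""
    mag_a = np.linalg.norm(accel, axis=1)
    mag_g = np.linalg.norm(gyro, axis=1)
    return np.hstack([
        accel.mean(axis=0), accel.std(axis=0),
        gyro.mean(axis=0),  gyro.std(axis=0),
        [mag_a.var(), mag_g.var()],
    ])

# X: one feature vector per window, y: activity/placement label per window
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=10.0))
# clf.fit(X_train, y_train)
# print(clf.score(X_test, y_test))   # e.g. texting mode vs. other placements
```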
Finally, a dataset combining two user contexts was collected to test the total context-aware navigation solution. The user walked along the sideline of a tennis court in a closed loop. During the loop, the user changed the placement twice, before and after making turns, which represents a very challenging situation for vision navigation. Using the classification algorithm, the system recognized the mode change and automatically adapted the most suitable vision-based heading estimation. Then, to accomplish the vision-aided solution, a frame rate of four images per second was used. The resolution of the images was down-sampled to 320x240 pixels. The frame rate of 4 Hz was chosen because the experiments show that it provides sufficient information to capture meaningful motion vectors in different scenarios. A comparison of the integrated navigation solutions is shown in Figure 7. The tennis court is located between two buildings and, therefore, the smartphone's GPS navigation solution was degraded. As can be seen from the figure, without using the context-aware vision-aided navigation, the GPS solution, in comparison with the vision sensor, is not accurate enough and is unable to discern turns.