Full text: Proceedings, XXth congress (Part 3)

    
   
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XXXV, Part B3. Istanbul 2004 
  
2.1 Where and What in Vision 
A visual scene, perceived by a human or an animal, is so 
complex that it is not possible to perceive the whole scene as 
one unit. Such a holistic perception would make the scene 
unique, rendering associations to other scenes and other 
perceptions impossible. Thus it is necessary to have a
mechanism in the perceptual system that breaks down or
fragments the scene into a more appropriate form of
representation.
It has been found (Mishkin et al., 1983) that humans and higher
animals represent visual information in at least two important
subsystems: the where-system and the what-system. The where-system
processes only the location of the object in the scene. It
does not represent the kind of object; that is the task of the
what-system. The two systems work independently of each
other and never converge to one common representation
(Goldman-Rakic, 1993). Physiologically, they are separated
throughout the entire cortical process of visual analysis.
The where-system builds up a spatial relation map, where no 
information about the form of the object is represented. This 
form of representation can be used for variable binding in 
collaboration with the what-system. The what-system 
represents categories of objects, without any information about 
their spatial location. 
In natural environments, a significant problem is attending to a
stimulus of interest. The where-system is a part of the attention
process, since the where-system supplies information about 
where to foveate in the scene. The fovea in the retina is 
exclusively concerned with form perception, not with the 
location of the objects in the scene. 
2.2 Attention 
When the brain processes a visual scene, some of the elements
of the scene are put in focus by various attention mechanisms
(Posner, 1990). Attention must therefore be an important
property for identification and for learning in biological as
well as artificial systems, and much research focuses on the
attention mechanisms necessary for grasping spatial relations.
In a natural scene, one of the basic problems is to locate and
identify objects and their parts.
2.3 Saccades 
When the brain analyses a visual scene, it must combine the 
representations obtained from different domains. One 
hypothesis underlying the simulations states that attention 
shifts from domain to domain in a sequential way (Crick, 
1984). Since information about the form and other features of 
particular objects can be obtained only when the object is 
foveated, different objects can be attended to only through 
saccadic movements of the eye. 
These rapid eye movements, which are made at the rate of 
about three per second, orient the high-acuity foveal region of 
the eye over targets of interest in a visual scene. The 
characteristic properties of saccadic eye movements (or 
saccades) have been well studied (Carpenter, 1983). The high 
velocity of saccades, reaching up to 700° per second for large 
movements, serves to minimize the time in flight, so that most 
of the time is spent fixating chosen targets. 
Saccades are known to be ballistic; that is, their final
location is computed prior to making the movement, and the
trajectory of the movement is uninterrupted by incoming visual
signals. Furthermore, owing to the structure of the retina, the
central 1.5° of the visual field is represented with a visual
resolution that is many times greater than that of the periphery.
Saccades serve the important function of bringing the
high-resolution foveal region onto targets of interest in the
visual scene.
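To give a feel for how small the central 1.5° of the visual field is on a display, the following minimal sketch converts a visual angle to a linear extent using the standard relation size = 2 d tan(θ/2). The function names and the 600 mm viewing distance are illustrative assumptions, not values taken from this paper.

```python
import math


def visual_angle_to_size(angle_deg: float, distance: float) -> float:
    """Linear extent subtended by angle_deg at the given viewing distance.

    The result is in the same unit as distance.
    """
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)


def size_to_visual_angle(size: float, distance: float) -> float:
    """Inverse relation: visual angle in degrees subtended by size."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


# Assumed viewing distance of 600 mm (hypothetical, for illustration only).
foveal_extent_mm = visual_angle_to_size(1.5, 600.0)
print(f"1.5 deg at 600 mm covers about {foveal_extent_mm:.1f} mm")
```

Under this assumed distance the high-acuity region spans only a patch of roughly one and a half centimetres on the display, which is why different objects in a scene can be inspected only through successive saccades.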
Initial eye movement studies suggested that the primary role of
saccades might be to compensate for the lack of resolution over
the visual field by “painting” an image into an internal
memory. It was proposed that the saccadic movements and
their resultant fixations allowed the formation of a
visual-motor memory (“scan path”) that could be used for encoding
objects and scenes (Noton and Stark, 1971). However, a
number of studies, starting from Yarbus’ classical work
(Yarbus, 1967), have suggested that gaze changes are most
often directed according to the ongoing demands of the task at
hand.
The task-specific use of gaze is best understood for reading
text (O'Regan, 1990), where the eyes fixate almost every word,
sometimes skipping over smaller function words. In addition, it
is known that saccade size during reading is modulated
according to the specific nature of the pattern recognition task
at hand (Kowler and Anton, 1987).
2.4 Fixations 
It is generally agreed that visual and cognitive processing do 
occur during fixations (Just and Carpenter, 1984). Fixation 
identification is an inherently statistical description of 
observed eye movement behaviors. The process of fixation 
identification - separating and labeling fixations and saccades 
in eye-tracking protocols - is an essential part of eye-movement 
data analysis and can have a dramatic impact on higher-level 
analyses. 
Common analysis metrics include fixation or gaze durations, 
saccadic velocities, saccadic amplitudes, and various 
transition-based parameters between fixations and/or regions 
of interest (Salvucci and Goldberg, 2000). The analysis of 
fixations and saccades requires some form of fixation 
identification (or simply identification) - that is, the translation 
from raw eye-movement data points to fixation locations (and 
implicitly the saccades between them) on the visual display. 
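As an illustration of the metrics listed above, the sketch below derives fixation durations, saccadic amplitudes, and transition counts between regions of interest from a sequence of already identified fixations. The `Fixation` record, its field names, and the sample data are hypothetical. The amplitude is computed as a plain Euclidean distance in degrees, which is a small-angle simplification.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float         # horizontal position, degrees of visual angle
    y: float         # vertical position, degrees of visual angle
    start_ms: float  # fixation onset
    end_ms: float    # fixation offset
    roi: str         # label of the region of interest containing the fixation


def fixation_durations(fixations):
    """Duration of each fixation in milliseconds."""
    return [f.end_ms - f.start_ms for f in fixations]


def saccade_amplitudes(fixations):
    """Distance in degrees between each pair of consecutive fixations."""
    return [math.hypot(b.x - a.x, b.y - a.y)
            for a, b in zip(fixations, fixations[1:])]


def roi_transitions(fixations):
    """Count gaze transitions between different regions of interest."""
    return Counter((a.roi, b.roi)
                   for a, b in zip(fixations, fixations[1:])
                   if a.roi != b.roi)


# Hypothetical fixation sequence, for illustration only.
fixations = [
    Fixation(0.0, 0.0, 0.0, 250.0, "A"),
    Fixation(3.0, 4.0, 280.0, 500.0, "B"),
    Fixation(3.0, 5.0, 530.0, 800.0, "B"),
    Fixation(0.0, 0.0, 840.0, 1000.0, "A"),
]
durations = fixation_durations(fixations)
amplitudes = saccade_amplitudes(fixations)
transitions = roi_transitions(fixations)
```

All three metrics depend on the fixation list they are given, so any change in how fixations are identified propagates directly into these higher-level figures.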
While it is generally agreed that visual and cognitive
processing occur during fixations, it is less clear exactly
when fixations start and when they end. Regardless of the
precision and flexibility associated with identification
algorithms, the identification problem is still a subjective
process. Therefore, one efficient way to validate these
algorithms is to compare resultant fixations to an observer's
subjective impressions.
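To make the identification step concrete, here is a minimal sketch of one velocity-based approach: each sample is labeled by its point-to-point velocity, and consecutive low-velocity samples are collapsed into a fixation. This is an illustrative simplification, not the procedure used in this paper. The function names, the sample format, and the 100°/s threshold are assumptions, and timestamps are assumed to be strictly increasing.

```python
import math


def _collapse(group):
    """Summarize a run of fixation samples by its time span and centroid."""
    n = len(group)
    return {
        "start": group[0][0],
        "end": group[-1][0],
        "x": sum(s[1] for s in group) / n,
        "y": sum(s[2] for s in group) / n,
    }


def identify_fixations(samples, velocity_threshold=100.0):
    """Velocity-threshold fixation identification.

    samples: list of (t_seconds, x_deg, y_deg) tuples sorted by time.
    velocity_threshold: degrees per second (assumed illustrative value).
    """
    if len(samples) < 2:
        return []

    # Label every sample after the first by its velocity relative to
    # the previous sample; the first sample copies the second's label.
    is_fixation = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        is_fixation.append(velocity < velocity_threshold)
    is_fixation.insert(0, is_fixation[0])

    # Collapse each run of consecutive fixation samples into one fixation.
    fixations, group = [], []
    for sample, fix in zip(samples, is_fixation):
        if fix:
            group.append(sample)
        elif group:
            fixations.append(_collapse(group))
            group = []
    if group:
        fixations.append(_collapse(group))
    return fixations


# Synthetic 100 Hz recording: gaze rests at (0, 0), then jumps to (5, 0).
samples = [(i * 0.01, 0.0, 0.0) for i in range(5)]
samples += [(0.05 + i * 0.01, 5.0, 0.0) for i in range(5)]
found = identify_fixations(samples)
```

Moving the threshold shifts the moment at which a fixation is judged to start or end, which is one reason the output of such algorithms is usefully checked against an observer's impressions.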
For spatial characteristics, three criteria have been identified
that distinguish three primary types of algorithms:
velocity-based, dispersion-based, and area-based (Salvucci and
 
	        