A COMPUTATIONAL METHOD TO EMULATE BOTTOM-UP ATTENTION TO
REMOTE SENSING IMAGES
X. Chen a, *, H. Huo a, F. Tao a, D. Li b, Z. Li a
a Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Shanghai, China
- weedcx@gmail.com, (huohong, tfang, lizhiqiangmy)@sjtu.edu.cn
b State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Hubei Wuhan,
China - dli@wtusm.edu.cn
Youth Forum
KEY WORDS: Image interpretation, Modelling, Image understanding, Texture Analysis, Process modelling
ABSTRACT:
In this paper, we propose a computational model that emulates an expert’s bottom-up attention to remote sensing
images. Bottom-up visual attention is a relatively primary step in neuroscience, and it can support accurate recognition when
combined with context. Thus, an efficient and fast bottom-up model is needed so that context can be processed conveniently in the following step.
Our computational model meets these requirements. The model reduces the uncertain complexity of visual attention by
introducing textons based on neurobiology and information entropy. First, our model processes images very rapidly while
achieving relatively high hit rates. Second, our model provides a rarity hierarchy by converting unique or rare visual attributes into
numerical rarity attributes for future processing. Third, our results provide size, shape and location information for the subsequent context
attention computation.
1. INTRODUCTION
Nowadays, vast amounts of remote sensing data acquired by a
great many sensors, satellites and other instruments
are waiting to be processed in a timely manner, but processing capacity still
lags behind. A novel and promising solution is to analyze
remote sensing data automatically by emulating experienced
interpreters’ psychological procedures (Lloyd 2002). To meet
this challenge, the first and key step is to obtain the size, shape
and location of latent attentive regions efficiently during
the simulation, which relates to “visual attention”, a term from
psychology.
Visual attention has not yet been thoroughly understood in
neurobiology and psychology, but it still receives a lot of
attention because of its great potential (Fabrikant 2005). Visual
attention enables people to select the information most relevant to
ongoing action (Chun & Wolfe, 2001), and it can help to
interpret remote sensing images by focusing on the most
informative places. Although there are both bottom-up models
such as that of Itti (Itti 1998) and models based on task-specific
attentional bias such as that of Tsotsos (Tsotsos 1995), it is not clear how
task-related factors can be formally predicted or
incorporated into a mathematical model (Davies 2006).
According to recent work on visual attention, perceptual
saliency depends critically on the surrounding context (Itti
2001), and the bottom-up model provides the raw data for
future processing. Thus, a bottom-up implementation is a
practical way and an important step towards emulating interpretation.
Itti and Koch draw on neurobiology to construct a framework
for understanding visual attention, with concepts such as a “saliency map” that
codes stimulus conspicuity and “inhibition of return” that prevents
a location from being attended again. Some psychologists have already
attempted to employ Itti and Koch’s model to update aerial
photogrammetry (Davies 2006). They show that the
distribution of visual attention can be measured with a
relatively simple, low-cost method instead of full eye tracking.
In a comparison of the model’s predictions with experts’
and participants’ performance, the model’s results are more
similar to those of experts than to those of novices. The reason is that low-level
visual patterns guide experts’ attention, which means that it is the
important or special data that attract experts rather than the
content. Some researchers employ Itti’s model to empirically
assess the effectiveness of dynamic displays for learning,
knowledge discovery, and knowledge construction based on the
relationship of perceptual salience and thematic relevance in
static and animated displays (Fabrikant 2005). However, the
model does not optimally predict people’s visual attention
(Davies 2006) and cannot be considered a typical design
solution (Fabrikant 2005). Even for relatively simple natural
scene images, the model’s hit rate is no larger than 0.5076 and its
false alarm rate is between 0.1433 and 0.2931 (Hou 2007).
Furthermore, Itti’s model also fails to depict areas of interest
accurately.
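To make the two figures of merit quoted above concrete, the following minimal Python sketch computes a hit rate and a false alarm rate from a predicted saliency mask and a reference mask. It assumes the standard signal-detection definitions (hits over all truly salient pixels, false alarms over all background pixels); the evaluation protocol behind the cited figures may differ. The function name and the toy masks are hypothetical.

```python
def hit_and_false_alarm_rates(predicted, truth):
    """Compare two binary masks of equal shape, given as lists of rows.

    Assumed definitions (standard signal detection, not taken from the source):
    hit rate = TP / (TP + FN), false alarm rate = FP / (FP + TN).
    """
    tp = fp = fn = tn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1      # salient pixel correctly marked
            elif p:
                fp += 1      # background pixel wrongly marked
            elif t:
                fn += 1      # salient pixel missed
            else:
                tn += 1      # background pixel correctly left unmarked
    hit_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return hit_rate, false_alarm_rate


# Hypothetical toy masks: 1 marks a salient pixel, 0 marks background.
truth = [[1, 1, 0, 0],
         [0, 0, 0, 0]]
predicted = [[1, 0, 1, 0],
             [0, 0, 0, 0]]
hit_rate, false_alarm_rate = hit_and_false_alarm_rates(predicted, truth)
print(hit_rate, false_alarm_rate)
```

A model that marks the whole image as salient reaches a hit rate of 1 but also a false alarm rate of 1, which is why the two rates are reported together.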
In light of the foregoing analysis, this study aims to
detect the size, shape and location of latently salient texture
regions comprising a specific texton, according to rarity and based
on neurobiology and information theory. We extract textons
from grey-level information at the lowest cost and use an
entropy measure to categorize the rarity hierarchy, rarity being one
of the key factors in visual attention.
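The idea of ranking textons by rarity with an information measure can be illustrated by the minimal Python sketch below. It is not the model proposed in this paper: coarse grey-level bins stand in for textons, and the self-information -log2 p of each bin stands in for the entropy-based rarity measure. The function names, the number of bins and the toy image are all assumptions made for illustration.

```python
import math
from collections import Counter


def quantize(image, levels=4, max_grey=255):
    """Map each grey value to one of `levels` coarse bins (toy texton labels)."""
    return [[v * levels // (max_grey + 1) for v in row] for row in image]


def self_information(labels):
    """Return {label: -log2 p(label)}; rarer labels carry more information."""
    counts = Counter(label for row in labels for label in row)
    total = sum(counts.values())
    return {label: -math.log2(n / total) for label, n in counts.items()}


def rarity_hierarchy(labels):
    """Order the labels from rarest to most common."""
    info = self_information(labels)
    return sorted(info, key=info.get, reverse=True)


def locations(labels, target):
    """List the (row, column) positions that carry the given label."""
    return [(r, c)
            for r, row in enumerate(labels)
            for c, label in enumerate(row)
            if label == target]


# Hypothetical 4 x 4 grey-level image: one bright pixel on a dark background.
image = [[10, 10, 10, 10],
         [10, 10, 250, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
labels = quantize(image)
info = self_information(labels)
ranking = rarity_hierarchy(labels)
rare_pixels = locations(labels, ranking[0])
print(ranking, info, rare_pixels)
```

In this toy image the single bright pixel falls into a bin with probability 1/16, so it carries 4 bits and heads the hierarchy, and its position gives the location of the candidate salient region.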
* Corresponding author: Tao Fang. E-mail: tfang@sjtu.edu.cn; phone:+86-021-34204758