Full text: Proceedings; XXI International Congress for Photogrammetry and Remote Sensing (Part B6b)

A COMPUTATIONAL METHOD TO EMULATE BOTTOM-UP ATTENTION TO 
REMOTE SENSING IMAGES 
X. Chen a, *, H. Huo a, F. Tao a, D. Li b, Z. Li a
a Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Shanghai, China
- weedcx@gmail.com, (huohong, tfang, lizhiqiangmy)@sjtu.edu.cn
b State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Hubei Wuhan, 
China - dli@wtusm.edu.cn 
Youth Forum 
KEY WORDS: Image interpretation, Modelling, Image understanding, Texture Analysis, Process modelling 
ABSTRACT: 
In this paper, we propose a computational model that emulates the expert's bottom-up attention to remote sensing
images. Bottom-up visual attention is a relatively early stage of visual processing, and it can perform recognition very well when
combined with context. An efficient and fast bottom-up model is therefore needed so that context can be processed conveniently in a
following step. Our computational model meets these requirements. The model reduces the uncertain complexity of visual attention by
introducing textons, based on neurobiology and information entropy. First, our model processes images extremely rapidly while
achieving relatively high hit rates. Second, our model provides a rarity hierarchy by converting unique or rare visual attributes to
numerical rarity values for future processing. Third, our results provide size, shape and location information for the future
computation of context attention.
1. INTRODUCTION 
Nowadays, vast amounts of remote sensing data acquired by a
great number of sensors, satellites and other instruments
are waiting to be processed in time, but processing capacity still
lags behind. A novel and promising solution is to analyze
remote sensing data automatically by emulating experienced
interpreters' psychological procedures (Lloyd 2002). To meet
this challenge, the first and key step is to obtain the size, shape
and location of latent attentive regions efficiently during
the simulation, which relates to "visual attention", a term from
psychology.
Visual attention has not yet been thoroughly understood in
neurobiology and psychology, but it still receives a lot of
attention because of its great potential (Fabrikant 2005). Visual
attention enables people to select the information most relevant to
ongoing action (Chun & Wolfe, 2001), and it can help to
interpret remote sensing images by focusing on the most
informative places. Although there are not only bottom-up models
such as that of Itti (Itti 1998) but also models based on task-specific
attentional bias such as that of Tsotsos (Tsotsos 1995), it is not clear how
task-related factors can be formally predicted or
incorporated into a mathematical model (Davies 2006).
According to recent work on visual attention, perceptual
saliency critically depends on the surrounding context (Itti
2001), and the bottom-up model provides the raw data for
future processing. Thus, a bottom-up implementation is a
practical way and an important step to emulate interpretation.
Itti and Koch draw on neurobiology to construct a framework
for understanding visual attention, with concepts such as a "saliency map" to
code stimulus conspicuity and "inhibition of return" to prevent
an attended location from being attended again. Some psychologists have already
attempted to employ Itti and Koch's model to update aerial
photogrammetry (Davies 2006). They show that the
distribution of visual attention can be measured with a
relatively simple, low-cost method instead of full eye tracking.
A comparison of the model's predictions with the performance of experts
and of novice participants shows that the model's results are more
similar to those of experts than to those of novices. The reason is that low-level
visual patterns guide experts' attention, which means that it is the
important or special data that attract experts rather than the
content. Some researchers employ Itti's model to empirically
assess the effectiveness of dynamic displays for learning,
knowledge discovery, and knowledge construction based on the
relationship of perceptual salience and thematic relevance in
static and animated displays (Fabrikant 2005). However, the
model does not optimally predict people's visual attention
(Davies 2006) and cannot be considered a typical design
solution (Fabrikant 2005). Even for relatively simple natural
scene images, the model's hit rate is not larger than 0.5076 and its
false alarm rate is between 0.1433 and 0.2931 (Hou 2007).
Furthermore, Itti's model also fails to depict interesting areas
accurately.
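To make the two evaluation figures quoted above concrete, the following minimal sketch computes a hit rate and a false alarm rate from a binary predicted saliency mask and a binary reference mask. It assumes the common pixel-wise definitions (hits over reference-salient pixels, false alarms over reference-background pixels), which may differ from the exact definitions used in the cited work; the function name and the toy masks are hypothetical.

```python
def hit_and_false_alarm_rates(predicted, reference):
    """Pixel-wise hit rate and false alarm rate for two binary masks.

    predicted, reference: equally sized 2-D lists of 0/1 values.
    Assumed definitions (not taken from the paper):
      hit rate         = TP / (TP + FN)
      false alarm rate = FP / (FP + TN)
    """
    tp = fp = fn = tn = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                tp += 1          # salient in both masks
            elif p and not r:
                fp += 1          # predicted salient, reference background
            elif r:
                fn += 1          # missed a reference-salient pixel
            else:
                tn += 1          # background in both masks
    hit_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return hit_rate, false_alarm_rate


# Toy example with hypothetical 2 x 4 masks.
predicted_mask = [[1, 1, 0, 0],
                  [0, 1, 0, 0]]
reference_mask = [[1, 0, 0, 0],
                  [1, 1, 0, 0]]
hit_rate, false_alarm_rate = hit_and_false_alarm_rates(predicted_mask, reference_mask)
print(hit_rate, false_alarm_rate)
```

A model is preferable when it raises the hit rate without raising the false alarm rate, so the two values are best reported together.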
In light of the foregoing analysis, this study aims to
detect the size, shape and location of latently salient texture
regions comprising a specific texton, ranked by rarity, on the basis
of neurobiology and information theory. We extract textons
from grey-level information at the lowest cost and use an
entropy measure to categorize the rarity hierarchy, which is one
of the key factors in visual attention.
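The idea of ranking grey-level patterns by rarity with an information measure can be illustrated with a minimal sketch. It is not the texton extraction of this paper: as a stand-in, it assumes that each pixel is simply quantized into a few grey-level bins, and it scores each bin by its self-information, -log2(p), so that rare bins receive high values. The function names, the number of bins and the toy image are hypothetical.

```python
import math
from collections import Counter


def quantize_image(image, levels=4, max_value=255):
    """Map each grey value in [0, max_value] to one of `levels` bins."""
    return [[min(v * levels // (max_value + 1), levels - 1) for v in row]
            for row in image]


def self_information(labels):
    """Return {label: -log2(p(label))} over all pixels of a label image."""
    counts = Counter(label for row in labels for label in row)
    total = sum(counts.values())
    return {label: -math.log2(count / total) for label, count in counts.items()}


def rarity_map(image, levels=4, max_value=255):
    """Per-pixel rarity: the self-information of the pixel's grey-level bin."""
    labels = quantize_image(image, levels, max_value)
    info = self_information(labels)
    return [[info[label] for label in row] for row in labels]


def rarity_ranking(image, levels=4, max_value=255):
    """Grey-level bins ordered from rarest to most common."""
    info = self_information(quantize_image(image, levels, max_value))
    return sorted(info, key=info.get, reverse=True)


# Toy example: seven dark pixels and one bright pixel (hypothetical data).
toy_image = [[10, 10, 10, 10],
             [10, 10, 10, 250]]
toy_rarity = rarity_map(toy_image)
toy_ranking = rarity_ranking(toy_image)
print(toy_rarity[1][3], toy_ranking)
```

In the toy image the single bright pixel has probability 1/8 and therefore a self-information of 3 bits, while the dark pixels score well below 1 bit, so the bright bin is ranked first. Thresholding such a map would yield candidate regions whose size, shape and location can be passed on to later processing.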
* Corresponding author: Tao Fang. E-mail: tfang@sjtu.edu.cn; phone:+86-021-34204758
	        