This PDF file contains the front matter associated with SPIE
Proceedings Volume 7240, including the Title Page, Copyright
information, Table of Contents, and the Conference Committee listing.
We present the design of a spherical imaging system with the following properties: (i) a 4π field of view that enables it
to "see" in all directions; (ii) a single center of projection to avoid parallax within the field of view; and (iii) a uniform
spatial and angular resolution in order to achieve a uniform sampling of the field of view. Our design consists of a spherical
(ball) lens encased within a spherical detector shell. The detector shell has a uniform distribution of sensing elements,
but with free space between neighboring elements, thereby making the detector partly transparent to light. We determine
the optimal dimensions of the sensing elements and the diameter of the detector shell that produce the most compact
point spread function. The image captured with such a camera has minimal blur and can be deblurred using spherical
deconvolution. Current solid state technologies do not permit the fabrication of a high resolution spherical detector array.
Therefore, in order to verify our design, we have built a prototype spherical camera with a single sensing element, which
can scan a spherical image one pixel at a time.
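As a purely illustrative aside, one simple way to lay out sensing elements on a spherical shell with roughly uniform angular density is a Fibonacci lattice. The sketch below is our own assumption for illustration, not the authors' design; it only generates unit direction vectors for a chosen number of elements.

```python
import numpy as np

def fibonacci_sphere(n):
    """Nearly uniform distribution of n points on the unit sphere
    (Fibonacci lattice); an illustrative layout for sensing elements."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i        # golden-angle increments
    z = 1.0 - 2.0 * (i + 0.5) / n                 # uniform spacing in z
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

# e.g. fibonacci_sphere(10_000) yields unit direction vectors for 10k elements
```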
Three sets of findings are reported here, all related to behavioral and neural correlates of preference decisions. First, when one is engaged in a preference decision task with free observation, one's gaze is biased towards the to-be-chosen stimulus (e.g., a face) long before one is consciously aware of the decision (the "gaze cascade effect"). Second, an fMRI study suggested that implicit activity in a subcortical structure (the Nucleus Accumbens) precedes the cognitive, conscious decision of preference. Third, both novelty and familiarity causally contribute to attractiveness, but differently across object categories (such as faces and natural scenes). Taken together, these results point to dynamic, implicit processes, over both short and long timescales, leading up to conscious preference decisions. Finally, some discussion will be given on aesthetic decisions (i.e., "beauty").
Social Software, Internet Experiments, and New Paradigms for the Web
Web-based or on-line experiments are still a relatively new research topic. Laboratory or highly controlled experiments are, for many reasons, the preferred methodology for visual experiments. However, recent work suggests that on-line experiments have some unique properties and advantages that may in some cases outweigh or offset their disadvantages. This paper considers on-line experiments from both a general and a narrow perspective. Specifically, a range of individual experiments and on-line tools are examined from a broad vantage point, and possible themes and considerations for such experiments are discussed.
This paper considers the problem of delivering calibrated images over the web with the precision appropriate
for psychophysical experimentation. We are interested only in methods that might be employed by a remote
participant possessing nothing other than a computer terminal. Therefore, we consider only purely psychophysical
methods not requiring any measurement instruments or standards. Because of this limitation, there are certain things we cannot determine, the most significant of which is absolute luminance. We present solutions for three particular problems: linearization, also known as gamma correction; determination of the relative luminances of the display primaries; and colorimetry, i.e., determining the chromaticities of the primaries.
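For illustration only, here is a minimal sketch of the linearization step, assuming a simple power-law display model and a single psychophysical match between a 50% pixel-on/off dither and a uniform patch; the function names and the example value d_match are hypothetical, not taken from the paper.

```python
import numpy as np

def linearize(dac_values, gamma):
    """Normalized DAC values (0..1) to relative luminance under a
    power-law (gamma) display model."""
    return np.asarray(dac_values, dtype=float) ** gamma

def dac_for_luminance(target, gamma):
    """Inverse mapping: DAC value needed for a target relative luminance."""
    return np.asarray(target, dtype=float) ** (1.0 / gamma)

# Hypothetical match: a 50% pixel-on/off dither (relative luminance 0.5)
# looks as bright as a uniform patch at DAC level d_match.
d_match = 0.73
gamma = np.log(0.5) / np.log(d_match)   # solve d_match**gamma == 0.5
print(f"estimated gamma = {gamma:.2f}")
print("DAC value for 25% relative luminance:", dac_for_luminance(0.25, gamma))
```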
Social tagging is an emerging methodology that allows individual users to assign semantic keywords to content on the
web. Popular web services allow the community of users to search for content based on these user-defined tags. Tags
are typically attached to a whole entity such as a web page (e.g., del.icio.us), a video (e.g., YouTube), a product
description (e.g., Amazon) or a photograph (e.g., Flickr). However, finding specific information within a whole entity
can be a difficult, time-intensive process. This is especially true for content such as video, where the information sought
may be a small segment within a very long presentation. Moreover, the tags provided by a community of users may be
incorrect, conflicting, or incomplete when used as search terms.
In this paper we introduce a system that allows users to create "micro-tags," that is, semantic markers that are attached to
subsets of information. These micro-tags give the user the ability to direct attention to specific subsets within a larger
and more complex entity, and the set of micro-tags provides a more nuanced description of the full content. Also, when
these micro-tags are used as search terms, there is no need to do a serial search of the content, since micro-tags draw
attention to the semantic content of interest. This system also provides a mechanism that allows users in the community
to edit and delete each other's tags, using the community to refine and improve tag quality. We will also report on
empirical studies that demonstrate the value of micro-tagging and tag editing and explore the role micro-tags and tag
editing will play in future applications.
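By way of illustration, a micro-tag can be thought of as a keyword bound to a segment of the content rather than to the whole entity. The sketch below uses hypothetical field names; the paper does not specify a schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MicroTag:
    """Hypothetical record for a micro-tag: a semantic marker attached to a
    subset of a larger entity (here, a time segment of a video)."""
    keyword: str
    start_s: float                       # segment start, in seconds
    end_s: float                         # segment end, in seconds
    author: str
    edit_history: List[str] = field(default_factory=list)   # community edits

# Search then reduces to matching keywords and jumping straight to the
# tagged segment instead of serially scanning the whole video.
tag = MicroTag("goal celebration", 512.0, 538.5, "user42")
```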
The Internet experiment is now a well-established and widely used method. The present paper describes guidelines for
the proper conduct of Internet experiments, e.g. handling of dropout, unobtrusive naming of materials, and pre-testing.
Several methods are presented that further increase the quality of Internet experiments and help to avoid frequent errors.
These methods include the "seriousness check", "warm-up," "high hurdle," and "multiple site entry" techniques, control
of multiple submissions, and control of motivational confounding. Finally, metadata from sites like WEXTOR
(http://wextor.org) and the web experiment list (http://genpsylab-wexlist.uzh.ch/) are reported that show the current state
of Internet-based research in terms of the distribution of fields, topics, and research designs used.
The appearance of objects in scenes is determined by their shape, material properties and by the light field, and,
in contradistinction, the appearance of those objects provides us with cues about the shape, material properties
and light field. The latter so-called inverse problem is underdetermined and therefore suffers from interesting
ambiguities. Therefore, interactions in the perception of shape, material, and luminous environment are bound
to occur.
Textures of illuminated rough materials depend strongly on the illumination and viewing directions. Luminance
histogram-based measures such as the average luminance, its variance, shadow and highlight modes, and
the contrast provide robust estimates with regard to the surface structure and the light field. Human observers' performance agrees well with predictions on the basis of such measures. If we also take into account the spatial structure of the texture, it is possible to estimate the illumination orientation locally. Image analysis on the basis of second-order statistics and human observers' estimates correspond well, and both are subject to the
bas-relief and the convex-concave ambiguities. The systematic robust illuminance flow patterns of local illumination
orientation estimates on rough 3D objects are an important entity for shape from shading and for light
field estimates. Human observers are able to match and discriminate simple light field properties (e.g. average
illumination direction and diffuseness) of objects and scenes, but they make systematic errors, which depend
on material properties, object shapes and position in the scene. Moreover, our results show that perception of
material and illumination are basically confounded. Detailed analysis of these confounds suggests that observers
primarily attend to the low-pass structure of the light field. We measured and visualized this structure, which
was found to vary smoothly in natural scenes in- and outdoors.
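As a rough illustration of the histogram-based measures mentioned above (average luminance, variance, shadow/highlight modes, contrast), here is a minimal sketch; the percentile stand-ins for the shadow and highlight modes and the Michelson-style contrast are our own assumptions.

```python
import numpy as np

def texture_stats(patch):
    """Luminance-histogram statistics of a 2-D texture patch: mean,
    variance, stand-ins for the shadow/highlight modes, and contrast."""
    lum = np.asarray(patch, dtype=float)
    lo, hi = np.percentile(lum, [1, 99])            # shadow / highlight stand-ins
    return {
        "mean": lum.mean(),
        "variance": lum.var(),
        "shadow_mode": lo,
        "highlight_mode": hi,
        "contrast": (hi - lo) / (hi + lo + 1e-12),  # Michelson-style contrast
    }

# Synthetic noisy gradient standing in for an illuminated rough texture
patch = np.linspace(0.2, 0.8, 64)[None, :] + 0.05 * np.random.rand(64, 64)
print(texture_stats(patch))
```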
In this paper we review empirical studies that investigate performance effects of stereoscopic displays for medical
applications. We focus on four distinct application areas: diagnosis, pre-operative planning, minimally invasive surgery
(MIS) and training/teaching. For diagnosis, stereoscopic displays can augment the understanding of complex spatial
structures and increase the detection of abnormalities. Stereoscopic viewing of medical data has proven to increase the
detection rate in breast imaging. A stereoscopic presentation of noisy and transparent images in 3D ultrasound results in better visualization of the internal structures; however, more empirical studies are needed to confirm the clinical relevance. For MRI and CT, where images are frequently rendered in 3D perspective, the added value of binocular depth has not yet been convincingly demonstrated. For MIS, stereoscopic displays can decrease surgery time and increase the accuracy of surgical procedures. Performance of surgical procedures is similar when high-resolution 2D displays are compared with lower-resolution stereoscopic displays, indicating that the binocular depth offered by stereoscopic displays can offset a loss of image resolution. Training and surgical planning already use computer simulations in 2D; however, more research is needed on the benefit of stereoscopic displays in those applications. Overall, there is a clear need for more empirical evidence that
quantifies the added value of stereoscopic displays in medical domains, such that the medical community will have
ample basis to invest in stereoscopic displays in all or some of the described medical applications.
In three experiments, the perceived roughness of visual and of auditory materials was investigated. In Experiment 1, the roughness of frequency-modulated tones was determined using a paired-comparison paradigm; the results obtained with this paradigm were similar to those reported in the literature. In Experiment 2, the perceived visual roughness of textures drawn from the CUReT database was determined. It was found that participants could systematically judge the roughness of the textures. In Experiment 3, the perceived pleasantness of the textures used in Experiment 2 was determined. Two groups of participants could be distinguished. One group found rough textures unpleasant and smooth textures pleasant. The other group found rough textures pleasant and smooth textures unpleasant, although for the latter group the relation between relative roughness and perceived pleasantness was weaker.
Current automatic sign language recognition (ASLR) seldom uses perceptual knowledge about the recognition
of sign language. Using such knowledge can improve ASLR because it can give an indication which elements
or phases of a sign are important for its meaning. Also, the current generation of data-driven ASLR methods
has shortcomings which may not be solvable without the use of knowledge on human sign language processing.
Handling variation in the precise execution of signs is an example of such shortcomings: data-driven methods
(which include almost all current methods) have difficulty recognizing signs that deviate too much from the
examples that were used to train the method. Insight into human sign processing is needed to solve these
problems. Perceptual research on sign language can provide such insights. This paper discusses knowledge
derived from a set of sign perception experiments, and the application of such knowledge in ASLR. Among
the findings are the facts that not all phases and elements of a sign are equally informative, that defining the
'correct' form for a sign is not trivial, and that statistical ASLR methods do not necessarily arrive at sign
representations that resemble those of human beings. Apparently, current ASLR methods are quite different
from human observers: their method of learning gives them different sign definitions, they regard each moment
and element of a sign as equally important and they employ a single definition of 'correct' for all circumstances.
If the object is for an ASLR method to handle natural sign language, then the insights from sign perception
research must be integrated into ASLR.
Communication of American Sign Language (ASL) over mobile phones would be very beneficial to the Deaf
community. ASL video encoded to achieve the rates provided by current cellular networks must be heavily
compressed and appropriate assessment techniques are required to analyze the intelligibility of the compressed
video. As an extension to a purely spatial measure of intelligibility, this paper quantifies the effect of temporal
compression artifacts on sign language intelligibility. These artifacts can be the result of motion-compensation
errors that distract the observer or of frame rate reductions. They reduce the perception of smooth motion and disrupt the temporal coherence of the video. Motion-compensation errors that affect temporal coherence
are identified by measuring the block-level correlation between co-located macroblocks in adjacent frames. The
impact of frame rate reductions was quantified through experimental testing. A subjective study was performed
in which fluent ASL participants rated the intelligibility of sequences encoded at a range of 5 different frame rates
and with 3 different levels of distortion. The subjective data is used to parameterize an objective intelligibility
measure which is highly correlated with subjective ratings at multiple frame rates.
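A minimal sketch of the block-level temporal-coherence measure described above, assuming grayscale frames, 16x16 macroblocks, plain Pearson correlation, and our own handling of flat blocks:

```python
import numpy as np

def block_correlations(prev_frame, cur_frame, block=16):
    """Correlation between co-located blocks of two adjacent grayscale
    frames; low values flag blocks whose temporal coherence is disrupted,
    e.g. by motion-compensation errors."""
    h, w = cur_frame.shape
    scores = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            a = prev_frame[y:y + block, x:x + block].ravel().astype(float)
            b = cur_frame[y:y + block, x:x + block].ravel().astype(float)
            if a.std() < 1e-6 or b.std() < 1e-6:
                scores.append(1.0)               # flat blocks: treat as coherent
            else:
                scores.append(np.corrcoef(a, b)[0, 1])
    return np.array(scores)
```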
Ubiquitous computing (or Ambient Intelligence) is an upcoming technology that is usually associated with futuristic
smart environments in which information is available anytime anywhere and with which humans can interact in a
natural, multimodal way. However spectacular the corresponding scenarios may be, it is equally challenging to consider
how this technology may enhance existing situations. This is illustrated by a case study from the Dutch medical field:
central quality reviewing for pathology in child oncology. The main goal of the review is to assess the quality of the
diagnosis based on patient material. The sharing of knowledge in social, face-to-face interaction during such meetings is an important advantage. At the same time, there is the disadvantage that the experts from the seven Dutch academic medical centers have to travel to the review meeting, and that the logistics required to collect and bring patient material and data to the meeting are cumbersome and time-consuming. This paper focuses on how this time-consuming, inefficient way of reviewing can be replaced by a virtual collaboration system that merges technology supporting Computer Mediated Collaboration with intuitive interfacing. This requires insight into the preferred way of communication and collaboration, as well as knowledge about the preferred interaction style with a virtual shared workspace.
This paper describes a radial basis memory system that is used to model the performance of human participants in a task
of learning to traverse mazes in a virtual environment. The memory model is a multiple-trace system, in which each
event is stored as a separate memory trace. In the modeling of the maze traversal task, the events that are stored as
memories are the perceptions and decisions taken at the intersections of the maze. As the virtual agent traverses the
maze, it makes decisions based upon all of its memories, but those that match best to the current perceptual situation, and
which were successful in the past, have the greatest influence. As the agent carries out repeated attempts to traverse the
same maze, memories of successful decisions accumulate, and performance gradually improves. The system uses only
three free parameters, the most important of which is the standard deviation of the underlying Gaussian used as the radial basis function. It is demonstrated that adjustments of these parameters can easily result in exact
modeling of the average human performance in the same task, and that variation of the parameters matches the variation
in human performance. We conclude that human memory interaction that does not involve conscious memorization, as
in learning navigation routes, may be much more primitive and simply explained than has been previously thought.
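To make the retrieval mechanism concrete, here is a minimal sketch of a multiple-trace, Gaussian radial-basis memory; the function name, the plus/minus one outcome coding, and the summed-vote decision rule are illustrative assumptions, not the authors' exact model.

```python
import numpy as np

def memory_decision(traces, outcomes, probe, sigma):
    """Multiple-trace radial-basis retrieval: every stored trace votes for
    its past decision, weighted by its Gaussian similarity to the current
    perceptual probe; sigma is the key free parameter."""
    traces = np.asarray(traces, dtype=float)     # shape (n_traces, n_features)
    probe = np.asarray(probe, dtype=float)
    d2 = ((traces - probe) ** 2).sum(axis=1)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    # outcomes: +1 for decisions that succeeded in the past, -1 otherwise
    return float((weights * np.asarray(outcomes, dtype=float)).sum())
```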
Both our visual and haptic systems contribute to the perception of the three dimensional world, especially the
proximal perception of objects. The interaction of these systems has been the subject of some debate over the
years, ranging from the philosophically posed Molyneux problem to the more pragmatic examination of their
psychophysical relationship. To better understand the nature of this interaction we have performed a variety of
experiments characterizing the detection, discrimination, and production of 3D shape. A stimulus set of 25 complex, natural-appearing, noisy 3D target objects was statistically specified in the Fourier domain and manufactured
using a 3D printer. A series of paired-comparison experiments examined subjects' unimodal (visual-visual and
haptic-haptic) and crossmodal (visual-haptic) perceptual abilities. Additionally, subjects sculpted objects using
uni- or crossmodal source information. In all experiments, performance in the two unimodal conditions was similar, and unimodal presentation fared better than crossmodal. Also, the spatial frequency of
object features affected performance differentially across the range used in this experiment. The sculpted objects
were scanned in 3D and the resulting geometry was compared metrically and statistically to the original stimuli.
Objects with higher spatial frequency were harder to sculpt when limited to haptic input compared to only visual
input. The opposite was found for objects with low spatial frequency. The psychophysical discrimination and
comparison experiments yielded similar findings. There is a marked performance difference between the visual
and haptic systems and these differences were systematically distributed along the range of feature details. The
existence of non-universal (i.e., modality-specific) representations explains the poor crossmodal performance. Our current findings suggest that haptic and visual information is either integrated into a multi-modal form, or that each is kept independent with a somewhat efficient translation between them. Vision shows a distinct advantage when dealing with
higher frequency objects but both modalities are effective when comparing objects that differ by a large amount.
This article provides an overview of an ongoing program of research designed to investigate the effectiveness of haptic
cuing to redirect a user's visual spatial attention under various conditions using a visual change detection paradigm.
Participants visually inspected displays consisting of rectangular horizontal and vertical elements in order to try and
detect an orientation change in one of the elements. Prior to performing the visual task on each trial, the participants
were tapped on the back from one of four locations by a vibrotactile stimulator. The validity of the haptic cues (i.e., the
probability that the tactor location coincided with the quadrant where the visual target occurred) was varied. Response
time was recorded and eye-position monitored with an eyetracker. Under conditions where the validity of the haptic cue
was high (i.e., when the cue predicted the likely target quadrant), initial saccades predominantly went to the cued
quadrant and response times were significantly faster as compared to the baseline condition where no haptic cuing was
provided. When the cue validity was low (i.e., when the cue provided no information with regard to the quadrant in
which the visual target might occur), however, the participants were able to ignore haptic cuing as instructed.
Furthermore, a spotlight effect was observed in that the response time increased as the visual target moved away from
the center of the cued quadrant. These results have implications for the designers of multimodal (or multisensory)
interfaces where a user can benefit from haptic attentional cues in order to detect and/or process the information from a
small region within a large and complex visual display.
This study explores the haptic rendering capabilities of a variable friction tactile interface through psychophysical
experiments. In order to obtain a deeper understanding of the sensory resolution associated with the Tactile Pattern
Display (TPaD), friction discrimination experiments are conducted. During the experiments, subjects are asked to
explore the glass surface of the TPaD using their bare index fingers, to feel the friction on the surface, and to compare
the slipperiness of two stimuli, displayed in sequential order. The fingertip position data are collected by an infrared frame, and the normal and translational forces applied by the finger are measured by force sensors attached to the TPaD. The
recorded data is used to calculate the coefficient of friction between the fingertip and the TPaD. The experiments
determine the just noticeable difference (JND) of friction coefficient for humans interacting with the TPaD.
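For illustration, the quantity being discriminated can be computed directly from the measured forces; a minimal sketch, with the element-wise ratio and lack of filtering being our own simplifications:

```python
import numpy as np

def friction_coefficient(tangential_force, normal_force, eps=1e-9):
    """Instantaneous coefficient of friction from the measured finger
    forces (element-wise ratio, no filtering)."""
    ft = np.abs(np.asarray(tangential_force, dtype=float))
    fn = np.abs(np.asarray(normal_force, dtype=float))
    return ft / np.maximum(fn, eps)
```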
We propose a new approach for converting graphical and pictorial information into tactile patterns that can
be displayed in a static or dynamic tactile device. The key components of the proposed approach are (1) an
algorithm that segments a scene into perceptually uniform segments; (2) a procedure for generating perceptually
distinct tactile patterns; and (3) a mapping of the visual textures of the segments into tactile textures that
convey similar concepts. We used existing digital halftoning and other techniques to generate a wide variety of
tactile textures. We then conducted formal and informal subjective tests with sighted (but visually blocked) and
visually-impaired subjects to determine the ability of human tactile perception to perceive differences among
them. In addition to generating perceptually distinguishable tactile patterns, our goal is to identify significant
dimensions of tactile texture perception, which will make it possible to map different visual attributes into
independent tactile attributes. Our experimental results indicate that it is possible to generate a number of
perceptually distinguishable tactile patterns, and that different dimensions of tactile texture perception can
indeed be identified.
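As one concrete example of the kind of digital-halftoning step mentioned above, here is a minimal ordered-dither sketch; the 4x4 Bayer matrix and the binary "raised dot" interpretation are our assumptions, not the authors' specific technique.

```python
import numpy as np

# 4x4 Bayer threshold matrix (normalized to 0..1)
BAYER4 = np.array([[ 0,  8,  2, 10],
                   [12,  4, 14,  6],
                   [ 3, 11,  1,  9],
                   [15,  7, 13,  5]]) / 16.0

def halftone(gray):
    """Ordered-dither halftoning of a grayscale patch (values in 0..1)
    into a binary pattern; 1 could correspond to a raised tactile dot."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    thresholds = np.tile(BAYER4, (h // 4 + 1, w // 4 + 1))[:h, :w]
    return (gray > thresholds).astype(np.uint8)
```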
In this study we demonstrate that touch decreases the ambiguity in a visual image. It has been previously
found that visual perception of three-dimensional shape is subject to certain variations. These variations can
be described by the affine transformation. While the visual system thus seems unable to capture the Euclidean
structure of a shape, touch could potentially be a useful source to disambiguate the image. Participants performed
a so-called 'attitude task' from which the structure of the perceived three-dimensional shape was calculated. One
group performed the task with only vision and a second group could touch the stimulus while viewing it. We found
that the consistency within the haptics+vision group was higher than in the vision-only group. Thus, haptics
decreases the visual ambiguity. Furthermore, we found that the touched shape was consistently perceived as
having more relief than the untouched shape. It was also found that the direction of affine shear differences
within the two groups was more consistent when touch was used. We thus show that haptics has a significant
influence on the perception of pictorial relief.
Space operations present the human visual system with a wide dynamic range of images from faint stars and starlit
shadows to un-attenuated sunlight. Lunar operations near the poles will result in low sun angles, exacerbating visual
problems associated with shadowing and glare. We discuss the perceptual challenges these conditions will present to the
human explorers, and consider some possible mitigations and countermeasures. We also discuss the problems of
simulating these conditions for realistic training.
In the real world we can find large intensity ranges: the ratio from the brightest to the darkest part of the
scene can be of the order of 10000 to 1. Since most of our electronic displays have a limited range of
around 100 to 1, the last 20 years have seen much work done to develop different algorithms that compress the actual dynamic range of an image to that available in the display device. These algorithms, known as tone mappers, attempt to preserve as much of the image's characteristics as possible [1]. An increasing amount of research has also been done to try to evaluate the 'best' tone mapper. Approaches have included pairwise comparisons of tone-mapped images [2], comparison with real scenes [3] or using images displayed on a High Dynamic Range (HDR) monitor [4]. None of these approaches is entirely satisfactory, and all suffer from potential confounding factors due to participants' interpretations of instructions and from biases.
There is evidence that the spatial and chronological path of fixations made by observers when viewing an
image (i.e. the scanpath) is repeated to some extent when the same image is again presented to the observer
(e.g. [5]). In this paper we are the first to investigate the potential of using eye movement recordings,
particularly scanpaths, as a discriminatory tool. We propose that if a tone-mapped image gives rise to
scanpaths that are different from those obtained when viewing the original image this might be an
indication of a poor quality tone mapper since it is eliciting eye movements that are different from those
observed when viewing the original image.
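As an illustration of how scanpaths can be compared quantitatively, here is a minimal sketch that grids fixations and computes a string-edit (Levenshtein) distance between two fixation sequences; the 5x5 grid and the edit-distance choice are assumptions, not necessarily the analysis used in the paper.

```python
def gridded_scanpath(fixations, grid=(5, 5), extent=(1.0, 1.0)):
    """Convert (x, y) fixations (normalized to the image extent) into a
    sequence of grid-cell labels."""
    labels = []
    for x, y in fixations:
        col = min(int(x / extent[0] * grid[0]), grid[0] - 1)
        row = min(int(y / extent[1] * grid[1]), grid[1] - 1)
        labels.append(row * grid[0] + col)
    return labels

def edit_distance(a, b):
    """Levenshtein distance between two label sequences; larger values
    mean more dissimilar scanpaths."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]
```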
One of the key issues for a successful roll-out of digital cinema is the quality it offers. The most practical and least
expensive way of measuring quality of multimedia content is through the use of objective metrics. In addition to the
widely used objective quality metric peak signal-to-noise ratio (PSNR), recently other metrics such as single scale
structural similarity (SS-SSIM) and multi scale structural similarity (MS-SSIM) have been claimed as good alternatives
for estimation of perceived quality by human subjects. The goal of this paper is to verify by means of subjective tests the
validity of such claims for digital cinema content and environment.
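For reference, the baseline metric mentioned above is straightforward to compute; a minimal sketch, with an 8-bit peak value assumed:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a reference and a test image."""
    ref = np.asarray(reference, dtype=float)
    tst = np.asarray(test, dtype=float)
    mse = np.mean((ref - tst) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```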
In this paper, we propose and discuss some approaches for measuring perceptual contrast in digital images. We
start from previous algorithms by implementing different local measures of contrast and a parameterized way to
recombine local contrast maps and color channels. We propose the idea of recombining the local contrast maps
and the channels using particular measures taken from the image itself as weighting parameters. Exhaustive tests and results are presented and discussed; in particular, we compare the performance of each algorithm in relation to the contrast perceived by observers. Current results show an improvement in the correlation between contrast measures and observers' perceived contrast when the variance of each of the three color channels is used as the weighting parameter for the local contrast maps.
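A minimal sketch of the variance-weighted recombination idea described above; taking the mean of each local contrast map as its summary and normalizing the weights are our own simplifications.

```python
import numpy as np

def weighted_contrast(channel_maps):
    """Recombine per-channel local-contrast maps into one score, weighting
    each channel by its own variance.  channel_maps is a list of
    (channel_image, local_contrast_map) pairs."""
    scores = np.array([np.mean(cmap) for _, cmap in channel_maps])
    weights = np.array([np.var(chan) for chan, _ in channel_maps])
    weights = weights / (weights.sum() + 1e-12)
    return float(np.dot(weights, scores))
```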
Much research has gone into developing methods for enhancing the contrast of displayed 3D scenes. In the
current study, we investigated the perceptual impact of an algorithm recently proposed by Ritschel et al. [1] that
provides a general technique for enhancing the perceived contrast in synthesized scenes. Their algorithm extends
traditional image-based Unsharp Masking to a 3D scene, achieving a scene-coherent enhancement. We conducted
a standardized perceptual experiment to test the proposition that a 3D unsharp enhanced scene was superior to
the original scene in terms of perceived contrast and preference. Furthermore, the impact of different settings of the algorithm's main parameters, enhancement strength (λ) and gradient size (σ), was studied in order to provide an estimate of a reasonable parameter space for the method. All participants preferred a clearly visible enhancement over the original, non-enhanced scenes, and the setting for objectionable enhancement was far
above the preferred settings. The effect of the gradient size σ was negligible. The general pattern found for
the parameters provides a useful guideline for designers when making use of 3D Unsharp Masking: as a rule of
thumb they can easily determine the strength for which they start to perceive an enhancement and use twice
this value for a good effect. Since the value for objectionable results was twice as large again, artifacts should
not impose restrictions on the applicability of this rule.
In this study, we investigated how the luminance ratio of the surround field (Ls) to that of the central field (Lc) influences the perceived blackness of the central field, both in a simple concentric-circle configuration (Experiment 1) and in digital images of masterpieces (Experiment 2). Results of Experiment 1 showed that the perceived blackness of the central field becomes more blackish and deeper as the contrast between Lc and Ls increases. Results of Experiment 2 showed that the perceived blackness of a black area surrounded by a relatively bright area in artistic images is stronger than the perceived blackness given by the same luminance contrast between the center and surround in a concentric circular configuration.
There are a number of modern myths about High Dynamic Range (HDR) imaging. There have been claims that
multiple-exposure techniques can accurately record scene luminances over a dynamic range of more than a million
to one. There are assertions that human appearance tracks the same range. The most common myth is that HDR
imaging accurately records and reproduces actual scene radiances. Regardless, there is no doubt that HDR imaging
is superior to conventional imaging. We need to understand the basis of HDR image quality improvements. This
paper shows that multiple exposure techniques can preserve spatial information, although they cannot record
accurate scene luminances. Synthesizing HDR renditions from relative spatial records accounts for improved
images.
In Digital Cinema, the video compression must be as transparent as possible to provide the best image quality to the audience. The goal of compression is to simplify the transport, storage, distribution and projection of films. For all those tasks, equipment needs to be developed, so it is mandatory to reduce the complexity of that equipment by imposing limitations in the specifications. In this sense, the DCI has fixed the maximum bitrate for a compressed stream at 250 Mbps, independently of the input format (4K/24fps, 2K/48fps or 2K/24fps). This parameter is discussed in this paper because it is not consistent to double or quadruple the input rate without increasing the output rate. The work presented here is intended to define quantization steps that ensure visually lossless compression. Two steps are followed: first, the effect of each subband is evaluated separately, and then the scaling ratio is found. The results obtained show that it is necessary to increase the bitrate limit for cinema material in order to achieve visually lossless compression.
Statistical modeling of natural image sequences is of fundamental importance to both the understanding of
biological visual systems and the development of Bayesian approaches for solving a wide variety of machine
vision and image processing problems. Previous methods are based on measuring spatiotemporal power spectra and on optimizing linear filters to achieve independent or sparse representations of the time-varying
image signals. Here we propose a different approach, in which we investigate the temporal variations of local
phase structures in the complex wavelet transform domain. We observe that natural image sequences exhibit
strong prior of temporal motion smoothness, by which local phases of wavelet coefficients can be well predicted
from their temporal neighbors. We study how such a statistical regularity is disrupted by "unnatural" image distortions and demonstrate the potential of using temporal motion smoothness measures for reduced-reference video quality assessment.
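To make the idea concrete, here is a toy sketch that extracts local phase with a single complex Gabor filter (a crude stand-in for the complex wavelet transform used in the paper) and scores how well phases follow a constant-velocity prediction; the filter parameters and the error measure are assumptions.

```python
import numpy as np

def local_phase(frame, freq=0.15, sigma=4.0):
    """Local phase of a grayscale frame under a single horizontal complex
    Gabor filter (a toy stand-in for a complex wavelet subband)."""
    x = np.arange(-16, 17)
    gabor = np.exp(-x ** 2 / (2 * sigma ** 2)) * np.exp(2j * np.pi * freq * x)
    rows = np.asarray(frame, dtype=float)
    resp = np.array([np.convolve(row, gabor, mode="same") for row in rows])
    return np.angle(resp)

def phase_prediction_error(frames):
    """Mean absolute deviation of local phase from a constant-velocity
    (linear) prediction across a short sequence of frames; larger values
    suggest weaker temporal motion smoothness."""
    phases = [local_phase(f) for f in frames]
    errs = []
    for p0, p1, p2 in zip(phases, phases[1:], phases[2:]):
        predicted = np.angle(np.exp(1j * (2 * p1 - p0)))      # extrapolate phase
        errs.append(np.mean(np.abs(np.angle(np.exp(1j * (p2 - predicted))))))
    return float(np.mean(errs))
```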
There is a great deal of interest in methods to assess the perceptual quality of a video sequence in a full reference
framework. Motion plays an important role in human perception of video, and videos suffer from several artifacts that relate to inaccuracies in the representation of motion in the test video compared to the reference. However, existing algorithms to measure video quality focus primarily on capturing spatial artifacts in the video signal, and are inadequate at modeling motion perception and capturing temporal artifacts in videos. We present
an objective, full reference video quality index known as the MOtion-based Video Integrity Evaluation (MOVIE)
index that integrates both spatial and temporal aspects of distortion assessment. MOVIE explicitly uses motion
information from the reference video and evaluates the quality of the test video along the motion trajectories
of the reference video. The performance of MOVIE is evaluated using the VQEG FR-TV Phase I dataset and
MOVIE is shown to be competitive with, and even out-perform, existing video quality assessment systems.
Assessing video quality objectively requires developing and validating approaches that accurately predict subjective quality. For perceptual quality models, developers have implemented methods that utilise information from both the original and the
processed signals (full reference and reduced reference methods). For many practical applications, no reference (NR)
methods are required. It has been a major challenge for developers to produce no reference methods that attain the
necessary predictive performance for the methods to be deployed by industry. In this paper, we present a comparison
between no reference methods operating on either the decoded picture information alone or using a bit-stream / decoded
picture hybrid analysis approach. Two NR models are introduced: one using decoded picture information only; the other
using a hybrid approach. Validation data obtained from subjective quality tests are used to examine the predictive
performance of both models. The strengths and limitations of the two NR methods are discussed.
To develop accurate objective measurements (models) for video quality assessment, subjective data is traditionally
collected via human subject testing. The ITU has a series of Recommendations that address methodology for performing
subjective tests in a rigorous manner. These methods are targeted at the entertainment application of video. However,
video is often used for many applications outside of the entertainment sector, and generally this class of video is used to
perform a specific task. Examples of these applications include security, public safety, remote command and control,
and sign language. For these applications, video is used to recognize objects, people or events. The existing methods,
developed to assess a person's perceptual opinion of quality, are not appropriate for task-based video. The Institute for
Telecommunication Sciences, under a program from the Department of Homeland Security and the National Institute for
Standards and Technology's Office of Law Enforcement, has developed a subjective test method to determine a person's
ability to perform recognition tasks using video, thereby rating the quality according to the usefulness of the video within its application. This new method is presented, along with a discussion of two examples of subjective tests
using this method.
Present quality assessment (QA) algorithms aim to generate scores for natural images consistent with subjective
scores for the quality assessment task. For the quality assessment task, human observers evaluate a natural
image based on its perceptual resemblance to a reference. Natural images communicate useful information to
humans, and this paper investigates the utility assessment task, where human observers evaluate the usefulness of
a natural image as a surrogate for a reference. Current QA algorithms implicitly assess utility insofar as an image
that exhibits strong perceptual resemblance to a reference is also of high utility. However, a perceived quality
score is not a proxy for a perceived utility score: a decrease in perceived quality may not affect the perceived
utility. Two experiments are conducted to investigate the relationship between the quality assessment and utility
assessment tasks. The results from these experiments provide evidence that any algorithm optimized to predict
perceived quality scores cannot immediately predict perceived utility scores. Several QA algorithms are evaluated
in terms of their ability to predict subjective scores for the quality and utility assessment tasks. Among the QA
algorithms evaluated, the visual information fidelity (VIF) criterion, which is frequently reported to provide the
highest correlation with perceived quality, predicted both perceived quality and utility scores reasonably. The
consistent performance of VIF for both tasks raised suspicions in light of the evidence from the psychophysical
experiments. A thorough analysis of VIF revealed that it artificially emphasizes evaluations at finer image scales
(i.e., higher spatial frequencies) over those at coarser image scales (i.e., lower spatial frequencies). A modified
implementation of VIF, denoted VIF*, is presented that provides statistically significant improvement over VIF
for the quality assessment task and statistically worse performance for the utility assessment task. A novel utility
assessment algorithm, referred to as the natural image contour evaluation (NICE), is introduced that conducts a
comparison of the contours of a test image to those of a reference image across multiple image scales to score the
test image. NICE demonstrates a viable departure from traditional QA algorithms that incorporate energy-based
approaches and is capable of predicting perceived utility scores.
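Purely as an illustration of the contour-comparison idea behind NICE, here is a toy sketch using gradient-magnitude edge maps and an F1-style overlap score across decimated scales; the edge detector, threshold, and decimation are our simplifications, not the actual NICE algorithm.

```python
import numpy as np

def edge_map(img, thresh=0.1):
    """Binary contour map from gradient magnitude (crude edge detector)."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    mag = np.hypot(gx, gy)
    return mag > thresh * mag.max()

def contour_agreement(reference, test, scales=(1, 2, 4)):
    """F1-style overlap between test and reference contours, averaged over
    a few decimated scales."""
    scores = []
    for s in scales:
        er = edge_map(reference[::s, ::s])
        et = edge_map(test[::s, ::s])
        tp = np.logical_and(er, et).sum()
        denom = er.sum() + et.sum()
        scores.append(2.0 * tp / denom if denom else 1.0)
    return float(np.mean(scores))
```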
Visual content typically exhibits regions that particularly attract the viewer's attention, usually referred to as
regions-of-interest (ROI). In the context of visual quality one may expect that distortions occurring in the ROI
are perceived more annoyingly than distortions in the background (BG). This is especially true given that the
human visual system is highly space variant in sampling visual signals. However, this phenomenon of visual
attention is only seldom taken into account in visual quality metric design. In this paper, we thus provide a
framework for incorporating visual attention into the design of an objective quality metric by means of region-based segmentation of the image. To support the metric design, we conducted subjective experiments to both
quantify the subjective quality of a set of distorted images and also to identify ROI in a set of reference images.
Multiobjective optimization is then applied to find the optimal weighting of the ROI and BG quality metrics. It
is shown that the ROI-based metric design increases the quality prediction performance of the considered metric, and also that of two other contemporary quality metrics.
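The final pooling can be illustrated with a simple weighted combination of the two regional scores; the linear form and the example weight are assumptions, since the paper obtains the weighting via multiobjective optimization.

```python
def roi_weighted_quality(q_roi, q_bg, w_roi=0.7):
    """Linear combination of region-of-interest and background quality
    scores; w_roi would in practice be obtained by optimization."""
    return w_roi * q_roi + (1.0 - w_roi) * q_bg
```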
Spatial pooling strategies used in recent Image Quality Assessment (IQA) algorithms have generally been that of
simply averaging the values of the obtained scores across the image. Given that certain regions in an image are
perceptually more important than others, it is not unreasonable to suspect that gains can be achieved by using
an appropriate pooling strategy. In this paper, we explore two hypotheses about spatial pooling strategies for the popular SSIM metrics [1, 2]. The first concerns visual attention and gaze direction - 'where' a human looks. The second is that humans tend to perceive 'poor' regions in an image with more severity than the 'good' ones - and hence penalize images with even a small number of 'poor' regions more heavily. The improvements in correlation between the objective metrics' scores and human perception are demonstrated by evaluating the performance of these pooling strategies on the LIVE database [3] of images.
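A minimal sketch of the second strategy (penalize the worst regions): pool a local SSIM map by averaging only its lowest values; the percentage used here is an arbitrary assumption, not the setting from the paper.

```python
import numpy as np

def percentile_pooling(ssim_map, percent=6.0):
    """Pool a local SSIM map by averaging only its lowest-quality values,
    so that a few 'poor' regions dominate the overall score."""
    values = np.sort(np.asarray(ssim_map, dtype=float).ravel())
    k = max(1, int(len(values) * percent / 100.0))
    return float(values[:k].mean())
```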
The Human Variation Model views disability as simply "an extension of the natural physical, social, and cultural
variability of mankind." Given this human variation, it can be difficult to distinguish between a prosthetic device such
as a pair of glasses (which extends limited visual abilities into the "normal" range) and a visual enhancement device such
as a pair of binoculars (which extends visual abilities beyond the "normal" range). Indeed, there is no inherent reason
why the design of visual prosthetic devices should be limited to just providing "normal" vision. One obvious
enhancement to human vision would be the ability to visually "zoom" in on objects that are of particular interest to the
viewer. Indeed, it could be argued that humans already have a limited zoom capability, which is provided by their high-resolution foveal vision. However, humans still find additional zooming useful, as evidenced by their purchases of
binoculars equipped with mechanized zoom features. The fact that these zoom features are manually controlled raises
two questions: (1) Could a visual enhancement device be developed to monitor attention and control visual zoom
automatically? (2) If such a device were developed, would its use be experienced by users as a simple extension of their
natural vision? This paper details the results of work with two research platforms called the Remote Visual Explorer
(ReVEx) and the Interactive Visual Explorer (InVEx) that were developed specifically to answer these two questions.
Motivated by reports that image noise can increase perceived sharpness, we investigated how noise affects sharpness perception. We first used natural images of tree bark with different amounts of noise to see whether noise enhances sharpness. Although the results showed that sharpness decreased as the noise amount increased, some observers seemed to perceive more sharpness with increasing noise, while others did not. We next used 1D and 2D uni-frequency patterns as stimuli in an attempt to reduce such variability in the judgment. The results showed that, for higher-frequency stimuli, sharpness decreased as the noise amount increased, while the sharpness of the lower-frequency stimuli increased at a certain noise level. From this result, we hypothesized that image noise might reduce sharpness at edges, but could improve the sharpness of lower-frequency components or texture in an image. To test this prediction, we experimented again with the natural image used in the first experiment. Stimuli were made by applying noise separately to the edge or the texture part of the image. The result showed that noise, when added to the edge region, only decreased sharpness, whereas when added to texture, it could improve sharpness. We think it is the interaction between noise and texture that sharpens an image.
Motion blur remains an important issue on liquid crystal displays (LCDs). In recent years, effort has gone into characterizing and measuring this artifact. These methods make it possible to picture the blurred profile of a moving edge as a function of the scrolling speed and of the gray-to-gray transition considered. However, other aspects must be taken into account in order to understand how LCD motion blur is perceived.

A few recent studies have addressed the perception of LCD motion blur, but only a small number of speeds and transitions have been tested. In this paper, we have explored motion-blur perception over 20 gray-to-gray transitions and several scrolling speeds. Moreover, we used three different displays to explore the influence of the luminance range as well as of the blur shape on motion-blur perception.

A blur-matching experiment was set up to obtain the relation between objective measurements and perception. In this experiment, observers adjust a stationary test blur (simulated from measurements) until it matches their perception of the blur occurring on a moving edge. The results show that the adjusted perceived blur is always lower than the objectively measured blur, and that this effect is greater for low-contrast edges than for high-contrast edges. This could be related to the motion-sharpening phenomenon.
The human visual system is sensitive to both first-order and second-order variations in an image. The latter are especially important for digital image processing, as they allow human observers to perceive the envelope of the pixel intensities as a smooth surface rather than as discrete pixels. Here we used a pattern-masking paradigm to measure the detection threshold of contrast-modulated (CM) stimuli, in which the contrast of a horizontal grating is modulated by a vertical Gabor function, at different modulation depths of the CM pedestal. The threshold function showed a typical dipper shape: the threshold decreased with modulation depth (facilitation) at low pedestal modulation depths and then increased (suppression) at high pedestal modulations. The data were well explained by a modified divisive inhibition model that operates on both the depth modulation and the carrier contrast of the input images. Hence, divisive inhibition determined by both the first-order and the second-order information in the stimuli is necessary to explain discrimination between two second-order stimuli.
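A minimal sketch of a divisive-inhibition (Foley-type) transducer and the dipper-shaped threshold curve it produces is given below; the exponents, saturation constant, and response criterion are illustrative assumptions, not the parameters fitted in this study, and the full model additionally couples the carrier contrast into the inhibitory pool.

import numpy as np

def response(m, p=2.4, q=2.0, z=0.01):
    # Excitation m**p divided by an inhibitory pool m**q + z.
    return m**p / (m**q + z)

def threshold(pedestal, criterion=0.01, dm=1e-5):
    # Smallest depth increment whose response change reaches the criterion.
    delta = dm
    while response(pedestal + delta) - response(pedestal) < criterion:
        delta += dm
    return delta

pedestals = np.logspace(-3, 0, 8)                    # pedestal modulation depths
print([round(threshold(m), 4) for m in pedestals])   # thresholds dip, then rise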
The theory of linguistics teaches us that there is a hierarchical structure in linguistic expressions, from letters to word roots, and on to words and sentences. By applying syntax and semantics beyond words, one can further recognize the grammatical relationships among words and the meaning of a sequence of words. This layered view of a spoken language is useful for effective analysis and automated processing. It is therefore interesting to ask whether a similar hierarchy of representation exists for visual information. A class of techniques similar in nature to linguistic parsing is found in the Lempel-Ziv incremental parsing scheme. Based on a new class of multidimensional incremental parsing algorithms extended from Lempel-Ziv incremental parsing, a new framework for image retrieval, which takes advantage of the source-characterization property of the incremental parsing algorithm, was proposed recently. With the incremental parsing technique, a given image is decomposed into a number of patches, called a parsed representation. This representation can be thought of as a morphological interface between elementary pixels and a higher-level representation. In this work, we examine the properties of the two-dimensional parsed representation in the context of imagery information retrieval and in contrast to vector quantization, i.e., fixed square-block representations and minimum-average-distortion criteria. We implemented four image retrieval systems for the comparative study: three, called IPSILON image retrieval systems, use parsed representations with different perceptual distortion thresholds, and one uses conventional vector quantization for visual pattern analysis. We observe that different perceptual distortions in visual pattern matching do not seriously affect retrieval precision, even though allowing looser perceptual thresholds in image compression results in poor reconstruction fidelity. We also compare the effectiveness of the parsed representations, constructed under the latent semantic analysis (LSA) paradigm, to investigate their capabilities in capturing semantic concepts. The results clearly demonstrate the superiority of the parsed representation.
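For readers unfamiliar with incremental parsing, the sketch below shows the classical one-dimensional Lempel-Ziv (LZ78-style) scheme that the multidimensional algorithm extends; the two-dimensional patch growth and the perceptual distortion thresholds of the IPSILON systems are not reproduced here.

def incremental_parse(sequence):
    # Split a sequence into phrases, each a previously seen phrase plus one new symbol.
    dictionary = {(): 0}          # start with the empty phrase
    phrases = []
    current = ()
    for symbol in sequence:
        candidate = current + (symbol,)
        if candidate in dictionary:
            current = candidate   # keep extending a known phrase
        else:
            dictionary[candidate] = len(dictionary)
            phrases.append(candidate)
            current = ()
    if current:
        phrases.append(current)   # trailing phrase, possibly a repeat
    return phrases

print(incremental_parse("aababcabcd"))
# [('a',), ('a', 'b'), ('a', 'b', 'c'), ('a', 'b', 'c', 'd')]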
The saliency map is useful for many applications such as image compression, display, and visualization. However, the
bottom-up model used in most saliency map construction methods is computationally expensive. The purpose of this
paper is to improve the efficiency of the model for automatic construction of the saliency map of an image while
preserving its accuracy. In particular, we remove the contrast sensitivity function and the visual masking component of
the bottom-up visual attention model and retain the components related to perceptual decomposition and center-surround
interaction, which are critical properties of the human visual system. The simplified model is verified by performance
comparison with the ground truth. In addition, a salient region enhancement technique is adopted to enhance the
connectivity of the saliency map, and the saliency maps of three color channels are fused to enhance the prediction
accuracy. Experimental results show that the average correlation between our algorithm and the ground truth is close to
that between the original model and the ground truth, while the computational complexity is reduced by 98%.
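The sketch below conveys the flavour of a center-surround saliency computation fused over colour channels; it is not the authors' simplified model, and the Gaussian scales and the averaging fusion rule are assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(rgb, center_sigma=2.0, surround_sigma=8.0):
    # rgb: float array (H, W, 3) in [0, 1]; returns a map normalised to [0, 1].
    channel_maps = []
    for c in range(3):
        center = gaussian_filter(rgb[..., c], center_sigma)
        surround = gaussian_filter(rgb[..., c], surround_sigma)
        channel_maps.append(np.abs(center - surround))   # center-surround contrast
    fused = np.mean(channel_maps, axis=0)                # fuse the three channel maps
    return fused / (fused.max() + 1e-12)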
In this paper, we propose a novel unsupervised color image segmentation algorithm named GSEG. This Gradient-based
SEGmentation method is initialized by a vector gradient calculation in the CIE L*a*b* color space. The obtained
gradient map is utilized for initially clustering low gradient content, as well as automatically generating thresholds for a
computationally efficient dynamic region growth procedure, to segment regions of subsequent higher gradient densities
in the image. The resultant segmentation is combined with an entropy-based texture model in a statistical merging
procedure to obtain the final result. Qualitative and quantitative evaluation of our results on several hundred images,
utilizing a recently proposed evaluation metric called the Normalized Probabilistic Rand index, shows that the GSEG
algorithm is robust to various image scenarios and performs favorably against published segmentation techniques.
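As a rough sketch of the initial GSEG step, a vector gradient map in CIE L*a*b* can be formed from per-channel Sobel gradients combined by a Euclidean norm; the exact vector gradient operator, thresholds, and region-growth logic of GSEG are not reproduced, and the library round-trip is our choice.

import numpy as np
from scipy import ndimage
from skimage import color

def lab_gradient_map(rgb):
    # rgb: float array (H, W, 3) in [0, 1]; returns a gradient-magnitude map.
    lab = color.rgb2lab(rgb)
    grad_sq = np.zeros(lab.shape[:2])
    for c in range(3):
        gx = ndimage.sobel(lab[..., c], axis=1)
        gy = ndimage.sobel(lab[..., c], axis=0)
        grad_sq += gx**2 + gy**2
    return np.sqrt(grad_sq)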
The data model for image representation in terms of the projective Fourier transform (PFT) is well adapted both to image perspective transformations and to the retinotopic mappings of the brain's visual pathways. Here we first model aspects of the human visual process in which the understanding of a scene is built up in a sequence of fixations for visual information acquisition, followed by fast saccadic eye movements that reposition the fovea on the next target. We make about three saccades per second, with eyeball speeds of up to 700 deg/sec. Visual sensitivity is markedly reduced during saccadic eye movements, so that three times per second there are large, abrupt changes in the retinal image with almost no information consciously carried across fixations. The inverse projective Fourier transform is computable by FFT in coordinates given by a complex logarithm that also approximates the retinotopy. It therefore gives the cortical image representation, and a simple translation in log coordinates brings the presaccadic scene into the postsaccadic reference frame, eliminating the need to start processing anew three times per second at each fixation.
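The paper's saccade remapping operates on the projective Fourier coefficients themselves; the display below records only the underlying complex-logarithm identity (standard complex analysis, not a result specific to this work) that makes multiplicative changes of the image plane act as simple translations in log coordinates.

w = \log z = \ln\lvert z\rvert + i\,\theta, \qquad z = \lvert z\rvert\,e^{i\theta},
\qquad\text{so that}\qquad
z \mapsto s\,e^{i\varphi}\,z \;\Longleftrightarrow\; w \mapsto w + (\ln s + i\varphi).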
The classical approach to converting colour to greyscale is to code the luminance signal as a grey value image.
However, the problem with this approach is that the detail at equiluminant edges vanishes, and in the worst case
the greyscale reproduction of an equiluminant image is a single uniform grey value. The solution to this problem,
adopted by all algorithms in the field, is to try to code colour difference (or contrast) in the greyscale image. In this
paper we reconsider the Socolinsky and Wolff algorithm for colour to greyscale conversion. This algorithm, which
is the most mathematically elegant, often scores well in preference experiments but can introduce artefacts which
spoil the appearance of the final image. These artefacts are intrinsic to the method and stem from the underlying
approach which computes a greyscale image by a) calculating approximate luminance-type derivatives for the
colour image and b) re-integrating these to obtain a greyscale image. Unfortunately, the sign of the derivative
vector is sometimes unknown on an equiluminant edge and, in the current theory, is set arbitrarily. However,
choosing the wrong sign can lead to unnatural contrast gradients (not apparent in the colour original). Our
contribution is to show how this sign problem can be ameliorated using a generalised definition of luminance and
a Markov relaxation.
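A sketch of the derivative step underlying this family of methods is given below: the Di Zenzo structure tensor yields the magnitude and orientation of the strongest local colour change while leaving the sign undetermined, which is exactly the ambiguity discussed above. The re-integration step and the paper's generalised-luminance/Markov-relaxation remedy are omitted.

import numpy as np
from scipy import ndimage

def colour_gradient(rgb):
    # Returns per-pixel (magnitude, orientation) of the dominant colour gradient.
    gx = np.stack([ndimage.sobel(rgb[..., c], axis=1) for c in range(3)], axis=-1)
    gy = np.stack([ndimage.sobel(rgb[..., c], axis=0) for c in range(3)], axis=-1)
    exx = np.sum(gx * gx, axis=-1)          # Di Zenzo structure tensor entries
    eyy = np.sum(gy * gy, axis=-1)
    exy = np.sum(gx * gy, axis=-1)
    lam = 0.5 * (exx + eyy + np.sqrt((exx - eyy)**2 + 4.0 * exy**2))  # largest eigenvalue
    theta = 0.5 * np.arctan2(2.0 * exy, exx - eyy)  # orientation only; the sign is ambiguous
    return np.sqrt(lam), theta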
Color plays a significant role in the visual interpretation of a scene. Numerous visual substitution systems work with grayscale images, discarding this information from the original image. Color-based details often fade in the grayscale conversion, and this can mislead the overall comprehension of the scene. We present a decolorization method that takes color contrast into account and preserves color saliency after the transformation. We exploit this model to enhance the perception, by visually disabled persons, of the images interpreted by the substitution system. The results demonstrate that our enhanced system improves overall scene interpretation in comparison with a similar substitution system.
Depicting three dimensional surfaces in such a way that distances between these surfaces can be estimated
quickly and accurately is a challenging task. A promising approach is the use of semi-transparent textures, i.e., textures in which only some parts of the surface are colored. We conducted an experiment to determine the performance of subjects in perceiving distances between an opaque ground surface and specific points on an overlaid surface
which was visualized using isolines and curvature oriented strokes. The results show that response times for
curvature oriented strokes were faster compared to isolines. For a trusted interpretation of these results, a
plausible explanation has to be given. We hypothesize that users visually integrate the available three dimensional
positions and thereby come to an estimate. Further experiments were carried out in order to formulate a model
which describes the involved perceptual process as several attention shifts between three dimensional positions.
The results of the experiments are reported here.
The computational view on image quality of Janssen and Blommaert states that the quality of an image is determined by
the degree to which the image is both useful (discriminability) and natural (identifiability). This theory is tested by
creating two manipulations. Firstly, multiplication of the chroma values of each pixel with a constant in the CIELab
color space, i.e., chroma manipulation, is expected to increase only the usefulness by increasing the distances between
the individual color points, enhancing the contrast. Secondly, introducing stereoscopic depth by varying the screen
disparity, i.e., depth manipulation, is expected to increase both the usefulness and the naturalness. Twenty participants
assessed perceived image quality, perceived naturalness and perceived depth of the manipulated versions of two natural
scenes. The results revealed a small, yet significant shift between image quality and naturalness as a function of the
chroma manipulation. In line with previous research, preference in quality was shifted to higher chroma values in
comparison to preference in naturalness. Introducing depth enhanced the naturalness scores but, in contrast to our expectations, not the image quality scores. It is argued that image quality is not sufficient to evaluate the full experience
of 3D. Image quality appears to be only one of the attributes underlying the naturalness of stereoscopic images.
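The chroma manipulation described above amounts to scaling a* and b* by a constant while leaving lightness untouched; the sketch below shows one way to do this (the skimage round-trip and the clipping are our choices, not necessarily the authors' tooling).

import numpy as np
from skimage import color

def scale_chroma(rgb, factor=1.2):
    # rgb: float array (H, W, 3) in [0, 1]; factor > 1 boosts chroma.
    lab = color.rgb2lab(rgb)
    lab[..., 1:] *= factor                   # a* and b* carry the chroma
    return np.clip(color.lab2rgb(lab), 0.0, 1.0)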
Human perception of material colors depends heavily on the nature of the light sources used for illumination.
One and the same object can cause highly different color impressions when lit by a vapor lamp or by daylight,
respectively. Based on state-of-the-art colorimetric methods we present a modern approach for calculating
color rendering indices (CRI), which were defined by the International Commission on Illumination (CIE) to
characterize color reproduction properties of illuminants. We update the standard CIE method in three main
points: firstly, we use the CIELAB color space, secondly, we apply a Bradford transformation for chromatic
adaptation, and finally, we evaluate color differences using the CIEDE2000 total color difference formula.
Moreover, within a real-world scene, light incident on a measurement surface is composed of a direct and
an indirect part. Neumann and Schanda [1] have shown for the cube model that interreflections can influence the CRI of an illuminant. We analyze how color rendering indices vary in a real-world scene with mixed direct and indirect illumination, and we recommend the use of a spectral rendering engine instead of an RGB-based renderer for reasons of accuracy in CRI calculations.
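As an illustration of the chromatic-adaptation step in the updated pipeline, the linearized Bradford transform is sketched below with its standard matrix; the spectral-to-XYZ computation and the CIEDE2000 difference are not shown, and any code built around this function would be an assumption rather than the authors' implementation.

import numpy as np

M_BRADFORD = np.array([[ 0.8951,  0.2664, -0.1614],
                       [-0.7502,  1.7135,  0.0367],
                       [ 0.0389, -0.0685,  1.0296]])

def bradford_adapt(xyz, white_src, white_dst):
    # Adapt an XYZ colour from a source white point to a destination white point.
    rho_src = M_BRADFORD @ np.asarray(white_src, dtype=float)
    rho_dst = M_BRADFORD @ np.asarray(white_dst, dtype=float)
    gain = np.diag(rho_dst / rho_src)        # von Kries scaling in Bradford cone space
    m = np.linalg.inv(M_BRADFORD) @ gain @ M_BRADFORD
    return m @ np.asarray(xyz, dtype=float)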
Range maps have been actively studied in the last few years in the context of depth perception in natural
scenes. With the availability of co-registered luminance information, we have the ability to examine and
model the statistical relationships between luminance, range and disparity. In this study, we find that a one-sided generalized Gaussian distribution closely fits the prior of the range gradient. This finding sheds new light on the statistical modeling of 2D and 3D image features in natural scenes.
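For reference, a one-sided generalized Gaussian density on x >= 0 has the form below, with scale alpha and shape beta fitted to the range-gradient data; the exact parameterization used in the study may differ.

p(x) = \frac{\beta}{\alpha\,\Gamma(1/\beta)} \exp\!\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right], \qquad x \ge 0.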
Vision III Imaging, Inc. (the Company) has developed Parallax Image Display (PID™) software tools to critically
align and display aerial images with parallax differences. Terrain features are rendered obvious to the viewer when
critically aligned images are presented alternately at 4.3 Hz. The recent inclusion of digital elevation models in
geographic data browsers now allows true three-dimensional parallax to be acquired from virtual globe programs like
Google Earth. The authors have successfully developed PID methods and code that allow three-dimensional
geographical terrain data to be visualized using temporal parallax differences.
Viewing video on mobile devices is becoming increasingly common. The small field-of-view and the vibrations in
common commuting environments present challenges (hardware and software) for the imaging community. By
monitoring the vibration of the display, it could be possible to stabilize an image on the display by shifting a portion of a
large image with the display (a field-of-view expansion approach). However, the image should not be shifted exactly with the display motion, because eye movements have a 'self-adjustment' ability to partially or completely compensate for external motions, which can make a perfect compensation appear to overshoot. In this work, accelerometers were used to
measure the motion of a range of vehicles, and observers' heads and hands as they rode in those vehicles to support the
development of display motion compensation algorithms.
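A minimal sketch of the kind of compensation such measurements support is given below: display acceleration is leakily integrated to a displacement estimate, and the image window is shifted against it by only a fractional gain, reflecting the partial self-adjustment of eye movements noted above. The gain, leak factor, and integration scheme are assumptions, not results from this work.

import numpy as np

def compensation_offsets(accel, dt, gain=0.6, leak=0.98):
    # accel: (N, 2) display accelerations; returns (N, 2) image offsets.
    vel = np.zeros(2)
    pos = np.zeros(2)
    offsets = np.zeros_like(accel, dtype=float)
    for i, a in enumerate(accel):
        vel = leak * (vel + a * dt)      # leaky integration limits drift
        pos = leak * (pos + vel * dt)
        offsets[i] = -gain * pos         # shift the image against the display motion
    return offsets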
This paper gives an overview of the visual representation of reality with three imaging technologies: painting,
photography and electronic imaging. The contribution of the important image aspects, called dimensions
hereafter, such as color, fine detail and total image size, to the degree of reality and aesthetic value of the
rendered image are described for each of these technologies. Whereas quite a few of these dimensions - or
approximations, or even only suggestions thereof - were already present in prehistoric paintings, apparent
motion and true stereoscopic vision only recently were added - unfortunately also introducing accessibility and
image safety issues. Efforts are made to reduce the incidence of undesirable biomedical effects such as
photosensitive seizures (PSS), visually induced motion sickness (VIMS), and visual fatigue from stereoscopic
images (VFSI) by international standardization of the image parameters to be avoided by image providers and
display manufacturers. The history of this type of standardization, from an International Workshop Agreement to
a strategy for accomplishing effective international standardization by ISO, is treated at some length. One of the
difficulties to be mastered in this process is the reconciliation of the, sometimes opposing, interests of vulnerable
persons, thrill-seeking viewers, creative video designers and the game industry.
The problems of estimating the position of an illuminant and the direction of illumination in realist paintings
have been addressed using algorithms from computer vision. These algorithms fall into two general categories: In
model-independent methods (cast-shadow analysis, occluding-contour analysis, ...), one does not need to know
or assume the three-dimensional shapes of the objects in the scene. In model-dependent methods (shape-from-shading, full computer graphics synthesis, ...), one does need to know or assume the three-dimensional shapes.
We explore the intermediate- or weak-model condition, where the three-dimensional object rendered is so simple
one can very confidently assume its three-dimensional shape and, further, that this shape admits an analytic
derivation of the appearance model. Specifically, we can assume that floors and walls are flat and that they
are horizontal and vertical, respectively. We derived the maximum-likelihood estimator for the two-dimensional
spatial location of a point source in an image as a function of the pattern of brightness (or grayscale value) over
such a planar surface. We applied our methods to two paintings of the Baroque, paintings for which the question
of the illuminant position is of interest to art historians: Georges de la Tour's Christ in the carpenter's studio
(1645) and Caravaggio's The calling of St. Matthew (1599-1600). Our analyses show that a single point source
(somewhat near to the depicted candle) is a slightly better explanation of the pattern of brightness on the floor
in Christ than are two point sources, one in place of each of the figures. The luminance pattern on the rear wall
in The calling implies the source is local, a few meters outside the picture frame, not the infinitely distant sun.
Both results are consistent with previous rebuttals of the recent art historical claim that these paintings were
executed by means of tracing optically projected images. Our method is the first application of such weak-model
methods for inferring the location of illuminants in realist paintings and should find use in other questions in
the history of art.
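The weak-model setting described above rests on the standard Lambertian point-source model for a flat surface; under additive Gaussian noise, the maximum-likelihood estimate of the source position reduces to a least-squares fit, as sketched below (the paper's own derivation and parameterization may differ in detail).

I(\mathbf{x}) = \frac{A\,\mathbf{n}\cdot(\mathbf{s}-\mathbf{x})}{\lVert\mathbf{s}-\mathbf{x}\rVert^{3}},
\qquad
(\hat{\mathbf{s}},\hat{A}) = \arg\min_{\mathbf{s},\,A}\;\sum_{k}\bigl(I_{k}-I(\mathbf{x}_{k};\mathbf{s},A)\bigr)^{2},

where n is the surface normal, s the source position, and I_k the observed brightness at surface point x_k.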
An emerging body of research suggests that artists consistently seek modes of representation that are efficiently
processed by the human visual system, and that these shared properties could leave statistical signatures. In earlier work,
we showed evidence that perceived similarity of representational art could be predicted using intensity statistics to which
the early visual system is attuned, though semantic content was also found to be an important factor. Here we report two
studies that examine the visual perception of similarity. We test a collection of non-representational art, which we argue
possesses useful statistical and semantic properties, in terms of the relationship between image statistics and basic
perceptual responses. We find two simple statistics, both expressed as single values, that predict nearly a third of the
overall variance in similarity judgments of abstract art. An efficient visual system could make a quick and reasonable
guess as to the relationship of a given image to others (i.e., its context) by extracting these basic statistics early in the
visual stream, and this may hold for natural scenes as well as art. But a major component of many types of art is
representational content. In a second study, we present findings related to efficient representation of natural scene
luminances in landscapes by a well-known painter. We show empirically that elements of contemporary approaches to high-dynamic-range tone mapping, which are themselves deeply rooted in an understanding of early visual system coding, are present in the way Vincent van Gogh transforms scene luminances into painting luminances. We argue that
global tone mapping functions are a useful descriptor of an artist's perceptual goals with respect to global illumination
and we present evidence that mapping the scene to a painting with different implied lighting properties produces a less
efficient mapping. Together, these studies suggest that statistical regularities in art can shed light on visual processing.
The title painting, in the Royal Picture Gallery Mauritshuis in The Hague, is remarkable in that every figure
and every part of every building is clearly discernible in the minutest detail: decorations, weathercock, bells
in the church tower, and so on. Thousands of individual bricks are visible in the buildings at the left and the
question has been posed by art scholars as to whether these bricks were laboriously painted individually or
instead more efficiently pressed to the painting by some form of template, for instance by pressing a wet print
against the painting. Close inspection of the painting in raking light reveals that the mortar work is rendered in thick, protruding paint, but such visual analysis, while highly suggestive, does not prove that van der Heyden employed counterproofing; as such, evidence must be sought in order to corroborate this hypothesis. If some form of counterproofing was employed by the artist, there might be at least some repeated patterns of the bricks, as the master print was shifted from place to place in the painting. Visual search for candidate repeated passages of bricks by art scholars has proven tedious and unreliable. For this reason, we instead used a method based on computer forensics for detecting nearly identical repeated patterns within an image: discrete cross-correlation.
Specifically, we preprocessed a high-resolution photograph of the painting and used thresholding and
image processing to enhance the brickwork. Then we convolved small portions of this processed image of the
brickwork with all areas of brickwork throughout the painting. Our results reveal only small regions of moderate
cross-correlation. Most importantly, the limited spatial extent of matching regions shows that the peaks found
are not significantly higher than would occur by chance in a hand-executed work or in one created using a single
counterproof. To our knowledge, ours is the first use of cross-correlation to search for repeated patterns in a
realist painting to answer a question in the history of art.
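The core of such a search can be sketched with normalized cross-correlation of a small brickwork patch against the pre-processed photograph; skimage's match_template performs the correlation, while the 0.8 peak threshold and the pre-processing itself are placeholders for illustration, not the values used in the study.

import numpy as np
from skimage.feature import match_template

def find_repeats(brickwork, patch, peak_thresh=0.8):
    # Returns (row, col) positions where the patch correlates strongly, plus the map.
    ncc = match_template(brickwork, patch, pad_input=True)  # normalized cross-correlation
    peaks = np.argwhere(ncc > peak_thresh)
    return peaks, ncc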
Chiasmus is a responsive and dynamically reflective, two-sided volumetric surface that embodies phenomenological
issues such as the formation of images, observer and machine perception and the dynamics of the screen as a space of
image reception. It consists of a square grid of 64 individually motorized cube elements engineered to move linearly.
Each cube is controlled by custom software that analyzes video imagery for luminance values and sends these values to
the motor control mechanisms to coordinate the individual movements. The resolution of the sculptural screen from the
individual movements allows its volume to dynamically alter, providing novel and unique perspectives of its mobile
form to an observer.
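A minimal sketch of the control mapping described above: each video frame is reduced to an 8 x 8 grid of mean luminance values, one per cube, which are scaled to extension targets; the block-averaging and the scaling are assumptions, not the installation's actual control software.

import numpy as np

def cube_targets(frame, grid=8, max_extension=1.0):
    # frame: 2D float luminance array in [0, 1]; returns (grid, grid) extension targets.
    h, w = frame.shape
    h2, w2 = h - h % grid, w - w % grid                 # crop so the frame tiles evenly
    blocks = frame[:h2, :w2].reshape(grid, h2 // grid, grid, w2 // grid)
    return blocks.mean(axis=(1, 3)) * max_extension     # one target per motorized cube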
In an earlier paper we showed that the perceived quality of channel zapping is related to the perceived quality of download time in web browsing, as suggested by ITU-T Rec. G.1030. We showed this by performing subjective tests, which resulted in an excellent fit with a correlation of 0.99. That was what we call a lean-forward experiment, and it gave the rule-of-thumb result that the zapping time must be less than 0.43 sec to be rated good (> 3.5 on the MOS scale). To validate the model we have carried out new subjective experiments. These included lean-backward zapping, i.e., sitting on a sofa with a remote control. Subjects are more forgiving in this case, and the requirement can be relaxed to 0.67 sec. We also conducted subjective experiments in which the zapping times varied, and found that the MOS rating decreases when the zapping delay times vary. In these experiments we assumed uniformly distributed delays, in which the variance cannot be larger than the mean delay. We found that, in order to obtain a MOS rating of at least 3.5, the maximum allowed variance, and thus also the maximum allowed mean zapping delay, is 0.46 sec.
A Visual Model (VM) is used to aid in the design of an Ultra-high Definition (UHD) upscaling algorithm that renders
High Definition legacy content on a UHD display. The costly development of such algorithms is due, in part, to the time
spent subjectively evaluating the adjustment of algorithm structural variations and parameters. The VM provides an
image map that gives feedback to the design engineer about visual differences between algorithm variations, or about
whether a costly algorithm improvement will be visible at expected viewing distances. Such visual feedback reduces the
need for subjective evaluation.
This paper presents the results of experimentally verifying the VM against subjective tests of visibility improvement
versus viewing distance for three upscaling algorithms. Observers evaluated image differences for upscaled versions of
high-resolution stills and HD (Blu-ray) images, viewing a reference and test image, and controlled a linear blending
weight to determine the image discrimination threshold. The required thresholds vs. viewing distance varied as
expected, with larger amounts of the test image required at further distances. We verify the VM by comparison of
predicted discrimination thresholds versus the subjective data. After verification, VM visible difference maps are
presented to illustrate the practical use of the VM during design.
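The adjustment procedure uses a simple linear blend of the reference and test images; a minimal sketch is given below (the staircase or adjustment logic around it is omitted, and the weight parameterization is an assumption).

def blend(reference, test, w):
    # w = 0 shows the reference only; w = 1 shows the full test image.
    return (1.0 - w) * reference + w * test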
In this paper, we lay the groundwork for a model of extended dendritic processing based on temporal signalling, using a model in hyperbolic space. The intended goal is to create a processing environment in which metaphorical and analogical processing is natural to the components. A secondary goal is to create a processing model that is naturally complex, naturally based in fractal and complex flows, and that creates communication based on a compatibility rather than a duplication model. This is still a work in progress, but some gains are made in creating the background model.
Comprehension of a sentence was measured under a wide range of delay conditions between auditory and visual stimuli, in an environment with low auditory clarity produced by pink noise at levels of -10 dB and -15 dB. Results showed that the video image was helpful for comprehension of the noise-obscured voice when the delay between the auditory and visual stimuli was 4 frames (132 msec) or less; that the image was not helpful when the delay was 8 frames (264 msec) or more; and that in some cases at the largest delay (32 frames) the video image interfered with comprehension.
This study aimed to determine the relationship between the image-sticking phenomenon and human visual perception. The contrast sensitivity of various checkerboard patterns was measured over a wide range of spatial frequencies. Four subjective tests were designed to determine the contrast threshold as a function of spatial frequency, edge effect, and noise interference for the checkerboard stimuli. The experimental results fall into four parts. (1) Spatial frequency: contrast sensitivity remained steady at about 45 dB (dB = 20 log10(1/contrast)) at low spatial frequencies, dropped drastically as the spatial frequency increased from 0.5 to 1.3 (log cycles/deg), and was maximal at a spatial frequency of 0.5 (log cycles/deg). (2) Edge effect: the original checkerboard pattern was convolved with mean filters of different sizes to produce blurred edges of varying scale and to estimate the influence of the edge effect on human perception; the results showed that sharper edges of the checkerboard stimuli affect the contrast threshold. (3) Gaussian noise: Gaussian-distributed noise was added to the checkerboard stimuli to evaluate its effect; for low-contrast checkerboard stimuli, contrast sensitivity dropped as the noise standard deviation σ increased. (4) Simultaneous edge effect and Gaussian noise: contrast sensitivity was lower than in the other conditions, and the shape of the sensitivity curve was similar to that for Gaussian noise alone. From these experimental results, the contributions of spatial frequency, edge effect, and noise interference to human visual perception could be determined.
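A sketch of the stimulus manipulations described above is given below: a low-contrast checkerboard, a mean-filter blur of its edges, additive Gaussian noise, and the dB sensitivity measure quoted in the text. Check size, filter size, and noise level are placeholders, not the values used in the experiments.

import numpy as np
from scipy import ndimage

def checkerboard(shape=(256, 256), check=16, contrast=0.1):
    y, x = np.indices(shape)
    squares = ((x // check + y // check) % 2) * 2.0 - 1.0   # +/-1 squares
    return 0.5 + 0.5 * contrast * squares                   # mean 0.5 at the given contrast

def degrade(img, mean_size=5, noise_sigma=0.02):
    blurred = ndimage.uniform_filter(img, size=mean_size)   # mean filter softens the edges
    return blurred + np.random.normal(0.0, noise_sigma, img.shape)

def sensitivity_db(contrast):
    return 20.0 * np.log10(1.0 / contrast)                  # dB = 20 log10(1/contrast)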