This PDF file contains the front matter associated with SPIE Proceedings Volume 6506, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and the Conference Committee listing.
Data mining techniques have been applied to video databases to identify various patterns and groups. In video surveillance systems, clustering analysis is used to find the patterns and groups of moving objects. Most existing clustering methods focus on finding an optimal overall partitioning. However, these approaches cannot provide meaningful descriptions of the clusters, and they are not well suited to moving-object databases, since video data have spatial and temporal characteristics as well as high-dimensional attributes. In this paper, we propose a model-based conceptual clustering (MCC) of moving objects in video surveillance, based on formal concept analysis. The proposed MCC consists of three steps: 'model formation', 'model-based concept analysis', and 'concept graph generation'. The generated concept graph provides conceptual descriptions of moving objects. To assess the proposed approach, we conduct comprehensive experiments on artificial and real video surveillance data sets. The experimental results indicate that our MCC outperforms two other methods, namely generality-based and error-based conceptual clustering algorithms, in terms of concept quality.
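The 'model-based concept analysis' step builds on formal concept analysis. As a hedged illustration, the sketch below enumerates the formal concepts of a tiny binary context of moving objects; the object names and trajectory attributes are invented for the example, and the brute-force enumeration is only a textbook illustration, not the paper's algorithm:

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate all formal concepts (maximal object/attribute pairs) of a
    binary context, given as a dict: object -> set of attributes.
    Brute force over object subsets -- fine only for toy contexts."""
    objects = set(context)
    attributes = set().union(*context.values())

    def common_attrs(objs):
        if not objs:
            return set(attributes)
        return set.intersection(*(context[o] for o in objs))

    def common_objs(attrs):
        return {o for o in objects if attrs <= context[o]}

    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(sorted(objects), r):
            intent = common_attrs(set(objs))   # attributes shared by objs
            extent = common_objs(intent)       # all objects having those attributes
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

# Hypothetical context: moving objects described by trajectory attributes.
ctx = {"obj1": {"fast", "straight"},
       "obj2": {"fast", "curved"},
       "obj3": {"slow", "straight"}}
concepts = formal_concepts(ctx)
```

Each resulting (extent, intent) pair is a candidate node of a concept graph: the extent lists the moving objects, the intent the attributes they all share.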
The purpose of this paper is to propose a color image watermarking scheme based on an image-dependent color gamut sampling of the L*a*b* color space. The main motivation of this work is to control the reproduction of color images on different output devices so that they produce the same color impression, coupling intrinsic information about the image gamut with output device calibration. This paper focuses first on the search for an optimal LUT (Look-Up Table) that both circumscribes the color gamut of the studied image and samples its color distribution. This LUT is then embedded in the image as a secret message. The principle of the watermarking scheme is to modify the pixel values of the host image without causing any change in either the image appearance or the shape of the image gamut.
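The LUT search could, for instance, be approximated by farthest-point sampling of the observed colors, so that the selected entries spread over the image gamut. The sketch below is a generic stand-in under that assumption; it works on plain color tuples and does not attempt the paper's L*a*b* conversion, gamut circumscription, or embedding step:

```python
def sample_gamut_lut(colors, n_entries=8):
    """Greedy farthest-point sampling of an image's colour distribution:
    each new LUT entry is the colour farthest from the entries chosen so
    far, spreading the LUT over the observed gamut."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    lut = [colors[0]]
    while len(lut) < min(n_entries, len(colors)):
        # Pick the colour whose nearest existing LUT entry is farthest away.
        nxt = max(colors, key=lambda c: min(dist2(c, e) for e in lut))
        lut.append(nxt)
    return lut

colors = [(0, 0, 0), (255, 255, 255), (128, 0, 0), (0, 128, 0)]
lut = sample_gamut_lut(colors, n_entries=2)
```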
In the medical world, the accuracy of diagnosis is mainly affected either by a lack of sufficient understanding of some diseases or by the inter- and/or intra-observer variability of diagnoses. The former requires understanding the progress of diseases at much earlier stages, extracting important information from ever-growing amounts of data, and finally finding correlations with certain features and complications that illuminate disease progression. The latter (inter- and intra-observer variability) is caused by differences in the experience levels of different medical experts (inter-observer variability) or by the mental and physical tiredness of one expert (intra-observer variability). We believe that the use of large databases can help improve the current state of disease understanding and decision making. By comparing large numbers of patients, otherwise hidden relations can be revealed, leading to better understanding; patients with similar complications can be found, and their diagnoses and treatments can be compared so that the medical expert can make a better diagnosis. To this end, this paper introduces a search and retrieval system for brain MR databases and shows that the shape of brain iron accumulation provides information beyond the shape-insensitive features, such as total brain iron load, that are commonly used in the clinic. We propose to use Kendall's correlation value to automatically compare various returns to a query. We also describe a fully automated, fast brain MR image analysis system to detect degenerative iron accumulation in the brain, as occurs in Alzheimer's and Parkinson's disease. The system is composed of several novel image processing algorithms and has so far been extensively tested at Leiden University Medical Center on more than 600 patients.
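Kendall's correlation between two ranked result lists can be computed directly from concordant and discordant item pairs. The following is a generic implementation of Kendall's tau (without tie handling), with invented scan identifiers, not code from the described system:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two rankings of the same items.
    rank_a and rank_b map item -> rank position (0 = best); no ties."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Same sign: the pair is ordered the same way in both rankings.
        prod = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    n = len(items)
    return (concordant - discordant) / (n * (n - 1) / 2)

run1 = {"scan1": 0, "scan2": 1, "scan3": 2, "scan4": 3}
run2 = {"scan1": 0, "scan2": 1, "scan3": 3, "scan4": 2}  # one adjacent swap
tau = kendall_tau(run1, run2)
```

A tau of 1.0 means two query returns rank the patients identically; values near 0 mean the rankings are unrelated.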
We tested our previously reported sports highlights playback for personal video recorders with a carefully chosen set of
sports aficionados. Each subject spent about an hour with the content, going through the same basic steps of
introduction, trying out the system, and a follow-up questionnaire. The main conclusion was that the users unanimously
liked the functionality very much even when it made mistakes. Furthermore, the users felt that if the user interface were
made much more responsive so as to quickly compensate for false alarms and misses, the functionality would be vastly
enhanced. The ability to choose summaries of any desired length turned out to be the main attraction.
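Choosing a summary of any desired length can be sketched as greedy selection of the highest-scoring highlight segments under a duration budget, played back in chronological order. The segment structure and scores below are invented for illustration; the recorder's actual highlight-ranking algorithm is not described here:

```python
def build_summary(segments, target_seconds):
    """Greedy highlight selection: take segments in decreasing importance
    score until the requested summary length is filled, then return them
    in chronological order."""
    chosen, used = [], 0.0
    for seg in sorted(segments, key=lambda s: s["score"], reverse=True):
        if used + seg["duration"] <= target_seconds:
            chosen.append(seg)
            used += seg["duration"]
    return sorted(chosen, key=lambda s: s["start"])

highlights = [
    {"start": 0,   "duration": 30, "score": 0.9},
    {"start": 40,  "duration": 40, "score": 0.8},
    {"start": 100, "duration": 25, "score": 0.7},
]
summary = build_summary(highlights, target_seconds=60)
```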
The Informedia group at Carnegie Mellon University has since 1994 been developing and evaluating surrogates,
summary interfaces, and visualizations for accessing digital video collections containing thousands of documents,
millions of shots, and terabytes of data. This paper reports on TRECVID 2005 and 2006 interactive search tasks
conducted with the Informedia system by users having no knowledge of Informedia or other video retrieval interfaces,
but being experts in analyst activities. Think-aloud protocols, questionnaires, and interviews were also conducted with
this user group to assess the contributions of various video summarization and browsing techniques with respect to
broadcast news test corpora. Lessons learned from these user interactions are reported, with recommendations on both
interface improvements for video retrieval systems and enhancing the ecological validity of video retrieval interface
evaluations.
Automatic video summarization has become an active research topic in content-based video processing. However, not much emphasis has been placed on developing rigorous summary evaluation methods or on designing summarization systems based on a clear understanding of user needs, obtained through user-centered design. In this paper we address these two topics and propose an automatic video summary evaluation algorithm adapted from the text summarization domain.
This paper presents a framework for improving the image index obtained by automated image annotation. Within this framework, the technique of keyword combination is used for fast image re-indexing based on initial automated annotations. It aims to tackle the challenges of limited vocabulary size and low annotation accuracies resulting from differences between training and test collections. It is useful for situations when these two problems are not anticipated at the time of annotation. We show that based on example images from the automatically annotated collection, it is often possible to find multiple keyword queries that can retrieve new image concepts which are not present in the training vocabulary, and improve retrieval results of those that are already present. We demonstrate that this can be done at a very small computational cost and at an acceptable performance tradeoff, compared to traditional annotation models. We present a simple, robust, and computationally efficient approach for finding an appropriate set of keywords for a given target concept. We report results on TRECVID 2005, Getty Image Archive, and Web image datasets, the last two of which were specifically constructed to support realistic retrieval scenarios.
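Keyword combination over an automatically annotated collection can be illustrated with simple boolean queries: existing vocabulary terms are combined to reach a concept the training vocabulary never contained. The annotations and keywords below are invented, and the paper's actual query-construction method is more involved than this sketch:

```python
def retrieve(annotations, all_of=(), any_of=()):
    """Return images whose automatic annotations contain every keyword in
    `all_of` and, if `any_of` is given, at least one keyword from it."""
    results = []
    for image, keywords in annotations.items():
        kw = set(keywords)
        if set(all_of) <= kw and (not any_of or kw & set(any_of)):
            results.append(image)
    return sorted(results)

# Toy automatically annotated collection.
ann = {"img1": ["sky", "building", "night"],
       "img2": ["building", "day"],
       "img3": ["night", "street", "car"]}

# "night city" is not in the vocabulary, but a keyword combination reaches it.
night_city = retrieve(ann, all_of=["night"], any_of=["building", "street"])
```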
Speaker change detection (SCD) is a preliminary step for many audio applications such as speaker segmentation and recognition. Its robustness is therefore crucial for good performance in the later steps; misses (false negatives) in particular degrade the results. For some applications, domain-specific characteristics can be used to improve the reliability of the SCD. In broadcast news and discussions, the co-occurrence of shot boundaries and change points provides a robust clue for speaker changes.
In this paper, two multimodal approaches are presented that utilize the results of a shot boundary detection (SBD) step to improve the robustness of the SCD. Both approaches clearly outperform the audio-only approach and are applicable exclusively to TV broadcast news and plenary discussions.
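The co-occurrence cue can be sketched as keeping only those speaker-change candidates that fall within a small temporal window of a detected shot boundary. This is a minimal illustration of the fusion idea, with an assumed window size, not either of the paper's two algorithms:

```python
def confirm_speaker_changes(scd_candidates, shot_boundaries, window=0.5):
    """Keep speaker-change candidates (times in seconds) that co-occur
    with a shot boundary within +/- `window` seconds."""
    return [t for t in scd_candidates
            if any(abs(t - s) <= window for s in shot_boundaries)]

# 30.4 s has no nearby shot cut, so it is rejected as a likely false alarm.
confirmed = confirm_speaker_changes([12.1, 30.4, 55.0], [12.3, 54.8, 90.0])
```

In real broadcast material one would rather use the shot cue to re-weight the audio detector's confidence than to discard candidates outright, since dialog within a single shot is common.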
Video segmentation for content-based retrieval has traditionally been done using shot cut detection algorithms that search for abrupt changes in scene content. Surveillance videos, however, usually come from still cameras and do not contain any shot cuts. Hence, a novel high-level semantic change detection algorithm is proposed in this paper that uses object trajectory features to segment surveillance footage. These trajectory features are extracted automatically, using background subtraction and a multiple-blob tracking algorithm. The trajectory features are first used to remove false object detections produced by background subtraction. Semantics extracted from the remaining object trajectories are then used to segment the video. The results of the algorithm when applied to surveillance data are compared with hand-labeled segmentations to obtain precision-recall curves and their harmonic mean. Comparisons with traditional background subtraction and video segmentation algorithms show a drastic improvement in performance.
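Precision, recall, and their harmonic mean (the F1 score) over detected segment boundaries can be computed as follows; the one-second matching tolerance is an assumption for the example, not a value from the paper:

```python
def segmentation_scores(detected, truth, tolerance=1.0):
    """Match detected boundaries (seconds) to ground-truth boundaries
    within a tolerance, then compute precision, recall, and their
    harmonic mean (F1). Each truth boundary matches at most once."""
    truth_left = list(truth)
    hits = 0
    for d in detected:
        for t in truth_left:
            if abs(d - t) <= tolerance:
                hits += 1
                truth_left.remove(t)
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = segmentation_scores([10.2, 35.0, 61.5], [10.0, 35.5, 80.0])
```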
A photograph captured by a digital camera usually includes camera metadata in which sensor readings, camera settings
and other capture pipeline information are recorded. The camera metadata, typically stored in an EXIF header,
contains a rich set of information reflecting the conditions under which the photograph was captured. This information is potentially useful for improving digital photography, but its multi-dimensionality and heterogeneous data structure make it difficult to exploit. Knowledge discovery, on the other hand, is usually
associated with data mining to extract potentially useful information from complex data sets. In this paper we use a
knowledge discovery framework based on data mining to automatically associate combinations of high-dimensional,
heterogeneous metadata with scene types. In this way, we can perform very simple and efficient scene classification for
certain types of photographs. We have also provided an interactive user interface in which a user can type in a query on
metadata and the system will retrieve from our image database the images that satisfy the query and display them. We
have used this approach to associate EXIF metadata with specific scene types like back-lit scenes, night scenes and snow
scenes. To improve the classification results, we have combined an initial classification based only on the metadata with
a simple, histogram based analysis for quick verification of the discovered knowledge. The classification results, in turn,
can be used to better manage, assess, or enhance the photographs.
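A metadata-based scene rule can be as simple as thresholding a few EXIF fields. The thresholds and class names below are illustrative assumptions, not the combinations mined by the paper's framework:

```python
def classify_scene(exif):
    """Toy metadata rules in the spirit of the paper: long exposure plus
    high ISO suggests a night scene; a fired flash with a short subject
    distance suggests an indoor flash shot. Field names follow the EXIF
    tags they mimic; thresholds are invented."""
    exposure = exif.get("ExposureTime", 0.0)       # seconds
    iso = exif.get("ISOSpeedRatings", 100)
    flash = exif.get("Flash", False)
    if exposure > 1 / 30 and iso >= 400:
        return "night"
    if flash and exif.get("SubjectDistance", 10.0) < 3.0:
        return "indoor-flash"
    return "unknown"
```

In the paper's framework such rules are discovered by data mining and then verified with a histogram-based image analysis rather than hand-written.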
The SenseCam is a prototype device from Microsoft that facilitates automatic capture of images of a person's
life by integrating a colour camera, storage media and multiple sensors into a small wearable device. However,
efficient search methods are required to reduce the user's burden of sifting through the thousands of images that
are captured per day. In this paper, we describe experiments using colour spatiogram and block-based cross-correlation
image features in conjunction with accelerometer sensor readings to cluster a day's worth of data into
meaningful events, allowing the user to quickly browse a day's captured images. Two different low-complexity
algorithms are detailed and evaluated for SenseCam image clustering.
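A low-complexity clustering of this kind can be sketched as starting a new event whenever consecutive images differ by more than a threshold. The sketch uses a plain L1 distance on generic feature vectors instead of spatiograms or block-based cross-correlation, and the threshold is an assumption:

```python
def cluster_images(feature_vectors, threshold=0.35):
    """Group a day's images into events: a new cluster starts whenever
    the L1 distance between consecutive feature vectors exceeds the
    threshold. Returns clusters as lists of image indices."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    clusters = [[0]]
    for i in range(1, len(feature_vectors)):
        if l1(feature_vectors[i - 1], feature_vectors[i]) > threshold:
            clusters.append([i])        # scene change: start a new event
        else:
            clusters[-1].append(i)
    return clusters

events = cluster_images([[0.10, 0.90], [0.12, 0.88],
                         [0.80, 0.20], [0.82, 0.18]])
```

In the actual system, accelerometer readings provide a second cue for boundaries, which this single-feature sketch omits.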
In this paper, we propose an approach for automatically recognizing persons in images based on their general outer appearance. To this end, we build a statistical model for each person. Large amounts of training data are collected and labeled automatically using a visual sensor array that captures image sequences containing the person to be learned. Foreground-background segmentation is performed to separate the person from the background, thus enabling the person's appearance to be learned independently of the background. Color and gradient features are extracted to represent the segmented person. Person recognition on incoming photos is carried out using k-nearest-neighbor classification, with the normalized histogram intersection match value as the distance measure. Reported experimental results show that the presented approach performs well.
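The recognition step described above (k-nearest-neighbor voting with normalized histogram intersection) can be sketched as follows; the person names and toy three-bin histograms are invented for the example:

```python
def hist_intersection(h1, h2):
    """Normalized histogram intersection: sums the bin-wise minima;
    equals 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def classify(query_hist, training_set, k=3):
    """k-nearest-neighbor majority vote, using (1 - intersection) as the
    distance. training_set is a list of (label, histogram) pairs."""
    ranked = sorted(training_set,
                    key=lambda item: 1.0 - hist_intersection(query_hist, item[1]))
    votes = {}
    for label, _ in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

training = [("alice", [0.7, 0.2, 0.1]), ("alice", [0.6, 0.3, 0.1]),
            ("bob",   [0.1, 0.2, 0.7]), ("bob",   [0.2, 0.2, 0.6])]
who = classify([0.65, 0.25, 0.10], training, k=3)
```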
In this paper, we propose a storage format that binds digital broadcasts with related data such as TV-Anytime metadata, additional multimedia resources, and personal viewing history. The goal of the proposed format is to make personalized content consumption possible after recording broadcast content to storage devices, e.g., HD-DVD and Blu-ray Disc. To achieve this, we adopt the MPEG-4 file format as a container and apply the Binary Format for Scenes (BIFS) for representing and rendering personal viewing history. In addition, TV-Anytime metadata is used to describe broadcasts and to refer to the additional multimedia resources, e.g., images, audio clips, and short video clips. To demonstrate the usefulness of the proposed format, we introduce an application scenario and test the format against it.
The extensive amount of video data stored on available media (hard and optical disks) necessitates video content analysis, which is a cornerstone for user-friendly applications such as smart video retrieval and intelligent video summarization. This paper aims at finding a unified and efficient framework for court-net sports video analysis. We concentrate on techniques that are generally applicable to more than one sports type in order to arrive at a unified approach. To this end, our framework employs the concept of multi-level analysis, where a novel 3-D camera modeling is utilized to bridge the gap between object-level and scene-level analysis. The new 3-D camera modeling is based on collecting feature points from two planes that are perpendicular to each other, so that a true 3-D reference is obtained. Another important contribution is a new tracking algorithm for the objects (i.e., players), which can track up to four players simultaneously. The complete system contributes various forms of information to summarization, of which the most important are the moving trajectory and real speed of each player, as well as 3-D height information of objects and the semantic event segments in a game. We illustrate the performance of the proposed system by evaluating it on a variety of court-net sports videos containing badminton, tennis, and volleyball, and we show that feature detection performance is above 92% and event detection about 90%.
Image collections are most often domain specific. We have developed a system for image retrieval of multimodal microscopy images, that is, the same object of study visualized with a range of microscope techniques and at a range of resolutions. In microscopy, image content depends on the preparation method of the object under study as well as on the microscope technique. Both are captured in the submission phase as metadata, while at the same time (domain-specific) ontologies are employed as controlled vocabularies to annotate the image. From that point onward, image data are interrelated through the relationships derived from annotated concepts in the ontology. By using concepts and relationships of an ontology, complex queries can be built with true semantic content. Image metadata can be used as powerful criteria to query image data that are directly or indirectly related to the original data. The results of image retrieval can be represented as a structural graph exploiting relationships from the ontology, rather than as a flat table. Applying this to retrieve images of the same subject at different levels of resolution opens a new field for the analysis of image content.
Analysis of gene expression patterns within an organism plays a critical role in associating genes
with biological processes in both health and disease. During embryonic development the analysis and
comparison of different gene expression patterns allows biologists to identify candidate genes that may
regulate the formation of normal tissues and organs and to search for genes associated with congenital
diseases. No two individual embryos, or organs, are exactly the same shape or size so comparing spatial
gene expression in one embryo to that in another is difficult. We will present our efforts in comparing
gene expression data collected using both volumetric and projection approaches. Volumetric data is
highly accurate but difficult to process and compare. Projection methods use UV mapping to align texture
maps to standardized spatial frameworks. This approach is less accurate but is very rapid and requires
very little processing. We have built a database of over 180 3D models depicting gene expression
patterns mapped onto the surface of spline based embryo models. Gene expression data in different
models can easily be compared to determine common regions of activity. Visualization software, in both
Java and OpenGL, optimized for viewing 3D gene expression data will also be demonstrated.
Morphometrics from images, i.e., image analysis, may reveal differences between classes of objects present in the images. We have performed an image-feature-based classification for the pathogenic yeast Cryptococcus neoformans. Building and analyzing image collections of the yeast under different environmental or genetic conditions may help to diagnose a new "unseen" situation. Diagnosis here means that retrieval of the relevant information from the image collection is at hand each time a new "sample" is presented. The basidiomycetous yeast Cryptococcus neoformans can cause infections such as meningitis or pneumonia. The presence of an extracellular capsule is known to be related to virulence. This paper reports on an approach towards developing classifiers for detecting potentially more or less virulent cells in a sample, i.e., an image, using a range of features derived from the shape or density distribution. The classifier can henceforth be used to automate screening and to annotate existing image collections. In addition, we present our methods for creating samples, collecting images, preprocessing images, identifying "yeast cells", and extracting features from the images. We compare various expertise-based and fully automated methods of feature selection, benchmark a range of classification algorithms, and illustrate successful application to this particular domain.
Although considerable work has been done on the management of "structured" video such as movies, sports, and television programs, which have known scene structures, "unstructured" video analysis is still a challenging problem due to its unrestricted nature. The purpose of this paper is to address issues in the analysis of unstructured video, in particular video shot by a typical unprofessional user (i.e., home video). We describe how camera motion information can be used for unstructured video analysis. A new concept, "camera viewing direction," is introduced as the building block of home video analysis. Motion displacement vectors are employed to temporally segment the video based on this concept. We then relate camera behavior to the subjective importance of the information in each segment and describe how different patterns in the camera motion can indicate levels of interest in a particular object or scene. By extracting these patterns, the most representative frames (keyframes) of the scenes are determined and aggregated to summarize the video sequence.
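Segmenting on camera displacement can be illustrated by cutting wherever the global motion vector changes sharply between frames. The threshold and the simple L1 difference test are assumptions for the sketch, not the paper's method:

```python
def segment_by_motion(displacements, threshold=5.0):
    """Split a video wherever the dominant frame-to-frame camera
    displacement changes sharply, approximating a change of camera
    viewing direction.

    displacements: list of (dx, dy) global motion vectors per frame.
    Returns the frame indices at which a new segment starts."""
    boundaries = [0]
    for i in range(1, len(displacements)):
        px, py = displacements[i - 1]
        dx, dy = displacements[i]
        if abs(dx - px) + abs(dy - py) > threshold:
            boundaries.append(i)
    return boundaries

# Steady pan right, then an abrupt pan left at frame 3.
cuts = segment_by_motion([(1, 0), (1, 0), (1, 1), (-4, 0), (-4, 1)])
```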
The proliferation of captured personal and broadcast content in personal consumer archives necessitates comfortable access to stored audiovisual content. Intuitive retrieval and navigation solutions, however, require a semantic level that cannot be reached by generic multimedia content analysis alone. Fusion with film grammar rules can boost the reliability significantly. The current paper describes the fusion of low-level content analysis cues, including face parameters and inter-shot similarities, to segment commercial content into film-grammar-rule-based entities and subsequently classify those sequences into so-called shot reverse shots, i.e., dialog sequences. Moreover, shot-reverse-shot-specific mid-level cues are analyzed, augmenting the shot reverse shot information with dialog-specific descriptions.
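The shot reverse shot pattern (two shot classes strictly alternating, A B A B ...) can be detected over a sequence of shot-similarity labels. The labels are assumed to come from an upstream inter-shot similarity clustering, and the minimum run length is an assumption of this sketch:

```python
def find_shot_reverse_shots(shot_labels, min_len=4):
    """Return (start, end) shot-index ranges where two similarity classes
    strictly alternate, the editing signature of a dialog scene."""
    runs, start = [], 0
    for i in range(2, len(shot_labels)):
        # The alternation breaks if shot i does not match shot i-2,
        # or if two identical shots appear back to back.
        broken = (shot_labels[i] != shot_labels[i - 2]
                  or shot_labels[i] == shot_labels[i - 1])
        if broken:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = i - 1
    if len(shot_labels) - start >= min_len:
        runs.append((start, len(shot_labels) - 1))
    return runs

# Two dialogs (shots 0-4 and 5-8) separated from an establishing shot (9).
dialogs = find_shot_reverse_shots([0, 1, 0, 1, 0, 2, 3, 2, 3, 5])
```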
In recent years, more and more people capture their experiences in home videos. However, home video editing still is a
difficult and time-consuming task. We present the Edit While Watching system that allows users to automatically create
and change a summary of a home video in an easy, intuitive and lean-back way. Based on content analysis, video is
indexed, segmented, and combined with proper music and editing effects. The result is an automatically generated home
video summary that is shown to the user. While watching it, users can indicate whether they like certain content, so that
the system will adapt the summary to contain more content that is similar or related to the displayed content. During the
video playback users can also modify and enrich the content, seeing immediately the effects of their changes. Edit While
Watching does not require a complex user interface: a TV and a few keys of a remote control are sufficient. A user study
has shown that it is easy to learn and use, although users expressed the need for more control over the editing operations and the editing process.
In this paper, we propose an effective framework for semantic analysis of human motion from a monocular
video. As it is difficult to find a good motion description for humans, we focus on a reliable recognition of the
motion type and estimate the body orientation involved in the video sequence. Our framework analyzes the
body motion in three modules: a pre-processing module, matching module and semantic module. The proposed
framework includes novel object-level processing algorithms, such as a local descriptor and a global descriptor
to detect body parts and analyze the shape of the whole body as well. Both descriptors jointly contribute to the
matching process by incorporating them into a new weighted linear combination for matching. We also introduce
a simple cost function based on time-index differences to distinguish motion types and cycles in human motions.
Our system can provide three different types of analysis results: (1) foreground person detection; (2) motion
recognition in the sequence; (3) 3-D modeling of human motion based on generic human models. The proposed framework was evaluated and proved effective, achieving motion recognition and body-orientation classification accuracies of 95% and 98%, respectively.
In this paper, we present a study of video viewing behavior. Based on a well-suited Markovian model, we have developed a clustering algorithm called K-Models, inspired by the K-Means technique, to cluster and analyze viewing behaviors. These models are built from the actions available to the user while viewing a video sequence (play, pause, forward, rewind, jump, stop). We have applied our algorithm to a movie-trailer mining tool. This tool allows users to search on basic attributes (cast, director, onscreen date...) and to watch selected trailers. With an appropriate server, we log every action in order to analyze behaviors. First results, obtained from a set of beta users answering a set of defined questions, reveal interesting typical behaviors.
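A minimal sketch of the K-Models idea, assuming each cluster "centroid" is a first-order Markov transition matrix over the six actions and each sequence is assigned to the model under which it is most likely; the initialisation and smoothing choices below are illustrative, not the authors' exact algorithm.

```python
import numpy as np

# Action vocabulary from the abstract.
ACTIONS = ["play", "pause", "forward", "rewind", "jump", "stop"]
IDX = {a: i for i, a in enumerate(ACTIONS)}

def markov_model(seqs, smoothing=1.0):
    """Estimate one Markov transition matrix from a set of action
    sequences (additive smoothing avoids zero probabilities)."""
    n = len(ACTIONS)
    counts = np.full((n, n), smoothing)
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[IDX[a], IDX[b]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(seq, model):
    """Log-probability of an action sequence under a transition model."""
    return sum(np.log(model[IDX[a], IDX[b]]) for a, b in zip(seq, seq[1:]))

def k_models(sequences, k, iters=20):
    """K-Means-style loop where each 'centroid' is a Markov model:
    assign every sequence to its most likely model, then re-estimate
    each model from its members."""
    # simple deterministic initialisation: k evenly spaced sequences
    init = np.linspace(0, len(sequences) - 1, k).astype(int)
    models = [markov_model([sequences[i]]) for i in init]
    labels = [0] * len(sequences)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: log_likelihood(s, models[j]))
                  for s in sequences]
        for j in range(k):
            members = [s for s, lab in zip(sequences, labels) if lab == j]
            if members:
                models[j] = markov_model(members)
    return labels, models
```

Clustering on models rather than on fixed-length feature vectors lets sequences of different lengths be compared directly through their likelihoods.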
The paper presents the Argos evaluation campaign of video content analysis tools, supported by the French Techno-Vision program. This project aims at developing the resources for a benchmark of content analysis methods and algorithms. The paper describes the evaluated tasks, the way the content set was produced, the metrics and tools developed for the evaluations, and the results obtained at the end of the first phase.
The rapid pace of technological innovation in the mobile phone industry is driving the research community to develop new and advanced systems that optimize the services offered by mobile phone operators (telcos), maximizing their effectiveness and improving their business. Data mining algorithms can run over the data produced by mobile phone usage (e.g., image, video, text, and log files) to discover users' preferences and predict the offer most likely to be purchased by each individual customer. One of the main challenges is reducing the learning time and cost of these automatic tasks. In this paper we discuss an experiment where a commercial offer is composed of a small picture augmented with a short text describing the offer itself. Each customer's purchase is logged with all relevant information. Upon arrival of new items we need to learn who the best customers (prospects) for each item are, that is, the ones most likely to be interested in purchasing that specific item. Such learning is time consuming and, in our specific case, not feasible given the large number of new items arriving every day: given the current customer base, we cannot learn on all new items. We therefore need to select among the new items to identify the best candidates. We do so with a joint analysis of visual features and text to estimate how promising each new item is, that is, whether or not it is worth learning on it. Preliminary results show the effectiveness of the proposed approach in improving classical data mining techniques.
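One plausible reading of the visual-plus-text pre-selection step is sketched below: score each new offer by its similarity to past successful offers and keep only the top-scoring ones for the expensive per-customer learning. The feature representation, similarity measure, and mixing weight `alpha` are all assumptions for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def item_promise(visual_vec, text_vec, sold_visual, sold_text, alpha=0.5):
    """Score a new offer by how similar its picture and its text are to
    past best-selling offers; `alpha` (hypothetical) balances the two."""
    v_sim = max(cosine(visual_vec, v) for v in sold_visual)
    t_sim = max(cosine(text_vec, t) for t in sold_text)
    return alpha * v_sim + (1.0 - alpha) * t_sim

def select_candidates(items, sold_visual, sold_text, budget=10):
    """Rank new items and keep only `budget` of them for learning."""
    return sorted(
        items,
        key=lambda it: item_promise(it["visual"], it["text"],
                                    sold_visual, sold_text),
        reverse=True,
    )[:budget]
```

Only the selected `budget` items would then go through the full prospect-learning pipeline, which is the source of the time savings the abstract describes.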
With the advent and proliferation of low-cost, high-performance digital video recorders, an increasing number of personal home video clips are recorded and stored by consumers. Compared to image data, video data is larger in size and richer in multimedia content, so efficient access to video content is expected to be more challenging than image mining. Previously, we developed a content-based image retrieval system and a benchmarking framework for personal images. In this paper, we extend our personal image retrieval system to include personal home video clips.
A possible initial solution to video mining is to represent video clips by a set of key frames extracted from them, thus converting the problem into an image-search one. Here we report that a careful selection of key frames may improve retrieval accuracy. However, because video also has a temporal dimension, its key-frame representation is inherently limited. The use of temporal information can give a better representation of video content at the semantic object and concept levels than an image-only representation.
In this paper we propose a bottom-up framework that combines interest-point tracking, image segmentation, and motion-shape factorization to decompose the video into spatio-temporal regions. We show an example application of activity concept detection using the trajectories extracted from the spatio-temporal regions. The proposed approach shows good potential for concise representation and indexing of objects and their motion in real-life consumer video.
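The key-frame selection idea mentioned above can be sketched with a simple greedy rule: keep a frame whenever its grey-level histogram differs enough from the last kept key frame. The histogram representation and the `threshold` value are assumed tuning choices, not details from the paper.

```python
import numpy as np

def select_key_frames(frames, threshold=0.3, bins=16):
    """Greedy key-frame selection over grey-level frames (2-D arrays
    with values in 0..255): a frame becomes a key frame when the L1
    distance between its normalised histogram and that of the last key
    frame exceeds `threshold` (an assumed tuning parameter)."""
    def hist(frame):
        h, _ = np.histogram(frame, bins=bins, range=(0, 256))
        return h / max(h.sum(), 1)

    keys = [0]                      # always keep the first frame
    ref = hist(frames[0])
    for i, f in enumerate(frames[1:], start=1):
        h = hist(f)
        if np.abs(h - ref).sum() > threshold:
            keys.append(i)
            ref = h
    return keys
```

Near-duplicate frames collapse onto a single key frame, which is the kind of "careful selection" that can improve retrieval accuracy before any temporal analysis is applied.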