In this paper, videos are analyzed to obtain a content-based description of the video. The structure of a given video is useful for indexing long videos efficiently and automatically. A comparison between shots gives an overview of the cut frequency, the cut pattern, and the scene boundaries. After shot detection, the shots are grouped into clusters based on their visual similarity. A time-constrained clustering procedure compares only those shots that lie within a given time range; shots from distant parts of the video (e.g., beginning and end) are not compared. From this cluster information, which lists the shots and their clusters, the scene boundaries can be calculated. A labeling of all clusters characterizes the cut pattern, making it easy to distinguish a dialogue from an action scene. The final content analysis is done by the ImageMiner™ system. The ImageMiner system, developed in the Image Processing Department of the Center for Computing Technology at the University of Bremen, realizes content-based image retrieval for still images through a novel combination of methods and techniques from computer vision and artificial intelligence. It consists of three computer-vision analysis modules, for color, texture, and contour analysis, plus a module for object recognition. The output of the object-recognition module can be indexed by a text retrieval system, so concepts like forestscene may be searched for. We combine the still-image analysis with the results of the video analysis in order to retrieve shots or scenes.
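To make the time-constrained clustering and the derivation of scene bounds concrete, the following Python sketch clusters shots by histogram similarity within a sliding window and merges overlapping cluster spans into scenes. The window size, threshold, and greedy assignment strategy are illustrative assumptions, not the exact procedure of the paper.

```python
# Hypothetical sketch of time-constrained shot clustering; each shot is
# assumed to be represented by a normalized color histogram.
import numpy as np

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms."""
    return np.abs(h1 - h2).sum()

def time_constrained_clusters(shot_histograms, window=5, threshold=0.4):
    """Assign each shot to the cluster of the most similar earlier shot,
    looking back at most `window` shots (the time constraint)."""
    labels = []
    for i, hist in enumerate(shot_histograms):
        best_j, best_d = None, threshold
        for j in range(max(0, i - window), i):
            d = histogram_distance(hist, shot_histograms[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            labels.append(labels[best_j])                     # join existing cluster
        else:
            labels.append(max(labels) + 1 if labels else 0)   # open a new cluster
    return labels

def scene_bounds(labels):
    """A scene boundary lies after shot i if no cluster links shots
    before the boundary to shots after it."""
    last_seen = {c: i for i, c in enumerate(labels)}
    bounds, reach = [], 0
    for i, c in enumerate(labels):
        reach = max(reach, last_seen[c])
        if i == reach and i < len(labels) - 1:
            bounds.append(i + 1)   # the next shot starts a new scene
    return bounds
```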
The large amount of available multimedia information (e.g., videos, audio, images) requires efficient and effective annotation and retrieval methods. As videos play an increasingly important role in multimedia, we want to make them available for content-based retrieval. The ImageMiner system, which was developed in the AI group at the University of Bremen, is designed for content-based retrieval of single images through a new combination of techniques and methods from computer vision and artificial intelligence. Our approach to making videos available for retrieval in a large database of videos and images involves two steps: first, the detection and extraction of shots from a video, done with a histogram-based method, and second, the composition of the separate frames of a shot into one single still image, performed with a mosaicing technique. The resulting mosaiced image is a one-image visualization of the shot and can be analyzed by the ImageMiner system. ImageMiner has been tested on several domains (e.g., landscape images, technical drawings) that cover a wide range of applications.
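A minimal Python sketch of such a histogram-based cut detector is shown below; the gray-level histograms, OpenCV usage, and threshold value are assumptions for illustration rather than the system's actual implementation.

```python
# Illustrative histogram-based shot detection: flag a cut whenever the
# histogram difference between consecutive frames exceeds a threshold.
import cv2
import numpy as np

def detect_cuts(video_path, threshold=0.5):
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256]).ravel()
        hist /= hist.sum()                       # normalize to a distribution
        if prev_hist is not None:
            if np.abs(hist - prev_hist).sum() > threshold:
                cuts.append(idx)                 # shot boundary before frame idx
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts
```

The frames between two detected cuts form one shot, which would then be composed into a single mosaiced image for analysis.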
Texture analysis plays an important role in automatic image segmentation and object recognition. Objects and regions in an image can be distinguished by their texture, where the distinction arises from the different physical surface properties of the objects represented. To a human observer the different textures in an image are usually very apparent, but verbally describing the visual properties of these patterns is a difficult and ambiguous task. In computer vision, theoretical and experimental comparisons of different methods have shown that the co-occurrence matrix is well suited to texture analysis. Therefore, this approach uses the co-occurrence matrix as a mathematical model for natural textures. We propose a promising improvement for texture classification and description in the context of natural textures. After developing a new abstract language for describing the visual properties of natural textures, we establish a relation between these visual properties, as used by a human observer, and statistical textural features computed from the digital image data. Our experiments indicate that some statistical features are more significant than others for classifying natural textures. Finally, we apply the new approach to landscape scenes and show how the language is used to define texture classes.
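The following Python sketch illustrates the underlying model: it computes a gray-level co-occurrence matrix and a few of the statistical features that such an approach typically derives from it. The quantization level and the particular feature set are illustrative assumptions, not the exact features evaluated in the paper.

```python
# Illustrative NumPy implementation of a gray-level co-occurrence matrix
# and three common statistical texture features.
import numpy as np

def cooccurrence_matrix(image, dx=1, dy=0, levels=16):
    """Count how often gray level i occurs at offset (dy, dx) from level j,
    after quantizing the image to `levels` gray levels."""
    img = (image.astype(np.float64) / image.max() * (levels - 1)).astype(int)
    mat = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            mat[img[y, x], img[y + dy, x + dx]] += 1
    mat += mat.T                      # make the matrix symmetric
    return mat / mat.sum()            # normalize to joint probabilities

def texture_features(p):
    """Statistical features computed from a normalized co-occurrence matrix."""
    i, j = np.indices(p.shape)
    return {
        "energy": (p ** 2).sum(),               # uniformity of the texture
        "contrast": ((i - j) ** 2 * p).sum(),   # local gray-level variation
        "entropy": -(p[p > 0] * np.log2(p[p > 0])).sum(),
    }
```

Features like these would then be related to the visual terms of the abstract description language, e.g. a texture class defined by high contrast and low energy.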
In order to retrieve a set of intended images from a huge image archive, humans think of specific content of the desired scene, such as a countryside or a technical drawing. In general it is therefore harder to retrieve images with a purely syntactic, feature-based language than with a language that offers the selection of examples for color, texture, and contour in combination with natural-language concepts. This motivates content-based image analysis and, beyond that, content-based storage and retrieval of images. Moreover, it is unreasonable to expect any human being to produce content descriptions for thousands of images manually. From this point of view, the project IRIS (Image Retrieval for Information Systems) combines well-known methods and techniques from computer vision and AI in a new way to automatically generate content descriptions of images in textual form. IRIS retrieves the images by means of text retrieval, realized with the SearchManager/6000. The textual description is generated in four sub-steps: extraction of color, texture, and contour features; segmentation; and interpretation of part-whole relations. The system is implemented on an IBM RS/6000 under AIX and has already been tested with 350 images.
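The interpretation step might be pictured as in the following hypothetical Python sketch, which maps region features to natural-language concepts that a text retrieval system can index. The rule table, region format, and concept names are invented for illustration and do not reflect the actual IRIS knowledge base.

```python
# Hypothetical mapping from extracted region features to indexable text.
from dataclasses import dataclass

@dataclass
class Region:
    color: str      # dominant color label from color analysis
    texture: str    # texture class from texture analysis
    position: str   # coarse location in the image, e.g. "top", "bottom"

# Toy interpretation rules: (color, texture, position) -> concept.
RULES = {
    ("green", "leafy", "bottom"): "forest",
    ("blue", "smooth", "top"): "sky",
    ("white", "grainy", "bottom"): "snow",
}

def describe(regions):
    """Turn recognized regions into indexable text, one concept per region."""
    concepts = [RULES.get((r.color, r.texture, r.position), "unknown")
                for r in regions]
    return " ".join(concepts)

print(describe([Region("blue", "smooth", "top"),
                Region("green", "leafy", "bottom")]))   # -> "sky forest"
```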