1Tokyo Institute of Technology (Japan) 2National Yunlin University of Science and Technology (Taiwan) 3National Sun Yat-sen Univ. (Taiwan) 4Korea Aerospace Univ. (Korea, Republic of) 5Nanyang Technological Univ. (Singapore) 6Univ. Tunku Abdul Rahman (UTAR) (Malaysia)
This PDF file contains the front matter associated with SPIE Proceedings Volume 13510, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
International Workshop on Advanced Imaging Technology (IWAIT) 2025
Breast cancer remains one of the most prevalent and life-threatening diseases among women worldwide. Early diagnosis of breast cancer is pivotal in improving patient outcomes and survival rates. The earliest signs of nonpalpable breast cancer are calcifications. This paper proposes a deep learning network for detecting breast calcification areas, based on YOLO with a self-attention mechanism. Using the Bi-Level Routing Attention (BRA) mechanism significantly enhances the model's performance, and a modified Bi-directional Feature Pyramid Network (BiFPN) is then applied. The advanced model architecture is a modification of the YOLOv8 framework. To improve the detection of breast calcification instances, we applied several image preprocessing steps: the contrast of each input image was enhanced and standardized, and the images were resized to a fixed resolution. Multiple supervised machine learning techniques were compared using k-fold cross-validation. The model demonstrated effective performance across various metrics in the calcification detection task, achieving a precision of 99.32%, a recall of 85.0%, and an F1-score of 91.59% at an IoU threshold of 0.6. These experimental results show that the model reliably detects areas of breast calcification.
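For reference, precision, recall, and F1-score at an IoU threshold of 0.6 can be computed as sketched below; this is a generic greedy-matching implementation for illustration, not the authors' evaluation code, and the matching strategy is an assumption.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_metrics(pred_boxes, gt_boxes, iou_thresh=0.6):
    """Greedy one-to-one matching of predictions to ground truth at an IoU threshold."""
    matched, tp = set(), 0
    for p in pred_boxes:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt_boxes):
            if j not in matched and iou(p, g) > best_iou:
                best_j, best_iou = j, iou(p, g)
        if best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - tp
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return precision, recall, f1
```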
The moiré effect is important in image processing because it causes unwanted patterns that reduce image quality, especially during scanning or digitization. Many deep learning-based approaches, including CNN- and Transformer-based methods, have been effectively employed to remove moiré patterns, delivering promising results. However, CNNs struggle to model long-range dependencies, while Transformers face challenges due to their quadratic computational complexity. Recently, the state space model (SSM) known as Mamba has emerged as a promising solution, efficiently capturing long-range interactions with the advantage of linear computational complexity. This paper proposes a two-stage moiré removal network built on the Mamba architecture. In the first stage, we leverage Mamba's capability to identify moiré-contaminated areas and analyze the spatial distribution of the contamination. In the second stage, the detected patterns, along with the contaminated image, are fed into a refinement network for restoration. This distinct separation between detection and refinement enables more precise and efficient removal of moiré patterns, leading to improved restoration outcomes. Experiments conducted on publicly available datasets demonstrate that our model outperforms state-of-the-art methods, achieving superior quantitative and qualitative results and producing restorations with enhanced clarity and fine detail.
Many recent research papers have focused on improving the YOLO algorithm to enhance ship recognition accuracy and speed. However, little attention has been paid to comparing the detection performance of different versions of the original YOLO algorithm. Moreover, current ship image sets either contain too few categories or consist of aerial and satellite remote-sensing images, which are unsuitable for Maritime Safety Administration use. Based on the actual needs of the maritime department, we curated a dataset, ShipForMSA, containing 9216 real-life photographs covering 16 ship types. We compared and analyzed the performance of five commonly used YOLO algorithms on the dataset using Grad-CAM. We also designed a YOLO-based ship detection and recognition system with a recognition accuracy of 95.75%.
Optical coherence tomography (OCT) is crucial in medical imaging, especially for retinal diagnostics. However, its effectiveness is often limited by imaging devices, resulting in high noise levels, low resolution, and reduced sampling rates, which hinder OCT image diagnosis. This paper proposes a generative adversarial network (GAN)-based OCT image super-resolution framework, MFGAN, that leverages blind degradation and a multi-frame fusion mechanism for retinal OCT image super-resolution. Our method jointly performs denoising, blind super-resolution, and multi-frame fusion, reconstructing high-quality OCT images without requiring paired ground-truth data. We employ a blind degradation model to handle OCT image degradation and a denoising prior to effectively process noisy inputs. Experimental results on the PKU37 dataset and the VIP Cup 2024 dataset demonstrate that MFGAN excels in both visual quality and quantitative performance, outperforming existing OCT image super-resolution methods.
In recent years, the integration of deep reinforcement learning (DRL) with virtual reality (VR) has opened new avenues for developing advanced interactive systems, with many artificial intelligence (AI) opponents being used in different games. Building on the VR agent we trained previously, we expand the models and capabilities of the agents. By reflecting the agent's physical state and capabilities as closely as possible to the real world, we train agents with different heights, arm lengths, and speeds. By comparing these agents' strategies to those of real-world table tennis players, we can provide a more comprehensive set of AI opponents in VR that behave even more like real humans. Experimental results show that models with different attributes tend to adopt different strategies to win the game, which is of great significance for table tennis training in real life.
This paper proposes a source-free domain adaptation (SFDA) method with a novel early stopping criterion for cardiac segmentation between CT and MRI. This approach enables stable segmentation when adapting a model trained on one modality to perform well on the other, while eliminating the need for ground-truth labels in the target domain. The proposed criterion evaluates segmentation results by checking their agreement with expected cardiac features, such as the heart's near-spherical shape and distinct regions, enabling the model to stop training at an optimal point for accurate segmentation. Experiments using 20 CT and MRI volumes showed that our method achieved results comparable to those obtained when partly using the target domain's ground truth.
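The abstract names the heart's near-spherical shape as one expected feature. As a hypothetical illustration of such a shape-prior criterion (the paper's exact formula is not given here), the sketch below scores a binary 3D mask by its sphericity, approximating surface area by counting exposed voxel faces.

```python
import numpy as np

def sphericity(mask):
    """Sphericity of a binary 3D mask: 1.0 for a perfect ball, lower otherwise.
    Surface area is approximated by counting exposed voxel faces (np.roll
    wraps at the borders, so the mask is assumed not to touch the edges)."""
    mask = mask.astype(bool)
    volume = mask.sum()
    area = 0
    for axis in range(3):
        for shift in (1, -1):
            area += np.logical_and(mask, ~np.roll(mask, shift, axis=axis)).sum()
    return (np.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / max(area, 1)

# Hypothetical use: track the score each epoch and stop adaptation once it
# has not improved for a fixed number of epochs (an early-stopping patience).
```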
Weakly supervised semantic segmentation reduces annotation costs by using less detailed supervision, such as image-level labels or bounding boxes, but often suffers from lower accuracy due to insufficient annotations, leading to classification and boundary errors. Additionally, training weakly supervised models requires complex algorithms, increasing computational resources and training time. This paper introduces an algorithm that combines a generalized unsupervised segmentation model with zero-shot learning and cross-modal understanding between images and texts. This approach reduces training time and computational costs while improving object boundary recognition. Tested on the PASCAL VOC 2012 dataset, the algorithm achieves a mean Intersection over Union (MIoU) of 77.3% on the test set, with a minor increase in computation time of only 0.04 seconds per frame.
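For reference, the reported mean Intersection over Union (MIoU) metric can be computed with the standard per-class implementation below.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union across classes, skipping classes absent
    from both the prediction and the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```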
This paper introduces a computational tool designed to simplify string art creation for beginners by automating the placement of pins and the calculation of thread paths to replicate input images. String art, a visually striking art form, uses threads strung across pins to create intricate designs, but accurately recreating images can be challenging for novices due to the precision required in pin placement and thread routing. Our tool leverages convex hull algorithms and genetic optimization to convert simple images with clear contours into string art patterns that resemble the input image in appearance. To achieve this, we define an optimization function that aims to minimize unused pins, avoid background areas, and reduce convex hull overlap, striving for clarity in the artwork. The genetic algorithm’s fitness function evaluates solutions based on these criteria, guiding the algorithm in selecting designs that align with the intended layout. Experimental results include images of the generated string art layouts, which approximate the target appearance.
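The abstract names three penalty terms: unused pins, background coverage, and convex hull overlap. A minimal sketch of such a fitness function follows; the layout helper methods and weights are hypothetical placeholders, not the authors' API.

```python
def string_art_fitness(layout, image_mask, weights=(1.0, 1.0, 1.0)):
    """Hypothetical GA fitness combining the three criteria named in the
    paper. `layout` is assumed to expose three penalty helpers; these are
    illustrative placeholders, not the authors' actual interface."""
    w_pins, w_bg, w_overlap = weights
    penalty = (w_pins * layout.unused_pin_count()                 # minimize unused pins
               + w_bg * layout.background_coverage(image_mask)    # avoid background areas
               + w_overlap * layout.hull_overlap())               # reduce convex hull overlap
    return -penalty  # GAs maximize fitness, so negate the total penalty
```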
Medical image segmentation is essential for accurately extracting tissue structures or pathological regions from medical images. However, medical image segmentation methods are often affected by factors such as image noise and irregular shapes, making precise segmentation challenging. To tackle these challenges, this paper proposes a triple-branch medical image segmentation network (TBIB-Net) that incorporates implicit boundary priors. A boundary map, obtained with a boundary detection algorithm, is used to constrain the output of the boundary branch. Extensive experiments indicate that TBIB-Net achieves state-of-the-art performance on publicly available polyp datasets.
A portable Wood's lamp, which emits ultraviolet light, was attached to a smartphone. This device captures images of the skin in which keratotic plugs appear as bright glowing spots. The images were then processed with adaptive binarization to count the keratotic plugs. The count fluctuated over several weeks and varied with the sensitivity parameter of the binarization process. To optimize the count, a Gaussian mixture model was used to approximate the distribution of the counts and to determine the sensitivity parameter that minimizes the approximation error. Understanding fluctuations in keratotic plugs can motivate self-medication for acne treatment.
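A minimal sketch of such a counting pipeline, assuming OpenCV's adaptive thresholding as the binarization step and scikit-learn's GaussianMixture for the distribution fit; all parameter values and the sample data are illustrative, not the study's.

```python
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

def count_plugs(gray, block_size=31, sensitivity=5):
    """Count bright plug candidates in an 8-bit grayscale image.
    A negative constant C raises the local threshold, so only pixels
    clearly brighter than their neighborhood (the glowing plugs) pass."""
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, block_size, -sensitivity)
    n_labels, _ = cv2.connectedComponents(binary)
    return n_labels - 1  # subtract the background component

# Fit a Gaussian mixture to counts collected over several weeks for one
# sensitivity setting; repeat per setting and keep the best-fitting one.
counts = np.array([[120], [135], [128], [140], [118]])  # illustrative data only
gmm = GaussianMixture(n_components=2, random_state=0).fit(counts)
print(gmm.means_.ravel())
```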
Each subject's face was photographed, and their skin translucency was assessed by a beauty advisor. Small square regions were cropped from the captured images, and these cropped patches, along with their translucency values, were used to train a regression model. The skin translucency of a new subject can then be evaluated with the trained model. This computer-based assessment can complement evaluation by a skilled beauty advisor, help clarify what constitutes skin translucency, and assist in training new beauty advisors.
This study aims to develop a dance evaluation system based on deep learning techniques. Using image analysis with OpenPose, the system extracts the dancers' joint points and calculates the angles between various body parts to predict dance performance. First, preprocessing methods are applied, including joint detection, filling missing values with linear interpolation, and data augmentation such as spatial shifting, data mixing, and noise smoothing to enhance the diversity of the dataset. Then, key features, such as the Euclidean distance, the dynamic time warping (DTW) distance, and statistical differences, are extracted from the processed data. These features are fed to deep learning models such as the long short-term memory (LSTM) network, the gated recurrent unit (GRU), the convolutional neural network (CNN), and the temporal convolutional network (TCN). K-fold cross-validation is employed to evaluate model performance, and the prediction results are combined through a weighted average. Finally, the predicted results are compared with the actual dance scores, and evaluation metrics such as the mean squared error, the mean absolute error, and R-squared are used to assess the accuracy and stability of the predictions, constructing a comprehensive dance evaluation system.
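As one concrete example of the feature extraction step, the DTW distance between two joint-angle sequences can be computed with the standard dynamic-programming recurrence, sketched here for reference.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D angle sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Example: compare a dancer's elbow-angle sequence to a reference performance.
ref = np.sin(np.linspace(0, 3, 50))
test = np.sin(np.linspace(0, 3, 60) + 0.1)
print(dtw_distance(ref, test))
```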
As intelligent devices become increasingly prevalent in our daily lives, privacy requirements have increased significantly. To address privacy protection, adversarial attacks have emerged in recent years. Initially, adversarial attacks were predominantly applied to image recognition. However, due to the unique characteristics of audio data, attacks suitable for images, e.g., additive perturbations, may not be applicable to audio. The goal of this study is to perform adversarial attacks on speech signals such that they cannot be recognized by automatic speech recognition (ASR) systems but can still be understood by humans. We introduce several distinct noise-addition and precision-reduction methods to generate adversarial examples for ASR systems. The proposed approach leverages audio features extracted through filtering and time-frequency transformations. The adversarial samples generated using the proposed methods not only remain intelligible to human listeners but also achieve a 100% success rate in blind attacks against ASR systems with unknown architectures and parameters.
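A minimal sketch of two plausible perturbation operators of the kind the abstract names, bit-depth (precision) reduction and scaled additive noise; the exact operators and parameters used in the paper are not specified here.

```python
import numpy as np

def reduce_precision(signal, bits=8):
    """Requantize a float waveform in [-1, 1] to a coarser bit depth;
    coarse quantization perturbs ASR front-end features while the speech
    stays intelligible to humans."""
    scale = 2 ** (bits - 1) - 1
    return np.round(signal * scale) / scale

def add_noise(signal, snr_db=20, rng=None):
    """White Gaussian noise scaled to a target signal-to-noise ratio
    (a simple, unfiltered instance of a noise-addition operator)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    power = np.mean(signal ** 2)
    noise_power = power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
```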
Remote photoplethysmography (rPPG) offers a promising non-invasive solution for vital sign monitoring, including blood pressure (BP) estimation, by extracting cardiovascular information from facial videos. This study presents a hybrid approach to BP estimation by comparing three widely used methods in rPPG analysis: the green channel method, CHROM (chrominance-based method), and POS (plane-orthogonal-to-skin method). The green channel method relies solely on the intensity of green light reflected from the skin; although it works better than the red and blue channels, it is highly dependent on skin tone and ambient light. In contrast, CHROM and POS exploit multi-channel color signals to enhance signal extraction under varying lighting conditions. We compare these methods' performance in signal quality, noise resilience, and effectiveness in estimating blood pressure. Our results demonstrate that the hybrid combination of the CHROM and POS methods yields improved accuracy, obtaining an MAE of 8.7 bpm, and greater robustness than traditional single-channel approaches. This provides a step forward in practical, non-invasive BP estimation for healthcare applications. This comparative analysis highlights the strengths and limitations of each method, offering insights into their applicability in real-world scenarios.
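For reference, the published CHROM and POS pulse-extraction formulas can be sketched as below; in practice both are applied over sliding windows with detrending and band-pass filtering, which are omitted here for brevity.

```python
import numpy as np

def chrom_pulse(rgb):
    """CHROM method (de Haan & Jeanne): rgb is a (T, 3) array of mean
    skin-pixel values per frame, temporally normalized by channel means."""
    norm = rgb / rgb.mean(axis=0)
    xs = 3 * norm[:, 0] - 2 * norm[:, 1]
    ys = 1.5 * norm[:, 0] + norm[:, 1] - 1.5 * norm[:, 2]
    alpha = xs.std() / ys.std()
    return xs - alpha * ys

def pos_pulse(rgb):
    """POS method (Wang et al.): projection onto a plane orthogonal to
    the skin tone, using the projection matrix [[0, 1, -1], [-2, 1, 1]]."""
    norm = rgb / rgb.mean(axis=0)
    s1 = norm[:, 1] - norm[:, 2]
    s2 = norm[:, 1] + norm[:, 2] - 2 * norm[:, 0]
    return s1 + (s1.std() / s2.std()) * s2
```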
Face detection is a very important process in facial image processing. Although many face detection algorithms exist, we observe that there is still much room for improvement in blurred scenarios, since blurred faces have far fewer meaningful features than clear ones. In this work, we propose a detection framework for blurred faces using several image processing techniques. First, multiple facial images with different types and degrees of blur are generated for the training and validation sets. With them, several neural network models with different architectures, including YOLO and DenseNet, are trained. Finally, some geometric and color relationships are examined to eliminate redundant face candidates. Moreover, we also conduct an experiment involving ensemble learning. The experimental results show that our method is superior to state-of-the-art face detection methods in dealing with blurred faces and can effectively boost overall face detection performance.
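A minimal sketch of generating blurred training variants, assuming Gaussian and linear motion blur via OpenCV; the blur types and kernel sizes used in the paper may differ.

```python
import cv2
import numpy as np

def motion_blur(img, ksize=15, angle_deg=0.0):
    """Apply linear motion blur with a rotated line kernel."""
    kernel = np.zeros((ksize, ksize), np.float32)
    kernel[ksize // 2, :] = 1.0
    rot = cv2.getRotationMatrix2D((ksize / 2 - 0.5, ksize / 2 - 0.5), angle_deg, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (ksize, ksize))
    kernel /= kernel.sum()
    return cv2.filter2D(img, -1, kernel)

def blur_variants(img):
    """Generate training samples with different types and degrees of blur."""
    return [cv2.GaussianBlur(img, (k, k), 0) for k in (5, 11, 21)] + \
           [motion_blur(img, ksize=k, angle_deg=a) for k in (9, 17) for a in (0, 45, 90)]
```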
The training process of classification models is commonly based on real-world data, and models may learn spurious relationships, leading to an over-reliance on features not directly relevant to the subject. In this paper, we propose a novel framework for generating counterfactual images. Our framework enables us to confirm whether classification models are sensitive to changes in the features under consideration. We incorporate the latest caption and image generators, which enable better counterfactual image generation as well as more efficient processing. Experimental results show that the counterfactual images generated by our method have superior feature perturbation capabilities, allowing us to assess the robustness of classification models more effectively. The main improvements our framework offers over existing methods are the generation of higher-quality counterfactual images and a reduction in the computational cost of the process.
With the rapid development of recording and storage hardware, efficient methods to retrieve desired videos are required. Among video retrieval methods, cross-modal video retrieval, which aims to retrieve a target video from a natural language query, has attracted attention. Cross-modal video retrieval is realized by learning a common representation of videos and texts so that their similarity can be calculated directly from their contents. However, traditional cross-modal video retrieval approaches focus only on global features and ignore fine-grained information such as a single action or event in the video. In this paper, we propose to use a large language model to extract rich action and event information from the text and match it with the paired video hierarchically. We design a prompt to obtain the semantically informative action and event components in the form of natural language. Experimental results demonstrate the effectiveness of our method.
The superimposing 3D display, which can be observed from 360 degrees, can show AR 3D content to many people with the naked eye, so it is expected to be used for advertising and exhibitions. A previous study proposed a display consisting of a high-speed projector, a diffusion screen, and a thin strip mirror. However, diffuse light that is not reflected by the mirror distracts viewers observing the 3D image. This problem can be solved by using optical elements with the properties necessary for 3D display instead of a diffusion screen. In this paper, we propose a novel 3D display consisting of a specially designed wedge guide and a high-speed projector in place of the diffusion screen and thin strip mirror. Through simulation, we confirm that the wedge guide has the properties necessary for 3D display and that the proposed display can display 3D images.
Clothing is one attribute that expresses a person's tastes and preferences. Therefore, automatically identifying and categorizing clothing can have significant marketing applications. In this paper, we developed a communication system based on clothing recognition as an example of multimodal communication with humans using a multimodal large language model (MLLM). We designed the communication component, such as recognizing a customer's attire, calling out to them, commenting on the outfit, and complimenting them, and planned the flow and content of the exchange. In addition, we developed the system and verified its effectiveness. Experiments were conducted at stores, where customers became interested in the system and smiled more frequently after using it, demonstrating the system's usefulness.
Clustered federated learning (CFL) is a modern approach that addresses heterogeneous settings in federated learning. However, conventional models often overfit to the data within each cluster because communication between clients in different clusters receives insufficient attention. To tackle this lack of knowledge sharing in CFL, we present a novel approach that facilitates inter-cluster communication. Our proposed method promotes effective knowledge sharing among clusters while also improving clustering efficiency through layer separation. Experimental results show that our proposed method significantly enhances model performance.
Diminished reality (DR) is a technology that conceals real-world objects by blending them with background information. This technology has applications in various fields, including simulating furniture arrangements at home before purchase. In such a system, existing furniture is first concealed using DR, and new virtual furniture is then displayed in the concealed scene using augmented reality (AR). This allows users to visualize how new furniture will look in their home, even in spaces where items are already placed, making it easier to reach informed purchasing decisions. However, existing approaches struggle to accurately account for and correct the environmental effects of concealed objects, particularly those that influence lighting, such as light fixtures. This study proposes a method that not only conceals light fixtures within a scene but also corrects the associated lighting effects, enhancing the realism of the DR process. The method first estimates the light source's 3D position using depth information captured by an RGB-D camera and applies a brightness correction based on the estimated position and the distance from the light source. Then, a generative adversarial network (GAN) is used to perform the DR process and conceal the light fixture within the corrected image. The results show that the proposed brightness correction reduces inconsistencies in lighting effects, leads to more realistic concealment of the light fixture, and improves the overall visual quality of the scene.
The shape similarity assessment of round eave tiles provides crucial clues for understanding regional interactions and technological dissemination. However, many excavated round eave tiles are damaged, and only a few remain complete. This study proposes an initial position estimation method suited to partial shapes that include the boundary of the inner field. A circle is estimated from the boundary of the inner field, and partial matching is performed along the circumference of the complete shape. Experiments confirmed that an inner-field boundary equivalent to 25% of the circle is necessary.
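A standard least-squares (Kasa) circle fit, of the kind that could be used to estimate the circle from inner-field boundary points, is sketched below for reference; the paper's exact estimator is not specified here.

```python
import numpy as np

def fit_circle(points):
    """Least-squares (Kasa) circle fit to 2-D boundary points.
    Solves x^2 + y^2 = a*x + b*y + c for (a, b, c), from which
    center = (a/2, b/2) and radius = sqrt(c + |center|^2)."""
    pts = np.asarray(points, dtype=float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    rhs = (pts ** 2).sum(axis=1)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    center = np.array([a / 2, b / 2])
    radius = np.sqrt(c + center @ center)
    return center, radius
```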
Malaysia's oil palm industry faces many challenges in sustaining manual oil palm harvesting operations. This work investigates an effective oil palm fresh fruit bunch (FFB) image processing method for robotic harvesting automation. The proposed method first detects the FFB category, covering six different FFB growth stages, and then estimates the bunch's 6D pose for harvesting. We propose a novel image processing framework that applies convolutional neural network deep learning classification followed by markerless, feature-registration-based 6D pose estimation of oil palm FFBs on a public FFB dataset. Furthermore, we introduce view obstruction into the public FFB dataset as noise to reflect practical robot harvester applications in plantation field operations. The experimental results show that the proposed model maintains a high F1 score up to 70% view obstruction, beyond which performance declines.
Scene-adaptive imaging has been introduced as a new shooting architecture that facilitates dynamic control of shooting conditions on the basis of local subject characteristics. This imaging architecture efficiently captures wide field-of-view videos containing various subjects with diverse textures and movements. The shooting conditions, such as resolution and frame rate, are changed frame by frame for each local area of the image sensor. To achieve higher frame rates when shooting moving subjects, moving areas need to be predicted accurately in real time inside the camera. The block matching methods used in video coding are difficult to apply to motion detection owing to their long signal processing time. We propose three motion area determination methods for scene-adaptive imaging with relatively lightweight signal processing: a) frame subtraction, b) optical flow, and c) an event-based vision sensor used as a sub-sensor. We experimentally validate the effectiveness of each method in terms of accuracy and processing speed. The frame subtraction method approximates the motion area with a dilation process and has a short processing time. Predicting the direction of movement using optical flow effectively reduces false detections. The event-based vision sensor reliably detects movements within a short processing time. The proposed evaluation method, based on binary classification metrics, effectively compares the performance of the methods. The findings provide insights into motion area determination techniques for the quick and accurate identification of moving areas required by scene-adaptive imaging.
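A minimal sketch of method (a), frame subtraction followed by dilation, using OpenCV; the threshold and kernel values are assumptions.

```python
import cv2

def motion_mask(prev_gray, curr_gray, thresh=25, dilate_iter=2):
    """Frame-subtraction motion detection with dilation.
    Returns a binary mask approximating the moving area."""
    diff = cv2.absdiff(prev_gray, curr_gray)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.dilate(mask, kernel, iterations=dilate_iter)
```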
This study proposes a model with YOLOv8x as the core framework, with targeted optimizations to achieve low-cost road defect detection and classification. First, the diversity and robustness of the dataset were enriched through data augmentation, which effectively improved the trained model's ability to recognize different types of road defects. Second, a feature pyramid structure is introduced into the convolutional neural network (CNN), enabling the model to capture and identify subtle defect features in road images more accurately and further improving detection accuracy. The experimental results show that the improved scheme achieves clear gains: the F1 score increased from 0.58350 for the baseline YOLOv8 to 0.62936. The model identifies various types of damaged pavement more comprehensively and accurately, providing a more reliable and effective solution for road defect detection under various road conditions.
In this paper, we conducted a basic study on improving the screen system to raise the stability of the mist flow and to produce brightly reproduced images without using a mist-controlling fan. As a result, we succeeded in constructing a redesigned screen system that adopts a nozzle producing a strongly directed mist flow with a voluminous form in the depth direction, enabling us to generate a mist screen shaped like an air curtain. We constructed one system that adopts a single ultrasonic vibrator unit to raise the stability of the screen, and another that adopts two ultrasonic vibrator units to raise the brightness of the projected images, and confirmed the effectiveness of both. In this way, we achieved improvements in stability and brightness compared with previous systems.
This paper presents a method for generating expert comments on human motion in sports videos using a multimodal large language model (MLLM). In the proposed method, a pretrained Vision Transformer (ViT) encoder and a transformer encoder extract tokens from sports videos, enabling expert comment generation that takes temporal information into account. Experiments using basketball videos from the Ego-Exo4D dataset validated the effectiveness of incorporating temporal information in expert comment generation, demonstrating the proposed method's superiority over existing techniques.
In baseball, accurate pitch type classification is essential for strategic decision-making by teams and analysts. Traditional methods rely heavily on ball trajectory tracking using radar and high-speed cameras, which are costly and accessible only in professional leagues. In contrast, this paper presents a skeleton-based approach that classifies pitch types using pose estimation and spatial-temporal modeling. We extract the pitcher's key joint coordinates using OpenPose and model their body movements over time with a spatial-temporal graph convolutional network (ST-GCN). Our method is evaluated on the publicly available MLB-YouTube dataset, achieving 68.2% accuracy in classifying six pitch types and outperforming state-of-the-art methods that rely on full-frame data with 3D CNNs. By focusing exclusively on the pitcher's skeletal information through graph-based modeling, our approach improves classification performance while remaining robust in binary tasks, reaching 85.7% accuracy for fastball detection and 80.8% for distinguishing fast from slow pitches. This performance underscores the effectiveness of relying on body mechanics for pitch classification, demonstrating that accurate results can be achieved without costly ball trajectory data.
In machine learning, a reliable analytical method is crucial for validating the linearity assumption of any linear regression model. Existing analytical methods for verifying linearity, such as Pearson's correlation coefficient (PCC), Spearman's rank correlation coefficient (SRCC), and Kendall's tau correlation coefficient (KTCC), have a limitation: they cannot distinguish linear relationships from merely monotonic ones. In this paper, we propose a normalized least dependent difference (NLDD) method to overcome this limitation. By calculating the difference between each data point and its predicted value on the regression line, we can determine how much the predicted value deviates from the observed value. A consistent difference between each data point and its predicted value, represented by a relative standard deviation along the y-axis that is near its mean, suggests that the model accurately reflects the relationship between the dependent and independent variables. Our findings show that NLDD effectively identifies linearity in linear relationships and non-linearity in monotonic relationships.
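A sketch of the residual-consistency idea described in the abstract; the exact NLDD normalization is not given there, so the relative standard deviation of the absolute residuals is used as a stand-in.

```python
import numpy as np

def residual_consistency(x, y):
    """Fit a straight line, then measure how uniform the absolute residuals
    are via their relative standard deviation (std / mean). A stand-in for
    the NLDD normalization, which the abstract does not fully specify."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = np.abs(y - (slope * x + intercept))
    return residuals.std() / (residuals.mean() + 1e-12)

# Compare the score for a noisy linear relationship and a monotonic
# nonlinear one (e.g., y = sqrt(x)); systematic curvature changes the
# residual pattern, which is the signal NLDD exploits.
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
print(residual_consistency(x, 2 * x + 1 + rng.normal(0, 0.1, x.size)))
print(residual_consistency(x, np.sqrt(x)))
```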
Eye tracking technology is increasingly recognized for its ability to capture nonverbal emotional cues by analyzing eye movements and gaze patterns. However, traditional eye tracking systems are often expensive and have usability limitations. Additionally, blink patterns are key indicators of emotional and psychological states, but current systems fail to incorporate blink detection effectively. This paper proposes a cost-effective system that combines eye tracking and blink detection using a standard webcam to estimate psychological states. The system uses biometric information, such as changes in eye gaze and blink frequency, to provide a more comprehensive analysis of psychological states. This study introduces a method consisting of face and eye feature point extraction, gaze tracking, and blink detection using the eye aspect ratio (EAR) to determine whether the eyes are open or closed. For gaze position estimation, GazeNet, pre-trained on the MPIIGaze dataset, is used together with individual calibration data. Experimental results demonstrate that estimation accuracy is significantly improved by calibration and that spatial factors, such as the relationship between the camera and the display screen, influence performance. These experiments suggest that integrating blink detection and eye tracking in this system could contribute to the prediction of emotional and psychological states. Future work will focus on further validating the system's reliability and its potential for emotion prediction.
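For reference, the EAR and a simple run-length blink counter can be sketched as below; the EAR formula follows the standard six-landmark definition, while the threshold and frame-count values are common defaults, not necessarily the paper's.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six eye landmarks (p1..p6), ordered as in the standard
    definition: EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|)."""
    p = np.asarray(eye, dtype=float)
    vert = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    horiz = np.linalg.norm(p[0] - p[3])
    return vert / (2.0 * horiz)

def detect_blinks(ear_series, thresh=0.2, min_frames=2):
    """Count blinks as runs of at least `min_frames` frames below the EAR threshold."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < thresh:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks
```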
In the past, paper was a valuable resource, so there are ancient documents in which text is written on both the front and back sides of the paper. Among these, documents written on the reverse side of the paper are referred to as "Shihai-monjo." In particular, when analyzing the content of bag-bound documents using images taken from above, a significant issue arises from the overlap of text from the front and back sides, leaving the Shihai-monjo incomplete. In this study, we addressed this issue by applying a deep learning-based image inpainting method to restore the missing Shihai-monjo.
This paper proposes an MR-based self-learning system that assists in learning regular polyhedra and conic sections with selected virtual 3D objects. By displaying complex virtual 3D objects in a virtual space, the system lets the user manipulate and observe those objects directly, without physical 3D models. The system, installed on a Meta Quest 3, is a practical self-learning tool for spatial geometry aimed at secondary school students. User tests and questionnaires were conducted with five science students, all of whom improved their scores, indicating that the system is a useful teaching tool.
Adversarial patches pose a significant threat to image recognition systems, exploiting small, maliciously crafted patterns to mislead models into incorrect classifications. Unlike traditional adversarial examples, adversarial patches operate under real-world constraints, making defense against them a challenging task. This study introduces an alternative defense strategy that employs partial masking to mitigate the impact of adversarial patches without relying on precise patch detection. Various global masking patterns were evaluated to determine the best balance between adversarial robustness and image integrity. The proposed method effectively reduces attack success rates while maintaining reasonable classification accuracy, demonstrating its potential to provide protection in real-world applications without depending on patch detection methods.
In recent years, while photorealistic human representation techniques have evolved in the fields of VR and MR, the realism of human hair remains a challenge. Although various methods exist for hair generation in CG model creation, they involve high production costs, and practical, efficient approaches are needed. In this study, we propose a simple method for automatically generating realistic hair mesh models from point cloud data obtained by scanning actual human heads. We assume the scan uses a structured light projection technique that can measure fine surface irregularities under natural light. The hair mesh model is generated by constructing polygons that account for the features of the hair. The mesh for the entire head of hair is created in two steps: first, a base mesh representing the hair's features is constructed, and then an extended mesh with adjusted hair volume is built. The experimental results confirm that the proposed method can generate a voluminous, realistic mesh that captures the characteristic unevenness of hair.
Recently, there has been growing interest in using e-sports to maintain the physical and mental health of the elderly. In response, we developed the interactive game "Sancoro Bingo," which incorporates elements of dementia prevention. We hypothesized that this game could become a viable e-sports title for the elderly, and we investigated its potential as an e-sport for older adults. We conducted a comparative study of "Sancoro Bingo" against two other games used in elderly e-sports: "Master of the Drums" and "GRAN TURISMO." A survey was conducted among participants to gather feedback. The results showed that, in terms of communication, a key aspect that elderly individuals seek in their gaming experience, "Sancoro Bingo" scored similarly to or higher than "Master of the Drums," while also improving the competitive elements typical of e-sports.
In this paper, we propose a new lightweight hybrid video codec consisting of a conventional video codec (HEVC or VVC), a lossless image codec, and our new restoration network. The encoder is composed of a conventional video encoder and a lossless image encoder; it transmits a lossy-compressed video bitstream along with a losslessly compressed reference frame. The decoder is constructed from the corresponding video/image decoders and a new restoration network, which enhances the compressed video in a two-step process. The first step uses a network trained on a video dataset to restore the details lost by the conventional encoder. After this, we enhance the video quality using the reference image, a losslessly compressed video frame. The reference image provides video-specific information that can be used to better restore the details of the compressed video. Experimental results show that the overall coding gain is comparable to recent top-tier neural codecs while requiring much less encoding time and lower complexity. Our code is available at https://github.com/myideaisgood/hybrid_video_compression_rhee.
This study presents an innovative approach that utilizes neural network-based techniques to address the Ordered Escape Routing (OER) problem. The OER problem holds a crucial position in integrated circuit design, requiring multiple pins to be connected to the boundary of a chip in a specific order while ensuring that the routing paths do not cross. Additionally, the total wire length must be minimized during the process. To tackle this challenging problem, we propose a novel routability-driven method that leverages neural networks (NN) to predict feasible routing paths that satisfy the required conditions. This approach not only ensures non-crossing paths but also strictly adheres to the specified pin connection order, while further minimizing the total wire length under these constraints. The core of this technique lies in the efficient predictive capabilities of neural networks.
We have developed a video conversion system between standard dynamic range (SDR) videos and high dynamic range (HDR) / wide color gamut (WCG) videos. Our system is equipped with a programmable 3D look-up table that makes it easy to design and immediately execute the video conversion functions needed at the video production site.
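A minimal sketch of applying a programmable 3D LUT with trilinear interpolation, assuming SciPy's interpn; the LUT resolution and transform here are illustrative, not the system's actual implementation.

```python
import numpy as np
from scipy.interpolate import interpn

def apply_3d_lut(image, lut):
    """Apply a 3D LUT to an RGB image with values in [0, 1].
    image: (H, W, 3) float array; lut: (N, N, N, 3) table mapping
    input RGB grid points to output RGB (e.g. an SDR-to-HDR transform)."""
    n = lut.shape[0]
    grid = (np.linspace(0, 1, n),) * 3
    flat = image.reshape(-1, 3)
    out = np.stack([interpn(grid, lut[..., c], flat, method='linear')
                    for c in range(3)], axis=-1)
    return out.reshape(image.shape)

# Identity LUT example: output equals input.
n = 17
axis = np.linspace(0, 1, n)
r, g, b = np.meshgrid(axis, axis, axis, indexing='ij')
identity_lut = np.stack([r, g, b], axis=-1)
```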
This paper proposes a genetic algorithm (GA) to address the non-crossing escape routing problem on printed circuit board (PCB) of grid pin arrays (GPA), a critical challenge in modern electronic circuit design. The algorithm is designed to optimize routing efficiency by minimizing total wire length while ensuring all connections remain free from crossing violations. The fitness function evaluates individuals based on these criteria, ensuring that solutions align with the stringent requirements of GPA layouts. A tournament selection mechanism is employed to identify and propagate the fittest individuals, while crossover operations are conducted on identical nodes to improve solution viability. To introduce diversity and prevent premature convergence, random mutations are incorporated, enhancing the algorithm's ability to explore a wide solution space. This paper presents a systematic approach to solving the non-crossing escape routing problem in GPA, highlighting the effectiveness of the GA in achieving an optimal balance between routing efficiency and wire length minimization.
This study proposes a method for creating projection images for projectors in a volumetric display that presents 3D images by overlapping projection rays. Conventional methods have limited the presented 3D images to the silhouettes of objects. However, by generating projection images based on the back-projection method of Computed Tomography, it is theoretically possible to project objects with internal details or figures drawn on horizontal planes. In this paper, simulations of the back-projection method were conducted using MATLAB to evaluate the number of projectors required to present 3D images using the proposed method.
This study focuses on lossless coding of still images using multimodal signals. Instead of limiting the analysis to a single modality, the aim is to improve coding efficiency by exploiting correlations between different modalities. The experiments target the encoding of infrared images and their corresponding visible-light (grayscale) images, with the objective of achieving more efficient coding by using the infrared image as auxiliary data for the grayscale image rather than encoding them separately. Specifically, two encoding methods are applied to the grayscale image: single-modal encoding and encoding that uses the multimodal data as auxiliary information; the more efficient method is selected. Experimental results on 157 images demonstrate that the proposed method reduces the encoded size by 4.25% compared with single-modal encoding, a gain significantly higher than the 2–3% reported for previous methods.
Iris recognition using template matching is one of the common forms of biometric authentication. Many methods have been proposed for iris template matching, such as Local Binary Patterns (LBP) and Histograms of Oriented Gradients (HOG). In this paper, iris matching is performed on templates generated using adaptive Gabor feature extraction. Comparisons are made among the same eye of the same person, different eyes of the same person, and eyes of different persons for better authentication. The Gabor feature extraction technique paired with an SVM classifier exhibited promising outcomes: with an accuracy of 91.4% and a precision of 94.7%, Gabor features showcased their capability to enhance iris recognition accuracy.
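A minimal sketch of a Gabor-plus-SVM pipeline, assuming normalized iris images as input; the kernel parameters and pooled statistics are illustrative choices rather than the paper's configuration.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def gabor_features(iris_img, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Filter a normalized iris image with a small Gabor bank and pool the
    responses into a fixed-length feature vector."""
    feats = []
    for theta in thetas:
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0)
        resp = cv2.filter2D(iris_img.astype(np.float32), cv2.CV_32F, kernel)
        feats += [resp.mean(), resp.std()]
    return np.array(feats)

# X: stacked feature vectors, y: subject (or eye-side) labels.
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# accuracy = clf.score(X_test, y_test)
```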
Unplanned readmission within 14 days is a critical indicator of healthcare quality, impacting patient risk, costs, and hospital reputations. This study explores the use of machine learning to predict unplanned hospital readmissions within 14 days and explainable artificial intelligence techniques to identify key risk factors. Patient data, such as age, gender, and hospital stay length, were used to create a prediction model based on artificial neural networks. Techniques like class weighting were applied to improve the prediction of less common cases. Shapley Additive Explanations and Integrated Gradients methods were used to explain the model, making it easier to understand and use in clinical settings. The results show that the model improves the accuracy of readmission risk predictions, helps healthcare professionals find high-risk patients early, and supports timely interventions to improve care quality and reduce readmissions.
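A compact sketch of two ingredients highlighted above, class weighting and Integrated Gradients, using PyTorch and the Captum library; the feature count and imbalance weight are assumed values, and the network here does not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Small feed-forward network over tabular features (age, gender, stay length, ...).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

# Class weighting: up-weight the rare "readmitted" class in the loss.
pos_weight = torch.tensor([8.0])  # assumed imbalance ratio
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

def train_step(x, y, optimizer):
    optimizer.zero_grad()
    loss = loss_fn(model(x).squeeze(1), y.float())
    loss.backward()
    optimizer.step()
    return loss.item()

# Integrated Gradients attributes the readmission score to input features.
ig = IntegratedGradients(model)
# attributions, delta = ig.attribute(x_batch,
#                                    baselines=torch.zeros_like(x_batch),
#                                    return_convergence_delta=True)
```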
The Proteus effect refers to the phenomenon where the appearance of an avatar influences the psychological traits and behaviors of its user. In this study, we examined the potential of the Proteus effect to enhance concentration. Participants wore a robot avatar and performed a concentration measurement task. To evaluate concentration, we employed EEG-based assessments.
In the research area of image super-resolution, Swin-transformer-based models are favored for their global spatial modeling and shifting window attention mechanism. However, existing methods often limit self-attention to nonoverlapping windows to cut costs and ignore the useful information that exists across channels. To address this issue, this paper introduces a novel model, the Hybrid Attention Aggregation Transformer (HAAT), designed to better leverage feature information. HAAT is constructed by integrating Swin-Dense-Residual-Connected Blocks (SDRCB) with Hybrid Grid Attention Blocks (HGAB). SDRCB expands the receptive field while maintaining a streamlined architecture, resulting in enhanced performance. HGAB incorporates channel attention, sparse attention, and window attention to improve nonlocal feature fusion and achieve more visually compelling results. Experimental evaluations demonstrate that HAAT surpasses state-of-the-art methods on benchmark datasets.
Escape routing is a critical task in the design of printed circuit boards (PCB) and integrated circuits (IC), aiming to establish effective connections among multiple pin points while avoiding path overlaps and interference from obstacles. This paper proposes a Deep Reinforcement Learning (DRL)-based solution utilizing a Deep Q-Network (DQN) to address the escape routing problem. The problem is modeled as a Markov Decision Process (MDP), and the agent learns effective routing strategies through interactions with the environment. Experiments were conducted in a 25×26 simulated grid environment, testing scenarios with 30, 60, and 90 pin points. This study highlights the potential of Deep Reinforcement Learning in solving escape routing problems, offering a novel approach to addressing routing challenges in PCB design.
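The DQN formulation can be sketched as follows, with the grid state flattened into a vector and four routing moves as actions; network sizes, the action set, and the discount factor are assumptions, not the paper's settings.

```python
import random
import torch
import torch.nn as nn

# Q-network mapping a flattened 25x26 grid observation to Q-values over four
# routing moves (up, down, left, right). Sizes are illustrative.
q_net = nn.Sequential(nn.Linear(25 * 26, 128), nn.ReLU(), nn.Linear(128, 4))
target_net = nn.Sequential(nn.Linear(25 * 26, 128), nn.ReLU(), nn.Linear(128, 4))
target_net.load_state_dict(q_net.state_dict())

def select_action(state, epsilon):
    # Epsilon-greedy exploration over the MDP's action space.
    if random.random() < epsilon:
        return random.randrange(4)
    with torch.no_grad():
        return q_net(state).argmax().item()

def td_loss(batch, gamma=0.99):
    states, actions, rewards, next_states, done = batch
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(1).values
    target = rewards + gamma * (1 - done) * q_next
    return nn.functional.mse_loss(q, target)
```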
In this study, we propose a method for generating images of buildings drawn with three-point perspective that appear more realistic than conventional perspective projection when the building is observed in real space. For one-point and two-point perspective, it has been shown that a strong sense of realism can be achieved by image generation methods that use a magnification function representing the sense of size as perceived in real space. For buildings in three-point perspective, it was necessary not only to use the magnification function but also to adjust the positions of the vanishing points, which determine the height of the horizon line.
Video super-resolution (VSR) is a technique for generating high-resolution (HR) frames from corresponding low-resolution (LR) frames. With advances in deep learning and the popularity of high-resolution display applications, VSR has drawn much attention in recent years. This paper presents an implementation and evaluation of the classical deep learning-based VSR framework TecoGAN (TEmporally COherent GAN), which learns temporal coherence via self-supervision for GAN (generative adversarial network)-based video generation. We found that when applying VSR to enhance video quality, noise inherent in the frames may be amplified as well, resulting in an unpleasant visual experience. To tackle this problem, we propose integrating noise removal with VSR to obtain a noise-removed HR video. Our improved VSR framework outperforms the original TecoGAN both quantitatively and qualitatively.
This study presents an innovative application utilizing image recognition technology to automatically identify values displayed on electronic blood pressure monitors. This study uses deep learning, specifically a Mask-RCNN-based algorithm, to accurately detect and recognize numerical readings on monitor panels. By incorporating advanced image processing techniques such as mask binarization, Canny edge detection, and Hough transform, the system corrects image distortions caused by varying camera angles, transforming them into standardized rectangular formats for precise recognition. Rigorous testing under diverse lighting and angle conditions ensured consistent high-precision results. This solution demonstrates significant potential for automating medical data management through image recognition technology.
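Once the four panel corners have been recovered (e.g., from the Mask-RCNN mask via binarization, Canny edges, and Hough lines), the rectification step reduces to a standard OpenCV homography, as in this sketch; the target panel size is an assumed value.

```python
import cv2
import numpy as np

def rectify_panel(image, corners):
    """Warp a skewed monitor panel to a standardized rectangle.
    'corners' are the four panel corners (TL, TR, BR, BL)."""
    w, h = 400, 300  # assumed target panel size in pixels
    src = np.float32(corners)
    dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (w, h))
```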
Converting rough sketches into line drawings is a mechanical and simple process in illustration production, so it is desirable to automate it. Previous methods for automatically generating line drawings from rough sketches produce only uniform line drawings, although rough sketches contain information about the strength of lines. In this paper, we propose a method that automatically converts rough sketches into non-uniform line drawings by estimating line thickness and direction from the rough sketch and adding this information to the uniform line drawing. Evaluation shows that the generated non-uniform line drawings reflect the thickness and direction of the rough sketch. While adding thickness manually takes about 20 to 30 minutes, the proposed method generates a non-uniform line drawing in 1 to 2 seconds, a significant reduction in processing time.
We aim to evaluate dribbling movements in soccer by analyzing the movement of both the player and the ball. For this purpose, we extended the existing skeletal estimation model, OpenPose, and propose a new method to recognize the posture of a human skeleton holding the ball, which we have named the "Dribbling Player Model." A distinctive feature of this model is that the ball is incorporated as part of the human skeleton. To train the model, we handcrafted a video database of dribbling actions: videos of players who were dribbling were annotated with a skeleton that includes the ball, and videos of players who were not dribbling were annotated without the ball. Our experiments confirmed that the model successfully estimates the postures of dribbling players.
This paper proposes methods for extracting attack scenes and replay scenes in a handball match video to reduce manual tagging time. The system identifies the attack scenes through goal object detection, allowing analysts to skip non-attack initiation scenes. The replay detection algorithm utilizes player size analysis within video frames. According to the experimental results, the attack scene extraction method achieved 100% accuracy, and implementing the skip function between attack initiation scenes reduced the manual tagging time for a one-hour video from approximately 30 minutes to 15 minutes. The replay extraction achieved 99.8% accuracy across 146,051 frames.
This paper discusses methods and techniques for underwater image restoration. Underwater images are often affected by factors such as light scattering, color dispersion, and suspended particles, leading to blurriness, distortion, and difficulty in recognizing features. To improve the quality of underwater images, researchers have proposed restoration techniques based on mathematical models and computational methods, including removal of scattered light, color correction, filtering, and contrast enhancement, to enhance the clarity and realism of the images. Additionally, deep learning techniques have shown significant progress in underwater image restoration: through training on large datasets, models can automatically learn and adapt to the restoration needs of different underwater environments. In summary, this study proposes a Vision Transformer-based UNet model for underwater image restoration, tested on the Large Scale Underwater Image (LSUI) dataset.
We propose a method for estimating the pose of a bicycle rider from images by integrating the relationship between bicycle parts and the human body and modeling them as a single skeleton for traffic analysis. The feature of this method is that the bicycle rider and the bicycle are represented as an integrated skeletal model instead of separating the skeletal representation of the bicycle parts and the human body. We call this the Bicycle Rider Model. To train the Bicycle Rider Model, we handcrafted our video dataset. In the dataset, we annotate the center of the handlebars, the crankshaft, and the front end of the front wheel as the new keypoints of the skeleton. We confirmed that the Bicycle Rider Model can estimate the pose of the bicycle rider.
Contrastive Language-Image Pre-training (CLIP) is vulnerable to adversarial attacks which cause misclassification by subtle modifications undetectable to the human eye. Although adversarial training strengthens CLIP models against such attacks, it often degrades their accuracy on clean images. To tackle this challenge, we propose a novel defense strategy that leverages human brain activity data. The proposed method combines features of brain activity with those of adversarial examples, which enhances the robustness of CLIP while maintaining high accuracy on clean images. Experimental results demonstrate the effectiveness of our method for the accurate retrieval of clean and adversarial images. These results highlight the potential ability of brain data to overcome the existing challenges of adversarial defense in foundation models.
This paper presents a method for detecting spike and toss events in a volleyball game video, addressing the time-consuming task for a less-experienced analyst. The approach combines analyzing ball trajectory and detecting player movements near the net. It uses TrackNet to detect and track the high-speed ball, and deep learning to detect player regions. The height of the ball is used to identify the toss event from peaks in the ball’s y-coordinate, and the sizes of the player regions are monitored to detect the spike event. The method achieved an F1-score of 0.82 in volleyball video evaluation, and it indicates high accuracy and potential to streamline the volleyball analysis.
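The toss-detection step described above amounts to peak-picking on the tracked ball's vertical coordinate; a minimal SciPy sketch follows, with prominence and spacing thresholds as assumed values.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_toss_frames(ball_y, min_height_px=50, min_gap_frames=30):
    """Candidate toss events are local extrema of the tracked ball height.
    With image coordinates (y grows downward), apexes are minima of y,
    so peaks are searched on the negated signal."""
    peaks, _ = find_peaks(-np.asarray(ball_y, dtype=float),
                          prominence=min_height_px, distance=min_gap_frames)
    return peaks  # frame indices of candidate tosses
```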
In soccer, the cognitive ability to understand a situation is essential. In this study, we use an HMD-type VR system combined with an EEG measurement device to record brain waves while a specific soccer situation is experienced in VR space, and we analyze the EEG characteristics during cognition. To obtain accurate EEG measurements, we must cope with the influence of body movements that occur while wearing the HMD, such as head rotation and gaze movement, which are typical of HMD VR experiences. We therefore built a preliminary system that measures EEG together with head rotation and gaze movement. Based on the alpha and beta band powers from EEG frequency analysis, we discuss the influence of body movements on EEG analysis during different types of pass experiences. We found that the band powers of active experiences are larger than those of passive experiences.
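The alpha and beta band powers used in the analysis can be computed from a Welch power spectral density estimate, as in this generic sketch; the sampling rate and band limits are conventional choices, not necessarily those of the study.

```python
import numpy as np
from scipy.signal import welch

def band_power(eeg, fs, band):
    """Average PSD power of one EEG channel within a frequency band (Hz)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    lo, hi = band
    mask = (freqs >= lo) & (freqs <= hi)
    return np.trapz(psd[mask], freqs[mask])

# alpha = band_power(channel, fs=256, band=(8, 13))
# beta  = band_power(channel, fs=256, band=(13, 30))
```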
Convolutional sparse coding (CSC) reconstructs a signal using given spatial patterns by identifying their sparse distributions through a convex optimization, achieving notable success in image processing applications. This paper proposes solving the recently proposed L1-L1 CSC by applying primal-dual Douglas-Rachford (DR) splitting, which can converge faster than the traditionally applied Alternating Direction Method of Multipliers (ADMM). The proposed CSC demonstrated comparable convergence under an intuitive hyperparameter setting, verifying the correctness of our formulation and suggesting the potential for convergence acceleration with optimal settings.
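The L1-L1 CSC referred to above is commonly written with an ℓ1 data-fidelity term in place of the usual ℓ2 term; under that assumed notation (signal s, given filters d_k, coefficient maps z_k, and * denoting convolution), the objective reads:

```latex
\min_{\{z_k\}} \; \Big\| s - \sum_{k=1}^{K} d_k * z_k \Big\|_1
\; + \; \lambda \sum_{k=1}^{K} \| z_k \|_1
```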
In sports training, coaching by instructors is a good way to learn appropriate movements. However, minor sports have fewer instructors than popular sports, and this shortage is a major problem that contributes to the decline of sports culture. Darts is a minor sport with few instructors, so most players must study by themselves using books and videos because suitable instructors are hard to find. In this paper, we propose a dart training system that considers the player's physical mechanics to help players improve their throwing form. We use machine learning algorithms to create a skeletal estimation model of the user and select a teacher model based on the user's physique, so that players can compare their own throwing motion with that of an instructor. Furthermore, we conduct experiments to evaluate the system's contribution to improving user skills.
This paper describes the restoration performance of neural network models for the 'missing pixel restoration method' used in lightweight compression codecs. In this method, pixels in video frames are selectively removed during transmission and restored using a neural network model. We evaluate the restoration performance based on 2×2-pixel missing patterns and various missing rates from 1/12 to 11/12. The results indicate a significant decline in restoration performance when the missing rate exceeds 6/12.
This paper explores a phase component compression method for computer-generated hologram data. The method performs template matching at each pixel to exploit spatial correlations of the phase component. Then, the probability distribution of the phase component is modeled to enable efficient entropy coding. In this framework, the discontinuous nature of phase values mapped to integers can deteriorate the coding efficiency. To cope with this problem, in this paper, both the template matching and the probability modeling processes are modified considering the circular distribution characteristics of the phase component.
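The circular-distribution issue mentioned above can be handled by wrapping prediction residuals to the shortest arc, as in this small numpy sketch; the number of phase levels is an assumed parameter.

```python
import numpy as np

def wrapped_residual(pred, actual, levels=256):
    """Difference between predicted and actual quantized phase values,
    wrapped to the shortest arc on the circle of 'levels' phase bins.
    This keeps residuals small even when the phase crosses the 0/2*pi seam."""
    diff = (actual - pred) % levels
    return np.where(diff > levels // 2, diff - levels, diff)

# Example: phase bins 255 and 1 differ by +2 on the circle, not -254.
# wrapped_residual(np.array([255]), np.array([1]))  -> array([2])
```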
Red tides are phenomena caused by the abnormal proliferation of marine plankton, leading to massive fish deaths and significant damage to the fishing industry. Currently, detection and quantification of plankton responsible for red tides are performed primarily through manual inspection using optical microscopy, which requires considerable time, effort, and expertise in species identification. This study explored the use of object detection methods to classify various marine plankton species from microscopy images and attempted to automate the detection of red tide phytoplankton.
3D human pose estimation (HPE) has improved significantly through Graph Convolutional Networks (GCNs), which effectively model body part relationships. However, GCNs have limitations, including uniform feature transformations across nodes and reliance on skeleton-based graphs that may miss complex motion patterns. To address these issues, we introduce a Multi-Normalization Residual Graph Convolutional Network that fine-tunes the graph structure through affinity multi-normalization and activation, allowing the representation of additional connections beyond the skeleton. Our extensive ablation study shows that this approach enhances performance with minimal overhead while maintaining the same model size, consistently outperforming state-of-the-art techniques on two benchmark datasets.
Anemia is one of the most common social problems, as one in ten people is said to be anemic. There are several methods to diagnose anemia, such as color observation of the eyelid conjunctiva and blood sampling, but both require specialized knowledge and are not easy to perform in terms of both time and cost. In this paper, we propose a new method for estimating anemia from facial images captured by a hyperspectral camera. First, the spectrum of the eyelid conjunctiva region is obtained from the hyperspectral image. Next, the hemoglobin concentration is estimated by finding the ratio of reflectance in a specific wavelength band. Experiments with different water contents of blood-like samples confirmed the possibility of concentration estimation.
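The reflectance-ratio step can be sketched as below; the two wavelength bands are placeholders, since the paper's calibrated bands are not given here.

```python
import numpy as np

def hemoglobin_index(hsi_cube, wavelengths, band_a=(540, 580), band_b=(650, 700)):
    """Ratio of mean reflectance in two wavelength bands (nm) over the eyelid
    conjunctiva region of a hyperspectral cube of shape (H, W, bands).
    The band limits here are illustrative, not the paper's calibrated bands."""
    wl = np.asarray(wavelengths)
    a = hsi_cube[..., (wl >= band_a[0]) & (wl <= band_a[1])].mean()
    b = hsi_cube[..., (wl >= band_b[0]) & (wl <= band_b[1])].mean()
    return a / b
```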
This paper presents a new approach to digital marbling, an artistic technique of floating ink on water to create unique patterns. Traditional marbling involves expressive techniques such as blowing on the ink to influence its movement, but digital systems have not fully replicated this effect. This study enables interaction by blowing, allowing more intuitive manipulation. The system uses an LCD tablet that serves as a simulated water surface with ink, and a camera that detects facial landmarks. The obtained coordinates are used to solve the PnP (Perspective-n-Point) problem to estimate the position and direction of the user's breath on the tablet's surface. We tested the system with an LCD tablet measuring 26.8 cm in height and 47.6 cm in width. The system estimated the position of the blow with an average error of approximately 3 cm and the blowing direction with an average error of approximately 13 degrees, although accuracy decreases when blowing near the corners of the tablet. Additionally, the system cannot yet detect the onset and cessation of blowing. Future work will focus on addressing these two issues.
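The PnP step maps detected 2D facial landmarks to a 3D head model to recover head pose; an OpenCV sketch follows, where the landmark coordinates and camera intrinsics are illustrative values rather than the system's calibrated parameters.

```python
import cv2
import numpy as np

# Generic 3D head-model landmarks (mm) and their detected 2D pixel positions;
# both sets here are illustrative placeholders for the detector's output.
model_points = np.float32([[0, 0, 0],        # nose tip
                           [0, -63, -12],    # chin
                           [-43, 32, -26],   # left eye corner
                           [43, 32, -26],    # right eye corner
                           [-28, -28, -24],  # left mouth corner
                           [28, -28, -24]])  # right mouth corner
image_points = np.float32([[320, 240], [318, 330], [260, 190],
                           [380, 190], [285, 290], [355, 290]])

camera_matrix = np.float32([[800, 0, 320], [0, 800, 240], [0, 0, 1]])
dist_coeffs = np.zeros(5)  # assume negligible lens distortion

ok, rvec, tvec = cv2.solvePnP(model_points, image_points,
                              camera_matrix, dist_coeffs)
# rvec and tvec give the head pose; intersecting the mouth's forward ray with
# the tablet plane yields the estimated blow position and direction.
```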
We previously implemented an inexpensive imaging system that combines a single real camera with a mirror array located along a paraboloid. It allows us to robustly acquire dynamic light fields composed of multi-view videos by providing a virtual camera array whose viewpoints exist in the mirrors. When the real camera is moved to the focus of the paraboloid, the virtual viewpoints become equally spaced, achieving multi-view imaging with structured disparity. In this paper, toward such improved multi-view imaging, we discuss two methods for pose estimation of the real camera in which reflections of a checkerboard on the mirrors are analyzed. Both methods determine the virtual viewpoints by Zhang's method and utilize them to estimate the poses of the real camera and the mirror array. Experimental results from simple simulations demonstrate that more ideal multi-view imaging of 3D scenes is obtained after rectifying the poses based on the estimation.
In this study, we developed a method aimed at making virtual objects appear more realistic in real space. The method allows users to interactively paint colors onto a 3DCG object displayed three-dimensionally in mid-air by touching it with their fingers. The color applied to the CG object reflects the color of the touching finger. Because the movements and color of the finger in real space are directly reflected onto the CG object in virtual space, the boundary between the virtual and real spaces is expected to blur, enhancing the sensation that the CG object exists in the real world.
This paper introduces a Channel Attention-based Bilateral Feature Pyramid U-Net (CABFPU-Net) for efficient change detection in remote sensing. CABFPU-Net comprises three main networks: a backbone, a neck, and a head. The backbone, based on DenseNet, extracts primary features from input images. The neck network leverages channel attention to efficiently process multi-scale features, accentuating regions of change and culminating in the generation of multi-scale change attention features. This attention mechanism efficiently extracts relevant features by applying channel-wise attention, reducing dimensionality and enabling faster change detection. Finally, the head network integrates these features to produce a detailed change map. CABFPU-Net achieves a 53.9% reduction in processing time compared to CADNet on the LEVIR-CD dataset while maintaining an F1-score of 91.3%, demonstrating both efficiency and accuracy.
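As one concrete form of channel attention, a squeeze-and-excitation style block is sketched below in PyTorch; CABFPU-Net's exact block design may differ.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global pooling
    produces per-channel statistics, a small MLP turns them into weights,
    and the input features are rescaled channel-wise."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                     # x: (N, C, H, W)
        w = self.mlp(x.mean(dim=(2, 3)))      # (N, C) channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)
```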
With the standardization of VVC completed in July 2020, the Joint Video Experts Team of ISO/IEC SC29/WG05 launched the "Beyond VVC" exploration by developing the Enhanced Compression Model (ECM), which continues to integrate newly developed advanced coding tools. In inter prediction, the Advanced Motion Vector Prediction (AMVP) method encodes motion vectors using a Motion Vector Predictor (MVP) selected from an MVP candidate list. In ECM version 11.0, unless the zero MV is added at the first position of the candidate list, which happens only when at least one neighboring block has a zero MV, the zero MV has a very low probability of being selected as the MVP, potentially reducing coding efficiency. To overcome this limitation, we investigate a method that inserts the zero MV into the list whenever the list is not already full, so that the zero MV is subject to the MVP refinement and reordering processes. Experimental results demonstrate a BDBR gain of 0.03% in the luma channel.
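The proposed list-construction change can be illustrated schematically; this is a simplified sketch of the idea, not ECM source code.

```python
ZERO_MV = (0, 0)

def build_mvp_candidates(neighbor_mvs, max_size):
    """Sketch of the modified list construction: deduplicated neighbor MVPs
    first, then the zero MV is appended whenever the list is not yet full,
    so it can take part in the later refinement and reordering stages."""
    candidates = []
    for mv in neighbor_mvs:
        if mv not in candidates and len(candidates) < max_size:
            candidates.append(mv)
    if ZERO_MV not in candidates and len(candidates) < max_size:
        candidates.append(ZERO_MV)
    return candidates
```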
In this paper, we investigate an explicit transform selection method that merges the transform sets of the two intra prediction modes used for predictor blending, aimed at the new blending intra prediction (BIP) tools (i.e., DIMD, TIMD, OBIC) for enhanced compression beyond VVC capability. The existing multiple transform selection (MTS) mechanism in ECM is designed around the characteristics of regular intra prediction modes, which differ from those of BIP modes, so applying the existing transform-set selection method to BIP modes is ineffective. The proposed method merges the transform sets corresponding to the two intra prediction modes used for predictor generation and enables the decoder to decide which transform kernel pair to use without explicit signaling, allowing the use of more diverse transform kernel pairs. Experimental results show BD-rate changes of -0.01%, 0.00%, and 0.01% for the Y, Cb, and Cr components, respectively, compared to the ECM 13.0 test model under the All Intra (AI) configuration. These results emphasize the clear need for an improved MTS scheme tailored to BIP tools.
In recent years, with the rapid development of artificial intelligence technology, many computer vision applications have been proposed, including Virtual Reality (VR), Augmented Reality (AR), and the metaverse. One of the key technologies for realizing these applications is human 3D mesh reconstruction. This paper addresses the task of reconstructing a human 3D mesh model from a single RGB image, focusing on achieving good reconstruction results while reducing computational costs, thereby establishing advantages for future daily-life applications. We propose an improved version of the HyperGraph Convolution Network (HGCN), called the Swift HyperGraph Convolution Network (Swift-HGCN), which allows faster transmission of information across different parts of the human mesh model. Additionally, we apply the Mamba module to address the high computational complexity caused by the self-attention mechanism in Transformers, while still maintaining good accuracy. Moreover, our system analyzes multi-scale image features and performs multi-stage refinement to reduce reconstruction errors. In the experimental results, our method showed an average vertex position error 1.3 mm higher than a baseline method, but used only 86.5% of the parameters and just 17.3% of the computational complexity. This demonstrates that our approach is more suitable for environments with limited computational resources, such as embedded systems.
In ball sports, understanding spatial information, such as player positions and open spaces, is crucial for deciding when and where to pass the ball. This study develops a novel training support system for improving players' spatial cognition and skill levels. The system employs a 360° camera positioned near the passer's viewpoint and recreates the scene via a virtual reality device. By superimposing the dominant region (the area a target player can reach faster than any other player) and the space in which a pass can be received onto the 360° images, the scene obtained from the passer's viewpoint can be replayed from an arbitrary line of sight. Furthermore, the system provides additional spatial information that evolves with player movements in real time. This study proposes a novel method for extracting and visualizing spatial information and presents examples of its applications.
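The dominant region defined above can be approximated by comparing straight-line arrival times over a grid of pitch locations, as in this simplified numpy sketch; constant per-player speeds are an assumption, not the system's kinematic model.

```python
import numpy as np

def dominant_region(positions, speeds, grid_x, grid_y):
    """Label each pitch location with the player who can reach it first,
    assuming straight-line movement at a constant per-player speed."""
    X, Y = np.meshgrid(grid_x, grid_y)
    times = [np.hypot(X - px, Y - py) / v
             for (px, py), v in zip(positions, speeds)]
    return np.argmin(np.stack(times), axis=0)  # player index per grid cell

# region = dominant_region([(10, 5), (25, 20)], [7.0, 8.0],
#                          np.linspace(0, 40, 200), np.linspace(0, 30, 150))
```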
The key objective in video coding is to eliminate redundant information. To address spatial redundancy, a representative form of redundancy, intra prediction utilizes spatially neighboring samples: techniques such as directional prediction, non-directional prediction, and offline-trained matrix-based prediction are currently applied in the exploration beyond VVC capability. To achieve better intra prediction, both improvements to existing modes and the adoption of new tools have been implemented. Nevertheless, tools based on directional intra modes still rely only on local information to predict the current block. To improve their predictors, this paper proposes a non-local refinement method for predictors coded with a directional intra mode. A block vector derived from samples adjacent to the current block is used to construct a non-local neighbor area, and the refinement of the predictor is conducted over this area on a row and column basis using predefined weights. Our experiment is conducted under the all-intra configuration using the first 65 frames of the natural-content sequences of ECM-11.0. The results show a 0.02% gain in the Y channel, but losses of 0.04% and 0.09% in the Cb and Cr channels, respectively.
Sound field visualization helps us understand complicated sound propagation. However, visualizing a sound field in detail is difficult because it requires many measurement points. In this study, based on physical models and deep learning, we propose a method for visualizing the scattered sound field around a rigid object, including non-spherical geometries, with a small number of microphones. In simulation experiments on two-dimensional sound fields, the proposed method improved estimation accuracy by introducing boundary conditions in addition to the wave equation.
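One common physics-informed formulation of this idea works in the frequency domain, penalizing the Helmholtz residual (the time-harmonic form of the wave equation) on interior points and the zero normal pressure gradient on the rigid boundary; a PyTorch sketch of the PDE residual follows, which may differ from the paper's exact formulation.

```python
import torch

def helmholtz_residual(net, xy, k):
    """PDE residual for a time-harmonic sound field: (lap + k^2) p = 0,
    where 'net' maps coordinates (x, y) to the pressure estimate p."""
    xy = xy.clone().requires_grad_(True)
    p = net(xy)
    grads = torch.autograd.grad(p.sum(), xy, create_graph=True)[0]
    lap = 0.0
    for i in range(2):  # second derivatives w.r.t. x and y
        lap = lap + torch.autograd.grad(grads[:, i].sum(), xy,
                                        create_graph=True)[0][:, i]
    return lap + (k ** 2) * p.squeeze(-1)

# total loss = data misfit at the microphone positions
#            + w_pde * helmholtz_residual(...)**2 on interior points
#            + w_bc  * (normal derivative of p)**2 on the rigid boundary
```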
In this paper, we investigate the separable transform skip condition in the current ECM, which selectively permits the identity matrix to be used as a separable transform kernel; using the identity matrix is equivalent to skipping the transform in the selected horizontal or vertical direction. Noting that transform skip is utilized significantly less often under the blending intra prediction (BIP) modes (i.e., DIMD, OBIC, TIMD) than under regular intra prediction in the current ECM, even though transform skip is effective for encoding screen content, we modify the separable transform skip (STS) condition to increase its usage specifically for the BIP modes. In our experiment with screen-content video sequences (classes F and TGM), the modified STS condition achieves BDBR gains of -0.02%, 0.00%, and 0.01% in the Y, Cb, and Cr components, respectively. For natural-content videos (classes B, C, D, E), it has no negative impact, showing BDBR results of -0.00%, 0.03%, and -0.01% for the Y, Cb, and Cr components, respectively.