With the development of Deep Reinforcement Learning (DRL), intelligent decision-making in asymmetric information games has become realizable. However, it remains a challenging task in DRL because of the difficulty of building efficient exploration and action-reward mechanisms, especially in environments with multiple targets. To address this problem, a method for generating multi-target attacking strategies based on environment-aware DRL is proposed in this paper. The proposed method consists of two stages in the agent learning process. The first stage is an environment-aware module that predicts the motion trajectories of multiple targets using optical flow estimation. The second stage is a decision-making module that predicts appropriate actions, such as choosing angles and attacking, using an improved Deep Q-Network (DQN). To solve the problem of sparse rewards in the learning process, the motion trajectories predicted in the first stage are used to build reward trajectories that accelerate the convergence of the algorithm in the second stage. The experiments indicate that the proposed method can effectively generate multi-target attacking strategies in our self-built simulation environment. Our method also provides a novel perspective on intelligent decision-making in three-dimensional space.
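As a rough illustration of how predicted trajectories could densify the sparse attack reward, the following sketch combines a dense trajectory-following term with the sparse hit reward; the distance metric, weights, and function names are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch (not the authors' implementation) of trajectory-based
# reward shaping for a DQN agent: the predicted target trajectory supplies a
# dense term, while the hit reward stays sparse.
import numpy as np

def shaped_reward(agent_pos, action_hit, predicted_trajectory,
                  w_dense=0.1, w_hit=10.0):
    """Combine a sparse hit reward with a dense trajectory-following term."""
    # Dense term: negative distance to the target's predicted next position,
    # so the agent is rewarded for closing in even before any hit occurs.
    next_target_pos = np.asarray(predicted_trajectory[0])
    dense = -np.linalg.norm(np.asarray(agent_pos) - next_target_pos)
    # Sparse term: a large positive reward only when an attack actually connects.
    sparse = w_hit if action_hit else 0.0
    return w_dense * dense + sparse

# Example: the agent is 3 units from the predicted target position and misses.
r = shaped_reward(agent_pos=[0.0, 0.0, 0.0], action_hit=False,
                  predicted_trajectory=[[3.0, 0.0, 0.0]])
```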
With the development of stereoscopic display technology, large immersive 3D spaces can be reconstructed more and more easily, and a high-efficiency 3D spatial interaction method is becoming increasingly urgent. Gesture interaction, as one of the most natural and efficient forms of human-computer interaction, can convey information quickly and efficiently. However, the effective interaction distance of most existing gesture interaction methods is less than one meter, which cannot meet the demands of long-distance 3D spatial interaction. In this paper, an efficient network named Gesture YOLO is proposed for long-distance gesture detection, achieving small gesture object detection with improved accuracy. Gesture YOLO contains two modules: a Dual CSPDarknet53-tiny Backbone module that fuses person features and gesture features, and a Progressive Multi-Scale Feature Fusion module that enhances the output features. Experimental results on our test set show that Gesture YOLO achieves higher gesture detection accuracy than YOLOv4-tiny at distances ranging from 2 m to 5 m, and mitigates the sharp drop in detection accuracy as the distance increases.
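A minimal PyTorch sketch of the dual-backbone idea follows: one branch processes the person region and the other the gesture region, and their feature maps are fused by channel concatenation. The layer sizes and the concatenation-based fusion are illustrative assumptions, not the exact Gesture YOLO architecture.

```python
import torch
import torch.nn as nn

def tiny_backbone(in_ch=3, out_ch=64):
    # Stand-in for a CSPDarknet53-tiny branch: a few strided convolutions.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
        nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
    )

class DualBackboneFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.person_branch = tiny_backbone()
        self.gesture_branch = tiny_backbone()
        self.fuse = nn.Conv2d(128, 128, 1)  # 1x1 conv after channel concat

    def forward(self, person_img, gesture_img):
        p = self.person_branch(person_img)
        g = self.gesture_branch(gesture_img)
        return self.fuse(torch.cat([p, g], dim=1))

# Example: two 256x256 RGB inputs produce a fused feature map at 1/4 resolution.
fused = DualBackboneFusion()(torch.randn(1, 3, 256, 256),
                             torch.randn(1, 3, 256, 256))
```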
KEYWORDS: Video, 3D image processing, Video acceleration, 3D displays, 3D acquisition, Image processing, RGB color model, Solids, Networks, Image quality
Matting is a method for extracting foreground objects of arbitrary shape from an image. In the field of 3D display, matting technology is of great significance: with high-quality foreground extraction, unnecessary stereo matching computation can be reduced and the 3D display effect improved. This paper focuses on human targets in the 3D light field and proposes a real-time multi-view background matting algorithm based on deep learning. Live 3D video broadcasting places high demands on the real-time performance of the matting algorithm. We pre-compose a group of multi-view images captured at the same time into a single multi-view combined image. The network performs background matting directly on this combined image and outputs a group of foreground images at one time. Because the background of the multi-view combined image is not holistic, a pre-photographed background picture without a human is added to the input to assist the network's learning. In addition, we add a channel subtraction module to help the network better understand the roles of the original image and the background image in the matting task. The method is evaluated on our multi-view dataset. For images with different background complexity, it runs at about 65 frames per second while maintaining relatively stable accuracy. The method can efficiently generate multi-view matting results and meets the requirements of live 3D video broadcasting.
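The following sketch illustrates, under stated assumptions, the input composition and channel-subtraction idea: views captured at the same instant are tiled into one combined image, and its difference from the pre-captured empty background is appended as extra input channels. The tile layout and channel ordering are assumptions rather than the paper's exact design.

```python
import numpy as np

def compose_views(views):
    """Tile a list of same-sized HxWx3 view images horizontally."""
    return np.concatenate(views, axis=1)

def matting_input(views, background_views):
    combined = compose_views(views).astype(np.float32) / 255.0
    background = compose_views(background_views).astype(np.float32) / 255.0
    # Channel subtraction: the residual highlights foreground (human) regions.
    residual = combined - background
    # Stack the combined views, the background, and the residual as a
    # 9-channel network input.
    return np.concatenate([combined, background, residual], axis=2)
```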
KEYWORDS: 3D displays, Visualization, Information visualization, 3D metrology, Stereoscopic displays, 3D image processing, 3D image reconstruction, Statistical analysis, Radar, Glasses
The 3D light field display (LFD), which reconstructs the light field emitted by objects in the real world, provides a natural, glasses-free 3D visual experience consistent with how humans observe the real world, and has attracted increasing attention in recent years. However, most research on 3D LFD focuses on improving display performance, and there is little research on evaluating 3D LFD performance. In this paper, a 3D LFD performance evaluation method is proposed based on performance measurement in an air traffic control (ATC) task. The ATC task is a comprehensive visual cognitive task in which controllers need to obtain a large amount of visual information to identify aircraft collision situations in a dynamic scene as quickly as possible. The experimental independent variable was the display condition: a 2D display and a 3D LFD with a viewing angle of 85 degrees. The number of misjudgments and the time taken to complete the task were counted for each group. To reduce the influence of individual differences and learning effects on the 2D/3D comparison, 15 subjects were tested in random order, with an interval of 5-7 days between the two experiments for each person. The superiority of the 3D LFD is demonstrated quantitatively by comparing the results under the different display conditions using a t-test.
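As an illustration of the statistical comparison, the sketch below contrasts hypothetical completion times under the two display conditions with a paired-samples t-test; the numbers are placeholders, and the paired design is an assumption based on the within-subject setup (the same 15 subjects viewed both displays).

```python
from scipy import stats

times_2d = [42.1, 39.8, 45.3, 41.0, 38.7]   # hypothetical task times (s), 2D display
times_3d = [35.2, 33.9, 37.5, 34.1, 32.8]   # hypothetical task times (s), 3D LFD

# Paired-samples t-test across the same subjects under both conditions.
t_stat, p_value = stats.ttest_rel(times_2d, times_3d)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 suggests a reliable difference
```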
KEYWORDS: RGB color model, 3D displays, Visualization, 3D modeling, Image processing, Eye models, Visual process modeling, Feature extraction, Data modeling, 3D visualizations
The human visual attention mechanism enables humans to acquire the most important cues from a large amount of information. However, most methods that simulate human visual attention focus on 2D displays, and little is known about how humans allocate visual attention on a 3D display. This paper first builds a saliency dataset consisting of human eye-fixation data for different 3D scenes under quick glances, which captures the distribution of human visual attention. Based on an analysis of this dataset, a convolutional neural network approach is proposed for predicting human visual attention on a 3D light field display. The network is composed of three parts: two-way feature extraction, feature fusion, and prediction output. Compared with saliency prediction models for 2D display devices, the proposed model predicts the distribution of human visual attention under the 3D light field more accurately. This research supports further investigation of 3D applications such as 3D device evaluation and 3D content production.
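A minimal PyTorch sketch of the three-part structure, two feature-extraction branches, feature fusion, and a saliency-map output, is shown below. Treating the two branch inputs as a color image and a depth or disparity map is an assumption; the paper's actual branch inputs and layer configuration are not specified here.

```python
import torch
import torch.nn as nn

class SaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        def branch(in_ch):
            # One feature-extraction branch: two conv layers with ReLU.
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.rgb_branch = branch(3)     # color view
        self.depth_branch = branch(1)   # assumed depth/disparity input
        self.fusion = nn.Conv2d(128, 64, 3, padding=1)
        self.head = nn.Conv2d(64, 1, 1)  # per-pixel saliency output

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return torch.sigmoid(self.head(torch.relu(self.fusion(f))))

# Example: a 128x128 RGB view and its depth map produce a 128x128 saliency map.
sal = SaliencyNet()(torch.randn(1, 3, 128, 128), torch.randn(1, 1, 128, 128))
```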
Recent research on object segmentation mostly concentrates on single-view images or objects in 3D settings. In this paper, a novel method for efficient multi-view foreground object segmentation is presented, using spatial consistency across adjacent views as a constraint to generate consistent masks. Even though conventional segmentation results at different views are relatively accurate, there are always inconsistent regions where the mask boundaries differ over the same area across views. The central idea of our method is to use the camera parameters to guide a refocusing procedure, during which each instance is refocused across views using multi-view projections. The refocused images are then fed to an instance segmentation network to predict bounding boxes and object masks. In the final step, the network output serves as prior information for Gaussian mixture models (GMMs) to achieve more accurate segmentation. While many concrete implementations of this general idea are feasible, satisfactory results can already be achieved with this simple and efficient approach. Experimental results demonstrate both qualitatively and quantitatively that the proposed method produces excellent results with fewer background pixels, which in turn improves the final 3D display quality. We hope this simple and effective method can be of help to future research on related tasks.
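The final GMM refinement step could look like the following sketch, in which the network's predicted mask initializes GMM-based foreground/background models via OpenCV's GrabCut; the probable/confident labelling thresholds are assumptions for illustration.

```python
import cv2
import numpy as np

def refine_with_gmm(image_bgr, predicted_mask, iterations=5):
    """image_bgr: HxWx3 uint8 image; predicted_mask: HxW float in [0, 1]."""
    # Seed the GrabCut mask from the network prediction.
    mask = np.full(predicted_mask.shape, cv2.GC_PR_BGD, dtype=np.uint8)
    mask[predicted_mask > 0.5] = cv2.GC_PR_FGD   # probable foreground
    mask[predicted_mask > 0.9] = cv2.GC_FGD      # confident foreground
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)
    # Return a binary mask keeping definite and probable foreground pixels.
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
```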
KEYWORDS: Video, 3D displays, Video coding, RGB color model, Video compression, 3D video compression, Display technology, Computer programming, 3D acquisition, Video processing
3D light field display technology with a large viewing angle, dense viewpoints, and high resolution can achieve high-dynamic reconstruction of real-world scene light fields, and is an important direction for the development of future display technology. Applications of advanced 3D light field display technology are attracting more and more attention. However, the current lack of efficient ways to obtain large-viewing-angle, dense-viewpoint, high-resolution 3D light field content limits its development and application. To this end, this paper proposes a dense-viewpoint video generation, coding, and transmission method, and builds a super multi-view 3D light field video live broadcast system. The method and system consist of 3D light field video generation, light field video coding, light field video transmission, and a display part, and can provide viewers with a real-time, high-quality 3D visual experience.
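As one possible illustration of packing dense-viewpoint content for coding and transmission, the sketch below tiles same-instant frames from many viewpoint cameras into a single frame that a conventional video encoder could then compress as one stream; the grid layout is an assumption and the paper's actual coding scheme is not reproduced here.

```python
import numpy as np

def tile_viewpoints(frames, cols):
    """frames: list of HxWx3 arrays (one per viewpoint), tiled row-major."""
    rows = -(-len(frames) // cols)  # ceiling division
    h, w, c = frames[0].shape
    canvas = np.zeros((rows * h, cols * w, c), dtype=frames[0].dtype)
    for i, f in enumerate(frames):
        r, col = divmod(i, cols)
        canvas[r*h:(r+1)*h, col*w:(col+1)*w] = f
    return canvas

# Example: 24 viewpoints of 540x960 frames packed into a 4x6 grid,
# ready to be handed to a standard video codec as a single frame.
grid = tile_viewpoints([np.zeros((540, 960, 3), np.uint8)] * 24, cols=6)
```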