Images captured by mobile camera systems are subject to distortions that can be irreversible. These distortions have various sources, including sensor imperfections, lens defects, and shutter inefficiency. One form of image distortion is associated with high Parasitic-Light Sensitivity (PLS) in global-shutter CMOS image sensors (GS-CIS) in a moving camera system. The resulting distortion appears as widespread semi-transparent purple artifacts, or a complex purple fringe, covering a large area of the scene around high-intensity regions. Most earlier approaches to purple fringing address only the simplest forms of this distortion and rely on heuristic image-processing algorithms. Recently, machine learning methods have shown remarkable success in many image restoration and object detection problems; nevertheless, they have not been applied to complex purple-fringe detection or correction. In this paper, we present our exploration and deployment of deep learning algorithms in a pipeline for detecting and correcting the purple fringing induced by high-PLS GS-CIS sensors. Experiments show that the proposed methods outperform state-of-the-art approaches for both detection and color restoration: we achieve a final MS-SSIM of 0.966 on synthetic data and a distortion classification accuracy of 96.97%. We further discuss the limitations of the proposed methods and possible improvements.
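As a rough illustration of such a detect-then-correct pipeline (a minimal sketch only; the module structure, layer sizes, and names below are invented and are not the networks from the paper), the detection stage can predict a per-pixel fringe mask that gates a color-restoration stage:

```python
import torch
import torch.nn as nn

class FringeDetector(nn.Module):
    """Toy stand-in for the detection stage: predicts a per-pixel
    probability that a pixel belongs to a purple-fringe region."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, rgb):
        return self.net(rgb)                     # (B, 1, H, W) fringe mask

class ColorRestorer(nn.Module):
    """Toy stand-in for the correction stage: predicts a color residual
    that is applied only inside the detected fringe mask."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, rgb, mask):
        residual = self.net(torch.cat([rgb, mask], dim=1))
        return rgb + mask * residual             # correct masked pixels only

detector, restorer = FringeDetector(), ColorRestorer()
image = torch.rand(1, 3, 128, 128)               # dummy input frame
restored = restorer(image, detector(image))
print(restored.shape)                            # torch.Size([1, 3, 128, 128])
```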
This paper proposes a growing-based floor-plan generation method that creates the global layout of buildings from noisy point clouds obtained by a stereo camera. We introduce a PCA-based line-growing concept with a subsequent filtering step, which robustly handles the high noise levels in the input point clouds. Experimental results show that this method outperforms state-of-the-art techniques in floor-plan generation: compared to the previous best floor-plan generation method, the average F1 score for building layouts increases from 0.38 to 0.66 on our test dataset. Furthermore, the resulting floor plans are thousands of times smaller in memory size than the input point clouds, while still preserving the main building structures.
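A minimal sketch of the PCA-based line-growing idea, assuming 2-D points projected from the input cloud (the thresholds and the greedy growth loop below are illustrative, not the paper's exact procedure):

```python
import numpy as np

def pca_line(points):
    """Fit a line to 2-D points: returns (centroid, unit direction),
    the direction being the first principal component."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[0]

def grow_line(points, seed_idx, dist_tol=0.05, seed_size=20, rounds=10):
    """Greedy line growing: fit a PCA line to a seed neighbourhood and
    iteratively absorb all points within dist_tol of the fitted line."""
    dists = np.linalg.norm(points - points[seed_idx], axis=1)
    member = np.zeros(len(points), dtype=bool)
    member[np.argsort(dists)[:seed_size]] = True   # seed neighbourhood
    for _ in range(rounds):
        c, d = pca_line(points[member])
        offsets = points - c
        perp = offsets - np.outer(offsets @ d, d)  # component off the line
        new_member = np.linalg.norm(perp, axis=1) < dist_tol
        if (new_member == member).all():           # converged
            break
        member = new_member
    return member, pca_line(points[member])

rng = np.random.default_rng(5)
pts = rng.normal(size=(500, 2)) * [1.0, 0.02]      # noisy wall-like strip
inliers, (c, d) = grow_line(pts, seed_idx=0)
print(inliers.sum(), "of", len(pts), "points on the grown line")
```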
Reliable detection of vessels of multiple types and orientations is of vital importance for maritime surveillance. We develop three separate convolutional neural network (CNN) models: one for high-performance single-class vessel detection and two for multiclass vessel-type and vessel-orientation detection. We also propose a modular combined network, which enhances the multiclass operation. The initial three models provide reliable F1 scores of 85%, 82%, and 76%, respectively. In addition, the modular combined approach improves the F1 scores for multitype and orientation vessel detection by 2% and 3%, respectively. Training and testing were performed on a dataset with multitype/orientation annotations, covering 31,078 vessel labels (10 vessel types and 5 orientations), which is made publicly available.
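The internals of the modular combined network are not detailed here, but as a generic illustration of combining several single-class detectors into one multiclass output, the per-model detections can be pooled and overlaps resolved by non-maximum suppression (a sketch under assumed box/score formats; not the paper's architecture):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = np.maximum(a[:2], b[:2])
    x2, y2 = np.minimum(a[2:], b[2:])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def merge_detections(per_model, iou_thr=0.5):
    """Pool (box, score, label) triples from several single-class
    models and keep the highest-scoring box per overlapping group."""
    pooled = sorted((d for dets in per_model for d in dets),
                    key=lambda d: -d[1])
    kept = []
    for box, score, label in pooled:
        if all(iou(np.array(box), np.array(k[0])) < iou_thr for k in kept):
            kept.append((box, score, label))
    return kept

# two hypothetical single-class models, each emitting (box, score, label)
model_a = [((10, 10, 60, 40), 0.9, "cargo")]
model_b = [((12, 11, 62, 42), 0.7, "tanker"), ((100, 80, 140, 120), 0.8, "tug")]
print(merge_detections([model_a, model_b]))
```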
Person re-identification (re-ID) is a valuable tool for multi-camera tracking of persons. Until now, research on person re-ID has mainly focused on the closed-set case, where a given query is assumed to always have a correct match in the gallery set; this assumption does not hold in practical scenarios. In this study, we explore the open-set person re-ID problem, in which queries are not always included in the gallery set. First, we convert the popular closed-set person re-ID datasets into the open-set scenario. Second, we compare the performance of six state-of-the-art closed-set person re-ID methods under open-set conditions. Third, we investigate the impact of a simple and fast statistics-driven gallery refinement approach on open-set person re-ID performance. Extensive experimental evaluations show that gallery refinement increases the performance of existing methods in the low false-accept rate (FAR) region, while simultaneously reducing the computational demands of retrieval. Results show an average detection and identification rate (DIR) increase of 7.91% and 3.31% on the DukeMTMC-reID and Market1501 datasets, respectively, for an FAR of 1%.
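A minimal sketch of a statistics-driven gallery refinement step (the particular statistic, the mean minus a fraction of the standard deviation of the query-gallery distances, is an assumption for illustration):

```python
import numpy as np

def refine_gallery(query, gallery, k=0.5):
    """Discard gallery embeddings whose distance to the query exceeds
    a statistics-driven threshold (mean - k * std of all distances;
    the exact statistic here is an illustrative choice)."""
    d = np.linalg.norm(gallery - query, axis=1)
    keep = d < d.mean() - k * d.std()
    if not keep.any():                  # degenerate case: keep the best match
        keep[np.argmin(d)] = True
    return np.flatnonzero(keep), d

rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 128))              # gallery embeddings
query = gallery[42] + 0.1 * rng.normal(size=128)    # near-duplicate of item 42
kept, dists = refine_gallery(query, gallery)
print(len(kept), "of", len(gallery), "gallery items retained")
```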
In this work, we present a camera geopositioning system based on matching a query image against a database of panoramic images. For matching, our system uses memory vectors aggregated from global image descriptors based on convolutional features, which facilitate fast searching in the database. To further speed up the search, a clustering algorithm balances geopositioning accuracy against computation time. We refine the position obtained from the query image using a new outlier removal algorithm. The matching of the query image achieves a recall@5 larger than 90% for panorama-to-panorama matching. We cluster available panoramas from geographically adjacent locations into a single compact representation and observe computational gains of approximately 50% at the cost of only a small (approximately 3%) recall loss. Finally, we present a coordinate estimation algorithm that reduces the median geopositioning error by up to 20%.
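A minimal sketch of memory-vector-based coarse search, assuming each geographic cluster is summarized by a sum of its normalized descriptors (one simple memory-vector construction; the dimensions and cluster counts below are illustrative):

```python
import numpy as np

def l2n(x):
    """L2-normalize along the last axis (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-9)

def build_memory_vectors(descriptors, assignments, n_clusters):
    """One memory vector per geographic cluster: here a plain sum of
    the cluster's normalized descriptors, so coarse search reduces to
    a single dot product per cluster instead of per image."""
    mem = np.zeros((n_clusters, descriptors.shape[1]))
    for c in range(n_clusters):
        mem[c] = l2n(descriptors[assignments == c]).sum(axis=0)
    return l2n(mem)

rng = np.random.default_rng(1)
db = l2n(rng.normal(size=(10000, 512)))        # panorama descriptors
clusters = rng.integers(0, 200, size=10000)    # adjacent-location groups
memory = build_memory_vectors(db, clusters, 200)

query = l2n(rng.normal(size=512))
best_cluster = int(np.argmax(memory @ query))  # coarse search, then refine
print("most promising cluster:", best_cluster)
```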
KEYWORDS: Data modeling, Performance modeling, Statistical modeling, Sensors, Visual process modeling, Surveillance, Maritime surveillance, Video surveillance, Visualization, Surveillance systems
This paper proposes a novel self-learning framework that converts a noisy, pre-labeled multi-class object dataset, possibly containing a high level of inter-class noise samples, into a purified multi-class object dataset with object bounding-box annotations, by iteratively removing noise samples from the low-quality dataset. The framework iteratively purifies the noisy training datasets for each class and updates the classification model for multiple classes. The procedure starts with a generic single-class object model, which evolves into a multi-class model through an iterative procedure in which the F1 score is evaluated until a sufficiently high score is reached. The models used in the proposed framework are learned with CNNs. As a result, we obtain a purified multi-class dataset and, as a spin-off, the updated multi-class object model. The proposed framework is evaluated on maritime surveillance, where vessels need to be classified into eight different types. The experimental results on the evaluation dataset show that the proposed framework improves the F1 score by approximately 5% and 25% at the end of the third iteration, when the initial training datasets contain 40% and 60% inter-class noise samples (erroneously classified labels of vessels and missing annotations), respectively. Additionally, the recall rate increases by nearly 38% (for the more challenging 60% inter-class noise case), while the mean Average Precision (mAP) rate remains stable.
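A minimal sketch of the iterative purification loop, using a linear classifier on synthetic data purely to show the mechanics (the paper's framework uses CNN-based detection models; the confidence threshold and round count are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def purify(X, y, rounds=3, conf_thr=0.8):
    """Illustrative self-learning loop: repeatedly train a classifier
    on the currently kept (noisy) labels, then keep only samples whose
    own label is predicted with high confidence."""
    keep = np.ones(len(y), dtype=bool)
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
        proba = clf.predict_proba(X)                 # (N, n_classes)
        conf_in_own_label = proba[np.arange(len(y)), y]
        keep = conf_in_own_label > conf_thr          # drop suspected noise
    return keep, clf

# synthetic 2-class data with 40% inter-class label noise
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
y_true = np.array([0] * 500 + [1] * 500)
flip = rng.random(1000) < 0.4
y_noisy = np.where(flip, 1 - y_true, y_true)

keep, model = purify(X, y_noisy)
print("label purity of kept samples:",
      (y_true[keep] == y_noisy[keep]).mean())
```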
KEYWORDS: Image segmentation, 3D image processing, Edge detection, Detection and tracking algorithms, Sensors, Image processing algorithms and systems, Reconstruction algorithms, Clouds, Data modeling, Algorithm development
Real-time execution of processing algorithms for handling depth images in a three-dimensional (3-D) data framework is a major challenge. More specifically, treating depth images as point clouds and performing planar segmentation requires heavy computation, because available planar-segmentation algorithms are mostly based on surface normals and/or curvatures and, consequently, do not provide real-time performance. Aiming at the reconstruction of indoor environments, where spaces mainly consist of planar surfaces, a 3-D application would strongly benefit from a real-time algorithm. We introduce a real-time planar-segmentation method for depth images that avoids any surface-normal calculation. First, we detect 3-D edges in a depth image and generate line segments between the identified edges. Second, we fuse all the points on each pair of intersecting line segments into a plane candidate. Third and finally, we implement a validation phase to select planes from the candidates. Furthermore, various enhancements are applied to improve the segmentation quality. The GPU implementation of the proposed algorithm segments depth images into planes at a rate of 58 fps, and our pipeline-interleaving technique increases this rate up to 100 fps. With this throughput improvement, applications can further exploit the algorithm in terms of both quality and enhanced localization.
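A minimal sketch of the second (fusion) step, assuming two intersecting line segments given by a point and a direction: the plane normal follows directly from the cross product of the two line directions, with no surface-normal estimation over the point cloud:

```python
import numpy as np

def plane_from_lines(p1, d1, p2, d2, tol=1e-6):
    """Derive a plane candidate from two intersecting 3-D line
    segments: the plane normal is the cross product of the two line
    directions, anchored at a point on the first line."""
    n = np.cross(d1, d2)
    if np.linalg.norm(n) < tol:          # parallel lines: no unique plane
        return None
    n = n / np.linalg.norm(n)
    return n, -n @ p1                    # plane equation: n . x + d = 0

def point_plane_dist(points, plane):
    """Absolute distance of each point to the plane (n, d)."""
    n, d = plane
    return np.abs(points @ n + d)

# two line segments sharing a point, as the edge step might yield
plane = plane_from_lines(np.array([0., 0., 2.]), np.array([1., 0., 0.]),
                         np.array([0., 0., 2.]), np.array([0., 1., 0.]))
pts = np.array([[0.5, 0.5, 2.0], [0.2, 0.9, 2.6]])
print(point_plane_dist(pts, plane))      # first point on plane, second off
```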
KEYWORDS: Clouds, Sensors, 3D modeling, Data modeling, RGB color model, Data fusion, Visualization, Reconstruction algorithms, LIDAR, Image segmentation
In this paper, we present techniques for highly detailed 3D reconstruction of extra-large indoor environments. We discuss the benefits and drawbacks of low-range, far-range, and hybrid sensing and reconstruction approaches. The proposed techniques for low-range and hybrid reconstruction, enabling a reconstruction density of 125 points/cm³ on large 100,000-m³ models, are presented in detail. These techniques tackle the core challenges of the above requirements, such as multi-modal data fusion (fusion of LIDAR data with Kinect data), accurate sensor pose estimation, high-density scanning, and depth-data noise filtering. Other important aspects of extra-large 3D indoor reconstruction are point cloud decimation and real-time rendering. We present a method for planar-based point cloud decimation, which reduces the point cloud size by 80-95%. Besides this, we introduce a method for online rendering of extra-large point clouds, enabling real-time visualization of huge point clouds in conventional web browsers.
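A minimal sketch of planar-based decimation for one segmented planar region (the grid cell size and the PCA-based plane basis are illustrative choices, not the paper's exact method):

```python
import numpy as np

def decimate_planar_points(points, cell=0.10):
    """Illustrative planar-based decimation: points segmented as one
    planar region are expressed in 2-D plane coordinates, snapped to a
    coarse grid, and reduced to one representative per occupied cell."""
    centroid = points.mean(axis=0)
    # plane basis from PCA: the first two right-singular vectors span the plane
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    uv = (points - centroid) @ vt[:2].T        # 2-D in-plane coordinates
    cells = np.round(uv / cell).astype(int)
    _, first = np.unique(cells, axis=0, return_index=True)
    return points[np.sort(first)]              # one point per grid cell

rng = np.random.default_rng(3)
wall = np.column_stack([rng.uniform(0, 5, 200000),
                        rng.uniform(0, 3, 200000),
                        rng.normal(0, 0.005, 200000)])  # noisy wall at z ~ 0
slim = decimate_planar_points(wall)
print(f"{len(wall)} -> {len(slim)} points "
      f"({100 * (1 - len(slim) / len(wall)):.1f}% reduction)")
```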
KEYWORDS: Cameras, Detection and tracking algorithms, Visualization, Algorithm development, 3D modeling, Optical tracking, Near field, Associative arrays, Data fusion, Sensors
In recent years, much research has been devoted to real-time dense mapping and tracking techniques, owing to the availability of low-cost RGB-D cameras. In this paper, we present a novel multi-volume mapping and tracking algorithm that generates photo-realistic maps while maintaining accurate and robust camera tracking. The algorithm deploys one small volume of high voxel resolution to obtain detailed maps of near-field objects, while utilizing another big volume of low voxel resolution to increase tracking robustness by including far-field scenes. The experimental results show that our multi-volume processing scheme achieves an objective quality gain of 2 dB in PSNR and 0.2 in SSIM. Our approach is capable of real-time sensing at approximately 30 fps and can be implemented on a modern GPU.
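A minimal sketch of the dual-volume idea (the extents, voxel counts, and the routing rule below are invented for illustration):

```python
import numpy as np

# Illustrative dual-volume configuration: a small, finely sampled volume
# for near-field detail and a large, coarse volume that keeps far-field
# geometry available for robust tracking.
NEAR = dict(extent=1.0, voxels=512)   # 1 m cube, ~2 mm voxels
FAR = dict(extent=8.0, voxels=256)    # 8 m cube, ~31 mm voxels

def route_points(points_cam):
    """Assign each camera-space point to the volume that will integrate
    it: near-field points go to the high-resolution volume, the rest to
    the coarse one."""
    near_mask = np.abs(points_cam).max(axis=1) < NEAR["extent"] / 2
    return points_cam[near_mask], points_cam[~near_mask]

pts = np.random.uniform(-4, 4, size=(1000, 3))
near_pts, far_pts = route_points(pts)
print(len(near_pts), "points to the fine volume,",
      len(far_pts), "to the coarse one")
```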
One of the major challenges for applications dealing with the 3D concept is the real-time execution of the algorithms. Besides this, for indoor environments, perceiving the geometry of the surrounding structures plays a prominent role in application performance. Since indoor structures mainly consist of planar surfaces, fast and accurate detection of such features has a crucial impact on the quality and functionality of 3D applications, e.g. decreasing model size (decimation), enhancing localization, mapping, and semantic reconstruction. The available planar-segmentation algorithms are mostly developed using surface normals and/or curvatures; therefore, they are computationally expensive and challenging for real-time performance. In this paper, we introduce a fast planar-segmentation method for depth images that avoids surface-normal calculations. Firstly, the proposed method searches for 3D edges in a depth image and finds the lines between the identified edges. Secondly, it merges all the points on each pair of intersecting lines into a plane. Finally, various enhancements (e.g. filtering) are applied to improve the segmentation quality. The proposed algorithm is capable of handling VGA-resolution depth images at a frame rate of 6 fps with a single-threaded implementation. Furthermore, due to the multi-threaded design of the algorithm, we achieve a speedup by a factor of 10 with a GPU implementation.
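A minimal sketch of the first step, detecting 3D edges as depth discontinuities between neighbouring pixels (the jump threshold is an assumption):

```python
import numpy as np

def depth_edges(depth, jump=0.05):
    """Mark pixels where the depth jumps by more than `jump` metres to a
    horizontal or vertical neighbour, i.e. likely 3D (occlusion) edges.
    No surface normals are computed."""
    dz_x = np.abs(np.diff(depth, axis=1))        # horizontal jumps
    dz_y = np.abs(np.diff(depth, axis=0))        # vertical jumps
    edges = np.zeros_like(depth, dtype=bool)
    edges[:, 1:] |= dz_x > jump
    edges[1:, :] |= dz_y > jump
    return edges

# synthetic VGA depth map: a box 1 m in front of a 2 m wall
depth = np.full((480, 640), 2.0)
depth[180:300, 260:380] = 1.0
print(depth_edges(depth).sum(), "edge pixels found")
```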
Object detection is an important technique for video surveillance applications. Although various detection algorithms have been proposed, they all have problems detecting occluded objects. In this paper, we propose a novel system for occlusion handling and integrate it into a sliding-window detection framework using HOG features and linear classification. Occlusion handling is obtained by applying multiple classifiers, each covering a different level of occlusion and focusing on the non-occluded object parts. Experiments show that our approach, based on 17 classifiers, obtains an increase of 8% in detection performance. To limit the computational complexity, we propose a cascaded implementation that increases the computational cost by only 3.4%. Although the paper presents results for pedestrian detection, our approach is not limited to this object class. Finally, our system does not need an additional training dataset covering all possible types of occlusion.
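A minimal sketch of occlusion-level classifiers, assuming each linear classifier scores only the visible top part of the feature window (the window layout and weights are placeholders; the real system trains 17 such classifiers on HOG features and evaluates them in a cascade):

```python
import numpy as np

class OcclusionAwareDetector:
    """Illustrative occlusion handling: one linear classifier per
    occlusion level, each scoring only the top part of the feature
    window (a partially occluded pedestrian is typically hidden from
    below)."""
    def __init__(self, weights_per_level):
        # weights_per_level[i] covers the top part of the window;
        # longer weight vectors correspond to less occlusion
        self.levels = weights_per_level

    def score(self, feature_window):
        # score every occlusion hypothesis on its visible part and
        # keep the best match
        return max(float(w @ feature_window[: len(w)]) for w in self.levels)

rng = np.random.default_rng(4)
det = OcclusionAwareDetector([rng.normal(size=k) for k in (4, 7, 10)])
print(det.score(rng.normal(size=10)))
```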
KEYWORDS: Video, Systems modeling, Computer simulations, Modeling, Computer aided design, Multimedia, Data modeling, Computer architecture, Process modeling, Software development
Component-based software development is very attractive, because it allows a clear decomposition of logical processing blocks into software blocks and offers wide reuse. The strong real-time requirements of media processing systems should be validated as early as possible to avoid costly system redesign. This can be achieved by prediction of timing and performance properties. In this paper, we propose a scenario-simulation design approach featuring early performance prediction of a component-based software system. We validated this approach through a case study, for which we developed an advanced MPEG-4 coding application. The benefits of the approach are threefold: (a) high accuracy of the predicted performance data; (b) an efficient real-time software-hardware implementation, because the generic computational costs become known in advance; and (c) improved ease of use, owing to the high abstraction level of modelling. Experiments showed that the prediction accuracy of the system performance is about 90% or higher, while the prediction accuracy of the time-detailed processor usage does not drop below 70%. However, the real-time performance requirements are sometimes not met, e.g. when other applications require intensive memory usage, thereby imposing delays on the retrieval of the decoder data from memory.
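As a toy illustration of scenario-based performance prediction (the component names, cycle counts, and budget below are invented), per-scenario component costs can be summed and checked against the real-time frame budget:

```python
# Illustrative scenario-based budget check: each software component has
# a cost per scenario, and the totals are compared against the cycles
# available per frame.
BUDGET_CYCLES = 4_000_000             # cycles available per frame

components = {                        # cost per scenario, in cycles
    "vlc_decode":  {"I-frame": 900_000,   "P-frame": 400_000},
    "idct":        {"I-frame": 1_200_000, "P-frame": 700_000},
    "motion_comp": {"I-frame": 0,         "P-frame": 1_100_000},
    "render":      {"I-frame": 800_000,   "P-frame": 800_000},
}

for scenario in ("I-frame", "P-frame"):
    total = sum(costs[scenario] for costs in components.values())
    verdict = "meets" if total <= BUDGET_CYCLES else "MISSES"
    print(f"{scenario}: {total:,} cycles -> {verdict} the real-time budget")
```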