KEYWORDS: RGB color model, Video, Image segmentation, Sensors, Video surveillance, Detection and tracking algorithms, Temporal coherence, Image processing algorithms and systems, Cameras, Human subjects
We develop an algorithm for 4-D (RGB+Depth) video segmentation targeting immersive teleconferencing applications on emerging mobile devices. Our algorithm extracts users from their environments and places them onto virtual backgrounds, similar to green-screening. The virtual backgrounds increase immersion and interactivity, relieving the users of the system from distractions caused by disparate environments. Commodity depth sensors, while providing useful information for segmentation, produce noisy depth maps with a large number of missing depth values. By combining depth and RGB information, our work significantly improves the otherwise very coarse segmentation. Further imposing temporal coherence yields compositions where the foregrounds seamlessly blend with the virtual backgrounds with minimal flicker and other artifacts. We achieve these improvements by correcting the missing information in depth maps before fast RGB-based segmentation, which operates in conjunction with temporal coherence. Simulation results indicate the efficacy of the proposed system in video conferencing scenarios.
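The pipeline described above (fill in missing depth, take a coarse depth-based mask, smooth it over time, composite onto a virtual background) can be sketched in a few lines of numpy. This is a toy illustration under simplifying assumptions, not the paper's algorithm: missing depth is filled with a global median rather than a spatial correction, and the RGB-based refinement step is omitted.

```python
import numpy as np

def fill_missing_depth(depth, invalid=0.0):
    """Crude stand-in for depth correction: replace missing values
    with the median of the valid depths (global, not spatial)."""
    d = depth.astype(float).copy()
    valid = d != invalid
    if not valid.all():
        d[~valid] = np.median(d[valid])
    return d

def segment_frame(depth, fg_max_depth, prev_mask=None, alpha=0.7):
    """Coarse foreground mask from a depth threshold, smoothed over
    time with an exponential moving average for temporal coherence."""
    d = fill_missing_depth(depth)
    soft = (d < fg_max_depth).astype(float)   # 1.0 = foreground (user)
    if prev_mask is not None:
        soft = alpha * soft + (1 - alpha) * prev_mask
    return soft

def composite(rgb, background, soft_mask):
    """Blend the extracted foreground onto a virtual background."""
    m = soft_mask[..., None]
    return m * rgb + (1 - m) * background

# Toy frame: user occupies the left half at depth 1, room at depth 3,
# with one missing depth reading
depth = np.full((4, 4), 3.0)
depth[:, :2] = 1.0
depth[0, 0] = 0.0                             # missing reading
mask = segment_frame(depth, fg_max_depth=2.0)
frame = composite(np.ones((4, 4, 3)), np.zeros((4, 4, 3)), mask)
```

In a real system the soft mask would be refined with RGB cues (color models, edge snapping) before compositing; the EMA stands in for the temporal-coherence term.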
In this paper we share our recent observations on methods for sparsity-enforced orthogonal transform design. In our previous work on this problem, our target was to design transforms (sparse orthonormal transforms, SOTs) that minimize the overall sparsity-distortion cost of a collection of image patches, mainly to improve the performance of compression methods. In this paper we go one step further to understand why these transforms achieve better approximation and how they differ from transforms like the DCT or the Karhunen-Loeve transform (KLT). Our study led us to mathematically validate that for a Gaussian process the KLT is the optimal transform not only in a linear-approximation sense but also in a nonlinear-approximation sense, the latter forming the basis for sparsity-based regularization. This means that the search for SOTs yields the KLT for Gaussian processes, but results in transforms that are distinctly different from the KLT in non-Gaussian cases by capturing useful structures within the data. Both toy examples and real compression results in various representation domains are presented in this paper to support our observations.
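The linear-approximation side of the KLT-optimality claim is easy to check numerically. The sketch below (our own toy experiment, not from the paper) generates Gaussian AR(1) vectors, builds the empirical KLT from the sample second-moment matrix, and verifies that keeping the first k KLT coefficients reconstructs no worse than keeping the first k DCT coefficients.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
n, N, rho = 16, 4000, 0.95

# Gaussian AR(1) vectors: a standard toy model for image rows
x = np.empty((N, n))
x[:, 0] = rng.standard_normal(N)
for i in range(1, n):
    x[:, i] = rho * x[:, i - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(N)

# Empirical KLT: eigenvectors of the sample second moment, descending order
evals, U = np.linalg.eigh(x.T @ x / N)
U = U[:, ::-1]

k = 4
# Linear approximation: keep the first k coefficients of each transform
err_klt = np.mean((x - (x @ U[:, :k]) @ U[:, :k].T) ** 2)
c = dct(x, norm='ortho', axis=1)
c[:, k:] = 0.0
err_dct = np.mean((x - idct(c, norm='ortho', axis=1)) ** 2)
```

For this Gaussian source the two errors are close (the DCT is known to approach the KLT for highly correlated AR(1) processes); the paper's point is that on non-Gaussian data the learned transforms depart from both.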
Due to the prevalence of edges in image content, various directional transforms have been proposed for the
efficient representation of images. Such transforms are useful for coding, denoising, and image restoration using
sparse signal representation techniques. This paper describes a new non-separable 2D DCT-like orthonormal
block transform that is optimized for a specified orientation angle. The approach taken in this paper is to extend
to two dimensions one of several known constructions of the standard 1D DCT. Like the standard 1D DCT, the
proposed transform is obtained as the eigenvectors of particular matrices.
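For the 1D case, the eigenvector construction can be reproduced directly: the DCT-II basis vectors are the eigenvectors of a second-difference matrix with "half-sample" reflecting boundary rows (a classical observation; the paper's contribution is the 2D, orientation-aware generalization). A quick numerical check, assuming scipy is available:

```python
import numpy as np
from scipy.fft import dct

N = 8
# Second-difference matrix with reflecting boundary rows:
#   [ 1 -1          ]
#   [-1  2 -1       ]
#   [      ...      ]
#   [         -1  1 ]
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
A[0, 0] = A[-1, -1] = 1.0

# eigh sorts the eigenvalues 2*(1 - cos(k*pi/N)) in ascending order,
# which matches DCT frequency order k = 0..N-1
evals, V = np.linalg.eigh(A)

# Orthonormal DCT-II matrix: row k is the k-th basis vector
D = dct(np.eye(N), norm='ortho', axis=0)
```

Each eigenvector of A matches the corresponding DCT-II basis vector up to sign, which is what the test below confirms.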
This paper describes the construction of a set of sparsity-distortion-optimized orthonormal transforms designed for wavelet-domain
image denoising. The optimization operates over sub-bands of given orientation and exploits intra-scale dependencies
of wavelet coefficients over image singularities. When applied on top of standard wavelet transforms, the resulting
sparse representation provides compaction that can be exploited in transform-domain denoising via cycle-spinning.
Our construction deviates from the literature, which mainly focuses on model-based methods, by offering a data-driven optimization
of wavelet representations. The proposed method consistently outperforms translation-invariant denoising with the
original wavelet representation and can reach up to 3 dB of improvement.
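As a toy illustration of the underlying mechanism (hard-thresholding wavelet detail coefficients, then averaging over shifts, i.e. cycle-spinning), the sketch below uses a single-level Haar transform on a 1D piecewise-constant signal. This is our own minimal example, not the paper's optimized transforms.

```python
import numpy as np

def haar1(x):
    """One-level Haar analysis: averages and details."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def ihaar1(a, d):
    """One-level Haar synthesis."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise_once(x, thr):
    a, d = haar1(x)
    return ihaar1(a, d * (np.abs(d) > thr))   # hard-threshold details

def cycle_spin_denoise(x, thr, shifts=8):
    """Average the denoised results over circular shifts."""
    out = np.zeros_like(x, dtype=float)
    for s in range(shifts):
        out += np.roll(denoise_once(np.roll(x, s), thr), -s)
    return out / shifts

rng = np.random.default_rng(1)
clean = np.repeat([0.0, 1.0, -0.5, 0.5], 64)  # piecewise-constant signal
noisy = clean + 0.1 * rng.standard_normal(clean.size)
denoised = cycle_spin_denoise(noisy, thr=0.3)
```

Averaging over shifts suppresses the blocking artifacts a single decimated transform would leave near the signal's jumps.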
The bulk of the video content available today over the Internet and over mobile networks suffers from many
imperfections caused during acquisition and transmission. In the case of user-generated content, which is typically
produced with inexpensive equipment, these imperfections manifest in various ways through noise, temporal
flicker and blurring, just to name a few. Imperfections caused by compression noise and temporal flicker are
present in both studio-produced and user-generated video content transmitted at low bit-rates. In this paper,
we introduce an algorithm designed to reduce temporal flicker and noise in video sequences. The algorithm takes
advantage of the sparse nature of video signals in an appropriate transform domain that is chosen adaptively based
on local signal statistics. Since the signal admits a sparse representation in this transform domain while flicker
and noise spread over the entire domain, the latter can be reduced by enforcing sparsity. Our results show
that the proposed algorithm reduces flicker and noise significantly and enables better presentation of compressed
videos.
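The core operation, transforming a local block, zeroing small coefficients, and transforming back, can be sketched with a spatio-temporal DCT over a small stack of frames. This is a hedged toy version: the paper selects the transform adaptively from local statistics, which is omitted here, and scipy's dctn/idctn are assumed available.

```python
import numpy as np
from scipy.fft import dctn, idctn

def sparsity_denoise(block, thr):
    """Hard-threshold a 3D (t, y, x) DCT: the clean signal concentrates
    in a few coefficients, while noise and flicker spread over all."""
    c = dctn(block, norm='ortho')
    c *= np.abs(c) > thr
    return idctn(c, norm='ortho')

# Static smooth pattern over 4 frames, corrupted by white noise
j = np.arange(8)
pattern = 0.5 + 0.3 * np.cos(np.pi * (2 * j + 1) / 16)
clean = np.tile(pattern, (4, 8, 1))           # shape (frames, rows, cols)
rng = np.random.default_rng(2)
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
denoised = sparsity_denoise(noisy, thr=0.2)
```

Because the static pattern occupies only two DCT coefficients while the noise spreads over all 256, thresholding removes most of the noise energy while leaving the signal intact.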
KEYWORDS: Video, Transform theory, Super resolution, Linear filtering, Video compression, Visualization, Video processing, Multimedia, Computer programming, Electronic filtering
Multimedia services for mobile phones are becoming increasingly popular thanks to capabilities brought about
by location awareness, customized programming, interactivity, and portability. With the mounting appeal of
these services, there is a desire to seamlessly extend the mobile multimedia experience to stationary environments
where high-resolution displays can offer significantly better viewing conditions. In this paper, we propose a
fast, high quality super-resolution algorithm that enables high resolution display of low-resolution video. The
proposed algorithm, SWAT, accomplishes sparse reconstructions using directionally warped transforms and spatially
adaptive thresholding. Comparisons are made with some existing techniques in terms of PSNR and visual
quality. Simulation examples show that SWAT significantly outperforms these techniques while staying within
a limited computational complexity envelope.
In this paper we propose a prediction method that is geared toward forming successful estimates of a signal
based on a correlated anchor signal contaminated with complex interference. The interference model is drawn
from real-life scenarios and involves intensity modulations, linear distortions, structured clutter, and white noise,
among others. The proposed method first transforms signals to an over-complete domain where we assume
sparse decompositions. In this sparse domain, we show that very simple predictors can be designed to perform
efficient prediction. The parameters of these predictors are derived from causal information, enabling completely
automated and blind operation. The utilized over-complete representation allows multiple predictions for each
sample in signal domain, which are averaged and combined into a single prediction. Experimental results on
images and video frames show that the proposed method can provide successful predictions under a variety of
complex transitions, such as cross-fades, brightness changes, focus variations, and other complex distortions. The
proposed prediction method is also implemented to operate inside a state-of-the-art video compression codec and
results show significant improvements on scenes that are hard to encode using traditional prediction techniques.
KEYWORDS: Video, Super resolution, Video compression, Computer programming, Mobile devices, Wavelets, Linear filtering, Image processing, Wavelet transforms, Reconstruction algorithms
We consider the mobile service scenario where video programming is broadcast to low-resolution wireless terminals. In such a scenario, broadcasters utilize simultaneous data services and bi-directional communications capabilities of the terminals in order to offer substantially enriched viewing experiences to users by allowing user participation and user tuned content. While users immediately benefit from this service when using their phones in mobile environments, the service is less appealing in stationary environments where a regular television provides competing programming at much higher display resolutions. We propose a fast super-resolution technique that allows the mobile terminals to show a much enhanced version of the broadcast video on nearby high-resolution devices, extending the appeal and usefulness of the broadcast service. The proposed single frame super-resolution algorithm uses recent sparse recovery results to provide high quality and high-resolution video reconstructions based solely on individual decoded frames provided by the low-resolution broadcast.
KEYWORDS: Denoising, Associative arrays, Data modeling, Wavelets, Interference (communication), Optimization (mathematics), Signal processing, Chemical elements, Annealing, Data communications
In this paper we consider the recovery of missing regions in images and we compare the performance of two recent prediction algorithms that utilize sparse recovery. The first algorithm is based on recent work that tries to find sparse atomic decompositions (AD) using l1-norm regularization, while the second algorithm employs iterated denoising (ID). Experimental results indicate that ID generally outperforms the l1 based technique and we investigate the reasons for the often substantial performance difference. We discuss many issues that affect the robustness of the l1 based technique and in particular, we point to inherent problems in the missing data prediction setting that challenge the underlying sparse atomic decomposition assumptions at their core. Inspired by what ID does right, we provide techniques that are expected to improve the performance of sparse atomic decomposition motivated algorithms and we establish connections with ID.
In this paper we present an algorithm that segments a scanned, compound document into halftone and non-halftone areas. Our work is intended as a precursor to sophisticated document processing applications (descreening, compression, document content analysis, etc.) for which undetected halftones may cause serious adverse effects. Our method is of very low computational and memory complexity and performs only a single pass on the scanned document.
Halftone regions of arbitrary size and shape are detected on compound multilingual documents in a completely automated fashion,
without any prior knowledge of the type and resolution of the halftones to be detected. The proposed technique can be adjusted to detect halftones of a particular dpi resolution or to decompose
detected halftones into their constituent resolutions. We obtain high detection probabilities on compound multilingual documents containing halftones and fine text.
We present a multiresolutional algorithm that segments a compound document and uses the results of the segmentation for document enhancement in copier applications. The document is initially segmented into halftone and nonhalftone areas. Based on this segmentation the location of the edges due to text, graphics, and images (and not due to halftone dots) are detected on halftone as well as on nonhalftone portions. We further detect constant-tone regions within nonhalftone areas for subsequent bleed-through removal applications. Edge enhancement on detected edges and descreening on detected halftones are carried out. The algorithm can detect general halftones over regions of arbitrary sizes and shapes, and it can be straightforwardly adjusted for operation at various dpi resolutions. We obtain high detection probabilities on compound multilingual documents containing halftones and fine text. The proposed enhancement stage is tolerant of segmentation errors providing robust performance for the remaining problem cases. Our main contribution is the accomplishment of these tasks with a single pass algorithm that is computationally very simple and that requires less than 1% of full page memory, with active memory requirements less than 0.02% of full page memory. The operation of the algorithm can be imagined as a very thin line (of thickness the size of a "full-stop" in 11 pt text) that rapidly scans an input page while simultaneously producing an output page.
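A toy version of the halftone-detection idea, flagging regions whose local high-frequency energy is large, can be written in plain numpy. The paper's detector is single-pass and far more memory-frugal; the window size and threshold below are illustrative choices, not values from the paper.

```python
import numpy as np

def halftone_mask(img, win=8, thr=0.05):
    """Return a per-tile mask that is True where the tile's mean
    high-frequency energy suggests a halftone pattern."""
    h, w = img.shape
    # High-pass residual: pixel minus its 3x3 box average
    pad = np.pad(img, 1, mode='edge')
    blur = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    hp = (img - blur) ** 2
    # Mean high-frequency energy per win x win tile
    energy = hp[:h - h % win, :w - w % win].reshape(
        h // win, win, w // win, win).mean(axis=(1, 3))
    return energy > thr

# Synthetic page: left half a fine checkerboard ("halftone"),
# right half flat gray (non-halftone)
ii, jj = np.indices((64, 64))
img = np.where(jj < 32, (ii + jj) % 2, 0.5).astype(float)
mask = halftone_mask(img)
```

The checkerboard tiles carry large residual energy after the box blur, while the flat region carries essentially none, so the two halves separate cleanly.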
We propose a non-iterative, globally optimal dense motion field estimation technique based on a multiresolutional probability model. We consider the field to be estimated in terms of its wavelet coefficients and carry out the estimation in the field’s wavelet transform domain. Our approach models interscale dependencies of the wavelet coefficients and allows for smooth, edge, and occluded regions in the field. We obtain segmentations of the field and our results show that the field estimates yield accurate depictions of scene motion. The globally optimal nature of our estimation framework allows it to be applicable in scenes exhibiting large motion and in settings of ill-posed motion. Hence, our algorithms can also be used to determine accurate initializations for optical flow type estimation techniques, which use more sophisticated models but can only obtain locally optimal solutions that are heavily dependent on initial conditions. The performance is illustrated on several examples.
We present standards-compliant visible watermarking schemes for digital images and video in DCT-based compressed formats. The watermarked data is in the same compressed format as the original and can be viewed with standard tools and applications. Moreover, for most of the schemes presented, the watermarked data has exactly the same compressed size as the original. The watermark can be inserted and removed using a key for applications requiring content protection. The watermark application and removal algorithms are very efficient and exploit some features of compressed data formats (such as JPEG and MPEG) which allow most of the work to be done in the compressed domain.
This paper proposes an algorithm for improving the performance of standard block transform coding algorithms by better exploiting the correlations between transform coefficients. While standard algorithms focus on decorrelating coefficients within each block, the new approach focuses on exploiting correlations between coefficients of different blocks. Interblock correlations are minimized by linearly estimating coefficients from previously transmitted neighboring block coefficients. The use of linear estimators between blocks leads to a nonorthogonal representation of the image. Quantization issues relating to this nonorthogonal transform are addressed, and an image coding implementation is simulated. Simulations demonstrate that large improvements are observed over standard block transform coding systems, over a wide range of bitrates.
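The interblock idea can be illustrated in a few lines: compute 8x8 block DCTs of a smooth image and fit a least-squares linear predictor of each block's DC coefficient from its left neighbor's DC. The prediction residual then carries no more energy than the raw coefficient, which is what the coder exploits. This is a minimal one-coefficient sketch of the general scheme, with scipy assumed available.

```python
import numpy as np
from scipy.fft import dctn

def block_dct(img, B=8):
    """DCT coefficients of non-overlapping BxB blocks."""
    h, w = img.shape
    blocks = img.reshape(h // B, B, w // B, B).transpose(0, 2, 1, 3)
    return np.array([[dctn(b, norm='ortho') for b in row] for row in blocks])

# Smooth synthetic image: neighboring blocks are strongly correlated
i = np.arange(64)
img = np.outer(np.sin(i / 10.0), np.cos(i / 13.0))
C = block_dct(img)

# Linearly predict each block's DC from the left neighbor's DC
dc = C[:, :, 0, 0]
x, y = dc[:, :-1].ravel(), dc[:, 1:].ravel()
a, b = np.polyfit(x, y, 1)                    # least-squares predictor
resid = y - (a * x + b)
```

The full scheme in the paper predicts many coefficients from several neighboring blocks and must handle the quantization of a now-nonorthogonal representation; the single-coefficient fit here only shows where the coding gain comes from.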