Fusion-network approaches have been applied successfully to face super-resolution, the problem of restoring high-resolution face images from low-resolution inputs. Recently, face attributes have been used effectively to guide recovery at low-level facial feature points: the low-resolution image is first upscaled into a super-resolution face image, and landmarks estimated from it guide the network to refine that image iteratively. However, such face super-resolution architectures carry redundant parameters, and learning the mapping from input to target output is inefficient. This paper proposes a deep attention pixel network for face super-resolution that applies an attention mechanism to optimize feature extraction and fuses channel attention with facial-landmark heatmaps. Experimental results demonstrate that the proposed method outperforms other state-of-the-art face super-resolution methods.
An improvement to automatic vehicle classification is investigated. The challenges are to classify vehicles correctly regardless of changes in illumination, differences in camera viewpoint, and variations in vehicle type. Our proposed appearance-based feature extraction algorithm, called linked visual words (LVWs), builds on the existing bag-of-visual-words (BoVW) technique by adding spatial information to improve classification accuracy. In addition, to prevent over-fitting due to the large number of LVWs, four common sampling techniques for LVWs are investigated. Our results suggest that sampling LVWs using TF-IDF with grouping improved classification accuracy on the test dataset. In summary, the proposed system can classify nine types of vehicles and works with surveillance cameras in real-world scenarios. Its classification accuracy is 5.58% and 4.27% higher on average across three datasets when compared with BoVW + SVM and LeNet-5, respectively.
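The TF-IDF sampling mentioned above can be illustrated with a minimal sketch: weighting per-image visual-word histograms so that words occurring in every image get low weight and can be pruned. The function name, the dict-of-counts representation, and the pruning interpretation are illustrative assumptions, not the paper's implementation.

```python
import math

def tfidf_weights(histograms):
    """TF-IDF weights for bag-of-visual-word histograms.

    histograms: list of dicts mapping visual-word id -> count, one per image.
    Returns a list of dicts mapping visual-word id -> TF-IDF weight.
    """
    n_images = len(histograms)
    # Document frequency: how many images contain each visual word.
    df = {}
    for hist in histograms:
        for word in hist:
            df[word] = df.get(word, 0) + 1
    weighted = []
    for hist in histograms:
        total = sum(hist.values())
        weighted.append({
            # Term frequency times inverse document frequency.
            word: (count / total) * math.log(n_images / df[word])
            for word, count in hist.items()
        })
    return weighted
```

Visual words that appear in every image receive weight zero (log of 1), which is exactly the kind of uninformative word that sampling would discard to keep the LVW vocabulary compact.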
A human action classification method based on adaptive key frame interval (AKFI) feature extraction is presented. Since human movement periods differ, this work considers the action intervals that contain intensive, compact motion information. We specify the AKFI by analyzing the amount of motion over time. A key frame is defined as a local minimum of interframe motion, computed by frame differencing between consecutive frames. Once key frames are detected, the features within each segmented period are encoded by an adaptive motion history image and a key pose history image. The action representation consists of the local orientation histogram of these features during the AKFI. Experimental results on the Weizmann, KTH, and UT-Interaction datasets demonstrate that the features can effectively classify actions, including irregular walking cases, compared with other well-known algorithms.
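The key-frame criterion above (a local minimum of frame-differencing motion) can be sketched as follows. This is a toy illustration on flattened grayscale frames, assuming the key frame is taken to be the later frame of the low-motion pair; the paper's exact indexing convention may differ.

```python
def frame_difference(a, b):
    """Sum of absolute pixel differences between two frames (flat lists)."""
    return sum(abs(pa - pb) for pa, pb in zip(a, b))

def key_frames(frames):
    """Indices of frames where interframe motion is a local minimum.

    frames: list of flattened grayscale frames (lists of ints).
    diffs[i] measures motion between frames i and i+1; a local minimum
    in this signal marks a low-motion moment, taken here as key frame i+1.
    """
    diffs = [frame_difference(frames[i], frames[i + 1])
             for i in range(len(frames) - 1)]
    return [i + 1
            for i in range(1, len(diffs) - 1)
            if diffs[i] < diffs[i - 1] and diffs[i] < diffs[i + 1]]
```

Consecutive key frames then delimit one adaptive interval, within which the motion-history features described above would be accumulated.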
H.264/AVC is the newest standard for digital video compression, developed jointly by ITU-T's Video Coding Experts Group and ISO/IEC's Moving Picture Experts Group. One feature of the new standard is the adoption of a robust error resilience tool at the encoder known as flexible macroblock ordering (FMO). In this paper, we present an algorithm to generate a one-pass FMO map based on spatial and temporal information at the encoder, and an error concealment method at the decoder, for wireless video transmission. The error concealment method at the decoder is applied according to residual information derived from the distortion information obtained at the encoder while the one-pass FMO map is being generated. Our simulation results under slow and fast fading channels confirm that the proposed technique can reduce the number of undecodable macroblocks by up to 66.79% and 80.54%, respectively, when compared with no FMO. The peak signal-to-noise ratio improvements are up to 2.67 and 1.05 dB, respectively, when compared to predefined FMO (Type 1).
JPEG has been a widely recognized image compression standard for many years. Nevertheless, it has a notable limitation: compressed image quality degrades significantly at low bit rates. This limitation is addressed by JPEG2000, which tends to replace JPEG, especially in storage and retrieval applications. To index and retrieve compressed-domain images from a database efficiently and practically, several image features can be extracted directly in the compressed domain without fully decompressing the JPEG2000 images. JPEG2000 is built on the wavelet transform, one of the most widely used tools for analyzing and describing texture patterns in images. A further advantage of the wavelet transform is that it supports multiresolution texture analysis and separates directional texture information into directional subbands: the HL subband carries horizontal frequency information, the LH subband carries vertical frequency information, and the HH subband carries diagonal frequency information. Nevertheless, many wavelet-based image retrieval approaches make poor use of the directional subband information obtained by the wavelet transform for directional texture classification of retrieved images. This paper proposes a novel image retrieval technique in the JPEG2000 compressed domain that uses the image significance map to compute an image context, from which the image index is constructed. Experimental results indicate that the proposed method can effectively differentiate and categorize images with different directional texture information. In addition, integrating the proposed features with the wavelet autocorrelogram further improves retrieval performance, as measured by ANMRR (Average Normalized Modified Retrieval Rank), compared with other known methods.
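The directional-subband behavior described above can be demonstrated with a one-level 2D Haar decomposition, the simplest wavelet: an image of vertical stripes (variation along the horizontal axis) concentrates its energy in the HL subband. This pure-Python sketch is illustrative only; JPEG2000 itself uses the 5/3 or 9/7 wavelet, not the Haar shown here.

```python
def haar2d(img):
    """One-level 2D Haar decomposition into LL, HL, LH, HH subbands.

    img: 2D list of numbers with even height and width.
    Returns four half-size subbands as 2D lists.
    """
    h, w = len(img), len(img[0])
    LL = [[0.0] * (w // 2) for _ in range(h // 2)]
    HL = [[0.0] * (w // 2) for _ in range(h // 2)]
    LH = [[0.0] * (w // 2) for _ in range(h // 2)]
    HH = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            LL[i // 2][j // 2] = (a + b + c + d) / 4  # smooth average
            HL[i // 2][j // 2] = (a - b + c - d) / 4  # horizontal frequency
            LH[i // 2][j // 2] = (a + b - c - d) / 4  # vertical frequency
            HH[i // 2][j // 2] = (a - b - c + d) / 4  # diagonal frequency
    return LL, HL, LH, HH

def subband_energy(band):
    """Sum of squared coefficients, a simple directional texture feature."""
    return sum(v * v for row in band for v in row)
```

Comparing `subband_energy(HL)`, `subband_energy(LH)`, and `subband_energy(HH)` gives a crude directional texture descriptor of the kind the subbands make available without full decompression.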
We investigate the use of the Automatic Repeat reQuest (ARQ) retransmission scheme for two-way low bit-rate video communications over wireless Rayleigh fading channels. We show that during the retransmission of erroneous packets, the reduced channel throughput can cause the video encoder buffer to fill up quickly, forcing the TMN8 rate-control algorithm to significantly reduce the bits allocated to each video frame. This results in peak signal-to-noise ratio (PSNR) degradation and many skipped frames. To reduce the number of skipped frames, we propose a coding scheme that takes into consideration the effects of video buffer fill-up, an a priori channel model, channel feedback information, and hybrid ARQ/FEC. The simulation results indicate that our proposed scheme encodes the video sequences with far fewer skipped frames and higher PSNR compared to H.263 TMN8.