KEYWORDS: Video, Gas lasers, Visualization, Signal attenuation, Education and training, Image quality, Image enhancement, Video coding, Video processing, Signal to noise ratio
Introducing a loss function that considers image structure into DNN-based video prediction has been shown to reduce the blurriness of generated prediction frames. In this paper, we propose an improved loss function based on the image gradient difference loss (GDL), which captures the edge structure of an image, and evaluate its performance on PredNet, a well-known DNN-based video prediction scheme. Our experimental results show that the proposed loss function improves prediction performance in terms of color representation and generates sharper prediction frames.
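The gradient difference loss mentioned above compares the spatial gradients of the target and predicted frames rather than raw pixel values, which penalizes blur. A minimal NumPy sketch of the standard GDL formulation is below; the function name and the `alpha` exponent parameter are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def gradient_difference_loss(y_true, y_pred, alpha=1.0):
    """Image gradient difference loss (GDL) between two frames.

    Penalizes mismatches between the absolute horizontal and vertical
    gradients of the target and predicted frames, encouraging sharper
    edges in the prediction. `alpha` is the exponent applied to each
    gradient-difference term (an assumption here; commonly 1 or 2).
    """
    # Vertical gradients: differences between adjacent rows.
    gy_true = np.abs(np.diff(y_true, axis=0))
    gy_pred = np.abs(np.diff(y_pred, axis=0))
    # Horizontal gradients: differences between adjacent columns.
    gx_true = np.abs(np.diff(y_true, axis=1))
    gx_pred = np.abs(np.diff(y_pred, axis=1))
    return (np.abs(gy_true - gy_pred) ** alpha).sum() + \
           (np.abs(gx_true - gx_pred) ** alpha).sum()
```

A perfect prediction yields zero loss, while a uniformly flat (fully blurred) prediction of a textured frame yields a strictly positive loss.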
DNN-based video frame prediction can be a powerful tool for improving the performance of motion-compensated prediction in video coding. In this paper, we propose a method that applies multiple convolution kernels of different sizes to PredNet, one of the DNN-based video prediction schemes, to enhance prediction accuracy by incorporating context adaptivity into its convolutional LSTM layers. We analyze the prediction performance of the proposal, and the results show that applying kernels of multiple sizes is more effective than applying a single-size kernel in terms of prediction error reduction.
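The multi-kernel idea above can be sketched as applying convolutions of several kernel sizes in parallel to the same input and stacking the responses, so that later layers can weight fine-grained and wide-context features adaptively. The following NumPy sketch illustrates only this parallel structure, with random kernels and a naive convolution; it is not the paper's ConvLSTM implementation.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 2-D 'same' convolution with zero padding (odd kernel sizes)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def multi_kernel_features(x, kernel_sizes=(1, 3, 5)):
    """Apply convolution kernels of several sizes in parallel and stack
    the responses along a new leading axis, mimicking the context-adaptive
    multi-kernel idea. Kernel weights here are random placeholders."""
    rng = np.random.default_rng(0)
    feats = [conv2d_same(x, rng.standard_normal((s, s))) for s in kernel_sizes]
    return np.stack(feats)  # shape: (num_kernels, H, W)
```

Because every branch uses 'same' padding, the stacked feature maps keep the input's spatial dimensions, which makes combining them inside a recurrent layer straightforward.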
Semantic segmentation is a pixel-level classification problem in computer vision, in which each pixel is assigned to a class so that an image can be interpreted at the pixel level. In this field, semantic segmentation of street fashion images is a challenging task, since clothing items appear with wide variations in fabric, layering, occlusion, and viewpoint. To better understand street fashion images, we propose a lightweight Semantic Context Aware Transformer (SCAT) for the semantic segmentation of street fashion images, which integrates semantic context into the encoding and models the relationships between multi-level outputs of the transformer layers. Extensive experiments and comparisons show that the proposal achieves state-of-the-art results on the ModaNet dataset with a relatively small model size, improving by over 1.1 points compared to the Shunted Transformer and surpassing other CNNs and Transformers by a large margin of over 2 points in mIoU.
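The mIoU figures cited above are computed with the standard mean intersection-over-union metric for segmentation. A minimal sketch of that metric is below (function name and the choice to skip classes absent from both maps are assumptions, not details from the paper):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union (mIoU) across classes.

    For each class, IoU = |pred ∩ target| / |pred ∪ target| over the
    pixel label maps; classes absent from both maps are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

A perfect prediction scores 1.0, and each misclassified pixel lowers both the affected classes' IoU terms.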