It is crucial to reduce the cost of deep convolutional neural networks while preserving their accuracy. Existing methods adaptively prune DNNs in a layer-wise or channel-wise manner based on the input image. In this paper, we develop a novel dynamic network, namely Dynamic-Stride-Net, to improve residual networks with layer-wise adaptive strides in the convolution operations. Dynamic-Stride-Net leverages a gating network to adaptively select the strides of convolutional blocks based on the output of the previous layer. To optimize the selection of strides, the gating network is trained by reinforcement learning. The number of floating-point operations (FLOPs) is significantly reduced by adapting the strides of the convolutional layers, without loss of accuracy. Dynamic-Stride-Net reduces the computational cost by 35%-50% while matching the accuracy of the original model on the CIFAR-10 and CIFAR-100 datasets, and it outperforms state-of-the-art dynamic networks and static compression methods.
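The abstract includes no code, but the core mechanism can be illustrated with a short PyTorch-style sketch: a lightweight gate inspects the previous layer's output and selects the stride of the next convolutional block. All class names and design choices below (pooling-plus-linear gate, two candidate strides, batch size 1) are hypothetical, not the authors' implementation; during training the discrete stride choice would be optimized with REINFORCE-style reinforcement learning rather than the hard argmax used here.

```python
import torch
import torch.nn as nn

class StrideGate(nn.Module):
    """Tiny gating network: maps the previous layer's feature map
    to a choice among candidate strides (hypothetical design)."""
    def __init__(self, in_channels, num_strides=2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_strides)

    def forward(self, x):
        logits = self.fc(self.pool(x).flatten(1))
        # Hard decision at inference; RL training would instead
        # sample from softmax(logits) and reward low-FLOP choices.
        return logits.argmax(dim=1)

class DynamicStrideBlock(nn.Module):
    """Convolutional block whose stride is picked per input by the gate."""
    def __init__(self, channels, candidate_strides=(1, 2)):
        super().__init__()
        self.gate = StrideGate(channels, len(candidate_strides))
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=s, padding=1)
            for s in candidate_strides
        )

    def forward(self, x):
        s_idx = self.gate(x)[0].item()  # batch size 1 for simplicity
        # A stride of 2 shrinks the spatial size, cutting FLOPs ~4x
        # in subsequent layers.
        return torch.relu(self.convs[s_idx](x))
```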
To compensate for the shortage of 3D content, 2D-to-3D video conversion has recently attracted increasing attention from both industry and academia. Semi-automatic 2D-to-3D conversion, which estimates the depth of non-key-frames from key-frames, is particularly desirable because it balances labor cost against 3D quality. The placement of key-frames plays an important role in the quality of depth propagation. This paper proposes a semi-automatic 2D-to-3D scheme with adaptive key-frame selection that preserves temporal continuity more reliably and reduces the depth propagation errors caused by occlusion. Potential key-frames are localized according to clustered color variation and motion intensity. The key-frame interval is also taken into account to keep accumulated propagation errors under control while requiring minimal user interaction. Once the key-frame depth maps have been refined with user interaction, the depth maps of non-key-frames are propagated automatically by shifted bilateral filtering. Considering that the depth of objects may change due to object motion or camera zoom, a bi-directional depth propagation scheme is adopted in which a non-key-frame is interpolated from its two adjacent key-frames. Experimental results show that the proposed scheme outperforms existing 2D-to-3D schemes with a fixed key-frame interval.
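As a rough illustration of the propagation step, the sketch below implements a shifted bilateral filter in the spirit described: the filter window in the key frame is shifted by a per-pixel motion vector, and depth is averaged with weights combining spatial proximity and color similarity. Function and parameter names and values are assumptions, not the paper's settings.

```python
import numpy as np

def shifted_bilateral_propagate(key_rgb, key_depth, cur_rgb, mv,
                                radius=3, sigma_s=2.0, sigma_c=10.0):
    """Propagate a key-frame depth map to the current frame with a
    bilateral filter whose window is shifted by a per-pixel motion
    vector `mv` (H, W, 2). A simplified sketch, not the paper's
    exact formulation."""
    H, W = cur_rgb.shape[:2]
    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            # Centre of the search window in the key frame.
            cy, cx = y + int(mv[y, x, 0]), x + int(mv[y, x, 1])
            wsum, dsum = 1e-8, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ky, kx = cy + dy, cx + dx
                    if not (0 <= ky < H and 0 <= kx < W):
                        continue
                    dc = np.linalg.norm(cur_rgb[y, x].astype(float)
                                        - key_rgb[ky, kx].astype(float))
                    # Spatial closeness times color similarity.
                    w = np.exp(-(dy*dy + dx*dx) / (2*sigma_s**2)
                               - dc*dc / (2*sigma_c**2))
                    wsum += w
                    dsum += w * key_depth[ky, kx]
            out[y, x] = dsum / wsum
    return out
```

Bi-directional propagation would run this once from each adjacent key-frame and blend the two results with weights proportional to temporal distance.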
KEYWORDS: Cameras, Light, Calibration, Light sources, Reconstruction algorithms, 3D modeling, Time of flight imaging, Clouds, Light scattering, 3D image reconstruction
This paper is devoted to recovering the 3D coordinates of partial scene points for scene reconstruction from time-of-flight (ToF) images. Assuming the camera does not move, only the coordinates of the points in the images are accessible. The exposure time is two trillionths of a second, and the synthetic visualization renders light propagation at half a trillion frames per second. In global light transport, the direct component means that light emitted from the light point is reflected off a scene point only once. Since the camera and the light source can be regarded as the two foci of an ellipsoid, with a constant focal path length at a given instant, we exploit two constraints: (1) the measured distance is the sum of the distances the light travels between the two foci and the scene point; and (2) the focal point of the camera, the scene point, and the corresponding image point are collinear. It is worth mentioning that calibration is necessary to obtain the coordinates of the light point. The calibration can be done in two steps: (1) choose a scene that contains some pairs of points at the same depth, whose positions are known; and (2) substitute these positions into the two constraints above and solve for the coordinates of the light point. After the coordinates of the scene points are computed, MeshLab is used to build the partial scene model. The proposed approach also makes it possible to estimate the exact distance between two scene points.
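The two constraints admit a closed-form solution along each back-projected ray. Writing the camera centre as C, the calibrated light point as L, the unit ray direction as r, and the measured optical path length as D, the scene point is P = C + t r with ||L - P|| + t = D, which solves to t = (D² - ||L - C||²) / (2(D - r·(L - C))). A minimal sketch of this computation (variable names are illustrative, not from the paper's code):

```python
import numpy as np

def scene_point(cam, light, ray_dir, path_len):
    """Intersect the camera ray with the ellipsoid whose foci are the
    camera and the light source and whose focal path length equals the
    measured light travel distance.

    cam, light : 3-vectors (camera centre, calibrated light point)
    ray_dir    : unit vector of the back-projected image ray
    path_len   : total distance light travelled (light -> point -> camera)
    """
    v = light - cam
    # ||v - t*r|| + t = D  =>  t = (D^2 - ||v||^2) / (2 (D - r.v))
    t = (path_len**2 - v @ v) / (2.0 * (path_len - ray_dir @ v))
    return cam + t * ray_dir

# Example: light 1 m to the right of the camera, point straight ahead.
cam = np.zeros(3)
light = np.array([1.0, 0.0, 0.0])
r = np.array([0.0, 0.0, 1.0])
p = scene_point(cam, light, r, path_len=4.0)
print(p, np.linalg.norm(light - p) + np.linalg.norm(p - cam))  # path ~4.0
```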
In this paper, we study optimal bandwidth allocation for scalable video coding (SVC) streaming in multiple overlays. We model the whole bandwidth request and distribution process as a set of decentralized auction games between the competing peers. For the upstream peer, a bandwidth allocation mechanism is introduced to maximize the aggregate revenue. For the downstream peer, a dynamic bidding strategy is proposed that achieves maximum utility and efficient resource usage in combination with a content-aware layer dropping/adding strategy. The convergence of the proposed auction games is also proved theoretically. Experimental results show that the auction strategies adapt to the dynamic arrival of competing peers and video layers.
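A toy version of one auction round, under strong simplifying assumptions (a single divisible upload resource, unit-price bids, greedy highest-bid-first allocation), might look like the following; it is meant only to make the upstream/downstream interaction concrete, not to reproduce the paper's mechanism or its convergence proof.

```python
import numpy as np

def auction_round(capacity, bids, demands):
    """One round of the upstream peer's allocation: serve the highest
    unit-price bids first until upload capacity runs out. A simplified
    sketch of a revenue-maximizing allocation."""
    order = np.argsort(bids)[::-1]          # highest bid first
    alloc = np.zeros_like(demands, dtype=float)
    left = float(capacity)
    for i in order:
        alloc[i] = min(demands[i], left)
        left -= alloc[i]
        if left <= 0:
            break
    return alloc

def update_bid(bid, got, want, step=0.1):
    """Toy downstream strategy: raise the bid if the last allocation
    missed the demand for a still-wanted layer, lower it otherwise."""
    return bid + step if got < want else max(0.0, bid - step)
```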
In this paper, we address the joint optimization problem of rate allocation and distortion control for scalable video coding (SVC) multicast networks. Firstly, we construct video distribution meshes by coupling network coding and multipath routing with multi-rate control to seek optimal routing paths and associated transmission rates. Secondly, a particular minimum bandwidth consumption scheme is presented for each video layer to reduce video distortion, where the content priority of the base layer and the minimum bandwidth consumption at higher layers are taken into account. Finally, a convex mathematical model combining these considerations is proposed. Through decomposition and a dual approach, the target convex optimization problem is solved by a fully decentralized algorithm organized as a two-level optimization procedure. Simulation results validate the convergence behavior and performance of the proposed algorithm.
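To make the two-level structure concrete, here is a generic dual-decomposition sketch: the upper level updates link prices (Lagrange multipliers) by projected subgradient, and the lower level lets each session best-respond to the current prices. The utility model, step size, and routing matrix are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def dual_subgradient(links_cap, route_use, rate_of, max_iter=500, step=0.1):
    """Toy two-level decomposition.

    links_cap : (L,) link capacities
    route_use : (S, L) 0/1 routing matrix (session s uses link l)
    rate_of   : callable mapping per-session prices to best-response rates
    """
    prices = np.ones(len(links_cap))
    for _ in range(max_iter):
        session_price = route_use @ prices     # price seen by each session
        rates = rate_of(session_price)         # lower level: per-session subproblem
        load = route_use.T @ rates             # traffic on each link
        # Upper level: raise prices on overloaded links, never below zero.
        prices = np.maximum(0.0, prices + step * (load - links_cap))
    return rates, prices

# Example best response for log-utility sessions: rate = 1 / price.
rates, prices = dual_subgradient(
    links_cap=np.array([1.0, 2.0]),
    route_use=np.array([[1, 0], [1, 1]], dtype=float),
    rate_of=lambda p: 1.0 / np.maximum(p, 1e-3),
)
```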
This paper presents a new rate allocation algorithm for MCTF-based video coding that aims to control quality fluctuation. Distortion analysis is conducted for MCTF using a simplified signal model. Based on this analysis, controlling quality fluctuation is posed as a quadratic programming problem, whose solution forms the basis of the proposed algorithm. After discussing some extensions of the proposed rate allocation method, we verify it on the MPEG SVC reference software; the experimental results demonstrate that the proposed rate allocation scheme reduces quality fluctuation significantly.
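As an illustration of posing quality-fluctuation control as a quadratic program, the sketch below minimizes the variance of per-frame distortions subject to a total rate budget. The linear distortion model D_i = a_i - b_i R_i and all numbers are stand-ins for the paper's MCTF distortion analysis.

```python
import numpy as np
from scipy.optimize import minimize

def allocate_rates(a, b, total_rate):
    """Allocate rates so the resulting distortions fluctuate as little
    as possible: minimize sum_i (D_i - mean(D))^2 s.t. sum_i R_i = R,
    R_i >= 0, with the assumed model D_i = a_i - b_i * R_i."""
    n = len(a)
    def fluctuation(r):
        d = a - b * r
        return np.sum((d - d.mean()) ** 2)
    res = minimize(
        fluctuation,
        x0=np.full(n, total_rate / n),          # start from equal split
        method="SLSQP",
        bounds=[(0, None)] * n,
        constraints=[{"type": "eq", "fun": lambda r: r.sum() - total_rate}],
    )
    return res.x

rates = allocate_rates(a=np.array([40.0, 55.0, 48.0]),
                       b=np.array([2.0, 3.0, 2.5]),
                       total_rate=30.0)
```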
In most existing in-band wavelet video coding schemes, an over-complete form of the reference bands is used to address the wavelet shift-variance problem in motion compensated temporal filtering (MCTF) in the wavelet domain. This improves the coding efficiency of the in-band MCTF scheme to some extent. However, a dilemma always arises when the input video is decomposed into a low-pass band and several high-pass bands. If the temporal filtering of the low-pass band does not use any information from the high-pass bands, coding efficiency decreases significantly for the higher-resolution reconstructed video. If it does use data from the high-pass bands, serious drifting errors appear when the high-pass bands are not available at the decoder. In this paper, for the latter case, we analyze when drifting errors occur and how they propagate along the lifting structure when decoding at the lower resolution. Based on this analysis, we introduce two new inter coding modes at the macroblock level for spatial low-pass band MCTF to achieve a better trade-off between low-resolution drifting error and high-resolution coding efficiency. Furthermore, we present a criterion to adaptively select the proper coding mode for each macroblock of the spatial low-pass band. Experimental results show that the proposed scheme dramatically reduces the drifting error, by about 0.4-2.6 dB across bit rates, at low resolution, while at high resolution the performance loss is marginal.
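The mode-selection idea can be caricatured as a per-macroblock cost comparison: one mode predicts from the low-pass band only (drift-free at low resolution), the other also exploits high-pass data (better full-resolution efficiency but drift-prone). The cost terms and weighting below are illustrative placeholders, not the paper's actual criterion.

```python
def select_mode(d_high_res, d_drift_low, weight=1.0):
    """Per-macroblock mode decision sketch.

    d_high_res  : dict mode -> distortion at full resolution
    d_drift_low : dict mode -> estimated drift at low resolution
    Picks the mode minimizing a weighted sum of the two costs.
    """
    return min(
        d_high_res,
        key=lambda m: d_high_res[m] + weight * d_drift_low[m],
    )

# Mode 1 codes better at full resolution but would drift at low
# resolution; with this weighting, mode 0 wins.
mode = select_mode(d_high_res={0: 12.0, 1: 7.5},
                   d_drift_low={0: 0.0, 1: 6.0})
```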
KEYWORDS: 3D image processing, 3D video compression, Wavelets, Reconstruction algorithms, Video, Distortion, 3D scanning, Video coding, Video compression, Visualization
3-D embedded wavelet video coding (3-D EWVC) algorithms have become a vital scheme for state-of-the-art scalable video coding. A major objective in a progressive transmission scheme is to transmit first the information that yields the largest distortion reduction, so traditional 3-D EWVC algorithms scan coefficients in bit-plane order. Within the same bit-plane, however, these algorithms neglect that significant bits of coefficients in different subbands contribute differently to distortion. In this paper, we analyze these differing contributions and propose a highly efficient significant-coefficient scanning algorithm. Experimental results on 3-D SPIHT and 3-D SPECK show that the proposed scanning algorithm improves the compression performance of traditional 3-D EWVC algorithms, yielding reconstructed videos with higher PSNR and better visual quality at the same bit rate than the original significant-coefficient scanning orders.
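The idea behind such a scanning order can be sketched as follows: within one bit-plane, newly significant coefficients are ordered by the estimated distortion reduction of their subband rather than by a fixed subband traversal. The gain weights here are placeholders for per-subband synthesis energies, and the whole function is illustrative rather than the paper's algorithm.

```python
import numpy as np

def scan_order(coeffs, subband_gain, bitplane):
    """Order the newly significant coefficients of one bit-plane so
    that subbands with larger synthesis gain (larger distortion
    reduction per bit) are transmitted first.

    coeffs       : dict subband -> 1-D array of wavelet coefficients
    subband_gain : dict subband -> energy weight of that subband
    bitplane     : bit-plane index n (significance threshold 2**n)
    """
    thr = 2 ** bitplane
    entries = []
    for band, c in coeffs.items():
        # Coefficients that become significant exactly at this plane.
        for idx in np.flatnonzero((np.abs(c) >= thr) & (np.abs(c) < 2 * thr)):
            entries.append((subband_gain[band] * thr ** 2, band, int(idx)))
    # Largest weighted distortion reduction first.
    return sorted(entries, reverse=True)
```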
KEYWORDS: Visualization, Motion models, Visual process modeling, Video, Digital filtering, Video compression, Scalable video coding, Wavelets, Digital watermarking, Image compression
A fundamental difference between the MCTF coding scheme and conventional motion-compensated DCT schemes is that the predicted residue is further used to update the temporal low-pass frames. If the motion prediction is inaccurate, this introduces ghosting artifacts in the low-pass frames when some high-pass frames are dropped due to limited channel bandwidth or device capability. However, removing the update step entirely definitely hurts coding efficiency. To resolve this dilemma, this paper proposes a content-adaptive update scheme in which a JND (Just Noticeable Difference) metric is used to evaluate the visual impact of the update steps on the low-pass frames. The JND thresholds are image dependent, and as long as the update information remains below these thresholds, the "update residual" stays visually transparent. Therefore, the potential ghosting artifacts detected by the model can be alleviated by adaptively removing the visible part of the predicted residues. Experimental results show that the proposed algorithm not only significantly improves the subjective visual quality of the temporal low-pass frames but also maintains PSNR performance compared with the normal full update.
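A minimal sketch of such a content-adaptive update, assuming the per-pixel JND map is computed elsewhere (e.g., from luminance and texture masking): the predicted residue is limited to the JND threshold before it updates the low-pass frame, so the update stays below the visibility threshold. The update gain is the usual 1/4 of 5/3 lifting, used here only as an illustrative default.

```python
import numpy as np

def jnd_limited_update(low_pass, residue, jnd):
    """Content-adaptive MCTF update step (sketch): clip the predicted
    residue to the per-pixel JND threshold so that only the visually
    transparent part updates the temporal low-pass frame."""
    transparent = np.clip(residue, -jnd, jnd)   # sub-threshold part only
    return low_pass + 0.25 * transparent        # 0.25: 5/3 lifting update gain
```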