Current learning-based Computer-Generated Holography (CGH) algorithms often utilize Convolutional Neural Network (CNN)-based architectures. However, these non-iterative CNN-based methods mostly underperform State-Of-The-Art (SOTA) iterative algorithms such as Stochastic Gradient Descent (SGD) in terms of display quality. Inspired by the global attention mechanism of the Vision Transformer (ViT), we propose a novel unsupervised ViT-based autoencoder for generating phase-only holograms. Specifically, for the encoding part, we use Uformer to generate the holograms. For the decoding part, we use the Angular Spectrum Method (ASM) instead of a learnable network to reconstruct the target images. To validate the effectiveness of the proposed method, numerical simulations and optical reconstructions are performed to compare our proposal against both iterative algorithms and CNN-based techniques. In the numerical simulations, the PSNR and SSIM of the proposed method are 26.78 dB and 0.832, which are 4.02 dB and 0.09 higher than those of the CNN-based method, respectively. Moreover, the proposed method produces fewer speckles and achieves higher display quality than the other CGH methods in optical experiments. We suggest that the improvement can be ascribed to the ViT's global attention mechanism, which is better suited to learning the cross-domain mapping from the image (spatial) domain to the hologram (Fourier) domain. We believe the proposed ViT-based CGH algorithm could be a promising candidate for future real-time, high-fidelity holographic displays.
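The fixed ASM decoder mentioned above is a standard, differentiable free-space propagation operator. A minimal NumPy sketch of ASM propagation from a phase-only hologram to the image plane is given below; the wavelength, pixel pitch, and propagation distance are illustrative assumptions, not values from the paper.

```python
import numpy as np

def asm_propagate(phase, wavelength=520e-9, pitch=8e-6, z=0.2):
    """Propagate a phase-only hologram over distance z via the Angular Spectrum Method.

    Parameters are illustrative: wavelength and pitch in meters, z in meters.
    Returns the complex field at the image plane.
    """
    field = np.exp(1j * phase)                 # phase-only SLM: unit amplitude
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)           # spatial frequencies (cycles/m)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Argument of the square root in the propagation kernel; <= 0 means evanescent.
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = (2 * np.pi / wavelength) * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * z * kz) * (arg > 0)        # transfer function, evanescent cutoff
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Amplitude reconstruction from a random phase pattern.
recon = np.abs(asm_propagate(np.random.rand(256, 256) * 2 * np.pi))
```

Because the transfer function has unit modulus on all propagating frequencies, the operator is energy-preserving, which is one reason it can replace a learnable decoder without losing information.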
Inpainting shadowed regions cast by superficial blood vessels in retinal optical coherence tomography (OCT) images is critical for accurate and robust machine analysis and clinical diagnosis. Traditional sequence-based approaches, such as propagating neighboring information to gradually fill in the missing regions, are cost-effective, but they produce less satisfactory outcomes on larger missing regions and texture-rich structures. Emerging deep learning-based methods such as encoder-decoder networks have shown promising results in natural image inpainting tasks. However, they typically require long training times and large datasets, which makes them difficult to apply to the often small medical datasets. To address these challenges, we propose a novel multi-scale shadow inpainting framework for OCT images that synergistically combines sparse representation and deep learning: sparse representation is used to extract features from a small number of training images for subsequent inpainting and to regularize the image after multi-scale image fusion, while a convolutional neural network (CNN) is employed to enhance the image quality. During inpainting, we divide the preprocessed input images into different branches based on the shadow width to harvest complementary information from different scales. Finally, a sparse representation-based regularization module is designed to refine the generated contents after multi-scale feature aggregation. Experiments are conducted to compare our proposal against both traditional and deep learning-based techniques on synthetic and real-world shadows. Results demonstrate that our proposed method achieves favorable image inpainting in terms of visual quality and quantitative metrics, especially when wide shadows are present.
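The width-based branch routing described above can be illustrated with a small sketch. The column-wise shadow model (vessel shadows spanning full image columns) and the width thresholds are hypothetical assumptions for illustration; the paper's actual branching criteria may differ.

```python
import numpy as np

def route_by_width(shadow_mask, thresholds=(5, 15)):
    """Assign contiguous shadow regions to scale branches by their pixel width.

    shadow_mask : 2-D boolean array, True where a shadow is detected.
    thresholds  : hypothetical (narrow, medium) width cutoffs in pixels.
    Returns a dict mapping branch name -> list of (start_col, end_col) spans.
    """
    cols = shadow_mask.any(axis=0)             # columns containing any shadow
    branches = {"narrow": [], "medium": [], "wide": []}
    start = None
    # Append a trailing False so a shadow touching the right edge is closed.
    for x, flag in enumerate(np.append(cols, False)):
        if flag and start is None:
            start = x                          # shadow span begins
        elif not flag and start is not None:
            width = x - start                  # shadow span ends; measure width
            if width <= thresholds[0]:
                branches["narrow"].append((start, x))
            elif width <= thresholds[1]:
                branches["medium"].append((start, x))
            else:
                branches["wide"].append((start, x))
            start = None
    return branches
```

Each branch would then be inpainted at its own scale before the multi-scale feature aggregation step.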