1. INTRODUCTION
Super-resolution structured illumination microscopy (SIM) is an ideal modality for imaging live cells because of its relatively high speed and low photon-induced damage to the cells compared with other super-resolution fluorescence microscopy techniques [2, 3]. SIM and its variants are based on the original wide-field design of Gustafsson [4]. SIM consists of two generic components: (i) sample illumination by a sinusoidal pattern and (ii) computational reconstruction of a super-resolution image [5]. Over the years, intensive research has focused on improving the hardware, the means of sample illumination, the reconstruction algorithms, and approaches to increase reconstruction speed [6-10]. The overarching goal of these combined efforts is an imaging modality that produces super-resolution images in real time with minimal artifacts [3, 11-14].
Often the rate-limiting step in observing a super-resolution image in SIM is the reconstruction speed of the algorithm required to form a single image from as many as nine raw images [15, 16]. The speed of execution can be limited by either the code of the algorithms themselves or the computer hardware. Most widely used approaches perform a Fourier transform of the captured images, carry out calculations in Fourier space, and then apply an inverse Fourier transform to produce the super-resolution image. These reconstruction algorithms impose a significant computing burden because of a complex workflow and the large number of calculations needed to produce the final image [17, 18]. Reconstruction requires 10-300 s per image, which essentially precludes real-time imaging [8, 19]. In addition, image reconstruction calculations must be performed with great care because artifacts can be introduced into the final images, and this is further complicated by the motion of the cell or organelles during imaging [12, 17, 18, 20-22].
Finally, hardware components must be selected carefully because these too can limit the rate of algorithm execution. Until recently, most image reconstruction algorithms were executed on central processing units (CPUs), where instructions are executed serially. In contrast, instructions within the graphics processing unit (GPU) environment are executed in a massively parallel fashion, which is 10- to 100-fold faster [23-25]. Given the heavy computing burden, it therefore makes sense to reconstruct super-resolution images in the GPU environment. This was first demonstrated, albeit in a complex fashion using three cameras and multiple computers, by Markwirth et al. [26]. More recently, an improved algorithm with a simplified workflow, called Joint Space and Frequency Reconstruction SIM (JSFR-SIM), was developed [27]. While this algorithm is only 2-fold faster than the widely used Wiener SIM, converting the code to the GPU environment resulted in a 77-fold improvement in execution speed. However, CPU-to-GPU code conversion is not straightforward. In addition, the vast majority of SIM image reconstruction code is not written by computer scientists, which implies that inefficient code could create performance bottlenecks.
To improve code execution speed, we developed a set of simple techniques within the framework of MATLAB, using the compute-intensive Hessian SIM as the test code, and described how to enhance algorithm processing speed [19, 28]. MATLAB is a popular programming language and computing environment among microscopy researchers because it offers an easy way to write, test, and run many image processing algorithms without background knowledge in computer science. However, the resulting algorithms can suffer from poor performance due to inefficiently written code. When code is optimized, significant speed increases are seen.
Execution speeds are further enhanced by running GPU-optimized code on GPU-enabled desktop computers. These lessons were then used to enhance the execution speed of both JSFR- and JSFR-AR-SIM [1, 27]. The results show that the combination of code improvement, conversion to the GPU environment, and use of a GPU-enabled computer results in a 4- to 500-fold improvement in algorithm execution speed. Importantly, the resulting image quality is identical to that produced by the original algorithm.
2. RESULTS
The scheme to improve code execution is straightforward and uses tools already present in MATLAB. The first step is to identify both algorithm and hardware bottlenecks, as either one or both can contribute to poor execution performance. Details are provided elsewhere [28] and are summarized in the following paragraphs. To identify hardware bottlenecks, the most straightforward approach is to use the Task Manager in Windows to visualize the use of different components during algorithm execution. In the test example shown, which uses the Hessian SIM algorithm and our baseline computer, the CPU is only lightly used while the GPU is not utilized at all (Fig. 1). These results are directly attributable to how the algorithm was written and are independent of the type of computer used. That is, the code was written to execute on the CPU only, which leaves the GPU idle, and identical results are observed on more powerful GPU-enabled computers (data not shown; see [28]). Code bottlenecks can then be identified with the MATLAB Profiler combined with the tic and toc functions, which report function execution times. Next, the microscopist must determine how frequently memory is accessed and reduce this to a minimum. This follows because memory access latency (> 60 ns) is much higher than the cycle time of either the CPU or the GPU (< 1 ns).
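As a rough illustration of the timing step described above, the sketch below brackets each stage of a toy pipeline with a timer and ranks the stages by elapsed time. It is written in Python as a stand-in for MATLAB's tic and toc; the helper `timed` and the stage functions `load_stack` and `reconstruct` are hypothetical placeholders and are not part of the Hessian SIM code.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, store):
    """Accumulate the wall-clock time of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[label] = store.get(label, 0.0) + (time.perf_counter() - start)


def load_stack(n):
    """Hypothetical stage: build a toy 'image stack' of n pixel values."""
    return [float(i % 7) for i in range(n)]


def reconstruct(stack):
    """Hypothetical stage: a deliberately heavier per-pixel calculation."""
    return [sum(v * k for k in range(50)) for v in stack]


def profile_pipeline(n=20000):
    """Time each stage and return (stages ranked slowest first, result)."""
    store = {}
    with timed("load", store):
        stack = load_stack(n)
    with timed("reconstruct", store):
        result = reconstruct(stack)
    ranked = sorted(store.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, result


if __name__ == "__main__":
    ranked, _ = profile_pipeline()
    for label, seconds in ranked:
        print(f"{label}: {seconds * 1e3:.2f} ms")
```

The stage at the top of the ranked list is the first candidate for optimization. Absolute times depend on the machine, so only the relative ordering is meaningful here.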
Code that accesses memory frequently is therefore one of the major performance bottlenecks. Once these issues have been addressed, the code must be carefully examined to determine whether it has been written inefficiently. One typical error is performing redundant operations inside a loop that is sometimes repeated hundreds of times. Because such redundant operations waste CPU cycles without making any progress, they must be removed from the loops. In addition, most software is written to use only a single core within the CPU. Modern CPUs have multiple cores, and each can process a different task independently, i.e., multitasking. Thus, multiple CPU cores can be used by image processing algorithms to improve performance by processing different, independent tasks in parallel, i.e., concurrency. To achieve this, MATLAB provides an add-on called the Parallel Computing Toolbox, which allows researchers to exploit multiple CPU cores.
Once code has been optimized to execute as rapidly and efficiently as possible on the CPU, it can be converted to run in the GPU environment. The MATLAB Parallel Computing Toolbox also allows researchers to easily exploit GPUs. As with the CPU code, the GPU code must be optimized again to fully exploit GPU performance through massive parallelism. For example, image data must be stored in both CPU memory and GPU memory to be processed, and must be transferred between them. However, memory access is a slow operation that makes the CPU and GPU wait, and transfer between CPU and GPU memory is rate-limiting because of the limited bandwidth between these components. Thus, frequent memory access and data transfer between the CPU and the GPU during algorithm execution incur performance overhead and must be avoided. To avoid unnecessary memory access and data transfer, all data that the image processing algorithm requires from CPU memory can be copied into GPU memory in advance, i.e., pre-allocated.
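The redundant-loop problem described above can be illustrated with a minimal sketch. It is written in Python rather than MATLAB, and the function names and the Gaussian weights are assumptions chosen for illustration only. The point is that a value depending only on `sigma` is computed once outside the loop instead of once per frame, with identical output.

```python
import math


def gaussian_norm(sigma):
    """Sum of seven Gaussian weights; depends only on sigma, not on the frame."""
    return sum(math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-3, 4))


def normalize_frames_naive(frames, sigma):
    """Inefficient version: the loop-invariant norm is recomputed for every frame."""
    out = []
    for frame in frames:
        norm = gaussian_norm(sigma)  # redundant work repeated on each pass
        out.append([pixel / norm for pixel in frame])
    return out


def normalize_frames_hoisted(frames, sigma):
    """Optimized version: the invariant is computed once, before the loop."""
    norm = gaussian_norm(sigma)
    return [[pixel / norm for pixel in frame] for frame in frames]


if __name__ == "__main__":
    frames = [[float(i + j) for j in range(4)] for i in range(3)]
    same = normalize_frames_naive(frames, 1.5) == normalize_frames_hoisted(frames, 1.5)
    print("outputs identical:", same)
```

With hundreds of loop passes, the hoisted version performs the invariant calculation once rather than hundreds of times, while the per-frame arithmetic is unchanged.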
Copying the data into GPU memory in advance allows all processing to be done on the GPU without additional CPU memory access, which improves performance significantly. The effect of the improved code on algorithm execution speed is shown in Figure 2. For performance comparison, we use a single 128 x 256 x 180 image stack. As shown at the top of panel A, the improved CPU code executes 5-fold faster than the vanilla code (baseline) on a single CPU core. When multiple CPU cores are used, the code executes 7-fold faster than the unmodified code. Finally, when the code is optimized for the GPU environment, the algorithm that took 330 s to execute now completes in 2 s, a 165-fold improvement. In addition, the CPU is now used at only 17% of capacity (down almost 2-fold; Fig. 1), and the GPU, which was not used at all, is now working at full capacity (Fig. 2B), which shows that the GPU has become the performance bottleneck.
Additional improvements are observed when powerful GPU-enabled computers are used (Table 1) [28]. Here two machines were built. The first used an Intel Core i9 CPU and had one GPU, while the second used an AMD Ryzen Threadripper CPU and had two GPUs. On the Intel-based machine, the algorithm takes 670 ms to execute, while the AMD-based machine requires 800 ms. We expect further improvements in execution speed for the AMD machine, as we did not take full advantage of the two GPUs because of MATLAB's limited support for multiple GPUs (data not shown). Collectively, the combination of improved code, conversion to the GPU environment, and use of powerful GPU-enabled computers results in a 490-fold increase in algorithm execution speed.
Table 1. The combination of GPU-optimized code and improved hardware produces maximum performance increases in algorithm execution speed.
a. The three computers used for testing are: (1) Baseline: Dell XPS 15 9570 with an Intel Core i7-8750H CPU, 32 GB of DDR4 RAM; 1 SSD (2 TB Samsung SSD 970 EVO Plus) and one NVIDIA GeForce GTX 1050 Ti with Max-Q Design graphics card. The operating system is Windows 10. (2) Dell Precision 3660 with an Intel W680 (Alder Lake-S PCH) motherboard; an Intel Core i9-12900K CPU, 64 GB of DDR5 RAM; 2 SSDs (1 TB NVMe SK Hynix and 4 TB Seagate ST4000DX005) and one NVIDIA RTX 3090 graphics card with 24 GB of GPU memory. The operating system is Windows 11. (3) DigitalStorm computer with an ASUS ROG Zenith II Extreme Alpha motherboard; an AMD Ryzen Threadripper 3990X CPU, 128 GB of DDR4 RAM; 3 SSDs (1 TB Samsung 970 EVO Plus, 2 TB Samsung 860 Pro, and 4 TB Samsung 860 Pro) and two NVIDIA RTX A6000 graphics cards with 48 GB of GPU memory each. The operating system is Windows 11.
b. ND, not done.
Due to time constraints, only some of the above-mentioned improvements could be applied to the JSFR- and JSFR-AR-SIM code (Table 2) [1, 27]. Even with these limited improvements, the data show that each algorithm executes 20- to 60-fold faster in the GPU environment than on the CPU. These speed improvements mean that the combination of acquisition and image processing produces a super-resolution image in 67 to 88 ms. Consequently, with these two modalities, all microscopy is done in super-resolution imaging mode only. In contrast, previous SIM implementations required that an initial field of view be located in widefield mode before switching to SIM for super-resolution imaging. This laborious and time-consuming process is now eliminated.
Table 2. The impact of improved code and implementation of the GPU environment on algorithm execution speed.
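The fold-improvements and per-image times quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below, in Python, uses only values stated in the text (330 s baseline, 2 s, 670 ms, 800 ms, and 67 to 88 ms per image). The function names and dictionary labels are hypothetical, and the frame rates are a derived illustration under the assumption that images are produced back to back, not a measurement reported by the authors.

```python
def fold_speedup(baseline_s, optimized_s):
    """Return how many times faster the optimized run is than the baseline."""
    if optimized_s <= 0:
        raise ValueError("optimized time must be positive")
    return baseline_s / optimized_s


def frames_per_second(seconds_per_image):
    """Convert a per-image time into an equivalent image rate."""
    if seconds_per_image <= 0:
        raise ValueError("time per image must be positive")
    return 1.0 / seconds_per_image


BASELINE_S = 330.0  # unmodified Hessian SIM code on the baseline computer

# Execution times quoted in the text, in seconds.
RUNS_S = {
    "baseline computer, GPU-optimized code": 2.0,
    "Intel-based machine": 0.670,
    "AMD-based machine": 0.800,
}

if __name__ == "__main__":
    for label, seconds in RUNS_S.items():
        print(f"{label}: {fold_speedup(BASELINE_S, seconds):.0f}-fold")
    for ms in (67, 88):
        print(f"{ms} ms per image: {frames_per_second(ms / 1000.0):.1f} images/s")
```

The 670 ms case gives a ratio just above 490, consistent with the roughly 490-fold figure quoted for the best configuration.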
Once algorithm execution speed has been improved, the resulting image quality must be assessed to determine whether it is unchanged compared with the original. To do this, images can be compared byte by byte and, separately, using ImageJ (Fig. 3) [28]. This analysis shows that the image quality is identical and that the only difference is the execution speed of the image reconstruction algorithm. To further demonstrate this, a comparison of images obtained using the GPU-enhanced code is presented (Fig. 4) [1]. These images are identical within experimental error but are reconstructed at significantly greater speeds than with the unenhanced algorithm. Compare Fig. 4B and D, which were reconstructed using GPU-enhanced JSFR- and JSFR-AR-SIM, with Fig. 4C, which was reconstructed using CPU-executed HiFi-SIM.
3. CONCLUSIONS
The primary conclusion of this work is that improved code implemented on GPU-enhanced computers results in significantly faster algorithm execution. For structured illumination microscopy, the reconstruction of super-resolution images is sufficiently rapid to enable the microscopist to image in super-resolution mode only, simplifying the workflow while obtaining images in less than 90 ms. Due to time constraints, only a limited number of algorithm improvements could be implemented, and the GPU-enhanced computers could not be used for the super-resolution imaging. Consequently, we anticipate further improvements in algorithm execution speed once all changes are implemented and the enhanced computing environment is fully exploited.
4. REFERENCES
Wang, Z., et al.,
“Rapid, artifact-reduced, image reconstruction for super-resolution structured illumination microscopy,” Innovation (Camb), 4(3), 100425 (2023).
Hirano, Y., A. Matsuda, and Y. Hiraoka, “Recent advancements in structured-illumination microscopy toward live-cell imaging,” Microscopy (Oxf), 64(4), 237–49 (2015).
Heintzmann, R. and T. Huser, “Super-Resolution Structured Illumination Microscopy,” Chem Rev, 117(23), 13890–13908 (2017). https://doi.org/10.1021/acs.chemrev.7b00218
Gustafsson, M.G., “Surpassing the lateral resolution limit by a factor of two using structured illumination microscopy,” J Microsc, 198, 82–7 (2000). https://doi.org/10.1046/j.1365-2818.2000.00710.x
Kner, P., et al., “Super-resolution video microscopy of live cells by structured illumination,” Nat Methods, 6(5), 339–42 (2009). https://doi.org/10.1038/nmeth.1324
Wu, Y. and H. Shroff, “Faster, sharper, and deeper: structured illumination microscopy for biological imaging,” Nat Methods, 15(12), 1011–1019 (2018). https://doi.org/10.1038/s41592-018-0211-z
Zhao, T., et al., “Advances in High-Speed Structured Illumination Microscopy,” Frontiers in Physics, 9 (2021).
Muller, M., et al., “Open-source image reconstruction of super-resolution structured illumination microscopy data in ImageJ,” Nat Commun, 7, 10980 (2016). https://doi.org/10.1038/ncomms10980
Ma, Y., Wen, K., Liu, M., Zheng, J., Chu, K., Smith, Z. J., Liu, L., Gao, P., “Recent advances in structured illumination microscopy,” J Phys Photonics, 3, 024009 (2021). https://doi.org/10.1088/2515-7647/abdb04
Curd, A., et al., “Construction of an instant structured illumination microscope,” Methods, 88, 37–47 (2015). https://doi.org/10.1016/j.ymeth.2015.07.012
Smith, C.S., et al., “Structured illumination microscopy with noise-controlled image reconstructions,” Nat Methods, 18(7), 821–828 (2021). https://doi.org/10.1038/s41592-021-01167-7
Fan, J., et al., “A protocol for structured illumination microscopy with minimal reconstruction artifacts,” Biophysics Reports, 5(2), 80–90 (2019). https://doi.org/10.1007/s41048-019-0081-7
Pospíšil, J., K. Fliegel, and M. Klíma, “Analysis of image reconstruction artifacts in structured illumination microscopy,” in Proc. SPIE 10396, Applications of Digital Image Processing XL, 1–12 (2017).
Wen, G., et al., “High-fidelity structured illumination microscopy by point-spread-function engineering,” Light Sci Appl, 10(1) (2021). https://doi.org/10.1038/s41377-021-00513-w
Sahl, S.J., et al., “Comment on ‘Extended-resolution structured illumination imaging of endocytic and cytoskeletal dynamics’,” Science, 352(6285), 527 (2016). https://doi.org/10.1126/science.aad7983
Heintzmann, R. and M.G. Gustafsson, “Subdiffraction resolution in continuous samples,” Nature Photonics, 3(7), 362–364 (2009). https://doi.org/10.1038/nphoton.2009.102
Wicker, K., “Non-iterative determination of pattern phase in structured illumination microscopy using autocorrelations in Fourier space,” Opt Express, 21(21), 24692–701 (2013). https://doi.org/10.1364/OE.21.024692
Chu, K., et al., “Image reconstruction for structured-illumination microscopy with low signal level,” Opt Express, 22(7), 8687–702 (2014). https://doi.org/10.1364/OE.22.008687
Huang, X., et al., “Fast, long-term, super-resolution imaging with Hessian structured illumination microscopy,” Nat Biotechnol, 36(5), 451–459 (2018). https://doi.org/10.1038/nbt.4115
Schaefer, L.H., D. Schuster, and J. Schaffer, “Structured illumination microscopy: artefact analysis and reduction utilizing a parameter optimization approach,” J Microsc, 216(Pt 2), 165–74 (2004). https://doi.org/10.1111/jmi.2004.216.issue-2
Forster, R., et al., “Motion artefact detection in structured illumination microscopy for live cell imaging,” Opt Express, 24(19), 22121–34 (2016). https://doi.org/10.1364/OE.24.022121
Zhou, X., et al., “Image recombination transform algorithm for superresolution structured illumination microscopy,” J Biomed Opt, 21(9), 96009 (2016). https://doi.org/10.1117/1.JBO.21.9.096009
Gong, H., W. Guo, and M.A.A. Neil, “GPU-accelerated real-time reconstruction in Python of three-dimensional datasets from structured illumination microscopy with hexagonal patterns,” Philos Trans A Math Phys Eng Sci, 379(2199), 20200162 (2021).
Lu, G., et al., “A real-time GPU-accelerated parallelized image processor for large-scale multiplexed fluorescence microscopy data,” Front Immunol, 13, 981825 (2022). https://doi.org/10.3389/fimmu.2022.981825
Aydin, M., Uysalli, Y., Ozgonul, E., Morova, B., Kiraz, A., “An LED-based Super Resolution GPU Implemented Structured Illumination Microscope,” in Single Molecule Spectroscopy and Superresolution Imaging XIII, 1124610, SPIE, San Francisco, CA (2020).
Markwirth, A., et al., “Video-rate multi-color structured illumination microscopy with simultaneous real-time reconstruction,” Nature Communications, 10(1), 4315 (2019). https://doi.org/10.1038/s41467-019-12165-x
Zhaojun Wang, T.Z., Huiwen Hao, Yanan Cai, Kun Feng, Xue Yun, Yansheng Liang, Shaowei Wang, Yujie Sun, Piero R. Bianco, Kwangsung Oh, Ming Lei, “High-speed image reconstruction for optically sectioned, super-resolution structured illumination microscopy,” Advanced Photonics, 4(2), 026003 (2022).
Oh, K. and P.R. Bianco, “Facile conversion and optimization of structured illumination image reconstruction code into the GPU environment,” International Journal of Biomedical Imaging, submitted (2023).