1. INTRODUCTION
The introduction of conformal radiotherapy enables high-precision dose delivery to the tumor while sparing surrounding normal tissues, allowing treatment margin reduction, dose escalation, and improved tumor control [1]. However, internal anatomical motion, such as respiratory or cardiac motion, introduces uncertainty in tumor location and may cause the radiation beams to miss the tumor and damage normal tissues. Image-guided radiation therapy widely uses X-ray-based imaging to localize the tumor before and during treatment to maintain delivery accuracy [2]. Real-time imaging, in particular, is highly desirable because it localizes the tumor instantly and allows the treatment to adapt to such real-time changes, maximizing treatment accuracy. Because real-time imaging requires a temporal resolution on the order of hundreds of milliseconds, the volumetric information acquired with current mainstream imaging modalities, including cone-beam computed tomography (CBCT), is severely under-sampled. This degree of under-sampling makes it impossible to reconstruct CBCT images of sufficient quality for tumor localization with conventional methods.
Given the recent successes of deep learning (DL), several groups have proposed DL-based methods for real-time imaging. A few network architectures were proposed to reconstruct three-dimensional (3D) CBCT images from single-view or orthogonal-view X-ray projections [3-5]. Such networks, however, tackle an ill-conditioned problem: estimating high-dimensional volumetric data from one or two X-ray projections. Considerable reconstruction errors remain, albeit much smaller than those of conventional reconstruction algorithms. Moreover, to track tumors in real time, additional image registration or segmentation steps are needed to localize the tumors in the reconstructed CBCT images.
This is particularly challenging for liver tumors because of their low contrast against the surrounding normal liver parenchyma. To address these challenges in real-time imaging, especially for liver tumor localization, we propose a mesh registration-based method that combines deep neural networks with biomechanical modeling. The method directly solves liver tumor motion between a prior CT/CBCT image and a single X-ray projection to localize liver tumors in real time, eliminating the need to reconstruct a high-quality intermediate CBCT image prior to localization. Specifically, a deep graph neural network-based architecture was trained to model the correlation between patient-specific liver boundary motion and features learned from individual X-ray projections. The trained network can then predict liver boundary motion from a single real-time X-ray projection. Using the predicted liver boundary motion as the boundary condition, we further performed finite element analysis-based biomechanical modeling of the liver to solve for intra-liver tumor motion. The method adopts a deformation-driven approach that incorporates prior information to tackle the extreme under-sampling. The two-step registration scheme reduces the complexity of the deep graph neural network by introducing domain knowledge (biomechanical modeling). Biomechanical modeling uses information including structure geometry, material composition, and elasticity to derive physiologically and physically meaningful deformation, and complements the intensity information in the X-ray projection to further improve registration and tumor localization accuracy [6]. The accuracy of the proposed technique was evaluated on 10 patients with liver cancer and compared with that of two other techniques.
The first technique uses the diaphragm as an anatomic landmark and tracks diaphragm motion directly on the on-board projections via template matching as a surrogate for liver tumor motion. The second technique is a principal component analysis (PCA)-based method that models 3D motion with a few motion eigenvectors for dimension reduction and tumor tracking [7].
2. MATERIALS AND METHODS
2.1 Method overview
In this study, liver motion was solved and liver tumors were localized via deformable registration between a liver mesh (extracted offline from a prior CT/CBCT image acquired before treatment) and the liver features on a single X-ray projection (Fig. 1). The registration was achieved in two steps: (a) liver surface motion estimation via a deep graph neural network-based structure (Fig. 2); and (b) intra-liver motion estimation via biomechanical modeling. Specifically, in step (a), a patient-specific DL model was trained to predict a liver boundary deformation vector field (DVF) that deforms the prior liver surface mesh to match the liver shape variations encoded in the X-ray projection. In step (b), a biomechanical model of the liver was built, and an intra-liver DVF was solved through finite element analysis using the liver boundary DVF as the boundary condition.
2.2 The deep-learning network architecture
The network was trained to learn image features from on-board X-ray projections and to predict the boundary motion from the prior liver mesh to each on-board projection. The DL network architecture, illustrated in Fig. 2, contains two subnetworks. The first subnetwork extracts image features from each on-board X-ray projection; the extracted feature maps are then pooled and fed into the second subnetwork for liver boundary DVF prediction. Here we used ResNet-50 [8] as the feature extraction network.
Consisting of a series of convolutional layers stacked in a residual learning architecture, ResNet-50 extracts the encoded liver shape variations through local and global image features in the X-ray projection, and learns short- and long-range dependencies among these features. These learned dependencies aid deformation estimation because such dependencies are common in respiration-induced liver motion. The perceptual feature pooling layer pools the ResNet-50 feature maps by associating each 3D node of the liver surface mesh with a 2D point in the feature maps, using the cone-beam projection geometry. The second subnetwork comprises a series of deformation blocks that progressively deform the liver surface mesh nodes based on the feature maps from the first subnetwork. Each deformation block contains a graph convolutional network (GCN, Fig. 3) and a spatial transform layer that deforms the liver surface mesh using the GCN-predicted DVF [9]. A GCN performs graph-based convolutions that generalize standard convolution operations to data lacking an underlying Euclidean structure, such as functional networks in brain imaging. Such non-Euclidean data can be represented by a weighted graph comprising a set of vertices, edges connecting the vertices, and weights associated with each vertex (e.g., vertex features, DVFs, vertex-associated image features). A GCN is well suited to our problem because the liver surface mesh nodes, the geometrical connectivity (edges) between them, and the image features, DVFs, and learned vertex features associated with each node together form a typical non-Euclidean data structure. Using the extracted image features that encode liver shape, the preceding DVF, and the vertex features learned by the previous block, the GCN predicts a liver surface DVF that further deforms the mesh output by the previous block.
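To make the perceptual feature pooling and GCN deformation block described above more concrete, here is a minimal NumPy sketch of one block. It is not the authors' implementation and assumes: a flat-panel cone-beam geometry with hypothetical source-to-axis (`sad`) and source-to-detector (`sdd`) distances and a gantry rotating about the superior-inferior (z) axis; a random array standing in for ResNet-50 feature maps, with a hypothetical downsampling `stride`; and two plain Pixel2Mesh-style graph convolutions (self weight plus summed-neighbour weight) instead of the full G-ResNet with shortcut connections. All function names, shapes, and weights are illustrative.

```python
import numpy as np

def project_nodes(nodes, angle_deg, sad=1000.0, sdd=1500.0,
                  pixel_mm=4.0, det_shape=(64, 64)):
    """Cone-beam projection of (N, 3) node coordinates (mm, isocenter at the
    origin, z = superior-inferior) onto a flat detector.
    Returns (N, 2) fractional (row, col) detector pixel coordinates."""
    t = np.deg2rad(angle_deg)
    x, y, z = nodes[:, 0], nodes[:, 1], nodes[:, 2]
    # Rotate into the gantry frame; the source sits at y_g = -sad.
    x_g = x * np.cos(t) + y * np.sin(t)
    y_g = -x * np.sin(t) + y * np.cos(t)
    mag = sdd / (sad + y_g)  # per-node cone-beam magnification
    rows, cols = det_shape
    row = z * mag / pixel_mm + (rows - 1) / 2.0
    col = x_g * mag / pixel_mm + (cols - 1) / 2.0
    return np.stack([row, col], axis=1)


def bilinear_sample(fmap, rc):
    """Bilinearly sample a (C, H, W) feature map at (N, 2) fractional
    (row, col) positions, clamped to the map border. Returns (N, C)."""
    _, h, w = fmap.shape
    r = np.clip(rc[:, 0], 0, h - 1)
    c = np.clip(rc[:, 1], 0, w - 1)
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    r1, c1 = np.minimum(r0 + 1, h - 1), np.minimum(c0 + 1, w - 1)
    dr, dc = r - r0, c - c0
    out = (fmap[:, r0, c0] * (1 - dr) * (1 - dc) + fmap[:, r0, c1] * (1 - dr) * dc
           + fmap[:, r1, c0] * dr * (1 - dc) + fmap[:, r1, c1] * dr * dc)
    return out.T


def graph_conv(h, adj, w0, w1):
    """Pixel2Mesh-style graph convolution: each node combines its own
    feature (weights w0) with the sum of its neighbours' features (w1)."""
    return h @ w0 + adj @ (h @ w1)


def deformation_block(nodes, prev_dvf, vertex_feat, fmap, adj, angle_deg,
                      weights, stride=4.0):
    """One hypothetical deformation block: pool, two graph convs, move nodes."""
    # Perceptual feature pooling: image features where each node projects.
    img_feat = bilinear_sample(fmap, project_nodes(nodes, angle_deg) / stride)
    h = np.concatenate([img_feat, prev_dvf, vertex_feat], axis=1)
    h = np.maximum(graph_conv(h, adj, *weights[0]), 0.0)  # ReLU vertex features
    dvf = graph_conv(h, adj, *weights[1])                  # (N, 3) displacement
    # Spatial transform: apply the predicted DVF to the node coordinates.
    return nodes + dvf, dvf, h


rng = np.random.default_rng(0)
nodes = np.array([[0, 0, 0], [20, 0, 0], [0, 20, 0], [0, 0, 20]], float)
adj = np.ones((4, 4)) - np.eye(4)         # toy tetrahedron: all nodes connected
fmap = rng.standard_normal((8, 16, 16))   # stand-in for CNN feature maps
n_in, n_hidden = fmap.shape[0] + 3, 16    # image features + previous DVF
weights = [(0.1 * rng.standard_normal((n_in, n_hidden)),
            0.1 * rng.standard_normal((n_in, n_hidden))),
           (0.1 * rng.standard_normal((n_hidden, 3)),
            0.1 * rng.standard_normal((n_hidden, 3)))]
new_nodes, dvf, vfeat = deformation_block(
    nodes, np.zeros((4, 3)), np.zeros((4, 0)), fmap, adj, 45.0, weights)
```

As in the text, the first block receives a zero DVF and (here) no learned vertex features; chaining a second block would feed `new_nodes`, `dvf`, and `vfeat` forward, so its first weight matrices would need 8 + 3 + 16 input rows. In a trained model the weights would come from minimizing the mesh similarity and regularization losses rather than from a random generator.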
The inputs to the GCN in the first deformation block are the ResNet-50-extracted image features and an initial DVF set to zero. For each subsequent GCN, the image features were re-pooled at the new node coordinates (Fig. 2), obtained by deforming the mesh through the spatial transform layer with the DVF predicted by the preceding block. The re-pooled image features were then input into the subsequent GCN, along with the predicted surface DVF and the learned vertex features from the preceding block. We used a GCN with the same architecture as G-ResNet [9], illustrated in Fig. 3. The corresponding network was modified and adapted from the Pixel2Mesh library [10]. Model training was driven by a loss function combining a mesh similarity loss with regularization losses that constrain the deformation and enforce smoothness of the boundary DVFs.
2.3 Biomechanical modeling
After the deep neural network solves for a liver boundary DVF that matches the liver shape features on the X-ray projection, the intra-liver DVF was derived using a biomechanical model. Here we used the Mooney-Rivlin material model, which describes a hyperelastic (i.e., nonlinearly elastic) material and fits biological tissues well. Details of the biomechanical model implementation can be found in Ref. [6].
2.4 Dataset curation and augmentation
A dataset of 10 patients with liver cancer from our institution was used to evaluate the proposed method. The study was approved under an institutional review board protocol. Each patient had a contrast-enhanced four-dimensional CT (4D-CT) set from treatment planning, with the CT images binned into 10 respiratory phases (0% to 90%), 0% being the end-of-inhale phase. The CT images were resampled to a uniform size of 256×256×128 voxels with an isotropic resolution of 2 mm × 2 mm × 2 mm. On-board X-ray cone-beam projections were simulated from the CT images using a ray-tracing algorithm. We simulated projections at three angles: 0, 45, and 90 degrees.
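The ray-tracing projection simulation described above can be illustrated with a simple digitally reconstructed radiograph (DRR). The sketch below is an assumption-laden stand-in, not the paper's algorithm: instead of an exact voxel-traversal method (such as Siddon's), it samples each source-to-pixel ray at uniform midpoints with trilinear interpolation and sums attenuation times step length. The geometry values (`sad`, `sdd`, `pixel_mm`, `det_shape`), the attenuation units (1/mm), and the toy cube volume are hypothetical; the gantry convention matches the projection used for feature pooling in the previous sketch.

```python
import numpy as np

def trilinear(vol, pts):
    """Sample vol (X, Y, Z) at (M, 3) fractional voxel indices; zero outside."""
    i0 = np.floor(pts).astype(int)
    frac = pts - i0
    out = np.zeros(len(pts))
    for corner in range(8):
        off = np.array([(corner >> k) & 1 for k in range(3)])
        idx = i0 + off
        w = np.prod(np.where(off == 1, frac, 1.0 - frac), axis=1)
        ok = np.all((idx >= 0) & (idx < np.array(vol.shape)), axis=1)
        out[ok] += w[ok] * vol[idx[ok, 0], idx[ok, 1], idx[ok, 2]]
    return out


def simulate_projection(mu, voxel_mm, angle_deg, sad=1000.0, sdd=1500.0,
                        det_shape=(32, 32), pixel_mm=8.0, n_samples=256):
    """Line integrals of attenuation mu (1/mm) along source-to-pixel rays."""
    t = np.deg2rad(angle_deg)
    e_x = np.array([np.cos(t), np.sin(t), 0.0])    # detector column axis
    e_y = np.array([-np.sin(t), np.cos(t), 0.0])   # central beam direction
    e_z = np.array([0.0, 0.0, 1.0])                # detector row axis (SI)
    rows, cols = det_shape
    v = (np.arange(rows) - (rows - 1) / 2.0) * pixel_mm
    u = (np.arange(cols) - (cols - 1) / 2.0) * pixel_mm
    vv, uu = np.meshgrid(v, u, indexing="ij")
    det = (sdd - sad) * e_y + uu[..., None] * e_x + vv[..., None] * e_z
    src = -sad * e_y
    ray = det - src                                 # (rows, cols, 3)
    s = (np.arange(n_samples) + 0.5) / n_samples    # midpoint rule along each ray
    pts = src + ray[..., None, :] * s[:, None]      # (rows, cols, n_samples, 3)
    step = np.linalg.norm(ray, axis=-1) / n_samples
    centre = (np.array(mu.shape) - 1) / 2.0         # isocenter at volume centre
    vals = trilinear(mu, pts.reshape(-1, 3) / voxel_mm + centre)
    return vals.reshape(rows, cols, n_samples).sum(axis=-1) * step


mu = np.zeros((64, 64, 64))
mu[16:48, 16:48, 16:48] = 0.02                       # toy 64 mm cube, 1/mm
proj = simulate_projection(mu, voxel_mm=2.0, angle_deg=45.0)
intensity = np.exp(-proj)                            # Beer-Lambert, I0 = 1
```

Uniform sampling keeps the code short but needs a step well below the 2 mm voxel size for accurate integrals; an exact traversal avoids that trade-off. Real CT volumes here (256×256×128 voxels) would also need chunking over detector rows to bound memory.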
The 0- and 90-degree angles correspond to the anterior-posterior and left-right directions, respectively. Because each patient had only a 10-phase 4D-CT set, we augmented each patient's dataset by simulating realistic respiratory deformations encountered in on-board liver imaging, to generate sufficient motion variations for training the patient-specific network and to avoid overfitting. The augmentation was based on a patient-specific PCA motion model [6, 11]. We first performed deformable registrations between the reference 0% phase and each of the other phases with the open-source software package Elastix to obtain DVFs. To improve intra-liver DVF accuracy, we then used biomechanical modeling to derive intra-liver DVFs, with the liver surface DVFs solved by Elastix as boundary conditions, and replaced the Elastix intra-liver DVFs with the biomechanical modeling-derived ones. PCA was then performed on these high-quality DVFs of each patient to obtain patient-specific principal motion components. For augmentation, the coefficients of the first three principal motion components were randomly scaled to regenerate DVFs of various magnitudes and patterns [11]. In total, 1,728 augmented samples were generated for each patient and partitioned into training, validation, and testing sets according to the original respiratory phase of the PCA coefficients before random scaling. The training set includes samples whose original PCA coefficients came from the 10% to 40% phases; the validation set, from the 60% and 70% phases; and the testing set, from the 50%, 80%, and 90% phases. The 50% phase is the end-of-exhale phase, which has the largest deformation from the 0% phase.
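The PCA-based augmentation and phase-based partitioning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses mean-centred PCA via `np.linalg.svd`, keeps the coefficients beyond the first three components unchanged, and draws scale factors from a hypothetical uniform range of 0.8 to 1.2. The synthetic DVFs, the 64 samples per phase, and all helper names are illustrative; only the phase-to-split mapping follows the text.

```python
import numpy as np

def fit_pca_motion_model(dvfs):
    """PCA of P phase DVFs (P, ...). Returns mean (D,), components (K, D) and
    per-phase coefficients (P, K), so that mean + coeffs @ comps == DVFs."""
    x = dvfs.reshape(len(dvfs), -1)
    mean = x.mean(axis=0)
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt, u * s


def augment(mean, comps, phase_coeffs, n, rng, n_pc=3, scale_range=(0.8, 1.2)):
    """Regenerate n flattened DVFs by randomly scaling the first n_pc PCA
    coefficients of one phase (the remaining coefficients are kept as-is)."""
    c = np.tile(phase_coeffs, (n, 1))
    c[:, :n_pc] *= rng.uniform(*scale_range, size=(n, n_pc))
    return mean + c @ comps


# Split by the phase (%) of the unscaled coefficients, as in the text.
PHASE_SPLIT = {"train": [10, 20, 30, 40], "val": [60, 70], "test": [50, 80, 90]}

rng = np.random.default_rng(0)
phases = list(range(10, 100, 10))                     # DVFs relative to the 0% phase
amp = np.sin(np.pi * np.array(phases) / 100.0) ** 2   # toy breathing, peak at 50%
base = rng.standard_normal((8, 8, 8, 3))              # toy 3D motion pattern
dvfs = (amp[:, None, None, None, None] * base
        + 0.05 * rng.standard_normal((len(phases), 8, 8, 8, 3)))
mean, comps, coeffs = fit_pca_motion_model(dvfs)
splits = {name: np.concatenate([augment(mean, comps, coeffs[phases.index(p)], 64, rng)
                                for p in ph])
          for name, ph in PHASE_SPLIT.items()}
```

Splitting by the phase of the unscaled coefficients, rather than shuffling augmented samples, keeps scaled copies of the same phase from appearing in both training and testing sets.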
2.5 Evaluation schemes
The deformation accuracy of the liver surface meshes was evaluated using the Hausdorff distance (HD) between the deformed and the ‘ground-truth’ target liver surface meshes extracted from the augmented dataset [12]. To evaluate liver tumor tracking, we manually contoured the tumors on the prior CT images at the 0% phase. The 0%-phase tumor contours were then propagated to the other augmented motion states using the augmentation DVFs (Section 2.4) and served as the ‘ground truth’ for evaluating the contours deformed by our method. Liver tumor tracking accuracy was evaluated by the Dice similarity coefficient (DSC), center-of-mass error (COME), and HD.
3. RESULTS
3.1 Liver deformation accuracy
Figure 4 presents a qualitative comparison of liver surface meshes and projected nodes on X-ray projections at three projection angles (0, 45, and 90 degrees). The first row shows the surface mesh overlays between the prior and ‘ground-truth’ target meshes (left panel), and between the graph network-deformed and ‘ground-truth’ target meshes (right panel). The prior and target meshes correspond to the end-of-inhale phase and the end-of-exhale phase (with motion augmentation), respectively. The other rows show the pre-registration (left panel) and post-registration (right panel) surface mesh nodes overlaid onto the corresponding end-of-exhale X-ray projections at the three angles. Both the surface mesh overlays and the node-projection overlays demonstrate high registration accuracy. Table 1 summarizes the mean (±s.d.) liver HDs of the proposed method and the PCA-based 2D-3D registration method [7]. The DL-based method yields much smaller HDs than the PCA-based 2D-3D method.
Table 1. Mean (±s.d.) liver Hausdorff distances.
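The liver HDs in Table 1 compare deformed and target surface meshes (Section 2.5). A minimal NumPy sketch of the symmetric HD follows, assuming it is evaluated over mesh node coordinates in mm; the paper does not state whether nodes or densely sampled surface points were used, and the toy point sets are illustrative.

```python
import numpy as np

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between point sets a (N, 3) and b (M, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    # Worst-case nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


deformed = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])            # mm
target = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 2.0, 0.0]])
hd = hausdorff_distance(deformed, target)   # 2.0: node (10, 2, 0) is 2 mm from its nearest deformed node
```

The full pairwise matrix grows as N×M, so for liver meshes with many thousands of nodes, SciPy's `scipy.spatial.distance.directed_hausdorff` (applied in both directions) or a `scipy.spatial.cKDTree` nearest-neighbour query is a more memory-friendly choice.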
3.2 Liver tumor tracking accuracy
Table 2 summarizes the mean (±s.d.) liver tumor DSCs, COMEs, and HDs of the proposed method at the three projection angles, together with the results of PCA-based 2D-3D registration and diaphragm tracking for comparison. Because diaphragm tracking can only localize the diaphragm in 2D on a single X-ray projection, we used it to represent liver tumor motion only along the superior-inferior (SI) direction. The COME of the diaphragm-based method is therefore reported for the SI direction only; its 3D COME could be considerably larger. Table 2 shows that the proposed method achieves the best liver tumor localization accuracy, with consistent performance across projection angles.
Table 2. Mean (±s.d.) liver tumor DSC, COME, and HD. The COME for the diaphragm-based method is for the superior-inferior direction only (*).
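The tumor DSC and COME reported in Table 2 can be sketched for binary tumor masks on the 2 mm CT grid. In this minimal example, the mask layout, the use of array axis 2 as the SI axis, and the toy 10-voxel cube tumor are assumptions; the `axis` option mimics the SI-only COME used for the diaphragm-based method.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def center_of_mass_mm(mask, spacing_mm):
    """Centroid of a binary mask in mm (voxel index times voxel spacing)."""
    return np.argwhere(mask).mean(axis=0) * np.asarray(spacing_mm, float)


def come(a, b, spacing_mm=(2.0, 2.0, 2.0), axis=None):
    """Center-of-mass error in mm: 3D Euclidean by default, or the absolute
    error along one axis (e.g. the SI axis) when `axis` is given."""
    diff = center_of_mass_mm(a, spacing_mm) - center_of_mass_mm(b, spacing_mm)
    return float(np.linalg.norm(diff)) if axis is None else float(abs(diff[axis]))


truth = np.zeros((64, 64, 64), bool)
truth[20:30, 20:30, 20:30] = True        # 10x10x10-voxel toy tumor
pred = np.zeros_like(truth)
pred[20:30, 20:30, 22:32] = True         # same tumor shifted 2 voxels along axis 2
print(dice(truth, pred), come(truth, pred), come(truth, pred, axis=0))  # 0.8 4.0 0.0
```

A 4 mm shift of a 20 mm tumor already lowers the DSC to 0.8, which is why DSC and COME are reported together: DSC reflects overlap relative to tumor size, while COME gives the localization error in millimetres. Both functions assume non-empty masks.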
REFERENCES
[1] Verellen, D., et al., “Innovations in image-guided radiotherapy,” Nature Reviews Cancer, 7(12), 949–960 (2007). https://doi.org/10.1038/nrc2288
[2] Dhont, J., et al., “Image-guided Radiotherapy to Manage Respiratory Motion: Lung and Liver,” Clinical Oncology, 32(12), 792–804 (2020). https://doi.org/10.1016/j.clon.2020.09.008
[3] Shen, L.Y., W. Zhao, and L. Xing, “Patient-specific reconstruction of volumetric computed tomography images from a single projection view via deep learning,” Nature Biomedical Engineering, 3(11), 880–888 (2019). https://doi.org/10.1038/s41551-019-0466-4
[4] Lei, Y., et al., “Deep learning-based real-time volumetric imaging for lung stereotactic body radiation therapy: a proof of concept study,” Physics in Medicine and Biology, 65(23) (2020). https://doi.org/10.1088/1361-6560/abc303
[5] Wei, R., et al., “Real-time tumor localization with single x-ray projection at arbitrary gantry angles using a convolutional neural network (CNN),” Physics in Medicine and Biology, 65(6) (2020). https://doi.org/10.1088/1361-6560/ab66e4
[6] Zhang, Y., et al., “4D liver tumor localization using cone-beam projections and a biomechanical model,” Radiother Oncol, 133, 183–192 (2019). https://doi.org/10.1016/j.radonc.2018.10.040
[7] Zhang, Y., et al., “A technique for estimating 4D-CBCT using prior knowledge and limited-angle projections,” Medical Physics, 40(12) (2013). https://doi.org/10.1118/1.4825097
[8] Paszke, A., et al., “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” Advances in Neural Information Processing Systems 32 (NIPS 2019), 32 (2019).
[9] Wang, N.Y., et al., “Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images,” Computer Vision - ECCV 2018, Pt XI, 11215, 55–71 (2018). https://doi.org/10.1007/978-3-030-01252-6
[10] Cao, J., “Pixel2Mesh,” (2021). https://github.com/noahcao/Pixel2Mesh (accessed January 2022).
[11] Jiang, Z., et al., “Enhancing digital tomosynthesis (DTS) for lung radiotherapy guidance using patient-specific deep learning model,” Phys Med Biol, 66(3), 035009 (2021). https://doi.org/10.1088/1361-6560/abcde8
[12] Taha, A.A. and A. Hanbury, “Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool,” BMC Medical Imaging, 15 (2015).