Open Access
1 April 2001 Use of penalty terms in gradient-based iterative reconstruction schemes for optical tomography
Andreas H. Hielscher, Sebastian Bartel
Author Affiliations +

1.

Introduction

Optical tomography (OT) is a fast growing field in which near-infrared light is used to image the distribution of optical properties inside the human body. Optical properties of interest are, for example, the absorption coefficient μa, the reduced scattering coefficient μs , or the diffusion coefficient D=c/(3μa+3μs ), where c is the speed of light in the medium. The instrumentation for making light transmission measurements that are necessary for OT is nowadays widely available.1 2 3 4 5 6 Furthermore, several algorithms that transform these measurements into useful cross-sectional images have matured to such a degree that first clinical trials are underway, especially in breast imaging.7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 However, a major difficulty in OT remains that the image reconstruction problem is ill-posed or underdetermined. In other words, there are many distributions of optical properties inside the medium under investigation that lead to the same set of detector readings on the surface of the medium.

Depending on the underlying structure and concepts of any particular reconstruction scheme, the problem of ill-posedness can be approached in different ways. Most of the currently employed reconstruction schemes fall in one of two classes, which we refer to as the linear-perturbation approach and the nonlinear-gradient method (see also Ref. 30). In both cases the goal is to minimize the difference between intensities measured on the boundary of the medium and some prediction for those measured values. Therefore the image reconstruction problem may be interpreted as an optimization problem in which an objective function is minimized. If we use the χ2 error norm to calculate the difference between measurements and predictions we can define the objective function as

Eq. (1)

Φ(ζ)=χ2(ζ)sd(Ms,dPs,d(ζ))22σs,d2.
In this equation the parameter ζ is a vector that contains the optical properties at all positions in the medium. If the image is discretized into n pixels, and each pixel can vary in both absorption and scattering coefficient, the vector ζ is of length N=2n. Ps,d(ζ) is the predicted reading at detector location d, when light is injected at location s. The Ps,d(ζ) values are calculated with a forward model, e.g., by solving the diffusion equation for the given medium.27 Ms,d is the measured value at detector position d given a source at s. The parameter σs,d is a normalization constant.

In the linear perturbation approach, it is assumed that we have an estimate ζ 0 that is close to the true distribution ζ. In this case we can perform a Taylor expansion of Ps,d around ζ 0. Neglecting nonlinear terms, we obtain

Eq. (2)

Ps,d(ζ)=Ps,d(ζ0)+Ps,d(ζ)/ζ(ζζ0)Ps,d(ζ)=Ps,d(ζ0)+Js,dΔζ,
where J s,d=∂Ps,d(ζ)/∂ζ is a row in the Jacobian matrix J, which is often also referred to as the weight-function matrix. If Q is the number of source–detector pairs (s,d) and N the number of unknowns in the problem, then J is a Q×N matrix. The vector Δζ=ζζ 0 is the difference in optical properties between the estimated and actual medium given by ζ. Inserting Eq. (2) into Eq. (1) yields:

Eq. (3)

Φ(ζ)=sd(Ms,dPs,d(ζ0)Js,dΔζ)22σs,d2.
Therefore, in this case, minimizing the functional in Eq. (1) is equivalent to solving the equation

Eq. (4)

JΔζ=MPJJTΔζ=JT(MP),
where M and P are vectors that contain all Ms,d and Ps,d(ζ 0) values, respectively. The image reconstruction problem is to find Δζ and to determine the image given by (Δζ+ζ). The matrix J T is the transpose of J. Approaching the imaging problem this way, ill-posedness means that the quadratic matrix JJ T is ill conditioned. Therefore, the determinant of the matrix JJ T is almost zero and many different Δζs solve Eq. (4). A standard way of overcoming this problem is to make JJ T diagonally dominant.17 30 31 Obviously, this is most easily accomplished by adding a diagonal matrix and solved

Eq. (5)

(JJT+λI)Δζ=JT(MP),
where I is the identity matrix and λ is usually referred to as the regularization parameter or hyperparameter. The goal is now to find a λ that is large enough to avoid problems encountered with ill-conditioned matrixes, and at the same time small enough to not completely alter the basic relation defined in Eq. (1). For example, Jiang et al.;,18 Paulsen and Jiang,17 and Pogue et al.;32 have derived such diagonal matrices for optical tomography.

More recently several groups (Saquib et al.;,24 Hielscher et al.;,25 27 28 Arridge and Schweiger,26 Roy and Sevick,29 and Ye et al.;33) have developed so called gradient based iterative reconstruction (GIIR) schemes that do not solve Eqs. (4) or (5) to obtain an update of ζ. Instead, starting from an initial guess ζ0 subsequent distributions ζ k+1 are obtained by calculating

Eq. (6)

ζk+1=ζk+αAg,
where g=∂Ф(ζ)/∂ζ is the gradient of the objective function [Eq. (1)] in column vector form of length N, A is an N×N matrix, and α is a real number representing the step size in the direction of the gradient. For the case of steepest gradient descent A equals the identity matrix, with only ones on the diagonal. The major advantage of GIIR algorithms over other currently employed algorithms is that no inversion of a full, ill-conditioned Jacobian matrix is necessary to obtain an update Δζ of the optical properties in the medium. In GIIR schemes a Jacobian is calculated as part of the gradient calculation of the objective function. Once this gradient is found, a line minimization of the objective function along the direction of the gradient is performed to find the update for the optical properties. A more detailed description and discussion of GIIR algorithms can be found elsewhere.24 25 26 27 28 29 30

The question arises: what does ill-posedness mean using the GIIR approach? In the perturbation approach ill-posedness is identical to an ill-conditioned matrix JJ T. In the GIIR scheme, which interprets image reconstruction as a multidimensional minimization problem, ill-posedness means that a global minimum is not well defined. A global minimum is not well defined when either many minima with similar values exist or one minimum ζ min , which is surrounded by many other ζ, result in almost identical objective functions. Examples of such objective functions in optical tomography have been described, for example, by Arridge,30 Schweiger and Arridge34 and Hielscher et al.;35 In general, the reconstruction result will strongly depend on the initial guess ζ 0. These phenomena are well known in the field of general optimization theory. A typical way to overcome this problem is to add penalty terms that provide additional constraints on the solution space. The penalty term may push the solution into the right area of the solution space and the minimization of the resulting objective function may provide a better-defined minimum. While the use of penalty terms has been studied for a variety of problems,36 37 38 39 40 41 42 43 44 their use in GIIR schemes for optical tomography has not been explored.

The goal of this paper is to introduce and study the effects of various penalty functions on the reconstruction process in OT. In particular, we seek to derive penalty function from a priori knowledge about the systems under investigation. A priori knowledge is, in general, information that does not depend on the difference between predicted and measured data. For example, we know that optical properties are larger than zero. Furthermore in the near-infrared wavelength region μa<10 cm −1 and μs <100 cm −1. In addition, it is often known how many different tissue types, and therefore how many different optical properties, are present in the interrogated medium. For example, in the brain we find cerebrospinal fluid, gray, and white matter. If magnetic resonance imaging data are available, one may even know the location of these tissues. The questions arise: how can this additional a priori knowledge be appropriately cast into a penalty function and can using this information improve image quality?

To illustrate the effects of penalty functions derived from a priori knowledge about the system, we will focus in this work on information regarding the composition of a given tissue volume. We consider the following cases: First, it is assumed that one knows that n tissue types with n different optical properties are present in the medium. However one neither knows the location nor their respective volume fraction of any given component in the tissue sample. Second, we will assume to have prior knowledge of all tissue types present and the volume percentage they occupy. However, just as in the first case, one does not know where the different tissue types are located inside the medium. This latter point corresponds to knowing the histogram of the medium to be reconstructed. We will derive the appropriate penalty terms for both cases. Numerical examples that demonstrate the effect of these penalty terms on the quality of the reconstructed images are shown and discussed. Furthermore, the relationship between penalty functions in gradient-based schemes and algebraic regularization mechanisms used in linear-perturbation schemes is discussed and put into context with our results.

2.

Mathematical Background

2.1.

Gradient-Based Iterative Image Reconstruction

Our gradient-based iterative image reconstruction scheme has three major components. (1) Forward Model. This model is a theory or algorithm that predicts a set of measured signals U given the positions rs of the light sources and the spatial distribution of optical properties ζ. In this work we use as the governing equation for light propagation in tissue the time-dependent diffusion equation ∂U/∂t=∇(D∇U)−cμaU+S. 27 As optical parameters of interest we chose ζ=(cμa(r),D(r)). (2) Analysis Scheme. Here an objective function Ф is defined, which describes the difference between the measured M and predicted data P. An example is the least-square error norm, also called χ2 norm, described in Eq. (1). (3) Updating Scheme. Once the objective function is defined, the task becomes to minimize Ф. This is accomplished in two substeps. First the gradient of the objective function dФ(ζ)/dζ is calculated by means of adjoint differentiation. Second, given the gradient an iterative line minimization in the direction of the gradient is performed. This step is refereed to as inner iteration and consists of several forward calculations in which the parameters ζ are varied. Once the minimum along the line is found, a new gradient is calculated at this minimum (outer iteration) and another line minimization is performed, now along a different direction in the ζ space. These steps are repeated until a distribution ζ is found for which Ф(ζ) is smallest. A more detailed description of the GIIR algorithm used in this work can be found elsewhere.27

2.2.

Penalty Functions

The incorporation of additional knowledge about the image to be reconstructed can be achieved by including a penalty term into the definition of the objective function. Instead of only trying to minimize the difference between the measurements and predictions, as is done if the objective function equals the χ2 error norm defined in Eq. (1), one can define the objective function as

Eq. (7)

Φ(ζ)=χ2(ζ)+ωΠ(ζ),
where Π(ζ) denotes the penalty term and ω is the coupling parameter, also often referred to as the hyperparameter. The additional term ωΠ(ζ) is designed to provide a better-defined minimum in the solution space and push the gradient scheme toward more probable solutions. The latter is achieved by penalizing certain distributions ζ that are unlikely given our prior information of the system under investigation. Ideally, we expect Π(ζ) to constrain the N-dimensional minimum search to a small subspace of desirable solutions.

The penalty term should be continuously differentiable to be useful in GIIR schemes. Therefore the derivative of the penalty term with respect to the optical properties should not have any discontinuities. In general, smooth functions are much better suited for any gradient based minimum search. We will address these restrictions further when we derive the respective penalty functions.

2.2.1.

Tissue-Type Penalty Functions

In this work we seek to provide penalty functions that are rooted in a particular kind of prior knowledge, namely the different types of tissues that are most likely to be found in the sample under investigation. The optical properties of these tissues are known within certain error margins. Areas of the reconstructed image that show any other than the expected optical parameters should therefore be penalized the more they differ from the expected values. Under these considerations we define the following penalty function:

Eq. (8)

Πtt(ζ)=xS(1k=1Kexp((akζx)22σk2)).
Here K is the number of different tissue types k in the system S. The parameter ak is the most likely optical property of tissue k, and ζx is the reconstructed optical property at pixel x. The parameter 1/σk can be interpreted as the confidence that ak is the exact value. For 1/σk→∞ (⇔σk→0) the Gaussian becomes a delta function that is 1 only if ζx=ak and zero otherwise. Therefore, only if ζx=ak does the penalty term disappear. In this case Eq. (8) is not continuously differentiable. For any nonzero value of σk there will be a range of ζx for which the penalty term changes smoothly from 1 to 0 and Eq. (8) becomes continuously differentiable. An example is shown in Figure 1, where Π tt is plotted for the case that three different tissue types with optical properties D1, D2, and D3 are present and σk=0.1 cm 2ns −1 for all three D values. The derivative of this penalty function with respect to a pixel value ζx is easily calculated as

Figure 1

Example of a tissue-type penalty function Π tt [Eq. (8)] that assumes three different types of tissues (k=3), with D1=0.52 cm 2ns −1, D2=0.92 cm 2ns −1, and D3=1.44 cm 2ns −1.

016102j.1.jpg

Eq. (9)

Πtt(ζ)ζx=k=1Kakζxσk2exp((akζx)22σk2).

2.2.2.

Histogram Penalty Function

If not only the various types of tissue in the sample are known but additionally their respective volume fraction, we are able to provide a most likely histogram H0 to the reconstruction scheme. In general, a histogram maps the N pixel values of an image represented by ζ=(ζ12,…,ζx,…,ζN) onto L discrete intervals or bins of width Δζll−ζl−1(l=1,2,…,L). Therefore, the histogram H(ζl,ζ) associated with an image S or given set ζ of optical properties is defined by the following sum over all pixels x of the image

Eq. (10)

H(ζl,ζ)xSδ(ζl,ζx),  l=1,,L,
where

Eq. (11)

δ(ζl,ζx){1for ζl1<ζxζl0otherwise.
The sum in Eq. (10) sorts all pixel values ζx into L bins of the histogram, since the δ-function only contributes to the sum H(ζl,ζ) if a pixel lies within the corresponding interval.

We can now define a penalty term Π Hist , which evaluates the histogram H(ζl,ζ) of a reconstructed image relative to the expected histogram H0l,ζ). Here we choose the χ2 error norm between both functions and define the penalty term

Eq. (12)

ΠhistχH0,H2=l=1L (H0(ζl)H(ζl))2H0(ζl),
where we use the short notation H(ζl) for H(ζl,ζ).

However, if the definition of the delta function in Eq. (11) is considered, we observe that Eq. (12) is not continuously differentiable, which makes it unsuitable for gradient based schemes. To overcome this problem we consider a representation of the delta function that uses a Gaussian

Eq. (13)

δ(ζl,ζx)=limσ01/(2πσ2)exp((ζlζx)22σ2).
With this we can calculate the derivative and obtain

Eq. (14)

Πhistζx=l=1L21/(2πσ2)(H0(ζl)H(ζl))H0(ζl)(ζlζx)σ2exp((ζlζx)22σ2),
where we omitted the lim. By choosing an appropriate nonzero value for σ and calculating the image histogram using Eq. (10) and the penalty term Eq. (12), we obtain a penalty term that is continuously differentiable and can be applied within the GIIR framework. Typically, we chose σ, so that the singular peaks overlapped (see Figure 2) to avoid forbidden intervals in the target histogram H0.

Figure 2

Ideal histogram Hδ for tissue that contains 15 tissue type 1 (D1=0.520 cm 2ns −1), 75 tissue type 2 (D2=0.92 cm 2ns −1), and 10 tissue type 3 (D3=1.44 cm 2ns −1) and smoothed histogram H. The smoothed histogram results from convolution of Hδ with a Gaussian function of width σ=0.15 [Eq. (10) and Eq. (13)].

016102j.2.jpg

Using Eq. (13) with a nonzero σ to generate H(ζl) results in a smoothing operation on the histogram and is equivalent to convolving the exact histogram with a Gaussian of width σ. It may seem that the histogram thus obtained is blurred to an extent that spoils the goal of reconstructing only certain types of tissue. However, since we perform the same convolution on both, the histogram H of the reconstructed image and the target histogram H0 before they are compared [Eq. (12)] becomes minimal, if and only if the unconvolved functions are identical. Hence, let (Hδ 0,Hδ) be accurate histograms [Eq. (10)] and (H0,H) be their convolution with Eq. (13) (with nonzero σ), then

Eq. (15)

Hδ0=Hδ  H0=H.
Consequently, by minimizing the χ2 norm between two blurred histograms, we are at the same time minimizing the difference between the underlying sharp histograms. However, using the blurred histogram provides us with a continuously differentiable penalty term that is better suited for use in gradient based minimization schemes.

2.2.3.

The Hyperparameter

Each of the penalty functions has to be coupled to the error norm χ2 [Eq. (1)] with a hyperparameter. This parameter fixes the relative strength of the penalty term in the minimization scheme and describes the confidence one has that the additional information is correct. A good starting point is to choose ω in a way which ensures that the gradients of both the χ2 term and the penalty function are of similar magnitude. If we interpret the χ2 term and the ωΠ term as potentials, the derivatives ∂χ2/∂ζ and ω∂Π/∂ζ can be interpreted as two forces that pull the pixel values in certain directions. By defining a new hyperparameter

Eq. (16)

  γ=|ωΠζ|/|χ2ζ|,
we can adjust the relative strength of these forces, originating from the χ2 potential and the Π potential, respectively. If γ=1 then ω equals the ratio of the gradients and both forces are equally strong. If γ>1 the force due to penalty function ωΠ is stronger than due to the χ2 term, and if γ<1 the χ2 term has a stronger influence. We will later see how the choice of γ influences the reconstruction results.

3.

Results

3.1.

Problem Setup

To test how assumptions about the medium can improve the image reconstruction we consider the following example. Given is an 8×8 cm medium that contains two objects (see Figure 3). The speed of light in this medium is considered to be constant c=22 cm ns −1. The optical properties of the background medium are given by μa=0.1 cm −1, μS =8 cm −1 (⇒D=c/(3μa+3μs )=0.905 cm 2ns −1). The optical properties of the inclusions are given by μa=0.1 cm −1, μS =14.0 cm −1 (⇒D=0.520 cm 2ns −1), and μa=0.1 cm −1, μS =5.0 cm −1 (⇒D=1.438 cm 2ns −1), respectively. The medium is discretized into a 40×40 mesh with Δx=0.2 cm. Four sources, one centered at each side, surround the medium. For each source 20 detector readings are available, four on each side and one on each corner. These detector readings were simulated by using the time-dependent finite-difference forward model and adding Gaussian noise with a signal to noise ratio of 30 db to the result. The simulated detector readings each consists of 100 time-dependent fluence rates, with Δt=0.01 ns. Therefore we have 4×20×100=8000 data points. From these points the χ2 term is calculated as

Eq. (17)

χ2=sdt(Ms,dtPs,dt(D,μa)Ps,dt(D,μa))2,
where D and μa are vectors of length 40×40=1600, which contain the diffusion and absorption coefficients throughout the medium.

Figure 3

(a) Composition of two-dimensional example problem. An 8×8 cm domain is divided into a 40×40 grid. Source positions are indicated with white circles and detectors with black circles. (b) Reconstruction without penalty functions after 20 iterations. The initial guess for the first iteration is a homogeneous medium with D=1 cm −2ns −1.

016102j.3.jpg

To quantify the quality of the image reconstruction, we furthermore define the relative image error as

Eq. (18)

IE1N nN(DntargetDnwp)2(DntargetDnwop)2
where N is the number of pixels in the image, Dn target is the correct diffusion coefficient at pixel position n, Dn wop is the diffusion coefficient at pixel position n reconstructed without using a penalty term, and Dn wp is the diffusion coefficient at pixel position n reconstructed using a penalty term. Defining the image error this way allows us to quantify the improvement in image quality when using a penalty function as compared to not using a penalty function. A value of IE=1 indicates that the penalty term yields neither an improvement nor degradation in image quality.

3.2.

Reconstruction Without Penalty Terms

We first use the GIIR algorithm to minimize the χ2 term, without any additional penalty terms. As the initial guess we chose D=1 cm 2ns −1 and μa=0.1 cm −1 for all points in the medium. Therefore, we only reconstruct the diffusion coefficient D in this case, since the initial guess for μa equals the original value. The result of the reconstruction after 20 iterations is shown in Figure 3(b). The general features of the medium are recovered. The diffusion coefficient D in the larger inclusion is increased while D in the smaller inclusion is decreased. The minimal and maximal values of D in the heterogeneities are D=0.50 cm 2ns −1 and D=1.31 cm 2ns −1, respectively. These values differ by approximately 10 from the original value. The sharp edges are not recovered and the inclusions appear blurred, as is typical for optical tomography. We found that as long as the initial guesses are within 20 of the average D value of the image, the reconstruction results are very similar. When initial D guesses are chosen that are outside the range of D values present in the image strong artifacts start to appear.

3.3.

Reconstruction with Tissue-Type Penalty Term

Next we consider how these results can be improved when the information is used that only three different tissues types with given optical properties are present in the medium [Eq. (8)]. However, no knowledge is assumed about where these three different regions are located, or how much of the image volume is occupied by a particular tissue type. Figures 4(a)–4(c) show reconstructions obtained with the penalty term Π tt for three different parameters γ. In this example we chose D1=0.520 cm 2ns −1, D2=0.92 cm 2ns −1, and D3=1.44 cm 2ns −1. Furthermore, we set the width of the Gaussians to σk=0.1 cm 2ns −1 [Eq. (8)]. The initial guess is a homogeneous medium with D=1 cm 2ns −1 and μa=0.1 cm −1. Reconstructions were terminated after 20 iterations. It can be seen that the penalty term has an effect on the image for γ=0.02 [Figure 4(a)]. Increasing γ to 0.08 does produce better results [Figure 4(b)], however, a further increase to γ=0.1 yields reconstructions with almost constant D values across the entire image [Figure 4(c)]. All pixel values are close to the background value D=0.92 cm 2ns −1. Figure 5 shows the dependence of the image error IE on the parameter γ. It can be seen that only for a small region of 0.01<γ<0.1 does the image quality improve (IE<1). For all other values γ>0.1 the image error is larger than without the penalty term (IE>1).

Figure 4

Image reconstruction with tissue-type penalty terms [Eq. (8)] after 20 iterations. The initial guess is a homogeneous medium with D=1 cm −1: (a) γ=0.02, (b) γ=0.08, (c) γ=0.1, and (d) γ=0.005 n2, where n equals the number of iterations.

016102j.4.jpg

Figure 5

Image error IE as function of the hyperparameter γ for the tissue type prior Π tt (open circles) and histogram prior Π hist (filled circles). Both curves are normalized to 1 for the result of the reconstruction without penalty term (γ=0).

016102j.5.jpg

Rather than keeping γ fixed for all iterations, one can also change this parameter dynamically. Figure 4(d) shows the result of increasing γ with the number of iterations n, so that γ=0.005 n2. After 25 iterations both the location and D values of the inclusions are correctly recovered. Deviations between the reconstruction and the original image are on the order of single pixels, and the image error is relatively small (IE=0.18).

3.4.

Reconstruction with Histogram Prior

Next, we consider the case where we have knowledge about the different tissue types in the medium and in addition know their respective volume fractions. Therefore, we assume to have knowledge about the histogram of the cross-sectional image of the tissue sample. Figures 6(a)–6(d) show reconstructed images using the histogram prior with increasing hyperparameter γ. As in previous cases, the initial guess is a homogeneous medium with D=1 cm 2ns −1 and μa=0.1 cm −1. Choosing γ<0.1 does not result in a significant improvement of the reconstruction and yields images similar to the unbiased case γ=0 [Figure 3(b)]. For γ⩾0.2 the two objects become more localized and show plateaus of constant D values. For γ=0.5 we observe increasingly sharp boundaries separating the heterogeneities from the background medium. Although their shape does not exactly match the original objects they reflect the correct size or rather volume fraction of these objects, as enforced by the histogram prior. Furthermore, the reconstructed D values remain constant across most of the areas covered by the cubes, rather than showing a pronounced peak, as observed without the penalty term. This allows for a much better extraction of the absolute optical parameters from the reconstruction. As the strength of the prior is increased further, we observe only a weak dependence of the image quality on γ [Figures 5, 6(c) and 6(d)]. For large values of γ, the reconstruction tends to exaggerate sharp edges and produces artifacts, while sacrificing the correct shape of the objects. As expected, the diffusion coefficient eventually assumes one of the three values imposed by the histogram. In Figure 6(d), the prior information is weighted 106 stronger than the χ2 term, so that fitting the image’s histogram to the prediction becomes the main objective. Nevertheless we still obtain reasonably good agreement with the original image.

Figure 6

Image reconstruction with histogram penalty terms [Eq. (12)] after 20 iterations. The initial guess is a homogeneous medium with D=1 cm −1: (a) γ=0.2, (b) γ=0.5, (c) γ=1.0, and (d) γ=106.

016102j.6.jpg

4.

Discussion

In general we found that the effectiveness of a penalty term depends as much on the right choice of the penalty term as on the right choice of the hyperparameter, which adjusts the influence of the penalty term during the reconstruction process. The results for the tissue-type penalty terms Π tt show that an improvement in the reconstruction result can only be obtained for a relatively small range of hyperparameters (0.01<γ<0.1). For values of γ<0.01, the additional penalty term has no effect, while for γ>0.1 the quality of the reconstruction is actually worse than without penalty function. Invoking the earlier discussed interpretation of the χ2 term and penalty term as potentials and there derivatives as forces, we see that if γ is chosen too strong the tissue-type penalty term traps all pixel values in the minimum closest to the initial guess (D=1 cm 2ns −1, Figure 1). Forces due to the χ2 term are too weak to lift a pixel out of the valley centered around D=0.92 cm 2ns −1.

Choosing the hyperparameter dynamically can improve the convergence of the GIIR scheme with Π tt -type penalty terms, as shown in Figure 4(d). Giving only little weight to the penalty term in the early stages of the reconstruction process allows the algorithm to find areas with increased or decreased values of D based on information from the χ2 term. Increasing the weight of the penalty term later in the reconstruction process leads to a sorting of each pixel in one of the three “potential” wells of the penalty term. However, even the use of a dynamically adjusted penalty term depends strongly on the appropriate choice of the initial value of γ and the rate at which it is increased. One is forced to find these parameters empirically for each particular reconstruction problem. The same drawback was previously encountered for total variation minimization penalty terms (see Ref. 17).

The histogram penalty terms Π hist provides good reconstruction results for all γ>0.01. Surprisingly even the very large γ=106 provides a good reconstruction. The information contained in the histogram prevents the scheme from converging toward singular solutions, such as trapping all pixels at the same value of D. If more and more pixels assume only one of the expected values, the force acting on them diminishes and eventually becomes repulsive. Pixel values are pushed by penalty-term forces to alternative values to fit the overall histogram distribution. When the histogram of the reconstructed image approaches the “true” histogram the force acting on individual pixels diminishes and eventually becomes zero. At the global minimum of Π hist we may change single pixel values without increasing the penalty, by raising and lowering several pixels simultaneously. This is different in the scheme that uses the tissue-type penalty term Π tt . To interchange two pixel values that are already located at the bottom of two different potential valleys (Figure 1), one always needs a force that pushes each pixel value over the potential hill in-between.

The weak dependence of the reconstruction on γ and the fact that sensible results are still obtained as the penalty term becomes dominant, are important features of the histogram regularization. This makes it a very stable method to incorporate additional information into the reconstruction process. However, the full histogram information may not always be available, and only the tissue-type penalty term, which contain less information, may have to be used.

An additional effect of applying the histogram penalty function is a faster conversion of the reconstruction algorithm. Typically only 20 iterations are necessary to produce qualitatively correct images, compared to approximately 50 iterations without this prior information. As we increase γ to values ≫1, as little as 10 iterations suffice to produce the sharply contrasted results shown in Figure 6(d).

Given that the target histogram is known, one could be tempted to start with an image that has the right histogram. Therefore, rather than using a homogenous medium as the initial guess, one could chose a heterogeneous image as the initial guess, which has the correct volume fraction of certain tissue types. In Figures 7(a) and 7(b), a scrambled version of the exact image was used as input, so that during the first iteration Π hist =0 is already minimal. The hyperparameter γ was set to 0 [Figure 7(a)] and 0.5 [Figure 7(b)], respectively. While for both cases the two objects are somewhat recovered, many of the initial single pixels at wrong positions remain, even when the histogram prior is switched on [Figure 7(b)]. Overall the image quality is much poorer than starting from a homogenous initial guess. In general, we found that starting with a homogeneous guess provides better images than starting with a wrong heterogeneous initial guess.

Figure 7

Image reconstruction starting from nonhomogeneous initial guess. Histogram of initial guess equals that of the target medium, however pixels with different D are randomly placed: (a) reconstruction without penalty term; (b) result with histogram penalty term (γ=0.5).

016102j.7.jpg

While the presented examples only show one type of medium, similar results have been obtained for a variety of differently structured media. The main results, namely that additional a priori information can be expressed in appropriate penalty terms, and that more information leads to better reconstructions and less sensitivity to a “correct” hyperparameter γ, is of general validity. Another example is shown in Figure 8 where instead of two compact elliptical inhomogeneities two curved lines with different diffusion coefficient cross a 4×4 cm medium. The medium is discretized into a 40×40 mesh with Δx=0.1 cm. The optical properties of the background are given by μa=0.1 cm −1, μS =8 cm −1, (⇒D=0.905 cm 2ns −1), while the optical properties of the inclusions are given by μa=0.2 cm −1, μS =13.0 cm −1 (⇒D=0.556 cm 2ns −1), and μa=0.05 cm −1, μS =4.5 cm −1 (⇒D=1.61 cm 2ns −1), respectively. The initial guess for all reconstructions is D=1.0 cm 2ns −1. Figure 8(a) shows the original image and Figures 8(b), 8(c), and 8(d) provide the reconstruction results after 40 iterations without a penalty term, with tissue-type penalty term, and histogram penalty term, respectively. Again we can see how penalty terms improve the image quality and that the use of more information leads to better results. The tissue-type penalty term only leads to improved images for γ<1.0, while the histogram penalty term can be applied with a much wider range of γ values. What type of penalty functions and choices of hyperparameters will be most suitable for various other applications such as optical tomographic imaging of the brain, breast, limb, joint, and other body parts remains to be determined for each case. Here we have limited ourselves to the derivation of the appropriate framework for such studies.

Figure 8

Example of reconstruction of two curved lines after 40 iterations. The initial guess is a homogeneous medium with D=1 cm 2ns −1: (a) original medium, (b) reconstruction without penalty term (γ=0), (c) reconstruction with tissue-type penalty term [Eq. (8), γ=0.8)], and (d) reconstruction with histogram penalty term [Eq. (12), γ=0.8].

016102j.8.jpg

Finally, it is interesting to compare the penalty term approach of gradient based schemes, with regularization schemes employed in the linear-perturbation approach (see Sec. I). It can be shown that the minimization of the functional in Eq. (7), which contains the χ2 term as well as a penalty term ω⋅Π, is equivalent to solving a matrix equation for Δζ of the form 17 30 32

Eq. (19)

(JJT+λR)Δζ=JT(MP(ζ))ωΠ(ζ),
where Π is the derivative of the penalty term with respect to the distribution of optical properties ζ(r). The matrix R is given by

Eq. (20)

R(2Π(ζ)ζ1ζ12Π(ζ)ζ1ζj2Π(ζ)ζ1ζN2Π(ζ)ζiζ12Π(ζ)ζiζj2Π(ζ)ζiζN2Π(ζ)ζNζ12Π(ζ)ζNζj2Π(ζ)ζNζN).

For the tissue-type penalty term we obtain

Eq. (21)

(Rtt)ij=2Πtt(ζ)ζiζj=k=1K{1/σk2(akζi)2/σk4}×exp((akζi)22σk2)for i=j=0forij.
Therefore, (R tt )ij has only entries on the diagonal, which means that no local coupling between pixels occurs. Adding this matrix to JJ T always will make Eq. (19) diagonally dominant, if we choose the parameter λ sufficiently large.

For the histogram penalty function we obtain:

(Rhist)ij=2Πhist(ζ)ζiζj={Ai,i+Bi,ifor i=jAi,jfor ij
with
Ai,j=l=1l2β2H0(ζl)(ζlζi)(ζlζj)σ4×exp((ζlζi)2+(ζlζj)22σ2)
Bi,i=l=1L2β(H0(ζl)H(ζl))H0(ζl)(1σ2(ζlζi)2σ4)×exp((ζlζi)22σ2)
where β=1/(2πσ2).

Therefore, (R hist )ij has nondiagonal entries, which means that coupling between pixels occurs. However, calculating R hist for the examples used in this work, we find that the values of diagonal elements are in general 3 orders of magnitude higher than the values of nondiagonal elements. Therefore, adding either the tissue-type or histogram penalty function to the χ2 term in a GIIR scheme, can also be interpreted as adding diagonally dominant matrices to the ill-conditioned matrix JJ T obtained in the linear perturbation approach to the image reconstruction problem.

The advantage of the penalty-term approach in GIIR schemes over regularization schemes employed in linear perturbation methods seems to be that penalty terms have an immediate physical interpretation. Even though useful regularizers can obviously be derived without the concept of penalty terms as shown for example by Pogue et al.;, 32 the concept of penalty terms may provide an intuitive conduit through which we can incorporate additional information into the reconstruction process. Since penalty terms can be considered as potentials that bias the χ2 potentials according to our prior knowledge about the system they may supply more insight into the underlying physical assumptions about the system to be reconstructed. Furthermore, matrix diagonality as demanded in the linear-perturbation approach to regularization, is clearly not the only factor to get reasonable reconstruction results. As shown in this work, the choice of a good hyperparameter is very critical, and much more work needs to be done to identify robust schemes for finding appropriate hyperparameters.

5.

Summary

In this study we addressed the problem of ill-posedness of the image reconstruction problem in optical tomography. Ill-posedness is caused by the fact that different spatial distributions of optical properties inside the medium can lead to similar detector readings on the surface of the medium. The question arises as to how can one nevertheless find the correct distribution of optical properties inside the medium from measurements on the surface of the medium. In this work we approach the problem within a gradient-based iterative image reconstruction (GIIR) scheme. Using this scheme the image reconstruction problem is interpreted as an optimization problem in which an appropriately defined scalar objective function is minimized using the gradient of the objective function with respect to the distribution of optical properties. In this context ill-posedness is reflected by the absence of a unique global minimum. To overcome this problem we added penalty terms that are derived from a priori information about the system to the commonly used least-square-error term, which compares numerically predicted and actual detector readings.

Specifically we derived and tested so-called tissue-type-penalty terms and histogram-penalty terms. The tissue-type penalty term is derived from the knowledge that a fixed number of different tissues are present in the medium. The optical properties of these tissues are known within certain error margins. If a pixel in the reconstructed image takes on a value that is different from the expected value, a penalty is added to the objective function. The larger the difference the larger is the penalty. The histogram penalty term is derived from the knowledge of the exact histogram of the correct image. Therefore, in addition to the knowledge about the number of different tissue types, one also knows their respective volume fractions in the image.

Performing numerical studies we found that both types of penalty terms can improve the image quality as well as the convergence rate of the iterative image reconstruction process. However, a crucial variable is the coupling or hyperparameter that adjusts the relative strength of the penalty term with respect to the least-square-error term. The tissue-type penalty term only provides image improvement for a narrow range of hyperparameters. If the hyperparameter is not chosen correctly the addition of this penalty term can have a detrimental effect on the image quality. On the other hand, the histogram-penalty term is much less sensitive to the right choice of the hyperparameter and in almost all cases leads to improved reconstruction results.

Finally, we compared the penalty terms derived for the gradient-based iterative image reconstruction scheme, with regularization schemes employed in the linear-perturbation approach. We showed that both tissue-type and histogram-penalty terms lead to diagonally dominate matrixes in the linear-perturbation approach.

Acknowledgments

The authors would like to thank Alexander Klose for many useful discussions concerning iterative image reconstruction algorithms. This work was supported in part by the National Institute of Arthritis and Musculoskeletal and Skin Diseases, a part of the National Institute of Health (Grant No. R01 AR46255-01), the Whitaker Foundation (Grant No. 98-0244), by the City of New York Council Speaker’s Fund for Biomedical Research: Towards the Science of Patient Care, and the Dean’s Office of the College of Medicine at the State University of New York Health Science Center Brooklyn (SUNY HSCB).

REFERENCES

1. 

K. Wells , J. C. Hebden , F. E. W. Schmidt , and D. T. Delpy , “The UCL multichannel time-resolved system for optical tomography,” 599 –607 (1997). Google Scholar

2. 

H. Jess, H. Erdl, K. T. Moesta, S. Fantini, M. A. Franceschini, E. Gratton, and M. Kaschke, “Intensity modulated breast imaging: Technology and clinical pilot study results,” in Advances in Optical Imaging and Photon Migration, OSA Trends in Optics and Photonics, Vol. II, R. R. Alfano and J. G. Fujimoto, Eds., pp. 126–129, Optical Society of America, Washington, DC (1996).

3. 

J. P. Vanhouten , D. A. Benaron , S. Spilman , and D. K. Stevenson , “Imaging brain injury using time-resolved near-infrared light scanning,” Pediatr. Res. , 39 470 –476 (1996). Google Scholar

4. 

M. Miwa and Y. Ueda , “Development of time-resolved spectroscopy system for quantitative noninvasive tissue measurement,” 142 –149 (1995). Google Scholar

5. 

S. Fantini , M. A. Franceschini , J. S. Maier , S. A. Walker , B. Barbieri , and E. Gratton , “Frequency domain multichannel optical detector for noninvasive tissue spectroscopy and oximetry,” Opt. Eng. , 34 32 –42 (1995). Google Scholar

6. 

C. H. Schmitz , H. L. Graber , H. Luo , I. Arif , J. Hira , Y. Pei , A. Bluestone , S. Zhong , R. Andronica , I. Soller , N. Ramirez , S-L. S. Barbour , and R. L. Barbour , “Instrumentation and calibration protocol for imaging dynamic features in dense-scattering media by optical tomography,” Appl. Opt. , 39 (34), 6466 –6487 (2000). Google Scholar

7. 

M. A. Franceschini , K. T. Moesta , S. Fantini , G. Gaida , E. Gratton , H. Jess , W. W. Mantulin , M. Seeber , P. M. Schlag , and M. Kaschke , “Frequency-domain techniques enhance optical mammography: initial clinical results,” Proc. Natl. Acad. Sci. U.S.A. , 94 6468 –6647 (1997). Google Scholar

8. 

S. R. Arridge , “The Forward and inverse problems in time-resolved infrared imaging,” 35 –64 (1993). Google Scholar

9. 

S. R. Arridge , “Photon-measurement density functions. Part I: Analytical forms,” Appl. Opt. , 24 7395 –7409 (1995). Google Scholar

10. 

S. R. Arridge and M. Schweiger , “Photon measurement density functions. Part 2: Finite-element-method calculation,” Appl. Opt. , 34 8026 –8037 (1995). Google Scholar

11. 

M. Schweiger and S. R. Arridge, “A system for solving the forward and inverse problems in optical spectroscopy and imaging,” in Advances in Optical Imaging and Photon Migrations, OSA Trends in Optics and Photonics Series, R. R. Alfano and J. G. Fujimoto, Eds., Vol. 2, pp. 263–268, Optical Society of America, Washington, DC (1996).

12. 

R. L. Barbour , H. L. Graber , Y. Wang , J. H. Chang , and R. Aronson , “A perturbation approach for optical diffusion tomography using continuous-wave and time-resolved data,” 87 –120 (1993). Google Scholar

13. 

H. L. Graber , J. Chang , R. Aronson , and R. L. Barbour , “A perturbation model for imaging in dense scattering media: Derivation and Evaluation of Imaging Operators,” 121 –143 (1993). Google Scholar

14. 

R. L. Barbour , H. L. Graber , J. W. Chang , S. L. S. Barbour , P. C. Koo , and R. Aronson , “MRI-guided optical tomography: Prospects and computation for a new imaging method,” IEEE Comput. Sci. Eng. , 2 63 –77 (1995). Google Scholar

15. 

Y. Q. Yao , Y. Wang , Y. L. Pei , W. W. Zhu , and R. L. Barbour , “Frequency-domain optical imaging of absorption and scattering distributions by Born iterative method,” J. Opt. Soc. Am. A , 14 325 –342 (1997). Google Scholar

16. 

K. D. Paulsen and H. Jiang , “Spatially varying optical property reconstruction using a finite element diffusion equation approximation,” Med. Phys. , 22 691 –701 (1995). Google Scholar

17. 

K. D. Paulsen and H. Jiang , “Enhanced frequency domain optical image reconstruction in tissues through total variation minimization,” Appl. Opt. , 35 3447 –3458 (1996). Google Scholar

18. 

H. Jiang , K. D. Paulsen , and U. L. O¨sterberg , “Optical image reconstruction using DC data: simulations and experiments,” Phys. Med. Biol. , 41 1483 –1498 (1996). Google Scholar

19. 

H. Jiang , K. D. Paulsen , U. L. Osterberg , B. W. Pogue , and M. S. Patterson , “Simultaneous reconstruction of optical absorption and scattering maps in turbid media from near-infrared frequency-domain data,” Opt. Lett. , 20 2128 –2130 (1995). Google Scholar

20. 

H. B. Jiang , K. D. Paulsen , U. L. Osterberg , B. W. Pogue , and M. S. Patterson , “Optical image reconstruction using frequency domain data: Simulations and experiments,” J. Opt. Soc. Am. A , 13 253 –266 (1996). Google Scholar

21. 

J. C. Schottland , J. C. Haselgrove , and J. S. Leigh , “Photon Hitting Density,” Appl. Opt. , 32 448 –453 (1993). Google Scholar

22. 

M. A. O’Leary , D. A. Boas , B. Chance , and A. G. Yodh , “Experimental images of heterogeneous turbid media by frequency-domain diffusion-photon tomography,” Opt. Lett. , 20 426 –428 (1995). Google Scholar

23. 

D. Y. Paithankar , A. U. Chen , B. W. Pogue , M. S. Patterson , and E. M. Sevick-Muraca , “Imaging of fluorescent yield and lifetime from multiply scattered light reemitted from random media,” Appl. Opt. , 36 2260 –2272 (1997). Google Scholar

24. 

S. S. Saquib , K. M. Hanson , and G. S. Cunningham , “Model-based image reconstruction from time-resolved diffusion data,” 369 –380 (1997). Google Scholar

25. 

A. H. Hielscher, A. Klose, D. M. Catarious Jr., and K. M. Hanson, “Tomographic imaging of biological tissue by time-resolved, model-based, iterative, image reconstruction,” OSA Trends in Optics and Photonics: Advances in Optical Imaging and Photon Migration II, Vol. 21, R. R. Alfano and J. G. Fujimoto, Eds., Vol. 21, pp. 151–161, Optical Society of America, Washington, DC (1998).

26. 

S. R. Arridge and M. Schweiger , “A gradient-based optimization scheme for optical tomography,” Opt. Express , 2 (6), 213 –226 (1998). Google Scholar

27. 

A. H. Hielscher , A. D. Klose , and K. M. Hanson , “Gradient-based iterative image reconstruction scheme for time-resolved optical tomography,” IEEE Trans. Med. Imaging , 18 262 –271 (1999). Google Scholar

28. 

A. D. Klose and A. H. Hielscher , “Iterative reconstruction scheme for optical tomography based on the equation of radiative transfer,” Med. Phys. , 26 (8), 1698 –1707 (1999). Google Scholar

29. 

R. Roy and E. M. Sevick-Muraca , “Truncated Newton’s optimization scheme for absorption and fluorescence optical tomography: Part I Theory and formulation,” Opt. Express , 4 (10), 353 –371 (1999). Google Scholar

30. 

S. R. Arridge , “Optical tomography in medical imaging,” Inverse Probl. , 15 R41 –R93 (1999). Google Scholar

31. 

S. R. Arridge, “Forward and inverse problems in time-resolved infrared imaging,” in Medical Optical Tomography, G. Mu¨ller Ed., Vol. IS11, pp. 53–64, SPIE Optical Engineering, Bellingham, WA (1993).

32. 

B. W. Pogue , T. O. McBride , J. Prewitt , U. L. Osterberg , and K. D. Paulsen , “Spatially variant regularization improves diffuse optical tomography,” Appl. Opt. , 38 2950 –2961 (1999). Google Scholar

33. 

J. Chul Ye , K. J. Webb , C. A. Bouman , and R. P. Millane , “Optical diffusion tomography by iterative-coordinate-descent optimization in a Bayesian framework,” J. Opt. Soc. Am. A , 16 2400 –2412 (1999). Google Scholar

34. 

M. Schweiger and S. R. Arridge , “Application of temporal filters to time resolved data in optical tomography,” Phys. Med. Biol. , 44 1699 –1717 (1999). Google Scholar

35. 

A. H. Hielscher , A. D. Klose , and J. Beuthan , “Evolution strategies for optical tomographic characterization of homogeneous media,” Opt. Express , 7 (13), 507 –518 (2000). Google Scholar

36. 

D. A. Pierre, Optimization Theory with Applications, Dover, New York, NY (1986).

37. 

D. H. Hristov and B. Gino Fallone , “A continuous penalty function method for inverse treatment planning,” Med. Phys. , 25 (2), 208 –224 (1998). Google Scholar

38. 

M. Ben-Daya and K. S. Al-Sultan , “A new penalty function algorithm for convex quadratic programming,” Eur. J. Operational Res. , 101 (1), 155 –164 (1997). Google Scholar

39. 

D. J. White , “The maximization of a function over the efficient set via a penalty function approach,” Eur. J. Operational Res. , 94 (1), 143 –154 (1996). Google Scholar

40. 

H. Xu , “Stochastic Penalty Function Methods for Nonsmooth Constrained Minimization,” J. Optim. Theory Appl. , 88 (3), 709 –725 (1996). Google Scholar

41. 

D. Vucina , “Flow formulation FE metal-forming analysis with boundary friction via a penalty function,” J. Mater. Process. Technol. , 59 (3), 272 –277 (1996). Google Scholar

42. 

B. C. Fabien , “An extended penalty function approach to the numerical solution of constrained optimal control problems,” Opt. Control Appl. Methods , 17 (5), 341 –355 (1996). Google Scholar

43. 

J. A. Snyman , N. Stander , and W. J. Roux , “A dynamic penalty function method for the solution of structural optimization problems,” Appl. Math. Model. , 18 (8), 453 –463 (1994). Google Scholar

44. 

K. Toma , “Protein three-dimensional structure generation with an empirical hydrophobic penalty function,” J. Mol. Graphics , 11 (4), 222 –232 (1993). Google Scholar

Notes

Andreas H. Hielscher is also at Department of Electrical and Computer Engineering, Polytechnic University, 5 MetroTech Center, Brooklyn NY 11201. Address all correspondence to Andreas H. Hielscher. E-mail: ahielscher@downstate.edu

©(2001) Society of Photo-Optical Instrumentation Engineers (SPIE)
Andreas H. Hielscher and Sebastian Bartel "Use of penalty terms in gradient-based iterative reconstruction schemes for optical tomography," Journal of Biomedical Optics 6(2), (1 April 2001). https://doi.org/10.1117/1.1352753
Published: 1 April 2001
Lens.org Logo
CITATIONS
Cited by 53 scholarly publications.
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Tissues

Optical properties

Image restoration

Image quality

Optical tomography

Tissue optics

Reconstruction algorithms

Back to Top