Open Access
11 January 2021 Efficient high-order accurate Fresnel diffraction via areal quadrature and the nonuniform fast Fourier transform
Author Affiliations +
Abstract

We present a fast algorithm for computing the diffracted field from arbitrary binary (sharp-edged) planar apertures and occulters in the scalar Fresnel approximation for up to moderately high Fresnel numbers (≲103). It uses a high-order areal quadrature over the aperture and then exploits a single 2D nonuniform fast Fourier transform to evaluate rapidly at target points (on the order of 107 such points per second, independent of aperture complexity). It thus combines the high accuracy of edge integral methods with the high speed of Fourier methods. Its cost is O  (  n2 log n  )  , where n is the linear resolution required in the source and target planes, to be compared with O  (  n3  )   for edge integral methods. In tests with several aperture shapes, this translates to between two and five orders of magnitude acceleration. In starshade modeling for exoplanet astronomy, we find that it is roughly 104  ×   faster than the state-of-the art in accurately computing the set of telescope pupil wavefronts. We provide a documented, tested MATLAB/Octave implementation. An appendix shows the mathematical equivalence of the boundary diffraction wave, angular integration, and line integral formulas and then the analysis of a new non-singular reformulation that eliminates their common difficulties near the geometric shadow edge. This supplies a robust edge integral reference against which to validate the main proposal.

1.

Introduction

The numerical modeling of wave diffraction from thin two-dimensional (2D) screens and apertures in the Fresnel regime has many applications in optics1 and acoustics (Ref. 2, Sec. 8.4), including instrument modeling,3,4 lithography mask design,5 Fourier optics,6 coherent x-rays,7 acoustic emission,8 computer-generated binary holograms,9 starshades,10 and Fresnel zone plate imagers.11,12 This usually involves a plane or spherical wavefront hitting a binary (“0–1”) mask of given shape, although continuous opacity/phase variation is also possible. We confine ourselves to the former case, although the method can trivially accommodate the latter. Our point is to present a simple—yet seemingly overlooked—method that renders a high-accuracy numerical modeling at moderate Fresnel numbers orders of magnitude more efficient than prior methods.

One motivation is starshade design.10,1317 Given a space telescope, the goal is that a distant binary occulter blocks the direct light from a star, allowing for much dimmer exoplanets separated from it by only tens of milli-arcseconds to be imaged. The occulter shape and distance are thus optimized to give a deep shadow region, with a relative intensity on the order of 1010 across the telescope pupil, throughout a given wavelength range, while minimizing the occulter’s physical size (for practical reasons) and the angular size at the telescope. This has led to shapes with “petals” that emulate a continuous radial apodization, with radii on the order of 10 m and distances on the order of 107  m. Here the scalar18 and Fresnel approximations are superb (Ref. 17, App. A), with a small Fresnel number [defined in Eq. (2)] of typically 5 to 20. Numerical modeling is challenging, demanding at least 6-digit accuracy in amplitude to validate the shadow and many runs with different wavelengths and shapes to assess mechanical and thermal stability.16

Fixing a wavelength λ and propagation distance z, a point in the aperture (or occulter) plane is (x,y,0) and a target point in the detector (or pupil) plane is (ξ,η,z) (see Fig. 1). We drop the constant z-coordinates hereafter, so the problem is in essence 2D. In the case of a unit amplitude plane wave with wavevector (0,0,2π/λ) incident on a planar aperture ΩR2, the mathematical task is to evaluate the Fresnel integral for the scalar potential:

Eq. (1)

uap(ξ,η)=1iλzΩeiπλz[(ξx)2+(ηy)2]dxdy.
This takes the form of a 2D convolution of the aperture’s characteristic function χΩ with a radially symmetric kernel (a complex Gaussian), with half-wave oscillation regions that are commonly called “zones” (Fig. 1). If R is an effective (or maximum) radius of Ω, then the in-plane separation r(ξx)2+(ηy)2 is typically bounded by R times a small constant. Thus the number of zones inside Ω is on the order of

Eq. (2)

fR2λz(Fresnel number),
and the finest oscillation scale of the integrand is O(1/f). In Eq. (1), the prefactor 1/iλz ensures that u tends to unity in the limit of a large aperture, i.e., the unimpeded wave.

Fig. 1

Geometry and notation for Fresnel diffraction, in the case of Ω an aperture with boundary Ω. A plane wave is incident from behind, along the z axis. For the target (ξ,η), the real part of the integrand in Eq. (1) is imaged in color (red positive, blue negative, and green around zero; each red or blue annular region is a Fresnel zone).

JATIS_7_2_021211_f001.png

It is worth reviewing the origin of Eq. (1). It arises from the Kirchhoff diffraction approximation to the full Maxwell equations. This is good when aperture features are much larger than λ (Ref. 6. Ch. 3). Only the zeroth and first term in the Taylor expansion of the exponent in the free-space Green’s function e2πiρ/λ/ρ are then kept, ρ=r2+z2 being the source-target distance (dotted line in Fig. 1). The Fresnel approximation is thus valid to the extent that the next term is small, implying the condition r4λz3. The denominator of the Green’s function is approximated by z. We refer the reader to (Ref. 1, Sec. 8.3.3) (Ref. 6, Ch. 4) for details. Note that the zeroth term gave the plane propagation phase e2πiz/λ, which is usually included as a prefactor in Eq. (1). We drop it for simplicity; it is trivial to insert. By replacing z1 by z1+D1, Eq. (1) also applies when a point source at finite distance D produces a spherical incident wave.1,17,19 In the plane wave case, by Babinet’s principle (Ref. 1, Sec. 8.3.2) (Ref. 14, Sec. 2.2), the potential when Ω defines an occulter rather than an aperture is simply given by

Eq. (3)

uoc(ξ,η)=1uap(ξ,η).

Analytical forms for the integral Eq. (1) are known only for the straight edge, infinite slit, and rectangles (all involving the same 1D special function) (Ref. 1, Sec. 8.7) (Ref. 6, Sec. 4.5.1) and the disc.2022 Semianalytical Bessel expansions can be useful for symmetric starshade design.10,14 But for general shapes, one is left with fully numerical methods, which fall into two main categories [sketched in Figs. 2(a) and 2(b)]:

  • (a) 2D fast Fourier transform (FFT) methods. There are two types: (i) methods implementing the convolution theorem (forward then backward FFTs), useful only for large f (near-field), or (ii) methods exploiting the quadratic form in Eq. (1) via a single FFT,11,13,23 useful from zero to moderate f. See Ref. 24 for a review, in which fractional FFTs are also considered. The FFT of course requires only O(n2logn) operations to transform a n×n source to the target grid. However, the aperture or occulter must be sampled (quantized) on the source grid, and if this is done in a binary fashion11 [as in Fig. 2(a)], the error convergence rate is a very low order, no faster than O(1/n). This is inadequate for starshade shadow modeling.14,15 Subpixel averaging can improve accuracy,3,17 but this can give at best25 O(1/n2), and for starshades, huge (n>105) sized FFTs are still needed to reach the required accuracy.17 The underlying problem is that χΩ is not a bandlimited function, so it is always poorly represented on regular grids.

  • (b) Edge integral methods. Such methods discretize a target-dependent integral over the aperture/occulter boundary Ω. The literature splits these methods into two formulas: (i) the “boundary diffraction wave” (BDW) of Miyamoto–Wolf,26 arising from the Kirchhoff approximation, versus (ii) reduction of Eq. (1) to a 1D angular integral, a result of Dauger19 and Dubra–Ferrari.22 To resolve the Fresnel integrand, the number of discretization nodes must scale as n=O(f), so the cost for (direct) evaluation on a resolved n×n target grid is O(n3)=O(f3). Both formulas have been applied to state-of-the art starshade modeling: Cady15,27 uses (i), while Cash14 and Harness et al.17 use (ii). Here second-order, i.e., O(1/n2) accurate, midpoint quadrature rules are used, with up to about 105 nodes, to reach the needed 6-digit accuracy.

Fig. 2

Sketch of three alternative methods for discretization in the source (aperture) plane, shown for a kite-shaped aperture Ω with smooth boundary Ω. (a) Uniform 2D grid sampling; (b) line integral quadrature; and (c) high-order areal quadrature. (a) and (b) are well established. Our method is (c): areal nodes (xj,yj) are shown with their color indicating the weights wj according to the scale at the right.

JATIS_7_2_021211_f002.png

Remark 1. Such edge integral methods should not be confused with the (more involved) boundary integral equation method, which solves the 3D Helmholtz or Maxwell equations using surface unknowns and potential theory.28,29

Our proposal combines the best features of the above two categories, namely the speed of the FFT methods with the high (and potentially high order) accuracy of edge integral methods for binary apertures. The result is an acceleration over edge integral methods of between two and five orders of magnitude. In a nutshell, the idea is, realizing that a 2D regular grid guarantees poor quadrature for integrals over Ω, to replace it with a much better (high-order) areal quadrature scheme [see Fig. 2(c)]. Then exploiting the quadratic form in Eq. (1), as in “single FFT” methods (a)(ii), leaves one remaining problem: how to rapidly evaluate Fourier sums involving off-grid frequencies. Fortunately, fast algorithms for this task—nonuniform FFTs30,31—are quite mature and have speeds only one order of magnitude below those of plain FFTs.

High accuracy relies on constructing a good areal quadrature for Ω, which depends on its geometric description. We show that such quadratures can easily piggyback off boundary quadratures or be built independently. Our proposal is in some way related to diffraction methods that subsample the FFT32 or use chirp FFTs;4 however, it is much simpler and more general than either.

Turning to the structure of this paper, Sec. 2 explains the method for arbitrary targets (Sec. 2.1) and then for gridded targets (Sec. 2.2), the latter being somewhat faster. In Sec. 3, we demonstrate the high accuracy and efficiency of the method for a smooth occulter (Sec. 3.1), two symmetric starshades (Sec. 3.2), and an aperture built from 67 million triangles (Sec. 3.3). We compare the method with the performance of a BDW edge integral code.15,27 We draw conclusions, explain how the method can be applied to perturbed starshades, and propose extensions in Sec. 4.

Finally, validating the proposed nonuniform fast Fourier transform (NUFFT) method down to errors of 1012 or better demanded a high-accuracy reference edge integral method, which required new research, to which the Appendix is devoted. There we clarify that the BDW edge-integral (b)(i) and angular-integration (b)(ii) methods are equivalent and are equivalent to a more convenient line integral due to Cash.14 However, as we show, existing edge integral methods suffer numerical breakdown as targets approach Ω (the geometric shadow edge) because they represent the (smooth) diffracted field as a sum of two discontinuous terms, one with a singular integrand. We instead present and analyze a simple, robust non-singular line integral (NSLI) that maintains close to machine accuracy for target points near or even on Ω, without extra work, yet takes only five lines to code.

Remark 2. We maintain a documented, tested, open-source MATLAB/Octave implementation of the proposed fast algorithm (using the FINUFFT library33), and the NSLI, on GitHub.34 Queries to the author are welcome.

2.

Proposed Method

Given a set of targets (ξk,ηk), k=1,,M, recall that the goal is to approximate Eq. (1) efficiently, i.e., to evaluate

Eq. (4)

ukapuap(ξk,ηk)=1iλzΩeiπλz[(ξkx)2+(ηky)2]dxdy,k=1,,M.
Suppose that an areal quadrature rule for the aperture Ω has been found, that is, a set of nodes (xj,yj)R2 and weights wj, j=1,,N, such that, for all sufficiently smooth functions f,

Eq. (5)

Ωf(x,y)dxdyj=1Nf(xj,yj)wj
holds to high accuracy. Specifically, one seeks a family of rules of increasing N, with a high order of convergence p, meaning that, for each C-smooth f, the error (difference between left and right sides) is O(Np). This may even hold for all p>0, in which case the convergence is said to be super-algebraic or spectral. For instance, such a quadrature over a rectangle is given by a tensor product of 1D Gauss–Legendre rules (Ref. 35, Ch. 19). By passing those nodes through a smooth mapping of R2 to R2 (and multiplying the weights by its Jacobean), such rules for triangles and distorted quadrilaterals are easily made, the union of which can approximate any domain with a piecewise smooth boundary to a high order. For more node-efficient quadratures, we refer the reader to recent works on triangles,36 polygons,37,38 and domains defined by piecewise rational curves.39 Converting a general boundary into an areal quadrature for its interior is a software engineering task beyond the scope of this paper. We will be content constructing areal quadratures for three types of domains: the interior of a smooth closed curve, symmetric starshades, and unions of triangles.

Since any rule Eq. (5) must resolve the Fresnel integrand oscillations, for a fixed Ω, its number of nodes must grow like

Eq. (6)

N=O(f2).
Now simply applying Eq. (5) to Eq. (4) gives a high-order accurate direct (slow) summation method that, as the results below show, can exceed accuracy requirements with reasonable numbers of nodes. (Our results show that claims such as “a single point in the shadow plane can require a trillion sine calculations at quadruple precision”14 are overly pessimistic.) The cost of this direct sum to M targets would be O(NM). Our goal is now to reduce this to close to O(N+M) via fast Fourier methods.

2.1.

Fast Algorithm for Diffraction to Arbitrary Target Points

We apply Eq. (5) to Eq. (4) and then in the second line expand each quadratic term. (This trick, common to “single FFT” methods,11,13,23 is similar to, but essentially the reverse of, that in the Bluestein method.40) This gives

Eq. (7)

ukap1iλzj=1Neiπλz[(ξkxj)2+(ηkyj)2]wj

Eq. (8)

=1iλzeiπλz(ξk2+ηk2)·j=1Ne2πiλz(ξkxj+ηkyj)[eiπλz(xj2+yj2)wj].
This factorized form allows for a three-step “fast” (in the sense of quasi-optimal scaling) algorithm:

  • 1. Compute all “strengths” cj according to the following, which takes O(N) effort:

    cjeiπλz(xj2+yj2)wj,j=1,,N.

  • 2. Evaluate

    Eq. (9)

    vk=j=1Ne2πiλz(ξkxj+ηkyj)cj,k=1,,M,
    which is precisely the task performed by the so-called 2D NUFFT of type 3,30,31 a well-established algorithm with modern software implementations.33,41 This takes O(N+M+q2logq) effort, where q=O(f) is the largest magnitude of the exponent in Eq. (9).

  • 3. Postmultiply all outputs by their quadratic phases, which takes O(M) effort:

    ukap=1iλzeiπλz(ξk2+ηk2)vk,k=1,,M.

The overall cost is thus O(N+f2logf+M). Recalling Eq. (6), this is O(f2logf+M). The NUFFT requires a user-chosen error tolerance ε, which affects the prefactor of this run time scaling, but rather weakly.33

2.2.

Fast Algorithm for Target Points Lying on a Grid

Often sampling the diffracted wave on a dense regular Cartesian grid is sufficient. In this special case, more speed can be gained. Specifically, let (ξk,ηk) be the 2D grid points defined by the product of n-point regular 1D grids

{nh/2,(n/2+1)h,,h,0,h,,(n/21)h}
in the ξ and η directions, with h being the grid spacing. We assume that n is even. This grid has M=n2 targets and (ignoring its left-most column and bottom row) is centered on the origin. If we relabel the grid points as (hk1,hk2) for integer indices n/2k1, k2<n/2 and define rescaled source points x˜j(2πh/λz)xj and y˜j(2πh/λz)yj, the middle step Eq. (9) can be written
vk1,k2=j=1Nei(k1x˜j+k2y˜j)cj,n/2k1,k2<n/2,
which is precisely the so-called type 1 NUFFT30 (also known as the “adjoint NFFT”41). Its nonuniform points (x˜j,y˜j) are only defined modulo 2π and may need to be “folded” back into a valid input domain such as [π,π)2. If the grid width is similar in size to Ω, then it is easy to check that such a folding is only needed if n is less than the order of f, i.e., the target grid under-resolves the diffracted field u. This is probably not a common use case.

The total cost for the regular grid case is O(N+MlogM), which, recalling Eq. (6) and n=O(f), is O(f2logf). In practice, we find that, for the same number M of targets spanning the same domain and the same tolerance ε, this regular grid version is around four times faster than the above arbitrary target version because the type 1 NUFFT is faster than the type 3 for the same space-bandwidth product.33 The generalization to rectangular target grids with arbitrary translations is simple, achieved through prephasing in step 1, and we will not present it here.

3.

Performance Tests and Results

We now test the accuracy and speed of the above method in several geometries and compare it with two edge integral methods. All codes are written in MATLAB R2017a, apart from FINUFFT, which is a parallel C++ library with MEX interface. All timings are reported for double-precision arithmetic on a 4-core i7-7700HQ laptop with 32 GB RAM, using 8 threads (full hyperthreading). All codes take advantage of the multiple threads, either via vectorized MATLAB statements or via OpenMP in FINUFFT. In each geometry, we first need to describe the areal quadrature used.

3.1.

Domains Defined by a Simple Smooth Closed Curve

We start by setting up a simple areal quadrature for the interior of a simple smooth closed curve. Suppose that we have good quadrature nodes (Xi,Yi)Ω and weights (Wi,Vi), indexed by i=1,,n, for vector line integrals on Ω, that is, for all sufficiently smooth vector-valued functions f on Ω:

Eq. (10)

Ωf(x,y)·dsi=1nf(Xi,Yi)·(Wi,Vi),
where ds is the counter-clockwise vector line element on Ω. Now fix a number of “radial” nodes m, and let {αl}l=1m be the Gauss–Legendre (Ref. 35, Ch. 19) nodes and {w˜l}l=1m be their weights for the interval (0, 1). The integral over Ω may be rewritten using a “dilation” parameterization Ωfdxdy=Ω(01f(αx)αdα)x×ds, where xΩ and the 2D cross product is understood to give a scalar. Applying the m-node rule on (0, 1) to the inner integral and Eq. (10) to the outer integral gives the “tensor product” areal quadrature for Ω, with N=nm nodes (xj,yj) and weights wj given by

Eq. (11)

[xl+(i1)m,yl+(i1)m]=(αlXi,αlYi),for  i=1,,n,l=1,,m,

Eq. (12)

wl+(i1)m=αlw˜l(XiViYiWi),for  i=1,,n,l=1,,m.
These nodes lie along “spokes” connecting the origin to the boundary nodes, as in Fig. 2(c). If Ω is not star-shaped about the origin, then some of the nodes lie outside Ω and some wj are negative; however, we observe little loss of accuracy unless Ω is highly non-convex or poorly centered on the origin.

Remark 3. The above “dilation” method automatically creates an areal quadrature for Ω given only a (vector) line integral quadrature for Ω and the convergence parameter m. It thus also applies to boundaries with corners or cusps, including non-symmetric starshades. In practice, its construction time is negligible compared with that of the proposed NUFFT algorithm.

Finally, we must build a vector line integral rule Eq. (10). The simplest case is when Ω is parameterized in a counter-clockwise sense over t[0,2π) by a smooth 2π-periodic vector function x(t)[X(t),Y(t)]. Then

Eq. (13)

Ωf·ds=02πf(x(t))·x(t)dti=1nf(x(2πi/n))·2πnx(2πi/n),
where x(t)dx/dt and we apply the n-point periodic trapezoid rule quadrature with nodes t=2πi/n and equal weights 2π/n. Comparing the right sides of Eqs. (10) and (13), one reads

Eq. (14)

Xi=X(2πi/n),Yi=Y(2πi/n),Wi=2πnX(2πi/n),Vi=2πnY(2πi/n),i=1,,n.

We apply the above to build a family of areal quadratures for the kite domain with a smooth boundary [X(t),Y(t)]=(0.5cost+0.5cos2t,sint), shown in Fig. 2. Its maximum radius is R1.13. We then show multiple types of error convergence for the proposed method for Fresnel diffraction in Fig. 3. The graphs not labeled “self” (i.e., blue, black, and green) show absolute error in uap at a single point, using as a reference solution the (converged) NSLI edge integral method in the Appendix. The other self convergence (red) graphs show the maximum error in uap over 106 targets, using the converged values themselves as a reference. The “direct” (blue) graph simply sums Eq. (7) without the fast algorithm. The other graphs test two types of the proposed NUFFT method—arbitrary targets (t3, + signs) and gridded targets (t1, ∘ signs)—each at two different requested tolerances (6-digit and 12-digit).

Fig. 3

Convergence the proposed method for the smooth kite domain (see Sec. 3.1), with respect to m (“radial” nodes) and n (boundary nodes). Unless labeled self, all errors are measured at the target (3/2,3/2), shown as a green dot in the right-most plots, relative to the reference line integral method (NSLI). Those labeled self give the maximum error over M=106 target points relative to their converged values. “direct” (blue) uses (7); “t3” (+ signs) uses the arbitrary target type 3 NUFFT of Sec. 2.1 (random points lying in [3/2,3/2]2), and “t1” (∘ signs) uses the gridded target type 1 NUFFT of Sec. 2.2 (grid of n=103, nh=3). Two NUFFT tolerances ε=106 and ε=1012 are compared. The occulter Fresnel number for (a)–(c) is f12.8 and (d)–(f) is f128.

JATIS_7_2_021211_f003.png

In each row of panels, the first shows convergence in m (“radial” nodes), with fixed n (boundary nodes), and the second vice versa. The right-most panels show, on a M=10002 point grid, the converged intensity |uoc|2, applying Eq. (3) so that Ω is an occulter. There are several observations as follows.

  • In all cases, the convergence is very sudden, as is typical with a spectrally accurate quadrature rule applied to an oscillatory integrand.

  • Comparing the top and bottom rows, where f has increased by a factor 10, the converged m and n have each become about 10 times larger. This matches Eq. (6).

  • The “direct” application of Eq. (7) converges to 13 to 14 digits, while the NUFFT methods convergence bottoms out at within 1 digit of the requested ε, as expected.

  • The self convergence (red graphs) in n [panels (b) and (e)] occurs in tandem with the independent error at the single test point (3/2,3/2). However, for m [panels (a) and (d)] this is not quite true, even though the test point is the point in the target domain [3/2,3/2]2 maximizing the maximum source-target separation r. This is due to particulars of the angular variation in node density and reminds one that convergence must be tested at all target points.

Table 1 reports CPU timings and errors for these same tasks at converged n and m quadrature parameters. (The areal quadrature generation is not included as it was found to be negligible.) NSLI (known to achieve 13 to 14 digits with these parameters) is used as the reference for all errors. A state-of-the-art edge integral code BDWF15 (as available in SISTER27 and documented in Ref. 34) is also tested with the same n: while its median errors are as expected from its use of a second-order accurate midpoint rule, its maximum errors are larger than 1. These huge errors appear to be confined to a few target points very near Ω, the geometric shadow edge. The main conclusions from Table 1 are as follows.

  • In this setting, the proposed NUFFT-based methods are 100 to 300 times faster than the edge integral methods (for arbitrary targets) or 400 to 600 times faster (for gridded targets).

  • The proposed methods robustly (uniformly at all targets) achieve close to the requested error.

  • At the largest λz (f12.8), the t1 achieves 107  targets/s, and the t3 achieves 4×106  targets/s, with only a weak dependence on tolerance. These are only slightly slower for f 10 times larger.

  • At the smallest λz=103 (f1280), the asymptotic O(f2logf) cost has started to dominate. Note that 134 million nodes are being mapped to 10 million targets in around 10 s.

Table 1

Absolute error in uoc and run times of several methods for the smooth kite occulter with converged quadrature parameters. The domain, upper two λz choices, and target region are as in Fig. 3. Blank entries in this table to be taken as repeated from above and “—” indicates not applicable. In both random and grid cases, errors are measured relative to the NSLI reference method (see Appendix). Timings for NSLI and BDWF15 are not listed for the grid cases since they are identical to the random cases. The last two rows are close to the largest Fresnel number f that the laptop can handle (errors were checked at only 104 targets in those cases). See Sec. 3.1 for other details.

λzfn (bdry)m (radial)M (targets)MethodMedian errMax errCPU time (s)
0.112.8320106, randomNSLI25.8
BDWF2.1×1031.9×10135.2
32080NUFFT t3 (ε=106)6.2×1081.0×1060.23
NUFFT t3 (ε=1012)3.3×10132.8×10120.32
106, gridNUFFT t1 (ε=106)9.6×1098.0×1070.06
NUFFT t1 (ε=1012)1.4×10132.6×10120.12
0.011282400106, randomNSLI79
BDWF1.1×1051.1×101115
2400560NUFFT t3 (ε=106)9.3×1084.7×1060.29
NUFFT t3 (ε=1012)3.4×10139.5×10120.51
106, gridNUFFT t1 (ε=106)2.1×1084.6×1060.17
NUFFT t1 (ε=1012)1.8×10139.6×10120.21
0.001128024,0005600107, randomNUFFT t3 (ε=106)5.2×1085.3×10616.2
107, gridNUFFT t1 (ε=106)1.9×1083.7×10610.4

Since under the hood, the NUFFT uses FFTs, the reader might worry about their RAM usage. It is very mild: for t1 (gridded) cases, the FFT size is 5/4 times (for ε109, or twice otherwise) the requested grid size in each dimension. For t3, FFT dimensions scale like f: for the smallest f12.8, the FFT is a tiny 216×288. This is to be compared with the 32768×32768 FFT needed for a subpixel sampling method to reach around 6-digit accuracy in u in various tests17 at similar f. In the penultimate row of the table f is 102 times larger, yet the FFT is only 7500×9600 (similar to the maximum r2/λz8900 zones), and total RAM usage is 17 GB, about the largest the laptop can handle.

Another natural question is: does the method suffer at smaller f than tested above? It does not: node numbers and CPU times only get smaller. In fact, by expanding the target grid in proportion to z, the Fraunhofer limit is reached in a stable fashion [here the first factor in Eq. (8) should be discarded, whereas the third factor tends to unity].

3.2.

Application to Starshade Modeling

Idealized starshades are described10,14,15 by a radial apodization function A(r), where (in this section only) we use (r,θ) as occulter-plane polar coordinates about the origin. A(r) is 1 (indicating a fully blocking disc) for r<a and drops in a carefully optimized fashion in r[a,R] to close to zero at R, the maximum occulter radius, and identically 0 beyond this. Apodization over [a,R] is realized via Np identical binary petals, each of which has an angular width at radius r of 2πA(r)/Np. Let the function P(α) denote α+2πn for the unique nZ such that α+2πn[π,π), a common definition of the principal value of an angle. Then the occulter is the “flower” shape:

Eq. (15)

Ω={(r,θ):  0rR,  P(Npθ)[πA(r),πA(r)]}.

See Fig. 4. Note that published A(r) designs are discontinuous at a (indicating a gap between petals) and at R (petal tips have finite widths). In early “analytic” designs, these discontinuities were required to be no larger than about 105 in size to minimize Arago-spot-style diffraction into the deep shadow (Ref. 14, § 4.3), but designs generated by optimization over a λz band10,27 have much larger gap and tip discontinuities, on the order of 102 to 103, and an Arago effect that is apparently cancelled out over the band by distributed “ripples” (Ref. 14, §5) in A(r).

Fig. 4

Idealized starshade geometry (zoom for clarity). A(r) is the apodization profile controlling petal angular width, here illustrated with the HG (offset hyper-Gaussian14) with Np=16 petals. The colored dots show areal quadrature nodes (xj,yj) with weights wj indicated by color as in the colorbar. m=80 Gauss radii cover the petal length, with np=30 Gauss nodes covering the petal angular width at each radius.

JATIS_7_2_021211_f004.png

Recall that the task is simply to evaluate Eqs. (1) and (3) with errors in uoc no worse than 106. To apply the proposed method, we build a high-order areal quadrature as follows. Since A(r) is discontinuous at r=a, we split Ω into the disc of radius a plus each of Np petals. Our disc quadrature simply applies Eqs. (11) and (12) to the uniform nd-node line integral on its boundary, that is, Eq. (14) applied to the parameterization [X(t),Y(t)]=(acost,asint). We use md radial nodes. We then cover each petal by m nodes to handle the (outer) radial integral and then handle the (inner) arc integral at each of their node radii by np angular nodes (see Fig. 4). Specifically, let {rl,w^l}l=1m be 1D Gauss–Legendre nodes and weights for the (outer) radial integral over (a,R). Similarly, let {ti,ω^i}i=1np be nodes and weights for the fixed interval [π/Np,π/Np]. Then recalling the area element rdrdθ, the resulting areal nodes (in Cartesians) and weights covering one petal are

(xl+(i1)m,yl+(i1)m)=(rlcos(A(rl)ti),rlsin(A(rl)ti)),for  i=1,,np,l=1,,m,wl+(i1)m=rlw^lA(ri)ω^i,for  i=1,,np,l=1,,m.
Other petals are obtained by rotating by multiples of 2π/Np. The total number of nodes is then N=ndmd+Npnpm. We fix md=m and nd0.3Npnp, leaving two (petal) convergence parameters m and np. In Sec. 4, we discuss applying this to perturbed (non-ideal) starshades.

3.2.1.

Accuracy validation

In Fig. 5, we compare the wave amplitudes along a radial slice computed by three methods for two designs of starshade: “NI2” (a small occulter with rippled profile optimized for a blue-green λ range16) and “HG” (a large occulter with analytic “offset hyper-Gaussian” profile14). We choose λ within their designed wavelength windows. The parameters used, and some CPU timings, are listed in Table 2. Since they have different geometry descriptions, we treat the two designs in turn.

Fig. 5

Validation of diffracted uoc along a radial slice for two starshade designs: (a) NI2 (optimized function with ripples16) and (b) HG (offset hyper-Gaussian analytic function14). Both have Np=16 petals. The proposed NUFFT t1 method (black line) is compared against edge-integral methods BDWF (red) and NSLI (green). Labels such as “t1-BDWF” indicate the absolute difference in uoc between two methods. BDWF and NSLI use the same boundary nodes, except for “NSLIlo”, which uses 10× the boundary nodes and second-order weights. For details, see Sec. 3.2.

JATIS_7_2_021211_f005.png

Table 2

Parameters and CPU times for the proposed NUFFT t1 and the BDWF edge-integral to complete the same diffraction tasks, for two starshades. See Fig. 5 for comparisons of their answers. The Fresnel number f uses the maximum radius R in Eq. (2). N is the number of areal quadrature nodes and n is the number of boundary nodes.

Designλ (m)z (m)fm (petal)Total nodesM (targets)MethodCPU time (s)
NI25×1073.72×1079.16000n=192,000106, gridBDWF5361
400N=499,200NUFFT t1 (ε=108)0.076
HG5×1078×1072460n=2048106, gridBDWF80.5
60N=37,440NUFFT t1 (ε=108)0.042

NI2

The optimized profile A(r) is available in the SISTER package27 in the form of 2462 equispaced samples covering the petal radius range [a,R]=[5,13]  m. From these, we use piecewise cubic splines to interpolate A at m radial Gauss nodes in [a,R]. Since A(r) appears to have at least 13 “bang-bang” type discontinuities (the discrete second derivative mostly takes values ±σ, for some constant σ, or 0), this necessarily limits accuracy to around 6 to 7 digits. By a convergence study, we found that m=400 and np=40 nodes across each petal were sufficient for areal quadrature to match this accuracy. For BDWF, we used the n=192,000 boundary nodes as given and used in SISTER. These have 6000 nodes per petal edge, but no nodes covering the interpetal gaps or tips (each of which is 0.03 m wide). For NSLI, we used an n-node vector line integral quadrature matching the second-order accurate midpoint rule in BDWF (this match was needed to accurately handle the wide tips with a single segment). Figure 5(a) shows that the proposed NUFFT t1 method matches both of these edge integral methods to around 3×107 in uoc in the shadow region where |uoc|2×105. NSLI agrees with t1 to around 6 digits everywhere. However, at ξ13  m, the error of BDWF spikes to O(1) as (ξ,0) approaches Ω.

Remark 4. Since z/λ1014, overall phase is meaningless; thus we fit the phase of BDWF to the other two methods at a single target. We then quote absolute differences in complex uoc. This is a more predictable metric than the error in intensity |uoc|2, which is affected by local intensity.

HG

Here the profile is analytically known from Ref. 14: A(r)=e[(ra)/b]p in [a,R], where a=b=12.5  m, the maximum radius is R=31  m, and p=6. A convergence study shows that only m=60 and np=30 are needed, giving an areal quadrature of N=37,440 nodes. For NSLI, we used A(r) and A(r) to generate a high-order line integral quadrature using the same m=60 radii per petal, plus four Gauss nodes across each gap and tip, giving n=2048 in total (see starshadeliquad in our repository34). We also fed these boundary nodes (but of course not their high-order weights) to BDWF. Figure 5(b) shows that the three methods again agree to the desired accuracy almost everywhere, apart from BDWF near Ω where again its errors hit O(1).

Remark 5. For HG with m=60, the errors of BDWF are summarized by 2 to 3 digits of relative accuracy overall, giving 6 to 7 digits of absolute uoc accuracy in the deep shadow. Yet NSLI, if fed the low-order midpoint rule used inside BDWF, gives only 4-digit absolute accuracy; thus it is useless in deep shadow. Figure 5(b) thus also explores (dash-dot line) the errors of NSLI with this low-order rule and the larger m=600: the absolute error now bottoms out at a useful 106. This highlights an advantage of BDWF over plain NSLI when deep shadows are modeled with poor quadrature, a subtle point explained in Remark 7.

3.2.2.

Solution speed

Table 2 presents CPU times for the above converged experiments on gridded targets (we omit NSLI times since they are similar to BDWF). The time to construct the areal quadrature is not included for two reasons: (i) it is never more than half the NUFFT method run time and (ii) it would be amortized away over runs at multiple wavelengths. We make the following observations.

  • For NI2, our proposal is around 70,000× faster and for HG around is 2000× faster than a state-of-the-art edge integral method.

  • By reinterpolation of the NI2 profile and code changes to use a high-order quadrature, BDWF could probably be sped up by a factor of 15. BDWF is found also to gain a factor of about two when multiple λ are needed. Neither factor impacts the conclusions much.

  • Since the number N of areal nodes is smaller than the large number M of targets, the cost of the NUFFT method is almost independent of N, hence of the starshade complexity, or f.

We now turn to arbitrary targets. In Fig. 6, the intensity suppression of the two starshade designs are studied, sweeping 50 wavelengths and taking the maximum intensity |uoc|2 over circles of varying radii ρ (M=60,000 targets at each λ), using the proposed NUFFT t3 method. The narrow-band nature of NI2 and deterioration of HG above 0.8  μm are apparent. The entire calculation for both starshades, including quadrature generation, totals 6 s on the laptop.

Fig. 6

Intensity (on log10 scale indicated on the right) as a function of wavelength and target radius ρ from the center for the two starshade designs [(a) NI2 and (b) HG] of Fig. 5. At each of 200 ρ values, the maximum over 300 angles is taken. The indicent intensity is 1. The NUFFT t3 method is used. Vertical dotted lines show the designed λ range.

JATIS_7_2_021211_f006.png

Finally, we report initial timing results upon having (rather crudely) inserted the NUFFT t3 method in place of BDWF within the SISTER code base27 (see sister_mods in our repository34). Running a standard SISTER “PSF basis” generation task for the non-spinning NI2 starshade, 14 wavelengths are needed covering [0.425,0.555]  μm, at each of which M=806,144 targets are needed. (Targets are organized into 16×16 telescope pupil grids, translated to 3149 different (ξ,η) centers covering a sector with angle 2π/Np.) We had to reorganize the loop ordering since BDWF was called separately for each pupil while grouping wavelengths together for speed, whereas the NUFFT t3 is most efficient with a single call to all targets at each wavelength.

The original SISTER run time was an estimated 6.5 h (since BDWF gets about 6×107 node-target pairs per second). The NUFFT t3 method, using parameters as in the previous section and including quadrature generation, took 2.6 s. The acceleration is thus about 10,000×.

Remark 6. The above shows that the proposed algorithm excels in efficiency when the number of desired targets M is large, resolving a region similar in size to the occulter. Since its cost is close to O(N+M), dropping M does not reduce run time much: N then dominates, and the relative speed over edge integral methods drops in proportion. The natural question is: what is the crossover M such that there is no advantage? For the NI2 starshade, since BDWF gives about 250  targets/s, the answer is as small as M20 for t1 and 50 for t3. Thus whenever the user needs more targets than this, the NUFFT wins.

3.3.

Complicated Domain

As a final example, we compute Fresnel diffraction for an aperture with a fractal boundary, specifically the standard Koch snowflake with a maximum radius R=1. Let Ω0 denote the equilateral triangle with side length 3, Ω1 is its union with three triangles of side 3/3, Ω2 is the union of Ω1 with 12 triangles of side 3/9, etc., so that ΩL is the level-L construction [see Fig. 7(a)]. To reach level L=13, ntri=1+3k=0L14k=67,108,864 triangles are needed. To build an areal quadrature, the integral over each triangle is approximated by a simple p×p node product Gauss–Legendre quadrature, by translating one vertex to the origin and then applying Eqs. (11) and (12) to the discretized line integral connecting the other two vertices (see the inset of figure). To accurately handle the oscillatory integrand as in Eq. (6), the node spacing should not exceed O(1/f); thus we designed a heuristic choice of p that varied from 217 for the largest triangle to p=1 at levels L10 and checked p-convergence for Fresnel integrals for λz0.01. For Ω13, the resulting total node number N is about 69 million, requiring about 4 min to build in our simple implementation.

Fig. 7

Koch fractal aperture diffraction example from Sec. 3.3. (a) The areal quadrature constructed by a union of about 67 million triangles. The color of each node (xj,yj) indicates its weight wj using the scale on the right. The inset shows a zoom into the region shown, resolving individual nodes. (b) Intensity (on log10 scale indicated on the right) computed on a million-point grid by the NUFFT t1 method in under 5 s.

JATIS_7_2_021211_f007.png

The NUFFT t1 method with ε=106 is then applied to this areal quadrature to resolve the diffracted field for λz=0.01 on a grid of 106 target points, as shown in Fig. 7(b). For each new λz0.01, this takes 4.6 s. Since Ω13 has 3×4132×108 edges, an edge integral method of similar accuracy is estimated to be around 105 to 106× slower. We checked (via p-convergence at each level) that this u computed for Ω13 has at least 6-digit accuracy.

However, we may also interpret the calculation as an approximation to one for the limit domain Ω with a true fractal boundary. Since the smallest triangles in Ω13 have sides of 1.1×106λz, i.e., much smaller than any Fresnel zone, we are well into the regime of Richardson extrapolation in L, with differences from the limit scaling like (4/9)L. The largest absolute change in u on the grid in going from Ω12 to Ω13 was 1.6×104; thus by extrapolation, the largest change in u between Ω13 and the limiting domain Ω is around 4/5 of this. Thus we may quote uniformly about 4-digit accuracy for diffraction from the limit fractal domain. (By applying Richardson to a sequence, one could in fact recover many more digits without much extra effort.)

4.

Conclusion and Discussion

We have explained a fast algorithm for Fresnel diffraction from binary aperture or occulter shapes, which achieves high accuracy via flexible areal quadrature schemes over the planar domain yet with speeds close to regular-grid FFT propagation methods, via the nonuniform FFT. Extensive tests of error convergence and CPU timings show between two and five orders of magnitude acceleration over edge integral methods at comparable accuracies. Thus at moderate Fresnel numbers f102, one can evaluate accurate diffraction fields from complicated shapes such as starshades almost instantaneously (0.1 s for a grid of 106 targets). For higher f, the O(f2logf) cost starts to dominate. RAM usage starts to become a limitation only at f103. Along the way, we have reformulated edge integrals as an NSLI that is numerically robust (Appendix A).

Although we did not exploit it, the proposed NUFFT method can trivially include smooth source-plane phase/amplitude variations while retaining accurate edge diffraction simply by multiplying wj by the source term at each areal node, in step 1 of Sec. 2.1. This is impossible for traditional edge integral methods, but we note exciting recent progress on including low-frequency phase variations with edge integrals (Ref. 17, Sec. 6).

Our findings highlight the importance of high-order accurate quadratures, both for edge integrals and, crucially, planar integrals. We have shown how the latter may be constructed for three classes of domain Ω (those with an existing boundary quadrature, ideal starshades, and unions of triangles). Such construction on a case-by-case basis is possible, at least up to the tolerance with which a description of Ω is available. Yet to automate this procedure for a requested accuracy and f, given “any” Ω, raises 2D geometry representation and meshing issues common throughout engineering and scientific computing. This is a huge topic with many available tools. Given the benefits that we show in optical simulation, automated high-order areal quadratures from CAD formats should be a useful future project.

4.1.

Application to Non-Ideal (Perturbed) Starshades

Since the proposed method does not exploit symmetry, it could vastly accelerate Monte Carlo studies of optical stability under the various types of realistic shape perturbations. The latter include manufacturing tolerances, in-flight misalignment, thermal distortions, and damage over time. The topic is complicated by geometry descriptions, statistical correlations, and averaging due to spinning.16 Numerical study of this is beyond the scope of this initial work. Yet, recall that once an areal quadrature exists for the desired shape Ω, the proposed method is very simple. We explain three ways that such a quadrature can be generated as follows.

  • 1. Section 3.1 showed how to automatically build an areal quadrature from an existing boundary quadrature for Ω, as in Fig. 2(c). In this way, the method may immediately be applied to a set of boundary nodes describing a perturbed starshade. This will not be as efficient as the quadratures of Sec. 3.2: e.g., for NI2, using the rather large supplied n=192,000 would give N108, giving CPU times around 10 s. Yet this is still two to three orders of magnitude faster than edge integrals in the case of M=106 targets.

  • 2. For ideal rigid petals that are misaligned, a deformed areal quadrature may be built as follows: apply rigid motion to the nodes for each petal, without changing their weights, and then add new quadratures covering the new (signed) areas that connect the base arc of each petal with its “root” arc on the ideal disc. This would only add a few nodes to N, hence preserving the 0.1-s CPU times quoted.

  • 3. A general non-ideal shape may be written as the ideal shape plus a narrow (signed) “ribbon” domain in the neighborhood of the boundary Ω. If its width is everywhere smaller than a Fresnel zone, as expected for realistic perturbations, the ribbon contribution is well approximated by a line integral on Ω. The resulting quadrature is that of Sec. 3.2 plus a smaller number of boundary nodes, preserving 0.1 s CPU times.

Which of the above methods, or whether another method, proves best in practice will depend on the number of runs required and spatial details of the perturbation.

4.2.

Other Extensions

Beyond the geometry-handling extensions discussed above, fruitful future directions include the following.

  • Further acceleration is probable by drop-in replacement of FINUFFT by a GPU library.

  • In this initial work, Gauss–Legendre rules were used for all 1D quadratures. However, since the Fresnel integrands are oscillatory and band-limited, it is probable that substituting high-order 1D rules with quasi-uniform nodes42,43 will allow N to be reduced for the same accuracy. Asymptotically at high f, one would expect a reduction by up to a factor (π/2)22.5.

  • It seems that a NUFFT replacing the first FFT in method (a)(i) from Sec. 1 could enable efficient new “angular spectrum of plane waves” (Ref. 6, Sec. 3.10) (one-way Helmholtz) methods for binary aperture diffraction beyond the Fresnel regime.

5.

Appendix A: Fresnel Edge Integral Methods—Equivalence and Desingularization

5.1.

Equivalence of Edge Integral Methods for Planar Waves and Apertures

Dauger19 noticed that, using polar coordinates about the target (ξ,η), i.e., x=ξ+rcosθ and y=η+rsinθ, the Fresnel aperture integral Eq. (1) may be analytically integrated in r, for each θ, as follows. Assuming that the target is in Ω (geometric shadow) and that Ω is strictly star-shaped about this target, Eq. (1) becomes

Eq. (16)

uap(ξ,η)=1iλz02π0R(θ)eiπλzr2rdrdθ=12π02π[eiπλzR(θ)21]dθ,
where R(θ) is the distance r where the in-plane ray launched at angle θ from the target exits Ω. If these conditions are broken, R(θ) becomes multivalued. For targets outside Ω, the second term in the square brackets must be replaced by one similar to the first but involving the r value where the ray first enters Ω. The resulting numerical method is cumbersome for a general Ω because much effort is spent finding, and tracking as a function of θ, these multiple ray intersection points.19

It is simpler to reformulate Eq. (1) in terms of a line integral over Ω. Cash [Ref. 14, (46)] provides such a formula but no rigorous derivation. To remedy this, fixing the target, we write r(xξ,yη) and hence r2=r2, and we define the 2D vector field:

Eq. (17)

F(x,y)12πrr2eiπλzr2,(x,y)(ξ,η).
After cancelling terms, its divergence is found to be simply the Fresnel integrand from Eq. (1) minus the unit 2D delta distribution at the target (the latter can be proven by excluding a small disk of radius r0 about the target), that is,

Eq. (18)

·F(x,y)=1iλzeiπλzr2δ(r).
Applying the divergence theorem in Ω, with n being the unit outward normal, and, as before, taking the 2D cross product as a scalar,
Ω·Fdxdy=ΩF·nds=ΩF×ds=12πΩeiπλzr2r×dsr2.
Substituting Eq. (18) and recalling Eq. (1) gives the line integral formula

Eq. (19)

uap(ξ,η)=12πΩeiπλzr2r×dsr2+ugeomap(ξ,η),where  ugeomap(ξ,η){1,(ξ,η)Ω0,otherwise.
This is easily seen to be equivalent to Dauger’s formulas using the facts: (i) dθ=(r×ds)/r2, (ii) ugeomap(ξ,η)=02πdθ/2π, and (iii) the multiple values of R(θ) correspond to θ folding back as Ω is traversed.

We now show that, within the Fresnel approximation, the Miyamoto–Wolf [Ref. 26, Eqs. (5.1), (5.5)] BDW formulation used by Cady15 is also equivalent to the above. Given u(x,y), the plane incident wave at the aperture with unit direction vector p, this states (noting that our uap definition excludes the phase of plane z-propagation) that

Eq. (20)

uap(ξ,η)=14πe2πiz/λΩu(x,y)e2πiρ/λρρ^×p·ds1+ρ^·p+ugeomap(ξ,η).
Here we recall that ρ=r2+z2 is the target-source 3D distance and define ρ^ to be the unit vector pointing from the target to the source (ρ is notated as s in standard references, but we reserve the latter for arclength). Since we are concerned with planar incidence, u(x,y)1 and p=(0,0,1). Since rz is implicit in Eq. (1), we insert the leading-order small-angle approximations ρz, ρ^×p·ds(r×ds)/z, and 1+ρ^·pr2/(2z2) and the usual Fresnel approximation e2πiρ/λe2πiz/λeiπλzr2. The result is precisely Eq. (19). Thus all three edge formulations are equivalent.

However, it is worth noting that BDW Eq. (20) and some formulas in Dubra–Ferrari22 have a wider range of validity than Eq. (1), Dauger’s formulas, or Eq. (19) since they allow for out-of-plane apertures and more general incident waves.

5.2.

Robust Non-Singular Line Integral Formulation

To our knowledge, all edge integral numerical codes use formulas shown in the previous section to be equivalent to Eq. (19) and are thus well known to be plagued by two serious problems as follows.14,15,17,19,22

  • 1. Targets must be labeled as being inside or outside Ω in a robust fashion, no matter how close they are to Ω; otherwise O(1) errors result.

  • 2. When the target approaches Ω, the integrand on Ω becomes nearly singular, requiring increasingly refined quadrature near the target to retain accuracy. (Dauger’s θ-parameterization conceals this but does not remove the difficulty since R(θ) changes arbitrarily rapidly.)

For instance, in Secs. 3.1 and 3.2.1, we saw that the BDWF code loses all accuracy near Ω. However, once it is realized that the two problems are in fact facets of the same phenomenon, they can be made to “cancel out.”

This works as follows. It is well known [Ref. 44, (6.23)] [or combining facts (i) and (ii) above] that

Eq. (21)

ugeomap(ξ,η)=12πΩr×dsr2.
Inserting this into Eq. (19) gives one formula that applies whether the target is inside or outside Ω:

Eq. (22)

uap(ξ,η)=12πΩ(1eiπλzr2)r×dsr2(NSLIformula).
This has no singularity as r0 (target approaching Ω) because the term in square brackets is O(r2), cancelling the denominator. The integrand is as smooth as the Fresnel zones, i.e., as smooth as the diffracted field in the target plane. We believe that Eq. (22) is new.

This leads to an incredibly simple yet robust code. For instance, in MATLAB, if bx and by list coordinates of nodes on Ω, with wx and wy being the corresponding weights for a vector line integral as in Sec. 3.1, the entire NSLI code to output uap at a target (xi,eta) is five lines:

rx = bx - xi; ry = by - eta;  % components of r displacement vector

r2 = rx.*rx + ry.*ry;     % r^2

f = (1 - exp((1i*pi/lambdaz)*r2))./ r2;

f(r2==0.0) = 0.0;       % kill NaNs (target hits node)

uap = sum((rx.*wy - ry.*wx).* f) / (2*pi);  % cross product, quadrature

Note that when a target hits a node to within machine error (r=0), any finite value of f may be inserted in line 4 because r×ds=0 in line 5. Yet there is a subtlety here: numerical eyebrows should immediately be raised because f involves catastrophic cancellation as r0. To understand why this is in fact barely a problem, we apply forward error analysis [Ref. 45, Ch. 1] and treat the real and imaginary parts separately (combining them leads to a pessimistic prediction).

The imaginary part of the exp is sin[(π/λz)r2], which, given a rounded value of r2, is computed to relative accuracy O(ϵmach), where ϵmach1.1×1016 is the usual double precision relative error. Subtraction from 1 does not change the imaginary part. The division by r2 then results in absolute error O(ϵmach), which then gets multiplied by the O(r) cross product, giving O(ϵmachr). Note that this holds even though r2 is necessarily inaccurate due to coordinate subtraction in line 1.

Now to the real part of the exp, which is 1+O(r4). Thus when rϵmach1/4, the real part of exp is in machine arithmetic exactly 1, which cancels the other 1 exactly, leaving zero. Since the true answer is O(r3), in this regime the final error is bounded by O(ϵmach3/4). On the other hand, for rϵmach1/4, catastrophic cancellation occurs: the error in the real part of exp is O(ϵmach), so the final error is O(ϵmach/r). In summary, uniformly in r, the final error is bounded by O(ϵmach3/4). In practice, we find by comparison to the areal quadrature answers that this uniform bound is around 1014, which is adequate. Replacing by a Taylor expansion for small r (e.g., via cexprl46) could possibly gain a digit.

Equation (22), in the form of the above code looped over target points, serves as our reference direct method. A documented, tested MATLAB/Octave implementation is in the repository34 bdrymeths/nsli_pts.m

Remark 7. When a poor quadrature (that is, low order and few nodes) is used with deep shadow regions, the usual line integral Eq. (19) has one advantage over NSLI Eq. (22): it can in shadows achieve relative accuracy in u, appropriate to the quadrature, because ugeom=0 exactly. NSLI merely achieves absolute accuracy in u; thus it may require a better quadrature to resolve deep shadows than Eq. (19) (as implemented by, e.g., BDWF). In essence, ugeomap1 [the “1” term in Eq. (22)] to limited accuracy, which is then poorly canceled in Eq. (3). To remedy this, our NSLI implementation also includes an option to use Eq. (19) for targets far from Ω, combining the robustness of Eq. (22) with the deep shadow relative accuracy of traditional edge integrals.

Acknowledgments

This work benefited from the input of the anonymous reviewers and discussions with David Spergel, Robert Vanderbei, Stuart Shaklan, and Eric Cady. The Flatiron Institute is a division of the Simons Foundation.

References

1. 

M. Born and E. Wolf, Principles of Optics, 6th ed.Pergamon Press, Oxford (1980). Google Scholar

2. 

P. M. Morse and K. U. Ingard, Theoretical Acoustics, McGraw-Hill, New York (1968). Google Scholar

3. 

M. D. Perrin et al., “Simulating point spread functions for the James Webb Space Telescope with WebbPSF,” Proc. SPIE, 8442 84423D (2012). https://doi.org/10.1117/12.925230 PSISDG 0277-786X Google Scholar

4. 

Y. Hu et al., “Efficient full-path optical calculation of scalar and vector diffraction using the Bluestein method,” Light Sci. Appl., 9 (1), 119 (2020). https://doi.org/10.1038/s41377-020-00362-z Google Scholar

5. 

A. J. Bourdillon et al., “A critical condition in Fresnel diffraction used for ultra-high resolution lithographic printing,” J. Phys. D, 33 (17), 2133 –2141 (2000). https://doi.org/10.1088/0022-3727/33/17/307 JPAPBE 0022-3727 Google Scholar

6. 

J. W. Goodman, Introduction to Fourier Optics, 2nd ed.McGraw-Hill(1996). Google Scholar

7. 

M. Ruiz-Lopez et al., “Coherent x-ray beam metrology using 2D high-resolution Fresnel-diffraction analysis,” J. Synchrotron. Rad., 24 (1), 196 –204 (2017). https://doi.org/10.1107/S1600577516016568 JSYRES 0909-0495 Google Scholar

8. 

T. D. Mast, “Fresnel approximations for acoustic fields of rectangularly symmetric sources,” J. Acoust. Soc. Am., 121 3311 –3322 (2007). https://doi.org/10.1121/1.2726252 JASMAN 0001-4966 Google Scholar

9. 

P. Tsang et al., “Computer generation of binary Fresnel holography,” Appl. Opt., 50 (7), B88 –B95 (2011). https://doi.org/10.1364/AO.50.000B88 APOPAI 0003-6935 Google Scholar

10. 

R. J. Vanderbei, E. J. Cady and N. J. Kasdin, “Optimal occulter design for finding extrasolar planets,” Astrophys. J., 665 (1), 794 –798 (2007). https://doi.org/10.1086/519452 ASJOAB 0004-637X Google Scholar

11. 

D. Serre, “The Fresnel imager: instrument numerical model,” Exp. Astron., 30 111 –121 (2011). https://doi.org/10.1007/s10686-010-9200-7 EXASER 0922-6435 Google Scholar

12. 

R. Wilhem and K. Laurent, “Improvements on Fresnel arrays for high contrast imaging,” Exp. Astron., 45 21 –40 (2018). https://doi.org/10.1007/s10686-017-9568-8 EXASER 0922-6435 Google Scholar

13. 

A. S. Lo, T. Glassman and C. Lillie, “New worlds observer optical performance,” Proc. SPIE, 6687 668716 (2007). https://doi.org/10.1117/12.730521 PSISDG 0277-786X Google Scholar

14. 

W. Cash, “Analytic modeling of starshades,” Astrophys. J., 738 (1), 76 (2011). https://doi.org/10.1088/0004-637X/738/1/76 ASJOAB 0004-637X Google Scholar

15. 

E. J. Cady, “Boundary diffraction wave integrals for diffraction modeling of external occulters,” Opt. Express, 20 (14), 15196 –15208 (2012). https://doi.org/10.1364/OE.20.015196 OPEXFF 1094-4087 Google Scholar

16. 

S. B. Shaklan, L. Marchen and E. Cady, “Shape accuracy requirements on starshades for large and small apertures,” Proc. SPIE, 10400 104001T (2017). https://doi.org/10.1117/12.2273436 PSISDG 0277-786X Google Scholar

17. 

A. Harness et al., “Advances in edge diffraction algorithms,” J. Opt. Soc. Am. A, 35 (2), 275 –285 (2018). https://doi.org/10.1364/JOSAA.35.000275 JOAOD6 0740-3232 Google Scholar

18. 

A. Harness et al., “Modeling non-scalar diffraction in the Princeton starshade testbed,” Proc. SPIE, 10698 1069865 (2018). https://doi.org/10.1117/12.2310187 PSISDG 0277-786X Google Scholar

19. 

D. E. Dauger, “Simulation and study of Fresnel diffraction for arbitrary two-dimensional apertures,” Comput. Phys., 10 (6), 591 –604 (1996). https://doi.org/10.1063/1.168584 CPHYE2 0894-1866 Google Scholar

20. 

J. E. Harvey and J. L. Fordham, “The spot of Arago: new relevance for an old phenomenon,” Am. J. Phys., 52 (3), 243 –247 (1984). https://doi.org/10.1119/1.13681 AJPIAS 0002-9505 Google Scholar

21. 

G. E. Sommargren and H. J. Weaver, “Diffraction of light by an opaque sphere. 1: Description and properties of the diffraction pattern,” Appl. Opt., 29 4646 –4657 (1990). https://doi.org/10.1364/AO.29.004646 APOPAI 0003-6935 Google Scholar

22. 

A. Dubra and J. A. Ferrari, “Diffracted field by an arbitrary aperture,” Am. J. Phys., 67 (1), 87 –92 (1999). https://doi.org/10.1119/1.19195 AJPIAS 0002-9505 Google Scholar

23. 

L. Junchang and W. Yanmei, “An indirect algorithm of Fresnel diffraction,” Opt. Commun., 282 455 –458 (2009). https://doi.org/10.1016/j.optcom.2008.10.060 OPCOB8 0030-4018 Google Scholar

24. 

D. Mas et al., “Fast algorithms for free-space diffraction patterns calculation,” Opt. Commun., 164 233 –245 (1999). https://doi.org/10.1016/S0030-4018(99)00201-1 OPCOB8 0030-4018 Google Scholar

25. 

A. F. Oskooi et al., “MEEP: a flexible free-software package for electromagnetic simulations by the FDTD method,” Comput. Phys. Commun., 181 (3), 687 –702 (2010). https://doi.org/10.1016/j.cpc.2009.11.008 CPHCBZ 0010-4655 Google Scholar

26. 

K. Miyamoto and E. Wolf, “Generalization of the Maggi-Rubinowicz theory of the boundary diffraction wave—Part II,” J. Opt. Soc. Am., 52 626 –636 (1962). https://doi.org/10.1364/JOSA.52.000626 JOSAAH 0030-3941 Google Scholar

27. 

S. R. Hildebrandt et al., “Starshade imaging simulation toolkit for exoplanet reconnaissance (SISTER),” http://sister.caltech.edu/ Google Scholar

28. 

D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, 93 2nd ed.Springer-Verlag, Berlin (1998). Google Scholar

29. 

O. P. Bruno and S. K. Lintner, “A high-order integral solver for scalar problems of diffraction by screens and apertures in three-dimensional space,” J. Comput. Phys., 252 250 –274 (2013). https://doi.org/10.1016/j.jcp.2013.06.022 JCTPAH 0021-9991 Google Scholar

30. 

A. Dutt and V. Rokhlin, “Fast Fourier transforms for nonequispaced data,” SIAM J. Sci. Comput., 14 1368 –1393 (1993). https://doi.org/10.1137/0914081 SJOCE3 1064-8275 Google Scholar

31. 

J.-Y. Lee and L. Greengard, “The type 3 nonuniform FFT and its applications,” J. Comput. Phys., 206 1 –5 (2005). https://doi.org/10.1016/j.jcp.2004.12.004 JCTPAH 0021-9991 Google Scholar

32. 

R. Soummer et al., “Fast computation of Lyot-style coronagraph propagation,” Opt. Express, 15 (24), 15935 –15951 (2007). https://doi.org/10.1364/OE.15.015935 OPEXFF 1094-4087 Google Scholar

33. 

A. H. Barnett, J. F. Magland and L. af Klinteberg, “A parallel non-uniform fast Fourier transform library based on an ‘exponential of semicircle’ kernel,” SIAM J. Sci. Comput., 41 (5), C479 –C504 (2019). https://doi.org/10.1137/18M120885X SJOCE3 1064-8275 Google Scholar

34. 

A. H. Barnett, “FRESNAQ: MATLAB/Octave library for fast Frensel diffraction from apertures and occulters,” (2020). https://github.com/ahbarnett/fresnaq Google Scholar

35. 

L. N. Trefethen, Approximation Theory and Approximation Practice, SIAM,2013). http://www.chebfun.org/ATAP Google Scholar

36. 

B. Vioreanu and V. Rokhlin, “Spectra of multiplication operators as a numerical tool,” SIAM J. Sci. Comput., 36 A267 –A288 (2014). https://doi.org/10.1137/110860082 SJOCE3 1064-8275 Google Scholar

37. 

S. E. Mousavi, H. Xiao and N. Sukumar, “Generalized Gaussian quadrature rules on arbitrary polygons,” Int. J. Numer. Methods Eng., 82 (1), 99 –113 (2010). https://doi.org/10.1002/nme.2759 IJNMBH 0029-5981 Google Scholar

38. 

H. Xiao and Z. Gimbutas, “A numerical algorithm for the construction of efficient quadrature rules in two and higher dimensions,” Comput. Math. Appl., 59 (2), 663 –676 (2010). https://doi.org/10.1016/j.camwa.2009.10.027 CMAPDK 0898-1221 Google Scholar

39. 

D. Gunderman, K. Weiss and J. A. Evans, “Spectral mesh-free quadrature for planar regions bounded by rational parametric curves,” Comput. Aided Des., 130 102944 (2021). https://doi.org/10.1016/j.cad.2020.102944 CAIDA5 0010-4485 Google Scholar

40. 

L. Bluestein, “A linear filtering approach to the computation of discrete Fourier transform,” IEEE Trans. Audio Electroacoust., 18 451 –455 (1970). https://doi.org/10.1109/TAU.1970.1162132 ITADAS 0018-9278 Google Scholar

41. 

J. Keiner, S. Kunis and D. Potts, “Using NFFT 3—a software library for various nonequispaced fast Fourier transforms,” ACM Trans. Math. Software, 36 (4), 1 –30 (2009). https://doi.org/10.1145/1555386.1555388 ACMSCU 0098-3500 Google Scholar

42. 

B. K. Alpert, “Hybrid gauss-trapezoidal quadrature rules,” SIAM J. Sci. Comput., 20 1551 –1584 (1999). https://doi.org/10.1137/S1064827597325141 SJOCE3 1064-8275 Google Scholar

43. 

N. Hale and L. N. Trefethen, “New quadrature formulas from conformal maps,” SIAM J. Numer. Anal., 46 (2), 930 –948 (2008). https://doi.org/10.1137/07068607X Google Scholar

44. 

R. Kress, Linear Integral Equations, 82 2nd ed.Springer, New York (1999). Google Scholar

45. 

N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed.SIAM, Philadelphia (2002). Google Scholar

46. 

W. Fullerton, “cexprl.f: Fortran code for the complex relative error exponential,” (1989) https://www.netlib.org/slatec/fnlib/cexprl.f Google Scholar

Biography

Alex H. Barnett received his PhD in physics from Harvard University, followed by postdoctoral work in radiology at Massachusetts General Hospital and in applied mathematics at New York University. He then taught in the Department of Mathematics at Dartmouth College for 12 years, becoming a full professor in 2017. He is a group leader for numerical analysis at the Center for Computational Mathematics, Flatiron Institute, New York, USA. His research interests include computational partial differential equations, waves, fast algorithms, integral equations, neuroscience, imaging, and inverse problems. He has published more than 50 papers, received several grants from the National Science Foundation, won the XXI International Physics Olympiad, and received Dartmouth’s Karen E. Wetterhahn Memorial Award for Distinguished Creative or Scholarly Achievement.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Alex H. Barnett "Efficient high-order accurate Fresnel diffraction via areal quadrature and the nonuniform fast Fourier transform," Journal of Astronomical Telescopes, Instruments, and Systems 7(2), 021211 (11 January 2021). https://doi.org/10.1117/1.JATIS.7.2.021211
Received: 12 October 2020; Accepted: 16 December 2020; Published: 11 January 2021
Lens.org Logo
CITATIONS
Cited by 3 scholarly publications.
Advertisement
Advertisement
KEYWORDS
Near field diffraction

Diffraction

Fourier transforms

Tolerancing

Detection and tracking algorithms

Telescopes

Apodization

RELATED CONTENT


Back to Top