Data augmentation in extreme ultraviolet lithography simulation using convolutional neural network
Abstract

Background

In our previous work, we developed a convolutional neural network (CNN) that reproduces the results of rigorous electromagnetic (EM) simulations in a small mask area. The prediction time of the CNN was 5000 times shorter than the calculation time of the EM simulation. We trained the CNN using 200,000 data, which were the results of EM simulations. Although the prediction time of the CNN was very short, building such a large amount of training data took a long time. In particular, when the mask area is enlarged, the calculation time to prepare the training data becomes unacceptably long.

Aim

To reduce the calculation time required to prepare the training data.

Approach

We apply a data augmentation technique to increase the number of training data from a limited set of original data. The training data of our CNN are the diffraction amplitudes of mask patterns. Assuming a periodic boundary condition, the diffraction amplitudes of a shifted or flipped mask pattern can be calculated easily from the diffraction amplitudes of the original mask pattern.

Results

Data augmentation multiplies the number of training data by 200, from 2500 to 500,000. With this larger amount of training data, the validation loss of the CNN is reduced. The accuracy of the CNN trained with augmented data is verified by comparing its predictions with the results of EM simulation.

Conclusions

A data augmentation technique is applied to the diffraction amplitudes of mask patterns. The data preparation time is reduced by a factor of 200. Our CNN almost reproduces the results of EM simulation. In this work, the mask patterns are restricted to line and space patterns. Building several CNNs for specific mask patterns, or ultimately a single CNN for arbitrary mask patterns, remains a challenge.

1. Introduction

High-aspect-ratio absorbers used in extreme ultraviolet (EUV) masks induce several mask three-dimensional (3D) effects, such as critical dimension (CD) and image placement errors.1,2 It is necessary to include the mask 3D effects in EUV lithography simulation. Mask 3D effects can be calculated rigorously using electromagnetic (EM) simulators.3–7 However, these simulators are far too time-consuming for full-chip applications.

Recently, many attempts have been made to simulate mask 3D effects using deep neural networks (DNNs). They can be classified into three models depending on the target of the DNN. The three possible targets are, going from the mask plane to the wafer plane, the near-field amplitude on the mask, the far-field amplitude (diffraction spectrum) at the pupil of the projection optics, and the image intensity on the wafer. In the first model, the target is the near-field amplitude on the mask calculated by EM simulation.8–11 This model requires many DNNs to reproduce the different near-field amplitudes that arise depending on the source position. In the second model, which is our model,12 the target of the DNN is the far-field amplitude at the pupil of the projection optics. Because the far-field amplitudes are described in momentum (wave vector) space and the source position corresponds to the incident momentum in Koehler illumination, our model naturally parametrizes the source position dependence of the amplitude. The third model13,14 uses the image intensity on the wafer as the target of the DNN. This model is more straightforward than the others because the image intensity is what is used in the subsequent resist simulation. However, the phase information is lost when the diffraction amplitude is converted to the image intensity. The phase of the amplitude is not included in the targets of this model, yet it indirectly influences the focus dependence of the intensity. Therefore, this model needs many intensity targets at different focus positions for each mask pattern.

In our previous work,12 we developed a convolutional neural network (CNN) that reproduces the results of rigorous EM simulation in a small mask area. The prediction time of the CNN was 5000 times shorter than the calculation time of the EM simulation. We trained the CNN using 200,000 data, which were the results of EM simulation. Such a large amount of data was necessary to reduce the validation loss during training. Although the prediction time of the CNN was very short, it took a long time (1 week) to build the training data. Creating the training data was feasible in that work because the mask area was small. However, when we enlarge the mask area for large-area optical proximity correction (OPC), the calculation time to prepare the training data becomes unacceptably long. In the large-area OPC process, the large mask area is clipped into many small mask areas. The clipped areas need to be large enough that the surrounding mask pattern has no influence, at least near the center of each clipped area.

In this work, we apply data augmentation, a standard technique in deep learning, to our CNN. The technique allows us to increase the number of training data without performing additional EM calculations, which significantly reduces the time needed to prepare the training data. In Sec. 2, we explain the details of our data augmentation technique. In this work, we focus on the application to metal layers and assume that the mask patterns are simple line and space patterns. In Sec. 3, we study the accuracy of the CNN predictions of CDs and edge placement errors (EPEs). Section 4 is the summary.

2. Data Augmentation for Large Mask Patterns

In the previous work,12 we assumed a periodic mask pattern with a 720 nm × 720 nm mask area. When we clip a small mask area out of the mask data, we should not use the edges of the mask area, to avoid the influence of the neighboring mask pattern. According to Ref. 15, the optical interaction range Ropt is calculated as

Eq. (1)

$$R_{\mathrm{opt}} = \frac{1.12\,\lambda}{\sigma\,\mathrm{NA}},$$
where λ, σ, and NA represent the wavelength, coherence factor, and numerical aperture of the scanner, respectively. The wavelength of EUV light is 13.5 nm, and the numerical aperture of the current EUV scanner is 0.33. The coherence factor depends on the illumination setting; a typical value is 0.5. Inserting these values into Eq. (1) gives an optical interaction range Ropt = 90 nm. This value is a length on the wafer, and lengths are multiplied by four on the mask. Therefore, the optical interaction range on the mask is 4 × Ropt = 360 nm. Figure 1 shows the usable mask area, which excludes the area influenced by the neighboring mask pattern. The mask size L must be larger than 720 nm for any usable mask area to remain. Therefore, there was no usable area for large-area OPC in our previous work.

Fig. 1 Usable mask area.

In this work, we choose a 1024 nm × 1024 nm mask area. The usable mask area is then about 300 nm × 300 nm. The usable area is not large, but the EM simulation time depends strongly on the size of the mask area. The calculation for a 1024 nm × 1024 nm mask area takes 162 s on a Core i9-9900K CPU. In the simulation, we use the 3D waveguide model,5–7 which solves coupled wave equations in momentum space. The calculation time depends strongly on the cut-off momentum. In this work, we include the momenta (kx, ky) that satisfy

Eq. (2)

$$\left(\frac{|k_x|}{k_x^{\max}} + 1\right)\left(\frac{|k_y|}{k_y^{\max}} + 1\right) \le 2,$$
where $k_x^{\max} = k_y^{\max} = 6 \cdot \frac{\mathrm{NA}}{4} \cdot \frac{2\pi}{\lambda}$. This cut-off is six times larger than the pupil radius $\frac{\mathrm{NA}}{4} \cdot \frac{2\pi}{\lambda}$. Discretizing the momentum in steps of 2π/L, there are 2121 (kx, ky) pairs that satisfy Eq. (2). Because there are two polarizations, the matrix of the coupled wave equations has size 4242 × 4242. The region defined by Eq. (2) is a quasi-hyperbola, which resembles the diffraction spectrum of mask patterns consisting of vertical and horizontal lines or holes. Mask patterns are conventionally designed in XY coordinates, and the minimum pattern pitch in the X or Y direction is small compared with the minimum pitch in the diagonal direction. Therefore, in momentum space, the diffraction amplitude decreases rapidly in the diagonal direction compared with the X and Y directions.
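The cut-off in Eq. (2) can be checked numerically. The following is a minimal numpy sketch (our own code, not the paper's) that enumerates the discrete momenta in grid units of 2π/L; the exact count depends on the boundary convention, but it lands near the 2121 pairs quoted above.

```python
import numpy as np

wavelength = 13.5   # nm, EUV
NA = 0.33           # numerical aperture (wafer side)
L = 1024.0          # nm, mask period in this work

# k_max = 6 * (NA/4) * (2*pi/lambda), expressed in grid units of 2*pi/L:
n_max = 6 * (NA / 4) * (L / wavelength)   # ~37.5

n = np.arange(-int(n_max), int(n_max) + 1)
nx, ny = np.meshgrid(n, n, indexing="ij")

# Eq. (2): (|kx|/kx_max + 1) * (|ky|/ky_max + 1) <= 2 (quasi-hyperbolic region)
keep = (np.abs(nx) / n_max + 1) * (np.abs(ny) / n_max + 1) <= 2
print(int(keep.sum()))  # close to the 2121 pairs quoted in the text
```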

DNNs require a large amount of training data. In the previous work, we used 200,000 training data. It would take about a year to calculate the same number of data for the mask area used in this work. Data augmentation is a powerful deep learning technique to increase the number of training data from limited original data. In our CNN, the input is the mask pattern, and the outputs are the far-field diffraction amplitudes A(l,m;ls,ms), where (l,m) is the diffraction order and (ls,ms) is the source position (Fig. 2). In the 3D waveguide model,5–7 not only the diffraction momentum but also the source position (or incident momentum) is discretized in steps of 2π/L. As discussed in Ref. 12, assuming the largest σ value to be 1, the diffraction order and the source position are restricted by the pupil shape and the source shape as follows:

Eq. (3)

$$\sqrt{(l + l_s)^2 + (m + m_s)^2} \le \frac{\mathrm{NA}}{4}\,\frac{L}{\lambda},$$

Eq. (4)

$$\sqrt{l_s^2 + m_s^2} \le \frac{\mathrm{NA}}{4}\,\frac{L}{\lambda}.$$

When L = 1024 nm, the number of possible combinations of (l,m) is 457.
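The count of 457 can likewise be estimated by enumerating the orders (l,m) for which at least one discrete source point satisfies Eqs. (3) and (4) simultaneously. A hedged sketch, with our own naming and boundary conventions:

```python
import numpy as np

wavelength, NA, L = 13.5, 0.33, 1024.0
R = (NA / 4) * (L / wavelength)   # pupil radius in grid units, ~6.26

r = int(np.ceil(R))
# Source points allowed by Eq. (4) with sigma = 1:
src = [(ls, ms) for ls in range(-r, r + 1) for ms in range(-r, r + 1)
       if ls**2 + ms**2 <= R**2]

count = 0
for l in range(-2 * r, 2 * r + 1):
    for m in range(-2 * r, 2 * r + 1):
        # Eq. (3): the diffracted order must fall inside the pupil
        if any((l + ls)**2 + (m + ms)**2 <= R**2 for ls, ms in src):
            count += 1
print(count)  # should be close to the 457 combinations quoted for L = 1024 nm
```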

Fig. 2 Schematic view of light diffraction by an EUV mask.

When the mask pattern is shifted or Y-flipped, as shown in Fig. 3, the diffraction amplitudes of these patterns can be calculated easily from those of the original pattern. Note that in EUV reflective optics, an X-flip is not symmetrical, because the chief ray is tilted by 6 deg in the Y direction.
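Under periodic boundary conditions, the augmentation reduces to simple operations on the stored amplitudes. The sketch below states the two rules, assuming the amplitudes are kept in a dictionary keyed by (l, m, ls, ms) (our own storage choice, not the paper's): a shift by (dx, dy) multiplies the order-(l,m) amplitude by a linear phase whose sign depends on the Fourier convention, and the Y-flip (taken here as mirroring about the Y axis, the symmetry that survives the tilted chief ray) re-indexes the orders and source points.

```python
import numpy as np

def shift_amplitudes(A, dx, dy, L=1024.0):
    """Amplitudes of the mask shifted by (dx, dy) nm. The phase sign
    follows our Fourier convention and may need flipping in others."""
    return {(l, m, ls, ms): a * np.exp(-2j * np.pi * (l * dx + m * dy) / L)
            for (l, m, ls, ms), a in A.items()}

def yflip_amplitudes(A):
    """Amplitudes of the Y-flipped mask (x -> -x). Polarization mixing is
    negligible for EUV masks (Ref. 12), so the x-polarized amplitude is
    simply re-indexed: (l, ls) -> (-l, -ls)."""
    return {(-l, m, -ls, ms): a for (l, m, ls, ms), a in A.items()}
```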

Fig. 3 Original, shifted, and Y-flipped mask patterns.

Following Ref. 12, the far-field diffraction amplitude A(l,m;ls,ms) is divided into the thin mask amplitude AFT(l,m) (the Fourier transform of the mask pattern), which does not depend on the source position (ls,ms), and the mask 3D amplitude A3D(l,m;ls,ms):

Eq. (5)

$$A(l, m; l_s, m_s) = A_{\mathrm{FT}}(l, m) + A_{\mathrm{3D}}(l, m; l_s, m_s).$$
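In the decomposition of Eq. (5), only the thin mask amplitude needs to be recomputed per pattern, and it is a plain Fourier transform. A minimal sketch, assuming a unit-amplitude normalization (the paper's normalization may differ):

```python
import numpy as np

def thin_mask_amplitude(mask):
    """mask: 2D array sampled over one 1024 nm x 1024 nm period.
    Returns A_FT with A_FT[0, 0] the zeroth diffraction order; negative
    orders are reached with numpy's negative indices, e.g., A_FT[-1, 2]."""
    return np.fft.fft2(mask) / mask.size

# The mask 3D amplitude of Eq. (5) is then the EM result minus A_FT:
#   A3D(l, m; ls, ms) = A(l, m; ls, ms) - A_FT[l, m]
```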

Figure 4 shows the source position dependence of the mask 3D amplitude. The source positions where the amplitude contributes to the image intensity are limited by the source shape and the pupil shape; only the overlapping area in Fig. 4 contributes to the image intensity. We approximate the mask 3D amplitude in this area by a linear function of the source position (ls,ms) as follows:

Eq. (6)

$$A_x^{\mathrm{3D}}(l, m; l_s, m_s) \approx a_0(l, m) + a_x(l, m)\left(l_s + \frac{l}{2}\right) + a_y(l, m)\left(m_s + \frac{m}{2}\right),$$

where a0 is the mask 3D amplitude at the center of the overlapping area, (ls,ms) = (−l/2, −m/2), and ax and ay are the slopes of the amplitude in the X and Y directions on the source plane, respectively. We call these three numbers the mask 3D parameters.

Fig. 4 Source position dependence of the mask 3D amplitude.

Equation (6) is slightly different from equation (11) in Ref. 12. We modified the equation for the following reason. The 3D waveguide model calculates the diffraction amplitudes at the grid points in Fig. 4. The model solves coupled wave equations, so all the amplitudes are calculated simultaneously. The mask 3D parameters are derived by a least-squares fit to the amplitudes in the overlapping area. The larger (l,m) is, the fewer grid points lie inside the overlapping area. If the number of grid points is too small, the overlapping area degenerates into a line or a single point. In such cases, we approximate the amplitude in the area using only a0(l,m), taken as the average of the amplitudes, and do not use ax(l,m) and ay(l,m). Therefore, the number of a0(l,m) values is 457, whereas the number of ax(l,m) or ay(l,m) values is 349.
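The least-squares fit is small enough to write out directly. A sketch under our own naming: `pts` holds the grid points (ls, ms) inside the overlapping area O for a given order (l, m), and `amps` the complex mask 3D amplitudes at those points.

```python
import numpy as np

def fit_mask3d_params(pts, amps, l, m):
    """Fit Eq. (6) by least squares and return (a0, ax, ay). For the large
    orders where O degenerates, only the average a0 is used (not shown)."""
    ls, ms = np.asarray(pts, dtype=float).T
    X = np.column_stack([np.ones_like(ls), ls + l / 2, ms + m / 2])
    (a0, ax, ay), *_ = np.linalg.lstsq(X, np.asarray(amps), rcond=None)
    # Because O is symmetric about (-l/2, -m/2), Eq. (8) holds and the
    # fitted a0 coincides with the plain average of Eq. (7):
    assert np.isclose(a0, np.mean(amps))
    return a0, ax, ay
```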

We thus use two different methods to derive a0(l,m): a least-squares fit for small (l,m) and a plain average for large (l,m). The parametrization in Eq. (6) ensures that the fitted a0(l,m) is always the average of the amplitudes. Defining the overlapping area as O and the number of grid points in the area as N, the least-squares fit of Eq. (6) gives

Eq. (7)

$$a_0(l, m) = \frac{1}{N} \sum_{(l_s, m_s) \in O} A_x^{\mathrm{3D}}(l, m; l_s, m_s),$$

because

Eq. (8)

$$\sum_{(l_s, m_s) \in O} \left(l_s + \frac{l}{2}\right) = \sum_{(l_s, m_s) \in O} \left(m_s + \frac{m}{2}\right) = 0.$$

Using Eq. (6) reduces the CNN training loss for a0 compared with equation (11) in Ref. 12.

In Eq. (6), we consider only the x-polarized amplitude. The difference between the diffraction amplitudes of the x and y polarizations is very small, and the polarization change caused by EUV mask diffraction is negligible, as shown in Ref. 12.

The values of the mask 3D parameters are determined by the mask pattern and the absorber. In this work, we use a Ta absorber with 50-nm thickness. We construct a CNN model that predicts the mask 3D parameters from the mask pattern. Figure 5 shows the architecture of our CNN. Six independent CNNs are used for the real and imaginary parts of the three mask 3D parameters; the six CNNs are merged into one model after training. The input is a random line and space pattern with a mask area of 1024 nm × 1024 nm. The pattern size is randomly selected between 60 and 160 nm (15 to 40 nm on the wafer). Half of the training data are bright field (BF) masks, and the rest are dark field (DF) masks. The 1024 × 1024 binary data are averaged down to 256 × 256 float data before being input to the CNNs. Circular padding16 is used because we assume periodic boundary conditions for the input mask patterns.
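The exact architecture is given in Fig. 5; below is only a minimal PyTorch sketch of one of the six per-parameter CNNs. The layer counts and widths are illustrative assumptions, not the paper's design; the point is the circular padding, which matches the periodic boundary condition of the input mask patterns.

```python
import torch
import torch.nn as nn

class Mask3DParamCNN(nn.Module):
    """Illustrative stand-in for one of the six CNNs (e.g., Real(a0))."""
    def __init__(self, n_targets=457):   # 457 targets for a0, 349 for ax/ay
        super().__init__()
        self.features = nn.Sequential(
            # circular padding wraps the borders, honoring periodicity
            nn.Conv2d(1, 32, 3, padding=1, padding_mode="circular"),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1, padding_mode="circular"),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 64 * 64, n_targets),
        )

    def forward(self, x):   # x: (batch, 1, 256, 256) averaged mask pattern
        return self.head(self.features(x))
```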

Fig. 5 Architecture of our CNN.

Figure 6 shows the training and validation loss for Real(a0) with and without data augmentation. The number of original training data is 2500, and the number of validation data is 1000. With data augmentation, the original data are shifted in 103 nm increments in both the X and Y directions and flipped along the Y axis. The number of training data after augmentation is therefore multiplied by 200, to 500,000.
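For reference, the 200× factor decomposes as 10 X-shifts times 10 Y-shifts times 2 for the Y-flip; a short sketch of the transform list (applying each transform uses the shift/flip rules sketched above):

```python
# 10 shifts of 103 nm in X and Y over the 1024 nm period, with and without
# the Y-flip, give 200 variants per original pattern.
shifts = [k * 103 for k in range(10)]          # 0, 103, ..., 927 nm
transforms = [(dx, dy, flip)
              for dx in shifts for dy in shifts for flip in (False, True)]
print(len(transforms))                         # 200
```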

Fig. 6 Training and validation loss for Real(a0) with/without data augmentation.

Without data augmentation, the training loss decreases during training, whereas the validation loss does not. This is a typical overfitting phenomenon. With data augmentation, both the training loss and the validation loss decrease during training, and the validation loss after training is small. The mean square error per target after training is 1.52/457 = 0.0033. For each of the 457 targets, the maximum value in the training data is normalized to 1.

Figure 7 compares the mask 3D parameters at several diffraction orders for 100 test data. The correlation between the parameters computed by EM simulation and the CNN predictions is generally good. There are some exceptions, such as Real(ax(0,0)), where the correlation is poor. However, its value is very small compared with the values of the other parameters.

Fig. 7 Mask 3D parameters calculated by EM simulation and parameters predicted by CNN.

3. CNN Prediction Accuracy

The accuracy of our CNN is verified by calculating the image intensities of test mask patterns. The training data in this work are random line and space patterns, so standard line and space test mask patterns are used to confirm the accuracy of the CNN. Figure 8 compares the image intensities of a line mask pattern computed by EM simulation, by Fourier transformation (FT), and by CNN prediction. In the calculations, we assume λ = 13.5 nm, NA = 0.33, and annular illumination with σin/σout = 0.3/0.8. The bottom figures show the differences between the intensity from EM simulation and those from FT and CNN prediction. The difference between EM and CNN is much smaller than the difference between FT and EM.
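For context, image intensities follow from the far-field amplitudes by Abbe's method: for each source point, the orders inside the pupil are summed coherently, and the resulting intensities are added over the source. A hedged numpy sketch with our own indexing and normalization (equal source-point weights over the annulus are assumed):

```python
import numpy as np

def aerial_image(A, orders, src_pts, x, y, L=1024.0):
    """A[(l, m, ls, ms)]: far-field amplitudes; orders/src_pts: the (l, m)
    and (ls, ms) kept by Eqs. (3) and (4); x, y: image-plane coordinate
    grids in mask-scale nm (4x wafer scale)."""
    I = np.zeros(np.broadcast(x, y).shape)
    for ls, ms in src_pts:
        field = np.zeros_like(I, dtype=complex)
        for l, m in orders:
            if (l, m, ls, ms) in A:   # order inside the pupil for this source
                field += A[(l, m, ls, ms)] * np.exp(
                    2j * np.pi * (l * x + m * y) / L)
        I += np.abs(field) ** 2       # incoherent sum over source points
    return I / len(src_pts)
```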

Fig. 8 Line mask pattern and its image intensities by EM simulation, FT, and CNN prediction.

Figure 9 compares the CDs and EPEs of vertical (V) lines with several line widths, measured at the cut line across the V lines in Fig. 8. In addition to EM, CNN, and FT, we plot the result of a simulation using the linear (LIN) approximation of the diffraction amplitude in Eq. (6) (LIN in Fig. 9). The difference between EM and LIN indicates the accuracy of the linear approximation in Eq. (6), and the difference between LIN and CNN indicates the accuracy of the CNN prediction. The agreement among the EM simulation, the linear approximation, and the CNN prediction is good. Figure 10 shows the results for horizontal (H) lines. The agreement between LIN and CNN is good, but there is a small difference between EM and LIN. Adding higher-order terms to Eq. (6) may help reduce this error.
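The LIN curves amount to rebuilding the amplitude from the fitted parameters via Eqs. (5) and (6). A small sketch with our own naming, where a0, ax, ay are dicts keyed by (l, m) and the slopes exist only for the 349 small orders:

```python
def lin_amplitude(A_FT, a0, ax, ay, l, m, ls, ms):
    """Linear-model estimate of A(l, m; ls, ms) per Eqs. (5) and (6)."""
    a3d = a0[(l, m)]
    if (l, m) in ax:                  # slopes exist only for small (l, m)
        a3d += ax[(l, m)] * (ls + l / 2) + ay[(l, m)] * (ms + m / 2)
    return A_FT[l, m] + a3d
```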

Fig. 9 CDs and EPEs of vertical lines.

Fig. 10 CDs and EPEs of horizontal lines.

Figure 11 compares the image intensities of a space mask pattern, and Figs. 12 and 13 show the CDs and EPEs of V and H spaces with several space widths. The results for space patterns are similar to those for line patterns.

Fig. 11 Space mask pattern and its image intensities by EM simulation, FT, and CNN prediction.

Fig. 12 CDs and EPEs of vertical spaces.

Fig. 13 CDs and EPEs of horizontal spaces.

4. Summary

A data augmentation technique was applied to the diffraction amplitudes of mask patterns. The diffraction amplitudes of shifted or Y-flipped mask patterns were calculated from the diffraction amplitudes of the original mask patterns. The number of training data after augmentation is multiplied by 200, from 2500 to 500,000. With this larger amount of training data, the validation loss of the CNN was significantly reduced compared with the validation loss without augmentation.

We verified the accuracy of our CNN by comparing CNN predictions with the results of EM simulation. Our CNN almost reproduced the CDs and EPEs of line and space patterns.

In this work, the mask patterns were restricted to line and space patterns. We did not include hole patterns, patterns with serifs and assist bars, or curvilinear patterns in the training data, and we do not expect our CNN to correctly predict images for such patterns. A neural network is only as good as the data we feed it. It remains a challenge to build several CNNs for specific mask patterns or, ultimately, a single CNN for arbitrary mask patterns.

This work is based on our prior SPIE proceedings paper.17

References

1. V. Philipsen, “Mask is key to unlock full EUV potential,” Proc. SPIE 11609, 1160904 (2021). https://doi.org/10.1117/12.2584583
2. A. Erdmann et al., “3D mask effects in high NA EUV imaging,” Proc. SPIE 10957, 109570Z (2019). https://doi.org/10.1117/12.2515678
3. A. Wong, “TEMPEST users’ guide” (1994).
4. M. G. Moharam and T. K. Gaylord, “Rigorous coupled-wave analysis of planar-grating diffraction,” J. Opt. Soc. Am. 71(7), 811 (1981). https://doi.org/10.1364/JOSA.71.000811
5. H. Tanabe, “Modeling of optical images in resist by vector potentials,” Proc. SPIE 1674, 637 (1992). https://doi.org/10.1117/12.130360
6. K. D. Lucas, H. Tanabe, and A. J. Strojwas, “Efficient and rigorous three-dimensional model for optical lithography simulation,” J. Opt. Soc. Am. A 13, 2187 (1996). https://doi.org/10.1364/JOSAA.13.002187
7. P. Evanschitzky and A. Erdmann, “Fast near field simulation of optical and EUV masks using the waveguide method,” Proc. SPIE 6533, 65330Y (2007). https://doi.org/10.1117/12.736978
8. S. Lan et al., “Deep learning assisted fast mask optimization,” Proc. SPIE 10587, 105870H (2018). https://doi.org/10.1117/12.2297514
9. P. Liu, “Mask synthesis using machine learning software and hardware platforms,” Proc. SPIE 11327, 1132707 (2020). https://doi.org/10.1117/12.2551816
10. R. Pearman et al., “Fast all-angle mask 3D ILT patterning,” Proc. SPIE 11327, 113270F (2020). https://doi.org/10.1117/12.2554856
11. J. Lin et al., “Fast mask near-field calculation using fully convolution network,” in Int. Workshop on Advanced Patterning Solutions (2020). https://doi.org/10.1109/IWAPS51164.2020.9286805
12. H. Tanabe, S. Sato, and A. Takahashi, “Fast EUV lithography simulation using convolutional neural network,” J. Micro/Nanopattern. Mater. Metrol. 20(4), 041202 (2021). https://doi.org/10.1117/1.JMM.20.4.041202
13. W. Ye et al., “TEMPO: fast mask topography effect modeling with deep learning,” in Int. Symp. on Physical Design, 127–134 (2020). https://doi.org/10.1145/3372780.3375565
14. A. Awad et al., “Accurate prediction of EUV lithographic images and 3D mask effects using generative networks,” J. Micro/Nanopattern. Mater. Metrol. 20(4), 043201 (2021). https://doi.org/10.1117/1.JMM.20.4.043201
15. A. Wong, Resolution Enhancement Techniques in Optical Lithography, SPIE Press, Bellingham, WA (2001).
16. S. Schubert et al., “Circular convolutional neural networks for panoramic images and laser data,” in Proc. IEEE Intelligent Vehicles Symp. (IV) (2019). https://doi.org/10.1109/IVS.2019.8813862
17. H. Tanabe and A. Takahashi, “Data augmentation in EUV lithography simulation based on convolutional neural network,” Proc. SPIE 12052, 120520T (2022). https://doi.org/10.1117/12.2615267

Biography

Hiroyoshi Tanabe is a researcher at Tokyo Institute of Technology. He received his PhD in physics from the University of Tokyo in 1986. He has more than 30 years of experience in optical and EUV lithography. He is the author of more than 30 papers. He was the program committee chair of Photomask Japan in 2003 and 2004. His current research interests include EUV masks and lithography simulation. He is a member of SPIE.

Atsushi Takahashi received his BE, ME, and DE degrees in electrical and electronic engineering from Tokyo Institute of Technology, Tokyo, Japan, in 1989, 1991, and 1996, respectively. He is currently a professor in the Department of Information and Communications Engineering, School of Engineering, Tokyo Institute of Technology. His research interests are in VLSI layout design and combinatorial algorithms. He is a fellow of IEICE, a senior member of IEEE and IPSJ, and a member of ACM.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Hiroyoshi Tanabe and Atsushi Takahashi "Data augmentation in extreme ultraviolet lithography simulation using convolutional neural network," Journal of Micro/Nanopatterning, Materials, and Metrology 21(4), 041602 (14 October 2022). https://doi.org/10.1117/1.JMM.21.4.041602
Received: 12 May 2022; Accepted: 29 July 2022; Published: 14 October 2022