Future lithography systems must produce denser microchips with smaller feature sizes, while maintaining throughput
comparable to today's optical lithography systems. This places stringent data-handling requirements on the design of
any maskless lithography system. Today's optical lithography systems transfer one layer of data from the mask to the entire
wafer in about sixty seconds. To achieve a similar throughput for a direct-write maskless lithography system with a pixel
size of 22 nm, data rates of about 12 Tb/s are required. Over the past 8 years, we have proposed a datapath architecture
for delivering such a data rate to a parallel array of writers. Our proposed system achieves this data rate contingent on
two assumptions: consistent 10 to 1 compression of lithography data, and implementation of a real-time hardware decoder
capable of decoding 12 Tb/s of data, fabricated on a microchip together with a massively parallel array of lithography
writers.
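The 12 Tb/s figure can be reproduced with a back-of-the-envelope estimate. The calculation below assumes a 300 mm wafer
and 5 bits per pixel, neither of which is stated explicitly here, so it should be read as an illustrative estimate of the
required data rate rather than the exact design point:
\[
\frac{\pi\,(150\ \text{mm})^2}{(22\ \text{nm})^2} \approx 1.46\times 10^{14}\ \text{pixels per layer},
\qquad
\frac{1.46\times 10^{14}\ \text{pixels}\times 5\ \text{bits/pixel}}{60\ \text{s}} \approx 1.2\times 10^{13}\ \text{b/s}
\approx 12\ \text{Tb/s}.
\]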
To address the compression efficiency problem, in the past few years, we have developed a new technique, Context
Copy Combinatorial Coding (C4), designed specifically for microchip layer images, with a low-complexity decoder for
application to the datapath architecture. C4 combines the advantages of JBIG and ZIP to achieve compression ratios higher
than existing techniques. We have also devised Block C4, a variation of C4 with up to a hundred times faster encoding
times, with little or no loss in compression efficiency. While our past work has focused on characterizing the compression
efficiency of C4 and Block C4 on samples from a variety of industrial layouts, there has been no full-chip performance
characterization of these algorithms. In this paper, we present compression efficiency results for Block C4 and competing
techniques such as BZIP2 and ZIP for the Poly, Active, Contact, Metal1, Via1, and Metal2 layers of a complete industry
65 nm layout.
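To make the baseline measurements concrete, the following sketch (in Python, not part of the original work) rasterizes a
toy block of Manhattan layout geometry and reports the compression ratios achieved by ZIP-style DEFLATE (zlib) and BZIP2
on the packed bitmap. The image size, pitch, and 1-bit depth are arbitrary illustrative choices, the printed ratios
reflect only this synthetic pattern, and Block C4 is omitted because it is not available as a standard library:

    import bz2
    import zlib

    import numpy as np

    # Toy stand-in for one block of a rasterized layer: a binary image with
    # repetitive Manhattan (axis-aligned) geometry. A real experiment would
    # rasterize actual layout blocks on the writer's pixel grid instead.
    H, W, PITCH = 1024, 1024, 16
    img = np.zeros((H, W), dtype=np.uint8)
    img[:, ::PITCH] = 1              # vertical lines every PITCH columns
    img[::PITCH, :] ^= 1             # horizontal lines every PITCH rows (XOR keeps the image binary)

    raw = np.packbits(img).tobytes() # 1 bit per pixel, as in a binary layer image

    for name, codec in [("ZIP (zlib)", zlib.compress), ("BZIP2", bz2.compress)]:
        compressed = codec(raw, 9)   # maximum compression level for both codecs
        print(f"{name}: compression ratio = {len(raw) / len(compressed):.1f}")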
Overall, we have found that compression efficiency varies significantly from design to design, from layer to layer,
and even within parts of the same layer. It is difficult, if not impossible, to guarantee a lossless 10 to 1 compression for
all blocks within a layer, as desired in the design of our datapath architecture. Nonetheless, on the most complex Metal1
layer of our 65 nm full-chip microprocessor design, we show that an average lossless compression ratio of 5.2 is attainable,
which corresponds to a throughput of 60 wafer layers per hour over a 1.33 Tb/s board-to-chip communications link. For
reference, state-of-the-art HyperTransport 3.0 offers 0.32 Tb/s per link. These numbers demonstrate the role lossless
compression can play in the design of a maskless lithography datapath.
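The relationship between compression ratio, link rate, and wafer throughput implicit in these numbers can be summarized
by the simple relation below, where the bits-per-layer term is fixed by the wafer coverage and pixel depth assumptions of
the datapath design, which are not fully specified in this summary:
\[
\text{throughput}\ [\text{layers/hour}] \;=\; \frac{3600\ \text{s/hour}\times C \times R_{\text{link}}}
{N_{\text{bits/layer}}},
\]
where $C$ is the achievable lossless compression ratio and $R_{\text{link}}$ is the board-to-chip link rate.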