provided by: Semiconductor International
The move to smaller geometries allows higher transistor counts and increases chip functionality and performance, but it is also reshaping the design and manufacturing flows in unprecedented ways. The wall between manufacturing and design is crumbling even faster than predicted as crucial technologies, tools and interfaces begin to emerge and demonstrate practical value. In this transition, foundational elements of the design flow and manufacturing practice find themselves in a state of radical change. Computational lithography is a key enabler for predicting and preventing systematic design issues that arise from the increasing complexity of photolithographic processes. Recent advances in computational lithography provide important new capabilities, and these continue to expand the changes required in both the design and manufacturing flows to support successful manufacturing yield at advanced process technologies.
For years, well-defined hand-offs existed between design, mask making and wafer manufacturing. The designer, constrained by design rules defined by manufacturing, worked with polygonal structures to implement physical functionality from circuit schematics or register transfer level (RTL). The mask house produced reticles with structures on them with the stipulation that mask rule constraints defining acceptable mask writing capabilities were met. Wafer fabs have continued to meet the challenge of printing these structures on silicon using computational lithography tools and techniques to overcome the physical limitations of lithography equipment patterning at subwavelength resolution. Chip designers, however, are increasingly concerned with the impact of manufacturing variations on device performance, timing closure and power, and are searching for efficient and effective means to bring more information on process variation into the design flow.
Foundational challenges in manufacturing flows are well documented in conference proceedings. Low-k1 lithography forces computational lithographers to overcome hardware resolution limits with innovative illumination schemes and resolution enhancement techniques (RETs). Narrowing process window latitude requires increased layer planarity to overcome depth of focus (DoF) limitations. Mask data preparation groups encounter geometrically increasing data file sizes (>200 GB at 65 nm), driven by large designs and more intricate optical proximity corrections (OPCs). Mask makers now deal with subresolution imaging issues on RET mask features, even at 4× magnification.
Despite increasing complexity, turnaround time is the one constraint in both manufacturing and design that remains largely unchanged, pushing software and hardware advances.
Computational litho inflection points
The subwavelength regime in semiconductor manufacturing was defined when critical feature sizes fell below the 193 nm wavelength of today's scanners. Since the advent of OPC tools, computational lithographers have been faced with the challenge of meeting turnaround times for mask data preparation with increasing computational complexity at each progressive process technology. Both the lithographic challenges and computational complexity associated with the 65 and 45 nm process nodes create a need for advanced capabilities for computational lithography tools and the hardware platforms needed to support them.
The history of computational lithography applications can be divided into three distinct periods, which illustrate the hardware and software advances needed to sustain consistent turnaround times in manufacturing.
250 to 130 nm, 1996 to 2001
As critical dimensions (CDs) descended below the wavelength of the 250 nm scanners (before 193 nm scanners), lithographers started applying simple corrections to limit corner rounding and line-end pullback. These were straightforward adjustments that could be applied effectively using a system of rules that was relatively easy to develop. The resulting computation was fast and was typically done using standard design rule check (DRC) tools. The number of polygons on a layer was small enough that a standard DRC computer with one or two general-purpose processors (GPPs) and <1 GB of memory could do the job.
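A minimal sketch of what such a rule-based correction might look like, assuming lines are plain axis-aligned rectangles and that the rule is a simple lookup from line width to line-end extension. The `Rect` type, the `LINE_END_RULES` values and the function names are hypothetical and chosen only for illustration; production rule decks also consider neighboring shapes and spacing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in nanometres (x0 < x1, y0 < y1)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0

    @property
    def height(self) -> int:
        return self.y1 - self.y0


# Hypothetical rule table: (maximum line width in nm, line-end extension in nm).
# Narrower lines pull back more, so they receive a larger extension.
LINE_END_RULES = [(200, 30), (300, 15)]


def line_end_extension(line_width: int) -> int:
    """Return the extension for a line width, or 0 if no rule applies."""
    for max_width, extension in LINE_END_RULES:
        if line_width <= max_width:
            return extension
    return 0


def correct_line_ends(rect: Rect) -> Rect:
    """Extend both ends of a line along its long axis to offset pullback."""
    if rect.height >= rect.width:
        # Vertical line: the line width is the x extent.
        ext = line_end_extension(rect.width)
        return Rect(rect.x0, rect.y0 - ext, rect.x1, rect.y1 + ext)
    # Horizontal line: the line width is the y extent.
    ext = line_end_extension(rect.height)
    return Rect(rect.x0 - ext, rect.y0, rect.x1 + ext, rect.y1)
```

Because each shape is corrected by a table lookup with no simulation, the cost per polygon is constant, which is consistent with the observation that this kind of correction ran comfortably on ordinary DRC hardware.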
130 to 65 nm, 2001 to 2006
This period brought the first inflection point: the move from single-threaded runs on a single GPP to multi-threaded runs on multiple-GPP platforms. The 180 nm node had still made it to production using simple rule-based corrections with 250 nm scanners. At 130 nm, however, the interaction distances for OPC expanded from the nearest neighbor to several neighbors and beyond, and correcting the critical layers with a rules-based approach became more complicated. Although model-based correction techniques had been developed, users were hesitant to adopt them for fear of long turnaround times caused by high computational loads. This led to the first deployment of multi-threaded solutions, initially on symmetric multi-processing (SMP) machines; distributed processing on loosely coupled clusters then quickly emerged as the dominant multi-threading platform (Fig. 1).
At the time, SMP servers with 8-128 GPPs sharing very large memories in one computer were enjoying a heyday because of extremely high-volume demand driven by the Internet's historically unprecedented growth. Web server applications seemed very well suited to the shared-memory SMP architecture, in which multiple identical tasks (web page requests) needed to access the same large amount of data (web page content). Any and all parts of that content needed to be accessible to the processor responding to the web page hit. However, this functionality led to a non-linear cost-per-core curve, because printed circuit board (PCB) costs as a function of area and device count increase geometrically. Getting more processors or memory chips on a set of boards in a single chassis and tightly coupling them with a motherboard with an extremely high-speed parallel bus led to larger boards. It also became apparent that the overall performance of the system did not scale linearly as the number of processors was increased, caused by contention for memory and disk.
At about the same time, low-cost Linux servers emerged, and loosely coupled computing clusters - standard compute farms (SCFs) - challenged the dominance of SMP systems for many high-performance computing applications. Software applications were adapted to split their content across SCFs, and as a result the general high-performance computing market began moving to SCFs. During this period, computational lithography jobs also began to grow to 32, 64, 128 and even 256 GPPs. At such high GPP counts, the cost per GPP of an SCF was much lower than that of a large SMP. In addition, computational lithography applications were particularly sensitive to the memory contention problem, resulting in a geometric increase in turnaround time as GPP count grew on SMP computers. OPC software applications, on the other hand, were ideal for parallelizing across an SCF. For these reasons, computational lithography applications able to run and scale on SCFs were developed and quickly adopted. The linear relationship between cost and GPP count made the SCF the most cost-effective platform for delivering the best turnaround time on higher GPP count jobs. SCFs using x86 GPPs with Linux operating systems quickly emerged as the standard for computational lithography applications.
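One way to picture why OPC work distributes well over loosely coupled machines is tile-based decomposition, sketched below under stated assumptions. The layout is cut into core tiles, and each tile is processed together with a surrounding halo that stands in for the optical interaction distance, so a worker needs no data from its neighbors while it runs. The function names and the shape-counting placeholder are hypothetical, and a thread pool is used here only as a stand-in for the nodes of a compute farm.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in nm


def make_tiles(chip: Box, tile: int, halo: int) -> List[Tuple[Box, Box]]:
    """Split the chip into core tiles, each paired with a halo-expanded context box."""
    cx0, cy0, cx1, cy1 = chip
    tiles = []
    for y in range(cy0, cy1, tile):
        for x in range(cx0, cx1, tile):
            core = (x, y, min(x + tile, cx1), min(y + tile, cy1))
            # The context box is the core grown by the halo, clipped to the chip.
            context = (max(core[0] - halo, cx0), max(core[1] - halo, cy0),
                       min(core[2] + halo, cx1), min(core[3] + halo, cy1))
            tiles.append((core, context))
    return tiles


def overlaps(a: Box, b: Box) -> bool:
    """True if two boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def process_tile(job):
    """Placeholder for per-tile correction: only counts shapes seen in the context box."""
    (core, context), shapes = job
    return core, sum(1 for s in shapes if overlaps(s, context))


def run(chip: Box, shapes: List[Box], tile: int, halo: int, workers: int = 4):
    """Process every tile independently; results come back in tile order."""
    jobs = [(t, shapes) for t in make_tiles(chip, tile, halo)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_tile, jobs))
```

Since no tile depends on another tile's result, adding machines adds throughput without the shared-memory contention described for SMP systems. The price is redundant work in the halos, which grows as the interaction distance grows.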
Below 65 nm, 2007 and beyond
Two factors instigated the need for a new hardware platform for computational lithography going forward. First, as work on the 45 nm node began in 2006, it became apparent that the amount of computation required to run lithography applications was increasing geometrically because of:
- The rise in the number of layers requiring OPC
- The increasing optical diameter or interaction distance for OPC computations
- More complex models
- The number of process window points needing simulation
- The increase in the complexity of the fundamental correction algorithms
The end result was a geometrically rising cost of ownership reflected in the number of GPPs required to operate a computational lithography application system (Fig. 2).
Therefore, while the hardware cost of an SCF increases linearly with GPP count, the GPP count required to run computational lithography applications on a layer increases geometrically. This leads to an unacceptable increase in hardware ownership cost (including software licenses) for computational lithography applications at 45 nm. In response, new platform approaches were sought, and special-purpose processors (CPAs) are now being applied alongside GPPs to accelerate the most computationally intensive portions of the lithography applications. Using CPAs in this fashion is more cost-effective than adding GPPs, provided the CPA accelerates those portions of the application enough. For example, the Table shows the speedup for a single-precision (32-bit floating-point) fast Fourier transform (FFT) algorithm, one example of the lower-level routines used in image processing applications, on a Cell processor compared with a 3.0 GHz x86 GPP. This led to the use of SCFs augmented with CPAs, known as hybrid compute farms (HCFs, Fig. 3).
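The condition that the CPA must provide "enough" acceleration can be made concrete with Amdahl's law, which is brought in here as an illustrative framing rather than taken from the article. If a fraction f of the run time is spent in the portion that is offloaded and that portion runs s times faster, the overall speedup is 1 / ((1 - f) + f / s). The sketch below uses the kernel speedups from the Table; the 90% accelerated fraction is a hypothetical value, since the article does not state one.

```python
def overall_speedup(accel_fraction: float, accel_speedup: float) -> float:
    """Amdahl's law: whole-job speedup when only part of the work is accelerated.

    accel_fraction -- share of the original run time that is offloaded (0..1)
    accel_speedup  -- speedup of the offloaded portion alone (> 0)
    """
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must be between 0 and 1")
    if accel_speedup <= 0.0:
        raise ValueError("accel_speedup must be positive")
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / accel_speedup)


if __name__ == "__main__":
    # Kernel speedups from the Table; the 0.9 fraction is an assumption.
    for label, kernel_speedup in (("1K-point FFT", 30.0), ("64K-point FFT", 18.0)):
        total = overall_speedup(0.9, kernel_speedup)
        print(f"{label}: kernel {kernel_speedup:.0f}x -> whole job {total:.1f}x")
```

The whole-job gain is bounded by the work left on the GPP: with 10% of the run time not accelerated, the speedup can never exceed 10×, however fast the CPA is. That is why the benefit depends on how much of the application can be moved onto the accelerator.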
The second factor pushing the need for a new hardware platform for computational lithography is that a large part of the semiconductor industry, particularly processor companies, depends on continuously increasing computing capacity to drive revenue growth. Until the past few years, this was accomplished with continuously increasing clock rates. However, power considerations now limit the speed of the underlying transistors in both GPPs and CPAs.
This led to a major shift in the approach of processor companies, from increasing clock speed to increasing the number of processor cores on a die. (It should be noted that computational power might not scale linearly as cores are added.) This change also led to the emergence of new CPA technology that offers attractive attributes for computational lithography:
- Reduced cost
- Dramatically reduced power, cooling and space requirements in the data center
- Turnaround time improvements of 20 to 30× over previous generation systems
Computation and the DFM design flow
Advances in computer platforms and software architectures provide significant enabling benefits for emerging developments in the design for manufacturing (DFM) design flow. At 65 nm and below, there is a host of known systematic and parametric yield limiters. Some are being addressed with rule-driven computational methods. Some require the transposition of process simulation technology to the design domain to effectively model and predict yield-limiting layout conditions.
Recent developments in parallel computing solutions in DRC-like rule-based processing commands have dramatically reduced turnaround times for spacing-based checks like critical area analysis and recommended rule analysis. Innovative architectural changes in EDA software increased scalability of traditionally poor scaling operations, providing the means to minimize turnaround time and enable increased trade-off analysis in design.
DFM applications like litho-friendly design rely on accurately modeling production lithographic processes and replicating the results of those processes upstream in the design flow where topological improvements to enhance yield can be implemented. Flows have been established to address both custom IP development - libraries and application-specific IP blocks - and full-chip or full-block routing-layer checks. Since these DFM-related checks are new requirements in traditional design flows, the computation time associated with these checks is extremely critical to adoption.
The improvement in yield robustness through an application of lithographic process checks is now well established in DFM-related conference proceedings. Most major merchant foundries offer litho-related process kits to their fabless design customers to provide fabless designers access to the same manufacturing information that is available to their IDM counterparts. Advanced EDA tool software architectures that enable high-performance, loosely coupled co-processor acceleration will supply the enabling performance capabilities to provide designers with fast turnaround times for next-generation process technologies, where multiple process window simulations will likely be required.
Access to wafer contour simulation in design brings with it benefits for optimizing design and characterizing the impact of parametric variation. A new capability is emerging with 65 nm development for designers to simulate the process variation effects of lithography and etch, and to use wafer contour information to model device performance and other electrical parameters on patterned vs. drawn shapes. Using these simulation outputs, actual patterned device geometries can be used for device performance modeling. SPICE netlists can be extracted for each of several combinations of process conditions. Simulating the desired circuit with these extracted SPICE netlists allows the designer to accurately perform functional timing and power analysis across a process window with patterned shapes, not just drawn shapes. Using simulated circuit performance results in conjunction with the actual fab process control distributions makes it possible to evaluate the relative impact on yield and chip functional performance at multiple process variation conditions.
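The flow described above, with one extraction per combination of process conditions, can be sketched as a sweep over a focus and dose grid. Everything named here is hypothetical: `toy_gate_length` is a made-up stand-in for contour simulation and extraction and is not a physical model, and the offsets are arbitrary illustrative values.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

Condition = Tuple[float, float]  # (focus offset in nm, dose offset in %)


def process_window(focus_offsets: Iterable[float],
                   dose_offsets: Iterable[float]) -> List[Condition]:
    """Enumerate every focus/dose combination to be simulated."""
    return list(product(focus_offsets, dose_offsets))


def sweep(conditions: Iterable[Condition],
          evaluate: Callable[[Condition], float]) -> Dict[Condition, float]:
    """Evaluate one metric (for example a patterned gate length) per condition."""
    return {cond: evaluate(cond) for cond in conditions}


def toy_gate_length(cond: Condition) -> float:
    """Illustrative stand-in for contour simulation; NOT a physical model."""
    focus, dose = cond
    # Assumed behavior: defocus widens the gate, extra dose narrows it.
    return 65.0 + 0.0005 * focus ** 2 - 0.4 * dose


if __name__ == "__main__":
    conditions = process_window([-50, 0, 50], [-5, 0, 5])
    results = sweep(conditions, toy_gate_length)
    worst = max(results, key=results.get)
    best = min(results, key=results.get)
    print(f"{len(results)} conditions simulated")
    print(f"longest gate at focus/dose {worst}, shortest at {best}")
```

The number of simulations is the product of the grid dimensions, so each added process variable multiplies the run count. This is one reason the turnaround time of these checks matters so much for adoption in the design flow.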
Additional DFM tools are becoming available including improved density fill tools for control, via enclosure tools, and wire modification and via doubling tools. The obvious challenge is to provide an integrated set of capabilities and cost functions that give designers clear guidance on what changes are beneficial under all considered process variation conditions. For example, via doubling, although generally acknowledged to provide increased yield, can be shown to create lithographically unfriendly layout changes in some instances.
With the industry moving toward manufacturing at the 45 and 32 nm half-pitch, new strategies are required to ensure manufacturability. The number of variables influencing the fidelity of pattern transfer is increasing, and the interaction between these variables is leading to increasingly more complex computational challenges.
Table. Speedup for FFT algorithm, CPA vs. GPP
| FFT size | Speedup, CPA vs. GPP |
| --- | --- |
| 1K point | 30× |
| 64K point | 18× |
Charles Albertalli is the director of Calibre RET & MDP marketing for Mentor Graphics. He is a graduate of Washington University in St. Louis.
Tom Kingsley is the director of product marketing for Calibre RET for Mentor Graphics. He is a graduate of UCLA.
author: By Charles Albertalli and Tom Kingsley, Mentor Graphics, Wilsonville, Ore., www.mentor.com
Semiconductor International. Copyright © 2007 Reed Business Information, a division of Reed Elsevier Inc. All rights reserved.