ISPD ’22: Proceedings of the 2022 International Symposium on Physical Design
SESSION: Session 1: Opening Session and First Keynote
Session details: Session 1: Opening Session and First Keynote
- Laleh Behjat
- Stephen Yang
The Need for Speed: From Electric Supercars to Cloud Bursting for Design
- Dean Drako
Our industry has an insatiable need for speed. In addition to fast products for consumer electronics, medical, mil-aero, security, smart sensors, AI processing, robots, and more, we also continuously push for higher performance in the processing and communication infrastructure needed for true hyper-connectivity.
Dean Drako will compare our industry’s drive for speed to electric supercars. He will then drill down into four key elements that advanced design and verification teams deploy to speed the delivery of their innovative products to successfully meet market windows.
SESSION: Session 2: Placement, Clock Tree Synthesis, and Optimization
Session details: Session 2: Placement, Clock Tree Synthesis, and Optimization
- Deepashree Sengupta
RTL-MP: Toward Practical, Human-Quality Chip Planning and Macro Placement
- Andrew B. Kahng
- Ravi Varadarajan
- Zhiang Wang
In a typical RTL-to-GDSII flow, floorplanning plays an essential role in achieving decent quality of results (QoR). A good floorplan typically requires interaction between the frontend designer, who is responsible for the functionality of the RTL, and the backend physical design engineer. The increasing complexity of macro-dominated designs (especially machine learning accelerators with autogenerated RTL) has made the floorplanning task even more challenging and time-consuming. In this paper, we propose RTL-MP, a novel macro placer which utilizes RTL information and tries to “mimic” the interaction between the frontend RTL designer and the backend physical design engineer to produce human-quality floorplans. By exploiting the logical hierarchy and processing logical modules based on connection signatures, RTL-MP can capture the dataflow inherent in the RTL and use the dataflow information to guide macro placement. We also apply autotuning to optimize hyperparameter settings based on input designs. We have built RTL-MP based on OpenROAD infrastructure and applied RTL-MP to a set of industrial designs. RTL-MP outperforms state-of-the-art commercial macro placers and achieves QoR similar to that of handcrafted floorplans.
Clock Design Methodology for Energy and Computation Efficient Bitcoin Mining Machines
- Chien-Pang Lu
- Iris Hui-Ru Jiang
- Chih-Wen Yang
Bitcoin mining machines have become a new driving force pushing the physical limits of semiconductor process technology. Instead of peak performance, mining machines pursue energy and computation efficiency in implementing cryptographic hash functions. Therefore, state-of-the-art ASIC designs for mining machines adopt near-threshold computing, deep pipelines, and uni-directional data flow. Based on these design properties, in this paper, we propose a novel clock reversing tree design methodology for bitcoin mining machines. In the clock reversing tree, the clock of the global tree is fed from the last pipeline stage backward to the first one, and the clock latency difference between the local clock roots of two consecutive stages maintains a constant delay. The local tree of each stage is well balanced and keeps the same clock latency. This special clock topology naturally utilizes setup time slacks to gain hold time margins. Moreover, to alleviate the on-chip variations incurred by near-threshold computing, we maximize the common clock path shared by the flip-flops of each individual stage. Finally, we perform inverter pair swapping to maintain the duty cycle. Experimental results show that our methodology is promising for industrial bitcoin mining designs: compared with two variation-aware clock network synthesis approaches widely used in modern ASIC designs, our approach can reduce up to 64% of clock buffer/inverter usage and 12% of clock power, decrease hold time violating paths by 99%, and achieve 85% area saving for timing fixing. The proposed clock design methodology is general and applicable to blockchain and other ASICs with deep pipelines and strong data flow.
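The trade of setup slack for hold margin mentioned in this abstract can be illustrated with the standard single-cycle timing checks. The following is a minimal sketch, not the authors' methodology: the function names and all delay values (in picoseconds) are hypothetical. Skew is taken as capture-clock latency minus launch-clock latency, so feeding the clock backward by a constant stage delay gives a negative skew.

```python
def setup_slack(period, skew, t_cq, t_logic_max, t_setup):
    """Setup slack of a launch->capture path; skew = capture latency - launch latency."""
    return period + skew - t_cq - t_logic_max - t_setup


def hold_slack(skew, t_cq, t_logic_min, t_hold):
    """Hold slack of the same path, using the same skew convention."""
    return t_cq + t_logic_min - t_hold - skew


def compare(stage_delay):
    """Compare a balanced tree (skew 0) with a reversed tree (skew = -stage_delay).

    All numbers below are hypothetical and only chosen to make the effect visible.
    """
    period, t_cq, t_max, t_min, t_setup, t_hold = 1000, 60, 700, 20, 40, 90
    result = {}
    for name, skew in (("balanced", 0), ("reversed", -stage_delay)):
        result[name] = {
            "setup": setup_slack(period, skew, t_cq, t_max, t_setup),
            "hold": hold_slack(skew, t_cq, t_min, t_hold),
        }
    return result


if __name__ == "__main__":
    for name, slacks in compare(stage_delay=50).items():
        print(name, slacks)
```

With these assumed numbers, the reversed clock direction turns a hold violation into a positive hold margin at the cost of an equal amount of setup slack.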
Kernel Mapping Techniques for Deep Learning Neural Network Accelerators
- Sarp Özdemir
- Mohammad Khasawneh
- Smriti Rao
- Patrick H. Madden
Deep learning applications are compute intensive and naturally parallel; this has spurred the development of new processor architectures tuned for the workload. In this paper, we consider structural differences between deep learning neural networks and more conventional circuits, highlighting how these differences impact strategies for mapping neural network compute kernels onto available hardware. We present an efficient mapping approach based on dynamic programming, and also a method to establish performance bounds. We also propose an architectural approach to extend the practical lifetime of hardware accelerators, enabling the integration of a variety of heterogeneous processors into a high-performance system. Experimental results using benchmarks from a recent ISPD contest are also reported.
SESSION: Session 3: Design Flow Advances with Machine Learning and Lagrangian Relaxation
Session details: Session 3: Design Flow Advances with Machine Learning and Lagrangian Relaxation
- Ulf Schlichtmann
Design Flow Parameter Optimization with Multi-Phase Positive Nondeterministic Tuning
- Matthew M. Ziegler
- Lakshmi N. Reddy
- Robert L. Franch
Synthesis and place & route tools are highly leveraged for modern digital design. But, despite continuous improvement in CAD tool performance, products in competitive markets often set PPA (performance, power, area) targets beyond what the tools can natively deliver. These aggressive targets lead circuit designers to tune a vast number of design flow parameters in search of near-optimal, design-specific flow recipes. Compounding the complex design flow parameter tuning problem is that many digital design tools exhibit nondeterminism, i.e., run-to-run variation. While CAD tool nondeterminism is typically considered an undesirable behavior, this paper proposes design flow tuning methodologies that take advantage of nondeterminism. We propose techniques that run targeted scenarios multiple times to exploit the positive deviations nondeterminism can produce, and that leverage the best observed runs as seeds for multi-phase tuning. We introduce three seed variants for multi-phase tuning that have a spectrum of characteristics, trading off PPA improvement against reduced run-to-run variation. Our experimental analysis using high-performance industrial designs shows that the proposed novel techniques outperform an existing state-of-the-art industrial design flow tuning program across all PPA metrics. Furthermore, our proposed approaches reduce run-to-run variation of the best scenarios, leading to a more predictable design flow.
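The idea of repeating scenarios and seeding later phases with the best observed run can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's tuning program: `run_flow` is a hypothetical stand-in for a nondeterministic tool run (a smooth quality function plus Gaussian noise), and the parameter names and phase schedule are invented for the example.

```python
import random


def run_flow(params, rng):
    """Hypothetical noisy flow run: higher score is better, noise models run-to-run variation."""
    quality = -sum((v - 0.7) ** 2 for v in params.values())
    return quality + rng.gauss(0.0, 0.05)


def best_of(params, rng, repeats):
    """Run the same scenario several times and keep the best observed score."""
    return max(run_flow(params, rng) for _ in range(repeats))


def multi_phase_tune(seed_params, phases=3, candidates=8, repeats=4, rng_seed=0):
    rng = random.Random(rng_seed)
    seed = dict(seed_params)
    seed_score = best_of(seed, rng, repeats)
    history = [seed_score]
    for phase in range(phases):
        step = 0.3 / (phase + 1)  # narrow the search around the seed in later phases
        best, best_score = seed, seed_score
        for _ in range(candidates):
            cand = {k: min(1.0, max(0.0, v + rng.uniform(-step, step)))
                    for k, v in seed.items()}
            score = best_of(cand, rng, repeats)
            if score > best_score:
                best, best_score = cand, score
        seed, seed_score = best, best_score  # best observed run seeds the next phase
        history.append(seed_score)
    return seed, seed_score, history


if __name__ == "__main__":
    params, score, history = multi_phase_tune({"effort": 0.2, "density": 0.9})
    print(params, score, history)
```

Note that a best-of-repeats score is optimistically biased, which is one reason a real methodology would also care about the run-to-run variation of the chosen scenario.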
Integrating LR Gate Sizing in an Industrial Place-and-Route Flow
- David Chinnery
- Ankur Sharma
Lagrangian relaxation (LR) based gate sizing is the state-of-the-art gate-sizing approach. Integrating it within a place-and-route (P&R) tool is difficult as LR needs multiple iterations to converge, requiring very fast timing analysis. Gate sizing is invoked in many P&R flow steps, so it is also unclear where best to use LR sizing. We detail the development of an LR gate sizer for an industrial P&R flow. Software architecture and P&R flow needs are discussed. We summarize how we sped up the LR sizer by 3x to resize a million gates per hour, and ensured that multi-threaded results are deterministic. LR sizing experiments at the fast WNS/TNS optimization steps in the flow stages before and after clock tree synthesis (CTS) show excellent results: 10% to 20% setup timing total negative slack (TNS) reduction with 11% to 14% less leakage power, or 1% to 3% lower total power (dynamic power + leakage) with a total power objective, and 1% to 3% lower cell area. Worst negative slack (WNS) also improved in 2/3 of designs in pre-CTS. In the full flow, 5% lower leakage, 1% lower total power, and 0.6% lower cell area can be achieved, with roughly neutral impact on other metrics, compared to a high-effort low-power P&R flow baseline.
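The iterative nature of LR sizing can be seen in a deliberately tiny example. The sketch below is an assumption-laden toy, not the industrial sizer described above: it sizes a single chain of gates, relaxes one total-delay constraint with one multiplier, and updates that multiplier by a subgradient step. Production LR sizers use per-timing-arc multipliers and a real timer; the function name, data layout, and numbers here are hypothetical.

```python
def lr_size(gates, max_delay, step=0.5, iterations=20):
    """Toy Lagrangian-relaxation sizing for a chain of gates.

    gates: list of option lists; each option is a (delay, power) tuple.
    Minimizes total power subject to total delay <= max_delay.
    Returns the best feasible solution seen, or None if none was found.
    """
    lam = 0.0  # Lagrange multiplier on the delay constraint
    best = None
    for _ in range(iterations):
        # With lam fixed, the relaxed problem decomposes per gate.
        choice = [min(opts, key=lambda o: o[1] + lam * o[0]) for opts in gates]
        delay = sum(d for d, _ in choice)
        power = sum(p for _, p in choice)
        if delay <= max_delay and (best is None or power < best["power"]):
            best = {"choice": choice, "delay": delay, "power": power}
        # Subgradient update: raise lam while the constraint is violated.
        lam = max(0.0, lam + step * (delay - max_delay))
    return best


if __name__ == "__main__":
    options = [(3, 1), (2, 2), (1, 4)]  # (delay, power) per size, hypothetical units
    print(lr_size([options, options, options], max_delay=6))
```

Each iteration needs the delay of the current sizing, which is why the abstract stresses very fast timing analysis.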
Machine-Learning Enabled PPA Closure for Next-Generation Designs
- Vishal Khandelwal
Slowdown in process scaling is putting increasing pressure on EDA tools to bridge the power, performance and area (PPA) entitlement gap of Moore’s Law. State-of-the-art designs are pushing the PPA envelope to the limit, accompanied by increasing design size and complexity, and shrinking time-to-market constraints. AI/ML techniques provide a promising direction to address many of the modeling and convergence challenges seen in physical design flows. Further, the promise of intelligent design tools capable of exploring the solution space efficiently brings game-changing possibilities to next-generation design methodologies. In this talk we will discuss various challenges and opportunities in delivering best-in-class PPA closure with AI/ML augmented digital implementation tools. We will also talk about some aspects of large-scale industrial adoption of such a system and the AI capabilities needed to power these tools to minimize the need for an expert user, or endless tool iterations.
Improving Chip Design Performance and Productivity Using Machine Learning
- Narender Hanchate
Engineering teams are always under pressure to deliver increasingly aggressive power, performance and area (PPA) goals, as fast as possible, on many concurrent projects. Chip designers often spend significant time tuning the implementation flow for each project to meet these goals. Cadence Cerebrus machine learning chip design flow optimization automates this whole process, delivering better PPA much more quickly. During this presentation Cadence will discuss Cerebrus machine learning and distributed computing techniques which enable RTL to GDS flow optimization, delivering better engineering productivity and design performance.
SESSION: Session 4: Panel on Traditional Algorithms Versus Machine Learning Approaches
Session details: Session 4: Panel on Traditional Algorithms Versus Machine Learning Approaches
- Patrick Groeneveld
From Hard-Coded Heuristics to ML-Driven Optimization: New Frontiers for EDA
- Patrick R. Groeneveld
The very first Design Automation Conference was held in 1964, when computers were programmed with punch cards. The initial topics were related to automated Printed Circuit Board design, cell placement, and early attempts at transient circuit analysis. The next decades saw the introduction of key graph algorithms and numerical analysis methods. Optimal algorithms and more practical heuristic methods were published. The 1980s saw the advent of simulated annealing, a universal heuristic optimization method that found many applications. The next decade introduced powerful numerical placement methods for millions of cells. Soon after, physical synthesis was born by combining several incremental synthesis and analysis tools. Today’s commercial EDA tools run a very complex design flow that chains together hundreds of algorithms that were developed over six decades. Most effort is in the careful fine-tuning of parameters and addressing the complex, and often surprising, algorithmic interactions. This is a difficult trial-and-error process, driven by a small set of benchmarks. Machine Learning methods will take some of the human tuning efforts out of this loop. Some have already found their way into commercial tools. It will take a while before a Machine Learning method fully replaces a ‘traditional’ EDA algorithm. Each method in the flow has a limited sweet spot and is often run-time critical. On the other hand, conventional algorithms leave only insignificant opportunities for speed-up through parallelism. Machine Learning methods may provide the only viable way to unlock the potential of massive cloud computing resources.
Embracing Machine Learning in EDA
- Haoxing Ren
The application of machine learning (ML) in EDA is a hot research trend. To use ML in EDA, it is natural to think from the ML method point of view, i.e., supervised learning, reinforcement learning, and unsupervised learning. Based on this point of view, we can roughly classify the ML applications in EDA into three categories: prediction, optimization, and generation. The prediction category applies supervised learning methods to predict design quality of result (QoR) metrics. Two kinds of QoR metrics benefit from prediction. The first kind can be determined at the current design stage, but calculating them consumes a lot of computing resources. For example, prior work leverages ML to predict circuit power consumption without expensive simulations. The second kind depends on future design stages. For example, prior work predicts post-layout parasitics from the schematics of analog circuits. The optimization category applies Bayesian optimization (BO) and reinforcement learning (RL) to directly optimize EDA problems. BO treats the optimization objective as a black-box function and tries to find optimal solutions by iteratively sampling the solution space. For example, prior work proposes using BO with graph embedding and a neural-network-based surrogate model to size analog circuits. RL treats the optimization objective as the reward from an environment and trains agents to maximize the reward. Prior work proposes using RL to optimize macro placement and to optimize parallel prefix circuit structures. The generation category applies generative models such as generative adversarial networks (GANs) to directly generate solutions to EDA problems. Generative models can learn from the distribution of previously optimized data and generate solutions for a new problem instance without going through iterative processes like BO or RL. For example, prior work builds a conditional GAN model that learns to generate optical proximity correction (OPC) layout from the original mask.
What’s So Hard About (Mixed-Size) Placement?
- Mohammad Khasawneh
- Patrick H. Madden
For years, integrated circuit design has been a driver for algorithmic advances. The problems encountered in the design of modern circuits are often intractable — and with exponentially increasing size. Efficient heuristics and approximations have been essential to sustaining Moore’s Law growth, and now almost every aspect of the design process is heavily automated. There is, however, one notable exception: there is often substantial floor planning effort from human designers to position large macro blocks. The lack of full automation on this step has motivated the exploration of novel optimization methods, most recently with reinforcement learning. In this paper, we argue that there are multiple forces which have prevented full automation — and a lack of algorithmic methods is not the only factor. If the time has come for automation, there are a number of “traditional” methods that should be considered again. We focus on recursive bisection, and highlight key ideas from partitioning algorithms that have broader impact than one might expect. We also stress the importance of benchmarking as a way to determine which approaches may be most effective.
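Recursive bisection, the method this abstract focuses on, alternates between partitioning the netlist and splitting the placement region. The sketch below is a minimal illustration, not the authors' placer: it finds a balanced minimum-cut bisection by exhaustive search, which is only feasible for a handful of cells (practical tools use heuristics such as Fiduccia-Mattheyses or multilevel partitioning), and the function names and the tiny netlist are hypothetical.

```python
from itertools import combinations


def best_bisection(cells, nets):
    """Exhaustive balanced min-cut bisection; only practical for very small cell sets."""
    cells = sorted(cells)
    cell_set = set(cells)
    best = None
    for left in combinations(cells, len(cells) // 2):
        left_set = set(left)
        cut = 0
        for net in nets:
            local = cell_set & set(net)  # ignore pins outside this subproblem
            if local & left_set and local - left_set:
                cut += 1
        if best is None or cut < best[0]:
            best = (cut, left_set)
    return sorted(best[1]), sorted(cell_set - best[1])


def place(cells, nets, region, vertical=True):
    """Recursively bisect cells and region; returns {cell: (x, y)} at region centers."""
    x0, y0, x1, y1 = region
    if not cells:
        return {}
    if len(cells) == 1:
        return {cells[0]: ((x0 + x1) / 2, (y0 + y1) / 2)}
    left, right = best_bisection(cells, nets)
    if vertical:  # split the region with a vertical cut line
        xm = (x0 + x1) / 2
        r_left, r_right = (x0, y0, xm, y1), (xm, y0, x1, y1)
    else:  # split with a horizontal cut line
        ym = (y0 + y1) / 2
        r_left, r_right = (x0, y0, x1, ym), (x0, ym, x1, y1)
    result = place(left, nets, r_left, not vertical)
    result.update(place(right, nets, r_right, not vertical))
    return result


if __name__ == "__main__":
    nets = [("a", "b"), ("c", "d")]
    print(place(["a", "b", "c", "d"], nets, (0, 0, 4, 4)))
```

In this example the two connected pairs end up in the same half of the region, which is the behavior min-cut partitioning is meant to encourage.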
Scalability and Generalization of Circuit Training for Chip Floorplanning
- Summer Yue
- Ebrahim M. Songhori
- Joe Wenjie Jiang
- Toby Boyd
- Anna Goldie
- Azalia Mirhoseini
- Sergio Guadarrama
Chip floorplanning is a complex task within the physical design process, with more than six decades of research dedicated to it. In a recent paper published in Nature (Mirhoseini et al., 2021), a new methodology based on deep reinforcement learning was proposed that solves the floorplanning problem for advanced chip technologies with production quality results. The proposed method enables generalization, which means that the quality of placements improves as the policy is trained on a larger number of chip blocks. In this paper, we describe Circuit Training, an open-source distributed reinforcement learning framework that re-implements the proposed methodology in TensorFlow v2.x. We will explain the framework and discuss ways it can be extended to solve other important problems within physical design and more generally chip design. We also show new experimental results that demonstrate the scaling and generalization performance of Circuit Training.
SESSION: Session 5: Second Keynote
Session details: Session 5: Second Keynote
- Louis K. Scheffer
The Cerebras CS-2: Designing an AI Accelerator around the World’s Largest 2.6 Trillion Transistor Chip
- Jean-Philippe Fricker
The computing and memory demands from state-of-the-art neural networks have increased several orders of magnitude in just the last couple of years, and there’s no end in sight. Traditional forms of scaling chip performance are necessary but far from sufficient to run the machine learning models of the future. In this talk, Cerebras Co-Founder and Chief Systems Architect Jean-Philippe Fricker will explore the fundamental properties of neural networks and why they are not well served by traditional architectures. He will examine how co-design can relax the traditional boundaries between technologies and enable designs specialized for neural networks with new architectural capabilities and performance. Finally, Jean-Philippe will explore this rich new design space using the Cerebras architecture as a case study, highlighting design principles and tradeoffs that enable the machine learning models of the future.
SESSION: Session 6: Third Keynote
Session details: Session 6: Third Keynote
- Chuck Alpert
Leveling Up: A Trajectory of OpenROAD, TILOS and Beyond
- Andrew B. Kahng
Since June 2018, the OpenROAD project has developed an open-source, RTL-to-GDS EDA system within the DARPA IDEA program. The tool achieves no-human-in-loop generation of design-rule clean layout in 24 hours. This enables system innovation and design space exploration, while also democratizing hardware design by lowering barriers of cost, expertise and risk. Since November 2021, The Institute for Learning-enabled Optimization at Scale (TILOS), an NSF AI institute for advances in optimization partially supported by Intel, has begun its work toward a “new nexus” of AI, optimization, and the leading edge of practice for use domains that include IC design. This paper traces a trajectory of “leveling up” in the research enablement for IC physical design automation and EDA in general. This trajectory has OpenROAD and TILOS as waypoints, and advances themes of openness, infrastructure, and culture change.
SESSION: Session 7: Prototyping, Packaging, and Integration
Session details: Session 7: Prototyping, Packaging, and Integration
- Tiago Reimann
3DIC Design: Challenges and Opportunities in System-of-Chips Integration
- Ming Zhang
Technology scaling has enabled the semiconductor industry to successfully address application performance demands over the past three decades. However, the cost, complexity and diminishing returns of classic Moore’s Law scaling are accelerating the migration from traditional system-on-chip design to systems-of-chips design consisting of 3D heterogeneous integration systems that open a new dimension to improve density, bandwidth, performance, power, and cost. Designing such 3D systems has its own challenges: to enable them, we need to look beyond piecemeal tooling to more hyperconvergent design systems that provide a comprehensive technological solution and productivity gains. This talk will outline the promise of 3D system-of-chips design and present key design and verification challenges faced by the engineering teams associated with the development of such systems. It will discuss how a holistic design solution consisting of end-to-end design automation, integrated tools, die-to-die IP and methodologies can provide unique benefits in system-level design flow optimization and pave the way to achieving optimal power, performance and transistor volume density to drive the next wave of transformative products.
Novel Methodology for Assessing Chip-Package Interaction Effects on Chip Performance
- Armen Kteyan
- Jun-Ho Choy
- Valeriy Sukharev
- Massimo Bertoletti
- Carmelo Maiorca
- Rossana Zadra
- Massimo Inzaghi
- Gabriele Gattere
- Giancarlo Zinco
- Paolo Valente
- Roberto Bardelli
- Alessandro Valerio
- Pierluigi Rolandi
- Mattia Monetti
- Valentina Cuomo
- Salvatore Santapà
The paper presents a multiscale simulation methodology and EDA tool that assess the effect of thermomechanical stresses arising after die assembly on chip performance. The existing non-uniformities of feature geometries and the composite nature of on-chip interconnect layers are addressed by a methodology based on anisotropic effective thermomechanical material properties (EMP), which reduces the complexity of FEA simulations and enhances their accuracy and performance. The physical nature of the calculated EMP makes it scalable with the simulation grid size, which enables resolution of stress/strain at different scales from package to device channel. With feature-scale resolution, the tool enables accurate calculation of stress components in the active region of each device, where the carrier mobility variation results in deviations of circuit performance. The tool’s capability to back-annotate the hierarchical SPICE netlist with the stress values allows a user to perform circuit simulation in different stress environments, by placing the circuit block in different locations in the layout characterized by different distances from the stress sources, such as die edges and C4 bumps. Both schematic and post-layout netlists can be employed to find an optimal floorplan that minimizes the stress impact at early design stages, as well as for the final design sign-off. Electrical measurements on a specially designed test package were used to validate the methodology. Good agreement between measured and simulated variations of device characteristics has been demonstrated.
On Ensuring Congruency with Implementation During Emulation and Prototyping
- Alex Rabinovitch
ASIC-style design implementation ensures a certain degree of determinism in design behavior when it comes to glitches in clock cones and hold violations. Emulation and prototype products must follow the same deterministic rules of behavior in order to match the behavior of the real chip. The techniques used to achieve this are surveyed and shown to be inherently rooted in modelling the timeline in a manner that creates an artificial common source of synchronization between different clocks in the design. The low-skew clock lines provided by FPGA vendors are also leveraged. However, this overall approach could result in performance degradation, and techniques are presented to compensate for it. It is an open question whether these methods could benefit implementation flows, which presently use a rather different method to solve similar problems.
SESSION: Session 8: 3D IC Design
Session details: Session 8: 3D IC Design
- Lang Feng
Challenges and Solutions for 3D Fabric: A Foundry Perspective
- Sandeep Kumar Goel
3D ICs have become increasingly popular as they provide a way to pack more functionality on a chip and reduce manufacturing cost. TSMC offers a number of packaging technologies under the umbrella of “3D Fabric” to suit different product requirements. Just like any new technology, 3D Fabric brings forward several challenges associated with system, design, thermal, and testing aspects that require effective and efficient solutions before 3D Fabric can be used in high-volume production. In this presentation, we will give a brief introduction to the various 3D Fabric offerings and discuss challenges from a semiconductor foundry perspective. Next, we present an overview of solutions along with what EDA needs to solve. Lastly, we will discuss how various IEEE Standards, such as 1838 and 1149.1, can help in streamlining and standardizing testing approaches for 3D Fabrics.
Recent Advances and Future Challenges in 2.5D/3D Heterogeneous Integration
- Tanay Karnik
In this presentation, we will review the recent advances in chiplet-based commercial products and prototypes [2,3,4,5]. Most chiplet usage has been confined to integrating die designed by the same organization applied to building chips for the same product types. The right approach should be able to reduce portfolio costs, scale innovation and improve time to solution. It is important to manage the associated trade-offs, such as thermal, power, I/O escapes, assembly, test, etc. We will conclude the talk by presenting the future 2.xD/3D integration opportunities becoming available.
ART-3D: Analytical 3D Placement with Reinforced Parameter Tuning for Monolithic 3D ICs
- Gauthaman Murali
- Sandra Maria Shaji
- Anthony Agnesina
- Guojie Luo
- Sung Kyu Lim
In this paper, we show that true 3D placement approaches, enhanced with reinforcement learning, can offer further PPA improvements over pseudo-3D approaches. To accomplish this goal, we integrate an academic true 3D placement engine into a commercial-grade 3D physical design flow, creating ART-3D flow (Analytical 3D Placement with Reinforced Parameter Tuning-based 3D flow). We use a reinforcement learning (RL) framework to find optimized placement parameter settings of the true 3D placement engine for a given netlist and perform high-quality 3D placement. We then use an efficient 3D optimization and routing engine based on a commercial place and route (P&R) tool to maintain or improve the benefits reaped from true 3D placement till design signoff. We evaluate our 3D flow by designing several gate-only and processor benchmarks on a commercial 28nm technology node. Our proposed 3D flow involving true 3D placement offers the best PPA results compared to existing 3D P&R flows and reduces power consumption by up to 31%, improves effective frequency by up to 25%, and therefore reduces power-delay product by up to 43% compared with commercial 2D IC design flow. These improvements predominantly come from RL-based parameter tuning, as it improves the performance of the 3D placer by up to 12%.
Intelligent Design Automation for Heterogeneous Integration
- Iris Hui-Ru Jiang
- Yao-Wen Chang
- Jiun-Lang Huang
- Charlie Chung-Ping Chen
As the design complexity grows dramatically in modern circuit designs, 2.5D/3D heterogeneous integration (HI) becomes effective for system performance, power, and cost optimization, providing promising solutions to the increasing cost of more-Moore scaling. In this talk, we investigate the chip, package, and board co-design methodology with advanced packages and optical communication considering essential issues on physical design, electrical, thermal, and mechanical effects, timing, and testing, and suggest future research opportunities. Layout: A robust and vertically integrated physical design flow for HI design is needed. We address chip-, package-, and board-level component planning, package-level RDL routing, board-level routing, optical routing, and placement and routing considering warpage and thermal effects. Timing: New chip-level and cross-chip timing analysis techniques are desired. We address timing propagation under current source delay model (CSM), timing analysis and optimization for optical-electrical routing, multi-corner multi-mode analysis for HI, hierarchical MCMM analysis. Testing: The scope covers functional-like test generation, System-in-Package (SiP) online testing, photonic integrated circuits (PIC) testing and design-for-test (DfT), etc. Integration: We shall address chip, package, and board co-design considering multi-domain physics, including physical, electrical, thermal, mechanical, and optical effects and optimization.
SESSION: Session 9: Routing
Session details: Session 9: Routing
- Jhih-Rong Gao
A Reinforcement Learning Agent for Obstacle-Avoiding Rectilinear Steiner Tree Construction
- Po-Yan Chen
- Bing-Ting Ke
- Tai-Cheng Lee
- I-Ching Tsai
- Tai-Wei Kung
- Li-Yi Lin
- En-Cheng Liu
- Yun-Chih Chang
- Yih-Lang Li
- Mango C.-T. Chao
This paper presents a router that tackles a classic algorithmic problem in EDA, the obstacle-avoiding rectilinear Steiner minimum tree (OARSMT), with the help of an agent trained by our proposed policy-based reinforcement-learning (RL) framework. The job of the policy agent is to select an optimal set of Steiner points that can lead to an optimal OARSMT based on a given layout. Our RL framework can iteratively upgrade the policy agent by applying Monte-Carlo tree search to explore and evaluate various choices of Steiner points on various unseen layouts. As a result, our policy agent can be viewed as a self-designed OARSMT algorithm that can iteratively evolve by itself. The initial version of the agent is a sequential one, which selects one Steiner point at a time. Based on the sequential agent, a concurrent agent can then be derived to predict all required Steiner points with only one model inference. The overall training time can be further reduced by applying geometrically symmetric samples for training. The experimental results on single-layer 15×15 and 30×30 layouts demonstrate that our trained concurrent agent can outperform a state-of-the-art OARSMT router on both wire length and runtime.
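To see why the choice of Steiner points matters, consider how a candidate set of points can be scored. The sketch below is a simplified illustration and not the paper's router or agent: it ignores obstacles entirely and measures wire length as a minimum spanning tree under Manhattan distance over the terminals plus the chosen Steiner points. The function names and coordinates are hypothetical.

```python
def manhattan(a, b):
    """Rectilinear distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_length(points):
    """Total length of a minimum spanning tree under Manhattan distance (Prim's algorithm)."""
    points = list(points)
    if not points:
        return 0
    in_tree = {0}
    dist = [manhattan(points[0], p) for p in points]  # cheapest link to the tree so far
    total = 0
    while len(in_tree) < len(points):
        nxt = min((i for i in range(len(points)) if i not in in_tree),
                  key=lambda i: dist[i])
        total += dist[nxt]
        in_tree.add(nxt)
        for i in range(len(points)):
            if i not in in_tree:
                dist[i] = min(dist[i], manhattan(points[nxt], points[i]))
    return total


if __name__ == "__main__":
    terminals = [(0, 0), (2, 0), (1, 2)]
    print(mst_length(terminals))             # spanning tree over terminals only
    print(mst_length(terminals + [(1, 0)]))  # same terminals plus one Steiner point
```

Adding the Steiner point at (1, 0) shortens the tree in this example. An obstacle-aware version would replace `manhattan` with shortest-path lengths around blockages.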
LEO: Line End Optimizer for Sub-7nm Technology Nodes
- Diwesh Pandey
- Gustavo E. Tellez
- James Leland
Sub-7nm technology nodes have introduced new challenges, specifically in the lower metal layers. Extreme Ultraviolet Lithography (EUV) and multi-patterning-based lithography such as Self-Aligned Double Patterning (SADP) solutions have become key choices for the manufacturing of these layers. The demand for microprocessors has increased tremendously in the last few years and this imposes another challenge to the chip manufacturers to build their products at a very rapid rate. These days a mix of different lithography solutions for the manufacturing of metal layers is quite common. We propose a first-of-its-kind routing plugin which solves design rule violations for multiple lithography technologies without making any changes in the existing routers. Our plugin consists of a practical line-end optimization (LEO) algorithm, which solves most line-end problems in a few minutes, even for very large designs. Our solution is implemented in the development of a 7nm, industrial microprocessor design.
Routing Layer Sharing: A New Opportunity for Routing Optimization in Monolithic 3D ICs
- Sai Pentapati
- Sung Kyu Lim
A 3D Integrated Circuit consists of two or more dies bonded to each other in the vertical direction. This allows for a high transistor density without a need for shrinking the underlying transistor dimensions. While it has been shown to improve design power, performance, and area (PPA) due to the stacked Front End Of the Line (FEOL) layers, the Back End Of the Line (BEOL) structure of the stacked IC also allows for novel routing scenarios. With the split dies in 3D, nets would need to connect cells from different tiers, across many vertical layers and multiple FEOLs. More importantly, nets connecting cells in a single tier could still use metal layers from the BEOL of other tiers to complete routing. This is referred to as routing / metal layer sharing. While such sharing creates additional 3D connections, it can also be utilized to improve several aspects of the design such as cost, routing congestion, and performance. In this paper, we analyze the nets with metal layer sharing in 3D and provide ways to control the number of 3D connections. We show that the configuration of the 3D BEOL stack helps with metal layer cost reduction, with one to two fewer layers needed to complete routing without a noticeable timing impact. Sharing also allows for a better distribution of wirelength in the BEOL stack, which can significantly reduce congestion in the topmost metal layer, cutting its track usage by up to 50%. Finally, we also see performance benefits of up to 16% with the help of metal layer sharing in 3D IC design.
SESSION: Session 10: Fourth Keynote
Session details: Session 10: Fourth Keynote
- Jens Lienig
Triple-play of Hyperconvergency, Analytics, and AI Innovations in the SysMoore Era
- Aiqun Cao
The SysMoore Era can be characterized as the widening gap between classic Moore’s Law scaling and increasing system complexity. System-on-a-chip complexity has now fallen by the wayside to systems-of-chips with the need for smaller process nodes, and multi-die integration. With engineers now handling not just larger chip designs but systems comprised of multiple chips, the focus on user productivity and design robustness becomes a major factor in getting designs to market in the fastest time and with the best possible PPA. Combining a hyperconvergent design flow with smart data analytics and AI-based solution space exploration provides a huge benefit to the engineers tasked with completing these systems. This presentation outlines the challenges and the road to a triple-play solution that gets design engineers out of their late inning jams.
SESSION: Session 11: Lifetime Achievement Commemoration for Ricardo Reis
Session details: Session 11: Lifetime Achievement Commemoration for Ricardo Reis
- Jose Luiz Guntzel
A Lifetime of Physical Design Automation and EDA Education: ISPD 2022 Lifetime Achievement Award Bio
- Ricardo Augusto da Luz Reis
The 2022 International Symposium on Physical Design lifetime achievement award goes to Prof. Ricardo Reis for his instrumental impact on EDA research in South America and contributions to the physical design community.
Design and Optimization of Quantum Electronic Circuits
- Giovanni De Micheli
Quantum electronic circuits where the logic information is processed and stored in single flux quanta promise efficient computation in a performance/power metric, and thus are of utmost interest as possible replacement or enhancement of CMOS. Several electronic device families leverage superconducting materials and transitions between resistive and superconducting states. Information is coded into bits with deterministic values – as opposed to qubits used in quantum computing. As an example, information can be coded into pulses. Logic gates can be modeled as finite-state machines, that emit logic outputs in response to inputs. The most natural realization of such circuits is through synchronous implementations, where a clock stimulus is transmitted to every logic gate and where logic depth is balanced at every input to achieve full synchrony. Novel superconducting realization families try to go beyond the limitations of synchronous logic with approaches reminiscent of asynchronous design style and leveraging information coding. Moreover, some superconducting families exploit adiabatic operation, in the search for minimizing energy consumption. Design automation for quantum electronic logic families is still in its infancy, but important results have been achieved in terms of automatic balancing and fanout management. The combination of these problems with logic restructuring poses new challenges, as the overall problem is more complex as compared to CMOS and algorithms and tools cannot be just adapted. This presentation will cover recent advancement in design automation for superconducting electronic circuits as well as address future developments in the field.
Physical Design at the Transistor Level Beyond Standard-Cell Methodology
- Renato Hentschke
This talk offers a review of possibilities to explore on VLSI layout beyond traditional standard cell methodology. Existing Physical Design tools strictly avoid any modification to the contents of Standard Cells. Here, a post-processing step based on SAT solvers is proposed to obtain optimal solutions for local transistor level layout synthesis problems. This procedure can be constrained by metrics that ensure that quality is not degraded, and an acceptable and better-quality timing model can be rebuilt for the block. These problems and techniques are open research opportunities in Physical Design as they are not sufficiently explored in the literature and can bring significant improvements to the quality of a VLSI circuit.
Physical Design Optimization, From Past to Future
- Ricardo Augusto da Luz Reis
By the end of the 1970s, microprocessors were designed by hand, showing excellent layout compaction. Some highlights of the reverse engineering of the Z8000, whose control part was designed by hand, will be shown, illustrating several layout optimization strategies. The observation of the Z8000 layout inspired research on methods for automatically generating the layout of any transistor network, which allows reducing the number of transistors needed to implement a circuit and, consequently, the leakage power. Some of the layout automation tools developed by our group are briefly presented.
SESSION: Session 12: Fifth Keynote
Session details: Session 12: Fifth Keynote
- Bei Yu
Accelerating the Design and Performance of Next Generation Computing Systems with GPUs
- Sameer Halepete
The last few years have seen an accelerating growth in the demand for new silicon designs, even as the size and complexity of those designs has increased. However, the gains in design productivity necessary to implement these designs efficiently have not kept up. We need more than an order of magnitude increase in design productivity by the end of the decade to keep up with demand. Traditional methods for improving physical design tool capabilities are running out of steam, and there is a strong need for new approaches. Over the last two decades, we have seen other areas of computer science such as computer vision, speech recognition and natural language processing reach similar plateaus in performance, and each has been able to break out of the stall using GPU accelerated computing and machine learning. There is a similar opportunity in EDA but it will require a rethinking of the way these tools are implemented. The talk will cover where the demand for new silicon designs is coming from, what the productivity bottlenecks are, and then describe some advances in GPUs that could enable us to break through these bottlenecks with some examples.
SESSION: Session 13: Advances in Analog and Full Custom Design Automation
Session details: Session 13: Advances in Analog and Full Custom Design Automation
- Mark Po-Hung Lin
Optimized is Not Always Optimal – The Dilemma of Analog Design Automation
- Juergen Scheible
The vast majority of state-of-the-art integrated circuits are mixed-signal chips. While the design of the digital parts of the ICs is highly automated, the design of the analog circuitry is largely done manually, which is very time-consuming and prone to error. One reason often given for this is the attitude of the analog designer: many analog designers are convinced that human experience and intuition are needed for good analog design, which is why they distrust automated synthesis tools. This observation is quite correct, but it is only a symptom of the real problem. This paper shows that this phenomenon is caused by very concrete technical (and thus very rational) issues. These issues lie in the mode of operation of the typical optimization processes employed for the synthesis tasks. I will show that the dilemma that arises in analog design with these optimizers is the root cause of the low level of automation in analog design. The paper concludes with a review of proposals for automating analog design.
Analog/Mixed-Signal Layout Optimization using Optimal Well Taps
- Ramprasath S
- Meghna Madhusudan
- Arvind K. Sharma
- Jitesh Poojary
- Soner Yaldiz
- Ramesh Harjani
- Steven M. Burns
- Sachin S. Sapatnekar
Well island generation and well tap placement pose an important challenge in automated analog/mixed-signal (AMS) layout. Well taps prevent latchup within a radius of influence in a well island, and must cover all devices. Automated AMS layout flows typically perform well island generation and tap insertion as a postprocessing step after placement. However, this step is intrusive and potentially alters the placement, resulting in increased area, wire length, and performance degradation. This work develops a graph-based optimization that integrates well island generation, well tap insertion, and placement. Its efficacy is demonstrated within a stochastic placement engine. Experimental results show that this approach generates better area, wire length and performance metrics than traditional methods, at the cost of a marginal runtime degradation.
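The coverage requirement stated in this abstract (every device must lie within a tap's radius of influence) can be illustrated with a minimal check. This is only a sketch, not the paper's graph-based optimization: the function name `uncovered_devices`, the use of point coordinates for devices and taps, and the Euclidean distance metric are all assumptions made for illustration, and the check treats a single well island in isolation.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def uncovered_devices(
    devices: Dict[str, Point], taps: List[Point], radius: float
) -> List[str]:
    """Return the names of devices not within `radius` of any tap.

    Assumption: devices and taps are points in one well island and the
    radius of influence is measured as Euclidean distance.
    """
    missing = []
    for name, (dx, dy) in devices.items():
        covered = any(
            math.hypot(dx - tx, dy - ty) <= radius for tx, ty in taps
        )
        if not covered:
            missing.append(name)
    return sorted(missing)


if __name__ == "__main__":
    devices = {"M1": (0.0, 0.0), "M2": (5.0, 0.0), "M3": (20.0, 0.0)}
    print(uncovered_devices(devices, taps=[(2.0, 0.0)], radius=4.0))
```

A placement is tap-legal under these assumptions only when the returned list is empty for every well island.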
Analog Synthesis – The Deterministic Way
- Helmut Graeb
While the majority of research in design automation for analog circuits has been relying on statistical solution approaches, deterministic approaches are an attractive alternative. This paper gives a few examples of deterministic methods for sizing, structural synthesis and layout synthesis of analog circuits, which have been developed over the past decades. It starts from the so-called characteristic boundary curve for interactive parameter optimization, and ends at recent approaches for structural synthesis of operational amplifiers based on functional block composition. A deterministic approach to analog placement and to yield optimization will also be described. The central role of structural analysis of circuit netlists in these approaches will be explained. A summary of the underlying mindset of analog design automation and an outlook on future opportunities for deterministic sizing and layout synthesis concludes the paper.
AutoCRAFT: Layout Automation for Custom Circuits in Advanced FinFET Technologies
- Hao Chen
- Walker J. Turner
- Sanquan Song
- Keren Zhu
- George F. Kokai
- Brian Zimmer
- C. Thomas Gray
- Brucek Khailany
- David Z. Pan
- Haoxing Ren
Despite continuous efforts in layout automation for full-custom circuits, including analog/mixed-signal (AMS) designs, automated layout tools have not yet been widely adopted in current industrial full-custom design flows due to the high circuit complexity and sensitivity to layout parasitics. Nevertheless, the strict design rules and grid-based restrictions in nanometer-scale FinFET nodes limit the degree of freedom in full-custom layout design and thus reduce the gap between automation tools and human experts. This paper presents AutoCRAFT, an automatic layout generator targeting region-based layouts for advanced FinFET-based full-custom circuits. AutoCRAFT uses specialized place-and-route (P&R) algorithms to handle various design constraints while adhering to typical FinFET layout styles. Verified by comprehensive post-layout analyses, AutoCRAFT has achieved promising preliminary results in generating sign-off quality layouts for industrial benchmarks.
SESSION: Session 14: Panel on Challenges and Approaches in VLSI Routing
Session details: Session 14: Panel on Challenges and Approaches in VLSI Routing
- Gracieli Posser
Challenges and Approaches in VLSI Routing
- Gracieli Posser
- Evangeline F.Y. Young
- Stephan Held
- Yih-Lang Li
- David Z. Pan
In this paper, we will first have a brief review of the ISPD 2018 and 2019 Initial Detailed Routing Contests. We will then visit a few important and interesting topics in VLSI routing that include GPU-accelerated routing, signal speed optimization in routing, PCB routing, and AI-driven analog routing.
Challenges for Automating Package Routing
- Wen-Hao Liu
- Bing Chen
- Hua-Yu Chang
- Gary Lin
- Zi-Shen Lin
Package routing is typically done semi-automatically or manually in order to meet customized requests for different design styles. However, in recent years, the scale of package designs has grown rapidly and routing rules have become increasingly complicated, so the engineering effort of manual solutions has increased dramatically. Therefore, a fully automatic solution has become necessary and critical. In addition, fully automatic package routing is one of the most important pieces of an automatic design flow for 3D-IC. There are many challenges in realizing a fully automatic package routing solution. Some of these challenges are introduced in this paper.
SESSION: Session 15: Global Placement, Macro Placement, and Legalization
Session details: Session 15: Global Placement, Macro Placement, and Legalization
- Joseph Shinnerl
Congestion and Timing Aware Macro Placement Using Machine Learning Predictions from Different Data Sources: Cross-design Model Applicability and the Discerning Ensemble
- Xiang Gao
- Yi-Min Jiang
- Lixin Shao
- Pedja Raspopovic
- Menno E. Verbeek
- Manish Sharma
- Vineet Rashingkar
- Amit Jalota
Modern very large-scale integration (VLSI) designs typically use a lot of macros (RAM, ROM, IP) that occupy a large portion of the core area. Also, macro placement being an early stage of the physical design flow, followed by standard cell placement, physical synthesis (place-opt), clock tree synthesis and routing, etc., has a big impact on the final quality of result (QoR). There is a need for Electronic Design Automation (EDA) physical design tools to provide predictions for congestion, timing, and power etc., with certainty for different macro placements before running time-consuming flows. However, the diversity of IC designs that commercial EDA tools must support and the limited number of similar designs that can provide training data, make such machine learning (ML) predictions extremely hard. Because of this, ML models usually need to be completely retrained for unseen designs to work properly. However, collecting full flow macro placement ML data is time consuming and impractical. To make things worse, common ML methods, such as regression, support vector machine (SVM), random forest (RF), neural network (NN) in general, lack a good estimation of prediction accuracy or confidence and lack debuggability for cross-design applications. In this paper, we present a novel discerning ensemble technique for cross-design ML prediction for macro placement. We developed our solution based on a large number of designs with different design styles and technology nodes, and tested the solution on 8 leading-edge industry designs and achieved comparable or even better results in a few hours (per design) than manual placement results that take many engineers weeks or even months to achieve. Our method shows great promise for many ML problems in EDA applications, or even in other areas.
Global Placement Exploiting Soft 2D Regularity
- Donghao Fang
- Boyang Zhang
- Hailiang Hu
- Wuxi Li
- Bo Yuan
- Jiang Hu
Cell placement is such a critical step for chip physical design that it needs many kinds of efforts for improvement. Recently, designs with 2D processing element arrays have become popular primarily due to their deep neural network computing applications. The 2D array regularity is similar to but different from the regularity of conventional datapath designs. To exploit the 2D array regularity, this work develops a new global placement technique built upon RePlAce, the latest state-of-the-art placement framework. Experimental results from various designs show that the proposed technique can reduce half-perimeter wirelength and Steiner tree wirelength by about 6% and 12%, respectively.
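For readers unfamiliar with the half-perimeter wirelength (HPWL) metric reported in this abstract, the following minimal sketch shows how it is commonly computed: for each net, the half-perimeter of the bounding box around its pins, summed over all nets. The function names and the dictionary-of-pin-lists data layout are illustrative assumptions and are not taken from the paper or from RePlAce.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hpwl(net_pins: List[Point]) -> float:
    """Half-perimeter of the bounding box enclosing one net's pins."""
    if len(net_pins) < 2:
        return 0.0  # a net with fewer than two pins needs no wire
    xs = [x for x, _ in net_pins]
    ys = [y for _, y in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: Dict[str, List[Point]]) -> float:
    """Sum of per-net HPWL over all nets (hypothetical data layout)."""
    return sum(hpwl(pins) for pins in nets.values())


if __name__ == "__main__":
    nets = {
        "n1": [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)],
        "n2": [(2.0, 2.0), (2.0, 5.0)],
    }
    print(total_hpwl(nets))
```

HPWL is cheap to evaluate, which is why placers typically optimize it directly and report Steiner tree wirelength separately as a closer estimate of routed length.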
Linear-time Mixed-Cell-Height Legalization for Minimizing Maximum Displacement
- Chung-Hsien Wu
- Wai-Kei Mak
- Chris Chu
Due to the aggressive scaling of advanced technology nodes, multiple-row-height cells have become more and more common in VLSI design. Consequently, the placement of cells is no longer independent among different rows, which makes the traditional row-based legalization techniques obsolete. In this work, we present a highly efficient linear-time mixed-cell-height legalization approach that optimizes both the total cell displacement and the maximum cell displacement. First, a previously introduced fast window-based cell insertion technique is applied to obtain a feasible initial row assignment and cell ordering which is known to be good for total displacement consideration. In the second stage, we use an iterative cell swapping algorithm to change the row assignment and the cell order of the critical cells for maximum displacement reduction. Then we develop an optimal linear-time DAG-based fixed-row and fixed-order legalization algorithm to minimize the maximum cell displacement. Finally, we propose a cell shifting heuristic to reduce the total cell displacement without increasing the maximum cell displacement. Using the proposed approach, the quality provided by the global placement can be preserved as much as possible. Compared with the state-of-the-art work, experimental results show that our proposed algorithm can reduce the maximum cell displacement by more than 11% on average with similar average cell displacement.
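The fixed-row, fixed-order subproblem mentioned in this abstract can be illustrated on a much simpler case. The sketch below is not the paper's linear-time DAG-based algorithm: it handles only a single row of single-height cells, and it swaps in a binary search over the displacement bound with a greedy left-packing feasibility check. All names are hypothetical, and it assumes integer site coordinates, target positions that are non-negative, and cells listed in their fixed left-to-right order.

```python
from typing import List, Optional, Tuple


def pack_left(
    targets: List[int], widths: List[int], row_width: int, d: int
) -> Optional[List[int]]:
    """Place cells as far left as allowed when no cell may move more than d.

    Returns the positions, or None if bound d is infeasible.
    """
    positions = []
    prev_end = 0  # left row boundary, then right edge of the previous cell
    for x, w in zip(targets, widths):
        pos = max(x - d, prev_end)
        if pos > x + d or pos + w > row_width:
            return None
        positions.append(pos)
        prev_end = pos + w
    return positions


def min_max_displacement(
    targets: List[int], widths: List[int], row_width: int
) -> Tuple[int, List[int]]:
    """Smallest maximum displacement for one row with a fixed cell order."""
    if sum(widths) > row_width:
        raise ValueError("cells do not fit in the row")
    lo, hi = 0, row_width  # bound row_width is always feasible here
    while lo < hi:
        mid = (lo + hi) // 2
        if pack_left(targets, widths, row_width, mid) is not None:
            hi = mid
        else:
            lo = mid + 1
    return lo, pack_left(targets, widths, row_width, lo)


if __name__ == "__main__":
    print(min_max_displacement([0, 1, 2], [2, 2, 2], 10))
```

Left-packing is sufficient for the feasibility test because pushing every cell as far left as its bound allows leaves the most room for the cells that follow. The positions it returns minimize only the maximum displacement, not the total, which is why the abstract describes a separate cell shifting step afterwards.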
SESSION: Session 16: Sixth Keynote
Session details: Session 16: Sixth Keynote
- Ajay Joshi
Hardware Security: Physical Design versus Side-Channel and Fault Attacks
- Ingrid Verbauwhede
What is “hardware” security? How can we improve trustworthiness in hardware circuits? Is there a design method for secure hardware design? To answer these questions, different communities have different expectations of trusted (expecting trustworthy) hardware components upon which they start to build a secure system. At the same time, electronics shrink: sensor nodes, IoT devices, smart electronics are becoming more and more available. In the past, adding security was only a concern for locked server rooms or now cloud servers. However, these days, our portable devices contain highly private and secure information. Adding security and cryptography to these often very resource-constrained devices is a challenge. Moreover, they can be subject to physical attacks, including side-channel and fault attacks. This presentation aims at bringing some order in the chaos of expectations by introducing the importance of a design methodology for secure design. We will illustrate the capabilities of current side EM and laser fault passive and active attacks. In this context, we will also reflect on the role of physical design, place and route.
SESSION: Session 17: ISPD 2022 Contest Results and Closing Remarks
Session details: Session 17: ISPD 2022 Contest Results and Closing Remarks
- David Chinnery
Benchmarking Security Closure of Physical Layouts: ISPD 2022 Contest
- Johann Knechtel
- Jayanth Gopinath
- Mohammed Ashraf
- Jitendra Bhandari
- Ozgur Sinanoglu
- Ramesh Karri
Computer-aided design (CAD) tools mainly optimize for power, performance, and area (PPA). However, given a large number of serious hardware-security threats that are emerging, future CAD flows must also incorporate techniques for designing secure integrated circuits (ICs). In fact, the stakes are quite high for IC vendors and design companies, as security risks that are not addressed during design time will inevitably be exploited in the field, where vulnerabilities are almost impossible to fix. However, there is currently little to no experience related to designing secure ICs available within the CAD community. For the very first time, this contest seeks to actively engage with the community to close this gap. The theme of this contest is security closure of physical layouts, that is, hardening the physical layouts at design time against threats that are executed post-design time. More specifically, this contest is focused on selected and seminal threats that, once taken in, are relatively simple to approach and mitigate through means of physical design: Trojan insertion and probing as well as fault injection. Acting as security engineers, contest participants will iteratively and proactively evaluate and fix the vulnerabilities of provided benchmark layouts. Benchmarks and submissions are based on the generic DEF format and related files. Thus, participants are free to use any physical-design tools of their choice, helping us to open up the contest to the community at large.