ISPD ’24: Proceedings of the 2024 International Symposium on Physical Design
Full Citation in the ACM Digital Library
SESSION: Session 1: Opening Session and First Keynote
Software and Semiconductors are two fundamental technologies that have become woven into every aspect of our society, and it is fair to say that “Software and Semiconductors have eaten the world”. More recently, advances in AI have begun to transform every aspect of our society as well. These three tectonic forces of transformation – “AI”, “Software”, and “Semiconductors” – are colliding, resulting in a seismic shift: a future where both software and semiconductor chips themselves will be designed, optimized, and operated by AI – pushing us towards a future where “Computers can program themselves!”. In this talk, we will discuss these forces of “AI for Chips and Code” and how the future of semiconductor chip design and software engineering is being redefined by AI.
SESSION: Session 2: Partitioning and Clustering
- Rongjian Liang
- Anthony Agnesina
- Haoxing Ren
State-of-the-art hypergraph partitioners, such as hMETIS, usually adopt a multi-level paradigm for efficiency and scalability. However, they are prone to getting trapped in local minima due to their reliance on refinement heuristics and their overlooking of global structural information during coarsening. SpecPart, the most advanced academic hypergraph partitioning refinement method, improves partitioning by leveraging spectral information. Still, its success depends heavily on the quality of initial input solutions. This work introduces MedPart, a multi-level evolutionary differentiable hypergraph partitioner. MedPart follows the multi-level paradigm but addresses its limitations by using fast spectral coarsening and introducing a novel evolutionary differentiable algorithm to optimize each coarsening level. Moreover, by analogy between hypergraph partitioning and deep graph learning, our evolutionary differentiable algorithm can be accelerated with deep graph learning toolkits on GPUs. Experiments on public benchmarks consistently show MedPart outperforming hMETIS and achieving up to a 30% improvement in cut size on some benchmarks compared to the best-published solutions, including those from SpecPart. Moreover, MedPart’s runtime scales linearly with the number of hyperedges.
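A minimal sketch of the differentiable-partitioning idea the abstract alludes to. This is illustrative only: the expected-cut objective, the balance penalty, and the numerical-gradient optimizer below are our own simplifications, not MedPart’s actual algorithm, which uses evolutionary search and GPU graph-learning toolkits.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(theta, hyperedges, balance_weight=1.0):
    """Differentiable surrogate for 2-way hypergraph cut plus a balance term.

    p[v] = sigmoid(theta[v]) is the probability vertex v lies in block 1.
    A hyperedge is uncut only if all its pins land in the same block, so its
    expected cut contribution is 1 - prod(p) - prod(1 - p).
    """
    p = sigmoid(theta)
    cut = sum(1.0 - np.prod(p[e]) - np.prod(1.0 - p[e]) for e in hyperedges)
    return cut + balance_weight * (p.sum() - len(theta) / 2.0) ** 2

def optimize(theta, hyperedges, steps=200, lr=0.3, eps=1e-5):
    """Plain gradient descent with numerical gradients; a real flow would
    use an autograd / deep-graph-learning toolkit on GPU instead."""
    theta = theta.copy()
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for v in range(len(theta)):
            d = np.zeros_like(theta)
            d[v] = eps
            grad[v] = (loss(theta + d, hyperedges)
                       - loss(theta - d, hyperedges)) / (2 * eps)
        theta -= lr * grad
    return theta
```

With hard assignments (large positive/negative `theta`), the surrogate recovers the exact cut size, which is what makes it a usable relaxation.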
- Shuo Yin
- Wenqian Zhao
- Li Xie
- Hong Chen
- Yuzhe Ma
- Tsung-Yi Ho
- Bei Yu
Mask optimization in lithography is becoming increasingly important as the technology node size shrinks. Inverse Lithography Technology (ILT) is one of the most performant and robust solutions widely used in the industry, yet it still suffers from heavy time consumption and complexity. As the number of transistors scales up, the industry currently focuses more on efficiency improvement and workload distribution. Meanwhile, most recent publications are still tangled in local pattern restoration regardless of real manufacturing conditions. We aim to connect academic research to real industrial bottlenecks with FuILT, a practical full-chip ILT-based mask optimization flow. Firstly, we build a multi-level partitioning strategy with a divide-and-conquer mindset to tackle the full-chip ILT problem. Secondly, we implement a workload distribution framework to maintain hardware efficiency with scalable multi-GPU parallelism. Thirdly, we propose a gradient-fusion technique and a multi-level healing strategy to fix boundary errors at different levels. Our experimental results on different layers from real designs show that FuILT is both effective and generalizable.
- Yen-Yu Chen
- Hao-Yu Wu
- Iris Hui-Ru Jiang
- Cheng-Hong Tsai
- Chien-Cheng Wu
Register clustering is an effective technique for suppressing the increasing dynamic power ratio in modern IC design. By clustering registers (flip-flops) into multi-bit flip-flops (MBFFs), clock circuitry can be shared, and the number of clock sinks and buffers can be lowered, thereby reducing power consumption. Recently, the use of mixed-driving strength MBFFs has provided more flexibility for power and timing optimization. Nevertheless, existing register clustering methods usually employ evenly distributed and invariant path slack strategies. Unlike them, in this work, we propose a register clustering algorithm with slack redistribution at the post-placement stage. Our approach allows registers to borrow slack from connected paths, creates the possibility to cluster with neighboring maximal cliques, and releases extra slack. An adaptive interval graph based on the red-black tree is developed to efficiently adapt timing feasible regions of flip-flops for slack redistribution. An attraction-repulsion force model is tailored to wisely select flip-flops to be included in each MBFF. Experimental results show that our approach outperforms state-of-the-art work in terms of clock power reduction, timing balancing, and runtime.
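As a rough illustration of interval-based register banking: flip-flops whose timing-feasible regions share a common point can legally share a multi-bit flip-flop. The greedy grouping and the `max_bits` cap below are our own simplifications, not the paper’s adaptive red-black-tree interval graph or its attraction-repulsion force model.

```python
def cluster_ffs(intervals, max_bits=4):
    """Greedily group flip-flops whose timing-feasible intervals (lo, hi)
    share a common point, capped at max_bits registers per MBFF.

    Each group tracks the running intersection of its members' intervals,
    so every member can still be legally placed at one shared location.
    """
    groups = []
    for iv in sorted(intervals, key=lambda ab: ab[0]):
        placed = False
        for g in groups:
            lo = max(g["lo"], iv[0])
            hi = min(g["hi"], iv[1])
            if lo <= hi and len(g["members"]) < max_bits:
                g["lo"], g["hi"] = lo, hi  # shrink shared feasible region
                g["members"].append(iv)
                placed = True
                break
        if not placed:
            groups.append({"lo": iv[0], "hi": iv[1], "members": [iv]})
    return groups
```

For example, intervals `[(0, 2), (1, 3), (5, 7), (6, 8)]` yield two 2-bit groups, since the first pair intersects on [1, 2] and the second on [6, 7].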
SESSION: Session 3: Timing Optimization
- Wuxi Li
- Yuji Kukimoto
- Gregory Servel
- Ismail Bustany
- Mehrdad E. Dehkordi
Placement plays a crucial role in the timing closure of integrated circuit (IC) physical design. This paper presents an efficient and effective calibration-based differentiable timing-driven global placement engine. Our key innovation is a calibration technique that approximates a precise but expensive reference timer, such as a signoff timer, using a lightweight simple timer. This calibrated simple timer inherently accounts for intricate timing exceptions and common path pessimism removal (CPPR) prevalent in industry designs. Extending this calibrated simple timer into a differentiable timing engine enables ultrafast yet accurate timing optimization in non-linear global placement. Experimental results on various industry designs demonstrate the superiority of the proposed framework over the latest AMD Vivado and traditional net-weighting methods across key metrics including maximum clock frequency, wirelength, routability, and overall back-end runtime.
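The calibration idea can be sketched in a few lines: fit a cheap timer’s output to a reference timer’s on sample points, then use the fitted model during optimization, where it stays differentiable. The linear delay models below are hypothetical placeholders, not the paper’s actual timers.

```python
import numpy as np

# Hypothetical timers for illustration; neither is the paper's actual model.
def reference_delay(load):
    """Stand-in for an expensive signoff-grade timer (extra pessimism)."""
    return 1.3 * (0.8 * load + 0.2) + 0.05

def simple_delay(load):
    """Lightweight linear load-delay model used inside the optimizer."""
    return 0.8 * load + 0.2

loads = np.linspace(0.1, 2.0, 20)
ref = reference_delay(loads)
simple = simple_delay(loads)

# Calibration: least-squares fit of a scale and offset so the cheap timer
# tracks the reference on the sampled points.
scale, offset = np.polyfit(simple, ref, 1)
calibrated = scale * simple + offset
```

Because the reference here is an exact affine function of the simple timer, the fit recovers it; on real designs the calibrated timer only approximates the signoff timer, which is the point of the paper’s more elaborate scheme.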
- Wei-Chen Tai
- Min-Hsien Chung
- Iris Hui-Ru Jiang
BEOL with airgap technology is an alternative metallization option with promising performance, electrical yield, and reliability to explore at the 2nm node and beyond. Airgaps form cavities in the inter-metal dielectric (IMD) between interconnects. The ultra-low dielectric constant reduces line-to-line capacitance, thus shortening interconnect delay. The shortened interconnect delay is beneficial to setup timing but harmful to hold timing. To minimize the additional manufacturing cost, the number of metal layers that accommodate airgaps is practically limited. Hence, post-route circuit timing optimization can be achieved by wisely performing airgap insertion and layer reassignment on timing-critical nets. In this paper, we present a novel and fast airgap insertion approach for timing optimization. A Slack Dependency Graph (SDG) is constructed to view the timing slack relationships of a circuit through path segments. With the global view provided by the SDG, we can avoid ineffective optimizations. Our Linear Programming (LP) formulation simultaneously solves airgap insertion and layer reassignment and allows a flexible amount of airgap to be inserted. Both SDG updates and LP solving can be done extremely fast. Experimental results show that our approach outperforms the state-of-the-art work on both total negative slack (TNS) and worst negative slack (WNS) with more than 89× speedup.
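The paper formulates airgap insertion and layer reassignment jointly as an LP. As a much-simplified special case: with a single shared airgap budget and per-segment caps, the continuous relaxation reduces to a fractional knapsack, which the sketch below solves greedily by benefit per unit of airgap. The segment names and benefit numbers are hypothetical.

```python
def allocate_airgap(segments, budget):
    """Allocate a shared airgap budget across wire segments.

    segments: list of (name, benefit_per_unit, max_amount), where benefit
    is the (hypothetical) weighted delay reduction per unit of airgap.
    With one budget constraint, filling the highest benefit-per-unit
    segments first is optimal for the LP relaxation.
    """
    plan = {}
    remaining = budget
    for name, benefit, cap in sorted(segments, key=lambda s: -s[1]):
        take = min(cap, remaining)
        plan[name] = take
        remaining -= take
    return plan
```

With benefits 5, 3, and 1 and a budget of 1.5 units, the greedy fill saturates the best segment and splits the remainder: `{"n1": 1.0, "n2": 0.5, "n3": 0.0}`. The real formulation adds layer-reassignment variables and slack constraints from the SDG, which a greedy pass cannot capture.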
- Tsung-Wei Huang
- Boyang Zhang
- Dian-Lun Lin
- Cheng-Hsiang Chiu
Static timing analysis (STA) is an integral part in the overall design flow because it verifies the expected timing behaviors of a circuit. However, as the circuit complexity continues to enlarge, there is an increasing need for enhancing the performance of existing STA algorithms using emerging heterogeneous parallelism that comprises manycore central processing units (CPUs) and graphics processing units (GPUs). In this paper, we introduce several state-of-the-art STA techniques, including task-based parallelism, task graph partition, and GPU kernel algorithms, all of which have brought significant performance benefits to STA applications. Motivated by these successful results, we will introduce a task-parallel programming system to generalize our solutions to benefit broader scientific computing applications.
SESSION: Session 4: Panel on EDA Challenges at Advanced Technology Nodes
We have gathered a panel of experts who will delve into the electronic design automation (EDA) challenges at advanced technology nodes. In this rapidly evolving field, the race towards miniaturization presents new hurdles and complexities. With advanced nodes shrinking, design and technology co-optimization becomes an increasingly intricate task. Our panelists will share their unique insights on how they approach these challenges and the impact of complicated design rules on the design process. As designs grow larger and more complex, novel strategies and methodologies are necessary to address the runtime issues of EDA tools. Our discussion will explore these strategies, including promising emerging technologies such as multi-machine or GPU acceleration that show potential in mitigating runtime challenges.
In the realm of design space exploration, effective navigation of trade-offs between different design objectives, especially power, performance, and area, is critical. Our panelists will share experiences of frontend/backend co-optimization in the advanced node design flow and discuss how machine learning techniques can be harnessed to predict and estimate various aspects of the design process. As we step into the era of “More than Moore,” the integration of diverse technologies into a unified design process presents both opportunities and challenges. Our discussion will explore anticipated advancements in advanced packaging or heterogeneous integration and identify issues at advanced technology nodes that, in the panelists’ opinion, have not yet been adequately addressed by commercial tools and/or academic research.
We will also delve into the scaling and runtime challenges at multi-million gates and beyond, the current state of EDA tools for data-sharing for machine learning/AI across tools, and the major new issues that need to be addressed for more optimal design in the 2nm to 5nm process technology nodes. Finally, we will discuss the high-priority research topics that need to be tackled to address advanced technology nodes. We look forward to an enlightening discussion filled with expert insights and thought-provoking ideas.
Improvements in EDA technology become more urgent with the loss of traditional scaling levers and the growing complexity of leading-edge designs. The roadmap for device and cell architectures, power delivery, multi-tier integration, and multiphysics signoffs presents severe challenges to current optimization frameworks. In future advanced technologies, the quality of results, speed, and cost of EDA will be essential components of scaling. Each stage of the design flow must deliver predictable results for given inputs and targets. Optimization goals must stay correlated to downstream and final design outcomes.
The EDA and design ecosystem must achieve an aligned interest in scalable, efficient optimization. End-case solvers and core engines should become commoditized, freeing researchers and EDA suppliers to address new automation requirements. A rearchitected, rebuilt EDA 2.0 must achieve results in less time (multithreading, GPU) and better results in same time (cloud-native, sampling), while enabling AI/ML that steers and orchestrates optimizers within the design process. To deliver high-value system-technology pathfinding and implementation tools in time to meet industry needs, EDA must develop (learn) new optimization objectives, hierarchical solution strategies, multiphysics reduced-order models, and conditioning of instances at the interstices between chained optimizations. Last, as a community, EDA will meet the challenges of advanced technologies by investing in research and workforce development, via open infrastructure for ML EDA, and via proxy research enablements.
In this paper, we explore the burgeoning intersection of Large Language Models (LLMs) and Electronic Design Automation (EDA). We critically assess whether LLMs represent a transformative future for EDA or merely a fleeting mirage. By analyzing current advancements, challenges, and potential applications, we dissect how LLMs can revolutionize EDA processes like design, verification, and optimization. Furthermore, we contemplate the ethical implications and feasibility of integrating these models into EDA workflows. Ultimately, this paper aims to provide a comprehensive, evidence-based perspective on the role of LLMs in shaping the future of EDA.
Le-Chin Eugene Liu received a Bachelor’s and a Master’s degree in electronics engineering from the National Chiao Tung University, Hsinchu, Taiwan, and a Ph.D. from the Department of Electrical Engineering, University of Washington, Seattle, Washington, USA.
PPA (Performance, Power, and Area) is our goal. Although most designs no longer pursue top speed, timing is still very important. When reducing power and area, a common side effect is that timing degrades; low-power/low-area methods never make a design faster. Once speed slows down, buffering or up-sizing is inevitable. What differs from previous technology nodes is the gap between global and detailed placement (GP and DP): ever more design rules widen this gap. When DP cannot keep most instances at their original locations, the expectations set by global placement are not met, and another GP pass or more buffering becomes necessary. Physical-rule awareness will be a necessary improvement for GP and buffering. We will list some of the challenges that advanced nodes face.
Modern System-on-Chips (SoCs), such as smartphone microprocessors, are composed of billions of transistors existing in various subsystems. These subsystems can include Central Processing Units (CPUs), Graphics Processing Units (GPUs), Neural Processing Units (NPUs), Image Signal Processors (ISPs), Digital Signal Processors (DSPs), communication modems, memory controllers, and many others. For efficient Electronic Design Automation (EDA) tasks, such as those involving logic synthesis, placement, clock tree synthesis (CTS), and/or routing, these subsystems are typically broken down into smaller, more manageable circuit blocks, or circuit partitions. This subdivision strategy is crucial for keeping design times within reasonable limits.
During the top-level floorplanning phase of chip design, the dimensions, interconnect ports, and physical locations of circuit partitions are defined; the physical boundaries of these partitions are commonly designed as rectilinear shapes rather than rectangles. Partitions that are excessively large can lead to inefficient use of chip area, higher power consumption, and higher production costs. Conversely, undersized partitions can hinder subsequent physical design processes, potentially causing delays in the overall chip design schedule. Furthermore, a poor floorplan can lead to longer wire lengths and can increase feedthrough net counts in partitions, adversely affecting power, performance, and area (PPA).
In practice, the top-level floorplanning phase of chip design can involve multiple iterations of its processes. An initial iteration typically involves estimating the approximate area of each circuit partition based on various factors, such as the dimensions of macros (including SRAM macros), the number of standard cell instances, and the standard cell utilization rate, which can be projected based on the data from previous designs. These preliminary estimates are crucial for defining the initial shapes, dimensions, interconnect ports, and physical locations of the partitions. Subsequently, the downstream design processes can advance either to partition-level physical design (which includes macro placement, standard cell placement, CTS, routing, etc.) or to physical-aware logic synthesis, which uses the defined layout data to more precisely assess layout-induced effects and produce more accurate gate-level netlists.
Once the dimensions and interconnect locations of circuit partitions are defined, macro placement, which is usually followed by standard cell placement and routing processes, can be conducted. After performing these processes, PPA results may indicate that certain partitions require size adjustments due to being too small, whereas others may be identified as candidates for area reduction. Such alterations in the circuit partition areas necessitate modifications to the top-level floorplan. Furthermore, in subsequent iterations of floorplanning, certain elements (such as feedthrough nets/ports) may be added into and/or removed from partitions, prompting a reevaluation of the physical implementation feasibility for these partitions; the reevaluation stage may involve additional macro placement, cell placement, and routing activities.
Macro placement is crucial in physical design as its outcomes can substantially influence standard cell placement, CTS, routing, circuit timing, and even power consumption. However, at advanced technology nodes, macro placement outcomes produced by commercial EDA tools and reinforcement learning (RL)-based tools often require human modification prior to practical use, owing in part to the complex design rules associated with advanced technology nodes, even though these tools can rapidly generate results. Additionally, it has been observed that suboptimal macro placement can lead to issues such as IR drop and increased dynamic/static power consumption. However, these issues, which may be checked more accurately in later stages of a design flow, are frequently not addressed in a typical macro placement process. Moreover, in modern SoCs it is very common for a circuit partition to contain multiple power domains. Performing macro placement on this type of circuit partition may require domain floorplanning prior to placing macros and standard cell instances within their respective power domain regions.
As described previously, the floorplanning and the macro placement are often interrelated. Early iterations of floorplanning may not achieve the best configurations for partitions in terms of PPA, leading to additional iterations in the design flow. Also, the macro placement process, along with subsequent cell placement and routing tasks, can serve as a critical and potentially fast evaluation step to assess each partition’s physical implementation feasibility, thereby driving continuous refinements in the floorplan. This iterative methodology is crucial in achieving a more refined and optimized chip design, which is especially critical at advanced technology nodes where wafer costs are significantly high.
In designing modern SoCs, the importance of performing high-quality floorplanning and high-quality macro placement cannot be overemphasized. Specifically, the floorplanning and the macro placement challenges encountered in the industry, and the obstacles preventing complete automation of these processes need to be re-examined. With ongoing advancements in EDA and AI/ML technologies, such as the application of reinforcement learning (RL) in tuning design flow parameters, coupled with enhanced computational power, we anticipate a substantial improvement and/or potential automation in the iterative aspects of these design processes. Such advancements will not only alleviate the workload of engineers but also enhance the overall quality of results (QoR) in chip designs.
The EDA ecosystem’s fantastic support and innovations have, for decades, helped achieve better logic, memory, wafer-level packaging, and AI chips and systems [1] [2] [3]. We look forward to continuous win-win collaborations among university scientists, circuit designers, semiconductor chip manufacturers, and EDA companies in the foreseeable future. The critical issue in 2024 is what the major challenges are. A few months ago, at the December 2023 IEDM in San Francisco, worldwide semiconductor experts postulated that the complexities of upcoming high-end electronic systems will soon be in the range of one trillion transistors. The advanced transistors used are either FinFET or GAA/nanosheet, or both, in need of design and technology co-optimization (DTCO). Innovations in the 3DFabric Alliance [2] are essential to the trillion-transistor trend. The leap from traditional SoC/IC designs to 3DFabric (3DIC) designs brings new benefits and opportunities. This new system-level design paradigm inevitably introduces new EDA challenges in system design, verification, thermal management, mechanical stress, and electrical-photonic compliance of the entire 3DIC assembly, as well as its reliability.
The author summarizes here an updated list of four major EDA challenges: (1) new post-FinFET, GAA, and nanosheet design and technology co-optimization (DTCO) [3] and physical design algorithms that can provide VLSI/SoC circuit designers with efficient APR and physics-based SPICE models along with efficient RLC back annotations; (2) new, fast, and physics-based chip-level and system-level SI/PI/EMC simulation tools and flows; (3) a new concept-to-physical design methodology that can deliver high-quality, user-friendly, and fast time-to-market EDA flows for wafer-level packages of AI and data center solutions; and (4) mature EDA solutions for silicon photonics, which we expect to become practical within a few years and whose design automation remains challenging.
SESSION: Session 5: 3D ICs
- Siting Liu
- Jiaxi Jiang
- Zhuolun He
- Ziyi Wang
- Yibo Lin
- Bei Yu
- Martin Wong
Face-to-face (F2F) stacked 3D IC is a promising alternative for scaling beyond Moore’s Law. In F2F 3D ICs, dies are connected through bonding terminals whose positions can significantly impact routing performance. Further, there exists resource competition among all the 3D nets due to the constrained bonding terminal number. In advanced technology nodes, such 3D integration may also introduce legality challenges of bonding terminals, as the metal pitches can be much smaller than the sizes of bonding terminals. Previous works attempt to insert bonding terminals automatically using existing 2D commercial P&R tools and then consider inter-die connection legality, but they fail to take the legality and routing performance into account simultaneously. In this paper, we explore the formulation of the generalized assignment in the hybrid bonding terminal assignment problem. Our framework, BTAssign, offers a strict legality guarantee and an iterative solution. The experiments are conducted on 18 open-source designs with various 3D net densities and the most advanced bonding scale. The results reveal that BTAssign can achieve improvements in routed wirelength under all testing conditions from 1.0% to 5.0% with a tolerable runtime overhead.
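A toy version of the terminal-assignment step: each 3D net gets exactly one free bonding-terminal site, chosen greedily by Manhattan distance from the net’s desired crossing point. This is a drastic simplification of BTAssign’s generalized-assignment formulation and its legality rules; the net names and coordinates are invented.

```python
def assign_terminals(nets, sites):
    """Assign each 3D net to a distinct bonding-terminal site.

    nets:  {name: (x, y)} desired die-crossing point per net (hypothetical).
    sites: list of legal terminal locations; each may host one net only,
    modeling the resource competition among 3D nets.
    """
    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])  # Manhattan distance

    free = set(range(len(sites)))
    assignment = {}
    for name, p in sorted(nets.items()):  # fixed order for determinism
        best = min(free, key=lambda i: dist(p, sites[i]))
        assignment[name] = best
        free.remove(best)
    return assignment
```

Greedy per-net choices can be arbitrarily worse than a true generalized-assignment solution when sites are scarce, which is why the paper resorts to an iterative exact formulation with a strict legality guarantee.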
With the advancements in 2.5D/3D fabrication offered by foundry technologies for unleashing computing power, EDA tools must adapt and become more integrated and IC-centric for multi-chiplet system design. 3D stacking introduces extra design and analysis requirements, such as full-system planning, power and thermal analysis, cross-die STA, and inter-die physical verification, which have to be taken into account early during planning and implementation. In this paper, Cadence presents its technology that proactively looks ahead through integrated early analysis and addresses all aspects of 3D-IC design comprehensively, from system planning and implementation to analysis and system-level signoff.
- Jun-Ho Choy
- Stéphane Moreau
- Catherine Brunet-Manquat
- Valeriy Sukharev
- Armen Kteyan
A physics-based multi-scale simulation methodology that analyzes die stress variations generated by package fabrication is employed for warpage study. The methodology combines a coordinate-dependent anisotropic effective-properties extractor with a finite element analysis (FEA) engine, and computes mechanical stress both globally at the package scale and locally at the feature scale. For mechanical failure analysis in the early stage of package design, warpage measurements were used for the tool’s calibration. Warpage measurements on printed circuit board (PCB), interposer, and chiplet samples, during heating and subsequent cooling, were employed to calibrate the model parameters. Warpage simulation results on a full package, represented by a PCB-interposer-chiplet stack, demonstrate overall good agreement with the measured profile. The performed study demonstrates that the developed electronic design automation (EDA) tool and methodology can be used for accurate warpage prediction in different types of IC stacks at an early stage of package design.
- Wen-Hao Liu
- Anthony Agnesina
- Haoxing Mark Ren
Printed circuit board (PCB) design is typically semi-automated or fully manual. In recent years, however, the scale of PCB designs has grown rapidly, and the engineering effort of manual design has increased dramatically; automation has therefore become critical. PCB houses are looking for productivity improvements driven by automation. In this talk, the speaker will give a short tutorial on how a PCB design is done today and then outline the challenges and opportunities for PCB design automation.
SESSION: Session 6: Artificial Intelligence and Machine Learning
- Hao-Hsiang Hsiao
- Yi-Chen Lu
- Pruek Vanna-Iampikul
- Sung Kyu Lim
Current state-of-the-art Design Space Exploration (DSE) methods in Physical Design (PD), including Bayesian optimization (BO) and Ant Colony Optimization (ACO), mainly rely on black-box rather than parametric (e.g., neural network) approaches to improve end-of-flow Power, Performance, and Area (PPA) metrics, and they often fail to generalize across unseen designs because netlist features are not properly leveraged. To overcome this issue, in this paper, we develop a Reinforcement Learning (RL) agent that leverages Graph Neural Networks (GNNs) and Transformers to perform “fast” DSE on unseen designs by sequentially encoding netlist features across different PD stages. Particularly, an attention-based encoder-decoder framework is devised for “conditional” parameter tuning, and a PPA estimator is introduced to predict end-of-flow PPA metrics for RL reward estimation. Extensive studies across 7 industrial designs under the TSMC 28nm technology node demonstrate that the proposed framework, FastTuner, significantly outperforms existing state-of-the-art DSE techniques in both optimization quality and runtime, where we observe improvements of up to 79.38% in Total Negative Slack (TNS), 12.22% in total power, and 50x in runtime.
- Suwan Kim
- Hyunbum Park
- Kyeonghyeon Baek
- Kyumyung Choi
- Taewhan Kim
Resolving design rule checking (DRC) violations at the pre-route stage is critically important to reduce the time-consuming design closure process at the post-route stage. Recently, noticeable methodologies have been proposed to predict DRC hotspots using machine learning based prediction models. However, little attention has been paid to how the predicted DRC violations can be effectively resolved. In this paper, we propose a pre-route DRC violation resolution methodology that is tightly coupled with a fully compatible prediction model. Precisely, we devise different resolution strategies for two types of DRC violations: (1) pin accessibility (PA)-related and (2) routing congestion (RC)-related. To this end, we develop a fully predictable ML-based model for both PA- and RC-related DRC violations and propose completely different resolution techniques to be applied depending on the DRC violation type informed by the compatible prediction model. For (1) PA-related DRC violations, we extract the DRC violation mitigating regions, then improve placement by formulating the whitespace redistribution problem on the regions as an instance of Bayesian optimization to produce an optimal cell perturbation. For (2) RC-related DRC violations, we manipulate the routing resources within the regions that have high potential for the occurrence of RC-related DRC violations. Through experiments, it is shown that our methodology reduces the number of DRC violations by 26.54%, 25.28%, and 20.34% on average compared with a conventional flow with no resolution, a commercial ECO router, and a state-of-the-art academic predictor/resolver, respectively, while maintaining comparable design quality.
3D Integrated Circuits (3D-ICs) represent a significant advancement in semiconductor technology, offering enhanced functionality in smaller form factors, improved performance, and cost reductions. These 3D-ICs, particularly those utilizing Through-Silicon Vias (TSVs), are at the forefront of industry trends. They enable the integration of system components from various process nodes, including analog and RF, without being limited to a single node. TSVs outperform wire-bonded System in Package (SiP) in terms of reduced RLC parasitics, offering better performance, more power efficiency, and denser implementation. Compared to silicon interposer methods, vertical 3D die stacking achieves higher integration levels, smaller sizes, and quicker design cycles. This presentation introduces a novel AI-driven method designed to tackle the challenges hindering the automation of 3D-IC design flows.
The VLSI chip design process consists of a sequence of distinct steps such as floorplanning, placement, clock tree synthesis, and routing. Each of these steps requires solving optimization problems that are often NP-hard, and the state-of-the-art algorithms are not guaranteed to be optimal. Due to the compartmentalization of the design flow into distinct steps, these optimization problems are solved sequentially, with the output of the first feeding into the next. This results in an inherent inefficiency, where the optimization goal of an early-step problem is estimated using a fast and approximate surrogate model of the following steps. Consequently, any improvement in a step-specific optimization algorithm, while obvious at that step, is much smaller when measured at the end of the full design flow. For example, the placement step minimizes wire length. In the absence of routed nets, this wire length might be estimated using a simple wire length model such as the Steiner tree. Thus, any improvement in the placement algorithm is limited by the accuracy of the wire length estimate.
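As a concrete example of such a surrogate, the half-perimeter wirelength (HPWL), a lower bound on the Steiner tree length, is computed directly from pin coordinates. This is a generic illustration of a placement cost model, not any specific tool’s estimator.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net: the half-perimeter of the
    bounding box of its pins, a cheap lower bound on routed length."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(nets):
    """Placement objective surrogate: sum of per-net HPWL values."""
    return sum(hpwl(pins) for pins in nets)
```

For a three-pin net at (0, 0), (3, 4), and (1, 1), the bounding box is 3 wide and 4 tall, so HPWL is 7, whereas the true routed length depends on the router and obstacles, which is exactly the inaccuracy the text describes.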
Recently, Reinforcement Learning (RL) has emerged as a promising alternative to the state-of-the-art algorithms used to solve optimization problems in the placement and routing of a VLSI design [1, 2, 3]. The RL problem setup involves an agent exploring an unknown environment to achieve a goal. RL is based on the hypothesis that all goals can be described by the maximization of expected cumulative reward. The agent must learn to sense and perturb the state of the environment using its actions to derive maximal reward. Many problems in VLSI chip design can be represented as Markov Decision Processes (MDPs), where design optimization objectives are converted into rewards given by the environment and design variables are converted into actions provided to the environment. Recent advances in applying RL to VLSI implementation problems such as floor planning, standard cell layout, synthesis, and placement have demonstrated improvements over the state-of-the-art algorithms. However, these improvements continue to be limited by the inaccuracies in the estimate of the optimization goal as described previously.
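A stripped-down flavor of reward-driven flow-parameter selection: an epsilon-greedy bandit choosing among three hypothetical effort settings based on a simulated end-of-flow QoR score. The configuration names, score means, and noise level are invented for illustration; production systems such as DSO.ai use far richer state and action spaces than a stateless bandit.

```python
import random

def run_flow(config):
    """Stand-in for one full physical-design flow run: returns a noisy
    QoR score, higher is better. The true means below are invented."""
    true_qor = {"effort_low": 0.60, "effort_med": 0.75, "effort_high": 0.65}
    return true_qor[config] + random.gauss(0.0, 0.02)

def tune(configs, budget=300, eps=0.1):
    """Epsilon-greedy bandit: mostly exploit the config with the best
    running-average reward, occasionally explore a random one."""
    counts = {c: 0 for c in configs}
    means = {c: 0.0 for c in configs}
    for _ in range(budget):
        if random.random() < eps:
            c = random.choice(configs)       # explore
        else:
            c = max(configs, key=lambda k: means[k])  # exploit
        reward = run_flow(c)
        counts[c] += 1
        means[c] += (reward - means[c]) / counts[c]  # incremental mean
    return max(configs, key=lambda k: means[k])
```

After a few hundred simulated runs, the running averages separate the settings and the best one dominates; the key difference from step-specific RL is that the reward here is the score at the end of the whole flow, not of one step.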
With DSO.ai, we have built a distributed system for the optimization of physical design flow, where multiple iterations of parallel runs are used to optimize enormous design parameter search spaces. In addition to a multiplicative improvement in human productivity, it has unlocked significant performance gains across a wide range of technology nodes. At the heart of DSO.ai’s decision engine is an implementation of RL that solves the sequential decision making and optimization problem spanning the entire design flow. Unlike prior works where RL is used for step-specific optimization within the chip design flow, DSO.ai’s RL algorithm wraps around the optimization steps to guide them via parameter choices that depend upon the optimization goal for the full flow. Thus, the quality of the final design generated by DSO.ai is no longer subject to the limitations of the compartmentalized design flow. DSO.ai’s RL algorithm views the full chip design flow as one optimization problem, where the design quality at the end of the flow is the only one that matters. To propagate the design through the design flow, DSO.ai makes parameter choices for the underlying optimization steps, which constitute its action space. By tracking the effect of these actions as a function of the design state, DSO.ai can find the optimal sequence of actions to meet the optimization goal at the end of the full flow.
In this presentation, we demonstrate how DSO.ai provides a flexible framework to integrate with existing design flows and serve the design quality needs throughout the design evolution cycle. We will also highlight how DSO.ai is allowing expert designers at Synopsys to package their knowledge into fully featured toolboxes ready to be deployed by novice designers. We will provide a summary of QoR gains that DSO.ai has delivered on advanced process nodes. Finally, we will show how DSO.ai’s decision engine is paving the way to automating the parameter choices in the chip design flow.
It has been six years since an ISPD-2018 invited talk on “Machine Learning Applications in Physical Design”. Since then, despite considerable activity across both academia and industry, many R&D targets remain open. At the same time, there is now clearer understanding of where AI/ML can and cannot (yet) move the needle in physical design, as well as some of the difficult blockers and technical challenges that lie ahead. Some futures for AI/ML-boosted physical design are visible across solvers, engines, tools and flows – and in contexts that span generative AI, the modeling of “magic” handoffs at flow interstices, academic research infrastructure, and the culture of benchmarking and open-source EDA.
SESSION: Session 7: Second Keynote
To achieve the power, performance, and area (PPA) target in modern semiconductor design, the trend to go for More-than-Moore heterogeneous integration by packing various components/dies into a package becomes more obvious as the economic advantages of More-Moore scaling for on-chip integration are getting smaller and smaller. In particular, we have already encountered the high cost of moving to more advanced technology and the high fabrication cost associated with extreme ultraviolet (EUV) lithography, mask, process, design, electronic design automation (EDA), etc. Heterogeneous integration refers to integrating separately manufactured components into a higher-level assembly (in a package or even multiple packages in a PCB) that provides enhanced functionality and improved operating characteristics. Unlike the on-chip designs with relatively regular components and wirings, the physical design problem for heterogeneous integration often needs to handle arbitrary component shapes, diverse metal wire widths, and different spacing requirements between components, wire metals, and pads, with multiple cross-physics domain considerations such as system-level, physical, electrical, mechanical, thermal, and optical effects, which are not well addressed in the traditional chip design flow. In this paper, we first introduce popular heterogeneous integration technologies and options and their layout modeling and physical design challenges, then survey key published techniques, and finally provide future research directions for modern physical design for heterogeneous integration.
SESSION: Session 8: Analog
This article discusses fundamental differences between analog and digital circuits from a design perspective. On this basis one can understand why the design flows of these two circuit types differ so greatly, notably with regard to their degree of automation.
- Andreas Krinke
- Robert Fischbach
- Jens Lienig
The design and manufacturing of integrated circuits is an expensive endeavor. The use of open-source software can lower the barrier to entry significantly, especially for smaller companies or startups. In this paper, we look at open-source software for layout verification, a crucial step in ensuring the consistency and manufacturability of a design. We show that a comprehensive design rule check (DRC) and layout versus schematic (LVS) check for commercial technologies is possible with open-source software in general and with KLayout in particular. To facilitate the use of these tools, we present our approach to automatically generate the required DRC scripts from a more abstract representation. As a result, we are able to generate nearly 74% of the over 1000 design rules of X-FAB's XH018 180nm technology as a DRC script for the open-source software KLayout. This demonstrates the potential of using open-source software for layout verification and open-source process design kits (PDKs) in general.
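As a sketch of the script-generation idea, the snippet below turns a hypothetical abstract rule representation into KLayout DRC statements (using KLayout's documented `width`/`space` checks and `output` reporting). The rule tuples and layer names are invented examples, not X-FAB's actual representation.

```python
# Hypothetical abstract rule format: (rule id, layer, check type, min value in um).
# The rule ids and values below are invented for illustration.
RULES = [
    ("M1.W.1", "metal1", "width", 0.23),
    ("M1.S.1", "metal1", "space", 0.21),
]

def generate_drc_script(rules):
    """Emit one KLayout DRC statement per abstract rule."""
    lines = ['report("DRC report")']
    for rule_id, layer, check, value in rules:
        # e.g. metal1.width(0.23.um).output("M1.W.1", "width < 0.23 um")
        lines.append(
            f'{layer}.{check}({value}.um)'
            f'.output("{rule_id}", "{check} < {value} um")'
        )
    return "\n".join(lines)

script = generate_drc_script(RULES)
```

A production generator would also emit the `input(...)` layer definitions and handle the many rule types beyond simple width/space, which is where most of the remaining 26% of rules likely lie.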
- Mark Po-Hung Lin
- Chou-Chen Lee
- Yi-Chao Hsieh
Analog placement is a crucial phase in analog integrated circuit synthesis, impacting the quality and performance of the final circuits. This process involves determining the physical positions of analog building blocks while minimizing chip area and interconnecting wire-length. Existing methodologies often rely on the simulated-annealing (SA) approach, prioritizing constraints like symmetry-island, proximity, and well-island. We present a novel reinforcement learning (RL) based analog placement methodology on the bounded-sliceline grid (BSG) structure. Introducing a hierarchical clustering feature in BSG, we address well-island, proximity, and symmetry constraints. In experimental comparisons with the SA approach, our RL-based method exhibits superior placement quality across various analog circuits.
SESSION: Session 9: Placement
- Teng-Ping Huang
- Shao-Yun Fang
Propelled by aggressive technology scaling, adopting mixed-cell-height design in VLSI circuits has made conventional single row-based cell legalization techniques obsolete. Furthermore, the vertical abutment constraint (VAC) among cells on consecutive rows emerges as an advanced design requirement, which has rarely been considered because the power/ground rails were sufficiently tall in conventional process nodes to isolate cells on different rows. Although there have been a number of studies on mixed-cell-height legalization, most of them cannot be trivially extended to handle the general VAC well due to their analytical optimization schemes. To address these issues, this work proposes the first mixed-cell-height legalization algorithm that addresses the general inter-row cell abutment constraint (i.e., VAC). The experimental results show that the proposed algorithm outperforms previous mixed-cell-height legalization works, even in the absence of the VAC. Upon applying the VAC, our algorithm offers superior performance and delivers promising results.
- Yu Zhang
- Yuan Pu
- Fangzhou Liu
- Peiyu Liao
- Kai-Yuan Chao
- Keren Zhu
- Yibo Lin
- Bei Yu
A circuit design incorporating non-integer multi-height (NIMH) cells, such as a combination of 8-track and 12-track cells, offers increased flexibility in optimizing area, timing, and power simultaneously. The conventional approach for placing NIMH cells involves using commercial tools to generate an initial global placement, followed by a legalization process that divides the block area into row regions with specific heights and relocates cells to rows of matching height. However, such a placement flow often causes significant disruptions in the initial placement results, resulting in inferior wirelength. To address this issue, we propose a novel multi-electrostatics-based global placement algorithm that utilizes an NIMH-aware clustering method to dynamically generate rows. This algorithm directly tackles the global placement problem with NIMH cells. Specifically, we utilize an augmented Lagrangian formulation along with a preconditioning technique to achieve high-quality solutions with fast and robust numerical convergence. Experimental results on the OpenCores benchmarks demonstrate that our algorithm achieves about a 12% improvement in HPWL with a 23.5X speedup on average, outperforming state-of-the-art approaches. Furthermore, our placement solutions demonstrate a substantial improvement in WNS and TNS by 22% and 49%, respectively. These results affirm the efficiency and effectiveness of our proposed algorithm in solving row-based placement problems for NIMH cells.
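The augmented Lagrangian idea behind such formulations can be sketched on a scalar toy problem: minimize a quadratic "wirelength" objective subject to one equality "density" constraint. The coefficients, penalty schedule, and inner gradient-descent solver below are illustrative assumptions, not the paper's actual formulation.

```python
def augmented_lagrangian(f_grad, g, g_grad, x0,
                         mu=1.0, lam=0.0, outer=30, inner=200, lr=0.05):
    """Minimize f(x) subject to g(x) = 0 via the augmented Lagrangian
    L(x) = f(x) + lam * g(x) + (mu / 2) * g(x)**2 (scalar toy version)."""
    x = x0
    for _ in range(outer):
        for _ in range(inner):                 # approximate inner minimization
            grad = f_grad(x) + (lam + mu * g(x)) * g_grad(x)
            x -= lr * grad
        lam += mu * g(x)                       # multiplier update
        mu = min(mu * 1.5, 20.0)               # tighten penalty, capped for stability
    return x

# Toy analogue: quadratic "wirelength" objective, one "density" constraint x = 1.
f_grad = lambda x: 2.0 * (x - 2.0)   # gradient of (x - 2)^2
g      = lambda x: x - 1.0           # constraint g(x) = 0
g_grad = lambda x: 1.0
x_star = augmented_lagrangian(f_grad, g, g_grad, x0=0.0)
```

In placement, `x` becomes millions of cell coordinates, `f` a differentiable wirelength model, and `g` the per-bin electrostatic density overflow; the multiplier-update structure is the same.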
- Yuan Pu
- Tinghuan Chen
- Zhuolun He
- Chen Bai
- Haisheng Zheng
- Yibo Lin
- Bei Yu
This paper proposes IncreMacro, a novel approach for macro placement refinement in the context of integrated circuit (IC) design. The suggested approach iteratively and incrementally optimizes the placement of macros in order to enhance IC layout routability and timing performance. To achieve this, IncreMacro utilizes several methods including kd-tree-based macro diagnosis, gradient-based macro shifting and constraint-graph-based LP for macro legalization. By employing these techniques iteratively, IncreMacro meets two critical solution requirements of macro placement: (1) pushing macros to the chip boundary; and (2) preserving the original macro relative positional relationship. The proposed approach has been incorporated into DREAMPlace and AutoDMP, and is evaluated on several RISC-V benchmark circuits at the 7-nm technology node. Experimental results show that, compared with the macro placement solution provided by DREAMPlace (AutoDMP), IncreMacro reduces routed wirelength by 6.5% (16.8%), improves the routed worst negative slack (WNS) and total negative slack (TNS) by 59.9% (99.6%) and 63.9% (99.9%), and reduces the total power consumption by 3.3% (4.9%).
- Jai-Ming Lin
- You-Yu Chang
- Wei-Lun Huang
Since the multilevel framework with the analytical approach has been proven a promising method to handle the very-large-scale integration (VLSI) placement problem, this paper presents two techniques, a pin-connectivity-aware cluster score function and the identification of expected object distribution ranges, to further improve the coarsening and refinement stages of this framework. Moreover, we extend the proposed analytical placement method to consider timing in order to speed up design convergence. To optimize timing without increasing wirelength, our approach only increases the weights of timing-critical nets, where the weight of a net is estimated according to the associated timing slack and degree. Besides, we propose a new equation to update net weights based on their historical values to maintain the stability of the net-based timing-driven placement approach. Experimental results demonstrate that the proposed analytical placement approach with the new techniques improves the wirelength of the classic approach. Moreover, our TDP achieves much better WNS and TNS than previous timing-driven placers such as DREAMPlace 4.0 and Differentiable TDP.
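The net-weighting scheme described above, raise weights only for timing-critical nets, scale by slack and degree, and blend with the historical weight for stability, can be sketched as follows. The blending coefficient, criticality scaling, and all names are illustrative assumptions, not the paper's exact equation.

```python
def update_net_weights(weights, slacks, degrees, wns, alpha=0.5, beta=2.0):
    """Increase weights only for timing-critical nets (negative slack),
    scaled by criticality and divided by net degree, then blend with the
    historical weight for stability. alpha/beta values are illustrative."""
    new_w = {}
    for net, w_old in weights.items():
        slack = slacks[net]
        if slack < 0 and wns < 0:
            crit = slack / wns                       # 1.0 on the worst path
            target = 1.0 + beta * crit / degrees[net]
        else:
            target = 1.0                             # non-critical: base weight
        # Historical blending: new weight is a mix of old weight and target.
        new_w[net] = alpha * w_old + (1.0 - alpha) * target
    return new_w
```

Weighted nets then simply contribute `weight * HPWL(net)` to the analytical placer's wirelength objective, pulling critical nets shorter on the next iteration.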
SESSION: Session 10: Standard Cell, Routability, and IR drop
- Bing-Xun Song
- Ting Xin Lin
- Yih-Lang Li
In recent years, the accessibility of pins has become a focal point for cell design and synthesis research. In this study, we propose a novel approach to improve routability in upper-level routing by eliminating one M1 track during cell synthesis. This creates space for accommodating upper-level routing, leading to improved routability. We achieve consolidated routability of transistor placement by integrating fast track assignment with dynamic programming-based transistor placement. Additionally, we introduce a hybrid routing algorithm that identifies an optimal cell routing territory for each net. This optimal territory facilitates subsequent Steiner Minimum Tree (SMT) solutions for mixed-integer linear programming (MILP) and constrains the routing region of MILP, resulting in accelerated execution. The proposed MILP approach enables concurrent routing planning and pin metal allocation, effectively resolving the chicken-or-egg causality dilemma. Experimental results demonstrate that, when using the routing-friendly synthesized cell library, the routing quality in various designs surpasses that achieved with a handcrafted cell library in ASAP7 PDK. This improvement is evident in metrics such as wirelength, number of vias, and design rule check (DRC) violations.
- Chia-Tung Ho
- Ajay Chandna
- David Guan
- Alvin Ho
- Minsoo Kim
- Yaguang Li
- Haoxing Ren
Standard cells are essential components of modern digital circuit designs. With process technologies advancing beyond 5nm, more routability issues have arisen due to the decreasing number of routing tracks (RTs), increasing number and complexity of design rules, and strict patterning rules. The standard cell design automation framework is able to automatically design standard cell layouts, but it is struggling to resolve the severe routability issues in advanced nodes. As a result, a better and more efficient standard cell design automation method, one that can not only resolve the routability issues but also scale to hundreds of transistors to shorten the development time of standard cell libraries, is highly needed.
High-quality device clustering that considers routability in the layouts of different technology nodes can reduce complexity and help find routable layouts faster. In this paper, we develop a novel transformer-model-based clustering methodology: we train the model on LVS/DRC-clean cell layouts and leverage personalized PageRank vectors to cluster devices, with attention to the netlist graph and embeddings learned from actual LVS/DRC-clean layouts. On a benchmark of 94 complex and hard-to-route standard cells, the proposed method not only generates 15% more LVS/DRC-clean layouts but also runs on average 12.7× faster than previous work. The proposed method can generate 100% LVS/DRC-clean cell layouts for over 1000 standard cells and achieves 14.5% smaller cell width than an industrial standard cell library.
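Personalized PageRank (PPR) vectors, the clustering signal mentioned above, can be computed with plain power iteration. This standalone sketch works on an adjacency-dict graph, assumes no dangling nodes, and is only a toy stand-in for the paper's actual pipeline.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for a personalized PageRank vector: with probability
    alpha restart at `seed`, otherwise follow a random out-edge.
    Assumes every node has at least one out-edge (no dangling nodes)."""
    pr = {n: (1.0 if n == seed else 0.0) for n in adj}
    for _ in range(iters):
        nxt = {n: (alpha if n == seed else 0.0) for n in adj}
        for u, neighbors in adj.items():
            share = (1.0 - alpha) * pr[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        pr = nxt
    return pr
```

Devices whose PPR vectors (seeded at each device's netlist node) are close tend to sit in the same neighborhood of the netlist graph, which is what makes PPR a natural similarity signal for clustering.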
- Chien-Pang Lu
- Iris Hui-Ru Jiang
- Chung-Ching Peng
- Mohd Mawardi Mohd Razha
- Alessandro Uber
Multiple power domain design is prevalent for achieving aggressive power savings. In such designs, power delivery to cross-domain cells poses a tough challenge at advanced technology nodes because of the stringent IR drop constraint and the routing resource competition between the secondary power routing and regular signal routing. Nevertheless, this challenge has rarely been mentioned or studied in recent literature. Therefore, in this paper, we explore power sub-mesh construction to mitigate the IR drop issue for cross-domain cells and minimize its routing overhead. With the aid of physical, power, and timing related features, we train one IR drop prediction model and one design rule violation prediction model under power sub-meshes of various densities. The trained models effectively guide sub-mesh construction for cross-domain cells to budget the routing resource usage on secondary power routing and signal routing. Our experiments are conducted on industrial mobile designs manufactured by a 6nm process. Experimental results show that IR drop of cross-domain cells, the routing resource usage, and timing QoR are promising after our proposed methodology is applied.
SESSION: Session 11: Thermal Analysis and Packaging
Modeling the thermal challenges of heterogeneous 2.5D/3D IC packages is important for several reasons. Designing a large, high-power device, e.g., an AI or HPC processor, without considering how to get the heat out is likely to lead to problems later on, resulting in a sub-optimal packaging solution from cost, size, weight, and performance perspectives. Thermal simulation can be combined with physical verification. The benefits are the enablement of automatic extraction, power map generation, and simulation of the complete 3D IC assembly, as well as viewing thermal maps and addressing hotspots. This makes the IC design flow aware of temperature and hotspots at an early design stage.
The 3DIC design world is blooming with new ideas and new possibilities. With TSMC’s 3DFabricTM technology, new opportunities in architectural innovation have led to superior system performance and density. However, what comes with the new opportunities is the continuous rise in design complexity.
In this talk, we will introduce 3Dblox, our latest invention to ease the design complexity challenge. 3Dblox is an innovative design language that modularizes the complex 3DIC structures to streamline the design flow, and it is open and free to all industry participants. All four EDA vendors, Cadence, Synopsys, Ansys, and Siemens, have actively participated in this effort to provide a unified design ecosystem to unleash the ultimate 3DIC design productivity.
3D integration solutions have long been called for in the semiconductor market as a possible substitute for technology scaling. They consist of 3D IC packaging, 3D IC integration, and 3D silicon integration. 3D IC packaging has been in the market, but 3D IC and silicon integration have obtained more attention due to modern system requirements for high-performance computing and edge AI applications. Driven by the need for further integration in electronics system development at lower cost, chip and package design have been evolving over the years [11].
First, in technologies: besides through-silicon-via (TSV) technology applied to 3D stacked ICs, hybrid bonding (also called Cu-to-Cu direct bonding) and microbump bonding are used in silicon integration and advanced packaging, as presented in [6, 7, 11], to name a few. They pave the way for chiplet-based (in some papers, multi-die-based) integration. Manufacturing concerns such as thermal or stress-induced difficulties are still with us, and using and adapting modern technologies wisely and cost-effectively remains our continuing mission. In addition, alternative materials for the integration, such as glass instead of silicon for interposers, can possibly cut us some slack [9].
Second, in methodologies: since they are closely coupled with the technologies, tool vendors have been working closely with foundries and design houses on more effective solutions. Among those efforts, die-to-die (D2D) implementation and interfaces are considered the top priority for system realization. From the development history of redistribution layer (RDL) routing across different periods to the creation of the Universal Chiplet Interconnect Express (UCIe) standard [8], we need to figure out ways to achieve more efficient and cost-effective methods.
Last but not least, the focus of the integration has been the system specification itself. Whether the system targets edge or cloud applications makes a lot of difference. There have been some efforts in chiplet-based LLM cloud chips, such as [3, 4]. Edge AI accelerators should be chipletized as well, especially for coming trends in smaller language model applications. Although there has been prior research on tools such as [1], we need more effort on edge AI design automation. In all, the system-technology co-design (STCO) concept [5, 10] is another perspective from which to approach the best implementation for different applications.
In the first manuscript of this series [2], we introduced some perspectives on 3D integration for system designs; in this talk, we continue to depict the future of 3D integration as we know it, plus new observations, technologies, and methodologies such as programmable packages and a building-block-based multi-chiplet methodology.
SESSION: Session 12: Lifetime Achievement Session
In a typical integrated circuit electronic design automation (EDA) flow, scheduling is a key step in high-level synthesis, which is the first stage of the EDA flow that synthesizes a cycle-accurate register transfer level (RTL) from the given behavior description, while physical design is the last stage in the EDA flow that generates the final geometric layout of the transistors and wires for fabrication. As a result, scheduling and physical design are usually carried out independently. In this paper, I discuss multiple research projects that I have been involved with, where the interaction between scheduling and physical design are shown to be highly beneficial. I shall start with my very first paper in EDA on multi-layer channel routing which benefited from an unexpected connection to the optimal two-processor scheduling algorithm, a joint work with Prof. Martin Wong, who is being honored at this conference for the 2024 ISPD Lifetime Achievement Award. Then, I shall further demonstrate how scheduling can help to overcome interconnect bottleneck, enable parallel placement and routing, and, finally, play a key role in layout synthesis for quantum computing.
Today, we have abundant parallel computing resources, while most EDA tools are still running sequentially. It is interesting to see how physical design can be advanced by leveraging this massive parallel computing power. To achieve significant speedup, it is usually not enough to simply run a few copies of the same sequential method in parallel. Innovative parallel algorithms that solve the problem from a new perspective using different mechanisms are needed. We will look at a few examples in physical design and logic synthesis in this talk to illustrate some methodologies and techniques in parallelizing design automation.
Professor Martin D. F. Wong is well recognized as a distinguished figure in the community of physical design, owing to his numerous and noteworthy contributions. This talk aims to highlight his pioneering works in the field of automatic floorplan design. Professor Wong’s profound insights and innovative approaches have not only propelled advancements in the field but also served as an inspirational source for other researchers.
The 2024 International Symposium on Physical Design lifetime achievement award goes to Professor Martin D. F. Wong for his outstanding contributions in the field.
SESSION: Session 13: Third Keynote
Large language models (LLMs) have achieved remarkable performance in many AI applications, but they require models with very large numbers of parameters. The parameter count ranges from several billion to a trillion, resulting in huge computation requirements for both training and inference. Generally speaking, LLMs with more parameters aim to explore “Emergent Abilities” of AI models. On the other hand, LLMs with fewer parameters aim to reduce the computing burden and democratize generative AI applications.
To fulfill these huge computation requirements, Domain-Specific Architecture is important to co-optimize AI models, hardware, and software designs, and to make trade-offs among different design parameters. Besides, there are also trade-offs between AI computation throughput and energy efficiency across different types of AI computing systems.
Large Multimodal Models (LMMs), also called Multimodal Large Language Models, integrate multiple data types as input. Multimodal information can provide rich context and environment information for LMMs to generate a better user experience. LMMs are also a trend for mobile devices, because mobile devices often connect to many sensors, such as video, audio, touch, gyro, and navigation systems.
Recently, there has been a trend to run smaller LLMs/LMMs (near or below 10 billion parameters) on edge devices, such as Llama 2, Gemini Nano, and Phi-2. This shines a light on applying LLMs/LMMs in mobile devices, and several companies have provided experimental solutions on edge devices such as smartphones and PCs. Even with reduced model sizes, LLMs/LMMs still require more computing resources than previous mobile processor workloads and face challenges in memory size, bandwidth, and power efficiency.
Besides, device-side LLMs/LMMs in mobile processors can collaborate with cloud-side LLMs/LMMs in the data center to deliver better performance. They can offload computing from cloud-side models to provide seamless responses, act as agents that prompt cloud-side LLMs/LMMs, or be fine-tuned locally on user data to preserve privacy.
These LLM/LMM trends and new usage scenarios will shape future computing architecture design. In this talk, we will discuss these issues, and especially their impact on mobile processor design.
SESSION: Session 14: Quantum and Superconducting Circuits
The physical qubits in current quantum computers do not all interact with each other. Therefore, in executing a quantum algorithm on an actual quantum computer, layout synthesis is a crucial step that ensures that the synthesized circuit of the quantum algorithm can run smoothly on the quantum computer. In this paper, we focus on a layout synthesis problem for quantum circuits and improve a prior work, TB-OLSQ, which adopts a transition-based satisfiability modulo theories (SMT) formulation. We present how to modify TB-OLSQ to obtain an accelerated version for runtime reduction. In addition, we extend the accelerated version by considering gate absorption for better solution quality. Our experimental results show that compared with TB-OLSQ, the accelerated version achieves 121X speedup for a set of SWAP-free circuits and 6X speedup for the other set of circuits with no increase in SWAP gates. In addition, the accelerated version with gate absorption helps reduce the number of SWAP gates by 38.9% for the circuits requiring SWAP gates, while it is also 3X faster.
- Wei-Hsiang Tseng
- Yao-Wen Chang
- Jie-Hong Roland Jiang
Qubit mapping is crucial in optimizing the performance of quantum algorithms for physical executions on quantum computing architectures. Many qubit mapping algorithms have been proposed for superconducting systems recently. However, due to their limitations on the physical qubit connectivity, costly SWAP gates are often required to swap logical qubits for proper quantum operations. Trapped-ion systems have emerged as an alternative quantum computing architecture and have gained much recent attention due to their relatively long coherence time, high-fidelity gates, and good scalability for multi-qubit coupling. However, the qubit mapping of the new trapped-ion systems remains a relatively untouched research problem. This paper proposes a new coupling constraint graph with multi-pin nets to model the unique constraints and connectivity patterns in one-dimensional trapped-ion systems. To minimize the time steps for quantum circuit execution satisfying the coupling constraints for trapped-ion systems, we devise a divide-and-conquer solution using Satisfiability Modulo Theories for efficient qubit mapping on trapped-ion quantum computing architectures. Experimental results demonstrate the superiority of our approach in scalability and effectiveness compared to the previous work.
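The SWAP cost discussed above comes from limited physical connectivity: a two-qubit gate can only execute once the two logical qubits sit on adjacent physical qubits, so a standard lower bound is (shortest-path distance - 1) SWAPs. A standalone sketch follows; the adjacency-dict coupling-graph format is an invented convenience, not any framework's API.

```python
from collections import deque

def swaps_needed(coupling, mapping, q1, q2):
    """SWAPs required before logical qubits q1, q2 can interact:
    (shortest-path distance between their physical qubits) - 1.
    `coupling` is an adjacency dict over physical qubits; `mapping`
    sends logical qubits to physical qubits."""
    src, dst = mapping[q1], mapping[q2]
    dist = {src: 0}
    queue = deque([src])
    while queue:                      # breadth-first search
        u = queue.popleft()
        if u == dst:
            return max(dist[u] - 1, 0)
        for v in coupling[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("coupling graph is disconnected")
```

On a linear chain 0-1-2-3 (a crude stand-in for a one-dimensional trapped-ion layout), qubits mapped to positions 0 and 3 need two SWAPs; adjacent qubits need none.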
Adiabatic quantum-flux parametron (AQFP) is a superconducting technology with extremely low power consumption compared to the traditional CMOS structure. Since AQFP logic gates are all clocked by AC current, extra buffer cells are required for balancing the length of data paths. Furthermore, since the output current of an AQFP logic gate is too weak to drive more than one gate, splitter cells are needed for branching the output signals of multi-fanout gates. For an AQFP circuit, the total number of additional buffers and splitters may be much more than the number of logic gates (up to 9 times in the benchmark circuits after optimization), which would greatly impact the power, performance, and area of the circuit. In this paper, we propose several techniques to (i) reduce the total number of required buffers and splitters, and (ii) perturb the levels of logic gates in order to seek more optimization opportunities for buffer and splitter reduction. Experimental results show that our approach has better quality with comparable runtime compared to a retiming-based method from ASP-DAC'23. Moreover, our approach achieves quality on par with the integer linear programming-based method, also from ASP-DAC'23.
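The buffer/splitter overhead described above can be counted with a simplified model: an edge from a gate at clock level L to one at level M needs M - L - 1 buffers to keep paths in lockstep, and a gate with fanout f needs f - 1 one-to-two splitters. This sketch ignores the clock levels the splitters themselves occupy, so it is a rough estimate, not the paper's cost model.

```python
from collections import defaultdict

def count_buffers_and_splitters(level, edges):
    """Simplified AQFP overhead estimate.
    level: dict mapping each gate to its clock level.
    edges: list of (driver, sink) pairs.
    Buffers balance path lengths between clocked levels; splitters
    replicate the weak output of each multi-fanout gate (one 1-to-2
    splitter per extra fanout). Splitter levels are not modeled."""
    fanout = defaultdict(int)
    buffers = 0
    for u, v in edges:
        fanout[u] += 1
        buffers += max(level[v] - level[u] - 1, 0)
    splitters = sum(f - 1 for f in fanout.values() if f > 1)
    return buffers, splitters
```

Perturbing gate levels, as the paper proposes, changes the `level[v] - level[u] - 1` terms, which is exactly where the buffer-reduction opportunities come from in this model.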
SESSION: Session 15: Physical Design Challenges for Automotive
As vehicular technology advances, vehicles become more connected and autonomous. Connectivity provides the capability to exchange information between vehicles, and autonomy provides the capability to make decisions and control each vehicle precisely. Connectivity and autonomy realize many evolutional applications, such as intelligent intersection management and cooperative adaptive cruise control. Electric vehicle technology is sometimes combined with these capabilities to create more use cases and business models. However, these intelligent features make the design process more complicated and challenging. In this talk, we introduce several examples of automotive design automation, which is required to improve the design quality and facilitate the design process. We mainly discuss the rising incompatibility issue, where different original equipment manufacturers and suppliers are developing systems, but the designs are confidential and thus incompatible with other players’ designs. The incompatibility issue is especially critical with autonomous vehicles because no human driver resolves incompatible scenarios. We believe that techniques and experiences in electronic design automation can provide insights and solutions to automotive design automation.
The design of automotive ASICs faces several key challenges that mainly arise from the harsh environmental operating conditions, specific functional loads, cost pressure, safety requirements, and the steady progress of the automotive-grade semiconductor technologies that are unique to automotive applications.
The talk first highlights these key differences between the design approaches for automotive and non-automotive ASIC designs. It also addresses why automotive ASIC designs prefer larger and more mature nodes compared to leading-edge non-automotive ASIC designs. In addition, the talk introduces several automotive-specific physical design problems and essential solutions for design implementation, direct-verification and meta-verification to address them. Finally, the talk provides an outlook of several related and yet-unsolved challenges in the physical design domain.
Silicon systems have been part of automobiles for a long time. The physical design methodologies to address the quality, reliability, and safety challenges of these systems are common knowledge in the leading automotive semiconductor companies. The rise of trends like autonomous driving (ADAS), software-defined vehicles (SDV), and the electrification of our transportation network is giving rise to not only new levels of these challenges, but also many new players in the automotive semiconductor space. The same forces of opportunity which are transforming our society are also the foundation of a transformation in automotive semiconductor design: massive improvements in accelerated compute, 3DIC and chiplet-based design, digital twins, and artificial intelligence (AI). We’ll discuss how these forces are helping modern automotive semiconductor design and highlight how the electronic design automation (EDA) industry can apply successful principles from earlier eras to these new challenges.
SESSION: Session 16: Contest Results and Closing Remarks
- Rongjian Liang
- Anthony Agnesina
- Wen-Hao Liu
- Haoxing Ren
Modern VLSI design flows demand scalable global routing techniques applicable across diverse design stages. In response, the ISPD 2024 contest pioneers the first GPU/ML-enhanced global routing competition, leveraging advancements in GPU-accelerated computing platforms and machine learning techniques to address scalability challenges. Large-scale benchmarks, containing up to 50 million cells, offer test cases to assess global routers’ runtime and memory scalability. The contest provides simplified input/output formats and performance metrics, framing global routing challenges as mathematical optimization problems and encouraging diverse participation. Two sets of evaluation metrics are introduced: the primary one concentrates on global routing applications to guide post-placement optimization and detailed routing, focusing on congestion resolution and runtime scalability. Special honor is given based on the second set of metrics, which places additional emphasis on runtime efficiency and aims at guiding early-stage planning.