ISPD ’23: Proceedings of the 2023 International Symposium on Physical Design

SESSION: Session 1: Opening Session and Keynote I

Automated Design of Chiplets

  • Alberto Sangiovanni-Vincentelli
  • Zheng Liang
  • Zhe Zhou
  • Jiaxi Zhang

Chiplet-based designs have gained recognition as a promising alternative to monolithic SoCs due to their lower manufacturing costs, improved reusability, and optimized technology specialization. Despite progress in various related domains, chiplet design remains largely reliant on manual processes. In this paper, we examine the historical evolution of chiplets, reviewing crucial design considerations and summarizing recent advances in relevant fields. Further, we identify and examine the opportunities and challenges in the automated design of chiplets. To further demonstrate the potential of this nascent area, we present a novel task that

SESSION: Session 2: Routing

FastPass: Fast Pin Access Analysis with Incremental SAT Solving

  • Fangzhou Wang
  • Jinwei Liu
  • Evangeline F.Y. Young

Pin access analysis is a critical step in detailed routing. With complicated design rules and pin shapes, efficient and accurate pin accessibility evaluation is desirable in many physical design scenarios. To this end, we present FastPass, a fast and robust pin access analysis framework, which first generates design rule checking (DRC)-clean pin access route candidates for each pin, pre-computes incompatible pairs of routes, and then uses incremental SAT solving to find an optimized pin access scheme. Experimental results on the ISPD 2018 benchmarks show that FastPass produces DRC-clean pin access schemes for all cases while being 14.7× faster than the known best pin access analysis framework on average.
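Casting pin access as satisfiability can be sketched in a few lines. The sketch below is an illustrative assumption, not FastPass itself. Each (pin, candidate route) pair becomes a Boolean variable, and every pin must pick exactly one candidate. Each pre-computed incompatible pair becomes a clause that forbids choosing both. A plain recursive DPLL solver stands in for the incremental SAT solver the abstract mentions, and the names `encode`, `dpll`, and `pin_access` are hypothetical.

```python
from itertools import combinations


def encode(pins, conflicts):
    """Build CNF clauses: each pin picks exactly one candidate; incompatible pairs exclude each other."""
    var, clauses = {}, []
    for pin, cands in pins.items():
        lits = []
        for c in cands:
            var[(pin, c)] = len(var) + 1  # DIMACS-style positive integer variable
            lits.append(var[(pin, c)])
        clauses.append(lits)  # at least one candidate per pin
        clauses += [[-a, -b] for a, b in combinations(lits, 2)]  # at most one
    for u, v in conflicts:
        clauses.append([-var[u], -var[v]])  # incompatible routes cannot coexist
    return var, clauses


def dpll(clauses, assign):
    """Recursive DPLL with unit propagation; returns a satisfying assignment or None."""
    reduced = []
    for c in clauses:
        if any(assign.get(abs(l)) == (l > 0) for l in c):
            continue  # clause already satisfied
        rest = [l for l in c if abs(l) not in assign]
        if not rest:
            return None  # every literal is false: conflict
        reduced.append(rest)
    if not reduced:
        return assign
    for c in reduced:
        if len(c) == 1:  # a unit clause forces its literal
            return dpll(reduced, {**assign, abs(c[0]): c[0] > 0})
    lit = reduced[0][0]
    for val in (lit > 0, lit < 0):  # branch: make lit true first, then false
        res = dpll(reduced, {**assign, abs(lit): val})
        if res is not None:
            return res
    return None


def pin_access(pins, conflicts):
    """Return {pin: chosen candidate}, or None if no conflict-free scheme exists."""
    var, clauses = encode(pins, conflicts)
    model = dpll(clauses, {})
    if model is None:
        return None
    return {pin: c for (pin, c), v in var.items() if model.get(v)}


if __name__ == "__main__":
    pins = {"A": ["a0", "a1"], "B": ["b0", "b1"]}
    conflicts = [(("A", "a0"), ("B", "b0")), (("A", "a0"), ("B", "b1"))]
    print(pin_access(pins, conflicts))  # {'A': 'a1', 'B': 'b0'}
```

In the example, candidate `a0` conflicts with both candidates of pin `B`, so the solver must give `A` its other candidate. An incremental solver would also keep its learned state between calls as conflicts are added. This sketch does not attempt that.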

Pin Access-Oriented Concurrent Detailed Routing

  • Yun-Jhe Jiang
  • Shao-Yun Fang

Due to continuously shrinking feature sizes and increasing design complexity, pin access has become one of the most critical challenges in large-scale full-chip routing. State-of-the-art pin access-aware detailed routing techniques suffer from either the ordering problem of the sequential routing scheme or the inflexibility of pre-determining an access point for each pin. Some other routing-related studies create pin extensions with Metal-2 segments to optimize pin accessibility; however, this strategy may not be practical if the contemporary routing flow is not considered. This paper presents a pin access-oriented concurrent detailed routing approach conducted after the track assignment stage. The core detailed routing engine is based on an integer linear programming (ILP) formulation that, compared to an existing formulation, has lower complexity and can flexibly handle multi-pin nets. In addition, to maximize the free routing resources and keep the problem size tractable, a pre-processing flow that trims redundant metals and inserts assistant metals is developed. The experimental results show that, compared to a state-of-the-art academic router, the proposed concurrent scheme effectively derives good results with fewer design rule violations and less runtime.

Reinforcement Learning Guided Detailed Routing for Custom Circuits

  • Hao Chen
  • Kai-Chieh Hsu
  • Walker J. Turner
  • Po-Hsuan Wei
  • Keren Zhu
  • David Z. Pan
  • Haoxing Ren

Detailed routing is the most tedious and complex procedure in design automation and has become a determining factor in layout automation in advanced manufacturing nodes. Despite continuing advances in custom integrated circuit (IC) routing research, industrial custom layout flows remain heavily manual due to the high complexity of the custom IC design problem. Besides conventional design objectives such as wirelength minimization, custom detailed routing must also accommodate additional constraints (e.g., path-matching) across the analog/mixed-signal (AMS) and digital domains, making an already challenging procedure even more so. This paper presents a novel detailed routing framework for custom circuits that leverages deep reinforcement learning to optimize routing patterns while considering custom routing constraints and industrial design rules. Comprehensive post-layout analyses based on industrial designs demonstrate the effectiveness of our framework in dealing with the specified constraints and producing sign-off-quality routing solutions.

Voltage-Drop Optimization Through Insertion of Extra Stripes to a Power Delivery Network

  • Jai-Ming Lin
  • Yu-Tien Chen
  • Yang-Tai Kung
  • Hao-Jia Lin

As design complexity increases, power delivery network (PDN) optimization becomes an increasingly important step in modern design. To construct a robust PDN, most classic PDN optimization methods focus on adjusting the dimensions of power stripes. However, this approach becomes infeasible when voltage violation regions also suffer from severe routing congestion. Hence, this paper proposes a careful procedure that inserts additional power stripes to reduce voltage violations while maintaining routability. First, the regions most strongly related to high IR drop are identified to reveal the locations that need more current. Then, we solve a minimum-cost flow problem to find the topologies of power delivery paths (PDPs) from the power sources to these regions and determine the widths of the edges in each PDP so that enough current can be supplied to these regions. Moreover, dynamic programming is used to insert vertical power stripes (VPSs) at locations with less routing congestion and severe voltage violations, reducing the probability of degrading routability. Finally, more wires are inserted into the IR-drop-related regions if voltage violations still exist. Experimental results show that our method uses much less routing resource and induces less routing congestion while meeting IR-drop constraints in industrial designs.
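The dynamic-programming step for choosing VPS locations can be illustrated with a simplified one-dimensional model. Everything in it is an assumption for illustration. Each candidate column carries a hypothetical `score`, for example voltage-violation severity minus a congestion penalty. Chosen stripes must be at least `min_gap` columns apart. The DP is the standard weighted-selection recurrence, not the paper's exact formulation.

```python
def place_stripes(score, min_gap):
    """Pick column indices maximizing total score, with chosen columns at least min_gap apart (min_gap >= 1)."""
    n = len(score)
    best = [0.0] * (n + 1)  # best[k]: optimal total using columns 0..k-1 only
    take = [False] * n      # take[i]: whether the optimum for prefix i+1 uses column i
    for i in range(n):
        # If column i is used, the previous stripe must lie in columns 0..i-min_gap
        prev = best[i - min_gap + 1] if i - min_gap + 1 >= 0 else 0.0
        if score[i] + prev > best[i]:
            best[i + 1] = score[i] + prev
            take[i] = True
        else:
            best[i + 1] = best[i]
    # Backtrack to recover the chosen columns
    picks, i = [], n - 1
    while i >= 0:
        if take[i]:
            picks.append(i)
            i -= min_gap
        else:
            i -= 1
    return best[n], picks[::-1]


if __name__ == "__main__":
    print(place_stripes([3, 1, 4, 1, 5], min_gap=2))  # (12.0, [0, 2, 4])
```

Negative scores, such as a congested column with little violation, are never selected, because adding them can only lower the total.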

NVCell 2: Routability-Driven Standard Cell Layout in Advanced Nodes with Lattice Graph Routability Model

  • Chia-Tung Ho
  • Alvin Ho
  • Matthew Fojtik
  • Minsoo Kim
  • Shang Wei
  • Yaguang Li
  • Brucek Khailany
  • Haoxing Ren

Standard cells are essential components of modern digital circuit designs. With process technologies advancing beyond the 5nm node, more routability issues have arisen due to the decreasing number of routing tracks, increasing number and complexity of design rules, and strict patterning rules. Automatic standard cell synthesis tools are struggling to design cells with severe routability issues. In this paper, we propose a routability-driven standard cell synthesis framework using a novel pin density aware congestion metric, lattice graph routability modelling approach, and dynamic external pin allocation methodology to generate routability optimized layouts. On a benchmark of 94 complex and hard-to-route standard cells, NVCell 2 improves the number of routable and LVS/DRC clean cell layouts by 84.0% and 87.2%, respectively. NVCell 2 can generate 98.9% of cells LVS/DRC clean, with 13.9% of the cells having smaller area, compared to an industrial standard cell library with over 1000 standard cells.

SESSION: Session 3: 3D ICs, Heterogeneous Integration, and Packaging I

FXT-Route: Efficient High-Performance PCB Routing with Crosstalk Reduction Using Spiral Delay Lines

  • Meng Lian
  • Yushen Zhang
  • Mengchu Li
  • Tsun-Ming Tseng
  • Ulf Schlichtmann

In high-performance printed circuit boards (PCBs), adding serpentine delay lines is the most prevalent delay-matching technique to balance the delays of time-critical signals. Serpentine topology, however, can induce simultaneous accumulation of the crosstalk noise, resulting in erroneous logic gate triggering and speed-up effects. The state-of-the-art approach for crosstalk alleviation achieves waveform integrity by enlarging wire separation, resulting in an increased routing area. We introduce a method that adopts spiral delay lines for delay matching to mitigate the speed-up effect by spreading the crosstalk noise uniformly in time. Our method avoids possible routing congestion while achieving a high density of transmission lines. We implement our method by constructing a mixed-integer-linear programming (MILP) model for routing and a quadratic programming (QP) model for spiral synthesis. Experimental results demonstrate that our method requires, on average, 31% less routing area than the original design. In particular, compared to the state-of-the-art approach, our method can reduce the magnitude of the crosstalk noise by at least 69%.
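A delay line matches timing by adding trace length, whether its topology is serpentine or spiral. The following is a first-order sketch that ignores the coupling effects the paper targets. The extra length needed for a delay Δt on a line with effective permittivity ε_eff follows from the propagation velocity v = c/√ε_eff. The function names and the example ε_eff value are assumptions for illustration.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def extra_length_mm(delay_ps, eps_eff):
    """Trace length (mm) that adds delay_ps of propagation delay on a line with effective permittivity eps_eff."""
    v = C0 / math.sqrt(eps_eff)        # propagation velocity on the line, m/s
    return delay_ps * 1e-12 * v * 1e3  # seconds * (m/s) -> m -> mm


def match_lengths(lengths_mm):
    """Extra length each net needs to match the longest net (same layer, same eps_eff assumed)."""
    target = max(lengths_mm)
    return [target - length for length in lengths_mm]


if __name__ == "__main__":
    print(round(extra_length_mm(10.0, 4.0), 3))  # 1.499
    print(match_lengths([10.0, 12.5, 11.0]))     # [2.5, 0.0, 1.5]
```

The paper's contribution lies in how that extra length is shaped, as a spiral rather than a serpentine, so that crosstalk is spread in time. The length budget itself is the simple part shown here.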

On Legalization of Die Bonding Bumps and Pads for 3D ICs

  • Sai Pentapati
  • Anthony Agnesina
  • Moritz Brunion
  • Yen-Hsiang Huang
  • Sung Kyu Lim

State-of-the-art 3D IC Place-and-Route flows were designed with older technology nodes and aggressive bonding pitch assumptions. As a result, these flows fail to honor the width and spacing rules for the 3D vias with realistic pitch values. We propose a critical new 3D via legalization stage during routing to reduce such violations. A force-based solver and bipartite-matching algorithm with Bayesian optimization are presented as viable legalizers and are compatible with various process nodes, bonding technologies, and partitioning types. With the modified 3D routing, we reduce the 3D via violations by more than 10× with zero impact on performance, power, or area.
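Bipartite matching is one of the two legalizers named above: it assigns each 3D via to a distinct legal bonding site. The sketch below is a hypothetical minimal version. It minimizes total Manhattan displacement by enumerating permutations, which is practical only for a handful of vias. It also omits the Bayesian-optimization tuning the abstract mentions. A real legalizer would use a polynomial-time assignment solver such as the Hungarian algorithm (for instance, `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations


def legalize(vias, sites):
    """Assign each via to a distinct legal site, minimizing total Manhattan displacement (brute force)."""
    best_cost, best = float("inf"), None
    # Try every injective mapping of vias onto sites
    for perm in permutations(range(len(sites)), len(vias)):
        cost = sum(abs(vx - sites[s][0]) + abs(vy - sites[s][1])
                   for (vx, vy), s in zip(vias, perm))
        if cost < best_cost:
            best_cost, best = cost, list(perm)
    return best, best_cost


if __name__ == "__main__":
    vias = [(0.4, 0.0), (1.6, 0.0)]              # illegal (off-grid) via locations
    sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # legal bonding-pitch sites
    assignment, cost = legalize(vias, sites)
    print(assignment)  # [0, 2]
```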

Reshaping System Design in 3D Integration: Perspectives and Challenges

  • Hung-Ming Chen
  • Chu-Wen Ho
  • Shih-Hsien Wu
  • Wei Lu
  • Po-Tsang Huang
  • Hao-Ju Chang
  • Chien-Nan Jimmy Liu

In this paper, we depict modern system design methodologies via 3D integration along with the advance of packaging, considering system prototyping, interconnecting, and physical implementation. The corresponding challenges are presented as well.

SESSION: Session 4: 3D ICs, Heterogeneous Integration, and Packaging II

Co-design for Heterogeneous Integration: A Failure Analysis Perspective

  • Erica Douglas
  • Julia Deitz
  • Timothy Ruggles
  • Daniel Perry
  • Damion Cummings
  • Mark Rodriguez
  • Nichole Valdez
  • Brad Boyce

As scaling for CMOS transistors asymptotically approaches the end of Moore’s Law, the push into 3D integration schemes to innovate capabilities is gaining significant traction. Further, rapid development of new semiconductor solutions, such as heterogeneous integration, has carried the semiconductor industry’s consistent march towards next-generation products into new arenas. In 2018, the Department of Energy Office of Science (DOE SC) released its “Basic Research Needs for Microelectronics,” communicating a strong push towards “parallel but intimately networked efforts to create radically new capabilities”[1], which it has termed “co-design.”

Advanced packaging and heterogeneous integration, particularly with mixed semiconductor materials (e.g., CMOS FPGAs and GaN RF amplifiers), is a realm ripe for applying DOE SC’s co-design call to action. In theory, development occurring at all scales across the semiconductor ecosystem, particularly across disciplines that are not traditionally adjacent, should significantly accelerate innovation. In reality, co-design requires a paradigm shift in approach that goes beyond interconnected parallel development. Further, accurate ground-truth data during learning cycles is critical for communicating effectively and efficiently across disparate disciplines and for informing design iterations across the microelectronics ecosystem.

This talk will outline three orthogonal facets of co-design for heterogeneous integration (HI): (1) ongoing efforts to develop materials characterization and failure analysis techniques that enable accurate evaluation of materials and heterogeneously integrated components, (2) development of artificial intelligence and machine learning algorithms for large-scale, high-throughput process development and characterization, and (3) development of capabilities for rapid communication and visualization of data across disparate disciplines.

Goal Driven PCB Synthesis Using Machine Learning and CloudScale Compute

  • Taylor Hogan

X AI is a cloud-based system that leverages machine learning and search to place and route printed circuit boards using physics-based analysis and high-level design. We propose a feedback-based Monte Carlo Tree Search (MCTS) algorithm to explore the space of possible designs. One or more metrics are given to evaluate the quality of designs as MCTS learns about possible solutions. A policy network and a value network are trained during exploration to learn to accurately weight quality actions and identify useful design states. This is performed as a feedback loop in conjunction with other feedforward tools for placement and routing.
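Policy and value networks are commonly combined with MCTS through the PUCT selection rule popularized by AlphaZero. The abstract does not say whether X AI uses this exact rule. The sketch below is therefore an assumption-labeled illustration of the general mechanism, with a hypothetical exploration constant `c_puct`. The policy prior P steers exploration toward promising actions. The backed-up value estimates form Q.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # policy-network probability P(s, a)
    visits: int = 0         # visit count N(s, a)
    value_sum: float = 0.0  # sum of backed-up value estimates W(s, a)

    @property
    def q(self) -> float:
        # Mean value Q(s, a); unvisited children default to 0
        return self.value_sum / self.visits if self.visits else 0.0


def select(children, c_puct=1.5):
    """Index of the child maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    n_parent = sum(ch.visits for ch in children)

    def score(ch):
        return ch.q + c_puct * ch.prior * math.sqrt(n_parent) / (1 + ch.visits)

    return max(range(len(children)), key=lambda i: score(children[i]))


def backup(child, value):
    """Record one simulation result, e.g. a value-network estimate of design quality."""
    child.visits += 1
    child.value_sum += value


if __name__ == "__main__":
    kids = [Child(0.5, visits=10, value_sum=2.0), Child(0.5, visits=1, value_sum=0.5)]
    print(select(kids))  # 1: higher mean value and far fewer visits
```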

Gate-All-Around Technology is Coming: What’s Next After GAA?

  • Victor Moroz

Currently, the industry is transitioning from FinFETs to gate-all-around (GAA) technology and will likely have several GAA technology generations in the next few years. What’s next after that? This is the question that we are trying to answer in this project by benchmarking GAA technology with transistors on 2D materials and stacked transistors (CFETs).

The main objective for logic is to get a meaningful gain in power, performance, area, and cost (PPAC). The main objective for SRAM is to get a noticeable density scaling for the SRAM array and its periphery without losing performance and yield. Another objective is to move in the direction that has a promise of longer-term progress, such as to start stacking two layers of transistors before moving to a larger number of transistor layers. With that in mind, we explore and discuss the next steps beyond GAA technology.

SESSION: Session 5: Analog Design

VLSIR – A Modular Framework for Programming Analog & Custom Circuits & Layouts

  • Dan Fritchman

We present VLSIR, a modular and fully open-source framework for programming analog and custom circuits and layouts. VLSIR is centered around a protobuf-defined design database. It features high-productivity front-ends for hardware description (“circuit programming”), simulation, and custom layout programming, designed to be amenable to both human designers and automation.

Joint Optimization of Sizing and Layout for AMS Designs: Challenges and Opportunities

  • Ahmet F. Budak
  • Keren Zhu
  • Hao Chen
  • Souradip Poddar
  • Linran Zhao
  • Yaoyao Jia
  • David Z. Pan

Recent advances in analog device sizing algorithms show promising results in automatic schematic design. However, most sizing algorithms rely on schematic-level simulations and are layout-agnostic. The physical layout implementation introduces extra parasitics into analog circuits, leading to discrepancies between schematic and post-layout performance. This performance gap raises questions about the effectiveness of automatic analog device sizing tools. Prior work has leveraged procedural layout generation to account for layout-induced parasitics in the sizing process. However, the need for layout templates limits the applicability of such a methodology. In this paper, we propose to bridge automatic analog sizing with post-layout performance using state-of-the-art optimization-based analog layout generators. A quantitative study is conducted to measure the impact of layout awareness in state-of-the-art device sizing algorithms. Furthermore, we present our perspectives on the future directions in layout-aware analog circuit schematic design.

Learning from the Implicit Functional Hierarchy in an Analog Netlist

  • Helmut Graeb
  • Markus Leibl

Analog circuit design is characterized by a plethora of implicit design and technology aspects available to the experienced designer. In order to create useful computer-aided design methods, this implicit knowledge has to be captured in a systematic and hierarchical way. A key approach to this goal is to “learn” the knowledge from the netlist of an analog circuit. This requires a library of structural and functional blocks for analog circuits together with their individual constraints and performance equations, graph homomorphism techniques to recognize blocks that can have different structural implementations and I/O pins, as well as synthesis methods that exploit the learned knowledge. In this contribution, we will present how to make use of the functional and structural hierarchy of operational amplifiers. As an application, we explore the capabilities of machine learning in the context of structural and functional properties and show that the results can be substantially improved by pre-processing data with traditional methods for functional block analysis. This claim is validated on a data set of roughly 100,000 readily sized and simulated operational amplifiers.
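Recognizing functional blocks in a netlist starts with structural patterns. As a deliberately simple illustration, the sketch below finds current mirrors. It looks for pairs of same-type transistors that share gate and source nets, one of which is diode-connected. The `Mos` tuple format is a hypothetical netlist representation. Exact matching like this cannot handle the alternative structural implementations and I/O pins that the graph homomorphism techniques described above are meant to cover.

```python
from itertools import combinations
from typing import NamedTuple


class Mos(NamedTuple):
    name: str
    kind: str  # "nmos" or "pmos"
    d: str     # drain net
    g: str     # gate net
    s: str     # source net


def find_current_mirrors(devices):
    """Return (reference, output) pairs: same type, shared gate and source, reference diode-connected."""
    found = []
    for a, b in combinations(devices, 2):
        if a.kind != b.kind or a.g != b.g or a.s != b.s:
            continue
        for ref, out in ((a, b), (b, a)):
            # Reference device has drain tied to gate; the output device does not
            if ref.d == ref.g and out.d != out.g:
                found.append((ref.name, out.name))
    return found


if __name__ == "__main__":
    netlist = [
        Mos("M1", "nmos", "bias", "bias", "gnd"),  # diode-connected reference
        Mos("M2", "nmos", "out", "bias", "gnd"),   # mirror output
        Mos("M3", "nmos", "x", "in", "gnd"),       # unrelated device
    ]
    print(find_current_mirrors(netlist))  # [('M1', 'M2')]
```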

The ALIGN Automated Analog Layout Engine: Progress, Learnings, and Open Issues

  • Sachin S. Sapatnekar

The ALIGN (Analog Layout, Intelligently Generated from Netlists) project [1, 2] is a joint university-industry effort to push the envelope of automated analog layout through a systematic new approach, novel algorithms, and open-source software [3]. Analog automation research has been active for several decades, but has not found widespread acceptance due to its general inability to meet the needs of the design community. Therefore, unlike digital design, which has a rich history of automation and extensive deployment of design tools, analog design is largely unautomated.

ALIGN attempts to overcome several of the major issues behind this lack of success. First, to mimic the human designer’s ability to recognize sub-blocks and specify constraints, ALIGN has used machine learning (ML) based methods to assist in these tasks. Second, past automation approaches are largely specific to a class of designs. To overcome this limitation, ALIGN attempts to create a truly general layout engine by decomposing the layout automation process into a set of steps, with constraints specific to each family of circuits. These circuits are divided into four classes: low-frequency components (e.g., analog-to-digital converters (ADCs), amplifiers, and filters); wireline components for high-speed links (e.g., equalizers, clock/data recovery circuits, and phase interpolators); RF/wireless components (e.g., components of RF transmitters and receivers); and power delivery components (e.g., capacitor- and inductor-based DC-DC converters and low dropout (LDO) regulators). For each class of circuits, different sets of constraints are important, depending on their frequency, parasitic sensitivity, need for matching, etc., and ALIGN creates a unified methodological framework that can address each class. Third, for each step, ALIGN has developed new algorithms and approaches to help improve the performance of analog layout. Fourth, experienced analog designers desire greater visibility into the process and input into how design is carried out. ALIGN is therefore built modularly, providing multiple entry points at which a designer may intervene in the process.

Analog Layout Automation On Advanced Process Technologies

  • Soner Yaldiz

Despite the trends toward digitizing analog functions and disaggregating silicon, high-volume or high-performance system-on-chip (SoC) designs integrate numerous analog and mixed-signal (AMS) intellectual property (IP) blocks, including voltage regulators, clock generators, sensors, memory, and other interfaces. For example, fine-grain dynamic voltage and frequency scaling requires a dedicated clock generator and voltage regulator per compute unit. The design of these blocks in advanced FinFET or GAAFET technologies is challenging due to i) the increasing gap between schematic and post-layout simulation, ii) design rule complexity, and iii) strict reliability rules [1]. Converging a high-performance or high-power block may require multiple iterations of circuit sizing and layout changes. As a result, physical design, which is primarily a manual effort, has become a key bottleneck in the design process. Migrating these blocks across process technologies or process variants only exacerbates the problem. Layout synthesis for AMS IP blocks is an ongoing research problem with a long history [2]. It has recently gained more attention as researchers seek to leverage the latest advances in machine learning [3]. Yet neither template-based nor optimization-based approaches have significantly reduced the burden for high-performance products on leading process technologies.

This talk will first give an overview of the physical design of AMS IP blocks on an advanced process technology, highlighting the opportunities for and expectations of layout automation in this process. On a new process technology, physical design starts with early layout studies on a selection of critical high-performance or high-power subcircuits. In parallel, the IP blocks are placed in a bottom-up fashion, both to optimize the IP floorplan and to provide information for SoC floorplanning. Routing follows placement to verify post-layout performance. A quick turnaround during these explorations is vital for deciding on any architectural changes or circuit re-sizing. The rest of the talk will share experiences from piloting an open-source analog layout synthesis tool flow [4] on a 22nm FinFET technology for voltage regulators [5].

The talk will summarize the learnings from this exercise and the extensions to the tool flow. These extensions include a Boolean satisfiability-based routing algorithm, a formally verifiable constraint language, and the use of parameterized and standard cells. The talk will conclude with opportunities for research.

SESSION: Session 6: Keynote II

Immersion and EUV Lithography: Two Pillars to Sustain Single-Digit Nanometer Nodes

  • Burn J. Lin

Semiconductor technology has advanced to single-digit nanometer dimensions for circuit elements, and the minimum feature size has reached subwavelength dimensions. Many resolution enhancement techniques have been developed to extend the resolution limit of optical lithography systems, namely illumination optimization, phase-shifting masks, and proximity corrections. In addition, the actinic wavelength has been reduced, and the numerical aperture of the imaging lens increased, in stages. The most recent innovations are immersion lithography and extreme ultraviolet (EUV) lithography.

In this presentation, the working principles, advantages, and challenges of immersion lithography are given. The defectivity issue is addressed by showing possible causes and solutions. The circuit design issues for pushing immersion lithography to single-digit nanometer delineation are presented.

Similarly, the working principles, advantages, and challenges of EUV lithography are given. Special focus is placed on EUV power requirements, generation, and distribution; on EUV mask components, absorber thickness, defects, flatness requirements, and pellicles; and on EUV resist challenges in sensitivity, line edge roughness, thickness, and etch resistance.
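The trade-off between shorter wavelength and larger numerical aperture is captured by the Rayleigh resolution criterion, half-pitch = k1·λ/NA, with NA = n·sin θ. Immersion lithography raises NA above 1 by filling the gap between lens and wafer with water (n ≈ 1.44 at 193 nm). EUV instead cuts λ to 13.5 nm. The sketch below evaluates the criterion. The k1 = 0.25 single-exposure limit and the NA values are typical textbook figures, used here as assumptions. The sin θ value is hypothetical.

```python
def half_pitch_nm(k1, wavelength_nm, na):
    """Rayleigh criterion: smallest printable half-pitch = k1 * lambda / NA."""
    return k1 * wavelength_nm / na


def numerical_aperture(n_medium, sin_theta):
    """NA = n * sin(theta), where n is the refractive index between lens and wafer."""
    return n_medium * sin_theta


if __name__ == "__main__":
    na_immersion = numerical_aperture(1.44, 0.9375)            # water at 193 nm, NA of about 1.35
    print(round(half_pitch_nm(0.25, 193.0, na_immersion), 1))  # ArF immersion: 35.7
    print(round(half_pitch_nm(0.25, 13.5, 0.33), 1))           # EUV at NA 0.33: 10.2
```

Single-digit nanometer features lie below both results. This is why the abstract pairs these tools with design-side measures such as multiple patterning-friendly circuit layouts.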

SESSION: Session 7: DFM, Reliability, and Electromigration

Advanced Design Methodologies for Directed Self-Assembly

  • Shao-Yun Fang

Directed self-assembly (DSA) has become one of the most promising next-generation lithography technologies. It exploits the phase segregation of a block copolymer (BCP) during annealing to generate tiny feature shapes. Depending on the proportions of the two monomers in the adopted BCP, either cylinders or lamellae can be generated by removing one of the two monomers. These variants are referred to as cylindrical DSA and lamellar DSA, respectively. In addition, guiding templates are required to produce trenches before the BCP is filled in, so that the additional forces from the trench walls regulate the generated cylinders or lamellae. Both DSA technologies can be used to generate contact/via patterns in circuit layouts. However, the practices of designing guiding templates differ considerably because of their different manufacturing principles. This paper reviews existing studies on the guiding template design problem for contact/via hole fabrication with DSA. The design constraints are differentiated, and the design methodologies are introduced separately for cylindrical DSA and lamellar DSA. Finally, possible future research directions are suggested to further enhance contact/via manufacturability and the feasibility of adopting DSA in semiconductor manufacturing.

Challenges for Interconnect Reliability: From Element to System Level

  • Olalla Varela Pedreira
  • Houman Zahedmanesh
  • Youqi Ding
  • Ivan Ciofi
  • Kristof Croes

The high current densities carried by interconnects directly drive back-end-of-line (BEOL) reliability degradation. They locally increase the temperature through Joule heating, and they cause drift of the metal atoms. The local temperature increase due to Joule heating leads to thermal gradients along the interconnects, inducing degradation through thermomigration. As chip power density increases, thermal gradients may become a major reliability concern for scaled Cu interconnects. It is therefore essential to understand fundamentally the impact of thermal gradients on metal migration. Our studies show that, by combining modeling with a dedicated test structure, we can assess local temperature and temperature-gradient profiles. Moreover, in long-term experiments, we successfully generate voids at the locations of the highest temperature gradients. Additionally, the main consequence of scaling Cu interconnects is a dramatic drop in EM lifetime, and hence in the maximum allowed current density (Jmax). Currently, experimentally obtained EM parameters are used at the system design level to set current limits throughout the interconnect network. However, this approach is very simplistic and neglects the benefits provided by the redundancy and interconnectivity of the network. We used a system-level, physics-based EM simulation framework that can determine the EM-induced IR drop at the standard-cell level. Our studies with this framework show that the circuit reliability margins of the power delivery network (PDN) can be further relaxed.
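Such current limits are traditionally derived from Black's equation, MTTF = A·j^(-n)·exp(Ea/kT). The equation also shows why Joule heating matters: lifetime falls both with current density j and with temperature T. The sketch below computes the lifetime ratio between two operating points. The exponent n = 2 and the activation energy Ea = 0.9 eV are illustrative assumptions, not calibrated values for any process. This is the semi-empirical, per-wire view that the abstract argues is too simplistic for networks with redundancy.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def mttf_ratio(j1, t1_k, j2, t2_k, n=2.0, ea_ev=0.9):
    """MTTF(j2, T2) / MTTF(j1, T1) under Black's equation MTTF = A * j**(-n) * exp(Ea / (k*T)).

    n and ea_ev are illustrative defaults, not calibrated values for any process.
    """
    current_term = (j1 / j2) ** n  # the prefactor A cancels in the ratio
    thermal_term = math.exp(ea_ev / K_B_EV * (1.0 / t2_k - 1.0 / t1_k))
    return current_term * thermal_term


if __name__ == "__main__":
    print(mttf_ratio(1.0, 378.0, 2.0, 378.0))  # doubling j at fixed T: 0.25
    print(mttf_ratio(1.0, 378.0, 1.0, 388.0))  # 10 K of Joule heating shortens MTTF
```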

Combined Modeling of Electromigration, Thermal and Stress Migration in AC Interconnect Lines

  • Susann Rothe
  • Jens Lienig

The migration of atoms in metal interconnects in integrated circuits (ICs) increasingly endangers chip reliability. The susceptibility of DC interconnects to electromigration has been extensively studied. A few works on thermal migration and AC electromigration are also available. Yet the combined effect of both on chip reliability has been neglected thus far. This paper provides both FEM and analytical models for atomic migration and steady-state stress profiles in AC interconnects, considering electromigration, thermal migration, and stress migration combined. To this end, we extend existing models to include the impact of self-healing, temperature-dependent resistivity, and short wire length. We conclude by analyzing the impact of thermal migration on interconnect robustness and show that it can no longer be neglected in migration robustness verification.

Recent Progress in the Analysis of Electromigration and Stress Migration in Large Multisegment Interconnects

  • Nestor Evmorfopoulos
  • Mohammad Abdullah Al Shohel
  • Olympia Axelou
  • Pavlos Stoikos
  • Vidya A. Chhabria
  • Sachin S. Sapatnekar

Traditional approaches to analyzing electromigration (EM) in on-chip interconnects are largely driven by semi-empirical models. However, such methods are inexact for the typical multisegment lines that are found in modern integrated circuits. This paper overviews recent advances in analyzing EM in on-chip interconnect structures based on physics-based models that use partial differential equations, with appropriate boundary conditions, to capture the impact of electron-wind and back-stress forces within an interconnect, across multiple wire segments. Methods for both steady-state and transient analysis are presented, highlighting approaches that can solve these problems with a computation time that is linear in the number of wire segments in the interconnect.
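Physics-based EM models of this kind typically build on Korhonen's equation, ∂σ/∂t = ∂/∂x[κ(∂σ/∂x + G)]. Here σ is the hydrostatic stress and G collects the electron-wind driving force. Take the simplest case, a straight line of segments with blocking ends. Its steady state can be computed in one pass over the segments, which illustrates why linear-time analysis is possible. The sketch below rests on several assumptions: uniform cross-section, zero initial stress, and the sign convention as written. It does not cover the trees, transient analysis, or other structures the paper addresses.

```python
def steady_state_stress(lengths, drive):
    """Steady-state EM stress at the nodes of a straight multisegment line, in O(n).

    Illustrative assumptions: blocking boundaries at both ends, uniform cross-section,
    zero initial stress, and a per-segment driving term drive[i] playing the role of G
    in d(sigma)/dt = d/dx[kappa * (d(sigma)/dx + G)].
    Zero atomic flux in steady state gives d(sigma)/dx = -G in each segment, so stress
    is piecewise linear; atom conservation fixes the constant: integral of sigma = 0.
    """
    rel = [0.0]  # node stresses relative to the first node
    for length, g in zip(lengths, drive):
        rel.append(rel[-1] - g * length)
    # The integral over each linear segment is length * (left + right) / 2
    area = sum(length * (rel[i] + rel[i + 1]) / 2 for i, length in enumerate(lengths))
    offset = -area / sum(lengths)  # shift so the total integral is zero
    return [s + offset for s in rel]


if __name__ == "__main__":
    print(steady_state_stress([1.0], [2.0]))                 # [1.0, -1.0]
    print(steady_state_stress([1.0, 1.0], [1.0, -1.0]))      # [0.5, -0.5, 0.5]
```

Each segment is visited a constant number of times, so the cost grows linearly with the number of segments.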

Electromigration Assessment in Power Grids with Account of Redundancy and Non-Uniform Temperature Distribution

  • Armen Kteyan
  • Valeriy Sukharev
  • Alexander Volkov
  • Jun Ho Choy
  • Farid N. Najm
  • Yong Hyeon Yi
  • Chris H. Kim
  • Stephane Moreau

A recently proposed methodology for electromigration (EM) assessment in the on-chip power/ground grids of integrated circuits has been validated through measurements on dedicated test grids. IR drop degradation in the grid is used to define the EM failure criteria. Physics-based models are used to simulate EM-induced stress evolution in interconnect structures, void formation and evolution, the resistance increase of voided segments, and the consequent redistribution of electric current through the redundant grid paths. The first test structure was a grid fabricated in a 65 nm technology with two metal layers. It allowed the voiding models to be calibrated by tracking the voltage evolution at all grid nodes in both experiment and simulation. A good fit between the measured and simulated time-to-failure (TTF) probability distributions was obtained for both uniform and non-uniform temperature distributions across the grid. The second test grid was fabricated in a 28 nm technology with 4 metal layers. It contained power and ground nets connected to “quasi-cells” with poly-resistors, which were specially designed to operate at elevated temperatures of ~350°C. The current distributions in these nets led to different EM-induced failure behaviors. Experiments showed a gradual voltage evolution in the power net and sharp changes in the ground net, and simulations successfully reproduced both.
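The redundancy effect can be illustrated with a tiny resistive grid. When a void raises one segment's resistance, current reroutes through parallel paths, and the IR drop grows gradually instead of the grid failing outright. This is a hypothetical sketch with three simplifications. A voided segment is modeled simply as a very large resistance. The supply is an ideal source at one node. Gauss-Seidel iteration is used only because it needs no dependencies. The stress and void-growth models in the abstract are not reproduced.

```python
def solve_grid(n_nodes, edges, vdd_node, vdd, sinks, iters=20000, tol=1e-12):
    """Node voltages of a resistive grid via Gauss-Seidel iteration.

    edges: list of (a, b, resistance_ohm); sinks: {node: current drawn in A}.
    vdd_node is held at vdd (ideal supply); every other node needs at least one edge.
    """
    nbrs = [[] for _ in range(n_nodes)]
    for a, b, r in edges:
        nbrs[a].append((b, 1.0 / r))  # store conductances
        nbrs[b].append((a, 1.0 / r))
    v = [vdd] * n_nodes
    for _ in range(iters):
        delta = 0.0
        for i in range(n_nodes):
            if i == vdd_node:
                continue
            gsum = sum(g for _, g in nbrs[i])
            # KCL: sum g * (v_j - v_i) = I_sink  ->  v_i = (sum g * v_j - I_sink) / sum g
            new = (sum(g * v[j] for j, g in nbrs[i]) - sinks.get(i, 0.0)) / gsum
            delta = max(delta, abs(new - v[i]))
            v[i] = new
        if delta < tol:
            break
    return v


if __name__ == "__main__":
    square = [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 1.0), (2, 3, 1.0)]
    healthy = solve_grid(4, square, vdd_node=0, vdd=1.0, sinks={3: 0.1})
    voided = [(0, 1, 1.0), (1, 3, 1e6), (0, 2, 1.0), (2, 3, 1.0)]  # segment 1-3 fully voided
    degraded = solve_grid(4, voided, vdd_node=0, vdd=1.0, sinks={3: 0.1})
    print(round(1.0 - healthy[3], 4), round(1.0 - degraded[3], 4))  # 0.1 0.2
```

In the example, losing one of two parallel paths doubles the IR drop at the load but does not disconnect it. A per-wire current limit would not capture this kind of graceful degradation.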

SESSION: Session 8: Placement

Placement Initialization via Sequential Subspace Optimization with Sphere Constraints

  • Pengwen Chen
  • Chung-Kuan Cheng
  • Albert Chern
  • Chester Holtz
  • Aoxi Li
  • Yucheng Wang

State-of-the-art analytical placement algorithms for VLSI designs rely on solving nonlinear programs to minimize wirelength and cell congestion. As a consequence, the quality of solutions produced using these algorithms crucially depends on the initial cell coordinates. In this work, we reduce the problem of finding wirelength-minimal initial layouts subject to density and fixed-macro constraints to a Quadratically Constrained Quadratic Program (QCQP). We additionally propose an efficient sequential quadratic programming algorithm to recover a block-globally optimal solution and a subspace method to reduce the complexity of the problem. We extend our formulation to facilitate direct minimization of the Half-Perimeter Wirelength (HPWL) by showing that a corresponding solution can be derived by solving a sequence of reweighted quadratic programs. Critically, our method is parameter-free, i.e., it involves no hyperparameters to tune. We demonstrate that incorporating initial layouts produced by our algorithm with a global analytical placer results in improvements of up to 4.76% in post-detailed-placement wirelength on the ISPD’05 benchmark suite. Our code is available on GitHub.
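The sphere-constrained formulation has a classical special case worth seeing concretely. Suppose the density and fixed-macro constraints are dropped. Then minimizing the quadratic wirelength sum of (x_i - x_j)^2 subject to sum x_i = 0 and ||x||^2 = n is solved by the graph Laplacian's Fiedler vector. This is the eigenvector of the second-smallest eigenvalue. The sketch below computes it by power iteration in one dimension. It is a simplified illustration under those assumptions, not the paper's sequential subspace algorithm, and `spectral_init` is a hypothetical name.

```python
import math


def spectral_init(n, edges, iters=500):
    """1-D coordinates minimizing sum over edges of (x_i - x_j)^2 s.t. sum x = 0 and ||x||^2 = n.

    The minimizer is the Laplacian's Fiedler vector, found here by power iteration
    on c*I - L with the constant (trivial) eigenvector projected out each step.
    Assumes a connected graph with at least one edge.
    """
    deg = [0] * n
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
        deg[a] += 1
        deg[b] += 1
    c = 2 * max(deg)                  # c >= largest Laplacian eigenvalue, so c*I - L is PSD
    x = [float(i) for i in range(n)]  # start vector (assumed not orthogonal to the Fiedler vector)
    for _ in range(iters):
        # y = (c*I - L) x, where (L x)_i = deg_i * x_i - sum of neighbor coordinates
        y = [c * x[i] - (deg[i] * x[i] - sum(x[j] for j in adj[i])) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]                  # enforce sum x = 0
        norm = math.sqrt(sum(v * v for v in y))
        x = [v * math.sqrt(n) / norm for v in y]   # enforce the sphere constraint ||x||^2 = n
    return x


if __name__ == "__main__":
    print([round(v, 4) for v in spectral_init(3, [(0, 1), (1, 2)])])  # [-1.2247, 0.0, 1.2247]
```

The sphere constraint is what prevents the trivial zero-wirelength solution of collapsing every cell onto one point.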

DREAM-GAN: Advancing DREAMPlace towards Commercial-Quality using Generative Adversarial Learning

  • Yi-Chen Lu
  • Haoxing Ren
  • Hao-Hsiang Hsiao
  • Sung Kyu Lim

DREAMPlace is a renowned open-source placer that provides GPU-accelerated infrastructure for placing Very-Large-Scale-Integration (VLSI) circuits. However, because DREAMPlace focuses only on wirelength and density, its placements are not directly applicable to industrial design flows. Our goal is to advance DREAMPlace towards commercial quality without knowledge of the black-boxed algorithms in commercial tools. To this end, we present DREAM-GAN, a placement optimization framework that advances DREAMPlace using generative adversarial learning. At each placement iteration, DREAM-GAN optimizes the wirelength and density objectives of vanilla DREAMPlace. In addition, it computes and optimizes a differentiable loss that denotes the similarity score between the current placement and the tool-generated placements in commercial databases. We evaluated DREAM-GAN on 5 commercial and OpenCore designs using an industrial design flow implemented with Synopsys ICC2. The results show that DREAM-GAN significantly improves on vanilla DREAMPlace at the placement stage for every benchmark. They also show that the improvements persist through the post-route stage, where we observe improvements of up to 8.3% in wirelength and 7.4% in total power.

AutoDMP: Automated DREAMPlace-based Macro Placement

  • Anthony Agnesina
  • Puranjay Rajvanshi
  • Tian Yang
  • Geraldo Pradipta
  • Austin Jiao
  • Ben Keller
  • Brucek Khailany
  • Haoxing Ren

Macro placement is a critical very large-scale integration (VLSI) physical design problem that significantly impacts the design power-performance-area (PPA) metrics. This paper proposes AutoDMP, a methodology that leverages DREAMPlace, a GPU-accelerated placer, to place macros and standard cells concurrently in conjunction with automated parameter tuning using a multi-objective hyperparameter optimization technique. As a result, we can generate high-quality predictable solutions, improving the macro placement quality of academic benchmarks compared to baseline results generated from academic and commercial tools. AutoDMP is also computationally efficient, optimizing a design with 2.7 million cells and 320 macros in 3 hours on a single NVIDIA DGX Station A100. This work demonstrates the promise and potential of combining GPU-accelerated algorithms and ML techniques for VLSI design automation.

Assessment of Reinforcement Learning for Macro Placement

  • Chung-Kuan Cheng
  • Andrew B. Kahng
  • Sayak Kundu
  • Yucheng Wang
  • Zhiang Wang

We provide an open, transparent implementation and assessment of Google Brain’s deep reinforcement learning approach to macro placement (Nature) and its Circuit Training (CT) implementation on GitHub. We implement in open source key “blackbox” elements of CT, and clarify discrepancies between CT and Nature. New testcases on open enablements are developed and released. We assess CT alongside multiple alternative macro placers, with all evaluation flows and related scripts public on GitHub. Our experiments also encompass academic mixed-size placement benchmarks, as well as ablation and stability studies. We comment on the impact of Nature and CT, as well as directions for future research.

SESSION: Session 9: New Computing Techniques and Accelerators

GPU Acceleration in Physical Synthesis

  • Evangeline F.Y. Young

Placement and routing are essential steps in the physical synthesis of VLSI designs. Modern circuits contain billions of cells and nets, which significantly increases the computational complexity of physical synthesis and poses big challenges to leading-edge physical design tools. With the rapid development of GPU architectures and computational power, speeding up physical synthesis with massive parallelism on GPUs has become an important direction to explore. In this talk, we will look into opportunities to improve EDA algorithms with GPU acceleration. Traditional EDA tools run on CPUs with a limited degree of parallelism. We will investigate a few examples of accelerating classical placement and routing algorithms on GPUs, and see how one can leverage the power of GPUs to improve both quality and computational time in solving these EDA problems.

Efficient Runtime Power Modeling with On-Chip Power Meters

  • Zhiyao Xie

Accurate and efficient power modeling techniques are crucial for both design-time power optimization and runtime on-chip IC management. In prior research, different types of power modeling solutions have been proposed, optimizing multiple objectives including accuracy, efficiency, temporal resolution, and automation level, and targeting various power- and voltage-related applications. Despite extensive prior exploration of this topic, new solutions continue to emerge and achieve state-of-the-art performance. This paper reviews recent progress in power modeling, with a focus on techniques for developing runtime on-chip power meters (OPMs). It also serves as a vehicle for discussing some general development techniques for the runtime on-chip power modeling task.

DREAMPlaceFPGA-PL: An Open-Source GPU-Accelerated Packer-Legalizer for Heterogeneous FPGAs

  • Rachel Selina Rajarathnam
  • Zixuan Jiang
  • Mahesh A. Iyer
  • David Z. Pan

Placement plays a pivotal and strategic role in the FPGA implementation flow, allocating physical locations to the heterogeneous instances in the design. Among the placement stages, the packing or clustering stage groups logic instances, such as look-up tables (LUTs) and flip-flops (FFs), that could be placed on the same site, and the legalization stage determines the physical site locations of all instances. With advances in FPGA architecture and technology nodes, designs contain millions of logic instances, and placement algorithms must scale accordingly. While the other placement stages, global placement and detailed placement, have been accelerated using GPUs, GPU acceleration of the packing and legalization stages remains largely unexplored. This work presents DREAMPlaceFPGA-PL, an open-source packer-legalizer for heterogeneous FPGAs that employs a GPU for acceleration. We revise the existing consensus-based parallel algorithms employed for packing and legalizing a flat placement to obtain further speedup on a GPU. Our experiments on the ISPD’2016 benchmarks demonstrate more than 2× acceleration.

SESSION: Session 10: Lifetime Achievement Commemoration for Professor Malgorzata Marek-Sadowska

Building Oscillatory Neural Networks: AI Applications and Physical Design Challenges

  • Aida Todri-Sanial

This talk is about a novel computing paradigm based on coupled oscillatory neural networks. Oscillatory neural networks (ONNs) are recurrent neural networks in which each neuron is an oscillator and the oscillator couplings are the synaptic weights. Inspired by Hopfield neural networks, ONNs use nonlinear dynamics to solve computational problems, such as associative memory tasks and combinatorial optimization problems, that are difficult to address with conventional digital computers. An exciting direction in recent years has been to implement Ising machines based on the Ising model of coupled binary spins in magnets. In this talk, I cover the design aspects of building ONNs, from devices to architecture, so as to benefit from parallel computation with oscillators while implementing them in an energy-efficient way.
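
The Ising formulation mentioned above can be made concrete with a small sketch. Assuming spins s_i in {-1, +1} and the energy H(s) = -Σ J_ij s_i s_j with no external field, choosing J_ij = -1 on every edge of a graph makes the lowest-energy state a maximum cut. The Python below finds that state by brute force purely to show the mapping; an oscillator-based Ising machine would instead settle into low-energy states through its coupled dynamics. The example graph and all names are hypothetical.

```python
from itertools import product


def ising_energy(spins, couplings):
    """H(s) = -sum over coupled pairs (i, j) of J_ij * s_i * s_j (no external field)."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())


def ground_state(n, couplings):
    """Exhaustively search all 2**n spin assignments; only feasible for tiny n."""
    best = None
    for spins in product((-1, 1), repeat=n):
        energy = ising_energy(spins, couplings)
        if best is None or energy < best[0]:
            best = (energy, spins)
    return best


def cut_size(spins, edges):
    """Number of edges whose endpoints have opposite spins (i.e. cross the cut)."""
    return sum(1 for i, j in edges if spins[i] != spins[j])


# Hypothetical graph: a 4-cycle plus the chord (0, 2); antiferromagnetic couplings J = -1
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
couplings = {edge: -1.0 for edge in edges}
energy, spins = ground_state(4, couplings)
print(energy, cut_size(spins, edges))  # -3.0 4
```

With J = -1 the energy equals (uncut edges) minus (cut edges), so minimizing it is the same as maximizing the cut; the chord closes a triangle, so at most 4 of the 5 edges can be cut.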

Optimization of AI SoC with Compiler-assisted Virtual Design Platform

  • Chih-Tsun Huang
  • Juin-Ming Lu
  • Yao-Hua Chen
  • Ming-Chih Tung
  • Shih-Chieh Chang

As deep learning keeps evolving dramatically with rapidly increasing complexity, the demand for efficient hardware accelerators has become vital. However, the lack of software/hardware co-development toolchains makes designing AI SoCs (artificial intelligence systems-on-chip) considerably challenging. This paper presents a compiler-assisted virtual platform to facilitate the development of AI SoCs from the early design stage. The electronic system-level design platform provides rapid functional verification and performance/energy analysis. In cooperation with the neural network compiler, AI software and hardware can be co-optimized on the proposed virtual design platform. Our Deep Inference Processor is also used on the virtual design platform to demonstrate the effectiveness of the architectural evaluation and exploration methodology.

Challenges and Opportunities for Computing-in-Memory Chips

  • Xiang Qiu

In recent years, artificial neural networks have been applied to many scenarios, from everyday applications like face detection to industrial problems like placement and routing in physical design. Neural network inference consists mainly of multiply-accumulate operations, which require a huge amount of data movement. Traditional von Neumann computers are inefficient for neural networks because they separate the CPU and memory, and transferring data between them costs excessive energy and performance. To address this problem, in-memory and near-memory computing have been proposed and have attracted much attention in both academia and industry. In this talk, we will give a brief review of non-volatile memory crossbar-based computing-in-memory architectures. Next, we will demonstrate, from an industry perspective, the challenges that chips with such architectures face in replacing current CPUs/GPUs for neural network processing. Lastly, we will discuss possible solutions to those challenges.
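
The in-memory multiply-accumulate principle can be sketched in a few lines. In an idealized resistive crossbar, applying voltage V_i to row i of a cell with conductance G_ij injects current G_ij·V_i into column j (Ohm's law), and each column wire sums those currents (Kirchhoff's current law), so one read yields a full matrix-vector product. The Python sketch below models only that ideal behavior plus uniform conductance quantization, as one example of the non-idealities such chips face; the quantization model and parameter names are assumptions, and device noise, wire resistance, and ADC effects are ignored.

```python
def crossbar_mvm(conductance, voltages):
    """Ideal crossbar read: column current I_j = sum_i G[i][j] * V[i]."""
    rows, cols = len(conductance), len(conductance[0])
    if len(voltages) != rows:
        raise ValueError("one input voltage per crossbar row is required")
    return [sum(conductance[i][j] * voltages[i] for i in range(rows)) for j in range(cols)]


def quantize(conductance, g_max, levels):
    """Snap each conductance to one of `levels` evenly spaced values in [0, g_max] (hypothetical model)."""
    step = g_max / (levels - 1)
    return [[min(max(round(g / step), 0), levels - 1) * step for g in row] for row in conductance]


weights = [[0.1, 0.9],
           [0.6, 0.3]]          # conductances, arbitrary units
inputs = [1.0, 2.0]             # row voltages
ideal = crossbar_mvm(weights, inputs)                     # about [1.3, 1.5]
binary = crossbar_mvm(quantize(weights, 1.0, 2), inputs)  # [2.0, 1.0] with 1-bit cells
```

The gap between `ideal` and `binary` shows how limited cell precision alone can distort the multiply-accumulate result.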

ISPD 2023 Lifetime Achievement Award Bio

  • Malgorzata Marek-Sadowska

The 2023 International Symposium on Physical Design lifetime achievement award goes to Professor Malgorzata Marek-Sadowska for her outstanding contributions to the field.

SESSION: Session 11: Keynote III

Neural Operators for Solving PDEs and Inverse Design

  • Anima Anandkumar

Deep learning surrogate models have shown promise in modeling complex physical phenomena such as photonics, fluid flows, molecular dynamics and material properties. However, standard neural networks assume finite-dimensional inputs and outputs and hence cannot withstand a change in resolution or discretization between training and testing. We introduce Fourier neural operators that can learn operators, which are mappings between infinite-dimensional spaces. They are discretization-invariant and can generalize beyond the discretization or resolution of training data. They can efficiently solve partial differential equations (PDEs) on general geometries. We consider a variety of PDEs for both forward modeling and inverse design problems, and show practical gains in the lithography domain.
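
The discretization invariance of Fourier layers can be illustrated with a minimal, pure-Python sketch of one spectral convolution, the core operation of a Fourier neural operator: transform the input to Fourier space, keep and scale only the lowest modes, and transform back. The weights here are fixed numbers rather than learned parameters, and the lifting/projection layers, pointwise linear path, and nonlinearity of a full FNO are omitted. Because the layer acts on Fourier modes rather than on grid points, the same weights give the same function on an 8-point and a 16-point grid.

```python
import cmath
import math


def spectral_conv_1d(u, weights):
    """One Fourier layer on real samples u of a periodic function on [0, 1).

    Keeps Fourier modes 0..k-1 (k = len(weights)), multiplies mode m by weights[m],
    and evaluates the result on the same grid. Mode -m is treated as the complex
    conjugate of mode m, so the two together contribute 2 * Re(c_m * w_m * e^(2*pi*i*m*x)).
    """
    n, k = len(u), len(weights)
    if k > n // 2:
        raise ValueError("grid too coarse to resolve the kept modes")
    # Normalised DFT coefficients c_m = (1/n) * sum_j u_j * exp(-2*pi*i*m*j/n)
    coeffs = [sum(u[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n)) / n
              for m in range(k)]
    out = []
    for j in range(n):
        x = j / n
        value = coeffs[0] * weights[0]
        for m in range(1, k):
            value += 2 * coeffs[m] * weights[m] * cmath.exp(2j * math.pi * m * x)
        out.append(value.real)
    return out


# The same weights applied to sin(2*pi*x) sampled on 8 and on 16 points
u8 = [math.sin(2 * math.pi * j / 8) for j in range(8)]
u16 = [math.sin(2 * math.pi * j / 16) for j in range(16)]
out8 = spectral_conv_1d(u8, [1.0, 2.0])
out16 = spectral_conv_1d(u16, [1.0, 2.0])
# Both outputs equal 2*sin(2*pi*x) at their grid points, e.g. out8[2] and out16[4] (x = 0.25)
```

Swapping the 8-point grid for the 16-point one changes nothing about the weights, which is the property the abstract refers to as discretization invariance.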

SESSION: Session 12: Quantum Computing

Quantum Challenges for EDA

  • Leon Stok

Though early in its development, quantum computing is now available on real hardware and via the cloud through IBM Quantum. This radically new kind of computing holds open the possibility of solving some problems that are now and perhaps always will be intractable for “classical” computers.

As with any new technology, things are developing rapidly, but there are still a lot of open questions. What is the status of quantum computers today? What are the key metrics we need to look at to improve a quantum system? What are some of the technical opportunities being explored from an EDA perspective?

We will look at the Quantum Roadmap for the next couple of years and outline challenges that need to be solved and how the EDA community can potentially contribute to solving them.

Developing Quantum Workloads for Workload-Driven Co-design

  • Anne Matsuura

Quantum computing offers the future promise of solving problems that are intractable for classical computers today. However, as an entirely new kind of computational device, we must learn how to best develop useful workloads. Today’s small workloads serve the dual purpose that they can also be used to learn how to design a better quantum computing system architecture. At Intel Labs, we develop small application-oriented workloads and use them to drive research into the design of a scalable quantum computing system architecture. We run these small workloads on the small systems of qubits that we have today to understand what is required from the system architecture to run them efficiently and accurately on real qubits. In this presentation, I will give examples of quantum workload-driven co-design and what we have learned from this type of research.

MQT QMAP: Efficient Quantum Circuit Mapping

  • Robert Wille
  • Lukas Burgholzer

Quantum computing is an emerging technology that has the potential to revolutionize fields such as cryptography, machine learning, optimization, and quantum simulation. However, a major challenge in the realization of quantum algorithms on actual machines is ensuring that the gates in a quantum circuit (i.e., corresponding operations) match the topology of a targeted architecture so that the circuit can be executed while, at the same time, the resulting costs (e.g., in terms of the number of additionally introduced gates, fidelity, etc.) are kept low. This is known as the quantum circuit mapping problem. This summary paper provides an overview of QMAP, an open-source tool that is part of the Munich Quantum Toolkit (MQT) and offers efficient, automated, and accessible methods for tackling this problem. To this end, the paper first briefly reviews the problem. Afterwards, it shows how QMAP can be used to efficiently map quantum circuits to quantum computing architectures from both a user’s and a developer’s perspective. QMAP is publicly available as open-source at
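
To make the mapping problem concrete, the following Python sketch implements a deliberately naive router; it is not QMAP's algorithm, and QMAP's own mappers are considerably more sophisticated. Given an undirected coupling map and an initial logical-to-physical layout, each two-qubit gate whose qubits are not adjacent is preceded by SWAPs along a shortest path, and the layout is updated as qubits move. The gate list, coupling map, and function names are hypothetical, and single-qubit gates and gate directionality are ignored.

```python
from collections import deque


def shortest_path(coupling, src, dst):
    """Breadth-first search over the undirected coupling graph; returns physical qubits src..dst."""
    adjacency = {}
    for a, b in coupling:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    if dst not in previous:
        raise ValueError(f"physical qubits {src} and {dst} are not connected")
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def naive_route(gates, coupling, layout):
    """Insert SWAPs so every two-qubit gate (logical_a, logical_b) acts on coupled qubits.
    Returns the routed operation list and the number of SWAPs added."""
    layout = dict(layout)                                    # logical -> physical
    occupant = {phys: log for log, phys in layout.items()}   # physical -> logical
    routed, swaps = [], 0
    for a, b in gates:
        path = shortest_path(coupling, layout[a], layout[b])
        # Move logical qubit a along the path until it sits next to b.
        for p, q in zip(path[:-2], path[1:-1]):
            lp, lq = occupant.get(p), occupant.get(q)
            if lp is not None:
                layout[lp] = q
            if lq is not None:
                layout[lq] = p
            occupant[p], occupant[q] = lq, lp
            routed.append(("swap", p, q))
            swaps += 1
        routed.append(("cx", layout[a], layout[b]))
    return routed, swaps


# Linear device 0-1-2-3 with the trivial layout; the CX between logical 0 and 3 needs two SWAPs.
coupling = [(0, 1), (1, 2), (2, 3)]
ops, added = naive_route([(0, 3), (0, 1)], coupling, {q: q for q in range(4)})
```

Each added SWAP typically decomposes into three CNOTs on hardware, which is why mappers try hard to keep the SWAP count low.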

SESSION: Session 13: Panel on EDA for Domain Specific Computing

EDA for Domain Specific Computing: An Introduction for the Panel

  • Iris Hui-Ru Jiang
  • David Chinnery

This panel explores domain-specific computing from hardware, software, and electronic design automation (EDA) perspectives.

Hennessy and Patterson signaled a new “golden age of computer architecture” in 2018 [1]. Process technology advances and general-purpose processor improvements provided much faster and more efficient computation, but scaling with Moore’s law has slowed significantly. Domain-specific customization can improve power-performance efficiency by orders of magnitude for important application domains, such as graphics, deep neural networks (DNN) for machine learning [2], simulation, bioinformatics [3], image processing, and many other tasks.

The common features of domain-specific architectures are: 1) dedicated memories to minimize data movement across chip; 2) more arithmetic units or bigger memories; 3) use of parallelism matching the domain; 4) smaller data types appropriate for the target applications; and 5) domain-specific software languages. Expediting software development with optimized compilation for efficient fast computation on heterogeneous architectures is a difficult task, and must be considered with the hardware design. For example, GPU programming has used CUDA and OpenCL.

The hardware comprises application-specific integrated circuits (ASICs) [4] and systems-on-chip (SoCs). General-purpose processor cores are often combined with graphics processing units (GPUs) for stream processing, digital signal processors, field programmable gate arrays (FPGAs) for configurability [5], artificial intelligence (AI) acceleration hardware, and so forth.

Domain-specific computers have been deployed recently. For example: the Google Tensor Processing Unit (DNN ASIC) [6]; Microsoft Catapult (FPGA-based cloud domain-service solution) [7]; Intel Crest (DNN ASIC) [8]; Google Pixel Visual Core (image processing and computer vision for cell phones and tablets) [9]; and the RISC-V architecture and open instruction set for heterogeneous computing [10].

Software-driven Design for Domain-specific Compute

  • Desmond A. Kirkpatrick

The end of Dennard scaling has created a focus on advancing domain-specific computing; we are seeing a renaissance of accelerating compute problems through specialization, with orders-of-magnitude improvement in performance and energy efficiency [1]. Domain-specific compute, with its wide proliferation of domains and narrow specialization of hardware and software, provides unique challenges in design automation not met by the methodologies matured under the model of high-volume manufacturing of competitive CPUs, GPUs, and SoCs [2]. Importantly, domain-specific compute targets smaller markets that move more rapidly, so design NRE plays a much larger role. Second, the role of software is so much more significant that we believe a software-first approach, where software drives hardware design and the product is developed at the speed of software, is required to keep pace with domain-specific compute market requirements. This creates significant new challenges and opportunities for EDA to address the domain-specific compute design space. The forces that are driving the renaissance in domain-specific compute architectures also require a renaissance in the tools, flows, and methods to maintain this pace of innovation.

This talk will present a general framework for approaching automation of domain-specific compute co-design of SW/HW and draw upon recent innovations in EDA that can help us address this challenge. The focus will be on driving software-oriented techniques, such as agile design, into hardware design [3], as well as vertically oriented domain-specific codesign automation stacks [4], and some of the gaps in EDA that currently limit these approaches.

Google Investment in Open Source Custom Hardware Development Including No-Cost Shuttle Program

  • Tim Ansell

The end of Moore’s Law combined with unabated growth in usage have forced Google to turn to hardware acceleration to deliver efficiency gains to meet demand. Traditional hardware design methodology for accelerators is practical when there’s a common core – such as with Machine Learning (ML) or video transcoding, but what about the hundreds of smaller tasks performed in Google data centers? Our vision is “software-speed” development for hardware acceleration so that it becomes commonplace and, frankly, boring. Toward this goal Google is investing in open tooling to foster innovation in multiplying accelerator developer productivity.

Tim Ansell will provide an outline of these coordinated open source projects in EDA (including high level synthesis), IP, PDKs, and related areas. This will be followed by presenting the CFU (Custom Function Unit) Playground, which utilizes many of these projects.

The CFU Playground lets you build your own specialized & optimized ML processor based on the open RISC-V ISA, implemented on an FPGA using a fully open source stack. The goal isn’t general ML extensions; it’s about a methodology for building your own extension specialized just for your specific tiny ML model. The extension can range from a few simple new instructions, up to a complex accelerator that interfaces to the CPU via a set of custom instructions; we will show examples of both.

A Case for Open EDA Verticals

  • Zhiru Zhang
  • Matthew Hofmann
  • Andrew Butt

With the end of Dennard scaling and Moore’s Law reaching its limits, domain-specific hardware specialization has become a crucial method for improving compute performance and efficiency for various important applications. Leading companies in competitive fields, such as machine learning and video processing, are building their own in-house technology stacks to better suit their accelerator design needs. However, currently this approach is only a viable option for a few large enterprises that can afford to invest in teams of experts in hardware, systems, and compiler development for high-value applications. In particular, the high license cost of commercial electronic design automation (EDA) tools presents a significant barrier for small and mid-size engineering teams to create new hardware accelerators. These tools are essential for designing, simulating, and testing new hardware, but can be too expensive for smaller teams with limited budgets, reducing their ability to innovate and compete with larger organizations.

More recently, open-source EDA toolflows [1] [12] [11] [5] have emerged which offer a promising alternative to commercial tools, with the potential to provide more cost-effective solutions for hardware development. For example, OpenROAD [1] allows the design of custom ASICs with minimal human intervention and no licensing fees. During initial development, it was also able to take advantage of existing tools such as Yosys [14] and KLayout [6] to reduce the amount of new code required to get a working flow. However, early adoption of open-source alternatives carries risk, as open-source EDA projects often lack important features and are less reliable than commercial options. Additionally, current open-source EDA tools may produce less competitive quality of results (QoR) and may not be able to catch up to commercial solutions anytime soon. Even when EDA tool access is not an issue, designing and implementing special-purpose accelerators using conventional RTL methodology can be unproductive and incurs high non-recurring engineering (NRE) costs. High-level synthesis (HLS) has become increasingly popular in both academia and industry to automatically generate RTL designs from software programs. However, existing HLS tools do not help maintain domain-specific context throughout the design flow (e.g., placement, routing), which makes achieving good QoR difficult without significant manual fine-tuning. This hinders wider adoption of HLS.

We advocate for open EDA verticals as a solution to enabling more widespread use of domain-specific hardware acceleration. The objective is to empower small teams of domain experts to productively develop high-performance accelerators using programming interfaces they are already familiar with. For example, this means supporting domain-specific frameworks like PyTorch or TensorFlow for ML applications. In order for EDA verticals to proliferate, there must first be extensible infrastructure similar to LLVM [8] and MLIR [9] from which to build new tool flows. The proper EDA infrastructure would include novel intermediate representations specifically tailored to the unique challenges in gradually lowering high-level code down to gates.

Addressing the EDA Roadblocks for Domain-specific Compilers: An Industry Perspective

  • Alireza Kaviani

Computer architects now widely subscribe to domain-specific architectures as the only path left for major improvements in performance-cost-energy. As a result, future compilers need to go beyond their traditional role of mapping a design input to a generic hardware platform. Emerging domain-specific compilers must adopt a broader view in which they provide more control to end users, enabling customization of hardware components to implement their corresponding tasks. Transitioning into this new design paradigm, where control and customization are key enablers, poses new challenges for domain-specific compilers.

Today, generic vendor backend EDA compilers are the only available mechanism to realize a broad range of applications in many domains. The need for broad coverage in commercial tools often leads to implementations that do not take full advantage of the underlying hardware. Domain-specific compilers, on the other hand, can potentially deliver near-spec performance by exploiting both application attributes and architecture details. This issue is less pronounced for more generic computing platforms such as CPUs, because open source is an essential component of their software development. However, quality EDA software has remained mostly proprietary, and existing open-source attempts do not produce results of sufficient quality to be useful commercially at scale. Addressing the EDA roadblocks to quality domain-specific compilers will require incremental milestones from both industry and the community.

This suggests the need for a framework capable of interfacing between closed-source vendor backend tools and open-source domain compilers. RapidWright [1] is an example of such a framework, enabling a new level of optimization and customization for the application architect to further exploit FPGA silicon capabilities for a specific domain.

There are a few factors that will expedite progress for this approach. For example, RapidStream [2] demonstrates 30% higher performance and more than 5X faster compile time for dataflow applications. The key enabler for the RapidStream domain compiler is split compilation, which was made possible for dataflow applications by a latency-tolerant front end and design entry. EDA vendors could enable such bottom-up flows by implementing a foundational infrastructure that allows multiple application modules to be implemented independently. Another useful step would be to decouple certain portions of monolithic EDA tools under separate, more permissive licensing so that they can be combined with open-source domain compilers.

Another key step that is required for domain-specific compilers to be successful is a process to offer a guarantee to the end customer. Today’s vendor tool flow offers full guarantee and support to the end customer at the expense of limiting the customization and control. The new paradigm of domain-specific compilers implies many variations of the tool flow, and it might not be feasible to provide the same level of support and guarantee as existing standard flows. The community needs to explore alternative ways of offering an equivalent level of support and guarantee to the end users in order to make domain-specific compilers widely adopted.

High-level Synthesis for Domain Specific Computing

  • Hanchen Ye
  • Hyegang Jun
  • Jin Yang
  • Deming Chen

This paper proposes a High-Level Synthesis (HLS) framework for domain-specific computing. The framework contains three key components: 1) ScaleHLS, a multi-level HLS compilation flow. Aiming to address the lack of expressiveness and hardware-dedicated representation in traditional software-oriented compilers, ScaleHLS introduces a hierarchical intermediate representation (IR) for the progressive optimization of HLS designs defined in various high-level languages. ScaleHLS performs optimizations at three levels (graph, loop, and directive) to realize an efficient compilation pipeline and generate highly optimized domain-specific accelerators. 2) AutoScaleDSE, an automated design space exploration (DSE) engine. Real-world HLS designs often come with large design spaces that are difficult for designers to explore, and the connections between different components of an HLS design further complicate these spaces. To address the DSE problem, AutoScaleDSE proposes a random forest classifier and a graph-driven approach to improve the accuracy of estimating intermediate DSE results while reducing time and computational cost. With this approach, AutoScaleDSE can evaluate thousands of HLS design points and find the Pareto-dominating design points within a couple of hours. 3) PyTransform, a flexible pattern-driven design customization flow. Existing HLS flows demand manual code rewriting or intrusive compiler customization to conduct domain-specific optimizations, leading to unscalable or inflexible compiler solutions. PyTransform proposes a Python-based flow that enables users to define custom matching and rewriting patterns at a high level of abstraction, which can be incorporated into the DSL compilation flow in an automatic and scalable manner. In summary, ScaleHLS, AutoScaleDSE, and PyTransform aim to address the challenges present in the compilation, DSE, and customization of existing HLS flows, respectively.
With these three key components, our newly proposed HLS framework can deliver a scalable and extensible solution for designing domain-specific languages to automate and speed up the design of domain-specific accelerators.
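
The notion of Pareto-dominating design points used in DSE can be illustrated with a short sketch. Assuming each HLS design point is summarized by two costs to be minimized, for example latency in cycles and DSP count (both hypothetical here), a point lies on the Pareto front if no other point is at least as good in every cost and strictly better in at least one. The quadratic-time filter below is illustrative only and says nothing about how AutoScaleDSE's random forest classifier or graph-driven estimation works.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the sorted names of non-dominated design points (O(n^2) pairwise check)."""
    return sorted(name for name, cost in points.items()
                  if not any(dominates(other, cost) for other in points.values()))


# Hypothetical design points: (latency in cycles, DSPs used)
designs = {"A": (100, 8), "B": (80, 16), "C": (120, 8), "D": (80, 20), "E": (60, 32)}
front = pareto_front(designs)   # ['A', 'B', 'E']
```

Here C is dominated by A (same DSPs, lower latency) and D by B, so only A, B, and E represent genuine trade-offs a designer would choose among.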

SESSION: Session 14: Hardware Security and Bug Fixing

Security-aware Physical Design against Trojan Insertion, Frontside Probing, and Fault Injection Attacks

  • Jhih-Wei Hsu
  • Kuan-Cheng Chen
  • Yan-Syuan Chen
  • Yu-Hsiang Lo
  • Yao-Wen Chang

The dramatic growth of hardware attacks and the lack of security-aware solutions in design tools lead to severe security problems in modern IC designs. Although many existing countermeasures provide decent protection against security issues, they still lack a global design view with sufficient security consideration at design time. This paper proposes a security-aware framework against Trojan insertion, frontside probing, and fault injection attacks at the design stage. The framework consists of two major techniques: (1) a large-scale shielding method that effectively covers the exposed areas of assets and (2) a cell-movement-based method that eliminates the empty spaces vulnerable to Trojan insertion. Experimental results show that our framework effectively reduces vulnerability to these attacks and achieves the best overall score compared with the top-3 teams in the 2022 ACM ISPD Security Closure of Physical Layouts Contest.
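
One way to quantify the empty spaces that such a cell-movement step targets is to scan each placement row for runs of contiguous free sites. The Python sketch below assumes a hypothetical representation (one Boolean per placement site, True meaning occupied) and a hypothetical minimum run length `min_sites` standing in for the smallest gap an attacker could use; it is a simplified metric, not the scoring used in the paper or in the ISPD contest.

```python
def free_runs(row, min_sites):
    """Return (start, length) for every run of at least `min_sites` consecutive free sites."""
    runs, start = [], None
    for index, occupied in enumerate(list(row) + [True]):  # sentinel closes a trailing run
        if not occupied and start is None:
            start = index
        elif occupied and start is not None:
            if index - start >= min_sites:
                runs.append((start, index - start))
            start = None
    return runs


def exploitable_sites(rows, min_sites):
    """Total number of free sites that lie in runs long enough to host inserted cells."""
    return sum(length for row in rows for _, length in free_runs(row, min_sites))


# Two hypothetical rows of 8 sites each (True = occupied, False = free)
rows = [
    [True, False, False, False, True, False, False, True],
    [False, False, True, True, True, True, False, True],
]
print(free_runs(rows[0], 2))         # [(1, 3), (5, 2)]
print(exploitable_sites(rows, 3))    # 3
```

Moving cells so that large runs are split into gaps shorter than `min_sites` would drive this count toward zero without changing the total amount of free space.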

Security Closure of IC Layouts Against Hardware Trojans

  • Fangzhou Wang
  • Qijing Wang
  • Bangqi Fu
  • Shui Jiang
  • Xiaopeng Zhang
  • Lilas Alrahis
  • Ozgur Sinanoglu
  • Johann Knechtel
  • Tsung-Yi Ho
  • Evangeline F.Y. Young

Due to cost benefits, supply chains of integrated circuits (ICs) are largely outsourced nowadays. However, passing ICs through various third-party providers gives rise to many threats, like piracy of IC intellectual property or insertion of hardware Trojans, i.e., malicious circuit modifications.

In this work, we proactively and systematically harden the physical layouts of ICs against post-design insertion of Trojans. Toward that end, we propose a multiplexer-based logic-locking scheme that is (i) devised for layout-level Trojan prevention, (ii) resilient against state-of-the-art, oracle-less machine learning attacks, and (iii) fully integrated into a tailored, yet generic, commercial-grade design flow. Our work provides in-depth security and layout analysis on a challenging benchmark suite. We show that ours can render layouts resilient, with reasonable overheads, against Trojan insertion in general and also against second-order attacks (i.e., adversaries seeking to bypass the locking defense in an oracle-less setting).
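
The basic functional principle of MUX-based logic locking can be shown with a toy example: key-controlled 2:1 multiplexers choose between a true signal and a decoy, so only the correct key reproduces the original function. This Python sketch is an illustration under stated assumptions (hypothetical circuit, decoys, and key) and does not model the paper's layout-level Trojan prevention or its resilience to ML attacks.

```python
from itertools import product


def mux(select, in0, in1):
    """2:1 multiplexer: returns in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def original(a, b, c):
    return (a & b) ^ c


def locked(a, b, c, key):
    # Key bit 0: the true wire (a & b) sits on input 1, a decoy (a | b) on input 0.
    w1 = mux(key[0], a | b, a & b)
    # Key bit 1: the true wire c sits on input 0, a decoy (b) on input 1.
    w2 = mux(key[1], c, b)
    return w1 ^ w2


def equivalent_keys():
    """Keys for which the locked circuit matches the original on all 8 input patterns."""
    return [key for key in product((0, 1), repeat=2)
            if all(locked(a, b, c, key) == original(a, b, c)
                   for a, b, c in product((0, 1), repeat=3))]


print(equivalent_keys())  # [(1, 0)]
```

Because the position of the true wire on each MUX is a per-instance design choice, the key bits cannot be read off from the MUX structure alone in this toy version.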

We release our layout artifacts for independent verification [29].

X-Volt: Joint Tuning of Driver Strengths and Supply Voltages Against Power Side-Channel Attacks

  • Saideep Sreekumar
  • Mohammed Ashraf
  • Mohammed Nabeel
  • Ozgur Sinanoglu
  • Johann Knechtel

Power side-channel (PSC) attacks are well-known threats to sensitive hardware like advanced encryption standard (AES) crypto cores. Given the significant impact of supply voltages (VCCs) on power profiles, various countermeasures based on VCC tuning have been proposed, among other defense strategies. Driver strengths of cells, however, have been largely overlooked, despite having direct and significant impact on power profiles as well.

For the first time, we thoroughly explore the prospects of jointly tuning driver strengths and VCCs as a novel working principle for PSC-attack countermeasures. Toward this end, we take the following steps: 1) we develop a simple circuit-level scheme for tuning; 2) we implement a CAD flow for design-time evaluation of ASICs, enabling security assessment of ICs before tape-out; 3) we implement a correlation power analysis (CPA) framework for thorough and comparative security analysis; 4) we conduct an extensive experimental study of a regular AES design, implemented in ASIC as well as FPGA fabrics, under various tuning scenarios; 5) we summarize design guidelines for secure and efficient joint tuning.
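
The core computation of a correlation power analysis can be sketched briefly. For each guess of a key byte, predict the leakage of every trace (here, the Hamming weight of plaintext XOR key guess) and compute the Pearson correlation between predictions and measured power; the guess with the highest correlation is the best candidate. The Python below uses simulated, noise-free leakage at a single point of interest and skips the AES S-box that practical attacks usually target, so it is a hypothetical illustration rather than the authors' CPA framework.

```python
import math


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def cpa_recover_key_byte(plaintexts, traces):
    """Return (best_guess, correlation) over all 256 key-byte guesses.
    Signed correlation is used: with this XOR-only model the bitwise complement
    of the key scores -1 and would tie with the true key under |correlation|."""
    best_guess, best_corr = None, -math.inf
    for guess in range(256):
        hypotheses = [hamming_weight(p ^ guess) for p in plaintexts]
        corr = pearson(hypotheses, traces)
        if corr > best_corr:
            best_guess, best_corr = guess, corr
    return best_guess, best_corr


# Simulated device: leakage equals HW(plaintext XOR secret), no noise (hypothetical)
SECRET = 0x3C
plaintexts = list(range(256))
traces = [hamming_weight(p ^ SECRET) for p in plaintexts]
guess, corr = cpa_recover_key_byte(plaintexts, traces)
```

Countermeasures such as the tuning studied here aim to weaken exactly this correlation, so that many more traces are needed before the correct guess stands out.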

In our experiments, we observe that runtime tuning is more effective than static tuning, for both ASIC and FPGA implementations. For the latter, the AES core is rendered > 11.8x (i.e., at least 11.8 times) as resilient as the untuned baseline design. Layout overheads can be considered acceptable, with, e.g., around +10% critical-path delay for the most resilient tuning scenario in FPGA.

We release source code for our methodology, as well as artifacts from the experimental study, in [13].

Validating the Redundancy Assumption for HDL from Code Clone’s Perspective

  • Jianjun Xu
  • Jiayu He
  • Jingyan Zhang
  • Deheng Yang
  • Jiang Wu
  • Xiaoguang Mao

Automated program repair (APR) is being applied to hardware description languages (HDLs) to fix hardware bugs without human involvement. Most existing APR techniques search the original program for donor code (i.e., code fragments used for bug fixing) to generate repairs, based on the assumption that donor code can be found in the existing source code. This redundancy assumption is the fundamental basis of most APR techniques and has been widely studied in software by searching for code clones of donor code. However, despite a large body of work on code clone detection, researchers have focused almost exclusively on repositories in traditional programming languages, such as C/C++ and Java, while few studies have addressed code clones in HDLs. Furthermore, little attention has been paid to the repetitiveness of bug fixes in hardware designs, which limits automatic repair targeting HDLs. To validate the redundancy assumption for HDLs, we perform an empirical study on code clones of real-world bug fixes in Verilog. Based on the empirical results, we find that 17.71% of newly introduced code in bug fixes can be found in the clone pairs of buggy code in the original program, and 11.77% can be found in the file itself. The findings not only validate the assumption but also provide helpful insights for the design of APR targeting HDLs.
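
To illustrate what checking the redundancy assumption involves, the Python sketch below normalizes Verilog lines (identifiers become ID, numeric literals become LIT, and keywords and operators are kept) and reports what fraction of the lines added by a fix have a normalized match in the original file, which amounts to a simple line-level Type-2 clone check. The tokenizer, the keyword subset, and the example code are assumptions; this is not the clone-detection setup used in the study.

```python
import re

# Small, hypothetical subset of Verilog keywords kept verbatim during normalization.
KEYWORDS = {"always", "assign", "begin", "end", "if", "else", "case", "endcase",
            "posedge", "negedge", "or", "and", "not", "reg", "wire", "input", "output"}
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*|\d+'[bBoOdDhH][0-9a-fA-FxXzZ_]+|\d+|<=|==|!=|&&|\|\||\S")
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def normalize(line):
    """Map a line of Verilog to a token tuple with identifiers and literals abstracted."""
    out = []
    for tok in TOKEN.findall(line):
        if tok in KEYWORDS:
            out.append(tok)
        elif IDENT.fullmatch(tok):
            out.append("ID")
        elif tok[0].isdigit():
            out.append("LIT")
        else:
            out.append(tok)
    return tuple(out)


def fix_coverage(fix_lines, source_lines):
    """Fraction of non-empty added lines whose normalized form already occurs in the source."""
    known = {normalize(line) for line in source_lines if line.strip()}
    added = [normalize(line) for line in fix_lines if line.strip()]
    if not added:
        return 0.0
    return sum(1 for tokens in added if tokens in known) / len(added)


source = [
    "always @(posedge clk) begin",
    "  if (rst) count <= 4'b0;",
    "  else count <= count + 1;",
    "end",
]
fix = [
    "if (reset) state <= 2'b00;",     # same shape as the reset line above
    "state <= next_state ^ mask;",    # no normalized match in the source
]
print(fix_coverage(fix, source))  # 0.5
```

A real study would also consider multi-line fragments and clones in other files, but the same normalize-then-match idea underlies Type-2 clone detection.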

SESSION: Session 15: ISPD 2023 Contest Results and Closing Remarks

Benchmarking Advanced Security Closure of Physical Layouts: ISPD 2023 Contest

  • Mohammad Eslami
  • Johann Knechtel
  • Ozgur Sinanoglu
  • Ramesh Karri
  • Samuel Pagliarini

Computer-aided design (CAD) tools traditionally optimize “only” for power, performance, and area (PPA). However, given the wide range of hardware-security threats that have emerged, future CAD flows must also incorporate techniques for designing secure and trustworthy integrated circuits (ICs). This is because threats that are not addressed during design time will inevitably be exploited in the field, where system vulnerabilities induced by ICs are almost impossible to fix. However, there is currently little experience in designing secure ICs within the CAD community.

This contest seeks to actively engage with the community to close this gap. The theme is security closure of physical layouts, that is, hardening the physical layouts at design time against threats that are executed post-design time. Acting as security engineers, contest participants will proactively analyse and fix the vulnerabilities of benchmark layouts in a blue-team approach. Benchmarks and submissions are based on the generic DEF format and related files.

This contest is focused on the threat of Trojans, with challenging aspects for physical design in general and for hindering Trojan insertion in particular. For one, layouts are based on the ASAP7 library and rules are strict, e.g., no DRC issues and no timing violations are allowed at all. In the alpha/qualifying round, submissions are evaluated using first-order metrics focused on exploitable placement and routing resources, whereas in the final round, submissions are thoroughly evaluated (red-teamed) through actual insertion of different Trojans.