MLCAD’22 TOC

Proceedings of the 2022 ACM/IEEE Workshop on Machine Learning for CAD

Full Citation in the ACM Digital Library

SESSION: Session 1: Physical Design and Optimization with ML

Placement Optimization via PPA-Directed Graph Clustering

  • Yi-Chen Lu
  • Tian Yang
  • Sung Kyu Lim
  • Haoxing Ren

In this paper, we present the first Power, Performance, and Area (PPA)-directed, end-to-end placement optimization framework that provides cell clustering constraints as placement guidance to advance commercial placers. Specifically, we formulate PPA metrics as Machine Learning (ML) loss functions, and use graph clustering techniques to optimize them by improving clustering assignments. Experimental results on 5 GPU/CPU designs in a 5nm technology not only show that our framework immediately improves the PPA metrics at the placement stage, but also demonstrate that the improvements persist to the post-route stage, where we observe improvements of 89% in total negative slack (TNS), 26% in effective frequency, 2.4% in wirelength, and 1.4% in clock power.
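
As a rough illustration of the general idea of formulating a clustering objective as a differentiable ML loss (this is a minimal sketch, not the authors' framework; the graph, loss terms, and weights are assumptions for illustration only):

    # Minimal sketch: optimize soft cluster assignments of netlist cells with a
    # differentiable proxy loss, in the spirit of casting PPA-like metrics as ML
    # loss functions.  The toy graph and loss weights are illustrative.
    import torch

    torch.manual_seed(0)
    num_cells, num_clusters = 200, 8
    # Toy connectivity: a random sparse adjacency standing in for a netlist graph.
    adj = (torch.rand(num_cells, num_cells) < 0.02).float()
    adj = ((adj + adj.t()) > 0).float()

    logits = torch.zeros(num_cells, num_clusters, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=0.05)

    for step in range(300):
        p = torch.softmax(logits, dim=1)        # soft cluster-assignment matrix
        same = p @ p.t()                        # prob. that two cells share a cluster
        cut_loss = (adj * (1.0 - same)).sum()   # proxy for inter-cluster wirelength
        sizes = p.sum(dim=0)
        balance_loss = ((sizes - num_cells / num_clusters) ** 2).mean()
        loss = cut_loss + 0.1 * balance_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

    clusters = torch.argmax(logits, dim=1)      # clustering constraints handed to the placer

The hard assignments extracted at the end play the role of the clustering constraints that the abstract describes passing to a commercial placer.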

From Global Route to Detailed Route: ML for Fast and Accurate Wire Parasitics and Timing Prediction

  • Vidya A. Chhabria
  • Wenjing Jiang
  • Andrew B. Kahng
  • Sachin S. Sapatnekar

Timing prediction and optimization are challenging in design stages prior to detailed routing (DR) due to the unavailability of routing information. Inaccurate timing prediction wastes design effort, hurts circuit performance, and may lead to design failure. This work focuses on timing prediction after clock tree synthesis and placement legalization, which is the earliest opportunity to time and optimize a “complete” netlist. The paper first documents that having “oracle knowledge” of the final post-DR parasitics enables post-global routing (GR) optimization to produce improved final timing outcomes. Machine learning (ML)-based models are proposed to bridge the gap between GR-based parasitic and timing estimation and post-DR results during post-GR optimization. These models show higher accuracy than GR-based timing estimation and, when used during post-GR optimization, show demonstrable improvements in post-DR circuit performance. Results on open 45nm and 130nm enablements using OpenROAD show efficient improvements in post-DR WNS and TNS metrics without increasing congestion.
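
A hedged sketch of the kind of model the abstract describes, mapping GR-stage features of a net to post-DR parasitics; the feature names, the gradient-boosting model family, and the synthetic data below are illustrative assumptions, not the paper's actual feature set or model:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 5000
    gr_length = rng.uniform(1, 500, n)       # um, estimated from global routing
    fanout    = rng.integers(1, 10, n)
    layer_mix = rng.uniform(0, 1, n)         # fraction of length on upper layers
    X = np.column_stack([gr_length, fanout, layer_mix])
    # Synthetic "post-DR capacitance": GR estimate plus detour/coupling effects.
    y = 0.2 * gr_length * (1 + 0.3 * layer_mix) + 0.5 * fanout + rng.normal(0, 2, n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor().fit(X_tr, y_tr)
    print("R^2 on held-out nets:", model.score(X_te, y_te))

The trained predictor would then replace the GR-based parasitic estimate inside the post-GR optimization loop.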

Faster FPGA Routing by Forecasting and Pre-Loading Congestion Information

  • Umair Siddiqi
  • Timothy Martin
  • Sam Van Den Eijnden
  • Ahmed Shamli
  • Gary Grewal
  • Sadiq Sait
  • Shawki Areibi

Field Programmable Gate Array (FPGA) routing is one of the most time-consuming tasks within the FPGA design flow, requiring hours and even days to complete for some large industrial designs. This is becoming a major concern for FPGA users and tool developers. This paper proposes a simple, yet effective, framework that reduces the runtime of PathFinder-based routers. A supervised Machine Learning (ML) algorithm is developed to forecast costs (from the placement phase) associated with possible congestion and hot spot creation in the routing phase. These predicted costs are used to guide the router to avoid highly congested regions while routing nets, thus reducing the total number of iterations and rip-up and reroute operations involved. Results obtained indicate that the proposed ML approach achieves on average a 43% reduction in the number of routing iterations and a 28.6% reduction in runtime when implemented in the state-of-the-art enhanced PathFinder algorithm.
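
A minimal sketch of how a pre-loaded, placement-stage congestion forecast could enter a PathFinder-style negotiated-congestion cost; the exact cost formulation and weighting below are assumptions, not the authors' router:

    # Sketch: fold an ML-predicted congestion cost into the node cost used by a
    # PathFinder-style router so congested regions are avoided from iteration one.
    def node_cost(base_cost, acc_cost, pres_cost, ml_congestion, ml_weight=1.0):
        """PathFinder-style cost with an ML-forecast congestion term pre-loaded.

        base_cost: intrinsic cost of the routing resource
        acc_cost: accumulated (historical) congestion cost from prior iterations
        pres_cost: present-congestion multiplier for the current iteration
        ml_congestion: congestion predicted for this node by the placement-stage model
        """
        return (base_cost + acc_cost + ml_weight * ml_congestion) * pres_cost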

SESSION: Session 2: Machine Learning for Analog Design

Deep Reinforcement Learning for Analog Circuit Sizing with an Electrical Design Space and Sparse Rewards

  • Yannick Uhlmann
  • Michael Essich
  • Lennart Bramlage
  • Jürgen Scheible
  • Cristóbal Curio

There is still a great reliance on human expert knowledge during the analog integrated circuit sizing design phase due to its complexity and scale, with the result that there is a very low level of automation associated with it. Current research shows that reinforcement learning is a promising approach for addressing this issue. Similarly, it has been shown that the convergence of conventional optimization approaches can be improved by transforming the design space from the geometrical domain into the electrical domain. Here, this design space transformation is employed as an alternative action space for deep reinforcement learning agents. The presented approach is based entirely on reinforcement learning, whereby agents are trained in the craft of analog circuit sizing without explicit expert guidance. After training and evaluating agents on circuits of varying complexity, their behavior when confronted with a different technology is examined, showing the applicability, feasibility, and transferability of this approach.

LinEasyBO: Scalable Bayesian Optimization Approach for Analog Circuit Synthesis via One-Dimensional Subspaces

  • Shuhan Zhang
  • Fan Yang
  • Changhao Yan
  • Dian Zhou
  • Xuan Zeng

A large body of literature has proved that the Bayesian optimization framework is especially efficient and effective in analog circuit synthesis. However, most previous research works only focus on designing informative surrogate models or efficient acquisition functions. Although searching for the global optimum over the acquisition function surface is itself a difficult task, it has been largely ignored. In this paper, we propose a fast and robust Bayesian optimization approach via one-dimensional subspaces for analog circuit synthesis. By solely focusing on optimizing one-dimensional subspaces at each iteration, we greatly reduce the computational overhead of the Bayesian optimization framework while safely maximizing the acquisition function. By combining the benefits of different dimension selection strategies, we adaptively balance between searching globally and locally. By leveraging the batch Bayesian optimization framework, we further accelerate the optimization procedure by making full use of the hardware resources. Experimental results quantitatively show that our proposed algorithm can accelerate the optimization procedure by up to 9x and 38x compared to LP-EI and REMBOpBO, respectively, when the batch size is 15.
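
An illustrative sketch of optimizing the acquisition function along one coordinate direction per iteration, which is the general idea behind one-dimensional-subspace BO; this is not the authors' LinEasyBO code, and the toy objective, kernel, and candidate grid are assumptions:

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def f(x):                        # toy black-box "circuit" objective to minimize
        return np.sum((x - 0.3) ** 2)

    rng = np.random.default_rng(0)
    dim, lo, hi = 10, 0.0, 1.0
    X = rng.uniform(lo, hi, (5, dim))
    y = np.array([f(x) for x in X])

    for it in range(30):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
        best = y.min()
        d = rng.integers(dim)                  # the 1-D subspace chosen this iteration
        anchor = X[y.argmin()].copy()
        cand = np.tile(anchor, (256, 1))
        cand[:, d] = np.linspace(lo, hi, 256)  # vary only one coordinate
        mu, sd = gp.predict(cand, return_std=True)
        sd = np.maximum(sd, 1e-9)
        z = (best - mu) / sd
        ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
        x_new = cand[ei.argmax()]
        X = np.vstack([X, x_new])
        y = np.append(y, f(x_new))

    print("best value found:", y.min())

Restricting the acquisition maximization to a line through the incumbent is what keeps the per-iteration overhead low, as the abstract argues.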

RobustAnalog: Fast Variation-Aware Analog Circuit Design Via Multi-task RL

  • Wei Shi
  • Hanrui Wang
  • Jiaqi Gu
  • Mingjie Liu
  • David Z. Pan
  • Song Han
  • Nan Sun

Analog/mixed-signal circuit design is one of the most complex and time-consuming stages in the whole chip design process. Due to various process, voltage, and temperature (PVT) variations from chip manufacturing, analog circuits inevitably suffer from performance degradation. Although there has been plenty of work on automating analog circuit design under the nominal condition, limited research has been done on exploring robust designs under the real and unpredictable silicon variations. Automatic analog design against variations requires prohibitive computation and time costs. To address the challenge, we present RobustAnalog, a robust circuit design framework that involves the variation information in the optimization process. Specifically, circuit optimizations under different variations are considered as a set of tasks. Similarities among tasks are leveraged and competitions are alleviated to realize sample-efficient multi-task training. Moreover, RobustAnalog prunes the task space according to the current performance in each iteration, leading to a further simulation cost reduction. In this way, RobustAnalog can rapidly produce a set of circuit parameters that satisfies diverse constraints (e.g. gain, bandwidth, noise…) across variations. We compare RobustAnalog with Bayesian optimization, an evolutionary algorithm, and Deep Deterministic Policy Gradient (DDPG) and demonstrate that RobustAnalog can significantly reduce the required optimization time by 14x-30x. Therefore, our study provides a feasible method to handle various real silicon conditions.
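
A hedged sketch of the task-pruning idea mentioned above: treat each PVT corner as a task and drop corners that already meet all specs from the expensive simulation loop. The spec format, corner names, and numbers below are placeholders, not RobustAnalog's interface:

    def prune_tasks(corners, measurements, specs):
        """Keep only the corners that still violate at least one spec.

        corners: list of PVT corner names
        measurements: dict corner -> dict of simulated metrics for the current design
        specs: dict metric -> (kind, bound), kind in {"min", "max"}
        """
        def meets(meas):
            return all(meas[m] >= b if k == "min" else meas[m] <= b
                       for m, (k, b) in specs.items())
        return [c for c in corners if not meets(measurements[c])]

    # Example: only the slow/hot corner still fails its gain spec, so only that
    # task remains in the next multi-task training round.
    specs = {"gain_db": ("min", 60.0), "bandwidth_mhz": ("min", 10.0)}
    measurements = {
        "tt_25C": {"gain_db": 62.0, "bandwidth_mhz": 12.0},
        "ss_125C": {"gain_db": 57.5, "bandwidth_mhz": 10.5},
    }
    print(prune_tasks(["tt_25C", "ss_125C"], measurements, specs))  # ['ss_125C']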

Automatic Analog Schematic Diagram Generation based on Building Block Classification and Reinforcement Learning

  • Hung-Yun Hsu
  • Mark Po-Hung Lin

Schematic visualization is important for analog circuit designers to quickly recognize the structures and functions of transistor-level circuit netlists. However, most of the original analog designs or other automatically extracted analog circuits are stored in the form of transistor-level netlists in the SPICE format. It can be error-prone and time-consuming to manually create an elegant and readable schematic from a netlist. Different from the conventional graph-based methods, this paper introduces a novel analog schematic diagram generation flow based on comprehensive building block classification and reinforcement learning. The experimental results show that the proposed method can effectively generate aesthetic analog circuit schematics with a higher building block compliance rate and fewer wire bends and net crossings, resulting in better readability, compared with existing methods and modern tools.

SESSION: Plenary I

The Changing Landscape of AI-driven System Optimization for Complex Combinatorial Optimization

  • Somdeb Majumdar

With the unprecedented success of modern machine learning in areas like computer vision and natural language processing, a natural question is where it can have maximum impact in real life. At Intel Labs, we are actively investing in research that leverages the robustness and generalizability of deep learning to solve system optimization problems. Examples of such systems include individual hardware modules like memory schedulers and power management units on a chip, automated compiler and software design tools, as well as broader problems like chip design. In this talk, I will address some of the open challenges in systems optimization and how Intel and others in the research community are harnessing the power of modern reinforcement learning to address those challenges. A particular aspect of problems in the domain of chip design is the very large combinatorial complexity of the solution space. For example, the number of possible ways to place standard cells and macros on a canvas for even small to medium-sized netlists can approach 10^100 to 10^1000. Importantly, only a very small subset of these possible outcomes is actually valid and performant.

Standard approaches like reinforcement learning struggle to learn effective policies under such conditions. For example, a sequential placement policy can get a reinforcing reward signal only after having taken several thousand individual placement actions. This reward is inherently noisy – especially when we need to assign credit to the earliest steps of the multi-step placement episode. This is an example of the classic credit assignment problem in reinforcement learning.

A different way to tackle such problems is to simply search over the solution space. Many approaches exist ranging from Genetic Algorithms to Monte Carlo Tree Search. However, they suffer from very slow convergence times due to the size of the search space.

In order to tackle such problems, we investigate an approach that combines the fast learning capabilities of reinforcement learning and the ability of search-based methods to find performant solutions. We use deep reinforcement learning to learn strategies that are sub-optimal but quick to find. We use these partial solutions as anchors around which we constrain a genetic-algorithm-based search. This allows us to still exploit the power of genetic algorithms to find performant solutions while significantly reducing the overall search time.

I will describe this solution in the context of combinatorial optimization problems like device placement, where we show the ability to learn effective strategies on combinatorial complexities of up to 10^300. We also show that by representing these policies as neural networks, we are able to achieve reasonably good zero-shot transfer learning performance on unseen problem configurations. Finally, I will touch upon how we are adapting this framework to handle similar combinatorial optimization problems for placement in EDA pipelines.
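
A toy sketch of the hybrid idea described in this talk, offered purely as illustration: a quickly-found (sub-optimal) solution, standing in for an RL policy's output, seeds and constrains a genetic-algorithm search. The problem, population sizes, and noise levels are all assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    target = rng.uniform(0, 1, 32)                 # unknown optimum of a toy task
    fitness = lambda x: -np.sum((x - target) ** 2)

    rl_anchor = target + rng.normal(0, 0.2, 32)    # stands in for a fast RL solution
    pop = np.clip(rl_anchor + rng.normal(0, 0.05, (64, 32)), 0, 1)  # seed near anchor

    for gen in range(200):
        scores = np.array([fitness(p) for p in pop])
        parents = pop[np.argsort(scores)[-16:]]                       # selection
        kids = []
        for _ in range(48):
            a, b = parents[rng.integers(16)], parents[rng.integers(16)]
            mask = rng.random(32) < 0.5                               # crossover
            child = np.where(mask, a, b) + rng.normal(0, 0.02, 32)    # mutation
            kids.append(np.clip(child, 0, 1))
        pop = np.vstack([parents, kids])

    print("best fitness:", max(fitness(p) for p in pop))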

SESSION: Invited Session I

AI Chips Built by AI – Promise or Reality?: An Industry Perspective

  • Thomas Andersen

Artificial Intelligence is an avenue to innovation that is touching every industry worldwide. AI has made rapid advances in areas like speech and image recognition, gaming, and even self-driving cars, essentially automating less complex human tasks. In turn, this demand drives rapid growth across the semiconductor industry with new chip architectures emerging to deliver the specialized processing needed for the huge breadth of AI applications. Given the advances made to automate simple human tasks, can AI solve more complex tasks such as designing a computer chip? In this talk, we will discuss the challenges and opportunities of building advanced chip designs with the help of artificial intelligence, enabling higher performance, faster time to market, and utilizing reuse of machine-generated learning for successive products.

ML for Analog Design: Good Progress, but More to Do

  • Borivoje Nikolić

Analog and mixed-signal (AMS) blocks are often a critical and time-consuming part of System-on-Chip (SoC) design, due to the largely manual process of circuit design, simulation, and SoC integration iterations. There have been numerous efforts to realize AMS blocks from specification by using a process analogous to digital synthesis, with automated place and route techniques [1], [2], but although very effective within their application domains, they have been limited in scope. The AMS block design process, outlined in Figure 1, starts with the derivation of its target performance specifications (gain, bandwidth, phase margin, settling time, etc.) from system requirements, and establishes a simulation testbench. Then, a designer relies on their expertise to choose the topology that is most likely to achieve the desired performance with minimum power consumption. Circuit sizing is a process of determining schematic-level transistor widths and lengths to attain the specifications with minimum power consumption. Many of the commonly used analog circuits can be sized by using well-established heuristics to achieve near-optimal performance [3]-[5]. The performance is verified by running simulations, and there has been notable progress in enriching the commercial simulators to automate the testbench design. Machine learning (ML) based techniques have recently been deployed in circuit sizing to achieve optimality without relying on design heuristics [6]-[8]. Many of the commonly employed ML techniques require a rich training dataset; reinforcement learning (RL) sidesteps this issue by using an agent that interacts with its simulation environment through a trial-and-error process that mimics learning in humans. In each step, the RL agent, which contains a neural network, observes the state of the environment and takes a sizing action. The most time-consuming step in a traditional design procedure is layout, which is typically a manual iterative process. Layout parasitics degrade the schematic-level performance, requiring circuit resizing. However, the use of circuit generators, such as the Berkeley Analog Generator (BAG) [9], automates the layout iterations. RL agents have been coupled with BAG to automate the complete design process for a fixed circuit topology [7]. Simulations with post-layout parasitics are much slower than schematic-level simulations, which calls for deployment of RL techniques that limit the sampled space. Finally, the process of integrating an AMS block into an SoC and verifying its system-level performance can be very time-consuming.

SESSION: Session 3: Circuit Evaluation and Simulation with ML

SpeedER: A Supervised Encoder-Decoder Driven Engine for Effective Resistance Estimation of Power Delivery Networks

  • Bing-Yue Wu
  • Shao-Yun Fang
  • Hsiang-Wen Chang
  • Peter Wei

Voltage (IR) analysis tools need to be launched multiple times during the Engineering Change Order (ECO) phase of the modern design cycle for Power Delivery Network (PDN) refinement, while analyzing the IR characteristics of advanced chip designs by using traditional IR analysis tools suffers from massive run-time. Multiple Machine Learning (ML)-driven IR analysis approaches have been proposed to benefit from fast inference time and flexible prediction ability. Among these ML-driven approaches, the Effective Resistance (effR) of a given PDN has been shown to be one of the most critical features that can greatly enhance model performance and thus prediction accuracy; however, calculating effR alone is still computationally expensive. In addition, in the ECO phase, even if only local adjustments of the PDN are required, the run-time of obtaining the regional effR changes by using traditional Laplacian systems grows exponentially with chip size. This is because the whole PDN needs to be considered in a Laplacian solver when computing the effR of any single network node. To address the problem, this paper proposes an ML-driven engine, SpeedER, that combines a U-Net model and a Fully Connected Neural Network (FCNN) with five selected features to speed up the process of estimating regional effRs. Experimental results show that SpeedER can be approximately four times faster than a commercial tool using a Laplacian system with errors of only around 1%.

XT-PRAGGMA: Crosstalk Pessimism Reduction Achieved with GPU Gate-level Simulations and Machine Learning

  • Vidya A. Chhabria
  • Ben Keller
  • Yanqing Zhang
  • Sandeep Vollala
  • Sreedhar Pratty
  • Haoxing Ren
  • Brucek Khailany

Accurate crosstalk-aware timing analysis is critical in nanometer-scale process nodes. While today’s VLSI flows rely on static timing analysis (STA) techniques to perform crosstalk-aware timing signoff, these techniques are limited due to their static nature as they use imprecise heuristics such as arbitrary aggressor filtering and simplified delay calculations. This paper proposes XT-PRAGGMA, a tool that uses GPU-accelerated dynamic gate-level simulations and machine learning to eliminate false aggressors and accurately predict crosstalk-induced delta delays. XT-PRAGGMA reduces STA pessimism and provides crucial information to identify crosstalk-critical nets, which can be considered for accurate SPICE simulation before signoff. The proposed technique is fast (less than two hours to simulate 30,000 vectors on million-gate designs) and reduces falsely reported total negative slack in timing signoff by 70%.

Fast Prediction of Dynamic IR-Drop Using Recurrent U-Net Architecture

  • Yonghwi Kwon
  • Youngsoo Shin

Recurrent U-Net (RU-Net) is employed for fast prediction of dynamic IR-drop when the power distribution network (PDN) contains capacitor components. Each capacitor can be modeled by a resistor and a current source that is a function of the node voltage v(t-Δt); the node voltages at time t-Δt allow the PDN to be solved at time t, which in turn allows the analysis at t+Δt, and so on. Provided that a quick prediction of IR-drop at one time instance can be done by U-Net, an image segmentation model, the analysis of a PDN containing capacitors can be done by a number of U-Net instances connected in series, which forms the RU-Net architecture. Four input maps (effective PDN resistance map, PDN capacitance map, current map, and power pad distance map) are extracted from each layout clip and provided to RU-Net for IR-drop prediction. Experiments demonstrate that the proposed IR-drop prediction using the RU-Net is 16 times faster than a commercial tool with about 12% error, while a simple U-Net-based prediction yields 19% error due to its inability to consider capacitors.
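
A small worked sketch of the capacitor companion model the abstract refers to: with backward Euler, a capacitor C becomes a conductance C/Δt in parallel with a current source (C/Δt)·v(t-Δt), so the PDN is solved step by step in time. The tiny two-node network, element values, and load currents below are illustrative only:

    import numpy as np

    dt = 1e-12                      # time step (s)
    g = 1.0 / 0.05                  # 50 mOhm PDN resistor segments
    C = np.array([1e-12, 2e-12])    # decap at each node (F)
    I_load = np.array([0.0, 0.5])   # switching-current demand at each node (A)
    v_pad = 1.0

    v = np.full(2, v_pad)           # node voltages at t - dt
    for step in range(1000):
        Gc = C / dt                 # companion conductances C/dt
        # Conductance matrix of (pad)--g--(node0)--g--(node1), plus companions.
        G = np.array([[2 * g + Gc[0], -g],
                      [-g,            g + Gc[1]]])
        # RHS: pad injection, companion current sources, minus load currents.
        rhs = np.array([g * v_pad, 0.0]) + Gc * v - I_load
        v = np.linalg.solve(G, rhs)  # node voltages at time t

    print("worst IR drop (V):", v_pad - v.min())

In the paper's setting, each such time step is replaced by one U-Net inference, and chaining the steps yields the recurrent (RU-Net) structure.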

SESSION: Session 4: DRC, Test and Hotspot Detection using ML Methods

Efficient Design Rule Checking Script Generation via Key Information Extraction

  • Binwu Zhu
  • Xinyun Zhang
  • Yibo Lin
  • Bei Yu
  • Martin Wong

Design rule checking (DRC) is a critical step in integrated circuit design. DRC requires formatted scripts as the input to the design rule checker. However, these scripts are always generated manually in the foundry, and such a generation process is extremely inefficient, especially when encountering a large number of design rules. To mitigate this issue, we first propose a deep learning-based key information extractor to automatically identify the essential arguments of the scripts from rules. Then, a script translator is designed to organize the extracted arguments into executable DRC scripts. In addition, we incorporate three specific design rule generation techniques to improve the performance of our extractor. Experimental results demonstrate that our proposed method can significantly reduce the cost of script generation and show remarkable superiority over other baselines.

Scan Chain Clustering and Optimization with Constrained Clustering and Reinforcement Learning

  • Naiju Karim Abdul
  • George Antony
  • Rahul M. Rao
  • Suriya T. Skariah

Scan chains are used in design for test by providing controllability and observability at each register. Scan optimization is run during physical design after placement where scannable elements are re-ordered along the chain to reduce total wirelength (and power). In this paper, we present a machine learning based technique that leverages constrained clustering and reinforcement learning to obtain a wirelength efficient scan chain solution. Novel techniques like next-min sorted assignment, clustered assignment, node collapsing, partitioned Q-Learning and in-context start-end node determination are introduced to enable improved wire length while honoring design-for-test constraints. The proposed method is shown to provide up to 24% scan wirelength reduction over a traditional algorithmic optimization technique across 188 moderately sized blocks from an industrial 7nm design.

Autoencoder-Based Data Sampling for Machine Learning-Based Lithography Hotspot Detection

  • Mohamed Tarek Ismail
  • Hossam Sharara
  • Kareem Madkour
  • Karim Seddik

Technology scaling has increased the complexity of integrated circuit design. It has also led to more challenges in the field of Design for Manufacturing (DFM). One of these challenges is lithography hotspot detection. Hotspots (HS) are design patterns that negatively affect the output yield. Identifying these patterns early in the design phase is crucial for high yield fabrication. Machine Learning-based (ML) hotspot detection techniques are promising since they have shown superior results to other methods such as pattern matching. Training ML models is a challenging task for three main reasons. First, industrial training designs contain millions of unique patterns. It is impractical to train models using this large number of patterns due to limited computational and memory resources. Second, the HS detection problem has an imbalanced nature; datasets typically have a limited number of HS and a large number of non-hotspots. Lastly, hotspot and non-hotspot patterns can have very similar geometries, causing models to be susceptible to high false positive rates. For these reasons, the use of data sampling techniques is needed to choose the best representative dataset for training. In this paper, a dataset sampling technique based on autoencoders is introduced. The autoencoders are used to identify latent data features that can reconstruct the input patterns. These features are used to group the patterns using Density-based spatial clustering of applications with noise (DBSCAN). Then, the clustered patterns are sampled to reduce the training set size. Experiments on the ICCAD-2019 dataset show that the proposed data sampling approach can reduce the dataset size while maintaining the levels of recall and precision that were obtained using the full dataset.
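
A hedged sketch of the sampling pipeline described above: learn a compact latent representation of layout-pattern clips with a small autoencoder, cluster the latent features with DBSCAN, then keep a few patterns per cluster. The data is synthetic and the architecture, DBSCAN eps, and per-cluster quota are illustrative assumptions:

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    patterns = torch.tensor(rng.random((1000, 64)), dtype=torch.float32)  # flattened clips

    enc = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 8))   # encoder
    dec = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 64))   # decoder
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

    for epoch in range(200):                      # reconstruction training
        recon = dec(enc(patterns))
        loss = nn.functional.mse_loss(recon, patterns)
        opt.zero_grad()
        loss.backward()
        opt.step()

    latent = enc(patterns).detach().numpy()
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(latent)

    sampled = []                                  # keep a few representatives per cluster
    for lbl in set(labels):
        idx = np.flatnonzero(labels == lbl)
        keep = idx if lbl == -1 else idx[:10]     # keep all DBSCAN outliers
        sampled.extend(keep.tolist())
    print("reduced training set:", len(sampled), "of", len(patterns))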

SESSION: Session 5: Power and Thermal Evaluation with ML

Driving Early Physical Synthesis Exploration through End-of-Flow Total Power Prediction

  • Yi-Chen Lu
  • Wei-Ting Chan
  • Vishal Khandelwal
  • Sung Kyu Lim

Leading-edge designs on advanced nodes are pushing physical design (PD) flow runtime into several weeks. Stringent time-to-market constraint necessitates efficient power, performance, and area (PPA) exploration by developing accurate models to evaluate netlist quality in early design stages. In this work, we propose PD-LSTM, a framework that leverages graph neural networks (GNNs) and long short-term memory (LSTM) networks to perform end-of-flow power predictions in early PD stages. Experimental results on two commercial CPU designs and five OpenCore netlists demonstrate that PD-LSTM achieves high fidelity total power prediction results within 4% normalized root-mean-squared error (NRMSE) on unseen netlists and a correlation coefficient score as high as 0.98.

Towards Neural Hardware Search: Power Estimation of CNNs for GPGPUs with Dynamic Frequency Scaling

  • Christopher A. Metz
  • Mehran Goli
  • Rolf Drechsler

Machine Learning (ML) algorithms are essential for emerging technologies such as autonomous driving and application-specific Internet of Things (IoT) devices. The Convolutional Neural Network (CNN) is one of the major techniques used in such systems. This leads to leveraging ML accelerators like GPGPUs to meet the design constraints. However, GPGPUs have high power consumption, and selecting the most appropriate accelerator requires Design Space Exploration (DSE), which is usually time-consuming and needs high manual effort. Neural Hardware Search (NHS) is an upcoming approach to automate the DSE for Neural Networks. Therefore, automatic approaches for power, performance, and memory estimation are needed.

In this paper, we present a novel approach enabling designers to quickly and accurately estimate the power consumption of CNN inference on GPGPUs with Dynamic Frequency Scaling (DFS) in the early stages of the design process. The proposed approach uses static analysis for feature extraction and Random Forest Tree regression analysis for predictive model generation. Experimental results demonstrate that our approach can predict the power consumption of CNNs with a Mean Absolute Percentage Error (MAPE) of 5.03% compared to the actual hardware.
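
A minimal sketch of the modeling step described above: a random-forest regressor maps statically extracted CNN/GPU features to measured power, evaluated with MAPE. The feature names and synthetic data are placeholders, not the paper's feature set:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_percentage_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 2000
    macs      = rng.uniform(1e8, 1e10, n)    # multiply-accumulate count
    mem_bytes = rng.uniform(1e6, 1e9, n)     # memory traffic
    gpu_freq  = rng.uniform(0.6, 1.8, n)     # GHz, dynamic frequency scaling point
    X = np.column_stack([macs, mem_bytes, gpu_freq])
    power = 30 + 2e-9 * macs * gpu_freq + 5e-8 * mem_bytes + rng.normal(0, 3, n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, power, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("MAPE:", mean_absolute_percentage_error(y_te, model.predict(X_te)))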

A Thermal Machine Learning Solver For Chip Simulation

  • Rishikesh Ranade
  • Haiyang He
  • Jay Pathak
  • Norman Chang
  • Akhilesh Kumar
  • Jimin Wen

Thermal analysis provides deeper insights into electronic chips’ behavior under different temperature scenarios and enables faster design exploration. However, obtaining a detailed and accurate on-chip thermal profile is very time-consuming using FEM or CFD. Therefore, there is an urgent need for speeding up the on-chip thermal solution to address various system scenarios. In this paper, we propose a thermal machine-learning (ML) solver to speed up thermal simulations of chips. The thermal ML-Solver is an extension of the recent novel approach, CoAEMLSim (Composable Autoencoder Machine Learning Simulator), with modifications to the solution algorithm to handle constant and distributed HTC. The proposed method is validated against commercial solvers, such as Ansys MAPDL, as well as a recent ML baseline, UNet, under different scenarios to demonstrate its enhanced accuracy, scalability, and generalizability.

SESSION: Session 6: Performance Prediction with ML Models and Algorithms

Physically Accurate Learning-based Performance Prediction of Hardware-accelerated ML Algorithms

  • Hadi Esmaeilzadeh
  • Soroush Ghodrati
  • Andrew B. Kahng
  • Joon Kyung Kim
  • Sean Kinzer
  • Sayak Kundu
  • Rohan Mahapatra
  • Susmita Dey Manasi
  • Sachin S. Sapatnekar
  • Zhiang Wang
  • Ziqing Zeng

Parameterizable ML accelerators are the product of recent breakthroughs in machine learning (ML). To fully enable the design space exploration, we propose a physical-design-driven, learning-based prediction framework for hardware-accelerated deep neural network (DNN) and non-DNN ML algorithms. It employs a unified methodology, coupling backend power, performance and area (PPA) analysis with frontend performance simulation, thus achieving realistic estimation of both backend PPA and system metrics (runtime and energy). Experimental studies show that the approach provides excellent predictions for both ASIC (in a 12nm commercial process) and FPGA implementations on the VTA and VeriGOOD-ML platforms.

Graph Representation Learning for Gate Arrival Time Prediction

  • Pratik Shrestha
  • Saran Phatharodom
  • Ioannis Savidis

An accurate estimate of the timing profile at different stages of the physical design flow allows for pre-emptive changes to the circuit, significantly reducing the design time and effort. In this work, a graph-based deep regression model is utilized to predict the gate-level arrival time of the timing paths of a circuit. Three scenarios for post-routing prediction are considered: prediction after completing floorplanning, prediction after completing placement, and prediction after completing clock tree synthesis (CTS). A commercial static timing analysis (STA) tool is utilized to determine the mean absolute percentage error (MAPE) and the mean absolute error (MAE) for each scenario. Results obtained across all models trained on the complete dataset indicate that the proposed methodology outperforms the baseline errors produced by the commercial physical design tools with an average improvement of 61.58% in the MAPE score when predicting the post-routing arrival time after completing floorplanning and a 13.53% improvement when predicting the post-routing arrival time after completing placement. Additional prediction scenarios are analyzed, where the complete dataset is further sub-divided based on the size of the circuits, which leads to an average improvement of 34.83% in the MAPE score as compared to the commercial tool for post-floorplanning prediction of the post-routing arrival time and a 22.71% improvement for post-placement prediction of the post-routing arrival time.

A Tale of EDA’s Long Tail: Long-Tailed Distribution Learning for Electronic Design Automation

  • Zixuan Jiang
  • Mingjie Liu
  • Zizheng Guo
  • Shuhan Zhang
  • Yibo Lin
  • David Pan

Long-tailed distribution is a common and critical issue in the field of machine learning. While prior work addressed data imbalance in several tasks in electronic design automation (EDA), insufficient attention has been paid to the long-tailed distribution in real-world EDA problems. In this paper, we argue that conventional performance metrics can be misleading, especially in EDA contexts. Through two public EDA problems using convolutional neural networks and graph neural networks, we demonstrate that simple yet effective model-agnostic methods can alleviate the issue induced by long-tailed distribution when applying machine learning algorithms in EDA.

SESSION: Plenary II

Industrial Experience with Open-Source EDA Tools

  • Christian Lück
  • Daniela Sánchez Lopera
  • Sven Wenzek
  • Wolfgang Ecker

Commonly, the design flow of integrated circuits from initial specifications to fabrication employs commercial, proprietary EDA tools. While these tools deliver high-quality, production-ready results, they can be seen as expensive black boxes and thus are not suited for research and academic purposes. Innovations in the field are mostly focused on optimizing the quality of the results of the designs by modifying core elements of the tool chain or using techniques from the Machine Learning domain. In both cases, researchers require many or long runs of EDA tools for comparing results or generating training data for Machine Learning models. Using proprietary, commercial tools in those cases may be either not affordable or not possible at all.

With OpenROAD and OpenLane, mature open-source alternatives have emerged in the past couple of years. The development is driven by a growing community that is improving and extending the tools daily. In contrast to commercial tools, OpenROAD and OpenLane are transparent and allow inspection, modification, and replacement of every tool aspect. They are also free and therefore well suited for use cases such as Machine Learning data generation. Specifically, the fact that no licenses are needed either for the tools or for the default PDK enables even new students and newcomers to the field to quickly deploy their ideas and create initial proofs of concept.

Therefore, we at Infineon are using OpenROAD and OpenLane for more experimental and innovative projects. Our vision is to build initial prototypes using free software, and then improve upon them by cross-checking and polishing with commercial tools before delivering them for production. This talk will show Infineon’s experience with these open-source tools so far.

The first steps involved getting OpenLane installed in our company IT infrastructure. While the developers offer convenient build methods using Docker containers, these cannot be used in Infineon’s compute farm. This, and also the fact that most of the open-source tools are currently evolving quickly with little to no versioning, led to the setup of an in-house continuous integration and continuous delivery system for nightly and weekly builds of the tools. Once the necessary tools were installed and running, effort was put into integrating Infineon’s in-house technology data.

At Infineon, we envision two use cases for OpenROAD/OpenLane: physical synthesis hyperparameter exploration (and tuning) and optimization of the complete flow starting from RTL. First, our goal is to use OpenROAD’s AutoTuner in the path-finding phase to automatically and cost-effectively find optimal parameters for the flow and then build upon these results within a commercial tool for the later steps near the tapeout. Second, we want to include not only the synthesis flow inside the optimization loop of the AutoTuner, but also our in-house RTL generation framework (MetaRTL). For instance, having RTL generators for a RISC-V CPU and also the results of simulated runtime benchmarks for each iteration, the AutoTuner should be able to change fundamental aspects (for example, the number of pipeline stages) of the RTL to reach certain power, performance, and area requirements when running the benchmark code on the CPU.

Overall, we see OpenROAD/OpenLane as a viable alternative to commercial tools, especially for research and academic use, where modifications to the tools are needed and where very long and otherwise costly tool runtimes are expected.

SESSION: Session 7: ML Models for Analog Design and Optimization

Invertible Neural Networks for Design of Broadband Active Mixers

  • Oluwaseyi Akinwande
  • Osama Waqar Bhatti
  • Xingchen Li
  • Madhavan Swaminathan

In this work, we present an invertible neural network for predicting the posterior distributions of the design space of broadband active mixers with RF frequencies from 100 MHz to 10 GHz. This invertible method gives a fast and accurate model when investigating crucial properties of active mixers such as conversion gain and noise figure. Our results show that the response generated by the invertible neural network model correlates closely with the output response from the circuit simulator.

High Dimensional Optimization for Electronic Design

  • Yuejiang Wen
  • Jacob Dean
  • Brian A. Floyd
  • Paul D. Franzon

Bayesian optimization (BO) samples points of interest to update a surrogate model for a black-box function. This makes it a powerful technique for optimizing electronic designs, which have unknown objective functions and demand a high computational cost of simulation. Unfortunately, Bayesian optimization suffers from scalability issues, e.g., it performs well only in problems of up to about 20 dimensions. This paper addresses the curse of dimensionality and proposes an algorithm entitled Inspection-based Combo Random Embedding Bayesian Optimization (IC-REMBO). IC-REMBO improves the effectiveness and efficiency of the Random EMbedding Bayesian Optimization (REMBO) approach, which is a state-of-the-art high-dimensional optimization method. Generally, it inspects the space near local optima to explore more points near local optima, so that it mitigates the over-exploration on boundaries and embedding distortion in REMBO. Consequently, it helps escape from local optima and provides a family of feasible solutions when inspecting near the global optimum within a limited number of iterations.

The effectiveness and efficiency of the proposed algorithm are compared with the state-of-the-art REMBO when optimizing a mmWave receiver with 38 calibration parameters to meet 4 objectives. The optimization results are close to those of a human expert. To the best of our knowledge, this is the first time REMBO or the inspection method has been applied to electronic design.
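
An illustrative sketch of the random-embedding idea behind REMBO, which IC-REMBO builds on (this is not the proposed IC-REMBO inspection method): optimize in a low-dimensional space y and evaluate the high-dimensional design x = clip(A·y). The toy objective, dimensions, and GP/EI details are assumptions:

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    rng = np.random.default_rng(0)
    D, d = 38, 4                               # ambient vs. embedding dimensions
    A = rng.normal(size=(D, d))

    def objective(x):                          # toy stand-in for the simulator
        return np.sum((x - 0.2) ** 2)

    def to_design(y):                          # embed and clip to the design box
        return np.clip(A @ y, -1.0, 1.0)

    Y = rng.uniform(-1, 1, (5, d))
    f = np.array([objective(to_design(y)) for y in Y])

    for it in range(40):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(Y, f)
        cand = rng.uniform(-1, 1, (512, d))
        mu, sd = gp.predict(cand, return_std=True)
        sd = np.maximum(sd, 1e-9)
        z = (f.min() - mu) / sd
        ei = (f.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
        y_new = cand[ei.argmax()]
        Y = np.vstack([Y, y_new])
        f = np.append(f, objective(to_design(y_new)))

    print("best objective:", f.min())

IC-REMBO's contribution, per the abstract, is to additionally inspect the neighborhood of local optima to counter boundary over-exploration and embedding distortion.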

Transfer of Performance Models Across Analog Circuit Topologies with Graph Neural Networks

  • Zhengfeng Wu
  • Ioannis Savidis

In this work, graph neural networks (GNNs) and transfer learning are leveraged to transfer device sizing knowledge learned from data of related analog circuit topologies to predict the performance of a new topology. A graph is generated from the netlist of a circuit, with nodes representing the devices and edges the connections between devices. To allow for the simultaneous training of GNNs on data of multiple topologies, graph isomorphism networks are adopted to address the limitation of graph convolutional networks in distinguishing between different graph structures. The techniques are applied to transfer predictions of performance across four op-amp topologies in a 65 nm technology, with 10000 sets of sizing and performance evaluations sampled for each circuit. Two scenarios, zero-shot learning and few-shot learning, are considered based on the availability of data in the target domain. Results from the analysis indicate that zero-shot learning with GNNs trained on all the data of the three related topologies is effective for coarse estimates of the performance of the fourth unseen circuit without requiring any data from the fourth circuit. Few-shot learning by fine-tuning the GNNs with a small dataset of 100 points from the target topology after pre-training on data from the other three topologies further boosts the model performance. The fine-tuned GNNs outperform the baseline artificial neural networks (ANNs) trained on the same dataset of 100 points from the target topology with an average reduction in the root-mean-square error of 70.6%. Applying the proposed techniques, specifically GNNs and transfer learning, improves the sample efficiency of the performance models of the analog ICs through the transfer of predictions across related circuit topologies.

RxGAN: Modeling High-Speed Receiver through Generative Adversarial Networks

  • Priyank Kashyap
  • Archit Gajjar
  • Yongjin Choi
  • Chau-Wai Wong
  • Dror Baron
  • Tianfu Wu
  • Chris Cheng
  • Paul Franzon

Creating models for modern high-speed receivers using circuit-level simulations is costly, as it requires computationally expensive simulations and upwards of months to finalize a model. Moreover, many models do not necessarily agree with the final hardware they are supposed to emulate. Further, these models are complex due to the presence of various filters, such as a decision feedback equalizer (DFE) and continuous-time linear equalizer (CTLE), which enable the correct operation of the receiver. Other data-driven approaches tackle receiver modeling through multiple models to account for as many configurations as possible. This work proposes a data-driven approach using generative adversarial training to model a real-world receiver with varying DFE and CTLE configurations while handling different channel conditions and bitstreams. The approach is highly accurate, as the eye height and width are within 1.59% and 1.12% of the ground truth, and the horizontal and vertical bathtub curves correlate well with the ground-truth bathtub curves.

Who’s Tsung-Wei Huang

Sep 1st, 2022

Tsung-Wei Huang

Assistant Professor

University of Utah

Email:

tsung-wei.huang@utah.edu

Personal webpage

https://tsung-wei-huang.github.io/

Research interests

Design automation and high-performance computing.

Short bio

Dr. Tsung-Wei Huang received his B.S. and M.S. degrees from the Department of Computer Science at Taiwan’s National Cheng-Kung University in 2010 and 2011, respectively. He then received his Ph.D. degree from the Department of Electrical and Computer Engineering (ECE) at the University of Illinois at Urbana-Champaign (UIUC) in 2017. He has been researching high-performance computing systems with an application focus on design automation algorithms and machine learning kernels. He has created several open-source software projects, such as Taskflow and OpenTimer, that are used by many people. Dr. Huang has received several awards for his research contributions, including the ACM SIGDA Outstanding PhD Dissertation Award in 2019, the NSF CAREER Award in 2022, and the Humboldt Research Fellowship Award in 2022. He also received the 2022 ACM SIGDA Service Award, recognizing his community service that engaged students in design automation research.

Research highlights

(1) Parallel Programming Environment: Modern scientific computing relies on a heterogeneous mix of computational patterns, domain algorithms, and specialized hardware to achieve key scientific milestones that go beyond traditional capabilities. However, programming these applications often requires complex expert-level tools and a deep understanding of parallel decomposition methodologies. Our research investigates new programming environments to assist researchers and developers to tackle the implementation complexities of high-performance parallel and heterogeneous programs.

(2) Electronic Design Automation (EDA): The ever-increasing design complexity in VLSI implementation has far exceeded what many existing EDA tools can scale with reasonable design time and effort. A key fundamental challenge is that EDA must incorporate new parallel paradigms comprising manycore CPUs and GPUs to achieve transformational performance and productivity milestones. Our research investigates new computing methods to advance the current state-of-the-art by assisting everyone to efficiently tackle the challenges of designing, implementing, and deploying parallel EDA algorithms on heterogeneous nodes.

(3) Machine Learning Systems: Machine learning has become centric to a wide range of today’s applications, such as recommendation systems and natural language processing. Due to the unique performance characteristics, GPUs are increasingly used for machine learning applications and can dramatically accelerate neural network training and inference. Modern GPUs are fast and are equipped with new programming models and scheduling runtimes that can bring significant yet largely untapped performance benefits to many machine learning applications. Our research investigates novel parallel algorithms and frameworks to accelerate machine learning system kernels with order-of-magnitude performance breakthrough.