Who’s Li Jiang

July 1st, 2022

Li Jiang

Assistant Professor

Shanghai Jiao Tong University

Email:

ljiang_cs@sjtu.edu.cn

Personal webpage

https://www.cs.sjtu.edu.cn/~jiangli/

Research interests

Compute-in-Memory, Neuromorphic Computing, Domain-Specific Architecture for AI, Databases, Networking, etc.

Short bio

Li Jiang received the B.S. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, in 2007, and the M.Phil. and Ph.D. degrees from the Department of Computer Science and Engineering, the Chinese University of Hong Kong, in 2010 and 2013, respectively. He has published more than 80 peer-reviewed papers in top-tier computer architecture, EDA, and AI/database conferences and journals, including ISCA, MICRO, DAC, ICCAD, AAAI, ICCV, SIGIR, TC, TCAD, and TPDS. He received the Best Paper Award at DATE'22 and Best Paper Nominations at ICCAD'10 and DATE'21. According to the IEEE Digital Library, five of his articles rank among the top five most-cited papers of their respective conferences. Some of his results have been incorporated into the IEEE P1838 standard, and several of his techniques have entered commercial use through collaborations with TSMC, Huawei, and Alibaba.

He received the Best Ph.D. Dissertation Award at ATS 2014 and was a finalist for the TTTC E. J. McCluskey Doctoral Thesis Award. He received the ACM Shanghai Rising Star Award and the CCF VLSI Early Career Award in 2019, as well as the second-class prize of the Wu Wenjun Award for Artificial Intelligence. He serves as a co-chair and TPC member of several international and national conferences, such as MICRO, DATE, ASP-DAC, ITC-Asia, ATS, CFTC, and CTC. He is an Associate Editor of IET Computers & Digital Techniques and of Integration, the VLSI Journal. He is a co-founder of ChinaDA and of the ACM SIGDA East China Branch.

Research highlights

Prof. Li Jiang has worked on test and repair architectures for 3D ICs that can dramatically reduce cost, advocating mechanisms for sharing scarce test and repair resources. His group optimizes the 3D SoC test architecture under test-pin-count and thermal-dissipation constraints by sharing the test-access mechanism (TAM) and test wires between pre-bond wafer-level and post-bond package-level tests. They further propose an inter-die spare-sharing technique and die-matching algorithms to improve the stack yield of 3D stacked memory; this work was nominated for the Best Paper Award at ICCAD 2010. Building on this method, they worked with TSMC on a novel BISR architecture that clusters and maps faulty rows/columns across dies onto the same spare row/column to enhance repairability. This series of works has been widely adopted by the mainstream and incorporated into the IEEE P1838 standard.
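
For intuition, the die-matching idea can be sketched on a toy model. Everything below is a deliberate simplification invented for illustration (dies described only by their faulty row indices, a fixed shared spare budget per stack, brute-force pairing), not the ICCAD 2010 formulation:

from itertools import permutations

# Toy model of inter-die spare sharing: two stacked dies repair faulty
# rows from a shared pool of spare rows, and faults in the SAME row index
# of both dies consume only one spare. Pairing dies whose fault maps
# overlap therefore raises stack yield.

SPARES = 2  # spare rows available per stack (assumed)

def repairable(die_a, die_b, spares=SPARES):
    # A stack is repairable if the union of faulty row indices
    # fits into the shared spare rows.
    return len(set(die_a) | set(die_b)) <= spares

def best_matching(wafer1, wafer2):
    """Exhaustive die matching between two small wafers, maximizing
    the number of repairable stacks (brute force for illustration)."""
    best, best_pairs = -1, None
    for perm in permutations(range(len(wafer2))):
        good = sum(repairable(wafer1[i], wafer2[j])
                   for i, j in enumerate(perm))
        if good > best:
            best, best_pairs = good, list(enumerate(perm))
    return best, best_pairs

if __name__ == "__main__":
    # Each die is described by its set of faulty row indices.
    wafer1 = [{1, 5}, {2}, {3, 4}]
    wafer2 = [{3, 4}, {1}, {2, 7}]
    yield_count, pairs = best_matching(wafer1, wafer2)
    print(f"repairable stacks: {yield_count}, pairing: {pairs}")

Here the naive identity pairing repairs only one stack, while the best matching repairs all three, which is the effect die-matching algorithms exploit at wafer scale.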

To improve assembly yield in the TSV fabrication process, they developed a fault model that accounts for the TSV coupling effect, which had not been carefully investigated before. It drew their attention to a unique phenomenon: faulty TSVs tend to cluster. They therefore proposed a novel spare-TSV sharing architecture composed of a lightweight switch design, two effective and efficient repair algorithms, and a TSV-grid mapping mechanism that avoids catastrophic clustered TSV defects.
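
The repair question in such a spare-sharing architecture has the flavor of a bipartite matching problem: can every faulty TSV be rerouted to a distinct spare within its switch's reach? The sketch below illustrates only that feasibility check on a made-up grid model (Manhattan reach radius, Kuhn's augmenting-path matching); it is not one of the paper's repair algorithms:

# Hypothetical spare-TSV repair feasibility check: each faulty TSV may be
# rerouted to a spare within a small switch radius; repair succeeds iff
# there is a matching from faulty TSVs to distinct spares.

def can_repair(faulty, spares, radius=1):
    """faulty, spares: lists of (x, y) TSV grid coordinates."""
    def reachable(f, s):
        return abs(f[0] - s[0]) + abs(f[1] - s[1]) <= radius

    match = {}  # spare index -> index of the faulty TSV it repairs

    def augment(fi, seen):
        # Kuhn's augmenting-path search starting from faulty TSV fi.
        for si, sp in enumerate(spares):
            if si not in seen and reachable(faulty[fi], sp):
                seen.add(si)
                if si not in match or augment(match[si], seen):
                    match[si] = fi
                    return True
        return False

    return all(augment(fi, set()) for fi in range(len(faulty)))

if __name__ == "__main__":
    print(can_repair([(0, 0), (0, 2)], [(0, 1), (1, 2)]))  # True
    # Clustered faults competing for one nearby spare defeat repair:
    print(can_repair([(0, 0), (0, 2)], [(0, 1), (4, 4)]))  # False

The second case shows why clustered defects are catastrophic: two faults contend for the single reachable spare, which is exactly what the TSV-grid mapping mechanism is designed to avoid.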

A ReRAM cell needs multiple programming pulses to counter device programming variation and resistance drift. To overcome the resulting programming latency and energy cost, they propose a Self-Terminating Write (STW) circuit that heavily reuses the inherent PIM peripherals (e.g., the ADC and trans-impedance amplifier) to obtain 2-bit precision with a single programming pulse. This work received the Best Paper Award at DATE 2022.

Who’s Fan Chen

July 1st, 2022

Fan Chen

Assistant Professor

Indiana University Bloomington

Email:

fc7@iu.edu

Personal webpage

https://homes.luddy.indiana.edu/fc7/

Research interests

Beyond-CMOS Computing, Quantum Machine Learning, Accelerator Architecture for Emerging Applications, Emerging Nonvolatile Memory

Short bio

Fan Chen is an assistant professor in the Department of Intelligent Systems Engineering at Indiana University Bloomington. Dr. Chen received her Ph.D. from the Department of Electrical and Computer Engineering at Duke University. She is a recipient of the 2022 NSF Faculty Early Career Development Program (CAREER) Award, the 2021 Service Recognition Award of the Great Lakes Symposium on VLSI (GLSVLSI), the 2019 Cadence Women in Technology Scholarship, and the Best Paper Award and Ph.D. Forum Best Poster Award at the 2018 Asia and South Pacific Design Automation Conference (ASP-DAC). Dr. Chen served as publication chair of ISLPED 2021/2022, chair of the SIGDA University Booth at DAC 2021/2022, web and registration chair of GLSVLSI 2022, proceedings chair of ASAP 2021, and arrangement chair of GLSVLSI 2021. She also serves on the editorial board of IEEE Circuits and Systems Magazine (CAS-M) and is a technical reviewer for more than 30 international conferences and journals, such as IEEE TC, IEEE TCAS-I, IEEE TNNLS, IEEE D&T, IEEE IoT-J, ACM TACO, ACM TODAES, and ACM JETC.

Research highlights

Prof. Chen's research focuses on beyond-CMOS computing, quantum machine learning, and accelerator architectures for emerging applications. Her latest work on quantum machine learning investigates fundamentally novel quantum equivalents of deep learning frameworks, derived from the working principles of quantum computers, paving the way for general-purpose quantum algorithms on noisy intermediate-scale quantum devices. Another notable contribution is her accelerator architecture designs for emerging applications, including deep learning and bioinformatics. The memory and computing requirements of such big-data applications pose significant technical challenges to their adoption in a broader range of services. Prof. Chen investigates how domain-specific accelerators co-designed across the system, architecture, and algorithm levels can improve performance and energy efficiency. Her work has been recognized by the academic community and has appeared in top conferences such as HPCA, DAC, ICCAD, DATE, ISLPED, ASP-DAC, and ESWEEK. Her research on "System Support for Scalable, Fast, and Power-Efficient Genome Sequencing" was honored with the National Science Foundation Faculty Early Career Development (CAREER) Award.

Who’s Kuan-Hsun Chen

May 1st, 2022

Kuan-Hsun Chen

Assistant Professor

University of Twente, the Netherlands

Email:

k.h.chen@utwente.nl

Personal webpage

https://people.utwente.nl/k.h.chen

Research interests

Real-Time Embedded Systems, Non-Volatile Memories, Architecture-Aware Software Design, Resource-Constrained Machine Learning

Short bio

Dr.-Ing. Kuan-Hsun Chen is an assistant professor at the Chair of Computer Architecture and Embedded Systems (CAES) at the University of Twente in the Netherlands. He earned his Ph.D. (Dr.-Ing.) in Computer Science from TU Dortmund University, Germany, in May 2019 with distinction (summa cum laude), and his master's degree in Computer Science from National Tsing Hua University in Taiwan. He has published more than 40 scientific works in top peer-reviewed journals and international conferences. His key research interests are real-time embedded systems, non-volatile memories, architecture-aware software design, and resource-constrained machine learning. Dr. Chen currently serves as an Associate Editor of AIMS Applied Computing and Intelligence (AIMS-ACI) and a Guest Editor of the Journal of Signal Processing Systems (JSPS). He is a Technical Program Committee (TPC) member of several leading international conferences in computer science, such as the Real-Time Systems Symposium (RTSS) and the International Conference on High Performance Computing, Data, & Analytics (HiPC), and a reviewer for many peer-reviewed journals and conferences (TC, TECS, TCPS, RTAS, IROS, ECML PKDD). Dr. Chen received the Best Student Paper Award at RTCSA'18, a Best Paper Nomination at DATE'21, and a dissertation award from TU Dortmund University in 2019. The German Academic Exchange Service (DAAD) granted him one research project as Principal Investigator and a personal grant for a postdoctoral exchange in Japan in the summer of 2021. He has also been a volunteer mentor in the European Space Agency (ESA) Summer of Code in Space (2017) and in Google Summer of Code since 2016, contributing to open-source development on a popular real-time operating system, RTEMS.

Research highlights

Embedded systems in various safety-critical domains, such as computing systems in automotive and avionic devices, are important for modern society. Due to their intensive interaction with the physical environment, where time naturally progresses, the correctness of the system depends not only on the functional correctness of the delivered results but also on the timeliness of the instants at which these results are delivered. Dr. Chen's research covers a wide range of scientific issues in this area; his two central research themes are as follows.

Dependable Real-Time Systems: With continued technology shrinking, hardware faults become more frequent and threaten correct system behavior. Software-based tolerance techniques are prominent countermeasures due to their flexibility, but their time overhead makes timeliness a pressing issue. In this context, three kinds of treatments have been studied: 1) In soft real-time systems, occasional deadline misses are acceptable, and a series of analyses for the probability of deadline misses has been developed. The most effective one efficiently derives safe upper bounds on the deadline-miss probability with a speedup of several orders of magnitude over conventional convolution-based approaches (https://ieeexplore.ieee.org/abstract/document/7993392). 2) By modeling the inherent safety margin of control applications, soft errors can sometimes be safely ignored. A runtime-adaptive method was developed that compensates only when necessary while satisfying hard real-time constraints; this work was presented at LCTES'16 and published in ACM SIGPLAN Notices (https://dl.acm.org/doi/abs/10.1145/2980930.2907952). 3) On multi-core systems, several approaches were developed to optimize system reliability via the deployment of redundant multithreading, including a reliability-driven task-mapping technique for homogeneous multi-core architectures with reliability and performance heterogeneity, published in IEEE Transactions on Computers (https://ieeexplore.ieee.org/abstract/document/7422036).

Architecture-Aware Software Design: To unleash the scarce computational power of embedded systems, he focuses on exploiting a given architecture, especially for data-analysis applications such as data mining and machine learning. He develops code generators that automate the optimization of memory layouts for tree-based inference models: given a trained model, optimized code is generated in C++ to reduce cache misses and speed up inference on various CPU architectures. This work was recently published in ACM Transactions on Embedded Computing Systems (https://dl.acm.org/doi/abs/10.1145/3508019). He also works on system design for non-volatile memories, which offer several advantages, such as low leakage power, high density, and low unit cost, but impose novel technical constraints, notably limited endurance. His results here include software-based memory analyses and wear-leveling approaches. One highlight is the exploration of energy-aware real-time scheduling for hybrid memory architectures: a multi-processor procrastination algorithm (HEART), based on partitioned earliest-deadline-first (pEDF) scheduling, reduces energy consumption by actively enlarging hibernation time. This work was presented at EMSOFT'21 and published in ACM Transactions on Embedded Computing Systems (https://dl.acm.org/doi/abs/10.1145/3477019).
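
For intuition on the first highlight, the gap between convolution-based analysis and analytic bounds can be illustrated on a toy model. The sketch below assumes a deliberately simplified setting (n independent jobs that each overrun with probability p, a deadline missed once more than k overruns occur), which is not the task model of the cited paper; it contrasts an O(n^2) convolution with an O(1) Chernoff-style safe upper bound:

import math

def miss_prob_convolution(n, p, k):
    """Exact tail P[X > k] for X ~ Binomial(n, p), computed by
    convolving in one job at a time (O(n^2), like convolution-based
    analyses in spirit)."""
    dist = [1.0]  # dist[j] = P[j overruns so far]
    for _ in range(n):
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1 - p)
            nxt[j + 1] += q * p
        dist = nxt
    return sum(dist[k + 1:])

def miss_prob_chernoff(n, p, k):
    """Safe upper bound on P[X > k] via the Chernoff bound in O(1):
    P[X >= a*n] <= exp(-n * KL(a || p)) for a > p."""
    a = (k + 1) / n
    if a <= p:
        return 1.0
    kl = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
    return math.exp(-n * kl)

if __name__ == "__main__":
    n, p, k = 200, 0.01, 10
    print("exact tail :", miss_prob_convolution(n, p, k))
    print("Chernoff UB:", miss_prob_chernoff(n, p, k))

The bound is looser than the exact tail but safe (never underestimates the miss probability) and costs constant time per check, which is where the orders-of-magnitude speedup comes from.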

ISPD’22 TOC

ISPD ’22: Proceedings of the 2022 International Symposium on Physical Design

SESSION: Session 1: Opening Session and First Keynote

Session details: Session 1: Opening Session and First Keynote

  • Laleh Behjat
  • Stephen Yang

The Need for Speed: From Electric Supercars to Cloud Bursting for Design

  • Dean Drako

Our industry has an insatiable need for speed. In addition to fast products for consumer electronics, medical, mil-aero, security, smart sensors, AI processing, robots, and more, we also continuously push for higher performance in the processing and communication infrastructure needed for true hyper-connectivity.

Dean Drako will compare our industry’s drive for speed to electric supercars. He will then drill down into four key elements that advanced design and verification teams deploy to speed the delivery of their innovative products to successfully meet market windows.

SESSION: Session 2: Placement, Clock Tree Synthesis, and Optimization

Session details: Session 2: Placement, Clock Tree Synthesis, and Optimization

  • Deepashree Sengupta

RTL-MP: Toward Practical, Human-Quality Chip Planning and Macro Placement

  • Andrew B. Kahng
  • Ravi Varadarajan
  • Zhiang Wang

In a typical RTL-to-GDSII flow, floorplanning plays an essential role in achieving decent quality of results (QoR). A good floorplan typically requires interaction between the frontend designer, who is responsible for the functionality of the RTL, and the backend physical design engineer. The increasing complexity of macro-dominated designs (especially machine learning accelerators with autogenerated RTL) has made the floorplanning task even more challenging and time-consuming. In this paper, we propose RTL-MP, a novel macro placer which utilizes RTL information and tries to "mimic" the interaction between the frontend RTL designer and the backend physical design engineer to produce human-quality floorplans. By exploiting the logical hierarchy and processing logical modules based on connection signatures, RTL-MP can capture the dataflow inherent in the RTL and use the dataflow information to guide macro placement. We also apply autotuning to optimize hyperparameter settings based on input designs. We have built RTL-MP based on OpenROAD infrastructure and applied RTL-MP to a set of industrial designs. RTL-MP outperforms state-of-the-art commercial macro placers and achieves QoR similar to that of handcrafted floorplans.

Clock Design Methodology for Energy and Computation Efficient Bitcoin Mining Machines

  • Chien-Pang Lu
  • Iris Hui-Ru Jiang
  • Chih-Wen Yang

Bitcoin mining machines have become a new driving force pushing the physical limits of semiconductor process technology. Instead of peak performance, mining machines pursue energy and computation efficiency in implementing cryptographic hash functions. Therefore, the state-of-the-art ASIC design of mining machines adopts near-threshold computing, deep pipelines, and uni-directional data flow. Based on these design properties, in this paper we propose a novel clock reversing tree design methodology for bitcoin mining machines. In the clock reversing tree, the clock of the global tree is fed from the last pipeline stage backward to the first one, and the clock latency difference between the local clock roots of two consecutive stages maintains a constant delay. The local tree of each stage is well balanced and keeps the same clock latency. This special clock topology naturally utilizes setup time slacks to gain hold time margins. Moreover, to alleviate the on-chip variations incurred by near-threshold computing, we maximize the common clock path shared by the flip-flops of each individual stage. Finally, we perform inverter-pair swaps to maintain the duty cycle. Experimental results show that our methodology is promising for industrial bitcoin mining designs: compared with two variation-aware clock network synthesis approaches widely used in modern ASIC designs, our approach can reduce clock buffer/inverter usage by up to 64% and clock power by 12%, decrease hold-time-violating paths by 99%, and save 85% of the area spent on timing fixing. The proposed clock design methodology is general and applicable to blockchain and other ASICs with deep pipelines and strong data flow.
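
The setup-for-hold trade at the heart of the reversed tree can be checked with back-of-the-envelope arithmetic. The function and numbers below are invented for illustration: delaying the launch clock of a stage relative to its capture clock by a skew s costs s of setup slack and buys s of hold margin:

def stage_slacks(T, s, t_cq, d_min, d_max, t_setup, t_hold):
    # Launch clock arrives s later than capture clock (reversed tree).
    setup_slack = T - (s + t_cq + d_max + t_setup)  # capture on next edge
    hold_margin = (s + t_cq + d_min) - t_hold       # capture on same edge
    return setup_slack, hold_margin

if __name__ == "__main__":
    T = 1.0  # clock period (ns); all numbers made up for illustration
    for s in (0.0, 0.1, 0.2):
        su, ho = stage_slacks(T, s, t_cq=0.05, d_min=0.02, d_max=0.6,
                              t_setup=0.05, t_hold=0.03)
        print(f"skew={s:.1f}  setup slack={su:+.2f}  hold margin={ho:+.2f}")

With ample setup slack per pipeline stage, as in hash pipelines, a constant per-stage skew converts the surplus into hold robustness without extra hold-fixing buffers.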

Kernel Mapping Techniques for Deep Learning Neural Network Accelerators

  • Sarp Özdemir
  • Mohammad Khasawneh
  • Smriti Rao
  • Patrick H. Madden

Deep learning applications are compute-intensive and naturally parallel; this has spurred the development of new processor architectures tuned for the workload. In this paper, we consider structural differences between deep learning neural networks and more conventional circuits, highlighting how this impacts strategies for mapping neural network compute kernels onto available hardware. We present an efficient mapping approach based on dynamic programming, as well as a method to establish performance bounds. We also propose an architectural approach to extend the practical lifetime of hardware accelerators, enabling the integration of a variety of heterogeneous processors into a high-performance system. Experimental results using benchmarks from a recent ISPD contest are also reported.
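
One classic flavor of such a dynamic program is shown below purely as an illustration (a chain of layers split into contiguous groups, one group per processing element, minimizing the bottleneck group); the function name and layer costs are made up, and this is not necessarily the paper's formulation:

import functools

def map_chain(costs, k):
    """Split the chain of layer costs into k contiguous groups, one per
    PE, minimizing the maximum group cost (the pipeline bottleneck)."""
    @functools.lru_cache(maxsize=None)
    def dp(i, k_left):
        # Best achievable bottleneck for layers costs[i:] with k_left PEs.
        if k_left == 1:
            return sum(costs[i:])
        best = float("inf")
        group = 0
        for j in range(i, len(costs) - k_left + 1):
            group += costs[j]  # group = costs[i..j] on one PE
            best = min(best, max(group, dp(j + 1, k_left - 1)))
        return best
    return dp(0, k)

if __name__ == "__main__":
    layer_costs = [8, 2, 3, 7, 1, 4]    # per-layer compute cost (made up)
    print(map_chain(layer_costs, k=3))  # bottleneck cost of best mapping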

SESSION: Session 3: Design Flow Advances with Machine Learning and Lagrangian Relaxation

Session details: Session 3: Design Flow Advances with Machine Learning and Lagrangian Relaxation

  • Ulf Schlichtmann

Design Flow Parameter Optimization with Multi-Phase Positive Nondeterministic Tuning

  • Matthew M. Ziegler
  • Lakshmi N. Reddy
  • Robert L. Franch

Synthesis and place & route tools are highly leveraged in modern digital design. But despite continuous improvement in CAD tool performance, products in competitive markets often set PPA (performance, power, area) targets beyond what the tools can natively deliver. These aggressive targets lead circuit designers to tune a vast number of design flow parameters in search of near-optimal, design-specific flow recipes. Compounding the complex design flow parameter tuning problem, many digital design tools exhibit nondeterminism, i.e., run-to-run variation. While CAD tool nondeterminism is typically considered undesirable behavior, this paper proposes design flow tuning methodologies that take advantage of it. We propose techniques that run targeted scenarios multiple times to exploit the positive deviations nondeterminism can produce, and that leverage the best observed runs as seeds for multi-phase tuning. We introduce three seed variants for multi-phase tuning with a spectrum of characteristics, trading off PPA improvement against reduced run-to-run variation. Our experimental analysis using high-performance industrial designs shows that the proposed techniques outperform an existing state-of-the-art industrial design flow tuning program across all PPA metrics. Furthermore, our approaches reduce the run-to-run variation of the best scenarios, leading to a more predictable design flow.
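
The core idea of harvesting positive nondeterminism can be mimicked with a toy black-box flow whose outcome is noisy. Everything in this sketch (the fake run_flow objective, the phase structure, the scenario grid) is invented for illustration and reflects no real tool:

import random

def run_flow(param, seed):
    # Stand-in for a noisy synthesis/P&R run: the same parameter
    # setting returns a different QoR on every run.
    random.seed(seed)
    nominal = -(param - 3.0) ** 2          # best around param = 3.0
    return nominal + random.gauss(0, 0.5)  # run-to-run variation

def tune(phases=3, runs_per_scenario=4):
    best_param, best_qor = 0.0, float("-inf")
    for phase in range(phases):
        # Explore around the current best (the "seed" of this phase).
        scenarios = [best_param + d for d in (-1.0, 0.0, 1.0)]
        for param in scenarios:
            # Re-run each scenario to harvest positive deviations.
            for r in range(runs_per_scenario):
                qor = run_flow(param, seed=hash((phase, param, r)))
                if qor > best_qor:
                    best_param, best_qor = param, qor
    return best_param, best_qor

if __name__ == "__main__":
    print(tune())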

Integrating LR Gate Sizing in an Industrial Place-and-Route Flow

  • David Chinnery
  • Ankur Sharma

Lagrangian relaxation (LR) based gate sizing is the state-of-the-art gate-sizing approach. Integrating it within a place-and-route (P&R) tool is difficult, as LR needs multiple iterations to converge, requiring very fast timing analysis. Gate sizing is invoked in many P&R flow steps, so it is also unclear where LR sizing is best used. We detail the development of an LR gate sizer for an industrial P&R flow. Software architecture and P&R flow needs are discussed. We summarize how we sped up the LR sizer by 3x to resize a million gates per hour and ensured that multi-threaded results are deterministic. LR sizing experiments at the fast WNS/TNS optimization steps in the flow stages before and after clock tree synthesis (CTS) show excellent results: 10% to 20% setup-timing total negative slack (TNS) reduction with 11% to 14% less leakage power, or 1% to 3% lower total power (dynamic power + leakage) with a total power objective, and 1% to 3% lower cell area. Worst negative slack (WNS) also improved in two-thirds of the designs pre-CTS. In the full flow, 5% lower leakage, 1% lower total power, and 0.6% lower cell area can be achieved, with roughly neutral impact on other metrics, compared to a high-effort low-power P&R flow baseline.
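
The basic LR sizing loop is easy to sketch on a toy problem: minimize leakage subject to a path-delay budget, with a multiplier pricing delay so that each gate can be sized independently in the inner step. This is a teaching sketch with made-up cost models, not the production sizer described above:

# Toy LR sizing: a chain of gates with discrete sizes; leakage grows
# with size while delay shrinks. Minimize total leakage s.t. delay <= T.

SIZES = [1, 2, 4]

def leakage(gate, s):
    return gate["leak"] * s

def delay(gate, s):
    return gate["d0"] / s

def lr_size(gates, T, iters=100, step=0.05):
    lam = 1.0  # Lagrange multiplier on the path-delay constraint
    sizing = [SIZES[0]] * len(gates)
    for _ in range(iters):
        # Relaxed inner problem decomposes: each gate independently
        # minimizes leakage(g, s) + lam * delay(g, s).
        sizing = [min(SIZES, key=lambda s: leakage(g, s) + lam * delay(g, s))
                  for g in gates]
        # Subgradient update: raise the delay price if over budget.
        slack = sum(delay(g, s) for g, s in zip(gates, sizing)) - T
        lam = max(0.0, lam + step * slack)
    return sizing, sum(leakage(g, s) for g, s in zip(gates, sizing))

if __name__ == "__main__":
    gates = [{"leak": 1.0, "d0": 4.0}, {"leak": 2.0, "d0": 6.0},
             {"leak": 0.5, "d0": 2.0}]
    print(lr_size(gates, T=5.0))

The reason LR scales is visible even here: the expensive coupling (the shared delay budget) is handled entirely by the scalar multiplier, so the per-gate choice stays local, which is what makes millions of gates per hour plausible with fast incremental timing.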

Machine-Learning Enabled PPA Closure for Next-Generation Designs

  • Vishal Khandelwal

Slowdown in process scaling is putting increasing pressure on EDA tools to bridge the power, performance and area (PPA) entitlement gap of Moore's Law. State-of-the-art designs are pushing the PPA envelope to the limit, accompanied by increasing design size and complexity, and shrinking time-to-market constraints. AI/ML techniques provide a promising direction to address many of the modeling and convergence challenges seen in physical design flows. Further, the promise of intelligent design tools capable of exploring the solution space efficiently brings game-changing possibilities to next-generation design methodologies. In this talk we will discuss various challenges and opportunities in delivering best-in-class PPA closure with AI/ML augmented digital implementation tools. We will also talk about some aspects of large-scale industrial adoption of such a system and the AI capabilities needed to power these tools to minimize the need for an expert user, or endless tool iterations.

Improving Chip Design Performance and Productivity Using Machine Learning

  • Narender Hanchate

Engineering teams are always under pressure to deliver increasingly aggressive power, performance and area (PPA) goals, as fast as possible, on many concurrent projects. Chip designers often spend significant time tuning the implementation flow for each project to meet these goals. Cadence Cerebrus machine learning chip design flow optimization automates this whole process, delivering better PPA much more quickly. During this presentation Cadence will discuss Cerebrus machine learning and distributed computing techniques which enable RTL to GDS flow optimization, delivering better engineering productivity and design performance.

SESSION: Session 4: Panel on Traditional Algorithms Versus Machine Learning Approaches

Session details: Session 4: Panel on Traditional Algorithms Versus Machine Learning Approaches

  • Patrick Groeneveld

From Hard-Coded Heuristics to ML-Driven Optimization: New Frontiers for EDA

  • Patrick R. Groeneveld

The very first Design Automation Conference was held in 1964, when computers were programmed with punch cards. The initial topics were related to automated printed circuit board design, cell placement, and early attempts at transient circuit analysis. The next decades saw the introduction of key graph algorithms and numerical analysis methods; optimal algorithms and more practical heuristic methods were published. The 1980s saw the advent of simulated annealing, a universal heuristic optimization method that found many applications. The next decade introduced powerful numerical placement methods for millions of cells. Soon after, physical synthesis was born by combining several incremental synthesis and analysis tools. Today's commercial EDA tools run a very complex design flow that chains together hundreds of algorithms developed over six decades. Most of the effort goes into careful fine-tuning of parameters and addressing the complex, and often surprising, algorithmic interactions. This is a difficult trial-and-error process, driven by a small set of benchmarks. Machine learning methods will take some of the human tuning effort out of this loop, and some have already found their way into commercial tools. It will take a while before a machine learning method fully replaces a 'traditional' EDA algorithm: each method in the flow has a limited sweet spot and is often runtime-critical. On the other hand, conventional algorithms leave only insignificant opportunities for speedup through parallelism; machine learning methods may provide the only viable way to unlock the potential of massive cloud computing resources.

Embracing Machine Learning in EDA

  • Haoxing Ren

The application of machine learning (ML) in EDA is a hot research trend. To use ML in EDA, it is natural to think from the ML-method point of view, i.e., supervised learning, reinforcement learning, and unsupervised learning. From this point of view, we can roughly classify ML applications in EDA into three categories: prediction, optimization, and generation. The prediction category applies supervised learning methods to predict design quality-of-result (QoR) metrics. Two kinds of QoR metrics benefit from prediction. One kind comprises metrics that can be determined at the current design stage but whose calculation consumes a lot of computing resources; for example, [11][12] leverage ML to predict circuit power consumption without expensive simulations. The other kind comprises metrics that depend on future design stages; for example, [8] predicts post-layout parasitics from the schematic of analog circuits. The optimization category applies Bayesian optimization (BO) and reinforcement learning (RL) to directly optimize EDA problems. BO treats the optimization objective as a black-box function and tries to find optimal solutions by iteratively sampling the solution space; for example, [5] proposes to use BO with graph embedding and a neural-network-based surrogate model to size analog circuits. RL treats the optimization objective as the reward from an environment and trains agents to maximize the reward: [7] proposes to use RL to optimize macro placement, and [9] proposes to use RL to optimize parallel prefix circuit structures. The generation category applies generative models such as generative adversarial networks (GANs) to directly generate solutions to EDA problems. Generative models can learn from previously optimized data distributions and generate solutions for a new problem instance without going through iterative processes like BO or RL; for example, [10] builds a conditional GAN model that learns to generate optical proximity correction (OPC) layouts from the original mask.

What’s So Hard About (Mixed-Size) Placement?

  • Mohammad Khasawneh
  • Patrick H. Madden

For years, integrated circuit design has been a driver for algorithmic advances. The problems encountered in the design of modern circuits are often intractable — and with exponentially increasing size. Efficient heuristics and approximations have been essential to sustaining Moore’s Law growth, and now almost every aspect of the design process is heavily automated. There is, however, one notable exception: there is often substantial floor planning effort from human designers to position large macro blocks. The lack of full automation on this step has motivated the exploration of novel optimization methods, most recently with reinforcement learning. In this paper, we argue that there are multiple forces which have prevented full automation — and a lack of algorithmic methods is not the only factor. If the time has come for automation, there are a number of “traditional” methods that should be considered again. We focus on recursive bisection, and highlight key ideas from partitioning algorithms that have broader impact than one might expect. We also stress the importance of benchmarking as a way to determine which approaches may be most effective.

Scalability and Generalization of Circuit Training for Chip Floorplanning

  • Summer Yue
  • Ebrahim M. Songhori
  • Joe Wenjie Jiang
  • Toby Boyd
  • Anna Goldie
  • Azalia Mirhoseini
  • Sergio Guadarrama

Chip floorplanning is a complex task within the physical design process, with more than six decades of research dedicated to it. In a recent paper published in Nature (Mirhoseini et al., 2021), a new methodology based on deep reinforcement learning was proposed that solves the floorplanning problem for advanced chip technologies with production-quality results. The proposed method enables generalization, which means that the quality of placements improves as the policy is trained on a larger number of chip blocks. In this paper, we describe Circuit Training, an open-source distributed reinforcement learning framework that re-implements the proposed methodology in TensorFlow v2.x. We will explain the framework and discuss ways it can be extended to solve other important problems within physical design and, more generally, chip design. We also show new experimental results that demonstrate the scaling and generalization performance of Circuit Training.

SESSION: Session 5: Second Keynote

Session details: Session 5: Second Keynote

  • Louis K. Scheffer

The Cerebras CS-2: Designing an AI Accelerator around the World’s Largest 2.6 Trillion Transistor Chip

  • Jean-Philippe Fricker

The computing and memory demands from state-of-the-art neural networks have increased several orders of magnitude in just the last couple of years, and there’s no end in sight. Traditional forms of scaling chip performance are necessary but far from sufficient to run the machine learning models of the future. In this talk, Cerebras Co-Founder and Chief Systems Architect Jean-Philippe Fricker will explore the fundamental properties of neural networks and why they are not well served by traditional architectures. He will examine how co-design can relax the traditional boundaries between technologies and enable designs specialized for neural networks with new architectural capabilities and performance. Finally, Jean-Philippe will explore this rich new design space using the Cerebras architecture as a case study, highlighting design principles and tradeoffs that enable the machine learning models of the future.

SESSION: Session 6: Third Keynote

Session details: Session 6: Third Keynote

  • Chuck Alpert

Leveling Up: A Trajectory of OpenROAD, TILOS and Beyond

  • Andrew B. Kahng

Since June 2018, the OpenROAD project has developed an open-source, RTL-to-GDS EDA system within the DARPA IDEA program. The tool achieves no-human-in-loop generation of design-rule clean layout in 24 hours. This enables system innovation and design space exploration, while also democratizing hardware design by lowering barriers of cost, expertise and risk. Since November 2021, The Institute for Learning-enabled Optimization at Scale (TILOS), an NSF AI institute for advances in optimization partially supported by Intel, has begun its work toward a “new nexus” of AI, optimization, and the leading edge of practice for use domains that include IC design. This paper traces a trajectory of “leveling up” in the research enablement for IC physical design automation and EDA in general. This trajectory has OpenROAD and TILOS as waypoints, and advances themes of openness, infrastructure, and culture change.

SESSION: Session 7: Prototyping, Packaging, and Integration

Session details: Session 7: Prototyping, Packaging, and Integration

  • Tiago Reimann

3DIC Design: Challenges and Opportunities in System-of-Chips Integration

  • Ming Zhang

Technology scaling has enabled the semiconductor industry to successfully address application performance demands over the past three decades. However, the cost, complexity, and diminishing returns of classic Moore's Law scaling are accelerating the migration from traditional system-on-chip design to systems-of-chips design consisting of 3D heterogeneous integration systems, which open a new dimension to improve density, bandwidth, performance, power, and cost. Designing such 3D systems has its own challenges: to enable them, we need to look beyond piecemeal tooling to more hyperconvergent design systems that provide comprehensive technological solutions and productivity gains. This talk will outline the promise of 3D system-of-chips design and present key design and verification challenges faced by the engineering teams developing such systems. It will discuss how a holistic design solution consisting of end-to-end design automation, integrated tools, die-to-die IP, and methodologies can provide unique benefits in system-level design flow optimization and pave the way to achieving optimal power, performance, and transistor volume density to drive the next wave of transformative products.

Novel Methodology for Assessing Chip-Package Interaction Effects on Chip Performance

  • Armen Kteyan
  • Jun-Ho Choy
  • Valeriy Sukharev
  • Massimo Bertoletti
  • Carmelo Maiorca
  • Rossana Zadra
  • Massimo Inzaghi
  • Gabriele Gattere
  • Giancarlo Zinco
  • Paolo Valente
  • Roberto Bardelli
  • Alessandro Valerio
  • Pierluigi Rolandi
  • Mattia Monetti
  • Valentina Cuomo
  • Salvatore Santapà

The paper presents a multiscale simulation methodology and EDA tool that assess the effect of thermomechanical stresses arising after die assembly on chip performance. Existing non-uniformities of feature geometries and the composite nature of on-chip interconnect layers are addressed by the developed methodology of anisotropic effective thermomechanical material properties (EMP), which reduces the complexity of FEA simulations and enhances accuracy and performance. The physical nature of the calculated EMP makes it scalable with the simulation grid size, which enables resolution of stress/strain at different scales, from the package down to the device channel. With feature-scale resolution, the tool enables accurate calculation of stress components in the active region of each device, where carrier mobility variation results in deviations of circuit performance. The tool's capability of back-annotating the hierarchical Spice netlist with the stress values allows a user to perform circuit simulation in different stress environments, by placing the circuit block at different locations in the layout characterized by different distances from stress sources, such as die edges and C4 bumps. Both schematic and post-layout netlists can be employed for finding an optimal floorplan minimizing the stress impact at early design stages, as well as for final design sign-off. Electrical measurements on a specially designed test package were used to validate the methodology. Good agreement between measured and simulated variations of device characteristics has been demonstrated.

On Ensuring Congruency with Implementation During Emulation and Prototyping

  • Alex Rabinovitch

ASIC-style design implementation ensures a certain degree of determinism in design behavior when it comes to glitches in clock cones and hold violations. Emulation and prototyping products must follow the same deterministic rules of behavior in order to match the behavior of the real chip. Those techniques are surveyed and shown to be inherently rooted in modelling the timeline in a manner that creates an artificial common source of synchronization between different clocks in the design. The capability of low-skew clock lines provided by FPGA vendors is also leveraged. However, this overall approach can result in performance degradation, and techniques are presented to compensate for it. It is an open question whether these methods could also benefit implementation, which presently uses a rather different method to solve similar problems.

SESSION: Session 8: 3D IC Design

Session details: Session 8: 3D IC Design

  • Lang Feng

Challenges and Solutions for 3D Fabric: A Foundry Perspective

  • Sandeep Kumar Goel

3D ICs have become increasingly popular as they provide a way to pack more functionality on a chip and reduce manufacturing cost. TSMC offers a number of packaging technologies under the umbrella of "3D Fabric" to suit different product requirements. Like any new technology, 3D Fabric brings forward several challenges associated with the system, design, thermal behavior, and testing that require effective and efficient solutions before 3D Fabric can be used in high-volume production. In this presentation, we will give a brief introduction to the various 3D Fabric offerings and discuss challenges from a semiconductor foundry perspective. Next, we present an overview of solutions along with what EDA needs to solve. Lastly, we will discuss how various IEEE standards, such as 1838 and 1149.1, can help in streamlining and standardizing testing approaches for 3D Fabrics.

Recent Advances and Future Challenges in 2.5D/3D Heterogeneous Integration

  • Tanay Karnik

In this presentation, we will review recent advances in chiplet-based commercial products and prototypes [2,3,4,5]. Most chiplet usage has been confined to integrating dies designed by the same organization into chips for the same product types. The right approach should be able to reduce portfolio costs, scale innovation, and improve time to solution [1]. It is important to manage the associated trade-offs, such as thermal, power, I/O escapes, assembly, test, etc. We will conclude the talk by presenting the future 2.xD/3D integration opportunities becoming available [6].

ART-3D: Analytical 3D Placement with Reinforced Parameter Tuning for Monolithic 3D ICs

  • Gauthaman Murali
  • Sandra Maria Shaji
  • Anthony Agnesina
  • Guojie Luo
  • Sung Kyu Lim

In this paper, we show that true 3D placement approaches, enhanced with reinforcement learning, can offer further PPA improvements over pseudo-3D approaches. To accomplish this goal, we integrate an academic true 3D placement engine into a commercial-grade 3D physical design flow, creating the ART-3D flow (Analytical 3D Placement with Reinforced Parameter Tuning). We use a reinforcement learning (RL) framework to find optimized parameter settings of the true 3D placement engine for a given netlist and perform high-quality 3D placement. We then use an efficient 3D optimization and routing engine based on a commercial place-and-route (P&R) tool to maintain or improve the benefits reaped from true 3D placement until design signoff. We evaluate our 3D flow by designing several gate-only and processor benchmarks on a commercial 28nm technology node. Our proposed flow involving true 3D placement offers the best PPA results compared to existing 3D P&R flows: it reduces power consumption by up to 31%, improves effective frequency by up to 25%, and thereby reduces the power-delay product by up to 43% compared with a commercial 2D IC design flow. These improvements predominantly come from RL-based parameter tuning, which improves the performance of the 3D placer by up to 12%.
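
Reinforced parameter tuning of a placer is, at its core, black-box optimization of a QoR signal. As a rough illustration only, the sketch below uses a cross-entropy method with a made-up proxy objective; the authors' actual framework is an RL agent driving the true 3D placer:

import random

def proxy_qor(params):
    # Stand-in for running the placer and measuring PPA; the real flow
    # would evaluate the netlist with these parameter values.
    density, gamma = params
    return -((density - 0.7) ** 2 + (gamma - 2.0) ** 2)  # peak at (0.7, 2.0)

def cem_tune(rounds=20, pop=30, elite=5):
    mean = [0.5, 1.0]
    std = [0.3, 1.0]
    for _ in range(rounds):
        samples = [[random.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(pop)]
        samples.sort(key=proxy_qor, reverse=True)
        top = samples[:elite]  # keep the elite parameter settings
        mean = [sum(x[i] for x in top) / elite for i in range(2)]
        std = [max(1e-3, (sum((x[i] - mean[i]) ** 2 for x in top)
                          / elite) ** 0.5) for i in range(2)]
    return mean

if __name__ == "__main__":
    print(cem_tune())  # should approach [0.7, 2.0]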

Intelligent Design Automation for Heterogeneous Integration

  • Iris Hui-Ru Jiang
  • Yao-Wen Chang
  • Jiun-Lang Huang
  • Charlie Chung-Ping Chen

As design complexity grows dramatically in modern circuit designs, 2.5D/3D heterogeneous integration (HI) becomes effective for system performance, power, and cost optimization, providing promising solutions to the increasing cost of more-Moore scaling. In this talk, we investigate the chip, package, and board co-design methodology with advanced packages and optical communication, considering essential issues in physical design, electrical, thermal, and mechanical effects, timing, and testing, and suggest future research opportunities.

Layout: A robust and vertically integrated physical design flow for HI design is needed. We address chip-, package-, and board-level component planning, package-level RDL routing, board-level routing, optical routing, and placement and routing considering warpage and thermal effects.

Timing: New chip-level and cross-chip timing analysis techniques are desired. We address timing propagation under the current source delay model (CSM), timing analysis and optimization for optical-electrical routing, multi-corner multi-mode (MCMM) analysis for HI, and hierarchical MCMM analysis.

Testing: The scope covers functional-like test generation, System-in-Package (SiP) online testing, photonic integrated circuit (PIC) testing and design-for-test (DfT), etc.

Integration: We shall address chip, package, and board co-design considering multi-domain physics, including physical, electrical, thermal, mechanical, and optical effects, and their optimization.

SESSION: Session 9: Routing

Session details: Session 9: Routing

  • Jhih-Rong Gao

A Reinforcement Learning Agent for Obstacle-Avoiding Rectilinear Steiner Tree Construction

  • Po-Yan Chen
  • Bing-Ting Ke
  • Tai-Cheng Lee
  • I-Ching Tsai
  • Tai-Wei Kung
  • Li-Yi Lin
  • En-Cheng Liu
  • Yun-Chih Chang
  • Yih-Lang Li
  • Mango C.-T. Chao

This paper presents a router that tackles a classic algorithmic problem in EDA, the obstacle-avoiding rectilinear Steiner minimum tree (OARSMT), with the help of an agent trained by our proposed policy-based reinforcement learning (RL) framework. The job of the policy agent is to select an optimal set of Steiner points that can lead to an optimal OARSMT for a given layout. Our RL framework can iteratively upgrade the policy agent by applying Monte-Carlo tree search to explore and evaluate various choices of Steiner points on various unseen layouts. As a result, our policy agent can be viewed as a self-designed OARSMT algorithm that iteratively evolves by itself. The initial version of the agent is sequential, selecting one Steiner point at a time. Based on the sequential agent, a concurrent agent can then be derived to predict all required Steiner points with a single model inference. The overall training time can be further reduced by using geometrically symmetric samples for training. The experimental results on single-layer 15×15 and 30×30 layouts demonstrate that our trained concurrent agent can outperform a state-of-the-art OARSMT router in both wirelength and runtime.
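
Given a candidate set of Steiner points, the resulting tree length (essentially the agent's reward, up to sign) can be scored by a minimum spanning tree over terminals plus Steiner points under the Manhattan metric. The sketch below shows only that evaluation step and ignores obstacles, which the actual router must avoid:

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def tree_length(points):
    """Prim's MST over the complete Manhattan-distance graph; the MST
    wirelength serves as the cost of a candidate Steiner-point set."""
    in_tree = {points[0]}
    total = 0
    while len(in_tree) < len(points):
        d, nxt = min((manhattan(p, q), q)
                     for p in in_tree for q in points if q not in in_tree)
        total += d
        in_tree.add(nxt)
    return total

if __name__ == "__main__":
    terminals = [(0, 0), (4, 0), (2, 3)]
    print(tree_length(terminals))             # 9: no Steiner points
    print(tree_length(terminals + [(2, 0)]))  # 7: Steiner point helps

A good Steiner point, like (2, 0) above, shortens the tree; the agent's job is to learn which such points pay off on layouts it has never seen.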

LEO: Line End Optimizer for Sub-7nm Technology Nodes

  • Diwesh Pandey
  • Gustavo E. Tellez
  • James Leland

Sub-7nm technology nodes have introduced new challenges, specifically in the lower metal layers. Extreme ultraviolet lithography (EUV) and multi-patterning-based lithography such as self-aligned double patterning (SADP) have become key choices for manufacturing these layers. The demand for microprocessors has increased tremendously in the last few years, which imposes another challenge on chip manufacturers: building their products at a very rapid rate. These days, a mix of different lithography solutions for manufacturing the metal layers is quite common. We propose a first-of-its-kind routing plugin which resolves design rule violations for multiple lithography technologies without making any changes to the existing routers. Our plugin consists of a practical line-end optimization (LEO) algorithm, which solves most line-end problems in a few minutes, even for very large designs. Our solution has been used in the development of a 7nm industrial microprocessor design.

Routing Layer Sharing: A New Opportunity for Routing Optimization in Monolithic 3D ICs

  • Sai Pentapati
  • Sung Kyu Lim

A 3D integrated circuit consists of two or more dies bonded to each other in the vertical direction. This allows a high transistor density without shrinking the underlying transistor dimensions. While 3D stacking has been shown to improve design power, performance, and area (PPA) thanks to the stacked front-end-of-line (FEOL) layers, the back-end-of-line (BEOL) structure of the stacked IC also allows for novel routing scenarios. With the dies split in 3D, some nets need to connect cells from different tiers, across many vertical layers and multiple FEOLs. More importantly, nets connecting cells in a single tier can still use metal layers from the BEOL of other tiers to complete routing. This is referred to as routing (metal) layer sharing. While such sharing creates additional 3D connections, it can also be utilized to improve several aspects of the design, such as cost, routing congestion, and performance. In this paper, we analyze the nets with metal layer sharing in 3D and provide ways to control the number of 3D connections. We show that configuring the 3D BEOL stack helps reduce metal-layer cost, with up to 1-2 fewer layers needed to complete routing without a noticeable timing impact. Sharing also allows for a better distribution of wirelength in the BEOL stack, achieving a significant reduction in the metal-layer congestion of the topmost layer, with up to a 50% reduction of its track usage. Finally, we also see performance benefits of up to 16% with the help of metal layer sharing in 3D IC design.

SESSION: Session 10: Fourth Keynote

Session details: Session 10: Fourth Keynote

  • Jens Lienig

Triple-play of Hyperconvergency, Analytics, and AI Innovations in the SysMoore Era

  • Aiqun Cao

The SysMoore Era can be characterized as the widening gap between classic Moore’s Law scaling and increasing system complexity. System-on-a-chip complexity has now fallen by the wayside to systems-of-chips with the need for smaller process nodes, and multi-die integration. With engineers now handling not just larger chip designs but systems comprised of multiple chips, the focus on user productivity and design robustness becomes a major factor in getting designs to market in the fastest time and with the best possible PPA. Combining a hyperconvergent design flow with smart data analytics and AI-based solution space exploration provides a huge benefit to the engineers tasked with completing these systems. This presentation outlines the challenges and the road to a triple-play solution that gets design engineers out of their late inning jams.

SESSION: Session 11: Lifetime Achievement Commemoration for Ricardo Reis

Session details: Session 11: Lifetime Achievement Commemoration for Ricardo Reis

  • Jose Luiz Guntzel

A Lifetime of Physical Design Automation and EDA Education: ISPD 2022 Lifetime Achievement Award Bio

  • Ricardo Augusto da Luz Reis

The 2022 International Symposium on Physical Design lifetime achievement award goes to Prof. Ricardo Reis for his instrumental impact on EDA research in South America and contributions to the physical design community.

Design and Optimization of Quantum Electronic Circuits

  • Giovanni De Micheli

Quantum electronic circuits where the logic information is processed and stored in single flux quanta promise efficient computation in a performance/power metric, and thus are of utmost interest as possible replacement or enhancement of CMOS. Several electronic device families leverage superconducting materials and transitions between resistive and superconducting states. Information is coded into bits with deterministic values – as opposed to qubits used in quantum computing. As an example, information can be coded into pulses. Logic gates can be modeled as finite-state machines, that emit logic outputs in response to inputs. The most natural realization of such circuits is through synchronous implementations, where a clock stimulus is transmitted to every logic gate and where logic depth is balanced at every input to achieve full synchrony. Novel superconducting realization families try to go beyond the limitations of synchronous logic with approaches reminiscent of asynchronous design style and leveraging information coding. Moreover, some superconducting families exploit adiabatic operation, in the search for minimizing energy consumption. Design automation for quantum electronic logic families is still in its infancy, but important results have been achieved in terms of automatic balancing and fanout management. The combination of these problems with logic restructuring poses new challenges, as the overall problem is more complex as compared to CMOS and algorithms and tools cannot be just adapted. This presentation will cover recent advancement in design automation for superconducting electronic circuits as well as address future developments in the field.

Physical Design at the Transistor Level Beyond Standard-Cell Methodology

  • Renato Hentschke

This talk offers a review of possibilities for VLSI layout beyond the traditional standard-cell methodology. Existing physical design tools strictly avoid any modification to the contents of standard cells. Here, a post-processing step based on SAT solvers is proposed to obtain optimal solutions for local transistor-level layout synthesis problems. This procedure can be constrained by metrics that ensure quality is not degraded, and an acceptable, better-quality timing model can be rebuilt for the block. These problems and techniques are open research opportunities in physical design, as they are not sufficiently explored in the literature and can bring significant improvements to the quality of a VLSI circuit.

Physical Design Optimization, From Past to Future

  • Ricardo Augusto da Luz Reis

By the end of the 1970s, microprocessors were designed by hand, with excellent layout compaction. Some highlights of the reverse engineering of the Z8000, whose control part was designed by hand, will be shown, illustrating several layout optimization strategies. The observation of the Z8000 layout inspired research on methods for automatically generating the layout of any transistor network, allowing a reduction in the number of transistors needed to implement a circuit and, by consequence, in the leakage power. Some of the layout automation tools developed by our group are briefly presented.

SESSION: Session 12: Fifth Keynote

Session details: Session 12: Fifth Keynote

  • Bei Yu

Accelerating the Design and Performance of Next Generation Computing Systems with GPUs

  • Sameer Halepete

The last few years have seen an accelerating growth in the demand for new silicon designs, even as the size and complexity of those designs has increased. However, the gains in design productivity necessary to implement these designs efficiently have not kept up. We need more than an order of magnitude increase in design productivity by the end of the decade to keep up with demand. Traditional methods for improving physical design tool capabilities are running out of steam, and there is a strong need for new approaches. Over the last two decades, we have seen other areas of computer science such as computer vision, speech recognition and natural language processing reach similar plateaus in performance, and each has been able to break out of the stall using GPU accelerated computing and machine learning. There is a similar opportunity in EDA but it will require a rethinking of the way these tools are implemented. The talk will cover where the demand for new silicon designs is coming from, what the productivity bottlenecks are, and then describe some advances in GPUs that could enable us to break through these bottlenecks with some examples.

SESSION: Session 13: Advances in Analog and Full Custom Design Automation

Session details: Session 13: Advances in Analog and Full Custom Design Automation

  • Mark Po-Hung Lin

Optimized is Not Always Optimal – The Dilemma of Analog Design Automation

  • Juergen Scheible

The vast majority of state-of-the-art integrated circuits are mixed-signal chips. While the design of the digital parts of these ICs is highly automated, the design of the analog circuitry is largely done manually; it is very time-consuming and prone to error. Among the reasons generally cited for this is the attitude of the analog designer: many analog designers are convinced that human experience and intuition are needed for good analog design, which is why they distrust automated synthesis tools. This observation is quite correct, but it is only a symptom of the real problem. This paper shows that the phenomenon is caused by very concrete technical (and thus very rational) issues, which lie in the mode of operation of the typical optimization processes employed for synthesis tasks. I will show that the dilemma that arises in analog design with these optimizers is the root cause of the low level of automation in analog design. The paper concludes with a review of proposals for automating analog design.

Analog/Mixed-Signal Layout Optimization using Optimal Well Taps

  • Ramprasath S
  • Meghna Madhusudan
  • Arvind K. Sharma
  • Jitesh Poojary
  • Soner Yaldiz
  • Ramesh Harjani
  • Steven M. Burns
  • Sachin S. Sapatnekar

Well island generation and well tap placement pose an important challenge in automated analog/mixed-signal (AMS) layout. Well taps prevent latchup within a radius of influence in a well island, and must cover all devices. Automated AMS layout flows typically perform well island generation and tap insertion as a postprocessing step after placement. However, this step is intrusive and potentially alters the placement, resulting in increased area, wire length, and performance degradation. This work develops a graph-based optimization that integrates well island generation, well tap insertion, and placement. Its efficacy is demonstrated within a stochastic placement engine. Experimental results show that this approach generates better area, wire length and performance metrics than traditional methods, at the cost of a marginal runtime degradation.
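
In one dimension, tap coverage reduces to the classic interval-covering problem: each tap protects devices within its radius of influence, and a greedy sweep places the minimum number of taps. The sketch below uses this hypothetical single-row setting (made-up radius and coordinates) and is not the graph-based optimizer of the paper:

# Simplified 1-D view of well-tap coverage: devices sit at x-coordinates
# along one row, each tap covers devices within RADIUS, and a greedy
# sweep places the minimum number of taps.

RADIUS = 5.0

def place_taps(device_xs, radius=RADIUS):
    taps = []
    for x in sorted(device_xs):
        if not taps or x - taps[-1] > radius:
            # Push the tap as far right as allowed so it covers
            # as many upcoming devices as possible.
            taps.append(x + radius)
    return taps

if __name__ == "__main__":
    devices = [0.0, 2.0, 7.5, 13.0]
    print(place_taps(devices))  # two taps suffice here

The 2D problem the paper tackles is harder precisely because tap sites interact with well-island shapes and the existing placement, which is why integrating the steps beats postprocessing.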

Analog Synthesis – The Deterministic Way

  • Helmut Graeb

While the majority of research in design automation for analog circuits has been relying on statistical solution approaches, deterministic approaches are an attractive alternative. This paper gives a few examples of deterministic methods for sizing, structural synthesis and layout synthesis of analog circuits, which have been developed over the past decades. It starts from the so-called characteristic boundary curve for interactive parameter optimization, and ends at recent approaches for structural synthesis of operational amplifiers based on functional block composition. A deterministic approach to analog placement and to yield optimization will also be described. The central role of structural analysis of circuit netlists in these approaches will be explained. A summary of the underlying mindset of analog design automation and an outlook on future opportunities for deterministic sizing and layout synthesis concludes the paper.

AutoCRAFT: Layout Automation for Custom Circuits in Advanced FinFET Technologies

  • Hao Chen
  • Walker J. Turner
  • Sanquan Song
  • Keren Zhu
  • George F. Kokai
  • Brian Zimmer
  • C. Thomas Gray
  • Brucek Khailany
  • David Z. Pan
  • Haoxing Ren

Despite continuous efforts in layout automation for full-custom circuits, including analog/mixed-signal (AMS) designs, automated layout tools have not yet been widely adopted in current industrial full-custom design flows due to the high circuit complexity and sensitivity to layout parasitics. Nevertheless, the strict design rules and grid-based restrictions in nanometer-scale FinFET nodes limit the degree of freedom in full-custom layout design and thus reduce the gap between automation tools and human experts. This paper presents AutoCRAFT, an automatic layout generator targeting region-based layouts for advanced FinFET-based full-custom circuits. AutoCRAFT uses specialized place-and-route (P&R) algorithms to handle various design constraints while adhering to typical FinFET layout styles. Verified by comprehensive post-layout analyses, AutoCRAFT has achieved promising preliminary results in generating sign-off quality layouts for industrial benchmarks.

SESSION: Session 14: Panel on Challenges and Approaches in VLSI Routing

Session details: Session 14: Panel on Challenges and Approaches in VLSI Routing

  • Gracieli Posser

Challenges and Approaches in VLSI Routing

  • Gracieli Posser
  • Evangeline F.Y. Young
  • Stephan Held
  • Yih-Lang Li
  • David Z. Pan

In this paper, we will first give a brief review of the ISPD 2018 and 2019 Initial Detailed Routing Contests. We will then visit a few important and interesting topics in VLSI routing, including GPU-accelerated routing, signal speed optimization in routing, PCB routing, and AI-driven analog routing.

Challenges for Automating Package Routing

  • Wen-Hao Liu
  • Bing Chen
  • Hua-Yu Chang
  • Gary Lin
  • Zi-Shen Lin

Package routing is typically done in a semi-automatic or manual manner in order to meet several customized requests for different design styles. However, in recent years, the scale of package designs has rapidly grown and routing rules have become more and more complicated, such that the engineering effort of the manual solution has increased dramatically. Therefore, a fully automatic solution has become necessary and critical. In addition, full-auto package routing is one of the most important pieces in building an automatic design flow for 3D-IC. There are many challenges in realizing a fully automatic package routing solution; some of them will be introduced in this paper.

SESSION: Session 15: Global Placement, Macro Placement, and Legalization

Session details: Session 15: Global Placement, Macro Placement, and Legalization

  • Joseph Shinnerl

Congestion and Timing Aware Macro Placement Using Machine Learning Predictions from Different Data Sources: Cross-design Model Applicability and the Discerning Ensemble

  • Xiang Gao
  • Yi-Min Jiang
  • Lixin Shao
  • Pedja Raspopovic
  • Menno E. Verbeek
  • Manish Sharma
  • Vineet Rashingkar
  • Amit Jalota

Modern very large-scale integration (VLSI) designs typically use many macros (RAM, ROM, IP) that occupy a large portion of the core area. Macro placement, an early stage of the physical design flow that is followed by standard-cell placement, physical synthesis (place-opt), clock tree synthesis, routing, etc., therefore has a large impact on the final quality of result (QoR). Electronic Design Automation (EDA) physical design tools need to predict congestion, timing, power, etc. with confidence for different macro placements before running time-consuming flows. However, the diversity of IC designs that commercial EDA tools must support, and the limited number of similar designs that can provide training data, make such machine learning (ML) predictions extremely hard. Because of this, ML models usually need to be completely retrained on unseen designs to work properly, yet collecting full-flow macro placement ML data is time-consuming and impractical. Worse, common ML methods, such as regression, support vector machines (SVM), random forests (RF), and neural networks (NN), generally lack a good estimate of prediction accuracy or confidence, and lack debuggability for cross-design applications. In this paper, we present a novel discerning ensemble technique for cross-design ML prediction for macro placement. We developed our solution on a large number of designs with different design styles and technology nodes, tested it on 8 leading-edge industry designs, and achieved, in a few hours per design, results comparable to or better than manual placements that take engineers weeks or even months. Our method shows great promise for many ML problems in EDA applications, and even in other areas.
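
The core idea of gauging prediction confidence through ensemble disagreement can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration on synthetic data, not the paper's discerning ensemble: it bootstraps simple linear models and uses their spread on a new query to decide whether to accept or defer the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    """Least-squares fit with a bias term."""
    Xb = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    return np.c_[X, np.ones(len(X))] @ w

# Purely synthetic stand-ins for placement features (e.g., macro density,
# pin counts) and a QoR label (e.g., congestion overflow).
X_train = rng.normal(size=(200, 4))
y_train = X_train @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Bootstrap an ensemble: each member sees a resampled training set.
members = []
for _ in range(25):
    idx = rng.integers(0, len(X_train), len(X_train))
    members.append(fit_linear(X_train[idx], y_train[idx]))

X_new = rng.normal(size=(5, 4))              # queries from an "unseen design"
preds = np.stack([predict(w, X_new) for w in members])
mean, spread = preds.mean(axis=0), preds.std(axis=0)

# Trust only the predictions the members agree on; defer the rest
# (e.g., to a full physical-design run).
for m, s in zip(mean, spread):
    print(f"pred = {m:+.2f}  spread = {s:.2f}  -> {'accept' if s < 0.5 else 'defer'}")
```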

Global Placement Exploiting Soft 2D Regularity

  • Donghao Fang
  • Boyang Zhang
  • Hailiang Hu
  • Wuxi Li
  • Bo Yuan
  • Jiang Hu

Cell placement is a critical step in chip physical design and benefits from many kinds of improvement efforts. Recently, designs with 2D processing-element arrays have become popular, primarily due to deep neural network computing applications. The 2D array regularity is similar to, but different from, the regularity of conventional datapath designs. To exploit it, this work develops a new global placement technique built upon RePlAce, a state-of-the-art placement framework. Experimental results on various designs show that the proposed technique can reduce half-perimeter wirelength and Steiner tree wirelength by about 6% and 12%, respectively.
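
As background for the reported metric, half-perimeter wirelength (HPWL) sums, over all nets, the width plus the height of each net's pin bounding box. A minimal sketch with hypothetical pin coordinates shows the computation and why a regular, tightly aligned placement scores lower:

```python
import numpy as np

def hpwl(pin_xy_per_net):
    """Half-perimeter wirelength: for each net, width + height of the
    bounding box of its pins, summed over all nets."""
    total = 0.0
    for pins in pin_xy_per_net:
        xs, ys = np.asarray(pins).T
        total += (xs.max() - xs.min()) + (ys.max() - ys.min())
    return total

# Two hypothetical netlists: scattered pins versus tightly aligned pins,
# e.g., cells of a 2D processing-element array placed with regularity.
scattered = [[(0, 0), (9, 2), (4, 8)], [(1, 1), (8, 7)]]
aligned   = [[(0, 0), (2, 0), (1, 1)], [(1, 1), (2, 2)]]
print(hpwl(scattered), hpwl(aligned))   # regular placement -> lower HPWL
```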

Linear-time Mixed-Cell-Height Legalization for Minimizing Maximum Displacement

  • Chung-Hsien Wu
  • Wai-Kei Mak
  • Chris Chu

Due to the aggressive scaling of advanced technology nodes, multiple-row-height cells have become more and more common in VLSI design. Consequently, the placement of cells is no longer independent among different rows, which makes traditional row-based legalization techniques obsolete. In this work, we present a highly efficient linear-time mixed-cell-height legalization approach that optimizes both the total cell displacement and the maximum cell displacement. First, a fast window-based cell insertion technique introduced in [4] is applied to obtain a feasible initial row assignment and cell ordering, which is known to be good for total displacement. Second, we use an iterative cell-swapping algorithm to change the row assignment and cell order of the critical cells to reduce the maximum displacement. Then we develop an optimal linear-time DAG-based fixed-row, fixed-order legalization algorithm to minimize the maximum cell displacement. Finally, we propose a cell-shifting heuristic to reduce the total cell displacement without increasing the maximum cell displacement. Using the proposed approach, the quality provided by global placement is preserved as much as possible. Compared with the state-of-the-art work [4], experimental results show that our algorithm reduces the maximum cell displacement by more than 11% on average with similar average cell displacement.
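
To make the objective concrete, the sketch below solves the simplest sub-problem with a standard textbook formulation (not the paper's DAG-based algorithm): one row, fixed cell order, minimize the maximum horizontal displacement by binary-searching a bound D and greedily testing feasibility.

```python
def feasible(x_target, widths, row_end, D):
    """Fixed order in one row: can every cell sit within D of its
    global-placement x while staying non-overlapping and inside the row?"""
    cursor = 0.0                          # left edge of remaining free space
    for xt, w in zip(x_target, widths):
        x = max(cursor, xt - D)           # push right only when forced
        if x > xt + D or x + w > row_end:
            return False
        cursor = x + w
    return True

def legalize_min_max_disp(x_target, widths, row_end, eps=1e-3):
    """Binary search on the max-displacement bound D (monotone feasibility)."""
    lo, hi = 0.0, row_end
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(x_target, widths, row_end, mid):
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical row with overlapping global-placement positions;
# the minimum achievable maximum displacement here is 2.5.
print(legalize_min_max_disp([0.0, 1.0, 1.5], [2.0, 2.0, 2.0], 10.0))
```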

SESSION: Session 16: Sixth Keynote

Session details: Session 16: Sixth Keynote

  • Ajay Joshi

Hardware Security: Physical Design versus Side-Channel and Fault Attacks

  • Ingrid Verbauwhede

What is “hardware” security? How can we improve trustworthiness in hardware circuits? Is there a design method for secure hardware design? To answer these questions, note that different communities have different expectations of the trusted (i.e., expected to be trustworthy) hardware components upon which they start to build a secure system. At the same time, electronics shrink: sensor nodes, IoT devices, and smart electronics are becoming more and more available. In the past, adding security was a concern only for locked server rooms and, more recently, cloud servers. These days, however, our portable devices contain highly private and sensitive information, and adding security and cryptography to these often very resource-constrained devices is a challenge. Moreover, they can be subject to physical attacks, including side-channel and fault attacks [1][2]. This presentation aims to bring some order to the chaos of expectations by introducing the importance of a design methodology for secure design [3][5]. We will illustrate the capabilities of current passive and active physical attacks, such as EM side-channel and laser fault attacks. In this context, we will also reflect on the role of physical design and place-and-route [4][6].

SESSION: Session 17: ISPD 2022 Contest Results and Closing Remarks

Session details: Session 17: ISPD 2022 Contest Results and Closing Remarks

  • David Chinnery

Benchmarking Security Closure of Physical Layouts: ISPD 2022 Contest

  • Johann Knechtel
  • Jayanth Gopinath
  • Mohammed Ashraf
  • Jitendra Bhandari
  • Ozgur Sinanoglu
  • Ramesh Karri

Computer-aided design (CAD) tools mainly optimize for power, performance, and area (PPA). However, given the large number of serious hardware-security threats that are emerging, future CAD flows must also incorporate techniques for designing secure integrated circuits (ICs). In fact, the stakes are quite high for IC vendors and design companies, as security risks that are not addressed during design time will inevitably be exploited in the field, where vulnerabilities are almost impossible to fix. However, there is currently little to no experience related to designing secure ICs available within the CAD community. For the very first time, this contest seeks to actively engage with the community to close this gap. The theme of this contest is security closure of physical layouts, that is, hardening the physical layouts at design time against threats that are executed post-design time. More specifically, this contest is focused on selected and seminal threats that, once understood, are relatively simple to approach and mitigate through means of physical design: Trojan insertion and probing as well as fault injection. Acting as security engineers, contest participants will iteratively and proactively evaluate and fix the vulnerabilities of provided benchmark layouts. Benchmarks and submissions are based on the generic DEF format and related files. Thus, participants are free to use any physical-design tools of their choice, helping us to open up the contest to the community at large.

Who’s Can Li

April 1st, 2022

Can Li

Assistant Professor

The University of Hong Kong

Email:

canl@hku.hk

Personal webpage

http://canlab.hku.hk

Research interests

Neuromorphic computing, nanoelectronics devices, non-volatile memories, software-hardware co-optimization

Short bio

Dr. Can Li is currently an Assistant Professor at the Department of Electrical and Electronic Engineering of the University of Hong Kong, working on analog and neuromorphic computing accelerators based on post-CMOS emerging devices (e.g., memristors) for efficient machine/deep learning, network security, signal processing, etc. Before that, he spent two years at Hewlett Packard Labs in Palo Alto, California; he obtained his Ph.D. from the University of Massachusetts Amherst and his B.S./M.S. from Peking University. He is a recipient of the Early Career Award from the HKSAR RGC and the Excellent Young Scientist Fund Award from the NSFC.

Research highlights

Can Li has made contributions to in-memory computing technology based on non-volatile memory devices. At the device level, he fabricated and characterized various resistive switching (memristive) devices with different material stacks, including Cu/SiOx/Pt, Pt/SiOx/Pt, Si/SiOx/Si, Ta/HfOx/Pt, etc. The potential of this type of device was also demonstrated by Can Li and colleagues’ work on three-dimensional (3D) stacking and integration (up to eight layers) and on ultimate scaling down to 2 nm×2 nm. At the array level, he integrated memristors (2 µm×2 µm and 50 nm×50 nm) with silicon transistors from commercial foundries and demonstrated high yield and good analog programmability. At the circuit level, he designed and developed analog circuits for analog content-addressable memory in a 6-transistor 2-memristor (6T2M) configuration, and he was closely involved in designing, taping out, and evaluating peripheral circuits for matrix multiplication accelerators. At the system level, he showcased memristor-based systems in potential applications such as artificial intelligence, analog signal/image processing, pattern matching, optimization solving, hardware security, etc. These studies have been documented in many high-profile publications, including Nature Electronics, Nature Machine Intelligence, Nature Nanotechnology, Nature Communications, Advanced Materials, IEDM, etc.
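
The analog matrix multiplication mentioned above follows directly from Ohm’s and Kirchhoff’s laws: voltages applied to the crossbar rows produce column currents I = Gᵀ·V, where G is the matrix of programmed device conductances. A minimal, idealized sketch (hypothetical values; wire resistance and device nonidealities ignored):

```python
import numpy as np

rng = np.random.default_rng(1)

# Conductances programmed into a hypothetical 4x3 memristor crossbar;
# each entry is one device's analog conductance in siemens.
G = rng.uniform(1e-6, 1e-4, size=(4, 3))

# Input vector encoded as row voltages (volts).
V = np.array([0.2, 0.1, 0.0, 0.3])

# Kirchhoff's current law sums the per-device currents along each column,
# so all column currents arrive as one matrix-vector product, in one step.
I = G.T @ V
print(I)                      # amperes; matches the digital computation
assert np.allclose(I, sum(v * g for v, g in zip(V, G)))
```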

Who’s Johann Knechtel

April 1st, 2022

Johann Knechtel

Research Scientist

New York University Abu Dhabi, UAE

Email:

johann@nyu.edu

Personal webpage

https://wp.nyu.edu/johann/

Research interests

Hardware Security, Electronic Design Automation (EDA), 3D Integration, Emerging Technologies, Machine Learning

Short bio

Dr.-Ing. Johann Knechtel is a Research Scientist with the Design for Excellence Lab at New York University (NYU) Abu Dhabi, UAE. In this position, he acts as Co-PI for multiple research projects and provides lectures, training, and mentoring to PhD and undergraduate students. Johann received the Dipl.-Ing. degree (M.Sc.) in Information Systems Engineering in 2010 and the Dr.-Ing. degree (Ph.D.) in Computer Engineering (summa cum laude, with highest honors) in 2014, both from TU Dresden (TUD), Germany. Before joining NYU Abu Dhabi in 2016, Johann was a Postdoctoral Researcher in 2015 at the Masdar Institute of Science and Technology, UAE, where he was affiliated with the Twinlab on “3D Stacked Chips”, hosted by Masdar Institute and TUD and supported by industry and government partners. In 2012 he was with the Chinese University of Hong Kong, China, and in 2010 he was with the University of Michigan, USA. In 2006, he worked as a freelance software engineer for Siemens IT Solutions and Services, Germany; in 2006–08 as a research assistant for the Fraunhofer IWS Institute, Dresden, Germany; and in 2008–09 as an embedded-systems intern for TraceTronic GmbH, Dresden, Germany. In 2017, Johann and his team achieved first place in the CSAW Applied Research Competition. Johann obtained scholarships from the German Academic Exchange Service (DAAD) in 2010, from the German Research Foundation (DFG) in 2010–14, and from the Graduate Academy of TU Dresden in 2014, as well as an NYU Research Enhancement Fund in 2018–21. He has (co-)authored around 60 publications, including 15 highlighted and/or invited papers. Johann is an active member of the ACM, including ACM SIGDA, and of the IEEE, and he serves as a peer reviewer for various top-tier ACM and IEEE conferences and journals.

Research highlights

Johann is acting as Co-PI for multiple projects with the common goal of advancing hardware security. His work involves five PhD students and postdoctoral researchers at NYU Abu Dhabi and covers collaborations with around 15 researchers and students at prestigious institutions worldwide. It is currently focused on the following themes:

  • Security closure for the physical design of integrated circuits (ICs);
  • Protection of IC design intellectual property, with advanced techniques proposed for split manufacturing and obfuscation utilizing interconnect fabrics as well as 2.5D and 3D integration;
  • Secure architectures and secure system integration based on chiplets and 2.5D integration;
  • Machine learning-driven security evaluation, at design time, of defense schemes like split manufacturing and logic locking;
  • Security evaluation of ICs and field-programmable gate arrays (FPGAs) using advanced electro-magnetic field and laser-assisted optical probing;
  • Design-time security evaluation of ICs against side-channel attacks;
  • Security promises and challenges of emerging technologies for various defense schemes;
  • Security-aware electronic design automation (EDA) flows for 2D, 2.5D, and 3D ICs.

Johann has published successfully on these and other themes. Recent examples include two invited papers at ICCAD 2021 on security closure for physical design, two invited papers at ISPD 2020–21 on hardware security for and beyond CMOS devices, an invited paper at DATE 2020 on the role of EDA for secure composition of ICs, and invited papers at IOLTS and COINS 2019 on 3D integration as another dimension toward hardware security and on design IP protection, respectively. Furthermore, Johann and colleagues have recently compiled the book “The Next Era in Hardware Security: A Perspective on Emerging Technologies for Secure Electronics” (Springer, 2022), which has already reached 1.7k full-text downloads.

Currently, Johann is acting as lead organizer for the first-ever international competition (co-hosted with ISPD 2022) on security closure, in which research teams from all over the world harden the physical layouts of ICs at design time against selected attacks executed post-design time. This notion of security closure is quite complex; besides the community interest expressed through the contest and invited papers, Johann and colleagues are also in active discussion with government agencies on the challenge. Earlier, Johann and colleagues from TU Dresden, Germany, Google Inc., and Masdar Institute published the survey paper “Large-Scale 3D Chips: Challenges and Solutions for Design Automation, Testing, and Trustworthy Integration” in IPSJ Transactions on System LSI Design Methodology; since it appeared in 2017, it has consistently been the journal’s most-viewed article. In 2012, Johann published his first journal article, in IEEE TCAD, with colleagues from the University of Michigan, USA; on appearance, it was the journal’s most popular article.

ACM SIGDA Speaker Travel Grant Program

The SIGDA Speaker Series Travel Grant supports the travel of speakers who are invited to give lectures or talks at local events, universities, and companies, so as to disseminate the values and impact of SIGDA. These speakers can be from either academia or industry and should be good lecturers who can help reach audiences in the broad field of design automation. Once an application is approved, SIGDA will issue a partial grant to cover the speaker’s travel expenses, including travel and subsistence costs.

This grant helps promote the EDA community and its activities worldwide. It provides travel support averaging $1,000 (USD) for approximately 6 eligible speakers per year to defray their costs of giving lectures or talks at local events, universities, and companies. Priority will be given to applicants from SIGDA local sections whose speakers present at events supported by those sections. Local EDA communities or individuals outside SIGDA local sections are also encouraged to apply. For applications or additional information, please contact SIGDA by sending an email exclusively to the Technical Activity Chair (https://www.sigda.org/about-us/officers/).

Review Process

The review committee will be formed by the current Technical Activity Chair and Education Chair of SIGDA. The reviews will be reported and discussed at SIGDA’s executive committee meeting. After the discussion, the executive committee members will vote on whether to grant each submitted application.

Selection Criteria

The review takes both the applicants/events and the speakers into consideration.

  • Preference is given to SIGDA local sections for speakers invited to events, universities, and companies supported by those sections. Applicants from local EDA communities, or individuals, are also considered.
  • The invited speaker should be a good lecturer or researcher from either academia or industry, with a good track record in the broad field of design automation.

Post Applications – Report and Reimbursement

  • For a speaker giving a talk at an ACM event, SIGDA can support the travel grant and process reimbursements to the speaker directly. At the end of the event, the speaker needs to complete the ACM reimbursement form and send it to the SIGDA or ACM representative along with copies of the receipts. The speaker will also need to abide by the reimbursement policies/standards found here: https://www.acm.org/special-interest-groups/volunteer-resources/conference-planning/conference-finances#speaker
  • For a speaker giving a talk at a non-ACM event, SIGDA will provide a lump-sum payment to the legal and financial sponsoring organization, which will disburse the funds as travel grants and process reimbursements. The sponsoring organization must indicate on the event’s promotional materials that the travel grants are supported by SIGDA. At the end of the event, the sponsoring organization needs to provide (1) a one-page final report to SIGDA reflecting the success of its goals against the funds provided and indicating how the funds were spent, (2) an invoice for the approved amount, and (3) a tax form. Note that there is no specific format for the final report.

Application Form

Sponsor

Synopsys

Who’s Xiaoming Chen

March 1st, 2022

Xiaoming Chen

Associate Professor

Institute of Computing Technology,
Chinese Academy of Sciences

Email:

chenxiaoming@ict.ac.cn

Personal webpage

http://people.ucas.edu.cn/~chenxm

Research interests

EDA and computer architecture

Short bio

Xiaoming Chen received the B.S. and Ph.D. degrees in Electronic Engineering from Tsinghua University, in 2009 and 2014, respectively. Since 2017, he has been an associate professor at Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS). Before joining ICT, CAS, he was a postdoctoral research associate in Electrical and Computer Engineering, Carnegie Mellon University from 2014 to 2016, and a visiting assistant professor in Computer Science and Engineering, University of Notre Dame from 2016 to 2017.

His research interests are mainly focused on EDA and computer architecture. He has published about 100 papers in top journals and conference proceedings, including DAC, ICCAD, DATE, HPCA, IEEE TCAD, IEEE TVLSI, IEEE TPDS, etc. He has served as a technical program committee member for DAC, ICCAD, ASP-DAC, GLSVLSI, AsianHOST, VLSI Design, etc. He was awarded the Excellent Young Scientists Fund of National Natural Science Foundation of China in 2021. He received the 2015 EDAA Outstanding Dissertation Award and the 2018 Alibaba DAMO Academy Young Fellow Award. He received one of the two best paper awards in ASP-DAC 2022 and several best paper nominations in ASP-DAC and ISLPED.

Research highlights

Prof. Xiaoming Chen has spent more than 10 years in the EDA area. Specifically, he has developed a parallel sparse direct solver named NICSLU that is well suited for SPICE-based circuit simulators. He proposed a series of novel techniques to improve the performance of solving the highly sparse linear systems arising in circuit simulation, including a new matrix ordering method to minimize fill-ins, a hybrid dynamic scheduling method for parallel matrix factorization, a numerically stable pivoting-reduction technique, and an adaptive numerical kernel selection method. NICSLU achieves much higher performance than other sparse solvers in circuit simulation applications and is also generally faster than state-of-the-art GPU-based solvers specially designed for circuit matrices. NICSLU has been used in a number of academic studies, EDA tools, and power system simulators, and some of its techniques have been adopted in the commercial SPICE tools of a leading EDA company in China. NICSLU is available at https://github.com/chenxm1986/nicslu.
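
NICSLU itself is a C library (see the link above), but the workload it targets is easy to illustrate: a SPICE engine factorizes a highly sparse conductance matrix and then reuses the factors across repeated solves, e.g., over Newton iterations. The sketch below is only a stand-in using SciPy’s bundled SuperLU, not NICSLU’s API:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# Tiny stand-in for a circuit's nodal conductance matrix G: very sparse,
# with a structural pattern that stays fixed across iterations.
G = csc_matrix(np.array([[ 2.0, -1.0,  0.0],
                         [-1.0,  2.0, -1.0],
                         [ 0.0, -1.0,  2.0]]))
b = np.array([1.0, 0.0, 0.0])        # current excitation vector

# Factor once, then reuse the factors for repeated right-hand sides --
# the pattern circuit simulators exploit and NICSLU parallelizes.
lu = splu(G)                          # sparse LU with fill-reducing ordering
for _ in range(3):                    # stand-in for Newton iterations
    x = lu.solve(b)                   # node voltages for this iteration
    b = 0.5 * b                       # hypothetical updated excitation
print(x)
```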

Prof. Chen has also made important contributions to computing-in-memory (CiM) architecture design. He explored how to utilize the device-level CiM feature of resistive random-access memories (RRAMs) and ferroelectric field-effect transistors (FeFETs), which can act as both storage elements and switch units, to unify computing and storage resources at the circuit level and to realize interchangeable computing and storage functionalities at the architecture level. In addition, he has investigated solutions to some fundamental problems in CiM systems, including data coherence, data contention, simulation methodologies, task assignment, etc. He has also explored the CiM feature of RRAMs and FeFETs in various applications, designing energy-efficient and high-performance accelerators for neural networks, graph processing, linear algebra, and robots.

Who’s Kai Ni

January 1st, 2022

Kai Ni

Assistant Professor

Rochester Institute of Technology

Email:

kai.ni@rit.edu

Personal webpage

https://www.needskai.org/

Research interests

Emerging Devices for AI Accelerators, Emerging Devices for Unconventional Computing

Short bio

Kai Ni received the B.S. degree in Electrical Engineering from the University of Science and Technology of China, Hefei, China in 2011, and the Ph.D. degree in Electrical Engineering from Vanderbilt University, Nashville, TN, USA in 2016, working on the characterization, modeling, and reliability of III-V MOSFETs. He then became a postdoctoral associate at the University of Notre Dame, working on ferroelectric devices for nonvolatile memory and novel computing paradigms. He is now an assistant professor in Electrical & Microelectronic Engineering at the Rochester Institute of Technology. He has around 100 publications in top journals and conference proceedings, including Nature Electronics, IEDM, VLSI Symposium, IRPS, EDL, etc., and has served on the technical program committees of DAC, DATE, ASP-DAC, IRPS, and EDTM. His current interests lie in nanoelectronic devices empowering unconventional computing, domain-specific accelerators, and memory technology.

Research highlights

Kai Ni has made important contributions to the development of the ferroelectric-HfO2-based field-effect transistor (FeFET) and its technology applications. On the technology side, he has proposed the ferroelectric metal field-effect transistor, which has a metal-ferroelectric-metal-oxide-semiconductor gate stack and offers freedom in optimizing the gate stack, as well as a superlattice structure for multi-level cells. He has developed several models explaining different behaviors of the FeFET, including a compact model based on the Preisach model of ferroelectrics, a kinetic Monte Carlo model to explain device variation, and a comprehensive model capturing all the key ferroelectric behaviors. With these models, he has also explored exciting applications of the FeFET in in-memory computing, including crossbar arrays for matrix-vector multiplication, content-addressable memory arrays for associative search, hardware security circuits, and reconfigurable computing. All these research activities have been published in top journals and premier conferences, such as IEDM, VLSI Symposium, DAC, DATE, etc.
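
For readers curious about the Preisach model mentioned above, a toy version is easy to build: the ferroelectric is treated as a population of bistable “hysterons”, each with its own up/down switching thresholds, and the polarization is their average. The sketch below is an illustrative discrete Preisach model, not Kai Ni’s published FeFET compact model:

```python
import numpy as np

# Discrete Preisach model: a triangular grid of hysterons, each with
# an up-switching threshold a and a down-switching threshold b (a >= b).
a, b = np.meshgrid(np.linspace(-2, 2, 80), np.linspace(-2, 2, 80))
mask = a >= b                      # valid hysterons lie on a triangle
state = -np.ones_like(a)           # start fully negatively polarized

def apply_voltage(v):
    state[(v >= a) & mask] = +1.0  # hysterons switch up past their a
    state[(v <= b) & mask] = -1.0  # and switch down past their b
    return state[mask].mean()      # normalized polarization in [-1, 1]

# Sweep the gate voltage up and down: the returned polarization traces
# a hysteresis loop, the memory effect an FeFET compact model captures.
for v in np.concatenate([np.linspace(0, 2, 5), np.linspace(2, -2, 9)]):
    print(f"V = {v:+.1f}  P = {apply_voltage(v):+.3f}")
```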

Prof. Rob Rutenbar receives the 2021 ACM SIGDA Pioneering Achievement Award

The SIGDA award selection committee is honored to announce that Prof. Rob Rutenbar has been selected to receive the 2021 ACM SIGDA Pioneering Achievement Award.

The citation reads: “for his pioneering work and extraordinary leadership in analog design automation and general EDA education.”

As the highest technical distinction of ACM SIGDA, this award recognizes a lifetime of outstanding achievements in Electronic Design Automation.

The award will be presented at the SIGDA Annual Member Meeting and Dinner at ICCAD 2022.