# ISPD ’21: Proceedings of the 2021 International Symposium on Physical Design

## SESSION: Session 1: Opening Session and First Keynote

• Jens Lienig

### Physical Design for 3D Chiplets and System Integration

• Frank J.C. Lee

Heterogeneous three-dimensional (3-D) package-level integration plays an increasingly
important role in the design of higher functional density and lower power processors
for general computing, machine learning and mobile applications. In TSMC’s 3DFabric™
platform, the back end packaging technology Chip-on-Wafer-on-Substrate (CoWoS®) with
the integration of High-Bandwidth Memory (HBM) has been successfully deployed in high
performance compute and machine learning applications to achieve high compute throughput,
while Integrated Fan-Out (InFO) packaging technology is widely used in mobile applications
thanks to its small footprint. System on Integrated Chips (SoIC™), leveraging advanced
front end Silicon process technology, offers an unprecedented bonding density for
vertical stacking.

Combining SoIC with CoWoS and InFO, the 3DFabric family of technologies provides a
versatile and flexible platform for system design innovations. A 3DFabric design starts
with system partitioning to decompose it into different functional components. In
contrast to a monolithic design approach, these functional components can potentially
be implemented in different technologies to optimize system performance, power, area,
and cost. Then these component chips are re-integrated with 3DFabric advanced packaging
technologies to form the system. There are new design challenges and opportunities
arising from 3DFabric. To unleash its full potential and accelerate the product development,
physical design solutions are developed. In this presentation, we will first review
these advanced packaging technology trends and design challenges. Then, we will
present design solutions for 3-D chiplets and system integration.

## SESSION: Session 2: Machine Learning for Physical Design (1/2)

• Jiang Hu

### Reinforcement Learning for Electronic Design Automation: Successes and Opportunities

• Matthew E. Taylor

Reinforcement learning is a machine learning technique that has been applied in many
domains, including robotics, game playing, and finance. This talk will briefly introduce
reinforcement learning with two use cases related to compiler optimization and chip
design. Interested participants will also have materials suggested to learn a more

### Reinforcement Learning for Placement Optimization

• Anna Goldie
• Azalia Mirhoseini

In the past decade, computer systems and chips have played a key role in the success
of artificial intelligence (AI). Our vision in Google Brain’s Machine Learning for
Systems team is to use AI to transform the way in which computer systems and chips
are designed. Many core problems in systems and hardware design are combinatorial
optimization or decision making tasks with state and action spaces that are orders
of magnitude larger than those of standard AI benchmarks in robotics and games. In
this talk, we will describe some of our latest learning based approaches to tackling
such large-scale optimization problems. We will discuss our work on a new domain-transferable
reinforcement learning (RL) method for optimizing chip placement [1], a long pole
in hardware design. Our approach is capable of learning from past experience and improving
over time, resulting in more optimized placements on unseen chip blocks as the RL
agent is exposed to a larger volume of data. Our objective is to optimize power, performance,
and area. We show that, in under six hours, our method can generate placements that
are superhuman or comparable on modern accelerator chips, whereas existing baselines
require human experts in the loop and can take several weeks.

### The Law of Attraction: Affinity-Aware Placement Optimization using Graph Neural Networks

• Yi-Chen Lu
• Sai Pentapati
• Sung Kyu Lim

Placement is one of the most crucial problems in modern Electronic Design Automation
(EDA) flows, where the solution quality is mainly dominated by on-chip interconnects.
To achieve target closures, designers often perform multiple placement iterations
to optimize key metrics such as wirelength and timing, which is highly time-consuming
and computationally inefficient. To overcome this issue, in this paper, we present
a graph learning-based framework named PL-GNN that provides placement guidance for
commercial placers by generating cell clusters based on logical affinity and manually
defined attributes of design instances. With the clustering information as a soft
placement constraint, commercial tools will strive to place design instances in a
common group together during global and detailed placements. Experimental results
on commercial multi-core CPU designs demonstrate that our framework improves the default
placement flow of Synopsys IC Compiler II (ICC2) by 3.9% in wirelength, 2.8% in power,
and 85.7% in performance.

## SESSION: Session 3: Advances in Placement

### Session details: Session 3: Advances in Placement

• Joseph Shinnerl

• Andrew B. Kahng

Placement is central to IC physical design: it determines spatial embedding, and hence
parasitics and performance. From coarse- to fine-grain, placement is conjointly optimized
with logic, performance, clock and power distribution, routability and manufacturability.
This paper gives some personal thoughts on futures for placement research in IC physical
design. Revisiting placement as optimization prompts a new look at placement requirements,
optimization quality, and scalability with resources. Placement must also evolve to
meet a growing need for co-optimizations and for co-operation with other design steps.
“New” challenges will naturally arise from scaling, both at the end of the 2D scaling
roadmap and in the context of future 2.5D/3D/4D integrations. And, the nexus of machine
learning and placement optimization will continue to be an area of intense focus for
research and practice. In general, placement research is likely to see more flow-scale
optimization contexts, open source, benchmarking of progress toward optimality, and
attention to translations into real-world practice.

### A Fast Optimal Double Row Legalization Algorithm

• Stefan Hougardy
• Meike Neuwohner
• Ulrike Schorr

In Placement Legalization, it is often assumed that (almost) all standard cells possess
the same height and can therefore be aligned in cell rows, which can then be treated
independently. However, this is no longer true for recent technologies, where a substantial
number of cells of double- or even arbitrary multiple-row height is to be expected.
Due to interdependencies between the cell placements within several rows, the legalization
task becomes considerably harder. In this paper, we show how to optimize quadratic
cell movement for pairs of adjacent rows comprising cells of single- as well as double-row
height with a fixed left-to-right ordering in time $\mathcal{O}(n \cdot \log(n))$, whereby
$n$ denotes the number of cells involved. In contrast to prior works, we thereby do not
artificially bound the maximum cell movement and can guarantee to find an optimum
solution. Experimental results show an average percentage decrease of over 26% in
the total quadratic movement when compared to a legalization approach that fixes cells
of more than single-row height after Global Placement.
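
To make the objective concrete, the following is a minimal sketch of the much simpler single-row case only, not the double-row algorithm of the paper. It assumes cells are given as target left edges and widths in a fixed left-to-right order, ignores row boundaries, and uses the classical pool-adjacent-violators technique; the function name `legalize_row` is hypothetical.

```python
from typing import List


def legalize_row(targets: List[float], widths: List[float]) -> List[float]:
    """Minimize sum((x_i - targets[i])**2) s.t. x_{i+1} >= x_i + widths[i].

    Single-row illustration only; row boundaries are ignored (assumption).
    """
    # Subtract from each target the total width of the cells to its left, so
    # the non-overlap constraint becomes a plain non-decreasing constraint.
    offsets: List[float] = []
    acc = 0.0
    for w in widths:
        offsets.append(acc)
        acc += w
    shifted = [t - o for t, o in zip(targets, offsets)]

    # Pool-adjacent-violators: each block stores [sum, count]; merge the last
    # two blocks while their means are out of order.
    blocks: List[List[float]] = []
    for v in shifted:
        blocks.append([v, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    # Expand block means back to per-cell values and undo the shift.
    pooled: List[float] = []
    for s, c in blocks:
        pooled.extend([s / c] * int(c))
    return [y + o for y, o in zip(pooled, offsets)]
```

For example, three cells of width 2 with targets 0, 1, and 10 are placed at -0.5, 1.5, and 10: the first two cells overlap at their targets, so they are pooled and shifted symmetrically, while the third stays where it is.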

### Multiple-Layer Multiple-Patterning Aware Placement Refinement for Mixed-Cell-Height Designs

• Bo-Yang Chen
• Chi-Chun Fang
• Wai-Kei Mak
• Ting-Chi Wang

Conventional lithography techniques are unable to achieve the resolution required
by advanced technology nodes. Multiple patterning lithography (MPL) has been introduced
as a viable solution. In addition, a new standard cell structure with multiple middle-of-line
(MOL) layers has been adopted to improve intra-cell routability. A mixed-cell-height standard
cell library, consisting of cells of single-row and multiple-row heights, is also
used in designs for power, performance and area concerns. As a result, it becomes
increasingly difficult to get a feasible placement for a mixed-cell-height design
where multiple cell layers require MPL. In this paper, we present a methodology to
refine a given mixed-cell-height standard cell placement for satisfying MPL requirements
on multiple cell layers as much as possible, while minimizing the total cell displacement.
We introduce the concept of uncolored cell group (UCG) to facilitate the effective
removal of coloring conflicts. By eliminating UCGs without generating any new coloring
conflict around them, the number of UCGs is effectively reduced in the local and global
refinement stages of our methodology. We report promising experimental results to
demonstrate the efficacy of our methodology.

### Snap-3D: A Constrained Placement-Driven Physical Design Methodology for Face-to-Face-Bonded 3D ICs

• Pruek Vanna-Iampikul
• Chengjia Shao
• Yi-Chen Lu
• Sai Pentapati
• Sung Kyu Lim

3D integration technology is one of the leading options that can advance Moore’s Law
beyond conventional scaling. Due to the absence of commercial 3D placers and routers,
existing 3D physical design flows rely heavily on 2D commercial tools to handle 3D
IC physical synthesis. Specifically, these flows build 2D designs first and then convert
them into 3D designs. However, several works demonstrate that design qualities degrade
during this 2D-3D transformation. In this paper, we overcome this issue with our Snap-3D,
a constraint-driven placement approach to build commercial-quality 3D ICs. Our key
idea is based on the observation that if the standard cell height is contracted by
one half and partitioned into multiple tiers, any commercial 2D placer can place them
onto the row structure and naturally achieve high-quality 3D placement. This methodology
is shown to optimize power, performance, and area (PPA) metrics across different tiers
simultaneously and minimize the aforementioned design quality loss. Experimental results
on 7 industrial designs demonstrate that Snap-3D achieves up to 5.4% wirelength, 10.1%
power, and 92.3% total negative slack improvements compared with state-of-the-art
3D design flows.

## SESSION: Session 4: Driving Research in Placement: a Retrospective

• Igor Markov

### Still Benchmarking After All These Years

• Ismail S. Bustany
• Jinwook Jung
• Natarajan Viswanathan
• Stephen Yang

Circuit benchmarks for VLSI physical design have been growing in size and complexity,
helping the industry tackle new problems and find new approaches. In this paper, we
take a look back at how benchmarking efforts have shaped the research community, consider
trade-offs that have been made, and speculate on what may come next.

## SESSION: Session 6: Second Keynote

### Session details: Session 6: Second Keynote

• Ismail Bustany

### Scalable System and Silicon Architectures to Handle the Workloads of the Post-Moore Era

• Ivo Bolsens

The end of Moore’s law has been proclaimed on many occasions and it’s probably safe
to say that we are now working in the post-Moore era. But no one is ready to slow
down just yet. We can view Gordon Moore’s observation on transistor densification
as just one aspect of a longer-term underlying technological trend – the Law of Accelerating
Returns articulated by Kurzweil. Arguably, companies became somewhat complacent in
the Moore era, happy to settle for the gains brought by each new process node. Although
we can expect scaling to continue, albeit at a slower pace, the end of Moore’s Law
delivers a stronger incentive to push other trends harder. Some exciting new technologies
are now emerging, such as multi-chip 3D integration, storage-class memory, and silicon
photonics. Moreover, we are also entering
a golden age of computer architecture innovation. One of the key drivers is the pursuit
of domain-specific architectures as proclaimed by Turing award winners John Hennessy
and David Patterson. A good example is Xilinx’s AI Engine, one of the important
features of the Versal™ ACAP (adaptive compute acceleration platform). Today, the
explosion of AI workloads is one of the most powerful drivers shifting our attention
to find faster ways of moving data into, across, and out of accelerators. Features
such as massive parallel processing elements, the use of domain specific accelerators,
the dense interconnect between distributed on-chip memories and processing elements,
are examples of the ways chip makers are looking beyond scaling to achieve next-generation
performance gains.

## SESSION: Session 7: Machine Learning for Physical Design (2/2)

### Session details: Session 7: Machine Learning for Physical Design (2/2)

• Siddhartha Nath

### Learning Point Clouds in EDA

• Wei Li
• Guojin Chen
• Haoyu Yang
• Ran Chen
• Bei Yu

The explosion of deep learning techniques has motivated development in various
fields, including intelligent EDA algorithms from physical implementation to design
for manufacturability. A point cloud, defined as a set of data points in space, is
one of the most important data representations in deep learning since it directly
preserves the original geometric information without any discretization. However,
there are still some challenges that stifle the applications of point clouds in the
EDA field. In this paper, we first review previous works about deep learning in EDA
and point clouds in other fields. Then, we discuss some challenges of point clouds
in EDA raised by some intrinsic characteristics of point clouds. Finally, to stimulate
future research, we present several possible applications of point clouds in EDA and
demonstrate their feasibility through two case studies.

### Building up End-to-end Mask Optimization Framework with Self-training

• Bentian Jiang
• Xiaopeng Zhang
• Lixin Liu
• Evangeline F.Y. Young

With the continuous shrinkage of device technology nodes, the tremendously increasing
demands for resolution enhancement technologies (RETs) have created severe concerns
over the balance between computational affordability and model accuracy. Having realized
the analogies between computational lithography tasks and deep learning-based computer
vision applications (e.g., medical image analysis), both industry and academia have started
gradually migrating various RETs to deep learning-enabled platforms. In this paper,
we propose a unified self-training paradigm for building up an end-to-end mask optimization
framework from undisclosable layout patterns. Our proposed flow comprises (1) a learning-based
pattern generation stage to massively synthesize diverse and realistic layout patterns
following the distribution of the undisclosable target layouts, while keeping these
confidential layouts blind for any successive training stage, and (2) a complete self-training
stage for building up an end-to-end on-neural-network mask optimization framework
from scratch, which only requires the aforementioned generated patterns and a compact
lithography simulation model as the inputs. Quantitative results demonstrate that
our proposed flow achieves comparable state-of-the-art (SOTA) performance in terms
of both mask printability and mask correction time while reducing the turnaround
time for flow construction by 66%.

### Machine Learning Techniques in Analog Layout Automation

• Tonmoy Dhar
• Kishor Kunal
• Yaguang Li
• Yishuang Lin
• Jitesh Poojary
• Arvind K. Sharma
• Steven M. Burns
• Ramesh Harjani
• Jiang Hu
• Parijat Mukherjee
• Soner Yaldiz
• Sachin S. Sapatnekar

The quality of layouts generated by automated analog design has traditionally not
been able to match that of human designers over a wide range of analog designs.
The ALIGN (Analog Layout, Intelligently Generated from Netlists) project [2, 3, 6]
aims to build an open-source analog layout engine [1] that overcomes these challenges,
using a variety of approaches. An important part of the toolbox is the use of machine
learning (ML) methods, combined with traditional methods, and this talk overviews
our efforts. The input to ALIGN is a SPICE-like netlist and a set of performance
specifications, and the output is a GDSII layout. ALIGN automatically recognizes hierarchies
in the input netlist. To detect variations of known blocks in the netlist, approximate
subgraph isomorphism methods based on graph convolutional networks can be used [5].
Repeated structures in a netlist are typically constrained by layout requirements
related to symmetry or matching. In [7], we use a mix of graph methods and ML to detect
symmetric and array structures, including neural network based approximate
matching built on the notion of graph edit distances. Once the circuit is
annotated, ALIGN generates the layout, going from the lowest level cells to higher
levels of the netlist hierarchy. Based on an abstraction of the process design rules,
ALIGN builds parameterized cell layouts for each structure, accounting for the need
for common centroid layouts where necessary [11]. These cells then undergo placement
and routing that honors the geometric constraints (symmetry, common-centroid). The
chief parameter that changes during layout is the set of interconnect RC parasitics:
excessively large RCs could result in an inability to meet performance. These values
can be controlled by reducing the distance between blocks, or, in the case of R, by
using larger effective wire widths (using multiple parallel connections in FinFET
technologies where wire widths are quantized) to reduce the effective resistance.
ALIGN has developed several approaches based on ML for this purpose [4, 8, 9] that
rapidly predict whether a layout will meet the performance constraints that are imposed
at the circuit level, and these can be deployed together with conventional algorithmic
methods [10] to rapidly prune out infeasible layouts. This presentation overviews
our experience in the use of ML-based methods in conjunction with conventional algorithmic
approaches for analog design. We will show (a) results from our efforts so far,
(b) appropriate methods for mixing ML methods with traditional algorithmic techniques
for solving the larger problem of analog layout, (c) limitations of ML methods, and
(d) techniques for overcoming these limitations to deliver workable solutions for
analog layout automation.

## SESSION: Session 8: Monolithic 3D and Packaging Session

• Bill Swartz

### Advances in Carbon Nanotube Technologies: From Transistors to a RISC-V Microprocessor

• Gage Hills

Carbon nanotube (CNT) field-effect transistors (CNFETs) promise to improve the energy
efficiency of very-large-scale integrated (VLSI) systems. However, multiple challenges
have prevented VLSI CNFET circuits from being realized, including inherent nano-scale
material defects, robust processing for yielding complementary CNFETs (i.e., CNT CMOS:
including both PMOS and NMOS CNFETs), and major CNT variations. In this talk, we summarize
techniques that we have recently developed to overcome these outstanding challenges,
enabling VLSI CNFET circuits to be experimentally realized today using standard VLSI
processing and design flows. Leveraging these techniques, we demonstrate the most
complex CNFET circuits and systems to-date, including a three-dimensional (3D) imaging
system comprising CNFETs fabricated directly on top of a silicon imager, CNT CMOS
analog and mixed-signal circuits, 1 kilobit CNFET static random-access memory (SRAM)
memory arrays, and a 16-bit RISC-V microprocessor built entirely out of CNFETs.

### ML-Based Wire RC Prediction in Monolithic 3D ICs with an Application to Full-Chip Optimization

• Sai Surya Kiran Pentapati
• Bon Woong Ku
• Sung Kyu Lim

The state-of-the-art Monolithic 3D (M3D) IC design methodologies, such as Compact-2D and
Shrunk-2D, use commercial electronic design automation tools built for
2D ICs to implement a pseudo-3D design and split it into two dies that are routed
independently to create an M3D design. Therefore, an accurate estimation of 3D wire
parasitics at the pseudo-3D stage is important to achieve a well optimized M3D design.
In this paper, we present a regression model based on boosted decision tree learning
to better predict the 3D wire parasitics (RCs) at the pseudo-3D stage. Our model is
trained using individual net features as well as the full-chip design metrics using
multiple instantiations of 8 different netlists and is tested on 3 unseen netlists.
Compared to the Compact-2D flow on its own as the reference
pseudo-3D, the addition of our predictive model achieves up to $2.9\times$ and $1.7\times$
smaller root mean square error in the resistance and capacitance predictions,
respectively. On an unseen netlist design, we observe that our model provides 98.6%
and 94.6% RC prediction accuracy in 3D and up to $6.4 \times$ smaller total negative
slack of the design compared to the result of Compact-2D flow resulting in a more
timing-robust M3D IC. This model is not limited to Compact-2D, and can be extended
to other pseudo-3D flows.
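
For reference, the root mean square error used as the comparison metric has the standard definition below. The symbols are notation introduced here rather than taken from the paper: $\hat{y}_i$ is the predicted parasitic (R or C) of net $i$ at the pseudo-3D stage, $y_i$ is the value extracted from the final M3D design, and $N$ is the number of nets.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2}
$$

Under this reading, a "$2.9\times$ smaller" error means $\mathrm{RMSE}_{\text{reference}} / \mathrm{RMSE}_{\text{model}} = 2.9$.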

### Machine Learning-Enabled High-Frequency Low-Power Digital Design Implementation At Advanced Process Nodes

• Siddhartha Nath
• Vishal Khandelwal

Relentless pursuit of high-frequency low-power designs at advanced nodes necessitates
achieving signoff-quality timing and power during digital implementation to minimize
any over-design. With growing design sizes (1–10M instances), full flow runtime is
an equally important metric and commercial implementation tools use graph-based timing
analysis (GBA) to gain runtime over path-based timing analysis (PBA), at the cost
of pessimism in timing. Last mile timing and power closure is then achieved through
expensive PBA-driven engineering change order (ECO) loops in signoff stage. In this
work, we explore “on-the-fly” machine learning (ML) models to predict PBA timing
based on GBA features, to drive digital implementation flow. Our ML model reduces
the GBA vs. PBA pessimism with minimal runtime overhead, resulting in improved area/power
without compromising on signoff timing closure. Experimental results obtained by integrating
our technique in a commercial digital implementation tool show improvement of up to
0.92% in area, 11.7% and 1.16% in power in leakage- and total power-centric designs,
respectively. Our method has a runtime overhead of $\sim$3% across a suite of 5–16nm
industrial designs.

### A Fast Power Network Optimization Algorithm for Improving Dynamic IR-drop

• Jai-Ming Lin
• Yang-Tai Kung
• Zheng-Yu Huang
• I-Ru Chen

As the power consumption of electronic equipment varies more severely, the device
voltages in a modern design may fluctuate violently as well. Consideration of dynamic
IR-drop becomes indispensable to current power network design. Since solving voltage
violations according to all power consumption files in all time slots is impractical
in reality, this paper applies a clustering based approach to find representative
power consumption files and shows that most IR-drop violations can be repaired if
we repair the power network according to these files. In order to further reduce runtime,
we also propose an efficient and effective power network optimization approach. Compared
to the intuitive approach which repairs a power network file by file, our approach
alternates between different power consumption files and always repairs the file which
has the worst IR-drop violation region that involves more power consumption files
in each iteration. Since many violations can be resolved at the same time, this method
is much faster than the iterative approach. The experimental results show that the
proposed algorithm can not only eliminate voltage violations efficiently but also
construct a power network with fewer routing resources.
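
The idea of picking a few representative power consumption files can be sketched as follows. The abstract does not say which clustering method is used, so this example swaps in a greedy farthest-point (k-center) heuristic purely for illustration. It assumes each file has been reduced to a vector of per-region power values; the function name `pick_representatives` is hypothetical.

```python
from typing import List, Sequence


def pick_representatives(profiles: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of up to k representative power profiles.

    Greedy farthest-point selection (an assumption, not the paper's method):
    start from the profile with the highest total power, then repeatedly add
    the profile that is farthest from all representatives chosen so far.
    """
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(profiles)
    if n == 0 or k <= 0:
        return []

    seed = max(range(n), key=lambda i: sum(profiles[i]))
    chosen = [seed]
    # nearest[i]: squared distance from profile i to its closest representative
    nearest = [dist2(p, profiles[seed]) for p in profiles]

    while len(chosen) < min(k, n):
        nxt = max(range(n), key=lambda i: nearest[i])
        if nearest[nxt] == 0.0:
            break  # every remaining profile duplicates a representative
        chosen.append(nxt)
        nearest = [min(d, dist2(p, profiles[nxt]))
                   for d, p in zip(nearest, profiles)]
    return chosen
```

With two low-power and two high-power profiles, asking for two representatives yields one from each group, which is the behavior the repair loop would rely on.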

## SESSION: Session 9: Brains, Computers and EDA

### Session details: Session 9: Brains, Computers and EDA

• Patrick Groeneveld

### A Lifetime of ICs, and Cross-field Exploration: ISPD 2021 Lifetime Achievement Award Bio

• Louis K. Scheffer

The 2021 International Symposium on Physical Design lifetime achievement award goes
to Dr. Louis K. Scheffer for his outstanding contributions to the field. This autobiography
in Lou’s own words provides a glimpse of what has happened through his career.

### The Physical Design of Biological Systems – Insights from the Fly Brain

• Louis K. Scheffer

Many different physical substrates can support complex computation. This is particularly
apparent when considering human-made and biological systems that perform similar functions,
such as visually guided navigation. In common, however, is the need for good physical
design, as such designs are smaller, faster, lighter, and lower power, factors in
both the jungle and the marketplace. Although the physical design of man-made systems
is relatively well understood, the physical design of biological computation has remained
murky due to a lack of detailed information on their construction. The recent EM (electron
microscope) reconstruction of the central brain of the fruit fly now allows us to
start to examine these issues. Here we look at the physical design of the fly brain,
including such factors as fan-in and fan-out, logic depth, division into physical compartments
and how this affects electrical response, pin to computation ratios (Rent’s rule),
and other physical characteristics of at least one biological computation substrate.
From this we speculate on how physical design algorithms might change if the target
implementation was a biological neural network.
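
For readers unfamiliar with Rent’s rule, its usual textbook form is given below; the symbols are standard notation, not taken from the talk. $T$ is the number of external terminals (pins) of a block, $G$ is the number of logic elements inside it, $t$ is the average number of terminals per element, and $p$ is the Rent exponent.

$$
T = t \cdot G^{p}, \qquad 0 < p < 1
$$

Taking logarithms gives $\log T = \log t + p \log G$, so $p$ can be estimated as the slope of a log-log plot of terminals against block size.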

### Of Brains and Computers

• Jan M. Rabaey

The human brain – which we consider to be the prototypal biological computer – in
its current incarnation is the result of more than a billion years of evolution. Its
main functions have always been to regulate the internal milieu and to help the organism/being
to survive and reproduce. With growing complexity, the brain has adopted a number
of design principles that serve to maximize its efficiency in performing a broad range
of tasks. The physical computer, on the other hand, had only 200 years or so to evolve,
and its perceived function was considerably different and far more constrained, namely
to solve a set of mathematical functions. This, however, is rapidly changing. One
may argue that the functions of brains and computers are converging. If so, the question
arises whether the underlying design principles will converge or cross-breed as well,
or whether the different underlying mechanisms (physics versus biology) will lead to radically
different solutions.

### EDA and Quantum Computing: The key role of Quantum Circuits

• Leon Stok

Quantum computing (QC) is fast emerging as a potentially disruptive technology that
can upend some businesses in the short term and many enterprises in the long run.
Electronic Design Automation (EDA) is uniquely positioned not only to benefit from
quantum computing technologies but also to impact the pace of development of that
technology. Quantum circuits will play a key role in driving the synergy between quantum
and EDA. Much like standard cell libraries became the most important abstraction between
CMOS technology and most EDA tooling and spawned four decades of EDA innovation and
designer productivity, quantum circuits can unleash a similar streak of innovation
in quantum computing.

## SESSION: Session 11: Third Keynote

### Session details: Session 11: Third Keynote

• Iris Hui-Ru Jiang

• Juan C. Rey

In spite of “doomsday” expectations, Moore’s Law is alive and well. Semiconductor
manufacturing and design companies, as well as the Electronic Design Automation (EDA)
industry have been pushing ahead to bring more functionality to satisfy more aggressive
space/power/performance requirements. Physical verification occupies a unique space
in the ecosystem as one of the key bridges between design and manufacturing. As such,
the traditional space of design rule checking (DRC) and layout versus schematic (LVS)
has expanded into electrical verification and yield-enabling technologies such as
optical proximity correction, critical area analysis, multi-patterning decomposition
and automated filling. To achieve the expected accuracy and performance demanded by
the design and manufacturing community, it is necessary to consider the physical effects
of the manufacturing processes and electronic devices and to use the most advanced
software engineering technology and computational capabilities.

## SESSION: Session 12: Physical Design at Advanced Technology Nodes

### Session details: Session 12: Physical Design at Advanced Technology Nodes

• Magna Mankalale

### Hardware Security for and beyond CMOS Technology

• Johann Knechtel

As with most aspects of electronic systems and integrated circuits, hardware security
has traditionally evolved around the dominant CMOS technology. However, with the rise
of various emerging technologies, whose main purpose is to overcome the fundamental
limitations for scaling and power consumption of CMOS technology, unique opportunities
arise to advance the notion of hardware security. In this paper, I first provide an
overview on hardware security in general. Next, I review selected emerging technologies,
namely (i) spintronics, (ii) memristors, (iii) carbon nanotubes and related transistors,
(iv) nanowires and related transistors, and (v) 3D and 2.5D integration. I then discuss
their application to advance hardware security and also outline related challenges.

### Physical Design Challenges and Solutions for Emerging Heterogeneous 3D Integration Technologies

• Lingjun Zhu
• Sung Kyu Lim

The emerging heterogeneous 3D integration technologies provide a promising solution
to improve the performance of electronic systems in the post-Moore era, but the lack
of design automation solutions and the challenges in physical design are hindering
the applications of these technologies. In this paper, we discuss multiple types and
levels of heterogeneous integration enabled by the high-density 3D technologies. We
investigate each physical implementation stage from technology setup to placement
and routing, and identify the design challenges posed by heterogeneous 3D integration.
This paper provides a comprehensive survey on the state-of-the-art physical design

### A Scalable and Robust Hierarchical Floorplanning to Enable 24-hour Prototyping for 100k-LUT FPGAs

• Ganesh Gore
• Xifan Tang
• Pierre-Emmanuel Gaillardon

Physical design for Field Programmable Gate Array (FPGA) is challenging and time-consuming,
primarily due to the use of a full-custom approach to aggressively optimize the performance,
power, and area (PPA) of the FPGA design. The growing number of FPGA applications
demands novel architectures and shorter development cycles. The use of an automated
toolchain is essential to reduce end-to-end development time. This paper presents
scalable and adaptive hierarchical floorplanning strategies to significantly reduce
the physical design runtime and enable millions-of-LUT FPGA layout implementations
using standard ASIC toolchains. This approach mainly exploits the regularity of the
design and performs necessary feedthrough creations for global and clock nets to eliminate
any requirement of global optimizations. To validate this approach, we implemented
full-chip layouts for modern FPGA fabric with logic capacity ranging from 40 to 100k
LUTs using a commercial 12nm technology. Our results show that the physical implementation
of a 128k-LUT FPGA fabric can be achieved within 24-hours, which has not been demonstrated
by any previous work. Compared to previous work, a runtime reduction of 8x is obtained
when implementing a 2.5k-LUT FPGA device.

## SESSION: Session 13: Contest and Results

### Session details: Session 13: Contest and Results

• Gracieli Posser

### ISPD 2021 Wafer-Scale Physics Modeling Contest: A New Frontier for Partitioning, Placement and Routing

• Patrick Groeneveld
• Michael James
• Ilya Sharapov
• Marvin Tom
• Leo Wang

Solving 3-D partial differential equations in a Finite Element model is computationally
intensive and requires extremely high memory and communication bandwidth. This paper
describes a novel approach in which the Finite Element mesh points of varying resolution are
mapped onto a large 2-D homogeneous array of processors. Cerebras developed a novel supercomputer
that is powered by a 21.5cm by 21.5cm Wafer-Scale Engine (WSE) with 850,000 programmable
compute cores. With 2.6 trillion transistors in a 7nm process, this is by far the largest
chip in the world. It is structured as a regular array of 800 by 1060 identical processing
elements, each with its own local fast SRAM memory and direct high bandwidth connection
to its neighboring cores. For the 2021 ISPD competition we propose a challenge to
optimize placement of computational physics problems to achieve the highest possible
performance on the Cerebras supercomputer. The objectives are to maximize performance
and accuracy by optimizing the mapping of the problem to cores in the system. This
involves partitioning and placement algorithms.
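
As a rough illustration of what a placement objective for such a mapping could look like, the sketch below scores a placement by the total Manhattan distance between cores that host neighboring mesh points. This is a hypothetical proxy, not the contest's actual scoring function; the names `communication_cost`, `placement`, and `edges` are all assumptions made here.

```python
from typing import Dict, Iterable, Tuple

Core = Tuple[int, int]  # (row, column) of a processing element in the 2-D array


def communication_cost(placement: Dict[str, Core],
                       edges: Iterable[Tuple[str, str]]) -> int:
    """Sum of Manhattan distances between cores hosting adjacent mesh points.

    Hypothetical proxy metric: lower values mean that neighboring mesh points
    sit on nearby cores, so less data has to travel across the array.
    """
    total = 0
    for a, b in edges:
        (row_a, col_a), (row_b, col_b) = placement[a], placement[b]
        total += abs(row_a - row_b) + abs(col_a - col_b)
    return total


# Three mesh points; p0-p1 and p1-p2 exchange data every step.
example_placement = {"p0": (0, 0), "p1": (0, 1), "p2": (3, 4)}
example_edges = [("p0", "p1"), ("p1", "p2")]
example_cost = communication_cost(example_placement, example_edges)
```

A partitioning and placement algorithm would search over `placement` to reduce such a cost while also balancing the number of mesh points assigned to each core.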