CADathlon 2019 Problem References

Problem 1: Circuit Design and Analysis
Contributed by Jianlei Yang, Beihang University
Overview: Solve Landau-Lifshitz-Gilbert (LLG) equation (in C++)
Reference: Iwasaki, Junichi, Masahito Mochizuki, and Naoto Nagaosa. “Current-induced skyrmion dynamics in constricted geometries,” Nature nanotechnology 8.10 (2013): 742.

Problem 2: Physical Design & Design for Manufacturability
Contributed by William Chow, Cadence
Overview: Tap assignment for gated clock network (in C++)
Reference: W-H Chen, C-K Wang, H-M Chen, Y-C Chou, and C-H Tsai, “A Comparative Study on Multisource Clock Network Synthesis,” The 22nd Workshop on Synthesis And System Integration of Mixed Information technologies (SASIMI), 2016

Problem 3: Logic & High-Level Synthesis
Overview: Boolean Function Manipulation by Quantification (in C++)
Reference: No specific reference is provided.

Problem 4: System Design & Analysis
Contributed by Andy Yu-Guang Chen, National Central University
Overview: On-line Wake-up Scheduling for Multi-module design (in C++)
Reference 1: D. Brelaz, “New Methods to Color the Vertices of a Graph,” Communications of the ACM, Vol.22, Issue 4, Apr. 1979.
Reference 2: M.C. Lee, Y. Shi, Y.G. Chen, D. Marculescu, S.C. Chang, “Efficient On-Line Module-Level Wake-Up Scheduling for High Performance Multi-Module Designs,” Proc. on the International Symposium on Physical Design (ISPD), 2012, Page(s): 97-104.

Problem 5: Functional Verification & Testing
Contributed by Hao Zheng, University of South Florida
Overview: Cycle-based logic simulation (in C++)
Reference 1: S. Palnitkar and D. Parham, “Cycle Simulation Techniques,” IEEE International Verilog HDL Conference, 1995, Page(s) 2-8.
Reference 2: A. Biere, “The AIGER And-Inverter Graph (AIG) Format, Version 20070427,” Johannes Kepler University, 2006-2007

Problem 6: Future technologies (Bio-EDA, Security, AI, etc.)
Contributed by Mimi Xie, The University of Texas at San Antonio and Caiwen Ding, University of Connecticut
Overview: Efficient Pruning for Neural Networks (in Python)
Reference: Han, Song, Jeff Pool, John Tran, and William Dally. “Learning both weights and connections for efficient neural network,” In Advances in neural information processing systems, pp. 1135-1143. 2015.


SESSION: Keynote

Session details: Keynote

  • Bustany

Fusion: The Dawn of the Hyperconvergence Era in EDA

  • Krishnamoorthy

Hyperconvergence is a software-centric architecture which has disrupted the datacenter
industry in a dramatic way by bringing the disparate areas of compute, storage and
networking into a single system. A hyperconverged system allows the integrated …

SESSION: New Advances in Placement

Session details: New Advances in Placement

  • Yang

How Deep Learning Can Drive Physical Synthesis Towards More Predictable Legalization

  • Netto

Machine learning has been used to improve the predictability of different physical
design problems, such as timing, clock tree synthesis and routing, but not for legalization.
Predicting the outcome of legalization can be helpful to guide incremental …

Graceful Register Clustering by Effective Mean Shift Algorithm for Power and Timing

  • Chang

With the wide adoption of FinFET technology in mass production, dynamic power becomes
the bottleneck to achieving low power. Therefore, clock power reduction is crucial
in modern IC design. Register clustering can effectively save clock power because
of …

Device Layer-Aware Analytical Placement for Analog Circuits

  • Xu

The layouts of analog/mixed-signal (AMS) integrated circuits (ICs) are dramatically
different from their digital counterparts. AMS circuit layouts usually include a variety
of devices, including transistors, capacitors, resistors, and inductors. A …

Analytical Mixed-Cell-Height Legalization Considering Average and Maximum Movement

  • Li

Modern circuit designs often contain standard cells of different row heights to meet
various design requirements. Due to the higher interference among heterogeneous cell
structures, the legalization problem for mixed-cell-height standard cells becomes

SESSION: FPGA Special Session: Advances in Adaptable Heterogeneous Computing and Acceleration
for Big Data

Session details: FPGA Special Session: Advances in Adaptable Heterogeneous Computing
and Acceleration for Big Data

  • Iyer

FPGA-based Computing in the Era of AI and Big Data

  • Nurvitadhi

The continued rapid growth of data, along with advances in Artificial Intelligence
(AI) to extract knowledge from such data, is reshaping the computing ecosystem landscape.
With AI becoming an essential part of almost every end-user application, our …

Advances in Adaptable Computing

  • Gupta

Recent technical challenges have forced the industry to explore options beyond the
conventional “one size fits all” CPU scalar processing solution. Very large vector
processing (DSP, GPU) solves some problems, but it runs into traditional scaling …

Improving Programmability and Efficiency of Large-Scale Graph Analytics for FPGA Platforms

  • Ozdal
    Muhammet Mustafa

Large-scale graph analytics has gained importance due to the emergence of new applications
in different contexts such as web, social networks, and computational biology. It
is known that typical CPU/GPU implementations for sparse graph applications cannot

SESSION: Routing in All Forms

Session details: Routing in All Forms

  • Madden

Pin Access-Driven Design Rule Clean and DFM Optimized Routing of Standard Cells under
Boolean Constraints

  • Ryzhenko

In this paper, we propose a routing flow for nets within a standard cell that generates
layout of standard cells without any design rule violations. Design rules, density
rules for metal fill, and pin-access requirements are modeled via Boolean formulas

PSION: Combining Logical Topology and Physical Layout Optimization for Wavelength-Routed

  • Truppel

Optical Networks-on-Chip (ONoCs) are a promising solution for high-performance multi-core
integration with better latency and bandwidth than traditional Electrical NoCs. Wavelength-routed
ONoCs (WRONoCs) offer yet additional performance guarantees. …

Construction of All Multilayer Monolithic Rectilinear Steiner Minimum Trees on the
3D Hanan Grid for Monolithic 3D IC Routing

  • Lin
    Sheng-En David

Monolithic three-dimensional (3D) integration enables stacking multiple ultra-thin
silicon tiers in a single package, thereby providing smaller footprint area, shorter
wirelength, higher performance, and lower power consumption than conventional planar

ROAD: Routability Analysis and Diagnosis Framework Based on SAT Techniques

  • Park

Routability diagnosis has increasingly become the bottleneck in detailed routing for
sub-10nm technology due to the limited tracks, high density, and complex design rules. The
conventional ways to examine the routability of detailed routing are ILP- and …

SESSION: Keynote

Session details: Keynote

  • Menezes

A Perspective on Security and Trust Requirements for the Future

  • Plaks

As integrated circuit manufacturing becomes increasingly global and the availability
of domestically produced advanced transistor nodes shrinks, security vulnerabilities
within the supply chain become a significant issue for IC defense applications. In

SESSION: Patterning and Machine Learning

Session details: Patterning and Machine Learning

  • Young

Declarative Language for Geometric Pattern Matching in VLSI Process Rule Modeling

  • Suto

This paper presents a formal (machine readable) declarative language developed for
the specific reason of modeling physical design process rules of any complexity. Case
studies are presented on synthetic as well as industry known design rules of simple

Electromigration-Aware Interconnect Design

  • Sapatnekar
    Sachin S.

Electromigration (EM) is seen as a growing problem in recent and upcoming technology
nodes, and affects a wider variety of wires (e.g., power grid, clock/signal nets),
circuits (e.g., digital, analog, mixed-signal), and systems (e.g., mobile, server,

Toward Intelligent Physical Design: Deep Learning and GPU Acceleration

  • Ren

Deep learning (DL) has achieved tremendous success in computer vision, natural language
processing and gaming. Would DL help push physical design toward a more intelligent
paradigm to meet the post-Moore era design automation challenges? We will discuss

Multiple Patterning Layout Compliance with Minimizing Topology Disturbance and Polygon

  • Chang

Multiple patterning lithography (MPL) divides a layout into several masks and manufactures
them by a series of exposure and etching steps. As technology advances, MPL is still
indispensable because of its cost effectiveness and hybrid lithography …

SESSION: Cyber-Physical Systems

Session details: Cyber-Physical Systems

  • Groeneveld

From Electronic Design Automation to Automotive Design Automation

  • Lin

Advanced driver assistance systems (ADAS), autonomous functions, and connected applications
bring a revolution to automotive systems, but they also make automotive design, especially
software and electronics, more complex than ever. The complexity …

Enterprise-wide AI-enabled Digital Transformation

  • Maasoumy

Having solved the data integration problem, we discuss how convergence of 4 technology
vectors, namely Big Data, Artificial Intelligence, Cloud Computing, and Internet of
Things (IoT) has, for the first time, enabled us to solve a class of problems …

Secure and Trustworthy Cyber-Physical System Design: A Cross-Layer Perspective

  • Nuzzo

This talk discusses some of the design challenges posed by cyber-physical system security
at different abstraction layers, from algorithm design to the realization of trusted
hardware platforms. We introduce two design problems, namely, detecting sensor …

SESSION: Lifetime Achievement Award Tribute to Professor Alberto Sangiovanni-Vincentelli

Session details: Lifetime Achievement Award Tribute to Professor Alberto Sangiovanni-Vincentelli

  • Nuzzo

The Slow Start of Fast Spice: A Brief History of Timing

  • White
    Jacob K.

The list of Professor Alberto Sangiovanni-Vincentelli’s research contributions is
astounding in length and breadth, yet does not entirely capture what this author believes
is his true genius. In so many areas of computer-aided design, Sangiovanni-…

Basic and Advanced Researches in Logic Synthesis and their Industrial Contributions

  • Fujita

We first present a historical view on the techniques for two-level and multi-level logic
optimizations, and discuss the practical issues with respect to them. Then the techniques
for sequential optimizations are briefly reviewed. Based on them, a new …

From Electronic Design Automation to Cyber-Physical System Design Automation: A Tale of Platforms and Contracts

  • Nuzzo

This paper reflects on the design challenges posed by cyber-physical systems, what
distinguishes cyber-physical system design from large-scale integrated circuit design,
and what could be the opportunities for the design automation community. The paper

My 50-Year Journey from Punched Cards to Swarm Systems

  • Sangiovanni Vincentelli

The article is a reflection on my journey during the development of the EDA field,
from its early days to its explosive growth and present maturity. The two special
issues of the Solid State Circuit Society Magazine “Corsi e Ricorsi: Alberto Sangiovanni

SESSION: Lifetime Achievement Award Dinner Banquet Keynote

Freedom From Choice and the Power of Models: in Honor of Alberto Sangiovanni-Vincentelli

  • Lee
    Edward A.

Discovery, invention, and design are all about models. When we say “Joseph Priestley
discovered oxygen in 1774,” we do not mean that Priestley dug up a canister of oxygen,
recognized it as something new, and released it, for the first time, into the air.

SESSION: Physical Design – Where are we going?

Session details: Physical Design – Where are we going?

  • Cheng

Analog Layout Synthesis: Are We There Yet?

  • Mangalagiri

Over the past decade, spurred by advances in mobile computing, there has been a fundamental
shift in computing needs of consumer applications. There has been an industry-wide
transition from highly CPU-centric to a peripheral-centric, connectivity and …

Lagrangian Relaxation Based Gate Sizing With Clock Skew Scheduling – A Fast and Effective

  • Sharma

Recent work has established Lagrangian relaxation (LR) based gate sizing as state-of-the-art
providing the best power reduction with low run time. Gate sizing has limited potential
to reduce the power when the timing constraints are tight. By adjusting …

Adaptive Clustering and Sampling for High-Dimensional and Multi-Failure-Region SRAM
Yield Analysis

  • Shi

Statistical circuit simulation is exhibiting increasing importance for memory circuits
under process variation. It is challenging to accurately estimate the extremely low
failure probability as it becomes a high-dimensional and multi-failure-region …

SESSION: Detailed Routing Contest Results

Session details: Detailed Routing Contest Results

  • Chinnery

ISPD 2019 Initial Detailed Routing Contest and Benchmark with Advanced Routing Rules

  • Liu

Detailed routing becomes the most complicated and runtime consuming stage in the physical
design flow as technology nodes advance. Due to the inaccessibility of advanced routing
rules and industrial designs, it is hard to conduct detailed routing …


Scope – quality retaining display rendering workload scaling based on user-smartphone

  • Nixon
    Kent W.

Modern smartphone display systems come equipped with powerful GPUs capable of rendering
advanced 2D and 3D graphics. These GPUs make up a significant portion of the system
power profile due to the high resolution and framerate of smartphone display. …

NVSim-CAM: a circuit-level simulator for emerging nonvolatile memory based content-addressable

  • Li

Ternary Content-Addressable Memory (TCAM) is widely used in networking routers, fully
associative caches, search engines, etc. While the conventional SRAM-based TCAM suffers
from the poor scalability, the emerging nonvolatile memories (NVM, i.e., MRAM, …

Design technology for fault-free and maximally-parallel wavelength-routed optical

  • Peano

The recent interest in emerging interconnect technologies is bringing the issue of
a proper EDA support for them to the forefront, so to tackle the design complexity.
A relevant case study is provided by wavelength-routed optical NoCs (WRONoCs), which

Fast generation of lexicographic satisfiable assignments: Enabling canonicity in SAT-based applications

  • Petkovska

Lexicographic Boolean satisfiability (LEXSAT) is a variation of the Boolean satisfiability
problem (SAT). Given a variable order, LEXSAT finds a satisfying assignment whose
integer value under the given variable order is minimum (maximum) among all …

Analytic approaches to the collapse operation and equivalence verification of threshold
logic circuits

  • Lee

Threshold logic circuits gain increasing attention due to their feasible realization
with emerging technologies and strong bind to neural network applications. In this
paper, for logic synthesis we formulate the fundamental operation of collapsing …

A flash-based digital circuit design flow

  • Abusultan

Traditionally, floating gate (flash) transistors have been used exclusively to implement
non-volatile memory in its various forms. Recently, we showed that flash transistors
can be used to implement digital circuits as well. In this paper, we present …

MrDP: multiple-row detailed placement of heterogeneous-sized
cells for advanced nodes

  • Lin

As VLSI technology shrinks to fewer tracks per standard cell, e.g., from 10-track
to 7.5-track libraries (and lesser for 7nm), there has been a rapid increase in the
usage of multiple-row cells like two- and three-row flip-flops, buffers, etc., for

OWARU: free space-aware timing-driven incremental placement

  • Jung

This paper proposes a powerful new technique called “OWARU” that re-places and re-sizes multiple gates simultaneously to improve the most critical
paths of a design. In essence, it is an incremental timing-driven placement technique
integrated with …

Detailed placement for modern FPGAs using 2D dynamic programming

  • Dhar

In this paper, we propose a 2-dimensional dynamic programming (DP) based detailed
placement algorithm for modern FPGAs for wirelength and timing optimization. By tuning
a control parameter, our algorithm can perform fast heuristic or exact optimization.

Security and privacy threats to on-chip non-volatile memories and countermeasures

  • Ghosh

Non-volatile memories (NVMs) such as Spin-Transfer Torque RAM (STTRAM) have drawn
significant attention due to complete elimination of bitcell leakage. In addition
to the plethora of benefits such as density, non-volatility, low-power and high speed,

Security engineering of nanostructures and nanomaterials

  • Shahrjerdi

Proliferation of electronics and their increasing connectivity pose formidable challenges
for information security. At the most fundamental level, nanostructures and nanomaterials
offer an unprecedented opportunity to introduce new approaches to …

Caffeine: towards uniformed representation and acceleration for deep convolutional neural networks

  • Zhang

With the recent advancement of multilayer convolutional neural networks (CNN), deep
learning has achieved amazing success in many areas, especially in visual content
understanding and classification. To improve the performance and energy-efficiency
of …

Re-architecting the on-chip memory sub-system of machine-learning accelerator for
embedded devices

  • Wang

The rapid development of deep learning is enabling plenty of novel applications
such as image and speech recognition for embedded systems, robotics or smart wearable
devices. However, typical deep learning models like deep convolutional neural …

A data locality-aware design framework for reconfigurable sparse matrix-vector multiplication

  • Li

Sparse matrix-vector multiplication (SpMV) is an important computational kernel in
many applications. For performance improvement, software libraries designated for
SpMV computation have been introduced, e.g., MKL library for CPUs and cuSPARSE library …

Compact oscillation neuron exploiting metal-insulator-transition for neuromorphic

  • Chen

The phenomenon of metal-insulator-transition (MIT) in strongly correlated oxides,
such as NbO2, has shown oscillation behavior in recent experiments. In this work, the MIT
based two-terminal device is proposed as a compact oscillation neuron for …

A new tightly-coupled transient electro-thermal simulation method for power electronics

  • Chen

This paper presents a new transient electro-thermal (ET) simulation method for fast
3D chip-level analysis of power electronics with field solver accuracy. The metallization
stacks are meshed and solved with 3D field solver using nonlinear temperature-…

A tensor-based volterra series black-box nonlinear system identification and simulation

  • Batselier

Tensors are a multi-linear generalization of matrices to their d-way counterparts, and are receiving intense interest recently due to their natural
representation of high-dimensional data and the availability of fast tensor decomposition
algorithms. …

Efficient statistical analysis for correlated rare failure events via asymptotic probability

  • Yu

In this paper, a novel Asymptotic Probability Approximation (APA) method is proposed
to estimate the overall rare probability of correlated failure events for complex
circuits containing a large number of replicated cells (e.g., SRAM bit-cells). The
key …

Duplex: simultaneous parameter-performance exploration for optimizing analog circuits

  • Ahmadyan
    Seyed Nematollah

We present Duplex random tree search, an algorithm to optimize performance metrics
of analog and mixed signal circuits. Duplex determines the optimal design, the Pareto
set and the sensitivity of circuit’s performance metrics to its parameters. We …

Improved flop tray-based design implementation for power reduction

  • Kahng
    Andrew B.

Clock network power reduction is critical in modern SoC designs. Application of flop trays (i.e., multi-bit flip-flops) can significantly reduce the number of sinks in a clock
network, and thus reduce the number of clock buffers, clock wirelength, and …

RC-aware global routing

  • Scheifele

We address the problem of incorporating RC delay constraints into global routing.
In contrast to the usual global routing approach that focuses on minimizing net length
while obeying constraints given by other tools such as layer assignments, our method

Scalable, high-quality, SAT-based multi-layer escape routing

  • Bayless

Escape routing for Printed Circuit Boards (PCBs) is an important problem arising from
modern packaging with large numbers of densely spaced pins, such as BGAs. Single-layer
escape routing has been well-studied, but large, dense BGAs often require …

Redistribution layer routing for integrated fan-out wafer-level chip-scale packages

  • Lin

The integrated fan-out (InFO) wafer-level chip-scale package (WLCSP) is an emerging
packaging technology, which typically consists of multiple redistribution layers (RDLs)
for signal redistributions among multiple chips. There is still no published work

The architecture value engine: measuring and delivering sustainable SoC improvement

  • Carballo

The value of semiconductor-based systems continues to increase rapidly especially
when considering the cost associated with building it. As such, Moore’s Law has become
a law associated broadly with value growth instead of pure performance growth. While

Circuit valorization in the IC design ecosystem

  • de Gyvez
    José Pineda

Staying at the forefront of research, or in the top tier product market requires circuit
innovation as a key differentiation. We are entering an era where more than Moore
is becoming increasingly evident, not only because of the physical limitations of

Interconnect-aware device targeting from PPA perspective

  • Badaroglu

CMOS scaling so far enabled simultaneous system throughput scaling by concurrent improvements
in delay, power, and area thanks to Moore’s law. CMOS scaling becomes more difficult
with the limits of interconnect and increasing wafer cost. It is …

Measuring progress and value of IC implementation technology

  • Kahng
    Andrew B.

Over the past decade, “Moore’s Law” has become increasingly well-understood as being
a law of “value scaling”: success of new electronics- and semiconductor-based products
depends on improved cost-efficiency, utility, and value. Design Automation (DA) …

Provably secure camouflaging strategy for IC protection

  • Li

The advancing of reverse engineering techniques has complicated the efforts in intellectual
property protection. Proactive methods have been developed recently, among which layout-level
IC camouflaging is the leading example. However, existing …

CamoPerturb: secure IC camouflaging for minterm protection

  • Yasin

Integrated circuit (IC) camouflaging is a layout-level technique that thwarts reverse
engineering attacks on ICs by introducing camouflaged cells that look alike, but can
implement one of many possible Boolean functions. Existing camouflaging techniques

Chip editor: leveraging circuit edit for logic obfuscation and trusted fabrication

  • Shakya

The globalization of the semiconductor foundry business poses grave risks in terms
of intellectual property (IP) protection, especially for critical applications. Over
the past few years, several techniques have been proposed that allow manufacturing
of …

Arbitrary streaming permutations with minimum memory and latency

  • Koehn

Streaming architectures are a popular choice for data-intensive applications due to
their high throughput requirements. When assembling components for a streaming application,
it is often necessary to build translation blocks between them to match the …

Multibank memory optimization for parallel data access in multiple data arrays

  • Yin

To realize high throughput out of a relatively low bandwidth, memory partitioning
algorithms have been proposed to separate data arrays into multiple memory banks,
from which multiple data can be accessed in parallel. However, previous partitioning

Allocation of multi-bit flip-flops in logic synthesis for power optimization

  • Yi

In this paper, a new approach to the problem of allocating multi-bit flip-flops for
data storage is presented. Previous approaches divide the allocation problem into
two separate steps: (i) placing single-bit flip-flops under circuit timing constraints

Model-based design of resource-efficient automotive control software

  • Chang

Automotive platforms today run hundreds of millions of lines of software code implementing
a large number of different control applications spanning across safety-critical functionality
to driver assistance and comfort-related functions. While such …

Testing automotive embedded systems under X-in-the-loop setups

  • Tibba

The development of automotive electronics and software systems is often associated
with high costs due to their multi-domain nature (including control engineering, electronics,
hydraulics, mechanics, etc). The involvement of these different disciplines …

Efficient statistical validation of machine learning systems for autonomous driving

  • Shi

Today’s automotive industry is making a bold move to equip vehicles with intelligent
driver assistance features. A modern automobile is now equipped with a powerful computing
platform to run multiple machine learning algorithms for environment …

CONVINCE: a cross-layer modeling, exploration and validation framework for next-generation connected

  • Zheng

Next-generation autonomous and semi-autonomous vehicles will not only perceive the
environment with their own sensors, but also communicate with other vehicles and surrounding
infrastructures for vehicle safety and transportation efficiency. The design, …

Overview of the 2016 CAD contest at ICCAD

  • Huang

The CAD Contest at ICCAD is a challenging, multi-month competition, focusing on advanced,
real-world problems in the field of Electronic Design Automation (EDA). In its fifth
year, the 2016 CAD Contest at ICCAD attracted 135 teams from 11 regions/…

ICCAD-2016 CAD contest in large-scale identical fault search

  • Wei

Injecting faults into designs is a way to qualify a verification environment. To improve
the performance of a qualifying process, we need to remove identical faults. The problem
will provide some faulty design cases; the contestants must identify all …

ICCAD-2016 CAD contest in non-exact projective NPNP boolean matching and benchmark

  • Wu
    Chi-An (Rocky)

Boolean Matching is significant to industry applications, such as library binding,
synthesis, engineer change order, and hardware Trojan detection. Instead of basic
Boolean matching, Non-exact Projective NPNP Boolean Matching allows matching two designs

ICCAD-2016 CAD contest in pattern classification for integrated circuit design space
analysis and benchmark suite

  • Topaloglu
    Rasit O.

Layout pattern classification has been utilized in recent years in integrated circuit
design towards various goals such as design space analysis, design rule generation,
and systematic yield optimization. There is a need for open source or academic …

OpenDesign flow database: the infrastructure for VLSI design and design automation

  • Jung

Recently, there have been a slew of design automation contests and released benchmarks.
ISPD place & route contests, DAC placement contests, timing analysis contests at TAU
and CAD contests at ICCAD are good examples in the past and more of new contests …

Malicious LUT: a stealthy FPGA trojan injected and triggered by the design flow

  • Krieg

We present a novel type of Trojan trigger targeted at the field-programmable gate
array (FPGA) design flow. Traditional triggers are based on rare events, such as rare values
or sequences. While in most cases these trigger circuits are able to hide a Trojan

On detecting delay anomalies introduced by hardware trojans

  • Ismari

A hardware Trojan (HT) detection method is presented that is based on measuring and
detecting small systematic changes in path delays introduced by capacitive loading
effects or series inserted gates of HTs. The path delays are measured using a high

An optimization-theoretic approach for attacking physical unclonable functions

  • Liu

Physical unclonable functions (PUFs) utilize manufacturing variations of circuit elements
to produce unpredictable response to any challenge vector. The attack on PUF aims
to predict the PUF response to all challenge vectors while only a small number of

LRR-DPUF: learning resilient and reliable digital physical unclonable function

  • Miao

Conventional silicon physical unclonable function (PUF) extracts fingerprints from
transistor’s analog attributes, which are vulnerable to environmental and operational
variations. Recently, digitalized PUF prototypes have emerged to overcome the …

Enabling online learning in lithography hotspot detection with information-theoretic
feature optimization

  • Zhang

With the continuous shrinking of technology nodes, lithography hotspot detection and
elimination in the physical verification phase is of great value. Recently machine
learning and pattern matching based methods have been extensively studied to overcome

Incorporating cut redistribution with mask assignment to enable 1D gridded design

  • Kuang

1D gridded design is one of the most promising solutions that can enable the scaling
to 10nm technology node and beyond. Line-end cuts are needed to fabricate 1D layouts, where
two techniques are available to resolve the conflicts between cuts: cut …

VCR: simultaneous via-template and cut-template-aware routing for directed self-assembly

  • Su

The directed self-assembly (DSA) technology for next-generation lithography has
shown great potential for fabricating highly dense via patterns and cut masks
in the sub-5 nm technology node and beyond. However, DSA via and cut optimizations

DSA-compliant routing for two-dimensional patterns using block copolymer lithography

  • Su

Two-dimensional (2D) directed self-assembly (DSA) is an emerging lithography for the
5 nm process node and beyond that can substantially increase design flexibility in
critical routing layers and reduce the number of cuts for better yield. The state-of-…

The art of semi-formal bug hunting

  • Nalla
    Pradeep Kumar

Verification is a critical task in the development of correct computing systems. Simulation
remains the predominantly used technique to identify design flaws, due to its scalability.
However, simulation intrinsically suffers from low functional coverage,…

Compiled symbolic simulation for SystemC

  • Herdt

Ensuring the correctness of SystemC virtual prototypes is indispensable. For such
models, existing symbolic simulation approaches are based on interpreting their behavior.
In this paper we propose a major enhancement called Compiled Symbolic Simulation (…

Exact diagnosis using boolean satisfiability

  • Riener

We propose an exact algorithm for model-free diagnosis with an application to fault
localization in digital circuits. We assume that a faulty circuit and a correctness
specification, e.g., in terms of an un-optimized reference circuit, are available.
Our …

Efficient and accurate analysis of single event transients propagation using SMT-based

  • Hamad
    Ghaith Bany

This paper presents a hierarchical framework to model, analyze, and estimate digital
design vulnerability to soft errors due to Single Event Transients (SETs). A new SET
propagation model is proposed. This model simultaneously includes the impact of …

Power delivery in 3D packages: current crowding effects, dynamic IR drop and compensation network using sensors (invited

  • Kannan

In 3D packages top-die power delivery is not only limited by back-end of line (technology
scaling), but also by the TSV integration scheme, the stacking method and the microbump
current-carrying capability. The microbump structure and its …

Cost analysis and cost-driven IP reuse methodology for SoC design based on 2.5D/3D

  • Stow

Due to the increasing fabrication and design complexity with new process nodes, the
cost per transistor trend originally identified in Moore’s Law is slowing when using
traditional integration methods. However, emerging die-level integration …

Energy-efficient and reliable 3D network-on-chip (NoC): architectures and optimization

  • Das

The Network-on-Chip (NoC) paradigm has emerged as an enabler for integrating a large
number of embedded cores in a single die. Three-dimensional (3D) integration, a breakthrough
technology to achieve “More Moore and More Than Moore,” provides numerous …

The hype, myths, and realities of testing 3D integrated circuits

  • Wang

Three-dimensional (3D) integration using through-silicon vias (TSVs) promises higher
integration levels in a single package, keeping pace with Moore’s law. Despite the
promise and benefits offered by 3D integration, testing remains a major obstacle that

TASA: toolchain-agnostic static software randomisation for critical real-time systems

  • Kosmidis

Measurement-Based Probabilistic Timing Analysis (MBPTA) derives WCET estimates for
tasks running on processors comprising high-performance features such as caches. MBPTA’s
correct application requires the system to exhibit certain timing properties, …

Splitting functions in code management on scratchpad memories

  • Kim

As the number of cores increases, cache-based memory hierarchy is becoming a major
problem in terms of the scalability and energy consumption. Software-managed scratchpad
memories (SPM) are a scalable alternative to caches, but the benefit comes at the …

Adaptive performance prediction for integrated GPUs

  • Gupta

Integrated GPUs have become an indispensable component of mobile processors due to
the increasing popularity of graphics applications. The GPU frequency is a key factor
both in application throughput and mobile processor power consumption under graphics

Energy-efficient fault tolerance approach for internet of things applications

  • Xu

Fault tolerance (FT) is essential in many Internet of Things (IoT) applications, in
particular in the domains such as medical devices and automotive systems where a single
fault in the system can lead to serious consequences. Non-volatile memory (NVM), …

Critical path isolation for time-to-failure extension and lower voltage operation

  • Masuda

Device miniaturization due to technology scaling has made manufacturing variability
and aging more significant, and lower supply voltage makes circuits sensitive to dynamic
environmental fluctuation. These may shorten the time to failure (TTF) of …

Control synthesis and delay sensor deployment for efficient ASV designs

  • Li

Adaptive Supply Voltage (ASV) is a power-efficient approach to achieving resilience
against process variation and circuit aging. Fine-grained ASV offers further power-efficiency
gains, but entails a relatively complex control circuit, which has not been …

Performance driven routing for modern FPGAs

  • Kannan

FPGA routing is a well studied problem. Basic point-to-point routing of nets on FPGA
fabrics can be done optimally using well known shortest path algorithms like Dijkstra’s
and A-star. Practical rip-up and reroute algorithms like PathFinder have been …

UTPlaceF: a routability-driven FPGA placer with physical and congestion aware packing

  • Li

FPGA packing and placement without routability consideration could lead to unroutable
results for high-utilization designs. Conventional FPGA packing and placement approaches
are shown to have severe difficulties to yield good routability. In this paper,…

RippleFPGA: a routability-driven placement for large-scale heterogeneous FPGAs

  • Pui

As the complexity and scale of FPGA circuits grow, resolving routing congestion becomes
more important in FPGA placement. In this paper, we propose a routability-driven placement
algorithm for large-scale heterogeneous FPGAs. Our proposed algorithm …

GPlace: a congestion-aware placement tool for ultrascale FPGAs

  • Pattison

Traditional FPGA flows that wait until the routing stage to tackle congestion are
quickly becoming less effective. This is due to the increasing size and complexity
of FPGA architectures and the designs targeted for them. In this paper, we present
two …

Resiliency in dynamically power managed designs

  • Lai

Dynamic power management has become essential for low power designs and systems. Whether
intentionally or unintentionally, these power reduction techniques and corresponding
management schemes can impact the hardware reliability and system resiliency in …

Dynamic reliability management for near-threshold dark silicon processors

  • Kim

In this article, we propose a new dynamic reliability management (DRM) technique
at the system level for emerging low power dark silicon manycore microprocessors operating
in near-threshold region. We mainly consider the electromigration (EM) failures. …

A cross-layer approach for resiliency and energy efficiency in near threshold computing

  • Golanbari
    M. S.

Energy constrained systems become the cornerstone of emerging energy harvested or
battery-limited applications in Internet of Things (IoT) platforms. A promising approach
is to operate at near threshold voltage ranges, which can significantly reduce …

Design space exploration of drone infrastructure for large-scale delivery services

  • Park

Drones, also referred to as unmanned aerial vehicles (UAVs), are recently expanding
their field of usage beyond military surveillance and tactical applications. Commercial
drone delivery service is one of the promising applications in the near future, …

Multi-objective design optimization for flexible hybrid electronics

  • Bhat

Flexible systems that can conform to any shape are desirable for wearable applications.
Over the past decade, there have been tremendous advances in the domain of flexible
electronics which enabled printing of devices, such as sensors on a flexible …

KCAD: kinetic cyber-attack detection method for cyber-physical additive manufacturing systems

  • Chhetri
    Sujit Rokka

Additive Manufacturing (AM) uses Cyber-Physical Systems (CPS) (e.g., 3D Printers)
that are vulnerable to kinetic cyber-attacks. Kinetic cyber-attacks cause physical
damage to the system from the cyber domain. In AM, kinetic cyber-attacks are realized
by …

Autonomous sensor-context learning in dynamic human-centered internet-of-things environments

  • Rokni
    Seyed Ali

Human-centered Internet-of-Things (IoT) applications utilize computational algorithms
such as machine learning and signal processing techniques to infer knowledge about
important events such as physical activities and medical complications. The …

Formulating customized specifications for resource allocation problem of distributed
embedded systems

  • Zhang

There are plentiful attempts for increasing the efficiency, generality and optimality
of the Design Space Exploration (DSE) algorithms for resource allocation problems
of distributed embedded systems. Most contemporary approaches formulate DSE as an

A polyhedral model-based framework for dataflow implementation on FPGA devices of
iterative stencil loops

  • Natale

Iterative Stencil Loops (ISLs) are a specific class of algorithms of great importance
for their substantial presence in a lot of industrial and scientific computing applications,
such as in numerical methods for solving partial differential equation —

Efficient memory compression in deep neural networks using coarse-grain sparsification
for speech applications

  • Kadetotad

Recent breakthroughs in deep neural networks have led to the proliferation of their
use in image and speech applications. Conventional deep neural networks (DNNs) are
fully-connected multi-layer networks with hundreds or thousands of neurons in each

Parallel code-specific CPU simulation with dynamic phase convergence modeling for
HW/SW co-design

  • Kemmerer

While SystemC models provide a promising solution to the complex problem of HW/SW
co-design within the system-on-chip paradigm, this requires a detailed annotation
of transaction level energy and performance data within the model. While this data
can be …

Architectural-space exploration of approximate multipliers

  • Rehman

This paper presents an architectural-space exploration methodology for designing approximate
multipliers. Unlike state-of-the-art, our methodology generates various design points
by adapting three key parameters: (1) different types of elementary …

Design of power-efficient approximate multipliers for approximate artificial neural

  • Mrazek

Artificial neural networks (NN) have shown significant promise in difficult tasks
like image classification or speech recognition. Even well-optimized hardware implementations
of digital NNs show significant power consumption. It is mainly due to non-…

Automated error prediction for approximate sequential circuits

  • Kapare

Synthesis tools for approximate sequential circuits require the ability to quickly,
efficiently, and automatically characterize and bound the errors produced by the circuits.
Previous approaches to characterize errors in approximate sequential circuits …

Approximation-aware rewriting of AIGs for error tolerant applications

  • Chandrasekharan

Approximation circuits offer superior performance (speed and area) compared to traditional
circuits at the cost of computational accuracy. The accuracy of the results in approximation
circuits is evaluated based on several error metrics such as worst-…

Properties first? a new design methodology for hardware, and its perspectives in safety

  • Urdahl

This paper discusses the possible role of formal verification techniques in system-level
design flows. It is argued that the role of formal verification techniques should
not be limited to “bug hunting” alone. Instead, formal technology should be …

Where formal verification can help in functional safety analysis

  • Bernardini

Formal techniques seem to be a way to cope with the exploding complexity of functional
safety analysis. Here, the overall fault propagation probability to a certain safety-point
in the design must be analyzed. As a consequence, the careful verification …

Formal approaches to design of active cell balancing architectures in battery management

  • Steinhorst

Large battery packs composed of Lithium-Ion cells are continuously gaining in importance
due to their applications in Electric Vehicles (EVs) and smart energy grids. To ensure
maximum lifetime, safety and performance of the battery pack, complex …

How much cost reduction justifies the adoption of monolithic 3D ICs at 7nm node?

  • Ku
    Bon Woong

In this paper we study power, performance, and cost (PPC) tradeoffs for 2-tier, gate-level,
full-chip GDS monolithic 3D ICs (M3D) built using a foundry-grade 7nm bulk FinFET
technology. We first develop highly-accurate wafer and die cost models for 2D …

A novel unified dummy fill insertion framework with SQP-based optimization method

  • Tao

Dummy fill insertion is widely applied to significantly improve the planarity of topographic
patterns for chemical mechanical polishing process in VLSI manufacture. However, these
dummies will lead to additional parasitic capacitance and deteriorate the …

Efficient yield estimation through generalized importance sampling with application
to NBL-assisted SRAM bitcells

  • Ciampolini

We consider the general problem of the efficient and accurate determination of the
yield of an integrated circuit, through electrical circuit level simulation, under
variability constraints due to the manufacturing process. We demonstrate the …

Are proximity attacks a threat to the security of split manufacturing of integrated

  • Magaña

Split manufacturing is a technique that allows manufacturing the transistor-level
and lower metal layers of an IC at a high-end, untrusted foundry, while manufacturing
only the higher metal layers at a smaller, trusted foundry. Using split manufacturing

Making split-fabrication more secure

  • Yang

Today many design houses must outsource their design fabrication to a third party
which is often an overseas foundry. Split-fabrication is proposed for combining the
FEOL capabilities of an advanced but untrusted foundry with the BEOL capabilities
of a …

A machine learning approach to fab-of-origin attestation

  • Ahmadi

We introduce a machine learning approach for distinguishing between integrated circuits
fabricated in a ratified facility and circuits originating from an unknown or undesired
source based on parametric measurements. Unlike earlier approaches, which …

OpenRAM: an open-source memory compiler

  • Guthaus
    Matthew R.

Computer systems research is often inhibited by the availability of memory designs.
Existing Process Design Kits (PDKs) frequently lack memory compilers, while expensive
commercial solutions only provide memory models with immutable cells, limited …

A hardware-based technique for efficient implicit information flow tracking

  • Shin

To access sensitive information, some recent advanced attacks have been successful
in exploiting implicit flows in a program in which sensitive data affects the control
path and in turn affects other data. To track the sensitive data through implicit

Imprecise security: quality and complexity tradeoffs for hardware information flow tracking

  • Hu

Secure hardware design is a challenging task that goes far beyond ensuring functional
correctness. Important design properties such as non-interference cannot be verified
on functional circuit models due to the lack of essential information (e.g., …

Encasing block ciphers to foil key recovery attempts via side channel

  • Agosta

Providing efficient protection against energy consumption based side channel attacks
(SCAs) for block ciphers is a relevant topic for the research community, as current
overheads are in the 100x range. Unprofiled SCAs exploit information leakage from

Security of neuromorphic computing: thwarting learning attacks using memristor’s obsolescence effect

  • Yang

Neuromorphic architectures are widely used in many applications for advanced data
processing, and often implement proprietary algorithms. In this work, we prevent
an attacker with physical access from learning the proprietary algorithm implemented
by …

Generation and use of statistical timing macro-models considering slew and load variability

  • Sinha

Timing macro-modeling captures the timing characteristics of a circuit in a compact
form for use in a hierarchical timing environment. At the same time, statistical timing
provides coverage of the impact from variability sources with the goal of …

TinySPICE plus: scaling up statistical SPICE simulations on GPU leveraging shared-memory based sparse
matrix solution techniques

  • Han

TinySPICE was a SPICE simulator on GPU developed to achieve dramatic speedups in statistical
simulations of small nonlinear circuits, such as standard cell designs and SRAMs.
While TinySPICE can perform circuit simulations much faster than traditional …

PieceTimer: a holistic timing analysis framework considering setup/hold time interdependency using
a piecewise model

  • Zhang
    Grace Li

In static timing analysis, clock-to-q delays of flip-flops are considered as constants.
Setup times and hold times are characterized separately and also used as constants.
The characterized delays, setup times and hold times, are applied in timing …

A fast layer elimination approach for power grid reduction

  • Yassine

Simulation and verification of the on-die power delivery network (PDN) is one of the
key steps in the design of integrated circuits (ICs). With the very large sizes of
modern grids, verification of PDNs has become very expensive and a host of techniques

A deterministic approach to stochastic computation

  • Jenson

Stochastic logic performs computation on data represented by random bit streams. The
representation allows complex arithmetic to be performed with very simple logic, but
it suffers from high latency and poor precision. Furthermore, the results are …

Control-fluidic CoDesign for paper-based digital microfluidic biochips

  • Wang

Paper-based digital microfluidic biochips (P-DMFBs) have recently emerged as a promising
low-cost and fast-responsive platform for biochemical assays. In P-DMFBs, electrodes
and control lines are printed on a piece of photo paper using inkjet printer …

Neural networks designing neural networks: multi-objective hyper-parameter optimization

  • Smithson
    Sean C.

Artificial neural networks have gone through a recent rise in popularity, achieving
state-of-the-art results in various fields, including image classification, speech
recognition, and automated control. Both the performance and computational complexity

Error recovery in a micro-electrode-dot-array digital microfluidic biochip

  • Li

A digital microfluidic biochip (DMFB) is an attractive technology platform for automating
laboratory procedures in biochemistry. However, today’s DMFBs suffer from several
limitations: (i) constraints on droplet size and the inability to vary droplet …

Privacy protection via appliance scheduling in smart homes

  • Wu

Smart grids, managed by intelligent devices, have demonstrated great potential to
help residential customers optimally schedule and manage the appliances’ energy
consumption. Due to the fine-grained power consumption information collected by smart

Framework designs to enhance reliable and timely services of disaster management systems

  • Shih

Fault tolerance is a fundamental requirement in the design of many cyber-physical
systems. Devices or sensors might have different requirements on their levels of reliability
and/or timely services in the composition of a cyber-physical system. …

Analysis of production data manipulation attacks in petroleum cyber-physical systems

  • Chen

Petroleum Cyber-Physical System (CPS) marks the beginning of a new chapter of the
oil and gas industry. Combining vast computational power with intelligent Computer
Aided Design (CAD) algorithms, petroleum CPS is capable of precisely modeling the
flow …

Security challenges in smart surveillance systems and the solutions based on emerging

  • Yang

Modern smart surveillance systems can not only record the monitored environment but
also identify the targeted objects and detect anomalous activities. These advanced functions
are often facilitated by deep neural networks, achieving very high accuracy …

Fast physics-based electromigration checking for on-die power grids

  • Chatterjee

Due to technology scaling, electromigration (EM) signoff has become increasingly difficult,
mainly due to the use of inaccurate methods for EM assessment, such as the empirical
Black’s model. In this paper, we present a novel approach for EM checking …

Exploring aging deceleration in FinFET-based multi-core systems

  • Cai

Power and thermal issues are the main constraints for high-performance multi-core systems.
As the current technology of choice, FinFET is observed to have lower delay under
higher temperature in super-threshold voltage region, an effect called …

An efficient and accurate algorithm for computing RC current response with applications
to EM reliability evaluation

  • Guan

In this paper, we propose a current waveform estimation algorithm for signal lines
without the necessity of SPICE simulation. Unlike previous methods, we do not use
function fitting or compute the effective capacitance. Instead, the proposed algorithm

Voltage-based electromigration immortality check for general multi-branch interconnects

  • Sun

As VLSI technology features are pushed to the limit with every generation and with
the introduction of new materials and increased current densities to satisfy the performance
demands, Electromigration (EM) is projected to be a key reliability issue for …

Exploiting randomness in sketching for efficient hardware implementation of machine
learning applications

  • Wang

Energy-efficient processing of large matrices for big-data applications using hardware
acceleration is an intense area of research. Sketching of large matrices into their
lower-dimensional representations is an effective strategy. For the first time, …

Making neural encoding robust and energy efficient: an advanced analog temporal encoder for brain-inspired computing systems

  • Zhao

The neural encoder is one of the key components in neuromorphic computing systems, whereby
sensory information is transformed into spike coded trains. The design of temporal
encoders has attracted widespread attention in the field of neuromorphic computing

Statistical methodology to identify optimal placement of on-chip process monitors
for predicting fmax

  • Mu

In the previous literature, many approaches use ring oscillators or other process monitors
to correlate with the chip’s maximum operating frequency (Fmax). But none of them focus
on the placement of these on-chip process monitors (OPMs) on a chip. The placement …

BugMD: automatic mismatch diagnosis for bug triaging

  • Mammo

System-level validation is the most challenging phase of design verification. A common
methodology in this context entails simulating the design under validation in lockstep
with a high-level golden model, while comparing the architectural state of the …

ODESY: a novel 3T-3MTJ cell design with optimized area DEnsity, scalability and latencY

  • Xue

The STT-RAM (Spin-Transfer Torque Magnetic RAM) technology is a promising candidate
for cache memory because of its high density, low standby power, and non-volatility.
As technology scales, especially under 40nm technology node, the read disturbance

Delay-optimal technology mapping for in-memory computing using ReRAM devices

  • Bhattacharjee

Recent propositions of diverse In-Memory Computing platforms have shown a promising
alternative to classical Von Neumann computing models. Significant benefits, in terms
of energy-efficiency and performance, are reported for in-memory arithmetic …

Reconfigurable in-memory computing with resistive memory crossbar

  • Zha

Driven by recent advances in resistive random-access memory (RRAM), there have been
growing interest in exploring an alternative computing concept, i.e., in-memory processing,
to address the classical von Neumann bottlenecks. Despite their great …

Exploiting ferroelectric FETs for low-power non-volatile logic-in-memory circuits

  • Yin

Numerous research efforts are targeting new devices that could continue performance
scaling trends associated with Moore’s Law and/or accomplish computational tasks with
less energy. One such device is the ferroelectric FET (FeFET), which offers the …

Approximation knob: power capping meets energy efficiency

  • Kanduri

Power Capping techniques are used to restrict power consumption of computer systems
to a thermally safe limit. Current many-core systems employ dynamic voltage and frequency
scaling (DVFS), power gating (PG) and scheduling methods as actuators for power …

IC thermal analyzer for versatile 3-D structures using multigrid preconditioned Krylov

  • Ladenheim

Thermal analysis is crucial for determining the propagation of heat and tracking the
formation of hot spots in advanced integrated circuit technologies. At the core of
the thermal analysis for integrated circuits is the numerical solution of the heat

BoostNoC: power efficient network-on-chip architecture for near threshold computing

  • Rajamanikkam

While near threshold design space provides a promising approach towards energy-efficient
computing, it is plagued by sub-optimal performance. Application characteristics and
hardware non-idealities of conventional architectures (optimized for the …

QScale: thermally-efficient QoS management on heterogeneous mobile platforms

  • Sahin

Single-ISA heterogeneous mobile processors integrate low-power and power-hungry CPU
cores together to combine energy efficiency with high performance. While running computationally
demanding applications, current power management and scheduling …

Synthesis of statically analyzable accelerator networks from sequential programs

  • Cheng

This paper describes a general framework for transforming a sequential program into
a network of processes, which are then converted to hardware accelerators through
high level synthesis. Also proposed is a complementing technique for performing static

Joint loop mapping and data placement for coarse-grained reconfigurable architecture
with multi-bank memory

  • Yin

Coarse-Grained Reconfigurable Architecture (CGRA) is a promising architecture with
high performance, high power-efficiency and attraction of flexibility. The compute-intensive
parts of an application (e.g. loops) are often mapped onto CGRA for …

Efficient synthesis of graph methods: a dynamically scheduled architecture

  • Minutoli

RDF databases naturally map to a graph representation and employ languages, such as
SPARQL, that implement queries as graph pattern matching routines. Graph methods
exhibit an irregular behavior: they present unpredictable, fine-grained data accesses,

Tier partitioning strategy to mitigate BEOL degradation and cost issues in monolithic
3D ICs

  • Samal
    Sandeep Kumar

In this paper, we develop a tier partitioning strategy to mitigate back-end-of-line
(BEOL) interconnect delay degradation and cost issues in monolithic 3D ICs (M3D).
First, we study the routing overhead and delay degradation caused by tungsten BEOL

Cascade2D: A design-aware partitioning approach to monolithic 3D IC with 2D commercial tools

  • Chang

Monolithic 3D IC (M3D) can continue to improve power, performance, area and cost beyond
traditional Moore’s law scaling limitations by leveraging the third-dimension and
fine-grained monolithic inter-tier vias (MIVs). Several recent studies present …

SAINT: handling module folding and alignment in fixed-outline floorplans for 3D ICs

  • Lin

Three-dimensional integrated circuits (3D ICs) offer significant improvements over
two-dimensional circuits in several aspects. Classic 3D floorplanning algorithms place
each module on a single die. However, power consumption and wirelength of a 3D IC

From biochips to quantum circuits: computer-aided design for emerging technologies

  • Wille

While previous decades have witnessed impressive accomplishments in the design and
realization of conventional computing devices, physical boundaries and cost restrictions
led to an increasing interest in alternative technologies (often referred to as

Multilevel design understanding: from specification to logic (invited paper)

  • Ray

We present an outline of the field of Multilevel Design Understanding by first defining
and motivating the related problems, and then describing the key issues which must
be addressed in future research.



OLAF’17: Third International Workshop on Overlay Architectures for FPGAs

  • So
    Hayden Kwok-Hay

The Third International Workshop on Overlay Architectures for FPGAs (OLAF) is held
in Monterey, California, USA, on February 22, 2017 and co-located with FPGA 2017:
The 25th ACM/SIGDA International Symposium on Field Programmable Gate Arrays. The
main …

SESSION: Special Session: The Role of FPGAs in Deep Learning

Session details: Special Session: The Role of FPGAs in Deep Learning

  • Ling

The Role of FPGAs in Deep Learning

  • Ling

Deep learning has garnered significant visibility recently as an Artificial Intelligence
(AI) paradigm, with success in wide ranging applications such as image and speech
recognition, natural language understanding, self-driving cars, and game playing (…

Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?

  • Nurvitadhi

Current-generation Deep Neural Networks (DNNs), such as AlexNet and VGG, rely heavily
on dense floating-point matrix multiplication (GEMM), which maps well to GPUs (regular
parallelism, high TFLOP/s). Because of this, GPUs are widely used for …

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

  • Zhao

Convolutional neural networks (CNN) are the current state-of-the-art for many computer
vision tasks. CNNs outperform older methods in accuracy, but require vast amounts
of computation and memory. As a result, existing CNN applications are typically run

Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural

  • Zhang

OpenCL FPGA has recently gained great popularity with emerging needs for workload
acceleration such as Convolutional Neural Network (CNN), which is the most popular
deep learning architecture in the domain of computer vision. While OpenCL enhances
the …

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared
Memory System

  • Zhang

We present a novel mechanism to accelerate state-of-art Convolutional Neural Networks
(CNNs) on CPU-FPGA platform with coherent shared memory. First, we exploit Fast Fourier
Transform (FFT) and Overlap-and-Add (OaA) to reduce the computational …

Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional
Neural Networks

  • Ma

As convolution layers contribute most operations in convolutional neural network (CNN)
algorithms, an effective convolution acceleration scheme significantly affects the
efficiency and performance of a hardware CNN accelerator. Convolution in CNNs …

SESSION: Machine Learning

Session details: Machine Learning

  • Cong

An OpenCL™ Deep Learning Accelerator on Arria 10

  • Aydonat

Convolutional neural nets (CNNs) have become a practical means to perform vision tasks,
particularly in the area of image classification. FPGAs are well known to be able
to perform convolutions efficiently, however, most recent efforts to run CNNs on …

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

  • Umuroglu

Research has shown that convolutional neural networks contain significant redundancy,
and high classification accuracy can be obtained even when weights and activations
are reduced from floating point to binary values. In this paper, we present FINN,
a …

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

  • Han

Long Short-Term Memory (LSTM) is widely used in speech recognition. In order to achieve
higher prediction accuracy, machine learning scientists have built increasingly larger
models. Such large models are both computation-intensive and memory-intensive. …

SESSION: Interconnect and Routing

Session details: Interconnect and Routing

  • Kaptanoglu

Quality-Time Tradeoffs in Component-Specific Mapping: How to Train Your Dynamically Reconfigurable Array of Gates with Outrageous Network-delays

  • Giesen

How should we perform component-specific adaptation for FPGAs? Prior work has demonstrated
that the negative effects of variation can be largely mitigated using complete knowledge
of device characteristics and full per-FPGA CAD flow. However, the cost …

Synchronization Constraints for Interconnect Synthesis

  • Rodionov

Interconnect synthesis tools ease the burden on the designer by automatically generating
and optimizing communication hardware. In this paper we propose a novel capability
for FPGA interconnect synthesis tools that further simplifies the designer’s …

Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion

  • Shen

FPGAs are increasingly popular as application-specific accelerators because they lead
to a good balance between flexibility and energy efficiency, compared to CPUs and
ASICs. However, the long routing time imposes a barrier on FPGA computing, which …

SESSION: Architecture

Session details: Architecture

  • Wilton

Don’t Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration

  • Yazdanshenas

While academic FPGA architecture exploration tools have become sufficiently advanced
to enable a wide variety of explorations and optimizations on soft fabric and routing,
support for Block RAM (BRAM) has been very limited. In this paper, we present …

Automatic Construction of Program-Optimized FPGA Memory Networks

  • Yang

Memory systems play a key role in the performance of FPGA applications. As FPGA deployments
move towards design entry points that are more serial, memory latency has become a
serious design consideration. For these applications, memory network …

NAND-NOR: A Compact, Fast, and Delay Balanced FPGA Logic Element

  • Huang

The And-Inverter Cone has been introduced as an alternative logic element to the look-up
table in FPGAs, since it improves their performance and resource utilization. However,
further analysis of the AIC design showed that it suffers from the delay …

120-core microAptiv MIPS Overlay for the Terasic DE5-NET FPGA board

  • Kumar H B

We design a 120-core 94MHz MIPS processor FPGA overlay interconnected with a lightweight
message-passing fabric that fits on a Stratix V GX FPGA (5SGXEA7N2F45C2). We use silicon-tested
RTL source code for the microAptiv MIPS processor made available …


SESSION: CAD Tools

Session details: CAD Tools

  • Shannon

A Parallelized Iterative Improvement Approach to Area Optimization for LUT-Based Technology

  • Liu

Modern FPGA synthesis tools typically apply a predetermined sequence of logic optimizations
on the input logic network before carrying out technology mapping. While the “known
recipes” of logic transformations often lead to improved mapping results, …

A Parallel Bandit-Based Approach for Autotuning FPGA Compilation

  • Xu

Mainstream FPGA CAD tools provide an extensive collection of optimization options
that have a significant impact on the quality of the final design. These options together
create an enormous and complex design space that cannot effectively be explored …

PANEL SESSION: Panel: FPGAs in the Cloud

Session details: Panel: FPGAs in the Cloud

  • Constantinides

FPGAs in the Cloud

  • Constantinides
    George A.

Ever greater amounts of computing and storage are happening remotely in the cloud,
and it is estimated that spending on public cloud services will grow by over 19%/year
to $140B in 2019. Besides commodity processors, network and storage infrastructure,

SESSION: High-Level Synthesis — Tools and Applications

Session details: High-Level Synthesis — Tools and Applications

  • Neuendorffer

Hardware Synthesis of Weakly Consistent C Concurrency

  • Ramanathan

Lock-free algorithms, in which threads synchronise not via coarse-grained mutual exclusion
but via fine-grained atomic operations (‘atomics’), have been shown empirically to
be the fastest class of multi-threaded algorithms in the realm of conventional …

A New Approach to Automatic Memory Banking using Trace-Based Address Mining

  • Zhou

Recent years have seen an increased deployment of FPGAs as programmable accelerators
for improving the performance and energy efficiency of compute-intensive applications.
A well-known “secret sauce” of achieving highly efficient FPGA acceleration is to

Dynamic Hazard Resolution for Pipelining Irregular Loops in High-Level Synthesis

  • Dai

Current pipelining approach in high-level synthesis (HLS) achieves high performance
for applications with regular and statically analyzable memory access patterns. However,
it cannot effectively handle infrequent data-dependent structural and data …

Accelerating Face Detection on Programmable SoC Using C-Based Synthesis

  • Srivastava
    Nitish Kumar

High-level synthesis (HLS) enables designing at a higher level of abstraction to effectively
cope with design complexity of emerging applications on modern programmable system-on-chip
(SoC). While HLS continues to evolve with a growing set of algorithms,…

Packet Matching on FPGAs Using HMC Memory: Towards One Million Rules

  • Rozhko

Packet processing systems increasingly need larger rulesets to satisfy the needs of
deep-network intrusion prevention and cluster computing. FPGA-based implementations
of packet processing systems have been proposed but their use of on-chip memory …

SESSION: Graph Processing Applications

Session details: Graph Processing Applications

  • Kapre

Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search

  • Zhang

Large graph processing has gained great attention in recent years due to its broad
applicability from machine learning to social science. Large real-world graphs, however,
are inherently difficult to process efficiently, not only due to their large …

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture

  • Dai

The performance of large-scale graph processing suffers from challenges including
poor locality, lack of scalability, random access pattern, and heavy data conflicts.
Some characteristics of FPGA make it a promising solution to accelerate various …

FPGA-Accelerated Transactional Execution of Graph Workloads

  • Ma

Many applications that operate on large graphs can be intuitively parallelized by
executing a large number of the graph operations concurrently and as transactions
to deal with potential conflicts. However, large numbers of operations occurring …

SESSION: Virtualization and Applications

Session details: Virtualization and Applications

  • Lockwood

Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center

  • Tarafdar

We present a framework for creating network FPGA clusters in a heterogeneous cloud
data center. The FPGA clusters are created using a logical kernel description describing
how a group of FPGA kernels are to be connected (independent of which FPGA these …

Energy Efficient Scientific Computing on FPGAs using OpenCL

  • Weller

An indispensable part of our modern life is scientific computing which is used in
large-scale high-performance systems as well as in low-power smart cyber-physical
systems. Hence, accelerators for scientific computing need to be fast and energy …

Secure Function Evaluation Using an FPGA Overlay Architecture

  • Fang

Secure Function Evaluation (SFE) has received considerable attention recently due
to the massive collection and mining of personal data over the Internet, but large
computational costs still render it impractical. In this paper, we leverage hardware

SESSION: Applications

Session details: Applications

  • Leeser

FPGA Acceleration for Computational Glass-Free Displays

  • He

The increasing computational power enables various new applications that are runtime
prohibitive before. FPGA is one of such computational power with both reconfigurability
and energy efficiency. In this paper, we demonstrate the feasibility of …

Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling

  • Huang

With the advent of several accurate and sophisticated statistical algorithms and pipelines
for DNA sequence analysis, it is becoming increasingly possible to translate raw sequencing
data into biologically meaningful information for further clinical …

POSTER SESSION: Poster Session 1

Measuring the Power-Constrained Performance and Energy Gap between FPGAs and Processors
(Abstract Only)

  • Ye
    Andy Gean

This work measures the performance and power consumption gap between the current generation
of low power FPGAs and low power microprocessors (microcontrollers) through an implementation
of the Canny edge detection algorithm. In particular, the algorithm …

A Mixed-Signal Data-Centric Reconfigurable Architecture enabled by RRAM Technology
(Abstract Only)

  • Zha

This poster presents a data-centric reconfigurable architecture, which is enabled
by emerging non-volatile memory, i.e., RRAM. Compared to the heterogeneous architecture
of commercial FPGAs, it is inherently a homogeneous architecture comprising of a …

A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming
Model (Abstract Only)

  • Wang

Iterative stencil algorithms find applications in a wide range of domains. FPGAs have
long been adopted for computation acceleration due to its advantages of dedicated
hardware design. Hence, FPGAs are a compelling alternative for executing iterative

Scala Based FPGA Design Flow (Abstract Only)

  • Liu

With the rapid growth of data scale, data analysis applications start to meet the
performance bottleneck, and thus requiring the aid of hardware acceleration. At the
same time, Field Programmable Gate Arrays (FPGAs), known for their high customizability

Thermal Flattening in 3D FPGAs Using Embedded Cooling (Abstract Only)

  • Deshpande

Thermal management is one of the key concerns in modern high power density chips.
A variety of thermal cooling techniques that have been in use in industrial applications
are now also being applied to integrated circuits. In this work, we explore the …

A Machine Learning Framework for FPGA Placement (Abstract Only)

  • Grewal

Many of the key stages in the traditional FPGA CAD flow require substantial amounts
of computational effort. Moreover, due to limited overlap among individual stages,
poor decisions made in earlier stages will often adversely affect the quality of …

Precise Coincidence Detection on FPGAs: Three Case Studies (Abstract Only)

  • Salomon

In high-performance applications, such as quantum physics and positron emission tomography,
precise coincidence detection is of central importance: The quality of the reconstructed
images depends on the accuracy with which the underlying system detects …

Towards Efficient Design Space Exploration of FPGA-based Accelerators for Streaming
HPC Applications (Abstract Only)

  • Koraei

Streaming HPC applications are data intensive and have widespread use in various fields
(e.g., Computational Fluid Dynamics and Bioinformatics). These applications consist
of different processing kernels where each kernel performs a specific computation

Accurate and Efficient Hyperbolic Tangent Activation Function on FPGA using the DCT
Interpolation Filter (Abstract Only)

  • Abdelsalam
    Ahmed M.

Implementing an accurate and fast activation function with low cost is a crucial aspect
to the implementation of Deep Neural Networks (DNNs) on FPGAs. We propose a high accuracy
approximation approach for the hyperbolic tangent activation function of …

An FPGA Overlay Architecture for Cost Effective Regular Expression Search (Abstract Only)

  • Luinaud

Snort and Bro are Deep Packet Inspection systems which express complex rules with
regular expressions. Before performing a regular expression search, these applications
apply a filter to select which regular expressions must be searched. One way to …

POSTER SESSION: Poster Session 2

Using Vivado-HLS for Structural Design: a NoC Case Study (Abstract Only)

  • Zhao

There have been ample successful examples of applying Xilinx Vivado’s “function-to-module”
high-level synthesis (HLS) where the subject is algorithmic in nature. In this work,
we carried out a design study to assess the effectiveness of applying Vivado-…

Automatic Generation of Hardware Sandboxes for Trojan Mitigation in Systems on Chip
(Abstract Only)

  • Bobda

Component based design is one of the preferred methods to tackle system complexity,
and reduce costs and time-to-market. Major parts of the system design and IC production
are outsourced to facilities distributed across the globe, thus opening the door …

Accelerating Financial Market Server through Hybrid List Design (Abstract Only)

  • Fu

The financial market server in exchanges aims to maintain the order books and provide
real time market data feeds to traders. Low-latency processing is in a great demand
in financial trading. Although software solutions provide the flexibility to …

Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level
Synthesis (Abstract Only)

  • Lu

High-Level Synthesis (HLS) has been widely recognized and accepted as an efficient
compilation process targeting FPGAs for algorithm evaluation and product prototyping.
However, the massively parallel memory access demands and the extremely expensive

A Batch Normalization Free Binarized Convolutional Deep Neural Network on an FPGA
(Abstract Only)

  • Nakahara

A pre-trained convolutional deep neural network (CNN) is a feed-forward computation
perspective, which is widely used for the embedded systems, requires high power-and-area
efficiency. This paper realizes a binarized CNN which treats only binary 2-…

A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural
Networks (Abstract Only)

  • Li

FPGA-based hardware accelerator for convolutional neural networks (CNNs) has obtained
great attentions due to its higher energy efficiency than GPUs. However, it has been
a challenge for FPGA-based solutions to achieve a higher throughput than GPU …

CPU-FPGA Co-Optimization for Big Data Applications: A Case Study of In-Memory Samtool Sorting (Abstract Only)

  • Cong

To efficiently process a tremendous amount of data, today’s big data applications
tend to distribute the datasets into multiple partitions, such that each partition
can be fit into memory and be processed by a separate core/server in parallel. Meanwhile,…

Stochastic-Based Multi-stage Streaming Realization of a Deep Convolutional Neural
Network (Abstract Only)

  • Alawad

Large-scale convolutional neural network (CNN), conceptually mimicking the operational
principle of visual perception in human brain, has been widely applied to tackle many
challenging computer vision and artificial intelligence applications. …

fpgaConvNet: Automated Mapping of Convolutional Neural Networks on FPGAs (Abstract Only)

  • Venieris
    Stylianos I.

In recent years, Convolutional Neural Networks (ConvNets) have become the state-of-the-art
in several Artificial Intelligence tasks. Across the range of applications, the performance
needs vary significantly, from high-throughput image recognition to …

POSTER SESSION: Poster Session 3

FPGA-based Hardware Accelerator for Image Reconstruction in Magnetic Resonance Imaging
(Abstract Only)

  • Pezzotti

Magnetic Resonance Imaging (MRI) is widely used in medical diagnostics. Sampling of
MRI data on Cartesian grids allows efficient computation of the Inverse Discrete Fourier
Transform for image reconstruction using the Inverse Fast Fourier Transform (…

Storage-Efficient Batching for Minimizing Bandwidth of Fully-Connected Neural Network
Layers (Abstract Only)

  • Shen

Convolutional neural networks (CNNs) are used to solve many challenging machine learning
problems. These networks typically use convolutional layers for feature extraction
and fully-connected layers to perform classification using those features. …

ASAP: Accelerated Short Read Alignment on Programmable Hardware (Abstract Only)

  • Banerjee
    Subho S.

The proliferation of high-throughput sequencing machines allows for the rapid generation
of billions of short nucleotide fragments in a short period. This massive amount of
sequence data can quickly overwhelm today’s storage and compute infrastructure. …

RxRE: Throughput Optimization for High-Level Synthesis using Resource-Aware Regularity Extraction
(Abstract Only)

  • Lotfi

Despite the considerable improvements in the quality of HLS tools, they still require
the designer’s manual optimizations and tweaks to generate efficient results, which
negates the HLS design productivity gains. Majority of designer interventions lead

GRT 2.0: An FPGA-based SDR Platform for Cognitive Radio Networks (Abstract Only)

  • Wu

Although there is explosive growth of theoretical research on cognitive radio, the
real-time platform for cognitive radio is progressing at a low pace. Researchers expect
fast prototyping their designs with appropriate wireless platforms to precisely …

FPGA Implementation of Non-Uniform DFT for Accelerating Wireless Channel Simulations
(Abstract Only)

  • Siripurapu

FPGAs have been used as accelerators in a wide variety of domains such as learning,
search, genomics, signal processing, compression, analytics and so on. In recent years,
the availability of tools and flows such as high-level synthesis has made it even

Learning Convolutional Neural Networks for Data-Flow Graph Mapping on Spatial Programmable
Architectures (Abstract Only)

  • Yin

Data flow graph (DFG) mapping is critical for the compiling of spatial programmable
architecture, where compilation time is a key factor for both time-to-market requirement
and mapping successful rate. Inspired from the great progress made in tree …

Cache Timing Attacks from The SoCFPGA Coherency Port (Abstract Only)

  • Chaudhuri

In this presentation we show that side-channels arising from micro-architecture of
SoCFPGAs could be a security risk. We present a FPGA trojan based on OpenCL which
performs cache-timing attacks through the accelerator coherency port (ACP) of a SoCFPGA.

Dynamic Partitioning for Library based Placement on Heterogeneous FPGAs (Abstract Only)

  • Mao

Library based design and IP reuses have been previously proposed to speed up the synthesis
of large-scale FPGA designs. However, existing methods result in large area wastage
due to the module size difference and the waste area inside each module. In …

An Energy-Efficient Design-Time Scheduler for FPGAs Leveraging Dynamic Frequency Scaling
Emulation (Abstract Only)

  • Loke
    Wei Ting

We present a design-time tool, EASTA, that combines the feature of reconfigurability
in FPGAs and Dynamic Frequency Scaling to realize an efficient multiprocessing scheduler
on a single-FPGA system. Multiple deadlines, reconvergent nodes, flow …


SESSION: Keynote 1

Session details: Keynote 1

  • Coskun

Why Is It So Hard to Make Secure Chips?

  • Witteman

Chip security has long been the domain of smart cards. These microcontrollers are
specifically designed to thwart many different attacks in order to deliver typical
security functions as payment cards, electronic passports, and access cards. With
the …

SESSION: Keynote 2

Session details: Keynote 2

  • Han

Design and Implementation of Real-Time Multi-sensor Vision Systems

  • Leblebici

Implementation of high performance multi-camera / multi-sensor imaging systems that
are required to produce real-time video output pose a large number of unique challenges
to conventional digital design based on general-purpose processors or GPUs. In …

SESSION: Keynote 3

Session details: Keynote 3

  • Margala

Medical Device Security: The First 165 Years

  • Fu

Today, it would be difficult to find medical device technology that does not critically
depend on computer software. Network connectivity and wireless communication has transformed
the delivery of patient care. The technology often enables patients to …

SESSION: Keynote 4

Session details: Keynote 4

  • Behjat

VLSI Design Methods for Low Power Embedded Encryption

  • Verbauwhede

Intelligent things, medical devices, vehicles and factories, all part of cyberphysical
systems, will only be secure if we can build devices that can perform the mathematically
demanding cryptographic operations in an efficient way. Unfortunately, many …

SESSION: Session 1: VLSI Circuits 1

Session details: Session 1: VLSI Circuits 1

  • Navabi

High-Speed Polynomial Multiplier Architecture for Ring-LWE Based Public Key Cryptosystems

  • Du

Many lattice-based cryptosystems are based on the security of the Ring learning with
errors (Ring-LWE) problem. The most critical and computationally intensive operation
of these Ring-LWE based cryptosystems is polynomial multiplication. In this paper,

Reduced Overhead Gate Level Logic Encryption

  • Juretus

Untrusted third-parties are found throughout the integrated circuit (IC) design flow
resulting in potential threats in IC reliability and security. Threats include IC
counterfeiting, intellectual property (IP) theft, IC overproduction, and the insertion

A Design of a Non-Volatile PMC-Based (Programmable Metallization Cell) Register File

  • Junsangsri

This paper presents the design of a non-volatile register file using cells made of
a SRAM and a Programmable Metallization Cell (PMC). The proposed cell is a symmetric
8T2P (8-transistors, 2PMC) design; it utilizes three control lines to ensure the …

A Clockless Sequential PUF with Autonomous Majority Voting

  • Xu

Physical unclonable functions (PUFs) leverage minute silicon process variations to
produce device-tied secret keys. The energy and area costs of creating keys from PUFs
can far exceed the costs of the basic PUF circuits alone. Minimizing the end-to-end

SESSION: Session 2: VLSI and Test

Session details: Session 2: VLSI and Test

  • Qian

Area-Efficient Error-Resilient Discrete Fourier Transformation Design using Stochastic

  • Yuan

Discrete Fourier Transformation (DFT)/Fast Fourier Transformation (FFT) are the widely
used techniques in numerous modern signal processing applications. In general, because
of their inherent multiplication-intensive characteristics, the hardware …

Concurrent Error Detection for Reliable SHA-3 Design

  • Luo

Cryptographic systems are vulnerable to random errors and injected faults. Soft errors
can inadvertently happen in critical cryptographic modules and attackers can inject
faults into systems to retrieve the embedded secret. Different schemes have been …

Secure Model Checkers for Network-on-Chip (NoC) Architectures

  • Boraten

As chip multiprocessors (CMPs) are becoming more susceptible to process variation,
crosstalk, and hard and soft errors, emerging threats from rogue employees in a compromised
foundry are creating new vulnerabilities that could undermine the integrity of …

Parameter-importance based Monte-Carlo Technique for Variation-aware Analog Yield

  • Kondamadugula

The Monte-Carlo method is the method of choice for accurate yield estimation. Standard
Monte-Carlo methods suffer from a huge computational burden even though they are very
accurate. Recently a Monte-Carlo method was proposed for the parametric yield …

SESSION: Session 3: VLSI Design 1

Session details: Session 3: VLSI Design 1

  • Thapliyal

Low Energy Sketching Engines on Many-Core Platform for Big Data Acceleration

  • Kulkarni

Almost 90% of the data available today was created within the last couple of years,
thus Big Data set processing is of utmost importance. Many solutions have been investigated
to increase processing speed and memory capacity, however I/O bottleneck is …

Low-Power Manycore Accelerator for Personalized Biomedical Applications

  • Page

Wearable personal health monitoring systems can offer a cost effective solution for
human healthcare. These systems must provide both highly accurate, secured and quick
processing and delivery of vast amount of data. In addition, wearable biomedical …

Hardware Security Threats and Potential Countermeasures in Emerging 3D ICs

  • Dofe

New hardware security threats are identified in emerging three-dimensional (3D) integrated
circuits (ICs) and potential countermeasures are introduced. Trigger and payload mechanisms
for future 3D hardware Trojans are predicted. Furthermore, a novel, …

Real-Time Analysis for Wormhole NoC: Revisited and Revised

  • Xiong

The network delay upper-bound analysis problem is of fundamental importance to real-time
applications in Network-on-Chip (NoC). In the paper, we revisit a state-of-the-art
analysis model for real-time communication in wormhole NoC with priority-based …

SESSION: Session 4: CAD 1

Session details: Session 4: CAD 1

  • Adegbija

A New Methodology for Noise Sensor Placement Based on Association Rule Mining

  • Hung

Due to near-threshold computing nowadays, voltage emergency is threatening our design
margins very seriously. Noise sensors are inserted in order to prevent various integrity
issues from happening during runtime. In this work, we use a new technique …

MCFRoute 2.0: A Redundant Via Insertion Enhanced Concurrent Detailed Router

  • Jia

In modern VLSI design, manufacturing yield and chip performance are seriously affected
by via failure. Redundant via insertion is an effective technique recommended by foundries
to deal with the via failure. However, due to the extreme scaling of …

Modular Placement for Interposer based Multi-FPGA Systems

  • Mao

Novel device with multiple FPGAs on-chip based on interposer interconnection has emerged
to resolve the IOs limit and improve the inter-FPGA communication delay. However,
new challenges arise for the placement on such architecture. Firstly, existing …

A Parallel Random Walk Solver for the Capacitance Calculation Problem in Touchscreen

  • Xu

In this paper, a random walk based solver is presented which calculates the capacitances
for verifying the touchscreen design. To suit the complicated conductor geometries
in touchscreen structures, we extend the floating random walk (FRW) method for …

POSTER SESSION: Poster Session 1

Session details: Poster Session 1

  • Moreshet

Real-Time Hardware Stereo Matching Using Guided Image Filter

  • Yang

Stereo matching is a key step in stereo vision systems that require high accurate
depth information and real-time processing of high definition image streams. This
work presents a high-accuracy hardware implementation for the stereo matching based
on …

Computing Complex Functions using Factorization in Unipolar Stochastic Logic

  • Liu

This paper addresses computing complex functions using unipolar stochastic logic.
Stochastic computing requires simple logic gates and is inherently fault-tolerant.
Thus, these structures are well suited for nanoscale CMOS technologies. Implementations

DCC: Double Capacity Cache Architecture for Narrow-Width Values

  • Imani

Modern caches are designed to hold 64-bits wide data, however a proportion of data
in the caches continues to be narrow width. In this paper, we propose a new cache
architecture which increases the effective cache capacity up to 2X for the systems
with …

Static Noise Margin based Yield Modelling of 6T SRAM for Area and Minimum Operating
Voltage Improvement using Recovery Techniques

  • Batra

In advanced technology nodes, the process variations deteriorate SRAM performance
and greatly affect yield. It is necessary to formulate yield estimation models to
optimize SRAMs and effectively trade-off area, performance and robustness. We propose

Asynchronous High Speed Serial Links Analysis using Integrated Charge for Event Detection

  • Dalakoti

We present a metric for event detection, targeted for the analysis of CMOS asynchronous
serial data links. Our metric is used to analyze signaling strategies that allow for
coincident or nearly coincident detection of both data and event timing. The …

Design and Comparative Evaluation of a Hybrid Cache Memory at Architectural Level

  • Wei

A hybrid memory cell usually consists of a Static Random Access Memory (SRAM) and
an embedded Dynamic Random Access Memory (eDRAM) cell; hybrid cells are particularly
suitable for cache design. A novel hybrid cache memory scheme (that has also non-…

A Sampling Clock Skew Correction Technique for Time-Interleaved SAR ADCs

  • Prashanth

A technique for sampling clock skew correction by adjusting the delay in the input
signal to each channel in a time-interleaved (TI) ADC is proposed. A proof-of-concept
TI ADC employing this technique was implemented in a 65 nm CMOS process. The four-…

Secure and Low-Overhead Circuit Obfuscation Technique with Multiplexers

  • Wang

Circuit obfuscation techniques have been proposed to conceal circuit’s functionality
in order to thwart reverse engineering (RE) attacks to integrated circuits (IC). We
believe that a good obfuscation method should have low design complexity and low …

Task-Resource Co-Allocation for Hotspot Minimization in Heterogeneous Many-Core NoCs

  • Reza
    Md Farhadur

To fully exploit the massive parallelism of many cores, this work tackles the problem
of mapping large-scale applications onto heterogeneous on-chip networks (NoCs) to
minimize the peak workload for energy hotspot avoidance. A task-resource co-…

Guiding Power/Quality Exploration for Communication-Intense Stream Processing

  • Tabkhi

In this paper, we explore the power/quality trade-off for streaming applications with
a shift from the computation to the communication aspects of the design. The paper
proposes a systematic exploration methodology to formulate and traverse power/…

SESSION: Session 5: Low Power 1

Session details: Session 5: Low Power 1

  • Savidis

Graphene-PLA (GPLA): a Compact and Ultra-Low Power Logic Array Architecture

  • Tenace

The key characteristics of the next generation of ICs for wearable applications include
high integration density, small area, low power consumption, high energy-efficiency,
reliability and enhanced mechanical properties like stretchability and …

A Metastability Immune Timing Error Masking Flip-Flop for Dynamic Variation Tolerance

  • Sannena

In this paper, two timing error masking flip-flops have been proposed, which are immune
to metastability. The proposed flip-flops exploit the concept of either delayed data
or pulse based approach to detect timing errors. The timing violations are …

Exploring Configurable Non-Volatile Memory-based Caches for Energy-Efficient Embedded

  • Adegbija

Non-volatile memory (NVM) technologies have recently emerged as alternatives to traditional
SRAM-based cache memories, since NVMs offer advantages such as non-volatility, low
leakage power, fast read speed, and high density. However, NVMs also have …

Multiple Attempt Write Strategy for Low Energy STT-RAM

  • Park

In this paper, we demonstrate an energy-reduction strategy that exploits the stochastic
switching characteristics of STT-RAM write operation and propose a multiple-attempt
write technique needed for it. In contrast to the traditional approach which uses

SESSION: Special Session 1: IoT Security: Issues, Innovations and Interplays

Session details: Special Session 1: IoT Security: Issues, Innovations and Interplays

  • Bhunia

Secret Sharing and Multi-user Authentication: From Visual Cryptography to RRAM Circuits

  • Arafin
    Md Tanvir

In this era of Internet of Things (IoT), connectivity exists everywhere, among everything
(including people) at all times. Therefore, security, trust, and privacy become crucial
to the design and implementation of IoT devices [12]. However, it is …

Defense Systems and IoT: Security Issues in an Era of Distributed Command and Control

  • Palmer

Security Meets Nanoelectronics for Internet of Things Applications

  • Rose
    Garrett S.

The internet of things (IoT) is quickly emerging as the next major domain for embedded
computer systems. Although the term IoT could be defined in a variety of different
ways, IoT always encompasses typically ordinary devices (e.g., thermostats and …

Tracking Data Flow at Gate-Level through Structural Checking

  • Le

The rapid growth of Internet-of-things and other electronic devices make a huge impact
on how and where data travel. The confidential data (e.g., personal data, financial
information) that travel through unreliable channels can be exposed to attackers.

SESSION: Session 6: Test 2

Session details: Session 6: Test 2

  • Yu

Design of Error-Resilient Logic Gates with Reinforcement Using Implications

  • Han

Operating circuits in the sub-threshold region can save power, but at the cost of
higher susceptibility to noise. This paper analyzes various gate-level error-mitigation
designs appropriate for sub-threshold circuits. Previous works have proposed a …

Reducing Soft-error Vulnerability of Caches using Data Compression

  • Mittal

With ongoing chip miniaturization and voltage scaling, particle strike-induced soft
errors present an increasingly severe threat to the reliability of on-chip caches. In
this paper, we present a technique to reduce the vulnerability of caches to soft-…

Workload-Aware Worst Path Analysis of Processor-Scale NBTI Degradation

  • Bian

As technology further scales semiconductor devices, aging-induced device degradation
has become one of the major threats to device reliability. In addition, aging mechanisms
like the negative bias temperature instability (NBTI) are known to be sensitive …

Enhancing Fault Emulation of Transient Faults by Separating Combinational and Sequential
Fault Propagation

  • Nyberg

We present a fault emulation environment capable of injecting single and multiple
transient faults in sequential as well as combinational logic. It is used to perform
fault injection campaigns during design verification of security circuits such as

SESSION: Session 7: VLSI Circuits 2

Session details: Session 7: VLSI Circuits 2

  • Li

A Novel On-Chip Impedance Calibration Method for LPDDR4 Interface between DRAM and

  • Choi

In this paper, a novel on-chip impedance calibration methodology for an LPDDR4 (low
power double data rate) application is proposed. The background calibration operates
to compensate mismatches and variations of the output NMOS drivers from process and

A General Sign Bit Error Correction Scheme for Approximate Adders

  • Zhou

Approximate computing is an emerging design technique for error-tolerant applications.
As adders are the key building blocks in many applications, approximate adders have
been widely studied recently. However, existing approximate adders may introduce …

RRAM Refresh Circuit: A Proposed Solution To Resolve The Soft-Error Failures For HfO2/Hf 1T1R RRAM Memory

  • Tosson
    Amr M.S.

RRAM-based memory is a promising emerging technology for both on-chip and stand-alone
non-volatile data storage in advanced technologies. In addition to its small dimensions,
the RRAM device has many technological advantages including its low-…

Exploratory Power Noise Models of Standard Cell 14, 10, and 7 nm FinFET ICs

  • Patel

The physical dimensions of standard cells constrain the dimensions of power networks,
affecting the on-chip power noise. An exploratory modeling methodology is presented
for estimating power noise in advanced technology nodes. The models are evaluated

8T1R: A Novel Low-power High-speed RRAM-based Non-volatile SRAM Design

  • Abdelwahed
    Amr M.S. Tosson

With continuous and aggressive technology scaling, suppressing the stand-by power
is among the top priorities for SRAM design. Switching off the less-frequently accessed
blocks is an efficient way to reduce the stand-by power, provided that the …

SESSION: Session 8: Emerging 1

Session details: Session 8: Emerging 1

  • Yuan

Polynomial Arithmetic Using Sequential Stochastic Logic

  • Saraf

We present the design of stochastic computing systems based on sequential logic to
implement arbitrary polynomial functions. Stochastic computing is an emerging alternative
computing paradigm that performs arithmetic operations on real-valued data …

Ultra-Robust Null Convention Logic Circuit with Emerging Domain Wall Devices

  • Bai

Despite many attractive advantages, Null Convention Logic (NCL) remains to be a niche
largely due to its high implementation costs. Using emerging spintronic devices,
this paper proposes a Domain-Wall-Motion-based NCL circuit design methodology that

Inter-Tier Crosstalk Noise On Power Delivery Networks For 3-D ICs With Inductively-Coupled

  • Papistas
    Ioannis A.

Inductive links have been proposed as an inter-tier interconnect solution for three-dimensional
(3-D) integrated systems. Combined with signal multiplexing, inductive links achieve
high communication bandwidth comparable to that of through silicon vias. …

Delay Estimates for Graphene Nanoribbons: A Novel Measure of Fidelity and Experiments with Global Routing Trees

  • Das

With extreme miniaturization of traditional CMOS devices in deep sub-micron design
levels, the delay of a circuit, as well as power dissipation and area are dominated
by interconnections between logic blocks. In an attempt to search for alternative

SESSION: Session 9: CAD 2

Session details: Session 9: CAD 2

  • Velev

VarDroid: Online Variability Emulation in Android/Linux Platforms

  • Mercati

Variability is the real big challenge for integrated circuits. Today, simulators help
to estimate the effect of variability, but fail to capture real workload dynamics
and user interactions, which are fundamental to mobile devices. This paper presents

Neural Network-based Prediction Algorithms for In-Door Multi-Source Energy Harvesting
System for Non-Volatile Processors

  • Liu

Due to size, longevity, safety, and recharging concerns, energy harvesting is becoming
a better choice for many wearable embedded systems than batteries. However, harvested
energy is intrinsically unstable. In order to overcome this drawback, non-…

A Unified Model of Power Sources for the Simulation of Electrical Energy Systems

  • Vinco

Models of power sources are essential elements in the simulation of systems that generate,
store and manage energy. In spite of the huge difference in power scale, they perform
a common function: converting a primary environmental quantity into power. …

Hardware-Accelerated Software Library Drivers Generation for IP-Centric SoC Designs

  • Jassi

In recent years, the semiconductor industry has been witnessing an increasing reuse
of hardware IPs for System-on-Chip (SoC) designs and embedded computing systems on
FPGA platforms with hard-core processors. The IP-reuse comes with an increasing …

Extracting Designs of Secure IPs Using FPGA CAD Tools

  • Mirian

In today’s competitive market, a company’s success is strongly dependent on delivering
sophisticated and state-of-the-art IPs prior to their competitors. To take a short
cut, a company may resort to reverse engineering or pirating their competitor’s IP.

SESSION: Special Session 3: Emerging Technology Devices and Security

Session details: Special Session 3: Emerging Technology Devices and Security

  • Rajendran

Security Primitive Design with Nanoscale Devices: A Case Study with Resistive RAM

  • Karam

Inherent stochastic physical mechanisms in emerging nonvolatile memories (NVMs), such
as resistive random-access-memory (RRAM), have recently been explored for hardware
security applications. Unlike the conventional silicon Physical Unclonable Functions

Enhancing Hardware Security with Emerging Transistor Technologies

  • Bi

We consider how the I-V characteristics of emerging transistors (particularly those
sponsored by STARnet) might be employed to enhance hardware security. An emphasis
of this work is to move beyond hardware implementations of physically unclonable …

The Applications of NVM Technology in Hardware Security

  • Yang

The emerging nonvolatile memory (NVM) technologies have demonstrated great potentials
in revolutionizing modern memory hierarchy because of their many promising properties:
nanosecond read/write time, small cell area, non-volatility, and easy CMOS …

Survey of Emerging Technology Based Physical Unclonable Functions

  • Bautista Adames
    Ilia A.

Authentication of electronic devices has become critical. Hardware authentication
is one way to enhance security of a chip. Along with software, it makes it harder
for an intruder to access any computer, smart-phone, or other devices without …

SESSION: Session 10: VLSI Design 2

Session details: Session 10: VLSI Design 2

  • Meyer

Trellis-search based Dynamic Multi-Path Connection Allocation for TDM-NoCs

  • Chen

This paper proposes a centralized approach for connection allocation for TDM-based
NoCs by making use of a dedicated hardware unit called NoCManager that employs a trellis-based
search algorithm enabling dynamic parallel multi-path, multi-slot allocation. …

Prolonging Lifetime of Non-volatile Last Level Caches with Cluster Mapping

  • Soltani

Recently, work has been done on using nonvolatile cells, such as Spin Transfer Torque
RAM (STT-RAM) or Magnetic RAM (M-RAM), to construct last level caches (LLC). These
structures mitigate the leakage power and density problem found in traditional SRAM

A Low-Power Network-on-Chip Architecture for Tile-based Chip Multi-Processors

  • Psarras

Technology scaling of tile-based CMPs reduces the physical size of each tile and
increases the number of tiles per die. This trend directly impacts the on-chip interconnect;
even though the tile population increases, the inter-tile link distances scale …

Dynamic Real-Time Scheduler for Large-Scale MPSoCs

  • Ruaro

Large-scale MPSoCs require a scalable and dynamic real-time (RT) task scheduler,
able to handle non-deterministic computational behaviors. Current proposals for MPSoCs
have limitations, such as lack of scalability, complex static steps, validation with …

SESSION: Special Session 4: Emerging Frontiers in Hardware Security

Session details: Special Session 4: Emerging Frontiers in Hardware Security

  • Joshi

Leveraging 3D Technologies for Hardware Security: Opportunities and Challenges

  • Gu

3D die stacking and 2.5D interposer design are promising technologies to improve integration
density, performance and cost. Current approaches face serious issues in dealing with
emerging security challenges such as side channel attacks, hardware …

POSTER SESSION: Poster Session 2

Session details: Poster Session 2

  • Tabkhi

FCM: Towards Fine-Grained GPU Power Management for Closed Source Mobile Games

  • Song

Contemporary mobile platforms employ embedded graphic processing units (GPUs) for
graphics-intensive games, and dynamic voltage and frequency scaling (DVFS) policies
are used to save energy without sacrificing quality. However, current GPU DVFS policies

Quality of Service-Aware, Scalable Cache Tuning Algorithm in Consumer-based Embedded

  • Alsafrjalani
    Mohamad Hammam

To meet energy and quality of service (QoS) constraints in consumer-based embedded
devices (CEDs), configurable caches can be tuned to a best configuration that consumes
the least amount of energy while adhering to QoS expectations. However, due to …

Temperature-aware Dynamic Voltage Scaling for Near-Threshold Computing

  • Kiamehr

Power/energy reduction is of uttermost importance for applications with stringent
power/energy budget such as ultra-low power and energy-harvested systems. Aggressive
voltage scaling and in particular Near-Threshold Computing (NTC) is a promising …

Leakage Power Minimization in Deep Sub-Micron Technology by Exploiting Positive Slacks
of Dependent Paths

  • Chakraborty
    Tuhin Subhra

Leakage power minimization is one of the key aspects of modern multi-million low power
system-on-chip (SoC) design. In post timing-closure phase, leakage-in-place-optimization
(LIPO) is generally adopted to reduce leakage power by swapping high-leaky …

An Enhanced Analytical Electrical Masking Model for Multiple Event Transients

  • Watkins

Due to the reducing transistor feature size, the susceptibility of modern circuits
to radiation induced errors has increased. This, as a result, has increased the likelihood
of multiple transients affecting a circuit. An important aspect when modeling …

Capturing True Workload Dependency of BTI-induced Degradation in CPU Components

  • Stamoulis

Atomistic-based approaches accurately model Bias Temperature Instability phenomena,
but they suffer from prolonged execution times, preventing their seamless integration
in system-level analysis flows. In this paper we present a comprehensive flow that

Performance Constraint-Aware Task Mapping to Optimize Lifetime Reliability of Manycore

  • Rathore

Negative bias temperature instability (NBTI) has emerged as a critical challenge to
lifetime reliability of computing systems. Traditionally, temperature-aware methodologies
are used to mitigate the impact of NBTI on aging and degradation of computing …

ASIC Implementation of An All-digital Self-adaptive PVTA Variation-aware Clock Generation

  • Pérez-Puigdemont

An all-digital self-adaptive clock generation system capable of autonomously adapting
the clock frequency to compensate the effects of static spatially heterogeneous (SSHet)
PVTA variations is presented. The design uses time-to-digital converters (TDCs) as

Ultra-Low Energy Reconfigurable Spintronic Threshold Logic Gate

  • Fan

This paper introduces a novel design of reconfigurable Spintronic Threshold Logic
Gate (STLG), which employs spintronic weight devices to perform current mode weighted
summation of binary inputs, whereas, the low voltage spintronic threshold device …

Red-Shield: Shielding Read Disturbance for STT-RAM Based Register Files on GPUs

  • Zhang

To address the high energy consumption issue of SRAM on GPUs, emerging Spin-Transfer
Torque (STT-RAM) memory technology has been intensively studied to build GPU register
files for better energy-efficiency, thanks to its benefits of low leakage power, …

Modeling and Study of Two-BDT-Nanostructure based Sequential Logic Circuits

  • Marthi

In this paper, study of different digital logic circuits developed using two-BDT ballistic
nanostructure is presented. New D flip-flop (DFF) based on the same nanostructure
is also proposed. The logic structure comprises two ballistic deflection …

SESSION: Session 11: Emerging 2

Session details: Session 11: Emerging 2

  • Dai

Exploring Main Memory Design Based on Racetrack Memory Technology

  • Hu

Emerging non-volatile memories (NVMs), which include PC-RAM and STT-RAM, have been
proposed to replace DRAM, mainly because they have better scalability and lower standby
power. However, previous research has demonstrated that these NVMs cannot …

An Offline Frequent Value Encoding for Energy-Efficient MLC/TLC Non-volatile Memories

  • Alsuwaiyan

This paper describes a low overhead, offline frequent value encoding (FVE) solution
to reduce the write energy in multi-level/triple-level cell (MLC/TLC) non-volatile
memories (NVMs). The proposed solution, which does not require any runtime software

Low-Power Multi-Port Memory Architecture based on Spin Orbit Torque Magnetic Devices

  • Bishnoi

Multi-port memories are widely used as shared memory, such as register files, in a
microprocessor system, and its number of ports and capacities are significantly increasing
with every product generation. However, with technology advancements, multi-…

Optimizing the Operating Voltage of Tunnel FET-Based SRAM Arrays Equipped with Read/Write
Assist Circuitry

  • Afzali-Kusha

This paper deals with obtaining the minimum operating voltage of memory arrays based
on TFET SRAM cells. First, we compare the I-V characteristics of two TFETs and one
FDSOI using SPICE simulations based on 20nm technology models. The results reveal

SESSION: Session 12: Low Power 2

Session details: Session 12: Low Power 2

  • Kim
    Kyung Ki

Approximate Differential Encoding for Energy-Efficient Serial Communication

  • Jahier Pagliari

Embedded computing systems include several off-chip serial links, that are typically
used to interface processing elements with peripherals, such as sensors, actuators
and I/O controllers. Because of the long physical lines of these connections, they

Fast Thermal Simulation using SystemC-AMS

  • Chen

Out of the many options available for thermal simulation of digital electronic systems,
those based on solving an RC equivalent circuit of the thermal network are the most
popular choice in the EDA community, as they provide a reasonable tradeoff …

Learning-Based Near-Optimal Area-Power Trade-offs in Hardware Design for Neural Signal

  • Aprile

Wireless implantable devices capable of monitoring the electrical activity of the
brain are becoming an important tool for understanding and potentially treating mental
diseases such as epilepsy and depression. While such devices exist, it is still …

Load Balanced On-Chip Power Delivery for Average Current Demand

  • Pathak

A dynamic power management system for homogeneous chip multi-processors (CMP) is proposed.
Each core of the CMP includes on chip DC-DC switching buck converters that are interconnected
through a switch network. The peak current rating of the buck …

DAC 2018 TOC

Ensemble learning for effective run-time hardware-based malware detection: a comprehensive analysis and classification

  • Sayadi

Malware detection at the hardware level has emerged recently as a promising solution
to improve the security of computing systems. Hardware-based malware detectors take
advantage of Machine Learning (ML) classifiers to detect pattern of malicious …

Deepsecure: scalable provably-secure deep learning

  • Rouhani
    Bita Darvish

This paper presents DeepSecure, a scalable and provably secure Deep Learning
(DL) framework that is built upon automated design, efficient logic synthesis, and
optimization methodologies. DeepSecure targets scenarios in which neither of the …

DWE: decrypting learning with errors with errors

  • Bian

The Learning with Errors (LWE) problem is a novel foundation of a variety of cryptographic
applications, including quantumly-secure public-key encryption, digital signature,
and fully homomorphic encryption. In this work, we propose an approximate …

Reverse engineering convolutional neural networks through side-channel information

  • Hua

A convolutional neural network (CNN) model represents a crucial piece of intellectual
property in many applications. Revealing its structure or weights would leak confidential
information. In this paper we present novel reverse-engineering attacks on …

OFTL: ordering-aware FTL for maximizing performance of the journaling file system

  • Park

Journaling of ext4 file system employs two FLUSH commands to make their data durable,
even though the FLUSH is more expensive than the ordinary write operations. In this
paper, to halve the number of FLUSH commands, we propose an efficient FTL, called

LAWN: boosting the performance of NVMM file system through reducing write amplification

  • Wang

Byte-addressable non-volatile memories can be used with DRAM to build a hybrid memory
system of volatile/non-volatile main memory (NVMM). NVMM file systems demand consistency
techniques such as logging and copy-on-write to guarantee data consistency in …

FastGC: accelerate garbage collection via an efficient copyback-based data migration in SSDs

  • Wu

Copyback is an advanced command contributing to accelerating data migration in garbage
collection (GC). Unfortunately, detecting copyback feasibility (whether copyback can
be carried out with assurable reliability) against data corruption in the …

Dynamic management of key states for reinforcement learning-assisted garbage collection
to reduce long tail latency in SSD

  • Kang

Garbage collection (GC) is one of the main causes of the long-tail latency problem in
storage systems. Long-tail latency due to GC is more than 100 times greater than the
average latency at the 99th percentile. Therefore, due to such a long tail latency, …

WB-trees: a meshed tree representation for FinFET analog layout designs

  • Lu

The emerging design requirements with the FinFET technology, along with traditional
geometrical constraints, make the FinFET-based analog placement even more challenging.
Previous works can handle only partial FinFET-induced design constraints because …

Analog placement with current flow and symmetry constraints using PCP-SP

  • Patyal

Modern analog placement techniques require consideration of current path and symmetry
constraints. The symmetry pairs can be efficiently packed using the symmetry island
configurations, but not all these configurations result in minimum gate …

Multi-objective bayesian optimization for analog/RF circuit synthesis

  • Lyu

In this paper, a novel multi-objective Bayesian optimization method is proposed for
the sizing of analog/RF circuits. The proposed approach follows the framework of Bayesian
optimization to balance the exploitation and exploration. Gaussian processes (…

Calibrating process variation at system level with in-situ low-precision transfer
learning for analog neural network processors

  • Jia

Process Variation (PV) may cause accuracy loss in analog neural network (ANN)
processors, make them hard to scale down, and degrade feasibility.
This paper first analyses the impact of PV on the performance of ANN chips. Then proposes

DPS: dynamic precision scaling for stochastic computing-based deep neural networks

  • Sim

Stochastic computing (SC) is a promising technique with advantages such as low-cost,
low-power, and error-resilience. However so far SC-based CNN (convolutional neural
network) accelerators have been kept to relatively small CNNs only, primarily due to …

Dyhard-DNN: even more DNN acceleration with dynamic hardware reconfiguration

  • Putic

Deep Neural Networks (DNNs) have demonstrated their utility across a wide range of
input data types, usable across diverse computing substrates, from edge devices to
datacenters. This broad utility has resulted in myriad hardware accelerator …

Exploring the programmability for deep learning processors: from architecture to tensorization

  • Chen

This paper presents an instruction and Fabric Programmable Neuron Array (iFPNA) architecture, its 28nm CMOS chip prototype, and a compiler for
the acceleration of a variety of deep learning neural networks (DNNs) including convolutional
neural networks (…

LCP: a layer clusters paralleling mapping method for accelerating inception and residual
networks on FPGA

  • Lin

Deep convolutional neural networks (DCNNs) have been widely used in various AI applications.
Inception and Residual are two promising structures adopted in many important modern
DCNN models, including AlphaGo Zero’s model. These structures allow …

Ares: a framework for quantifying the resilience of deep neural networks

  • Reagen

As the use of deep neural networks continues to grow, so does the fraction of compute
cycles devoted to their execution. This has led the CAD and architecture communities
to devote considerable attention to building DNN hardware. Despite these efforts,

DeepN-JPEG: a deep neural network favorable JPEG-based image compression framework

  • Liu

As one of the most fascinating machine learning techniques, deep neural network (DNN)
has demonstrated excellent performance in various intelligent tasks such as image
classification. DNN achieves such performance, to a large extent, by performing expensive

Thundervolt: enabling aggressive voltage underscaling and timing error resilience for energy efficient
deep learning accelerators

  • Zhang

Hardware accelerators are being increasingly deployed to boost the performance and
energy efficiency of deep neural network (DNN) inference. In this paper we propose
Thundervolt, a new framework that enables aggressive voltage underscaling of high-…

Loom: exploiting weight and activation precisions to accelerate convolutional neural networks

  • Sharify

Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs) is presented.
In LM every bit of data precision that can be saved translates to proportional performance
gains. For both weights and activations LM exploits profile-…

Parallelizing SRAM arrays with customized bit-cell for binary neural networks

  • Liu

Recent advances in deep neural networks (DNNs) have shown Binary Neural Networks (BNNs)
are able to provide a reasonable accuracy on various image datasets with a significant
reduction in computation and memory cost. In this paper, we explore two BNNs: …

An ultra-low energy internally analog, externally digital vector-matrix multiplier
based on NOR flash memory technology

  • Mahmoodi
    M. Reza

Vector-matrix multiplication (VMM) is a core operation in many signal and data processing
algorithms. Previous work showed that analog multipliers based on nonvolatile memories
have superior energy efficiency as compared to digital counterparts at low-…

Coding approach for low-power 3D interconnects

  • Bamberg

Through-silicon vias (TSVs) in 3D ICs show a significant power consumption, which
can be reduced using coding techniques. This work presents an approach which reduces
the TSV power consumption by a signal-aware bit assignment which includes inversions

A novel 3D DRAM memory cube architecture for space applications

  • Agnesina

The first mainstream products in 3D IC design are memory devices where multiple memory
tiers are horizontally integrated to offer manifold improvements compared with their
2D counterparts. Unfortunately, none of these existing 3D memory cubes are ready …

A general graph based pessimism reduction framework for design optimization of timing

  • Peng

In this paper, we develop a general pessimism reduction framework for design optimization
of timing closure. Although the modified graph based timing analysis (mGBA) slack
model can be readily formulated into a quadratic programming problem with …

Virtualsync: timing optimization by synchronizing logic waves with sequential and combinational
components as delay units

  • Zhang
    Grace Li

In digital circuit designs, sequential components such as flip-flops are used to synchronize
signal propagations. Logic computations are aligned at and thus isolated by flip-flop
stages. Although this fully synchronous style can reduce design efforts …

Noise-aware DVFS transition sequence optimization for battery-powered IoT devices

  • Luo

Low power system-on-chips (SoCs) are now at the heart of Internet-of-Things (IoT)
devices, which are well known for their bursty workloads and limited energy storage
— usually in the form of tiny batteries. To ensure battery lifetime, DVFS has become

Accurate processor-level wirelength distribution model for technology pathfinding
using a modernized interpretation of rent’s rule

  • Prasad

Faithful system-level modeling is vital to design and technology pathfinding, and
requires accurate representation of interconnects. In this study, Rent’s rule is modernized
to cater to advanced technology and design, and applied to derive a priori …

Semi-automatic safety analysis and optimization

  • Munk

The complexity of safety-critical E/E-systems within the automotive domain is continuously
increasing. At the same time, functional safety standards such as the ISO 26262 prescribe
analysis methods like the Fault Tree Analysis (FTA) and Failure Mode …

Reasoning about safety of learning-enabled components in autonomous cyber-physical

  • Tuncali
    Cumhur Erkan

We present a simulation-based approach for generating barrier certificate functions
for safety verification of cyber-physical systems (CPS) that contain neural network-based
controllers. A linear programming solver is utilized to find a candidate …

Runtime monitoring for safety of intelligent vehicles

  • Watanabe

Advanced driver-assistance systems (ADAS), autonomous driving, and connectivity have
enabled a range of new features, but also made automotive design more complex than
ever. Formal verification can be applied to establish functional correctness, but its …

Revisiting context-based authentication in IoT

  • Miettinen

The emergence of IoT poses new challenges towards solutions for authenticating numerous
very heterogeneous IoT devices to their respective trust domains. Using passwords
or pre-defined keys has drawbacks that limit their use in IoT scenarios. Recent …

MAXelerator: FPGA accelerator for privacy preserving multiply-accumulate (MAC) on cloud servers

  • Hussain
    Siam U.

This paper presents MAXelerator, the first hardware accelerator for privacy-preserving
machine learning (ML) on cloud servers. Cloud-based ML is being increasingly employed
in various data sensitive scenarios. While it enhances both efficiency and …

Hypernel: a hardware-assisted framework for kernel protection without nested paging

  • Kwon

Large OS kernels always suffer from attacks due to their numerous inherent vulnerabilities.
To protect the kernel, hypervisors have been employed by many security solutions.
However, relying on a hypervisor has a detrimental impact on the system …

Reducing the overhead of authenticated memory encryption using delta encoding and
ECC memory

  • Yitbarek
    Salessawi Ferede

Data stored in an off-chip memory, such as DRAM or non-volatile main memory, can potentially
be extracted or tampered with by an attacker with physical access to a device. Protecting against
such attacks requires storing message authentication codes and counters – …

Reducing time and effort in IC implementation: a roadmap of challenges and solutions

  • Kahng
    Andrew B.

To reduce time and effort in IC implementation, fundamental challenges must be solved.
First, the need for (expensive) humans must be removed wherever possible. Humans are
skilled at predicting downstream flow failures, evaluating key early decisions …

Efficient reinforcement learning for automating human decision-making in SoC design

  • Sadasivam

The exponential growth in PVT corners due to Moore’s law scaling, and the increasing
demand for consumer applications and longer battery life in mobile devices, has ushered
in significant cost and power-related challenges for designing and productizing …

Compensated-DNN: energy efficient low-precision deep neural networks by compensating quantization errors

  • Jain

Deep Neural Networks (DNNs) represent the state-of-the-art in many Artificial Intelligence
(AI) tasks involving images, videos, text, and natural language. Their ubiquitous
adoption is limited by the high computation and storage requirements of DNNs, …

Thermal-aware optimizations of reRAM-based neuromorphic computing systems

  • Beigi
    Majed Valad

ReRAM-based systems are attractive implementation alternatives for neuromorphic computing
because of their high speed and low design cost. In this work, we investigate the
impact of temperature on the ReRAM-based neuromorphic architectures and show how …

Compiler-guided instruction-level clock scheduling for timing speculative processors

  • Fan

Despite the significant promise that circuit-level timing speculation has for enabling
operation in marginal conditions, overheads associated with recovery prove to be a
serious drawback. We show that fine-grained clock adjustment guided by the compiler

SRAM based opportunistic energy efficiency improvement in dual-supply near-threshold

  • Gu

Energy-efficient microprocessors are essential for a wide range of applications. While
near-threshold computing is a promising technique to improve energy efficiency, optimal
supply demands from logic core and on-chip memory are conflicting. In this …

Enhancing workload-dependent voltage scaling for energy-efficient ultra-low-power
embedded systems

  • Mohan

Ultra-low-power (ULP) chipsets are in higher demand than ever due to the proliferation
of ULP embedded systems to support growing applications like the Internet of Things
(IoT), wearables and sensor networks. Since ULP systems are also cost constrained,

Efficient and reliable power delivery in voltage-stacked manycore system with hybrid
charge-recycling regulators

  • Zou

Voltage stacking (VS) fundamentally improves power delivery efficiency (PDE) by series-stacking
multiple voltage domains to eliminate explicit step-down voltage conversion and reduce
energy loss along the power delivery path. However, it suffers from …

Exact algorithms for delay-bounded steiner arborescences

  • Held

Rectilinear Steiner arborescences under linear delay constraints play an important
role for buffering. We present exact algorithms for either minimizing the total length
subject to delay constraints, or minimizing the total length plus the (weighted) …

Efficient multi-layer obstacle-avoiding region-to-region rectilinear steiner tree

  • Wang

As Engineering Change Order (ECO) has attracted substantial attention in modern VLSI
design, the open net problem, which aims at constructing a shortest obstacle-avoiding
path to reconnect the net shapes in an open net, becomes more critical in the ECO

Obstacle-avoiding open-net connector with precise shortest distance estimation

  • Fang

At the end of digital integrated circuit (IC) design flow, some nets may still be
left open due to engineering change order (ECO). Resolving these opens could be quite
challenging for some huge nets such as power ground nets because of a large number
of …

COSAT: congestion, obstacle, and slew aware tree construction for multiple power domain design

  • Lu

Slew fixing, which ensures correct signal propagation, is essential during timing
closure of IC design flow. Conventionally, gate sizing, Vt swapping, or buffer insertion
is adopted to locally fix the slew violation on a single gate. Nevertheless, when

A machine learning framework to identify detailed routing short violations from a
placed netlist

  • Tabrizi
    Aysa Fakheri

Detecting and preventing routing violations has become a critical issue in physical
design, especially in the early stages. Lack of correlation between global and detailed
routing congestion estimations and the long runtime required to frequently …

DSA-friendly detailed routing considering double patterning and DSA template assignments

  • Yu

As integrated circuit technology nodes continue to shrink, dense via distribution
becomes a severe challenge, requiring multiple masks to avoid spacing violations in
via layers. Meanwhile, the directed self-assembly (DSA) technique shows a great promise

Developing synthesis flows without human knowledge

  • Yu

Design flows are the explicit combinations of design transformations, primarily involved
in synthesis, placement and routing processes, to accomplish the design of Integrated
Circuits (ICs) and System-on-Chip (SoC). Mostly, the flows are developed based …

Efficient computation of ECO patch functions

  • Dao
    Ai Quoc

Engineering Change Orders (ECO) modify a synthesized netlist after its specification
has changed. ECO is divided into two major tasks: finding target signals whose functions
should be updated and synthesizing the patch that produces the desired change. …

Canonical computation without canonical representation

  • Mishchenko

A representation of a Boolean function is canonical if, given a variable order, only
one instance of the representation is possible for the function. A computation is
canonical if the result depends only on the Boolean function and a variable order, and …

SAT based exact synthesis using DAG topology families

  • Haaswijk

SAT based exact synthesis is a powerful technique, with applications in logic optimization,
technology mapping, and synthesis for emerging technologies. However, its runtime
behavior can be unpredictable and slow. In this paper, we propose to add a new …

Efficient batch statistical error estimation for iterative multi-level approximate
logic synthesis

  • Su

Approximate computing is an emerging energy-efficient paradigm for error-resilient
applications. Approximate logic synthesis (ALS) is an important field of it. To improve
the existing ALS flows, one key issue is to derive a more accurate and efficient …

BLASYS: approximate logic synthesis using boolean matrix factorization

  • Hashemi

Approximate computing is an emerging paradigm where design accuracy can be traded
off for benefits in design metrics such as design area, power consumption or circuit
complexity. In this work, we present a novel paradigm to synthesize approximate …

Optimized I/O determinism for emerging NVM-based NVMe SSD in an enterprise system

  • Kim

Non-volatile memory express (NVMe) over peripheral component interconnect express
(PCIe) has been adopted in the storage system to provide low latency and high throughput.
NVMe allows a host system to reduce latency because it offers a high parallel …

Improving runtime performance of deduplication system with host-managed SMR storage

  • Wu

Due to the cost consideration for data storage, high-areal-density shingled-magnetic-recording
(SMR) drives and data deduplication techniques are getting popular in many data storage
services for the improvement of profit per storage unit. However, …

Wear leveling for crossbar resistive memory

  • Wen

Resistive Memory (ReRAM) is an emerging non-volatile memory technology that has many
advantages over conventional DRAM. ReRAM crossbar has the smallest 4F² planar cell
size and thus is widely adopted for constructing dense memory with large capacity. …

RADAR: a 3D-reRAM based DNA alignment accelerator architecture

  • Huangfu

Next Generation Sequencing (NGS) technology has become an indispensable tool for studying
genomics, resulting in an exponential growth of biological data. Booming data volume
demands significant computational resources and creates challenges for ‘…

Mamba: closing the performance gap in productive hardware development frameworks

  • Jiang

Modern high-level languages bring compelling productivity benefits to hardware design
and verification. For example, hardware generation and simulation frameworks (HGSFs)
use a single “host” language for parameterization, static elaboration, test bench

ACED: a hardware library for generating DSP systems

  • Wang

Designers translate DSP algorithms into application-specific hardware via primitives
composed in various ways for different architectural realizations. Despite sharing
underlying algorithms and hardware constructs, designs are often difficult to reuse,

PARM: power supply noise aware resource management for NoC based
multicore systems in the dark silicon era

  • Raparti
    Venkata Yaswanth

Reliability is a major concern in chip multi-processors (CMPs) due to shrinking technology
and low operating voltages. Today’s processors designed at sub-10nm technology nodes
have high device densities and fast switching frequencies that cause …

Aging-constrained performance optimization for multi cores

  • Khdr

Circuit aging has become a dire design concern and hence it is considered a primary
design constraint. Current practice to cope with this problem is to apply (too) conservative

In contrast, we introduce a far less restrictive approach by …

A measurement system for capacitive PUF-based security enclosures

  • Obermaier

Battery-backed security enclosures that are permanently monitored for penetration
and tampering are common solutions for providing physical integrity to multi-chip
embedded systems. This paper presents a well-tailored measurement system for a

It’s hammer time: how to attack (rowhammer-based) DRAM-PUFs

  • Zeitouni

Physically Unclonable Functions (PUFs) are still considered promising technology as
building blocks in cryptographic protocols. While most PUFs require dedicated circuitry,
recent research leverages DRAM hardware for PUFs due to its intrinsic properties …

CamPUF: physically unclonable function based on CMOS image sensor fixed pattern noise

  • Kim

Physically unclonable functions (PUFs) have proved to be an effective measure for
secure device authentication and key generation. We propose a novel PUF design, named
CamPUF, based on commercial off-the-shelf CMOS image sensors, which are ubiquitously

Tamper-resistant pin-constrained digital microfluidic biochips

  • Tang

Digital microfluidic biochips (DMFBs)—an emerging technology that implements bioassays
through manipulation of discrete fluid droplets—are vulnerable to actuation tampering
attacks, where a malicious adversary modifies control signals for the …

Approximation-aware coordinated power/performance management for heterogeneous multi-cores

  • Kanduri

Run-time resource management of heterogeneous multi-core systems is challenging due
to i) dynamic workloads, that often result in ii) conflicting knob actuation decisions,
which potentially iii) compromise on performance for thermal safety. We present a

QoS-aware stochastic power management for many-cores

  • Pathania

A many-core processor can execute hundreds of multi-threaded tasks in parallel on
its 100s – 1000s of processing cores. When deployed in a Quality of Service (QoS)-based
system, the many-core must execute a task at a target QoS. The amount of processing

Employing classification-based algorithms for general-purpose approximate computing

  • Oliveira
    Geraldo F.

Approximate computing has recently reemerged as a design solution for additional performance
and energy improvements at the cost of output quality. In this paper, we propose using
a tree-based classification algorithm as an approximation tool for …

Using imprecise computing for improved non-preemptive real-time scheduling

  • Huang

Conventional hard real-time scheduling is often overly pessimistic due to the worst
case execution time estimation. The pessimism can be mitigated by exploiting imprecise
computing in applications where occasional small errors are acceptable. This …

A modular digital VLSI flow for high-productivity SoC design

  • Khailany

A high-productivity digital VLSI flow for designing complex SoCs is presented. The
flow includes high-level synthesis tools, an object-oriented library of synthesizable
SystemC and C++ components, and a modular VLSI physical design approach based on …

Basejump STL: systemverilog needs a standard template library for hardware design

  • Taylor
    Michael Bedford

We propose a Standard Template Library (STL) for synthesizeable SystemVerilog that
sharply reduces the time required to design digital circuits. We overview the principles
that underlie the design of the open-source BaseJump STL, including light-weight …

TRIG: hardware accelerator for inference-based applications and experimental demonstration
using carbon nanotube FETs

  • Hills

The energy efficiency demands of future abundant-data applications, e.g., those which
use inference-based techniques to classify large amounts of data, exceed the capabilities
of digital systems today. Field-effect transistors (FETs) built using …

OPERON: optical-electrical power-efficient route synthesis for on-chip signals

  • Liu

As VLSI technology scales to deep sub-micron, optical interconnect becomes an attractive
alternative for on-chip communication. The traditional optical routing works mainly
optimize the path loss, and few works explicitly exploit the optical-electrical …

Soft-FET: phase transition material assisted soft switching field effect transistor for supply
voltage droop mitigation

  • Teja

Phase Transition Material (PTM) assisted novel soft switching transistor architecture
named “Soft-FET” is proposed for supply voltage droop mitigation. By utilizing the
abrupt phase transition mechanism in PTMs, the proposed Soft-FET achieves soft …

Ultralow power acoustic feature-scoring using gaussian I-V transistors

  • Trivedi
    Amit Ranjan

This paper discusses energy-efficient acoustic feature-scoring using transistors with
Gaussian-shaped I_ds-V_gs. Acoustic feature-scoring is a critical step in speech
recognition tasks such as speaker recognition. Suited to the transistor, we discuss a …

Test cost reduction for X-value elimination by scan slice correlation analysis

  • Chae

X-values in test output responses corrupt an output response compaction and can cause
a fault coverage loss. X-Masking and X-Canceling MISR methods have been suggested to
eliminate X-values; however, there are control data volume and test time overhead …

Cross-layer fault-space pruning for hardware-assisted fault injection

  • Dietrich

With shrinking structure sizes, soft-error mitigation has become a major challenge
in the design and certification of safety-critical embedded systems. Their robustness
is quantified by extensive fault-injection campaigns, which on hardware level can

A machine learning based hard fault recuperation model for approximate hardware accelerators

  • Taher
    Farah Naz

Continuous pursuit of higher performance and energy efficiency has led to heterogeneous
SoC that contains multiple dedicated hardware accelerators. These accelerators exploit
the inherent parallelism of tasks and are often tolerant to inaccuracies in …

SOTERIA: exploiting process variations to enhance hardware security with photonic NoC architectures

  • Chittamuru
    Sai Vineel Reddy

Photonic networks-on-chip (PNoCs) enable high bandwidth on-chip data transfers by
using photonic waveguides capable of dense-wave-length-division-multiplexing (DWDM)
for signal traversal and microring resonators (MRs) for signal modulation. A Hardware

LEAD: learning-enabled energy-aware dynamic voltage/frequency scaling in NoCs

  • Clark

Network on Chips (NoCs) are the interconnect fabric of choice for multicore processors
due to their superiority over traditional buses and crossbars in terms of scalability.
While NoCs offer several advantages, they still suffer from high static and …

Subutai: distributed synchronization primitives in NoC interfaces for legacy parallel-applications

  • Cataldo

Parallel applications are essential for efficiently using the computational power
of a Multiprocessor System-on-Chip (MPSoC). Unfortunately, these applications do not
scale effortlessly with the number of cores because of synchronization operations
that …

Packet pump: overcoming network bottleneck in on-chip interconnects for GPGPUs

  • Cheng

In order to fully exploit GPGPU’s parallel processing power, on-chip interconnects
need to provide bandwidth efficient data communication. GPGPUs exhibit a many-to-few-to-many
traffic pattern which makes the memory controller connected routers the …

STASH: security architecture for smart hybrid memories

  • Swami

Whereas emerging non-volatile memories (NVMs) are low power, dense, scalable alternatives
to DRAM, the high latency and low endurance of these NVMs limit the feasibility of
NVM-only memory systems. Smart hybrid memories (SHMs) that integrate NVM, DRAM, …

ACME: advanced counter mode encryption for secure non-volatile memories

  • Swami

Modern computing systems that integrate emerging non-volatile memories (NVMs) are
vulnerable to classical security threats to data confidentiality (e.g., stolen DIMM
and bus snooping attacks) as well as new security threats to system availability (e.g.,

CASTLE: compression architecture for secure low latency, low energy, high endurance NVMs

  • Palangappa
    Poovaiah M.

CASTLE is a Compression-based main memory Architecture realizing a read-decrypt-free
(i.e., write-only) Secure solution for low laTency, Low Energy, high endurance non-volatile
memories (NVMs). CASTLE integrates pattern-based data compression and …

A collaborative defense against wear out attacks in non-volatile processors

  • Cronin

While the Internet of Things (IoT) keeps advancing, its full adoption is continually
blocked by power delivery problems. One promising solution is Non-Volatile (NV) processors,
which harvest energy for themselves and employ a NV memory hierarchy. This …

Protecting the supply chain for automotives and IoTs

  • Ray

Modern automotive systems and IoT devices are designed through a highly complex, globalized,
and potentially untrustworthy supply chain. Each player in this supply chain may (1)
introduce sensitive information and data (collectively termed “assets”) …

Reconciling remote attestation and safety-critical operation on simple IoT devices

  • Carpent

Remote attestation (RA) is a means of malware detection, typically realized as an
interaction between a trusted verifier and a potentially compromised remote device
(prover). RA is especially relevant for low-end embedded devices that are incapable
of …

Formal security verification of concurrent firmware in SoCs using instruction-level
abstraction for hardware

  • Huang

Formal security verification of firmware interacting with hardware in modern Systems-on-Chip
(SoCs) is a critical research problem. This faces the following challenges: (1) design
complexity and heterogeneity, (2) semantics gaps between software and …

Application level hardware tracing for scaling post-silicon debug

  • Pal

We present a method for selecting trace messages for post-silicon validation of Systems-on-a-Chips
(SoCs) with diverse usage scenarios. We model specifications of interacting flows
in typical applications. Our method optimizes trace buffer utilization …

Specification-driven automated conformance checking for virtual prototype and post-silicon

  • Gu

Due to the increasing complexity of System-on-Chip (SoC) design, how to ensure that
silicon implementations conform to their high-level specifications is becoming a major
challenge. To address this problem, we propose a novel specification-driven …

Formal micro-architectural analysis of on-chip ring networks

  • van Wesel

In the realm of Multi-Processor Systems-on-Chip (MPSoCs), the Network-on-Chip (NoC)
connecting all system components plays a crucial role in the overall correctness and
performance of the system. Recent papers have proposed several ring based NoC …

HFMV: hybridizing formal methods and machine learning for verification of analog and mixed-signal

  • Hu

With increasing design complexity and robustness requirement, analog and mixed-signal
(AMS) verification manifests itself as a key bottleneck. While formal methods and
machine learning have been proposed for AMS verification, these two techniques suffer

Cost-aware patch generation for multi-target function rectification of engineering
change orders

  • Zhang

The increasing system complexity makes engineering change order (ECO) mostly inevitable
and a common practice in integrated circuit design. Despite extensive research being
made, prior methods are not effectively applicable to instances where …

Modelling multicore contention on the AURIX™ TC27x

  • Díaz

Multicores are becoming ubiquitous in automotive. Yet, the expected benefits on integration
are challenged by multicore contention concerns on timing V&V. Worst-case execution
time (WCET) estimates are required as early as possible in the software …

Cache side-channel attacks and time-predictability in high-performance critical real-time

  • Trilla

Embedded computers control an increasing number of systems directly interacting with
humans, while also managing more and more personal or sensitive information. As a result,
both safety and security are becoming ubiquitous requirements in embedded …

Cross-layer dependency analysis with timing dependence graphs

  • Möstl

We present Non-interference Analysis as a model-based method to automatically reveal,
track and analyze end-to-end timing dependencies as part of a cross-layer dependency
analysis in complex systems. Based on revealed timing dependencies of functional …

Brook auto: high-level certification-friendly programming for GPU-powered automotive systems

  • Trompouki
    Matina Maria

Modern automotive systems require increased performance to implement Advanced Driving
Assistance Systems (ADAS). GPU-powered platforms are promising candidates for such
computational tasks, however current low-level programming models challenge the …

Dynamic vehicle software with AUTOCONT

  • Jakobs

Future automotive software needs to deal with an increasing level of dynamicity, reasoned
by the wish for connected driving, software updates, and dynamic feature activation.
Such functionalities cannot be properly realized with today’s classic AUTOSAR …

Automated interpretation and reduction of in-vehicle network traces at a large scale

  • Mrowca

In modern vehicles, high communication complexity requires cost-effective integration
tests such as data-driven system verification with in-vehicle network traces. With
the growing amount of traces, distributable Big Data solutions for analyses become

Atomlayer: a universal reRAM-based CNN accelerator with atomic layer computation

  • Qiao

Although ReRAM-based convolutional neural network (CNN) accelerators have been widely
studied, state-of-the-art solutions suffer from either incapability of training (e.g.,
ISAAC [1]) or inefficiency of inference (e.g., PipeLayer [2]) due to the …

Towards accurate and high-speed spiking neuromorphic systems with data quantization-aware
deep networks

  • Liu

Deep Neural Networks (DNNs) have gained immense success in cognitive applications
and greatly pushed today’s artificial intelligence forward. The biggest challenge
in executing DNNs is their extremely data-extensive computations. The computing …

CMP-PIM: an energy-efficient comparator-based processing-in-memory neural network accelerator

  • Angizi

In this paper, an energy-efficient and high-speed comparator-based processing-in-memory
accelerator (CMP-PIM) is proposed to efficiently execute a novel hardware-oriented
comparator-based deep neural network called CMPNET. Inspired by local binary …

SNrram: an efficient sparse neural network computation architecture based on resistive random-access

  • Wang

The sparsity in the deep neural networks can be leveraged by methods such as pruning
and compression to help the efficient deployment of large-scale deep neural networks
onto hardware platforms, such as GPU or FPGA, for better performance and power …

Long live TIME: improving lifetime for training-in-memory engines by structured gradient sparsification

  • Cai

Deeper and larger Neural Networks (NNs) have made breakthroughs in many fields, while
conventional CMOS-based computing platforms struggle to achieve higher energy efficiency.
RRAM-based systems provide a promising solution to build efficient Training-…

Hierarchical hyperdimensional computing for energy efficient classification

  • Imani

Brain-inspired Hyperdimensional (HD) computing emulates cognition tasks by computing
with hypervectors rather than traditional numerical values. In HD, an encoder maps
inputs to high dimensional vectors (hypervectors) and combines them to generate a

Dadu-P: a scalable accelerator for robot motion planning in a dynamic environment

  • Lian

As a critical operation in robotics, motion planning consumes lots of time and energy,
especially in a dynamic environment. Through approaches based on general-purpose processors,
it is hard to get a valid planning in real time. We present an …

Data prediction for response flows in packet processing cache

  • Yamaki

We propose a technique to reduce compulsory misses of packet processing cache (PPC),
which largely affects both throughput and energy of core routers. Rather than prefetching
data, our technique called response prediction cache (RPC) speculatively …

PULP-HD: accelerating brain-inspired high-dimensional computing on a parallel ultra-low power

  • Montagna

Computing with high-dimensional (HD) vectors, also referred to as hypervectors, is
a brain-inspired alternative to computing with scalars. Key properties of HD computing
include a well-defined set of arithmetic operations on hypervectors, generality,

Active forwarding: eliminate IOMMU address translation for accelerator-rich architectures

  • Fu

Accelerator-rich architectures employ IOMMUs to support unified virtual address, but
research shows that they fail to meet the performance and energy requirements of
accelerators. Instead of optimizing the speed/energy of IOMMU address translation,

SARA: self-aware resource allocation for heterogeneous MPSoCs

  • Song

In modern heterogeneous MPSoCs, the management of shared memory resources is crucial
in delivering end-to-end QoS. Previous frameworks have either focused on singular
QoS targets or the allocation of partitionable resources among CPU applications at

PEP: proactive checkpointing for efficient preemption on GPUs

  • Li

The demand for multitasking GPUs increases whenever the GPU may be shared by multiple
applications, either spatially or temporally. This requires that GPUs can be preempted
and switch context to a new application while already executing one. Unlike CPUs,…

FMMU: a hardware-accelerated flash map management unit for scalable performance of flash-based

  • Woo

Address translation is increasingly a performance bottleneck in flash-based SSDs (solid
state drives). We propose a hardware-accelerated flash map management unit called
FMMU to speed up the address translation. The FMMU operates in a non-blocking …

Minimizing write amplification to enhance lifetime of large-page flash-memory storage

  • Wang

Due to the decreasing endurance of flash chips, the lifetime of flash drives has become
a critical issue. To resolve this issue, various techniques such as wear-leveling
and error correction code have been proposed to reduce the bit error rates of flash

Proactive channel adjustment to improve polar code capability for flash storage devices

  • Hsu

With the low encoding/decoding complexity and the high error correction capability,
polar code with the support of list-decoding and cyclic redundancy check can outperform
LDPC code in the area of data communication. Thus, it also draws a lot of …

Achieving defect-free multilevel 3D flash memories with one-shot program design

  • Ho

To store the desired data on MLC and TLC flash memories, the conventional programming
strategies need to divide a fixed range of threshold voltage (Vt) window into several
parts. The narrowly partitioned Vt window in turn limits the design of …

Power-based side-channel instruction-level disassembler

  • Park

Modern embedded computing devices are vulnerable to malware and software piracy
due to insufficient security scrutiny and the complications of continuous patching.
To detect malicious activity as well as protect the integrity of executable …

Side-channel security of superscalar CPUs: evaluating the impact of micro-architectural features

  • Barenghi

Side-channel attacks are performed on increasingly complex targets, starting to threaten
superscalar CPUs supporting a complete operating system. The difficulty of both assessing
the vulnerability of a device to them, and validating the effectiveness of …

Electro-magnetic analysis of GPU-based AES implementation

  • Gao

In this work, for the first time, we investigate Electro-Magnetic (EM) attacks on
GPU-based AES implementation. In detail, we first sample EM traces using a delicate
trigger; then, we build a heuristic leakage model and a novel leakage model to exploit

GPU obfuscation: attack and defense strategies

  • Chakraborty

Conventional attacks against existing logic obfuscation techniques rely on the presence
of activated hardware for analysis. In reality, obtaining such activated chips may
not always be practical, especially if the on-chip test structures are …

Measurement-based cache representativeness on multipath programs

  • Milutinovic

Autonomous vehicles in embedded real-time systems increase critical-software size
and complexity whose performance needs are covered with high-performance hardware
features like caches, which however hampers obtaining WCET estimates that hold valid
for …

Resource-aware partitioned scheduling for heterogeneous multicore real-time systems

  • Han

Heterogeneous multicore processors have become popular computing engines for modern
embedded real-time systems recently. However, there is rather limited research on
the scheduling of real-time tasks running on heterogeneous multicore systems with

Response-time analysis of DAG tasks supporting heterogeneous computing

  • Serrano
    Maria A.

Hardware platforms are evolving towards parallel and heterogeneous architectures to
overcome the increasing necessity of more performance in the real-time domain. Parallel
programming models are fundamental to exploit the performance capabilities of …

Duet: an OLED & GPU co-management scheme for dynamic resolution adaptation

  • Lin

The increasingly high display resolution of mobile devices imposes a further burden
on energy consumption. Existing schemes manage either OLED or GPU power to save energy.
This paper presents the design, algorithm, and implementation of a co-managing …

RAMP: resource-aware mapping for CGRAs

  • Dave

Coarse-grained reconfigurable array (CGRA) is a promising solution that can accelerate
even non-parallel loops. Acceleration achieved through CGRAs critically depends on
the goodness of mapping (of loop operations onto the PEs of CGRA), and in …

An architecture-agnostic integer linear programming approach to CGRA mapping

  • Chin
    S. Alexander

Coarse-grained reconfigurable architectures (CGRAs) have gained traction as a potential
solution to implement accelerators for compute-intensive kernels, particularly in
domains requiring hardware programmability. Architecture and CAD for CGRAs are …

Dnestmap: mapping deeply-nested loops on ultra-low power CGRAs

  • Karunaratne

Coarse-Grained Reconfigurable Arrays (CGRAs) provide high performance, energy-efficient
execution of the innermost loops of an application. Most real-world applications,
however, comprise deeply-nested loops with complex and often irregular control

Locality aware memory assignment and tiling

  • Rogers

With the trend toward specialization, an efficient memory-path design is vital to
capitalize on customization in the data-path. A monolithic memory hierarchy is often highly
inefficient for irregular applications, traditionally targeted for CPUs. New …

GAN-OPC: mask optimization with lithography-guided generative adversarial nets

  • Yang

Mask optimization has been a critical problem in the VLSI design flow due to the mismatch
between the lithography system and the continuously shrinking feature sizes. Optical
proximity correction (OPC) is one of the prevailing resolution enhancement …

An efficient Bayesian yield estimation method for high dimensional and high sigma
SRAM circuits

  • Zhai

With increasing dimension of variation space and computationally intensive circuit simulation,
accurate and fast yield estimation of realistic SRAM chip remains a significant and
complicated challenge. In this paper, … Experiment results show that the …

RAIN: a tool for reliability assessment of interconnect networks—physics to software

  • Abbasinasab

In this paper, we study the main interconnect aging processes: electromigration, thermomigration
and stress migration and propose comprehensive yet compact models for transient and
steady states based on hydrostatic stress evolution. Our model can be …

A fast and robust failure analysis of memory circuits using adaptive importance sampling

  • Shi

Performance failure has become a growing concern for the robustness and reliability
of memory circuits. It is challenging to accurately estimate the extremely small failure
probability when failed samples are distributed in multiple disjoint failure …

SpWA: an efficient sparse winograd convolutional neural networks accelerator on FPGAs

  • Lu

FPGAs have been an efficient accelerator for CNN inference due to their high performance,
flexibility, and energy-efficiency. To improve the performance of CNNs on FPGAs, fast
algorithms and sparse methods emerge as the most attractive alternatives, which …

Efficient winograd-based convolution kernel implementation on edge devices

  • Xygkis

The implementation of Convolutional Neural Networks on edge Internet of Things (IoT)
devices is a significant programming challenge, due to the limited computational resources
and the real-time requirements of modern applications. This work focuses on …

An efficient kernel transformation architecture for binary- and ternary-weight neural
network inference

  • Zheng

While deep convolutional neural networks (CNNs) have emerged as the driving force
of a wide range of domains, their computationally and memory intensive natures hinder
the further deployment in mobile and embedded applications. Recently, CNNs with low-…

Content addressable memory based binarized neural network accelerator using time-domain
signal processing

  • Choi

Binarized neural network (BNN) is one of the most promising solutions for low-cost
convolutional neural network acceleration. Since BNN is based on binarized bit-level
operations, there exist great opportunities to reduce power-hungry data transfers
and …

A security vulnerability analysis of SoCFPGA architectures

  • Chaudhuri

SoCFPGAs or FPGAs integrated on the same die with chip multi processors have made
it to the market in the past years. In this article we analyse various security loopholes,
existing precautions and countermeasures in these architectures. We consider …

Raise your game for split manufacturing: restoring the true functionality through BEOL

  • Patnaik

Split manufacturing (SM) seeks to protect against piracy of intellectual property
(IP) in chip designs. Here we propose a scheme to manipulate both placement and routing
in an intertwined manner, thereby increasing the resilience of SM layouts. Key …

Analysis of security of split manufacturing using machine learning

  • Zhang

This work is the first to analyze the security of split manufacturing using machine
learning, based on data collected from layouts provided by industry, with 8 routing
metal layers, and significant variation in wire size and routing congestion across …

Inducing local timing fault through EM injection

  • Ghodrati

Electromagnetic fault injection (EMFI) is an efficient class of physical attacks that
can compromise the immunity of secure cryptographic algorithms. Despite successful
EMFI attacks, the effects of electromagnetic injection (EM) on a processor are not …

IAfinder: identifying potential implicit assumptions to facilitate validation in medical cyber-physical

  • Fu

According to the U.S. Food and Drug Administration (FDA) medical device recall database,
medical device recalls are at an all-time high. One of the major causes of the recalls
is due to implicit assumptions of which either the medical device operating …

An efficient timestamp-based monitoring approach to test timing constraints of cyber-physical

  • Mehrabian

Formal specification of the temporal behavior of Cyber-Physical Systems (CPS) is essential
for verification of performance and safety. Existing solutions for verifying the satisfaction
of temporal constraints on a CPS are compute and resource intensive …

Runtime adjustment of IoT system-on-chips for minimum energy operation

  • Golanbari
    Mohammad Saber

Energy-constrained Systems-on-Chips (SoC) are becoming major components of many emerging
applications, especially in the Internet of Things (IoT) domain. Although the best
energy efficiency is achieved when the SoC operates in the near-threshold region, …

Edge-cloud collaborative processing for intelligent internet of things: a case study on smart surveillance

  • Mudassar
    Burhan A.

Limited processing power and memory prevent realization of state of the art algorithms
on the edge level. Offloading computations to the cloud comes with tradeoffs as compression
techniques employed to conserve transmission bandwidth and energy …

Bandwidth-efficient deep learning

  • Han

Deep learning algorithms are achieving increasingly higher prediction accuracy on
many machine learning tasks. However, applying brute-force programming to data demands
a huge amount of machine power to perform training and inference, and a huge amount …

Co-design of deep neural nets and neural net accelerators for embedded vision applications

  • Kwon

Deep Learning is arguably the most rapidly evolving research area in recent years.
As a result it is not surprising that the design of state-of-the-art deep neural net
models proceeds without much consideration of the latest hardware targets, and the …

Generalized augmented lagrangian and its applications to VLSI global placement

  • Zhu

Global placement dominates the circuit placement process in its solution quality and
efficiency. With increasing design complexity and various design constraints, it is
desirable to develop an efficient, high-quality global placement algorithm for …

Routability-driven and fence-aware legalization for mixed-cell-height circuits

  • Li

Placement is one of the most critical stages in the physical synthesis flow. Circuits
with increasing numbers of cells of multi-row height have brought challenges to traditional
placers on efficiency and effectiveness. Furthermore, constraints on fence …

PlanarONoC: concurrent placement and routing considering crossing minimization for optical networks-on-chip

  • Chuang

Optical networks-on-chips (ONoCs) have become a promising solution for the on-chip
communication of multi- and many-core systems to provide superior communication bandwidths,
efficiency in power consumption, and latency performance compared to electronic …

Similarity-aware spectral sparsification by edge filtering

  • Feng

In recent years, spectral graph sparsification techniques that can compute ultra-sparse
graph proxies have been extensively studied for accelerating various numerical and
graph-related applications. Prior nearly-linear-time spectral sparsification …

S2FA: an accelerator automation framework for heterogeneous computing in datacenters

  • Yu
    Cody Hao

Big data analytics using the JVM-based MapReduce framework has become a popular approach
to address the explosive growth of data sizes. Adopting FPGAs in datacenters as accelerators
to improve performance and energy efficiency also attracts increasing …

Automated accelerator generation and optimization with composable, parallel and pipeline

  • Cong

CPU-FPGA heterogeneous architectures feature flexible acceleration of many workloads
to advance computational capabilities and energy efficiency in today’s datacenters.
This advantage, however, is often overshadowed by the poor programmability of FPGAs.

TAO: techniques for algorithm-level obfuscation during high-level synthesis

  • Pilato

Intellectual Property (IP) theft costs semiconductor design companies billions of
dollars every year. Unauthorized IP copies start from reverse engineering the given
chip. Existing techniques to protect against IP theft aim to hide the IC’s …

Extracting data parallelism in non-stencil kernel computing by optimally coloring
folded memory conflict graph

  • Escobedo

Irregular memory access pattern in non-stencil kernel computing renders the well-known
hyperplane- [1], lattice- [2], or tessellation-based [3] HLS techniques ineffective.
We develop an elegant yet effective technique that synthesizes memory-optimal …

SMApproxlib: library of FPGA-based approximate multipliers

  • Ullah

The main focus of the existing approximate arithmetic circuits has been on ASIC-based
designs. However, due to the architectural differences between ASICs and FPGAs, comparable
performance gains cannot be achieved for FPGA-based systems by using the …

Sign-magnitude SC: getting 10X accuracy for free in stochastic computing for deep neural networks

  • Zhakatayev

Stochastic computing (SC) is a promising computing paradigm for applications with
low precision requirement, stringent cost and power restriction. One known problem
with SC, however, is the low accuracy especially with multiplication. In this paper
we …

Area-optimized low-latency approximate multipliers for FPGA-based hardware accelerators

  • Ullah

The architectural differences between ASICs and FPGAs limit the effective performance
gains achievable by the application of ASIC-based approximation principles for FPGA-based
reconfigurable computing systems. This paper presents a novel approximate …

Approximate on-the-fly coarse-grained reconfigurable acceleration for general-purpose

  • Brandalero

Approximate functional unit designs have the potential to reduce power consumption
significantly compared to their precise counterparts; however, few works have investigated
composing them to build generic accelerators. In this work, we do a design-…

LEMAX: learning-based energy consumption minimization in approximate computing with quality

  • Akhlaghi

Approximate computing aims to trade accuracy for energy efficiency. Various approximate
methods have been proposed in the literature that demonstrate the effectiveness of
relaxing accuracy requirements in a specific unit. This provides a basis for …

PIMA-logic: a novel processing-in-memory architecture for highly flexible and energy-efficient
logic computation

  • Angizi

In this paper, we propose PIMA-Logic, as a novel Processing-in-Memory
Architecture for highly flexible and efficient Logic computation. Instead of
integrating complex logic units in cost-sensitive memory, PIMA-Logic …

Columba S: a scalable co-layout design automation tool for microfluidic large-scale integration

  • Tseng

Microfluidic large-scale integration (mLSI) is a promising platform for high-throughput
biological applications. Design automation for mLSI has made much progress in recent
years. Columba and its succeeding work Columba 2.0 proposed a mathematical …

Design-for-testability for continuous-flow microfluidic biochips

  • Liu

Flow-based microfluidic biochips are gaining traction in the microfluidics community
since they enable efficient and low-cost biochemical experiments. These highly integrated
lab-on-a-chip systems, however, suffer from manufacturing defects, which cause …

Design and architectural co-optimization of monolithic 3D liquid state machine-based
neuromorphic processor

  • Ku
    Bon Woong

A liquid state machine (LSM) is a powerful recurrent spiking neural network shown
to be effective in various learning tasks including speech recognition. In this work,
we investigate design and architectural co-optimization to further improve the area-…

Enabling a new era of brain-inspired computing: energy-efficient spiking neural network with ring topology

  • Bai

Reservoir computing, an emerging computing paradigm, has proven its benefit to
multifarious applications. In this work, we successfully designed and fabricated an
analog delayed feedback reservoir (DFR) chip. Measurement results demonstrate its
rich …

A neuromorphic design using chaotic mott memristor with relaxation oscillation

  • Yan

The recently proposed nanoscale Mott memristor features negative differential resistance
and chaotic dynamics. This work proposes a novel neuromorphic computing system that
utilizes Mott memristors to simplify peripheral circuitry. According to the …

DrAcc: a DRAM based accelerator for accurate CNN inference

  • Deng

Modern Convolutional Neural Networks (CNNs) are computation and memory intensive.
Thus it is crucial to develop hardware accelerators to achieve high performance as
well as power/energy-efficiency on resource limited embedded systems. DRAM-based CNN …

On-chip deep neural network storage with multi-level eNVM

  • Donato

One of the biggest performance bottlenecks of today’s neural network (NN) accelerators
is off-chip memory accesses [11]. In this paper, we propose a method to use multi-level,
embedded nonvolatile memory (eNVM) to eliminate all off-chip weight accesses. …

Closed yet open DRAM: achieving low latency and high performance in DRAM memory systems

  • Subramanian

DRAM memory access is a critical performance bottleneck. To access one cache block,
an entire row needs to be sensed and amplified, data restored into the bitcells and
the bitlines precharged, incurring high latency. Isolating the bitlines and sense …

VRL-DRAM: improving DRAM performance via variable refresh latency

  • Das

A DRAM chip requires periodic refresh operations to prevent data loss due to charge
leakage in DRAM cells. Refresh operations incur significant performance overhead as
a DRAM bank/rank becomes unavailable to service access requests while being …

Enabling union page cache to boost file access performance of NVRAM-based storage

  • Chen

Due to the fast access performance, byte-addressability, and non-volatility of non-volatile
random access memory (NVRAM), NVRAM has emerged as a popular candidate for the design
of memory/storage systems on mobile computing systems. For example, the …

FLOSS: FLOw sensitive scheduling on mobile platforms

  • Zhang

Today’s mobile platforms have grown in sophistication to run a wide variety of frame-based
applications. To deliver better QoS and energy efficiency, these applications utilize
multi-flow execution, which exploits hardware-level parallelism across …

Context-aware dataflow adaptation technique for low-power multi-core embedded systems

  • Jung

Today’s embedded systems operate under increasingly dynamic conditions. First, computational
workloads can be either fluctuating or adjustable. Moreover, as many devices are battery-powered,
it is common to have runtime power management technique, which …

Architecture decomposition in system synthesis of heterogeneous many-core systems

  • Richthammer

Determining feasible application mappings for Design Space Exploration (DSE) and run-time
embedding is a challenge for modern many-core systems. The underlying NP-complete
system-synthesis problem faces tremendously complex problem instances due to the …

NNsim: fast performance estimation based on sampled simulation of GPGPU kernels for neural

  • Kang

Existing GPU simulators are too slow to use for neural networks implemented in GPUs.
For fast performance estimation, we propose a novel hybrid method of analytical performance
modeling and sampled simulation of GPUs. By taking full advantage of …

STAFF: online learning with stabilized adaptive forgetting factor and feature selection algorithm

  • Gupta

Dynamic resource management techniques rely on power consumption and performance models
to optimize the operating frequency and utilization of processing elements, such as
CPU and GPU. Despite the importance of these decisions, many existing approaches …

Extensive evaluation of programming models and ISAs impact on multicore soft error

  • Rosa
    Felipe da

To take advantage of the performance enhancements provided by multicore processors,
new instruction set architectures (ISAs) and parallel programming libraries have been
investigated across multiple industrial segments. This paper investigates the …

Optimized selection of wireless network topologies and components via efficient pruning
of feasible paths

  • Kirov

We address the design space exploration of wireless networks to jointly select topology
and component sizing. We formulate the exploration problem as an optimized mapping
problem, where network elements are associated with components from pre-defined …


Resource and data optimization for hardware implementation of deep neural networks
targeting FPGA-based edge devices

  • Liu

Recently, as machine learning algorithms have become more practical, there has been
much effort to implement them on edge devices that can be used in our daily lives.
However, unlike server-scale devices, edge devices are relatively small and thus have …

A study of optimal cost-skew tradeoff and remaining suboptimality in interconnect
tree constructions

  • Han

Cost and skew are among the most fundamental objectives for interconnect tree synthesis.
The cost-skew tradeoff is particularly important in buffered clock tree construction,
where clock subnets are an important “sweet spot” for balancing on-chip …

A design framework for processing-in-memory accelerator

  • Gao

With increasing performance mismatch between processor and memory, “memory wall” has
become the bottleneck of the entire computing system. In order to bridge the gap,
processing-in-memory (PIM) has been revisited as a viable option to overcome the …

Fast and precise routability analysis with conditional design rules

  • Kang

As pin accessibility encounters more challenges due to the smaller number of tracks,
higher pin density, and more complex design rules, routability has become one bottleneck
of sub-10nm designs. Thus, we need a new design methodology for fast turnaround in …

Adaptive sensitivity analysis with nonlinear power load modeling

  • Hsu

Voltage fluctuation in power networks is a critical issue for VLSI designs. The analysis
and optimization of the voltage drops rely on accurate sensitivity calculation. Due
to the high complexity of large-scale circuits, in practice active devices are …

Exploiting PDN noise to thwart correlation power analysis attacks in 3D ICs

  • Dofe

Three-dimensional (3D) integration is envisioned as a natural defense to thwart side-channel
analysis (SCA) attacks. However, there is a lack of extensive studies on the unique feature
of 3D power distribution network (PDN) noise and its impact on the …


SESSION: Welcome and Keynote Address

Technology Options for Beyond-CMOS

  • Young

CMOS integrated circuit technology for computation is at an inflexion point. Although
this is the technology which has enabled the semiconductor industry to make vast progress
over the past 30-plus years, it is expected to see challenges going beyond …

SESSION: Machine Learning in EDA

The Quest for The Ultimate Learning Machine

  • Dubey

Traditionally, there has been a division of labor between computers and humans where
all forms of number crunching and bit manipulations are left to computers; whereas,
intelligent decision-making is left to us humans. We are now at the cusp of a major …

Deep Learning in the Enhanced Cloud

  • Chung

Deep Learning has emerged as a singularly critical technology for enabling human-like
intelligence in online services such as Azure, Office 365, Bing, Cortana, Skype, and
other high-valued scenarios at Microsoft. While Deep Neural Networks (DNNs) have …

Bilinear Lithography Hotspot Detection

  • Zhang

Advanced semiconductor process technologies are producing various circuit layout patterns,
and it is essential to detect and eliminate problematic ones, which are called lithography
hotspots. These hotspots are formed due to light diffraction and …

Routability Optimization for Industrial Designs at Sub-14nm Process Nodes Using Machine

  • Chan
    Wei-Ting J.

Design rule check (DRC) violations after detailed routing prevent a design from being
taped out. To solve this problem, state-of-the-art commercial EDA tools global-route
the design to produce a global-route congestion map; this map is used by the …

SESSION: Monday Afternoon Keynote

Pushing the boundaries of Moore’s Law to transition from FPGA to All Programmable

  • Bolsens

Since their inception, FPGAs have changed significantly in their capacity and architecture.
The devices we use today are called upon to solve problems in mixed-signal, high-speed
communications, signal processing and compute acceleration that early …

POSTER SESSION: Invited Poster Presentation

How Game Engines Can Inspire EDA Tools Development: A use case for an open-source physical design library

  • Fontana

Similarly to game engines, physical design tools must handle huge amounts of data.
Although the game industry has been employing modern software development concepts
such as data-oriented design, most physical design tools still rely on object-…

Rsyn: An Extensible Physical Synthesis Framework

  • Flach

Due to the advanced stage of development of EDA science, it has been increasingly
difficult to implement realistic software infrastructures in academia so that new
problems and solutions are tested in a meaningful and consistent way. In this paper
we …

SESSION: Nontraditional Physical Design Challenges

Research Challenges in Security-Aware Physical Design

  • Karri

The presentation will discuss security techniques such as IC camouflaging and logic …

Challenges and Opportunities: From Near-memory Computing to In-memory Computing

  • Khoram

The confluence of the recent advances in technology and the ever-growing demand for
large-scale data analytics created a renewed interest in a decades-old concept, processing-in-memory
(PIM). PIM, in general, may cover a very wide spectrum of compute …

Physical Design Considerations of One-level RRAM-based Routing Multiplexers

  • Tang

Resistive Random Access Memory (RRAM) technology opens the opportunity for granting both high-performance and low-power
features to routing multiplexers. In this paper, we study the physical design considerations
related to RRAM-based routing …

Hierarchical and Analytical Placement Techniques for High-Performance Analog Circuits

  • Xu

High-performance analog integrated circuits usually require minimizing critical parasitic
loading, which can be modeled by the critical net wire length in the layout stage.
In order to reduce post-layout circuit performance degradation, critical net …

SESSION: Tuesday Keynote Address

Physical Design Challenges and Innovations to Meet Power, Speed, and Area Scaling

  • Lu

In the advanced process technologies of 7nm and beyond, the semiconductor industry
faces several new challenges: (1) aggressive chip area scaling with economically feasible
process technology development, (2) sufficient performance enhancement of …

SESSION: Clock and Timing

Modern Challenges in Constructing Clocks

  • Alpert
    Charles J.

Clock Tree Construction based on Arrival Time Constraints

  • Ewetz

There are striking differences between constructing clock trees based on dynamic implied
skew constraints and based on static arrival time constraints. Dynamic implied skew
constraints allow the full timing margins to be utilized, but the constraints …

A Fast Incremental Cycle Ratio Algorithm

  • Wu

In this paper, we propose an algorithm to quickly find the maximum cycle ratio (MCR)
on an incrementally changing directed cyclic graph. Compared with traditional MCR
algorithms which have to recalculate everything from scratch at each incremental …

iTimerM: Compact and Accurate Timing Macro Modeling for Efficient Hierarchical Timing Analysis

  • Lee

As designs continue to grow in size and complexity, the EDA paradigm shifts from flat
to hierarchical timing analysis. In this paper, we propose compact and accurate timing
macro modeling, which is the key to achieving efficient and accurate hierarchical …

SESSION: Routability Considerations

DSAR: DSA aware Routing with Simultaneous DSA Guiding Pattern and Double Patterning Assignment

  • Ou

Directed self-assembly (DSA) is a promising solution for fabrication of contacts and
vias for advanced technology nodes. In this paper, we study a DSA aware detailed routing
problem, where DSA guiding pattern assignment and guiding pattern double …

Automatic Cell Layout in the 7nm Era

  • Cremer

Multi-patterning technology used in 7nm technology and beyond imposes more and more
complex design rules on the layout of cells. The often non-local nature of these new
design rules is a great challenge not only for human designers but also for existing …

Improving Detailed Routability and Pin Access with 3D Monolithic Standard Cells

  • Shi

We study the impact of using 3D monolithic (3DM) standard cells on improving detailed
routability and pin access. We propose a design flow which transforms standard rows
of single-tier “2D” cells into rows of standard 3DM cells folded into two tiers. …

SESSION: Commemoration for Professor Satoshi Goto

The Spirit of in-house CAD Achieved by the Legend of Master “Prof. Goto” and his Apprentices

  • Nakamura

In this paper, the legendary story of developing CAD algorithms and CAD/EDA tools for NEC’s
in-house use is described. About 30 years ago, since there were few commercial CAD
tools, ICT vendors had to develop their own CAD tools to enhance the performance of …

Generalized Force Directed Relaxation with Optimal Regions and Its Applications to
Circuit Placement

  • Chang
    Yao Wen

This paper introduces popular algorithmic paradigms for circuit placement, presents
Goto’s classical placement framework based on the generalized force directed relaxation
(GFDR) method with an optimal region (OR) formulation and its impacts on modern …

100x Evolution of Video Codec Chips

  • Zhou

In the past two decades, there has been tremendous progress in video compression technologies.
Meanwhile, the use of these technologies, along with the ever-increasing demand for
emerging ultra-high-definition applications greatly challenges the design …

Physical Layout after Half a Century: From Back-Board Ordering to Multi-Dimensional Placement and Beyond

  • Kang

Innovations and advancements in physical design (PD) over the past half century have significantly
contributed to the progress of modern VLSI designs. While “Moore’s Law” and “Dennard
Scaling” have been slowing down recently, physical design society …

Past, Present and Future of the Research

  • Goto

SESSION: Optimization and Placement

Interesting Problems in Physical Synthesis

  • Ho

It is a misperception that the Chinese have the same word for crisis as opportunity.
Despite that, a technical crisis does present opportunities for researchers and practitioners
to solve interesting problems. In this talk we point out two crises: …

Pin Accessibility-Driven Detailed Placement Refinement

  • Ding

The significantly increased number of routing design rules at sub-20nm nodes has made
pin access one of the most critical challenges in detailed routing. Resolving pin
access issues in detailed routing stage may be too late due to the fixed pin …

A Fast, Robust Network Flow-based Standard-Cell Legalization Method for Minimizing
Maximum Movement

  • Karimpour Darav

The standard-cell placement legalization problem has become critical due to increasing
design rule complexity and design utilization at 16nm and lower technology nodes.
An ideal legalization approach should preserve the quality of the input placement
in …


CAD Opportunities with Hyper-Pipelining

  • Iyer
    Mahesh A.

Hyper-pipelining is a design technique that results in significant performance and
throughput improvements in latency-insensitive designs. Modern FPGA architectures
like Intel’s Stratix® 10 feature a revolutionary register-rich HyperFlex™ core fabric …

An Effective Timing-Driven Detailed Placement Algorithm for FPGAs

  • Dhar

In this paper, we propose a new timing-driven detailed placement technique for FPGAs
based on optimizing critical paths. Our approach extends well beyond the previously
known critical path optimization approaches and explores a significantly larger …

Clock-Aware FPGA Placement Contest

  • Yang

Modern FPGA devices contain complex clocking architectures on top of the FPGA logic fabric.
To best utilize FPGA clocking architecture, both FPGA designers and EDA tool developers
need to understand the clocking architecture and design best methodology/…


A Comparative Analysis of Front-End and Back-End Compatible Silicon Photonic On-Chip

  • Thakkar
    Ishan G.

Photonic devices fabricated with back-end compatible silicon photonic (BCSP) materials
can provide independence from the complex CMOS front-end compatible silicon photonic
(FCSP) process, to significantly enhance photonic network-on-chip (PNoC) …

Latch Clustering for Minimizing Detection-to-Boosting Latency Toward Low-Power Resilient

  • Hsu

Dynamic voltage scaling (DVS) has become one of the most effective approaches to achieve
ultra-low-power SoC. To eliminate timing errors arising from DVS, several error-resilient
circuit design techniques were proposed to detect and/or correct timing …

Connectivity Effects on Energy and Area for Neuromorphic System with High Speed Asynchronous
Pulse Mode Links

  • Segal

Hardware neuromorphic systems are challenged to achieve biologically realistic levels
of interconnectivity. When building a physical implementation of a neural net, the
properties of the media immediately impose limits on the number of interconnects and …

Buffered Interconnects in 3D IC Layout Design

  • Ahmed
    Mohammad A.

A very important challenge in designing through-silicon via (TSV)-based 3D ICs is
to accurately estimate, through all stages of the physical design, the interconnect
delay which is strongly dependent on the layout of 3D IC. The earlier in the design …

Topologically-Geometric Routing

  • Bazylevych

The paper introduces foundations of the “Flexible Routing Method” that belongs to
the topologically-geometric type. It develops the idea to divide the routing problem
into two separate successive stages: topological and geometrical. At the first stage
it …

Revisiting 3DIC Benefit with Multiple Tiers

  • Chan
    Wei-Ting Jonas

3DICs with multiple tiers are expected to achieve large benefits (e.g., in terms of
power, area) as compared to conventional planar designs. However, few if any previous
works study upper bounds on power and area benefits from 3DIC integration with …

Spin-Hall Assisted STT-RAM Design and Discussion

  • Eken

In recent years, Spin-Transfer Torque Random Access Memory (STT-RAM) has attracted
significant attention from both industry and academia due to its attractive attributes
such as small cell area and non-volatility. However, long switching time and large …

A Demand-Aware Predictive Dynamic Bandwidth Allocation Mechanism for Wireless Network-on-Chip

  • Mansoor

Long distance data communication over multi-hop wireline paths in conventional Networks-on-Chips
(NoCs) causes high energy consumption and degradation in bandwidth. Wireless interconnects
in the millimeter-wave band have emerged as an energy-efficient …


SESSION: Keynote Address

Session details: Keynote Address

  • Chu

Challenges and Opportunities in Automotive, Industrial, and IoT Physical Design

  • Hill
    Anthony M.

Taping out modern, complex SOCs presents a myriad of challenges in physical design.
Doing so for demanding markets such as automotive, industrial, and IoT multiplies
that complexity. In this talk we will take a broad look across the physical design …

SESSION: Finding the Golden Tree in the Forest!

Session details: Finding the Golden Tree in the Forest!

  • Yeap

Wot the L: Analysis of Real versus Random Placed Nets, and Implications for Steiner Tree Heuristics

  • Kahng
    Andrew B.

The NP-hard Rectilinear Steiner Minimum Tree (RSMT) problem has been studied in the
VLSI physical design literature for well over three decades. Fast estimators of RSMT
cost (which reflects routed wirelength) are a required ingredient of modern physical …

Prim-Dijkstra Revisited: Achieving Superior Timing-driven Routing Trees

  • Alpert
    Charles J.

The Prim-Dijkstra (PD) construction [1] was first presented over 20 years ago as a way to efficiently
trade off between shortest-path and minimum-wirelength routing trees. This approach
has stood the test of time, having been integrated into leading …

Construction of All Rectilinear Steiner Minimum Trees on the Hanan Grid

  • Lin
    Sheng-En David

Given a set of pins, a Rectilinear Steiner Minimum Tree (RSMT) connects the pins using
only rectilinear edges with the minimum wirelength. RSMT construction is heavily used
at various design steps such as floorplanning, placement, routing, and …

SESSION: FPGA Special Session

Session details: FPGA Special Session

  • Das

Challenges in Large FPGA-based Logic Emulation Systems

  • Hung
    William N.N.

Functional verification is an important aspect of electronic design automation. Traditionally,
simulation at the register transfer-level has been the mainstream functional verification
approach. Formal verification and various static analysis checkers …

Flexibility: FPGAs and CAD in Deep Learning Acceleration

  • Chiu
    Gordon R.

Deep learning inference has become the key workload to accelerate in our AI-powered
world. FPGAs are an ideal platform for the acceleration of deep learning inference
by combining low-latency performance, power-efficiency, and flexibility. This paper …

Exploration and Tradeoffs of different Kernels in FPGA Deep Learning Applications

  • Delaye

In the field of deep learning, efficient computational hardware has come to the forefront
of the large scale implementation and deployment of many applications. In the process
of designing hardware, various characteristics of hardware platforms have …

Architecture Exploration of Standard-Cell and FPGA-Overlay CGRAs Using the Open-Source
CGRA-ME Framework

  • Chin
    S. Alexander

We describe an open-source software framework, CGRA-ME, for the modeling and exploration
of coarse-grained reconfigurable architectures (CGRAs). CGRAs are programmable hardware
devices having large ALU-like logic blocks, and datapath bus-style inter-…

SESSION: Design Flow and Power Grid Optimization

Session details: Design Flow and Power Grid Optimization

  • Iyer

Concurrent High Performance Processor Design: From Logic to PD in Parallel

  • Stok

The design of a high-performance processor in an advanced technology node is a highly
concurrent process. While most SoCs are designed with (fairly) stable IP, several
trends are driving the design of the micro-architecture, the logic and the physical

Towards a VLSI Design Flow Based on Logic Computation and Signal Distribution

  • Reis

This paper discusses directions for a VLSI design flow based on a novel paradigm of
local logic computation and global signal distribution. In recent years there has
been an increasing effort to perform a better integration between logic synthesis
and …

Power Grid Reduction by Sparse Convex Optimization

  • Ye

With the dramatic increase in the complexity of modern integrated circuits (ICs),
direct analysis and verification of IC power distribution networks (PDNs) have become
extremely computationally expensive. Various power grid reduction methods are …

SESSION: Statistical and Machine Learning-Based CAD

Session details: Statistical and Machine Learning-Based CAD

  • Kissiov

Machine Learning Applications in Physical Design: Recent Results and Directions

  • Kahng
    Andrew B.

In the late-CMOS era, semiconductor and electronics companies face severe product
schedule and other competitive pressures. In this context, electronic design automation
(EDA) must deliver “design-based equivalent scaling” to help continue essential …

Machine Learning for Feature-Based Analytics

  • Wang

Applying machine learning in Electronic Design Automation (EDA) has received growing
interest in recent years. One approach to analyze data in EDA applications can be
called feature-based analytics. In this context, the paper explains the inadequacy
of …

Data Efficient Lithography Modeling with Residual Neural Networks and Transfer Learning

  • Lin

Lithography simulation is one of the key steps in physical verification, enabled by
the substantial optical and resist models. A resist model bridges the aerial image
simulation to printed patterns. While the effectiveness of learning-based solutions

SESSION: Three Shades of Placement!

Session details: Three Shades of Placement!

  • Shinnerl

Compact-2D: A Physical Design Methodology to Build Commercial-Quality Face-to-Face-Bonded 3D ICs

  • Ku
    Bon Woong

The recent advancement of wafer bonding technology offers fine-grained and silicon-space
overhead-free 3D interconnections in face-to-face (F2F) bonded 3D ICs. In this paper,
we propose a full-chip RTL-to-GDSII physical design solution to build high-…

Analog Placement Constraint Extraction and Exploration with the Application to Layout

  • Xu

In analog/mixed-signal (AMS) integrated circuits (ICs), most of the layout design
efforts are still handled manually, which is time-consuming and error-prone. Given
the previous high-quality manual layouts containing valuable design expertise of …

Pin Assignment Optimization for Multi-2.5D FPGA-based Systems

  • Kuo

Advanced 2.5D FPGAs with larger logic capacity and higher pin counts compared to conventional
FPGAs are commercially available. Some multi-FPGA systems have already utilized 2.5D
FPGAs. A commercial 2.5D FPGA consists of multiple dies connected through an …

SESSION: Commemoration for Professor Te Chiang Hu

Session details: Commemoration for Professor Te Chiang Hu

  • Kahng
    Andrew B.

Influence of Professor T. C. Hu’s Works on Fundamental Approaches in Layout

  • Kahng
    Andrew B.

Professor T. C. Hu has made numerous pioneering and fundamental contributions in combinatorial
algorithms, mathematical programming and operations research. His seminal 1985 IEEE
book VLSI Circuit Layout: Theory and Design, coedited with Prof. E. S. Kuh,…

Tree Structures and Algorithms for Physical Design

  • Cheng

Tree structures and algorithms provide a fundamental and powerful data abstraction
and methods for computer science and operations research. In particular, they enable
significant advancement of IC physical design techniques and design optimization.
For …

Pioneer Research on Mathematical Models and Methods for Physical Design

  • Chu

At the inaugural International Symposium on Physical Design (ISPD) in 1997, Prof.
Te Chiang Hu delivered the keynote address “Physical Design: Mathematical Models
and Methods” [1]. Without any question, Prof. Hu has made a lot of foundational and

Theory and Algorithms of Physical Design

  • Cheng

SESSION: Interconnect Optimization and Detailed Routing Contest Results

Session details: Interconnect Optimization and Detailed Routing Contest Results

  • Yan

Interconnect Optimization Considering Multiple Critical Paths

  • Hu

Interconnect optimization, including buffer insertion and Steiner tree construction,
continues to be a pillar technology that largely determines overall chip performance.
Buffer insertion algorithms in published literature are mostly focused on …

Interconnect Physical Optimization

  • Janac
    K. Charles

The SoC Interconnect is one of the most important IPs in modern chips as it is the
logical and physical instantiation of an SoC architecture and carries virtually all
the SoC data. Interconnect IPs have to carry non-coherent, cache coherent, subsystem

ISPD 2018 Initial Detailed Routing Contest and Benchmarks

  • Mantik

In advanced technology nodes, detailed routing becomes the most complicated and
runtime-consuming stage. To spur detailed routing research, the ISPD 2018 initial detailed
routing contest is hosted; it is the first ISPD contest on detailed routing …

SESSION: How to Make Your Foundry Happier?

Session details: How to Make Your Foundry Happier?

  • Hu

The Pressing Need for Electromigration-Aware Physical Design

  • Lienig

Electromigration (EM) is becoming a progressively intractable design challenge due
to increased interconnect current densities. It has changed from something designers
“should” think about to something they “must” think about, i.e., it is now a definite

On Coloring and Colorability Analysis of Integrated Circuits with Triple and Quadruple
Patterning Techniques

  • Lvov

The continued delay of higher resolution alternatives for lithography, such as EUV,
is forcing the continued adoption of multi-patterning solutions in new technology
nodes, which include triple and quadruple patterning using several lithography-etch

Standard CAD Tool-Based Method for Simulation of Laser-Induced Faults in Large-Scale

  • Viera
    Raphael A.C.

Designing secure integrated systems requires methods and tools dedicated to simulating,
at early design stages, the effects of laser-induced transient faults maliciously
injected by attackers. Existing methods for simulation of laser-induced transient