GLSVLSI 2019 TOC

Digital Library logo
Full Citation in the ACM Digital Library

SESSION: Keynote & Invited Talks

Thoughts on Edge Intelligence

Wolf
Marilyn

Machine learning methods have exploded in the past half-dozen years. Machine learning
is being applied to a huge range of problems across the spectrum of applications.
Initial results relied on server-oriented computations. But many applications will
…

Automatic Implementation of Secure Silicon

Leef
Serge

Throughout the past decade, cybersecurity threats have evolved from attacks focused
high in the software stack to progressively lower levels of computational hierarchy.
With the explosion of popularity and growing deployment of internet connected …

Processing Data Where It Makes Sense in Modern Computing Systems: Enabling In-Memory Computation

Mutlu
Onur

Today’s systems are overwhelmingly designed to move data to computation. This design
choice goes directly against at least three key trends in systems that cause performance,
scalability and energy bottlenecks: 1) data access from memory is already a …

Innovations in IoT for a Safe, Secure, and Sustainable Future

Bhunia
Swarup

Internet of things (IoT) promises to usher in the fourth industrial revolution through
an exponential growth of smart connected devices deployed in myriad application domains.
It gives rise to new relationships between man and smart connected machines …

SESSION: Tech Session 1: Design and Integration of Hardware Security Primitives

LPN-based Device Authentication Using Resistive Memory

Arafin
Md Tanvir

Recent progress in the design and implementation of resistive memory components such
as RRAMs and PCMs has introduced opportunities for developing novel hardware security
solutions using unique physical properties of these devices. In this work, we …

Leveraging On-Chip Voltage Regulators Against Fault Injection Attacks

Vosoughi
Ali

The security implications of utilizing an on-chip voltage regulator as a countermeasure
against fault injection attacks are investigated in this paper. The effect of the
size of the capacitors and number of phases of the voltage regulator on the …

On the Theoretical Analysis of Memristor based True Random Number Generator

Uddin
Mesbah

Emerging nano-devices like memristors display stochastic switching behavior which
poses a big uncertainty in their implementation as the next-generation CMOS alternative.
However, this stochasticity provides an opportunity to design circuits for …

Control-Lock: Securing Processor Cores Against Software-Controlled Hardware Trojans

Šišejković
Dominik

Malicious circuit modifications known as hardware Trojans represent a rising threat
to the integrated circuit supply chain. As many Trojans are activated based on a specific
sequence of circuit states, we have recognized the ease of utilizing an …

Lightweight Authenticated Encryption for Network-on-Chip Communications

Harttung
Julian

In recent years, Network-on-Chip (NoC) has gained increasing popularity as a promising
solution for the challenging interconnection problem in multi-processor systems-on-chip
(MPSoCs). However, the interest of adversaries to compromise such systems grew …

SESSION: Tech Session 2: VLSI Circuits and Power Aware Design

Design of a Low-power and Small-area Approximate Multiplier using First the Approximate
and then the Accurate Compression Method

Yang
Tongxin

Recently emerging applications, such as convolution neural networks (CNNs), which
process thousands of convolutional computations, require a large amount of power.
Multiplication is the key arithmetic in these applications and an approximate multiplier
…

GraphiDe: A Graph Processing Accelerator leveraging In-DRAM-Computing

Angizi
Shaahin

In this paper, we propose GraphiDe, a novel DRAM-based processing-in-memory (PIM)
accelerator for graph processing. It transforms current DRAM architecture to massively
parallel computational units exploiting the high internal bandwidth of the modern
…

An Efficient Time-based Stochastic Computing Circuitry Employing Neuron-MOS

Erlina
Tati

A compact and low energy circuitry of time-based stochastic computing (TBSC) have
been designed. In the TBSC theory, stochastic numbers (SNs) are represented by duty-cycle
of periodic signals. Additionally, multiplication and addition operations of the …

Monolithic 8×8 SiPM with 4-bit Current-Mode Flash ADC with Tunable Dynamic Range

Vinayaka
Vikas

A monolithic photon-counting receiver consisting of an integrated silicon-photomultiplier
and a current-mode analog-to-digital converter (ADC) was designed, simulated and fabricated
in the AMS 0.35 μm SiGe BiCMOS process. The silicon photomultiplier (…

SESSION: Tech Session 3: : VLSI for Machine Learning and Artificial Intelligence

A Systolic SNN Inference Accelerator and its Co-optimized Software Framework

Guo
Shasha

Although Deep Neural Network (DNN) architectures have made some breakthroughs in computer
vision tasks, they are not close to biological brain neurons. Spiking Neural Network
(SNN) is highly expected to bridge the gap between artificial computing …

Dynamic Beam Width Tuning for Energy-Efficient Recurrent Neural Networks

Jahier Pagliari
Daniele

Recurrent Neural Networks (RNNs) are state-of-the-art models for many machine learning
tasks, such as language modeling and machine translation. Executing the inference
phase of a RNN directly in edge nodes, rather than in the cloud, would provide …

Efficient Softmax Hardware Architecture for Deep Neural Networks

Du
Gaoming

Deep neural network (DNN) has become a pivotal machine learning and object recognition
technology in the big data era. The softmax layer is one of the key component layers
for completing multi-classification tasks. However, the softmax layer contains …

HSIM-DNN: Hardware Simulator for Computation-, Storage- and Power-Efficient Deep Neural Networks

Sun
Mengshu

Deep learning that utilizes large-scale deep neural networks (DNNs) is effective in
automatic high-level feature extraction but also computation and memory intensive.
Constructing DNNs using block-circulant matrices can simultaneously achieve hardware
…

SESSION: Tech Session 4: Next Generation Interconnect: Architecture to Physical Design

An Area-Efficient Iterative Single-Precision Floating-Point Multiplier Architecture
for FPGA

Kim
Sunwoong

Approximate multipliers have been widely used in critical applications, such as machine
learning and multimedia, which are tolerant to approximation errors. This paper proposes
a novel single-precision floating-point (SPFP) multiplication algorithm and …

An Automatic Transistor-Level Tool for GRM FPGA Interconnect Circuits Optimization

Li
Zhengjie

Due to its dominance in FPGA area and delay, the interconnect circuit is traditionally
designed and optimized in full customized fashion, which can be extremely time consuming.
In this paper, we propose an automated transistor-level sizing optimization …

Low Voltage Clock Tree Synthesis with Local Gate Clusters

Sitik
Can

In this paper, a novel local clock gate cluster-aware low voltage clock tree synthesis
methodology is introduced. In low voltage/swing clocking, timing closure is a challenging
problem due to tight skew and slew constraints. The clock gating makes this …

SESSION: Tech Session 5: Designing robust VLSI circuits. From approximate computing to hardware
security

TOIC: Timing Obfuscated Integrated Circuits

Alam
Mahabubul

To counter the threats of reverse engineering (RE) and Trojan in-sertion, researchers
have considered gate-level obfuscation in inte-grated circuits (IC) as a viable solution.
However, several techniques are present in the literature to crack the …

Design for Eliminating Operation Specific Power Signatures from Digital Logic

Majumder
Md Badruddoja

Conventional digital logic operations have distinguishable power signatures. Side
channel power analysis combined with classification algorithm can reveal unknown logic
operations. Revealing the underlying operations is the main task in reverse …

Non-Uniform Temperature Distribution in Interconnects and Its Impact on Electromigration

Abbasinasab
Ali

We investigate the effect of electrically induced thermal load on interconnect reliability
and aging. We propose new models for uniform and non-uniform temperature evolution
and its steady state distribution in interconnects considering Joule heating …

Fault Classification and Coverage of Analog Circuits using DC Operating Point and
Frequency Response Analysis

Sanyal
Sayandeep

Detection of faults in a mixed-signal SOC at the pre-silicon stage is a challenge,
especially when it has substantial analog components. Given the time taken for simulating
analog circuits, designing tests to detect faults in them is not a …

Crash Skipping: A Minimal-Cost Framework for Efficient Error Recovery in Approximate Computing Environments

Verdeja Herms
Yan

We present a lightweight technique to minimize error recovery costs in approximate
computing environments. We take advantage of the key observation that if an application
crashes in a “non-critical” region of its execution, then skipping the crash and …

SESSION: Tech Session 6: Emerging Computing & Post-CMOS Technologies

Voltage-Controlled Magnetoelectric Memory Bit-cell Design With Assisted Body-bias
in FD-SOI

Cai
Hao

Voltage-controlled magnetic anisotropy (VCMA)-magnetic tunnel junction (MTJ) is incorporated
into FD-SOI CMOS technology. The design space of 1 transistor-1 MTJ (1T-1M) bit-cell
is explored through varied VCMA pulse duration/amplitude and scaling down …

Low Cost Hybrid Spin-CMOS Compressor for Stochastic Neural Networks

Li
Bingzhe

With expansion of neural network (NN) applications lowering their hardware implementation
cost becomes an urgent task especially in back-end applications where the power-supply
is limited. Stochastic computing (SC) is a promising solution to realize low-…

Functionally Complete Boolean Logic and Adder Design Based on 2T2R RRAMs for Post-CMOS
In-Memory Computing

Yang
Zongxian

In-memory computing (IMC) paradigm has attracted extensive attention for future electronics
to overcome the bottleneck and memory wall problem in the von Neumann systems. Nonvolatile
logic based on resistive random-access memory (RRAM) is a promising …

Jump Search: A Fast Technique for the Synthesis of Approximate Circuits

Witschen
Linus

State-of-the-art frameworks for generating approximate circuits automatically explore
the search space in an iterative process – often greedily. Synthesis and verification
processes are invoked in each iteration to evaluate the found solutions and to …

SESSION: Tech Session 7: Physical Design and Obfuscation

SAT-Based Placement Adjustment of FinFETs inside Unroutable Standard Cells Targeting
Feasible DRC-Clean Routing

Sorokin
Anton

In this paper, we present an algorithm of transistor placement that takes unroutable
standard cells and makes them routable by moving transistors in local windows. It
converts the task of placement of gridded FinFETs into a Boolean problem and employs
…

A Scalable and Process Variation Aware NVM-FPGA Placement Algorithm

Yang
Chengmo

As non-volatile memory (NVM) based FPGAs gain increasing popularity, FPGA synthesis
tools start to tune the synthesis flow to match NVM characteristics. State-of-the-art
NVM FPGA placement algorithms tried to reduce the high reconfiguration cost induced
…

Functional Obfuscation of Hardware Accelerators through Selective Partial Design Extraction
onto an Embedded FPGA

Hu
Bo

The protection of Intellectual Property (IP) has emerged as one of the most serious
areas of concern in the semiconductor industry. To address this issue, we present
a method and architecture to map selective portions of a design, given as a behavioral
…

HydraRoute: A Novel Approach to Circuit Routing

Khasawneh
Mohammad

Routing for dense circuits is a major challenge for VLSI physical design. Most routing
approaches rely at least partially on a “rip-up and reroute” scheme, where solution
quality and run times can be impacted profoundly by the order in which nets are …

SESSION: Tech Session 8: Quantum Circuits and Emerging Technologies

Balanced Factorization and Rewriting Algorithms for Synthesizing Single Flux Quantum
Logic Circuits

Pasandi
Ghasem

Single Flux Quantum (SFQ) logic with switching energy of 100zJ1 and switching delay
of 1ps is a promising post-CMOS candidate. Logic synthesis of these magnetic-pulse-based
circuits is a very important step in their design flow with a big impact on the …

A Majority Logic Synthesis Framework for Adiabatic Quantum-Flux-Parametron Superconducting
Circuits

Cai
Ruizhe

Adiabatic Quantum-Flux-Parametron (AQFP) logic is an adiabatic superconductor logic
that has been proposed as alternative to CMOS logic with extremely high energy efficiency.
In AQFP technology, majority-based gates have the same area as two-input AND/…

A Processing-In-Memory Implementation of SHA-3 Using a Voltage-Gated Spin Hall-Effect
Driven MTJ-based Crossbar

Yang
Chengmo

Processing-In-Memory (PIM), which implements logic operations within memory cells,
opens up a new direction on organizing data and computation. Leveraging resistive
or magnetic characteristics of nonvolatile memory (NVM) devices, platforms such as
PLiM …

Exploring Processing In-Memory for Different Technologies

Gupta
Saransh

The recent emergence of IoT has led to a substantial increase in the amount of data
processed. Today, a large number of applications are data intensive, involving massive
data transfers between processing core and memory. These transfers act as a …

SESSION: Tech Session 9: Towards Fast, Efficient, and Robust Memory

BLADE: A BitLine Accelerator for Devices on the Edge

Simon
William Andrew

The increasing ubiquity of edge devices in the consumer market, along with their ever
more computationally expensive workloads, necessitate corresponding increases in computing
power to support such workloads. In-memory computing is attractive in edge …

Enhancing the Lifetime of Non-Volatile Caches by Exploiting Module-Wise Write Restriction

Agarwal
Sukarn

The emerging Non-Volatile Memory (NVM) technologies offer a good combination of high
density and near-zero leakage power, becoming the strongest candidate in the memory
hierarchy including caches. However, the weak write endurance of these memories …

Mitigating the Performance and Quality of Parallelized Compressive Sensing Reconstruction
Using Image Stitching

Namazi
Mahmoud

Orthogonal Matching Pursuit is an iterative greedy algorithm used to find a sparse
approximation for high-dimensional signals. The algorithm is most popularly used in
Compressive Sensing, which allows for the reconstruction of sparse signals at rates
…

Towards Optimizing Refresh Energy in embedded-DRAM Caches using Private Blocks

Manohar
Sheel Sindhu

In recent years, the increased working set size of applications craves for more memory
demand in terms of large size Last Level Caches (LLC). To fulfill this, embedded DRAM
(eDRAM) caches have been considered as one of the best alternatives over …

SESSION: Tech Session 10: MSE

Extending Student Labs with SMT Circuit Implementation

Brunvand
Erik

Computer Science and Computer Engineering classes related to digital circuits, embedded
systems, Human Computer Interaction (HCI), and a wide variety of “maker” subjects,
would often like to include physical computing projects. Extending these physical
…

Teaching the Next Generation of Cryptographic Hardware Design to the Next Generation
of Engineers

Aysu
Aydin

Evolving threats against cryptographic systems and the increasing diversity of computing
platforms enforce teaching cryptographic engineering to a wider audience. This paper
describes the development of a new graduate course on hardware security taught …

A Web-based Remote FPGA Laboratory for Computer Organization Course

Wan
Han

Learning in digital systems could be enhanced by applying a learn-by-doing mechanism.
In this paper the implementation of a web-based remote FPGA laboratory for Computer
Organization course is proposed. The projects created for this course are designed
…

System-on-a-Chip Design as a Platform for Teaching Design and Design Flow Integration

Covey
Jacob

The design of microelectronic systems requires integration and cooperation across
multiple disciplines, but most curriculum is taught in unconnected pieces. This makes
the creation of manageable projects that reflect the design experience very …

SESSION: Poster Sessions I, II

UPIM: Unipolar Switching Logic for High Density Processing-in-Memory Applications

Sim
Joonseop

Internet of Things (IoT) has built a network with billions of connected devices which
generate massive volumes of data. Processing large data on existing systems requires
significant costs for data movements between processors and memory due to limited
…

Fence-Region-Aware Mixed-Height Standard Cell Legalization

Do
SangGi

We propose a fence-region-aware mixed-height standard cell legalization that can optimize
the placement of standard cells that have more than a two row height in various shapes
of the fence region. The algorithm consists of pre-legalization and mixed-…

A Case for Heterogeneous Network-on-Chip Based H.264 Video Decoders

Ghorbani Moghaddam
Milad

The design of a heterogeneous network-on-chip (NoC) based H.264 video decoder is proposed.
A thorough investigation using a system simulator developed as the combination of
a cycle accurate NoC simulator together with complete implementations of all the …

A 16b Clockless Digital-to-Analog Converter with Ultra-Low-Cost Poly Resistors Supporting
Wide-Temperature Range from -40°C to 85°C

Wang
Xuedi

High-precision digital-to-analog converter (DAC) is a critical component in process
control, data acquisition, and testing instruments. In order to achieve high resolution
and a wide-temperature range, conventional designs have been adopting high-cost …

A Skyrmion Racetrack Memory based Computing In-memory Architecture for Binary Neural
Convolutional Network

Pan
Yu

A Skyrmion Racetrack Memory (SRM) based Computing In-Memory Architecture (SRM-CIM)
was proposed in this paper. Both data and computing operation can be achieved in SRM-CIM.
SRM-CIM is used to support convolutional computing in Binary Convolutional …

TASecure: Temperature-Aware Secure Deletion Scheme for Solid State Drives

Li
Bingzhe

With the increasing concerns of security, the secure deletion for SSDs becomes very
costly due to its out-of-place update (i.e., an update is performed in a new location
leaving the old data un-touched). Some previous studies used a combined erase-based
…

An Asymmetric Dual Output On-Chip DC-DC Converter for Dynamic Workloads

Liu
Xingye

We propose a novel two-stage hybrid on-chip DC-DC converter targeting low power applications
with multiple supply voltage domains and dynamic workloads. The converter has a nominal
input voltage of 1.2V and generates two asymmetrically regulated output …

CNNWire: Boosting Convolutional Neural Network with Winograd on ReRAM based Accelerators

Lin
Jilan

Resistive random access memory (ReRAM) demonstrates the great potential of in-memory
processing for neural network (NN) acceleration. However, since the convolutional
neural network (CNN) is widely known as compute-bound, current ReRAM-based …

Feed-Forward XOR PUFs: Reliability and Attack-Resistance Analysis

Avvaru
S. V. Sandeep

Physical unclonable functions (PUFs) can be used to generate unique signatures of
integrated circuit (IC) chips. XOR arbiter PUFs (XOR PUFs), that typically contain
multiple standard arbiter PUFs as their components, are more secure than standard
…

Exploring Design Trade-offs in Fault-Tolerant Behavioral Hardware Accelerators

Zhu
Zhiqi

High-Level Synthesis (HLS) allows the automatic generation of hardware accelerators
with unique design metrics. This work leverages this unique feature and presents a
method to increase the search space of fault-tolerant hardware accelerators. The …

Automatic Extraction of Requirements from State-based Hardware Designs for Runtime
Verification

Seo
Minjun

Runtime monitoring and verification enables a system to monitor itself and ensure
system requirements are met even in the presence of dynamic environments. For hardware,
state-based models are widely used, but verifying the correctness between the state-…

MirrorCache: An Energy-Efficient Relaxed Retention L1 STTRAM Cache

Kuan
Kyle

Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAMs in on-chip caches,
due to several advantages, including non-volatility, low leakage, high integration
density, and CMOS compatibility. However, STTRAMs’ wide adoption in resource-…

Design and Evaluation of DNU-Tolerant Registers for Resilient Architectural State
Storage

Alghareb
Faris S.

In this work, we aim to maintain the correct execution of instructions in the pipeline
stages. To achieve that, the integrity for the data computed in registers during execution
should be maintained via protecting the susceptible registers. Thus, we …

Automated Analysis of Virtual Prototypes at Electronic System Level

Goli
Mehran

The exponential increase in functionality of System-on-Chips (SoCs) and reduced Time-to-Market
(TTM) requirements have significantly altered the typical design and verification
flow. Virtual Prototyping (VP) at the Electronic System Level (ESL) using …

Dynamic Physically Unclonable Functions

Xiong
Wenjie

Physical variations in the manufacturing processes of electronic devices have been
widely leveraged to design Physically Unclonable Functions (PUFs), which can be used
for authentication and key storage. Existing PUFs are static, as their PUF responses
…

RDTA: An Efficient Routability-Driven Track Assignment Algorithm

Liu
Genggeng

This paper presents a routability-driven track assignment algorithm (RDTA) to efficiently
estimate routability. Routability has become a very challenging issue in modern IC
design and it can be effectively estimated by routing congestion. Track …

EraseMe: A Defense Mechanism against Information Leakage exploiting GPU Memory

Fang
Hongyu

Graphics Processing Units (GPU) play a major role in speeding up computational tasks
of the users, especially in applications such as high volume text and image processing.
Recent works have demonstrated the security problems associated with GPU that do …

A Statistical Current and Delay Model Based on Log-Skew-Normal Distribution for Low
Voltage Region

Cao
Peng

The increasing performance variation and non-Gaussian distribution pose remarkable
challenges to timing analysis for circuits operating in low voltage region. Accurate
modeling of the statistical characteristics is urgently required with process …

Enabling Approximate Storage through Lossy Media Data Compression

Worek
Brian

As compute capabilities continue to scale, memory capacity and bandwidth continue
to lag behind. Data compression is an effective approach to improving memory capacity
and bandwidth; but prior works have focused primarily on lossless compression and
…

Thermal Fingerprinting of FPGA Designs through High-Level Synthesis

Chen
Jianqi

This work investigates if temperature can be used to fingerprint FPGA designs and
presents a method to generate a large number of functionally equivalent FPGA designs
such that each design has a unique distinguishable thermal signature. The main …

Deep RNN-Oriented Paradigm Shift through BOCANet: Broken Obfuscated Circuit Attack

Tehranipoor
Fatemeh

Logic encryption obfuscation has been used for thwarting counterfeiting, overproduction,
and reverse engineering but vulnerable to attacks. However, it was recently shown
that satisfiability – checking (SAT) can potentially compromise hardware …

STAT: Mean and Variance Characterization for Robust Inference of DNNs on Memristor-based
Platforms

Zhang
Baogang

An emerging solution to accelerate the inference phase of deep neural networks (DNNs)
is to utilize memristor crossbar arrays (MCAs) to perform highly efficient matrix-vector
multiplication in the analog domain. An adverse challenge is that memristor …

LSM: Novel Low-Complexity Unified Systolic Multiplier over Binary Extension Field

Xie
Jiafeng

Unified (hybrid field-size) systolic multiplier over GF(2m) (binary extension field)
has attracted significant attentions from research communities recently as it can
be used in reconfigurable cryptographic processors. In this paper, we present a novel
…

Binarized Depthwise Separable Neural Network for Object Tracking in FPGA

Yang
Li

Object tracking has achieved great advances in the past few years and has been widely
applied in vision-based application. Nowadays, deep convolutional neural network has
taken an important role in object tracking tasks. However, its enormous model size
…

An Analytical-based Hybrid Algorithm for FPGA Placement

Hu
Chengyu

As the capacity of FPGA increases, FPGA placers that adopt Simulated Annealing (SA)
algorithm take more and more runtime. To solve this problem, this paper presents HCAS,
a Hybrid algorithm Combining Analytical method and SA. There are three …

Approximate Memory with Approximate DCT

Ma
Shenghou

Approximate Computing is an emerging computing paradigm where one exploits inherent
error resilience of certain applications (e.g., digital signal processing, multimedia
and artificial intelligence) and trades off absolute computation precisions for …

AQuRate: MRAM-based Stochastic Oscillator for Adaptive Quantization Rate Sampling of Sparse
Signals

Salehi
Soheil

Recently, the promising aspects of compressive sensing have inspired new circuit-level
approaches for their efficient realization within the literature. However, most of
these recent advances involving novel sampling techniques have been proposed …

Clockless Spin-based Look-Up Tables with Wide Read Margin

Salehi
Soheil

In this paper, we develop a 6-input fracturable non-volatile Clockless LUT (C-LUT)
using spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and provide a
detailed comparison between the SHE-MTJ-based C-LUT and Spin Transfer Torque (STT)-MTJ-…

A Hybrid Framework for Functional Verification using Reinforcement Learning and Deep
Learning

Singh
Karunveer

In this paper, we propose a novel hybrid verification framework (HVF) which uses Reinforcement
Learning (RL) and Deep Neural Networks (DNNs) to accelerate the verification of complex
systems. More precisely, our HVF incorporates RL to generate all …

SESSION: Special Session 1: In-Memory Processing for Future Electronics

Digital and Analog-Mixed-Signal In-Memory Processing in CMOS SRAM

Jaiswal
Akhilesh

Ferroelectric FET Based In-Memory Computing for Few-Shot Learning

Laguna
Ann Franchesca

As CMOS technology advances, the performance gap between the CPU and main memory has
not improved. Furthermore, the hardware deployed for Internet of Things (IoT) applications
need to process ever growing volumes of data, which can further exacerbate …

True In-memory Computing with the CRAM: From Technology to Applications

Zabihi
Masoud

An Overview of In-memory Processing with Emerging Non-volatile Memory for Data-intensive
Applications

Li
Bing

The conventional von Neumann architecture has been revealed as a major performance
and energy bottleneck for rising data-intensive applications. The decade-old idea
of leveraging in-memory processing to eliminate substantial data movements has returned
…

SESSION: Special Session 2: Approximate Computing Systems Design: Energy Efficiency and Security
Implications

Security Threats in Approximate Computing Systems

Yellu
Pruthvy

Approximate computing systems improve energy efficiency and computation speed at the
cost of reduced accuracy on system outputs. Existing efforts mainly explore the feasible
approximation mechanisms and their implementation methods. There is limited …

Characterizing Approximate Adders and Multipliers Optimized under Different Design
Constraints

Jiang
Honglan

Taking advantage of the error resilience in many applications as well as the perceptual
limitations of humans, numerous approximate arithmetic circuits have been proposed
that trade off accuracy for higher speed or lower power in emerging applications …

Approximate Communication Strategies for Energy-Efficient and High Performance NoC: Opportunities and Challenges

Reza
Md Farhadur

With the advancement and miniaturization of transistor technology, hundreds of cores
can be integrated on a single chip. Network-on-Chips (NoCs) are the de facto on-chip
communication fabrics for multi/many core systems because of their benefits over …

Information Hiding behind Approximate Computation

Wang
Ye

There are many interesting advances in approximate computing recently targeting the
energy efficiency in system design and execution. The basic idea is to trade computation
accuracy for power and energy during all phases of the computation, from data to …

MLPrivacyGuard: Defeating Confidence Information based Model Inversion Attacks on Machine Learning
Systems

Alves
Tiago A. O.

As services based on Machine Learning (ML) applications find increasing use, there
is a growing risk of attack against such systems. Recently, adversarial machine learning
has received a lot of attention, where an adversary is able to craft an input or …

SESSION: Special Session 3: Recent Advances in Near and In-Memory Computing Circuit ?

XNOR-SRAM: In-Bitcell Computing SRAM Macro based on Resistive Computing Mechanism

Jiang
Zhewei

We present an in-memory computing SRAM macro for binary neural networks. The memory
macro computes XNOR-and-accumulate for binary/ternary deep convolutional neural networks
on the bitline without row-by-row data access. It achieves 33X better energy and …

Efficient Process-in-Memory Architecture Design for Unsupervised GAN-based Deep Learning
using ReRAM

Chen
Fan

The ending of Moore’s Law makes domain-specific architecture as the future of computing.
The most representative is the emergence of various deep learning accelerators. Among
the proposed solutions, resistive random access memory (ReRAM) based process-…

DigitalPIM: Digital-based Processing In-Memory for Big Data Acceleration

Imani
Mohsen

In this work, we design, DigitalPIM, a Digital-based Processing In-Memory platform
capable of accelerating fundamental big data algorithms in real time with orders of
magnitude more energy efficient operation. Unlike the existing near-data processing
…

In-memory Processing based on Time-domain Circuit

Kong
Yuyao

Deep Neural Networks (DNN) have emerged as a dominant algorithm for machine learning
(ML). High performance and extreme energy efficiency are critical for deployments
of DNN, especially in mobile platforms such as autonomous vehicles, cameras, and other
…

SESSION: Special Session 4: Opportunities and Challenges for Emerging Monolithic 3D Integrated
Circuits

An Overview of Thermal Challenges and Opportunities for Monolithic 3D ICs

Shukla
Prachi

Monolithic 3D (Mono3D) is a three-dimensional integration technology that can overcome
some of the fundamental limitations faced by traditional, two-dimensional scaling.
This paper analyzes the unique thermal characteristics of Mono3D ICs by simulating
…

Logic Monolithic 3D ICs: PPA Benefits and EDA Tools Necessary

Pentapati
Sai Surya Kiran

Monolithic 3D (M3D) ICs provide a way to achieve high performance and low power designs
within the same technology node, thereby bypassing the need for transistor scaling.
M3D ICs have multiple 2D tiers sequentially fabricated on top of each other and …

Investigation and Trade-offs in 3DIC Partitioning Methodologies: N/A

Sketopoulos
Nikolaos

In this work, we compare alternative 3DIC partitioning methodologies, in terms of
slack, number of inter-tier vias, Tier Area Ratio (TAR) and HPWL design parameters.
The popular 3DIC postplacement, bin-based Fidducia-Mattheyses (FM) partitioning flow
is …

Test and Design-for-Testability Solutions for Monolithic 3D Integrated Circuits

Koneru
Abhishek

M3D integration can result in reduced area and higher performance when compared to
3D die stacking. Due to the benefits of M3D integration, there is growing interest
in industry towards the adoption of this technology. However, test challenges for
M3D …

N3XT Monolithic 3D Energy-Efficient Computing Systems

Aly
Mohamed M. Sabry

The world’s appetite for analyzing massive amounts of structured and unstructured
data has grown dramatically. The computational demands of these abundant-data applications
far exceed the capabilities of today’s computing systems. The N3XT (Nano-…

SESSION: Special Session 5: Robust IC Authentication and Protected Intellectual Property: A
Special Session on Hardware Security

How to Generate Robust Keys from Noisy DRAMs?

Karimian
Nima

Security primitives based on Dynamic Random Access Memory (DRAM) can provide cost-efficient
and practical security solutions, especially for resource-constrained devices, such
as hardware used in the Internet of Things (IoT), as DRAMs are an intrinsic …

Threats on Logic Locking: A Decade Later

Zamiri Azar
Kimia

To reduce the cost of ICs and to meet the market’s demand, a considerable portion
of manufacturing supply chain, including silicon fabrication, packaging and testing
may be pushed offshore. Utilizing a global IC manufacturing supply chain, and inclusion
…

On Custom LUT-based Obfuscation

Kolhe
Gaurav

Logic obfuscation yields hardware security against various threats, such as Intellectual
Property (IP) piracy and reverse engineering. Evolving Boolean satisfiability (SAT)
attacks have challenged the hardware security assurance rendered by various …

Securing Analog Mixed-Signal Integrated Circuits Through Shared Dependencies

Juretus
Kyle

The transition to a horizontal integrated circuit (IC) design flow has raised concerns
regarding the security and protection of IC intellectual property (IP). Obfuscation
of an IC has been explored as a potential methodology to protect IP in both the …

SESSION: Special Session 6: Neuromorphic Computing and Deep Neural Network

Design Methodology for Embedded Approximate Artificial Neural Networks

Balaji
Adarsha

Artificial neural networks (ANNs) have demonstrated significant promise while implementing
recognition and classification applications. The implementation of pre-trained ANNs
on embedded systems requires representation of data and design parameters in …

Exploration of Segmented Bus As Scalable Global Interconnect for Neuromorphic Computing

Balaji
Adarsha

Spiking Neural Networks (SNNs) are efficient computation models for spatio-temporal
pattern recognition on resource and power constrained platforms. Dedicated SNN hardware,
also called neuromorphic hardware, can further reduce the energy consumption of …

ADMM-based Weight Pruning for Real-Time Deep Learning Acceleration on Mobile Devices

Li
Hongjia

Deep learning solutions are being increasingly deployed in mobile applications, at
least for the inference phase. Due to the large model size and computational requirements,
model compression for deep neural networks (DNNs) becomes necessary, especially …

On the use of Deep Autoencoders for Efficient Embedded Reinforcement Learning

Prakash
Bharat

In autonomous embedded systems, it is often vital to reduce the amount of actions
taken in the real world and energy required to learn a policy. Training reinforcement
learning agents from high dimensional image representations can be very expensive
and …

SESSION: Panelist Position Papers

Tuning Track-based NVM Caches for Low-Power IoT Devices

Aghaei Khouzani
Hoda

Track-based non-volatile memories, such as Domain Wall Memory (DWM) and Skyrmion,
are promising candidates to be used as CPU caches due to their ultra-high density
and low-static power. However, the access latency and energy of these devices are
highly …

Dynamic Computation Migration at the Edge: Is There an Optimal Choice?

Shahhosseini
Sina

In the era of Fog computing where one can decide to compute certain time-critical
tasks at the edge of the network, designers often encounter a question whether the
sensor layer provides the optimal response time for a service, or the Fog layer, or
…

Solving Energy and Cybersecurity Constraints in IoT Devices Using Energy Recovery
Computing

Thapliyal
Himanshu

With the growth of Internet-of-Things (IoT), the potential threat vectors for malicious
cyber and hardware attacks are rapidly expanding. As the IoT paradigm emerges, there
are challenging requirements to design energy-efficient and secure systems. To …

Right-Provisioned IoT Edge Computing: An Overview

Adegbija
Tosiron

Edge computing on the Internet of Things (IoT) is an increasingly popular paradigm
in which computation is moved closer to the data source (i.e., edge devices). Edge
computing mitigates the overheads of cloud-based computing arising from increased
…

Secure Computing Systems Design Through Formal Micro-Contracts

Kinsy
Michel A.

Two enduring concepts in computer system design are abstraction levels and layered
composition. The design generally takes a layered approach where each layer implements
a different abstraction of the system. The layers communicate through interfaces …