SESSION: Keynote & Invited Talks

Thoughts on Edge Intelligence

  • Wolf

Machine learning methods have exploded in the past half-dozen years. Machine learning
is being applied to a huge range of problems across the spectrum of applications.
Initial results relied on server-oriented computations. But many applications will

Automatic Implementation of Secure Silicon

  • Leef

Throughout the past decade, cybersecurity threats have evolved from attacks focused
high in the software stack to progressively lower levels of computational hierarchy.
With the explosion of popularity and growing deployment of internet connected …

Processing Data Where It Makes Sense in Modern Computing Systems: Enabling In-Memory Computation

  • Mutlu

Today’s systems are overwhelmingly designed to move data to computation. This design
choice goes directly against at least three key trends in systems that cause performance,
scalability and energy bottlenecks: 1) data access from memory is already a …

Innovations in IoT for a Safe, Secure, and Sustainable Future

  • Bhunia

Internet of things (IoT) promises to usher in the fourth industrial revolution through
an exponential growth of smart connected devices deployed in myriad application domains.
It gives rise to new relationships between man and smart connected machines …

SESSION: Tech Session 1: Design and Integration of Hardware Security Primitives

LPN-based Device Authentication Using Resistive Memory

  • Arafin
    Md Tanvir

Recent progress in the design and implementation of resistive memory components such
as RRAMs and PCMs has introduced opportunities for developing novel hardware security
solutions using unique physical properties of these devices. In this work, we …

Leveraging On-Chip Voltage Regulators Against Fault Injection Attacks

  • Vosoughi

The security implications of utilizing an on-chip voltage regulator as a countermeasure
against fault injection attacks are investigated in this paper. The effect of the
size of the capacitors and number of phases of the voltage regulator on the …

On the Theoretical Analysis of Memristor based True Random Number Generator

  • Uddin

Emerging nano-devices like memristors display stochastic switching behavior which
poses a big uncertainty in their implementation as the next-generation CMOS alternative.
However, this stochasticity provides an opportunity to design circuits for …

Control-Lock: Securing Processor Cores Against Software-Controlled Hardware Trojans

  • Šišejković

Malicious circuit modifications known as hardware Trojans represent a rising threat
to the integrated circuit supply chain. As many Trojans are activated based on a specific
sequence of circuit states, we have recognized the ease of utilizing an …

Lightweight Authenticated Encryption for Network-on-Chip Communications

  • Harttung

In recent years, Network-on-Chip (NoC) has gained increasing popularity as a promising
solution for the challenging interconnection problem in multi-processor systems-on-chip
(MPSoCs). However, the interest of adversaries to compromise such systems grew …

SESSION: Tech Session 2: VLSI Circuits and Power Aware Design

Design of a Low-power and Small-area Approximate Multiplier using First the Approximate
and then the Accurate Compression Method

  • Yang

Recently emerging applications, such as convolution neural networks (CNNs), which
process thousands of convolutional computations, require a large amount of power.
Multiplication is the key arithmetic in these applications and an approximate multiplier

GraphiDe: A Graph Processing Accelerator leveraging In-DRAM-Computing

  • Angizi

In this paper, we propose GraphiDe, a novel DRAM-based processing-in-memory (PIM)
accelerator for graph processing. It transforms current DRAM architecture to massively
parallel computational units exploiting the high internal bandwidth of the modern

An Efficient Time-based Stochastic Computing Circuitry Employing Neuron-MOS

  • Erlina

A compact and low energy circuitry of time-based stochastic computing (TBSC) have
been designed. In the TBSC theory, stochastic numbers (SNs) are represented by duty-cycle
of periodic signals. Additionally, multiplication and addition operations of the …

Monolithic 8×8 SiPM with 4-bit Current-Mode Flash ADC with Tunable Dynamic Range

  • Vinayaka

A monolithic photon-counting receiver consisting of an integrated silicon-photomultiplier
and a current-mode analog-to-digital converter (ADC) was designed, simulated and fabricated
in the AMS 0.35 μm SiGe BiCMOS process. The silicon photomultiplier (…

SESSION: Tech Session 3: : VLSI for Machine Learning and Artificial Intelligence

A Systolic SNN Inference Accelerator and its Co-optimized Software Framework

  • Guo

Although Deep Neural Network (DNN) architectures have made some breakthroughs in computer
vision tasks, they are not close to biological brain neurons. Spiking Neural Network
(SNN) is highly expected to bridge the gap between artificial computing …

Dynamic Beam Width Tuning for Energy-Efficient Recurrent Neural Networks

  • Jahier Pagliari

Recurrent Neural Networks (RNNs) are state-of-the-art models for many machine learning
tasks, such as language modeling and machine translation. Executing the inference
phase of a RNN directly in edge nodes, rather than in the cloud, would provide …

Efficient Softmax Hardware Architecture for Deep Neural Networks

  • Du

Deep neural network (DNN) has become a pivotal machine learning and object recognition
technology in the big data era. The softmax layer is one of the key component layers
for completing multi-classification tasks. However, the softmax layer contains …

HSIM-DNN: Hardware Simulator for Computation-, Storage- and Power-Efficient Deep Neural Networks

  • Sun

Deep learning that utilizes large-scale deep neural networks (DNNs) is effective in
automatic high-level feature extraction but also computation and memory intensive.
Constructing DNNs using block-circulant matrices can simultaneously achieve hardware

SESSION: Tech Session 4: Next Generation Interconnect: Architecture to Physical Design

An Area-Efficient Iterative Single-Precision Floating-Point Multiplier Architecture
for FPGA

  • Kim

Approximate multipliers have been widely used in critical applications, such as machine
learning and multimedia, which are tolerant to approximation errors. This paper proposes
a novel single-precision floating-point (SPFP) multiplication algorithm and …

An Automatic Transistor-Level Tool for GRM FPGA Interconnect Circuits Optimization

  • Li

Due to its dominance in FPGA area and delay, the interconnect circuit is traditionally
designed and optimized in full customized fashion, which can be extremely time consuming.
In this paper, we propose an automated transistor-level sizing optimization …

Low Voltage Clock Tree Synthesis with Local Gate Clusters

  • Sitik

In this paper, a novel local clock gate cluster-aware low voltage clock tree synthesis
methodology is introduced. In low voltage/swing clocking, timing closure is a challenging
problem due to tight skew and slew constraints. The clock gating makes this …

SESSION: Tech Session 5: Designing robust VLSI circuits. From approximate computing to hardware

TOIC: Timing Obfuscated Integrated Circuits

  • Alam

To counter the threats of reverse engineering (RE) and Trojan in-sertion, researchers
have considered gate-level obfuscation in inte-grated circuits (IC) as a viable solution.
However, several techniques are present in the literature to crack the …

Design for Eliminating Operation Specific Power Signatures from Digital Logic

  • Majumder
    Md Badruddoja

Conventional digital logic operations have distinguishable power signatures. Side
channel power analysis combined with classification algorithm can reveal unknown logic
operations. Revealing the underlying operations is the main task in reverse …

Non-Uniform Temperature Distribution in Interconnects and Its Impact on Electromigration

  • Abbasinasab

We investigate the effect of electrically induced thermal load on interconnect reliability
and aging. We propose new models for uniform and non-uniform temperature evolution
and its steady state distribution in interconnects considering Joule heating …

Fault Classification and Coverage of Analog Circuits using DC Operating Point and
Frequency Response Analysis

  • Sanyal

Detection of faults in a mixed-signal SOC at the pre-silicon stage is a challenge,
especially when it has substantial analog components. Given the time taken for simulating
analog circuits, designing tests to detect faults in them is not a …

Crash Skipping: A Minimal-Cost Framework for Efficient Error Recovery in Approximate Computing Environments

  • Verdeja Herms

We present a lightweight technique to minimize error recovery costs in approximate
computing environments. We take advantage of the key observation that if an application
crashes in a “non-critical” region of its execution, then skipping the crash and …

SESSION: Tech Session 6: Emerging Computing & Post-CMOS Technologies

Voltage-Controlled Magnetoelectric Memory Bit-cell Design With Assisted Body-bias

  • Cai

Voltage-controlled magnetic anisotropy (VCMA)-magnetic tunnel junction (MTJ) is incorporated
into FD-SOI CMOS technology. The design space of 1 transistor-1 MTJ (1T-1M) bit-cell
is explored through varied VCMA pulse duration/amplitude and scaling down …

Low Cost Hybrid Spin-CMOS Compressor for Stochastic Neural Networks

  • Li

With expansion of neural network (NN) applications lowering their hardware implementation
cost becomes an urgent task especially in back-end applications where the power-supply
is limited. Stochastic computing (SC) is a promising solution to realize low-…

Functionally Complete Boolean Logic and Adder Design Based on 2T2R RRAMs for Post-CMOS
In-Memory Computing

  • Yang

In-memory computing (IMC) paradigm has attracted extensive attention for future electronics
to overcome the bottleneck and memory wall problem in the von Neumann systems. Nonvolatile
logic based on resistive random-access memory (RRAM) is a promising …

Jump Search: A Fast Technique for the Synthesis of Approximate Circuits

  • Witschen

State-of-the-art frameworks for generating approximate circuits automatically explore
the search space in an iterative process – often greedily. Synthesis and verification
processes are invoked in each iteration to evaluate the found solutions and to …

SESSION: Tech Session 7: Physical Design and Obfuscation

SAT-Based Placement Adjustment of FinFETs inside Unroutable Standard Cells Targeting
Feasible DRC-Clean Routing

  • Sorokin

In this paper, we present an algorithm of transistor placement that takes unroutable
standard cells and makes them routable by moving transistors in local windows. It
converts the task of placement of gridded FinFETs into a Boolean problem and employs

A Scalable and Process Variation Aware NVM-FPGA Placement Algorithm

  • Yang

As non-volatile memory (NVM) based FPGAs gain increasing popularity, FPGA synthesis
tools start to tune the synthesis flow to match NVM characteristics. State-of-the-art
NVM FPGA placement algorithms tried to reduce the high reconfiguration cost induced

Functional Obfuscation of Hardware Accelerators through Selective Partial Design Extraction
onto an Embedded FPGA

  • Hu

The protection of Intellectual Property (IP) has emerged as one of the most serious
areas of concern in the semiconductor industry. To address this issue, we present
a method and architecture to map selective portions of a design, given as a behavioral

HydraRoute: A Novel Approach to Circuit Routing

  • Khasawneh

Routing for dense circuits is a major challenge for VLSI physical design. Most routing
approaches rely at least partially on a “rip-up and reroute” scheme, where solution
quality and run times can be impacted profoundly by the order in which nets are …

SESSION: Tech Session 8: Quantum Circuits and Emerging Technologies

Balanced Factorization and Rewriting Algorithms for Synthesizing Single Flux Quantum
Logic Circuits

  • Pasandi

Single Flux Quantum (SFQ) logic with switching energy of 100zJ1 and switching delay
of 1ps is a promising post-CMOS candidate. Logic synthesis of these magnetic-pulse-based
circuits is a very important step in their design flow with a big impact on the …

A Majority Logic Synthesis Framework for Adiabatic Quantum-Flux-Parametron Superconducting

  • Cai

Adiabatic Quantum-Flux-Parametron (AQFP) logic is an adiabatic superconductor logic
that has been proposed as alternative to CMOS logic with extremely high energy efficiency.
In AQFP technology, majority-based gates have the same area as two-input AND/…

A Processing-In-Memory Implementation of SHA-3 Using a Voltage-Gated Spin Hall-Effect
Driven MTJ-based Crossbar

  • Yang

Processing-In-Memory (PIM), which implements logic operations within memory cells,
opens up a new direction on organizing data and computation. Leveraging resistive
or magnetic characteristics of nonvolatile memory (NVM) devices, platforms such as
PLiM …

Exploring Processing In-Memory for Different Technologies

  • Gupta

The recent emergence of IoT has led to a substantial increase in the amount of data
processed. Today, a large number of applications are data intensive, involving massive
data transfers between processing core and memory. These transfers act as a …

SESSION: Tech Session 9: Towards Fast, Efficient, and Robust Memory

BLADE: A BitLine Accelerator for Devices on the Edge

  • Simon
    William Andrew

The increasing ubiquity of edge devices in the consumer market, along with their ever
more computationally expensive workloads, necessitate corresponding increases in computing
power to support such workloads. In-memory computing is attractive in edge …

Enhancing the Lifetime of Non-Volatile Caches by Exploiting Module-Wise Write Restriction

  • Agarwal

The emerging Non-Volatile Memory (NVM) technologies offer a good combination of high
density and near-zero leakage power, becoming the strongest candidate in the memory
hierarchy including caches. However, the weak write endurance of these memories …

Mitigating the Performance and Quality of Parallelized Compressive Sensing Reconstruction
Using Image Stitching

  • Namazi

Orthogonal Matching Pursuit is an iterative greedy algorithm used to find a sparse
approximation for high-dimensional signals. The algorithm is most popularly used in
Compressive Sensing, which allows for the reconstruction of sparse signals at rates

Towards Optimizing Refresh Energy in embedded-DRAM Caches using Private Blocks

  • Manohar
    Sheel Sindhu

In recent years, the increased working set size of applications craves for more memory
demand in terms of large size Last Level Caches (LLC). To fulfill this, embedded DRAM
(eDRAM) caches have been considered as one of the best alternatives over …

SESSION: Tech Session 10: MSE

Extending Student Labs with SMT Circuit Implementation

  • Brunvand

Computer Science and Computer Engineering classes related to digital circuits, embedded
systems, Human Computer Interaction (HCI), and a wide variety of “maker” subjects,
would often like to include physical computing projects. Extending these physical

Teaching the Next Generation of Cryptographic Hardware Design to the Next Generation
of Engineers

  • Aysu

Evolving threats against cryptographic systems and the increasing diversity of computing
platforms enforce teaching cryptographic engineering to a wider audience. This paper
describes the development of a new graduate course on hardware security taught …

A Web-based Remote FPGA Laboratory for Computer Organization Course

  • Wan

Learning in digital systems could be enhanced by applying a learn-by-doing mechanism.
In this paper the implementation of a web-based remote FPGA laboratory for Computer
Organization course is proposed. The projects created for this course are designed

System-on-a-Chip Design as a Platform for Teaching Design and Design Flow Integration

  • Covey

The design of microelectronic systems requires integration and cooperation across
multiple disciplines, but most curriculum is taught in unconnected pieces. This makes
the creation of manageable projects that reflect the design experience very …

SESSION: Poster Sessions I, II

UPIM: Unipolar Switching Logic for High Density Processing-in-Memory Applications

  • Sim

Internet of Things (IoT) has built a network with billions of connected devices which
generate massive volumes of data. Processing large data on existing systems requires
significant costs for data movements between processors and memory due to limited

Fence-Region-Aware Mixed-Height Standard Cell Legalization

  • Do

We propose a fence-region-aware mixed-height standard cell legalization that can optimize
the placement of standard cells that have more than a two row height in various shapes
of the fence region. The algorithm consists of pre-legalization and mixed-…

A Case for Heterogeneous Network-on-Chip Based H.264 Video Decoders

  • Ghorbani Moghaddam

The design of a heterogeneous network-on-chip (NoC) based H.264 video decoder is proposed.
A thorough investigation using a system simulator developed as the combination of
a cycle accurate NoC simulator together with complete implementations of all the …

A 16b Clockless Digital-to-Analog Converter with Ultra-Low-Cost Poly Resistors Supporting
Wide-Temperature Range from -40°C to 85°C

  • Wang

High-precision digital-to-analog converter (DAC) is a critical component in process
control, data acquisition, and testing instruments. In order to achieve high resolution
and a wide-temperature range, conventional designs have been adopting high-cost …

A Skyrmion Racetrack Memory based Computing In-memory Architecture for Binary Neural
Convolutional Network

  • Pan

A Skyrmion Racetrack Memory (SRM) based Computing In-Memory Architecture (SRM-CIM)
was proposed in this paper. Both data and computing operation can be achieved in SRM-CIM.
SRM-CIM is used to support convolutional computing in Binary Convolutional …

TASecure: Temperature-Aware Secure Deletion Scheme for Solid State Drives

  • Li

With the increasing concerns of security, the secure deletion for SSDs becomes very
costly due to its out-of-place update (i.e., an update is performed in a new location
leaving the old data un-touched). Some previous studies used a combined erase-based

An Asymmetric Dual Output On-Chip DC-DC Converter for Dynamic Workloads

  • Liu

We propose a novel two-stage hybrid on-chip DC-DC converter targeting low power applications
with multiple supply voltage domains and dynamic workloads. The converter has a nominal
input voltage of 1.2V and generates two asymmetrically regulated output …

CNNWire: Boosting Convolutional Neural Network with Winograd on ReRAM based Accelerators

  • Lin

Resistive random access memory (ReRAM) demonstrates the great potential of in-memory
processing for neural network (NN) acceleration. However, since the convolutional
neural network (CNN) is widely known as compute-bound, current ReRAM-based …

Feed-Forward XOR PUFs: Reliability and Attack-Resistance Analysis

  • Avvaru
    S. V. Sandeep

Physical unclonable functions (PUFs) can be used to generate unique signatures of
integrated circuit (IC) chips. XOR arbiter PUFs (XOR PUFs), that typically contain
multiple standard arbiter PUFs as their components, are more secure than standard

Exploring Design Trade-offs in Fault-Tolerant Behavioral Hardware Accelerators

  • Zhu

High-Level Synthesis (HLS) allows the automatic generation of hardware accelerators
with unique design metrics. This work leverages this unique feature and presents a
method to increase the search space of fault-tolerant hardware accelerators. The …

Automatic Extraction of Requirements from State-based Hardware Designs for Runtime

  • Seo

Runtime monitoring and verification enables a system to monitor itself and ensure
system requirements are met even in the presence of dynamic environments. For hardware,
state-based models are widely used, but verifying the correctness between the state-…

MirrorCache: An Energy-Efficient Relaxed Retention L1 STTRAM Cache

  • Kuan

Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAMs in on-chip caches,
due to several advantages, including non-volatility, low leakage, high integration
density, and CMOS compatibility. However, STTRAMs’ wide adoption in resource-…

Design and Evaluation of DNU-Tolerant Registers for Resilient Architectural State

  • Alghareb
    Faris S.

In this work, we aim to maintain the correct execution of instructions in the pipeline
stages. To achieve that, the integrity for the data computed in registers during execution
should be maintained via protecting the susceptible registers. Thus, we …

Automated Analysis of Virtual Prototypes at Electronic System Level

  • Goli

The exponential increase in functionality of System-on-Chips (SoCs) and reduced Time-to-Market
(TTM) requirements have significantly altered the typical design and verification
flow. Virtual Prototyping (VP) at the Electronic System Level (ESL) using …

Dynamic Physically Unclonable Functions

  • Xiong

Physical variations in the manufacturing processes of electronic devices have been
widely leveraged to design Physically Unclonable Functions (PUFs), which can be used
for authentication and key storage. Existing PUFs are static, as their PUF responses

RDTA: An Efficient Routability-Driven Track Assignment Algorithm

  • Liu

This paper presents a routability-driven track assignment algorithm (RDTA) to efficiently
estimate routability. Routability has become a very challenging issue in modern IC
design and it can be effectively estimated by routing congestion. Track …

EraseMe: A Defense Mechanism against Information Leakage exploiting GPU Memory

  • Fang

Graphics Processing Units (GPU) play a major role in speeding up computational tasks
of the users, especially in applications such as high volume text and image processing.
Recent works have demonstrated the security problems associated with GPU that do …

A Statistical Current and Delay Model Based on Log-Skew-Normal Distribution for Low
Voltage Region

  • Cao

The increasing performance variation and non-Gaussian distribution pose remarkable
challenges to timing analysis for circuits operating in low voltage region. Accurate
modeling of the statistical characteristics is urgently required with process …

Enabling Approximate Storage through Lossy Media Data Compression

  • Worek

As compute capabilities continue to scale, memory capacity and bandwidth continue
to lag behind. Data compression is an effective approach to improving memory capacity
and bandwidth; but prior works have focused primarily on lossless compression and

Thermal Fingerprinting of FPGA Designs through High-Level Synthesis

  • Chen

This work investigates if temperature can be used to fingerprint FPGA designs and
presents a method to generate a large number of functionally equivalent FPGA designs
such that each design has a unique distinguishable thermal signature. The main …

Deep RNN-Oriented Paradigm Shift through BOCANet: Broken Obfuscated Circuit Attack

  • Tehranipoor

Logic encryption obfuscation has been used for thwarting counterfeiting, overproduction,
and reverse engineering but vulnerable to attacks. However, it was recently shown
that satisfiability – checking (SAT) can potentially compromise hardware …

STAT: Mean and Variance Characterization for Robust Inference of DNNs on Memristor-based

  • Zhang

An emerging solution to accelerate the inference phase of deep neural networks (DNNs)
is to utilize memristor crossbar arrays (MCAs) to perform highly efficient matrix-vector
multiplication in the analog domain. An adverse challenge is that memristor …

LSM: Novel Low-Complexity Unified Systolic Multiplier over Binary Extension Field

  • Xie

Unified (hybrid field-size) systolic multiplier over GF(2m) (binary extension field)
has attracted significant attentions from research communities recently as it can
be used in reconfigurable cryptographic processors. In this paper, we present a novel

Binarized Depthwise Separable Neural Network for Object Tracking in FPGA

  • Yang

Object tracking has achieved great advances in the past few years and has been widely
applied in vision-based application. Nowadays, deep convolutional neural network has
taken an important role in object tracking tasks. However, its enormous model size

An Analytical-based Hybrid Algorithm for FPGA Placement

  • Hu

As the capacity of FPGA increases, FPGA placers that adopt Simulated Annealing (SA)
algorithm take more and more runtime. To solve this problem, this paper presents HCAS,
a Hybrid algorithm Combining Analytical method and SA. There are three …

Approximate Memory with Approximate DCT

  • Ma

Approximate Computing is an emerging computing paradigm where one exploits inherent
error resilience of certain applications (e.g., digital signal processing, multimedia
and artificial intelligence) and trades off absolute computation precisions for …

AQuRate: MRAM-based Stochastic Oscillator for Adaptive Quantization Rate Sampling of Sparse

  • Salehi

Recently, the promising aspects of compressive sensing have inspired new circuit-level
approaches for their efficient realization within the literature. However, most of
these recent advances involving novel sampling techniques have been proposed …

Clockless Spin-based Look-Up Tables with Wide Read Margin

  • Salehi

In this paper, we develop a 6-input fracturable non-volatile Clockless LUT (C-LUT)
using spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and provide a
detailed comparison between the SHE-MTJ-based C-LUT and Spin Transfer Torque (STT)-MTJ-…

A Hybrid Framework for Functional Verification using Reinforcement Learning and Deep

  • Singh

In this paper, we propose a novel hybrid verification framework (HVF) which uses Reinforcement
Learning (RL) and Deep Neural Networks (DNNs) to accelerate the verification of complex
systems. More precisely, our HVF incorporates RL to generate all …

SESSION: Special Session 1: In-Memory Processing for Future Electronics

Digital and Analog-Mixed-Signal In-Memory Processing in CMOS SRAM

  • Jaiswal

Ferroelectric FET Based In-Memory Computing for Few-Shot Learning

  • Laguna
    Ann Franchesca

As CMOS technology advances, the performance gap between the CPU and main memory has
not improved. Furthermore, the hardware deployed for Internet of Things (IoT) applications
need to process ever growing volumes of data, which can further exacerbate …

True In-memory Computing with the CRAM: From Technology to Applications

  • Zabihi

An Overview of In-memory Processing with Emerging Non-volatile Memory for Data-intensive

  • Li

The conventional von Neumann architecture has been revealed as a major performance
and energy bottleneck for rising data-intensive applications. The decade-old idea
of leveraging in-memory processing to eliminate substantial data movements has returned

SESSION: Special Session 2: Approximate Computing Systems Design: Energy Efficiency and Security

Security Threats in Approximate Computing Systems

  • Yellu

Approximate computing systems improve energy efficiency and computation speed at the
cost of reduced accuracy on system outputs. Existing efforts mainly explore the feasible
approximation mechanisms and their implementation methods. There is limited …

Characterizing Approximate Adders and Multipliers Optimized under Different Design

  • Jiang

Taking advantage of the error resilience in many applications as well as the perceptual
limitations of humans, numerous approximate arithmetic circuits have been proposed
that trade off accuracy for higher speed or lower power in emerging applications …

Approximate Communication Strategies for Energy-Efficient and High Performance NoC: Opportunities and Challenges

  • Reza
    Md Farhadur

With the advancement and miniaturization of transistor technology, hundreds of cores
can be integrated on a single chip. Network-on-Chips (NoCs) are the de facto on-chip
communication fabrics for multi/many core systems because of their benefits over …

Information Hiding behind Approximate Computation

  • Wang

There are many interesting advances in approximate computing recently targeting the
energy efficiency in system design and execution. The basic idea is to trade computation
accuracy for power and energy during all phases of the computation, from data to …

MLPrivacyGuard: Defeating Confidence Information based Model Inversion Attacks on Machine Learning

  • Alves
    Tiago A. O.

As services based on Machine Learning (ML) applications find increasing use, there
is a growing risk of attack against such systems. Recently, adversarial machine learning
has received a lot of attention, where an adversary is able to craft an input or …

SESSION: Special Session 3: Recent Advances in Near and In-Memory Computing Circuit ?

XNOR-SRAM: In-Bitcell Computing SRAM Macro based on Resistive Computing Mechanism

  • Jiang

We present an in-memory computing SRAM macro for binary neural networks. The memory
macro computes XNOR-and-accumulate for binary/ternary deep convolutional neural networks
on the bitline without row-by-row data access. It achieves 33X better energy and …

Efficient Process-in-Memory Architecture Design for Unsupervised GAN-based Deep Learning
using ReRAM

  • Chen

The ending of Moore’s Law makes domain-specific architecture as the future of computing.
The most representative is the emergence of various deep learning accelerators. Among
the proposed solutions, resistive random access memory (ReRAM) based process-…

DigitalPIM: Digital-based Processing In-Memory for Big Data Acceleration

  • Imani

In this work, we design, DigitalPIM, a Digital-based Processing In-Memory platform
capable of accelerating fundamental big data algorithms in real time with orders of
magnitude more energy efficient operation. Unlike the existing near-data processing

In-memory Processing based on Time-domain Circuit

  • Kong

Deep Neural Networks (DNN) have emerged as a dominant algorithm for machine learning
(ML). High performance and extreme energy efficiency are critical for deployments
of DNN, especially in mobile platforms such as autonomous vehicles, cameras, and other

SESSION: Special Session 4: Opportunities and Challenges for Emerging Monolithic 3D Integrated

An Overview of Thermal Challenges and Opportunities for Monolithic 3D ICs

  • Shukla

Monolithic 3D (Mono3D) is a three-dimensional integration technology that can overcome
some of the fundamental limitations faced by traditional, two-dimensional scaling.
This paper analyzes the unique thermal characteristics of Mono3D ICs by simulating

Logic Monolithic 3D ICs: PPA Benefits and EDA Tools Necessary

  • Pentapati
    Sai Surya Kiran

Monolithic 3D (M3D) ICs provide a way to achieve high performance and low power designs
within the same technology node, thereby bypassing the need for transistor scaling.
M3D ICs have multiple 2D tiers sequentially fabricated on top of each other and …

Investigation and Trade-offs in 3DIC Partitioning Methodologies: N/A

  • Sketopoulos

In this work, we compare alternative 3DIC partitioning methodologies, in terms of
slack, number of inter-tier vias, Tier Area Ratio (TAR) and HPWL design parameters.
The popular 3DIC postplacement, bin-based Fidducia-Mattheyses (FM) partitioning flow
is …

Test and Design-for-Testability Solutions for Monolithic 3D Integrated Circuits

  • Koneru

M3D integration can result in reduced area and higher performance when compared to
3D die stacking. Due to the benefits of M3D integration, there is growing interest
in industry towards the adoption of this technology. However, test challenges for
M3D …

N3XT Monolithic 3D Energy-Efficient Computing Systems

  • Aly
    Mohamed M. Sabry

The world’s appetite for analyzing massive amounts of structured and unstructured
data has grown dramatically. The computational demands of these abundant-data applications
far exceed the capabilities of today’s computing systems. The N3XT (Nano-…

SESSION: Special Session 5: Robust IC Authentication and Protected Intellectual Property: A
Special Session on Hardware Security

How to Generate Robust Keys from Noisy DRAMs?

  • Karimian

Security primitives based on Dynamic Random Access Memory (DRAM) can provide cost-efficient
and practical security solutions, especially for resource-constrained devices, such
as hardware used in the Internet of Things (IoT), as DRAMs are an intrinsic …

Threats on Logic Locking: A Decade Later

  • Zamiri Azar

To reduce the cost of ICs and to meet the market’s demand, a considerable portion
of manufacturing supply chain, including silicon fabrication, packaging and testing
may be pushed offshore. Utilizing a global IC manufacturing supply chain, and inclusion

On Custom LUT-based Obfuscation

  • Kolhe

Logic obfuscation yields hardware security against various threats, such as Intellectual
Property (IP) piracy and reverse engineering. Evolving Boolean satisfiability (SAT)
attacks have challenged the hardware security assurance rendered by various …

Securing Analog Mixed-Signal Integrated Circuits Through Shared Dependencies

  • Juretus

The transition to a horizontal integrated circuit (IC) design flow has raised concerns
regarding the security and protection of IC intellectual property (IP). Obfuscation
of an IC has been explored as a potential methodology to protect IP in both the …

SESSION: Special Session 6: Neuromorphic Computing and Deep Neural Network

Design Methodology for Embedded Approximate Artificial Neural Networks

  • Balaji

Artificial neural networks (ANNs) have demonstrated significant promise while implementing
recognition and classification applications. The implementation of pre-trained ANNs
on embedded systems requires representation of data and design parameters in …

Exploration of Segmented Bus As Scalable Global Interconnect for Neuromorphic Computing

  • Balaji

Spiking Neural Networks (SNNs) are efficient computation models for spatio-temporal
pattern recognition on resource and power constrained platforms. Dedicated SNN hardware,
also called neuromorphic hardware, can further reduce the energy consumption of …

ADMM-based Weight Pruning for Real-Time Deep Learning Acceleration on Mobile Devices

  • Li

Deep learning solutions are being increasingly deployed in mobile applications, at
least for the inference phase. Due to the large model size and computational requirements,
model compression for deep neural networks (DNNs) becomes necessary, especially …

On the use of Deep Autoencoders for Efficient Embedded Reinforcement Learning

  • Prakash

In autonomous embedded systems, it is often vital to reduce the amount of actions
taken in the real world and energy required to learn a policy. Training reinforcement
learning agents from high dimensional image representations can be very expensive
and …

SESSION: Panelist Position Papers

Tuning Track-based NVM Caches for Low-Power IoT Devices

  • Aghaei Khouzani

Track-based non-volatile memories, such as Domain Wall Memory (DWM) and Skyrmion,
are promising candidates to be used as CPU caches due to their ultra-high density
and low-static power. However, the access latency and energy of these devices are
highly …

Dynamic Computation Migration at the Edge: Is There an Optimal Choice?

  • Shahhosseini

In the era of Fog computing where one can decide to compute certain time-critical
tasks at the edge of the network, designers often encounter a question whether the
sensor layer provides the optimal response time for a service, or the Fog layer, or

Solving Energy and Cybersecurity Constraints in IoT Devices Using Energy Recovery

  • Thapliyal

With the growth of Internet-of-Things (IoT), the potential threat vectors for malicious
cyber and hardware attacks are rapidly expanding. As the IoT paradigm emerges, there
are challenging requirements to design energy-efficient and secure systems. To …

Right-Provisioned IoT Edge Computing: An Overview

  • Adegbija

Edge computing on the Internet of Things (IoT) is an increasingly popular paradigm
in which computation is moved closer to the data source (i.e., edge devices). Edge
computing mitigates the overheads of cloud-based computing arising from increased

Secure Computing Systems Design Through Formal Micro-Contracts

  • Kinsy
    Michel A.

Two enduring concepts in computer system design are abstraction levels and layered
composition. The design generally takes a layered approach where each layer implements
a different abstraction of the system. The layers communicate through interfaces …