Test for news - Page 11 of 11

SRC@ICCAD

18 June 2019

Yibo Lin

No comments

Categories: SIGDA Events

ACM Student Research Competition at ICCAD 2021 (SRC@ICCAD’21)

Sponsored by Microsoft Research, the ACM Student Research Competition is an internationally recognized venue enabling undergraduate and graduate students who are ACM members to:

Experience the research world — for many undergraduates, this is a first!
Share research results and exchange ideas with other students, judges, and conference attendees
Rub shoulders with academic and industry luminaries
Understand the practical applications of their research
Perfect their communication skills
Receive prizes and gain recognition from ACM and the greater computing community.

This Year’s Results (2021)

Undergraduate category (8 participants in total):

1st place: Zizheng Guo, Peking University
Presentation Title: Accelerating Static Timing Analysis with Parallel and Heterogeneous Computing

2nd place: Cynthia Chen, California Institute of Technology
Presentation Title: Optimizing Quantum Circuit Synthesis for Permutations on Limited Connectivity Topologies

3rd place: Yu Qian, Zhejiang University
Presentation Title: Energy-Aware Designs of Ferroelectric Ternary Content Addressable Memory

Graduate category (22 participants in total):

1st place: Xiaofan Zhang, UIUC
Presentation Title: Bridge the Hardware-Software Gap: Exploring End-to-End Design Flows for Building Efficient AI Systems

2nd place: Sanmitra Banerjee, Duke University
Presentation Title: Optimizing Emerging AI Accelerators under Random Uncertainties

3rd place: Qi Sun, Chinese University of Hong Kong
Presentation Title: Fast and Efficient Deployment of Deep Learning Algorithms via Learning-based Methods

Submission

The ACM Special Interest Group on Design Automation (ACM SIGDA) is organizing such an event in conjunction with the International Conference on Computer Aided Design (ICCAD). Authors of accepted submissions will get ICCAD registration fee support from SIGDA. The event consists of several rounds, as described at http://src.acm.org/, where you can also find more details on student eligibility and timeline.

DEADLINE: September 28, 2021 (Extended)
Online Submission: https://easychair.org/my/conference?conf=srciccad2021

Details on abstract submission:
Research projects from all areas of design automation are encouraged. The author submitting the abstract must still be a student at the time the abstract is due. Each submission should be made on the EasyChair submission site. Please include the author’s name, affiliation, and email address; research advisor’s name; ACM student member number; category (undergraduate or graduate); research title; and an extended abstract (maximum 2 pages or 800 words) containing the following sections:

Problem and Motivation: This section should clearly state the problem being addressed and explain the reasons for seeking a solution to this problem.
Background and Related Work: This section should describe the specialized (but pertinent) background necessary to appreciate the work. Include references to the literature where appropriate, and briefly explain where your work departs from that done by others. Reference lists do not count towards the limit on the length of the abstract.
Approach and Uniqueness: This section should describe your approach in attacking the problem and should clearly state how your approach is novel.
Results and Contributions: This section should clearly show how the results of your work contribute to computer science and should explain the significance of those results. Include a separate paragraph (maximum of 100 words) for possible publication in the conference proceedings that serves as a succinct description of the project.
Single paper summaries (or just cut & paste versions of published papers) are inappropriate for the ACM SRC. Submissions should include at least one year worth of research contributions, but not subsuming an entire doctoral thesis load.

All accepted submissions will be invited to present their work to the community (and a jury) as part of the program for ICCAD 2021 (details on the presentations will follow after acceptance). Note that ICCAD will take place virtually (i.e., as an online event) from November 1 to November 5, 2021.

The ACM Student Research Competition allows both graduate and undergraduate students to discuss their research with student peers, as well as academic and industry researchers, in an informal setting, while enabling them to attend ICCAD and compete with other ACM SRC winners from other computing areas in the ACM Grand Finals.

Online Submission – EasyChair:
https://easychair.org/my/conference?conf=srciccad2021
Important dates:

Abstract submission deadline: September 28, 2021
Acceptance notification: October 12, 2021
Poster session: November 02, 2021
Award winners announced at ICCAD
Grand Finals winners honored at ACM Awards Banquet: June 2022 (Estimated)

Requirement:
Students submitting and presenting their work at SRC@ICCAD’21 are required to be members of both ACM and ACM SIGDA.

Organizers:

Meng Li (Facebook, USA), meng.li@fb.com

Cong Hao (Georgia Institute of Technology, USA), callie.hao@ece.gatech.edu

Last Year’s Results (2020): SIGDA SRC Gold Medalists won ACM SRC Grand Finals

Graduate: First Place

Jiaqi Gu, University of Texas at Austin

Research Advisors: David Z. Pan and Ray T. Chen

“Light in Artificial Intelligence: Efficient Neuromorphic Computing with Optical Neural Networks” (ICCAD 2020)

Deep neural networks have received an explosion of interest for their superior performance in various intelligent tasks and high impacts on our lives. The computing capacity is in an arms race with the rapidly escalating model size and data amount for intelligent information processing. Practical application scenarios, e.g., autonomous vehicles, data centers, and edge devices, have strict energy efficiency, latency, and bandwidth constraints, raising a surging need to develop more efficient computing solutions. However, as Moore’s law is winding down, it becomes increasingly challenging for conventional electrical processors to support such massively parallel and energy-hungry artificial intelligence (AI) workloads. .. [Read more]

Undergraduate: Second Place

Chuangtao Chen, Zhejiang University

Research Advisor: Cheng Zhuo

“Optimally Approximated Floating-Point Multiplier” (ICCAD 2020)

At the edge, IoT devices are designed to consume the minimum resource to achieve the desired accuracy. However, the conventional processors, such as CPU or GPU, can only conduct all the computations with predetermined but sometimes unnecessary precisions, inevitably degrading their energy efficiency. When running data-intensive applications, due to the large range of input operands, most conventional processors heavily rely on floating-point units (FPUs). Recently, approximate computing has become a promising alternative to improve energy efficiency for IoT devices on the edge, especially when running inaccuracy-tolerable applications. For various data-intensive tasks on edge devices, multiplication is a common but the most energy consuming one among different floating-point operations. As a common arithmetic component that has been studied for decades [1]–[3], the past focus on the FP multiplier is accuracy and performance… [Read more]

Ph.D. Forum@DAC 2022

18 June 2019

Umit Ogras

No comments

Categories: SIGDA Events

The Ph.D. Forum at the Design Automation Conference is a poster session hosted by ACM SIGDA and IEEE CEDA for Ph.D. students to present and discuss their dissertation research with people in the EDA community. It has become one of the premier forums for Ph.D. students in design automation to get feedback on their research and to connect with other members of the community. It also enables both, academicians and industry, to see the best graduating students in one place. Presentations are selected through a scientific evaluation by an expert committee consisting of academia and industry. The forum is open to all members of the design automation community and is free-of-charge. It is co-located with DAC, but a DAC registration is not required in order to attend this event.

Submission Process

In order to select the presentations to be featured at the DAC Ph.D. Forum, we are seeking contributions from students who are currently working on their Ph.D. or have recently completed their Ph.D. Corresponding applications have to be submitted through EasyChair and need to include the following two documents:

A two-page PDF abstract of the dissertation (in two-column format, using 10-11 pt. fonts and single-spaced lines), including name, institution, advisor, contact information, estimated (or actual) graduation date, whether the work has been presented at the ASP-DAC Ph.D. Forum or the DATE Ph.D. Forum, as well as figures, and bibliography (if applicable). The two-page limit on the abstract will be strictly enforced: any material beyond the second page will be truncated before sending to the reviewers. Please include a description of the supporting paper, including the publication forum. A list of all papers authored or co-authored by the student, related to the dissertation topic and included in the two-page abstract, will strengthen the submission.
A published (or accepted) paper, in support of the submitted dissertation abstract. The paper must be related to the dissertation topic and the publication forum must have a valid ISBN number. It will be helpful, but is not required, to include your name and the publication forum on the first page of the paper. Papers on topics unrelated to the dissertation abstract or not yet accepted will not be considered during the review process.

Please include the supporting paper with the abstract in one PDF file and submit the single file (there are many free utilities available online which can merge multiple PDF files into a single file if necessary). Then, please submit your application through the following link:

https://easychair.org/conferences/?conf=dacforum22

Important Dates

Abstract Submission: March 13, 2022
Notification Date: May 8, 2022
Forum Presentation Date: July 12, 2022 in San Francisco

Eligibility

All submitters must satisfy the following eligibility constraints:

Dissertation topic must be relevant to the DAC community.
Students with at least one published or accepted conference, symposium or journal paper.
Students within 1-2 years of dissertation completion and students who have completed their dissertation during the 2021-2022 academic year. Students closer to graduation will have higher priority since the rest of the students can attend a future Ph.D. Forum with more mature results.
Students who have presented previously at the DATE and ASP-DAC Ph.D. forums are eligible, but will be less likely to receive travel assistance.
Previous DAC SIGDA Ph.D. forum presenters are not eligible.
Students having a conflict of interests with one of the (co-)chairs and/or a member of the evaluation committee are allowed to submit. The submission will then be handled by different chairs/members and the entire evaluation will be completely blind to the anyone with a CoI.

Furthermore, it is strongly recommended to consider the following remarks:

The abstract is the key part of your submission. Write the abstract for someone familiar with your technical area, but entirely unfamiliar with your work. Clearly indicate the motivation of your Ph.D. dissertation topic, the uniqueness of your approach, as well as the potential impact your approach may have on the topic.
Proper spelling, grammar, and coherent organization are critical: remember that the two pages may be the only information about yourself and your PhD research available to the reviewers.

Travel Support and Best Presentation Award

All presentations selected to be presented at the DAC Ph.D. Forum are eligible to apply for some travel support as well as for getting awarded with a Best Presentation Award. Corresponding information on how to apply for travel support will be provided later to all accepted presenters (however, please note that travel support can only be given to a selected amount of presenters and will only cover a fraction of the actually needed travel costs). The Best Presentation Award is selected by a dedicated committee at the Ph.D. Forum (taking the submission as well as the actual presentation into account). The same CoI guidelines as for submission evaluation apply.

Contact

For questions not addressed on this page, please send an e-mail to Robert Wille (robert.wille@jku.at). Please include “DAC Ph.D Forum” in the subject line of your email.

Organizers Guide

18 June 2019

Yibo Lin

No comments

Categories: SIGDA Events

ACM/SIGDA Guide to Running or Starting a Conference, Symposium, or Workshop

Revised on May 1, 2020

ACM/SIGDA sponsors a number of conferences, symposia, and workshops, which will be referred to generically as events. The event staff are almost always volunteers, and those involved change on a yearly basis. The purpose of this guide is to give a short overview of how events are run, make you aware of services that ACM and SIGDA can provide, and to help simplify the entire process.

This guide is divided into four sections. First is the “financial” aspect of running an event: contracts with hotels, registration, etc. Second is the “administrative” component: selecting a program committee, setting up a timeline, handling paper submissions and reviews, creating and archiving the event web site, and passing control to the next set of organizers. Third is a checklist and timeline, to give an idea of when various tasks should be done. The last part is about the travel grant.

Financial View
SIGDA is a non-profit professional society–there is no expectation that an event (particularly a new one) will return a large surplus. Having some positive revenue, however, is desirable. The bulk of funds that are used to support student travel, reduced student registrations, online access to DA literature, salary for permanent staff, insurance coverage, among other things, comes from conference revenue. SIGDA membership fees provide almost no revenue.

Cosponsorship and In Cooperation
Most events are cosponsored by some branch of the IEEE; some events have other cosponsors. Generally, sponsorship implies financial and legal responsibility for the event. That includes providing insurance, accepting liability for contracts and covering any deficit the conference might incur. If the conference should have a surplus, the sponsor or co-sponsor will receive a portion of that surplus based on their percentage of sponsorship. Co-sponsorship percentages rarely change; both ACM and IEEE are interested in having good cooperation between the societies, and by sharing both risks and rewards across the societies, service is improved for the members. Dual sponsorship also broadens the audience for any advertising, improving overall attendance. In some cases, a group may be “in cooperation” — which implies that they see value in the technical program and wish to lend their name to the event without taking on any financial or legal responsibility.

TMRF — Technical Meeting Request Form
A TMRF is a large spreadsheet that details the expected attendance, registration costs, hotel costs, printing costs, and so forth. The objective is to determine if the event is financially viable, and in keeping with prior years. The organizers of an event will need to file a TMRF, and receive approval, before ACM will accept any financial responsibility. One common concern is with respect to some additional fees in the TMRF based on total expenditures. These fees go to cover ACM insurance and liability expenses, and help cover the salaries of the permanent staff at ACM.

Care should be taken when preparing a TMRF; try to keep all costs and projected revenues within reason–in some cases, approval has been delayed due to budget concerns. We stress again that there is no requirement for an event (particularly a new one) to turn a profit, although this is preferable. If an event is profitable, it enables SIGDA to fund other activities, to support events in new areas, and to weather short term losses without sacrificing member services.

ACM Support Staff
ACM employs a number of permanent staff to assist in planning and running an event. In particular, the staff has data on the following.

Other events in a given city, and on a given date. Hotel prices may be extremely high if you are planning your event in a town that is hosting a major activity.
Listings of hotels in a given town, with rough estimates on the number of attendees they can support, the types of conference rooms available, etc.
Obviously, having a successful event will require good location at a time the attendees find convenient. Consulting the ACM staff on this is highly recommended. The ACM staff involved with supporting events can be found on our Who’s Who page and ACM’s SIG volunteer resources page.

Contracts
Never sign any contract personally. If a disaster occurs, a hotel may hold you responsible for all charges. For example, a conference scheduled to be held a few days after the 9/11 terrorist attack was cancelled. The hotel that was to hold the event lost many room bookings, which was charged back to the sponsoring societies, costing them thousands of dollars. ACM and SIGDA are prepared to accept this type of financial liability. As an event organizer, you should not put yourself in this position.

We recommend that you allow the ACM staff to do the bulk of the negotiation with the hotel or conference center. They are familiar with industry practices, know typical rates, and can use the membership of ACM as leverage for better deals. The staff will keep you informed, and will work to find arrangements that are to your satisfaction.

Registration
Allowing early registration through the web is highly recommended; this is a good way to get an early estimate on attendance. ACM can support electronic registration, but must charge some fees to cover related expenses and the time required for the support staff.

There are several ways to handle on-site registration: you may have either volunteer staff or a professional organization, and you may wish to accept cash, checks, or credit cards. If you accept credit cards, billing immediately will require phone access, equipment, and coordination with a credit agency. We would recommend instead simply recording the credit card number manually, and then having ACM process the charges after the event.

If the event is relatively small, we highly recommend finding volunteers to man the registration desk; professional conference management can be quite expensive

Administration

Executive Committee
Most events have an “executive committee” consiting of a general chair, program committee chair, publications chair, and publicity chair. Larger events may have more positions. In most cases, there is a progression of staff through the positions, allowing new members to gain experience before taking control of an event.

Program Committee
For paper review, a program committee should be formed. We encourage a balance of academic and industry representatives. Selection of committee members should be done carefully: a well-respected group will improve the public perception of accepted papers, encourage good research groups to submit papers, and will improve attendance.

Timeline
We recommend setting a timeline for all tasks related to the event. By setting the timeline, all committee members will know when certain tasks must be done, and will be able to plan accordingly. At the end of this document we show a sample timeline that contains common tasks. Specific dates obviously depend on the event itself. Carefully adjusting the dates to fit in with other events is beneficial. For example, it might be possible to arrange a program committee meeting to follow a widely attended conference, which reduces the cost of attending the meeting and improves committee members’ participation. When possible, advertising should be scheduled to coincide with similar events.

Paper Submission, Review, and Selection
Paper submission should be performed electronically; this greatly simplifies the submission and review process. Supported file formats (PDF, PostScript, DOC, text, etc.) are at the discretion of the program committee, although we suggest that PDF be the preferred format. ACM has style guidelines for proceedings and journal papers, and these should be referenced on any call for papers or submission web site.

There are a number of conference paper management software packages. At one point, ACM investigated supporting one in-house. Each program committee seemed to have a specific package that they were quite loyal to, making centralized support impractical. If your program committee does not have a specific preference, check with ACM staff to see if there is a supported package.

Web based conference software generally supports online review submission. We suggest sending periodic “warning” emails to reviewers, letting them know the review deadlines. Without these reminders, many reviewers may wait until the last minute, resulting in low-quality reviews.

Paper selection should be performed by the program committee in a timely fashion. A fast turn-around on submissions will benefit authors, and increase the number of submitted papers.

Proceedings — Printed and Electronic
The print version of the proceedings will require coordination with the printer. There will be deadlines for final camera-ready paper submissions, table of contents, etc. Plan for some authors being a few days late with submissions, and allow for unexpected delays.

Generally, workshops do not have “published” proceedings. Discuss with the ACM staff if the event material should be considered as a publication. For workshops, many authors may be willing to discuss preliminary results, as long as it does not preclude them from publishing the work in a larger venue.

ACM/SIGDA supports online access to all sponsored event material. It can be made available through the ACM portal, the SIGDA web site, and on annual SIGDA publication compendiums. Part of the revenue from successful conferences has allowed SIGDA to subsidize this publication, making all material available free of charge. For events co-sponsored with IEEE, the material is likely still available free of charge; IEEE and ACM have been cooperating actively to make publications as widely available as possible.

Creation and Archival of a Web Site
ACM provides free-of-charge web hosting and web site archival for sponsored events. Even domain registration fees can be covered. Funding for this activity is derived from budget surpluses from successful sponsored events.

If the web hosting for your event is not currently handled by ACM, contact the staff, and they will assist in setting things up.

Handoff to the Next Organizers
Perhaps the most important task for an executive committee is making arrangements to hand off the event to a new group. The next executive committee will need to know attendance, number of submissions, acceptance rate, planned and actual expenses, and any comments from the attendees. We recommend having the next set of organizers identified early–perhaps by the time of the event–giving enough time for them to prepare and have success for the next year.

Checklist
We would suggest filling in dates for the following events as soon as possible, and then distributing the checklist to the executive committee. This should help committee members from missing important task deadlines, and makes sure that no one is “in the dark.”

Contact ACM staff for preliminary event planning.
Finalize the event executive committee.
Recruit technical program committee members.
Establish event website.
Identify event location and venue; ACM staff members should be able to help.
Submit TMRF to ACM.
Publish “Call for Papers” deadline in print.
Publish “Call for Papers” electronically.
Establish and publish paper submission deadline.
Assign papers to reviewers.
Review submission deadline.
Call meeting of the Technical Program Committee.
Print deadline for “Call for Participation.”
Notify authors.
Have papers ready for camera-ready paper deadline.
Distribute electronic call for participation.
Begin accepting conference registrations.
Identify executive committee members for next year.
Event.
Collect statistics on event for ACM and next organizers.
Hand over control to the next committee.

Travel grants:

If the conference event is solely financially sponsored by the ACM SIGDA, or is jointly financially sponsored by ACM SIGDA and other organizations, the conference organizer is generally suggested to include travel grants into the conference budget. In this case, the travel grant will be handled by the event organizer;
For any reason that 1 cannot be implemented, the participants of the conference can apply for ACM SIGDA travel grants directly from the ACM SIGDA. In this case, the travel grants will be handled by the ACM SIGDA, or handled by the conference organizer authorized by the ACM SIGDA.

CADathlon 2018@ICCAD

18 June 2019

Yibo Lin

No comments

Categories: SIGDA Events

SIGDA’s CADathlon 2018 at ICCAD

Sunday, Nov. 4, 2018 8 am – 5 pm, Hilton San Diego Resort & Spa, San Diego, CA

Welcome to the CADathlon @ ICCAD

Participation Request — Closed.
Team Confirmation Form — Closed.
Important Dates
Participants
Problems and References – 2018’s / 2017’s / 2016’s / 2015’s / 2014’s / 2013’s / 2012’s / archive
Contest Guidelines
Organizing Committee
Winners

The CADathlon is a challenging, all-day, programming competition focusing on practical problems at the forefront of Computer-Aided Design,
and Electronic Design Automation in particular. The contest emphasizes the knowledge of algorithmic techniques for CADapplications,
problem-solving and programming skills, as well as teamwork.
In its 15th year as the “Olympic games of EDA,” the contest brings together the best and the brightest of the next generation of CAD
professionals. It gives academia and the industry a unique perspective on challenging problems and rising stars, and it also helps attract
top graduate students to the EDA field.
The contest is open to two-person teams of graduate students specializing in CAD and currently full-time enrolled in a Ph.D. granting
institution in any country. Students are selected based on their academic backgrounds and their relevant EDA programming experiences.
Partial or full travel grants are provided to qualifying students. CADathlon competition consists of six problems in the following areas:

Circuit Design & Analysis
Physical Design & Design for Manufacturability
Logic & High-Level Synthesis
System Design & Analysis
Functional Verification & Testing
Future technologies (Bio-EDA, Security, AI, etc.)

More specific information about the problems and relevant research papers will be released on the Internet one week prior to the
competition. The writers and judges that construct and review the problems are experts in EDA from both academia and industry. At the
contest, students will be given the problem statements and example test data, but they will not have the judges’ test data. Solutions
will be judged on correctness and efficiency. Where appropriate, partial credit might be given.
The team that earns the highest score is declared the winner. In addition to handsome trophies, the first place and the second place teams
receive cash award, and the contest winners will be announced at the ICCAD Opening Session on Monday morning and celebrated
at the ACM/SIGDA Dinner and Member Meeting on Monday evening.

Global Education Partner:

LIVE

18 June 2019

Yibo Lin

No comments

Categories: SIGDA Events

SIGDA Live is a series of webinars, launched monthly or bi-monthly, on topics (either technical or non-technical) of general interest to the SIGDA community. The talks in general fall on the last Wednesday of a month, and last about 45 minutes plus 15 minutes Q&A. Speaker and topic nominations are welcome and should be sent to sigdalive@gmail.com. All past talks are archived through our Youtube channel at: https://www.youtube.com/channel. Each year we recognize one speaker with the “Most Influential Speaker of the Year” award.

Organizers: Yiyu Shi (University of Notre Dame), Qinru Qiu (Syracuse University)

Technical support: Bei Yu (Chinese University of Hong Kong)

Recent SIGDA-sponsored presentations:

DASS@DAC

18 June 2019

Yibo Lin

No comments

Categories: SIGDA Events

DASS at DAC 2018

The Design Automation Summer School (DASS) is a one-day intensive course on research and development in design automation (DA). Each topic in this course will be covered by a distinguished speaker who will define the topic, describe recent accomplishments, and indicate remaining challenges. Interactive discussions and follow-up activities among the participants will be used to reinforce and expand upon the lessons. This program is intended to introduce and outline emerging challenges, and to foster creative thinking in the next generation of EDA engineers. Simultaneously, they also help the students hone their problem solving, programming, and teamwork skills, in addition to fostering long-term collégial relationships. The 2018 SIGDA Design Automation Summer School is co-hosted by A. Richard Newton Young Fellowship Program at ACM/IEEE Design Automation Conference (DAC). DASS program will be co-hosted by DAC RNYS program and will be held on Sunday June 24, 2018 at Room 30003, Moscone Center West, from 9 a.m. to 6 p.m. in San Francisco, California. Richard Newton Young Student Fellowship Welcome breakfast is held at the same room from 7:30 am to 8:30 am. All the students receiving the fellowship (excluding the mentors) are required to attend DASS event.

The DASS event complements other educational and professional development activities in design automation including outreach projects such as the SIGDA University Booth, the CADathlon, and the Design Automation Conference (DAC) Ph.D. forum that have met with tremendous success over the past decade. Note that there is no separate call for participation for DASS. Attending DASS is mandatory for all the students receiving the Richard Newton Young Fellowship. The DASS final program will be available in late April 2018.

Organizing Committee:

Yier Jin (Univ. of Central Florida, Gainesville, FL) (yier.jin@eecs.ucf.edu)
Muhammad Shafique (TU Wien, Vienna, Austria) (muhammad.shafique@tuwien.ac.at)
Jaytita Das (Intel Corp., Hillsboro, OR) (jayita.365@gmail.com)

SIGDA advisory committee for DASS:

Yiran Chen (Duke University) (yiran.chen@duke.edu)

DASS Schedule

Date: Sunday June 24, 2018
Time: 7:30am – 6:00pm
Location: Room 3003, Moscone Center West, San Francisco, California

The detailed schedule is listed below:

Time	Session title	Speakers	Title
7:30-9:00 am	Breakfast and RNYF Networking
9:00-11:00 am	In-Memory Computations	Onur Mutlu (ETH Zürich)	Processing Data Where It Makes Sense in Modern Computing Systems: Enabling In-Memory Computation
10:00-10:15 am	Coffee Break
11:00am-12:00pm	Neuro-Inspired Learning	Kaushik Roy (Purdue University)	Re-Engineering Computing with Neuro-Inspired Learning: Devices, Circuits, and Systems
12:00-1:00 pm	Lunch
1:00-2:00 pm	EDA	Sani Nassif (Radyalis)	From EDA to ’42’
2:00-3:00 pm	EDA	Kunal Ghosh (VSD)	An overview of RISC-V CPU Core implementation and sign-off using EDA management system
3:00-4:00 pm	EDA	Seetharam Narasimhan (Intel)	Security Evaluation of System-on-Chip (SoC) Products
4:00-4:30 pm	Coffee Break
4:30-6:00 pm		Laleh Behjat (University of Calgary)	Give a Winning Presentation: From Idea to Delivery
6:00 – Onward	Reception and Networking: Welcome Reception Level 3 Lobby

Invited Talks

Title: Processing Data Where It Makes Sense in Modern Computing Systems: Enabling In-Memory Computation
Spaeker: Onur Mutlu (ETH Zürich)
Abstract: Today’s systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: 1) data access from memory is already a key bottleneck as applications become more data-intensive and memory bandwidth and energy do not scale well, 2) energy consumption is a key constraint in especially mobile and server systems, 3) data movement is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are especially severely-felt in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of slightly higher cost. The emergence of 3D-stacked memory plus logic as well as the adoption of error correcting codes inside the latest DRAM chips are an evidence of this trend. In this lecture, I will discuss some recent research that aims to practically enable computation close to data. After motivating trends in applications as well as technology, we will discuss at least two promising directions: 1) performing massively-parallel bulk operations in memory by exploiting the analog operational properties of DRAM, with low-cost changes, 2) exploiting the logic layer in 3D-stacked memory technology in various ways to accelerate important data-intensive applications. In both approaches, we will discuss relevant cross-layer research, design, and adoption challenges in devices, architecture, systems, applications, and programming models. Our focus will be the development of in-memory processing designs that can be adopted in real computing platforms and real data-intensive applications, spanning machine learning, graph processing and genome analysis, at low cost. We will also discuss and describe simulation and evaluation infrastructures that can enable exciting and forward-looking research in future memory systems, including Ramulator and SoftMC.
Biography: Onur Mutlu is a Professor of Computer Science at ETH Zurich. He is also a faculty member at Carnegie Mellon University, where he previously held the William D. and Nancy W. Strecker Early Career Professorship. His current broader research interests are in computer architecture, systems, and bioinformatics. He is especially interested in interactions across domains and between applications, system software, compilers, and microarchitecture, with a major current focus on memory and storage systems. A variety of techniques he, together with his group and collaborators, have invented over the years have influenced industry and have been employed in commercial microprocessors and memory/storage systems. He obtained his PhD and MS in ECE from the University of Texas at Austin and BS degrees in Computer Engineering and Psychology from the University of Michigan, Ann Arbor. His industrial experience spans starting the Computer Architecture Group at Microsoft Research (2006-2009), and various product and research positions at Intel Corporation, Advanced Micro Devices, VMware, and Google. He received the inaugural IEEE Computer Society Young Computer Architect Award, the inaugural Intel Early Career Faculty Award, faculty partnership awards from various companies, a healthy number of best paper or “Top Pick” paper recognitions at various computer systems and architecture venues, and the ACM Fellow recognition “for contributions to computer architecture research, especially in memory systems.” His computer architecture course lectures and materials are freely available on YouTube, and his research group makes software artifacts freely available online. For more information, please see his webpage at http://people.inf.ethz.ch/omutlu/.
Title: Re-Engineering Computing with Neuro-Inspired Learning: Devices, Circuits, and Systems
Speaker: Kaushik Roy (Purdue University)
Abstract: Advances in machine learning, notably deep learning, have led to computers matching or surpassing human performance in several cognitive tasks including vision, speech and natural language processing. However, implementation of such neural algorithms in conventional “von-Neumann” architectures are several orders of magnitude more energy expensive than the biological brain. Hence, we need fundamentally new approaches to sustain exponential growth in performance at high energy-efficiency beyond the end of the CMOS roadmap in the era of ‘data deluge’ and emergent data-centric applications. Exploring the new paradigm of computing necessitates a multi-disciplinary approach: exploration of new learning algorithms inspired from neuroscientific principles, developing network architectures best suited for such algorithms, new hardware techniques to achieve orders of improvement in energy consumption, and nanoscale devices that can closely mimic the neuronal and synaptic operations of the brain leading to a better match between the hardware substrate and the model of computation. In this presentation, I will discuss recent developments in CMOS and non-CMOS devices and architectures for implementing brain-inspired hardware. Implementation of different neural operations with varying degrees of bio-fidelity (from “non-spiking” to “spiking” networks) and implementation of on-chip learning mechanisms (Spike-Timing Dependent Plasticity) will be discussed. Additionally, we also show probabilistic neural and synaptic computing platforms that can leverage the underlying stochastic device physics of spin-devices due to thermal noise. System-level simulations indicate ~100x improvement in energy consumption for spin-based neural computing over a corresponding CMOS implementation across different computing workloads. Complementary to the above efforts, I will also present different learning algorithms including stochastic learning with one-bit synapses that greatly reduces the storage/bandwidth requirement while maintaining competitive accuracy, and adaptive online learning that efficiently utilizes the limited memory and resource constraints to learn new information without catastrophically forgetting already learnt data.
Biography: Kaushik Roy received B.Tech. degree in electronics and electrical communications engineering from the Indian Institute of Technology, Kharagpur, India, and Ph.D. degree from the electrical and computer engineering department of the University of Illinois at Urbana-Champaign in 1990. He was with the Semiconductor Process and Design Center of Texas Instruments, Dallas, where he worked on FPGA architecture development and low-power circuit design. He joined the electrical and computer engineering faculty at Purdue University, West Lafayette, IN, in 1993, where he is currently Edward G. Tiedemann Jr. Distinguished Professor. He also the director of the center for brain-inspired computing (C-BRIC) funded by SRC/DARPA. His research interests include neuromorphic and emerging computing models, neuro-mimetic devices, spintronics, device-circuit-algorithm co-design for nano-scale Silicon and non-Silicon technologies, and low-power electronics. Dr. Roy has published more than 700 papers in refereed journals and conferences, holds 18 patents, supervised 75 PhD dissertations, and is co-author of two books on Low Power CMOS VLSI Design (John Wiley & McGraw Hill). Dr. Roy received the National Science Foundation Career Development Award in 1995, IBM faculty partnership award, ATT/Lucent Foundation award, 2005 SRC Technical Excellence Award, SRC Inventors Award, Purdue College of Engineering Research Excellence Award, Humboldt Research Award in 2010, 2010 IEEE Circuits and Systems Society Technical Achievement Award (Charles Doeser Award), Distinguished Alumnus Award from Indian Institute of Technology (IIT), Kharagpur, Fulbright-Nehru Distinguished Chair, DoD Vannevar Bush Faculty Fellow (2014-2019), Semiconductor Research Corporation Aristotle award in 2015, and best paper awards at 1997 International Test Conference, IEEE 2000 International Symposium on Quality of IC Design, 2003 IEEE Latin American Test Workshop, 2003 IEEE Nano, 2004 IEEE International Conference on Computer Design, 2006 IEEE/ACM International Symposium on Low Power Electronics & Design, and 2005 IEEE Circuits and system society Outstanding Young Author Award (Chris Kim), 2006 IEEE Transactions on VLSI Systems best paper award, 2012 ACM/IEEE International Symposium on Low Power Electronics and Design best paper award, 2013 IEEE Transactions on VLSI Best paper award. Dr. Roy was a Purdue University Faculty Scholar (1998-2003). He was a Research Visionary Board Member of Motorola Labs (2002) and held the M. Gandhi Distinguished Visiting faculty at Indian Institute of Technology (Bombay) and Global Foundries visiting Chair at National University of Singapore. He has been in the editorial board of IEEE Design and Test, IEEE Transactions on Circuits and Systems, IEEE Transactions on VLSI Systems, and IEEE Transactions on Electron Devices. He was Guest Editor for Special Issue on Low-Power VLSI in the IEEE Design and Test (1994) and IEEE Transactions on VLSI Systems (June 2000), IEE Proceedings — Computers and Digital Techniques (July 2002), and IEEE Journal on Emerging and Selected Topics in Circuits and Systems (2011). Dr. Roy is a fellow of IEEE.
Title: From EDA to ’42’
Speaker: Sani Nassif (Radyalis)
Abstract: No field in Engineering has had the sustained exponential that was Moore’s Law. One of the outcomes is a rich culture of “using computers to automate the design of computers”, namely EDA, which has had to rapidly adapt to ever larger complexity. But with Moore’s era now over, it is time to apply the energy of the EDA community to other areas. This talk will explore the application of EDA techniques and knowhow to the area of Cancer Radiation Therapy, and specific technical problems that are of great interest to the Oncologists will be related to EDA and other areas of work like Big Data.
Biography: Sani received his Bachelors degree with Honors from the American University of Beirut in 1980, and his Masters and PhD degrees from Carnegie-Mellon University in 1981 and 1986 respectively. He then worked for ten years at Bell Laboratories in the general area of technology CAD, focusing on various aspects of design and technology coupling including device modeling, parameter extraction, worst case analysis, design optimization and circuit simulation. While at Bell Labs, working under Larry Nagel -the original author of Spice, he led a large team in the development of an in-house circuit simulator, named Celerity, which became the main circuit simulation tool at Bell Labs. In January 1996, he joined the then newly formed IBM Austin Research Laboratory (ARL), which was founded with a specific focus on research for the support of IBM’s Power computer systems. After twelve years of management, he stepped down to focus on technical work again with an emphasis on applying techniques developed in the VLSI-EDA area to IBM’s Smarter Planet initiative. In January 2014 Sani founded Radyalis, a company focused on applying VLSI-EDA techniques to the field of Cancer Radiation Therapy. Sani has authored one book, many book chapters, and numerous conference and journal publications. He has delivered many tutorials at top conferences and has received Best Paper awards from TCAD, ICCAD, DAC, ISQED, ICCD and SEMICON, authored invited papers to ISSCC, IEDM, IRPS, ISLPED, HOTCHIPS, and CICC. He has given Keynote and Plenary presentations at Sasimi, ESSCIRC, BMAS, SISPAD, SEMICON, VLSI-SOC, PATMOS, NMI, ASAP, GLVLSI, TAU, ISVLSI and DATE. He is an IEEE Fellow, was a member of the IBM Academy of Technology, a member of the ACM and the AAAS, and an IBM master inventor with more than 75 patents. He was the president of the IEEE Council on EDA (CEDA) for 2014 and 2015.
Title: An overview of RISC-V CPU Core implementation and sign-off using EDA management system
Speaker: Kunal Ghosh (VSD)
Abstract:
A good backend/full flow session
VSDFlow is a ‘plug-n-play’ EDA management system, built for chip designers to implement their ideas and convert to GDSII. ‘Plug-n-Play’ refers to switching between any eda tools, for eg. user can plug Cadence Genus for synthesis, Synopsys ICC for PNR and Tempus for sign-off STA. The output report will provide a QOR of entire design, which forms the starting point for design analysis. In this session, we will present how this management system works, how other tools (like Qflow and Opentimer) can be plugged in, and the full flow results on medium sized designs like picoRV32 – a RISC-V cpu core that implements RV32I instruction set
VSD stands for VLSI System Design (name of our company)
More references
https://www.vlsisystemdesign.com/wp-content/uploads/2017/10/conference_p…
Students can download for free, though there is a online course as well
‘vsdflow’ – A plug-n-play EDA management system (EMS)
Kunal Ghosh, VLSI System Design Corp. Pvt. Ltd.
Qflow: A flexible open source tool flow for digital synthesis of ASIC designs R. Timothy Edwards, eFabless.com
Opentimer: An open-source high-performance timing analysis tool for large designs
Tsung-wei Huang, Martin Wong, UIUC
Biography: Kunal Ghosh is the Director and co-founder of VLSI System Design (VSD) Corp. Pvt. Ltd. Prior to launching VSD in 2017, Kunal held several technical leadership positions at Qualcomm’s Test-chip business unit. He joined Qualcomm in 2010. He led the Physical design and STA flow development of 28nm, 16nm test-chips. At 2013, he joined Cadence as Lead Sales Application engineer for Tempus STA tool. Kunal holds a Masters degree in Electrical Engineering from Indian Institute of Technology (IIT), Bombay, India and specialized in VLSI Design & Nanotechnology.
Title: Security Evaluation of System-on-Chip (SoC) Products
Speaker: Seetharam Narasimhan (Intel)
Abstract: With the rapid proliferation of computation and connectivity in a wide variety of devices, ranging from the edge to the cloud, we are rapidly entering the era of the Internet of Things. However, underlying the desire to have all things smart and connected is the overarching fear of security breaches and loss of privacy, which are also becoming commonplace news. Security is no longer considered as an after-thought during the design and development lifecycle of System-on-Chip (SoC) products. In this talk, we shall focus on the hardware security aspects of SoCs and how a systematic evaluation at different stages of the product lifecycle (architecture, design implementation and validation) can help reduce the risk of security vulnerabilities. Proper threat modeling and iterative security-oriented design reviews can be powerful tools in preventing security loopholes, while pre- and post-silicon security validation can be used to detect any implementation issues which lead to security vulnerabilities. We shall use synthetic examples to examine how automated frameworks and evaluation tools can help in finding these issues early and mitigating their impact. We shall also highlight the challenges and research opportunities in this field.
Biography: Seetharam Narasimhan is a Security Researcher (architect) at the Security Center of Excellence, Platform Architecture Group of Intel Corporation, Hillsboro, Oregon, USA. He has a Ph.D. in Computer Engineering from Case Western Reserve University (USA) and a B.E. (Hons.) in Electronics and Telecommunication Engineering from Jadavpur University (India). His research interests include: Hardware Security, Ultralow power and reliable nanoscale circuits, as well as Bio-medical circuits and systems. He is the co-author of three book chapters, and more than 40 publications in international journals and conferences of repute.
Title: Give a Winning Presentation: From Idea to Delivery
Speaker: Laleh Behjat (University of Calgary)
Biography: Dr. Laleh Behjat is a Professor in the department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary. She joined the University of Calgary in 2002. Dr. Behjat’s research focus is on developing EDA techniques for physical design and application of large scale optimization in EDA. Her research team has won several awards including 1st and 2nd places in ISPD 2014 and ISPD 2015 High Performance Routability Driven Placement Contests and 3rd place in DAC Design Perspective Challenge in 2015. She is an Associate Editor of the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems and Optimization in Engineering from Springer. Dr. Behjat has been developing new and innovative methods to teach EDA to students. Her work has been published in American Society of Engineering Education (ASEE) and Grace Hopper Celebration of Women in Engineering. She won the University of Calgary, Electrical and Computer Engineering Graduate Educator Award in 2015. Dr. Behjat received the Women in Engineering and Geoscience Award from APEGA in 2015 in recognition of her work in promoting gender and diversity in engineering.

SDC@DAC

18 June 2019

Yibo Lin

No comments

Categories: SIGDA Events

System Design Contest at DAC 2023

The DAC System Design Contest focuses on object detection and classification on an embedded GPU or FPGA system. Contestants will receive a training dataset provided by Baidu, and a hidden dataset will be used to evaluate the performance of the designs in terms of accuracy and speed. Contestants will compete to create the best performing design on a Nvidia Jetson Nano GPU or Xilinx Kria KV260 FPGA board. Grand cash awards will be given to the top three teams. The award ceremony will be held at the 2023 IEEE/ACM Design Automation Conference.

Links

Contest Dates
Registration (The contest is open to both industry and academia.)
Participant Info
Evaluation Metrics
Submission Instructions
Contest Results:
- FPGA
- GPU

Organizing Committee

Jeff Goeders – Brigham Young University
Callie Hao – Georgia Institute of Technology
Meng Li – Peking University
Cheng Zhuo – Zhejiang University

ASPDAC 2019 TOC

17 June 2019

Yibo Lin

No comments

Categories: Publications

Full Citation in the ACM Digital Library

SESSION: University design contest

A wide conversion ratio, 92.8% efficiency, 3-level buck converter with adaptive on/off-time control and shared charge pump intermediate voltage regulator

Kousuke Miyaji
Yuki Karasawa
Takanobu Fukuoka

An efficient cascode 3-level buck converter with adaptive on/off-time (AOOT) control and shared charge pump (CP) intermediate voltage (V_mid) regulator is proposed and demonstrated. The conversion ratio (CR) V_out/V_in is enhanced by using the proposed AOOT control scheme, where the control switches between adaptive on-time (AOnT) and adaptive off-time (AOffT) mode according to the target CR. The proposed CP shares flying capacitor C_fly and power switches in the 3-level buck converter to generate V_mid=V_in/2 achieving both small size and low loss. The proposed 3-level buck converter is implemented in a standard 0.25μm CMOS process. 92.8% maximum efficiency and wide CR are obtained with the integrated V_mid regulator.

A three-dimensional millimeter-wave frequency-shift based CMOS biosensor using vertically stacked spiral inductors in LC oscillators

Maya Matsunaga
Taiki Nakanishi
Atsuki Kobayashi
Kiichi Niitsu

This paper presents a millimeter-wave frequency-shift-based CMOS biosensor that is capable of providing three-dimensional (3D) resolution. The vertical resolution from the sensor surface is obtained using dual-layer LC oscillators, which enable 3D target detection. The LC oscillators produce different frequency shifts from the desired resonant frequency due to the frequency-dependent complex relative permittivity of the biomolecular target. The measurement results from a 65-nm test chip demonstrated the feasibility of achieving 3D resolution.

Design of 385 x 385 μm² 0.165V 270pW fully-integrated supply-modulated OOK transmitter in 65nm CMOS for glasses-free, self-powered, and fuel-cell-embedded continuous glucose monitoring contact lens

Kenya Hayashi
Shigeki Arata
Ge Xu
Shunya Murakami
Cong Dang Bui
Takuyoshi Doike
Maya Matsunaga
Atsuki Kobayashi
Kiichi Niitsu

This work presents the lowest power consumption sub-mm² supply modulated OOK transmitter for enabling self-powered continuous glucose monitoring (CGM) contact lens. By combining the transmitter with a glucose fuel cell which functions as both the power source and sensing transducer, self-powered CGM contact lens can be emerged. The 385 x 385 μm² test chip implemented in 65-nm standard CMOS technology operates 270pW under 0.165V and successfully demonstrates self-powered operation using 2 x 2 mm² solid-state glucose fuel cell.

2D optical imaging using photosystem I photosensor platform with 32×32 CMOS biosensor array

Kiichi Niitsu
Taichi Sakabe
Mariko Miyachi
Yoshinori Yamanoi
Hiroshi Nishihara
Tatsuya Tomo
Kazuo Nakazato

This paper presents 2D imaging using photosensor platform with a newly-proposed large-scale CMOS biosensor array in 0.6-μm standard CMOS. The platform combines photosystem I (PSI) isolated from Thermosynechococcus elongatus and a large-scale CMOS biosensor array. PSI converts the absorbed photons into electrons, which are then sensed by the CMOS biosensor array. The prototyped photosensor enables CMOS-based 2D imaging using PSI for the first time.

Design of gate-leakage-based timer using an amplifier-less replica-bias switching technique in 55-nm DDC CMOS

Atsuki Kobayashi
Yuya Nishio
Kenya Hayashi
Shigeki Arata
Kiichi Niitsu

A design of gate-leakage-based timer using an amplifier-less replica-bias switching technique that can realize stable and low-voltage operation is presented. To generate stable oscillation frequency, the topology that discharges the pre-charged capacitor via a gate leaking MOS capacitor with low-leakage switch and logic circuits is employed. The test chip fabricated in 55-nm deeply depleted channel (DDC) CMOS technology achieves an Allan deviation floor of 200 ppm at a supply voltage of 350 mV in a 0.0022 mm² area.

A low-voltage CMOS electrophoresis IC using electroless gold plating for small-form-factor biomolecule manipulation

Kiichi Niitsu
Yuuki Yamaji
Atsuki Kobayashi
Kazuo Nakazato

We present sub-1-V CMOS-based electrophoresis method for small-form-factor biomolecule manipulation that is contained in a microchip. This is the first time this type of device has been presented in the literature. By combining CMOS technology with electroless gold plating, the electrode pitch can be reduced and the required input voltage can be decreased to less than 1 V. We fabricated the CMOS electrophoresis chip in a cost-competitive 0.6 μm standard CMOS process. A sample/hold circuit in each cell is used to generate a constant output from an analog input. After forming gold electrodes using an electroless gold plating technique, we were able to manipulate red food coloring with a 0–0.7 V input voltage range. The results shows that the proposed CMOS chip is effective for electrophoresis-based manipulation.

A low-voltage low-power multi-channel neural interface IC using level-shifted feedback technology

Liangjian Lyu
Yu Wang
Chixiao Chen
C. -J. Richard Shi

A low-voltage low-power 16-channel neural interface front-end IC for in-vivo neural recording applications is presented in this paper. A current reuse telescope amplifier is used to achieve better noise efficiency factor (NEF). Power efficiency factor (PEF) is further improved by reducing supply voltage with the proposed level-shifted feedback (LSFB) technique. The neural interface is fabricated in a 65 nm CMOS process. It operates under 0.6V supply voltage consuming 1.07 μW/channel. An input referred noise of 5.18 μV is measured, leading to a NEF of 2.94 and a PEF of 5.19 over 10 kHz bandwidth.

Development of a high stability, low standby power six-transistor CMOS SRAM employing a single power supply

Nobuaki Kobayashi
Tadayoshi Enomoto

We developed and applied a new circuit, called the “Self-controllable Voltage Level (SVL)” circuit, not only to expand both “write” and “read” stabilities, but also to achieve a low stand-by power and data holding capability in a single low power supply, 90-nm, 2-kbit, six-transistor CMOS SRAM. The SVL circuit can adaptively lower and higher the wordline voltages for a “read” and “write” operation, respectively. It can also adaptively lower and higher the memory cell supply voltages for the “write” and “hold” operations, and “read” operation, respectively. A Si area overhead of the SVL circuit is only 1.383% of the conventional SRAM.

Design of heterogeneously-integrated memory system with storage class memories and NAND flash memories

Chihiro Matsui
Ken Takeuchi

Heterogeneously-integrated memory system is configured with various types of storage class memories (SCMs) and NAND flash memories. SCMs are faster than NAND flash, and they are divided into memory and storage types with their characteristics. NAND flash memories are also classified by the number of stored bits per memory cell. These non-volatile memories have trade-offs among access speed, capacity and bit cost. Therefore, mix and match of various non-volatile memories are essential to simultaneously achieve the best speed and cost of the storage. This paper proposes a design methodology with unique interaction of device, circuit and system to achieve the appropriate configurations in the heterogeneously-integrated memory system for application.

A 65-nm CMOS fully-integrated circulating tumor cell and exosome analyzer using an on-chip vector network analyzer and a transmission-line-based detection window

Taiki Nakanishi
Maya Matsunaga
Shunya Murakami
Atsuki Kobayashi
Kiichi Niitsu

A fully-integrated CMOS circuit based on a vector network analyzer (VNA) and a transmission-line-based detection window for circulating tumor cell (CTC) and exosome analysis is presented. We have introduced a fully-integrated architecture, which eliminates the undesired parasitic components and enables high-sensitivity, for analysis of extremely low-concentration CTC in blood. To validate the operation of the proposed system, a test chip was fabricated using 65-nm CMOS technology. Measurement results shows the effectiveness of the approach.

Low standby power CMOS delay flip-flop with data retention capability

Nobuaki Kobayashi
Tadayoshi Enomoto

We developed and applied a new circuit, called the self-controllable voltage level (SVL) circuit, to achieve not only low standby power dissipation (P_st) while retaining data, but also to switch significantly quickly between an operational mode and a standby mode, in a single power source, 90-nm CMOS delay flip-flop (D-FF). The P_st of the developed D-FF is only 5.585 nW/bit, 14.81% of the 37.71 nW/bit of the conventional D-FF at a supply voltage (V_dd) of 1.0 V. The static-noise margin of the developed D-FF is 0.2576 V, and that of the conventional D-FF is 0.3576 V (at V_dd of 1.0 V). The Si area overhead of the SVL circuit is 11.62% of the conventional D-FF.

Accelerate pattern recognition for cyber security analysis

Mohammad Tahghighi
Wei Zhang

Network security analysis is about processing the network equipment’s log records to capture malicious and anomalous traffic. Scrutinizing huge amount of records to capture complex patterns is time consuming and difficult to parallelize. In this paper, we proposed a hardware/software co-designed system to address this problem for specific IP chaining patterns.

FPGA laboratory system supporting power measurement for low-power digital design

Marco Winzker
Andrea Schwandt

Power measurement of a digital design implementation supports development of low-power systems and gives insight into the performance of a circuit. A laboratory system is presented that consists of an FPGA board for use in a hands-on and remote laboratory. Measurement results show how the system can be utilized for teaching and research.

SESSION: Real-time embedded software

Towards limiting the impact of timing anomalies in complex real-time processors

Pedro Benedicte
Jaume Abella
Carles Hernandez
Enrico Mezzetti
Francisco J. Cazorla

Timing verification of embedded critical real-time systems is hindered by complex designs. Timing anomalies, deeply analyzed in static timing analysis, require specific solutions to bound their impact. For the first time, we study the concept and impact of timing anomalies in measurement-based timing analysis, the most used in industry, showing that they require to be considered and handled differently. In addition, we analyze anomalies in the context of Measurement-Based Probabilistic Timing Analysis, which simplifies quantifying their impact.

SeRoHAL: generation of selectively robust hardware abstraction layers for efficient protection of mixed-criticality systems

Petra R. Kleeberger
Juana Rivera
Daniel Mueller-Gritschneder
Ulf Schlichtmann

A major challenge in mixed-criticality system design is to ensure safe behavior under the influence of hardware errors while complying with cost and performance constraints. SeRoHAL generates hardware abstraction layers with software-based safety mechanisms to handle errors in peripheral interfaces. To reduce performance and memory overheads, SeRoHAL can select protection mechanisms, depending on the criticality of the hardware accesses.

We evaluated SeRoHAL on a robot arm control software. During fault injection, it prevents up to 76% of the assertion failures. Selective protection customized to the criticality of the accesses reduces the induced overheads significantly compared to protection of all hardware accesses.

Partitioned and overhead-aware scheduling of mixed-criticality real-time systems

Yuanbin Zhou
Soheil Samii
Petru Eles
Zebo Peng

Modern real-time embedded and cyber-physical systems comprise a large number of applications, often of different criticalities, executing on the same computing platform. Partitioned scheduling is used to provide temporal isolation among tasks with different criticalities. Isolation is often a requirement, for example, in order to avoid the case when a low criticality task overruns or fails in such a way that causes a failure in a high criticality task. When the number of partitions increases in mixed criticality systems, the size of the schedule table can become extremely large, which becomes a critical bottleneck due to design time and memory constraints of embedded systems. In addition, switching between partitions at runtime causes CPU overhead due to preemption. In this paper, we propose a design framework comprising a hyper-period optimization algorithm, which reduces the size of schedule table and preserves schedulability, and a re-scheduling algorithm to reduce the number of preemptions. Extensive experiments demonstrate the effectiveness of proposed algorithms and design framework.

SESSION: Hardware and system security

Layout recognition attacks on split manufacturing

Wenbin Xu
Lang Feng
Jeyavijayan (JV) Rajendran
Jiang Hu

One technique to prevent attacks from an untrusted foundry is split manufacturing, where only a part of the layout is sent to the untrusted high-end foundry, and the rest is manufactured at a trusted low-end foundry. The untrusted foundry has front-end-of-line (FEOL) layout and the original circuit netlist and attempts to identify critical components on the layout for Trojan insertion. Although defense methods for this scenario have been developed, the corresponding attack technique is not well explored. For instance, Boolean satisfiability (SAT) based bijective mapping attack is mentioned without detailed research. Hence, the defense methods are mostly evaluated with the k-security metric without actual attacks. We provide the first systematic study, to the best of our knowledge, on attack techniques in this scenario. Besides of implementing SAT-based bijective mapping attack, we develop a new attack technique based on structural pattern matching. Experimental comparison with bijective mapping attack shows that the new attack technique achieves about the same success rate with much faster speed for cases without the k-security defense, and has a much better success rate at the same runtime for cases with k-security defense. The results offer an alternative and practical interpretation for k-security in split manufacturing.

Execution of provably secure assays on MEDA biochips to thwart attacks

Tung-Che Liang
Mohammed Shayan
Krishnendu Chakrabarty
Ramesh Karri

Digital microfluidic biochips (DMFBs) have emerged as a promising platform for DNA sequencing, clinical chemistry, and point-of-care diagnostics. Recent research has shown that DMFBs are susceptible to various types of malicious attacks. Defenses proposed thus far only offer probabilistic guarantees of security due to the limitation of on-chip sensor resources. A micro-electrode-dot-array (MEDA) biochip is a next-generation DMFB that enables the sensing of on-chip droplet locations, which are captured in the form of a droplet-location map. We propose a security mechanism that validates assay execution by reconstructing the sequencing graph (i.e., the assay specification) from the droplet-location maps and comparing it against the golden sequencing graph. We prove that there is a unique (one-to-one) mapping from the set of droplet-location maps (over the duration of the assay) to the set of possible sequencing graphs. Any deviation in the droplet-location maps due to an attack is detected by this countermeasure because the resulting derived sequencing graph is not isomorphic to the original sequencing graph. We highlight the strength of the security mechanism by simulating attacks on real-life bioassays.

TAD: time side-channel attack defense of obfuscated source code

Alexander Fell
Hung Thinh Pham
Siew-Kei Lam

Program obfuscation is widely used to protect commercial software against reverse-engineering. However, an adversary can still download, disassemble and analyze binaries of the obfuscated code executed on an embedded System-on-Chip (SoC), and by correlating execution times to input values, extract secret information from the program. In this paper, we show (1) the impact of widely-used obfuscation methods on timing leakage, and (2) that well-known software countermeasures to reduce timing leakage of programs, are not always effective for low-noise environments found in embedded systems. We propose two methods for mitigating timing leakage in obfuscated codes. The first is a compiler driven method, called TAD, which removes conditional branches with distinguishable execution times for an input program. In the second method (TADCI), TAD is combined with dynamic hardware diversity by replacing primitive instructions with Custom Instructions (CIs) that exhibit non-deterministic execution times at runtime. Experimental results on the RISC-V platform show that the information leakage is reduced by 92% and 82% when TADCI is applied to the original and obfuscated source code, respectively.

SESSION: Thermal- and power-aware design and optimization

Leakage-aware thermal management for multi-core systems using piecewise linear model based predictive control

Xingxing Guo
Hai Wang
Chi Zhang
He Tang
Yuan Yuan

Performing thermal management on new generation IC chips is challenging. This is because the leakage power, which is significant in today’s chips, is nonlinearly related to temperature, resulting in a complex nonlinear control problem in thermal management. In this paper, a new dynamic thermal management (DTM) method with piecewise linear (PWL) thermal model based predictive control is proposed to solve the nonlinear control problem. First, a PWL thermal model is built by combining multiple local linear thermal models expanded at several Taylor expansion points. These Taylor expansion points are carefully selected by a systematic scheme which exploits the thermal behavior property of the IC chips. Based on the PWL thermal model, a new predictive control method is proposed to compute the future power recommendation for DTM. By approximating the nonlinearity accurately with the PWL thermal model and being equipped with predictive control technique, the new DTM can achieve an overall high quality temperature management with smooth and accurate temperature tracking. Experimental results show the new method outperforms the linear model predictive control based method in temperature management quality with negligible computing overhead.

Multi-angle bended heat pipe design using x-architecture routing with dynamic thermal weight on mobile devices

Hsuan-Hsuan Hsiao
Hong-Wen Chiou
Yu-Min Lee

Heat pipe is an effective passive cooling technique for mobile devices. This work builds a multi-angle bended heat pipe thermal model and presents an X-architecture routing engine guided by developed dynamic thermal weights to construct the heat pipe path for reducing the operating temperatures of a smartphone. Compared with a commercial tool, the error of the thermal model is only 4.79%. The routing engine can efficiently reduce the operating temperatures of application processors at least 13.20% in smartphones.

Fully-automated synthesis of power management controllers from UPF

Dustin Peterson
Oliver Bringmann

We present a methodology for automatic synthesis of power management controllers for System-on-Chip designs by using an extended version of the Unified Power Format (UPF). Our methodology takes an SoC design and a UPF-based power design, and automatically generates a power management controller in Verilog/VHDL that implements the power state machine specified in UPF. It performs a priority-based scheduling for all power state machine actions, connects each power management signal to the corresponding logic wire in the UPF design and integrates the controller into the System-on-Chip using a configurable bus interface. We implemented the proposed approach as a plugin for Synopsys Design Compiler to close the gap in today’s power management flows and evaluated it by a RISC-V System-on-Chip.

SESSION: Reverse engineering: growing more mature – and facing powerful countermeasures

Integrated flow for reverse engineering of nanoscale technologies

Bernhard Lippmann
Michael Werner
Niklas Unverricht
Aayush Singla
Peter Egger
Anja Dübotzky
Horst Gieser
Martin Rasche
Oliver Kellermann
Helmut Graeb

In view of potential risks of piracy and malicious manipulation of complex integrated circuits built in technologies of 45 nm and less, there is an increasing need for an effective and efficient process of reverse engineering. This paper provides an overview of the current process and details on a new tool for the acquisition and synthesis of large area images and the extraction of a layout. For the first time the error between the generated layout and the known drawn GDS will be compared quantitatively as a figure of merit (FOM). From this layout a circuit graph of an ECC encryption and the partitioning in circuit blocks will be extracted.

NETA: when IP fails, secrets leak

Travis Meade
Jason Portillo
Shaojie Zhang
Yier Jin

Assuring the quality and the trustworthiness of third party resources has been a hard problem to tackle. Researchers have shown that analyzing Integrated Circuits (IC), without the aid of golden models, is challenging. In this paper we discuss a toolset, NETA, designed to aid IP users in assuring the confidentiality, integrity, and accessibility of their IC or third party IP core. The discussed toolset gives access to a slew of gate-level analysis tools, many of which are heuristic-based, for the purposes of extracting high-level circuit design information. NETA majorly comprises the following tools: RELIC, REBUS, REPCA, REFSM, and REPATH.

The first step involved in netlist analysis falls to signal classification. RELIC uses a heuristic based fan-in structure matcher to determine the uniqueness of each signal in the netlist. REBUS finds word groups by leveraging the data bus in the netlist in conjunction with RELIC’s signal comparison through heuristic verification of input structures. REPCA on the other hand tries to improve upon the standard bruteforce RELIC comparison by leveraging the data analysis technique of PCA and a sparse RELIC analysis on all signals. Given a netlist and a set of registers, REFSM reconstructs the logic which represents the behavior of a particular register set over the course of the operation of a given netlist. REFSM has been shown useful for examining register interaction at a higher level. REPATH, similar to REFSM, finds a series of input patterns which forces a logical FSM initialize with some reset state into a state specified by the user. Finally, REFSM 2 is introduced to utilizes linear time precomputation to improve the original REFSM.

Machine learning and structural characteristics for reverse engineering

Johanna Baehr
Alessandro Bernardini
Georg Sigl
Ulf Schlichtmann

In the past years, much of the research into hardware reverse engineering has focused on the abstraction of gate level netlists to a human readable form. However, none of the proposed methods consider a realistic reverse engineering scenario, where the netlist is physically extracted from a chip. This paper analyzes how errors caused by this extraction and the later partitioning of the netlist affect the ability to identify the functionality. Current formal verification based methods, which compare against a golden model, are incapable of dealing with such erroneous netlists. Two new methods are proposed, which focus on the idea that structural similarity implies functional similarity. The first approach uses fuzzy structural similarity matching to compare the structural characteristics of an unknown design against designs in a golden model library using machine learning. The second approach proposes a method for inexact graph matching using fuzzy graph isomorphisms, based on the functionalities of gates used within the design. For realistic error percentages, both approaches are able to match more than 90% of designs correctly. This is an important first step for hardware reverse engineering methods beyond formal verification based equivalence matching.

Towards cognitive obfuscation: impeding hardware reverse engineering based on psychological insights

Carina Wiesen
Nils Albartus
Max Hoffmann
Steffen Becker
Sebastian Wallat
Marc Fyrbiak
Nikol Rummel
Christof Paar

In contrast to software reverse engineering, there are hardly any tools available that support hardware reversing. Therefore, the reversing process is conducted by human analysts combining several complex semi-automated steps. However, countermeasures against reversing are evaluated solely against mathematical models. Our research goal is the establishment of cognitive obfuscation based on the exploration of underlying psychological processes. We aim to identify problems which are hard to solve for human analysts and derive novel quantification metrics, thus enabling stronger obfuscation techniques.

Insights into the mind of a trojan designer: the challenge to integrate a trojan into the bitstream

Maik Ender
Pawel Swierczynski
Sebastian Wallat
Matthias Wilhelm
Paul Martin Knopp
Christof Paar

The threat of inserting hardware Trojans during the design, production, or in-field poses a danger for integrated circuits in real-world applications. A particular critical case of hardware Trojans is the malicious manipulation of third-party FPGA configurations. In addition to attack vectors during the design process, FPGAs can be infiltrated in a non-invasive manner after shipment through alterations of the bitstream. First, we present an improved methodology for bitstream file format reversing. Second, we introduce a novel idea for Trojan insertion.

SESSION: All about PIM

GraphSAR: a sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs

Guohao Dai
Tianhao Huang
Yu Wang
Huazhong Yang
John Wawrzynek

Large-scale graph processing has drawn great attention in recent years. The emerging metal-oxide resistive random access memory (ReRAM) and ReRAM crossbars have shown huge potential in accelerating graph processing. However, the sparse feature of natural graphs hinders the performance of graph processing on ReRAMs. Previous work of graph processing on ReRAMs stored and computed edges separately, leading to high energy consumption and long latency of transferring data. In this paper, we present GraphSAR, a sparsity-aware processing-in-memory large-scale graph processing accelerator on ReRAMs. Computations over edges are performed in the memory, eliminating overheads of transferring edges. Moreover, graphs are divided considering the sparsity. Subgraphs with low densities are further divided into smaller ones to minimize the waste of memory space. According to our extensive experimental results, GraphSAR achieves 4.43x energy reduction and 1.85x speedup (8.19x lower energy-delay product, EDP) against previous graph processing architecture on ReRAMs (GraphR [1]).

ParaPIM: a parallel processing-in-memory accelerator for binary-weight deep neural networks

Shaahin Angizi
Zhezhi He
Deliang Fan

Recent algorithmic progression has brought competitive classification accuracy despite constraining neural networks to binary weights (+1/-1). These findings show remarkable optimization opportunities to eliminate the need for computationally-intensive multiplications, reducing memory access and storage. In this paper, we present ParaPIM architecture, which transforms current Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) sub-arrays to massively parallel computational units capable of running inferences for Binary-Weight Deep Neural Networks (BWNNs). ParaPIM’s in-situ computing architecture can be leveraged to greatly reduce energy consumption dealing with convolutional layers, accelerate BWNNs inference, eliminate unnecessary off-chip accesses and provide ultra-high internal bandwidth. The device-to-architecture co-simulation results indicate ~4x higher energy efficiency and 7.3x speedup over recent processing-in-DRAM acceleration, or roughly 5x higher energy-efficiency and 20.5x speedup over recent ASIC approaches, while maintaining inference accuracy comparable to baseline designs.

CompRRAE: RRAM-based convolutional neural network accelerator with reduced computations through a runtime activation estimation

Xizi Chen
Jingyang Zhu
Jingbo Jiang
Chi-Ying Tsui

Recently Resistive-RAM (RRAM) crossbar has been used in the design of the accelerator of convolutional neural networks (CNNs) to solve the memory wall issue. However, the intensive multiply-accumulate computations (MACs) executed at the crossbars during the inference phase are still the bottleneck for the further improvement of energy efficiency and throughput. In this work, we explore several methods to reduce the computations for the RRAM-based CNN accelerators. First, the output sparsity resulting from the widely employed Rectified Linear Unit is exploited, and a significant portion of computations are bypassed through an early detection of the negative output activations. Second, an adaptive approximation is proposed to terminate the MAC early when the sum of the partial results of the remaining computations is considered to be within a certain range of the intermediate accumulated result and thus has an insignificant contribution to the inference. In order to determine these redundant computations, a novel runtime estimation on the maximum and minimum values of each output activation is developed and used during the MAC operation. Experimental results show that around 70% of the computations can be reduced during the inference with a negligible accuracy loss smaller than 0.2%. As a result, the energy efficiency and the throughput are improved by over 2.9 and 2.8 times, respectively, compared with the state-of-the-art RRAM-based accelerators.

CuckooPIM: an efficient and less-blocking coherence mechanism for processing-in-memory systems

Sheng Xu
Xiaoming Chen
Ying Wang
Yinhe Han
Xiaowei Li

The ever-growing processing ability of in-memory processing logic makes the data sharing and coherence between processors and in-memory logic play an increasingly important role in Processing-in-Memory (PIM) systems. Unfortunately, the existing state-of-the-art coarse-grained PIM coherence solutions suffer from unnecessary data movements and stalls caused by a data ping-pong issue. This work proposes CuckooPIM, a criticality-aware and less-blocking coherence mechanism, which can effectively avoid unnecessary data movements and stalls. Experiments reveal that CuckooPIM achieves 1.68x speedup on average comparing with coarse-grained PIM coherence.

AERIS: area/energy-efficient 1T2R ReRAM based processing-in-memory neural network system-on-a-chip

Jinshan Yue
Yongpan Liu
Fang Su
Shuangchen Li
Zhe Yuan
Zhibo Wang
Wenyu Sun
Xueqing Li
Huazhong Yang

ReRAM-based processing-in-memory (PIM) architecture is a promising solution for deep neural networks (NN), due to its high energy efficiency and small footprint. However, traditional PIM architecture has to use a separate crossbar array to store either positive or negative (P/N) weights, which limits both energy efficiency and area efficiency. Even worse, imbalance running time of different layers and idle ADCs/DACs even lower down the whole system efficiency. This paper proposes AERIS, an Area/Energy-efficient 1T2R ReRAM based processing-In-memory NN System-on-a-chip to enhance both energy and area efficiency. We propose an area-efficient 1T2R ReRAM structure to represent both P/N weights in a single array, and a reference current cancelling scheme (RCS) is also presented for better accuracy. Moreover, a layer-balance scheduling strategy, as well as the power gating technique for interface circuits, such as ADCs/DACs, is adopted for higher energy efficiency. Experiment results show that compared with state-of-the-art ReRAM-based architectures, AERIS achieves 8.5x/1.3x peak energy/area efficiency improvements in total, due to layer-balance scheduling for different layers, power gating of interface circuits, and 1T2R ReRAM circuits. Furthermore, we demonstrate that the proposed RCS compensates the non-ideal factors of ReRAM and improves NN accuracy by 5.2% in the XNOR net on CIFAR-10 dataset.

SESSION: Design for reliability

IR-ATA: IR annotated timing analysis, a flow for closing the loop between PDN design, IR analysis & timing closure

Ashkan Vakil
Houman Homayoun
Avesta Sasan

This paper presents IR-ATA, a novel flow for modeling the timing impact of IR drop during the physical design and timing closure of an ASIC chip. We first illustrate how the current and conventional mechanism for budgeting the IR drop and voltage noise (by using hard margins) lead to sub-optimal design. Consequently, we propose a new approach for modeling and margining against voltage noise, such that each timing path is margined based on its own topology and its own view of voltage noise. By having such a path based margining mechanism, the margins for IR drop and voltage noise for most timing paths in the design are safely relaxed. The reduction in the margin increases the available timing slack that could be used for improving the power, performance, and area of a design. Finally, we illustrate how IR-ATA could be used to track the timing impact of physical or PDN changes, allowing the physical designers to explore tradeoffs that were previously, for lack of methodology, not possible.

Learning-based prediction of package power delivery network quality

Yi Cao
Andrew B. Kahng
Joseph Li
Abinash Roy
Vaishnav Srinivas
Bangqi Xu

Power Delivery Network (PDN) is a critical component in modern System-on-Chip (SoC) designs. With the rapid development in applications, the quality of PDN, especially Package (PKG) PDN, determines whether a sufficient amount of power can be delivered to critical computing blocks. In conventional PKG design, PDN design typically takes multiple weeks including many manual iterations for optimization. Also, there is a large discrepancy between (i) quick simulation tools used for quick PDN quality assessment during the design phase, and (ii) the golden extraction tool used for signoff. This discrepancy may introduce more iterations. In this work, we propose a learning-based methodology to perform PKG PDN quality assessment both before layout (when only bump/ball maps, but no package routing, are available) and after layout (when routing is completed but no signoff analysis has been launched). Our contributions include (i) identification of important parameters to estimate the achievable PKG PDN quality in terms of bump inductance; (ii) the avoidance of unnecessary manual trial and error overheads in PKG PDN design; and (iii) more accurate design-phase PKG PDN quality assessment. We validate accuracy of our predictive models on PKG designs from industry. Experimental results show that, across a testbed of 17 industry PKG designs, we can predict bump inductance with an average absolute percentage error of 21.2% or less, given only pinmap and technology information. We improve prediction accuracy to achieve an average absolute percentage error of 17.5% or less when layout information is considered.

Tackling signal electromigration with learning-based detection and multistage mitigation

Wei Ye
Mohamed Baker Alawieh
Yibo Lin
David Z. Pan

With the continuous scaling of integrated circuit (IC) technologies, electromigration (EM) prevails as one of the major reliability challenges facing the design of robust circuits. With such aggressive scaling in advanced technology nodes, signal nets experience high switching frequency, which further exacerbates the signal EM effect. Traditionally, signal EM fixing approaches analyze EM violations after the routing stage and repair is attempted via iterative incremental routing or cell resizing techniques. However, these “EM-analysis-then fix” approaches are ill-equipped when faced with the ever-growing EM violations in advanced technology nodes. In this work, we propose a novel signal EM handling framework that (i) incorporates EM detection and fixing techniques into earlier stages of the physical design process, and (ii) integrates machine learning based detection alongside a multistage mitigation. Experimental results demonstrate that our framework can achieve 15x speedup when compared to the state-of-the-art EDA tool while achieving similar performance in terms of EM mitigation and overhead.

ROBIN: incremental oblique interleaved ECC for reliability improvement in STT-MRAM caches

Elham Cheshmikhani
Hamed Farbeh
Hossein Asadi

Spin-Transfer Torque Magnetic RAM (STT-MRAM) is a promising alternative for SRAMs in on-chip cache memories. Besides all its advantages, high error rate in STT-MRAM is a major limiting factor for on-chip cache memories. In this paper, we first present a comprehensive analysis that reveals that the conventional Error-Correcting Codes (ECCs) lose their efficiency due to data-dependent error patterns, and then propose an efficient ECC configuration, so-called ROBIN, to improve the correction capability. The evaluations show that the inefficiency of conventional ECC increases the cache error rate by an average of 151.7% while ROBIN reduces this value by more than 28.6x.

Aging-aware chip health prediction adopting an innovative monitoring strategy

Yun-Ting Wang
Kai-Chiang Wu
Chung-Han Chou
Shih-Chieh Chang

Concerns exist that the reliability of chips is worsening because of downscaling technology. Among various reliability challenges, device aging is a dominant concern because it degrades circuit performance over time. Traditionally, runtime monitoring approaches are proposed to estimate aging effects. However, such techniques tend to predict and monitor delay degradation status for circuit mitigation measures rather than the health condition of the chip. In this paper, we propose an aging-aware chip health prediction methodology that adapts to workload conditions and process, supply voltage, and temperature variations. Our prediction methodology adopts an innovative on-chip delay monitoring strategy by tracing representative aging-aware delay behavior. The delay behavior is then fed into a machine learning engine to predict the age of the tested chips. Experimental results indicate that our strategy can obtain 97.40% accuracy with 4.14% area overhead on average. To the authors’ knowledge, this is the first method that accurately predicts current chip age and provides information regarding future chip health.

SESSION: New advances in emerging computing paradigms

Compiling SU(4) quantum circuits to IBM QX architectures

Alwin Zulehner
Robert Wille

The Noisy Intermediate-Scale Quantum (NISQ) technology is currently investigated by major players in the field to build the first practically useful quantum computer. IBM QX architectures are the first ones which are already publicly available today. However, in order to use them, the respective quantum circuits have to be compiled for the respectively used target architecture. While first approaches have been proposed for this purpose, they are infeasible for a certain set of SU(4) quantum circuits which have recently been introduced to benchmark corresponding compilers. In this work, we analyze the bottlenecks of existing compilers and provide a dedicated method for compiling this kind of circuits to IBM QX architectures. Our experimental evaluation (using tools provided by IBM) shows that the proposed approach significantly outperforms IBM’s own solution regarding fidelity of the compiled circuit as well as runtime. Moreover, the solution proposed in this work has been declared winner of the IBM QISKit Developer Challenge. An implementation of the proposed methodology is publicly available at http://iic.jku.at/eda/research/ibm_qx_mapping.

Quantum circuit compilers using gate commutation rules

Toshinari Itoko
Rudy Raymond
Takashi Imamichi
Atsushi Matsuo
Andrew W. Cross

The use of noisy intermediate-scale quantum computers (NISQCs), which consist of dozens of noisy qubits with limited coupling constraints, has been increasing. A circuit compiler, which transforms an input circuit into an equivalent output circuit conforming the coupling constraints with as few additional gates as possible, is essential for running applications on NISQCs. We propose a formulation and two algorithms exploiting gate commutation rules to obtain a better circuit compiler.

Scalable design for field-coupled nanocomputing circuits

Marcel Walter
Robert Wille
Frank Sill Torres
Daniel Große
Rolf Drechsler

Field-coupled Nanocomputing (FCN) technologies are considered as a solution to overcome physical boundaries of conventional CMOS approaches. But despite ground breaking advances regarding their physical implementation as e.g. Quantum-dot Cellular Automata (QCA), Nanomagnet Logic (NML), and many more, there is an unsettling lack of methods for large-scale design automation of FCN circuits. In fact, design automation for this class of technologies still is in its infancy – heavily relying either on manual labor or automatic methods which are applicable for rather small functionality only. This work presents a design method which – for the first time – allows for the scalable design of FCN circuits that satisfy dedicated constraints of these technologies. The proposed scheme is capable of handling around 40000 gates within seconds while the current state-of-the-art takes hours to handle around 20 gates. This is confirmed by experimental results on the layout level for various established benchmarks libraries.

BDD-based synthesis of optical logic circuits exploiting wavelength division multiplexing

Ryosuke Matsuo
Jun Shiomi
Tohru Ishihara
Hidetoshi Onodera
Akihiko Shinya
Masaya Notomi

Optical circuits using nanophotonic devices attract significant interest due to its ultra-high speed operation. As a consequence, the synthesis methods for the optical circuits also attract increasing attention. However, existing methods for synthesizing optical circuits mostly rely on straight-forward mappings from established data structures such as Binary Decision Diagram (BDD). The strategy of simply mapping a BDD to an optical circuit sometimes results in an explosion of size and involves significant power losses in branches and optical devices. To address these issues, this paper proposes a method for reducing the size of BDD-based optical logic circuits exploiting wavelength division multiplexing (WDM). The paper also proposes a method for reducing the number of branches in a BDD-based circuit, which reduces the power dissipation in laser sources. Experimental results obtained using a partial product accumulation circuit in parallel multipliers demonstrates significant advantages of our method over existing approaches in terms of area and power consumption.

Hybrid binary-unary hardware accelerator

S. Rasoul Faraji
Kia Bazargan

Stochastic computing has been used in recent years to create designs with significantly smaller area by harnessing unary encoding of data. However, the low area advantage comes at an exponential price in latency, making the area x delay cost unattractive. In this paper, we present a novel method which uses a hybrid binary / unary representation to perform computations. We first divide the input range into a few sub-regions, perform unary computations on each sub-region individually, and finally pack the outputs of all sub-regions back to compact binary. Moreover, we propose a synthesis methodology and a regression model to predict an optimal or sub-optimal design in the design space. The proposed method is especially well-suited to FPGAs due to the abundant availability of routing and flip-flop resources. To the best of our knowledge, we are the first to show a scalable method based on the principles of stochastic computing that can beat conventional binary in terms of a real cost, i.e., area x delay. Our method outperforms the binary and fully unary methods on a number of functions and on a common edge detection algorithm. In terms of area x delay cost, our cost is on average only 2.51% and 10.2% of the binary for 8- and 10-bit resolutions, respectively. These numbers are 2–3 orders of magnitude better than the results of traditional stochastic methods. Our method is not competitive with the binary method for high-resolution oscillating functions such as sin(15x).

SESSION: Design, testing, and fault tolerance of neuromorphic systems

Fault tolerance in neuromorphic computing systems

Mengyun Liu
Lixue Xia
Yu Wang
Krishnendu Chakrabarty

Resistive Random Access Memory (RRAM) and RRAM-based computing systems (RCS) provide energy-efficient technology options for neuromorphic computing. However, the applicability of RCS is limited by reliability problems that arise from the immature fabrication process. In order to take advantage of RCS in practical applications, fault-tolerant design is a key challenge. We present a survey of fault-tolerant designs for RRAM-based neuromorphic computing systems. We first describe RRAM-based crossbars and training architectures in RCS. Following this, we classify fault models into different categories, and review post-fabrication testing methods. Subsequently, online testing methods are presented. Finally, we present various fault-tolerant techniques that were designed to tolerate different types of RRAM faults. The methods reviewed in this survey represent recent trends in fault-tolerant designs of RCS, and are expected motivate further research in this field.

Build reliable and efficient neuromorphic design with memristor technology

Bing Li
Bonan Yan
Chenchen Liu
Hai (Helen) Li

Neuromorphic computing is a revolutionary approach of computation, which attempts to mimic the human brain’s mechanism for extremely high implementation efficiency and intelligence. Latest research studies showed that the memristor technology has a great potential for realizing power- and area-efficient neuromorphic computing systems (NCS). On the other hand, the memristor device processing is still under development. Unreliable devices can severely degrade system performance, which arises as one of the major challenges in developing memristor-based NCS. In this paper, we first review the impacts of the limited reliability of memristor devices and summarize the recent research progress in building reliable and efficient memristor-based NCS. In the end, we discuss the main difficulties and the trend in memristor-based NCS development.

Reliable in-memory neuromorphic computing using spintronics

Christopher Münch
Rajendra Bishnoi
Mehdi B. Tahoori

Recently Spin Transfer Torque Random Access Memory (STT-MRAM) technology has drawn a lot of attention for the direct implementation of neural networks, because it offers several advantages such as near-zero leakage, high endurance, good scalability, small foot print and CMOS compatibility. The storing device in this technology, the Magnetic Tunnel Junction (MTJ), is developed using magnetic layers that requires new fabrication materials and processes. Due to complexities of fabrication steps and materials, MTJ cells are subject to various failure mechanisms. As a consequence, the functionality of the neuromorphic computing architecture based on this technology is severely affected. In this paper, we have developed a framework to analyze the functional capability of the neural network inference in the presence of the several MTJ defects. Using this framework, we have demonstrated the required memory array size that is necessary to tolerate the given amount of defects and how to actively decrease this overhead by disabling parts of the network.

SESSION: Memory-centric design and synthesis

A staircase structure for scalable and efficient synthesis of memristor-aided logic

Alwin Zulehner
Kamalika Datta
Indranil Sengupta
Robert Wille

The identification of the memristor as fourth fundamental circuit element and, eventually, its fabrication in the HP labs provide new capabilities for in-memory computing. While there already exist sophisticated methods for realizing logic gates with memristors, mapping them to crossbar structures (which can easily be fabricated) still constitutes a challenging task. This is particularly the case since several (complementary) design objectives have to be satisfied, e.g. the design method has to be scalable, should yield designs requiring a low number of timesteps and utilized memristors, and a layout should result that is hardly skewed. However, all solutions proposed thus far only focus on one of these objectives and hardly address the other ones. Consequently, rather imperfect solutions are generated by state-of-the-art design methods for memristor-aided logic thus far. In this work, we propose a corresponding automatic design solution which addresses all these design objectives at once. To this end, a staircase structure is utilized which employs an almost square-like layout and remains perfectly scalable while, at the same time, keeps the number of timesteps and utilized memristors close to the minimum. Experimental evaluations confirm that the proposed approach indeed allows to satisfy all design objectives at once.

On-chip memory optimization for high-level synthesis of multi-dimensional data on FPGA

Daewoo Kim
Sugil Lee
Jongeun Lee

It is very challenging to design an on-chip memory architecture for high-performance kernels with large amount of computation and data. The on-chip memory architecture must support efficient data access from both the computation part and the external memory part, which often have very different expectations about how data should be accessed and stored. Previous work provides only a limited set of optimizations. In this paper we show how to fundamentally restructure on-chip buffers, by decoupling logical array view from the physical buffer view, and providing general mapping schemes for the two. Our framework considers the entire data flow from the external memory to the computation part in order to minimize resource usage without creating performance bottleneck. Our experimental results demonstrate that our proposed technique can generate solutions that reduce memory usage significantly (2X over the conventional method), and successfully generate optimized on-chip buffer architectures without costly design iterations for highly optimized computation kernels.

HUBPA: high utilization bidirectional pipeline architecture for neuromorphic computing

Houxiang Ji
Li Jiang
Tianjian Li
Naifeng Jing
Jing Ke
Xiaoyao Liang

Training Convolutional Neural Networks(CNNs) is both memory-and computation-intensive. The resistive random access memory (ReRAM) has shown its advantage to accelerate such tasks with high energy-efficiency. However, the ReRAM-based pipeline architecture suffers from the low utilization of computing resource, caused by the imbalanced data throughput in different pipeline stages because of the inherent down-sampling effect in CNNs and the inflexible usage of ReRAM cells. In this paper, we propose a novel ReRAM-based bidirectional pipeline architecture, named HUBPA, to accelerate the training with higher utilization of the computing resource. Two stages of the CNN training, forward and backward propagations, are scheduled in HUBPA dynamically to share the computing resource. We design an accessory control scheme for the context switch of these two tasks. We also propose an efficient algorithm to allocate computing resource for each neural network layer. Our experiment results show that, compared with state-of-the-art ReRAM pipeline architecture, HUBPA improves the performance by 1.7X and reduces the energy consumption by 1.5X, based on the current benchmarks.

SESSION: Efficient modeling of analog, mixed signal and arithmetic circuits

Efficient sparsification of dense circuit matrices in model order reduction

Charalampos Antoniadis
Nestor Evmorfopoulos
Georgios Stamoulis

The integration of more components into ICs due to the ever increasing technology scaling has led to very large parasitic networks consisting of million of nodes, which have to be simulated in many times or frequencies to verify the proper operation of the chip. Model Order Reduction techniques have been employed routinely to substitute the large scale parasitic model by a model of lower order with similar response at the input/output ports. However, all established MOR techniques result in dense system matrices that render their simulation impractical. To this end, in this paper we propose a methodology for the sparsification of the dense circuit matrices resulting from Model Order Reduction, which employs a sequence of algorithms based on the computation of the nearest diagonally dominant matrix and the sparsification of the corresponding graph. Experimental results indicate that a high sparsity ratio of the reduced system matrices can be achieved with very small loss of accuracy.

Spectral approach to verifying non-linear arithmetic circuits

Cunxi Yu
Tiankai Su
Atif Yasin
Maciej Ciesielski

This paper presents a fast and effective computer algebraic method for analyzing and verifying non-linear integer arithmetic circuits using a novel algebraic spectral model. It introduces a concept of algebraic spectrum, a numerical form of polynomial expression; it uses the distribution of coefficients of the monomials to determine the type of arithmetic function under verification. In contrast to previous works, the proof of functional correctness is achieved by computing an algebraic spectrum combined with local rewriting of word-level polynomials. The speedup is achieved by propagating coefficients through the circuit using And-Inverter Graph (AIG) datastructure. The effectiveness of the method is demonstrated with experiments including standard and Booth multipliers, and other synthesized non-linear arithmetic circuits up to 1024 bits containing over 12 million gates.

S²-PM: semi-supervised learning for efficient performance modeling of analog and mixed signal circuits

Mohamed Baker Alawieh
Xiyuan Tang
David Z. Pan

As integrated circuit technologies continue to scale, variability modeling is becoming more crucial yet, more challenging. In this paper, we propose a novel performance modeling method based on semi-supervised co-learning. We exploit the multiple representations of process variation in any analog and mixed signal circuit to establish a co-learning framework where unlabeled samples are leveraged to improve the model accuracy without enduring any simulation cost. Practically, our proposed method relies on a small set of labeled data, and the availability of no-cost unlabeled data to efficiently build accurate performance model for any analog and mixed signals circuit design. Our numerical experiments demonstrate that the proposed approach achieves up to 30% reduction in simulation cost compared to the state-of-the-art modeling technique without surrendering any accuracy.

SESSION: Logic and precision optimization for neural network designs

Energy-efficient, low-latency realization of neural networks through boolean logic minimization

Mahdi Nazemi
Ghasem Pasandi
Massoud Pedram

Deep neural networks have been successfully deployed in a wide variety of applications including computer vision and speech recognition. To cope with computational and storage complexity of these models, this paper presents a training method that enables a radically different approach for realization of deep neural networks through Boolean logic minimization. The aforementioned realization completely removes the energy-hungry step of accessing memory for obtaining model parameters, consumes about two orders of magnitude fewer computing resources compared to realizations that use floating-point operations, and has a substantially lower latency.

Log-quantized stochastic computing for memory and computation efficient DNNs

Hyeonuk Sim
Jongeun Lee

For energy efficiency, many low-bit quantization methods for deep neural networks (DNNs) have been proposed. Among them, logarithmic quantization is being highlighted showing acceptable deep learning performance. It also simplifies high-cost multipliers as well as reducing memory footprint drastically. Meanwhile, stochastic computing (SC) was proposed for low-cost DNN acceleration and the recently proposed SC multiplier improved the accuracy and latency significantly which are main drawbacks of SC. However, in their binary-interfaced system which yet costs much less than storing all stochastic stream, quantization is basically linear as same as conventional fixed-point binary. We applied logarithmically quantized DNNs to the state-of-the-art SC multiplier and studied how it can benefit. We found that SC multiplication on logarithmically quantized input is more accurate and it can help fine-tuning process. Furthermore, we designed the much low-cost SC-DNN accelerator utilizing the reduced complexity of inputs. Finally, while logarithmic quantization benefits data flow, proposed architecture achieves 40% and 24% less area and power consumption than the previous SC-DNN accelerator. Its area X latency product is smaller even than the shifter based accelerator.

Cell division: weight bit-width reduction technique for convolutional neural network hardware accelerators

Hanmin Park
Kiyoung Choi

The datapath bit-width of hardware accelerators for convolutional neural network (CNN) inference is generally chosen to be wide enough, so that they can be used to process upcoming unknown CNNs. Here we introduce the cell division technique, which is a variant of function-preserving transformations. With this technique, it is guaranteed that CNNs that have weights quantized to fixed-point format of arbitrary bit-widths, can be transformed to CNNs with less bit-widths of weights without any accuracy drop (or any accuracy change). As a result, CNN hardware accelerators are released from the weight bit-width constraint, which has been preventing them from having narrower datapaths. In addition, CNNs that have wider weight bit-widths than those assumed by a CNN hardware accelerator can be executed on the accelerator. Experimental results on LeNet-300-100, LeNet-5, AlexNet, and VGG-16 show that weights can be reduced down to 2–5 bits with 2.5X–5.2X decrease in weight storage requirement and of course without any accuracy drop.

SESSION: Modern mask optimization: from shallow to deep learning

LithoROC: lithography hotspot detection with explicit ROC optimization

Wei Ye
Yibo Lin
Meng Li
Qiang Liu
David Z. Pan

As modern integrated circuits scale up with escalating complexity of layout design patterns, lithography hotspot detection, a key stage of physical verification to ensure layout finishing and design closure, has raised a higher demand on its efficiency and accuracy. Among all the hotspot detection approaches, machine learning distinguishes itself for achieving high accuracy while maintaining low false alarms. However, due to the class imbalance problem, the conventional practice which uses the accuracy and false alarm metrics to evaluate different machine learning models is becoming less effective. In this work, we propose the use of the area under the ROC curve (AUC), which provides a more holistic measure for imbalanced datasets compared with the previous methods. To systematically handle class imbalance, we further propose the surrogate loss functions for direct AUC maximization as a substitute for the conventional cross-entropy loss. Experimental results demonstrate that the new surrogate loss functions are promising to outperform the cross-entropy loss when applied to the state-of-the-art neural network model for hotspot detection.

Detecting multi-layer layout hotspots with adaptive squish patterns

Haoyu Yang
Piyush Pathak
Frank Gennari
Ya-Chieh Lai
Bei Yu

Layout hotpot detection is one of the critical steps in modern integrated circuit design flow. It aims to find potential weak points in layouts before feeding them into manufacturing stage. Rapid development of machine learning has made it a preferable alternative of traditional hotspot detection solutions. Recent researches range from layout feature extraction and learning model design. However, only single layer layout hotspots are considered in state-of-the-art hotspot detectors and certain defects such as metal-to-via failures are not naturally supported. In this paper, we propose an adaptive squish representation for multilayer layouts, which is storage efficient, lossless and compatible with deep neural networks. We conduct experiments on 14nm industrial designs with a metal layer and its two adjacent via layers that contain metal-to-via hotspots. Results show that the adaptive squish representation can achieve satisfactory hotspot detection accuracy by incorporating a medium-sized convolutional neural networks.

A local optimal method on DSA guiding template assignment with redundant/dummy via insertion

Xingquan Li
Bei Yu
Jianli Chen
Wenxing Zhu

As an emerging manufacture technology, block copolymer directed self-assembly (DSA) is promising for via layer fabrication. Meanwhile, redundant via insertion is considered as an essential step for yield improvement. For better reliability and manufacturability, in this paper, we concurrently consider DSA guiding template assignment with redundant via and dummy via insertion at post-routing stage. Firstly, by analyzing the structure property of guiding templates, we propose a building-block based solution expression to discard redundant solutions. Then, honoring the compact solution expression, we construct a conflict graph with dummy via insertion, and then formulate the problem to an integer linear programming (ILP). To make a good trade-off between solution quality and runtime, we relax the ILP to an unconstrained nonlinear programming (UNP). Finally, a line search optimization algorithm is proposed to solve the UNP. Experimental results verify the effectiveness of our new solution expression and the efficiency of our proposed algorithm.

Deep learning-based framework for comprehensive mask optimization

Bo-Yi Yu
Yong Zhong
Shao-Yun Fang
Hung-Fei Kuo

With the dramatically increase of design complexity and the advance of semiconductor technology nodes, huge difficulties appear during design for manufacturability with existing lithography solutions. Sub-resolution assist feature (SRAF) insertion and optical proximity correction (OPC) are both inevitable resolution enhancement techniques (RET) to maximize process window and ensure feature printability. Conventional model-based SRAF insertion and OPC methods are widely applied in industrial application but suffer from the extremely long runtime due to iterative optimization process. In this paper, we propose the first work developing a deep learning framework to simultaneously perform SRAF insertion and edge-based OPC. In addition, to make the optimized masks more reliable and convincing for industrial application, we employ a commercial lithography simulation tool to consider the quality of wafer image with various lithographic metrics. The effectiveness and efficiency of the proposed framework are demonstrated in experimental results, which also show the success of machine learning-based lithography optimization techniques for the current complex and large-scale circuit layouts.

SESSION: System level modelling methods I

AxDNN: towards the cross-layer design of approximate DNNs

Yinghui Fan
Xiaoxi Wu
Jiying Dong
Zhi Qi

Thanks for the inborn error resistance of neural networks, approximate computing has become a promising and hardware friendly technique to improve the energy efficiency of DNNs. From the layer of algorithms, architectures, to circuits, there are many possibilities to implement approximate DNNs. However, the complicated interaction between major design concerns, e.g., power performance, and the lack of an efficient simulator cross multiple design layers have generated suboptimal solutions of approximate DNNs through the conventional design method. In this paper, we present a systematical framework towards the cross-layer design of approximation DNNs. By introducing hardware imperfection to the training phase, the accuracy of DNN models can be recovered by up to 5.32% when the most aggressive approximate multiplier has been used. Integrated with the techniques of activation pruning and voltage scaling, the energy efficiency of the approximate DNN accelerator can be improved by 52.5% on average. We also build a pre-RTL simulation environment where we can easily express accelerator architectures, try the combination of different approximate strategies, and evaluate the power consumption. Experiments demonstrate the pre-RTL simulation has achieved ~20X speed up compared with traditional RTL method when evaluating the same target. The convenient pre-RTL simulation helps us to quickly figure out the trade-off between accuracy and energy at the design stage for an approximate DNN accelerator.

Simulate-the-hardware: training accurate binarized neural networks for low-precision neural accelerators

Jiajun Li
Ying Wang
Bosheng Liu
Yinhe Han
Xiaowei Li

This work investigates how to effectively train binarized neural networks (BNNs) for the specialized low-precision neural accelerators. When mapping BNNs onto the specialized neural accelerators that adopt fixed-point feature data representation and binary parameters, due to the operation overflow caused by short fixed-point coding, the BNN inference results from the deep learning frameworks on CPU/GPU will be inconsistent with those from the accelerators. This issue leads to a large deviation between the training environment and the inference implementation, and causes potential model accuracy losses when deployed on the accelerators. Therefore, we present a series of methods to contain the overflow phenomenon, and enable typical deep learning frameworks like Tensorflow to effectively train BNNs that could work with high accuracy and convergence speed on the specialized neural accelerators.

An N-way group association architecture and sparse data group association load balancing algorithm for sparse CNN accelerators

Jingyu Wang
Zhe Yuan
Ruoyang Liu
Huazhong Yang
Yongpan Liu

In recent years, ASIC CNN Accelerators have attracted great attention among researchers for the high performance and energy efficiency. Some former works utilize the sparsity of CNN networks to improve the performance and the energy efficiency. However, these methods bring tremendous overhead to the output memory, and the performance suffers from the hash collision. This paper presents: 1) an N-Way Group Association Architecture to reduce the memory overhead for Sparse CNN Accelerators; 2) a Sparse Data Group Association Load Balancing Algorithm which is implemented by the Scheduler module in the architecture to reduce the collision rate and improve the performance. Compared with the state-of-art accelerator, this work achieves either 1) 1.74x performance with 50% memory overhead reduction in the 4-way associated design or 2) 1.91x performance without memory overhead reduction the 2-way associated design, which is close to the theoretical performance limit (without collision).

Maximizing power state cross coverage in firmware-based power management

Vladimir Herdt
Hoang M. Le
Daniel Große
Rolf Drechsler

Virtual Prototypes (VPs) are becoming increasingly attractive for the early analysis of SoC power management, which is nowadays mostly implemented in firmware (FW). Power and timing constraints can be monitored and validated by executing a set of test-cases in a power-aware FW/VP co-simulation. In this context, cross coverage of power states is an effective but challenging quality metric. This paper proposes a novel coverage-driven approach to automatically generate test-cases maximizing this cross coverage. In particular, we integrate a coverage-loop that successively refines the generation process based on previous results. We demonstrate our approach on a LEON3-based VP.

SESSION: Testing and design for security

Improving scan chain diagnostic accuracy using multi-stage artificial neural networks

Mason Chern
Shih-Wei Lee
Shi-Yu Huang
Yu Huang
Gaurav Veda
Kun-Han (Hans) Tsai
Wu-Tung Cheng

Diagnosis of intermittent scan chain failures remains a hard problem. We demonstrate that Artificial Neural Networks (ANNs) can be used to achieve significantly higher accuracy. The key is to take on domain knowledge and use a multi-stage process incorporating ANNs with gradually refined focuses. Experimental results on benchmark circuits show that this method is, on average, 20% more accurate than a state-of-the-art commercial tool for intermittent stuck-at faults, and improves the hit rate from 25.3% to 73.9% for some test-case.

Testing stuck-open faults of priority address encoder in content addressable memories

Tsai-Ling Tsai
Jin-Fu Li
Chun-Lung Hsu
Chi-Tien Su

Content addressable memory (CAM) is widely used in the systems with the need of parallel search. The testing of CAM is more difficult than that of random access memory (RAM) due to the complicated function of CAM. Similar to the testing of RAM, the testing of CAM should cover the cell array and peripheral circuits. In this paper, we propose a March-like test, March-PCL, for detecting the stuck-open faults (SOFs) of the priority address encoder of CAMs. As the best of our knowledge, this is the first word to discuss the testing of SOFs of the priority address encoder of CAMs. The March-PCL requires 4N Write and 4N Compare operations to cover 100% SOFs.

ScanSAT: unlocking obfuscated scan chains

Lilas Alrahis
Muhammad Yasin
Hani Saleh
Baker Mohammad
Mahmoud Al-Qutayri
Ozgur Sinanoglu

While financially advantageous, outsourcing key steps such as testing to potentially untrusted Outsourced Semiconductor Assembly and Test (OSAT) companies may pose a risk of compromising on-chip assets. Obfuscation of scan chains is a technique that hides the actual scan data from the untrusted testers; logic inserted between the scan cells, driven by a secret key, hide the transformation functions between the scan-in stimulus (scan-out response) and the delivered scan pattern (captured response). In this paper, we propose ScanSAT: an attack that transforms a scan obfuscated circuit to its logic-locked version and applies a variant of the Boolean satisfiability (SAT) based attack, thereby extracting the secret key. Our empirical results demonstrate that ScanSAT can easily break naive scan obfuscation techniques using only three or fewer attack iterations even for large key sizes and in the presence of scan compression.

CycSAT-unresolvable cyclic logic encryption using unreachable states

Amin Rezaei
You Li
Yuanqi Shen
Shuyu Kong
Hai Zhou

Logic encryption has attracted much attention due to increasing IC design costs and growing number of untrusted foundries. Unreachable states in a design provide a space of flexibility for logic encryption to explore. However, due to the available access of scan chain, traditional combinational encryption cannot leverage the benefit of such flexibility. Cyclic logic encryption inserts key-controlled feedbacks into the original circuit to prevent piracy and overproduction. Based on our discovery, cyclic logic encryption can utilize unreachable states to improve security. Even though cyclic encryption is vulnerable to a powerful attack called CycSAT, we develop a new way of cyclic encryption by utilizing unreachable states to defeat CycSAT. The attack complexity of the proposed scheme is discussed and its robustness is demonstrated.

SESSION: Network-centric design and system

Routing in optical network-on-chip: minimizing contention with guaranteed thermal reliability

Mengquan Li
Weichen Liu
Lei Yang
Peng Chen
Duo Liu
Nan Guan

Communication contention and thermal susceptibility are two potential issues in optical network-on-chip (ONoC) architecture, which are both critical for ONoC designs. However, minimizing conflict and guaranteeing thermal reliability are incompatible in most cases. In this paper, we present a routing criterion in the network level. Combined with device-level thermal tuning, it can implement thermal-reliable ONoC. We further propose two routing approaches (including a mixed-integer linear programming (MILP) model and a heuristic algorithm (CAR)) to minimize communication conflict based on the guaranteed thermal reliability, and meanwhile, mitigate the energy overheads of thermal regulation in the presence of chip thermal variations. By applying the criterion, our approaches achieve excellent performance with largely reduced complexity of design space exploration. Evaluation results on synthetic communication traces and realistic benchmarks show that the MILP-based approach achieves an average of 112.73% improvement in communication performance and 4.18% reduction in energy overhead compared to state-of-the-art techniques. Our heuristic algorithm only introduces 4.40% performance difference compared to the optimal results and is more scalable to large-size ONoCs.

Bidirectional tuning of microring-based silicon photonic transceivers for optimal energy efficiency

Yuyang Wang
M. Ashkan Seyedi
Jared Hulme
Marco Fiorentino
Raymond G. Beausoleil
Kwang-Ting Cheng

Microring-based silicon photonic transceivers are promising to resolve the communication bottleneck of future high-performance computing systems. To rectify process variations in microring resonance wavelengths, thermal tuning is usually preferred over electrical tuning due to its preservation of extinction ratios and quality factors. However, the low energy efficiency of resistive thermal tuners results in nontrivial tuning cost and overall energy consumption of the transceiver. In this study, we propose a hybrid tuning strategy which involves both thermal and electrical tuning. Our strategy determines the tuning direction of each resonance wavelength with the goal of optimizing the transceiver energy efficiency without compromising signal integrity. Formulated as an integer programming problem and solved by a genetic algorithm, our tuning strategy yields 32%~53% savings of overall energy per bit for measured data of 5-channel transceivers at 5~10 Gb/s per channel, and up to 24% saving for synthetic data of 30-channel transceivers, generated based on the process variation models built upon measured data. We further investigated a polynomial-time approximation method which achieves over 100x speedup in tuning scheme computation, while still maintaining considerable energy-per-bit savings.

Redeeming chip-level power efficiency by collaborative management of the computation and communication

Ning Lin
Hang Lu
Xin Wei
Xiaowei Li

Power consumption is the first order design constraint in future many-core processors. Conventional power management approaches usually focus on certain functional components, either computation or communication hardware resources, trying to optimize its power consumption as much as possible, while leave the other part untouched. However, such unilateral power control concept, though has some potentials to contribute overall power reduction, cannot guarantee the optimal power efficiency of the chip. In this paper, we propose a novel Collaborative management approach, coordinating both Computation and Communication infrastructure in tandem, termed as CoCom. Apart from prior work that deals with power control separately, it leverages the correlations between the two parts, as the “key chain” to guide their respective power state coordination to the appropriate direction. Besides, it uses dedicated hybrid on-chip/off-chip mechanisms to minimize the control cost and simultaneously guarantee the effectiveness. Experimental results show that, compared with the conventional unilateral baselines, CoCom is able to achieve abundant power reduction with minimal performance degradation at the same time.

A high-level modeling and simulation approach using test-driven cellular automata for fast performance analysis of RTL NoC designs

Moon Gi Seok
Hessam S. Sarjoughian
Daejin Park

The simulation speedup of designed RTL NoC regarding the packet transmission is essential to analyze the performance or to optimize NoC parameters for various combinations of intellectual-property (IP) blocks, which requires repeated computations for parameter-space exploration. In this paper, we propose a high-level modeling and simulation (M&S) approach using a revised cellular automata (CA) concept to speed up simulation of dynamic flit movements and queue occupancy within target RTL NoC. The CA abstracts the detailed RTL operations with the view of deciding a cell’s state of actions (related to moving packet flits and changing the connection between CA cells) using its own high-level states and those of neighbors, and executing relevant operations to the decided action states. During the performing the operations including connection requests and acceptances, architecture-independent and user-developed routing and arbitration functions are utilized. The decision regarding the action states follows a rule set, which is generated by the proposed test environment. The proposed method was applied to an open-source Verilog NoC, which achieves simulation speedup by approximately 8 to 31 times for a given parameter set.

SESSION: Advanced memory systems

A sharing-aware L1.5D cache for data reuse in GPGPUs

Jianfei Wang
Li Jiang
Jing Ke
Xiaoyao Liang
Naifeng Jing

With GPUs heading towards general-purpose, hardware caching, e.g. the first-level data (L1D) cache is introduced into the on-chip memory hierarchy for GPGPUs. However, facing the GPGPU massive multi-threading, the small L1D requires a better management for a higher hit rate to benefit the performance. In this paper, on observing the L1D usage inefficiency, such as data duplication among streaming multiprocessors (SMs) that wastes the precious L1D resources, we first propose a shared L1.5D cache that substitutes the private L1D caches in several SMs to reduce the duplicated data and in turn increase the effective cache size for each SM. We evaluate and adopt a suitable layout of L1.5D to meet the timing requirements in GPGPUs. Then, to protect the sharable data from early evictions, we propose a sharable data aware cache management, which leverages a lightweight PC-based history table to protect sharable data on cache replacement. The experiments demonstrate that the proposed design can achieve an averaged 20.1% performance improvement with an increased on-chip hit rate by 16.9% for applications with sharable data.

NeuralHMC: an efficient HMC-based accelerator for deep neural networks

Chuhan Min
Jiachen Mao
Hai Li
Yiran Chen

In Deep Neural Network (DNN) applications, energy consumption and performance cost of moving data between memory hierarchy and computational units are significantly higher than that of the computation itself. Process-in-memory (PIM) architecture such as Hybrid Memory Cube (HMC), becomes an excellent candidate to improve the data locality for efficient DNN execution. However, it’s still hard to efficiently deploy large-scale matrix computation in DNN on HMC because of its coarse grained packet protocol. In this work, we propose NeuralHMC, the first HMC-based accelerator tailored for efficient DNN execution. Experimental results show that NeuralHMC reduces the data movement by 1.4x to 2.5x (depending on the DNN data reuse strategy) compared to Von Neumann architecture. Furthermore, compared to state-of-the-art PIM-based DNN accelerator, NeuralHMC can promisingly improve the system performance by 4.1x and reduces energy by 1.5x, on average.

Boosting chipkill capability under retention-error induced reliability emergency

Xianwei Zhang
Rujia Wang
Youtao Zhang
Jun Yang

The DRAM based main memory of high embedded systems faces two design challenges: (i) degrading reliability; and (ii) increasing power and energy consumption. While chipkill ECC (error correction code) and multi-rate refresh may be adopted to address them, respectively, a simple integration of the two results in 3x or more SDC (silent data corruption) errors and failing to meet the system reliability guarantee. This is referred to as reliability emergency.

In this paper, we propose PlusN, a hardware-assisted memory error protection design that adaptively boosts the baseline chipkill capability to address the reliability emergency. Based on the error probability assessment at runtime, the system switches its memory protection between the baseline chipkill and PlusN — the latter generates a stronger ECC with low storage and access overheads. Our experimental results show that PlusN can effectively enforce the system reliability guarantee under different reliability emergency scenarios.

SESSION: Learning: make patterning light and right

SRAF insertion via supervised dictionary learning

Hao Geng
Haoyu Yang
Yuzhe Ma
Joydeep Mitra
Bei Yu

In modern VLSI design flow, sub-resolution assist feature (SRAF) insertion is one of the resolution enhancement techniques (RETs) to improve chip manufacturing yield. With aggressive feature size continuously scaling down, layout feature learning becomes extremely critical. In this paper, for the first time, we enhance conventional manual feature construction, by proposing a supervised online dictionary learning algorithm for simultaneous feature extraction and dimensionality reduction. By taking advantage of label information, the proposed dictionary learning engine can discriminatively and accurately represent the input data. We further consider SRAF design rules in a global view, and design an integer linear programming model in the post-processing stage of SRAF insertion framework. Experimental results demonstrate that, compared with a state-of-the-art SRAF insertion tool, our framework not only boosts the mask optimization quality in terms of edge placement error (EPE) and process variation (PV) band area, but also achieves some speed-up.

A fast machine learning-based mask printability predictor for OPC acceleration

Bentian Jiang
Hang Zhang
Jinglei Yang
Evangeline F. Y. Young

Continuous shrinking of VLSI technology nodes brings us powerful chips with lower power consumption, but it also introduces many issues in manufacturability. Lithography simulation process for new feature size suffers from large computational overhead. As a result, conventional mask optimization process has been drastically resource consuming in terms of both time and cost. In this paper, we propose a high performance machine learning-based mask printability evaluation framework for lithography-related applications, and apply it in a conventional mask optimization tool to verify its effectiveness.

Semi-supervised hotspot detection with self-paced multi-task learning

Ying Chen
Yibo Lin
Tianyang Gai
Yajuan Su
Yayi Wei
David Z. Pan

Lithography simulation is computationally expensive for hotspot detection. Machine learning based hotspot detection is a promising technique to reduce the simulation overhead. However, most learning approaches rely on a large amount of training data to achieve good accuracy and generality. At the early stage of developing a new technology node, the amount of data with labeled hotspots or non-hotspots is very limited. In this paper, we propose a semi-supervised hotspot detection with self-paced multi-task learning paradigm, leveraging both data samples w./w.o. labels to improve model accuracy and generality. Experimental results demonstrate that our approach can achieve 2.9–4.5% better accuracy at the same false alarm levels than the state-of-the-art work using 10%-50% of training data. The source code and trained models are released on https://github.com/qwepi/SSL.

SESSION: Design and CAD for emerging memories

Exploring emerging CNFET for efficient last level cache design

Dawen Xu
Li Li
Ying Wang
Cheng Liu
Huawei Li

Carbon Nanotube field-effect transistors (CNFET) emerge as a promising alternative to the conventional CMOS for the much higher speed and power efficiency. It is particularly suitable for building the power-hungry last level cache (LLC). However, the process variation (PV) in CNFET substantially affects the operation stability and thus the worst-case timing, which limits the LLC operation frequency dramatically given a fully synchronous design. To address this problem, we developed a variation-aware cache such that each part of the cache can run at its optimal frequency and the overall cache performance can be improved significantly.

While asymmetric-correlated in the variation unique to the CNFET fabrication process, this indicates that cache latency distribution is closely related with the LLC layouts. For the two typical LLC layouts, we proposed variation-aware-set (VAS) cache and variation-aware-way (VAW) cache respectively to make best use of the CNFET cache architecture. For VAS cache, we further proposed a static page mapping to ensure the most frequent used data are mapped to the fast cache region. Similarly, we apply a latency-aware LRU replacement strategy to assign the most recent data to the fast cache region. According to the experiments, the optimized CNFET based LLC improves the performance by 39% and reduces the power consumption by 10% on average compared to the baseline CNFET LLC design.

Mosaic: an automated synthesis flow for boolean logic based on memristor crossbar

Lei Xie

Memristor crossbar stacked on the top of CMOS circuitry is a promising candidate for future VLSI circuits, due to its great scalability, near-zero standby power consumption, etc. In order to design large-scale logic circuits, an automated synthesis flow is highly demanded to map Boolean functions onto memristor crossbar. This paper proposes such a synthesis flow, Mosaic by reusing a part of the existing CMOS synthesis flow. In addition, two schemes are proposed to optimize designs in terms of delay and power consumption. To verify Mosaic and its optimization schemes, four types of adders are used as a study case; the incurred delay, area and power costs for both the crossbar and its CMOS controller are evaluated. The results show that the optimized adders reduce delay (>26%), power consumption (>21%) and area (>23%) as compared to initial ones. To show the potential of Mosaic for design space exploration, we use other nice more complex benchmarks. The results shows that the design can be significantly optimized in terms of both area (4.5x to 82.9x) and delay (2.4x to 9.5x).

Handling stuck-at-faults in memristor crossbar arrays using matrix transformations

Baogang Zhang
Necati Uysal
Deliang Fan
Rickard Ewetz

Matrix-vector multiplication is the dominating computational workload in the inference phase of neural networks. Memristor crossbar arrays (MCAs) can inherently execute matrix-vector multiplication with low latency and small power consumption. A key challenge is that the classification accuracy may be severely degraded by stuck-at-fault defects. Earlier studies have shown that the accuracy loss can be recovered by retraining each neural network or by utilizing additional hardware. In this paper, we propose to handle stuck-at-faults using matrix transformations. A transformation T changes a weight matrix W into a weight matrix, @ = T(W), which is more robust to stuck-at-faults. In particular, we propose a row flipping transformation, a permutation transformation, and a value range transformation. The row flipping transformation results in that stuck-off (stuck-on) faults are translated into stuck-on (stuck-off) faults. The permutation transformation maps small (large) weights to memristors stuck-off (stuck-on). The value range transformation is based on reducing the magnitude of the smallest and largest elements in the matrix, which results in that each stuck-at-fault introduces an error of smaller magnitude. The experimental results demonstrate that the proposed framework is capable of recovering 99% of the accuracy loss introduced by stuck-at-faults without requiring the neural network to be retrained.

SESSION: Optimized training for neural networks

CAPTOR: a class adaptive filter pruning framework for convolutional neural networks in mobile applications

Zhuwei Qin
Fuxun Yu
Chenchen Liu
Xiang Chen

Nowadays, the evolution of deep learning and cloud service significantly promotes neural network based mobile applications. Although intelligent and prolific, those applications still lack certain flexibility: For classification tasks, neural networks are generally trained online with vast classification targets to cover various utilization contexts. However, only partial classes are practically tested due to individual mobile user preference and application specificity. Thus the unneeded classes cause considerable computation and communication cost. In this work, we propose CAPTOR – a class-level reconfiguration framework for Convolutional Neural Networks (CNNs). By identifying the class activation preference of convolutional filters through feature interest visualization and gradient analysis, CAPTOR can effectively cluster and adaptively prune the filters associated with unneeded classes. Therefore, CAPTOR enables class-level CNN reconfiguration for network model compression and local deployment on mobile devices. Experiment shows that, CAPTOR can reduce computation load for VGG-16 by up to 40.5% and 37.9% energy consumption with ignored loss of accuracy. For AlexNet, CAPTOR also reduces computation load by up to 42.8% and 37.6% energy consumption with less than 3% loss in accuracy.

TNPU: an efficient accelerator architecture for training convolutional neural networks

Jiajun Li
Guihai Yan
Wenyan Lu
Shuhao Jiang
Shijun Gong
Jingya Wu
Junchao Yan
Xiaowei Li

Training large scale convolutional neural networks (CNNs) is an extremely computation and memory intensive task that requires massive computational resources and training time. Recently, many accelerator solutions have been proposed to improve the performance and efficiency of CNNs. Existing approaches mainly focus on the inference phase of CNN, and can hardly address the new challenges posed in CNN training: the resource requirement diversity and bidirectional data dependency between convolutional layers (CVLs) and fully-connected layers (FCLs). To overcome this problem, this paper presents a new accelerator architecture for CNN training, called TNPU, which leverages the complementary effect of the resource requirements between CVLs and FCLs. Unlike prior approaches optimizing CVLs and FCLs in separate way, we take an alternative by smartly orchestrating the computation of CVLs and FCLs in single computing unit to work concurrently so that both computing and memory resources will maintain high utilization, thereby boosting the performance. We also proposed a simplified out-of-order scheduling mechanism to address the bidirectional data dependency issues in CNN training. The experiments show that TNPU achieves a speedup of 1.5x and 1.3x, with an average energy reduction of 35.7% and 24.1% over comparably provisioned state-of-the-art accelerators (DNPU and DaDianNao), respectively.

REIN: a robust training method for enhancing generalization ability of neural networks in autonomous driving systems

Fuxun Yu
Chenchen Liu
Xiang Chen

In recent years, neural network has shown its great potential in autonomous driving systems. However, the theoretically well-train neural networks usually fail their performance when facing real-world examples with unexpected physical variations. As the current neural networks still suffer from limited generalization ability, those unexpected variations would cause considerable accuracy degradation and critical safety issues. Therefore, the generalization ability of neural networks becomes one of the most critical challenges for autonomous driving system design. In this work, we propose a robust training method to enhance neural network’s generalization ability in various practical autonomous driving scenarios. Based on detailed practical variation modeling and neural network generation ability analysis, the proposed training method could consistently improve model classification accuracy by at most 25% in various scenarios (e.g. raining/fogy, dark lighting, and camera discrepancy). Even with adversarial corner cases, our model could still achieve at most 40% accuracy improvement over natural model.

SESSION: New trends in biochips

Factorization based dilution of biochemical fluids with micro-electrode-dot-array biochips

Sohini Saha
Debraj Kundu
Sudip Roy
Sukanta Bhattacharjee
Krishnendu Chakrabarty
Partha P. Chakrabarti
Bhargab B. Bhattacharya

Sample preparation, an essential preprocessing step for biochemical protocols, is concerned with the generation of fluids satisfying specific target ratios and error-tolerance. Recent micro-electrode-dot-array (MEDA)-based DMF biochips provide the advantage of supporting both discrete and dynamic mixing models, the power of which has not yet been fully harnessed for implementing on-chip dilution and mixing of fluids. In this paper, we propose a novel factorization-based algorithm called FacDA for efficient and accurate dilution of sample fluid on a MEDA chip. Simulation results reveal that over a large number of test-cases with the mixing volume constraint in the range of 4–10 units, FacDA requires around 38% fewer mixing steps, 52% less sample units, and generates approximately 23% less wastage, all on average, compared to two prior dilution algorithms used for MEDA chips.

Sample preparation for multiple-reactant bioassays on micro-electrode-dot-array biochips

Tung-Che Liang
Yun-Sheng Chan
Tsung-Yi Ho
Krishnendu Chakrabarty
Chen-Yi Lee

Sample preparation, as a key procedure in many biochemical protocols, mixes various samples and/or reagents into solutions that contain the target concentrations. Digital microfluidic biochips (DMFBs) have been adopted as a platform for sample preparation because they provide automatic procedures that require less reactant consumption and reduce human-induced errors. However, traditional DMFBs only utilize the (1:1) mixing model, i.e., only two droplets of the same volume can be mixed at a time, which results in higher completion time and the wastage of valuable reactants. To overcome this limitation, a next-generation micro-electrode-dot-array (MEDA) architecture that provides flexibility of mixing multiple droplets of different volumes in a single operation was proposed. In this paper, we present a generic multiple-reactant sample preparation algorithm that exploits the novel fluidic operations on MEDA biochips. Simulated experiments show that the proposed method outperforms existing methods in terms of saving reactant cost, minimizing the number of operations, and reducing the amount of waste.

Robust sample preparation on digital microfluidic biochips

Zhanwei Zhong
Robert Wille
Krishnendu Chakrabarty

Sample preparation is an important application for the digital microfluidic biochips (DMFBs) platform, and many methods have been developed to reduce the time and reagent usage associated with on-chip sample preparation. However, errors in fluidic operations can result in the concentration of the resulting droplet being outside the calibration range. Current error-recovery methods have the drawback that they need the use of on-chip sensors and further re-execution time. In this paper, we present two dilution-chain structures that can generate a droplet with a desired concentration even if volume variations occur during droplet splitting. Experimental results show the effectiveness of the proposed method compared to previous methods.

SESSION: Power-efficient machine learning hardware design

SAADI: a scalable accuracy approximate divider for dynamic energy-quality scaling

Setareh Behroozi
Jingjie Li
Jackson Melchert
Younghyun Kim

Approximate computing can significantly improve the energy efficiency of arithmetic operations in error-resilient applications. In this paper, we propose an approximate divider design that facilitates dynamic energy-quality scaling. Conventional approximate dividers lack runtime energy-quality scalability, which is the key to maximizing the energy efficiency while meeting dynamically varying accuracy requirements. Our divider design, named SAADI, makes an approximation to the reciprocal of the divisor in an incremental manner, thus the division speed and energy efficiency can be dynamically traded for accuracy by controlling the number of iterations. For the approximate 8-bit division of 32-bit/16-bit division, the average accuracy of SAADI can be adjusted in between 92.5% and 99.0% by varying latency up to 7x. We evaluate the accuracy and energy consumption of SAADI for various design parameters and demonstrate its efficacy for low-power signal processing applications.

SeFAct: selective feature activation and early classification for CNNs

Farhana Sharmin Snigdha
Ibrahim Ahmed
Susmita Dey Manasi
Meghna G. Mankalale
Jiang Hu
Sachin S. Sapatnekar

This work presents a dynamic energy reduction approach for hardware accelerators for convolutional neural networks (CNN). Two methods are used: (1) an adaptive data-dependent scheme to selectively activate a subset of all neurons, by narrowing down the possible activated classes (2) static bitwidth reduction. The former is applied in late layers of the CNN, while the latter is more effective in early layers. Even accounting for the implementation overheads, the results show 20%–25% energy savings with 5–10% accuracy loss.

FACH: FPGA-based acceleration of hyperdimensional computing by reducing computational complexity

Mohsen Imani
Sahand Salamat
Saransh Gupta
Jiani Huang
Tajana Rosing

Brain-inspired hyperdimensional (HD) computing explores computing with hypervectors for the emulation of cognition as an alternative to computing with numbers. In HD, input symbols are mapped to a hypervector and an associative search is performed for reasoning and classification. An associative memory, which finds the closest match between a set of learned hypervectors and a query hypervector, uses simple Hamming distance metric for similarity check. However, we observe that, in order to provide acceptable classification accuracy HD needs to store non-binarized model in associative memory and uses costly similarity metrics such as cosine to perform a reasoning task. This makes the HD computationally expensive when it is used for realistic classification problems. In this paper, we propose a FPGA-based acceleration of HD (FACH) which significantly improves the computation efficiency by removing majority of multiplications during the reasoning task. FACH identifies representative values in each class hypervector using clustering algorithm. Then, it creates a new HD model with hardware-friendly operations, and accordingly propose an FPGA-based implementation to accelerate such tasks. Our evaluations on several classification problems show that FACH can provide 5.9X energy efficiency improvement and 5.1X speedup as compared to baseline FPGA-based implementation, while ensuring the same quality of classification.

SESSION: Security of machine learning and machine learning for security: progress and challenges for secure, machine intelligent mobile systems

ADMM attack: an enhanced adversarial attack for deep neural networks with undetectable distortions

Pu Zhao
Kaidi Xu
Sijia Liu
Yanzhi Wang
Xue Lin

Many recent studies demonstrate that state-of-the-art Deep neural networks (DNNs) might be easily fooled by adversarial examples, generated by adding carefully crafted and visually imperceptible distortions onto original legal inputs through adversarial attacks. Adversarial examples can lead the DNN to misclassify them as any target labels. In the literature, various methods are proposed to minimize the different l_p norms of the distortion. However, there lacks a versatile framework for all types of adversarial attacks. To achieve a better understanding for the security properties of DNNs, we propose a general framework for constructing adversarial examples by leveraging Alternating Direction Method of Multipliers (ADMM) to split the optimization approach for effective minimization of various l_p norms of the distortion, including l₀, l₁, l₂, and l_∞ norms. Thus, the proposed general framework unifies the methods of crafting l₀, l₁, l₂, and l_∞ attacks. The experimental results demonstrate that the proposed ADMM attacks achieve both the high attack success rate and the minimal distortion for the misclassification compared with state-of-the-art attack methods.

A system-level perspective to understand the vulnerability of deep learning systems

Tao Liu
Nuo Xu
Qi Liu
Yanzhi Wang
Wujie Wen

Deep neural network (DNN) is nowadays achieving the human-level performance on many machine learning applications like self-driving car, gaming and computer-aided diagnosis. However, recent studies show that such a promising technique has gradually become the major attack target, significantly threatening the safety of machine learning services. On one hand, the adversarial or poisoning attacks incurred by DNN algorithm vulnerabilities can cause the decision misleading with very high confidence. On the other hand, the system-level DNN attacks built upon models, training/inference algorithms and hardware and software in DNN execution, have also emerged for more diversified damages like denial of service, private data stealing. In this paper, we present an overview of such emerging system-level DNN attacks by systematically formulating their attack routines. Several representative cases are selected in our study to summarize the characteristics of system-level DNN attacks. Based on our formulation, we further discuss the challenges and several possible techniques to mitigate such emerging system-level DNN attacks.

HAMPER: high-performance adaptive mobile security enhancement against malicious speech and image recognition

Zirui Xu
Fuxun Yu
Chenchen Liu
Xiang Chen

Recently, the machine learning technologies have been widely used in cognitive applications such as Automatic Speech Recognition (ASR) and Image Recognition (IR). Unfortunately, these techniques have been massively used in unauthorized audio/image data analysis, causing serious privacy leakage. To address this issue, we propose HAMPER in this work, which is a data encryption framework that protects the audio/image data from unauthorized ASR/IR analysis. Leveraging machine learning models’ vulnerability to adversarial examples, HAMPER encrypt the audio/image data with adversarial noises to perturb the recognition results of ASR/IR systems. To deploy the proposed framework in extensive platforms (e.g. mobile devices), HAMPER also take into consideration of computation efficiency, perturbation transferability, as well as data attribute configuration. Therefore, rather than focusing on the high-level machine learning models, HAMPER generates adversarial examples from the low-level features. Taking advantage of the light computation load, fundamental impact, and direct configurability of the low-level features, the generated adversarial examples can efficiently and effectively affect the whole ASR/IR systems. Experiment results show that, HAMPER can effectively perturb the unauthorized ASR/IR analysis with 85% Word-Error-Rate (WER) and 83% Image-Error-Rate (IER) respectively. Also, HAMPER achieves faster processing speed with 1.5X speedup for image encryption and even 26X in audio, comparing to the state-of-the-art methods. Moreover, HAMPER achieves strong transferability and configures adversarial examples with desired attributes for better scenario adaptation.

AdverQuil: an efficient adversarial detection and alleviation technique for black-box neuromorphic computing systems

Hsin-Pai Cheng
Juncheng Shen
Huanrui Yang
Qing Wu
Hai Li
Yiran Chen

In recent years, neuromorphic computing systems (NCS) have gained popularity in accelerating neural network computation because of their high energy efficiency. The known vulnerability of neural networks to adversarial attack, however, raises a severe security concern of NCS. In addition, there are certain application scenarios in which users have limited access to the NCS. In such scenarios, defense technologies that require changing the training methods of the NCS, e.g., adversarial training become impracticable. In this work, we propose AdverQuil – an efficient adversarial detection and alleviation technique for black-box NCS. AdverQuil can identify the adversarial strength of input examples and select the best strategy for NCS to respond to the attack, without changing structure/parameter of the original neural network or its training method. Experimental results show that on MNIST and CIFAR-10 datasets, AdverQuil achieves a high efficiency of 79.5 – 167K image/sec/watt. AdverQuil introduces less than 25% of hardware overhead, and can be combined with various adversarial alleviation techniques to provide a flexible trade-off between hardware cost, energy efficiency and classification accuracy.

SESSION: System level modelling methods II

SIMULTime: Context-sensitive timing simulation on intermediate code representation for rapid platform explorations

Alessandro Cornaglia
Alexander Viehl
Oliver Bringmann
Wolfgang Rosenstiel

Nowadays, product lines are common practice in the embedded systems domain as they allow for substantial reductions in development costs and the time-to-market by a consequent application of design paradigms such as variability and structured reuse management. In that context, accurate and fast timing predictions are essential for an early evaluation of all relevant variants of a product line concerning target platform properties. Context-sensitive simulations provide attractive benefits for timing analysis. Nevertheless, these simulations depend strongly on a single configuration pair of compiler and hardware platform. To cope with this limitation, we present SIMULTime, a new technique for context-sensitive timing simulation based on the software intermediate representation. The assured simulation throughput significantly increases by simulating simultaneously different hardware hardware platforms and compiler configurations. Multiple accurate timing predictions are produced by running the simulator only once. Our novel approach was applied on several applications showing that SIMULTime increases the average simulation throughput by 90% when at least four configurations are analyzed in parallel.

Modeling processor idle times in MPSoC platforms to enable integrated DPM, DVFS, and task scheduling subject to a hard deadline

Amirhossein Esmaili
Mahdi Nazemi
Massoud Pedram

Energy efficiency is one of the most critical design criteria for modern embedded systems such as multiprocessor system-on-chips (MPSoCs). Dynamic voltage and frequency scaling (DVFS) and dynamic power management (DPM) are two major techniques for reducing energy consumption in such embedded systems. Furthermore, MPSoCs are becoming more popular for many real-time applications. One of the challenges of integrating DPM with DVFS and task scheduling of real-time applications on MPSoCs is the modeling of idle intervals on these platforms. In this paper, we present a novel approach for modeling idle intervals in MPSoC platforms which leads to a mixed integer linear programming (MILP) formulation integrating DPM, DVFS, and task scheduling of periodic task graphs subject to a hard deadline. We also present a heuristic approach for solving the MILP and compare its results with those obtained from solving the MILP.

Phone-nomenon: a system-level thermal simulator for handheld devices

Hong-Wen Chiou
Yu-Min Lee
Shin-Yu Shiau
Chi-Wen Pan
Tai-Yu Chen

This work presents a system-level thermal simulator, Phone-nomenon, to predict the thermal behavior of smartphone. First, we study the nonlinearity of internal and external heat transfer mechanisms and propose a compact thermal model. After that, we develop an iterative framework to handle the nonlinearity. Compared with a commercial tool, ANSYS Icepak, Phonenomenon can achieve two and three orders of magnitude speedup with 3.58% maximum error and 1.72°C difference for steady-state and transient-state simulations, respectively. Meanwhile, Phone-nomenon also fits the measured data of a built thermal test vehicle pretty well.

Virtual prototyping of heterogeneous automotive applications: matlab, SystemC, or both?

Xiao Pan
Carna Zivkovic
Christoph Grimm

We present a case study on virtual prototyping of automotive applications. We address the co-simulation of HW/SW systems involving firmware, communication protocols, and physical/mechanical systems in the context of model-based and agile development processes. The case study compares the Matlab/Simulink and SystemC based approaches by an e-gas benchmark. We compare the simulation performance, modeling capabilities and applicability in different stages of the development process.

SESSION: Placement

Diffusion break-aware leakage power optimization and detailed placement in sub-10nm VLSI

Sun ik Heo
Andrew B. Kahng
Minsoo Kim
Lutong Wang

A diffusion break (DB) isolates two neighboring devices in a standard cell-based design and has a stress effect on delay and leakage power. In foundry sub-10nm design enablements, device performance is changed according to the type of DB – single diffusion break (SDB) or double diffusion break (DDB) – that is used in the library cell layout. Crucially, local layout effect (LLE) can substantially affect device performance and leakage. Our present work focuses on the 2^nd DB effect, a type of LLE in which distance to the second-closest DB (i.e., a distance that depends on the placement of a given cell’s neighboring cell) also impacts performance of a given device. In this work, we implement a 2^nd DB-aware timing and leakage analysis flow, and show how a lack of 2^nd DB awareness can misguide current optimization in place-and-route stages. We then develop 2^nd DB-aware leakage optimization and detailed placement heuristics. Experimental results in a scaled foundry 14nm technology indicate that our 2^nd DB-aware analysis and optimization flow achieves, on average, 80% recovery of the leakage increment that is induced by the 2^nd DB effect, without changing design performance.

MDP-trees: multi-domain macro placement for ultra large-scale mixed-size designs

Yen-Chun Liu
Tung-Chieh Chen
Yao-Wen Chang
Sy-Yen Kuo

In this paper, we present a new hybrid representation of slicing trees and multi-packing trees, called multi-domain-packing trees (MDP-trees), for macro placement to handle ultra large-scale multi-domain mixed-size designs. A multi-domain design typically consists of a set of mixed-size domains, each with hundreds/thousands of large macros and (tens of) millions of standard cells, which is often seen in modern high-end applications (e.g., 4G LTE products and upcoming 5G ones). To the best of our knowledge, there is still no published work specifically tackling the domain planning and macro placement simultaneously. Based on binary trees, the MDP-tree is very efficient and effective for handling macro placement with multiple domains. Previous works on macro placement can handle only single-domain designs, which do not consider the global interactions among domains. In contrast, our MDP-trees plan domain regions globally, and optimize the interconnections among domains and macro/cell positions simultaneously. The placement area of each domain is well reserved, and the macro displacement is minimized from initial macro positions of the design prototype. Experimental results show that our approach can significantly reduce both the average half-perimeter wirelength and the average global routing wirelength.

A shape-driven spreading algorithm using linear programming for global placement

Shounak Dhar
Love Singhal
Mahesh A. Iyer
David Z. Pan

In this paper, we consider the problem of finding the global shape for placement of cells in a chip that results in minimum wirelength. Under certain assumptions, we theoretically prove that some shapes are better than others for purposes of minimizing wirelength, while ensuring that overlap-removal is a key constraint of the placer. We derive some conditions for the optimal shape and obtain a shape which is numerically close to the optimum. We also propose a linear-programming-based spreading algorithm with parameters to tune the resultant shape and derive a cost function that is better than total or maximum displacement objectives, that are traditionally used in many numerical global placers. Our new cost function also does not require explicit wirelength computation, and our spreading algorithm preserves to a large extent, the relative order among the cells placed after a numerical placer iteration. Our experimental results demonstrate that our shape-driven spreading algorithm improves wirelength, routing congestion and runtime compared to a bi-partitioning based spreading algorithm used in a state-of-the-art academic global placer for FPGAs.

Finding placement-relevant clusters with fast modularity-based clustering

Mateus Fogaça
Andrew B. Kahng
Ricardo Reis
Lutong Wang

In advanced technology nodes, IC implementation faces increasing design complexity as well as ever-more demanding design schedule requirements. This raises the need for new decomposition approaches that can help reduce problem complexity, in conjunction with new predictive methodologies that can help avoid bottlenecks and loops in the physical implementation flow. Notably, with modern design methodologies it would be very valuable to better predict final placement of the gate-level netlist: this would enable more accurate early assessment of performance, congestion and floorplan viability in the SOC floorplanning/RTL planning stages of design. In this work, we study a new criterion for the classic challenge of VLSI netlist clustering: how well netlist clusters “stay together” through final implementation. We propose use of several evaluators of this criterion. We also explore the use of modularity-driven clustering to identify natural clusters in a given graph without the tuning of parameters and size balance constraints typically required by VLSI CAD partitioning methods. We find that the netlist hypergraph-to-graph mapping can significantly affect quality of results, and we experimentally identify an effective recipe for weighting that also comprehends topological proximity to I/Os. Further, we empirically demonstrate that modularity-based clustering achieves better correlation to actual netlist placements than traditional VLSI CAD methods (our method is also 4X faster than use of hMetis for our largest testcases). Finally, we show a potential flow with fast “blob placement” of clusters to evaluate netlist and floorplan viability in early design stages; this flow can predict gate-level placement of 370K cells in 200 seconds on a single core.

SESSION: Algorithms and architectures for emerging applications

An approximation algorithm to the optimal switch control of reconfigurable battery packs

Shih-Yu Chen
Jie-Hong R. Jiang
Shou-Hung Welkin Ling
Shih-Hao Liang
Mao-Cheng Huang

The broad applications of lithium-ion batteries in cyber-physical systems attract intensive research on building energy-efficient battery systems. Reconfigurable battery packs have been proposed to improve reliability and energy efficiency. Despite recent efforts, how to simultaneously maximize battery usage time and minimize switching count during reconfiguration is rarely addressed. In this work, we devise a control algorithm that, under a simplified battery model, achieves the longest usage time under a given constant power-load while the switching count is at most twice above the minimum. It is further generalized for arbitrary power-loads and adjusted for refined battery models. Simulation experiments show promising benefits of the proposed algorithm.

Autonomous vehicle routing in multiple intersections

Sheng-Hao Lin
Tsung-Yi Ho

Advancements in artificial intelligence and Internet of Things indicates the realization of commercial autonomous vehicles is almost ready. With autonomous vehicles comes new approaches in solving some of the current traffic problems such as fuel consumption, congestion, and high incident rates. Autonomous Intersection Management (AIM) is an example that utilizes the unique attributes of autonomous vehicles to improve the efficiency of a single intersection. However, in a system of interconnected intersections, just by improving individual intersections does not guarantee a system optimum. Therefore, we extend from a single intersection to a grid of intersections and propose a novel vehicle routing method for autonomous vehicles that can effectively reduce the travel time of each vehicle. With dedicated short range communications and the fine-grained control of autonomous vehicles, we are able to apply wire routing algorithms with modified constraints to vehicle routing. Our method intelligently avoids congestions by simulating the future traffic and thereby achieving a system optimum.

GRAM: graph processing in a ReRAM-based computational memory

Minxuan Zhou
Mohsen Imani
Saransh Gupta
Yeseong Kim
Tajana Rosing

The performance of graph processing for real-world graphs is limited by inefficient memory behaviours in traditional systems because of random memory access patterns. Offloading computations to the memory is a promising strategy to overcome such challenges. In this paper, we exploit the resistive memory (ReRAM) based processing-in-memory (PIM) technology to accelerate graph applications. The proposed solution, GRAM, can efficiently executes vertex-centric model, which is widely used in large-scale parallel graph processing programs, in the computational memory. The hardware-software co-design used in GRAM maximizes the computation parallelism while minimizing the number of data movements. Based on our experiments with three important graph kernels on seven real-world graphs, GRAM provides 122.5X and 11.1x speedup compared with an in-memory graph system and optimized multithreading algorithms running on a multi-core CPU. Compared to a GPU-based graph acceleration library and a recently proposed PIM accelerator, GRAM improves the performance by 7.1X and 3.8X respectively.

ADEPOS: anomaly detection based power saving for predictive maintenance using edge computing

Sumon Kumar Bose
Bapi Kar
Mohendra Roy
Pradeep Kumar Gopalakrishnan
Arindam Basu

In Industry 4.0, predictive maintenance (PdM) is one of the most important applications pertaining to the Internet of Things (IoT). Machine learning is used to predict the possible failure of a machine before the actual event occurs. However, main challenges in PdM are: (a) lack of enough data from failing machines, and (b) paucity of power and bandwidth to transmit sensor data to cloud throughout the lifetime of the machine. Alternatively, edge computing approaches reduce data transmission and consume low energy. In this paper, we propose Anomaly Detection based Power Saving (ADEPOS) scheme using approximate computing through the lifetime of the machine. In the beginning of the machine’s life, low accuracy computations are used when machine is healthy. However, on detection of anomalies as time progresses, system is switched to higher accuracy modes. We show using the NASA bearing dataset that using ADEPOS, we need 8.8X less neurons on average and based on post-layout results, the resultant energy savings are 6.4–6.65X.

SESSION: Embedded software for parallel architecture

Efficient sporadic task handling in parallel AUTOSAR applications using runnable migration

Milan Copic
Rainer Leupers
Gerd Ascheid

Automotive software has become immensely complex. To manage this complexity, a safety-critical application is commonly written respecting the AUTOSAR standard and deployed on a multi-core ECU. However, parallelization of an AUTOSAR task is hindered by data dependencies between runnables, the smallest code-fragments executed by the run-time system. Consequently, a substantial number of idle intervals is introduced. We propose to utilize such intervals in sporadic tasks by migrating runnables that were originally scheduled to execute in the scope of periodic tasks.

A heuristic for multi objective software application mappings on heterogeneous MPSoCs

Gereon Onnebrink
Ahmed Hallawa
Rainer Leupers
Gerd Ascheid
Awaid-Ud-Din Shaheen

Efficient development of parallel software is one of the biggest hurdles to exploit the advantages of heterogeneous multi-core architectures. Fast and accurate compiler technology is required for determining the trade-off between multiple objectives, such as power and performance. To tackle this problem, the paper at hand proposes the novel heuristic TONPET. Furthermore, it is integrated into the SLX tool suite for a detailed evaluation and an applicability study. TOPNET is tested against representative benchmarks on three different platforms and compared to a state-of-the-art Evolutionary Multi Objective Algorithm (EMOA). On average, TONPET produces 6% better Pareto fronts, while being 18X faster in the worst case.

ReRAM-based processing-in-memory architecture for blockchain platforms

Fang Wang
Zhaoyan Shen
Lei Han
Zili Shao

Blockchain’s decentralized and consensus mechanism has attracted lots of applications, such as IoT devices. Blockchain maintains a linked list of blocks and grows by mining new blocks. However, the Blockchain mining consumes huge computation resource and energy, which is unacceptable for resource-limited embedded devices. This paper for the first time presents a ReRAM-based processing-in-memory architecture for Blockchain mining, called Re-Mining. Re-Mining includes a message schedule module and a SHA computation module. The modules are composed of several basic ReRAM-based logic operations units, such as ROR, RSF and XOR. Re-Mining further designs intra-transaction and inter-transaction parallel mechanisms to accelerate the Blockchain mining. Simulation results show that the proposed Re-Mining architecture outperforms CPU-based and GPU-based implementations significantly.

SESSION: Machine learning and hardware security

Towards practical homomorphic email filtering: a hardware-accelerated secure naïve bayesian filter

Song Bian
Masayuki Hiromoto
Takashi Sato

A secure version of the naïve Bayesian filter (NBF) is proposed utilizing partially homomorphic encryption (PHE) scheme. SNBF can be implemented with only the additive homomorphism from the Paillier system, and we derive new techniques to reduce the computational cost of PHE-based SNBF. In the experiment, we implemented SNBF both in software and hardware. Compared to the best existing PHE scheme, we achieved 1,200x (resp., 398,840x) runtime reduction in the CPU (resp., ASIC) implementations, with additional 1,919x power reduction on the designated hardware multiplier. Our hardware implementation is able to classify an average-length email in 0.5 s, making it one of the most practical NBF schemes to date.

A 0.16pJ/bit recurrent neural network based PUF for enhanced machine learning attack resistance

Nimesh Shah
Manaar Alam
Durga Prasad Sahoo
Debdeep Mukhopadhyay
Arindam Basu

Physically Unclonable Function (PUF) circuits are finding wide-spread use due to increasing adoption of IoT devices. However, the existing strong PUFs such as Arbiter PUFs (APUF) and its compositions are susceptible to machine learning (ML) attacks because the challenge-response pairs have a linear relationship. In this paper, we present a Recurrent-Neural-Network PUF (RNN-PUF) which uses a combination of feedback and XOR function to significantly improve resistance to ML attack, without significant reduction in the reliability. ML attack is also partly reduced by using a shared comparator with offset-cancellation to remove bias and save power. From simulation results, we obtain ML attack accuracy of 62% for different ML algorithms, while reliability stays above 93%. This represents a 33.5% improvement in our Figure-of-Merit. Power consumption is estimated to be 12.3μW with energy/bit of ≈ 0.16pJ.

P³M: a PIM-based neural network model protection scheme for deep learning accelerator

Wen Li
Ying Wang
Huawei Li
Xiaowei Li

This work is oriented at the edge computing scenario that terminal deep learning accelerators use pre-trained neural network models distributed from third-party providers (e.g. from data center clouds) to process the private data instead of sending it to the cloud. In this scenario, the network model is exposed to the risk of being attacked in the unverified devices if the parameters and hyper-parameters are transmitted and processed in an unencrypted way. Our work tackles this security problem by using on-chip memory Physical Unclonable Functions (PUFs) and Processing-In-Memory (PIM). We allow the model execution only on authorized devices and protect the model from white-box attacks, black-box attacks and model tampering attacks. The proposed PUFs-and-PIM based Protection method for neural Models (P³M), can utilize unstable PUFs to protect the neural models in edge deep learning accelerators with negligible performance overhead. The experimental results show considerable performance improvement over two state-of-the-art solutions we evaluated.

SESSION: Memory architecture for efficient neural network computing

Learning the sparsity for ReRAM: mapping and pruning sparse neural network for ReRAM based accelerator

Jilan Lin
Zhenhua Zhu
Yu Wang
Yuan Xie

With the in-memory processing ability, ReRAM based computing gets more and more attractive for accelerating neural networks (NNs). However, most ReRAM based accelerators cannot support efficient mapping for sparse NN, and we need to map the whole dense matrix onto ReRAM crossbar array to achieve O(1) computation complexity. In this paper, we propose a sparse NN mapping scheme based on elements clustering to achieve better ReRAM crossbar utilization. Further, we propose crossbar-grained pruning algorithm to remove the crossbars with low utilization. Finally, since most current ReRAM devices cannot achieve high precision, we analyze the effect of quantization precision for sparse NN, and propose to complete high-precision composing in the analog field and design related periphery circuits. In our experiments, we discuss how the system performs with different crossbar sizes to choose the optimized design. Our results show that our mapping scheme for sparse NN with proposed pruning algorithm achieves 3 — 5X energy efficiency and more than 2.5 — 6X speedup, compared with those accelerators for dense NN. Also, the accuracy experiments show that our pruning method appears to have almost no accuracy loss.

In-memory batch-normalization for resistive memory based binary neural network hardware

Hyungjun Kim
Yulhwa Kim
Jae-Joon Kim

Binary Neural Network (BNN) has a great potential to be implemented on Resistive memory Crossbar Array (RCA)-based hardware accelerators because it requires only 1-bit precision for weights and activations. While general structures to implement convolution or fully-connected layers in RCA-based BNN hardware were actively studied in previous works, Batch-Normalization (BN) layer, which is another key layer of BNN, has not been discussed in depth yet. In this work, we propose in-memory batch-normalization schemes which integrate BN layers on RCA so that area/energy-efficiency of the BNN accelerators can be maximized. In addition, we also show that sense amp error due to device mismatch can be suppressed using the proposed in-memory BN design.

XOMA: exclusive on-chip memory architecture for energy-efficient deep learning acceleration

Hyeonuk Sim
Jason H. Anderson
Jongeun Lee

State-of-the-art deep neural networks (DNNs) require hundreds of millions of multiply-accumulate (MAC) computations to perform inference, e.g. in image-recognition tasks. To improve the performance and energy efficiency, deep learning accelerators have been proposed, realized both on FPGAs and as custom ASICs. Generally, such accelerators comprise many parallel processing elements, capable of executing large numbers of concurrent MAC operations. From the energy perspective, however, most consumption arises due to memory accesses, both to off-chip external memory, and on-chip buffers. In this paper, we propose an on-chip DNN co-processor architecture where minimizing memory accesses is the primary design objective. To the maximum possible extent, off-chip memory accesses are eliminated, providing lowest-possible energy consumption for inference. Compared to a state-of-the-art ASIC, our architecture requires 36% fewer external memory accesses and 53% less energy consumption for low-latency image classification.

SESSION: Logic-level security and synthesis

BeSAT: behavioral SAT-based attack on cyclic logic encryption

Yuanqi Shen
You Li
Amin Rezaei
Shuyu Kong
David Dlott
Hai Zhou

Cyclic logic encryption is newly proposed in the area of hardware security. It introduces feedback cycles into the circuit to defeat existing logic decryption techniques. To ensure that the circuit is acyclic under the correct key, CycSAT is developed to add the acyclic condition as a CNF formula to the SAT-based attack. However, we found that it is impossible to capture all cycles in any graph with any set of feedback signals as done in the CycSAT algorithm. In this paper, we propose a behavioral SAT-based attack called BeSAT. Be-SAT observes the behavior of the encrypted circuit on top of the structural analysis, so the stateful and oscillatory keys missed by CycSAT can still be blocked. The experimental results show that BeSAT successfully overcomes the drawback of CycSAT.

Structural rewriting in XOR-majority graphs

Zhufei Chu
Mathias Soeken
Yinshui Xia
Lunyao Wang
Giovanni De Micheli

In this paper, we present a structural rewriting method for a recently proposed XOR-Majority graph (XMG), which has exclusive-OR (XOR), majority-of-three (MAJ), and inverters as primitives. XMGs are an extension of Majority-Inverter Graphs (MIGs). Previous work presented an axiomatic system, Ω, and its derived transformation rules for manipulation of MIGs. By additionally introducing XOR primitive, the identities of MAJ-XOR operations should be exploited to enable powerful logic rewriting in XMGs. We first proposed two MAJ-XOR identities and exploit its potential optimization opportunities during structural rewriting. Then, we discuss the rewriting rules that can be used for different operations. Finally, we also address structural XOR detection problem in MIG. The experimental results on EPFL benchmark suites show that the proposed method can optimize the size/depth product of XMGs and its mapped look-up tables (LUTs), which in turn benefits the quantum circuit synthesis that using XMG as the underlying logic representations.

Design automation for adiabatic circuits

Alwin Zulehner
Michael P. Frank
Robert Wille

Adiabatic circuits are heavily investigated since they allow for computations with an asymptotically close to zero energy dissipation per operation—serving as an alternative technology for many scenarios where energy efficiency is preferred over fast execution. Their concepts are motivated by the fact that the information lost from conventional circuits results in an entropy increase which causes energy dissipation. To overcome this issue, computations are performed in a (conditionally) reversible fashion which, additionally, have to satisfy switching rules that are different from conventional circuitry—crying out for dedicated design automation solutions. While previous approaches either focus on their electrical realization (resulting in small, hand-crafted circuits only) or on designing fully reversible building blocks (an unnecessary overhead), this work aims for providing an automatic and dedicated design scheme that explicitly takes the recent findings in this domain into account. To this end, we review the theoretical and technical background of adiabatic circuits and present automated methods that dedicatedly realize the desired function as an adiabatic circuit. The resulting methods are further optimized—leading to an automatic and efficient design automation for this promising technology. Evaluations confirm the benefits and applicability of the proposed solution.

SESSION: Analysis and algorithms for digital design verification

A figure of merit for assertions in verification

Samuel Hertz
Debjit Pal
Spencer Offenberger
Shobha Vasudevan

Assertion quality is critical to the confidence and claims in a design’s verification. In current practice, there is no metric to evaluate assertions. We introduce a methodology to rank register transfer level (RTL) assertions. We define assertion importance and assertion complexity and present efficient algorithms to compute them. Our method ranks each assertion according to its importance and complexity. We demonstrate the effectiveness of our ranking for pre-silicon verification on a detailed case study. For completeness, we study the relevance of our highly ranked assertions in a post-silicon validation context, using traced and restored signal values from the design’s netlist.

Suspect2vec: a suspect prediction model for directed RTL debugging

Neil Veira
Zissis Poulos
Andreas Veneris

Automated debugging tools based on Boolean Satisfiability (SAT) have greatly alleviated the time and effort required to diagnose and rectify a failing design. Practical experience shows that long-running debugging instances can often be resolved faster using partial results that are available before the SAT solver completes its search. In such cases it is preferable for the tool to maximize the number of suspects it returns during the early stages of its deployment. To capitalize on this observation, this paper proposes a directed SAT-based debugging algorithm which prioritizes examining design locations that are more likely to be suspects. This prioritization is determined by suspect2vec — a model which learns from historical debug data to predict the suspect locations that will be found. Experiments show that this algorithm is expected to find 16% more suspects than the baseline algorithm if terminated prematurely, while still retaining the ability to find all suspects if executed to completion. Key to its performance and a contribution of this work is the accuracy of the suspect prediction model. This is because incorrect predictions introduce overhead in exploring parts of the search space where few or no solutions exist. Suspect2vec is experimentally demonstrated to outperform existing suspect prediction methods by an average accuracy of 5–20%.

Path controllability analysis for high quality designs

Li-Jie Chen
Hong-Zu Chou
Kai-Hui Chang
Sy-Yen Kuo
Chi-Lai Huang

Given a design variable and its fanin cone, determining whether one fanin variable has controlling power over other fanin variables can benefit many design steps such as verification, synthesis and test generation. In this work we formulate this path controllability problem and propose several algorithms that not only solve this problem but also return values that enable or block other fanin variables. Empirical results show that our algorithms can effectively perform path controllability analysis and help produce high-quality designs.

SESSION: FPGA and optics-based neural network designs

Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS

Qin Li
Xiaofan Zhang
JinJun Xiong
Wen-mei Hwu
Deming Chen

Neural machine translation (NMT) is a popular topic in Natural Language Processing which uses deep neural networks (DNNs) for translation from source to targeted languages. With the emerging technologies, such as bidirectional Gated Recurrent Units (GRU), attention mechanisms, and beam-search algorithms, NMT can deliver improved translation quality compared to the conventional statistics-based methods, especially for translating long sentences. However, higher translation quality means more complicated models, higher computation/memory demands, and longer translation time, which causes difficulties for practical use. In this paper, we propose a design methodology for implementing the inference of a real-life NMT (with the problem size = 172 GFLOP) on FPGA for improved run time latency and energy efficiency. We use High-Level Synthesis (HLS) to build high-performance parameterized IPs for handling the most basic operations (multiply-accumulations) and construct these IPs to accelerate the matrix-vector multiplication (MVM) kernels, which are frequently used in NMT. Also, we perform a design space exploration by considering both computation resources and memory access bandwidth when utilizing the hardware parallelism in the model and generate the best parameter configurations of the proposed IPs. Accordingly, we propose a novel hybrid parallel structure for accelerating the NMT with affordable resource overhead for the targeted FPGA. Our design is demonstrated on a Xilinx VCU118 with overall performance at 7.16 GFLOPS.

Efficient FPGA implementation of local binary convolutional neural network

Aidyn Zhakatayev
Jongeun Lee

Binarized Neural Networks (BNN) has shown a capability of performing various classification tasks while taking advantage of computational simplicity and memory saving. The problem with BNN, however, is a low accuracy on large convolutional neural networks (CNN). Local Binary Convolutional Neural Network (LBCNN) compensates accuracy loss of BNN by using standard convolutional layer together with binary convolutional layer and can achieve as high accuracy as standard AlexNet CNN. For the first time we propose FPGA hardware design architecture of LBCNN and address its unique challenges. We present performance and resource usage predictor along with design space exploration framework. Our architecture on LBCNN AlexNet shows 76.6% higher performance in terms of GOPS, 2.6X and 2.7X higher performance density in terms of GOPS/Slice, and GOPS/DSP compared to previous FPGA implementation of standard AlexNet CNN.

Hardware-software co-design of slimmed optical neural networks

Zheng Zhao
Derong Liu
Meng Li
Zhoufeng Ying
Lu Zhang
Biying Xu
Bei Yu
Ray T. Chen
David Z. Pan

Optical neural network (ONN) is a neuromorphic computing hardware based on optical components. Since its first on-chip experimental demonstration, it has attracted more and more research interests due to the advantages of ultra-high speed inference with low power consumption. In this work, we design a novel slimmed architecture for realizing optical neural network considering both its software and hardware implementations. Different from the originally proposed ONN architecture based on singular value decomposition which results in two implementation-expensive unitary matrices, we show a more area-efficient architecture which uses a sparse tree network block, a single unitary block and a diagonal block for each neural network layer. In the experiments, we demonstrate that by leveraging the training engine, we are able to find a comparable accuracy to that of the previous architecture, which brings about the flexibility of using the slimmed implementation. The area cost in terms of the Mach-Zehnder interferometers, the core optical components of ONN, is 15%-38% less for various sizes of optical neural networks.

SESSION: The resurgence of reconfigurable computing in the post moore era

Software defined architectures for data analytics

Vito Giovanni Castellana
Marco Minutoli
Antonino Tumeo
Marco Lattuada
Pietro Fezzardi
Fabrizio Ferrandi

Data analytics applications increasingly are complex workflows composed of phases with very different program behaviors (e.g., graph algorithms and machine learning, algorithms operating on sparse and dense data structures, etc). To reach the levels of efficiency required to process these workflows in real time, upcoming architectures will need to leverage even more workload specialization. If, at one end, we may find even more heterogenous processors composed by a myriad of specialized processing elements, at the other end we may see novel reconfigurable architectures, composed of sets of functional units and memories interconnected with (re)configurable on-chip networks, able to adapt dynamically to adapt the workload characteristics. Field Programmable Gate Arrays are more and more used for accelerating various workloads and, in particular, inferencing in machine learning, providing higher efficiency than other solutions. However, their fine-grained nature still leads to issues for the design software and still makes dynamic reconfiguration impractical. Future, more coarse-grained architectures could offer the features to execute diverse workloads at high efficiency while providing better reconfiguration mechanisms for dynamic adaptability. Nevertheless, we argue that the challenges for reconfigurable computing remain in the software. In this position paper, we describe a possible toolchain for reconfigurable architectures targeted at data analytics.

Runtime reconfigurable memory hierarchy in embedded scalable platforms

Davide Giri
Paolo Mantovani
Luca P. Carloni

In heterogeneous systems-on-chip, the optimal choice of the cache-coherence model for a loosely-coupled accelerator may vary at each invocation, depending on workload and system status. We propose a runtime adaptive algorithm to manage the coherence of accelerators. The algorithm’s choices are based on the combination of static and dynamic features of the active accelerators and their workloads. We evaluate the algorithm by leveraging our FPGA-based platform for rapid SoC prototyping. Experimental results, obtained through the deployment of a multi-core and multi-accelerator system that runs Linux SMP, show the benefits of our approach in terms of execution time and memory accesses.

XPPE: cross-platform performance estimation of hardware accelerators using machine learning

Hosein Mohammadi Makrani
Hossein Sayadi
Tinoosh Mohsenin
Setareh rafatirad
Avesta Sasan
Houman Homayoun

The increasing heterogeneity in the applications to be processed ceased ASICs to exist as the most efficient processing platform. Hybrid processing platforms such as CPU+FPGA are emerging as powerful processing platforms to support an efficient processing for a diverse range of applications. Hardware/Software co-design enabled designers to take advantage of these new hybrid platforms such as Zynq. However, dividing an application into two parts that one part runs on CPU and the other part is converted to a hardware accelerator implemented on FPGA, is making the platform selection difficult for the developers as there is a significant variation in the application’s performance achieved on different platforms. Developers are required to fully implement the design on each platform to have an estimation of the performance. This process is tedious when the number of available platforms is large. To address such challenge, in this work we propose XPPE, a neural network based cross-platform performance estimation. XPPE utilizes the resource utilization of an application on a specific FPGA to estimate the performance on other FPGAs. The proposed estimation is performed for a wide range of applications and evaluated against a vast set of platforms. Moreover, XPPE enables developers to explore the design space without requiring to fully implement and map the application. Our evaluation results show that the correlation between the estimated speed up using XPPE and actual speedup of applications on a Hybrid platform over an ARM processor is more than 0.98.

SESSION: Hardware acceleration

Addressing the issue of processing element under-utilization in general-purpose systolic deep learning accelerators

Bosheng Liu
Xiaoming Chen
Ying Wang
Yinhe Han
Jiajun Li
Haobo Xu
Xiaowei Li

As an energy-efficient hardware solution for deep neural network (DNN) inference, systolic accelerators are particularly popular in both embedded and datacenter computing scenarios. Despite their excellent performance and energy efficiency, however, systolic DNN accelerators are naturally facing a resource under-utilization problem – not all DNN models can well match the fixed processing elements (PEs) in a systolic array implementation, because typical DNN models vary significantly from applications to applications. Consequently, state-of-the-art hardware solutions are not expected to deliver the nominal (peak) performance and energy efficiency as claimed because of resource under-utilization. To deal with this dilemma, this study proposes a novel systolic DNN accelerator with a flexible computation mapping and dataflow scheme. By providing three types of parallelism and dynamically switching among them: channel-direction mapping, planar mapping, and hybrid, our accelerator offers the adaptability to match various DNN models to the fixed hardware resources, and thus, enables flexibly exploiting PE provision and data reuse for a wide range of DNN models to achieve optimal performance and energy efficiency.

ALook: adaptive lookup for GPGPU acceleration

Daniel Peroni
Mohsen Imani
Tajana Rosing

Associative memory in form of look-up table can decrease the energy consumption of GPGPU applications by exploiting data locality and reducing the number redundant computations. State of the art architectures utilize associative memory as static look-up tables. Static designs lack the ability to adapt to applications at runtime, limiting them to small segments of code with high redundancy. In this paper, we propose an adaptive look-up based approach, called ALook, which uses a dynamic update policy to maintain a set of recently used operations in associative memory. ALook updates with values computed by floating point units at runtime to adapt to the workload and matches the stored results to avoid recomputing similar operations. ALook utilizes a novel FPU architecture which accelerates GPU computation by parallelizing the operation lookup process. We test the efficiency of ALook on image processing, general purpose, and machine learning applications by integrating it beside FPUs in an AMD Southern Island GPU. Our evaluation shows that ALook provides 3.6X EDP (Energy Delay Product) and 32.8% performance speedup, compared to an unmodified GPU, for applications accepting less than 5% output error. The proposed ALook architecture improves the GPU performance by 2.0X as compared to state-of-the-art computational reuse methods for the same level of output error.

Collaborative accelerators for in-memory MapReduce on scale-up machines

Abraham Addisie
Valeria Bertacco

Relying on efficient data analytics platforms is increasingly becoming crucial for both small and large scale datasets. While MapReduce implementations, such as Hadoop and Spark, were originally proposed for petascale processing in scale-out clusters, it has been noted that, today, most data centers processes operate on gigabyte-order or smaller datasets, which are best processed in single high-end scale-up machines. In this context, Phoenix++ is a highly optimized MapReduce framework available for chip-multiprocessor (CMP) scale-up machines. In this paper we observe that Phoenix++ suffers from an inefficient utilization of the memory subsystem, and a serialized execution of the MapReduce stages. To overcome these inefficiencies, we propose CASM, an architecture that equips each core in a CMP design with a dedicated instance of a specialized hardware unit (the CASM accelerators). These units collaborate to manage the key-value data structure and minimize both on- and off-chip communication costs. Our experimental evaluation on a 64-core design indicates that CASM provides more than a 4x speedup over the highly optimized Phoenix++ framework, while keeping area overhead at only 6%, and reducing energy demands by over 3.5x.

SESSION: Routing

Detailed routing by sparse grid graph and minimum-area-captured path search

Gengjie Chen
Chak-Wa Pui
Haocheng Li
Jingsong Chen
Bentian Jiang
Evangeline F. Y. Young

Different from global routing, detailed routing takes care of many detailed design rules and is performed on a significantly larger routing grid graph. In advanced technology nodes, it becomes the most complicated and time-consuming stage. We propose Dr. CU, an efficient and effective detailed router, to tackle the challenges. To handle a 3D detailed routing grid graph of enormous size, a set of two-level sparse data structures is designed for runtime and memory efficiency. For handling the minimum-area constraint, an optimal correct-by-construction path search algorithm is proposed. Besides, an efficient bulk synchronous parallel scheme is adopted to further reduce the runtime usage. Compared with the first place of ISPD 2018 Contest, our router improves the routing quality by up to 65% and on average 39%, according to the contest metric. At the same time, it achieves 80–93% memory reduction, and 2.5–15X speed-up.

Latency constraint guided buffer sizing and layer assignment for clock trees with useful skew

Necati Uysal
Wen-Hao Liu
Rickard Ewetz

Closing timing using clock tree optimization (CTO) is a tremendously challenging problem that may require designer intervention. CTO is performed by specifying and realizing delay adjustments in an initially constructed clock tree. Delay adjustments are typically realized by inserting delay buffers or detour wires. In this paper, we propose a latency constraint guided buffer sizing and layer assignment framework for clock trees with useful skew, called the (BLU) framework. The BLU framework realizes delay adjustments during CTO by performing buffer sizing and layer assignment. Given an initial clock tree, the BLU framework first predicts the final timing quality and specifies a set of delay adjustments, which are translated into latency constraints. Next, buffer sizing and layer assignment is performed with respect to the latency constraints using an extension of van Ginneken’s algorithm. Moreover, the framework includes a feature of reducing the power consumption by relaxing the latency constraints and a method of improving the timing performance by tightening the latency constraints. The experimental results demonstrate that the proposed framework is capable of reducing the capacitive cost with 13% on the average. The total negative slack (TNS) and worst negative slack (WNS) are reduced with up to 58% and 20%, respectively.

2024: John Darringer, IBM
2022: Ron Rohrer, SMU, CMU	For the introduction and evolution of simulation and analysis techniques that have supported the design and test of integrated circuits and systems for more than half a century.
2021: Prof. Rob Rutenbar, PITT	For his pioneering work and extraordinary leadership in analog design automation and general EDA education.
2020: Prof. Jacob A. Abraham, UT Austin	For pioneering and fundamental contributions to manufacturing testing and fault-tolerant operation of computing systems.
2019: Prof. Giovanni De Micheli, EPFL	For pioneering and fundamental contributions to synthesis and optimization of integrated circuits and networks-on-chip.
2018: Prof. Alberto Sangiovanni Vincentelli, UC Berkeley	For pioneering and fundamental contributions to design automation research and industry, in system-level design, embedded systems, logic synthesis, physical design and circuit simulation.
2017: Prof. Mary Jane Irwin, Pennsylvania State University	For contributions to VLSI architectures, electronic design automation and community membership.
2016: Prof. Chung Laung (Dave) Liu, National Tsing Hua University, Taiwan (emeritus)	For the fundamental and seminal contributions to physical design and embedded systems.
2014: Prof. John P. Hayes, University of Michigan
2013: Prof. Donald E. Thomas, Carnegie Mellon University	For his pioneering work in making the Verilog Hardware Description Language more accessible for the design automation community and allowing for faster and easier pathways to simulation, high-level synthesis, and co-design of hardware-software systems.
2012: Dr. Louise Trevillyan, IBM	Recognizing her almost-40-year career in EDA and her groundbreaking research contributions in logic and physical synthesis, design verification, high-level synthesis, processor performance analysis, and compiler technology.
2011: Prof. Robert K. Brayton, UC Berkeley	For outstanding contributions to the field of Computer Aided Design of integrated systems over the last several decades.
2010: Prof. Scott Kirkpatrick, The Hebrew University of Jerusalem	On Solving Hard Problems by Analogy Automated electronic design is not the only field in which surprising analogies from other fields of science have been used to deal with the challenges of very large problem sizes, requiring optimization across multiple scales, with constraints which eliminate any elegant solutions. Similar opportunities arise, for example, in logistics, in scheduling, in portfolio optimization and other classic problems. The common ingredient in all of these is that the problems are fundamentally frustrated, in that conflicting objectives must be traded off at all scales. This, plus the irregular structure in such real world problems eliminates any easy routes to the best solutions. Of course, in engineering, the real objective is not a global optimum, but a solution that is “good enough” and can be obtained “soon enough” to be useful. The model in materials science that gave rise by analogy to simulated annealing is the spin glass, which recently surfaced again in computer science as a vehicle whose inherent complexity might answer the long-vexing question of whether P can be proved not equal to NP.
2009: Prof. Martin Davis, NYU	For his fundamental contributions to algorithms for solving the Boolean Satisfiability problem, which heavily influenced modern tools for hardware and software verifciation, as well as logic circuit synthesis.
2008: Prof. Edward J. McCluskey, Stanford	For his outstanding contributions to the areas of CAD, test and reliable computing during the past half of century.
2007: Dr. Gene M. Amdahl, Amdahl Corporation	Award citation: For his outstanding contributions to the computing industry on the occasion of the 40th anniversary of Amdahl’s Law. Video of Dr. Amdahl’s dinner talk and a panel debate are available on the ACM digital library.

2024	Bonan Yan	Peking University
2023	Tsung-Wei Huang	University of Utah
2022	Yingyan (Celine) Lin	Rice University
2021	Zheng Zhang	UC Santa Barbara
2020	Pierre-Emmanuel Gaillardon	University of Utah
2019	Jeyavijayan (JV) Rajendran	Texas A&M University
2018	Shimeng Yu	Arizona State University
2017	Yier Jin	University of Florida
2016	Swaroop Ghosh	University of South Florida
2015	Muhammad Shafique	Karlsruhe Institute of Technology
2014	Yiran Chen	University of Pittsburgh
2013	Shobha Vasudevan	UIUC
2012	David Atienza	EPFL, Switzerland
2011	Farinaz Koushanfar	Rice University
2010	Puneet Gupta	UCLA
	Deming Chen	UIUC
2009	Yu Cao	Arizona State University
2008	Subhasish Mitra	Stanford University
2007	Michael Orshansky	University of Texas, Austin
2006	David Pan	University of Texas, Austin
2004	Kaustav Banerjee	University of California, Santa Barbara
	Igor Markov	University of Michigan, Ann Arbor
2003	Dennis Sylvester	University of Michigan, Ann Arbor
2002	Charlie Chung-Ping Chen	Univ. of Wisconsin, Madison
2000	Vijay Narayanan	Penn State University

2024	Lukas Burgholzer, for the dissertation “Design Automation Tools and Software for Quantum Computing”, Johannes Kepler University Linz. Advisors: Robert Wille and Jens Eisert.
2023	Zhiyao Xie, for the dissertation “Intelligent Circuit Design and Implementation with Machine Learning”, Duke University, Advisors: Yiran Chen and Hai Li
2022	Ganapati Bhat, for the dissertation “Design, Optimization, and Applications of Wearable IoT Devices”, Arizona State University, Advisor: Umit Y. Ogras
2021	Ahmedullah Aziz, for the dissertation “Device-Circuit Co-Design Employing Phase Transition Materials for Low power Electronics”, Purdue University, Advisor: Sumeet Gupta.
2020	Gengjie Chen, for the dissertation “VLSI Routing: Seeing Nano Tree in Giga Forest,” The Chinese University of Hong Kong. Advisor: Evangeline Young.
2019	Tsung-Wei Huang, for the dissertation “Distributed Timing Analysis“, University of Illinois, Urbana-Champaign. Advisor: Martin D. F. Wong.
2018	Xiaoqing Xu, for the dissertation “Standard Cell Optimization and Physical Design in Advanced Technology Nodes,” University of Texas at Austin. Advisor: David Z. Pan.
	Pramod Subramanyan, for the dissertation “Deriving Abstractions to Address Hardware Platform Security Challenges,” Princeton University. Advisor: Sharad Malik.
2017	Jeyavijayan Rajendran, for the dissertation “Trustworthy Integrated Circuit Design,” New York University. Advisor: Ramesh Karri.
2016	Zheng Zhang, for the dissertation “Uncertainty Quantification for Integrated Circuits and Microelectromechanical Systems,” Massachusetts Institute of Technology. Advisor: Luca Daniel.
2015	Wenchao Li, for the dissertation “Specification Mining: New Formalisms, Algorithms and Applications,” University of California at Berkeley. Advisor: Sanjit Seshia.
2014	Wangyang Zhang, for the dissertation “IC Spatial Variation Modeling: Algorithms and Applicaitons,” Carnegie Mellon University. Advisors: Xin Li and Rob Rutenbar
2013	Duo Ding, for the dissertation “CAD for Nanolithography and Nanophotonics,” University of Texas at Austin. Advisor: David Z. Pan
	Guojie Luo, for the dissertation “Placement and Design Planning for 3D integrated Circuits,” UCLA. Advisor: Jason Cong
2012	Tan Yan, for the dissertation “Algorithmic Studies on PCB Routing,” defended with the University of Illinois at Urbana-Champaign.
2011	Nishant Patil, for the dissertation “Design and Fabrication of Imperfection-Immune Carbon Nanotube Digital VLSI Circuits,” Stanford University.
2010	Himanshu Jain, for the dissertation “Verification using Satisfiability Checking, Predicate Abstraction, and Craig Interpolation,” Carnegie Mellon University.
2009	Kai-Hui Chang, for the dissertation “Functional Design Error Diagnosis, Correction and Layout Repair of Digital Circuits”, University of Michigan at Ann Arbor.
2008	(No award is given this year)
2007	(No award is given this year)
2006	Haifeng Qian of University of Minnesota, Minneapolis, Department of Electrical and Computer Engineering, for the thesis entitled “Stochastic and Hybrid Linear Equation Solvers and their Applications in VLSI Design Automation”.
2005	Shuvendu Lahiri of Carnegie Mellon University, Department of Electrical and Computer Engineering, for a thesis entitled “Unbounded System Verification using Decision Procedure and Predicate Abstraction”
2004	Chao Wang of University of Colorado at Boulder, Department of Electrical Engineering, for a thesis entitled “Abstraction Refinement for Large Scale Model Checking”
2003	Luca Daniel of University of California, Berkeley Department of Electrical Engineering and Computer Science for a thesis entitled “Simulation and modeling techniques for signal integrity and electromagnetic interference on high frequency electronic systems”
	Lintao Zhang of Princeton University Department of Electrical Engineering for a thesis entitled “Searching for truth: techniques for satisfiability of Boolean formulas.”
2002	(No award is given this year)
2001	Darko Kirovski from University of California, Los Angeles Department of Computer Science for a thesis entitled “Constraint Manipulation Techniques for Synthesis and Verification of Embedded Systems.” The runner-up who received an honorable mention in that years ceremony was Michael Beattie of Carnegie Mellon University Department of Electrical and Computer Engineering for a thesis entitled “Efficient Electromagnetic Modeling for Giga-scale IC Interconnect.”
2000	Robert Brent Jones of Stanford University Department of Electrical Engineering for a thesis entitled “Applications of Symbolic Simulation To the Formal Verification of Microprocessors.”

2023	Tulika Mitra, National University of Singapore “For her leadership in major SIGDA conferences such asgeneral chair for ICCAD and ESWEEK“.
	Patrick Groeneveld, Stanford University “For his multi-year significant contribution to the EDA community, such as DAC finance chair among many other“.
2022	Vijay Narayanan, The Pennsylvania State University “For Extraordinary Dedication and Leadership to SIGDA“.
	Harry Foster, Siemens EDA “For Extraordinary Dedication and Persistence in Leading DAC during Pandemic“.
2021	Deming Chen, University of Illinois Urbana-Champaign “For distinguished contributions to the design automation and reconfigurable computing communities“.
	Evangeline F. Y. Young, Chinese University of Hong Kong “For outstanding leadership in promoting diversity in the ACM/SIGDA community“.
2020	Sri Parameswaran, University of New South Wales “For leadership and distinguished service to the EDA community“.
2019	Naehyuck Chang, Korea Advanced Institute of Science and Technology “For many years of impactful service to ACM/SIGDA in various leadership positions“.
	Sudeep Pasricha, Colorado State University “For a decade of outstanding service to ACM/SIGDA in various volunteer positions“
2018	Chuck Alpert, Cadence Design Systems “For significant contributions to DAC”.
	Jörg Henkel, Karlsruhe Institute of Technology “For leading SIGDA efforts in Europe and DATE”.
	Michael ‘Mac’ McNamara, Adapt-IP “For sustained contributions to the design automation community and DAC”.
	Michelle Clancy, Cayenne Communication “For sustained contributions to the community, especially DAC”.
2016	Steven Levitan “In recognition of a lifetime of devoted service to ACM SIGDA and the Electronic Design Automation community.”
2015	Tatsuo Ohtsuki, Waseda University Hiroto Yasuura, Kyushu University Hidetoshi Onodera, Kyoto University “For their distinguished contributions to the Asia and South Pacific Design automation Conference (ASPDAC) as well as their many years of dedicated service on the conference’s steering committee”
	Massoud Pedram, University of Southern California “For his many years of service as Editor in Chief of the ACM Transactions on Design Automation of Electronic Systems (TODAES)”
2014	Peter Marwedel, Technical University of Dortmund “For his muiltiple years of service starting and maintaining the DATE PhD Forum”
2012	Joe Zamreno, Iowa State University
	Baris Taskin, Drexel University
2011	Peter Feldman, IBM
	Radu Marculescu, CMU
	Qinru Qiu, Syracuse University
	Martin Wong, UIUC
	Qing Wu, Air Force Rome Labs
2010	Alex K. Jones “For dedicated service to ACM/SIGDA and the Design Automation Conference as director of the University Booth”
	Matt Guthaus “For dedicated service as director of SIGDA CADathlon at ICCAD program and Editor-in-Chief of the SIGDA E-Newsletter”
	Diana Marculescu “For dedicated service as SIGDA Chair, and contributions to SIGDA, DAC and the EDA Profession”
2009	Nikil Dutt “For contributions to ACM’s Special Interest Group on Design Automation during the past fifteen years as a SIGDA officer, coordinator of the University Booth in its early years, and most recently, as Editor-in-Chief of the ACM Transactions on Design Automation of Electronic Systems”
2008	SungKyu Lim “For his contributions to the DAC University Booth.”
2007	Richard Goering “For his contributions as EE Times Editorial Director for Design Automation for more than two decades”
	Gary Smith “For his contributions as Chief EDA Analyst at Gartner Dataquest for almost two decades.”
	Daniel Gajski Mary Jane Irwin Donald E. Thomas Chuck Shaw “For outstanding contributions to the creation of the SIGDA/DAC University Booth, on the occasion of its 20th edition.”
	Soha Hassoun Steven P. Levitan “For outstanding contributions to the creation of the SIGDA Ph.D. Forum at DAC on the occasion of its 10th edition.”
	Richard Auletta “For over a decade of service to SIGDA as University Booth Coordinator, Secretary/Treasurer, and Executive Committee Member-at-Large.”
2006	Robert Walker “For dedicated service as SIGDA Chair (2001 – 2005), and over a decade of service to SIGDA, DAC and the EDA profession.”
2005	Mary Jane Irwin “For dedicated service as Editor in Chief of ACM Journal, TODAES (1998 – 2004), and many years of service to SIGDA, DAC, and the EDA profession.”
2004	James P. Cohoon “For exemplary service to SIGDA, to ACM, to DAC, and to the EDA profession as a whole”
2003	James Plusquellic “For exemplary service to ACM/SIGDA and the Design Automation Conference as director of the University Booth program”
2002	Steven P. Levitan “For over a decade of service to ACM/SIGDA and the EDA industry — as DAC University Booth Coordinator, Student Design Contest organizer, founder and promoter of SIGDA’s web server, and most recently, Chair of ACM/SIGDA from 1997 to 2001.”
	Cheng-Kok Koh “For exemplary service to ACM/SIGDA and the EDA industry — as Co-director of SIGDA’s CDROM Project, as SIGDA’s Travel Grants Coordinator, and as Editor of the SIGDA Newsletter.”
2001	Robert Grafton “For contributions to the EDA profession through his many years as the Program Director of NSF’s Design, Tools, and Test Program of the computer, Information Sciences & Engineering Directorate. In this position, he provided supervision, mentorship, and guidance to several generation of EDA tool designers and builders funded by grants from the National Science Foundation.”
2000	Massoud Pedram “For his contributions in developing the SIGDA Multimedia Series and organizing the Young Student Support Program”
	Soha Hassoun “For developing the SIGDA Ph.D. Forum”
1999	C.L. (Dave) Liu “For his work in founding our flagship journal ACM/TODAES”

2023	Robert Wille, Technical University of Munich “For his leading positions in major ACM SIGDA conferences, including Executive Committee of DATE, ICCAD and Chair of the PhD Forum at DAC and DATE“.
	Lei Jiang, Indiana University Bloomington “For his leadership and contribution to SIGDA student research forums (SRFs) at ASP-DAC“.
	Hui-Ru Jiang, National Taiwan University “For her continuous contribution to SIGDA PhD Forum at DAC and many other events“.
	Jeyavijayan (JV) Rajendran, Texas A&M University “For his leadership in co-founding and organizing Hack@DAC, the largest hardware security competition in the world“.
2022	Jeff Goeders, Brigham Young University “For Chairing System Design Contest @ DAC for the Past 3 Years“.
	Cheng Zhuo, Zhejiang University “For the Leading Efforts to the Success of SRC@ICCAD and SDC@DAC as Chairs for the past5 years, and the Sustained Contributions to the EDA Community in China“.
	Tsung-Wei Huang, University of Utah “For Chairing CADathlon and CAD Contests at ICCAD for Three Years. These Activities Have Engaged Hundreds of Students into CAD Research“.
	Yiyu Shi, University of Notre Dame “For Outstanding Services in Leading SIGDA Educational Efforts“.
2021	Bei Yu, Chinese University of Hong Kong “For service as SIGDA Web Chair from 2016 to 2021, SIGDA Student Research Competition Chair in 2018 and 2019, and other SIGDA activities“.

2020	Aida Todri-Sanial, LIRMM/University of Montpellier “For service as Co-Editor-in-Chief of SIGDA e-Newsletter from 2016 to 2019 and other SIGDA activities“.
	Yu Wang, Tsinghua University “For service as Co-Editor-in-Chief of SIGDA e-Newsletter from 2017 to 2019 and other SIGDA activities”.
2019	Yinhe Han, Chinese Academy of Sciences “For outstanding effort in promoting EDA and SIGDA events in China“
	Jingtong Hu, University of Pittsburgh “For contribution to multiple SIGDA education and outreach activities“
	Xiaowei Xu, University of Notre Dame “For contribution to the 2018 System Design Contest at ACM/IEEE Design Automation Conference“
2015	Laleh Behjat, University of Calgary “For service as chair of the SIGDA PhD forum at DAC”
	Soonhoi Ha, Seoul National University Jeonghee Shin, Apple “For their service as co-chairs of the University Booth at DAC”
1998	Jason Cong Bryan Preas Kathy Preas Chong-Chian Koh Cheng-Kok Koh “For contributions in producing SIGDA CD ROM’s – Archiving the knowledge of the Design Automation Community”
1997	Robert Walker “For his hard work as Secretary/Treasurer and University Booth Coordinator”
1996	Debbie Hall “For serving as ACM Program Director for SIGDA for the past 6 years”

2013	Jarrod Roy Sudeep Pasricha Sudarshan Banerjee Srinivas Katkoori “for running CADathlon”
2012	Cheng Zhuo Steve Burns Amin Chirayu Andrey Ayupov Gustavo Wilke Mustafa Ozdal
2011	Raju Balasuramanian Zhuo Li Frank Liu Natarajan Viswanathan
2010	Cliff Sze “For contributions to the ISPD Physical Design contest, and promoting research in physical design.”
2008	Hai Zhou “For contributions to the SIGDA E-Newsletter (2005-2008)”
	Jing Yang “For contributions to the SIGDA Ph.D. Forum at DAC (2004-2008)”
2007	Geert Janssen “For contributions to the SIGDA CADathlon”
	Tony Givargis “For contributions to the SIGDA Ph.D. Forum at DAC (2005-2007)”
	Gi-Joon Nam “For contributions to the Physical Design Contest at ISPD (2005-2007)”
2006	Kartik Mohanram “For contributions to the SIGDA CADathlon at ICCAD (2004-2005)”
	Ray Hoare “For contributions to the SIGDA University Booth at DAC (2004-2006)”
	Radu Marculescu “For contributions to the SIGDA Ph.D. Forum at DAC (2004-2006)”
	Frank Liu “For contributions to the SIGDA Ph.D. Forum at DAC (2005-2006)”
2005	Florian Krohm “For contributions to the SIGDA CADathlon at ICCAD”
	R. Iris Bahar Igor Markov “For contributions to the SIGDA E-Newsletter”
2004	Robert Jones “For contributions to the SIGDA Ph.D. Forum at DAC”
2003	Diana Marculescu “For contributions to the SIGDA Ph.D. Forum at DAC”
2002	Geert Janssen “For contributions to the SIGDA CADathlon at ICCAD”
	Pai Chou Abe Elfadel Olivier Coudert Soha Hassoun “For contributions to the SIGDA Ph.D. Forum at DAC”

A. Richard Newton Technical Impact Award 2025: Call for Nominations

SIGDA Service Awards 2023

News

Events

UDDAC 2025: 35th ACM SIGDA / IEEE CEDA University Demonstration at Design Automation Conference

Sunday, Oct. 29, 2023 08:00 AM – 05:00 PM, In-Person

Welcome to CADathlon@ICCAD

System Design Contest at DAC 2023

Awards

2025 ACM SIGDA Pioneering Achievement Award

Call for Nominations

SIGDA Outstanding New Faculty Award

Call for Applications

Past Awardees

2025 ACM Outstanding Ph.D. Dissertation Award in Electronic Design Automation

Past Awardees

ACM/IEEE A. Richard Newton Technical Impact Award in Electronic Design Automation 2025

Call for Nominations

Description

Prize

Funding

Presentation

Historical Background

Basis for Judging

Eligibility

Selection Committee

Nomination Deadline

Nomination Package

Past Awardees

Service Awards

Distinguished Service Awards

Meritorious Service Awards

Technical Leadership Awards

Programs

Review Process

Selection Criteria

Post Applications – Report and Reimbursement

Application Form

Sponsor

SRC@ICCAD

ACM Student Research Competition at ICCAD 2021 (SRC@ICCAD’21)

This Year’s Results (2021)

Undergraduate category (8 participants in total):

Graduate category (22 participants in total):

Submission

Last Year’s Results (2020): SIGDA SRC Gold Medalists won ACM SRC Grand Finals

Ph.D. Forum@DAC 2022

Submission Process

Important Dates

Eligibility

Travel Support and Best Presentation Award

Contact

Sponsored By

Organizers Guide

ACM/SIGDA Guide to Running or Starting a Conference, Symposium, or Workshop

Revised on May 1, 2020

CADathlon 2018@ICCAD

SIGDA’s CADathlon 2018 at ICCAD

Sunday, Nov. 4, 2018 8 am – 5 pm, Hilton San Diego Resort & Spa, San Diego, CA

Welcome to the CADathlon @ ICCAD

Global Education Partner:

LIVE

DASS@DAC

DASS at DAC 2018

SDC@DAC

System Design Contest at DAC 2023

Links

Organizing Committee

Sponsors

ASPDAC 2019 TOC

SESSION: University design contest

SESSION: Real-time embedded software

SESSION: Hardware and system security

SESSION: Thermal- and power-aware design and optimization

SESSION: Reverse engineering: growing more mature – and facing powerful countermeasures

SESSION: All about PIM

SESSION: Design for reliability

SESSION: New advances in emerging computing paradigms

SESSION: Design, testing, and fault tolerance of neuromorphic systems

SESSION: Memory-centric design and synthesis