

DAC 2002 ABSTRACTS



Session 1: PANEL: Wall Street Evaluates EDA

Chair: Aart de Geus
Organizers: Sharon Turnoy, Deirdre Hanford
Panel Members: Moshe Gavrielov, Richard Goering, Lucio Lanza, Vishal Saluja, Jay Vleeschhouwer [p. 1]

The EDA sector is capturing unprecedented attention on Wall Street. With seven IPOs in 2001 alone and strong performance by the EDA "blue chips," the industry has gained new prominence with the capital markets. In this panel, Aart de Geus will moderate a discussion between representatives of the various constituencies who play a role in shaping Wall Street's opinion of EDA: financial analysts, portfolio managers, venture capitalists, CEOs, and the press.
Questions discussed will include: How do investors and analysts currently view EDA? What contributes to that perception? What factors drive EDA's current favor with Wall Street, and why is the sector "hot" compared to two to three years ago? How do we sustain that favor? Given the highly complex nature of our industry, how do investors decipher the strength of a particular EDA firm? Is EDA tied to the semiconductor industry's performance? What role does the press play in shaping the view?


Session 2: Web and IP Based Design

Chair: Gang Qu
Organizers: Ahmed A. Jerraya, Krzysztof Kuchcinski
2.1 IP Delivery for FPGAs Using Applets and JHDL [p. 2]
Michael J. Wirthlin, Brian McMurtrey

This paper introduces an FPGA IP evaluation and delivery system that operates within Java applets. The use of such applets allows designers to create, evaluate, test, and obtain FPGA circuits directly within a web browser. Based on the JHDL design tool, these applets allow structural viewing, circuit simulation, and netlist generation of application-specific circuits. Applets can be customized to provide varying levels of IP visibility and functionality as needed by both customer and vendor.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Simulation, Hardware description languages
General Terms
Design
Keywords
Intellectual Property, JHDL, Applet, FPGA

2.2 Watermarking Integer Linear Programming Solutions [p. 8]
Seapahn Megerian, Milenko Drinic, Miodrag Potkonjak

Linear programming (LP) in its many forms has proven to be an indispensable tool for expressing and solving optimization problems in numerous domains. We propose the first set of generic watermarking techniques for integer-LP (ILP). The proof of authorship by watermarking is achieved by introducing additional constraints to limit the solution space and can be used as an effective means of intellectual property protection (IPP) and authentication. We classify and analyze the types of constraints in the ILP watermarking domain and show how ILP formulations provide more degrees of freedom for embedding signatures than other existing approaches. To demonstrate the effectiveness of the proposed ILP watermarking techniques, the generic discussion is further concretized using two examples, namely Satisfiability and Scheduling.
Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]: Security and Protection -- Digital Watermarking
General Terms
Algorithms, Economics, Theory, Legal Aspects.
Keywords
Digital Watermarking, Intellectual Property Protection
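
The idea in the abstract above, proving authorship by adding signature-derived constraints that shrink the solution space, can be illustrated with a small sketch. This is not the authors' scheme: the brute-force 0-1 knapsack solver, the SHA-256 derivation of constraints, the constraint form x_i + x_j <= 1, and all function names are assumptions chosen only to keep the example self-contained.

```python
import hashlib
from itertools import product


def signature_constraints(signature, n_vars, n_constraints):
    """Derive variable pairs (i, j) from a hash of the signature.

    Each pair stands for the extra constraint x_i + x_j <= 1.
    Assumption: this encoding is illustrative, not taken from the paper.
    """
    digest = hashlib.sha256(signature.encode("utf-8")).digest()
    pairs = []
    for k in range(min(n_constraints, len(digest) // 2)):
        i = digest[2 * k] % n_vars
        j = digest[2 * k + 1] % n_vars
        if i != j:  # skip degenerate pairs
            pairs.append((i, j))
    return pairs


def solve_knapsack(values, weights, capacity, pairs=()):
    """Brute-force a tiny 0-1 ILP: maximize value under a weight limit
    and any watermark constraints x_i + x_j <= 1."""
    best_value, best_x = -1, None
    for x in product((0, 1), repeat=len(values)):
        if sum(w * xi for w, xi in zip(weights, x)) > capacity:
            continue
        if any(x[i] + x[j] > 1 for i, j in pairs):
            continue
        value = sum(v * xi for v, xi in zip(values, x))
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


def carries_watermark(x, pairs):
    """A solution carries the watermark if it satisfies every added constraint."""
    return all(x[i] + x[j] <= 1 for i, j in pairs)
```

The all-zero assignment always satisfies constraints of this form, so the watermarked problem stays feasible; the price of the watermark is that its optimum can be no better than the unconstrained one.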

2.3 Model Design Using Hierarchical Web-Based Libraries [p. 14]
Fabrice Bernardi, Jean-François Santucci

Design tools can be profitably associated with libraries of reusable modeling components that will make the description and also the validation of the models much easier. Furthermore, applications of today and tomorrow will be increasingly based on three fundamental technologies: Object Orientation, Client/Server and Internet. We propose in this article an object-oriented architecture for the definition of Web-based hierarchical model libraries. The originality of our approach lies in the fact that it is based on: (i) a notion of genericity of use, (ii) notions like inheritance and abstraction links between the stored models and (iii) Web-based procedures for storing and consulting libraries.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-Aided Design; D.2.11 [Software]: Software Engineering - Software Architectures; D.2.13 [Software]: Software Engineering - Reusable Software
General Terms
Design, Management
Keywords
models reuse, models libraries, Web-based access, abstraction hierarchy

2.4 Behavioral Synthesis via Engineering Change [p. 18]
Milenko Drinic, Darko Kirovski

Engineering change (EC) is a technique that enables a designer to rapidly perform minor specification alterations while minimally resynthesizing only small portions of the specification throughout several levels of design abstraction. In this paper, we introduce the first EC-based synthesis technique for coordinated design optimization in multiple steps. The technique has four phases: optimization region identification, feedback formulation, resynthesis in the first step, and finally resynthesis in the second design step. To demonstrate the technique, we focus on behavioral synthesis and transformation, scheduling, and register assignment steps. We developed a generic EC-based approach for design optimization during multiple consecutive synthesis steps. Next, we show how one can use EC to enhance coordinated application of transformations and scheduling, and scheduling and register assignment.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids - Optimization
General Terms
Design
Keywords
Engineering change, transformations, scheduling, register assignment


Session 3: Design Innovations for Embedded Processors

Chair: Gang Qu
Organizers: Grant E. Martin, Majid Sarrafzadeh
3.1 A Universal Technique for Fast and Flexible Instruction-Set Architecture Simulation [p. 22]
Achim Nohl, Gunnar Braun, Oliver Schliebusch, Rainer Leupers, Heinrich Meyr, Andreas Hoffmann

In the last decade, instruction-set simulators have become an essential development tool for the design of new programmable architectures. Consequently, the simulator performance is a key factor for the overall design efficiency. Based on the extremely poor performance of commonly used interpretive simulators, research work on fast compiled instruction-set simulation was started ten years ago. However, due to the restrictiveness of the compiled technique, it has not been able to push through in commercial products. This paper presents a new retargetable simulation technique which combines the performance of traditional compiled simulators with the flexibility of interpretive simulation. This technique is not limited to any class of architectures or applications and can be utilized from architecture exploration up to end-user software development. The work-flow and the applicability of the so-called just-in-time cache compiled simulation (JIT-CCS) technique will be demonstrated by means of state of the art real world architectures.
Categories and Subject Descriptors
I.6.3 [Simulation and Modeling]: Simulation Support Systems; I.6.3 [Simulation and Modeling]: Model Validation and Analysis; D.3.2 [Programming Languages]: Design Languages - LISA; C.0 [General]: Modeling of Computer Architecture
General Terms
Design, Languages, Performance
Keywords
Retargetable simulation, compiled simulation, instruction set architectures
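
The abstract above does not spell out the mechanism, so the following is only a sketch of the general idea suggested by the name "just-in-time cache compiled simulation": decode each instruction the first time it executes, cache the decoded form by address, and reuse it on later visits. The three-instruction toy ISA, the tuple encoding, and the names `decode` and `simulate` are hypothetical and are not taken from the paper or from LISA.

```python
def decode(word):
    """Turn an instruction word into a callable (the costly interpretive step)."""
    op, a, b = word
    if op == "addi":    # regs[a] += b
        def run(state):
            state["regs"][a] += b
            state["pc"] += 1
    elif op == "bnez":  # jump to address b if regs[a] != 0
        def run(state):
            state["pc"] = b if state["regs"][a] != 0 else state["pc"] + 1
    elif op == "halt":
        def run(state):
            state["halted"] = True
    else:
        raise ValueError(f"unknown opcode {op!r}")
    return run


def simulate(program, regs):
    """Run the program; return the final registers and the number of decodes."""
    state = {"pc": 0, "regs": list(regs), "halted": False}
    cache = {}   # address -> (instruction word, decoded callable)
    decodes = 0
    while not state["halted"]:
        pc = state["pc"]
        word = program[pc]
        entry = cache.get(pc)
        # Decode on a miss, or when program memory changed at this address.
        if entry is None or entry[0] != word:
            entry = (word, decode(word))
            cache[pc] = entry
            decodes += 1
        entry[1](state)
    return state["regs"], decodes
```

Checking the cached word against program memory on every fetch keeps the interpretive simulator's tolerance of modified code, while loop bodies pay the decode cost only once.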

3.2 A Fast On-Chip Profiler Memory [p. 28]
Roman Lysecky, Susan Cotterell, Frank Vahid

Profiling an application executing on a microprocessor is part of the solution to numerous software and hardware optimization and design automation problems. Most current profiling techniques suffer from runtime overhead, inaccuracy, or slowness, and the traditional non-intrusive method of using a logic analyzer doesn't work for today's system-on-a-chip having embedded cores. We introduce a novel on-chip memory architecture that overcomes these limitations. The architecture, which we call ProMem, is based on a pipelined binary tree structure. It achieves single-cycle throughput, so it can keep up with today's fastest pipelined processors. It can also be laid out efficiently and scales very well, becoming more efficient the larger it gets. The memory can be used in a wide variety of common profiling situations, such as instruction profiling, value profiling, and network traffic profiling, which in turn can be used to guide numerous design automation tasks.
Keywords
Profiling, system-on-a-chip, platform tuning, adaptive architectures, low power, embedded CAD, binary tree, memory design, embedded systems.

3.3 Design of an One-cycle Decompression Hardware for Performance Increase in Embedded Systems [p. 34]
Haris Lekatsas, Jörg Henkel, Venkata Jakkula

Code compression is known as an effective technique to reduce instruction memory size on an embedded system. However, code compression can also be very effective in increasing processor-to-memory bandwidth and hence provide increased system performance. In this paper we describe our design and design methodology of the first running prototype of a one-cycle code decompression unit that decompresses compressed instructions on-the-fly. We describe in detail the architecture that enables decompression of multiple instructions in one cycle and we present the design methodologies and tools used. The stand-alone decompression unit does not require any modifications to the processor core. We observed up to a 63% performance increase, with 25% on average, over a wide variety of applications running on the hardware prototype under various system configurations.
Categories and Subject Descriptors
B.3 [Hardware]: Memory Structures; C.3 [Computer Systems Organization]: Special-purpose and Application-based Systems - Real-time and embedded systems
General Terms
Algorithms, Design, Performance


Session 4: Passive Model Order Reduction

Chair: Jacob White
Organizers: Jaijeet Roychowdhury, Mustafa Celik
4.1 A Factorization-Based Framework for Passivity-Preserving Model Reduction of RLC Systems [p. 40]
Q. Su, V. Balakrishnan, C.-K. Koh

We present a framework for passivity-preserving model reduction for RLC systems that includes, as a special case, the well-known PRIMA model reduction algorithm. This framework provides a new interpretation for PRIMA, and offers a qualitative explanation as to why PRIMA performs remarkably well in practice. In addition, the framework enables the derivation of new error bounds for PRIMA-like methods. We also show how the framework offers a systematic approach to computing reduced-order models that better approximate the original system than PRIMA, while still preserving passivity.
Categories and Subject Descriptors
G.1.3 [NUMERICAL ANALYSIS]: Numerical Linear Algebra - linear systems; F.2.1 [ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY]: Numerical Algorithms and Problems - computations on matrices
General Terms
Algorithms
Keywords
Model Reduction, Large Scale Systems, RLC interconnect, Passivity Preserving, Factorization

4.2 Model Order Reduction for Strictly Passive and Causal Distributed Systems [p. 46]
Luca Daniel, Joel Phillips

This paper presents a class of algorithms suitable for model reduction of distributed systems. Distributed systems are not suitable for treatment by standard model-reduction algorithms such as PRIMA, PVL, and the Arnoldi schemes because they generate matrices that are dependent on frequency (or other parameters) and cannot be put in a lumped or state-space form. Our algorithms build on well-known projection-based reduction techniques, and so require only matrix-vector product operations and are thus suitable for operation in conjunction with electromagnetic analysis codes that use iterative solution methods and fast-multipole acceleration techniques. Under the condition that the starting systems satisfy system-theoretic properties required of physical systems, the reduced systems can be guaranteed to be passive. For distributed systems, we argue that causality of the underlying representation is as important a consideration as passivity has become.
Categories and Subject Descriptors: B.7.2 Simulation, B.8.2 Performance Analysis and Design Aids, G.1.1 Interpolation, G.1.2 Approximations, I.6 Simulation and Modeling.
General Terms: Algorithms, Performance, Design
Keywords: Passive reduced order modeling, Distributed systems.

4.3 Guaranteed Passive Balancing Transformations for Model Order Reduction [p. 52]
Joel Phillips, Luca Daniel, L. Miguel Silveira

The major concerns in state-of-the-art model reduction algorithms are: achieving accurate models of sufficiently small size, numerically stable and efficient generation of the models, and preservation of system properties such as passivity. Algorithms such as PRIMA generate guaranteed-passive models, for systems with special internal structure, using numerically stable and efficient Krylov-subspace iterations. Truncated Balanced Realization (TBR) algorithms, as used to date in the design automation community, can achieve smaller models with better error control, but do not necessarily preserve passivity. In this paper we show how to construct TBR-like methods that guarantee passive reduced models and in addition are applicable to state-space systems with arbitrary internal structure.
Categories and Subject Descriptors: B.7.2 Simulation, B.8.2 Performance Analysis and Design Aids, I.6 Simulation and Modeling.
General Terms: Algorithms, Performance, Design
Keywords: Passive reduced order modeling, Truncated balanced realization.


Session 5: New Perspectives in Physical Design

Chair: Steven Teig
Organizers: Ralph Otten, Timothy Kam
5.1 Uncertainty-Aware Circuit Optimization [p. 58]
Xiaoliang Bai, Chandu Visweswariah, Philip N. Strenski, David J. Hathaway

Almost by definition, well-tuned digital circuits have a large number of equally critical paths, which form a so-called "wall" in the slack histogram. However, by the time the design has been through manufacturing, many uncertainties cause these carefully aligned delays to spread out. Inaccuracies in parasitic predictions, clock slew, model-to-hardware correlation, static timing assumptions and manufacturing variations all cause the performance to vary from prediction. Simple statistical principles tell us that the variation of the limiting slack is larger when the height of the wall is greater. Although the wall may be the optimum solution if the static timing predictions were perfect, in the presence of uncertainty in timing and manufacturing, it may no longer be the best choice. The application of formal mathematical optimization in transistor sizing increases the height of the wall, thus exacerbating the problem. There is also a practical matter that schematic restructuring down-stream in the design methodology is easier to conceive when there are fewer equally critical paths. This paper describes a method that gives formal mathematical optimizers the incentive to avoid the wall of equally critical paths, while giving up as little as possible in nominal performance. Surprisingly, such a formulation reduces the degeneracy of the optimization problem and can render the optimizer more effective. This "uncertainty-aware" mode has been implemented and applied to several high-performance microprocessor macros. Numerical results are included.
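
The statistical argument in the abstract above can be illustrated with a small Monte Carlo sketch under one simple reading of it. It assumes that every path in the wall has a nominal slack of zero and an independent Gaussian slack error with the same standard deviation; the paper's actual model is not given in the abstract, and the function name and parameters below are hypothetical. Under these assumptions, the expected limiting (worst) slack falls further below its nominal value as the number of equally critical paths grows.

```python
import random
import statistics


def mean_limiting_slack(n_paths, sigma=1.0, trials=2000, seed=0):
    """Estimate the expected worst slack over n_paths equally critical paths.

    Assumption: each path's slack is an independent N(0, sigma) deviation
    from a nominal slack of zero.
    """
    rng = random.Random(seed)
    worst = []
    for _ in range(trials):
        # The circuit is limited by its least-slack path.
        worst.append(min(rng.gauss(0.0, sigma) for _ in range(n_paths)))
    return statistics.mean(worst)


if __name__ == "__main__":
    for height in (1, 10, 100):
        print(height, round(mean_limiting_slack(height), 2))
```

A taller wall therefore costs expected performance even though every individual path looks equally good at nominal conditions, which is the incentive for spreading paths away from the wall.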

5.2 Congestion-Driven Codesign of Power and Signal Networks [p. 64]
Haihua Su, Jiang Hu, Sachin S. Sapatnekar, Sani R. Nassif

We present a global wire design methodology that simultaneously considers the performance needs for both signal lines and power grids under congestion considerations. An iterative procedure is employed in which the global routing is performed according to a congestion map that includes the resource utilization of the power grid, followed by a step in which the power grid is adjusted to relax the congestion in crowded regions. This adjustment is in the form of wire removal in noncritical regions, followed by a wire sizing step that overcomes the effects of wire removal. Experimental results show that the overall routability can be significantly improved while the power grid noise is maintained within the voltage droop constraint.
Categories and Subject Descriptors
B.8.2 [Performance and Reliability]: Performance Analysis and Design Aids
General Terms
Algorithms
Keywords
wire congestion, codesign, signal routing, power grid noise

5.3 On Metrics for Comparing Routability Estimation Methods for FPGAs [p. 70]
Parivallal Kannan, Shankar Balachandran, Dinesh Bhatia

Interconnect management is a critical design issue for large FPGA based designs. One of the most important issues for planning interconnection is the ability to accurately and efficiently predict the routability of a given design on a given FPGA architecture. The recently proposed routability estimation procedure, fGREP [6], produced estimates within 3 to 4% of an actual detailed router. Other known routability estimation methods include RISA [5], Lou's [7] method and Rent's rule based methods [1] [12] [9]. Comparing these methods has been difficult because of the different reporting methods used by the authors. We propose a uniform reporting metric based on comparing the estimates produced with the results of an actual detailed router on both local and global levels. We compare all the above methods using our reporting metric on a large number of benchmark circuits and show that the enhanced fGREP method produces tight estimates that outperform most other techniques.
Categories and Subject Descriptors
B.7.2 [Design Aids for Integrated Circuits]
General Terms
Algorithms, Measurement, Experimentation
Keywords
FPGA, fGREP, routability estimation, congestion, RISA, Rent's rule


Session 6: PANEL: Tools or Users: Which is the Bigger Bottleneck?

Chair: Andrew B. Kahng
Organizer: Bob Dahlberg
Panel Members: Ron Collett, Patrick Groeneveld, Lambert van den Hoven, Lavi Lev, Nancy Nettleton, Paul Rodman [p. 76]

As chip design becomes ever more complex, fewer design teams are succeeding. Who's to blame? On one hand, tools are hard to use, buggy, not interoperable, and have missing functionality. On the other hand, there is a wide range of engineering skills within the user population, and tools can be abused within flawed methodologies. This panel will quantify and prioritize the key gaps, including interoperability, that must be addressed on both sides.


Session 7: Special Session: Life after CMOS: Imminent or Irrelevant?

Chairs: Dennis Sylvester, Kaustav Banerjee
Organizers: Dennis Sylvester, Kaustav Banerjee
7.1 Life Is CMOS: Why Chase the Life After? [p. 78]
George Sery, Shekhar Borkar, Vivek De

This paper discusses potential solutions to the CMOS device technology scaling at gate lengths approaching 10nm. Promising circuit and design techniques to control leakage power are described. Energy-efficient microarchitecture trends for general purpose microprocessors are elucidated.
Categories and Subject Descriptors
B.7 INTEGRATED CIRCUITS
B.7.1 Types and Design Styles - Microprocessors and microcomputers, VLSI.
General Terms
Performance, Design
Keywords
Technology scaling, Leakage control, Microarchitecture

7.2 The Next Chip Challenge: Effective Methods for Viable Mixed Technology SoCs [p. 84]
H. Bernhard Pogge

The next generation of computer chips will continue the trend toward greater complexity. Many of them will combine different chip technologies and are termed SoCs (System on a Chip). They present major new challenges to the process community, the system and circuit communities, and the design and test communities. At the same time, they also offer new opportunities. For one, the desire to bring more functionality onto a single chip tends to require additional processing, which in turn results in various degrees of device compromise. The chips will also tend to become larger due to the added device content, which generally reduces the yield of the final chip. Such chips may also require new approaches to validate the intended design performance. Chip sector reuse must also be brought into the discussion and, wherever possible, into practice. The net effect is higher chip cost. Much of the industry's effort is therefore focused on addressing these challenges, so far without much success. The alternative has been to continue placing chips onto substrate modules. Yet this solution creates practical limits on achievable wiring densities and bandwidth, due to the spacing requirements of the C4 interconnection. Furthermore, every C4 joint is associated with a signal delay of about 50 psec. All of these handicaps could be greatly reduced by new SoC methods, starting with the fabrication methodology and extending into the chip design and test areas.
Such a direction has been set in motion. The opportunity for a new chip fabrication method has emerged by combining a set of somewhat diverse processes. It is based on a judicious selection of process elements from the traditional chip area, combined with those of a somewhat more recent chip packaging process methodology. This approach simultaneously overcomes all of the key process limitations experienced with today's SoC chip designs and eliminates certain chip packaging technology handicaps. Yet it does not require new process tooling. It relies on currently existing process tooling and process methodologies.
This new process direction has been found to be quite applicable to a number of desirable SoC device designs, and offers new opportunities for yet another expansion of the current semiconductor technology base over the next few years. However, effective SoC designs and fabrications require a much closer and earlier collaboration between the process, design and test communities.
General Terms: Design
Key Words: SoCs (System on a Chip); Chip Fabrication methods; Chip/Packing integration; Chip Subsector concepts

7.3 Few Electron Devices: Towards Hybrid CMOS-SET Integrated Circuits [p. 88]
Adrian M. Ionescu, Michel J. Declercq, Santanu Mahapatra, Kaustav Banerjee, Jacques Gautier

In this paper, CMOS evolution and its fundamental and practical limitations are briefly reviewed, and the working principles, performance, and fabrication of single-electron transistors (SETs) are addressed in detail. Some of the unique characteristics and functionality of SETs, like unrivalled integration and low power, which are complementary to sub-20 nm CMOS, are demonstrated. Characteristics of two novel SET architectures, namely, C-SET and R-SET, aimed at logic applications are compared. Finally, it is shown that the combination of CMOS and SET in hybrid ICs appears to be attractive in terms of new functionality and performance, together with better integrability for ULSI, especially because of their complementary characteristics. It is envisioned that efforts in terms of compatible fabrication processes, packaging, modeling, electrical characterization, co-design and co-simulation will be needed in the near future to achieve substantial advances in both memory and logic circuit applications based on CMOS-SET hybrid circuits.
Categories and Subject Descriptors
B.7 INTEGRATED CIRCUITS
B.7.1 Types and Design Styles - Advanced Technologies
General Terms
Design, Experimentation, Measurement, Performance
Keywords
Nanoelectronics, Single-Electron Transistors, Ultimate CMOS, Hybrid CMOS-SET Circuits, Low power, Inverter, Quantizer.

7.4 Carbon Nanotube Field-Effect Transistors and Logic Circuits [p. 94]
R. Martel, V. Derycke, J. Appenzeller, S. Wind, and Ph. Avouris

In this paper, we present recent advances in the understanding of the properties of semiconducting single-wall carbon nanotubes and in the exploration of their use as field-effect transistors (FETs). Both electrons and holes can be injected in a nanotube transistor by either controlling the metal-nanotube Schottky barriers present at the contacts or simply by doping the bulk of the nanotube. These methods give complementary nanotube FETs that can be integrated together to make inter- and intra-nanotube logic circuits. The performance and general characteristics of these devices suggest that they can compete with silicon MOSFETs. While this is true when considering simple prototype devices, several issues remain to be explored before a nanotube-based technology is possible. These issues are also discussed.
Categories and Subject Descriptors
B.6.0 [Logic Design]: General - novel logic devices.
General Terms
Measurement, Performance, Design, Experimentation.
Keywords
Nanoelectronics, Carbon Nanotube, Semiconductor, Field-Effect Transistor, FET, Schottky Barrier, Circuits, Inverter, Logic Gate, SWNT.


Session 8: Formal Verification

Chair: Yaron Wolfsthal
Organizers: Carl Pixley, Karem Sakallah
8.1 Efficient State Representation for Symbolic Simulation [p. 99]
Valeria Bertacco, Kunle Olukotun

Symbolic simulation is attracting increasing interest for the validation of digital circuits. It allows the verification engineer to explore all, or a major portion, of the circuit's state space without having to design specific and time-consuming test stimuli. However, the complexity and unpredictable run-time behavior of symbolic simulation have limited its scope to small-to-medium circuits. In this paper, we propose a novel approach to symbolic simulation that reduces the size of the BDDs of the state vector while maintaining an exact representation of the set of states visited. The method exploits the decomposition properties of Boolean functions. By restructuring the next-state functions in their disjoint support components, we gain better insight into the role of each input variable. Consequently, we can simplify the next-state functions without significantly sacrificing the simulation accuracy. Our experimental results show that this approach can effectively reduce the memory requirements of symbolic simulation while surrendering only a small portion of the design's state space.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Verification, Simulation; B.8 [Hardware]: Performance and Reliability
General Terms
Design, Verification, Performance, Theory
Keywords
Formal Verification, Symbolic Simulation, BDDs

8.2 Handling Special Constructs in Symbolic Simulation [p. 105]
Alfred Kölbl, James Kukula, Kurt Antreich, Robert Damiano

Symbolic simulation is a formal verification technique which combines the flexibility of conventional simulation with powerful symbolic methods. Some constructs, however, which are easy to handle in conventional simulation need special consideration in symbolic simulation. This paper discusses some special constructs that require unique treatment in symbolic simulation such as the symbolic representation of arrays, an efficient symbolic method for storing arrayed instances and the handling of symbolic data-dependent delays. We present results which demonstrate the effectiveness of our symbolic array model in the simulation of highly regular structures like FPGAs, memories or cellular automata.
Categories and Subject Descriptors
B.5.2 [Hardware]: Register-Transfer-Level Implementation - Design Aids, Verification; B.6.3 [Hardware]: Logic Design - Design Aids, Verification; B.7.2 [Hardware]: Integrated Circuits - Design Aids, Verification
General Terms
Verification
Keywords
Symbolic Simulation, Formal Verification

8.3 A Hybrid Verification Approach: Getting Deep into the Design [p. 111]
Scott Hazelhurst, Gila Kamhi, Osnat Weissberg, Limor Fix

One method of handling the computational complexity of the verification process is to combine the strengths of different approaches. We propose a hybrid verification technology combining symbolic trajectory evaluation with either symbolic model checking or SAT-based model checking. This reduces significantly the cost (both human and computing) of verifying circuits with complex initialisation, as well as simplifying proof development by enhancing verification productivity. The approach has been tested on current Intel designs.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - verification; F.3.1 [Specifying and Verifying and Reasoning about Programs]: mechanical verification
General Terms
Verification, Theory
Keywords
symbolic model checking, symbolic trajectory evaluation, hybrid verification

8.4 Can BDDs Compete with SAT Solvers on Bounded Model Checking? [p. 117]
Gianpiero Cabodi, Paolo Camurati, Stefano Quer

Bounded Model Checking (BMC) based on propositional satisfiability (SAT) methods has recently proven its efficacy for bug hunting. BDD-based tools are able to verify broader sets of properties (e.g. CTL formulas), but recent experimental comparisons between SAT and BDDs in formal verification lead to the conclusion that SAT approaches are more robust and scalable than BDD techniques. In this work we extend BDD-based verification to larger circuit and problem sizes, so that it can indeed compete with SAT-based tools. The approach we propose solves Bounded Model Checking problems using BDDs. In order to cope with larger models it exploits approximate traversals, yet it is exact, i.e. it does not produce false negatives or positives. It reaps relevant performance enhancements from mixed forward and backward, approximate and exact traversals, guided search, conjunctive decompositions and generalized cofactor based BDD simplifications. We experimentally compare our tool with BMC in NuSMV (using mchaff as SAT engine), and we show that BDDs are able to accomplish large verification tasks, and they can better cope with increasing sequential depths.


Session 9: High Level Specification and Design

Chair: Andreas Kanstein
Organizers: Limor Fix, Shin-ichi Minato
9.1 RTL C-Based Methodology for Designing and Verifying a Multi-Threaded Processor [p. 123]
Luc Séméria, Andrew Seawright, Renu Mehra, Daniel Ng, Arjuna Ekanayake, Barry Pangrle

An RTL C-based design and verification methodology is presented which enabled the successful high-speed validation of a 7-million-gate simultaneous multi-threaded (SMT) network processor. The methodology is centered on a statically scheduled C-based coding style, C to HDL translation, and a novel RTL-C to RTL-Verilog equivalence checking flow. It leverages improved simulation performance combined with static techniques to reduce the amount of RTL-Verilog and gate-level verification required during development.
Categories - B.5.2 [Register-Transfer-Level Implementation] Design Aids: Automatic synthesis, Hardware description languages, Optimization, Simulation, Verification.
General Terms - Design, Verification, Performance, Languages.
Keywords - C/C++, RTL, design, verification, formal equivalence checking.

9.2 High-Level Specification and Automatic Generation of IP Interface Monitors [p. 129]
Marcio T. Oliveira, Alan J. Hu

A central problem in functional verification is to check that a circuit block is producing correct outputs while enforcing that the environment is providing legal inputs. To attack this problem, several researchers have proposed monitor-based methodologies, which offer many benefits. This paper presents a novel, high-level specification style for these monitors, along with a linear-size, linear-time translation algorithm into monitor circuits. The specification style naturally fits the complex, but well-specified interfaces used between IP blocks in systems-on-chip. To demonstrate the advantage of our specification style, we have specified monitors for various versions of the Sonics OCP protocol as well as the AMBA AHB protocol, and have developed a prototype tool that automatically translates specifications into Verilog or VHDL monitor circuits.
Categories and Subject Descriptors B.5.2 [Register-Transfer Level Implementation]: Design Aids; B.6.3 [Logic Design]: Design Aids; C.0 [Computer Systems Organization]: General - Systems specification methodology; J.6 [Computer-Aided Engineering]: Computer-aided design (CAD)
General Terms
Documentation, Languages, Verification
Keywords
Formal Verification, Regular Expressions, Pipelining, Alternation

9.3 Achieving Maximum Performance: A Method for the Verification of Interlocked Pipeline Control Logic [p. 135]
Kerstin Eder, Geoff Barrett

Getting the interlock logic which controls pipeline flow correct is an important prerequisite for maximising pipeline performance. Unnecessary pipeline stalls can only be eliminated when they can be distinguished from those stalls which are necessary to preserve functional correctness. We propose a method for deriving a maximum pipeline performance specification from a complete functional specification of the pipeline control logic. The performance specification can be used to generate simulation testbench assertions. Alternatively, the specification can serve as a basis for formal property checking. The most promising aspect of our work is, however, the potential to synthesise the actual control logic from its formal description.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids - Verification; B.5.1 [Register-Transfer-Level Implementation]: Design - Control Design, Pipeline
General Terms
Performance, Verification
Keywords
Pipeline Stall, Interlock Logic, Verification

9.4 Formal Verification of Module Interfaces against Real Time Specifications [p. 141]
Arindam Chakrabarti, Pallab Dasgupta, P. P. Chakrabarti, Ansuman Banerjee

One of the main concerns of the designer of a circuit module is to guarantee that the interface of the module conforms to specific protocols (such as PCI Bus or Ethernet) by which it interacts with its environment. The computational complexity of verifying such open systems under all possible environments has been shown to be very hard (EXPTIME complete [10]). On the other hand, designers are typically required to guarantee correct behavior only for specific valid behaviors of the environment (such as a valid PCI Bus environment). Designers attempt to model these behaviors through an appropriate test bench for the module. In this paper we present a module verifier tool based on a proposed real time temporal logic called Open-RTCTL, which allows combined specification of the correctness properties and the input environments. The tool accepts the design in a subset of Verilog. By making the designer specify the environment constraints, we are able to verify a module in isolation, and thereby avoid the state explosion problem due to composition of modules. We present experimental results on modules from the Texas-97 Benchmark circuits [14] to demonstrate the space/time efficiency of the tool.
Categories and Subject Descriptors
B.7.2. [Hardware]: Integrated Circuits - Verification
General Terms
Verification
Keywords
Formal Verification, Temporal Logic


Session 10: Timing Abstraction

Chair: Mark Hahn
Organizers: Chandu Visweswariah, Narendra V. Shenoy
10.1 Automated Timing Model Generation [p. 146]
Ajay J. Daga, Loa Mize, Subramanyam Sripada, Chris Wolff, Qiuyang Wu

The automated generation of timing models from gate-level netlists facilitates IP reuse and dramatically improves chip-level STA runtime in a hierarchical design flow. In this paper we discuss two different approaches to model generation, the design flows they lend themselves to and results from the application of these model generation solutions to large customer designs.
Categories and Subject Descriptors
J.6: Computer Application.CAD.
General Terms
Design, Performance, Algorithm, Verification
Keywords
Static Timing Analysis, Model Generation. EDA.

10.2 Timing Model Extraction of Hierarchical Blocks by Graph Reduction [p. 152]
Cho W. Moon, Harish Kriplani, Krishna P. Belkhale

A timing model extractor builds a timing model of a digital circuit for use with a static timing analyzer. This paper proposes a novel method of generating a gray box timing model from a gate-level netlist by reducing a timing graph. Previous methods of generating timing models sacrificed accuracy and/or did not scale well with design size. The proposed method is simple, yet it provides model accuracy including arbitrary levels of latch time borrowing, correct support for self-loop timing checks and the capability to support timing constraints that span multiple blocks. Also, the CPU and memory resources required to generate the model scale well with the size of the circuit. We were able to extract a model for a 456K gate block using under 2 minutes of CPU time and 464 MB of memory on a Sun Fire 880 machine. The generated model can provide a capacity improvement in timing verification of more than two orders of magnitude.
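The graph-reduction idea in 10.2 can be illustrated with a minimal sketch. It assumes a purely combinational, acyclic timing graph with a single worst-case delay per edge, which is far simpler than the gray box model of the paper (no latches, time borrowing or timing checks). The function name and data layout are hypothetical.

```python
def reduce_timing_graph(edges, ports):
    """Eliminate every non-port node, keeping worst-case (max) delays.

    edges: dict mapping (src, dst) -> delay for a directed acyclic graph.
    ports: set of node names that must survive in the reduced model.
    """
    graph = dict(edges)
    nodes = {n for edge in graph for n in edge}
    for node in sorted(nodes - set(ports)):
        fanin = [(s, d) for (s, t), d in graph.items() if t == node]
        fanout = [(t, d) for (s, t), d in graph.items() if s == node]
        # Drop all edges touching the internal node.
        graph = {e: d for e, d in graph.items() if node not in e}
        # Series merge: bypass the node. Parallel merge: keep the larger delay.
        for src, d_in in fanin:
            for dst, d_out in fanout:
                key = (src, dst)
                graph[key] = max(graph.get(key, float("-inf")), d_in + d_out)
    return graph


example = {
    ("in1", "a"): 2, ("in2", "a"): 3,
    ("a", "b"): 1, ("a", "out"): 5, ("b", "out"): 6,
}
model = reduce_timing_graph(example, {"in1", "in2", "out"})
```

In this example the reduced model keeps only port-to-port edges, and each edge carries the longest path delay of the original graph between those two ports.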

10.3 Efficient Stimulus Independent Timing Abstraction Model Based on a New Concept of Circuit Block Transparency [p. 158]
Martin Foltin, Brian Foutz, Sean Tyler

We have developed a new timing abstraction model for digital circuit blocks that is stimulus independent, port based, supports designs with level triggered latches, and can be input into commercial STA (Static Timing Analysis) tools. The model is based on an extension of the concept of latch transparency to circuit block transparency introduced in this paper. It was implemented, tested and is being used in conjunction with transistor level STA for microprocessor designs with tens of millions of transistors. The STA simulation times are significantly shorter than with gray box timing models, which can decrease the overall chip timing verification time. The model can also be used in the intellectual property encapsulation domain.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - simulation, verification
General Terms
Performance, Design, Verification.
Keywords
Timing analysis, timing model, VLSI design, circuit optimization

10.4 An Implication-based Method to Detect Multi-Cycle Paths in Large Sequential Circuits [p. 164]
Hiroyuki Higuchi

This paper proposes a fast multi-cycle path analysis method for large sequential circuits. It determines whether or not all the paths between every flip-flop pair are multi-cycle paths. The proposed method is based on ATPG techniques, especially on implication techniques, to utilize circuit structure and multi-cycle path condition directly. The method also checks whether or not the multi-cycle path may be invalidated by static hazards in combinational logic parts. Experimental results show that our method is much faster than conventional ones.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids
General Terms
Algorithms, Design, Verification
Keywords
multi-cycle path, sequential circuits, implication, ATPG


Session 11: Special Session: E-Textiles

Chair & Organizer: Majid Sarrafzadeh
11.1 WITHDRAWN

11.2 The Wearable Motherboard: A Framework for Personalized Mobile Information Processing (PMIP) [p. 170]
Sungmee Park, Kenneth Mackenzie, Sundaresan Jayaraman

Textiles and computing share a synergistic relationship, which is being harnessed to create a new paradigm in personalized mobile information processing (PMIP). In this paper, we provide an overview of this "interconnection" between the two fields and present the vision for "E-Textiles," which represents the convergence of the two fields. We discuss the role of the Georgia Tech Wearable Motherboard in pioneering this paradigm of "fabric is the computer" and serving as a framework for PMIP. Finally, recent research in this area resulting in the realization of a "computational fabric network" is discussed.

11.3 Challenges and Opportunities in Electronic Textiles Modeling and Optimization [p. 175]
Diana Marculescu, Radu Marculescu, Pradeep K. Khosla

This paper addresses an emerging new field of research that combines the strengths and capabilities of electronics and textiles in one: electronic textiles, or e-textiles. E-textiles, also called Smart Fabrics, have not only "wearable" capabilities like any other garment, but also local monitoring and computation, as well as wireless communication capabilities. Sensors and simple computational elements are embedded in e-textiles, as well as built into yarns, with the goal of gathering sensitive information, monitoring vital statistics and sending them remotely (possibly over a wireless channel) for further processing. Possible applications include medical (infant or patient) monitoring, personal information processing systems, or remote monitoring of deployed personnel in military or space applications. We illustrate the challenges imposed by the dual textile/electronics technology on their modeling and optimization methodology.
Categories and Subject Descriptors: I.6 [Simulation and Modeling]: Modeling methodologies; B.8.2 [Performance and reliability]: performance analysis and design aids.
General terms: design, performance


Session 12: PANEL: Analog Intellectual Property: Now? Or Never?

Chair: Stephen Ohr
Organizers: Linda Marchant, Philippe Magarshack
Panelists: Masao Hotta, Mike Brunoli, Felicia James, Rudy Koch, Roy McGuffin, Andrew Moore [p. 181]

There is considerable controversy as to whether or not the trade in analog intellectual properties (IP) will ever represent a viable business opportunity. One school of thought suggests that analog design will always be too specialized to constitute a major market; another school of thought says this will be big business if certain nagging technical problems are solved. While the demand for analog interface components is high, the ability of IP creators to render it in a tradable format is limited. And the ability of digital design teams to successfully utilize analog IPs - without a considerable amount of handholding - is similarly limited. EDA tools here are looked upon as both culpable and offering the best hopes for the future. The experts on this panel - representing analog designers, analog EDA tool providers, silicon foundries and analog IP vendors - bring their own points of view on some of the business and technology issues which need to be resolved to provide the context for analog IP development and trade. Among the open questions:
Is IP created and validated within the design environment for re-use a more productive approach than imported IP? How many 'traditional' analog designers would admit they are still using kit parts and breadboards - maybe even SPICE and manual IC layout techniques - today in their every-day job?
What CAD tools are needed to help analog designers? Are newly emerging EDA technologies, designed to enhance analog design productivity, maturing rapidly enough to be accepted by designers?
What is the future of analog designs at the very low voltage swings coming with sub-100nm CMOS?
Though silicon foundries need to process a wide set of external IP offerings in order to allow their users to build complete systems, are foundries seeing enough activity in analog IP designs to justify specialized fab runs or the kind of process tuning that would allow analog and digital IPs to coexist on the same chip?
How likely - and how soon - can we get to analog IP development and trade?


Session 13: Low-Power System Design

Chair: Giovanni De Micheli
Organizers: Renu Mehra, Enrico Macii
13.1 Task Scheduling and Voltage Selection for Energy Minimization [p. 183]
Yumin Zhang, Xiaobo (Sharon) Hu, Danny Z. Chen

In this paper, we present a two-phase framework that integrates task assignment, ordering and voltage selection (VS) together to minimize energy consumption of real-time dependent tasks executing on a given number of variable voltage processors. Task assignment and ordering in the first phase strive to maximize the opportunities that can be exploited for lowering voltage levels during the second phase, i.e., voltage selection. In the second phase, we formulate the VS problem as an Integer Programming (IP) problem and solve the IP efficiently. Experimental results demonstrate that our framework is very effective in executing tasks at lower voltage levels under different system configurations.
Categories and Subject Descriptors
I.2.8 [Problem Solving, Control Methods, and Search]: Scheduling
General Terms
Algorithms, Design
Keywords
voltage selection, task scheduling
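As a rough illustration of the voltage selection phase in 13.1, the sketch below picks one voltage level per task for a chain of tasks on a single processor with a common deadline. It is not the paper's Integer Programming formulation: it enumerates all assignments by brute force, and the voltage levels, the energy model (energy proportional to cycles times voltage squared) and all names are assumptions made for the example.

```python
from itertools import product

# Hypothetical voltage levels: (supply voltage in V, clock frequency in MHz).
LEVELS = [(1.0, 100.0), (1.5, 200.0), (2.0, 300.0)]


def select_voltages(task_cycles, deadline_us, levels=LEVELS):
    """Return (energy, voltages) with minimum energy, or None if infeasible.

    Tasks run back to back; cycles / MHz gives execution time in microseconds.
    Energy per task is modeled as cycles * V^2 (arbitrary units).
    """
    best = None
    for choice in product(levels, repeat=len(task_cycles)):
        time_us = sum(c / f for c, (_, f) in zip(task_cycles, choice))
        if time_us > deadline_us:
            continue  # this assignment misses the deadline
        energy = sum(c * v * v for c, (v, _) in zip(task_cycles, choice))
        if best is None or energy < best[0]:
            best = (energy, [v for v, _ in choice])
    return best


result = select_voltages([3000, 6000], deadline_us=50.0)
```

With these assumed numbers, running both tasks at the middle level meets the deadline at lower energy than mixing the lowest and highest levels, which shows why the choice has to be made jointly across tasks.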

13.2 Battery-Conscious Task Sequencing for Portable Devices Including Voltage/Clock Scaling [p. 189]
Daler Rakhmatov, Sarma Vrudhula, Chaitali Chakrabarti

Operation of battery-powered portable systems can no longer be sustained once a battery becomes discharged. Maximization of the battery lifetime is a difficult task due to nonlinearity of battery behavior that depends on the characteristics of the system load profile. We address the problem of task sequencing without and with voltage/clock scaling that shapes the profile so that the battery lifetime is maximized. We developed an accurate analytical battery model and validated it with measurements taken on a real lithium-ion battery used in a pocket computer. We use the model as a basis for a unique battery-conscious cost function and utilize its properties to develop several novel algorithms, including insertion of recovery periods and voltage/clock scaling for delay slack distribution.
Categories and Subject Descriptors
J.6.2 [Computer-Aided Engineering]: Computer-Aided Design
General Terms
Algorithms, Performance, Design
Keywords
Battery, modeling, low-power design, scheduling, voltage scaling

13.3 An Energy Saving Strategy Based on Adaptive Loop Parallelization [p. 195]
I. Kadayif, M. Kandemir, M. Karakoy

In this paper, we evaluate an adaptive loop parallelization strategy (i.e., a strategy that allows each loop nest to execute using a different number of processors if doing so is beneficial) and measure the potential energy savings when processors left unused during execution of a nested loop in a multi-processor on-a-chip (MPoC) are shut down (i.e., placed into a power-down or sleep state). Our results show that shutting down unused processors can lead to as much as 67% energy savings with up to 17% performance loss in a set of array-intensive applications. We also discuss and evaluate a processor pre-activation strategy based on compile-time analysis of nested loops. Based on our experiments, we conclude that an adaptive loop parallelization strategy combined with idle processor shut-down and pre-activation can be very effective in reducing energy consumption without increasing execution time.
Categories and Subject Descriptors
D.3.4 [Programming Languages]: Processors - Compilers, Optimization
General Terms
Design, Experimentation, Performance
Keywords
Adaptive Parallelization, Multiprocessing, Energy Consumption


Session 14: Fabric-Driven Logic Synthesis

Chair: Maciej Ciesielski
Organizers: Malgorzata Marek-Sadowska, Steven Nowick
14.1 River PLAs: A Regular Circuit Structure [p. 201]
Fan Mo, Robert K. Brayton

A regular circuit structure called a River PLA and its reconfigurable version, Glacier PLA, are presented. River PLAs provide greater regularity than circuits implemented with standard-cells. Conventional optimization stages such as technology mapping, placement and routing are eliminated. These two features make the River PLA a highly predictable structure. Glacier PLAs can be an alternative to FPGAs, but with a simpler and more efficient design methodology.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Automatic synthesis.
General Terms
Algorithms.
Keywords
Programmable Logic Array, River routing.

14.2 WITHDRAWN

14.3 Layout-Aware Synthesis of Arithmetic Circuits [p. 207]
Junhyung Um, Taewhan Kim

In deep sub-micron (DSM) technology, wires are as important as or more important than logic components, since wire-related problems such as crosstalk and noise are much more critical in system-on-chip (SoC) design. Recently, a method [12] was proposed for generating a partial product reduction tree (PPRT) with optimal timing using bit-level adders to implement arithmetic circuits, which outperforms the current best designs. However, in the conventional approaches including [12], interconnects are not primary components to be optimized in the synthesis of arithmetic circuits, mainly due to its high integration complexity or unpredictable wire effects, thereby resulting in unsatisfactory layout results with long and tangled wire connections. To overcome this limitation, we propose a new module generation/synthesis algorithm for arithmetic circuits utilizing carry-save-adder (CSA) modules, which not only optimizes the circuit timing but also generates a much more regular interconnect topology for the final circuits. Specifically, we propose a two-step algorithm: (Phase 1: CSA module generation) we propose an optimal-timing CSA module generation algorithm for an arithmetic expression under a general CSA timing model; (Phase 2: Bit-level interconnect refinements) we optimally refine the interconnects between the CSA modules while retaining the global CSA-tree structure produced by Phase 1. It is shown that the timing of the circuits produced by our approach is equal or very close to that of [12] in most testcases (even without including the interconnect delay), and at the same time, the interconnects in layout are significantly shorter and more regular.
Categories and Subject Descriptors
B.2.4. [Arithmetic and Logic Structures]: High-Speed Arithmetic - Algorithms, Cost/Performance
General Terms: Algorithms, Design and Performance
Keywords: Carry-save-adder, layout, high performance
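For readers unfamiliar with the carry-save-adder (CSA) modules that 14.3 builds on, the sketch below shows generic 3:2 carry-save compression on non-negative integers. It illustrates only the arithmetic identity behind a CSA tree, not the timing-driven module generation or interconnect refinement of the paper; the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compression: three operands in, (sum, carry) out, no carry ripple."""
    partial_sum = a ^ b ^ c  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1  # majority bits, shifted left
    return partial_sum, carry


def csa_tree_sum(operands):
    """Reduce non-negative integers with CSAs, then one carry-propagate add."""
    pending = list(operands)
    while len(pending) > 2:
        a, b, c = pending.pop(), pending.pop(), pending.pop()
        pending.extend(carry_save_add(a, b, c))
    return sum(pending)  # final (at most two-operand) carry-propagate addition


total = csa_tree_sum([3, 5, 7, 9, 11])
```

Each CSA step replaces three operands with two whose sum is unchanged, so only the last addition needs a full carry chain. The order in which operands are grouped is arbitrary here, whereas in hardware that grouping determines both timing and wiring.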


Session 15: Memory Management and Address Optimization in Embedded Systems

Chair: Nikil Dutt
Organizers: Diederik Verkest, Luca Benini
15.1 Automatic Data Migration for Reducing Energy Consumption in Multi-Bank Memory Systems [p. 213]
V. De La Luz, M. Kandemir, I. Kolcu

An architectural solution to reducing memory energy consumption is to adopt a multi-bank memory system instead of a monolithic (single-bank) memory system. Some recent multi-bank memory architectures help reduce memory energy by allowing an unused bank to be placed into a low-power operating mode. This paper describes an automatic data migration strategy which dynamically places the arrays with temporal affinity into the same set of banks. This strategy increases the number of banks which can be put into low-power modes and allows the use of more aggressive energy saving modes. Experiments using several array-dominated applications show the usefulness of data migration and indicate that large energy savings can be achieved with low overhead.
Categories and Subject Descriptors
B.3 [Hardware]: Memory Structures
General Terms
Design, Experimentation, Performance
Keywords
Energy Consumption, Multi-Bank Memories, Data Migration

15.2 Exploiting Shared Scratch Pad Memory Space in Embedded Multiprocessor Systems [p. 219]
Mahmut Kandemir, J. Ramanujam, A. Choudhary

In this paper, we present a compiler strategy to optimize data accesses in regular array-intensive applications running on embedded multiprocessor environments. Specifically, we propose an optimization algorithm that targets the reduction of extra off-chip memory accesses caused by inter-processor communication. This is achieved by increasing the application-wide reuse of data that resides in the scratch-pad memories of processors. Our experimental results obtained on four array-intensive image processing applications indicate that exploiting inter-processor data sharing can reduce the energy-delay product by as much as 33.8% (and 24.3% on average) on a four-processor embedded system. The results also show that the proposed strategy is robust in the sense that it gives consistently good results over a wide range of architectural parameters.
Categories and Subject Descriptors B.3 [Hardware] Memory Structures; D.3.4 [Software] Programming Languages: Processors [Compilers]
General Terms Algorithms, management, performance.
Keywords Embedded multiprocessors, energy consumption, scratch pad memories, access patterns, compiler optimizations, data tiles.

15.3 Address Assignment Combined with Scheduling in DSP Code Generation [p. 225]
Yoonseo Choi, Taewhan Kim

One of the important issues in embedded system design is to optimize program code for the microprocessor to be stored in ROM. In this paper, we propose an integrated approach to the DSP address code generation problem for minimizing the number of addressing instructions. Unlike previous works in which code scheduling and offset assignment are performed sequentially without any interaction between them, our work tightly couples the offset assignment problem with code scheduling so that scheduling can be exploited to minimize addressing instructions more effectively. We accomplish this by developing a fast but accurate two-phase procedure which, for a sequence of code schedules, finds a sequence of memory layouts with minimum addressing instructions. Experimental results with benchmark DSP programs show improvements of 13%-33% in the address code size over Solve-SOA/GOA [7].
Categories and Subject Descriptors
C.3 [Special-purpose and application-based systems]: [Signal processing systems]
General Terms
Algorithms, Performance
Keywords
Offset assignment, Scheduling, Code Generation
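The cost that offset assignment in 15.3 tries to minimize can be sketched under a common simplifying assumption: a single address register with free auto-increment and auto-decrement, so that only jumps of more than one memory location need an explicit addressing instruction. The brute-force layout search below is a stand-in for illustration and is not the two-phase procedure of the paper; all names are hypothetical.

```python
from itertools import permutations


def addressing_cost(access_sequence, layout):
    """Count explicit address-register updates for one address register.

    layout: sequence of variable names in memory order (index = offset).
    Consecutive accesses whose offsets differ by at most 1 are assumed free
    (auto-increment/decrement); any larger jump costs one instruction.
    """
    offset = {name: i for i, name in enumerate(layout)}
    cost = 0
    for prev, cur in zip(access_sequence, access_sequence[1:]):
        if abs(offset[cur] - offset[prev]) > 1:
            cost += 1
    return cost


def best_layout(access_sequence):
    """Exhaustively search all layouts (only practical for a few variables)."""
    variables = sorted(set(access_sequence))
    return min(permutations(variables),
               key=lambda layout: addressing_cost(access_sequence, layout))


accesses = list("abcadac")
naive_cost = addressing_cost(accesses, list("abcd"))
tuned_cost = addressing_cost(accesses, best_layout(accesses))
```

Because the cost depends on the order of accesses as well as on the layout, reordering the schedule changes which layouts are cheap, which is the interaction the abstract describes.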


Session 16: Special Session: Optics: Lighting the Way to EDA Riches?

Chair: Jaijeet Roychowdhury
Organizers: Jaijeet Roychowdhury, Joel R. Phillips
16.1 Multifunctional Photonic Integration for the Agile Optical Internet [p. 231]
Edward H. Sargent

An agile, transparent optical network is emerging. This paper enumerates the functions that will be needed at nodes in order to add transparency and agility to the network while robustly assuring optical fibre channel performance. One key requirement will be scalable integration of multiple functions on a platform. The paper presents recent results along one strategic axis for integration: the use of functionalized self-organized photonic crystals and heterostructures thereof to control the flow and features of light.
Categories and Subject Descriptors
C.2.1 [Network Architecture and Design]: Enabling technologies.
General Terms
Experimentation, Theory.
Keywords
Agile optical networks, reconfigurability, optical performance monitoring, optoelectronic integration, photonic crystals, optical nonlinearity, electro-optics, optical polymers, semiconductor nanocrystals.

16.2 Computer Aided Design of Long-Haul Optical Transmission Systems [p. 235]
James G. Maloney, Brian E. Brewington, Curtis R. Menyuk

We present a general overview of the role of computer models in the design and optimization of commercial optical transmission systems. Specifically, we discuss (1) the role of modeling in a commercial setting, (2) achieving the proper balance between accuracy and computation speed, (3) model verification against experiment, and (4) case studies demonstrating the benefits of modeling. Ideally, experiments are preferable to models when describing system performance, particularly to support claims of a system's functionality to a customer. However, modeling is often the only choice for many of the problems that a commercial networking company must solve. Because there are design parameter spaces that are either too expensive or too time consuming to verify experimentally, the main role of modeling in industry is to study what experiments cannot. For example, when an analytical solution of a statistical problem is infeasible, a common modeling solution is to perform Monte Carlo trials to study the statistical behavior. Another typical modeling task involves looking at variations of hardware that would be prohibitively expensive to acquire and test. Implementing modeling in industry involves a balance between three needs: cost-efficiency, time-efficiency, and accuracy. We will discuss the approaches we have taken at PhotonEx to meet these needs: leveraging academic research, developing reduced models and utilizing computational clusters. Specifically, we will use case studies to illustrate the application of these approaches to modeling long-haul optical transmission systems.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-Aided Engineering - computer-aided design (CAD) and computer-aided manufacturing (CAM).
General Terms
Algorithms, Measurement, Performance, Design, Experimentation, Theory, Verification
Keywords
Optical Communication, Long-Haul (LH) Transmission, Ultra-Long Haul (ULH) Transmission, Optical Modeling
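The Monte Carlo approach mentioned in 16.2 can be sketched generically as follows. The impairment model is entirely hypothetical (each span adds an independent penalty drawn uniformly between 0 and 1 dB) and is chosen only so that the estimate can be checked by hand; it does not represent any model described by the authors.

```python
import random


def estimate_outage_probability(num_spans, margin_db, trials=20000, seed=1):
    """Estimate P(total penalty > margin) by Monte Carlo sampling.

    Hypothetical model: each span adds an independent penalty drawn
    uniformly from 0 to 1 dB; the link fails when the sum exceeds margin_db.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    failures = 0
    for _ in range(trials):
        total = sum(rng.random() for _ in range(num_spans))
        if total > margin_db:
            failures += 1
    return failures / trials


estimate = estimate_outage_probability(num_spans=2, margin_db=1.5)
```

For two spans and a 1.5 dB margin the exact answer under this toy model is 0.125, so the estimate can be compared against it. In realistic cases no such closed form exists, which is the situation the abstract has in mind.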

16.3 A Fast Optical Propagation Technique for Modeling Micro-Optical Systems [p. 236]
Timothy P. Kurzweg, Steven P. Levitan, Jose A. Martinez, Mark Kahrs, Donald M. Chiarulli

As designers become more aggressive in introducing optical components to micro-systems, rigorous optical models are required for system-level simulation tools. Common optical modeling techniques and approximations are not valid for most optical micro-systems, and those techniques that provide accurate simulation are computationally slow. In this paper, we introduce an angular frequency optical propagation technique that greatly reduces computation time while achieving the accuracy of a full scalar formulation. We present simulations of a diffractive optical MEM Grating Light Valve to show the advantages of this optical propagation method and the integration of the technique into a system-level multi-domain CAD tool.
Categories and Subject Descriptors
I.6.5 [Simulation and Modeling]: Model Development - modeling methodologies
General Terms
Algorithms, Design
Keywords
Optical Propagation, Angular Spectrum, CAD, Optical Micro-systems, Optical MEMS


Session 17: PANEL: Nanometer Design: What Hurts Next...?

Chair: Lawrence T. Pileggi
Organizers: Rob A. Rutenbar, Andrew B. Kahng
Panelists: Bob Brodersen, Anthony Hill, John Kibarian, Desmond A. Kirkpatrick, Mitsumasa Koyanagi, Mark Lavin [p. 242]

Every year, the design and EDA communities are besieged by dire warnings about the impending doom of "design as we know it." Every year, another unpleasant physical effect from the evil depths of deep submicron physics surfaces, compromising our designs in new and vile ways. Every year, the same story: more nanometer woes. Rather than endorse a new winner in this year's race for the "next worst thing" from the nanometer arena, this panel gathers a set of world-class technology experts to debate what effects are hiding just around the next corner, waiting to pounce on the unwary tool or chip designer. Which among these is really the most important, when will it happen, and why?


Session 18: Novel DFT, BIST and Diagnosis Techniques

Chair: Rathish Jayabharathi
Organizers: Kwang-Ting (Tim) Cheng, T. M. Mak
18.1 Low-Cost Sequential ATPG with Clock-Control DFT [p. 243]
Miron Abramovici, Xiaoming Yu, Elizabeth M. Rudnick

We present a new clock-control DFT technique for sequential circuits, based on clock partitioning and selective clock freezing, and we use it to break the global feedback loops and to generate clock waves to test the resulting sequential circuit with self-loops. Clock waves allow us to significantly reduce the complexity of sequential ATPG. Unlike scan, our non-intrusive DFT technique does not introduce any delay penalty; the generated tests may be applied at speed, have shorter application time, and dissipate less power.
Categories and Subject Descriptors: B.8.1 [Performance and Reliability]: Reliability, Testing, and Fault-Tolerance
General Terms: Algorithms, Design, Reliability

18.2 Effective Diagnostics through Interval Unloads in a BIST Environment [p. 249]
Peter Wohl, John A. Waicukauski, Sanjay Patel, Greg Maston

Logic built-in self test (BIST) is increasingly being adopted to improve test quality and reduce test costs for rapidly growing designs. Compared to deterministic automated test pattern generation (ATPG), BIST presents inherent fault diagnostic challenges. Previous diagnostic techniques have been limited in their diagnosis resolution and/or require significant hardware overhead. This paper proposes an interval-based scan-unload method that ensures diagnosis resolution down to gate-level faults with minimal hardware overhead. Tester fail-data collection is based on a novel construct incorporated into the design-extensions of the standard test-interface language (STIL). The implementation of the proposed method is presented and analyzed.
Categories and Subject Descriptors: B.8.1 [Performance and Reliability]: Reliability, Testing and Fault-Tolerance.
General Terms: Algorithms, Design.
Keywords: built-in self-test (BIST), fault diagnosis.

18.3 On Output Response Compression in the Presence of Unknown Output Values [p. 255]
Irith Pomeranz, Sandip Kundu, Sudhakar M. Reddy

A circuit may produce unknown output values during simulation of an input sequence due to an unknown initial state or due to the existence of tri-state elements. For circuits tested using BIST, unknown output values make it impossible to determine a single unique signature for the fault free circuit. To accommodate unknown output values in a BIST scheme, we describe a procedure for synthesizing a minimal logic block that replaces unknown output values by a known constant. The proposed procedure ensures that the BIST scheme will be able to detect all the faults detectable by the input sequence applied to the circuit while allowing a single unique signature to be obtained.

18.4 Software-Based Diagnosis for Processors [p. 259]
Li Chen, Sujit Dey

Software-based self-test (SBST) is emerging as a promising technology for enabling at-speed test of high-speed microprocessors using low-cost testers. We explore the fault diagnosis capability of SBST, in which functional information can be used to guide and facilitate the generation of diagnostic tests. By using a large number of carefully constructed diagnostic test programs, the fault universe can be divided into fine-grained partitions, each corresponding to a unique pass/fail pattern. We evaluate the quality of diagnosis by constructing diagnostic-tree-based fault dictionaries. We demonstrate the feasibility of the proposed method by applying it to a processor example. Experimental results show its potential as an effective method for diagnosing larger processors.
Categories and Subject Descriptors
B.8.1 [Performance and Reliability]: Reliability, Testing, and Fault-Tolerance.
General Terms
Algorithms, Measurement, Reliability, Experimentation.
Keywords
Microprocessor, self-test, instruction, diagnostics.


Session 19: Case Studies in Embedded System Design

Chair: Wayne Wolf
Organizers: Anand Raghunathan, Xiaobo (Sharon) Hu
19.1 Design of a High-Throughput Low-Power IS95 Viterbi Decoder [p. 263]
Xun Liu, Marios C. Papaefthymiou

The design of high-throughput large-state Viterbi decoders relies on the use of multiple arithmetic units. The global communication channels among these parallel processors often consist of long interconnect wires, resulting in large area and high power consumption. In this paper, we propose a data-transfer oriented design methodology to implement a low-power 256-state rate-1/3 IS95 Viterbi decoder. Our architectural level scheme uses operation partitioning, packing, and scheduling to analyze and optimize interconnect effects in early design stages. In comparison with other published Viterbi decoders, our approach reduces the global data transfers by up to 75% and decreases the number of global buses by up to 48%, while enabling the use of deeply pipelined datapaths with no data forwarding. In the RTL implementation of the individual processors, we apply precomputation in conjunction with saturation arithmetic to further reduce power dissipation with provably no coding performance degradation. Designed using a 0.25 µm standard cell library, our decoder achieves a throughput of 20 Mbps in simulation and dissipates only 450 mW.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - Algorithms implemented in hardware
General Terms
Design, Performance
Keywords
Communications, Pipelining, Bus reduction

19.2 A Detailed Cost Model for Concurrent Use With Hardware/Software Co-Design [p. 269]
Daniel Ragan, Peter Sandborn, Paul Stoaks

Hardware/software co-design methodologies generally focus on the prediction of system performance or co-verification of system functionality. This study extends this conventional focus through the development of a methodology and software tool that evaluates system (hardware and software) development, fabrication, and testing costs (dollar costs) concurrent with hardware/software partitioning in a co-design environment. Based on the determination of key metrics such as gate count and lines of software, a new tool called Ghost evaluates software and hardware development, fabrication, packaging, and testing costs. Ghost enables optimization of hardware/software partitioning as a function of specific combinations of hardware foundries and software development environments.
Categories and Subject Descriptors
E3 [HW/SW co-design]: specification, modeling, co-simulation and performance analysis, system-level scheduling and partitioning.
General Terms
Design, Economics.
Keywords
Cost Modeling, Cost-Performance Trade-off.

19.3 Efficient Code Synthesis from Extended Dataflow Graphs for Multimedia Applications [p. 275]
Hyunok Oh, Soonhoi Ha

This paper presents efficient automatic code synthesis techniques from dataflow graphs for multimedia applications. Since multimedia applications require large buffers containing composite-type data, we aim to reduce the buffer sizes with a fractional rate dataflow (FRDF) extension and a buffer sharing technique. In an H.263 encoder experiment, the FRDF extension and the buffer sharing technique enable us to reduce the buffer size by 67%. The final buffer size is no larger than that of a manually written reference code.
Keywords
memory optimization, software synthesis, multimedia, dataflow


Session 20: Theoretical Foundations of Embedded System Design

Chair: Rajesh Gupta
Organizers: Annette Reutter, Donatella Sciuto
20.1 Transformation Based Communication and Clock Domain Refinement for System Design [p. 281]
Ingo Sander, Axel Jantsch

The ForSyDe methodology has been developed for system level design. In this paper we present formal transformation methods for the refinement of an abstract and formal system model into an implementation model. The methodology defines two classes of design transformations: (1) semantic-preserving transformations and (2) design decisions. In particular we present and illustrate communication and clock domain refinement by way of a digital equalizer system.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design-Aids; J.6 [Computer-Aided Engineering]: Computer-Aided Design (CAD)
General Terms
Design, Theory
Keywords
System Design, System Modeling, Design Refinement

20.2 Model Composition for Scheduling Analysis in Platform Design [p. 287]
Kai Richter, Dirk Ziegenbein, Marek Jersak, Rolf Ernst

Categories and Subject Descriptors
C.0 [General]: System Architecture; C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems - realtime and embedded systems; C.4 [Computer Systems Organization]: Performance of Systems
General Terms
Algorithms, Performance, Verification
Keywords
Platform-Based Design, Performance Analysis, Scheduling, Formal Analysis

20.3 Timed Compiled-Code Simulation of Embedded Software for Performance Analysis of SOC Design [p. 293]
Jong-Yeol Lee, In-Cheol Park

In this paper, a new timing generation method is proposed for the performance analysis of embedded software. The time stamp generation of I/O accesses is crucial to performance estimation and architecture exploration in the timed functional simulation, which simulates the whole design at a functional level with timing. A portable compiler is modified to generate time-deltas, which are the estimated cycle counts between two adjacent I/O accesses, by counting the cycles of the intermediate representation (IR) operations and using a machine description that contains information on a target processor. Since the proposed method is based on the machine-independent IR of a compiler, the method can be applied to various processors by changing the machine description. The experimental results show that the proposed method is effective in that the average estimation error is about 2% and the maximum speed-up over the corresponding instruction-set simulators is about 300 times. The proposed method is also verified in a timed functional simulation environment.
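
The notion of a time-delta in the abstract above (the estimated cycle count between two adjacent I/O accesses, obtained by counting cycles of IR operations using a machine description) can be sketched as follows. The operation names, the cycle costs in `MACHINE_DESC`, and the `"io"` marker are assumptions made for illustration; the paper's compiler modifications and its actual machine description format are not shown, and the I/O access itself is treated as costing zero cycles here.

```python
# Hypothetical machine description: estimated cycles per IR operation.
MACHINE_DESC = {"load": 2, "store": 2, "add": 1, "mul": 3, "branch": 2}

def time_deltas(ir_ops, machine_desc, io_op="io"):
    """Return estimated cycle counts between each pair of adjacent I/O accesses."""
    deltas = []
    cycles = 0
    seen_io = False
    for op in ir_ops:
        if op == io_op:
            if seen_io:
                deltas.append(cycles)  # cycles since the previous I/O access
            seen_io = True
            cycles = 0
        else:
            cycles += machine_desc[op]
    return deltas

# Straight-line IR trace with three I/O accesses.
trace = ["io", "load", "add", "mul", "io", "store", "branch", "io"]
deltas = time_deltas(trace, MACHINE_DESC)  # [6, 4]
```

Retargeting to another processor in this sketch amounts to swapping in a different `MACHINE_DESC`, which mirrors the machine-independence argument made in the abstract.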


Session 21: Equivalence Verification

Chair: Ziyad Hanna
Organizer: Shin-ichi Minato
21.1 Automated Equivalence Checking of Switch Level Circuits [p. 299]
Simon Jolly, Atanas Parashkevov, Tim McDougall

A chip that is required to meet strict operating criteria in terms of speed, power, or area is commonly custom designed at the switch level. Traditional techniques for verifying these designs, based on simulation, are expensive in terms of resources and cannot completely guarantee correct operation. Formal verification methods, on the other hand, provide for a complete proof of correctness, and require less effort to set up. This paper presents Motorola's Switch Level Verification (SLV) tool, which employs detailed switch level analysis to model the behavior of MOS transistors and obtain an equivalent RTL model. This tool has been used for equivalence checking at the switch level for several years within Motorola for the PowerPC, M*Core and DSP custom blocks. We focus on the novel techniques employed in SLV, particularly in the areas of pre-charged and sequential logic analysis, and provide details on the automated and integrated equivalence checking flow in which the tool is used.
Categories and Subject Descriptors
J.6 [Computer-Aided Engineering]: Computer-Aided Design.
General Terms
Algorithms, Design, Verification.
Keywords
Custom design, switch level analysis, equivalence checking, formal verification, MOS circuits, VLSI design.

21.2 A Practical and Efficient Method for Compare-Point Matching [p. 305]
Demos Anastasakis, Robert Damiano, Hi-Keung Tony Ma, Ted Stanion

An important step in using combinational equivalence checkers to verify sequential designs is identifying and matching corresponding compare-points in the two sequential designs to be verified. Both non-function and function-based matching methods are usually employed in commercial verification tools. In this paper, we describe a heuristic algorithm using ATPG for matching compare-points based on the functionality of the combinational blocks in the sequential designs. Results on industrial-sized circuits show our methods are both practical and efficient.
Categories and Subject Descriptors
J.6 [Computer-aided engineering]: Verification, compare-point matching.
General Terms
Algorithms, Experimentation, Verification.
Keywords
Combinational verification, equivalence checking, latch mapping.

21.3 Self-referential Verification of Gate-level Implementations of Arithmetic Circuits [p. 311]
Ying-Tsai Chang, Kwang-Ting (Tim) Cheng

Verification of gate-level implementations of arithmetic circuits is challenging for a number of reasons: the existence of some hard-to-verify arithmetic operators (e.g. multiplication), the use of different operand ordering, the incorporation of merged arithmetic with cross-operator implementations, and the employment of circuit transformations based on arithmetic relations. It is hence a peculiar problem that does not fit well into the existing RTL-to-gate equivalence checking methodology. In this paper, we propose a self-referential functional verification approach which uses the gate-level implementation of the arithmetic circuit under verification to verify itself. Specifically, the verification task is decomposed into a sequence of equivalence checking subproblems, each of which compares circuit pairs derived from the implementation under verification based on the proposed self-referential functional equations. A decomposition-based heuristic using structural information is employed to guide the verification process for better efficiency. Experimental results on a number of implementations of the multiply-add units and the inner product units with different architectures demonstrate the versatility of this approach.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids - verification
General Terms
Algorithms, Verification
Keywords
Arithmetic circuit verification


Session 22: PANEL: Whither (or Wither?) ASIC Handoff?

Chair: Michael Santarini
Organizers: Sudhakar Jilla, Mark Miller
Panelists: Tommy Eng, Sandeep Khanna, Kamalesh Ruparel, Tom Russell, Kazu Yamada [p. 317]

The traditional ASIC netlist handoff is changing - but to what? Is RTL handoff finally a reality? Or, will a placement-based handoff model emerge? Are differences among underlying tool technologies and methodologies only cosmetic? Or, are there fundamental business and IP distinctions? These and other questions will be discussed as the panel examines the future of the designer - ASIC vendor - EDA vendor relationship.


Session 23: Embedded Software Automation: From Specification to Binary

Chair: Joerg Henkel
Organizers: Marco Di Natale, Xiaobo (Sharon) Hu
23.1 Software Synthesis from Synchronous Specifications Using Logic Simulation Techniques [p. 319]
Yunjian Jiang, Robert K. Brayton

This paper addresses the problem of automatic generation of implementation software from high-level functional specifications in the context of embedded system-on-chip designs. Software design complexity for embedded systems has increased so much that a high-level functional programming paradigm needs to be adopted for formal verifiability, maintainability and short time-to-market. We propose a framework for efficiently generating implementation software from a synchronous state machine specification for embedded control systems. The framework is generic enough to allow hardware/software partitioning for a given architecture platform. It is demonstrated that logic optimization and simulation techniques can be combined to produce fast execution code for such embedded systems. Specifically, we propose a framework for software synthesis from multi-valued logic, including fast evaluation of logic functions, and scheduling techniques for node execution. Experiments are performed to show the initial results of our algorithms in this framework.

23.2 Complex Library Mapping for Embedded Software Using Symbolic Algebra [p. 325]
Armita Peymandoust, Tajana Simunic, Giovanni De Micheli

Embedded software designers often use libraries that have been pre-optimized for a given processor to achieve higher code quality. However, using such libraries in legacy code optimization is nontrivial and typically requires manual intervention. This paper presents a methodology that maps algorithmic constructs of the software specification to a library of complex software elements. This library-mapping step is automated by using symbolic algebra techniques. We illustrate the advantages of our methodology by optimizing an algorithmic-level description of an MPEG Layer III (MP3) audio decoder for the Badge4 [2] portable embedded system. During the optimization process we use commercially available libraries with complex elements ranging from simple mathematical functions such as exp to the IDCT routine. We implemented and measured the performance and energy consumption of the MP3 decoder software on Badge4 running the embedded Linux operating system. The optimized MP3 audio decoder runs 300 times faster than the original code obtained from the standards body while consuming 400 times less energy. Since our optimized MP3 decoder runs 3.5 times faster than real-time, additional energy can be saved by using processor frequency and voltage scaling.
Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]: Microprocessor/microcomputer applications, Real-time and embedded systems, Signal processing systems.
General Terms
Algorithms, Performance, Design, Experimentation, Theory.
Keywords
Embedded software optimization, Automated library mapping, Symbolic algebra, Polynomial representation, Computation intensive software.

23.3 Retargetable Binary Utilities [p. 331]
Maghsoud Abbaspour, Jianwen Zhu

Since software is playing an increasingly important role in system-on-chip, retargetable compilation has been an active research area in the last few years. However, the retargeting of equally important downstream system tools, such as assemblers, linkers and debuggers, has either been ignored, or falls short of meeting the requirements of modern programming languages and operating systems. In this paper, we present techniques that can automatically retarget the GNU binutils tool kit, which contains a large array of production-quality downstream tools. Other than having all the advantages enjoyed by open-source software by aligning to a de facto standard, our techniques are systematic, as a result of using a formal model of instruction set architecture (ISA) and application binary interface (ABI); and simple, as a result of leveraging free software to the largest extent.
Categories and Subject Descriptors
D.3.4 [Processors]: Retargetable compilers
General Terms
Design, Languages


Session 24: Applications of Reconfigurable Computing

Chair: Ivo Bolsens
Organizers: Grant E. Martin, Kurt Keutzer
24.1 Exploiting Operation Level Parallelism through Dynamically Reconfigurable Datapaths [p. 337]
Zhining Huang, Sharad Malik

Increasing non-recurring engineering (NRE) and mask costs are making it harder to turn to hardwired Application Specific Integrated Circuit (ASIC) solutions for high performance applications [12]. The volume required to amortize these high costs has been increasing, making it increasingly expensive to afford ASIC solutions for medium volume products. This has led to designers seeking programmable solutions of varying sorts using these so-called programmable platforms. These programmable platforms span a large range from bit-level programmable Field Programmable Gate Arrays (FPGAs), to word-level programmable application-specific, and in some cases even general-purpose processors. The programmability comes with a power and performance overhead. Attempts to reduce this overhead typically involve making some core hardwired ASIC-like logic blocks accessible to the programmable elements. This paper presents one such hybrid solution in this space: a relatively simple processor with a dynamically reconfigurable datapath acting as an accelerating co-processor. This datapath consists of hardwired function units and reconfigurable interconnect. We present a methodology for the design of these solutions and illustrate it with two complete case studies: an MPEG 2 coder, and a GSM coder, to show how significant speedups can be obtained using relatively little hardware. The co-processor can be viewed as a VLIW processor with a single instruction per kernel loop. We compare the efficiency of exploiting the operation level parallelism using classic VLIW processors and this proposed class of dynamically configurable co-processors. This work is part of the MESCAL project, which is geared towards developing design environments for the development of application specific platforms.
Categories and Subject Descriptors
J.6 [COMPUTER-AIDED ENGINEERING]: Computer-aided design (CAD)
General Terms
Design, Performance

24.2 Dynamic Hardware Plugins in an FPGA with Partial Run-time Reconfiguration [p. 343]
Edson L. Horta, John W. Lockwood, David E. Taylor, David Parlour

Tools and a design methodology have been developed to support partial run-time reconfiguration of FPGA logic on the Field Programmable Port Extender. High-speed Internet packet processing circuits on this platform are implemented as Dynamic Hardware Plugin (DHP) modules that fit within a specific region of an FPGA device. The PARBIT tool has been developed to transform and restructure bitfiles created by standard computer aided design tools into partial bitstreams that program DHPs. The methodology allows the platform to hot-swap application-specific DHP modules without disturbing the operation of the rest of the system.
Keywords
FPGA, partial RTR, reconfiguration, hardware, modularity, network, routing, packet, Internet, IP, platform computing
Categories and Subject Descriptors
B.7.2 [Hardware]: Circuits - Design Aids; B.7.1 [Hardware]: Circuits - VLSI; B.4.3 [Hardware]: Input/Output and Data Communications - Interconnections (Subsystems); C.2.1 [Computer Systems Organization]: Computer-Communication Networks - Network Architecture and Design
General Terms
Design, Experimentation

24.3 A Reconfigurable FPGA-Based Readback Signal Generator For Hard-Drive Read Channel Simulator [p. 349]
Jinghuan Chen, Jaekyun Moon, Kia Bazargan

A hard disk readback signal generator designed to provide noise-corrupted signals to a channel simulator has been implemented on a Xilinx Virtex-E FPGA device. The generator simulates pulses sensed by read heads in hard drives. All major distortion and noise processes, such as intersymbol interference, transition noise, electronics noise, head and media nonlinearity, intertrack interference, and write timing error, can be generated according to the statistics and parameters defined by the user. Reconfigurable implementation enables an update of the signal characteristics at runtime. The user also has the flexibility to choose from a set of bitstreams to simulate particular combinations of noise and distortion. Such customized restructuring helps reduce the area consumption and hence virtually increase the capacity of the FPGA device. The time to generate the readback signals has been reduced by four orders of magnitude compared to its software counterpart.


Session 25: New Test Methods Targeting Non-Classical Faults

Chair: Rob Aitken
Organizers: Miron Abramovici, T. M. Mak
25.1 Embedded Software-Based Self-Testing for SoC Design [p. 355]
A. Krstic, W.-C. Lai, L. Chen, K.-T. Cheng, S. Dey

At-speed testing of high-speed circuits is becoming increasingly difficult with external testers due to the growing gap between design and tester performance, the growing cost of high-performance testers, and increasing yield loss caused by inherent tester inaccuracy. Therefore, empowering the chip to test itself seems like a natural solution. Hardware-based self-testing techniques have limitations due to performance and area overhead and problems caused by the application of non-functional patterns. Embedded software-based self-testing has recently become the focus of intense research. In this methodology, the programmable cores are used for on-chip test generation, measurement, response analysis and even diagnosis. After the programmable core on a System-on-Chip (SoC) has been self-tested, it can be reused for testing on-chip buses, interfaces and other non-programmable cores. The advantages of this methodology include at-speed testing, low design-for-testability overhead and application of functional patterns in the functional environment. In this paper, we give a survey and outline the roadmap and challenges of this emerging embedded software-based self-testing paradigm.
Categories and Subject Descriptors
B.8.1 [Integrated Circuits]: Performance and Reliability - reliability, testing and fault-tolerance.
General Terms
Algorithms, Performance, Reliability.
Keywords
VLSI test, SoC test, functional test, microprocessor test.

25.2 A Novel Wavelet Transform Based Transient Current Analysis for Fault Detection and Localization [p. 361]
Swarup Bhunia, Kaushik Roy, Jaume Segura

Transient current (IDD) based testing has often been cited and investigated as an alternative and/or supplement to quiescent current (IDDQ) testing. While the potential of IDD testing for fault detection has been established, there is no known efficient method for fault diagnosis using IDD analysis. In this paper, we present a novel integrated method for fault detection and localization using wavelet transform based IDD waveform analysis. The time-frequency resolution property of the wavelet transform helps us detect as well as localize faults in digital CMOS circuits. Experiments performed on measured data from a fabricated 8-bit shift register and simulation data from more complex circuits show promising results for both detection and localization. The wavelet-based detection method shows higher sensitivity than spectral and time-domain methods. The effectiveness of the localization method in the presence of process variation, measurement noise and a complex power supply network is addressed.
Categories and Subject Descriptors
B.8.2 [Hardware]: Performance and Reliability - Reliability, Testing, and Fault-Tolerance
General Terms
Algorithms, Reliability, Experimentation
Keywords
Transient current (IDD), wavelet transform, fault localization

25.3 Signal Integrity Fault Analysis Using Reduced-Order Modeling [p. 367]
Amir Attarha, Mehrdad Nourani

This paper aims at analysis of signal integrity for the purpose of testing high speed interconnects. This requires taking into account the effect of inputs as well as parasitic RLC elements of the interconnect. To improve the analysis/simulation time in integrity fault testing, we use reduced-order modeling that essentially performs the analysis in the frequency domain. To demonstrate the generality and usefulness of our method, we also discuss its application for test pattern generation targeting signal integrity loss.

25.4 Enhancing Test Efficiency for Delay Fault Testing Using Multiple-Clock Schemes [p. 371]
Jing-Jia Liou, Li-C. Wang, Kwang-Ting Cheng, Jennifer Dworak, M. Ray Mercer, Rohit Kapur, Thomas W. Williams

In conventional delay testing, the test clock is a single pre-defined parameter that is often set to be the same as the system clock. This paper discusses the potential of enhancing test efficiency by using multiple clock frequencies. The intuition behind our work is that for a given set of AC delay patterns, a carefully selected, tighter clock would be more effective at screening out potentially defective chips. Then, by using a smarter test clock scheme and combining it with a second set of AC delay patterns, the overall quality of the AC delay test can be enhanced while the cost of including the second pattern set can be minimized. We demonstrate these concepts through analysis and experiments using a statistical timing analysis framework with defect-injected simulation.
Categories and Subject Descriptors
B.8.1 [Hardware]: Reliability, Testing, and Fault-Tolerance
General Terms
Experimentation, Measurement, Reliability
Keywords
Delay Testing, Statistical Timing Analysis, Transition Fault Model


Session 26: Special Session: How Do You Design a 10M Gate ASIC?

Chair: Ahmed A. Jerraya
Organizers: Ahmed A. Jerraya, Kurt Keutzer
26.1 Going Mobile: The Next Horizon for Multi-million Gate Designs in the Semi-Conductor Industry [p. 375]
Christian Berthet

The complexity of a System-on-Chip design is not only in the millions of transistors packed in a square millimeter. The major challenge for the technical success of an SoC is to make sure that millions of lines of software fit with millions of gates. In this paper, the problems of multi-million gate design are illustrated from the viewpoint of a practical development of a complex digital system done at STMicroelectronics for a GSM/GPRS cellular application.
Categories and Subject Descriptors
C.3 [Computer Systems Organisation]: Special-purpose and application-based systems - real-time and embedded systems.
General Terms
Design.
Keywords
SoC Design, HW/SW co-design.


Session 27: Power Distribution Issues

Chair: Sachin Sapatnekar
Organizers: Abhijit Dharchoudhury, Tadahiro Kuroda
27.1 HiPRIME: Hierarchical and Passivity Reserved Interconnect Macromodeling Engine for RLKC Power Delivery [p. 379]
Yahong Cao, Yu-Min Lee, Tsung-Hao Chen, Charlie Chung-Ping Chen

This paper proposes a general hierarchical analysis methodology, HiPRIME, to efficiently analyze RLKC power delivery systems. After partitioning the circuits into blocks, we develop and apply the IEKS (Improved Extended Krylov Subspace) method to build the Multi-port Norton Equivalent circuits, which transform all the internal sources to Norton current sources at ports. Since there are no active elements inside the Norton circuits, passive or realizable model order reduction techniques such as PRIMA can be applied. To further reduce the top-level hierarchy runtime, we develop a second-level model reduction algorithm and prove its passivity. Experimental results show 400-700X runtime improvement with less than 0.2% error.

27.2 High-Level Current Macro-Model For Power-Grid Analysis [p. 385]
Srinivas Bodapati, Farid N. Najm

We present a frequency domain current macro-modeling technique for capturing the dependence of the block current waveform on its input vectors. The macro-model is based on estimating the Discrete Cosine Transform (DCT) of the current waveform as a function of input vector pair and then taking the inverse transform to estimate the time domain current waveform. The input vector pairs are partitioned according to Hamming distance and a current macro-model is built for each Hamming distance using regression. Regression is done on a set of current waveforms generated for each circuit, using HSPICE. The average relative error in peak current estimation using the current macro-model is less than 20%.
Categories and Subject Descriptors
B.7 [Hardware]: Integrated Circuits - CAD; B.7.2 [Integrated Circuits]: Design Aids - Modeling
General Terms
Algorithms
Keywords
Power grid, Current macro-model, DCT
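
Two ingredients named in the abstract above, partitioning input vector pairs by Hamming distance and moving between a current waveform and its DCT, can be sketched in a few lines. This is only an illustration under stated assumptions: the waveform samples and vector pairs are made-up values, the orthonormal DCT-II is one common convention that the abstract does not specify, and the regression step that predicts DCT coefficients from input vector pairs is omitted entirely.

```python
import math
from collections import defaultdict

def hamming_distance(v1, v2):
    """Number of bit positions in which two input vectors differ."""
    return bin(v1 ^ v2).count("1")

def partition_pairs(pairs):
    """Group input vector pairs by their Hamming distance."""
    groups = defaultdict(list)
    for v1, v2 in pairs:
        groups[hamming_distance(v1, v2)].append((v1, v2))
    return dict(groups)

def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(x):
    """Orthonormal DCT-II of a sampled waveform."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n))
            for i in range(n)]

# Hypothetical sampled block current waveform (arbitrary units).
waveform = [0.0, 1.2, 3.5, 2.0, 0.7, 0.1, 0.0, 0.0]
coeffs = dct(waveform)
recovered = idct(coeffs)  # matches waveform up to rounding error

# Hypothetical 4-bit input vector pairs.
groups = partition_pairs([(0b0000, 0b0001), (0b1010, 0b0101), (0b1100, 0b1101)])
```

In a macro-model of the kind described, one regression model per Hamming-distance group would estimate `coeffs` from the vector pair, and `idct` would then yield the time-domain estimate.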

27.3 Macro-Modeling Concepts For The Chip Electrical Interface [p. 391]
Brian W. Amick, Claude R. Gauthier, Dean Liu

The power delivery network is made up of passive elements in the distribution network, as well as the active transistor loads. A chip typically has three types of power supplies that require attention: core, I/O, and analog. Core circuits consist of digital circuits and have the largest current demand. In addition to all of the system issues/models for the core, modeling the I/O subsystem has the additional requirement of modeling return paths and discontinuities. The analog circuits present yet a different challenge to the macromodeling of the supply network because they place a tight demand on supply variations. This paper presents a design methodology on how to generate macro-models of the entire chip electrical interface. This methodology can be used by the chip, package, and system designers and is being used to design high-reliability servers.
Categories and Subject Descriptors
C.5.3 [Computer System Implementation]: VLSI Systems.
General Terms
Performance, Design, Reliability.
Keywords
VLSI Power Distribution, Inductance, High Speed Microprocessor Design, Analog and I/O Power Delivery.

27.4 Modeling and Analysis of Regular Symmetrically Structured Power/Ground Distribution Networks [p. 395]
Hui Zheng, Lawrence T. Pileggi

In this paper we propose a novel and efficient methodology for modeling and analysis of regular symmetrically-structured power/ground distribution networks. The modeling of inductive effects is simplified by a folding technique which exploits the symmetry in the power/ground distribution. Furthermore, employment of susceptance [10,11] (inverse of inductance) models enables further simplification of the analysis, and is also shown to preserve the symmetric positive definiteness of the circuit equations. Experimental results demonstrate that our approach can provide up to 8x memory savings and up to 10x speedup over the already efficient simulation based on the original sparse susceptance matrix without loss of accuracy. Importantly, this work demonstrates that by employing limited regularity, one can create excellent power/ground distribution designs that are dramatically simpler to analyze, and therefore amenable to more powerful global design optimization.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - verification
General Terms
Design, Verification
Keywords
Power/Ground Distribution, Susceptance, Folding Technique, Design Regularity

27.5 Clock Tree Optimization in Synchronous CMOS Digital Circuits for Substrate Noise Reduction Using Folding of Supply Current Transients [p. 399]
Mustafa Badaroglu, Kris Tiri, Stéphane Donnay, Piet Wambacq, Ingrid Verbauwhede, Georges Gielen, Hugo De Man

In a synchronous clock distribution network with zero latencies, digital circuits switch simultaneously on the clock edge and therefore generate substrate noise due to the sharp peaks in the supply current. We present a novel methodology that optimizes the clock tree for less substrate noise generation by using statistical single-cycle supply current profiles computed for every clock region, taking the timing constraints into account. Our methodology is novel as it uses an error-driven compressed data set during the optimization over a number of clock regions specified for a significant reduction in substrate noise. It also produces a quality analysis of the computed latencies as a function of the clock skew. The experimental results show a more than 2x reduction of substrate noise generation from circuits having four clock regions whose latencies are optimized.
Categories and Subject Descriptors
B.5.1 [Register-Transfer-Level Implementation]: Design - datapath design. B.6.1 [Logic Design]: Design Styles - sequential circuits. B.6.3 [Logic Design]: Design Aids - optimization, simulation. B.7.1 [Integrated Circuits]: Types and Design Styles - VLSI. B.8.2 [Performance and Reliability]: Performance Analysis and Design Aids.
General Terms
Algorithms, Design, Performance, Reliability.
Keywords
Substrate noise, di/dt noise, low-noise digital design, clock distribution networks, supply current shaping and optimization.


Session 28: Advances in Synthesis

Chair: Marek Perkowski
Organizers: Soha M. Hassoun, Yusuke Matsunaga
28.1 Resynthesis and Peephole Transformations for the Optimization of Large-Scale Asynchronous Systems [p. 405]
Tiberiu Chelcea, Steven M. Nowick

Several approaches have been proposed for the syntax-directed compilation of asynchronous circuits from high-level specification languages, such as Balsa and Tangram. Both compilers have been successfully used in large real-world applications; however, in practice, these methods suffer from significant performance overheads due to their reliance on straightforward syntax-directed translation. This paper introduces a powerful new set of transformations, and an extended channel-based language to support them, which can be used as an optimizing back-end for Balsa. The transforms described in this paper fall into two categories: resynthesis and peephole. The proposed optimization techniques have been fully integrated into a comprehensive asynchronous CAD package, Balsa. Experimental results on several substantial design examples indicate significant performance improvements.

28.2 Design of Asynchronous Circuits by Synchronous CAD Tools [p. 411]
Alex Kondratyev, Kelvin Lwin

The roadblock to wide acceptance of asynchronous methodology is poor CAD support. Current asynchronous design tools require a significant re-education of designers, and their features are far behind synchronous commercial tools. This paper considers a particular subclass of asynchronous circuits (Null Convention Logic or NCL) and suggests a design flow that is based entirely on commercial CAD tools. This new design flow shows a significant area improvement over known flows based on NCL.

28.3 Implementing Asynchronous Circuits using a Conventional EDA Tool-Flow [p. 415]
Christos P. Sotiriou

This paper presents an approach by which asynchronous circuits can be realised with a conventional EDA tool flow and conventional standard cell libraries. Based on a gate-level asynchronous circuit implementation technique, direct-mapping, and by identifying the delay constraints and exploiting certain EDA tool features, this paper demonstrates that a conventional EDA tool flow can be used to describe, place, route and timing-verify asynchronous circuits.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles
General Terms
Design, Experimentation, Standardization
Keywords
Asynchronous, EDA, Tool-Flow

28.4 Transformation Rules for Designing CNOT-based Quantum Circuits [p. 419]
Kazuo Iwama, Yahiko Kambayashi, Shigeru Yamashita

This paper gives a simple but nontrivial set of local transformation rules for Control-NOT(CNOT)-based combinatorial circuits. It is shown that this rule set is complete, namely, for any two equivalent circuits, S1 and S2, there is a sequence of transformations, each of them in the rule set, which changes S1 to S2. Our motivation is to use this rule set for developing a design theory for quantum circuits whose Boolean logic parts should be implemented by CNOT-based circuits. As a preliminary example, we give a design procedure based on our transformation rules which reduces the cost of CNOT-based circuits.
Categories and Subject Descriptors
B.6.m [LOGIC DESIGN]: Miscellaneous
General Terms
Design, Theory
Keywords
Quantum Circuit, CNOT Gate, Local Transformation Rules
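
The notion of two equivalent circuits being linked by local rewrites can be illustrated with a small sketch. The code below is not the paper's rule set: it is restricted to single-control CNOT gates acting on classical bits, and the names `run` and `equivalent` are hypothetical. It checks two well-known identities by exhaustive simulation: two identical adjacent CNOTs cancel, and a CNOT pair sharing a wire can be reordered at the cost of one extra gate.

```python
from itertools import product

def run(circuit, bits):
    """Apply a list of (control, target) CNOT gates, in order, to a tuple of bits."""
    state = list(bits)
    for control, target in circuit:
        state[target] ^= state[control]  # target flips when control is 1
    return tuple(state)

def equivalent(c1, c2, n):
    """Compare two circuits on all 2**n classical inputs."""
    return all(run(c1, x) == run(c2, x) for x in product((0, 1), repeat=n))

# Identity 1: the same CNOT applied twice is the identity circuit.
cancel_lhs = [(0, 1), (0, 1)]
cancel_rhs = []

# Identity 2: CNOT(0->1) then CNOT(1->2) equals
# CNOT(1->2), CNOT(0->2), CNOT(0->1).
commute_lhs = [(0, 1), (1, 2)]
commute_rhs = [(1, 2), (0, 2), (0, 1)]
```

Exhaustive comparison is exponential in the number of wires, so it only serves to validate a candidate rule on small cases; a complete rule set, as claimed in the abstract, is what makes rewriting usable without such checks.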

28.5 Fast Three-Level Logic Minimization Based on Autosymmetry [p. 425]
Anna Bernasconi, Valentina Ciriani, Fabrizio Luccio, Linda Pagli

Sum of Pseudoproducts (SPP) is a three level logic synthesis technique developed in recent years. In this framework we exploit the "regularity" of Boolean functions to decrease minimization time. Our main results are: 1) the regularity of a Boolean function f of n variables is expressed by its autosymmetry degree k (with 0 <= k <= n), where k = 0 means no regularity (that is, we are not able to provide any advantage over standard synthesis); 2) for k >= 1 the function is autosymmetric, and a new function fk is identified in polynomial time: fk is "equivalent" to, but smaller than f, and depends on n - k variables only; 3) given a minimal SPP form for fk, a minimal SPP form for f is built in linear time; 4) experimental results show that 61% of the functions in the classical ESPRESSO benchmark suite are autosymmetric, and the SPP minimization time for them is critically reduced; we can also solve cases otherwise practically intractable. We finally discuss the role and meaning of autosymmetry.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Automatic Synthesis, Optimization.
General Terms
Algorithms, Design, Theory.
Keywords:
Three-Level Logic, Synthesis, Autosymmetry.
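
The abstract does not define the autosymmetry degree. The sketch below assumes the definition commonly used in the autosymmetry literature: f is closed under a vector a if f(x XOR a) = f(x) for every x, the set of such vectors forms a linear space, and k is its dimension. The function name `autosymmetry_degree` is hypothetical, and the brute-force enumeration is exponential in n, unlike the polynomial-time procedure the abstract refers to.

```python
from itertools import product

def autosymmetry_degree(f, n):
    """Return k such that exactly 2**k vectors a satisfy f(x XOR a) == f(x) for all x.

    f takes a tuple of n bits and returns 0 or 1 (assumed interface).
    """
    points = list(product((0, 1), repeat=n))

    def xor(x, a):
        return tuple(xi ^ ai for xi, ai in zip(x, a))

    # Vectors under which f is invariant; they form a subgroup, so the
    # count is always a power of two.
    closed = [a for a in points if all(f(xor(x, a)) == f(x) for x in points)]
    return len(closed).bit_length() - 1

parity = lambda x: x[0] ^ x[1] ^ x[2]   # invariant under even-weight vectors
conj = lambda x: x[0] & x[1] & x[2]     # invariant only under the zero vector
```

Under this assumed definition, three-input parity has degree 2 and three-input AND has degree 0, matching the abstract's reading of k = 0 as "no regularity".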


Session 29: Analog Synthesis & Design Methodology

Chair: C.-J. Richard Shi
Organizers: Joel R. Phillips, Kartikeya Mayaram
29.1 An Efficient Optimization-based Technique to Generate Posynomial Performance Models for Analog Integrated Circuits [p. 431]
Walter Daems, George Gielen, Willy Sansen

This paper presents a new direct-fitting method to generate posynomial response surface models with arbitrary constant exponents for linear and nonlinear performance parameters of analog integrated circuits. Posynomial models enable the use of efficient geometric programming techniques for circuit sizing and optimization. The automatic generation avoids the time-consuming nature and inaccuracies of handcrafted analytic model generation. The technique is based on the fitting of posynomial model templates to numerical data from SPICE simulations. Attention is paid to estimating the relative "goodness-of-fit" of the generated models. Experimental results illustrate the significantly better accuracy of the new approach.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids; B.8.2 [Performance and Reliability]: Performance Analysis and Design Aids; I.6.5 [Simulation and Modeling]: Model Development
General Terms
Performance, Design, Algorithms
Keywords
Performance Modeling for Analog Circuits, Posynomial Response Surface Modeling, Geometric Programming
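
For readers unfamiliar with the model class, a posynomial is a sum of terms c_i * prod_j x_j^(a_ij) with positive coefficients c_i, positive variables x_j, and real exponents a_ij. The sketch below evaluates such a model and fits the simplest case, a single-term monomial in one variable, by linear regression in log-log space. This is a deliberate simplification and not the direct-fitting method of the paper; the names `posynomial` and `fit_monomial` and the sample data are hypothetical.

```python
import math

def posynomial(coeffs, exponents, x):
    """Evaluate sum_i c_i * prod_j x_j**a_ij for positive c_i and x_j."""
    return sum(c * math.prod(xj ** a for xj, a in zip(x, row))
               for c, row in zip(coeffs, exponents))

def fit_monomial(xs, ys):
    """Fit y ~ c * x**a to positive scalar samples.

    Taking logs gives log y = log c + a * log x, an ordinary
    least-squares line fit.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    a = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    c = math.exp(my - a * mx)
    return c, a

# Synthetic samples standing in for simulation data (not SPICE output).
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** -0.5 for x in xs]
c, a = fit_monomial(xs, ys)
```

A multi-term posynomial cannot be linearized this way, which is one reason a dedicated fitting method with a goodness-of-fit estimate is needed.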

29.2 Remembrance of Circuits Past: Macromodeling by Data Mining in Large Analog Design Spaces [p. 437]
Hongzhou Liu, Amit Singhee, Rob A. Rutenbar, L. Richard Carley

The introduction of simulation-based analog synthesis tools creates a new challenge for analog modeling. These tools routinely visit 10^3 to 10^5 fully simulated circuit solution candidates. What might we do with all this circuit data? We show how to adapt recent ideas from large-scale data mining to build models that capture significant regions of this visited performance space, parameterized by variables manipulated by synthesis, trained by the data points visited during synthesis. Experimental results show that we can automatically build useful nonlinear regression models for large analog design spaces.
CATEGORIES AND SUBJECT DESCRIPTORS
B.7.2 [Integrated Circuits]: Design aids - verification
GENERAL TERMS
Algorithms

29.3 Optimal Design of Delta-Sigma ADCs by Design Space Exploration [p. 443]
Ovidiu Bajdechi, Georges Gielen, Johan H. Huijsing

An algorithm for architecture-level exploration of delta-sigma ADC design space is presented. The algorithm finds an optimal solution by exhaustively exploring both single-loop and cascaded architectures, with single-bit or multi-bit quantizer, for a range of oversampling ratios. A fast filter-level step evaluates the performance of all loop-filter topologies and passes the accepted solutions to the architecture-level optimization step which maps the filters on feasible architectures and evaluates their performance. The power consumption of each accepted architecture is estimated and the best top-ten solutions in terms of the ratio of peak SNDR versus power consumption are further optimized for yield. Experimental results for two different design targets are presented. They show that previously published solutions are among the best architectures for a given target but that better solutions can be designed.
Categories and Subject Descriptors
J.6 [Computer Applications]: Computer-Aided Engineering
General Terms
Design
Keywords
ADC, CAD, delta-sigma

29.4 Systematic Design of a 200 MS/s 8-bit Interpolating/Averaging A/D Converter [p. 449]
J. Vandenbussche, K. Uyttenhove, E. Lauwers, M. Steyaert, G. Gielen

The systematic design of a high-speed, high-accuracy Nyquist-rate A/D converter is proposed. The presented design methodology covers the complete flow and is supported by software tools. A generic behavioral model is used to explore the A/D converter's specifications during high-level design and exploration. The inputs to the flow are the specifications of the A/D converter and the technology process. The result is a generated layout and the corresponding extracted behavioral model. The approach has been applied to a real-life test case, where a Nyquist-rate 8-bit 200 MS/s 4-2 interpolating/averaging A/D converter was developed for a WLAN application.
Categories and Subject Descriptors
B.7.m Integrated Circuits: miscellaneous
General Terms
Design
Keywords
A/D converters, Interpolating, Flash, Simulated Annealing.


Session 30: Low-Power Physical Design

Chair: Massoud Pedram
Organizers: Chaitali Chakrabarti, Sarma Vrudhula
30.1 Petri Net Modeling of Gate and Interconnect Delays for Power Estimation [p. 455]
Ashok K. Murugavel, N. Ranganathan

In this paper, a new type of Petri net, called the Hierarchical Colored Hardware Petri net (HCHPN), is proposed to model real-delay switching activity for power estimation. The logic circuit is converted into an HCHPN and simulated as a Petri net to obtain the switching activity estimate and thus the power values. The method is accurate and significantly faster than other simulation-based methods. The HCHPN yields an average error of 4.9% with respect to Hspice for the ISCAS '85 benchmark circuits. The per-pattern simulation time is about 46 times shorter than that of PowerMill.

30.2 Power Estimation in Global Interconnects and its Reduction Using a Novel Repeater Optimization Methodology [p. 461]
Pawan Kapur, Gaurav Chandra, Krishna C. Saraswat

The purpose of this work is twofold: first, to quantify and establish future trends for the dynamic power dissipation in global wires of high performance integrated circuits; second, to develop a novel and efficient delay-power tradeoff formulation for minimizing power due to repeaters, which can otherwise constitute 50% of total global wire power dissipation. Using the closed form solutions from this formulation, power savings of 50% on repeaters are shown with minimal delay penalties of about 5% at the 50 nm technology node. These closed-form, analytical solutions provide a fast and powerful tool for designers to minimize power.

30.3 Low-Swing Clock Domino Logic Incorporating Dual Supply and Dual Threshold Voltage [p. 467]
Seong-Ook Jung, Ki-Wook Kim, Sung-Mo Kang

High-speed domino logic now prevails in the performance-critical blocks of a chip. A Low Voltage Swing Clock (LVSC) domino logic family is developed for substantial dynamic power saving. To boost the transition speed in the proposed circuitry, a well-established dual threshold voltage technique is exploited. The dual supply voltage technique in the LVSC domino logic is geared to reduce power consumption in the clock tree and logic gates effectively. A Delay Constrained Power Optimization (DCPO) algorithm allocates low supply voltage to logic gates such that the dynamic power consumed by logic gates is minimized. Delay time variations due to gate-to-source voltage change and input signal arrival time difference are considered for accurate timing analysis in DCPO.
Categories and Subject Descriptors
B.6 [Hardware]: Logic Design; B.7 [Hardware]: Integrated Circuits
General Terms
Design
Keywords
domino logic, low swing clock, dual supply voltage, dual threshold voltage, low power

30.4 DRG-Cache: A Data Retention Gated-Ground Cache for Low Power [p. 473]
Amit Agarwal, Hai Li, Kaushik Roy

In this paper we propose a novel integrated circuit and architectural level technique to reduce leakage power consumption in high performance cache memories using a single Vt (transistor threshold voltage) process. We utilize the concept of Gated-Ground [5] (an NMOS transistor inserted between the Ground line and the SRAM cell) to achieve a reduction in leakage energy without significantly affecting performance. Experimental results on gated-Ground caches show that data is retained (DRG-Cache) even if the memory is put in the stand-by mode of operation. Data is restored when the gated-Ground transistor is turned on. Turning off the gated-Ground transistor in turn gives a large reduction in leakage power. This technique requires no extra circuitry; the row decoder itself can be used to control the gated-Ground transistor. The technique is applicable to data and instruction caches as well as different levels of cache hierarchy such as the L1, L2, or L3 caches. We fabricated a test chip in TSMC 0.25 technology to show the data retention capability and the cell stability of DRG-cache. Our simulation results on 100nm and 70nm processes (Berkeley Predictive Technology Model) show 16.5% and 27% reduction in consumed energy in L1 cache and 50% and 47% reduction in L2 cache with less than 5% impact on execution time and within 4% increase in area overhead.
Categories and Subject Descriptors
B.3.2 [Memory Structure]: Design Styles --- Cache memories; B.3.1 [Memory Structure]: Semiconductor Memories --- Static memory (SRAM); B.7.1 [Integrated Circuits]: Types and Design Styles --- Memory technology.
General Terms: Design, Performance and Experimentation.
Keywords: Gated-ground, SRAM, low leakage cache.


Session 31: PANEL: Unified Tools for SoC Embedded Systems: Mission Critical, Mission Impossible or Mission Irrelevant?

Chair: Gary Smith
Organizers: Daya Nadamuni, Sharad Malik
Panelists: Rick Chapman, John Fogelin, Kurt Keutzer, Grant Martin, Brian Bailey [p. 479]

As designers struggle with developing application solutions consisting of complex systems-on-a-chip with a significant software component, they must deal with a diversity of tools with very different philosophies and assumptions, to help manage this task. On one hand are tools which assume a clean separation between the hardware and software parts of the design with an abstraction of the hardware available for software development. On the other hand are tools that try to handle the hardware and software parts of the design concurrently. What drives these different philosophies? Which of these is critical for emerging system designs? Which of these is viable going forward? Our panel of experts consisting of designers, embedded software tool providers, system design tool providers and an academic will answer these challenging questions.


Session 32: Multi-Voltage, Multi-Threshold Design

Chair: Rajendran Panda
Organizers: Renu Mehra, Sarma Vrudhula
32.1 Dynamic and Leakage Power Reduction in MTCMOS Circuits Using an Automated Efficient Gate Clustering Technique [p. 480]
Mohab Anis, Shawki Areibi, Mohamed Mahmoud, Mohamed Elmasry

Reducing power dissipation is one of the principal concerns in VLSI design today. Scaling causes subthreshold leakage currents to become a large component of total power dissipation. This paper presents two techniques for efficient gate clustering in MTCMOS circuits by modeling the problem via Bin-Packing (BP) and Set-Partitioning (SP) techniques. An automated solution is presented, and both techniques are applied to six benchmarks to verify functionality. Both methodologies offer significant reduction in both dynamic and leakage power over previous techniques during the active and standby modes respectively. Furthermore, the SP technique takes the circuit's routing complexity into consideration, which is critical for Deep Sub-Micron (DSM) implementations. Sufficient performance is achieved, while significantly reducing the overall sleep transistor area. Results obtained indicate that our proposed techniques can achieve on average 90% savings for leakage power and 15% savings for dynamic power.
Categories and Subject Descriptors: B.7.1 [Integrated Circuits]: Types and Design Styles
General Terms: Design
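
The abstract does not say what quantity is packed or which heuristic is used. As a rough illustration of the bin-packing view only, the sketch below assumes each gate has a peak current, each sleep transistor can carry a fixed current capacity, and gates are grouped with the classic first-fit-decreasing heuristic. The function name `first_fit_decreasing`, the gate names, and the numbers are all hypothetical.

```python
def first_fit_decreasing(gate_currents, capacity):
    """Group gates into clusters whose summed current stays within capacity.

    gate_currents: dict mapping gate name -> peak current (arbitrary units).
    capacity: current one sleep transistor is assumed to support.
    """
    clusters = []  # each entry: [remaining capacity, [gate names]]
    # Place the largest gates first; ties keep their input order.
    for name, current in sorted(gate_currents.items(), key=lambda kv: -kv[1]):
        if current > capacity:
            raise ValueError(f"{name} exceeds the capacity of one sleep transistor")
        for cluster in clusters:
            if cluster[0] >= current:       # first cluster with enough room
                cluster[0] -= current
                cluster[1].append(name)
                break
        else:                               # no cluster fits: open a new one
            clusters.append([capacity - current, [name]])
    return [names for _, names in clusters]

# Illustrative data, not taken from the paper's benchmarks.
currents = {"g1": 5, "g2": 4, "g3": 3, "g4": 3, "g5": 1}
clusters = first_fit_decreasing(currents, capacity=8)
```

With these values the heuristic fills two clusters exactly, so two sleep transistors would suffice under the stated assumptions. Fewer clusters means fewer sleep transistors, which is the area saving the abstract refers to.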

32.2 Total Power Optimization By Simultaneous Dual-Vt Allocation and Device Sizing in High Performance Microprocessors [p. 486]
Tanay Karnik, Yibin Ye, James Tschanz, Liqiong Wei, Steven Burns, Venkatesh Govindarajulu, Vivek De, Shekhar Borkar

We describe various design automation solutions for design migration to a dual-Vt process technology. We include the results of a Lagrangian Relaxation based tool, iSTATS, and a heuristic iterative optimization flow. Joint dual-Vt allocation and sizing reduces total power by 10+% compared with Vt allocation alone, and by 25+% compared with pure sizing methods. The heuristic flow requires 5x larger computation runtime than iSTATS due to its iterative nature.
Categories and Subject Descriptors
B.7 INTEGRATED CIRCUITS
B.7.1 Types and Design Styles Microprocessors and microcomputers, VLSI.
General Terms
Algorithms, Performance, Design, Experimentation, Verification.
Keywords
Dual-Vt design, multiple threshold, sizing, optimization.

32.3 An Optimal Voltage Synthesis Technique for a Power-Efficient Satellite Application [p. 492]
Dong-In Kang, Jinwoo Suh, Stephen P. Crago

This paper presents an optimal voltage synthesis technique for a satellite application to maximize system performance subject to an energy budget. A period of a satellite's orbit is partitioned into several independent regions with different characteristics such as type of computation, importance, performance requirements, and energy consumption. Given a periodic energy recharge model, optimal voltages for the regions are synthesized such that the overall performance is maximized within the energy budget in the period.
Categories and Subject Descriptors
C.4 [PERFORMANCE OF SYSTEMS] Design studies, Modeling techniques, Performance attributes.
General Terms
Algorithms, Management, Performance, Design.
Keywords
Power-aware design, power-efficient design, satellite application, queueing.


Session 33: Advanced Simulation Techniques

Chair: L. Miguel Silveira
Organizers: Georges G. Gielen, Kartikeya Mayaram
33.1 Fast and Accurate Behavioral Simulation of Fractional-N Frequency Synthesizers and other PLL/DLL Circuits [p. 498]
Michael H. Perrott

Techniques for fast and accurate simulation of fractional-N synthesizers at a detailed behavioral level are presented. The techniques allow a uniform time step to be used for the simulator, and can be applied to a variety of phase locked loop (PLL) and delay locked loop (DLL) circuits beyond fractional-N synthesizers, as well as to a variety of simulation frameworks such as Verilog and Matlab. Simulated results from a custom C++ simulator are shown to compare well to measured results from a prototype fractional-N synthesizer using a Delta-Sigma modulator to dither its divide value.
Categories and Subject Descriptors
I.6.5 [Simulation and Modeling]: Model Development
General Terms
Algorithms
Keywords
fractional-N, frequency, synthesizer, sigma, delta, PLL, DLL
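
The idea of dithering the divide value can be shown with the simplest possible modulator. The sketch below assumes a first-order accumulator that switches the divider between N and N + 1 so that the long-run average equals N plus the desired fraction. The abstract does not state the modulator order used in the prototype, and the function name `divide_sequence` and its parameters are hypothetical; this is not the author's simulator.

```python
def divide_sequence(n_int, frac_num, frac_den, length):
    """Emit n_int or n_int + 1 each reference cycle.

    The accumulator adds frac_num every cycle and overflows at frac_den,
    so the mean divide value tends to n_int + frac_num / frac_den.
    """
    acc = 0
    out = []
    for _ in range(length):
        acc += frac_num
        if acc >= frac_den:          # overflow: divide by N + 1 this cycle
            acc -= frac_den
            out.append(n_int + 1)
        else:
            out.append(n_int)
    return out

# Target divide ratio 10.25 over eight reference cycles.
seq = divide_sequence(10, 1, 4, 8)
mean = sum(seq) / len(seq)
```

A first-order accumulator produces a strictly periodic pattern, which shows up as spurious tones; higher-order modulators are typically used to spread that energy, at the cost of a wider range of divide values.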

33.2 Time-domain Steady-state Simulation of Frequency-Dependent Components Using Multi-interval Chebyshev Method [p. 504]
Baolin Yang, Joel Phillips

Simulation of RF circuits often demands analysis of distributed component models that are described via frequency-dependent multiport Y, Z, or S parameters. Frequency-domain methods such as harmonic balance are able to handle these components without difficulty, while