All times in Pacific Daylight Time (PDT = GMT-7). Central Europe (CEST) = PDT + 9. Taiwan/China = PDT + 15.
Across the three days for ISPD 2023, we have 3 keynotes, 13 accepted papers, 22 invited talks, one panel on Wednesday with 5 panelists, 3 speakers with longer talks for Professor Marek-Sadowska's commemorative session, and finally the ISPD 2023 contest results.
Papers are available at SIGDA and the ACM Digital Library.
Videos are available through the links in the program below or at the ISPD 2023 YouTube channel.
6:00 - 8:00 pm PDT. Sunday Social Chit-Chat and Welcome
Meet-and-greet cocktail hour on Gather.Town with the general chair of the organizing committee, David Chinnery (Siemens EDA).
7:00 - 8:00 am PDT. Opening Session and First Keynote
Chair: David Chinnery (Siemens EDA), [opening slides], [opening video]
"Automated Design of Chiplets", Alberto Sangiovanni-Vincentelli (University of California at Berkeley), [slides], [video]
Chiplet-based designs have gained recognition as a promising alternative to monolithic SoCs due to their lower manufacturing costs, improved re-usability, and optimized technology specialization. Despite progress made in various related domains, the design of chiplets remains largely reliant on manual processes. In this paper, we provide an examination of the historical evolution of chiplets, encompassing a review of crucial design considerations and a synopsis of recent advancements in relevant fields. Further, we identify and examine the opportunities and challenges in the automated design of chiplets. To further demonstrate the potential of this nascent area, we present a novel task that showcases the promising future of automated chiplet design.
8:00 - 8:15 am PDT. Break
8:15 - 9:30 am PDT. Session on Routing
Chair: Yih-Lang Li (National Yang Ming Chiao Tung University)
1. "FastPass: Fast Pin Access Analysis with Incremental SAT Solving", Fangzhou Wang, Jinwei Liu, Evangeline F. Y. Young (The Chinese University of Hong Kong), best paper, [slides], [overview video], [video]
Pin access analysis is a critical step in detailed routing. With complicated design rules and pin shapes, efficient and accurate pin accessibility evaluation is desirable in many physical design scenarios. To this end, we present FastPass, a fast and robust pin access analysis framework, which first generates design rule checking (DRC)-clean pin access route candidates for each pin, pre-computes incompatible pairs of routes, and then uses incremental SAT solving to find an optimized pin access scheme. Experimental results on the ISPD 2018 benchmarks show that FastPass produces DRC-clean pin access schemes for all cases while being 14.7× faster on average than the best-known pin access analysis framework.
2. "Pin Access-Oriented Concurrent Detailed Routing", Yun-Jhe Jiang, Shao-Yun Fang (National Taiwan University of Science and Technology), [slides], [overview video], [video]
Due to continuously shrunk feature sizes and increased design complexity, the difficulty in pin access becomes one of the most critical challenges in large-scale full-chip routing. State-of-the-art pin access-aware detailed routing techniques suffer from either the ordering problem of the sequential routing scheme or the inflexibility of pre-determining an access point for each pin. Some other routing-related studies create pin extensions with Metal-2 metal segments to optimize pin accessibility; however, this strategy may not be practical without considering the contemporary routing flow. This paper presents a pin access-oriented concurrent detailed routing approach conducted after the track assignment stage. The core detailed routing engine is based on an integer linear programming (ILP) formulation, which has lower complexity and can flexibly tackle multi-pin nets compared to an existing formulation. Besides, to maximize the free routing resource and to keep the problem size tractable, a pre-processing flow trimming redundant metals and inserting assistant metals is developed. The experimental results show that compared to a state-of-the-art academic router, the proposed concurrent scheme can effectively derive good results with fewer design rule violations and less runtime.
3. "Reinforcement Learning Guided Detailed Routing for Custom Circuits", Hao Chen, Kai-Chieh Hsu, Walker Turner, Po-Hsuan Wei, Keren Zhu, David Pan, Haoxing Ren (University of Texas at Austin, Princeton University, NVIDIA), [slides], [overview video], [video]
Detailed routing is the most tedious and complex procedure in design automation and has become a determining factor in layout automation in advanced manufacturing nodes. Despite continuing advances in custom integrated circuit (IC) routing research, industrial custom layout flows remain heavily manual due to the high complexity of the custom IC design problem. Besides conventional design objectives such as wirelength minimization, custom detailed routing must also accommodate additional constraints (e.g., path-matching) across the analog/mixed-signal (AMS) and digital domains, making an already challenging procedure even more so. This paper presents a novel detailed routing framework for custom circuits that leverages deep reinforcement learning to optimize routing patterns while considering custom routing constraints and industrial design rules. Comprehensive post-layout analyses based on industrial designs demonstrate the effectiveness of our framework in dealing with the specified constraints and producing sign-off-quality routing solutions.
4. "Voltage-Drop Optimization Through Insertion of Extra Stripes To A Power Delivery Network", Jai Ming Lin, Yu Tien Chen, Yang Tai Kung, Hao Jia Lin (National Cheng Kung University), [slides], [overview video], [video]
As design complexity increases, power delivery network (PDN) optimization becomes an increasingly important step in modern designs. To construct a robust PDN, most classic PDN optimization methods focus on adjusting the dimensions of power stripes. However, this approach becomes infeasible when voltage-violation regions also suffer severe routing congestion. Hence, this paper proposes a careful procedure to insert additional power stripes that reduce voltage violations while maintaining routability. First, regions with high IR drop are identified to reveal the locations that need more current. Then, we solve a minimum-cost flow problem to find the topologies of power delivery paths (PDPs) from the power sources to these regions and determine the widths of the edges in each PDP so that enough current can be delivered. Moreover, vertical power stripes (VPSs for short) are inserted by dynamic programming at locations with less routing congestion and severe voltage violations, reducing the risk of degrading routability. Finally, more wires are inserted into high IR-drop regions if voltage violations remain. Experimental results show that our method uses much less routing resource and induces less routing congestion while meeting the IR-drop constraint in industrial designs.
5. "NVCell 2: Routability-Driven Standard Cell Layout in Advanced Nodes with Lattice Graph Routability Model", Mark Ho (NVIDIA), [slides], [overview video], [video]
Standard cells are essential components of modern digital circuit designs. With process technologies advancing beyond the 5nm node, more routability issues have arisen due to the decreasing number of routing tracks, increasing number and complexity of design rules, and strict patterning rules. Automatic standard cell synthesis tools are struggling to design cells with severe routability issues. In this paper, we propose a routability-driven standard cell synthesis framework using a novel pin density aware congestion metric, lattice graph routability modelling approach, and dynamic external pin allocation methodology to generate routability optimized layouts. On a benchmark of 94 complex and hard-to-route standard cells, NVCell 2 improves the number of routable and LVS/DRC clean cell layouts by 84.0% and 87.2%, respectively. NVCell 2 can generate 98.9% of cells LVS/DRC clean, with 13.9% of the cells having smaller area, compared to an industrial standard cell library with over 1000 standard cells.
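The core selection step of a pin access formulation like the one in talk 1 of this session can be illustrated with a toy sketch: choose one DRC-clean candidate route per pin while avoiding pre-computed incompatible pairs. This is a plain backtracking stand-in with hypothetical data and names, not the authors' incremental-SAT implementation:

```python
def assign_pin_access(candidates, incompatible):
    """Pick one access route per pin so that no chosen pair is incompatible.

    candidates: {pin: [route_id, ...]}
    incompatible: set of frozensets of conflicting route_ids.
    Returns {pin: route_id} or None. A toy backtracking stand-in for the
    SAT-based selection described in the abstract.
    """
    pins = list(candidates)

    def compatible(route, chosen):
        return all(frozenset((route, r)) not in incompatible
                   for r in chosen.values())

    def solve(i, chosen):
        if i == len(pins):
            return dict(chosen)
        for route in candidates[pins[i]]:
            if compatible(route, chosen):
                chosen[pins[i]] = route
                result = solve(i + 1, chosen)
                if result is not None:
                    return result
                del chosen[pins[i]]
        return None  # no conflict-free selection exists

    return solve(0, {})

# Hypothetical example: two pins with routes a1/a2 and b1/b2; a1 conflicts with b1.
cands = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
bad = {frozenset(("a1", "b1"))}
print(assign_pin_access(cands, bad))  # {'A': 'a1', 'B': 'b2'}
```

An incremental SAT solver plays the same role at scale: one selector variable per candidate route, "exactly one per pin" and "not both" clauses, with clauses added incrementally as conflicts are discovered.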
9:30 - 9:45 am PDT. Break
9:45 - 10:30 am PDT. Session 1 on 3D ICs, heterogeneous integration, and packaging
Chair: Wen-Hao Liu (Cadence Design Systems)
1. "FXT-Route: Efficient High-Performance PCB Routing with Crosstalk Reduction Using Spiral Delay Lines", Meng Lian, Yushen Zhang, Mengchu Li, Tsun-Ming Tseng, Ulf Schlichtmann (Technical University of Munich), [slides], [overview video], [video]
In high-performance printed circuit boards (PCBs), adding serpentine delay lines is the most prevalent delay-matching technique to balance the delays of time-critical signals. Serpentine topology, however, can induce simultaneous accumulation of the crosstalk noise, resulting in erroneous logic gate triggering and speed-up effects. The state-of-the-art approach for crosstalk alleviation achieves waveform integrity by enlarging wire separation, resulting in an increased routing area. We introduce a method that adopts spiral delay lines for delay matching to mitigate the speed-up effect by spreading the crosstalk noise uniformly in time. Our method avoids possible routing congestion while achieving a high density of transmission lines. We implement our method by constructing a mixed-integer-linear programming (MILP) model for routing and a quadratic programming (QP) model for spiral synthesis. Experimental results demonstrate that our method requires, on average, 31% less routing area than the original design. In particular, compared to the state-of-the-art approach, our method can reduce the magnitude of the crosstalk noise by at least 69%.
2. "On Legalization of Die Bonding Bumps and Pads for 3D ICs", Sung-Kyu Lim (Georgia Institute of Technology), [slides], [overview video], [video]
State-of-the-art 3D IC Place-and-Route flows were designed with older technology nodes and aggressive bonding pitch assumptions. As a result, these flows fail to honor the width and spacing rules for the 3D vias with realistic pitch values. We propose a critical new 3D via legalization stage during routing to reduce such violations. A force-based solver and a bipartite-matching algorithm with Bayesian optimization are presented as viable legalizers and are compatible with various process nodes, bonding technologies, and partitioning types. With the modified 3D routing, we reduce the 3D via violations by more than 10× with zero impact on performance, power, or area.
3. "Reshaping System Design in 3D Integration: Perspectives and Challenges", Hung-Ming Chen (National Yang Ming Chiao Tung University), [slides], [overview video], [video]
In this paper, we depict modern system design methodologies via 3D integration along with the advance of packaging, considering system prototyping, interconnecting, and physical implementation. The corresponding challenges are presented as well.
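One ingredient of the legalization flow in talk 2 of this session is bipartite matching between 3D-via bumps and legal sites. A minimal sketch using Kuhn's augmenting-path algorithm, with hypothetical coordinates and a Manhattan reach limit (the force-based solver and Bayesian tuning from the talk are omitted):

```python
def legalize_bumps(bumps, sites, reach):
    """Match each 3D-via bump to a distinct legal site within `reach`.

    bumps, sites: lists of (x, y) coordinates.
    Returns {bump_index: site_index}, or None if some bump cannot be placed.
    Kuhn's augmenting-path bipartite matching; a feasibility-only stand-in
    for the matching step described in the abstract.
    """
    def near(b, s):
        bx, by = bumps[b]
        sx, sy = sites[s]
        return abs(bx - sx) + abs(by - sy) <= reach

    match_site = {}  # site index -> bump index

    def augment(b, seen):
        for s in range(len(sites)):
            if near(b, s) and s not in seen:
                seen.add(s)
                # take a free site, or displace the current owner recursively
                if s not in match_site or augment(match_site[s], seen):
                    match_site[s] = b
                    return True
        return False

    for b in range(len(bumps)):
        if not augment(b, set()):
            return None
    return {b: s for s, b in match_site.items()}

# Hypothetical example: two bumps competing for overlapping legal sites.
print(legalize_bumps([(0, 0), (1, 0)], [(0, 0), (2, 0)], reach=1))  # {0: 0, 1: 1}
```

In a cost-aware variant one would run a min-cost assignment instead, so displaced bumps move to the nearest legal site rather than an arbitrary one.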
10:30 - 10:45 am PDT. Break
10:45 - 11:30 am PDT. Session 2 on 3D ICs, heterogeneous integration, and packaging
Chair: Yarui Peng (University of Arkansas)
1. "Co-design for Heterogeneous Integration: A Failure Analysis Perspective", Erica Douglas (Sandia National Labs), [slides], [overview video]
As scaling for CMOS transistors asymptotically approaches the end of Moore’s Law, the push into 3D integration schemes to innovate new capabilities is gaining significant traction. Further, the rapid development of new semiconductor solutions, such as heterogeneous integration, has steered the semiconductor industry’s consistent march towards next-generation products into new arenas. In 2018, the Department of Energy Office of Science (DOE SC) released its “Basic Research Needs for Microelectronics,” communicating a strong push towards “parallel but intimately networked efforts to create radically new capabilities,” which it has coined “co-design.”
2. "Goal Driven PCB Synthesis Using Machine Learning and Cloud Scale Compute", Taylor Hogan (Cadence Design Systems), [slides], [overview video], [video]
X AI is a cloud-based system that leverages machine learning and search to place and route printed circuit boards using physics-based analysis and high-level design goals. We propose a feedback-based Monte Carlo Tree Search (MCTS) algorithm to explore the space of possible designs. One or more metrics are given to evaluate the quality of designs as MCTS learns about possible solutions. A policy network and a value network are trained during exploration to accurately weight quality actions and identify useful design states. This is performed as a feedback loop in conjunction with other feedforward tools for placement and routing.
3. "Gate-All-Around Technology is Coming. What’s Next After GAA?", Victor Moroz (Synopsys), [slides] [overview video]
Currently, the industry is transitioning from FinFETs to gate-all-around (GAA) technology and will likely have several GAA technology generations in the next few years. What’s next after that? This is the question that we try to answer in this project by benchmarking GAA technology against transistors on 2D materials and stacked transistors (CFETs). The main objective for logic is to get a meaningful gain in power, performance, area, and cost (PPAC). The main objective for SRAM is to get a noticeable density scaling for the SRAM array and its periphery without losing performance and yield. Another objective is to move in a direction that promises longer-term progress, such as stacking two layers of transistors before moving to a larger number of transistor layers. With that in mind, we explore and discuss the next steps beyond GAA technology.
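The feedback-based MCTS described in talk 2 of this session can be illustrated, at a much smaller scale, with a plain UCT loop over a toy action space. The policy/value networks and physics-based metrics are replaced by a trivial reward, and all names and parameters are illustrative:

```python
import math
import random

def mcts(actions, depth, reward, iters=400, c=1.4):
    """Tiny UCT search over sequences of design actions.

    `reward` scores a complete sequence -- a stand-in for the physics-based
    quality metrics mentioned in the abstract. Returns the sequence with
    the most-visited path through the learned tree.
    """
    visits, value = {(): 0}, {(): 0.0}

    def select(state):
        """Return an untried child, else the child maximizing the UCT score."""
        best, best_u = None, -1.0
        for a in actions:
            ch = state + (a,)
            if ch not in visits:
                return ch
            u = value[ch] / visits[ch] + c * math.sqrt(
                math.log(visits[state]) / visits[ch])
            if u > best_u:
                best, best_u = ch, u
        return best

    for _ in range(iters):
        state, path, expanded = (), [()], False
        while len(state) < depth and not expanded:
            state = select(state)
            if state not in visits:          # expansion
                visits[state], value[state] = 0, 0.0
                expanded = True
            path.append(state)
        tail = tuple(random.choice(actions)  # random rollout to a full sequence
                     for _ in range(depth - len(state)))
        r = reward(state + tail)
        for s in path:                       # backpropagation
            visits[s] += 1
            value[s] += r

    state = ()                               # greedy read-out by visit count
    for _ in range(depth):
        state = max((state + (a,) for a in actions),
                    key=lambda ch: visits.get(ch, 0))
    return state

# Toy objective: the reward simply prefers action 1 at every step.
best = mcts(actions=(0, 1), depth=4, reward=sum)
print(best)
```

In the real system the rollout and read-out would call placement/routing tools, and the trained networks would replace the uniform random policy.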
11:30 - 11:45 am PDT. Break
11:45 - 1:00 pm PDT. Session on Analog Design
Chair: Jürgen Scheible (Reutlingen University)
1. "VLSIR - A Modular Framework for Programming Analog & Custom Circuits & Layouts", Dan Fritchman (UC Berkeley), [slides], [overview video], [video]
We present VLSIR, a modular and fully open-source framework for programming analog and custom circuits and layouts. VLSIR is centered around a protobuf-defined design database. It features high-productivity front-ends for hardware description ("circuit programming"), simulation, and custom layout programming, designed to be amenable to both human designers and automation.
2. "Joint Optimization of Sizing and Layout for AMS Designs: Challenges and Opportunities", David Pan (UT Austin) [slides], [overview video], [video]
Recent advances in analog device sizing algorithms show promising results on automatic schematic design. However, the majority of sizing algorithms are based on schematic-level simulations and are layout-agnostic. The physical layout implementation brings extra parasitics to the analog circuits, leading to discrepancies between schematic and post-layout performance. This performance gap raises questions about the effectiveness of automatic analog device sizing tools. Prior work has leveraged procedural layout generation to account for layout-induced parasitics in the sizing process. However, the need for layout templates limits the applicability of such methodologies. In this paper, we propose to bridge automatic analog sizing with post-layout performance using state-of-the-art optimization-based analog layout generators. A quantitative study is conducted to measure the impact of layout awareness in state-of-the-art device sizing algorithms. Furthermore, we present our perspectives on future directions in layout-aware analog circuit schematic design.
3. "Learning from the Implicit Functional Hierarchy in an Analog Netlist", Helmut Graeb (Technical University of Munich), [slides], [overview video], [video]
Analog circuit design is characterized by a plethora of implicit design and technology aspects available to the experienced designer. In order to create useful computer-aided design methods, this implicit knowledge has to be captured in a systematic and hierarchical way. A key approach to this goal is to "learn" the knowledge from the netlist of an analog circuit. This requires a library of structural and functional blocks for analog circuits together with their individual constraints and performance equations, graph homomorphism techniques to recognize blocks that can have different structural implementations and I/O pins, as well as synthesis methods that exploit the learned knowledge. In this contribution, we will present how to make use of the functional and structural hierarchy of operational amplifiers. As an application, we explore the capabilities of machine learning in the context of structural and functional properties and show that the results can be substantially improved by pre-processing data with traditional methods for functional block analysis. This claim is validated on a data set of roughly 100,000 readily sized and simulated operational amplifiers.
4. "The ALIGN Automated Analog Layout Engine: Progress, Learnings, and Open Issues", Sachin Sapatnekar (University of Minnesota), [slides], [overview video], [video]
The ALIGN (Analog Layout, Intelligently Generated from Netlists) project [1, 2] is a joint university-industry effort to push the envelope of automated analog layout through a systematic new approach, novel algorithms, and open-source software [3]. Analog layout automation research has been active for several decades but, unlike digital design, which has a rich history of automation and extensive deployment, it has not found widespread acceptance due to its general inability to meet the needs of the design community. ALIGN attempts to overcome several of the major issues behind this lack of success. We overview the ALIGN flow, highlighting novel techniques developed as part of the project.
5. "Analog Layout Automation On Advanced Process Technologies", Soner Yaldiz (Intel Corporation), [slides], [overview video], [video]
Despite the digitization of analog and the disaggregated silicon trends, high-volume or high-performance system-on-chip (SoC) designs integrate numerous analog and mixed-signal (AMS) intellectual property (IP) blocks including voltage regulators, clock generators, sensors, memory and other interfaces. For example, fine-grain dynamic voltage and frequency scaling requires a dedicated clock generator and voltage regulator per compute unit. The design of these blocks in advanced FinFET or GAAFET technologies is challenging due to the i) increasing gap between schematic and post-layout simulation, ii) design rule complexity, and iii) strict reliability rules [1]. The convergence of a high-performance or a high-power block may require multiple iterations of circuit sizing and layout changes. As a result, physical design, which is primarily a manual effort, has become a key bottleneck in the design process. This talk will first overview physical design of AMS IP blocks on an advanced process technology highlighting the opportunities and the expectations from layout automation during this process.
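The structural-recognition idea in talk 3 of this session — learning functional blocks from an analog netlist — can be illustrated with the simplest possible pattern matcher. This toy sketch flags two-transistor current mirrors in a flat netlist dictionary (hypothetical net and device names; the real approach uses a block library and graph homomorphism, not hard-coded rules):

```python
def find_current_mirrors(netlist):
    """Flag simple two-transistor current mirrors in a flat MOS netlist.

    netlist: {name: {"g": gate_net, "d": drain_net, "s": source_net}}.
    A pair is reported when the devices share gate and source nets and at
    least one device is diode-connected (gate tied to drain). Returns
    [(diode_device, mirror_device), ...].
    """
    mirrors = []
    names = sorted(netlist)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ta, tb = netlist[a], netlist[b]
            if ta["g"] == tb["g"] and ta["s"] == tb["s"]:
                diode = [n for n, t in ((a, ta), (b, tb)) if t["g"] == t["d"]]
                if diode:
                    mirrors.append((diode[0], a if diode[0] == b else b))
    return mirrors

# Hypothetical netlist: M1 is diode-connected and mirrored into M2.
nl = {
    "M1": {"g": "nbias", "d": "nbias", "s": "gnd"},
    "M2": {"g": "nbias", "d": "out", "s": "gnd"},
}
print(find_current_mirrors(nl))  # [('M1', 'M2')]
```

Recognized blocks like this become features and constraints for the downstream learning described in the abstract.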
1:00 - 1:30 pm PDT. Chit-Chat on Gather.Town
7:00 - 7:50 am PDT. Second Keynote
Chair: Yao-Wen Chang (National Taiwan University)
"Immersion and EUV Lithography: Two Pillars to Sustain Single-Digit Nanometer Nodes", Burn J. Lin (National Tsing Hua University), [slides], [video]
Semiconductor technology has advanced to single-digit nanometer dimensions for the circuit elements. The minimum feature size has reached subwavelength dimensions. Many resolution enhancement techniques have been developed to extend the resolution limit of optical lithography systems, namely illumination optimization, phase-shifting masks, and proximity corrections. Needless to say, the actinic wavelength has been reduced and the numerical aperture of the imaging lens increased in stages. The most recent innovations are immersion lithography and extreme UV (EUV) lithography. In this presentation, the working principles, advantages, and challenges of immersion lithography are given. The defectivity issue is addressed by showing possible causes and solutions. The circuit design issues for pushing immersion lithography to single-digit nanometer delineation are presented. Similarly, the working principles, advantages, and challenges of EUV lithography are given. Special focus is placed on the EUV power requirement, generation, and distribution; EUV mask components, absorber thickness, defects, flatness requirements, and pellicles; and EUV resist challenges in sensitivity, line edge roughness, thickness, and etch resistance.
8:00 - 9:15 am PDT. Session on DFM, Reliability, and Electromigration
Chair: Bei Yu (Chinese University of Hong Kong)
1. "Advanced Design Methodologies for Directed Self-Assembly", Shao-Yun Fang (National Taiwan University of Science and Technology), [slides], [overview video], [video]
Directed self-assembly (DSA), which uses the segregation nature of block co-polymer (BCP) after an annealing process to generate tiny feature shapes, has become one of the most promising next-generation lithography technologies. According to the proportions of the two monomers in an adopted BCP, either cylinders or lamellae can be generated by removing one of the two monomers; the two variants are respectively referred to as cylindrical DSA and lamellar DSA. In addition, guiding templates are required to produce trenches before filling BCP, such that the additional forces from the trench walls regulate the generated cylinders/lamellae. Both DSA technologies can be used to generate contact/via patterns in circuit layouts, while the practices of designing guiding templates are quite different due to the different manufacturing principles. This paper reviews the existing studies on the guiding template design problem for contact/via hole fabrication with DSA technology. The design constraints are differentiated and the design methodologies are introduced for cylindrical DSA and lamellar DSA, respectively. Possible future research directions are finally suggested to further enhance contact/via manufacturability and the feasibility of adopting DSA in semiconductor manufacturing.
2. "Challenges for Interconnect Reliability: From Element to System Level", Olalla Varela Pedreira (IMEC), [slides], [overview video], [video]
The high current densities carried by interconnects have a direct impact on back-end-of-line (BEOL) reliability degradation, as they locally increase the temperature by Joule heating and lead to drift of the metal atoms. The local increase in temperature due to Joule heating leads to thermal gradients along the interconnects, inducing degradation through thermomigration. As the power density of the chip increases, thermal gradients may become a major reliability concern for scaled Cu interconnects. Therefore, it is of utmost relevance to fundamentally understand the impact of thermal gradients on metal migration. Our studies show that by using a combined modelling approach and a dedicated test structure we can assess the local temperatures and temperature-gradient profiles. Moreover, with long-term experiments, we are able to generate voids at the locations of highest temperature gradients. Additionally, the main consequence of scaling Cu interconnects is the dramatic drop of the EM lifetime and current limit (Jmax). Currently, experimentally obtained EM parameters are used at the system design level to set the current limits through the interconnect networks. However, this approach is very simplistic and neglects the benefits provided by the redundancy and interconnectivity of the network. Our studies, using a system-level physics-based EM simulation framework that can determine the EM-induced IR drop at the standard cell level, show that the circuit reliability margins of the power delivery network (PDN) can be further relaxed.
3. "Combined Modeling of Electromigration, Thermal and Stress Migration in AC Interconnect Lines", Susann Rothe, Jens Lienig (Dresden University of Technology), [slides], [overview video], [video]
The migration of atoms in the metal interconnects of integrated circuits (ICs) increasingly endangers chip reliability. The susceptibility of DC interconnects to electromigration has been extensively studied, and a few works on thermal migration and AC electromigration are also available. Yet, their combined effect on chip reliability has been neglected thus far. This paper provides both FEM and analytical models for atomic migration and steady-state stress profiles in AC interconnects, considering electromigration, thermal migration, and stress migration in combination. To this end, we extend existing models to account for the impact of self-healing, temperature-dependent resistivity, and short wire lengths. We conclude by analyzing the impact of thermal migration on interconnect robustness and show that it can no longer be neglected in migration robustness verification.
4. "Recent Progress in the Analysis of Electromigration and Stress Migration in Large Multisegment Interconnects", Nestor Evmorfopoulos (University of Thessaly), [slides], [overview video], [video]
Traditional approaches to analyzing electromigration (EM) in on-chip interconnects are largely driven by semi-empirical models. However, such methods are inexact for the typical multisegment lines that are found in modern integrated circuits. This paper overviews recent advances in analyzing EM in on-chip interconnect structures based on physics-based models that use partial differential equations, with appropriate boundary conditions, to capture the impact of electron-wind and back-stress forces within an interconnect, across multiple wire segments. Methods for both steady-state and transient analysis are presented, highlighting approaches that can solve these problems with a computation time that is linear in the number of wire segments in the interconnect.
5. "Electromigration Assessment in Power Grids with Account of Redundancy and Non-Uniform Temperature Distribution", Armen Kteyan (Siemens EDA), [slides], [overview video], [video]
A recently proposed methodology for electromigration (EM) assessment in the on-chip power/ground grids of integrated circuits has been validated by means of measurements performed on dedicated test grids. IR-drop degradation in the grid is used to define the EM failure criteria. Physics-based models are used to simulate EM-induced stress evolution in interconnect structures, void formation and evolution, the resistance increase of voided segments, and the consequent re-distribution of electric current into redundant grid paths. A grid-like test structure, fabricated in a 65 nm technology and consisting of two metal layers, allowed us to calibrate the voiding models by tracking voltage evolution in all grid nodes in experiment and in simulation. A good fit between the measured and simulated time-to-failure (TTF) probability distributions was obtained in both cases of uniform and non-uniform temperature distribution across the grid. The second test grid was fabricated in a 28 nm technology, consisted of 4 metal layers, and contained power and ground nets connected to “quasi-cells” with poly-resistors, which were specially designed for operating at elevated temperatures of ~350°C. The resulting current distributions led to different EM-induced failure behavior in these nets: a gradual voltage evolution in the power net and sharp changes in the ground net were observed in experiments and successfully reproduced in simulations.
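The physics-based EM stress models referenced in the last three talks are commonly built on Korhonen-type stress-evolution equations of the form dσ/dt = d/dx[κ(dσ/dx + G)], where G collects the electron-wind driving force and zero atomic flux at blocked line ends. A minimal 1D explicit finite-difference sketch in normalized units, purely an illustration of the model class and not any speaker's framework:

```python
def korhonen_steady_stress(length, n, kappa, G, dt, steps):
    """Explicit finite-difference solve of a 1D Korhonen-type EM equation,
    dσ/dt = d/dx [ κ (dσ/dx + G) ], with zero-flux (blocked) boundaries.

    At steady state the stress flux vanishes, so dσ/dx -> -G: a linear
    stress profile, tensile at one blocked end and compressive at the other.
    All quantities are in normalized units; stability needs
    dt <= dx*dx / (2*kappa).
    """
    dx = length / (n - 1)
    sigma = [0.0] * n
    for _ in range(steps):
        # stress flux at the n-1 interior faces (boundary flux is zero)
        flux = [kappa * ((sigma[i + 1] - sigma[i]) / dx + G)
                for i in range(n - 1)]
        new = sigma[:]
        for i in range(n):
            left = flux[i - 1] if i > 0 else 0.0
            right = flux[i] if i < n - 1 else 0.0
            new[i] = sigma[i] + dt * (right - left) / dx
        sigma = new
    return sigma

prof = korhonen_steady_stress(length=1.0, n=21, kappa=1.0, G=1.0,
                              dt=5e-4, steps=20000)
print(round(prof[0], 3), round(prof[-1], 3))  # 0.5 -0.5
```

The scheme conserves total stress (atom count), so the converged profile is the linear one with slope -G centered on the initial mean; void nucleation criteria are then applied where the tensile stress peak exceeds a critical value.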
9:15 - 9:30 am PDT. Break
9:30 - 10:30 am PDT. Session on Placement
Chair: Yibo Lin (Peking University)
1. "Placement Initialization via Sequential Subspace Optimization with Sphere Constraints", Pengwen Chen, Chung-Kuan Cheng, Albert Chern, Chester Holtz, Aoxi Li, Yucheng Wang (National Chung Hsing University, University of California San Diego), best paper candidate, [slides], [overview video], [video]
State-of-the-art analytical placement algorithms for VLSI designs rely on solving nonlinear programs to minimize wirelength and cell congestion. As a consequence, the quality of solutions produced using these algorithms crucially depends on the initial cell coordinates. In this work, we reduce the problem of finding wirelength-minimal initial layouts subject to density and fixed-macro constraints to a Quadratically Constrained Quadratic Program (QCQP). We additionally propose an efficient sequential quadratic programming algorithm to recover a block-globally optimal solution and a subspace method to reduce the complexity of the problem. We extend our formulation to facilitate direct minimization of the Half-Perimeter Wirelength (HPWL) by showing that a corresponding solution can be derived by solving a sequence of reweighted quadratic programs. Critically, our method is parameter-free, i.e., it involves no hyperparameters to tune. We demonstrate that incorporating initial layouts produced by our algorithm with a global analytical placer results in improvements of up to 4.76% in post-detailed-placement wirelength on the ISPD’05 benchmark suite. Our code is available on GitHub.
2. "DREAM-GAN: Advancing DREAMPlace towards Commercial-Quality Using Generative Adversarial Learning", Yi-Chen Lu, Haoxing Ren, Hao-Hsiang Hsiao, Sung Kyu Lim (Georgia Institute of Technology, NVIDIA), [slides], [overview video], [video]
DREAMPlace is a renowned open-source placer that provides GPU-acceleratable infrastructure for the placement of Very-Large-Scale Integration (VLSI) circuits. However, due to its limited focus on wirelength and density, existing placement solutions of DREAMPlace are not applicable to industrial design flows. To improve DREAMPlace towards commercial quality without knowing the black-boxed algorithms of the tools, in this paper we present DREAM-GAN, a placement optimization framework that advances DREAMPlace using generative adversarial learning. At each placement iteration, aside from optimizing the wirelength and density objectives of the vanilla DREAMPlace, DREAM-GAN computes and optimizes a differentiable loss that denotes the similarity score between the underlying placement and the tool-generated placements in commercial databases. Experimental results on 5 commercial and OpenCore designs using an industrial design flow implemented with Synopsys ICC2 not only demonstrate that DREAM-GAN significantly improves the vanilla DREAMPlace at the placement stage across each benchmark, but also show that the improvements carry through to the post-route stage, where we observe improvements of up to 8.3% in wirelength and 7.4% in total power.
3. "AutoDMP: Automated DREAMPlace-based Macro Placement", Anthony Agnesina (NVIDIA), [slides], [overview video], [video]
Macro placement is a critical very large-scale integration (VLSI) physical design problem that significantly impacts the design power-performance-area (PPA) metrics. This paper proposes AutoDMP, a methodology that leverages DREAMPlace, a GPU-accelerated placer, to place macros and standard cells concurrently in conjunction with automated parameter tuning using a multi-objective hyperparameter optimization technique. As a result, we can generate high-quality predictable solutions, improving the macro placement quality of academic benchmarks compared to baseline results generated from academic and commercial tools. AutoDMP is also computationally efficient, optimizing a design with 2.7 million cells and 320 macros in 3 hours on a single NVIDIA DGX Station A100. This work demonstrates the promise and potential of combining GPU-accelerated algorithms and ML techniques for VLSI design automation.
4. "Assessment of Reinforcement Learning for Macro Placement", Andrew B. Kahng (University of California at San Diego), [slides], [overview video], [video]
We provide open, transparent implementation and assessment of Google Brain’s deep reinforcement learning approach to macro placement [9] and its Circuit Training (CT) implementation in GitHub [23]. We implement in open-source key “blackbox” elements of CT, and clarify discrepancies between CT and [9]. New testcases on open enablements are developed and released. We assess CT alongside multiple alternative macro placers, with all evaluation flows and related scripts public in GitHub. Our experiments also encompass academic mixed-size placement benchmarks, as well as ablation and stability studies. We comment on the impact of [9] and CT, as well as directions for future research.
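The sequence of reweighted quadratic programs mentioned in talk 1 of this session can be illustrated in one dimension: the L1 term |x_i - x_j| is approximated by w_ij (x_i - x_j)^2 with the weight refreshed from the current solution, so repeated quadratic solves drive the layout toward the L1 (HPWL-like) optimum. A toy sketch with hypothetical two-pin nets (the paper's density and fixed-macro constraints are omitted):

```python
def irls_hpwl_1d(nets, fixed, movable, iters=50, eps=1e-6):
    """Iteratively reweighted quadratic minimization of sum |x_i - x_j|
    over 2-pin nets in 1D: a toy analogue of solving a sequence of
    reweighted quadratic programs.

    nets: list of (pin_a, pin_b); fixed: {pin: coord}; movable: [pin, ...]
    """
    init = sum(fixed.values()) / len(fixed)
    x = dict(fixed, **{m: init for m in movable})
    for _ in range(iters):
        # refresh weights: w = 1 / |dx|, clamped at eps to avoid blow-up
        w = {n: 1.0 / max(abs(x[n[0]] - x[n[1]]), eps) for n in nets}
        for _ in range(100):  # Gauss-Seidel solve of the weighted QP
            for m in movable:
                num = den = 0.0
                for n in nets:
                    if m in n:
                        other = n[0] if n[1] == m else n[1]
                        num += w[n] * x[other]
                        den += w[n]
                if den:
                    x[m] = num / den
    return {m: x[m] for m in movable}

# Hypothetical star net: cell c tied to pads at 0, 0, and 10.
pos = irls_hpwl_1d(nets=[("p0", "c"), ("p1", "c"), ("p2", "c")],
                   fixed={"p0": 0.0, "p1": 0.0, "p2": 10.0},
                   movable=["c"])
print(round(pos["c"], 4))  # ≈ 0: the L1 optimum (a one-shot QP would give 10/3)
```

The reweighting is what moves the solution from the quadratic mean (10/3) to the L1 median (0), which is why the reweighted sequence tracks HPWL rather than quadratic wirelength.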
10:30 - 10:45 am PDT. Break
10:45 - 11:30 am PDT. Session on New Computing Techniques and Accelerators
Chair: Deepashree Sengupta (Synopsys)
1. "GPU Acceleration in Physical Synthesis", Evangeline Young (Chinese University of Hong Kong), [slides], [overview video], [video]
Placement and routing are essential steps in the physical synthesis of VLSI designs. Modern circuits contain billions of cells and nets, which significantly increases the computational complexity of physical synthesis and poses big challenges to leading-edge physical design tools. With the fast development of GPU architecture and computational power, exploring how to speed up physical synthesis with massive parallelism on GPUs has become an important direction. In this talk, we will look into opportunities to improve EDA algorithms with GPU acceleration. Traditional EDA tools run on CPUs with a limited degree of parallelism. We will investigate a few examples of accelerating classical algorithms in placement and routing using GPUs. We will see how one can leverage the power of the GPU to improve both quality and runtime in solving these EDA problems.
2. "Efficient Runtime Power Modeling with On-Chip Power Meters", Zhiyao Xie (Hong Kong University of Science and Technology), [slides], [overview video], [video]
Accurate and efficient power modeling techniques are crucial for both design-time power optimization and runtime on-chip IC management. In prior research, different types of power modeling solutions have been proposed, optimizing multiple objectives including accuracy, efficiency, temporal resolution, and automation level, targeting various power/voltage-related applications. Despite extensive prior explorations of this topic, new solutions still keep emerging and achieve state-of-the-art performance. Considering the increasing complexity and importance of the problem, this paper aims at providing a review of the recent progress in power modeling, with more focus on promising runtime on-chip power meter (OPM) development techniques. It also serves as a vehicle for discussing some general development techniques for the runtime on-chip power modeling task.
3. "DREAMPlaceFPGA-PL: An Open-Source GPU-Accelerated Packer-Legalizer for Heterogeneous FPGAs", Rachel Selina Rajarathnam, Zixuan Jiang, Mahesh A. Iyer, David Z. Pan (University of Texas at Austin, Intel Corporation), [slides], [overview video], [video]
Placement plays a pivotal and strategic role in the FPGA implementation flow, allocating the physical locations of the heterogeneous instances in the design. Among the placement stages, the packing or clustering stage groups logic instances like look-up tables (LUTs) and flip-flops (FFs) that could be placed on the same site. The legalization stage determines all instances’ physical site locations. With advances in FPGA architecture and technology nodes, designs contain millions of logic instances, and placement algorithms must scale accordingly. While the other placement stages, global placement and detailed placement, have been accelerated using GPUs, the acceleration of the packing and legalization stages on a GPU remains largely unexplored. This work presents DREAMPlaceFPGA-PL, an open-source packer-legalizer for heterogeneous FPGAs that employs a GPU for acceleration. We revise the existing consensus-based parallel algorithms employed for packing and legalizing a flat placement to obtain further speedup on a GPU. Our experiments on the ISPD 2016 benchmarks demonstrate more than 2x acceleration.
11:30 - 11:45 am PDT. Break
11:45 - 1:15 pm PDT. Lifetime Achievement Commemoration for Professor Malgorzata Marek-Sadowska [Biography]
Chair: Aida Todri-Sanial (Eindhoven University of Technology)
1. "Building Oscillatory Neural Networks: AI Applications and Physical Design Challenges", Aida Todri-Sanial (Eindhoven University of Technology), [slides], [video]
This talk is about a novel computing paradigm based on coupled oscillatory neural networks. Oscillatory neural networks (ONNs) are recurrent neural networks where each neuron is an oscillator and oscillator couplings are the synaptic weights. Inspired by Hopfield neural networks, ONNs make use of nonlinear dynamics to compute and to solve problems, such as associative memory tasks and combinatorial optimization problems, that are difficult to address with conventional digital computers. An exciting direction in recent years has been to implement Ising machines based on the Ising model of coupled binary spins in magnets. In this talk, I cover the design aspects of building ONNs from devices to architecture, which allow one to benefit from the parallel computation of oscillators while implementing them in an energy-efficient way.
2. "Optimization of AI SoC with Compiler-Assisted Virtual Design Platform", Chih-Tsun Huang (National Tsing Hua University), [slides], [video]
As deep learning keeps evolving dramatically with rapidly increasing complexity, the demand for efficient hardware accelerators has become vital. However, the lack of software/hardware co-development toolchains makes designing AI SoCs (artificial intelligence systems-on-chip) considerably challenging. This paper presents a compiler-assisted virtual platform to facilitate the development of AI SoCs from the early design stage. The electronic system-level design platform provides rapid functional verification and performance/energy analysis. Cooperating with the neural network compiler, AI software and hardware can be co-optimized on the proposed virtual design platform. Our Deep Inference Processor is also utilized on the virtual design platform to demonstrate the effectiveness of the architectural evaluation and exploration methodology.
3. "Challenges and Opportunities for Computing-in-Memory Chips", Xiang Qiu (East China Normal University), [slides], [video]
In recent years, artificial neural networks have been applied to many scenarios, from daily-life applications like face detection to industry problems like placement and routing in physical design. Neural network inference mainly consists of multiply-accumulate operations, which require a huge amount of data movement. Traditional von Neumann architecture computers are inefficient for neural networks because they have separate CPU and memory, and data transfer between them costs excessive energy and performance. To address this problem, in-memory and near-memory computing have been proposed and have attracted much attention in both academia and industry. In this talk, we will give a brief review of non-volatile memory crossbar-based computing-in-memory architecture. Next, we will demonstrate the challenges for chips with such an architecture to replace current CPUs/GPUs for neural network processing, from an industry perspective. Lastly, we will discuss possible solutions to those challenges.
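The multiply-accumulate bottleneck described in the abstract is exactly what a crossbar computes in place: weights stored as cell conductances, inputs applied as row voltages, and each column current summing the products. A toy numerical sketch of that read-out (all values and sizes here are illustrative, not from the talk):

```python
import numpy as np

# In a resistive crossbar, weights are stored in place as cell
# conductances G; an input vector is applied as word-line voltages V.
# By Ohm's and Kirchhoff's laws, each bit-line current is
#   I_j = sum_i V_i * G[i, j],
# so a single analog read performs a whole vector-matrix multiply-
# accumulate with no weight traffic between memory and a compute unit.
G = np.array([[0.2, 0.5, 0.1],
              [0.4, 0.3, 0.6]])   # conductances (2 inputs x 3 outputs)
V = np.array([1.0, 0.5])          # word-line voltages (the activations)

I = V @ G                         # bit-line currents = the MAC results
```

The point of the sketch is that the entire inner product happens where the weights already sit, which is why such architectures avoid the von Neumann data-movement cost the talk highlights.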
1:15 - 1:45 pm PDT. Chit-Chat on Gather.Town with Professor Malgorzata Marek-Sadowska
7:00 - 7:50 am PDT. Third Keynote
Chair: Laleh Behjat (University of Calgary)
"Neural Operators for Solving PDEs and Inverse Design", Anima Anandkumar (NVIDIA / Caltech), [slides], [video]
Deep learning surrogate models have shown promise in modeling complex physical phenomena such as photonics, fluid flows, molecular dynamics, and material properties. However, standard neural networks assume finite-dimensional inputs and outputs and hence cannot withstand a change in resolution or discretization between training and testing. We introduce Fourier neural operators that can learn operators, which are mappings between infinite-dimensional spaces. They are discretization-invariant and can generalize beyond the discretization or resolution of the training data. They can efficiently solve partial differential equations (PDEs) on general geometries. We consider a variety of PDEs for both forward modeling and inverse design problems, and show practical gains in the lithography domain.
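The operator-learning idea can be made concrete with a toy one-dimensional spectral layer: transform the input function to Fourier space, scale a truncated set of modes by learned weights, and transform back. Because the weights act on Fourier modes rather than grid points, the same layer applies at any resolution. A minimal sketch (the weights and mode count are illustrative, not from the talk):

```python
import numpy as np

def spectral_conv1d(u, weights, n_modes):
    """Toy 1-D Fourier layer: keep the lowest n_modes Fourier
    coefficients, scale them by learned complex weights, drop the rest."""
    u_hat = np.fft.rfft(u)                         # to Fourier space
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = u_hat[:n_modes] * weights  # act only on low modes
    return np.fft.irfft(out_hat, n=len(u))         # back to physical space

rng = np.random.default_rng(0)
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # "learned" weights

# The same weights are applied at two different grid resolutions; the
# outputs agree at shared sample points -- the discretization invariance
# mentioned in the abstract.
x_coarse = np.sin(2 * np.pi * np.linspace(0, 1, 64, endpoint=False))
x_fine = np.sin(2 * np.pi * np.linspace(0, 1, 256, endpoint=False))
y_coarse = spectral_conv1d(x_coarse, w, n_modes=8)
y_fine = spectral_conv1d(x_fine, w, n_modes=8)
assert np.allclose(y_fine[::4], y_coarse)  # resolution-consistent outputs
```

A full Fourier neural operator stacks such layers with pointwise nonlinearities and learns the per-mode weights from data; this sketch only isolates the mechanism that makes the layers mesh-independent.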
7:50 - 8:00 am PDT. Break
8:00 - 8:45 am PDT. Session on Quantum Computing
Chair: Giovanni de Micheli (École Polytechnique Fédérale de Lausanne)
1. "Quantum Challenges for EDA", Leon Stok (IBM Corporation), [slides], [overview video], [video]
Though early in its development, quantum computing is now available on real hardware and via the cloud through IBM Quantum. This radically new kind of computing holds open the possibility of solving some problems that are now and perhaps always will be intractable for “classical” computers. As with any new technology, things are developing rapidly, but there are still a lot of open questions. What is the status of quantum computers today? What are the key metrics we need to look at to improve a quantum system? What are some of the technical opportunities being looked at from an EDA perspective? We will look at the Quantum Roadmap for the next couple of years, outline challenges that need to be solved, and discuss how the EDA community can potentially contribute to solving these challenges.
2. "Developing Quantum Workloads for Workload-Driven Co-Design", Anne Matsuura (Intel Corporation), [slides], [overview video], [video]
Quantum computing offers the future promise of solving problems that are intractable for classical computers today. However, as an entirely new kind of computational device, we must learn how to best develop useful workloads. Today’s small workloads serve the dual purpose that they can also be used to learn how to design a better quantum computing system architecture. At Intel Labs, we develop small application-oriented workloads and use them to drive research into the design of a scalable quantum computing system architecture. We run these small workloads on the small systems of qubits that we have today to understand what is required from the system architecture to run them efficiently and accurately on real qubits. In this presentation, I will give examples of quantum workload-driven co-design and what we have learned from this type of research.
3. "MQT QMAP: Efficient Quantum Circuit Mapping", Robert Wille (Technical University of Munich), [slides], [overview video], [video]
Quantum computing is an emerging technology that has the potential to revolutionize fields such as cryptography, machine learning, optimization, and quantum simulation. However, a major challenge in the realization of quantum algorithms on actual machines is ensuring that the gates in a quantum circuit (i.e., corresponding operations) match the topology of a targeted architecture so that the circuit can be executed while, at the same time, the resulting costs (e.g., in terms of the number of additionally introduced gates, fidelity, etc.) are kept low. This is known as the quantum circuit mapping problem. This summary paper provides an overview of QMAP—an open-source tool that is part of the Munich Quantum Toolkit (MQT) and offers efficient, automated, and accessible methods for tackling this problem. To this end, the paper first briefly reviews the problem. Afterwards, it shows how QMAP can be used to efficiently map quantum circuits to quantum computing architectures from both a user’s and a developer’s perspective. QMAP is publicly available as open-source at https://github.com/cda-tum/qmap.
8:45 - 9:00 am PDT. Break
9:00 - 10:15 am PDT. Panel on EDA for Domain-Specific Computing, [discussion video]
This panel explores domain-specific computing from hardware, software, and EDA perspectives. Domain-specific customization can improve power-performance efficiency by orders-of-magnitude for important applications, such as machine learning and image processing. Domain-specific computers have been deployed recently, for example: the Google Tensor Processing Unit; Microsoft Catapult which is FPGA-based; and the RISC-V architecture and open instruction set for heterogeneous computing. Expediting software development with optimized compilation for fast computation on heterogeneous architectures is a difficult task, and must be considered with the hardware design.
Chair: Deming Chen (University of Illinois Urbana-Champaign)
1. "Software-Driven Design for Domain-Specific Compute", Desmond Kirkpatrick (Intel Corporation), [slides], [video]
The end of Dennard scaling has created a focus on advancing domain-specific computing; we are seeing a renaissance of accelerating compute problems through specialization, with orders-of-magnitude improvement in performance and energy efficiency [1]. Domain-specific compute, with its wide proliferation of domains and narrow specialization of hardware and software, provides unique challenges in design automation not met by the methodologies matured under the model of high-volume manufacturing of competitive CPUs, GPUs, and SoCs [2]. Importantly, domain-specific compute targets smaller markets that move more rapidly, so design NRE plays a much larger role. Secondly, the role of software is so much more significant that we believe a software-first approach, where software drives hardware design and the product is developed at the speed of software, is required to keep pace with domain-specific compute market requirements. This creates significant new challenges and opportunities for EDA to address the domain-specific compute design space. The forces that are driving the renaissance in domain-specific compute architectures also require a renaissance in the tools, flows, and methods to maintain this pace of innovation.
2. "Google Investment in Open Source Custom Hardware Development Including No-Cost Shuttle Program", Tim Ansell (Google), [slides], [overview video]
The end of Moore’s Law combined with unabated growth in usage have forced Google to turn to hardware acceleration to deliver efficiency gains to meet demand. Traditional hardware design methodology for accelerators is practical when there’s a common core – such as with Machine Learning (ML) or video transcoding, but what about the hundreds of smaller tasks performed in Google data centers? Our vision is “software-speed” development for hardware acceleration so that it becomes commonplace and, frankly, boring. Toward this goal Google is investing in open tooling to foster innovation in multiplying accelerator developer productivity. Tim Ansell will provide an outline of these coordinated open source projects in EDA (including high level synthesis), IP, PDKs, and related areas. This will be followed by presenting the CFU (Custom Function Unit) Playground, which utilizes many of these projects. The CFU Playground lets you build your own specialized & optimized ML processor based on the open RISC-V ISA, implemented on an FPGA using a fully open source stack. The goal isn't general ML extensions; it's about a methodology for building your own extension specialized just for your specific tiny ML model. The extension can range from a few simple new instructions, up to a complex accelerator that interfaces to the CPU via a set of custom instructions; we will show examples of both.
3. "A Case for Open EDA Verticals", Zhiru Zhang (Cornell University), [slides], [video]
With the end of Dennard scaling and Moore’s Law reaching its limits, domain-specific hardware specialization has become a crucial method for improving compute performance and efficiency for various important applications. Leading companies in competitive fields, such as machine learning and video processing, are building their own in-house technology stacks to better suit their accelerator design needs. However, currently this approach is only a viable option for a few large enterprises that can afford to invest in teams of experts in hardware, systems, and compiler development for high-value applications. In particular, the high license cost of commercial electronic design automation (EDA) tools presents a significant barrier for small and mid-size engineering teams to create new hardware accelerators. These tools are essential for designing, simulating, and testing new hardware, but can be too expensive for smaller teams with limited budgets, reducing their ability to innovate and compete with larger organizations. We advocate for open EDA verticals as a solution to enabling more widespread use of domain-specific hardware acceleration. The objective is to empower small teams of domain experts to productively develop high-performance accelerators using programming interfaces they are already familiar with.
4. "Addressing the EDA Roadblocks for Domain-Specific Compilers: an Industry Perspective", Alireza Kaviani (AMD), [slides], [video]
Computer architects now widely subscribe to domain-specific architectures as the only path left for major improvements in performance-cost-energy. As a result, future compilers need to go beyond their traditional role of mapping a design input to a generic hardware platform. Emerging domain-specific compilers must take a broader view in which compilers provide more control to the end users, enabling customization of hardware components to implement their corresponding tasks. We make several proposals for how EDA vendors could support domain-specific customized flows, based on experience with RapidWright and RapidStreams for FPGAs.
5. "High-level Synthesis for Domain Specific Computing", Deming Chen (University of Illinois Urbana-Champaign), [slides], [video]
This paper proposes a High-Level Synthesis (HLS) framework for domain-specific computing. The framework contains three key components: 1) ScaleHLS, a multi-level HLS compilation flow aimed at addressing the lack of expressiveness and hardware-dedicated representation in traditional software-oriented compilers. ScaleHLS introduces a hierarchical intermediate representation (IR) for the progressive optimization of HLS designs defined in various high-level languages. It consists of three levels of optimization, at the graph, loop, and directive levels, to realize an efficient compilation pipeline and generate highly optimized domain-specific accelerators. 2) AutoScaleDSE, an automated design space exploration (DSE) engine. Real-world HLS designs often come with large design spaces that are difficult for designers to explore, and the connections between different components of an HLS design further complicate these spaces. To address the DSE problem, AutoScaleDSE proposes a random forest classifier and a graph-driven approach to improve the accuracy of estimating intermediate DSE results while reducing the time and computational cost. With this new approach, AutoScaleDSE can evaluate thousands of HLS design points and find the Pareto-dominating design points within a couple of hours. 3) PyTransform, a flexible pattern-driven design customization flow. Existing HLS flows demand manual code rewriting or intrusive compiler customization to conduct domain-specific optimizations, leading to unscalable or inflexible compiler solutions. PyTransform proposes a Python-based flow that enables users to define custom matching and rewriting patterns at a high level of abstraction, which can be incorporated into the DSL compilation flow in an automatic and scalable manner. In summary, ScaleHLS, AutoScaleDSE, and PyTransform aim to address the challenges present in the compilation, DSE, and customization of existing HLS flows, respectively.
With the three key components, our newly proposed HLS framework can deliver a scalable and extensible solution for designing domain-specific languages to automate and speed up the process of designing domain-specific accelerators.
10:15 - 10:30 am PDT. Break
10:30 - 11:30 am PDT. Session on Hardware Security and Bug Fixing
Chair: Samuel Pagliarini (Tallinn University of Technology)
1. "Security-aware Physical Design against Trojan Insertion, Frontside Probing, and Fault Injection Attacks", Jhih-Wei Hsu, Kuan-Cheng Chen, Yan-Syuan Chen, Yu-Hsiang Lo, Yao-Wen Chang (National Taiwan University), [slides], [overview video], [video]
The dramatic growth of hardware attacks and the lack of security-aware solutions in design tools lead to severe security problems in modern IC designs. Although many existing countermeasures provide decent protection against security issues, they still lack a global design view with sufficient security consideration at design time. This paper proposes a security-aware framework against Trojan insertion, frontside probing, and fault injection attacks at the design stage. The framework consists of two major techniques: (1) a large-scale shielding method that effectively covers the exposed areas of assets and (2) a cell-movement-based method to eliminate the empty spaces vulnerable to Trojan insertion. Experimental results show that our framework effectively reduces vulnerability to these attacks and achieves the best overall score compared with the top-3 teams in the 2022 ACM ISPD Security Closure of Physical Layouts Contest.
2. "Security Closure of IC Layouts Against Hardware Trojans", Fangzhou Wang, Qijing Wang, Bangqi Fu, Shui Jiang, Xiaopeng Zhang, Lilas Alrahis, Ozgur Sinanoglu, Johann Knechtel, Tsung-Yi Ho, Evangeline F. Y. Young, (The Chinese University of Hong Kong, New York University Abu Dhabi), [slides], [overview video], [video]
Due to cost benefits, supply chains of integrated circuits (ICs) are largely outsourced nowadays. However, passing ICs through various third-party providers gives rise to many threats, like piracy of IC intellectual property or insertion of hardware Trojans, i.e., malicious circuit modifications. In this work, we proactively and systematically harden the physical layouts of ICs against post-design insertion of Trojans. Toward that end, we propose a multiplexer-based logic-locking scheme that is (i) devised for layout-level Trojan prevention, (ii) resilient against state-of-the-art, oracle-less machine learning attacks, and (iii) fully integrated into a tailored, yet generic, commercial-grade design flow. Our work provides in-depth security and layout analysis on a challenging benchmark suite. We show that our scheme can render layouts resilient, with reasonable overheads, against Trojan insertion in general and also against second-order attacks (i.e., adversaries seeking to bypass the locking defense in an oracle-less setting). We release our layout artifacts for independent verification [29].
3. "X-Volt: Joint Tuning of Driver Strengths and Supply Voltages Against Power Side-Channel Attacks", Saideep Sreekumar, Mohammed Ashraf, Mohammed Nabeel, Ozgur Sinanoglu, Johann Knechtel (New York University Abu Dhabi), [slides], [overview video], [video]
Power side-channel (PSC) attacks are well-known threats to sensitive hardware like advanced encryption standard (AES) crypto cores. Given the significant impact of supply voltages (VCCs) on power profiles, various countermeasures based on VCC tuning have been proposed, among other defense strategies. Driver strengths of cells, however, have been largely overlooked, despite having a direct and significant impact on power profiles as well. For the first time, we thoroughly explore the prospects of jointly tuning driver strengths and VCCs as a novel working principle for PSC-attack countermeasures. Toward this end, we take the following steps: 1) we develop a simple circuit-level scheme for tuning; 2) we implement a CAD flow for design-time evaluation of ASICs, enabling security assessment of ICs before tape-out; 3) we implement a correlation power analysis (CPA) framework for thorough and comparative security analysis; 4) we conduct an extensive experimental study of a regular AES design, implemented in ASIC as well as FPGA fabrics, under various tuning scenarios; 5) we summarize design guidelines for secure and efficient joint tuning. In our experiments, we observe that runtime tuning is more effective than static tuning, for both ASIC and FPGA implementations. For the latter, the AES core is rendered >11.8x (i.e., at least 11.8 times) as resilient as the untuned baseline design. Layout overheads can be considered acceptable, with, e.g., around +10% critical-path delay for the most resilient tuning scenario in FPGA. We release source codes for our methodology, as well as artifacts from the experimental study, in [13].
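Step 3 in the abstract refers to correlation power analysis; in its textbook form, each key-byte guess is scored by the Pearson correlation between the measured power traces and a hypothetical leakage such as the Hamming weight of an S-box output. A self-contained toy sketch with synthetic traces and a stand-in random S-box (not the authors' framework, and not the real AES S-box):

```python
import numpy as np

rng = np.random.default_rng(1)
SBOX = rng.permutation(256).astype(np.uint8)  # stand-in byte permutation
HW = np.array([bin(v).count("1") for v in range(256)])  # Hamming weights

def cpa_best_key(plaintexts, traces):
    """Score every key-byte guess by |Pearson correlation| between the
    hypothetical leakage HW(SBOX[pt ^ guess]) and the measured traces,
    and return the best-scoring guess."""
    scores = [
        abs(np.corrcoef(HW[SBOX[plaintexts ^ g]], traces)[0, 1])
        for g in range(256)
    ]
    return int(np.argmax(scores))

# Synthetic "measurements": the device leaks the Hamming weight of the
# true key's S-box output, plus Gaussian noise.
true_key = 0x3C
pts = rng.integers(0, 256, size=2000)
traces = HW[SBOX[pts ^ true_key]] + 0.5 * rng.standard_normal(2000)
recovered = cpa_best_key(pts, traces)  # correlation singles out true_key
```

Countermeasures like the joint driver-strength/VCC tuning above aim to break exactly this correlation, so lowering the best guess's margin over the wrong ones is how such defenses are scored.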
4. "Validating the Redundancy Assumption for HDL from Code Clone’s Perspective", Jianjun Xu, Jiayu He, Jingyan Zhang, Deheng Yang, Jiang Wu, Xiaoguang Mao (National University of Defense Technology), [slides], [overview video], [video]
Automated program repair (APR) is being leveraged in hardware description languages (HDLs) to fix hardware bugs without human involvement. Most existing APR techniques search for donor code (i.e., code fragments for bug fixing) in the original program to generate repairs, based on the assumption that donor code can be found in the existing source code. This redundancy assumption is the fundamental basis of most APR techniques and has been widely studied in software by searching for code clones of donor code. However, despite a large body of work on code clone detection, researchers have focused almost exclusively on repositories in traditional programming languages, such as C/C++ and Java, while few studies have been done on detecting code clones in HDLs. Furthermore, little attention has been paid to the repetitiveness of bug fixes in hardware designs, which limits automatic repair targeting HDLs. To validate the redundancy assumption for HDL, we perform an empirical study on code clones of real-world bug fixes in Verilog. From the empirical results, we find that 17.71% of newly introduced code in bug fixes can be found in the clone pairs of buggy code in the original program, and 11.77% can be found in the file itself. The findings not only validate the assumption but also provide helpful insights for the design of APR techniques targeting HDLs.
11:30 - 11:45 am PDT. Break
11:45 - 12:45 pm PDT. ISPD 2023 Contest
Chair: Iris Hui-Ru Jiang (National Taiwan University)
"Benchmarking Advanced Security Closure of Physical Layouts: ISPD 2023 Contest", Johann Knechtel (New York University) , [slides], [video]
Computer-aided design (CAD) tools traditionally optimize “only” for power, performance, and area (PPA). However, given the wide range of hardware-security threats that have emerged, future CAD flows must also incorporate techniques for designing secure and trustworthy integrated circuits (ICs). This is because threats that are not addressed during design time will inevitably be exploited in the field, where system vulnerabilities induced by ICs are almost impossible to fix. However, there is currently little experience for designing secure ICs within the CAD community. This contest seeks to actively engage with the community to close this gap. The theme is security closure of physical layouts, that is, hardening the physical layouts at design time against threats that are executed post-design time. Acting as security engineers, contest participants will proactively analyse and fix the vulnerabilities of benchmark layouts in a blue-team approach. Benchmarks and submissions are based on the generic DEF format and related files. This contest is focused on the threat of Trojans, with challenging aspects for physical design in general and for hindering Trojan insertion in particular. For one, layouts are based on the ASAP7 library and rules are strict, e.g., no DRC issues and no timing violations are allowed at all. In the alpha/qualifying round, submissions are evaluated using first-order metrics focused on exploitable placement and routing resources, whereas in the final round, submissions are thoroughly evaluated (red-teamed) through actual insertion of different Trojans.
"Closing Remarks and Outlook to ISPD 2024", Iris Hui-Ru Jiang (ISPD 2024 General Chair), [slides], [video]
12:45 - 1:30 pm PDT. Farewell Chit-Chat on Gather.Town.