March 12-15, 2024
National Taiwan University, Taipei, Taiwan

ISPD 2024 Program

All times in Taipei Time (GMT+8, CET+7, EDT+12, PDT+15).

Across the three days of ISPD 2024, we have 3 keynotes, 18 accepted papers, 16 invited talks, one panel on Wednesday with 6 panelists, 4 speakers giving longer talks in Professor Martin D. F. Wong's commemorative session, and finally the ISPD 2024 contest results.

Tuesday, March 12, 2024

18:00 - 20:00: Welcome dinner reception

Location: The Howard Plaza Hotel Taipei [4F, Park Avenue]
台北福華大飯店
No. 160, Section 3, Ren'ai Rd, Da'an District, Taipei City, Taiwan 106


Day 1 Wednesday, March 13, 2024

8:30 - 8:40: Opening

8:40 - 9:30: Keynote

Chair: Iris Hui-Ru Jiang (National Taiwan University)

"Engineering the Future of IC Design with AI", Ruchir Puri (IBM Research) [abstract]

Abstract: Software and Semiconductors are two fundamental technologies that have become woven into every aspect of our society, and it is fair to say that "Software and Semiconductors have eaten the world". More recently, advances in AI have started to transform every aspect of our society as well. These three tectonic forces of transformation, "AI", "Software", and "Semiconductors", are colliding, resulting in a seismic shift: a future where both software and semiconductor chips themselves will be designed, optimized, and operated by AI, pushing us towards a future where "Computers can program themselves!". In this talk, we will discuss these forces of "AI for Chips and Code" and how the future of semiconductor chip design and software engineering is being redefined by AI.

9:30 - 9:50: Break

9:50 - 10:50: Partitioning and Clustering

Chair: Patrick Madden (State University of New York at Binghamton)

1. "MedPart: A Multi-Level Evolutionary Differentiable Hypergraph Partitioner", Rongjian Liang, Anthony Agnesina and Haoxing Ren (Nvidia)

Abstract: State-of-the-art hypergraph partitioners, such as hMETIS, usually adopt a multi-level paradigm for efficiency and scalability. However, they are prone to getting trapped in local minima because they rely on refinement heuristics and overlook global structural information during coarsening. SpecPart, the most advanced academic hypergraph partitioning refinement method, improves partitioning by leveraging spectral information. Still, its success depends heavily on the quality of the initial input solutions. This work introduces MedPart, a multi-level evolutionary differentiable hypergraph partitioner. MedPart follows the multi-level paradigm but addresses its limitations by using fast spectral coarsening and introducing a novel evolutionary differentiable algorithm to optimize each coarsening level. Moreover, by drawing an analogy between hypergraph partitioning and deep graph learning, our evolutionary differentiable algorithm can be accelerated with deep graph learning toolkits on GPUs. Experiments on public benchmarks consistently show MedPart outperforming hMETIS and achieving up to a 30% improvement in cut size for some benchmarks compared to the best-published solutions, including those from SpecPart. In addition, MedPart's runtime scales linearly with the number of hyperedges.
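Cut size, the metric behind the 30% figure above, counts the hyperedges (nets) whose vertices fall into more than one block. The minimal sketch below computes it for a toy 2-way partition. The hypergraph, partition, and function names are hypothetical, and the code only evaluates a given partition. It does not implement MedPart's spectral coarsening or its evolutionary differentiable refinement.

```python
from typing import Dict, List, Sequence


def cut_size(hyperedges: Sequence[Sequence[int]], part: Dict[int, int]) -> int:
    """Count hyperedges whose vertices span more than one block (cut-net metric)."""
    cut = 0
    for edge in hyperedges:
        blocks = {part[v] for v in edge}  # set of blocks this hyperedge touches
        if len(blocks) > 1:
            cut += 1
    return cut


def block_sizes(part: Dict[int, int], k: int) -> List[int]:
    """Number of vertices assigned to each of the k blocks (used for balance checks)."""
    sizes = [0] * k
    for b in part.values():
        sizes[b] += 1
    return sizes


# Hypothetical toy hypergraph: 6 vertices, 4 hyperedges (nets).
edges = [[0, 1, 2], [2, 3], [3, 4, 5], [1, 4]]
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(cut_size(edges, partition))   # 2: hyperedges [2, 3] and [1, 4] are cut
print(block_sizes(partition, 2))    # [3, 3]
```

A partitioner then searches for the assignment that minimizes this count while keeping the block sizes within a balance tolerance.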

2. "FuILT: Full Chip ILT System With Boundary Healing", Shuo Yin, Wenqian Zhao, Li Xie, Hong Chen, Yuzhe Ma, Tsung-Yi Ho and Bei Yu (The Chinese University of Hong Kong, Hong Kong University of Science and Technology, ShenZhen Guoweixin Technology Co., Ltd)

Abstract: Mask optimization in lithography is becoming increasingly important as technology nodes shrink. Inverse Lithography Technology (ILT) is one of the most performant and robust solutions widely used in industry, yet it still suffers from heavy time consumption and complexity. As the number of transistors scales up, industry currently focuses more on efficiency improvement and workload distribution. Meanwhile, most recent publications are still tangled in local pattern restoration regardless of real manufacturing conditions. With FuILT, a practical full-chip ILT-based mask optimization flow, we aim to extend academic research to real industrial bottlenecks. First, we build a multi-level partitioning strategy with a divide-and-conquer mindset to tackle the full-chip ILT problem. Second, we implement a workload distribution framework to maintain hardware efficiency with scalable multi-GPU parallelism. Third, we propose a gradient-fusion technique and a multi-level healing strategy to fix boundary errors at different levels. Our experimental results on different layers from real designs show that FuILT is both effective and generalizable.

3. "Slack Redistributed Register Clustering with Mixed-Driving Strength Multi-bit Flip-Flops", Yen-Yu Chen, Hao-Yu Wu, Iris Hui-Ru Jiang, Cheng-Hong Tsai and Chien-Cheng Wu (National Taiwan University, Global Unichip Corp.)

Abstract: Register clustering is an effective technique for suppressing the increasing dynamic power ratio in modern IC design. By clustering registers (flip-flops) into multi-bit flip-flops (MBFFs), clock circuitry can be shared, and the number of clock sinks and buffers can be lowered, thereby reducing power consumption. Recently, the use of mixed-driving strength MBFFs has provided more flexibility for power and timing optimization. Nevertheless, existing register clustering methods usually employ evenly distributed and invariant path slack strategies. Unlike them, in this work we propose a register clustering algorithm with slack redistribution at the post-placement stage. Our approach allows registers to borrow slack from connected paths, makes it possible to cluster with neighboring maximal cliques, and releases extra slack. An adaptive interval graph based on the red-black tree is developed to efficiently adapt the timing-feasible regions of flip-flops for slack redistribution. An attraction-repulsion force model is tailored to judiciously select the flip-flops to be included in each MBFF. Experimental results show that our approach outperforms state-of-the-art work in terms of clock power reduction, timing balancing, and runtime.
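To illustrate why an interval graph is a natural structure here, the sketch below assumes a one-dimensional simplification: each flip-flop's timing-feasible region is projected onto one axis as a closed interval. Flip-flops whose intervals share a common point form a clique in the interval graph and could, in principle, be merged into one MBFF placed at that point. The sort-and-sweep below finds the largest such group. It is a textbook technique swapped in for illustration, not the paper's adaptive red-black-tree interval graph, and it ignores slack redistribution entirely. The function name and intervals are hypothetical.

```python
from typing import List, Optional, Tuple


def max_overlap_group(
    intervals: List[Tuple[float, float]],
) -> Tuple[Optional[float], List[int]]:
    """Return a point covered by the most closed intervals, and the indices covering it.

    In an interval graph this group is a maximum clique.
    """
    events = []
    for i, (lo, hi) in enumerate(intervals):
        events.append((lo, 0, i))  # 0 = start: sorts before an end at the same x,
        events.append((hi, 1, i))  # so touching closed intervals count as overlapping
    events.sort()

    active = set()
    best_point: Optional[float] = None
    best_group: List[int] = []
    for x, kind, i in events:
        if kind == 0:
            active.add(i)
            if len(active) > len(best_group):
                best_point, best_group = x, sorted(active)
        else:
            active.discard(i)
    return best_point, best_group


# Hypothetical feasible x-ranges for four flip-flops.
ranges = [(0, 4), (2, 6), (3, 5), (7, 9)]
print(max_overlap_group(ranges))  # (3, [0, 1, 2])
```

Handling a start event before an end event at the same coordinate is the design choice that makes closed intervals that merely touch count as compatible.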

10:50 - 11:00: Break

11:00 - 12:00: Timing Optimization

Chair: Pei-Yu Lee (Synopsys)

1. "Calibration-Based Differentiable Timing Optimization in Non-linear Global Placement", Wuxi Li, Yuji Kukimoto, Gregory Servel, Ismail Bustany and Mehrdad E. Dehkordi (AMD)

Abstract: Placement plays a crucial role in the timing closure of integrated circuit (IC) physical design. This paper presents an efficient and effective calibration-based differentiable timing-driven global placement engine. Our key innovation is a calibration technique that approximates a precise but expensive reference timer, such as a sign-off timer, using a lightweight simple timer. This calibrated simple timer inherently accounts for the intricate timing exceptions and common path pessimism removal (CPPR) prevalent in industry designs. Extending this calibrated simple timer into a differentiable timing engine enables ultrafast yet accurate timing optimization in non-linear global placement. Experimental results on various industry designs demonstrate the superiority of the proposed framework over the latest AMD Vivado and traditional net-weighting methods across key metrics including maximum clock frequency, wirelength, routability, and overall back-end runtime.
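The calibration idea can be pictured with a deliberately simple stand-in: fit a mapping from the cheap timer's delays to the reference timer's delays, then apply that mapping during optimization. The sketch below assumes a single global linear fit, reference ≈ a · simple + b, computed by ordinary least squares. The paper's calibration is not described at this level, and this sketch does not model timing exceptions or CPPR. The delay values and function names are hypothetical.

```python
from typing import List, Tuple


def fit_calibration(simple: List[float], reference: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of reference ~= a * simple + b."""
    n = len(simple)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_s = sum(simple) / n
    mean_r = sum(reference) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(simple, reference))
    var = sum((s - mean_s) ** 2 for s in simple)
    if var == 0:
        raise ValueError("simple-timer delays are all identical; slope is undefined")
    a = cov / var
    b = mean_r - a * mean_s
    return a, b


def calibrated(delay: float, a: float, b: float) -> float:
    """Map a simple-timer delay onto the reference timer's scale."""
    return a * delay + b


# Hypothetical path delays (ns) from a cheap timer and a sign-off-style reference.
simple_delays = [1.0, 2.0, 3.0, 4.0]
reference_delays = [1.3, 2.5, 3.7, 4.9]
a, b = fit_calibration(simple_delays, reference_delays)
print(round(a, 3), round(b, 3))
print(round(calibrated(2.5, a, b), 3))
```

On these toy values the fit recovers a ≈ 1.2 and b ≈ 0.1. Because the calibrated delay is an affine function of the simple delay, it stays differentiable, which is the property a gradient-based placer needs.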

2. "Novel Airgap Insertion and Layer Reassignment for Timing Optimization Guided by Slack Dependency", Wei-Chen Tai, Min-Hsien Chung and Iris Hui-Ru Jiang (National Taiwan University)

Abstract: BEOL with airgap technology is an alternative metallization option with promising performance, electrical yield, and reliability to explore at the 2nm node and beyond. Airgaps form cavities in the inter-metal dielectric (IMD) between interconnects. The ultra-low dielectric constant reduces line-to-line capacitance, thus shortening interconnect delay. Shorter interconnect delay benefits setup timing but harms hold timing. To minimize additional manufacturing cost, the number of metal layers that accommodate airgaps is practically limited. Hence, circuit timing optimization at post-routing can be achieved by judiciously performing airgap insertion and layer reassignment on timing-critical nets. In this paper, we present a novel and fast airgap insertion approach for timing optimization. A Slack Dependency Graph (SDG) is constructed to capture the timing slack relationships of a circuit in terms of path segments. With the global view provided by the SDG, we can avoid ineffective optimizations. Our Linear Programming (LP) formulation simultaneously solves airgap insertion and layer reassignment and allows a flexible amount of airgap to be inserted. Both SDG updates and LP solving can be done extremely fast. Experimental results show that our approach outperforms the state-of-the-art work on both total negative slack (TNS) and worst negative slack (WNS) with a more than 89X speedup.
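The setup/hold trade-off mentioned above follows from the standard slack definitions. The sketch below assumes ideal clocks (no skew), so setup slack = (period − t_setup) − delay and hold slack = delay − t_hold. It also models an airgap as a hypothetical 10% reduction of the whole path delay. In reality only the interconnect portion on airgap layers shrinks. All numbers are illustrative, and the code is not the paper's SDG or LP formulation.

```python
def setup_slack(period: float, data_delay: float, t_setup: float) -> float:
    """Setup slack with ideal clocks: required time (period - t_setup) minus arrival."""
    return (period - t_setup) - data_delay


def hold_slack(data_delay: float, t_hold: float) -> float:
    """Hold slack with ideal clocks: data arrival minus the hold requirement."""
    return data_delay - t_hold


def with_airgap(delay: float, reduction: float) -> float:
    """Scale a path delay by a hypothetical fractional reduction (0.10 = 10%)."""
    return delay * (1.0 - reduction)


# Hypothetical paths (ns): a long path failing setup and a short path near its hold limit.
period, t_setup, t_hold = 1.0, 0.1, 0.1
for name, d in [("long", 0.95), ("short", 0.12)]:
    d_air = with_airgap(d, 0.10)
    print(
        name,
        f"setup {setup_slack(period, d, t_setup):+.3f} -> {setup_slack(period, d_air, t_setup):+.3f}",
        f"hold {hold_slack(d, t_hold):+.3f} -> {hold_slack(d_air, t_hold):+.3f}",
    )
```

The same delay reduction moves the long path from negative to positive setup slack, while shrinking the short path's hold margin from about 0.020 ns to about 0.008 ns. This is why airgaps should be applied selectively rather than everywhere.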

3. "Parallel and Heterogeneous Timing Analysis: Partition, Algorithm, and System", Tsung-Wei Huang, Boyang Zhang, Dian-Lun Lin, and Cheng-Hsiang Chiu (University of Wisconsin at Madison)

Abstract: Static timing analysis (STA) is an integral part of the overall design flow because it verifies the expected timing behavior of a circuit. However, as circuit complexity continues to grow, there is an increasing need to enhance the performance of existing STA algorithms using emerging heterogeneous parallelism that comprises manycore central processing units (CPUs) and graphics processing units (GPUs). In this paper, we introduce several state-of-the-art STA techniques, including task-based parallelism, task graph partitioning, and GPU kernel algorithms, all of which have brought significant performance benefits to STA applications. Motivated by these successful results, we will introduce a task-parallel programming system that generalizes our solutions to benefit broader scientific computing applications.
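At its core, STA propagates the latest arrival time through a directed acyclic timing graph in topological order, and then computes slack as required time minus arrival time. The minimal sequential sketch below uses Python's standard `graphlib.TopologicalSorter`, which requires Python 3.9 or later. The graph, pin names, and delays are hypothetical. This is not the authors' task-parallel or GPU implementation.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


def propagate_arrival(
    edges: List[Tuple[str, str, float]], primary_inputs: Dict[str, float]
) -> Dict[str, float]:
    """Latest arrival time at every pin, computed in one topological sweep."""
    preds: Dict[str, List[Tuple[str, float]]] = {}
    for u, v, delay in edges:
        preds.setdefault(v, []).append((u, delay))
        preds.setdefault(u, [])  # make sure source pins appear as nodes too
    # TopologicalSorter takes a mapping: node -> iterable of its predecessors.
    sorter = TopologicalSorter({v: [u for u, _ in ps] for v, ps in preds.items()})
    arrival: Dict[str, float] = {}
    for node in sorter.static_order():
        if not preds[node]:
            arrival[node] = primary_inputs.get(node, 0.0)
        else:
            arrival[node] = max(arrival[u] + d for u, d in preds[node])
    return arrival


# Hypothetical timing graph: (from_pin, to_pin, delay in ns).
graph = [("a", "g1", 1.0), ("b", "g1", 2.0), ("g1", "out", 1.5), ("a", "out", 3.0)]
arrival = propagate_arrival(graph, {"a": 0.0, "b": 0.5})
required_out = 3.5
print(arrival["out"], required_out - arrival["out"])  # 4.0 -0.5
```

Pins whose predecessors are all finished can be evaluated independently. `TopologicalSorter` also exposes `prepare()`, `get_ready()`, and `done()` to hand out such batches, which gives a small-scale picture of the task parallelism the talk generalizes.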

12:00 - 13:00: Lunch

13:00 - 14:00: Panel: EDA Challenges at Advanced Technology Nodes

Chair: Tung-Chieh Chen (Synopsys)

We have gathered a panel of experts who will delve into the electronic design automation (EDA) challenges at advanced technology nodes. In this rapidly evolving field, the race towards miniaturization presents new hurdles and complexities. With advanced nodes shrinking, design and technology co-optimization becomes an increasingly intricate task. Our panelists will share their unique insights on how they approach these challenges and the impact of complicated design rules on the design process. As designs grow larger and more complex, novel strategies and methodologies are necessary to address the runtime issues of EDA tools. Our discussion will explore these strategies, including promising emerging technologies such as multi-machine or GPU acceleration that show potential in mitigating runtime challenges. We will also delve into the scaling and runtime challenges at multi-million gates and beyond, the current state of EDA tools for data-sharing for machine learning/AI across tools, and the major new issues that need to be addressed for more optimal design in the 2nm to 5nm process technology nodes. Finally, we will discuss the high-priority research topics that need to be tackled to address advanced technology nodes. We look forward to an enlightening discussion filled with expert insights and thought-provoking ideas.

Panelists:

Andrew Kahng (University of California at San Diego)
Bei Yu (Chinese University of Hong Kong)
Eugene Liu (Siemens EDA)
Guang-Wan Liao (Cadence)
I-Lun Tseng (MediaTek)
Keh-Jeng Chang (TSMC)

14:00 - 14:10: Break

14:10 - 15:30: 3D IC

Chair: Chung-Ching Peng (Intel)

1. "Routing-aware Legal Hybrid Bonding Terminal Assignment for 3D Face-to-Face Stacked ICs", Siting Liu, Jiaxi Jiang, Zhuolun He, Ziyi Wang, Yibo Lin, Bei Yu and Martin Wong (The Chinese University of Hong Kong, Peking University)

Abstract: Face-to-face (F2F) stacked 3D IC is a promising alternative for scaling beyond Moore's Law. In F2F 3D ICs, dies are connected through bonding terminals whose positions can significantly impact routing performance. Further, there exists resource competition among all the 3D nets due to the constrained bonding terminal number. In advanced technology nodes, such 3D integration may also introduce legality challenges of bonding terminals, as the metal pitches can be much smaller than the sizes of bonding terminals. Previous works attempt to insert bonding terminals automatically using existing 2D commercial P&R tools and then consider inter-die connection legality, but they fail to take the legality and routing performance into account simultaneously. In this paper, we explore the formulation of the generalized assignment in the hybrid bonding terminal assignment problem. Our framework, BTAssign, offers a strict legality guarantee and an iterative solution. The experiments are conducted on 18 open-source designs with various 3D net densities and the most advanced bonding scale. The results reveal that BTAssign can achieve improvements in routed wirelength under all testing conditions from 1.0% to 5.0% with a tolerable runtime overhead.

2. "Unified 3D-IC Multi-Chiplet System Design Solution", Thunder Lay (Cadence)

Abstract: With the advancements in 2.5D/3D fabrication offered by foundry technologies for unleashing computing power, EDA tools must adapt and become more integrated and IC-centric for multi-chiplet system design. 3D stacking introduces extra design and analysis requirements, such as full system planning, power and thermal analysis, cross-die STA, and inter-die physical verification, which have to be taken into account early during planning and implementation. In this paper, Cadence presents its technology that proactively looks ahead through integrated early analysis and comprehensively addresses all aspects of 3D-IC design, from system planning and implementation to analysis and system-level signoff.

3. "Warpage Study by Employing an Advanced Simulation Methodology for Assessing Chip Package Interaction Effects", Jun-Ho Choy (Siemens EDA)

Abstract: A physics-based multi-scale simulation methodology that analyses die stress variations generated by package fabrication is employed for a warpage study. The methodology combines a coordinate-dependent anisotropic effective-properties extractor with a finite element analysis (FEA) engine, and computes mechanical stress globally on a package scale as well as locally on a feature scale. For mechanical failure analysis in the early stage of package design, warpage measurements were used to calibrate the tool: measurements on printed circuit board (PCB), interposer, and chiplet samples during heating and subsequent cooling were used to calibrate the model parameters. The warpage simulation results for the full package, represented by a PCB-interposer-chiplet stack, show good overall agreement with the measured profile. The study demonstrates that the developed electronic design automation (EDA) tool and methodology can be used for accurate warpage prediction in different types of IC stacks at an early stage of package design.

4. "Enabling System Design in 3D Integration: Technologies and Methodologies", Hung-Ming Chen (National Yang Ming Chiao Tung University)

Abstract: 3D integration solutions have long been called for in the semiconductor market as a possible substitute for technology scaling. They comprise 3D IC packaging, 3D IC integration, and 3D silicon integration. 3D IC packaging is already on the market, but 3D IC and silicon integration have received more attention due to modern system requirements for high-performance computing and edge AI applications. Driven by the need for further integration in electronic system development at lower cost, chip and package design have been evolving over the years [11].
In the first manuscript of this series [2], we introduced some perspectives on 3D integration for system design; in this talk, we continue to depict the future of 3D integration as we see it, together with new observations, technologies, and methodologies such as the programmable package and a building-block-based multi-chiplet methodology.

15:30 - 15:50: Break

15:50 - 17:30: Artificial Intelligence and Machine Learning

Chair: Shao-Yun Fang (National Taiwan University of Science and Technology)

1. "FastTuner: Transferable Physical Design Parameter Optimization using Fast Reinforcement Learning", Hao-Hsiang Hsiao, Yi-Chen Lu, Pruek Vanna-Iampikul and Sung Kyu Lim (Georgia Institute of Technology)

Abstract: Current state-of-the-art Design Space Exploration (DSE) methods in Physical Design (PD), including Bayesian optimization (BO) and Ant Colony Optimization (ACO), mainly rely on black-box rather than parametric (e.g., neural network) approaches to improve end-of-flow Power, Performance, and Area (PPA) metrics. These methods often fail to generalize to unseen designs because netlist features are not properly leveraged. To overcome this issue, we develop a Reinforcement Learning (RL) agent that leverages Graph Neural Networks (GNNs) and Transformers to perform "fast" DSE on unseen designs by sequentially encoding netlist features across different PD stages. In particular, an attention-based encoder-decoder framework is devised for "conditional" parameter tuning, and a PPA estimator is introduced to predict end-of-flow PPA metrics for RL reward estimation. Extensive studies across 7 industrial designs in the TSMC 28nm technology node demonstrate that the proposed framework, FastTuner, significantly outperforms existing state-of-the-art DSE techniques in both optimization quality and runtime, with improvements of up to 79.38% in Total Negative Slack (TNS), 12.22% in total power, and 50x in runtime.

2. "Methodology of Resolving Design Rule Checking Violations Coupled with Fully Compatible Prediction Model", Suwan Kim, Hyunbum Park, Kyeonghyeon Baek, Kyu-Myung Choi and Taewhan Kim (Seoul National University)

Abstract: Resolving design rule checking (DRC) violations at the pre-route stage is critically important to reduce the time-consuming design closure process at the post-route stage. Recently, noticeable methodologies have been proposed to predict DRC hotspots using Machine Learning based prediction models. However, little attention has been paid to how the predicted DRC violations can be effectively resolved. In this paper, we propose a pre-route DRC violation resolution methodology that is tightly coupled with a fully compatible prediction model. Specifically, we devise different resolution strategies for two types of DRC violations: (1) pin accessibility (PA)-related and (2) routing congestion (RC)-related. To this end, we develop an ML-based model that predicts both PA- and RC-related DRC violations, and we apply completely different resolution techniques depending on the violation type reported by the model. For (1) PA-related DRC violations, we extract the DRC violation mitigating regions and then improve placement by formulating whitespace redistribution on those regions as a Bayesian Optimization problem that produces an optimal cell perturbation. For (2) RC-related DRC violations, we manipulate the routing resources within the regions that have a high potential for RC-related DRC violations. Experiments show that our methodology reduces the number of DRC violations by 26.54%, 25.28%, and 20.34% more on average than a conventional flow with no resolution, a commercial ECO router, and a state-of-the-art academic predictor/resolver, respectively, while maintaining comparable design quality.

3. "AI for EDA/Physical Design: Driving the AI Revolution: The Crucial Role of 3D-IC", Erick Chao (Cadence)

Abstract: 3D Integrated Circuits (3D-ICs) represent a significant advancement in semiconductor technology, offering enhanced functionality in smaller form factors, improved performance, and cost reductions. These 3D-ICs, particularly those utilizing Through-Silicon Vias (TSVs), are at the forefront of industry trends. They enable the integration of system components from various process nodes, including analog and RF, without being limited to a single node. TSVs outperform wire-bonded System in Package (SiP) in terms of reduced RLC parasitics, offering better performance, greater power efficiency, and denser implementation. Compared to silicon interposer methods, vertical 3D die stacking achieves higher integration levels, smaller sizes, and quicker design cycles. This presentation introduces a novel AI-driven method designed to tackle the challenges hindering the automation of 3D-IC design flows.

4. "DSO.ai - A Distributed System to Optimize Physical Design Flows", Piyush Verma (Synopsys)

Abstract: The VLSI chip design process consists of a sequence of distinct steps such as floorplanning, placement, clock tree synthesis, and routing. Each of these steps requires solving optimization problems that are often NP-hard, and state-of-the-art algorithms are not guaranteed to be optimal. Because the design flow is compartmentalized into distinct steps, these optimization problems are solved sequentially, with the output of each step feeding into the next. This results in an inherent inefficiency: the optimization goal of an early step is estimated using a fast and approximate surrogate model of the following steps. Consequently, any improvement in a step-specific optimization algorithm, while obvious at that step, is much smaller when measured at the end of the full design flow. For example, the placement step minimizes wire length. In the absence of routed nets, this wire length might be estimated using a simple wire length model such as the Steiner tree. Thus, any improvement in the placement algorithm is limited by the accuracy of the wire length estimate.
In this presentation, we demonstrate how DSO.ai provides a flexible framework to integrate with existing design flows and serve the design quality needs throughout the design evolution cycle. We will also highlight how DSO.ai is allowing expert designers at Synopsys to package their knowledge into fully featured toolboxes ready to be deployed by novice designers. We will provide a summary of QoR gains that DSO.ai has delivered on advanced process nodes. Finally, we will show how DSO.ai's decision engine is paving the way to automating the parameter choices in the chip design flow.
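The placement example above turns on how wire length is estimated before routing. The sketch below compares two common surrogate estimators for a single net: half-perimeter wirelength (HPWL) and the rectilinear minimum spanning tree (RMST), computed with Prim's algorithm. Exact rectilinear Steiner minimum trees are NP-hard to compute, so placers typically rely on estimates like these. These are generic textbook estimators, not anything specific to DSO.ai, and the pin coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def hpwl(pins: List[Point]) -> int:
    """Half-perimeter of the pins' bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def rmst_length(pins: List[Point]) -> int:
    """Rectilinear (Manhattan) minimum spanning tree length via O(n^2) Prim's algorithm."""
    n = len(pins)
    in_tree = [False] * n
    dist = [float("inf")] * n
    dist[0] = 0
    total = 0
    for _ in range(n):
        # Pick the closest pin not yet connected to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        total += dist[u]
        for v in range(n):
            if not in_tree[v]:
                d = abs(pins[u][0] - pins[v][0]) + abs(pins[u][1] - pins[v][1])
                if d < dist[v]:
                    dist[v] = d
    return total


net = [(0, 0), (4, 0), (2, 3)]  # hypothetical three-pin net
print(hpwl(net), rmst_length(net))  # 7 9
```

For this net, the optimal Steiner tree (with a Steiner point at (2, 0)) has length 7. HPWL matches it exactly, as it does for every net of at most three pins, while the spanning tree overestimates it by about 29%. For larger nets, HPWL can only underestimate the Steiner length, and the RMST is never more than 1.5 times it. This gap is the kind of estimation error that limits how much a placement improvement survives to the end of the flow.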

5. "Solvers, Engines, Tools and Flows: The Next Wave for AI/ML in Physical Design", Andrew Kahng (UC San Diego)

Abstract: It has been six years since an ISPD-2018 invited talk on "Machine Learning Applications in Physical Design" [25]. Since then, despite considerable activity across both academia and industry, many R&D targets remain open. At the same time, there is now clearer understanding of where AI/ML can and cannot (yet) move the needle in physical design, as well as some of the difficult blockers and technical challenges that lie ahead. Some futures for AI/ML-boosted physical design are visible across solvers, engines, tools and flows -- and in contexts that span generative AI, the modeling of "magic" handoffs at flow interstices, academic research infrastructure, and the culture of benchmarking and open-source EDA.


Day 2 Thursday, March 14, 2024

8:30 - 9:20: Keynote

Chair: Tsung-Yi Ho (Chinese University of Hong Kong)

"Physical Design Challenges in Modern Heterogeneous Integration", Yao-Wen Chang (National Taiwan University) [abstract]

Abstract: To achieve the power, performance, and area (PPA) target in modern semiconductor design, the trend to go for More-than-Moore heterogeneous integration by packing various components/dies into a package becomes more obvious as the economic advantages of More-Moore scaling for on-chip integration are getting smaller and smaller. In particular, we have already encountered the high cost of moving to more advanced technology and the high fabrication cost associated with extreme ultraviolet (EUV) lithography, mask, process, design, electronic design automation (EDA), etc. Heterogeneous integration refers to integrating separately manufactured components into a higher-level assembly (in a package or even multiple packages in a PCB) that provides enhanced functionality and improved operating characteristics. Unlike the on-chip designs with relatively regular components and wirings, the physical design problem for heterogeneous integration often needs to handle arbitrary component shapes, diverse metal wire widths, and different spacing requirements between components, wire metals, and pads, with multiple cross-physics domain considerations such as system-level, physical, electrical, mechanical, thermal, and optical effects, which are not well addressed in the traditional chip design flow. In this paper, we first introduce popular heterogeneous integration technologies and options, their layout modeling and physical design challenges, survey key published techniques, and provide future research directions for modern physical design for heterogeneous integration.

9:20 - 10:20: Analog

Chair: Jens Lienig (Dresden University of Technology)

1. "Fundamental Differences Between Analog and Digital Design Problems - an Introduction", Jürgen Scheible (Reutlingen University)

Abstract: This article discusses fundamental differences between analog and digital circuits from a design perspective. On this basis one can understand why the design flows of these two circuit types differ so greatly, notably with regard to their degree of automation.

2. "Layout Verification using Open-Source Software", Andreas Krinke (Dresden University of Technology)

Abstract: The design and manufacturing of integrated circuits is an expensive endeavor. The use of open-source software can lower the barrier to entry significantly, especially for smaller companies or startups. In this paper, we look at open-source software for layout verification, a crucial step in ensuring the consistency and manufacturability of a design. We show that a comprehensive design rule check (DRC) and layout versus schematic (LVS) check for commercial technologies is possible with open-source software in general and with KLayout in particular. To facilitate the use of these tools, we present our approach to automatically generate the required DRC scripts from a more abstract representation. As a result, we are able to generate nearly 74% of the more than 1000 design rules of X-FAB's XH018 180nm technology as a DRC script for the open-source software KLayout. This demonstrates the potential of using open-source software for layout verification and open-source process design kits (PDKs) in general.
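Generating DRC scripts from an abstract rule representation can be pictured as a small code generator. The sketch below assumes a hypothetical rule record (name, layer, check type, minimum value in micrometres) and emits lines in KLayout's DRC script language using its `input`, `width`, `space`, and `output` functions. The layer names, GDS numbers, rule names, and values are invented for illustration and do not come from X-FAB's PDK. The sketch also does not reproduce the authors' generator. Only width and spacing checks are handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rule:
    name: str        # hypothetical rule identifier, e.g. "M1.W"
    layer: str       # logical layer name; must be a valid lowercase Ruby identifier
    check: str       # "width" or "space" in this sketch
    value_um: float  # minimum value in micrometres


def to_klayout_drc(rules: List[Rule], layer_map: Dict[str, Tuple[int, int]]) -> str:
    """Emit a KLayout DRC script (Ruby DSL) for simple width/space rules."""
    lines = []
    for name, (gds_layer, datatype) in layer_map.items():
        lines.append(f"{name} = input({gds_layer}, {datatype})")
    for r in rules:
        if r.check not in ("width", "space"):
            raise ValueError(f"unsupported check: {r.check}")
        if r.layer not in layer_map:
            raise KeyError(f"rule {r.name} uses unmapped layer {r.layer}")
        lines.append(
            f'{r.layer}.{r.check}({r.value_um}).output("{r.name}", '
            f'"{r.layer} {r.check} < {r.value_um} um")'
        )
    return "\n".join(lines)


# Hypothetical layer map and rules.
layers = {"metal1": (34, 0)}
rule_deck = [Rule("M1.W", "metal1", "width", 0.23), Rule("M1.S", "metal1", "space", 0.23)]
print(to_klayout_drc(rule_deck, layers))
```

Keeping rules as data separates rule entry from script syntax. Supporting another check type then means adding one emitter branch rather than hand-editing hundreds of script lines.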

3. "Reinforcement Learning or Simulated Annealing for Analog Placement? A Study Based on Bounded-Sliceline Grids", Mark Po-Hung Lin, Chou-Chen Lee and Yi-Chao Hsieh (National Yang Ming Chiao Tung University, Novatek Microelectronics Corp.)

Abstract: Analog placement is a crucial phase in analog integrated circuit synthesis, impacting the quality and performance of the final circuits. This process involves determining the physical positions of analog building blocks while minimizing chip area and interconnecting wire-length. Existing methodologies often rely on the simulated-annealing (SA) approach, prioritizing constraints like symmetry-island, proximity, and well-island. We present a novel reinforcement learning (RL) based analog placement methodology on the bounded-sliceline grid (BSG) structure. Introducing a hierarchical clustering feature in BSG, we address well-island, proximity, and symmetry constraints. In experimental comparisons with the SA approach, our RL-based method exhibits superior placement quality across various analog circuits.

10:20 - 10:40: Break

10:40 - 12:00: Placement

Chair: Bill Swartz (Timberwolf and University of Texas at Dallas)

1. "Practical Mixed-Cell-Height Legalization Considering Vertical Cell Abutment Constraint", Teng-Ping Huang and Shao-Yun Fang (National Taiwan University of Science and Technology)

Abstract: Propelled by aggressive technology scaling, the adoption of mixed-cell-height designs in VLSI circuits has made conventional single-row-based cell legalization techniques obsolete. Furthermore, the vertical abutment constraint (VAC) among cells on consecutive rows has emerged as an advanced design requirement. It has rarely been considered because, in conventional process nodes, the power/ground rails were tall enough to isolate cells on different rows. Although there have been a number of studies on mixed-cell-height legalization, most of them cannot be trivially extended to handle the general VAC well because of their analytical optimization scheme. To address these issues, this work proposes the first mixed-cell-height legalization algorithm that addresses the general inter-row cell abutment constraint (i.e., VAC). The experimental results show that the proposed algorithm outperforms previous mixed-cell-height legalization works, even in the absence of the VAC. When the VAC is applied, our algorithm offers superior performance and delivers promising results.

2. "Multi-Electrostatics Based Placement for Non-Integer Multiple-Height Cells", Yu Zhang, Yuan Pu, Fangzhou Liu, Peiyu Liao, Kaiyuan Chao, Keren Zhu, Yibo Lin and Bei Yu (The Chinese University of Hong Kong, Huawei Technologies Noah's Ark Lab, Peking University)

Abstract: A circuit design incorporating non-integer multi-height (NIMH) cells, such as a combination of 8-track and 12-track cells, offers increased flexibility in optimizing area, timing, and power simultaneously. The conventional approach for placing NIMH cells uses commercial tools to generate an initial global placement. A legalization process then divides the block area into row regions with specific heights and relocates cells to rows of matching height. However, such a placement flow often significantly disrupts the initial placement results, resulting in inferior wirelength. To address this issue, we propose a novel multi-electrostatics-based global placement algorithm that utilizes an NIMH-aware clustering method to dynamically generate rows. This algorithm directly tackles the global placement problem with NIMH cells. Specifically, we utilize an augmented Lagrangian formulation along with a preconditioning technique to achieve high-quality solutions with fast and robust numerical convergence. Experimental results on the OpenCores benchmarks demonstrate that our algorithm improves HPWL by about 12% with a 23.5X speedup on average, outperforming state-of-the-art approaches. Furthermore, our placement solutions improve WNS and TNS substantially, by 22% and 49% respectively. These results affirm the efficiency and effectiveness of our proposed algorithm in solving row-based placement problems for NIMH cells.

3. "IncreMacro: Incremental Macro Placement Refinement", Yuan Pu, Tinghuan Chen, Zhuolun He, Chen Bai, Haisheng Zheng, Yibo Lin and Bei Yu (The Chinese University of Hong Kong, Shanghai AI Laboratory, Peking University) (Best Paper Candidate)

Abstract: This paper proposes IncreMacro, a novel approach for macro placement refinement in the context of integrated circuit (IC) design. The suggested approach iteratively and incrementally optimizes the placement of macros in order to enhance IC layout routability and timing performance. To achieve this, IncreMacro utilizes several methods including kd-tree-based macro diagnosis, gradient-based macro shifting and constraint-graph-based LP for macro legalization. By employing these techniques iteratively, IncreMacro meets two critical solution requirements of macro placement: (1) pushing macros to the chip boundary; and (2) preserving the original macro relative positional relationship. The proposed approach has been incorporated into DREAMPlace and AutoDMP, and is evaluated on several RISC-V benchmark circuits at the 7-nm technology node. Experimental results show that, compared with the macro placement solution provided by DREAMPlace (AutoDMP), IncreMacro reduces routed wirelength by 6.5% (16.8%), improves the routed worst negative slack (WNS) and total negative slack (TNS) by 59.9% (99.6%) and 63.9% (99.9%), and reduces the total power consumption by 3.3% (4.9%).

4. "Timing-Driven Analytical Placement According to Expected Cell Distribution Range", Jai-Ming Lin, You-Yu Chang and Wei-Lun Huang (National Cheng Kung University) [abstract]

Abstract: Since the multilevel framework with the analytical approach has proven to be a promising method for the very-large-scale integration (VLSI) placement problem, this paper presents two techniques, a pin-connectivity-aware cluster score function and the identification of expected object distribution ranges, to further improve the coarsening and refinement stages of this framework. Moreover, we extend the proposed analytical placement method to consider timing in order to speed up design convergence. To optimize timing without increasing wirelength, our approach increases the weights of timing-critical nets only, where the weight of a net is estimated from its timing slack and degree. In addition, we propose a new equation that updates net weights based on their historical values to keep the net-based timing-driven placement approach stable. Experimental results demonstrate that the proposed analytical placement approach with these new techniques improves wirelength over the classic approach. Moreover, our timing-driven placement (TDP) approach achieves much better WNS and TNS than previous timing-driven placers such as DREAMPlace4.0 and Differentiable TDP.
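The placement abstracts in this session report wirelength results, and the NIMH placement abstract quotes HPWL, the standard half-perimeter wirelength estimate: for each net, the width plus the height of the smallest axis-aligned box enclosing its pins, summed over all nets. A minimal Python sketch, assuming pin locations are plain (x, y) coordinates; the `Pin` class and function names are illustrative and not taken from any of the papers:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    """Hypothetical pin location in placement coordinates."""
    x: float
    y: float


def net_hpwl(pins: list[Pin]) -> float:
    """Half-perimeter of the bounding box enclosing all pins of one net."""
    xs = [p.x for p in pins]
    ys = [p.y for p in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: list[list[Pin]]) -> float:
    """Sum of per-net HPWL; nets with fewer than two pins contribute nothing."""
    return sum(net_hpwl(pins) for pins in nets if len(pins) >= 2)


nets = [
    [Pin(0, 0), Pin(4, 3)],             # box 4 x 3 -> HPWL 7
    [Pin(1, 1), Pin(2, 5), Pin(6, 2)],  # box 5 x 4 -> HPWL 9
]
print(total_hpwl(nets))  # 16
```

HPWL equals the rectilinear Steiner tree length for two- and three-pin nets and is a lower bound for larger nets, which is why placers optimize it (or a smooth approximation of it) as a fast wirelength proxy.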

12:00 - 13:00: Lunch

13:00 - 14:00: Standard Cell, Routability, and IR Drop

Chair: William Chow (Cadence)

1. "Routability Booster - Synthesize a Routing Friendly Standard Cell Library by Relaxing BEOL Resources", Bing-Xun Song, Ting Xin Lin and Yih-Lang Li (National Yang Ming Chiao Tung University) [abstract]

Abstract: In recent years, pin accessibility has become a focal point of cell design and synthesis research. In this study, we propose a novel approach that improves routability in upper-level routing by eliminating one M1 track during cell synthesis, which frees space for upper-level routing. We achieve consolidated routability of transistor placement by integrating fast track assignment with dynamic-programming-based transistor placement. Additionally, we introduce a hybrid routing algorithm that identifies an optimal cell routing territory for each net. This optimal territory facilitates subsequent Steiner Minimum Tree (SMT) solutions for mixed integer linear programming (MILP) and constrains the routing region of MILP, accelerating execution. The proposed MILP approach enables concurrent routing planning and pin metal allocation, effectively resolving the chicken-or-egg causality dilemma. Experimental results demonstrate that, when the routing-friendly synthesized cell library is used, the routing quality in various designs surpasses that achieved with the handcrafted cell library in the ASAP7 PDK, as evidenced by wirelength, number of vias, and design rule check (DRC) violations.

2. "Novel Transformer Model Based Clustering Method for Standard Cell Design Automation", Chia-Tung Ho, Ajay Chandna, David Guan, Alvin Ho, Minsoo Kim, Yaguang Li and Haoxing Ren (Nvidia) (Best Paper Candidate) [abstract]

Abstract: Standard cells are essential components of modern digital circuit designs. With process technologies advancing beyond 5nm, more routability issues have arisen due to the decreasing number of routing tracks (RTs), the increasing number and complexity of design rules, and strict patterning rules. The standard cell design automation framework can automatically design standard cell layouts, but it struggles to resolve the severe routability issues in advanced nodes. As a result, a better and more efficient standard cell design automation method is needed, one that not only resolves the routability issue but also scales to hundreds of transistors to shorten the development time of standard cell libraries. High-quality device clustering that considers routability in the layouts of different technology nodes can reduce complexity and help find routable layouts faster. In this paper, we develop a novel transformer model-based clustering methodology that trains the model on LVS/DRC-clean cell layouts and leverages personalized PageRank vectors to cluster devices, with attention to the netlist graph and to embeddings learned from actual LVS/DRC-clean layouts. On a benchmark of 94 complex and hard-to-route standard cells, the proposed method not only generates 15% more LVS/DRC-clean layouts but also runs 12.7X faster on average than previous work. The proposed method can generate 100% LVS/DRC-clean cell layouts for over 1000 standard cells and achieves a 14.5% smaller cell width than an industrial standard cell library.

3. "Power Sub-Mesh Construction in Multiple Power Domain Design with IR Drop and Routability Optimization", Chien-Pang Lu, Iris Hui-Ru Jiang, Chung-Ching Peng, Mohd Mawardi Mohd Razha and Alessandro Uber (Intel, National Taiwan University) [abstract]

Abstract: Multiple power domain design is prevalent for achieving aggressive power savings. In such designs, power delivery to cross-domain cells poses a tough challenge at advanced technology nodes because of the stringent IR drop constraint and the competition for routing resources between secondary power routing and regular signal routing. Nevertheless, this challenge has rarely been studied in recent literature. Therefore, in this paper, we explore power sub-mesh construction to mitigate the IR drop issue for cross-domain cells and minimize its routing overhead. With the aid of physical, power, and timing-related features, we train an IR drop prediction model and a design rule violation prediction model under power sub-meshes of various densities. The trained models effectively guide sub-mesh construction for cross-domain cells, budgeting routing resource usage between secondary power routing and signal routing. Our experiments are conducted on industrial mobile designs manufactured in a 6nm process. Experimental results show that the IR drop of cross-domain cells, the routing resource usage, and the timing QoR are all promising after our proposed methodology is applied.
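The transformer-based clustering abstract in this session mentions personalized PageRank vectors. As a generic illustration only, and not the authors' pipeline, the sketch below computes a personalized PageRank vector by power iteration on a small undirected graph. The adjacency-list graph, the restart probability `alpha`, and the function name are assumptions chosen for illustration; nodes could stand for devices and edges for shared nets.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=200, tol=1e-12):
    """Personalized PageRank by power iteration on an undirected graph.

    adj:   dict mapping each node to a list of its neighbors (hypothetical
           device graph: nodes are devices, edges are shared nets).
    seed:  node the random walk restarts from (the personalization).
    alpha: probability of restarting at the seed at each step.
    """
    nodes = list(adj)
    rank = {v: 0.0 for v in nodes}
    rank[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[seed] += alpha  # restart mass (total mass stays 1)
        for u in nodes:
            if adj[u]:
                # Spread the non-restart mass evenly over u's neighbors.
                share = (1 - alpha) * rank[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Isolated node: return its mass to the seed.
                nxt[seed] += (1 - alpha) * rank[u]
        diff = sum(abs(nxt[v] - rank[v]) for v in nodes)
        rank = nxt
        if diff < tol:
            break
    return rank


# Path graph a - b - c - d, personalized to device "a".
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = personalized_pagerank(adj, "a")
print(sorted(scores, key=scores.get, reverse=True))  # ['b', 'a', 'c', 'd']
```

Here the seed's only neighbor `b` outranks the seed itself, because all of the seed's non-restart mass flows to `b`. Ranking nodes by such scores yields a local neighborhood around the seed, which is one common way PageRank-style vectors are used for graph clustering.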

14:00 - 14:10: Break

14:10 - 15:10: Thermal Analysis and Packaging

Chair: Yu-Min Lee (National Yang Ming Chiao Tung University)

1. "Introduction of 3D IC Thermal Analysis Flow", Alex Hung (Siemens EDA) [abstract]

Abstract: Thermal modeling of heterogeneous 2.5D/3D IC packages is important for several reasons. Designing a large, high-power device, e.g., an AI or HPC processor, without considering how to get the heat out is likely to lead to problems later on, resulting in a packaging solution that is sub-optimal in cost, size, weight, and performance. Combining thermal simulation with physical verification enables automatic extraction, power map generation, simulation of the complete 3D IC assembly, thermal map viewing, and hotspot mitigation. This makes the IC design flow aware of temperature and hotspots at an early design stage.

2. "3Dblox: Unleash the Ultimate 3DIC Design Productivity", Jim Chang (TSMC) [abstract]

Abstract: The 3DIC design world is blooming with new ideas and new possibilities. With TSMC's 3DFabric™ technology, new opportunities in architectural innovation have led to superior system performance and density. However, what comes with the new opportunities is the continuous rise in design complexity.
In this talk, we will introduce 3Dblox, our latest invention to ease the design complexity challenge. 3Dblox is an innovative design language that modularizes complex 3DIC structures to streamline the design flow, and it is open and free to all industry participants. All four EDA vendors, Cadence, Synopsys, Ansys, and Siemens, have actively participated in this effort to provide a unified design ecosystem that unleashes the ultimate 3DIC design productivity.

3. "Challenges for Automating PCB Layout", Wen-Hao Liu (Nvidia) [abstract]

Abstract: Printed circuit board (PCB) design is typically semi-automated or fully manual. However, in recent years, the scale of PCB designs has grown rapidly, so the engineering effort of manual design has increased dramatically, making automation critical. PCB houses are looking to automation for productivity improvements. In this talk, the speaker will give a short tutorial on how PCB design is done today and then, based on the speaker's experience and observations, introduce several challenges and opportunities for PCB design automation.

15:10 - 15:30: Break

15:30 - 17:30: Lifetime Achievement Session

Chair: Yao-Wen Chang (National Taiwan University)

1. "Scheduling and Physical Design", Jason Cong (University of California at Los Angeles) [abstract]

Abstract: In a typical integrated circuit electronic design automation (EDA) flow, scheduling is a key step in high-level synthesis, which is the first stage of the EDA flow and synthesizes a cycle-accurate register transfer level (RTL) description from the given behavioral description, while physical design is the last stage in the EDA flow and generates the final geometric layout of the transistors and wires for fabrication. As a result, scheduling and physical design are usually carried out independently. In this paper, I discuss multiple research projects that I have been involved with, where the interaction between scheduling and physical design is shown to be highly beneficial. I shall start with my very first paper in EDA, on multi-layer channel routing, which benefited from an unexpected connection to the optimal two-processor scheduling algorithm, a joint work with Prof. Martin Wong, who is being honored at this conference with the 2024 ISPD Lifetime Achievement Award. Then, I shall further demonstrate how scheduling can help overcome the interconnect bottleneck, enable parallel placement and routing, and, finally, play a key role in layout synthesis for quantum computing.

2. "Accelerating Physical Design from 1 to N", Evangeline Young (The Chinese University of Hong Kong) [abstract]

Abstract: Today, we have abundant parallel computing resources, yet most EDA tools still run sequentially. It is interesting to see how physical design can be advanced by leveraging this massive parallel computing power. Achieving significant speedup usually takes more than running a few copies of the same sequential method in parallel; it requires innovative parallel algorithms that solve the problem from a new perspective using different mechanisms. In this talk, we will look at a few examples in physical design and logic synthesis to illustrate some methodologies and techniques for parallelizing design automation.

3. "Pioneering Contributions of Professor Martin D. F. Wong to Automatic Floorplan Design", Ting-Chi Wang (National Tsing Hua University) [abstract]

Abstract: Professor Martin D. F. Wong is well recognized as a distinguished figure in the community of physical design, owing to his numerous and noteworthy contributions. This talk aims to highlight his pioneering works in the field of automatic floorplan design. Professor Wong's profound insights and innovative approaches have not only propelled advancements in the field but also served as an inspirational source for other researchers.

4. "My Journey in EDA", Martin D.F. Wong (Hong Kong Baptist University) [abstract]

Abstract: The 2024 International Symposium on Physical Design Lifetime Achievement Award goes to Professor Martin D. F. Wong for his outstanding contributions to the field. Professor Wong is a world-renowned scholar in the area of Electronic Design Automation (EDA), where he has made significant contributions to the algorithmic aspects of the physical design of silicon chips. A prolific scholar, he has published over 500 technical papers in top EDA journals and conferences. He has supervised and graduated over 50 PhD students, many of whom now hold leadership positions in industry and academia. Professor Wong is a Fellow of ACM and a Fellow of IEEE for his contributions to the algorithmic aspects of EDA. In December 2019, he was elected a Fellow of the Hong Kong Academy of Engineering Science.

18:30 - 20:30: Banquet

Host: Ting-Chi Wang (National Tsing Hua University)

Location: Le Méridien Taipei [2F, LEO]
台北寒舍艾美酒店-軒轅廳
No. 38, Songren Rd, Xinyi District, Taipei City, Taiwan 110


Day 3 Friday, March 15, 2024

8:30 - 9:20: Keynote

Chair: Chia-Lin Yang (National Taiwan University)

"Computing Architecture for Large-Language Models (LLMs) and Large Multimodal Models (LMMs)", Bor-Sung Liang (MediaTek) [abstract]

Abstract: Large language models (LLMs) have achieved remarkable performance in many AI applications, but they require very large numbers of parameters. Parameter counts range from several billion to a trillion, resulting in huge computation requirements for both training and inference. Generally speaking, LLMs add more parameters to explore "Emergent Abilities" of AI models. On the other hand, LLMs with fewer parameters aim to reduce the computing burden and democratize generative AI applications.
To fulfill these huge computation requirements, domain-specific architectures are important for co-optimizing AI models, hardware, and software designs, and for making trade-offs among different design parameters. In addition, there are trade-offs between AI computation throughput and energy efficiency across different types of AI computing systems.
Large Multimodal Models (LMMs), also called Multimodal Large Language Models, integrate multiple data types as input. Multimodal information can provide LMMs with rich environmental information to generate a better user experience. LMMs are also a trend for mobile devices, because mobile devices are often connected to many sensors, such as video, audio, touch, gyroscope, and navigation sensors.
These LLM/LMM trends and new usage scenarios will shape future computing architecture design. In this talk, we will discuss these issues, especially their impact on mobile processor design.

9:20 - 10:20: Quantum and Superconducting Circuits

Chair: Ben Trombley (IBM)

1. "SMT-Based Layout Synthesis Approaches for Quantum Circuits", Zi-Hao Guo and Ting-Chi Wang (National Tsing Hua University) [abstract]

Abstract: The physical qubits in current quantum computers do not all interact with each other. Therefore, when executing a quantum algorithm on an actual quantum computer, layout synthesis is a crucial step that ensures the synthesized circuit of the quantum algorithm can run smoothly on the quantum computer. In this paper, we focus on a layout synthesis problem for quantum circuits and improve a prior work, TB-OLSQ, which adopts a transition-based satisfiability modulo theories (SMT) formulation. We present how to modify TB-OLSQ to obtain an accelerated version for runtime reduction. In addition, we extend the accelerated version by considering gate absorption for better solution quality. Our experimental results show that, compared with TB-OLSQ, the accelerated version achieves a 121X speedup for a set of SWAP-free circuits and a 6X speedup for the other set of circuits with no increase in SWAP gates. In addition, the accelerated version with gate absorption reduces the number of SWAP gates by 38.9% for the circuits requiring SWAP gates, while also being 3X faster.

2. "Satisfiability Modulo Theories-Based Qubit Mapping for Trapped-Ion Quantum Computing Systems", Wei-Hsiang Tseng, Yao-Wen Chang and Jie-Hong Roland Jiang (National Taiwan University) [abstract]

Abstract: Qubit mapping is crucial in optimizing the performance of quantum algorithms for physical executions on quantum computing architectures. Many qubit mapping algorithms have been proposed for superconducting systems recently. However, because of their limited physical qubit connectivity, costly SWAP gates are often required to swap logical qubits for proper quantum operations. Trapped-ion systems have emerged as an alternative quantum computing architecture and have gained much recent attention due to their relatively long coherence times, high-fidelity gates, and good scalability for multi-qubit coupling. However, qubit mapping for the new trapped-ion systems remains a relatively untouched research problem. This paper proposes a new coupling constraint graph with multi-pin nets to model the unique constraints and connectivity patterns in one-dimensional trapped-ion systems. To minimize the time steps for quantum circuit execution while satisfying the coupling constraints of trapped-ion systems, we devise a divide-and-conquer solution using Satisfiability Modulo Theories for efficient qubit mapping on trapped-ion quantum computing architectures. Experimental results demonstrate the superiority of our approach in scalability and effectiveness compared to the previous work.

3. "Optimization for Buffer and Splitter Insertion in AQFP Circuits with Local and Group Movement", Bing-Huan Wu and Wai-Kei Mak (National Tsing Hua University) [abstract]

Abstract: Adiabatic quantum-flux parametron (AQFP) is a superconducting technology with extremely low power consumption compared to traditional CMOS. Since AQFP logic gates are all clocked by AC current, extra buffer cells are required to balance the lengths of data paths. Furthermore, since the output current of an AQFP logic gate is too weak to drive more than one gate, splitter cells are needed to branch the output signals of multi-fanout gates. For an AQFP circuit, the total number of additional buffers and splitters may far exceed the number of logic gates (up to 9 times in the benchmark circuits after optimization), which greatly impacts the power, performance, and area of the circuit. In this paper, we propose several techniques to (i) reduce the total number of required buffers and splitters, and (ii) perturb the levels of logic gates in order to seek more opportunities for buffer and splitter reduction. Experimental results show that our approach achieves better quality with comparable runtime compared to a retiming-based method from ASP-DAC'23. Moreover, its quality is on par with the integer linear programming-based method, also from ASP-DAC'23.
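The AQFP abstract above explains why buffers and splitters are needed. Under a deliberately simplified model, which is an assumption of this sketch rather than the paper's formulation, each gate gets an as-soon-as-possible (ASAP) clock level, every fanin must arrive exactly one level before its sink, each edge is buffered separately (no buffer sharing), and splitter trees are counted without the clock phases they occupy. An edge from level i to level j then needs j - i - 1 buffers, and a gate with fanout f driving a tree of 1-to-k splitters needs ceil((f - 1) / (k - 1)) splitters. The netlist format and function names are hypothetical:

```python
import math
from collections import Counter


def asap_levels(fanins):
    """ASAP clock level of each gate in a DAG.

    fanins: dict mapping gate -> list of driving gates (hypothetical netlist).
    Gates with no drivers are primary inputs at level 0.
    """
    level = {}

    def visit(g):
        if g not in level:
            drivers = fanins.get(g, [])
            level[g] = 1 + max(map(visit, drivers)) if drivers else 0
        return level[g]

    for g in fanins:
        visit(g)
    return level


def count_buffers(fanins, level):
    """Path-balancing buffers, counted per edge with no sharing."""
    return sum(level[g] - level[d] - 1 for g, ds in fanins.items() for d in ds)


def count_splitters(fanins, k=2):
    """Splitters needed if each multi-fanout gate drives a tree of 1-to-k splitters (k >= 2)."""
    fanout = Counter(d for ds in fanins.values() for d in ds)
    return sum(math.ceil((f - 1) / (k - 1)) for f in fanout.values() if f > 1)


netlist = {"a": [], "b": [], "g1": ["a", "b"], "g2": ["g1"], "g3": ["g2", "a"]}
levels = asap_levels(netlist)
print(levels["g3"], count_buffers(netlist, levels), count_splitters(netlist))  # 3 2 1
```

Sharing buffers among fanouts and perturbing gate levels, both of which the abstract describes, are exactly the optimizations this naive per-edge count does not attempt.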

10:20 - 10:40: Break

10:40 - 11:40: Physical Design Challenges for Automotive

Chair: Jürgen Scheible (Reutlingen University)

1. "Design Automation Challenges for Automotive Systems", Chung-Wei Lin (National Taiwan University) [abstract]

Abstract: As vehicular technology advances, vehicles become more connected and autonomous. Connectivity provides the capability to exchange information between vehicles, and autonomy provides the capability to make decisions and control each vehicle precisely. Connectivity and autonomy enable many evolutionary applications, such as intelligent intersection management and cooperative adaptive cruise control. Vehicle electrification is sometimes combined with these features to create more use cases and business models. However, these intelligent features make the design process more complicated and challenging. In this talk, we introduce several examples of automotive design automation, which is required to improve design quality and facilitate the design process. We mainly discuss the rising incompatibility issue: different original equipment manufacturers and suppliers develop systems whose designs are confidential and thus incompatible with other players' designs. The incompatibility issue is especially critical for autonomous vehicles because no human driver is present to resolve incompatible scenarios. We believe that techniques and experiences from electronic design automation can provide insights and solutions for automotive design automation.

2. "Physical Design Challenges for Automotive ASICs", Goeran Jerke (Bosch) [abstract]

Abstract: The design of automotive ASICs faces several key challenges that mainly arise from the harsh environmental operating conditions, specific functional loads, cost pressure, safety requirements, and the steady progress of the automotive-grade semiconductor technologies that are unique to automotive applications.
The talk first highlights these key differences between the design approaches for automotive and non-automotive ASIC designs. It also addresses why automotive ASIC designs prefer larger, more mature nodes compared to leading-edge non-automotive ASIC designs. In addition, the talk introduces several automotive-specific physical design problems and essential solutions for design implementation, direct verification, and meta-verification to address them. Finally, the talk provides an outlook on several related, yet unsolved, challenges in the physical design domain.

3. "Solving the Physical Challenges for the Next Generation of Safety Critical & High Reliability Systems", Brian Li (Cadence) [abstract]

Abstract: Silicon systems have been part of automobiles for a long time. The physical design methodology for addressing the quality, reliability, and safety challenges of these systems is common knowledge in the leading automotive semiconductor companies. The rise of trends like autonomous driving (ADAS), software-defined vehicles (SDV), and the electrification of our transportation network is giving rise not only to new levels of these challenges, but also to many new players in the automotive semiconductor space. The same forces of opportunity that are transforming our society are also the foundation of a transformation in automotive semiconductor design: massive improvements in accelerated compute, 3DIC and chiplet-based design, digital twins, and artificial intelligence (AI). We'll discuss how these forces are helping modern automotive semiconductor design and highlight how the electronic design automation (EDA) industry can apply successful principles from earlier eras to these new challenges.

11:40 - 11:50: Break

11:50 - 12:30: Contest Summary/Results

Chair: Gracieli Posser (Cadence)

"ISPD 2024 GPU/ML-Enhanced Large Scale Global Routing Contest", Rongjian Liang (Nvidia) [abstract]

Abstract: Modern VLSI design flows demand scalable global routing techniques applicable across diverse design stages. In response, the ISPD 2024 contest pioneers the first GPU/ML-enhanced global routing competition, leveraging advancements in GPU-accelerated computing platforms and machine learning techniques to address scalability challenges. Large-scale benchmarks, containing up to 50 million cells, offer test cases to assess global routers' runtime and memory scalability. The contest provides simplified input/output formats and performance metrics, framing global routing challenges as mathematical optimization problems and encouraging diverse participation. Two sets of evaluation metrics are introduced: the primary set concentrates on global routing applications that guide post-placement optimization and detailed routing, focusing on congestion resolution and runtime scalability. A special honor is awarded based on the second set of metrics, which places additional emphasis on runtime efficiency and targets guiding early-stage planning.

12:30 - 12:40: Outlook to ISPD 2025

12:40 - 20:00: Boxed lunch + Social outing: northeastern coast (Geopark) + Yangmingshan Hot Spring + dinner

The Greater Taipei One-Day Tour

In the afternoon and evening of Friday, March 15, there will be a tour to Yehliu Geopark, followed by the Yangmingshan Tien Lai hot springs and a dinner banquet. Attendees who wish to join this social networking event must sign up by February 21 after completing ISPD registration, as bus capacity is limited and personal details need to be provided to the tour operator in advance. Yehliu Geopark has beautiful geological formations, and the "Queen's Head" is its representative landmark and a popular photo spot. Yangmingshan Tien Lai Resort and Spa offers a unique and relaxing hot spring experience.