# Cross Link Insertion for Improving Tolerance to Variations in Clock Network Synthesis

Tarun Mittal and Cheng-Kok Koh, School of Electrical and Computer Engineering, Purdue University

#### **Presentation Flow**

- Introduction
- Comparison of link insertion schemes
- Clock Network Synthesis
- Experimental Results
- Conclusions and Future Work

#### **Insertion of Cross link**

- Current approach to Clock Network Synthesis
  - Clock Trees
    - Shorter Wiring
    - Unique path from source to sinks
    - More susceptible to process variations
  - Clock Mesh
    - Higher wiring cost
    - Many paths from source to sinks
    - More robust to process variations
- Cross links form a compromise between clock trees and clock meshes



#### **Effect of cross link insertion**

Change in skew between nodes u and v due to cross link addition

$$\overline{q}_{u,v} = \alpha q_{u,v} + \alpha \beta$$

where

$\overline{q}_{u,v}$ = skew after link addition

$q_{u,v}$ = skew before link addition

$$\alpha = R_l/R_{loop}$$

$$\beta = \frac{C_l}{2}(R_{u,u}-R_{v,v})$$
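As a quick numerical illustration, the Python sketch below evaluates the post-link skew using the relation and the definitions of α and β exactly as they appear on this slide. The function name, the reading of the symbols (link resistance and capacitance, loop resistance, and the resistance terms at u and v), and all numeric values are assumptions chosen only for illustration.

```python
def skew_after_link(q_uv, r_link, c_link, r_loop, r_uu, r_vv):
    """Skew between u and v after adding a cross link, per the slide's relation.

    q_uv       : skew before link insertion (s)
    r_link     : link resistance (ohm), assumed meaning of the link R term
    c_link     : link capacitance (F), assumed meaning of the link C term
    r_loop     : loop resistance (ohm), assumed meaning of R_loop
    r_uu, r_vv : resistance terms at nodes u and v (ohm)
    """
    alpha = r_link / r_loop                  # alpha as defined on the slide
    beta = (c_link / 2.0) * (r_uu - r_vv)    # beta as defined on the slide
    return alpha * q_uv + alpha * beta


# Hypothetical values, not taken from the presentation.
q_before = 10e-12  # 10 ps of skew before the link is added
q_after = skew_after_link(q_before, r_link=20.0, c_link=40e-15,
                          r_loop=200.0, r_uu=150.0, r_vv=100.0)
print(f"skew before: {q_before:.3e} s, after: {q_after:.3e} s")
```

With these made-up numbers α is 0.1, so the original skew term is scaled down by a factor of ten, while the β term adds a small contribution from the link capacitance.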



### **Comparison of Link insertion schemes**

#### Method 1:

- Link l<sub>1</sub> is inserted between two sinks u and v
- This method of link insertion is used in [Rajaram-Hu, ISPD'05]

#### Method 2:

- Link l<sub>2</sub> is inserted between two higher-level internal nodes u and v
- This method of link insertion is used in our approach
- Because l<sub>2</sub> is much shorter than l<sub>1</sub> ($l_2 \ll l_1$), $\alpha_2 < \alpha_1$ and $\beta_2 < \beta_1$
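To make the effect of link length concrete, the sketch below evaluates α and β for a long and a short link using the definitions from the earlier slide. It assumes that link resistance and capacitance scale linearly with length (per-unit values borrowed from wire-1 in the experimental setup), and that the loop resistance and the difference of the resistance terms at u and v are the same for both links. The second assumption is a simplification, since moving the link to higher-level nodes also changes those terms. Function names and lengths are hypothetical.

```python
R_PER_UM = 0.1       # ohm per um, wire-1 from the experimental setup
C_PER_UM = 0.2e-15   # farad per um, wire-1 from the experimental setup


def alpha_beta(length_um, r_loop, delta_r):
    """Return (alpha, beta) for a link of the given length.

    r_loop  : loop resistance (ohm), held fixed here as a simplification
    delta_r : difference of the resistance terms at u and v (ohm), also held fixed
    """
    r_link = R_PER_UM * length_um
    c_link = C_PER_UM * length_um
    alpha = r_link / r_loop
    beta = (c_link / 2.0) * delta_r
    return alpha, beta


# Hypothetical link lengths: l1 between sinks, l2 between internal nodes.
alpha1, beta1 = alpha_beta(2000.0, r_loop=500.0, delta_r=50.0)
alpha2, beta2 = alpha_beta(200.0, r_loop=500.0, delta_r=50.0)
print(alpha1, beta1)
print(alpha2, beta2)
```

Under these assumptions both terms shrink in proportion to the link length, which is the trend the last bullet states.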




### Effect of cross link on sink delays

#### Sinks are in the same subtree

#### Method 1:

- Sinks m and n have different path lengths to the end point of the cross link
- Skew variability depends on how close each sink is to the end point of the cross link

#### Method 2:

- Sinks m and n have nearly the same path lengths to the end point of the cross link
- Skew variability is the same for the sink nodes





#### Measured skew variability for both methods



#### Sinks are in different sub-trees connected by the cross link

#### Method 1:

- Different delays for sinks within a sub-tree
- Non-uniform correlation between the sink pairs m and n

#### Method 2:

- Same delays for sinks within a sub-tree
- Uniform correlation between all sink pairs m and n





#### Sinks are in two disjoint sub-trees

- No predictable correlation between the delays of sinks m and n, because they share no overlapping path
- Method 1 and Method 2 are equally ineffective in this situation



### **Clock Network Synthesis**

- Our clock network synthesis uses Method 2 for cross link insertion.
- The problem formulation follows the ISPD'10 High Performance Clock Network Synthesis contest.
- Our approach to clock network synthesis consists of three main steps:
  - Merging
  - Buffer Insertion
  - Link Insertion

#### **Problem Formulation**

- Given: sinks, blockages, and the clock source location
- Objective: generate a clock network T that connects the clock source to the sinks
- Constraints:
  - Sink pairs separated by less than a user-specified distance are called local sink pairs.
  - All local sink pairs must satisfy the local clock skew (LCS) constraint.
  - Slew at any point must be less than a predefined limit S.
  - Buffers must not be placed inside blockages.
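The local-skew constraint can be sketched in a few lines of Python. The example below finds the local sink pairs for a given distance threshold and reports the worst skew among them. It assumes Manhattan distance between sinks, which the slides do not state, and the sink names, coordinates, and arrival times are made up. The threshold of 600000 nm and the limit of 7.5 ps match the values listed for most benchmarks.

```python
from itertools import combinations


def local_sink_pairs(sinks, lcs_distance):
    """Return all sink pairs closer than lcs_distance.

    sinks: dict mapping sink name -> (x, y) in nm.
    Manhattan distance is an assumption of this sketch.
    """
    pairs = []
    for (a, pa), (b, pb) in combinations(sorted(sinks.items()), 2):
        distance = abs(pa[0] - pb[0]) + abs(pa[1] - pb[1])
        if distance < lcs_distance:
            pairs.append((a, b))
    return pairs


def worst_local_skew(arrival, pairs):
    """Largest absolute arrival-time difference over the given pairs."""
    return max((abs(arrival[a] - arrival[b]) for a, b in pairs), default=0.0)


# Hypothetical sinks (coordinates in nm) and arrival times (ps).
sinks = {"s1": (0, 0), "s2": (300000, 200000), "s3": (900000, 900000)}
arrival = {"s1": 100.0, "s2": 104.0, "s3": 130.0}

pairs = local_sink_pairs(sinks, lcs_distance=600000)
skew = worst_local_skew(arrival, pairs)
meets_lcs = skew <= 7.5
print(pairs, skew, meets_lcs)
```

Only s1 and s2 are within the threshold here, so the 30 ps gap to s3 does not count against the local-skew limit. The all-pairs loop is quadratic and is meant only to show the definition, not to scale to thousands of sinks.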











In the bottom-up phase, the clock tree is constructed iteratively.



#### **Buffer Insertion**

- Slew constraints require buffers to be inserted in the clock tree.
- Buffers are inserted on the stem wires.
- NGSPICE simulations are used to compute the length of the stem wire.
- Each buffer buf<sub>i</sub> has a merging region mr<sub>buf<sub>i</sub></sub> associated with it.
- Blockage avoidance is considered.
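One simple way to picture blockage avoidance is as a filter over candidate buffer locations. The sketch below is not the method used in this work. It assumes axis-aligned rectangular blockages and a hypothetical list of candidate points (for example, sampled from a buffer's merging region), and keeps only the points that lie outside every blockage. Treating points on a blockage boundary as legal is also an assumption.

```python
def in_blockage(point, blockage):
    """True if point lies strictly inside the rectangle (x_min, y_min, x_max, y_max)."""
    x, y = point
    x_min, y_min, x_max, y_max = blockage
    return x_min < x < x_max and y_min < y < y_max


def legal_buffer_locations(candidates, blockages):
    """Keep the candidate points that fall outside every blockage."""
    return [p for p in candidates
            if not any(in_blockage(p, b) for b in blockages)]


# Hypothetical blockage and candidate points (coordinates in nm).
blockages = [(1000, 1000, 5000, 4000)]
candidates = [(500, 500), (2000, 2000), (5000, 3000), (6000, 4500)]

legal = legal_buffer_locations(candidates, blockages)
print(legal)
```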

















### Merits of our design flow

- Our link insertion flow allows us to control the link length.
- Inserting the link below a buffer reduces the effect of buffer variation more than inserting it above the buffer.
- The cross link maximizes the reduction in skew variability for sinks in the same sub-tree.
- The cross link improves the correlation of sink delays in the two sub-trees that it connects.

#### **Experimental Setup**

- 45nm Predictive Technology Model
- Inverter types
  - Mid-sized inverter (inv-1)
    - 10 μm NMOS, 14.6 μm PMOS (for similar rise/fall delay)
    - input cap = 35 fF, resistance = 61.2 $\Omega$, output parasitic cap = 80 fF
  - Small inverter (inv-2)
    - 1.37 μm NMOS, 2 μm PMOS
    - input cap = 4.2 fF, resistance = 440 $\Omega$, output parasitic cap = 6.1 fF
- Wire types
  - wire-1: $0.1\ \Omega/\mu m$, $0.2\ fF/\mu m$
  - wire-2: $0.3\ \Omega/\mu m$, $0.16\ fF/\mu m$

#### **Experimental Setup contd...**

- Supply voltage variation = 15%
- Wire width variation = 10%
- Inverter size: 30 parallel inv-2
- Buffer size: 10 parallel inv-2 driving 40 parallel inv-2
- In the ISPD Monte-Carlo simulations, each inverter receives a supply voltage that is independent of the other inverters in the circuit
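The independent per-inverter supply variation can be pictured with a short sampling sketch. It is only an illustration of the idea, not the contest's simulation script. The nominal supply of 1.0 V, the uniform distribution, and the reading of 15% and 10% as a symmetric spread around nominal are all assumptions, since the slides give only the percentages.

```python
import random


def sample_supply_voltages(n_inverters, v_nominal=1.0, variation=0.15, seed=0):
    """Draw one independent supply voltage per inverter for a single trial.

    v_nominal and the uniform +/- variation model are assumptions of this sketch.
    """
    rng = random.Random(seed)
    return [v_nominal * (1.0 + rng.uniform(-variation, variation))
            for _ in range(n_inverters)]


def sample_wire_width_scales(n_wires, variation=0.10, seed=1):
    """Draw one independent width scale factor per wire for a single trial."""
    rng = random.Random(seed)
    return [1.0 + rng.uniform(-variation, variation) for _ in range(n_wires)]


# One Monte-Carlo trial for a stage built from 30 parallel inv-2 devices.
voltages = sample_supply_voltages(30)
widths = sample_wire_width_scales(5)
print(min(voltages), max(voltages))
print(widths)
```

Each trial would use a different seed. The sampled voltages and widths would then be written into the netlist before the circuit simulation is run.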

### Benchmark summary

| Name        | # sinks | LCS distance (nm) | LCS (ps) | Width (nm) | Height (nm) | # blockages |
|-------------|---------|-------------------|----------|------------|-------------|-------------|
| ispd10cns01 | 1107    | 600000            | 7.50     | 8000000    | 8000000     | 4           |
| ispd10cns02 | 2249    | 600000            | 7.50     | 13000000   | 7000000     | 1           |
| ispd10cns03 | 1200    | 370000            | 4.99     | 3071928    | 492989      | 2           |
| ispd10cns04 | 1845    | 600000            | 7.50     | 2130492    | 2689554     | 2           |
| ispd10cns05 | 1016    | 600000            | 7.50     | 2318787    | 2545448     | 1           |
| ispd10cns06 | 981     | 600000            | 7.50     | 1949600    | 890880      | 0           |
| ispd10cns07 | 1915    | 600000            | 7.50     | 2536640    | 1447680     | 0           |
| ispd10cns08 | 1134    | 600000            | 7.50     | 1837440    | 1628160     | 0           |

#### **ISPD Monte-Carlo Simulations**

| BM | # sinks | LCS (ps) | Method          | 95% LCS (ps) | Cap (fF) | Cap ratio | CPU (s) |
|----|---------|----------|-----------------|--------------|----------|-----------|---------|
| 01 | 1107    | 7.50     | Contango[1,18]  | 7.01         | 198337   | 1.44      | 12015   |
| 01 | 1107    | 7.50     | CNSrouter[1,19] | 7.23         | 1168104  | 8.52      | 675     |
| 01 | 1107    | 7.50     | NTUclock[1]     | 8.66         | 293887   | 2.14      | 15      |
| 01 | 1107    | 7.50     | Work in [20]    | 7.16         | 445331   | 3.25      | 0.40    |
| 01 | 1107    | 7.50     | Our work (buf)  | 7.32         | 142325   | 1.03      | 1092    |
| 01 | 1107    | 7.50     | Our work (inv)  | 7.03         | 136961   | 1.00      | 3237    |
| 02 | 2249    | 7.50     | Contango[1,18]  | 7.34         | 375863   | 1.48      | 25006   |
| 02 | 2249    | 7.50     | CNSrouter[1,19] | 7.35         | 2099811  | 8.27      | 2140    |
| 02 | 2249    | 7.50     | NTUclock[1]     | 10.73        | 832483   | 3.28      | 176     |
| 02 | 2249    | 7.50     | Work in [20]    | 7.33         | 933574   | 3.67      | 2.42    |
| 02 | 2249    | 7.50     | Our work (buf)  | 7.42         | 263198   | 1.03      | 4314    |
| 02 | 2249    | 7.50     | Our work (inv)  | 7.36         | 253760   | 1.00      | 10157   |
| 03 | 1200    | 4.99     | Contango[1,18]  | 4.18         | 55861    | 1.51      | 3840    |
| 03 | 1200    | 4.99     | CNSrouter[1,19] | 3.95         | 93965    | 2.54      | 21      |
| 03 | 1200    | 4.99     | NTUclock[1]     | 8.63         | 167062   | 4.53      | 6       |
| 03 | 1200    | 4.99     | Work in [20]    | 4.88         | 183702   | 4.98      | 1.57    |
| 03 | 1200    | 4.99     | Our work (buf)  | 4.49         | 36609    | 0.99      | 383     |
| 03 | 1200    | 4.99     | Our work (inv)  | 4.82         | 36867    | 1.00      | 1761    |

#### **ISPD Monte-Carlo Simulations contd...**

| BM | # sinks | LCS (ps) | Method          | 95% LCS (ps) | Cap (fF) | Cap ratio | CPU (s) |
|----|---------|----------|-----------------|--------------|----------|-----------|---------|
| 04 | 1845    | 7.50     | Contango[1,18]  | 4.46         | 71843    | 1.51      | 6075    |
| 04 | 1845    | 7.50     | CNSrouter[1,19] | 7.25         | 125333   | 2.64      | 22      |
| 04 | 1845    | 7.50     | NTUclock[1]     | 9.55         | 325206   | 6.86      | 58      |
| 04 | 1845    | 7.50     | Work in [20]    | 4.09         | 196337   | 4.14      | 0.27    |
| 04 | 1845    | 7.50     | Our work (buf)  | 6.70         | 51070    | 1.07      | 934     |
| 04 | 1845    | 7.50     | Our work (inv)  | 6.79         | 47393    | 1.00      | 2543    |
| 05 | 1016    | 7.50     | Contango[1,18]  | 4.41         | 37690    | 1.66      | 2406    |
| 05 | 1016    | 7.50     | CNSrouter[1,19] | 7.27         | 74084    | 3.27      | 10      |
| 05 | 1016    | 7.50     | NTUclock[1]     | 6.98         | 130389   | 5.77      | 11      |
| 05 | 1016    | 7.50     | Work in [20]    | 3.81         | 89094    | 3.94      | 0.40    |
| 05 | 1016    | 7.50     | Our work (buf)  | 4.78         | 25129    | 1.11      | 278     |
| 05 | 1016    | 7.50     | Our work (inv)  | 4.41         | 22589    | 1.00      | 778     |
| 06 | 981     | 7.50     | Contango[1,18]  | 6.05         | 47810    | 1.63      | 2660    |
| 06 | 981     | 7.50     | CNSrouter[1,19] | 6.79         | 87390    | 2.98      | 41      |
| 06 | 981     | 7.50     | NTUclock[1]     | 416.62       | 2E+06    | 68.31     | 1       |
| 06 | 981     | 7.50     | Work in [20]    | 7.49         | 160447   | 5.48      | 0.28    |
| 06 | 981     | 7.50     | Our work (buf)  | 6.41         | 32680    | 1.11      | 285     |
| 06 | 981     | 7.50     | Our work (inv)  | 5.81         | 29278    | 1.00      | 995     |

#### **ISPD Monte-Carlo Simulations contd...**

| BM | # sinks | LCS (ps) | Method          | 95% LCS (ps) | Cap (fF) | Cap ratio | CPU (s) |
|----|---------|----------|-----------------|--------------|----------|-----------|---------|
| 07 | 1915    | 7.50     | Contango[1,18]  | 4.58         | 72644    | 1.52      | 2351    |
| 07 | 1915    | 7.50     | CNSrouter[1,19] | 5.97         | 128351   | 2.69      | 27      |
| 07 | 1915    | 7.50     | NTUclock[1]     | 8.12         | 275597   | 5.79      | 66      |
| 07 | 1915    | 7.50     | Work in [20]    | 6.24         | 228243   | 4.79      | 0.30    |
| 07 | 1915    | 7.50     | Our work (buf)  | 5.86         | 48316    | 1.01      | 818     |
| 07 | 1915    | 7.50     | Our work (inv)  | 5.53         | 47555    | 1.00      | 2765    |
| 08 | 1134    | 7.50     | Contango[1,18]  | 5.15         | 52490    | 1.68      | 1987    |
| 08 | 1134    | 7.50     | CNSrouter[1,19] | 5.37         | 97421    | 3.13      | 17      |
| 08 | 1134    | 7.50     | NTUclock[1]     | 7.64         | 165883   | 5.33      | 7       |
| 08 | 1134    | 7.50     | Work in [20]    | 5.47         | 228243   | 7.34      | 0.28    |
| 08 | 1134    | 7.50     | Our work (buf)  | 5.07         | 33029    | 1.06      | 367     |
| 08 | 1134    | 7.50     | Our work (inv)  | 5.72         | 31088    | 1.00      | 938     |

Our approach meets the LCS constraint on all benchmarks with lower capacitance than previous work.
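The Cap ratio column appears to be each method's capacitance divided by that of "Our work (inv)" on the same benchmark. The short sketch below reproduces that arithmetic for benchmark 01 using the capacitance values from the table. The function name and the choice of reference method are assumptions inferred from the 1.00 entries.

```python
# Capacitance (fF) for benchmark 01, copied from the results table.
caps_bm01 = {
    "Contango": 198337,
    "CNSrouter": 1168104,
    "NTUclock": 293887,
    "Work in [20]": 445331,
    "Our work (buf)": 142325,
    "Our work (inv)": 136961,
}


def cap_ratios(caps, reference="Our work (inv)"):
    """Normalize every capacitance to the reference method's capacitance."""
    ref = caps[reference]
    return {method: cap / ref for method, cap in caps.items()}


ratios = cap_ratios(caps_bm01)
for method, ratio in ratios.items():
    print(f"{method:15s} {ratio:.3f}")
```

The computed values agree with the tabulated ratios for this benchmark once they are cut to two decimals (for example, about 1.448 against the listed 1.44 for Contango).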

#### Conclusions and Future Work

#### Conclusions

- A new link insertion methodology is proposed, in which links are inserted between higher-level internal nodes of the clock tree
- The proposed methodology improves the correlation of sink delays for sinks that have similar path lengths to the inserted cross link
- NGSPICE-based Monte-Carlo simulations verify the effectiveness of the approach

#### Future work

- Merging to minimize the local clock skew instead of global skew
- Handling of longer cross links

Thank You