# Clock Enable Timing Closure Methodology

Harish Dangat
Samsung Semiconductor



#### **Agenda**

- Basics of Clock Gating
- Fixing Clock Enable Timing in RTL-2-GDSII Flow
- Results
- Conclusion



#### **Clock Gating Basics**



Figure A - Clock Enable Signal Path

- Use an internal (or external) signal to disable the clock
- This saves dynamic power
- A must for low-power design
- Creates new timing paths



## **Two Types of Clock Gating**

Using AND gate

Using ICG Cell





The rest of this presentation covers ICG-type clock gating.



### Register to Register Path





# Register to Register Path with Clock Gating



### What Is Different About the CE Path

- Not noticed at Synthesis
- Timing available is less than cycle time
- ICG cells are not skew balanced with registers
- Violations are seen only after Clock Tree Synthesis
- Mostly affects timing critical blocks
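To make the "less than cycle time" point concrete, here is a minimal sketch of the setup budget on a CE path. It assumes a launch flop whose clock arrives `reg_latency` after the clock root and an ICG whose clock pin is reached after `icg_latency`. The function name and all numbers are illustrative assumptions, not values from the presentation.

```python
def ce_available_time(period, reg_latency, icg_latency, icg_setup=0.0):
    """Time left for CE logic between launch at the flop and capture at the ICG."""
    # The ICG sits earlier in the clock tree than the flops it gates, so its
    # clock arrives sooner and the capture edge effectively moves earlier.
    skew = reg_latency - icg_latency
    return period - skew - icg_setup


# Hypothetical 1.0 ns clock: flops at 0.40 ns insertion delay, ICG at 0.10 ns.
budget = ce_available_time(period=1.0, reg_latency=0.40,
                           icg_latency=0.10, icg_setup=0.05)
print(f"{budget:.2f} ns available for CE logic")
```

With balanced latencies (`reg_latency == icg_latency`) the budget is the full period minus setup, which is the optimistic view synthesis has before the clock tree is built.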



# Effect of ICG Cells Location in Clock Tree





#### **Agenda**

- Basics of Clock Gating
- Fixing Clock Enable Timing in RTL-2-GDSII Flow
- Results
- Conclusion



#### What to Do at RTL Level

- CE signal should be generated in the same module
- Generate CE signal from functionally related modules
- Simplify the logic that generates CE signal



### **CE Timing at Synthesis Step**

Reduce cycle time to ICG cells

```
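# Assumes cycle_time is set to the clock period; the ICG clock arrives half a cycle early
set_clock_latency [expr {-$cycle_time / 2.0}] \
        [get_pins all_clock_gating_registers/CK]
set_clock_latency 0 [get_pins all_clock_gating_registers/ECK]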
```

Set high setup time on ICG cells

```
set timing_scgc_override_library_setup_hold true
set_clock_gating_style -setup 400ps clock_gate
```

Turn off bus sharing in Power Compiler

```
set_clock_gating_style -no_sharing
```



#### **CE Timing at Floorplan Step**

- When placing modules, pay attention to CE signal connectivity
- If CE signals are input pins, place them close to the modules that receive them









**Good CE timing** 

#### **CE Timing at Placement Step**

Tightening available cycle time by changing ICG setup time

```
set timing_scgc_override_library_setup_hold true
set_clock_gating_style -setup 400ps clock_gate
```

 Tightening available cycle time by changing ICG clock latency

```
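# Assumes cycle_time is set to the clock period; the ICG clock arrives half a cycle early
set_clock_latency [expr {-$cycle_time / 2.0}] \
        [get_pins all_clock_gating_registers/CK]
set_clock_latency 0 [get_pins all_clock_gating_registers/ECK]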
```



## **CE Timing at Placement Step (cont.)**

Create group path and add extra weight

```
group_path -weight 5 -name CLOCK_ENABLE \
  -to [get_cells */*GATE_LATCH]
```

Place ICG cells close to flops

```
set placer_disable_auto_bound_for_gated_clock false
```



#### **How to Select Latency?**

- Apply a global latency
  - Easy, but not very efficient
- Apply latency based on ICG depth and fanout
  - Less depth, more latency
  - More fanout, more latency
- Apply latency based on CTS results
  - More accurate
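The depth-and-fanout option can be sketched as a simple estimator. Everything here is a hypothetical model for illustration: the function names, the assumed number of clock tree levels, the per-buffer delay, and the per-buffer fanout limit are assumptions, not values from the presentation. The only properties taken from the slide are the two trends: less depth gives more latency, and more fanout gives more latency.

```python
def levels_to_drive(fanout, max_fanout):
    """Buffer levels needed below the ICG so that no stage drives more than max_fanout."""
    levels, reach = 0, 1
    while reach < fanout:
        reach *= max_fanout
        levels += 1
    return levels


def estimate_icg_latency(depth, fanout, tree_levels=6,
                         buffer_delay=0.05, max_fanout=16):
    """Estimated clock delay (ns) between an ICG and the flops it gates.

    depth is the ICG's level in the clock tree, counted from the root.
    """
    # A shallow ICG has more tree below it; a high-fanout ICG needs more buffering.
    levels_below = max(tree_levels - depth, levels_to_drive(fanout, max_fanout))
    return levels_below * buffer_delay


shallow = estimate_icg_latency(depth=2, fanout=100)
deep_wide = estimate_icg_latency(depth=5, fanout=1000)
print(f"shallow ICG: {shallow:.2f} ns, deep high-fanout ICG: {deep_wide:.2f} ns")
```

The estimate would be applied as a negative latency on the ICG clock pin, in the same way as the global `cycle_time/2` value. Replacing the model with measured CTS insertion delays corresponds to the third, more accurate option.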



### **CE Timing at Clock Tree Synthesis**

#### Clone ICG Cells



# **ICG Cloning**





# **CE Timing at Clock Tree Synthesis: Cloning Based on Fanout and Slack**

```
# List ICG cells that meet the slack and fanout criteria for cloning
foreach_in_collection CELLS [get_cells * -hier -filter "ref_name =~ *ICG*"] {
    set names   [get_object_name $CELLS]
    set ckPins  [get_object_name [get_pins -of_objects [get_cells $CELLS] \
                    -filter "full_name =~ */CLK"]]
    set eckPins [get_object_name [get_pins -of_objects [get_cells $CELLS] \
                    -filter "full_name =~ */ENABLE_CLK"]]
    set eckFanout [sizeof_collection [all_fanout -from [get_pins $eckPins] -flat]]
    set cgSlack [get_attribute [get_pins ${names}/ENABLE] max_slack]
    if {$cgSlack > -0.150 && $eckFanout > 100} {
        echo "${names}/E"
    }
}
remove_propagated_clock *
remove_clock_tree
```
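The selection step of the script above can be sketched independently of the tool. The thresholds (slack greater than -0.150 and fanout greater than 100) are taken from the script. The `Icg` record, the sample data, and the `clones_needed` helper with its per-clone fanout limit are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Icg:
    name: str
    enable_slack: float  # slack on the enable pin, in ns
    fanout: int          # number of sinks driven by the gated clock


def clone_candidates(icgs, slack_limit=-0.150, fanout_limit=100):
    """ICG cells that pass the same slack and fanout test as the script."""
    return [c for c in icgs
            if c.enable_slack > slack_limit and c.fanout > fanout_limit]


def clones_needed(fanout, max_fanout_per_clone=100):
    """Assumed policy: split sinks so that no clone drives more than the limit."""
    return ceil(fanout / max_fanout_per_clone)


icgs = [
    Icg("u_a/icg", -0.080, 350),
    Icg("u_b/icg", -0.300, 500),
    Icg("u_c/icg", -0.050, 40),
]
for cell in clone_candidates(icgs):
    print(cell.name, clones_needed(cell.fanout))
```

Only `u_a/icg` passes both tests in this sample. How the sinks are then partitioned among the clones is left to the clock tree tool.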



# **CE Timing at Clock Tree Synthesis: Two-Pass Flow**







#### **Agenda**

- Basics of Clock Gating
- Problems Created by Clock Gating
- Fixing Clock Enable Timing in RTL-2-GDSII Flow
- Results
- Conclusion



# Die Temperature Without and With Clock Gating



Relative POWER5 processor temperature (Celsius)
- without clock gating (left) and with clock gating (right)

Proceedings of the 11th Int'l Symposium on High-Performance Computer Architecture (HPCA-11 2005)



## ICG Cells and Flops Autobound





## **Comparing Latency Schemes**





#### Results – Effect of Cloning on Latency







# **Clock Subtree After Cloning**





#### **Comparing Single-Pass and Two-Pass Flows**

- Single pass: place\_opt → clock\_opt
- Two pass: place\_opt → clock\_clone → new place\_opt → clock\_opt





## Different schemes to minimize latency





#### Conclusion

- Clock gating is a requirement for low-power design
- Closing CE timing requires attention at all stages of the design
- By planning at every step, CE timing can be closed in high-speed, low-power designs



# Thank You!



#### **BACKUP SLIDES**



### **Battery Life is Important**



#### **Smartphone power for continuous web access**

http://www.phonesreview.co.uk/2012/09/26/iphone-5-vs-samsung-galaxy-s3-battery-life-confrontation/



#### **How to Minimize Power**

Use process designed for low power



Use low power architecture

- Use power gating
- Use clock gating











# **Power Saving Opportunity**



#### Few Facts About Clock Tree Power

- 20% to 40% of dynamic power is consumed by the clock tree
- About 80% of clock tree power is consumed in the last stages of the clock tree
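Combining the two figures gives the share of total dynamic power spent in the last clock tree stages, which is the part that gating close to the flops can switch off. The short calculation below uses only the percentages quoted above; the function name is illustrative.

```python
def last_stage_share(clock_tree_share, last_stage_fraction=0.80):
    """Fraction of total dynamic power consumed in the last clock tree stages."""
    return clock_tree_share * last_stage_fraction


low = last_stage_share(0.20)
high = last_stage_share(0.40)
print(f"{low:.0%} to {high:.0%} of dynamic power is in the last clock tree stages")
```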





#### **Architectural/Coarse Grain Clock Gating**





# **Automated/Fine Grain Clock Gating**





#### **Example of Automated/Fine Grain Clock Gating**

Report: clock\_gating -nosplit

Design: red\_blk

Version: F-2011.09-SP5-1

Date: Fri Aug 17 23:27:06 2012

#### Clock Gating Summary

| Item                            | Count           |
|---------------------------------|-----------------|
| Number of clock gating elements | 3133            |
| Number of gated registers       | 126738 (76.90%) |
| Number of ungated registers     | 38077 (23.10%)  |
| Total number of registers       | 164815          |
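The percentages in the summary can be cross-checked from the register totals. The sketch below derives the gated count as the total minus the ungated count, which gives 126738, and recomputes both percentages. Variable names are illustrative.

```python
total_registers = 164815
ungated_registers = 38077

# Gated registers are whatever is not left ungated.
gated_registers = total_registers - ungated_registers

gated_pct = 100.0 * gated_registers / total_registers
ungated_pct = 100.0 * ungated_registers / total_registers

print(f"gated:   {gated_registers} ({gated_pct:.2f}%)")
print(f"ungated: {ungated_registers} ({ungated_pct:.2f}%)")
```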



#### What To Look For In ICG



- Too many flops used to generate the CE signal
- Large delay in the combinational path
- Generating flops placed far from the ICG cells
- Flops used to generate the CE signal placed far from each other
- Too many flops receive the gated clock




