In this article, we have provided a comprehensive overview of several critical concepts related to power dissipation in CMOS circuits. We began with a concise explanation of power dissipation, followed by an in-depth discussion on the clock tree and clock tree synthesis. The article covered various aspects of clock networks, including their fundamental structures and the challenges associated with them. Key parameters such as clock skew, jitter, latency, and slew were also examined, along with their impact on circuit performance. Additionally, we explored different clock distribution methods, such as conventional clock tree distribution, clock mesh, H-tree, and fishbone, culminating in an analysis of multi-source clock tree systems.
Clock Tree :
A clock tree is the clock distribution network within a system or circuit. It includes the clocking circuitry and devices from the clock source to the destination. The complexity of the clock tree and the number of clocking components used depend on the circuit and its functionality. Systems can have multiple ICs with different clock requirements, and the term “clock tree” refers to the various clocks feeding those ICs. Usually a single reference clock is cascaded and synthesized into multiple different output clocks.
Clock Tree Synthesis & Design Flow:
Clock Tree Synthesis (CTS) is the process of distributing the clock to each sequential element of the circuit. CTS is part of physical design in the back end of the flow. First, the RTL design is verified and synthesized into a technology-mapped gate-level netlist. Floorplanning and power planning are done next, followed by global and detailed placement. CTS comes after these steps, and signal routing follows it. Up to this point an ideal clock is used. Since all sequential elements are now placed, the real clock network is built during CTS. Without CTS, the clock signal would go directly from the clock root to every leaf cell or flip-flop. CTS inserts buffers along the clock paths so that the clock load is driven properly and skew, latency, and power can be managed.
Clock Network:
A clock network consists of:
(1) Clock generators, (2) Clock elements, (3) Clock wires
A clock can be generated with a ring oscillator or another astable circuit, but these are susceptible to PVT (process, voltage, temperature) variations. Off-chip clocks are generated using a crystal and an oscillator circuit; on-chip local clocks are generated with a PLL or DLL. Routing a single clock around a chip is difficult. Routing multiple clocks with little skew is even harder.
Depending on the application, many timing components may be used. The most common are:
1. Crystals: A crystal uses a piece of quartz that is cut at a particular angle and mounted in a protective metal casing. It provides a frequency output when an electrical signal is applied.
2. Crystal Oscillators (XOs): Crystal oscillators integrate the crystal with the oscillator circuit, enabling XOs to provide higher frequency outputs.
3. Clock Generators: Clock generators are integrated circuits (ICs) that generate multiple output frequencies from a single input reference frequency.
4. Voltage-Controlled Crystal Oscillators (VCXOs): A self-contained oscillator whose output frequency varies with the voltage applied from a voltage reference.
5. Clock Buffers: ICs that distribute multiple copies of a clock to multiple ICs with the same frequency requirements.
6. Jitter Attenuators/Jitter Cleaners: Jitter attenuators are clock generators with specialized circuitry for reducing jitter.
An off-chip clock has limitations such as:
(1) Frequency is limited: a multiplier is required
(2) Uncontrolled clock phase: synchronization issues
A PLL can multiply the clock frequency. If clock multiplication is not required, a DLL can be used. A PLL uses an oscillator that creates a new clock, whereas a DLL uses a variable delay line that delays the input clock. In a PLL, a phase detector (PD) produces a signal proportional to the phase difference between the input and output clocks, a loop filter (LF) converts the phase error into a control signal (voltage), and a voltage-controlled oscillator (VCO) creates a new clock signal based on that control signal.
A DLL works on the same principle, but instead of changing the frequency, it delays the clock until the phases are equal.
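The DLL behavior described above can be illustrated with a small behavioral sketch. It is not a circuit model: the loop gain, the starting delay, and the 0.65 ns network delay are all assumed values chosen for illustration, and the names `wrap_phase` and `lock_dll` are hypothetical.

```python
def wrap_phase(offset_ns, period_ns):
    """Wrap a time offset into the range [-period/2, period/2)."""
    return (offset_ns + period_ns / 2) % period_ns - period_ns / 2


def lock_dll(period_ns, network_delay_ns, gain=0.5, steps=40):
    """Adjust a variable delay line until the delayed clock edge lines up
    with an edge of the input clock. All parameter values are assumptions."""
    # Start the delay line in the middle of its range (assumed one period wide).
    line_delay_ns = period_ns / 2
    for _ in range(steps):
        # Phase detector: distance of the delayed edge from the nearest input edge.
        error_ns = wrap_phase(line_delay_ns + network_delay_ns, period_ns)
        # Loop filter and delay line: move the delay against the phase error.
        line_delay_ns -= gain * error_ns
    return line_delay_ns


period_ns = 1.0
network_delay_ns = 0.65
line_delay_ns = lock_dll(period_ns, network_delay_ns)
residual_ns = wrap_phase(line_delay_ns + network_delay_ns, period_ns)
print(f"delay line setting: {line_delay_ns:.3f} ns, residual phase error: {residual_ns:.2e} ns")
```

In this sketch the frequency never changes; the loop only settles on a delay that makes the total delay a whole number of clock periods, which is the distinction from a PLL drawn above.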
Challenges of Clock Network:
1. Process Variations: Variability in device structure and other process parameters leads to variations from wafer to wafer. Process variations directly affect the yield and performance of the design. Variations in the standard cells cause mismatch across the various clock trees in the clock network.
2. Power Supply Variations: Supply voltage scaling causes variations in the switching activity across the die. Uneven power dissipation across the die results from fluctuations in current demand over short intervals of time. Ripple or noise voltage is induced on the supply lines by parasitic inductance. Current also flows through on-chip interconnect, which has finite resistance, and the voltage lost across that resistance is the IR drop.
3. Temperature Variations: Temperature varies continuously across the chip during operation because of on-chip power dissipation. As temperature increases, the drain current generally decreases. Both devices and interconnect depend on temperature and are therefore affected by its variation.
4. Signal Integrity Issues: Signal integrity issues include crosstalk, electromigration (EM), and IR drop. Crosstalk noise is caused by the relative switching of capacitively coupled wires. As clock frequencies increase, capacitive coupling dominates and adds significant delay to data paths. If wire resistance is very high, or the current through a transistor is higher than estimated, there is an unwanted voltage drop. This unpredicted drop degrades timing on signal and clock nets, produces unwanted clock skew, and hampers the signal integrity of the design.
5. Timing Violations: A design consists of millions of gates and multiple clocks. Setup and hold violations, clock skew, and signal integrity issues all hinder timing closure.
6. Design Complexity: Modern designs have millions of cells, and a single chip is broken down into a hierarchy of modules. Timing budgets are created for the whole design, which lets engineers use a hierarchical design methodology and close timing on their own modules. Multistage clock gating structures are helpful for CTS. The number of multi-mode and multi-corner scenarios increases as technology moves toward a few nanometers. As a result, timing closure and the clock network have become a challenge for SoC designers.
Clock Skew:
Clock skew is the difference between the insertion delays of two flops belonging to the same clock domain. Every component adds some delay to a signal transition, and the clock signal takes a finite time to travel from one point to another, so the clock arrives at two different flip-flops at different times. The difference between these arrival times is the skew.
There are two types of skew: (1) local and (2) global. The skew value can be positive or negative. Local skew is the difference between the insertion delays of two communicating flops in the same clock domain. Global skew is the difference between the earliest and the latest clock arrival times at any flip-flops in the same clock domain. If the capture flop receives the clock later than the launch flop, the skew is positive. If the launch flop receives the clock later than the capture flop, the skew is negative.
Jitter, Latency & Slew:
Clock jitter is the deviation of a clock edge from its ideal position in time; in other words, it is the uncertainty in the clock edge. It may be caused by noise, a fluctuating power supply, or interference from nearby circuits. Jitter can occur in both directions, positive and negative. It can be modeled by adding uncertainty regions around the rising and falling edges of the clock waveform. Clock jitter is classified as cycle-to-cycle, period, or long-term jitter.
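As a rough illustration of these jitter categories, the sketch below computes them from a list of measured rising-edge timestamps. The timestamps and the 1000 ps ideal period are made-up values, the function name is hypothetical, and the accumulated edge error is used here only as a simple stand-in for long-term jitter.

```python
def jitter_metrics(edges_ps, ideal_period_ps):
    """Return peak jitter figures (in ps) from rising-edge timestamps."""
    # Measured length of each clock cycle.
    periods = [b - a for a, b in zip(edges_ps, edges_ps[1:])]
    # Period jitter: deviation of each cycle from the ideal period.
    period_jitter = [p - ideal_period_ps for p in periods]
    # Cycle-to-cycle jitter: change in period between adjacent cycles.
    cycle_to_cycle = [b - a for a, b in zip(periods, periods[1:])]
    # Accumulated error of each edge against an ideal clock that starts
    # at the first measured edge.
    edge_error = [t - (edges_ps[0] + n * ideal_period_ps)
                  for n, t in enumerate(edges_ps)]
    return {
        "period": max(abs(j) for j in period_jitter),
        "cycle_to_cycle": max(abs(j) for j in cycle_to_cycle),
        "accumulated": max(abs(e) for e in edge_error),
    }


# Hypothetical measurements for a nominal 1 GHz clock (1000 ps period).
edges_ps = [0, 1010, 1995, 3005, 4000]
metrics = jitter_metrics(edges_ps, ideal_period_ps=1000)
print(metrics)
```

Positive and negative deviations both appear in the intermediate lists; the function reports only the largest magnitude of each.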
Clock Latency/Clock Insertion Delay:
Clock latency is the amount of time the clock signal takes to travel from its source to the sinks.
Clock latency = Source latency + Network latency
Clock source latency (source insertion delay) is the time the clock signal takes to reach the clock definition point from the clock source.
Clock network latency (network insertion delay) is the time the clock signal takes to travel from the clock definition point to the sinks of the clock.
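The latency relation above, together with the local and global skew definitions given earlier, can be sketched in a few lines. The flop names and delay values are invented for illustration, and the sign convention assumed is capture latency minus launch latency, so a late capture clock gives positive skew.

```python
from dataclasses import dataclass

SOURCE_LATENCY_PS = 200  # assumed delay from clock source to clock definition point


@dataclass
class Sink:
    name: str
    network_latency_ps: int  # delay from clock definition point to this flop


def clock_latency_ps(sink):
    """Clock latency = source latency + network latency."""
    return SOURCE_LATENCY_PS + sink.network_latency_ps


def local_skew_ps(launch, capture):
    """Skew between two communicating flops (positive if capture is later)."""
    return clock_latency_ps(capture) - clock_latency_ps(launch)


def global_skew_ps(sinks):
    """Latest minus earliest clock arrival over the whole clock domain."""
    latencies = [clock_latency_ps(s) for s in sinks]
    return max(latencies) - min(latencies)


# Hypothetical flops in one clock domain.
ff_a = Sink("ff_a", 450)
ff_b = Sink("ff_b", 520)
ff_c = Sink("ff_c", 480)

print(clock_latency_ps(ff_a))              # total insertion delay of ff_a
print(local_skew_ps(ff_a, ff_b))           # ff_b captures later: positive skew
print(local_skew_ps(ff_b, ff_c))           # ff_c captures earlier: negative skew
print(global_skew_ps([ff_a, ff_b, ff_c]))
```

Because the source latency is common to every sink in this sketch, it cancels out of both skew figures; only the network latencies differ.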
Clock Parameters & Their Impact:
Every practical clock has parameters such as skew, jitter, latency, and slew, which are also its non-idealities.
Reasons behind Skew & Jitter:
(1) Clock generation: manufacturing device variations in clock drivers
(2) Interconnect variations: number of buffers, device variation, wire length and variation, coupling, load
(3) Environmental variations: power supply and temperature
Clock skew and jitter can limit the performance of a digital system. Designing a clock network that minimizes both is important.
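One common textbook-style way to see this limit is through the setup and hold checks between a launch and a capture flop. The sketch below is a simplified model under stated assumptions: skew is positive when the capture clock arrives later, jitter is the peak deviation of a single edge (so the worst case shortens the cycle by twice that amount), jitter is ignored in the hold check, and all delay values are made up.

```python
def min_clock_period_ps(t_clk_q, t_logic_max, t_setup, skew, jitter):
    """Smallest period that still meets setup in this simplified model.
    Positive skew gives the data more time; jitter takes time away."""
    return t_clk_q + t_logic_max + t_setup - skew + 2 * jitter


def hold_slack_ps(t_clk_q, t_logic_min, t_hold, skew):
    """Hold slack on the same edge; positive skew makes hold harder to meet."""
    return t_clk_q + t_logic_min - t_hold - skew


# Hypothetical path: 100 ps clock-to-Q, 700 ps slowest logic, 50 ps setup.
ideal = min_clock_period_ps(100, 700, 50, skew=0, jitter=0)
with_pos_skew = min_clock_period_ps(100, 700, 50, skew=40, jitter=25)
with_neg_skew = min_clock_period_ps(100, 700, 50, skew=-40, jitter=25)
print(ideal, with_pos_skew, with_neg_skew)

# Hold check on a short path: 30 ps fastest logic, 60 ps hold time.
print(hold_slack_ps(100, 30, 60, skew=40))   # still met
print(hold_slack_ps(100, 30, 60, skew=80))   # negative slack: hold violation
```

In this model negative skew and jitter both lengthen the minimum period, while positive skew helps setup but eats into hold slack, which is why both quantities are kept small.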
How to control clock non-idealities :
- Balanced paths (H-tree network, matched RC trees) can eliminate skew.
- Clock grids are used in the final stage of the clock distribution network to minimize absolute delay (not relative delay).
- If the paths are perfectly balanced, clock skew is zero.
- Distributed buffering reduces absolute delay and makes clock gating easier, but is sensitive to variations in the buffer delay.
- Shield clock wires to minimize or eliminate coupling with neighboring signal nets.
- Keep a close eye on temperature and supply rail variations and their effects on skew and jitter.
- Power supply noise limits the performance of clock networks.
Conventional Clock Tree Distribution:
Single-point CTS is the usual choice for designs with a lower clock frequency and fewer sinks. As the name suggests, a single clock source distributes the clock to every corner of the design. In single-point CTS the point of divergence lies at the clock source, so the sinks have a very long uncommon clock path, which makes the tree more susceptible to on-chip variation (OCV). Clock gates are strategically placed near the source, saving a large amount of dynamic power.
Advantages:
(1) Simple to implement
(2) Better clock gating, reducing power dissipation
Disadvantages:
(1) Higher Insertion delay
(2) More uncommon clock path, more prone to OCV
(3) Tough to achieve lower skew, due to asymmetric distribution of sinks
(4) Not a good choice for high-frequency clocks with a large number of sinks spread over the core region
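The link between uncommon clock path length and OCV sensitivity can be sketched with a simple derating model. This is only an illustration: it assumes a flat 5% late/early derate, assumes the launch and capture branches have equal nominal delay, and treats the shared (common) part of the path as contributing no OCV skew. The function name and all numbers are hypothetical.

```python
def ocv_skew_penalty_ps(uncommon_delay_ps, derate_pct=5):
    """Worst-case skew between two nominally balanced branches when one
    branch is derated late and the other early (assumed flat derates)."""
    late_ps = uncommon_delay_ps * (100 + derate_pct) / 100
    early_ps = uncommon_delay_ps * (100 - derate_pct) / 100
    return late_ps - early_ps


# Hypothetical single-point tree: the branches diverge at the source,
# so most of the insertion delay is uncommon.
single_point_penalty = ocv_skew_penalty_ps(800)

# Hypothetical structure with a late divergence point: short uncommon path.
late_divergence_penalty = ocv_skew_penalty_ps(200)

print(single_point_penalty, late_divergence_penalty)
```

Under these assumptions the penalty scales directly with the uncommon delay, which is the reason a divergence point at the clock source is listed as a disadvantage.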
Clock Mesh:
A clock mesh divides the clock domain into many grid areas. The clock signal reaches the mesh nodes through a pre-driver buffer chain, and each clock leaf cell takes its clock from a nearby grid wire. A global mesh uses two metal layers, crossing each other, to spread the clock signal delivered by the top-level chain across the whole clock domain while keeping clock skew and clock delay well controlled. A network of pre-mesh drivers carries the clock signal from the clock port to the inputs of the mesh drivers. The outputs of all the mesh drivers are shorted together by the metal mesh, which carries the clock signal across the block on horizontal and vertical metal stripes. Power dissipation in a mesh structure is high because clock gating cells are inserted after the mesh net, so clock gating is done only at the local level.
Advantages:
(1) Lower Skew
(2) Highly tolerant to On-Chip Variation
(3) Possible to achieve lower insertion delay
Disadvantages:
(1) More power dissipation (Dynamic)
(2) More routing resources required for creating mesh
(3) Difficult to implement
H tree & Fish Bone:
The H-tree structure is highly symmetrical, with the pre-drive buffers evenly distributed on the trunk. It can manage clock skew for clock domains with a large number of flip-flops. However, the H-tree structure consumes more routing resources and more power.
Multi Source Clock Tree System:
MSCTS is a hybrid system containing the best aspects of a conventional clock tree and a pure clock mesh. A clock mesh delivers the best possible clock frequency, skew, and OCV results, whereas a conventional clock tree delivers the lowest power consumption and the easiest flow. Multisource CTS offers a compromise between the two methods while favoring the OCV-tolerant nature of a pure clock mesh.
An MSCTS design comprises three different structures:
(i) Pre-mesh Clock Tree
(ii) Multisource Mesh Fabric
(iii) Moderately sized clock trees.
Pre-Mesh Clock Tree:
Each buffer in the pre-mesh tree drives four other buffers. The pre-mesh topology is similar to H-tree placement and routing. The H-tree structure is a uniform, scalable, and predictable method of distributing the root clock over a large area. It is also tolerant of corner-to-corner variation because of its balanced structure.
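The balanced, fan-out-of-four structure can be sketched geometrically. The code below is a simplified model, not a CTS tool flow: it places each buffer's four children at the corners of an "H", halves the span at every level, and measures Manhattan wire length from the root. The function name, the number of levels, and the dimensions are assumptions for illustration.

```python
def h_tree_leaves(x, y, half_span, levels, path_len=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an H-tree
    in which each buffer drives four child buffers."""
    if levels == 0:
        return [(x, y, path_len)]
    leaves = []
    for dx in (-half_span, half_span):
        for dy in (-half_span, half_span):
            # Route horizontally along the bar of the H, then vertically
            # along its leg: Manhattan length |dx| + |dy| to each child.
            leaves += h_tree_leaves(x + dx, y + dy, half_span / 2,
                                    levels - 1,
                                    path_len + abs(dx) + abs(dy))
    return leaves


# Three levels of fan-out-of-four buffers rooted at the die centre (assumed).
leaves = h_tree_leaves(0.0, 0.0, half_span=4.0, levels=3)
wire_lengths = {length for _, _, length in leaves}
print(len(leaves), wire_lengths)
```

Every leaf ends up with the same wire length from the root, which is the geometric reason a balanced H-tree gives nominally equal arrival times at all of its end points.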
Multi Source Mesh Fabric:
The multi-source mesh fabric resembles a power/ground or clock mesh fabric, although less dense. The coarse fabric smoothes out any remaining clock arrival time differences from the multiple H-tree buffers that directly drive the fabric. The measured skew at the mesh plane is effectively zero.
Moderately Sized Clock Trees :
Multiple moderately sized clock trees are attached to the coarse mesh. These multiple sources are what give the technology its name.
Benefits of Multisource CTS:
(1) Better performance and lower skew than conventional CT
(2) Better OCV Tolerance than conventional CT
(3) Better multi-corner performance than conventional CT
(4) Less power consumption than pure clock mesh
(5) Greater tolerance for irregular, macro-dense designs than pure clock mesh
(6) Faster and easier flow than pure clock mesh.