
4/14/2025

What is Specialized Routing in VLSI Physical Design?


 



In this article, we have provided an in-depth discussion on specialized routing in VLSI Physical Design, covering several key concepts and techniques essential to advanced chip design. We begin by outlining the overall design flow and introducing the role of specialized routing in enhancing performance and efficiency. The discussion includes detailed insights into area routing, focusing on its primary objectives and the various optimization factors involved. We then explore the fundamentals of clock networks, examining delay issues, clock skew, and common routing challenges. Additionally, we present a multi-part analysis of modern clock tree synthesis techniques, comparing methods like MMM and RGM, and concluding with strategies for optimizing clock skew and managing power trade-offs in complex VLSI systems.


Design Flow & Specialized Routing :


In digital integrated circuits, signal wires undergo global routing first followed by detailed routing. However, in certain designs—such as analog circuits and printed circuit boards (PCBs) with gridless (trackless) routing—this distinction is unnecessary. Similarly, older or smaller designs with only one or two metal layers also fall into this category. When global and detailed routing are not handled separately, area routing is used to directly establish metal connections for signal routing. Unlike routing with multiple metal layers, area routing prioritizes minimizing wire crossings. Clock signals require special considerations.


Area Routing :




Objective: Area routing aims to connect all nets while bypassing global routing, operating within the available layout space, and adhering to geometric and electrical constraints.


Optimization Goals: Minimize total routed length and number of vias, optimize wiring area and routing layers used, reduce circuit delay while maintaining uniform wire density, lower capacitive coupling between adjacent routes.

Constraints Considered: Technological constraints, such as the number of routing layers and wire width; electrical constraints, such as signal integrity and coupling effects; and geometric constraints, such as preferred routing directions and wire pitch.

Impact of Net Ordering: The sequence of net routing affects efficiency and runtime. Greedy wire-length-based routing can cause inefficiencies, and multi-pin nets require careful decomposition and ordering.

Net and Pin Ordering Strategies:

1. Pin Ordering: Use Steiner tree-based algorithms to convert multi-pin nets into 2-pin nets. Sort pin locations by x-coordinate and connect from left to right using shortest-path algorithms.

2. Net Ordering Challenges: Finding the optimal net order is complex (n! possibilities).

Net Ordering Rules: Nets with larger aspect ratios are routed first. If AR is the same, the shorter net length is prioritized. If a net’s pins are fully inside another net’s bounding box, it is routed first. The net with fewer interfering pins in its bounding box is routed first. Ties are resolved based on total pin count inside the bounding box.
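To make the ordering concrete, here is a minimal Python sketch of such a rule-based comparator. The net fields (bounding box, aspect ratio, length, pin list) and the pins_inside helper are hypothetical placeholders for this example, and the bounding-box rules are approximated with a simple pin count:

```python
from functools import cmp_to_key

def pins_inside(net, other):
    """Count how many of `other`'s pins fall inside `net`'s bounding box."""
    (x1, y1), (x2, y2) = net["bbox"]
    return sum(x1 <= x <= x2 and y1 <= y <= y2 for (x, y) in other["pins"])

def order_nets(nets):
    """Order nets for routing using the aspect-ratio / bounding-box rules above."""
    def compare(a, b):
        # Rule 1: larger aspect ratio first.
        if a["aspect_ratio"] != b["aspect_ratio"]:
            return -1 if a["aspect_ratio"] > b["aspect_ratio"] else 1
        # Rule 2: same aspect ratio -> shorter net length first.
        if a["length"] != b["length"]:
            return -1 if a["length"] < b["length"] else 1
        # Rules 3-5 (approximated): fewer interfering pins inside the bounding box first.
        ia, ib = pins_inside(a, b), pins_inside(b, a)
        if ia != ib:
            return -1 if ia < ib else 1
        return 0
    return sorted(nets, key=cmp_to_key(compare))
```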


Basic Concepts in Clock Networks:


Most digital designs are synchronous, i.e., computations occur in sync with a clock signal. The clock ensures that internal state variables and inputs are processed correctly through combinational logic, generating new outputs and state updates. The clock signal, often called the system's heartbeat, can be generated off-chip or by on-chip analog circuits like PLLs/DLLs. Its frequency may be divided or multiplied depending on the needs of different circuit blocks.

Clock tree routing is used to distribute the clock signal efficiently. It builds a clock tree for each clock domain, ensuring that the signal reaches all flip-flops and latches (sinks) at the same time. Unlike other routing types, clock routing focuses on minimizing skew so that all parts of the circuit receive the clock simultaneously.

A clock routing problem involves connecting (n + 1) terminals: one source and n sinks. A clock routing solution consists of wire segments that connect all terminals, ensuring the signal from the source reaches every sink.

This solution has two key aspects:

i. Clock tree topology

ii. Embedding

Clock tree topology: A rooted binary tree G with n leaves representing the sinks. Internal nodes include the source and any additional Steiner points. Embedding: Defines the exact physical placement of the edges and internal nodes in the topology. Fig.a illustrates a six-sink clock tree instance, Fig. b shows its connection topology, and Fig. c presents a possible embedded clock tree solution.


Delay in Clock Networks:


Signal delay is the time a signal takes to switch states (low to high or high to low) as it travels through a routing tree. It starts at logic gate outputs, built from nonlinear transistors, and moves through wires and vias, which add parasitic effects. Exact delay calculations are complex, so tools like SPICE or PrimeTime are used for precise "signoff delay" measurements. However, place-and-route algorithms use approximate models, such as the linear and Elmore delay models.
Linear Delay Model : In the linear delay model, signal delay between nodes i and j is proportional to the path length in the routing tree, independent of connection topology. The normalized linear delay between nodes u and w is the sum of edge lengths along the path. On-chip wires have both resistance (R) and capacitance (C), which increase with length. Due to RC effects, wire delay grows quadratically, which the linear model does not capture. Despite this, it remains useful in design tools, especially for older technologies with lower drive resistance and wider wires. Its simplicity makes it widely used in EDA software.
Elmore Delay Model : The Elmore delay model provides a more accurate delay estimate by considering resistances and capacitances in the routing tree, especially for modern circuits with significant RC effects. Physical design tools use the Elmore delay approximation for three key reasons. First, it considers the effect of off-path wire capacitance on sink delay. Second, it provides a good balance of accuracy and correlation with circuit simulator estimates. Third, it can be computed efficiently in linear time using two depth-first traversals—one to determine capacitance below each node and another to calculate delays from the source to each node.
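As an illustration of the two-traversal computation described above, here is a minimal Python sketch of Elmore delay on a routing tree. The tree representation (parent/child lists with per-edge resistance and capacitance) is an assumption for the example, and the wire capacitance is lumped with a simple pi model:

```python
def elmore_delays(children, edge_res, edge_cap, node_cap, root):
    """Compute Elmore delay from `root` to every node of a routing tree.

    children[v] : list of child nodes of v
    edge_res[v] : resistance of the edge from v's parent to v
    edge_cap[v] : capacitance of that edge
    node_cap[v] : sink/pin capacitance at v
    """
    # Pass 1 (bottom-up): total capacitance hanging below each node.
    subtree_cap = {}
    def downstream_cap(v):
        c = node_cap.get(v, 0.0)
        for u in children.get(v, []):
            c += edge_cap[u] + downstream_cap(u)
        subtree_cap[v] = c
        return c
    downstream_cap(root)

    # Pass 2 (top-down): delay(v) = delay(parent) + R_edge * (C_edge/2 + downstream cap).
    delay = {root: 0.0}
    def propagate(v):
        for u in children.get(v, []):
            delay[u] = delay[v] + edge_res[u] * (edge_cap[u] / 2.0 + subtree_cap[u])
            propagate(u)
    propagate(root)
    return delay
```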

Clock Skew :









Clock Skew : This is the maximum difference in clock signal arrival times between sinks. Since the clock signal should reach all sinks simultaneously, minimizing skew is crucial
for circuit timing.
Local skew : The maximum difference in clock arrival times between related sinks (sinks that are sequentially adjacent, meaning there is a combinational logic path between them).
Global skew : The maximum difference in clock arrival times between any two sinks, whether related or not. It is the difference between the shortest and longest source-to-sink path delays in the clock tree. In most cases, global skew is what is referred to as "clock skew" in design analysis.









Clock Routing Problems:

There are a few well-known clock routing problem formulations. Modern low-power clock network design integrates zero-skew trees and relies on SPICE for accurate circuit simulation.

Zero-Skew Tree (ZST) Problem : A ZST ensures that the clock signal reaches all sinks at the same time. Skew is defined using a delay estimate like linear or Elmore delay.
Bounded-Skew Tree (BST) Problem : While ZSTs are useful in theory, exact zero skew is not practical due to increased wirelength (which leads to higher capacitance) and manufacturing variations (which cause differences in wire resistance and capacitance). Instead, real-world clock trees allow a small skew within a given bound.

Useful-Skew Tree Problem : In some cases, global skew control is unnecessary. Instead, the focus is on local skew between related flip-flops or latches. Enforcing strict global skew constraints can overcomplicate the problem. Fishburn proposed a useful-skew method, where clock arrival times at sinks are intentionally adjusted to minimize the clock period and maximize the timing margin. This approach helps ensure that data signals between flip-flops arrive neither too late (zero clocking) nor too early (double clocking). By optimizing sink arrival times, the clock period (P) can be reduced, improving circuit performance.
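For context, the timing constraints behind zero and double clocking can be written out. This is a sketch, using x_i for the clock arrival time at flip-flop i, d_min(i,j)/d_max(i,j) for the shortest/longest combinational delay from i to j, and t_setup/t_hold for the register requirements:

- No zero clocking (setup): x_i + d_max(i,j) + t_setup ≤ x_j + P
- No double clocking (hold): x_i + d_min(i,j) ≥ x_j + t_hold

Useful-skew methods choose the arrival times x_i so that all such constraints hold while the clock period P is minimized.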


Modern Clock Tree Synthesis

Clock trees are crucial in synchronous circuit design, affecting both performance and power consumption. A well-designed clock tree ensures low skew, delivering the clock signal to all sequential gates simultaneously. After the initial tree construction, the clock tree goes through clock buffer insertion to strengthen the signal and skew optimization to further reduce timing variations.

Constructing Trees with Zero Global Skew :
Below are five early clock tree construction algorithms whose core ideas still influence modern EDA tools. These algorithms address different scenarios:
1. Building a clock tree without considering exact sink positions.
2. Constructing both the tree structure and the physical layout simultaneously.
3. Given a predefined topology, determining its physical placement.



1. H-Tree :
The H-tree is a self-similar fractal structure that ensures exact zero skew due to its symmetry. It is built by recursively dividing a unit square :
- A central segment is placed through the root.
- Two shorter perpendicular segments extend to the centers of four quadrants.
- This process continues until reaching all sinks.
The H-tree is widely used for top-level clock distribution, but it has limitations:
1. Blockages can disrupt the pattern.
2. Irregular sink placement makes direct implementation difficult.
3. High routing cost.
To reduce signal reflections, wire tapering is used, where the width halves at each branching point.
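A short recursive sketch of H-tree segment generation in Python follows; the unit-square region, coordinates, and recursion depth are illustrative only, and real implementations would also handle blockages and wire tapering:

```python
def h_tree(cx, cy, half_w, half_h, levels, segments=None):
    """Recursively generate the wire segments of an H-tree rooted at (cx, cy).

    Each level adds one horizontal trunk and two vertical branches (the 'H'),
    then recurses into the four quadrant centers.
    """
    if segments is None:
        segments = []
    if levels == 0:
        return segments
    # Horizontal trunk through the center.
    segments.append(((cx - half_w / 2, cy), (cx + half_w / 2, cy)))
    # Two vertical branches at the trunk ends.
    for x in (cx - half_w / 2, cx + half_w / 2):
        segments.append(((x, cy - half_h / 2), (x, cy + half_h / 2)))
        # Recurse into the two quadrant centers reached by this branch.
        for y in (cy - half_h / 2, cy + half_h / 2):
            h_tree(x, y, half_w / 2, half_h / 2, levels - 1, segments)
    return segments

# Example: a 3-level H-tree over a unit square centered at (0.5, 0.5).
wires = h_tree(0.5, 0.5, 0.5, 0.5, levels=3)
```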

2. Method of Means and Medians (MMM) :



The MMM algorithm, proposed by Jackson, Srinivasan, and Kuh in 1990, improves on the H-tree by handling arbitrarily placed sinks. It follows a recursive approach: (1) partition the terminals into two equal subsets based on the median; (2) connect the center of mass of the whole set to the centers of mass of both subsets (the means). This method is flexible and adapts to different sink distributions. While MMM can be simplified into an H-tree-like approach, it only minimizes skew heuristically. In the worst case, the longest source-to-sink path can be as large as the chip diameter, making the algorithm's effectiveness highly dependent on cut direction selection.
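A rough Python sketch of the MMM recursion follows; it simply alternates the cut direction at each level, whereas production implementations choose cut directions more carefully:

```python
def center_of_mass(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def mmm(points, cut="x", wires=None):
    """Method of Means and Medians: recursively split the sinks at the median
    and connect the parent's center of mass to each half's center of mass."""
    if wires is None:
        wires = []
    if len(points) <= 1:
        return wires
    pts = sorted(points, key=lambda p: p[0] if cut == "x" else p[1])
    mid = len(pts) // 2
    left, right = pts[:mid], pts[mid:]
    parent = center_of_mass(pts)
    for half in (left, right):
        wires.append((parent, center_of_mass(half)))
        mmm(half, cut="y" if cut == "x" else "x", wires=wires)
    return wires
```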
3. Recursive Geometric Matching (RGM) :




The RGM algorithm, introduced in 1991, is a bottom-up approach to clock tree construction, unlike the top-down MMM algorithm.
How RGM Works :
The algorithm finds a minimum-cost geometric matching of the sinks: n/2 line segments, no two of which share an endpoint, with minimum total length. After matching, a balance (tapping) point is placed on each segment to maintain zero skew between the connected sinks. The n/2 tapping points from one stage serve as inputs for the next matching step. The process continues until the clock tree is fully constructed.
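Here is a hedged sketch of a single RGM stage in Python: the matching is a greedy approximation of the minimum-cost geometric matching, and the tapping point is taken as the segment midpoint rather than a delay-balanced point:

```python
from itertools import combinations

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def rgm_level(points):
    """Greedily pair up points (one RGM stage); returns the matched pairs and
    the midpoints that feed the next stage (one point is left over if n is odd)."""
    pairs, used = [], set()
    # Consider candidate pairs from cheapest to most expensive.
    for d, i, j in sorted((manhattan(p, q), i, j)
                          for (i, p), (j, q) in combinations(enumerate(points), 2)):
        if i not in used and j not in used:
            used.update((i, j))
            pairs.append((points[i], points[j]))
    midpoints = [((a[0] + b[0]) / 2, (a[1] + b[1]) / 2) for a, b in pairs]
    return pairs, midpoints
```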


MMM vs. RGM :


RGM improves clock tree balance and reduces wirelength compared to MMM. The figure above compares MMM and RGM for four sinks. If MMM selects a poor cut direction, RGM can cut wirelength in half. However, like MMM, RGM does not ensure zero skew. If two subtrees have very different delays and their roots are matched, finding a zero-skew tapping point may not be possible.


4. Exact Zero Skew Algorithm :


Proposed in 1991, the Exact Zero Skew Algorithm improves upon RGM by ensuring precise skew balancing using the Elmore delay model instead of the simpler linear delay model.

Key Feature : It calculates exact zero-skew tapping points using the Elmore delay model, leading to lower actual clock skew in real designs. When merging two subtrees with different source-to-sink delays, it adjusts wire lengths to equalize the delays. During merging, the zero-skew tapping point is carefully placed along the matched segment.
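Under the Elmore model, this balancing step has a closed-form solution. Below is a small Python sketch of it, assuming per-unit wire resistance r and capacitance c and a connecting wire of length L greater than zero; the variable names are mine, not the original paper's:

```python
def zero_skew_merge(t_a, C_a, t_b, C_b, L, r, c):
    """Find the tapping point x (fraction of wire length L, measured from subtree a)
    that equalizes the Elmore delays to the two subtrees being merged.

    t_a, t_b : source-to-sink delays of the two subtrees
    C_a, C_b : downstream capacitances of the two subtrees
    L        : length of the wire connecting the two subtree roots (L > 0 assumed)
    r, c     : wire resistance and capacitance per unit length
    """
    x = (t_b - t_a + r * L * (C_b + c * L / 2)) / (r * L * (C_a + C_b + c * L))
    if 0.0 <= x <= 1.0:
        return x  # tapping point lies on the segment
    # Otherwise the segment is too short to balance delays:
    # the wire must be elongated ("snaked") toward the faster subtree.
    return None
```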
5. Deferred-Merge Embedding :



The DME algorithm improves clock tree construction by delaying the selection of merging (tapping) points for subtrees. Unlike earlier methods that fix internal node locations early, DME optimally places these nodes later, ensuring minimal source-to-sink delay and minimal total tree cost.
How DME Works : Unlike MMM and RGM, which only require a set of sink locations, DME needs a predefined tree topology as input. Key Advantage : MMM, RGM, and Exact Zero Skew fix internal node positions too early, limiting flexibility; DME ensures more optimal placement of internal nodes, leading to better clock tree performance.

Clock Skew Optimization & Power Trade-off :





Key Challenges: Low clock skew is critical for high-performance designs. Clock networks consume significant power, requiring trade-offs between skew and capacitance. Accurate timing analysis is needed, although simulations are time-consuming. Closed-form delay models are inaccurate. A common approach is to optimize using Elmore delay model and then fine-tune with more accurate models.
Clock Tree Optimization Steps:
Clock tree optimization includes:
1. Geometric clock tree construction,
2. Initial clock buffer insertion,
3. Clock buffer sizing,
4. Wire sizing,
5. Wire snaking.
These steps account for PVT variations.

High-Level Skew Optimization:
- Earlier, a single buffer could drive the entire clock tree; with technology scaling, multiple buffers are required.
- Buffer insertion ensures the clock signal reaches all sinks efficiently.
- van Ginneken's algorithm optimally buffers a tree to minimize Elmore delay. It runs in O(n²) time, where n is the number of buffer locations; a faster O(n log n) variant improves scalability.
- These optimizations reduce skew, power consumption, and variability effects.
Clock Buffer Sizing:
- Initial buffer sizes impact later optimizations.
- The best size is determined experimentally (e.g., by binary search).
- To fix skew between two sinks s₁ and s₂, identify the unique path between them and upsize its buffers based on precomputed tables.
- Larger buffers improve robustness but increase power and delay.
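As a toy illustration of the experimental sizing step, here is a hedged Python sketch that searches a sorted list of buffer sizes for the one with minimum skew. Strictly speaking it is a ternary search, it assumes the skew metric is roughly unimodal in buffer size, and skew_of() stands in for a timing model (hypothetical here):

```python
def best_buffer_size(sizes, skew_of):
    """Pick an initial clock-buffer size by searching a sorted list of sizes.

    sizes   : available buffer sizes, sorted ascending
    skew_of : function mapping a size to the resulting skew (e.g., a timing model)
    """
    lo, hi = 0, len(sizes) - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if skew_of(sizes[m1]) < skew_of(sizes[m2]):
            hi = m2  # minimum lies in the lower part
        else:
            lo = m1  # minimum lies in the upper part
    return min(sizes[lo:hi + 1], key=skew_of)
```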

Wire Sizing:
- Wire width affects power and manufacturing variation.
- Wider wires are more resilient to variation, although both capacitance and power consumption are higher.
- Thinner wires are used in low-power designs.
- Wire width can be adjusted dynamically based on timing analysis.
Low-Level Skew Optimization:
- Focuses on local adjustments with higher precision.
- Techniques include wire sizing (adjusting widths for fine-tuned timing), wire snaking (increasing path length to delay fast signals), and wire detouring (increasing capacitance and resistance to slow propagation).




Variation Modeling:
- Process variations impact each transistor differently.
- Environmental factors (e.g., temperature and voltage fluctuations) affect performance.
- Two modeling approaches: (1) Monte Carlo simulations, accurate but slow; (2) precomputed lookup tables, efficient and reusable.
- Lookup tables capture worst-case skew variations based on technology node, buffer/wire library, path length, variation model, and yield, enabling fast and accurate optimizations (e.g., buffer sizing).

Advanced Clocking Techniques: active deskewing and clock meshes (common in CPUs); clock gating (reduces power dissipation).


Watch the video lecture here:






Courtesy: Image by www.pngegg.com

4/07/2025

What is Detailed Routing in VLSI Physical Design?

 



In this article, we have explored the concept of detailed routing in VLSI Physical Design, an essential step in the overall design flow that ensures efficient signal connectivity while optimizing chip area and performance. We begin by discussing the various routing techniques used in modern VLSI design, along with the significance of horizontal and vertical constraints in determining routing feasibility. The video further delves into zone representation and the Horizontal & Vertical Constraint Graph, which are crucial for structuring the routing process systematically. Additionally, we cover different routing methodologies, including channel routing techniques such as the Left Edge Algorithm and Dogleg Routing, as well as switchbox routing and Over-The-Cell (OTC) routing, explaining both the algorithm and methodology behind OTC routing. Finally, we address the modern challenges faced in detailed routing, highlighting the complexities and evolving strategies required to meet the demands of advanced semiconductor technologies.


Design Flow and Detailed Routing:




In VLSI, routing is a crucial step that connects the various components on a chip. It involves determining the paths for electrical connections, which are implemented through metal layers on the chip. Routing is divided into three major stages:

(i) Global, (ii) Detailed & (iii) Specialized Routing.

The layout region is represented during global routing using a coarse grid of global routing cells (gcells) or more general routing regions (channels, switchboxes). After global routing, each net undergoes detailed routing. Objective of detailed routing: Assign route segments of signal nets to specific routing tracks, vias, and metal layers while following global routes and design rules.

Key advantage:

(a) Detailed routing of one gcell can be performed independently as long as routes remain connected across neighboring gcells.

(b) This enables an efficient divide-and-conquer strategy and supports parallel algorithms, allowing detailed routing runtime to theoretically scale linearly with layout size.

(c) Traditional detailed routing occurs within routing regions such as channels and switch boxes. Modern designs use over-the-cell (OTC) routing , allowing routing over standard cells.

Due to technology scaling, modern detailed routers must consider manufacturing rules and the impact of manufacturing faults.


Different Routing Techniques :




i. Channel Routing :
A type of detailed routing where connections between terminal pins are routed within a channel with no obstacles. Pins are located on opposite sides of the channel. Conventionally, the channel is oriented horizontally, with pins on the top and bottom. In row-based layouts, routing channels typically have uniform width. In gate-array and standard-cell circuits with more than three metal layers, channel height (the number of routing tracks) is also uniform.

ii. Switchbox Routing : Used when pin locations are on all four sides of a fixed-size routing region. More complex than channel routing due to additional constraints. OTC (Over-The-Cell) Routing : Utilizes additional metal tracks (e.g., Metal3, Metal4) that are not obstructed by cells. Allows routes to cross over cells and channels. Only metal layers and tracks not occupied by cells can be used. When cells use only polysilicon and Metal1, routing can be performed on Metal2, Metal3, etc., and on unused Metal1 resources.

iii. Classical Channel Routing : The routing area is a rectangular grid with pin locations on the top and bottom boundaries. Pins are placed on vertical grid lines or columns. Channel height depends on the number of tracks needed to route all nets. In two-layer routing, one layer is reserved for horizontal tracks while the other layer is reserved for vertical tracks. The preferred routing direction is determined by the floorplan and standard-cell row orientation.


Horizontal & Vertical Constraint:


1. Horizontal Constraint : 

A horizontal constraint between two nets occurs when their horizontal segments overlap when placed on the same track. In the example shown, which includes one horizontal and one vertical routing layer, nets B and C are horizontally constrained. If the horizontal segments of two nets do not overlap, they can be assigned to the same track without conflict; for instance, nets A and B.


2. Vertical Constraint :

 


A vertical constraint between two nets occurs when they have pins in the same column. This means that the vertical segment extending from the top must stop within a short distance to avoid overlapping with the vertical segment coming from the bottom in the same column. If each net is assigned to a single horizontal track, the horizontal segment of the net from the top must be placed above that of the net from the bottom in the same column. In the figure, this constraint ensures that net A's horizontal segment is placed above net B's. To resolve these constraints, at least three columns are needed to separate the two nets. While a vertical constraint implies a horizontal constraint, the reverse is not always true. However, both constraints must be considered when assigning segments within a channel.



Zone Representation :



In a channel, each horizontal wire segment must extend at least from the leftmost to the rightmost pin of its net. Let S(col) represent the set of nets passing through column col. This includes nets that either (1) have a pin in col or (2) connect to pins on both sides of col. Since horizontal segments cannot overlap, each net in S(col) must be assigned a separate track within that column. However, not all columns are necessary to define the entire channel. If a column i has a net set S(i) that is a subset of another column j's set (i.e., S(i) ⊆ S(j)), then S(i) can be ignored, as it imposes fewer constraints on routing. In the figure above, every S(col) is a subset of at least one of S(c), S(f), S(g), or S(i). These columns (c, f, g, and i) form the minimal set needed, as they collectively include all nets. The relative positions of nets in a channel routing instance, defined by horizontal and vertical constraints, can be represented using horizontal and vertical constraint graphs. These graphs help to (1) estimate the minimum number of tracks needed and (2) identify potential routing conflicts.
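A small Python sketch of this bookkeeping follows: it computes S(col) for each column, the track-count lower bound max |S(col)|, and the columns whose net sets are maximal (the zones). The net representation and 1-based column numbering are assumptions for the example:

```python
def zone_columns(nets):
    """Compute S(col) per column, the track lower bound, and the zone columns.

    nets: dict net -> (leftmost_col, rightmost_col) of its horizontal span
    """
    max_col = max(right for _, right in nets.values())
    S = {col: {n for n, (l, r) in nets.items() if l <= col <= r}
         for col in range(1, max_col + 1)}
    # Lower bound on the number of tracks: the largest S(col).
    min_tracks = max(len(s) for s in S.values())
    # Zones: columns whose net set is not a proper subset of another column's set
    # (columns with identical sets are all kept; a real tool would deduplicate).
    zones = [col for col, s in S.items()
             if not any(s < S[other] for other in S if other != col)]
    return S, zones, min_tracks
```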


Horizontal & Vertical Constraint Graph:

1. Horizontal Constraint Graph : 


 

A graphical representation can be used to depict the nets within a channel. This can be done using a Horizontal Constraint Graph (HCG), where nodes (V) represent the nets in the netlist, and edges (E) exist between two nodes if their corresponding nets belong to the same set S(col), meaning they are horizontally constrained.

Fig. (iv) shows the HCG for the channel routing example in Fig.(iii). The minimum number of tracks required for channel routing can be determined using either the HCG or the zone representation. This minimum is given by the largest S(col) set. 


2. Vertical Constraint Graph : 


A Vertical Constraint Graph (VCG) represents vertical constraints in channel routing: nodes (V) represent nets, and directed edges (E) exist between nodes if one net must be placed above another. In Fig. (v), some edges (like B → C) are omitted if they can be inferred from other edges (e.g., B → E).


A cycle in the VCG indicates a conflict where two nets overlap in a column, meaning their horizontal segments would need to be both above and below each other, which is impossible. This is resolved by splitting the net and adding an extra track (Fig. vi).
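Here is a minimal Python sketch of VCG construction with cycle detection, assuming each column has at most one top pin and one bottom pin; the data layout is hypothetical:

```python
def build_vcg(top_pins, bottom_pins):
    """Build the vertical constraint graph and check it for cycles.

    top_pins, bottom_pins: dict column -> net name (or None if no pin there).
    An edge a -> b means net a's trunk must lie above net b's trunk.
    """
    edges = {}
    for col, a in top_pins.items():
        b = bottom_pins.get(col)
        if a is not None and b is not None and a != b:
            edges.setdefault(a, set()).add(b)

    def has_cycle():
        # DFS with colors: 0 = unvisited, 1 = on the current path, 2 = finished.
        color = {}
        def dfs(v):
            color[v] = 1
            for w in edges.get(v, ()):
                if color.get(w, 0) == 1 or (color.get(w, 0) == 0 and dfs(w)):
                    return True
            color[v] = 2
            return False
        return any(color.get(v, 0) == 0 and dfs(v) for v in edges)

    return edges, has_cycle()
```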


Channel Routing Algorithms :

Channel routing aims to minimize the number of tracks needed for routing. In gate-array designs, where channel height is usually fixed, algorithms are developed to ensure complete (100%) routing.




1. Left-Edge Algorithm: An early channel routing algorithm was developed by Hashimoto and Stevens. Their simple and widely used left-edge heuristic, based on the VCG and zone representation, efficiently maximizes track usage. The VCG determines the order in which nets are assigned to tracks,
while the zone representation decides which nets can share the same track. Each net uses only one horizontal segment (trunk).

 The left-edge algorithm works as follows:
1. Start with the topmost track.
2. For all unassigned nets, generate the VCG and zone representation.
3. Process nets from left to right, assigning each to the current track if:
- It has no predecessors in the VCG.
- It does not conflict with previously assigned nets.
4. Once a net is assigned, remove it from the unassigned list.
5. Move to the next track and repeat the process until all nets are assigned.


This algorithm finds a solution with the minimum number of tracks if the VCG has no cycles. Yoshimura later improved track selection by considering net length in the VCG, and Yoshimura and Kuh further optimized track usage by splitting nets before constructing the VCG.
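A simplified Python sketch of this track-by-track assignment follows; the net spans and VCG predecessor sets are assumed inputs, and the sketch returns a partial assignment if a VCG cycle blocks progress:

```python
def left_edge(nets, vcg_parents):
    """Greedy left-edge track assignment (simplified sketch).

    nets        : dict net -> (left_col, right_col) span of its horizontal trunk
    vcg_parents : dict net -> set of nets that must lie on higher tracks (VCG predecessors)
    Returns a dict net -> track index (0 = topmost); partial if the VCG has a cycle.
    """
    def overlaps(a, b):
        (la, ra), (lb, rb) = nets[a], nets[b]
        return not (ra < lb or rb < la)

    unassigned = set(nets)
    assignment, track = {}, 0
    while unassigned:
        placed = []
        # Scan candidate nets from left to right by the left edge of their span.
        for net in sorted(unassigned, key=lambda n: nets[n][0]):
            # Rule 1: all VCG predecessors must already be on higher tracks.
            if vcg_parents.get(net, set()) & unassigned:
                continue
            # Rule 2: no horizontal overlap with nets already on this track.
            if any(overlaps(net, other) for other in placed):
                continue
            assignment[net] = track
            placed.append(net)
        if not placed:
            break  # VCG cycle: no net can be placed; a dogleg is needed
        unassigned -= set(placed)
        track += 1
    return assignment
```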


Dogleg Routing :






To handle cycles in the VCG, an L-shaped "dogleg" can be used. Doglegs help resolve conflicts in VCGs and reduce the total number of tracks. The dogleg algorithm, developed in the 1970s, eliminates cycles and minimizes routing tracks by extending the left-edge algorithm.




 It splits p-pin nets (p > 2) into p – 1 horizontal segments, but only in columns where the net has a pin, assuming no extra vertical tracks are available. After splitting, the algorithm follows the left-edge approach, with subnets represented in the VCG and zone representation.
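A tiny Python sketch of that splitting step; the columns are given as integers and the helper name is mine:

```python
def split_into_subnets(pin_columns):
    """Split a multi-pin net into 2-pin subnets at its own pin columns
    (the dogleg restriction: splits happen only where the net has a pin)."""
    cols = sorted(pin_columns)
    # A p-pin net yields p - 1 horizontal subnets, each spanning
    # two consecutive pin columns.
    return [(cols[i], cols[i + 1]) for i in range(len(cols) - 1)]

# Example: a 4-pin net with pins in columns 2, 5, 9, 11
# becomes three subnets: (2, 5), (5, 9), (9, 11).
```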

Switchbox Routing :






Switchbox routing algorithms are often derived from channel routing techniques. Luk extended a greedy channel router by Rivest and Fiduccia to develop a switchbox routing algorithm with key improvements:
1. Pin assignments are made on all four sides.
2. A horizontal track is assigned automatically to a pin on the left.
3. Jogs are used for top and bottom pins and for horizontal tracks connected to the rightmost pins.
While this algorithm performs similarly to the greedy channel router, it does not guarantee full routability due to fixed switchbox dimensions.

Ousterhout et al. introduced a channel and switchbox router that considers obstacles like pre-routed nets. Cohoon and Heck developed BEAVER, which optimizes routing area and via usage. BEAVER offers flexibility in layer routing and employs four strategies:
1. Corner-routing – uses horizontal and vertical segments forming bends.
2. Line-sweep routing – handles simple connections and straight segments.
3. Thread-routing – supports various connection types.
4. Layer assignment – optimizes layer usage.
BEAVER surpasses previous academic routers in routing area and via efficiency.
Another notable switchbox router, PACKER, developed by Gerez and Herrmann in 1989, follows three main steps:
1. Routing each net independently without considering capacity constraints.
2. Resolving conflicts using connectivity-preserving local transformations (CPLT).
3. Modifying net segments locally to reduce congestion.


Over-the-Cell (OTC) Routing Algorithms :




Most routing algorithms focus on two-layer routing. However, modern standard-cell designs use multiple layers, requiring extensions to these algorithms. One common approach places cells back-to-back or without routing channels. Internal routing primarily uses Poly and Metal1, while higher metal layers (e.g., Metal2 and Metal3) remain unobstructed and are used for over-the-cell (OTC) routing. These layers are represented by a coarse routing grid of gcells. Nets are first globally routed as Steiner trees and then detail-routed.

Another approach introduces channels between cells, but routing within them is limited to internal layers like Poly and Metal1. Higher metal layers (Metal2, Metal3) handle most routing, making traditional routing channels unnecessary. Instead, routing occurs across the entire chip rather than in defined channels or switchboxes.

OTC routing often coexists with channel routing. For example, IP blocks may block routing on lower metal layers, forming channels or switchboxes between them. FPGA fabrics use fewer metal layers to reduce costs, clustering interconnects into channels. FPGAs also include pre-designed components like multipliers and DSP blocks that rely on OTC routing. Modern FPGAs feature express-wires on higher metal layers that cross logic elements.



Modern Challenges in Detailed Routing :

1. Technology Scaling & Wire Widths:
Demand for low-cost, high-performance, and low-power ICs has driven technology scaling since the 1960s. Different metal layers use wires of varying widths, with wider wires on higher layers improving signal speed but reducing routing tracks. Thicker wires are commonly used for clock, power supply, and global interconnects.

2. Routing Complexity & Layer Configurations:
Modern ICs use different metal layer configurations to optimize performance. Vias connecting wires of different widths block routing resources on layers with smaller pitches.
3. Manufacturing Yield & Detailed Routing: 
Yield concerns require via doubling and non-tree routing for redundancy. At advanced nodes, design rules become restrictive, specifying minimum spacing between wires and vias based on width and corner proximity. Forbidden pitch rules prohibit certain wire spacings while allowing others.

4. Via Defects & Mitigation: 
Vias connect wires between metal layers but may misalign or degrade due to electromigration. Partial via failure increases resistance, leading to timing violations; complete failure disconnects nets. Double vias improve reliability but require additional area and adherence to design rules. Congested areas may limit via doubling options.

5. Interconnect Defects & Redundancy Measures:
Shorts & Opens: Shorts (unintended wire connections) are mitigated by increased spacing, though excessive spacing
raises wire length. Opens (broken connections) are addressed via non-tree routing and redundant wiring, which improves
reliability but increases short risks.

6.  Antenna-Induced Defects: 
Excessive charge buildup during plasma etching can damage transistor gates. Mitigation involves controlling metal-to-gate area ratios and rerouting via new or relocated vias.

7. Manufacturability-Aware Routing:
Yield-optimized detailed routing has been proposed, but its benefits are hard to quantify pre-manufacturing, limiting industry adoption.







Watch the video lecture here:







Courtesy : Image by www.pngegg.com