12/05/2024

Debugging Techniques, Timing and Strategies, Synthesis : Episode - 4



In this article, we cover several advanced topics in synthesis and optimization with Yosys. We start with the internal methodology of synthesis tools and how their engines work together. Next, we look at the optimization passes (macros) in Yosys and how they simplify the design during synthesis. We then introduce FSM (Finite State Machine) optimization: why it matters, which Yosys passes handle it, how Yosys detects FSMs, and how it extracts and optimizes them. We also discuss technology mapping, which substitutes cells, gates, and subcircuits so that the design uses what the target technology provides. Finally, we close with a synthesis summary of key rules for writing synthesizable Verilog.

Synthesis Tool Internal Methodology :


We'll review the above infographic, which explains the key engines in a synthesis tool, using Yosys as a reference. These engines work in sequence to turn your HDL design into an optimized, technology-specific netlist.

Key Engines in Synthesis: 
1. Translation Engine  
   - Converts the Verilog description (written at any abstraction level) into an unoptimized intermediate representation (IR).
   - This step happens once and is essential for the subsequent stages.  

2. Optimization Engine  
   - Receives the unoptimized IR and performs multiple iterations of logic optimizations.  
   - These optimizations aim to improve power, performance, and area (PPA).  
   - Runs both one-time and iterative optimizations until a predefined goal is achieved.  

3. Mapping Engine
   - Combines the optimized gate-level netlist with technology-specific data (from standard cell libraries and PDKs).  
   - Outputs an optimized, technology-mapped gate-level netlist that embeds technology-specific information.

This flow highlights how each engine plays a distinct role, with the translation engine used once, the optimization engine iterating multiple times, and the mapping engine finalizing the process.  
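As a rough illustration of the three engines, the sketch below assembles a Yosys script and groups its commands by engine. The commands themselves are standard Yosys passes, but the grouping into "translation", "optimization" and "mapping" is an interpretation for this article, and the file and module names (mydesign.v, mytop, mycells.lib, synth.v) are placeholders.

```python
def build_synth_script(verilog_file, top, liberty_file, out_file):
    """Return a Yosys script whose commands are grouped by engine.

    The grouping is an assumption made for illustration; Yosys itself
    just runs the commands in order.
    """
    translation = [
        f"read_verilog {verilog_file}",    # parse the HDL
        f"hierarchy -check -top {top}",    # resolve the module hierarchy
        "proc",                            # turn always blocks into RTL cells
    ]
    optimization = ["opt", "fsm", "opt", "memory", "opt"]
    mapping = [
        "techmap",                             # RTL cells -> internal gates
        "opt",
        f"dfflibmap -liberty {liberty_file}",  # registers -> library flip-flops
        f"abc -liberty {liberty_file}",        # combinational logic -> library gates
        "clean",
        f"write_verilog {out_file}",
    ]
    return "\n".join(translation + optimization + mapping)


print(build_synth_script("mydesign.v", "mytop", "mycells.lib", "synth.v"))
```

Note that the translation commands appear once, while "opt" is repeated between the other steps, which mirrors the one-time versus iterative behavior described above.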

More On Yosys Optimization Macros:

opt_muxtree: 

- Optimizes multiplexer cell trees by analyzing select inputs.

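- Where the select value of an inner multiplexer is already implied by an enclosing multiplexer, replaces the inner select with a constant so the dead branch can be removed. For example, in a ? (a ? 1 : 2) : 3 the inner a is always true, so the value 2 can never be selected.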
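The idea behind opt_muxtree can be sketched on a toy expression tree. This is only a simplified model, not the Yosys implementation: the tuple representation is invented for this example, and the sketch drops the dead branch directly instead of first replacing the select input with a constant.

```python
def simplify_mux_tree(node, known=None):
    """Drop mux branches that can never be selected.

    A node is either a leaf (any non-tuple value) or a tuple
    ("mux", select_name, value_if_true, value_if_false).
    `known` maps select names to the value implied by enclosing muxes.
    """
    known = known or {}
    if not (isinstance(node, tuple) and node and node[0] == "mux"):
        return node  # leaf: nothing to simplify
    _, sel, if_true, if_false = node
    if sel in known:
        # The select value is fixed by an outer mux: keep only the live branch.
        live = if_true if known[sel] else if_false
        return simplify_mux_tree(live, known)
    return (
        "mux",
        sel,
        simplify_mux_tree(if_true, {**known, sel: True}),
        simplify_mux_tree(if_false, {**known, sel: False}),
    )


tree = ("mux", "a", ("mux", "a", 1, 2), 3)  # a ? (a ? 1 : 2) : 3
print(simplify_mux_tree(tree))              # ('mux', 'a', 1, 3)
```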

opt_reduce:

- Consolidates identical input bits for $reduce_and and $reduce_or cells.

- Sorts input bits for easier identification of shareable cells.

- Consolidates identical data inputs to multiplexer cells; the shared select bit is then driven by a $reduce_or cell that combines the original select bits.
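The first two points can be illustrated with a very small sketch. It relies only on the fact that AND and OR reductions do not change when an input is repeated or reordered; the list-of-names representation is an assumption made for this example.

```python
def consolidate_reduce_inputs(bits):
    """Remove duplicate input bits and sort them.

    Valid for AND/OR reductions because repeating or reordering
    an input does not change the result.
    """
    return sorted(set(bits))


cell_1 = consolidate_reduce_inputs(["b", "a", "b"])
cell_2 = consolidate_reduce_inputs(["a", "b"])
print(cell_1 == cell_2)  # both cells now have identical inputs and can be shared
```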

opt_rmdff:

- Replaces single-bit D-type flip-flops that have a constant data input with a constant driver. (In recent Yosys releases this functionality is part of the opt_dff pass.)

opt_clean:

- Identifies and removes unused signals and cells.

- Adds an "unused_bits" attribute to wires that have unused bits; the attribute can be used for debugging or by other optimization passes.

opt_merge:

- Performs simple resource sharing: cells of the same type with identical inputs are replaced by a single instance.

- Option "-nomux" disables resource sharing for multiplexer cells ($mux and $pmux). This preserves multiplexer trees that opt_muxtree may still be able to optimize.
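A minimal sketch of this kind of resource sharing is shown below. The dictionary-based netlist is invented for the example and is not a Yosys data structure; the sketch also makes a single pass and only reports which output wires can be replaced, whereas the real pass rewires the design.

```python
def merge_identical_cells(cells, nomux=False):
    """Merge cells that have the same type and the same inputs.

    cells: dict mapping cell name -> (cell_type, inputs_tuple, output_wire)
    Returns (kept_cells, replaced) where `replaced` maps the output wire
    of each removed cell to the output wire of the cell that survives.
    """
    seen = {}      # (type, inputs) -> output wire of the surviving cell
    kept = {}
    replaced = {}
    for name, (ctype, inputs, out) in cells.items():
        if nomux and ctype in ("$mux", "$pmux"):
            kept[name] = (ctype, inputs, out)  # -nomux: leave muxes alone
            continue
        key = (ctype, inputs)
        if key in seen:
            replaced[out] = seen[key]          # duplicate: reuse earlier cell
        else:
            seen[key] = out
            kept[name] = (ctype, inputs, out)
    return kept, replaced


netlist = {
    "add1": ("$add", ("a", "b"), "x"),
    "add2": ("$add", ("a", "b"), "y"),
    "mux1": ("$mux", ("a", "b", "s"), "p"),
    "mux2": ("$mux", ("a", "b", "s"), "q"),
}
print(merge_identical_cells(netlist))
print(merge_identical_cells(netlist, nomux=True))
```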

FSM Optimization : Introduction



Extracting the FSM Logic:

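Extracting the FSM logic means detecting the part of the design that behaves as a state machine, pulling its state register and transition logic out of the surrounding netlist, and converting it into a dedicated representation: a state transition table. Think of it as collecting the rules the circuit follows to decide its next state and writing them down in one organized table.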

Optimizing the State Table:

Optimizing the state table means making that decision logic more efficient: simplifying the transition rules, removing unused inputs and outputs, and choosing a state encoding so that the final circuit uses as few resources as possible. It is like tuning a car to cover the same route with less fuel.

FSM Handling Macros in Yosys:

The fsm pass performs finite-state-machine (FSM) extraction and re-coding. The fsm pass simply executes the following other passes:

Identify and extract FSMs: fsm_detect, fsm_extract

Basic optimizations: fsm_opt, opt_clean, fsm_opt

Expanding to nearby gate logic (only when called with -expand): fsm_expand, opt_clean, fsm_opt

Re-code FSM states (unless called with -norecode): fsm_recode

Print information about FSMs: fsm_info

Export FSMs in KISS2 file format (only when called with -export): fsm_export

Map FSMs to RTL cells (unless called with -nomap): fsm_map
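The list above can be written down as a small function that returns the sub-passes in order for a given set of options. This is only a restatement of the list in code, not how Yosys implements the fsm pass; the function and parameter names are made up for this example.

```python
def fsm_subpasses(expand=False, norecode=False, export=False, nomap=False):
    """Return the passes that the fsm pass runs, in order, for the given options."""
    seq = ["fsm_detect", "fsm_extract"]          # identify and extract
    seq += ["fsm_opt", "opt_clean", "fsm_opt"]   # basic optimizations
    if expand:
        seq += ["fsm_expand", "opt_clean", "fsm_opt"]
    if not norecode:
        seq.append("fsm_recode")
    seq.append("fsm_info")
    if export:
        seq.append("fsm_export")
    if not nomap:
        seq.append("fsm_map")
    return seq


print(fsm_subpasses())
print(fsm_subpasses(expand=True, export=True))
```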


FSM Detection Methodology In Yosys :

Identifying FSM State Registers: 

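- The fsm_detect pass looks for registers in the design that hold the state of a finite state machine.

- It marks each such register by setting the attribute \fsm_encoding = "auto" on it.

- A register is marked only if it meets all of the following criteria: it does not already have the \fsm_encoding attribute; it is not an output of the module; it is driven by a single $dff or $adff cell; the D input of that cell is driven by a multiplexer tree whose leaves are only constants or the old state value; and the state value is used only in that multiplexer tree or by simple comparison cells (usually $eq) that compare it to a constant.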
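The detection criteria can be pictured as a simple predicate. The sketch below follows the conditions given in the Yosys documentation for fsm_detect, but the Register class and its fields are hypothetical summaries invented for this example; the real pass derives this information by walking the netlist.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    """Hypothetical summary of one register; not a Yosys data structure."""
    name: str
    attributes: dict = field(default_factory=dict)
    is_module_output: bool = False
    driver_cells: tuple = ("$dff",)           # cell types driving the register
    mux_leaves_ok: bool = True                # D-input mux tree has only constants or old state on its leaves
    only_used_in_mux_or_compare: bool = True  # state feeds only that tree or compare-to-constant cells


def looks_like_fsm_state(reg):
    """Check every detection criterion; all must hold."""
    return (
        "fsm_encoding" not in reg.attributes
        and not reg.is_module_output
        and len(reg.driver_cells) == 1
        and reg.driver_cells[0] in ("$dff", "$adff")
        and reg.mux_leaves_ok
        and reg.only_used_in_mux_or_compare
    )


def detect(registers):
    """Mark every register that passes the check with fsm_encoding = "auto"."""
    for reg in registers:
        if looks_like_fsm_state(reg):
            reg.attributes["fsm_encoding"] = "auto"
    return registers


regs = detect([
    Register("state"),
    Register("count_out", is_module_output=True),
    Register("mode", attributes={"fsm_encoding": "none"}),
])
for r in regs:
    print(r.name, r.attributes)
```

In this sketch only "state" is marked; "count_out" is a module output, and "mode" already carries the attribute, so its manual setting is left untouched.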

Why This Matters: 

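Once a register is known to hold FSM state, Yosys can extract the state machine, optimize it, and re-encode its states, which can reduce area and improve timing. The heuristic can be overridden by hand: set \fsm_encoding = "auto" on a register that should be treated as FSM state, or "none" on one that should not.

But it is essential to be careful: marking a register that is not suitable for FSM re-coding can cause synthesis to fail or produce invalid results.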

Steps in the Process: 

First, the fsm_detect pass finds the state registers and marks them with the fsm_encoding attribute.

Then, the fsm_extract pass replaces each marked register and its transition logic with a single $fsm cell that holds the state machine as a transition table.

After that, further passes (fsm_opt, fsm_expand, fsm_recode and others) operate on these $fsm cells.

Finally, the fsm_map pass turns the $fsm cells back into ordinary RTL cells.

N.B.: All of these passes operate on the netlist, not on the HDL source. They must run in the order shown, and only after the design has been converted to RTL cells (in a typical script, after proc and opt).


FSM Extraction and Optimization : 

What fsm_extract Does:

- It operates on every state signal whose \fsm_encoding attribute is set to a value other than "none".

- For each of these state signals, it determines:

   - The state registers.

   - The asynchronous reset state, if the state registers use an asynchronous reset.

   - All states, and the control input signals used in the state transition functions.

   - The control output signals computed from the state and the control inputs.

   - A table of all state transitions with the corresponding control inputs and outputs.

How It Works:

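To find the state registers (and the asynchronous reset state, if any), fsm_extract identifies the driver of the state signal. From there it recursively walks the multiplexer tree that drives the register inputs: the select inputs of that tree are the control inputs, and its constant leaves are the states. The control outputs start with the bits of the state signal and are extended with the outputs of all cells that compare the state signal to a constant. With this information it can build a table of all possible state transitions.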


Creating the Table: 

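The transition table is built with "ConstEval", a helper class in Yosys that evaluates parts of the design for given signal values. For each state, it sets the state signal to that state and tries to evaluate the next state and the control outputs. If the evaluation cannot complete because it depends on a control input whose value is not known, it retries twice, once with that input set to 0 and once with it set to 1, and repeats this until the evaluation succeeds. Each successful evaluation yields one row of the table. Finally, a $fsm cell holding the transition table is created and added to the module. The new cell is connected to the control signals, and the old drivers of the control outputs are disconnected.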
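The retry scheme can be sketched in a few lines. This is a toy model of the idea only: the NeedInput exception, the read helper, and the two-state example machine are all invented for this illustration, and the real ConstEval class works on netlist signals rather than on Python functions.

```python
class NeedInput(Exception):
    """Raised when evaluation needs a control input that has no value yet."""

    def __init__(self, name):
        super().__init__(name)
        self.name = name


def read(inputs, name):
    """Return the value of a control input, or signal that it is still unknown."""
    if name not in inputs:
        raise NeedInput(name)
    return inputs[name]


def build_transition_table(states, step):
    """step(state, inputs) -> (next_state, outputs); may raise NeedInput."""
    table = []

    def explore(state, inputs):
        try:
            next_state, outputs = step(state, inputs)
        except NeedInput as missing:
            # Evaluation got stuck: retry with the offending input at 0, then 1.
            for value in (0, 1):
                explore(state, {**inputs, missing.name: value})
            return
        table.append((state, dict(inputs), next_state, outputs))

    for state in states:
        explore(state, {})
    return table


# Hypothetical FSM: IDLE waits for `start`; RUN always returns to IDLE.
def step(state, inputs):
    if state == "IDLE":
        return ("RUN", {"busy": 0}) if read(inputs, "start") else ("IDLE", {"busy": 0})
    return ("IDLE", {"busy": 1})


for row in build_transition_table(["IDLE", "RUN"], step):
    print(row)
```

The RUN row is emitted with an empty input assignment because its evaluation never needed "start", which corresponds to a don't-care on that input in the table.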

Optimizing with fsm_opt:

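Later, the "fsm_opt" pass performs basic optimizations on the $fsm cell. It removes unused control outputs and unused or constant control inputs, merges control inputs that share the same driver, and merges transition-table rows that give the same result and differ only in a single control input bit, turning that bit into a don't-care.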
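The row-merging step can be sketched as follows. This is a simplified illustration of that one optimization, assuming a table format invented for this example: each row is (state, input pattern, next state, outputs), with the input pattern written as a string over '0', '1' and '-' for don't-care.

```python
def merge_rows(rows):
    """Merge rows that agree on everything except one input bit (0 vs 1).

    rows: list of (state, input_pattern, next_state, outputs) tuples,
    where input_pattern is a string over '0', '1', '-'.
    """
    rows = list(dict.fromkeys(rows))  # drop exact duplicates, keep order
    changed = True
    while changed:
        changed = False
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                a, b = rows[i], rows[j]
                if (a[0], a[2], a[3]) != (b[0], b[2], b[3]):
                    continue  # different state, next state, or outputs
                diff = [k for k, (x, y) in enumerate(zip(a[1], b[1])) if x != y]
                if len(diff) == 1 and {a[1][diff[0]], b[1][diff[0]]} == {"0", "1"}:
                    k = diff[0]
                    merged = (a[0], a[1][:k] + "-" + a[1][k + 1:], a[2], a[3])
                    rest = [r for n, r in enumerate(rows) if n not in (i, j)]
                    rows = list(dict.fromkeys(rest + [merged]))
                    changed = True
                    break
            if changed:
                break
    return rows


table = [
    ("S0", "00", "S1", "1"),
    ("S0", "01", "S1", "1"),
    ("S0", "1-", "S0", "0"),
]
print(merge_rows(table))
```

Here the first two rows collapse into a single row with input pattern "0-", since the second input bit does not affect the result in state S0.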

Technology Mapping : Cell Substitution 

Phase 1 - RTL to Internal Cells: In the first phase, RTL cells are mapped to an internal library of single-bit cells. This turns coarse-grain RTL cells, such as arbitrary-width adders, into a fine-grain netlist of simple gates.

Phase 2 - Internal Cells to Target Technology Gates: In the second phase, the netlist composed of internal gate types is transformed into a netlist of gates from the target technology library. This phase adapts the design to the specific hardware technology being used.

Mapping Coarse-Grain Cells: When the target architecture includes coarse-grain cells like block RAM or ALUs, these must be mapped directly from the RTL netlist. This is necessary because information about the coarse-grain structure is lost once the design has been mapped to single-bit gate types.

Cell Substitution (Techmap Pass): The simplest form of technology mapping involves cell substitution, which is performed by the techmap pass. Given a Verilog map file that implements RTL cell types using simpler cells, it replaces each RTL cell with that implementation.

Mapping without a Map File: If no specific map file is provided, techmap uses a built-in map file that translates Yosys RTL cell types into the internal gate library used by Yosys. This map file can be found in the Yosys source tree (techlibs/common/techmap.v).

Conditional Mapping: Additional features in techmap allow for conditional mapping of cells. This is useful when the target architecture supports certain hardware features for specific bit-widths but not for others. Conditional mapping allows for flexibility in choosing the appropriate cell types based on the design requirements.

Typical Synthesis Flow: In a typical synthesis flow, the techmap pass is used first to map some RTL cells directly to coarse-grain cells provided by the target architecture (if available). Then, techmap with the built-in default map file is used to map the remaining RTL cells to gate logic from the Yosys internal gate library. Mapping those internal gates to the target technology library is a separate, later step.
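Cell substitution with a conditional rule can be sketched as below. This is a toy model, not the techmap implementation: the dictionary-based cells and the TARGET_ADD cell name are hypothetical, while $_AND_, $_OR_ and $_XOR_ are the names of single-bit gates in the Yosys internal gate library.

```python
GATES = {"$and": "$_AND_", "$or": "$_OR_", "$xor": "$_XOR_"}


def techmap(cell, coarse_widths=frozenset()):
    """Return the list of cells that replaces `cell`.

    cell: dict with 'type' and 'width'.
    coarse_widths: bit widths for which a (hypothetical) target adder exists.
    """
    if cell["type"] == "$add" and cell["width"] in coarse_widths:
        # Conditional rule: only applies for widths the target supports.
        return [{"type": f"TARGET_ADD{cell['width']}", "width": cell["width"]}]
    if cell["type"] in GATES:
        # Default rule: one single-bit gate per bit of the RTL cell.
        return [{"type": GATES[cell["type"]], "bit": i} for i in range(cell["width"])]
    return [cell]  # no rule matched: leave the cell untouched


print(techmap({"type": "$and", "width": 4}))
print(techmap({"type": "$add", "width": 8}, coarse_widths={8}))
print(techmap({"type": "$add", "width": 5}, coarse_widths={8}))
```

The 5-bit adder is left as it is because no rule in this sketch covers it; in a real flow a later techmap run with the default map file would break it down into gates.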

Technology Mapping : SubCkt Substitution

In VLSI synthesis, the target architecture sometimes offers cells that are more powerful than the RTL cells used by Yosys. For instance, a target architecture may have a cell that calculates the absolute difference of two numbers. No single RTL cell type matches such a cell; it corresponds to a combination of several RTL cells.

To address this, Yosys provides the "extract" pass, which serves several purposes:

Identifying Isomorphic Subcircuits: The extract pass can match a given set of modules against a design and identify portions of the design that are structurally identical (isomorphic subcircuits) to any of the provided modules. These matched subcircuits can then be replaced by instances of the specified modules.

Handling Basic Variations: The extract pass can also recognize basic variations of the given modules, such as cases with swapped inputs on commutative cell types.

Frequent Subcircuit Mining: The extract pass has limited support for frequent subcircuit mining, which involves finding recurring subcircuits within the design. This capability has applications in designing new coarse-grain architectures.

The algorithmic work performed by the extract pass, including solving the isomorphic subcircuit problem and conducting frequent subcircuit mining, is accomplished using the SubCircuit library. This library can be used independently of Yosys and provides the necessary functionality for these tasks.
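To give a feel for matching with swapped inputs on commutative cells, here is a much simpler stand-in: matching a pattern against an expression tree. This is not the algorithm used by the SubCircuit library, which solves a graph isomorphism problem on netlists; the tuple representation and the multiply-add pattern are assumptions made for this example.

```python
COMMUTATIVE = {"add", "and", "or", "xor", "mul"}


def match(pattern, expr, binding=None):
    """Return a variable binding if `expr` has the shape of `pattern`, else None.

    A pattern or expression is either a string (pattern variable or
    signal name) or a tuple (op, left, right).
    """
    binding = dict(binding or {})
    if isinstance(pattern, str):
        if pattern in binding and binding[pattern] != expr:
            return None  # variable already bound to something else
        binding[pattern] = expr
        return binding
    if not isinstance(expr, tuple) or expr[0] != pattern[0]:
        return None
    op, p_left, p_right = pattern
    orders = [(expr[1], expr[2])]
    if op in COMMUTATIVE:
        orders.append((expr[2], expr[1]))  # also try swapped operands
    for left, right in orders:
        first = match(p_left, left, binding)
        if first is not None:
            second = match(p_right, right, first)
            if second is not None:
                return second
    return None


mac = ("add", ("mul", "A", "B"), "C")       # pattern: A*B + C
design = ("add", "z", ("mul", "x", "y"))    # design:  z + x*y
print(match(mac, design))                   # {'A': 'x', 'B': 'y', 'C': 'z'}
```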


Technology Mapping : Gate Level

Liberty File: Target architectures at the gate-level are described using a "Liberty file," an industry-standard format that details the behavior and properties of standard library cells.

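Mapping Phases: Mapping a design that has been reduced to the Yosys internal gate library (for example by techmap) onto the cells of the target library is a two-phase process:

Register Cell Mapping: First, the register cells in the design are mapped to the registers available in the target architecture. The target architecture may not provide every variation of D-type flip-flop (positive or negative clock edge, active-high or active-low asynchronous set and/or reset, and so on), so this step may add extra inverters to the design. Because those inverters are combinational logic that still has to be mapped, register cells must be mapped first. The "dfflibmap" pass, called with the -liberty option and a Liberty file, performs this mapping.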

Combinational Logic Mapping: The second phase involves mapping the combinational logic to the target architecture. This is achieved using the external program ABC via the "abc" pass, with the -liberty option specifying the Liberty file. Only the combinational cells from the library are used in this case.
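Why register mapping can introduce inverters is easy to see in a toy model. The sketch below only considers the clock edge and is not how dfflibmap is implemented; the function name and the 'pos'/'neg' encoding are made up for this example.

```python
def map_flop(clock_edge, library_edges):
    """Pick a library flip-flop for the requested clock edge.

    clock_edge: 'pos' or 'neg'
    library_edges: set of clock edges for which the library has a flip-flop
    Returns (edge_of_library_cell_used, list_of_extra_cells).
    """
    if clock_edge in library_edges:
        return clock_edge, []  # exact match: no extra logic
    other = "neg" if clock_edge == "pos" else "pos"
    if other in library_edges:
        # Library lacks this variant: use the other edge and invert the clock.
        return other, ["inverter on clock"]
    raise ValueError("library has no usable flip-flop")


print(map_flop("pos", {"pos"}))
print(map_flop("neg", {"pos"}))
```

The inverter added in the second case is ordinary combinational logic, which is why the combinational mapping step runs after the registers have been mapped.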

Sensitive Information: Some Liberty files may contain sensitive trade secrets, such as timing data, making it challenging to share or report issues with the tools. To address this, the "yosys-filterlib" tool can be employed to remove sensitive information from the Liberty file, ensuring that non-sensitive data can still be used for synthesis without revealing proprietary details.


Synthesis Summary:

Don't-Care Conditions: Assignments to 'x' in a case or an if statement are treated as don't-care conditions during synthesis.

Casex and Casez Statements: Synthesis tools handle 'casex' and 'casez' statements like 'case' statements, except that 'z' and '?' bits in the case items (and also 'x' bits for 'casex') are treated as don't-cares when matching.

Conditional Operator and Three-State Device: When a conditional operator in a continuous assignment, or in a level-sensitive block, assigns 'z' in one of its branches, the statement is synthesized into a three-state device driven by combinational logic.

Operator Grouping with Parentheses: Parentheses control how operators are grouped, which shapes the structure of the synthesized logic and can reduce circuit size.

Feedback-Free Netlist: Combinational primitives in a feedback-free netlist will be synthesized into latch-free combinational logic.

Continuous Assignments: Feedback-free continuous assignments are synthesized into latch-free combinational logic, but those with feedback are synthesized into latches.

Completeness in Combinational Logic: A Verilog description of combinational logic must ensure that outputs are assigned values for all possible input combinations.

If Statements and Latches: If an if statement in a level-sensitive block assigns a value to a register variable in some branches but not all, it will be synthesized into a latch.
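The completeness rule can be checked mechanically for a small case statement. The sketch below assumes case items written as patterns over '0', '1' and '?' (don't-care, as in casez) and no default branch; the function name is invented for this example.

```python
from itertools import product


def uncovered_inputs(case_items, n_bits):
    """Return the input combinations that no case item matches.

    case_items: patterns over '0', '1', '?' where '?' is a don't-care.
    """
    def matches(pattern, value):
        return all(p in ("?", v) for p, v in zip(pattern, value))

    all_values = ("".join(bits) for bits in product("01", repeat=n_bits))
    return [v for v in all_values if not any(matches(p, v) for p in case_items)]


print(uncovered_inputs(["1?", "01"], 2))  # ['00']
print(uncovered_inputs(["1?", "0?"], 2))  # []
```

For every combination in the returned list, the output is not assigned, so the synthesized circuit has to hold its previous value there, which is exactly the situation that produces a latch.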

Edge-Sensitive Blocks: Variables referenced within an edge-sensitive block before being assigned a value will be synthesized as flip-flop outputs (e.g., nonblocking assignment).

Elimination of Unused Variables: Variables assigned within a block but not referenced outside it will be eliminated during synthesis (e.g., iteration constants).

Edge-Sensitive Block Outputs: Variables assigned within an edge-sensitive block and referenced outside will be synthesized as flip-flop outputs (e.g., declared as output ports).

State Machine Design: To describe an explicit state machine, use two cyclic blocks: a level-sensitive block that describes the combinational logic for the next state and the outputs, and an edge-sensitive block that synchronizes the state transitions.

Blocking and Nonblocking Assignments: In level-sensitive cyclic blocks describing combinational logic, use the blocking procedural assignment operator ('='). In edge-sensitive cyclic blocks describing state transitions and register transfers, use the nonblocking assignment operator ('<=').


Courtesy: Image by www.pngegg.com