
12/04/2024

What is Verification IP (VIP) in VLSI?



In this article, we delve into key aspects of verification, beginning with an overview of general verification strategies that are essential for ensuring reliable design and functionality. We explore the need for robust verification processes, emphasizing their role in enhancing design accuracy and reliability, especially in complex systems. A detailed verification flowchart is presented to guide readers through the structured sequence typically followed in verification. We also explain the concept of Verification IP (VIP), outlining the general verification blocks included within a VIP and comparing these with those in a regular testbench. Additionally, we discuss the unique advantages of using VIP, particularly its ability to streamline verification and enhance testing efficiency, and conclude with an open example that demonstrates VIP in action.


Once you complete the article you will understand:

1. What is Verification IP in VLSI?

2. Why is a robust verification plan necessary in VLSI, in the context of VIP?

3. How is a general verification testbench weaker than a Verification IP for the same protocol?

4. Understanding Verification IP using an open example


General Strategies of Verification:

1. Understand the architecture and micro-architecture; partition the logic to create efficient RTL descriptions using moderate gate-count blocks.

2. Apply a bottom-up approach, and deploy synchronizers at the top level of the design if needed.

3. Use synthesizable constructs for RTL design and non-synthesizable constructs for RTL verification.

4. Use blocking assignments for modeling combinational logic and non-blocking assignments for sequential designs; avoid mixing blocking and non-blocking assignments in the same block (a short sketch illustrating this appears right after this list).

5. Apply optimization constraints at the RTL level to improve performance; refer to subsequent chapters for a better understanding of design and optimization.

6. Develop a robust verification architecture and verification plan for the design.

7. Understand the coverage requirements and implement verification strategies to meet the coverage goals.
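As a quick illustration of the blocking vs. non-blocking guideline above, here is a minimal SystemVerilog sketch; the module and signal names are ours, purely for illustration:

// Combinational logic: blocking assignments inside always_comb
module mux2 (
  input  logic a, b, sel,
  output logic y
);
  always_comb begin
    if (sel) y = b;   // blocking assignment models pure combinational behavior
    else     y = a;
  end
endmodule

// Sequential logic: non-blocking assignments inside a clocked block
module dff_sync_rst (
  input  logic clk, rst_n, d,
  output logic q
);
  always_ff @(posedge clk) begin
    if (!rst_n) q <= 1'b0;   // non-blocking: all flops update together at the clock edge
    else        q <= d;
  end
endmodule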




The above is a picture of a desktop computer motherboard. Many types of connections are present: connectors for the hard drives (SATA ports), slots for RAM, the socket for the processor (with a fan on top of it), PS/2 ports, USB ports, an RJ-45 socket, multiple D-shaped ports, and audio ports.

The motherboard has to interact with the keyboard and mouse, with plug-and-play devices like printers and scanners, with the monitor, with audio devices (microphone or speaker), and with the one or more hard drives that are connected. In real time, when you power up a desktop PC, the CPU sees a rush of information from all around it. On pen and paper you can list these interfaces neatly, but in real time data is jumping from one place to another, and all these in and out operations are performed simultaneously. On top of that, the OS is booted from the hard disk. You can imagine an enormous data flux arriving from all directions, all at once; it does not arrive sequentially. Once the boot sequence has completed and the OS interface is ready, whether it is Linux, Windows, or Mac, multiple operations are already running at the same time. That is the real-time scenario: the CPU has to take the load from all of these applications simultaneously.

This is a practical example you encounter every day, which is why it is a good way to understand the need for robust verification. To cover all the complex things that can happen to the CPU, we have to write an exhaustive verification deck for the device under test; here the device is the CPU, while in your case or mine it could be a design under test, i.e., a block. From your earliest days you know the truth-table view: you might say the device will operate in a standard, well-defined way. In real life, however, when a chip is plugged into a board with so many components forwarding data and addresses in multiple directions and in many different sequences, the design can crash. The target of verification is to find out at what time and under what conditions this DUT, whether a block or the full SoC, can crash; we have to try to crash it. That is the purpose of verification, and that is why we need a robust verification plan. With that infographic covered, let us move on to the general verification flowchart.

General Verification Flow Chart



The general verification flowchart contains several blocks; some of its branches run in parallel and then merge again later, so keep that structure in mind as we go through it.
First, we have the functional specification, also known simply as the "spec". Next comes the test plan: as discussed above, your design under test can be bombarded with address and data traffic from multiple directions simultaneously, so we have to have a good test plan. Next, we have the assertions.

Now what are assertions?
An assertion is an affirmative statement that the design will obey a particular rule: for example, that a port has a particular sense of operation, or that a bus must behave in a specific way. Once you go into the coding details you will understand assertions fully, but overall, assertions are a way of hard-coding how the piece of code that corresponds to a given design behavior must always behave; if there is a violation, the verification environment should flag an error or a warning, whichever is applicable. A minimal sketch is shown below.
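As a minimal sketch (the signals req, gnt, clk, and rst_n and the 1-to-3-cycle window are our own assumptions, not taken from any particular protocol), an assertion written in SystemVerilog Assertions (SVA), placed inside the module or interface that sees these signals, can look like this:

// Whenever req is asserted, gnt must follow within 1 to 3 clock cycles
property p_req_gets_gnt;
  @(posedge clk) disable iff (!rst_n)
    req |-> ##[1:3] gnt;
endproperty

a_req_gets_gnt: assert property (p_req_gets_gnt)
  else $error("Protocol violation: gnt did not follow req within 3 cycles");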

Next, we proceed to hardware description coding, i.e., the RTL, along with the UPF for power intent. We already have a Verilog series, a UPF series, and, for RTL, a synthesis series on this channel; each of them now also has a marathon video so that you can get the entire series in a single video. In case you need to learn any of these, all the resources are already there on the channel. From here the flow moves forward to linting. We have already published a linter written in Tcl on this channel; you will find it in the Tcl playlist, or simply by searching for it. Linting is nothing but a syntax-checking routine that allows an engineer to verify the code even before running it, so that any syntactic or semantic error is flagged by the linter tool. Linters exist for Tcl, Verilog, SystemVerilog, Perl, Python, and practically any language, whether it is a programming language, a scripting language, a hardware description language (HDL), or a hardware description and verification language (HDVL). Linting is a general concept, and you can find linters as free resources to use to your advantage.

Next is simulation with assertions and checking. At this stage the assertions and the code are in place and the code has been linted, which means syntactic and semantic errors have largely been eliminated; we then simulate the design along with its assertions and checkers. Now comes the bridging part: we take the testbench, written in an HDVL such as SystemVerilog, and plug it in here.
With the hardware description linted and checked with assertions, the code is ready. Now we plug the SystemVerilog testbench into our design, the DUT. Next, we look at functional coverage and code coverage.

These are two detailed subjects in their own right, and we will not go into them in depth here, but they are what ensure that the exhaustiveness of the robust verification we discussed with the earlier infographic is actually achieved. Finally, if needed, we can go back from this step to the testbench, make modifications or insert additional assertions as required, and then return to this stage. This loop may iterate until you are satisfied that an exhaustive, robust verification has been implemented.

All of this was planned at the test-plan stage. However, when we actually write the code, we implement it one way, try to check it, and discover flaws at the code level that deviate from what we thought out during planning. So we add whatever is missing to make the code complete and to tally with our test plan. Once this loop is complete, we have the functional coverage and code coverage results, and we end up with the verified RTL code and the verification results.

At this point the verification ends and the design is verified. In this section we have talked about a general verification flowchart. In actual practice, when you work in a company or with a particular tool, you will see small variations around this flowchart, simply because the same flow can be drawn by different people in different ways and from different perspectives; those details are left to the respective tool vendor or to the verification engineer doing the actual verification.

So this is a general concept: keep it in mind and stay flexible, because as methodologies evolve, additional steps may appear and the flowchart may change. It is a very simple structure for understanding the verification flow.

What is Verification IP (VIP):

Verification IP (VIP) in VLSI is an essential tool in the chip design and development process. It is a standardized, reusable, modular component used specifically in the verification phase of chip design to test and validate the behavior and functionality of the design under test (DUT) against specific protocols or behaviors. A VIP simplifies the process by providing a pre-built, standardized way to check whether the DUT complies with a given protocol or function, and it reduces the time and effort required to ensure that a chip design works correctly before moving to manufacturing.


General Verification Blocks in VIP:


Test Generator: Creates stimuli to drive the DUT.

Monitor: Passively observes and captures DUT signals for analysis.

Checker: Compares DUT output with expected values for correctness.

Scoreboard: Tracks and compares transaction-level data over time for consistency.
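To make the blocks listed above concrete, here is a minimal UVM-style skeleton; the class and field names (bus_txn, bus_monitor, bus_scoreboard) are hypothetical, not taken from any specific VIP, and only the structure matters:

import uvm_pkg::*;
`include "uvm_macros.svh"

// Transaction produced by the test generator (a sequence randomizes these fields)
class bus_txn extends uvm_sequence_item;
  rand bit [31:0] addr;
  rand bit [31:0] data;
  `uvm_object_utils(bus_txn)
  function new(string name = "bus_txn"); super.new(name); endfunction
endclass

// Monitor: passively samples DUT pins and publishes observed transactions
class bus_monitor extends uvm_monitor;
  `uvm_component_utils(bus_monitor)
  uvm_analysis_port #(bus_txn) ap;
  function new(string name, uvm_component parent);
    super.new(name, parent);
    ap = new("ap", this);
  endfunction
  // the run_phase would sample a virtual interface and call ap.write(observed_txn)
endclass

// Scoreboard: receives observed transactions and checks them (checker + scoreboard role)
class bus_scoreboard extends uvm_scoreboard;
  `uvm_component_utils(bus_scoreboard)
  uvm_analysis_imp #(bus_txn, bus_scoreboard) imp;
  function new(string name, uvm_component parent);
    super.new(name, parent);
    imp = new("imp", this);
  endfunction
  function void write(bus_txn t);
    // compare t against expected/reference data here and flag mismatches
  endfunction
endclass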



Verification Blocks Comparison : Regular TestBench Vs VIP




Advantages of Using VIP:

Each VIP is configured to simulate the behavior of a given protocol and to verify that the DUT adheres to its specification. For example, a PCIe Verification IP will emulate data transfers and error scenarios and ensure compliance with the PCIe protocol standard.

There are many protocols for which VIPs are created/available, such as:

1. PCIe (Peripheral Component Interconnect Express)

2. AXI (Advanced eXtensible Interface)

3. I2C (Inter-Integrated Circuit)

4. Ethernet

5. USB (Universal Serial Bus) ... and many more.

The key advantages of using a VIP include:

1. Time-saving: Reusable across multiple projects.

2. Comprehensive Testing: Provides a wide range of test scenarios, including edge cases.

3. Standardization: Ensures the DUT adheres to industry-standard protocols.

4. Automation: Automatically generates stimuli and checks results, reducing human error.

Open Example of VIP from GitHub :

So far we have discussed VIP theoretically and through infographics, and the VIP concept should now be clear. Next, we will unbox one particular open example from GitHub.



You can reach this repository directly at its URL; this VIP is for the AXI protocol, and the author is Kumar Rishav. You can see that it carries an MIT license, which you can go through. If you scroll further down the page, you will see the block diagram of the VIP. There you can see the testbench top, and inside it the test module containing the sequence. You will find similarity with the block diagram shown earlier in this article.



However, you can see a difference in arrangement: here it is a master-slave architecture. We have a write sequencer, a read sequencer, a monitor, and a driver that drives the sequenced data onto the interface, and the interface talks with the DUT. The DUT itself is not shown, because a VIP does not contain the DUT; it is the verification capsule for your DUT.

You can also see a driver and a monitor for the slave side. Because this is a master-slave architecture, there are two different monitors and two different drivers, and finally a scoreboard and a coverage routine in this block diagram. The block diagram should look very familiar after what we have already discussed. Scrolling down further, you can see the list of components.


There are the sequence item, the sequencers, the driver, the monitor, the scoreboard, and the environment. We also have the test and the testbench top, along with the environment config and the test config. All of these are there; you can read through them yourself and understand them.

Now you have a real VIP in your hands, and you can investigate it in detail. How do we investigate? Scroll up again and you will see a lot of code uploaded to GitHub. Let us open the monitor code. Once you click it, it opens a pane, similar to an IDE: the files are on the left-hand side and each file opens in the viewing window on the right. Here you can see that the monitor has several ports. It is written in SystemVerilog using UVM.

You can see the UVM components here, and you can investigate this code by yourself; for that you need at least a working knowledge of SystemVerilog and UVM.


Next, we have the master part: the SystemVerilog code for the master. Somewhere there should also be a slave, and here it is, the code for the slave.
You can see it is very much protocol-centric, which is why it is organized this way, as an AXI slave. Here is the code of the slave; now let me show you the driver.




Here you can see that the driver deals with parameters such as the data width, and it handles sequence items like this. It has tasks and functions to perform the driving action, and if you go through the code, the comments will help you understand it.



Now let me take you to the interface, which was at the bottom of this VIP's block diagram. An interface is nothing but the port connection: in SystemVerilog we have a construct called interface in which we define the different ports. The logic data type used here is available only in SystemVerilog, not in plain Verilog, so you need some knowledge of SystemVerilog to follow it.



Here you can see that all the connections are explicitly declared. That means that if you instantiate this particular interface, axi_intf, you can connect the different ports in a plug-and-play manner; that is the beauty of interface-style code in SystemVerilog as well as in UVM. A simplified sketch of such an interface is shown below.
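As a simplified sketch of what such an interface looks like (the signal list below is a made-up AXI-Lite-style subset, not the actual axi_intf code from the repository):

interface axi_lite_if #(parameter ADDR_W = 32, DATA_W = 32)
                       (input logic aclk, input logic aresetn);
  // Write address channel (simplified)
  logic [ADDR_W-1:0] awaddr;
  logic              awvalid;
  logic              awready;
  // Write data channel (simplified)
  logic [DATA_W-1:0] wdata;
  logic              wvalid;
  logic              wready;

  // Directional views: the master drives requests, the slave responds
  modport master (output awaddr, awvalid, wdata, wvalid, input  awready, wready);
  modport slave  (input  awaddr, awvalid, wdata, wvalid, output awready, wready);
endinterface

// One instantiation in the testbench top, then passed around in a plug-and-play way:
//   axi_lite_if #(.ADDR_W(32), .DATA_W(32)) bus_if (aclk, aresetn);
//   my_dut dut (.s_axi(bus_if.slave));   // hypothetical DUT hookup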


With all the code in place, we come to the testbench top, from where the control is pulled, and it is very simple: we have the interface instantiation and the clock. Scrolling down, we have the configuration, and everything at the top level is kept very simple. You can also see that the AXI package, the header file, and the SV files are included. Then there is the test, which contains all the routines for testing. Finally, here is the code of the sequencer for the write side; you can go through it to see in detail what is inside the sequencer.



Watch the video lecture here: 

Courtesy: Image by www.pngegg.com




7/05/2024

How To Get Started In VLSI as a Beginner?



Getting started in VLSI (Very Large Scale Integration) can be an exciting and challenging journey. Here are some steps you can take to get started:

1. Acquire basic knowledge: Start by learning the basic concepts of digital electronics and computer architecture. It will be helpful to have a strong foundation in electronics, digital systems, and integrated circuits. You can take courses in electrical engineering or computer science or read books on these topics.

Get the VLSI fundamentals : Click Here 

2. Learn an HDL & Linux basics: Learn one of the hardware description languages (HDLs), such as Verilog or VHDL, that are used to describe digital systems. HDLs are used to design and simulate digital circuits and are essential in VLSI design.

You can start with Verilog : Click Here 

Learn Linux basics : Click Here 

3. Learn programming languages: Familiarize yourself with programming languages such as C, TCL, PERL, BASH, and Python. These languages are commonly used in VLSI design and simulation.

Some of the self-learning (free) tutorials for you:

TCL : Click Here

PERL : Click Here

BASH : Click Here

Python : Click Here

4. Practice with design tools: Familiarize yourself with the design tools used in VLSI, such as those from Cadence, Synopsys, or Mentor Graphics. You can use these tools to create and simulate digital circuits. These are commercial tools, available in companies, with learning versions available at registered VLSI training institutes. If you need to learn such a tool, join a certification course.

There are many free or open-source tools available : 

1. Vivado (Installation: Click Here ), 

2. Electric VLSI Design System, 

3. Icarus-Verilog (Installation : Click Here ), 

4. Magic, 

5. NGSPICE (Installation : Click Here ),

6. OpenTimer (Installation : Click Here ).


5. Join a VLSI design course: Consider enrolling in a VLSI design course, either online or at a university. This will give you hands-on experience in designing, simulating, and testing digital circuits.

6. Join a community: Join a VLSI design community or forum, where you can interact with professionals in the field and get tips and advice on designing digital circuits.

Join this community (Telegram Group) : https://t.me/vlsichaps

7. Read research papers: Read research papers on VLSI design to keep up-to-date with the latest developments and techniques.

Watch this for further guidance :  https://youtu.be/SIcpse82gsw

8. Practice, practice, practice: Finally, practice designing digital circuits on your own, starting with simple circuits and working your way up to more complex systems. The more you practice, the better you will become.

Overall, getting started in VLSI design requires a strong foundation in digital electronics and computer architecture, knowledge of HDLs, familiarity with design tools, practical experience through courses and design projects, and a commitment to continuous learning and practice.


Courtesy : Image by www.pexels.com

7/01/2024

What Is Standard Cell Characterization?

 



This article provides a comprehensive guide to Standard Cell Characterization in VLSI design for beginners. The discussion begins with a concise overview of Standard Cell Characterization and highlights how Standard Cells are the building blocks of ASIC (Application-Specific Integrated Circuit) design. Then it moves on to explain the Standard Cell Design Flow and delves deeper into the concept of Handcrafted CMOS Layout. The types of cells present inside the library and variations in cell design, such as VT, Track, and Drive Strength, are also explored. The Front-End View Generation and Back-End View Generation processes are explained, along with the creation of Liberty files. The article also provides an overview of what happens in Characterization and the concept of a cell .lib Library (Liberty). Finally, a summary is provided, outlining the key points covered in the discussion.


Standard Cells : Building Block of ASIC


Standard cells are the building blocks of ASIC designs. We can understand this with the simple analogy, shown in the figure above, of the Lego blocks kids love to play with. A chip is built with different types of IPs, and the majority of them are standard cell IPs.

Standard Cell Libraries are required by all tools used in the ASIC RTL-to-GDS design flow. A library contains primitive cells as well as complex cells. Standard cells are designed by trading off Power, Performance, and Area (PPA). For each cell, a variety of drive strengths is present; inverters and buffers have many more drive-strength variants than other cells. Cells have balanced rise and fall delays, and cells with delay variations are provided to aid in fixing STA violations. Standard cell heights (dimensions) are denoted by the track count; variants may be 7T, 11T, etc. The distance between two consecutive tracks is called the pitch.
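As a hedged illustration of drive-strength and VT variants (the cell names INVD1_LVT and INVD4_HVT are hypothetical library names, just to show the idea), the same logical inverter appears in the library as several cells differing only in drive strength and VT flavor, and a gate-level netlist simply instantiates whichever variant the tools choose:

// Hypothetical gate-level netlist fragment: same Boolean function, different drive/VT cells
module example_netlist (input logic a, c, output logic b, d);
  INVD1_LVT u_small_inv (.I(a), .ZN(b));  // 1x drive, low-VT: fast but leaky
  INVD4_HVT u_big_inv   (.I(c), .ZN(d));  // 4x drive, high-VT: drives larger loads, lower leakage
endmodule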


Standard Cell Design Flow:



This is the design flow of a standard cell. The flow starts with the specification/requirements, which include all the information necessary for the standard cell. After that, either schematic entry or RTL design is done: schematic for analog standard cells and RTL for digital standard cells. Next we run CDL or RTL simulation to get the electrical characteristics. The SPICE-OUT step takes the schematic or CDL and produces the SPICE netlist, after which HSPICE/Spectre/Eldo simulation is run on that netlist.
After the front end, the back-end flow starts and the layout is drawn using Virtuoso. Then we do DRC/LVS, RC extraction, and physical verification; in physical verification we go for antenna checks, EM checks, etc. Once all these checks are done, we can move to characterization and standard cell delivery.



Handcrafted CMOS Layout :



For PMOS there are the N-well and P-select layers, whereas for NMOS there is N-select. Inside the N- and P-select regions, the diffusion layers and the source and drain contacts are created as shown above. All the steps before contact creation are in FEOL. After that, the poly gates are created and connected with each other (see Fig 4b). In the final steps, the Vdd, GND, and Vout connections are created (see Fig 4c). Connections between transistors, source/drain contact creation, and poly gate creation and connection are all included in MEOL. Any step beyond output contact creation is included in BEOL.



Types of Cells Inside The Library:


A standard cell library usually has basic logic gates such as AND/OR/NAND, half adders/full adders, multiplexers, ECO cells (used specifically for ECOs, i.e., Engineering Change Orders), tie cells, AOI (AND-OR-Invert) cells, OAI (OR-AND-Invert) cells, flip-flops, scan flops, latches, filler cells, tap cells, end-cap cells, decap cells, and clock cells.


Cell Design Variation : VT




Threshold voltage is varied in standard cell design. Ultra low Vt, Low Vt, Standard Vt, High Vt and Ultra High Vt are some Vt variations.



Cell Design Variation : Track


Track, or cell height, is another parameter that is varied in standard cells. In descending order of height: 12.5T > 10.5T > 9T > 7.5T > 6T. The higher the track count, the taller the cell and the bigger its area.
 

Cell Design Variation : Drive Strength


Drive strength, or drivability, of a cell is the third parameter that is varied. Drivability means how much output load (how many inputs of other cells) a particular standard cell can drive.

Front-End View Generation : 


Some front-end views available in a standard cell library are RTL views such as Verilog, VHDL, and SystemVerilog, along with DB, SDB, SLDB, UPF, CPF, OA, etc.


Back-End View Generation : 


Layout views, mapping file, NDM, GDSII, LEF, DEF, DB, OASIS, CIF, and abstract views are some back-end views present in a standard cell distribution. Physical-verification-related views and tool-specific views may also be there.

How Liberty File is Created ?




The Liberty file is used in timing analysis. The standard cell library is passed through the .lib characterization process, and finally we get an ASCII file with the .lib extension. This file is in the Liberty format.



What Happens in Characterization?




We have our basic gates and our netlists, which are passed to the characterization engine. There is a TCL config file that contains all the setup information and the list of runs that need to take place. The engine is then connected to LSF/UGE; these are load-sharing-facility runs with tokens assigned to teams. Since these runs require a lot of memory, high-memory machines are used, and LSF knows which one to select. It is a setup available in companies, and the system administrators take care of it; engineers just need to understand how to use it properly. From LSF, the whole process launches multiple SPICE simulations, and the cumulative results come back. After that we proceed to model generation; NLDM, CCS, ECSM, and OCV are some common models that are generated. All these data are put, in ASCII format, into the .lib file.

Cell .lib Library (Liberty):

The timing engine reads a set of cell library files (.lib). A .lib file is a text file containing the timing and power parameters associated with the standard cells for a given technology node. It contains the data for all standard cells available to the design in the specified technology node, so each instance in the Verilog/VHDL/SystemVerilog netlist must have a corresponding cell in the .lib library. The .lib file contains pre-characterized timing models and data to calculate I/O delay paths, timing check values, and interconnect delays; de-rating factors are also included to compensate for PVT and OCV variation.


Summary :

A Standard Cell Library is a collection of basic as well as advanced cells. A standard cell release contains a consolidated timing library (.lib) for all the cells; this is the major product of characterization. One particular cell will have multiple views and variations based on parameters like track, VT, and drive strength. Standard cells are the biggest IP collection by volume among all foundation IPs; hence their characterization is also cumbersome and time-consuming. Without a characterized standard cell library, digital VLSI SoC design is impossible!


Watch the video lecture here:


Courtesy: Image by www.pngegg.com

6/22/2024

Understanding Filler Cells in VLSI: A Comprehensive Guide



In this article, we delve into the world of VLSI and explore the concept of filler cells. We discuss the purpose and importance of these cells in the design process and their impact on the overall performance of the circuit. Whether you're a beginner or an experienced engineer, this comprehensive guide will provide valuable insights into the role of filler cells in VLSI.


Standard Cell Layout:

Standard cells are of equal height (a.k.a. track) but their width may vary. Standard cells are placed in rows with cells butting against each other, which ensures continuous wells across the entire row and helps in fabrication and mask design. Such an arrangement allows common power lines, power (VDD) and ground (GND/VSS), to run through the cell array, ensuring that the VDD/GND rails (follow pins) are fully connected. A desired design is implemented by picking the necessary standard cells from different rows/columns and interconnecting them as per the target functionality.

Standard Cell Layout & Filler Cell:

This tiled structure of standard cells is optimized for area. However, standard cell placement never reaches 100% utilization. Any blank space in the tiled structure is filled with special cells, called filler cells or decap cells, from the standard cell library. This ensures the proper GDS layers to pass DRC and sufficient diffusion and poly densities. A filler cell is an empty cell with power and ground rails; empty means the cell has no functionality, but it does have a physical existence, i.e., physical layers such as the diffusion layer. The total effective area is calculated by subtracting the sum of all filler cell areas from the total area of the standard cell tile structure. Chip finishing for signoff includes, at the very least, insertion of fillers and decaps.


Design Automation for Placing Filler Cell :

Filler cells are inserted after all cells have been placed and the confidence that the design meets timing is high. Filler cells fill in all row spaces that remain open. PnR tools in VLSI generally accept TCL scripting for automation. A typical script traverses each standard cell row (from left to right) and checks the adjacent cell edges. If the edges match, it moves to the next cell. If they do not match, it checks whether the opposite side of the right cell matches the current cell edge; if it does, the script flips the cell and continues. If neither side matches, a filler cell is placed between the cells to ensure that the design rules are satisfied. Since power rails (horizontal power lines) are usually built into the standard cells as feed-throughs, leaving any space in a row would result in a break in the power line.


Physical Design Aspect :

A set of physical-only cells (without any Boolean function), in the form of fillers and decap cells, is provided in the standard cell distribution because they are required during digital implementation. Filler cells are important as they connect the active implants (n+ and p+), as well as the n-wells and power rails, throughout an entire row. Fillers come in various distinct widths, where the width is an integer multiple of the metal-1 routing pitch (track). Fillers consist of dummy polysilicon and diffusion area, which improves density. Some filler cells may include well-taps, which help lower the substrate resistance. Both tie-low and tie-high cells are provided to avoid direct connections (for ESD prevention) to the power and ground rails when there is a need for a constant input. Finally, some decap cells are provided to help mitigate IR-drop issues during digital implementation.


Physical Verification (DRC) Aspect:

Commercial P&R tools typically fix minimum implant area (MinIA) violations by inserting filler cells at the final design stage. For example, one commercial tool has a utility to define an implant-layer group for filler cells, so that each narrow cell can be padded with filler cells of the same implant type. Another commercial tool checks and fixes implant-area violations according to the rules specified in the LEF during placement and filler cell insertion. Yet another tool offers a voltage-threshold-aware filler cell insertion flow, in which users can define the Vt filler cells to be inserted between different Vt regions; for example, users can insert NVT filler cells between NVT and HVT cells, and LVT filler cells between LVT and NVT cells.


Watch video lecture here:

Courtesy : Image by www.pngegg.com



3/02/2024

Reliability issues and IC failure in VLSI




Developing an IC is a time-, labor-, research-, and money-intensive process. There are physical and electrical mechanisms that can spoil the whole effort we put in, so we must understand and analyze what can make this development fail. In this article we discuss reliability issues in CMOS and try to understand the reasons that lead to IC failure. The article examines various pivotal aspects surrounding the reliability of VLSI CMOS technology.

 

What is Reliability?

Reliability means how likely it is that a product/system/service will work well for a certain amount of time or under specific conditions without any issues.



In simpler terms, reliability is like:

i. Probability of success,

ii. Durability,

iii. Dependability,

iv. Quality over time,

v. Availability to do its job


Failure is a deviation from compliance with the system specification for a given period of time. Failures can happen due to different types of faults: design bugs, manufacturing defects, wear-out of oxide or interconnect, external disturbances, or intentional mishandling of a product, although not all faults lead to errors. There are a number of physical failure mechanisms that can affect the reliability of a CMOS ASIC.


Yield and reliability are two of the most important aspects of developing a new technology. Designing reliable CMOS chips involves understanding and addressing the potential causes of failure.

Reliability Factors – I & II

If a device is used under the wrong use conditions, a failure may occur. The reliability of a device depends on how much stress it can handle. Some factors related to failure are:

i. Electric load, ii. Temperature, iii. Humidity, iv. Mechanical Stress, v. Static Electricity, vi. Effect of Repeated Stress

(i) Electric Load :

- Operating conditions determine the life of semiconductor devices.

- Electric power causes a rise in the junction temperature, which can lead to device failure; the electric current should be kept as low as possible.

- It is necessary to handle the surge current that flows when the switch is turned on or off, and the surge voltage of an inductive (L) load, so that they do not exceed the maximum rated values.

(ii) Temperature :

- Temperature affects the life of semiconductor products. A rapid or gradual change results in deterioration of characteristics, leading to device malfunction.

- The relation between the life “L” and temperature “T” :
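A commonly used Arrhenius-type form (stated here as an assumption) is:

L \propto \exp\!\left(\frac{E_a}{kT}\right)

where Ea is the activation energy (eV), k is the Boltzmann constant (eV/K), and T is the absolute temperature (K).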

- The life will be shortened if temperature rises.

- A ventilation or heat-radiation device is used to avoid overheating issues.

(iii) Humidity:

- IC chips are usually covered with a surface protective film to protect them from humidity. If a device is operated under severe humidity conditions, it should be handled with particular care.

(iv) Mechanical Stress :

- If the device is strongly vibrated during transportation, or if an extremely strong force is applied to it during installation, the device may be directly mechanically damaged. In addition, moisture or a contaminant may enter the device through the damaged area and cause deterioration of the device.

(v)  Static electricity:

- Electrostatic charge damages equipment. Equipment incorporating devices is often charged with static electricity; recently, plastic has generally been used for the casing and the structure of equipment.

- Human bodies can be also charged with static electricity.

- While handling semiconductor devices, it is necessary to take static charge preventive measures

- This issue has become more serious as device dimensions are aggressively scaled and operating frequencies become higher.

(vi) Effect of repeated stress :

- A stress that is applied repeatedly can be more damaging than a steady stress.

- High-low temperature cycles and intermittent internal heat-generation cycles apply stresses repeatedly. The effects of such cycles, such as rearrangement of the material structure and fatigue deterioration of the resistance to distortion, are examined and used for the evaluation of failures.


Failure Mechanisms  

(a) Time Dependent Dielectric Breakdown (TDDB):



Gate oxide thickness has reduced with successive technology nodes, so the electric field across Tox keeps getting stronger. Oxide film breakdown is caused by (i) an initial defect or (ii) deterioration of the oxide film. An initial defect leads to an early failure; deterioration of the oxide film leads to a long-term reliability failure. The oxide layer breaks down if the applied electric field exceeds the dielectric breakdown withstand voltage. Even an electric field of lower value, applied for a longer period, may cause breakdown as time elapses; this type of breakdown is referred to as time-dependent dielectric breakdown (TDDB). An empirical formula expresses the TDDB life:
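A commonly used form of this field- and temperature-acceleration model, consistent with the parameters listed below (the exact expression is our assumption), is:

t = t_t \cdot \exp\!\left[\beta\,(E_t - E)\right] \cdot \exp\!\left[\frac{E_a}{k}\left(\frac{1}{T} - \frac{1}{T_t}\right)\right]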

t : Life in practical use (h)
tt : Life in test (h)
β : Electric field acceleration factor
E : Electric field strength in practical use (MV/cm)
Et : Electric field strength in test (MV/cm)
Ea : Activation energy (eV)
k : Boltzmann constant (eV/K)
T : Temperature in actual use (K)
Tt : Test temperature (K)

Effective methods to prevent these failures are:

(i) optimising the process in order to minimise variability,

(ii) forming an oxide film with fewer defects,

(iii) screening by use of a high electric field during inspection/burn-in.


(b) Negative Bias Temperature Instability (NBTI) :

Four types of electric charge exist in gate oxide films:

(1) Mobile ionic charge Qm, (2) Fixed oxide charge Qf,

(3) Interface trapped charge Qit, (4) Oxide trapped charge Qot

NBTI is an increase in the absolute threshold voltage and a degradation of the mobility, drain current, and transconductance of p-MOSFETs at negative Vg or elevated temperature; a stronger and faster NBTI effect is produced by their combined action. Such fields and temperatures are typically encountered during burn-in and during routine operation in high-performance ICs.

Si has 4 valence electrons. At the surface of the silicon crystal, atoms are missing and traps are formed; the density of these interface states is denoted Dit. After oxidation most interface states are saturated with oxygen atoms and the interface quality improves. To further reduce the number of dangling valence bonds, the surface is annealed in forming gas (a mixture of hydrogen and nitrogen). The dangling silicon bonds are passivated by forming Si-H bonds, and the number of electrically active interface states can be reduced to an acceptable range.


These Si-H bonds have a lower binding energy. Elevated temperature and high electric fields break these bonds and the interface states are reactivated. The exact properties of the interface defects, which are trivalent silicon atoms with one unpaired valence electron, depend on the exact atomic configuration and on the orientation of the substrate. Holes interact with the Si-H bond and weaken it; at elevated temperature, the Si-H bonds dissociate:

Si3≡Si-H + h+ → Si3≡Si• + H+

The effect of bias temperature instability can be observed in both p-channel and n-channel MOSFETs; however, p-channel MOSFETs under negative Vg stress are more susceptible to this kind of degradation. It has been reported that channel cold holes are important for NBTI degradation. Since an n-channel MOSFET biased into accumulation also has holes at the surface of the substrate, its threshold voltage shift should be similar to that of p-channel MOSFETs; therefore, the lack of holes cannot be the cause of the different degradation behavior.

Impact of NBTI on Circuits :

(i) Occurs primarily in p-channel MOSFETs with negative gate voltage bias and is negligible for positive gate voltage.

(ii) Usually occurs during the "high" state of p-MOSFET inverter operation.

(iii) Leads to timing shifts and potential circuit failure due to increased spread in signal arrival times in logic circuits.

(iv) Asymmetric degradation in timing paths can lead to non-functionality of sensitive logic circuits and to product field failures.


(c) Hot Carrier Injection (HCI):






Hot carrier injection is one of the most significant reliability problems in state-of-the-art MOSFETs. It is difficult to reduce the power supply voltage further, so for DSM and nano-scale devices the electric field strength keeps increasing. "Hot carrier" is a generic name for the high-energy hot electrons and holes generated in the transistor. Hot carriers injected into the gate oxide film generate interface states and fixed charge, and finally degrade the Vt and Gm of the MOSFET. As the Vt of the FET increases, the circuit operation becomes slower and the circuit finally operates abnormally. Hot carriers are easily generated when Vg < Vd/2. When Vd > Vg, the carriers present in the channel impact the Si crystal lattice and generate pairs of a hot electron and a hot hole (impact ionization); these pairs act as hot carriers. Under a strong Vd, hot carriers gain enough energy to overcome the barrier of the Si/SiO2 interface and pass through the gate oxide into the gate. As a result, either the gate oxide film is charged, or the Si/SiO2 interface is damaged.


This leads to a change in the transistor characteristics. Generation mechanisms include channel hot electrons (CHE), avalanche hot carriers (AHC), and substrate hot electrons (SHE).

AHC shows a remarkable change when devices are miniaturized.


(d) Soft Error : 

A very small amount of radioactive elements (U, Th, etc.) is present in the package material. Abnormal operation of devices is caused by α particles radiated from those radioactive elements; this problem is referred to as a soft error. The abnormal operation is temporary, so writing the data again can restore normal operation. The problem is more dominant in advanced-node devices, because at these dimensions the electric charge representing signals in the device is lower, so the charge of the noise generated by α particles radiated inside the chip has an impact that cannot be ignored. Consider α particles striking the cell capacitor that stores 1 bit of data (1 bit = the minimum data unit of a dynamic RAM). The α particles generate electron-hole pairs in the substrate, losing their energy in the process. The electrons generated in this way can invert the data of the cell capacitor: a cell capacitor is considered "L" if electrons exist and "H" if they do not, so if electrons are generated in the cell capacitor by α particles, data "H" will be inverted to data "L". This is referred to as a soft error in the memory cell mode. Soft errors affect memories, registers, and combinational logic; memories use error detecting and correcting codes to tolerate soft errors, so these errors rarely turn into failures in a well-designed system.

The cell capacitor data is read out to the bit line by diffusion and then compared with the reference potential. If electrons generated by α particles flow into the bit line, the potential of the read-out data or the reference potential may be lowered. If the data potential is lowered, the data will be inverted from "H" to "L"; if the reference potential is lowered, the data will be inverted from "L" to "H". This is referred to as a soft error in the bit line mode. If the operation cycle (cycle time) of the dynamic RAM is shortened, the reference potential is compared with the data potential more frequently, so soft errors in the bit line mode increase. On the other hand, a change in the cycle time does not affect soft errors in the memory cell mode.

Prevention of Soft Errors:

(i) Use package material that contains fewer radioactive elements (the source of α particles).

(ii) Prevent α particles from entering the chip by coating the chip with an organic material.

(iii) Improve the bit line structure using wire materials such as Al and poly-Si, improve the sense amplifier, adopt the return bit line, etc.

(e) Electromigration (EM)


A chip may go above 100 °C during practical operation. High-frequency power loss and the consequent heat dissipation contribute to the increased temperature, and a rise in temperature enhances solid-state metal ion diffusion. Electromigration is caused by scattering of the moving electrons off the metal ions, i.e., by momentum transfer between electrons and ions in metal interconnects; this ion-electron interaction is sometimes referred to as the "electron wind." It can cause the wire to break, or to short-circuit to another wire. A void formed in an interconnect in this way can lead to an open circuit, i.e., chip failure.

EM is one of the most menacing and persistent threats to interconnect reliability. The mean time to failure due to electromigration is:
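A standard form for this is Black's equation, consistent with the parameters listed below:

\mathrm{MTTF} = \frac{A}{J^{\,n}}\,\exp\!\left(\frac{E_a}{kT}\right)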


MTTF : Mean time to failure (h)
A : Constant of the wire
J : Current density (A/cm2)
n : Constant
Ea : Activation energy (eV)
k : Boltzmann constant (eV/K)
T : Absolute temperature of the wire (K)

The following factors can reduce the failures caused by electromigration:

a) Crystal structure (grain diameter, crystal orientation, etc.)

b) Addition of other elements to metal film

c) Laminated wiring structure

Electromigration depends on the current density J = I/wt. It is more likely to occur for wires carrying a DC current, where the electron wind blows in a constant direction, than for those carrying bidirectional currents.


(f) Self Heating:

Bidirectional wires are less prone to EM, although their current density still contributes to self-heating. High currents dissipate power in the wire, and since the surrounding oxide or low-k dielectric is a thermal insulator, the wire temperature can become significantly greater than that of the underlying substrate. Hot wires exhibit greater resistance and delay. EM is also highly sensitive to temperature, so self-heating may cause temperature-induced electromigration problems in bidirectional wires. Brief pulses of high peak current may even melt the interconnect. A significant percentage of the device self-heat energy flows vertically and laterally into the interconnect layers, and the local temperature rise depends on the thermal dissipation path(s) away from the element where the heat energy originates. Self-heating depends on the RMS current density; a conservative rule to control reliability problems from self-heating is to keep Jrms < 15 mA/µm² for bidirectional aluminum wires on a silicon substrate.

The maximum capacitance of the wire can be estimated based on the RMS current. EM from high DC current densities is primarily a problem in power and ground lines, while self-heating limits the RMS current density in bidirectional signal lines.
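For reference, using the same J = I/wt notation introduced above, the RMS current density over a waveform of period Tp is (a small worked form, assuming the usual definition of an RMS value):

J_{rms} = \frac{I_{rms}}{w\,t}, \qquad I_{rms} = \sqrt{\frac{1}{T_p}\int_{0}^{T_p} i(t)^2\,dt}

where w and t are the wire width and thickness.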

(g) Stress Migration :



Stress migration, or stress-induced voiding (SIV), is a wear-out failure mechanism in chip metallization. It causes an open circuit in the metal interconnect, especially at a via, since that is the weakest link. SM is caused by the interaction between the thermo-mechanical stress in the interconnect system and the diffusion of vacancies. Thermal stress exists in the interconnect because of the thermal-expansion mismatch between the metal and the surrounding materials: the BEOL interconnect structure consists of several different materials such as metal, dielectric, diffusion barrier, silicon substrate, and capping layer, and fabricating it involves several thermal cycles from room temperature to about 400 °C, so a large amount of stress can be introduced by the thermal-expansion mismatch among these materials. The metal expands due to heating and then contracts during cooling, but it cannot return to its original state because it is constrained by the other materials; as a result, there is a tensile stress in the metal layer. Metal atoms move to balance the stress condition, and thus a void is created. Voids in the metallization tend to nucleate and grow around the vias and block the flow of electrical current, creating an open-circuit condition.

(h) CMOS Latchup :


Latch-up is a destructive short-circuit phenomenon in the CMOS structure. It can be defined as a low-resistance path between supply levels, caused by a low-impedance path between the power supply rails of a MOSFET circuit through the parasitic PNPN structure underneath. The circuit function is disrupted by latch-up, and the currents are frequently large enough to cause permanent damage. The parasitic PNPN structure resembles, and is equivalent to, a silicon-controlled rectifier (SCR) structure: a PNPN structure created by a PNP and an NPN transistor stacked next to each other.

Immediately after the latch-up trigger, one of the transistors starts conducting and the other one follows by starting to conduct as well. They both stay in saturation for as long as the structure is forward-biased and some current flows through it.
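As a commonly quoted quantitative criterion (stated here as general background, an assumption rather than something derived above), the parasitic SCR can sustain latch-up roughly when the loop gain of the two coupled bipolar transistors exceeds unity,

\beta_{npn} \times \beta_{pnp} \gtrsim 1,

together with a supply capable of sourcing the holding current; this is why layouts use well/substrate taps (tap cells) and guard rings to keep the well and substrate resistances low.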

(i) Electrostatic Discharge:



ESD is the release of stored static electricity. The most famous (large-scale) ESD event is lightning; an ESD event that takes place inside a chip is not visible. ESD destroys about 20% of electronic components before they are installed into a system, and even when ESD only partially damages a component, it can lead to further damage within a brief time during circuit operation. A person can acquire charge simply by walking across a room; when such a charged person or object approaches an IC, an ESD event occurs, characterized by a high current within a few ns. A high current density and/or electric field can damage conductors, semiconductors, and insulators in an IC. Electrostatic wrist straps are used in industry, where electronic circuits are packaged and assembled, to protect against ESD. The circuit inside the IC tends to be partially damaged, or may break down, when such a high-voltage pulse enters. When we buy semiconductor components for computers, they come in a dark grey package, which is an external protection against ESD events. ESD damage in an IC usually starts with oxide breakdown, which results in a percolation path. The high current density damages the semiconductor devices through thin-film fusing, filamentation, and junction spiking; the high electric field, on the other hand, can cause failure through dielectric breakdown or charge injection.

Watch the video lecture here:



Courtesy : Image by www.pngegg.com, www.pexels.com, www.pixabay.com