1/20/2022

Regular Expressions (RegEx) in PERL


In this article, we dive into the essentials of Regular Expressions (RegEx) in PERL, exploring real-world applications in VLSI with two examples. We cover string operations, introduce RegEx fundamentals, and explain the role of meta-characters and meta-symbols. Key topics include pattern matching with five examples, using non-default delimiters, extracting matched groups, and applying the split function with RegEx. We also highlight similarities between PERL and SED in search-and-replace functionality, showcasing RegEx's versatility and power.


This article covers Regular Expressions (RegEx) in PERL in the following sequence :

  • Application of RegEx in VLSI
  • String Operation in PERL
  • Meta-Character & Meta-Symbols in RegEx
  • Pattern Matching through RegEx in PERL
  • Non-Default Delimiters
  • Matched Group & Its Extraction
  • Split Function & RegEx in PERL
  • Search & Replace in PERL similar to SED

RegEx & Parsing of Files in EDA Automation :

Regular Expressions (RegEx) are among the most important features of PERL. RegEx plays a central role in data processing, post-processing and EDA automation for managing VLSI tool runs. PERL and RegEx together are used to maintain the right sequence of VLSI tool runs in the RTL to GDSII flow. This section is especially useful if you are preparing for a VLSI job interview where PERL appears in the job description.

The info-graphics below illustrates this further :




Parsing of the Tool Output File :

When we run a tool in VLSI, we generally get an output file with lots of text data in it, as shown in the left-most block of the above image. These output files may contain resistors, capacitors, coupling capacitors, nets and many other important details specific to the design construction. The output may vary at every design sub-stage. Even at the same design sub-stage, the output file will vary in internal formatting and/or sequencing from one tool to another. There may be 1,000 to 10,000 entries, each with a specific set of data as mentioned above. In addition, these files may contain several error/warning messages and comments. Such files are very difficult to process by hand. Hence PERL File I/O and RegEx are used together to parse the data and filter out the fields that you need.

Storing the parsed and filtered Data in the Data-Structure:

The file is processed line by line through File I/O, and each line is matched against a RegEx. The filtered data is then stored in a convenient PERL data structure. This data structure may be a simple Hash or Array. In the case of complex data organisation, you may have to use a nested data structure such as an Array of Hashes or a Hash of Arrays. This is shown in the middle block of the above info-graphics. Remember that after the data population is done, you should verify the integrity of the data by inspecting it with the Data::Dumper module.
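
The parse-and-store step can be sketched as follows. This is only a minimal illustration: the report format (lines such as "R1 net=clk res=12.5"), the field names and the %resistor hash are assumptions made for this example, not the format of any real tool. The report is read from an in-memory string so that the sketch is self-contained.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical tool output: this line format is invented for illustration.
my $report = <<'END';
# comment line
R1 net=clk res=12.5
WARNING: floating node n42
R2 net=rst res=7.25
END

my %resistor;    # hash of hashes: resistor name -> { net, res }

# Read the text line by line through a filehandle on the string
open my $in, '<', \$report or die "Cannot open report: $!";
while ( my $line = <$in> ) {
    chomp $line;

    # Keep only lines of the form "<name> net=<net> res=<value>"
    if ( $line =~ /^(R\d+)\s+net=(\w+)\s+res=([\d.]+)$/ ) {
        $resistor{$1} = { net => $2, res => $3 };
    }
}
close $in;

$Data::Dumper::Sortkeys = 1;    # stable key order for easier inspection
print Dumper( \%resistor );
```

For a real run, the in-memory string would be replaced by the path of the tool's output file in the open() call.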

Making A Hard Copy of the Parsed Data :

After the integrity of the parsed data is checked, it is wise to dump the data into a file on disk. Generally these output files are either CSV or Excel files, although in some cases they might be XML files too. This is shown in the third block of the above info-graphics. The CSV or Excel file can later be opened and analysed in MS-Excel or the OpenOffice/LibreOffice spreadsheet applications. In this post-processing you may apply sorting, thresholds, custom equations etc. in the spreadsheet. You may also plot the data in the spreadsheet in various forms and shapes.
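
A minimal sketch of the CSV dump is given here, assuming the parsed data sits in a hash of hashes. The %resistor contents and the column names are hypothetical. The output goes to an in-memory buffer so the example runs anywhere; for fields that may themselves contain commas or quotes, a dedicated CPAN module such as Text::CSV is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical parsed data: resistor name -> { net, res }
my %resistor = (
    R1 => { net => 'clk', res => 12.5 },
    R2 => { net => 'rst', res => 7.25 },
);

# Write to an in-memory buffer; pass a file name such as
# 'resistors.csv' instead of \$csv to write to disk.
my $csv = '';
open my $out, '>', \$csv or die "Cannot open output: $!";

print {$out} join( ',', qw(name net res) ), "\n";    # header row
for my $name ( sort keys %resistor ) {
    # Hash slice picks the two fields in a fixed column order
    print {$out} join( ',', $name, @{ $resistor{$name} }{qw(net res)} ), "\n";
}
close $out;

print $csv;
```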

Bench-marking of  Tools in EDA Automation :

Bench-marking of Tool-A vs Tool-B is done very frequently in VLSI companies. The aim is to gain confidence in the results by cross-checking the outcome of the two tool runs.

In MNCs, similar tools from the three major EDA vendors are used in the design flow up to sign-off. When a new tool is introduced by any of the tool vendors, it also gets bench-marked by the design house during its trial period.
In general, the reported errors/warnings/violations are filtered from the benchmark output and analysed deeply. The odd ones out, i.e. items reported by one tool but not by the other, are noted down. These are good to catch at the beginning.
When all the errors/warnings/violations are mapped among the tools under the benchmark exercise and no odd ones out are left, the designers feel relieved and start their debug process.



In the above picture, the two output files from Tool-A and Tool-B are populated into PERL Hashes using RegEx matching.
Now we can iterate over the two Hashes side by side using PERL loops and compare the Current/Voltage/Slew etc. of any particular resistor between the outputs of the two tools.
Then we can save the compared output to disk using the File I/O routines in PERL.
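
The side-by-side comparison can be sketched as follows. The two hashes, the current values and the tolerance are all hypothetical; in practice both hashes would be filled by RegEx parsing of the two tool reports.

```perl
use strict;
use warnings;

# Hypothetical per-resistor currents (mA) parsed from two tool reports
my %tool_a = ( R1 => 1.20, R2 => 0.85, R3 => 2.10 );
my %tool_b = ( R1 => 1.21, R2 => 0.60, R4 => 0.40 );

my $tolerance = 0.05;    # assumed acceptable absolute difference
my ( @mismatch, @only_in_a );

for my $name ( sort keys %tool_a ) {
    if ( !exists $tool_b{$name} ) {
        push @only_in_a, $name;    # odd one out: missing in Tool-B
    }
    elsif ( abs( $tool_a{$name} - $tool_b{$name} ) > $tolerance ) {
        push @mismatch, $name;     # both tools report it, values disagree
    }
}

# Odd ones out in the other direction: missing in Tool-A
my @only_in_b = grep { !exists $tool_a{$_} } sort keys %tool_b;

print "Mismatch : @mismatch\n";
print "Only in A: @only_in_a\n";
print "Only in B: @only_in_b\n";
```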

Strings in PERL :

  • Strings are not a separate data type in PERL; they are simply stored in scalars.
  • Single-quotes are the “standard” delimiters for strings : everything inside the single quotes is taken literally (only \\ and \' are treated specially).
  • Double-quoted strings are “interpolated” : any variable names found inside the string will be replaced by their value. 
  • To compare two strings, we use the string comparison operators: eq (equal), ne (not equal), lt (less than), gt (greater than), le (less or equal), and ge (greater or equal).

Examples of Strings in PERL  : 

my $animal = "Camel"; # pure string
my $sign = "I love $animal"; # string with interpolation
my $cost = 'It costs $100'; # string without interpolation
my $cwd = `pwd`; # string output from a command
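
A short sketch contrasting the two quoting styles and the string comparison operators follows. The cell names are made up for illustration.

```perl
use strict;
use warnings;

my $cell        = 'NAND2';
my $msg_interp  = "Cell is $cell";    # double quotes: $cell is replaced
my $msg_literal = 'Cell is $cell';    # single quotes: taken literally

print "$msg_interp\n";     # prints: Cell is NAND2
print "$msg_literal\n";    # prints: Cell is $cell

# Strings are compared with eq/ne/lt/gt, not with ==/!=/</>
my $same   = $cell eq 'NAND2';    # true: identical strings
my $before = $cell lt 'NOR2';     # true: 'A' sorts before 'O'

print "same name\n"         if $same;
print "sorts before NOR2\n" if $before;
```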

Regular Expression (RegEx) in PERL  :

  • A regular expression is simply a pattern which implicitly generates a family of strings, expressed in a special notation.
  • a*z  - This pattern generates the family of strings : ‘z’, ‘az’, ‘aaz’, ‘aaaaaaaaz’, etc. (zero or more ‘a’s followed by a ‘z’).
  • There are two pattern-matching operators, which by default operate on the default scalar, $_ :
  • The matching operator, /Pattern/, returns true if the string it is bound to matches the regular expression it contains.
  • The substitution operator, s/Pattern/Replacement/, replaces the first match of Pattern with Replacement if a match is found.
  • Most characters in a regular expression simply represent themselves.
  • Except : \ | ( ) [ { ^ $ * + ? . 
  • These are called meta-characters and have special meanings. 
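
A minimal sketch of both operators working on the default scalar $_ is shown here. The input lines are hypothetical.

```perl
use strict;
use warnings;

my @lines = ( 'net clk', 'pin a1', 'net rst' );
my @nets;

for (@lines) {             # each element is aliased to $_ in turn
    next unless /^net/;    # same as: $_ =~ /^net/
    s/^net\s+//;           # same as: $_ =~ s/^net\s+//
    push @nets, $_;
}

print "@nets\n";           # prints: clk rst
```

Because $_ is an alias, the substitution also changes the matching elements of @lines themselves; copy the value into a separate variable first if the original text must be preserved.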

Meta-Characters And Their Meaning in PERL-RegEx :

^ Matches the beginning of a string
$ Matches the end of a string
. Matches any single character
* Matches any count (0-n) of the previous character
+ Matches any count, but at least 1 of the previous character
[...] Matches any character of a set of characters
[^...] Matches any character *NOT* a member of the set of characters following the ^.
(...) Groups a set of characters into a sub-pattern.
{m} Exactly m times
{m,} At least m times
{m,n} At least m but not more than n times
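
The anchors, character sets and repeat counts listed above can be tried out with a small sketch like this one. The instance names are made up, and the patterns are kept in single-quoted strings only so that they print cleanly.

```perl
use strict;
use warnings;

# Each entry: [ string, pattern ]
my @cases = (
    [ 'U12',   '^U\d{2}$'   ],    # exactly two digits
    [ 'U123',  '^U\d{2}$'   ],    # three digits, so {2} fails
    [ 'U123',  '^U\d{2,}$'  ],    # at least two digits
    [ 'U1234', '^U\d{1,3}$' ],    # four digits exceed {1,3}
    [ 'abc',   '^[^0-9]+$'  ],    # one or more non-digits
);

my @results;
for my $case (@cases) {
    my ( $string, $pattern ) = @$case;
    my $verdict = $string =~ /$pattern/ ? 'match' : 'no match';
    push @results, $verdict;
    printf "%-6s %-12s %s\n", $string, $pattern, $verdict;
}
```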


Meta-Symbols And Their Meaning in PERL-RegEx :

\d matches a digit, not just [0-9] but also digits from non-roman scripts
\s matches a whitespace character, the set [\ \t\r\n\f] and others
\w matches a word character (alphanumeric or '_'), not just [0-9a-zA-Z_] but also digits and characters from non-roman scripts
\D is a negated \d; it represents any other character than a digit, or [^\d]
\S is a negated \s; it represents any non-white-space character [^\s]
\W is a negated \w; it represents any non-word character [^\w]
The period '.' matches any character but "\n" (unless the modifier /s is in effect, in which case it matches "\n" as well).
\N, like the period, matches any character but "\n", but it does so regardless of whether the modifier /s is in effect.
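
A small sketch of \d, \w and \S at work on one line of text follows. The line itself is a made-up example; with the /g modifier in list context, a match returns every captured occurrence.

```perl
use strict;
use warnings;

my $line = "R15  net_7\t3.3V";    # hypothetical report line

my @digits = $line =~ /(\d+)/g;   # runs of digits
my @words  = $line =~ /(\w+)/g;   # runs of word characters
my @fields = $line =~ /(\S+)/g;   # runs of non-whitespace

print "digits: @digits\n";        # prints: digits: 15 7 3 3
print "words : @words\n";         # prints: words : R15 net_7 3 3V
print "fields: @fields\n";        # prints: fields: R15 net_7 3.3V
```

Note that '.' is not a word character, so \w+ splits 3.3V into two pieces while \S+ keeps it whole.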

Pattern Matching in PERL RegEx : Examples :

‘abcdef’ –Matches ‘abcdef’.
‘a*b’ –Matches zero or more ‘a’s followed by a single ‘b’. For example, ‘b’ or ‘aaaaab’.
‘a?b’ –Matches ‘b’ or ‘ab’.
‘a+b+’ –Matches one or more ‘a’s followed by one or more ‘b’s: ‘ab’ is the shortest  possible match, but other examples are ‘aaaab’ or ‘abbbbb’ or ‘aaaaaabbbbbbb’.
‘.*’ or ‘.+’ –Both match any run of characters (up to a newline); however, the first matches every string (including the empty string), while the second matches only strings containing at least one character.
‘[a-zA-Z0-9]’ –Matches any single ASCII letter or digit.
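
The patterns above can be checked with a short sketch. One point worth noting is that an unanchored pattern succeeds if it matches anywhere inside the string, so ^ and $ are added here to test the whole string. The test strings are chosen only for illustration.

```perl
use strict;
use warnings;

my %matched;    # string -> names of the patterns that matched it
for my $s ( 'b', 'ab', 'aaab', 'abbb', 'ba' ) {
    my @hits;
    push @hits, 'a*b'  if $s =~ /^a*b$/;
    push @hits, 'a?b'  if $s =~ /^a?b$/;
    push @hits, 'a+b+' if $s =~ /^a+b+$/;
    $matched{$s} = join ', ', @hits;
    printf "%-5s => %s\n", $s, $matched{$s} || 'none';
}
```

Without the anchors, 'ba' would match /a*b/ as well, because the leading 'b' on its own already satisfies the pattern.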

Examples and More Examples .... :
So far we have discussed the importance and the rules of RegEx in PERL. Now let us dive into some examples to understand it.
Here is an example of a positive match in PERL RegEx :
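
A minimal sketch of a positive match with the =~ binding operator is given here. The log line is hypothetical and not taken from any real tool.

```perl
use strict;
use warnings;

my $line = 'ERROR: setup violation on net clk_core';

# =~ binds the string to the pattern; the match is true if the
# pattern is found anywhere in the string
my $found = $line =~ /violation/ ? 1 : 0;

print "Violation reported\n" if $found;
print "Line ends with clk_core\n" if $line =~ /clk_core$/;
```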


Here is an example of a negative match in PERL RegEx :
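
A minimal sketch of a negative match with the !~ operator, which is true when the pattern is not found, follows. The log lines are made up for illustration.

```perl
use strict;
use warnings;

my @log = (
    'INFO: run started',
    'WARNING: net n1 floating',
    'INFO: run finished',
);

# Keep every line that does NOT start with INFO
my @not_info = grep { $_ !~ /^INFO/ } @log;

print "$_\n" for @not_info;    # prints: WARNING: net n1 floating
```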



The below example tells you what to expect and what not to expect from PERL RegEx :

The below example tells you what is obvious and what not in PERL RegEx :

The below example shows how to combine letters and digits in a PERL RegEx pattern :
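
One possible sketch of an alpha-numeric pattern is given here. The naming rule (start with a letter, continue with letters, digits or underscores, end with a digit) and the instance names are assumptions chosen for this example.

```perl
use strict;
use warnings;

my @names = qw(inv1 NAND2 2bad buf_x4 clk);

# Letter first, then any letters/digits/underscores, digit last
my @valid = grep { /^[A-Za-z][A-Za-z0-9_]*[0-9]$/ } @names;

print "@valid\n";    # prints: inv1 NAND2 buf_x4
```

Here '2bad' is rejected because it starts with a digit, and 'clk' because it does not end with one.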



Non-Default Delimiters :

The // default delimiters for a match can be changed to arbitrary delimiters by putting an 'm' out front:
"Hello World" =~ m{World};   # matches, note the matching '{}'
"/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
                             # '/' becomes an ordinary char

"Hello World" =~ m!World!;   # matches, delimited by '!'


Extracting Match Group :

  • The grouping meta-characters ()  allow the extraction of the parts of a string that matched. 
  • For each grouping, the part that matched inside goes into the special variables $1, $2, etc. 
  • They can be used just as ordinary variables, see the below code example :


The below example shows the group matching :
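
A minimal sketch of group extraction follows, using a time stamp as the input string. Both the $1, $2, $3 form and the list-context form are shown; the variable names are chosen only for this example.

```perl
use strict;
use warnings;

my $time = '14:05:37';

# Form 1: test the match, then read the special variables $1, $2, $3
if ( $time =~ /^(\d\d):(\d\d):(\d\d)$/ ) {
    print "hours=$1 minutes=$2 seconds=$3\n";
}

# Form 2: in list context a match returns the captured groups directly
my ( $hours, $minutes, $seconds ) = $time =~ /^(\d\d):(\d\d):(\d\d)$/;

print "hours=$hours minutes=$minutes seconds=$seconds\n";
```

The $1, $2, ... variables should only be read after a successful match, which is why the first form wraps them in an if block.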

RegEx Split Function in PERL :

  • The split() function is another place where a RegEx is used.
  • It splits a string into a list of sub-strings, treating every match of the regex as a separator.
  • The /Pattern/ works as a delimiter to split the string into smaller sub-strings.
  • These sub-strings are returned as a list, which is usually stored in an array.
Here is a short and sweet example of the split function :
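
A minimal sketch of split() with a RegEx delimiter is shown here. The input strings are hypothetical.

```perl
use strict;
use warnings;

# Comma-separated fields with untidy spacing around the commas
my $line   = 'R1, clk ,12.5,  0.8';
my @fields = split /\s*,\s*/, $line;

print join( '|', @fields ), "\n";    # prints: R1|clk|12.5|0.8

# A single-space string is a special case: split on runs of
# whitespace and ignore any leading whitespace
my @words = split ' ', '  net   clk  rst ';

print join( '|', @words ), "\n";     # prints: net|clk|rst
```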



RegEx Search and Replace in PERL :

  • Search and replace is performed using s/regex/replacement/modifiers.

  • The replacement is a Perl double-quoted string that replaces in the string whatever is matched with the regex. 
  • The operator =~ is also used here to associate a string with s///. 
  • If matching against $_, the $_ =~ can be dropped.
  • If there is a match, s/// returns the number of substitutions made;
  • Otherwise it returns false. 
The below example shows the various cases of search and replace :
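
A minimal sketch of a few search-and-replace cases follows. The strings are made up; the s/old/new/g form will look familiar to SED users.

```perl
use strict;
use warnings;

my $text = 'net_a net_b net_c';

( my $first = $text ) =~ s/net_/sig_/;     # replaces only the first match
( my $all   = $text ) =~ s/net_/sig_/g;    # /g replaces every match

my $copy  = $text;
my $count = ( $copy =~ s/net_/sig_/g );        # number of substitutions made
my $none  = ( $copy =~ s/no_such_text/x/ );    # false: nothing matched

# Captured groups can be used in the replacement
( my $swapped = 'clk rst' ) =~ s/(\w+) (\w+)/$2 $1/;

print "$first\n";      # prints: sig_a net_b net_c
print "$all\n";        # prints: sig_a sig_b sig_c
print "$count\n";      # prints: 3
print "$swapped\n";    # prints: rst clk
```

The idiom ( my $new = $old ) =~ s/.../.../ copies the string first, so the original in $text stays unchanged.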



Courtesy : Image by Jae Rue from Pixabay