5.0 The Power of Pattern Matching and Data Manipulation
AWK’s core strength is its pattern-action model. Each rule pairs a pattern, which selects the lines to process, with an action, which says what to do with them. This model makes AWK well suited to common sysadmin and developer tasks such as log analysis, data transformation, and targeted information retrieval, because a useful filter often fits in a single short command.
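The sketch below shows the pattern-action model in one small program. It assumes a POSIX-compatible awk on the PATH. The function name summarize_scores and the "name score" sample data are hypothetical, chosen only for illustration. One rule has only a pattern, so AWK applies the default action and prints the line. One rule has both a pattern and an action. The END rule runs once, after all input has been read.

```shell
#!/bin/sh
# summarize_scores is a hypothetical helper: it reads "name score" lines on stdin.
summarize_scores() {
  awk '
    # Pattern only: the default action prints the whole line ($0).
    $2 >= 90
    # Pattern plus action: runs only for lines whose score is below 50.
    $2 < 50 { low++ }
    # END runs once, after every input line has been processed.
    END { print "low scores:", low + 0 }
  '
}

# Hypothetical sample input, one record per line.
printf '%s\n' 'ana 95' 'ben 42' 'cai 71' 'dev 38' | summarize_scores
```

With this input, the program prints "ana 95" followed by "low scores: 2". The expression low + 0 makes the END rule print 0 instead of an empty string when no line matches.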
5.1 Practical Data Extraction and Filtering
The following examples use the marks.txt file to demonstrate key data manipulation techniques. These examples leverage AWK’s automatic field-splitting and built-in variables (like $0 for the entire line and $n for specific fields), which will be detailed further in Section 6.1.
Printing Specific Columns (Fields)
By default, AWK splits each input line into fields on runs of whitespace (spaces and tabs). The fields are available as $1, $2, $3, and so on. The following command prints the third ($3) and fourth ($4) fields, separated by a tab (\t).
Command:
[jerry]$ awk '{print $3 "\t" $4}' marks.txt
Output:
Physics 80
Maths 90
Biology 87
English 85
History 89
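The default splitting rule matters most on messy input. In the following sketch, which assumes a POSIX awk, the helper name show_fields is hypothetical. With the default field separator, AWK treats runs of spaces and tabs as a single separator and ignores leading and trailing blanks. As a result, NF (the number of fields) and $NF (the last field) stay stable.

```shell
#!/bin/sh
# show_fields (hypothetical helper) prints the field count, the third field,
# and the last field of each input line, separated by tabs.
show_fields() {
  awk '{ print NF "\t" $3 "\t" $NF }'
}

# Irregular spacing: leading blanks, a tab, repeated and trailing spaces.
printf '  2)  Rahul\tMaths   90  \n' | show_fields
```

Despite the irregular spacing, the line still has four fields, so this prints 4, Maths, and 90, separated by tabs.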
Filtering Lines by Pattern
A pattern can be a regular expression enclosed in slashes (/ /). The simplest case is a literal string such as /a/. AWK executes the associated action only on lines that match, which here means every line containing a lowercase “a”. The $0 variable represents the entire line.
Command:
[jerry]$ awk '/a/ {print $0}' marks.txt
Output:
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
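Regex patterns in AWK are case-sensitive, so /a/ does not match a line whose only “a” is an uppercase “A”. The sketch below uses the standard tolower() function to match either case. Both function names and the sample lines are hypothetical. gawk also provides an IGNORECASE variable, but that is a gawk extension, so this sketch avoids it.

```shell
#!/bin/sh
# match_a (hypothetical): case-sensitive; prints lines containing a lowercase "a".
match_a() {
  awk '/a/'
}

# match_a_any_case (hypothetical): lowercases a copy of the line before matching,
# so "A" and "a" both count. $0 itself is unchanged and printed as-is.
match_a_any_case() {
  awk 'tolower($0) ~ /a/'
}

printf '%s\n' 'Ann 70' 'Bob 80' 'Sam 90' | match_a            # Sam 90
printf '%s\n' 'Ann 70' 'Bob 80' 'Sam 90' | match_a_any_case   # Ann 70, Sam 90
```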
Combining Patterns with Column Selection
Patterns and column selection can be combined to extract specific data from matching lines. This command prints the third and fourth columns only from lines containing the letter “a”.
Command:
[jerry]$ awk '/a/ {print $3 "\t" $4}' marks.txt
Output:
Maths 90
Biology 87
English 85
History 89
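The pattern /a/ tests the whole line, so a line qualifies even when the “a” appears only in the name column. To restrict the test to one field, you can match that field explicitly with the ~ operator. The following sketch reuses the four lines shown in full in the earlier output. The function name subject_has_a is hypothetical.

```shell
#!/bin/sh
# subject_has_a (hypothetical): prints $3 and $4, but only when the third
# field itself (the subject) contains a lowercase "a".
subject_has_a() {
  awk '$3 ~ /a/ { print $3 "\t" $4 }'
}

printf '%s\n' '2) Rahul Maths 90' '3) Shyam Biology 87' \
  '4) Kedar English 85' '5) Hari History 89' | subject_has_a
```

Only “Maths 90” is printed, because “Maths” is the only subject containing an “a”. The negated operator, $3 !~ /a/, would instead select the other three lines.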
Conditional Processing
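Patterns are not limited to regular expressions. Any expression can serve as a pattern, and the action runs when the expression is true (nonzero or non-empty). This example uses the built-in length() function to print only lines longer than 18 characters. Because the rule has no action, AWK applies the default action, which prints the whole line.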
Command:
[jerry]$ awk 'length($0) > 18' marks.txt
Output:
3) Shyam Biology 87
4) Kedar English 85
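Numeric comparisons on fields work as patterns too, and you can combine them with && (and) or || (or). The following sketch runs on the four lines whose full text appears in the outputs above. The threshold 86 and the function name high_scorers are illustrative choices, not part of the tutorial.

```shell
#!/bin/sh
# high_scorers (hypothetical): prints the name ($2) and score ($4) of records
# whose score is above 86 AND whose full line is longer than 18 characters.
high_scorers() {
  awk '$4 > 86 && length($0) > 18 { print $2, $4 }'
}

printf '%s\n' '2) Rahul Maths 90' '3) Shyam Biology 87' \
  '4) Kedar English 85' '5) Hari History 89' | high_scorers
```

Only “Shyam 87” satisfies both conditions. If you replace && with ||, AWK selects every record that meets either condition, which here is all four sample lines.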
Counting Pattern Occurrences
Counting matching lines is a common operational task, for example counting authentication failures in a log file. The technique is a simple counter. In the command below, the counter variable cnt is incremented in the body block for every line matching the pattern /a/. The END block prints the final count after all lines have been processed.
Command:
[jerry]$ awk '/a/{++cnt} END {print "Count =", cnt}' marks.txt
Output:
Count = 4
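The same counter applies to the log-analysis case. The following sketch assumes that failed logins appear as lines containing the text “Failed password”. The log lines below are invented for illustration and only loosely resemble sshd-style messages. The function name count_failures is hypothetical. Printing cnt + 0 forces a numeric 0 when nothing matches, whereas a bare cnt would print an empty string.

```shell
#!/bin/sh
# count_failures (hypothetical): counts lines containing "Failed password".
count_failures() {
  awk '/Failed password/ { ++cnt } END { print "Count =", cnt + 0 }'
}

# Invented sample lines; real log formats vary by system.
printf '%s\n' \
  'sshd[101]: Failed password for root from 10.0.0.5' \
  'sshd[102]: Accepted password for alice from 10.0.0.7' \
  'sshd[103]: Failed password for invalid user bob from 10.0.0.9' \
  | count_failures
```

This prints “Count = 2”. With empty input, it prints “Count = 0”.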
5.2 Leveraging Regular Expressions for Complex Patterns
AWK’s pattern-matching capabilities are greatly enhanced by its native support for regular expressions (regex). This allows for the definition of complex and flexible patterns that go far beyond simple string matching.
Common Regular Expression Metacharacters in AWK
| Metacharacter | Description |
|---|---|
| . | Matches any single character (in gawk, this includes a newline). Ex: /f.n/ matches “fin”, “fun”, “fan”, etc. |
| ^ | Matches the beginning of a line. Ex: /^The/ matches “There” and “Their” but not “this” or “Other”. |
| $ | Matches the end of a line. Ex: /n$/ matches “fun” and “fin”. |
| [ ] | Matches any single character within the brackets. Ex: /[CT]all/ matches “Call” and “Tall”. |
| [^ ] | Matches any single character not within the brackets. Ex: /[^CT]all/ matches “Ball” but not “Call” or “Tall”. |
| \| | Alternation: matches either the expression on its left or the one on its right. Ex: /Call\|Ball/ matches “Call” or “Ball”. |
| ? | Matches zero or one occurrence of the preceding character. Ex: /Colou?r/ matches “Color” and “Colour”. |
| * | Matches zero or more occurrences of the preceding character. Ex: /cat*/ matches “ca”, “cat”, and “catt”. |
| + | Matches one or more occurrences of the preceding character. Ex: /2+/ matches “2”, “22”, and “222”. |
| ( ) | Groups expressions together. Ex: /Apple (Juice\|Cake)/ matches “Apple Juice” or “Apple Cake”. |
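You can check several of the table entries by running them against a fixed word list. The following sketch assumes a POSIX awk, which uses extended regular expressions. The helper name match_words and the word list are hypothetical. It passes each regex in as a string and applies it with $0 ~ re, so the pattern is a dynamic regex. The patterns are anchored with ^ and $ so each must match a whole word. The table examples, by contrast, are unanchored and match anywhere in a line.

```shell
#!/bin/sh
# match_words (hypothetical): prints the input words that match the regex
# given as $1. Inside awk, $0 ~ re treats the string re as a regex.
match_words() {
  awk -v re="$1" '$0 ~ re'
}

words='Call
Tall
Ball
Color
Colour
fan
fun'

printf '%s\n' "$words" | match_words '^[CT]all$'      # Call, Tall
printf '%s\n' "$words" | match_words '^[^CT]all$'     # Ball
printf '%s\n' "$words" | match_words '^Colou?r$'      # Color, Colour
printf '%s\n' "$words" | match_words '^f.n$'          # fan, fun
printf '%s\n' "$words" | match_words '^(Call|Ball)$'  # Call, Ball
```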
These core features form the basis of AWK scripting, but the language also provides more advanced programming constructs for building sophisticated logic.