6.0 Advanced Scripting Constructs
While pattern-action rules are the foundation of AWK, its utility extends into true scripting through a set of advanced constructs. The strategic importance of these features—built-in variables, control flow, arrays, and functions—is that they elevate AWK from a simple filter into a complete programming environment capable of managing state, implementing complex logic, and building reusable, modular code.
6.1 Managing State with Built-in Variables
AWK provides a rich set of built-in variables that are automatically managed by the runtime environment. These variables are essential tools that give a script context about the input data, such as the current line number (NR) or the number of fields on that line (NF). They also allow for fine-grained control over how data is processed and formatted by modifying variables like the field separator (FS) and output field separator (OFS).
Built-in variables fall into two categories: standard variables available in every POSIX-compliant AWK implementation, and extensions specific to GNU AWK (GAWK).
Table 1: Standard Built-in Variables
| Variable | Description |
| --- | --- |
| ARGC | The number of command-line arguments, i.e., the number of elements in ARGV. |
| ARGV | An array of the command-line arguments, indexed from 0 to ARGC - 1; options and the program text are excluded. |
| FILENAME | The name of the current input file. |
| FS | The input field separator. The default, a single space, splits fields on runs of spaces and tabs and ignores leading and trailing blanks. Ex: awk 'BEGIN {print "FS = " FS}' |
| NF | The number of fields in the current input record. Ex: echo "One Two Three" \| awk 'NF > 2' |
| NR | The total number of input records processed so far. Ex: printf 'One\nTwo\nThree\n' \| awk 'NR < 3' |
| OFS | The output field separator (default is a space). |
| ORS | The output record separator (default is a newline). |
| RS | The input record separator (default is a newline). |
| $0 | The entire current input record. |
| $n | The nth field in the current record (e.g., $1, $2). |
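The standard variables are easiest to understand when they work together. The following is a minimal sketch. It uses made-up colon-separated sample records and a hypothetical shell function named show_users. The BEGIN rule sets FS and OFS, and the main rule prints NR, NF, and two fields of each record.

```shell
#!/bin/sh
# Sketch: FS, OFS, NR, and NF working together.
# The colon-separated sample records are made up for illustration.
show_users() {
  # BEGIN sets the separators before any input is read; the main rule then
  # prints the record number, the field count, and fields 1 and 3.
  printf 'alice:x:1001:Alice\nbob:x:1002:Bob\n' |
    awk 'BEGIN { FS = ":"; OFS = " | " }
         { print NR, NF, $1, $3 }'
}

show_users
```

This prints "1 | 4 | alice | 1001" and "2 | 4 | bob | 1002". Because the fields are separated by commas in the print statement, print joins them with OFS. Concatenating them without commas would bypass OFS entirely.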
Table 2: GNU AWK (GAWK) Specific Variables
| Variable | Description |
| --- | --- |
| ARGIND | The index in ARGV of the current file being processed. |
| ERRNO | A string describing the error when a getline redirection or read, or a close() call, fails. |
| IGNORECASE | If set to a non-zero value, makes regular expression matching and string comparisons case-insensitive. |
| PROCINFO | An associative array containing information about the running process (e.g., PROCINFO["pid"] holds its process ID). |
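IGNORECASE is a GAWK extension. Other awk implementations treat it as an ordinary variable with no special effect. For that reason, this sketch calls gawk explicitly and first checks that it is installed. The log lines and the count_errors function name are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: case-insensitive matching with GAWK's IGNORECASE.
# Requires gawk; the log lines are made up for illustration.
count_errors() {
  # With IGNORECASE = 1, /^error/ also matches "Error" and "ERROR".
  printf 'Error: disk full\nerror: retry\nINFO: ok\n' |
    gawk 'BEGIN { IGNORECASE = 1 }
          /^error/ { n++ }
          END { print n + 0 }'
}

if command -v gawk >/dev/null 2>&1; then
  count_errors
else
  echo "gawk not found: IGNORECASE is a GAWK extension" >&2
fi
```

Under gawk this prints 2, because both "Error" and "error" lines match.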
6.2 Building Logic with Control Flow and Loops
Beyond simple pattern-action rules, AWK provides the standard control structures familiar from C-like languages. They let scripts make decisions, execute code conditionally, and repeat actions.
Conditional Statements
- if statement: Executes an action if a condition is true.
- Syntax: if (condition) action
- Example: awk 'BEGIN { num = 10; if (num % 2 == 0) printf "%d is an even number.\n", num }'
- if-else statement: Executes one action if a condition is true and another if it is false.
- Syntax: if (condition) action-1 else action-2
- Example: awk 'BEGIN { num = 11; if (num % 2 == 0) printf "%d is even.\n", num; else printf "%d is odd.\n", num }'
- if-else-if ladder: Chains multiple conditions. The first condition that is true selects the action, and the remaining branches are skipped.
- Syntax: if (condition-1) action-1 else if (condition-2) action-2 ... else action-n
- Example: awk 'BEGIN { a = 30; if (a == 10) print "a = 10"; else if (a == 20) print "a = 20"; else if (a == 30) print "a = 30" }'
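The examples above run entirely in BEGIN, but the same statements work inside a main rule, where they run once per input record. Here is a minimal sketch with made-up numeric input and a hypothetical classify function.

```shell
#!/bin/sh
# Sketch: an if-else-if ladder applied to every input record.
# The numbers fed in are sample data.
classify() {
  printf '5\n-3\n0\n' |
    awk '{
      if ($1 > 0)
        print $1, "positive"
      else if ($1 < 0)
        print $1, "negative"
      else
        print $1, "zero"
    }'
}

classify
```

Each input line is compared numerically, so the output is "5 positive", "-3 negative", and "0 zero".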
Looping Constructs
- for loop: Repeats an action a specific number of times.
- Syntax: for (initialization; condition; increment/decrement) action
- Example: awk 'BEGIN { for (i = 1; i <= 5; ++i) print i }'
- while loop: Repeats an action as long as a condition remains true.
- Syntax: while (condition) action
- Example: awk 'BEGIN { i = 1; while (i < 6) { print i; ++i } }'
- do-while loop: Similar to while, but the condition is checked at the end, guaranteeing the action executes at least once.
- Syntax: do action while (condition)
- Example: awk 'BEGIN { i = 1; do { print i; ++i } while (i < 6) }'
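A common real-world use of the for loop is walking over the fields of each record, using NF as the upper bound. The following sketch sums the numbers on each line of made-up input; the row_sums name is hypothetical.

```shell
#!/bin/sh
# Sketch: a for loop over the fields of each record, bounded by NF.
# total is reset for every record; fields are numbered from 1 to NF.
row_sums() {
  printf '1 2 3\n10 20\n' |
    awk '{
      total = 0
      for (i = 1; i <= NF; i++)
        total += $i
      print "line " NR ": " total
    }'
}

row_sums
```

This prints "line 1: 6" and "line 2: 30".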
Loop Control and Script Termination
- break: Immediately terminates the innermost enclosing loop.
- Example: awk 'BEGIN { sum = 0; for (i = 0; i < 20; ++i) { sum += i; if (sum > 50) break }; print "Sum =", sum }'
- continue: Skips the remainder of the current loop iteration and proceeds to the next one.
- Example: awk 'BEGIN { for (i = 1; i <= 20; ++i) { if (i % 2 != 0) continue; print i } }'
- exit: Stops the script, optionally returning a status code. When called from BEGIN or a main rule, AWK reads no further input but still runs any END rules; when called inside END, it stops immediately.
- Example: awk 'BEGIN { sum = 0; for (i = 0; i < 20; ++i) { sum += i; if (sum > 50) exit 10 } }'; echo $?
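The following sketch shows two details of exit. First, calling exit from a main rule stops reading input but still runs the END rule. Second, the value passed to exit becomes awk's exit status, which the shell sees in $?. The input numbers and the first_negative function name are made up for illustration.

```shell
#!/bin/sh
# Sketch: exit from a main rule still runs END, and its value becomes
# the exit status of awk. The input numbers are sample data.
first_negative() {
  printf '4\n7\n-2\n9\n' |
    awk '$1 < 0 { print "first negative at line " NR; found = 1; exit 3 }
         END    { if (!found) print "no negative values" }'
}

first_negative
echo "exit status: $?"
```

The script prints "first negative at line 3" followed by "exit status: 3". The END rule runs but prints nothing, because found is already set.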
6.3 Using Associative Arrays for Data Aggregation
AWK includes a powerful data structure known as an associative array. Unlike traditional arrays indexed by sequential integers, AWK arrays are indexed by strings, and numeric subscripts are converted to strings. This single feature turns AWK from a simple line processor into a data aggregation engine. It is well suited to tasks such as counting word frequencies, summarizing logs by IP address, or grouping data by category.
The syntax for creating an array element is simple assignment: array_name[index] = value
The following example creates an array named fruits where the keys are fruit names and the values are their colors.
[jerry]$ awk 'BEGIN {
    fruits["mango"] = "yellow";
    fruits["orange"] = "orange";
    print fruits["orange"] "\n" fruits["mango"];
}'
Output:
orange
yellow
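Counting word frequencies combines both features described above: string keys, and the for (key in array) loop for visiting every element. The following is a minimal sketch with made-up input text and a hypothetical word_freq function. The traversal order of for-in is unspecified, so the result is piped through sort to make it predictable.

```shell
#!/bin/sh
# Sketch: counting word frequencies with an associative array.
# count[word] holds one counter per distinct word; END prints them all.
# for-in order is unspecified, so the output is sorted afterwards.
word_freq() {
  printf 'the cat saw the dog\nthe dog ran\n' |
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] }' |
    sort
}

word_freq
```

The output is "cat 1", "dog 2", "ran 1", "saw 1", and "the 3", one per line.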
To remove an element from an array, the delete statement is used:
[jerry]$ awk 'BEGIN {
    fruits["mango"] = "yellow";
    fruits["orange"] = "orange";
    delete fruits["orange"];
    print fruits["orange"];
}'
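This command prints only an empty line. The "orange" element was removed before the print statement, and referencing a missing element yields the empty string (and silently re-creates the element with that empty value).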
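To check whether a key exists without creating it, AWK provides the `in` operator. The following sketch reuses the fruits data from the example above; the check_keys wrapper and the "apple" key are illustrative additions. It contrasts `in` with a plain reference, which creates the element as a side effect.

```shell
#!/bin/sh
# Sketch: testing array membership with "in" versus a plain reference.
check_keys() {
  awk 'BEGIN {
    fruits["mango"] = "yellow"
    fruits["orange"] = "orange"
    delete fruits["orange"]

    # "in" tests for a key without creating it
    if ("orange" in fruits) print "orange present"
    else print "orange absent"

    # A plain reference to a missing element creates it with an empty value
    x = fruits["apple"]
    if ("apple" in fruits) print "apple now present"
  }'
}

check_keys
```

This prints "orange absent" and then "apple now present".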
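POSIX AWK supports only one-dimensional arrays, but multi-dimensional arrays can be simulated by combining several indices into a single string key. The built-in syntax array[0, 0] does this automatically, joining the indices with the SUBSEP variable (default "\034"). Building the key by hand, as in array["0,0"], also works. GAWK additionally supports true arrays of arrays, written array[0][0].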
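The following sketch shows the comma-subscript simulation in practice. It fills a small grid, tests membership with the parenthesized `(i, j) in array` form, and uses split() on SUBSEP to recover the original indices from a stored key. The grid values and the grid_demo name are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: simulating a two-dimensional array with the comma subscript syntax.
grid_demo() {
  awk 'BEGIN {
    # grid[r, c] is stored under the single key r SUBSEP c
    for (r = 0; r < 2; r++)
      for (c = 0; c < 3; c++)
        grid[r, c] = r * 3 + c
    print grid[1, 2]

    # Membership tests use the same parenthesized comma syntax
    if ((0, 1) in grid) print "(0,1) exists"

    # split() on SUBSEP recovers the individual indices from a key
    for (key in grid) {
      split(key, idx, SUBSEP)
      if (idx[1] == "1" && idx[2] == "0")
        print "row", idx[1], "col", idx[2], "=", grid[key]
    }
  }'
}

grid_demo
```

This prints "5", "(0,1) exists", and "row 1 col 0 = 3".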
6.4 Creating Modular Code with Functions
AWK supports both built-in and user-defined functions, providing a mechanism for creating reusable and modular scripts. This is essential for managing complexity in larger programs and promoting code that is easier to read, test, and maintain. AWK provides numerous built-in functions, grouped into Arithmetic, String, Time, and Bit Manipulation operations. The time and bit-manipulation functions are GAWK extensions and may be missing from other implementations.
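The following sketch exercises a few portable arithmetic and string built-ins: length, substr, toupper, gsub, sqrt, and printf. It uses a sample input line and a hypothetical string_funcs wrapper. Note that gsub modifies $0 in place and returns the number of substitutions it made.

```shell
#!/bin/sh
# Sketch: a few POSIX string and arithmetic built-ins on one sample line.
string_funcs() {
  echo "hello, awk world" |
    awk '{
      print length($0)                  # number of characters in the record
      print toupper(substr($0, 1, 5))   # first five characters, upper-cased
      n = gsub(/o/, "0")                # replace every "o" in $0; returns the count
      print n, $0
      printf "%.2f\n", sqrt(2)          # formatted square root
    }'
}

string_funcs
```

This prints "16", "HELLO", "2 hell0, awk w0rld", and "1.41".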
For custom logic, users can define their own functions.
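- General Syntax: function function_name(parameter_list) { body }
- Calling: function_name(arguments), with no space between the function name and the opening parenthesis.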
The following example, stored in a file named functions.awk, defines functions to find the minimum and maximum of two numbers and calls them from a main function executed within the BEGIN block.
File: functions.awk
# Returns the smaller of two numbers
function find_min(num1, num2) {
    if (num1 < num2) return num1
    return num2
}
# Returns the larger of two numbers
function find_max(num1, num2) {
    if (num1 > num2) return num1
    return num2
}
# Main function; the extra parameter "result" acts as a local variable
function main(num1, num2,    result) {
    # Find minimum number
    result = find_min(num1, num2)
    print "Minimum =", result
    # Find maximum number
    result = find_max(num1, num2)
    print "Maximum =", result
}
# Script execution starts here
BEGIN {
    main(10, 20)
}
When executed with awk -f functions.awk, this script produces:
Minimum = 10
Maximum = 20
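AWK has no local-variable declaration. By convention, extra parameters listed after the real ones (often separated by extra spaces) serve as locals. Scalars are passed by value and arrays by reference. The following sketch, with hypothetical function names sum_array and double_all, demonstrates both rules.

```shell
#!/bin/sh
# Sketch: local variables via extra parameters, and arrays passed by reference.
locals_demo() {
  awk '
    # Extra parameters after the real ones act as local variables.
    function sum_array(arr, n,    i, total) {
      for (i = 1; i <= n; i++) total += arr[i]
      return total
    }
    # Arrays are passed by reference, so changes are visible to the caller.
    function double_all(arr, n,    i) {
      for (i = 1; i <= n; i++) arr[i] *= 2
    }
    BEGIN {
      n = split("3 4 5", nums, " ")
      i = 99
      double_all(nums, n)
      print sum_array(nums, n)
      print i
    }'
}

locals_demo
```

This prints "24", the sum of the doubled values. It then prints "99", because the global i is untouched by the local i inside both functions.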
Once the core logic for processing data is in place, the final step is to manage how the results are presented and delivered.