3.0 The AWK Programming Model: Read, Execute, Repeat
3.1. The Core Workflow
AWK’s processing model is simple: it reads input one record at a time and runs the same program against each record. Understanding this Read, Execute, Repeat workflow, together with the BEGIN/Body/END program structure, is the most important step toward developing an intuition for the language. The model underlies every AWK script, from single-line commands to complex programs.
3.2. Analyzing the Workflow
The AWK interpreter processes input through a clear, three-stage cycle:
- Read: AWK reads a single line (by default) from an input stream—which can be a file, a pipe from another command, or standard input—and stores it in memory. This line is referred to as a “record.”
- Execute: The interpreter applies all specified AWK commands to the record currently in memory. Execution can be made conditional by specifying patterns that the record must match.
- Repeat: This cycle of reading and executing continues sequentially, record by record, until the end of the input stream is reached.
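The cycle above can be sketched with a one-line program. This is a minimal illustration, not part of the original example set: the three input words are made up, and NR is AWK's built-in count of records read so far, used here only to make each pass through the cycle visible.

```shell
# Feed three lines to awk on standard input (the input words are illustrative).
# For every record read, the body block prints the record number (NR)
# followed by the record itself ($0).
output=$(printf 'alpha\nbeta\ngamma\n' | awk '{ print NR ": " $0 }')
echo "$output"
```

Each input line triggers one Read and one Execute step, so the program prints one numbered line per record and stops when the input ends.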
3.3. Deconstructing the Program Structure
An AWK script can contain up to three kinds of blocks, each of them optional, that correspond to different phases of the workflow.
The BEGIN Block
This block executes only once, before any input records are read. It is the ideal place for initialization tasks, such as setting the values of variables, printing report headers, or preparing data structures.
- Syntax: BEGIN {awk-commands}
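As a small sketch of the initialization role described above, the following program consists of a BEGIN block only. The variable name title and the header text are hypothetical. Because there is no body or END block, AWK runs the BEGIN action and exits without reading any input.

```shell
# A program made up only of a BEGIN block: awk runs the action once
# and exits without reading any input.
# "title" and its value are illustrative names, not from the original text.
output=$(awk 'BEGIN { title = "Marks Report"; print title }')
echo "$output"
```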
The Body Block
This is the main processing engine of an AWK script. The commands within this block are executed for each input record that is read. The execution can be restricted to only those records that match a specified /pattern/. If no pattern is provided, the commands are executed for every record.
- Syntax: /pattern/ {awk-commands}
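To illustrate pattern-restricted execution, here is a minimal sketch that runs the body block only for records containing the text Maths. The three input records are illustrative sample data.

```shell
# Only records matching the regular expression /Maths/ reach the action;
# the other two records are read but produce no output.
output=$(printf '1) Amit Physics 80\n2) Rahul Maths 90\n3) Shyam Biology 87\n' \
  | awk '/Maths/ { print }')
echo "$output"
```

Removing the /Maths/ pattern would make the same action run for all three records.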
The END Block
This block executes only once, after all input records have been read and processed. It is typically used for post-processing tasks, such as performing final calculations, printing summaries, or generating report footers.
- Syntax: END {awk-commands}
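A short sketch of the summary role of the END block: the body block increments a counter for each record, and the END block prints the total once the input is exhausted. The variable name count and the input lines are assumptions made for illustration.

```shell
# The body block runs once per record and increments a counter.
# The END block runs once, after the last record, and prints the total.
# "count" is an illustrative variable name.
output=$(printf 'first\nsecond\nthird\n' \
  | awk '{ count++ } END { print count " records read" }')
echo "$output"
```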
3.4. A Practical Example
Let’s consider a sample data file named marks.txt containing student records.
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
The following AWK script uses the BEGIN block to print a header and a Body block to print each line of the file.
[jerry]$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print}' marks.txt
Execution Analysis:
- BEGIN Block: Before reading marks.txt, AWK executes the BEGIN block, printing the formatted header string to the standard output.
- Body Block: AWK then begins its Read, Execute, Repeat cycle. It reads the first line, “1) Amit Physics 80”, and executes the body block’s action, {print}, which prints that line unchanged. The cycle repeats for each subsequent line in the file.
- END Block: No END block is present, so the program terminates after the last line is processed.
Formatted Output:
Sr No Name Sub Marks
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
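The example above uses only two of the three blocks. As a hedged extension, the sketch below adds an END block that prints a footer after the last record. The footer text and the temporary directory are assumptions for illustration. The data file is recreated in that directory so the snippet runs on its own.

```shell
# Recreate the sample data in a temporary directory so the snippet is self-contained.
workdir=$(mktemp -d)
cat > "$workdir/marks.txt" <<'EOF'
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
EOF

# BEGIN prints the header once, the body prints every record,
# and END prints a footer once (the footer text is illustrative).
output=$(awk '
  BEGIN { printf "Sr No\tName\tSub\tMarks\n" }
        { print }
  END   { print "--- end of report ---" }
' "$workdir/marks.txt")
echo "$output"

rm -r "$workdir"
```

The header appears before the first record and the footer after the last one, which mirrors the order in which AWK runs the BEGIN, body, and END blocks.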
With the core processing model understood, we can now explore the specific syntax for invoking AWK commands.