3.0 Core Concepts: The AWK Workflow and Program Structure
3.1 The “Read, Execute, Repeat” Workflow
AWK processes its input in a simple cycle that is well suited to line-by-line text analysis. This “Read, Execute, Repeat” workflow drives every AWK program, from one-liners to long scripts.
The workflow consists of three distinct phases that loop until all input is consumed:
- Read: AWK reads a single line (a record) from an input stream, which can be a file, a pipe from another command, or standard input, and stores it in memory.
- Execute: AWK then tests that line against each pattern in the program and runs the commands attached to every pattern that matches. Commands with no pattern run on every line, so patterns are how execution is restricted to lines that meet specific criteria.
- Repeat: This process continues for each subsequent line until the end of the input stream is reached.
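The cycle above can be observed with a minimal sketch. It assumes three made-up input lines supplied through a pipe by printf, and it relies on two AWK built-ins: NR, the number of lines read so far, and $0, the text of the current line.

```shell
# Feed three sample lines (hypothetical data) to awk through a pipe.
# The action has no pattern, so it runs once for every line that is read.
numbered=$(printf 'alpha\nbeta\ngamma\n' | awk '{ print NR ": " $0 }')
echo "$numbered"
```

Each input line comes back prefixed with its line number (1: alpha, 2: beta, 3: gamma), which shows the action running once per pass through the loop.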
3.2 The Anatomy of an AWK Program
An AWK program is structured around three optional blocks, each designed to execute at a different stage of the program lifecycle. This structure allows for initialization before processing, per-line actions during processing, and final reporting after processing is complete. The Body block corresponds to the “Execute” phase of the workflow described above, while the BEGIN and END blocks run once each, outside the loop.
- The BEGIN block:
  - Syntax: BEGIN { awk-commands }
  - Purpose: This block executes exactly once, before the “Read, Execute, Repeat” loop begins. It is the ideal place for tasks such as initializing variables, printing report headers, or setting configuration options. It is optional.
- The Body block:
  - Syntax: /pattern/ { awk-commands }
  - Purpose: This is the core processing block, representing the “Execute” phase inside the main loop. It executes for each line of input. If a pattern is specified, the block’s commands only run on lines that match the pattern. If the pattern is omitted, the block executes for every line of input.
- The END block:
  - Syntax: END { awk-commands }
  - Purpose: This block executes exactly once, after the “Read, Execute, Repeat” loop has terminated. It is typically used for final calculations, such as computing totals or averages, and for printing summary reports or footers. It is also optional.
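As a rough sketch of how the three blocks combine, the following program prints and counts the input lines that match a pattern. The sample lines, the pattern /error/, and the variable name count are illustrative assumptions and do not come from the examples in this section.

```shell
# The four sample lines below are made up for illustration.
summary=$(printf 'ok\nerror: disk\nok\nerror: net\n' | awk '
BEGIN   { print "Matching lines:" }   # runs once, before any input is read
/error/ { print "  " $0; count++ }    # runs only for lines that match the pattern
END     { print "Total: " count }     # runs once, after the last line is processed
')
echo "$summary"
```

The header is printed first, then the two matching lines, and finally "Total: 2". The variable count needs no initialization in BEGIN because AWK treats an unset variable as zero in a numeric context.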
To illustrate these concepts, consider a file named marks.txt containing student records.
Input File: marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
The following command uses a BEGIN block to print a formatted header and a Body block to print each line of the file.
AWK Command:
[jerry]$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print}' marks.txt
Final Output:
Sr No Name Sub Marks
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Here, the BEGIN block executes first, printing the header. Then, for each line in marks.txt, the Body block’s {print} command executes, printing the line to standard output. With this structural foundation in place, we can now explore the practical mechanics of invoking an AWK program, whether as a quick command-line filter or a robust, reusable script.
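Before moving on, the same records can also exercise the END block. The sketch below is not part of the original example: it recreates the five marks.txt records inline so that it runs without the file, and it assumes the marks are the fourth whitespace-separated field ($4). The variable names marks, total, and average are chosen for illustration.

```shell
# Recreate the five sample records from marks.txt inline (self-contained sketch).
marks='1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89'

# With the default whitespace field splitting, $4 is the marks column.
average=$(printf '%s\n' "$marks" | awk '
    { total += $4 }                                          # body: runs for every record
END { if (NR > 0) printf "Average: %.1f\n", total / NR }     # runs once, after all input
')
echo "$average"
```

The Body block accumulates the marks (80 + 90 + 87 + 85 + 89 = 431), and the END block divides by NR, the number of records read, to print "Average: 86.2". The NR > 0 check guards against division by zero on empty input.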