5.0 Mastering AWK’s Built-in Variables
5.1. The Role of Built-in Variables
Built-in variables are a critical feature of AWK, providing automatic access to information about the input data, the execution environment, and the program’s state. Mastering their use is essential for writing powerful, concise, and idiomatic AWK scripts.
5.2. Standard AWK Variables
- ARGC & ARGV: ARGC contains the number of command-line arguments, and ARGV is an array holding the arguments themselves, indexed from 0 to ARGC - 1. ARGV[0] holds the name of the command (typically awk); options and the program text are not included.
- NF (Number of Fields): This variable holds the total number of fields in the current record. It is recalculated for every record read.
- NR (Number of Records): This variable acts as a global counter, storing the cumulative number of input records read so far across all input files.
- FNR (File Number of Records): Similar to NR, FNR stores the record number, but counting restarts for each new input file, so the first record of every file has FNR equal to 1. This makes it invaluable for multi-file operations. To illustrate, consider two files: data1.txt containing A\nB and data2.txt containing C\nD. When both files are read in one awk invocation, NR runs 1, 2, 3, 4, whereas FNR runs 1, 2, 1, 2: FNR resets to 1 for data2.txt, while NR continues to increment.
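- FS (Field Separator): Defines the character or regular expression used to separate fields in the input record. The default is a single space, which AWK treats specially: fields are separated by runs of spaces, tabs, and newlines, and leading and trailing whitespace is ignored.
- RS (Record Separator): Defines the character used to separate records in the input. The default is a newline character (\n). An empty RS selects paragraph mode, in which blank lines separate records; GAWK additionally accepts a regular expression.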
- OFS (Output Field Separator): Defines the separator placed between fields in the output when the print statement is given multiple comma-separated arguments. The default is a single space.
- ORS (Output Record Separator): Defines the string appended to the output of every print statement. The default is a newline character (\n).
- FILENAME: This variable holds the name of the current input file being processed.
- `$0` and `$n`: As previously noted, `$0` represents the entire current record, while `$n` (e.g., `$1`, `$2`) represents the nth field in that record.
- RSTART & RLENGTH: Set by the match() function. After a successful match, RSTART holds the 1-based starting position of the matched substring and RLENGTH holds its length. If there is no match, RSTART is 0 and RLENGTH is -1.
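- CONVFMT & OFMT: These variables control the format for converting numbers to strings (CONVFMT) and the output format for numbers printed by print (OFMT). The default for both is %.6g. Integer values are converted as integers regardless of either setting.
- SUBSEP: This variable specifies the subscript separator used to simulate multi-dimensional arrays: a reference such as a[i, j] is stored under the single key i SUBSEP j. The default is the non-printing character \034.
- ENVIRON: An associative array that provides access to the environment variables inherited by the AWK process, indexed by name (for example, ENVIRON["HOME"]).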
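Several of the variables above are easiest to understand by watching them change. The following is a minimal sketch, not a definitive reference: it recreates the data1.txt and data2.txt files described earlier in a temporary directory, and the shell variable names (tmp, nr_fnr, fields, m, keys) and the sample record root:x:0 are assumptions introduced purely for illustration. It relies only on POSIX awk features.

```shell
#!/bin/sh
# Hypothetical sample files, matching the data1.txt/data2.txt example in the text.
tmp=$(mktemp -d)
printf 'A\nB\n' > "$tmp/data1.txt"
printf 'C\nD\n' > "$tmp/data2.txt"

# NR keeps counting across files; FNR restarts at 1 in each file.
nr_fnr=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, $0 }' data1.txt data2.txt)
echo "$nr_fnr"
# data1.txt 1 1 A
# data1.txt 2 2 B
# data2.txt 3 1 C
# data2.txt 4 2 D

# FS splits the input on ":", OFS joins the printed values, NF counts the fields.
fields=$(printf 'root:x:0\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print NF, $1, $NF }')
echo "$fields"   # 3-root-0

# match() sets RSTART (1-based position) and RLENGTH (length of the match).
m=$(awk 'BEGIN { if (match("foobar", /o+/)) print RSTART, RLENGTH }')
echo "$m"        # 2 2

# SUBSEP joins the subscripts of a[1, 2] into one string key, so the key
# can be split back into its two parts.
keys=$(awk 'BEGIN { a[1, 2] = "x"; for (k in a) { n = split(k, p, SUBSEP); print n, p[1], p[2] } }')
echo "$keys"     # 2 1 2

rm -rf "$tmp"
```

Setting FS and OFS in a BEGIN block ensures they take effect before the first record is read; assigning them inside a main rule would only affect later records.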
5.3. GNU AWK (GAWK) Specific Variables
GAWK provides several extensions to the POSIX standard through its own set of built-in variables.
- ARGIND: The index in the ARGV array that points to the current file being processed.
- IGNORECASE: If set to a non-zero value, enables case-insensitive pattern matching for all regular expression operations.
- ERRNO: When a redirection or close() operation fails, this variable contains a system-dependent error message string.
- FIELDWIDTHS: A space-separated list of numbers that instructs GAWK to parse records based on fixed field widths instead of using the FS variable.
- LINT: Provides dynamic control over linting. Setting it enables warnings about non-portable or questionable code constructs.
- PROCINFO: An associative array containing information about the running process, such as its process ID (PROCINFO["pid"]).
- TEXTDOMAIN: Used for internationalization, specifying the text domain for retrieving localized program strings.
- BINMODE: Used on non-POSIX systems to specify binary mode for file I/O operations.
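A few of the GAWK variables above can be tried with short one-liners. This is a hedged sketch that assumes the gawk binary is installed under that name and is running in its default (non-POSIX, non-traditional) mode. The shell variable names (fixed, ci, err), the sample input, and the path /nonexistent/file are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# These variables are gawk extensions, so the examples run only when gawk is available.
if command -v gawk >/dev/null 2>&1; then
    # FIELDWIDTHS: split each record into fixed-width columns of 3, 2, and 4 characters.
    fixed=$(printf 'abc12wxyz\n' | gawk 'BEGIN { FIELDWIDTHS = "3 2 4" } { print $1, $2, $3 }')
    echo "$fixed"   # abc 12 wxyz

    # IGNORECASE: a non-zero value makes regular expression matching case-insensitive,
    # so /hello/ matches the first record, "Hello".
    ci=$(printf 'Hello\nworld\n' | gawk 'BEGIN { IGNORECASE = 1 } /hello/ { print NR }')
    echo "$ci"      # 1

    # ERRNO: after a failed getline redirection it holds a system-dependent message.
    # /nonexistent/file is a hypothetical path assumed not to exist.
    err=$(gawk 'BEGIN { if ((getline line < "/nonexistent/file") < 0) print ERRNO }')
    echo "$err"
else
    echo "gawk not found; skipping gawk-only examples" >&2
fi
```

Because the exact ERRNO text varies between systems, scripts are usually better off testing the return value of getline or close() and treating ERRNO as a message for the user.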
Now that we understand how AWK stores and provides access to data via variables, the next logical step is to explore the operators used to manipulate that data.