For the record hello world, $0 is hello world, $1 is hello, $2 is world, and $3 is empty.
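The field splitting described above can be checked from a shell. This is a minimal sketch: the printf pipeline is only an assumed way of supplying one record, and NF (the number of fields) is a built-in variable not covered above.

```sh
# Feed the single record "hello world" to awk and show how it is split.
fields=$(printf 'hello world\n' | awk '{ print $1 "|" $2 "|" $3 "|" NF }')
echo "$fields"   # hello|world||2
```

Reading $3 yields the empty string but does not create a third field, so NF stays at 2.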
Arrays
Awk uses associative arrays. An associative array stores values under keys, which can be arbitrary strings. You access a value with square brackets, such as m[1].
In match($0, /.../, m), awk stores captured text in the array m (the third argument is a GNU awk extension). m[0] is the whole match, and m[1] is the text captured by the first parenthesised group.
m[1]
if (match($0, /hello ([^ ]+)/, m)) { print m[1]}
For the record hello world, this prints world.
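Because keys are arbitrary strings, an associative array also works as a counter table. The sketch below is illustrative only: the array name seen and the sample input are assumptions, not part of the original program.

```sh
# Count how often each first field occurs, using the field text as the key.
counts=$(printf 'apple\npear\napple\n' | awk '
    { seen[$1]++ }                            # create or increment the entry
    END { print seen["apple"], seen["pear"] }
')
echo "$counts"   # 2 1
```

An entry comes into existence the first time it is referenced, so no initialisation step is needed.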
Variables
User-defined variables
Awk does not require explicit variable declarations. A variable is created when it is assigned for the first time. User-defined variables are not scoped to one record; they persist across records until they are reassigned.
An uninitialised variable evaluates to the empty string in string context and to 0 in numeric context.
zon_url = m[1]
zon_url = m[1]
This assigns the first captured match to zon_url, so that it can be used later in the program.
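A small sketch of both properties described above, persistence across records and the value of an uninitialised variable. The names total and never_set are hypothetical and chosen for illustration.

```sh
vars=$(printf '3\n4\n' | awk '
    { total += $1 }     # total is never declared; it starts out acting as 0
    END { print total, (never_set == 0), (never_set == "") }
')
echo "$vars"   # 7 1 1
```

total keeps its value from one record to the next, and never_set compares equal to both 0 and the empty string.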
NR
NR is the total number of records read so far across all input files.
It starts at 1 for the first record and increases by 1 each time awk reads a new record.
FNR
FNR is the number of the current record within the current input file.
It resets to 1 whenever awk starts a new file.
FNR == NR is true only while awk is reading the first input file.
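To watch the two counters diverge, one can print them side by side for two small files. The file names below are made up for this sketch, and mktemp is assumed to be available.

```sh
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/first.txt"
printf 'b1\n'     > "$tmp/second.txt"

# Print the global counter, the per-file counter, and the record itself.
counters=$(awk '{ print NR, FNR, $0 }' "$tmp/first.txt" "$tmp/second.txt")
echo "$counters"
# 1 1 a1
# 2 2 a2
# 3 1 b1

rm -rf "$tmp"
```

On the third line NR has reached 3 while FNR has dropped back to 1, which is why FNR == NR stops being true there.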
Statements
Awk has statements that are written without parentheses. This includes keywords such as print and delete, which are not function calls.
Important
Statements like print and delete do not need parentheses, whereas ordinary function calls such as match(...) do.
print
print writes output. With no arguments, it prints the current record, $0.
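A short sketch of three common forms of print. The comma-separated form joins its arguments with the output field separator OFS, which defaults to a single space; OFS is not covered above.

```sh
printed=$(printf 'hello world\n' | awk '{
    print            # no argument: the whole record
    print $2, $1     # comma: arguments joined by OFS
    print $2 $1      # no comma: plain string concatenation
}')
echo "$printed"
# hello world
# world hello
# worldhello
```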
Functions
Awk has built-in functions and user-defined functions.
match
match(s, r, a) searches the string s for the POSIX extended regular expression r. If the search succeeds, it returns the 1-based position of the match in s and sets RSTART and RLENGTH accordingly; if it fails, it returns 0. If a third argument a is given (a GNU awk extension), awk stores the whole match in a[0] and the captured subexpressions in a[1], a[2], and so on.
Important
match() is often used inside a condition, because a successful match returns a non-zero value.
For example, after match($0, /hello ([^ ]+)/, m) on the record hello world, m[0] is hello world and m[1] is world.
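The array argument is specific to GNU awk. Where only the two-argument form is available, RSTART and RLENGTH can be combined with substr() to recover the same text. This is a sketch under that assumption, not the approach the original program takes; the sample record and the variable names whole and word are invented for illustration.

```sh
found=$(printf 'say hello world now\n' | awk '
    match($0, /hello [^ ]+/) {
        # RSTART is where the match begins, RLENGTH is how long it is.
        whole = substr($0, RSTART, RLENGTH)
        # Drop the 6 characters of "hello " to keep only the following word.
        word = substr(whole, 7)
        print RSTART, RLENGTH, word
    }
')
echo "$found"   # 5 11 world
```

Here match() is used directly as the rule's pattern, which works because it returns a non-zero position on success.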
User-defined functions
A user-defined function is declared with the function keyword. It can take parameters and return a value with return.
function
function double(x) { return x * 2}
This defines a function that returns twice its argument.
Awk also allows local variables to be listed after the parameters; by convention, they are separated from the real parameters by extra spaces in the function header.
local variables
function flush(x,    i) { i = x; print i }
Here, x is a parameter and i is a local variable of flush.
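To illustrate that such extra parameters really are local, the sketch below uses a hypothetical function sum_to whose loop counter shares its name with a global variable.

```sh
result=$(awk '
    # n is the real parameter; i and total are locals by convention.
    function sum_to(n,    i, total) {
        for (i = 1; i <= n; i++) total += i
        return total
    }
    BEGIN {
        i = "outer"           # a global that happens to be called i
        print sum_to(4), i    # the call leaves the global i untouched
    }
')
echo "$result"   # 10 outer
```

Each call starts with fresh, uninitialised locals, so total begins at 0 every time.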
Operators
Pattern Matching Operator
The pattern matching operator ~ tests whether the left-hand side matches a POSIX extended regular expression on the right-hand side.
The operator !~ is the negated form.
~
$0 ~ /^[[:space:]]*\{[[:space:]]*$/
This is true when the current record consists of a single literal {, optionally surrounded by whitespace.
!~
$0 !~ /foo/
This is true when the current record does not match foo.
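Used as a rule pattern without an action, either operator works as a line filter, because a rule with no action prints the matching record. A minimal sketch with invented input:

```sh
matched=$(printf 'foo\nbar\nfood\n' | awk '$0 ~ /foo/')
unmatched=$(printf 'foo\nbar\nfood\n' | awk '$0 !~ /foo/')
echo "$matched"     # foo and food, one per line
echo "$unmatched"   # bar
```

food matches as well, since the regular expression is not anchored and may match anywhere in the record.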
Conditionals
if
if runs one block when a condition is true. If the condition is false, awk skips that block. You can also add else and else if branches.
A condition can be any expression whose value is treated as true or false.
For example, match($0, /re/, m) is used as a condition because it returns a non-zero value when the regular expression matches the current record.
if
if ($1 == "yes") { print $0}
This prints the current record only when the first field is yes.
else if adds another condition after an if or else if.
else if
if ($1 == "yes") { print "yes"} else if ($1 == "maybe") { print "maybe"}
This checks a second condition if the first one is false.
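A complete chain, including a final else branch, might look like the following sketch. The labels printed are arbitrary and serve only to show which branch ran.

```sh
labels=$(printf 'yes\nmaybe\nnope\n' | awk '{
    if ($1 == "yes") {
        print "accepted"
    } else if ($1 == "maybe") {
        print "undecided"
    } else {
        print "rejected"      # reached when no earlier condition was true
    }
}')
echo "$labels"
# accepted
# undecided
# rejected
```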
Ternary Operator
The ternary operator chooses between two values with a condition.
Ternary operator
print ($1 == "yes" ? "yes" : "no")
This prints yes when the condition is true, and no otherwise.
Pattern-Action Rules
An awk program is a list of pattern-action rules. Each rule has an optional pattern and an action block in braces. If the pattern is true, awk runs the action block.
A rule can look like this:
pattern { action }
The brace-delimited blocks are action blocks. Rules can be written one after another, and awk checks each rule, in order, for every record.
FNR == NR { ... }
FNR == NR { print $0}
This rule runs its action block only for records from the first input file.
next
next stops processing the current record and moves awk to the next one.
Without next, awk keeps checking later rules for the same record.
Important
next is useful when one rule handles a record completely and later rules should not see it.
next
{ if ($1 == "skip") next; print $0 }
This skips records whose first field is skip.
next in a rule
FNR == NR { print $0; next }
This prints records from the first input file and then stops awk from applying later rules to those same records.
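The effect of next on later rules can be seen with two rules in sequence. This is a sketch with invented input; the kept: prefix is only there to show which rule produced the output.

```sh
kept=$(printf 'a\nskip me\nb\n' | awk '
    $1 == "skip" { next }     # handled here; later rules never see the record
    { print "kept: " $0 }
')
echo "$kept"
# kept: a
# kept: b
```

Without the next, the second rule would also run for skip me and print it.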
BEGIN
BEGIN is a special pattern that runs before awk reads the first record. It is commonly used to initialise variables or print a header.
BEGIN
BEGIN { print "start"}
This runs once, before any input is processed.
END
END is a special pattern that runs after awk has read the last record. It is commonly used to print summaries or to flush buffered output.
END
END { if (in_item) { flush() }}
This runs once, after all input has been processed.
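BEGIN and END are often combined with an ordinary rule in between, for example to print a header, accumulate a value, and report it. The variable sum and the input numbers below are assumptions made for this sketch.

```sh
report=$(printf '3\n4\n5\n' | awk '
    BEGIN { print "start" }                      # before any input
    { sum += $1 }                                # once per record
    END { print "records: " NR ", sum: " sum }   # after the last record
')
echo "$report"
# start
# records: 3, sum: 12
```

Inside END, NR still holds the total number of records read.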
Examples
FNR == NR { ... }
This pattern runs only while awk reads the first input file. FNR counts records within the current file, but NR counts records across all files. When awk starts the second file, FNR resets to 1 but NR keeps increasing. So FNR == NR is false for later files.
For example:
FNR == NR { print $0 }
This prints every record from the first file and skips the later files because the condition is no longer true.
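A common use of this idiom is to load the first file into an array and then filter the second file against it. The sketch below is one way to do that, not necessarily how the original program uses it: the file names, the array allowed, and the in operator (which tests whether a key exists) are not taken from the text above.

```sh
tmp=$(mktemp -d)
printf 'alice\ncarol\n'               > "$tmp/allowed.txt"
printf 'alice 10\nbob 20\ncarol 30\n' > "$tmp/scores.txt"

joined=$(awk '
    FNR == NR { allowed[$1] = 1; next }   # first file: remember each name
    $1 in allowed { print $0 }            # second file: keep listed names
' "$tmp/allowed.txt" "$tmp/scores.txt")
echo "$joined"
# alice 10
# carol 30

rm -rf "$tmp"
```

The next in the first rule keeps records from the first file away from the second rule. Note that the idiom assumes the first file is not empty; if it is, FNR == NR is also true while awk reads the second file.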