Programming with awk

awk summary

The following sections summarize the features of awk.

Command line

   awk  program  filenames
   awk -f  program-file  filenames
   awk -Fs  sets field separator to string s; -Ft sets separator to tab
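
The invocation forms above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the input text and the temporary program file are made up, and the tab separator is written as a quoted backslash-t on the assumption that this spelling is accepted more widely than -Ft.

```shell
# Program given on the command line; -F: makes the colon the field separator.
second=$(printf 'x:y:z\n' | awk -F: '{ print $2 }')

# A tab separator, written here as -F followed by a quoted backslash-t.
tab=$(printf 'x\ty\n' | awk -F'\t' '{ print $2 }')

# Program read from a file with -f; the temporary file is purely illustrative.
prog=$(mktemp)
printf '{ print NF }\n' > "$prog"
nf=$(printf 'a b c\n' | awk -f "$prog")
rm -f "$prog"

echo "$second $tab $nf"
```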


Patterns

   /extended regular expression/
   relational expression
   pattern && pattern
   pattern || pattern
   pattern, pattern
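
As a rough sketch of how these pattern forms select records, the following runs three one-line programs over a small made-up input (the names and scores are hypothetical).

```shell
# Hypothetical input: a name and a score on each record.
data='alice 90
bob 55
carol 72
dave 40'

# Regular-expression pattern: records containing an "a" followed later by an "l".
re=$(printf '%s\n' "$data" | awk '/a.*l/ { print $1 }')

# Relational expressions joined with &&: score strictly between 50 and 80.
rel=$(printf '%s\n' "$data" | awk '$2 > 50 && $2 < 80 { print $1 }')

# Range pattern: from the record matching /bob/ through the one matching /carol/.
range=$(printf '%s\n' "$data" | awk '/bob/, /carol/ { print $1 }')

echo "$re"; echo "$rel"; echo "$range"
```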

Control flow statements

   if (expr) statement [else statement]
   if (subscript in array) statement [else statement]
   while (expr) statement
   for (expr; expr; expr) statement
   for (var in array) statement
   do statement while (expr)
   exit [expr]
   return [expr]
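
The statements above might be combined as in this sketch, which counts words in a made-up two-line input. Because the order of a for (var in array) loop is unspecified, only totals are printed.

```shell
out=$(printf 'a b a\nc a b\n' | awk '
{
    # for (expr; expr; expr): walk the fields of each record.
    for (i = 1; i <= NF; i++)
        count[$i]++
}
END {
    # if (subscript in array): membership test that creates no element.
    if ("z" in count)
        print "z seen"
    else
        print "z not seen"
    # for (var in array): visit every subscript, in unspecified order.
    for (w in count)
        total += count[w]
    print "total", total
    print "a", count["a"]
    exit 0
}')
echo "$out"
```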


Input-output

   close(filename)                 close file
   getline                         set $0 from next input record; set NF, NR, FNR
   getline <file                   set $0 from next record of file; set NF
   getline var                     set var from next input record; set NR, FNR
   getline var <file               set var from next record of file
   print                           print current record
   print expr-list                 print expressions
   print expr-list >file           print expressions on file
   printf fmt, expr-list           format and print
   printf fmt, expr-list >file     format and print on file
   system(cmd-line)                execute command cmd-line, return status

In print and printf above, >>file appends to the file, and | command writes on a pipe. Similarly, command | getline pipes into getline. getline returns 1 when a record is read, 0 on end of file, and -1 on error.
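
A minimal sketch of the getline return values, assuming a trivial echo command as the data source and a deliberately nonexistent (hypothetical) file path to provoke the error case:

```shell
out=$(awk 'BEGIN {
    # command | getline var reads one line of the command output per call.
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)
        n++
    close(cmd)
    # Reading from a file that cannot be opened reports an error.
    status = (getline x < "/nonexistent/hypothetical/file")
    print n, line, status
}')
echo "$out"
```

The loop tests for a result greater than 0 rather than for a merely true value, so that an error return of -1 does not keep the loop running.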


Functions

   func name(parameter list) { statement }
   function name(parameter list) { statement }
   function-name(expr, expr, . . .)
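
As an illustration, the sketch below defines and calls two hypothetical functions, max and sum_to. The extra parameters i and s in sum_to follow a common awk convention for local variables; that convention is an addition here, not something stated above.

```shell
out=$(awk '
# Hypothetical helper: return the larger of two numbers.
function max(a, b) {
    return a > b ? a : b
}
# Hypothetical helper: sum 1..n; i and s are extra parameters used as locals.
function sum_to(n,    i, s) {
    for (i = 1; i <= n; i++)
        s += i
    return s
}
BEGIN { print max(3, 7), max(10, 2), sum_to(4) }')
echo "$out"
```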

String functions

   gsub(r,s,t)               substitute string s for each substring matching extended
                             regular expression r in string t, return number of
                             substitutions; if t omitted, use $0
   index(s,t)                return index of string t in string s, or 0 if not present
   length(s)                 return length of string s
   match(s,r)                return position in s where extended regular expression r
                             occurs, or 0 if r is not present
   split(s,a,r)              split string s into array a on extended regular
                             expression r, return number of fields; if r omitted,
                             FS is used in its place
   sprintf(fmt, expr-list)   print expr-list according to fmt, return resulting string
   sub(r,s,t)                like gsub except only the first matching substring is
                             replaced
   substr(s,i,n)             return n-char substring of s starting at i; if n omitted,
                             use rest of s
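
The string functions can be exercised with a short sketch such as the following; the sample strings are arbitrary, and the comments give the value each line is expected to print.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                        # 11
    print index(s, "world")                # 7
    print substr(s, 7, 3)                  # wor
    print match(s, /o+/), RSTART, RLENGTH  # 5 5 1
    t = s
    n = gsub(/o/, "0", t)                  # every o replaced; n is 2
    print n, t                             # 2 hell0 w0rld
    sub(/l/, "L", t)                       # only the first l replaced
    print t                                # heLl0 w0rld
    k = split("a:b:c", parts, ":")
    print k, parts[1], parts[3]            # 3 a c
    print sprintf("%05.1f", 3.14159)       # 003.1
}')
echo "$out"
```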

Arithmetic functions

   atan2(y,x)     arctangent of y/x in radians
   cos(expr)      cosine (angle in radians)
   exp(expr)      exponential
   int(expr)      truncate to integer
   log(expr)      natural logarithm
   rand()         random number between 0 and 1
   sin(expr)      sine (angle in radians)
   sqrt(expr)     square root
   srand(expr)    new seed for random number generator; use time of day if no expr
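
A brief sketch of the arithmetic functions with arbitrary arguments. The value returned by rand() depends on the implementation and the seed, so the example only checks that it lies in the expected range.

```shell
out=$(awk 'BEGIN {
    print int(3.9), int(-3.9)           # truncation toward zero: 3 -3
    print sqrt(16), exp(0), log(1)      # 4 1 0
    printf "%.4f\n", atan2(1, 1) * 4    # pi to four places: 3.1416
    srand(1)                            # fixed seed, chosen arbitrarily
    r = rand()
    ok = (r >= 0 && r < 1)
    print ok                            # 1
}')
echo "$out"
```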

Operators (increasing precedence)

   = += -= *= /= %= ^=    assignment
   ?:                     conditional expression
   ||                     logical OR
   &&                     logical AND
   ~ !~                   extended regular expression match, negated match
   < <= > >= != ==        relationals
   blank                  string concatenation
   + -                    add, subtract
   * / %                  multiply, divide, mod
   + - !                  unary plus, unary minus, logical negation
   ^                      exponentiation (** is a synonym)
   ++ --                  increment, decrement (prefix and postfix)
   $                      field
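
A few consequences of this precedence order, shown as a sketch with arbitrary operands. The right-to-left grouping of ^ is not stated in the table above; it is included here as commonly documented awk behavior.

```shell
out=$(awk 'BEGIN {
    print 2 + 3 * 4      # * binds tighter than +: 14
    print -2 ^ 2         # ^ binds tighter than unary minus: -4
    print 2 ^ 3 ^ 2      # ^ groups right to left: 512
    print 1 " " 2 + 3    # + binds tighter than concatenation: 1 5
    x = 5
    y = x++ + 1          # postfix ++ yields the old value of x
    print y, x           # 6 6
    z = (3 > 2) ? "yes" : "no"
    print z              # yes
}')
echo "$out"
```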

Regular expressions (increasing precedence)

   c             matches non-metacharacter c
   \c            matches literal character c
   .             matches any character but newline
   ^             matches beginning of line or string
   $             matches end of line or string
   [abc...]      character class matches any of abc...
   [^abc...]     negated class matches any but abc...
   r1|r2         matches either r1 or r2
   r1r2          concatenation: matches r1, then r2
   r+            matches one or more r's
   r*            matches zero or more r's
   r?            matches zero or one r's
   r{low,high}   matches at least low but no more than high r's
   (r)           grouping: matches r
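
The sketch below applies several of these constructs to a made-up word list, tagging each match with a letter so the effect of every pattern is visible. The interval form r{low,high} is left out on the assumption that support for it varies between awk implementations.

```shell
out=$(printf '%s\n' 'cat' 'cats' 'concat' 'dog' 'a.b' 'axb' | awk '
/^cats?$/      { print "A:" $0 }   # anchored at both ends, optional s
/^(cat|dog)$/  { print "B:" $0 }   # grouping with alternation
/^[^c]/        { print "C:" $0 }   # negated class: first character is not c
/a\.b/         { print "D:" $0 }   # escaped metacharacter: a literal dot
')
echo "$out"
```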

Built-in variables

   ARGC        number of command-line arguments
   ARGV        array of command-line arguments (0..ARGC-1)
   FILENAME    name of current input file
   FNR         input record number in current file
   FS          input field separator (default blank)
   NF          number of fields in current input record
   NR          input record number since beginning
   OFMT        output format for numbers (default %.6g)
   OFS         output field separator (default blank)
   ORS         output record separator (default newline)
   RS          input record separator (default newline)
   RSTART      index of first character matched by match(); 0 if no match
   RLENGTH     length of string matched by match(); -1 if no match
   SUBSEP      separates multiple subscripts in array elements; default \034
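
As a small sketch of several of these variables working together, the following sets FS and OFS, prints NR and NF for each record of a made-up input, and checks how SUBSEP joins multiple subscripts. The array name grid is hypothetical.

```shell
out=$(printf 'a:b:c\nd:e\n' | awk '
BEGIN { FS = ":"; OFS = "-" }
{
    $1 = $1              # assigning to a field rebuilds $0 using OFS
    print NR, NF, $0
}
END {
    grid[1, 2] = "x"     # stored under the single subscript 1 SUBSEP 2
    if ((1 SUBSEP 2) in grid)
        print "subsep ok"
}')
echo "$out"
```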


Limits

Any particular implementation of awk enforces some limits. Here are typical values:

100 fields
2500 characters per input record
2500 characters per output record
1024 characters per individual field
1024 characters per printf string
400 characters maximum quoted string
400 characters in character class
15 open files
1 pipe
numbers are limited to what can be represented on the local machine,
for example, 1e-38..1e+38

Initialization, comparison, and type coercion

Each variable and field can potentially be a string or a number or both at any time. When a variable is set by the assignment

   var = expr
its type is set to that of the expression. (Assignment includes +=, -=, and so on.) An arithmetic expression is of type number, a concatenation is of type string, and so on. If the assignment is a simple copy, as in
   v1 = v2
then the type of v1 becomes that of v2.

In comparisons, if both operands are numeric, the comparison is made numerically. Otherwise, operands are coerced to string if necessary, and the comparison is made on strings. The type of any expression can be coerced to numeric by subterfuges such as

   expr + 0
and to string by
   expr ""
(that is, concatenation with a null string).
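
The effect of the two coercions can be seen in a sketch like this one, where the variable names a, b, n and m are arbitrary. Compared as strings, "10" sorts before "9"; compared as numbers, 10 exceeds 9.

```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"              # string constants
    if (a < b)
        print "string: 10 sorts before 9"
    if (a + 0 > b + 0)             # adding 0 forces a numeric comparison
        print "number: 10 exceeds 9"
    n = 10; m = 9                  # numeric constants
    if ((n "") < (m ""))           # concatenating "" forces a string comparison
        print "forced string: 10 sorts before 9"
}')
echo "$out"
```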

Uninitialized variables have the numeric value 0 and the string value "". Accordingly, if x is uninitialized,

   if (x) ...
is false, and
   if (!x) ...
   if (x == 0) ...
   if (x == "") ...
are all true. But the following is false:
   if (x == "0") ...
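
These cases can be checked directly with a short sketch in which x is never assigned; each comparison result is stored as 1 (true) or 0 (false) and then printed.

```shell
out=$(awk 'BEGIN {
    t = x ? "true" : "false"   # x is uninitialized, so the test fails
    print t
    r1 = (!x)
    r2 = (x == 0)
    r3 = (x == "")
    r4 = (x == "0")            # string comparison of "" with "0"
    print r1, r2, r3, r4
}')
echo "$out"
```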

The type of a field is determined by context when possible; for example,

   $1++
clearly implies that $1 is to be numeric, and
   $1 = $1 "," $2
implies that $1 and $2 are both to be strings. Coercion is done as needed.

In contexts where types cannot be reliably determined, for example,

   if ($1 == $2) ...
the type of each field is determined on input. All fields are strings; in addition, each field that contains only a number is considered numeric.
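
A sketch of how this plays out on made-up input records: for each record it prints whether $1 equals $2, whether $1 is less than $2, and whether $1 is less than $2 once both are forced to strings.

```shell
out=$(printf '10 10.0\nabc abd\n9 10\n' | awk '{
    eq  = ($1 == $2)              # numeric when both fields look like numbers
    lt  = ($1 < $2)
    slt = (($1 "") < ($2 ""))     # concatenation forces a string comparison
    print eq, lt, slt
}')
echo "$out"
```

In the first record, 10 and 10.0 are equal as numbers although they differ as strings; in the last, 9 is less than 10 numerically but "9" sorts after "10" as a string.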

Fields that are explicitly null have the string value ""; they are not numeric. Non-existent fields (that is, fields past NF) are treated this way, too.

As it is for fields, so it is for array elements created by split.

Mentioning a variable in an expression causes it to exist, with the value "" as described above. Thus, if arr[i] does not currently exist,

   if (arr[i] == "") ...
causes it to exist with the value "" so the if is satisfied. The special construction
   if (i in arr) ...
determines if arr[i] exists without the side effect of creating it if it does not.
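
The difference between the two tests can be observed by counting the elements of the array before and after each one, as in this sketch (the subscript "k" is arbitrary).

```shell
out=$(awk 'BEGIN {
    # Membership test: creates nothing.
    if ("k" in arr)
        a = 1
    else
        a = 0
    n1 = 0
    for (i in arr) n1++
    # Mentioning arr["k"] in a comparison creates the element.
    if (arr["k"] == "")
        b = 1
    n2 = 0
    for (i in arr) n2++
    print a, n1, b, n2
}')
echo "$out"
```

The output shows an empty array after the in test and one element after the comparison.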

© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004