Programming with awk

The getline function

awk's facility for automatically breaking its input into multi-line records is not adequate for some tasks. For example, if records are separated not by blank lines but by something more complicated, merely setting RS to the null string does not work. In such cases, the program itself must assemble each record and split it into fields. Here are some suggestions.

The getline function reads input from the current input, or from a file or pipe using redirection analogous to that of printf. By itself, getline fetches the next input record and performs the normal field-splitting operations on it; it sets $0, NF, NR, and FNR. getline returns 1 if a record was read, 0 at end of file, and -1 if an error occurred (such as failure to open a file).
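The three return values can be observed directly. The following is a minimal sketch: a shell function wrapping a small awk program. It assumes a POSIX-style awk on the PATH and that the path /nonexistent/getline-demo, chosen only for illustration, does not exist. A plain getline in a BEGIN action reads from the main input, which here is standard input.

```shell
#!/bin/sh
# getline_status_demo: hypothetical helper that prints the values
# returned by getline in three situations.
getline_status_demo() {
    awk 'BEGIN {
        r1 = getline          # a record is present: 1
        r2 = getline          # main input is exhausted: 0
        # the (assumed) missing file cannot be opened: -1
        r3 = (getline line < "/nonexistent/getline-demo")
        print r1, r2, r3
    }'
}

echo "only line" | getline_status_demo    # prints: 1 0 -1
```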

To illustrate, suppose the input consists of multi-line records, each of which begins with a line starting with START and ends with a line starting with STOP. The following awk program processes these records a line at a time, putting the lines of each record into consecutive entries of an array:

   f[1] f[2] ... f[nf]
Once the line containing STOP is encountered, the record can be processed from the data in the f array:
   /^START/ {
             f[nf=1] = $0
             while (getline && $0 !~ /^STOP/)
                  f[++nf] = $0
             # now process the data in f[1]...f[nf]
   }
Notice that this code relies on the fact that && evaluates its operands left to right and stops as soon as one is false. As a result, $0 is tested against /^STOP/ only after getline has returned a nonzero value.
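A complete, runnable version of the getline-based collector might look like the following sketch. The function name collect_records is hypothetical. So is the processing step, which prints the number of lines and the lines joined by |. A stray line between records, such as noise in the sample input, is ignored.

```shell
#!/bin/sh
# collect_records: hypothetical wrapper around the START/STOP collector.
# The "processing" (joining f[1]...f[nf] with "|") is a placeholder.
collect_records() {
    awk '
    /^START/ {
        f[nf = 1] = $0
        # && stops at the first false operand, so $0 is tested only
        # after getline has actually read a new line (> 0).
        while ((getline) > 0 && $0 !~ /^STOP/)
            f[++nf] = $0
        # Process f[1]...f[nf]; the STOP line itself is not stored.
        out = f[1]
        for (i = 2; i <= nf; i++)
            out = out "|" f[i]
        print nf ": " out
    }'
}

printf 'START a\nx\ny\nSTOP\nnoise\nSTART b\nSTOP\n' | collect_records
# prints:
# 3: START a|x|y
# 1: START b
```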

The same job can also be done by the following program:

   /^START/ && nf==0   { f[nf=1] = $0; next }
   /^STOP/ && nf > 0   { # now process the data in f[1]...f[nf]
                         nf = 0
                         next
                       }
   nf > 0              { f[++nf] = $0 }
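The pattern-only approach can also be sketched as a complete program. This version tests for STOP before the rule that appends lines, so the STOP line is not stored, which matches the getline-based collector. The next statements keep START and STOP lines from also reaching the append rule. The function name collect_records_patterns and the joining step are hypothetical.

```shell
#!/bin/sh
# collect_records_patterns: hypothetical; same job as the getline version,
# using only patterns and actions.
collect_records_patterns() {
    awk '
    # A START line opens a record only when none is in progress.
    /^START/ && nf == 0 { f[nf = 1] = $0; next }
    # A STOP line closes the record: process f[1]...f[nf], then reset.
    /^STOP/ && nf > 0 {
        out = f[1]
        for (i = 2; i <= nf; i++)
            out = out "|" f[i]
        print nf ": " out
        nf = 0
        next
    }
    # Any other line inside a record is appended to it.
    nf > 0 { f[++nf] = $0 }'
}

printf 'START a\nx\ny\nSTOP\nnoise\nSTART b\nSTOP\n' | collect_records_patterns
# prints:
# 3: START a|x|y
# 1: START b
```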

The statement

   getline x
reads the next record into the variable x. No splitting is done, so $0 and NF are unchanged, but NR and FNR are incremented. The statement
   getline <"file"
reads the next record from the file named file instead of from the current input. It has no effect on NR or FNR, but field splitting is performed, so $0 and NF are set. The statement
   getline x <"file"
reads the next record from file into x. No splitting is done, and $0, NF, NR, and FNR are untouched.
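The following sketch shows which variables each of these forms changes. It assumes mktemp is available to create a scratch file that stands in for file. The function name getline_forms_demo is hypothetical. The main input has two records and the scratch file has two lines.

```shell
#!/bin/sh
# getline_forms_demo: hypothetical; reports NR and NF after each form.
getline_forms_demo() {
    tmp=$(mktemp) || return 1          # scratch file standing in for "file"
    printf 'p q r\ns t\n' > "$tmp"
    awk -v file="$tmp" '
    NR == 1 {
        # getline var: the next main-input record goes into x.
        # NR and FNR advance; $0 and NF still describe record 1.
        getline x
        print "getline var:", x, "NR=" NR, "NF=" NF
        # getline <file: $0 and NF now come from the file; NR is unchanged.
        getline < file
        print "getline <file:", $0, "NR=" NR, "NF=" NF
        # getline var <file: only y changes; NF still describes "p q r".
        getline y < file
        print "getline var <file:", y, "NR=" NR, "NF=" NF
        close(file)
    }'
    rm -f "$tmp"
}

printf 'a b\nc d e f\n' | getline_forms_demo
# prints:
# getline var: c d e f NR=2 NF=2
# getline <file: p q r NR=2 NF=3
# getline var <file: s t NR=2 NF=3
```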

If the filename is an expression, enclose it in parentheses:

   while ( getline x < (ARGV[1] ARGV[2]) ) {  ... }
because < has higher precedence than concatenation. Without parentheses, a statement such as
   getline x < "tmp" FILENAME
reads the next line of the file tmp into x, not a line of the file whose name is tmp followed by the value of FILENAME. Also, with this form of getline, a statement like
   while ( getline x < file ) { ... }
loops forever if the file cannot be read. The reason is that getline returns -1, not 0, when an error occurs, and -1 counts as true. A better way to write this test is
   while ( (getline x < file) > 0 ) { ... }
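Both precautions can be combined in a loop that reads a whole file. In this sketch the directory and file name are passed in separately and joined inside awk, so the concatenation must be parenthesized. The function name read_all_lines is hypothetical, and the usage lines assume mktemp -d is available.

```shell
#!/bin/sh
# read_all_lines DIR NAME: hypothetical helper; prints how many lines
# could be read from DIR/NAME, or 0 if the file cannot be opened.
read_all_lines() {
    awk -v dir="$1" -v name="$2" 'BEGIN {
        n = 0
        # Parentheses make < apply to the whole concatenated name;
        # "> 0" stops the loop at end of file (0) and on error (-1).
        while ((getline line < (dir "/" name)) > 0)
            n++
        close(dir "/" name)
        print n
    }'
}

d=$(mktemp -d)
printf 'one\ntwo\n' > "$d/data.txt"
read_all_lines "$d" data.txt      # prints 2
read_all_lines "$d" missing.txt   # prints 0 rather than looping forever
rm -rf "$d"
```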

You can also pipe the output of another command directly into getline. For example, the statement

   while ("who" | getline)
          n++
executes who and pipes its output into getline. Each iteration of the while loop reads one more line and increments the variable n, so after the while loop terminates, n contains a count of the number of users. Similarly, the statement
   "date" | getline d
pipes the output of date into the variable d, setting d to the current date. Note that awk leaves the pipeline (and the resources associated with date) open after reading this one line; an explicit close("date") releases them. An explicit close("date") is also needed if date is to be run again later. Otherwise getline would try to read a second line from the first invocation. The table "getline function" summarizes the forms of getline.
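The two pipe idioms can be tried as follows. In this sketch, count_command_lines is a hypothetical wrapper that counts the lines written by an arbitrary command, as the who example does; running count_command_lines who reproduces that example. rerun_demo, also hypothetical, uses a fixed command string in place of date so that its output is predictable. It shows that close is what allows a command to be run afresh.

```shell
#!/bin/sh
# count_command_lines CMD: hypothetical; counts the lines written by CMD.
count_command_lines() {
    awk -v cmd="$1" 'BEGIN {
        n = 0
        # "> 0" ends the loop at end of output (0) and on error (-1).
        while ((cmd | getline) > 0)
            n++
        close(cmd)
        print n
    }'
}

# rerun_demo: hypothetical; a fixed command stands in for date.
rerun_demo() {
    awk 'BEGIN {
        cmd = "echo first; echo second"
        cmd | getline a      # starts the command and reads "first"
        cmd | getline b      # same pipeline, still open: reads "second"
        close(cmd)           # releases the pipeline
        cmd | getline c      # a fresh invocation: reads "first" again
        close(cmd)
        print a, b, c
    }'
}

count_command_lines 'echo one; echo two; echo three'   # prints 3
rerun_demo                                             # prints: first second first
```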

getline function

   Form                 Sets
   getline              $0, NF, NR, FNR
   getline var          var, NR, FNR
   getline <file        $0, NF
   getline var <file    var
   cmd | getline        $0, NF
   cmd | getline var    var


© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004