Programming with awk

Cooperation with the shell

In all the examples thus far, the awk program was in a file and was fetched from there using the -f flag, or it appeared on the command line enclosed in single quotes, as in

   awk '{ print $1 }' . . .

Since awk uses many of the same characters as the shell does, such as $ and ", surrounding the awk program with single quotes ensures that the shell will pass the entire program unchanged to the awk interpreter.
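The effect of the quoting can be seen in a small experiment. The following is a minimal sketch rather than one of the original examples; the input line alpha beta and the variable names single and double are made up for illustration.

```shell
# Single quotes: the shell passes { print $1 } to awk untouched,
# so awk prints the first field of the input line.
single=$(printf 'alpha beta\n' | awk '{ print $1 }')
echo "$single"

# Give the shell its own first positional parameter: the literal text $2.
set -- '$2'

# Double quotes: the shell replaces $1 with that text before awk starts,
# so the program awk actually receives is { print $2 }.
double=$(printf 'alpha beta\n' | awk "{ print $1 }")
echo "$double"
```

The first command prints alpha and the second prints beta, even though both programs were typed as { print $1 }.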

Now, consider writing a command addr that will search a file addresslist for name, address and telephone information. Suppose that addresslist contains names and addresses in which a typical entry is a multi-line record such as

   G. R. Emlin
   600 Mountain Avenue
   Murray Hill, NJ 07974

Records are separated by a single blank line.

You want to search the address list by issuing commands like

   addr Emlin

That is easily done by a program of the form

   awk '
   BEGIN   { RS = "" }
   /Emlin/
   ' addresslist

The problem is how to get a different search pattern into the program each time it is run.
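To make the fixed-pattern version concrete, here is a minimal sketch that builds a small addresslist in a scratch directory and searches it. The second record (A. N. Other) and the variable names tmpdir and found are hypothetical, added only so that there is something for the pattern to skip.

```shell
# Work in a scratch directory so no real file is overwritten.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Two blank-line-separated records; the second one is invented sample data.
cat > addresslist <<'EOF'
G. R. Emlin
600 Mountain Avenue
Murray Hill, NJ 07974

A. N. Other
1 Example Road
Sampletown, NJ 07000
EOF

# RS = "" makes each blank-line-separated block a single record.
# The bare pattern /Emlin/ prints every record that matches it.
found=$(awk '
BEGIN { RS = "" }
/Emlin/
' addresslist)
echo "$found"
```

Only the three lines of the Emlin entry are printed, because the pattern is tested against the whole multi-line record rather than against one line at a time.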

There are several ways to do this. One way is to create a file called addr that contains

   awk '
   BEGIN   { RS = "" }
   /'"$1"'/
   ' addresslist

The quotes are critical here. The awk program is only one argument, even though there are two sets of single quotes, because quotes do not nest: the first single-quoted string ends just before "$1", the second begins just after it, and the shell joins the adjacent pieces into one word. The $1 is outside the single quotes but inside the double quotes, and thus is visible to the shell, which therefore replaces it by the pattern Emlin when the command addr Emlin is invoked. On a UNIX system, addr can be made executable by changing its mode with the following command:

   chmod +x addr
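The whole procedure can be tried end to end with the sketch below. It is an illustration under assumptions: the scratch directory, the invented second record, and the #!/bin/sh line at the top of addr are additions that do not appear in the text above.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample data; the second record is invented.
cat > addresslist <<'EOF'
G. R. Emlin
600 Mountain Avenue
Murray Hill, NJ 07974

A. N. Other
1 Example Road
Sampletown, NJ 07000
EOF

# Create the addr script. The quoted 'EOF' delimiter keeps the shell from
# expanding $1 now; it must be expanded later, when addr itself runs.
cat > addr <<'EOF'
#!/bin/sh
awk '
BEGIN { RS = "" }
/'"$1"'/
' addresslist
EOF
chmod +x addr

# At run time the shell splices Emlin between the two single-quoted pieces,
# so awk receives a program containing /Emlin/.
result=$(./addr Emlin)
echo "$result"
```

Running ./addr Other instead would print the second record, since the pattern is now supplied on each invocation.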

A second way to implement addr relies on the fact that the shell substitutes for $ parameters within double quotes:

   awk "
   BEGIN   { RS = \"\" }
   /$1/
   " addresslist

Because the whole program is now inside double quotes, you must protect the quotes defining RS with backslashes, so that the shell passes them on to awk without interpretation. $1 is still recognized as a parameter, however, so the shell replaces it by the pattern when the command addr pattern is invoked.
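The same protection is needed for any $ that is meant for awk rather than for the shell. The sketch below is a hypothetical variant of the double-quoted addr that spells out the match and the print, so that awk's own $0 has to be written as \$0; the sample data and the #!/bin/sh line are assumptions added for the demonstration.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample data; the second record is invented.
cat > addresslist <<'EOF'
G. R. Emlin
600 Mountain Avenue
Murray Hill, NJ 07974

A. N. Other
1 Example Road
Sampletown, NJ 07000
EOF

# Inside double quotes:
#   \"  reaches awk as "
#   \$0 reaches awk as $0 (the current record)
#   $1  is replaced by the shell with the first argument to addr
cat > addr <<'EOF'
#!/bin/sh
awk "
BEGIN { RS = \"\" }
\$0 ~ /$1/ { print \$0 }
" addresslist
EOF
chmod +x addr

result=$(./addr Mountain)
echo "$result"
```

Forgetting a backslash in front of $0 would let the shell substitute its own $0 (the script name) into the program, which is why the single-quoted form is often easier to get right.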

A third way to implement addr is to use ARGV to pass the extended regular expression to an awk program that explicitly reads through the address list with getline:

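   awk '
   BEGIN   { RS = ""
             # getline returns 1 for each record, 0 at end of file, -1 on error
             while ((getline < "addresslist") > 0)
                if ($0 ~ ARGV[1])
                   print $0
   } ' "$@"

All processing is done in the BEGIN action. Because the program contains no other pattern-action statements, awk exits when the BEGIN action finishes and never tries to open the pattern in ARGV[1] as an input file. Comparing the result of getline with 0 stops the loop if addresslist cannot be read, and quoting "$@" keeps a pattern that contains spaces as a single argument.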

Notice that any regular expression can be passed to addr; in particular, it is possible to retrieve by parts of an address or telephone number as well as by name.
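As an illustration, the sketch below runs the ARGV version with two different regular expressions, one matching a ZIP code and one matching a town name that contains a space. The sample data, the #!/bin/sh line, and the variable names by_zip and by_town are assumptions; the script compares getline with 0 and passes "$@" so that a multi-word pattern arrives as one argument.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample data; the second record is invented.
cat > addresslist <<'EOF'
G. R. Emlin
600 Mountain Avenue
Murray Hill, NJ 07974

A. N. Other
1 Example Road
Sampletown, NJ 07000
EOF

# addr reads addresslist itself and matches each record against ARGV[1].
cat > addr <<'EOF'
#!/bin/sh
awk '
BEGIN { RS = ""
        while ((getline < "addresslist") > 0)
            if ($0 ~ ARGV[1])
                print $0
} ' "$@"
EOF
chmod +x addr

# Retrieve by part of the ZIP code, using a bracket expression.
by_zip=$(./addr '0797[0-9]')
echo "$by_zip"

# Retrieve by town; the quotes keep the two words together as one pattern.
by_town=$(./addr 'Murray Hill')
echo "$by_town"
```

Both searches print the Emlin entry and skip the other record, since neither pattern occurs anywhere in it.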

© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004