Programming with awk


Normally, awk reads its input one line, or record, at a time; a record is, by default, a sequence of characters ending with a newline. Then awk splits each record into fields, where, by default, a field is a string of non-blank, non-tab characters.
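As a rough illustration of this default splitting, the following sketch feeds awk two made-up input lines and reports what it sees in each record. The input text is invented for the example; NR (the number of the current record) and NF (the number of fields in it) are standard awk built-in variables that have not been introduced yet in this topic.

```shell
# Two made-up input lines: the first mixes a blank and a tab as separators,
# the second has leading and repeated blanks.
result=$(printf 'one two\tthree\n  four    five\n' |
    awk '{ print NR ": " NF " fields, first is [" $1 "]" }')

# Show what awk reported for each record.
printf '%s\n' "$result"
```

Each input line is handled as one record. Runs of blanks and tabs count as a single separator, and leading blanks do not produce an empty first field.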

As input for many of the awk programs in this topic's sections, we use a file called countries, which contains information about the ten largest countries in the world. (See "The sample input file countries".)

Each record contains the name of a country, its area in thousands of square miles, its population in millions, and the continent on which it is located. (Data are from 1978; the U.S.S.R. has been arbitrarily placed in Asia.) The white space between fields is a tab in the original input; a single blank separates North and South from America.

The sample input file countries

USSR	8650	262	Asia
Canada	3852	24	North America
China	3692	866	Asia
USA	3615	219	North America
Brazil	3286	116	South America
Australia	2968	14	Australia
India	1269	637	Asia
Argentina	1072	26	South America
Sudan	968	19	Africa
Algeria	920	18	Africa

This file is typical of the kind of data awk is good at processing--a mixture of words and numbers separated into fields by blanks and tabs.

The number of fields in a record is determined by the field separator. Fields are normally separated by sequences of blanks and/or tabs, so that the first record of countries would have four fields, the second five, and so on. It is possible to set the field separator to just tab, so each line would have four fields, matching the meaning of the data; we will show how to do this shortly. For the time being, we will use the default: fields separated by blanks and/or tabs. The first field within a line is called $1, the second $2, and so forth. The entire record is called $0.
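The effect of the field separator on the field count can be sketched with the first two records of countries. This is only an illustration under some assumptions: the temporary file stands in for countries, the fields are assumed to be tab-separated as described above, and the -F option (standard awk, used here to set the field separator to a tab) is covered properly later in this topic. NF is the standard awk built-in variable holding the number of fields in the current record.

```shell
# Stand-in for the first two records of countries, with tab-separated fields.
sample=$(mktemp)
printf 'USSR\t8650\t262\tAsia\nCanada\t3852\t24\tNorth America\n' > "$sample"

# Default separator: blanks and tabs both split fields,
# so "North America" is counted as two fields.
default_fields=$(awk '{ print $1 ": " NF " fields, $4 = " $4 }' "$sample")

# Tab-only separator: every record has four fields.
tab_fields=$(awk -F'\t' '{ print $1 ": " NF " fields, $4 = " $4 }' "$sample")

printf 'default separator:\n%s\n' "$default_fields"
printf 'tab separator:\n%s\n' "$tab_fields"

rm -f "$sample"
```

With the default separator, the Canada record has five fields and $4 is North. With a tab separator, it has four fields and $4 is North America.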


© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004