Introducing UNIX and Linux

Awk

Special variables

Just as the shell has predefined variables such as HOME, so does Awk. There are many of them, and their names use capital letters only, so variable names you choose yourself should use lower-case letters to avoid clashing with them. We discuss some of them here:

`FILENAME`	The pathname of the current input file
`FS`	Input field separator; by default a single space, meaning fields are separated by runs of spaces and tabs
`NF`	Number of fields in the current record
`NR`	Number of the current record, counted from the start of input
`FNR`	Number of the current record, counted from the start of the current input file
`OFS`	Output field separator used by `print`; by default a single space
`ORS`	Output record separator used by `print`; by default a newline
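A minimal sketch of FS and OFS at work follows. It wraps a one-line Awk program in a shell function, swap2, whose name is hypothetical and chosen only for illustration. Because FS keeps its default value, leading blanks are ignored and any run of spaces or tabs separates fields; because print places OFS between comma-separated items, setting OFS in the BEGIN action changes what appears between the output fields.

```shell
# swap2: hypothetical helper that prints the first two fields of each
# line in reverse order, joined by OFS (set to a comma) instead of a space.
# It reads the named files, or standard input if none are given.
swap2() {
    awk 'BEGIN { OFS = "," }        # output field separator used by print
         { print $2, $1 }           # the comma between items emits OFS
        ' "$@"
}
```

For instance, `echo 'potatoes 0.50 5' | swap2` should print `0.50,potatoes`.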

Each input record is counted, starting at 1. Since the variable NR contains the number of the current record, the following script prefixes each input line with its line number (unless otherwise specified, a record is a single line). The output imitates cat -n, which allows six characters for each line number; here the number is separated from the line contents by two blanks, whereas most versions of cat -n use a tab. The format specification %6d prints an integer right-justified in a field six characters wide.

{ printf "%6d  %s\n", NR, $0 }

Try this Awk script, and also cat -n, with a file such as vegetables.
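If you want output that matches cat -n exactly, the only change needed is the separator: as far as we know, both GNU coreutils and BSD cat -n follow the number with a tab. The sketch below wraps such a script in a shell function; the name catn is our own, hypothetical choice, and the function reads the named files or, if none are given, standard input.

```shell
# catn: hypothetical name. Numbers lines like the script above, but puts
# a tab (\t) rather than two blanks after the six-character line number.
catn() {
    awk '{ printf "%6d\t%s\n", NR, $0 }' "$@"
}
```

On such systems, `catn vegetables` should then give the same output as `cat -n vegetables`; the one difference from the script above is the `\t` in the format string.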

Worked example

Using awk, select the first three lines of standard input, in the manner of head -3.
Solution: Display only those lines whose number, as given by NR, is at most three. When NR equals three, the program should stop; otherwise awk would carry on reading input until the end, even though nothing more would be printed. The action exit causes awk to terminate.

NR <= 3 { print $0 }
NR == 3 { exit }
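The same idea generalizes to any number of lines. In the sketch below, the shell function awkhead and its default of ten lines are our own assumptions; the count is handed to Awk with the -v option, which sets an Awk variable before the program starts running.

```shell
# awkhead: hypothetical helper generalizing the head -3 script to the
# first N lines of standard input (N defaults to 10 if not given).
awkhead() {
    awk -v n="${1:-10}" '
        BEGIN   { n += 0 }         # make sure n is treated as a number
        NR <= n { print $0 }       # still within the first n lines
        NR >= n { exit }           # stop reading once line n has been handled
    '
}
```

Typing `awkhead 3 < vegetables` should behave like the two-line script above. Using `NR >= n` rather than `NR == n` also covers a count of 0: awk then stops at the first line without printing anything.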

The variable NR has the value 1 on the first line of input and keeps counting across all the files you give as arguments to awk. Another variable, FNR, is similar to NR but starts again from 1 on the first line of each new input file. The variable FILENAME holds the name of the input file currently being read.
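To watch NR, FNR and FILENAME side by side, the following sketch (the function name showcounts is hypothetical) labels every input line with all three. Running it on two files shows NR carrying on from one file to the next while FNR starts again from 1.

```shell
# showcounts: hypothetical helper that prints, for every input line, the
# current file name, the overall record number NR, the per-file record
# number FNR, and then the line itself.
showcounts() {
    awk '{ print FILENAME, NR, FNR, $0 }' "$@"
}
```

If a file a contains the lines potatoes and carrots, and a file b contains the single line apples, then `showcounts a b` prints:

a 1 1 potatoes
a 2 2 carrots
b 3 1 apples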

Worked example

Write an Awk script firstlines that reads from a number of files and displays the first line of each file, preceded by the message "The first line of filename is:", in the following manner:

$ awk -f firstlines vegetables /usr/dict/words
The first line of vegetables is:
potatoes 0.50 5
The first line of /usr/dict/words is:
AAAA

Solution: Use the variable FNR in a pattern that matches the first line of each input file, then use printf to display FILENAME and that line ($0).

FNR == 1 { printf "The first line of %s is:\n%s\n", FILENAME, $0 }
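FNR == 1 is also a convenient way to notice that a new file has begun. As one possible approach, offered as a sketch rather than as part of the exercise, the hypothetical function lastlines below saves each line as it is read and prints the saved line of the previous file whenever FNR is 1 again; an END action, which runs after all input has been read, handles the final file. The variable names file and last are our own, and are in lower case as suggested earlier.

```shell
# lastlines: hypothetical companion to firstlines. FNR == 1 (after the
# very first record) signals that a new file has started, so the saved
# last line of the previous file is printed then; END reports the final file.
lastlines() {
    awk '
        FNR == 1 && NR > 1 { printf "The last line of %s is:\n%s\n", file, last }
        { file = FILENAME; last = $0 }    # remember the most recent line
        END { if (NR > 0) printf "The last line of %s is:\n%s\n", file, last }
    ' "$@"
}
```

Empty files never produce a line with FNR equal to 1, so they are simply skipped.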

Each record consists of a number of fields. The variable NF is the number of fields contained in the current record. Try the following:

$ awk '{ print NF }'
hello there
2
A B C D E
5
(blank line)
0
ctrl-D
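Because NF is an ordinary number, it can also be used after $ to pick out a field: $NF is the last field of the current record. The sketch below (the function name lastfield is hypothetical) prints the last field of every line, using the pattern NF > 0 to skip empty lines, which have no fields.

```shell
# lastfield: hypothetical helper. $NF is the last field of the record;
# the NF > 0 pattern skips empty lines, for which NF is 0.
lastfield() {
    awk 'NF > 0 { print $NF }' "$@"
}
```

Without that pattern, an empty line would give NF equal to 0, so $NF would be $0, the whole (empty) line, and a blank line would be printed instead.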

Worked example

If some data in vegetables had been mistyped, the file might contain lines with fewer or more than three fields. The previous Awk scripts cannot process such lines correctly. Write an Awk script that reads a file and lists those lines whose number of fields is not three.
Solution: Use the pattern NF != 3 to choose those lines, and the value of NR to indicate which lines they are:

NF != 3 { printf "Line %d has %d fields\n", NR, NF }
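When several files are checked at once, NR keeps counting across them, so the reported line numbers no longer match the lines of any single file. A variant using FILENAME and FNR avoids this; the function name checkfields is hypothetical, and the message wording is our own.

```shell
# checkfields: hypothetical helper. Like the NF != 3 script, but reports
# each bad line by file name and FNR, which restarts at 1 in every file,
# so the line numbers stay correct when several files are checked.
checkfields() {
    awk 'NF != 3 { printf "%s: line %d has %d fields\n", FILENAME, FNR, NF }' "$@"
}
```

For example, `checkfields vegetables` reports nothing if every line of vegetables has exactly three fields.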

Use this script to check that your file vegetables is indeed of the correct format. Try it on some other files you own and see what happens.