Introducing UNIX and Linux

Advanced shell programming

More on files

UNIX supports named pipes, also known as FIFO ('first in, first out') files. A named pipe is like an ordinary pipe, except that it has a name by which programs can refer to it (ordinary pipes are anonymous). To implement a named pipe, UNIX creates a special file whose name is the name of the pipe; the command mkfifo creates such a file. Many processes can write to the file at once; it temporarily stores the data it receives, and when another process reads from it the stored contents are handed over and removed from the pipe ('flushed'). Try the following:

$ mkfifo pfile
$ ls -l
total 0
prw------- 1 chris general 0 Dec 3 15:08 pfile
$ echo Hello >pfile &
[1] 2230
$ echo There >pfile &
[2] 2231
$ cat pfile
There
Hello
[1]- Done echo Hello >pfile
[2]+ Done echo There >pfile

You will notice that ls -l shows pfile as a file in the current directory, but the first character of its entry is p, indicating that it is a named pipe. Once a named pipe has been opened, several processes (such as the two echo processes in the above example) can write to it at the same time, so the pipe can have several input streams. The input streams are merged, in whatever order the data happens to arrive, and the result can be read from the named pipe just like any other file.

As soon as all the processes writing to the named pipe have closed it (for instance by terminating) and the data remaining in it has been read, any process reading from it receives end-of-file. The named pipe acts as a 'buffer', temporarily storing characters sent to it from its input streams; once those characters have been read they are removed from the pipe.
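The behaviour described above can be tried out in a short script. This is only a sketch under some assumptions: the scratch directory comes from mktemp -d (widely available, though not required by POSIX), and the variable names and the use of file descriptor 3 are illustrative choices. The script keeps the pipe open for writing itself, so the reader cannot see end-of-file before both writers have finished.

```shell
# Sketch: two writers feeding one named pipe, with a single reader.
dir=$(mktemp -d) || exit 1
fifo=$dir/pfile
mkfifo "$fifo"

# Start the reader first; it blocks until the pipe is opened for writing.
cat "$fifo" >"$dir/out" &
reader=$!

# Hold the pipe open for writing on descriptor 3, so the reader does not
# receive end-of-file in the gap between the two writers.
exec 3>"$fifo"

echo Hello >"$fifo" &
w1=$!
echo There >"$fifo" &
w2=$!
wait "$w1" "$w2"

# Closing the last writing descriptor lets the reader see end-of-file.
exec 3>&-
wait "$reader"

# The arrival order of the two lines is not guaranteed, so sort them.
merged=$(sort "$dir/out")
printf '%s\n' "$merged"
rm -r "$dir"
```

Sorting the collected lines is purely for a repeatable display; the pipe itself delivers data in arrival order.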

There is one 'feature' of named pipes: all processes that read from, or write to, a given named pipe must be running on the same machine. A named pipe in a directory that is shared across a network does not pass data between machines, so if your UNIX system consists of several machines you must take care. We do not discuss named pipes further here, save to indicate that if you are familiar with programming in C you may wish to investigate them in greater detail.

If you create a binary file by compiling a program written in (say) C, that file will contain symbols: that is to say, it will contain the names used within that file for functions and data that may need to be referred to from other files. Although discussion of the format of object files is outside the scope of this book, the command nm is provided to list the names in such a file, and will be of interest to a C programmer. To see how nm works, look in directory /usr/lib, see which files in that directory have suffix .a, and choose one of them (say libc.a). A large volume of output will be produced, so you will probably wish to pipe it through more:

$ nm /usr/lib/libc.a | more

A similar task can be performed on source files written in C or FORTRAN using ctags, which creates a file called tags listing the functions defined in the files given to it as arguments. Note that ctags can also recognise some other language constructs.

When creating text files that include TABs, it is sometimes inconvenient to have those TABs there; conversely, you may wish a file to include TABs where otherwise it would not. Situations where this may be important include preparing input for utilities such as Awk or make. TABs are sometimes included in text files automatically, especially by some editors, in order to use space more efficiently. The command expand takes input containing TABs and produces standard output which is the same as the input but with the TABs replaced by the appropriate number of spaces. The command unexpand does the reverse: a file containing spaces will have some (or all) of the spaces replaced by TABs, where tab stops are assumed to occur at every eighth column. See also the command tabs discussed previously.
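A brief sketch of the two commands in action may help. The sample strings and variable names are invented for illustration; the -t option to expand, which sets the tab stop interval, is a standard option not discussed in the text above.

```shell
# expand: TABs become spaces, with tab stops every eight columns by default.
wide=$(printf 'a\tbc\tdef\n' | expand)

# With -t 4 the tab stops are every four columns instead.
narrow=$(printf 'a\tbc\tdef\n' | expand -t 4)

# unexpand: by default only leading spaces are converted, so the eight
# spaces at the start of this line become a single TAB.
restored=$(printf '        indented\n' | unexpand)

printf '%s\n' "$wide" "$narrow" "$restored"
```

The first two lines displayed differ only in the width of the gaps; the third looks the same as its input on a terminal, but begins with a TAB rather than eight spaces.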

If you have two text files whose lines have been sorted into order, the utilities comm and join may be of use. To select those lines common to both files, or which occur in only one of them, use comm. This command is complementary to diff and to uniq. The output of comm, when given two arguments representing filenames, is a sequence of lines containing three tab-separated columns. The first column contains the lines unique to the first file, the second column the lines unique to the second, and the third column those lines common to both. For instance, if file A contains

apple
orange
pear
pomegranate
strawberry

and file B contains

apple
peach
pomegranate
raspberry

then the output from

$ comm A B

would be

                apple
orange
        peach
pear
                pomegranate
        raspberry
strawberry
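Often only one of the three columns is wanted. As a sketch, the standard options -1, -2 and -3 suppress the corresponding column, so combining two of them leaves a single column. The scratch directory (created with mktemp -d, an assumption about your system) and the variable names are illustrative only.

```shell
# Recreate files A and B from the example in a scratch directory.
dir=$(mktemp -d) || exit 1
printf '%s\n' apple orange pear pomegranate strawberry >"$dir/A"
printf '%s\n' apple peach pomegranate raspberry >"$dir/B"

# Each of -1, -2 and -3 suppresses one column of comm's output.
both=$(comm -12 "$dir/A" "$dir/B")     # lines common to both files
only_a=$(comm -23 "$dir/A" "$dir/B")   # lines unique to A
only_b=$(comm -13 "$dir/A" "$dir/B")   # lines unique to B

printf 'In both:\n%s\n' "$both"
printf 'Only in A:\n%s\n' "$only_a"
printf 'Only in B:\n%s\n' "$only_b"
rm -r "$dir"
```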

To join lines containing a common field from the two files, use join - this command is complementary to paste. The two files are considered to contain a number of blank-separated columns. For each pair of lines, one from each file, that have an identical first field, join displays a line consisting of that field followed by the other fields from the line in the first file, then the other fields from the line in the second. To illustrate this command, suppose A and B contain lists of fruits together with other data; then join will output a sequence of lines that commence with the fruit names and then include the extra data from A and B. Suppose A is

apple 2 kilos Monday
orange 4 kilos from Jones' shop
pear none
pomegranate 3 kilos Tuesday
strawberry 2 boxes

and file B contains

apple 1 kilo Wednesday
peach none
pomegranate 1 kilo Thursday
raspberry none

then the effect of join would be

$ join A B
apple 2 kilos Monday 1 kilo Wednesday
pomegranate 3 kilos Tuesday 1 kilo Thursday
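The example can be reproduced with a short script, sketched below. It also uses the standard option -v 1, not covered in the text above, to list the lines of A that have no partner in B. The scratch directory (from mktemp -d, assumed to be available) and the variable names are illustrative.

```shell
# Recreate files A and B from the example in a scratch directory.
dir=$(mktemp -d) || exit 1
cat >"$dir/A" <<'EOF'
apple 2 kilos Monday
orange 4 kilos from Jones' shop
pear none
pomegranate 3 kilos Tuesday
strawberry 2 boxes
EOF
cat >"$dir/B" <<'EOF'
apple 1 kilo Wednesday
peach none
pomegranate 1 kilo Thursday
raspberry none
EOF

# Lines whose first field occurs in both files.
paired=$(join "$dir/A" "$dir/B")

# -v 1 displays instead the lines of the first file that were not joined.
unpaired_a=$(join -v 1 "$dir/A" "$dir/B")

printf 'Joined:\n%s\n' "$paired"
printf 'In A only:\n%s\n' "$unpaired_a"
rm -r "$dir"
```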