More on files
UNIX supports named pipes, also known as
FIFO files ('first in first out'). A named pipe is
like an ordinary pipe, except that it has a name by which it can be
referred to from within a program (normally pipes are anonymous).
To implement a named pipe, UNIX creates a special file whose name is
the name of the pipe; the command mkfifo is available to create such
a file. Several processes can write to the file at once. The data
they write is held temporarily (in memory, not on disk), and when
another process reads from the file the data is passed to that
process and removed from the pipe. Try the following:
$ mkfifo pfile
$ ls -l
total 0
prw------- 1 chris general 0 Dec 3 15:08 pfile
$ echo Hello >pfile &
[1] 2230
$ echo There >pfile &
[2] 2231
$ cat pfile
There
Hello
[1]- Done echo Hello >pfile
[2]+ Done echo There >pfile
You will notice that the output of ls -l shows that pfile
exists as a file in the current directory, but the character in the
first column is a p, indicating that it is a named
pipe. Once a named pipe has been opened (that is, once a process has
started to write to it or read from it), several processes can write
to it, such as the two echo processes in the above example, so the
pipe can have several input streams. These input streams are merged,
and the result can be read from the named pipe just like any other
file. The order in which data from different writers arrives is not
fixed; in the example above, There happened to be read before Hello.
As soon as all the processes writing to the named pipe have closed
it (for instance, by terminating), any process reading from it
receives end-of-file. The named pipe itself remains in the directory
until you remove it with rm. The named pipe acts as a 'buffer',
temporarily storing characters sent to it from its input streams;
once a reading process has read these characters, they are deleted
from the pipe.
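The interactive session above relies on the delay between typing commands. In a script, a reader could see end-of-file after the first writer closes and before the second has opened the pipe, so the sketch below keeps an extra write end open (descriptor 3) until both writers have finished. The function name fifo_demo is hypothetical, and mktemp -d is assumed to be available (it is common, but not required by POSIX).

```sh
# fifo_demo: a hypothetical helper that repeats the pfile example safely.
fifo_demo() {
    dir=$(mktemp -d) || return 1        # private scratch directory
    fifo="$dir/pfile"
    mkfifo "$fifo" || return 1          # create the named pipe

    cat "$fifo" >"$dir/out" &           # reader: waits until the pipe is opened for writing
    reader=$!
    exec 3>"$fifo"                      # hold a write end open so the reader cannot
                                        # see end-of-file before both writers have run
    echo Hello >"$fifo" & w1=$!         # first writer
    echo There >"$fifo" & w2=$!         # second writer
    wait "$w1" "$w2"                    # both writers have written and closed the pipe
    exec 3>&-                           # close the last write end: the reader sees end-of-file
    wait "$reader"

    sort "$dir/out"                     # arrival order may vary, so sort for a stable result
    rm -r "$dir"                        # removes the named pipe and the scratch file
}

fifo_demo
```

Running fifo_demo prints Hello and There on separate lines; without the sort, they could appear in either order.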
There is one 'feature' of named pipes - all processes that read
from, or write to, a given named pipe must be running on the same
machine. Even if the file system containing the pipe is shared over
a network, a named pipe cannot carry data between different
machines, so if your UNIX system is made up of several networked
machines you must take care. We do not discuss named pipes further
here, save to indicate that if you are familiar with programming in
C you may wish to investigate them in greater detail.
If you compile a program written in (say) C, the resulting object
file will contain symbols: that is to say, it will contain the names
of the functions and variables that the file defines or refers to,
which the linker uses to connect it with other object files and
libraries. Although discussion of the format of object files is
outside the scope of this book, the command nm is provided to list
the names in such a file, and will be of interest to a C programmer.
To see how nm works, look in directory /usr/lib, see what files
there are in that directory with suffix .a and choose one of them
(say libc.a). A large volume of output will be produced, so you will
probably wish to pipe it through more:
$ nm /usr/lib/libc.a | more
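Often a C programmer wants only the functions that a library defines. The sketch below assumes the portable output format selected by nm -P, in which each symbol line starts with the symbol's name followed by a one-letter type; T and t mark symbols in the text (code) section. Header lines naming archive members have a single field and are skipped. The helper names text_symbols and defined_functions are hypothetical.

```sh
# text_symbols: read "nm -P" output on standard input and print the names
# of code symbols, whose type (second field) is T (global) or t (local).
text_symbols() {
    awk '$2 == "T" || $2 == "t" { print $1 }'
}

# defined_functions: list the functions defined in the given object
# files or archives.
defined_functions() {
    nm -P "$@" | text_symbols
}
```

For example, defined_functions /usr/lib/libc.a | more lists the code symbols in the C library archive, without the undefined (U) references and data symbols.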
A similar task can be accomplished on source code files written in C
or FORTRAN using ctags, which creates a file called tags listing the
functions defined in the files given to ctags as arguments; the
editors vi and ex use the tags file to locate those definitions.
Note that ctags can also be used on some other language constructs.
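The tags file is plain text with one line per name: the name, the file that defines it and a search command the editor uses to find the definition, separated by blanks (traditionally TABs). The sketch below assumes that layout and skips lines beginning with !_, which some implementations, such as Exuberant Ctags and Universal Ctags, write as header information. The function name where_defined is hypothetical.

```sh
# where_defined: print each name in a tags file (default: ./tags)
# together with the source file that defines it.
where_defined() {
    awk '!/^!_/ { print $1, $2 }' "${1:-tags}"
}
```

After running, say, ctags *.c, calling where_defined shows where each function lives, and vi -t main opens the file containing main at its definition.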
When creating text files that include TABs, it is
sometimes inconvenient to have those TABs there - or, vice
versa, you may wish a file to include TABs where otherwise
it might not. Situations where this may be important include
preparing input for utilities such as Awk or make (make, for
instance, requires the commands in a makefile to begin with a TAB).
TABs are sometimes included in text files automatically,
especially by some editors, in order to utilise space more
efficiently. The command expand takes input containing
TABs and produces standard output which is the same as the
input but with the TABs replaced by the appropriate number
of spaces. The command unexpand does the reverse: some (or all)
of the spaces in its input are replaced by TABs. By default only
leading spaces are converted; the option -a converts other runs of
two or more spaces as well. Both commands assume that tab stops
occur at every eighth column, unless told otherwise with the option
-t. See also the command tabs discussed previously.
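The effect of these commands is easy to miss on the screen, because a TAB and the equivalent spaces look the same. In the commands below, sed -n l prints each line unambiguously, showing a TAB as \t and marking the end of the line with $; the comment on each line is the output it produces.

```sh
printf 'a\tb\n' | expand | sed -n l            # a       b$
printf 'a\tb\n' | expand -t 4 | sed -n l       # a   b$
printf '        x\n' | unexpand | sed -n l     # \tx$
printf 'ab      x\n' | unexpand | sed -n l     # ab      x$
printf 'ab      x\n' | unexpand -a | sed -n l  # ab\tx$
```

The fourth line is left unchanged because its spaces are not at the start of the line; adding -a, as in the last line, converts them too.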
If you have two text files whose lines have been sorted into
order, the utilities comm and join may be
of use. To select those lines common to both files, or which occur
in only one of them, use comm. This command is
complementary to diff and to uniq. The
output of comm, when given two arguments representing
filenames, is arranged in three columns separated by TABs: each
output line belongs to one column, lines in the second column being
preceded by one TAB and lines in the third column by two. The first
column contains the lines unique to the first file, the second
column the lines unique to the second, and the third column those
lines common to both. For instance, if file
A contains
apple
orange
pear
pomegranate
strawberry
and file B contains
apple
peach
pomegranate
raspberry
then the output from
$ comm A B
would be
                apple
orange
        peach
pear
                pomegranate
        raspberry
strawberry
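Usually only one of the three columns is wanted. The options -1, -2 and -3 suppress the first, second and third columns respectively, so combining two of them leaves a single column with no leading TABs. The sketch below recreates A and B in a scratch directory; mktemp -d is assumed to be available (it is common, but not required by POSIX).

```sh
# Scratch copies of the example files A and B (already sorted).
dir=$(mktemp -d)
printf '%s\n' apple orange pear pomegranate strawberry >"$dir/A"
printf '%s\n' apple peach pomegranate raspberry >"$dir/B"

comm -12 "$dir/A" "$dir/B"   # common to both: apple, pomegranate
comm -23 "$dir/A" "$dir/B"   # only in A: orange, pear, strawberry
comm -13 "$dir/A" "$dir/B"   # only in B: peach, raspberry
```

Remove the scratch directory with rm -r "$dir" when you have finished.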
To join lines containing a common field from
the two files, use join - this command is
complementary to paste. The two files are considered
to contain a number of blank-separated columns and, as with comm,
each file must be sorted on the field being joined (by default the
first). Each pair of lines, one from each file, with an identical
first field causes a line to be displayed that is the first field
followed by the other fields from the line in the first file, then
the other fields from the second. To illustrate this command,
suppose A and B contain lists of fruits together with other data;
then join will output a sequence of lines that commence with the
names of the fruits listed in both files and then include the extra
data from A and B.
Suppose A is
apple 2 kilos Monday
orange 4 kilos from Jones' shop
pear none
pomegranate 3 kilos Tuesday
strawberry 2 boxes
and file B contains
apple 1 kilo Wednesday
peach none
pomegranate 1 kilo Thursday
raspberry none
then the effect of join would be
$ join A B
apple 2 kilos Monday 1 kilo Wednesday
pomegranate 3 kilos Tuesday 1 kilo Thursday
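Lines whose first field has no partner in the other file are normally dropped, as orange and peach were above. The option -a adds such unpairable lines from the named file to the output, and -v prints only those lines. The sketch below recreates A and B in a scratch directory; mktemp -d is assumed to be available (it is common, but not required by POSIX).

```sh
# Scratch copies of the example files A and B. Both are already sorted
# on their first field, as join requires.
dir=$(mktemp -d)
cat >"$dir/A" <<'EOF'
apple 2 kilos Monday
orange 4 kilos from Jones' shop
pear none
pomegranate 3 kilos Tuesday
strawberry 2 boxes
EOF
cat >"$dir/B" <<'EOF'
apple 1 kilo Wednesday
peach none
pomegranate 1 kilo Thursday
raspberry none
EOF

join "$dir/A" "$dir/B"        # only the fruits that appear in both files
join -a 1 "$dir/A" "$dir/B"   # also print the lines of A that have no partner in B
join -v 2 "$dir/A" "$dir/B"   # only the lines of B that have no partner in A
```

If your files are not already sorted, sort each of them first. Remove the scratch directory with rm -r "$dir" when you have finished.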