Splitting files
It is possible to write scripts to read in a file containing
several encoded files and automatically separate them.
Some mailers and networks restrict the size of the messages they
can send (the limit is often 64k bytes), so a large file cannot
always be mailed as a single message. You could, of course, edit
the file into several smaller files and send each one in turn, but
that would be tedious. The command split will split a file
automatically for you into similarly sized pieces. For instance,
try
$ split /usr/dict/words
and you will find that a large collection of files has been created
in the current directory with names xaa, xab, xac, etc., all of
which (with the exception of the last one) are 1000 lines long
(test this with wc). With option -l ('lines') followed by a number,
split will chop a file into a collection of files each of that
number of lines. The reason for having 1000 lines as the default
size of the file is not only that 1000 is a 'nice round number',
but also that 1000 lines of text with 61 characters per line
(uuencode outputs lines which are 61 characters wide) come to just
under the 64k limit that some mailers impose on files.
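As a rough sketch of the behaviour described above, the following commands split a generated file first with the default size and then with -l. The file name sample.txt, the output prefix part. and the use of mktemp -d to create a scratch directory are assumptions made for illustration; mktemp is widely available but not guaranteed on every system.

```shell
# Work in a scratch directory so that no existing x?? files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2500-line sample file (hypothetical name: sample.txt).
i=1
while [ "$i" -le 2500 ]
do
    echo "line $i"
    i=$((i + 1))
done > sample.txt

# Default behaviour: pieces of 1000 lines each, named xaa, xab, xac.
split sample.txt
wc -l x??

# With -l: pieces of 400 lines each, using the prefix "part." instead of "x".
split -l 400 sample.txt part.
wc -l part.??

# The arithmetic behind the default: 1000 lines of 61 characters
# compared with a limit of 64k bytes.
encoded=$((1000 * 61))
limit=$((64 * 1024))
echo "$encoded characters, limit $limit"
```

The last piece in each set is shorter than the others, since 2500 is not an exact multiple of either 1000 or 400.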
Worked example
You have a long file called bigdata, which contains control
characters, and you wish sam to have a copy of it. Arrange to send
sam a copy via electronic mail.
Solution: First of all, bigdata must be encoded and then split into
chunks small enough to pass through the mailer. Each of those
chunks must then be mailed individually to sam.
$ uuencode bigdata bigdata | split
$ for i in x??
> do
> mailx -s "File $i" sam <$i
> done
You must ensure that you have no files whose names are of three
letters commencing with an x before attempting this exercise.
Existing files with names such as xaa will be overwritten by split
if they are in your current directory. The files sam receives can
be joined together and then uudecoded to recreate the original file
bigdata. Although uudecode will strip away headers and footers from
a single file, it will not cope with extra lines inserted in the
middle of a file. The joined file will therefore have to be edited
to remove all headers and footers introduced by the mailer before
it is decoded.
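The joining step can be sketched as follows. To keep the example self-contained it uses an ordinary text file, with the hypothetical name encoded.txt, as a stand-in for the uuencoded data, and it assumes that no mailer headers have been added to the pieces. The scratch directory created with mktemp -d is likewise an assumption for illustration.

```shell
# Work in a scratch directory so that no existing x?? files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the encoded file (hypothetical name: encoded.txt).
i=1
while [ "$i" -le 2500 ]
do
    echo "line $i"
    i=$((i + 1))
done > encoded.txt

# Split into xaa, xab, xac as the sender would.
split encoded.txt

# The shell expands x?? in alphabetical order, so cat joins the
# pieces in the same sequence in which split created them.
cat x?? > rejoined.txt

# cmp -s is silent and reports only through its exit status.
if cmp -s encoded.txt rejoined.txt
then
    result="identical"
else
    result="different"
fi
echo "rejoined file is $result to the original"
```

With real uuencoded pieces, and once any lines added by the mailer have been removed, running uudecode on the rejoined file would be the final step.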
It is to be hoped that sending data across a network will result
in the message received being identical to that sent. Regrettably,
this is not always the case. If the communications medium is prone
to interference (such as a crackly telephone line) it sometimes
happens that data becomes corrupted. The command cksum can be used
to identify quickly whether this has happened. On some non-POSIX
systems the command sum is provided instead of cksum. Followed by a
filename, cksum will print a large number, known as a checksum,
based on the contents of the file, together with the number of
characters in that file. If anything is altered in the file, the
number created will be very different, as in the following example
in which the capital 'T' becomes a lower-case 't':
$ echo "To be or not to be" | cksum
3740407258 19
$ echo "to be or not to be" | cksum
1143317160 19
If the sender and the recipient of data both run cksum on the
message, and this yields the same number, then they can both be
confident that the message has not been corrupted during
transmission. To try out cksum, create two files (say data1 and
data2) using vi, and containing the same piece of text (a short
poem would be suitable). Then see what happens if you first of all
use cksum to check whether they are the same, and then try diff.
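One possible way to carry out this experiment without an editor is sketched below. It creates the two files with printf rather than vi, and works in a scratch directory made with mktemp -d; both choices are assumptions made so that the commands can be run as they stand.

```shell
# Work in a scratch directory so that existing files are left alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two files with the same contents.
printf 'To be or not to be\n' > data1
printf 'To be or not to be\n' > data2

# Reading from standard input makes cksum print only the checksum
# and the character count, without a file name.
sum1=$(cksum < data1)
sum2=$(cksum < data2)
if [ "$sum1" = "$sum2" ]
then
    same="yes"
else
    same="no"
fi
echo "identical copies give the same checksum: $same"

# Alter a single character in data2 and compare again.
printf 'to be or not to be\n' > data2
sum2=$(cksum < data2)
if [ "$sum1" = "$sum2" ]
then
    same_after="yes"
else
    same_after="no"
fi
echo "altered copy gives the same checksum: $same_after"

# diff shows where the files differ; it exits with a non-zero status
# when they do, so that case is reported rather than treated as an error.
diff data1 data2 || echo "diff found differences"
```

cksum says only whether the two files differ, whereas diff also shows which lines have changed.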