Introducing UNIX and Linux

It is possible to write scripts to read in a file containing several encoded files and automatically separate them.
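A minimal sketch of such a script follows. It assumes that each encoded file starts with a line of the form `begin mode name` and finishes with a line containing only `end`, which is the layout uuencode produces. The sample input, the file name combined and the output names part1, part2, ... are hypothetical, and the encoded lines are placeholders rather than real uuencoded data. awk does the line matching.

```shell
#!/bin/sh
# Work in an empty scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical input: two encoded files with ordinary mail text around them.
# The "M<...>" lines are placeholders, not real uuencoded data.
cat > combined <<'EOF'
Here are the two files you asked for.
begin 644 first
M<placeholder line 1>
`
end
Some text between the files.
begin 600 second
M<placeholder line 2>
`
end
EOF

# Copy each "begin ... end" block into its own file: part1, part2, ...
# Lines outside any block (the surrounding mail text) are discarded.
awk '
/^begin [0-7]+ /  { n++; out = "part" n; inside = 1 }
inside            { print > out }
inside && /^end$/ { close(out); inside = 0 }
' combined

ls part*
```

Each part file can then be passed to uudecode on its own.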

Some mailers and networks restrict the size of the messages they can send (often to 64k bytes), so a large file cannot always be mailed as a single message. You could, of course, edit it into several smaller files and send each one in turn, but that would be tedious. The command split will divide a file automatically into similar-sized pieces. For instance, try

and you will find that a large collection of files has been created in the current directory with names xaa, xab, xac, etc., all of which (except the last one) are 1000 lines long (test this with wc). With option -l ('lines') followed by a number, split will chop a file into pieces of that number of lines each. The default of 1000 lines is not merely a 'nice round number': uuencode outputs lines 61 characters wide, so 1000 lines of uuencoded text (about 62,000 characters, counting the newline at the end of each line) come to just under the 64k limit that some mailers impose on messages.
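Both the default behaviour and the -l option can be tried safely in a scratch directory. This sketch assumes mktemp and seq are available; the 2500-line test file sample and the choice of 600 lines per piece are illustrative only.

```shell
#!/bin/sh
# Use an empty scratch directory so existing xaa, xab, ... files are safe.
cd "$(mktemp -d)" || exit 1

# Hypothetical test file of 2500 numbered lines.
seq 1 2500 > sample

# Default: pieces of 1000 lines named xaa, xab, xac.
split sample
wc -l x??          # 1000, 1000 and 500 lines, plus a total

# With -l, choose the number of lines per piece (here 600).
rm x??
split -l 600 sample
ls x??             # five pieces, xaa to xae

# Joining the pieces in name order recreates the original file.
cat x?? | cmp - sample && echo "pieces rejoin correctly"
```

The last piece holds whatever is left over: 2500 lines in pieces of 600 gives four full pieces and one of 100 lines.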

Worked example

You have a long file called bigdata, which contains control characters, and you wish sam to have a copy of it. Arrange to send sam a copy via electronic mail.
Solution: First of all, bigdata must be encoded and then split into small enough chunks to pass through the mailer. Each of those chunks must be individually mailed to sam.

$ uuencode bigdata bigdata | split
$ for i in x??
> do
> mailx -s "File $i" sam <$i
> done

Before attempting this exercise, make sure you have no files whose names consist of three letters beginning with x: split will overwrite files such as xaa if they already exist in the current directory, and the for loop would mail any other file whose name matches x??. The files sam receives can be joined together and then uudecoded to recreate the original file bigdata. Although uudecode will strip away headers and footers surrounding a single encoded file, it will not cope with extra lines inserted in the middle of one, so the joined file will have to be edited to remove all the headers and footers introduced by the mailer before it is decoded.
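The receiving side can be simulated locally. A mail header ends at the first empty line, so one way to remove it (a sketch, not the book's method) is sed '1,/^$/d'. In this simulation the message files msg_xaa, msg_xab, ... and their header lines are hypothetical, seq output stands in for the uuencoded text so that uuencode need not be installed, and cmp replaces the final uudecode step. Joining works because the shell expands msg_x?? in alphabetical order, the same order in which split named the pieces.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Stand-in for the encoded text (hypothetical; normally uuencode output).
seq 1 2500 > encoded
split encoded

# Simulate what sam saves from each message: a header, a blank line, the piece.
for i in x??
do
    { printf 'From: you\nSubject: File %s\n\n' "$i"; cat "$i"; } > "msg_$i"
done
rm x??

# Strip each header (everything up to the first empty line) and join in order.
for m in msg_x??
do
    sed '1,/^$/d' "$m"
done > joined

# With real data the next step would be: uudecode joined
cmp joined encoded && echo "headers removed, pieces rejoined"
```

This assumes the mailer adds only a header; any footer lines would need removing as well.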

It is to be hoped that sending data across a network will result in the message received being identical to that sent. Regrettably, this is not always the case. If the communications medium is prone to interference (such as a crackly telephone line), data sometimes becomes corrupted. The command cksum can be used to identify quickly whether this has happened. On some non-POSIX systems the command sum is provided instead of cksum. Followed by a filename, cksum will print a large number, known as a checksum, based on the contents of the file, together with the number of characters in that file. If anything in the file is altered, the checksum will be quite different, as in the following example, in which the capital 'T' becomes a lower-case 't':

$ echo "To be or not to be" | cksum
3740407258      19
$ echo "to be or not to be" | cksum
1143317160      19

If the sender and the recipient of data both run cksum on the message and obtain the same number, they can both be reasonably confident that the message has not been corrupted during transmission. To try out cksum, use vi to create two files (say data1 and data2), each containing the same piece of text (a short poem would be suitable). Then see what happens if you first use cksum to check whether they are the same, and then try diff.
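The comparison can also be scripted. This sketch creates data1 and data2 with printf instead of vi, making them differ in one capital letter as in the echo example. Each file is fed to cksum on standard input, so cksum prints only the checksum and the character count, not the file name, and the two outputs can be compared directly as strings.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Two hypothetical files differing only in one capital letter.
printf 'To be or not to be\n' > data1
printf 'to be or not to be\n' > data2

# With no file name argument cksum reads standard input and omits the name,
# so identical contents give identical output lines.
sum1=$(cksum < data1)
sum2=$(cksum < data2)

if [ "$sum1" = "$sum2" ]
then
    echo "data1 and data2 have the same checksum"
else
    echo "checksums differ:"
    # diff exits with status 1 when the files differ; that is expected here.
    diff data1 data2 || true
fi
```

cksum only says whether the files differ; diff shows where.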
