Character-to-character transformation
Translating a file so that specific characters are replaced by
others can be accomplished with tr. This command takes
as arguments two strings, which may consist of any number of
individual characters, ranges and character classes. If both
strings are the same length, each character in the first string
is replaced by the corresponding character in the second.
The command tr works only as a filter: it reads standard input
and cannot take a filename as an argument. For example, to
capitalise all the lower-case letters in the input we would have:
tr "a-z" "A-Z"
or alternatively
tr "[:lower:]" "[:upper:]"
Try this out using just standard input and standard output. To
capitalise all the words in /usr/dict/words you would
have:
tr "[:lower:]" "[:upper:]" < /usr/dict/words
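As a minimal sketch of the two ways of feeding input to tr, the following wraps the filter in a shell function and runs it once on piped input and once on a redirected file. The function name upcase and the temporary file are assumptions made only for this illustration.

```shell
# upcase is a hypothetical helper name used only for this illustration.
upcase() {
    tr "[:lower:]" "[:upper:]"
}

# Input arriving through a pipe:
printf 'hello, world\n' | upcase          # prints HELLO, WORLD

# Input arriving from a file by redirection; tr itself never sees
# the filename, only the file's contents on standard input.
tmpfile=$(mktemp)
printf 'one\ntwo\n' > "$tmpfile"
upcase < "$tmpfile"                       # prints ONE then TWO
rm -f "$tmpfile"
```

In both cases the shell, not tr, connects the data to standard input.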
Worked example
Write a filter to replace all digits by blank spaces.
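Solution: Use [:digit:] to represent
digits as the first argument to tr. So that the second argument is
not shorter than the first, write it as "[ *]", which stands for as
many spaces as are needed (this notation is described later in
this section).
tr "[:digit:]" "[ *]"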
The second argument to tr must not be shorter than
the first. If the second argument is longer than the first, the
excess characters in the second argument are disregarded, so that in
the filter
tr "a-z" "A-Z123"
the characters 1, 2 and 3 have no effect. The two arguments to tr
are strings; as usual, if the strings contain whitespace they must
be quoted, and the standard conventions for quoted strings are
used. So for a filter to replace all blanks in the input with a B,
you could have:
tr ' ' 'B'
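The two filters just described can be tried on small inputs. This is only an illustrative sketch; the sample strings are invented for the purpose.

```shell
# Replace every blank with the letter B.
printf 'a b c\n' | tr ' ' 'B'             # prints aBbBc

# The extra 1, 2 and 3 in the second string have nothing to pair
# with in the first string, so they change nothing; digits in the
# input pass through because they are not in the first string.
printf 'abc 123\n' | tr "a-z" "A-Z123"    # prints ABC 123
```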
Remember that between double quotes the characters
$, ` and \ keep their special meanings, so these characters
must be escaped if they are meant literally. If neither
string argument to tr includes characters requiring
quoting, then the quotes are not needed. The following three
filters are equivalent:
tr a-z A-Z
tr "a-z" "A-Z"
tr 'a-z' 'A-Z'
The strings given to tr as arguments do not always require
quoting: quotes can be omitted when the strings contain no
characters that the shell would interpret in an undesired fashion.
It may be helpful to quote them anyway, and from now on we will
always quote strings. This has two benefits: firstly, it reminds
you that some characters may need to be escaped in the strings,
and secondly it may make it easier to see where the two
strings start and finish.
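One case where the quotes do matter is a string containing an asterisk, which an unquoted shell word would treat as a filename pattern. The sketch below quotes it so that tr receives the literal character; the function name stars_to_plus is an assumption made for illustration.

```shell
# stars_to_plus is a hypothetical helper name for this illustration.
# Quoted, the asterisk reaches tr unchanged and is treated as an
# ordinary character to translate. Left unquoted, the shell could
# replace it with the names of files in the current directory.
stars_to_plus() {
    tr '*' '+'
}

printf 'a*b*c\n' | stars_to_plus          # prints a+b+c
```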
Worked example
Write a filter to replace all double quotes by single
quotes.
Solution: The tricky part of this example is to
specify the strings correctly. The first string is a double quote,
but in order for it not to be interpreted by the shell, it must
either be preceded by a \ or enclosed by single
quotes. The second string is a single quote, which must likewise
either be escaped with a \ or enclosed in double quotes. Either
of the following two filters will solve the problem.
tr '"' "'"
tr \" \'
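Both forms can be checked against the same input, as in this short sketch. The sample sentence is invented for the purpose.

```shell
# Quoting each quote character with the other kind of quote:
printf 'say "hello"\n' | tr '"' "'"       # prints say 'hello'

# Escaping each quote character with a backslash:
printf 'say "hello"\n' | tr \" \'         # prints say 'hello'
```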
We can specify a string comprising a number of instances of a
single character: "[X*5]" is the same as
"XXXXX". The notation "[X*]" yields a
string containing enough copies of the character
X so that, if used as a component of the second string,
the second string is long enough to match the first one. For
instance, to replace all digits with a question mark, you could use
either of the following:
tr "0-9" "[?*10]"
tr "0-9" "[?*]"
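As a quick check that the counted and uncounted forms behave alike, the sketch below runs both on the same invented input line.

```shell
# Explicit repeat count: ten question marks, one per digit.
printf 'tel 555 0123\n' | tr "0-9" "[?*10]"   # prints tel ??? ????

# Open-ended repeat: as many question marks as the first string needs.
printf 'tel 555 0123\n' | tr "0-9" "[?*]"     # prints tel ??? ????
```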
Worked example
Write a filter to replace all letters in the first half of the
alphabet by A and all in the second half by Z.
Solution: Use tr, and note that there
are 13 letters in the first half of the alphabet, each having an
upper-case and a lower-case character. Thus the first half of the
alphabet is represented by a set of 26 characters, and so is the
second half.
tr "A-Ma-mN-Zn-z" "[A*26][Z*26]"
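To see the filter at work, it can be applied to a short phrase; the input here is chosen only for illustration.

```shell
# H, e, l and d fall in A-M or a-m and become A;
# o, W and r fall in N-Z or n-z and become Z.
printf 'Hello World\n' | tr "A-Ma-mN-Zn-z" "[A*26][Z*26]"   # prints AAAAZ ZZZAA
```

The space and the newline are in neither half of the first string, so they pass through unchanged.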
There are also options available to tr; with option
-d ('delete') and only one string as argument, all
occurrences of characters specified by that string are deleted.
With option -c ('complement') as well as
-d, all characters not occurring within the string are
deleted.
Worked example
Write a filter to delete all non-letter characters from the
input.
Solution: Use tr with option
-c to specify all non-alphabetic characters,
and -d to delete them.
tr -cd "A-Za-z"
Alternatively, use character classes:
tr -cd "[:alpha:]"
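A short trial makes one consequence visible: the newline at the end of each line is itself a non-letter, so it is deleted along with everything else. The sketch below, on an invented input, adds a line ending back with echo purely for tidy display.

```shell
# Delete every character that is not a letter, including spaces,
# punctuation, digits and the trailing newline.
printf 'Hello, World! 123\n' | tr -cd "[:alpha:]"   # prints HelloWorld
echo    # supply the line ending that tr removed
```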
After all other changes have been performed, repeated instances
of a character specified in the final string argument can be
replaced by single instances of the same character using option
-s ('squash'). In this case, the string passed to
tr represents those characters on which this operation
is performed. So to replace multiple spaces by single ones:
$ echo "hello    there   Chris" | tr -s " "
hello there Chris
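The same option can be combined with a translation, in which case the squashing is applied to the characters of the second string after the translation has taken place. The following is a small sketch on invented inputs.

```shell
# Squash runs of spaces only.
printf 'hello    there   Chris\n' | tr -s " "    # prints hello there Chris

# Translate a-c to A-C, then squash repeated A, B and C.
printf 'aabbcc\n' | tr -s "a-c" "A-C"            # prints ABC
```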