Collating sequence
Before considering these filters we must digress with some
remarks about characters. Specifically, we must ask the
question: 'how are they ordered?' We have already remarked that
each character is assigned a code (normally its ASCII
representation), and that the ordering of characters corresponds to
the numerical order of the codes. So, for instance, the code for
b is one greater than the code for a.
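The relationship between a and b can be checked from the shell. The following is a minimal sketch, assuming an ASCII-compatible encoding; the variable names code_a and code_b are only illustrative.

```shell
# Assumes an ASCII-compatible encoding (for example ASCII or UTF-8).
# A leading quote in the argument makes printf print the numeric
# code of the character that follows it.
code_a=$(printf '%d' "'a")
code_b=$(printf '%d' "'b")
echo "a=$code_a b=$code_b"
```

Under that assumption the codes printed are 97 and 98, so b follows a directly.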
There are two possible problems with this: first, ASCII is not
necessarily the code in use, and secondly, the code
representation, and with it the ordering of characters, differs
depending on which native language you speak. Although most UNIX
systems use standard English or American conventions and a standard
keyboard, POSIX allows for user interfaces consistent with other
languages and equipment. Where, for instance, do accented letters
fit in the alphabet, or completely different letters such as Greek?
We therefore have a concept called a collating sequence, which is a
specification of the logical ordering for the character set you are
using. In practice, this ordering matters mainly for letters and
digits, although it is defined for the whole character set. The
collating sequence can be changed in POSIX by amending the locale.
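As a rough illustration of how the locale affects ordering, the sketch below sorts four letters in the POSIX ("C") locale, where the collating sequence simply follows the character codes. The sample data and the variable name sorted_c are assumptions made for the example.

```shell
# LC_ALL=C selects the POSIX locale for this one command only.
# In that locale every upper-case letter sorts before every
# lower-case letter, because the upper-case codes are smaller.
sorted_c=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort | tr -d '\n')
echo "$sorted_c"
```

This prints ABab. On a system where another locale is installed (locale -a lists those available), setting LC_ALL to that locale instead may well give a different order, typically with upper-case and lower-case letters interleaved.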
In the following discussion we will refer to ranges, which are
collections of characters that are consecutive within the collating
sequence. A range is specified by a first and a last character,
separated by a hyphen. For instance, b-z refers to the characters
between b and z, inclusive, in the current collating sequence.
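A range such as b-z can be tried out with tr. This is a sketch only: it fixes the POSIX locale so that the range is unambiguous, and the input string and the variable name upper_range are invented for the example.

```shell
# Translate each character in the range b-z to the matching
# character in the range B-Z. The letter a lies outside the
# range and is left alone.
upper_range=$(printf 'abcxyz\n' | LC_ALL=C tr 'b-z' 'B-Z')
echo "$upper_range"
```

This prints aBCXYZ.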
Characters come in various familiar flavours: there are letters,
numbers, punctuation marks, and so on. These are character
classes, and some utilities use a notation for referring to
them. The form this takes is the name of a class enclosed between
[: and :].
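The notation can be seen at work in tr, one utility that accepts it. The sketch below assumes the standard POSIX class names digit, lower and upper, which are not listed in the text above; the sample strings and variable names are illustrative.

```shell
# Delete every character in the class of digits.
no_digits=$(printf 'Room 101, Floor 3B\n' | LC_ALL=C tr -d '[:digit:]')
# Map each lower-case letter to the corresponding upper-case letter.
shouted=$(printf 'collating sequence\n' | LC_ALL=C tr '[:lower:]' '[:upper:]')
echo "$no_digits"
echo "$shouted"
```

The first line printed is "Room , Floor B" and the second is "COLLATING SEQUENCE". Unlike a range, a class name does not depend on where its members sit in the collating sequence.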