version 1.10, 2013/01/25 11:51:42
version 1.11, 2015/08/22 14:04:54
Line 83 and obsolete REs (roughly those of

Obsolete REs mostly exist for backward compatibility in some old programs;
Obsolete REs mostly exist for backward compatibility in some old programs;
they will be discussed at the end.
they will be discussed at the end.
1003.2 leaves some aspects of RE syntax and semantics open;
1003.2 leaves some aspects of RE syntax and semantics open;
`#' marks decisions on these aspects that
`(*)' marks decisions on these aspects that
may not be fully portable to other 1003.2 implementations.
may not be fully portable to other 1003.2 implementations.
.Pp
.Pp
A (modern) RE is one# or more non-empty# |
A (modern) RE is one(*) or more non-empty(*) |
.Em branches , |
.Em branches , |
separated by `|'. |
separated by `|'. |
It matches anything that matches one of the branches. |
It matches anything that matches one of the branches. |
.Pp |
.Pp |
A branch is one# or more |
A branch is one(*) or more |
.Em pieces , |
.Em pieces , |
concatenated. |
concatenated. |
It matches a match for the first, followed by a match for the second, etc. |
It matches a match for the first, followed by a match for the second, etc. |
Line 99 It matches a match for the first, follow |
|
A piece is an |
A piece is an |
.Em atom |
.Em atom |
possibly followed |
possibly followed |
by a single# `*', `+', `?', or |
by a single(*) `*', `+', `?', or |
.Em bound . |
.Em bound . |
An atom followed by `*' matches a sequence of 0 or more matches of the atom. |
An atom followed by `*' matches a sequence of 0 or more matches of the atom. |
An atom followed by `+' matches a sequence of 1 or more matches of the atom. |
An atom followed by `+' matches a sequence of 1 or more matches of the atom. |
|
|
is `{' followed by an unsigned decimal integer, possibly followed by `,' |
is `{' followed by an unsigned decimal integer, possibly followed by `,' |
possibly followed by another unsigned decimal integer, |
possibly followed by another unsigned decimal integer, |
always followed by `}'. |
always followed by `}'. |
The integers must lie between 0 and RE_DUP_MAX (255#) inclusive, |
The integers must lie between 0 and RE_DUP_MAX (255(*)) inclusive, |
and if there are two of them, the first may not exceed the second. |
and if there are two of them, the first may not exceed the second. |
An atom followed by a bound containing one integer |
An atom followed by a bound containing one integer |
.Em i |
.Em i |
|
|
(inclusive) matches of the atom. |
(inclusive) matches of the atom. |
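
The bound rules above can be sketched as a small validator. This is only an illustration, not the library's parser: the name `parse_bound` is hypothetical, `RE_DUP_MAX` is set to the 255 quoted in the text (marked there as possibly non-portable), and the reading of `{i}`, `{i,}` and `{i,j}` as "exactly i", "i or more" and "i through j" follows the usual POSIX interpretation, since part of that description is not visible in this excerpt.

```python
import re

RE_DUP_MAX = 255  # value quoted in the text; other implementations may differ

# '{', an unsigned decimal integer, optionally ',' and optionally a second
# unsigned decimal integer, always followed by '}'.
_BOUND = re.compile(r"\{([0-9]+)(?:(,)([0-9]*))?\}")


def parse_bound(text):
    """Return (low, high) for a bound; high is None when it is open-ended."""
    m = _BOUND.fullmatch(text)
    if m is None:
        raise ValueError("not a bound: %r" % (text,))
    low = int(m.group(1))
    if m.group(2) is None:       # '{i}'   : one integer, no comma
        high = low
    elif m.group(3) == "":       # '{i,}'  : one integer and a comma
        high = None
    else:                        # '{i,j}' : two integers
        high = int(m.group(3))
    # Each integer must lie between 0 and RE_DUP_MAX inclusive.
    for n in (low, high):
        if n is not None and n > RE_DUP_MAX:
            raise ValueError("bound exceeds RE_DUP_MAX: %d" % n)
    # If there are two integers, the first may not exceed the second.
    if high is not None and low > high:
        raise ValueError("first integer exceeds second: %r" % (text,))
    return low, high
```

For instance, `parse_bound("{2,5}")` returns `(2, 5)`, while `parse_bound("{5,2}")` and `parse_bound("{256}")` both raise `ValueError`.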
.Pp |
.Pp |
An atom is a regular expression enclosed in `()' (matching a match for the |
An atom is a regular expression enclosed in `()' (matching a match for the |
regular expression), an empty set of `()' (matching the null string)#, a |
regular expression), an empty set of `()' (matching the null string)(*), a |
.Em bracket expression |
.Em bracket expression |
(see below), `.' (matching any single character), |
(see below), `.' (matching any single character), |
`^' (matching the null string at the beginning of a line), |
`^' (matching the null string at the beginning of a line), |
`$' (matching the null string at the end of a line), |
`$' (matching the null string at the end of a line), |
a `\e' followed by one of the characters `^.[$()|*+?{\e' |
a `\e' followed by one of the characters `^.[$()|*+?{\e' |
(matching that character taken as an ordinary character), |
(matching that character taken as an ordinary character), |
a `\e' followed by any other character# |
a `\e' followed by any other character(*) |
(matching that character taken as an ordinary character, |
(matching that character taken as an ordinary character, |
as if the `\e' had not been present#), |
as if the `\e' had not been present(*)), |
or a single character with no other significance (matching that character). |
or a single character with no other significance (matching that character). |
A `{' followed by a character other than a digit is an ordinary |
A `{' followed by a character other than a digit is an ordinary |
character, not the beginning of a bound#. |
character, not the beginning of a bound(*). |
It is illegal to end an RE with `\e'. |
It is illegal to end an RE with `\e'. |
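
As a rough illustration of the escaping rule for atoms, the sketch below quotes a literal string so that every character in it is taken as an ordinary character in a modern RE. The helper name `quote_literal` is hypothetical. The set of special characters is copied from the list above, where `\e` is the roff spelling of a backslash.

```python
# Characters that the text lists as needing a preceding backslash
# to be taken as ordinary characters in a modern RE.
SPECIAL = set("^.[$()|*+?{\\")


def quote_literal(text):
    """Prefix each special character with a backslash; leave others alone."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


print(quote_literal("1+1=2?"))  # prints 1\+1=2\?
```

Because each backslash in the input is itself escaped, the result never ends in an unpaired backslash, so it does not violate the rule against ending an RE with one.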
.Pp |
.Pp |
A |
A |
|
|
.Em range |
.Em range |
of characters between those two (inclusive) in the collating sequence, |
of characters between those two (inclusive) in the collating sequence, |
e.g. `[0-9]' in ASCII matches any decimal digit. |
e.g. `[0-9]' in ASCII matches any decimal digit. |
It is illegal# for two ranges to share an endpoint, e.g. `a-c-e'. |
It is illegal(*) for two ranges to share an endpoint, e.g. `a-c-e'. |
Ranges are very collating-sequence-dependent, |
Ranges are very collating-sequence-dependent, |
and portable programs should avoid relying on them. |
and portable programs should avoid relying on them. |
.Pp |
.Pp |
Line 194 of all collating elements equivalent to |
|
the treatment is as if the enclosing delimiters were `[.' and `.]'.) |
the treatment is as if the enclosing delimiters were `[.' and `.]'.) |
For example, if o and '\(^o' are the members of an equivalence class, |
For example, if o and '\(^o' are the members of an equivalence class, |
then `[[=o=]]', `[[=\(^o'=]]', and `[o\(^o']' are all synonymous. |
then `[[=o=]]', `[[=\(^o'=]]', and `[o\(^o']' are all synonymous. |
An equivalence class may not# be an endpoint |
An equivalence class may not(*) be an endpoint |
of a range. |
of a range. |
.Pp |
.Pp |
Within a bracket expression, the name of a |
Within a bracket expression, the name of a |
Line 214 These stand for the character classes de |
|
A locale may provide others. |
A locale may provide others. |
A character class may not be used as an endpoint of a range. |
A character class may not be used as an endpoint of a range. |
.Pp |
.Pp |
There are two special cases# of bracket expressions: |
There are two special cases(*) of bracket expressions: |
the bracket expressions `[[:\*[Lt]:]]' and `[[:\*[Gt]:]]' match |
the bracket expressions `[[:\*[Lt]:]]' and `[[:\*[Gt]:]]' match |
the null string at the beginning and end of a word respectively. |
the null string at the beginning and end of a word respectively. |
A word is defined as a sequence of word characters |
A word is defined as a sequence of word characters |
Line 260 When it appears inside a bracket express |
|
of it are added to the bracket expression, so that (e.g.) `[x]' |
of it are added to the bracket expression, so that (e.g.) `[x]' |
becomes `[xX]' and `[^x]' becomes `[^xX]'. |
becomes `[xX]' and `[^x]' becomes `[^xX]'. |
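
The case-independent rewriting of a bracket expression can be sketched as follows. This is a simplified model under stated assumptions, not the library's implementation: the hypothetical helper `fold_bracket` handles only bracket expressions made of plain single characters with an optional leading `^`, and it considers only ASCII letters. Ranges, collating elements, equivalence classes and character classes are out of scope.

```python
import string


def fold_bracket(expr):
    """Add the other-case counterpart of each ASCII letter in '[...]'."""
    if len(expr) < 3 or not (expr.startswith("[") and expr.endswith("]")):
        raise ValueError("not a simple bracket expression: %r" % (expr,))
    body = expr[1:-1]
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    members = []
    for ch in body:
        # A letter contributes both of its case counterparts;
        # any other character contributes only itself.
        variants = (ch, ch.swapcase()) if ch in string.ascii_letters else (ch,)
        for v in variants:
            if v not in members:
                members.append(v)
    return "[" + ("^" if negated else "") + "".join(members) + "]"
```

With this sketch, `fold_bracket("[x]")` gives `[xX]` and `fold_bracket("[^x]")` gives `[^xX]`, matching the two examples in the text.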
.Pp |
.Pp |
No particular limit is imposed on the length of REs#. |
No particular limit is imposed on the length of REs(*). |
Programs intended to be portable should not employ REs longer |
Programs intended to be portable should not employ REs longer |
than 256 bytes, |
than 256 bytes, |
as an implementation can refuse to accept such REs and remain |
as an implementation can refuse to accept such REs and remain |
Line 274 with `{' and `}' by themselves ordinary |
|
The parentheses for nested subexpressions are `\e(' and `\e)', |
The parentheses for nested subexpressions are `\e(' and `\e)', |
with `(' and `)' by themselves ordinary characters. |
with `(' and `)' by themselves ordinary characters. |
`^' is an ordinary character except at the beginning of the |
`^' is an ordinary character except at the beginning of the |
RE or# the beginning of a parenthesized subexpression, |
RE or(*) the beginning of a parenthesized subexpression, |
`$' is an ordinary character except at the end of the |
`$' is an ordinary character except at the end of the |
RE or# the end of a parenthesized subexpression, |
RE or(*) the end of a parenthesized subexpression, |
and `*' is an ordinary character if it appears at the beginning of the |
and `*' is an ordinary character if it appears at the beginning of the |
RE or the beginning of a parenthesized subexpression |
RE or the beginning of a parenthesized subexpression |
(after a possible leading `^'). |
(after a possible leading `^'). |