src/lib/libc/regex/re_format.7 - diff

Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.

Diff for /src/lib/libc/regex/re_format.7 between version 1.10 and 1.11

version 1.10, 2013/01/25 11:51:42

version 1.11, 2015/08/22 14:04:54

Line 83 and obsolete REs (roughly those of

Obsolete REs mostly exist for backward compatibility in some old programs;

they will be discussed at the end.

1003.2 leaves some aspects of RE syntax and semantics open;

`#' marks decisions on these aspects that

`(*)' marks decisions on these aspects that

may not be fully portable to other 1003.2 implementations.

.Pp

A (modern) RE is one# or more non-empty#

A (modern) RE is one(*) or more non-empty(*)

.Em branches ,

separated by `|'.

It matches anything that matches one of the branches.

.Pp

A branch is one# or more

A branch is one(*) or more

.Em pieces ,

concatenated.

It matches a match for the first, followed by a match for the second, etc.
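As an illustration of branches and pieces, the following is a minimal sketch that compiles a modern RE with the POSIX regcomp() and regexec() functions. The helper name ere_matches is hypothetical and is not part of the manual page; only behaviour that 1003.2 fixes is exercised, so none of the marked decisions is relied on.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the manual page): returns 1 if the
 * modern (extended) RE `pattern` matches anywhere in `text`,
 * 0 if it does not, and -1 if regcomp() rejects the pattern.
 */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_EXTENDED selects modern REs; REG_NOSUB skips submatch reporting. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, ere_matches("cat|dog", "hotdog") succeeds through the second branch, while ere_matches("ab+c", "xabbbc") shows three pieces (`a', `b+', `c') concatenated into one branch.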

Line 99 It matches a match for the first, follow

A piece is an

.Em atom

possibly followed

by a single# `*', `+', `?', or

by a single(*) `*', `+', `?', or

.Em bound .

An atom followed by `*' matches a sequence of 0 or more matches of the atom.

An atom followed by `+' matches a sequence of 1 or more matches of the atom.

Line 110 A

is `{' followed by an unsigned decimal integer, possibly followed by `,'

possibly followed by another unsigned decimal integer,

always followed by `}'.

The integers must lie between 0 and RE_DUP_MAX (255#) inclusive,

The integers must lie between 0 and RE_DUP_MAX (255(*)) inclusive,

and if there are two of them, the first may not exceed the second.
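The bound rules above can be checked with a short sketch. The helper name bound_matches is hypothetical; the integers used stay far below RE_DUP_MAX, so the example does not depend on the implementation-specific limit. The last pattern has a first integer larger than the second, which regcomp() is expected to reject.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the manual page): 1 = match,
 * 0 = no match, -1 = regcomp() refused to compile the pattern.
 */
int bound_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;  /* e.g. a malformed bound such as {3,2} */

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

The patterns are anchored with `^' and `$' in the checks because an unanchored `a{2,3}' would also match inside a longer run of `a' characters.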

An atom followed by a bound containing one integer

.Em i

Line 133 through

(inclusive) matches of the atom.

.Pp

An atom is a regular expression enclosed in `()' (matching a match for the

regular expression), an empty set of `()' (matching the null string)#, a

regular expression), an empty set of `()' (matching the null string)(*), a

.Em bracket expression

(see below), `.' (matching any single character),

`^' (matching the null string at the beginning of a line),

`$' (matching the null string at the end of a line),

a `\e' followed by one of the characters `^.[$()|*+?{\e'

(matching that character taken as an ordinary character),

a `\e' followed by any other character#

a `\e' followed by any other character(*)

(matching that character taken as an ordinary character,

as if the `\e' had not been present#),

as if the `\e' had not been present(*)),

or a single character with no other significance (matching that character).

A `{' followed by a character other than a digit is an ordinary

character, not the beginning of a bound#.

character, not the beginning of a bound(*).

It is illegal to end an RE with `\e'.
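A small sketch of the escaping rules, using a hypothetical helper named atom_matches. It only escapes characters from the listed set `^.[$()|*+?{\e', since the treatment of a backslash before any other character is one of the marked, possibly non-portable decisions. Note that in C source each backslash in the RE must itself be doubled.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the manual page): 1 = match,
 * 0 = no match, -1 = regcomp() refused to compile the pattern.
 */
int atom_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;  /* e.g. an RE that ends with a lone backslash */

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Here "a\\.b" is the RE `a\e.b', which matches only a literal dot between `a' and `b', whereas "a.b" accepts any single character in that position.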

.Pp

Line 161 for the full

.Em range

of characters between those two (inclusive) in the collating sequence,

e.g. `[0-9]' in ASCII matches any decimal digit.

It is illegal# for two ranges to share an endpoint, e.g. `a-c-e'.

It is illegal(*) for two ranges to share an endpoint, e.g. `a-c-e'.

Ranges are very collating-sequence-dependent,

and portable programs should avoid relying on them.

.Pp

Line 194 of all collating elements equivalent to

the treatment is as if the enclosing delimiters were `[.' and `.]'.)

For example, if o and '\(^o' are the members of an equivalence class,

then `[[=o=]]', `[[=\(^o'=]]', and `[o\(^o']' are all synonymous.

An equivalence class may not# be an endpoint

An equivalence class may not(*) be an endpoint

of a range.

.Pp

Within a bracket expression, the name of a

Line 214 These stand for the character classes de

A locale may provide others.

A character class may not be used as an endpoint of a range.
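Because ranges depend on the collating sequence, a character class is often the more portable way to express the same intent. The sketch below assumes the standard class names `digit', `alpha', and `alnum'; the helper name class_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the manual page): 1 = match,
 * 0 = no match, -1 = regcomp() refused to compile the pattern.
 */
int class_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, `^[[:digit:]]+$' accepts a string of decimal digits without relying on `0' through `9' being contiguous in the collating sequence, and a class can be combined with ordinary characters inside the same bracket expression, as in `[[:alpha:]_]'.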

.Pp

There are two special cases# of bracket expressions:

There are two special cases(*) of bracket expressions:

the bracket expressions `[[:\*[Lt]:]]' and `[[:\*[Gt]:]]' match

the null string at the beginning and end of a word respectively.

A word is defined as a sequence of word characters

Line 260 When it appears inside a bracket express

of it are added to the bracket expression, so that (e.g.) `[x]'

becomes `[xX]' and `[^x]' becomes `[^xX]'.
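The effect on bracket expressions can be observed with a sketch like the following. It assumes that case-independent matching is requested through the REG_ICASE flag of regcomp(); the helper name icase_matches and its `icase' parameter are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the manual page): compiles `pattern`
 * as a modern RE, with REG_ICASE added when `icase` is non-zero.
 * Returns 1 = match, 0 = no match, -1 = compile failure.
 */
int icase_matches(const char *pattern, const char *text, int icase)
{
    regex_t re;
    int cflags = REG_EXTENDED | REG_NOSUB;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (icase)
        cflags |= REG_ICASE;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the flag set, `[x]' behaves like `[xX]' and so matches "X", while `^[^x]$' behaves like `^[^xX]$' and no longer matches "X".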

.Pp

No particular limit is imposed on the length of REs#.

No particular limit is imposed on the length of REs(*).

Programs intended to be portable should not employ REs longer

than 256 bytes,

as an implementation can refuse to accept such REs and remain

Line 274 with `{' and `}' by themselves ordinary

The parentheses for nested subexpressions are `\e(' and `\e)',

with `(' and `)' by themselves ordinary characters.

`^' is an ordinary character except at the beginning of the

RE or# the beginning of a parenthesized subexpression,

RE or(*) the beginning of a parenthesized subexpression,

`$' is an ordinary character except at the end of the

RE or# the end of a parenthesized subexpression,

RE or(*) the end of a parenthesized subexpression,

and `*' is an ordinary character if it appears at the beginning of the

RE or the beginning of a parenthesized subexpression

(after a possible leading `^').
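The differences for obsolete REs can be sketched by calling regcomp() without REG_EXTENDED. The helper name bre_matches is hypothetical. The checks avoid the marked `^' and `$' cases inside parenthesized subexpressions and only anchor at the start and end of the whole RE.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the manual page): compiles `pattern`
 * as an obsolete (basic) RE by omitting REG_EXTENDED.
 * Returns 1 = match, 0 = no match, -1 = compile failure.
 */
int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

In this syntax `^a(b)$' matches the literal text "a(b)", grouping needs the escaped form as in `^\e(ab\e)*$', and the `*' in `^*a$' is an ordinary character because it follows only the leading `^'.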
