Annotation of src/lib/libc/regex/regex.3, Revision 1.18
1.18 ! wiz 1: .\" $NetBSD: regex.3,v 1.17 2003/08/07 16:43:20 agc Exp $
1.4 cgd 2: .\"
1.3 cgd 3: .\" Copyright (c) 1992, 1993, 1994
4: .\" The Regents of the University of California. All rights reserved.
1.17 agc 5: .\"
6: .\" This code is derived from software contributed to Berkeley by
7: .\" Henry Spencer.
8: .\"
9: .\" Redistribution and use in source and binary forms, with or without
10: .\" modification, are permitted provided that the following conditions
11: .\" are met:
12: .\" 1. Redistributions of source code must retain the above copyright
13: .\" notice, this list of conditions and the following disclaimer.
14: .\" 2. Redistributions in binary form must reproduce the above copyright
15: .\" notice, this list of conditions and the following disclaimer in the
16: .\" documentation and/or other materials provided with the distribution.
17: .\" 3. Neither the name of the University nor the names of its contributors
18: .\" may be used to endorse or promote products derived from this software
19: .\" without specific prior written permission.
20: .\"
21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
24: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
31: .\" SUCH DAMAGE.
32: .\"
33: .\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
1.3 cgd 34: .\"
35: .\" This code is derived from software contributed to Berkeley by
36: .\" Henry Spencer.
37: .\"
38: .\" Redistribution and use in source and binary forms, with or without
39: .\" modification, are permitted provided that the following conditions
40: .\" are met:
41: .\" 1. Redistributions of source code must retain the above copyright
42: .\" notice, this list of conditions and the following disclaimer.
43: .\" 2. Redistributions in binary form must reproduce the above copyright
44: .\" notice, this list of conditions and the following disclaimer in the
45: .\" documentation and/or other materials provided with the distribution.
46: .\" 3. All advertising materials mentioning features or use of this software
47: .\" must display the following acknowledgement:
48: .\" This product includes software developed by the University of
49: .\" California, Berkeley and its contributors.
50: .\" 4. Neither the name of the University nor the names of its contributors
51: .\" may be used to endorse or promote products derived from this software
52: .\" without specific prior written permission.
53: .\"
54: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
55: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
56: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
57: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
58: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
59: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
60: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
61: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
62: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
63: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
64: .\" SUCH DAMAGE.
65: .\"
66: .\" @(#)regex.3 8.4 (Berkeley) 3/20/94
67: .\"
1.18 ! wiz 68: .Dd December 29, 2003
1.5 kleink 69: .Dt REGEX 3
70: .Os
71: .Sh NAME
72: .Nm regex ,
73: .Nm regcomp ,
74: .Nm regexec ,
75: .Nm regerror ,
76: .Nm regfree
77: .Nd regular-expression library
1.7 perry 78: .Sh LIBRARY
79: .Lb libc
1.5 kleink 80: .Sh SYNOPSIS
1.16 wiz 81: .In regex.h
1.5 kleink 82: .Ft int
1.15 kleink 83: .Fn regcomp "regex_t * restrict preg" "const char * restrict pattern" "int cflags"
1.5 kleink 84: .Ft int
1.15 kleink 85: .Fn regexec "const regex_t * restrict preg" "const char * restrict string" "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
1.5 kleink 86: .Ft size_t
1.15 kleink 87: .Fn regerror "int errcode" "const regex_t * restrict preg" "char * restrict errbuf" "size_t errbuf_size"
1.5 kleink 88: .Ft void
89: .Fn regfree "regex_t *preg"
90: .Sh DESCRIPTION
91: These routines implement
92: .St -p1003.2-92
93: regular expressions (``RE''s);
1.1 jtc 94: see
1.5 kleink 95: .Xr re_format 7 .
96: .Fn regcomp
1.1 jtc 97: compiles an RE written as a string into an internal form,
1.5 kleink 98: .Fn regexec
1.1 jtc 99: matches that internal form against a string and reports results,
1.5 kleink 100: .Fn regerror
1.1 jtc 101: transforms error codes from either into human-readable messages,
102: and
1.5 kleink 103: .Fn regfree
1.1 jtc 104: frees any dynamically-allocated storage used by the internal form
105: of an RE.
1.5 kleink 106: .Pp
1.1 jtc 107: The header
1.12 ross 108: .Em \*[Lt]regex.h\*[Gt]
1.1 jtc 109: declares two structure types,
1.5 kleink 110: .Fa regex_t
1.1 jtc 111: and
1.5 kleink 112: .Fa regmatch_t ,
1.1 jtc 113: the former for compiled internal forms and the latter for match reporting.
114: It also declares the four functions,
115: a type
1.5 kleink 116: .Fa regoff_t ,
1.1 jtc 117: and a number of constants with names starting with ``REG_''.
1.5 kleink 118: .Pp
1.9 lukem 119: .Fn regcomp
1.1 jtc 120: compiles the regular expression contained in the
1.5 kleink 121: .Fa pattern
1.1 jtc 122: string,
123: subject to the flags in
1.5 kleink 124: .Fa cflags ,
1.1 jtc 125: and places the results in the
1.5 kleink 126: .Fa regex_t
1.1 jtc 127: structure pointed to by
1.5 kleink 128: .Fa preg .
1.9 lukem 129: .Fa cflags
1.1 jtc 130: is the bitwise OR of zero or more of the following flags:
1.5 kleink 131: .Bl -tag -width XXXREG_EXTENDED
132: .It Dv REG_EXTENDED
133: Compile modern (``extended'') REs, rather than the obsolete
134: (``basic'') REs that are the default.
135: .It Dv REG_BASIC
1.1 jtc 136: This is a synonym for 0,
137: provided as a counterpart to REG_EXTENDED to improve readability.
1.5 kleink 138: .It Dv REG_NOSPEC
1.14 wiz 139: Compile with recognition of all special characters turned off.
140: All characters are thus considered ordinary, so the ``RE'' is a literal
1.5 kleink 141: string.
142: This is an extension, compatible with but not specified by
143: .St -p1003.2-92 ,
144: and should be used with caution in software intended to be portable to
145: other systems.
146: .Dv REG_EXTENDED
147: and
148: .Dv REG_NOSPEC
149: may not be used in the same call to
150: .Fn regcomp .
151: .It Dv REG_ICASE
1.14 wiz 152: Compile for matching that ignores upper/lower case distinctions.
153: See
1.5 kleink 154: .Xr re_format 7 .
155: .It Dv REG_NOSUB
156: Compile for matching that need only report success or failure, not
157: what was matched.
158: .It Dv REG_NEWLINE
1.1 jtc 159: Compile for newline-sensitive matching.
160: By default, newline is a completely ordinary character with no special
161: meaning in either REs or strings.
162: With this flag,
163: `[^' bracket expressions and `.' never match newline,
164: a `^' anchor matches the null string after any newline in the string
165: in addition to its normal function,
166: and the `$' anchor matches the null string before any newline in the
167: string in addition to its normal function.
1.5 kleink 168: .It Dv REG_PEND
169: The regular expression ends, not at the first NUL, but just before the
170: character pointed to by the
171: .Fa re_endp
1.1 jtc 172: member of the structure pointed to by
1.5 kleink 173: .Fa preg .
1.1 jtc 174: The
1.5 kleink 175: .Fa re_endp
1.1 jtc 176: member is of type
1.5 kleink 177: .Fa "const\ char\ *" .
178: This flag permits inclusion of NULs in the RE; they are considered
179: ordinary characters.
180: This is an extension, compatible with but not specified by
181: .St -p1003.2-92 ,
182: and should be used with caution in software intended to be portable to
183: other systems.
184: .El
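The effect of REG_NEWLINE on the anchors can be seen in a small sketch.
The helper name any_line_matches and its newline_sensitive parameter are
illustrative assumptions, not part of the library.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the extended RE `pattern` matches
 * somewhere in `text`, 0 if it does not, and -1 if the pattern fails to
 * compile or regexec() reports an unexpected error.  When
 * newline_sensitive is non-zero the RE is compiled with REG_NEWLINE, so
 * `^' and `$' also match after and before embedded newlines.
 */
int
any_line_matches(const char *pattern, const char *text, int newline_sensitive)
{
    regex_t re;
    int cflags = REG_EXTENDED | REG_NOSUB;
    int rc;

    if (newline_sensitive)
        cflags |= REG_NEWLINE;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

With the text "one\ntwo\nthree", the pattern "^two$" is expected to match
only when REG_NEWLINE is in effect, since otherwise the anchors apply
only to the ends of the whole string.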
185: .Pp
1.1 jtc 186: When successful,
1.5 kleink 187: .Fn regcomp
1.1 jtc 188: returns 0 and fills in the structure pointed to by
1.5 kleink 189: .Fa preg .
190: One member of that structure (other than
191: .Fa re_endp )
1.1 jtc 192: is publicized:
1.5 kleink 193: .Fa re_nsub ,
1.1 jtc 194: of type
1.5 kleink 195: .Fa size_t ,
1.1 jtc 196: contains the number of parenthesized subexpressions within the RE
197: (except that the value of this member is undefined if the
1.5 kleink 198: .Dv REG_NOSUB
199: flag was used).
1.1 jtc 200: If
1.5 kleink 201: .Fn regcomp
1.1 jtc 202: fails, it returns a non-zero error code;
1.11 wiz 203: see
204: .Sx DIAGNOSTICS .
1.5 kleink 205: .Pp
1.9 lukem 206: .Fn regexec
1.1 jtc 207: matches the compiled RE pointed to by
1.5 kleink 208: .Fa preg
1.1 jtc 209: against the
1.5 kleink 210: .Fa string ,
1.1 jtc 211: subject to the flags in
1.5 kleink 212: .Fa eflags ,
1.1 jtc 213: and reports results using
1.5 kleink 214: .Fa nmatch ,
215: .Fa pmatch ,
1.1 jtc 216: and the returned value.
217: The RE must have been compiled by a previous invocation of
1.5 kleink 218: .Fn regcomp .
1.1 jtc 219: The compiled form is not altered during execution of
1.5 kleink 220: .Fn regexec ,
1.1 jtc 221: so a single compiled RE can be used simultaneously by multiple threads.
1.5 kleink 222: .Pp
1.1 jtc 223: By default,
224: the NUL-terminated string pointed to by
1.5 kleink 225: .Fa string
1.1 jtc 226: is considered to be the text of an entire line, minus any terminating
227: newline.
228: The
1.5 kleink 229: .Fa eflags
1.1 jtc 230: argument is the bitwise OR of zero or more of the following flags:
1.5 kleink 231: .Bl -tag -width XXXREG_NOTBOL
232: .It Dv REG_NOTBOL
233: The first character of the string
1.1 jtc 234: is not the beginning of a line, so the `^' anchor should not match before it.
1.5 kleink 235: This does not affect the behavior of newlines under
236: .Dv REG_NEWLINE .
237: .It Dv REG_NOTEOL
238: The NUL terminating the string does not end a line, so the `$' anchor
1.14 wiz 239: should not match before it.
240: This does not affect the behavior of newlines under
1.5 kleink 241: .Dv REG_NEWLINE .
242: .It Dv REG_STARTEND
1.1 jtc 243: The string is considered to start at
1.5 kleink 244: .Fa string
245: +
246: .Fa pmatch[0].rm_so
1.1 jtc 247: and to have a terminating NUL located at
1.5 kleink 248: .Fa string
249: +
250: .Fa pmatch[0].rm_eo
1.1 jtc 251: (there need not actually be a NUL at that location),
252: regardless of the value of
1.5 kleink 253: .Fa nmatch .
1.1 jtc 254: See below for the definition of
1.5 kleink 255: .Fa pmatch
1.1 jtc 256: and
1.5 kleink 257: .Fa nmatch .
258: This is an extension, compatible with but not specified by
259: .St -p1003.2-92 ,
260: and should be used with caution in software intended to be portable to
261: other systems.
262: Note that a non-zero
263: .Fa rm_so
264: does not imply
265: .Dv REG_NOTBOL ;
266: .Dv REG_STARTEND
267: affects only the location of the string, not how it is matched.
268: .El
269: .Pp
1.1 jtc 270: See
1.5 kleink 271: .Xr re_format 7
1.1 jtc 272: for a discussion of what is matched in situations where an RE or a
273: portion thereof could match any of several substrings of
1.5 kleink 274: .Fa string .
275: .Pp
1.1 jtc 276: Normally,
1.5 kleink 277: .Fn regexec
278: returns 0 for success and the non-zero code
279: .Dv REG_NOMATCH
280: for failure.
1.1 jtc 281: Other non-zero error codes may be returned in exceptional situations;
1.11 wiz 282: see
283: .Sx DIAGNOSTICS .
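A minimal sketch of the compile, match, free sequence described above
follows.
The helper name re_matches is an assumption made for illustration; only
regcomp(), regexec(), regfree() and the REG_ constants come from the
library.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the library): returns 1 if `text`
 * matches the extended RE `pattern`, 0 if it does not, and -1 if the
 * pattern fails to compile or regexec() reports an unexpected error.
 */
int
re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only success or failure is wanted, not offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);   /* release storage held by the compiled form */

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

Compiling on every call keeps the sketch short; code that matches the
same RE repeatedly would normally compile once and reuse the regex_t.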
1.5 kleink 284: .Pp
285: If
286: .Dv REG_NOSUB
287: was specified in the compilation of the RE, or if
288: .Fa nmatch
1.1 jtc 289: is 0,
1.5 kleink 290: .Fn regexec
1.1 jtc 291: ignores the
1.5 kleink 292: .Fa pmatch
293: argument (but see below for the case where
294: .Dv REG_STARTEND
295: is specified).
1.1 jtc 296: Otherwise,
1.5 kleink 297: .Fa pmatch
1.1 jtc 298: points to an array of
1.5 kleink 299: .Fa nmatch
1.1 jtc 300: structures of type
1.5 kleink 301: .Fa regmatch_t .
1.1 jtc 302: Such a structure has at least the members
1.5 kleink 303: .Fa rm_so
1.1 jtc 304: and
1.5 kleink 305: .Fa rm_eo ,
1.1 jtc 306: both of type
1.5 kleink 307: .Fa regoff_t
1.1 jtc 308: (a signed arithmetic type at least as large as an
1.5 kleink 309: .Fa off_t
1.1 jtc 310: and a
1.5 kleink 311: .Fa ssize_t ) ,
1.1 jtc 312: containing respectively the offset of the first character of a substring
313: and the offset of the first character after the end of the substring.
314: Offsets are measured from the beginning of the
1.5 kleink 315: .Fa string
1.1 jtc 316: argument given to
1.5 kleink 317: .Fn regexec .
1.1 jtc 318: An empty substring is denoted by equal offsets,
319: both indicating the character following the empty substring.
1.5 kleink 320: .Pp
1.1 jtc 321: The 0th member of the
1.5 kleink 322: .Fa pmatch
1.1 jtc 323: array is filled in to indicate what substring of
1.5 kleink 324: .Fa string
1.1 jtc 325: was matched by the entire RE.
326: Remaining members report what substring was matched by parenthesized
327: subexpressions within the RE;
328: member
1.5 kleink 329: .Fa i
1.1 jtc 330: reports subexpression
1.5 kleink 331: .Fa i ,
332: with subexpressions counted (starting at 1) by the order of their
333: opening parentheses in the RE, left to right.
1.1 jtc 334: Unused entries in the array\(emcorresponding either to subexpressions that
335: did not participate in the match at all, or to subexpressions that do not
1.5 kleink 336: exist in the RE (that is,
337: .Fa i
1.12 ross 338: \*[Gt]
339: .Fa preg-\*[Gt]re_nsub )
1.5 kleink 340: \(emhave both
341: .Fa rm_so
1.1 jtc 342: and
1.5 kleink 343: .Fa rm_eo
344: set to -1.
1.1 jtc 345: If a subexpression participated in the match several times,
346: the reported substring is the last one it matched.
347: (Note, as a particular example, that when the RE `(b*)+' matches `bbb',
348: the parenthesized subexpression matches each of the three `b's and then
349: an infinite number of empty strings following the last `b',
350: so the reported substring is one of the empties.)
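The use of the pmatch array for subexpressions can be sketched as
follows.
The names split_key_value and copy_match, and the "key=value" pattern,
are hypothetical and chosen only for the example.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the substring described by m into dst, truncating if needed. */
static void
copy_match(char *dst, size_t dstsz, const char *s, const regmatch_t *m)
{
    size_t len = (size_t)(m->rm_eo - m->rm_so);

    if (dstsz == 0)
        return;
    if (len >= dstsz)
        len = dstsz - 1;
    memcpy(dst, s + m->rm_so, len);
    dst[len] = '\0';
}

/*
 * Hypothetical helper: match "key=value" in `line` and copy the two
 * parenthesized subexpressions into the caller's buffers.
 * Returns 0 on success, -1 on no match or error.
 */
int
split_key_value(const char *line, char *key, size_t keysz,
    char *val, size_t valsz)
{
    regex_t re;
    regmatch_t pm[3];   /* pm[0]: whole match; pm[1], pm[2]: groups */
    int rc;

    if (regcomp(&re, "^([A-Za-z_]+)=(.*)$", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, line, 3, pm, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    /* rm_so and rm_eo are offsets from the start of `line`. */
    copy_match(key, keysz, line, &pm[1]);
    copy_match(val, valsz, line, &pm[2]);
    return 0;
}
```

Both groups always participate when this particular RE matches, so the
sketch does not check for offsets of -1; an RE with optional groups
would need that check before copying.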
1.5 kleink 351: .Pp
352: If
353: .Dv REG_STARTEND
354: is specified,
355: .Fa pmatch
1.1 jtc 356: must point to at least one
1.5 kleink 357: .Fa regmatch_t
1.1 jtc 358: (even if
1.5 kleink 359: .Fa nmatch
360: is 0 or
361: .Dv REG_NOSUB
362: was specified),
363: to hold the input offsets for
364: .Dv REG_STARTEND .
1.1 jtc 365: Use for output is still entirely controlled by
1.5 kleink 366: .Fa nmatch ;
1.1 jtc 367: if
1.5 kleink 368: .Fa nmatch
369: is 0 or
370: .Dv REG_NOSUB
371: was specified,
1.1 jtc 372: the value of
1.5 kleink 373: .Fa pmatch [0]
1.1 jtc 374: will not be changed by a successful
1.5 kleink 375: .Fn regexec .
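A sketch of searching part of a buffer with REG_STARTEND follows.
The helper name search_range is an assumption, and the code compiles
only on systems whose regex.h provides this extension.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: search only bytes [start, end) of buf for the
 * extended RE `pattern`.  Returns the offset of the match measured from
 * buf, or -1 on no match or error.  Relies on the REG_STARTEND
 * extension.
 */
long
search_range(const char *pattern, const char *buf, size_t start, size_t end)
{
    regex_t re;
    regmatch_t pm[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    pm[0].rm_so = (regoff_t)start;  /* input: where the string starts */
    pm[0].rm_eo = (regoff_t)end;    /* input: where the notional NUL is */
    rc = regexec(&re, buf, 1, pm, REG_STARTEND);
    regfree(&re);

    /* On success pm[0] is overwritten with offsets relative to buf. */
    return (rc == 0) ? (long)pm[0].rm_so : -1;
}
```

Because nmatch is 1 here, pmatch[0] serves as both input and output; the
same array element carries the range in and the match offsets out.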
376: .Pp
1.9 lukem 377: .Fn regerror
1.1 jtc 378: maps a non-zero
1.5 kleink 379: .Fa errcode
1.1 jtc 380: from either
1.5 kleink 381: .Fn regcomp
1.1 jtc 382: or
1.5 kleink 383: .Fn regexec
1.1 jtc 384: to a human-readable, printable message.
385: If
1.5 kleink 386: .Fa preg
1.1 jtc 387: is non-NULL,
1.5 kleink 388: the error code should have arisen from use of the
389: .Fa regex_t
1.1 jtc 390: pointed to by
1.5 kleink 391: .Fa preg ,
1.1 jtc 392: and if the error code came from
1.5 kleink 393: .Fn regcomp ,
1.1 jtc 394: it should have been the result from the most recent
1.5 kleink 395: .Fn regcomp
1.1 jtc 396: using that
1.5 kleink 397: .Fa regex_t . (
1.9 lukem 398: .Fn regerror
1.1 jtc 399: may be able to supply a more detailed message using information
400: from the
1.5 kleink 401: .Fa regex_t . )
1.9 lukem 402: .Fn regerror
1.1 jtc 403: places the NUL-terminated message into the buffer pointed to by
1.5 kleink 404: .Fa errbuf ,
1.1 jtc 405: limiting the length (including the NUL) to at most
1.5 kleink 406: .Fa errbuf_size
1.1 jtc 407: bytes.
408: If the whole message won't fit,
409: as much of it as will fit before the terminating NUL is supplied.
410: In any case,
411: the returned value is the size of buffer needed to hold the whole
412: message (including terminating NUL).
413: If
1.5 kleink 414: .Fa errbuf_size
1.1 jtc 415: is 0,
1.5 kleink 416: .Fa errbuf
1.1 jtc 417: is ignored but the return value is still correct.
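The size-query behavior lends itself to a two-call pattern, sketched
below.
The helper name re_strerror is hypothetical, and the message text itself
varies between implementations.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed message for errcode, or NULL
 * if allocation fails.  The caller frees the result.
 */
char *
re_strerror(int errcode, const regex_t *preg)
{
    size_t need;
    char *msg;

    /* With errbuf_size 0, errbuf is ignored and the size is returned. */
    need = regerror(errcode, preg, NULL, 0);
    if (need == 0)
        return NULL;

    msg = malloc(need);
    if (msg != NULL)
        (void)regerror(errcode, preg, msg, need);
    return msg;
}
```

The first call only measures; the second fills a buffer of exactly the
reported size, so the message is never truncated.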
1.5 kleink 418: .Pp
1.1 jtc 419: If the
1.5 kleink 420: .Fa errcode
1.1 jtc 421: given to
1.5 kleink 422: .Fn regerror
423: is first ORed with
424: .Dv REG_ITOA ,
1.1 jtc 425: the ``message'' that results is the printable name of the error code,
426: e.g. ``REG_NOMATCH'',
427: rather than an explanation thereof.
428: If
1.5 kleink 429: .Fa errcode
1.10 jdolecek 430: is
1.5 kleink 431: .Dv REG_ATOI ,
1.1 jtc 432: then
1.5 kleink 433: .Fa preg
1.1 jtc 434: shall be non-NULL and the
1.5 kleink 435: .Fa re_endp
1.1 jtc 436: member of the structure it points to
437: must point to the printable name of an error code;
438: in this case, the result in
1.5 kleink 439: .Fa errbuf
1.1 jtc 440: is the decimal digits of
441: the numeric value of the error code
442: (0 if the name is not recognized).
1.5 kleink 443: .Dv REG_ITOA
444: and
445: .Dv REG_ATOI
446: are intended primarily as debugging facilities;
447: they are extensions, compatible with but not specified by
448: .St -p1003.2-92 ,
449: and should be used with caution in software intended to be portable to
450: other systems.
1.1 jtc 451: Be warned also that they are considered experimental and changes are possible.
1.5 kleink 452: .Pp
1.9 lukem 453: .Fn regfree
1.1 jtc 454: frees any dynamically-allocated storage associated with the compiled RE
455: pointed to by
1.5 kleink 456: .Fa preg .
1.1 jtc 457: The remaining
1.5 kleink 458: .Fa regex_t
1.1 jtc 459: is no longer a valid compiled RE
460: and the effect of supplying it to
1.5 kleink 461: .Fn regexec
1.1 jtc 462: or
1.5 kleink 463: .Fn regerror
1.1 jtc 464: is undefined.
1.5 kleink 465: .Pp
1.1 jtc 466: None of these functions references global variables except for tables
467: of constants;
468: all are safe for use from multiple threads if the arguments are safe.
1.5 kleink 469: .Sh IMPLEMENTATION CHOICES
470: There are a number of decisions that
471: .St -p1003.2-92
472: leaves up to the implementor,
1.1 jtc 473: either by explicitly saying ``undefined'' or by virtue of them being
474: forbidden by the RE grammar.
475: This implementation treats them as follows.
1.5 kleink 476: .Pp
1.1 jtc 477: See
1.5 kleink 478: .Xr re_format 7
1.1 jtc 479: for a discussion of the definition of case-independent matching.
1.5 kleink 480: .Pp
1.1 jtc 481: There is no particular limit on the length of REs,
482: except insofar as memory is limited.
483: Memory usage is approximately linear in RE size, and largely insensitive
484: to RE complexity, except for bounded repetitions.
485: See BUGS for one short RE using them
486: that will run almost any system out of memory.
1.5 kleink 487: .Pp
1.1 jtc 488: A backslashed character other than one specifically given a magic meaning
1.5 kleink 489: by
490: .St -p1003.2-92
491: (such magic meanings occur only in obsolete [``basic''] REs)
1.1 jtc 492: is taken as an ordinary character.
1.5 kleink 493: .Pp
494: Any unmatched [ is a
495: .Dv REG_EBRACK
496: error.
497: .Pp
1.1 jtc 498: Equivalence classes cannot begin or end bracket-expression ranges.
499: The endpoint of one range cannot begin another.
1.5 kleink 500: .Pp
501: .Dv RE_DUP_MAX ,
502: the limit on repetition counts in bounded repetitions, is 255.
503: .Pp
1.1 jtc 504: A repetition operator (?, *, +, or bounds) cannot follow another
505: repetition operator.
506: A repetition operator cannot begin an expression or subexpression
507: or follow `^' or `|'.
1.5 kleink 508: .Pp
1.1 jtc 509: `|' cannot appear first or last in a (sub)expression or after another `|',
510: i.e. an operand of `|' cannot be an empty subexpression.
511: An empty parenthesized subexpression, `()', is legal and matches an
512: empty (sub)string.
513: An empty string is not a legal RE.
1.5 kleink 514: .Pp
1.1 jtc 515: A `{' followed by a digit is considered the beginning of bounds for a
516: bounded repetition, which must then follow the syntax for bounds.
517: A `{' \fInot\fR followed by a digit is considered an ordinary character.
1.5 kleink 518: .Pp
1.1 jtc 519: `^' and `$' beginning and ending subexpressions in obsolete (``basic'')
520: REs are anchors, not ordinary characters.
1.5 kleink 521: .Sh DIAGNOSTICS
1.1 jtc 522: Non-zero error codes from
1.5 kleink 523: .Fn regcomp
1.1 jtc 524: and
1.5 kleink 525: .Fn regexec
1.1 jtc 526: include the following:
1.5 kleink 527: .Pp
528: .Bl -tag -width XXXREG_ECOLLATE -compact
529: .It Dv REG_NOMATCH
530: regexec() failed to match
531: .It Dv REG_BADPAT
532: invalid regular expression
533: .It Dv REG_ECOLLATE
534: invalid collating element
535: .It Dv REG_ECTYPE
536: invalid character class
537: .It Dv REG_EESCAPE
538: \e applied to unescapable character
539: .It Dv REG_ESUBREG
540: invalid backreference number
541: .It Dv REG_EBRACK
542: brackets [ ] not balanced
543: .It Dv REG_EPAREN
544: parentheses ( ) not balanced
545: .It Dv REG_EBRACE
546: braces { } not balanced
547: .It Dv REG_BADBR
548: invalid repetition count(s) in { }
549: .It Dv REG_ERANGE
550: invalid character range in [ ]
551: .It Dv REG_ESPACE
552: ran out of memory
553: .It Dv REG_BADRPT
554: ?, *, or + operand invalid
555: .It Dv REG_EMPTY
556: empty (sub)expression
557: .It Dv REG_ASSERT
558: ``can't happen''\(emyou found a bug
559: .It Dv REG_INVARG
560: invalid argument, e.g. negative-length string
561: .El
1.11 wiz 562: .Sh SEE ALSO
563: .Xr grep 1 ,
564: .Xr sed 1 ,
565: .Xr re_format 7
566: .Pp
567: .St -p1003.2-92 ,
568: sections 2.8 (Regular Expression Notation)
569: and
570: B.5 (C Binding for Regular Expression Matching).
1.5 kleink 571: .Sh HISTORY
1.3 cgd 572: Originally written by Henry Spencer.
1.8 perry 573: Altered for inclusion in the
574: .Bx 4.4
575: distribution.
1.5 kleink 576: .Sh BUGS
1.1 jtc 577: There is one known functionality bug.
578: The implementation of internationalization is incomplete:
1.5 kleink 579: the locale is always assumed to be the default one of
580: .St -p1003.2-92 ,
1.1 jtc 581: and only the collating elements etc. of that locale are available.
1.5 kleink 582: .Pp
1.1 jtc 583: The back-reference code is subtle and doubts linger about its correctness
584: in complex cases.
1.5 kleink 585: .Pp
1.9 lukem 586: .Fn regexec
1.1 jtc 587: performance is poor.
588: This will improve with later releases.
1.9 lukem 589: .Fa nmatch
1.1 jtc 590: exceeding 0 is expensive;
1.5 kleink 591: .Fa nmatch
1.1 jtc 592: exceeding 1 is worse.
1.9 lukem 593: .Fn regexec
1.5 kleink 594: is largely insensitive to RE complexity
595: .Em except
596: that back references are massively expensive.
1.1 jtc 597: RE length does matter; in particular, there is a strong speed bonus
598: for keeping RE length under about 30 characters,
599: with most special characters counting roughly double.
1.5 kleink 600: .Pp
1.9 lukem 601: .Fn regcomp
1.1 jtc 602: implements bounded repetitions by macro expansion,
603: which is costly in time and space if counts are large
604: or bounded repetitions are nested.
605: An RE like, say,
606: `((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
607: will (eventually) run almost any existing machine out of swap space.
1.5 kleink 608: .Pp
1.1 jtc 609: There are suspected problems with response to obscure error conditions.
610: Notably,
611: certain kinds of internal overflow,
612: produced only by truly enormous REs or by multiply nested bounded repetitions,
613: are probably not handled well.
1.5 kleink 614: .Pp
615: Due to a mistake in
616: .St -p1003.2-92 ,
617: things like `a)b' are legal REs because `)' is a special character
1.14 wiz 618: only in the presence of a previous unmatched `('.
619: This can't be fixed until the spec is fixed.
1.5 kleink 620: .Pp
1.1 jtc 621: The standard's definition of back references is vague.
622: For example, does
623: `a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
1.5 kleink 624: Until the standard is clarified, behavior in such cases should not be
625: relied on.
626: .Pp
1.1 jtc 627: The implementation of word-boundary matching is a bit of a kludge,
628: and bugs may lurk in combinations of word-boundary matching and anchoring.