Annotation of src/lib/libc/regex/regex.3, Revision 1.12
1.12 ! ross 1: .\" $NetBSD: regex.3,v 1.11 2001/09/16 02:20:13 wiz Exp $
1.4 cgd 2: .\"
1.3 cgd 3: .\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
4: .\" Copyright (c) 1992, 1993, 1994
5: .\" The Regents of the University of California. All rights reserved.
6: .\"
7: .\" This code is derived from software contributed to Berkeley by
8: .\" Henry Spencer.
9: .\"
10: .\" Redistribution and use in source and binary forms, with or without
11: .\" modification, are permitted provided that the following conditions
12: .\" are met:
13: .\" 1. Redistributions of source code must retain the above copyright
14: .\" notice, this list of conditions and the following disclaimer.
15: .\" 2. Redistributions in binary form must reproduce the above copyright
16: .\" notice, this list of conditions and the following disclaimer in the
17: .\" documentation and/or other materials provided with the distribution.
18: .\" 3. All advertising materials mentioning features or use of this software
19: .\" must display the following acknowledgement:
20: .\" This product includes software developed by the University of
21: .\" California, Berkeley and its contributors.
22: .\" 4. Neither the name of the University nor the names of its contributors
23: .\" may be used to endorse or promote products derived from this software
24: .\" without specific prior written permission.
25: .\"
26: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
27: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
28: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
29: .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
30: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
31: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
32: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
33: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
34: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
35: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
36: .\" SUCH DAMAGE.
37: .\"
38: .\" @(#)regex.3 8.4 (Berkeley) 3/20/94
39: .\"
1.6 perry 40: .Dd March 20, 1994
1.5 kleink 41: .Dt REGEX 3
42: .Os
43: .Sh NAME
44: .Nm regex ,
45: .Nm regcomp ,
46: .Nm regexec ,
47: .Nm regerror ,
48: .Nm regfree
49: .Nd regular-expression library
1.7 perry 50: .Sh LIBRARY
51: .Lb libc
1.5 kleink 52: .Sh SYNOPSIS
1.12 ! ross 53: .Fd #include \*[Lt]sys/types.h\*[Gt]
! 54: .Fd #include \*[Lt]regex.h\*[Gt]
1.5 kleink 55: .Ft int
56: .Fn regcomp "regex_t *preg" "const char *pattern" "int cflags"
57: .Ft int
58: .Fn regexec "const regex_t *preg" "const char *string" "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
59: .Ft size_t
60: .Fn regerror "int errcode" "const regex_t *preg" "char *errbuf" "size_t errbuf_size"
61: .Ft void
62: .Fn regfree "regex_t *preg"
63: .Sh DESCRIPTION
64: These routines implement
65: .St -p1003.2-92
66: regular expressions (``RE''s);
1.1 jtc 67: see
1.5 kleink 68: .Xr re_format 7 .
69: .Fn regcomp
1.1 jtc 70: compiles an RE written as a string into an internal form,
1.5 kleink 71: .Fn regexec
1.1 jtc 72: matches that internal form against a string and reports results,
1.5 kleink 73: .Fn regerror
1.1 jtc 74: transforms error codes from either into human-readable messages,
75: and
1.5 kleink 76: .Fn regfree
1.1 jtc 77: frees any dynamically-allocated storage used by the internal form
78: of an RE.
1.5 kleink 79: .Pp
1.1 jtc 80: The header
1.12 ! ross 81: .Em \*[Lt]regex.h\*[Gt]
1.1 jtc 82: declares two structure types,
1.5 kleink 83: .Fa regex_t
1.1 jtc 84: and
1.5 kleink 85: .Fa regmatch_t ,
1.1 jtc 86: the former for compiled internal forms and the latter for match reporting.
87: It also declares the four functions,
88: a type
1.5 kleink 89: .Fa regoff_t ,
1.1 jtc 90: and a number of constants with names starting with ``REG_''.
1.5 kleink 91: .Pp
1.9 lukem 92: .Fn regcomp
1.1 jtc 93: compiles the regular expression contained in the
1.5 kleink 94: .Fa pattern
1.1 jtc 95: string,
96: subject to the flags in
1.5 kleink 97: .Fa cflags ,
1.1 jtc 98: and places the results in the
1.5 kleink 99: .Fa regex_t
1.1 jtc 100: structure pointed to by
1.5 kleink 101: .Fa preg .
1.9 lukem 102: .Fa cflags
1.1 jtc 103: is the bitwise OR of zero or more of the following flags:
1.5 kleink 104: .Bl -tag -width XXXREG_EXTENDED
105: .It Dv REG_EXTENDED
106: Compile modern (``extended'') REs, rather than the obsolete
107: (``basic'') REs that are the default.
108: .It Dv REG_BASIC
1.1 jtc 109: This is a synonym for 0,
110: provided as a counterpart to REG_EXTENDED to improve readability.
1.5 kleink 111: .It Dv REG_NOSPEC
112: Compile with recognition of all special characters turned off. All
113: characters are thus considered ordinary, so the ``RE'' is a literal
114: string.
115: This is an extension, compatible with but not specified by
116: .St -p1003.2-92 ,
117: and should be used with caution in software intended to be portable to
118: other systems.
119: .Dv REG_EXTENDED
120: and
121: .Dv REG_NOSPEC
122: may not be used in the same call to
123: .Fn regcomp .
124: .It Dv REG_ICASE
125: Compile for matching that ignores upper/lower case distinctions. See
126: .Xr re_format 7 .
127: .It Dv REG_NOSUB
128: Compile for matching that need only report success or failure, not
129: what was matched.
130: .It Dv REG_NEWLINE
1.1 jtc 131: Compile for newline-sensitive matching.
132: By default, newline is a completely ordinary character with no special
133: meaning in either REs or strings.
134: With this flag,
135: `[^' bracket expressions and `.' never match newline,
136: a `^' anchor matches the null string after any newline in the string
137: in addition to its normal function,
138: and the `$' anchor matches the null string before any newline in the
139: string in addition to its normal function.
1.5 kleink 140: .It Dv REG_PEND
141: The regular expression ends, not at the first NUL, but just before the
142: character pointed to by the
143: .Fa re_endp
1.1 jtc 144: member of the structure pointed to by
1.5 kleink 145: .Fa preg .
1.1 jtc 146: The
1.5 kleink 147: .Fa re_endp
1.1 jtc 148: member is of type
1.5 kleink 149: .Fa "const\ char\ *" .
150: This flag permits inclusion of NULs in the RE; they are considered
151: ordinary characters.
152: This is an extension, compatible with but not specified by
153: .St -p1003.2-92 ,
154: and should be used with caution in software intended to be portable to
155: other systems.
156: .El
157: .Pp
1.1 jtc 158: When successful,
1.5 kleink 159: .Fn regcomp
1.1 jtc 160: returns 0 and fills in the structure pointed to by
1.5 kleink 161: .Fa preg .
162: One member of that structure (other than
163: .Fa re_endp )
1.1 jtc 164: is publicized:
1.5 kleink 165: .Fa re_nsub ,
1.1 jtc 166: of type
1.5 kleink 167: .Fa size_t ,
1.1 jtc 168: contains the number of parenthesized subexpressions within the RE
169: (except that the value of this member is undefined if the
1.5 kleink 170: .Dv REG_NOSUB
171: flag was used).
1.1 jtc 172: If
1.5 kleink 173: .Fn regcomp
1.1 jtc 174: fails, it returns a non-zero error code;
1.11 wiz 175: see
176: .Sx DIAGNOSTICS .
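.Pp
As a minimal sketch of the compile step, assume a hypothetical helper
.Fn count_groups
that is not part of the library.
It compiles an extended RE, reads
.Fa re_nsub ,
and releases the compiled form with
.Fn regfree .
.Dv REG_NOSUB
is deliberately left out, since
.Fa re_nsub
is undefined when that flag is used.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/*
 * count_groups is a hypothetical helper, not part of the regex library.
 * It compiles `pattern` as an extended RE and returns the number of
 * parenthesized subexpressions, or -1 if regcomp() reports an error.
 */
static long
count_groups(const char *pattern)
{
	regex_t re;
	long n;

	assert(pattern != NULL);

	/* REG_NOSUB is not set here: re_nsub would be undefined with it. */
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;	/* `re` is not a valid compiled RE here */

	n = (long)re.re_nsub;
	regfree(&re);		/* release the compiled form */
	return n;
}
```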
1.5 kleink 177: .Pp
1.9 lukem 178: .Fn regexec
1.1 jtc 179: matches the compiled RE pointed to by
1.5 kleink 180: .Fa preg
1.1 jtc 181: against the
1.5 kleink 182: .Fa string ,
1.1 jtc 183: subject to the flags in
1.5 kleink 184: .Fa eflags ,
1.1 jtc 185: and reports results using
1.5 kleink 186: .Fa nmatch ,
187: .Fa pmatch ,
1.1 jtc 188: and the returned value.
189: The RE must have been compiled by a previous invocation of
1.5 kleink 190: .Fn regcomp .
1.1 jtc 191: The compiled form is not altered during execution of
1.5 kleink 192: .Fn regexec ,
1.1 jtc 193: so a single compiled RE can be used simultaneously by multiple threads.
1.5 kleink 194: .Pp
1.1 jtc 195: By default,
196: the NUL-terminated string pointed to by
1.5 kleink 197: .Fa string
1.1 jtc 198: is considered to be the text of an entire line, minus any terminating
199: newline.
200: The
1.5 kleink 201: .Fa eflags
1.1 jtc 202: argument is the bitwise OR of zero or more of the following flags:
1.5 kleink 203: .Bl -tag -width XXXREG_NOTBOL
204: .It Dv REG_NOTBOL
205: The first character of the string
1.1 jtc 206: is not the beginning of a line, so the `^' anchor should not match before it.
1.5 kleink 207: This does not affect the behavior of newlines under
208: .Dv REG_NEWLINE .
209: .It Dv REG_NOTEOL
210: The NUL terminating the string does not end a line, so the `$' anchor
211: should not match before it. This does not affect the behavior of
212: newlines under
213: .Dv REG_NEWLINE .
214: .It Dv REG_STARTEND
1.1 jtc 215: The string is considered to start at
1.5 kleink 216: .Fa string
217: +
218: .Fa pmatch[0].rm_so
1.1 jtc 219: and to have a terminating NUL located at
1.5 kleink 220: .Fa string
221: +
222: .Fa pmatch[0].rm_eo
1.1 jtc 223: (there need not actually be a NUL at that location),
224: regardless of the value of
1.5 kleink 225: .Fa nmatch .
1.1 jtc 226: See below for the definition of
1.5 kleink 227: .Fa pmatch
1.1 jtc 228: and
1.5 kleink 229: .Fa nmatch .
230: This is an extension, compatible with but not specified by
231: .St -p1003.2-92 ,
232: and should be used with caution in software intended to be portable to
233: other systems.
234: Note that a non-zero
235: .Fa rm_so
236: does not imply
237: .Dv REG_NOTBOL ;
238: .Dv REG_STARTEND
239: affects only the location of the string, not how it is matched.
240: .El
241: .Pp
1.1 jtc 242: See
1.5 kleink 243: .Xr re_format 7
1.1 jtc 244: for a discussion of what is matched in situations where an RE or a
245: portion thereof could match any of several substrings of
1.5 kleink 246: .Fa string .
247: .Pp
1.1 jtc 248: Normally,
1.5 kleink 249: .Fn regexec
250: returns 0 for success and the non-zero code
251: .Dv REG_NOMATCH
252: for failure.
1.1 jtc 253: Other non-zero error codes may be returned in exceptional situations;
1.11 wiz 254: see
255: .Sx DIAGNOSTICS .
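.Pp
The effect of
.Fa eflags
can be sketched with a hypothetical helper,
.Fn line_matches ,
which is an illustration only.
It compiles with
.Dv REG_NOSUB
because only success or failure is wanted, and it passes the caller's
.Fa eflags
straight to
.Fn regexec .

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/*
 * line_matches is a hypothetical helper.  It returns 1 if the extended
 * RE `pattern` matches somewhere in `text`, 0 for REG_NOMATCH, and -1
 * for any other error from regcomp() or regexec().
 */
static int
line_matches(const char *pattern, const char *text, int eflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);

	/* Only a yes/no answer is needed, so no match offsets are requested. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	rc = regexec(&re, text, 0, NULL, eflags);
	regfree(&re);

	if (rc == 0)
		return 1;
	if (rc == REG_NOMATCH)
		return 0;
	return -1;
}
```

With this helper, `^abc' matches `abcdef' when
.Fa eflags
is 0 but not when it is
.Dv REG_NOTBOL ,
because the anchor is then barred from matching before the first character.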
1.5 kleink 256: .Pp
257: If
258: .Dv REG_NOSUB
259: was specified in the compilation of the RE, or if
260: .Fa nmatch
1.1 jtc 261: is 0,
1.5 kleink 262: .Fn regexec
1.1 jtc 263: ignores the
1.5 kleink 264: .Fa pmatch
265: argument (but see below for the case where
266: .Dv REG_STARTEND
267: is specified).
1.1 jtc 268: Otherwise,
1.5 kleink 269: .Fa pmatch
1.1 jtc 270: points to an array of
1.5 kleink 271: .Fa nmatch
1.1 jtc 272: structures of type
1.5 kleink 273: .Fa regmatch_t .
1.1 jtc 274: Such a structure has at least the members
1.5 kleink 275: .Fa rm_so
1.1 jtc 276: and
1.5 kleink 277: .Fa rm_eo ,
1.1 jtc 278: both of type
1.5 kleink 279: .Fa regoff_t
1.1 jtc 280: (a signed arithmetic type at least as large as an
1.5 kleink 281: .Fa off_t
1.1 jtc 282: and a
1.5 kleink 283: .Fa ssize_t ) ,
1.1 jtc 284: containing respectively the offset of the first character of a substring
285: and the offset of the first character after the end of the substring.
286: Offsets are measured from the beginning of the
1.5 kleink 287: .Fa string
1.1 jtc 288: argument given to
1.5 kleink 289: .Fn regexec .
1.1 jtc 290: An empty substring is denoted by equal offsets,
291: both indicating the character following the empty substring.
1.5 kleink 292: .Pp
1.1 jtc 293: The 0th member of the
1.5 kleink 294: .Fa pmatch
1.1 jtc 295: array is filled in to indicate what substring of
1.5 kleink 296: .Fa string
1.1 jtc 297: was matched by the entire RE.
298: Remaining members report what substring was matched by parenthesized
299: subexpressions within the RE;
300: member
1.5 kleink 301: .Fa i
1.1 jtc 302: reports subexpression
1.5 kleink 303: .Fa i ,
304: with subexpressions counted (starting at 1) by the order of their
305: opening parentheses in the RE, left to right.
1.1 jtc 306: Unused entries in the array\(emcorresponding either to subexpressions that
307: did not participate in the match at all, or to subexpressions that do not
1.5 kleink 308: exist in the RE (that is,
309: .Fa i
1.12 ! ross 310: \*[Gt]
! 311: .Fa preg-\*[Gt]re_nsub )
1.5 kleink 312: \(emhave both
313: .Fa rm_so
1.1 jtc 314: and
1.5 kleink 315: .Fa rm_eo
316: set to -1.
1.1 jtc 317: If a subexpression participated in the match several times,
318: the reported substring is the last one it matched.
319: (Note, in particular, that when the RE `(b*)+' matches `bbb',
320: the parenthesized subexpression matches each of the three `b's and then
321: an infinite number of empty strings following the last `b',
322: so the reported substring is one of the empties.)
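.Pp
The reporting rules above can be sketched as follows.
The helper
.Fn find_address ,
its fixed pattern, and the
.Dv NSLOTS
constant are assumptions made for the example.
The array has one slot more than the RE has subexpressions, so the last
entry shows the -1 convention for unused members.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

#define NSLOTS 4	/* hypothetical size: whole match plus three slots */

/*
 * find_address is a hypothetical helper.  It looks for "name@host"
 * (lower-case letters only) in `text` and fills pm[0..NSLOTS-1].
 * It returns the value of regexec(), or -1 if regcomp() fails.
 */
static int
find_address(const char *text, regmatch_t pm[NSLOTS])
{
	regex_t re;
	int rc;

	assert(text != NULL && pm != NULL);

	/* Two parenthesized subexpressions, so re_nsub is 2. */
	if (regcomp(&re, "([a-z]+)@([a-z]+)", REG_EXTENDED) != 0)
		return -1;

	/*
	 * pm[0] is the whole match, pm[1] and pm[2] the subexpressions;
	 * pm[3] has no subexpression behind it and is set to -1/-1.
	 */
	rc = regexec(&re, text, NSLOTS, pm, 0);
	regfree(&re);
	return rc;
}
```

For the string `mail bob@example now', the whole match runs from offset 5
up to, but not including, offset 16.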
1.5 kleink 323: .Pp
324: If
325: .Dv REG_STARTEND
326: is specified,
327: .Fa pmatch
1.1 jtc 328: must point to at least one
1.5 kleink 329: .Fa regmatch_t
1.1 jtc 330: (even if
1.5 kleink 331: .Fa nmatch
332: is 0 or
333: .Dv REG_NOSUB
334: was specified),
335: to hold the input offsets for
336: .Dv REG_STARTEND .
1.1 jtc 337: Use for output is still entirely controlled by
1.5 kleink 338: .Fa nmatch ;
1.1 jtc 339: if
1.5 kleink 340: .Fa nmatch
341: is 0 or
342: .Dv REG_NOSUB
343: was specified,
1.1 jtc 344: the value of
1.5 kleink 345: .Fa pmatch[0]
1.1 jtc 346: will not be changed by a successful
1.5 kleink 347: .Fn regexec .
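.Pp
A sketch of
.Dv REG_STARTEND
in use, under the assumption that the system's
.Em \*[Lt]regex.h\*[Gt]
defines it as a macro.
The helper
.Fn match_range
is hypothetical; where the extension is missing it simply reports failure.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/*
 * match_range is a hypothetical helper.  It searches only the bytes
 * buf[start] .. buf[end - 1] for the extended RE `pattern`.  On success
 * it returns 0 and stores the match in *out, with offsets measured from
 * `buf`.  It returns -1 if regcomp() fails or REG_STARTEND is missing.
 */
static int
match_range(const char *pattern, const char *buf, size_t start, size_t end,
    regmatch_t *out)
{
	regex_t re;
	regmatch_t pm[1];
	int rc;

	assert(pattern != NULL && buf != NULL && out != NULL);
	assert(start <= end);

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	/* pm[0] is always initialized; it is used as input only below. */
	pm[0].rm_so = (regoff_t)start;
	pm[0].rm_eo = (regoff_t)end;
#ifdef REG_STARTEND
	/* pm[0] carries the range in and the match out. */
	rc = regexec(&re, buf, 1, pm, REG_STARTEND);
#else
	rc = -1;	/* extension not provided by this system */
#endif
	regfree(&re);

	if (rc == 0)
		*out = pm[0];
	return rc;
}
```

No NUL is needed at
.Fa buf
+
.Fa end ,
so the helper can search a slice of a larger buffer without copying it.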
348: .Pp
1.9 lukem 349: .Fn regerror
1.1 jtc 350: maps a non-zero
1.5 kleink 351: .Fa errcode
1.1 jtc 352: from either
1.5 kleink 353: .Fn regcomp
1.1 jtc 354: or
1.5 kleink 355: .Fn regexec
1.1 jtc 356: to a human-readable, printable message.
357: If
1.5 kleink 358: .Fa preg
1.1 jtc 359: is non-NULL,
1.5 kleink 360: the error code should have arisen from use of the
361: .Fa regex_t
1.1 jtc 362: pointed to by
1.5 kleink 363: .Fa preg ,
1.1 jtc 364: and if the error code came from
1.5 kleink 365: .Fn regcomp ,
1.1 jtc 366: it should have been the result from the most recent
1.5 kleink 367: .Fn regcomp
1.1 jtc 368: using that
1.5 kleink 369: .Fa regex_t . (
1.9 lukem 370: .Fn regerror
1.1 jtc 371: may be able to supply a more detailed message using information
372: from the
1.5 kleink 373: .Fa regex_t . )
1.9 lukem 374: .Fn regerror
1.1 jtc 375: places the NUL-terminated message into the buffer pointed to by
1.5 kleink 376: .Fa errbuf ,
1.1 jtc 377: limiting the length (including the NUL) to at most
1.5 kleink 378: .Fa errbuf_size
1.1 jtc 379: bytes.
380: If the whole message won't fit,
381: as much of it as will fit before the terminating NUL is supplied.
382: In any case,
383: the returned value is the size of buffer needed to hold the whole
384: message (including terminating NUL).
385: If
1.5 kleink 386: .Fa errbuf_size
1.1 jtc 387: is 0,
1.5 kleink 388: .Fa errbuf
1.1 jtc 389: is ignored but the return value is still correct.
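.Pp
Because the return value is the size needed for the whole message, a
caller can size the buffer with a first call and fill it with a second.
The following is a sketch only; the helper name
.Fn regex_strerror
is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <regex.h>

/*
 * regex_strerror is a hypothetical helper.  It returns a malloc()ed,
 * NUL-terminated copy of the complete message for `code`, or NULL if
 * memory cannot be allocated.  The caller frees the result.
 */
static char *
regex_strerror(int code, const regex_t *preg)
{
	size_t need;
	char *msg;

	/* errbuf_size of 0: the buffer is ignored, only the size comes back. */
	need = regerror(code, preg, NULL, 0);
	assert(need > 0);	/* the size includes the terminating NUL */

	msg = malloc(need);
	if (msg != NULL)
		(void)regerror(code, preg, msg, need);
	return msg;
}
```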
1.5 kleink 390: .Pp
1.1 jtc 391: If the
1.5 kleink 392: .Fa errcode
1.1 jtc 393: given to
1.5 kleink 394: .Fn regerror
395: is first ORed with
396: .Dv REG_ITOA ,
1.1 jtc 397: the ``message'' that results is the printable name of the error code,
398: e.g. ``REG_NOMATCH'',
399: rather than an explanation thereof.
400: If
1.5 kleink 401: .Fa errcode
1.10 jdolecek 402: is
1.5 kleink 403: .Dv REG_ATOI ,
1.1 jtc 404: then
1.5 kleink 405: .Fa preg
1.1 jtc 406: shall be non-NULL and the
1.5 kleink 407: .Fa re_endp
1.1 jtc 408: member of the structure it points to
409: must point to the printable name of an error code;
410: in this case, the result in
1.5 kleink 411: .Fa errbuf
1.1 jtc 412: is the decimal digits of
413: the numeric value of the error code
414: (0 if the name is not recognized).
1.5 kleink 415: .Dv REG_ITOA
416: and
417: .Dv REG_ATOI
418: are intended primarily as debugging facilities;
419: they are extensions, compatible with but not specified by
420: .St -p1003.2-92 ,
421: and should be used with caution in software intended to be portable to
422: other systems.
1.1 jtc 423: Be warned also that they are considered experimental and changes are possible.
1.5 kleink 424: .Pp
1.9 lukem 425: .Fn regfree
1.1 jtc 426: frees any dynamically-allocated storage associated with the compiled RE
427: pointed to by
1.5 kleink 428: .Fa preg .
1.1 jtc 429: The remaining
1.5 kleink 430: .Fa regex_t
1.1 jtc 431: is no longer a valid compiled RE
432: and the effect of supplying it to
1.5 kleink 433: .Fn regexec
1.1 jtc 434: or
1.5 kleink 435: .Fn regerror
1.1 jtc 436: is undefined.
1.5 kleink 437: .Pp
1.1 jtc 438: None of these functions references global variables except for tables
439: of constants;
440: all are safe for use from multiple threads if the arguments are safe.
1.5 kleink 441: .Sh IMPLEMENTATION CHOICES
442: There are a number of decisions that
443: .St -p1003.2-92
444: leaves up to the implementor,
1.1 jtc 445: either by explicitly saying ``undefined'' or by virtue of them being
446: forbidden by the RE grammar.
447: This implementation treats them as follows.
1.5 kleink 448: .Pp
1.1 jtc 449: See
1.5 kleink 450: .Xr re_format 7
1.1 jtc 451: for a discussion of the definition of case-independent matching.
1.5 kleink 452: .Pp
1.1 jtc 453: There is no particular limit on the length of REs,
454: except insofar as memory is limited.
455: Memory usage is approximately linear in RE size, and largely insensitive
456: to RE complexity, except for bounded repetitions.
457: See BUGS for one short RE using them
458: that will run almost any system out of memory.
1.5 kleink 459: .Pp
1.1 jtc 460: A backslashed character other than one specifically given a magic meaning
1.5 kleink 461: by
462: .St -p1003.2-92
463: (such magic meanings occur only in obsolete [``basic''] REs)
1.1 jtc 464: is taken as an ordinary character.
1.5 kleink 465: .Pp
466: Any unmatched [ is a
467: .Dv REG_EBRACK
468: error.
469: .Pp
1.1 jtc 470: Equivalence classes cannot begin or end bracket-expression ranges.
471: The endpoint of one range cannot begin another.
1.5 kleink 472: .Pp
473: .Dv RE_DUP_MAX ,
474: the limit on repetition counts in bounded repetitions, is 255.
475: .Pp
1.1 jtc 476: A repetition operator (?, *, +, or bounds) cannot follow another
477: repetition operator.
478: A repetition operator cannot begin an expression or subexpression
479: or follow `^' or `|'.
1.5 kleink 480: .Pp
1.1 jtc 481: `|' cannot appear first or last in a (sub)expression or after another `|',
482: i.e. an operand of `|' cannot be an empty subexpression.
483: An empty parenthesized subexpression, `()', is legal and matches an
484: empty (sub)string.
485: An empty string is not a legal RE.
1.5 kleink 486: .Pp
1.1 jtc 487: A `{' followed by a digit is considered the beginning of bounds for a
488: bounded repetition, which must then follow the syntax for bounds.
489: A `{' \fInot\fR followed by a digit is considered an ordinary character.
1.5 kleink 490: .Pp
1.1 jtc 491: `^' and `$' beginning and ending subexpressions in obsolete (``basic'')
492: REs are anchors, not ordinary characters.
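.Pp
Since these choices are left to the implementor, other systems may accept
or reject the same REs differently.
A small probe such as the hypothetical
.Fn compiles
helper below can show how a given library treats a pattern; it is a
sketch, not a conformance test.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/*
 * compiles is a hypothetical helper.  It returns 0 if `pattern` is
 * accepted by regcomp() with the given cflags, otherwise the non-zero
 * error code that regcomp() returned.
 */
static int
compiles(const char *pattern, int cflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL);

	rc = regcomp(&re, pattern, cflags);
	if (rc == 0)
		regfree(&re);	/* only a successful compile needs freeing */
	return rc;
}
```

Patterns worth probing include `a|', `a**', `a{x' and the empty string,
each with
.Dv REG_EXTENDED ;
the results are expected to vary between implementations.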
1.5 kleink 493: .Sh DIAGNOSTICS
1.1 jtc 494: Non-zero error codes from
1.5 kleink 495: .Fn regcomp
1.1 jtc 496: and
1.5 kleink 497: .Fn regexec
1.1 jtc 498: include the following:
1.5 kleink 499: .Pp
500: .Bl -tag -width XXXREG_ECOLLATE -compact
501: .It Dv REG_NOMATCH
502: regexec() failed to match
503: .It Dv REG_BADPAT
504: invalid regular expression
505: .It Dv REG_ECOLLATE
506: invalid collating element
507: .It Dv REG_ECTYPE
508: invalid character class
509: .It Dv REG_EESCAPE
510: \e applied to unescapable character
511: .It Dv REG_ESUBREG
512: invalid backreference number
513: .It Dv REG_EBRACK
514: brackets [ ] not balanced
515: .It Dv REG_EPAREN
516: parentheses ( ) not balanced
517: .It Dv REG_EBRACE
518: braces { } not balanced
519: .It Dv REG_BADBR
520: invalid repetition count(s) in { }
521: .It Dv REG_ERANGE
522: invalid character range in [ ]
523: .It Dv REG_ESPACE
524: ran out of memory
525: .It Dv REG_BADRPT
526: ?, *, or + operand invalid
527: .It Dv REG_EMPTY
528: empty (sub)expression
529: .It Dv REG_ASSERT
530: ``can't happen''\(emyou found a bug
531: .It Dv REG_INVARG
532: invalid argument, e.g. negative-length string
533: .El
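.Pp
Where the
.Dv REG_ITOA
extension is unavailable, the names in the list above can be produced by
hand.
The helper
.Fn regex_code_name
below is a hypothetical sketch; the codes that are extensions are guarded
on the assumption that a system providing them defines them as macros.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <regex.h>

/*
 * regex_code_name is a hypothetical helper.  It returns the printable
 * name of an error code from regcomp() or regexec(), or "?" if the
 * code is not one it knows.
 */
static const char *
regex_code_name(int code)
{
	switch (code) {
	case REG_NOMATCH:	return "REG_NOMATCH";
	case REG_BADPAT:	return "REG_BADPAT";
	case REG_ECOLLATE:	return "REG_ECOLLATE";
	case REG_ECTYPE:	return "REG_ECTYPE";
	case REG_EESCAPE:	return "REG_EESCAPE";
	case REG_ESUBREG:	return "REG_ESUBREG";
	case REG_EBRACK:	return "REG_EBRACK";
	case REG_EPAREN:	return "REG_EPAREN";
	case REG_EBRACE:	return "REG_EBRACE";
	case REG_BADBR:		return "REG_BADBR";
	case REG_ERANGE:	return "REG_ERANGE";
	case REG_ESPACE:	return "REG_ESPACE";
	case REG_BADRPT:	return "REG_BADRPT";
#ifdef REG_EMPTY		/* extension */
	case REG_EMPTY:		return "REG_EMPTY";
#endif
#ifdef REG_ASSERT		/* extension */
	case REG_ASSERT:	return "REG_ASSERT";
#endif
#ifdef REG_INVARG		/* extension */
	case REG_INVARG:	return "REG_INVARG";
#endif
	default:		return "?";
	}
}
```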
1.11 wiz 534: .Sh SEE ALSO
535: .Xr grep 1 ,
536: .Xr sed 1 ,
537: .Xr re_format 7
538: .Pp
539: .St -p1003.2-92 ,
540: sections 2.8 (Regular Expression Notation)
541: and
542: B.5 (C Binding for Regular Expression Matching).
1.5 kleink 543: .Sh HISTORY
1.3 cgd 544: Originally written by Henry Spencer.
1.8 perry 545: Altered for inclusion in the
546: .Bx 4.4
547: distribution.
1.5 kleink 548: .Sh BUGS
1.1 jtc 549: This is an alpha release with known defects.
550: Please report problems.
1.5 kleink 551: .Pp
1.1 jtc 552: There is one known functionality bug.
553: The implementation of internationalization is incomplete:
1.5 kleink 554: the locale is always assumed to be the default one of
555: .St -p1003.2-92 ,
1.1 jtc 556: and only the collating elements etc. of that locale are available.
1.5 kleink 557: .Pp
1.1 jtc 558: The back-reference code is subtle and doubts linger about its correctness
559: in complex cases.
1.5 kleink 560: .Pp
1.9 lukem 561: .Fn regexec
1.1 jtc 562: performance is poor.
563: This will improve with later releases.
1.9 lukem 564: .Fa nmatch
1.1 jtc 565: exceeding 0 is expensive;
1.5 kleink 566: .Fa nmatch
1.1 jtc 567: exceeding 1 is worse.
1.9 lukem 568: .Fn regexec
1.5 kleink 569: is largely insensitive to RE complexity
570: .Em except
571: that back references are massively expensive.
1.1 jtc 572: RE length does matter; in particular, there is a strong speed bonus
573: for keeping RE length under about 30 characters,
574: with most special characters counting roughly double.
1.5 kleink 575: .Pp
1.9 lukem 576: .Fn regcomp
1.1 jtc 577: implements bounded repetitions by macro expansion,
578: which is costly in time and space if counts are large
579: or bounded repetitions are nested.
580: An RE like, say,
581: `((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
582: will (eventually) run almost any existing machine out of swap space.
1.5 kleink 583: .Pp
1.1 jtc 584: There are suspected problems with response to obscure error conditions.
585: Notably,
586: certain kinds of internal overflow,
587: produced only by truly enormous REs or by multiply nested bounded repetitions,
588: are probably not handled well.
1.5 kleink 589: .Pp
590: Due to a mistake in
591: .St -p1003.2-92 ,
592: things like `a)b' are legal REs because `)' is a special character
593: only in the presence of a previous unmatched `('. This can't be fixed
594: until the spec is fixed.
595: .Pp
1.1 jtc 596: The standard's definition of back references is vague.
597: For example, does
598: `a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
1.5 kleink 599: Until the standard is clarified, behavior in such cases should not be
600: relied on.
601: .Pp
1.1 jtc 602: The implementation of word-boundary matching is a bit of a kludge,
603: and bugs may lurk in combinations of word-boundary matching and anchoring.