
Annotation of src/lib/libc/regex/regex.3, Revision 1.12

1.12    ! ross        1: .\"    $NetBSD: regex.3,v 1.11 2001/09/16 02:20:13 wiz Exp $
1.4       cgd         2: .\"
1.3       cgd         3: .\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
                      4: .\" Copyright (c) 1992, 1993, 1994
                      5: .\"    The Regents of the University of California.  All rights reserved.
                      6: .\"
                      7: .\" This code is derived from software contributed to Berkeley by
                      8: .\" Henry Spencer.
                      9: .\"
                     10: .\" Redistribution and use in source and binary forms, with or without
                     11: .\" modification, are permitted provided that the following conditions
                     12: .\" are met:
                     13: .\" 1. Redistributions of source code must retain the above copyright
                     14: .\"    notice, this list of conditions and the following disclaimer.
                     15: .\" 2. Redistributions in binary form must reproduce the above copyright
                     16: .\"    notice, this list of conditions and the following disclaimer in the
                     17: .\"    documentation and/or other materials provided with the distribution.
                     18: .\" 3. All advertising materials mentioning features or use of this software
                     19: .\"    must display the following acknowledgement:
                     20: .\"    This product includes software developed by the University of
                     21: .\"    California, Berkeley and its contributors.
                     22: .\" 4. Neither the name of the University nor the names of its contributors
                     23: .\"    may be used to endorse or promote products derived from this software
                     24: .\"    without specific prior written permission.
                     25: .\"
                     26: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
                     27: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     28: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     29: .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
                     30: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     31: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     32: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     33: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     34: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     35: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     36: .\" SUCH DAMAGE.
                     37: .\"
                     38: .\"    @(#)regex.3     8.4 (Berkeley) 3/20/94
                     39: .\"
1.6       perry      40: .Dd March 20, 1994
1.5       kleink     41: .Dt REGEX 3
                     42: .Os
                     43: .Sh NAME
                     44: .Nm regex ,
                     45: .Nm regcomp ,
                     46: .Nm regexec ,
                     47: .Nm regerror ,
                     48: .Nm regfree
                     49: .Nd regular-expression library
1.7       perry      50: .Sh LIBRARY
                     51: .Lb libc
1.5       kleink     52: .Sh SYNOPSIS
1.12    ! ross       53: .Fd #include \*[Lt]sys/types.h\*[Gt]
        !            54: .Fd #include \*[Lt]regex.h\*[Gt]
1.5       kleink     55: .Ft int
                     56: .Fn regcomp "regex_t *preg" "const char *pattern" "int cflags"
                     57: .Ft int
                     58: .Fn regexec "const regex_t *preg" "const char *string" "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
                     59: .Ft size_t
                     60: .Fn regerror "int errcode" "const regex_t *preg" "char *errbuf" "size_t errbuf_size"
                     61: .Ft void
                     62: .Fn regfree "regex_t *preg"
                     63: .Sh DESCRIPTION
                     64: These routines implement
                     65: .St -p1003.2-92
                     66: regular expressions (``RE''s);
1.1       jtc        67: see
1.5       kleink     68: .Xr re_format 7 .
                     69: .Fn regcomp
1.1       jtc        70: compiles an RE written as a string into an internal form,
1.5       kleink     71: .Fn regexec
1.1       jtc        72: matches that internal form against a string and reports results,
1.5       kleink     73: .Fn regerror
1.1       jtc        74: transforms error codes from either into human-readable messages,
                     75: and
1.5       kleink     76: .Fn regfree
1.1       jtc        77: frees any dynamically-allocated storage used by the internal form
                     78: of an RE.
1.5       kleink     79: .Pp
1.1       jtc        80: The header
1.12    ! ross       81: .Em \*[Lt]regex.h\*[Gt]
1.1       jtc        82: declares two structure types,
1.5       kleink     83: .Fa regex_t
1.1       jtc        84: and
1.5       kleink     85: .Fa regmatch_t ,
1.1       jtc        86: the former for compiled internal forms and the latter for match reporting.
                     87: It also declares the four functions,
                     88: a type
1.5       kleink     89: .Fa regoff_t ,
1.1       jtc        90: and a number of constants with names starting with ``REG_''.
1.5       kleink     91: .Pp
1.9       lukem      92: .Fn regcomp
1.1       jtc        93: compiles the regular expression contained in the
1.5       kleink     94: .Fa pattern
1.1       jtc        95: string,
                     96: subject to the flags in
1.5       kleink     97: .Fa cflags ,
1.1       jtc        98: and places the results in the
1.5       kleink     99: .Fa regex_t
1.1       jtc       100: structure pointed to by
1.5       kleink    101: .Fa preg .
1.9       lukem     102: .Fa cflags
1.1       jtc       103: is the bitwise OR of zero or more of the following flags:
1.5       kleink    104: .Bl -tag -width XXXREG_EXTENDED
                    105: .It Dv REG_EXTENDED
                    106: Compile modern (``extended'') REs, rather than the obsolete
                    107: (``basic'') REs that are the default.
                    108: .It Dv REG_BASIC
1.1       jtc       109: This is a synonym for 0,
                    110: provided as a counterpart to REG_EXTENDED to improve readability.
1.5       kleink    111: .It Dv REG_NOSPEC
                    112: Compile with recognition of all special characters turned off.  All
                    113: characters are thus considered ordinary, so the ``RE'' is a literal
                    114: string.
                    115: This is an extension, compatible with but not specified by
                    116: .St -p1003.2-92 ,
                    117: and should be used with caution in software intended to be portable to
                    118: other systems.
                    119: .Dv REG_EXTENDED
                    120: and
                    121: .Dv REG_NOSPEC
                    122: may not be used in the same call to
                    123: .Fn regcomp .
                    124: .It Dv REG_ICASE
                    125: Compile for matching that ignores upper/lower case distinctions. See
                    126: .Xr re_format 7 .
                    127: .It Dv REG_NOSUB
                    128: Compile for matching that need only report success or failure, not
                    129: what was matched.
                    130: .It Dv REG_NEWLINE
1.1       jtc       131: Compile for newline-sensitive matching.
                    132: By default, newline is a completely ordinary character with no special
                    133: meaning in either REs or strings.
                    134: With this flag,
                    135: `[^' bracket expressions and `.' never match newline,
                    136: a `^' anchor matches the null string after any newline in the string
                    137: in addition to its normal function,
                    138: and the `$' anchor matches the null string before any newline in the
                    139: string in addition to its normal function.
1.5       kleink    140: .It Dv REG_PEND
                    141: The regular expression ends, not at the first NUL, but just before the
                    142: character pointed to by the
                    143: .Fa re_endp
1.1       jtc       144: member of the structure pointed to by
1.5       kleink    145: .Fa preg .
1.1       jtc       146: The
1.5       kleink    147: .Fa re_endp
1.1       jtc       148: member is of type
1.5       kleink    149: .Fa "const\ char\ *" .
                    150: This flag permits inclusion of NULs in the RE; they are considered
                    151: ordinary characters.
                    152: This is an extension, compatible with but not specified by
                    153: .St -p1003.2-92 ,
                    154: and should be used with caution in software intended to be portable to
                    155: other systems.
                    156: .El
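.Pp
As a rough illustration of how a compile flag changes matching, the sketch
below compiles the same RE with and without REG_NEWLINE.
The helper name has_bar_line and the fixed RE `^bar$' are assumptions made
for this example only; they are not part of the library.

```c
#include <sys/types.h>
#include <regex.h>
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the extended RE "^bar$" matches
 * somewhere in text, 0 if it does not, and -1 if compilation fails.
 * When newline_sensitive is non-zero the RE is compiled with REG_NEWLINE,
 * so the anchors also match next to embedded newlines.
 */
int
has_bar_line(const char *text, int newline_sensitive)
{
	regex_t re;
	int cflags = REG_EXTENDED | REG_NOSUB;
	int rc;

	assert(text != NULL);
	if (newline_sensitive)
		cflags |= REG_NEWLINE;
	if (regcomp(&re, "^bar$", cflags) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

Given a three-line string whose middle line is `bar', this helper should
report a match only when newline_sensitive is non-zero, because by default
the anchors apply only to the start and end of the whole string.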
                    157: .Pp
1.1       jtc       158: When successful,
1.5       kleink    159: .Fn regcomp
1.1       jtc       160: returns 0 and fills in the structure pointed to by
1.5       kleink    161: .Fa preg .
                    162: One member of that structure (other than
                    163: .Fa re_endp )
1.1       jtc       164: is publicized:
1.5       kleink    165: .Fa re_nsub ,
1.1       jtc       166: of type
1.5       kleink    167: .Fa size_t ,
1.1       jtc       168: contains the number of parenthesized subexpressions within the RE
                    169: (except that the value of this member is undefined if the
1.5       kleink    170: .Dv REG_NOSUB
                    171: flag was used).
1.1       jtc       172: If
1.5       kleink    173: .Fn regcomp
1.1       jtc       174: fails, it returns a non-zero error code;
1.11      wiz       175: see
                    176: .Sx DIAGNOSTICS .
1.5       kleink    177: .Pp
1.9       lukem     178: .Fn regexec
1.1       jtc       179: matches the compiled RE pointed to by
1.5       kleink    180: .Fa preg
1.1       jtc       181: against the
1.5       kleink    182: .Fa string ,
1.1       jtc       183: subject to the flags in
1.5       kleink    184: .Fa eflags ,
1.1       jtc       185: and reports results using
1.5       kleink    186: .Fa nmatch ,
                    187: .Fa pmatch ,
1.1       jtc       188: and the returned value.
                    189: The RE must have been compiled by a previous invocation of
1.5       kleink    190: .Fn regcomp .
1.1       jtc       191: The compiled form is not altered during execution of
1.5       kleink    192: .Fn regexec ,
1.1       jtc       193: so a single compiled RE can be used simultaneously by multiple threads.
1.5       kleink    194: .Pp
1.1       jtc       195: By default,
                    196: the NUL-terminated string pointed to by
1.5       kleink    197: .Fa string
1.1       jtc       198: is considered to be the text of an entire line, minus any terminating
                    199: newline.
                    200: The
1.5       kleink    201: .Fa eflags
1.1       jtc       202: argument is the bitwise OR of zero or more of the following flags:
1.5       kleink    203: .Bl -tag -width XXXREG_NOTBOL
                    204: .It Dv REG_NOTBOL
                    205: The first character of the string
1.1       jtc       206: is not the beginning of a line, so the `^' anchor should not match before it.
1.5       kleink    207: This does not affect the behavior of newlines under
                    208: .Dv REG_NEWLINE .
                    209: .It Dv REG_NOTEOL
                    210: The NUL terminating the string does not end a line, so the `$' anchor
                    211: should not match before it.  This does not affect the behavior of
                    212: newlines under
                    213: .Dv REG_NEWLINE .
                    214: .It Dv REG_STARTEND
1.1       jtc       215: The string is considered to start at
1.5       kleink    216: .Fa string
                    217: +
                    218: .Fa pmatch[0].rm_so
1.1       jtc       219: and to have a terminating NUL located at
1.5       kleink    220: .Fa string
                    221: +
                    222: .Fa pmatch[0].rm_eo
1.1       jtc       223: (there need not actually be a NUL at that location),
                    224: regardless of the value of
1.5       kleink    225: .Fa nmatch .
1.1       jtc       226: See below for the definition of
1.5       kleink    227: .Fa pmatch
1.1       jtc       228: and
1.5       kleink    229: .Fa nmatch .
                    230: This is an extension, compatible with but not specified by
                    231: .St -p1003.2-92 ,
                    232: and should be used with caution in software intended to be portable to
                    233: other systems.
                    234: Note that a non-zero
                    235: .Fa rm_so
                    236: does not imply
                    237: .Dv REG_NOTBOL ;
                    238: .Dv REG_STARTEND
                    239: affects only the location of the string, not how it is matched.
                    240: .El
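.Pp
One common use of REG_NOTBOL is scanning a string for successive matches:
after the first match the remaining text no longer starts a line, so `^'
should not match there.
The following is a minimal sketch under that assumption; count_matches is
a hypothetical name, and the RE must not have been compiled with REG_NOSUB.

```c
#include <sys/types.h>
#include <regex.h>
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: counts non-overlapping matches of re in text.
 * Every search after the first passes REG_NOTBOL, because text then
 * points into the middle of the original line.
 */
size_t
count_matches(const regex_t *re, const char *text)
{
	regmatch_t m;
	size_t n = 0;
	int eflags = 0;

	assert(re != NULL && text != NULL);
	while (regexec(re, text, 1, &m, eflags) == 0) {
		n++;
		if (m.rm_eo == m.rm_so) {
			/* Empty match: step one character to make progress. */
			if (text[m.rm_eo] == '\0')
				break;
			text += m.rm_eo + 1;
		} else {
			text += m.rm_eo;
		}
		eflags = REG_NOTBOL;
	}
	return n;
}
```

With the RE `^a' and the string `aaa' this sketch should count one match
rather than three, since only the first `a' is at the beginning of the line.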
                    241: .Pp
1.1       jtc       242: See
1.5       kleink    243: .Xr re_format 7
1.1       jtc       244: for a discussion of what is matched in situations where an RE or a
                    245: portion thereof could match any of several substrings of
1.5       kleink    246: .Fa string .
                    247: .Pp
1.1       jtc       248: Normally,
1.5       kleink    249: .Fn regexec
                    250: returns 0 for success and the non-zero code
                    251: .Dv REG_NOMATCH
                    252: for failure.
1.1       jtc       253: Other non-zero error codes may be returned in exceptional situations;
1.11      wiz       254: see
                    255: .Sx DIAGNOSTICS .
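.Pp
The compile, match, and free cycle described so far can be sketched as
follows.
This is only an illustration: re_matches is a hypothetical helper, and the
choice of REG_EXTENDED together with REG_NOSUB is an assumption suited to a
simple yes/no test.

```c
#include <sys/types.h>
#include <regex.h>
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if text matches the extended RE pattern,
 * 0 if it does not, and -1 if the pattern fails to compile or regexec()
 * returns an error code other than REG_NOMATCH.
 */
int
re_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	/* With REG_NOSUB, nmatch and pmatch are ignored. */
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	if (rc == 0)
		return 1;
	return rc == REG_NOMATCH ? 0 : -1;
}
```

A program that matches many strings against one RE would normally compile
it once and call regexec() repeatedly, rather than recompiling on each call
as this sketch does.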
1.5       kleink    256: .Pp
                    257: If
                    258: .Dv REG_NOSUB
                    259: was specified in the compilation of the RE, or if
                    260: .Fa nmatch
1.1       jtc       261: is 0,
1.5       kleink    262: .Fn regexec
1.1       jtc       263: ignores the
1.5       kleink    264: .Fa pmatch
                    265: argument (but see below for the case where
                    266: .Dv REG_STARTEND
                    267: is specified).
1.1       jtc       268: Otherwise,
1.5       kleink    269: .Fa pmatch
1.1       jtc       270: points to an array of
1.5       kleink    271: .Fa nmatch
1.1       jtc       272: structures of type
1.5       kleink    273: .Fa regmatch_t .
1.1       jtc       274: Such a structure has at least the members
1.5       kleink    275: .Fa rm_so
1.1       jtc       276: and
1.5       kleink    277: .Fa rm_eo ,
1.1       jtc       278: both of type
1.5       kleink    279: .Fa regoff_t
1.1       jtc       280: (a signed arithmetic type at least as large as an
1.5       kleink    281: .Fa off_t
1.1       jtc       282: and a
1.5       kleink    283: .Fa ssize_t ) ,
1.1       jtc       284: containing respectively the offset of the first character of a substring
                    285: and the offset of the first character after the end of the substring.
                    286: Offsets are measured from the beginning of the
1.5       kleink    287: .Fa string
1.1       jtc       288: argument given to
1.5       kleink    289: .Fn regexec .
1.1       jtc       290: An empty substring is denoted by equal offsets,
                    291: both indicating the character following the empty substring.
1.5       kleink    292: .Pp
1.1       jtc       293: The 0th member of the
1.5       kleink    294: .Fa pmatch
1.1       jtc       295: array is filled in to indicate what substring of
1.5       kleink    296: .Fa string
1.1       jtc       297: was matched by the entire RE.
                    298: Remaining members report what substring was matched by parenthesized
                    299: subexpressions within the RE;
                    300: member
1.5       kleink    301: .Fa i
1.1       jtc       302: reports subexpression
1.5       kleink    303: .Fa i ,
                    304: with subexpressions counted (starting at 1) by the order of their
                    305: opening parentheses in the RE, left to right.
1.1       jtc       306: Unused entries in the array\(emcorresponding either to subexpressions that
                    307: did not participate in the match at all, or to subexpressions that do not
1.5       kleink    308: exist in the RE (that is,
                    309: .Fa i
1.12    ! ross      310: \*[Gt]
        !           311: .Fa preg-\*[Gt]re_nsub )
1.5       kleink    312: \(emhave both
                    313: .Fa rm_so
1.1       jtc       314: and
1.5       kleink    315: .Fa rm_eo
                    316: set to -1.
1.1       jtc       317: If a subexpression participated in the match several times,
                    318: the reported substring is the last one it matched.
                    319: (Note in particular that when the RE `(b*)+' matches `bbb',
                    320: the parenthesized subexpression matches each of the three `b's and then
                    321: an infinite number of empty strings following the last `b',
                    322: so the reported substring is one of the empties.)
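.Pp
To show how the pmatch array is read, the sketch below copies the text
matched by one subexpression into a caller-supplied buffer.
The helper copy_group and the limit MAXGROUPS are assumptions made for this
example; entries with rm_so set to -1 are treated as unused, as described
above.

```c
#include <sys/types.h>
#include <regex.h>
#include <assert.h>
#include <string.h>

#define MAXGROUPS 10	/* arbitrary limit chosen for this sketch */

/*
 * Hypothetical helper: compiles pattern as an extended RE, matches it
 * against text, and copies the substring for pmatch[group] into buf,
 * truncating if necessary.  Returns the number of bytes copied, or -1
 * if compilation fails, there is no match, or the entry is unused.
 */
int
copy_group(const char *pattern, const char *text, size_t group,
    char *buf, size_t bufsize)
{
	regex_t re;
	regmatch_t pm[MAXGROUPS];
	size_t len;
	int rc;

	assert(group < MAXGROUPS && buf != NULL && bufsize > 0);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	rc = regexec(&re, text, MAXGROUPS, pm, 0);
	regfree(&re);
	if (rc != 0 || pm[group].rm_so == -1)
		return -1;
	/* rm_eo is the offset of the first character after the substring. */
	len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
	if (len >= bufsize)
		len = bufsize - 1;
	memcpy(buf, text + pm[group].rm_so, len);
	buf[len] = '\0';
	return (int)len;
}
```

Group 0 yields the text matched by the whole RE; groups 1 and up follow the
order of the opening parentheses.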
1.5       kleink    323: .Pp
                    324: If
                    325: .Dv REG_STARTEND
                    326: is specified,
                    327: .Fa pmatch
1.1       jtc       328: must point to at least one
1.5       kleink    329: .Fa regmatch_t
1.1       jtc       330: (even if
1.5       kleink    331: .Fa nmatch
                    332: is 0 or
                    333: .Dv REG_NOSUB
                    334: was specified),
                    335: to hold the input offsets for
                    336: .Dv REG_STARTEND .
1.1       jtc       337: Use for output is still entirely controlled by
1.5       kleink    338: .Fa nmatch ;
1.1       jtc       339: if
1.5       kleink    340: .Fa nmatch
                    341: is 0 or
                    342: .Dv REG_NOSUB
                    343: was specified,
1.1       jtc       344: the value of
1.5       kleink    345: .Fa pmatch [0]
1.1       jtc       346: will not be changed by a successful
1.5       kleink    347: .Fn regexec .
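.Pp
A minimal sketch of REG_STARTEND, assuming the flag is available on the
target system (it is an extension, as noted above).
The helper search_range is hypothetical; it stores the input offsets in the
same regmatch_t that receives the result.

```c
#include <sys/types.h>
#include <regex.h>
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: searches only the bytes buf[start] up to, but not
 * including, buf[end].  Returns the regexec() result.  On success, m holds
 * the match offsets, measured from buf (not from buf + start).  The RE is
 * assumed to have been compiled without REG_NOSUB, since otherwise m is
 * left unchanged.
 */
int
search_range(const regex_t *re, const char *buf, size_t start, size_t end,
    regmatch_t *m)
{
	assert(re != NULL && buf != NULL && m != NULL && start <= end);
	m->rm_so = (regoff_t)start;	/* input: where the string starts */
	m->rm_eo = (regoff_t)end;	/* input: where the string ends */
	return regexec(re, buf, 1, m, REG_STARTEND);
}
```

Because the end of the range stands in for the terminating NUL, a match
that would extend past end is not reported.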
                    348: .Pp
1.9       lukem     349: .Fn regerror
1.1       jtc       350: maps a non-zero
1.5       kleink    351: .Fa errcode
1.1       jtc       352: from either
1.5       kleink    353: .Fn regcomp
1.1       jtc       354: or
1.5       kleink    355: .Fn regexec
1.1       jtc       356: to a human-readable, printable message.
                    357: If
1.5       kleink    358: .Fa preg
1.1       jtc       359: is non-NULL,
1.5       kleink    360: the error code should have arisen from use of the
                    361: .Fa regex_t
1.1       jtc       362: pointed to by
1.5       kleink    363: .Fa preg ,
1.1       jtc       364: and if the error code came from
1.5       kleink    365: .Fn regcomp ,
1.1       jtc       366: it should have been the result from the most recent
1.5       kleink    367: .Fn regcomp
1.1       jtc       368: using that
1.5       kleink    369: .Fa regex_t . (
1.9       lukem     370: .Fn regerror
1.1       jtc       371: may be able to supply a more detailed message using information
                    372: from the
1.5       kleink    373: .Fa regex_t . )
1.9       lukem     374: .Fn regerror
1.1       jtc       375: places the NUL-terminated message into the buffer pointed to by
1.5       kleink    376: .Fa errbuf ,
1.1       jtc       377: limiting the length (including the NUL) to at most
1.5       kleink    378: .Fa errbuf_size
1.1       jtc       379: bytes.
                    380: If the whole message won't fit,
                    381: as much of it as will fit before the terminating NUL is supplied.
                    382: In any case,
                    383: the returned value is the size of buffer needed to hold the whole
                    384: message (including terminating NUL).
                    385: If
1.5       kleink    386: .Fa errbuf_size
1.1       jtc       387: is 0,
1.5       kleink    388: .Fa errbuf
1.1       jtc       389: is ignored but the return value is still correct.
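.Pp
Because the return value is the buffer size needed for the whole message,
regerror() can be called twice: once with a zero-sized buffer to learn the
size, and again to fetch the text.
The sketch below assumes that approach; re_strerror is a hypothetical
helper, not a library function.

```c
#include <sys/types.h>
#include <regex.h>
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns a malloc()ed copy of the message for a
 * non-zero errcode, or NULL if memory cannot be allocated.  The caller
 * frees the result.  preg may be NULL.
 */
char *
re_strerror(int errcode, const regex_t *preg)
{
	size_t need;
	char *msg;

	assert(errcode != 0);
	/* errbuf_size of 0: errbuf is ignored, the size is still returned. */
	need = regerror(errcode, preg, NULL, 0);
	msg = malloc(need);
	if (msg != NULL)
		(void)regerror(errcode, preg, msg, need);
	return msg;
}
```

A fixed-size buffer is often enough in practice; the two-call form simply
avoids truncating long messages.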
1.5       kleink    390: .Pp
1.1       jtc       391: If the
1.5       kleink    392: .Fa errcode
1.1       jtc       393: given to
1.5       kleink    394: .Fn regerror
                    395: is first ORed with
                    396: .Dv REG_ITOA ,
1.1       jtc       397: the ``message'' that results is the printable name of the error code,
                    398: e.g. ``REG_NOMATCH'',
                    399: rather than an explanation thereof.
                    400: If
1.5       kleink    401: .Fa errcode
1.10      jdolecek  402: is
1.5       kleink    403: .Dv REG_ATOI ,
1.1       jtc       404: then
1.5       kleink    405: .Fa preg
1.1       jtc       406: shall be non-NULL and the
1.5       kleink    407: .Fa re_endp
1.1       jtc       408: member of the structure it points to
                    409: must point to the printable name of an error code;
                    410: in this case, the result in
1.5       kleink    411: .Fa errbuf
1.1       jtc       412: is the decimal digits of
                    413: the numeric value of the error code
                    414: (0 if the name is not recognized).
1.5       kleink    415: .Dv REG_ITOA
                    416: and
                    417: .Dv REG_ATOI
                    418: are intended primarily as debugging facilities;
                    419: they are extensions, compatible with but not specified by
                    420: .St -p1003.2-92 ,
                    421: and should be used with caution in software intended to be portable to
                    422: other systems.
1.1       jtc       423: Be warned also that they are considered experimental and changes are possible.
1.5       kleink    424: .Pp
1.9       lukem     425: .Fn regfree
1.1       jtc       426: frees any dynamically-allocated storage associated with the compiled RE
                    427: pointed to by
1.5       kleink    428: .Fa preg .
1.1       jtc       429: The remaining
1.5       kleink    430: .Fa regex_t
1.1       jtc       431: is no longer a valid compiled RE
                    432: and the effect of supplying it to
1.5       kleink    433: .Fn regexec
1.1       jtc       434: or
1.5       kleink    435: .Fn regerror
1.1       jtc       436: is undefined.
1.5       kleink    437: .Pp
1.1       jtc       438: None of these functions references global variables except for tables
                    439: of constants;
                    440: all are safe for use from multiple threads if the arguments are safe.
1.5       kleink    441: .Sh IMPLEMENTATION CHOICES
                    442: There are a number of decisions that
                    443: .St -p1003.2-92
                    444: leaves up to the implementor,
1.1       jtc       445: either by explicitly saying ``undefined'' or by virtue of them being
                    446: forbidden by the RE grammar.
                    447: This implementation treats them as follows.
1.5       kleink    448: .Pp
1.1       jtc       449: See
1.5       kleink    450: .Xr re_format 7
1.1       jtc       451: for a discussion of the definition of case-independent matching.
1.5       kleink    452: .Pp
1.1       jtc       453: There is no particular limit on the length of REs,
                    454: except insofar as memory is limited.
                    455: Memory usage is approximately linear in RE size, and largely insensitive
                    456: to RE complexity, except for bounded repetitions.
                    457: See BUGS for one short RE using them
                    458: that will run almost any system out of memory.
1.5       kleink    459: .Pp
1.1       jtc       460: A backslashed character other than one specifically given a magic meaning
1.5       kleink    461: by
                    462: .St -p1003.2-92
                    463: (such magic meanings occur only in obsolete [``basic''] REs)
1.1       jtc       464: is taken as an ordinary character.
1.5       kleink    465: .Pp
                    466: Any unmatched [ is a
                    467: .Dv REG_EBRACK
                    468: error.
                    469: .Pp
1.1       jtc       470: Equivalence classes cannot begin or end bracket-expression ranges.
                    471: The endpoint of one range cannot begin another.
1.5       kleink    472: .Pp
                    473: .Dv RE_DUP_MAX ,
                    474: the limit on repetition counts in bounded repetitions, is 255.
                    475: .Pp
1.1       jtc       476: A repetition operator (?, *, +, or bounds) cannot follow another
                    477: repetition operator.
                    478: A repetition operator cannot begin an expression or subexpression
                    479: or follow `^' or `|'.
1.5       kleink    480: .Pp
1.1       jtc       481: `|' cannot appear first or last in a (sub)expression or after another `|',
                    482: i.e. an operand of `|' cannot be an empty subexpression.
                    483: An empty parenthesized subexpression, `()', is legal and matches an
                    484: empty (sub)string.
                    485: An empty string is not a legal RE.
1.5       kleink    486: .Pp
1.1       jtc       487: A `{' followed by a digit is considered the beginning of bounds for a
                    488: bounded repetition, which must then follow the syntax for bounds.
                    489: A `{' \fInot\fR followed by a digit is considered an ordinary character.
1.5       kleink    490: .Pp
1.1       jtc       491: `^' and `$' beginning and ending subexpressions in obsolete (``basic'')
                    492: REs are anchors, not ordinary characters.
1.5       kleink    493: .Sh DIAGNOSTICS
1.1       jtc       494: Non-zero error codes from
1.5       kleink    495: .Fn regcomp
1.1       jtc       496: and
1.5       kleink    497: .Fn regexec
1.1       jtc       498: include the following:
1.5       kleink    499: .Pp
                    500: .Bl -tag -width XXXREG_ECOLLATE -compact
                    501: .It Dv REG_NOMATCH
                    502: regexec() failed to match
                    503: .It Dv REG_BADPAT
                    504: invalid regular expression
                    505: .It Dv REG_ECOLLATE
                    506: invalid collating element
                    507: .It Dv REG_ECTYPE
                    508: invalid character class
                    509: .It Dv REG_EESCAPE
                    510: \e applied to unescapable character
                    511: .It Dv REG_ESUBREG
                    512: invalid backreference number
                    513: .It Dv REG_EBRACK
                    514: brackets [ ] not balanced
                    515: .It Dv REG_EPAREN
                    516: parentheses ( ) not balanced
                    517: .It Dv REG_EBRACE
                    518: braces { } not balanced
                    519: .It Dv REG_BADBR
                    520: invalid repetition count(s) in { }
                    521: .It Dv REG_ERANGE
                    522: invalid character range in [ ]
                    523: .It Dv REG_ESPACE
                    524: ran out of memory
                    525: .It Dv REG_BADRPT
                    526: ?, *, or + operand invalid
                    527: .It Dv REG_EMPTY
                    528: empty (sub)expression
                    529: .It Dv REG_ASSERT
                    530: ``can't happen''\(emyou found a bug
                    531: .It Dv REG_INVARG
                    532: invalid argument, e.g. negative-length string
                    533: .El
1.11      wiz       534: .Sh SEE ALSO
                    535: .Xr grep 1 ,
                    536: .Xr sed 1 ,
                    537: .Xr re_format 7
                    538: .Pp
                    539: .St -p1003.2-92 ,
                    540: sections 2.8 (Regular Expression Notation)
                    541: and
                    542: B.5 (C Binding for Regular Expression Matching).
1.5       kleink    543: .Sh HISTORY
1.3       cgd       544: Originally written by Henry Spencer.
1.8       perry     545: Altered for inclusion in the
                    546: .Bx 4.4
                    547: distribution.
1.5       kleink    548: .Sh BUGS
1.1       jtc       549: This is an alpha release with known defects.
                    550: Please report problems.
1.5       kleink    551: .Pp
1.1       jtc       552: There is one known functionality bug.
                    553: The implementation of internationalization is incomplete:
1.5       kleink    554: the locale is always assumed to be the default one of
                    555: .St -p1003.2-92 ,
1.1       jtc       556: and only the collating elements etc. of that locale are available.
1.5       kleink    557: .Pp
1.1       jtc       558: The back-reference code is subtle and doubts linger about its correctness
                    559: in complex cases.
1.5       kleink    560: .Pp
1.9       lukem     561: .Fn regexec
1.1       jtc       562: performance is poor.
                    563: This will improve with later releases.
1.9       lukem     564: .Fa nmatch
1.1       jtc       565: exceeding 0 is expensive;
1.5       kleink    566: .Fa nmatch
1.1       jtc       567: exceeding 1 is worse.
1.9       lukem     568: .Fa regexec
1.5       kleink    569: is largely insensitive to RE complexity
                    570: .Em except
                    571: that back references are massively expensive.
1.1       jtc       572: RE length does matter; in particular, there is a strong speed bonus
                    573: for keeping RE length under about 30 characters,
                    574: with most special characters counting roughly double.
1.5       kleink    575: .Pp
1.9       lukem     576: .Fn regcomp
1.1       jtc       577: implements bounded repetitions by macro expansion,
                    578: which is costly in time and space if counts are large
                    579: or bounded repetitions are nested.
                    580: An RE like, say,
                    581: `((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
                    582: will (eventually) run almost any existing machine out of swap space.
1.5       kleink    583: .Pp
1.1       jtc       584: There are suspected problems with response to obscure error conditions.
                    585: Notably,
                    586: certain kinds of internal overflow,
                    587: produced only by truly enormous REs or by multiply nested bounded repetitions,
                    588: are probably not handled well.
1.5       kleink    589: .Pp
                    590: Due to a mistake in
                    591: .St -p1003.2-92 ,
                    592: things like `a)b' are legal REs because `)' is a special character
                    593: only in the presence of a previous unmatched `('.  This can't be fixed
                    594: until the spec is fixed.
                    595: .Pp
1.1       jtc       596: The standard's definition of back references is vague.
                    597: For example, does
                    598: `a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
1.5       kleink    599: Until the standard is clarified, behavior in such cases should not be
                    600: relied on.
                    601: .Pp
1.1       jtc       602: The implementation of word-boundary matching is a bit of a kludge,
                    603: and bugs may lurk in combinations of word-boundary matching and anchoring.
