
Annotation of src/lib/libc/regex/regex.3, Revision 1.18

1.18    ! wiz         1: .\"    $NetBSD: regex.3,v 1.17 2003/08/07 16:43:20 agc Exp $
1.4       cgd         2: .\"
1.3       cgd         3: .\" Copyright (c) 1992, 1993, 1994
                      4: .\"    The Regents of the University of California.  All rights reserved.
1.17      agc         5: .\"
                      6: .\" This code is derived from software contributed to Berkeley by
                      7: .\" Henry Spencer.
                      8: .\"
                      9: .\" Redistribution and use in source and binary forms, with or without
                     10: .\" modification, are permitted provided that the following conditions
                     11: .\" are met:
                     12: .\" 1. Redistributions of source code must retain the above copyright
                     13: .\"    notice, this list of conditions and the following disclaimer.
                     14: .\" 2. Redistributions in binary form must reproduce the above copyright
                     15: .\"    notice, this list of conditions and the following disclaimer in the
                     16: .\"    documentation and/or other materials provided with the distribution.
                     17: .\" 3. Neither the name of the University nor the names of its contributors
                     18: .\"    may be used to endorse or promote products derived from this software
                     19: .\"    without specific prior written permission.
                     20: .\"
                     21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
                     22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     24: .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
                     25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     31: .\" SUCH DAMAGE.
                     32: .\"
                     33: .\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
1.3       cgd        34: .\"
                     35: .\" This code is derived from software contributed to Berkeley by
                     36: .\" Henry Spencer.
                     37: .\"
                     38: .\" Redistribution and use in source and binary forms, with or without
                     39: .\" modification, are permitted provided that the following conditions
                     40: .\" are met:
                     41: .\" 1. Redistributions of source code must retain the above copyright
                     42: .\"    notice, this list of conditions and the following disclaimer.
                     43: .\" 2. Redistributions in binary form must reproduce the above copyright
                     44: .\"    notice, this list of conditions and the following disclaimer in the
                     45: .\"    documentation and/or other materials provided with the distribution.
                     46: .\" 3. All advertising materials mentioning features or use of this software
                     47: .\"    must display the following acknowledgement:
                     48: .\"    This product includes software developed by the University of
                     49: .\"    California, Berkeley and its contributors.
                     50: .\" 4. Neither the name of the University nor the names of its contributors
                     51: .\"    may be used to endorse or promote products derived from this software
                     52: .\"    without specific prior written permission.
                     53: .\"
                     54: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
                     55: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     56: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     57: .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
                     58: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     59: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     60: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     61: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     62: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     63: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     64: .\" SUCH DAMAGE.
                     65: .\"
                     66: .\"    @(#)regex.3     8.4 (Berkeley) 3/20/94
                     67: .\"
1.18    ! wiz        68: .Dd December 29, 2003
1.5       kleink     69: .Dt REGEX 3
                     70: .Os
                     71: .Sh NAME
                     72: .Nm regex ,
                     73: .Nm regcomp ,
                     74: .Nm regexec ,
                     75: .Nm regerror ,
                     76: .Nm regfree
                     77: .Nd regular-expression library
1.7       perry      78: .Sh LIBRARY
                     79: .Lb libc
1.5       kleink     80: .Sh SYNOPSIS
1.16      wiz        81: .In regex.h
1.5       kleink     82: .Ft int
1.15      kleink     83: .Fn regcomp "regex_t * restrict preg" "const char * restrict pattern" "int cflags"
1.5       kleink     84: .Ft int
1.15      kleink     85: .Fn regexec "const regex_t * restrict preg" "const char * restrict string" "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
1.5       kleink     86: .Ft size_t
1.15      kleink     87: .Fn regerror "int errcode" "const regex_t * restrict preg" "char * restrict errbuf" "size_t errbuf_size"
1.5       kleink     88: .Ft void
                     89: .Fn regfree "regex_t *preg"
                     90: .Sh DESCRIPTION
                     91: These routines implement
                     92: .St -p1003.2-92
                     93: regular expressions (``RE''s);
1.1       jtc        94: see
1.5       kleink     95: .Xr re_format 7 .
                     96: .Fn regcomp
1.1       jtc        97: compiles an RE written as a string into an internal form,
1.5       kleink     98: .Fn regexec
1.1       jtc        99: matches that internal form against a string and reports results,
1.5       kleink    100: .Fn regerror
1.1       jtc       101: transforms error codes from either into human-readable messages,
                    102: and
1.5       kleink    103: .Fn regfree
1.1       jtc       104: frees any dynamically-allocated storage used by the internal form
                    105: of an RE.
1.5       kleink    106: .Pp
1.1       jtc       107: The header
1.12      ross      108: .Em \*[Lt]regex.h\*[Gt]
1.1       jtc       109: declares two structure types,
1.5       kleink    110: .Fa regex_t
1.1       jtc       111: and
1.5       kleink    112: .Fa regmatch_t ,
1.1       jtc       113: the former for compiled internal forms and the latter for match reporting.
                    114: It also declares the four functions,
                    115: a type
1.5       kleink    116: .Fa regoff_t ,
1.1       jtc       117: and a number of constants with names starting with ``REG_''.
1.5       kleink    118: .Pp
1.9       lukem     119: .Fn regcomp
1.1       jtc       120: compiles the regular expression contained in the
1.5       kleink    121: .Fa pattern
1.1       jtc       122: string,
                    123: subject to the flags in
1.5       kleink    124: .Fa cflags ,
1.1       jtc       125: and places the results in the
1.5       kleink    126: .Fa regex_t
1.1       jtc       127: structure pointed to by
1.5       kleink    128: .Fa preg .
1.9       lukem     129: .Fa cflags
1.1       jtc       130: is the bitwise OR of zero or more of the following flags:
1.5       kleink    131: .Bl -tag -width XXXREG_EXTENDED
                    132: .It Dv REG_EXTENDED
                    133: Compile modern (``extended'') REs, rather than the obsolete
                    134: (``basic'') REs that are the default.
                    135: .It Dv REG_BASIC
1.1       jtc       136: This is a synonym for 0,
                    137: provided as a counterpart to REG_EXTENDED to improve readability.
1.5       kleink    138: .It Dv REG_NOSPEC
1.14      wiz       139: Compile with recognition of all special characters turned off.
                    140: All characters are thus considered ordinary, so the ``RE'' is a literal
1.5       kleink    141: string.
                    142: This is an extension, compatible with but not specified by
                    143: .St -p1003.2-92 ,
                    144: and should be used with caution in software intended to be portable to
                    145: other systems.
                    146: .Dv REG_EXTENDED
                    147: and
                    148: .Dv REG_NOSPEC
                    149: may not be used in the same call to
                    150: .Fn regcomp .
                    151: .It Dv REG_ICASE
1.14      wiz       152: Compile for matching that ignores upper/lower case distinctions.
                    153: See
1.5       kleink    154: .Xr re_format 7 .
                    155: .It Dv REG_NOSUB
                    156: Compile for matching that need only report success or failure, not
                    157: what was matched.
                    158: .It Dv REG_NEWLINE
1.1       jtc       159: Compile for newline-sensitive matching.
                    160: By default, newline is a completely ordinary character with no special
                    161: meaning in either REs or strings.
                    162: With this flag,
                    163: `[^' bracket expressions and `.' never match newline,
                    164: a `^' anchor matches the null string after any newline in the string
                    165: in addition to its normal function,
                    166: and the `$' anchor matches the null string before any newline in the
                    167: string in addition to its normal function.
1.5       kleink    168: .It Dv REG_PEND
                    169: The regular expression ends, not at the first NUL, but just before the
                    170: character pointed to by the
                    171: .Fa re_endp
1.1       jtc       172: member of the structure pointed to by
1.5       kleink    173: .Fa preg .
1.1       jtc       174: The
1.5       kleink    175: .Fa re_endp
1.1       jtc       176: member is of type
1.5       kleink    177: .Fa "const\ char\ *" .
                    178: This flag permits inclusion of NULs in the RE; they are considered
                    179: ordinary characters.
                    180: This is an extension, compatible with but not specified by
                    181: .St -p1003.2-92 ,
                    182: and should be used with caution in software intended to be portable to
                    183: other systems.
                    184: .El
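The effect of REG_NEWLINE on the anchors can be seen in a minimal sketch like the one below. The helper name line_matches and its return convention are assumptions made for illustration; they are not part of the library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the extended RE matches somewhere in
 * text, 0 if it does not, and -1 if the RE fails to compile or regexec
 * reports some other error.  When newline_sensitive is non-zero the RE is
 * compiled with REG_NEWLINE, so `^' and `$' also match around embedded
 * newlines.
 */
static int
line_matches(const char *pattern, const char *text, int newline_sensitive)
{
	regex_t re;
	int cflags = REG_EXTENDED | REG_NOSUB;
	int rc;

	assert(pattern != NULL && text != NULL);

	if (newline_sensitive)
		cflags |= REG_NEWLINE;
	if (regcomp(&re, pattern, cflags) != 0)
		return -1;

	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);

	if (rc == 0)
		return 1;
	return (rc == REG_NOMATCH) ? 0 : -1;
}
```

Called with the pattern "^two$" and the text "one\ntwo\nthree", this sketch is expected to report a match only when newline_sensitive is non-zero, because by default the anchors apply to the whole string.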
                    185: .Pp
1.1       jtc       186: When successful,
1.5       kleink    187: .Fn regcomp
1.1       jtc       188: returns 0 and fills in the structure pointed to by
1.5       kleink    189: .Fa preg .
                    190: One member of that structure (other than
                    191: .Fa re_endp )
1.1       jtc       192: is publicized:
1.5       kleink    193: .Fa re_nsub ,
1.1       jtc       194: of type
1.5       kleink    195: .Fa size_t ,
1.1       jtc       196: contains the number of parenthesized subexpressions within the RE
                    197: (except that the value of this member is undefined if the
1.5       kleink    198: .Dv REG_NOSUB
                    199: flag was used).
1.1       jtc       200: If
1.5       kleink    201: .Fn regcomp
1.1       jtc       202: fails, it returns a non-zero error code;
1.11      wiz       203: see
                    204: .Sx DIAGNOSTICS .
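As a rough illustration of the compile, match, and free cycle described above, the following sketch wraps the three calls in one function. The name re_matches and the 1/0/-1 return convention are hypothetical choices for this example only.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if text matches the extended RE in
 * pattern, 0 if it does not, and -1 if regcomp fails or regexec returns
 * an error code other than REG_NOMATCH.
 */
static int
re_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);

	/* REG_NOSUB: only success or failure is wanted, so no pmatch array. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	rc = regexec(&re, text, 0, NULL, 0);

	/* Release the storage held by the compiled form. */
	regfree(&re);

	if (rc == 0)
		return 1;
	return (rc == REG_NOMATCH) ? 0 : -1;
}
```

A caller that matches the same RE many times would normally keep the compiled regex_t around instead of recompiling on each call; the sketch recompiles only to stay short.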
1.5       kleink    205: .Pp
1.9       lukem     206: .Fn regexec
1.1       jtc       207: matches the compiled RE pointed to by
1.5       kleink    208: .Fa preg
1.1       jtc       209: against the
1.5       kleink    210: .Fa string ,
1.1       jtc       211: subject to the flags in
1.5       kleink    212: .Fa eflags ,
1.1       jtc       213: and reports results using
1.5       kleink    214: .Fa nmatch ,
                    215: .Fa pmatch ,
1.1       jtc       216: and the returned value.
                    217: The RE must have been compiled by a previous invocation of
1.5       kleink    218: .Fn regcomp .
1.1       jtc       219: The compiled form is not altered during execution of
1.5       kleink    220: .Fn regexec ,
1.1       jtc       221: so a single compiled RE can be used simultaneously by multiple threads.
1.5       kleink    222: .Pp
1.1       jtc       223: By default,
                    224: the NUL-terminated string pointed to by
1.5       kleink    225: .Fa string
1.1       jtc       226: is considered to be the text of an entire line, minus any terminating
                    227: newline.
                    228: The
1.5       kleink    229: .Fa eflags
1.1       jtc       230: argument is the bitwise OR of zero or more of the following flags:
1.5       kleink    231: .Bl -tag -width XXXREG_NOTBOL
                    232: .It Dv REG_NOTBOL
                    233: The first character of the string
1.1       jtc       234: is not the beginning of a line, so the `^' anchor should not match before it.
1.5       kleink    235: This does not affect the behavior of newlines under
                    236: .Dv REG_NEWLINE .
                    237: .It Dv REG_NOTEOL
                    238: The NUL terminating the string does not end a line, so the `$' anchor
1.14      wiz       239: should not match before it.
                    240: This does not affect the behavior of newlines under
1.5       kleink    241: .Dv REG_NEWLINE .
                    242: .It Dv REG_STARTEND
1.1       jtc       243: The string is considered to start at
1.5       kleink    244: .Fa string
                    245: +
                    246: .Fa pmatch[0].rm_so
1.1       jtc       247: and to have a terminating NUL located at
1.5       kleink    248: .Fa string
                    249: +
                    250: .Fa pmatch[0].rm_eo
1.1       jtc       251: (there need not actually be a NUL at that location),
                    252: regardless of the value of
1.5       kleink    253: .Fa nmatch .
1.1       jtc       254: See below for the definition of
1.5       kleink    255: .Fa pmatch
1.1       jtc       256: and
1.5       kleink    257: .Fa nmatch .
                    258: This is an extension, compatible with but not specified by
                    259: .St -p1003.2-92 ,
                    260: and should be used with caution in software intended to be portable to
                    261: other systems.
                    262: Note that a non-zero
                    263: .Fa rm_so
                    264: does not imply
                    265: .Dv REG_NOTBOL ;
                    266: .Dv REG_STARTEND
                    267: affects only the location of the string, not how it is matched.
                    268: .El
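One plausible use of REG_NOTBOL is matching a buffer that is known to start in the middle of a line. The sketch below assumes a hypothetical helper named chunk_matches; the at_line_start parameter is likewise an invention of this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the extended RE matches chunk, 0 if
 * it does not, and -1 on any other error.  If at_line_start is zero,
 * REG_NOTBOL is passed so that `^' does not match before the first
 * character of chunk.
 */
static int
chunk_matches(const char *pattern, const char *chunk, int at_line_start)
{
	regex_t re;
	int eflags = at_line_start ? 0 : REG_NOTBOL;
	int rc;

	assert(pattern != NULL && chunk != NULL);

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	rc = regexec(&re, chunk, 0, NULL, eflags);
	regfree(&re);

	if (rc == 0)
		return 1;
	return (rc == REG_NOMATCH) ? 0 : -1;
}
```

With the pattern "^foo" and the chunk "foobar", the sketch should report a match when at_line_start is non-zero and no match otherwise; an unanchored pattern such as "foo" is unaffected by the flag.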
                    269: .Pp
1.1       jtc       270: See
1.5       kleink    271: .Xr re_format 7
1.1       jtc       272: for a discussion of what is matched in situations where an RE or a
                    273: portion thereof could match any of several substrings of
1.5       kleink    274: .Fa string .
                    275: .Pp
1.1       jtc       276: Normally,
1.5       kleink    277: .Fn regexec
                    278: returns 0 for success and the non-zero code
                    279: .Dv REG_NOMATCH
                    280: for failure.
1.1       jtc       281: Other non-zero error codes may be returned in exceptional situations;
1.11      wiz       282: see
                    283: .Sx DIAGNOSTICS .
1.5       kleink    284: .Pp
                    285: If
                    286: .Dv REG_NOSUB
                    287: was specified in the compilation of the RE, or if
                    288: .Fa nmatch
1.1       jtc       289: is 0,
1.5       kleink    290: .Fn regexec
1.1       jtc       291: ignores the
1.5       kleink    292: .Fa pmatch
                    293: argument (but see below for the case where
                    294: .Dv REG_STARTEND
                    295: is specified).
1.1       jtc       296: Otherwise,
1.5       kleink    297: .Fa pmatch
1.1       jtc       298: points to an array of
1.5       kleink    299: .Fa nmatch
1.1       jtc       300: structures of type
1.5       kleink    301: .Fa regmatch_t .
1.1       jtc       302: Such a structure has at least the members
1.5       kleink    303: .Fa rm_so
1.1       jtc       304: and
1.5       kleink    305: .Fa rm_eo ,
1.1       jtc       306: both of type
1.5       kleink    307: .Fa regoff_t
1.1       jtc       308: (a signed arithmetic type at least as large as an
1.5       kleink    309: .Fa off_t
1.1       jtc       310: and a
1.5       kleink    311: .Fa ssize_t ) ,
1.1       jtc       312: containing respectively the offset of the first character of a substring
                    313: and the offset of the first character after the end of the substring.
                    314: Offsets are measured from the beginning of the
1.5       kleink    315: .Fa string
1.1       jtc       316: argument given to
1.5       kleink    317: .Fn regexec .
1.1       jtc       318: An empty substring is denoted by equal offsets,
                    319: both indicating the character following the empty substring.
1.5       kleink    320: .Pp
1.1       jtc       321: The 0th member of the
1.5       kleink    322: .Fa pmatch
1.1       jtc       323: array is filled in to indicate what substring of
1.5       kleink    324: .Fa string
1.1       jtc       325: was matched by the entire RE.
                    326: Remaining members report what substring was matched by parenthesized
                    327: subexpressions within the RE;
                    328: member
1.5       kleink    329: .Fa i
1.1       jtc       330: reports subexpression
1.5       kleink    331: .Fa i ,
                    332: with subexpressions counted (starting at 1) by the order of their
                    333: opening parentheses in the RE, left to right.
1.1       jtc       334: Unused entries in the array\(emcorresponding either to subexpressions that
                    335: did not participate in the match at all, or to subexpressions that do not
1.5       kleink    336: exist in the RE (that is,
                    337: .Fa i
1.12      ross      338: \*[Gt]
                    339: .Fa preg-\*[Gt]re_nsub )
1.5       kleink    340: \(emhave both
                    341: .Fa rm_so
1.1       jtc       342: and
1.5       kleink    343: .Fa rm_eo
                    344: set to -1.
1.1       jtc       345: If a subexpression participated in the match several times,
                    346: the reported substring is the last one it matched.
                    347: (Note in particular that when the RE `(b*)+' matches `bbb',
                    348: the parenthesized subexpression matches each of the three `b's and then
                    349: an infinite number of empty strings following the last `b',
                    350: so the reported substring is one of the empties.)
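The offsets in the pmatch array can be turned into substrings as in the following sketch. The function copy_submatch, the MAXMATCH limit of 10, and the return convention are all assumptions of this example, not library features.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies the text matched by subexpression idx
 * (0 means the whole RE) into out as a NUL-terminated string and returns
 * its length.  Returns -1 if the RE does not compile, does not match,
 * the subexpression did not participate in the match, or out is too
 * small.
 */
static int
copy_submatch(const char *pattern, const char *text, size_t idx,
    char *out, size_t outsize)
{
	enum { MAXMATCH = 10 };		/* arbitrary limit for this sketch */
	regmatch_t pm[MAXMATCH];
	regex_t re;
	int len = -1;

	assert(idx < MAXMATCH && out != NULL && outsize > 0);

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	/* Unused entries have rm_so and rm_eo set to -1. */
	if (regexec(&re, text, MAXMATCH, pm, 0) == 0 && pm[idx].rm_so != -1) {
		size_t n = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);

		if (n < outsize) {
			/* Offsets are measured from the start of text. */
			memcpy(out, text + pm[idx].rm_so, n);
			out[n] = '\0';
			len = (int)n;
		}
	}
	regfree(&re);
	return len;
}
```

For the RE "([a-z]+)=([0-9]+)" and the string "key=42", index 2 should yield "42". For "(a)|(b)" matched against "b", index 1 should be reported as unused, since the first subexpression takes no part in the match.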
1.5       kleink    351: .Pp
                    352: If
                    353: .Dv REG_STARTEND
                    354: is specified,
                    355: .Fa pmatch
1.1       jtc       356: must point to at least one
1.5       kleink    357: .Fa regmatch_t
1.1       jtc       358: (even if
1.5       kleink    359: .Fa nmatch
                    360: is 0 or
                    361: .Dv REG_NOSUB
                    362: was specified),
                    363: to hold the input offsets for
                    364: .Dv REG_STARTEND .
1.1       jtc       365: Use for output is still entirely controlled by
1.5       kleink    366: .Fa nmatch ;
1.1       jtc       367: if
1.5       kleink    368: .Fa nmatch
                    369: is 0 or
                    370: .Dv REG_NOSUB
                    371: was specified,
1.1       jtc       372: the value of
1.5       kleink    373: .Fa pmatch[0]
1.1       jtc       374: will not be changed by a successful
1.5       kleink    375: .Fn regexec .
                    376: .Pp
1.9       lukem     377: .Fn regerror
1.1       jtc       378: maps a non-zero
1.5       kleink    379: .Fa errcode
1.1       jtc       380: from either
1.5       kleink    381: .Fn regcomp
1.1       jtc       382: or
1.5       kleink    383: .Fn regexec
1.1       jtc       384: to a human-readable, printable message.
                    385: If
1.5       kleink    386: .Fa preg
1.1       jtc       387: is non-NULL,
1.5       kleink    388: the error code should have arisen from use of the
                    389: .Fa regex_t
1.1       jtc       390: pointed to by
1.5       kleink    391: .Fa preg ,
1.1       jtc       392: and if the error code came from
1.5       kleink    393: .Fn regcomp ,
1.1       jtc       394: it should have been the result from the most recent
1.5       kleink    395: .Fn regcomp
1.1       jtc       396: using that
1.5       kleink    397: .Fa regex_t . (
1.9       lukem     398: .Fn regerror
1.1       jtc       399: may be able to supply a more detailed message using information
                    400: from the
1.5       kleink    401: .Fa regex_t . )
1.9       lukem     402: .Fn regerror
1.1       jtc       403: places the NUL-terminated message into the buffer pointed to by
1.5       kleink    404: .Fa errbuf ,
1.1       jtc       405: limiting the length (including the NUL) to at most
1.5       kleink    406: .Fa errbuf_size
1.1       jtc       407: bytes.
                    408: If the whole message won't fit,
                    409: as much of it as will fit before the terminating NUL is supplied.
                    410: In any case,
                    411: the returned value is the size of buffer needed to hold the whole
                    412: message (including terminating NUL).
                    413: If
1.5       kleink    414: .Fa errbuf_size
1.1       jtc       415: is 0,
1.5       kleink    416: .Fa errbuf
1.1       jtc       417: is ignored but the return value is still correct.
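Because the return value is the full size needed, a message buffer can be sized with a first call that passes an errbuf_size of 0. The sketch below shows one way to do this; the helper name re_strerror is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc'd, NUL-terminated message for a
 * non-zero errcode, or NULL if memory cannot be allocated.  preg may be
 * NULL.  The caller frees the result.
 */
static char *
re_strerror(int errcode, const regex_t *preg)
{
	size_t need;
	char *msg;

	assert(errcode != 0);

	/* With errbuf_size 0, errbuf is ignored and the size is returned. */
	need = regerror(errcode, preg, NULL, 0);

	msg = malloc(need);
	if (msg != NULL)
		(void)regerror(errcode, preg, msg, need);
	return msg;
}
```

A fixed-size buffer is often enough in practice; the two-call form is shown only because it avoids truncating long messages.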
1.5       kleink    418: .Pp
1.1       jtc       419: If the
1.5       kleink    420: .Fa errcode
1.1       jtc       421: given to
1.5       kleink    422: .Fn regerror
                    423: is first ORed with
                    424: .Dv REG_ITOA ,
1.1       jtc       425: the ``message'' that results is the printable name of the error code,
                    426: e.g. ``REG_NOMATCH'',
                    427: rather than an explanation thereof.
                    428: If
1.5       kleink    429: .Fa errcode
1.10      jdolecek  430: is
1.5       kleink    431: .Dv REG_ATOI ,
1.1       jtc       432: then
1.5       kleink    433: .Fa preg
1.1       jtc       434: shall be non-NULL and the
1.5       kleink    435: .Fa re_endp
1.1       jtc       436: member of the structure it points to
                    437: must point to the printable name of an error code;
                    438: in this case, the result in
1.5       kleink    439: .Fa errbuf
1.1       jtc       440: is the decimal digits of
                    441: the numeric value of the error code
                    442: (0 if the name is not recognized).
1.5       kleink    443: .Dv REG_ITOA
                    444: and
                    445: .Dv REG_ATOI
                    446: are intended primarily as debugging facilities;
                    447: they are extensions, compatible with but not specified by
                    448: .St -p1003.2-92 ,
                    449: and should be used with caution in software intended to be portable to
                    450: other systems.
1.1       jtc       451: Be warned also that they are considered experimental and changes are possible.
1.5       kleink    452: .Pp
1.9       lukem     453: .Fn regfree
1.1       jtc       454: frees any dynamically-allocated storage associated with the compiled RE
                    455: pointed to by
1.5       kleink    456: .Fa preg .
1.1       jtc       457: The remaining
1.5       kleink    458: .Fa regex_t
1.1       jtc       459: is no longer a valid compiled RE
                    460: and the effect of supplying it to
1.5       kleink    461: .Fn regexec
1.1       jtc       462: or
1.5       kleink    463: .Fn regerror
1.1       jtc       464: is undefined.
1.5       kleink    465: .Pp
1.1       jtc       466: None of these functions references global variables except for tables
                    467: of constants;
                    468: all are safe for use from multiple threads if the arguments are safe.
1.5       kleink    469: .Sh IMPLEMENTATION CHOICES
                    470: There are a number of decisions that
                    471: .St -p1003.2-92
                    472: leaves up to the implementor,
1.1       jtc       473: either by explicitly saying ``undefined'' or by virtue of their being
                    474: forbidden by the RE grammar.
                    475: This implementation treats them as follows.
1.5       kleink    476: .Pp
1.1       jtc       477: See
1.5       kleink    478: .Xr re_format 7
1.1       jtc       479: for a discussion of the definition of case-independent matching.
1.5       kleink    480: .Pp
1.1       jtc       481: There is no particular limit on the length of REs,
                    482: except insofar as memory is limited.
                    483: Memory usage is approximately linear in RE size, and largely insensitive
                    484: to RE complexity, except for bounded repetitions.
                    485: See BUGS for one short RE using them
                    486: that will run almost any system out of memory.
1.5       kleink    487: .Pp
1.1       jtc       488: A backslashed character other than one specifically given a magic meaning
1.5       kleink    489: by
                    490: .St -p1003.2-92
                    491: (such magic meanings occur only in obsolete [``basic''] REs)
1.1       jtc       492: is taken as an ordinary character.
1.5       kleink    493: .Pp
                    494: Any unmatched [ is a
                    495: .Dv REG_EBRACK
                    496: error.
                    497: .Pp
1.1       jtc       498: Equivalence classes cannot begin or end bracket-expression ranges.
                    499: The endpoint of one range cannot begin another.
1.5       kleink    500: .Pp
                    501: .Dv RE_DUP_MAX ,
                    502: the limit on repetition counts in bounded repetitions, is 255.
                    503: .Pp
1.1       jtc       504: A repetition operator (?, *, +, or bounds) cannot follow another
                    505: repetition operator.
                    506: A repetition operator cannot begin an expression or subexpression
                    507: or follow `^' or `|'.
1.5       kleink    508: .Pp
1.1       jtc       509: `|' cannot appear first or last in a (sub)expression or after another `|',
                    510: i.e. an operand of `|' cannot be an empty subexpression.
                    511: An empty parenthesized subexpression, `()', is legal and matches an
                    512: empty (sub)string.
                    513: An empty string is not a legal RE.
1.5       kleink    514: .Pp
1.1       jtc       515: A `{' followed by a digit is considered the beginning of bounds for a
                    516: bounded repetition, which must then follow the syntax for bounds.
                    517: A `{' \fInot\fR followed by a digit is considered an ordinary character.
1.5       kleink    518: .Pp
1.1       jtc       519: `^' and `$' beginning and ending subexpressions in obsolete (``basic'')
                    520: REs are anchors, not ordinary characters.
1.5       kleink    521: .Sh DIAGNOSTICS
1.1       jtc       522: Non-zero error codes from
1.5       kleink    523: .Fn regcomp
1.1       jtc       524: and
1.5       kleink    525: .Fn regexec
1.1       jtc       526: include the following:
1.5       kleink    527: .Pp
                    528: .Bl -tag -width XXXREG_ECOLLATE -compact
                    529: .It Dv REG_NOMATCH
                    530: regexec() failed to match
                    531: .It Dv REG_BADPAT
                    532: invalid regular expression
                    533: .It Dv REG_ECOLLATE
                    534: invalid collating element
                    535: .It Dv REG_ECTYPE
                    536: invalid character class
                    537: .It Dv REG_EESCAPE
                    538: \e applied to unescapable character
                    539: .It Dv REG_ESUBREG
                    540: invalid backreference number
                    541: .It Dv REG_EBRACK
                    542: brackets [ ] not balanced
                    543: .It Dv REG_EPAREN
                    544: parentheses ( ) not balanced
                    545: .It Dv REG_EBRACE
                    546: braces { } not balanced
                    547: .It Dv REG_BADBR
                    548: invalid repetition count(s) in { }
                    549: .It Dv REG_ERANGE
                    550: invalid character range in [ ]
                    551: .It Dv REG_ESPACE
                    552: ran out of memory
                    553: .It Dv REG_BADRPT
                    554: ?, *, or + operand invalid
                    555: .It Dv REG_EMPTY
                    556: empty (sub)expression
                    557: .It Dv REG_ASSERT
                    558: ``can't happen''\(emyou found a bug
                    559: .It Dv REG_INVARG
                    560: invalid argument, e.g. negative-length string
                    561: .El
1.11      wiz       562: .Sh SEE ALSO
                    563: .Xr grep 1 ,
                    564: .Xr sed 1 ,
                    565: .Xr re_format 7
                    566: .Pp
                    567: .St -p1003.2-92 ,
                    568: sections 2.8 (Regular Expression Notation)
                    569: and
                    570: B.5 (C Binding for Regular Expression Matching).
1.5       kleink    571: .Sh HISTORY
1.3       cgd       572: Originally written by Henry Spencer.
1.8       perry     573: Altered for inclusion in the
                    574: .Bx 4.4
                    575: distribution.
1.5       kleink    576: .Sh BUGS
1.1       jtc       577: There is one known functionality bug.
                    578: The implementation of internationalization is incomplete:
1.5       kleink    579: the locale is always assumed to be the default one of
                    580: .St -p1003.2-92 ,
1.1       jtc       581: and only the collating elements etc. of that locale are available.
1.5       kleink    582: .Pp
1.1       jtc       583: The back-reference code is subtle and doubts linger about its correctness
                    584: in complex cases.
1.5       kleink    585: .Pp
1.9       lukem     586: .Fn regexec
1.1       jtc       587: performance is poor.
                    588: This will improve with later releases.
1.9       lukem     589: .Fa nmatch
1.1       jtc       590: exceeding 0 is expensive;
1.5       kleink    591: .Fa nmatch
1.1       jtc       592: exceeding 1 is worse.
1.9       lukem     593: .Fn regexec
1.5       kleink    594: is largely insensitive to RE complexity
                    595: .Em except
                    596: that back references are massively expensive.
1.1       jtc       597: RE length does matter; in particular, there is a strong speed bonus
                    598: for keeping RE length under about 30 characters,
                    599: with most special characters counting roughly double.
1.5       kleink    600: .Pp
1.9       lukem     601: .Fn regcomp
1.1       jtc       602: implements bounded repetitions by macro expansion,
                    603: which is costly in time and space if counts are large
                    604: or bounded repetitions are nested.
                    605: An RE like, say,
                    606: `((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
                    607: will (eventually) run almost any existing machine out of swap space.
1.5       kleink    608: .Pp
1.1       jtc       609: There are suspected problems with response to obscure error conditions.
                    610: Notably,
                    611: certain kinds of internal overflow,
                    612: produced only by truly enormous REs or by multiply nested bounded repetitions,
                    613: are probably not handled well.
1.5       kleink    614: .Pp
                    615: Due to a mistake in
                    616: .St -p1003.2-92 ,
                    617: things like `a)b' are legal REs because `)' is a special character
1.14      wiz       618: only in the presence of a previous unmatched `('.
                    619: This can't be fixed until the spec is fixed.
1.5       kleink    620: .Pp
1.1       jtc       621: The standard's definition of back references is vague.
                    622: For example, does
                    623: `a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
1.5       kleink    624: Until the standard is clarified, behavior in such cases should not be
                    625: relied on.
                    626: .Pp
1.1       jtc       627: The implementation of word-boundary matching is a bit of a kludge,
                    628: and bugs may lurk in combinations of word-boundary matching and anchoring.
