Annotation of src/lib/libc/regex/regex.3, Revision 1.26

1.26    ! kamil       1: .\"    $NetBSD: regex.3,v 1.25 2017/07/03 21:32:49 wiz Exp $
1.4       cgd         2: .\"
1.3       cgd         3: .\" Copyright (c) 1992, 1993, 1994
                      4: .\"    The Regents of the University of California.  All rights reserved.
1.17      agc         5: .\"
                      6: .\" This code is derived from software contributed to Berkeley by
                      7: .\" Henry Spencer.
                      8: .\"
                      9: .\" Redistribution and use in source and binary forms, with or without
                     10: .\" modification, are permitted provided that the following conditions
                     11: .\" are met:
                     12: .\" 1. Redistributions of source code must retain the above copyright
                     13: .\"    notice, this list of conditions and the following disclaimer.
                     14: .\" 2. Redistributions in binary form must reproduce the above copyright
                     15: .\"    notice, this list of conditions and the following disclaimer in the
                     16: .\"    documentation and/or other materials provided with the distribution.
                     17: .\" 3. Neither the name of the University nor the names of its contributors
                     18: .\"    may be used to endorse or promote products derived from this software
                     19: .\"    without specific prior written permission.
                     20: .\"
                     21: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
                     22: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     23: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     24: .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
                     25: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     26: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     27: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     28: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     29: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     30: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     31: .\" SUCH DAMAGE.
                     32: .\"
                     33: .\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
1.3       cgd        34: .\"
                     35: .\" This code is derived from software contributed to Berkeley by
                     36: .\" Henry Spencer.
                     37: .\"
                     38: .\" Redistribution and use in source and binary forms, with or without
                     39: .\" modification, are permitted provided that the following conditions
                     40: .\" are met:
                     41: .\" 1. Redistributions of source code must retain the above copyright
                     42: .\"    notice, this list of conditions and the following disclaimer.
                     43: .\" 2. Redistributions in binary form must reproduce the above copyright
                     44: .\"    notice, this list of conditions and the following disclaimer in the
                     45: .\"    documentation and/or other materials provided with the distribution.
                     46: .\" 3. All advertising materials mentioning features or use of this software
                     47: .\"    must display the following acknowledgement:
                     48: .\"    This product includes software developed by the University of
                     49: .\"    California, Berkeley and its contributors.
                     50: .\" 4. Neither the name of the University nor the names of its contributors
                     51: .\"    may be used to endorse or promote products derived from this software
                     52: .\"    without specific prior written permission.
                     53: .\"
                     54: .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
                     55: .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
                     56: .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
                     57: .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
                     58: .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
                     59: .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
                     60: .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
                     61: .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
                     62: .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
                     63: .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
                     64: .\" SUCH DAMAGE.
                     65: .\"
                     66: .\"    @(#)regex.3     8.4 (Berkeley) 3/20/94
                     67: .\"
1.26    ! kamil      68: .Dd February 26, 2018
1.5       kleink     69: .Dt REGEX 3
                     70: .Os
                     71: .Sh NAME
                     72: .Nm regex ,
                     73: .Nm regcomp ,
                     74: .Nm regexec ,
                     75: .Nm regerror ,
1.23      christos   76: .Nm regfree ,
                     77: .Nm regasub ,
1.24      christos   78: .Nm regnsub
1.5       kleink     79: .Nd regular-expression library
1.7       perry      80: .Sh LIBRARY
                     81: .Lb libc
1.5       kleink     82: .Sh SYNOPSIS
1.16      wiz        83: .In regex.h
1.5       kleink     84: .Ft int
1.15      kleink     85: .Fn regcomp "regex_t * restrict preg" "const char * restrict pattern" "int cflags"
1.5       kleink     86: .Ft int
1.15      kleink     87: .Fn regexec "const regex_t * restrict preg" "const char * restrict string" "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
1.5       kleink     88: .Ft size_t
1.15      kleink     89: .Fn regerror "int errcode" "const regex_t * restrict preg" "char * restrict errbuf" "size_t errbuf_size"
1.5       kleink     90: .Ft void
                     91: .Fn regfree "regex_t *preg"
1.23      christos   92: .Ft ssize_t
1.24      christos   93: .Fn regnsub "char *buf" "size_t bufsiz" "const char *sub" "const regmatch_t *rm" "const char *str"
1.23      christos   94: .Ft ssize_t
                     95: .Fn regasub "char **buf" "const char *sub" "const regmatch_t *rm" "const char *str"
1.5       kleink     96: .Sh DESCRIPTION
                     97: These routines implement
                     98: .St -p1003.2-92
                     99: regular expressions (``RE''s);
1.1       jtc       100: see
1.5       kleink    101: .Xr re_format 7 .
                    102: .Fn regcomp
1.1       jtc       103: compiles an RE written as a string into an internal form,
1.5       kleink    104: .Fn regexec
1.1       jtc       105: matches that internal form against a string and reports results,
1.5       kleink    106: .Fn regerror
1.1       jtc       107: transforms error codes from either into human-readable messages,
                    108: and
1.5       kleink    109: .Fn regfree
1.1       jtc       110: frees any dynamically-allocated storage used by the internal form
                    111: of an RE.
1.5       kleink    112: .Pp
1.1       jtc       113: The header
1.21      joerg     114: .In regex.h
1.1       jtc       115: declares two structure types,
1.5       kleink    116: .Fa regex_t
1.1       jtc       117: and
1.5       kleink    118: .Fa regmatch_t ,
1.1       jtc       119: the former for compiled internal forms and the latter for match reporting.
                    120: It also declares the functions described here,
                    121: a type
1.5       kleink    122: .Fa regoff_t ,
1.1       jtc       123: and a number of constants with names starting with ``REG_''.
1.5       kleink    124: .Pp
1.9       lukem     125: .Fn regcomp
1.1       jtc       126: compiles the regular expression contained in the
1.5       kleink    127: .Fa pattern
1.1       jtc       128: string,
                    129: subject to the flags in
1.5       kleink    130: .Fa cflags ,
1.1       jtc       131: and places the results in the
1.5       kleink    132: .Fa regex_t
1.1       jtc       133: structure pointed to by
1.5       kleink    134: .Fa preg .
1.9       lukem     135: .Fa cflags
1.1       jtc       136: is the bitwise OR of zero or more of the following flags:
1.5       kleink    137: .Bl -tag -width XXXREG_EXTENDED
                    138: .It Dv REG_EXTENDED
                    139: Compile modern (``extended'') REs, rather than the obsolete
                    140: (``basic'') REs that are the default.
                    141: .It Dv REG_BASIC
1.1       jtc       142: This is a synonym for 0,
                    143: provided as a counterpart to REG_EXTENDED to improve readability.
1.5       kleink    144: .It Dv REG_NOSPEC
1.14      wiz       145: Compile with recognition of all special characters turned off.
                    146: All characters are thus considered ordinary, so the ``RE'' is a literal
1.5       kleink    147: string.
                    148: This is an extension, compatible with but not specified by
                    149: .St -p1003.2-92 ,
                    150: and should be used with caution in software intended to be portable to
                    151: other systems.
                    152: .Dv REG_EXTENDED
                    153: and
                    154: .Dv REG_NOSPEC
                    155: may not be used in the same call to
                    156: .Fn regcomp .
                    157: .It Dv REG_ICASE
1.14      wiz       158: Compile for matching that ignores upper/lower case distinctions.
                    159: See
1.5       kleink    160: .Xr re_format 7 .
                    161: .It Dv REG_NOSUB
                    162: Compile for matching that need only report success or failure, not
                    163: what was matched.
                    164: .It Dv REG_NEWLINE
1.1       jtc       165: Compile for newline-sensitive matching.
                    166: By default, newline is a completely ordinary character with no special
                    167: meaning in either REs or strings.
                    168: With this flag,
                    169: `[^' bracket expressions and `.' never match newline,
                    170: a `^' anchor matches the null string after any newline in the string
                    171: in addition to its normal function,
                    172: and the `$' anchor matches the null string before any newline in the
                    173: string in addition to its normal function.
1.5       kleink    174: .It Dv REG_PEND
                    175: The regular expression ends, not at the first NUL, but just before the
                    176: character pointed to by the
                    177: .Fa re_endp
1.1       jtc       178: member of the structure pointed to by
1.5       kleink    179: .Fa preg .
1.1       jtc       180: The
1.5       kleink    181: .Fa re_endp
1.1       jtc       182: member is of type
1.5       kleink    183: .Fa "const\ char\ *" .
                    184: This flag permits inclusion of NULs in the RE; they are considered
                    185: ordinary characters.
                    186: This is an extension, compatible with but not specified by
                    187: .St -p1003.2-92 ,
                    188: and should be used with caution in software intended to be portable to
                    189: other systems.
                    190: .El
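The effect of the compilation flags can be checked with a small wrapper.
The following is a minimal sketch, not part of the library:
the helper name matches() is hypothetical, and it always adds
REG_NOSUB because only success or failure is wanted.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a libc function): compile `pattern` with
 * `cflags`, test it against `text`, and return 1 on a match, 0 on
 * REG_NOMATCH, and -1 on any other error.
 */
static int
matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only success or failure is reported. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

With this helper, matches("^world$", REG_EXTENDED, "hello\nworld") is
expected to report no match, while adding REG_NEWLINE lets the anchors
match around the embedded newline.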
                    191: .Pp
1.1       jtc       192: When successful,
1.5       kleink    193: .Fn regcomp
1.1       jtc       194: returns 0 and fills in the structure pointed to by
1.5       kleink    195: .Fa preg .
                    196: One member of that structure (other than
                    197: .Fa re_endp )
1.1       jtc       198: is publicized:
1.5       kleink    199: .Fa re_nsub ,
1.1       jtc       200: of type
1.5       kleink    201: .Fa size_t ,
1.1       jtc       202: contains the number of parenthesized subexpressions within the RE
                    203: (except that the value of this member is undefined if the
1.5       kleink    204: .Dv REG_NOSUB
                    205: flag was used).
1.1       jtc       206: If
1.5       kleink    207: .Fn regcomp
1.1       jtc       208: fails, it returns a non-zero error code;
1.11      wiz       209: see
                    210: .Sx DIAGNOSTICS .
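As an illustration of the success and failure paths described above,
the sketch below compiles an extended RE, reads re_nsub on success,
and reports the error through regerror() on failure.
The helper name count_subexpressions() and the 128-byte message buffer
are assumptions made for this example.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the number of parenthesized
 * subexpressions in an extended RE, or -1 after printing the
 * regcomp() error message to stderr.
 */
static long
count_subexpressions(const char *pattern)
{
    regex_t re;
    char msg[128];          /* arbitrary size chosen for the sketch */
    long n;
    int rc;

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        /* Pass the same regex_t so a detailed message is possible. */
        regerror(rc, &re, msg, sizeof(msg));
        fprintf(stderr, "regcomp: %s\n", msg);
        return -1;          /* nothing to free after a failed compile */
    }
    n = (long)re.re_nsub;   /* defined because REG_NOSUB was not used */
    regfree(&re);
    return n;
}
```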
1.5       kleink    211: .Pp
1.9       lukem     212: .Fn regexec
1.1       jtc       213: matches the compiled RE pointed to by
1.5       kleink    214: .Fa preg
1.1       jtc       215: against the
1.5       kleink    216: .Fa string ,
1.1       jtc       217: subject to the flags in
1.5       kleink    218: .Fa eflags ,
1.1       jtc       219: and reports results using
1.5       kleink    220: .Fa nmatch ,
                    221: .Fa pmatch ,
1.1       jtc       222: and the returned value.
                    223: The RE must have been compiled by a previous invocation of
1.5       kleink    224: .Fn regcomp .
1.1       jtc       225: The compiled form is not altered during execution of
1.5       kleink    226: .Fn regexec ,
1.1       jtc       227: so a single compiled RE can be used simultaneously by multiple threads.
1.5       kleink    228: .Pp
1.1       jtc       229: By default,
                    230: the NUL-terminated string pointed to by
1.5       kleink    231: .Fa string
1.1       jtc       232: is considered to be the text of an entire line, minus any terminating
                    233: newline.
                    234: The
1.5       kleink    235: .Fa eflags
1.1       jtc       236: argument is the bitwise OR of zero or more of the following flags:
1.5       kleink    237: .Bl -tag -width XXXREG_NOTBOL
                    238: .It Dv REG_NOTBOL
                    239: The first character of the string
1.1       jtc       240: is not the beginning of a line, so the `^' anchor should not match before it.
1.5       kleink    241: This does not affect the behavior of newlines under
                    242: .Dv REG_NEWLINE .
                    243: .It Dv REG_NOTEOL
                    244: The NUL terminating the string does not end a line, so the `$' anchor
1.14      wiz       245: should not match before it.
                    246: This does not affect the behavior of newlines under
1.5       kleink    247: .Dv REG_NEWLINE .
                    248: .It Dv REG_STARTEND
1.1       jtc       249: The string is considered to start at
1.5       kleink    250: .Fa string
                    251: +
                    252: .Fa pmatch[0].rm_so
1.1       jtc       253: and to have a terminating NUL located at
1.5       kleink    254: .Fa string
                    255: +
                    256: .Fa pmatch[0].rm_eo
1.1       jtc       257: (there need not actually be a NUL at that location),
                    258: regardless of the value of
1.5       kleink    259: .Fa nmatch .
1.1       jtc       260: See below for the definition of
1.5       kleink    261: .Fa pmatch
1.1       jtc       262: and
1.5       kleink    263: .Fa nmatch .
                    264: This is an extension, compatible with but not specified by
                    265: .St -p1003.2-92 ,
                    266: and should be used with caution in software intended to be portable to
                    267: other systems.
                    268: Note that a non-zero
                    269: .Fa rm_so
                    270: does not imply
                    271: .Dv REG_NOTBOL ;
                    272: .Dv REG_STARTEND
                    273: affects only the location of the string, not how it is matched.
                    274: .El
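A common use of REG_NOTBOL is scanning one line for successive matches:
after the first match the search restarts in the middle of the line,
so `^' must no longer match at the restart point.
The following is a sketch under that assumption; count_matches() is a
hypothetical name, and the handling of empty matches is one possible
choice among several.

```c
#include <regex.h>

/*
 * Hypothetical helper: count non-overlapping matches of an extended RE
 * in `text`, or return -1 if the RE does not compile.
 */
static int
count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    const char *p = text;
    int eflags = 0;
    int n = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (regexec(&re, p, 1, &m, eflags) == 0) {
        n++;
        if (m.rm_eo == 0) {
            /* Empty match at p: step one character to make progress. */
            if (*p == '\0')
                break;
            p++;
        } else {
            p += m.rm_eo;   /* offsets are relative to p */
        }
        /* Later searches no longer start at the beginning of a line. */
        eflags = REG_NOTBOL;
    }
    regfree(&re);
    return n;
}
```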
                    275: .Pp
1.1       jtc       276: See
1.5       kleink    277: .Xr re_format 7
1.1       jtc       278: for a discussion of what is matched in situations where an RE or a
                    279: portion thereof could match any of several substrings of
1.5       kleink    280: .Fa string .
                    281: .Pp
1.1       jtc       282: Normally,
1.5       kleink    283: .Fn regexec
                    284: returns 0 for success and the non-zero code
                    285: .Dv REG_NOMATCH
                    286: for failure.
1.1       jtc       287: Other non-zero error codes may be returned in exceptional situations;
1.11      wiz       288: see
                    289: .Sx DIAGNOSTICS .
1.5       kleink    290: .Pp
                    291: If
                    292: .Dv REG_NOSUB
                    293: was specified in the compilation of the RE, or if
                    294: .Fa nmatch
1.1       jtc       295: is 0,
1.5       kleink    296: .Fn regexec
1.1       jtc       297: ignores the
1.5       kleink    298: .Fa pmatch
                    299: argument (but see below for the case where
                    300: .Dv REG_STARTEND
                    301: is specified).
1.1       jtc       302: Otherwise,
1.5       kleink    303: .Fa pmatch
1.1       jtc       304: points to an array of
1.5       kleink    305: .Fa nmatch
1.1       jtc       306: structures of type
1.5       kleink    307: .Fa regmatch_t .
1.1       jtc       308: Such a structure has at least the members
1.5       kleink    309: .Fa rm_so
1.1       jtc       310: and
1.5       kleink    311: .Fa rm_eo ,
1.1       jtc       312: both of type
1.5       kleink    313: .Fa regoff_t
1.1       jtc       314: (a signed arithmetic type at least as large as an
1.5       kleink    315: .Fa off_t
1.1       jtc       316: and a
1.5       kleink    317: .Fa ssize_t ) ,
1.1       jtc       318: containing respectively the offset of the first character of a substring
                    319: and the offset of the first character after the end of the substring.
                    320: Offsets are measured from the beginning of the
1.5       kleink    321: .Fa string
1.1       jtc       322: argument given to
1.5       kleink    323: .Fn regexec .
1.1       jtc       324: An empty substring is denoted by equal offsets,
                    325: both indicating the character following the empty substring.
1.5       kleink    326: .Pp
1.1       jtc       327: The 0th member of the
1.5       kleink    328: .Fa pmatch
1.1       jtc       329: array is filled in to indicate what substring of
1.5       kleink    330: .Fa string
1.1       jtc       331: was matched by the entire RE.
                    332: Remaining members report what substring was matched by parenthesized
                    333: subexpressions within the RE;
                    334: member
1.5       kleink    335: .Fa i
1.1       jtc       336: reports subexpression
1.5       kleink    337: .Fa i ,
                    338: with subexpressions counted (starting at 1) by the order of their
                    339: opening parentheses in the RE, left to right.
1.1       jtc       340: Unused entries in the array\(emcorresponding either to subexpressions that
                    341: did not participate in the match at all, or to subexpressions that do not
1.5       kleink    342: exist in the RE (that is,
                    343: .Fa i
1.25      wiz       344: >
                    345: .Fa preg->re_nsub )
1.5       kleink    346: \(emhave both
                    347: .Fa rm_so
1.1       jtc       348: and
1.5       kleink    349: .Fa rm_eo
                    350: set to -1.
1.1       jtc       351: If a subexpression participated in the match several times,
                    352: the reported substring is the last one it matched.
                    353: (Note in particular that when the RE `(b*)+' matches `bbb',
                    354: the parenthesized subexpression matches each of the three `b's and then
                    355: an infinite number of empty strings following the last `b',
                    356: so the reported substring is one of the empties.)
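The pmatch conventions above can be exercised with a short sketch that
copies out the text of one subexpression.
The helper submatch_text(), the NSUB array size and the static output
buffer are assumptions made for illustration; the static buffer also
makes this particular helper unsuitable for use from multiple threads.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

#define NSUB 10     /* illustrative array size, not a library limit */

/*
 * Hypothetical helper: return the text matched by subexpression `idx`
 * (0 is the whole match), or NULL if there is no match or the entry
 * is unused (rm_so == -1).  The result lives in a static buffer.
 */
static const char *
submatch_text(const char *pattern, const char *text, size_t idx)
{
    static char out[256];
    regex_t re;
    regmatch_t pm[NSUB];
    int rc;

    if (idx >= NSUB || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    rc = regexec(&re, text, NSUB, pm, 0);
    regfree(&re);
    if (rc != 0 || pm[idx].rm_so == -1)
        return NULL;
    /* rm_eo is the offset of the first character after the substring. */
    snprintf(out, sizeof(out), "%.*s",
        (int)(pm[idx].rm_eo - pm[idx].rm_so), text + pm[idx].rm_so);
    return out;
}
```

For the RE `(a+)(b+)?' and the string `xaay', entries 0 and 1 both
describe `aa', while entry 2 is unused because the second
subexpression did not participate in the match.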
1.5       kleink    357: .Pp
                    358: If
                    359: .Dv REG_STARTEND
                    360: is specified,
                    361: .Fa pmatch
1.1       jtc       362: must point to at least one
1.5       kleink    363: .Fa regmatch_t
1.1       jtc       364: (even if
1.5       kleink    365: .Fa nmatch
                    366: is 0 or
                    367: .Dv REG_NOSUB
                    368: was specified),
                    369: to hold the input offsets for
                    370: .Dv REG_STARTEND .
1.1       jtc       371: Use for output is still entirely controlled by
1.5       kleink    372: .Fa nmatch ;
1.1       jtc       373: if
1.5       kleink    374: .Fa nmatch
                    375: is 0 or
                    376: .Dv REG_NOSUB
                    377: was specified,
1.1       jtc       378: the value of
1.5       kleink    379: .Fa pmatch [0]
1.1       jtc       380: will not be changed by a successful
1.5       kleink    381: .Fn regexec .
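A sketch of REG_STARTEND usage follows.
Because the flag is an extension, the code is guarded by a
preprocessor check and returns failure where the flag is missing.
The helper name match_range() is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: search buf[start..end) for an extended RE and
 * return the offset of the match from the beginning of buf, or -1 if
 * nothing matched, the RE is invalid, or REG_STARTEND is unavailable.
 */
static int
match_range(const char *pattern, const char *buf, size_t start, size_t end)
{
#ifdef REG_STARTEND
    regex_t re;
    regmatch_t pm[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* pmatch[0] carries the input range into regexec(). */
    pm[0].rm_so = (regoff_t)start;
    pm[0].rm_eo = (regoff_t)end;
    rc = regexec(&re, buf, 1, pm, REG_STARTEND);
    regfree(&re);
    if (rc != 0)
        return -1;
    /* On output, pm[0] is overwritten with the match offsets. */
    return (int)pm[0].rm_so;
#else
    (void)pattern; (void)buf; (void)start; (void)end;
    return -1;
#endif
}
```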
                    382: .Pp
1.9       lukem     383: .Fn regerror
1.1       jtc       384: maps a non-zero
1.5       kleink    385: .Fa errcode
1.1       jtc       386: from either
1.5       kleink    387: .Fn regcomp
1.1       jtc       388: or
1.5       kleink    389: .Fn regexec
1.1       jtc       390: to a human-readable, printable message.
                    391: If
1.5       kleink    392: .Fa preg
1.1       jtc       393: is non-NULL,
1.5       kleink    394: the error code should have arisen from use of the
                    395: .Fa regex_t
1.1       jtc       396: pointed to by
1.5       kleink    397: .Fa preg ,
1.1       jtc       398: and if the error code came from
1.5       kleink    399: .Fn regcomp ,
1.1       jtc       400: it should have been the result from the most recent
1.5       kleink    401: .Fn regcomp
1.1       jtc       402: using that
1.22      enami     403: .Fa regex_t .
                    404: .Po Fn regerror
1.1       jtc       405: may be able to supply a more detailed message using information
                    406: from the
1.22      enami     407: .Fa regex_t . Pc
1.9       lukem     408: .Fn regerror
1.1       jtc       409: places the NUL-terminated message into the buffer pointed to by
1.5       kleink    410: .Fa errbuf ,
1.1       jtc       411: limiting the length (including the NUL) to at most
1.5       kleink    412: .Fa errbuf_size
1.1       jtc       413: bytes.
                    414: If the whole message won't fit,
                    415: as much of it as will fit before the terminating NUL is supplied.
                    416: In any case,
                    417: the returned value is the size of buffer needed to hold the whole
                    418: message (including terminating NUL).
                    419: If
1.5       kleink    420: .Fa errbuf_size
1.1       jtc       421: is 0,
1.5       kleink    422: .Fa errbuf
1.1       jtc       423: is ignored but the return value is still correct.
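Since the return value is the buffer size needed for the whole message,
a first call with an errbuf_size of 0 can be used to size an
allocation.
The following sketch assumes a hypothetical helper named
regex_strerror(); the caller is responsible for freeing the result.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc'd copy of the message for
 * `errcode`, or NULL if memory is exhausted.  `preg` may be NULL.
 */
static char *
regex_strerror(int errcode, const regex_t *preg)
{
    /* With errbuf_size 0 the buffer is ignored; only the size is used. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(need);

    if (msg != NULL)
        (void)regerror(errcode, preg, msg, need);
    return msg;
}
```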
1.5       kleink    424: .Pp
1.1       jtc       425: If the
1.5       kleink    426: .Fa errcode
1.1       jtc       427: given to
1.5       kleink    428: .Fn regerror
                    429: is first ORed with
                    430: .Dv REG_ITOA ,
1.1       jtc       431: the ``message'' that results is the printable name of the error code,
                    432: e.g. ``REG_NOMATCH'',
                    433: rather than an explanation thereof.
                    434: If
1.5       kleink    435: .Fa errcode
1.10      jdolecek  436: is
1.5       kleink    437: .Dv REG_ATOI ,
1.1       jtc       438: then
1.5       kleink    439: .Fa preg
1.1       jtc       440: shall be non-NULL and the
1.5       kleink    441: .Fa re_endp
1.1       jtc       442: member of the structure it points to
                    443: must point to the printable name of an error code;
                    444: in this case, the result in
1.5       kleink    445: .Fa errbuf
1.1       jtc       446: is the decimal digits of
                    447: the numeric value of the error code
                    448: (0 if the name is not recognized).
1.5       kleink    449: .Dv REG_ITOA
                    450: and
                    451: .Dv REG_ATOI
                    452: are intended primarily as debugging facilities;
                    453: they are extensions, compatible with but not specified by
                    454: .St -p1003.2-92 ,
                    455: and should be used with caution in software intended to be portable to
                    456: other systems.
1.1       jtc       457: Be warned also that they are considered experimental and changes are possible.
1.5       kleink    458: .Pp
1.9       lukem     459: .Fn regfree
1.1       jtc       460: frees any dynamically-allocated storage associated with the compiled RE
                    461: pointed to by
1.5       kleink    462: .Fa preg .
1.1       jtc       463: The remaining
1.5       kleink    464: .Fa regex_t
1.1       jtc       465: is no longer a valid compiled RE
                    466: and the effect of supplying it to
1.5       kleink    467: .Fn regexec
1.1       jtc       468: or
1.5       kleink    469: .Fn regerror
1.1       jtc       470: is undefined.
1.5       kleink    471: .Pp
1.1       jtc       472: None of these functions references global variables except for tables
                    473: of constants;
                    474: all are safe for use from multiple threads if the arguments are safe.
1.23      christos  475: .Pp
                    476: The
1.24      christos  477: .Fn regnsub
1.23      christos  478: and
                    479: .Fn regasub
                    480: functions perform substitutions using
                    481: a syntax like that of
                    482: .Xr sed 1 .
                    483: They return the length of the string that would have been created
                    484: had there been enough space, or
                    485: .Dv \-1
                    486: on error, setting
                    487: .Va errno .
                    488: The result
                    489: is placed in
                    490: .Fa buf
                    491: which is user-supplied in
1.24      christos  492: .Fn regnsub
1.23      christos  493: and dynamically allocated in
                    494: .Fn regasub .
                    495: The
                    496: .Fa sub
                    497: argument contains a substitution string that may refer to the first
                    498: 9 parenthesized subexpression matches using
                    499: .Dq \e<n>
                    500: to refer to the nth matched
                    501: subexpression, or
                    502: .Dq &
                    503: (which is equivalent to
                    504: .Dq \e0 )
                    505: to refer to the full match.
                    506: The
                    507: .Fa rm
                    508: array must be at least 10 elements long, and should contain the result
                    509: of the matches from a previous
                    510: .Fn regexec
                    511: call.
1.26    ! kamil     512: Only 10 elements of the
        !           513: .Fa rm
        !           514: array can be used.
1.23      christos  515: The
                    516: .Fa str
                    517: argument contains the source string to apply the transformation to.
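To make the substitution rules concrete, the following portable sketch
re-implements the expansion of `&' and `\e<n>' described above.
It is not the library's implementation of regnsub(): the names
expand_sub() and demo_sub() are hypothetical, and the rule that a
backslash quotes a following backslash or `&' is an assumption of this
sketch.

```c
#include <regex.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for the documented behavior: expand `sub`
 * into buf using the matches in rm[0..9] against str.  Returns the
 * length the full result would have, truncating output to bufsiz.
 */
static ssize_t
expand_sub(char *buf, size_t bufsiz, const char *sub,
    const regmatch_t *rm, const char *str)
{
    size_t len = 0;
    char c;

    while ((c = *sub++) != '\0') {
        int i = -1;                 /* index into rm, -1 for a literal */

        if (c == '&')
            i = 0;                  /* whole match, same as \0 */
        else if (c == '\\' && *sub >= '0' && *sub <= '9')
            i = *sub++ - '0';
        else if (c == '\\' && (*sub == '\\' || *sub == '&'))
            c = *sub++;             /* assumed quoting rule */

        if (i < 0) {
            if (len + 1 < bufsiz)
                buf[len] = c;
            len++;
        } else if (rm[i].rm_so != -1 && rm[i].rm_eo != -1) {
            const char *s = str + rm[i].rm_so;
            size_t n = (size_t)(rm[i].rm_eo - rm[i].rm_so);

            for (size_t k = 0; k < n; k++, len++)
                if (len + 1 < bufsiz)
                    buf[len] = s[k];
        }
    }
    if (bufsiz > 0)
        buf[len < bufsiz ? len : bufsiz - 1] = '\0';
    return (ssize_t)len;
}

/* Hypothetical driver: match once, then expand into a static buffer. */
static const char *
demo_sub(const char *pattern, const char *sub, const char *str)
{
    static char out[128];
    regex_t re;
    regmatch_t rm[10];              /* at least 10 elements, as required */
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    rc = regexec(&re, str, 10, rm, 0);
    regfree(&re);
    if (rc != 0)
        return NULL;
    expand_sub(out, sizeof(out), sub, rm, str);
    return out;
}
```

Note that only the expansion of the substitution string is produced;
text of the source string outside the match is not copied.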
1.5       kleink    518: .Sh IMPLEMENTATION CHOICES
                    519: There are a number of decisions that
                    520: .St -p1003.2-92
                    521: leaves up to the implementor,
1.1       jtc       522: either by explicitly saying ``undefined'' or by virtue of them being
                    523: forbidden by the RE grammar.
                    524: This implementation treats them as follows.
1.5       kleink    525: .Pp
1.1       jtc       526: See
1.5       kleink    527: .Xr re_format 7
1.1       jtc       528: for a discussion of the definition of case-independent matching.
1.5       kleink    529: .Pp
1.1       jtc       530: There is no particular limit on the length of REs,
                    531: except insofar as memory is limited.
                    532: Memory usage is approximately linear in RE size, and largely insensitive
                    533: to RE complexity, except for bounded repetitions.
                    534: See BUGS for one short RE using them
                    535: that will run almost any system out of memory.
1.5       kleink    536: .Pp
1.1       jtc       537: A backslashed character other than one specifically given a magic meaning
1.5       kleink    538: by
                    539: .St -p1003.2-92
                    540: (such magic meanings occur only in obsolete [``basic''] REs)
1.1       jtc       541: is taken as an ordinary character.
1.5       kleink    542: .Pp
                    543: Any unmatched [ is a
                    544: .Dv REG_EBRACK
                    545: error.
                    546: .Pp
1.1       jtc       547: Equivalence classes cannot begin or end bracket-expression ranges.
                    548: The endpoint of one range cannot begin another.
1.5       kleink    549: .Pp
                    550: .Dv RE_DUP_MAX ,
                    551: the limit on repetition counts in bounded repetitions, is 255.
                    552: .Pp
1.1       jtc       553: A repetition operator (?, *, +, or bounds) cannot follow another
                    554: repetition operator.
                    555: A repetition operator cannot begin an expression or subexpression
                    556: or follow `^' or `|'.
1.5       kleink    557: .Pp
1.1       jtc       558: `|' cannot appear first or last in a (sub)expression or after another `|',
                    559: i.e. an operand of `|' cannot be an empty subexpression.
                    560: An empty parenthesized subexpression, `()', is legal and matches an
                    561: empty (sub)string.
                    562: An empty string is not a legal RE.
1.5       kleink    563: .Pp
1.1       jtc       564: A `{' followed by a digit is considered the beginning of bounds for a
                    565: bounded repetition, which must then follow the syntax for bounds.
1.19      joerg     566: A `{'
                    567: .Em not
                    568: followed by a digit is considered an ordinary character.
1.5       kleink    569: .Pp
1.1       jtc       570: `^' and `$' beginning and ending subexpressions in obsolete (``basic'')
                    571: REs are anchors, not ordinary characters.
1.5       kleink    572: .Sh DIAGNOSTICS
1.1       jtc       573: Non-zero error codes from
1.5       kleink    574: .Fn regcomp
1.1       jtc       575: and
1.5       kleink    576: .Fn regexec
1.1       jtc       577: include the following:
1.5       kleink    578: .Pp
                    579: .Bl -tag -width XXXREG_ECOLLATE -compact
                    580: .It Dv REG_NOMATCH
1.20      wiz       581: .Fn regexec
                    582: failed to match
1.5       kleink    583: .It Dv REG_BADPAT
                    584: invalid regular expression
                    585: .It Dv REG_ECOLLATE
                    586: invalid collating element
                    587: .It Dv REG_ECTYPE
                    588: invalid character class
                    589: .It Dv REG_EESCAPE
                    590: \e applied to unescapable character
                    591: .It Dv REG_ESUBREG
                    592: invalid backreference number
                    593: .It Dv REG_EBRACK
                    594: brackets [ ] not balanced
                    595: .It Dv REG_EPAREN
                    596: parentheses ( ) not balanced
                    597: .It Dv REG_EBRACE
                    598: braces { } not balanced
                    599: .It Dv REG_BADBR
                    600: invalid repetition count(s) in { }
                    601: .It Dv REG_ERANGE
                    602: invalid character range in [ ]
                    603: .It Dv REG_ESPACE
                    604: ran out of memory
                    605: .It Dv REG_BADRPT
                    606: ?, *, or + operand invalid
                    607: .It Dv REG_EMPTY
                    608: empty (sub)expression
                    609: .It Dv REG_ASSERT
                    610: ``can't happen''\(emyou found a bug
                    611: .It Dv REG_INVARG
                    612: invalid argument, e.g. negative-length string
                    613: .El
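As a minimal sketch of how the codes above are typically consumed, the C fragment below compiles a pattern and, on failure, asks regerror() for the matching message.
The helper name describe_regcomp is hypothetical and not part of the library.
The message text is implementation-defined, so only the numeric code should be compared.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): compile `pattern` as an
 * extended RE and return the regcomp() result.  On failure the
 * regerror() text for that code is written into buf.
 */
int
describe_regcomp(const char *pattern, char *buf, size_t buflen)
{
	regex_t re;
	int rc;

	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (rc != 0) {
		/* Translate the non-zero code into a readable message. */
		(void)regerror(rc, &re, buf, buflen);
		return rc;
	}
	if (buflen > 0)
		buf[0] = '\0';
	regfree(&re);
	return 0;
}
```

For example, calling describe_regcomp("(a", buf, sizeof(buf)) is expected to return REG_EPAREN, since the parenthesis is never closed.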
1.11      wiz       614: .Sh SEE ALSO
                    615: .Xr grep 1 ,
                    616: .Xr sed 1 ,
                    617: .Xr re_format 7
                    618: .Pp
                    619: .St -p1003.2-92 ,
                    620: sections 2.8 (Regular Expression Notation)
                    621: and
                    622: B.5 (C Binding for Regular Expression Matching).
1.5       kleink    623: .Sh HISTORY
1.3       cgd       624: Originally written by Henry Spencer.
1.8       perry     625: Altered for inclusion in the
                    626: .Bx 4.4
                    627: distribution.
1.23      christos  628: .Pp
                    629: The
1.24      christos  630: .Fn regnsub
1.23      christos  631: and
                    632: .Fn regasub
                    633: functions appeared in
                    634: .Nx 8 .
1.5       kleink    635: .Sh BUGS
1.1       jtc       636: There is one known functionality bug.
                    637: The implementation of internationalization is incomplete:
1.5       kleink    638: the locale is always assumed to be the default one of
                    639: .St -p1003.2-92 ,
1.1       jtc       640: and only the collating elements etc. of that locale are available.
1.5       kleink    641: .Pp
1.1       jtc       642: The back-reference code is subtle and doubts linger about its correctness
                    643: in complex cases.
1.5       kleink    644: .Pp
1.9       lukem     645: .Fn regexec
1.1       jtc       646: performance is poor.
                    647: This will improve with later releases.
1.9       lukem     648: .Fa nmatch
1.1       jtc       649: exceeding 0 is expensive;
1.5       kleink    650: .Fa nmatch
1.1       jtc       651: exceeding 1 is worse.
1.9       lukem     652: .Fn regexec
1.5       kleink    653: is largely insensitive to RE complexity
                    654: .Em except
                    655: that back references are massively expensive.
1.1       jtc       656: RE length does matter; in particular, there is a strong speed bonus
                    657: for keeping RE length under about 30 characters,
                    658: with most special characters counting roughly double.
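Given the cost of a non-zero nmatch noted above, a caller that only needs a yes/no answer can avoid requesting match offsets.
The following C sketch assumes that is the goal; the helper name re_matches is hypothetical, and any actual speed gain depends on the implementation.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): return 1 if `text` matches
 * the extended RE `pattern`, 0 if it does not, and -1 on a compile
 * or match error.
 */
int
re_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only success or failure is wanted, no offsets. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	/* nmatch is 0 and pmatch is NULL, so no offsets are recorded. */
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	if (rc == 0)
		return 1;
	return rc == REG_NOMATCH ? 0 : -1;
}
```

A caller that needs subexpression offsets would instead drop REG_NOSUB and pass a regmatch_t array with the smallest nmatch that covers the groups it actually reads.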
1.5       kleink    659: .Pp
1.9       lukem     660: .Fn regcomp
1.1       jtc       661: implements bounded repetitions by macro expansion,
                    662: which is costly in time and space if counts are large
                    663: or bounded repetitions are nested.
                    664: An RE like, say,
                    665: `((((a{1,100}){1,100}){1,100}){1,100}){1,100}'
                    666: will (eventually) run almost any existing machine out of swap space.
1.5       kleink    667: .Pp
1.1       jtc       668: There are suspected problems with response to obscure error conditions.
                    669: Notably,
                    670: certain kinds of internal overflow,
                    671: produced only by truly enormous REs or by multiply nested bounded repetitions,
                    672: are probably not handled well.
1.5       kleink    673: .Pp
                    674: Due to a mistake in
                    675: .St -p1003.2-92 ,
                    676: things like `a)b' are legal REs because `)' is a special character
1.14      wiz       677: only in the presence of a previous unmatched `('.
                    678: This can't be fixed until the spec is fixed.
1.5       kleink    679: .Pp
1.1       jtc       680: The standard's definition of back references is vague.
                    681: For example, does
                    682: `a\e(\e(b\e)*\e2\e)*d' match `abbbd'?
1.5       kleink    683: Until the standard is clarified, behavior in such cases should not be
                    684: relied on.
                    685: .Pp
1.1       jtc       686: The implementation of word-boundary matching is a bit of a kludge,
                    687: and bugs may lurk in combinations of word-boundary matching and anchoring.
