
library(regex)

Interface to POSIX regular expression handling

Predicates

compile_pattern(+Pattern, +Options, -CompiledPattern)
Precompile a pattern for repeated use
match(+Pattern, +String)
A substring of String matches the regular expression Pattern
match(+Pattern, +String, +Options)
A substring of String matches the regular expression Pattern, subject to Options
match(+Pattern, +String, +Options, -Match)
Match is the first substring of String that matches the regular expression Pattern
matchall(+Pattern, +String, +Options, -AllMatches)
AllMatches is a list of substrings of String which match the regular expression Pattern
matchsub(+Pattern, +String, +Options, -SubMatches)
A substring of String matches the regular expression Pattern, and SubMatches is the list of substrings matched by its parenthesized subexpressions
split(+Pattern, +String, +Options, -Parts)
Parts is a list of substrings, partitioning String according to Pattern
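On Unix systems the library calls the POSIX regular expression functions of the C library. The idea behind compile_pattern/3 and match/4 can therefore be sketched directly in C: compile the pattern once with regcomp(), run regexec() against as many strings as needed, and copy out the first match. This sketches the underlying C API under that assumption. It is not the library's own code, and first_match() is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Copy the first substring of `subject` matched by the already compiled
 * pattern `re` into `out` (truncated to outsz-1 chars, NUL-terminated).
 * Returns 1 on a match, 0 if there is no match. */
static int first_match(const regex_t *re, const char *subject,
                       char *out, size_t outsz)
{
    regmatch_t m;   /* offsets of the whole match */

    assert(re != NULL && subject != NULL && out != NULL && outsz > 0);
    if (regexec(re, subject, 1, &m, 0) != 0)
        return 0;                               /* REG_NOMATCH */
    size_t len = (size_t)(m.rm_eo - m.rm_so);
    if (len >= outsz)
        len = outsz - 1;                        /* truncate to fit */
    memcpy(out, subject + m.rm_so, len);
    out[len] = '\0';
    return 1;
}
```

With [0-9]+ compiled once (using REG_EXTENDED), first_match() yields "123" for "abc 123 def 45" and reports no match for "no digits". regfree() releases the compiled pattern when it is no longer needed.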

Description

This library implements an ECLiPSe API for POSIX 1003.2 regular expressions. On Unix systems it calls the regular expression functions from the standard library; on Windows it uses Henry Spencer's regex library, version 3.8.

Regular Expressions

This is just a very brief summary of the essentials. For details of regular expressions see any POSIX regex(7) man page. Two types of regular expressions are supported:
Extended Regular Expressions (the default)
These are described below and correspond essentially to those understood by the UNIX egrep command.
Basic Regular Expressions
These correspond essentially to those in the UNIX ed editor or the grep command, and are mostly obsolete.
Note that this default differs from that of the POSIX 1003.2 C API, where a pattern is interpreted as a Basic Regular Expression unless the REG_EXTENDED flag is given.

Characters

Every character matches itself, except for the characters ^.[$()|*+?{\ which have a special meaning and must be escaped with a \ to be matched literally. Since the ECLiPSe parser already interprets backslashes in strings, you have to double the backslash in your ECLiPSe source: for example, write "a\\.b" to match the literal text a.b.
.
Matches any character
[aeiou]
Matches any of the characters between the brackets
[^aeiou]
Matches any character except those listed
[a-z0-9]
Matches any character in the given ranges

Anchors

^
Matches at the beginning of the string (with the newline option, also just after a newline)
$
Matches at the end of the string (with the newline option, also just before a newline)

Repetition

?
Matches the preceding element 0 or 1 times
*
Matches the preceding element 0 or more times
+
Matches the preceding element 1 or more times
{3}
Matches the preceding element 3 times
{1,3}
Matches the preceding element 1 to 3 times
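Repetition operators interact with the leftmost-longest rule of POSIX matching: the match that starts earliest wins, and among those, the longest. The C sketch below reports the start and end offsets of the match; match_span() is a hypothetical helper. For example, a{1,3} matched against "aaaaa" covers offsets 0 to 3, i.e. the first three characters.

```c
#include <assert.h>
#include <regex.h>

/* Find the leftmost-longest match of the Extended RE `pattern` in
 * `subject` and store its start and end offsets in *start and *end.
 * Returns 1 on a match, 0 on no match, -1 if the pattern is invalid. */
static int match_span(const char *pattern, const char *subject,
                      int *start, int *end)
{
    regex_t re;
    regmatch_t m;

    assert(pattern && subject && start && end);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, subject, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *start = (int)m.rm_so;
    *end = (int)m.rm_eo;
    return 1;
}
```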

Grouping

(subexpr)
Matches the parenthesized expression as a unit. Grouping is used to apply a repetition operator to a whole subexpression, and to mark subexpressions whose matches are to be captured and returned
(one|two|three)
Matches any of the alternative expressions
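In the C API, the text captured by each parenthesized subexpression is reported in an array of regmatch_t offsets, where element 0 is the whole match. This is presumably what matchsub/4 returns as SubMatches, although the library's implementation is not described here. In this C sketch, capture() is a hypothetical helper. A subexpression that did not take part in the match, such as (x) when (x)|(y) matches "y", has offset -1.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 10   /* whole match plus up to 9 subexpressions */

/* Match the Extended RE `pattern` against `subject` and copy the text of
 * subexpression number `group` (1 = first "(") into `out`. Returns 1 on
 * success, 0 if there is no match or the group did not participate,
 * -1 if the pattern is invalid or `group` is out of range. */
static int capture(const char *pattern, const char *subject, int group,
                   char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];

    assert(pattern && subject && out && outsz > 0);
    if (group < 0 || group >= MAX_GROUPS)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, subject, MAX_GROUPS, m, 0);
    regfree(&re);
    if (rc != 0 || m[group].rm_so == -1)
        return 0;
    size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
    if (len >= outsz)
        len = outsz - 1;                        /* truncate to fit */
    memcpy(out, subject + m[group].rm_so, len);
    out[len] = '\0';
    return 1;
}
```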

Options

Most of the predicates in this library accept a list of options. The accepted options are:
basic
Interpret the pattern as a Basic Regular Expression, rather than the default Extended Regular Expression.
extended
Interpret the pattern as an Extended Regular Expression (this flag is redundant since this is the default).
icase
Ignore case when matching.
newline
Treat newlines specially instead of as ordinary characters: ^ also matches just after a newline, $ also matches just before a newline, and neither . nor a non-matching list such as [^a] matches a newline. By default, newlines are treated as ordinary characters.
notbol
Don't interpret the beginning of the string as the beginning of a line, i.e. don't let ^ match there.
noteol
Don't interpret the end of the string as the end of a line, i.e. don't let $ match there.

Shortcomings

  1. Due to limitations of the underlying implementation, the predicates in this library do not handle embedded NUL characters in strings correctly (they are interpreted as the end of the string).
  2. POSIX regular expressions have no notion of "noncapturing parentheses", i.e. parentheses that are only used for grouping, not for indicating that one wants to capture the matching substring.
  3. In an environment like ECLiPSe, one would like to be able to do things like
    	?- ideal_match("(/[^/]*)+", "/usr/local/eclipse", L).
    	L = ["/usr", "/local", "/eclipse"]
    	Yes
        
    i.e. capture every instance of a matching subexpression. A single POSIX match cannot do this: when a parenthesized subexpression matches several times, POSIX reports only its last match (here, "/eclipse").
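A common workaround is to match the unrepeated pattern /[^/]* repeatedly, resuming each search just after the previous match. This is presumably close to what matchall/4 does, although its implementation is not described here. Below is a C sketch using the POSIX functions; all_matches() and PART_LEN are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define PART_LEN 32   /* size of each result buffer, including the NUL */

/* Collect up to `max` successive non-overlapping matches of the Extended
 * RE `pattern` in `subject` into parts[0..], truncating long matches.
 * Returns the number of matches found, or -1 if the pattern is invalid. */
static int all_matches(const char *pattern, const char *subject,
                       char parts[][PART_LEN], int max)
{
    regex_t re;
    regmatch_t m;
    const char *p = subject;   /* where the next search starts */
    int n = 0;
    int eflags = 0;

    assert(pattern && subject && parts);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (n < max && regexec(&re, p, 1, &m, eflags) == 0) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);
        if (len >= PART_LEN)
            len = PART_LEN - 1;
        memcpy(parts[n], p + m.rm_so, len);
        parts[n][len] = '\0';
        n++;
        if (m.rm_eo > m.rm_so) {
            p += m.rm_eo;              /* resume right after the match */
        } else {
            if (p[m.rm_eo] == '\0')
                break;                 /* empty match at end of string */
            p += m.rm_eo + 1;          /* step past an empty match */
        }
        eflags = REG_NOTBOL;           /* later searches start mid-string */
    }
    regfree(&re);
    return n;
}
```

For "/usr/local/eclipse" this yields "/usr", "/local" and "/eclipse". REG_NOTBOL is passed for the later searches so that ^ cannot match in the middle of the string.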

About


Generated from regex.eci on 2022-09-03 14:26