ertical.
=head3 Unicode classes
C<\pP> (where C<P> is a single letter) and C<\p{Property}> are used to
match a character that matches the given Unicode property; properties
include things like "letter", or "Thai character". Capitalizing the
sequence to C<\PP> and C<\P{Property}> makes the sequence match a character
that doesn't match the given Unicode property. For more details, see
L<perlrecharclass/Backslash sequences> and
L<perlunicode/Unicode Character Properties>.
Mnemonic: I<p>roperty.
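As a rough sketch of these sequences (the sample characters and the
C<@results> variable are illustrative assumptions, not part of the
description above), the following checks a few single characters against
the C<L> (letter) and C<Thai> properties:

 use strict;
 use warnings;

 my $thai = "\x{0E01}";    # THAI CHARACTER KO KAI

 my @results = (
     "a"   =~ /^\pL$/      ? 1 : 0,    # short form: "a" is a letter
     "a"   =~ /^\p{L}$/    ? 1 : 0,    # braced form of the same property
     "7"   =~ /^\PL$/      ? 1 : 0,    # negated: "7" is not a letter
     $thai =~ /^\p{Thai}$/ ? 1 : 0,    # a character in the Thai script
     "a"   =~ /^\p{Thai}$/ ? 1 : 0,    # "a" is not a Thai character
 );
 print "@results\n";    # prints "1 1 1 1 0"

The short form C<\pL> only works for single-letter property names; anything
longer, such as C<Thai>, needs the braces.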
=head2 Referencing
If capturing parentheses are used in a regular expression, we can refer
back to the part of the source string that a group matched, and require
exactly the same text to match again. There are three ways of referring to
such I<backreferences>: absolutely, relatively, and by name.
=for later add link to perlrecapture
=head3 Absolute referencing
Either C<\gI<N>> (starting in Perl 5.10.0), or C<\I<N>> (old-style), where I<N>
is a positive (unsigned) decimal number of any length, is an absolute reference
to a capturing group.
I<N> refers to the I<N>th set of capturing parentheses, counted by opening
parenthesis from left to right, so C<\gI<N>> refers to whatever has
been matched by that set of parentheses. Thus C<\g1> refers to the first
capture group in the regex.
The C<\gI<N>> form can be equivalently written as C<\g{I<N>}>,
which avoids ambiguity when building a regex by concatenating shorter
strings. Otherwise, if you had a regex C<qr/$a$b/>, and C<$a> contained
C<"\g1">, and C<$b> contained C<"37">, you would get C</\g137/>, which
refers to group 137 and is probably not what you intended.
In the C<\I<N>> form, I<N> must not begin with a "0", and there must be at
least I<N> capturing groups, or else I<N> is considered an octal escape
(but something like C<\18> is the same as C<\0018>; that is, the octal escape
C<"\001"> followed by a literal digit C<"8">).
Mnemonic: I<g>roup.
=head4 Examples
 /(\w+) \g1/;      # Finds a duplicated word (e.g. "cat cat").
 /(\w+) \1/;       # Same thing; written old-style.
 /(.)(.)\g2\g1/;   # Matches a four-letter palindrome (e.g. "ABBA").
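The concatenation problem described above can be sketched as follows. The
variable names (C<$ref>, C<$digits>, C<$re>) and the test strings are
assumptions made up for this illustration:

 use strict;
 use warnings;

 # Duplicated-word detection, new-style and old-style.
 my $new = "cat cat" =~ /(\w+) \g1/ ? 1 : 0;
 my $old = "cat cat" =~ /(\w+) \1/  ? 1 : 0;

 # Building a pattern from pieces: the braces keep the group number
 # separate from the literal digits that follow it.
 my $ref    = '\g{1}';
 my $digits = '37';
 my $re     = qr/(.)$ref$digits/;    # same as qr/(.)\g{1}37/
 my $built  = "xx37" =~ $re ? 1 : 0;

 print "$new $old $built\n";    # prints "1 1 1"

Had C<$ref> been C<'\g1'> instead, the assembled pattern would contain
C<\g137>, a reference to a group that does not exist here.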
=head3 Relative referencing
C<\g-I<N>> (starting in Perl 5.10.0) is used for relative addressing. (It can
be written as C<\g{-I<N>}>.) It refers to the I<N>th group before the
C<\g{-I<N>}>, counting opening parentheses backwards from the reference.
The big advantage of this form is that it makes it much easier to write
patterns with references that can be interpolated in larger patterns,
even if the larger pattern also contains capture groups.
=head4 Examples
 /(A)        # Group 1
  (          # Group 2
    (B)      # Group 3
    \g{-1}   # Refers to group 3 (B)
    \g{-3}   # Refers to group 1 (A)
  )
 /x;         # Matches "ABBA".

 my $qr = qr/(.)(.)\g{-2}\g{-1}/;   # Matches 'abab', 'cdcd', etc.
 /$qr$qr/                           # Matches 'ababcdcd'.
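To make the interpolation advantage concrete, here is a small sketch that
embeds the same "doubled character" fragment in a larger pattern, once with
a relative and once with an absolute reference. The fragment names
(C<$relative>, C<$absolute>) and the input strings are hypothetical:

 use strict;
 use warnings;

 my $relative = qr/(.)\g{-1}/;    # refers to the fragment's own group
 my $absolute = qr/(.)\g1/;       # refers to group 1 of the final pattern

 # Inside the larger pattern, the fragment's group becomes group 2.
 my $rel_ok = "x-aa" =~ /^(\w)-$relative$/ ? 1 : 0;    # \g{-1} is still "a"
 my $abs_ok = "x-aa" =~ /^(\w)-$absolute$/ ? 1 : 0;    # \g1 is now "x"

 # The 'ababcdcd' example from above.
 my $qr    = qr/(.)(.)\g{-2}\g{-1}/;
 my $twice = "ababcdcd" =~ /^$qr$qr$/ ? 1 : 0;

 print "$rel_ok $abs_ok $twice\n";    # prints "1 0 1"

The absolute version fails because the extra leading group shifts the
numbering, while the relative version is unaffected.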
=head3 Named referencing
C<\g{I<name>}> (starting in Perl 5.10.0) can be used to refer back to a
named capture group, dispensing completely with having to think about capture
group positions.
To be compatible with .NET regular expressions, C<\g{name}> may also be
written as C<\k{name}>, C<< \k<name> >> or C<\k'name'>.
To prevent any ambiguity, I<name> must not start with a digit nor contain a
hyphen.
=head4 Examples
 /(?<word>\w+) \g{word}/ # Finds a duplicated word (e.g. "cat cat")
 /(?<word>\w+) \k{word}/ # Same.
 /(?<word>\w+) \k<word>/ # Same.
 /(?<letter1>.)(?<letter2>.)\g{letter2}\g{letter1}/
                         # Matches a four-letter palindrome (e.g. "ABBA")
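A short sketch that tries all four spellings against the same input may
help; the C<@forms> array, C<$text>, and the other variable names are
assumptions introduced for this example only:

 use strict;
 use warnings;

 my $text  = "cat cat";
 my @forms = (
     qr/(?<word>\w+) \g{word}/,
     qr/(?<word>\w+) \k{word}/,
     qr/(?<word>\w+) \k<word>/,
     qr/(?<word>\w+) \k'word'/,
 );

 # In scalar context, grep returns how many of the forms matched.
 my $hits = grep { $text =~ $_ } @forms;

 # After a successful match, the named capture is available in %+.
 my $word = $text =~ $forms[0] ? $+{word} : "";

 print "$hits $word\n";    # prints "4 cat"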
=head2 Assertions
Assertions are conditions that have to be true at the current position;
they don't actually match (consume) any part of the string. There are six
assertions that are written as backslash sequences.
=over 4
=item \A
C<\A> only matches at the beginning of the string. If the C</m> modifier
isn't used, then C</\A/> is equivalent to C</^/>. However, if the C</m>
modifier is used, then C</^/> also matches after internal newlines, but the
meaning of C</\A/> isn't changed by the C</m> modifier. C<\A> matches at the
beginning of the string regardless of whether the C</m> modifier is used.
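The difference can be seen by counting matches on a two-line string. This
is only a sketch; C<$text> and the counter names are made up for the
illustration:

 use strict;
 use warnings;

 my $text = "first\nsecond\n";

 # The "= () =" idiom counts the matches found by /g.
 my $caret_plain = () = $text =~ /^\w+/g;      # ^ matches only at the start
 my $caret_multi = () = $text =~ /^\w+/mg;     # ^ also matches after "\n"
 my $a_multi     = () = $text =~ /\A\w+/mg;    # \A is unaffected by /m

 print "$caret_plain $caret_multi $a_multi\n";    # prints "1 2 1"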
=item \z, \Z
C<\z> and C<\Z> match at the end of the string. If the C</m> modifier isn't
used, then C</\Z/> is equivalent to C</$/>; that is, it matches at the
end of the string, or just before the newline at the end of the string. If the
C</m> modifier is used, then C</$/> also matches before internal newlines, but
the meaning of C</\Z/> isn't changed by the C</m> modifier. C<\Z> matches at
the end of the string (or just before a trailing newline) regardless of whether
the C</m> modifier is used.
C<\z> is just like C<\Z>, except that it does not match before a trailing
newline. C<\z> matches at the end of the string only, regardless of the
modifiers used, and not just before a newline. It is how to anchor the
match to the true end of the string under all conditions.
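The following sketch compares the three end anchors on a string with a
trailing newline and on one with an internal newline. The variable names
and sample strings are assumptions chosen for the example:

 use strict;
 use warnings;

 my $with_nl = "line\n";

 my @results = (
     $with_nl =~ /line$/     ? 1 : 0,    # $ matches before the trailing "\n"
     $with_nl =~ /line\Z/    ? 1 : 0,    # so does \Z
     $with_nl =~ /line\z/    ? 1 : 0,    # \z needs the true end of the string
     $with_nl =~ /line\n\z/  ? 1 : 0,    # newline consumed, then the true end
     "a\nb"   =~ /a$/m       ? 1 : 0,    # under /m, $ matches before any "\n"
     "a\nb"   =~ /a\Z/m      ? 1 : 0,    # \Z is unaffected by /m
 );
 print "@results\n";    # prints "1 1 0 1 1 0"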
=item \G
C<\G> is usually used only in combination with the C</g> modifier. If the
C</g> modifier is used and the match is done in scalar context, Perl
remembers where in the source string the last match ended, and the next
match attempt starts from that position.
C<\G> matches the point where the previous match on that string ended,
or the beginning of that string if there was no previous match.
=for later add link to perlremodifiers
Mnemonic: I<G>lobal.
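As a minimal sketch of the effect of C<\G> (the string C<$s> and the two
counters are hypothetical), compare repeated scalar-context matches with
and without the anchor:

 use strict;
 use warnings;

 my $s = "aab_aac";

 # With \G, each match must begin exactly where the previous one ended,
 # so the loop stops as soon as the next character is not an "a".
 my $anchored = 0;
 $anchored++ while $s =~ /\Ga/g;

 # Without \G, the engine may skip ahead to find the next "a".
 my $free = 0;
 $free++ while $s =~ /a/g;

 print "$anchored $free\n";    # prints "2 4"

Each loop ends on a failed match, which resets the remembered position, so
the second loop starts again from the beginning of the string.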
=item \b{}, \b, \B{}, \B
C<\b{...}>, available starting in v5.22, matches a boundary (between two
characters, or before the first character of the string, or after the
final character of the string) based on the Unicode rules for the
boundary type specified inside the braces. The boundary
types are given a few paragraphs below. C<\B{...}> matches at any place
between characters where C<\b{...}> of the same type doesn't match.
C<\b> when not immediately followed by a C<"{"> matches at any place
between a word character (something matched by C<\w>) and a non-word
character (C<\W>); C<\B> when not immediately followed by a C<"{"> matches
at any place between characters where C<\b> doesn't match. To get better
word matching of natural language text, see L</\b{wb}> below.
C<\b>
and C<\B> assume there's a non-word character before the beginning and after
the end of the source string; so C<\b> will match at the beginning (or end)
of the source string if the source string begins (or ends) with a word
character. Otherwise, C<\B> will match.
Do not use something like C<\b=head\d\b> and expect it to match the
beginning of a line. It can't, because for there to be a boundary before
the non-word "=", there must be a word character immediately previous.
All plain C<\b> and C<\B> boundary determinations look for word
characters alone, not for
non-word characters nor for string ends. It may help to understand how
C<\b> and C<\B> work by equating them as follows:

    \b really means (?:(?<=\w)(?!\w)|(?<!\w)(?=\w))
    \B really means (?:(?<=\w)(?=\w)|(?<!\w)(?!\w))

In contrast, C<\b{...}> and C<\B{...}> may or may not match at the
beginning and end of the line, depending on the boundary type. These
implement the Unicode default boundaries, specified in
L<http://www.unicode.org/reports/tr14/> and
L<http://www.unicode.org/reports/tr29/>.
The boundary types are:
=over
=item C<\b{gcb}> or C<\b{g}>
This matches a Unicode "Grapheme Cluster Boundary". (Actually Perl
always uses the improved "extended" grapheme cluster.) These are
explained below under L