_uvchr>,
but are used for UTF-8 encoded strings. The two forms are different names for
the same thing. Each call to one of these classifies the first character of
the string starting at C<s>. The second parameter, C<e>, points to anywhere in
the string beyond the first character, up to one byte past the end of the
entire string. Although both variants are identical, the suffix C<_safe> in
one name emphasizes that it will not attempt to read beyond S<C<e - 1>>,
provided that the constraint S<C<s E<lt> e>> is true (this is asserted in
C<-DDEBUGGING> builds). If the UTF-8 for the input character is malformed in
some way, the program may croak, or the function may return FALSE, at the
discretion of the implementation, and subject to change in future releases.
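The bounds rule for the C<_safe> forms can be illustrated with a minimal
standalone sketch. It is not Perl's implementation: C<sketch_utf8_len> and
C<sketch_utf8_char_in_bounds> are hypothetical helpers, and the sketch assumes
an ASCII platform and standard UTF-8 (code points up to U+10FFFF). It only
checks that the first character lies wholly within the bytes from C<s> up to,
but not including, C<e>; a real classifier would go on to decode and classify
the character. Returning false on malformed input is just one of the behaviors
the text above allows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a Perl API): number of bytes that a UTF-8 start
 * byte announces for its character, or 0 if the byte cannot start one. */
static size_t sketch_utf8_len(unsigned char b)
{
    if (b < 0x80)               return 1;  /* ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;                              /* continuation or invalid byte */
}

/* Hypothetical helper: true if the first character of the string starting at
 * s lies entirely in [s, e).  No byte at or beyond e is ever read.  This is a
 * structural check only; it does not reject overlong encodings. */
static bool sketch_utf8_char_in_bounds(const unsigned char *s,
                                       const unsigned char *e)
{
    size_t len, i;

    assert(s < e);              /* the documented precondition */

    len = sketch_utf8_len(s[0]);
    if (len == 0 || (size_t)(e - s) < len)
        return false;           /* malformed or truncated input */

    for (i = 1; i < len; i++)   /* i < len <= e - s, so s[i] is in bounds */
        if ((s[i] & 0xC0) != 0x80)
            return false;       /* not a continuation byte */

    return true;
}
```

Checking the announced length against S<C<e - s>> before touching any
continuation byte is what keeps every read at or below S<C<e - 1>>.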
The C<_LC> variant is like the C<_A> and C<_L1> variants,
but the result is based on the current locale, which is what C<LC> in the name
stands for. If Perl can determine that the current locale is a UTF-8 locale,
it uses the published Unicode rules; otherwise, it uses the C library function
that gives the named classification. For example, C<isDIGIT_LC()> when not in
a UTF-8 locale returns the result of calling C<isdigit()>. FALSE is always
returned if the input won't fit into an octet. On some platforms where the C
library function is known to be defective, Perl changes its result to follow
the POSIX standard's rules.
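As a rough illustration of the fallback just described, the sketch below
models only the branch taken when the locale is not a UTF-8 one.
C<sketch_is_digit_lc> is a hypothetical name, not a Perl macro, and the sketch
leaves out both the UTF-8 locale branch and any platform-specific corrections.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical stand-in for the non-UTF-8-locale branch of a digit check in
 * the style of the _LC variants.  cp is the input code point. */
static bool sketch_is_digit_lc(unsigned long cp)
{
    if (cp > 0xFF)
        return false;   /* input does not fit into an octet */

    /* Defer to the C library, which consults the current locale.  The cast
     * keeps the argument in the range the <ctype.h> functions require. */
    return isdigit((unsigned char) cp) != 0;
}
```

The range test comes first so that the C library function is never handed a
value outside what it is defined for.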
The C<_LC_uvchr> variant acts exactly like the C<_LC> variant for inputs less
than 256, but for larger ones it returns the Unicode classification of the code
point.
The C<_LC_utf8> and C<_LC_utf8_safe> variants are like the
C<_LC_uvchr> variant, but are used for UTF-8 encoded strings. The two forms
are different names for the same thing. Each call to one of these classifies
the first character of the string starting at C<s>. The second parameter,
C<e>, points to anywhere in the string beyond the first character, up to one
byte past the end of the entire string. Although both variants are identical,
the suffix C<_safe> in one name emphasizes that it will not attempt to read
beyond S<C<e - 1>>, provided that the constraint S<C<s E<lt> e>> is true (this
is asserted in C<-DDEBUGGING> builds). If the UTF-8 for the input
character is malformed in some way, the program may croak, or the function may
return FALSE, at the discretion of the implementation, and subject to change in
future releases.
=for apidoc Am|bool|isALPHA|int ch
Returns a boolean indicating whether the specified input is one of C<[A-Za-z]>,
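analogous to C<m/[[:alpha:]]/>.
See the L<top of this section|/Character classification> for an explanation of
the variants
C<isALPHA_A>, C<isALPHA_L1>, C<isALPHA_uvchr>, C<isALPHA_utf8>,
C<isALPHA_utf8_safe>, C<isALPHA_LC>, C<isALPHA_LC_uvchr>, C<isALPHA_LC_utf8>,
and C<isALPHA_LC_utf8_safe>.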
=cut
Here and below, we add the prototypes of these macros for downstream programs
that would be interested in them, such as Devel::PPPort.
=for apidoc Amh|bool|isALPHA_A|int ch
=for apidoc Amh|bool|isALPHA_L1|int ch
=for apidoc Amh|bool|isALPHA_uvchr|int ch
=for apidoc Amh|bool|isALPHA_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isALPHA_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isALPHA_LC|int ch
=for apidoc Amh|bool|isALPHA_LC_uvchr|int ch
=for apidoc Amh|bool|isALPHA_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isALPHANUMERIC|int ch
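Returns a boolean indicating whether the specified character is one of
C<[A-Za-z0-9]>, analogous to C<m/[[:alnum:]]/>.
See the L<top of this section|/Character classification> for an explanation of
the variants
C<isALPHANUMERIC_A>, C<isALPHANUMERIC_L1>, C<isALPHANUMERIC_uvchr>,
C<isALPHANUMERIC_utf8>, C<isALPHANUMERIC_utf8_safe>, C<isALPHANUMERIC_LC>,
C<isALPHANUMERIC_LC_uvchr>, C<isALPHANUMERIC_LC_utf8>, and
C<isALPHANUMERIC_LC_utf8_safe>.
A (discouraged from use) synonym is C<isALNUMC> (where the C<C> suffix means
this corresponds to the C language alphanumeric definition). Also
there are the variants
C<isALNUMC_A>, C<isALNUMC_L1>,
C<isALNUMC_LC>, and C<isALNUMC_LC_uvchr>.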
=for apidoc Amh|bool|isALPHANUMERIC_A|int ch
=for apidoc Amh|bool|isALPHANUMERIC_L1|int ch
=for apidoc Amh|bool|isALPHANUMERIC_uvchr|int ch
=for apidoc Amh|bool|isALPHANUMERIC_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isALPHANUMERIC_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isALPHANUMERIC_LC|int ch
=for apidoc Amh|bool|isALPHANUMERIC_LC_uvchr|int ch
=for apidoc Amh|bool|isALPHANUMERIC_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Amh|bool|isALNUMC|int ch
=for apidoc Amh|bool|isALNUMC_A|int ch
=for apidoc Amh|bool|isALNUMC_L1|int ch
=for apidoc Amh|bool|isALNUMC_LC|int ch
=for apidoc Amh|bool|isALNUMC_LC_uvchr|int ch
=for apidoc Am|bool|isASCII|int ch
Returns a boolean indicating whether the specified character is one of the 128
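characters in the ASCII character set, analogous to C<m/[[:ascii:]]/>.
On non-ASCII platforms, it returns TRUE iff this
character corresponds to an ASCII character. Variants C<isASCII_A()> and
C<isASCII_L1()> are identical to C<isASCII()>.
See the L<top of this section|/Character classification> for an explanation of
the variants
C<isASCII_uvchr>, C<isASCII_utf8>, C<isASCII_utf8_safe>, C<isASCII_LC>,
C<isASCII_LC_uvchr>, C<isASCII_LC_utf8>, and C<isASCII_LC_utf8_safe>.
Note, however, that some platforms do not have the C library routine
C<isascii()>. In these cases, the variants whose names contain C<LC> are the
same as the corresponding ones without.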
=for apidoc Amh|bool|isASCII_A|int ch
=for apidoc Amh|bool|isASCII_L1|int ch
=for apidoc Amh|bool|isASCII_uvchr|int ch
=for apidoc Amh|bool|isASCII_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isASCII_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isASCII_LC|int ch
=for apidoc Amh|bool|isASCII_LC_uvchr|int ch
=for apidoc Amh|bool|isASCII_LC_utf8_safe|U8 * s| U8 *end
Also note that, because all ASCII characters are UTF-8 invariant (meaning they
have the exact same representation (always a single byte) whether encoded in
UTF-8 or not), C<isASCII> will give the correct results when called with any
byte in any string, whether or not it is encoded in UTF-8. Similarly,
C<isASCII_utf8> and C<isASCII_utf8_safe> will work properly on any string,
whether or not it is encoded in UTF-8.
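The invariance argument can be checked with a small standalone sketch. The
helper names are hypothetical and the byte test assumes an ASCII platform; it
is not how Perl defines the macro. The same word is stored once as UTF-8 and
once as Latin-1, and a byte-by-byte ASCII test finds the same three ASCII
characters in both.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: ASCII test on a single byte (ASCII platforms only). */
static bool sketch_is_ascii_byte(unsigned char b)
{
    return b < 0x80;
}

/* Counts the ASCII characters in a buffer by looking at each byte alone.
 * In UTF-8, every byte of a multi-byte character is >= 0x80, so no such
 * byte can be mistaken for an ASCII character. */
static size_t sketch_count_ascii(const unsigned char *buf, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        if (sketch_is_ascii_byte(buf[i]))
            n++;
    return n;
}
```

Because the test never needs to know where a character starts, the caller does
not have to track whether the buffer is UTF-8 encoded.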
=for apidoc Am|bool|isBLANK|char ch
Returns a boolean indicating whether the specified character is a
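character considered to be a blank, analogous to C<m/[[:blank:]]/>.
See the L<top of this section|/Character classification> for an explanation of
the variants
C<isBLANK_A>, C<isBLANK_L1>, C<isBLANK_uvchr>, C<isBLANK_utf8>,
C<isBLANK_utf8_safe>, C<isBLANK_LC>, C<isBLANK_LC_uvchr>, C<isBLANK_LC_utf8>,
and C<isBLANK_LC_utf8_safe>. Note,
however, that some platforms do not have the C library routine
C<isblank()>. In these cases, the variants whose names contain C<LC> are
the same as the corresponding ones without.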
=for apidoc Amh|bool|isBLANK_A|int ch
=for apidoc Amh|bool|isBLANK_L1|int ch
=for apidoc Amh|bool|isBLANK_uvchr|int ch
=for apidoc Amh|bool|isBLANK_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isBLANK_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isBLANK_LC|int ch
=for apidoc Amh|bool|isBLANK_LC_uvchr|int ch
=for apidoc Amh|bool|isBLANK_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isCNTRL|char ch
Returns a boolean indicating whether the specified character is a
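control character, analogous to C<m/[[:cntrl:]]/>.
See the L<top of this section|/Character classification> for an explanation of
the variants
C<isCNTRL_A>, C<isCNTRL_L1>, C<isCNTRL_uvchr>, C<isCNTRL_utf8>,
C<isCNTRL_utf8_safe>, C<isCNTRL_LC>, C<isCNTRL_LC_uvchr>, C<isCNTRL_LC_utf8>,
and C<isCNTRL_LC_utf8_safe>. On EBCDIC
platforms, you almost always want to use the C<isCNTRL_L1> variant.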
=for apidoc Amh|bool|isCNTRL_A|int ch
=for apidoc Amh|bool|isCNTRL_L1|int ch
=for apidoc Amh|bool|isCNTRL_uvchr|int ch
=for apidoc Amh|bool|isCNTRL_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isCNTRL_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isCNTRL_LC|int ch
=for apidoc Amh|bool|isCNTRL_LC_uvchr|int ch
=for apidoc Amh|bool|isCNTRL_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isDIGIT|char ch
Returns a boolean indicating whether the specified character is a
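digit, analogous to C<m/[[:digit:]]/>.
Variants C<isDIGIT_A> and C<isDIGIT_L1> are identical to C<isDIGIT>.
See the L<top of this section|/Character classification> for an explanation of
the variants
C<isDIGIT_uvchr>, C<isDIGIT_utf8>, C<isDIGIT_utf8_safe>, C<isDIGIT_LC>,
C<isDIGIT_LC_uvchr>, C<isDIGIT_LC_utf8>, and C<isDIGIT_LC_utf8_safe>.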
=for apidoc Amh|bool|isDIGIT_A|int ch
=for apidoc Amh|bool|isDIGIT_L1|int ch
=for apidoc Amh|bool|isDIGIT_uvchr|int ch
=for apidoc Amh|bool|isDIGIT_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isDIGIT_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isDIGIT_LC|int ch
=for apidoc Amh|bool|isDIGIT_LC_uvchr|int ch
=for apidoc Amh|bool|isDIGIT_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isGRAPH|char ch
Returns a boolean indicating whether the specified character is a
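graphic character, analogous to C<m/[[:graph:]]/>.
See the L<top of this section|/Character classification> for an explanation of
the variants C<isGRAPH_A>, C<isGRAPH_L1>, C<isGRAPH_uvchr>, C<isGRAPH_utf8>,
C<isGRAPH_utf8_safe>, C<isGRAPH_LC>, C<isGRAPH_LC_uvchr>,
C<isGRAPH_LC_utf8>, and C<isGRAPH_LC_utf8_safe>.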
=for apidoc Amh|bool|isGRAPH_A|int ch
=for apidoc Amh|bool|isGRAPH_L1|int ch
=for apidoc Amh|bool|isGRAPH_uvchr|int ch
=for apidoc Amh|bool|isGRAPH_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isGRAPH_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isGRAPH_LC|int ch
=for apidoc Amh|bool|isGRAPH_LC_uvchr|int ch
=for apidoc Amh|bool|isGRAPH_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isLOWER|char ch
Returns a boolean indicating whether the specified character is a
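lowercase character, analogous to C<m/[[:lower:]]/>.
See the L<top of this section|/Character classification> for an explanation of
the variants
C<isLOWER_A>, C<isLOWER_L1>, C<isLOWER_uvchr>, C<isLOWER_utf8>,
C<isLOWER_utf8_safe>, C<isLOWER_LC>, C<isLOWER_LC_uvchr>, C<isLOWER_LC_utf8>,
and C<isLOWER_LC_utf8_safe>.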
=for apidoc Amh|bool|isLOWER_A|int ch
=for apidoc Amh|bool|isLOWER_L1|int ch
=for apidoc Amh|bool|isLOWER_uvchr|int ch
=for apidoc Amh|bool|isLOWER_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isLOWER_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isLOWER_LC|int ch
=for apidoc Amh|bool|isLOWER_LC_uvchr|int ch
=for apidoc Amh|bool|isLOWER_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isOCTAL|char ch
Returns a boolean indicating whether the specified character is an
octal digit, [0-7].
The only two variants are C<isOCTAL_A> and C<isOCTAL_L1>; each is identical to
C<isOCTAL>.
=for apidoc Amh|bool|isOCTAL_A|int ch
=for apidoc Amh|bool|isOCTAL_L1|int ch
=for apidoc Am|bool|isPUNCT|char ch
Returns a boolean indicating whether the specified character is a
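punctuation character, analogous to C<m/[[:punct:]]/>.
Note that the definition of what is punctuation isn't as
straightforward as one might desire.
See L<perlrecharclass/POSIX Character Classes> for details.
See the L<top of this section|/Character classification> for an explanation of
the variants C<isPUNCT_A>, C<isPUNCT_L1>, C<isPUNCT_uvchr>, C<isPUNCT_utf8>,
C<isPUNCT_utf8_safe>, C<isPUNCT_LC>, C<isPUNCT_LC_uvchr>, C<isPUNCT_LC_utf8>,
and C<isPUNCT_LC_utf8_safe>.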
=for apidoc Amh|bool|isPUNCT_A|int ch
=for apidoc Amh|bool|isPUNCT_L1|int ch
=for apidoc Amh|bool|isPUNCT_uvchr|int ch
=for apidoc Amh|bool|isPUNCT_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isPUNCT_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isPUNCT_LC|int ch
=for apidoc Amh|bool|isPUNCT_LC_uvchr|int ch
=for apidoc Amh|bool|isPUNCT_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isSPACE|char ch
Returns a boolean indicating whether the specified character is a
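whitespace character. This is analogous
to what C<m/\s/> matches in a regular expression. Starting in Perl 5.18
this also matches what C<m/[[:space:]]/> does. Prior to 5.18, only the
locale forms of this macro (the ones with C<LC> in their names) matched
precisely what C<m/[[:space:]]/> does. In those releases, the only difference,
in the non-locale variants, was that C<isSPACE()> did not match a vertical tab.
(See L</isPSXSPC> for a macro that matches a vertical tab in all releases.)
See the L<top of this section|/Character classification> for an explanation of
the variants
C<isSPACE_A>, C<isSPACE_L1>, C<isSPACE_uvchr>, C<isSPACE_utf8>,
C<isSPACE_utf8_safe>, C<isSPACE_LC>, C<isSPACE_LC_uvchr>, C<isSPACE_LC_utf8>,
and C<isSPACE_LC_utf8_safe>.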
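The pre-5.18 difference described above comes down to a single character. The
following standalone sketch models it for the ASCII range only; both function
names are hypothetical, and neither is the actual macro definition.

```c
#include <stdbool.h>

/* Hypothetical helper: the six ASCII whitespace characters of the POSIX
 * [[:space:]] class in the C locale. */
static bool sketch_is_posix_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n'
        || c == '\v' || c == '\f' || c == '\r';
}

/* Hypothetical helper modelling the pre-5.18 non-locale behavior described
 * above: the same set minus the vertical tab. */
static bool sketch_is_space_pre_5_18(unsigned char c)
{
    return c != '\v' && sketch_is_posix_space(c);
}
```

From 5.18 onward the two sets coincide, so only code that must also run on
older releases needs to care about the distinction.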
=for apidoc Amh|bool|isSPACE_A|int ch
=for apidoc Amh|bool|isSPACE_L1|int ch
=for apidoc Amh|bool|isSPACE_uvchr|int ch
=for apidoc Amh|bool|isSPACE_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isSPACE_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isSPACE_LC|int ch
=for apidoc Amh|bool|isSPACE_LC_uvchr|int ch
=for apidoc Amh|bool|isSPACE_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isPSXSPC|char ch
(short for Posix Space)
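Starting in 5.18, this is identical in all its forms to the
corresponding C<isSPACE()> macros.
The locale forms of this macro are identical to their corresponding
C<isSPACE()> forms in all Perl releases. In releases prior to 5.18, the
non-locale forms differ from their C<isSPACE()> forms only in that the
C<isSPACE()> forms don't match a Vertical Tab, and the C<isPSXSPC()> forms do.
Otherwise they are identical. Thus this macro is analogous to what
C<m/[[:space:]]/> matches in a regular expression.
See the L<top of this section|/Character classification> for an explanation of
the variants C<isPSXSPC_A>, C<isPSXSPC_L1>, C<isPSXSPC_uvchr>, C<isPSXSPC_utf8>,
C<isPSXSPC_utf8_safe>, C<isPSXSPC_LC>, C<isPSXSPC_LC_uvchr>,
C<isPSXSPC_LC_utf8>, and C<isPSXSPC_LC_utf8_safe>.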
=for apidoc Amh|bool|isPSXSPC_A|int ch
=for apidoc Amh|bool|isPSXSPC_L1|int ch
=for apidoc Amh|bool|isPSXSPC_uvchr|int ch
=for apidoc Amh|bool|isPSXSPC_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isPSXSPC_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isPSXSPC_LC|int ch
=for apidoc Amh|bool|isPSXSPC_LC_uvchr|int ch
=for apidoc Amh|bool|isPSXSPC_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isUPPER|char ch
Returns a boolean indicating whether the specified character is an
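uppercase character, analogous to C<m/[[:upper:]]/>.
See the L<top of this section|/Character classification> for an explanation of
the variants C<isUPPER_A>, C<isUPPER_L1>, C<isUPPER_uvchr>, C<isUPPER_utf8>,
C<isUPPER_utf8_safe>, C<isUPPER_LC>, C<isUPPER_LC_uvchr>, C<isUPPER_LC_utf8>,
and C<isUPPER_LC_utf8_safe>.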
=for apidoc Amh|bool|isUPPER_A|int ch
=for apidoc Amh|bool|isUPPER_L1|int ch
=for apidoc Amh|bool|isUPPER_uvchr|int ch
=for apidoc Amh|bool|isUPPER_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isUPPER_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isUPPER_LC|int ch
=for apidoc Amh|bool|isUPPER_LC_uvchr|int ch
=for apidoc Amh|bool|isUPPER_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isPRINT|char ch
Returns a boolean indicating whether the specified character is a
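printable character, analogous to C<m/[[:print:]]/>.
See the L<top of this section|/Character classification> for an explanation of
the variants
C<isPRINT_A>, C<isPRINT_L1>, C<isPRINT_uvchr>, C<isPRINT_utf8>,
C<isPRINT_utf8_safe>, C<isPRINT_LC>, C<isPRINT_LC_uvchr>, C<isPRINT_LC_utf8>,
and C<isPRINT_LC_utf8_safe>.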
=for apidoc Amh|bool|isPRINT_A|int ch
=for apidoc Amh|bool|isPRINT_L1|int ch
=for apidoc Amh|bool|isPRINT_uvchr|int ch
=for apidoc Amh|bool|isPRINT_utf8_safe|U8 * s|U8 * end
=for apidoc Amh|bool|isPRINT_utf8|U8 * s|U8 * end
=for apidoc Amh|bool|isPRINT_LC|int ch
=for apidoc Amh|bool|isPRINT_LC_uvchr|int ch
=for apidoc Amh|bool|isPRINT_LC_utf8_safe|U8 * s| U8 *end
=for apidoc Am|bool|isWORDCHAR|char ch
Returns a boolean indicating whether the specified character is a
word character, analogous to what C<m/\w/> and C<m/[[:word:]]/> match
in a regular expression. A word character is an alphabetic character, a
decimal digit, a connecting punctuation character (such as an underscore), or
a "mark" character that attaches to one of those (like some sort of accent).
C<isALNUM()> is a synonym provided for backward compatibility, even though a
word character includes more than the standard C language meaning of
alphanumeric.
See the L<top of this section|/Character classification> for an explanation of
variants C