This page is part of the PCRE2 HTML documentation. It was generated automatically from the original man page. If there is any nonsense in it, please consult the man page, in case the conversion went wrong.
PCRE2 is distributed with a configure script that can be used to build the library in Unix-like environments using the applications known as Autotools. Also in the distribution are files to support building using CMake instead of configure. The text file README contains general information about building with Autotools (some of which is repeated below), and also has some comments about building on various operating systems. There is a lot more information about building PCRE2 without using Autotools (including information about using CMake and building "by hand") in the text file called NON-AUTOTOOLS-BUILD. You should consult this file as well as the README file if you are building in a non-Unix-like environment.
The rest of this document describes the optional features of PCRE2 that can be selected when the library is compiled. It assumes use of the configure script, where the optional features are selected or deselected by providing options to configure before running the make command. However, the same options can be selected in both Unix-like and non-Unix-like environments if you are using CMake instead of configure to build PCRE2.
If you are not using Autotools or CMake, option selection can be done by editing the config.h file, or by passing parameter settings to the compiler, as described in NON-AUTOTOOLS-BUILD.
The complete list of options for configure (which includes the standard ones such as the selection of the installation directory) can be obtained by running
./configure --help
By default, a library called libpcre2-8 is built, containing functions that take string arguments contained in arrays of bytes, interpreted either as single-byte characters, or UTF-8 strings. You can also build two other libraries, called libpcre2-16 and libpcre2-32, which process strings that are contained in arrays of 16-bit and 32-bit code units, respectively. These can be interpreted either as single-unit characters or UTF-16/UTF-32 strings. To build these additional libraries, add one or both of the following to the configure command:
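--enable-pcre2-16
--enable-pcre2-32
If you do not want the 8-bit library, add
--disable-pcre2-8
as well. At least one of the three libraries must be built.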
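As a rough sketch of how the width options combine, the following shell fragment assembles a configure command that adds the 16-bit and 32-bit libraries to the default 8-bit one. The variable name PCRE2_WIDTH_OPTS is purely illustrative (PCRE2 does not define it), and configure is only run if the script is present in the current directory.

```shell
# Illustrative only: build the 8-bit (default), 16-bit and 32-bit libraries.
# PCRE2_WIDTH_OPTS is a hypothetical variable name, not something PCRE2 defines.
PCRE2_WIDTH_OPTS="--enable-pcre2-16 --enable-pcre2-32"

if [ -x ./configure ]; then
    # Word splitting of the unquoted variable is intentional here.
    ./configure $PCRE2_WIDTH_OPTS
else
    # Outside a PCRE2 source tree, just show what would be run.
    echo "would run: ./configure $PCRE2_WIDTH_OPTS"
fi
```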
The Autotools PCRE2 building process uses libtool to build both shared and static libraries by default. You can suppress an unwanted library by adding one of
--disable-shared --disable-static
By default, PCRE2 is built with support for Unicode and UTF character strings. To build it without Unicode support, add
--disable-unicode
Of itself, Unicode support does not make PCRE2 treat strings as UTF-8, UTF-16 or UTF-32. To do that, applications that use the library can set the PCRE2_UTF option when they call pcre2_compile() to compile a pattern. Alternatively, patterns may be started with (*UTF) unless the application has locked this out by setting PCRE2_NEVER_UTF.
UTF support allows the libraries to process character code points up to 0x10ffff in the strings that they handle. Unicode support also gives access to the Unicode properties of characters, using pattern escapes such as \P, \p, and \X. Only the general category properties such as Lu and Nd, script names, and some bi-directional properties are supported. Details are given in the pcre2pattern documentation.
Pattern escapes such as \d and \w do not by default make use of Unicode properties. The application can request that they do by setting the PCRE2_UCP option. Unless the application has set PCRE2_NEVER_UCP, a pattern may also request this by starting with (*UCP).
The \C escape sequence, which matches a single code unit, even in a UTF mode, can cause unpredictable behaviour because it may leave the current matching point in the middle of a multi-code-unit character. The application can lock it out by setting the PCRE2_NEVER_BACKSLASH_C option when calling pcre2_compile(). There is also a build-time option
--enable-never-backslash-C
(note the upper case C) which locks out the use of \C entirely.
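Just-in-time (JIT) compiler support is included in the build by specifying
--enable-jit
This support is available only for certain hardware architectures. If in doubt, use
--enable-jit=auto
which enables JIT only if the current hardware is supported. If you are enabling JIT under SELinux you may also want to add
--enable-jit-sealloc
which enables the use of an executable memory allocator in JIT that is compatible with SELinux. This has no effect if JIT is not enabled. If the JIT compiler is included, pcre2grep automatically makes use of it, unless you add
--disable-pcre2grep-jit
to the configure command.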
By default, PCRE2 interprets the linefeed (LF) character as indicating the end of a line. This is the normal newline character on Unix-like systems. You can compile PCRE2 to use carriage return (CR) instead, by adding
--enable-newline-is-cr
Alternatively, you can specify that line endings are to be indicated by the two-character sequence CRLF (CR immediately followed by LF). If you want this, add
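--enable-newline-is-crlf
to the configure command. There is a further option
--enable-newline-is-anycrlf
which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as indicating a line ending. The option
--enable-newline-is-any
causes PCRE2 to recognize any Unicode newline sequence, and
--enable-newline-is-nul
causes NUL (binary zero) to be set as the default line-ending character.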
Whatever default line ending convention is selected when PCRE2 is built can be overridden by applications that use the library. At build time it is recommended to use the standard for your operating system.
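The choice of build-time newline convention can be sketched as a small lookup. The helper newline_option below is hypothetical, written only for this illustration. It maps a convention name to the matching configure option and prints nothing for LF, since the text above describes LF as the built-in default.

```shell
# newline_option is a hypothetical helper, not part of PCRE2.
# It prints the configure option for a given newline convention.
newline_option() {
    case "$1" in
        lf)
            # LF is the default, so no option is needed.
            ;;
        cr|crlf|anycrlf|any|nul)
            echo "--enable-newline-is-$1"
            ;;
        *)
            echo "unknown newline convention: $1" >&2
            return 1
            ;;
    esac
}

newline_option crlf    # prints --enable-newline-is-crlf
```

A build script might call it as `./configure $(newline_option crlf)`, leaving the option out entirely on systems where LF is the native convention.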
By default, the sequence \R in a pattern matches any Unicode newline sequence, independently of what has been selected as the line ending sequence. If you specify
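--enable-bsr-anycrlf
the default is changed so that \R matches only CR, LF, or CRLF. Whatever is selected when PCRE2 is built can be overridden by applications that use the library.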
Within a compiled pattern, offset values are used to point from one part to another (for example, from an opening parenthesis to an alternation metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values are used for these offsets, leading to a maximum size for a compiled pattern of around 64 thousand code units. This is sufficient to handle all but the most gigantic patterns. Nevertheless, some people do want to process truly enormous patterns, so it is possible to compile PCRE2 to use three-byte or four-byte offsets by adding a setting such as
--with-link-size=3
to the configure command. The value given must be 2, 3, or 4.
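To make the "around 64 thousand" figure concrete, the following sketch computes the number of distinct values an n-byte offset can hold, which is 2^(8n). The helper max_link_offset is hypothetical, and the real limit on pattern size is only approximately this value.

```shell
# max_link_offset is a hypothetical helper, not part of PCRE2.
# It prints 2^(8*n), the number of values an n-byte offset can represent.
max_link_offset() {
    echo $(( 1 << (8 * $1) ))
}

for size in 2 3; do
    echo "--with-link-size=$size: offsets below $(max_link_offset "$size")"
done
# prints:
#   --with-link-size=2: offsets below 65536
#   --with-link-size=3: offsets below 16777216
```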
The pcre2_match() function increments a counter each time it goes round its main loop. Putting a limit on this counter controls the amount of computing resource used by a single call to pcre2_match(). The limit can be changed at run time, as described in the pcre2api documentation. The default is 10 million, but this can be changed by adding a setting such as
--with-match-limit=500000
The pcre2_match() function starts out using a 20KiB vector on the system stack to record backtracking points. The more nested backtracking points there are (that is, the deeper the search tree), the more memory is needed. If the initial vector is not large enough, heap memory is used, up to a certain limit, which is specified in kibibytes (units of 1024 bytes). The limit can be changed at run time, as described in the pcre2api documentation. The default limit (in effect unlimited) is 20 million. You can change this by a setting such as
--with-heap-limit=500
You can also explicitly limit the depth of nested backtracking in the pcre2_match() interpreter. This limit defaults to the value that is set for --with-match-limit. You can set a lower default limit by adding, for example,
--with-match-limit-depth=10000
As well as applying to pcre2_match(), the depth limit also controls the depth of recursive function calls in pcre2_dfa_match(). These are used for lookaround assertions, atomic groups, and recursion within patterns. The limit does not apply to JIT matching.
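The three resource limits can be set together. The sketch below uses made-up values, converts a byte count into the kibibyte units that --with-heap-limit expects, and warns if the depth limit is not lower than the match limit, on the assumption that a depth limit above the match limit would serve no purpose. All variable names are hypothetical.

```shell
# Hypothetical values chosen only for illustration.
MATCH_LIMIT=500000
DEPTH_LIMIT=10000
HEAP_LIMIT_BYTES=512000

# --with-heap-limit is given in kibibytes (units of 1024 bytes).
HEAP_LIMIT_KIB=$(( HEAP_LIMIT_BYTES / 1024 ))

# The depth limit is described as something to set lower than the match limit.
if [ "$DEPTH_LIMIT" -gt "$MATCH_LIMIT" ]; then
    echo "warning: depth limit is larger than the match limit" >&2
fi

LIMIT_OPTS="--with-match-limit=$MATCH_LIMIT --with-match-limit-depth=$DEPTH_LIMIT --with-heap-limit=$HEAP_LIMIT_KIB"
echo "./configure $LIMIT_OPTS"
```

These are only build-time defaults. As the text notes, an application can still change the limits at run time.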
PCRE2 uses fixed tables for processing characters whose code points are less than 256. By default, PCRE2 is built with a set of tables that are distributed in the file src/pcre2_chartables.c.dist. These tables are for ASCII codes only. If you add
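--enable-rebuild-chartables
to the configure command, the distributed tables are no longer used. Instead, a program called pcre2_dftables is compiled and run. This outputs the source for new character tables, created in the default locale of your C run-time system.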
If you need to create alternative tables when cross compiling, you will have to do so "by hand". There may also be other reasons for creating tables manually. To cause pcre2_dftables to be built on the local host, run a normal compiling command, and then run the program with the output file as its argument, for example:
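cc src/pcre2_dftables.c -o pcre2_dftables
./pcre2_dftables src/pcre2_chartables.c
This builds the tables in the "C" locale of the local host. To build them in a different locale, use the -L option, for example:
LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c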
PCRE2 assumes by default that it will run in an environment where the character code is ASCII or Unicode, which is a superset of ASCII. This is the case for most computer operating systems. PCRE2 can, however, be compiled to run in an 8-bit EBCDIC environment by adding
--enable-ebcdic --disable-unicode
It is not possible to support both EBCDIC and UTF-8 codes in the same version of the library. Consequently, --enable-unicode and --enable-ebcdic are mutually exclusive.
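A build script could guard against this conflict before invoking configure. The following is a minimal sketch: check_charset_opts is a hypothetical helper that returns failure when an option list asks for both EBCDIC and Unicode support.

```shell
# check_charset_opts is a hypothetical helper, not part of PCRE2.
# It fails if both --enable-ebcdic and --enable-unicode are requested.
check_charset_opts() {
    case " $* " in
        *" --enable-ebcdic "*)
            case " $* " in
                *" --enable-unicode "*) return 1 ;;
            esac
            ;;
    esac
    return 0
}

check_charset_opts --enable-ebcdic --disable-unicode && echo "ok"
check_charset_opts --enable-ebcdic --enable-unicode || echo "conflict"
# prints "ok" and then "conflict"
```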
The EBCDIC character that corresponds to an ASCII LF is assumed to have the value 0x15 by default. However, in some EBCDIC environments, 0x25 is used. In such an environment you should use
--enable-ebcdic-nl25
The options that select newline behaviour, such as --enable-newline-is-cr, and equivalent run-time options, refer to these character values in an EBCDIC environment.
By default pcre2grep supports the use of callouts with string arguments within the patterns it is matching. There are two kinds: one that generates output using local code, and another that calls an external program or script. If --disable-pcre2grep-callout-fork is added to the configure command, only the first kind of callout is supported; if --disable-pcre2grep-callout is used, all callouts are completely ignored. For more details of pcre2grep callouts, see the pcre2grep documentation.
By default, pcre2grep reads all files as plain text. You can build it so that it recognizes files whose names end in .gz or .bz2, and reads them with libz or libbz2, respectively, by adding one or both of
--enable-pcre2grep-libz --enable-pcre2grep-libbz2
pcre2grep uses an internal buffer to hold a "window" on the file it is scanning, in order to be able to output "before" and "after" lines when it finds a match. The default starting size of the buffer is 20KiB. The buffer itself is three times this size, but because of the way it is used for holding "before" lines, the longest line that is guaranteed to be processable is the notional buffer size. If a longer line is encountered, pcre2grep automatically expands the buffer, up to a specified maximum size, whose default is 1MiB or the starting size, whichever is the larger. You can change the default parameter values by adding, for example,
--with-pcre2grep-bufsize=51200 --with-pcre2grep-max-bufsize=2097152
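The sizes in this example can be checked with a little arithmetic. The sketch below takes the two values shown above and works out the memory initially allocated, using the statement that the buffer itself is three times the starting size. The variable names are hypothetical.

```shell
BUFSIZE=51200          # value passed to --with-pcre2grep-bufsize
MAX_BUFSIZE=2097152    # value passed to --with-pcre2grep-max-bufsize

# The buffer itself is three times the starting size.
ALLOCATED=$(( BUFSIZE * 3 ))

echo "starting size: $(( BUFSIZE / 1024 )) KiB"
echo "allocated:     $(( ALLOCATED / 1024 )) KiB"
echo "maximum:       $(( MAX_BUFSIZE / 1024 / 1024 )) MiB"
# prints 50 KiB, 150 KiB and 2 MiB respectively
```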
If you add one of
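--enable-pcre2test-libreadline --enable-pcre2test-libedit
to the configure command, pcre2test is linked with the libreadline or libedit library, respectively, and when its input is from a terminal, it reads it using the readline() function. This provides line-editing and history facilities.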
Setting --enable-pcre2test-libreadline causes the -lreadline option to be added to the pcre2test build. In many operating environments with a system-installed readline library this is sufficient. However, in some environments (e.g. if an unmodified distribution version of readline is in use), some extra configuration may be necessary. The INSTALL file for libreadline says this:
"Readline uses the termcap functions, but does not link with the termcap or curses library itself, allowing applications which link with readline the to choose an appropriate library."
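If your environment has not been set up so that an appropriate library is automatically included, you may need to add something like
LIBS="-lncurses"
immediately before the configure command.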
If you add
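--enable-debug
to the configure command, additional debugging code is included in the build. This feature is intended for use by the PCRE2 maintainers. If you add
--enable-valgrind
to the configure command, PCRE2 will use valgrind annotations to mark certain memory regions as unaddressable. This allows it to detect invalid memory accesses, and is mostly useful for debugging PCRE2 itself.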
If your C compiler is gcc, you can build a version of PCRE2 that can generate a code coverage report for its test suite. To enable this, you must install lcov version 1.6 or above. Then specify
--enable-coverage
Note that using ccache (a caching C compiler) is incompatible with code coverage reporting. If you have configured ccache to run automatically on your system, you must set the environment variable
CCACHE_DISABLE=1
When --enable-coverage is used, the following additional targets are added to the Makefile:
make coverage
make coverage-reset
make coverage-baseline
make coverage-report
make coverage-clean-report
make coverage-clean-data
make coverage-clean
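A typical coverage run might look like the sketch below. It assumes that gcc and lcov are installed and that make coverage is the target that produces a complete report. It disables ccache through the environment variable mentioned above, and it does nothing if no configure script is present in the current directory.

```shell
# Keep ccache out of the way for this shell and its child processes.
export CCACHE_DISABLE=1

if [ -x ./configure ]; then
    # Configure with coverage support, build, then generate the report.
    ./configure --enable-coverage &&
    make &&
    make coverage
else
    echo "no configure script here; nothing was run"
fi
```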
The C99 standard defines formatting modifiers z and t for size_t and ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in environments other than old versions of Microsoft Visual Studio when __STDC_VERSION__ is defined and has a value greater than or equal to 199901L (indicating support for C99). However, there is at least one environment that claims to be C99 but does not support these modifiers. If
--disable-percent-zt
is specified, no use is made of the z or t modifiers.
There is a special option for use by people who want to run fuzzing tests on PCRE2:
--enable-fuzz-support
Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to be created. This is normally run under valgrind or used when PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing function and outputs information about what it is doing. The input strings are specified by arguments: if an argument starts with "=" the rest of it is a literal input string. Otherwise, it is assumed to be a file name, and the contents of the file are the test string.
In versions of PCRE2 prior to 10.30, there were two ways of handling backtracking in the pcre2_match() function. The default was to use the system stack, but if
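--disable-stack-for-recursion
was set, memory on the heap was used. From release 10.30 onwards this has changed (the stack is no longer used) and this option now does nothing except give a warning.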
See also: pcre2api(3), pcre2-config(3).
Philip Hazel University Computing Service Cambridge, England.
Last updated: 08 December 2021 Copyright © 1997-2021 University of Cambridge.