Shaping a complex-script text run involves transforming the
input sequence of Unicode codepoints with some combination of
operations that is specified in the shaping model for the
script.
The specific conditions that trigger a given operation for a
text run vary from script to script, as do the order in which
the operations are performed and the codepoints that are
affected. However, the same general set of shaping operations is
common to all of the complex-script shaping models.
-
A reordering operation moves a glyph
from its original ("logical") position in the sequence to
some other ("visual") position.
The shaping model for a given complex script might involve
more than one reordering step.
-
A joining operation replaces a glyph
with an alternate form that is designed to connect with one
or more of the adjacent glyphs in the sequence.
-
A contextual substitution operation
replaces either a single glyph or a subsequence of several
glyphs with an alternate glyph. This substitution is
performed when the original glyph or subsequence of glyphs
occurs in a specified position with respect to the
surrounding sequence. For example, one substitution might be
performed only when the target glyph is the first glyph in
the sequence, while another substitution is performed only
when a different target glyph occurs immediately after a
particular string pattern.
The shaping model for a given complex script might involve
multiple contextual-substitution operations, each of which
applies to different target glyphs and patterns and is
performed in a separate step.
-
A contextual positioning operation
adjusts the horizontal and/or vertical position of a
glyph. This adjustment is performed when the glyph
occurs in a specified position with respect to the
surrounding sequence.
Many contextual positioning operations are used to place
mark glyphs (such as diacritics, vowel
signs, and tone markers) with respect to
base glyphs. However, some complex
scripts may use contextual positioning operations to
correctly place base glyphs as well, such as
when the script uses stacking characters.
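
The four operations described above can be sketched as small
functions over a list of glyphs. This is only an illustrative toy,
not how any particular shaping engine is implemented: the Glyph
type, the glyph names, the glyph classes (PRE_BASE, JOINING, MARKS),
and the offset value are all hypothetical. The joining rule is also
simplified; real joining scripts have letters that connect on one
side only.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Glyph:
    name: str
    dx: int = 0  # horizontal offset from the default position
    dy: int = 0  # vertical offset from the default position

# Hypothetical glyph classes, invented for this sketch. A real shaper
# derives this information from Unicode data and from the font.
PRE_BASE = {"vowel_e"}
JOINING = {"beh", "seen"}
MARKS = {"tone_mark"}
FORMS = {(False, False): "isol", (False, True): "init",
         (True, True): "medi", (True, False): "fina"}

def reorder(glyphs):
    """Reordering: move a pre-base glyph in front of the glyph it follows."""
    out = list(glyphs)
    for i in range(1, len(out)):
        if out[i].name in PRE_BASE:
            out[i - 1], out[i] = out[i], out[i - 1]
    return out

def join(glyphs):
    """Joining: choose a form according to which neighbors also join."""
    out = []
    for i, g in enumerate(glyphs):
        if g.name in JOINING:
            prev_joins = i > 0 and glyphs[i - 1].name in JOINING
            next_joins = i + 1 < len(glyphs) and glyphs[i + 1].name in JOINING
            g = replace(g, name=f"{g.name}.{FORMS[(prev_joins, next_joins)]}")
        out.append(g)
    return out

def substitute_initial(glyphs, target="ra", alternate="ra.start"):
    """Contextual substitution: use an alternate only at the start of the run."""
    if glyphs and glyphs[0].name == target:
        return [replace(glyphs[0], name=alternate)] + list(glyphs[1:])
    return list(glyphs)

def position_marks(glyphs, raise_by=120):
    """Contextual positioning: lift a mark that directly follows a non-mark."""
    out = []
    for i, g in enumerate(glyphs):
        if g.name in MARKS and i > 0 and glyphs[i - 1].name not in MARKS:
            g = replace(g, dy=g.dy + raise_by)
        out.append(g)
    return out
```

For example, reorder([Glyph("ka"), Glyph("vowel_e")]) returns the two
glyphs in the opposite ("visual") order, and
position_marks([Glyph("ka"), Glyph("tone_mark")]) leaves the base
glyph untouched while raising the mark. Each function returns a new
list, so the steps can be chained in whatever order a given shaping
model calls for.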