The distinction between levels 0 and 1
The preceding examples demonstrate the main effects of using
cluster levels 0 and 1. The only difference between the two
levels is this: in level 0, at the very beginning of the shaping
process, HarfBuzz merges the cluster of each base character
with the clusters of all Unicode marks (combining or not) and
modifiers that follow it.
For example, let us start with the following character sequence
(top row) and accompanying initial cluster values (bottom row):
A,acute,B
0,1    ,2
The acute is a Unicode mark. If HarfBuzz is using cluster level 0
on this sequence, then the A and acute clusters will merge, and
the result will become:
A,acute,B
0,0    ,2
This merger is performed before any other script-shaping
steps.
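The initial merge can be pictured with a small toy model. The sketch below is not HarfBuzz's own code: the function name merge_initial_clusters and the precomputed is_mark array are hypothetical, and it assumes one cluster value per character in logical order.

```c
#include <stddef.h>

/* Toy model of the level-0 initial merge (illustrative only, not
 * HarfBuzz's implementation). is_mark[i] is nonzero when character i
 * is a Unicode mark or modifier; this classification is assumed to
 * have been computed elsewhere. */
void
merge_initial_clusters (const int *is_mark, unsigned int *clusters, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (is_mark[i])
      clusters[i] = clusters[i - 1]; /* join the preceding character's cluster */
}
```

Applied to the A,acute,B sequence with clusters 0,1,2, this model yields 0,0,2. Because each mark copies the value just assigned to its predecessor, a run of several marks collapses into the cluster of the base before it.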
This initial cluster merging is the default behavior of the
Windows shaping engine, and the old HarfBuzz codebase copied
that behavior to maintain compatibility. Consequently, it has
remained the default behavior in the new HarfBuzz codebase.
But this initial cluster-merging behavior makes it impossible
for client programs to implement some features (such as
coloring diacritic marks differently from their base
characters). That is why, in level 1, HarfBuzz does not perform
the initial merging step.
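To see why the merge gets in the way of mark coloring, consider a client that uses each glyph's cluster value to find the character it came from. In the real API, a client selects level 1 by calling hb_buffer_set_cluster_level() with HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS before shaping. The sketch below does not call HarfBuzz at all: count_colorable_marks is a hypothetical client-side helper, and it assumes that cluster values are character indices into the run.

```c
#include <stddef.h>

/* Hypothetical client-side helper, not part of HarfBuzz.
 * Counts the glyphs whose cluster value points at a mark character,
 * that is, the glyphs a client could color separately from the base. */
size_t
count_colorable_marks (const unsigned int *glyph_clusters, size_t n_glyphs,
                       const int *char_is_mark, size_t n_chars)
{
  size_t count = 0;
  for (size_t i = 0; i < n_glyphs; i++)
    {
      unsigned int c = glyph_clusters[i];
      if (c < n_chars && char_is_mark[c])
        count++;
    }
  return count;
}
```

For A,acute,B, the level 0 clusters 0,0,2 give a count of zero, because the acute glyph reports the cluster of A. The level 1 clusters 0,1,2 give a count of one, so the mark can be identified and styled on its own.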
For client programs that rely on HarfBuzz cluster values to
perform cursor positioning, level 0 is more convenient. But
relying on cluster boundaries for cursor positioning is wrong: cursor
positions should be determined based on Unicode grapheme
boundaries, not on shaping-cluster boundaries. For that reason,
level 1 clustering behavior is recommended.
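As a rough illustration of cursor placement that depends on the text rather than on cluster values, the sketch below snaps a character offset back to the start of its grapheme. It is a deliberate simplification: the is_extend array (nonzero for characters that extend the preceding grapheme) and the function name are hypothetical, and a real client should rely on a full implementation of the Unicode grapheme-boundary rules (UAX #29) from a Unicode library.

```c
#include <stddef.h>

/* Simplified grapheme snapping (illustrative only). is_extend[i] is
 * nonzero when character i attaches to the grapheme that precedes it.
 * Returns the nearest grapheme start at or before offset. */
size_t
snap_to_grapheme_start (const int *is_extend, size_t n_chars, size_t offset)
{
  if (offset >= n_chars)
    return n_chars; /* the end of the text is always a boundary */
  while (offset > 0 && is_extend[offset])
    offset--;
  return offset;
}
```

With this approach the cursor never lands between A and the acute, regardless of which cluster level the shaper was asked to use.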
One final facet of levels 0 and 1 is worth noting. HarfBuzz
currently does not allow any multiple-substitution GSUB lookups
to replace a glyph with zero glyphs (in other words, to delete a
glyph).
But, in some other situations, glyphs can be deleted. In
those cases, if the glyph being deleted is the last glyph of its
cluster, HarfBuzz makes sure to merge the deleted glyph's
cluster with a neighboring cluster.
This is done primarily to make sure that the first cluster of a
run always carries the cluster value that points to the start of
the run's text; more than one client program currently relies on
this guarantee.
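The deletion rule can be sketched as follows. This is a simplified model under stated assumptions, not HarfBuzz's actual code: it works on a bare array of cluster values in left-to-right order, the function name delete_glyph_keep_clusters is hypothetical, and the choice of which neighbor to merge with is only one plausible reading of "a neighboring cluster".

```c
#include <stddef.h>

/* Simplified model (illustrative only). Removes the glyph at idx from
 * clusters[0..*len). If it was the last glyph of its cluster, a
 * neighboring cluster adopts the deleted value when that value is
 * smaller, so the lowest cluster value in the run is not lost. */
void
delete_glyph_keep_clusters (unsigned int *clusters, size_t *len, size_t idx)
{
  size_t n = *len;
  if (idx >= n)
    return;

  unsigned int gone = clusters[idx];
  int survives = (idx > 0 && clusters[idx - 1] == gone) ||
                 (idx + 1 < n && clusters[idx + 1] == gone);

  if (!survives)
    {
      if (idx > 0)
        {
          /* Merge backward into the preceding cluster. */
          unsigned int prev = clusters[idx - 1];
          if (gone < prev)
            for (size_t i = idx; i > 0 && clusters[i - 1] == prev; i--)
              clusters[i - 1] = gone;
        }
      else if (idx + 1 < n)
        {
          /* First glyph of the run: merge forward instead. */
          unsigned int next = clusters[idx + 1];
          if (gone < next)
            for (size_t i = idx + 1; i < n && clusters[i] == next; i++)
              clusters[i] = gone;
        }
    }

  /* Close the gap left by the deleted glyph. */
  for (size_t i = idx; i + 1 < n; i++)
    clusters[i] = clusters[i + 1];
  *len = n - 1;
}
```

Deleting the first glyph from clusters 0,1,1,2 leaves 0,0,2 in this model: the run still begins with cluster value 0, which is the guarantee described above.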
Incidentally, Apple's CoreText does something different to
maintain the same promise: it inserts a glyph with id 65535 at
the beginning of the glyph string if the glyph corresponding to
the first character in the run was deleted. HarfBuzz might do
something similar in the future.