One Hat Cyber Team

View File Name : working-with-harfbuzz-clusters.html

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Working with HarfBuzz clusters: HarfBuzz Manual</title>
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="index.html" title="HarfBuzz Manual">
<link rel="up" href="clusters.html" title="Clusters">
<link rel="prev" href="clusters.html" title="Clusters">
<link rel="next" href="a-clustering-example-for-levels-0-and-1.html" title="A clustering example for levels 0 and 1">
<meta name="generator" content="GTK-Doc V1.32 (XML mode)">
<link rel="stylesheet" href="style.css" type="text/css">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table class="navigation" id="top" width="100%" summary="Navigation header" cellpadding="2" cellspacing="5"><tr valign="middle">
<td width="100%" align="left" class="shortcuts"></td>
<td><a accesskey="h" href="index.html"><img src="home.png" width="16" height="16" border="0" alt="Home"></a></td>
<td><a accesskey="u" href="clusters.html"><img src="up.png" width="16" height="16" border="0" alt="Up"></a></td>
<td><a accesskey="p" href="clusters.html"><img src="left.png" width="16" height="16" border="0" alt="Prev"></a></td>
<td><a accesskey="n" href="a-clustering-example-for-levels-0-and-1.html"><img src="right.png" width="16" height="16" border="0" alt="Next"></a></td>
</tr></table>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="working-with-harfbuzz-clusters"></a>Working with HarfBuzz clusters</h2></div></div></div>
<p>
      When you add text to a HarfBuzz buffer, each code point must be
      assigned a <span class="emphasis"><em>cluster value</em></span>.
    </p>
<p>
      This cluster value is an arbitrary number; HarfBuzz uses it only
      to distinguish between clusters. Many client programs will use
      the index of each code point in the input text stream as the
      cluster value. This is for the sake of convenience; the actual
      value does not matter.
    </p>
<p>
      Some of the shaping operations performed by HarfBuzz —
      such as reordering, composition, decomposition, and substitution
      — may alter the cluster values of some characters. The
      final cluster values in the buffer at the end of the shaping
      process will indicate to client programs which subsequences of
      glyphs represent a cluster and, therefore, must not be
      separated.
    </p>
<p>
      In addition, client programs can query the final cluster values
      to discern other potentially important information about the
      glyphs in the output buffer (such as whether or not a ligature
      was formed).
    </p>
<p>
      For example, if the initial sequence of cluster values was:
    </p>
<pre class="programlisting">
      0,1,2,3,4
    </pre>
<p>
      and the final sequence of cluster values is:
    </p>
<pre class="programlisting">
      0,0,3,3
    </pre>
<p>
      then there are two clusters in the output buffer: the first
      cluster includes the first two glyphs, and the second cluster
      includes the third and fourth glyphs. It is also evident that a
      ligature or conjunct has been formed, because there are fewer
      glyphs in the output buffer (four) than there were code points
      in the input buffer (five).
    </p>
<p>
      Although client programs using HarfBuzz are free to assign
      initial cluster values in any manner they choose to, HarfBuzz
      does offer some useful guarantees if the cluster values are
      assigned in a monotonic (either non-decreasing or non-increasing)
      order.
    </p>
<p>
      For buffers in the left-to-right (LTR)
      or top-to-bottom (TTB) text flow direction,
      HarfBuzz will preserve the monotonic property: client programs
      are guaranteed that monotonically increasing initial cluster
      values will be returned as monotonically increasing final
      cluster values.
    </p>
<p>
      For buffers in the right-to-left (RTL)
      or bottom-to-top (BTT) text flow direction,
      the directionality of the buffer itself is reversed for final
      output as a matter of design. Therefore, HarfBuzz inverts the
      monotonic property: client programs are guaranteed that
      monotonically increasing initial cluster values will be
      returned as monotonically <span class="emphasis"><em>decreasing</em></span> final
      cluster values.
    </p>
<p>
      Client programs can adjust how HarfBuzz handles clusters during
      shaping by setting the
      <code class="literal">cluster_level</code> of the
      buffer. HarfBuzz offers three <span class="emphasis"><em>levels</em></span> of
      clustering support for this property:
    </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
<p><span class="emphasis"><em>Level 0</em></span> is the default and
	reproduces the behavior of the old HarfBuzz library.
	</p>
<p>
	  The distinguishing feature of level 0 behavior is that, at
	  the beginning of processing the buffer, all code points that
	  are categorized as <span class="emphasis"><em>marks</em></span>,
	  <span class="emphasis"><em>modifier symbols</em></span>, or
	  <span class="emphasis"><em>Emoji extended pictographic</em></span> modifiers,
	  as well as the <span class="emphasis"><em>Zero Width Joiner</em></span> and
	  <span class="emphasis"><em>Zero Width Non-Joiner</em></span> code points, are
	  assigned the cluster value of the closest preceding code
	  point from <span class="emphasis"><em>different</em></span> category. 
	</p>
<p>
	  In essence, whenever a base character is followed by a mark
	  character or a sequence of mark characters, those marks are
	  reassigned to the same initial cluster value as the base
	  character. This reassignment is referred to as
	  "merging" the affected clusters. This behavior is based on
	  the Grapheme Cluster Boundary specification in <a class="ulink" href="https://www.unicode.org/reports/tr29/#Regex_Definitions" target="_top">Unicode
	  Technical Report 29</a>.
	</p>
<p>
	  Client programs can specify level 0 behavior for a buffer by
	  setting its <code class="literal">cluster_level</code> to
	  <code class="literal">HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</code>. 
	</p>
</li>
<li class="listitem">
<p>
	  <span class="emphasis"><em>Level 1</em></span> tweaks the old behavior
	  slightly to produce better results. Therefore, level 1
	  clustering is recommended for code that is not required to
	  implement backward compatibility with the old HarfBuzz.
	</p>
<p>
	  Level 1 differs from level 0 by not merging the 
	  clusters of marks and other modifier code points with the
	  preceding "base" code point's cluster. By preserving the
	  separate cluster values of these marks and modifier code
	  points, script shapers can perform additional operations
	  that might lead to improved results (for example, reordering
	  a sequence of marks).
	</p>
<p>
	  Client programs can specify level 1 behavior for a buffer by
	  setting its <code class="literal">cluster_level</code> to
	  <code class="literal">HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</code>. 
	</p>
</li>
<li class="listitem">
<p>
	  <span class="emphasis"><em>Level 2</em></span> differs significantly in how it
	  treats cluster values. In level 2, HarfBuzz never merges
	  clusters.
	</p>
<p>
	  This difference can be seen most clearly when HarfBuzz processes
	  ligature substitutions and glyph decompositions. In level 0 
	  and level 1, ligatures and glyph decomposition both involve
	  merging clusters; in level 2, neither of these operations
	  triggers a merge.
	</p>
<p>
	  Client programs can specify level 2 behavior for a buffer by
	  setting its <code class="literal">cluster_level</code> to
	  <code class="literal">HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</code>. 
	</p>
</li>
</ul></div>
<p>
      As mentioned earlier, client programs using HarfBuzz often
      assign initial cluster values in a buffer by reusing the indices
      of the code points in the input text. This gives a sequence of
      cluster values that is monotonically increasing (for example,
      0,1,2,3,4). 
    </p>
<p>
      It is not <span class="emphasis"><em>required</em></span> that the cluster values
      in a buffer be monotonically increasing. However, if the initial
      cluster values in a buffer are monotonic and the buffer is
      configured to use cluster level 0 or 1, then HarfBuzz
      guarantees that the final cluster values in the shaped buffer
      will also be monotonic. No such guarantee is made for cluster
      level 2.
    </p>
<p>
      In levels 0 and 1, HarfBuzz implements the following conceptual
      model for cluster values:
    </p>
<div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; ">
<li class="listitem"><p>
          If the sequence of input cluster values is monotonic, the
	  sequence of cluster values will remain monotonic.
	</p></li>
<li class="listitem"><p>
          Each cluster value represents a single cluster.
	</p></li>
<li class="listitem"><p>
          Each cluster contains one or more glyphs and one or more
          characters.
	</p></li>
</ul></div>
<p>
      In practice, this model offers several benefits. Assuming that
      the initial cluster values were monotonically increasing
      and distinct before shaping began, then, in the final output:
    </p>
<div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: disc; ">
<li class="listitem"><p>
	  All adjacent glyphs having the same final cluster
	  value belong to the same cluster.
	</p></li>
<li class="listitem"><p>
          Each character belongs to the cluster that has the highest
	  cluster value <span class="emphasis"><em>not larger than</em></span> its
	  initial cluster value.
	</p></li>
</ul></div>
</div>
<div class="footer">
<hr>Generated by GTK-Doc V1.32</div>
</body>
</html>