Characters can be accessed in two ways: as code units or as code points. Unicode code points are 21-bit integers and are the scalar values of Unicode characters. ICU uses the type UChar32 for them. Unicode code units are the storage units of a given Unicode/UCS Transformation Format (a character encoding scheme). With UTF-16, all code points can be represented with either one or two code units ("surrogates"). String storage is typically based on code units, while properties of characters are typically determined using code point values. Some processes may be designed to work with sequences of code units, or it may be known that all characters that are important to an algorithm can be represented with single code units. Other processes will need to use the code point access functions.
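The distinction between code units and code points in UTF-16 can be illustrated with a small standalone sketch. It uses only the C++ standard library, not ICU; the helper names isLeadSurrogate, isTrailSurrogate, combineSurrogates and countCodePoints are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; these are not ICU functions.
inline bool isLeadSurrogate(char16_t u)  { return (u & 0xFC00) == 0xD800; }
inline bool isTrailSurrogate(char16_t u) { return (u & 0xFC00) == 0xDC00; }

// Combine a lead/trail surrogate pair into the supplementary code point it encodes.
inline char32_t combineSurrogates(char16_t lead, char16_t trail) {
    return 0x10000
         + ((static_cast<char32_t>(lead) - 0xD800) << 10)
         + (static_cast<char32_t>(trail) - 0xDC00);
}

// Count code points in a UTF-16 string.
// A well-formed surrogate pair counts once; an unpaired surrogate counts as one.
inline std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++count) {
        if (isLeadSurrogate(s[i]) && i + 1 < s.size() && isTrailSurrogate(s[i + 1])) {
            ++i;  // skip the trail surrogate of the pair
        }
    }
    return count;
}
```

For a string such as u"a\U0001F600b", size() reports four code units while countCodePoints() reports three code points, because U+1F600 occupies two code units.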
ForwardCharacterIterator provides nextPostInc() to access a code unit and advance an internal position into the text object, similar to a return text[position++] statement. It provides next32PostInc() to access a code point and advance an internal position.
next32PostInc() assumes that the current position is that of the beginning of a code point, i.e., of its first code unit. After next32PostInc(), this will be true again. In general, access to code units and code points in the same iteration loop should not be mixed. In UTF-16, if the current position is on a second code unit (a low surrogate), then only that code unit is returned even by next32PostInc().
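The post-increment semantics described above can be sketched with a toy class over a std::u16string. This is an assumption-laden stand-in, not ICU's implementation: ToyForwardIterator and its position() member are hypothetical, and only the member names nextPostInc(), next32PostInc(), hasNext() and DONE are borrowed from the interface.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Toy stand-in for illustration only; not the ICU class.
class ToyForwardIterator {
public:
    enum { DONE = 0xFFFF };  // same sentinel value that ICU uses for DONE

    explicit ToyForwardIterator(std::u16string text) : text_(std::move(text)), pos_(0) {}

    bool hasNext() const { return pos_ < text_.size(); }

    // Return the code unit at the current position, then advance by one code unit.
    char16_t nextPostInc() {
        if (!hasNext()) return static_cast<char16_t>(DONE);
        return text_[pos_++];
    }

    // Return the code point that starts at the current position, then advance past it.
    char32_t next32PostInc() {
        if (!hasNext()) return DONE;
        char16_t first = text_[pos_++];
        bool isLead = (first & 0xFC00) == 0xD800;
        if (isLead && pos_ < text_.size() && (text_[pos_] & 0xFC00) == 0xDC00) {
            char16_t trail = text_[pos_++];
            return 0x10000
                 + ((static_cast<char32_t>(first) - 0xD800) << 10)
                 + (static_cast<char32_t>(trail) - 0xDC00);
        }
        // BMP character, or a surrogate that does not start a pair at this position.
        return first;
    }

    std::size_t position() const { return pos_; }  // hypothetical accessor for the sketch

private:
    std::u16string text_;
    std::size_t pos_;
};
```

If nextPostInc() is called once on a text that begins with a surrogate pair, the position lands on the low surrogate, and a following next32PostInc() returns just that code unit. This is why mixing the two access styles in one loop is discouraged.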
For iteration with either function, there are two ways to check for the end of the iteration. When there are no more characters in the text object, hasNext() returns FALSE, and nextPostInc() and next32PostInc() return DONE.
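The two end-of-iteration checks can be compared side by side in a standalone sketch. UnitCursor, collectWithHasNext and collectUntilDone are hypothetical names for this illustration and are not part of ICU; the sentinel value 0xFFFF matches ICU's DONE.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimal code-unit cursor for illustration only; not the ICU class.
struct UnitCursor {
    enum { DONE = 0xFFFF };

    explicit UnitCursor(std::u16string t) : text(std::move(t)), pos(0) {}

    bool hasNext() const { return pos < text.size(); }

    // Return the current code unit and advance, or DONE past the end.
    char16_t nextPostInc() {
        return hasNext() ? text[pos++] : static_cast<char16_t>(DONE);
    }

    std::u16string text;
    std::size_t pos;
};

// Way 1: ask hasNext() before each read.
std::vector<char16_t> collectWithHasNext(UnitCursor cur) {
    std::vector<char16_t> out;
    while (cur.hasNext()) {
        out.push_back(cur.nextPostInc());
    }
    return out;
}

// Way 2: read until the DONE sentinel comes back.
std::vector<char16_t> collectUntilDone(UnitCursor cur) {
    std::vector<char16_t> out;
    for (char16_t c = cur.nextPostInc(); c != UnitCursor::DONE; c = cur.nextPostInc()) {
        out.push_back(c);
    }
    return out;
}
```

Both loops visit the same code units for ordinary text. Because DONE has the value 0xFFFF, which can also occur as a code unit in the text, the sentinel loop stops early on such input, so the hasNext() form is the safer choice when the text is not known to exclude U+FFFF.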
     * Despite the fact that this function is public,
     * DO NOT CONSIDER IT PART OF CHARACTERITERATOR'S API!
     * @return a UClassID for this ForwardCharacterIterator
     * @stable ICU 2.0
     */
    virtual UClassID getDynamicClassID(void) const = 0;

    /**
     * Gets the current code unit for returning and advances to the next code unit
     * in the iteration range (toward endIndex()). If there are
     * no more code units to return, returns DONE.
     * @return the current code unit.
     * @stable ICU 2.0
     */
    virtual char16_t nextPostInc(void) = 0;

    /**
     * Gets the current code point for returning and advances to the next code point
     * in the iteration range (toward endIndex()). If there are
     * no more code points to return, returns DONE.
     * @return the current code point.
     * @stable ICU 2.0
     */
    virtual UChar32 next32PostInc(void) = 0;

    /**
     * Returns FALSE if there are no more code units or code points
     * at or after the current position in the iteration range.
     * This is used with nextPostInc() or next32PostInc() in forward
     * iteration.
     * @returns FALSE if there are no more code units or code points
     * at or after the current position in the iteration range.
     * @stable ICU 2.0
     */
    virtual UBool hasNext() = 0;

protected:
    /** Default constructor to be overridden in the implementing class. @stable ICU 2.0*/
    ForwardCharacterIterator();

    /** Copy constructor to be overridden in the implementing class. @stable ICU 2.0*/
    ForwardCharacterIterator(const ForwardCharacterIterator &other);

    /**
     * Assignment operator to be overridden in the implementing class.
     * @stable ICU 2.0
     */
    ForwardCharacterIterator &operator=(const ForwardCharacterIterator&) { return *this; }
};

/**
 * Abstract class that defines an API for iteration
 * on text objects.
 * This is an interface for forward and backward iteration
 * and random access into a text object.
 *
The API provides backward compatibility to the Java and older ICU CharacterIterator classes but extends them significantly:
Examples for some of the new functions:
Examples, especially for the old API:
\code
void processChar( char16_t c )
{
    cout << " " << c;
}
\endcode
\code
void traverseForward(CharacterIterator& iter)
{
    // Pre-increment style: first() returns the first code unit, next() advances and returns.
    for(char16_t c = iter.first(); c != CharacterIterator::DONE; c = iter.next()) {
        processChar(c);
    }
}
\endcode
\code
void traverseBackward(CharacterIterator& iter)
{
    // Start at the last code unit and step backward until DONE.
    for(char16_t c = iter.last(); c != CharacterIterator::DONE; c = iter.previous()) {
        processChar(c);
    }
}
\endcode
\code
// u_isalpha() and u_isdigit() are declared in unicode/uchar.h;
// they replace the obsolete Unicode::isLetter() and Unicode::isDigit().
void traverseOut(CharacterIterator& iter, int32_t pos)
{
    char16_t c;
    // Scan forward from pos to the end of the run of letters and digits.
    for (c = iter.setIndex(pos);
         c != CharacterIterator::DONE && (u_isalpha(c) || u_isdigit(c));
         c = iter.next()) {}
    int32_t end = iter.getIndex();
    // Scan backward from pos to the start of the run.
    for (c = iter.setIndex(pos);
         c != CharacterIterator::DONE && (u_isalpha(c) || u_isdigit(c));
         c = iter.previous()) {}
    int32_t start = iter.getIndex() + 1;

    cout << "start: " << start << " end: " << end << endl;
    for (c = iter.setIndex(start); iter.getIndex() < end; c = iter.next() ) {
        processChar(c);
    }
}
\endcode
\code
void CharacterIterator_Example( void )
{
    cout << endl << "===== CharacterIterator_Example: =====" << endl;
    UnicodeString text("Ein kleiner Satz.");
    StringCharacterIterator iterator(text);
    cout << "----- traverseForward: -----------" << endl;
    traverseForward( iterator );
    cout << endl << endl << "----- traverseBackward: ----------" << endl;
    traverseBackward( iterator );
    cout << endl << endl << "----- traverseOut: ---------------" << endl;
    traverseOut( iterator, 7 );
    cout << endl << endl << "-----" << endl;
}
\endcode