Collator
 * Collator is an abstract base class. Subclasses implement specific
 * collation strategies. One subclass, RuleBasedCollator, is currently
 * provided and is applicable to a wide set of languages. Other subclasses
 * may be created to handle more specialized needs.
 *
 * As with other locale-sensitive classes, you can use the static factory
 * method createInstance to obtain the appropriate Collator object for a
 * given locale. You only need to look at the subclasses of Collator if you
 * need to understand the details of a particular collation strategy or if
 * you need to modify that strategy.
 *
 * The following example shows how to compare two strings using the
 * Collator for the default locale.
 * \code
 * // Compare two strings in the default locale
 * UErrorCode success = U_ZERO_ERROR;
 * Collator* myCollator = Collator::createInstance(success);
 * if (U_FAILURE(success)) return;
 * if (myCollator->compare("abc", "ABC") < 0)
 *   cout << "abc is less than ABC" << endl;
 * else
 *   cout << "abc is greater than or equal to ABC" << endl;
 * delete myCollator;  // the caller owns the returned Collator
 * \endcode
 *
 * You can set a Collator's strength attribute to determine the level of
 * difference considered significant in comparisons. Five strengths are
 * provided: PRIMARY, SECONDARY, TERTIARY, QUATERNARY and IDENTICAL.
 * The exact assignment of strengths to language features is locale dependent.
 * For example, in Czech, "e" and "f" are considered primary differences,
 * while "e" and "\u00EA" are secondary differences, "e" and "E" are tertiary
 * differences and "e" and "e" are identical. The following shows how both case
 * and accents could be ignored for US English.
 * \code
 * // Get the Collator for US English and set its strength to PRIMARY
 * UErrorCode success = U_ZERO_ERROR;
 * Collator* usCollator = Collator::createInstance(Locale::getUS(), success);
 * if (U_FAILURE(success)) return;
 * usCollator->setStrength(Collator::PRIMARY);
 * if (usCollator->compare("abc", "ABC") == 0)
 *   cout << "'abc' and 'ABC' strings are equivalent with strength PRIMARY" << endl;
 * delete usCollator;  // the caller owns the returned Collator
 * \endcode
 *
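The effect of the strength setting can be illustrated without ICU by a toy two-level comparison over ASCII strings. This is only a sketch under stated assumptions: `ToyStrength` and `toyCompare` are hypothetical names, case stands in for the tertiary level, and the real collation algorithm is considerably more involved.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy strength levels named after two of the levels described above.
// This is NOT ICU's enum; it exists only for this illustration.
enum class ToyStrength { Primary, Tertiary };

// Hypothetical helper: three-way comparison of two ASCII strings.
// Primary pass: compare base letters, ignoring case.
// Tertiary pass (only if requested and the primary pass ties):
// the first case difference decides, lowercase before uppercase.
// Returns a negative value, zero, or a positive value.
int toyCompare(const std::string& a, const std::string& b, ToyStrength s) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (a.size() != b.size()) return a.size() < b.size() ? -1 : 1;
    if (s == ToyStrength::Primary) return 0;  // case is not significant
    for (std::size_t i = 0; i < n; ++i) {
        const bool la = std::islower(static_cast<unsigned char>(a[i])) != 0;
        const bool lb = std::islower(static_cast<unsigned char>(b[i])) != 0;
        if (la != lb) return la ? -1 : 1;
    }
    return 0;
}
```

With this sketch, `toyCompare("abc", "ABC", ToyStrength::Primary)` yields zero, while the same call with `ToyStrength::Tertiary` yields a negative value, mirroring the behaviour described for PRIMARY and TERTIARY.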
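 * The getSortKey methods convert a string to a series of bytes that can be
 * compared using strcmp()-style comparisons. Another set of APIs returns a
 * CollationKey object that wraps the sort key bytes instead of returning the
 * bytes themselves.
 *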
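The idea behind a sort key, a byte string whose plain bytewise comparison reproduces the collation order, can be sketched with a toy two-level key for ASCII text. The names `toySortKey` and `compareKeys` are hypothetical, the key layout is an assumption made for this illustration, and it is not the byte format ICU actually emits.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical helper: build a toy sort key for a printable-ASCII string.
// Layout (an assumption for this sketch): case-folded letters, then a 0x01
// separator, then one case byte per character (0x02 = not uppercase,
// 0x03 = uppercase). The separator is lower than any printable character,
// so a string sorts before any longer string it is a primary prefix of.
std::string toySortKey(const std::string& s) {
    std::string primary;
    std::string tertiary;
    for (unsigned char c : s) {
        primary.push_back(static_cast<char>(std::tolower(c)));
        tertiary.push_back(std::isupper(c) ? '\x03' : '\x02');
    }
    return primary + '\x01' + tertiary;
}

// Keys contain no zero bytes, so strcmp() compares them in full.
int compareKeys(const std::string& ka, const std::string& kb) {
    return std::strcmp(ka.c_str(), kb.c_str());
}
```

A key is computed once per string and can then be compared many times cheaply, which is the usual motivation for sort keys when the same strings take part in many comparisons.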
 * Note: Collators with different Locale and strength settings will return
 * different sort orders for the same set of strings. Locales have specific
 * collation rules, and the way in which secondary and tertiary differences
 * are taken into account, for example, will result in a different sorting
 * order for the same strings.
 *
 * Example of use:
 * \code
 * char16_t ABC[] = {0x41, 0x42, 0x43, 0};  // = "ABC"
 * char16_t abc[] = {0x61, 0x62, 0x63, 0};  // = "abc"
 * UErrorCode status = U_ZERO_ERROR;
 * Collator* myCollation =
 *     Collator::createInstance(Locale::getUS(), status);
 * if (U_FAILURE(status)) return;
 * myCollation->setStrength(Collator::PRIMARY);
 * // result would be Collator::EQUAL ("abc" == "ABC")
 * // (no primary difference between "abc" and "ABC")
 * Collator::EComparisonResult result =
 *     myCollation->compare(abc, 3, ABC, 3);
 * myCollation->setStrength(Collator::TERTIARY);
 * // result would be Collator::LESS ("abc" <<< "ABC")
 * // (with tertiary difference between "abc" and "ABC")
 * result = myCollation->compare(abc, 3, ABC, 3);
 * \endcode
 *
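How the chosen strength changes the order of a whole list, not just one pairwise result, can be sketched with two toy comparators and `std::stable_sort`. The names `primaryLess`, `tertiaryLess` and `sortedBy` are hypothetical and handle ASCII only. They stand in for collators configured at different strengths and are not part of any library.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy "primary strength" ordering: case is ignored entirely.
bool primaryLess(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        });
}

// Toy "tertiary strength" ordering: primary order first; on a primary tie
// the first case difference decides, lowercase before uppercase.
bool tertiaryLess(const std::string& a, const std::string& b) {
    if (primaryLess(a, b)) return true;
    if (primaryLess(b, a)) return false;
    // A primary tie implies both strings have the same length.
    for (std::size_t i = 0; i < a.size(); ++i) {
        const bool ua = std::isupper(static_cast<unsigned char>(a[i])) != 0;
        const bool ub = std::isupper(static_cast<unsigned char>(b[i])) != 0;
        if (ua != ub) return !ua;
    }
    return false;
}

// Returns a sorted copy; stable_sort keeps the input order of elements
// that the comparator treats as equivalent.
std::vector<std::string> sortedBy(
        std::vector<std::string> v,
        bool (*less)(const std::string&, const std::string&)) {
    std::stable_sort(v.begin(), v.end(), less);
    return v;
}
```

For the input `ABC`, `abd`, `abc`, the primary ordering leaves `ABC` ahead of `abc` because the two are equivalent and the sort is stable, whereas the tertiary ordering places `abc` first.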
     * Use CollationKey::equals or CollationKey::compare to compare the
     * generated sort keys.
     * If the source string is null, a null collation key will be returned.
     *
     * Note that generating and comparing sort keys is often less efficient
     * than comparing the strings directly.
     * For more details, see the ICU User Guide.
     *
     * @param source the source string to be transformed into a sort key.
     * @param key the collation key to be filled in
     * @param status the error code status.
     * @return the collation key of the string based on the collation rules.
     * @see CollationKey#compare
     * @stable ICU 2.0
     */
    virtual CollationKey& getCollationKey(const UnicodeString& source,
                                          CollationKey& key,
                                          UErrorCode& status) const = 0;

    /**
     * Transforms the string into a series of characters that can be compared
     * with CollationKey::compareTo. It is not possible to restore the original
     * string from the chars in the sort key.
     *
     * Use CollationKey::equals or CollationKey::compare to compare the
     * generated sort keys.
     *
     * If the source string is null, a null collation key will be returned.
     *
     * Note that generating and comparing sort keys is often less efficient
     * than comparing the strings directly.
     * For more details, see the ICU User Guide.
     *
     * @param source the source string to be transformed into a sort key.
     * @param sourceLength length of the source string
     * @param key the collation key to be filled in
     * @param status the error code status.
     * @return the collation key of the string based on the collation rules.
     * @see CollationKey#compare
     * @stable ICU 2.0
     */
    virtual CollationKey& getCollationKey(const char16_t* source,
                                          int32_t sourceLength,
                                          CollationKey& key,
                                          UErrorCode& status) const = 0;

    /**
     * Generates the hash code for the collation object
     * @stable ICU 2.0
     */
    virtual int32_t hashCode(void) const = 0;

#ifndef U_FORCE_HIDE_DEPRECATED_API
    /**
     * Gets the locale of the Collator
     *
     * @param type can be either requested, valid or actual locale. For more
     *             information see the definition of ULocDataLocaleType in
     *             uloc.h
     * @param status the error code status.
     * @return locale where the collation data lives. If the collator
     *         was instantiated from rules, locale is empty.
     * @deprecated ICU 2.8 This API is under consideration for revision
     * in ICU 3.0.
     */
    virtual Locale getLocale(ULocDataLocaleType type, UErrorCode& status) const = 0;
#endif  // U_FORCE_HIDE_DEPRECATED_API

    /**
     * Convenience method for comparing two strings based on the collation rules.
     * @param source the source string to be compared.
     * @param target the target string to be compared with.
     * @return true if the first string is greater than the second one,
     *         according to the collation rules. false, otherwise.
     * @see Collator#compare
     * @stable ICU 2.0
     */
    UBool greater(const UnicodeString& source, const UnicodeString& target) const;

    /**
     * Convenience method for comparing two strings based on the collation rules.
     * @param source the source string to be compared.
     * @param target the target string to be compared with.
     * @return true if the first string is greater than or equal to the second
     *         one, according to the collation rules. false, otherwise.
     * @see Collator#compare
     * @stable ICU 2.0
     */
    UBool greaterOrEqual(const UnicodeString& source, const UnicodeString& target) const;

    /**
     * Convenience method for comparing two strings based on the collation rules.
     * @param source the source string to be compared.
     * @param target the target string to be compared with.
     * @return true if the strings are equal according to the collation rules.
     *         false, otherwise.
     * @see Collator#compare
     * @stable ICU 2.0
     */
    UBool equals(const UnicodeString& source, const UnicodeString& target) const;

#ifndef U_FORCE_HIDE_DEPRECATED_API
    /**
     * Determines the minimum strength that will be used in comparison or
     * transformation.
     *
     * E.g. with strength == SECONDARY, tertiary differences are ignored.
     *
     * E.g. with strength == PRIMARY, secondary and tertiary differences
     * are ignored.
     * @return the current comparison level.
     * @see Collator#setStrength
     * @deprecated ICU 2.6 Use getAttribute(UCOL_STRENGTH...) instead
     */
    virtual ECollationStrength getStrength(void) const;

    /**
     * Sets the minimum strength to be used in comparison or transformation.
     *
     * \code
     * UErrorCode status = U_ZERO_ERROR;
     * Collator* myCollation = Collator::createInstance(Locale::getUS(), status);
     * if (U_FAILURE(status)) return;
     * myCollation->setStrength(Collator::PRIMARY);
     * // result will be "abc" == "ABC"
     * // secondary and tertiary differences will be ignored
     * Collator::EComparisonResult result = myCollation->compare("abc", "ABC");
     * \endcode
     *
     * The reordering codes are a combination of script codes and reorder codes.
     * @param reorderCodes An array of script codes in the new order. This can be NULL if the
     * length is also set to 0. An empty array will clear any reordering codes on the collator.
     * @param reorderCodesLength The length of reorderCodes.
     * @param status error code
     * @see ucol_setReorderCodes
     * @see Collator#getReorderCodes
     * @see Collator#getEquivalentReorderCodes
     * @see UScriptCode
     * @see UColReorderCode
     * @stable ICU 4.8
     */
    virtual void setReorderCodes(const int32_t* reorderCodes,
                                 int32_t reorderCodesLength,
                                 UErrorCode& status);

    /**
     * Retrieves the reorder codes that are grouped with the given reorder code. Some reorder
     * codes will be grouped and must reorder together.
     * Beginning with ICU 55, scripts only reorder together if they are primary-equal,
     * for example Hiragana and Katakana.
     *
     * @param reorderCode The reorder code to determine equivalence for.
     * @param dest The array to fill with the script equivalence reordering codes.
     * @param destCapacity The length of dest. If it is 0, then dest may be NULL and the
     * function will only return the length of the result without writing any codes (pre-flighting).
     * @param status A reference to an error code value, which must not indicate
     * a failure before the function call.
     * @return The length of the reordering code equivalence array.
     * @see ucol_setReorderCodes
     * @see Collator#getReorderCodes
     * @see Collator#setReorderCodes
     * @see UScriptCode
     * @see UColReorderCode
     * @stable ICU 4.8
     */
    static int32_t U_EXPORT2 getEquivalentReorderCodes(int32_t reorderCode,
                                                       int32_t* dest,
                                                       int32_t destCapacity,
                                                       UErrorCode& status);

    /**
     * Get name of the object for the desired Locale, in the desired language
     * @param objectLocale must be from getAvailableLocales
     * @param displayLocale specifies the desired locale for output
     * @param name the fill-in parameter of the return value
     * @return display-able name of the object for the object locale in the
     *         desired language
     * @stable ICU 2.0
     */
    static UnicodeString& U_EXPORT2 getDisplayName(const Locale& objectLocale,
                                                   const Locale& displayLocale,
                                                   UnicodeString& name);

    /**
     * Get name of the object for the desired Locale, in the language of the
     * default locale.
     * @param objectLocale must be from getAvailableLocales
     * @param name the fill-in parameter of the return value
     * @return name of the object for the desired locale in the default language
     * @stable ICU 2.0
     */
    static UnicodeString& U_EXPORT2 getDisplayName(const Locale& objectLocale,
                                                   UnicodeString& name);

    /**
     * Get the set of Locales for which Collations are installed.
     *
     * Note this does not include locales supported by registered collators.
     * If collators might have been registered, use the overload of getAvailableLocales
     * that returns a StringEnumeration.
     * If standard locale display names are sufficient, Collator instances can
     * be registered using registerInstance instead.
     *
     * Note: if the collators are to be used from C APIs, they must be instances
     * of RuleBasedCollator.