This API is used to convert codepage or character-encoded data to and from UTF-16. You can open a converter with {@link ucnv_open() }. With that converter, you can get its properties, set options, convert your data, and close the converter.
Since many software programs recognize different converter names for different types of converters, there are other functions in this API to iterate over the converter aliases. The functions {@link ucnv_getAvailableName() }, {@link ucnv_getAlias() } and {@link ucnv_getStandardName() } are some of the more frequently used alias functions to get this information.
When a converter encounters an illegal, irregular, invalid or unmappable character, its default behavior is to use a substitution character to replace the bad byte sequence. This behavior can be changed by using {@link ucnv_setFromUCallBack() } or {@link ucnv_setToUCallBack() } on the converter. The header ucnv_err.h defines many other callback actions that can be used instead of a character substitution.
More information about this API can be found in our User's Guide.
A converter name for ICU 1.5 and above may contain options, like a locale specification, to control the specific behavior of the newly instantiated converter. The meaning of the options depends on the particular converter. If an option is not defined for or recognized by a given converter, then it is ignored.
Options are appended to the converter name string, with a UCNV_OPTION_SEP_CHAR between the name and the first option and also between adjacent options.
If the alias is ambiguous, then the preferred converter is used and the status is set to U_AMBIGUOUS_ALIAS_WARNING.
The conversion behavior and names can vary between platforms. ICU may convert some characters differently from other platforms. Details on this topic are in the User's Guide. Aliases starting with a "cp" prefix have no specific meaning other than being aliases that start with the letters "cp". Please do not associate any meaning with these aliases.
See ucnv_open for the complete details.
Creates a UConverter object from a packageName and a converterName.
The packageName and converterName must point to an ICU udata object, as defined by udata_open(packageName, "cnv", converterName, err) or equivalent. Typically, packageName will refer to a (.dat) file, or to a package registered with udata_setAppData(). Using a full file or directory pathname for packageName is deprecated.
The name will NOT be looked up in the alias mechanism, nor will the converter be stored in the converter cache or the alias table. The only way to open further converters is to call this function multiple times, or to use the ucnv_safeClone() function to clone a 'master' converter.
A future version of ICU may add alias table lookups and/or caching to this function.
Example use:

    cnv = ucnv_openPackage("myapp", "myconverter", &err);
Handling of surrogate pairs and supplementary-plane code points: there are two different kinds of codepages that provide mappings for surrogate characters:
Example alias table:

    conv alias1 { STANDARD1 } alias2 { STANDARD1* }
Result of ucnv_getStandardName("conv", "STANDARD1") from the example alias table: "alias2"

@param name original converter name
@param standard name of the standard governing the names; MIME and IANA are such standards
@param pErrorCode result of operation
@return the standard converter name; if a standard converter name cannot be determined, then NULL is returned. Owned by the library.
@stable ICU 2.0

    U_STABLE const char * U_EXPORT2
    ucnv_getStandardName(const char *name, const char *standard, UErrorCode *pErrorCode);

This function returns the internal canonical converter name of the tagged alias. This is the opposite of ucnv_openStandardNames, which returns the tagged alias given the canonical name.
Result of ucnv_getCanonicalName("alias1", "STANDARD1") from the example alias table: "conv"

@return the canonical converter name; if a standard or alias name cannot be determined, then NULL is returned. The returned string is owned by the library.
@see ucnv_getStandardName
@stable ICU 2.4

    U_STABLE const char * U_EXPORT2
    ucnv_getCanonicalName(const char *alias, const char *standard, UErrorCode *pErrorCode);

Returns the current default converter name. If you want to open a default converter, you do not need to use this function: it is faster to pass NULL as the name to ucnv_open to get the default converter.

If U_CHARSET_IS_UTF8 is defined to 1 in utypes.h, then this function always returns "UTF-8".

@return the current default converter name. Storage owned by the library.
@see ucnv_setDefaultName
@stable ICU 2.0

    U_STABLE const char * U_EXPORT2
    ucnv_getDefaultName(void);

    #ifndef U_HIDE_SYSTEM_API

This function is not thread safe. DO NOT call this function when ANY ICU function is being used from more than one thread! This function sets the current default converter name. If this function needs to be called, it should be called during application initialization. Most of the time, the results from ucnv_getDefaultName() or ucnv_open with a NULL string argument are sufficient for your application.

If U_CHARSET_IS_UTF8 is defined to 1 in utypes.h, then this function does nothing.

@param name the converter name to be the default (must be known by ICU).
@see ucnv_getDefaultName
@system
@stable ICU 2.0

    U_STABLE void U_EXPORT2
    ucnv_setDefaultName(const char *name);
    #endif /* U_HIDE_SYSTEM_API */

Fixes the backslash character mismapping. For example, in SJIS, the backslash character in the ASCII portion is also used to represent the yen currency sign. When mapping from Unicode character 0x005C, it is unclear whether to map the character back to yen or backslash in SJIS.
This function takes the input buffer and replaces all the yen sign characters with backslashes. This is necessary when the buffer holds a file name that is used to open a file on Windows. The function tests the converter to see whether such a mapping is required. You can sometimes avoid using this function by using the correct version of Shift-JIS.

@param cnv the converter representing the target codepage.
@param source the input buffer to be fixed
@param sourceLen the length of the input buffer
@see ucnv_isAmbiguous
@stable ICU 2.0

    U_STABLE void U_EXPORT2
    ucnv_fixFileSeparator(const UConverter *cnv, UChar *source, int32_t sourceLen);

Determines whether the converter contains ambiguous mappings of the same character.

@param cnv the converter to be tested
@return TRUE if the converter contains ambiguous mappings of the same character, FALSE otherwise.
@stable ICU 2.0

    U_STABLE UBool U_EXPORT2
    ucnv_isAmbiguous(const UConverter *cnv);

Sets whether the converter uses fallback mappings. Regardless of this flag, the converter will always use fallbacks from Unicode Private Use code points, as well as reverse fallbacks (to Unicode). For details see ".ucm File Format" in the Conversion Data chapter of the ICU User Guide: http://www.icu-project.org/userguide/conversion-data.html#ucmformat

@param cnv the converter to set the fallback mapping usage on.
@param usesFallback TRUE if the user wants the converter to take advantage of the fallback mapping, FALSE otherwise.
@stable ICU 2.0
@see ucnv_usesFallback

    U_STABLE void U_EXPORT2
    ucnv_setFallback(UConverter *cnv, UBool usesFallback);

Determines whether the converter uses fallback mappings. This flag has restrictions; see ucnv_setFallback().

@param cnv the converter to be tested
@return TRUE if the converter uses fallback, FALSE otherwise.
@stable ICU 2.0
@see ucnv_setFallback

    U_STABLE UBool U_EXPORT2
    ucnv_usesFallback(const UConverter *cnv);

Detects Unicode signature byte sequences at the start of the byte stream and returns the charset name of the indicated Unicode charset. NULL is returned when no Unicode signature is recognized. The number of bytes in the signature is output as well.

The caller can ucnv_open() a converter using the charset name. The first code unit (UChar) from the start of the stream will be U+FEFF (the Unicode BOM/signature character) and can usually be ignored.

For most Unicode charsets it is also possible to ignore the indicated number of initial stream bytes and start converting after them. However, there are stateful Unicode charsets (UTF-7 and BOCU-1) for which this will not work. Therefore, it is best to ignore the first output UChar instead of the input signature bytes.
Usage: see the ucnv_detectUnicodeSignature snippet in samples/ucnv/convsamp.cpp.

@param source the source string in which the signature should be detected.
@param sourceLength length of the input string, or -1 if terminated with a NUL byte.
@param signatureLength a pointer to int32_t to receive the number of bytes that make up the signature of the detected UTF; 0 if not detected. Can be a NULL pointer.
@param pErrorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call.
@return the name of the encoding detected; NULL if no encoding is detected.
@stable ICU 2.4

    U_STABLE const char* U_EXPORT2
    ucnv_detectUnicodeSignature(const char* source, int32_t sourceLength, int32_t *signatureLength, UErrorCode *pErrorCode);

Returns the number of UChars held in the converter's internal state because more input is needed for completing the conversion. This function is useful for mapping the semantics of ICU's converter interface to those of iconv; this information is not needed for normal conversion.

@param cnv the converter in which the input is held
@param status ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call.
@return the number of UChars in the state; -1 if an error is encountered.
@stable ICU 3.4

    U_STABLE int32_t U_EXPORT2
    ucnv_fromUCountPending(const UConverter* cnv, UErrorCode* status);

Returns the number of chars held in the converter's internal state because more input is needed for completing the conversion. This function is useful for mapping the semantics of ICU's converter interface to those of iconv; this information is not needed for normal conversion.

@param cnv the converter in which the input is held as internal state
@param status ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call.
@return the number of chars in the state; -1 if an error is encountered.
@stable ICU 3.4

    U_STABLE int32_t U_EXPORT2
    ucnv_toUCountPending(const UConverter* cnv, UErrorCode* status);

Returns whether the charset of the converter has a fixed number of bytes per charset character. Examples are converters of type UCNV_SBCS or UCNV_DBCS. Another example is UTF-32, which always uses 4 bytes per character. A Unicode code point may be represented by more than one UTF-8 or UTF-16 code unit, but a UTF-32 converter encodes each code point with 4 bytes.

Note: This method is not intended to be used to determine whether the charset has a fixed ratio of bytes to Unicode code units for any particular Unicode encoding form.

FALSE is returned, with the UErrorCode indicating the problem, if an error occurs or cnv is NULL.

@param cnv the converter to be tested
@param status ICU error code in/out parameter
@return TRUE if the converter is fixed-width
@stable ICU 4.8

    U_STABLE UBool U_EXPORT2
    ucnv_isFixedWidth(UConverter *cnv, UErrorCode *status);

    #endif
    #endif /*_UCNV*/