Performing conversions

ULS provides two sets of functions for converting text to or from UCS-2.

Care should be taken as these two sets of functions differ slightly in their parameters. In particular, the UniStr*Ucs functions specify the output buffer first, followed by the input buffer. Conversely, the UniUconv*Ucs functions specify the input buffer parameters first, followed by the output buffer parameters.

In addition, the parameters used to specify length behave somewhat differently between the two sets of functions; see the following sections for details.

UniStrToUcs and UniStrFromUcs

These functions may be used to convert a string as a single operation, using relatively straightforward syntax:

UniStrToUcs: Converts a multi-byte codepage string to a UCS-2 string.
UniStrFromUcs: Converts a UCS-2 string to a multi-byte codepage string.

Each of these functions takes the following parameters:

A UconvObject which defines the conversion.
An output buffer, which must have been previously allocated. This is a UniChar string in the case of UniStrToUcs, or a normal string in the case of UniStrFromUcs.
An input buffer, which must have been previously allocated. This is a normal string in the case of UniStrToUcs, or a UniChar string in the case of UniStrFromUcs.
An integer value indicating the total length of the output buffer (in bytes or UniChars, as applicable). This must include one position for the terminating null character.

If the function returns successfully, the converted string is placed in the output buffer; the other parameters are unchanged. If an error is encountered, the conversion aborts and an error code is returned. Character substitution is always enabled, regardless of the UconvObject's attributes.
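The parameter order and the length convention can be illustrated without the ULS headers. The following is a minimal, portable sketch: sketch_str_to_ucs is a hypothetical stand-in written for this example, not a ULS function. It handles 7-bit ASCII only, and merely mirrors the shape of UniStrToUcs as described above (output buffer first, then input buffer, then the total output length including the terminating null). The ucs2_t type and the return codes are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned short ucs2_t;   /* stand-in for a 16-bit UniChar (assumption) */

/* Hypothetical return codes; the real ULS error codes are not modelled here. */
enum { SKETCH_OK = 0, SKETCH_BUFFER_FULL = 1, SKETCH_BAD_INPUT = 2 };

/* Hypothetical stand-in that mirrors the UniStrToUcs parameter order:
 * output buffer, input buffer, then the TOTAL output length in ucs2_t
 * units, including the position for the terminating null. */
int sketch_str_to_ucs(ucs2_t *out, const char *in, int out_size)
{
    size_t len, i;

    assert(out != NULL && in != NULL);
    len = strlen(in);

    /* The output needs len characters plus one terminating null. */
    if (out_size < 1 || len + 1 > (size_t)out_size)
        return SKETCH_BUFFER_FULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c > 0x7F)
            return SKETCH_BAD_INPUT;   /* a real codepage table would be needed */
        out[i] = (ucs2_t)c;
    }
    out[len] = 0;                      /* terminating null */
    return SKETCH_OK;
}
```

Under this convention a five-character string such as "hello" needs an output buffer of six positions, and 6 is the value passed as the length; passing 5 makes the sketch report that the buffer is too small.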

UniUconvToUcs and UniUconvFromUcs

These functions are also used to convert strings, but with a greater level of control and error recovery:

UniUconvToUcs: Converts a multi-byte codepage string to a UCS-2 string.
UniUconvFromUcs: Converts a UCS-2 string to a multi-byte codepage string.

Each of these functions takes the following parameters:

A UconvObject which defines the conversion (including whether or not substitution is enabled).
A pointer to a previously allocated input buffer: this is a pointer to a normal string in the case of UniUconvToUcs, or a pointer to a UniChar string in the case of UniUconvFromUcs.
A pointer to a size_t value indicating "input length": this indicates the number of bytes (for UniUconvToUcs) or UniChars (for UniUconvFromUcs) in the input buffer that must still be converted. This should initially be set to the length of the input buffer (not including the terminating null character).
A pointer to a previously allocated output buffer: this is a pointer to a UniChar string in the case of UniUconvToUcs, or a pointer to a normal string in the case of UniUconvFromUcs.
A pointer to a size_t value indicating "output length": this indicates the number of empty character positions available in the output buffer. This length does not include the position allocated for a terminating null character; it should initially be set to one less than the total length of the output buffer (in bytes or UniChars, as applicable).
A pointer to a size_t value containing a substitution count.

When the function returns, the converted string is placed in the output buffer, and the substitution count will contain the number of character substitutions performed. If an error is encountered during conversion, an error code will be returned, and the parameters will be modified as follows:

The output buffer will contain all characters successfully converted up to the point when the error occurred.
The output length value will be decremented to indicate the number of unassigned character positions still available in the output buffer.
The pointer to the input buffer will be incremented to point to the first character which has not yet been converted.
The input length value will be decremented to indicate the number of characters in the input buffer which have not yet been converted.
The substitution count will contain the number of substitutions performed up to the point when the error occurred.

If the conversion completes with no errors, the input length value will be set to 0 when the function returns.
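Because the pointers and lengths are updated in place, a caller can resume a conversion that stopped part-way, for example when the output buffer filled up. The following portable sketch illustrates that pattern. sketch_uconv_from_ucs is a hypothetical stand-in with the same parameter shape as UniUconvFromUcs (input pointer, input length, output pointer, output length, substitution count), not the ULS function itself: it converts to 7-bit ASCII only and always substitutes '?'. Advancing the output pointer as well is an assumption of this sketch, modelled on iconv-style interfaces; the ucs2_t type and the return codes are also invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short ucs2_t;   /* stand-in for a 16-bit UniChar (assumption) */

enum { SKETCH_OK = 0, SKETCH_BUFFER_FULL = 1 };   /* hypothetical codes */

/* Hypothetical stand-in: every pointer and length is updated in place. */
int sketch_uconv_from_ucs(ucs2_t **in, size_t *in_left,
                          char **out, size_t *out_left, size_t *subst)
{
    assert(in && *in && in_left && out && *out && out_left && subst);
    *subst = 0;
    while (*in_left > 0) {
        ucs2_t c;
        if (*out_left == 0)
            return SKETCH_BUFFER_FULL;   /* *in now points at the first unconverted character */
        c = **in;
        if (c > 0x7F) {                  /* not representable in 7-bit ASCII */
            c = '?';
            (*subst)++;
        }
        **out = (char)c;
        (*in)++;  (*in_left)--;
        (*out)++; (*out_left)--;
    }
    return SKETCH_OK;                    /* *in_left is 0 at this point */
}

/* Convert src_len characters through a deliberately small chunk buffer,
 * resuming after every "buffer full" result. Output that does not fit in
 * dst is dropped. Returns the total number of substitutions. */
size_t sketch_convert_all(ucs2_t *src, size_t src_len, char *dst, size_t dst_size)
{
    char   chunk[4];                     /* 3 usable positions + terminating null */
    size_t in_left = src_len;            /* excludes the terminating null */
    size_t total_subst = 0, used = 0;
    int    rc;

    assert(dst != NULL && dst_size > 0);
    do {
        char  *out = chunk;
        size_t out_left = sizeof(chunk) - 1;   /* one less than the total size */
        size_t subst = 0, n, i;

        rc = sketch_uconv_from_ucs(&src, &in_left, &out, &out_left, &subst);
        n = (sizeof(chunk) - 1) - out_left;    /* characters written this pass */
        chunk[n] = '\0';
        for (i = 0; i < n && used + 1 < dst_size; i++)
            dst[used++] = chunk[i];
        total_subst += subst;
    } while (rc == SKETCH_BUFFER_FULL);

    dst[used] = '\0';
    return total_subst;
}
```

Each pass resets only the output pointer and output length; the input pointer and input length carry over from the previous call, which is what allows the conversion to continue where it stopped.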

Character substitution

During conversion, particularly when converting from UCS-2 to another codepage, the input string often contains a character for which no equivalent exists in the target codepage. When this happens, the unsupported character is normally replaced by a generic symbol or "substitution character". This process is known as character substitution.

Every codepage (as well as UCS-2) has its own default substitution character which is normally used when converting text to that codepage. For example, under codepage 850 the default substitution character is 0x7F ("⌂"). The default substitution character for UCS-2 is U+FFFD (the Unicode replacement character).

You may specify a different substitution character in the UconvObject, either through the conversion specifier passed to UniCreateUconvObject, or subsequently through UniSetUconvObject. The substitution character should be a displayable glyph under the target codepage.

When using the UniUconvToUcs and UniUconvFromUcs functions, substitution may be disabled by setting the attributes of the UconvObject accordingly. However, if substitution is disabled, these functions will return with an error condition (with the results described above) whenever an unsupported character is encountered.
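The difference between the two modes can be shown with a small portable model. This is only a sketch: the sketch_conv structure, sketch_from_ucs, and the return codes are hypothetical names invented for this example and do not correspond to the UconvObject attributes or to any ULS function. The target "codepage" is assumed to be 7-bit ASCII, so any character above 0x7F counts as unsupported.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short ucs2_t;   /* stand-in for a 16-bit UniChar (assumption) */

enum { SKETCH_OK = 0, SKETCH_ILLEGAL_CHAR = 1 };   /* hypothetical codes */

/* Hypothetical conversion settings, loosely modelling the idea of a
 * substitution on/off attribute and a configurable substitution character. */
typedef struct {
    int  substitute;   /* nonzero: replace unsupported characters */
    char subchar;      /* replacement character in the target codepage */
} sketch_conv;

/* Convert up to in_len characters to 7-bit ASCII. With substitution
 * disabled, stop at the first unsupported character and report an error;
 * *converted then holds the number of characters converted before it.
 * If the output buffer is too small, the result is silently truncated. */
int sketch_from_ucs(const sketch_conv *cv, const ucs2_t *in, size_t in_len,
                    char *out, size_t out_size,
                    size_t *converted, size_t *subst)
{
    size_t i = 0;
    int rc = SKETCH_OK;

    assert(cv && in && out && converted && subst && out_size > 0);
    *subst = 0;
    while (i < in_len && i + 1 < out_size) {
        ucs2_t c = in[i];
        if (c > 0x7F) {                        /* no ASCII equivalent */
            if (!cv->substitute) {
                rc = SKETCH_ILLEGAL_CHAR;      /* substitution disabled: abort */
                break;
            }
            c = (ucs2_t)(unsigned char)cv->subchar;
            (*subst)++;
        }
        out[i++] = (char)c;
    }
    out[i] = '\0';
    *converted = i;
    return rc;
}
```

With substitution enabled, the three-character input 'A', U+20AC, 'B' yields "A?B" and a substitution count of 1 when '?' is the substitution character. With substitution disabled, the same input stops after 'A' with an error, leaving the caller to decide how to handle the unsupported character.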
