HMG-UNICODE
Unicode Documentation

Since version HMG.3.1.0 (2012/11/25), HMG can support both the ANSI and Unicode character sets, or only the ANSI character set (for compatibility with previous versions), depending on the option chosen when the library is built. By default, HMG supports both the ANSI and Unicode character sets (see INCLUDE\SET_COMPILE_HMG_UNICODE.CH).

Unicode is the current standard for character sets. As Microsoft states in its documentation:

“Unicode is a worldwide character encoding standard that provides a unique number to represent each character used in modern computing, including technical symbols and special characters used in publishing. Unicode is required by modern standards, such as XML and ECMAScript (JavaScript), and is the official mechanism for implementing ISO/IEC 10646 (UCS: Universal Character Set). It is supported by many operating systems, all modern browsers, and many other products. New Windows applications should use Unicode to avoid the inconsistencies of varied code pages and to aid in simplifying localization.”

HMG-Unicode is therefore the future of xBase programming for Windows. Since version HMG.3.2.0 (2013/12/08), HMG-Unicode has been considered stable.

HMG-Unicode requires that every source code file containing strings with Unicode characters be saved with 'Encoding in UTF-8' selected in your text editor. See Main.UNI.Demo in the SAMPLES folder for an example that uses Unicode characters in the Tamil language.

# General Functions/Commands

- HMG_SupportUnicode() --> Returns .T. or .F.

  - Returns .T. only if HMG is compiled to support both the ANSI and Unicode character sets.

- HMG_CharSetName() --> Returns the name of the supported character set:

  - "UNICODE" if HMG is compiled to support both the ANSI and Unicode character sets.

  - "ANSI" if HMG is compiled to support ONLY the ANSI character set.

- SET CODEPAGE TO UNICODE - Sets the character code page to UTF-8 (Unicode).

  - If HMG is compiled to support the ANSI/Unicode character set, UTF-8 is the default code page.

- HMG_IsCurrentCodePageUnicode() - Returns .T. if the current code page is UTF-8.

- IF HMG SUPPORT UNICODE [ RUN | STOP ] --> A safety command that avoids errors during program compilation/execution for programmers who use the HMG library both with and without support for the Unicode character set.

  - RUN --> Runs the program only if the HMG library supports the Unicode character set.

  - STOP --> Stops the execution of the program if the HMG library supports the Unicode character set (useful when the COMPILE_HMG_UNICODE directive is deactivated to build the HMG library).

- Note: Programs written entirely in ANSI can easily be compiled with HMG-Unicode by adding the appropriate ANSI code page at the beginning of the MAIN() function, e.g. SET CODEPAGE TO SPANISH. There is no need to disable the COMPILE_HMG_UNICODE directive (in file INCLUDE\SET_COMPILE_HMG_UNICODE.CH) and rebuild the HMG library.

  Hybrid programs must switch between the appropriate ANSI code page and the UTF-8 code page as needed.

- Remember: To develop applications that support the ANSI/Unicode character set, you should replace ALL functions in your programs that support ONLY the ANSI character set with their ANSI/Unicode equivalents.
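A minimal sketch of how the general functions and commands above might be combined at the start of an HMG program: check that the library was built with Unicode support, select UTF-8, and switch temporarily to an ANSI code page in a hybrid program. It relies only on the functions and commands documented in this section plus HMG's MsgInfo()/MsgStop() message boxes; the "legacy ANSI code" step is a hypothetical placeholder.

```harbour
#include "hmg.ch"

FUNCTION Main()

   // Abort early if the HMG library was built without Unicode support
   IF .NOT. HMG_SupportUnicode()
      MsgStop( "This program needs HMG built with COMPILE_HMG_UNICODE." )
      RETURN NIL
   ENDIF

   // UTF-8 is already the default in a Unicode build; set it explicitly anyway
   SET CODEPAGE TO UNICODE

   IF HMG_IsCurrentCodePageUnicode()
      MsgInfo( "Character set: " + HMG_CharSetName() + ", code page: UTF-8" )
   ENDIF

   // Hybrid program: switch to an ANSI code page for legacy processing...
   SET CODEPAGE TO SPANISH
   // ... hypothetical legacy ANSI code would run here ...

   // ... and return to UTF-8 afterwards
   SET CODEPAGE TO UNICODE

RETURN NIL
```

The IF HMG SUPPORT UNICODE command can serve the same purpose as the explicit HMG_SupportUnicode() check; the function form is used here only because its behavior is fully described above.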

# Alternative string functions that support ANSI/Unicode character set

| ANSI/Unicode | ANSI only |
| --- | --- |
| HMG_LEN() | LEN() |
| HMG_LOWER() | LOWER() |
| HMG_UPPER() | UPPER() |
| HMG_PADC() | PADC() |
| HMG_PADL() | PADL() |
| HMG_PADR() | PADR() |
| HMG_ISALPHA() | ISALPHA() |
| HMG_ISDIGIT() | ISDIGIT() |
| HMG_ISLOWER() | ISLOWER() |
| HMG_ISUPPER() | ISUPPER() |
| HMG_ISALPHANUMERIC() | ISALPHA(c) .OR. ISDIGIT(c) |
| (*) HB_USUBSTR() | SUBSTR() |
| (*) HB_ULEFT() | LEFT() |
| (*) HB_URIGHT() | RIGHT() |
| (*) HB_UAT() | AT() |
| (*) HB_UTF8RAT() | RAT() |
| (*) HB_UTF8STUFF() | STUFF() |

(*) Harbour native functions
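The difference between the two columns appears as soon as a string contains non-ASCII characters: in UTF-8, "Ñandú" occupies 7 bytes but has only 5 characters, so an ANSI-only function that works on bytes and its ANSI/Unicode equivalent that works on characters give different results. The sketch below assumes the source file is saved as UTF-8; Harbour's hb_NToS() and hb_eol() are used only to format the message.

```harbour
#include "hmg.ch"

FUNCTION Main()

   // The source file must be saved as UTF-8 for this literal to be UTF-8 text
   LOCAL cText := "Ñandú"
   LOCAL nBytes, nChars

   SET CODEPAGE TO UNICODE

   nBytes := LEN( cText )        // ANSI-only: counts bytes
   nChars := HMG_LEN( cText )    // ANSI/Unicode: counts characters

   MsgInfo( "LEN(): "       + hb_NToS( nBytes ) + hb_eol() + ;
            "HMG_LEN(): "   + hb_NToS( nChars ) + hb_eol() + ;
            "HB_ULEFT(): "  + HB_ULEFT( cText, 2 ) + hb_eol() + ;
            "HMG_UPPER(): " + HMG_UPPER( cText ) )

RETURN NIL
```

Replacing LEFT( cText, 2 ) with HB_ULEFT( cText, 2 ) matters for the same reason: a byte-based cut can split a two-byte character such as "Ñ" in half.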

# Getting Unicode character values

- HB_UCODE ( cUnicodeCharacter ) --> Return nCode

- HB_UCHAR ( nCode ) --> Return cUnicodeCharacter

- HMG_GetUnicodeValue ( cUnicodeText ) --> Return array { nCode1, nCode2, ..., nCodeN }

- HMG_GetUnicodeCharacter ( { nCode1, nCode2, ..., nCodeN } ) --> Return cUnicodeText
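A short sketch of converting between characters and their Unicode code points with the functions above. The numeric values in the comments follow from the Unicode standard (U+20AC is 8364 and U+00D1 is 209); the exact array returned by HMG_GetUnicodeValue() is assumed from its description in this list.

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL nCode, cChar, aCodes, cText

   SET CODEPAGE TO UNICODE

   nCode  := HB_UCODE( "€" )                    // euro sign, U+20AC = 8364
   cChar  := HB_UCHAR( 8364 )                   // back to "€"

   aCodes := HMG_GetUnicodeValue( "Ñu" )        // expected { 209, 117 }
   cText  := HMG_GetUnicodeCharacter( aCodes )  // rebuilds "Ñu"

   MsgInfo( hb_NToS( nCode ) + " " + cChar + " " + cText )

RETURN NIL
```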

# UTF-8 functions

- HMG_IsUTF8 ( cString ) --> lBoolean

- HMG_IsUTF8WithBOM ( cString ) --> lBoolean

- HMG_UTF8RemoveBOM ( cString ) --> cString

- HMG_UTF8InsertBOM ( cString ) --> cString

- HMG_UNICODE_TO_ANSI ( cTextUNICODE ) --> cTextANSI

- HMG_ANSI_TO_UNICODE ( cTextANSI ) --> cTextUNICODE
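One common use of these functions is normalizing text read from a file whose encoding is not known in advance. The sketch below assumes a hypothetical file notes.txt, reads it with Harbour's hb_MemoRead(), and treats anything that is not valid UTF-8 as ANSI text; it illustrates the functions above and is not a complete encoding detector.

```harbour
#include "hmg.ch"

FUNCTION Main()

   // "notes.txt" is a hypothetical input file of unknown encoding
   LOCAL cData := hb_MemoRead( "notes.txt" )

   SET CODEPAGE TO UNICODE

   IF HMG_IsUTF8WithBOM( cData )
      // UTF-8 with a byte order mark: remove the BOM before processing
      cData := HMG_UTF8RemoveBOM( cData )
   ELSEIF .NOT. HMG_IsUTF8( cData )
      // Not valid UTF-8: assume it is ANSI text and convert it
      cData := HMG_ANSI_TO_UNICODE( cData )
   ENDIF

   MsgInfo( cData )

RETURN NIL
```

The BOM check comes first so that a file starting with the UTF-8 byte order mark is never mistaken for ANSI text.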

# Unicode functions

- HMG_StrCmp ( cText1 , cText2 , [ lCaseSensitive ] ) --> CmpValue
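A sketch of a case-insensitive comparison with HMG_StrCmp(). The meaning of CmpValue is not specified above; the code assumes it follows the usual C/Windows convention (0 for equal, negative or positive for the sort order), so verify this against your HMG version.

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL nCmp

   SET CODEPAGE TO UNICODE

   // Third parameter .F. requests a case-insensitive comparison
   nCmp := HMG_StrCmp( "árbol", "ÁRBOL", .F. )

   // Assumption: 0 = equal, < 0 = first sorts before, > 0 = first sorts after
   IF nCmp == 0
      MsgInfo( "Equal (ignoring case)" )
   ELSEIF nCmp < 0
      MsgInfo( "First string sorts before the second" )
   ELSE
      MsgInfo( "First string sorts after the second" )
   ENDIF

RETURN NIL
```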