HMG-UNICODE
Unicode Documentation

Since version HMG.3.1.0 (2012/11/25), HMG supports either both the ANSI and
Unicode character sets, or only the ANSI character set (for compatibility with
previous versions), depending on the option chosen when the library is built.
By default, HMG supports both the ANSI and Unicode character sets (see
INCLUDE\SET_COMPILE_HMG_UNICODE.CH).
Unicode is the current standard for character sets. Microsoft says in its
documentation:
“Unicode is a worldwide character encoding standard that provides a unique
number to represent each character used in modern computing, including
technical symbols and special characters used in publishing. Unicode is
required by modern standards, such as XML and ECMAScript (JavaScript), and is
the official mechanism for implementing ISO/IEC 10646 (UCS: Universal
Character Set). It is supported by many operating systems, all modern
browsers, and many other products. New Windows applications should use Unicode
to avoid the inconsistencies of varied code pages and to aid in simplifying
localization.”
HMG-Unicode is therefore the future of xBase programming for Windows.
Since version HMG.3.2.0 (2013/12/08), HMG-Unicode is considered stable.
HMG-Unicode requires the encoding to be set to UTF-8 in your text editor for
all source code files that contain strings in languages using Unicode
characters. See Main.UNI.Demo in the SAMPLES folder for an example with
Unicode characters in the Tamil language.
# General Functions/Commands
- HMG_SupportUnicode() --> Returns .T. or .F.
  - Returns .T. only if HMG is compiled to support both the ANSI and Unicode
    character sets.
- HMG_CharSetName()
  - Returns "UNICODE" if HMG is compiled to support both the ANSI and Unicode
    character sets.
  - Returns "ANSI" if HMG is compiled to support ONLY the ANSI character set.
- SET CODEPAGE TO UNICODE: sets the character code page to UTF-8 (Unicode).
  - If HMG is compiled to support the ANSI/Unicode character sets, UTF-8 is
    the default code page.
- HMG_IsCurrentCodePageUnicode() --> Returns .T. if the current code page is
  UTF-8.
- IF HMG SUPPORT UNICODE [ RUN | STOP ]: a safety command that avoids errors
  in program execution/compilation for programmers who use the HMG library
  both with and without support for the Unicode character set.
  - RUN: only runs the program if the HMG library supports the Unicode
    character set.
  - STOP: stops the execution of the program if the HMG library supports the
    Unicode character set (useful when the COMPILE_HMG_UNICODE directive is
    deactivated to build the HMG library).
- Note: Programs written entirely in ANSI can be compiled with HMG-UNICODE by
  setting the appropriate ANSI code page at the beginning of the MAIN()
  function, e.g. SET CODEPAGE TO SPANISH. There is no need to disable the
  COMPILE_HMG_UNICODE directive (in the file
  INCLUDE\SET_COMPILE_HMG_UNICODE.CH) and rebuild the HMG library. Hybrid
  programs must switch between the appropriate ANSI code page and the UTF-8
  code page as needed.
- Remember: To develop applications that support the ANSI/UNICODE character
  sets, you should replace ALL functions in your programs that ONLY support
  the ANSI character set with their ANSI/UNICODE equivalents.
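As an illustration of how the functions and commands above might fit together, here is a minimal sketch of a startup check. It uses only the names documented in this section plus `MsgStop()`, `MsgInfo()`, `hb_eol()` and `iif()`; the message texts and the decision to stop on an ANSI-only build are assumptions made for the example, not a prescribed pattern.

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL cInfo

   // Stop early when the HMG library was built for ANSI only.
   // (Stopping here is a choice made for this sketch.)
   IF ! HMG_SupportUnicode()
      MsgStop( "This program needs an HMG library built with Unicode support." )
      RETURN NIL
   ENDIF

   // UTF-8 is documented as the default code page on a Unicode build;
   // setting it explicitly only makes the intent visible.
   SET CODEPAGE TO UNICODE

   cInfo := "Character set: " + HMG_CharSetName() + hb_eol() + ;
            "UTF-8 code page active: " + ;
            iif( HMG_IsCurrentCodePageUnicode(), "yes", "no" )

   MsgInfo( cInfo, "HMG Unicode check" )

RETURN NIL
```

A program written entirely in ANSI would instead select its ANSI code page at this point, for example with SET CODEPAGE TO SPANISH as described in the note above.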
# Alternative string functions that support ANSI/Unicode character set
ANSI/UNICODE <=> ANSI Only
- HMG_LEN() <=> LEN()
- HMG_LOWER() <=> LOWER()
- HMG_UPPER() <=> UPPER()
- HMG_PADC() <=> PADC()
- HMG_PADL() <=> PADL()
- HMG_PADR() <=> PADR()
- HMG_ISALPHA() <=> ISALPHA()
- HMG_ISDIGIT() <=> ISDIGIT()
- HMG_ISLOWER() <=> ISLOWER()
- HMG_ISUPPER() <=> ISUPPER()
- HMG_ISALPHANUMERIC() <=> RETURN (ISALPHA(c) .OR. ISDIGIT(c))
- (*) HB_USUBSTR() <=> SUBSTR()
- (*) HB_ULEFT() <=> LEFT()
- (*) HB_URIGHT() <=> RIGHT()
- (*) HB_UAT() <=> AT()
- (*) HB_UTF8RAT() <=> RAT()
- (*) HB_UTF8STUFF() <=> STUFF()

(*) Harbour native functions
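To see why the replacement matters, the following sketch puts an ANSI-only function next to its ANSI/UNICODE equivalent on the same string. It assumes a Unicode build with the UTF-8 code page active, and it builds the non-ASCII character with `HB_UCHAR()` so the example does not depend on how the source file is saved. The sample word is arbitrary. No expected values are given, since the exact results depend on the build and the active code page.

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL cWord

   SET CODEPAGE TO UNICODE

   // "a" + U+00F1 (small n with tilde) + "o"
   cWord := "a" + HB_UCHAR( 241 ) + "o"

   // LEN() is listed as ANSI only and HMG_LEN() as ANSI/UNICODE, so the
   // two results may differ for text that contains non-ASCII characters.
   MsgInfo( "LEN(): "       + hb_ntos( LEN( cWord ) )     + hb_eol() + ;
            "HMG_LEN(): "   + hb_ntos( HMG_LEN( cWord ) ) + hb_eol() + ;
            "HMG_UPPER(): " + HMG_UPPER( cWord ), ;
            "String functions" )

RETURN NIL
```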
# Gets Unicode text value
- HB_UCODE ( cUnicodeCharacter ) --> Return nCode
- HB_UCHAR ( nCode ) --> Return cUnicodeCharacter
- HMG_GetUnicodeValue ( cUnicodeText ) --> Return array { nCode1, nCode2, ..., nCodeN }
- HMG_GetUnicodeCharacter ( { nCode1, nCode2, ..., nCodeN } ) --> Return cUnicodeText
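The signatures above suggest that the two HMG functions are inverses of each other. Below is a minimal sketch of a round trip, assuming a Unicode build with the UTF-8 code page active. The two Greek letters are arbitrary sample data, and `hb_ValToExp()` is used only to display the array.

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL cText, aCodes, cBack

   SET CODEPAGE TO UNICODE

   // Sample text: U+03B1 and U+03B2 (Greek small alpha and beta)
   cText := HB_UCHAR( 945 ) + HB_UCHAR( 946 )

   // Text to array of code values, then back to text
   aCodes := HMG_GetUnicodeValue( cText )
   cBack  := HMG_GetUnicodeCharacter( aCodes )

   MsgInfo( "Codes: " + hb_ValToExp( aCodes ) + hb_eol() + ;
            "Round trip equal: " + iif( cBack == cText, "yes", "no" ), ;
            "Unicode values" )

RETURN NIL
```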
# UTF8 functions
- HMG_IsUTF8 ( cString ) --> lBoolean
- HMG_IsUTF8WithBOM ( cString ) --> lBoolean
- HMG_UTF8RemoveBOM ( cString ) --> cString
- HMG_UTF8InsertBOM ( cString ) --> cString
- HMG_UNICODE_TO_ANSI ( cTextUNICODE ) --> cTextANSI
- HMG_ANSI_TO_UNICODE ( cTextANSI ) --> cTextUNICODE
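One plausible use of these functions is normalizing text loaded from a file before it is shown in a control. The sketch below is an assumption about usage, not a documented recipe: the file name `notes.txt` is hypothetical, and treating any content that is not valid UTF-8 as ANSI text is a simplification chosen for the example.

```harbour
#include "hmg.ch"

FUNCTION Main()

   LOCAL cFile := "notes.txt"            // hypothetical file name
   LOCAL cText

   SET CODEPAGE TO UNICODE

   cText := hb_MemoRead( cFile )         // returns "" if the file cannot be read

   IF HMG_IsUTF8WithBOM( cText )
      // Drop the byte order mark so it is not treated as part of the text
      cText := HMG_UTF8RemoveBOM( cText )
   ELSEIF ! HMG_IsUTF8( cText )
      // Assumption for this sketch: anything that is not UTF-8 is ANSI text
      cText := HMG_ANSI_TO_UNICODE( cText )
   ENDIF

   MsgInfo( cText, cFile )

RETURN NIL
```

When writing the text back for tools that expect a byte order mark, HMG_UTF8InsertBOM() would be the counterpart to call before saving.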
# Unicode functions
- HMG_StrCmp ( cText1 , cText2 , [ lCaseSensitive ] ) --> CmpValue