Harbour has at least two ways to write an ANSI character constant in terms of its code point, e.g. CHR(0xA0) and E"\xA0" for no-break space. So I thought it should also have a way to do this for UTF-8. But after looking through the Harbour changelog and other documentation, I did not find one.
Of course, it is possible to simply put the character in quotes. This works well for some characters. But for others, like no-break space, this does not work well at all, since this character looks just like a regular space (U+0020). For Asian characters, another problem is that they can be difficult to read at point sizes usually used for Western characters.
Harbour does have a function HB_UTF8CHR() that converts a numeric code point to its UTF-8 representation. But this is executed only at runtime. So HB_UTF8CHR() of a constant integer is not considered a constant string from Harbour's point of view. It cannot be used in places where a constant is required, such as in an initialization expression for a STATIC variable, or in a CASE statement of a SWITCH block. It is also inefficient to do a conversion at runtime that could instead be done at compile time.
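A minimal sketch of that limitation, assuming the behaviour is as described above. The variable names are made up for illustration, and the rejected line is left commented out so that the file still compiles:

```harbour
// CHR() of a constant is accepted as a STATIC initializer.
STATIC s_cNbspAnsi := Chr( 0xA0 )

// Expected to be rejected by the compiler, because hb_UTF8Chr() is only
// evaluated at runtime and so is not a constant expression:
// STATIC s_cNbspUtf8 := hb_UTF8Chr( 0xA0 )

PROCEDURE Main()

   // The same call is fine where a runtime value is allowed.
   LOCAL cNbsp := hb_UTF8Chr( 0xA0 )

   ? hb_StrToHex( cNbsp )         // C2A0, the UTF-8 bytes of U+00A0
   ? hb_StrToHex( s_cNbspAnsi )   // A0, the single ANSI byte

   RETURN
```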
Fortunately, I discovered that a few functions are evaluated at compile time if they have constant arguments, so their result is also considered a constant. For instance, CHR(99) is a constant, because it is evaluated at compile time, not at runtime. I did some testing and found that the following operators and functions are in this category:
+ // numeric and string
- * / %
^ // including negative and fractional exponents
0x $ == != < <= > >= .T. .Y. .F. .N. ! .NOT. .AND. .OR. {} {=>} {||} E""
ASC() AT() CHR() EMPTY() HB_BITAND() HB_BITNOT() HB_BITOR() HB_BITRESET() HB_BITSET() HB_BITSHIFT() HB_BITTEST() HB_BITXOR() IF() INT() LEN() LOWER() MAX() MIN() UPPER()
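As a rough illustration of what this makes possible, here is a sketch that uses a few entries from the list above in places that require a constant. It assumes the folding works as reported; the names s_cLetter, s_nLimit and cKey are only for the example.

```harbour
// STATIC initializers must be constants, so these lines rely on
// Chr() and Max() being folded at compile time.
STATIC s_cLetter := Chr( 99 )         // "c"
STATIC s_nLimit  := Max( 3, 7 ) * 2   // 14

PROCEDURE Main()

   LOCAL cKey := "c"

   SWITCH cKey
   CASE Chr( 99 )      // behaves like CASE "c" if the call is folded
      ? "matched c"
      EXIT
   OTHERWISE
      ? "no match"
   ENDSWITCH

   ? s_cLetter, s_nLimit

   RETURN
```

If one of these calls were not folded, the compiler should complain about the initializer or the CASE value, which is a quick way to test any other function for this property.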
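The following functions, by contrast, are apparently not in this category, even with constant arguments (note that HB_UTF8CHR() is among them):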
ABS() ALLTRIM() EVAL() EXP() HB_UTF8ASC() HB_UTF8AT() HB_UTF8CHR() HB_UTF8LEFT() HB_UTF8LEN() HB_UTF8RAT() HB_UTF8RIGHT() HB_UTF8SUBSTR() ISALPHA() ISDIGIT() ISLOWER() ISUPPER() LEFT() LOG() LTRIM() MOD() PADC() PADL() PADR() RAT() REPLICATE() RIGHT() ROUND() RTRIM() SPACE() SQRT() STR() STRTRAN() STRZERO() STUFF() SUBSTR() TRANSFORM() TYPE() VAL() VALTYPE()
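Since everything needed for UTF-8 encoding (IF(), CHR(), INT(), and the + / % < operators) is in the first list, a preprocessor macro can build the UTF-8 string from a constant code point at compile time: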
// U( nCodePoint ): UTF-8 encoding of a constant code point, 1 to 4 bytes
#translate U(<c>) => ;
   IF(<c> \< 0x80   , CHR(<c>), ;
   IF(<c> \< 0x0800 , CHR(INT(<c> / 0x40) + 0xC0) + CHR(<c> % 0x40 + 0x80), ;
   IF(<c> \< 0x10000, CHR(INT(<c> / 0x1000) + 0xE0) + CHR(INT(<c> / 0x40) % 0x40 + 0x80) + CHR(<c> % 0x40 + 0x80), ;
   CHR(INT(<c> / 0x40000) + 0xF0) + CHR(INT(<c> / 0x1000) % 0x40 + 0x80) + CHR(INT(<c> / 0x40) % 0x40 + 0x80) + CHR(<c> % 0x40 + 0x80))))
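A small usage sketch, assuming the macro folds as intended and the default (single-byte) codepage is active. It repeats the macro so that the file compiles on its own, uses U() in a STATIC initializer, and compares the result with the runtime hb_UTF8Chr() for one code point in each of the four length ranges. The name s_cNbsp is illustrative only.

```harbour
#translate U(<c>) => ;
   IF(<c> \< 0x80   , CHR(<c>), ;
   IF(<c> \< 0x0800 , CHR(INT(<c> / 0x40) + 0xC0) + CHR(<c> % 0x40 + 0x80), ;
   IF(<c> \< 0x10000, CHR(INT(<c> / 0x1000) + 0xE0) + CHR(INT(<c> / 0x40) % 0x40 + 0x80) + CHR(<c> % 0x40 + 0x80), ;
   CHR(INT(<c> / 0x40000) + 0xF0) + CHR(INT(<c> / 0x1000) % 0x40 + 0x80) + CHR(INT(<c> / 0x40) % 0x40 + 0x80) + CHR(<c> % 0x40 + 0x80))))

// Accepted only if the whole expansion is reduced to a constant string.
STATIC s_cNbsp := U(0xA0)

PROCEDURE Main()

   // Each comparison should print .T. if the macro matches hb_UTF8Chr().
   ? U(0x41)    == hb_UTF8Chr( 0x41 )      // 1 byte:  LATIN CAPITAL LETTER A
   ? U(0xA0)    == hb_UTF8Chr( 0xA0 )      // 2 bytes: NO-BREAK SPACE
   ? U(0x20AC)  == hb_UTF8Chr( 0x20AC )    // 3 bytes: EURO SIGN
   ? U(0x1F600) == hb_UTF8Chr( 0x1F600 )   // 4 bytes: GRINNING FACE

   ? hb_StrToHex( s_cNbsp )                // C2A0

   RETURN
```

The macro does no range checking, so the argument has to be a valid code point; values above 0x10FFFF or in the surrogate range (0xD800 to 0xDFFF) would produce byte sequences that are not valid UTF-8.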
Kevin