Language messages at our library

Creative ideas/suggestions for HMG

Moderator: Rathinagiri

apais
Posts: 440
Joined: Fri Aug 01, 2008 6:03 pm
DBs Used: DBF
Location: uruguay

Re: Language messages at our library

Post by apais »

As I understand it, i18n pulls all strings from a compiled program and writes them to a file where you can substitute them with your own.
That includes library strings.
See how it works in hbmk2.

HTH
Angel
Angel Pais
Web Apps consultant/architect/developer.
HW_apache (webserver modules) co-developer.
HbTron (Html GUI for harbour desktop hybrid apps) co-developer.
https://www.hbtron.com
Pablo César
Posts: 4059
Joined: Wed Sep 08, 2010 1:18 pm
Location: Curitiba - Brasil

Language messages at our library

Post by Pablo César »

Wow Angel, thanks for this useful info. I had never stopped to read carefully about this question of internationalization (I18N) in Harbour. It seems so cool!

I've found following information which I wish to share:

Internationalization Concept (i18n) (by Margaret Rouse): Internationalization (sometimes shortened to "I18N", meaning "I - eighteen letters - N") is the process of planning and implementing products and services so that they can easily be adapted to specific local languages and cultures, a process called localization. The internationalization process is sometimes called translation or localization enablement. Enablement can include:
  • Allowing space in user interfaces (for example, hardware labels, help pages, and online menus) for translation into languages that require more characters
  • Developing with products (such as Web editors or authoring tools) that can support international character sets ( Unicode )
  • Creating print or Web site graphic images so that their text labels can be translated inexpensively
  • Using written examples that have global meaning
  • For software, ensuring data space so that messages can be translated from languages with single-byte character codes (such as English) into languages requiring multiple-byte character codes (such as Japanese Kanji)

Harbour support for i18n
Massimo Belgrano
http://en.wikipedia.org/wiki/Internatio ... calization
http://www.debian.org/doc/manuals/intro-i18n/
https://github.com/vszakats/harbour-cor ... b-diff.txt
Harbour -j[<file>] generates an i18n gettext file (.pot)

The Harbour and xHarbour compilers have built-in support for internationalization (I18N), and it's enabled during compilation using the same compile-time switch -j[<file>], but the low-level implementation in the compilers and at runtime is completely different.

The Harbour compiler is very close to 'gettext' in functionality, with some additional extensions.
It recognizes the following functions at compile time:

hb_i18n_gettext( <cText> [, <cDomain> ] )
hb_i18n_gettext_strict( <cText> [, <cDomain> ] )
hb_i18n_gettext_noop( <cText> [, <cDomain> ] )
hb_i18n_ngettext( <nCount>, <cText> [, <cDomain> ] )
hb_i18n_ngettext_strict( <nCount>, <cText> [, <cDomain> ] )
hb_i18n_ngettext_noop( <nCount>, <cText> [, <cDomain> ] )

It then generates .pot text files which are gettext-compatible, so it's possible to use the gettext tools to process them (merge, translate, update, etc.). Additionally, Harbour has its own tool called hbi18n which can merge .pot files, add automatic translations from other .po[t] or .hbl files, and generate compiled Harbour I18N binary modules as .hbl files.

Harbour supports plural-form translations, context domains, automatic CP translations, etc. The plural-form support in Harbour is extended in comparison to gettext and allows using non-US languages as the base. In MT mode each thread inherits the I18N translation module from its parent thread but can then set and use its own. There is a set of HB_I18N_*() functions available to programmers at runtime which allows performing different operations on compiled I18N modules and also on .po[t] files.

Using the -w3 switch during compilation enables additional validation of hb_i18n_[n]gettext*() parameters, so the compiler generates warnings. Just like in original gettext, it's suggested to use #define or #xtranslate macros instead of direct calls to the hb_i18n_[n]gettext*() functions, i.e.:

Code: Select all

      #xtranslate _I( <x,...> ) => hb_i18n_gettext( <x> )
      #xtranslate _IN( <x,...> ) => hb_i18n_ngettext( <x> )
or:
      #xtranslate I18N( <x,...> ) => hb_i18n_gettext( <x> )
It allows keeping source code shorter and, if necessary, easily switching to the STRICT (for strict parameter validation) or NOOP (I18N support disabled at compile time) versions. Additionally, the Harbour compiler can recognize user I18N functions. They have the same name as the above hb_i18n_*() functions but with an additional user '_*' suffix, so they are of the form:

Code: Select all

      hb_i18n_[n]gettext{_strict,_noop,}_*([<params,...>])
Using them, users can easily introduce their own I18N runtime modules. To reduce dependencies on external tools, by default Harbour uses its own format for compiled .po[t] files, but it's planned to also add native runtime gettext support as an optional user I18N interface.
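
For illustration, here is a minimal sketch of how such a macro could be used in a program (the message text is invented; compiled with -j, the literal is what gets extracted into the .pot file):

Code: Select all

      #xtranslate _I( <x,...> ) => hb_i18n_gettext( <x> )

      PROCEDURE Main()
         // the string literal below is collected into the .pot file at compile time
         ? _I( "Hello, world!" )
         RETURN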

In practice, this command reads all the .prg files and saves to a file all the strings found inside the above i18n functions.

First of all, for internationalization I strongly suggest using Harbour's built-in support for i18n functionality.

Harbour supports a gettext-compatible interface, so you can use gettext-like functions with the hb_i18n_ prefix:

Code: Select all

   hb_i18n_gettext( <cMsg> [, <cContext> ] ) -> <cMsg>
   hb_i18n_ngettext( <nNum>, <cMsg> [, <cContext> ] ) -> <cMsg>
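As a hedged sketch of the plural form in use (the count and message text are invented for illustration):

Code: Select all

   nFiles := 3
   // hb_i18n_ngettext() picks the singular or plural translation based on nFiles;
   // without a loaded translation module it simply returns the base string
   ? hb_i18n_ngettext( nFiles, "file" )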
How do we tell this to Harbour? If we cannot tell Harbour, then it will handle those strings incorrectly (e.g. when calculating their length in characters).

Just like in gettext, you can use some macros for a shorter form, e.g.:

Code: Select all

   #xtranslate _I( <x...> ) => hb_i18n_gettext( <x> )
   #xtranslate _IN( <x...> ) => hb_i18n_ngettext( <x> )
The compiler recognizes the above functions and generates gettext-compatible .po files when the -j switch is used. HBMK2 has built-in support to update .po files, merge text from existing translations, and generate .hbl files which can later be loaded dynamically by the final application.
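
As a sketch of loading such a compiled module at runtime (assuming hb_i18n_Check(), hb_i18n_RestoreTable() and hb_i18n_Set() behave as documented in the Harbour ChangeLog; the .hbl file name is hypothetical):

Code: Select all

   PROCEDURE SetTranslation( cHblFile )   // e.g. "myapp.pt_BR.hbl"
      LOCAL cBuffer := hb_MemoRead( cHblFile )
      IF ! Empty( cBuffer ) .AND. hb_i18n_Check( cBuffer )
         // activate the compiled translation module for this thread
         hb_i18n_Set( hb_i18n_RestoreTable( cBuffer ) )
      ENDIF
      RETURN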

You can use many different GNU tools to automate translations. Please also note that final users can create such translations too, and these can be loaded dynamically. Of course, if you want, you can link your executable with compiled .hbl files using #pragma __*include* directives, or even attach .po files, compiling them at runtime.

Look at the utils/hbmk2/hbmk2.*.po files as an example. Please note that they are updated automatically when you compile hbmk2 using the hbmk2.hbp file. In the ChangeLog you will find details about the HB_I18N_*() functions.

So if you begin to use it, then for sure it will make your life much easier, reducing the job necessary for updating translations, and it also opens the door to users from other countries creating new Xailer translations you can attach to the next distribution. Everybody will be happy.
On the other hand, I also need to distribute programs that will be executed elsewhere. If I set a specific CP, how will it behave when the user types text in an input box? I suppose that e.g. I cannot compare that text to a string constant. So, the best choice is again to write those constant strings using UTF-8.
This text will be limited to the CP you set. If the user types some unsupported characters then they will be translated to '?'.

If you use CP like UTF8 then it can show all characters.
That's why I was asking for some kind of casting at compile time, to tell Harbour that a constant string is in UTF-8. Just like most C compilers do with L"", which stores the string in UTF-16. BTW, although I prefer UTF-8, I will also be happy if UTF-16 is used instead.
I still do not see why you need such casting and I think that you haven't analyzed how many new problems it may introduce.
E.g., let's imagine that we introduce a new flag HB_IT_STRUTF8 working in a similar way to HB_IT_MEMOFLAG in strings.

Then we will have to define what to do when the user writes something like:

Code: Select all

   <cUtf8> + <cStr>
Is the result a UTF8 string or not? Or maybe it should cause an RTE?

Which functions should remove this flag? E.g. fread( hFile, @cUtf8, 1 ) seems trivial and we should remove it, but what to do with translate( <cUtf8>, <cStrPict> ), etc.?
Of course, all such things can be defined and the Harbour code updated to respect them; anyhow, we still have country-dependent problems, e.g. how should UPPER() and LOWER() work for UTF8 strings?
If it's a library, then what national strings do you have to encode inside? UTF-8, if possible.
Very good. So if you want to return some strings to user code, then use the hb_utf8ToStr() function. They will be translated to the CP set by the user.
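
For example, a library function returning a national string kept as UTF-8 in the source might look like this (a sketch; the function name and text are invented):

Code: Select all

   // the UTF-8 literal is converted to whatever CP the caller has selected
   FUNCTION MyLib_Greeting()
      RETURN hb_utf8ToStr( "Grüß Gott" )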

BTW if you want to create OLE item with string using UTF-8 encoding then use recently added functionality and create new function:

/* __OLEVariantNewUTF8Str( <cUtf8Str> ) -> <pVariant> */

Code: Select all

HB_FUNC( __OLEVARIANTNEWUTF8STR )
{
   PHB_ITEM pString = hb_param( 1, HB_IT_STRING );

   if( pString )
   {
      VARIANT variant;
      const char * pszUtf8Str = hb_itemGetCPtr( pString );
      HB_SIZE nLen = hb_itemGetCLen( pString );
      UINT uiStrLen;
      BSTR strVal;

      uiStrLen = ( UINT ) hb_cdpUTF8StringLength( pszUtf8Str, nLen );
      strVal = SysAllocStringLen( NULL, uiStrLen );
      hb_cdpStrToU16( hb_cdpFindExt( "UTF8" ), HB_CDP_ENDIAN_NATIVE,
                      pszUtf8Str, nLen, strVal, uiStrLen + 1 );
      V_VT( &variant ) = VT_BSTR;
      V_BSTR( &variant ) = strVal;
      hb_oleItemPutVariant( hb_stackReturnItem(), &variant, HB_TRUE );
   }
   else
      hb_errRT_OLE( EG_ARG, 1018, 0, NULL, HB_ERR_FUNCNAME, NULL );
}
If it's a final executable, then you should know well what CP you are using in your code and simply set it in the application startup code.
It could be a good solution, but if the CP also drives sort operations, it's not so good. A user from another country could expect a different result (e.g. when sorting a list of names from a database).
Yes, he could. Anyhow, if you want to execute this application in different countries sharing data, you have to use exactly the same collation algorithm. Otherwise the indexes will be corrupted, so you have to make some arbitrary decision about sorting in your application and use it for all instances of your application.

BTW, now we have one predefined sorting method in UTF8EX. Maybe in the future I'll add a method for defining collations
dynamically, but I'm seriously afraid of what will happen with database indexes when users begin to update them and create
custom collations. I do not have enough energy to answer the possible hundreds of messages about corrupted index files.

best regards,
Przemek

The fact is that extreme amounts of effort were (and are) put into Harbour to actually make it support truly international apps: by adding Unicode support to the HVM, Unicode support to 3rd party interfaces (e.g. the Windows API), CP-independent language modules, and i18n support through the i18n API.

It's expected that it takes time to depart from old habits like "OEM", "ANSI" and legacy 8-bit codepages. Plus, adding
Unicode support is not a toggle switch; it needs serious work on the part of 3rd party lib developers and app developers
to get added, and added right.

[ There are still a few confusing aspects of Harbour CP handling (f.e. the fact that Harbour "CP" actually means CP + collation so there is a bit of culture specific information tied to the codepages as we know them in general). ]

But there are some things which are routinely confused by users; one of them is 'language modules' vs 'codepages': HB_LANGSELECT()/HB_USERLANG()/HB_LANG_* vs HB_CDPSELECT()/HB_CODEPAGE_*.

HB_LANGSELECT(): It simply selects the strings/text used by certain Clipper-compatibility functions, e.g. the ones
used in error messages or by CMONTH() and CDAY().

HB_USERLANG(): It will return the standard language ID that is setup in the OS.

You can use f.e.: HB_LANGSELECT( HB_USERLANG() )

HB_USERLANG() value is however not a CODEPAGE value, so it's wrong to pass it to HB_TRANSLATE() or HB_CDPSELECT(), as one of the posters tried to do.
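
To illustrate the difference (a sketch; the UTF8EX codepage ID is just one example of a valid codepage value):

Code: Select all

   hb_langSelect( hb_UserLang() )   // OK: selects the language module from the OS locale
   hb_cdpSelect( "UTF8EX" )         // the codepage is selected separately
   // WRONG: hb_cdpSelect( hb_UserLang() ) -- a language ID is not a codepage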

Then it's a whole different story to use international support in certain 3rd party libs where e.g. Unicode support is missing and/or broken. Namely, MINIGUI has zero support for true Unicode apps with Harbour, and I can only hope it will be added in the future.

Revision: 18407
http://harbour-project.svn.sourceforge. ... 7&view=rev
Author: druzus
Date: 2012-10-24 13:36:24 +0000 (Wed, 24 Oct 2012)

Log Message:
2012-10-24 15:35 UTC+0200 Przemyslaw Czerpak (druzus/at/poczta.onet.pl)
* harbour/include/hbexprb.c
! fixed typo in function IDs.
HB_I18N_NGETTEXT_STRICT() and HB_I18N_NGETTEXT_NOOP() were not
recognized as i18n gettext functions

* harbour/doc/cmpopt.txt
! fixed HB_I18N_NGETTEXT_NOOP*() syntax used in examples

* harbour/src/common/expropt2.c
* harbour/doc/cmpopt.txt
+ added compile time optimizations for expressions like
<exp> = <lVal>
<exp> == <lVal>
<exp> != <lVal>
<lVal> = <exp>
<lVal> == <exp>
<lVal> != <exp>
They are reduced to <exp> or !<exp>. Because it may disable
some runtime errors, it's not a Clipper-compatible optimization
and is enabled only when the -ko compiler switch is used.
hbi18n.exe - Harbour's tool to compile .po files

Syntax:
              hbi18n -m | -g | -a [-o<outfile>] [-e] [-q] <files1[.pot] ...>
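
A hypothetical end-to-end workflow might look like this (file names are placeholders, and the exact switches should be checked against your Harbour version's hbi18n help output):

Code: Select all

   harbour myapp.prg -jmyapp.pot       // extract translatable strings into a .pot
   hbi18n -g myapp.pot -omyapp.hbl     // compile the .pot into a binary .hbl module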

HMGing a better world
"Matter tells space how to curve, space tells matter how to move."
Albert Einstein
serge_girard
Posts: 3161
Joined: Sun Nov 25, 2012 2:44 pm
DBs Used: 1 MySQL - MariaDB
2 DBF
Location: Belgium

Re: Language messages at our library

Post by serge_girard »

Great job all!

A bit complicated at first sight...

Serge
There's nothing you can do that can't be done...