Question about RichEditBox handling of Unicode text files

HMG Unicode versions 3.1.x related

Moderator: Rathinagiri

kcarmody
Posts: 152
Joined: Tue Oct 07, 2014 11:13 am

Post by kcarmody »

A few days ago, I posted this message in the HMG Source forum (viewtopic.php?f=8&t=4031) but I'm reposting it here since it asks a question about how the HMG rich edit control handles Unicode text files.

RichEditBox has two methods that handle files, RtfLoadFile and RtfSaveFile. These methods call RichEditBox_StreamIn and RichEditBox_StreamOut in c_richeditbox.c.

These two functions handle RTF and ANSI text files OK, but have some problems with Unicode text files. They seem to ignore the byte order marks (BOM) that are usually necessary for software to recognize text files as Unicode text files.

RichEditBox_StreamIn removes the BOM from a UTF-8 text file (nDataFormat = 1), but it does not remove the BOM from a UTF-16 text file (nDataFormat = 3). This behavior is actually implicit in the Windows code that the function calls.
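
A caller could compensate for this by checking the leading bytes of the file before streaming it in. The following is only a minimal sketch, assuming the file contents are already in a memory buffer; the names `bom_kind` and `detect_bom` are hypothetical and are not part of HMG or the Windows API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper types for illustration only. */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE } bom_kind;

/* Report which byte order mark, if any, starts the buffer.
   *bom_len receives the number of bytes to skip (0 when no BOM is found).
   UTF-32 marks are not handled in this sketch. */
bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    assert(buf != NULL && bom_len != NULL);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;               /* UTF-8: EF BB BF */
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;               /* UTF-16 little-endian: FF FE */
        return BOM_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;               /* UTF-16 big-endian: FE FF */
        return BOM_UTF16BE;
    }
    *bom_len = 0;
    return BOM_NONE;
}
```

With something like this, the loader could pass `buf + bom_len` to the stream-in step, so that a UTF-16 BOM never reaches the control as text.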

RichEditBox_StreamOut does not add any BOMs, either to UTF-8 (nDataFormat = 1 or 2) or to UTF-16 (nDataFormat = 3). Some software can recognize unmarked UTF-8, but no software I have ever seen recognizes unmarked UTF-16.
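
On the output side, the fix amounts to writing the mark before the streamed bytes. Here is a rough sketch using plain `fwrite`, assuming the text has already been streamed out into a buffer in the target encoding; `write_text_with_bom` and its `utf16le` flag are hypothetical names, not existing HMG functions.

```c
#include <assert.h>
#include <stdio.h>

/* Write a byte order mark, then the payload, to an open binary stream.
   utf16le != 0 selects the UTF-16 little-endian mark (FF FE);
   otherwise the UTF-8 mark (EF BB BF) is written.
   Returns 0 on success, -1 if any write falls short. */
int write_text_with_bom(FILE *fp, int utf16le, const void *data, size_t len)
{
    static const unsigned char bom_utf8[3]    = { 0xEF, 0xBB, 0xBF };
    static const unsigned char bom_utf16le[2] = { 0xFF, 0xFE };
    const unsigned char *bom = utf16le ? bom_utf16le : bom_utf8;
    size_t bom_len = utf16le ? sizeof bom_utf16le : sizeof bom_utf8;

    assert(fp != NULL);

    if (fwrite(bom, 1, bom_len, fp) != bom_len)
        return -1;
    if (len > 0 && fwrite(data, 1, len, fp) != len)
        return -1;
    return 0;
}
```

The file should be opened in binary mode (`"wb"`) so that the C runtime does not translate any bytes of the UTF-16 data.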

I think that Windows acts this way because the EM_STREAMIN and EM_STREAMOUT messages are designed for "data streams", which may be internal buffers as well as file contents. Windows seems to assume that the developer will take care of BOMs if the data stream is going to or from a file.

All software that handles Unicode text files recognizes marked text files, so there is never any harm in putting a BOM in, while plenty of harm can come from leaving it out.

Both of these functions include a case (nDataFormat = 5) for UTF-8 RTF, but this is useless, as RTF encodes all Unicode characters as plain text RTF commands. So you never see a UTF-8 RTF file, and if you did, nothing would open it.
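
To illustrate what "plain text RTF commands" means here: RTF can represent a Unicode character with the `\uN` control word, where N is a signed 16-bit decimal number, followed by a fallback character for readers that do not understand it. The sketch below formats one UTF-16 code unit that way; `rtf_escape_unit` is a hypothetical name, and this is not the code the rich edit control itself uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format one UTF-16 code unit as an RTF \uN escape with '?' as the
   fallback character. Returns the number of characters written
   (excluding the terminating NUL), or -1 if the buffer is too small. */
int rtf_escape_unit(uint16_t unit, char *out, size_t out_size)
{
    int n, written;

    assert(out != NULL);

    /* \uN takes a signed 16-bit value, so code units above 32767
       are written as negative numbers. */
    n = (unit > 32767) ? (int)unit - 65536 : (int)unit;

    written = snprintf(out, out_size, "\\u%d?", n);
    return (written >= 0 && (size_t)written < out_size) ? written : -1;
}
```

Since the output is pure ASCII, the result is byte-for-byte the same whether the file is labelled ANSI or UTF-8.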

I came across the BOM problem when I was enhancing the Rich Edit Demo, viewtopic.php?f=9&t=4030. It was important to me to be able to read and write text files, so I added some workarounds to the demo to fix the behavior of RichEditBox_StreamIn/Out. This was a quick fix using Memoread and Memowrite, but it would be better to use fread and fwrite, either in Harbour or in C.

I could add such fixes into h_controlmisc.prg (definition of RtfLoadFile and RtfSaveFile methods) or into c_richeditbox.c (definition of RichEditBox_StreamIn/Out), but that would change the behavior of these methods and functions.

The question is, should these methods and functions be changed so that they handle BOMs? It might break existing code if we do. But I suspect that no one is using this code now, as it does not handle BOMs properly.

Kevin
esgici
Posts: 4543
Joined: Wed Jul 30, 2008 9:17 pm
DBs Used: DBF
Location: iskenderun / Turkiye

Post by esgici »

Hello Mr. Carmody

I know the goal of your posts about RichEditBox, and especially this one, is to develop the control. Since I'm not a developer, I can't answer your questions. But I learned many things by reading those posts. Thanks.

If I remember correctly, there were some useful materials and articles on your website, especially about Unicode. Now I can't find them :( If you haven't erased them, would you tell us how to access these valuable materials?

On the other hand, I have a personal problem related to your site: due to some complex problems, I can't access geocities.com :( Would you please move your articles to another place, especially those related to software like XBase and Table Oriented Programming?

I want to offer my appreciation for your immense contributions to software, and especially to the xBase community.

Thank you very much.

Best regards.
Viva INTERNATIONAL HMG :D

Post by kcarmody »

Hi Esgici,

You're right, I used to have a section on my site that listed every Unicode character, along with its name, in a set of HTML files. I developed this because I wanted to see all the Unicode characters in one place and be able to select them. But it became too much for me to keep up with all the changes that Unicode made every year or so. However, I found a much better solution, BabelMap, which I describe on my site http://kevincarmody.com/software/software.html

Geocities is permanently closed -- it's not just you! The geocities pages I was linking to have moved to oocities.org. I just updated these links on my site, if you want to try again.

Thank you for your kind words.

Kevin

Post by esgici »

Thanks Mr. Carmody

I remember more about Unicode: there were articles (probably more than one) on Unicode that contained valuable info for programmers. I will take a look at BabelMap.

I found "tablizer", not in oogeocity but in rogeocity :?

And the article does not include your name. Is this correct?

Best regards

Esgici

Post by kcarmody »

esgici wrote: And the article does not include your name.

I didn't write it.

esgici wrote: Is this correct?

Yes, that's the article. It's also in oocities at http://www.oocities.org/tablizer/top.htm

Post by esgici »

OK

Thanks Mr. Carmody

Regards