Problem reading Unicode file

HMG Unicode versions 3.1.x related

Moderator: Rathinagiri

Clip2Mania
Posts: 99
Joined: Fri Jun 13, 2014 7:16 am
Location: Belgium
Been thanked: 1 time

Post by Clip2Mania » Sat Jul 19, 2014 7:00 pm

Can anyone suggest how to read the attached Unicode file?
I can open it in Windows Notepad without any problems.
I'm using HMG 3.3.1, 32-bit, Unicode.
I tried memoread(), HB_Memoread() and the FOpen()/FReadStr() combination,
in both the ANSI and Unicode versions of HMG, and apparently none of them can open it. :cry:
Suggestions?
Thx,
Erik
Attachments
test_unicode.zip
(999 Bytes) Downloaded 104 times
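
A quick way to find out which encoding such a file uses is to look at its first bytes. The sketch below is portable C, not HMG code; `detect_bom` is a hypothetical helper name, and it assumes the file starts with a byte-order mark (BOM), which files saved as "Unicode" by Windows tools usually do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a label for the byte-order mark at the start of buf,
   or "none" when no known BOM is present.
   Note: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and
   would be reported as UTF-16LE by this simple check. */
const char *detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
        return "UTF-8";
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)
        return "UTF-16LE";
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)
        return "UTF-16BE";
    return "none";
}
```

To use it, read the first three bytes of the file (for example with `fread`) and pass them in. A result other than "UTF-8" or "none" means the bytes need converting before UTF-8 string functions can use them.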

bpd2000
Posts: 963
Joined: Sat Sep 10, 2011 4:07 am
Location: India
Has thanked: 136 times
Been thanked: 43 times

Post by bpd2000 » Sun Jul 20, 2014 4:14 am

Refer to the attached demo.
You have to save the file using UTF-8 encoding.
Also refer to:
viewtopic.php?f=7&t=3689&p=34140&hilit= ... F+8#p34140
Attachments
DemoUni.rar
(603 Bytes) Downloaded 136 times
BPD
Convert Dream into Reality through HMG

Post by Clip2Mania » Sun Jul 20, 2014 6:58 am

bpd2000 wrote: You have to save file using Encoding UTF-8
Yes, I saw the demo & read post previously. That is exactly the issue. I cannot save the file in UTF-8, because it comes from an external program (EAC). I have a lot of these files and need to read them, so manually opening & saving each file is way too much work for my customer. Furthermore, I want to save him the complexity :!:

In the meantime, I found a command-line conversion tool on the web (http://www.autohotkey.com/board/topic/9 ... icode-cmd/) which can do this. I use the "execute file" command to convert each file first. It's not really beautiful, but it kinda works... :geek:
Last edited by Clip2Mania on Tue Jul 22, 2014 11:02 am, edited 1 time in total.

srvet_claudio
Posts: 1958
Joined: Thu Feb 25, 2010 8:43 pm
Location: Uruguay
Has thanked: 32 times
Been thanked: 125 times

Post by srvet_claudio » Tue Jul 22, 2014 1:57 am

Clip2Mania wrote: Anyone can suggest how to read unicode file attached? [...]

Hi Erik,
the problem is that your file is in Unicode UTF-16LE (the Unicode encoding used by Windows), while HMG works with UTF-8.
See this code:

#include "hmg.ch"

FUNCTION Main()

LOCAL cText := HMG_UTF16LE_TO_UTF8 ("test_unicodeUTF16LE.txt")

MsgInfo (cText)

RETURN NIL



#pragma BEGINDUMP

#define UNICODE

#include "HMG_UNICODE.h"
#include <windows.h>
#include "hbapi.h"

HB_FUNC ( HMG_UTF16LE_TO_UTF8 )
{ 
   TCHAR *FileName = (TCHAR *) HMG_parc (1);
   
   HANDLE    hFile;
   DWORD     nFileSize;
   DWORD     nReadByte;

   hFile = CreateFile (FileName, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
   if (hFile == INVALID_HANDLE_VALUE)
       return;
          
   nFileSize = GetFileSize (hFile, NULL);
   if (nFileSize == INVALID_FILE_SIZE)
   {   CloseHandle (hFile); 
       return;
   }

   TCHAR cBuffer [ nFileSize ];

   ReadFile (hFile, cBuffer, nFileSize, &nReadByte, NULL);
   
   CloseHandle (hFile);

   HMG_retc (cBuffer);
}

#pragma ENDDUMP

Best regards.
Dr. Claudio Soto
(from Uruguay)
http://srvet.blogspot.com

esgici
Posts: 4363
Joined: Wed Jul 30, 2008 9:17 pm
DBs Used: DBF
Location: iskenderun / Turkiye
Has thanked: 247 times
Been thanked: 91 times

Post by esgici » Tue Jul 22, 2014 2:37 am

Here is another, simpler approach:

/*
  Convert big-endian Unicode string to ANSI 
  CAUTION : Use only for big-endian Unicode string  !
*/

#include <hmg.ch>

PROCEDURE Main
   MsgBox( UniBE2UT8( HB_MEMOREAD( "test_unicode.txt" ) ) )
RETURN

FUNCTION UniBE2UT8( cBigEndianStr )          // Convert big-endian Unicode string to ANSI
RETURN ( SUBSTR( STRTRAN( cBigEndianStr, CHR(0), '' ), 3 ) )
Viva INTERNATIONAL HMG :D

Post by Clip2Mania » Tue Jul 22, 2014 7:57 am

Fantastic, thanks for the effort, gentlemen! :)
There is a problem with both pieces of code, however:
Mr. esgici's code does not read to the end of the file but stops partway through :(
Dr. Claudio's code reads too much :) (see the "garbage" characters at the end of the file).
That is not pretty in a MsgBox, but I can filter it out in my code. ;)
Attachments
unicode_claudio.jpg (95.1 KiB): Dr. Claudio's result
unicode_esgici.jpg (22.82 KiB): Mr. esgici's result
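
One plausible explanation for the trailing garbage, offered as an assumption rather than a confirmed diagnosis: the C function fills its buffer with exactly the file's bytes and never appends a terminating NUL, so whatever converts the buffer afterwards may keep reading past the end of the data. The leading byte-order mark is not skipped either. The sketch below is portable C rather than HMG code; `utf16le_to_utf8` is a hypothetical helper that converts an explicit number of bytes, skips a leading BOM, and always terminates its output.

```c
#include <assert.h>
#include <stddef.h>

/* Converts len bytes of UTF-16LE in src to NUL-terminated UTF-8 in dst.
   A leading BOM (FF FE) is skipped; an odd trailing byte is ignored.
   Unpaired surrogates become U+FFFD.
   dst must hold at least (len / 2) * 3 + 1 bytes.
   Returns the UTF-8 length in bytes, not counting the terminator. */
size_t utf16le_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i = 0, n = 0;
    assert(src != NULL && dst != NULL);

    if (len >= 2 && src[0] == 0xFF && src[1] == 0xFE)
        i = 2;                                    /* skip the BOM */

    while (i + 1 < len) {
        unsigned long cp = src[i] | ((unsigned long)src[i + 1] << 8);
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {       /* high surrogate */
            unsigned long lo = 0;
            if (i + 1 < len)
                lo = src[i] | ((unsigned long)src[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {   /* valid pair */
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                          /* stray low surrogate */
        }

        if (cp < 0x80) {
            dst[n++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            dst[n++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            dst[n++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[n++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    dst[n] = '\0';                                /* always terminate */
    return n;
}
```

Because the input length is passed explicitly, the result never depends on what happens to follow the file data in memory.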

Post by esgici » Tue Jul 22, 2014 10:48 am

Clip2Mania wrote:...
Mr. esgici's code does not read to the end of the file but stops somewhere :(
...
On my side there is no such truncation problem with my method, and there are no such extra characters with Claudio's method :(
UpperExtraCharactersInClaudio'sMethod.PNG (109.12 KiB)
Also, physically there are no such extra characters (letters or otherwise) in your file :?

If you ran this test on a different file, please send it to me.

Regards
Viva INTERNATIONAL HMG :D

Post by Clip2Mania » Tue Jul 22, 2014 11:11 am

Mr esgici,
the trouble is in the accents/special characters (it always is :( )
I tried adding 'SET CODEPAGE TO UNICODE' at the beginning of the program, but that does not change anything.
Attachments
test2.jpg (9.92 KiB)
Chanson_EAC.zip
(1.12 KiB) Downloaded 62 times

Post by Clip2Mania » Tue Jul 22, 2014 11:17 am

It is true, I added the éèçàôù characters in the file, because they are very common. In the example above,
if you leave them out, you will see that they are not correctly translated further in the file.
Attachments
original.jpg (95.82 KiB)

Post by esgici » Tue Jul 22, 2014 12:25 pm

You are right, my conversion method is not suitable for your needs :(
Viva INTERNATIONAL HMG :D
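
Why removing the zero bytes works for plain English text but not for accented letters can be shown with a few bytes. In UTF-16LE, é (U+00E9) is stored as E9 00. Dropping the 00 leaves the single byte E9, which is the ANSI (Windows-1252) code for é but not a valid UTF-8 sequence; in UTF-8, é is C3 A9. The sketch below is portable C with two hypothetical helpers: `strip_nul_bytes` mimics the `STRTRAN( ..., CHR(0), '' )` step, and `utf8_encode_small` shows the encoding a UTF-8 consumer would expect for code points below U+0800.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst while dropping every 0x00 byte.
   dst must hold at least len bytes. Returns the bytes written. */
size_t strip_nul_bytes(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i, n = 0;
    assert(src != NULL && dst != NULL);

    for (i = 0; i < len; i++)
        if (src[i] != 0x00)
            dst[n++] = src[i];
    return n;
}

/* Encodes one code point below U+0800 as UTF-8.
   dst must hold at least 2 bytes. Returns the bytes written (1 or 2). */
size_t utf8_encode_small(unsigned cp, unsigned char *dst)
{
    assert(dst != NULL && cp < 0x800);

    if (cp < 0x80) {                    /* ASCII: one byte, unchanged */
        dst[0] = (unsigned char)cp;
        return 1;
    }
    dst[0] = (unsigned char)(0xC0 | (cp >> 6));
    dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
}
```

For an ASCII letter both routes give the same single byte, which is why the shortcut looks correct on unaccented text. For é they differ (E9 versus C3 A9), so a UTF-8 display routine receives an invalid byte, which may be why the output appeared to stop partway through.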
