Problem reading Unicode file

HMG Unicode versions 3.1.x related

Moderator: Rathinagiri

Clip2Mania
Posts: 99
Joined: Fri Jun 13, 2014 7:16 am
Location: Belgium

Problem reading Unicode file

Post by Clip2Mania »

Can anyone suggest how to read the attached Unicode file?
I can open it in Windows Notepad without any problems.
I'm using HMG 3.3.1, 32 bits, Unicode.
I tried memoread(), HB_Memoread() and the FOpen()/FReadStr() combination,
in both the ANSI & UNICODE versions of HMG, and apparently cannot open it. :cry:
Any suggestions?
Thx,
Erik
Attachments
test_unicode.zip
(999 Bytes) Downloaded 461 times
bpd2000
Posts: 1207
Joined: Sat Sep 10, 2011 4:07 am
Location: India

Re: Problem reading Unicode file

Post by bpd2000 »

Refer to the attached demo.
You have to save the file using UTF-8 encoding.
Also refer to:
viewtopic.php?f=7&t=3689&p=34140&hilit= ... F+8#p34140
Attachments
DemoUni.rar
(603 Bytes) Downloaded 510 times
BPD
Convert Dream into Reality through HMG
Clip2Mania
Posts: 99
Joined: Fri Jun 13, 2014 7:16 am
Location: Belgium

Re: Problem reading Unicode file

Post by Clip2Mania »

bpd2000 wrote:You have to save the file using UTF-8 encoding.
Yes, I saw the demo & read the post previously. That is exactly the issue. I cannot save the file in UTF-8, because it comes from an external program (EAC). I have a lot of these files and need to read them, so manually opening & saving each file is way too much work for my customer. Furthermore, I want to save him the complexity :!:

In the meantime, I found a command-line conversion tool on the web (http://www.autohotkey.com/board/topic/9 ... icode-cmd/) which allows me to do this. I use the "execute file" command to convert each file first. It's not really beautiful, but it kinda works... :geek:
Last edited by Clip2Mania on Tue Jul 22, 2014 11:02 am, edited 1 time in total.
srvet_claudio
Posts: 2193
Joined: Thu Feb 25, 2010 8:43 pm
Location: Uruguay

Re: Problem reading Unicode file

Post by srvet_claudio »

Clip2Mania wrote:Can anyone suggest how to read the attached Unicode file?
I can open it in Windows Notepad without any problems.
I'm using HMG 3.3.1, 32 bits, Unicode.
I tried memoread(), HB_Memoread() and the FOpen()/FReadStr() combination,
in both the ANSI & UNICODE versions of HMG, and apparently cannot open it. :cry:
Any suggestions?
Thx,
Erik
Hi Erik,
the problem is that your file is in UTF-16LE (the Unicode encoding Windows uses) and HMG works with UTF-8;
see this code:

#include "hmg.ch"

FUNCTION Main()

cText := HMG_UTF16LE_TO_UTF8 ("test_unicodeUTF16LE.txt")

MsgInfo (cText)

RETURN NIL



#pragma BEGINDUMP

#define UNICODE

#include "HMG_UNICODE.h"
#include <windows.h>
#include "hbapi.h"

HB_FUNC ( HMG_UTF16LE_TO_UTF8 )
{ 
   TCHAR *FileName = (TCHAR *) HMG_parc (1);
   
   HANDLE    hFile;
   DWORD     nFileSize;
   DWORD     nReadByte;

   hFile = CreateFile (FileName, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
   if (hFile == INVALID_HANDLE_VALUE)
       return;
          
   nFileSize = GetFileSize (hFile, NULL);
   if (nFileSize == INVALID_FILE_SIZE)
   {   CloseHandle (hFile); 
       return;
   }

   TCHAR cBuffer [ nFileSize ];

   ReadFile (hFile, cBuffer, nFileSize, &nReadByte, NULL);
   
   CloseHandle (hFile);

   HMG_retc (cBuffer);
}

#pragma ENDDUMP

Best regards.
Dr. Claudio Soto
(from Uruguay)
http://srvet.blogspot.com
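
For reference, the encoding of an incoming file can usually be told from its first bytes. The following is a minimal sketch in portable C (not HMG code), assuming the file starts with a byte order mark, as files saved from Notepad as "Unicode" do. The names `detect_bom` and `ENC_*` are hypothetical and exist only for this illustration; UTF-32 and BOM-less files are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Encodings this sketch can tell apart from the first bytes of a file. */
typedef enum { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE } text_encoding;

/* Inspect the byte order mark (BOM), if any, at the start of a buffer. */
static text_encoding detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;       /* EF BB BF */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;    /* FF FE: low byte of each unit comes first */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16BE;    /* FE FF: high byte of each unit comes first */

    /* No BOM: could be ANSI or UTF-8 without a BOM; this sketch cannot tell. */
    return ENC_UNKNOWN;
}
```

A caller would read the first few bytes of the file, pass them to `detect_bom`, and only run a UTF-16LE conversion when `ENC_UTF16LE` comes back; files that are already UTF-8 can be passed through unchanged.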
esgici
Posts: 4543
Joined: Wed Jul 30, 2008 9:17 pm
DBs Used: DBF
Location: iskenderun / Turkiye

Re: Problem reading Unicode file

Post by esgici »

Simply another approach:

/*
  Convert big-endian Unicode string to ANSI 
  CAUTION : Use only for big-endian Unicode string  !
*/

#include <hmg.ch>

PROCEDURE Main
   MsgBox( UniBE2UT8( HB_MEMOREAD( "test_unicode.txt" ) ) )
RETURN

FUNCTION UniBE2UT8( cBigEndianStr )          // Convert big-endian Unicode string to ANSI
RETURN ( SUBSTR( STRTRAN( cBigEndianStr, CHR(0), '' ), 3 ) )
Viva INTERNATIONAL HMG :D
Clip2Mania
Posts: 99
Joined: Fri Jun 13, 2014 7:16 am
Location: Belgium

Re: Problem reading Unicode file

Post by Clip2Mania »

Fantastic, thanks gentlemen for the effort! :)
There is a problem with both pieces of code, however:
Mr. esgici's code does not read to the end of the file but stops somewhere :(
Dr. Claudio's code reads too much :) (see the "garbage" characters at the end of the file)
Not beautiful in a MsgBox, but I can filter them out in my code. ;)
Attachments
dr. claudio's result
unicode_claudio.jpg (95.1 KiB) Viewed 10324 times
mr esgici's result
unicode_esgici.jpg (22.82 KiB) Viewed 10324 times
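
The trailing garbage is consistent with the buffer handling in the C code posted above: GetFileSize returns a byte count, the buffer is declared with that many TCHAR elements (twice as many bytes), and nothing writes a terminating zero after the bytes actually read, so the conversion done by HMG_retc can run on into uninitialized memory. This is a reading of the posted code, not something confirmed in the thread. The sketch below shows, in portable C rather than Win32/HMG code, a UTF-16LE to UTF-8 conversion that is bounded by an explicit byte count; `utf16le_to_utf8` and its parameters are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert nbytes of UTF-16LE data in src to UTF-8 in dst.
 * A leading BOM (FF FE) is skipped and dst is NUL-terminated.
 * Returns the number of UTF-8 bytes written, not counting the terminator.
 * Returns 0 for empty input or when dst is too small (the capacity check
 * is conservative: it wants room for 4 bytes plus the terminator).
 */
static size_t utf16le_to_utf8(const unsigned char *src, size_t nbytes,
                              unsigned char *dst, size_t dstcap)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    if (nbytes >= 2 && src[0] == 0xFF && src[1] == 0xFE)
        i = 2;                                  /* skip the byte order mark */

    /* Only complete 16-bit units are consumed: never read past nbytes. */
    while (i + 1 < nbytes) {
        unsigned long cp = (unsigned long)src[i] |
                           ((unsigned long)src[i + 1] << 8);
        i += 2;

        /* Combine a surrogate pair into one code point above U+FFFF. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < nbytes) {
            unsigned long lo = (unsigned long)src[i] |
                               ((unsigned long)src[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;                        /* unpaired surrogate */

        if (o + 5 > dstcap)
            return 0;

        if (cp < 0x80) {
            dst[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            dst[o++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            dst[o++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }

    if (o >= dstcap)
        return 0;
    dst[o] = '\0';                              /* explicit terminator */
    return o;
}
```

In an HMG context the same idea would mean allocating the buffer in bytes and writing a terminating zero right after the nReadByte bytes returned by ReadFile before handing the buffer to HMG_retc; that adaptation is untested here.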
esgici
Posts: 4543
Joined: Wed Jul 30, 2008 9:17 pm
DBs Used: DBF
Location: iskenderun / Turkiye

Re: Problem reading Unicode file

Post by esgici »

Clip2Mania wrote:...
Mr. esgici's code does not read to the end of the file but stops somewhere :(
...
On my side there is no such truncation problem with my method, while the extra characters mentioned above do show up with Claudio's method :(
UpperExtraCharactersInClaudio'sMethod
UpperExtraCharactersInClaudio'sMethod.PNG (109.12 KiB) Viewed 10305 times
And physically there are no such extra characters (letters or otherwise) in your file :?

If you ran this test on another file, please send it to me.

Regards
Viva INTERNATIONAL HMG :D
Clip2Mania
Posts: 99
Joined: Fri Jun 13, 2014 7:16 am
Location: Belgium

Re: Problem reading Unicode file

Post by Clip2Mania »

Mr esgici,
the trouble is in the accents/special characters (it always is :( )
I tried adding 'SET CODEPAGE TO UNICODE' at the beginning of the program, but that does not change anything.
Attachments
test2.jpg
test2.jpg (9.92 KiB) Viewed 10304 times
Chanson_EAC.zip
(1.12 KiB) Downloaded 409 times
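
Why accents break the zero-stripping approach can be shown byte by byte. In UTF-16LE, "é" (U+00E9) is stored as E9 00; removing the zero leaves the single byte E9, whereas UTF-8 needs the two bytes C3 A9. The sketch below, in portable C, mimics SUBSTR( STRTRAN( s, CHR(0), '' ), 3 ) and then runs a simplified UTF-8 well-formedness check on the result. Both function names are hypothetical, and the idea that an invalid byte is what makes the MsgBox output stop early in a Unicode build is an assumption, not something verified in the thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mimic SUBSTR(STRTRAN(s, CHR(0), ''), 3): drop every zero byte, then skip
 * the first two remaining bytes (the BOM). Returns the output length.
 * dst must have room for at least n bytes.
 */
static size_t strip_zero_bytes(const unsigned char *src, size_t n,
                               unsigned char *dst)
{
    size_t i, kept = 0, o = 0;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < n; i++) {
        if (src[i] == 0x00)
            continue;               /* STRTRAN(..., CHR(0), '') */
        if (kept++ < 2)
            continue;               /* SUBSTR(..., 3) skips two bytes */
        dst[o++] = src[i];
    }
    return o;
}

/*
 * Simplified check: every lead byte must be followed by the right number
 * of 10xxxxxx continuation bytes. Overlong forms and encoded surrogates
 * are not rejected, so this is weaker than a full validator.
 */
static int looks_like_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        size_t extra;

        if (s[i] < 0x80)                extra = 0;
        else if ((s[i] & 0xE0) == 0xC0) extra = 1;
        else if ((s[i] & 0xF0) == 0xE0) extra = 2;
        else if ((s[i] & 0xF8) == 0xF0) extra = 3;
        else return 0;                  /* stray continuation or bad lead */

        i++;
        while (extra-- > 0) {
            if (i >= n || (s[i] & 0xC0) != 0x80)
                return 0;               /* missing continuation byte */
            i++;
        }
    }
    return 1;
}
```

For pure ASCII text the stripped result is valid, which matches the first test file working on some machines. In an ANSI build with code page 1252 the lone byte E9 happens to be "é", so the shortcut can appear to work for Western European text, but any character above U+00FF has a non-zero high byte that is kept as a separate, unrelated byte.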
Clip2Mania
Posts: 99
Joined: Fri Jun 13, 2014 7:16 am
Location: Belgium

Re: Problem reading Unicode file

Post by Clip2Mania »

It is true, I added the éèçàôù characters in the file, because they are very common. In the example above,
if you leave them out, you will see that they are not correctly translated further in the file.
Attachments
original.jpg
original.jpg (95.82 KiB) Viewed 10303 times
esgici
Posts: 4543
Joined: Wed Jul 30, 2008 9:17 pm
DBs Used: DBF
Location: iskenderun / Turkiye

Re: Problem reading Unicode file

Post by esgici »

You are right, my conversion method is not suitable for your needs :(
Viva INTERNATIONAL HMG :D