ANSI -> UT8 Conversion

Post by **esgici** » Wed Dec 11, 2013 11:30 pm

Hi All

After OEM -> ANSI, we are now in another migration: ANSI -> Unicode.

We have string-based conversion functions but, as far as I know, no file-based one.

I guess many people can't start that migration because converting program source files manually is difficult.

I hope this small program (whether or not it counts as a utility) will be useful to our friends.

As always, please don't forgive my faults; any suggestions, corrections and bug reports are welcome.

CFANS2UT8.jpg: Screenshot of CFANS2UT8 program; (9.3 KiB)

CFANS2UT8(src).zip: Source files for CFANS2UT8 program; (3.01 KiB)

CFANS2UT8(exe).zip: Executable for CFANS2UT8 program; (1.17 MiB)
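
The attached source is not reproduced in this thread, so the following is not the CFANS2UT8 code. It is only a minimal Python sketch of what a file-based ANSI -> UTF-8 conversion involves, assuming the ANSI code page is Windows-1252 and that a BOM is wanted. The names `ansi_to_utf8` and `convert_file` are hypothetical.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def ansi_to_utf8(data: bytes, codepage: str = "cp1252", with_bom: bool = True) -> bytes:
    """Re-encode ANSI bytes as UTF-8, optionally prefixed with a BOM."""
    if data.startswith(UTF8_BOM):
        # Already marked as UTF-8: converting again would garble the text.
        return data
    # Work on bytes so that CRLF line endings pass through unchanged.
    # Note: Python's cp1252 codec raises UnicodeDecodeError on the few
    # byte values that Windows-1252 leaves undefined.
    encoded = data.decode(codepage).encode("utf-8")
    return UTF8_BOM + encoded if with_bom else encoded


def convert_file(src: Path, dst: Path, codepage: str = "cp1252"):
    """Convert one file; return (bytes read, bytes written) for a log line."""
    raw = src.read_bytes()
    converted = ansi_to_utf8(raw, codepage)
    dst.write_bytes(converted)
    return len(raw), len(converted)
```

Writing to a separate `dst` rather than overwriting `src` keeps the original intact, which is one simple way to make the conversion safe to repeat.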

Viva HMG

Post by **Javier Tovar** » Thu Dec 12, 2013 12:42 am

Thanks for sharing, Mr. Esgici.
Regards

Post by **serge_girard** » Thu Dec 12, 2013 8:36 am

Thanks, Esgici!

I'm busy converting 12 PRG files (242 KB in total) to UTF-8.

I have a few suggestions:

1) Please make a backup before conversion.
2) Show progress over files plus progress over lines, instead of only the lines of each individual file.

Greetings and I will let you know the results!

Thanks, Serge

Post by **serge_girard** » Thu Dec 12, 2013 9:35 am

Esgici,

Conversion went well (a bit slow).

I did a file compare, and all converted (= new) files have 3 extra bytes at the very beginning: ef bb bf (= BOM),
and an extra 0d 0a (= CRLF) at the end.

20131212 09:22:18

Folder : P:\hmg.3.2\KEMP

ANALYSE.TXT : 101 lines,  1,757 bytes converted to UT8 format in 1,762  bytes.
BH_DOC.Prg : 534 lines,  12,841 bytes converted to UT8 format in 12,846  bytes.
BH_DOWNLOADS.Prg : 302 lines,  7,357 bytes converted to UT8 format in 7,360  bytes.
BH_EXE.Prg : 415 lines,  10,444 bytes converted to UT8 format in 10,449  bytes.
BH_PROG_AUTH.Prg : 677 lines,  18,935 bytes converted to UT8 format in 18,940  bytes.
bh_proj.Prg : 1,564 lines,  42,694 bytes converted to UT8 format in 42,699  bytes.
BH_TEXT.Prg : 571 lines,  13,446 bytes converted to UT8 format in 13,449  bytes.
BH_USERS.Prg : 1,193 lines,  36,291 bytes converted to UT8 format in 36,346  bytes.
HALLOCKS.PRG : 526 lines,  10,236 bytes converted to UT8 format in 10,239  bytes.
INIT.PRG : 632 lines,  16,252 bytes converted to UT8 format in 16,255  bytes.
KEMP.PRG : 2,486 lines,  65,370 bytes converted to UT8 format in 65,378  bytes.
KEMP_HELP.PRG : 101 lines,  1,827 bytes converted to UT8 format in 1,832  bytes.
KEMP_SETUP.prg : 540 lines,  12,338 bytes converted to UT8 format in 12,341  bytes.

20131212 10:09:13

So all looks very good. Later I will try compilation in HMG3.2; I will let you know.

Greetings, Serge
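
In the log above, most files grow by 5 bytes (3 for the BOM plus 2 for a CRLF), some by only 3, and BH_USERS.Prg by 55. The sketch below shows one way to break such a difference down, assuming the original and converted files are available as bytes. The function name `explain_growth` and its result keys are hypothetical. It only does the arithmetic and makes no claim about what Serge's files actually contain.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def explain_growth(original: bytes, converted: bytes) -> dict:
    """Split the size difference into BOM, added CRLF and the remainder."""
    has_bom = converted.startswith(UTF8_BOM)
    # Count a trailing CRLF as "added" only if the original lacked one.
    added_crlf = converted.endswith(b"\r\n") and not original.endswith(b"\r\n")
    growth = len(converted) - len(original)
    overhead = (3 if has_bom else 0) + (2 if added_crlf else 0)
    return {
        "growth": growth,
        "bom": has_bom,
        "added_crlf": added_crlf,
        # Whatever is left comes from characters that need more than
        # one byte in UTF-8 (or from other changes to the content).
        "from_non_ascii": growth - overhead,
    }
```

Under this reading, a growth of exactly 3 would suggest a file that already ended with CRLF, and a growth above 5 would suggest non-ASCII characters in the source.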

Post by **esgici** » Thu Dec 12, 2013 12:12 pm

Thanks for the interest.

Serge:

Sadly, for now I don't have enough time to work on extra features.

For possible future work, please give me a road map for backing up; what kind of backup would be better:

rename original files
move original files to a separate folder
compress original files
... etc

And please think about naming issues when the process is repeated.
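
As one possible answer to the naming question, here is a small Python sketch of the "separate folder" option, where each run gets its own timestamped backup folder so that a repeated conversion never overwrites an earlier backup. The folder naming scheme and the function names are assumptions for illustration only.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_dir_for(folder: Path, now: datetime) -> Path:
    """Pick a backup folder name that does not exist yet."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    candidate = folder / f"backup_{stamp}"
    n = 1
    while candidate.exists():
        # Same second as an earlier run: add a counter.
        candidate = folder / f"backup_{stamp}_{n}"
        n += 1
    return candidate


def backup_files(folder: Path, suffixes=(".prg",), now=None) -> Path:
    """Copy matching files (suffix compared case-insensitively) to a new folder."""
    target = backup_dir_for(folder, now or datetime.now())
    target.mkdir()
    for src in folder.iterdir():
        if src.is_file() and src.suffix.lower() in suffixes:
            shutil.copy2(src, target / src.name)  # copy2 keeps timestamps
    return target
```

Copying rather than renaming leaves the originals in place, so the converter can still find them under their usual names.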

If the only difference between the two formats is the BOM, this means the ANSI file doesn't include foreign (non-English, i.e. non-ASCII) characters.

The cause of the extra CRLF at the end may be something else; anyway, I don't think this is an important problem.

Anyway, thanks for the interest and the kind words.

Viva INTERNATIONAL HMG
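
The remark above about a BOM-only difference can be checked with a few lines of Python. ASCII characters keep their single byte in UTF-8, while characters above 0x7F need two or more. This sketch assumes the ANSI code page is Windows-1252, and the function name `utf8_growth` is hypothetical.

```python
def utf8_growth(ansi: bytes, codepage: str = "cp1252") -> int:
    """Extra bytes UTF-8 needs for this text, not counting any BOM."""
    return len(ansi.decode(codepage).encode("utf-8")) - len(ansi)


# Pure ASCII source text: UTF-8 output is byte-for-byte identical.
plain = b'MsgInfo( "Hello" )'
assert plain.isascii() and utf8_growth(plain) == 0

# One accented letter costs one extra byte, the euro sign costs two.
assert utf8_growth(b"\xf1") == 1   # n with tilde, 2 bytes in UTF-8
assert utf8_growth(b"\x80") == 2   # euro sign in cp1252, 3 bytes in UTF-8
```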

Post by **srvet_claudio** » Thu Dec 12, 2013 1:26 pm

esgici wrote:Hi All

After OEM -> ANSI, we are now in another migration: ANSI -> Unicode.

We have string-based conversion functions but, as far as I know, no file-based one.

I guess many people can't start that migration because converting program source files manually is difficult.

I hope this small program (whether or not it counts as a utility) will be useful to our friends.

As always, please don't forgive my faults; any suggestions, corrections and bug reports are welcome.

Viva HMG

Very Nice Friend!!!

Post by **mustafa** » Thu Dec 12, 2013 5:43 pm

Hello friends,
First of all, congratulations to Esgici on your program.

I admit that we old "dinosaurs" who come from Clipper Summer '87 and dBfast find retraining hard. Personally, I almost never use the IDE or FMG files; I have always written my programs with Notepad, which saves as ANSI by default, and I am now running tests with UTF-8 on the new HMG 3.2.
A file saved as ANSI does not show the characters "& Ñ ñ € % $ @ #" correctly once compiled; if I re-save it as UTF-8, they all come out right.

That is without using SET CODEPAGE TO SPANISH.

All except the & character: I don't know whether CHR(068) has to be used, but still nothing is shown.

Esgici mentions "UT8 with BOM", which Notepad does not offer; it only has "UTF-8". Is that the same thing?

Do I have to convert all my source code from ANSI to UT8 with BOM? In Notepad++ I did see the options "Encode in UTF-8 without BOM" and "Encode in UTF-8".

The same file:
ANSI (Notepad) --------------> 1994 bytes
UTF-8 (Esgici) --------------> 2003 bytes
UTF-8 (Notepad) -------------> 2002 bytes

Forgive my ignorance, but however many posts I have read on this subject, I still don't understand whether new source code has to be saved as UTF-8 in order to work correctly when compiled with HMG 3.2.

Thanks and regards,
Mustafa
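
On the "UTF-8" versus "UTF-8 with BOM" question: the two differ only by the three marker bytes ef bb bf at the start of the file; the encoded text after them is identical. The Python sketch below illustrates this with the standard `utf-8` and `utf-8-sig` codecs. It says nothing about which variant a particular editor writes for a given menu option; that would need to be checked in the editor itself, for example by comparing file sizes as in the post above.

```python
import codecs

text = "Ñ ñ €"

without_bom = text.encode("utf-8")      # plain UTF-8
with_bom = text.encode("utf-8-sig")     # same bytes, preceded by the BOM

assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert with_bom == codecs.BOM_UTF8 + without_bom


def has_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def read_text(data: bytes) -> str:
    """Decode UTF-8 with or without BOM; utf-8-sig drops the BOM if present."""
    return data.decode("utf-8-sig")
```

Decoding a BOM-prefixed file with the plain `utf-8` codec still works, but leaves the character U+FEFF at the start of the text, which is why `read_text` uses `utf-8-sig`.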

Post by **mol** » Thu Dec 12, 2013 6:55 pm

Clipper and Harbour add chr(26) (EOF) to the end of the file. That is the reason for the different lengths of the resulting files.
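
Whether a given file really carries such a trailing chr(26) is easy to check. The following Python sketch assumes the marker, if present, is the very last byte; the function name `strip_dos_eof` is hypothetical.

```python
DOS_EOF = b"\x1a"  # chr(26), the old DOS end-of-file marker


def strip_dos_eof(data: bytes) -> bytes:
    """Remove a single trailing chr(26) if the data ends with one."""
    return data[:-1] if data.endswith(DOS_EOF) else data
```

Comparing `len(data)` before and after the call shows whether the one-byte difference comes from this marker.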

Post by **mustafa** » Thu Dec 12, 2013 7:28 pm

Hello Mol,
I think the code for & is CHR(038).
It doesn't show anything either, with the file saved as UTF-8.
Thanks,
Mustafa

Post by **mustafa** » Thu Dec 12, 2013 7:55 pm

Hello Mol,
Curiously, if you write:
@ 210,100 LABEL Label_c VALUE "ampersand "+ chr(038) WIDTH 290 HEIGHT 25 FONT "ARIAL" SIZE 14
only this is shown ------------> ampersand (the & symbol does not appear)
but if you write:
@ 310,100 LABEL Label_d VALUE "ampersand "+"&" + chr(038) WIDTH 290 HEIGHT 25 FONT "ARIAL" SIZE 14
it comes out right --------> ampersand &
File saved as UTF-8.
Curious.
Mustafa
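
A possible explanation, offered as an assumption rather than a confirmed fact about HMG: standard Windows label (static) controls treat a single & as a marker for a keyboard shortcut and do not draw it, while && is drawn as one literal &. If the HMG LABEL is such a control, the behaviour above has nothing to do with UTF-8. The Python sketch below is a simplified model of that rule, with hypothetical function names.

```python
def escape_ampersands(caption: str) -> str:
    """Double every & so that a label draws it literally."""
    return caption.replace("&", "&&")


def displayed_text(caption: str) -> str:
    """Simplified model of prefix processing in a Windows label."""
    out = []
    i = 0
    while i < len(caption):
        if caption[i] == "&":
            if i + 1 < len(caption) and caption[i + 1] == "&":
                out.append("&")  # "&&" is drawn as a single "&"
                i += 2
            else:
                i += 1           # a lone "&" is a marker and is not drawn
        else:
            out.append(caption[i])
            i += 1
    return "".join(out)


# The two cases from the post: chr(38) is "&".
assert displayed_text("ampersand " + chr(38)) == "ampersand "
assert displayed_text("ampersand " + "&" + chr(38)) == "ampersand &"
```

Under this model, passing the caption through `escape_ampersands` before assigning it to the label would show every & as typed.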

HMGforum.com
