ANSI -> UT8 Conversion
Moderator: Rathinagiri
- esgici
- Posts: 4543
- Joined: Wed Jul 30, 2008 9:17 pm
- DBs Used: DBF
- Location: iskenderun / Turkiye
ANSI -> UT8 Conversion
Hi All,
After OEM -> ANSI, we are now in another migration: ANSI -> Unicode.
We have string-based conversion functions but, as far as I know, no file-based one.
I guess many people can't start that migration because converting program source files manually is difficult.
I hope this small program (whether or not it counts as a utility) will be useful to our friends.
As always, please don't forgive my faults; any suggestions, corrections, and bug reports are welcome.
Viva HMG
Viva INTERNATIONAL HMG
-
- Posts: 1275
- Joined: Tue Sep 03, 2013 4:22 am
- Location: Tecámac, México
Re: ANSI -> UT8 Conversion
Thanks for sharing, Mr. Esgici.
Regards
- serge_girard
- Posts: 3169
- Joined: Sun Nov 25, 2012 2:44 pm
- DBs Used: 1 MySQL - MariaDB, 2 DBF
- Location: Belgium
Re: ANSI -> UT8 Conversion
Thanks Esgici!
I'm busy converting 12 PRG files (242 KB in total) to UTF-8.
I have a few suggestions:
1) Please make a backup before conversion.
2) Show progress over files as well as over lines, instead of only over individual lines.
Greetings, and I will let you know the results!
Thanks, Serge
There's nothing you can do that can't be done...
- serge_girard
- Posts: 3169
- Joined: Sun Nov 25, 2012 2:44 pm
- DBs Used: 1 MySQL - MariaDB, 2 DBF
- Location: Belgium
Re: ANSI -> UT8 Conversion
Esgici,
Conversion went well (a bit slow).
I did a file compare: all converted (= new) files have 3 extra bytes at the very beginning, ef bb bf (= BOM),
and an extra 0d 0a (= CRLF) at the end.
So all looks very good. Later I will try compilation in HMG 3.2; I will let you know.
Greetings, Serge
Code:
20131212 09:22:18
Folder : P:\hmg.3.2\KEMP
ANALYSE.TXT : 101 lines, 1,757 bytes converted to UT8 format in 1,762 bytes.
BH_DOC.Prg : 534 lines, 12,841 bytes converted to UT8 format in 12,846 bytes.
BH_DOWNLOADS.Prg : 302 lines, 7,357 bytes converted to UT8 format in 7,360 bytes.
BH_EXE.Prg : 415 lines, 10,444 bytes converted to UT8 format in 10,449 bytes.
BH_PROG_AUTH.Prg : 677 lines, 18,935 bytes converted to UT8 format in 18,940 bytes.
bh_proj.Prg : 1,564 lines, 42,694 bytes converted to UT8 format in 42,699 bytes.
BH_TEXT.Prg : 571 lines, 13,446 bytes converted to UT8 format in 13,449 bytes.
BH_USERS.Prg : 1,193 lines, 36,291 bytes converted to UT8 format in 36,346 bytes.
HALLOCKS.PRG : 526 lines, 10,236 bytes converted to UT8 format in 10,239 bytes.
INIT.PRG : 632 lines, 16,252 bytes converted to UT8 format in 16,255 bytes.
KEMP.PRG : 2,486 lines, 65,370 bytes converted to UT8 format in 65,378 bytes.
KEMP_HELP.PRG : 101 lines, 1,827 bytes converted to UT8 format in 1,832 bytes.
KEMP_SETUP.prg : 540 lines, 12,338 bytes converted to UT8 format in 12,341 bytes.
20131212 10:09:13
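The byte counts in this log can be compared mechanically. The following is a minimal Python sketch (not part of Esgici's HMG program) that parses a few of the log lines above and prints how many bytes each conversion added. The `guess` helper is hypothetical: it only encodes the reading suggested in this thread (3 bytes for the BOM, 2 more for a CRLF, anything beyond that presumably from characters that need more than one byte in UTF-8), which the log itself does not confirm.

```python
import re

# A few lines copied from the log above (thousands separators included).
LOG = """\
ANALYSE.TXT : 101 lines, 1,757 bytes converted to UT8 format in 1,762 bytes.
BH_DOWNLOADS.Prg : 302 lines, 7,357 bytes converted to UT8 format in 7,360 bytes.
BH_USERS.Prg : 1,193 lines, 36,291 bytes converted to UT8 format in 36,346 bytes.
KEMP.PRG : 2,486 lines, 65,370 bytes converted to UT8 format in 65,378 bytes.
"""

LINE = re.compile(
    r"^(\S+) : [\d,]+ lines, ([\d,]+) bytes converted to UT8 format in ([\d,]+) bytes\.$"
)


def size_deltas(log_text):
    """Return {file name: bytes added by the conversion} for each log line."""
    deltas = {}
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, before, after = match.groups()
            deltas[name] = int(after.replace(",", "")) - int(before.replace(",", ""))
    return deltas


def guess(delta):
    """Hypothetical reading of a delta; an assumption, not stated by the log."""
    if delta == 3:
        return "BOM only"
    if delta == 5:
        return "BOM + CRLF"
    return "BOM plus %d other bytes" % (delta - 3)


for name, delta in size_deltas(LOG).items():
    print(name, delta, guess(delta))
```

Files whose size grows by more than 5 bytes, such as BH_USERS.Prg, are the ones most likely to contain non-ASCII characters and are worth checking by eye after conversion.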
There's nothing you can do that can't be done...
- esgici
- Posts: 4543
- Joined: Wed Jul 30, 2008 9:17 pm
- DBs Used: DBF
- Location: iskenderun / Turkiye
Re: ANSI -> UT8 Conversion
Thanks to everyone for the interest.
Serge:
Sadly, for now I don't have enough time to work on extra features.
For possible future work, please give me a road map for backing up. What kind of backup would be better:
- rename the original files
- move the original files to a separate folder
- compress the original files
- ... etc.
If the BOM is the only difference between the two formats, this means the ANSI file doesn't contain any foreign (non-English) characters.
The cause of the extra CRLF at the end may be something else; anyway, I don't think this is an important problem.
Anyway, thanks for the interest and the kind words.
Viva INTERNATIONAL HMG
Viva INTERNATIONAL HMG
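For readers who want a backup step today, here is a minimal Python sketch of the "copy the originals to a separate folder" option followed by the conversion itself. It is not Esgici's program. The function name `convert_with_backup`, the backup folder layout, and the choice of Windows-1252 (`cp1252`) as the "ANSI" code page are all assumptions; a Turkish or other non-Western system would need a different code page.

```python
import codecs
import shutil
from pathlib import Path


def convert_with_backup(source, backup_dir, ansi_codepage="cp1252"):
    """Copy `source` into `backup_dir`, then rewrite it as UTF-8 with a BOM.

    Returns the path of the backup copy. `ansi_codepage` is an assumption
    about what "ANSI" means on the machine that produced the file.
    """
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    backup = backup_dir / source.name
    shutil.copy2(source, backup)             # keep the untouched original

    raw = source.read_bytes()
    if raw.startswith(codecs.BOM_UTF8):      # already has a BOM: leave it alone
        return backup

    text = raw.decode(ansi_codepage)         # ANSI bytes -> Unicode text
    source.write_bytes(codecs.BOM_UTF8 + text.encode("utf-8"))
    return backup
```

Working on raw bytes rather than in text mode keeps the existing line endings untouched, so the only size changes come from the BOM and from characters that need more than one byte in UTF-8.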
- srvet_claudio
- Posts: 2193
- Joined: Thu Feb 25, 2010 8:43 pm
- Location: Uruguay
Re: ANSI -> UT8 Conversion
Very nice, friend!!!
- mustafa
- Posts: 1162
- Joined: Fri Mar 20, 2009 11:38 am
- DBs Used: DBF
- Location: Alicante - Spain
Re: ANSI -> UT8 Conversion
Hello friends,
First of all, congratulations to Esgici on your program.
We old "dinosaurs" who come from Clipper Summer '87 and dBfast find it hard to retrain, I admit.
Personally, I almost never use the IDE or FMG files; I have always written my programs with Notepad, which saves as ANSI by default.
I am running tests with ANSI and UTF-8 on the new HMG 3.2.
A program compiled from an ANSI file does not display the characters "& Ñ ñ € % $ @ #" correctly; if I convert the file to UTF-8, they all come out right,
without setting SET CODEPAGE TO SPANISH.
All except ---> & ; I don't know whether CHR(068) has to be used, but still nothing is displayed.
Esgici mentions "UT8 with BOM", which Notepad does not offer; it only has UTF-8. Is that the same thing?
Do I have to convert all my source code from ANSI to UTF-8 with BOM? In Notepad++ I did see the options
"Encode in UTF-8 Without BOM" and "Encode in UTF-8".
The same file:
ANSI, Notepad --------------> 1994 bytes
UTF-8, Esgici --------------> 2003 bytes
UTF-8, Notepad -------------> 2002 bytes
Forgive my ignorance, but however many posts I read on this subject, I still don't understand whether
new source code has to be saved as UTF-8 in order to work correctly when compiled with HMG 3.2.
Thanks and regards,
Mustafa
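The difference between "UTF-8" and "UTF-8 with BOM" asked about above can be seen at the byte level. This is a small Python sketch, not HMG code, and it assumes that "ANSI" means Windows-1252. It encodes the characters from the post three ways and prints the resulting sizes: the BOM variant is the plain UTF-8 bytes with the three bytes EF BB BF in front.

```python
SAMPLE = "& Ñ ñ € % $ @ #"   # the characters from the post above

ansi = SAMPLE.encode("cp1252")          # assumption: "ANSI" means Windows-1252
utf8 = SAMPLE.encode("utf-8")           # UTF-8 without BOM
utf8_bom = SAMPLE.encode("utf-8-sig")   # UTF-8 with BOM (EF BB BF prefix)

print(len(ansi), len(utf8), len(utf8_bom))   # 15 19 22
print(utf8_bom[:3].hex(" "))                 # ef bb bf
```

Ñ and ñ take two bytes each in UTF-8 and € takes three, which is why a file with such characters grows by more than the BOM alone.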
Re: ANSI -> UT8 Conversion
Clipper and Harbour add chr(26) (EOF) to the end of the file; that is the reason for the different lengths of the resulting files.
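This explanation is easy to check on a converted file. The Python sketch below is an illustration only; the helper name `strip_trailing_eof` is hypothetical. It tests whether the data ends in chr(26) and removes that single byte, so two conversions of the same source can be compared without the marker.

```python
EOF_MARKER = b"\x1a"   # chr(26), also known as Ctrl-Z


def strip_trailing_eof(data):
    """Return `data` without one trailing chr(26), if one is present."""
    return data[:-1] if data.endswith(EOF_MARKER) else data


# Two made-up byte strings standing in for the same text written by two
# different tools, one of which appended the EOF marker.
with_marker = b"\xef\xbb\xbfHello\r\n\x1a"
without_marker = b"\xef\xbb\xbfHello\r\n"

print(len(with_marker) - len(without_marker))
print(strip_trailing_eof(with_marker) == without_marker)
```

If the two UTF-8 files still differ after stripping the marker, the one-byte gap has some other cause.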
- mustafa
- Posts: 1162
- Joined: Fri Mar 20, 2009 11:38 am
- DBs Used: DBF
- Location: Alicante - Spain
Re: ANSI -> UT8 Conversion
Hello Mol,
I think the code for ----> & is CHR(038).
It doesn't display anything either, with the file saved as UTF-8.
Thanks,
Mustafa
- mustafa
- Posts: 1162
- Joined: Fri Mar 20, 2009 11:38 am
- DBs Used: DBF
- Location: Alicante - Spain
Re: ANSI -> UT8 Conversion
Hello Mol,
Curiously, if you write:
@ 210,100 LABEL Label_c VALUE "ampersand "+ chr(038) WIDTH 290 HEIGHT 25 FONT "ARIAL" SIZE 14
the output is only ------------> ampersand ; the & symbol does not appear.
But if you write:
@ 310,100 LABEL Label_d VALUE "ampersand "+"&" + chr(038) WIDTH 290 HEIGHT 25 FONT "ARIAL" SIZE 14
the output is correct --------> ampersand &
File saved as UTF-8.
Curious.
Mustafa
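One possible explanation, offered as an assumption and not as a confirmed description of HMG's LABEL control: standard Windows text controls treat a single & as a mnemonic prefix (it underlines the next character and is not shown itself) and display && as one literal &. Under that assumption the behaviour has nothing to do with UTF-8. The Python sketch below only simulates that rule; `render_with_prefix_rule` and `escape_ampersands` are hypothetical helper names, not HMG or Windows functions.

```python
def render_with_prefix_rule(text):
    """Mimic how a control that treats "&" as a mnemonic prefix shows `text`."""
    shown = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            if i + 1 < len(text) and text[i + 1] == "&":
                shown.append("&")   # "&&" is displayed as one literal "&"
                i += 2
            else:
                i += 1              # a lone "&" is consumed as the prefix marker
        else:
            shown.append(text[i])
            i += 1
    return "".join(shown)


def escape_ampersands(text):
    """Double every "&" so that it survives the prefix rule as a literal "&"."""
    return text.replace("&", "&&")


print(repr(render_with_prefix_rule("ampersand " + chr(38))))
print(repr(render_with_prefix_rule("ampersand " + "&" + chr(38))))
print(repr(render_with_prefix_rule(escape_ampersands("Tom & Jerry"))))
```

The first two calls mirror the Label_c and Label_d values from the post: a lone chr(038) vanishes, while "&" + chr(038) is shown as a single &.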