I submitted similar proposals in November 2014 for HMG 3.3.1 (http://hmgforum.com/viewtopic.php?f=43&t=4071&start=11), and in September 2015 for HMG 3.4.1 (http://hmgforum.com/viewtopic.php?f=43&t=4471&start=38), but each time only a small portion of the changes I proposed was included in the next version. This time I am explaining more, so that it is easier to see how the proposed changes fit together.
These changes are a patch that should be installed on top of version 3.4.2. Individual modified source files are at http://kevincarmody.com/hmg/, and a zip of files in the patch is at http://kevincarmody.com/hmg/HmgChangeProposal.zip.
This patch includes an overhauled Rich Edit demo, which uses all the source code changes, except for the HasNonAnsiChars property and the SelPasteSpecial method. The new Rich Edit demo is at http://kevincarmody.com/hmg/SAMPLES/Con ... chEditBox/, including the executable at http://kevincarmody.com/hmg/SAMPLES/Con ... x/demo.exe.
New rich edit control property HASNONASCIICHARS (read only)
- Detects whether a rich edit control contains non-ASCII Unicode characters.
http://kevincarmody.com/hmg/INCLUDE/i_window.ch - line 175
Code: Select all
;; /*
Following line modified by Kevin Carmody, October 2015
It adds the HasNonAsciiChars and HasNonAnsiChars properties to the rich edit box control.
HasNonAsciiChars detects whether a rich edit control contains non-ASCII Unicode characters.
HasNonAnsiChars detects whether a rich edit control contains non-ANSI Unicode characters.
See
_RichEditBox_GetProperty() in SOURCE\h_controlmisc.prg
RichEditBox_HasNonAsciiChars() and RichEditBox_HasNonAnsiChars() in SOURCE\h_richeditbox.prg
HMG_IsNonASCII() and HMG_UTF8IsNonANSI() in SOURCE\h_UNICODE_String.prg
*/ ;;
#xtranslate <w>. \<c\> . \<p:RTFTextMode,AutoURLDetect,Zoom,SelectRange,CaretPos,Value,GetSelectText,GetTextLength,ViewRect,HasNonAsciiChars,HasNonAnsiChars\> => GetProperty ( <"w">, \<"c"\> , \<"p"\> ) ;;
Code: Select all
/*
Following two cases added by Kevin Carmody, October 2015
They add the HasNonAsciiChars and HasNonAnsiChars properties to the rich edit box control.
HasNonAsciiChars detects whether a rich edit control contains non-ASCII Unicode characters.
HasNonAnsiChars detects whether a rich edit control contains non-ANSI Unicode characters.
See
HasNonAsciiChars and HasNonAnsiChars translations in SOURCE\i_window.ch
RichEditBox_HasNonAsciiChars() and RichEditBox_HasNonAnsiChars() in SOURCE\h_richeditbox.prg
HMG_IsNonASCII() and HMG_UTF8IsNonANSI() in SOURCE\h_UNICODE_String.prg
*/
CASE Arg3 == "HASNONASCIICHARS"
xData := RichEditBox_HasNonAsciiChars ( hWndControl )
RetVal := .T.
CASE Arg3 == "HASNONANSICHARS"
xData := RichEditBox_HasNonAnsiChars ( hWndControl )
RetVal := .T.
http://kevincarmody.com/hmg/SOURCE/h_richeditbox.prg - lines 532-538
Code: Select all
/*
Following function added by Kevin Carmody, October 2015
This function tests for the presence of non-ASCII characters in a rich
edit control. For efficiency, it does not distinguish between non-ASCII
ANSI and non-ASCII Unicode.
See
HasNonAsciiChars translation in SOURCE\i_window.ch
_RichEditBox_GetProperty() in SOURCE\h_controlmisc.prg
HMG_IsNonASCII() in SOURCE\h_UNICODE_String.prg
*/
*-----------------------------------------------------------------------------*
FUNCTION RichEditBox_HasNonAsciiChars( hWndControl )
*-----------------------------------------------------------------------------*
LOCAL cBuffer := RichEditBox_GetText( hWndControl, .N. )
RETURN HMG_IsNonASCII( cBuffer, .N. )
Determines whether a string contains any non-ASCII characters.
http://kevincarmody.com/hmg/SOURCE/h_UNICODE_String.prg - lines 329-343
Code: Select all
/*
Following function added by Kevin Carmody, October 2015
Determines whether a string contains any non-ASCII characters.
See
HasNonAsciiChars translation in SOURCE\i_window.ch
_RichEditBox_GetProperty() in SOURCE\h_controlmisc.prg
RichEditBox_HasNonAsciiChars() in SOURCE\h_richeditbox.prg
*/
FUNCTION HMG_IsNonASCII( cString )
LOCAL lNonASCII := .F.
LOCAL cChar
BEGIN SEQUENCE
FOR EACH cChar IN cString
IF cChar >= CHR( 0x80 )
lNonASCII := .T.
BREAK
ENDIF
NEXT
END SEQUENCE
RETURN lNonASCII
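The same test can be written as a C byte scan. This is only a sketch for illustration, not part of the patch: the name has_non_ascii is hypothetical, and the buffer is assumed to hold ANSI or UTF-8 bytes, where any byte of 0x80 or above marks a non-ASCII character.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helper, not part of HMG: returns true if any byte in
   buf[0..len) is 0x80 or above, i.e. the text is not pure ASCII. */
bool has_non_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= 0x80)
            return true;    /* stop at the first non-ASCII byte */
    }
    return false;
}
```

Like HMG_IsNonASCII(), the scan stops at the first hit, so the cost is proportional to the length of the leading ASCII run.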
http://kevincarmody.com/hmg/SOURCE/h_richeditbox.prg - lines 555-561
Code: Select all
/*
Following function added by Kevin Carmody, October 2015
This function tests for the presence of non-ANSI characters in a rich
edit control. It is slower than RichEditBox_HasNonAsciiChars but does
not reject any Unicode characters that are in ANSI.
See
HasNonAnsiChars translation in SOURCE\i_window.ch
_RichEditBox_GetProperty() in SOURCE\h_controlmisc.prg
HMG_UTF8IsNonANSI() in SOURCE\h_UNICODE_String.prg
*/
*-----------------------------------------------------------------------------*
FUNCTION RichEditBox_HasNonAnsiChars( hWndControl )
*-----------------------------------------------------------------------------*
LOCAL cBuffer := RichEditBox_GetText( hWndControl, .N. )
RETURN HMG_UTF8IsNonANSI( cBuffer )
Determines whether a UTF-8 string contains any non-ANSI characters.
http://kevincarmody.com/hmg/SOURCE/h_UNICODE_String.prg - lines 358-348
Code: Select all
/*
Following function added by Kevin Carmody, October 2015
Determines whether a UTF-8 string contains any non-ANSI characters.
It does not check whether the string is valid UTF-8.
See
HasNonAnsiChars translation in SOURCE\i_window.ch
_RichEditBox_GetProperty() in SOURCE\h_controlmisc.prg
RichEditBox_HasNonAnsiChars() in SOURCE\h_richeditbox.prg
*/
FUNCTION HMG_UTF8IsNonANSI( cUtf8Str )
LOCAL aAnsiTrans := { ;
0x20AC, ; // ANSI 0x80 - EURO SIGN
0x201A, ; // ANSI 0x82 - SINGLE LOW-9 QUOTATION MARK
0x0192, ; // ANSI 0x83 - LATIN SMALL LETTER F WITH HOOK
0x201E, ; // ANSI 0x84 - DOUBLE LOW-9 QUOTATION MARK
0x2026, ; // ANSI 0x85 - HORIZONTAL ELLIPSIS
0x2020, ; // ANSI 0x86 - DAGGER
0x2021, ; // ANSI 0x87 - DOUBLE DAGGER
0x02C6, ; // ANSI 0x88 - MODIFIER LETTER CIRCUMFLEX ACCENT
0x2030, ; // ANSI 0x89 - PER MILLE SIGN
0x0160, ; // ANSI 0x8A - LATIN CAPITAL LETTER S WITH CARON
0x2039, ; // ANSI 0x8B - SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x0152, ; // ANSI 0x8C - LATIN CAPITAL LIGATURE OE
0x017D, ; // ANSI 0x8E - LATIN CAPITAL LETTER Z WITH CARON
0x2018, ; // ANSI 0x91 - LEFT SINGLE QUOTATION MARK
0x2019, ; // ANSI 0x92 - RIGHT SINGLE QUOTATION MARK
0x201C, ; // ANSI 0x93 - LEFT DOUBLE QUOTATION MARK
0x201D, ; // ANSI 0x94 - RIGHT DOUBLE QUOTATION MARK
0x2022, ; // ANSI 0x95 - BULLET
0x2013, ; // ANSI 0x96 - EN DASH
0x2014, ; // ANSI 0x97 - EM DASH
0x02DC, ; // ANSI 0x98 - SMALL TILDE
0x2122, ; // ANSI 0x99 - TRADE MARK SIGN
0x0161, ; // ANSI 0x9A - LATIN SMALL LETTER S WITH CARON
0x203A, ; // ANSI 0x9B - SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x0153, ; // ANSI 0x9C - LATIN SMALL LIGATURE OE
0x017E, ; // ANSI 0x9E - LATIN SMALL LETTER Z WITH CARON
0x0178 } // ANSI 0x9F - LATIN CAPITAL LETTER Y WITH DIAERESIS
LOCAL aAnsiSkip := { ;
0x81, ;
0x8D, ;
0x8F, ;
0x90, ;
0x9D }
LOCAL lNonANSI := .F.
LOCAL nOctets := 0
LOCAL cChar, nChar, nCode
BEGIN SEQUENCE
FOR EACH cChar IN cUtf8Str
nChar := HB_BCODE( cChar )
IF nOctets != 0
--nOctets
nCode := HB_BITOR( HB_BITSHIFT( nCode, 6 ), HB_BITAND( nChar, 0x3F ) )
IF nOctets == 0
DO CASE
CASE nCode >= 0x100
IF ASCAN( aAnsiTrans, nCode ) == 0
lNonANSI := .T.
ENDIF
CASE nCode >= 0xA0
CASE nCode >= 0x80
IF ASCAN( aAnsiSkip, nCode ) == 0
lNonANSI := .T.
ENDIF
ENDCASE
ENDIF
ELSEIF HB_BITAND( nChar, 0x80 ) != 0
DO WHILE HB_BITAND( nChar, 0x80 ) != 0
nChar := HB_BITAND( HB_BITSHIFT ( nChar, 1 ), 0xFF )
++nOctets
ENDDO
--nOctets
nCode := HB_BITAND( HB_BCODE( cChar ), HB_BITSHIFT( 1, 6 - nOctets ) - 1 ) // lead byte of an n-octet sequence carries 7-n payload bits; nOctets is n-1 here
ENDIF
NEXT
END SEQUENCE
RETURN lNonANSI
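For comparison, here is a sketch of the same check in C. It is illustrative only and not part of the patch: the names cp1252_extra, is_cp1252, and utf8_has_non_ansi are hypothetical, and "ANSI" is assumed to mean Windows code page 1252, as in the table above. Like HMG_UTF8IsNonANSI(), it does not validate the UTF-8 input.

```c
#include <stddef.h>
#include <stdbool.h>
#include <stdint.h>

/* Code points above U+00FF that code page 1252 maps to bytes 0x80-0x9F. */
static const uint16_t cp1252_extra[] = {
    0x20AC, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6, 0x2030,
    0x0160, 0x2039, 0x0152, 0x017D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022,
    0x2013, 0x2014, 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x017E, 0x0178
};

/* Hypothetical helper: true if the code point has a code page 1252 byte. */
static bool is_cp1252(uint32_t cp)
{
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
        return true;
    /* Bytes 0x81, 0x8D, 0x8F, 0x90, 0x9D are unassigned and pass through. */
    if (cp == 0x81 || cp == 0x8D || cp == 0x8F || cp == 0x90 || cp == 0x9D)
        return true;
    for (size_t i = 0; i < sizeof cp1252_extra / sizeof cp1252_extra[0]; i++)
        if (cp1252_extra[i] == cp)
            return true;
    return false;
}

/* Hypothetical helper: true if the UTF-8 text holds a non-ANSI character. */
bool utf8_has_non_ansi(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i++];
        uint32_t cp;
        int extra;                       /* continuation bytes to read */
        if (b < 0x80)       { cp = b;        extra = 0; }
        else if (b >= 0xF0) { cp = b & 0x07; extra = 3; }
        else if (b >= 0xE0) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xC0) { cp = b & 0x1F; extra = 1; }
        else continue;                   /* stray continuation byte: skipped */
        while (extra-- > 0 && i < len)
            cp = (cp << 6) | (s[i++] & 0x3F);
        if (!is_cp1252(cp))
            return true;                 /* stop at the first non-ANSI character */
    }
    return false;
}
```

The lead-byte masks (0x1F, 0x0F, 0x07) are the point to watch: a two-octet lead byte carries five payload bits, a three-octet lead byte four, and a four-octet lead byte three.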
http://kevincarmody.com/hmg/INCLUDE/i_window.ch - lines 195-196, 203
Code: Select all
;; /*
Following 2 lines modified by Kevin Carmody, October 2015
They add the LoadFile and SaveFile methods to the rich edit box control
as synonyms for the RTFLoadFile and RTFSaveFile methods.
See
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RichEditBox_LoadFile() and RichEditBox_SaveFile() in SOURCE\h_richeditbox.prg
RichEditBox_StreamIn() and RichEditBox_StreamOut() in SOURCE\c_richeditbox.c
HMG_UTF16ByteSwap() in SOURCE\h_UNICODE_String.prg
*/ ;;
#xtranslate <w>. \<c\> . \<p:RTFLoadFile,RTFSaveFile,LoadFile,SaveFile\> (\<arg1\>,\<arg2\>,\<arg3\>) => DoMethod ( <"w">, \<"c"\> , \<"p"\> , \<arg1\> , \<arg2\>, \<arg3\> ) ;;
#xtranslate <w>. \<c\> . \<p:RTFLoadFile,RTFSaveFile,LoadFile,SaveFile\> (\<arg1\>,\<arg2\>) => DoMethod ( <"w">, \<"c"\> , \<"p"\> , \<arg1\> , \<arg2\> ) ;;
;; /*
Following line added by Kevin Carmody, October 2015
It allows the RTFLoadFile, RTFSaveFile, LoadFile, and SaveFile methods to be called with one argument,
the file name. The second argument, lSelection (RichEditBox_LoadFile() in h_richeditbox.prg), defaults to .F.
*/ ;;
#xtranslate <w>. \<c\> . \<p:RTFLoadFile,RTFSaveFile,LoadFile,SaveFile\> (\<arg1\>) => DoMethod ( <"w">, \<"c"\> , \<"p"\> , \<arg1\> ) ;;
Code: Select all
/*
Following two cases modified by Kevin Carmody, October 2015
These cases use the xData argument defined above to return a value from
the Rich Edit methods RTFLoadFile, LoadFile, RTFSaveFile, and SaveFile.
*/
CASE Arg3 == HMG_UPPER ("RTFLoadFile") .OR. Arg3 == HMG_UPPER ("LoadFile")
xData := RichEditBox_LoadFile ( hWndControl, Arg4, Arg5, Arg6 ) // by default load in SF_RTF format
RetVal := .T.
CASE Arg3 == HMG_UPPER ("RTFSaveFile") .OR. Arg3 == HMG_UPPER ("SaveFile")
xData := RichEditBox_SaveFile ( hWndControl, Arg4, Arg5, Arg6 ) // by default save in SF_RTF format
RetVal := .T.
- Pastes the clipboard into a rich edit box control using a specified format.
- Clipboard formats CF_* are declared in <winuser.h> and documented in MSDN.
Code: Select all
;; /*
Following line added by Kevin Carmody, October 2015
It adds the SelPasteSpecial method to the rich edit box control.
This method pastes the clipboard into a rich edit box control using a specified format.
See
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RichEditBox_PasteSpecial() in SOURCE\c_richeditbox.c
*/ ;;
#xtranslate <w>. \<c\> . \<p:SelPasteSpecial\> (\<arg1\>) => DoMethod ( <"w">, \<"c"\> , \<"p"\> , \<arg1\> ) ;;
Code: Select all
CASE Arg3 == HMG_UPPER ("SelPasteSpecial")
RichEditBox_PasteSpecial ( hWndControl, Arg4 )
RetVal := .T.
- Skips over byte order marks in Unicode text files.
- Supports UTF-16 BE text files (big endian Unicode text file).
- The RTFUTF8 file type has been removed and the UTF-16 BE file type has been added. See below.
The file types that RtfLoadFile and RtfSaveFile accept have been changed. The type parameter now accepts one of the following constants:
- RICHEDITFILE_RTF - RTF file
- RICHEDITFILE_TEXTANSI - ANSI text file
- RICHEDITFILE_TEXTUTF8 - UTF-8 text file
- RICHEDITFILE_TEXTUTF16LE - UTF-16 LE (little endian) text file
- RICHEDITFILE_TEXTUTF16BE - UTF-16 BE (big endian) text file
- RICHEDITFILE_TEXT has been renamed to RICHEDITFILE_TEXTANSI for clarity.
- RICHEDITFILE_TEXTUTF16BE has been added because RtfLoadFile and RtfSaveFile now support UTF-16 BE text files. UTF-16 BE files are supported by Notepad and many other word processing applications.
- RICHEDITFILE_TEXTUTF16 has been renamed to RICHEDITFILE_TEXTUTF16LE to clearly distinguish it from RICHEDITFILE_TEXTUTF16BE.
- RICHEDITFILE_RTFUTF8 is unnecessary and has been removed. Although the Windows EM_STREAMIN and EM_STREAMOUT messages support it, in practice this file type never occurs, and it is not supported by any standard word processing application that I have ever seen. In RTF files, non-ASCII characters are always written as ASCII escape sequences, e.g. \'e8 for è (CHR(0xE8)) and \u916? for Δ (Greek uppercase delta, U+0394, decimal 916). So UTF-8 on top of RTF is never needed.
Code: Select all
/*
Following 5 #defines modified by Kevin Carmody, October 2015
These constant names have been modified.
RICHEDITFILE_TEXT has been renamed to RICHEDITFILE_TEXTANSI for clarity.
RICHEDITFILE_TEXTUTF16BE has been added because RtfLoadFile now supports
UTF-16 BE text files. UTF-16 BE files are supported by Notepad and
many other word processing applications.
RICHEDITFILE_TEXTUTF16 has been renamed to RICHEDITFILE_TEXTUTF16LE to
clearly distinguish it from RICHEDITFILE_TEXTUTF16BE.
RICHEDITFILE_RTFUTF8 is unnecessary and has been removed. Although
the Windows EM_STREAMIN message supports it, in practice this file
type never occurs, and it is not supported by any standard word
processing application that I have ever seen. In RTF files, non-ASCII
characters are always written as ASCII escape sequences, e.g. \'e8 for
è (CHR(0xE8)) and \u916? for Greek uppercase delta (U+0394, decimal
916). So UTF-8 on top of RTF is never needed.
These values are returned by GetRichEditFileType() and are used by the
LoadFile, RtfLoadFile, SaveFile, and RtfSaveFile rich edit box methods.
See
GetRichEditFileType() in SOURCE\h_richeditbox.prg
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RichEditBox_LoadFile() and RichEditBox_SaveFile() in SOURCE\h_richeditbox.prg
RichEditBox_StreamIn() and RichEditBox_StreamOut() in SOURCE\c_richeditbox.c
HMG_UTF16ByteSwap() in SOURCE\h_UNICODE_String.prg
*/
*****************
* File type *
*****************
#define RICHEDITFILE_TEXTANSI 1 // ANSI text file
#define RICHEDITFILE_TEXTUTF8 2 // UTF-8 text file
#define RICHEDITFILE_TEXTUTF16LE 3 // UTF-16 LE (little endian) text file
#define RICHEDITFILE_RTF 4 // RTF file
#define RICHEDITFILE_TEXTUTF16BE 5 // UTF-16 BE (big endian) text file
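The RTF escape convention described above can be sketched in C. This is an illustration only, not code from the patch: the name rtf_escape is hypothetical, it handles only code points in the Basic Multilingual Plane, and it assumes the \'hh form is used just for U+00A0 to U+00FF, where the code point equals the ANSI (code page 1252) byte.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the RTF form of one BMP code point into out
   and returns the number of characters written (snprintf semantics). */
int rtf_escape(uint16_t cp, char *out, size_t outsize)
{
    if (cp == '\\' || cp == '{' || cp == '}')
        return snprintf(out, outsize, "\\%c", (char)cp);   /* RTF syntax characters */
    if (cp < 0x80)
        return snprintf(out, outsize, "%c", (char)cp);     /* plain ASCII */
    if (cp >= 0xA0 && cp <= 0xFF)
        return snprintf(out, outsize, "\\'%02x", (unsigned)cp);
    /* \uN takes a signed 16-bit decimal; '?' is the fallback character
       for readers that do not understand \u. */
    int n = cp > 32767 ? (int)cp - 65536 : (int)cp;
    return snprintf(out, outsize, "\\u%d?", n);
}
```

Every output character is 7-bit ASCII, which is why an RTF file never needs a UTF-8 layer on top.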
RtfLoadFile calls RICHEDITBOX_LOADFILE(), which has been enhanced.
- Skips over byte order marks in Unicode text files.
- Supports UTF-16 BE file type. This function supports UTF-16 BE text files by using HMG_UTF16ByteSwap() to first convert the UTF-16 BE file to UTF-16 LE and then calling RichEditBox_StreamIn() on the UTF-16 LE file.
Code: Select all
/*
Following function modified by Kevin Carmody, October 2015
Changes
Skips over byte order marks in Unicode text files.
Supports UTF-16 BE text files (big endian Unicode text file).
This function supports UTF-16 BE text files by using HMG_UTF16ByteSwap()
to first convert the UTF-16 BE file to UTF-16 LE and then calling
RichEditBox_StreamIn() on the UTF-16 LE file.
See
RICHEDITFILE_* constants in SOURCE\i_richeditbox.ch
RTFLoadFile and LoadFile translations in SOURCE\i_window.ch
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RichEditBox_StreamIn() in SOURCE\c_richeditbox.c
HMG_UTF16ByteSwap() in SOURCE\h_UNICODE_String.prg
*/
*-----------------------------------------------------------------------------*
FUNCTION RichEditBox_LoadFile( hWndControl, cFile, lSelection, nType )
*-----------------------------------------------------------------------------*
LOCAL lSuccess := .F.
LOCAL cTempFile
IF ValType( lSelection ) <> "L"
lSelection := .F.
ENDIF
IF ValType( nType ) <> "N"
nType := RICHEDITFILE_RTF
ENDIF
IF RichEditBox_RTFLoadResourceFile( hWndControl, cFile, lSelection )
lSuccess := .T.
ELSE
IF nType == RICHEDITFILE_TEXTUTF16BE
cTempFile := GETTEMPFOLDER() + "_RichEditLoadFile.txt"
lSuccess := HMG_UTF16ByteSwap( cFile, cTempFile )
IF lSuccess
lSuccess := RichEditBox_StreamIn( hWndControl, cTempFile, lSelection, RICHEDITFILE_TEXTUTF16LE )
ENDIF
DELETE FILE ( cTempFile )
ELSE
lSuccess := RichEditBox_StreamIn( hWndControl, cFile, lSelection, nType )
ENDIF
ENDIF
Return lSuccess
Converts between UTF-16 LE and UTF-16 BE files.
http://kevincarmody.com/hmg/SOURCE/h_UNICODE_String.prg - lines 454-498
Code: Select all
/*
Following function added by Kevin Carmody, October 2015
Converts between UTF-16 LE and UTF-16 BE files.
See
RichEditBox_LoadFile() and RichEditBox_SaveFile() in SOURCE\h_richeditbox.prg
RICHEDITFILE_* constants in SOURCE\i_richeditbox.ch
RichEditBox_StreamIn() and RichEditBox_StreamOut() in SOURCE\c_richeditbox.c
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RTFLoadFile, LoadFile, RTFSaveFile, SaveFile translations in SOURCE\i_window.ch
*/
*-----------------------------------------------------------------------------*
FUNCTION HMG_UTF16ByteSwap( cInFile, cOutFile )
*-----------------------------------------------------------------------------*
LOCAL hInFile := FOPEN( cInFile , FO_READ )
LOCAL hOutFile := FCREATE( cOutFile )
LOCAL cInBuffer := SPACE( 0x400 )
LOCAL nBufRead := 1
LOCAL lSuccess := .N.
LOCAL cOutBuffer, cBytePair, nBufWrite, nByte
BEGIN SEQUENCE
IF hInFile < 0
BREAK
ENDIF
IF hOutFile < 0
BREAK
ENDIF
WHILE nBufRead > 0
cOutBuffer := ""
nBufRead := FREAD( hInFile, @cInBuffer, 0x400 )
IF nBufRead > 0
FOR nByte := 1 TO nBufRead STEP 2
cBytePair := SUBSTR( cInBuffer, nByte, 2 )
cOutBuffer += RIGHT( cBytePair, 1 ) + LEFT( cBytePair, 1 )
NEXT
nBufWrite := FWRITE( hOutFile, cOutBuffer )
IF nBufWrite < nBufRead
BREAK
ENDIF
ENDIF
ENDDO
lSuccess := .Y.
END SEQUENCE
FCLOSE( hInFile )
FCLOSE( hOutFile )
RETURN lSuccess
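The core of the conversion is a pairwise byte swap, which can be sketched in C on an in-memory buffer. This is an illustration only, not part of the patch: the name utf16_byte_swap is hypothetical, and leaving an odd trailing byte untouched is a choice made for this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: swaps each pair of bytes in place, converting
   UTF-16 LE to BE or BE to LE. An odd trailing byte is left unchanged.
   Returns the number of bytes that were swapped. */
size_t utf16_byte_swap(unsigned char *buf, size_t len)
{
    size_t even = len & ~(size_t)1;      /* round down to a whole number of pairs */
    for (size_t i = 0; i < even; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
    return even;
}
```

Because the swap is its own inverse, one routine serves both directions, and a byte order mark in the data is converted along with the text (FF FE becomes FE FF and vice versa).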
Now skips over byte order mark in Unicode text files.
http://kevincarmody.com/hmg/SOURCE/c_richeditbox.c - lines 198-263 (modified 207-209, 216-218, 233-250)
Code: Select all
/*
Following function modified by Kevin Carmody, October 2015
Now skips over byte order marks in Unicode text files.
This function does not directly support UTF-16 BE text files.
RichEditBox_LoadFile() supports them by using HMG_UTF16ByteSwap() to
first convert the UTF-16 BE file to UTF-16 LE and then calling this
function on the UTF-16 LE file.
See
RichEditBox_LoadFile() in SOURCE\h_richeditbox.prg
HMG_UTF16ByteSwap() in SOURCE\h_UNICODE_String.prg
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RTFLoadFile and LoadFile translations in SOURCE\i_window.ch
RICHEDITFILE_* constants in SOURCE\i_richeditbox.ch
*/
// RichEditBox_StreamIn ( hWndControl, cFileName, lSelection, nDataFormat )
HB_FUNC ( RICHEDITBOX_STREAMIN )
{
HWND hWndControl = (HWND) HMG_parnl (1);
TCHAR *cFileName = (TCHAR*) HMG_parc (2);
BOOL lSelection = (BOOL) hb_parl (3);
LONG nDataFormat = (LONG) hb_parnl (4);
HANDLE hFile;
// Following 3 lines added by Kevin Carmody, October 2015
BYTE bUtf8Bom[3];
BYTE bUtf16Bom[2];
DWORD dwRead;
EDITSTREAM es;
LONG Format;
switch( nDataFormat )
{
// Comments in this switch block modified by Kevin Carmody, October 2015
case 1: Format = SF_TEXT; break; // ANSI or UTF-8 with BOM or mixed (UTF-8 BOM is removed, overlong UTF-8 is accepted, invalid UTF-8 is read as ANSI)
case 2: Format = ( CP_UTF8 << 16 ) | SF_USECODEPAGE | SF_TEXT; break; // UTF-8 without BOM (BOM is not removed)
case 3: Format = SF_TEXT | SF_UNICODE; break; // UTF-16 LE without BOM (BOM is not removed)
case 4: Format = SF_RTF; break;
// case 5, UTF-8 RTF, removed by Kevin Carmody, October 2015, because it never occurs
default: Format = SF_RTF; break;
}
if ( lSelection )
Format = Format | SFF_SELECTION;
if( ( hFile = CreateFile (cFileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL )) == INVALID_HANDLE_VALUE )
{ hb_retl (FALSE);
return;
}
// Following switch block added by Kevin Carmody, October 2015
switch( nDataFormat )
{
case 1: break;
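case 2:
if ( ! ReadFile (hFile, bUtf8Bom, 3, &dwRead, NULL) ) // read past BOM if present
{ CloseHandle (hFile); // on a failed read, release the handle and stop
hb_retl (FALSE);
return;
}
if ( ! ( dwRead == 3 && bUtf8Bom[0] == 0xEF && bUtf8Bom[1] == 0xBB && bUtf8Bom[2] == 0xBF ) )
SetFilePointer (hFile, 0, 0, FILE_BEGIN); // no BOM: rewind to the start of the file
break;
case 3:
if ( ! ReadFile (hFile, bUtf16Bom, 2, &dwRead, NULL) ) // read past BOM if present
{ CloseHandle (hFile); // on a failed read, release the handle and stop
hb_retl (FALSE);
return;
}
if ( ! ( dwRead == 2 && bUtf16Bom[0] == 0xFF && bUtf16Bom[1] == 0xFE ) )
SetFilePointer (hFile, 0, 0, FILE_BEGIN); // no BOM: rewind to the start of the file
break;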
case 4: break;
default: break;
}
es.pfnCallback = EditStreamCallbackRead;
es.dwCookie = (DWORD_PTR) hFile;
es.dwError = 0;
SendMessage ( hWndControl, EM_STREAMIN, (WPARAM) Format, (LPARAM) &es );
CloseHandle (hFile);
if( es.dwError )
hb_retl (FALSE);
else
hb_retl (TRUE);
}
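The byte order mark test used when loading can be isolated into a small portable C function. This is a sketch for illustration, not code from the patch: the name bom_length and the TYPE_* enumerators are hypothetical, with values chosen to mirror the RICHEDITFILE_* constants listed earlier.

```c
#include <stddef.h>

/* Hypothetical constants mirroring the RICHEDITFILE_* values. */
enum {
    TYPE_TEXTANSI    = 1,
    TYPE_TEXTUTF8    = 2,
    TYPE_TEXTUTF16LE = 3,
    TYPE_RTF         = 4,
    TYPE_TEXTUTF16BE = 5
};

/* Hypothetical helper: returns how many leading bytes of buf form the
   byte order mark expected for the given file type, or 0 if none. */
size_t bom_length(const unsigned char *buf, size_t len, int type)
{
    switch (type) {
    case TYPE_TEXTUTF8:
        if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
            return 3;
        break;
    case TYPE_TEXTUTF16LE:
        if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
            return 2;
        break;
    case TYPE_TEXTUTF16BE:
        if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
            return 2;
        break;
    default:
        break;      /* ANSI text and RTF carry no BOM to skip here */
    }
    return 0;
}
```

A caller would seek past the returned number of bytes before streaming the rest of the file into the control.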
Added xData variable to enable it to return a value from _RichEditBox_DoMethod(). xData is used to return a value from a Rich Edit method. This change parallels the xData variable in GetProperty() that is currently used to return a value from a GridEx, Tree, or Rich Edit property.
http://kevincarmody.com/hmg/SOURCE/h_controlmisc.prg - lines 8883, 8900-8902
Code: Select all
Function DoMethod ( Arg1 , Arg2 , Arg3 , Arg4 , Arg5 , Arg6 , Arg7 , Arg8 , Arg9 )
/*
Following line modified by Kevin Carmody, October 2015
Added xData variable to enable it to return a value from
_RichEditBox_DoMethod(). xData is used to return a value from a Rich
Edit method. This change parallels the xData variable in GetProperty()
that is currently used to return a value from a GridEx, Tree, or Rich
Edit property.
See
GetProperty() above
_GridEx_GetProperty(), _Tree_GetProperty(), _RichEditBox_GetProperty(), _RichEditBox_DoMethod() below
RTFLoadFile, RTFSaveFile, LoadFile, SaveFile translations in SOURCE\i_window.ch
RichEditBox_LoadFile() and RichEditBox_SaveFile() in SOURCE\h_richeditbox.prg
*/
Local xData, i, hWnd
Local cMacro, cControlType
IF _GridEx_DoMethod ( Arg1 , Arg2 , Arg3 , Arg4 , Arg5 , Arg6 , Arg7 , Arg8 , Arg9 ) == .T.
Return Nil
ENDIF
IF _Tree_DoMethod ( Arg1 , Arg2 , Arg3 , Arg4 , Arg5 , Arg6 , Arg7 , Arg8 , Arg9 ) == .T.
Return Nil
ENDIF
/*
Following 2 lines modified by Kevin Carmody, October 2015
These lines use the xData variable defined above to return a value from
a Rich Edit method.
*/
IF _RichEditBox_DoMethod ( @xData, Arg1 , Arg2 , Arg3 , Arg4 , Arg5 , Arg6 , Arg7 , Arg8 , Arg9 ) == .T.
Return xData
ENDIF
Added xData argument to enable it to return a value from the LoadFile, RtfLoadFile, SaveFile, and RtfSaveFile methods. This change parallels the xData argument that is currently used in _GridEx_GetProperty(), _Tree_GetProperty(), and _RichEditBox_GetProperty().
http://kevincarmody.com/hmg/SOURCE/h_controlmisc.prg - line 10464
Code: Select all
/*
Following line modified by Kevin Carmody, October 2015
Added xData argument to enable it to return a value to
DoMethod(). xData is used to return a value from a Rich
Edit method. This change parallels the xData argument in
_RichEditBox_GetProperty().
See
GetProperty(), _GridEx_GetProperty(), _Tree_GetProperty(),
_RichEditBox_GetProperty() above
RTFLoadFile, RTFSaveFile, LoadFile, SaveFile translations in SOURCE\i_window.ch
RichEditBox_LoadFile() and RichEditBox_SaveFile() in SOURCE\h_richeditbox.prg
*/
Function _RichEditBox_DoMethod ( xData, Arg1 , Arg2 , Arg3 , Arg4 , Arg5 , Arg6 , Arg7 , Arg8 , Arg9 )
Code: Select all
/*
Following two cases modified by Kevin Carmody, October 2015
These cases use the xData argument defined above to return a value from
the Rich Edit methods RTFLoadFile, LoadFile, RTFSaveFile, and SaveFile.
*/
CASE Arg3 == HMG_UPPER ("RTFLoadFile") .OR. Arg3 == HMG_UPPER ("LoadFile")
xData := RichEditBox_LoadFile ( hWndControl, Arg4, Arg5, Arg6 ) // by default load in SF_RTF format
RetVal := .T.
CASE Arg3 == HMG_UPPER ("RTFSaveFile") .OR. Arg3 == HMG_UPPER ("SaveFile")
xData := RichEditBox_SaveFile ( hWndControl, Arg4, Arg5, Arg6 ) // by default save in SF_RTF format
RetVal := .T.
- Writes byte order marks to Unicode text files.
- Supports UTF-16 BE text files (big endian Unicode text file).
- The RTFUTF8 file type has been removed and the UTF-16 BE file type has been added. See below.
The new function GetRichEditFileType() returns a file type that can be passed as the type parameter. See the description of GetRichEditFileType() below.
RtfSaveFile calls RICHEDITBOX_SAVEFILE(), which has been enhanced.
- Writes byte order marks to Unicode text files.
- Supports UTF-16 BE file type. This function supports UTF-16 BE text files by first calling RichEditBox_StreamOut() to generate a UTF-16 LE file and then calling HMG_UTF16ByteSwap() to convert the UTF-16 LE file to UTF-16 BE.
Code: Select all
/*
Following function modified by Kevin Carmody, October 2015
Changes
Writes byte order marks to Unicode text files.
Supports UTF-16 BE text files (big endian Unicode text file).
This function supports UTF-16 BE text files by first calling
RichEditBox_StreamOut() to generate a UTF-16 LE file and then calling
HMG_UTF16ByteSwap() to convert the UTF-16 LE file to UTF-16 BE.
See
RICHEDITFILE_* constants in SOURCE\i_richeditbox.ch
RTFSaveFile and SaveFile translations in SOURCE\i_window.ch
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RichEditBox_StreamOut() in SOURCE\c_richeditbox.c
HMG_UTF16ByteSwap() in SOURCE\h_UNICODE_String.prg
*/
*-----------------------------------------------------------------------------*
FUNCTION RichEditBox_SaveFile( hWndControl, cFile, lSelection, nType )
*-----------------------------------------------------------------------------*
LOCAL lSuccess := .N.
LOCAL cTempFile
IF ValType( lSelection ) <> "L"
lSelection := .F.
ENDIF
IF ValType( nType ) <> "N"
nType := RICHEDITFILE_RTF
ENDIF
IF nType == RICHEDITFILE_TEXTUTF16BE
cTempFile := GETTEMPFOLDER() + "_RichEditLoadFile.txt"
lSuccess := RichEditBox_StreamOut( hWndControl, cTempFile, lSelection, RICHEDITFILE_TEXTUTF16LE )
IF lSuccess
lSuccess := HMG_UTF16ByteSwap( cTempFile, cFile )
ENDIF
DELETE FILE ( cTempFile )
ELSE
lSuccess := RichEditBox_StreamOut( hWndControl, cFile, lSelection, nType )
ENDIF
RETURN lSuccess
New function HMG_UTF16BYTESWAP()
- Converts between UTF-16 LE and UTF-16 BE files.
http://kevincarmody.com/hmg/SOURCE/h_UNICODE_String.prg - lines 454-498
Code:
/*
Following function added by Kevin Carmody, October 2015
Converts between UTF-16 LE and UTF-16 BE files.
See
RichEditBox_LoadFile() and RichEditBox_SaveFile() in SOURCE\h_richeditbox.prg
RICHEDITFILE_* constants in SOURCE\i_richeditbox.ch
RichEditBox_StreamIn() and RichEditBox_StreamOut() in SOURCE\c_richeditbox.c
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RTFLoadFile, LoadFile, RTFSaveFile, SaveFile translations in SOURCE\i_window.ch
*/
*-----------------------------------------------------------------------------*
FUNCTION HMG_UTF16ByteSwap( cInFile, cOutFile )
*-----------------------------------------------------------------------------*
LOCAL hInFile := FOPEN( cInFile , FO_READ )
LOCAL hOutFile := FCREATE( cOutFile )
LOCAL cInBuffer := SPACE( 0x400 )
LOCAL nBufRead := 1
LOCAL lSuccess := .N.
LOCAL cOutBuffer, cBytePair, nBufWrite, nByte
BEGIN SEQUENCE
IF hInFile < 0
BREAK
ENDIF
IF hOutFile < 0
BREAK
ENDIF
WHILE nBufRead > 0
cOutBuffer := ""
nBufRead := FREAD( hInFile, @cInBuffer, 0x400 )
IF nBufRead > 0
FOR nByte := 1 TO nBufRead STEP 2
cBytePair := SUBSTR( cInBuffer, nByte, 2 )
cOutBuffer += RIGHT( cBytePair, 1 ) + LEFT( cBytePair, 1 )
NEXT
nBufWrite := FWRITE( hOutFile, cOutBuffer )
IF nBufWrite < nBufRead
BREAK
ENDIF
ENDIF
ENDDO
lSuccess := .Y.
END SEQUENCE
FCLOSE( hInFile )
FCLOSE( hOutFile )
RETURN lSuccess
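For readers who want to see the byte swap in isolation, here is a minimal C sketch of the same idea applied to a buffer in memory. The function name utf16_swap_bytes is made up for this illustration and is not part of the patch. It swaps each pair of bytes in place and leaves a trailing odd byte alone, which is only one possible way to treat a truncated file.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch: swap the two bytes of every
   UTF-16 code unit in place, which converts LE to BE and BE to LE.
   Returns the number of bytes swapped; a trailing odd byte is left as is. */
size_t utf16_swap_bytes(unsigned char *buf, size_t len)
{
    size_t even = len - (len % 2);   /* ignore an unpaired last byte */
    size_t i;

    for (i = 0; i < even; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
    return even;
}
```

Because the byte order mark is swapped along with the text, an LE file that starts with FF FE comes out as a BE file that starts with FE FF, so no separate BOM handling is needed for the BE case.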
Modified function RICHEDITBOX_STREAMOUT()
- Now writes a byte order mark in Unicode text files.
http://kevincarmody.com/hmg/SOURCE/c_richeditbox.c - lines 294-349 (modified 303-305, 312-314, 329-336)
Code:
/*
Following function modified by Kevin Carmody, October 2015
Now writes byte order mark in Unicode text files.
This function does not directly support a UTF-16 BE text file.
RichEditBox_SaveFile() supports it by first calling this function to
generate a UTF-16 LE file and then calling HMG_UTF16ByteSwap() to
convert the UTF-16 LE file to UTF-16 BE.
See
RichEditBox_SaveFile() in SOURCE\h_richeditbox.prg
HMG_UTF16ByteSwap() in SOURCE\h_UNICODE_String.prg
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RTFSaveFile and SaveFile translations in SOURCE\i_window.ch
RICHEDITFILE_* constants in SOURCE\i_richeditbox.ch
*/
// RichEditBox_StreamOut ( hWndControl, cFileName, lSelection, nDataFormat )
HB_FUNC ( RICHEDITBOX_STREAMOUT )
{
HWND hWndControl = (HWND) HMG_parnl (1);
TCHAR *cFileName = (TCHAR*) HMG_parc (2);
BOOL lSelection = (BOOL) hb_parl (3);
LONG nDataFormat = (LONG) hb_parnl (4);
HANDLE hFile;
// Following 3 lines added by Kevin Carmody, October 2015
BYTE bUtf8Bom[3] = {0xEF, 0xBB, 0xBF};
BYTE bUtf16Bom[2] = {0xFF, 0xFE};
DWORD dwWritten;
EDITSTREAM es;
LONG Format;
switch( nDataFormat )
{
// Comments in this switch block modified by Kevin Carmody, October 2015
case 1: Format = SF_TEXT; break; // ANSI (non-ANSI characters are converted to question marks)
case 2: Format = ( CP_UTF8 << 16 ) | SF_USECODEPAGE | SF_TEXT; break; // UTF-8 without BOM
case 3: Format = SF_TEXT | SF_UNICODE; break; // UTF-16 LE without BOM
case 4: Format = SF_RTF; break;
// case 5, UTF-8 RTF, removed by Kevin Carmody, October 2015, because it never occurs
default: Format = SF_RTF; break;
}
if ( lSelection )
Format = Format | SFF_SELECTION;
if( ( hFile = CreateFile (cFileName, GENERIC_WRITE, FILE_SHARE_WRITE, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL )) == INVALID_HANDLE_VALUE )
{ hb_retl (FALSE);
return;
}
// Following switch block added by Kevin Carmody, October 2015
switch( nDataFormat )
{
case 1: break;
case 2: WriteFile( hFile, bUtf8Bom, 3, &dwWritten, NULL ); break; // write UTF-8 BOM at head of file
case 3: WriteFile( hFile, bUtf16Bom, 2, &dwWritten, NULL ); break; // write UTF-16 LE BOM at head of file
case 4: break;
default: break;
}
es.pfnCallback = EditStreamCallbackWrite;
es.dwCookie = (DWORD_PTR) hFile;
es.dwError = 0;
SendMessage ( hWndControl, EM_STREAMOUT, (WPARAM) Format, (LPARAM) &es );
CloseHandle (hFile);
if( es.dwError )
hb_retl (FALSE);
else
hb_retl (TRUE);
}
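The BOM choice made in the second switch block can also be written as a small lookup, sketched below in plain C. The enum values copy the case numbers used in the switch above, but the names TYPE_* and bom_for_type are invented for this example and are not the RICHEDITFILE_* constants from the patch.

```c
#include <stddef.h>

/* Hypothetical type codes for illustration only; the numbers follow the
   case labels in RICHEDITBOX_STREAMOUT, the names are made up here. */
enum text_type { TYPE_ANSI = 1, TYPE_UTF8 = 2, TYPE_UTF16LE = 3, TYPE_RTF = 4 };

/* Points *bom at the byte order mark for the given type and returns its
   length in bytes. Types that take no BOM yield NULL and 0. */
size_t bom_for_type(int type, const unsigned char **bom)
{
    static const unsigned char utf8_bom[]    = { 0xEF, 0xBB, 0xBF };
    static const unsigned char utf16le_bom[] = { 0xFF, 0xFE };

    switch (type) {
    case TYPE_UTF8:    *bom = utf8_bom;    return sizeof utf8_bom;
    case TYPE_UTF16LE: *bom = utf16le_bom; return sizeof utf16le_bom;
    default:           *bom = NULL;        return 0;   /* ANSI, RTF, unknown */
    }
}
```

A caller would write the returned bytes at the head of the file before streaming the text, which is what the WriteFile calls above do.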
Other new function GETRICHEDITFILETYPE()
- Returns the file type of an RTF file or text file, which can be used as the file type parameter in the LoadFile, RtfLoadFile, SaveFile, and RtfSaveFile methods. This function examines the first few bytes of the file to see if there is an RTF header or Unicode byte order mark.
- When there is no RTF header or byte order mark: If the optional lUtf8Test argument is .T., then the whole file is scanned to see if it is in UTF-8 format. If so, then the file type is returned as UTF-8. Otherwise the file type is returned as ANSI.
Code:
/*
Following function added by Kevin Carmody, October 2015
This function returns the file type of an RTF file or text file, which
can be used as the file type parameter in the LoadFile, RtfLoadFile,
SaveFile, and RtfSaveFile methods. This function examines the first few
bytes of the file to see if there is an RTF header or Unicode byte order
mark.
When there is no RTF header or byte order mark: If the optional
lUtf8Test argument is .T., then the whole file is scanned to see if it is
in UTF-8 format. If so, then the file type is returned as UTF-8.
Otherwise the file type is returned as ANSI.
See
RICHEDITFILE_* constants in SOURCE\i_richeditbox.ch
RTFLoadFile, LoadFile, RTFSaveFile, SaveFile translations in SOURCE\i_window.ch
_RichEditBox_DoMethod() in SOURCE\h_controlmisc.prg
RichEditBox_StreamIn() and RichEditBox_StreamOut() in SOURCE\c_richeditbox.c
HMG_UTF16ByteSwap() and HMG_IsUtf8() in SOURCE\h_UNICODE_String.prg
*/
*-----------------------------------------------------------------------------*
FUNCTION GetRichEditFileType ( cFile, lUtf8Test )
*-----------------------------------------------------------------------------*
LOCAL hFile := FOPEN( cFile, FO_READ )
LOCAL cBuffer := SPACE( 5 )
LOCAL nBufRead := 0
LOCAL nType := 0
/*
The following code block tests whether an unmarked text file contains
valid UTF-8 text with non-ASCII characters.
*/
LOCAL bIsUtf8NonAscii := {||
LOCAL lUtf8NonAscii := .N.
LOCAL cPartial := ''
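cBuffer := SPACE( 0x400 )
nBufRead := 1
FSEEK( hFile, 0, FS_SET ) // rewind, because the header check already consumed the first 5 bytes
BEGIN SEQUENCE
WHILE nBufRead > 0
nBufRead := FREAD( hFile, @cBuffer, 0x400 )
IF nBufRead > 0 .AND. HMG_IsUtf8( cPartial + LEFT( cBuffer, nBufRead ), .N., .Y., @cPartial ) // test only the bytes actually read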
lUtf8NonAscii := .Y.
BREAK
ENDIF
ENDDO
IF ! EMPTY( cPartial )
lUtf8NonAscii := .N.
ENDIF
END SEQUENCE
RETURN lUtf8NonAscii
}
BEGIN SEQUENCE
IF hFile < 0
BREAK
ENDIF
nBufRead := FREAD( hFile, @cBuffer, 5 )
DO CASE
CASE nBufRead >= 5 .AND. LEFT( cBuffer, 5 ) == "{\rtf"
nType := RICHEDITFILE_RTF
CASE nBufRead >= 3 .AND. LEFT( cBuffer, 3 ) == E"\xEF\xBB\xBF"
nType := RICHEDITFILE_TEXTUTF8
CASE nBufRead >= 2 .AND. LEFT( cBuffer, 2 ) == E"\xFF\xFE"
nType := RICHEDITFILE_TEXTUTF16LE
CASE nBufRead >= 2 .AND. LEFT( cBuffer, 2 ) == E"\xFE\xFF"
nType := RICHEDITFILE_TEXTUTF16BE
CASE ! EMPTY( lUtf8Test ) .AND. bIsUtf8NonAscii:EVAL( )
nType := RICHEDITFILE_TEXTUTF8
OTHERWISE
nType := RICHEDITFILE_TEXTANSI
ENDCASE
END SEQUENCE
FCLOSE( hFile )
RETURN nType
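The header test in the DO CASE block above can be sketched in C as follows, working on bytes that have already been read into memory. The names file_type, FT_* and sniff_header are invented for this example. The optional whole-file UTF-8 scan is left out, so anything without a recognized header is reported as unknown rather than as ANSI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for illustration; not the RICHEDITFILE_* values. */
enum file_type { FT_UNKNOWN = 0, FT_RTF, FT_UTF8, FT_UTF16LE, FT_UTF16BE };

/* Classify a file by its first bytes: RTF header first, then the UTF-8,
   UTF-16 LE and UTF-16 BE byte order marks, in the same order as above. */
enum file_type sniff_header(const unsigned char *buf, size_t len)
{
    if (len >= 5 && memcmp(buf, "{\\rtf", 5) == 0)        return FT_RTF;
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)  return FT_UTF8;
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)      return FT_UTF16LE;
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)      return FT_UTF16BE;
    return FT_UNKNOWN;
}
```

The length checks come before each comparison so that a file shorter than the signature being tested is never read past its end.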
Modified function HMG_ISUTF8()
- When cString is the empty string or is all ASCII: If the optional lAllowASCII argument is .T., then the return value is .T. Otherwise the return value is .F.
- When cString is valid UTF-8 except that it ends with an incomplete UTF-8 byte sequence: If the optional lAllowPartial argument is .T., then the return value is .T. and the incomplete byte sequence is passed back through the cPartial argument. Otherwise the return value is .F. This is useful when cString is a file buffer.
- The return value is .F. if cString encodes any code point greater than the Unicode limit of 0x10FFFF, or if it encodes any surrogate character, or if it contains an overlong UTF-8 byte sequence. One overlong sequence is accepted, the 2-byte overlong sequence for the null character (0xC0 0x80), which is commonly accepted by UTF-8 parsers.
Code:
/*
Following function modified by Kevin Carmody, October 2015
Changes
When cString is the empty string or is all ASCII: If the optional
lAllowASCII argument is .T., then the return value is .T. Otherwise
the return value is .F.
When cString is valid UTF-8 except that it ends with an incomplete
UTF-8 byte sequence: If the optional lAllowPartial argument is .T.,
then the return value is .T. and the incomplete byte sequence is
passed back through the cPartial argument. Otherwise the return
value is .F. This is useful when cString is a file buffer.
The return value is .F. if cString encodes any code point greater than
the Unicode limit of 0x10FFFF, or if it encodes any surrogate
character, or if it contains an overlong UTF-8 byte sequence. One
overlong sequence is accepted, the 2-byte overlong sequence for the
null character (0xC0 0x80), which is commonly accepted by UTF-8
parsers.
See
GetRichEditFileType() in SOURCE\h_richeditbox.prg
HB_STRISUTF8 in \src\rtl\strutf8.c in Harbour source
is_utf8() at http://stackoverflow.com/questions/1031645/how-to-detect-utf-8-in-plain-c
*/
FUNCTION HMG_IsUTF8( cString, lAllowASCII, lAllowPartial, cPartial )
LOCAL lASCII := .T.
LOCAL lCheck := .F.
LOCAL lUTF8 := .T.
LOCAL nCBytes := 0
LOCAL nRBytes := 0
LOCAL cChar, nChar, nLead
IF lAllowASCII == NIL
lAllowASCII := .F.
ENDIF
IF lAllowPartial == NIL
lAllowPartial := .F.
ENDIF
BEGIN SEQUENCE
FOR EACH cChar IN cString
nChar := HB_BCODE( cChar )
IF nCBytes > 0 // check continuation bytes
IF nChar < 0x80 .OR. nChar > 0xBF // disallow invalid continuation byte
BREAK
ENDIF
IF lCheck // check first continuation byte for partially valid lead byte
SWITCH nLead
CASE 0xC0 // disallow 2-byte overlongs except overlong null character
IF nChar != 0x80
BREAK
ENDIF
EXIT
CASE 0xE0 // disallow 3-byte overlongs
IF nChar < 0xA0
BREAK
ENDIF
EXIT
CASE 0xED // disallow surrogates
IF nChar > 0x9F
BREAK
ENDIF
EXIT
CASE 0xF0 // disallow 4-byte overlongs
IF nChar < 0x90
BREAK
ENDIF
EXIT
CASE 0xF4 // disallow 4-byte sequences beyond end of Unicode
IF nChar > 0x8F
BREAK
ENDIF
EXIT
ENDSWITCH
lCheck := .F.
ENDIF
nCBytes --
nRBytes ++
ELSEIF nChar >= 0x80 // check lead byte
lASCII := .F.
nLead := nChar
IF nLead < 0xC0 .OR. nLead == 0xC1 .OR. nLead > 0xF4 // disallow invalid lead bytes
BREAK
ENDIF
lCheck := ( nLead == 0xC0 .OR. nLead == 0xE0 .OR. nLead == 0xED .OR. ;
nLead == 0xF0 .OR. nLead == 0xF4 ) // partially valid lead bytes
DO CASE // compute number of continuation bytes
CASE nLead <= 0xDF
nCBytes := 1
CASE nLead <= 0xEF
nCBytes := 2
OTHERWISE
nCBytes := 3
ENDCASE
nRBytes := 1
ENDIF
NEXT
RECOVER
lUTF8 := .F.
END SEQUENCE
IF lUTF8 .AND. nCBytes > 0
IF lAllowPartial
cPartial := RIGHT( cString, nRBytes )
ELSE
lUTF8 := .F.
ENDIF
ELSE
IF lAllowPartial
cPartial := ''
ENDIF
ENDIF
IF ! lAllowASCII .AND. lASCII
lUTF8 := .F.
ENDIF
RETURN lUTF8
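For comparison, the same validation rules can be sketched in C. This is only an illustration under a few assumptions: the name utf8_check is invented, the lAllowASCII handling is left out (pure ASCII is simply accepted), and a trailing incomplete sequence is always tolerated, with its length reported through the partial argument instead of the bytes themselves.

```c
#include <stddef.h>

/* Hypothetical sketch, not part of the patch. Returns 1 if s[0..len) is
   well-formed UTF-8 under the rules described above, else 0. On success,
   *partial is the number of trailing bytes that form an incomplete
   sequence (0 when the text ends on a character boundary). */
int utf8_check(const unsigned char *s, size_t len, size_t *partial)
{
    size_t i = 0;

    *partial = 0;
    while (i < len) {
        unsigned char lead = s[i];
        unsigned char lo = 0x80, hi = 0xBF;  /* range for 1st continuation byte */
        size_t need, k;

        if (lead < 0x80) { i++; continue; }                 /* ASCII */
        if (lead < 0xC0 || lead == 0xC1 || lead > 0xF4)
            return 0;                                       /* invalid lead byte */
        need = lead <= 0xDF ? 1 : lead <= 0xEF ? 2 : 3;
        switch (lead) {
        case 0xC0: hi = 0x80; break;  /* only the overlong null, C0 80 */
        case 0xE0: lo = 0xA0; break;  /* no 3-byte overlongs */
        case 0xED: hi = 0x9F; break;  /* no surrogates */
        case 0xF0: lo = 0x90; break;  /* no 4-byte overlongs */
        case 0xF4: hi = 0x8F; break;  /* nothing above 0x10FFFF */
        default: break;
        }
        for (k = 1; k <= need; k++) {
            unsigned char c;
            if (i + k >= len) {       /* input ends inside a sequence */
                *partial = len - i;
                return 1;
            }
            c = s[i + k];
            if (k == 1 ? (c < lo || c > hi) : (c < 0x80 || c > 0xBF))
                return 0;
        }
        i += need + 1;
    }
    return 1;
}
```

When the input is a file buffer, the caller would carry the last *partial bytes over to the front of the next buffer, in the same way GetRichEditFileType() prepends cPartial to the next chunk.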
Overhauled Rich Edit demo
- Main menu
- List of recently used files
- Resizable windows
- Ctrl-B, Ctrl-I, Ctrl-U supported
- File name in title, file name and page in status bar
- Modified flag, caps lock, num lock, insert status on status bar
- Window size, font name and size, file locations, file filters, recently used file names stored in registry
- Paragraph numbering
- Read and write text files
- Many other enhancements
The rich edit demo uses all the enhancements described above (except the HasNonAnsiChars property, SelPasteSpecial method, and HMG_UTF8InsertBOM function).
Source files changed: Source files added: Source files deleted: