TokenNum()

TokenNum()

Get the total number of tokens in a token environment

Syntax

      TokenNum( [<@cTokenEnvironment>] ) -> nNumberofTokens

Arguments

<@cTokenEnvironment> a token environment

Returns

<nNumberofTokens> number of tokens in the token environment

Description

The TokenNum() function can be used to retrieve the total number of tokens in a token environment. If the parameter <@cTokenEnvironment> is supplied (must be by reference), the information from this token environment is used, otherwise the global token environment is used.

Examples

      tokeninit( "a.b.c.d", ".", 1 )  // initialize global token environment
      ? TokenNum()  // --> 4
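
The same count can be read from a local token environment. The following is a minimal sketch, not taken from the library sources; the variable name cTE and the #require line (used when running the file with hbrun) are assumptions:

      #require "hbct"

      PROCEDURE Main()

         LOCAL cTE := ""   // hypothetical variable receiving the local TE

         // the 4th parameter is passed by reference, so the global
         // token environment is not touched
         TokenInit( "a.b.c.d", ".", 1, @cTE )
         ? TokenNum( @cTE )   // same string as above, so 4 is expected

         RETURN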

Compliance

TokenNum() is a new function in Harbour’s CT3 library.

Platforms

All

Files

Source is token2.c, library is libct.

Seealso

TOKENINIT(), TOKENEXIT(), TOKENNEXT(), TOKENAT(), SAVETOKEN(), RESTTOKEN(), TOKENEND()

TokenNext()

TokenNext()

Successively obtains tokens from a string

Syntax

      TokenNext( <[@]cString>, [<nToken>],
                 [<@cTokenEnvironment>] ) -> cToken

Arguments

<[@]cString> the processed string

<nToken> a token number

<@cTokenEnvironment> a token environment

Returns

<cToken> a token from <cString>

Description

With TokenNext(), the tokens determined with the TOKENINIT() function can be retrieved. To do this, TokenNext() uses the information stored in either the global token environment or the local one supplied by <cTokenEnvironment>. Note that, if supplied, this 3rd parameter must always be passed by reference.

If the 2nd parameter, <nToken>, is given, TokenNext() simply returns the <nToken>th token without changing the TE (token environment) counter. Otherwise the token pointed to by the TE counter is returned and the counter is incremented by one. This way, a simple loop with TOKENEND() can be used to retrieve all tokens of a string successively.

Note that <cString> does not have to be the same string used in TOKENINIT(), so one can do a “correlational tokenization”, i.e. tokenize a string as if it were another. E.g. using TOKENINIT() with the string “AA, BBB” but calling TokenNext() with “CCCEE” would give first “CC” and then “EE” (because “CCCEE” is not long enough).

Examples

      // default behavior
      tokeninit( cString ) // initialize a token environment
      DO WHILE ! tokenend()
         ? TokenNext( cString )  // get all tokens successively
      ENDDO
      ? TokenNext( cString, 3 )  // get the 3rd token, counter will remain
                                 // the same
      tokenexit()                // free the memory used for the global
                                 // token environment
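
The same loop can be run against a local token environment. This is only a sketch; the sample string, the variable names and the #require line are assumptions, not part of the original documentation:

      #require "hbct"

      PROCEDURE Main()

         LOCAL cString := "alpha,beta,gamma"   // hypothetical sample data
         LOCAL cTE := ""                       // receives the local TE

         TokenInit( cString, ",", 1, @cTE )
         DO WHILE ! TokenEnd( @cTE )
            // 2nd parameter omitted: return the token at the TE counter
            // and advance the counter
            ? TokenNext( cString, , @cTE )
         ENDDO

         // direct access by number; the TE counter is not changed
         ? TokenNext( cString, 2, @cTE )

         RETURN

No TokenExit() call is made here, because the global token environment is never used.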

Compliance

TokenNext() is compatible with CT3’s TokenNext(), but there are two additional parameters featuring local token environments and optional access to tokens.

Platforms

All

Files

Source is token2.c, library is libct.

Seealso

TOKENINIT(), TOKENEXIT(), TOKENNUM(), TOKENAT(), SAVETOKEN(), RESTTOKEN(), TOKENEND()

TokenInit()

TokenInit()

Initializes a token environment

Syntax

      TokenInit( [<[@]cString>], [<cTokenizer>], [<nSkipWidth>],
                 [<@cTokenEnvironment>] ) -> lState

Arguments

<[@]cString> is the processed string

<cTokenizer> is a list of characters separating the tokens in <cString>. Default: chr(0) + chr(9) + chr(10) + chr(13) + chr(26) + chr(32) + chr(32) + chr(138) + chr(141) + “, .;:!\?/\\<>()#&%+-*”

<nSkipWidth> specifies the maximum number of successive tokenizing characters that are combined as ONE token stop, e.g. specifying 1 can yield empty tokens. Default: 0, any number of successive tokenizing characters are combined as ONE token stop.

<@cTokenEnvironment> is a token environment stored in a binary encoded string

Returns

<lState> success of the initialization

Description

The TokenInit() function initializes a token environment. A token environment is the information about how a string is to be tokenized. This information is created by tokenizing the string <cString> in the same way as the TOKEN() function does, using the <cTokenizer> and <nSkipWidth> parameters.

This token environment can be very useful when large strings have to be tokenized since the tokenization has to take place only once whereas the TOKEN() function must always start the tokenizing process from scratch.

Unlike CT3, this function provides two mechanisms of storing the resulting token environment. If a variable is passed by reference as 4th parameter, the token environment is stored in this variable, otherwise the global token environment is used. Do not modify the token environment string directly!

Additionally, a counter is stored in the token environment, so that the tokens can be obtained successively. This counter is first set to 1. When the TokenInit() function is called without a string to tokenize, the counter of either the global environment or the environment given by reference in the 4th parameter is rewound to 1.

Additionally, unlike CT3, TokenInit() does not need the string <cString> to be passed by reference, since one must provide the string in calls to TOKENNEXT() again.

Examples

  TokenInit( cString )             // tokenize the string <cString> with
                                   // default rules, store the token
                                   // environment globally and, if present,
                                   // delete an old global token environment
  TokenInit( @cString )            // no difference in result, but possibly
                                   // faster, since the string need not be
                                   // copied
  TokenInit()                      // rewind counter of global TE to 1
  TokenInit( "1,2,3", "," , 1 )    // tokenize constant string, store in
                                   // global token environment
  TokenInit( cString, , 1, @cTE1 ) // tokenize cString and store token
                                   // environment in cTE1 only without
                                   // overriding global token environment
  TokenInit( cString, , 1, cTE1 )  // tokenize cString and store token
                                   // environment in GLOBAL token environment
                                   // since 4th parameter is not given by
                                   // reference!
  TokenInit( ,,, @cTE1 )           // set counter in TE stored in cTE1 to 1
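
The effect of <nSkipWidth> can be illustrated by tokenizing the same string twice into two local token environments. This is a sketch under the rules described above; the sample string, the variable names and the #require line are assumptions:

  #require "hbct"

  PROCEDURE Main()

     LOCAL cString := "a,,b"        // two successive separators
     LOCAL cTE0 := "", cTE1 := ""   // hypothetical local TEs

     // <nSkipWidth> omitted (default 0): ",," counts as ONE token stop
     TokenInit( cString, ",", , @cTE0 )
     // <nSkipWidth> 1: every "," is a token stop of its own
     TokenInit( cString, ",", 1, @cTE1 )

     ? TokenNum( @cTE0 )   // tokens "a" and "b"
     ? TokenNum( @cTE1 )   // tokens "a", an empty token, and "b"

     RETURN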

Compliance

TokenInit() is compatible with CT3’s TokenInit(), but there is an additional parameter featuring local token environments.

Platforms

All

Files

Source is token2.c, library is libct.

Seealso

TOKEN(), TOKENEXIT(), TOKENNEXT(), TOKENNUM(), TOKENAT(), SAVETOKEN(), RESTTOKEN(), TOKENEND()

TokenExit()

TokenExit()

Release global token environment

Syntax

      TokenExit() -> lStaticEnvironmentReleased

Returns

<lStaticEnvironmentReleased> .T., if global token environment is successfully released

Description

The TokenExit() function releases the memory associated with the global token environment. One should use it for every tokeninit() using the global token environment. Additionally, TokenExit() is implicitly called from CTEXIT() to free the memory at library shutdown.

Examples

      tokeninit( cString ) // initialize a token environment
      DO WHILE ! tokenend()
         ? tokennext( cString )  // get all tokens successively
      ENDDO
      ? tokennext( cString, 3 )  // get the 3rd token, counter 
                                 // will remain the same
      TokenExit()                // free the memory used for the 
                                 // global token environment

Compliance

TokenExit() is a new function in Harbour’s CT3 library.

Platforms

All

Files

Source is token2.c, library is libct.

Seealso

TOKENINIT(), TOKENNEXT(), TOKENNUM(), TOKENAT(), SAVETOKEN(), RESTTOKEN(), TOKENEND()

TokenEnd()

TokenEnd()

Check whether additional tokens are available with TOKENNEXT()

Syntax

      TokenEnd( [<@cTokenEnvironment>] ) -> lTokenEnd

Arguments

<@cTokenEnvironment> a token environment

Returns

<lTokenEnd> .T., if no further tokens are available, i.e. the next call to TOKENNEXT() would not return a new token

Description

The TokenEnd() function can be used to check whether the next call to TOKENNEXT() would return a new token. This cannot be decided with TOKENNEXT() alone, since an empty token cannot be distinguished from a “no more tokens” condition.

If the parameter <@cTokenEnvironment> is supplied (must be by reference), the information from this token environment is used, otherwise the global TE is used.

With a combination of TokenEnd() and TOKENNEXT(), all tokens from a string can be retrieved successively (see example).

Examples

      tokeninit( "a.b.c.d", ".", 1 )  // initialize global token environment
      DO WHILE ! TokenEnd()
         ? tokennext( "a.b.c.d" )     // get all tokens successively
      ENDDO
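
The following sketch shows why TokenEnd() is needed when empty tokens can occur. With a skip width of 1, the string below contains an empty token, which an empty return value from TOKENNEXT() alone could not tell apart from the end of the token list. The sample string, the variable names and the #require line are assumptions:

      #require "hbct"

      PROCEDURE Main()

         LOCAL cString := "a,,b"   // hypothetical sample with an empty token
         LOCAL cToken

         TokenInit( cString, ",", 1 )   // global token environment
         DO WHILE ! TokenEnd()
            cToken := TokenNext( cString )
            // an empty token does not end the loop; only TokenEnd() does
            ? iif( Empty( cToken ), "<empty token>", cToken )
         ENDDO
         TokenExit()                    // release the global TE

         RETURN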

Compliance

TokenEnd() is compatible with CT3’s TokenEnd(), but there is an additional parameter featuring local token environments.

Platforms

All

Files

Source is token2.c, library is libct.

Seealso

TOKENINIT(), TOKENEXIT(), TOKENNEXT(), TOKENNUM(), TOKENAT(), SAVETOKEN(), RESTTOKEN()

TokenAt()

TOKENAT()

Get start and end positions of tokens in a token environment

Syntax

      TOKENAT( [<lSeparatorPositionBehindToken>], [<nToken>],
               [<@cTokenEnvironment>] ) -> nPosition

Arguments

<lSeparatorPositionBehindToken> .T., if TOKENAT() should return the position of the separator character BEHIND the token. Default: .F., return start position of a token.

<nToken> a token number

<@cTokenEnvironment> a token environment

Returns

<nPosition> See description

Description

The TOKENAT() function is used to retrieve the start and end position of the tokens in a token environment. Note, however, that the position of the last character of a token is given by TOKENAT( .T. ) - 1.

If the 2nd parameter, <nToken>, is given, TOKENAT() returns the positions of the <nToken>th token. Otherwise the token pointed to by the TE counter is used, i.e. the token that will be retrieved by the next call to TOKENNEXT().

If the parameter <@cTokenEnvironment> is supplied (must be by reference), the information from this token environment is used, otherwise the global TE is used.

Tests

      tokeninit( cString ) // initialize a token environment
      DO WHILE ! tokenend()
         ? "From", tokenat(), "to", tokenat( .T. ) - 1
         ? tokennext( cString )  // get all tokens successively
      ENDDO
      ? tokennext( cString, 3 )  // get the 3rd token, counter
                                 // will remain the same
      tokenexit()                // free the memory used for the
                                 // global token environment
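
The positions reported by TOKENAT() can also be used to cut each token out of the string directly. The sketch below relies on the convention stated in the description, i.e. that the last character of a token is at TOKENAT( .T. ) - 1. The sample string, the variable names and the #require line are assumptions:

      #require "hbct"

      PROCEDURE Main()

         LOCAL cString := "one two three"   // hypothetical sample data
         LOCAL nStart, nEnd

         TokenInit( cString, " ", 1 )
         DO WHILE ! TokenEnd()
            nStart := TokenAt()             // start of the next token
            nEnd   := TokenAt( .T. ) - 1    // its last character
            // both lines should show the same token
            ? SubStr( cString, nStart, nEnd - nStart + 1 )
            ? TokenNext( cString )          // also advances the TE counter
         ENDDO
         TokenExit()

         RETURN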

Compliance

TOKENAT() is compatible with CT3’s TOKENAT(), but there are two additional parameters featuring local token environments and optional access to tokens.

Platforms

All

Files

Source is token2.c, library is libct.

Seealso

TOKENINIT(), TOKENEXIT(), TOKENNEXT(), TOKENNUM(), SAVETOKEN(), RESTTOKEN(), TOKENEND()

SaveToken()

SaveToken()

Save the global token environment

Syntax

      SaveToken() -> cStaticTokenEnvironment

Returns

<cStaticTokenEnvironment> a binary string encoding the global token environment

Description

The SaveToken() function can be used to store the global token environment for future use or when two or more incremental tokenizers must be nested. Note, however, that the latter can now be solved with locally stored token environments.
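
As a sketch of that alternative, two nested tokenizers can each keep their own local token environment, so that neither SaveToken() nor RESTTOKEN() is required. The sample text, the variable names and the #require line are assumptions:

      #require "hbct"

      PROCEDURE Main()

         LOCAL cText := "one two" + Chr( 10 ) + "three four"
         LOCAL cLineTE := "", cWordTE := ""   // hypothetical local TEs
         LOCAL cLine

         // outer tokenizer: split the text into lines
         TokenInit( cText, Chr( 10 ), 1, @cLineTE )
         DO WHILE ! TokenEnd( @cLineTE )
            cLine := TokenNext( cText, , @cLineTE )

            // inner tokenizer: split the line into words; it uses its
            // own TE, so the outer counter is left alone
            TokenInit( cLine, " ", 1, @cWordTE )
            DO WHILE ! TokenEnd( @cWordTE )
               ? TokenNext( cLine, , @cWordTE )
            ENDDO
         ENDDO

         RETURN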

Compliance

SaveToken() is compatible with CT3’s SaveToken().

Platforms

All

Files

Source is token2.c, library is libct.

Seealso

TOKENINIT(), TOKENEXIT(), TOKENNEXT(), TOKENNUM(), TOKENAT(), RESTTOKEN(), TOKENEND()

CT_RESTTOKEN

 RESTTOKEN()
 Recreates an incremental tokenizer environment
------------------------------------------------------------------------------
 Syntax

     RESTTOKEN(<cTokenEnvironment>) --> cEmptyString

 Argument

     <cTokenEnvironment>  Designates a character string returned by the
     SAVETOKEN() function.

 Returns

     RESTTOKEN() always returns an empty string.

 Description

     The internal environment for the incremental tokenizer can be restored
     using RESTTOKEN().  RESTTOKEN() does the opposite of the SAVETOKEN()
     function.

 Note

     .  <cTokenEnvironment> must originate from the current program
        run; for example, it cannot have been restored from a (.mem) file.

 Examples

     .  Here is an incremental tokenizer.  Text is broken into
        individual lines, and each line is broken into words:

        TOKENINIT(@cTextString, CHR(13) + CHR(10), 2)

        DO WHILE .NOT. TOKENEND()
           cLine  :=  TOKENNEXT(cTextString)
           WORD(cLine)
        ENDDO

     .  The function then breaks the lines into words:

        FUNCTION WORD(cLine)

           cOldEnv  := SAVETOKEN()
           TOKENINIT(@cLine, " .,-:;")

           DO WHILE .NOT. TOKENEND()
              cWord  :=  TOKENNEXT(cLine)
              ? cWord
           ENDDO
           RESTTOKEN(cOldEnv)
           RETURN("")

See Also: SAVETOKEN() TOKENINIT() TOKENNEXT()

 

Tools – String Manipulations

Introduction 
ADDASCII()   Adds a value to each ASCII code in a string
AFTERATNUM() Returns remainder of a string after nth appearance of sequence
ASCIISUM()   Finds sum of the ASCII values of all the characters of a string
ASCPOS()     Determines ASCII value of a character at a position in a string
ATADJUST()   Adjusts the beginning position of a sequence within a string
ATNUM()      Determines the starting position of a sequence within a string
ATREPL()     Searches for a sequence within a string and replaces it
ATTOKEN()    Finds the position of a token within a string
BEFORATNUM() Returns string segment before the nth occurrence of a sequence
CENTER()     Centers a string using pad characters
CHARADD()    Adds the corresponding ASCII codes of two strings
CHARAND()    Links corresponding ASCII codes of paired strings with AND
CHAREVEN()   Returns characters in the even positions of a string
CHARLIST()   Lists each character in a string
CHARMIRR()   Mirrors characters within a string
CHARMIX()    Mixes two strings together
CHARNOLIST() Lists the characters that do not appear in a string
CHARNOT()    Complements each character in a string
CHARODD()    Returns characters in the odd positions of a string
CHARONE()    Reduces adjoining duplicate characters in string to 1 character
CHARONLY()   Determines the common denominator between two strings
CHAROR()     Joins the corresponding ASCII code of paired strings with OR
CHARPACK()   Compresses (packs) a string
CHARRELA()   Correlates the character positions in paired strings
CHARRELREP() Replaces characters in a string depending on their correlation
CHARREM()    Removes particular characters from a string
CHARREPL()   Replaces certain characters with others
CHARSORT()   Sorts sequences within a string
CHARSPREAD() Expands a string at the tokens
CHARSWAP()   Exchanges all adjoining characters in a string
CHARUNPACK() Decompresses (unpacks) a string
CHARXOR()    Joins ASCII codes of paired strings with exclusive OR operation
CHECKSUM()   Calculates the checksum for a character string (algorithm)
COUNTLEFT()  Counts a particular character at the beginning of a string
COUNTRIGHT() Counts a particular character at the end of a string
CRYPT()      Encrypts and decrypts a string
CSETATMUPA() Determines setting of the multi-pass mode for ATXXX() functions
CSETREF()    Determines whether reference sensitive functions return a value
EXPAND()     Expands a string by inserting characters
JUSTLEFT()   Moves characters from the beginning to the end of a string
JUSTRIGHT()  Moves characters from the end of a string to the beginning
LIKE()       Compares character strings using wildcard characters
LTOC()       Converts a logical value into a character
MAXLINE()    Finds the longest line within a string
NUMAT()      Counts the number of occurrences of a sequence within a string
NUMLINE()    Determines the number of lines required for string output
NUMTOKEN()   Determines the number of tokens in a string
PADLEFT()    Pads a string on the left to a particular length
PADRIGHT()   Pads a string on the right to a particular length
POSALPHA()   Determines position of first alphabetic character in a string
POSCHAR()    Replaces individual character at particular position in string
POSDEL()     Deletes characters at a particular position in a string
POSDIFF()    Finds the first position from which two strings differ
POSEQUAL()   Finds the first position at which two strings are the same
POSINS()     Inserts characters at a particular position within a string
POSLOWER()   Finds the position of the first lower case alphabetic character
POSRANGE()   Determines position of first character in an ASCII code range
POSREPL()    Replaces one or more characters from a certain position
POSUPPER()   Finds the position of the first uppercase, alphabetic character
RANGEREM()   Deletes characters that are within a specified ASCII code range
RANGEREPL()  Replaces characters within a specified ASCII code range
REMALL()     Removes characters from the beginning and end of a string
REMLEFT()    Removes particular characters from the beginning of a string
REMRIGHT()   Removes particular characters at the end of a string
REPLALL()    Exchanges characters at the beginning and end of a string
REPLLEFT()   Exchanges particular characters at the beginning of a string
REPLRIGHT()  Exchanges particular characters at the end of a string
RESTTOKEN()  Recreates an incremental tokenizer environment
SAVETOKEN()  Saves the incremental tokenizer environment to a variable
SETATLIKE()  Provides an additional search mode for all AT functions
STRDIFF()    Finds similarity between two strings (Levenshtein Distance)
STRSWAP()    Interchanges two strings
TABEXPAND()  Converts tabs to spaces
TABPACK()    Converts spaces to tabs
TOKEN()      Selects the nth token from a string
TOKENAT()    Determines the most recent TOKENNEXT() position within a string
TOKENEND()   Determines if more tokens are available in TOKENNEXT()
TOKENINIT()  Initializes a string for TOKENNEXT()
TOKENLOWER() Converts initial alphabetic character of a token into lowercase
TOKENNEXT()  Provides an incremental tokenizer
TOKENSEP()   Provides separator before/after most recently retrieved TOKEN()
TOKENUPPER() Converts the initial letter of a token into upper case
VALPOS()     Determines numerical value of character at particular position
WORDONE()    Reduces multiple appearances of double characters to one
WORDONLY()   Finds common denominator of 2 strings on double character basis
WORDREPL()   Replaces particular double characters with others
WORDSWAP()   Exchanges double characters lying beside each other in a string
WORDTOCHAR() Exchanges double characters for individual ones