StrDiff()

StrDiff()

Evaluate the “Edit (Levensthein) Distance” of two strings

Syntax

      StrDiff( <cString1>, <cString2>, [<nReplacementPenalty>], 
               [<nDeletionPenalty>], [<nInsertionPenalty>] ) 
               -> <nDistance>

Arguments

<cString1> string at the “starting point” of the transformation process, default is “”

<cString2> string at the “end point” of the transformation process, default is “”

<nReplacementPenalty> penalty points for a replacement of one character, default is 3

<nDeletionPenalty> penalty points for a deletion of one character, default is 6

<nInsertionPenalty> penalty points for an insertion of one character, default is 1

Returns

<nDistance> penalty point sum of all operations needed to transform <cString1> to <cString2>

Description

The StrDiff() functions calculates the so called “Edit” or “Levensthein” distance of two strings. This distance is a measure for the number of single character replace/insert/delete operations (so called “point mutations”) required to transform <cString1> into <cString2> and its value will be the smallest sum of the penalty points of the required operations.

Be aware that this function is both quite time – O(len(cString1)*len(cString2)) – and memory consuming – O((len(cString1)+1)*(len(cString2)+1)*sizeof(int)) – so keep the strings as short as possible. E.g., on common 32 bit systems (sizeof(int) == 4), calling StrDiff() with two strings of 1024 bytes in length will consume 4 MB of memory. To not impose unneeded restrictions, the function will only check if (len(cString1)+1)*(len(cString2)+1)*sizeof(int) <= UINT_MAX, although allocing UINT_MAX bytes will not work on most systems. If this simple check fails, -1 is returned.

Also, be aware that there can be an overflow when the penalty points are summed up: Assuming that the number of transformation operations is in the order of max(len(cString1), len(cString2)), the penalty point sum, that is internally stored in an “int” variable, is in the order of (max(len(cString1), len(cString2))*max(nReplacementPenalty, nDeletionPenalty, nInsertionPentaly). The StrDiff() does not do an overflow check due to time performance reasons. Future versions of StrDiff() could use a type different to “int” to store the penalty point sum to save memory or to avoid overflows.

The function is aware of the settings done by SETATLIKE(), that means that the wildchar character is considered equal to ALL characters.

Examples

      ? StrDiff( "ABC", "ADC" ) // 3, one character replaced
      ? StrDiff( "ABC", "AEC" ) // 3, dito
      ? StrDiff( "CBA", "ABC" ) // 6, two characters replaced
      ? StrDiff( "ABC", "AXBC" ) // 1, one character inserted
      ? StrDiff( "AXBC", "ABC" ) // 6, one character removed
      ? StrDiff( "AXBC", "ADC" ) // 9, one character removed and one replaced

Tests

      StrDiff( "ABC", "ADC" ) == 3
      StrDiff( "ABC", "AEC" ) == 3
      StrDiff( "CBA", "ABC" ) == 6
      StrDiff( "ABC", "AXBC" ) == 1
      StrDiff( "AXBC", "ABC" ) == 6
      StrDiff( "AXBC", "ADC" ) == 9

Compliance

StrDiff() is compatible with CT3’s StrDiff().

Platforms

All

Files

Source is strdiff.c, library is libct.

Seealso

SETATLIKE()

2 responses to “StrDiff()

  1. Pingback: Harbour String Functions | Viva Clipper !

  2. Pingback: Harbour All Functions – S | Viva Clipper !

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.