suppose you want to search a list of names to see if the name DataMonk is in the list.
You want to look for an exact match or names that are similar. This is called Fuzzy Matching.
Merging data sets on names with approximately the same spelling, or merging on times that are
within two minutes of each other are examples of these kinds of merges.
SAS has a in-built function called ‘spedis’, which stands for Spelling Distance.
The function can be used for Fuzzy Matching.
The SPEDIS function returns a 0 if the two arguments match exactly. The function
assigns penalty points for each type of spelling error. For example, getting the first letter
wrong is assigned more points than misspelling other letters. Interchanging two letters is
a relatively small error, as is adding an extra letter to a word.
To identify any name that is similar to DataMonk, you could extract all names where the
value returned by the SPEDIS function is less than some predetermined value.
Frequently SAS programmers must merge files where
the values of the key variables are only approximately the
same. Merging on names with approximately the same
spelling, or merging on times that are within three
minutes of each other are examples of these kinds of merges
Answers ( 2 )
suppose you want to search a list of names to see if the name DataMonk is in the list.
You want to look for an exact match or names that are similar. This is called Fuzzy Matching.
Merging data sets on names with approximately the same spelling, or merging on times that are
within two minutes of each other are examples of these kinds of merges.
SAS has a in-built function called ‘spedis’, which stands for Spelling Distance.
The function can be used for Fuzzy Matching.
The SPEDIS function returns a 0 if the two arguments match exactly. The function
assigns penalty points for each type of spelling error. For example, getting the first letter
wrong is assigned more points than misspelling other letters. Interchanging two letters is
a relatively small error, as is adding an extra letter to a word.
To identify any name that is similar to DataMonk, you could extract all names where the
value returned by the SPEDIS function is less than some predetermined value.
Frequently SAS programmers must merge files where
the values of the key variables are only approximately the
same. Merging on names with approximately the same
spelling, or merging on times that are within three
minutes of each other are examples of these kinds of merges