NYSIIS: Difference between revisions

708 bytes added ,  11 years ago
m
Modified task description for clarity
(→‎{{header|Tcl}}: Add Python)
Line 2:
{{wikipedia}}
The [[wp:New York State Identification and Intelligence System|New York State Identification and Intelligence System phonetic code]], commonly known as NYSIIS, is a phonetic algorithm for creating indices for words based on their pronunciation. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
 
The task here is to implement the original NYSIIS algorithm, as described on Wikipedia, rather than any later modification. Before the algorithm is applied, the input string should be converted to upper case and all white space removed.
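
The preprocessing step above can be sketched in Python as follows. This is only a minimal illustration of the upper-casing and white-space removal, not of the NYSIIS rules themselves; the function name <code>normalize</code> is a hypothetical choice and is not part of the task.

```python
def normalize(name: str) -> str:
    """Upper-case the input and drop every whitespace character."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs and newlines alike.
    return "".join(name.split()).upper()


if __name__ == "__main__":
    print(normalize(" Vaughan  Williams "))  # prints VAUGHANWILLIAMS
```

An implementation that also treats multi-part names separately would probably need to split the name into its parts before discarding the spaces.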
 
An optional step is to handle multiple names, including double-barrelled names or double surnames (e.g. 'Hoyle-Johnson' or 'Vaughan Williams'), and to remove suffixes and honours that are not required for indexing purposes (e.g. 'Jnr', 'Sr', 'III', 'CBE'). The original implementation also truncates the code to six characters, but this is not a requirement.
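
One possible way to approach the optional step is sketched below in Python. It assumes that suffixes appear as trailing, space-separated tokens, that Roman numerals use only the letters I, V and X, and that hyphens and spaces both separate name parts. The names <code>SUFFIXES</code> and <code>split_name</code> are hypothetical, and the suffix list is deliberately short.

```python
import re

# Hypothetical, deliberately short suffix list; a real one would be longer.
SUFFIXES = ("SNR", "SR", "JNR", "JR", "ESQ", "CBE", "OBE", "MBE")

# A trailing token that is either a listed suffix or a Roman numeral (I, V, X).
_SUFFIX_RE = re.compile(r"\s+(?:" + "|".join(SUFFIXES) + r"|[IVX]+)$")


def split_name(full_name: str) -> list[str]:
    """Return the upper-cased parts of a name with trailing suffixes removed."""
    name = full_name.strip().upper()
    # Strip suffixes one at a time so that 'Jnr CBE' loses both tokens.
    while True:
        stripped = _SUFFIX_RE.sub("", name)
        if stripped == name:
            break
        name = stripped
    # Hyphens and white space both separate parts of a multiple name.
    parts = re.split(r"[\s\-]+", name)
    # Drop remaining punctuation such as apostrophes, and skip empty parts.
    cleaned = (re.sub(r"[^A-Z]", "", part) for part in parts)
    return [part for part in cleaned if part]


if __name__ == "__main__":
    for example in ("Hoyle-Johnson", "Vaughan Williams", "Louis XVI", "D'Souza"):
        print(example, "->", split_name(example))
```

Each returned part could then be encoded separately and the codes concatenated. Note that the Roman-numeral rule will also remove a genuine trailing name part spelled only with I, V and X, and that the <code>[^A-Z]</code> filter discards accented letters.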
 
;See also
* [[Soundex]]
Line 12 ⟶ 17:
{
 
ClassMethod Encode(pAlgorithm As %String = "", pName As %String = "", Output pCode As %String, pSuffixRem As %Boolean = 1, pTruncate As %Integer = 0) As %Status
{
// check algorithm and name
Line 34 ⟶ 39:
"GCVO", "KCVO", "DCVO", "CVO", "LVO", "MVO", "OM", "ISO", "GBE", "KBE", "DBE", "CBE", "OBE", "MBE", "CH")
Set decs=$ListBuild("VC", "GC", "CGC", "RRC", "DSC", "MC", "DFC", "AFC", "ARRC", "OBI", "IOM")
Set regexp="( )(SNR$|SR$|JNR$|JR$|ESQ$|"_$ListToString(ords, "$|")_"$|"_$ListToString(decs, "$|")_"$|[IVX]+$)"
Set rem=##class(%Regex.Matcher).%New(regexp, pName)
Set pName=rem.ReplaceAll("")
Line 46 ⟶ 51:
Set pCode=""
For piece=1:1:$Length(pName, " ") {
If pAlgorithm="nysiis" Set pCode=pCode_..NYSIIS($Piece(pName, " ", piece))
}
If pTruncate {
Line 67 ⟶ 72:
*/
// create regexp matcher instance, remove punctuation and convert all to upper case
Set rem=##class(%Regex.Matcher).%New(" ")
Set rem.Text=$ZConvert($ZStrip(pName, "*P"), "U")
Line 159 ⟶ 164:
Wheeler -> WHALAR
Louis XVI -> L
Hoyle-Johnson -> HAYLJA[NSAN]
Vaughan Williams -> VAGANW[ALAN]
D'Souza -> DSAS
de Sousa -> DSAS
</pre>
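
The bracketed results above appear to mark the characters that fall beyond a six-character limit; this is inferred from the sample output rather than stated in the task. Under that assumption, the display could be produced by a small Python helper such as the following, where <code>format_code</code> and its <code>limit</code> parameter are hypothetical names.

```python
def format_code(code: str, limit: int = 6) -> str:
    """Show characters beyond `limit` in square brackets."""
    # Codes that already fit within the limit are returned unchanged.
    if len(code) <= limit:
        return code
    # Otherwise keep the first `limit` characters and bracket the remainder.
    return f"{code[:limit]}[{code[limit:]}]"


if __name__ == "__main__":
    print(format_code("HAYLJANSAN"))  # prints HAYLJA[NSAN]
    print(format_code("DSAS"))        # prints DSAS
```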