Revision as of 19:15, 23 March 2013 rosettacode>Paddy3118 (→‎{{header|Tcl}}: Add Python)
Revision as of 12:48, 24 March 2013 rosettacode>Toucanbird m (Modified task description for clarity)
Line 2:
{{wikipedia}}
The [[wp:New York State Identification and Intelligence System|New York State Identification and Intelligence System phonetic code]], commonly known as NYSIIS, is a phonetic algorithm for creating indices for words based on their pronunciation. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

The task here is to implement the original NYSIIS algorithm, shown in Wikipedia, rather than any other subsequent modification. Also, before the algorithm is applied, the input string should be converted to upper case with all white space removed.

An optional step is to handle multiple names, including double-barrelled names or double surnames (e.g. 'Hoyle-Johnson' or 'Vaughan Williams'), and unnecessary suffixes/honours that are not required for indexing purposes (e.g. 'Jnr', 'Sr', 'III', 'CBE').

The original implementation is also restricted to six characters, but this is not a requirement.

;See also
* [[Soundex]]

Line 12 ⟶ 17:
{
ClassMethod Encode(pAlgorithm As %String = "", pName As %String = "", ~~ByRef~~Output pCode As %String, pSuffixRem As %Boolean = 1, pTruncate As %Integer = 0) As %Status
{
  // check algorithm and name

Line 34 ⟶ 39:
  "GCVO", "KCVO", "DCVO", "CVO", "LVO", "MVO", "OM", "ISO", "GBE", "KBE", "DBE", "CBE", "OBE", "MBE", "CH")
  Set decs=$ListBuild("VC", "GC", "CGC", "RRC", "DSC", "MC", "DFC", "AFC", "ARRC", "OBI", "IOM")
  Set regexp="( )(SNR$|SR$|JNR$|JR$|ESQ$|"_$ListToString(ords, "$|")_"$|"_$ListToString(decs, "$|")_"$|[IVX]+$)"
  Set rem=##class(%Regex.Matcher).%New(regexp, pName)
  Set pName=rem.ReplaceAll("")

Line 46 ⟶ 51:
  Set pCode=""
  For piece=1:1:$Length(pName, " ") {
    If pAlgorithm="nysiis" Set pCode=pCode_..NYSIIS($Piece(pName, " ", piece))
  }
  If pTruncate {

Line 67 ⟶ 72:
  /
  // create regexp matcher instance, remove punctuation and convert all to upper case
  Set rem=##class(%Regex.Matcher).%New(" ")
  Set rem.Text=$ZConvert($ZStrip(pName, "P"), "U")

Line 159 ⟶ 164:
Wheeler -> WHALAR
Louis XVI -> L
Hoyle-Johnson -> HAYLJA[NSAN]
Vaughan Williams -> VAGANW[ALAN]
D'Souza -> DSAS
de Sousa -> DSAS
</pre>
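The optional pre-processing described in the task (upper-casing, dropping suffixes and honours, splitting double-barrelled names) can be sketched in Python as below. This is only an illustration under stated assumptions, not a translation of the Caché ObjectScript shown in the diff: the function name <code>prepare_name</code>, the abbreviated <code>SUFFIXES</code> list and the choice to treat a hyphen as a name separator are all hypothetical, and the NYSIIS encoding step itself is not included.

```python
import re

# Hypothetical, abbreviated suffix list; the ObjectScript revision builds
# much longer lists of orders and decorations.
SUFFIXES = ["SNR", "SR", "JNR", "JR", "ESQ", "CBE", "OBE", "MBE"]

# A trailing suffix or Roman numeral (I, V, X only), preceded by a space.
SUFFIX_RE = re.compile(r" (?:" + "|".join(SUFFIXES) + r"|[IVX]+)$")


def prepare_name(name):
    """Return the upper-case name parts that would each be passed to an encoder."""
    # Assumption: a hyphen separates two surnames, so it becomes a space.
    text = name.upper().replace("-", " ")
    # Keep only ASCII letters and spaces (drops apostrophes, dots, digits,
    # and, as a limitation of this sketch, accented letters).
    text = re.sub(r"[^A-Z ]", "", text)
    # Collapse runs of white space.
    text = " ".join(text.split())
    # Strip trailing suffixes repeatedly, e.g. "JR CBE".
    while True:
        stripped = SUFFIX_RE.sub("", text)
        if stripped == text:
            break
        text = stripped
    return text.split()


if __name__ == "__main__":
    for sample in ["Louis XVI", "Hoyle-Johnson", "Vaughan Williams", "D'Souza"]:
        print(sample, "->", prepare_name(sample))
```

Each returned part would then be encoded separately and the codes concatenated, which is what the <code>For piece=1:1:$Length(pName, " ")</code> loop in the diff appears to do. As with the regular expression in the diff, a genuine final name part made up only of the letters I, V and X would be mistaken for a Roman numeral and removed.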

NYSIIS: Difference between revisions