Talk:NYSIIS

From Rosetta Code

task status

What must one do to accomplish the task? It doesn't say. --Paddy3118 13:06, 23 March 2013 (UTC)

It is still ambiguous. Really, the task was promoted out of draft too soon. Typical NYSIIS usage assumes removal of common suffixes before encoding, and the reference implementation (Caché ObjectScript) did that, so I did as well in Perl 6. It probably should be put into the task description though (and soon, before too many implementations are added). Also, the algorithm is typically used to encode both first and last names in one go. None of the implementations as of now will handle that, i.e. "John Smith" -> nysiis -> "JAN SNAT".
There are a few ambiguities in the algorithm description too IMO. Should you consider the first letter when encoding H? Should Wheeler encode as WALAR or WHALAR? I believe that it should consider the first character (WALAR - W is a non-vowel so remove the H) but it is open to interpretation.
Also, when removing a terminal S or A: if the last characters are AS and you remove the S, do you then need to remove the newly terminal A, or should you only remove the last character once? Should Louis encode as L or LA? Again, it is open to interpretation. --Thundergnat 21:03, 23 March 2013 (UTC)
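To make the H question above concrete, here is a minimal Python sketch of the two readings of the H rule ("if the previous or next character is a non-vowel, replace the H with the previous character"). The function name `h_is_replaced` and the flag `count_first_letter` are hypothetical, and treating the end of the name as a non-vowel is an assumption, not something the Wikipedia description states.

```python
VOWELS = set("AEIOU")

def h_is_replaced(name: str, i: int, count_first_letter: bool) -> bool:
    """Return True if the H at index i (i >= 1) would be replaced by the previous character.

    count_first_letter selects the interpretation: when False, the first
    letter of the name is not allowed to act as the "previous" character.
    """
    prev_c = name[i - 1]
    # Assumption: an H at the very end of the name has a "non-vowel" next character.
    next_c = name[i + 1] if i + 1 < len(name) else ""
    prev_is_non_vowel = prev_c not in VOWELS
    if i == 1 and not count_first_letter:
        prev_is_non_vowel = False
    next_is_non_vowel = next_c not in VOWELS
    return prev_is_non_vowel or next_is_non_vowel

print(h_is_replaced("WHEELER", 1, True))   # first reading: the H is replaced by W
print(h_is_replaced("WHEELER", 1, False))  # second reading: the H is kept
```

Under the first reading the H turns into a second W, which is then dropped as a duplicate, giving WALAR; under the second reading the H survives, giving WHALAR.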
Well I changed it back to draft status as I thought it needed just this kind of discussion to improve the task description.
How about changing the task description to explicitly state that just the Wikipedia algorithm need be implemented, but leaving it open for implementors to add extra functionality as long as they state that they are doing so (and maybe what extra functionality is being added)? --Paddy3118 06:49, 24 March 2013 (UTC)

Apologies to all, I am new to this site and regret not being more precise about this particular task. I should have also read the guidelines before starting. The task is basically to implement the NYSIIS algorithm, but I have noticed the original algorithm has subsequently been modified by others (presumably to improve its indexing capabilities). I propose the main task should be to implement the standard NYSIIS algorithm, as shown on Wikipedia, which I believe is based on a single name. There should then be the optional task of allowing multiple names to be processed (I will give some examples of those, including double-barrelled names, or double surnames, and including the removal of suffixes/honours that are not required for indexing purposes). Does this sound okay? BTW, I appreciate all the helpful comments already posted here. --Toucanbird 11:42, 24 March 2013 (UTC)

Hey Toucanbird, that's OK - I was a newbie here once too. I followed links on suffixes/honours and found that there are so many of them that just listing them would take up too much space, and that was just for British ones. Just following the wp (Wikipedia) algorithm might make for the best task, but you might allow for people to show how to handle a small sample of suffixes as extra credit. What do you think? --Paddy3118 16:07, 24 March 2013 (UTC)
I have amended the task wording slightly to allow for a small selection of suffixes/honours to suffice for demonstration purposes - feel free to amend further if necessary. --Toucanbird 19:42, 24 March 2013 (UTC)

post-nominal letters

As for the problem of the numerous post-nominal letters (honorific, professional, generational, and others ...), most of them have a common identifier (to list just a very small sampling):

A.B  Atty.  B.A.  B.E.  B.F.A.  B.S.  B.Sc.  B.Tech.  C.S.V.  CEng.  CFA.  D.C.  D.D.  D.O.  D.Phil.
Dr.  e.g.  Ed.D.  Eng.D.  Esq.  etc.  family.  grandfather.  herself.  himself.  II.  III. IV.  J.D
J.D.  Jnr.  Jr.  Junior.  K.B.E.  L.  L.L.B  lawyer.  LL.D  LL.D.  LL.M  M.A.  M.B.A.  M.D.  M.Eng
M.F.A.  M.L.A.  M.S.  M.Sc.  Master.  MEOA.  Minor.  Miss.  Mr.  Mrs.  Ms.  Mz.  nephew.  O-3.
O.F.M.  P.E.  P.G.  Ph.D.  Pharm.D.  R.A.  R.I.P  S.  Snr.  son.  Sr.

The REXX entry that I coded tests for that common character (the period) in the last word in the name, and if found, ignores it (elides it from the name). -- Gerard Schildberger (talk) 23:27, 12 August 2015 (UTC)
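The approach described above might look roughly like the following Python sketch. This is not the REXX code; it assumes the common identifier is a period and that "ignores it" means the whole last word is dropped. The function name `drop_post_nominal` is hypothetical.

```python
def drop_post_nominal(name: str) -> str:
    """Drop the last word of a name if it contains a period.

    Assumption: a period in the final word marks a post-nominal such as
    "Jr." or "Ph.D."; a single-word name is always left alone.
    """
    words = name.split()
    if len(words) > 1 and "." in words[-1]:
        words.pop()
    return " ".join(words)

print(drop_post_nominal("John Smith Jr."))
print(drop_post_nominal("John Smith"))
```

A name whose suffix is written without a period (for example "John Smith Jr") would pass through unchanged, so this only covers the cases in the sampling above.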

Uncertainty about wikipedia spec

For a name like 'JOHN DOE' is 'D' a "first character of a name"? Is 'N' a "last character of a name"? And does each step use the same definition for these concepts? --Rdm (talk) 02:40, 13 August 2015 (UTC)

Rules 7, 8 and 9 and non-alphabetic characters

Although I'm implementing an Algol 68 sample following the interpretations of the Wikipedia entry as used by the other samples, I'm having trouble with it. I realise this is a task from many years ago but...

Looking at the Wikipedia entry as at 2nd August 2024, it doesn't say that only one of rules 7, 8 and 9 can apply, whereas rule 5 specifically states that only one of its cases can apply.
The existing samples treat rules 7, 8 and 9 as being mutually exclusive, but it seems to me that Mathews should yield MAT and Willis should yield WAL - unless I've missed some other rule that forces MATA and WALA?
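The difference between the two readings of rules 7, 8 and 9 can be sketched in Python as below. The inputs are the keys as they would stand just before the terminal rules; MATAS, WALAS and LAS (for the Louis example raised earlier on this page) were worked out by hand and are assumptions, not the output of any existing sample. The function names and the guard against emptying a one-letter key are also hypothetical.

```python
def finish_sequential(key: str) -> str:
    """Apply rules 7, 8 and 9 one after another."""
    if len(key) > 1 and key.endswith("S"):   # rule 7: drop a terminal S
        key = key[:-1]
    if key.endswith("AY"):                   # rule 8: terminal AY becomes Y
        key = key[:-2] + "Y"
    if len(key) > 1 and key.endswith("A"):   # rule 9: drop a terminal A
        key = key[:-1]
    return key

def finish_exclusive(key: str) -> str:
    """Apply only the first of rules 7, 8 and 9 that matches."""
    if len(key) > 1 and key.endswith("S"):
        return key[:-1]
    if key.endswith("AY"):
        return key[:-2] + "Y"
    if len(key) > 1 and key.endswith("A"):
        return key[:-1]
    return key

for key in ("MATAS", "WALAS", "LAS"):
    print(key, finish_sequential(key), finish_exclusive(key))
```

The sequential version gives MAT, WAL and L; the exclusive version gives MATA, WALA and LA.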

Additionally, (almost?) all the current samples remove non-alphabetic characters as a first step, whereas the task states only that all whitespace characters should be removed. The Wikipedia entry doesn't say anything about non-alphabetic characters.
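For comparison, a small Python sketch of the two pre-processing choices. Both function names are hypothetical, the upper-casing is an assumption for convenience, and "alphabetic" is taken here to mean the letters A to Z only.

```python
def remove_whitespace_only(name: str) -> str:
    """Remove whitespace, as the task wording asks; punctuation is kept."""
    return "".join(name.upper().split())

def remove_non_alphabetic(name: str) -> str:
    """Remove everything except A-Z, as most of the current samples appear to do."""
    return "".join(c for c in name.upper() if "A" <= c <= "Z")

print(remove_whitespace_only("O'Neil Jr"))
print(remove_non_alphabetic("O'Neil Jr"))
```

The first keeps the apostrophe, so whatever the later rules do with a non-letter character becomes part of the question; the second sidesteps it.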

Does anyone have any thoughts, or can anyone point out the error of my ways?
--Tigerofdarkness (talk) 14:27, 2 August 2024 (UTC)

And now the bad news... : (
Apache have an implementation of the NYSIIS algorithm in their Apache Commons Codec library, and it encodes Willis as WAL and Mathews as MAT. I therefore conclude that a lot of the existing samples are incorrect. I've fixed the Algol 68 and Java samples and added a sample that uses the Apache library.
--Tigerofdarkness (talk) 20:55, 2 August 2024 (UTC)