SEDOLs

Revision as of 20:16, 2 August 2008 by 68.175.31.239 (talk) (New challenge: SEDOL numbers)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

For each number list of 6-digit SEDOLs, calculate and append the checksum digit. That is, given this input:

Task
SEDOLs
You are encouraged to solve this task according to the task description, using any language you may know.
710889
B0YBKJ
406566
B0YBLH
228276
B0YBKL
557910
B0YBKR
585284
B0YBKT

Produce this output:

7108899

B0YBKJ7 4065663 B0YBLH2 2282765 B0YBKL9 5579107 B0YBKR5 5852842

B0YBKT7

J

There are several ways to perform this in J. This most closely follows the algorithmic description at Wikipedia:

  sn   =.  '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'  
  ac0  =.  (, 10 | 1 3 1 7 3 9 +/@:* -)&.(sn i. |:)

However, so J is so concise, having written the above, it becomes clear that the negation (-) is unneccsary.

The fundamental operation is the linear combination (+/@:*) and neither argument is "special". In particular, the coefficients are just another array participating in the calculation, and there's no reason we can't modify them as easily as the input array. Having this insight, it is obvious that manipulating the coefficients, rather than the input array, will be more efficient (because the coefficients are fixed at small size, while the input array can be arbitrarily large).

Which leads us to this more efficient formulation:

  ac1  =.  (, 10 | (10 - 1 3 1 7 3 9) +/@:* ])&.(sn i. |:)

which reduces to:

  ac1  =.  (, 10 | 9 7 9 3 7 1 +/@:* ])&.(sn i. |:)

Which is just as concise as ac0, but faster.

Following this train of thought, our array thinking leads us to realize that even the modulous isn't neccesary. The number of SEDOL numbers is finite, as is the number of coefficients; therefore the number of possible linear combinations of these is finite. In fact, there are only 841 possible outcomes. This is a small number, and can be efficiently stored as a lookup table (even better, since the outcomes will be mod 10, they are restricted to the digits 0-9, and they repeat).

Which leads us to:

  ac2  =.  (,"1 0 (841 $ '0987654321') {~ 1 3 1 7 3 9 +/ .*~ sn i. ])

Which is more than twice as fast as even the optimized formulation (ac1), though it is slightly longer.