Determine if a string has all unique characters

Revision as of 01:55, 30 October 2019

Task

Given a character string (which may be empty, that is, have a length of zero characters):

create a function/procedure/routine to:

determine if all the characters in the string are unique
indicate whether any character is duplicated and, if so, which one and where

display each string and its length (as the strings are being examined)
a zero-length (empty) string shall be considered as unique
process the strings from left to right
if unique, display a message saying so
if not unique, then:

display a message saying so
display what character is duplicated
only the first non-unique character need be displayed
display where "both" duplicated characters are in the string
the above messages can be part of a single message
display the hexadecimal value of the duplicated character

Use (at least) these five test values (strings):

a string of length 0 (an empty string)
a string of length 1 which is a single period (.)
a string of length 6 which contains: abcABC
a string of length 7 which contains a blank in the middle: XYZ ZYX
a string of length 36 which doesn't contain the letter "oh" (it has the digit zero where the O would be):

1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ

Show all output here on this page.
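As a point of comparison for the entries below, here is a minimal Python sketch of the check. Python is not one of this page's entries, and the helper names first_duplicate and report are hypothetical. It assumes "the first non-unique character" means the leftmost character that occurs again later in the string (the reading the REXX entry uses), and it reports 1-based positions counted in Unicode code points rather than graphemes.

```python
from collections import Counter


def first_duplicate(s: str):
    """Return (char, first_pos, second_pos) for the leftmost character that
    occurs again later in s, using 1-based positions; None if all unique.

    Positions count Unicode code points, not grapheme clusters.
    """
    counts = Counter(s)  # one pass to learn which characters repeat
    for i, ch in enumerate(s):
        if counts[ch] > 1:
            # str.index with a start offset finds the next occurrence.
            j = s.index(ch, i + 1)
            return ch, i + 1, j + 1
    return None  # includes the empty string, which counts as unique


def report(s: str) -> str:
    """Build a one-line message: the string, its length, and the verdict."""
    header = f"{s!r} (length {len(s)}): "
    dup = first_duplicate(s)
    if dup is None:
        return header + "all characters are unique."
    ch, p1, p2 = dup
    return (header + f"not unique; {ch!r} (0x{ord(ch):X}) "
            f"at positions {p1} and {p2}.")


if __name__ == "__main__":
    tests = ["", ".", "abcABC", "XYZ ZYX",
             "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ"]
    for t in tests:
        print(report(t))
```

Counting with collections.Counter first keeps the scan linear; the REXX entry instead calls pos() once per character, which is quadratic in the worst case but needs no extra storage.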

Factor

<lang factor>USING: accessors formatting generalizations io kernel math.parser regexp sequences sets strings ;

: >dup-char< ( str n -- char hex first-index second-index )
   1string tuck [ dup first >hex ] 2dip <regexp>
   all-matching-slices first2 [ from>> ] bi@ ;

: duplicate-info. ( str -- )
   dup duplicates
   [ >dup-char< "'%s' (0x%s) at indices %d and %d.\n" printf ]
   with each nl ;

: uniqueness-report. ( str -- )
   dup dup length "%u — length %d — contains " printf dup
   all-unique? [ drop "all unique characters." print nl ]
   [ "duplicate characters:" print duplicate-info. ] if ;

"" "." "abcABC" "XYZ ZYX" "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" [ uniqueness-report. ] 5 napply</lang>

Output:

"" — length 0 — contains all unique characters.

"." — length 1 — contains all unique characters.

"abcABC" — length 6 — contains all unique characters.

"XYZ ZYX" — length 7 — contains duplicate characters:
'Z' (0x5a) at indices 2 and 4.
'Y' (0x59) at indices 1 and 5.
'X' (0x58) at indices 0 and 6.

"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" — length 36 — contains duplicate characters:
'0' (0x30) at indices 9 and 24.
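The Factor entry reports every duplicated character rather than only the first. A minimal Python sketch of that variant follows, assuming 0-based indices (to match the Factor output) and listing characters in order of first occurrence, so "XYZ ZYX" yields X, Y, Z where the Factor entry prints Z, Y, X. Unlike the Factor entry, it keeps every index of a repeated character, not just the first two. The function name all_duplicates is hypothetical.

```python
def all_duplicates(s: str) -> dict[str, list[int]]:
    """Map each repeated character to every 0-based index where it occurs.

    Plain dicts keep insertion order (Python 3.7+), so characters come out
    in order of their first occurrence in s.
    """
    positions: dict[str, list[int]] = {}
    for i, ch in enumerate(s):
        positions.setdefault(ch, []).append(i)
    # Keep only characters seen more than once.
    return {ch: idx for ch, idx in positions.items() if len(idx) > 1}


if __name__ == "__main__":
    for t in ["", ".", "abcABC", "XYZ ZYX",
              "1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ"]:
        dups = all_duplicates(t)
        if not dups:
            print(f"{t!r} (length {len(t)}): all unique")
            continue
        print(f"{t!r} (length {len(t)}): duplicates")
        for ch, idx in dups.items():
            print(f"  {ch!r} (0x{ord(ch):x}) at indices {idx}")
```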

Perl 6

Works with: Rakudo version 2019.07.1

Perl 6 works with Unicode natively: the .chars method counts graphemes (user-perceived characters), so combining characters and multi-code-point emoji are handled correctly. In the last string, notice that the length is correctly shown as 11 characters, and that the delta with a combining circumflex in position 6 is not treated as the same character as the plain deltas in positions 5 and 9.

<lang perl6>-> $str {
   print "\n{$str.perl} (length: {$str.chars}), has ";
   if my $match = $str.match( / (.).*$0 /, :ex ) {
       my %m;
       %m{.values.Str}.append(flat 1 + .from, .pos) for $match.list;
       say "duplicated characters:";
       say "'{.key}' ({.key.uninames}; hex ordinal: {(.key.ords).fmt: "0x%X"})" ~
       " in positions: {.value.sort.squish.join: ', '}" for %m.sort( *.value[0] );
   } else {
       say "no duplicated characters."
   }
} for
   '',
   '.',
   'abcABC',
   'XYZ ZYX',
   '1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ',
   '01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X',
   '🦋🙂👨‍👩‍👧‍👦🙄ΔΔ̂ 🦋Δ👍👨‍👩‍👧‍👦'</lang>

Output:

"" (length: 0), has no duplicated characters.

"." (length: 1), has no duplicated characters.

"abcABC" (length: 6), has no duplicated characters.

"XYZ ZYX" (length: 7), has duplicated characters:
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 1, 7
'Y' (LATIN CAPITAL LETTER Y; hex ordinal: 0x59) in positions: 2, 6
'Z' (LATIN CAPITAL LETTER Z; hex ordinal: 0x5A) in positions: 3, 5

"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" (length: 36), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 10, 25

"01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X" (length: 39), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 1, 11, 26, 38
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 35, 39

"🦋🙂👨‍👩‍👧‍👦🙄ΔΔ̂ 🦋Δ👍👨‍👩‍👧‍👦" (length: 11), has duplicated characters:
'🦋' (BUTTERFLY; hex ordinal: 0x1F98B) in positions: 1, 8
'👨‍👩‍👧‍👦' (MAN ZERO WIDTH JOINER WOMAN ZERO WIDTH JOINER GIRL ZERO WIDTH JOINER BOY; hex ordinal: 0x1F468 0x200D 0x1F469 0x200D 0x1F467 0x200D 0x1F466) in positions: 3, 11
'Δ' (GREEK CAPITAL LETTER DELTA; hex ordinal: 0x394) in positions: 5, 9
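The grapheme-aware lengths above come from Perl 6 itself. In Python, by contrast, a str is a sequence of code points, so a naive scan separates a combining mark from its base character and counts a zero-width-joiner emoji sequence as several characters. The following sketch (helper names code_point_positions and names are hypothetical; standard library only) shows the effect on a plain delta, a delta followed by U+0302 COMBINING CIRCUMFLEX ACCENT, and the family emoji whose code points are listed in the output above.

```python
import unicodedata

# Delta, then Delta + COMBINING CIRCUMFLEX ACCENT (U+0302), then Delta:
# three user-perceived characters, but four code points.
s = "\u0394\u0394\u0302\u0394"

# Family emoji: MAN, WOMAN, GIRL, BOY joined by ZERO WIDTH JOINER (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"


def code_point_positions(text: str, ch: str) -> list[int]:
    """1-based positions of ch, counting code points (not graphemes)."""
    return [i + 1 for i, c in enumerate(text) if c == ch]


def names(text: str) -> list[str]:
    """Unicode name of each code point in text."""
    return [unicodedata.name(c) for c in text]


if __name__ == "__main__":
    print(len(s))                             # 4, not 3
    # [1, 2, 4]: the circumflexed delta matches as a plain delta
    print(code_point_positions(s, "\u0394"))
    print(len(family))                        # 7, not 1
    print(names(s))
```

Python's standard library has no grapheme segmentation, so a Python version that matched the Perl 6 output for the last string would need a third-party package.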

REXX

<lang rexx>/*REXX pgm determines if a string is comprised of all unique characters (no duplicates).*/
@.=                                              /*assign a default for the  @.  array. */
parse arg @.1                                    /*obtain optional argument from the CL.*/
if @.1=''  then do;  @.1=                        /*Not specified?  Then assume defaults.*/
                     @.2= .
                     @.3= 'abcABC'
                     @.4= 'XYZ ZYX'
                     @.5= '1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ'
               end

    do j=1;  if j\==1  &  @.j==''  then leave   /*String is null & not j=1?  We're done*/
    say copies('─', 79)                         /*display a separator line  (a fence). */
    say 'Testing for the string (length' length(@.j)"): "   @.j
    say
    dup= isUnique(@.j)
    say 'The characters in the string'   word("are aren't", 1 + (dup>0) )  'all unique.'
    if dup==0  then iterate
    ?= substr(@.j, dup, 1)
    say 'The character '  ?  " ('"c2x(?)"'x)  at position "  dup ,
                                ' is repeated at position '  pos(?, @.j, dup+1)
    end   /*j*/

exit                                             /*stick a fork in it,  we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
isUnique: procedure;  parse arg x                /*obtain the character string.*/
                      do k=1  to length(x) - 1           /*examine all but the last.   */
                      p= pos( substr(x, k, 1), x, k + 1) /*see if the Kth char is a dup*/
                      if p\==0  then return k            /*Find a dup? Return location.*/
                      end   /*k*/
         return 0                                        /*indicate all chars unique.  */</lang>

Output when using the internal defaults:

───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 0):

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 1):  .

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 6):  abcABC

The characters in the string are all unique.
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 7):  XYZ ZYX

The characters in the string aren't all unique.
The character  X  ('58'x)  at position  1  is repeated at position  7
───────────────────────────────────────────────────────────────────────────────
Testing for the string (length 36):  1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ

The characters in the string aren't all unique.
The character  0  ('30'x)  at position  10  is repeated at position  25