Unique characters: Difference between revisions

From Rosetta Code

Revision as of 09:02, 6 August 2021

Unique characters is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Given a list of strings, find characters appearing only in one string and once only.

The result should be given in alphabetical order.

Use the following list for this task:

        ["133252abcdeeffd",  "a6789798st",  "yxcdfgxcyz"]
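As a point of reference for the entries that follow, here is a minimal Python sketch of the task, assuming "appearing only in one string and once only" means exactly one occurrence across all the strings combined, and that "alphabetical order" means code point order. The function name `unique_characters` is hypothetical and not part of the task.

```python
from collections import Counter


def unique_characters(strings):
    """Return, in code point order, the characters that occur exactly once overall."""
    # Count every character across the concatenation of all strings.
    counts = Counter("".join(strings))
    # Keep only the characters seen exactly once, then sort them.
    return sorted(ch for ch, n in counts.items() if n == 1)


if __name__ == "__main__":
    print(unique_characters(["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]))
    # prints ['1', '5', '6', 'b', 'g', 's', 't', 'z']
```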


8080 Assembly

<lang 8080asm>puts:   equ     9               ; CP/M print syscall
TERM:   equ     '$'             ; CP/M string terminator
        org     100h
        jmp     demo
        ;;;     Given a list of strings, find characters appearing only
        ;;;     in one string and once only.
        ;;;     Input: DE = list of strings, BC = start of output
unique: xra     a               ; Zero out the workspace
        mvi     h,upage
        mov     l,a
uzbuf:  mov     m,a
        inr     l
        jnz     uzbuf
        push    d
ustr:   pop     h
        mov     e,m             ; Load next string pointer
        inx     h
        mov     d,m
        inx     h
        mov     a,d             ; End of list?
        ora     e
        jz      uclat           ; Then go find the uniques
        push    h               ; Otherwise, keep list pointer
        mvi     h,upage
uchar:  ldax    d               ; Get character
        cpi     TERM            ; Done?
        jz      ustr            ; Then do next string
        mov     l,a             ; Otherwise, count the character
        inr     m
        inx     d               ; Next character
        jmp     uchar
uclat:  lxi     h,upage*256     ; Start of page
utst:   dcr     m               ; Is this character included?
        jnz     uskip
        mov     a,l             ; If so add it to the output
        stax    b
        inx     b
uskip:  inr     l               ; Try next character
        jnz     utst
        mvi     a,TERM          ; CP/M string terminator
        stax    b
        ret
        ;;;     Demo code
demo:   lxi     b,outbuf        ; Set BC to location of output buffer
        lxi     d,list          ; Set DE to the list of strings
        call    unique          ; Call the code
        mvi     c,puts          ; Print the result
        lxi     d,outbuf
        jmp     5
        ;;;     List of strings
list:   dw      str1, str2, str3, 0
str1:   db      '133252abcdeeffd', TERM
str2:   db      'a6789798st', TERM
str3:   db      'yxcdfgxcyz', TERM
        ;;;     Memory
upage:  equ     ($/256)+1       ; Workspace for 'unique'
outbuf: equ     (upage+1)*256   ; Output </lang>
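The routine above keeps one 8-bit counter per possible byte value in a 256-byte page, then walks the page in address order, which yields the result already sorted. A rough Python sketch of the same idea follows; the name `unique_bytes` is hypothetical, and the code assumes every input character fits in a single byte.

```python
def unique_bytes(strings):
    """Tally bytes in a 256-entry table, then scan the table in index order."""
    table = bytearray(256)                  # one counter per possible byte value
    for s in strings:
        # Assumption: all characters are single-byte (Latin-1 or plain ASCII).
        for b in s.encode("latin-1"):
            # 8-bit counters wrap around, as an INR M on the count page would.
            table[b] = (table[b] + 1) & 0xFF
    # Scanning indices 0..255 in order gives the output in byte-value order.
    return "".join(chr(i) for i in range(256) if table[i] == 1)


if __name__ == "__main__":
    print(unique_bytes(["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]))
    # prints 156bgstz
```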




The filtering here is case-sensitive; the sorting depends on the locale.

<lang applescript>on uniqueCharacters(listOfStrings)

   set astid to AppleScript's text item delimiters
   set AppleScript's text item delimiters to ""
   set countedSet to current application's class "NSCountedSet"'s setWithArray:((listOfStrings as text)'s characters)
   set AppleScript's text item delimiters to astid
   set mutableSet to current application's class "NSMutableSet"'s setWithSet:(countedSet)
   tell countedSet to minusSet:(mutableSet)
   tell mutableSet to minusSet:(countedSet)
   set sortDescriptor to current application's class "NSSortDescriptor"'s sortDescriptorWithKey:("self") ¬
       ascending:(true) selector:("localizedStandardCompare:")
   return (mutableSet's sortedArrayUsingDescriptors:({sortDescriptor})) as list

end uniqueCharacters</lang>


<lang applescript>{"1", "5", "6", "b", "g", "s", "t", "z"}</lang>

Core language only

This isn't quite as fast as the ASObjC solution above, but it can be case-insensitive if required: simply leave out the 'considering case' statement around the call to the handler. The requirement for AppleScript 2.3.1 is only for the 'use' command that loads the "Heap Sort" script. If "Heap Sort" is loaded differently or compiled directly into the code, this script will work on systems at least as far back as Mac OS X 10.5 (Leopard) and possibly earlier. The output is the same as above.

<lang applescript>use AppleScript version "2.3.1" -- OS X 10.9 (Mavericks) or later
use sorter : script "Heap Sort" -- <https://www.rosettacode.org/wiki/Sorting_algorithms/Heapsort#AppleScript>

on uniqueCharacters(listOfStrings)

   script o
       property allCharacters : {}
       property uniques : {}
   end script
   set astid to AppleScript's text item delimiters
   set AppleScript's text item delimiters to ""
   set o's allCharacters to text items of (listOfStrings as text)
   set AppleScript's text item delimiters to astid
   set characterCount to (count o's allCharacters)
   tell sorter to sort(o's allCharacters, 1, characterCount)
   set i to 1
   set currentCharacter to beginning of o's allCharacters
   repeat with j from 2 to characterCount
       set thisCharacter to item j of o's allCharacters
       if (thisCharacter is not currentCharacter) then
           if (j - i = 1) then set end of o's uniques to currentCharacter
           set i to j
           set currentCharacter to thisCharacter
       end if
   end repeat
   if (i = j) then set end of o's uniques to currentCharacter
   return o's uniques

end uniqueCharacters

considering case

   return uniqueCharacters({"133252abcdeeffd", "a6789798st", "yxcdfgxcyz"})

end considering</lang>
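The handler above sorts all the characters and then keeps those whose run of equal neighbours has length one; case sensitivity is decided by the caller. A hedged Python sketch of that approach is given here. The function name `uniques_by_runs` and the `ignore_case` parameter are hypothetical stand-ins for the 'considering case' switch.

```python
def uniques_by_runs(strings, ignore_case=False):
    """Sort the characters, then keep those that form a run of length one."""
    text = "".join(strings)
    if ignore_case:
        # Assumption: case folding is an acceptable stand-in for ignoring case.
        text = text.casefold()
    chars = sorted(text)
    uniques = []
    i = 0
    while i < len(chars):
        j = i
        # Advance j to the end of the run of characters equal to chars[i].
        while j < len(chars) and chars[j] == chars[i]:
            j += 1
        if j - i == 1:
            uniques.append(chars[i])
        i = j
    return uniques


if __name__ == "__main__":
    print(uniques_by_runs(["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]))
```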


<lang rebol>arr: ["133252abcdeeffd" "a6789798st" "yxcdfgxcyz"]
str: join arr

print sort select split str 'ch -> 1 = size match str ch</lang>

1 5 6 b g s t z


<lang AWK>
# sorting:
#   PROCINFO["sorted_in"] is used by GAWK
#   SORTTYPE is used by Thompson Automation's TAWK
#
BEGIN {
    PROCINFO["sorted_in"] = "@ind_str_asc" ; SORTTYPE = 1
    n = split("133252abcdeeffd,a6789798st,yxcdfgxcyz",arr1,",")
    for (i=1; i<=n; i++) {
      str = arr1[i]
      total_c += leng = length(str)
      for (j=1; j<=leng; j++) {
        arr2[substr(str,j,1)]++   # count each character
      }
    }
    for (c in arr2) {
      if (arr2[c] == 1) {
        rec = sprintf("%s%s",rec,c)
      }
    }
    printf("%d strings, %d characters, %d different, %d unique: %s\n",n,total_c,length(arr2),length(rec),rec)
}
</lang>

3 strings, 35 characters, 20 different, 8 unique: 156bgstz


<lang cpp>#include <iostream>
#include <map>

int main() {
    const char* strings[] = {"133252abcdeeffd", "a6789798st", "yxcdfgxcyz"};
    std::map<char, int> count;
    // Count every character of every string.
    for (const char* str : strings) {
        for (; *str; ++str)
            ++count[*str];
    }
    // std::map iterates in key order, so the output is already sorted.
    for (const auto& p : count) {
        if (p.second == 1)
            std::cout << p.first;
    }
    std::cout << '\n';
}</lang>




Works with: Factor version 0.99 build 2074

<lang factor>USING: io sequences sets.extras sorting ;

{ "133252abcdeeffd" "a6789798st" "yxcdfgxcyz" } concat non-repeating natural-sort print</lang>



<lang go>package main

import (
    "fmt"
    "sort"
)

func main() {
    strings := []string{"133252abcdeeffd", "a6789798st", "yxcdfgxcyz"}
    m := make(map[rune]int)
    // Count every rune of every string.
    for _, s := range strings {
        for _, c := range s {
            m[c]++
        }
    }
    // Collect the runes seen exactly once.
    var chars []rune
    for k, v := range m {
        if v == 1 {
            chars = append(chars, k)
        }
    }
    sort.Slice(chars, func(i, j int) bool { return chars[i] < chars[j] })
    fmt.Println(string(chars))
}</lang>




Works with: jq

Works with gojq, the Go implementation of jq

The following "bag-of-words" solution is quite efficient as it takes advantage of the fact that jq implements JSON objects as a hash.
<lang jq>
# bag of words
def bow(stream):
  reduce stream as $word ({}; .[($word|tostring)] += 1);

# Input: an array of strings
# Output: an array of the characters that appear just once
def in_one_just_once:
  bow( .[] | explode[] | [.] | implode) | with_entries(select(.value==1)) | keys;
</lang>
The task
<lang jq>["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"] | in_one_just_once</lang>
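For readers less familiar with jq, the bag-of-words reduction can be sketched in Python roughly as follows. The function names mirror the jq definitions for ease of comparison only; a plain dict stands in for the JSON object, and the final `sorted` call plays the role of jq's `keys`, which returns keys in sorted order.

```python
from functools import reduce


def bow(stream):
    """Fold a stream of items into a dict mapping str(item) to its count."""
    def add(bag, word):
        key = str(word)
        bag[key] = bag.get(key, 0) + 1
        return bag
    return reduce(add, stream, {})


def in_one_just_once(strings):
    """Return the sorted characters that appear exactly once across all strings."""
    bag = bow(ch for s in strings for ch in s)
    return sorted(key for key, value in bag.items() if value == 1)


if __name__ == "__main__":
    print(in_one_just_once(["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]))
```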



<lang julia>list = ["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]

function is_once_per_all_strings_in(a::Vector{String})
    charlist = collect(prod(a))
    counts = Dict(c => count(x -> c == x, charlist) for c in unique(charlist))
    return sort([p[1] for p in counts if p[2] == 1])
end

println(is_once_per_all_strings_in(list))
</lang>





['1', '5', '6', 'b', 'g', 's', 't', 'z']

One might think that the method above suffers from too many passes through the text with one pass per count, but with a small text length the dictionary lookup takes more time. Compare times for a single pass version:

<lang julia>function uniquein(a)
    counts = Dict{Char, Int}()
    for c in prod(list)
        counts[c] = get!(counts, c, 0) + 1
    end
    return sort([c for (c, n) in counts if n == 1])
end

println(uniquein(list))

using BenchmarkTools
@btime is_once_per_all_strings_in(list)
@btime uniquein(list)
</lang>



['1', '5', '6', 'b', 'g', 's', 't', 'z']

 1.740 μs (28 allocations: 3.08 KiB)
 3.763 μs (50 allocations: 3.25 KiB)

This can be rectified (see the Phix entry) if we don't save the counts as we go but just exclude entries with duplicates:
<lang julia>function uniquein2(a)
    s = sort(collect(prod(list)))
    l = length(s)
    return [p[2] for p in enumerate(s) if (p[1] == 1 || p[2] != s[p[1] - 1]) && (p[1] == l || p[2] != s[p[1] + 1])]
end

println(uniquein2(list))

@btime uniquein2(list)
</lang>



['1', '5', '6', 'b', 'g', 's', 't', 'z']

 1.010 μs (14 allocations: 1.05 KiB)
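The count-free variant relies on one observation: in a sorted sequence, a character is unique exactly when it differs from both its neighbours. A minimal Python sketch of that test is shown here (the name `unique_by_neighbours` is hypothetical); no claim is made that its timings would follow the Julia figures above.

```python
def unique_by_neighbours(strings):
    """Keep each character of the sorted text that differs from both neighbours."""
    s = sorted("".join(strings))
    last = len(s) - 1
    return [
        ch
        for i, ch in enumerate(s)
        # The first and last positions only have one neighbour to compare with.
        if (i == 0 or ch != s[i - 1]) and (i == last or ch != s[i + 1])
    ]


if __name__ == "__main__":
    print(unique_by_neighbours(["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]))
```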


One solution, but others are possible, for instance concatenating the strings and building the count table from it rather than merging several count tables. And to build the last sequence, we could have used something like sorted(toSeq(charCount.pairs).filterIt(it[1] == 1).mapIt(it[0])), which is a one liner but less readable and less efficient than our solution using “collect”.

<lang Nim>import algorithm, sugar, tables

var charCount: CountTable[char]

for str in ["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]:

 charCount.merge str.toCountTable

let uniqueChars = collect(newSeq):

                   for ch, count in charCount.pairs:
                     if count == 1: ch

echo sorted(uniqueChars)</lang>

@['1', '5', '6', 'b', 'g', 's', 't', 'z']
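The two strategies mentioned for Nim, merging one count table per string versus counting the concatenated text, can be compared in a short Python sketch. Both function names are hypothetical, and `collections.Counter` is only an approximate analogue of Nim's `CountTable`.

```python
from collections import Counter


def unique_by_merging(strings):
    """Build one count table per string and merge each into a running total."""
    total = Counter()
    for s in strings:
        total.update(Counter(s))    # adds this string's counts to the total
    return sorted(ch for ch, n in total.items() if n == 1)


def unique_by_concatenating(strings):
    """Concatenate the strings first and build a single count table."""
    total = Counter("".join(strings))
    return sorted(ch for ch, n in total.items() if n == 1)


if __name__ == "__main__":
    data = ["133252abcdeeffd", "a6789798st", "yxcdfgxcyz"]
    print(unique_by_merging(data) == unique_by_concatenating(data))
    # prints True
```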


Translation of: Raku

<lang perl># 20210506 Perl programming solution

use strict;
use warnings;
use utf8;
use Unicode::Collate 'sort';

my %seen;
binmode(STDOUT, ':encoding(utf8)');
map { s/(\X)/$seen{$1}++/egr }

  "133252abcdeeffd", "a6789798st", "yxcdfgxcyz", "AАΑSäaoö٥🤔👨‍👩‍👧‍👧";

my $uca = Unicode::Collate->new();
print $uca->sort ( grep { $seen{$_} == 1 } keys %seen )</lang>



function once(integer ch, i, string s)
    integer l = length(s)
    return (i=1 or ch!=s[i-1])
       and (i=l or ch!=s[i+1])
end function

sequence set = {"133252abcdeeffd","a6789798st","yxcdfgxcyz"},
         res = filter(sort(join(set,"")),once)
printf(1,"found %d unique characters: %s\n",{length(res),res})
found 8 unique characters: 156bgstz


<lang PicoLisp>(de uni (Lst)
   (let R NIL
      (mapc
         '((L) (mapc '((L) (accu 'R L 1)) L))
         (mapcar chop Lst) )
      (mapcar car
         (by car sort
            (filter '((L) (=1 (cdr L))) R) ) ) ) )

(println
   (uni
      (quote "133252abcdeeffd" "a6789798st" "yxcdfgxcyz") ) )</lang>
("1" "5" "6" "b" "g" "s" "t" "z")


One has to wonder where the digits 0 through 9 come in the alphabet... 🤔 For that matter, what alphabet should they be in order of? Most of these entries seem to presuppose ASCII order, but that isn't specified anywhere. What to do with characters outside of ASCII (or Latin-1)? Unicode ordinal order? Or maybe DUCET Unicode collation order? It's all very vague.

<lang perl6>my @list = <133252abcdeeffd a6789798st yxcdfgxcyz>;

for @list, (@list, 'AАΑSäaoö٥🤔👨‍👩‍👧‍👧') {

   say "$_\nSemi-bogus \"Unicode natural sort\" order: ",
   .map( *.comb ).Bag.grep( *.value == 1 )».key.sort( { .unival, .NFKD[0], .fc } ).join,
   "\n        (DUCET) Unicode collation order: ",
   .map( *.comb ).Bag.grep( *.value == 1 )».key.collate.join, "\n";
}</lang>


133252abcdeeffd a6789798st yxcdfgxcyz
Semi-bogus "Unicode natural sort" order: 156bgstz
        (DUCET) Unicode collation order: 156bgstz

133252abcdeeffd a6789798st yxcdfgxcyz AАΑSäaoö٥🤔👨‍👩‍👧‍👧
Semi-bogus "Unicode natural sort" order: 15٥6ASäbgoöstzΑА👨‍👩‍👧‍👧🤔
        (DUCET) Unicode collation order: 👨‍👩‍👧‍👧🤔ä15٥6AbögosStzΑА
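To make the ordering question concrete, the sketch below contrasts two orderings that need nothing beyond the Python standard library: plain code point (ordinal) order, and a simple case-folded order. Both function names are hypothetical, and neither is DUCET collation, which would require a dedicated collation library and is not shown.

```python
def code_point_order(chars):
    """Sort by Unicode code point: digits, then uppercase, then lowercase for ASCII."""
    return sorted(chars)


def case_folded_order(chars):
    """Sort by case-folded value, using the original character to break ties."""
    return sorted(chars, key=lambda ch: (ch.casefold(), ch))


if __name__ == "__main__":
    sample = list("zB1a9Y")
    print(code_point_order(sample))   # prints ['1', '9', 'B', 'Y', 'a', 'z']
    print(case_folded_order(sample))  # prints ['1', '9', 'a', 'B', 'Y', 'z']
```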


This REXX program doesn't assume ASCII (or any other) order. This example was run on an ASCII machine.

If this REXX program is run on an ASCII machine, it will use the ASCII order of characters: in this case, decimal digits, uppercase Latin letters, and then lowercase Latin letters, with other characters interspersed.

On an EBCDIC machine, the order would be lowercase Latin letters, uppercase Latin letters, and then the decimal digits, with other characters interspersed.

On an EBCDIC machine, the lowercase letters and the uppercase letters aren't contiguous.
<lang rexx>/*REXX pgm finds and shows characters that are unique to only one string and once only.*/
parse arg $                                      /*obtain optional arguments from the CL*/
if $='' | $=","  then $= '133252abcdeeffd'  "a6789798st"  'yxcdfgxcyz'    /*use defaults.*/
if $=''  then do;   say "***error*** no lists were specified.";    exit 13;    end
@=                                               /*will be a list of all unique chars.  */

    do j=0  for 256;     x= d2c(j)               /*process all the possible characters. */
                         if x==' '  then iterate /*ignore blanks which are a delimiter. */
    _= pos(x, $);        if _==0    then iterate /*character not found,  then skip it.  */
    _= pos(x, $, _+1);   if _ >0    then iterate /*Character is a duplicate?  Skip it.  */
    @= @ x
    end   /*j*/                                  /*stick a fork in it,  we're all done. */

@@= space(@, 0);     L= length(@@)               /*elided superfluous blanks; get length*/
if @@==''  then @= " (none)"                     /*if none were found, pretty up message*/
if L==0    then L= "no"                          /*do the same thing for the # of chars.*/
say 'unique characters are: '  @                 /*display the unique characters found. */
say
say 'Found '  L  " unique characters."           /*display the # of unique chars found. */</lang>

output   when using the default inputs:
unique characters are:   1 5 6 b g s t z

Found  8  unique characters.
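The REXX program inverts the usual loop: rather than counting the characters of the text, it walks every possible character code and asks whether that character occurs exactly once. A rough Python sketch of the same scan follows; the name `unique_by_scanning` is hypothetical, and the code assumes single-byte characters and a blank-delimited input, as in the REXX version.

```python
def unique_by_scanning(text):
    """For each of the 256 character codes, keep it if it occurs exactly once in text."""
    found = []
    for code in range(256):
        ch = chr(code)
        if ch == " ":
            continue                        # blanks are only delimiters
        first = text.find(ch)
        if first == -1:
            continue                        # character not present at all
        if text.find(ch, first + 1) != -1:
            continue                        # a second occurrence exists
        found.append(ch)
    return found


if __name__ == "__main__":
    print(" ".join(unique_by_scanning("133252abcdeeffd a6789798st yxcdfgxcyz")))
    # prints 1 5 6 b g s t z
```

Because the outer loop runs in character-code order, the result follows the native ordering of the machine's character set without a separate sort.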


<lang ring>
see "working..." + nl
see "Unique characters are:" + nl
row = 0
str = ""
cList = []
uniqueChars = ["133252abcdeeffd", "a6789798st","yxcdfgxcyz"]
for n = 1 to len(uniqueChars)
    str = str + uniqueChars[n]
next
for n = 1 to len(str)
    ind = count(str,str[n])
    if ind = 1
       row = row + 1
       add(cList,str[n])
    ok
next
cList = sort(cList)
for n = 1 to len(cList)
    see "" + cList[n] + " "
next
see nl

see "Found " + row + " unique characters" + nl
see "done..." + nl

# count the occurrences of dString in cString
func count(cString,dString)
     sum = 0
     while substr(cString,dString) > 0
           sum++
           cString = substr(cString,substr(cString,dString)+len(dString))
     end
     return sum
</lang>


Unique characters are:
1 5 6 b g s t z 
Found 8 unique characters


Library: Wren-seq
Library: Wren-sort

<lang ecmascript>import "/seq" for Lst
import "/sort" for Sort

var strings = ["133252abcdeeffd", "a6789798st","yxcdfgxcyz"]
var totalChars = strings.reduce { |acc, str| acc + str }.toList
var uniqueChars = Lst.individuals(totalChars).where { |l| l[1] == 1 }.map { |l| l[0] }.toList
Sort.insertion(uniqueChars)
System.print("Found %(uniqueChars.count) unique character(s), namely:")
System.print(uniqueChars.join(" "))</lang>

Found 8 unique character(s), namely:
1 5 6 b g s t z


<lang XPL0>int List, I, N, C;
char Tbl(128), Str;
string 0;
[List:= ["133252abcdeeffd", "a6789798st","yxcdfgxcyz"];
for I:= 0 to 127 do Tbl(I):= 0;
for N:= 0 to 2 do
        [Str:= List(N);
        I:= 0;
        loop    [C:= Str(I);
                if C = 0 then quit;
                I:= I+1;
                Tbl(C):= Tbl(C)+1;
                ];
        ];
for I:= 0 to 127 do
        if Tbl(I) = 1 then ChOut(0, I);
]</lang>

