Remove duplicate elements: Difference between revisions

From Rosetta Code
Revision as of 17:36, 13 November 2007

Task
Remove duplicate elements
You are encouraged to solve this task according to the task description, using any language you may know.

Given an array, a data structure containing a sequence of elements, derive a sequence of elements in which all duplicates are removed.

Ada

Compiler: GNAT-gpl 2007

with Ada.Containers.Ordered_Sets;
with Ada.Text_IO; use Ada.Text_IO;

procedure Unique_Set is
   package Int_Sets is new Ada.Containers.Ordered_Sets(Integer);
   use Int_Sets;
   Nums : array (Natural range <>) of Integer := (1,2,3,4,5,5,6,7,1);
   Unique : Set;
   Set_Cur : Cursor;
   Success : Boolean;
begin
   for I in Nums'range loop
      Unique.Insert(Nums(I), Set_Cur, Success);
   end loop;
   Set_Cur := Unique.First;
   loop
      Put_Line(Item => Integer'Image(Element(Set_Cur)));
      exit when Set_Cur = Unique.Last;
      Set_Cur := Next(Set_Cur);
   end loop;
end Unique_Set;

AppleScript

set array to {1, 2, 3, "a", "b", "c", 2, 3, 4, "b", "c", "d"}
set unique to {}
repeat with i in array
    -- very important -- list index starts at 1 not 0
    if (i is not in unique) then
        set unique to unique & i
    end if
end repeat

Objective-C

NSArray *items = [NSArray arrayWithObjects:@"A", @"B", @"C", @"B", @"A", nil];

NSSet *unique = [NSSet setWithArray:items];


C++

#include <set>
#include <iostream>
using namespace std;

int main( int argc, char* argv[] ) {
    typedef set<int> TyHash;
    int data[] = {1, 2, 3, 2, 3, 4};

    TyHash hash(data, data + 6);

    cout << "Set items:" << endl;
    for (TyHash::iterator iter = hash.begin(); iter != hash.end(); iter++)
          cout << *iter << " ";
    cout << endl;
}

Alternative method working directly on the array:

#include <iostream>
#include <iterator>
#include <algorithm>
#include <cstddef>

// helper template: yields a pointer one past the last element of a built-in array
template<typename T, std::size_t size> T* end(T (&array)[size]) { return array+size; }

int main()
{
  int data[] = { 1, 2, 3, 2, 3, 4 };
  std::sort(data, end(data));
  int* new_end = std::unique(data, end(data));
  std::copy(data, new_end, std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;
}

C#

Compiler: MSVS 2005

 List<int> nums = new List<int>( new int[] { 1, 1, 2, 3, 4, 4 } );
 List<int> unique = new List<int>();
 foreach( int i in nums )
   if( !unique.Contains( i ) )
     unique.Add( i );

Common Lisp

To remove duplicates non-destructively:

(remove-duplicates '(1 3 2 9 1 2 3 8 8 1 0 2))
> (9 3 8 1 0 2)

Or, to remove duplicates in-place:

(delete-duplicates '(1 3 2 9 1 2 3 8 8 1 0 2))
> (9 3 8 1 0 2)


D

void main() {
    int[] data = [1, 2, 3, 2, 3, 4];
    int[int] hash;
    foreach(el; data)
        hash[el] = 0;
    int[] unique = hash.keys; // the unique elements
}

E

[1,2,3,2,3,4].asSet().getElements()

Erlang

List = [1, 2, 3, 2, 2, 4, 5, 5, 4, 6, 6, 5].
Set = sets:from_list(List).

Forth

Forth has no built-in hashtable facility, so the easiest way to achieve this goal is to take the "uniq" program as an example.

The word uniq, if given a sorted array of cells, will remove the duplicate entries and return the new length of the array. For simplicity, uniq has been written to process cells (which are to Forth what "int" is to C), but could easily be modified to handle a variety of data types through deferred procedures, etc.

The input data is assumed to be sorted.
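The compaction step the Forth word performs can be sketched in Python (a hypothetical helper for illustration, not part of the Forth entry): on sorted input, an element is kept only when it differs from its predecessor.

```python
def uniq_adjacent(cells):
    """Remove adjacent duplicates from a sorted list, mirroring the
    Forth 'uniq' word: keep an element only when it differs from the
    element just written before it."""
    out = []
    for x in cells:
        if not out or out[-1] != x:
            out.append(x)
    return out

print(uniq_adjacent([1, 2, 2, 3, 3, 3, 6]))  # [1, 2, 3, 6]
```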

\ Increments a2 until it no longer points to the same value as a1
\ a3 is the address beyond the data a2 is traversing.
: skip-dups ( a1 a2 a3 -- a1 a2+n )
    dup rot ?do
      over @ i @ <> if drop i leave then
    cell +loop ;

\ Compress an array of cells by removing adjacent duplicates
\ Returns the new count
: uniq ( a n -- n2 )
   over >r             \ Original addr to return stack
   cells over + >r     \ "to" addr now on return stack, available as r@
   dup begin           ( write read )
      dup r@ <
   while
      2dup @ swap !    \ copy one cell
      cell+ r@ skip-dups
      cell 0 d+        \ increment write ptr only
   repeat  r> 2drop  r> - cell / ;

Here is another implementation of "uniq" that uses the words of a popular parameters-and-locals extension. It is structurally the same as the above implementation, but uses less overt stack manipulation.

: uniqv { a n \ r e -- n }
    a n cells+ to e
    a dup to r
    \ the write address lives on the stack
    begin
      r e <
    while
      r @ over !
      r cell+ e skip-dups to r
      cell+
    repeat
    a - cell / ;

To test this code, you can execute:

create test 1 , 2 , 3 , 2 , 6 , 4 , 5 , 3 , 6 ,
here test - cell / constant ntest
: .test ( n -- ) 0 ?do test i cells + ? loop ; 

test ntest 2dup cell-sort uniq .test

output

1 2 3 4 5 6 ok

Haskell

import List

values = [1,2,3,2,3,4]
unique = List.nub values

IDL

 result = uniq( array[sort( array )] )

Java

 //Using Java 1.5/5.0
 Object[] data = new Object[] {1, 2, 3, "a", "b", "c", 2, 3, 4, "b", "c", "d"};
 Set<Object> uniqueSet = new HashSet<Object>(Arrays.asList(data));
 Object[] unique = uniqueSet.toArray();

MAXScript

uniques = #(1, 2, 3, "a", "b", "c", 2, 3, 4, "b", "c", "d")
for i in uniques.count to 1 by -1 do
(
    id = findItem uniques uniques[i]
    if (id != i) do deleteItem uniques i
)

Perl

Interpreter: Perl

use List::MoreUtils qw(uniq);
my @uniq = uniq qw(1 2 3 a b c 2 3 4 b c d);

Without modules:

my %seen;
my @uniq = grep {!$seen{$_}++} qw(1 2 3 a b c 2 3 4 b c d);

PHP

 $list = array(1, 2, 3, 'a', 'b', 'c', 2, 3, 4, 'b', 'c', 'd');
 $unique_list = array_unique($list);

Pop11

;;; Initial array
lvars ar = {1 2 3 2 3 4};
;;; Create a hash table
lvars ht= newmapping([], 50, 0, true);
;;; Put all array as keys into the hash table
lvars i;
for i from 1 to length(ar) do
   1 -> ht(ar(i))
endfor;
;;; Collect keys into a list
lvars ls = [];
appdata(ht, procedure(x); cons(front(x), ls) -> ls; endprocedure);

Prolog

 uniq(Data,Uniques) :- sort(Data,Uniques).

Example usage:

 ?- uniq([1, 2, 3, 2, 3, 4],Xs).
 Xs = [1, 2, 3, 4]

Python

 items = [1, 2, 3, 'a', 'b', 'c', 2, 3, 4, 'b', 'c', 'd']
 unique = list(set(items))

See also http://www.peterbe.com/plog/uniqifiers-benchmark and http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
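Note that `list(set(items))` does not preserve the original element order. An order-preserving variant, in the spirit of the recipes linked above, can be sketched as:

```python
items = [1, 2, 3, 'a', 'b', 'c', 2, 3, 4, 'b', 'c', 'd']

seen = set()
unique_ordered = []
for x in items:
    # keep only the first occurrence of each element
    if x not in seen:
        seen.add(x)
        unique_ordered.append(x)

print(unique_ordered)  # [1, 2, 3, 'a', 'b', 'c', 4, 'd']
```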

Raven

[ 1 2 3 'a' 'b' 'c' 2 3 4 'b' 'c' 'd' ] as items
items copy unique print
list (8 items)
 0 => 1
 1 => 2
 2 => 3
 3 => "a"
 4 => "b"
 5 => "c"
 6 => 4
 7 => "d"

Ruby

 ary = [1,1,2,1,'redundant',[1,2,3],[1,2,3],'redundant']
 uniq_ary = ary.uniq
 # => [1, 2, "redundant", [1, 2, 3]]

Scala

 val list = List(1,2,3,4,2,3,4,99)
 val l2 = list.removeDuplicates
 // l2: scala.List[scala.Int] = List(1,2,3,4,99)

Tcl

The concept of an "array" in Tcl is strictly associative, and since there cannot be duplicate keys, there cannot be a redundant element in an array. What is called an "array" in many other languages is better represented in Tcl by the "list" (as in Lisp).

 set result [lsort -unique $listname]