Sorting algorithms/Tree sort on a linked list

From Rosetta Code
Sorting algorithms/Tree sort on a linked list is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

This page uses content from Wikipedia. The current wikipedia article is at article. The original RosettaCode article was extracted from the wikipedia article № 295989333 of 15:13, 12 June 2009 . The list of authors can be seen in the page history. As with Rosetta Code, the pre 5 June 2009 text of Wikipedia is available under the GNU FDL. (See links for details on variance)

A tree sort is a sort algorithm that builds a binary search tree from the keys to be sorted, and then traverses the tree (in-order) so that the keys come out in sorted order. Its typical use is when sorting the elements of a stream from a file. Several other sorts would have to load the elements to a temporary data structure, whereas in a tree sort the act of loading the input into a data structure is sorting it.

The tree sort is considered by some to be the faster method to sort a linked list, followed by Quicksort and Mergesort:

Sediment sort, bubble sort, selection sort perform very badly.

Test case: Create a linked list of all the sentences of all the episodes of Finnegans Wake by James Joyce. Load them into a local memory based linked list, then tree sort them inplace and print the number of node swaps and seconds of BogoMips are necessary to perform the sort. (The book is to be stored locally (whole or in individual episodes) in a test text file on disk before loading by the specimin code into a linked list for sorting.)

J

What *is* a sentence in Finnegan's wake? Let's say that it's all the text leading up to a period, question mark or exclamation point if (and only if) the character is followed by a space. (There are some practical difficulties here - this means, for example, that the first sentence of a chapter includes the chapter heading - but it's good enough for now.)

There's also the issue of how do we want to sort the sentences? Let's say we'll sort them in ascii order without normalization of the text (since that is simplest).

Let's also say that we have prepared a file which contains some sort of ascii rendition of the text. Note that the final result we get here will depend on exactly how that ascii rendition was prepared. But let's just ignore that issue so we can get something working.

Next, we need to think of what kind of tree, there are a great number of kinds of trees, and they can be categorized in many different ways. For example, a directory tree is almost never a balanced binary tree. (Note that a linked list is a kind of a tree - an extremely tall and skinny unbalanced tree, but a tree nonetheless. Then again, note that efficiency claims in general are specious, because efficient for one purpose tends to be inefficient for many other purposes.) Since we are going for efficiency here, we will implement a short, fat tree (let's call that "efficient use of the programmer's time" or something like that...). Specifically, we'll be implementing a one level deep tree which happens to have 14961 leaves connected directly to the root node.

Simplicity is a virtue, right?

Finally, there's the matter of counting swaps. Let's define our swap count as the minimal number of swaps which would be needed to produce our sorted result.

With these choices, the task becomes:

<lang J> finn=: fread '~user/temp/wake/finneganswake.txt'

  sentences=: (<;.2~ '. '&E.@rplc&(LF,' !.?.')) finn
  #sentences

14961

     +/<:#@>C./:sentences

14945</lang>

We have to swap almost every sentence, but 16 of them can be sorted "for free" with the swaps of the other sentences.

For that matter, inspecting the lengths of the cycles formed by the minimal arrangements of swaps...

<lang J> /:~ #@>C./:sentences 1 1 2 2 4 9 12 25 32 154 177 570 846 935 1314 10877</lang>

... we can see that two of the sentences were fine right where they start out. Let's see what they are:

<lang J>  ;:inv (#~(= /:~))sentences

Very, all
  fourlike tellt.  What tyronte power!</lang>

So now you know.

(Processing time here is negligible - other than the time needed to fetch a copy of the book and render it as plain text ascii - but if we were careful to implement the efficiency recommendations of this task more in the spirit of whatever the task is presumably implying, we could probably increase the processing time by several orders of magnitude.)

Racket

Translation of: Scheme

-- this implementation illustrates differences in identifiers and syntaxes of Scheme and Racket's match-lambda family. racket/match documented here.

<lang racket>#lang racket/base (require racket/match)

(define insert

 ;; (insert key tree)
 (match-lambda**
  [(x '())         `(() ,x ())]
  [(x '(() () ())) `(() ,x ())]
  [(x `(,l ,k ,r)) #:when (<= x k) `(,(insert x l) ,k ,r)]
  [(x `(,l ,k ,r)) `(,l ,k ,(insert x r))]
  [(_ _) "incorrect arguments or broken tree"]))

(define in-order

 ;; (in-order tree)
 (match-lambda
   [`(() ,x ()) `(,x)]
   [`(,l ,x ())  (append (in-order l) `(,x))]
   [`(() ,x ,r)  (append `(,x) (in-order r))]
   [`(,l ,x ,r)  (append (in-order l) `(,x) (in-order r))]
   [_ "incorrect arguments or broken tree"]))

(define (tree-sort lst)

 (define tree-sort-itr
   (match-lambda**
     [(x `())        (in-order x)]
     [(x `(,a . ,b)) (tree-sort-itr (insert a x) b)] 
     [(_ _) "incorrect arguments or broken tree"]))
 (tree-sort-itr '(() () ()) lst))

(tree-sort '(5 3 7 9 1))</lang>

Output:
'(1 3 5 7 9)

Scheme

The following implements a sorting algorithm that takes a linked list, puts each key into an unbalanced binary tree and returns an in-order traversal of the tree.

Library: Matchable
Works with: Chicken Scheme

<lang Scheme>(use matchable)

(define insert

 ;; (insert key tree)
 (match-lambda*
  [(x ())         `(() ,x ()) ]
  [(x (() () ())) `(() ,x ()) ]
  [(x (l k r))
   (=> continue)
   (if (<= x k)

`(,(insert x l) ,k ,r) (continue)) ]

  [(x (l k r)) `(,l ,k ,(insert x r)) ]
  [_ "incorrect arguments or broken tree" ]))

(define in-order

 ;; (in-order tree)
 (match-lambda
  [(() x ()) `(,x)]
  [(l x ())  (append (in-order l) `(,x))]
  [(() x r)  (append `(,x) (in-order r))]
  [(l x r)   (append (in-order l) `(,x) (in-order r))]
  [_ "incorrect arguments or broken tree" ]))

(define (tree-sort lst)

 (define tree-sort-itr
   (match-lambda*
    [(x ())      (in-order x)]
    [(x (a . b)) (tree-sort-itr (insert a x) b)] 
    [_ "incorrect arguments or broken tree" ]))
 (tree-sort-itr '(() () ()) lst))</lang>

Usage: <lang Scheme> #;2> (tree-sort '(5 3 7 9 1)) (1 3 5 7 9)</lang>