Natural sorting: Difference between revisions
Content deleted Content added
m placed the categories in the right order for the task. |
m →{{header|Wren}}: Minor tidy |
||
(6 intermediate revisions by 4 users not shown) | |||
Line 84:
</pre><br><br>
=={{header|
{{trans|Nim}}
<syntaxhighlight lang="11l">T.enum Kind
STRING
NUMBER
T KeyItem
Kind kind
String s
Int num
F natOrderKey(=s)
// Remove leading and trailing white spaces.
s = s.trim((‘ ’, "\t", "\r", "\n"))
// Make all whitespace characters equivalent and remove adjacent spaces.
s = s.replace(re:‘\s+’, ‘ ’)
// Switch to lower case.
s = s.lowercase()
// Remove leading "the ".
I s.starts_with(‘the ’) & s.len > 4
s = s[4..]
// Split into fields.
[KeyItem] result
V idx = 0
L idx < s.len
V e = idx
L e < s.len & !s[e].is_digit()
e++
I e > idx
V ki = KeyItem()
ki.kind = Kind.STRING
ki.s = s[idx .< e]
result.append(ki)
idx = e
L e < s.len & s[e].is_digit()
e++
I e > idx
V ki = KeyItem()
ki.kind = Kind.NUMBER
ki.num = Int(s[idx .< e])
result.append(ki)
idx = e
R result
F scmp(s1, s2)
I s1 < s2 {R -1}
I s1 > s2 {R 1}
R 0
F naturalCmp(String sa, String sb)
V a = natOrderKey(sa)
V b = natOrderKey(sb)
L(i) 0 .< min(a.len, b.len)
V ai = a[i]
V bi = b[i]
I ai.kind == bi.kind
V result = I ai.kind == STRING {scmp(ai.s, bi.s)} E ai.num - bi.num
I result != 0
R result
E
R I ai.kind == STRING {1} E -1
R I a.len < b.len {-1} E (I a.len == b.len {0} E 1)
F test(title, list)
print(title)
print(sorted(list, key' cmp_to_key(naturalCmp)).map(s -> ‘'’s‘'’).join("\n"))
print()
test(‘Ignoring leading spaces.’,
[‘ignore leading spaces: 2-2’,
‘ ignore leading spaces: 2-1’,
‘ ignore leading spaces: 2+0’,
‘ ignore leading spaces: 2+1’])
test(‘Ignoring multiple adjacent spaces (MAS).’,
[‘ignore MAS spaces: 2-2’,
‘ignore MAS spaces: 2-1’,
‘ignore MAS spaces: 2+0’,
‘ignore MAS spaces: 2+1’])
test(‘Equivalent whitespace characters.’,
[‘Equiv. spaces: 3-3’,
"Equiv. \rspaces: 3-2",
"Equiv. \x0cspaces: 3-1",
"Equiv. \x0bspaces: 3+0",
"Equiv. \nspaces: 3+1",
"Equiv. \tspaces: 3+2"])
test(‘Case Independent sort.’,
[‘cASE INDEPENDENT: 3-2’,
‘caSE INDEPENDENT: 3-1’,
‘casE INDEPENDENT: 3+0’,
‘case INDEPENDENT: 3+1’])
test(‘Numeric fields as numerics.’,
[‘foo100bar99baz0.txt’,
‘foo100bar10baz0.txt’,
‘foo1000bar99baz10.txt’,
‘foo1000bar99baz9.txt’])
test(‘Title sorts.’,
[‘The Wind in the Willows’,
‘The 40th step more’,
‘The 39 steps’,
‘Wanda’])</syntaxhighlight>
{{out}}
<pre>
Ignoring leading spaces.
' ignore leading spaces: 2+0'
' ignore leading spaces: 2+1'
' ignore leading spaces: 2-1'
'ignore leading spaces: 2-2'
Ignoring multiple adjacent spaces (MAS).
'ignore MAS spaces: 2+0'
'ignore MAS spaces: 2+1'
'ignore MAS spaces: 2-1'
'ignore MAS spaces: 2-2'
Equivalent whitespace characters.
'Equiv. �spaces: 3+0'
'Equiv.
spaces: 3+1'
'Equiv. spaces: 3+2'
'Equiv. �spaces: 3-1'
'Equiv.
spaces: 3-2'
'Equiv. spaces: 3-3'
Case Independent sort.
'casE INDEPENDENT: 3+0'
'case INDEPENDENT: 3+1'
'caSE INDEPENDENT: 3-1'
'cASE INDEPENDENT: 3-2'
Numeric fields as numerics.
'foo100bar10baz0.txt'
'foo100bar99baz0.txt'
'foo1000bar99baz9.txt'
'foo1000bar99baz10.txt'
Title sorts.
'The 39 steps'
'The 40th step more'
'Wanda'
'The Wind in the Willows'
</pre>
=={{header|AppleScript}}==
AppleScript doesn't have a built-in sort facility, but its string comparisons are normalised and several attributes can be specifically either considered or ignored, making it fairly simple for a sort coded in the language to produce the results required here.
<
use framework "Foundation"
use sorter : script ¬
"Custom Iterative Ternary Merge Sort" -- <www.macscripter.net/t/timsort-and-nigsort/71383/3>
on naturalSort(listOfText)
--
--
-- run of white space to a single character, zap any leading/trailing space, move
-- any article word at the beginning to the end, and replace any "ſ" or "ʒ" with "s".
script o
property
end script
set regex to current application's NSRegularExpressionSearch
set substitutions to {{"\\s++", space}, {"^ | $", ""}, ¬
{"^(?i)(The|An?) (.++)$", "$2 $1"}, {"[\\u0292\\u017f]", "s"}}
repeat with i from 1 to (count o's doctored)
set mutableString to (current application's class "NSMutableString"'s ¬
stringWithString:(o's doctored's item i))
repeat with thisSub in substitutions
tell mutableString to replaceOccurrencesOfString:(searchStr) ¬
withString:(replacement) options:(regex) range:({0, its |length|()})
end repeat
set o's doctored's item i to mutableString as text
end repeat
-- Sort the doctored strings with the relevant AppleScript comparison attributes
-- explicitly either set or not, echoing the moves in the original list.
considering numeric strings, white space and hyphens but ignoring diacriticals, punctuation and case
tell sorter to sort(o's doctored, 1, -1, {slave:{listOfText}})
end considering
Line 131 ⟶ 281:
end naturalSort
on join(lst, delim)
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to delim
set txt to lst as text
set AppleScript's text item delimiters to astid
return txt
end join
on tests()
set output to {"(* Leading, trailing, and multiple white spaces ignored *)"}
set output's end to ¬
naturalSort({" ignore superfluous spaces: 1-3", "ignore superfluous spaces: 1-1", ¬
" ignore superfluous spaces: 1-2", " ignore superfluous spaces: 1-4", ¬
"ignore superfluous spaces: 1-7", "ignore superfluous spaces: 1-5 ", ¬
"ignore superfluous spaces:
set output's end to linefeed & "(* All white space characters treated as equivalent *)"
set output's end to naturalSort({"Equiv. spaces: 2-6", "Equiv." & return & "spaces: 2-5", ¬
"Equiv." & (character id 12) & "spaces: 2-4", ¬
"Equiv." & (character id 11) & "spaces: 2-3", ¬
"Equiv." & linefeed & "spaces: 2-2", "Equiv." & tab & "spaces: 2-1"})
set output's end to linefeed & ¬
"(* Case ignored. (The sort order would actually be the same with case considered,
since case only decides the issue when the strings are otherwise identical.) *)"
set output's end to naturalSort({"cASE INDEPENDENT: 3-1", "caSE INDEPENDENT: 3-2", ¬
"CASE independent: 3-3", "casE INDEPENDENT: 3-4", "case INDEPENDENT: 3-5"})
set output's end to linefeed & "(* Numerics considered by number value *)"
set output's end to naturalSort({"foo1000bar99baz10.txt", "foo100bar99baz0.txt", ¬
"foo100bar10baz0.txt", "foo1000bar99baz9.txt"})
set output's end to linefeed & "(* Title sort *)"
set output's end to ¬
naturalSort({"The Wind in the Willows", "The 40th Step More", ¬
"A Matter of Life and Death", "The 39 steps", ¬
"An Inspector Calls", "Wanda"})
set output's end to linefeed & "(* Diacriticals (and case) ignored *)"
set output's end to naturalSort({"Equiv. " & (character id 253) & " accents: 6-1", ¬
"Equiv. " & (character id 221) & " accents: 6-3", ¬
"Equiv. y accents: 6-4", "Equiv. Y accents: 6-2"})
set output's end to linefeed & "(* Ligatures *)"
set output's end to naturalSort({(character id 306) & " ligatured", ¬
"of", "ij no ligature", (character id 339), "od"})
set output's end to linefeed & ¬
"(* Custom \"s\" equivalents and Esszet (NB. Esszet normalises to \"ss\") *)"
set output's end to naturalSort({"Start with an " & (character id 658) & ": 8-1", ¬
"Start with an " & (character id 383) & ": 8-2", ¬
"Start with an " & (character id 223) & ": 8-3", ¬
"Start with an s: 8-4", "Start with an ss: 8-5"})
return join(output, linefeed)
end tests
tests()</syntaxhighlight>
{{output}}
<syntaxhighlight lang="applescript">"(* Leading, trailing, and multiple white spaces ignored *)
ignore superfluous spaces: 1-1
ignore superfluous spaces: 1-2
ignore superfluous spaces: 1-3
ignore superfluous spaces: 1-4
ignore superfluous spaces: 1-5
ignore superfluous spaces: 1-6
ignore superfluous spaces: 1-7
ignore superfluous spaces: 1-8
(* All white space characters treated as equivalent *)
Equiv. spaces: 2-1
Equiv.
spaces: 2-2
Equiv.�spaces: 2-3
Equiv.�spaces: 2-4
Equiv.
spaces: 2-5
Equiv. spaces: 2-6
(* Case ignored. (The sort order would actually be the same with case considered,
since case only decides the issue when strings are otherwise identical.) *)
cASE INDEPENDENT: 3-1
caSE INDEPENDENT: 3-2
CASE independent: 3-3
casE INDEPENDENT: 3-4
case INDEPENDENT: 3-5
(* Numerics considered by number value *)
foo100bar10baz0.txt
foo100bar99baz0.txt
foo1000bar99baz9.txt
foo1000bar99baz10.txt
(* Title sort *)
The 39 steps
The 40th Step More
An Inspector Calls
A Matter of Life and Death
Wanda
The Wind in the Willows
(* Diacriticals (and case) ignored *)
Equiv. ý accents: 6-1
Equiv. Y accents: 6-2
Equiv. Ý accents: 6-3
Equiv. y accents: 6-4
(* Ligatures *)
IJ ligatured
ij no ligature
od
œ
of
(* Custom \"s\" equivalents and Esszet (NB. Esszet normalises to \"ss\") *)
Start with an ʒ: 8-1
Start with an ſ: 8-2
Start with an s: 8-4
Start with an ß: 8-3
Start with an ss: 8-5"</syntaxhighlight>
=={{header|ATS}}==
<syntaxhighlight lang="ats">
(*------------------------------------------------------------------*)
(* Natural sorting. *)
(*------------------------------------------------------------------*)
(* I deal only with ASCII here and solve only the first four
problems. For Unicode, I would most likely use GNU libunistring (or
Gnulib) and UTF-32. Handling Unicode properly is complicated.
There are other matters that make "natural sorting" a tricky
thing. For instance, which "accented letters" are actually
"accented"--rather than distinct letters in their own
right--depends on the language and the purpose. In Esperanto, for
example, 'ĉ' is a distinct letter that goes just after the letter
'c'. *)
#include "share/atspre_staload.hats"
staload UN = "prelude/SATS/unsafe.sats"
#define NIL list_nil ()
#define :: list_cons
(*------------------------------------------------------------------*)
(* Types and interfaces. *)
typedef char_skipper =
{n : int}
{i : nat | i <= n}
(string n,
size_t n,
size_t i) -<cloref>
[j : int | i <= j; j <= n]
size_t j
typedef char_skipper_backwards =
{n : int}
{i : nat | i <= n}
(string n,
size_t n,
size_t i) -<cloref>
[j : int | 0 <= j; j <= i]
size_t j
typedef char_translator =
{n : int}
string n -<cloref,!wrt> string n
extern fn
make_char_skipper :
(char -<cloref> bool) -<> char_skipper
extern fn
make_char_skipper_backwards :
(char -<cloref> bool) -<> char_skipper_backwards
extern fn
make_char_translator :
(char -<cloref> bool,
char -<cloref> [c : int | 1 <= c; c <= 127] char c) -<>
char_translator
extern fn
remove_leading_spaces :
{n : int}
string n -< !wrt >
[m : nat | m <= n]
string m
extern fn
remove_trailing_spaces :
{n : int}
string n -< !wrt >
[m : nat | m <= n]
string m
extern fn
combine_adjacent_spaces :
{n : int}
string n -< !wrt >
[m : nat | m <= n]
string m
extern fn
compare_strings_containing_numbers :
{m, n : int}
(string m, string n) -<> int
extern fn
evaluate_natural :
{m : int}
{i, n : nat | i + n <= m}
(string m, size_t i, size_t n) -<> ullint
extern fn
compare_strings_naturally :
{m, n : int}
(string m, string n) -< !wrt > int
extern fn
list_sort_strings_naturally :
{n : int}
list (String, n) -< !wrt > list (String, n)
(*------------------------------------------------------------------*)
(* Global closures. *)
val skip_spaces =
make_char_skipper (lam c => c = ' ')
val skip_spaces_backwards =
make_char_skipper_backwards (lam c => c = ' ')
val skip_digits =
make_char_skipper (lam c => isdigit c)
val translate_whitespace_to_spaces =
make_char_translator (lam c => isspace c, lam c => ' ')
(* A little unit test. *)
val- "1 2 3 " = translate_whitespace_to_spaces "1\t2\v3\n"
val string_tolower =
make_char_translator
(lam c => isupper c,
lam c =>
let
typedef ascii_lowercase =
[c : int | 'a' <= c; c <= 'z'] char c
val c = tolower c
in
$UN.cast{ascii_lowercase} c
end)
(* A little unit test. *)
val- "abcdef" = string_tolower "ABCdef"
(*------------------------------------------------------------------*)
(* Implementations. *)
implement
make_char_skipper match =
let
fun
skipper {n : int}
{i : nat | i <= n}
.<n - i>.
(s : string n,
n : size_t n,
i : size_t i)
:<cloref> [j : int | i <= j; j <= n]
size_t j =
if i = n then
i
else if ~match s[i] then
i
else
skipper (s, n, succ i)
in
skipper
end
implement
make_char_skipper_backwards match =
let
fun
skipper {n : int}
{i : nat | i <= n}
.<i>.
(s : string n,
n : size_t n,
i : size_t i)
:<cloref> [j : int | 0 <= j; j <= i]
size_t j =
if i = i2sz 0 then
i
else if ~match s[pred i] then
i
else
skipper (s, n, pred i)
in
skipper
end
implement
make_char_translator (match, replace) =
let
fn
translator {n : int}
(s : string n)
:<cloref,!wrt> string n =
let
prval () = lemma_string_param s
val n = strlen s
fun
loop {i : nat | i <= n}
.<n - i>.
(t : strnptr n, (* strnptr so it is mutable. *)
i : size_t i)
:<!wrt> strnptr n =
if i = n then
t
else
let
val c = t[i]
in
if match c then
t[i] := replace c;
loop (t, succ i)
end
in
strnptr2string (loop (string1_copy s, i2sz 0))
end
in
translator
end
implement
remove_leading_spaces s =
let
prval () = lemma_string_param s
val n = strlen s
val j = skip_spaces (s, n, i2sz 0)
in
strnptr2string (string_make_substring (s, j, n - j))
end
(* A little unit test. *)
val- "1 " = remove_leading_spaces " 1 "
implement
remove_trailing_spaces s =
let
prval () = lemma_string_param s
val n = strlen s
val j = skip_spaces_backwards (s, n, n)
in
strnptr2string (string_make_substring (s, i2sz 0, j))
end
(* A little unit test. *)
val- " 1" = remove_trailing_spaces " 1 "
fn
_combine_adjacent_spaces
{n : int}
(s : string n)
:<!refwrt> [m : nat | m <= n]
string m =
let
prval () = lemma_string_param s
val n = strlen s
in
if n = i2sz 0 then
s
else
let
val buf = arrayref_make_elt<char> (succ n, '\0')
val () = buf[0] := s[0]
fun
loop {i : pos | i <= n}
{j : pos | j <= i}
.<n - i>.
(i : size_t i,
j : size_t j)
:<!refwrt> [k : pos | k <= n]
size_t k =
if i = n then
j
else
begin
if (s[i] = ' ') * (s[pred i] = ' ') then
loop (succ i, j)
else
begin
buf[j] := s[i];
loop (succ i, succ j)
end
end
val [k : int] k = loop (i2sz 1, i2sz 1)
val s1 = $UN.cast{string k} (ptrcast buf)
in
strnptr2string (string_make_substring (s1, i2sz 0, k))
end
end
implement
combine_adjacent_spaces s =
$effmask_ref _combine_adjacent_spaces s
(* A little unit test. *)
val- " 1 2 3 4 " = combine_adjacent_spaces " 1 2 3 4 "
implement
compare_strings_containing_numbers {m, n} (sm, sn) =
let
prval () = lemma_string_param sm
prval () = lemma_string_param sn
val m = strlen sm
and n = strlen sn
fun
compare_them
{i, j : nat | i <= m; j <= n}
.<m - i, n - j>.
(i : size_t i,
j : size_t j)
:<> int =
if i = m then
(if j = n then 0 else ~1)
else if j = n then
1
else if (isdigit sm[i]) * (isdigit sn[j]) then
let
val i1 = skip_digits (sm, m, succ i)
and j1 = skip_digits (sn, n, succ j)
val vm = evaluate_natural (sm, i, i1 - i)
and vn = evaluate_natural (sn, j, j1 - j)
in
if vm = vn then
compare_them (i1, j1)
else if vm < vn then
~1
else
1
end
else
let
val cmp = compare (sm[i], sn[j])
in
if cmp = 0 then
compare_them (succ i, succ j)
else
cmp
end
in
compare_them (i2sz 0, i2sz 0)
end
implement
evaluate_natural {m} {i, n} (s, i, n) =
let
fun
loop {k : int | i <= k; k <= i + n}
.<(i + n) - k>.
(k : size_t k,
accum : ullint)
:<> ullint =
if k = i + n then
accum
else
let
val digit = (char2int0 s[k] - char2int0 '0')
val accum = (10ULL * accum) + g0i2u digit
in
loop (succ k, accum)
end
in
loop (i, 0ULL)
end
(* A little unit test. *)
val- 1234ULL = evaluate_natural ("xy1234z", i2sz 2, i2sz 4)
implement
compare_strings_naturally (sm, sn) =
let
prval () = lemma_string_param sm
prval () = lemma_string_param sn
fn
adjust_string (s : String0)
:<!wrt> String0 =
let
val s = translate_whitespace_to_spaces s
val s = string_tolower s
val s = remove_leading_spaces s
val s = remove_trailing_spaces s
val s = combine_adjacent_spaces s
in
s
end
in
compare_strings_containing_numbers (adjust_string sm,
adjust_string sn)
end
implement
list_sort_strings_naturally lst =
let
implement
list_mergesort$cmp<String> (x, y) =
$effmask_wrt compare_strings_naturally (x, y)
in
list_vt2t (list_mergesort<String> lst)
end
(*------------------------------------------------------------------*)
(* Now see if we pass the required tests. *)
implement
gequal_val_val<String> (x, y) =
g0ofg1 x = g0ofg1 y
fn
nicefy_string (s : string)
: string =
let
val s = g1ofg0 s
prval () = lemma_string_param s
val n = strlen s
prval [n : int] EQINT () = eqint_make_guint n
var t : string = ""
var i : [i : nat | i <= n] size_t i
in
for (i := i2sz 0; i <> n; i := succ i)
let
val c = char2int0 s[i]
in
if c < 32 then
let
val numstr = strptr2string (g0int2string c)
val u =
strptr2string (string0_append4 (t, "[", numstr, "]"))
in
t := u
end
else
let
val chrstr = strnptr2string (string_sing s[i])
val u = strptr2string (string0_append (t, chrstr))
in
t := u
end
end;
t
end
fn
make_list_printable {n : int}
(lst : list (String, n))
: list (string, n) =
let
prval () = lemma_list_param lst
fun
loop {i : nat | i <= n}
.<n - i>.
(lst : list (String, n - i),
accum : list (string, i))
: list (string, n) =
case+ lst of
| NIL => list_vt2t (list_reverse accum)
| head :: tail =>
let
val s = strptr2string (string0_append3 ("|", head, "|"))
in
loop (tail, nicefy_string s :: accum)
end
in
loop (lst, NIL)
end
fn
test_ignoring_leading_spaces () : void =
let
val givenlst = $list ("ignore leading spaces: 2-2",
" ignore leading spaces: 2-1",
" ignore leading spaces: 2+0",
" ignore leading spaces: 2+1")
val expected = $list (" ignore leading spaces: 2+0",
" ignore leading spaces: 2+1",
" ignore leading spaces: 2-1",
"ignore leading spaces: 2-2")
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Ignoring leading spaces:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
fn
test_ignoring_trailing_spaces () : void =
(* I added this test, myself. *)
let
val givenlst = $list ("ignore trailing spaces: 2b2",
"ignore trailing spaces: 2b1 ",
"ignore trailing spaces: 2b+0 ",
"ignore trailing spaces: 2b+1 ")
val expected = $list ("ignore trailing spaces: 2b+0 ",
"ignore trailing spaces: 2b+1 ",
"ignore trailing spaces: 2b1 ",
"ignore trailing spaces: 2b2")
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Ignoring trailing spaces:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
fn
test_combining_adjacent_spaces () : void =
let
val givenlst = $list ("ignore m.a.s spaces: 2-2",
"ignore m.a.s spaces: 2-1",
"ignore m.a.s spaces: 2+0",
"ignore m.a.s spaces: 2+1")
val expected = $list ("ignore m.a.s spaces: 2+0",
"ignore m.a.s spaces: 2+1",
"ignore m.a.s spaces: 2-1",
"ignore m.a.s spaces: 2-2")
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Combining adjacent spaces:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
fn
test_whitespace_equivalence () : void =
let
val givenlst = $list ("Equiv. spaces: 3-3",
"Equiv.\rspaces: 3-2",
"Equiv.\x0cspaces: 3-1",
"Equiv.\x0bspaces: 3+0",
"Equiv.\nspaces: 3+1",
"Equiv.\tspaces: 3+2")
val expected = $list ("Equiv.\x0bspaces: 3+0",
"Equiv.\nspaces: 3+1",
"Equiv.\tspaces: 3+2",
"Equiv.\x0cspaces: 3-1",
"Equiv.\rspaces: 3-2",
"Equiv. spaces: 3-3")
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("All whitespace characters equivalent:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
fn
test_case_independence () : void =
let
val givenlst = $list ("cASE INDEPENENT: 3-2",
"caSE INDEPENENT: 3-1",
"casE INDEPENENT: 3+0",
"case INDEPENENT: 3+1")
val expected = $list ("casE INDEPENENT: 3+0",
"case INDEPENENT: 3+1",
"caSE INDEPENENT: 3-1",
"cASE INDEPENENT: 3-2")
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Case independence:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
fn
test_numeric_fields () : void =
let
val givenlst = $list ("foo100bar99baz0.txt",
"foo100bar10baz0.txt",
"foo1000bar99baz10.txt",
"foo1000bar99baz9.txt")
val expected = $list ("foo100bar10baz0.txt",
"foo100bar99baz0.txt",
"foo1000bar99baz9.txt",
"foo1000bar99baz10.txt")
val sortedlst = list_sort_strings_naturally givenlst
in
assertloc (sortedlst = expected);
println! ("Numeric fields:");
println! (" given: ", make_list_printable givenlst);
println! (" result: ", make_list_printable sortedlst)
end
implement
main0 () =
begin
test_ignoring_leading_spaces ();
test_ignoring_trailing_spaces ();
test_combining_adjacent_spaces ();
test_whitespace_equivalence ();
test_case_independence ();
test_numeric_fields ();
println! ("success")
end
(*------------------------------------------------------------------*)
</syntaxhighlight>
{{out}}
<pre>$ patscc -DATS_MEMALLOC_GCBDW natural_sorting.dats -lgc && ./a.out
Ignoring leading spaces:
given: |ignore leading spaces: 2-2|, | ignore leading spaces: 2-1|, | ignore leading spaces: 2+0|, | ignore leading spaces: 2+1|
result: | ignore leading spaces: 2+0|, | ignore leading spaces: 2+1|, | ignore leading spaces: 2-1|, |ignore leading spaces: 2-2|
Ignoring trailing spaces:
given: |ignore trailing spaces: 2b2|, |ignore trailing spaces: 2b1 |, |ignore trailing spaces: 2b+0 |, |ignore trailing spaces: 2b+1 |
result: |ignore trailing spaces: 2b+0 |, |ignore trailing spaces: 2b+1 |, |ignore trailing spaces: 2b1 |, |ignore trailing spaces: 2b2|
Combining adjacent spaces:
given: |ignore m.a.s spaces: 2-2|, |ignore m.a.s spaces: 2-1|, |ignore m.a.s spaces: 2+0|, |ignore m.a.s spaces: 2+1|
result: |ignore m.a.s spaces: 2+0|, |ignore m.a.s spaces: 2+1|, |ignore m.a.s spaces: 2-1|, |ignore m.a.s spaces: 2-2|
All whitespace characters equivalent:
given: |Equiv. spaces: 3-3|, |Equiv.[13]spaces: 3-2|, |Equiv.[12]spaces: 3-1|, |Equiv.[11]spaces: 3+0|, |Equiv [10]spaces: 3+1|, |Equiv.[9]spaces: 3+2|
result: |Equiv.[11]spaces: 3+0|, |Equiv.[10]spaces: 3+1|, |Equiv.[9]spaces: 3+2|, |Equiv.[12]spaces: 3-1|, |Equiv.[13]spaces: 3-2|, |Equiv. spaces: 3-3|
Case independence:
given: |cASE INDEPENENT: 3-2|, |caSE INDEPENENT: 3-1|, |casE INDEPENENT: 3+0|, |case INDEPENENT: 3+1|
result: |casE INDEPENENT: 3+0|, |case INDEPENENT: 3+1|, |caSE INDEPENENT: 3-1|, |cASE INDEPENENT: 3-2|
Numeric fields:
given: |foo100bar99baz0.txt|, |foo100bar10baz0.txt|, |foo1000bar99baz10.txt|, |foo1000bar99baz9.txt|
result: |foo100bar10baz0.txt|, |foo100bar99baz0.txt|, |foo1000bar99baz9.txt|, |foo1000bar99baz10.txt|
success
</pre>
=={{header|C}}==
Line 183 ⟶ 1,041:
Besides the numeric part, everything else was done in a uniform way by transforming input strings into some normalized format and comparing those instead. All sort options flags can be freely mixed together. C source is written in UTF-8 for easier reading here: don't do this for serious code.
<
#include <stdlib.h>
#include <wchar.h>
Line 439 ⟶ 1,297:
return 0;
}</
0000098 nina
100 NINA
Line 491 ⟶ 1,349:
99 Ninja
100 NINA
100 niño</
=={{header|D}}==
Implements requests 1-5.
<
std.ascii, std.range;
Line 561 ⟶ 1,419:
}
}
</syntaxhighlight>
{{out}}
<pre>Test strings:
Line 686 ⟶ 1,544:
=={{header|Elixir}}==
Implements requests 1-5.
<
def sorting(texts) do
Enum.sort_by(texts, fn text -> compare_value(text) end)
Line 741 ⟶ 1,599:
["The Wind in the Willows", "The 40th step more", "The 39 steps", "Wanda"]}
]
|> Enum.each(fn {title, input} -> Natural.task(title, input) end)</
{{out}}
Line 863 ⟶ 1,721:
Objectives six to eight are attainable, except that the character encodements available are not portable. Code page 850 doesn't offer the same accented character codes as for other systems, such as code page 437. But the auxiliary sort key approach easily accommodates substitute characters (and could also swap + and -, for example!), and could recognise ligatures as well. One might be prodded into escalating to 16-bit or even 32-bit character codes to maintain the same ease of manipulation.
<syntaxhighlight lang="fortran">
MODULE STASHTEXTS !Using COMMON is rather more tedious.
INTEGER MSG,KBD !I/O unit numbers.
Line 1,229 ⟶ 2,087:
END DO !On to the next.
END !A handy hint from Mr. Natural: "At home or at work, get the right tool for the job!"
</syntaxhighlight>
Example output, in two columns:
Entry|Text Character Order Entry|Text 'Natural' Order
Line 1,266 ⟶ 2,124:
When (if) a scan reaches the end of its text, the TAIL will be considered for the extraction of further characters.
<syntaxhighlight lang="fortran">
MODULE ASSISTANCE
INTEGER MSG,KBD !I/O unit numbers.
Line 1,592 ⟶ 2,450:
END DO !On to the next.
END !A handy hint from Mr. Natural: "At home or at work, get the right tool for the job!"
</syntaxhighlight>
This time, because the texts are no longer being parsed into pieces, the book titles are not ordered together though they are in the required sequence disregarding the other entries. "The 39" and "The 40" have the "The " part converted into a TAIL, and so their first comparison characters are their digits, and in ASCII, digits precede letters. This is revealed in the third column, where the comparison characters are revealed, in an ad-hoc manner: what appears are the characters placed by every comparison so there may be contention.
Output:
Line 1,629 ⟶ 2,487:
First four rules, no extra credit:
<
import (
Line 1,762 ⟶ 2,620:
}
return false
}</
{{out}}
<pre>
Line 1,833 ⟶ 2,691:
Implements requests 1-5.
<
import Data.List
import Data.Char
Line 1,924 ⟶ 2,782:
commonWords = ["the","a","an","of"]
</syntaxhighlight>
{{out}}
Line 2,047 ⟶ 2,905:
The natural way of approaching this task in J is to normalize the text based on the rules desired for sorting. Here, we limit ourselves to ascii, for portability, and decide that our domain shall be terminated strings (where each string ends with the same character - typically a newline):
<
lines=: <;.2
Line 2,056 ⟶ 2,914:
norm=: [: split (32 9 12 13 14 15{a.) -.~ [: titleFix tolower
natSor=: lines ;@/: norm&.>@lines</
Example data:
<
ignore leading spaces: 2-2
ignore leading spaces: 2-1
Line 2,104 ⟶ 2,962:
The 39 steps
Wanda
)</
Note that the required example which contains equivalent whitespace characters includes a '\n' in the data. So, for that example, we use a backslash as our terminator.
Line 2,110 ⟶ 2,968:
Example use:
<
ignore leading spaces: 2+0
ignore leading spaces: 2+1
Line 2,144 ⟶ 3,002:
The Wind in the Willows
</syntaxhighlight>
=={{header|JavaScript}}==
Line 2,150 ⟶ 3,008:
Implements the first four rules. Rule 4 works for digits up to 20 characters.
<syntaxhighlight lang="javascript">
var nsort = function(input) {
var e = function(s) {
Line 2,170 ⟶ 3,028:
]));
// -> ['\nfile9.txt', 'file10.txt', 'File11.TXT', 'file12.txt']
</syntaxhighlight>
=={{header|jq}}==
Line 2,187 ⟶ 3,045:
matter therefore comes down to the filter named "splitup", which for
clarity, we define here as a top-level function, as follows:
<
def tidy: if .[0] == "" then .[1:] else . end | if .[length-1] == "" then .[0:length-1] else . end ;
Line 2,244 ⟶ 3,102:
| splitup # embedded integers
;
sort_by(naturally);</
'''Testing'''
For brevity, we use the input as given above, but modified slightly so that it can be read
in as valid JSON. For example, the comments have been quoted. With these adjustments, the test driver can be written as a one-liner:
<
{{out}} (scrollable)
<div style="overflow:scroll; height:400px;">
<
# Ignoring leading spaces
Line 2,301 ⟶ 3,159:
"Wanda",
"The Wind in the Willows"
]</
</div>
=={{header|Julia}}==
The functional programming principle used was to customize the "lt" comparison option of Julia's basic sort() to the "natural" sort features required.
<
natural1(x, y) = strip(x) < strip(y)
Line 2,390 ⟶ 3,248:
println("Testing sorting mod number $i. Sorted is: $(sort(testarrays[i], lt=ltfunction)).")
end
</
Testing sorting mod number 1. Sorted is: [" ignore leading spaces: 2+0", " ignore leading spaces: 2+1", " ignore leading spaces: 2-1", "ignore leading spaces: 2-2"].
Testing sorting mod number 2. Sorted is: ["ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1", "ignore m.a.s spaces: 2-1", "ignore m.a.s spaces: 2-2"].
Line 2,403 ⟶ 3,261:
=={{header|Kotlin}}==
<
val r2 = Regex("""[ ]{2,}""")
Line 2,583 ⟶ 3,441:
s9.sortBy(::selector9)
println(s9.map { "'$it'" }.joinToString("\n"))
}</
{{out}}
Line 2,642 ⟶ 3,500:
To build a key, we use the procedure “unidecode” which convert the UTF-8 string into an ASCII string, simplifying the implementation of the other rules. Unfortunately, the letter 'ʒ' is translated as 'z', so we have to convert this letter before calling “unidecode”.
<
type
Line 2,769 ⟶ 3,627:
"Start with an ſ: 2-1",
"Start with an ß: 2+0",
"Start with an s: 2+1"])</
{{out}}
Line 2,808 ⟶ 3,666:
The "structured" features of Pascal do not facilitate escape from loops, so, ... some <code>goto</code> atavisms appear in what follows...
<syntaxhighlight lang="pascal">
Program Natural; Uses DOS, crt; {Simple selection.}
{Demonstrates a "natural" order of sorting text with nameish parts.}
Line 3,053 ⟶ 3,911:
END.
</syntaxhighlight>
Output, with "!" instead of a backslash to prevent context confusions here:
Text order Natural order
Line 3,087 ⟶ 3,945:
This implements all 8 requirements<sup>*</sup>:
<
use feature 'fc';
use Unicode::Normalize;
Line 3,110 ⟶ 3,968:
} @items;
}
</syntaxhighlight>
: <sup>*)</sup> Note that decomposing the strings to the NFKD normalization form and subsequently stripping off all code points of the <code>Nonspacing_Mark</code> category, removes differences caused by accents / ligatures / alternate character forms / etc. in a standards-compliant way. This coincides with all the examples given in the task description, with the exception that it does ''not'' replace "ʒ" with "s" — one could add <pre style="display:inline;padding:0.3em">$str =~ tr/ʒ/s/;</pre> for that but it seems a bit [[wp:International_Phonetic_Alphabet_chart_for_English_dialects#Chart|whimsical]].)
Line 3,116 ⟶ 3,974:
'''Testing:'''
<
use utf8; # interpret this script's source code as UTF8
use Test::More; # for plan(), is_deeply()
Line 3,147 ⟶ 4,005:
print "\n";
}
</syntaxhighlight>
{{out}}
Line 3,225 ⟶ 4,083:
Needs chcp 65001 (or 28591) to get this to work on Windows, be sure to save as utf8.
<!--<
<span style="color: #000080;font-style:italic;">--
-- demo/rosetta/Natural_sorting2.exw
Line 3,374 ⟶ 4,232:
<span style="color: #7060A8;">puts</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\n"</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<!--</
{{Out}}
Line 3,460 ⟶ 4,318:
=={{header|PicoLisp}}==
This parser takes care of features 1,2,3,4,5 and 8:
<
(clip
(make
Line 3,488 ⟶ 4,346:
"ß" "ss" "ſ" "s" "ʒ" "s" ) )
(unless (member Word '(the it to))
(link Word) ) ) ) ) ) ) ) )</
Test:
<
-> ("abc" 123 "defss" " " "ghi")</
Sorting is trivial then:
<
(by parseNatural sort Lst) )</
Test:
<
"# Ignoring leading spaces"
("ignore leading spaces: 2-2" " ignore leading spaces: 2-1" " ignore leading spaces: 2+0" " ignore leading spaces: 2+1")
Line 3,537 ⟶ 4,395:
(pythonOut "Normally sorted :" (sort (copy X)))
(pythonOut "Naturally sorted:" (naturalSort X))
(prinl) ) )</
Output:
<pre># Ignoring leading spaces
Line 3,688 ⟶ 4,546:
=={{header|PowerShell}}==
<
# six sorting
$Discard = '^a ', '^an ', '^the '
Line 3,751 ⟶ 4,609:
)
}
</syntaxhighlight>
{{Out}}
<pre>
Line 3,874 ⟶ 4,732:
=={{header|Python}}==
All eight features:
<
# Not Python 3.x (Can't compare str and int)
Line 4,028 ⟶ 4,886:
print 'Text strings:'; pp(txt)
print 'Normally sorted :'; print '\n'.join(sorted(txt))
print 'Naturally sorted:'; print '\n'.join(ns(txt))</
===Sample Python output===
Line 4,178 ⟶ 5,036:
beginning/end, easy to implement, but sounds wrong).
<
#lang racket
(define (natural-sort l)
Line 4,200 ⟶ 5,058:
(shuffle '("foo9.txt" "foo10.txt" "x9y99" "x9y100" "x10y0" "x z" "x y")))
;; => '("foo9.txt" "foo10.txt" "x9y99" "x9y100" "x10y0" "x y" "x z")
</syntaxhighlight>
=={{header|Raku}}==
Line 4,218 ⟶ 5,076:
different results depending on the order they are applied.
<syntaxhighlight lang="raku"
sub naturally ($a) { $a.lc.subst(/(\d+)/, ->$/ {0~$0.chars.chr~$0},:g) ~"\x0"~$a }
Line 4,320 ⟶ 5,178:
say "\n" ~ '*' x 40 ~ "\n";
}</
Sample output:
Line 4,511 ⟶ 5,369:
=={{header|Ruby}}==
Requirements 1,2,3 and 5 are met in one line of code:
<
Almost all of the code below is handling requirement 4. The problem is that Ruby will happily sort ["a",1] against ["a",2] or even ["b"], but it does not know how to handle [1, "a"] against ["a", 2] and raises an ArgumentError. The code below does not define a new sort method, it defines a new class which is sortable by the existing method (falling back on string comparison).
<
include Comparable
attr_reader :scrubbed, :ints_and_strings, :i_s_pattern
Line 4,537 ⟶ 5,395:
end
</syntaxhighlight>
Demo:
<
{"Ignoring leading spaces" =>
[ "ignore leading spaces: 2-2 ", " ignore leading spaces: 2-1 ", " ignore leading spaces: 2+0 ", " ignore leading spaces: 2+1 "],
Line 4,557 ⟶ 5,415:
puts [title,"--input--", ar, "--normal sort--", ar.sort, "--natural sort--", nat_sorts.sort, "\n"]
end
</syntaxhighlight>
{{out}}
<pre>
Line 4,680 ⟶ 5,538:
=={{header|Scala}}==
All 8:
<
implicit object ArrayOrdering extends Ordering[Array[String]] { // 4
val INT = "([0-9]+)".r
Line 4,732 ⟶ 5,590:
okay
})
}</
Output:
<pre>PASS: 1 IGNORING LEADING SPACES
Line 4,786 ⟶ 5,644:
Tasks 1-5 are completed.
<
(import (scheme base)
(scheme char)
Line 4,878 ⟶ 5,736:
("foo1bar99baz4.txt" "foo2bar99baz3.txt" "foo4bar99baz1.txt" "foo3bar99baz2.txt")
("The Wind in the Willows" "The 40th step more" "The 39 steps" "Wanda")))
</syntaxhighlight>
{{out}}
Line 5,065 ⟶ 5,923:
=={{header|Sidef}}==
{{trans|Raku}}
<
# Sort groups of digits in number order. Sort by order of magnitude then lexically.
-> naturally { self.lc.gsub(/(\d+)/, {|s1| "0" + s1.len.chr + s1 }) + "\x0" + self };
Line 5,090 ⟶ 5,948:
self.gsub(re, {|s1| tr{s1} });
}
}</
Tests:
<
[
"Task 1a\nSort while ignoring leading spaces.",
Line 5,166 ⟶ 6,024:
say "\n#{'*' * 40}\n";
}</
{{out}}
Line 5,359 ⟶ 6,217:
Note also that Tcl supports case-insensitive sorting and “treat digit sequences as numbers” as native sorting options. (The latter is particularly useful for handling filenames.)
<
proc sortWithCollationKey {keyBuilder list} {
Line 5,417 ⟶ 6,275:
foo100bar10baz0.txt
foo1000bar99baz10.txt
foo1000bar99baz9.txt}</
Output:
<pre>
Line 5,518 ⟶ 6,376:
{{libheader|Wren-fmt}}
{{libheader|Wren-sort}}
<
import "./str" for Str
import "./fmt" for Fmt
import "./sort" for Cmp, Sort
var p2 = Pattern.new("+2 ")
Line 5,702 ⟶ 6,560:
System.print(ss[i].map { |s| "'%(s)'" }.join("\n"))
if (i < 8) System.print()
}</
{{out}}
Line 5,761 ⟶ 6,619:
First, a tag-mangled-fields sort, use tags to pull from the original list in sorted order function:
<
x.enumerate().sort(fcn([(_,a)],[(_,b)]){a<b})
.apply('wrap([(n,_)]){orig[n]});
}</
Now, mangle a copy of the original list to "normalize" it to the task requirements and sort.
<
ts1:=T("ignore leading spaces: 2-2", " ignore leading spaces: 2-1",
" ignore leading spaces: 2+0", " ignore leading spaces: 2+1");
dsuSort(ts1.apply("strip"),ts1).println();</
{{out}}
<pre>
Line 5,778 ⟶ 6,636:
</pre>
<
ts2:=T("ignore m.a.s spaces: 2-2", "ignore m.a.s spaces: 2-1",
"ignore m.a.s spaces: 2+0", "ignore m.a.s spaces: 2+1");
dsuSort(ts2.apply('-(" ")),ts2).println();</
{{out}}
<pre>
Line 5,790 ⟶ 6,648:
</pre>
<
ts3:=T("Equiv. spaces: 3-3", "Equiv.\rspaces: 3-2", "Equiv.\x0cspaces: 3-1",
"Equiv.\x0bspaces: 3+0", "Equiv.\nspaces: 3+1", "Equiv.\tspaces: 3+2");
dsuSort(ts3.apply('-.fp1("\n\r\t\f\b\x0b ")),ts3).println();</
{{out}}
<pre>
Line 5,804 ⟶ 6,662:
</pre>
<
ts4:=T("cASE INDEPENENT: 3-2", "caSE INDEPENENT: 3-1",
"casE INDEPENENT: 3+0", "case INDEPENENT: 3+1");
dsuSort(ts4.apply("toLower"),ts4).println();</
{{out}}
<pre>
Line 5,817 ⟶ 6,675:
And the nasty one: multiple numeric fields. Each key is blown apart into a list of ints, where each int is consecutive numeric characters. To compare a pair of fields (ie two lists of ints), walk the two in parallel, stopping when a difference is found. Ugly but it works. Another tactic would be to normalize each field to the same length, concat the fields and use that as the mangled key. But that might have issues if the number of fields differ, the former doesn't care.
<
fcn fieldize(s){
s.apply(fcn(c){"1234567890".holds(c) and c or "."}).split(".")
Line 5,831 ⟶ 6,689:
list.apply(fieldize).enumerate().sort(fcmp)
.apply('wrap([(n,_)]){list[n]});
}</
<
"foo1000bar99baz9.txt");
fsort(ts5).println();
x:=T("x9y99","foo10.txt","x10y0","foo9.txt","x9y100");
fsort(x).println();</
{{out}}
<pre>
|