Longest string challenge

From Rosetta Code
Revision as of 21:18, 6 September 2011 by rosettacode>Glennj (→‎{{header|Tcl}}: tweak the output for the empty input case)
Longest string challenge is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Note: This task description was recently changed to following discussions on the talk page about intent, the nature of restrictions, and applicability. The new task should not invalidate any solutions and provides additional guidance and flexibility.


Background

This problem and challenge is inspired by one that used to be given as a challenge to students learning Icon. It was intended to be tried in Icon and another language the student was familiar with. The basic problem is quite simple the challenge and fun part came through the introduction of restrictions. Experience has shown that the original restrictions required some adjustment to bring out the intent of the challenge and make it suitable for Rosetta Code.
The original programming challenge and some solutions can be found at Unicon Programming TWiki / Longest Strings Puzzle. (See notes on talk page if you have trouble with the site).

Basic problem statement:

Write a program that reads lines from standard input and, upon end of file, writes the longest line to standard output.
If there are ties for the longest line, the program writes out all the lines that tie.
If there is no input, the program should produce no output.

Original list of restrictions:

1. No comparison operators may be used.
2. No arithmetic operations, such as addition and subtraction, may be used.
3. The only datatypes you may use are integer and string. In particular, you may not use lists.

An additional restriction became apparent in the discussion.

4. Do not re-read the input file. Avoid using files as a replacement for lists.

Intent of Restrictions

Because of the variety of languages on Rosetta and the wide variety of concepts used in them there needs to be a bit of clarification and guidance here to get to the spirit of the challenge and the intent of the restrictions.
The basic problem can be solved very conventionally and that's boring and pedestrian. The original intent here wasn't to unduly frustrate people with interpreting the restrictions, it was to get people to think outside of their particular box and have a bit of fun doing it.
The guiding principle here should be that when using the language of your choice, try to solve this creatively showing off some of your language capabilities. If you need to bend the restrictions a bit, explain why and try to follow the intent. If you think you've implemented a 'cheat' call out the fragment yourself and ask the reader if they can spot why. If you absolutely can't get around one of the restrictions, say why in your description.
Now having said that, the restrictions require some elaboration.
  • In general, the restrictions are meant to avoid the explicit use of these features.
  • "No comparison operators may be used" - At some level there must be some test that allows the solution to get at the length and determine if one string is longer. Comparison operators, in particular any less/greater comparison should be avoided. Representing the length of any string as a number should also be avoided. Various approaches allow for detecting the end of a string. Some of these involve implicitly using equal/not-equal; however, explicitly using equal/not-equal should be acceptable.
  • "No arithmetic operations" - Again, at some level something may have to advance through the string. Often there are ways a language can do this implicitly advance a cursor or pointer without explicitly using a +, - , ++, --, add, subtract, etc.
  • The datatype restrictions are amongst the most difficult to reinterpret. In the language of the original challenge strings are atomic datatypes and structured datatypes like lists are quite distinct and have many different operations that apply to them. This becomes a bit fuzzier with languages with a different programming paradigm. The intent would be to avoid using an easy structure to accumulate the longest strings and spit them out. There will be some natural reinterpretation here.
To make this a bit more concrete, here are a couple of specific examples:
In C, a string is an array of chars, so using a couple of arrays as strings is in the spirit while using a second array in a non-string like fashion would violate the intent.
In APL or J, arrays are the core of the language so ruling them out is unfair. Meeting the spirit will come down to how they are used.
Please keep in mind these are just examples and you may hit new territory finding a solution. There will be other cases like these. Explain your reasoning. You may want to open a discussion on the talk page as well.
  • The added "No rereading" restriction is for practical reasons, re-reading stdin should be broken. I haven't outright banned the use of other files but I've discouraged them as it is basically another form of a list. Somewhere there may be a language that just sings when doing file manipulation and where that makes sense; however, for most there should be a way to accomplish without resorting to an externality.
At the end of the day for the implementer this should be a bit of fun. As an implementer you represent the expertise in your language, the reader may have no knowledge of your language. For the reader it should give them insight into how people think outside the box in other languages. Comments, especially for non-obvious (to the reader) bits will be extremely helpful. While the implementations may be a bit artificial in the context of this task, the general techniques may be useful elsewhere.

Task

Implement a solution to the basic problem that adheres to the spirit of the restrictions.
Describe how you circumvented or got around these 'restrictions' and met the 'spirit' of the challenge. Your supporting description may need to describe any challenges to interpreting the restrictions and how you made this interpretation. You should state any assumptions, warnings, or other relevant points.
This task is likely to encourage multiple different types of solutions. They should be substantially different approaches.

Given the input:

a
bb
ccc
ddd
ee
f
ggg

The output should be (possibly rearranged):

ccc
ddd
ggg

Ada

How to bypass the restrictions:

  • All lines of potential output are stored in an (unbounded) string, named Buffer. On special character (Latin_1.Nul) is used to separate between different lines.
  • We can't directly compare the lengths of two strings. So instead, we assign the difference of the lengths to a variable of type Natural (*). If the result is outside of Natural, this raises an exception. If there is no exception, we assign the result to a variable of type Positive (*), which raises an exception if the result is outside of Positive.

(*) Technically, Natural and Positive are not types but subtypes of Integer: Natural ranges from 0 to Integer'Last, Positive from 1 to Integer'Last.


So this is the first solution.

<lang Ada>with Ada.Strings.Unbounded; use Ada.Strings.Unbounded; with Ada.Text_IO, Ada.Characters.Latin_1;

procedure Longest_String_Challenge is

  function "+"(S: String) return Unbounded_String renames To_Unbounded_String;
  Separator: constant Character := Ada.Characters.Latin_1.NUL;
  procedure Funny_Stuff(B, L: in out Unbounded_String; N: Unbounded_String) is
     Nat: Natural;
  begin
     Nat := Length(N) - Length(L);
     declare
        Pos: Positive;
     begin
        Pos := Nat;
        B   := N;
        L   := N;
     exception
        when Constraint_Error => B := B & Separator & N;
     end;
  exception
     when Constraint_Error => null;
  end Funny_Stuff;
  Buffer: Unbounded_String := +"";
  Longest: Unbounded_String := +"";
  Next: Unbounded_String;

begin

  while True loop
     Next := + Ada.Text_IO.Get_Line;
     Funny_Stuff(Buffer, Longest, Next);
  end loop;

exception

  when others =>
     for I in To_String(Buffer)'Range loop
        if To_String(Buffer)(I) = Separator then
           Ada.Text_IO.New_Line;
        else
           Ada.Text_IO.Put(To_String(Buffer)(I));
        end if;
     end loop;

end Longest_String_Challenge;</lang>

Output, when run with its own source code as the input:

   function "+"(S: String) return Unbounded_String renames To_Unbounded_String;
   procedure Funny_Stuff(B, L: in out Unbounded_String; N: Unbounded_String) is

Here is the second solution. It also makes heavy use of exceptions, but it does not require to compute the difference (which is an arithmetic operation, i.e., a cheat). Instead, the procedure Funny_Stuff carries some auxiliary strings S, T. If they are unequal and neither is empty, it recursively calls itself with the same strings shortened by 1. At some point of time, either S=T, or S is empty, or T is empty.

<lang ada>with Ada.Strings.Unbounded; use Ada.Strings.Unbounded; with Ada.Text_IO, Ada.Characters.Latin_1;

procedure Longest_String_Challenge is

  function "+"(S: String) return Unbounded_String renames To_Unbounded_String;
  function "-"(U: Unbounded_String) return String renames To_String;
  Separator: constant Character := Ada.Characters.Latin_1.NUL;
  procedure Funny_Stuff(B, L: in out Unbounded_String;
                        N: Unbounded_String;
                        S, T: String) is
     C: Character;
  begin
     if S = T then
        B := B & Separator & N;
     else
        C:= T(T'First); -- raises Constraint_Error if T is empty
        begin
           C := S(S'First); -- same if S is empty
           Funny_Stuff(B,L,N,S(S'First+1 .. S'Last), T(T'First+1..T'Last)); --
        exception
           when Constraint_Error =>
              B   := N;
              L   := N;
        end;
     end if;
  exception
     when Constraint_Error => null;
  end Funny_Stuff;
  Buffer: Unbounded_String := +"";
  Longest: Unbounded_String := +"";
  Next: Unbounded_String;

begin

  while True loop
     Next := + Ada.Text_IO.Get_Line;
     Funny_Stuff(Buffer, Longest, Next, -Longest, -Next);
  end loop;

exception

  when others =>
     for I in To_String(Buffer)'Range loop
        if To_String(Buffer)(I) = Separator then
           Ada.Text_IO.New_Line;
        else
           Ada.Text_IO.Put(To_String(Buffer)(I));
        end if;
     end loop;

end Longest_String_Challenge;</lang>

The output, when given its own source code as the input:

   function "+"(S: String) return Unbounded_String renames To_Unbounded_String;
            Funny_Stuff(B,L,N,S(S'First+1 .. S'Last), T(T'First+1..T'Last)); --

C

This is pointless on so many levels. <lang c>#include <stdio.h>

  1. include <string.h>

int cmp(char *p, char *q) { while (*p && *q) p = &p[1], q = &q[1]; return *p; }

int main() { char line[65536]; char buf[1000000] = {0}; char *last = buf; char *next = buf;

while (gets(line)) { strcat(line, "\n"); if (cmp(last, line)) continue; if (cmp(line, last)) next = buf; last = next; strcpy(next, line); while (*next) next = &next[1]; }

printf("%s", buf); return 0; }</lang>Running it:<lang>% printf "a\nbb\nccc\nddd\nee\nf\nggg" | ./a.out ccc ddd ggg</lang> Note that the above code never checked for memory bounds and long input can overrun the buffers. It's intentionally made this way to keep it simple, please don't complicate it by adding safety features: if you are really concerned with that, below is a second method that can handle arbitrary length input.<lang c>#include <stdio.h>

  1. include <stdio.h>
  2. include <stdlib.h>
  3. include <string.h>

int inc(int x) { return (int)&((char *)x)[1]; } int dec(int x) { return (int)&((char *)x)[-1]; } int gt(int x, int y) { while (y && x) y = dec(y), x = dec(x); return x; }

int eq(int x, int y) { return !gt(x, y) && !gt(y, x); }

int add(int x, int y) { while(y) x = inc(x), y = dec(y); return x; }

/* strlen(a) + 1 */ int length(char *a) { char *x = 0; // assuming (int)(char*)0 == 0 if (!a) return 0; while (*a) a = &a[1], x = &x[1]; return (int)x; }

char *str_cat(char *a, char *b) { int len = add(1, add(length(a), length(b))); if (!(a = realloc(a, len))) abort(); return strcat(a, b); }

char *get_line(char *l, FILE *fp) { int c, len = 0; char tmp[2] = {0};

*l = 0; while ((c = fgetc(fp)) != EOF) { *tmp = c; len = inc(len);

l = str_cat(l, tmp); if (eq(*tmp, '\n')) return l; }

*tmp = '\n'; return len ? str_cat(l, tmp) : l; }

int main() { int l1, l2; char *line = malloc(1), *buf = malloc(1), *longest = malloc(1); while (1) { line = get_line(line, stdin);

if (!(l1 = length(line))) break; l2 = length(longest);

if (gt(l1, l2)) { *buf = *longest = 0; longest = str_cat(longest, line); } else if (gt(l2, l1)) continue;

buf = str_cat(buf, line); } printf("%s", buf);

free(buf); free(longest); free(line);

return 0; }</lang>

Here is a more concise variation which exits (with a non-zero return code) if it encounters a buffer overflow:

<lang c>#include <stdio.h>

  1. include <stdlib.h>
  2. include <string.h>

int longer(char *p, char *q) {

       while (*p && *q) p = &p[1], q = &q[1];
       return *p;

}

int main() {

       char line[100000];
       char buf[1100001];
       char *linend= &line[99999];
       char *bufend= &buf[1000000];
       char *last = buf;
       char *next = buf;
       memset(line, 1, 100000);
       memset(buf, 1, 1100001);
       buf[0]= buf[1100000]= 0;
       while (fgets(line, 100000, stdin)) {
               if (!*linend) exit(1);
               if (longer(last, line)) continue;
               if (!longer(bufend, line)) exit(1);
               if (longer(line, last)) next = buf;
               last = next;
               strcpy(next, line);
               while (*next) next = &next[1];
       }
       printf("%s", buf);
       exit(0);

}</lang>

Icon and Unicon

String Scanning / Pattern Matching Solution

<lang Icon>procedure main(arglist)

   local b  # the current buffer (string)
   local l  # the last string 
   local L  # a \n delimited accumulation of all the longest strings
   
   while b := read() do {
       /l := b      # primes l on first pass
       b ? ( move(*l), if move(1) then L := (l := b) || "\n" else if move(0) then L := (\L|"") || b || "\n") 
           #       move(*l) - fails if b is not l characters long
           #       move(1)  - succeeds/fails if the string is longer and triggers a reset of L
       }
   
   write(\L)

end</lang>

Sample Output:

ccc
ddd
ggg

Recursive Solution

Here is a recursive solution using only single character substring-ing (no string scanning/pattern matching). <lang Icon>procedure main()

   longest(".")   # needs a single character seed to throw away

end

procedure longest(Longest)

   Line := read() | return Longest        # read until we can return the longest strings
   if Line[*Longest] then Longest := Line # prime/reset Longest
   Longest := longest(Longest)            # use stack to hold multiples
   if Line[*Longest] then write(Line)     # write only longest strings, 
                                          # Line must be at least as long as Longest
   return Longest                         # feed back longest for length

end</lang>

Sample Output:

ggg
ddd
ccc

J

J programming really cannot be separated from lists, and I have made few attempts to do so here. However, all of the lists used are of a fixed length and could theoretically be simulated by explicit functions or are strings.

<lang j> isempty =. (0 [ 0&{) :: 1: NB. 0=#

  compare =. ($:&}.)`((0 1,:_1 0) {~ <@,&isempty)@.(+.&isempty) NB. *@-&#
  add =. ,`(,:@[)`] @. (compare {:)
  > add&.>/ (}: , ,:&.>@{:) ;: 'a bb ccc ddd ee f ggg'

ccc ddd ggg</lang>

OCaml

Without the restrictions, the standard way to solve the task would be to iterate the lines, compare their length, and accumulate the longest lines in a list. Here we bypass the restriction of not using comparison operators (in the function cmp) by taking advantage that characters of strings are indexed from 0 to [length - 1] and trying to access to a character which index is the length of the other string (if it is beyond the end of the string an exception is raised). The other restriction of not using lists is easily bypassed by concatenating the results instead of accumulating it in a list.

<lang ocaml>let input_line_opt ic =

 try Some (input_line ic)
 with End_of_file -> None

let cmp s1 s2 =

 try ignore(s1.[String.length s2]); 1     (* s1 is longer *)
 with _ ->
   try ignore(s2.[String.length s1]); -1  (* s2 is longer *)
   with _ -> 0                            (* both same length *)

let () =

 let ic = open_in Sys.argv.(1) in
 let rec loop longest acc =
   match input_line_opt ic with
   | Some line ->
     ( match cmp line longest with
       | 1 -> loop line (line ^ "\n")
       | 0 -> loop line (acc ^ line ^ "\n")
       | _ -> loop longest acc )
   | None ->
       close_in ic;
       print_string acc
 in
 loop "" ""</lang>

Perl

<lang perl>#!/usr/bin/perl -n END{ print $all }

substr($_, length($l)) and $all = $l = $_ or substr($l, length) or $all .= $_;</lang>

Perl 6

<lang perl6>my $l = ; # Sample longest string seen. my $a = ; # Accumulator to save longest strings.

while $*IN.get -> $s {

  my $n = "$s\n";
  if $n.substr($l.chars) {     # Is new string longer?
      $a = $l = $n;            # Reset accumulator.
  }
  elsif !$l.substr($n.chars) { # Same length?
     $a ~= $n;                 # Accumulate it.
  }

} print $a;</lang>

Given the example input, returns:

ccc
ddd
ggg

PicoLisp

Not sure if this meets the spirit. I would implement it the same way if there were no "restrictions": <lang PicoLisp>(mapc prinl

  (maxi '((L) (length (car L)))
     (by length group
        (in NIL
           (make (until (eof) (link (line)))) ) ) ) )</lang>

Another solution avoids 'group', and builds an associative buffer of lines instead: <lang PicoLisp>(let Buf NIL

  (in NIL
     (until (eof)
        (let (Line (line)  Len (length Line))
           (if (assoc Len Buf)
              (conc @ (cons Line))
              (push 'Buf (cons Len (cons Line))) ) ) ) )
  (mapc prinl (cdr (maxi car Buf))) )</lang>

Python

<lang python>import fileinput

  1. return len(a) - len(b) if positive, 0 otherwise

def longer(a, b):

   while len(a) and len(b):
       a, b = a[1:], b[1:]
   return len(a)

longest, lines = , for x in fileinput.input():

   if longer(x, longest):
       lines, longest = x, x
   elif not longer(longest, x):
       lines += x

print(lines, end=)</lang>

Sample runs
paddy@paddy-VirtualBox:~$ cat <<! | python3.2 longlines.py 
a
bb
ccc
ddd
ee
f
ggg
!
ccc
ddd
ggg
paddy@paddy-VirtualBox:~$ touch nothing.txt
paddy@paddy-VirtualBox:~$ cat nothing.txt  | python3.2 longlines.py 
paddy@paddy-VirtualBox:~$ 

Tcl

Uses only string comparisons for equality and glob-style matching

<lang tcl>#!/usr/bin/env tclsh

set longest z set output "" while {[gets stdin line] != -1} {

   set comparison [string repeat z [string length $line]]
   if {$longest eq $comparison} {
       # this line is equally long
       append output $line \n
   } elseif {[string match ${longest}z* $comparison]} {
       # this line is longer
       set longest $comparison
       set output "$line\n"
   }

} puts -nonewline $output</lang>

Test:

$ ./longest.tcl <<END
> a
> bb
> ccc
> ddd
> ee
> f
> ggg
> END
ccc
ddd
ggg
$ ./longest.tcl </dev/null
$