User defined pipe and redirection operators

This page uses content from Wikipedia. The original article was at Pipeline_(Unix). The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance)

In Unix-like computer operating systems (and, to some extent, Microsoft Windows), a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one. Each connection is implemented by an anonymous pipe. Filter programs are often used in this configuration.
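To make "each connection is implemented by an anonymous pipe" concrete, here is a minimal Python sketch, offered only as an illustration. It starts two processes and connects the first one's stdout to the second one's stdin through the pipe that `subprocess.PIPE` creates. Both processes are small hypothetical Python programs (the `PRODUCER` and `CONSUMER` strings), so the sketch does not depend on any particular Unix utility being installed.

```python
import subprocess
import sys

# Hypothetical producer: writes three records to its stdout.
PRODUCER = "print('alpha'); print('beta'); print('gamma')"
# Hypothetical filter: copies stdin to stdout, upper-casing each record as it arrives.
CONSUMER = "import sys; sys.stdout.writelines(line.upper() for line in sys.stdin)"


def run_pipeline() -> str:
    """Roughly the shell pipeline `producer | consumer`."""
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,  # the anonymous pipe joins the two processes
        stdout=subprocess.PIPE,
        text=True,
    )
    # Drop the parent's copy of the read end, so that the producer would get
    # SIGPIPE if the consumer exited early, as in a shell pipeline.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline(), end="")
```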

The concept was invented by Douglas McIlroy for Unix shells and it was named by analogy to a physical pipeline.

A Unix pipeline can be thought of as a left-associative infix operation whose operands are programs with parameters. Programmatically, all programs in a pipeline run at the same time (in parallel), but, looking at the syntax, one can think of them as running one after another (note that the parallelism may actually be emulated; the Wikipedia article describes how pipelines are implemented). It is a form of function composition, reminiscent of functional programming, where data is passed from one function to the next (as its input or output).
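Since Python's `|` is also a left-associative infix operator, the composition view can be sketched by overloading it. In this sketch, which is an illustration under its own assumptions, `Filter`, `grep` and `upper` are hypothetical names: `a | f | g` is evaluated as `(a | f) | g`, which is `g(f(a))`. `grep` here is a plain substring match, not a regular expression.

```python
from typing import Callable, Iterable, Iterator

Stream = Iterable[str]


class Filter:
    """Wraps a stream-to-stream function so it can appear on the right of `|`."""

    def __init__(self, fn: Callable[[Stream], Iterator[str]]) -> None:
        self.fn = fn

    def __ror__(self, upstream: Stream) -> Iterator[str]:
        # Called for `upstream | self` when the left operand has no `|` of its own.
        return self.fn(upstream)


def grep(pattern: str) -> Filter:
    """Keep records containing pattern (plain substring match)."""
    return Filter(lambda lines: (line for line in lines if pattern in line))


def upper() -> Filter:
    """Upper-case every record."""
    return Filter(lambda lines: (line.upper() for line in lines))


if __name__ == "__main__":
    records = ["algol 60", "python", "algol 68"]
    print(list(records | grep("algol") | upper()))
```

Because every stage is a generator, nothing runs until the final `list(...)` starts pulling records through the chain.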

Task: If the language supports operator definition, then:

Create "user defined" equivalents of the Unix shell "<", "|", ">", "<<", ">>" and $(cmd) operators.
Provide simple equivalents of cat, tee, grep, uniq, wc, head and tail, but as filters/procedures native to the specific language.
Replicate the sample shell script below in the specific language.
Specifically, do not cache the entire stream before the subsequent filter/procedure starts. Pass each record on, as soon as it is available, through each of the filters/procedures in the chain.
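One way to meet the "do not cache" requirement in Python is with generators: each stage pulls a record from its upstream only when its downstream asks for one. In the sketch below, the `trace` list is a hypothetical addition that records when each stage touches each record. It shows that the first matching record reaches the end of the chain before the next one is even read.

```python
from typing import Iterable, Iterator, List


def source(lines: Iterable[str], trace: List[str]) -> Iterator[str]:
    """Hands out one record at a time, logging each read."""
    for line in lines:
        trace.append(f"read {line}")
        yield line


def grep(pattern: str, lines: Iterable[str], trace: List[str]) -> Iterator[str]:
    """Passes each matching record on immediately; nothing is buffered."""
    for line in lines:
        if pattern in line:
            trace.append(f"grep {line}")
            yield line


def run(lines: Iterable[str], pattern: str) -> List[str]:
    """Drive `source | grep pattern | print` and return the event log."""
    trace: List[str] = []
    for line in grep(pattern, source(lines, trace), trace):
        trace.append(f"print {line}")
    return trace


if __name__ == "__main__":
    for event in run(["ALGOL 60", "Python", "ALGOL 68"], "ALGOL"):
        print(event)
```

Some filters cannot avoid buffering whatever the implementation: sort must see its entire input before emitting its first record, and tail must reach the end of its input.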

Alternatively, if the language does not support operator definition, then replace the operators with procedures:

Define the procedures: input(cmd,stream), pipe(stream,cmd), output(stream, stream), whereis(array), append(stream)
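A minimal Python sketch of the procedure-based alternative follows. The argument conventions are an assumption, since the task only gives the signatures: "stream" is taken to be a file path when reading or writing a file, `append` takes an extra path argument, and `whereis` is omitted because its intended behaviour is not specified. `input_` has a trailing underscore only to avoid shadowing Python's built-in `input`. `grep_algol` and `demo` are hypothetical helpers.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator

Cmd = Callable[[Iterable[str]], Iterator[str]]


def input_(cmd: Cmd, path: str) -> Iterator[str]:
    """Like `cmd < path`: feed the file to cmd one line at a time."""
    with open(path) as f:
        yield from cmd(f)


def pipe(stream: Iterable[str], cmd: Cmd) -> Iterator[str]:
    """Like `stream | cmd`."""
    return cmd(stream)


def output(stream: Iterable[str], path: str) -> None:
    """Like `stream > path`: truncate the file, then write records as they arrive."""
    with open(path, "w") as f:
        for line in stream:
            f.write(line)


def append(stream: Iterable[str], path: str) -> None:
    """Like `stream >> path`: add records to the end of the file."""
    with open(path, "a") as f:
        for line in stream:
            f.write(line)


def grep_algol(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical command: keep lines mentioning ALGOL."""
    return (line for line in lines if "ALGOL" in line)


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.lst")
        dst = os.path.join(tmp, "out.lst")
        with open(src, "w") as f:
            f.write("Peter Naur - BNF, ALGOL 60\nGuido van Rossum - Python\n")
        output(input_(grep_algol, src), dst)  # grep ALGOL < in.lst > out.lst
        append(pipe(["Kees Koster - ALGOL 68\n"], grep_algol), dst)  # ... | grep ALGOL >> out.lst
        with open(dst) as f:
            return f.read()
```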

For bonus kudos: implement the shell "&" concept as a dyadic operator in the specific language, e.g.: <lang sh>( head x & tail x & wait ) | grep test</lang>
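The task leaves the exact meaning of a dyadic "&" open. The Python sketch below assumes that `a & b` means "run both producers concurrently, pass on their records as they arrive, and finish only when both are done", which folds the trailing `wait` into the operator. `Job` and `demo` are hypothetical names. Because the threads interleave nondeterministically, the demo sorts its result.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

Producer = Callable[[], Iterable[str]]
_DONE = object()  # sentinel: one background producer has finished


class Job:
    """A deferred producer of records; `a & b` runs a and b concurrently."""

    def __init__(self, *producers: Producer) -> None:
        self.producers = producers

    def __and__(self, other: "Job") -> "Job":
        return Job(*self.producers, *other.producers)

    def __iter__(self) -> Iterator[str]:
        records: "queue.Queue[object]" = queue.Queue()

        def worker(producer: Producer) -> None:
            try:
                for record in producer():
                    records.put(record)  # pass each record on immediately
            finally:
                records.put(_DONE)  # always signal completion, even on error

        threads = [threading.Thread(target=worker, args=(p,)) for p in self.producers]
        for thread in threads:
            thread.start()
        running = len(threads)
        while running:  # plays the role of the shell's `wait`
            item = records.get()
            if item is _DONE:
                running -= 1
            else:
                yield item
        for thread in threads:
            thread.join()


def demo() -> List[str]:
    data = ["x test 1", "y", "z test 2", "w"]
    # Roughly ( head x & tail x & wait ), using 2-line head and tail.
    job = Job(lambda: data[:2]) & Job(lambda: data[-2:])
    return sorted(record for record in job if "test" in record)  # ... | grep test
```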

Sample shell script: ¢ draft - pending a better (more interesting) suggestion ¢
<lang sh>aa="$(
  (
    head -4 < List_of_computer_scientists.lst;
    cat List_of_computer_scientists.lst | grep ALGOL | tee ALGOL_pioneers.lst;
    tail -4 List_of_computer_scientists.lst
  ) | sort | uniq | tee "the_important_scientists.lst" | grep aa
); echo "Pioneer: $aa"</lang>
Input File:

List_of_computer_scientists.lst - cut from Wikipedia.

Output:

Pioneer: Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
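For comparison, the sample script can be sketched with Python generators. This is not a full solution of the task. It rests on several assumptions: the input is an in-memory subset of the list used in the ALGOL 68 solution below, `tee` appends to Python lists standing in for the two .lst files, `grep` is a literal substring match, and joining the result into a string plays the role of `$(...)`.

```python
from collections import deque
from itertools import islice
from typing import Iterable, Iterator, List

SCIENTISTS = [
    "Wil van der Aalst - business process management, process mining, Petri nets",
    "Hal Abelson - intersection of computing and teaching",
    "Serge Abiteboul - database theory",
    "Samson Abramsky - game semantics",
    "Stephen R. Bourne - Bourne shell, portable ALGOL 68C compiler",
    "Peter Naur - BNF, ALGOL 60",
    "Guido van Rossum - Python (programming language)",
    "Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL",
    "Lotfi Zadeh - fuzzy logic",
    "Arif Zaman - Pseudo-random number generator",
    "Albert Zomaya - Australian pioneer of scheduling in parallel and distributed systems",
    "Konrad Zuse - German pioneer of hardware and software",
]


def cat(*streams: Iterable[str]) -> Iterator[str]:
    for stream in streams:
        yield from stream


def head(n: int, lines: Iterable[str]) -> Iterator[str]:
    yield from islice(lines, n)


def tail(n: int, lines: Iterable[str]) -> Iterator[str]:
    # Keeps only the last n records, but must still reach the end of its input.
    yield from deque(lines, maxlen=n)


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        if pattern in line:
            yield line


def tee(sink: List[str], lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        sink.append(line)  # the list stands in for a file
        yield line


def sort(lines: Iterable[str]) -> Iterator[str]:
    yield from sorted(lines)  # sort has to collect all of its input first


def uniq(lines: Iterable[str]) -> Iterator[str]:
    previous = None
    for line in lines:
        if line != previous:  # drop adjacent duplicates only, like Unix uniq
            yield line
        previous = line


def sample_script() -> str:
    algol_pioneers: List[str] = []        # stands in for ALGOL_pioneers.lst
    important_scientists: List[str] = []  # stands in for the_important_scientists.lst
    aa = grep("aa", tee(important_scientists, uniq(sort(cat(
        head(4, SCIENTISTS),
        tee(algol_pioneers, grep("ALGOL", cat(SCIENTISTS))),
        tail(4, SCIENTISTS),
    )))))
    return "Pioneer: " + "\n".join(aa)


if __name__ == "__main__":
    print(sample_script())
```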

ALGOL 68

Works with: ALGOL 68 version Revision 1; one minor extension - PRAGMA READ; one major extension - Algol68G's Currying.

Works with: ALGOL 68G version tested with release 1.18.0-9h.tiny.

File: Iterator_pipe_operators.a68
<lang algol68>MODE
  PAGEIN =         PAGE,
  PAGEAPPEND = REF PAGE,
  PAGEOUT =    REF PAGE;

MODE
  MOID = VOID,
  YIELDLINE = PROC(LINE)VOID,
  GENLINE = PROC(YIELDLINE)VOID,
  FILTER = PROC(GENLINE)GENLINE, # the classic shell filter #
  MANYTOONE = PROC([]GENLINE)GENLINE; # eg cat, as in con[cat]enate #

PRIO =: = 5, << = 5, >> = 5;

OP <  = (FILTER filter, PAGEIN page)GENLINE: filter(READ page),
   <  = (MANYTOONE cmd, PAGEIN page)GENLINE: cmd(READ page),
   << = (FILTER filter, PAGEIN page)GENLINE: filter(READ page),
   >  = (GENLINE gen, PAGEOUT page)VOID: gen(WRITE page),
   >> = (GENLINE gen, PAGEAPPEND page)VOID: gen(APPEND page),
   =: = (GENLINE gen, FILTER filter)GENLINE: filter(gen),
   =: = (GENLINE gen, MANYTOONE cmd)GENLINE: cmd(gen);</lang>
File: Iterator_pipe_utilities.a68

<lang algol68>#!/usr/local/bin/a68g --script #

PROC cat yield line = ([]GENLINE argv, YIELDLINE yield)VOID:
  FOR gen line FROM LWB argv TO UPB argv DO
    argv[gen line](yield)
  OD;

PROC cat = ([]GENLINE argv)GENLINE:
  cat yield line(argv, );

PROC tee yield line = (GENLINE gen, []YIELDLINE args yield, YIELDLINE yield)VOID: (
  # FOR LINE line IN # gen(#) DO #
  ##   (LINE line)VOID: (
         yield(line);
         FOR outn FROM LWB args yield TO UPB args yield DO
            args yield[outn](line)
         OD
  # OD #))
);

PROC tee filter = (GENLINE gen, []YIELDLINE args yield)GENLINE:
  tee yield line(gen, args yield, );

PROC tee = ([]YIELDLINE args filter)FILTER:
  tee filter(, args filter);

PROC grep yield line = (STRING pattern, []GENLINE argv, YIELDLINE yield)VOID:
  # FOR LINE line IN # cat(argv)(#) DO #
  ##   (LINE line)VOID:
         IF string in string(pattern, NIL, line) THEN yield(line) FI
  # OD #);

PROC grep = (STRING pattern, []GENLINE argv)GENLINE:
  grep yield line(pattern, cat(argv), );

PROC uniq yield line = (GENLINE arg, YIELDLINE yield)VOID:(
  UNION(VOID, LINE)prev := EMPTY;
  # FOR LINE this IN # arg(#) DO #
  ##   (LINE this)VOID:
         CASE prev IN
           (LINE case prev): IF NOT(case prev=this) THEN prev := this; yield(this) FI,
           (VOID): (prev := this; yield(this))
         ESAC
  # OD #)
);

PROC uniq = (GENLINE arg)GENLINE:
  uniq yield line(arg, );

MODE SORTSTRUCT = LINE; PR READ "prelude/sort.a68" PR

PROC sort yield line = ([]GENLINE args, YIELDLINE yield)VOID:(
  PAGE out; cat(args) > out; in place shell sort(out);
  FOR elem FROM LWB out TO UPB out DO
    yield(out[elem])
  OD
);

PROC sort = (GENLINE arg)GENLINE:
  sort yield line(arg, );

PROC head yield line = (INT n, []GENLINE args, YIELDLINE yield)VOID:
  FOR argn FROM LWB args TO UPB args DO
    GENLINE line gen = args[argn];
    INT count := 0;
  # FOR LINE line IN # cat(line gen)(#) DO #
  ##   (LINE line)VOID:(
         count+:=1;
         yield(line);
         IF count = n THEN done FI
  # OD #));
    done: SKIP
  OD;

PROC head = (INT n, []GENLINE args)GENLINE:
  head yield line(n, args, );

PROC tail yield line = (INT n, []GENLINE args, YIELDLINE yield)VOID:
  FOR argn FROM LWB args TO UPB args DO
    GENLINE gen line = args[argn];
    [0:n-1]LINE window; INT window end := -1;
  # FOR LINE line IN # gen line(#) DO #
  ##   (LINE line)VOID:
         window[(window end+:=1) MOD n]:= line
  # OD #);
    FOR line FROM window end-n+1 TO window end DO
      IF line>=0 THEN
        yield(window[line MOD n])
      FI
    OD
  OD;

PROC tail = (INT n, []GENLINE args)GENLINE:
  tail yield line(n, args, );

# Define an optional monadic OPerator #
OP TAIL = (INT n)MANYTOONE: tail(n, );</lang>
File: Iterator_pipe_page.a68
<lang algol68># Define the required OPerators for pipes of user-defined type "PAGE" #
OP +:= = (PAGEOUT page, LINE line)MOID:(
  [LWB page:UPB page+1]LINE out;
  out[:UPB page]:=page;
  out[UPB out]:=line;
  page := out
);

PROC page read line = (PAGEIN page, YIELDLINE yield)VOID:
  FOR elem FROM LWB page TO UPB page DO
    yield(page[elem])
  OD;

OP READ = (PAGEIN page)GENLINE:
  page read line(page, );

PROC page append line = (PAGEAPPEND page, LINE line)VOID:
  page +:= line;

OP WRITE = (PAGEOUT page)YIELDLINE: (
  page := LINE();
  page append line(page, )
);

OP APPEND = (PAGEAPPEND page)YIELDLINE:
  page append line(page, );</lang>
File: test_Iterator_pipe_page.a68

<lang algol68>#!/usr/local/bin/a68g --script #

# First define what kind of record (aka LINE) we are piping and filtering #
FORMAT line fmt = $xg$;
MODE
  LINE = STRING,
  PAGE = FLEX[0]LINE,
  BOOK = FLEX[0]PAGE;

PR READ "Iterator_pipe_page.a68" PR
PR READ "Iterator_pipe_operators.a68" PR
PR READ "Iterator_pipe_utilities.a68" PR

PAGE list of computer scientists = (

 "Wil van der Aalst - business process management, process mining, Petri nets",
 "Hal Abelson - intersection of computing and teaching",
 "Serge Abiteboul - database theory",
 "Samson Abramsky - game semantics",
 "Leonard Adleman - RSA, DNA computing",
 "Manindra Agrawal - polynomial-time primality testing",
 "Luis von Ahn - human-based computation",
 "Alfred Aho - compilers book, the 'a' in AWK",
 "Stephen R. Bourne - Bourne shell, portable ALGOL 68C compiler",
 "Kees Koster - ALGOL 68",
 "Lambert Meertens - ALGOL 68, ABC (programming language)",
 "Peter Naur - BNF, ALGOL 60",
 "Guido van Rossum - Python (programming language)",
 "Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL",
 "Dennis E. Wisnosky - Integrated Computer-Aided Manufacturing (ICAM), IDEF",
 "Stephen Wolfram - Mathematica",
 "William Wulf - compilers",
 "Edward Yourdon - Structured Systems Analysis and Design Method",
 "Lotfi Zadeh - fuzzy logic",
 "Arif Zaman - Pseudo-random number generator",
 "Albert Zomaya - Australian pioneer of scheduling in parallel and distributed systems",
 "Konrad Zuse - German pioneer of hardware and software"

);

PAGE algol pioneers list, the scientists list; PAGE aa;

# Now do a bit of plumbing: #
cat((
    head(4, ) <  list of computer scientists,
    cat(READ list of computer scientists) =: grep("ALGOL", ) =: tee(WRITE algol pioneers list),
    tail(4, READ list of computer scientists)
  )) =: sort =: uniq =: tee(WRITE the scientists list) =: grep("aa", ) >> aa;

# Finally check the result: #
printf((
  $"Pioneer: "$, line fmt, aa, $l$,
  $"Number of Algol pioneers: "g(-0)$, UPB algol pioneers list, $l$,
  $"Number of scientists: "g(-0)$, UPB the scientists list, $l$
))</lang>
Output:

Pioneer:  Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
Number of Algol pioneers: 6
Number of scientists: 15
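The ALGOL 68 solution above is built on callback ("push") iterators. A GENLINE is a procedure that, given a YIELDLINE callback, calls it once per line, and a FILTER turns one GENLINE into another. The following Python sketch transliterates that idea to make the control flow easier to follow. It is an illustration, not a port of the program: `read_page`, `grep`, `pipe`, `append_to` and `demo` are hypothetical names that only mirror READ, the curried grep, `=:` and `>>`.

```python
from typing import Callable, List

YieldLine = Callable[[str], None]      # ALGOL 68: YIELDLINE = PROC(LINE)VOID
GenLine = Callable[[YieldLine], None]  # ALGOL 68: GENLINE = PROC(YIELDLINE)VOID
Filter = Callable[[GenLine], GenLine]  # ALGOL 68: FILTER = PROC(GENLINE)GENLINE


def read_page(page: List[str]) -> GenLine:
    """Rough analogue of OP READ: pushes each line of the page downstream."""
    def gen(yield_line: YieldLine) -> None:
        for line in page:
            yield_line(line)
    return gen


def grep(pattern: str) -> Filter:
    """Rough analogue of the curried grep(pattern, ): wraps the downstream callback."""
    def make(gen: GenLine) -> GenLine:
        def filtered(yield_line: YieldLine) -> None:
            def on_line(line: str) -> None:
                if pattern in line:
                    yield_line(line)
            gen(on_line)
        return filtered
    return make


def pipe(gen: GenLine, filter_: Filter) -> GenLine:
    """Rough analogue of `gen =: filter`."""
    return filter_(gen)


def append_to(gen: GenLine, page: List[str]) -> None:
    """Rough analogue of `gen >> page`: runs the whole chain, appending each line."""
    gen(page.append)


def demo() -> List[str]:
    pioneers: List[str] = []
    scientists = ["Peter Naur - BNF, ALGOL 60", "Guido van Rossum - Python (programming language)"]
    append_to(pipe(read_page(scientists), grep("ALGOL")), pioneers)
    return pioneers
```

Each line is handed straight to the downstream callback, so nothing is buffered between stages. That is how the ALGOL 68 code meets the no-caching requirement, except in sort, which collects its whole input into a PAGE before yielding anything.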

J

If we ignore the gratuitous complexity requirements of this task, it boils down to this:

Step 0: get the data. The task does not specify how to get the data, so here I use lynx, which is readily available on most Unix-like systems, including Cygwin. Note that lynx needs to be in the OS PATH when running J.

<lang j>require 'task'
data=:<;._2 shell 'lynx -dump -nolist -width=999 http://en.wikipedia.org/wiki/List_of_computer_scientists'</lang>

Step 1: define task core algorithms:

Step 2: select and display the required data:

<lang j> ;'aa' grep 'ALGOL' grep data

    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL

</lang>

As for the concept of a pipe that presents data one record at a time to a downstream function, that corresponds to J's @ (atop) conjunction, and we could achieve the "left to right" syntax by explicitly ordering its arguments, as in 2 :'v@u'. But it's not clear how to demonstrate that usefully in this task. (And I could write a lot of code to accomplish what is being accomplished here with the two successive greps, but I find that concept distasteful and tedious.)

However, note also that J's sort (/:~) and uniq (~.) operations would work just fine on this kind of data. For example:

<lang j> ;'aa' grep 'ALGOL' grep data,data

    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL

  ;'aa' grep ~. 'ALGOL' grep data,data
    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL

</lang>
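One caveat when mapping uniq onto J: `~.` (nub) removes every repeated item wherever it occurs, whereas Unix uniq only collapses adjacent duplicates, which is why the shell script sorts first. After a sort the two agree. A short Python sketch of the difference, with the hypothetical names `unix_uniq` and `nub`:

```python
from itertools import groupby
from typing import Iterable, List


def unix_uniq(lines: Iterable[str]) -> List[str]:
    """Collapse runs of adjacent equal lines, as Unix uniq does."""
    return [key for key, _ in groupby(lines)]


def nub(lines: Iterable[str]) -> List[str]:
    """Keep the first occurrence of every line, as J's ~. does."""
    return list(dict.fromkeys(lines))


if __name__ == "__main__":
    data = ["ALGOL", "Python", "ALGOL"]
    print(unix_uniq(data))          # ['ALGOL', 'Python', 'ALGOL']
    print(nub(data))                # ['ALGOL', 'Python']
    print(unix_uniq(sorted(data)))  # ['ALGOL', 'Python']
```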

That said, this implements most (perhaps all) of the required complexities:

<lang j>declare=: erase@boxopen
tee=: 4 :0
  if._1=nc boxopen x do.(x)=: '' end.
  (x)=: (do x),y
  y
)
grep=: 4 :'x (+./@E.S:0 # ]) y'
pipe=:2 :'v@(u"0)' NB. small pipe -- spoon feed one record at a time
PIPE=:2 :0 NB. big pipe -- feed everything all together
  v u y
:
  v (,x)"_ y        NB. syntactic sugar, beware of tooth decay
)
head=: {.
tail=: -@[ {. ]
sort=: /:~
uniq=: ~.
cat=: ]
echo=: smoutput@;

declare;:'ALGOL_pioneers the_important_scientists'
aa=: ;do TXT=:0 :0 -.LF
 (
   (
     4 head data
   ),(
     cat pipe
     ('ALGOL'&grep) pipe
     ('ALGOL_pioneers'&tee)
       data
   ),(
     4 tail data
 )) PIPE
 sort PIPE
 uniq PIPE
 ('the_important_scientists'&tee) PIPE
 ('aa'&grep)
)

echo 'Pioneer:';aa</lang>

This produces the result:

<lang>Pioneer: * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL</lang>

Perl

Implementing only stream chaining, cat, grep and tee. Oddly enough, I don't feel the urge to implement all of the more-or-less-the-same features asked for by the task.
<lang perl>use strict;
use 5.10.0;

package IO::File;
sub readline { CORE::readline(shift) }   # icing, not essential

package Stream;
use Exporter 'import';

# Only overload one operator. "file | stream" and "stream | stream"
# are not ambiguous like with shell commands.
use overload '|' => \&chain;

sub new {
    my $cls = shift;
    bless { args => [@_] }, ref $cls || $cls;
}

sub chain {
    my ($left, $right, $swap) = @_;
    ($left, $right) = ($right, $left) if $swap;

    if (!ref $left) { my $h; open $h, $left and $left = $h or die $left }

    if (!ref $right) {
        # output file not implemented: don't know where I'd ever use it
        my $h;
        open $h, '>', $right and $right = $h or die $right;
    }

    if (ref $left and $left->isa(__PACKAGE__)) { $left->{output} = $right }
    if (ref $right and $right->isa(__PACKAGE__)) { $right->{input} = $left }
    $right;
}

# Read a line and do something to it. By default it's this dummy
# pass-through function. Overriding it defines a subclass' behavior
sub transform { shift; shift }

sub readline {
    my $obj = shift;
    my $line;
    return $line = <STDIN> unless defined $obj->{input};

    while (1) {
        $line = $obj->{input}->readline or return;
        return $line if $line = $obj->transform($line);
    }
}

package Cat;
use parent -norequire, 'Stream';
# Dummy, exactly the same as Stream. Except now we can invoke
# as Cat::ter, instead of Stream::ter, which is not even a word
sub ter { Cat->new(@_) }

package Grep;
use parent -norequire, 'Stream';
sub transform {
    my ($obj, $line) = @_;
    for (@{$obj->{args}}) { return $line if $line =~ $_ }
    return;
}

sub per { Grep->new(@_) }

package Tee;
use parent -norequire, 'Stream';
sub er {
    my $obj = Tee->new(@_);
    @{$obj->{tees}} = map { open my $h, '>', $_ or die $_; $h } @{$obj->{args}};
    delete $obj->{args};
    $obj;
}

sub transform {
    my ($obj, $line) = @_;
    print $_ $line for @{$obj->{tees}};
    $line;
}

package main;
# Example chain (assumed; the file names are taken from the task's sample script)
my $chain = 'List_of_computer_scientists.lst' | Cat::ter() | Grep::per(qr/ALGOL/)
          | Tee::er('ALGOL_pioneers.lst');

print while $_ = $chain->readline;</lang>
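The Perl version is pull-based. The last stage's readline asks its input for a line, which asks its own input, and so on back to the file, and a subclass only overrides transform, returning a false value to drop a line. A rough Python analogue of that design might look like the sketch below. The class names mirror the Perl packages but are hypothetical, and, unlike Tee::er, this `Tee` takes already-open file-like objects instead of file names.

```python
import io
import re
from typing import List, Optional, Tuple


class Stream:
    """Pass-through stage, like the Perl Stream package."""

    def __init__(self, *args) -> None:
        self.args = args
        self.input = None  # upstream Stream or file-like object

    def __ror__(self, left):
        # `"name" | stage` opens the file; `file_object | stage` uses it directly.
        self.input = open(left) if isinstance(left, str) else left
        return self

    def __or__(self, right: "Stream") -> "Stream":
        right.input = self
        return right

    def transform(self, line: str) -> Optional[str]:
        return line  # subclasses override; returning None drops the line

    def readline(self) -> str:
        while True:
            line = self.input.readline()
            if not line:  # "" means end of input, like undef in Perl
                return ""
            kept = self.transform(line)
            if kept:
                return kept


class Grep(Stream):
    def transform(self, line: str) -> Optional[str]:
        return line if any(re.search(p, line) for p in self.args) else None


class Tee(Stream):
    def transform(self, line: str) -> Optional[str]:
        for sink in self.args:
            sink.write(line)
        return line


def demo() -> Tuple[str, str]:
    source = io.StringIO("Peter Naur - BNF, ALGOL 60\n"
                         "Guido van Rossum - Python\n"
                         "Kees Koster - ALGOL 68\n")
    copy = io.StringIO()
    chain = source | Stream() | Grep("ALGOL") | Tee(copy)
    lines: List[str] = []
    while line := chain.readline():
        lines.append(line)
    return "".join(lines), copy.getvalue()
```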