User defined pipe and redirection operators

Revision as of 22:17, 17 September 2011 by rosettacode>NevilleDNZ (Input Records: A shorter version - for testing - of the wikipedia page. http://en.wikipedia.org/wiki/List_of_computer_scientists)
This page uses content from Wikipedia. The original article was at Pipeline_(Unix). The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance)

In Unix-like computer operating systems (and, to some extent, Microsoft Windows), a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one. Each connection is implemented by an anonymous pipe. Filter programs are often used in this configuration.

User defined pipe and redirection operators is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

The concept was invented by Douglas McIlroy for Unix shells and it was named by analogy to a physical pipeline.

A Unix pipeline can be thought of as a left-associative infix operation whose operands are programs with parameters. Programmatically, all programs in a pipeline run at the same time (in parallel), although the syntax suggests that one runs after another. (The parallelism is actually emulated; the original Wikipedia article describes how pipelines are implemented.) A pipeline is a form of functional composition, reminiscent of functional programming, where data is passed from one function to the next, the output of one becoming the input of the next.
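The "functional composition" reading can be sketched in Python with generator functions. This is only an illustration, not part of the task: `compose`, `shout`, `only_algol` and `trace` are hypothetical names chosen for the example. The `trace` list records which stage touched which record, showing that the stages are interleaved even though the code reads as if one ran after the other.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterator[str]], Iterator[str]]

def compose(*stages: Stage) -> Callable[[Iterable[str]], Iterator[str]]:
    """Left-associative chaining: compose(f, g)(x) behaves like g(f(x))."""
    def run(source: Iterable[str]) -> Iterator[str]:
        return reduce(lambda stream, stage: stage(stream), stages, iter(source))
    return run

trace = []  # order in which the stages handle each record

def shout(lines):
    # First stage: upper-case every record.
    for line in lines:
        trace.append(f"shout:{line}")
        yield line.upper()

def only_algol(lines):
    # Second stage: keep records mentioning ALGOL.
    for line in lines:
        trace.append(f"grep:{line}")
        if "ALGOL" in line:
            yield line

pipeline = compose(shout, only_algol)
result = list(pipeline(["algol 68", "fuzzy logic"]))
print(result)
print(trace)
```

Each record passes through both stages before the next record is read, which is the behaviour the task later asks for.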

Task: If the language supports operator definition, then:

  • Create "user defined" equivalents of the Unix shell "<", "|", ">", "<<", ">>" and $(cmd) operators.
  • Provide simple equivalents of: cat, tee, grep, uniq, wc, head & tail, but as filters/procedures native to the specific language.
  • Replicate the sample shell script below, but in the specific language.
  • Specifically do not cache the entire stream before the subsequent filter/procedure starts. Pass each record on as soon as available through each of the filters/procedures in the chain.
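As a rough sketch of what such operators might look like in a language with operator overloading, the Python fragment below overloads only "|". The `Filter` class and the helper names are hypothetical, chosen for this example. The other shell operators are left out: Python chains comparison operators, so reusing "<" and ">" for redirection is awkward. Every filter is a generator, so records are passed on one at a time; `tail` necessarily holds back its last n records.

```python
from collections import deque
from itertools import islice

class Filter:
    """Hypothetical wrapper: binds a generator function to its arguments
    so that `source | Filter(...)` feeds the source through it."""
    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __ror__(self, source):
        # Invoked for `source | self` when the left operand cannot handle "|"
        # itself (lists, generators, open files), so any iterable starts a chain.
        return self.func(iter(source), *self.args)

def _grep(lines, pattern):
    return (line for line in lines if pattern in line)

def _head(lines, n):
    return islice(lines, n)

def _tail(lines, n):
    yield from deque(lines, maxlen=n)   # must see the whole stream first

def _uniq(lines):
    previous = object()                 # sentinel that equals no record
    for line in lines:
        if line != previous:
            yield line
        previous = line

def _tee(lines, sink):
    for line in lines:
        sink.append(line)               # sink is any object with append()
        yield line

def _wc(lines):
    yield str(sum(1 for _ in lines))

def grep(pattern): return Filter(_grep, pattern)
def head(n): return Filter(_head, n)
def tail(n): return Filter(_tail, n)
def uniq(): return Filter(_uniq)
def tee(sink): return Filter(_tee, sink)
def wc(): return Filter(_wc)

records = ["Kees Koster - ALGOL 68", "Kees Koster - ALGOL 68",
           "William Wulf - compilers", "Peter Naur - BNF, ALGOL 60"]
seen = []
first = list(records | grep("ALGOL") | tee(seen) | uniq() | head(1))
count = list(records | grep("ALGOL") | uniq() | wc())
last = list(records | tail(1))
```

Because `head(1)` stops pulling after one record, `seen` ends up holding a single record: nothing upstream was cached or processed ahead of demand.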

Alternatively, if the language does not support operator definition, then replace with:

  • define the procedures: input(cmd,stream), pipe(stream,cmd), output(stream, stream), whereis(array), append(stream)
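A minimal procedural sketch in Python, under stated assumptions: the task does not spell out the exact signatures, so the argument order here is a guess. `read_lines` stands in for `input` (which is a Python built-in name), `output` and `append` take a file path as their second argument, and `grep_algol` is a hypothetical filter used only for the demonstration.

```python
import os
import tempfile

def pipe(stream, cmd):
    """Equivalent of "|": feed stream into cmd, a function on iterators."""
    return cmd(iter(stream))

def read_lines(path):
    """Equivalent of "<": yield the records of a file one at a time."""
    with open(path) as handle:
        for line in handle:
            yield line.rstrip("\n")

def output(stream, path, mode="w"):
    """Equivalent of ">": write each record as soon as it arrives."""
    with open(path, mode) as handle:
        for line in stream:
            handle.write(line + "\n")

def append(stream, path):
    """Equivalent of ">>": like output, but keeps existing content."""
    output(stream, path, mode="a")

def grep_algol(lines):
    return (line for line in lines if "ALGOL" in line)

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "ALGOL_pioneers.lst")
output(pipe(["Kees Koster - ALGOL 68", "William Wulf - compilers"], grep_algol), path)
append(["Peter Naur - BNF, ALGOL 60"], path)
saved = list(read_lines(path))
print(saved)
```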

For bonus Kudos: Implement the shell "&" concept as a dyadic operator in the specific language. e.g.: <lang sh>( head x & tail x & wait ) | grep test</lang>
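One possible reading of "&" as a dyadic operator, sketched in Python: each operand is a producer of records, and `a & b` runs both in background threads that feed a shared queue. The class name `Job` and the use of threads are assumptions made for this example. As in the shell, the order in which records from the two producers arrive is not deterministic.

```python
import queue
import threading

class Job:
    """Hypothetical producer wrapper; `a & b` merges two jobs concurrently."""
    def __init__(self, *producers):
        self.producers = producers      # zero-argument callables returning iterables

    def __and__(self, other):
        return Job(*self.producers, *other.producers)

    def __iter__(self):
        records = queue.Queue()
        done = object()                 # marks the end of one producer

        def run(producer):
            try:
                for record in producer():
                    records.put(record)
            finally:
                records.put(done)

        for producer in self.producers:
            threading.Thread(target=run, args=(producer,), daemon=True).start()

        remaining = len(self.producers)
        while remaining:                # plays the role of the shell's "wait"
            item = records.get()
            if item is done:
                remaining -= 1
            else:
                yield item

lines = ["test 1", "x", "y", "test 2"]
head_job = Job(lambda: lines[:2])
tail_job = Job(lambda: lines[-2:])
# Rough analogue of: ( head x & tail x & wait ) | grep test
merged = [record for record in (head_job & tail_job) if "test" in record]
print(sorted(merged))
```

The queue is unbounded here, so a fast producer can run ahead of the consumer; a bounded `queue.Queue(maxsize=...)` would behave more like a real pipe buffer.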

Sample shell script: ¢ draft - pending a better (more interesting) suggestion ¢
<lang sh>aa="$(
  (
    head -4 < List_of_computer_scientists.lst;
    cat List_of_computer_scientists.lst | grep ALGOL | tee ALGOL_pioneers.lst;
    tail -4 List_of_computer_scientists.lst
  ) | sort | uniq | tee "the_important_scientists.lst" | grep aa
)"
echo "Pioneer: $aa"</lang>

Input Records:

A test sample of scientists from wikipedia's "List of computer scientists"
Name - Areas of interest
Wil van der Aalst - business process management, process mining, Petri nets
Hal Abelson - intersection of computing and teaching
Serge Abiteboul - database theory
Samson Abramsky - game semantics
Leonard Adleman - RSA, DNA computing
Manindra Agrawal - polynomial-time primality testing
Luis von Ahn - human-based computation
Alfred Aho - compilers book, the 'a' in AWK
Stephen R. Bourne - Bourne shell, portable ALGOL 68C compiler
Kees Koster - ALGOL 68
Lambert Meertens - ALGOL 68, ABC (programming language)
Peter Naur - BNF, ALGOL 60
Guido van Rossum - Python (programming language)
Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
Dennis E. Wisnosky - Integrated Computer-Aided Manufacturing (ICAM), IDEF
Stephen Wolfram - Mathematica
William Wulf - compilers
Edward Yourdon - Structured Systems Analysis and Design Method
Lotfi Zadeh - fuzzy logic
Arif Zaman - Pseudo-random number generator
Albert Zomaya - Australian pioneer of scheduling in parallel and distributed systems
Konrad Zuse - German pioneer of hardware and software

These records can be declared in any format appropriate to the specific language, e.g. table, array, list or text file.

Output:

Pioneer: Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
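The expected output can be cross-checked with a short Python sketch that replays the sample script over the input records using plain generator functions. Two assumptions are made here: name and area of interest are joined with " - " (taken from the output line above), and the two `tee` targets are in-memory lists rather than files. `sort` necessarily reads its whole input; every other stage passes records on one at a time.

```python
from collections import deque
from itertools import chain, islice

RECORDS = """\
A test sample of scientists from wikipedia's "List of computer scientists"
Name - Areas of interest
Wil van der Aalst - business process management, process mining, Petri nets
Hal Abelson - intersection of computing and teaching
Serge Abiteboul - database theory
Samson Abramsky - game semantics
Leonard Adleman - RSA, DNA computing
Manindra Agrawal - polynomial-time primality testing
Luis von Ahn - human-based computation
Alfred Aho - compilers book, the 'a' in AWK
Stephen R. Bourne - Bourne shell, portable ALGOL 68C compiler
Kees Koster - ALGOL 68
Lambert Meertens - ALGOL 68, ABC (programming language)
Peter Naur - BNF, ALGOL 60
Guido van Rossum - Python (programming language)
Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
Dennis E. Wisnosky - Integrated Computer-Aided Manufacturing (ICAM), IDEF
Stephen Wolfram - Mathematica
William Wulf - compilers
Edward Yourdon - Structured Systems Analysis and Design Method
Lotfi Zadeh - fuzzy logic
Arif Zaman - Pseudo-random number generator
Albert Zomaya - Australian pioneer of scheduling in parallel and distributed systems
Konrad Zuse - German pioneer of hardware and software
""".splitlines()

def head(lines, n):
    return islice(lines, n)

def tail(lines, n):
    return iter(deque(lines, maxlen=n))

def grep(lines, pattern):
    return (line for line in lines if pattern in line)

def tee(lines, sink):
    for line in lines:
        sink.append(line)
        yield line

def uniq(lines):
    previous = None
    for line in lines:
        if line != previous:
            yield line
        previous = line

algol_pioneers, important_scientists = [], []
group = chain(head(RECORDS, 4),
              tee(grep(RECORDS, "ALGOL"), algol_pioneers),
              tail(RECORDS, 4))
aa = "\n".join(grep(tee(uniq(sorted(group)), important_scientists), "aa"))
message = f"Pioneer: {aa}"
print(message)
```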

ALGOL 68

See User defined pipe and redirection operators/ALGOL 68

J

If we ignore the gratuitous complexity requirements of this task, it boils down to this:

Step 0: get the data. The task does not specify how to get the data, so here I use lynx, which is readily available on most Unix-like systems, including Cygwin. Note that lynx needs to be in the OS PATH when running J.

<lang j>require 'task'
data=:<;._2 shell 'lynx -dump -nolist -width=999 http://en.wikipedia.org/wiki/List_of_computer_scientists'</lang>

Step 1: define task core algorithms:

<lang j>grep=: +./@E.S:0 # ]</lang>

Step 2: select and display the required data:

<lang j>  ;'aa' grep 'ALGOL' grep data

    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL

</lang>

As for the concept of a pipe that presents data one record at a time to a downstream function, that corresponds to the J operator @ and we could achieve the "left to right" syntax mechanism by explicitly ordering its arguments 2 :'v@u' but it's not clear how to demonstrate that usefully, in this task. (And, I could write a lot of code, to accomplish what's being accomplished here with the two successive greps, but I find that concept distasteful and tedious.)

However, note also that J's sort (/:~) and uniq (~.) operations would work just fine on this kind of data. For example:

<lang j>  ;'aa' grep 'ALGOL' grep data,data

    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
  ;'aa' grep ~. 'ALGOL' grep data,data
    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL

</lang>

That said, this implements most (perhaps all) of the required complexities:

<lang j>declare=: erase@boxopen
tee=: 4 :0
  if._1=nc boxopen x do.(x)=: '' end.
  (x)=: (do x),y
  y
)
grep=: 4 :'x (+./@E.S:0 # ]) y'
pipe=:2 :'v@(u"0)' NB. small pipe -- spoon feed one record at a time
PIPE=:2 :0 NB. big pipe -- feed everything all together
  v u y
:
  v (,x)"_ y        NB. syntactic sugar, beware of tooth decay
)
head=: {.
tail=: -@[ {. ]
sort=: /:~
uniq=: ~.
cat=: ]
echo=: smoutput@;

declare;:'ALGOL_pioneers the_important_scientists'
aa=: ;do TXT=:0 :0 -.LF

 (
   (
     4 head data
   ),(
     cat pipe
     ('ALGOL'&grep) pipe
     ('ALGOL_pioneers'&tee)
       data
   ),(
     4 tail data
 )) PIPE
 sort PIPE
 uniq PIPE
 ('the_important_scientists'&tee) PIPE
 ('aa'&grep)
   

)

echo 'Pioneer:';aa</lang>

This produces the result:

<lang>Pioneer: * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL</lang>

Perl

Implementing only stream chaining, cat, grep and tee. Oddly enough, I don't feel the urge to implement all of the more-or-less-the-same features asked for by the task.
<lang perl>use strict;
use 5.10.0;

package IO::File;
sub readline { CORE::readline(shift) } # icing, not essential

package Stream;
use Exporter 'import';

# Only overload one operator.  "file | stream" and "stream | stream"
# are not ambiguous like with shell commands.
use overload '|' => \&chain;
sub new {
	my $cls = shift;
	bless { args => [@_] }, ref $cls || $cls;
}

sub chain {
	my ($left, $right, $swap) = @_;
	($left, $right) = ($right, $left) if $swap;

	if (!ref $left) {
		my $h;
		open $h, $left and $left = $h or die $left
	}

	if (!ref $right) {
		# output file not implemented: don't know where I'd ever use it
		my $h;
		open $h, '>', $right and $right = $h or die $right
	}

	if (ref $left and $left->isa(__PACKAGE__)) {
		$left->{output} = $right;
	}

	if (ref $right and $right->isa(__PACKAGE__)) {
		$right->{input} = $left;
	}
	$right;
}

# Read a line and do something to it.  By default it's this dummy
# pass-through function.  Overriding it defines a subclass' behavior
sub transform { shift; shift }

sub readline {
	my $obj = shift;
	my $line;
	return $line = <STDIN> unless defined $obj->{input};

	while (1) {
		$line = $obj->{input}->readline or return;
		return $line if $line = $obj->transform($line);
	}
}

package Cat;
use parent -norequire, 'Stream';

# Dummy, exactly the same as Stream.  Except now we can invoke
# as Cat::ter, instead of Stream::ter, which is not even a word
sub ter { Cat->new(@_) }

package Grep;
use parent -norequire, 'Stream';

sub transform {
	my ($obj, $line) = @_;
	for (@{$obj->{args}}) {
		return $line if ($line =~ $_)
	}
	return;
}

sub per { Grep->new(@_) }

package Tee;
use parent -norequire, 'Stream';
sub er{
	my $obj = Tee->new(@_);
	@{$obj->{tees}} = map {
			open my $h, '>', $_ or die $_;
			$h
		} @{$obj->{args}};
	delete $obj->{args};
	$obj
}

sub transform {
	my ($obj, $line) = @_;
	print $_ $line for @{$obj->{tees}};
	$line;
}

package main;
my $chain = '/etc/services'	# head of chain; omit to use STDIN
		| Cat::ter		# don't really need this line
		| Grep::per(qr/tcp/)
		| Tee::er('/tmp/t1', '/tmp/t2')
		| Grep::per(qr/170/)
		| Tee::er('/tmp/t3')
		;

print while $_ = $chain->readline;</lang>