User defined pipe and redirection operators

Revision as of 14:27, 16 September 2011 by Rdm
This page uses content from Wikipedia. The original article was at Pipeline_(Unix). The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance)

In Unix-like computer operating systems (and, to some extent, Microsoft Windows), a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one. Each connection is implemented by an anonymous pipe. Filter programs are often used in this configuration.

User defined pipe and redirection operators is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

The concept was invented by Douglas McIlroy for Unix shells and it was named by analogy to a physical pipeline.

A Unix pipeline can be thought of as a left-associative infix operation whose operands are programs with parameters. Programmatically, all programs in a pipeline run at the same time (in parallel), although the syntax suggests that one runs after another (note that the parallelism is actually emulated; for details, see how pipelines are implemented later on this page). A pipeline is a form of function composition, reminiscent of functional programming, where data is passed from one function to the next (the output of one becoming the input of another).
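The left-associative reading can be sketched with Python generators, assuming each stage is a function from an iterator of records to an iterator of records. The names `upper`, `numbered` and `pipeline` are hypothetical and exist only for this illustration; this is not how a shell implements pipes.

```python
from functools import reduce

def upper(lines):
    # Stage 1: upper-case each record as soon as it arrives.
    for line in lines:
        yield line.upper()

def numbered(lines):
    # Stage 2: prefix each record with its position in the stream.
    for n, line in enumerate(lines, 1):
        yield f"{n}: {line}"

def pipeline(source, *stages):
    # Fold left: ((source | s1) | s2) | ... mirrors the shell's associativity.
    return reduce(lambda stream, stage: stage(stream), stages, source)

result = list(pipeline(iter(["a", "b"]), upper, numbered))
# result -> ['1: A', '2: B']
```

Because every stage is a generator, no stage runs until the final consumer asks for a record, which roughly mimics the way pipeline stages interleave.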

Task: If the language supports operator definition, then:

  • Create "user defined" equivalents of the Unix shell "<", "|", ">", "<<", ">>" and $(cmd) operators.
  • Provide simple equivalents of: cat, tee, grep, uniq, wc, head & tail, but as filters/procedures native to the specific language.
  • Replicate the sample shell script below, but in the specific language.
  • Specifically do not cache the entire stream before the subsequent filter/procedure starts. Pass each record on as soon as available through each of the filters/procedures in the chain.
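As a rough illustration of the streaming requirement, the following Python sketch overloads `|` so that each record flows through the whole chain before the next one is read. The class `Filter` and the helpers `grep`, `uniq` and `head` are hypothetical names introduced here, not part of any library, and only three of the requested filters are shown.

```python
import itertools

class Filter:
    """Wraps a function (iterator -> iterator) so it can sit on the right of `|`."""
    def __init__(self, func):
        self.func = func

    def __ror__(self, source):
        # Called for `source | self`; nothing is read until the result is consumed.
        return self.func(iter(source))

def grep(pattern):
    # Keep only records containing the substring `pattern`.
    return Filter(lambda lines: (line for line in lines if pattern in line))

def head(n):
    # Pass on the first n records, then stop pulling from upstream.
    return Filter(lambda lines: itertools.islice(lines, n))

def uniq():
    # Drop records equal to the one immediately before them.
    def run(lines):
        previous = object()  # sentinel that equals no record
        for line in lines:
            if line != previous:
                yield line
            previous = line
    return Filter(run)

# An endless source: a chain that cached whole streams would never finish.
endless = (f"record {n // 2 % 3}" for n in itertools.count())
first = list(endless | grep("record") | uniq() | head(4))
# first -> ['record 0', 'record 1', 'record 2', 'record 0']
```

The endless source is the point of the example: the chain terminates only because `head` stops requesting records, so no stage can have buffered its whole input.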

Alternately: if the language does not support operator definition then replace with:

  • define the procedures: input(cmd,stream), pipe(stream,cmd), output(stream, stream), whereis(array), append(stream)

For bonus Kudos: Implement the shell "&" concept as a dyadic operator in the specific language. e.g.: <lang sh>( head x & tail x & wait ) | grep test</lang>
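One possible reading of the "&" idea, sketched in Python as a plain function rather than a dyadic operator: each producer runs in its own thread, and records are passed downstream in whatever order they arrive. The name `background` is hypothetical, and the final thread join is only assumed to play the role of the shell's `wait`.

```python
import queue
import threading

def background(*producers):
    """Run each producer in its own thread and yield records as they arrive."""
    q = queue.Queue()
    done = object()  # marker a thread puts on the queue when it finishes

    def run(producer):
        for record in producer:
            q.put(record)
        q.put(done)

    threads = [threading.Thread(target=run, args=(p,)) for p in producers]
    for t in threads:
        t.start()

    remaining = len(threads)
    while remaining:
        item = q.get()
        if item is done:
            remaining -= 1
        else:
            yield item

    for t in threads:  # roughly the shell's `wait`
        t.join()

# Arrival order is not deterministic, so sort before comparing.
merged = sorted(background(["head 1", "head 2"], ["tail 1"]))
```

Wrapping `background` in a class that defines `__and__` would turn it into an infix operator, at the cost of a little more boilerplate.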

Sample shell script: ¢ draft - pending a better (more interesting) suggestion ¢
<lang sh>aa="$(
  (
    head -4 < List_of_computer_scientists.lst;
    cat List_of_computer_scientists.lst | grep ALGOL | tee ALGOL_pioneers.lst;
    tail -4 List_of_computer_scientists.lst
  ) | sort | uniq | tee "the_important_scientists.lst" | grep aa
)"
echo "Pioneer: $aa"</lang>

Input File:

  • List_of_computer_scientists.lst - cut from Wikipedia.

Output:

Pioneer: Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL

J

If we ignore the gratuitous complexity requirements of this task, it boils down to this:

Step 0: get the data. The task does not specify how to get the data, so here I use lynx, which is readily available on most Unix-like systems, including Cygwin. Note that lynx needs to be in the OS PATH when running J.

<lang j>require 'task'
data=:<;._2 shell 'lynx -dump -nolist -width=999 http://en.wikipedia.org/wiki/List_of_computer_scientists'</lang>

Step 1: define task core algorithms:

<lang j>grep=: +./@E.S:0 # ]</lang>

Step 2: select and display the required data:

<lang j>  ;'aa' grep 'ALGOL' grep data

    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL

</lang>

As for the concept of a pipe that presents data one record at a time to a downstream function, that corresponds to the J operator @, and we could achieve the "left to right" syntax by explicitly ordering its arguments with 2 :'v@u'. However, it's not clear how to demonstrate that usefully in this task. (I could also write a lot of code to accomplish what the two successive greps accomplish here, but I find that concept distasteful and tedious.)

That said, note also that J's sort (/:~) and uniq (~.) operations would work just fine on this kind of data. For example:

<lang j>  ;'aa' grep 'ALGOL' grep data,data

    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL
  ;'aa' grep 'ALGOL' grep ~. data,data
    * Adriaan van Wijngaarden - Dutch pioneer; ARRA, ALGOL

</lang>

Perl

Implementing only stream chaining, cat, grep and tee. Oddly enough, I don't feel the urge to implement all of the more-or-less-the-same features asked for by the task.
<lang perl>use strict;
use 5.10.0;

package IO::File;
sub readline { CORE::readline(shift) } # icing, not essential

package Stream;
use Exporter 'import';

# Only overload one operator.  "file | stream" and "stream | stream"
# are not ambiguous like with shell commands.
use overload '|' => \&chain;
sub new {
    my $cls = shift;
    bless { args => [@_] }, ref $cls || $cls;
}

sub chain {
    my ($left, $right, $swap) = @_;
    ($left, $right) = ($right, $left) if $swap;

    if (!ref $left) {
        my $h;
        open $h, $left and $left = $h or die $left
    }

    if (!ref $right) {
        # output file not implemented: don't know where I'd ever use it
        my $h;
        open $h, '>', $right and $right = $h or die $right
    }

    if (ref $left and $left->isa(__PACKAGE__)) {
        $left->{output} = $right;
    }

    if (ref $right and $right->isa(__PACKAGE__)) {
        $right->{input} = $left;
    }
    $right;
}

# Read a line and do something to it.  By default it's this dummy
# pass-through function.  Overriding it defines a subclass' behavior
sub transform { shift; shift }

sub readline {
    my $obj = shift;
    my $line;
    return $line = <STDIN> unless defined $obj->{input};

    while (1) {
        $line = $obj->{input}->readline or return;
        return $line if $line = $obj->transform($line);
    }
}

package Cat;
use parent -norequire, 'Stream';

# Dummy, exactly the same as Stream.  Except now we can invoke
# as Cat::ter, instead of Stream::ter, which is not even a word
sub ter { Cat->new(@_) }

package Grep;
use parent -norequire, 'Stream';

sub transform {
    my ($obj, $line) = @_;
    for (@{$obj->{args}}) {
        return $line if ($line =~ $_)
    }
    return;
}

sub per { Grep->new(@_) }

package Tee;
use parent -norequire, 'Stream';
sub er {
    my $obj = Tee->new(@_);
    @{$obj->{tees}} = map { open my $h, '>', $_ or die $_; $h } @{$obj->{args}};
    delete $obj->{args};
    $obj
}

sub transform {
    my ($obj, $line) = @_;
    print $_ $line for @{$obj->{tees}};
    $line;
}

package main;
my $chain = '/etc/services'     # head of chain; omit to use STDIN
    | Cat::ter                  # don't really need this line
    | Grep::per(qr/tcp/)
    | Tee::er('/tmp/t1', '/tmp/t2')
    | Grep::per(qr/170/)
    | Tee::er('/tmp/t3')
    ;

print while $_ = $chain->readline;</lang>