Talk:User defined pipe and redirection operators

Yet another task without a task

This task has several potential areas of activity:

  1. syntax (right to left infix notation)
  2. serialization and deserialization
  3. command chaining
  4. dealing with external programs

Currently, it's not clear which of these areas of activity are desired, and there are at least 15 possibilities (considering only the inclusion or exclusion of each area of activity, and not considering variants within each area). --Rdm 01:45, 13 September 2011 (UTC)

I don't think the task wants any of the above. Seems the goal is to define stream-like objects where each one's output can be taken up by another as input, and the task's focus is to devise a mechanism to drive data through such a chain. BTW, since data flows unidirectionally, it definitely does not require coroutines; all you need to do is have the object at the output end pull data from upstream on demand. The problem of the task: it's asking too much. Tail, head, uniq, sort, grep, wc, file io, subshell, redirect-in, redirect-out, pipe -- it's what, reliving 40 years of unix experience in a flash? --Ledrug 02:04, 13 September 2011 (UTC)
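
(As an aside, the pull-from-upstream-on-demand idea can be illustrated with nothing more than plain generators; a minimal single-threaded sketch, with the stage names invented here and the file name borrowed from the task's sample script:)

<lang python># Minimal pull-on-demand chain: each stage is a generator that only asks
# its upstream for another record when the downstream consumer needs one.
def cat(filename):
    for rec in open(filename):
        yield rec

def grep(pattern, upstream):
    for rec in upstream:
        if pattern in rec:
            yield rec

def head(n, upstream):
    for i, rec in enumerate(upstream):
        if i >= n:
            break
        yield rec

# Nothing runs until this loop starts pulling records through the chain,
# so no stage ever has to cache the whole stream.
for rec in head(4, grep("ALGOL", cat("List_of_computer_scientists.lst"))):
    print(rec, end="")</lang>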
As a general rule, languages already implement routines where one object's output can be taken by another as input. In fact, it's hard to think of any language which implements objects but does not implement something like message passing. Even functional languages let you pass the output of one function to another function. So... that seems a bit trivial? And if you are going to require that one of them starts before the other completes, that gets into time slicing or multiprocessing or co-routines? But some OS pipe implementations (*cough*windows*cough*) buffer the full output from one command before starting the next... --Rdm 10:43, 13 September 2011 (UTC)
I'm looking for "User defined pipe and redirection operators" in particular. However - you are right - how the actual commands are run (sequentially or concurrently) is relevant. {Naively I was thinking nobody would want to actually implement "MSDOS" sequential piping (MSDOS caches the intermediate result in a file, faking piping, without multitasking)... oh... Microsoft did and conquered the world.... sigh}
BTW you don't need to actually do any message passing, or multitasking. It can be done totally within one process using co-procedures.
However (I believe) the "&" operator requires at least threads. And... I just figured out how to define "&" in Algol68... cheers! :-) I'll make the "&" operator a language "kudos" if achieved.
This is starting to sound like an OS CLI implementation task... That said, yes, unix trailing & (as opposed to &&) requires either threading or coroutines. Then again, & could be implemented as "defer this operation until you have nothing else to do" in a single threaded environment -- this is equivalent to a time-slice implementation where backgrounded tasks do not get any resources (or to a perceived behavior similar to that of a time slice implementation under heavy load). --Rdm 13:11, 13 September 2011 (UTC)
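
(A minimal sketch of that "defer until idle" reading of a trailing &, single threaded and with invented names:)

<lang python># Single-threaded "&": a backgrounded job is just a thunk parked on a queue,
# run only once the foreground work has nothing left to do.
background = []

def amp(job):
    # analogue of "job &": defer it instead of running it now
    background.append(job)

amp(lambda: print("backgrounded job done"))
print("foreground job done")

# "idle": drain the deferred jobs
while background:
    background.pop(0)()</lang>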

Adhere to the syntax of the specific language where required, eg the use of brackets and names of operators.

For example: I had to use the operator "=:" instead of the standard "|" pipe-char as the "|" char has a special (and fixed and unchangeable) meaning in Algol68. (I could have used Douglas McIlroy's "^" char for piping maybe? but =: reads better.)

Here is the "Sample shell script", but rewritten in Algol. <lang algol68>PR READ "prelude/general.a68" PR

MODE REC = STRING; FORMAT rec fmt = $g$;

PR READ "Iterator_pipe_operators.a68" PR PR READ "Iterator_pipe_utilities.a68" PR

FLEX[0]STRING aa;

cat (

   head(4,) < "List_of_computer_scientists.lst",
   cat("List_of_computer_scientists.lst") =: grep(ALGOL,) =: tee("ALGOL_pioneers.lst"),
   tail(4,"List_of_computer_scientists.lst")
 ) =: sort =: uniq =: tee("the_important_scientists.lst") =: grep "aa" >> aa;

printf(($"Pioneer: ", $" "g$, aa, $l$)) </lang> I have almost finished, and hope it will take less then 300 lines of code.

So far:

$ wc -l *Iterator_pipe*s.a68
 174 Iterator_pipe_operators.a68
  58 Iterator_pipe_utilities.a68
  20 test_Iterator_pipe_operators.a68
 252 total

This task should be OK in Python, especially the operators, and also Ada. I figure GNU C has a fair chance. C++ should be able to handle the operator overloading.

I'm not familiar enough with other languages to make any real comment. [Ocaml can do anything! (apparently)] :-) ... Go should be real interesting!

BTW: Here is a complete implementation of "tail", notice it uses a sliding window:

<lang algol68>PROC tail yield rec = (INT n, CONJUNCTION args, YIELDREC yield)VOID:
 FOR argn FROM LWB args TO UPB args DO
   INSTREAM rec gen = args[argn];
   CASE rec gen IN
     (FILENAME name): IF LWB args = UPB args THEN yield("==> "+name+" <==") FI
   ESAC;
   [0:n-1]REC window; INT window end := -1;
 # FOR REC rec IN # cat(rec gen)(#) DO #
 ##   (REC rec)VOID:
        window[(window end+:=1) MOD n]:= rec
 # OD #);
   done:
   FOR line FROM window end-n+1 TO window end DO
     IF line>=0 THEN
       yield(window[line MOD n])
     FI
   OD
 OD;

PROC tail = (INT n, CONJUNCTION args)GENREC:
 tail yield rec(n, args ,);

# Define an optional monadic TAIL OPerator #
OP TAIL = (INT n)MANYTOONE: tail(n,);</lang>

Note that this "tail" implementation requires just one argument "n", keeping things simple to satisfy the use of tail in the "Sample shell script". The task is not asking for reinvention of head/tail etc. just enough to run the "sample shell script" while retaining the basic functionality of the cloned shell utility, basically just a proof of concept for a particular language.

Rationale: Pipes appear in a horde of different languages. It always bugs me when a feature is cemented into a language and hence cannot be enhanced. Being such a widespread and useful concept, it would be nice to simply define a few new operators and have piping/redirection available in the new language.

Indeed, having the ability to add pipes & redirections to a language means a coder can evolve the pipe/redirection definition to match the environment. For example, the pipe/redirection operators defined above are "strongly typed" (currently string), hence the compiler will detect data of the wrong type being piped to the wrong "coprocedure" type and report it with a compile-time semantic error, so one unit test script just wrote itself!! (joy).
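
(A rough Python analogue of that compile-time check: annotate the record type and let a static checker such as mypy reject a stage declared for a different record type. This only approximates Algol 68's semantic check, and the names below are invented for illustration.)

<lang python>from typing import Iterator

REC = str  # the record type this pipeline agrees on

def cat(filename: str) -> Iterator[REC]:
    for rec in open(filename):
        yield rec

def summarise(upstream: Iterator[int]) -> Iterator[int]:
    # a stage (deliberately) declared for int records, not REC
    for rec in upstream:
        yield rec + 1

# a static checker flags this: Iterator[str] is not Iterator[int]
bad_pipe = summarise(cat("List_of_computer_scientists.lst"))</lang>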

NevilleDNZ 03:26, 13 September 2011 (UTC)

more ambitious than unix

The task currently says "Pass each record on as soon possible running all filters/procedures concurrently."

But unix does not know anything about records and passes blocks of characters which typically do not represent complete records or complete lines (except in non-portable contexts where the programs have records which match the OS buffer block size). Meanwhile, in a non-multi-tasking language "as soon [as] possible" conflicts with the task requirement "Specifically do not cache the entire stream before the subsequent filter/procedure start".

Also, there's an implicit task "requirement" here, that output be characters. And a secondary implicit task "requirement" here that files be supported (since that's what redirection means), which would also suggest that the task needs to support file reference by name. And, finally, in unix, the commands are (as a general rule) external programs, but it's not clear if this task allows for that kind of implementation. --Rdm 13:17, 13 September 2011 (UTC)

I think the confusion is that the task is using "shell" terminology such as "pipe" & "redirect". This "terminology" created the expectation that the task needs to use actual OS based "pipes", "processes" and shell commands. Basically, the appearance that "shell like stuff" is being done creates the expectation that the task mandates the code to create some kind of "CLI" (command line interpreter). This is not the case.

In essence, what is required is the creation of the operators "<", "|", ">", "<<" & ">>", with the basic plumbing such as "cat" & "tee".

Here are my thoughts, sketched in Python code; note:

  • the code does not require OS pipes
  • neither does it require OS multitasking
  • data is being passed as "rec"; this could be a string, but the key point is that (essentially) in this Python sketch only a one-record buffer is required, and in actuality this is just the argument list itself.

<lang python>#!/usr/bin/env python

class Filter(object):
    # Wraps a generator function so that "|", ">" and ">>" can be
    # defined as ordinary user operators on the wrapped commands.
    def __init__(self, code, *args):
        self.code = code
        self.args = args

    def __call__(self, *args):
        # bind the command's own arguments, e.g. cat("x.lst") or grep("ALGOL")
        return Filter(self.code, *args)

    def __iter__(self):
        # pull records from the wrapped generator, one at a time
        return iter(self.code(*self.args))

    def __or__(self, next_cmd):
        # "|" pipe: the next command reads this command's records as its input
        return Filter(next_cmd.code, *(next_cmd.args + (self,)))

    def __gt__(self, filename):
        # ">" redirect: write the records to a file, truncating it first
        with open(filename, "w") as out:
            for rec in self:
                out.write(rec)

    def __rshift__(self, filename):
        # ">>" redirect: append the records to a file
        with open(filename, "a") as out:
            for rec in self:
                out.write(rec)

def cat_code(*filenames):
    # yield each record (line) of each named file in turn
    for name in filenames:
        for rec in open(name, "r"):
            yield rec

cat = Filter(cat_code)

def grep_code(pattern, source):
    # yield only the records that contain the pattern
    for rec in source:
        if pattern in rec:
            yield rec

grep = Filter(grep_code)

cat("List_of_computer_scientists.lst") | grep("ALGOL") > "ALGOL_pioneers.lst"</lang>

So try not to use actual "Unix pipes" and "Unix processes".

I hope that helps.

NevilleDNZ 22:24, 13 September 2011 (UTC)

Thinking aloud: Python has three or more different ways of achieving this pseudo-piping: decorators, functional programming and sub-classing. However, overloading the operators is most easily done via a subclass. In Algol (I believe) functional programming must be used.

Thinking again: In the case of the above Python sketch, records are processed one at a time, and cat stops while grep does its work on that particular record. So cat & grep are not exactly "concurrent". Is there a name for this kind of programming? I know of examples of this in application/utility programs (e.g. X-Windows and GTK) where the trick is using a "run-loop". A better description might be "collateral programming" (or co-programming) instead of "concurrent programming"?
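
(A tiny experiment that makes the interleaving visible, assuming generator-based stages as in the sketch above; the print calls show cat and grep taking turns on each record rather than running concurrently:)

<lang python># Make the interleaving visible: cat produces one record, grep examines it,
# and only then does cat produce the next one.
def cat(records):
    for rec in records:
        print("cat :", rec)
        yield rec

def grep(pattern, upstream):
    for rec in upstream:
        print("grep:", rec)
        if pattern in rec:
            yield rec

for match in grep("ALGOL", cat(["ALGOL 68", "Python", "ALGOL W"])):
    pass
# prints cat/grep lines strictly alternating, one record at a time</lang>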

NevilleDNZ 00:23, 14 September 2011 (UTC)

It now sounds like you are describing co-routines as an alternative, though other models are also possible. Meanwhile there are a lot of loose ends. Consider, for example, an implementation where it's desirable for grep to pass along empty records when they do not match (a GPU implementation might favor this kind of thing). But since this is implementation by analogy with a few "don't do this the easy way" constraints, it's kind of hard to predict what is going to be acceptable and what is not. --Rdm 01:23, 14 September 2011 (UTC)

ThanX for that, the name "co-routines" sounds very close. But when I read the Wikipedia Coroutines and generators section it seems that "coroutines" require multi-threading. This task does not (necessarily) require multi-threading. A single threaded "generator" operator is enough.

Your idea of a stream of different data types (esp. including empty records) is intriguing. I will take your lead and provide a simple solution in my pet language where the record is the usual string, and a more complex solution where the record type is union(void, string); this tagged union allows the yielding of empty records. Further, the redirection target can be a regular file, but it can also be an array or linked list. This nicely demonstrates a benefit of user-defined pipe and redirection operators.
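
(A rough Python rendering of the same idea, with Optional[str] standing in for union(void, string) and a plain list as the redirection target; the names are invented for illustration:)

<lang python>from typing import Iterator, List, Optional

REC = Optional[str]   # union(void, string): None marks an empty record

def grep_keep_empties(pattern: str, upstream: Iterator[str]) -> Iterator[REC]:
    # pass along an explicit empty record instead of dropping non-matches
    for rec in upstream:
        yield rec if pattern in rec else None

def redirect_into(upstream: Iterator[REC], target: List[REC]) -> None:
    # ">>" into an in-memory sequence rather than a file
    target.extend(upstream)

pioneers: List[REC] = []
redirect_into(grep_keep_empties("ALGOL", iter(["ALGOL 60", "Lisp", "ALGOL 68"])), pioneers)
print(pioneers)   # ['ALGOL 60', None, 'ALGOL 68']</lang>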

Having said the above, I don't think the ability to pipe a generalised record should be a necessary part of the actual task solution for any specific language. Piping variable-length strings is enough to satisfy the task requirements.

BTW: I have been wondering how to naturally take advantage of a GPU and you have given me some ideas. ThanX.

NevilleDNZ 02:15, 14 September 2011 (UTC)

Coroutines and multithreading are completely separate topics. You yourself linked to a coroutine implementation using Duff's device before, which decidedly is a single thread. And for your stream objects, you don't really need either of those anyway. --Ledrug 02:48, 14 September 2011 (UTC)

Indeed you are right. On reflection, linking to that page on C coroutines was a mistake. Duff's device is kind of extreme. (Interesting, but extreme!) In fact I've renamed the Algol routines to "Iterator_pipe_operators.a68". I'll drop the reference to co-processing from the Task Description too. ThanX

NevilleDNZ 03:50, 14 September 2011 (UTC)