Talk:User defined pipe and redirection operators

 
The task page currently references "List_of_computer_scientists.lst - cut from wikipedia" but without any url. I can find [[wp:List of computer scientists]] but it's not clear to me that the corresponding html of that page is the intended starting point. Perhaps the task could be more specific here? --[[User:Rdm|Rdm]] 14:16, 14 September 2011 (UTC)
 
I've just added a 'test sample of scientists from wikipedia's "List of computer scientists"' table.
 
As this task is still in "Draft", I am keen to find a "classic shell script" that showcases the important features of piping {cat, tee, misc filters} and the various redirections. Any suggestions?
 
Also the "List of computer scientists" is a bit random and the nature of the records is that ''sort | uniq'' filters are not required. Any suggestions about more appropriate data. Maybe we could use "[[Old lady swallowed a fly]]" to generate the test data as the lyrics are wonderfully repetitious? Also the code to generate the test data appears to be suitably brief.
 
[[User:NevilleDNZ|NevilleDNZ]] 22:32, 17 September 2011 (UTC)
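
For what it's worth, a minimal sketch of such a generator (Python, with a hypothetical animal list; cumulative verses, so ''sort | uniq'' genuinely has duplicate lines to remove) might look like:
<lang python># Hypothetical sketch: generate repetitious test data in the style of
# "There was an old lady who swallowed a fly", so that downstream
# 'sort | uniq' style filters actually have duplicates to remove.
animals = ["fly", "spider", "bird", "cat", "dog", "cow", "horse"]

def verses(animals):
    """Yield one line per 'swallowed' statement; later verses repeat earlier lines."""
    for i, animal in enumerate(animals):
        yield "There was an old lady who swallowed a %s" % animal
        # each verse recapitulates all previously swallowed animals
        for j in range(i, 0, -1):
            yield "She swallowed the %s to catch the %s" % (animals[j], animals[j - 1])

if __name__ == "__main__":
    for line in verses(animals):
        print(line)</lang>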
 
:Here's one with some history: [http://books.google.com/books?id=jO-iKwPRX0QC&lpg=PT126&ots=xiWAzX9HDK&dq=knuth%20bell%20lab%20challenge&ie=ISO-8859-1&pg=PT126#v=onepage&q&f=false]. --[[User:Ledrug|Ledrug]] 23:55, 17 September 2011 (UTC)
 
== comments after writing the J implementation ==
::: I disagree about the pipe-like dataflow being not useful. For example, when reading or writing a PDF, a PDF stream object may go through multiple encoders/filters in sequence, e.g. PNG, then hexencoded, then RLE, then flate, and each stream itself may contain multiple streams inside (not unlike tar xf). When dealing with one of these, it's probably a good idea to use filters that pass on partial data as soon as possible so that total memory usage doesn't get blown out of proportion. It has ''nothing'' to do with OS pipes, either.
::: The beef you have is probably more of a J thing: assume we'll all have massively parallel computers in the near future; assume we'll always have enough memory; thus always deal with the full extent of available data because it's the 'right thing' and will naturally lead to better-looking, more concise code. Which is probably ok for academia, but it's not fair to blame everything else that falls outside of academic scope. --[[User:Ledrug|Ledrug]] 19:27, 16 September 2011 (UTC)
:::: I do not know enough about PDF internals to contemplate that one in any depth. But my impression of PDF is that it represents a sequence of pages. So I think the logical architecture there would be to produce each page independently. Or, if there is intermediate state on one page that's needed for a subsequent page, the logical architecture would be to produce each page sequentially. Or perhaps you split that in two and build up a document structure that's independent of the pages and then build the pages independently... But I am not aware of any requirement for an arbitrarily sized buffer which has nothing to do with the structure of the data being processed. That's what pipes do -- and it has no algorithmic value that I'm aware of. Mind you, I have seen PDF implementations which produce arbitrary rectangles on a page rather than the whole page. But that does not seem like much of a win when I am reading the thing -- usually it just means extra waits imposed on me before I can finish reading a page, and even when I am reading a PDF on my phone there seems to be enough memory to cache a complete image of the page at least for a short while. And, even if "rectangle smaller than a page" was an algorithmic requirement, that's still different from a pipe -- your page winds up being a collection of rectangles and your implementation knows how big they are. With pipes, the buffer size is imposed by the OS and can be subject to change for reasons that have nothing to do with your implementation. --[[User:Rdm|Rdm]] 19:41, 16 September 2011 (UTC)
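
For concreteness, the kind of filter chain being discussed ("pass on partial data as soon as possible", independent of OS pipes) can be sketched with ordinary generators. The following Python fragment is only a hypothetical illustration and is not taken from any PDF library:
<lang python>import zlib

# Hypothetical sketch of chained streaming filters: each stage consumes and
# yields chunks as they become available, so memory use stays proportional
# to the chunk size rather than to the whole stream -- no OS pipe involved.
def chunks(data, size=4096):
    for i in range(0, len(data), size):
        yield data[i:i + size]

def hex_decode(stream):
    for chunk in stream:
        yield bytes.fromhex(chunk.decode("ascii"))

def flate_decode(stream):
    d = zlib.decompressobj()
    for chunk in stream:
        yield d.decompress(chunk)
    yield d.flush()

if __name__ == "__main__":
    raw = zlib.compress(b"some page content " * 1000)
    encoded = raw.hex().encode("ascii")          # flate-compressed, then hex-encoded
    pipeline = flate_decode(hex_decode(chunks(encoded)))
    print(sum(len(c) for c in pipeline), "bytes decoded incrementally")</lang>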
 
== Scoping ==
 
Right now, the task description is long and difficult to read. It also prescribes a lot of work. So for the Tcl implementation I didn't bother doing all the syntactic parts and instead focussed on the concept of a pipeline as a sequence of items (keeping redirections as their own pipeline elements). That gives a lot of bang for the buck for very little effort. What I don't know is whether this short-cutting is acceptable as a solution, which indicates that the scoping/description of the task isn't quite right yet IMO.
 
For the record, my test case was this:
<lang tcl>pipeline cat bigfile.txt | grep "foo" | sort | head 5 | >> /dev/tty</lang>
Except with some minor changes (I used a real file and searched for something that I knew was there on about 1% of lines). –[[User:Dkf|Donal Fellows]] 10:56, 20 September 2011 (UTC)
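
For comparison, the "pipeline as a sequence of elements, with redirections as elements of their own" idea can also be sketched outside Tcl. The following Python fragment is purely illustrative (made-up filter names, no relation to the Tcl solution):
<lang python># Illustrative sketch only (not the Tcl code): a pipeline is a plain
# sequence of elements; filters transform an iterable of lines, and a
# redirection is just another element that writes the lines somewhere.
def grep(pattern):
    return lambda lines: (l for l in lines if pattern in l)

def head(n):
    return lambda lines: (l for i, l in enumerate(lines) if i < n)

def append_to(path):                 # stands in for ">> path"
    def write(lines):
        with open(path, "a") as f:
            for l in lines:
                f.write(l + "\n")
        return []
    return write

def run_pipeline(source, *elements):
    lines = iter(source)
    for element in elements:
        lines = element(lines)
    return list(lines)

# roughly: cat data | grep "foo" | sort | head 5 | >> copy.txt
data = ["foo three", "bar", "foo one", "foo two"]
run_pipeline(data, grep("foo"), sorted, head(5), append_to("copy.txt"))</lang>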
 
I have to agree, so I 'reduced task description...' by removing the wikipedia cut/paste of "[[wp:Pipeline_(Unix)|Pipeline_(Unix)]]", as it didn't seem to add much value.
 
I'm still keen to see a degree of parallelism in the piping in the test case, e.g. `cmd1;cmd2...`, but strictly speaking this isn't a pipe or a redirection operator! Maybe it is worth putting conjunctions of pipes in another task...
 
[[User:NevilleDNZ|NevilleDNZ]] 12:53, 20 September 2011 (UTC)
 
:This comment reminds me of: http://jlouisramblings.blogspot.com/2011/07/erlangs-parallelism-is-not-parallelism.html -- or, from my point of view: we talk about "concurrency" and "parallelism" as if they were simple things when in fact they can represent a range of concepts, many of which are only loosely related to each other. Meanwhile, depending on the application, some of those concepts can be undesirable even though others are desirable. All of which can matter when dealing with issues of scope and practicality. --[[User:Rdm|Rdm]] 17:20, 21 September 2011 (UTC)
 
== Simplify description ==
 
Maybe the task description should simply read:
 
Create "user defined" the equivalents of the Unix shell "<", "|", ">", "<<", ">>" and $(cmd) operators and demonstate their operation.
 
Languages that do not support user defined operators should be omitted.
 
[[User:Markhobley|Markhobley]] 22:51, 9 February 2013 (UTC)
 
: That's not enough -- equivalence is contextual. Here, I can easily identify three different forms of equivalence, each of which allows a variety of variations even without considering optional combinations with the others.
 
:: Syntactic equivalence (ordering of the components)
 
:: Functional equivalence (results after execution)
 
:: Implementation equivalence (for example: similar buffer size and structure, use of fork(), ...)
 
: One problem, here, is that the interpretation favored on this site (functional equivalence) is trivial in the context of most programming languages - most of these operators wind up being "put a result somewhere". So, to avoid the triviality of this task we might be inclined to favor an implementation equivalence (perhaps requiring buffer sizes smaller than multiple gigabytes, which in turn means buffers are required in the implementation), but where do you draw the line? For example, is it important to implement a process scheduler as a part of this task? [probably not, but what about the dozens of other facets of how I have used these Unix mechanisms?]. --[[User:Rdm|Rdm]] 03:33, 10 February 2013 (UTC)
 
:: I think just go for the trivial context for the purpose of this task, i.e. "put a result somewhere" is fine. For more complicated scenarios create a new task, e.g. "User defined pipe and redirection operators/With buffering", or "User defined pipe and redirection operators/With scheduler".
[[User:Markhobley|Markhobley]] 07:24, 10 February 2013 (UTC)
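
For what it's worth, that trivial reading can be sketched with plain operator overloading. The following Python fragment is hypothetical (not part of any existing solution): user-defined ''|'', ''>'' and ''>>'' that simply "put a result somewhere":
<lang python># Hypothetical sketch only: user-defined |, > and >> in the trivial
# "put a result somewhere" sense -- no buffering, fork() or scheduler.
class Pipe:
    def __init__(self, lines):
        self.lines = list(lines)

    def __or__(self, filt):                 # pipe | filter
        return Pipe(filt(self.lines))

    def __gt__(self, path):                 # pipe > "file"   (truncate)
        with open(path, "w") as f:
            f.writelines(l + "\n" for l in self.lines)

    def __rshift__(self, path):             # pipe >> "file"  (append)
        with open(path, "a") as f:
            f.writelines(l + "\n" for l in self.lines)

def cat(path):                              # plays the role of "< file"
    with open(path) as f:
        return Pipe(l.rstrip("\n") for l in f)

def grep(pattern):
    return lambda lines: [l for l in lines if pattern in l]

# set up a small input file, then roughly: cat in.txt | grep foo | sort > out.txt
with open("in.txt", "w") as f:
    f.write("foo two\nbar\nfoo one\n")
result = cat("in.txt") | grep("foo") | sorted
result > "out.txt"                          # overwrite out.txt
result >> "out.txt"                         # append the same lines again</lang>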
 
== Different grep? ==
Perhaps the task details should be less "Algol oriented". The "Algol pioneers" count could be replaced by something more generic - "compiler" maybe? --[[User:Tigerofdarkness|Tigerofdarkness]] ([[User talk:Tigerofdarkness|talk]]) 11:55, 4 June 2017 (UTC)