Talk:User defined pipe and redirection operators

FORMAT rec fmt = $g$;
 
PR READ "Iterator_pipe_operators.a68" PR
PR READ "Iterator_pipe_utilities.a68" PR
 
FLEX[0]STRING aa;
''' So far:'''
<pre>
$ wc -l *Iterator_pipe*s.a68
174 Iterator_pipe_operators.a68
58 Iterator_pipe_utilities.a68
20 test_Iterator_pipe_operators.a68
252 total
</pre>
[[User:NevilleDNZ|NevilleDNZ]] 02:15, 14 September 2011 (UTC)
: Coroutines and multithreading are completely separate topics. You yourself linked to coroutine implementation using Duff's device before, which decidedly is a single thread. And for your stream objects, you don't really need either of those anyway. --[[User:Ledrug|Ledrug]] 02:48, 14 September 2011 (UTC)
 
Indeed you are right. On reflection, linking to that page on [http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html C coroutines] was a mistake. Duff's device is kind of extreme. (Interesting, but extreme!) In fact I've renamed the Algol routines to "Iterator_pipe_operators.a68". I'll drop reference to co-processing out of the ''Task Description'' too. ThanX
 
[[User:NevilleDNZ|NevilleDNZ]] 03:50, 14 September 2011 (UTC)
 
:Looking at the current task, nothing in the task depends on data handling. In fact, I could not see any issues where I could distinguish between "all at once" handling and "buffer block at a time" handling. In some cases (for example, sort), it's simply not possible to produce any output until after the end of file is reached. In other cases (cat) the command is a no-op. In the remaining cases (grep, tee), the data set being produced is too small to fill a buffer block. --[[User:Rdm|Rdm]] 14:10, 16 September 2011 (UTC)
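To make the distinction above concrete, here is a minimal Python sketch. It is not part of the task; the names <code>grep_lines</code>, <code>sort_lines</code> and <code>traced</code>, and the sample data, are made up for illustration. A grep-like filter can hand on each line as it arrives, whereas a sort-like filter has to consume all of its input before emitting anything:

```python
from typing import Iterable, Iterator, List


def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Streaming filter: yields each matching line as soon as it is seen."""
    for line in lines:
        if pattern in line:
            yield line


def sort_lines(lines: Iterable[str]) -> Iterator[str]:
    """Buffering filter: must read all input before the first output."""
    yield from sorted(lines)


def traced(lines: Iterable[str], log: List[str]) -> Iterator[str]:
    """Hypothetical helper: records each line as it is pulled from the source."""
    for line in lines:
        log.append(line)
        yield line


data = ["Wirth", "Backus", "Naur", "Knuth"]

# Ask each filter for just its first output and count how much input it read.
grep_log: List[str] = []
first_match = next(grep_lines("a", traced(data, grep_log)))
streaming_reads = len(grep_log)

sort_log: List[str] = []
first_sorted = next(sort_lines(traced(data, sort_log)))
buffered_reads = len(sort_log)
```

With only four short lines the difference is invisible in practice, which is the point made above: the sample data is too small to tell the two styles apart.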
 
== test data? ==
 
The task page currently references "List_of_computer_scientists.lst - cut from wikipedia" but without any url. I can find [[wp:List of computer scientists]] but it's not clear to me that the corresponding html of that page is the intended starting point. Perhaps the task could be more specific here? --[[User:Rdm|Rdm]] 14:16, 14 September 2011 (UTC)
 
I've just added 'A test sample of scientists from wikipedia's "List of computer scientists"' table.
 
As this task is still in "Draft" I am keen to find a "classic shell script" that showcases the important features of piping {cat,tee,misc filters} and the various redirections. Any suggestions?
 
Also the "List of computer scientists" is a bit random and the nature of the records is that ''sort | uniq'' filters are not required. Any suggestions about more appropriate data? Maybe we could use "[[Old lady swallowed a fly]]" to generate the test data as the lyrics are wonderfully repetitious? Also the code to generate the test data appears to be suitably brief.
 
[[User:NevilleDNZ|NevilleDNZ]] 22:32, 17 September 2011 (UTC)
 
:Here's one with some history: [http://books.google.com/books?id=jO-iKwPRX0QC&lpg=PT126&ots=xiWAzX9HDK&dq=knuth%20bell%20lab%20challenge&ie=ISO-8859-1&pg=PT126#v=onepage&q&f=false]. --[[User:Ledrug|Ledrug]] 23:55, 17 September 2011 (UTC)
 
== comments after writing the J implementation ==
 
I think it's worth noting that by following the dictates of the task -- which specify "how" the algorithm should be implemented -- we get code an order of magnitude bulkier (and an order of magnitude harder to read, and measurably slower) than an implementation which ignores those dictates.
 
I have enjoyed my time here, on rosettacode, when we have tasks which allow me to implement them. I have found the "tasks" which tell me how I am allowed to solve the problem much less enjoyable, because those kinds of constraints almost universally keep me from doing "the right thing". And, this is not my idea of fun.
 
In this case, we have syntactic requirements (why?) and we have dataflow requirements (why?) and for illustration we have an example where none of these requirements matter at all (except in terms of code complexity).
 
I think that when we have an example which does not illustrate the task (or, worse, where the task asks us to implement no example) we have something which no one really needs. If there was a need, there would be a clear example which relates to those needs. I think that the absence of any good example is a symptom of a task that needs to be replaced.
 
And, the same goes for tasks which specify "allowed techniques". In my opinion, these are not the sort of thing that anyone trying to solve a problem in a language should reach for. When we disallow better solutions we are, by definition, asking for a suboptimal implementation. Anyone serious about using a computer to solve a problem should probably avoid any implementations taken from tasks which prohibit better solutions.
 
At minimum, I think we should put all tasks which impose constraints on how the task is implemented in a category which would warn people that they should consider alternative techniques if they need to solve a related problem. --[[User:Rdm|Rdm]] 15:46, 16 September 2011 (UTC)
 
Of course, there are cases where the techniques used in this task are useful. Unfortunately, this task is not currently one of those cases. --[[User:Rdm|Rdm]] 15:46, 16 September 2011 (UTC)
 
:Then let's hope the task stays as draft until you've had time to discuss this with the original author, as others have made similar comments. I read the task and decided to wait before starting on a Python solution. --[[User:Paddy3118|Paddy3118]] 15:58, 16 September 2011 (UTC)
 
:P.S. What would you change the task to? (It's a collaborative site after all) --[[User:Paddy3118|Paddy3118]] 16:00, 16 September 2011 (UTC)
 
::Overall? I do not know. But I see three different tasks here:
::# Composing operations
::# Syntax declaration
::# Dataflow management
::Of those three, syntax declaration seems the most useless. At the very least, I see no reason to impose "left to right" processing on all languages. So I would be inclined to discard the syntactic requirements.
::Of the remaining two, composition should be trivial, for any language represented here. But that might be worthy of a task, or at least a reference to an existing task. (And we probably do have something like that already posted.)
::That leaves data flow manipulation. But the dataflow used in pipes is something of a hack -- it is neither reliable, nor algorithmically useful. You mostly see it in action because file systems are so slow, but depending on what you are doing you still might need to wait hours before getting meaningful results...
::Of course, composability matters -- it [http://blog.dbpatterson.com/post/10244529137 matters a lot] -- but it's scarce at the OS level, for some operating systems, and is readily available in the context of programming languages.
::Similarly, good data flow also matters... and is also readily available in the context of programming languages (as long as you do not try to make one language behave exactly like another).
::So... personally? I would not bother changing the task -- I would just file it somewhere out of the way. It's not like there's any shortage of problems that need solving. --[[User:Rdm|Rdm]] 16:56, 16 September 2011 (UTC)
::: I disagree about the pipe-like dataflow being not useful. For example, when reading or writing a PDF, a PDF stream object may go through multiple encoders/filters in sequence, e.g. PNG, then hexencoded, then RLE, then flate, and each stream itself may contain multiple streams inside (not unlike tar xf). When dealing with one of these, it's probably a good idea to use filters that pass on partial data as soon as possible so that total memory usage doesn't get blown out of proportion. It has ''nothing'' to do with OS pipes, either.
::: The beef you have is probably more of a J thing: assume we'll all have massively parallel computers in the near future; assume we'll always have enough memory; thus always deal with the full extent of available data because it's the 'right thing' and will naturally lead to better-looking, more concise code. Which is probably ok for academia, but it's not fair to blame everything else that falls outside of academic scope. --[[User:Ledrug|Ledrug]] 19:27, 16 September 2011 (UTC)
:::: I do not know enough about PDF internals to contemplate that one in any depth. But my impression of PDF is that it represents a sequence of pages. So I think the logical architecture there would be to produce each page independently. Or, if there is intermediate state on one page that's needed for a subsequent page, the logical architecture would be to produce each page sequentially. Or perhaps you split that in two and build up a document structure that's independent of the pages and then build the pages independently... But I am not aware of any requirement for an arbitrarily sized buffer which has nothing to do with the structure of the data being processed. That's what pipes do -- and it has no algorithmic value that I'm aware of. Mind you, I have seen PDF implementations which produce arbitrary rectangles on a page rather than the whole page. But that does not seem like much of a win when I am reading the thing -- usually it just means extra waits imposed on me before I can finish reading a page, and even when I am reading a PDF on my phone there seems to be enough memory to cache a complete image of the page at least for a short while. And, even if "rectangle smaller than a page" was an algorithmic requirement, that's still different from a pipe -- your page winds up being a collection of rectangles and your implementation knows how big they are. With pipes, the buffer size is imposed by the OS and can be subject to change for reasons that have nothing to do with your implementation. --[[User:Rdm|Rdm]] 19:41, 16 September 2011 (UTC)
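The chained-filter case raised above can be sketched in Python as follows. This is an illustration under stated assumptions, not code from any PDF library: the stage names <code>chunks</code>, <code>inflate</code> and <code>unhex</code>, the 64-byte block size and the sample payload are all invented here, and only two of the encodings mentioned (hex and flate) are modelled. Each stage passes on partial data as soon as it has any, and the block size is chosen by the program rather than imposed by an OS pipe:

```python
import binascii
import zlib
from typing import Iterable, Iterator


def chunks(data: bytes, size: int) -> Iterator[bytes]:
    """Split data into fixed-size blocks, standing in for reads from a file."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def inflate(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress zlib data block by block, passing on output as it appears."""
    decoder = zlib.decompressobj()
    for block in blocks:
        out = decoder.decompress(block)
        if out:
            yield out
    tail = decoder.flush()
    if tail:
        yield tail


def unhex(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Decode hex digits block by block, carrying an odd digit to the next block."""
    pending = b""
    for block in blocks:
        pending += block
        usable = len(pending) - len(pending) % 2
        if usable:
            yield binascii.unhexlify(pending[:usable])
        pending = pending[usable:]
    if pending:
        raise ValueError("odd number of hex digits in input")


# Encode: hex first, then deflate. Decode: the same filters in reverse order.
payload = b"stream data " * 1000
encoded = zlib.compress(binascii.hexlify(payload))
decoded = b"".join(unhex(inflate(chunks(encoded, 64))))
```

Only the input to <code>inflate</code> is bounded in this sketch; limiting the output produced per step would need the <code>max_length</code> argument of <code>decompress</code>, which is left out to keep the example short.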
 
== Scoping ==
 
Right now, the task description is long and difficult to read. It also prescribes a lot of work. So for the Tcl implementation I didn't bother doing all the syntactic parts and instead focussed on the concept of a pipeline as a sequence of items (keeping redirections as their own pipeline elements). That gives a lot of bang for the buck yet with very little effort. What I don't know is whether this short-cutting is acceptable as a solution, which indicates that the scoping/description of the task isn't quite right yet IMO.
 
For the record, my test case was this:
<lang tcl>pipeline cat bigfile.txt | grep "foo" | sort | head 5 | >> /dev/tty</lang>
Except with some minor changes (I used a real file and searched for something that I knew was there on about 1% of lines). –[[User:Dkf|Donal Fellows]] 10:56, 20 September 2011 (UTC)
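For comparison, the same idea (a pipeline as a plain sequence of stages, with the redirection as just another stage) might look like this in Python. This is a rough sketch rather than a translation of the Tcl solution; every name in it (<code>cat</code>, <code>grep</code>, <code>sort</code>, <code>head</code>, <code>append_to</code>, <code>pipeline</code>) is defined here for illustration, and an in-memory list and <code>io.StringIO</code> stand in for the file and the terminal:

```python
import io
import itertools
from typing import Callable, Iterable, Iterator, TextIO

Stage = Callable[[Iterable[str]], Iterator[str]]


def cat(lines: Iterable[str]) -> Stage:
    """Source stage: ignores its upstream and emits the given lines."""
    def stage(_upstream: Iterable[str]) -> Iterator[str]:
        yield from lines
    return stage


def grep(pattern: str) -> Stage:
    def stage(upstream: Iterable[str]) -> Iterator[str]:
        return (line for line in upstream if pattern in line)
    return stage


def sort() -> Stage:
    def stage(upstream: Iterable[str]) -> Iterator[str]:
        yield from sorted(upstream)
    return stage


def head(count: int) -> Stage:
    def stage(upstream: Iterable[str]) -> Iterator[str]:
        return itertools.islice(upstream, count)
    return stage


def append_to(sink: TextIO) -> Stage:
    """Redirection as an ordinary stage: writes every line, emits nothing."""
    def stage(upstream: Iterable[str]) -> Iterator[str]:
        for line in upstream:
            sink.write(line + "\n")
        yield from ()
    return stage


def pipeline(*stages: Stage) -> None:
    """Chain the stages left to right, then drain the last one."""
    stream: Iterable[str] = iter(())
    for stage in stages:
        stream = stage(stream)
    for _ in stream:
        pass


tty = io.StringIO()
pipeline(cat(["foo 3", "bar", "foo 1", "foo 2"]),
         grep("foo"), sort(), head(2), append_to(tty))
```

No operator syntax is involved here, which is exactly the short cut being asked about.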
 
I have to agree, so I 'reduced task description...' by removing the wikipedia cut/paste of "[[wp:Pipeline_(Unix)|Pipeline_(Unix)]]" as it didn't seem to add much value.
 
I'm still keen to see a degree of parallelism in the piping in the test case, e.g. `cmd1;cmd2...` But strictly speaking this isn't a pipe or a redirection operator! Maybe it is worth putting conjunctions of pipes in another task...
 
[[User:NevilleDNZ|NevilleDNZ]] 12:53, 20 September 2011 (UTC)
 
:This comment reminds me of: http://jlouisramblings.blogspot.com/2011/07/erlangs-parallelism-is-not-parallelism.html -- or, from my point of view: we talk about "concurrency" and "parallelism" as if they were simple things when in fact they can represent a range of concepts, many of which are only loosely related to each other. Meanwhile, depending on the application, some of those concepts can be undesirable even though others are desirable. All of which can matter when dealing with issues of scope and practicality. --[[User:Rdm|Rdm]] 17:20, 21 September 2011 (UTC)
 
== Simplify description ==
 
Maybe the task description should simply read:
 
Create "user defined" equivalents of the Unix shell "<", "|", ">", "<<", ">>" and $(cmd) operators and demonstrate their operation.
 
Languages that do not support user defined operators should be omitted.
 
[[User:Markhobley|Markhobley]] 22:51, 9 February 2013 (UTC)
 
: That's not enough -- equivalence is contextual. Here, I can easily identify three different forms of equivalence, each of which allows a variety of variations even without considering optional combinations with the others.
 
:: Syntactic equivalence (ordering of the components)
 
:: Functional equivalence (results after execution)
 
:: Implementation equivalence (for example: similar buffer size and structure, use of fork(), ...)
 
: One problem, here, is that the interpretation favored on this site (functional equivalence) is trivial in the context of most programming languages - most of these operators wind up being "put a result somewhere". So, to avoid the triviality of this task we might be inclined to favor an implementation equivalence (perhaps, including: buffer sizes less than multiple gigabytes in size, which in turn means buffers are required in the implementation), but where do you draw the line? For example, is it important to implement a process scheduler as a part of this task? [probably not, but what about the dozens of other facets of how I have used these unix mechanisms?]. --[[User:Rdm|Rdm]] 03:33, 10 February 2013 (UTC)
 
:: I think just go for the trivial context for the purpose of this task, i.e. "put a result somewhere" is fine. For more complicated scenarios create a new task, e.g. "User defined pipe and redirection operators/With buffering", or "User defined pipe and redirection operators/With scheduler".
[[User:Markhobley|Markhobley]] 07:24, 10 February 2013 (UTC)
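Under the "put a result somewhere" reading, a language with operator overloading needs very little code. The following Python sketch is only an illustration of that reading: the <code>Stream</code> class and <code>grep</code> helper are invented here, lists stand in for files, and only "|", ">" and ">>" are covered ("<", "<<" and $(cmd) are left out):

```python
from typing import Callable, Iterable, List

Filter = Callable[[Iterable[str]], Iterable[str]]


class Stream:
    """Wraps an iterable of lines so that shell-like operators can be applied."""

    def __init__(self, lines: Iterable[str]) -> None:
        self.lines = lines

    def __or__(self, filt: Filter) -> "Stream":
        # stream | filter: feed the lines through the filter
        return Stream(filt(self.lines))

    def __gt__(self, sink: List[str]) -> List[str]:
        # stream > sink: overwrite the sink (a list standing in for a file)
        sink[:] = list(self.lines)
        return sink

    def __rshift__(self, sink: List[str]) -> List[str]:
        # stream >> sink: append to the sink
        sink.extend(self.lines)
        return sink


def grep(pattern: str) -> Filter:
    return lambda lines: (line for line in lines if pattern in line)


out = ["stale contents"]
Stream(["ALGOL 60", "COBOL", "ALGOL 68"]) | grep("ALGOL") > out

# In Python ">>" binds tighter than "|", so the pipe must be parenthesised.
(Stream(["FORTRAN", "COBOL"]) | sorted) >> out
```

The need for parentheses in the last line shows where syntactic equivalence and functional equivalence part ways: the results match the shell's, but the operator precedence is the host language's.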
 
== Different grep ? ==
Perhaps the task details should be less "Algol oriented". The "Algol pioneers" count could be replaced by something more generic - "compiler" maybe? --[[User:Tigerofdarkness|Tigerofdarkness]] ([[User talk:Tigerofdarkness|talk]]) 11:55, 4 June 2017 (UTC)