Talk:Ordered words: Difference between revisions

Revision as of 19:18, 7 June 2019

Revision as of 06:11, 16 July 2012: Walterpachl (→A bug (which was not really a bug) in Rexx solution: fast and readable)
Latest revision as of 19:18, 7 June 2019: rosettacode>Balrog (→No Dictionary?: possible solution)
(30 intermediate revisions by 5 users not shown)
::Indeed the task description should state either
::any lexicographically ordered dictionary such as ... (instead of 'this dictionary')
:::tasks should not be geared to one input, should they?
::or (better yet??) drop the requirement for it to be ordered.

I dare not change your program.

: The misspelling wasn't in the program, but the REXX language entry section header. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 23:06, 13 August 2016 (UTC)

: I had programmed the REXX example to expect a lexicographically ordered word list. I corrected the error. -- [[User:Gerard Schildberger|Gerard Schildberger]] 06:42, 14 July 2012 (UTC)

*********************************************************************/
</lang>
As for me, I like to program for people (reading) with a limited line length.
:now I see what you mean by your recurring elimination of dead code
:which is actually avoiding empty lines created by the formatter and
:makes copy/paste harder for me.

:: No, I was referring to dead '''code''', not whitespace. The code that I was referring to was the '''Parse Arg a''' REXX statement, which I see is no longer present in the above example, but it is back in the benchmark program. -- [[User:Gerard Schildberger|Gerard Schildberger]] 20:53, 16 July 2012 (UTC)

:: By the way, the above boxed REXX example for the '''uppercase''' subroutine/function is the best yet, a straight one-liner, albeit that it's really a two-liner. -- [[User:Gerard Schildberger|Gerard Schildberger]] 20:53, 16 July 2012 (UTC)

And finally, can you provide your benchmark results for the strict comparison case?
::--[[User:Walterpachl|Walterpachl]] 06:11, 16 July 2012 (UTC)

Interesting: for ooRexx it's a mere 30%
<lang rexx>
Call time 'R'
Do i=1 to 10000000
  x=uppercase('Wölter')
  End
Say 'oneliner: ' i time('E')
Call time 'R'
Do i=1 to 10000000
  x=uppercase2('Wölter')
  End
Say 'Procedure:' I time('E')
Exit

uppercase: return translate(changestr("ß",translate(arg(1),'ÄÖÜ',"äöü"),'SS'))

uppercase2: Procedure
  Parse Arg a
  a=translate(arg(1),'ÄÖÜ',"äöü")    /* translate lowercase umlaute */
  a=changestr("ß",a,'SS')            /* replace ß with SS           */
  return translate(a)                /* translate lowercase letters */
</lang>
<pre>
oneliner:  10000001 11.731000
Procedure: 10000001 16.029000
</pre>
:please post your benchmark --[[User:Walterpachl|Walterpachl]] 06:45, 16 July 2012 (UTC)

-----

I can't speak for ooRexx as I don't have a copy to test it. I don't like to use words like ''mere'', which preload a judgement. 30% of a 20 hour run is an extra '''<sup>1</sup>/<sub>4</sub>''' day (this would be in regards to that 82 million record "database").

I took your program ''as is'' and ran it on my isolated computer (no internet connection, no active anti-virus protection programs running, etc.; it's a 3.20 GHz box and is running all four processors with five 100%-CPU-bound unrelated programs on below-normal priority), and the results are:
<pre>
oneliner:  10000001 13.088000
Procedure: 10000001 60.223000
</pre>
Then, just to show what the REXX overhead is for processing a "normal" '''do''' loop, I replaced the
<lang rexx>do i=1 to 10000000</lang>
with
<lang rexx>i=10000000
do i</lang>
and the results are:
<pre>
oneliner:  10000001 13.412000
Procedure: 10000001 60.061000
</pre>
Considering that the bulk of the execution time is spent in the subroutines, it's noteworthy; the difference is the way REXX handles incrementing a '''do''' loop index (and testing for termination of same).

<br>I then made the program compliant by adding a <lang>/*REXX*/</lang> statement to the front of the program.

<br>Code was then added for:
* user-specifiable (number of) times to repeat the loop
* added "greasers" (have REXX allocate stuff so SUBs don't have to)
* force REXX interpreter to read entire file (mostly for older REXXes)
* a repetition of both invocations to eliminate snowplowing
* use of FOR instead of TO in DO loops for faster execution
* disallowing the caching effect for "small" loops
* made invocations unique by using unique passed arguments
* eliminating piggy-backing by not using the same variables
* show the version of the REXX interpreter
<br>Here's the code that was used for the 3rd benchmark:
<lang rexx>/*REXX*/ parse version _;   say 'version:' _;   say
arg times .
if times==''  then times=10 * 1000000  /*ten million times, yikes ! */
call time 'R'                          /*just grease the wheels a bit*/
call uppercase;  call uppercase2       /*force some REXXes to get 'em*/
x=0;  y=0;  j=0;  k=0                  /*have REXX allocate variables*/

    do 2
    call time 'R'     /*────────────────────reset the REXX timer*/
                 do j=1  for times
                 x=uppercase(j 'w÷lter')
                 end
    say 'oneliner: '  j  time('e')

    call time 'R'     /*────────────────────reset the REXX timer*/
                 do k=1  for times
                 y=uppercase2(k 'w÷lter')
                 end
    say 'procedure:'  k  time('e')
    end
exit
/*──────────────────────────────────UPPERCASE subroutine─────────────*/
uppercase: return translate(changestr("ß",translate(arg(1),'ÄÖÜ',"äöü"),'SS'))
/*──────────────────────────────────UPPERCASE2 subroutine────────────*/
uppercase2: procedure
parse arg a     /*<-------------------------------------- dead code. */
a=translate(arg(1),'ÄÖÜ',"äöü")        /* translate lowercase umlaute */
a=changestr("ß",a,'SS')                /* replace ß with SS           */
return translate(a)                    /* translate lowercase letters */</lang>
and the results were:
<pre>
version: REXX-Regina_3.6(MT) 5.00 31 Dec 2011

oneliner:  10000001 15.605000
Procedure: 10000001 63.023000
oneliner:  10000001 15.420000
Procedure: 10000001 63.106000
</pre>
<br>More work should be done on the benchmark REXX program(s), but there's only so much time in a day...
-- [[User:Gerard Schildberger|Gerard Schildberger]] 20:53, 16 July 2012 (UTC)

... and it's not worth doing it

ooRexx results:
<pre>
version: REXX-ooRexx_4.0.1(MT) 6.03 2 May 2010

oneliner:  10000001 16.271000
procedure: 10000001 18.751000
oneliner:  10000001 15.881000
procedure: 10000001 18.767000
</pre>
As to ''mere'': I'd assume that the program does some other stuff in those 20 hours, so the 20 % cost shown here would amount to how many minutes?
--[[User:Walterpachl|Walterpachl]] 05:28, 17 July 2012 (UTC)

-----

It's a moot point, as I '''cannot''' use ooRexx. So, the question is, is the ''procedure'' version worth four times the execution time in the REXX that I have to use? The answer to your question is: no, anything more than twenty hours is too long, the run takes long enough as it is.

ooRexx consumes too much virtual storage (which is just one of my concerns), and the big classic REXX program is always bumping into the 2G limit (this is for Regina REXX). What I mean is that the program frequently exhausts virtual memory and the run (solution) has to be managed in another way, essentially breaking up the many runs into even more runs, which is a major pita.

From what I remember from 15 years ago, IBM's o-o REXX consumed too much CPU for the type of program (big, long running, lots of I/O, very big stemmed arrays) that I use. "It" is two main programs, 3825 + 330 REXX statements, plus it makes use of other classic REXX programs. I have no desire to install ooRexx and then spend many hours reworking a bunch of classic REXX programs to work with ooRexx.

Another thing to compare would be a REXX program that runs under (say) Regina REXX, and compare it to running under ooRexx (on the same hardware and operating system, of course). It would be an interesting comparison. Since ooRexx was originally (I think) written (coded) by IBM, I assume it has pretty high standards. I really don't know if IBM wrote the code or had it written elsewhere.
-- [[User:Gerard Schildberger|Gerard Schildberger]] 05:58, 17 July 2012 (UTC)

-----

'Classic' IBM Rexx and ooRexx were written by IBM (people) and the key person(s) are still here (in RexxLA). High standards? Yes, I managed the testing of the Rexx compiler(s) on VM and then TSO (one of my best lifetime projects). --[[User:Walterpachl|Walterpachl]] 06:25, 17 July 2012 (UTC)

==REXX benchmarks==
These are the results for REXX exact vs. regular comparisons as per Walter's request.

<br>I no longer have the original ''regular compare'' vs. ''exact compare'' REXX bench-marking programs,
<br>but I took the (above) existing code and ripped its guts out (er, disemboweled it), and made a
<br>simple benchmark test out of it.

I soon discovered that the two versions of the '''if''' statement were being dwarfed by the
<br>overhead of the '''do''' loop, so I unrolled the '''if''' statements.

Just for grins, I reversed the order of the compares on every other compare, and I was
<br>somewhat surprised that more CPU time was consumed.
<br>I left that modification in the benchmark program.

I ran the REXX benchmark against the three classic REXX interpreters that I have
<br>installed on my two computers, plus an o-o REXX interpreter:
::* R4
::* ROO
::* Regina
::* Personal REXX
<lang rexx>/*REXX*/ parse version _;   say 'version:' _;   say
arg times .
if times==''  then times=1000000       /*default is one-million times*/
call time 'R'                          /*just grease the wheels a bit*/
j=0;  k=0;  x=0;  y=0                  /*have REXX allocate variables*/

  do 3
  call time 'R'     /*────────────────────reset the REXX timer*/
                 do j=1  for times
                 if _=j  then x=j
                 if j=_  then x=j
                 if _=j  then x=j
                 if j=_  then x=j
                 if _=j  then x=j
                 if j=_  then x=j
                 if _=j  then x=j
                 if j=_  then x=j
                 if _=j  then x=j
                 if j=_  then x=j
                 if _=j  then x=j
                 if j=_  then x=j
                 end
  say ' reg compare:' times "times" right(format(time('e'),,2),15)

  call time 'R'     /*────────────────────reset the REXX timer*/
                 do k=1  for times
                 if _==k  then y=k
                 if k==_  then y=k
                 if _==k  then y=k
                 if k==_  then y=k
                 if _==k  then y=k
                 if k==_  then y=k
                 if _==k  then y=k
                 if k==_  then y=k
                 if _==k  then y=k
                 if k==_  then y=k
                 if _==k  then y=k
                 if k==_  then y=k
                 end
  say 'Xact compare:' times "times" right(format(time('e'),,2),15)
  say
  end</lang>
Using the benchmark program (shown above), for both computers (one is running
<br>Windows/XP pro, the other is running Windows 7), the results are:
<pre>
 * R4 --------------- 27% slower using regular comparisons
 * ROO -------------- 39% slower using regular comparisons
 * Regina ----------- 150% slower using regular comparisons
 * Personal REXX ---- 250% slower using regular comparisons
</pre>
<br>(In all of the above runs (about a half-dozen runs on each computer), I used the
<br>lowest percentage found.)
<lang rexx>/*REXX*/ parse version _;   say 'version:' _;   say;   _=word(_,2)
arg times .
if times==''  then times=1000000       /*default is one-million times*/
call time 'R'                          /*just grease the wheels a bit*/
j=0;  k=0;  x=0;  y=0;  p=0;  q=0      /*have REXX allocate variables*/

  do 3
  call time 'R'     /*────────────────────reset the REXX timer*/
                 do j=1  for times
                 p=_||j
                 if p=j  then x=j
                 if j=p  then x=j
                 end
  say ' reg compare:' times "times" right(format(time('e'),,2),15)

  call time 'R'     /*────────────────────reset the REXX timer*/
                 do k=1  for times
                 q=_||j
                 if q==k  then y=k
                 if k==q  then y=k
                 end
  say 'Xact compare:' times "times" right(format(time('e'),,2),15)
  say
  end</lang>
Using the benchmark program (shown above) and using the same methodology, the results are:
<pre>
 * R4 --------------- 115% slower using regular comparisons (115% --> 119%)
 * ROO -------------- 39% slower using regular comparisons (39% --> 40%)
 * Regina ----------- 40% slower using regular comparisons (40% --> 41%)
 * Personal REXX ---- 35% slower using regular comparisons (35% --> 41%)
</pre>
<br>(In all of the above runs (again, about a half-dozen runs), I used the lowest percentage found, but I included the ranges as well.)

<br><br>Please note that these benchmark tests were of the "quick and dirty" type, and I didn't have the
<br>time to spend on it as I would have wished. I spent (probably) way too much time on this simple case.
<br>This is to say, your mileage may vary. I hope others will execute these two REXX benchmark programs
<br>for other REXXes (or the same ones on other operating systems, other hardware).
<br>I tried to include the usual methodologies to minimize background noise and overhead interference.
<br>As with most benchmarks, I often feel that I'm leading a horse to water ...
<br> -- [[User:Gerard Schildberger|Gerard Schildberger]] 23:32, 16 July 2012 (UTC)

P.S.: I benchmarked the programs on an air-gap computer.
-- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 07:02, 15 August 2018 (UTC)

-----

ooRexx results for the above 2 programs that surprised me (a little):
<pre>
version: REXX-ooRexx_4.0.1(MT) 6.03 2 May 2010

 reg compare: 1000000 times 1.33
Xact compare: 1000000 times 1.11

version: REXX-ooRexx_4.0.1(MT) 6.03 2 May 2010

 reg compare: 1000000 times 1.09
Xact compare: 1000000 times 0.53
</pre>
--[[User:Walterpachl|Walterpachl]] 05:16, 17 July 2012 (UTC)

==Ruby Golfing?==
It looks as if the last Ruby example is just a "Code golf" solution and is not idiomatic Ruby. If so then it probably shouldn't be on RC. What do you think?<br>
--[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 09:08, 14 August 2015 (UTC)

It's pretty idiomatic (I would add a couple more spaces and use full words instead of chars for variables), but I agree that it ventures closer to that realm. So, I deleted it. I was comparing my solution to the "short local version" of the python code. I would argue that my one line of ruby code (which I pulled) is more idiomatic than the python code for the "short local version". The python code fails to use descriptive variable names (it uses single char variable names), and it leaves a dangling filehandle in the way the file is opened (it should use a with..as context manager). Also, given that python is at version 3.4, the use of 2.X print statement syntax is outdated (it should be a print function). In other words, I think an argument could be made that my one liner is more idiomatic ruby than the python "short local version". Perhaps that one should be updated in a similar fashion? [I don't really care, I'm just bringing it up for the sake of consistency]. --[[User:Jtprince|Jtprince]] ([[User talk:Jtprince|talk]]) 15:52, 14 August 2015 (UTC)

==No Dictionary?==
At this moment, the URL for the dictionary (http://www.puzzlers.org/pub/wordlists/unixdict.txt) is returning a 401. Is any action required to remedy this?
--[[User:Balrog|Balrog]] ([[User talk:Balrog|talk]]) 20:20, 14 August 2018 (UTC)

: Yes, it would be nice to have a (stable) version of the '''unixdict.txt''' stored somewhere on Rosetta Code, that way, any new computer programming examples would be consistent with those entered before the latest updates or changes that might have been made to the original (dictionary) file. Plus it would eliminate the possibility of any 401 and 404 errors, and the possibility of added cookies from the original host site mentioned above. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 22:50, 14 August 2018 (UTC)

::I started to look into it, but have to stop, myself. I got as far as [http://www.puzzlers.org/word-lists this page] which states that '''some''' of the lists may be open-source. If someone finds that we could site the list here then we might then try and get a copy re-hosted; but it would have to be the exact same page or saved as a latin_1 encoded text file. (We would not want a copy to cause problems with existing code).

:::I successfully accessed the wordlist using the "Wayback Machine" with this URL ==> https://web.archive.org/web/20180611003215/http://www.puzzlers.org/pub/wordlists/unixdict.txt
:::I'll edit that URL into the page. If that's a mistake feel free to back out my change.
:::--[[User:Balrog|Balrog]] ([[User talk:Balrog|talk]]) 19:16, 7 June 2019 (UTC)
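
For anyone working from a saved copy of the word list rather than the live URL, here is a minimal sketch in Python of reading such a copy and picking out the longest ordered words. It is not taken from any of the task's posted solutions. The file name <code>unixdict.txt</code> in the current directory and the helper names <code>load_words</code>, <code>is_ordered</code> and <code>longest_ordered</code> are assumptions made for illustration; the <code>latin_1</code> encoding follows the suggestion above, and the file is opened with a with..as context manager as recommended in the Ruby Golfing section.

```python
from pathlib import Path


def load_words(path, encoding="latin_1"):
    """Read one word per line from a local word list, skipping blank lines."""
    # The context manager closes the file even if reading fails part-way.
    with open(path, encoding=encoding) as handle:
        return [line.strip() for line in handle if line.strip()]


def is_ordered(word):
    """True when the letters of `word` never decrease from left to right."""
    return all(a <= b for a, b in zip(word, word[1:]))


def longest_ordered(words):
    """Return every ordered word of maximum length, keeping input order."""
    ordered = [w for w in words if is_ordered(w)]
    if not ordered:
        return []
    longest = max(len(w) for w in ordered)
    return [w for w in ordered if len(w) == longest]


if __name__ == "__main__":
    local_copy = Path("unixdict.txt")  # hypothetical location of the saved copy
    if local_copy.exists():
        print(" ".join(longest_ordered(load_words(local_copy))))
    else:
        print("no local copy of the word list found")
```

Keeping the file reading separate from the filtering means the same check can be run against any word list, ordered or not, which also touches on the earlier point that tasks should not be geared to one input.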