User talk:Abwillis

From Rosetta Code
Revision as of 20:23, 25 April 2017 by rosettacode>Gerard Schildberger (→‎Using translate and remove unwanted characters: added another comment.)

concerning the two REXX examples (strip control chars...)

Thanks for finding that error in the two REXX examples that I entered for the Rosetta Code task:

Strip control codes and extended characters from a string.


I rewrote the two REXX examples and then I noticed that they were removing (wanted) multiple blank spaces,   so I rewrote the two REXX programs once again to address that problem.

One problem with the   translate   BIF is that it works on characters, not glyphs,   which can be a problem when writing code to work on both ASCII and EBCDIC hardware.

Although the   translate   BIF could be used to change the unwanted characters to something other than a blank, one would still have to delete/remove all the unwanted characters, and the   space   BIF doesn't have this functionality/capability.   -- Gerard Schildberger (talk) 23:24, 22 April 2017 (UTC)
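
As a minimal sketch of that limitation (not taken from either REXX entry), the lines below change the control characters to blanks with   translate   and then call   space(...,0).   The test string is hypothetical and ASCII is assumed; the point is that the wanted double blank disappears along with the unwanted character.

<lang rexx>/*REXX sketch: translate + space(...,0) also deletes the blanks that were wanted.  */
xxx   = 'two' || copies(' ', 2) || 'blanks' || '01'x || 'here'  /*hypothetical string.*/
below = xrange('00'x, '1F'x)                 /*the ASCII control characters.          */
yyy   = translate(xxx, , below)              /*control characters  -->  blanks.       */
zzz   = space(yyy, 0)                        /*removes ALL blanks, wanted or not.     */
say 'yyy = ['yyy"]"                          /*the double blank is still present.     */
say 'zzz = ['zzz"]"                          /*every blank is gone.                   */
</lang>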


Using translate and remove unwanted characters

<lang rexx>/*REXX program strips all "control codes" from a character string (ASCII or EBCDIC). */
/* xxx= 'string of ☺☻♥♦⌂, may include control characters and other ilk.♫☼§►↔◄' */
xxx = 'string of ☺☻♥♦⌂, may include control characters and other ♫☼§►↔◄░▒▓█┌┴┐±÷²¬└┬┘ilk.'

below = xrange('00'x,'1F'x)
above = xrange('80'x,'FF'x)
yyy = translate(xxx,,below,'00'x)
zzz = translate(yyy,,above,'00'x)

say 'old = »»»'xxx"«««"
say 'new = »»»'yyy"«««"
say 'newer = »»»'zzz"«««"
</lang>

Output:
old = »»»string of ☺☻♥♦⌂, may include control characters and other    ♫☼§►↔◄░▒▓█┌┴┐±÷²¬└┬┘ilk.«««
new = »»»string of ☺☻♥♦⌂, may include control characters and other    ♫☼§►↔◄░▒▓█┌┴┐±÷²¬└┬┘ilk.«««
newer = »»»string of , may include control characters and other    ilk.«««

I had thought of the fact that all duplicate spaces would be stripped by space in my use case... as a result I had used a verify() to process only the lines that had extended characters and to skip those that did not (it was not a problem to remove extra spaces from the lines that had extended characters in my case, but I did not want to remove them from all lines). I have, though, found what I think is a solution in the above, which uses translate. The '00'x is a null character... I tried '7F'x first, which is the DEL character; it seemed to work for the original string but not for the new one. There do not appear to be any control codes (00x-1Fx) in the string, as I see no change when removing them.
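
A minimal sketch of that kind of   verify()   check follows; it is an illustration, not the code actually used. The two sample lines are hypothetical and ASCII is assumed. With the   Match   option,   verify   returns the position of the first character that is in the reference string, or   0   when there is none.

<lang rexx>/*REXX sketch: use verify() to see if a line holds any extended characters.       */
above = xrange('80'x, 'FF'x)                 /*extended characters (ASCII assumed).   */
line1 = 'plain text'                         /*hypothetical: no extended characters.  */
line2 = 'caf' || 'E9'x || ' au lait'         /*hypothetical: one extended character.  */
pos1  = verify(line1, above, 'Match')        /*0  -->  nothing to strip in this line. */
pos2  = verify(line2, above, 'Match')        /*position of 1st extended character.    */
say 'line1: first extended character at' pos1
say 'line2: first extended character at' pos2
if pos2 \== 0  then say 'line2 needs to be processed.'
</lang>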



By the way, the REXX output shown (above) doesn't match what the actual REXX program is doing.

All that is being done is changing the unwanted characters to blanks;   the   space   BIF isn't being shown   (I'm assuming that it was intended to be used to eliminate the "unwanted" extra blanks).

Also, all previous text(s)/conversations on Rosetta Code should be left intact   (as I understand Rosetta Code policy),   although I've used the over-strike capabilities of HTML to show that older text is now irrelevant or somehow in error, but it leaves the old text still visible.   Other people reading the above changed text won't be able to follow the conversation   (showing an example of over-striking).   -- Gerard Schildberger (talk) 20:04, 25 April 2017 (UTC)


The Rosetta Code task is to remove all control characters and also extended characters.

The test case that I chose   (for the REXX entry)   happened to have no null characters ('00'x)   [and others],   but that shouldn't be relied upon for any specific and assumptive programming solution for the general case,   where it isn't known in advance what characters are or aren't in the string.

The fly in the ointment is the special case where   all   control characters,   all   extended characters, and   all   "regular" characters are present in the string.   This essentially rules out the use of the   translate   BIF for this special case,   as there isn't any character to be used for the preservation of the original blanks.

In other forums   (years ago and far, far away)   discussing this issue, one solution was to change all blanks to some character such as   '00'x   or   'ff'x.


Then, the solution becomes:

  •   find a character   (let's call it   ¥)   that   isn't   in the string,
  •   change all blanks   (via the   translate   BIF)   to the   ¥   character,
  •   change all the unwanted characters   (via the   translate   BIF)   to a blank,
  •   use the   space   BIF to remove all the (new) blanks with   space(xxx,0)   and
  •   change all the   ¥   characters to blanks   (via the   translate   BIF).
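
The steps above can be sketched as follows. This is only an illustration under stated assumptions: ASCII, the same   '00'x-'1F'x   and   '80'x-'FF'x   ranges as in the example earlier on this page, and a hypothetical test string. The names   allChars   and   ph   (the placeholder, playing the role of   ¥)   are made up for the sketch. If the placeholder found happens to be one of the unwanted characters, it is taken out of the unwanted list first so that it survives the third step.

<lang rexx>/*REXX sketch of the placeholder approach (ASCII assumed).                         */
xxx      = 'a' || copies(' ', 2) || 'b' || '01'x || 'c' || 'E9'x || ' d'
unwanted = xrange('00'x, '1F'x) || xrange('80'x, 'FF'x)
allChars = xrange('00'x, 'FF'x)              /*all 256 possible characters.           */
p = verify(allChars, xxx)                    /*1st character that is NOT in  xxx.     */
if p == 0  then do                           /*every character is in use?             */
                say 'no placeholder character is available.'
                exit 1
                end
ph = substr(allChars, p, 1)                  /*the placeholder character.             */
q  = pos(ph, unwanted)                       /*is the placeholder an unwanted char?   */
if q \== 0  then unwanted = delstr(unwanted, q, 1)   /*then protect it from step 3.   */
yyy = translate(xxx, ph, ' ')                /*blanks               -->  placeholder. */
yyy = translate(yyy,   , unwanted)           /*unwanted characters  -->  blanks.      */
yyy = space(yyy, 0)                          /*remove all the (new) blanks.           */
zzz = translate(yyy, ' ', ph)                /*placeholder          -->  blanks.      */
say 'old = ['xxx"]"
say 'new = ['zzz"]"
</lang>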


This works,   as long as there can be found a character   ¥   that isn't in the string being used.

Therein lies the rub.

So, I was left with the general case   (well, two cases)   of a REXX programming solution(s) to treat:

  •   the   wanted   characters, or
  •   the unwanted characters (which is normally faster as there are fewer unwanted characters)

on a character-by-character basis.   -- Gerard Schildberger (talk) 20:04, 25 April 2017 (UTC)
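
As a rough illustration of the two character-by-character cases   (a sketch only, not the actual REXX entries),   the first loop below deletes each unwanted character as it is found, and the second loop keeps only the wanted characters. ASCII is assumed, the test string is hypothetical, and the wanted set is taken to be blank through   '7E'x   for the purpose of the sketch. Original blanks are preserved in both cases, and no placeholder character is needed.

<lang rexx>/*REXX sketch: handle the string on a character-by-character basis (ASCII assumed).*/
xxx      = 'a' || copies(' ', 2) || 'b' || '01'x || 'c' || 'E9'x || ' d'
unwanted = xrange('00'x, '1F'x) || xrange('80'x, 'FF'x)
wanted   = xrange(' ', '7E'x)                /*assumed set of "regular" characters.   */

zzz = xxx                                    /*case 1: delete the unwanted characters.*/
do forever
   p = verify(zzz, unwanted, 'Match')        /*position of the next unwanted char.    */
   if p == 0  then leave                     /*none left?  Then we are done.          */
   zzz = delstr(zzz, p, 1)                   /*delete that one character.             */
end

out = ''                                     /*case 2: keep only the wanted chars.    */
do j = 1  for length(xxx)
   c = substr(xxx, j, 1)                     /*examine one character at a time.       */
   if pos(c, wanted) \== 0  then out = out || c
end

say 'old      = ['xxx"]"
say 'unwanted = ['zzz"]"
say 'wanted   = ['out"]"
</lang>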


Also, handling of EBCDIC as well as ASCII was more easily handled with the REXX programming entries used.   -- Gerard Schildberger (talk) 20:22, 25 April 2017 (UTC)