Concerning the two REXX examples (strip control chars...)
Thanks for finding that error in the two REXX examples that I entered for the Rosetta Code task:
- Strip control codes and extended characters from a string.
I rewrote the two REXX examples, and then I noticed that they were also removing (wanted) multiple blanks, so I rewrote both REXX programs once more to fix that problem.
One problem with the translate BIF is that it works on characters, not glyphs, which can be a problem when writing code that must work on both ASCII and EBCDIC hardware.
Although the translate BIF could be used to change the unwanted characters to something other than a blank, one would still have to delete all of those (changed) characters, and the space BIF doesn't have that capability, since it only deals with blanks. -- Gerard Schildberger (talk) 23:24, 22 April 2017 (UTC)
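A minimal sketch of why "translate the unwanted characters to blanks, then use space" isn't enough: space can't tell the new blanks from the original ones, so a wanted double blank collapses as well. It assumes ASCII; the unwanted table ('00'x through '1F'x plus '7F'x through 'FF'x) and the test string are illustrative assumptions, not taken from the task entries.

<lang rexx>
/*REXX sketch (ASCII assumed): translate unwanted chars to blanks, then SPACE. */
unwanted = xrange('00'x, '1F'x) || xrange('7F'x, 'FF'x)   /* assumed set      */
xxx = 'two  blanks'||'07'x||'here'      /* wanted double blank; BEL unwanted  */
yyy = translate(xxx, , unwanted)        /* unwanted --> blanks (same length)  */
zzz = space(yyy, 1)                     /* one blank between words            */
say '[' || yyy || ']'
say '[' || zzz || ']'                   /* the wanted double blank is gone too */
</lang>

The result is "two blanks here": the BEL is gone, but so is one of the two wanted blanks.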
Using translate to remove unwanted characters
<lang rexx>/*REXX program strips all "control codes" from a character string (ASCII or EBCDIC). */
/* xxx= 'string of ☺☻♥♦⌂, may include control characters and other ilk.♫☼§►↔◄' */
xxx = 'string of ☺☻♥♦⌂, may include control characters and other ♫☼§►↔◄░▒▓█┌┴┐±÷²¬└┬┘ilk.'
below = xrange('00'x,'1F'x)          /* control codes ('00'x to '1F'x)      */
above = xrange('80'x,'FF'x)          /* extended characters ('80'x to 'FF'x) */
yyy = translate(xxx,,below,'00'x)    /* control codes --> '00'x (nulls)     */
zzz = translate(yyy,,above,'00'x)    /* extended chars --> '00'x (nulls)    */
say 'old = »»»'xxx"«««"
say 'new = »»»'yyy"«««"
say 'newer = »»»'zzz"«««"
</lang>
Output:
old = »»»string of ☺☻♥♦⌂, may include control characters and other ♫☼§►↔◄░▒▓█┌┴┐±÷²¬└┬┘ilk.«««
new = »»»string of ☺☻♥♦⌂, may include control characters and other ♫☼§►↔◄░▒▓█┌┴┐±÷²¬└┬┘ilk.«««
newer = »»»string of , may include control characters and other ilk.«««
I had considered the fact that space would strip all duplicate blanks in my use case, so I used verify() to process only the lines that contained extended characters and to leave the others alone (removing extra blanks from the lines with extended characters was not a problem in my case, but I did not want to remove them from every line). I have, though, found what I think is a solution using translate (above). The '00'x is a null character; I first tried '7F'x, which is a DEL character, and it seemed to work for the original string but not for the new one. There don't appear to be any control codes ('00'x through '1F'x) in the string, as I see no change when removing them.
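A sketch of that verify() gate, assuming ASCII and treating only '20'x through '7E'x as wanted (both assumptions). verify(string, reference) returns 0 when every character of the string is in the reference, so lines made up entirely of wanted characters are skipped and keep their multiple blanks. The stem `line.` and its contents are hypothetical.

<lang rexx>
/*REXX sketch (ASCII assumed): clean only the lines that need it.        */
wanted   = xrange('20'x, '7E'x)                        /* printable ASCII */
unwanted = xrange('00'x, '1F'x) || xrange('7F'x, 'FF'x)
line.1 = 'plain  ASCII line'                 /* no unwanted characters   */
line.2 = 'has a tab'||'09'x||'in  it'        /* '09'x (TAB) is unwanted  */
line.0 = 2
do j=1 for line.0
   if verify(line.j, wanted) = 0 then iterate   /* all chars are wanted   */
   line.j = space(translate(line.j, , unwanted)) /* unwanted --> blanks, */
                                                 /* then squeeze blanks   */
   say '[' || line.j || ']'
end
</lang>

Only the second line is changed (to "has a tab in it"); the first keeps its double blank.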
By the way, the REXX output shown (above) doesn't match what the actual REXX program is doing.
All that is being done is changing the unwanted characters to blanks; the space BIF isn't being shown (I'm assuming that it was intended to be used to eliminate the "unwanted" extra blanks).
Also, as I understand Rosetta Code policy, all previous texts/conversations on Rosetta Code should be left intact; I've used the strike-through (over-strike) capability of HTML to show that older text is now irrelevant or somehow in error, but that still leaves the old text visible.
Other people reading the changed text above won't be able to follow the conversation (showing an example of over-striking). -- Gerard Schildberger (talk) 20:04, 25 April 2017 (UTC)
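To illustrate the earlier point that translate only substitutes characters and never deletes any, here is a minimal sketch (the test string is made up): translating with a pad of '00'x, as the program above does, leaves the nulls in the string, so its length doesn't change. Removing them would still need another step, and space only deals with blanks.

<lang rexx>
/*REXX sketch: TRANSLATE replaces characters; it never deletes any.      */
above = xrange('80'x, 'FF'x)                 /* extended characters     */
xxx = 'abc'||'E9'x||'def'                    /* 'E9'x is extended       */
zzz = translate(xxx, , above, '00'x)         /* 'E9'x --> '00'x (null)  */
say length(xxx) length(zzz) c2x(zzz)         /* 7 7 61626300646566      */
</lang>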
The Rosetta Code task is to remove all control characters and also extended characters.
The fact that the test case I chose (for the REXX entry) had no null characters ('00'x) [and others] shouldn't be relied on by any specific, assumption-based programming solution for the general case, where it isn't known in advance which characters are or aren't in the string.
The fly in the ointment is the special case where all control characters, all extended characters, and all "regular" characters are present in the string. This essentially rules out the use of the translate BIF for this special case, as there isn't any spare character that could be used to preserve the original blanks.
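A quick way to detect that special case, sketched under the assumption that any of the 256 characters may occur in the string: with `all` holding every possible character, verify(all, xxx) returns the position of the first character of `all` that does not occur in xxx, or 0 when every character is present, which is exactly when no spare character exists. The variable names and test strings are illustrative.

<lang rexx>
/*REXX sketch: does the string leave any character unused?               */
all = xrange('00'x, 'FF'x)              /* all 256 possible characters   */
xxx = 'ordinary text'||'07'x            /* most characters are unused    */
p = verify(all, xxx)                    /* 1st char of ALL not in XXX    */
if p = 0 then say 'no spare character'
else say 'spare character (hex):' c2x(substr(all, p, 1))
xxx = all || 'more text'                /* the special case              */
q = verify(all, xxx)                    /* 0: every character is in use  */
if q = 0 then say 'no spare character'
else say 'spare character (hex):' c2x(substr(all, q, 1))
</lang>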
In other forums discussing this issue (years ago and far, far away), one solution was to change all blanks to some other character, such as '00'x or 'ff'x.
The solution then becomes:
- find a character (let's call it ¥) that isn't in the string,
- change all blanks (via the translate BIF) to the ¥ character,
- change all the unwanted characters (via the translate BIF) to a blank,
- use the space BIF to remove all the (new) blanks with space(xxx,0) and
- change all the ¥ characters to blanks (via the translate BIF).
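A minimal sketch of those five steps, assuming ASCII with '00'x through '1F'x and '7F'x through 'FF'x as the unwanted characters (an assumption, as are the test string and variable names). One detail the sketch has to respect: the spare character (the ¥ above) must not itself be in the unwanted table, or step 3 would turn it back into a blank. So this version looks for the spare only among the printable, non-blank characters '21'x through '7E'x.

<lang rexx>
/*REXX sketch of the "spare character" method (ASCII assumed).             */
unwanted = xrange('00'x, '1F'x) || xrange('7F'x, 'FF'x)      /* assumed set */
xxx  = 'keep  two blanks,'||'07'x||'drop'||'E9'x||'these'
cand = xrange('21'x, '7E'x)       /* printable, non-blank spare candidates  */
p    = verify(cand, xxx)          /* 1st candidate NOT in xxx, or 0         */
new  = xxx                        /* left unchanged if no spare is found    */
if p = 0 then say 'no spare character: this method cannot be used'
else do
     spare = substr(cand, p, 1)        /* step 1: the spare character       */
     t = translate(xxx, spare, ' ')    /* step 2: blanks   --> spare        */
     t = translate(t, , unwanted)      /* step 3: unwanted --> blanks       */
     t = space(t, 0)                   /* step 4: delete all the blanks     */
     new = translate(t, ' ', spare)    /* step 5: spare    --> blanks       */
     end
say '[' || new || ']'
</lang>

Here the spare is "!", and the result keeps the wanted double blank: "keep  two blanks,dropthese".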
This works, as long as there can be found a character ¥ that isn't in the string being used.
Therein lies the rub.
So I was left with the general case (well, two cases) of a REXX programming solution that treats either:
- the wanted characters, or
- the unwanted characters (which is normally faster as there are fewer unwanted characters)
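Two sketches of those general approaches, neither of which needs a spare character (ASCII assumed; the wanted/unwanted tables, test string and variable names are assumptions). The first keeps each character found in the wanted table. The second uses verify with the 'M' (match) option and a start position to find the next unwanted character and delstr to delete it, so its work grows with the number of unwanted characters actually present.

<lang rexx>
/*REXX sketches (ASCII assumed): no spare character is needed.            */
wanted   = xrange('20'x, '7E'x)                        /* printable ASCII */
unwanted = xrange('00'x, '1F'x) || xrange('7F'x, 'FF'x)
xxx = 'a  b'||'00'x||'c'||'FF'x||'d'

/* 1) keep the wanted characters, one character at a time.               */
new1 = ''
do j=1 for length(xxx)
   c = substr(xxx, j, 1)
   if pos(c, wanted) > 0 then new1 = new1 || c
end

/* 2) delete the unwanted characters, one occurrence at a time.          */
new2 = xxx
p = 1
do forever
   if p > length(new2) then leave       /* nothing left to scan           */
   p = verify(new2, unwanted, 'M', p)   /* next unwanted char, or 0       */
   if p = 0 then leave
   new2 = delstr(new2, p, 1)            /* remove it; rescan from same p  */
end
say '[' || new1 || ']'
say '[' || new2 || ']'
</lang>

Both produce "a  bcd", keeping the wanted double blank.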