
GSTrans string conversion

From Rosetta Code
Revision as of 16:21, 11 September 2024 by Hkdtam (added Perl programming solution)
Task
GSTrans string conversion
You are encouraged to solve this task according to the task description, using any language you may know.

GSTrans string encoding is a method of encoding all 8-bit character values 0-255 using only printable characters. It originated on Acorn computers to allow command-line commands to process non-printable characters.

 Character Encoding
 0-31      |letter eg |@, |A, |i |[ etc.
 32-126    character, except for:
 "         |"
 |         ||
 127       |?
 128-255   |! followed by encoding, eg |!|@ = 128
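The table can be read as a rule applied to one byte at a time. Below is a minimal Python sketch. It assumes the input is already a sequence of byte values, so a Unicode string would first have to be converted, for example to UTF-8 bytes. `encode_byte` and `gstrans_encode` are illustrative names, and this encoder always emits upper-case letters:

```python
def encode_byte(b: int) -> str:
    """Encode one byte value (0-255) according to the table above."""
    if not 0 <= b <= 255:
        raise ValueError(f"byte value out of range: {b}")
    prefix = ""
    if b >= 128:              # high bit set: emit |! and encode the low 7 bits
        prefix, b = "|!", b - 128
    if b < 32:                # control codes become |@, |A, ..., |_
        return prefix + "|" + chr(b + 64)
    if b == 127:              # DEL
        return prefix + "|?"
    if chr(b) in '"|':        # quote and bar must be escaped
        return prefix + "|" + chr(b)
    return prefix + chr(b)    # other printable characters stand for themselves


def gstrans_encode(data: bytes) -> str:
    """Encode a byte string as a GSTrans string."""
    return "".join(encode_byte(b) for b in data)


print(gstrans_encode(b"\x0cHello\x07\n\r"))                 # |LHello|G|J|M
print(gstrans_encode(bytes([13, 10, 0, 5, 244, 13, 255])))  # |M|J|@|E|!t|M|!|?
```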

A string can be surrounded by quotes, eg "ALERT|G".

See http://www.riscos.com/support/developers/prm/conversions.html

Examples:

 |LHello|G|J|M       encodes  CHR$12;"Hello";CHR$7;CHR$10;CHR$13
 "|m|j|@|e|!t|m|!|?" encodes  13,10,0,5,244,13,255
Task
  • Write two functions, one to encode a string of characters into a GSTrans string, and one to decode a GSTrans string. Indicate if any error checking is done, and how it is indicated.
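For the error-checking part of the task, one option is to reject malformed input outright. The Python sketch below is an assumption, not a translation of any version on this page. It raises `ValueError` for:

- a non-printable character
- a `|` or `|!` at the end of the string
- a repeated `|!`
- an unknown escape such as `|1`

It accepts both upper- and lower-case letters after `|`, strips one pair of surrounding quotes and returns bytes. `gstrans_decode` is an illustrative name.

```python
def gstrans_decode(s: str) -> bytes:
    """Decode a GSTrans string, raising ValueError on malformed input."""
    if len(s) >= 2 and s[0] == s[-1] == '"':   # optional surrounding quotes
        s = s[1:-1]
    out = bytearray()
    high = 0                 # becomes 128 after a |! prefix
    i = 0
    while i < len(s):
        c = s[i]
        i += 1
        if not " " <= c <= "~":
            raise ValueError(f"non-printable character {ord(c)}")
        if c != "|":
            value = ord(c)
        else:
            if i == len(s):
                raise ValueError("string ends with '|'")
            c = s[i]
            i += 1
            if c == "!":
                if high:
                    raise ValueError("high bit already set ('|!|!')")
                high = 128
                continue
            if c == "?":
                value = 127
            elif c in '"|<':
                value = ord(c)
            elif "@" <= c <= "_":
                value = ord(c) - 64          # |@ = 0, |A = 1, ..., |_ = 31
            elif "a" <= c <= "z":
                value = ord(c) - 96          # lower-case letters are accepted too
            else:
                raise ValueError(f"invalid escape '|{c}'")
        out.append(high + value)
        high = 0
    if high:
        raise ValueError("string ends with '|!'")
    return bytes(out)


print(gstrans_decode('"|m|j|@|e|!t|m|!|?"'))   # b'\r\n\x00\x05\xf4\r\xff'
```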

ALGOL 68

As with the Wren sample, this includes the Julia test cases. However, Algol 68 implementations don't generally handle Unicode characters: STRINGs are arrays of CHARs and a CHAR is usually a single byte.
This does a little error checking. As in the Wren, etc. samples, if decoding yields an invalid byte value ( < 0 or > 255 ), the original character is retained. Additionally, if the string ends with "|" or "|!", the final "|" or "|!" is ignored.
Quoted strings retain their quotes when encoded or decoded.
In the output of the following, control characters are shown as their decimal values, enclosed in "[" and "]".

BEGIN # GSTrans string conversion                                            #
    OP   UNQUOTE = ( STRING s )STRING:                  # returns s unquoted #
         IF LWB s >= UPB s THEN s
         ELIF s[ LWB s ] /= """" OR s[ UPB s ] /= """" THEN s
         ELSE s[ LWB s + 1 : UPB s - 1 ]
         FI # UNQUOTE # ;
    OP   ENCODE = ( STRING str )STRING:                # returns str encoded #
         BEGIN
            STRING result := "";
            STRING s       = UNQUOTE str;
            FOR i FROM LWB s TO UPB s DO
                INT c = ABS s[ i ];
                result +:= IF   c < 32 THEN
                                "|" + REPR ( c + 64 )
                           ELIF c = ABS """" OR c = ABS "|" THEN
                               "|" + s[ i ]
                           ELIF c >= 32 AND c <= 126 THEN
                                s[ i ]
                           ELIF c = 127 THEN
                                "|?"
                           ELSE
                                "|!" + ENCODE STRING( REPR( c - 128 ) )
                           FI
            
            OD;
            IF s /= str THEN """" + result + """" ELSE result FI
         END # ENCODE # ;
    OP   DECODE = ( STRING str )STRING:                # returns str decoded #
         BEGIN
            STRING result := "";
            STRING s       = UNQUOTE str;
            INT i := LWB s;
            WHILE i <= UPB s DO
                result +:= IF   s[ i ] /= "|" THEN
                                s[ i ]
                           ELIF ( i +:= 1 ) > UPB s THEN
                                ""
                           ELIF s[ i ] = """" OR s[ i ] = "|" THEN
                                s[ i ]
                           ELIF s[ i ] = "?" THEN
                                REPR 127
                           ELIF s[ i ] /= "!" THEN
                                INT ch = ABS s[ i ] - 64;
                                IF ch < 0 THEN s[ i ] ELSE REPR ch FI
                           ELSE
                                i +:= 1;
                                IF   i > UPB s THEN
                                     ""
                                ELIF s[ i ] /= "|" THEN
                                     INT ch = ABS s[ i ] + 128;
                                     IF ch > 255 THEN s[ i ] ELSE REPR ch FI
                                ELIF ( i +:= 1 ) > UPB s THEN
                                     ""
                                ELSE
                                     STRING c = DECODE STRING( "|" + s[ i ] );
                                     INT ch = ABS c[ LWB c ] + 128;
                                     IF ch > 255 THEN s[ i ] ELSE REPR ch FI
                                FI
                           FI;
                  i +:= 1
            OD;
            IF s /= str THEN """" + result + """" ELSE result FI
         END # DECODE # ;
    OP   SHOWBYTES = ( STRING s )STRING:  # return s with control characters #
         BEGIN                            # replaced by their value          #
            STRING result := "";
            FOR i FROM LWB s TO UPB s DO
                INT c = ABS s[ i ];
                result +:= IF c < 32 THEN
                               "[" + whole( c, 0 ) + "]"
                           ELSE
                               s[ i ]
                           FI
            OD;
            result
         END # SHOWBYTES # ;

    []STRING test = ( "ALERT|G",    "wert↑",       "@♂aN°$ª7Î"    # test cases #
                    , "ÙC▼æÔt6¤☻Ì", """@)Ð♠qhýÌÿ", "+☻#o9$u♠©A" # from Julia #
                    , "♣àlæi6Ú.é",  "ÏÔ♀È♥@ë",     "Rç÷\%◄MZûhZ"
                    , "ç>¾AôVâ♫↓P"
                    , REPR 12 + "Hello" + REPR 7 + REPR 10 + REPR 13 # Task test cases  #
                    , REPR 13 + REPR 10 + REPR 0 + REPR 5 + REPR 244 + REPR 13 + REPR 255
                    , """quoted|text"""                              # quoted test case #
                    );
    FOR i FROM LWB test TO UPB test DO
        STRING encoded = ENCODE test[ i ];
        STRING decoded = DECODE encoded;
        print( ( SHOWBYTES test[ i ], " -> ", encoded, " -> ", SHOWBYTES decoded
               , IF decoded = test[ i ] THEN "" ELSE " ****" FI, newline
               )
             )
    OD;
    STRING invalid = "|=|1|!";
    print( ( "Decoding: ", invalid, " -> ", DECODE invalid, newline ) )
END
Output:
ALERT|G -> ALERT||G -> ALERT|G
wert↑ -> wert|!b|!|F|!|Q -> wert↑
@♂aN°$ª7Î -> @|!b|!|Y|!|BaN|!B|!0$|!B|!*7|!C|!|N -> @♂aN°$ª7Î
ÙC▼æÔt6¤☻Ì -> |!C|!|YC|!b|!|V|!<|!C|!&|!C|!|Tt6|!B|!$|!b|!|X|!;|!C|!|L -> ÙC▼æÔt6¤☻Ì
"@)Ð♠qhýÌÿ -> |"@)|!C|!|P|!b|!|Y|! qh|!C|!=|!C|!|L|!C|!? -> "@)Ð♠qhýÌÿ
+☻#o9$u♠©A -> +|!b|!|X|!;#o9$u|!b|!|Y|! |!B|!)A -> +☻#o9$u♠©A
♣àlæi6Ú.é -> |!b|!|Y|!#|!C|! l|!C|!&i6|!C|!|Z.|!C|!) -> ♣àlæi6Ú.é
ÏÔ♀È♥@ë -> |!C|!|O|!C|!|T|!b|!|Y|!|@|!C|!|H|!b|!|Y|!%@|!C|!+ -> ÏÔ♀È♥@ë
Rç÷\%◄MZûhZ -> R|!C|!'|!C|!7\%|!b|!|W|!|DMZ|!C|!;hZ -> Rç÷\%◄MZûhZ
ç>¾AôVâ♫↓P -> |!C|!'>|!B|!>A|!C|!4V|!C|!|"|!b|!|Y|!+|!b|!|F|!|SP -> ç>¾AôVâ♫↓P
[12]Hello[7][10][13] -> |LHello|G|J|M -> [12]Hello[7][10][13]
[13][10][0][5]ô[13]ÿ -> |M|J|@|E|!t|M|!|? -> [13][10][0][5]ô[13]ÿ
"quoted|text" -> "quoted||text" -> "quoted|text"
Decoding: |=|1|! -> =1
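The lenient strategy described for the Algol 68 version can be sketched in Python. It keeps the original character when an escape would give a value outside 0-255, and drops a trailing `|` or `|!`. This illustrates the strategy and is not a translation. It works on a Python `str` of characters 0-255 and also maps lower-case letters (`|m` = 13). `gstrans_decode_lenient` is a hypothetical name:

```python
def gstrans_decode_lenient(s: str) -> str:
    """Decode without raising: an escape whose value would fall outside 0-255
    keeps its original character, and a trailing '|' or '|!' is dropped."""
    out = []
    i, n = 0, len(s)
    while i < n:
        c = s[i]
        i += 1
        if c != "|":
            out.append(c)
            continue
        high = 0
        if s.startswith("!", i):             # |! prefix adds 128
            high, i = 128, i + 1
            if s.startswith("|", i):         # |!|x: an escaped low part follows
                i += 1
            elif i < n:                      # |!x: plain character plus 128
                value = ord(s[i]) + 128
                out.append(chr(value) if value <= 255 else s[i])
                i += 1
                continue
        if i >= n:                           # trailing '|' or '|!' is ignored
            break
        c = s[i]
        i += 1
        if c == "?":
            value = 127
        elif c in '"|':
            value = ord(c)
        else:                                # letters (either case) and @[\]^_
            value = ord(c) - (96 if "a" <= c <= "z" else 64)
        value += high
        out.append(chr(value) if 0 <= value <= 255 else c)
    return "".join(out)


print(repr(gstrans_decode_lenient("|=|1|!")))   # '=1'
```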

BBC BASIC

   10 REM > GSTrans.bbc
   20 REM GSTrans in BASIC
   30 REM J.G.Harston
   40 :
   50 REPEAT
   60   INPUT LINE "GSstring: "A$
   70   A$=FNGS_Decode(A$,0)
   80   A$=FNGS_Encode(A$)
   90   PRINT A$
  100 UNTIL FALSE
  110 END
  120 :
  130 :
  140 :
  150 REM Decode a GSTrans string
  160 REM On entry: inp$=GSTransed string
  170 REM           flg%=0 - parse whole string, *KEY style
  180 REM               =1 - parse until space, filename style (not implemented)
  190 REM Returns:  decoded string
  200 DEFFNGS_Decode(inp$,flg%)
  210 LOCAL out$,byte%,set%
  220 IF LEFT$(inp$,1)=" ":REPEAT:inp$=MID$(inp$,2):UNTIL LEFT$(inp$,1)<>" "
  230 IF LEFT$(inp$,1)="""":IF RIGHT$(inp$,1)="""":inp$=MID$(inp$,2,LENinp$-2)
  240 IF inp$="":=""
  250 REPEAT
  260   byte%=-1:set%=0
  270   IF LEFT$(inp$,2)="|!":set%=128:inp$=MID$(inp$,3)
  280   IF LEFT$(inp$,1)="|":byte%=ASCMID$(inp$,2,1)AND31
  290   IF LEFT$(inp$,2)="||":byte%=ASC"|"
  300   IF LEFT$(inp$,2)="|?":byte%=127
  310   IF LEFT$(inp$,2)="|""":byte%=34
  320   IF LEFT$(inp$,2)="""""":byte%=34
  330   IF byte%<0:byte%=ASC(inp$):inp$=MID$(inp$,2) ELSE inp$=MID$(inp$,3)
  340   out$=out$+CHR$(set%+byte%)
  350 UNTIL inp$=""
  360 =out$
  370 :
  380 REM Encode into a GSTrans string
  390 REM On entry: inp$=raw string
  400 REM Returns:  GSTrans string
  410 DEFFNGS_Encode(inp$)
  420 LOCAL out$,byte%
  430 IF inp$="":=""""""
  440 REPEAT
  450   byte%=ASC(inp$):inp$=MID$(inp$,2)
  460   IF byte%>127:out$=out$+"|!":byte%=byte% AND 127
  470   IF byte%>31 AND byte%<>ASC"""" AND byte%<>ASC"|" AND byte%<>127:out$=out$+CHR$(byte%)
  480   IF byte%<32:out$=out$+"|"+CHR$(byte%+64)
  490   IF byte%=ASC"""":out$=out$+""""""
  500   IF byte%=ASC"|":out$=out$+"||"
  510   IF byte%=127:out$=out$+"|?"
  520 UNTIL inp$=""
  530 =""""+out$+""""
  540 :

No checks on string lengths are done. On decoding, invalid encodings are not reported: the character following | is simply masked with AND 31 (line 280), so, for instance, |4 is decoded as CHR$20.
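The masking on line 280 can be checked by hand. The character after `|` keeps only its low five bits, so `G` (71) and `g` (103) both give 7, `@` (64) gives 0, and any other character also yields some value from 0 to 31. Below is a small Python sketch of that rule for a single two-character token. It ignores the `|!` prefix and the `""` form, and `bbc_bar_value` is an illustrative name:

```python
def bbc_bar_value(token: str) -> int:
    """Byte value produced by a two-character '|x' token (lines 280-310)."""
    if len(token) != 2 or token[0] != "|":
        raise ValueError("expected a '|x' token")
    x = token[1]
    if x == "|":
        return ord("|")          # line 290: || is a literal bar
    if x == "?":
        return 127               # line 300: |? is DEL
    if x == '"':
        return 34                # line 310: |" is a quote
    return ord(x) & 31           # line 280: keep only the low five bits


for token in ["|G", "|g", "|@", "|4"]:
    print(token, bbc_bar_value(token))   # 7, 7, 0 and 20 respectively
```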

Emacs Lisp

"
ASCII code 	Symbols used
0 	|@
1 - 26 	|letter eg |A (or |a) = ASCII 1, |M (or |m) = ASCII 13
27 	|[ or |{
28 	|\
29 	|] or |}
30 	|^ or |~
31 	|_ or |` (grave accent)
32 - 126 	keyboard character, except for:
\" 	|\"
| 	||
< 	|<
127 	|?
128 - 255 	|!coded symbol eg ASCII 128 = |!|@ ASCII 129 = |!|A
"

(defun gst--load-char (encoded)
  (if (gst--is-end encoded)
      (error "Unexpected end.")
    (let ((c (aref (car encoded) (cadr encoded))))
      (setcdr encoded (list (1+ (cadr encoded))))
      c )))

(defun gst--is-end (lst)
  (>= (cadr lst) (length (car lst))))

(defun gst--translate-special (c)
  (cond
   ((eq c ?@) 0)
   ((eq c ?\[) 27)
   ((eq c ?\{) 27)
   ((eq c ?\\) 28)
   ((eq c ?\]) 29)
   ((eq c ?\}) 29)
   ((eq c ?^) 30)
   ((eq c ?~) 30)
   ((eq c ?_) 31)
   ((eq c ?\`) 31)
   ((eq c ?\") ?\")
   ((eq c ?|) ?|)
   ((eq c ?<) ?<)
   ((eq c ??) 127)
   ((and (>= c 65) (<= c 90)) (+ (- c 65) 1))
   ((and (>= c 97) (<= c 122)) (+ (- c 97) 1))
   (t nil)))

(defun gst--load-highpos-token (encoded)
  (let ((c (gst--load-char encoded)) sp)
    (cond
     ((eq c ?|)
      (setq sp (gst--load-char encoded))
      (+ 128 (gst--translate-special sp)))
     ((and (> c 31) (< c 127))
      (+ 128 c))
     (t (error "Not a printable character.")))))

(defun gst--load-token (encoded)
  (let ((c (gst--load-char encoded)) sp)
    (cond
     ((eq c ?|)
      (setq sp (gst--load-char encoded))
      (if (eq sp ?!)
	  (gst--load-highpos-token encoded)
	(gst--translate-special sp)))
     ((and (> c 31) (< c 127)) c)
     (t (error "Not a printable character.")))))

(defun gst-parse (text)
  "Decode the GSTrans string TEXT into a list of byte values."
  (let ((encoded (list text 0)) (decoded '()))
    (while (not (gst--is-end encoded))
      ;; push then nreverse: add-to-list would silently drop repeated bytes
      (push (gst--load-token encoded) decoded))
    (nreverse decoded)))

(progn
  (let ((text "|LHello|G|J|M"))
    (message "%s => %s" text (gst-parse text))))
Output:
|LHello|G|J|M => (12 72 101 108 108 111 7 10 13)

FreeBASIC

Function GSTransEncodeChar(c As Ubyte) As String
    Dim As String resultchars = ""
    If c <= 31 Then
        resultchars &= "|" & Chr(64 + c)
    Elseif c = Asc("""") Then
        resultchars &= "|"""
    Elseif c = Asc("|") Then
        resultchars &= "||"
    Elseif c = 127 Then
        resultchars &= "|?"
    Elseif c >= 128 Then    ' then recurse after subtracting 128
        resultchars &= "|!" & GSTransEncodeChar(c - 128)
    Else
        resultchars &= Chr(c)
    End If
    Return resultchars
End Function

Function GSTransEncode(cad As String) As String
    Dim As String result = ""
    For i As Integer = 1 To Len(cad)
        result &= GSTransEncodeChar(Asc(Mid(cad, i, 1)))
    Next
    Return result
End Function

Function GSTransDecode(cad As String) As String
    Dim As String result = ""
    Dim As Boolean gotbar = False, gotbang = False
    Dim As Integer i, j, bangadd = 0
    For i = 1 To Len(cad)
        Dim As Ubyte c = Asc(Mid(cad, i, 1))
        If gotbang Then
            If c = Asc("|") Then
                bangadd = 128
                gotbar = True
            Else
                result &= Chr(c + 128)
            End If
            gotbang = False
        Elseif gotbar Then
            Select Case c
            Case Asc("?")
                result &= Chr(127 + bangadd)
            Case Asc("!")
                gotbang = True
            Case Asc("|"), Asc(""""), Asc("<")
                result &= Chr(c + bangadd)
            Case Asc("["), Asc("{")
                result &= Chr(27 + bangadd)
            Case Asc("\")
                result &= Chr(28 + bangadd)
            Case Asc("]"), Asc("}")
                result &= Chr(29 + bangadd)
            Case Asc("^"), Asc("~")
                result &= Chr(30 + bangadd)
            Case Asc("_"), Asc("`")
                result &= Chr(31 + bangadd)
            Case Else
                j = Asc(Ucase(Chr(c))) - 64 + bangadd
                result &= Iif(j >= 0, Chr(j), Chr(c))
            End Select
            gotbar = False
            bangadd = 0
        Elseif c = Asc("|") Then
            gotbar = True
        Else
            result &= Chr(c)
        End If
    Next
    Return result
End Function

' test cases
Dim As Integer i, j
Dim As String TESTS(1) = {"ALERT|G", "|LHello|G|J|M"}
Dim As String RAND_TESTS(7)
For i = 0 To Ubound(RAND_TESTS)
    Dim As String rand_str = ""
    For j = 1 To 10
        rand_str &= Chr(Int(Rnd * 256))
    Next
    RAND_TESTS(i) = rand_str
Next
Dim As String DECODE_TESTS(4) = {_
"|LHello|G|J|M", "|m|j|@|e|!t|m|!|?", "abc|1de|5f", "quoted|text", "\r\n\0\x05\xf4\r\xff" }

For i = 0 To Ubound(TESTS)
    Dim As String t = TESTS(i)
    Dim As String encoded = GSTransEncode(t)
    Dim As String decoded = GSTransDecode(encoded)
    Print "String "; t; " encoded is: "; encoded; ", decoded is: "; decoded
    If t <> decoded Then
        Print "Error: Decoded string does not match original"
    End If
Next

For i = 0 To Ubound(RAND_TESTS)
    Dim As String t = RAND_TESTS(i)
    Dim As String encoded = GSTransEncode(t)
    Dim As String decoded = GSTransDecode(encoded)
    Print "String "; t; " encoded is: "; encoded; ", decoded is: "; decoded
    If t <> decoded Then
        Print "Error: Decoded string does not match original"
    End If
Next

For i = 0 To Ubound(DECODE_TESTS)
    Dim As String enc = DECODE_TESTS(i)
    Print !"\nThe encoded string "; enc; " is decoded as: "; GSTransDecode(enc);
Next

Sleep

Java

This example encodes the UTF-8 bytes of the string. Each byte value is checked to be within the range 0..255 (inclusive), and encoding stops with an IllegalArgumentException if one is out of range; because each byte is masked with 0xff, this check is only a safeguard.

Strings being decoded which contain unprintable characters have each such character, c, replaced by the string CHR$(c). Invalid strings such as |5 are decoded as 5.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public final class GSTransStringConversion {

	public static void main(String[] args) {
		List<String> tests = List.of( "ALERT|G", "wert↑", "@♂aN°$ª7Î", "ÙC▼æÔt6¤☻Ì", "\"@)Ð♠qhýÌÿ",
			"+☻#o9$u♠©A", "♣àlæi6Ú.é", "ÏÔ♀È♥@ë", "Rç÷%◄MZûhZ", "ç>¾AôVâ♫↓P" );
		
		for ( String test : tests ) {
			String encoded = encode(test);
			System.out.println(test + " --> " + encoded + " --> " + decode(encoded));
		}
		System.out.println();	
		
		for ( String encoded : List.of ( "|LHello|G|J|M", "|m|j|@|e|!t|m|!|?", "abc|1de|5f" ) ) {
			System.out.println("The encoded string " + encoded + " is decoded as " + decode(encoded));
		}
	}
	
	private static String encode(String text) {
		StringBuilder result = new StringBuilder();
		byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
		for ( int k = 0; k < bytes.length; k++ ) { 
			int charValue = bytes[k] & 0xff;
			if ( charValue < 0 || charValue > 255 ) {
				throw new IllegalArgumentException("Character value is out of range: " + charValue);
			}
			
			StringBuilder chars = new StringBuilder();
			if ( charValue >= 128 ) {
				chars.append('|'); chars.append('!'); 
				charValue -= 128;
			}		

			if ( charValue <= 31 ) {
				chars.append('|'); chars.append((char) ( 64 + charValue ));
			} else if ( charValue == 34 ) {
		        chars.append('|'); chars.append('"');
		    } else if ( charValue == 124 ) {
		        chars.append('|'); chars.append('|');
		    } else if ( charValue == 127 ) {
		        chars.append('|'); chars.append('?');	
			} else {
				chars.append((char) charValue);
			}
			result.append(chars.toString());
		}	
		
		return result.toString();
	}

	private static String decode(String text) {		
		List<Byte> bytes = new ArrayList<Byte>();
	    boolean previousVerticalBar = false;
	    boolean previousExclamationMark = false;
	    int addend = 0;
	    for ( char ch : text.toCharArray() ) {
	    	if ( previousExclamationMark ) {
	            if ( ch == '|' ) {
	                addend = 128;
	                previousVerticalBar = true;
	            } else {
	                bytes.add((byte) ( 128 + ch ));
	            }	          
	            previousExclamationMark = false;
	        } else if ( previousVerticalBar ) {       
	            if ( ch == '?' ) {
	            	bytes.add((byte) ( 127 + addend ));
	            } else if ( ch == '!' ) {
	                previousExclamationMark = true;
	            } else if ( ch == '|' || ch == '"' || ch == '<' ) {
	                bytes.add((byte) ( ch + addend ));
	        	} else if ( ch == '[' || ch == '{' ) {
	                bytes.add((byte) ( 27 + addend ));
	    		} else if ( ch == '\\' ) {
	                bytes.add((byte) ( 28 + addend ));
	    		} else if ( ch == ']' || ch == '}' ) {
	                bytes.add((byte) ( 29 + addend ));
	    		} else if ( ch == '^' || ch == '~' ) {
	                bytes.add((byte) ( 30 + addend ));
	    		} else if ( ch == '_' || ch == '`' ) {
	                bytes.add((byte) ( 31 + addend ));
	    		} else {
	                final int value = Integer.valueOf(Character.toUpperCase(ch)) - 64 + addend;
	                if ( 0 < value && value < 32 ) {
	            		byte[] newBytes = ( "CHR$(" + String.valueOf(value) + ")" ).getBytes();
	            		for ( byte bb : newBytes ) {
	            			bytes.add(bb); 
	            		}
	                } else if ( value > 0 ) {
	                	bytes.add((byte) value);
	                } else {
	                	bytes.add((byte) ch);
	                }
	    		}	            
	            previousVerticalBar = false;
	            addend = 0;
	        } else if ( ch == '|' ) {
	            previousVerticalBar = true;
	        } else {
	            bytes.add((byte) ch);
	        }
	    }
	    
	    String decoded = "";	    
	    List<Byte> highValueBytes = new ArrayList<Byte>();
		for ( int bb = 0; bb < bytes.size(); bb++ ) {
			if ( bytes.get(bb) > 0 ) {
				decoded += decodeHighValueBytes(highValueBytes);				
				decoded += new String( new byte[] { bytes.get(bb) }, StandardCharsets.UTF_8 ); 
			} else {
				highValueBytes.add(bytes.get(bb));		
			}		
		}
		decoded += decodeHighValueBytes(highValueBytes);
		return decoded;		
	}
	
	private static String decodeHighValueBytes(List<Byte> highValueBytes) {
		String result = "";
		if ( ! highValueBytes.isEmpty() ) {
			if ( highValueBytes.size() == 1 ) {
				result += Character.toString(highValueBytes.get(0) & 0xff);
			} else {
				byte[] newBytes = new byte[highValueBytes.size()];
				for ( int j = 0; j < highValueBytes.size(); j++ ) {
					newBytes[j] = highValueBytes.get(j);
				}
				result += new String(newBytes, StandardCharsets.UTF_8);
			}
			highValueBytes.clear();
		}
		return result;
	}

}
Output:
ALERT|G --> ALERT||G --> ALERT|G
wert↑ --> wert|!b|!|F|!|Q --> wert↑
@♂aN°$ª7Î --> @|!b|!|Y|!|BaN|!B|!0$|!B|!*7|!C|!|N --> @♂aN°$ª7Î
ÙC▼æÔt6¤☻Ì --> |!C|!|YC|!b|!|V|!<|!C|!&|!C|!|Tt6|!B|!$|!b|!|X|!;|!C|!|L --> ÙC▼æÔt6¤☻Ì
"@)Ð♠qhýÌÿ --> |"@)|!C|!|P|!b|!|Y|! qh|!C|!=|!C|!|L|!C|!? --> "@)Ð♠qhýÌÿ
+☻#o9$u♠©A --> +|!b|!|X|!;#o9$u|!b|!|Y|! |!B|!)A --> +☻#o9$u♠©A
♣àlæi6Ú.é --> |!b|!|Y|!#|!C|! l|!C|!&i6|!C|!|Z.|!C|!) --> ♣àlæi6Ú.é
ÏÔ♀È♥@ë --> |!C|!|O|!C|!|T|!b|!|Y|!|@|!C|!|H|!b|!|Y|!%@|!C|!+ --> ÏÔ♀È♥@ë
Rç÷%◄MZûhZ --> R|!C|!'|!C|!7%|!b|!|W|!|DMZ|!C|!;hZ --> Rç÷%◄MZûhZ
ç>¾AôVâ♫↓P --> |!C|!'>|!B|!>A|!C|!4V|!C|!|"|!b|!|Y|!+|!b|!|F|!|SP --> ç>¾AôVâ♫↓P

The encoded string |LHello|G|J|M is decoded as CHR$(12)HelloCHR$(7)CHR$(10)CHR$(13)
The encoded string |m|j|@|e|!t|m|!|? is decoded as CHR$(13)CHR$(10)@CHR$(5)ôCHR$(13)ÿ
The encoded string abc|1de|5f is decoded as abc1de5f
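The Java decoder's display convention can be sketched separately from the decoding itself. Below is a minimal Python sketch. It assumes the decoded result is available as bytes and that values 32-255 may be shown as their Latin-1 characters; the Java code instead reassembles runs of high bytes as UTF-8. This sketch writes byte 0 as CHR$(0) as well. `show_controls` is an illustrative name:

```python
def show_controls(data: bytes) -> str:
    """Return decoded bytes as text, writing each control code c (0-31) as CHR$(c)."""
    parts = []
    for b in data:
        if b < 32:
            parts.append(f"CHR$({b})")
        else:
            parts.append(chr(b))   # assumption: bytes 32-255 shown as Latin-1 characters
    return "".join(parts)


print(show_controls(b"\x0cHello\x07\n\r"))   # CHR$(12)HelloCHR$(7)CHR$(10)CHR$(13)
```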

jq

Adapted from Wren

Works with: jq

Works with gojq, the Go implementation of jq

Strings in jq are just JSON strings, and therefore the constituent codepoints are not restricted to 8-bit bytes. The `encode` and `decode` filters presented here, however, only check that their inputs are non-empty JSON strings.
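These jq filters work on code points rather than UTF-8 bytes. Their output for non-ASCII text therefore differs from the byte-oriented versions elsewhere on this page. For example, the Algol 68 and Java outputs encode "wert↑" as wert|!b|!|F|!|Q. Below is a short Python sketch of the two interpretations, assuming the same per-value rule is applied to each. The helper names are illustrative:

```python
def encode_low(v: int) -> str:
    """Encode a value below 128 (larger values are passed through unchanged)."""
    if v < 32:
        return "|" + chr(v + 64)
    if v == 127:
        return "|?"
    if chr(v) in '"|':
        return "|" + chr(v)
    return chr(v)


def encode_value(v: int) -> str:
    """Prefix |! for values of 128 or more, then encode v - 128."""
    return "|!" + encode_low(v - 128) if v >= 128 else encode_low(v)


text = "wert↑"                                  # U+2191 is E2 86 91 in UTF-8
by_bytes = "".join(encode_value(b) for b in text.encode("utf-8"))
by_code_points = "".join(encode_value(ord(ch)) for ch in text)
print(by_bytes)        # wert|!b|!|F|!|Q
print(by_code_points)  # wert|!ℑ  (8593 - 128 = 8465, U+2111: not an 8-bit GSTrans string)
```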

def encode($upper):
  # helper function to encode bytes < 128
  def f: 
    if (. >= 1 and . <= 26)
    then "|" + (if $upper then [. + 64]|implode else [. + 96]|implode end)
    elif . < 32
    then "|" + ([. + 64] | implode)
    elif . == 34   # quotation mark           
    then "|\""
    elif . == 60   # less than
    then "|<"
    elif . == 124  # vertical bar
    then "||"
    elif . == 127  # DEL
    then "|?"
    else [.]|implode
    end ;
  . as $s
  | if ($s | (type != "string") or (length == 0)) then "Argument of encode must be a non-empty string." | error
    else # remove any outer quotation marks
    ($s | if (length > 1 and .[:1] == "\"" and .[-1:] == "\"") then .[1:-1] else . end) as $s
    # iterate through the string's codepoints
    | reduce ($s|explode)[] as $b ( {enc: ""};
             if $b < 128 then .enc += ($b|f)
             else .enc +=  "|!" + (($b - 128)|f)
             end)
    | .enc
    end;

def decode:
  # helper function for decoding bytes after "|"
  def f:
    if . == 34                  # quotation mark
    then 34
    elif . == 60                # less than
    then 60
    elif . == 63                # question mark
    then 127
    elif . >= 64 and . < 96     # @ + upper case letter + [\]^_
    then . - 64
    elif . == 96                # grave accent
    then 31
    elif . == 124               # vertical bar
    then 124
    elif . >= 97 and . < 127    # lower case letter + {}~
    then . - 96
    else .
    end;
  . as $s
  | if ($s | (type != "string") or (length == 0)) then "Argument of decode must be a non-empty string." | error
    else
    # remove any outer quotation marks
    ($s | if (length > 1 and .[:1] == "\"" and .[-1:] == "\"") then $s[1:-1] else . end) as $s
    | ($s|explode) as $bytes
    | ($bytes|length) as $bc
    | {i: 0, dec: "" }
    # iterate through the string's bytes decoding as we go
    | until(.i >= $bc;
        if $bytes[.i] != 124
        then .dec += ([$bytes[.i]] | implode)
        | .i += 1
        else
          if (.i < $bc - 1) and ($bytes[.i+1] != 33)
          then .dec += ([$bytes[.i+1] | f ] | implode)
          | .i += 2
          else
            if (.i < $bc - 2) and ($bytes[.i+2] != 124)
            then .dec += ([128 + $bytes[.i+2]] | implode)
            | .i += 3
            else
              if (.i < $bc - 3) and ($bytes[.i+2] == 124) 
              then .dec += ([128 + ($bytes[.i+3] | f)] | implode)
              | .i += 4 
              else .i += 1
              end
            end
          end
        end)
    | .dec
    end;

def strings: [
  "\fHello\u0007\n\r",
  "\r\n\u0000\u0005\u00f4\r\u00ff"
];

def uppers: [true, false];

def task1:
  range(0; strings|length) as $i
  | strings[$i]
  | uppers[] as $u
  | encode($u) as $enc
  | ($enc|decode) as $dec
  | "string: \(tojson)",
    "encoded (\(if $u then "upper" else "lower" end)) : \($enc|tojson)",
    "decoded : \($dec|tojson)",
    "string == decoded ? \($dec == .)\n"
;

def jstrings:[
    "ALERT|G",
    "wert↑",
    "@♂aN°$ª7Î",
    "ÙC▼æÔt6¤☻Ì",
    "\"@)Ð♠qhýÌÿ",
    "+☻#o9$u♠©A",
    "♣àlæi6Ú.é",
    "ÏÔ♀È♥@ë",
    "Rç÷\\%◄MZûhZ",
    "ç>¾AôVâ♫↓P"
];

def task2:
  "Julia strings: string -> encoded (upper) <- decoded (same or different)\n",
  ( jstrings[]
    | encode(true) as $enc
    | ($enc|decode) as $dec
    | "  \(tojson) -> \($enc|tojson) <- \($dec|tojson) (\( if . == $dec then "same" else "different" end))"
  );

task1, task2

Invocation: jq -nr -f gstrans.jq

Output:
string: "\fHello\u0007\n\r"
encoded (upper) : "|LHello|G|J|M"
decoded : "\fHello\u0007\n\r"
string == decoded ? true

string: "\fHello\u0007\n\r"
encoded (lower) : "|lHello|g|j|m"
decoded : "\fHello\u0007\n\r"
string == decoded ? true

string: "\r\n\u0000\u0005ô\rÿ"
encoded (upper) : "|M|J|@|E|!t|M|!|?"
decoded : "\r\n\u0000\u0005ô\rÿ"
string == decoded ? true

string: "\r\n\u0000\u0005ô\rÿ"
encoded (lower) : "|m|j|@|e|!t|m|!|?"
decoded : "\r\n\u0000\u0005ô\rÿ"
string == decoded ? true

Julia strings: string -> encoded (upper) <- decoded (same or different)

  "ALERT|G" -> "ALERT||G" <- "ALERT|G" (same)
  "wert↑" -> "wert|!ℑ" <- "wert↑" (same)
  "@♂aN°$ª7Î" -> "@|!◂aN|!0$|!*7|!N" <- "@♂aN°$ª7Î" (same)
  "ÙC▼æÔt6¤☻Ì" -> "|!YC|!┼|!f|!Tt6|!$|!▻|!L" <- "ÙC▼æÔt6¤☻Ì" (same)
  "\"@)Ð♠qhýÌÿ" -> "|\"@)|!P|!◠qh|!}|!L|!|?" <- "\"@)Ð♠qhýÌÿ" (same)
  "+☻#o9$u♠©A" -> "+|!▻#o9$u|!◠|!)A" <- "+☻#o9$u♠©A" (same)
  "♣àlæi6Ú.é" -> "|!◣|!`l|!fi6|!Z.|!i" <- "♣àlæi6Ú.é" (same)
  "ÏÔ♀È♥@ë" -> "|!O|!T|!◀|!H|!◥@|!k" <- "ÏÔ♀È♥@ë" (same)
  "Rç÷\\%◄MZûhZ" -> "R|!g|!w\\%|!╄MZ|!{hZ" <- "Rç÷\\%◄MZûhZ" (same)
  "ç>¾AôVâ♫↓P" -> "|!g>|!>A|!tV|!b|!◫|!ℓP" <- "ç>¾AôVâ♫↓P" (same)

Julia

""" 
ASCII code	Symbols used
0	        |@
1 - 26	    |letter eg |A (or |a) = ASCII 1, |M (or |m) = ASCII 13
27	        |[ or |{
28	        |\
29	        |] or |}
30	        |^ or |~
31	        |_ or |` (grave accent)
32 - 126	keyboard character, except for:
"	        |"
|	        ||
<	        |<
127	        |?
128 - 255	|!coded symbol eg ASCII 128 = |!|@ ASCII 129 = |!|A

See also www.riscos.com/support/developers/prm/conversions.html
"""

""" 
    function GSTrans_encode

    3 methods.

    Encode a string by converting a potentially Unicode string to codeunit bytes and
    then to a vector of Chars with values 0-255, then passing that vector to the
    encoding method for vectors.

    Encode a Vector of Char by encoding its individual Chars and concatenating the results.

    Encode a single Char as a GSTrans string of 1 or more chars.
    To avoid Unicode multibyte glitches, throw an assertion error if any Chars
    are multibyte (so, 0 <= integer value of Char c <= 255).
"""
GSTrans_encode(str::AbstractString) = GSTrans_encode(Char.(transcode(UInt8, str)))
GSTrans_encode(a::Vector{Char}) = String(mapreduce(GSTrans_encode, vcat, a, init = Char[]))
function GSTrans_encode(c::Char)
    i = Int(c)
    @assert 0 <= i <= 255 "Char value of $c, $i, is out of range"
    resultchars = Char[]
    if 0 <= i <= 31
        push!(resultchars, '|', Char(64 + i))
    elseif c == '"'
        push!(resultchars, '|', '"')
    elseif c == '|'
        push!(resultchars, '|', '|')
    elseif i == 127
        push!(resultchars, '|', '?')
    elseif 128 <= i <= 255 # |! then recurse after subtracting 128
        push!(resultchars, '|', '!', GSTrans_encode(Char(i - 128))...)
    else
        push!(resultchars, c)
    end
    return resultchars
end

"""
    function GSTrans_decode(str::AbstractString)

    Decode a GSTrans coded string back to its original format. If an encoding
    error such as "|1" would make the decoded value negative, the character is
    used as-is, without subtracting 64, as in the Wren and Phix examples, so
    that "|1" becomes '1'.
"""
function GSTrans_decode(str::AbstractString)
    result = UInt8[]
    gotbar, gotbang, bangadd = false, false, 0
    for c in str
        if gotbang
            if c == '|'
                bangadd = 128
                gotbar = true
            else
                push!(result, Char(Int(c) + 128))
            end
            gotbang = false
        elseif gotbar       
            if c == '?'
                push!(result, Char(127 + bangadd))
            elseif c == '!'
                gotbang = true
            elseif c == '|' || c == '"' || c == '<'
                push!(result, Char(Int(c) + bangadd))
            elseif c == '[' || c == '{'
                push!(result, Char(27 + bangadd))
            elseif c == '\\'
                push!(result, Char(28 + bangadd))
            elseif c == ']' || c == '}'
                push!(result, Char(29 + bangadd))
            elseif c == '^' || c == '~'
                push!(result, Char(30 + bangadd))
            elseif c == '_' || c == '`'
                push!(result, Char(31 + bangadd))
            else
                i = Int(uppercase(c)) - 64 + bangadd
                push!(result, i >= 0 ? Char(i) : c)
            end
            gotbar, bangadd = false, 0
        elseif c == '|'
            gotbar = true
        else
            push!(result, Char(c))
        end
    end
    return String(result)
end

const TESTS = ["ALERT|G", "wert↑"]
const RAND_TESTS = [String(Char.(rand(0:255, 10))) for _ in 1:8]
const DECODE_TESTS = ["|LHello|G|J|M", "|m|j|@|e|!t|m|!|?", "abc|1de|5f"]

for t in [TESTS; RAND_TESTS]
    encoded = GSTrans_encode(t)
    decoded = GSTrans_decode(encoded)
    println("String $t encoded is: $encoded, decoded is: $decoded.")
    @assert t == decoded
end

for enc in DECODE_TESTS
    print("Encoded string $enc decoded is: ")
    display(GSTrans_decode(enc))
end
Output:
String ALERT|G encoded is: ALERT||G, decoded is: ALERT|G.
String wert↑ encoded is: wert|!b|!|F|!|Q, decoded is: wert↑.
String @♂aN°$ª7Î encoded is: @|KaN|!B|!0$|!B|!*7|!B|!|R|!C|!|N, decoded is: @♂aN°$ª7Î.
String ÙC▼æÔt6¤☻Ì encoded is: |!C|!|YC|_|!C|!&|!C|!|Tt6|!B|!$|B|!C|!|L, decoded is: ÙC▼æÔt6¤☻Ì.
String "@)Ð♠qhýÌÿ encoded is: |"@)|!C|!|P|Fqh|!C|!=|!C|!|L|!C|!?, decoded is: "@)Ð♠qhýÌÿ.
String +☻#o9$u♠©A encoded is: +|B#o9$u|F|!B|!)A, decoded is: +☻#o9$u♠©A.
String ♣àlæi6Ú.é encoded is: |E|!C|! l|!B|!|K|!C|!&i6|!C|!|Z.|!C|!), decoded is: ♣àlæi6Ú.é.
String ÏÔ♀È♥@ë encoded is: |!C|!|O|!C|!|T|!B|!|[|Lj|!C|!|H|C@|!B|!|I|!C|!+, decoded is: ÏÔ♀È♥@ë.
String Rç÷%◄MZûhZ encoded is: R|!C|!'|!C|!7%|QMZ|!C|!;hZ, decoded is: Rç÷%◄MZûhZ.
String ç>¾AôVâ♫↓P encoded is: |!C|!'>|!B|!>A|!C|!4V|!C|!|"|N|YP, decoded is: ç>¾AôVâ♫↓P.
Encoded string |LHello|G|J|M decoded is: "\fHello\a\n\r"
Encoded string |m|j|@|e|!t|m|!|? decoded is: "\r\n\0\x05\xf4\r\xff"
Encoded string abc|1de|5f decoded is: "abc1de5f"

Perl

Translation of: Emacs Lisp
# 20240911 Perl programming solution

use strict;
use warnings;

sub gst_load_char {
   my ($encoded) = @_;
   if (gst_is_end($encoded)) { die "Unexpected end." }
   my $c = substr($encoded->[0], $encoded->[1], 1);
   $encoded->[1]++;
   return $c;
}

sub gst_is_end {
   my ($lst) = @_;
   return $lst->[1] >= length($lst->[0]);
}

sub gst_translate_special {
   my ($c) = @_;
   return            0 if $c eq '@';
   return           27 if $c eq '[' || $c eq '{';
   return           28 if $c eq '\\';
   return           29 if $c eq ']' || $c eq '}';
   return           30 if $c eq '^' || $c eq '~';
   return           31 if $c eq '_' || $c eq '`';
   return          127 if $c eq '?';
   return ord($c)      if $c eq '"' || $c eq '|' || $c eq '<';
   return ord($c) - 64 if $c ge 'A' && $c le 'Z';
   return ord($c) - 96 if $c ge 'a' && $c le 'z';
   return undef;
}

sub gst_load_highpos_token {
   my ($encoded) = @_;
   if ( ( my $c = gst_load_char($encoded) ) eq '|') {
      my $sp = gst_load_char($encoded);
      return 128 + gst_translate_special($sp);
   } elsif ($c gt chr(31) && $c lt chr(127)) {
      return 128 + ord($c);
   } 
   die "Not a printable character.";
}

sub gst_load_token {
   my ($encoded) = @_;
   if ( ( my $c = gst_load_char($encoded) ) eq '|') { 
      my $sp = gst_load_char($encoded);
      return ($sp eq '!') ? gst_load_highpos_token($encoded)
                          : gst_translate_special($sp);
   } elsif ($c gt chr(31) && $c lt chr(127)) {
      return ord($c);
   }
   die "Not a printable character.";
}

sub gst_parse {
   my ($text) = @_;
   my ($encoded, @decoded) = ( [$text, 0], );
   while (!gst_is_end($encoded)) { push @decoded, gst_load_token($encoded) }
   return \@decoded;
}

my $decoded = gst_parse(my $text = "|LHello|G|J|M");
print "$text => (" . join(", ", @$decoded) . ")\n";


Phix

Note that all those Unicode strings work fine in a browser and on Linux but look horrible in a Windows console, so I left them out,
and in fact wrote a hexstr() routine, rather similar to the two routines actually asked for, just to improve the console display a little.
Also, the following always encodes to uppercase, but the decode part will properly cope with (eg) "|m|j|@|e|!t|m|!|?".
As per Wren, strings in Phix are just sequences of bytes: UTF-8 or similar is completely irrelevant here, and won't mess up byte subscripting.
Since strings are a sequence of (unsigned) bytes, there can be no encoding errors for anything that passes the typecheck of the "string s" parameter.
For decoding, explicit assertion failures occur for unprintable characters, multiple high bits such as "|!|!", or generating negative bytes such as from "|1".

with javascript_semantics
function GSTrans_encode(string s)
    string res = ""
    for b in s do
        if b>=128 then
            res &= "|!"
            b -= 128
        end if
        if b<' ' then
            res &= "|"&('@'+b)
        elsif find(b,`"|<`) then
            res &= "|"&b
        elsif b='\x7F' then
            res &= "|?"
        else
            res &= b
        end if
    end for
    return res
end function

function GSTrans_decode(string s)
    string res = ""
    bool bar = false
    integer hb = #00
    for b in s do
        assert(b>=' ' and b<='~',"non-printable character")
        if bar then
            if b='!' then
                assert(hb==#00,"high bit already set")
                hb = #80
            else
                if b='?' then 
                    b = #7F
                elsif b='`' then
                    b = #1F -- |` is an alternative spelling of |_
                elsif not find(b,`"|<`) then
                    b -= iff(b>='a'?#60:#40)
                    assert(b>=0,"negative byte generated")
                end if
                res &= b+hb
                hb = #00
            end if
            bar = false
        elsif b='|' then
            bar = true
        else
            res &= b+hb
            hb = #00
        end if
    end for
    return res
end function

function hexstr(string s)
    string res = ""
    for b in s do
        if b>=' ' and b<='~' then
            res &= b
        else
            integer k = find(b,"\r\n\t\0")
            if k then
                res &= '\\'&("rnt0"[k])
            else
                res &= sprintf("\\x%02x",b)
            end if
        end if
    end for
    return res
end function

constant tests = {"\x0CHello\x07\n\r",
                  "\r\n\0\x05\xF4\r\xFF"}
for t in tests do
    string e = GSTrans_encode(t),
           d = GSTrans_decode(e),
          ht = hexstr(t),
          he = hexstr(e) 
    printf(1,"%s <-> %s (decoded same:%t)\n",{ht,he,d=t})
end for
--assertion failures:
--?hexstr(GSTrans_decode("|!|!"))
--?hexstr(GSTrans_decode("|!|1"))
--?hexstr(GSTrans_decode("|1"))
--?hexstr(GSTrans_decode("\xF4"))
Output:
\x0CHello\x07\n\r <-> |LHello|G|J|M (decoded same:true)
\r\n\0\x05\xF4\r\xFF <-> |M|J|@|E|!t|M|!|? (decoded same:true)
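As a hedged illustration of the Phix error-checking notes, here is a minimal Python sketch of a strict decoder that raises ValueError in the same three situations: non-printable input, a repeated |! prefix, and an escape such as |1 that would produce a negative byte. The name gstrans_decode_strict is hypothetical. Unlike the Phix routine, this sketch also treats a trailing "|" or "|!" as an error.

```python
def gstrans_decode_strict(s: str) -> bytes:
    """Decode a GSTrans string, raising ValueError on malformed input.

    Hypothetical helper (not from any library). The checks mirror the Phix
    assertions, plus an extra check for a dangling trailing "|" or "|!".
    """
    out = bytearray()
    bar = False   # previous character was an unconsumed '|'
    high = 0      # 0x80 once a |! prefix has been seen
    for ch in s:
        b = ord(ch)
        if not 0x20 <= b <= 0x7E:
            raise ValueError(f"non-printable character {ch!r}")
        if bar:
            bar = False
            if ch == "!":
                if high:
                    raise ValueError("high bit already set")
                high = 0x80
                continue
            if ch == "?":
                b = 0x7F
            elif ch == "`":
                b = 0x1F          # |` is an alternative to |_
            elif ch not in '"|<':
                # |@ .. |_ map to 0..31, |a .. |~ map to 1..30
                b -= 0x60 if ch >= "a" else 0x40
                if b < 0:
                    raise ValueError(f"invalid escape |{ch}")
            out.append(b | high)
            high = 0
        elif ch == "|":
            bar = True
        else:
            out.append(b | high)
            high = 0
    if bar or high:
        raise ValueError("unexpected end of input")
    return bytes(out)
```

For example, gstrans_decode_strict("|m|j|@|e|!t|m|!|?") returns b"\r\n\x00\x05\xf4\r\xff", while "|!|!", "|1" and "abc|" all raise ValueError.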

Python

By default, errors during decoding will raise a KeyError. If the optional default argument is given to gs_trans_decode, it will replace any erroneous symbols with the default byte instead of raising a KeyError.

"""GS byte string translation using an exhaustive map and regex reverse lookup.

Requires Python >= 3.9.
"""
import re
from typing import Optional

TABLE: dict[int, bytes] = {
    0: b"|@",
    **{byte: f"|{chr(byte+64)}".encode() for byte in range(1, 27)},
    27: b"|[",
    28: b"|\\",
    29: b"|]",
    30: b"|^",
    31: b"|_",
    **{byte: chr(byte).encode() for byte in range(32, 127)},
    34: b'|"',
    60: b"|<",
    124: b"||",
    127: b"|?",
}

# 128 - 255
TABLE.update({byte: b"|!" + TABLE[byte - 128] for byte in range(128, 256)})

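# Alternative spellings accepted when decoding: lower-case letters, |{ |} |~ |`
ALIASES: dict[bytes, int] = {
    b"|{": 27,
    b"|}": 29,
    b"|~": 30,
    b"|`": 31, # aka backtick
    **{f"|{chr(byte+96)}".encode(): byte for byte in range(1, 27)},
}

REVERSE_LOOKUP: dict[bytes, int] = {
    **{v: k for k, v in TABLE.items()},
    **ALIASES,
    # the same alternatives with the top bit set, e.g. |!|a -> 129
    **{b"|!" + seq: byte + 128 for seq, byte in ALIASES.items()},
}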

RE = re.compile(b"|".join(re.escape(s) for s in REVERSE_LOOKUP) + b"|.")


def gs_trans_encode(s: bytes) -> bytes:
    return b"".join(TABLE[byte] for byte in s)


def gs_trans_decode(s: bytes, default: Optional[int] = None) -> bytes:
    if default is None:
        return bytes(REVERSE_LOOKUP[seq] for seq in RE.findall(s))
    return bytes(REVERSE_LOOKUP.get(seq, default) for seq in RE.findall(s))


examples: list[bytes] = [
    b"\x0CHello\x07\n\r",
    b"\r\n\0\x05\xF4\r\xFF",
]

if __name__ == "__main__":
    for example in examples:
        encoded = gs_trans_encode(example)
        print(f"{example!r} -> {encoded!r}")
        assert gs_trans_decode(encoded) == example
Output:
b'\x0cHello\x07\n\r' -> b'|LHello|G|J|M'
b'\r\n\x00\x05\xf4\r\xff' -> b'|M|J|@|E|!t|M|!|?'

Raku

Translation of: Julia
# 20231105 Raku programming solution

sub GSTrans-encode(Str $str) {
   return [~] $str.encode('utf8').list.chrs.comb.map: -> $c { 
      my $i = $c.ord;
      die "Char value of $c, $i, is out of range" unless 0 <= $i <= 255;
      given ($i,$c) { 
         when 0 <= $i <= 31    { '|' ~ chr(64 + $i) } 
         when $c eq '"'        { '|"' }
         when $c eq '|'        { '||' }
         when $i == 127        { '|?' }
         when 128 <= $i <= 255 { '|!' ~ GSTrans-encode(chr($i - 128)) }
         default               { $c }
      }
   }
}

sub GSTrans-decode(Str $str) {
   my ($gotbar, $gotbang, $bangadd) = False, False, 0;

   my @result = gather for $str.comb -> $c {
      if $gotbang {
         if $c eq '|' {
            $bangadd = 128;
            $gotbar = True;
         } else {
            take $c.ord + 128;
         }
         $gotbang = False;
      } elsif $gotbar {
         given $c {
            when $c eq '?' { take 127 + $bangadd }
            when $c eq '!' { $gotbang = True }
            when $c eq '|' || $c eq '"' || $c eq '<' { take $c.ord + $bangadd }
            when $c eq '[' || $c eq '{' { take 27 + $bangadd } 
            when $c eq '\\' { take 28 + $bangadd } 
            when $c eq ']' || $c eq '}' { take 29 + $bangadd } 
            when $c eq '^' || $c eq '~' { take 30 + $bangadd }
            when $c eq '_' || $c eq '`' { take 31 + $bangadd }
            default { my $i = $c.uc.ord - 64 + $bangadd;
                      take $i >= 0 ?? $i !! $c.ord      }
         }
         $gotbar = False;
         $bangadd = 0;
      } elsif $c eq '|' {
         $gotbar = True
      } else {
         take $c.ord
      }
   }
   return Blob.new(@result).decode('utf8c8')
}

my @TESTS = <ALERT|G 'wert↑>;
my @RAND_TESTS = ^256 .roll(10).chrs.join xx 8;
my @DECODE_TESTS = < |LHello|G|J|M |m|j|@|e|!t|m|!|? abc|1de|5f >; 

for |@TESTS, |@RAND_TESTS -> $t {
   my $encoded = GSTrans-encode($t);
   my $decoded = GSTrans-decode($encoded);
   say "String $t encoded is: $encoded, decoded is: $decoded.";
   die unless $t ~~ $decoded;
}
for @DECODE_TESTS -> $enc {
    say "Encoded string $enc decoded is: ", GSTrans-decode($enc);
}


Rust

On error, the decoder prints a warning to stderr and skips the "|" that starts the bad sequence, so |1 becomes 1, as in the Wren example.

/* GSTrans encoding and decoding */

use std::collections::HashMap;
use std::iter::FromIterator;

/* encoding lookup table */
const ENCODE_TABLE: &[&str] = &[
    "|@",  "|A",  "|B",  "|C",  "|D",  "|E",  "|F", "|G",
    "|H",  "|I",  "|J",  "|K",  "|L",  "|M",  "|N", "|O",
    "|P",  "|Q",  "|R",  "|S",  "|T",  "|U",  "|V", "|W",
    "|X",  "|Y",  "|Z",  "|[",  "|\\", "|]",  "|^", "|_",
    " ",   "!",   "|\"", "#",   "$",   "%",   "&",  "\'",
    "(",   ")",   "*",   "+",   ",",   "-",   ".",  "/",
    "0",   "1",   "2",   "3",   "4",   "5",   "6",  "7",
    "8",   "9",   ":",   ";",   "|<",  "=",   ">",  "?",
    "@",   "A",   "B",   "C",   "D",   "E",   "F",  "G",
    "H",   "I",   "J",   "K",   "L",   "M",   "N",  "O",
    "P",   "Q",   "R",   "S",   "T",   "U",   "V",  "W",
    "X",   "Y",   "Z",   "[",   "\\",  "]",   "^",  "_",
    "`",   "a",   "b",   "c",   "d",   "e",   "f",  "g",
    "h",   "i",   "j",   "k",   "l",   "m",   "n",  "o",
    "p",   "q",   "r",   "s",   "t",   "u",   "v",  "w",
    "x",   "y",   "z",   "{",   "||",  "}",   "~",  "|?",
    "|!|@","|!|A","|!|B","|!|C","|!|D","|!|E","|!|F","|!|G",
    "|!|H","|!|I","|!|J","|!|K","|!|L","|!|M","|!|N","|!|O",
    "|!|P","|!|Q","|!|R","|!|S","|!|T","|!|U","|!|V","|!|W",
    "|!|X","|!|Y","|!|Z","|!|[","|!|\\","|!|]","|!|^","|!|_",
    "|! ","|!!","|!|\"", "|!#", "|!$", "|!%", "|!&", "|!\'",
    "|!(","|!)","|!*",   "|!+", "|!,", "|!-", "|!.", "|!/",
    "|!0", "|!1", "|!2", "|!3", "|!4", "|!5", "|!6", "|!7",
    "|!8", "|!9", "|!:", "|!;", "|!|<",  "|!=", "|!>", "|!?",
    "|!@", "|!A", "|!B", "|!C", "|!D", "|!E", "|!F", "|!G",
    "|!H", "|!I", "|!J", "|!K", "|!L", "|!M", "|!N", "|!O",
    "|!P", "|!Q", "|!R", "|!S", "|!T", "|!U", "|!V", "|!W",
    "|!X", "|!Y", "|!Z", "|![", "|!\\","|!]", "|!^", "|!_",
    "|!`", "|!a", "|!b", "|!c", "|!d", "|!e", "|!f", "|!g",
    "|!h", "|!i", "|!j", "|!k", "|!l", "|!m", "|!n", "|!o",
    "|!p", "|!q", "|!r", "|!s", "|!t", "|!u", "|!v", "|!w",
    "|!x", "|!y", "|!z", "|!{", "|!||","|!}", "|!~", "|!|?",
];

// Encode a string into GSTrans form, one UTF-8 byte at a time. Every byte is
// in the range 0..=255, so the table lookup cannot go out of bounds.
fn gs_trans_encode(txt: &str) -> String {
    return txt
        .as_bytes()
        .iter()
        .map(|c| ENCODE_TABLE[*c as usize])
        .collect::<Vec<_>>()
        .join("");
}

// Decode GSTrans coded text. Uses a lookup table `table`. If table lookup fails
// at any point, will emit a warning to stderr and skip the char at that index.
fn gs_trans_decode(txt: &str, table: &HashMap<&&str, usize>) -> String {
    let mut result = Vec::<u8>::new();
    let mut i = 0;
    let mut substr;
    let mut uppersubstr: String;
    while i < txt.len() {
        let mut foundchar = false;
        let mut decoded = 0_usize;
        for j in 0..5 {
            if i + j > txt.len() {
                break;
            }
            substr = &txt[i..i + j];
            if j == 2 || j == 4 { // match |a as |A in the table
                uppersubstr = substr.to_uppercase();
                substr = &uppersubstr;
            }
            if table.contains_key(&substr) {
                decoded = table[&substr];
                foundchar = true;
                i += j;
                break;
            }
        }
        if foundchar {
            result.push(decoded as u8);
        } else { // error found: skip one char in the bad encoding, so "|1" becomes "1"
            eprintln!("Warning: Bad encoding at position {}, skipped a char", i);
            i += 1;
        }
    }
    return String::from_utf8_lossy(&result).to_string(); // back to utf8 from bytes
}

fn main() {
    // decoding lookup table
    let mut decode_table =
       HashMap::from_iter(ENCODE_TABLE.iter().enumerate().map(|(i, v)| (v, i)));
    for (v, k) in
        [(27, &"|{"), (29, &"|}"), (30, &"|~"), (31, &"|`",),
         (155, &"|!|{"), (157, &"|!|}"), (158, &"|!|~"), (159, &"|!|`",),] {
            decode_table.insert(k, v);
        }
    for test in ["ALERT|G", "wert↑", "@♂aN°$ª7Î", "ÙC▼æÔt6¤☻Ì", "\"@)Ð♠qhýÌÿ",
                       "+☻#o9$u♠©A", "♣àlæi6Ú.é", "ÏÔ♀È♥@ë", "Rç÷\\%◄MZûhZ", "ç>¾AôVâ♫↓P"] {
        let encoded = gs_trans_encode(test);
        let decoded = gs_trans_decode(&encoded, &decode_table);
        println!("Test string {}, encoded: {}, then decoded: {}", test, encoded, decoded);
        assert!(test == decoded);
    }
    for test in [&"|LHello|G|J|M", &"|m|j|@|e|!t|m|!|?", &"abc|1de|5f"] {
        let decoded = gs_trans_decode(test, &decode_table);
        println!("Test string {} decoded is: {}", test, decoded);
    }
}
Output:
Test string ALERT|G, encoded: ALERT||G, then decoded: ALERT|G
Test string wert↑, encoded: wert|!b|!|F|!|Q, then decoded: wert↑
Test string @♂aN°$ª7Î, encoded: @|!b|!|Y|!|BaN|!B|!0$|!B|!*7|!C|!|N, then decoded: @♂aN°$ª7Î
Test string ÙC▼æÔt6¤☻Ì, encoded: |!C|!|YC|!b|!|V|!|<|!C|!&|!C|!|Tt6|!B|!$|!b|!|X|!;|!C|!|L, then decoded: ÙC▼æÔt6¤☻Ì
Test string "@)Ð♠qhýÌÿ, encoded: |"@)|!C|!|P|!b|!|Y|! qh|!C|!=|!C|!|L|!C|!?, then decoded: "@)Ð♠qhýÌÿ
Test string +☻#o9$u♠©A, encoded: +|!b|!|X|!;#o9$u|!b|!|Y|! |!B|!)A, then decoded: +☻#o9$u♠©A
Test string ♣àlæi6Ú.é, encoded: |!b|!|Y|!#|!C|! l|!C|!&i6|!C|!|Z.|!C|!), then decoded: ♣àlæi6Ú.é
Test string ÏÔ♀È♥@ë, encoded: |!C|!|O|!C|!|T|!b|!|Y|@|!C|!|H|!b|!|Y|!%@|!C|!+, then decoded: ÏÔ♀È♥@ë
Test string Rç÷\%◄MZûhZ, encoded: R|!C|!'|!C|!7\%|!b|!|W|!|DMZ|!C|!;hZ, then decoded: Rç÷\%◄MZûhZ
Test string ç>¾AôVâ♫↓P, encoded: |!C|!'>|!B|!>A|!C|!4V|!C|!|"|!b|!|Y|!+|!b|!|F|!|SP, then decoded: ç>¾AôVâ♫↓P
Test string |LHello|G|J|M decoded is: �Hello�

Test string |m|j|@|e|!t|m|!|? decoded is: 
���
�
Test string abc|1de|5f decoded is: abc1de5f
Warning: Bad encoding at position 3, skipped a char
Warning: Bad encoding at position 7, skipped a char

Without table lookup

Translation of: Julia
fn gs_char_encode(i: u8) -> String {
    let mut resultchars = Vec::<u8>::new();
    match i {
        0..=31 => { resultchars.extend(['|' as u8, 64 + i]) }
        0x22 => { resultchars.extend(['|' as u8, '"' as u8]) }
        0x7c => { resultchars.extend(['|' as u8, '|' as u8]) }
        127 => { resultchars.extend(['|' as u8, '?' as u8]) }
        128..=255 => { // |! then recurse after subtracting 128
            resultchars.extend(['|' as u8, '!' as u8]);
            resultchars.extend(gs_char_encode(i - 128).as_bytes());
        }
        _ => { resultchars.push(i) }
    }
    return String::from_utf8_lossy(&resultchars).to_string();
}

fn gs_trans_encode(s: &str) -> String {
    return s.as_bytes().iter().map(|byt| gs_char_encode(*byt)).collect::<Vec<_>>().join("");
}

fn gs_trans_decode(s: &str) -> String {
    let mut result = Vec::<u8>::new();
    let mut gotbar = false;
    let mut gotbang = false;
    let mut bangadd = 0;
    for c in s.chars() {
        let i = c as u8;
        if gotbang {
            if c == '|' {
                bangadd = 128;
                gotbar = true;
            } else {
                result.push(i + 128);
            }
            gotbang = false;
        } else if gotbar {
            match c {
                '?' => { result.push(127 + bangadd) }
                '!' => { gotbang = true }
                '|' | '"' | '<' => { result.push(i + bangadd) }
                '[' | '{' => { result.push(27 + bangadd) }
                '\\' => { result.push(28 + bangadd) }
                ']' | '}' => { result.push(29 + bangadd) }
                '^' | '~' => { result.push(30 + bangadd) }
                '_' | '`' => { result.push(31 + bangadd) }
                _ => { // mask bit 32 to make lowercase into uppercase
                    let j = bangadd + (if c.is_lowercase() {i - 32} else {i});
                    result.push(if j >= 64 {j - 64} else {i});
                }
            }            
            gotbar = false;
            bangadd = 0;
        } else if c == '|' {
            gotbar = true;
        } else {
            result.push(i);
        }
    }
    return String::from_utf8_lossy(&result).to_string();
}

fn main() {
    for t in ["ALERT|G", "wert↑", "@♂aN°$ª7Î", "ÙC▼æÔt6¤☻Ì", "\"@)Ð♠qhýÌÿ",
                 "+☻#o9$u♠©A", "♣àlæi6Ú.é", "ÏÔ♀È♥@ë", "Rç÷\\%◄MZûhZ", "ç>¾AôVâ♫↓P"] {
        let e = gs_trans_encode(t);
        let d = gs_trans_decode(&e);
        println!("Test string {} encoded is {}, decoded is: {}", t, e, d.escape_debug());
        assert!(t == d);
    }
    for t in [&"abc|1de|5f", &"|LHello|G|J|M", &"|m|j|@|e|!t|m|!|?"] {
        let d = gs_trans_decode(t);
        println!("Test string {} decoded is {}", t, d.escape_debug());
    }
}
Output:
Test string ALERT|G encoded is ALERT||G, decoded is: ALERT|G
Test string wert↑ encoded is wert|!b|!|F|!|Q, decoded is: wert↑
Test string @♂aN°$ª7Î encoded is @|!b|!|Y|!|BaN|!B|!0$|!B|!*7|!C|!|N, decoded is: @♂aN°$ª7Î
Test string ÙC▼æÔt6¤☻Ì encoded is |!C|!|YC|!b|!|V|!<|!C|!&|!C|!|Tt6|!B|!$|!b|!|X|!;|!C|!|L, decoded is: ÙC▼æÔt6¤☻Ì
Test string "@)Ð♠qhýÌÿ encoded is |"@)|!C|!|P|!b|!|Y|! qh|!C|!=|!C|!|L|!C|!?, decoded is: \"@)Ð♠qhýÌÿ
Test string +☻#o9$u♠©A encoded is +|!b|!|X|!;#o9$u|!b|!|Y|! |!B|!)A, decoded is: +☻#o9$u♠©A
Test string ♣àlæi6Ú.é encoded is |!b|!|Y|!#|!C|! l|!C|!&i6|!C|!|Z.|!C|!), decoded is: ♣àlæi6Ú.é
Test string ÏÔ♀È♥@ë encoded is |!C|!|O|!C|!|T|!b|!|Y|!|@|!C|!|H|!b|!|Y|!%@|!C|!+, decoded is: ÏÔ♀È♥@ë
Test string Rç÷\%◄MZûhZ encoded is R|!C|!'|!C|!7\%|!b|!|W|!|DMZ|!C|!;hZ, decoded is: Rç÷\\%◄MZûhZ
Test string ç>¾AôVâ♫↓P encoded is |!C|!'>|!B|!>A|!C|!4V|!C|!|"|!b|!|Y|!+|!b|!|F|!|SP, decoded is: ç>¾AôVâ♫↓P
Test string abc|1de|5f decoded is abc1de5f
Test string |LHello|G|J|M decoded is \u{c}Hello\u{7}\n\r
Test string |m|j|@|e|!t|m|!|? decoded is \r\n\0\u{5}�\r�
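Both Rust versions (like the Wren one) encode a Unicode string one UTF-8 byte at a time, which is why "↑" turns into three |! sequences in the output above. The following Python sketch shows the same byte-wise approach; gstrans_encode is a hypothetical name, it takes bytes rather than text, and like the Rust table it escapes "<" as |<, which the task table does not strictly require.

```python
def gstrans_encode(data: bytes) -> str:
    """Encode raw bytes as GSTrans text, using upper-case control letters."""
    parts = []
    for b in data:
        prefix = ""
        if b >= 0x80:                 # top bit set: emit |! and encode the low 7 bits
            prefix, b = "|!", b - 0x80
        if b < 0x20:                  # control characters: |@ |A .. |_
            body = "|" + chr(b + 0x40)
        elif b == 0x7F:               # DEL
            body = "|?"
        elif chr(b) in '"<|':         # characters that must (or may) be escaped
            body = "|" + chr(b)
        else:
            body = chr(b)
        parts.append(prefix + body)
    return "".join(parts)


text = "wert↑"
print(text.encode("utf-8"))                 # b'wert\xe2\x86\x91'
print(gstrans_encode(text.encode("utf-8")))  # wert|!b|!|F|!|Q
```

The three UTF-8 bytes E2 86 91 of "↑" become |!b, |!|F and |!|Q, matching the Rust and Wren output for that test string.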

Wren

Library: Wren-fmt

Strings in Wren are just immutable arrays of bytes. They are usually interpreted as UTF-8 but don't have to be. Unicode characters in the example Julia strings are therefore encoded using their constituent UTF-8 bytes, which decode fine but may not give the same encoding as Julia itself.

If an invalid byte (following the "|" flag) is encountered whilst decoding, it is decoded as if the flag were not present.

import "./fmt" for Fmt

class GSTrans {
    static encode(s, upper) {
        if (!(s is String && s.count > 0)) Fiber.abort("Argument must be a non-empty string.")

        // remove any outer quotation marks
        if (s.count > 1 && s[0] == "\"" && s[-1] == "\"") s = s[1..-2]

        // helper function to encode bytes < 128
        var f = Fn.new { |b|
            if (b >= 1 && b <= 26) {
                return "|" + (upper ? String.fromByte(b + 64) : String.fromByte(b + 96))
            } else if (b < 32) {
                return "|" + String.fromByte(b + 64)
            } else if (b == 34)  { // quotation mark           
                return "|\""
            } else if (b == 60)  { // less than
                return "|<"
            } else if (b == 124) { // vertical bar
                return "||"
            } else if (b == 127) { // DEL
                return "|?"
            } else {
                return String.fromByte(b)
            }
        }

        var enc = ""

        // iterate through the string's bytes encoding as we go
        for (b in s.bytes) {
            if (b < 128) {
                enc = enc + f.call(b)
            } else {
                enc = enc + "|!" + f.call(b - 128)
            }
        }

        return enc
    }

    static decode(s) {
        if (!(s is String && s.count > 0)) Fiber.abort("Argument must be a non-empty string.")

        // remove any outer quotation marks
        if (s.count > 1 && s[0] == "\"" && s[-1] == "\"") s = s[1..-2]

        // helper function for decoding bytes after "|"
        var f = Fn.new { |b|
            if (b == 34)                     { // quotation mark
                return 34
            } else if (b == 60)              { // less than
                return 60
            } else if (b == 63)              { // question mark
                return 127
            } else if (b >= 64 && b < 96)    { // @ + upper case letter + [\]^_
                return b - 64
            } else if (b == 96)              { // grave accent
                return 31
            } else if (b == 124)             { // vertical bar
                return 124
            } else if (b >= 97 && b < 127)   { // lower case letter + {}~
                return b - 96
            } else {
                return b
            }
        }

        var bytes = s.bytes.toList
        var bc = bytes.count
        var i = 0
        var dec = ""

        // iterate through the string's bytes decoding as we go
        while (i < bc) {
            if (bytes[i] != 124) {
                dec = dec + String.fromByte(bytes[i])
                i = i + 1
            } else {
                if (i < bc - 1 && bytes[i+1] != 33) {
                    dec = dec + String.fromByte(f.call(bytes[i+1]))
                    i = i + 2
                } else {
                    if (i < bc - 2 && bytes[i+2] != 124) {
                        dec = dec + String.fromByte(128 + bytes[i+2])
                        i = i + 3
                    } else if (i < bc - 3 && bytes[i+2] == 124) {
                        dec = dec + String.fromByte(128 + f.call(bytes[i+3]))
                        i = i + 4 
                    } else {
                        i = i + 1
                    }
                }
            }
        }
        return dec
    }
}

var strings = [
    "\fHello\a\n\r",
    "\r\n\0\x05\xf4\r\xff"
]

var uppers = [true, false]

for (i in 0...strings.count) {
    var s = strings[i]
    var t = Fmt.swrite("$q", Fmt.B(0, s))
    var u = uppers[i]
    var enc = GSTrans.encode(s, u)
    var dec = GSTrans.decode(enc)
    var d = Fmt.swrite("$q", Fmt.B(0, dec))
    System.print("string: %(t)")
    System.print("encoded (%(u ? "upper" : "lower")) : %(enc)")
    System.print("decoded : %(d)")
    System.print("string == decoded ? %(dec == s)\n")
}

var jstrings = [
    "ALERT|G",
    "wert↑",
    "@♂aN°$ª7Î",
    "ÙC▼æÔt6¤☻Ì",
    "\"@)Ð♠qhýÌÿ",
    "+☻#o9$u♠©A",
    "♣àlæi6Ú.é",
    "ÏÔ♀È♥@ë",
    "Rç÷\%◄MZûhZ",
    "ç>¾AôVâ♫↓P"
]

System.print("Julia strings: string -> encoded (upper) <- decoded (same or different)\n")
for (s in jstrings) {
    var enc = GSTrans.encode(s, true)
    var dec = GSTrans.decode(enc)
    var same = (s == dec)
    System.print("  %(s) -> %(enc) <- %(dec) (%(same ? "same" : "different"))")
}
Output:
string: "\fHello\a\n\r"
encoded (upper) : |LHello|G|J|M
decoded : "\fHello\a\n\r"
string == decoded ? true

string: "\r\n\0\x05\xf4\r\xff"
encoded (lower) : |m|j|@|e|!t|m|!|?
decoded : "\r\n\0\x05\xf4\r\xff"
string == decoded ? true

Julia strings: string -> encoded (upper) <- decoded (same or different)

  ALERT|G -> ALERT||G <- ALERT|G (same)
  wert↑ -> wert|!b|!|F|!|Q <- wert↑ (same)
  @♂aN°$ª7Î -> @|!b|!|Y|!|BaN|!B|!0$|!B|!*7|!C|!|N <- @♂aN°$ª7Î (same)
  ÙC▼æÔt6¤☻Ì -> |!C|!|YC|!b|!|V|!|<|!C|!&|!C|!|Tt6|!B|!$|!b|!|X|!;|!C|!|L <- ÙC▼æÔt6¤☻Ì (same)
  "@)Ð♠qhýÌÿ -> |"@)|!C|!|P|!b|!|Y|! qh|!C|!=|!C|!|L|!C|!? <- "@)Ð♠qhýÌÿ (same)
  +☻#o9$u♠©A -> +|!b|!|X|!;#o9$u|!b|!|Y|! |!B|!)A <- +☻#o9$u♠©A (same)
  ♣àlæi6Ú.é -> |!b|!|Y|!#|!C|! l|!C|!&i6|!C|!|Z.|!C|!) <- ♣àlæi6Ú.é (same)
  ÏÔ♀È♥@ë -> |!C|!|O|!C|!|T|!b|!|Y|!|@|!C|!|H|!b|!|Y|!%@|!C|!+ <- ÏÔ♀È♥@ë (same)
  Rç÷%◄MZûhZ -> R|!C|!'|!C|!7%|!b|!|W|!|DMZ|!C|!;hZ <- Rç÷%◄MZûhZ (same)
  ç>¾AôVâ♫↓P -> |!C|!'>|!B|!>A|!C|!4V|!C|!|"|!b|!|Y|!+|!b|!|F|!|SP <- ç>¾AôVâ♫↓P (same)
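The Wren rule for a bad escape (decode it as if the "|" flag were not present) can be sketched in Python as below. gstrans_decode_lenient and _escaped are hypothetical names; the sketch assumes ASCII input, prints a warning to stderr for each bad escape (as the Rust table version does), and silently drops a trailing "|" or "|!". With this rule "abc|1de|5f" decodes to "abc1de5f", matching the Wren and Rust results, and "|!|1" gives 128 + ord("1") = 177, as the Wren decoder does.

```python
import sys
from typing import Optional


def _escaped(ch: str) -> Optional[int]:
    """Byte value for the character after '|', or None if it is not a valid escape."""
    if ch in '"|<':
        return ord(ch)
    if ch == "?":
        return 0x7F
    if ch == "`":
        return 0x1F
    if "@" <= ch <= "_":        # |@ |A .. |Z |[ |\ |] |^ |_
        return ord(ch) - 0x40
    if "a" <= ch <= "~":        # lower-case letters and |{ |} |~
        return ord(ch) - 0x60
    return None


def gstrans_decode_lenient(s: str) -> bytes:
    """Decode GSTrans text; a bad escape such as |1 is decoded as if the '|' were absent."""
    if not s.isascii():
        raise ValueError("GSTrans text must be ASCII")
    out = bytearray()
    i, n = 0, len(s)
    while i < n:
        high = 0
        if s.startswith("|!", i):       # top-bit prefix
            high, i = 0x80, i + 2
            if i == n:                  # trailing "|!" is dropped
                break
        if s[i] == "|":
            if i + 1 == n:              # trailing "|" is dropped
                break
            ch = s[i + 1]
            b = _escaped(ch)
            if b is None:               # invalid escape: keep the character itself
                print(f"warning: bad escape |{ch} at position {i}", file=sys.stderr)
                b = ord(ch)
            i += 2
        else:
            b = ord(s[i])
            i += 1
        out.append(b + high)
    return bytes(out)
```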