Idiomatically determine all the characters that can be used for symbols: Difference between revisions
(jq) |
|||
Line 157: | Line 157: | ||
Unicode Identifier start: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐ... |
||
Unicode Identifier part: [0][1][2][3][4][5][6][7][8][14][15][16][17][18][19][20][21][22][23][24][25][26][27][48][49]...</pre> |
Unicode Identifier part: [0][1][2][3][4][5][6][7][8][14][15][16][17][18][19][20][21][22][23][24][25][26][27][48][49]...</pre> |
||
=={{header|jq}}==
===jq identifiers===
Excluding key names from consideration, in jq 1.4 the set of characters that can be
used in jq identifiers corresponds to the regex: [A-Za-z0-9$_].
Thus, assuming the availability of test/1 as a builtin, the test in jq
for a valid identifier character is: test("[A-Za-z0-9$_]").

To generate a string of such characters idiomatically:
<lang jq>[range(0;128) | [.] | implode | select(test("[A-Za-z0-9$_]"))] | add</lang>

jq 1.5 also allows ":" as a joining character in the form "module::name".
===JSON key names===
Any JSON string can be used as a key. Accordingly,
some characters must be entered as escaped character sequences,
e.g. \u0000 for NUL, \\ for backslash, etc. Apart from the double quote
and the backslash, any Unicode character except for the control characters
U+0000 through U+001F can therefore appear literally in a jq key.

Assuming the availability in jq of the test/1 builtin, the test
in jq for whether a character lies outside that control-character range is:
<lang jq>test("[^\u0000-\u001F]")</lang>
===Symbols===
The following function screens for characters by "\p" class:
<lang jq>def is_character(class):
  test( "\\p{" + class + "}" );</lang>

For example, to test whether a character is a Unicode letter, symbol or numeric character:
<lang jq>is_character("L") or is_character("S") or is_character("N")</lang>
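As a rough usage sketch, the filter below applies is_character to a few sample characters and reports which of the three classes each one falls into. The sample characters and the output field names are illustrative choices rather than part of the task. The definition is repeated so that the snippet can be run on its own with jq -n -c, assuming a jq built with regex support:
<lang jq>def is_character(class):
  test( "\\p{" + class + "}" );

# Sample characters chosen for illustration only
("a", "+", "7", " ")
| { char: .,
    letter: is_character("L"),
    symbol: is_character("S"),
    number: is_character("N") }</lang>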
To count the Unicode characters within a character class without first
collecting them into an array, reduce over the stream of matching characters,
as illustrated by the following function:
<lang jq>def count(class; m; n):
  reduce (range(m;n) | [.] | implode | select( test( "\\p{" + class + "}" ))) as $i
    (0; . + 1);</lang>

For example, the number of Unicode "symbol" characters can be obtained by evaluating:
<lang jq>count("S"; 0; 1114112)</lang>

The result is 3958.
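
The same technique can be checked by hand on a small range. As a sketch, restricting the range to ASCII code points 32 through 127 (an arbitrary choice made here for illustration) and counting the letter class "L" should yield 52, i.e. the letters A-Z and a-z:
<lang jq>def count(class; m; n):
  reduce (range(m;n) | [.] | implode | select( test( "\\p{" + class + "}" ))) as $i
    (0; . + 1);

# Letters among code points 32..127 (the upper bound of range is exclusive)
count("L"; 32; 128)</lang>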
=={{header|ooRexx}}== |