Idiomatically determine all the characters that can be used for symbols: Difference between revisions

@@ Line 157: / Line 157: @@
 Unicode Identifier start: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐ...
 Unicode Identifier part: [0][1][2][3][4][5][6][7][8][14][15][16][17][18][19][20][21][22][23][24][25][26][27][48][49]...</pre>
+=={{header|jq}}==
+===jq identifiers===
+Excluding key names from consideration, in jq 1.4 the set of characters that can be
+used in jq identifiers corresponds to the regex: [A-Za-z0-9$_].
+Thus, assuming the availability of test/1 as a builtin, the test in jq
+for a valid identifier character is: test("[A-Za-z0-9$_]").
+To generate a string of such characters idiomatically:
+<lang jq>[range(0;128) | [.] | implode | select(test("[A-Za-z0-9$_]"))] | add</lang>
+jq 1.5 also allows ":" as a joining character in the form "module::name".
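+As a further illustration, the single-character test can be anchored so that it applies to every character of a string. The following is a minimal sketch, assuming a jq with the test/1 builtin (jq 1.5 or later); the name all_identifier_chars is hypothetical, and the filter checks character membership only, not any other rule of jq's grammar:
+<lang jq># Hypothetical helper: true if the input string is non-empty and consists
+# solely of characters from [A-Za-z0-9$_].
+def all_identifier_chars: test("^[A-Za-z0-9$_]+$");
+
+("foo_1", "a-b") | all_identifier_chars</lang>
+This emits true for "foo_1" and false for "a-b".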
+===JSON key names===
+Any JSON string can be used as a key.  Some characters, however, cannot
+appear literally in a JSON string and must be entered as escape sequences,
+e.g. \u0000 for NUL, \\ for backslash, and \" for the double quote.  Thus any Unicode character
+except the control characters (U+0000 through U+001F), the double quote and the backslash can appear literally in a jq key.
+Therefore, assuming the availability in jq of the test/1 builtin, the test
+in jq for whether a character can appear literally in a key is:
+<lang jq>test("[^\u0000-\u001F\"\\\\]")</lang>
+===Symbols===
+The following function screens for characters by "\p" class:
+<lang jq>def is_character(class):
+   test( "\\p{" + class + "}" );</lang>
+For example, to test whether a character is a Unicode letter, symbol or numeric character:
+<lang jq>is_character("L") or is_character("S") or is_character("N")</lang>
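+The predicate can also be used to collect the characters of a class. The following sketch gathers the printable ASCII characters that are symbols; it assumes a jq whose regex library supports "\p" classes, and repeats the definition of is_character so that the example is self-contained:
+<lang jq>def is_character(class):
+   test( "\\p{" + class + "}" );
+
+# Build a string of the ASCII characters in Unicode category S (symbol).
+[range(32;127) | [.] | implode | select(is_character("S"))] | add</lang>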
+The number of Unicode characters within a character class can be counted efficiently
+with reduce, which avoids building an intermediate array, as in the following function:
+<lang jq>def count(class; m; n):
+  reduce (range(m;n) | [.] | implode | select( test( "\\p{" + class + "}" ))) as $i
+    (0; . + 1);</lang>
+For example, the number of Unicode "symbol" characters can be obtained by evaluating:
+<lang jq>count("S"; 0; 1114112)</lang>
+The result is 3958; the exact figure depends on the Unicode version supported by the regex library.
 =={{header|ooRexx}}==