Special characters
From Rosetta Code
Programming Task
This is a programming task. It lays out a problem which Rosetta Code users are encouraged to solve, using languages they know.
See also: Quotes
ALGOL 68
ALGOL 68 has several built-in character constants. The following are, respectively, the representations of TRUE and FALSE, the blank character " ", the character displayed when a number cannot be printed in the width provided, and the null character indicating the end of characters in a BYTES array.
printf(($"flip:"g"!"l$,flip));
printf(($"flop:"g"!"l$,flop));
printf(($"blank:"g"!"l$,blank));
printf(($"error char:"g"!"l$,error char));
printf(($"null character:"g"!"l$,null character))
Output:
flip:T!
flop:F!
blank: !
error char:*!
null character:
To handle the output movement to (and input movement from) a device ALGOL 68 has the following four positioning procedures:
print(("new page:",new page));
print(("new line:",new line));
print(("space:",space));
print(("backspace:",backspace))
These procedures may not all be supported on a particular device.
If a particular device (CHANNEL) is set possible, then there are three built-in procedures that allow movement about this device.
- set char number - set the position in the current line.
- reset - move to the first character of the first line of the first page. For example a home or tape rewind.
- set - allows the movement to selected page, line and character.
ALGOL 68 pre-dates the current ASCII standard, and hence supports many non-ASCII characters. Moreover, ALGOL 68 had to work on hardware with 6-bit bytes, so it had to be possible to write the same ALGOL 68 code strictly in upper case. Here are the special characters together with their upper-case alternatives (referred to as "worthy characters").
| Character | ASCII | Worthy |
|---|---|---|
| "₁₀" | \ | E |
| "≥" | >= | GE |
| "≤" | <= | LE |
| "≠" | /= ~= | NE |
| "¢" | # | CO |
| "⌊" | | LWB |
| "⌈" | | UPB |
| "⎕" | | ELEM |
| "¬" | ~ | NOT |
| "÷" | % | OVER |
| "×" | * | TIMES |
| "⊥" | | I |
| "°" | | NIL |
| "↑" | ** | UP |
| "↓" | | DOWN |
| "∨" | | OR |
| "∧" | & | AND |
| "←" | | OF |
| "╰" | | LWS |
| "╭" | | UPS |
Most of these characters made their way into European standard character sets (e.g. ALCOR and GOST). Ironically, the ¢ character was dropped from later versions of America's own ASCII character set.
The character "₁₀" is one ALGOL 68 byte (versus 2 in Unicode).
Brainf***
The only characters that mean anything in BF are its commands:
> move the pointer one to the right
< move the pointer one to the left
+ increment the value at the pointer
- decrement the value at the pointer
, input one byte to memory at the pointer
. output one byte from memory at the pointer
[ begin loop if the value at the pointer is not 0
] end loop
All other characters are comments.
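The rule above can be illustrated with a small C++ sketch that keeps only the eight command characters. The function name strip_bf_comments is hypothetical and chosen only for this example; it is not part of any BF implementation.

```cpp
#include <string>

// Hypothetical helper: keep only the eight BF command characters and
// drop everything else, since BF treats all other characters as comments.
std::string strip_bf_comments(const std::string& source) {
    const std::string commands = "><+-,.[]";
    std::string result;
    for (char ch : source) {
        if (commands.find(ch) != std::string::npos) {
            result += ch;  // ch is one of the commands
        }
    }
    return result;
}
```

For instance, strip_bf_comments("add one: + then print .") would leave just the two commands + and . behind.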
C++
C++ has several types of escape sequences, which are interpreted in various contexts. The main characters with special properties are the question mark (?), the pound sign (#), the backslash (\), the single quote (') and the double quote (").
Trigraphs
Trigraphs are certain character sequences starting with two question marks, which can be used instead of certain characters, and which are always and in all contexts interpreted as the replacement character. They can be used anywhere in the source, including, but not limited to string constants. The complete list is:
Trigraph Replacement letter
??( [
??) ]
??< {
??> }
??/ \
??= #
??' ^
??! |
??- ~
Note that interpretation of those trigraphs is the very first step in C++ compilation, therefore the trigraphs can be used instead of their replacement letters everywhere, including in all of the following escape sequences (e.g. instead of \u00FC (see next section) you can also write ??/u00FC, and it will be interpreted the same way).
Also note that some compilers don't interpret trigraphs by default, since today's character sets all contain the replacement characters, and therefore trigraphs are practically unused; C++17 removed them from the standard altogether. However, accidentally using them (e.g. in a string constant) may change the code semantics on some compilers, so one should still be aware of them.
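As a minimal sketch of how to guard against accidental trigraphs in a string, one of the question marks can be written with the \? escape sequence. The function name surprised is made up for this illustration.

```cpp
#include <string>

// Two question marks followed by an exclamation mark would form a trigraph
// (read as a vertical bar) on compilers that interpret trigraphs. Writing
// the second question mark as \? breaks the sequence, so the text is the
// same whether or not trigraphs are processed.
std::string surprised() {
    return "What?\?!";
}
```

The returned string has seven characters and ends in two question marks and an exclamation mark.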
Universal character names and escaping newlines
Moreover, C++ allows arbitrary Unicode characters to be represented using only the basic source character set (which is a subset of ASCII), by means of a so-called universal character name. These have one of the forms
\uXXXX \UXXXXXXXX
where each X is to be replaced by a hex digit. For example, the German umlaut letter ü can be written as
\u00FC
or
\U000000FC
However, characters in the basic source character set may not be written in this form (but since all those characters are in standard ASCII, writing them as universal character names would only obfuscate anyway). If the compiler accepts direct usage of non-ASCII characters somewhere in the code, the result must be the same as with the corresponding universal character name. For example, the following two lines, if accepted by the compiler, should have the same effect:
std::cout << "Tür\n";
std::cout << "T\u00FC\n";
Note that in principle, C++ also allows such letters in identifiers, e.g.
extern int Tür; // if the compiler allows literal ü
extern int T\u00FCr; // should in theory work everywhere
but that's not generally supported by existing compilers (e.g. g++ 4.1.2 doesn't support it).
Another escape sequence working everywhere is to escape the newline: If a backslash is at the end of the line, the next line is pasted to it without any space in between. For example:
int const\
ant; // defines a variable of type int named constant, not a variable of type int const named ant
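The same splicing also applies inside a string literal. The following sketch (with the made-up function name spliced) shows that the two source lines yield a single literal.

```cpp
#include <string>

// The backslash-newline pair is removed before tokenization, so the two
// source lines below form the single literal "abcdef". The continuation
// starts in column 0 because leading spaces would become part of the string.
std::string spliced() {
    return "abc\
def";
}
```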
String and character literal
A string literal is surrounded by double quotes ("). A character literal is surrounded by single quotes ('). Example:
char const* str = "a string literal";
char c = 'x'; // a character literal
The following escape sequences are only allowed inside string constants and character constants:
| escape seq. | meaning | ASCII character/codepoint |
|---|---|---|
| \a | alert | BEL ^G/7 |
| \b | backspace | BS ^H/8 |
| \f | form feed | FF ^L/12 |
| \n | newline | LF ^J/10 |
| \r | carriage return | CR ^M/13 |
| \t | tab | TAB ^I/9 |
| \v | vertical tab | VT ^K/11 |
| \' | single quote | ' (unescaped ' would end character constant) |
| \" | double quote | " (unescaped " would end string constant) |
| \\ | backslash | \ (unescaped \ would introduce escape sequence) |
| \? | question mark | ? (useful to break trigraphs in strings) |
| \0 | string end marker | NUL ^@/0 (special case of octal char value) |
| \nnn | (octal char value) | (each n must be an octal digit) |
| \xnn | (hex char value) | (each n must be a hexadecimal digit) |
Note that C++ doesn't guarantee ASCII. On non-ASCII platforms (e.g. EBCDIC), the rightmost column of course doesn't apply. However, \0 unconditionally has the value 0.
Also note that some compilers add the non-standard escape sequence \e for Escape (that is, the ASCII escape character).
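A short sketch of how the numeric escapes and \0 behave. The function names are invented for this example, and the comparison with 'A' assumes an ASCII platform, which the standard does not guarantee.

```cpp
#include <cstring>
#include <string>

// '\101' (octal) and '\x41' (hex) both denote the character with value 65.
// On an ASCII platform (an assumption) that character is 'A'.
bool octal_and_hex_agree() {
    return '\101' == '\x41' && '\101' == 65;
}

// An embedded \0 ends the string as far as C string functions are concerned.
std::size_t visible_length() {
    const char text[] = "ab\0cd";
    return std::strlen(text);  // counts characters before the first \0
}

// The array itself still stores every character of the literal.
std::size_t stored_size() {
    const char text[] = "ab\0cd";
    return sizeof(text);  // a, b, \0, c, d and the terminating \0
}
```

Here the 'c' following \0 is not an octal digit, so it is not absorbed into the escape sequence.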
The # character
The # character in C++ is special as it is interpreted only in the preprocessing phase, and shouldn't occur (outside of character/string constants) after preprocessing.
- If # appears as the first non-whitespace character in the line, it introduces a preprocessor directive. For example:
#include <iostream>
- Inside macro definitions, a single # is the stringification operator, which turns its argument into a string. For example:
#include <iostream>
#define STR(x) #x
int main()
{
  std::cout << STR(Hello world) << std::endl; // STR(Hello world) expands to "Hello world"
}
- Also inside macro definitions, ## is the token pasting operator. For example:
#define THE(x) the_ ## x
int THE(answer) = 42; // THE(answer) expands to the_answer
Note that the # character is not interpreted specially inside character or string literals.
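One detail worth seeing in code is that # stringifies its argument exactly as written, without expanding macros in it first. The sketch below uses made-up macro names (VALUE, XSTR, JOIN) to contrast this with the common two-level idiom that expands the argument before stringifying it.

```cpp
#include <string>

#define VALUE 42
#define STR(x) #x          // stringifies the argument exactly as written
#define XSTR(x) STR(x)     // expands the argument first, then stringifies
#define JOIN(a, b) a ## b  // pastes two tokens into one

int JOIN(the_, answer) = VALUE;  // declares: int the_answer = 42;

std::string unexpanded() { return STR(VALUE); }   // yields "VALUE"
std::string expanded()   { return XSTR(VALUE); }  // yields "42"
int pasted()             { return the_answer; }   // reads the pasted name
```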
Haskell
Comments
-- comment here until end of line
{- comment here -}
Operator symbols (nearly any sequence can be used)
! # $ % & * + - . / < = > ? @ \ ^ | - ~ :
: as first character denotes constructor
Reserved symbol sequences
.. : :: = \ | <- -> @ ~ => _
Infix quotes
`identifier` (to use as infix operator)
Characters
'.' \ escapes
Strings
"..." \ escapes
Special escapes
\a alert
\b backspace
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
Other
( ) (grouping)
( , ) (tuple type/tuple constructor)
{ ; } (grouping inside let, where, do, case without layout)
[ , ] (list type/list constructor)
[ | ] (list comprehension)
Unicode characters, according to category:
Upper case (identifiers)
Lower case (identifiers)
Digits (numbers)
Symbol/punctuation (operators)
Java
Math:
& | ^ ~ (bitwise AND, OR, XOR, and NOT)
>> << (bitwise arithmetic shift)
>>> (bitwise logical shift)
+ - * / = % (+ can be used for String concatenation)
any of the previous math operators can be placed in front of an equals sign to make a self-operation replacement:
x = x + 2 is the same as x += 2
++ -- (increment and decrement: before a variable for pre (++x), after for post (x++))
Boolean:
! (NOT; ~ is the bitwise NOT listed above and cannot be applied to boolean values)
^ && || (XOR, AND, OR)
== < > != <= >= (comparison)
Other:
{ } (scope)
( ) (method calls and grouping)
; (statement terminator)
[ ] (array index)
" (string literal)
' (character literal)
? : (ternary operator)
Escape characters:
\b (Backspace)
\n (Line Feed)
\r (Carriage Return)
\f (Form Feed)
\t (Tab)
\0 (Null; this is actually an octal escape, but handy nonetheless)
\' (Single Quote)
\" (Double Quote)
\\ (Backslash)
\DDD (Octal Escape Sequence, each D is a digit between 0 and 7)
\uDDDD (Unicode Escape Sequence, each D is a hexadecimal digit)
LaTeX
LaTeX has ten special characters: # $ % & ~ _ ^ \ { }
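To make most of these characters appear literally in output, prefix the character with a \. The exceptions are \, ~ and ^: the sequences \\, \~ and \^ have other meanings (a line break and two accents), so use \textbackslash, \textasciitilde and \textasciicircum instead. For example, to typeset 5% of $10 you would type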
5\% of \$10
Note that the set of special characters in LaTeX isn't really fixed, but can be changed by LaTeX code. For example, the package ngerman (providing German-specific definitions, including easier access to umlaut letters) redefines the double quote character (") as a special character, so you can more easily write German words like "hören" (as h"oren instead of h{\"o}ren).
PowerShell
PowerShell is unusual in that it retains many of the escape sequences of languages descended from C, except that unlike these languages it uses a backtick ` as the escape character rather than a backslash \. For example `n is a new line and `t is a tab.
XSLT
XSLT is based on XML, and so has the same special characters which must be escaped using character entities:
- & - &amp;
- < - &lt;
- > - &gt;
- " - &quot;
- ' - &apos;
Any Unicode character may also be represented via its decimal code point (&#nnnn;) or hexadecimal code point (&#xnnnn;).
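The mapping from the five special characters to their predefined entities can be sketched as a small escaping routine. This is written in C++ for consistency with the other sketches on this page, and escape_xml is a hypothetical helper, not part of any XSLT or XML library.

```cpp
#include <string>

// Hypothetical helper: replace the five XML special characters with
// their predefined entities; every other character is copied unchanged.
std::string escape_xml(const std::string& text) {
    std::string out;
    for (char ch : text) {
        switch (ch) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += ch;       break;
        }
    }
    return out;
}
```

Text passed through such a routine could then be placed in element content or attribute values without being mistaken for markup.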

