String: Difference between revisions

From Rosetta Code

Revision as of 21:39, 14 November 2021

Strings are sequences of data that are interpreted as text. Most programming languages handle them in a similar way, and have built-in code for displaying strings and substituting variables.

How Strings Are Stored In Memory

Strings are stored as numbers using a character encoding such as ASCII or one of the Unicode encodings (UTF-8, for example), a scheme that maps each character to a specific numeric value. These encodings are standardized so that text data is portable across multiple programs and architectures.

For example, this is the encoding of "Hello World" in ASCII:

0x48,0x65,0x6c,0x6c,0x6f,0x20,0x57,0x6f,0x72,0x6c,0x64,0x00

The extra byte equaling 00 at the end of the string is called the null terminator. Any time you write a string literal in C, the compiler will sneak in a 0 at the end of whatever you typed. Why does it do this? Without a null terminator, C's string functions would have no idea where the string ends! They would just keep reading whatever is stored in memory after the string, which could be anything. (Although most languages will have you use quotation marks to indicate a string, the quotation marks are not part of the string itself.) Not every language uses a terminator; many store the length of the string alongside its characters instead. Either way, nearly all languages that can work with strings will keep track of where a string ends for you automatically.
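As a rough illustration of the terminator at work, here is a minimal C sketch. The function names dump_bytes and my_strlen are made up for this example, and the byte values assume an ASCII-compatible character set.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every byte of a C string in hex,
   including the terminating 0 byte. */
void dump_bytes(const char *s)
{
    size_t n = strlen(s);            /* counts the characters before the 0 byte */
    for (size_t i = 0; i <= n; i++)  /* <= so that the terminator is shown too  */
        printf("0x%02x%s", (unsigned char)s[i], i < n ? "," : "\n");
}

/* Hypothetical helper: a hand-rolled strlen that walks forward
   until it finds the 0 byte. */
size_t my_strlen(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    return len;
}
```

Calling dump_bytes("Hello World") from main should print the same twelve bytes listed above. Note that sizeof("Hello World") is 12 while my_strlen("Hello World") is 11: the literal occupies one more byte than its visible length because of the terminator.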

Control Codes

The first 32 characters of ASCII and Unicode (values 0 to 31), along with 127 (delete), are reserved for control codes. Most of these are a relic of the teletype era and are no longer of use to most computers today, but a few of them are still widely used (0 for the null terminator, 8 for backspace, 9 for tab, and 10 for line feed, for example). Most of them are nowhere to be found on your keyboard; they are used internally by the computer as a signal to perform certain tasks, or to mark the beginning or end of various data. Most high-level languages let you write control codes in a string using an escape character.
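To see which characters in a string are control codes, one option in C is the standard iscntrl function from ctype.h. The sketch below assumes the default "C" locale and an ASCII-compatible character set; count_control_codes and print_visible are names invented for this example.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: count the control codes in a string. */
int count_control_codes(const char *s)
{
    int count = 0;
    for (; *s != '\0'; s++)
        if (iscntrl((unsigned char)*s))
            count++;
    return count;
}

/* Hypothetical helper: print a string, showing each control code
   as <n> (its numeric value) instead of sending it to the terminal. */
void print_visible(const char *s)
{
    for (; *s != '\0'; s++) {
        if (iscntrl((unsigned char)*s))
            printf("<%d>", *s);
        else
            putchar(*s);
    }
    putchar('\n');
}
```

For instance, print_visible("A\tB") should print A<9>B, since the tab between the letters is control code 9.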

Escape Character

An escape character is used to include a character in a string as-is, stripping it of all of its special meaning. Without escape characters, we'd run into a problem if we wanted quotation marks to appear in our string when printed to the screen, since quotation marks often mark the beginning and end of the string literal. The same can be true of any other character the language treats specially inside a string, which in some languages includes the comment character. The escape character tells the compiler that the character directly after it is part of the string and not a command to the compiler. In many languages, the backslash \ is the escape character, but this varies depending on the language.

Escape characters are also used to encode special characters that the user would otherwise have an extremely difficult time supplying to the computer. In C and many other languages, \n stands for a newline. The backslash and the n don't actually get printed; rather, the compiler stores a single newline character (ASCII 10) in their place, and printing that character advances the text cursor to the next line. If you wanted to actually print "\n" to the screen, you would need to put an escape character in front of the backslash, like this: "\\n".
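The three cases described above can be sketched in C as follows. The variable names and the show_escapes function are invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Each literal below is written with escape sequences in the source. */
const char *quoted  = "She said \"hi\"";  /* \" puts a quotation mark in the string */
const char *newline = "one\ntwo";          /* \n becomes a single newline character  */
const char *literal = "one\\ntwo";         /* \\ becomes one backslash, then an 'n'  */

/* Hypothetical helper: print each string on its own line. */
void show_escapes(void)
{
    puts(quoted);
    puts(newline);
    puts(literal);
}
```

Calling show_escapes() should print She said "hi" on the first line, then one and two on separate lines, then one\ntwo. The lengths tell the same story: strlen(newline) is 7 because \n is stored as one character, while strlen(literal) is 8 because the backslash and the n are stored as two.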

Substitution Characters

This is a similar concept to escape characters, but instead of standing for a fixed character, a substitution character marks the place where a variable value, such as a number or the result of some calculation, will be printed. The syntax for doing so will vary, but here's an example from C (assuming sum is a function that adds its two arguments):
<lang C>printf("The sum of A plus B is %d", sum(a, b));</lang>

Here, % is the substitution character, and d tells the computer to substitute a signed integer written in decimal. The expression after the comma is what will replace the %d. Nearly all languages with a built-in print function will handle conversion of numeric data to text characters for you; assembly language is the main exception I know of, since there you generally have to do the conversion yourself.
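A slightly fuller sketch of substitution in C follows. It uses the standard snprintf, which works like printf but writes into a buffer, so the result can be inspected. The functions sum, format_sum and format_score are hypothetical and exist only for this example.

```c
#include <stdio.h>

/* Hypothetical helper: returns the sum of two integers. */
int sum(int a, int b)
{
    return a + b;
}

/* Build the message in buf instead of printing it. Each %d is replaced,
   in order, by the next argument after the format string.
   Returns the number of characters written, not counting the terminator. */
int format_sum(char *buf, size_t size, int a, int b)
{
    return snprintf(buf, size, "The sum of %d plus %d is %d", a, b, sum(a, b));
}

/* %s substitutes a string, %.1f a floating-point value with one decimal
   place, and %% prints a literal percent sign. */
int format_score(char *buf, size_t size, const char *name, double percent)
{
    return snprintf(buf, size, "%s scored %.1f%%", name, percent);
}
```

With a 64-byte buffer, format_sum(buf, sizeof buf, 2, 3) should leave "The sum of 2 plus 3 is 5" in buf, and format_score(buf, sizeof buf, "Ada", 92.5) should leave "Ada scored 92.5%". Because % is itself the substitution character, it has to be doubled to appear in the output, much as a backslash does in an escape sequence.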