ASCII

From Rosetta Code

ASCII stands for American Standard Code for Information Interchange. It was first published in 1963 and is the basis for later encoding standards such as Unicode, whose first 128 code points are identical to ASCII. The original ASCII standard defines 128 different values (7 bits), each of which represents a different character, such as a letter, digit, punctuation mark, or control code. Unlike UTF-8, where a single character can take anywhere from one to four bytes, every ASCII character fits in exactly one byte, making routines that use ASCII very easy to write.
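To make the one-byte claim concrete, here is a minimal Python sketch. The helper names `encoded_lengths` and `is_pure_ascii` are made up for this illustration and are not part of any standard API.

```python
def encoded_lengths(text):
    """Return the number of UTF-8 bytes used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


def is_pure_ascii(text):
    """True when every code point is below 128, i.e. inside the ASCII range."""
    return all(ord(ch) < 128 for ch in text)


# ASCII characters always take one byte each, even when encoded as UTF-8.
ascii_sizes = encoded_lengths("Az9")            # [1, 1, 1]
# U+00E9 (e with acute accent) and U+20AC (euro sign) need more than one byte.
other_sizes = encoded_lengths("\u00e9\u20ac")   # [2, 3]
```

Because each ASCII character is one byte, the length of an ASCII string in bytes always equals its length in characters, which is not true of UTF-8 text in general.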

Control Codes

Control codes make up the first 32 ASCII characters (values 0 through 31). With a few exceptions, these do not have a corresponding key on your keyboard. They tell a program or device things such as where a new line begins or where a file ends. Of course, what these characters actually do depends on the program itself, but the ASCII standard is intended to have these codes do the same thing regardless of what program is using them. This list is not (currently) exhaustive, but showcases a few control codes in common use today, as well as a few historic ones that are rarely used anymore.

  • 0: Null (NUL). This is probably one of the most important codes of all. In languages such as C it marks the end of a text string or other data field. Without it, your typical "putS" (Print String) routine would keep reading past the end of the string, printing garbage and possibly crashing. Computers don't natively understand where a range of data ends, and often rely on NUL to know when to stop reading. (Some languages store the string's length as metadata before the string itself instead of using a null terminator.)


  • 7: Bell (BEL). A terminal typically makes a beeping sound when it receives this control code.


  • 8: Backspace (BS). This moves the cursor back one position. Many programs also delete the character that was before the cursor.


  • 9: Horizontal Tab (HT). This is your Tab key.


  • 10: Line Feed (LF). This causes the text cursor to move down to the next line, but its horizontal position is unchanged. The phrase "line feed" comes from typewriters, where turning the knob would feed more paper through the carriage. In C, "\n" is ASCII 10 by itself, which is also the line ending used on Unix-like systems. On Windows and in many network protocols, a new line is ASCII 13 followed by ASCII 10 (CR LF, written "\r\n" in C).


  • 13: Carriage Return (CR). This causes the text cursor to go back to the far-left side of the screen. (In the early days of ASCII, computers weren't designed for languages other than English, so this assumed you were writing left to right.) The term "carriage return" comes from typewriters, where pressing the "return" key would make the carriage (the moving assembly that held the paper) slide back to its starting position.


  • 27: Escape (ESC). This is the Escape key!
  • 32: Space (SP). This is what you get when you hit the spacebar. (It's a blank space.) Strictly speaking it is the first printable character rather than a control code. This was the most convenient location for the space character, as it sits just in front of the visible characters. Old-school computers often implemented ASCII by using the value of each character as an index into a lookup table of tile graphics stored in ROM. Having the blank space come first meant that you could convert ASCII to the table index by simply subtracting 32, and video memory initialized to zero would show blank spaces instead of whatever character happened to be at that index.
  • 127: Delete (DEL). This is the last standardized ASCII code. It sits apart from the other control codes because of punched paper tape: a character punched by mistake could be erased by punching out all seven holes, which is binary 1111111, or 127.
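Two of the ideas in the list above, the NUL terminator and the "subtract 32" tile lookup, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `read_c_string` and `glyph_index` are hypothetical helper names, and the tile table is assumed to start at the space character and cover only codes 32 through 126.

```python
NUL = 0x00
SPACE = 0x20
LAST_PRINTABLE = 0x7E  # '~', the last visible ASCII character


def read_c_string(buffer, start=0):
    """Decode ASCII bytes from start up to (not including) the first NUL."""
    end = buffer.find(bytes([NUL]), start)
    if end == -1:
        # No terminator found: stop at the end of the buffer instead of
        # running off into unrelated memory, as a naive C routine would.
        end = len(buffer)
    return buffer[start:end].decode("ascii")


def glyph_index(code):
    """Map a printable ASCII code to an index in an assumed tile table."""
    if not SPACE <= code <= LAST_PRINTABLE:
        raise ValueError("not a printable ASCII code: %d" % code)
    return code - SPACE
```

For example, `read_c_string(b"Hello\x00junk")` stops at the NUL and yields "Hello", while `glyph_index(ord(" "))` is 0, so zero-filled video memory would display blank tiles.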

ASCII itself is a 7-bit code, so anything from 128 to 255 was never part of the standard and depended on the hardware and/or the program being run. Many programs used this extra space to represent letters with accent marks common to non-English languages that use the Latin alphabet, as well as rudimentary character graphics to allow for simple game creation (such as playing card suits, smiley faces, stickmen, etc.).

Numbers

The digits 0 through 9 are mapped to hexadecimal values 0x30 to 0x39 respectively. This allows for easy conversion from actual numeric data (which every computer stores internally as binary) to its ASCII equivalent: the low four bits of each digit's code are the digit's value. A number stored in "unpacked" Binary Coded Decimal (where each digit 0-9 gets its own byte, with the top 4 bits of the byte being zero) can easily be converted to ASCII by adding 0x30 to each byte.
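The 0x30 offset described above can be sketched as follows. The function names `bcd_to_ascii` and `ascii_to_bcd` are invented for this example.

```python
ASCII_ZERO = 0x30  # the code for the character '0'


def bcd_to_ascii(digits):
    """Convert unpacked BCD bytes (one digit 0-9 per byte) to ASCII text."""
    if any(d > 9 for d in digits):
        raise ValueError("each byte must hold a single digit 0-9")
    return bytes(d + ASCII_ZERO for d in digits).decode("ascii")


def ascii_to_bcd(text):
    """Convert a string of ASCII digits back to unpacked BCD bytes."""
    if not all("0" <= c <= "9" for c in text):
        raise ValueError("text must contain only the ASCII digits 0-9")
    return bytes(ord(c) - ASCII_ZERO for c in text)
```

For instance, `bcd_to_ascii(bytes([1, 9, 6, 3]))` gives "1963", and `ascii_to_bcd("42")` gives the two bytes 4 and 2.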

Letters

The upper-case letter A is equal to 0x41, and the upper-case letter Z equals 0x5A. The lower-case letters are all 32 places after the upper-case ones, which means that adding 0x20 to any upper-case letter of the alphabet (that is, setting bit 5) will return its lower-case form. As you can see, ASCII was designed to make conversions easy. Strangely enough, this 0x20 conversion factor also applies to the brackets [ ], which become the curly braces { }, and the backslash \, which becomes the vertical bar |.
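A minimal sketch of the 0x20 case conversion, assuming pure ASCII input. `ascii_lower` and `ascii_upper` are illustrative names; Python's own `str.lower` and `str.upper` also handle non-ASCII text and work differently internally.

```python
CASE_BIT = 0x20  # bit 5 is the only difference between 'A' (0x41) and 'a' (0x61)


def ascii_lower(text):
    """Lower-case ASCII letters by setting bit 5; other characters pass through."""
    return "".join(
        chr(ord(c) | CASE_BIT) if "A" <= c <= "Z" else c for c in text
    )


def ascii_upper(text):
    """Upper-case ASCII letters by clearing bit 5; other characters pass through."""
    return "".join(
        chr(ord(c) & ~CASE_BIT) if "a" <= c <= "z" else c for c in text
    )


# The same offset links the brackets to the braces and the backslash to the bar.
bracket_pairs = [(c, chr(ord(c) + CASE_BIT)) for c in "[\\]"]
```

The range checks matter: applying the bit operation blindly would also change characters such as "@" or "[", which is usually not what a case-conversion routine wants.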

Punctuation

Unfortunately, this is where ASCII no longer "lines up" with modern keyboards, so to speak. The symbols on the number keys (!@#$%^&*()) are a bit strange. Some of them are exactly 16 code points away from their "no-Shift" counterparts on the modern US keyboard layout, and others are not. Some early terminal keyboards were "bit-paired", arranged so that Shift changed a single bit of the ASCII code, but the modern layout descends from typewriter-style keyboards and only partly follows that pattern.

The following ASCII values can be toggled by flipping bit 4 (that is, adding or subtracting 0x10). Each pair of characters occupies the same key on a standard US keyboard:

  • 1 <-> !
  • 3 <-> #
  • 4 <-> $
  • 5 <-> %
  • , <-> < (Comma and Less Than Sign)
  • . <-> > (Period and Greater Than Sign)
  • / <-> ?

Other punctuation keys do not follow this "bit 4 rule" on modern layouts. They did on some older terminal keyboards whose layouts were built around ASCII, but today's keyboards have dropped it.
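The pairs listed above can be checked with a single XOR. This is a small sketch; `toggle_bit4` is a made-up helper name.

```python
BIT4 = 0x10  # 16, the distance between '1' (0x31) and '!' (0x21)


def toggle_bit4(ch):
    """Flip bit 4 of a single ASCII character's code."""
    return chr(ord(ch) ^ BIT4)


# Pairs that share a key on a standard US keyboard.
pairs = {ch: toggle_bit4(ch) for ch in "1345,./"}

# A key that breaks the rule: flipping bit 4 of '2' gives a double quote,
# whereas Shift+2 on a modern US keyboard produces '@'.
shifted_two = toggle_bit4("2")
```

Because XOR is its own inverse, the same function converts in both directions: `toggle_bit4("!")` returns "1".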
