Memory layout of a data structure: Difference between revisions

m (→‎version 2: updated the output section.)
Line 289:
As before stated, there is no BIT facility, so packing is to byte boundaries. But, if one is determined to store thousands of records with minimal storage use, it may seem worth the effort to engage in the arithmetic to pack the likes of say three bits, followed by the thirty-two bits of a floating-point value, and so on, into a sequence of bytes which then would be written. In such a situation it may even be worth packing only a portion of the floating-point variable, if reduced precision is acceptable and one is certain of the usage of the bits within such a number. However, given the difficulty of access to the parts of such a packed aggregate, it is usually better to leave the byte/word packing and unpacking to the I/O system as via <code>WRITE (F,REC = n) THIS,M,NAME</code> and then manipulate the variables as their conveniently-aligned in-memory forms as ordinary variables, only repacking to the data structure form with a subsequent WRITE statement.
 
The INTEGER*''n'' opportunity is not fully flexible in that powers of two are usually the only options so that a value that might fit into INTEGER*3 will have to go into INTEGER*4. In any case this style is not helpful for decimal machines or binary computers whose word size is not a power of two. It is possible to break away from a byte base, especially when there are many variables with small ranges to represent. Suppose that V3 only has values 0, 1, 2; V5 has only 0, 1, 2, 3, 4; V4 only 0, 1, 2, 3; and V2 only 0, 1. Then a set of values could be encoded as a single decimal number: say 1230 for the four variables in that order, which would fit into a two byte integer instead of four one byte integers. That is merely changing base 256 to base 10, notionally, but a better packing is possible, Consider <code>V = V3 + 3*(V5 + 5*(V4 + 4*(V2)))</code> whose maximum value would be 2 + 3*(4 + 5*(3 + 4*1)) = 119, which will fit into one byte. If there were many such variables, then their packed values might require larger integers for even greater sharing. Variables with fractional values can be treated in a similar way, cautiously... With careful planning, such a compound value may even have helpful numerical properties, of service for (some) multi-key sorts. In the example, V2 is the high-order value so if the sort key happened to start V2, V4, V5, V3 then a four-variable comparison could be done in just one test. Unless the value overflows into the sign bit...
 
With careful planning, such a compound value may even have helpful numerical properties, of service for (some) multi-key sorts. In the example, V2 is the high-order value so if a desired sort key happened to include V2, V4, V5, V3 then a four-variable comparison could be done in just one test. Unless the value overflows into the sign bit. This may not be a problem if the sorted order is merely to facilitate a binary search that will use the same ordering, but there can still be surprises. The B6700 used a 48-bit word and its compiler did not offer the then-unknown CHARACTER type. One might store six eight-bit characters in a word without worrying over the ordering that will result - the B6700 had no integer representation as such, using floating-point numbers with no fractional part, so the numerical values resulting from character codes would be quite odd. However, it also worked in base eight and although forty-eight is divisible by three, the requirement for the sign of the exponent and the sign of the value uses only two bits. Thus, one bit, the topmost, did not participate in arithmetic operations and as a result, arithmetic could not distinguish between characters in the high end of a word that differed in their highest bit. Even .EQ. would fail and the compiler offered a special substitute, .IS. to test for equality. Other languages that offered character manipulation (such as the extensions to Algol) used entirely different methods that avoided this problem.
 
In a similar way, text content may employ only a limited character set so perhaps five bits per symbol would suffice, or some other packing scheme might suggest itself. There is also a whole world of compression algorithms. The end result is that a data structure manifesting as records in a disc file may be difficult to unpack into a convenient internal form even given a careful description of the layout.
1,220

edits