Memory layout of a data structure: Difference between revisions

It is further possible to declare that STUFF is to occupy the same storage as the named variables. If the declaration were <code>CHARACTER*1 STUFF(37)</code>, then <code>EQUIVALENCE (STUFF(1),THIS),(STUFF(9),M),(STUFF(10),NAME)</code> would mean that STUFF occupies the same storage as those variables, or rather, that the variables occupy the same storage as STUFF; indeed, they could be made to overlay each other, which would be unlikely to be helpful. Such placement could leave a floating-point or integer variable ''not'' aligned to a word boundary, with a consequent penalty in access, for instance by having THIS start at STUFF(2). Some systems do not allow byte-based addressing, only word-based, so complications can arise. But this demonstrates precise knowledge of the memory layout of a data structure. The more modern compilers that allow the TYPE declaration typically do not allow such variables to appear in EQUIVALENCE statements, so as to prevent access to the memory layout of such data structures. Others allow a new version of EQUIVALENCE (which the moderns deprecate) via the MAP statement, but this is not standard Fortran.
 
As stated before, there is no BIT facility, so packing is to byte boundaries. But if one is determined to store thousands of records with minimal storage use, it may seem worth the effort to engage in the arithmetic needed to pack, say, three bits followed by the thirty-two bits of a floating-point value, and so on, into a sequence of bytes which would then be written. In such a situation it may even be worth packing only a portion of the floating-point variable, if reduced precision is acceptable and one is certain of the usage of the bits within such a number. However, given the difficulty of access to the parts of such a packed aggregate, it is usually better to leave the byte/word packing and unpacking to the I/O system, as via <code>WRITE (F,REC = n) THIS,M,NAME</code>, and then manipulate the variables in their conveniently-aligned in-memory forms as ordinary variables, repacking to the data structure form only with a subsequent WRITE statement.
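The record-based approach might be sketched as follows. This is only an illustration under assumptions not stated above: the types chosen for THIS, M and NAME, the use of a scratch file, and the record number 3 are all arbitrary. Because the unit in which RECL is measured varies between compilers, the record length is obtained via INQUIRE with IOLENGTH rather than computed by hand.

```fortran
program record_io
  implicit none
  ! Assumed types: the text does not fix them.
  double precision  :: this, this2
  integer           :: m, m2
  character(len=28) :: name, name2
  integer           :: f, reclen

  this = 3.25d0
  m    = 7
  name = 'Example'

  ! Ask the I/O system how long a record holding these items is,
  ! in whatever units this compiler uses for RECL.
  inquire (iolength = reclen) this, m, name

  ! A scratch file stands in for a real data file.
  open (newunit = f, status = 'scratch', access = 'direct', &
        form = 'unformatted', recl = reclen)

  ! The I/O system packs the items into the record...
  write (f, rec = 3) this, m, name
  ! ...and unpacks them into ordinary, aligned variables.
  read (f, rec = 3) this2, m2, name2

  print *, this2, m2, ' ', trim(name2)
  close (f)
end program record_io
```

The program never needs to know the byte offsets within the record; it only has to list the same items in the same order when reading as when writing.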
 
The INTEGER*''n'' opportunity is not fully flexible, in that powers of two are usually the only options, so that a value that might fit into INTEGER*3 will have to go into INTEGER*4. It is possible to break away from a byte base, especially when there are many variables with small ranges to represent. Suppose that V3 only has values 0, 1, 2; V5 has only 0, 1, 2, 3, 4; V4 only 0, 1, 2, 3; and V2 only 0, 1. Then a set of values could be encoded as a single decimal number: say 1230 for the four variables in that order, which would fit into a two-byte integer instead of four one-byte integers. That is merely changing base 256 to base 10, notionally, but a better packing is possible. Consider <code>V = V3 + 3*(V5 + 5*(V4 + 4*(V2)))</code>, whose maximum value would be 2 + 3*(4 + 5*(3 + 4*1)) = 119, which will fit into one byte. If there were many such variables, then their packed values might require larger integers for even greater sharing. Variables with fractional values can be treated in a similar way, cautiously... With careful planning, such a compound value may even have helpful numerical properties, of service for (some) multi-key sorts.
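The mixed-radix packing described above, together with its inverse, might be sketched as follows. The unpacking by repeated MOD and integer division is not spelled out in the text but follows from the packing formula; the names W and U2 to U5 are introduced here purely for illustration.

```fortran
program mixed_radix_pack
  implicit none
  integer :: v2, v3, v4, v5   ! the small-range values to pack
  integer :: v, w             ! packed value and a working copy
  integer :: u2, u3, u4, u5   ! values recovered by unpacking

  ! Take the largest permitted value of each variable.
  v3 = 2
  v5 = 4
  v4 = 3
  v2 = 1

  ! Pack: V3 varies fastest, with radices 3, 5, 4 (and 2 for V2).
  v = v3 + 3*(v5 + 5*(v4 + 4*v2))
  print *, 'Packed:  ', v

  ! Unpack by peeling off each digit in the same order.
  w  = v
  u3 = mod(w, 3)
  w  = w/3
  u5 = mod(w, 5)
  w  = w/5
  u4 = mod(w, 4)
  w  = w/4
  u2 = w
  print *, 'Unpacked:', u3, u5, u4, u2
end program mixed_radix_pack
```

With these inputs the packed value is the maximum, 119, and unpacking recovers 2, 4, 3, 1. The variable placed outermost (here V2) carries the most weight in the packed value, which is what makes such a value usable as a compound sort key.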
 
In a similar way, text content may employ only a limited character set so perhaps five bits per symbol would suffice, or some other packing scheme might suggest itself. There is also a whole world of compression algorithms. The end result is that a data structure manifesting as records in a disc file may be difficult to unpack into a convenient internal form even given a careful description of the layout.
 
=={{header|Go}}==