String length: Difference between revisions

Fortran 77 introduced variables of type CHARACTER and associated syntax. These are fixed-size entities, declared at compile time as in <code>CHARACTER*66 TEXT</code>. A subroutine (or function) receiving such a variable could, however, declare it as <code>CHARACTER*(*) TEXT</code> so that any size may be supplied to the routine, and with F90 came the ability within subroutines (or functions) to declare items of a size determined at run time. There is no associated length variable, as with strings that have both a content ''and'' a length, nor is there a special character value (such as zero) deemed to mark the end-of-text in such a variable to give string-like facilities. However, with F90 came facilities, standardised in F2003, whereby a CHARACTER variable could be re-allocated with exactly the right amount of storage whenever it was assigned to. So, provided TEXT is declared with a deferred length (<code>CHARACTER(LEN=:), ALLOCATABLE :: TEXT</code>), <code>TEXT = "this"</code> would cause TEXT to become a CHARACTER variable of length four, adjusted so at run time. Again, the length information is not stored as part of the variable's content, as it would be in schemes where a length byte at position zero precedes the text and so allows strings of up to 255 characters. The length information must be stored somewhere...
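The three kinds of declaration described above can be sketched as follows. This is a minimal illustration, not taken from the original page: the program name <code>char_lengths</code> and the helper routine <code>show</code> are hypothetical, and the deferred-length declaration assumes a compiler that supports F2003.

```fortran
program char_lengths
  implicit none
  character(len=66) :: fixed              ! fixed size, the modern spelling of CHARACTER*66
  character(len=:), allocatable :: text   ! deferred length (F2003)

  fixed = "this"              ! padded with trailing spaces; length stays 66
  text  = "this"              ! allocated with exactly four characters
  call show(fixed)
  call show(text)
  text = "something longer"   ! re-allocated to the new length on assignment
  call show(text)

contains

  ! Hypothetical helper: accepts a CHARACTER argument of any length.
  subroutine show(s)
    character(len=*), intent(in) :: s     ! assumed length, as CHARACTER*(*)
    print "(a,i0)", "length = ", len(s)
  end subroutine show

end program char_lengths
```

With a conforming compiler the three calls should report lengths of 66, 4 and 16, showing that only the deferred-length variable changes size on assignment.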


Before Fortran 77, character data would be stored in arithmetic variables, using format codes such as <code>A1</code> to store one character per variable, which might be an integer or a floating-point variable of much larger size. Format <code>A2</code> would store two characters per variable, and so on. Code <code>A1</code> would give ease of manipulation, while <code>A8</code> (say for a REAL*8 variable) would save space. The numerical values of such variables would be strange, and word sizes might not be a multiple of eight bits, nor would character encodings necessarily require eight bits, especially on a decimal computer such as the IBM 1620, where storage usage was counted in digits and a character required two.


An intrinsic function LEN(text) reports the number of characters in the variable (with no consideration of any storage needed anywhere to hold the length), while SIZE(array) reports the number of elements in an array and SIZEOF(''x'') may be available, as a compiler extension, to report the number of bytes of storage of ''x''. Since computers with eight-bit characters are now all but universal, the result from LEN for the default character kind is in practice both a byte count and a character count.
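A short sketch of how these enquiry functions differ is given below. The variable names are illustrative only. LEN_TRIM (standard since F90) and STORAGE_SIZE (F2008, reporting bits rather than bytes) do not appear in the text above and are included here as standard-conforming companions to LEN and the non-standard SIZEOF.

```fortran
program len_vs_size
  implicit none
  character(len=10) :: text       ! one variable of ten characters
  character(len=10) :: lines(3)   ! an array of three such variables

  text  = "this"                  ! padded with six trailing spaces
  lines = ""

  print "(a,i0)", "LEN(text)          = ", len(text)        ! declared length: 10
  print "(a,i0)", "LEN_TRIM(text)     = ", len_trim(text)   ! ignoring trailing spaces: 4
  print "(a,i0)", "SIZE(lines)        = ", size(lines)      ! number of array elements: 3
  ! STORAGE_SIZE reports bits; the value depends on the processor's character size.
  print "(a,i0)", "STORAGE_SIZE(text) = ", storage_size(text)
end program len_vs_size
```

On a system with eight-bit default characters, dividing the STORAGE_SIZE result by eight would be expected to give the same figure as LEN.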


There is no facility for fancy Unicode schemes, other than by writing suitable routines. In a similar vein, plotting packages often supply a special function that returns the length of a text string ''as it would appear on the plot, in plotting units''. This is especially useful when the plotter's rendition of text employs a proportionally-spaced typeface and interprets superscripts, subscripts and so forth, so that the programmer can prepare code to juggle with the layout, perhaps of mathematical expressions. This is of course not in any standard.


===Byte Length===