User talk:Zmi007

Revision as of 13:43, 9 December 2015 by rosettacode>Zmi007 (short comments)

Strings in Fortran

We seem to have slightly different interpretations. I agree that F77 and later offers a CHARACTER*12 TEXT type declaration that indeed creates a variable of a type built in to the compiler's workings, and the variations in the wording of the declaration are unimportant though I do dislike additional blather. However, for me this is not actually a "string" type of variable, because it does not incorporate a length in itself. When such an item (or a portion, or a text literal) is passed to a subprogram, there is a secret extra parameter that gives its size, but this is still not a length-of-a-string type item even though it equals the length of the text being passed. It can be called a string, but it isn't one as its length is fixed.

By contrast, Pascal does offer a string variable: for an allowance of up to twelve characters as above, you write var text: string[12]; which is much the same. But when you put text:='Blah'; and then write out the value of text, you get four characters, not twelve as you would with Fortran. Pascal's implementation scheme is as if there were a character at index zero in the item, which stores the current length, which can change. This is like PL/I and its character varying type, though since this implementation of Pascal holds the length in a single byte, its maximum length is of course 255. This also means that passing a piece of a string to a procedure requires that a working copy be made so that the length part can be placed without damaging the parent string. By contrast, passing a portion of a CHARACTER variable to a subprogram in Fortran requires no such copy. Just "start here", and "n characters". Thus a routine UPCASE(TEXT) changes the caller's TEXT without copy-in, copy-out or similar. This is just like passing some portion of an array: by reference.
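To make the UPCASE point concrete, here is a minimal sketch of such a routine. The module name upcase_demo and the ASCII-only conversion are assumptions made for illustration; the routine is not taken from any particular library.

```fortran
module upcase_demo
  implicit none
contains
  ! Convert lower-case ASCII letters to upper case, in place.
  ! TEXT is an assumed-length dummy argument: len(text) is whatever length
  ! the caller passed, and changes made here are visible in the caller's
  ! variable (compilers typically pass the address of a contiguous substring).
  subroutine upcase(text)
    character(len=*), intent(inout) :: text
    integer :: i, c
    do i = 1, len(text)
      c = iachar(text(i:i))
      if (c >= iachar('a') .and. c <= iachar('z')) then
        text(i:i) = achar(c - 32)   ! ASCII lower and upper case differ by 32
      end if
    end do
  end subroutine upcase
end module upcase_demo
```

With CHARACTER*12 TEXT holding 'blah', CALL UPCASE(TEXT(2:3)) leaves TEXT as 'bLAh' followed by eight blanks, and LEN(TEXT) is still 12 while LEN_TRIM(TEXT) is 4.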

Yes, it is possible with latter-day features to allocate a working version of a character variable that has the desired size of the moment, then later deallocate it and re-allocate it with a different size, but this just means to me that the item is not a proper string because ordinary string-style manipulations require a cloud of additional activity, activity which is just asking for mistakes as well as consuming cpu time... This can be automated - I have glanced at (in dismay) extensive incantations that appear to develop a genuine string-with-length type, plus its support procedures that allow its use in apparently normal statements. But such a notion is not one built in to the compiler.

More generally, I think a string type should not be just of a plain character. Aside from accommodating those who need sixteen-bit character codes (or even 32-bit), any type should be stringable. Something like var horde: string of double precision complex; - imagine a horde of (x,y) positions that if plotted consecutively would draw the 1000 foot contour line on a map. One might wish to manipulate such items in stringlike ways. Similarly, INDEX should enable the searching of an array (a string) of integers, and so forth.

Sorry, but I am not sure I understand you. You agree that variable-length strings exist in Fortran, but you don't want to use them? So what is the point? Allocatable strings in Fortran are records containing the string and its length, and this snippet
<lang Fortran>
character(len=:), allocatable :: temp
temp = "some string"
temp = "some new looong string" ! automatic reallocation (F2003 and later, including F2008)

</lang> allows you to change a string and its length about as smoothly as can be, so I suppose everyone should be satisfied with it?

A string type, like any other type, is a good type if an object of that type is a first-class citizen. Don't ask me why that is not so in other languages or in old FORTRAN. Here are some of the pros of Fortran strings:
  • Character strings can be of any length, up to a processor-dependent limit ([ISO 2010], 4.4.3.1). And a derived type allows you to create string types larger than that limit.
  • A Fortran string is universal: its KIND parameter lets it support multiple character sets (4.4.3.2).
  • There are no reserved characters in any Fortran character set that serve as magic tokens (as there are, for example, in C).
  • Operators (==, /=, <, >, <=, >=) work just as well on character strings
  • Character assignment and concatenation are performed using operators (= and //) like in other high level languages instead of functions.
  • Now about your problem: there is no need to worry about mismatches in string length (in most cases). Yes, the semantics of Fortran character comparisons require that shorter strings be blank-extended on the right to the length of longer strings. For character assignment, the value on the right side of the = sign is truncated or blank padded on the right as necessary. For me, it is a language feature and this feature eliminates a lot of headaches and increases productivity.
  • Substring references are easy and straightforward, using standard (begin:end) notation.
  • There is a complete set of character string intrinsic procedures ([ISO 2004], 13.5.3), extended (and simplified) in later standards.
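The padding and operator rules in the list above can be tried out with a small sketch like the following. The module and function names (string_semantics, same_text, fit5, join_words) are invented for this illustration, and join_words assumes a compiler that supports deferred-length results (F2003 or later).

```fortran
module string_semantics
  implicit none
contains
  ! Comparison blank-extends the shorter operand on the right,
  ! so 'abc' and 'abc   ' compare equal.
  pure logical function same_text(a, b)
    character(len=*), intent(in) :: a, b
    same_text = (a == b)
  end function same_text

  ! Assignment to a fixed-length variable truncates or blank-pads on the right.
  pure function fit5(src) result(dst)
    character(len=*), intent(in) :: src
    character(len=5) :: dst
    dst = src
  end function fit5

  ! Concatenation with // after removing trailing blanks with TRIM.
  pure function join_words(a, b) result(joined)
    character(len=*), intent(in) :: a, b
    character(len=:), allocatable :: joined
    joined = trim(a) // ' ' // trim(b)
  end function join_words
end module string_semantics
```

For example, fit5('abcdefgh') gives 'abcde', fit5('ab') gives 'ab' followed by three blanks, and join_words('Fortran   ', 'strings ') gives the 15-character value 'Fortran strings'.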
Humm, I think it would be easier to have stated that although F95 did not supply varying-length strings such as you describe as a requirement, it did require that if an extension were to be supplied to do so, it would have to conform. I have seen such a package for F95, packed with turgid definitions just as I mentioned. Later versions such as F2003 (and the F2008 you mention) include these features, but I do not have access to such compilers and so had not perused their capabilities.
You definitely should just give it a try. I won't recommend which compiler to install or how, because I am sure you know all about that. Zmi007 (talk) 13:43, 9 December 2015 (UTC)
I agree that marking the end by a zero as used in C-like systems is a plague, not only because I have wanted to use zero codes in the middle of a string for various purposes, but because it is improper. Also it is just asking for overflow errors, and it is idiotic not to know the length of a string without scanning it. Every time. However, I would not be happy in using these varying-length strings when changing the length (inserting, deleting or appending text) because of the overhead of allocation and deallocation of the working memory that is required.
I compiled the snippet mentioned above and was convinced by the way the program managed reallocation: it does indeed use the realloc routine in glibc under Linux. That function is smart enough to use a cached pool of memory blocks etc. and to resize memory obtained with malloc efficiently during Fortran automatic reallocation. For requests for large memory blocks glibc uses kernel routines, and these are often even more efficient (manipulation of page table entries etc.). Zmi007 (talk) 13:43, 9 December 2015 (UTC)
Perhaps I should just swallow it and smile. Certainly, the use in Pascal and PL/I of strings with a fixed upper size but varying usage length (up to that size) meant no re-allocation and deallocation. By contrast, I recall dynamic strings in Basic on the IBM PC, where every so often the system would hang while the string memory pool was defragmented or whatever, and all other activity ceased for minutes while say a paper tape or cassette tape reader overflowed its input buffer; happily, if the prog. was compiled and run, not interpreted, this problem didn't arise. But yes, it was annoying to have to think of a largest-possible-size specification. But not too large. I noticed in PL/I (IBM mainframe) that if work areas declared as say character(200) varying were re-declared as character(4000) varying to reduce the chance of inadequacy, the execution on the same data would run noticeably slower. I wrote my own text editor in 2,400 lines of PL/I and had to choose a size for variable ALINE, which usually would not exceed eighty characters to a line, or 133 if looking at a printer file ... 200 will do. But this proved restrictive, as sometimes, odd files were to be inspected. It turns out that PL/I offers character(n) varying controlled for variables declared on entry to a procedure (or block) whereby a string variable of size n could be defined and used with lengths up to n: there was one allocation but many usages without requiring deallocation and reallocation on length changes.
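The fixed-capacity, varying-length scheme described here can be imitated in Fortran with a derived type. What follows is only a sketch of the idea, not PL/I semantics: the names varying_demo, varying, set, append and text_of are invented for illustration, and the capacity of 200 simply echoes the ALINE example.

```fortran
module varying_demo
  implicit none
  integer, parameter :: capacity = 200   ! illustrative upper bound

  ! A buffer of fixed size plus the length currently in use.
  type :: varying
    integer :: length = 0
    character(len=capacity) :: chars = ''
  end type varying
contains
  ! Replace the contents; text beyond the capacity is dropped.
  subroutine set(v, text)
    type(varying), intent(inout) :: v
    character(len=*), intent(in) :: text
    v%length = min(len(text), capacity)
    v%chars = text                        ! assignment truncates or blank-pads
  end subroutine set

  ! Append in place; only the length changes, nothing is allocated.
  subroutine append(v, text)
    type(varying), intent(inout) :: v
    character(len=*), intent(in) :: text
    integer :: n
    n = min(len(text), capacity - v%length)
    v%chars(v%length + 1 : v%length + n) = text(1:n)
    v%length = v%length + n
  end subroutine append

  ! The current value as an ordinary character value of the current length.
  function text_of(v) result(text)
    type(varying), intent(in) :: v
    character(len=v%length) :: text
    text = v%chars(1:v%length)
  end function text_of
end module varying_demo
```

After call set(line, 'Blah') and call append(line, ', blah'), line%length is 10 and text_of(line) is 'Blah, blah'. The trade-off is the one noted above: no reallocation on length changes, but a largest-possible size has to be chosen in advance.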
If one is juggling a few strings to prepare a message or similar, then no worries, but needing to perform hundreds of millions of allocate/deallocate operations is something else. I wish there were a function FMT(x) for example, that would return a string holding the text of the value for x, similar to the free-format output scheme, that could be used in statements such as "Yes, we have "//FMT(N)//" bananas." but F95 and CHARACTER variables require a fixed-size result such as " 7" rather than "7". I have even considered having the spurious spaces replaced by nulls (character code zero), but they might not be passed over by output devices and certain idiot systems would regard the string as terminating early.
Hm, for an integer N you can write such a function in seconds, can't you? Zmi007 (talk) 13:43, 9 December 2015 (UTC)
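A minimal sketch of such a function for default integers might look like this. It assumes a compiler with deferred-length character results (F2003 or later); the module name fmt_demo and the 32-character work buffer are choices made for this illustration.

```fortran
module fmt_demo
  implicit none
contains
  ! Return the decimal text of N with no leading or trailing blanks.
  function fmt(n) result(text)
    integer, intent(in) :: n
    character(len=:), allocatable :: text
    character(len=32) :: buffer         ! comfortably wide for a default integer
    write (buffer, '(I0)') n            ! I0: integer in the minimum width needed
    text = trim(buffer)                 ! result length equals the digit count
  end function fmt
end module fmt_demo
```

With this, "Yes, we have "//fmt(7)//" bananas." evaluates to "Yes, we have 7 bananas." with no spurious spaces. Other types would need their own versions, or a generic interface collecting them under one name.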
I have no objection to the various operators for messing with strings, though I have been startled by the implementation of some, at least as applied to CHARACTER variables as used in the F95 compiler I mess with. For instance, the latter-day library function LEN_TRIM(text) finds the last non-blank, but, the actual code first copies text to a work area then scans that work area!
Do not misunderstand me, but I believe that fidelity to an old compiler is not always a virtue :) Zmi007 (talk) 13:43, 9 December 2015 (UTC)