User talk:Zmi007

From Rosetta Code
Revision as of 19:15, 24 November 2016 by rosettacode>Arbautjc (→‎Call a foreign-language function)

Strings in Fortran

We seem to have slightly different interpretations. I agree that F77 and later offers a CHARACTER*12 TEXT type declaration that indeed creates a variable of a type built in to the compiler's workings, and the variations in the wording of the declaration are unimportant though I do dislike additional blather. However, for me this is not actually a "string" type of variable, because it does not incorporate a length in itself. When such an item (or a portion, or a text literal) is passed to a subprogram, there is a secret extra parameter that gives its size, but this is still not a length-of-a-string type item even though it equals the length of the text being passed. It can be called a string, but it isn't one as its length is fixed.

By contrast, Pascal does offer a string variable, whereby for an allowance of up to twelve characters as above, you write var text: string[12]; which is much the same. But, when you put text:="Blah"; then write out the value of text, you get four characters not twelve as you would with Fortran. Pascal's implementation scheme is as if there were a character zero in the item, which stores the current length, which can change. This is like pl/i and its character varying type though since Pascal works its strings in bytes only its maximum length for this implementation is of course 255. This also means that passing a piece of a string to a procedure requires that a working copy be made so that the length part can be placed without damaging the parent string. By contrast, passing a portion of a CHARACTER variable to a subprogram in Fortran requires no such copy. Just "start here", and "n characters". Thus a routine UPCASE(TEXT) changes the caller's TEXT without copy-in, copy-out or similar. This is just like passing some portion of an array: by reference.

Yes, it is possible with latter-day features to allocate a working version of a character variable that has the desired size of the moment, then later deallocate it and re-allocate it with a different size, but this just means to me that the item is not a proper string because ordinary string-style manipulations require a cloud of additional activity, activity which is just asking for mistakes as well as consuming cpu time... This can be automated - I have glanced at (in dismay) extensive incantations that appear to develop a genuine string-with-length type, plus its support procedures that allow its use in apparently normal statements. But such a notion is not one built in to the compiler.

More generally, I think a string type should not be just of a plain character. Aside from accommodating those who need sixteen-bit character codes (or even 32-bit), any type should be stringable. Something like var horde: string of double precision complex; - imagine a horde of (x,y) positions that if plotted consecutively would draw the 1000 foot contour line on a map. One might wish to manipulate such items in stringlike ways. Similarly, INDEX should enable the searching of an array (a string) of integers, and so forth.

Sorry, but I am not sure I understand you. You agree that variable-length strings exist in Fortran, but you don't want to use them? So what is the point? Allocatable strings in Fortran are records containing the string and its length, and this snippet
<lang Fortran>
character(len=:), allocatable :: temp
temp = "some string"
temp = "some new looong string" ! automatic reallocation on assignment (an F2003 feature, so also in F2008)

</lang> lets you change a string and its length about as smoothly as can be, so I suppose everyone should be satisfied with it?
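To make the point concrete, here is a minimal sketch (assuming a compiler with F2003 deferred-length character support, such as a recent gfortran; the program name is made up for illustration) that prints the length after each assignment:

<lang Fortran>
program varlen_demo
  implicit none
  character(len=:), allocatable :: temp

  temp = "some string"
  print *, len(temp), '"'//temp//'"'     ! length is 11
  temp = "some new looong string"         ! reallocated to the new length on assignment
  print *, len(temp), '"'//temp//'"'     ! length is 22
  temp = temp(1:4)                        ! shrinking works the same way
  print *, len(temp), '"'//temp//'"'     ! length is 4
end program varlen_demo
</lang>

The length travels with the variable, so LEN reports the current size rather than a declared maximum.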

A string type, like any other type, is a good type if an object of that type is a first-class citizen. Don't ask me why this is so neither in other languages nor in old FORTRAN. Here are some of the pros of Fortran strings:
  • Character strings can be of any length, up to a processor-dependent limit ([ISO 2010], 4.4.3.1). And derived types allow you to create string types larger than that limit.
  • A Fortran string is generalised by its KIND parameter and so supports multiple character sets (4.4.3.2).
  • There are no reserved characters in any Fortran character set that serve as magic tokens (as there are, for example, in C)
  • Operators (==, /=, <, >, <=, >=) work just as well on character strings
  • Character assignment and concatenation are performed using operators (= and //) like in other high level languages instead of functions.
  • Now about your problem: there is no need to worry about mismatches in string length (in most cases). Yes, the semantics of Fortran character comparisons require that shorter strings be blank-extended on the right to the length of longer strings. For character assignment, the value on the right side of the = sign is truncated or blank padded on the right as necessary. For me, it is a language feature and this feature eliminates a lot of headaches and increases productivity.
  • Substring references are easy and straightforward, using standard (begin:end) notation.
  • There is a complete set of character string intrinsic procedures ([ISO 2004], 13.5.3), extended (and simplified) in later standards.
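A short sketch of the comparison, assignment, substring and concatenation rules listed above. It uses only standard intrinsics; the variable names are illustrative only:

<lang Fortran>
program char_semantics
  implicit none
  character(len=8) :: fixed
  character(len=3) :: short

  print *, "abc" == "abc   "             ! T: the shorter operand is blank-extended
  fixed = "abc"                          ! padded on the right to 8 characters
  short = "abcdef"                       ! truncated on the right to "abc"
  print *, len(fixed), len_trim(fixed)   ! 8 and 3
  print *, short == fixed                ! T: again compared with blank extension
  print *, fixed(2:3)//"!"               ! substring plus concatenation gives bc!
end program char_semantics
</lang>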
Humm, I think it would be easier to have stated that although F95 did not supply varying-length strings such as you describe as a requirement, it did require that if an extension were to be supplied to do so, it would have to conform. I have seen such a package for F95, packed with turgid definitions just as I mentioned. Later versions such as F2003 (and the F2008 you mention) include these features, but I do not have access to such compilers and so had not perused their capabilities.
You definitely should just give it a try. No recommendation from me on how to install or which compiler to choose, because I am sure you know everything about it. Zmi007 (talk) 13:43, 9 December 2015 (UTC)
I agree that marking the end by a zero as used in C-like systems is a plague, not only because I have wanted to use zero codes in the middle of a string for various purposes, but because it is improper. Also it is just asking for overflow errors, and it is idiotic not to know the length of a string without scanning it. Every time. However, I would not be happy in using these varying-length strings when changing the length (inserting, deleting or appending text) because of the overhead of allocation and deallocation of the working memory that is required.
I compiled the snippet mentioned above and was convinced by the way the program managed reallocation: it does indeed use the realloc routine in glibc under Linux. That function is smart enough to use a cached pool of memory blocks and so on, and to resize memory obtained with malloc efficiently during Fortran automatic reallocation. For large memory blocks glibc calls on kernel routines, and these are often even more efficient (manipulating page table entries, etc.). Zmi007 (talk) 13:43, 9 December 2015 (UTC)
Perhaps I should just swallow it and smile. Certainly, the use in Pascal and pl/1 of strings with a fixed upper size but varying usage length (up to that size) meant no re-allocation and deallocation (and by contrast, I recall dynamic strings in Basic on the ibmpc, where every so often the system would hang while the string memory pool was defragmented or whatever, and all other activity ceased for minutes while say a paper tape or cassette tape reader overflowed its input buffer - happily, if the prog. was compiled and run, not interpreted, this problem didn't arise) but yes, it was annoying to have to think of a largest-possible-size specification. But not too large. I noticed in pl/1 (IBM mainframe) that if work areas declared as say character(200) varying were re-declared as character(4000) varying to reduce the chance of inadequacy, the execution on the same data would run noticeably slower. I wrote my own text editor in 2,400 lines of pl/i and had to choose a size for variable ALINE, which usually would not exceed eighty characters to a line, or 133 if looking at a printer file ... 200 will do. But this proved restrictive, as sometimes, odd files were to be inspected. It turns out that pl/i offers character(n) varying controlled for variables declared on entry to a procedure (or block) whereby a string variable of size n could be defined and used with lengths up to n: there was one allocation but many usages without requiring deallocation and reallocation on length changes.
If one is juggling a few strings to prepare a message or similar, then no worries, but needing to perform hundreds of millions of allocate/deallocate operations is something else. I wish there were a function FMT(x) for example, that would return a string holding the text of the value for x, similar to the free-format output scheme, that could be used in statements such as "Yes, we have "//FMT(N)//" bananas." but F95 and CHARACTER variables require a fixed-size result such as " 7" rather than "7". I have even considered having the spurious spaces replaced by nulls (character code zero), but they might not be passed over by output devices and certain idiot systems would regard the string as terminating early.
Hm, for an integer N you can write such a function in seconds, can't you? Zmi007 (talk) 13:43, 9 December 2015 (UTC)
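For what it is worth, a minimal sketch of such a function, assuming an F2003 compiler (the allocatable deferred-length result is the F2003 part; the I0 edit descriptor gives the minimal field width). The names fmt_mod, fmt and bananas are made up for the illustration:

<lang Fortran>
module fmt_mod
  implicit none
contains
  function fmt(n) result(text)
    integer, intent(in) :: n
    character(len=:), allocatable :: text
    character(len=24) :: buffer          ! wide enough for 64-bit integers with sign

    write (buffer, "(I0)") n             ! I0: no leading blanks, buffer is blank-padded on the right
    text = trim(buffer)                  ! result has exactly as many characters as needed
  end function fmt
end module fmt_mod

program bananas
  use fmt_mod
  implicit none
  character(len=:), allocatable :: line

  line = "Yes, we have "//fmt(7)//" bananas."
  print "(A)", line                      ! Yes, we have 7 bananas.
end program bananas
</lang>

The message is built in a variable before printing, so the internal WRITE inside fmt never runs while another output statement is in progress.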
I have no objection to the various operators for messing with strings, though I have been startled by the implementation of some, at least as applied to CHARACTER variables as used in the F95 compiler I mess with. For instance, the latter-day library function LEN_TRIM(text) finds the last non-blank, but, the actual code first copies text to a work area then scans that work area!
Do not misunderstand me, but I believe that fidelity to an old compiler is not always honesty ) Zmi007 (talk) 13:43, 9 December 2015 (UTC)

Writing an integer

I don't have access to a F2003 compiler, and don't want to spend my money to gain access, especially when the prices I've seen are around US$3,000. The system I use dual starts to Linux or wunduhs, so there is a possibility there, but also a lot of time and patience needed to find a compatible compiler. I do have a wunduhs F95 compiler (alas, its microflaccid "visual" interface won't work on 64 bit or later than wunduhs XP) ... And here is a collection of IFMT(n) functions...

Seriously? ) GNU gfortran is very close to being an F2003 compiler and costs zero $. Your current task can be solved in a very elegant and simple way with parameterized derived types, but there is one big problem: they are still missing from almost every Fortran compiler, including gfortran ) So yes, you are right here. I wrote similar routines for integers behind an interface int2char, but using the kind parameter extends their usability to compilers with different integer kinds. Your problem is that you attempt to preserve portability of the code across a very large set of compilers, but the price you pay for it is too high, imho (you are limited to the subset of the Fortran standard supported by the worst compiler in your set). With one simple step towards the F2003 standard, many problems can be solved with significantly less effort and code. Zmi007 (talk) 23:48, 16 December 2015 (UTC)
I don't use Linux/unix much so the rigmarole of choosing and installing a compiler is an obstacle. I definitely do not want the old-style of first choosing and installing a C-compiler so as to compile the software I first desired. Still, I have made a trial attempt in the past, and was baulked by a trivial incompatibility, namely that A%B is the official style for compound data names, and I can't stand it, and have used A.B (as I had with B6700 Algol (which uses % to dignify a line comment, as does matlab, end ...), and pl/i, etc) so I was faced with either finding a compiler variant that allowed this deviationism, or, editing the source files to conform to a disliked form. That trial compiler also was not declared F2003 and may not have dealt with some of the constructions that try the diligence of those transferring from one compiler/system to another compiler/system - those would only be found via testing and compiler complaint, but the % for . complaints swamped the initial output. A serious difficulty was expected over the STRUCTURE/UNION/MAP facility (essentially, EQUIVALENCE ideas expanded to data structures) whereby disc file records could contain different types of data. This I expected trouble over but did not reach because of the .% issue. There is the TRANSFER facility that is approved furrytran these days, but I want to be sure that no actual transfer to/from work areas takes place! Data shovelling takes a lot of time, and adding intermediate stops would exacerbate the delay. To check on this would require time spent with tests and inspection of the compiler-generated code in the various cases - more time and patience. Naturally, the compiler documentation is not at all clear on such details. I was startled for instance to replace some parallel arrays by a data structure equivalent and find that the resulting programme ran about three times slower rather than at the same speed. 
It transpired that suddenly, a binary search routine was being invoked with its array parameter being copied in (and back), thus vitiating the speed advantage of a binary search over a linear search. Fortunately, there were only two usages, so in-line code was not too much effort and restored the pace yet retained the organisational clarity of a data aggregate in place of an aggregation of individual arrays.
Happy new year! Dinosaur (talk) 01:01, 31 December 2015 (UTC)

<lang Fortran>

     CHARACTER*2 FUNCTION I2FMT4(N)	!These are all the same.
      INTEGER*4 N			!But, the compiler doesn't offer generalisations.
       IF (N.LT.0) THEN	!Negative numbers cop a sign.
         IF (N.LT.-9) THEN	!But there's not much room left.
           I2FMT4 = "-!"	!So this means 'overflow'.
          ELSE			!Otherwise, room for one negative digit.
           I2FMT4 = "-"//CHAR(ICHAR("0") - N)	!Thus. Presume adjacent character codes, etc.
         END IF		!So much for negative numbers.
       ELSE IF (N.LT.10) THEN	!Single digit positive?
         I2FMT4 = " " //CHAR(ICHAR("0") + N)	!Yes. This.
       ELSE IF (N.LT.100) THEN	!Two digit positive?
         I2FMT4 = CHAR(N/10      + ICHAR("0"))	!Yes.
    1           //CHAR(MOD(N,10) + ICHAR("0")) !These.
       ELSE			!Otherwise,
         I2FMT4 = "+!" 	!Positive overflow.
       END IF			!So much for that.
     END FUNCTION I2FMT4	!No WRITE and FORMAT unlimbering.
     CHARACTER*8 FUNCTION I8FMT4(N)	!Oh for proper strings.
      INTEGER*4 N
      CHARACTER*8 HIC
       WRITE (HIC,1) N
   1   FORMAT (I8)
       I8FMT4 = HIC
     END FUNCTION I8FMT4

Combinations and permutations lead to ...

     INTERFACE I10FMT		!Alright, reduce some vexations.
      MODULE PROCEDURE I10FMT4,I10FMT2,I10FMT1
     END INTERFACE
     INTERFACE I8FMT		!Sigh.
      MODULE PROCEDURE I8FMT4,I8FMT2,I8FMT1
     END INTERFACE
     INTERFACE I2FMT		!Proper strings would enable IFMT alone.
      MODULE PROCEDURE I2FMT4,I2FMT2,I2FMT1
     END INTERFACE		!Perhaps a "string" module could be devised...
     INTERFACE FFMT
      MODULE PROCEDURE FFMT4,FFMT8
     END INTERFACE FFMT

</lang> On the face of it, the task is easy, but there are traps. Many Fortran compilers do not handle the case of a WRITE statement invoking a WRITE statement as via a function (e.g. WRITE (OUT,"(3A)") "Yes, we have ",IFMT(N)," Bananas!"), or, possibly allow this so long as one or the other, or perhaps both refrain from using a FORMAT in them, or the statements involved do not trigger some detail. Determining exactly what is possible on one compiler is not a good idea, because another compiler may well have different rules, and thereby, portability fades. Thus the I2FMT routines play with simple arithmetic rather than unlimber the FORMAT apparatus (which is also time-consuming: a fortnight back I modified a prog. that systematically used multiple WRITE statements of assorted texts (generating .kml text for Google Earth's system) into one that used compound WRITE statements of those texts and was startled by a fivefold increase in speed of execution), and there is the question of whether it is better to place the two characters one-at-a-time in two statements rather than use // - which is likely to mean that the expression's value is developed in a work area, then that is copied to the recipient. I couldn't face all this with I8FMT, though I did for function SLASHDATE, allowing four digits for the year number - this function gets heavy use in my major project. A second question is whether or not the function name can be used as a variable within the function, not merely as the destination for an assignment of the final result. In other words, function I8FMT's code could write to a variable I8FMT rather than to HIC then assign HIC's content to I8FMT. Similar usages are possible for arithmetic functions, but, some compilers I have encountered fail to do this properly and thereby, portability fades.
If recursion is contemplated, then, a recursive function FACT(X) could refer to FACT within itself as a variable, but FACT(X - 1) would be a function invocation, and this protocol would work even for functions with no parameter as in F vs. F(). <lang Fortran>

      CHARACTER*10 FUNCTION SLASHDATE(DAYNUM)	!This is relatively innocent.

Caution! The Gregorian calendar did not exist prior to 15/10/1582!
Confine expected operation to four-digit years, since fixed-field sizes are in mind.
Can use this function in WRITE statements with FORMAT, since this function does not use them.
Compilers of lesser merit can concoct code that bungles such double usage otherwise.

       INTEGER*4 DAYNUM	!-32768 to 32767 is just not adequate.
       TYPE(DATEBAG) D		!Though these numbers are more restrained.
       INTEGER N,L		!Workers.
        IF (DAYNUM.EQ.NOTADAYNUMBER) THEN	!Perhaps some work can be dodged.
          SLASHDATE = " Undated!!"	!No proper day number has been placed.
         RETURN		!So give up, rather than show odd results.
        END IF			!So much for confusion.
        D = MUNYAD(DAYNUM)	!Get the pieces.
        IF (D%DAY.GT.9) THEN	!Here we go.
          SLASHDATE(1:1) = CHAR(D%DAY/10 + ICHAR("0"))	!Faster than a table look-up?
         ELSE			!Even if not,
          SLASHDATE(1:1) = " "	!This should be quick.
        END IF			!So much for the tens digit.
        SLASHDATE(2:2) = CHAR(MOD(D%DAY,10) + ICHAR("0"))	!The units digit.
        SLASHDATE(3:3) = "/"	!Enough of the day number. The separator.
        IF (D%MONTH.GT.9) THEN	!Now for the month.
          SLASHDATE(4:4) = CHAR(D%MONTH/10 + ICHAR("0"))	!The tens digit.
         ELSE			!Not so often used. A table beckons...
          SLASHDATE(4:4) = " "	!Some might desire leading zeroes here.
        END IF			!Enough of October, November and December.
        SLASHDATE(5:5) = CHAR(MOD(D%MONTH,10) + ICHAR("0"))	!The units digit.
        SLASHDATE(6:6) = "/"	!Enough of the month number. The separator.
        L = 10			!The year value deserves a loop, it having four digits.
        N = ABS(D%YEAR)	!Should never be zero. 1BC is year -1 and 1AD is year = +1.
   1    SLASHDATE(L:L) = CHAR(MOD(N,10) + ICHAR("0"))	!But if it is, this will place a zero.
        N = N /10		!Drop a power of ten.
        L = L - 1		!Step back for the next digit.
        IF (L.GT.6) GO TO 1	!Thus always four digits, even if they lead with zero.
        IF (N.GT.0) SLASHDATE(7:7) = "?"	!Y > 9999? Might as well do something.
        IF (D%YEAR.LT.0) SLASHDATE(7:7) = "-"	!Years BC? Rather than give no indication.

c      WRITE (SLASHDATE,1) D%DAY,D%MONTH,D%YEAR	!Some compilers will bungle this.
c   1  FORMAT (I2,"/",I2,"/",I4)		!If so, a local variable must be used.

       RETURN			!Enough.		!As when SLASHDATE is invoked in a WRITE statement.
      END FUNCTION SLASHDATE	!Simple enough.

</lang> Naturally, I could do a proper job by repeatedly using MOD(N,10) and placing the appropriate digit and dividing N by ten but, aside from the annoyance of the digits coming out backwards (and you don't know how many there will be) there are traps here too. With two's complement, one cannot just determine the sign and then continue with ABS(N), because, in 16 bits, -32768 can't be represented as a positive integer. Very well, work always with negative integers, converting a positive N to negative. Then you discover (or should!) that the MOD function for negative numbers has two styles of behaviour and different computers (and their compilers) may well differ (Prof. Knuth remarks on this in the calculation for the date of Easter). Thus portability fades again. But in this case, because the number of digits resulting is unknown until the deed is done, a scratchpad is needed so that the desired digits only can be returned, with no leading spaces...

Which is pointless in F95, where CHARACTER functions always return a fixed number of characters - so I have routines such as SSPACE to single-space a line of text... Only with the latter day STRING style could this be done properly.
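A sketch of how the digit loop might look with the latter-day string style, under stated assumptions: an F2003 compiler, a two's complement processor, and the standard's definition of MOD(A,P) as A - INT(A/P)*P, so that the result takes the sign of A. The names itoa_mod and itoa are invented here. Working in the negative range avoids ABS, and the scratchpad is trimmed to just the digits on return:

<lang Fortran>
module itoa_mod
  implicit none
contains
  function itoa(n) result(text)
    integer, intent(in) :: n
    character(len=:), allocatable :: text
    character(len=32) :: scratch       ! scratchpad, filled from the right
    integer :: m, pos

    m = n
    if (m > 0) m = -m                  ! the negative range holds every magnitude
    pos = len(scratch)
    do
      ! mod(m,10) is in -9..0 for m <= 0, so subtracting it yields a digit 0..9
      scratch(pos:pos) = achar(iachar("0") - mod(m, 10))
      m = m / 10                       ! integer division truncates toward zero
      if (m == 0) exit
      pos = pos - 1
    end do
    if (n < 0) then
      pos = pos - 1
      scratch(pos:pos) = "-"
    end if
    text = scratch(pos:)               ! only the wanted characters, no leading spaces
  end function itoa
end module itoa_mod

program itoa_demo
  use itoa_mod
  implicit none

  print "(A)", itoa(0)                 ! 0
  print "(A)", itoa(-9)                ! -9
  print "(A)", itoa(12345)             ! 12345
  print "(A)", itoa(-huge(0) - 1)      ! most negative value, if the processor has one beyond -huge
end program itoa_demo
</lang>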

I'm still uncomfortable about the unlimbering of memory re-allocation on every change to a string. There are ploys, such as the obvious "do nothing" if the new size is the same as the old size. Presuming that this memory churning would be mixed in with the other memory churning of arrays of floating-point numbers, etc. this means that the precise state of the memory depends now not just on the churn of number storage, but also of text twiddling. Still, given that modern computers have plenty of memory and that there are no mistakes in the memory manipulation, this shouldn't matter... I have from time-to-time mused on a half-way house, wherein a string has a length and a maximum size: any manipulations up to the size limit incur no re-allocation and so can be done in-place. This would offload some of the demand for cunning in the memory re-allocation routines but of course means that the string manipulation procedures become even more complex. Dinosaur (talk) 08:28, 12 December 2015 (UTC)
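The half-way house could be sketched along these lines, assuming an F2003 compiler that supports deferred-length character components. The type and procedure names (bounded_string, init, append, contents) are hypothetical, and this is an illustration of the idea, not a finished string module: storage is allocated once at its maximum size, and changes within that size are done in place by substring assignment.

<lang Fortran>
module bounded_string_mod
  implicit none
  type :: bounded_string
    character(len=:), allocatable :: buf   ! storage, allocated once at full capacity
    integer :: length = 0                  ! characters currently in use
  end type bounded_string
contains
  subroutine init(s, capacity)
    type(bounded_string), intent(out) :: s
    integer, intent(in) :: capacity
    allocate (character(len=capacity) :: s%buf)
    s%length = 0
  end subroutine init

  subroutine append(s, text, ok)
    type(bounded_string), intent(inout) :: s
    character(len=*), intent(in) :: text
    logical, intent(out) :: ok
    ok = s%length + len(text) <= len(s%buf)
    if (.not. ok) return                   ! would exceed the maximum: leave s unchanged
    s%buf(s%length + 1:s%length + len(text)) = text   ! in place, no reallocation
    s%length = s%length + len(text)
  end subroutine append

  function contents(s) result(text)
    type(bounded_string), intent(in) :: s
    character(len=:), allocatable :: text
    text = s%buf(1:s%length)
  end function contents
end module bounded_string_mod

program bounded_demo
  use bounded_string_mod
  implicit none
  type(bounded_string) :: s
  logical :: ok

  call init(s, 16)
  call append(s, "Hello", ok)
  call append(s, ", world", ok)
  print "(A,1X,I0,1X,L1)", contents(s), s%length, ok   ! Hello, world 12 T
  call append(s, "! And more", ok)                     ! 22 > 16, so refused
  print "(A,1X,I0,1X,L1)", contents(s), s%length, ok   ! Hello, world 12 F
end program bounded_demo
</lang>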

Call a foreign-language function

I don't quite agree with your comment, on two points.

First, the definition. Quoting Foreign function interface:

Foreign function interface, or FFI, is a common name for a facility in a programming language (especially a high-level one that does not usually work in terms of pointers, raw structure layout, etc.) to invoke functions and access data structures defined using another one (especially C).

Calling C from Fortran qualifies obviously as FFI. The only difference with, say, a Common Lisp FFI to C, is that it's specified by the standard. C is nevertheless foreign to Fortran. It's rather rare for standardized languages, but not unique: Ada has interfaces to C, COBOL and Fortran, for instance.

Second point, you write that you should write a C wrapper. But C has no standard way to call another language. If you can do it with standard C, then you will be able to do it with Fortran and ISO_C_BINDING, and if you need nonstandard C features, then the Fortran vendor may provide the same nonstandard features. And often that's what happens. A classic example is calling a STDCALL function on Windows, while, usually, C programs (and Fortran programs, at least with BIND(C)) use the CDECL convention. But this is not really a language concern, it's on the ABI level. And, just as C compilers provide the nonstandard __stdcall keyword, Fortran compilers provide either a STDCALL keyword (Absoft Pro Fortran) or another nonstandard way to declare the function: Intel Fortran has "!DEC$ ATTRIBUTES STDCALL :: fun", while GNU Fortran has the same (replace DEC with GCC). All in all, in all cases I know of, if you can do it in C, you can do it in Fortran without a wrapper. It may be simpler with a wrapper though.
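As a small illustration of the standard route, here is a sketch that calls the C library function strlen directly through BIND(C), with no wrapper. It assumes a compiler supporting ISO_C_BINDING and a C runtime linked into the program; the local name c_strlen is an arbitrary choice:

<lang Fortran>
program call_c_strlen
  use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none

  interface
    ! Fortran view of:  size_t strlen(const char *s);
    function c_strlen(s) bind(C, name="strlen") result(n)
      import :: c_char, c_size_t
      character(kind=c_char), dimension(*), intent(in) :: s
      integer(c_size_t) :: n
    end function c_strlen
  end interface

  ! C expects a NUL-terminated string, so the terminator is appended explicitly.
  print *, c_strlen("Blah"//c_null_char)   ! 4
end program call_c_strlen
</lang>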

There are other cases for which one must use nonstandard features. Another example is calling a function that needs arguments with a special memory alignment. Intel has "!DEC$ ATTRIBUTES ALIGN" for this. Even in Fortran alone, this allows the compiler to optimize better with SIMD instructions.

Notice that even before ISO_C_BINDING, Fortran compiler vendors used to provide some way to interface with C. There are DEC extensions, for instance. Incidentally, Intel still uses these extensions instead of the new standard bindings, for interface modules to the Windows API. I guess it's so that older programs don't break. It's also often easier to use integer variables holding pointers, than the c_f_pointer and c_f_procpointer subroutines.
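For comparison with the integer-holding-a-pointer habit, a minimal sketch of the standard c_f_pointer route. To stay self-contained it takes the C address of a Fortran array with c_loc instead of receiving one from a real C function; the variable names are illustrative:

<lang Fortran>
program c_f_pointer_demo
  use, intrinsic :: iso_c_binding, only: c_double, c_ptr, c_loc, c_f_pointer
  implicit none
  real(c_double), target  :: values(3) = [1.0_c_double, 2.0_c_double, 3.0_c_double]
  real(c_double), pointer :: view(:)
  type(c_ptr) :: p

  p = c_loc(values)                ! an opaque C address, as a C routine might return
  call c_f_pointer(p, view, [3])   ! attach a Fortran array pointer of shape (3) to it
  view(2) = 20.0_c_double          ! writes through to the original storage
  print *, values                  ! 1.0, 20.0, 3.0
end program c_f_pointer_demo
</lang>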

On the other hand, there are languages for which a wrapper is mandatory. An example is VBA, in 32-bit applications. Since VBA can only call STDCALL functions, a CDECL function can't be called without a wrapper (which may be written in C, Pascal or Fortran, for instance). In 64-bit applications, this is not a problem anymore, since there is only one calling convention.

Arbautjc (talk) 18:27, 24 November 2016 (UTC)