String length: Difference between revisions

Line 2,305:
===Grapheme Length===
<lang ruby>say str.graphs.len; #=> 4</lang>
=={{header|Simula}}==
Simula has no bultin support for character encodings (Unicode was not even invented in the year 1967). The encoding was regarded responsibility of the operating system and one byte must match one character.
So character constants encoded in UTF-8 are not possible.
But reading from a utf8-encoding file is actually possible.
 
Input:<pre>møøse
𝔘𝔫𝔦𝔠𝔬𝔡𝔢
J̲o̲s̲é̲
</pre>
===Byte Length===
<lang simula>BEGIN
TEXT LINE;
WHILE NOT LASTITEM DO
BEGIN
INTEGER L;
LINE :- COPY(SYSIN.IMAGE).STRIP;
OUTCHAR('"');
OUTTEXT(LINE);
OUTCHAR('"');
OUTTEXT(" BYTE LENGTH = "); OUTINT(LINE.LENGTH, 0);
OUTIMAGE;
INIMAGE;
END;
END.
</lang>
Output:<pre>
"møøse" BYTE LENGTH = 7
"𝔘𝔫𝔦𝔠𝔬𝔡𝔢" BYTE LENGTH = 28
"J̲o̲s̲é̲" BYTE LENGTH = 13
"€" BYTE LENGTH = 3
</pre>
===Character Length===
To calculate the character length, one can do it manually:
<lang simula>BEGIN
 
! NUMBER OF UFT8 CHARACTERS IN STRING ;
INTEGER PROCEDURE UTF8STRLEN(S); TEXT S;
BEGIN
INTEGER R, LEN, BYTES, ALLBYTES;
CHARACTER BYTE;
WHILE S.MORE DO
BEGIN
BYTE := S.GETCHAR;
ALLBYTES := ALLBYTES + 1;
R := RANK(BYTE);
LEN := LEN + 1;
BYTES :=
IF R >= 0 AND R <= 127 THEN 1 ELSE ! 0....... ASCII ;
IF R >= 128 AND R <= 191 THEN 0 ELSE ! 10...... CONTINUATION ;
IF R >= 192 AND R <= 223 THEN 2 ELSE ! 110..... 10x ;
IF R >= 224 AND R <= 239 THEN 3 ELSE ! 1110.... 10x 10x ;
IF R >= 240 AND R <= 247 THEN 4 ELSE ! 11110... 10x 10x 10x ;
-1;
IF BYTES = -1 THEN ERROR("ILLEGAL UTF8 STRING");
WHILE BYTES > 1 DO
BEGIN
BYTE := S.GETCHAR;
ALLBYTES := ALLBYTES + 1;
BYTES := BYTES - 1;
END;
END;
UTF8STRLEN := LEN;
END UTF8STRLEN;
 
TEXT LINE;
WHILE NOT LASTITEM DO
BEGIN
INTEGER L;
LINE :- COPY(SYSIN.IMAGE).STRIP;
OUTCHAR('"');
OUTTEXT(LINE);
OUTCHAR('"');
L := UTF8STRLEN(LINE);
OUTTEXT(" CHARACTER LENGTH = "); OUTINT(UTF8STRLEN(LINE), 0);
OUTIMAGE;
INIMAGE;
END;
 
END.</lang>
Output:<pre>"møøse" CHARACTER LENGTH = 5
"𝔘𝔫𝔦𝔠𝔬𝔡𝔢" CHARACTER LENGTH = 7
"J̲o̲s̲é̲" CHARACTER LENGTH = 8
"€" CHARACTER LENGTH = 1
</pre>
 
=={{header|Scala}}==
Anonymous user