String length: Difference between revisions
===Grapheme Length===
<lang ruby>say str.graphs.len;    #=> 4</lang>

=={{header|Simula}}==
Simula has no built-in support for character encodings (Unicode had not yet been invented in 1967). Character encoding was regarded as the responsibility of the operating system, and one byte must correspond to one character.

Character constants encoded in UTF-8 are therefore not possible, but reading from a UTF-8 encoded file is.

Input:<pre>møøse
𝔘𝔫𝔦𝔠𝔬𝔡𝔢
J̲o̲s̲é̲
€
</pre>
===Byte Length===
<lang simula>BEGIN
    TEXT LINE;
    ! LASTITEM SKIPS BLANKS AND READS AHEAD UNTIL AN ITEM OR END OF FILE IS FOUND ;
    WHILE NOT LASTITEM DO
    BEGIN
        LINE :- COPY(SYSIN.IMAGE).STRIP;
        OUTCHAR('"');
        OUTTEXT(LINE);
        OUTCHAR('"');
        ! ONE CHARACTER IS ONE BYTE, SO LENGTH IS THE BYTE COUNT ;
        OUTTEXT(" BYTE LENGTH = "); OUTINT(LINE.LENGTH, 0);
        OUTIMAGE;
        INIMAGE;
    END;
END.
</lang>
Output:<pre>
"møøse" BYTE LENGTH = 7
"𝔘𝔫𝔦𝔠𝔬𝔡𝔢" BYTE LENGTH = 28
"J̲o̲s̲é̲" BYTE LENGTH = 13
"€" BYTE LENGTH = 3
</pre>
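
As a cross-check on the byte counts above, here is a minimal sketch in Python rather than Simula. It assumes that the é in the third sample is the precomposed U+00E9 and that each underline is the combining low line U+0332, which is consistent with the reported counts. The names <code>utf8_byte_length</code> and <code>SAMPLES</code> are made up for this illustration.

```python
def utf8_byte_length(s: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


# The four samples from the Input block. The third one is spelled with
# escapes to pin down the assumed code points (U+0332 after each letter,
# precomposed U+00E9 for the accented e).
SAMPLES = [
    "møøse",
    "𝔘𝔫𝔦𝔠𝔬𝔡𝔢",
    "J\u0332o\u0332s\u0332\u00e9\u0332",
    "€",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(f'"{sample}" BYTE LENGTH = {utf8_byte_length(sample)}')
```

Under those assumptions the sketch reports 7, 28, 13 and 3 bytes, matching the Simula output.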
===Character Length===
Since Simula only sees bytes, the character length has to be computed manually by inspecting the lead byte of each UTF-8 sequence:
<lang simula>BEGIN

    ! NUMBER OF UTF8 CHARACTERS IN STRING ;
    INTEGER PROCEDURE UTF8STRLEN(S); TEXT S;
    BEGIN
        INTEGER R, LEN, BYTES, ALLBYTES;
        CHARACTER BYTE;
        WHILE S.MORE DO
        BEGIN
            BYTE := S.GETCHAR;
            ALLBYTES := ALLBYTES + 1;
            R := RANK(BYTE);
            LEN := LEN + 1;
            BYTES :=
                IF R >=   0 AND R <= 127 THEN 1 ELSE ! 0....... ASCII ;
                IF R >= 128 AND R <= 191 THEN 0 ELSE ! 10...... CONTINUATION ;
                IF R >= 192 AND R <= 223 THEN 2 ELSE ! 110..... 10x ;
                IF R >= 224 AND R <= 239 THEN 3 ELSE ! 1110.... 10x 10x ;
                IF R >= 240 AND R <= 247 THEN 4 ELSE ! 11110... 10x 10x 10x ;
                -1;
            IF BYTES = -1 THEN ERROR("ILLEGAL UTF8 STRING");
            ! SKIP THE CONTINUATION BYTES OF THIS CHARACTER ;
            WHILE BYTES > 1 DO
            BEGIN
                BYTE := S.GETCHAR;
                ALLBYTES := ALLBYTES + 1;
                BYTES := BYTES - 1;
            END;
        END;
        UTF8STRLEN := LEN;
    END UTF8STRLEN;

    TEXT LINE;

    WHILE NOT LASTITEM DO
    BEGIN
        INTEGER L;
        LINE :- COPY(SYSIN.IMAGE).STRIP;
        OUTCHAR('"');
        OUTTEXT(LINE);
        OUTCHAR('"');
        L := UTF8STRLEN(LINE);
        OUTTEXT(" CHARACTER LENGTH = "); OUTINT(L, 0);
        OUTIMAGE;
        INIMAGE;
    END;

END.</lang>
Output:<pre>"møøse" CHARACTER LENGTH = 5
"𝔘𝔫𝔦𝔠𝔬𝔡𝔢" CHARACTER LENGTH = 7
"J̲o̲s̲é̲" CHARACTER LENGTH = 8
"€" CHARACTER LENGTH = 1
</pre>
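
For comparison, the lead-byte classification used by UTF8STRLEN can be sketched in Python. This is an illustrative variant, not a translation. The Simula procedure counts a stray continuation byte as one character and does not validate the trailing bytes, whereas the hypothetical <code>utf8_char_length</code> below rejects stray or missing continuation bytes. It is still not a full UTF-8 validator (overlong forms and surrogates are not detected), and like the Simula version it counts code points, not graphemes.

```python
def utf8_char_length(data: bytes) -> int:
    """Count code points in UTF-8 data by classifying each lead byte."""
    count = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:              # 0xxxxxxx: ASCII, 1 byte
            width = 1
        elif 0xC0 <= lead <= 0xDF:    # 110xxxxx: 2-byte sequence
            width = 2
        elif 0xE0 <= lead <= 0xEF:    # 1110xxxx: 3-byte sequence
            width = 3
        elif 0xF0 <= lead <= 0xF7:    # 11110xxx: 4-byte sequence
            width = 4
        else:
            # 10xxxxxx (stray continuation byte) or 11111xxx (never valid)
            raise ValueError(f"illegal UTF-8 lead byte at offset {i}")

        # Every trailing byte must exist and look like 10xxxxxx.
        trail = data[i + 1 : i + width]
        if len(trail) != width - 1 or any((b & 0xC0) != 0x80 for b in trail):
            raise ValueError(f"bad continuation bytes after offset {i}")

        count += 1
        i += width
    return count


if __name__ == "__main__":
    # Third sample assumes precomposed U+00E9 and combining U+0332.
    samples = ["møøse", "𝔘𝔫𝔦𝔠𝔬𝔡𝔢", "J\u0332o\u0332s\u0332\u00e9\u0332", "€"]
    for s in samples:
        n = utf8_char_length(s.encode("utf-8"))
        print(f'"{s}" CHARACTER LENGTH = {n}')
```

Rejecting malformed input instead of counting it is a design choice of this sketch. To mirror the Simula behaviour, the continuation-byte range would have to map to a width of 0 and still be counted.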

=={{header|Scala}}==