Compare length of two strings

From Rosetta Code
Revision as of 06:13, 28 October 2021 by Wherrera (julia example)
You are encouraged to solve this task according to the task description, using any language you may know.

Basic Data Operation
This is a basic data operation. It represents a fundamental action on a basic data type.

Task

Given two strings of different length, determine which string is longer or shorter. Print both strings and their length, one on each line. Print the longer one first.

Measure the length of your string in terms of bytes or characters, as appropriate for your language. If your language doesn't have an operator for measuring the length of a string, note it.
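As a language-neutral illustration of the task statement, here is a minimal Python sketch. The function name `describe_by_length` is hypothetical, chosen only for this example, and length is measured in Unicode code points because that is what Python's built-in `len` counts for `str`.

```python
def describe_by_length(a, b):
    """Return one line per string, longest first, as '<string> <length>'."""
    # len() on a Python str counts Unicode code points, not bytes.
    ordered = sorted([a, b], key=len, reverse=True)
    return [f"{s} {len(s)}" for s in ordered]


if __name__ == "__main__":
    for line in describe_by_length("Hello", "Goodbye"):
        print(line)
```

Sorting with `key=len` keeps the comparison and the printing order in one place; swapping in a byte-based key such as `lambda s: len(s.encode("utf-8"))` would measure bytes instead.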

Other tasks related to string operations:
Metrics
Counting
Remove/replace
Anagrams/Derangements/shuffling
Find/Search/Determine
Formatting
Song lyrics/poems/Mad Libs/phrases
Tokenize
Sequences




Julia

Per the Julia docs, a String in Julia is a sequence of characters encoded as UTF-8. Most string methods in Julia actually accept an AbstractString, which is the supertype of all string types in Julia regardless of encoding, including the default UTF-8 String.

The Char data type in Julia is a 32-bit, potentially Unicode data type, so that if we enumerate a String as a Char array, we get a series of 32-bit characters:

<lang julia>s = "niño"
println("Position  Char Bytes\n==============================")
for (i, c) in enumerate(s)
    println("$i          $c     $(sizeof(c))")
end
</lang>

Output:
Position  Char Bytes
==============================
1          n     4
2          i     4
3          ñ     4
4          o     4

However, indexing into the string works as if it were an ordinary C string, that is, an array of unsigned 8-bit integers: the index is a byte offset. If the index falls inside a character that occupies more than one byte, a StringIndexError is thrown. This can be demonstrated by viewing the above string as codeunits:

<lang julia>println("Position  Codeunit Bytes\n==============================")
for (i, c) in enumerate(codeunits(s))
    println("$i            $(string(c, base=16))     $(sizeof(c))")
end
</lang>

Output:
Position  Codeunit Bytes
==============================
1            6e     1
2            69     1
3            c3     1
4            b1     1
5            6f     1

Note that the length of "niño" as a String is 4 characters, and the length of "niño" as codeunits (i.e., 8-bit bytes) is 5. Indexing into the 4th position results in an error:

<lang julia>julia> s[4]
ERROR: StringIndexError: invalid index [4], valid nearby indices [3]=>'ñ', [5]=>'o'
</lang>

So, whether a string is longer or shorter depends on whether its length is counted in characters or in code units of the encoding, as below:

<lang julia>length("ñññ") < length("nnnn")  # true, and the usual meaning of length of a String
length(codeunits("ñññ")) > length(codeunits("nnnn"))  # true as well
</lang>
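The character versus code unit distinction is not specific to Julia. For comparison, a minimal Python sketch of the same measurements follows; `lengths` and `is_char_boundary` are hypothetical helper names introduced for this example, and the string is written with an escape so that ñ is the single precomposed code point U+00F1.

```python
# Illustrative sketch; the helper names are hypothetical, not from any library.
s = "ni\u00f1o"  # "niño" with a precomposed ñ (U+00F1)


def lengths(text):
    """Return (code point count, UTF-8 code unit count) for text."""
    return len(text), len(text.encode("utf-8"))


def is_char_boundary(data, offset):
    """True if the 0-based byte offset starts a character in UTF-8 data."""
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (data[offset] & 0xC0) != 0x80


units = s.encode("utf-8")
print(lengths(s))                  # (4, 5)
print(is_char_boundary(units, 2))  # True: 0xC3 starts 'ñ'
print(is_char_boundary(units, 3))  # False: 0xB1 lies inside 'ñ'

# The ordering flips with the unit of measurement.
print(lengths("\u00f1" * 3)[0] < lengths("nnnn")[0])  # True: 3 < 4 code points
print(lengths("\u00f1" * 3)[1] > lengths("nnnn")[1])  # True: 6 > 4 bytes
```

Byte offset 3 here corresponds to Julia's 1-based index 4, the position that raises the error shown above.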


Raku

So... In what way does this task differ significantly from String length? Other than being horribly underspecified?

In the modern world, string "length" is pretty much a useless measurement, especially in the absence of a specified encoding; hence Raku does not even have a "length" operator for strings.

<lang perl6>say 'Strings (👨‍👩‍👧‍👦, 🤔🇺🇸, BOGUS!) sorted: "longest" first:';
say "$_: characters:{.chars},  Unicode code points:{.codes},  UTF-8 bytes:{.encode('UTF8').bytes},  UTF-16 bytes:{.encode('UTF16').bytes}" for <👨‍👩‍👧‍👦 BOGUS! 🤔🇺🇸>.sort: -*.chars;</lang>

Output:
Strings (👨‍👩‍👧‍👦, 🤔🇺🇸, BOGUS!) sorted: "longest" first:
BOGUS!: characters:6,  Unicode code points:6,  UTF-8 bytes:6,  UTF-16 bytes:12
🤔🇺🇸: characters:2,  Unicode code points:3,  UTF-8 bytes:12,  UTF-16 bytes:12
👨‍👩‍👧‍👦: characters:1,  Unicode code points:7,  UTF-8 bytes:25,  UTF-16 bytes:22
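To show how the ranking changes with the unit chosen, here is a rough Python sketch that recomputes three of the four measures for the same strings. The names `measures`, `family`, `thinking_flag` and `plain` are hypothetical. The grapheme ("characters") count is left out because, as far as this sketch assumes, the Python standard library offers no grapheme cluster segmentation; UTF-16 is encoded without a byte order mark so the byte counts line up with the output above.

```python
def measures(text):
    """Return code points, UTF-8 bytes and UTF-16 bytes (no BOM) for text."""
    return {
        "code points": len(text),
        "UTF-8 bytes": len(text.encode("utf-8")),
        "UTF-16 bytes": len(text.encode("utf-16-le")),
    }


# The same three strings, spelled with escapes to make the code points explicit.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
thinking_flag = "\U0001F914\U0001F1FA\U0001F1F8"
plain = "BOGUS!"

strings = [family, thinking_flag, plain]

# Longest first by code points: family (7), plain (6), thinking_flag (3).
by_code_points = sorted(strings, key=len, reverse=True)

# Longest first by UTF-8 bytes: family (25), thinking_flag (12), plain (6).
by_utf8 = sorted(strings, key=lambda t: len(t.encode("utf-8")), reverse=True)

for text in strings:
    print(measures(text))
```

Neither ordering matches the grapheme-based one above, where BOGUS! comes first and the family emoji last.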

Z80 Assembly

<lang z80>Terminator equ 0      ;null terminator
PrintChar equ &BB5A   ;Amstrad CPC BIOS call, prints accumulator to screen as an ASCII character.

        org &8000

        ld hl,String1
        ld de,String2
        call CompareStringLengths

        jp nc, Print_HL_First
        ex de,hl
Print_HL_First:
        push bc
        push hl
        call PrintString
        pop hl
        push hl
        ld a,' '
        call PrintChar
        call GetStringLength
        ld a,b
        call ShowHex_NoLeadingZeroes
        call NewLine
        pop hl
        pop bc

        ex de,hl
        push bc
        push hl
        call PrintString
        pop hl
        push hl
        ld a,' '
        call PrintChar
        call GetStringLength
        ld a,b
        call ShowHex_NoLeadingZeroes
        call NewLine
        pop hl
        pop bc
ReturnToBasic:
        RET

String1:
        byte "Hello",Terminator
String2:
        byte "Goodbye",Terminator

;;;;;; RELEVANT SUBROUTINES - PRINTSTRING AND NEWLINE CREATED BY KEITH S. OF CHIBIAKUMAS

CompareStringLengths:
        ;HL = string 1
        ;DE = string 2
        ;CLOBBERS A,B,C
        push hl
        push de
        ex de,hl
        call GetStringLength
        ld c,b          ;save len(DE) in C (GetStringLength returns its count in B)

        ex de,hl
        call GetStringLength
        ld a,b
        cp c
        pop de
        pop hl
        ret
        ;returns carry set if HL < DE, zero set if equal, zero & carry clear if HL >= DE
        ;returns len(DE) in C, and len(HL) in B.

GetStringLength:
        ld b,0
loop_getStringLength:
        ld a,(hl)
        cp Terminator
        ret z
        inc hl
        inc b
        jr loop_getStringLength

NewLine:
        push af
        ld a,13         ;Carriage return
        call PrintChar
        ld a,10         ;Line Feed
        call PrintChar
        pop af
        ret

PrintString:
        ld a,(hl)
        cp Terminator
        ret z
        inc hl
        call PrintChar
        jr PrintString

ShowHex_NoLeadingZeroes:
        ;useful for printing values where leading zeroes don't make sense,
        ;such as money etc.
        push af
        and %11110000
        ifdef gbz80     ;game boy
        swap a
        else            ;zilog z80
        rrca
        rrca
        rrca
        rrca
        endif
        or a
        call nz,PrintHexChar    ;if top nibble of A is zero, don't print it.
        pop af
        and %00001111
        or a
        ret z           ;if bottom nibble of A is zero, don't print it!
        jp PrintHexChar

PrintHexChar:
        or a            ;Clear Carry Flag
        daa
        add a,&F0
        adc a,&40       ;This sequence converts a 4-bit hex digit to its ASCII equivalent.
        jp PrintChar</lang>

Output:
Goodbye 7
Hello 5
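For readers who do not read Z80 assembly, the length scan can be modelled roughly in Python. This is only a sketch of the idea, not a translation of the listing: `scan_length` and `memory` are hypothetical names, and the count is masked to 8 bits on the assumption that it lives in a single 8-bit register, as B does above.

```python
TERMINATOR = 0


def scan_length(memory, start):
    """Count bytes from start up to (not including) the terminator.

    The count is kept modulo 256 to mimic an 8-bit register.
    """
    count = 0
    addr = start
    while memory[addr] != TERMINATOR:
        addr += 1
        count = (count + 1) & 0xFF  # an 8-bit increment wraps from 255 to 0
    return count


# Two null-terminated strings laid out back to back, as in the listing.
memory = b"Hello\x00Goodbye\x00"
len1 = scan_length(memory, 0)  # 5
len2 = scan_length(memory, 6)  # 7
carry = len1 < len2            # models the carry flag after comparing len1 with len2

print(len1, len2, carry)
```

One consequence of the 8-bit counter is that strings of 256 bytes or more compare incorrectly; a wider counter would be needed for those.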