Compare length of two strings

From Rosetta Code

Task

Given two strings of different length, determine which string is longer or shorter. Print both strings and their length, one on each line. Print the longer one first.

Measure the length of your string in terms of bytes or characters, as appropriate for your language. If your language doesn't have an operator for measuring the length of a string, note it.

Extra credit

Given more than two strings:
list = ["abcd","123456789","abcdef","1234567"]
Show the strings in descending length order.
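As a language-neutral illustration of the task and its extra credit, here is a minimal Python sketch. The helper names `longest_first` and `report` are made up for this example, and length is measured with `len()`, which counts code points for a Python `str`.

```python
def longest_first(strings):
    """Return the strings ordered from longest to shortest.

    Length is measured with len(), i.e. in code points for a Python str.
    Strings of equal length keep their original relative order.
    """
    return sorted(strings, key=len, reverse=True)


def report(strings):
    """Format one 'string<TAB>length' line per string, longest first."""
    return [f"{s}\t{len(s)}" for s in longest_first(strings)]


# The two-string task is the general case with two items.
for line in report(["abcd", "123456789"]):
    print(line)

# Extra credit: the list from the task description.
for line in report(["abcd", "123456789", "abcdef", "1234567"]):
    print(line)
```

Treating the two-string case as a sort of a two-element list keeps one code path for both parts of the task; a plain `if` comparison would work just as well for exactly two strings.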




ALGOL 68

ALGOL 68 does not have a built-in "LENGTH" operator. It does have the operators LWB and UPB, which return the lower bound and upper bound of an array; since strings are arrays of characters, LENGTH can easily be constructed from these.
In most ALGOL 68 implementations, such as Algol 68G and Rutgers Algol 68, the CHAR type is an 8-bit byte.
<lang algol68>BEGIN # compare string lengths #

   # returns the length of s using the builtin UPB and LWB operators #
   OP LENGTH = ( STRING s )INT: ( UPB s + 1 ) - LWB s;
   # prints s and its length #
   PROC print string = ( STRING s )VOID:
        print( ( """", s, """ has length: ", whole( LENGTH s, 0 ), " bytes.", newline ) );
   STRING shorter     = "short";
   STRING not shorter = "longer";
   IF LENGTH shorter >  LENGTH not shorter THEN print string( shorter ) FI;
   print string( not shorter );
   IF LENGTH shorter <= LENGTH not shorter THEN print string( shorter ) FI

END</lang>

Output:
"longer" has length: 6 bytes.
"short" has length: 5 bytes.

FreeBASIC

<lang freebasic>sub comp( A as string, B as string )

   if len(A)>=len(B) then 
       print A, len(A)
       print B, len(B)
   else
       print B, len(B)
       print A, len(A)
   end if

end sub

comp( "abcd", "123456789" )</lang>

Output:
123456789      9
abcd           4

Haskell

Using native String type: <lang haskell>task s1 s2 = do

 let strs = if length s1 > length s2 then [s1, s2] else [s2, s1]
 mapM_ (\s -> putStrLn $ show (length s) ++ "\t" ++ show s) strs</lang>
λ> task "short string" "longer string"
13	"longer string"
12	"short string"

λ> Data.List.sortOn length ["abcd","123456789","abcdef","1234567"]
["abcd","abcdef","1234567","123456789"]

λ> Data.List.sortOn (negate . length) ["abcd","123456789","abcdef","1234567"]
["123456789","1234567","abcdef","abcd"]

or more practically useful Text: <lang haskell>import qualified Data.Text as T

taskT s1 s2 = do

 let strs = if T.length s1 > T.length s2 then [s1, s2] else [s2, s1]
 mapM_ (\s -> putStrLn $ show (T.length s) ++ "\t" ++ show s) strs</lang>
λ> :set -XOverloadedStrings
λ> taskT "short string" "longer string"
13	"longer string"
12	"short string"

Java

Works with: Java 11

<lang Java>package stringlensort;

import java.io.PrintStream;
import java.util.Arrays;
import java.util.Comparator;

public class ReportStringLengths {

   public static void main(String[] args) {
       String[] list = {"abcd", "123456789", "abcdef", "1234567"};
       String[] strings = args.length > 0 ? args : list;
       compareAndReportStringsLength(strings);
   }
   /**
    * Compare and report strings length to System.out.
    * 
    * @param strings an array of strings
    */    
   public static void compareAndReportStringsLength(String[] strings) {
       compareAndReportStringsLength(strings, System.out);
   }
   /**
    * Compare and report strings length.
    * 
    * @param strings an array of strings
    * @param stream the output stream to write results
    */
   public static void compareAndReportStringsLength(String[] strings, PrintStream stream) {
       if (strings.length > 0) {
           final String QUOTE = "\"";
           Arrays.sort(strings, Comparator.comparing(String::length));
           int min = strings[0].length();
           int max = strings[strings.length - 1].length();
           for (int i = strings.length - 1; i >= 0; i--) {
               int length = strings[i].length();
               String predicate;
               if (length == max) {
                   predicate = "is the longest string";
               } else if (length == min) {
                   predicate = "is the shortest string";
               } else {
                   predicate = "is neither the longest nor the shortest string";
               }
               //@todo: StringBuilder may be faster
               stream.println(QUOTE + strings[i] + QUOTE + " has " + length
                       + " and " + predicate);
           }
       }
   }

}</lang>

Output:
"123456789" has 9 and is the longest string
"1234567" has 7 and is neither the longest nor the shortest string
"abcdef" has 6 and is neither the longest nor the shortest string
"abcd" has 4 and is the shortest string

jq

Works with: jq

Works with gojq, the Go implementation of jq
<lang jq>def s1: "longer";
def s2: "shorter😀";

[s1,s2] | sort_by(length) | reverse[] | "\"\(.)\" has length (codepoints) \(length) and utf8 byte length \(utf8bytelength)."

</lang>

Output:
"shorter😀" has length (codepoints) 8 and utf8 byte length 11.
"longer" has length (codepoints) 6 and utf8 byte length 6.

Julia

Per the Julia docs, a String in Julia is a sequence of characters encoded as UTF-8. Most string methods in Julia actually accept an AbstractString, which is the supertype of strings in Julia regardless of the encoding, including the default UTF-8.

The Char data type in Julia is a 32-bit type that can hold any Unicode code point, so if we enumerate a String as a sequence of Char, we get a series of 32-bit characters:
<lang julia>s = "niño"
println("Position Char Bytes\n==============================")
for (i, c) in enumerate(s)
    println("$i          $c     $(sizeof(c))")
end
</lang>
Output:
Position  Char Bytes
==============================
1          n     4
2          i     4
3          ñ     4
4          o     4

However, indexing into the string works as if the string were an ordinary C string, that is, an array of unsigned 8-bit integers. If the index falls inside a character that occupies more than one byte, an error is thrown for bad indexing. This can be demonstrated by converting the above string to code units:
<lang julia>println("Position Codeunit Bytes\n==============================")
for (i, c) in enumerate(codeunits(s))
    println("$i            $(string(c, base=16))     $(sizeof(c))")
end
</lang>
Output:
Position  Codeunit Bytes
==============================
1            6e     1
2            69     1
3            c3     1
4            b1     1
5            6f     1

Note that the length of "niño" as a String is 4 characters, and the length of "niño" as code units (i.e., 8-bit bytes) is 5. Indexing into the 4th position results in an error:
<lang julia>julia> s[4]
ERROR: StringIndexError: invalid index [4], valid nearby indices [3]=>'ñ', [5]=>'o'
</lang>

So, whether a string is longer or shorter depends on the encoding, as below: <lang julia>length("ñññ") < length("nnnn") # true, and the usual meaning of length of a String

length(codeunits("ñññ")) > length(codeunits("nnnn")) # true as well </lang>
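The same flip between the two measures can be reproduced outside Julia. The following Python sketch is only an illustration of the idea, not part of the Julia solution; the helper names `code_points` and `utf8_code_units` are invented here, and the sample string uses the precomposed letter U+00F1 so that each "ñ" is a single code point.

```python
def code_points(s: str) -> int:
    """Length in Unicode code points (what len() reports for a Python str)."""
    return len(s)


def utf8_code_units(s: str) -> int:
    """Length in UTF-8 code units, i.e. bytes after encoding."""
    return len(s.encode("utf-8"))


a = "\u00f1" * 3  # three precomposed n-with-tilde letters, 2 bytes each in UTF-8
b = "nnnn"        # four ASCII letters, 1 byte each in UTF-8

# By code points the first string is shorter ...
print(code_points(a), code_points(b), code_points(a) < code_points(b))
# ... but by UTF-8 code units it is longer.
print(utf8_code_units(a), utf8_code_units(b), utf8_code_units(a) > utf8_code_units(b))
```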

Nim

In Nim, a character (char) occupies one byte. A string is a sequence of characters with a length. For interoperability reasons, an extra null is added after the last character. A string is supposed to be encoded in UTF-8, but this is not enforced. The function len returns the length of the string, i.e. its number of chars, which is its number of bytes (without the extra null).

If we want to handle a string as a sequence of Unicode code points, we have to use the module unicode. We can convert a string into a sequence of runes, each rune being a Unicode UTF-32 value. The length of this sequence is the number of code points.

<lang Nim>import strformat, unicode

const

 S1 = "marche"
 S2 = "marché"

echo &"“{S2}”, byte length = {S2.len}, code points: {S2.toRunes.len}"
echo &"“{S1}”, byte length = {S1.len}, code points: {S1.toRunes.len}"</lang>

Output:
“marché”, byte length = 7, code points: 6
“marche”, byte length = 6, code points: 6

Pascal

Works with: Extended Pascal

<lang pascal>program compareLengthOfStrings(output);

const
    specimenA = 'RosettaCode';
    specimenB = 'Pascal';
    specimenC = 'Foobar';
    specimenD = 'Pascalish';

type
    specimen = (A, B, C, D);
    specimens = set of specimen value [];

const
    specimenMinimum = A;
    specimenMaximum = D;

var
    { the explicit range min..max serves as a safeguard to update max const }
    list: array[specimenMinimum..specimenMaximum] of string(24)
        value [A: specimenA; B: specimenB; C: specimenC; D: specimenD];
    lengthRelationship: array[specimen] of specimens;

procedure analyzeLengths;
var
    left, right: specimen;
begin
    for left := specimenMinimum to specimenMaximum do
    begin
        for right := specimenMinimum to specimenMaximum do
        begin
            if length(list[left]) < length(list[right]) then
            begin
                { record that `left` is shorter than `right` }
                lengthRelationship[right] := lengthRelationship[right] + [left]
            end
        end
    end
end;

procedure printSortedByLengths;
var
    i: ord(specimenMinimum)..ord(specimenMaximum);
    s: specimen;
begin
    { first the string longer than all other strings }
    { lastly print the string not longer than any other string }
    for i := ord(specimenMaximum) downto ord(specimenMinimum) do
    begin
        { for demonstration purposes: iterate over a set }
        for s in [specimenMinimum..specimenMaximum] do
        begin
            { card returns the cardinality ("population count") }
            if card(lengthRelationship[s]) = i then
            begin
                writeLn(length(list[s]):8, ' ', list[s])
            end
        end
    end
end;

begin
    analyzeLengths;
    printSortedByLengths
end.</lang>

Output:
      11 RosettaCode
       9 Pascalish
       6 Pascal
       6 Foobar

Phix

Lengths are in bytes, for codepoints use length(utf8_to_utf32()) or similar.

with javascript_semantics
sequence list = {"abcd","123456789","abcdef","1234567"},
         lens = apply(list,length),
         tags = reverse(custom_sort(lens,tagset(length(lens))))
papply(true,printf,{1,{"%s (length %d)\n"},columnize({extract(list,tags),extract(lens,tags)})})
Output:
123456789 (length 9)
1234567 (length 7)
abcdef (length 6)
abcd (length 4)

Python

Works with: Python 3.8

<lang Python>def naive_compare_and_report_length(str1, str2):

   if len(str1) > len(str2):
       print('"' + str1 + '"', 'has length', len(str1), 'and is longer')
       print('"' + str2 + '"', 'has length', len(str2), 'and is shorter')
   elif len(str1) < len(str2):
       print('"' + str1 + '"', 'has length', len(str1), 'and is shorter')
       print('"' + str2 + '"', 'has length', len(str2), 'and is longer')
   else:
       print('"' + str1 + '"', 'has length', len(str1), 'and is equal')
       print('"' + str2 + '"', 'has length', len(str2), 'and is equal')


def compare_and_report_length(*objects, sorted_=True, reverse=True):

   """
   For objects given as parameters it prints which of them are the longest.
   Note that it is possible that each of these objects has the same length.
   """
   lengths = list(map(len, objects))
   max_length = max(lengths)
   lengths_and_objects = zip(lengths, objects)
   if sorted_:
       lengths_and_objects = sorted(lengths_and_objects, reverse=reverse)
   for length, obj in lengths_and_objects:
       predicate = 'is' if length == max_length else 'is not'
       print(f'"{obj}" has length {length} and {predicate} the longest string')


A = 'I am string'
B = 'I am string too'
LIST = ["abcd","123456789","abcdef","1234567"]

print()
print('Naive')
print()

naive_compare_and_report_length(A, B)
print()

print()
print('Sophisticated')
print()

print('sort two string')
print()
compare_and_report_length(A, B)
print()

print('sort a list of strings')
print()
compare_and_report_length(*LIST)
print()</lang>

Output:

Naive

"I am string" has length 11 and is shorter
"I am string too" has length 15 and is longer


Sophisticated

sort two string

"I am string too" has length 15 and is the longest string
"I am string" has length 11 and is not the longest string

sort a list of strings

"123456789" has length 9 and is the longest string
"1234567" has length 7 and is not the longest string
"abcdef" has length 6 and is not the longest string
"abcd" has length 4 and is not the longest string

Raku

So... In what way does this task differ significantly from String length? Other than being horribly under specified?

In the modern world, string "length" is pretty much a useless measurement, especially in the absence of a specified encoding; hence Raku not even having an operator: "length" for strings.

<lang perl6>say 'Strings (👨‍👩‍👧‍👦, 🤔🇺🇸, BOGUS!) sorted: "longest" first:';
say "$_: characters:{.chars}, Unicode code points:{.codes}, UTF-8 bytes:{.encode('UTF8').bytes}, UTF-16 bytes:{.encode('UTF16').bytes}" for <👨‍👩‍👧‍👦 BOGUS! 🤔🇺🇸>.sort: -*.chars;</lang>

Output:
Strings (👨‍👩‍👧‍👦, 🤔🇺🇸, BOGUS!) sorted: "longest" first:
BOGUS!: characters:6,  Unicode code points:6,  UTF-8 bytes:6,  UTF-16 bytes:12
🤔🇺🇸: characters:2,  Unicode code points:3,  UTF-8 bytes:12,  UTF-16 bytes:12
👨‍👩‍👧‍👦: characters:1,  Unicode code points:7,  UTF-8 bytes:25,  UTF-16 bytes:22
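To show how the same three strings produce different "lengths" under different measures, here is a rough Python cross-check. It is a sketch under stated assumptions: the function name `measures` is invented for this example, the strings are spelled out as escape sequences, and UTF-16 is encoded without a byte order mark. The Python standard library has no grapheme cluster counter, so the "characters" column of the Raku output is not reproduced.

```python
def measures(s: str) -> dict:
    """Return several common 'lengths' of the same string."""
    return {
        "code points": len(s),
        "UTF-8 bytes": len(s.encode("utf-8")),
        # "utf-16-le" is used so that no byte order mark is counted.
        "UTF-16 bytes": len(s.encode("utf-16-le")),
    }


# Family emoji: four people joined by three ZERO WIDTH JOINER code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
# Thinking face followed by the two regional indicators for "US".
face_flag = "\U0001F914\U0001F1FA\U0001F1F8"
bogus = "BOGUS!"

for s in (bogus, face_flag, family):
    print(s, measures(s))
```

Which string counts as "longest" depends entirely on the column chosen: BOGUS! wins by grapheme clusters in the Raku output, while the family emoji wins by code points and by bytes.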

Ring

Two strings

<lang ring>
see "working..." + nl

list = ["abcd","123456789"]
if len(list[1]) > len(list[2])
   first = list[1]
   second = list[2]
else
   first = list[2]
   second = list[1]
ok

see "Compare length of two strings:" + nl
see "" + first + " len = " + len(first) + nl + second + " len = " + len(second) + nl
see "done..." + nl
</lang>

Output:
working...
Compare length of two strings:
123456789 len = 9
abcd len = 4
done...

More than two strings

<lang ring>
see "working..." + nl

lenList = []
list = ["abcd","123456789","abcdef","1234567"]
for n = 1 to len(list)
   len = len(list[n])
   add(lenList,[len,n])
next

lenList = sort(lenList,1)
lenList = reverse(lenList)

see "Compare length of strings in descending order:" + nl
for n = 1 to len(lenList)
   see "" + list[lenList[n][2]] + " len = " + lenList[n][1] + nl
next
see "done..." + nl
</lang>

Output:
working...
Compare length of strings in descending order:
123456789 len = 9
1234567 len = 7
abcdef len = 6
abcd len = 4
done...

Wren

Library: Wren-upc

In Wren a string (i.e. an object of the String class) is an immutable sequence of bytes which is usually interpreted as UTF-8 but does not have to be.

With regard to string length, the String.count method returns the number of 'codepoints' in the string. If the string contains bytes which are invalid UTF-8, each such byte adds one to the count.

To find the number of bytes one can use String.bytes.count.

Unicode grapheme clusters, where what appears to be a single 'character' may in fact be an amalgam of several codepoints, are not directly supported by Wren. It is nevertheless possible to measure the length of a string in grapheme clusters (i.e. the number of user-perceived characters) using the Graphemes.clusterCount method of the Wren-upc module.
<lang ecmascript>import "./upc" for Graphemes

var printCounts = Fn.new { |s1, s2, c1, c2|

  var l1 = (c1 > c2) ? [s1, c1] : [s2, c2]
  var l2 = (c1 > c2) ? [s2, c2] : [s1, c1]
  System.print(  "%(l1[0]) : length %(l1[1])")
  System.print(  "%(l2[0]) : length %(l2[1])\n")

}

var codepointCounts = Fn.new { |s1, s2|

  var c1 = s1.count
  var c2 = s2.count
  System.print("Comparison by codepoints:")
  printCounts.call(s1, s2, c1, c2)

}

var byteCounts = Fn.new { |s1, s2|

  var c1 = s1.bytes.count
  var c2 = s2.bytes.count
  System.print("Comparison by bytes:")
  printCounts.call(s1, s2, c1, c2)

}

var graphemeCounts = Fn.new { |s1, s2|

  var c1 = Graphemes.clusterCount(s1)
  var c2 = Graphemes.clusterCount(s2)
  System.print("Comparison by grapheme clusters:")
  printCounts.call(s1, s2, c1, c2)

}

for (pair in [ ["nino", "niño"], ["👨‍👩‍👧‍👦", "🤔🇺🇸"] ]) {

   codepointCounts.call(pair[0], pair[1])
   byteCounts.call(pair[0], pair[1])
   graphemeCounts.call(pair[0], pair[1])

}

var list = ["abcd", "123456789", "abcdef", "1234567"]
System.write("Sorting in descending order by length in codepoints:\n%(list) -> ")
list.sort { |a, b| a.count > b.count }
System.print(list)</lang>

Output:
Comparison by codepoints:
niño : length 4
nino : length 4

Comparison by bytes:
niño : length 5
nino : length 4

Comparison by grapheme clusters:
niño : length 4
nino : length 4

Comparison by codepoints:
👨‍👩‍👧‍👦 : length 7
🤔🇺🇸 : length 3

Comparison by bytes:
👨‍👩‍👧‍👦 : length 25
🤔🇺🇸 : length 12

Comparison by grapheme clusters:
🤔🇺🇸 : length 2
👨‍👩‍👧‍👦 : length 1

Sorting in descending order by length in codepoints:
[abcd, 123456789, abcdef, 1234567] -> [123456789, 1234567, abcdef, abcd]

Z80 Assembly

<lang z80>Terminator equ 0        ;null terminator
PrintChar  equ &BB5A    ;Amstrad CPC BIOS call, prints accumulator to screen as an ASCII character.

        org &8000

        ld hl,String1
        ld de,String2
        call CompareStringLengths

        jp nc,Print_HL_First
        ex de,hl
Print_HL_First:
        push bc
        push hl
        call PrintString
        pop hl
        push hl
        ld a,' '
        call PrintChar
        call GetStringLength
        ld a,b
        call ShowHex_NoLeadingZeroes
        call NewLine
        pop hl
        pop bc

        ex de,hl
        push bc
        push hl
        call PrintString
        pop hl
        push hl
        ld a,' '
        call PrintChar
        call GetStringLength
        ld a,b
        call ShowHex_NoLeadingZeroes
        call NewLine
        pop hl
        pop bc
ReturnToBasic:
        RET

String1:
        byte "Hello",Terminator
String2:
        byte "Goodbye",Terminator

;RELEVANT SUBROUTINES - PRINTSTRING AND NEWLINE CREATED BY KEITH S. OF CHIBIAKUMAS

CompareStringLengths:
        ;HL = string 1
        ;DE = string 2
        ;CLOBBERS A,B,C
        push hl
        push de
        ex de,hl
        call GetStringLength
        ld c,b          ;GetStringLength returns its result in B; keep len(DE) in C

        ex de,hl
        call GetStringLength
        ld a,b
        cp c
        pop de
        pop hl
        ret
        ;returns carry set if HL < DE, zero set if equal, zero & carry clear if HL > DE
        ;returns len(DE) in C, and len(HL) in B.

GetStringLength:
        ld b,0
loop_getStringLength:
        ld a,(hl)
        cp Terminator
        ret z
        inc hl
        inc b
        jr loop_getStringLength

NewLine:
        push af
        ld a,13         ;Carriage return
        call PrintChar
        ld a,10         ;Line Feed
        call PrintChar
        pop af
        ret

PrintString:
        ld a,(hl)
        cp Terminator
        ret z
        inc hl
        call PrintChar
        jr PrintString

ShowHex_NoLeadingZeroes:
        ;useful for printing values where leading zeroes don't make sense,
        ;such as money etc.
        push af
        and %11110000
        ifdef gbz80     ;game boy
        swap a
        else            ;zilog z80
        rrca
        rrca
        rrca
        rrca
        endif
        or a
        call nz,PrintHexChar    ;if top nibble of A is zero, don't print it.
        pop af
        and %00001111
        or a
        ret z                   ;if bottom nibble of A is zero, don't print it!
        jp PrintHexChar

PrintHexChar:
        or a            ;Clear Carry Flag
        daa
        add a,&F0
        adc a,&40       ;This sequence converts a 4-bit hex digit to its ASCII equivalent.
        jp PrintChar</lang>

Output:
Goodbye 7
Hello 5
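For readers who do not read Z80 assembly, the logic of the length scan and the comparison can be modelled in a few lines of Python. This is only a sketch of the idea, not a translation of the routine above: the names `c_string_length` and `longer_first` are invented here, the 8-bit register wrap-around is not modelled, and every buffer is assumed to contain a terminating zero byte.

```python
def c_string_length(memory: bytes, start: int = 0) -> int:
    """Count bytes from `start` up to, but not including, the first 0 byte."""
    length = 0
    # Assumption: a terminator exists; otherwise this raises IndexError.
    while memory[start + length] != 0:
        length += 1
    return length


def longer_first(a: bytes, b: bytes):
    """Return (a, b) if a is at least as long as b, otherwise (b, a)."""
    if c_string_length(a) >= c_string_length(b):
        return a, b
    return b, a


first, second = longer_first(b"Hello\x00", b"Goodbye\x00")
for s in (first, second):
    n = c_string_length(s)
    # Print the text before the terminator, followed by its length.
    print(s[:n].decode("ascii"), n)
```

As in the assembly version, ties keep the first string in front, which mirrors the `jp nc` branch being taken when the two lengths are equal.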