Compare length of two strings
You are encouraged to solve this task according to the task description, using any language you may know.
- Task
Given two strings of different length, determine which string is longer or shorter. Print both strings and their length, one on each line. Print the longer one first.
Measure the length of your string in terms of bytes or characters, as appropriate for your language. If your language doesn't have an operator for measuring the length of a string, note it.
- Extra credit
Given more than two strings:
list = ["abcd","123456789","abcdef","1234567"]
Show the strings in descending length order.
ALGOL 68
Algol 68 does not have an in-built "LENGTH" operator. It does have the operators LWB and UPB, which return the lower bound and upper bound of an array, and since strings are arrays of characters, LENGTH can easily be constructed from these.
In most Algol 68 implementations such as Algol 68G and Rutgers Algol 68, the CHAR type is an 8-bit byte.
<lang algol68>BEGIN # compare string lengths #
    # returns the length of s using the builtin UPB and LWB operators #
    OP LENGTH = ( STRING s )INT: ( UPB s + 1 ) - LWB s;
    # prints s and its length #
    PROC print string = ( STRING s )VOID:
         print( ( """", s, """ has length: ", whole( LENGTH s, 0 ), " bytes.", newline ) );
    STRING shorter     = "short";
    STRING not shorter = "longer";
    IF LENGTH shorter >  LENGTH not shorter THEN print string( shorter ) FI;
    print string( not shorter );
    IF LENGTH shorter <= LENGTH not shorter THEN print string( shorter ) FI
END</lang>
- Output:
"longer" has length: 6 bytes.
"short" has length: 5 bytes.
FreeBASIC
<lang freebasic>sub comp( A as string, B as string )
    if len(A)>=len(B) then
        print A, len(A)
        print B, len(B)
    else
        print B, len(B)
        print A, len(A)
    end if
end sub

comp( "abcd", "123456789" )</lang>
- Output:
123456789 9
abcd 4
Haskell
Using the native String type:
<lang haskell>task s1 s2 = do
  let strs = if length s1 > length s2 then [s1, s2] else [s2, s1]
  mapM_ (\s -> putStrLn $ show (length s) ++ "\t" ++ show s) strs</lang>
λ> task "short string" "longer string"
13 "longer string"
12 "short string"

λ> Data.List.sortOn length ["abcd","123456789","abcdef","1234567"]
["abcd","abcdef","1234567","123456789"]

λ> Data.List.sortOn (negate . length) ["abcd","123456789","abcdef","1234567"]
["123456789","1234567","abcdef","abcd"]

Or, using the more practically useful Text type:
<lang haskell>import qualified Data.Text as T

taskT s1 s2 = do
  let strs = if T.length s1 > T.length s2 then [s1, s2] else [s2, s1]
  mapM_ (\s -> putStrLn $ show (T.length s) ++ "\t" ++ show s) strs</lang>
λ> :set -XOverloadedStrings
λ> taskT "short string" "longer string"
13 "longer string"
12 "short string"
Java
<lang Java>package stringlensort;

import java.io.PrintStream;
import java.util.Arrays;
import java.util.Comparator;

public class ReportStringLengths {

    public static void main(String[] args) {
        String[] list = {"abcd", "123456789", "abcdef", "1234567"};
        String[] strings = args.length > 0 ? args : list;

        compareAndReportStringsLength(strings);
    }

    /**
     * Compare and report strings length to System.out.
     *
     * @param strings an array of strings
     */
    public static void compareAndReportStringsLength(String[] strings) {
        compareAndReportStringsLength(strings, System.out);
    }

    /**
     * Compare and report strings length.
     *
     * @param strings an array of strings
     * @param stream the output stream to write results
     */
    public static void compareAndReportStringsLength(String[] strings, PrintStream stream) {
        if (strings.length > 0) {
            final String QUOTE = "\"";
            // Sorts the caller's array in place, shortest string first.
            Arrays.sort(strings, Comparator.comparing(String::length));
            int min = strings[0].length();
            int max = strings[strings.length - 1].length();
            // Walk the sorted array backwards so the longest string is reported first.
            for (int i = strings.length - 1; i >= 0; i--) {
                int length = strings[i].length();
                String predicate;
                if (length == max) {
                    predicate = "is the longest string";
                } else if (length == min) {
                    predicate = "is the shortest string";
                } else {
                    predicate = "is neither the longest nor the shortest string";
                }
                //@todo: StringBuilder may be faster
                stream.println(QUOTE + strings[i] + QUOTE + " has " + length + " and " + predicate);
            }
        }
    }
}</lang>
- Output:
"123456789" has 9 and is the longest string
"1234567" has 7 and is neither the longest nor the shortest string
"abcdef" has 6 and is neither the longest nor the shortest string
"abcd" has 4 and is the shortest string
jq
Works with gojq, the Go implementation of jq
<lang jq>def s1: "longer";
def s2: "shorter😀";

[s1,s2]
| sort_by(length)
| reverse[]
| "\"\(.)\" has length (codepoints) \(length) and utf8 byte length \(utf8bytelength)."
</lang>
- Output:
"shorter😀" has length (codepoints) 8 and utf8 byte length 11.
"longer" has length (codepoints) 6 and utf8 byte length 6.
Julia
Per the Julia docs, a String in Julia is a sequence of characters encoded as UTF-8. Most string methods in Julia actually accept an AbstractString, which is the supertype of strings in Julia regardless of the encoding, including the default UTF-8.
The Char data type in Julia is a 32-bit, potentially Unicode data type, so that if we enumerate a String as a Char array, we get a series of 32-bit characters:
<lang julia>s = "niño"
println("Position Char Bytes\n==============================")
for (i, c) in enumerate(s)
    println("$i $c $(sizeof(c))")
end
</lang>
- Output:
Position Char Bytes
==============================
1 n 4
2 i 4
3 ñ 4
4 o 4
However, indexing into the string works on byte offsets, as if the string were an ordinary C string, that is, an array of unsigned 8-bit integers. If an index falls inside a character that occupies more than one byte, an error is thrown for bad indexing. This can be demonstrated by casting the above string to codeunits:
<lang julia>println("Position Codeunit Bytes\n==============================")
for (i, c) in enumerate(codeunits(s))
    println("$i $(string(c, base=16)) $(sizeof(c))")
end
</lang>
- Output:
Position Codeunit Bytes
==============================
1 6e 1
2 69 1
3 c3 1
4 b1 1
5 6f 1
Note that the length of "niño" as a String is 4 characters, and the length of "niño" as codeunits (i.e., 8-bit bytes) is 5. Indexing into the 4th position results in an error:
<lang julia>julia> s[4]
ERROR: StringIndexError: invalid index [4], valid nearby indices [3]=>'ñ', [5]=>'o'
</lang>
So, whether a string is longer or shorter depends on whether characters or code units are counted, as below:
<lang julia>length("ñññ") < length("nnnn")  # true, and the usual meaning of length of a String

length(codeunits("ñññ")) > length(codeunits("nnnn"))  # true as well
</lang>
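For readers more familiar with Python, the same distinction between characters and UTF-8 code units can be sketched as follows. This is only an illustration of the idea, not a description of how Julia works internally; the helper name utf8_length is invented for this example.

```python
def utf8_length(s: str) -> int:
    """Number of bytes (UTF-8 code units) needed to encode s.

    Hypothetical helper, introduced only for this illustration.
    """
    return len(s.encode("utf-8"))


s = "ni\u00f1o"  # "niño", with ñ written as the single code point U+00F1

# len() on a str counts code points; the encoded form counts bytes.
print(len(s), utf8_length(s))

# The comparison flips depending on which unit is counted.
print(len("\u00f1" * 3) < len("n" * 4))
print(utf8_length("\u00f1" * 3) > utf8_length("n" * 4))
```

Both comparisons hold at the same time, which is why a task like this one needs to state the unit of length it has in mind.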
Nim
In Nim, a character (char) is represented on a byte. A string is a sequence of characters with a length. For interoperability reasons, an extra null is added at the end of the characters.
A string is supposed to be encoded in UTF-8, but this is not enforced. The function len returns the length of the string, i.e. its number of characters in the sense of char, which is its number of bytes (without the extra null).
If we want to manage a string as a Unicode sequence of code points, we have to use the module unicode. We can convert a string into a sequence of runes, each rune being a Unicode UTF-32 value. The length of this sequence is the number of code points.
<lang Nim>import strformat, unicode

const
  S1 = "marche"
  S2 = "marché"

echo &"“{S2}”, byte length = {S2.len}, code points: {S2.toRunes.len}"
echo &"“{S1}”, byte length = {S1.len}, code points: {S1.toRunes.len}"</lang>
- Output:
“marché”, byte length = 7, code points: 6
“marche”, byte length = 6, code points: 6
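The remark that UTF-8 is expected but not enforced can be illustrated with a small Python sketch. It assumes raw byte strings as input and uses a made-up helper, code_point_length, which is not part of any library: the byte length is always available, while the code point count only exists when the bytes really are valid UTF-8.

```python
def code_point_length(raw: bytes):
    """Return the number of code points in raw, or None if raw is not valid UTF-8.

    Hypothetical helper, introduced only for this illustration.
    """
    try:
        return len(raw.decode("utf-8"))
    except UnicodeDecodeError:
        return None


valid = "march\u00e9".encode("utf-8")     # é stored as two bytes
latin1 = "march\u00e9".encode("latin-1")  # é stored as the single byte 0xE9

for raw in (valid, latin1):
    print(raw, "bytes:", len(raw), "code points:", code_point_length(raw))
```

The Latin-1 bytes have a perfectly good byte length, but they cannot be counted in code points until their encoding is known.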
Pascal
<lang pascal>program compareLengthOfStrings(output);

const
	specimenA = 'RosettaCode';
	specimenB = 'Pascal';
	specimenC = 'Foobar';
	specimenD = 'Pascalish';

type
	specimen = (A, B, C, D);
	specimens = set of specimen value [];

const
	specimenMinimum = A;
	specimenMaximum = D;

var
	{ the explicit range min..max serves as a safeguard to update max const }
	list: array[specimenMinimum..specimenMaximum] of string(24)
		value [A: specimenA; B: specimenB; C: specimenC; D: specimenD];
	lengthRelationship: array[specimen] of specimens;

procedure analyzeLengths;
var
	left, right: specimen;
begin
	for left := specimenMinimum to specimenMaximum do
	begin
		for right := specimenMinimum to specimenMaximum do
		begin
			if length(list[left]) < length(list[right]) then
			begin
				{ record that left is shorter than right }
				lengthRelationship[right] := lengthRelationship[right] + [left]
			end
		end
	end
end;

procedure printSortedByLengths;
var
	i: ord(specimenMinimum)..ord(specimenMaximum);
	s: specimen;
begin
	{ first the string longer than all other strings }
	{ lastly print the string not longer than any other string }
	for i := ord(specimenMaximum) downto ord(specimenMinimum) do
	begin
		{ for demonstration purposes: iterate over a set }
		for s in [specimenMinimum..specimenMaximum] do
		begin
			{ card returns the cardinality ("population count") }
			if card(lengthRelationship[s]) = i then
			begin
				writeLn(length(list[s]):8, ' ', list[s])
			end
		end
	end
end;

begin
	analyzeLengths;
	printSortedByLengths
end.</lang>
- Output:
      11 RosettaCode
       9 Pascalish
       6 Pascal
       6 Foobar
Phix
Lengths are in bytes; for codepoints use length(utf8_to_utf32()) or similar.
<lang phix>with javascript_semantics
sequence list = {"abcd","123456789","abcdef","1234567"},
         lens = apply(list,length),
         tags = reverse(custom_sort(lens,tagset(length(lens))))
papply(true,printf,{1,{"%s (length %d)\n"},columnize({extract(list,tags),extract(lens,tags)})})</lang>
- Output:
123456789 (length 9)
1234567 (length 7)
abcdef (length 6)
abcd (length 4)
Python
<lang Python>def naive_compare_and_report_length(str1, str2):
    if len(str1) > len(str2):
        print('"' + str1 + '"', 'has length', len(str1), 'and is longer')
        print('"' + str2 + '"', 'has length', len(str2), 'and is shorter')
    elif len(str1) < len(str2):
        print('"' + str1 + '"', 'has length', len(str1), 'and is shorter')
        print('"' + str2 + '"', 'has length', len(str2), 'and is longer')
    else:
        print('"' + str1 + '"', 'has length', len(str1), 'and is equal')
        print('"' + str2 + '"', 'has length', len(str2), 'and is equal')


def compare_and_report_length(*objects, sorted_=True, reverse=True):
    """
    For objects given as parameters it prints which of them are the longest.
    Note that it is possible that each of these objects has the same length.
    """
    lengths = list(map(len, objects))
    max_length = max(lengths)
    lengths_and_objects = zip(lengths, objects)

    # Test the sorted_ flag, not the builtin sorted(), which is always truthy.
    if sorted_:
        lengths_and_objects = sorted(lengths_and_objects, reverse=reverse)

    for length, obj in lengths_and_objects:
        predicate = 'is' if length == max_length else 'is not'
        print(f'"{obj}" has length {length} and {predicate} the longest string')


A = 'I am string'
B = 'I am string too'
LIST = ["abcd","123456789","abcdef","1234567"]

print()
print('Naive')
print()

naive_compare_and_report_length(A, B)
print()

print()
print('Sophisticated')
print()

print('sort two string')
print()
compare_and_report_length(A, B)
print()

print('sort a list of strings')
print()
compare_and_report_length(*LIST)
print()</lang>
- Output:

Naive

"I am string" has length 11 and is shorter
"I am string too" has length 15 and is longer


Sophisticated

sort two string

"I am string too" has length 15 and is the longest string
"I am string" has length 11 and is not the longest string

sort a list of strings

"123456789" has length 9 and is the longest string
"1234567" has length 7 and is not the longest string
"abcdef" has length 6 and is not the longest string
"abcd" has length 4 and is not the longest string
Raku
So... In what way does this task differ significantly from String length? Other than being horribly underspecified?
In the modern world, string "length" is pretty much a useless measurement, especially in the absence of a specified encoding; hence Raku does not even have a "length" operator for strings.
<lang perl6>say 'Strings (👨‍👩‍👧‍👦, 🤔🇺🇸, BOGUS!) sorted: "longest" first:';
say "$_: characters:{.chars}, Unicode code points:{.codes}, UTF-8 bytes:{.encode('UTF8').bytes}, UTF-16 bytes:{.encode('UTF16').bytes}" for <👨‍👩‍👧‍👦 BOGUS! 🤔🇺🇸>.sort: -*.chars;</lang>
- Output:
Strings (👨‍👩‍👧‍👦, 🤔🇺🇸, BOGUS!) sorted: "longest" first:
BOGUS!: characters:6, Unicode code points:6, UTF-8 bytes:6, UTF-16 bytes:12
🤔🇺🇸: characters:2, Unicode code points:3, UTF-8 bytes:12, UTF-16 bytes:12
👨‍👩‍👧‍👦: characters:1, Unicode code points:7, UTF-8 bytes:25, UTF-16 bytes:22
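To show how the ranking depends on the unit of measurement, here is a rough Python sketch that measures the same three strings in code points, UTF-8 bytes and UTF-16 bytes. Python's standard library does not count grapheme clusters, so the "characters" column above has no counterpart here; the function name lengths and the variable names are invented for this example, and the emoji are spelled out as escapes so the zero-width joiners are explicit.

```python
def lengths(s: str) -> dict:
    """Measure s in three different units (hypothetical helper for this sketch)."""
    return {
        "code points": len(s),
        "UTF-8 bytes": len(s.encode("utf-8")),
        # Little-endian UTF-16 without a byte order mark.
        "UTF-16 bytes": len(s.encode("utf-16-le")),
    }


# Family emoji: four people joined by three zero-width joiners (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
# Thinking face followed by the two regional indicators for "US".
thinking_flag = "\U0001F914\U0001F1FA\U0001F1F8"
samples = [family, thinking_flag, "BOGUS!"]

# Sorting by code points puts the family emoji first, unlike a sort by characters.
for s in sorted(samples, key=len, reverse=True):
    print(s, lengths(s))
```

Sorted by code points, the string that is a single user-perceived character comes out as the "longest", which is the point the Raku entry is making.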
Ring
Two strings
<lang ring>
see "working..." + nl

list = ["abcd","123456789"]
if len(list[1]) > len(list[2])
    first = list[1]
    second = list[2]
else
    first = list[2]
    second = list[1]
ok

see "Compare length of two strings:" + nl
see "" + first + " len = " + len(first) + nl + second + " len = " + len(second) + nl
see "done..." + nl
</lang>
- Output:
working...
Compare length of two strings:
123456789 len = 9
abcd len = 4
done...
More than two strings
<lang ring>
see "working..." + nl

lenList = []
list = ["abcd","123456789","abcdef","1234567"]
for n = 1 to len(list)
    len = len(list[n])
    add(lenList,[len,n])
next

lenList = sort(lenList,1)
lenList = reverse(lenList)

see "Compare length of strings in descending order:" + nl
for n = 1 to len(lenList)
    see "" + list[lenList[n][2]] + " len = " + lenList[n][1] + nl
next
see "done..." + nl
</lang>
- Output:
working...
Compare length of strings in descending order:
123456789 len = 9
1234567 len = 7
abcdef len = 6
abcd len = 4
done...
Wren
In Wren a string (i.e. an object of the String class) is an immutable sequence of bytes which is usually interpreted as UTF-8 but does not have to be.
With regard to string length, the String.count method returns the number of 'codepoints' in the string. If the string contains bytes which are invalid UTF-8, each such byte adds one to the count.
To find the number of bytes one can use String.bytes.count.
Unicode grapheme clusters, where what appears to be a single 'character' may in fact be an amalgam of several codepoints, are not directly supported by Wren. It is nevertheless possible to measure the length of a string in grapheme clusters (i.e. the number of user-perceived characters) using the Graphemes.clusterCount method of the Wren-upc module.
<lang ecmascript>import "./upc" for Graphemes

var printCounts = Fn.new { |s1, s2, c1, c2|
    var l1 = (c1 > c2) ? [s1, c1] : [s2, c2]
    var l2 = (c1 > c2) ? [s2, c2] : [s1, c1]
    System.print("%(l1[0]) : length %(l1[1])")
    System.print("%(l2[0]) : length %(l2[1])\n")
}

var codepointCounts = Fn.new { |s1, s2|
    var c1 = s1.count
    var c2 = s2.count
    System.print("Comparison by codepoints:")
    printCounts.call(s1, s2, c1, c2)
}

var byteCounts = Fn.new { |s1, s2|
    var c1 = s1.bytes.count
    var c2 = s2.bytes.count
    System.print("Comparison by bytes:")
    printCounts.call(s1, s2, c1, c2)
}

var graphemeCounts = Fn.new { |s1, s2|
    var c1 = Graphemes.clusterCount(s1)
    var c2 = Graphemes.clusterCount(s2)
    System.print("Comparison by grapheme clusters:")
    printCounts.call(s1, s2, c1, c2)
}

for (pair in [ ["nino", "niño"], ["👨‍👩‍👧‍👦", "🤔🇺🇸"] ]) {
    codepointCounts.call(pair[0], pair[1])
    byteCounts.call(pair[0], pair[1])
    graphemeCounts.call(pair[0], pair[1])
}

var list = ["abcd", "123456789", "abcdef", "1234567"]
System.write("Sorting in descending order by length in codepoints:\n%(list) -> ")
list.sort { |a, b| a.count > b.count }
System.print(list)</lang>
- Output:
Comparison by codepoints:
niño : length 4
nino : length 4

Comparison by bytes:
niño : length 5
nino : length 4

Comparison by grapheme clusters:
niño : length 4
nino : length 4

Comparison by codepoints:
👨‍👩‍👧‍👦 : length 7
🤔🇺🇸 : length 3

Comparison by bytes:
👨‍👩‍👧‍👦 : length 25
🤔🇺🇸 : length 12

Comparison by grapheme clusters:
🤔🇺🇸 : length 2
👨‍👩‍👧‍👦 : length 1

Sorting in descending order by length in codepoints:
[abcd, 123456789, abcdef, 1234567] -> [123456789, 1234567, abcdef, abcd]
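A related effect, where one perceived character may be spelled with more than one codepoint, can be sketched in Python using Unicode normalization. This is only an illustration and not a grapheme cluster counter (the Python standard library does not provide one); the variable names are chosen for this example.

```python
import unicodedata

# The same visible text "niño" in two spellings.
composed = unicodedata.normalize("NFC", "nin\u0303o")   # ñ as one code point (U+00F1)
decomposed = unicodedata.normalize("NFD", "ni\u00f1o")  # n followed by combining tilde (U+0303)

# The code point counts differ although both render as the same four characters.
print(len(composed), len(decomposed))

# The two spellings compare unequal until they are normalized to the same form.
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

A comparison by codepoints therefore depends on the normalization form of the input, whereas a count in grapheme clusters would see four in both spellings.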
Z80 Assembly
<lang z80>Terminator equ 0      ;null terminator
PrintChar  equ &BB5A  ;Amstrad CPC BIOS call, prints accumulator to screen as an ASCII character.

        org &8000

        ld hl,String1
        ld de,String2
        call CompareStringLengths

        jp nc, Print_HL_First
        ex de,hl
Print_HL_First:
        push bc
        push hl
        call PrintString
        pop hl
        push hl
        ld a,' '
        call PrintChar
        call GetStringLength
        ld a,b
        call ShowHex_NoLeadingZeroes
        call NewLine
        pop hl
        pop bc

        ex de,hl
        push bc
        push hl
        call PrintString
        pop hl
        push hl
        ld a,' '
        call PrintChar
        call GetStringLength
        ld a,b
        call ShowHex_NoLeadingZeroes
        call NewLine
        pop hl
        pop bc
ReturnToBasic:
        RET

String1:
        byte "Hello",Terminator
String2:
        byte "Goodbye",Terminator

;;;;;; RELEVANT SUBROUTINES - PRINTSTRING AND NEWLINE CREATED BY KEITH S. OF CHIBIAKUMAS
CompareStringLengths:
        ;HL = string 1
        ;DE = string 2
        ;CLOBBERS A,B,C
        push hl
        push de
        ex de,hl
        call GetStringLength
        ld c,b          ;GetStringLength returns its count in B, so save len(DE) in C

        ex de,hl
        call GetStringLength
        ld a,b
        cp c
        pop de
        pop hl
        ret
        ;returns carry set if HL < DE, zero set if equal, zero & carry clear if HL >= DE
        ;returns len(DE) in C, and len(HL) in B.

GetStringLength:
        ;HL = string, returns its length in B and leaves HL at the terminator
        ld b,0
loop_getStringLength:
        ld a,(hl)
        cp Terminator
        ret z
        inc hl
        inc b
        jr loop_getStringLength

NewLine:
        push af
        ld a,13         ;Carriage return
        call PrintChar
        ld a,10         ;Line Feed
        call PrintChar
        pop af
        ret

PrintString:
        ld a,(hl)
        cp Terminator
        ret z
        inc hl
        call PrintChar
        jr PrintString

ShowHex_NoLeadingZeroes:
;;;;;; useful for printing values where leading zeroes don't make sense,
;;;;;; such as money etc.
        push af
        and %11110000
        ifdef gbz80     ;game boy
                swap a
        else            ;zilog z80
                rrca
                rrca
                rrca
                rrca
        endif
        or a
        call nz,PrintHexChar    ;if top nibble of A is zero, don't print it.
        pop af
        and %00001111
        or a
        ret z                   ;if bottom nibble of A is zero, don't print it!
        jp PrintHexChar

PrintHexChar:
        or a            ;Clear Carry Flag
        daa
        add a,&F0
        adc a,&40       ;This sequence converts a 4-bit hex digit to its ASCII equivalent.
        jp PrintChar</lang>
- Output:
Goodbye 7
Hello 5