Names to numbers: Difference between revisions
(→Tcl: Added implementation) |
|||
Line 219: | Line 219: | ||
small = 0 |
small = 0 |
||
elif word in HUGE: |
elif word in HUGE: |
||
total += small * |
total += small * 1000 ** HUGE.index(word) |
||
small = 0 |
small = 0 |
||
else: |
else: |
Revision as of 23:52, 17 May 2013
Translate the spelled-out English name of a number to a number. You can use a preexisting implementation or roll your own, but you should support inputs up to at least one million (or the maximum value of your language's default bounded integer type, if that's less).
Support for inputs other than positive integers (like zero, negative integers, fractions and floating-point numbers) is optional.
- See also
- Number names — the reverse operation.
Perl
The following code reads a file line-by-line. It echos comment lines starting with a hashmark and blank lines. Remaining lines are output followed by an arrow "=>" and any non-negative integer number names translated into numbers, e.g., a line with "ninety-nine" is output like this: "ninety-nine => 99". <lang perl>use strict; use List::Util qw(sum);
our %nums = (
zero => 0, one => 1, two => 2, three => 3, four => 4, five => 5, six => 6, seven => 7, eight => 8, nine => 9, ten => 10, eleven => 11, twelve => 12, thirteen => 13, fourteen => 14, fifteen => 15, sixteen => 16, seventeen => 17, eighteen => 18, nineteen => 19, twenty => 20, thirty => 30, forty => 40, fifty => 50, sixty => 60, seventy => 70, eighty => 80, ninety => 90, hundred => 100, thousand => 1_000, million => 1_000_000, billion => 1_000_000_000, trillion => 1_000_000_000_000, # My ActiveState Win32 Perl uses e-notation after 999_999_999_999_999 quadrillion => 1e+015, quintillion => 1e+018);
- Groupings for thousands, millions, ..., quintillions
our $groups = qr/\d{4}|\d{7}|\d{10}|\d{13}|1e\+015|1e\+018/;
- Numeral or e-notation
our $num = qr/\d+|\d+e\+\d+/;
while (<>) {
# skip blank lines if(/^\s*$/) { print; next; } # echo comment lines if( /^\s*#.*$/ ) { print; next; }
chomp; my $orig = $_; s/-/ /g; # convert hyphens to spaces s/\s\s+/ /g; # remove duplicate whitespace, convert ws to space s/ $//g; # remove trailing blank s/^ //g; # remove leading blank $_ = lc($_); # convert to lower case # tokenize sentence boundaries s/([\.\?\!]) / $1\n/g; s/([\.\?\!])$/ $1\n/g; # tokenize other punctuation and symbols s/\$(.)/\$ $1/g; # prefix s/(.)([\;\:\%',])/$1 $2/g; # suffix
foreach my $key (keys %nums) { s/\b$key\b/$nums{$key}/eg; }
s/(\d) , (\d)/$1 $2/g; s/(\d) and (\d)/$1 $2/g;
s/\b(\d) 100 (\d\d) (\d) (${groups})\b/($1 * 100 + $2 + $3) * $4/eg;
s/\b(\d) 100 (\d\d) (${groups})\b/($1 * 100 + $2) * $3/eg; s/\b(\d) 100 (\d) (${groups})\b/($1 * 100 + $2) * $3/eg; s/\b(\d) 100 (${groups})\b/$1 * $2 * 100/eg;
s/\b100 (\d\d) (\d) (${groups})\b/(100 + $1 + $2) * $3/eg; s/\b100 (\d\d) (${groups})\b/(100 + $1) * $2/eg; s/\b100 (\d) (${groups})\b/(100 + $1) * $2/eg; s/\b100 (${groups})\b/$1 * 100/eg;
s/\b(\d\d) (\d) (${groups})\b/($1 + $2) * $3/eg; s/\b(\d{1,2}) (${groups})\b/$1 * $2/eg;
s/\b(\d\d) (\d) 100\b/($1 + $2) * 100/eg; s/\b(\d{1,2}) 100\b/$1 * 100/eg;
# Date anomolies: nineteen eighty-four and twenty thirteen
s/\b(\d{2}) (\d{2})\b/$1 * 100 + $2/eg;
s/((?:${num} )*${num})/sum(split(" ",$1))/eg;
print $orig, " => ", $_, "\n"; }</lang> Here is a sample input file:
# For numbers between 21 and 99 inclusive, we're supposed to use a hyphen, # but the bank still cashes our check without the hyphen: Seventy-two dollars Seventy two dollars # For numbers bigger than 100, we're not supposed to use "and," # except we still use "and" anyway, e.g., One Hundred and One Dalmatians A Hundred and One Dalmatians One Hundred One Dalmatians Hundred and One Dalmatians One Thousand and One Nights Two Thousand and One: A Space Odyssey # Date anomolies Twenty Thirteen Nineteen Eighty-Four # Maximum value an "unsigned long int" can hold on a 32-bit machine = 2^32 - 1 # define ULONG_MAX 4294967295 four billion, two hundred ninety-four million, nine hundred sixty-seven thousand, two hundred ninety five # Max positive integer on 32-bit Perl is 9_007_199_254_740_992 = 2^53 # Use Math::BigInt if you need more. # Note Perl usually stringifies to 15 digits of precision and this has 16 Nine quadrillion, seven trillion, one hundred ninety-nine billion, two hundred fifty-four million, seven hundred forty thousand, nine hundred ninety two Nine Hundred Ninety-Nine One Thousand One Hundred Eleven Eleven Hundred Eleven Eight Thousand Eight Hundred Eighty-Eight Eighty-Eight Hundred Eighty-Eight Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine ninety-nine three hundred three hundred and ten one thousand, five hundred and one twelve thousand, six hundred and nine five hundred and twelve thousand, six hundred and nine forty-three million, one hundred and twelve thousand, six hundred and nine two billion, one hundred zero eight one hundred one hundred twenty three one thousand one ninety nine thousand nine hundred ninety nine one hundred thousand nine billion one hundred twenty three million four hundred fifty six thousand seven hundred eighty nine one hundred eleven billion one hundred eleven
And here is the resulting output file:
# For numbers between 21 and 99 inclusive, we're supposed to use a hyphen, # but the bank still cashes our check without the hyphen: Seventy-two dollars => 72 dollars Seventy two dollars => 72 dollars # For numbers bigger than 100, we're not supposed to use "and," # except we still use "and" anyway, e.g., One Hundred and One Dalmatians => 101 dalmatians A Hundred and One Dalmatians => a 101 dalmatians One Hundred One Dalmatians => 101 dalmatians Hundred and One Dalmatians => 101 dalmatians One Thousand and One Nights => 1001 nights Two Thousand and One: A Space Odyssey => 2001 : a space odyssey # Date anomolies Twenty Thirteen => 2013 Nineteen Eighty-Four => 1984 # Maximum value an "unsigned long int" can hold on a 32-bit machine = 2^32 - 1 # define ULONG_MAX 4294967295 four billion, two hundred ninety-four million, nine hundred sixty-seven thousand, two hundred ninety five => 4294967295 # Max positive integer on 32-bit Perl is 9_007_199_254_740_992 = 2^53 # Use Math::BigInt if you need more. # Note Perl usually stringifies to 15 digits of precision and this has 16 Nine quadrillion, seven trillion, one hundred ninety-nine billion, two hundred fifty-four million, seven hundred forty thousand, nine hundred ninety two => 9.00719925474099e+015 Nine Hundred Ninety-Nine => 999 One Thousand One Hundred Eleven => 1111 Eleven Hundred Eleven => 1111 Eight Thousand Eight Hundred Eighty-Eight => 8888 Eighty-Eight Hundred Eighty-Eight => 8888 Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven => 7777777 Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine => 99999999999999 ninety-nine => 99 three hundred => 300 three hundred and ten => 310 one thousand, five hundred and one => 1501 twelve thousand, six hundred and nine => 12609 five hundred and twelve thousand, six hundred and nine => 512609 forty-three million, one hundred and twelve thousand, six hundred and nine => 43112609 two billion, one hundred => 2000000100 zero => 0 eight => 8 one hundred => 100 one hundred twenty three => 123 one thousand one => 1001 ninety nine thousand nine hundred ninety nine => 99999 one hundred thousand => 100000 nine billion one hundred twenty three million four hundred fifty six thousand seven hundred eighty nine => 9123456789 one hundred eleven billion one hundred eleven => 111000000111
Python
This example assumes that the module from Number_names#Python is stored as spell_integer.py.
The example understands the textual format generated from number-to-names module.
Note: This example and Number_names#Python need to be kept in sync <lang python>from spell_integer import spell_integer
def int_from_words(num):
words = num.replace(',',).replace(' and ', ' ').replace('-', ' ').split() if words[0] == 'minus': negmult = -1 words.pop(0) else: negmult = 1 small, total = 0, 0 for word in words: if word in SMALL: small += SMALL.index(word) elif word in TENS: small += TENS.index(word) * 10 elif word == 'hundred': small *= 100 elif word == 'thousand': total += small * 1000 small = 0 elif word in HUGE: total += small * 1000 ** HUGE.index(word) small = 0 else: raise ValueError("Don't understand %r part of %r" % (word, num)) return negmult * (total + small)
if __name__ == '__main__':
# examples for n in range(-10000, 10000, 17): assert n == int_from_words(spell_integer(n))
for n in range(20): assert 13**n == int_from_words(spell_integer(13**n)) print('\n##\n## These tests show <==> for a successful round trip, otherwise <??>\n##\n') for n in (0, -3, 5, -7, 11, -13, 17, -19, 23, -29): txt = spell_integer(n) num = int_from_words(txt) print('%+4i <%s> %s' % (n, '==' if n == num else '??', txt)) print() n = 201021002001 while n: txt = spell_integer(n) num = int_from_words(txt) print('%12i <%s> %s' % (n, '==' if n == num else '??', txt)) n //= -10 txt = spell_integer(n) num = int_from_words(txt) print('%12i <%s> %s' % (n, '==' if n == num else '??', txt)) print()</lang>
- Output:
## ## These tests show <==> for a successful round trip, otherwise <??> ## +0 <==> zero -3 <==> minus three +5 <==> five -7 <==> minus seven +11 <==> eleven -13 <==> minus thirteen +17 <==> seventeen -19 <==> minus nineteen +23 <==> twenty-three -29 <==> minus twenty-nine 201021002001 <==> two hundred and one billion, twenty-one million, two thousand, and one -20102100201 <==> minus twenty billion, one hundred and two million, one hundred thousand, two hundred and one 2010210020 <==> two billion, ten million, two hundred and ten thousand, and twenty -201021002 <==> minus two hundred and one million, twenty-one thousand, and two 20102100 <==> twenty million, one hundred and two thousand, and one hundred -2010210 <==> minus two million, ten thousand, two hundred and ten 201021 <==> two hundred and one thousand, and twenty-one -20103 <==> minus twenty thousand, one hundred and three 2010 <==> two thousand, and ten -201 <==> minus two hundred and one 20 <==> twenty -2 <==> minus two 0 <==> zero
Tcl
<lang tcl>package require Tcl 8.6 proc name2num name {
set words [regexp -all -inline {[a-z]+} [string tolower $name]] set tokens {
"zero" 0 "one" 1 "two" 2 "three" 3 "four" 4 "five" 5 "six" 6 "seven" 7 "eight" 8 "nine" 9 "ten" 10 "eleven" 11 "twelve" 12 "thirteen" 13 "fourteen" 14 "fifteen" 15 "sixteen" 16 "seventeen" 17 "eighteen" 18 "nineteen" 19 "twenty" 20 "thirty" 30 "forty" 40 "fifty" 50 "sixty" 60 "seventy" 70 "eighty" 80 "ninety" 90 "hundred" 100 "thousand" 1000 "million" 1000000 "billion" 1000000000 "trillion" 1000000000000 "quadrillion" 1000000000000000 "qintillion" 1000000000000000000
} set values {} set groups {} set previous -inf set sign 1 foreach word $words {
if {[dict exists $tokens $word]} { set value [dict get $tokens $word] if {$value < $previous} { # Check if we have to propagate backwards the "large" terms if {[set mult [lindex $values end]] > 99} { for {set i [llength $groups]} {[incr i -1] >= 0} {} { if {[lindex $groups $i end] >= $mult} { break } lset groups $i end+1 $mult } } lappend groups $values set values {} } elseif {$value < 100 && $previous < 100 && $previous >= 0} { # Special case: dates lappend groups [lappend values 100] set values {} } lappend values $value set previous $value } elseif {$word eq "minus"} { set sign -1 }
} lappend groups $values set groups [lmap prodgroup $groups {tcl::mathop::* {*}$prodgroup}] # Special case: dates if {[llength $groups] == 2} {
if {[lmap g $groups {expr {$g < 100 && $g >= 10}}] eq {1 1}} { lset groups 0 [expr {[lindex $groups 0] * 100}] }
} return [expr {$sign * [tcl::mathop::+ {*}$groups]}]
}</lang> Demonstrating/testing (based on Perl code's samples): <lang tcl>set samples {
"Seventy-two dollars" "Seventy two dollars" "One Hundred and One Dalmatians" "A Hundred and One Dalmatians" "One Hundred One Dalmatians" "Hundred and One Dalmatians" "One Thousand and One Nights" "Two Thousand and One: A Space Odyssey" "Twenty Thirteen" "Nineteen Eighty-Four" "four billion, two hundred ninety-four million, nine hundred sixty-seven thousand, two hundred ninety five" "Nine quadrillion, seven trillion, one hundred ninety-nine billion, two hundred fifty-four million, seven hundred forty thousand, nine hundred ninety two" "Nine Hundred Ninety-Nine" "One Thousand One Hundred Eleven" "Eleven Hundred Eleven" "Eight Thousand Eight Hundred Eighty-Eight" "Eighty-Eight Hundred Eighty-Eight" "Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven" "Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine" "ninety-nine" "three hundred" "three hundred and ten" "one thousand, five hundred and one" "twelve thousand, six hundred and nine" "five hundred and twelve thousand, six hundred and nine" "forty-three million, one hundred and twelve thousand, six hundred and nine" "two billion, one hundred" "zero" "eight" "one hundred" "one hundred twenty three" "one thousand one" "ninety nine thousand nine hundred ninety nine" "one hundred thousand" "nine billion one hundred twenty three million four hundred fifty six thousand seven hundred eighty nine" "one hundred eleven billion one hundred eleven" "minus fifty six"
} foreach s $samples {
puts "$s => [name2num $s]"
}</lang>
- Output:
Seventy-two dollars => 72 Seventy two dollars => 72 One Hundred and One Dalmatians => 101 A Hundred and One Dalmatians => 101 One Hundred One Dalmatians => 101 Hundred and One Dalmatians => 101 One Thousand and One Nights => 1001 Two Thousand and One: A Space Odyssey => 2001 Twenty Thirteen => 2013 Nineteen Eighty-Four => 1984 four billion, two hundred ninety-four million, nine hundred sixty-seven thousand, two hundred ninety five => 4294967295 Nine quadrillion, seven trillion, one hundred ninety-nine billion, two hundred fifty-four million, seven hundred forty thousand, nine hundred ninety two => 9007199254740992 Nine Hundred Ninety-Nine => 999 One Thousand One Hundred Eleven => 1111 Eleven Hundred Eleven => 1111 Eight Thousand Eight Hundred Eighty-Eight => 8888 Eighty-Eight Hundred Eighty-Eight => 8888 Seven Million Seven Hundred Seventy-Seven Thousand Seven Hundred Seventy-Seven => 7777777 Ninety-Nine Trillion Nine Hundred Ninety-Nine Billion Nine Hundred Ninety-Nine Million Nine Hundred Ninety-Nine Thousand Nine Hundred Ninety-Nine => 99999999999999 ninety-nine => 99 three hundred => 300 three hundred and ten => 310 one thousand, five hundred and one => 1501 twelve thousand, six hundred and nine => 12609 five hundred and twelve thousand, six hundred and nine => 512609 forty-three million, one hundred and twelve thousand, six hundred and nine => 43112609 two billion, one hundred => 2000000100 zero => 0 eight => 8 one hundred => 100 one hundred twenty three => 123 one thousand one => 1001 ninety nine thousand nine hundred ninety nine => 99999 one hundred thousand => 100000 nine billion one hundred twenty three million four hundred fifty six thousand seven hundred eighty nine => 9123456789 one hundred eleven billion one hundred eleven => 111000000111 minus fifty six => -56