Talk:Determine if a string is numeric

 
For those who don't know VB: How exactly is IsNumeric defined?
For example: Is leading/trailing whitespace allowed (i.e. " 123" or "123 ")?
Does it also accept floating point values (e.g. "2.5" or "1e5")?
What about thousands separators (e.g. "10,000")? Is that locale-dependent?
Are numbers in other bases (e.g. hexadecimal) allowed (assuming VB supports them otherwise)?
:: I hadn't realized that there was an ambiguity. I hadn't even realized that "isnumeric" is a VB function (I certainly don't know VB). In the two examples I contributed (IDL and TCL) I assumed that the task meant that something would be interpreted as a number if handed to the language in question. I.e. if I can multiply it by two or take the sin() of it then it is numeric. For example in IDL I might say "sin(double(x))" where "double(x)" converts the input into a "double" (8-byte float), which will fail if "x" is, for example, the string "foo". I trap the error and decide what is or isn't "numeric" based on the occurrence of this kind of error. This will allow "1.1" or "-.1e-04" or "+000003" etc.
:: Should we tag the task for clarification? [[User:Sgeier|Sgeier]] 10:34, 20 September 2007 (MDT)
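
The "hand it to the language and trap the error" approach described above can be sketched in Python. This is only an illustration in a different language, not the IDL or TCL entry; the function name <code>converts_to_float</code> is made up for this example, and what it accepts is whatever Python's built-in <code>float()</code> accepts.

```python
def converts_to_float(s):
    """Return True if Python's float() can convert s, False otherwise.

    Hypothetical helper: "numeric" here means "float() does not raise".
    """
    try:
        float(s)
    except (TypeError, ValueError):
        return False
    return True


if __name__ == "__main__":
    for sample in ["1.1", "-.1e-04", "+000003", " 123 ", "foo", "10,000", "0x1A"]:
        print(repr(sample), converts_to_float(sample))
```

Under this definition the answers to the questions at the top of the page depend entirely on the language: Python's <code>float()</code> tolerates leading and trailing whitespace and exponent notation, but rejects thousands separators and hexadecimal prefixes. It also accepts "nan" and "inf", which may or may not be wanted.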
 
:::I think we need clarification of what is classed as a numeric string. Are commas allowed in the string? Must they be in certain places? What about strings containing numbers represented using underscore annotation? Or numbers in a notation that represents a non-decimal base?
Which of the following example strings are classed as numeric for the purpose of this task?
 
* "20,376"
* "20367"
* "20 368"
* "203 69"
* "20_367"
* "203_76"
* "0x1234" - Hexadecimal
* "0xFFFF" - Hexadecimal
* "0xFFGF" - Is this invalid hexadecimal?
* "01677" - This could be an octal number
* "01678" - This could be an invalid octal number
* "0b10101000" - Could be a binary number
* "0b10102010" - This is probably an invalid binary number
* "10101000b" - This is a binary number in an alternative notation
* "10101020b" - This is an invalid binary number
* "1677o" - This is an octal number in an alternative notation
* "1678o" - This is an invalid octal number in an alternative notation
* "1234h" - Hexadecimal alternative notation
* "FFFFh" - Hexadecimal alternative notation
* "FFFGh" - This is not a valid hexadecimal number
* "+27" - The positive number 27
* "3+2" - This is an expression
 
--[[User:Markhobley|Markhobley]] 16:28, 4 June 2011 (UTC)
:Whatever would be a legal numeric literal accepted by the language compiler/interpreter - thus making it language specific? --[[User:Paddy3118|Paddy3118]] 20:10, 4 June 2011 (UTC)
::I'm not sure. In theory, the application program could support the various numeric formats, even though the underlying language may not. --[[User:Markhobley|Markhobley]] 21:38, 4 June 2011 (UTC)
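
To make the "legal numeric literal of the language" reading concrete, here is a rough Python sketch that runs the example strings from the list above through <code>int(s, 0)</code>, which parses a string using Python's own integer literal rules. The helper name <code>is_python_int_literal</code> is an assumption for this example, the check covers integers only, and other languages will of course classify the list differently.

```python
def is_python_int_literal(s):
    """True if s parses under Python's integer literal rules (base 0)."""
    try:
        int(s, 0)  # base 0: honour 0x / 0o / 0b prefixes and underscores
    except ValueError:
        return False
    return True


# The example strings from the list above.
EXAMPLES = [
    "20,376", "20367", "20 368", "203 69", "20_367", "203_76",
    "0x1234", "0xFFFF", "0xFFGF", "01677", "01678",
    "0b10101000", "0b10102010", "10101000b", "10101020b",
    "1677o", "1678o", "1234h", "FFFFh", "FFFGh", "+27", "3+2",
]

if __name__ == "__main__":
    for s in EXAMPLES:
        print(f"{s!r:14} {is_python_int_literal(s)}")
```

With Python 3.6 or later this accepts "20367", "20_367", "203_76", "0x1234", "0xFFFF", "0b10101000" and "+27", and rejects everything else in the list, including "01677" (Python 3 writes octal as <code>0o1677</code>) and the suffix notations.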
 
I think we can say that expressions and the invalid numbers are not numeric strings, but this then means we need sufficient logic in the code to be able to identify these as such.
 
In the past, I have assumed that the number system was decimal and accepted only digits, an optional leading hyphen, and a single decimal point. With that logic, "a numeric string is a string consisting only of digits, an optional leading hyphen and an optional single decimal point". Maybe this is the way to go. Note that in some locales, numeric strings fall outside of this definition, so this also needs to be considered. Under that definition strings containing whitespace return a result of "not numeric", but this does not matter in practice, because code that makes use of the result can easily trim whitespace from the string before feeding it to the evaluator (I have done this before and it has worked well for me). --[[User:Markhobley|Markhobley]] 21:38, 4 June 2011 (UTC)
 
: If a leading hyphen (minus sign) is OK, why not a leading plus sign as well (but not both, of course)?   "A leading hyphen" was mentioned, but I assume you meant a ''single'' leading hyphen (minus sign).   However, some languages allow multiple leading signs. -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 17:57, 28 September 2013 (UTC)
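
The restricted definition discussed above (decimal digits, an optional leading hyphen, at most one decimal point) can be written as a single regular expression. The following Python sketch is one possible reading of it: the names <code>is_plain_decimal</code> and <code>allow_plus</code> are invented here, requiring at least one digit is an added assumption, and the optional plus sign is included only to show how the question about a leading plus could be accommodated.

```python
import re

# Optional sign, then digits with at most one decimal point.
# At least one digit is required (an assumption; the definition is silent on ".").
_MINUS_ONLY = re.compile(r"-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")
_PLUS_OR_MINUS = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")


def is_plain_decimal(s, allow_plus=False):
    """True if s is digits with an optional leading sign and one optional point."""
    pattern = _PLUS_OR_MINUS if allow_plus else _MINUS_ONLY
    return pattern.fullmatch(s) is not None


if __name__ == "__main__":
    print(is_plain_decimal("-3.5"))                  # True
    print(is_plain_decimal(" 123"))                  # False: whitespace not allowed
    print(is_plain_decimal(" 123".strip()))          # True: caller trims first
    print(is_plain_decimal("+27"))                   # False
    print(is_plain_decimal("+27", allow_plus=True))  # True
```

As suggested above, whitespace handling is left to the caller, who can strip the string before testing it.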
 
== Objective-C question ==
 
Furthermore – similar to the VB discussion above – many programming languages allow floating point numbers to be in the form <code>1.23e15</code>, which is currently handled by very few, if any, examples. In a similar vein, there are hexadecimal, octal or binary numeric literals to consider: in C and languages that follow its conventions closely, <code>09</code> would ''not'' be a valid numeric literal. --[[User:Hypftier|Johannes Rössel]] 17:56, 6 July 2010 (UTC)
 
:I would stick to the numeric literals that you could write in your source and get accepted as a number. If your compiler or interpreter doesn't accept locale-aware things like extra dots or commas then I'd say you were fine, (but what do I know).
:I guess examples should note if there are types of numeric literals of their language that the routines ''don't'' accept, but I think that some examples were written to implement something seen in other examples rather than with an idea to cover all the numeric literal forms the language allows. --[[User:Paddy3118|Paddy3118]] 02:44, 7 July 2010 (UTC)
 
==Mathematica seems lean?==
It just seems to me that Mathematica should have ways to recognize many more number types than the current entry seems to suggest. Compare the E, Forth and Python entries. --[[User:Paddy3118|Paddy3118]] 14:54, 26 August 2010 (UTC)
 
== Second MATLAB solution might have a problem ==
 
I don't know if you want to call it "incorrect", but I think the second MATLAB solution would call "...." numeric. I don't see any way of determining that there is only one decimal point in the string. --[[User:Mwn3d|Mwn3d]] 15:59, 2 February 2012 (UTC)
 
== Problem with the VB.net example ==
 
The IsNumeric call will not always return a correct answer. Try "1234+": IsNumeric will return True. Then pass this same value as an argument to Convert.ToDouble: it will fail with an unhandled exception. There's no option but to loop through the String character by character and check that each one is within range. --[[User:LazyZee|LazyZee]] 17:59, 19 October 2012 (UTC)
 
== R ==
 
The solution will not work correctly if the string contains "NA" (not a number). But NA can be used in numeric calculations. --[[User:Sigbert|Sigbert]] ([[User talk:Sigbert|talk]]) 13:01, 16 February 2015 (UTC)
 
== AWK code is wrong ==
 
<pre>
awk 'BEGIN { x = "01"; print x == x+0 }'
0
awk 'BEGIN { x = "+1"; print x == x+0 }'
0
awk 'BEGIN { x = "1x"; print x == x+0 }'
0
</pre>
But
<pre>
awk 'BEGIN { x = "1x"; print x+1 }'
2
</pre>
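
For comparison, the same round-trip idea (a string counts as numeric only if converting it to a number and back reproduces the string) can be sketched in Python next to a plain "does it parse" test. This is an analogy, not a translation of the AWK entry: both function names are made up for this example, and unlike awk, Python's <code>int()</code> raises on "1x" instead of using the numeric prefix.

```python
def roundtrip_is_numeric(s):
    """Round-trip test: numeric only if str(int(s)) gives back s exactly."""
    try:
        return str(int(s)) == s
    except ValueError:
        return False


def parse_is_numeric(s):
    """Parse test: numeric if int(s) succeeds at all."""
    try:
        int(s)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for sample in ["1", "01", "+1", "1x"]:
        print(repr(sample), roundtrip_is_numeric(sample), parse_is_numeric(sample))
```

The round-trip test rejects "01" and "+1" just as the AWK one-liners do, while the parse test accepts them; both reject "1x".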
 
== Assembly implementation is harder than I thought==
 
The way I see it, the code will need to do the following:
 
Read the first character.
* If it's a null terminator, the program ends and the string is not numeric.
* If it's a minus sign, that's valid and so we can continue
* If it's a decimal point, that's valid and we can continue. However, the presence of a decimal point must be recorded. If the string contains a second decimal point it stops being numeric.
* If it's not a number, at this point the string is not numeric. Otherwise continue.
 
Read the second character.
* If it's the terminator, the string is numeric if the first character was a numeral. If the first character was a minus sign or decimal point, the string is not numeric.
* If the second character is a numeral, continue.
* If the second character is a decimal point, continue if we haven't seen one already. If there is more than one decimal point the string is not numeric.
* If the second character is anything else, the string is not numeric.
 
Loop through the rest of the string.
* If we encounter the terminator, the string is numeric.
* If we encounter a numeral, continue.
* If we encounter a second decimal point, the string is not numeric.
* If we encounter anything besides a numeral, the null terminator, or the first occurrence of a decimal point, the string is not numeric.
 
I've attempted this in both 8086 and z80 but it ends up looking like spaghetti no matter what I do.
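
The three cases above may collapse into one loop if two flags are carried along. Below is a sketch in Python rather than assembly, just to show the control flow; the name <code>scan_numeric</code> is hypothetical, and requiring at least one digit is a slight tightening of the rules as listed (it also rejects "-." and "."). In assembly the two flags could live in a spare register, and the end of the loop corresponds to reaching the null terminator.

```python
def scan_numeric(s):
    """Single pass: optional leading '-', at most one '.', at least one digit."""
    seen_digit = False
    seen_point = False
    for i, ch in enumerate(s):
        if ch == "-":
            if i != 0:          # a minus sign is only valid as the first character
                return False
        elif ch == ".":
            if seen_point:      # second decimal point: not numeric
                return False
            seen_point = True
        elif "0" <= ch <= "9":
            seen_digit = True
        else:                   # anything else: not numeric
            return False
    # End of string plays the role of the null terminator.
    return seen_digit


if __name__ == "__main__":
    for sample in ["", "-", ".", "-1.5", ".5", "1.2.3", "1-", "12a"]:
        print(repr(sample), scan_numeric(sample))
```

Because the sign check depends only on the position and the decimal-point check only on a flag, the first and second characters no longer need separate code paths.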
 
: @[[User: Puppydrum64]]: Uhm, is this task even applicable for assembly? I don’t think so. You know, depending on the assembler used there are ''several'' ways to write numeric literals. In NASM, for instance, I can write <tt>0b0110_1101</tt>. There simply isn’t ''a/the'' programming language “8086 assembly” ''defining'' how to write numeric literals. What you are attempting to do, though, is rather writing an assembly function that accepts a null-terminated string that represents a valid numeric literal ''in C'' (and many ''other'' programming languages), but not “8086 assembly”, you know what I mean? [[User:Root|Root]] ([[User talk:Root|talk]]) 18:10, 20 August 2021 (UTC)
 
I get what you mean. I figured at least attempting something that represents a valid numeric literal in higher-level languages was as close as I could get. Throwing in the towel and just saying "This is assembly, there are no rules" would disqualify it from almost everything. --[[User:Puppydrum64|Puppydrum64]] ([[User talk:Puppydrum64|talk]]) 17:45, 15 September 2021 (UTC)