* <code>0 999 4000 4999 7000 7999</code> detects the half-width bug and some smaller errors (see Tcl). Output should have three heights; the half-width bug looks like: ▁▂▅▅▇█
: '''Addendum:''' ''the second test case assumes that each of the 8 heights should represent 1/8<sup>th</sup> of the range, as closely as possible. Not everyone agrees. See [[#Counterpoint|counterpoint]] and [[#Deeper_root_of_the_.27bug.27_.3F|Deeper root of the bug?]] below for discussion.''
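: To make that assumption concrete, here is a minimal Python sketch. The function names are hypothetical and are not taken from any solution on the task page. It assumes the half-width bug comes from rounding to the nearest of 8 levels, which makes the first and last bins half as wide as the others; under that assumption it reproduces the buggy output quoted above.

```python
BARS = "▁▂▃▄▅▆▇█"

def sparkline_eighths(values):
    """Each of the 8 heights covers 1/8 of the observed range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    # Truncate to a bin index; the maximum itself is clamped into the top bin.
    return "".join(BARS[min(7, int((v - lo) / span * 8))] for v in values)

def sparkline_rounded(values):
    """Assumed form of the bug: round to the nearest of 8 levels,
    so the two end bins are only half as wide as the inner ones."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    return "".join(BARS[round((v - lo) / span * 7)] for v in values)

sample = [0, 999, 4000, 4999, 7000, 7999]
print(sparkline_eighths(sample))  # ▁▁▅▅██  (three heights)
print(sparkline_rounded(sample))  # ▁▂▅▅▇█  (five heights)
```

: With equal eighths, 999 stays in the lowest bin and 7000 lands in the highest; with rounding, both slip into the neighbouring bins.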
;sparktest.pl
::Thanks Oopsiedaisy. I started the task off with an initial buggy Python solution. Now fixed and with examples extended to show your problem cases. Thanks again. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 19:35, 24 February 2019 (UTC)
====Counterpoint====

:: A very helpful intervention and discussion, and I agree absolutely about the first test example.

:: Perhaps our interpretation of the '''second''' test example depends on some unclarified assumptions about the optimal width (and alignment) of the bins?
:: The Haskell '''Statistics.Sample.Histogram''' library, for example, returns the following allocation of the sample <code>0 999 4000 4999 7000 7999</code> to 8 evenly sized bins:
:: <code>[1,1,0,0,2,0,1,1]</code>
:: which would, I think, correspond to 5 different sparkline heights, unless I am confusing myself.
:: The set of lower bounds suggested by '''Statistics.Sample.Histogram''' for a division of this sample between 8 bins is:
:: <code>[-571.3571428571429,571.3571428571429,1714.0714285714287,2856.7857142857147,3999.5,5142.214285714286,6284.928571428572,7427.642857142857]</code>
:: The assumption they are making is that any given sample is likely to be drawn from a slightly larger range of possible sample values, and that some margin can usefully be allowed.
:: The margin which that library adopts is <code>margin = (hi - lo) / (fromIntegral (intBins - 1) * 2)</code>
:: (yielding fractionally larger bins and a total range that starts a little below the minimum observed value, and ends a little above the maximum observed value)
:: Arguably reasonable for us to do something comparable? [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 12:26, 26 February 2019 (UTC)
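:: ''Illustrative sketch, not the library's code:'' the margin rule quoted above can be re-derived in a few lines of Python. The function name <code>padded_bins</code> is hypothetical, and the sketch assumes half-open bins of equal width with the top edge clamped into the last bin. Under those assumptions it gives the same lower bounds and counts as listed above.

```python
def padded_bins(values, bins=8):
    """Widen the observed range by the quoted margin, then bin evenly.

    Assumption: bins are half-open [lower, upper), and a value sitting
    exactly on the top edge is clamped into the last bin.
    """
    lo, hi = min(values), max(values)
    margin = (hi - lo) / ((bins - 1) * 2) if bins > 1 else 0.0
    start, end = lo - margin, hi + margin
    width = ((end - start) / bins) or 1.0  # guard against a zero-width range
    lower_bounds = [start + i * width for i in range(bins)]
    counts = [0] * bins
    for v in values:
        counts[min(bins - 1, int((v - start) / width))] += 1
    return lower_bounds, counts

bounds, counts = padded_bins([0, 999, 4000, 4999, 7000, 7999])
print(counts)  # [1, 1, 0, 0, 2, 0, 1, 1]
```

:: The margin here is 7999/14 ≈ 571.36, so each bin is about 1142.71 wide rather than 7999/8 ≈ 999.88, which is why 999 and 7000 no longer share a bin with the minimum and maximum.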
::: PS the dependence of edge cases on mutable assumptions (e.g. the relationship between the range of the sample and the range of possible/graphed values) may be underscored by the result given by the '''Mathematica 11 Histogram function''', which (if we specify only a target number of bins) allocates the same sample as follows (a different pattern again, but still, I think, 5 sparkline levels):
:::: <code>Histogram[{0, 999, 4000, 4999, 7000, 7999}, {"Raw", 8}] --> </code>
:::: [2, 0, 0, 1, 1, 0, 1, 1]

:::: And similarly the '''R language hist() function''' expression <code>hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: returns the distribution [2, 0, 0, 1, 1, 0, 1, 1], again using 5 (rather than 3) of the 8 available bins.
:::: The breaks which it derives from that data set can be listed:
:::: <code> > histinfo<-hist(c(0, 999, 4000, 4999, 7000, 7999), breaks=8)</code>
:::: <code> > histinfo</code>
:::: <code>$breaks</code>
:::: <code>[1] 0 1000 2000 3000 4000 5000 6000 7000 8000</code>
::::[[User:Hout|Hout]] ([[User talk:Hout|talk]]) 13:33, 26 February 2019 (UTC)
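:::: ''Illustrative sketch:'' with the breaks listed above, the allocation depends on which edge of each bin is closed. The Python below (hypothetical helper names) assumes R's documented defaults for <code>hist</code>, <code>right = TRUE</code> and <code>include.lowest = TRUE</code>, i.e. bins of the form (a, b] with the minimum admitted to the first bin, and compares them with left-closed bins [a, b).

```python
from bisect import bisect_left, bisect_right

def counts_right_closed(data, breaks):
    """Bins are (a, b]; the lowest break is included in the first bin."""
    counts = [0] * (len(breaks) - 1)
    for x in data:
        i = bisect_left(breaks, x) - 1
        counts[max(i, 0)] += 1
    return counts

def counts_left_closed(data, breaks):
    """Bins are [a, b); the highest break is included in the last bin."""
    counts = [0] * (len(breaks) - 1)
    for x in data:
        i = bisect_right(breaks, x) - 1
        counts[min(i, len(counts) - 1)] += 1
    return counts

sample = [0, 999, 4000, 4999, 7000, 7999]
breaks = list(range(0, 9000, 1000))
print(counts_right_closed(sample, breaks))  # [2, 0, 0, 1, 1, 0, 1, 1]
print(counts_left_closed(sample, breaks))   # [2, 0, 0, 0, 2, 0, 0, 2]
```

:::: On these breaks the five occupied bins come from 4000 and 7000 sitting exactly on a boundary: right-closed bins push each into the bin below, while left-closed bins give three occupied bins.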

::::: "fractionally larger bins" is the Tcl approach I discussed in the section above. It's fine but requires careful selection of the denominator. Too big, and the bins are wider than they need to be (Tcl's mistake); too small, and it can be erased by floating-point errors.

::::: edit: the relationship between the value of <code>breaks</code> and the number of bins in R is completely opaque and does not match the documentation. For example, <code>hist(0:9, breaks=x)</code> gives 2 bins for x=3; 5 bins for x=4,5,6; 9 bins for x=7.

::::: edit2: I should clarify that Haskell's solution exhibits the half-width bug. I don't believe this is defensible. Much better choices of denominator are available. --Oopsiedaisy, 26 February 2019

==Deeper root of the 'bug' ?==