Talk:Fivenum: Difference between revisions
[[User:Eoraptor|Eoraptor]] ([[User talk:Eoraptor|talk]]) 20:35, 25 February 2018 (UTC)
:"This has nothing to do with "big data", or with producing a "smaller array". Requiring that the five numbers yield the same boxplot if they are treated as data is pretty useless: the task emphasizes space reduction, but it will not save space."
:
:Everything in this quote is objectively false. I had a task in my job to make boxplots with huge datasets (> 120 GB of data) but all I needed were these five data points. It made no sense whatsoever to save every data point. I was doing this in Perl, since I don't like using R unless I have to. That was the purpose of the page. This task was very useful for me, but maybe not for you. I wouldn't have been able to make these plots without this task, specifically the Perl translation. "Pretty useless"? On the contrary, this was essential, and I couldn't have performed the task without it. In the spirit of generosity, I decided to make my work in translating R's fivenum function available to others in case they had the same problem I did.--[[User:Hailholyghost|Hailholyghost]] ([[User talk:Hailholyghost|talk]]) 14:27, 26 February 2018 (UTC)
::You missed the point, again. That YOU had to do this in your job does not mean that computing 5 numbers is related to big data. Maybe you also needed a mean; that does not imply computing a mean is related to big data: you can take a mean of 10 values. Your case, your job, your own particular situation do not make a general task. The general task would say: compute these numbers, period. You can do it with small data, with big data, with gigantic data, no one cares. Besides, the task you asked for is not exactly the same: you asked for data that would produce the same numbers. And THAT is useless, as it's obviously enough to store the numbers. But maybe the sentence was not clear enough? Oh, and you did not describe clearly how these five numbers are to be computed: as I already said, there is no universal convention on boxplots. All in all, you didn't address any of my questions. So much for your generosity. [[User:Eoraptor|Eoraptor]] ([[User talk:Eoraptor|talk]]) 20:00, 26 February 2018 (UTC)
== License ==
The R function is part of the R source, hence it is under the GPL license. Any translation of it is a derivative work. On such a simple function, I doubt it would be a problem, but please be careful next time: copy-pasting is '''not''' fine. [[User:Eoraptor|Eoraptor]] ([[User talk:Eoraptor|talk]]) 23:29, 25 February 2018 (UTC)
== Large vs not large ==
I would be happy with both possibilities, but these are entirely different tasks, and if we have to manage large data, please state how large, and adapt the current solutions accordingly. All current solutions imply the dataset lies entirely in memory. For "usual" machines, that means the dataset is actually rather small.
Hailholyghost gave his example above, here is another one: most of my work is done on a business PC with 8GB RAM and SAS/Stata/R/Python (and I suspect most professional statisticians work on a daily basis on that kind of machine, with that kind of software). Some of my work is done on a SAS VA server with
While both tasks described above are acceptable, I personally would be
[[User:Eoraptor|Eoraptor]] ([[User talk:Eoraptor|talk]]) 20:12, 27 February 2018 (UTC)
==Phix test result glitch ?==
I noticed, in passing, that the first of the three test results in the Phix example shows the value 43 where other code (and a quick test just now with the built-in R function) returns 42.5.
Perhaps some kind of edge case that might be worth checking? Or just a variant interpretation? [[User:Hout|Hout]] ([[User talk:Hout|talk]]) 08:53, 13 February 2019 (UTC)
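:For anyone wanting to check a value like the 42.5 above by hand, here is a minimal Python sketch of the hinge rule that R's <code>fivenum</code> is documented to follow (sort, then average the values at the floor and ceiling of five fractional positions). The function name and the 11-value <code>sample</code> are illustrative assumptions, picked because the upper hinge of that sample lands between 42 and 43; this is not a diagnosis of the Phix entry.

```python
import math

def fivenum(values):
    """Five-number summary using Tukey hinges (the rule R's fivenum follows)."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("fivenum needs at least one value")
    # Depth of the hinges, counted from either end of the sorted data.
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based positions of minimum, lower hinge, median, upper hinge, maximum.
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # A fractional position averages its two neighbouring order statistics.
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in positions]

# Hypothetical sample, used only for illustration.
sample = [15, 6, 42, 41, 7, 36, 49, 40, 39, 47, 43]
print(fivenum(sample))  # [6.0, 25.5, 40.0, 42.5, 49.0]
```

:Here the upper hinge sits at position 8.5 of the sorted data, so it is the mean of 42 and 43. A method that must return a member of the data cannot produce 42.5.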
==R vs Wikipedia==
[[wp:Fivenum]] uses quartiles defined as members of the set of input values. The R definition differs. The task could do with a definitive definition, or an explanation of the variants, because, as others have stated, "do what R does" highlights issues. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 13:56, 13 February 2019 (UTC)
:[[wp:Percentile#Definitions]] shows common methods of computing percentiles. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 08:04, 15 February 2019 (UTC)
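:To make the difference between the variants concrete, the following Python sketch compares two rules on the same data: Tukey hinges as R's <code>fivenum</code> computes them, and the nearest-rank percentile method, which always returns a member of the input set. Both function names and the sample values are assumptions made for illustration, and nearest-rank is only one of several member-of-set conventions; it is not necessarily the one any particular article or task solution uses.

```python
import math

def hinges_r_style(values):
    """Lower and upper Tukey hinges; may fall between two data values."""
    x = sorted(values)  # input must be non-empty
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2          # 1-based depth of the hinges
    def at(pos):
        # Average the order statistics on either side of a fractional position.
        return 0.5 * (x[math.floor(pos) - 1] + x[math.ceil(pos) - 1])
    return at(n4), at(n + 1 - n4)

def quartiles_nearest_rank(values):
    """Lower and upper quartiles by nearest rank; always members of the data."""
    x = sorted(values)  # input must be non-empty
    n = len(x)
    def pick(p):
        rank = max(1, math.ceil(p * n))       # 1-based ordinal rank
        return x[rank - 1]
    return pick(0.25), pick(0.75)

# Hypothetical sample, used only for illustration.
sample = [15, 6, 42, 41, 7, 36, 49, 40, 39, 47, 43]
print(hinges_r_style(sample))          # (25.5, 42.5)
print(quartiles_nearest_rank(sample))  # (15, 43)
```

:On this sample the two rules agree on neither quartile, which is why a task statement needs to name the rule rather than just say "quartiles".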