Talk:Welch's t-test

Revision as of 13:53, 23 June 2015 by rosettacode>Paddy3118 (Section.)

Needs better task description

I haven't looked at the C code yet, but I'm assuming it's using a t-test? The description should provide more context and explanations of concepts, and preferably links to algorithms. --Ledrug (talk) 20:16, 26 May 2015 (UTC)

Yes, this uses Welch's 2-sided t-test, as I commented inside the code.

Hi, you need to take all those nuggets out of your code comments and put them into an improved task description. The task description needs to stand on its own as a clear and concise description of what needs to be accomplished.
(P.S. Please sign your comments). --Paddy3118 (talk) 19:09, 27 May 2015 (UTC)

Hi Hailholyghost, I just had a look at the link you give and it is inadequate as a description for an RC task. The task description needs to be written for an audience of enthusiastic programmers - not necessarily maths or stats or whatever enthusiasts. It seems that you are new to RC and maybe you need to take time and lurk a bit more to understand a little more about how things are done.

This task needs a full description of the calculation method to use, probably in pseudocode, together with a description of what the algorithm should be used for, to make a good task. The code you give is not enough for a task description. --Paddy3118 (talk) 19:23, 27 May 2015 (UTC)

I added that link as a pointer in the right direction for now. To be fair though, null hypothesis testing is very involved and sometimes borders on black magic, so it may be difficult to explain everything clearly in a short text. The following wiki links may be relevant: wp:Statistical hypothesis testing, wp:ANOVA, and more specifically wp:Student's t-test and wp:Welch's t-test. The Student's t-test article has more details on the actual computations, which form the basis for Welch's test. --Ledrug (talk) 19:44, 27 May 2015 (UTC)
So task description should perhaps also include explicit cautions about p values... Perhaps xkcd 882 and 1478? --Rdm (talk) 13:32, 3 June 2015 (UTC)

I've improved the C function to work with larger arrays using tgammal instead of tgamma, and have exception handling if the entered array is too small. I have made some modifications to the Simpson integration part, and the function now runs about twice as fast as before. I have also added a description. I have removed comments in my code. I hope this is satisfactory.--Hailholyghost (talk) 18:28, 3 June 2015 (UTC)

Looks like you pulled some of that math out of wikipedia, but even there there's not quite enough context. For example, what is the definition of u and of f(u)? That kind of stuff works in a classroom context where representative examples have been recently referenced, but that's not the case here.
Also, if you are going the math route I think you should mention basic assumptions (for example, I think you are assuming that the list of values were taken from what would be some normal distribution). --Rdm (talk) 20:04, 3 June 2015 (UTC)

I can work on the task description later. On a more practical matter, this code cannot calculate p-value for very large array sizes (> about 1755 elements). Does anyone know how to solve this? ==hailholyghost 15:18 Friday 5 June (UTC)

The fraction   blows up. How can I get ratio in terms of lgammal? ==hailholyghost 15:26 7 June 2015.

I can get this fraction in terms of  , but it is computationally expensive. At least it works now. As for the task description, how much detail is required? I only put what I considered necessary to the computation, as this is work I did myself. The internet is awash with articles about p-value, so I only linked to those wikipedia articles. The reason I wrote this page is because I was unable to find a way to implement this computation directly, after weeks and weeks of internet searches. I hope that this computer code can be beneficial to others.--hailholyghost 14:25 Tuesday 9 June 2015 UTC.

Just use exp(lgamma(a) - lgamma(a+0.5)). Replacing tgamma with tgammal only delays the overflow until longer data (10000 or so?), while the log-gamma function should not overflow with any reasonable data. --Ledrug (talk) 18:35, 9 June 2015 (UTC)
Hi Ledrug, I think you used the logarithm identity   but this doesn't apply here because
 
rather,
  which unfortunately doesn't seem to go anywhere.
The answer to this has to be buried somewhere in the bowels of the internet... but I can't find it...--hailholyghost 14:07 10 June 2015 (UTC)
How does that not apply?   where A and B are the gammas, isn't that what you want? --Ledrug (talk) 19:02, 10 June 2015 (UTC)
If I understand the mass of expressions on the task page, you want to evaluate
 
But this is equivalent to
 
Or have I misunderstood what the task needs? --Rdm (talk) 13:31, 10 June 2015 (UTC)
Rdm, thank you so much!!!! --hailholyghost 15:11 EST 10 June 2015 (UST)

--Ledrug you are correct, of course, I put what you said into the task description. I'll put more about the definition of the p-value and warnings, maybe split the task description into two different sections.--hailholyghost 16:00 UTC 13 June 2015 (UST)

Task description complete?

I have made the task description more complete. I consider this page ready to be published as a complete task. If someone else feels it is not ready, please give me a *specific* description of what's missing or why this isn't yet ready. I tried adding references but had formatting issues. I would like to cite this link, among others, if someone could please show me how to do this: http://www.nature.com/polopoly_fs/1.14700!/menu/main/topColumns/topLeftColumn/pdf/506150a.pdf --Hailholyghost (talk) 13:28, 23 June 2015 (UTC)
