# Talk:Selection bias in clinical sciences

## Rank?

The task seems ambiguous.

According to Wikipedia, when calculating the Kruskal-Wallis statistic we "Rank all data from all groups together; i.e., rank the data from 1 to N ignoring group membership. ..."

But, other than the requirement "You should get a statistical result highly favoring the REGULAR group", the task says little about how to rank the simulated victims. We could rank them in two ranks (caught the disease, or didn't), or in 181 ranks (number of test days before they caught the disease), or arbitrarily (by assigned test id#). One thing that is clear from the description of the statistic is that we should not rank them by group membership (which seems to exclude ranking them by the amount of medication they received, though perhaps we're relaxing that condition here?). --Rdm (talk) 05:39, 29 September 2022 (UTC)

* True. The statistic chosen is designed only as an indicator that the 3 groups are __different__ from one another in a way that should not occur by chance. The hint about the REGULAR group was just to provide a direction for judging output. Of course, if the groups rank in a particular order (they do) in their percentages of Covid cases, then a statistically significant difference between the groups could be taken as one that favors the group with the lowest percentage of Covid-19. I agree that there is usually a different statistical test for whether the rankings between groups run in a particular direction. For example, you could do a pairwise Wilcoxon rank sum test with directionality. However, I think asking for the Kruskal-Wallis test already seems a lot to require from non-statistically oriented languages for the simulation. --Wherrera (talk) 08:12, 29 September 2022 (UTC)
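
To make the ranking question concrete, here is a minimal Python sketch of the pooled ranking step and the resulting H statistic. It is an illustration under stated assumptions, not the task's reference method: the function names `pooled_midranks` and `kruskal_wallis_h` are hypothetical, tied values are assumed to share their average rank, no tie correction is applied to H, and the sample values are made up. The value being ranked is an outcome (for example, days until infection), never the group label itself.

```python
from collections import defaultdict


def pooled_midranks(values):
    """Rank all values together from 1 to N; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic (no tie correction). groups maps a group name to its outcome values."""
    labels, values = [], []
    for name, vals in groups.items():
        for v in vals:
            labels.append(name)
            values.append(v)
    n = len(values)
    ranks = pooled_midranks(values)  # group membership is ignored while ranking
    rank_sums = defaultdict(float)
    for name, r in zip(labels, ranks):
        rank_sums[name] += r
    total = sum(rank_sums[g] ** 2 / len(groups[g]) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Made-up outcome values for three hypothetical groups (not task data).
example = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
h = kruskal_wallis_h(example)
```

With a two-valued outcome (caught the disease or didn't), nearly every observation is tied, so an implementation would normally also apply the tie correction that this sketch leaves out.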