Talk:Suffix tree: Difference between revisions
→Valid implementations?: new section
(→Valid implementations?: new section) |
|||
(4 intermediate revisions by 3 users not shown) | |||
Line 7:
The wikipedia definition for a suffix tree currently looks like this:
:The suffix tree for the string <math>S</math> of length <math>n</math> is defined as a tree such that:<ref>
:* the paths from the root to the leaves have a one-to-one relationship with the suffixes of <math>S</math>,
:* edges spell non-empty strings,
Line 70:
::: I thought that seeking a definition for a normally well-defined algorithm was out of the scope of RC, but fair enough, I don't mind talking about it. From what I understand, your example is incomplete. The branch in "n" for instance should really have been "na", as all nodes in it start with "a". Somehow in the definition there has to be a rule stating that all edges labels have to be the longest possible. I don't know if it's clear in the wikipedia article. Maybe it should be clarified there.--[[User:Grondilu|Grondilu]] ([[User talk:Grondilu|talk]]) 12:05, 27 May 2013 (UTC)
:::: The distinction you draw in your opening sentence, in your paragraph here, is perhaps worth thinking about. Personally, I think, if the task page cannot convey the definition, and if the definition has its own jargon which must be studied, that the definition is very much in scope for RC. Sometimes an implementation can serve as a definition, but in this case the Racket implementation used an external library (which suggests an unbounded scope) and I do not know enough about perl6 to read its code and I do not know whether it would have been easier to learn the relevant perl6 or to just learn this algorithm (of course, both approaches have additional benefits, but since I imagine that I am not the only one that would want to implement this task, I opted to try to get the task definition clarified).
:::: Anyways, the wikipedia definition does mention that every parent node other than the root has at least two edges leading out.
:::: That said, my current impression is that the wikipedia article is confusing because it claims linear time for O(n log n) mechanisms. Average time might be linear (I do not know enough to determine that) but time is proportional to space and we need an additional copy of the text for every prefix variant that needs to be treated. I may be wrong here (I do not have proof) but thinking about using this mechanism to encode long, random bit strings seems to support this way of thinking. (And, if I am wrong, I would be very interested in seeing ''proof that the algorithm is O(n)'' that adequately covers space needed for random bit strings.) --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 13:57, 27 May 2013 (UTC)
==Definition (take 3)==
So the definition has been updated, both on Wikipedia and on RC. I used http://www.cs.uoi.gr/~kblekas/courses/bioinformatics/Suffix_Trees1.pdf as a reference.--[[User:Grondilu|Grondilu]] ([[User talk:Grondilu|talk]]) 17:37, 26 August 2013 (UTC)
== Valid implementations? ==
None of the displayed current implementations number their leaves, but numbering of leaves was a task requirement. Also, they are not displayed such that implicit numbering can be used (for example: node 0 first, node 1 second, node 2 third, ...) So can any of these implementations be considered valid for the current task definition? --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 23:33, 20 August 2015 (UTC)
|