Category talk:Wren-upc: Difference between revisions

From Rosetta Code
(Added a comment about being unable to upload source code for this module.)
(Have given up trying to post the source code to RC and have posted it to a GitHub gist instead.)

Revision as of 08:20, 17 July 2020

User-perceived characters

In Unicode, a user-perceived character (or grapheme cluster) can comprise one or more codepoints, and the process of splitting a string into such grapheme clusters is described in Unicode Standard Annex #29.
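
As a rough illustration of the point above, the following Python sketch (not part of the Wren module) shows a string whose codepoint count differs from its count of user-perceived characters. The helper `simple_clusters` is a hypothetical name and applies only one deliberately simplified rule, attaching combining marks to the preceding base character; it is not the full Annex #29 algorithm.

```python
import unicodedata

def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified assumption: only characters with a nonzero canonical
    combining class are attached. The full UAX #29 rules (ZWJ sequences,
    Hangul syllables, regional indicators, ...) are not handled here.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # extend the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

s = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
print(len(s))                   # 5 codepoints
print(len(simple_clusters(s)))  # 4 user-perceived characters
```

A real implementation needs the complete rule set and property data from the annex, which is where most of the complexity comes from.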

Given the complexity of this process, Wren doesn't have built-in support for it, and this module aims to remedy that situation. It is based on Oliver Kuederle's Unicode Text Segmentation for Go library, which is subject to the MIT License and is currently based on Unicode version 12.0.

Although the source code file is large by Wren library standards (over 1900 lines), approximately 1600 of those lines are needed to describe the property table, which provides the raw material for text segmentation. In the interests of brevity, I have omitted the comments which accompanied the original table; please refer to the original if any explanation is needed.
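
To give a feel for what such a property table involves, here is a small Python sketch, assuming a table of sorted, non-overlapping (first, last, property) codepoint ranges that is searched by binary search. The layout and the names `PROPERTY_TABLE`, `STARTS` and `lookup` are illustrative assumptions rather than the module's actual code, and only a handful of entries are shown.

```python
from bisect import bisect_right

# A few grapheme-break property ranges, sorted by first codepoint.
# Illustrative excerpt only; the real table has many hundreds of entries.
PROPERTY_TABLE = [
    (0x000A, 0x000A, "LF"),
    (0x000D, 0x000D, "CR"),
    (0x0300, 0x036F, "Extend"),
    (0x1100, 0x115F, "L"),
    (0x200D, 0x200D, "ZWJ"),
    (0x1F1E6, 0x1F1FF, "Regional_Indicator"),
]
STARTS = [first for first, _, _ in PROPERTY_TABLE]

def lookup(codepoint):
    """Return the property of a codepoint, or "Other" if it is in no range."""
    # Find the last range whose first codepoint is <= the one we want.
    i = bisect_right(STARTS, codepoint) - 1
    if i >= 0:
        first, last, prop = PROPERTY_TABLE[i]
        if first <= codepoint <= last:
            return prop
    return "Other"

print(lookup(0x0301))    # Extend
print(lookup(ord("a")))  # Other
```

Storing ranges rather than individual codepoints keeps the table compact, but with well over a thousand ranges it still dominates the size of the source file.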

I tried repeatedly to post the source code for this module to RC but was unable to do so because I kept getting a 502 'bad gateway' error. I suspect this has something to do with the size of the code (circa 75K bytes), though posting it in chunks didn't work either.

Anyway, I've now posted the code to [https://gist.github.com/PureFox48/00fad9b48a0b80445d622a1ef18e4285 this GitHub gist] and may do the same in future with my larger submissions.