Talk:WiktionaryDumps to words
Latest revision as of 23:53, 13 April 2021
Too vague
"Demonstrate how your language can handle this dump"? How?
You need to write a task where all examples do one shared thing that can be compared as a feature of each language's implementation of the task. If you mean to highlight one type of XML handling over another, then this doesn't do it, for example. --Paddy3118 (talk) 10:00, 9 December 2020 (UTC)
- The task, as explained, is to create a file equivalent to "/usr/share/dict/french" (output), using the Wiktionary dump as input. Blue Prawn (talk) 19:27, 9 December 2020 (UTC)
- I have no desire to download an 800 megabyte compressed file for a Rosetta Code task that is who-knows-how-large uncompressed. Surely the task doesn't need to use a file that large. --Chunes (talk) 20:41, 9 December 2020 (UTC)
- You don't need to do so. Please see the OCaml example, which only downloads the first 1 or 2 megabytes. Blue Prawn (talk) 09:13, 10 December 2020 (UTC)
- I would need that explaining to me. How does it quit after 1 or 2 megas and how does it tell wget|bzcat| to quit? --Pete Lomax (talk) 09:57, 10 December 2020 (UTC)
- On Linux I just use Ctrl-C to terminate all the commands (all the piped programs are terminated at the same time). On Windows, under Cygwin, I just do the same. I think it is the same on macOS too. - Blue Prawn (talk) 12:50, 10 December 2020 (UTC)
- Also, you don't really have to download 800 megabytes to your hard drive; you can just read it from a stream. Blue Prawn (talk) 13:15, 10 December 2020 (UTC)
- I too have some questions.
- What does wiktionary have to do with the task? Would any XML encoded word list do? If so, why does the task name include wiktionary?
- Because I found it interesting to do something with Wiktionary, as I explained on the Village Pump page. - Blue Prawn (talk) 09:18, 10 December 2020 (UTC)
- Also, a word list is available for French as "/usr/share/dict/french", but I don't think one is available for every language, and Wiktionary could be a good source for generating these files. If I understood correctly, these word files are useful for spell checking. Blue Prawn (talk) 12:55, 10 December 2020 (UTC)
- Is the task supposed to show how to download and extract a large file in your particular language? The reference implementation just shells out and uses other tools.
- The task is still a draft. If you think the download and decompression steps should be done in the language itself, we can update the task (and I will update the OCaml example too). - Blue Prawn (talk) 09:18, 10 December 2020 (UTC)
- If the task is just to extract a certain group of entries from an XML file, how does it differ significantly from XML/XPath?
- --Thundergnat (talk) 21:57, 9 December 2020 (UTC)
- Because we cannot use the DOM method to parse 800 MB of XML; we need to use the SAX method instead. Most languages provide two different APIs for SAX and DOM XML parsing, but maybe not all. Blue Prawn (talk) 09:18, 10 December 2020 (UTC)
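To make the SAX-versus-DOM point concrete, here is a minimal Python sketch of event-driven parsing, where pages are handled one at a time instead of loading the whole tree. It is not the OCaml entry's method. The `SAMPLE` document, the helper names `iter_pages` and `chunked`, and the 16-byte chunk size are all made up for illustration, and a real dump also declares an XML namespace on its tags, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the dump (hypothetical data, no XML namespace).
SAMPLE = b"""<mediawiki>
<page><title>chat</title><revision><text>==French==
noun</text></revision></page>
<page><title>dog</title><revision><text>==English==
noun</text></revision></page>
</mediawiki>"""


def iter_pages(chunks):
    """Yield (title, text) for each <page>, feeding the parser chunk by chunk."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == "page":
                title = elem.findtext("title")
                text = elem.findtext("revision/text") or ""
                yield title, text
                # Drop the finished page's children so they can be freed.
                elem.clear()


def chunked(data, size):
    """Split bytes into fixed-size pieces to imitate reading from a stream."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


pages = list(iter_pages(chunked(SAMPLE, 16)))
```

Because `iter_pages` is a generator, a caller can stop iterating as soon as it has enough entries, and no further input is parsed.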
A common task
You can see on this post that some people are wondering how to do this task:
https://unix.stackexchange.com/questions/48939/add-new-language-to-usr-share-dict-words
The wordlist package in Debian doesn't seem to provide that many languages:
https://packages.debian.org/en/sid/wordlist
If we modify the OCaml script, replacing "==French==" with "==Indonesian==", we can produce the word list for the Indonesian language quite easily.
-- Blue Prawn (talk) 13:10, 10 December 2020 (UTC)
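The language switch described above could be sketched in Python as a parameter rather than an edit to the script. This is not a translation of the OCaml entry. The function names and the `PAGES` sample are hypothetical, and skipping titles that contain ":" (to leave out pages such as "Wiktionary:...") is an assumption, not something the task states.

```python
import re


def has_language_section(wikitext, language):
    """True if the page text has a level-2 heading such as ==French==."""
    pattern = r"^==\s*" + re.escape(language) + r"\s*==\s*$"
    return re.search(pattern, wikitext, flags=re.MULTILINE) is not None


def words_for(pages, language):
    """Sorted titles of pages that have a section for the given language."""
    return sorted(
        title
        for title, text in pages
        # Assumption: titles containing ":" are not dictionary entries.
        if ":" not in title and has_language_section(text, language)
    )


# Hypothetical (title, wikitext) pairs standing in for parsed dump pages.
PAGES = [
    ("chat", "==English==\nnoun\n==French==\nnoun"),
    ("buku", "==Indonesian==\nnoun"),
    ("Wiktionary:Main Page", "==French==\n"),
    ("sub", "===French===\n"),
]

french = words_for(PAGES, "French")
indonesian = words_for(PAGES, "Indonesian")
```

Anchoring the pattern to a whole line keeps deeper headings such as "===French===" from being counted as a language section.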
Edit?
Hi, it has already been 10 days since anyone last commented here.
Can we allow adding new languages now?
Blue Prawn (talk) 18:04, 20 December 2020 (UTC)
- That is more a reason to remove the "dumped" task altogether as the original author doesn't seem to want to address these comments. --Paddy3118 (talk) 00:59, 21 December 2020 (UTC)
- Hi Paddy, sorry, English is not my native language, so I'm not sure what you mean by [the "dumped" task].
- Do you mean that this task is too simple because it's only about the act of dumping selected content from the input?
- (I checked https://en.wiktionary.org/wiki/dump and tried to see which definition would match best, hoping that it's not definition 1, 7 or 8, which are quite pejorative.)
- I do want to address the comments, but I already answered them all, and the discussion stopped after that. In French we say "qui ne dit mot consent" (silence is consent), so I thought that they now agree. Isn't that the case?
- Blue Prawn (talk)
Download 800MB to spell check a document??!!
Maybe you can key Ctrl-C, maybe you only got half the language, and what if it's right at the end of the file? --Pete Lomax (talk) 00:35, 16 February 2021 (UTC)
- Managed to get 5 words out of the first 240K, and then terminate download/unpack cleanly without having to key Ctrl-C. --Pete Lomax (talk) 23:52, 13 April 2021 (UTC)
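One way such a clean early stop can work, sketched in Python and not taken from any entry on the task page: decompress the bzip2 data chunk by chunk and stop reading once enough output has been seen. The function name `first_lines`, the 4096-byte chunk size, and the generated test data are assumptions for illustration. An in-memory buffer stands in for the network stream.

```python
import bz2
import io


def first_lines(stream, wanted, chunk_size=4096):
    """Return the first `wanted` complete lines and the compressed bytes read."""
    decomp = bz2.BZ2Decompressor()
    pending = b""   # decompressed bytes that do not yet end in a newline
    lines = []
    consumed = 0
    while len(lines) < wanted and not decomp.eof:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        consumed += len(chunk)
        pending += decomp.decompress(chunk)
        # Keep the trailing partial line for the next round.
        *complete, pending = pending.split(b"\n")
        lines.extend(complete)
    return [line.decode("utf-8") for line in lines[:wanted]], consumed


# Hypothetical data: a word list compressed with small (100 kB) bzip2 blocks.
raw = "".join(f"word{i}\n" for i in range(200_000)).encode("utf-8")
compressed = bz2.compress(raw, compresslevel=1)

words, consumed = first_lines(io.BytesIO(compressed), wanted=5)
```

Since bzip2 works block by block, nothing comes out until the first block is complete, so the minimum read is roughly one compressed block. In a shell pipeline, the upstream commands would usually be stopped by SIGPIPE once the reader exits. This sketch does not model that part.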