Talk:Merge and aggregate datasets: Difference between revisions
Revision as of 02:35, 10 December 2020
Duplication of task goals if not task name
So... this task is pretty much an exact duplicate of CSV data manipulation, which has been around for 7+ years and has some 85 entries. Admittedly this task has slightly better defined goals and is less trivial, but a large percentage of the code from there could be lifted and used unchanged here.
Some overlap of tasks is inevitable, and honestly I think this one is probably more useful for demonstrating work with real-world data than the other. I hesitate to make any unilateral decisions (unlike with the recent deluge of "Find words containing whatever" tasks that we've been hit with), but I also don't want to needlessly proliferate trivial variations. Thoughts? --Thundergnat (talk) 19:16, 7 December 2020 (UTC)
- Missing fields in the CSV files. There might be a lot of overlap, but no "exact duplication", and handling of missing fields, although not highlighted, is a significant difference I think. --Paddy3118 (talk) 19:40, 7 December 2020 (UTC)
- My motivation for submitting this task was that I was recently working with R-script for the first time. I'm reasonably experienced with programming but had quite a hard time getting it to work.
- The examples and tutorials on Stack Overflow and other places are generally either too trivial or too specific to one exact use case. Merging, grouping and aggregating different datasets is something I encounter a lot in my work.
..."two datasets as provided in .csv files"...
Many examples don't read the csv from files. --Paddy3118 (talk) 19:42, 7 December 2020 (UTC)
- Loading from a .csv file is the shortest code and more practical, but for quickly copying and testing the code examples, hard-coded data is easier. So, when possible, I try to include both and comment out the .csv load lines; see the Python example code. --BdR (talk) 21:58, 7 December 2020 (UTC)
- Agreed, I find it very useful to have demos that "just run", and like you I usually add comments that show how to read the exact same stuff from a file. I have also added some links to related tasks. --Pete Lomax (talk) 07:25, 8 December 2020 (UTC)
- Of course that "just run" is more than just a little bit handy for repl-it, tio, and the like, iftm. --Pete Lomax (talk) 02:35, 10 December 2020 (UTC)
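For illustration, the "hard-coded data plus commented-out file load" pattern discussed above, together with the missing-field handling mentioned earlier, might look roughly like the sketch below. It is not taken from the task page: the column names, sample values and file names are all made up for the example, and it uses only the standard `csv` module.

```python
import csv
import io

# Hypothetical sample data, invented for this sketch (not the task's datasets).
PATIENTS_CSV = """PATIENTID,LASTNAME
1001,Alpha
2002,Beta
3003,Gamma
"""

# The second visit row has an empty SCORE field (a missing value).
VISITS_CSV = """PATIENTID,SCORE
1001,5.5
1001,
2002,6.0
"""


def read_rows(source):
    """Return a list of dicts from any file-like object holding CSV text."""
    return list(csv.DictReader(source))


# Hard-coded data, so the demo "just runs" when pasted into an online REPL.
patients = read_rows(io.StringIO(PATIENTS_CSV))
visits = read_rows(io.StringIO(VISITS_CSV))

# Equivalent file-based load, commented out (file names are assumptions):
# with open("patients.csv", newline="") as f:
#     patients = read_rows(f)
# with open("visits.csv", newline="") as f:
#     visits = read_rows(f)

# Aggregate: collect scores per patient, skipping empty (missing) fields.
scores = {}
for row in visits:
    if row["SCORE"]:
        scores.setdefault(row["PATIENTID"], []).append(float(row["SCORE"]))

# Merge: keep every patient; those without a usable score get None.
summary = []
for p in patients:
    vals = scores.get(p["PATIENTID"], [])
    avg = sum(vals) / len(vals) if vals else None
    summary.append((p["PATIENTID"], p["LASTNAME"], avg))

for line in summary:
    print(line)
```

Because `read_rows` accepts any file-like object, switching between the embedded strings and real files only means swapping which block is commented out; the merge and aggregate steps stay the same.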