Talk:Merge and aggregate datasets
Duplication of task goals if not task name
So... this task is pretty much an exact duplicate of CSV data manipulation, which has been around for 7+ years and has some 85 entries. Admittedly this task has slightly better-defined goals and is less trivial, but a large percentage of the code from there could be lifted and used unchanged here.
Some overlap of tasks is inevitable, and honestly I think this one is probably more useful to demonstrate working with real-world data than the other. I hesitate to make any unilateral decisions (unlike with the recent deluge of "Find words containing whatever" tasks that we've been hit with), but I also don't want to needlessly proliferate trivial variations. Thoughts? --Thundergnat (talk) 19:16, 7 December 2020 (UTC)
- Missing fields in the CSV files. There might be a lot of overlap, but no "exact duplication", and handling of missing fields, although not highlighted, is a significant difference I think. --Paddy3118 (talk) 19:40, 7 December 2020 (UTC)
- My motivation for submitting this task was that I was recently working with R script for the first time. I'm reasonably experienced with programming but had quite a hard time getting it to work.
- The examples and tutorials on Stack Overflow and other places are generally either too trivial or too specific to one exact use case. Merging, grouping and aggregating different datasets is something I encounter a lot in my work.
..."two datasets as provided in .csv files"...
Many examples don't read the csv from files. --Paddy3118 (talk) 19:42, 7 December 2020 (UTC)
- Loading from a .csv file is the shortest code and more practical, but for quickly copying and testing the code examples, hard-coded data is easier. So, when possible, I try to include both and then comment out the .csv load lines; see the Python example code. --BdR (talk) 21:58, 7 December 2020 (UTC)
- Agreed, I find it very useful to have demos that "just run", and like you I usually add comments that show how to read the exact same stuff from a file. I have also added some links to related tasks. --Pete Lomax (talk) 07:25, 8 December 2020 (UTC)
- Of course that "just run" is more than just a little bit handy for repl-it, tio, and the like. --Pete Lomax (talk) 02:35, 10 December 2020 (UTC)
- The task says "Either load the data from the .csv files or create the required data structures hard-coded." so I took that to mean reading the files wasn't required. The current implementations cover the full spectrum. Go, SQL, Wren, and now C++ took the hard-coded approach. Perl and Raku parse a text block. Julia, Phix, and R work as if they are reading a file. Python, REXX, and SPSS actually do read .csv files. To me the interesting part of this task is combining the tables - I think this is the only task to do that. Reading from a .csv is covered by the CSV data manipulation task. What should be required?
Garbanzo (talk) 03:49, 5 January 2021 (UTC)
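For reference, the "include both" pattern described in this thread could look roughly like the sketch below. It is only an illustration: the column names, the sample rows, the helper `read_rows` and the file name `patients.csv` are all assumptions made up for the example, not the task's actual data.

```python
import csv
import io

# Hypothetical sample data, hard-coded so the demo "just runs".
PEOPLE_CSV = """id,name
1,Ada
2,Grace
"""

def read_rows(source):
    """Parse CSV text from any file-like object into a list of dicts."""
    return list(csv.DictReader(source))

# Hard-coded variant: wrap the text block so it behaves like an open file.
rows = read_rows(io.StringIO(PEOPLE_CSV))

# File variant, commented out ("patients.csv" is an assumed file name):
# with open("patients.csv", newline="") as f:
#     rows = read_rows(f)

for row in rows:
    print(row["id"], row["name"])
```

Because both variants go through the same parsing function, switching between the text block and a real file only means swapping which lines are commented out.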
Cleaned Note
I removed the note about generalized programming languages. The solutions may not be as clean as in a specialized language, but it should still be possible. I also hard-coded the data for the C++ entry. To me, the interesting part of this task is joining two tables and dealing with nulls. Garbanzo (talk) 03:05, 4 January 2021 (UTC)
- If you don't complete the task by being able to read the files, then the C++ solution is not as comparable to the solutions that implement the task. Yes, it is setup, but reading from csv files is a pretty common way of getting data for your "interesting bits".
- If a well-known, easy-to-use source of C++ libraries (Boost?) has a csv reader, then you could employ that, but I'm not a great C++ programmer. --Paddy3118 (talk) 14:29, 4 January 2021 (UTC)
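To make the "joining two tables and dealing with nulls" part concrete, here is a minimal sketch in plain Python. Everything in it is assumed for illustration: the field names, the sample rows, the use of `None` for a missing field and the function `merge_and_aggregate` are hypothetical and not taken from any entry on the task page.

```python
from collections import defaultdict

# Hypothetical data; None marks a missing field.
people = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
    {"id": 3, "name": "Linus"},  # has no visits at all
]
visits = [
    {"id": 1, "date": "2020-01-05", "score": 6.0},
    {"id": 1, "date": "2020-03-10", "score": None},  # missing score
    {"id": 2, "date": "2020-02-20", "score": 7.5},
]

def merge_and_aggregate(people, visits):
    """Left-join visits onto people and aggregate per person, skipping None."""
    grouped = defaultdict(list)
    for visit in visits:
        grouped[visit["id"]].append(visit)

    result = []
    for person in people:
        rows = grouped.get(person["id"], [])
        dates = [r["date"] for r in rows if r["date"] is not None]
        scores = [r["score"] for r in rows if r["score"] is not None]
        result.append({
            "id": person["id"],
            "name": person["name"],
            # ISO dates compare correctly as plain strings.
            "last_visit": max(dates) if dates else None,
            "score_sum": sum(scores) if scores else None,
            "score_avg": sum(scores) / len(scores) if scores else None,
        })
    return result

for row in merge_and_aggregate(people, visits):
    print(row)
```

The two null cases are handled separately on purpose: a person with no matching rows still appears in the output (left join), and a missing score is left out of the sum and average rather than counted as zero.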