Talk:Merge and aggregate datasets

 
 
:Missing fields in the CSV files. There might be a lot of overlap, but no "exact duplication", and handling of missing fields, although not highlighted, is a significant difference I think. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 19:40, 7 December 2020 (UTC)
 
::My motivation to submit this task was that I recently was working with R-script for the first time. I'm reasonably experienced with programming but had quite a hard time getting it to work.
 
::The examples and tutorials on stackoverflow and other places are generally either too trivial, or too specific for one exact use-case. Merging, grouping and aggregating different datasets is a very common thing I encounter a lot for my work.
 
::So that's why I submitted this task (after also asking [https://www.reddit.com/r/datascience/comments/jyum95/a_hello_world_type_example_for_aggregating/ here]), and made sure to include the most common "hurdles", like missing records, missing values, multiple aggregator functions at once, working with date values and unordered source files. --[[User:BdR|BdR]] ([[User talk:BdR|talk]]) 22:49, 7 December 2020 (UTC)
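
:::For illustration, the hurdles listed above (missing records, missing values, several aggregates at once, date values, unordered input) can be sketched in Python with only the standard library. The sample data and the names <code>PATIENTS_CSV</code>, <code>VISITS_CSV</code> and <code>summarize</code> are made up for this sketch; they are not the task's actual files or any entry's code.

```python
import csv
import io

# Hypothetical sample data (NOT the task's real files), deliberately unordered,
# with a missing score, a missing date, and a patient without any visits.
PATIENTS_CSV = """\
ID,NAME
3,Cy
1,Ann
2,Bob
"""

VISITS_CSV = """\
ID,DATE,SCORE
2,2020-09-10,6.5
1,2020-11-19,
2,,7.0
1,2020-10-02,5.5
"""


def summarize(patients_text, visits_text):
    """Left-join patients to visits and compute several aggregates per patient."""
    stats = {}
    for row in csv.DictReader(io.StringIO(patients_text)):
        stats[row["ID"]] = {"name": row["NAME"], "last_visit": None, "scores": []}

    for row in csv.DictReader(io.StringIO(visits_text)):
        entry = stats.get(row["ID"])
        if entry is None:
            continue  # visit without a matching patient record
        if row["DATE"]:
            # ISO-formatted dates compare correctly as plain strings.
            if entry["last_visit"] is None or row["DATE"] > entry["last_visit"]:
                entry["last_visit"] = row["DATE"]
        if row["SCORE"]:
            entry["scores"].append(float(row["SCORE"]))

    result = []
    for pid in sorted(stats):  # sort so the output order is stable
        e = stats[pid]
        total = sum(e["scores"]) if e["scores"] else None
        average = total / len(e["scores"]) if e["scores"] else None
        result.append((pid, e["name"], e["last_visit"], total, average))
    return result


for line in summarize(PATIENTS_CSV, VISITS_CSV):
    print(line)
```

:::Empty fields are skipped rather than counted as zero, and a patient without visits still gets a row, with <code>None</code> for every aggregate.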
 
== ..."two datasets as provided in .csv files"... ==
Many examples don't read the csv from files. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 19:42, 7 December 2020 (UTC)
 
:Loading from a .csv file takes the least code and is more practical, but hard-coded data is easier for quickly copying and testing the code examples. So, when possible, I try to include both and comment out the .csv load lines; see the Python example code. --[[User:BdR|BdR]] ([[User talk:BdR|talk]]) 21:58, 7 December 2020 (UTC)
::Agreed, I find it very useful to have demos that "just run", and like you I usually add comments that show how to read the exact same stuff from a file. I have also added some links to related tasks. --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 07:25, 8 December 2020 (UTC)
::Of course that "just run" is more than just a little bit handy for repl-it, tio, and the like. --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 02:35, 10 December 2020 (UTC)
:The task says "Either load the data from the .csv files or create the required data structures hard-coded." so I took that to mean reading the files wasn't required. The current implementations cover the full spectrum. Go, SQL, Wren, and now C++ took the hard-coded approach. Perl and Raku parse a text block. Julia, Phix, and R work as if they are reading a file. Python, REXX, and SPSS actually do read .csv files. To me the interesting part of this task is combining the tables - I think this is the only task to do that. Reading from a .csv is covered by the [[CSV data manipulation]] task. What should be required?
[[User:Garbanzo|Garbanzo]] ([[User talk:Garbanzo|talk]]) 03:49, 5 January 2021 (UTC)
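
:A minimal Python sketch of the "include both" pattern discussed in this section: the demo runs on inline data, and the file-reading call is left as a commented-out line. The names <code>INLINE_CSV</code> and <code>load_rows</code> and the file name <code>patients.csv</code> are assumptions made for this illustration, not taken from any entry on the task page.

```python
import csv
import io

# Hypothetical inline copy of the data, so the demo "just runs" anywhere.
INLINE_CSV = """\
ID,NAME
1,Ann
2,Bob
"""


def load_rows(path=None, inline=INLINE_CSV):
    """Return rows as dicts, read from a .csv file if a path is given,
    otherwise parsed from the inline text."""
    if path is not None:
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    return list(csv.DictReader(io.StringIO(inline)))


rows = load_rows()                  # hard-coded data, no file needed
# rows = load_rows("patients.csv")  # hypothetical file holding the same data
print(rows)
```

:Both paths go through the same <code>csv.DictReader</code> parsing, so the rest of a solution does not need to know where the rows came from.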
 
== Cleaned Note ==
I removed the note about generalized programming languages. The solutions may not be as clean as in a specialized language, but the task should still be possible. I also hard-coded the data for the C++ entry. To me, the interesting part of this task is joining two tables and dealing with nulls.
[[User:Garbanzo|Garbanzo]] ([[User talk:Garbanzo|talk]]) 03:05, 4 January 2021 (UTC)
 
: If you don't complete the task by being able to read the files, then the C++ solution is not as comparable to the solutions that implement the task. Yes, it is setup, but reading from csv files is a pretty common way of getting data for your "interesting bits".
:If a very well known and easy to use source of C++ libraries (Boost?) has a csv reader then you could employ that, but I'm not a great C++ programmer. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 14:29, 4 January 2021 (UTC)
 
::'''My apologies''' - The task description ''does'' allow input from other than .csv files. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 13:07, 5 January 2021 (UTC)
::: Thanks. The description is clearer now. [[User:Garbanzo|Garbanzo]] ([[User talk:Garbanzo|talk]]) 06:07, 6 January 2021 (UTC)