Talk:Merge and aggregate datasets: Difference between revisions

Revision as of 06:07, 6 January 2021

Compared revisions: 19:42, 7 December 2020 by Paddy3118 (section: ..."two datasets as provided in .csv files"...) and 06:07, 6 January 2021 by Garbanzo (section: Cleaned Note). 12 intermediate revisions by 4 users are not shown.
:Missing fields in the CSV files. There might be a lot of overlap, but no "exact duplication", and handling of missing fields, although not highlighted, is a significant difference I think. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 19:40, 7 December 2020 (UTC)

::My motivation to submit this task was that I recently was working with R-script for the first time. I'm reasonably experienced with programming but had quite a hard time getting it to work.
::The examples and tutorials on stackoverflow and other places are generally either too trivial, or too specific for one exact use-case. Merging, grouping and aggregating different datasets is a very common thing I encounter a lot for my work.
::So that's why I submitted this task (after also asking [https://www.reddit.com/r/datascience/comments/jyum95/a_hello_world_type_example_for_aggregating/ here]), and made sure to include the most common "hurdles", like missing records, missing values, multiple aggregator functions at once, working with date values and unordered source files. --[[User:BdR|BdR]] ([[User talk:BdR|talk]]) 22:49, 7 December 2020 (UTC)

== ..."two datasets as provided in .csv files"... ==

Many examples don't read the csv from files. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 19:42, 7 December 2020 (UTC)

:Loading from .csv file is shortest code and more practical, but for quickly copying and testing the code examples the hard-coded data is easier. So, when possible, I try to include both, and then comment out the .csv load lines, see the Python example code. --[[User:BdR|BdR]] ([[User talk:BdR|talk]]) 21:58, 7 December 2020 (UTC)

::Agreed, I find it very useful to have demos that "just run", and like you I usually add comments that show how to read the exact same stuff from a file. I have also added some links to related tasks. --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 07:25, 8 December 2020 (UTC)

::Of course that "just run" is more than just a little bit handy for repl-it, tio, and the like. --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 02:35, 10 December 2020 (UTC)

:The task says "Either load the data from the .csv files or create the required data structures hard-coded." so I took that to mean it wasn't required. The current implementations cover the full spectrum. Go, SQL, Wren, and now C++ took the hard-coded approach. Perl and Raku parse a text block. Julia, Phix, and R work as-if they are reading a file. Python, REXX, and SPSS actually do read .csv files. To me the interesting part of this task is combining the tables - I think this is the only task to do that. Reading from a .csv is covered by the [[CSV data manipulation]] task. What should be required? [[User:Garbanzo|Garbanzo]] ([[User talk:Garbanzo|talk]]) 03:49, 5 January 2021 (UTC)

== Cleaned Note ==

I removed the note about generalized programming languages. The solutions may not be as clean as a specialized language but it should still be possible. I also hard coded the data for the C++ entry. To me, the interesting part of this task is joining two tables and dealing with nulls. [[User:Garbanzo|Garbanzo]] ([[User talk:Garbanzo|talk]]) 03:05, 4 January 2021 (UTC)

: If you don't complete the task by being able to read the files, then the C++ solution is not as comparable to the solutions that implement the task. Yes it is setup, but reading from csv files is a pretty common way of getting data for your "interesting bits".
:If a very well known and easy to use source of C++ libraries, (Boost?), has a csv reader then you could employ that, but I'm not a great C++ programmer. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 14:29, 4 January 2021 (UTC)

::'''My apologies''' - The task description ''does'' allow input from other than .csv files. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 13:07, 5 January 2021 (UTC)

::: Thanks. The description is more clear now. [[User:Garbanzo|Garbanzo]] ([[User talk:Garbanzo|talk]]) 06:07, 6 January 2021 (UTC)
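
For reference, the approach discussed in this thread (hard-coded data that "just runs", with the file-loading lines left in as comments, followed by a join that tolerates missing records and missing values) might be sketched in Python roughly as follows. This is only an illustration and not any of the task's actual entries: the data, the column names, the file name <code>patients.csv</code>, and the helpers <code>read_rows</code> and <code>merge_and_aggregate</code> are all made up for the example, and only the standard library is assumed.

```python
import csv
import io

# Illustrative data only (an assumption, not the task's actual files).
PATIENTS_CSV = """PATIENTID,LASTNAME
1,Alpha
2,Beta
3,Gamma
"""

VISITS_CSV = """PATIENTID,VISIT_DATE,SCORE
2,2020-09-10,6.8
1,2020-11-19,
1,2020-09-17,5.5
2,,7.0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, turning empty fields into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (v if v != "" else None) for k, v in row.items()}
            for row in reader]


patients = read_rows(PATIENTS_CSV)
visits = read_rows(VISITS_CSV)
# To read from a file instead (hypothetical file name):
# with open("patients.csv", newline="") as f:
#     patients = read_rows(f.read())


def merge_and_aggregate(patients, visits):
    """Left-join visits onto patients, then aggregate per patient."""
    by_patient = {}
    for v in visits:
        by_patient.setdefault(v["PATIENTID"], []).append(v)

    result = []
    for p in sorted(patients, key=lambda r: r["PATIENTID"]):
        # Patients without any visit still get a row (left join).
        rows = by_patient.get(p["PATIENTID"], [])
        # Skip missing values instead of failing on them.
        dates = [r["VISIT_DATE"] for r in rows if r["VISIT_DATE"] is not None]
        scores = [float(r["SCORE"]) for r in rows if r["SCORE"] is not None]
        result.append({
            "PATIENTID": p["PATIENTID"],
            "LASTNAME": p["LASTNAME"],
            # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
            "LAST_VISIT": max(dates) if dates else None,
            "SCORE_SUM": sum(scores) if scores else None,
            "SCORE_AVG": sum(scores) / len(scores) if scores else None,
        })
    return result


summary = merge_and_aggregate(patients, visits)
for row in summary:
    print(row)
```

The third patient has no visits at all, so every aggregate for that row comes out as <code>None</code>; that is the "missing records" hurdle, while the empty <code>SCORE</code> and <code>VISIT_DATE</code> fields cover the "missing values" one.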