Anonymous user
Hash join: Difference between revisions
Rewrite the task description for clarity. Doesn't change the task content, except that the join column in table B has been renamed from "Name" to "Character".
(→{{header|Perl 6}}: correct code to handle non-unique keys in both tables) |
(Rewrite the task description for clarity. Doesn't change the task content, except that the join column in table B has been renamed from "Name" to "Character".) |
||
Line 1:
{{task}}
An [[wp:Join_(SQL)#Inner_join|inner join]] is an operation that combines two data tables into one table, based on matching column values. The simplest way of implementing this operation is the [[wp:Nested loop join|nested loop join]] algorithm, but a more scalable alternative is the [[wp:hash join|hash join]] algorithm.
{{introheader|Task}}
Implement the "hash join" algorithm, and demonstrate that it passes the test-case listed below.
You should represent the tables as data structures that feel natural in your programming language.
{{introheader|Guidance}}
The "hash join" algorithm consists of two steps:
# '''Hash phase:''' Create a [[wp:Multimap|multimap]] from one of the two tables, mapping from each join column value to all the rows that contain it.<br>
#* The multimap must support hash-based lookup which scales better than a simple linear search, because that's the whole point of this algorithm.
#* Ideally we should create the multimap for the ''smaller'' table, thus minimizing its creation time and memory size.
# '''Join phase:''' Scan the other table, and find matching rows by looking in the multimap created before.
<br>
'''let''' ''A'' = the first input table (or ideally, the larger one)
'''let''' ''B'' = the second input table (or ideally, the smaller one)
'''let''' ''j<sub>A</sub>'' = the join column ID of table ''A''
'''let''' ''j<sub>B</sub>'' = the join column ID of table ''B''
'''let''' ''M<sub>B</sub>'' = a multimap for mapping from single values to multiple rows of table ''B'' (starts out empty)
'''let''' ''C'' = the output table (starts out empty)
'''for each''' row ''b'' '''in''' table ''B'' '''do'''
'''place''' ''b'' '''in''' multimap ''M<sub>B</sub>'' under key ''b''(''j<sub>B</sub>'')
'''for each''' row ''a'' '''in''' table ''A'' '''do'''
'''for each''' row ''b'' '''in''' multimap ''M<sub>B</sub>'' under key ''a''(''j<sub>A</sub>'')
'''let''' ''c'' = the concatenation of row ''a'' and row ''b''
'''place''' ''c'' in table ''C''
{{introheader|Test-case}}
{| class="wikitable"
|-
! Input
! Output
|-
|
{| style="border:none; border-collapse:collapse;"
|-
| style="border:none" | '''<math>A</math>''' =
| style="border:none" |
{| class="wikitable"
|-
! Age !! Name
|-
| 27 || Jonah
|-
| 18 || Alan
|-
| 28 || Glory
|-
| 18 || Popeye
|-
| 28 || Alan
|}
| style="border:none; padding-left:1.5em;" rowspan="2" |
| style="border:none" | <math>B</math> =
| style="border:none" |
{| class="wikitable"
|-
! Character !! Nemesis
|-
| Jonah || Whales
|-
| Jonah || Spiders
|-
| Alan || Ghosts
|-
| Alan || Zombies
|-
| Glory || Buffy
|}
|-
| style="border:none" | <math>j_A</math> =
| style="border:none" | <code>Name</code> (i.e. column 1)
| style="border:none" | <math>j_B</math> =
| style="border:none" | <code>Character</code> (i.e. column 0)
|}
|
{| class="wikitable" style="margin-left:1em"
|-
! A.Age !! A.Name !! B.Character !! B.Nemesis
|-
| 27 || Jonah || Jonah || Whales
|-
| 27 || Jonah || Jonah || Spiders
|-
| 18 || Alan || Alan || Ghosts
|-
| 18 || Alan || Alan || Zombies
|-
| 28 || Glory || Glory || Buffy
|-
| 28 || Alan || Alan || Ghosts
|-
| 28 || Alan || Alan || Zombies
|}
|}
The order of the rows in the output table is not significant.<br>
If you're using numerically indexed arrays to represent table rows (rather than referring to columns by name), you could represent the output rows in the form <code style="white-space:nowrap">[[27, "Jonah"], ["Jonah", "Whales"]]</code>.
<br><hr>
=={{header|Bracmat}}==
|