Hash join: Difference between revisions

Hash join (view source)

Revision as of 11:04, 14 August 2016

1,694 bytes added , 7 years ago

Rewrite the task description for clarity. Doesn't change the task content, except that the join column in table B has been renamed from "Name" to "Character".

Anonymous user

rosettacode>Smls

Revision as of 08:17, 14 August 2016 (view source) rosettacode>Smls (→‎{{header\|Perl 6}}: correct code to handle non-unique keys in both tables) ← Older edit		Revision as of 11:04, 14 August 2016 (view source) rosettacode>Smls (Rewrite the task description for clarity. Doesn't change the task content, except that the join column in table B has been renamed from "Name" to "Character".) Newer edit →
Line 1: {{task}} ~~The classic [[wp:Hash Join\|hash join]] algorithm for an inner join of two relations has the following steps:~~ An [[wp:Join_(SQL)#Inner_join\|inner join]] is an operation that combines two data tables into one table, based on matching column values. The simplest way of implementing this operation is the [[wp:Nested loop join\|nested loop join]] algorithm, but a more scalable alternative is the [[wp:hash join\|hash join]] algorithm. ~~<ul>~~ ~~<li>Hash phase: Create a hash table for one of the two relations by applying a hash~~ {{introheader\|Task}} ~~function to the join attribute of each row. Ideally we should create a hash table for the~~ ~~smaller relation, thus optimizing for creation time and memory size of the hash table.</li>~~ Implement the "hash join" algorithm, and demonstrate that it passes the test-case listed below. ~~<li>Join phase: Scan the larger relation and find the relevant rows by looking in the~~ ~~hash table created before.</li>~~ You should represent the tables as data structures that feel natural in your programming language. ~~</ul>~~ {{introheader\|Guidance}} The "hash join" algorithm consists of two steps: # '''Hash phase:''' Create a [[wp:Multimap\|multimap]] from one of the two tables, mapping from each join column value to all the rows that contain it.<br> #* The multimap must support hash-based lookup which scales better than a simple linear search, because that's the whole point of this algorithm. #* Ideally we should create the multimap for the ''smaller'' table, thus minimizing its creation time and memory size. # '''Join phase:''' Scan the other table, and find matching rows by looking in the multimap created before. <br> ~~The~~In pseudo-code, the algorithm iscould be expressed as follows: '''let''' ''A'' = the first input table (or ideally, the larger one) '''let''' ''B'' = the second input table (or ideally, the smaller one) '''let''' ''j<sub>A</sub>'' = the join column ID of table ''A'' '''let''' ''j<sub>B</sub>'' = the join column ID of table ''B'' '''let''' ''M<sub>B</sub>'' = a multimap for mapping from single values to multiple rows of table ''B'' (starts out empty) '''let''' ''C'' = the output table (starts out empty) '''for each''' row ''b'' '''in''' table ''B'' '''do''' '''place''' ''b'' '''in''' multimap ''M<sub>B</sub>'' under key ''b''(''j<sub>B</sub>'') '''for each''' row ''a'' '''in''' table ''A'' '''do''' '''for each''' row ''b'' '''in''' multimap ''M<sub>B</sub>'' under key ''a''(''j<sub>A</sub>'') '''let''' ''c'' = the concatenation of row ''a'' and row ''b'' '''place''' ''c'' in table ''C'' {{introheader\|Test-case}} {\| class="wikitable" \|- ! Input ! Output \|- \| {\| style="border:none; border-collapse:collapse;" \|- \| style="border:none" \| '''<math>A</math>''' = \| style="border:none" \| {\| class="wikitable" \|- ! Age !! Name \|- \| 27 \|\| Jonah \|- \| 18 \|\| Alan \|- \| 28 \|\| Glory \|- \| 18 \|\| Popeye \|- \| 28 \|\| Alan \|} \| style="border:none; padding-left:1.5em;" rowspan="2" \| \| style="border:none" \| <math>B</math> = \| style="border:none" \| {\| class="wikitable" \|- ! Character !! Nemesis \|- \| Jonah \|\| Whales \|- \| Jonah \|\| Spiders \|- \| Alan \|\| Ghosts \|- \| Alan \|\| Zombies \|- \| Glory \|\| Buffy \|} \|- \| style="border:none" \| <math>j_A</math> = \| style="border:none" \| <code>Name</code> (i.e. column 1) \| style="border:none" \| <math>j_B</math> = \| style="border:none" \| <code>Character</code> (i.e. column 0) \|} \| ~~'''for each''' tuple ''s'' '''in''' ''S'' '''do'''~~ ~~'''let''' ''h'' = hash on join attributes ''s''(b)~~ ~~'''place''' ''s'' '''in''' hash table ''S<sub>h</sub>'' '''in''' bucket '''keyed by''' hash value ''h''~~ ~~'''for each''' tuple ''r'' '''in''' ''R'' '''do'''~~ ~~'''let''' ''h'' = hash on join attributes ''r''(a)~~ ~~'''if''' ''h'' indicates a nonempty bucket (''B'') of hash table ''S<sub>h</sub>''~~ ~~'''if''' ''h'' matches any ''s'' in ''B''~~ ~~'''concatenate''' ''r'' and ''s''~~ ~~'''place''' relation in ''Q''~~ {\| class="wikitable" style="margin-left:1em" \|- ! A.Age !! A.Name !! B.Character !! B.Nemesis \|- \| 27 \|\| Jonah \|\| Jonah \|\| Whales \|- \| 27 \|\| Jonah \|\| Jonah \|\| Spiders \|- \| 18 \|\| Alan \|\| Alan \|\| Ghosts \|- \| 18 \|\| Alan \|\| Alan \|\| Zombies \|- \| 28 \|\| Glory \|\| Glory \|\| Buffy \|- \| 28 \|\| Alan \|\| Alan \|\| Ghosts \|- \| 28 \|\| Alan \|\| Alan \|\| Zombies \|} \|} ~~;Task:~~ ~~Implement the Hash Join algorithm and show the result of joining two tables with it.~~ The order of the rows in the output table is not significant.<br> ~~You should use your implementation to show the joining of these tables:~~ If you're using numerically indexed arrays to represent table rows (rather than referring to columns by name), you could represent the output rows in the form <code style="white-space:nowrap">[[27, "Jonah"], ["Jonah", "Whales"]]</code>. <br><hr> ~~<table style="border-collapse:separate; border-spacing:2em 0;"><tr><td><table border>~~ ~~<tr><th>Age</th><th>Name</th></tr>~~ ~~<tr><td>27</td><td>Jonah</td></tr>~~ ~~<tr><td>18</td><td>Alan</td></tr>~~ ~~<tr><td>28</td><td>Glory</td></tr>~~ ~~<tr><td>18</td><td>Popeye</td></tr>~~ ~~<tr><td>28</td><td>Alan</td></tr>~~ ~~</table></td><td><table border>~~ ~~<tr><th>Name</th><th>Nemesis</th></tr>~~ ~~<tr><td>Jonah</td><td>Whales</td></tr>~~ ~~<tr><td>Jonah</td><td>Spiders</td></tr>~~ ~~<tr><td>Alan</td><td>Ghosts</td></tr>~~ ~~<tr><td>Alan</td><td>Zombies</td></tr>~~ ~~<tr><td>Glory</td><td>Buffy</td></tr>~~ ~~</table></td></tr></table>~~ ~~<br><br>~~ =={{header\|Bracmat}}==