Hash join
Hash join is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
The classic hash join algorithm for an inner join of two relations has the following steps:
- Hash phase : Creating a hash table for one of the two relations by applying a hash function to the join attribute of each row. Ideally we should create a hash table for the smaller relation. Thus, optimizing for creation time and memory size of the hash table.
- Join phase : Scanning the larger relation and finding the relevant rows by looking in the hash table created before.
The algorithm is as follows:
for each tuple s in S do { hash on join attributes s(b) place tuples in hash table based on hash values}; for each tuple r do { hash on join attributes r(a) if r hashes in a nonempty bucket of hash table for S then {if r hash key matches any s in bucket concatenate r and s place relation in Q}};
Implement the Hash Join algorithm in your programming language (optionally providing a test case as well).