Hash join
Hash join
You are encouraged to solve this task according to the task description, using any language you may know.
You are encouraged to solve this task according to the task description, using any language you may know.
The classic hash join algorithm for an inner join of two relations has the following steps:
- Hash phase : Creating a hash table for one of the two relations by applying a hash function to the join attribute of each row. Ideally we should create a hash table for the smaller relation. Thus, optimizing for creation time and memory size of the hash table.
- Join phase : Scanning the larger relation and finding the relevant rows by looking in the hash table created before.
The algorithm is as follows:
for each tuple s in S do { hash on join attributes s(b) place tuples in hash table based on hash values}; for each tuple r do { hash on join attributes r(a) if r hashes in a nonempty bucket of hash table for S then {if r hash key matches any s in bucket concatenate r and s place relation in Q}};
Implement the Hash Join algorithm in your programming language (optionally providing a test case as well).
Haskell
The ST monad allows us to utilise mutable memory behind a referentially transparent interface, with the s type variable prevent the mutable state being modifying by more than one thread. <lang Haskell>import qualified Data.HashTable.ST.Basic as H import Data.Hashable import Control.Monad.ST import Control.Monad import Data.STRef
hashJoin :: (Eq k, Hashable k) =>
[t] -> (t -> k) -> [a] -> (a -> k) -> [(t, a)]
hashJoin xs fx ys fy = runST $ do
l <- newSTRef [] ht <- H.new forM_ ys $ \y -> H.insert ht (fy y) =<< (H.lookup ht (fy y) >>= \case Nothing -> return [y] Just v -> return (y:v)) forM_ xs $ \x -> do H.lookup ht (fx x) >>= \case Nothing -> return () Just v -> modifySTRef' l ((map (x,) v) ++) readSTRef l
test = mapM_ print $ hashJoin
[(1, "Jonah"), (2, "Alan"), (3, "Glory")] snd [("Jonah", "Whales"), ("Jonah", "Spiders"), ("Alan", "Ghosts"), ("Alan", "Zombies"), ("Glory", "Buffy")] fst
</lang>
λ> test ((3,"Glory"),("Glory","Buffy")) ((2,"Alan"),("Alan","Zombies")) ((2,"Alan"),("Alan","Ghosts")) ((1,"Jonah"),("Jonah","Spiders")) ((1,"Jonah"),("Jonah","Whales"))