Advanced Associative Structures Hash Tables 1 Outline Hash Function Hash table Open probe addressing Chaining with separate lists Hash Iterator Efficiency of Hash Methods 2 Associative structures Ordered associative structures Binary search tree Unordered associative structures Hash table

Open probing Chaining with separate lists 3 Hashing Hash table Store elements uniquely identified by their key Hash function: take a key as an argument and returns an entry point to the table 4 Using a hash function hf: UN U = set of all possible keys for the type of item N is the integer set Then take hf(Key)%m to get the hash table index m is the table size (number of entries in the

hash table) Collision: hf(Key1) %m = hf(Key2) %m different items map to the same hash table index 5 Example Hash Function h f(2 2 ) = 2 2 h f(4 ) = 4 22 % 7 = 1 0 1 ta b le E n try [1 ] 4% 7=4 2 3 4 ta b le E n try [4 ]

5 6 6 Example Hash Function 22 7 Design Hash functions the hash function is Easy to compute Minimize the collision Uniform distribution of keys over hash table 8 Function objects A function object is an object of a class that behaves like a function Can be created, stored, and destroyed Can have associated data members and operations 9

Hash function: Key is an integer Identity function Use mod operation (mod m) to get hash table index Collision If m=10b, then collisions will occur for all keys that are the same in their rightmost b digits If m=2b, then collisions will occur for all keys that are the same in their rightmost b bits (binary digits) Solution: choose m as a prime number Make sure the table is large enough to reduce the probability of collisions Choose m as the smallest prime number greater than m_min

Mix up the digits in key (eg. MidSquare technique) 10 Hash function: Key is a string Suppose string is c0c1c2cn-1 n 1 Method 1: h ci i 0 n 1 Method 2: h ci B n 1 i i 0 11

Hash function: Key is not numbers or strings Design a custom hash function object type for the key Example 12 Design Hash Tables Two basic approaches to handling collision in hash table Open addressing: when collision occurs, rehash to a different location Separate chaining: colliding elements are stored on a list for each has value 13 Open addressing h: K T -> M K: Key set, {0,1,2,}, T: number of attempts, M {0,1, , m-1} : the range of table size h(k,i) is the hash function for key k on the i-th attempt, each attempt is called a probe.

Linear probe Quadratic probe 14 Hash Table Using Open Probe Addressing Example 0 77 1 0 77 1 0 77 1 0

77 1 1 89 1 1 89 1 1 89 1 1 89 1

2 45 2 2 45 2 2 45 2 3 14 1 3 14

1 3 14 1 35 3 4 35 3 5 76 7 6 94

1 54 1 2 14 3 1 4 4 4 5 5 5 6

94 1 6 94 1 6 94 1 7 7 7 7 8 8

8 8 9 9 9 9 10 54 1 In s ert 54, 77, 94, 89, 14 (a) 10 54 In s ert 45 (b )

1 10 54 In s ert 35 (c) 1 10 In s ert 76 (d ) 15 Hash Table Using Open Probe Addressing insert an item Clustering: Find an item

Remove an item 16 Chaining with separate lists The hash table is defined as an indexed sequence of containers, such as vector or lists Each container, called a bucket, holds a set of data items that has the same table location < b u ck et0 > < b u ck et1 > < b u ck et2 > . . . . < b u c k e t n-1 > 17 Chaining with Separate Lists Example < B uc k e t 0 > 7 7 (1 )

< B uc k e t 1 > 8 9 (1 ) < B uc k e t 2 > 3 5 (1 ) < B uc k e t 3 > 1 4 (1 ) < B uc k e t 4

> < B uc k e t 5 > < B uc k e t 6 > < B uc k e t 7 > < B uc k e t 8 > < B uc k e t

9 > < B uc k et 1 0 > 4 5 (2 ) 9 4 (1 ) 5 4 (1 ) 7 6 (2 ) 18 Hash table performance Chaining: Unsuccessful search Successful search 19

Efficiency of Hash Methods Hash table size = m, Number of elements in n hash table = n, Load factor = m Average Probes for Successful Search Open (linear) Probe Chainin g Average Probes for Unsuccessful Search 1 1 2 2(1 ) 1 1

2 2(1 ) 2 1 1 2 2m 20 The hash class Implementation of hashing by using chaining with separate lists Stree class implements ordered sets and maps Hash class implement unordered sets and maps Applications 21

Hash Iterator hIter Referencing Element 22 in Table ht h a s h < in t , h F in t I D > h t ; h a s h < in t , h F in t I D > ::it e r a t o r h I t e r ; h a s h T a b le = h Iter & h t cu rren tB u ck et= 2 * h Iter = 2 2 . cu rren tL o c h f(x) = x b u ck ets [0 ] 1 0 em p ty b u ck ets [1 ] h t b u ck ets [2 ] 2

em p ty b u ck ets [3 ] b u ck ets [4 ] 2 2 2 9 22 Comparing search algorithm Sequential search Binary search Binary search tree hashing 23 Binary Search Tree, Red-Black Tree and AVL Tree Example 50 100 60 70 75

90 60 70 80 50 60 90 70 80 100 90 50 100 78 75

78 B in a r y S e a r c h T r e e ( a ) 78 R e d - B la c k T r e e ( b ) 75 80 A V L T ree (c) 24 Two Binary Search Tree Example 5, 15, 20, 3, 9, 7, 12, 17, 6, 75, 100, 18, 25, 35, 40 D ep th = 4 A v e r a g e c o m p a r is o n s p e r s e a r c h = 3 .4 7 D ep th = 6 A v e r a g e c o m p a r is o n s p e r s e a r c h = 4 .0 5 9

3 5 1 5 2 0 9 7 1 2 3 1 7 6 7 5 1 8 2 0 7

6 1 5 7 5 1 8 1 0 0 2 5 1 0 0 3 5 1 7 1 2 2 5 4 0 3 5 (a)

(b ) 4 0 25 Summary Slide 1 - Hash Table - simulates the fastest searching technique, knowing the index of the required value in a vector and array and apply the index to access the value, by applying a hash function that converts the data to an integer - After obtaining an index by dividing the value from the hash function by the table size and taking the remainder, access the table. Normally, the number of elements in the table is much smaller than the number of distinct data values, so collisions occur. - To handle collisions, we must place a value that collides with an existing table element into the table in such a way that we can efficiently access it later. 26 26 Summary Slide 2 - Hash Table (Cont)

- average running time for a search of a hash table is O(1) - the worst case is O(n) 27 27 Summary Slide 3 - Collision Resolution - Two types: 1) linear open probe addressing - the table is a vector or array of static size - After using the hash function to compute a table index, look up the entry in the table. - If the values match, perform an update if necessary. - If the table entry is empty, insert the value in the table. 28 28 Summary Slide 4 - Collision Resolution (Cont)

- Two types: 1) linear open probe addressing - Otherwise, probe forward circularly, looking for a match or an empty table slot. - If the probe returns to the original starting point, the table is full. - you can search table items that hashed to different table locations. - Deleting an item difficult. 29 29 Summary Slide 5 - Collision Resolution (Cont) 2) chaining with separate lists. - the hash table is a vector of list objects - Each list is a sequence of colliding items. - After applying the hash function to compute the table index, search the list for the data value. - If it is found, update its value; otherwise, insert the value at the back of the list. - you search only items that collided at the 30 same table location

30 Summary Slide 6 - Collision Resolution (Cont) - there is no limitation on the number of values in the table, and deleting an item from the table involves only erasing it from its corresponding list 31 31