Collision Resolution Techniques- Open Addressing and Chaining

Collision resolution is a crucial aspect of hashing because it addresses the situation where two different keys hash to the same index in a hash table. There are various collision resolution techniques, each with its own advantages and disadvantages. Here are some common collision resolution techniques:

A. Open Addressing:

In open addressing, when a collision occurs, the algorithm looks for the next available slot in the hash table until an empty slot is found. In Open Addressing, all elements are stored in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys (Note that we can increase table size by copying old data if needed). This approach is also known as closed hashing. This entire procedure is based upon probing. We will understand the types of probing ahead:

Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.

Search(k): Keep probing until the slot’s key doesn’t become equal to k or an empty slot is reached.

Delete(k): Delete operation is interesting. If we simply delete a key, then the search may fail. So slots of deleted keys are marked specially as “deleted". The insert can insert an item in a deleted slot, but the search doesn’t stop at a deleted slot.




There are different variations of open addressing, including:

1.Linear Probing: 
The algorithm looks for the next available slot linearly (sequentially) in the array.
# Linear Probing
hash_code = (hash1(key) + i) % table_size
Advantages: Simple and easy to implement.
Disadvantages: Prone to clustering, where consecutive slots get filled, potentially leading to further collisions.

2.Quadratic Probing: 
The algorithm looks for the next slot using a quadratic function.
# Quadratic Probing
hash_code = (hash1(key) + c1 * i + c2 * i^2) % table_size
Advantages: Helps mitigate clustering to some extent.
Disadvantages: May still result in clustering, and the quadratic function needs to be chosen carefully to avoid infinite loops.

3.Double Hashing: The algorithm uses a secondary hash function to determine the step size for probing.
# Double Hashing
hash_code = (hash1(key) + i * hash2(key)) % table_size

Advantages: Generally provides a good distribution of keys, reduces clustering.
Disadvantages: Requires a good choice of the second hash function, and if not carefully designed, it may still lead to clustering.

Each open addressing technique has its advantages and disadvantages, and the choice between them depends on factors such as the desired trade-off between simplicity and efficiency, the distribution of hash values, and the characteristics of the data being stored. The goal is to efficiently find an empty slot in the hash table while avoiding excessive clustering and maintaining a good distribution of keys.

Advantages: No additional data structures required, cache-friendly.
Disadvantages: Increased likelihood of clustering (groups of consecutive filled slots), which can impact performance.

B. Separate Chaining:
    Chaining is a collision resolution technique in hashing where each slot (index) in the hash table maintains a linked list of elements. When a collision occurs, meaning two keys hash to the same index, the new element is simply appended to the linked list associated with that index.





  1. Implementation:

    Hash Function: A hash function is used to determine the index for each key. The goal is to distribute the keys as uniformly as possible across the available slots.

    Linked Lists: Each slot in the hash table contains a linked list (or sometimes another data structure like a binary search tree) to handle multiple elements that hash to the same index.

    Collision Handling: When
     a collision occurs (i.e., two keys hash to the same index), the new key-value pair is inserted at the end of the linked list at that index.

    Searching: To retrieve a value, the hash function is used to find the index, and then a search is performed in the linked list at that index.

    # Collision resolution using Separate Chaining
    hash_table[index].append((key, value))

  2. Advantages:
  • Simple Implementation: Chaining is relatively straightforward to implement compared to some other collision resolution techniques.
  • Efficient for Multiple Collisions: It handles scenarios where multiple keys hash to the same index efficiently by creating a linked list at each slot.
Disadvantages:

  • Memory Overhead: Requires additional memory for maintaining pointers in the linked lists, which can lead to increased memory usage.
  • Performance: The efficiency of chaining can degrade if the linked lists become too long, leading to longer search times.
  1. Performance Considerations:
  • The performance of chaining is influenced by the length of the linked lists. If the lists become too long, it can lead to increased search times.
  • Choosing a good hash function is crucial to achieving a uniform distribution of keys and minimizing collisions.

  1. Conclusion:
    Chaining is a practical and intuitive method for handling collisions in hashing. Its simplicity makes it a good choice for scenarios where the number of collisions is expected to be reasonably low or when dealing with variable-length keys. Understanding the characteristics of the data and designing an effective hash function are essential for optimizing the performance of the chaining method.

  2. S.No.Separate ChainingOpen Addressing
    1.Chaining is Simpler to implement.Open Addressing requires more computation.
    2.In chaining, Hash table never fills up, we can always add more elements to chain.In open addressing, table may become full.
    3.Chaining is Less sensitive to the hash function or load factors.Open addressing requires extra care to avoid clustering and load factor.
    4.Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.Open addressing is used when the frequency and number of keys is known.
    5.Cache performance of chaining is not good as keys are stored using linked list.Open addressing provides better cache performance as everything is stored in the same table.
    6.Wastage of Space (Some Parts of hash table in chaining are never
     used).
    In Open addressing, a slot can be used even if an input doesn’t map to it.
    7.Chaining uses extra space for links.No links in Open addressing

Examples; University Questions

Let the size of a hash table is 10. The index of the hash table varies from 0 to 9. Assume the keys 73, 54, 15, 48, 89, 66, 37, 18, 41, 22, 62 are mapped using modulo operator.

Show how the keys are distributed using chaining method.


Apply the hash function h(x) = x mod 7 for linear probing on the data 2341, 4234,2839, 430, 22, 397, 3920 and show the resulting hash table

Note:red indicates collision and the data is placed in free  position considering the linear probing technique.

Example:
Hash the following keys using open chaining method and closed linear probing method. Size of table is 7 and the Hash function H(K) = K mod 7.Keys ={16, 21, 23, 50, 19, 26}

linear probing

chaining


Example:
Hash the following keys using open chaining method and closed linear probing method. Size of table is 11 and the Hash function H(K) = K mod 11.
Keys ={17, 22, 34, 23, 19, 66}

Linear probing

Chaining


Comments

Popular posts from this blog

Data Structures CST 201 KTU Third Semester Syllabus Notes and Solved Questions- Dr Binu V P 984739060

Depth First Search DFS

Binary Search Tree ( BST) and operations