Hash Table

A hash table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. This provides a data structure that can be searched in O(1) time. Each bucket can consists of S slots. Each bucket can hold only one record if it contains one slot.

Example: Lets consider a hash table with 26 buckets and 2 slots. Lets consider a hash function F(X) which maps all identifiers X based on the first character in X.
The identifiers A,A2,D,GA,G are hashed into the corresponding buckets/slots as shown below.

Here are the key components and characteristics of a hash table:
    Hash Function:
    A hash function takes a key as input and produces a hash code, which is used as an index to access the corresponding value in the hash table.
    The hash function should be deterministic (same key should always produce the same hash code) and provide a uniform distribution of hash codes to minimize collisions.

    Array or Bucket Array:
  1. The hash table is typically implemented as an array of buckets or slots, where each bucket can store one or more key-value pairs in slots.
    The array size is usually determined based on the expected number of keys and the desired efficiency of the hash table.

  2. Collision Resolution:
    • Collisions occur when two different keys hash to the same index. There are various techniques to handle collisions:
        Open Addressing: In this approach, when a collision occurs, the algorithm looks for the next available slot in the array (e.g., linear probing, quadratic probing, double hashing).
        Separate Chaining: Each bucket contains a linked list, and when a collision occurs, new key-value pairs are appended to the linked list at that bucket.

  3. Load Factor:
  4. The load factor is the ratio of the number of stored elements to the size of the array. It influences the efficiency and performance of the hash table.
    A higher load factor increases the likelihood of collisions but reduces memory usage, while a lower load factor increases memory usage but reduces collisions.

  5. Time Complexity:
    When the hash function is well-designed and collisions are minimized, hash tables can provide constant-time average-case complexity for basic operations like search, insert, and delete.


  1. Example:
    A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on.
Figure shows a hash table of size m=11. In other words, there are m slots in the table, named 0 through 10.
The mapping between an item and the slot where that item belongs in the hash table is called the hash function. The hash function will take any item in the collection and return an integer in the range of slot names, between 0 and m-1. Assume that we have the set of integer items 54, 26, 93, 17, 77, and 31. Our first hash function, sometimes referred to as the “remainder method,” simply takes an item and divides it by the table size, returning the remainder as its hash value (ℎ(k)=k%11). Note that this remainder method (modulo arithmetic) will typically be present in some form in all hash functions, since the result must be in the range of slot names.


Item  

                                                                   Hash Value

54

10

26

4

93

5

17

6

77

0

31

9





Once the hash values have been computed, we can insert each item into the hash table at the designated position as shown below. Note that 6 of the 11 slots are now occupied. This is referred to as the load factor, and is commonly denoted by lambda=Number of items/Table size
For this example, lambda=6/11.

Now when we want to search for an item, we simply use the hash function to compute the slot name for the item and then check the hash table to see if it is present. This searching operation is O(1), since a constant amount of time is required to compute the hash value and then index the hash table at that location. If everything is where it should be, we have found a constant time search algorithm.

You can probably already see that this technique is going to work only if each item maps to a unique location in the hash table. For example, if the item 44 had been the next item in our collection, it would have a hash value of 0 (44%11==0). Since 77 also had a hash value of 0, we would have a problem. According to the hash function, two or more items would need to be in the same slot. This is referred to as a collision (it may also be called a “clash”). Clearly, collisions create a problem for the hashing



Comments

Popular posts from this blog

Data Structures CST 201 KTU Third Semester Syllabus Notes and Solved Questions- Dr Binu V P 984739060

Depth First Search DFS

Binary Search Tree ( BST) and operations