Hash Functions: Division, Mid-Square, Digit Folding, Multiplicative Method, Digit Analysis
A hash function is a mathematical function that takes an input (or "message") of arbitrary size and produces a fixed-size output, called a hash code or hash value. Its primary purpose is to map data of arbitrary size to a fixed-size value efficiently, typically to index data quickly in a hash table or to check the integrity of data.
Here are key characteristics and purposes of hash functions:
Deterministic:
- A hash function is deterministic, meaning that for a given input, it will always produce the same output. This property is crucial for consistent hashing and data integrity verification.
Fixed Output Size:
- Hash functions produce a fixed-size output, regardless of the size of the input. For example, a hash function might always produce a 256-bit hash value.
Efficiency:
- Hash functions should be computationally efficient to calculate the hash value for any input quickly. This efficiency is crucial in applications like hash tables or digital signatures.
Uniformity:
- Ideally, a hash function should produce hash values that are uniformly distributed across its output space. This helps in preventing clustering, where many inputs map to the same hash value.
Irreversibility:
- Cryptographic hash functions are designed to be one-way, meaning it should be computationally infeasible to reverse the process and obtain the original input from the hash value. This property is crucial for security applications; the simple hash table methods described later in this post do not provide it.
Collision Resistance:
- Collision resistance means that it is difficult to find two different inputs that produce the same hash value. A good hash function minimizes the likelihood of collisions.
Avalanche Effect:
- A small change in the input should result in a significantly different hash value. This property ensures that similar inputs don't produce similar hash values.
Cryptographic Hash Functions:
- In security applications, such as data integrity verification and password storage, cryptographic hash functions are used. These functions meet additional criteria to resist various attacks, making it computationally infeasible to find collisions or reverse the hash.
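As a small illustration of determinism, fixed output size, and the avalanche effect, the sketch below uses SHA-256 from Python's standard hashlib module. SHA-256 is picked here only as a familiar cryptographic hash, and the input strings are arbitrary.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

first = sha256_hex("hash table")
again = sha256_hex("hash table")
nearby = sha256_hex("hash tablf")  # only the last character differs

# Deterministic: the same input always gives the same digest.
print(first == again)

# Fixed output size: 256 bits = 64 hex characters, whatever the input length.
print(len(first), len(sha256_hex("x" * 10_000)))

# Avalanche effect: count the hex positions where the two digests differ.
print(sum(a != b for a, b in zip(first, nearby)))
```

The last line usually reports that most of the 64 positions differ, even though the inputs differ by a single character.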
The following hash functions are commonly used in hash table implementations:
Division Method
This is the simplest method for generating a hash value. The hash function divides the key k by M and uses the remainder as the hash value.
Formula:
h(k) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
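The formula and the two worked examples above can be sketched in a few lines of Python. The function name division_hash is only illustrative.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: the remainder of the key k divided by the table size m."""
    if m <= 0:
        raise ValueError("table size m must be positive")
    return k % m

print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0
```

In Python the % operator returns a non-negative result whenever m is positive, so the value can be used directly as a table index.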
Because the remainder of k divided by M always lies between 0 and M − 1, it can be used directly as an index into the hash table; no further reduction is needed.
M is best chosen as a prime number, since a prime divisor tends to distribute the keys more uniformly across the table. Values of M that are powers of 2 or 10 are poor choices, because the remainder then depends only on the low-order bits or digits of the key.
Pros:
- This method works for any table size M and is straightforward to implement.
- The division method is very fast, since it requires only a single division operation.
Cons:
- Consecutive keys map to consecutive hash values in the hash table, which can lead to clustering and poor performance.
- Extra care is needed when choosing the value of M.
Considerations:
Choice of Divisor: The effectiveness of the division method can be influenced by the choice of the divisor. A poorly chosen divisor may result in uneven distribution and increased collisions.
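To make the effect of the divisor concrete, here is a small sketch that hashes the same keys with two different table sizes. The key set (multiples of 10) and the helper name slot_counts are made up for this illustration.

```python
from collections import Counter

def slot_counts(keys, m):
    """Count how many keys land in each slot under h(k) = k mod m."""
    return Counter(k % m for k in keys)

keys = range(0, 100, 10)  # hypothetical keys: 0, 10, 20, ..., 90

print(len(slot_counts(keys, 10)))  # 1: every key collides in slot 0
print(len(slot_counts(keys, 11)))  # 10: each key gets its own slot
```

With M = 10 the divisor shares a factor with every key, so all ten keys pile into one slot; the prime M = 11 spreads them out.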
Conclusion:
The Division Method is a basic and easy-to-understand approach to hash function creation. While it may lack some of the sophistication of more complex hashing techniques, it is suitable for simple applications where a quick and uncomplicated hash function is needed. However, the choice of divisor is critical, and careful consideration should be given to ensure a good distribution of keys in the hash table.
Mid-Square Method
The Mid-Square Method is a technique for creating a hash function by squaring the key and extracting the middle digits of the result. The extracted digits are then used as the hash value. Typically, a modulo operation is applied to ensure the hash value falls within the desired range of indices in the hash table.
Steps:
Square the Key: Take the key and square it.
Select Middle Digits: Extract the middle digits of the squared result. The number of digits chosen depends on the desired size of the hash table.
Modulo Operation (optional): Perform a modulo operation with the size of the hash table to ensure that the hash value falls within the range of table indices.
Example:
Suppose we have a simple hash table with 10 slots (0 to 9), and we want to hash the key "725" using the mid-square method.
Square the Key: 725^2 = 525625.
Select Middle Digits: Since we have six digits in the result, let's select the middle two digits: 56
Modulo Operation (optional): Since our hash table has 10 slots, perform a modulo operation:
56 mod 10 = 6
So, the hash value for the key "725" using the mid-square method is 6, and we would store the corresponding data in the slot 6 of our hash table.
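The three steps can be sketched as follows. The function name, the default of two middle digits, and the rule for picking the "middle" when the squared value has no exact centre are assumptions made for this illustration.

```python
def mid_square_hash(key: int, table_size: int, num_digits: int = 2) -> int:
    """Square the key, take the middle num_digits digits, reduce mod table_size."""
    squared = str(key * key)
    if len(squared) <= num_digits:
        middle = squared  # too short to trim, use the whole square
    else:
        # Integer division leans one digit left when no exact middle exists.
        start = (len(squared) - num_digits) // 2
        middle = squared[start:start + num_digits]
    return int(middle) % table_size

print(mid_square_hash(725, 10))  # 725^2 = 525625 -> "56" -> 56 mod 10 = 6
```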
Pros:
Simplicity: The mid-square method is easy to understand and implement.
Ease of Computation: Squaring operations and digit extraction are generally efficient.
The result is not dominated by the distribution of the top digit or bottom digit of the original key value.
Cons:
Sensitivity to Key Distribution: The effectiveness of the mid-square method depends on the keys being hashed. If the keys are poorly distributed, squaring them might not distribute the hash values well either.
Key Size: The square of a key has roughly twice as many digits as the key itself, so large keys produce very large intermediate values.
Collisions: Different keys can still share the same middle digits, so collisions occur and must be handled by the hash table.
Potential for Clustering: Depending on the keys and the length of the squared result, there might be a risk of clustering.
Conclusion:
The Mid-Square Method is a basic and intuitive approach to hash function creation. It is suitable for simple applications where a quick and straightforward hash function is needed. However, careful consideration should be given to the distribution of the keys and to which middle digits are extracted, to ensure a more uniform distribution of keys in the hash table. Additionally, the method may not be as robust as more sophisticated hashing techniques in certain scenarios.
Digit Folding Method
The Digit Folding Method is a technique for creating a hash function by dividing the key into equal-sized chunks (usually digits) and adding these chunks together. The result is then subjected to a modulo operation to ensure the hash value falls within the desired range of indices in the hash table.
Steps:
Divide the Key into Chunks:
Divide the key into equal-sized chunks. The size of the chunks depends on the desired characteristics of the hash function.
Add the Chunks:
Add together the values of the chunks.
Modulo Operation (optional): Perform a modulo operation with the size of the hash table to ensure that the hash value falls within the range of table indices.
There are two ways of carrying out this addition. In the first, all parts but the last are shifted so that the least significant digit of each part lines up with the least significant digit of the last part, and the parts are then added together. This method is known as shift folding. The second method is folding at the boundaries: the identifier is folded at the part boundaries, like a strip of paper being creased, so that every second part is reversed, and the digits falling into the same position are added together.
Example: Let the identifier be 12320324111220
Example:
Suppose we have a simple hash table with 10 slots (0 to 9), and we want to hash the key "725" using the digit folding method.
Divide the Key into Chunks: For simplicity, let's use individual digits as chunks: 7, 2, 5
Add the Chunks: 7 + 2 + 5 = 14
Modulo Operation (optional): Since our hash table has 10 slots, perform a modulo operation:
14 mod 10 = 4
So, the hash value for the key "725" using the digit folding method is 4, and we would store the corresponding data in the slot 4 of our hash table.
Example: (shift folding)
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(k) = 51
Example: (folding at boundaries)
k = 12345
k1 = 12, k2 = 34, k3 = 5
The digits of every second part are reversed, so k2 = 34 becomes 43:
s = k1 + reverse(k2) + k3
= 12 + 43 + 5
= 60
h(k) = 60
Note:
The number of digits in each part depends on the size of the hash table. For example, if the size of the hash table is 100, each part must have two digits, except for the last part, which can have fewer. If the sum does not fit within the table size, a modulo operation with the table size brings it back into range.
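Both folding variants can be sketched in Python as below. The function names are illustrative, and the chunks are taken from left to right with a shorter final chunk, matching the examples above.

```python
def split_digits(key: int, chunk_size: int):
    """Split the decimal digits of key into chunks, left to right."""
    digits = str(key)
    return [digits[i:i + chunk_size] for i in range(0, len(digits), chunk_size)]

def shift_fold(key: int, chunk_size: int) -> int:
    """Shift folding: add the chunks as they are."""
    return sum(int(chunk) for chunk in split_digits(key, chunk_size))

def boundary_fold(key: int, chunk_size: int) -> int:
    """Folding at the boundaries: reverse every second chunk, then add."""
    chunks = split_digits(key, chunk_size)
    return sum(int(c[::-1]) if i % 2 == 1 else int(c)
               for i, c in enumerate(chunks))

print(shift_fold(12345, 2))     # 12 + 34 + 5 = 51
print(boundary_fold(12345, 2))  # 12 + 43 + 5 = 60
print(shift_fold(725, 1) % 10)  # 7 + 2 + 5 = 14, and 14 mod 10 = 4
```

Applied to the identifier 12320324111220 with three-digit parts (the part size is an assumption here), shift_fold gives 123 + 203 + 241 + 112 + 20 = 699 and boundary_fold gives 123 + 302 + 241 + 211 + 20 = 897.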
Advantages:
Simplicity: The digit folding method is easy to understand and implement.
Adaptability: The method can be adjusted by choosing different chunk sizes based on the characteristics of the data.
Considerations:
Chunk Size: The choice of chunk size can impact the distribution of keys. Experimentation may be needed to find an optimal chunk size.
Potential for Clustering: Depending on the chosen chunk size and the distribution of keys, there might be a risk of clustering.
Conclusion:
The Digit Folding Method is a straightforward approach to hash function creation. It is suitable for simple applications where a quick and adaptable hash function is needed. However, as with any hashing method, considerations should be given to the choice of parameters to ensure a good distribution of keys in the hash table. Additionally, more sophisticated techniques might be preferred for certain scenarios, especially in security-sensitive applications.
Multiplicative Method
The multiplicative method is simple and efficient. It multiplies the key by a constant A (a real number between 0 and 1) and extracts the fractional part of the product. The result is then multiplied by the size of the hash table to obtain the final hash value.
Here's a step-by-step explanation of the multiplicative hash function:
Formula:
hash value = ⌊table size × ((key × A) mod 1)⌋
Key Components:
key: The input key that needs to be hashed.
A: A constant multiplier, chosen strictly between 0 and 1.
table size: The size of the hash table.
Steps:
Multiply the key by a constant A.
- product = key × A
Take the fractional part of the product by performing a modulo operation with 1.
- fractional part = product mod 1
Multiply the fractional part by the size of the hash table and round down.
- hash value = ⌊table size × fractional part⌋
Final Hash Value:
- The result is the final hash value, which represents the index in the hash table where the data associated with the key should be stored.
Advantages:
Simplicity: The multiplicative hash function is easy to implement.
Good Distribution: When the constant A is carefully chosen, the function can provide a good distribution of hash values.
Considerations:
Choice of Constant (A): The choice of the constant A is crucial. A commonly recommended value is A = (sqrt(5) − 1)/2 ≈ 0.618, the reciprocal of the golden ratio. This choice helps in achieving a better distribution of hash values.
Example: Suppose we want to hash the key k = 725 into a hash table of size 10 using the multiplicative hash function. Let A = (sqrt(5) − 1)/2 ≈ 0.618034.
product = 725 × A ≈ 725 × 0.618034 ≈ 448.07
fractional part = 448.07 mod 1 ≈ 0.07
hash value = ⌊10 × 0.07⌋ = 0
So, the hash value for the key 725 using the multiplicative hash function is 0, and we would store the corresponding data in the slot 0 of our hash table.
The success of the multiplicative hash function often relies on careful selection of the constant A to achieve good distribution properties.
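A minimal floating-point sketch of the multiplicative method is shown below. The function name is illustrative, and using Python floats is a simplification chosen for readability; implementations that care about speed or exactness often use integer arithmetic instead.

```python
import math

A = (math.sqrt(5) - 1) / 2  # about 0.618, the constant used in the example

def multiplicative_hash(key: int, table_size: int, a: float = A) -> int:
    """Multiplicative method: floor(table_size * fractional part of key * a)."""
    fractional = (key * a) % 1.0
    return math.floor(table_size * fractional)

# 725 * A is about 448.07, so the fractional part is about 0.07.
print(multiplicative_hash(725, 10))  # 0
```

Unlike the division method, the table size here does not need to be prime, because the spreading of keys comes from the constant A rather than from the divisor.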
Digit Analysis Method
How does the digit analysis method work?
Digit analysis is intended for cases where the set of keys is known in advance: the digits at each position are examined across all keys, and only positions whose digits are close to uniformly distributed are used to build the hash value.
Define the desired output length: This determines the number of digits in the hash value.
Extract digits from the input: This can be done in various ways, such as selecting specific positions within the input or using a modulo operation.
Discard non-uniform digits: Analyse the extracted digits and discard the positions where some digits appear too frequently or too rarely. This promotes a more uniform distribution of hash values, which reduces collisions.
Combine remaining digits: Form the final hash value by combining the remaining digits in a specific order.
Example of Digit Analysis Method:
Let's consider the following scenario:
Input: We want to create a hash function for a set of five-digit student IDs, ranging from 10000 to 99999.
Desired output length: 3 digits
Algorithm:
Extract digits: We choose the 2nd, 4th, and 5th digits of the student ID as candidates.
Analyse digit distribution:
- 2nd digit: This position has a skewed distribution. For example, more student IDs have 2 in the second position than any other digit.
- 4th digit: The distribution at this position is more uniform.
- 5th digit: This position also exhibits a skewed distribution, similar to the 2nd digit.
Discard non-uniform digits: Because the 2nd and 5th positions are biased, we discard them.
Combine remaining digits: Only the 4th digit remains, so it alone forms the final hash value, which in this example is one digit long rather than the three originally targeted.
Suppose we have two student IDs: 12345 and 65432.
For 12345, the 4th digit is 4, so the hash value is 4.
For 65432, the 4th digit is 3, so the hash value is 3.
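The selection of digit positions can be sketched in Python as below. The skew measure (most common digit count minus least common digit count), the function names, and the sample keys are all assumptions made for this illustration; other uniformity measures would work as well.

```python
from collections import Counter

def digit_skew(keys, position):
    """Gap between the most and least common digit at a 0-based position."""
    counts = Counter(str(k)[position] for k in keys)
    freq = [counts.get(str(d), 0) for d in range(10)]
    return max(freq) - min(freq)

def choose_positions(keys, how_many):
    """Pick the how_many positions whose digits are closest to uniform."""
    width = len(str(keys[0]))  # assumes all keys have the same number of digits
    ranked = sorted(range(width), key=lambda p: digit_skew(keys, p))
    return sorted(ranked[:how_many])

def digit_analysis_hash(key, positions):
    """Build the hash value from the digits at the chosen positions."""
    digits = str(key)
    return int("".join(digits[p] for p in positions))

# Hypothetical IDs: positions 0, 1 and 3 never change, positions 2 and 4 vary.
keys = [12030 + 100 * d + (3 * d) % 10 for d in range(10)]

positions = choose_positions(keys, 2)
print(positions)                              # [2, 4]
print(digit_analysis_hash(12535, positions))  # 55
```

Positions are 0-based here, so position 2 is the 3rd digit and position 4 is the 5th digit of each ID.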