CS205

Hashing Basics

Learn how hash functions convert keys into array indices for fast lookups.

What is a Hash Function?

A hash function takes an input (key) and returns a fixed-size integer called a hash code. This hash code is then used to determine where to store the data in an array.

key → hashCode() → bucket index

Good hash functions distribute keys evenly across buckets, minimizing collisions.

Java's hashCode() Method

In Java, every object has a hashCode() method. For Strings, it's calculated using character values:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

The multiplier 31 is chosen because it's an odd prime, which helps distribute hash codes more evenly.

Hash Calculator
8

Try these examples:

Key Points

Hash Code to Index

The bucket index is calculated using the modulo operation:

index = hashCode % capacity

This ensures the index is always within [0, capacity-1].

Collisions

When two different keys hash to the same index, we have a collision.

Java's HashMap handles collisions using chaining — storing multiple entries in the same bucket as a linked list.

Why Use Modulo?

Hash codes can be very large numbers (up to 2^31 - 1 in Java). We can't create an array that large! The modulo operation maps any hash code to a valid array index.

96354
Hash Code
% 8
Modulo Capacity
2
Bucket Index

Different capacities will give different indices for the same key. This is why rehashing (changing capacity) requires recalculating all indices.