AdvancedReading time: ~15 min

Internal Workings

hashCode, equals, hashing, collision handling and Big-O complexity

Collections Framework performance is shaped by the equals() / hashCode() contract, hashing strategy, collision handling, resizing rules, memory layout, and cache behavior.


1. Definition

Collection internals describe how structures such as HashMap, HashSet, TreeMap, and ArrayList actually store, locate, and move data. This topic matters because saying an operation is “fast” is incomplete: you also need to know why it is fast, what assumptions that speed depends on, and when those assumptions break down.

At the API level, collections look simple:

  • map.get(key)
  • set.contains(value)
  • list.add(item)
  • list.get(index)

Under the surface, these operations rely on different storage models:

  • arrays
  • linked structures
  • hash buckets
  • balanced trees
  • comparator-based ordering

Strong interview answers connect the public API to the hidden representation. For example:

  • HashMap#get() is fast on average because a hash narrows the search to a bucket
  • ArrayList#get(index) is fast because it is direct indexed array access
  • TreeMap#get() is slower than HashMap#get(), but it buys you sorted navigation and range operations

The real skill is understanding not just the interface contract, but the mechanics behind the contract.


2. Core Concepts

2.1 The `equals()` and `hashCode()` contract

Hash-based collections depend on two methods working together:

  • equals() decides logical equality
  • hashCode() determines the first placement route into buckets

The key rule is:

If two objects are equal according to equals(), they must return the same hashCode().

If that rule is broken:

  • HashSet may accept logical duplicates
  • HashMap#get(key) may fail to find an entry that was previously inserted
  • contains() results can become surprising and inconsistent

This is why records are often such a good fit for hash keys: they generate value-based equals() and hashCode() automatically.

2.2 Hashing and bucket selection

In a hash table, the collection does not scan every element from the start. Instead, it:

  1. computes the key’s hash code
  2. spreads or mixes the bits
  3. maps the result to a bucket index
  4. searches only inside that bucket

This is what makes average-case lookup close to $O(1)$.

But “constant time” does not mean “free”:

  • the hash still has to be computed
  • collisions still require comparisons
  • resizing still causes occasional spikes
  • memory locality still affects actual CPU cost

2.3 Collision handling

A collision happens when two different keys end up in the same bucket.

That does not automatically mean an error. It just means the map has to perform more work inside that bucket.

Historically, bucket contents were stored as linked lists. Since Java 8, heavily collided buckets can be treeified into Red-Black Trees.

That means:

  • average-case behavior remains hash-table-like
  • worst-case behavior improves from linear to logarithmic in treeified buckets
  • malicious hash flooding becomes less effective

2.4 Load factor and resizing

Hash-based structures do not wait until the array is completely full before growing.

HashMap defaults:

  • initial capacity: 16
  • load factor: 0.75

Resize threshold is calculated as:

$$ threshold = capacity \times loadFactor $$

So for capacity 16, the first resize happens when the structure grows past 12 entries.

Resizing means:

  • allocate a larger table
  • redistribute entries
  • potentially recompute bucket placement

That is why pre-sizing matters in bulk insert scenarios.

2.5 Array-backed vs node-backed storage

ArrayList stores references in a contiguous array. LinkedList stores each value in a separate node object connected by references.

This difference has major consequences:

Structure Access pattern Memory layout Real-world effect
ArrayList Indexed Contiguous Excellent cache locality
LinkedList Pointer chasing Scattered nodes Poor cache locality

This is why theoretical complexity alone can be misleading:

  • LinkedList has attractive-looking endpoint operations
  • ArrayList still often wins in practice because CPUs love contiguous memory

2.6 Tree-based ordering

TreeMap and TreeSet use balanced tree structures rather than hashing.

That means:

  • no bucket logic
  • no hash-based lookup
  • natural or comparator-defined ordering
  • predictable $O(\log n)$ operations

You trade raw average-case speed for ordering and navigation capabilities like:

  • firstKey()
  • lastKey()
  • floorKey()
  • ceilingKey()
  • subMap()

2.7 Big-O vs actual runtime

Developers often stop at the complexity label. That is useful, but incomplete.

Real runtime also depends on:

  • memory allocation patterns
  • branch prediction
  • cache misses
  • object header overhead
  • comparator cost
  • key distribution quality

So the best answer is not just “HashMap is $O(1)$”, but:

“HashMap is average $O(1)$ with well-distributed hashes, but performance depends on key quality, resize frequency, and collision behavior.”


3. Practical Usage

The practical question behind internals is simple:

Which implementation fits the workload and failure mode I actually have?

3.1 Rules that prevent real bugs

  • Use immutable keys in HashMap
  • Override equals() and hashCode() together
  • Do not rely on HashMap iteration order
  • Pre-size large maps and lists during bulk loading
  • Use sorted structures only when ordering is a real requirement

3.2 Practical consequences of key design

Good key design usually means:

  • immutable fields
  • stable equality semantics
  • stable hash code over the object’s lifetime
  • reasonably distributed values

Bad key design leads to:

  • lost lookups
  • duplicate-looking entries
  • throughput degradation
  • hard-to-reproduce bugs in production

3.3 When average case is not enough

In interviews and performance discussions, explicitly distinguish:

  • average case
  • amortized case
  • worst case

Example:

  • ArrayList#add(e) at the end is amortized $O(1)$
  • HashMap#get(k) is average $O(1)$
  • TreeMap#get(k) is worst-case $O(\log n)$ and also average $O(\log n)$

That distinction signals much stronger understanding than reciting one number.

3.4 When ordering becomes part of correctness

Ordering is not always just cosmetic.

It can matter for:

  • deterministic tests
  • stable API responses
  • user-visible lists
  • reproducible logs
  • cache eviction logic

If order matters, choose a structure that guarantees it instead of hoping current JVM behavior stays stable.

3.5 Pre-sizing as a workload decision

Pre-sizing is worth considering when:

  • parsing large files
  • importing CSV or JSON in bulk
  • building caches during startup
  • grouping large datasets
  • creating temporary maps inside tight loops

If you already know approximately how many elements will be inserted, capacity planning is an easy win.


4. Code Examples

Basic Example — stable hash key design

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class UserKey {
    private final String email;

    UserKey(String email) {
        this.email = email;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof UserKey userKey)) {
            return false;
        }
        return Objects.equals(email, userKey.email);
    }

    @Override
    public int hashCode() {
        return Objects.hash(email);
    }
}

public class StableKeyDemo {
    public static void main(String[] args) {
        Map<UserKey, String> users = new HashMap<>();
        users.put(new UserKey("anna@example.com"), "Anna");

        System.out.println(users.get(new UserKey("anna@example.com")));
    }
}

This works because lookup computes the same logical hash path and equality decision as insertion.

Advanced Example — mutable key trap

import java.util.HashMap;
import java.util.Map;

public class MutableKeyTrap {
    static class Key {
        String id;

        Key(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Key key && id.equals(key.id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();

        Key key = new Key("A-1");
        map.put(key, "stored");

        key.id = "B-2";

        System.out.println(map.get(key));
        System.out.println(map.containsKey(key));
    }
}

The entry is still in the table, but the key’s new hash no longer points to the original bucket.

Advanced Example — comparator inconsistency

import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

record Person(String id, String name) {}

public class ComparatorTrap {
    public static void main(String[] args) {
        Comparator<Person> byNameOnly = Comparator.comparing(Person::name);
        Set<Person> people = new TreeSet<>(byNameOnly);

        people.add(new Person("1", "Alice"));
        people.add(new Person("2", "Alice"));

        System.out.println(people.size());
    }
}

If comparator equality is broader than logical equality, the set behaves according to the comparator, not your intention.

Practical Example — pre-sizing a map

import java.util.HashMap;
import java.util.Map;

public class PresizingExample {
    public static void main(String[] args) {
        int expectedEntries = 10_000;
        int capacity = (int) (expectedEntries / 0.75f) + 1;

        Map<String, Integer> counts = new HashMap<>(capacity);
    }
}

This avoids repeated rehashing during large inserts.


5. Trade-offs

Aspect Advantage Disadvantage Typical consequence
Hash-based storage Fast average lookup Sensitive to key quality Great default for plain lookup
Tree-based storage Sorted order, navigation Slower constant factors Best for range operations
Array-backed storage Cache-friendly reads Middle inserts shift elements Excellent default list
Node-backed storage Fast endpoint updates High overhead and poor locality Rarely best as a general list
Rich key objects Expressive domain modeling Easy to break equality contracts Requires discipline

The important pattern is that no structure is universally best. Each one encodes a bias toward certain workloads.


6. Common Mistakes

1. Overriding `equals()` without `hashCode()`

❌ Wrong mental model: “Equality is enough.”

✅ Correct model: hash-based collections need both methods to agree.

2. Using mutable keys in hash structures

If a field participating in equality or hashing changes after insertion, lookup behavior becomes unreliable.

3. Assuming every operation is always $O(1)$

That only makes sense when you name the implementation and the assumptions.

4. Forgetting comparator consistency in sorted sets and maps

If your comparator says two values compare as equal, the structure will treat them as equivalent positions.

5. Confusing iteration order with sorted order

LinkedHashMap is ordered by insertion or access. TreeMap is ordered by key comparator. HashMap is not a stable ordering API.

6. Ignoring memory overhead

Millions of node-heavy entries behave very differently from millions of array-backed references.


7. Deep Dive

7.1 Why `ArrayList` often beats `LinkedList`

The textbook story makes linked lists sound attractive. Real CPUs disagree.

ArrayList wins often because:

  • data is contiguous
  • prefetching works well
  • there is less object overhead
  • traversal avoids pointer chasing

LinkedList often loses because:

  • each node is a separate object
  • references scatter across the heap
  • cache misses dominate runtime

7.2 Hash flooding and defensive evolution of `HashMap`

Poorly distributed hashes cause long bucket chains. In adversarial settings, that can become a denial-of-service problem.

Java 8 improved resilience by treeifying large buckets when collision pressure becomes high enough.

That does not remove the need for good key design, but it does reduce the damage of pathological cases.

7.3 Fail-fast iterators are bug detectors, not thread-safety features

Many collections use a modification count to detect structural changes during iteration.

This leads to ConcurrentModificationException in many invalid mutation patterns.

Important nuance:

  • fail-fast is best-effort
  • it is not a synchronization mechanism
  • it does not make the collection thread-safe

7.4 Memory overhead is part of performance

Performance is not just CPU time. It is also:

  • total object count
  • GC pressure
  • pointer chasing
  • retained heap size

This is why “theoretically nice” structures can still underperform badly in production.

7.5 Internals influence API design choices

If you understand internals, you write better APIs:

  • you avoid exposing List when only iteration is needed
  • you know when returning an immutable copy is worth the allocation
  • you pick map implementations intentionally instead of reflexively
  • you document ordering guarantees precisely

That is the real value of this topic: not memorizing trivia, but making better engineering decisions.


8. Interview Questions

Why is `HashMap#get()` only average $O(1)$ and not guaranteed $O(1)$?

Because the operation still depends on hash distribution, collisions, and bucket structure. In bad cases, lookup costs more work inside a bucket.

Why can a `HashMap` “lose” an entry even though it still exists internally?

Because if a mutable key changes after insertion, its new hash/equality path no longer matches the original bucket placement.

Why can `ArrayList` outperform `LinkedList` even for operations that look expensive on paper?

Because contiguous memory and CPU cache locality often matter more in practice than the theoretical linked-node advantage.

What is the most common production bug tied to collection internals?

Broken equals() / hashCode() contracts and mutable map keys are among the most common sources of subtle lookup and deduplication bugs.


9. Glossary

Term Meaning
bucket An internal slot in a hash table selected from a hash-derived index
collision Two different keys landing in the same bucket
load factor The fill threshold that triggers growth in a hash-based structure
rehash Resizing the table and redistributing entries
treeification Converting a heavily collided bucket into a balanced tree
cache locality How well data layout fits CPU cache lines
pointer chasing Following object references through scattered memory
fail-fast iterator Iterator that detects many structural modifications during traversal
worst case The most expensive valid execution path for an operation
amortized complexity Average complexity over a sequence of operations, including occasional spikes

10. Cheatsheet

  • HashMap lookup is average fast, not magically constant under every condition
  • equals() and hashCode() form one contract for hash-based structures
  • Mutable keys are dangerous in HashMap and HashSet
  • TreeMap trades speed for ordering and navigation
  • ArrayList is cache-friendly and usually the default list choice
  • LinkedList is rarely the best general-purpose list
  • Pre-sizing reduces resize and rehash overhead in bulk workloads
  • Comparator consistency matters in TreeSet and TreeMap
  • Fail-fast iterators detect many invalid mutations, but do not provide thread safety
  • Good internal understanding leads to better API and performance decisions

If you get stuck in an interview, return to three anchors for Internal Workings: precise definition, dominant use case, and the main failure mode.

🎼 Games

10 questions