Longest Consecutive Sequence
Introduction
An unsorted array of integers arrives, and the question is: what is the longest run of consecutive values — possibly scattered throughout the array — that together cover an interval with no gaps?
The obvious move is to sort and scan. It works in O(n log n) and is the answer most candidates produce. What makes this problem an interview favorite is that there is a better answer — one with amortized O(n) time — and reaching for it requires the right move at the right time. That move is: build a hash set, then only walk forward from chain heads (values x whose predecessor x − 1 is not in the set). Skipping non-heads is what keeps the nested loop from being quadratic.
The trick isn’t the hash set. It’s knowing which elements NOT to start walking from.
If you can’t see immediately why the chain-head filter saves the asymptote, read the amortized analysis in the recommended approach — it’s the whole interview signal. If you can, skip to the union-find treatment: it’s the third lens and it matters when the input is streaming.
What the interviewer is actually testing
- Willingness to move past the first idea. Candidates who sort and scan, state O(n log n), and stop are giving the right answer with the wrong complexity. The test is whether you explicitly ask “can we do better?” even after finding a correct solution.
- The chain-head insight. The move from “walk from every element” to “walk only from chain heads” is non-obvious. Naming it out loud — “I’ll start only from values whose predecessor is absent” — is the signal.
- Amortized reasoning. The hash-set approach has a nested loop and looks like it could be O(n²). Explaining why it’s not — each element is visited at most twice total across all walks — separates mid-level candidates from senior ones.
- Tool selection. Knowing when union-find is the right tool (streaming updates, merging regions) versus when it’s overkill (static one-shot) is the senior signal.
Problem statement
Given an unsorted array of integers A, return the length of the longest subsequence of A whose values form a consecutive range (gaps of exactly 1 between sorted neighbors).
The subsequence need not appear contiguously in A; it just needs to exist as a set of values in A.
Examples:
- A = [100, 4, 200, 1, 3, 2] → 4 (the run 1, 2, 3, 4)
- A = [45, 31, 46, 235, 28, 30, 29, 47] → 4 (the run 28, 29, 30, 31)
- A = [1, 2, 2, 3, 0, -1] → 5 (the run −1, 0, 1, 2, 3; the duplicate 2 doesn’t help)
Clarifying questions — and why each one matters
- Q Can the array contain duplicates?
- A Ask. Usually yes. Duplicates collapse under a set-based solution (idempotent), but they inflate a naive sorted scan if you don't dedupe. This affects which counter you reset on repeats.
- Q Can values be negative, or near INT_MIN/INT_MAX?
- A Usually yes to negatives. At the int boundary, walking x + 1 could overflow if you don't guard — rare in interview settings but worth mentioning.
- Q Can the input be modified?
- A Matters for the sort-scan baseline: if you can't mutate, you either copy or sort indices. The hash-set solution is read-only on the input.
- Q What's the upper bound on n?
- A Up to 10⁶ is typical. That's well within O(n) budget but suggests the interviewer wants you to beat O(n log n). Hash set is the expected answer.
- Q Is the input streaming — do numbers arrive one at a time?
- A Different problem shape. Streaming favors union-find: each new number is a singleton; try to merge with x − 1 and x + 1. Keep a running maximum component size.
- Q Should we return the sequence itself, or just its length?
- A Default is just the length. If the sequence is requested, remember the chain head that produced the winning length, then re-walk to materialize the values. No asymptotic change.
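The re-walk idea in that last answer can be sketched directly. This is an illustrative sketch (the function name `longestConsecutiveValues` is my own) that assumes the chain-head walk developed later in the chapter: remember the head that produced the winning length, then walk forward once more to materialize the values.

```cpp
#include <unordered_set>
#include <vector>

// Sketch: return the winning sequence itself, not just its length.
// Uses the chain-head walk; remembers the head of the best chain, then
// re-walks once to materialize the values. Names are illustrative.
std::vector<int> longestConsecutiveValues(const std::vector<int>& nums) {
  std::unordered_set<int> present(nums.begin(), nums.end());
  int bestLen = 0;
  int bestHead = 0;
  for (const int x : present) {
    if (present.count(x - 1)) continue;  // not a chain head
    int len = 1;
    for (int y = x + 1; present.count(y); ++y) ++len;
    if (len > bestLen) { bestLen = len; bestHead = x; }
  }
  std::vector<int> out;
  out.reserve(bestLen);
  for (int i = 0; i < bestLen; ++i) out.push_back(bestHead + i);
  return out;
}
```

Both passes are linear, so the asymptotics are unchanged, as the answer above notes.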
A mental framework — sort is the obvious move; asking “can I do better” is the signal
If the array were sorted, the problem is trivial: walk the array once, tracking a running-length counter that resets whenever the gap between consecutive sorted values isn’t exactly 1. That gives O(n log n) for the sort plus O(n) for the scan — a correct and clean solution.
The question an interviewer hopes you’ll ask yourself is: do we actually need the elements sorted, or do we just need to know which values are present? Sorting buys more information than this problem uses; all it needs is set membership — “is value y somewhere in the input?”. That’s exactly the operation a hash set gives you.
Once you have membership, the shape of the algorithm becomes: for each starting value x, walk forward through x + 1, x + 2, ... as long as each is in the set. The longest walk wins.
The trap: a naive version runs that walk from every starting value. On the input [1, 2, 3, ..., n], starting from 1 walks through all n, starting from 2 walks through n − 1, and so on — O(n²) worst case. The fix is the chain-head filter in the recommended approach.
Discussion
Sort and scan — the obvious baseline
Sort, dedupe, walk the array. Reset the run counter whenever the next element isn't exactly one greater than the previous.
The interview narration
Say this out loud:
“If the array were sorted, I could find the longest consecutive run with a single linear scan — reset a counter on every gap. So: sort first, then scan. O(n log n) for the sort dominates. Let me code it as a baseline and then try to do better.”
How it works, step by step
- Sort the array in place (or a copy, if the input is immutable).
- Dedupe. Consecutive duplicates would look like a gap of 0, not 1 — they’d reset the counter incorrectly. Removing them with `std::unique` after sorting keeps the scan clean. Alternatively, special-case “gap of 0 → don’t reset, don’t increment” during the scan.
- Linear scan with a running `run` counter and a `best` tracker. For each i ≥ 1:
  - If `nums[i] == nums[i - 1] + 1`: extend the current run. Update `best` if needed.
  - Otherwise: reset `run` to 1 (the current element alone).
- Return `best`.
The code
```cpp
#include <algorithm>
#include <vector>

int longestConsecutive(std::vector<int> nums) {
  if (nums.empty()) return 0;
  std::sort(nums.begin(), nums.end());
  nums.erase(std::unique(nums.begin(), nums.end()), nums.end());
  int best = 1;
  int run = 1;
  for (std::size_t i = 1; i < nums.size(); ++i) {
    if (nums[i] == nums[i - 1] + 1) {
      ++run;
      if (run > best) best = run;
    } else {
      run = 1;
    }
  }
  return best;
}
```

Sort + dedupe + single-pass scan. The `std::unique` + `erase` idiom trims consecutive duplicates in place after sort; the scan resets on every gap.
Two small C++ notes worth naming:
- `std::unique` + `erase`. `std::unique` shifts distinct consecutive values to the front and returns an iterator to the new end; the `erase` truncates. The two must be paired — `unique` alone leaves the tail in an unspecified state.
- Pass by value. Taking `std::vector<int> nums` by value instead of mutating the caller’s input is the idiomatic move when the algorithm needs to sort but the caller expects the input to be preserved.
Why it’s O(n log n)
The sort is the bottleneck — standard comparison sort is O(n log n) worst-case. The dedupe is O(n) (one pass), and the final scan is O(n). Total: O(n log n) time. Space is O(1) auxiliary if you sort in place, or O(n) if you copy first.
Why the interviewer expects you to go further
O(n log n) with O(1) extra space is a legitimate answer. But this problem is specifically designed to have a linear-time solution, and “can we do better than sort?” is a prompt you should ask yourself without being told. Name the insight: “we don’t need the elements sorted; we just need set membership.” Then switch to the hash-set approach.
Hash set with chain-head filter (recommended)
Put every value in a hash set. For each value x, walk forward only if x is a chain head (x − 1 not in the set). Amortized O(n).
The insight, stated cleanly
Every element participates in exactly one consecutive chain. If we walk forward from the smallest member of each chain, we traverse each chain exactly once and each element is visited at most twice total. Starting from non-smallest members of the same chain would redo work already covered.
The test for “smallest member of a chain” is stunningly simple: x is a chain head iff x − 1 is not in the set. One hash-set lookup per element decides whether to walk or skip.
Example: [100, 4, 200, 1, 3, 2]. Build a hash set from the 6 unique values. Scan each one; only walk forward from chain heads (x − 1 not present).
How it works, step by step
- Build the set. `std::unordered_set<int> present(nums.begin(), nums.end())`. This is expected O(n) and deduplicates for free — inserting a duplicate is a no-op.
- Scan every unique value. Iterate over the set (any order).
- Filter non-heads. For each `x`, check `present.count(x - 1)`. If yes, this value is mid-chain; some other value in the same chain is the head, and walking from it will cover `x`. Skip.
- Walk from heads. For each chain head, increment a local `length` counter while `x + 1, x + 2, ...` are in the set.
- Track the best. Update the global maximum after each walk.
The code
```cpp
#include <unordered_set>
#include <vector>

int longestConsecutive(const std::vector<int>& nums) {
  std::unordered_set<int> present(nums.begin(), nums.end());
  int best = 0;
  for (const int x : present) {
    // Start a walk only from chain heads (x - 1 not present).
    // This keeps the inner loop from repeating work already covered
    // by a walk that started at a smaller member of the same chain.
    if (present.count(x - 1)) continue;
    int length = 1;
    for (int y = x + 1; present.count(y); ++y) ++length;
    if (length > best) best = length;
  }
  return best;
}
```

The chain-head filter in a few lines. The `continue` after the head check is the entire O(n²) → O(n) trick.
The one comment inside the function names the invariant at the exact line code relies on it. That’s the methodology’s rare-exception case: a subtle correctness point where removal would be a bug.
The amortized argument — why this is O(n), not O(n²)
The structure looks suspicious: an outer loop over every element, and an inner loop that walks forward. A reader who doesn’t see the filter might guess O(n²).
The filter makes it O(n):
- The outer loop runs n times and does O(1) work per iteration (one set lookup for the head check).
- The inner walk fires only for chain heads. A chain of length k has exactly one head; the walk from that head does k set lookups.
- Total inner work across all chains is O(n) (every element belongs to exactly one chain, and each gets visited once by the walk of its chain’s head).
Grand total: n outer-loop head checks plus at most n inner-walk steps — about 2n operations = O(n).
The visualization above shows this via the per-element touch counters: they stay ≤ 2 by the end of every run. One touch from the outer scan, at most one touch from being the target of a forward walk. No element is ever touched more than that, and many are touched just once.
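The bounded-touch claim is easy to check empirically. Here is an instrumented sketch (my own, not the chapter's code; the name `longestConsecutiveCounted` is illustrative) that counts each value's touches — one from the outer scan, plus one if it is the target of a forward walk — so the ≤ 2 bound can be asserted:

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Instrumented sketch: record how many times each value is "touched" —
// once by the outer scan, plus once if it is the target of a forward
// walk — to verify the amortized bound of 2 touches per element.
int longestConsecutiveCounted(const std::vector<int>& nums,
                              std::unordered_map<int, int>& touches) {
  std::unordered_set<int> present(nums.begin(), nums.end());
  int best = 0;
  for (const int x : present) {
    ++touches[x];                        // touch 1: outer scan
    if (present.count(x - 1)) continue;  // non-head: skip the walk
    int length = 1;
    for (int y = x + 1; present.count(y); ++y) {
      ++touches[y];                      // touch 2: target of a walk
      ++length;
    }
    if (length > best) best = length;
  }
  return best;
}
```

After any run, every counter in `touches` is at most 2: chain heads are touched once, every other chain member twice.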
Edge cases the mechanics handle for free
- Empty array: the set is empty, the outer loop never fires, best stays 0. Correct.
- Single element: the set has one value; it’s trivially a chain head (predecessor absent), walk length is 1, best = 1.
- All duplicates: the set collapses to one value. Same as single element.
- Already a consecutive run: one chain head at the minimum; walk covers all values; best = the number of unique values.
- Negatives and zero: the `x - 1` check works on any `int`; no special handling needed. Be mindful of INT_MIN (there `x - 1` underflows) — values that extreme rarely appear in typical inputs; mention it as a corner case if pressed.
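If an interviewer does press on the boundary corner, the guards are mechanical. A hardened sketch (my own variant, not required in typical interviews): treat INT_MIN as always a head, and run the walk cursor in 64 bits so incrementing past INT_MAX is well-defined.

```cpp
#include <climits>
#include <unordered_set>
#include <vector>

// Overflow-hardened variant (illustrative). The walk cursor is 64-bit,
// so ++y past INT_MAX is well-defined; the membership probe only runs
// while the cursor still fits in int.
int longestConsecutiveSafe(const std::vector<int>& nums) {
  std::unordered_set<int> present(nums.begin(), nums.end());
  int best = 0;
  for (const int x : present) {
    // x == INT_MIN is always a head; computing x - 1 would underflow.
    if (x != INT_MIN && present.count(x - 1)) continue;
    int length = 1;
    for (long long y = static_cast<long long>(x) + 1;
         y <= INT_MAX && present.count(static_cast<int>(y)); ++y) {
      ++length;
    }
    if (length > best) best = length;
  }
  return best;
}
```

Same algorithm, same complexity; only the two boundary probes change.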
Why iterate over the set rather than the array
Iterating over the input array means we examine duplicates multiple times (the filter and walk are idempotent, so correctness is preserved, but each duplicate wastes a constant factor). Iterating over the set deduplicates automatically — every unique value is examined exactly once. The performance difference is a small constant, but the code reads cleaner.
Complexity
- Time: expected O(n) (amortized across the outer scan and inner walks). Set operations are expected O(1) per call.
- Space: O(n) for the hash set.
Union-find over seen values
For each value x, create a singleton component. Union with x − 1 and x + 1 whenever they are already present. Track the largest component size.
When union-find earns its keep
The hash-set solution is simpler, faster in practice, and the right default. Union-find shines in two scenarios:
- Streaming input. Numbers arrive one at a time; you must report the current longest run after each arrival. The hash-set approach would need to redo work per query. Union-find does O(α(n)) — essentially constant — work per update.
- Composing with DSU-native operations. If the rest of your pipeline uses union-find for other reasons (connectivity, Kruskal-style merging, dynamic equivalence classes), reusing the structure avoids a redundant hash set.
For the static one-shot problem, it’s overkill. Include it in an interview only if asked about streaming or if you want to show a second tool.
How it works, step by step
Use a hash-backed DSU (keys are the integer values themselves; no flat index required). For each incoming value x:
- Add `x`. If `x` was already seen, skip (idempotent). Otherwise create a singleton component.
- Union with `x − 1`. If `x − 1` is in the DSU, `unite(x, x − 1)` merges the two components and updates the size of the root.
- Union with `x + 1`. Same check, same merge.
After each add and each unite, the DSU tracks the running maximum component size. That’s the answer so far.
The DSU internals
- Path compression during `find` flattens the tree: subsequent `find`s on any visited node become one-hop lookups.
- Union by size (or by rank) keeps the tree shallow: the smaller component attaches to the larger.
Together, any m operations on an n-element DSU run in O(m α(n)), where α is the inverse Ackermann function — effectively constant for any realistic n.
The code
```cpp
#include <unordered_map>
#include <utility>
#include <vector>

class DSU {
 public:
  void add(int x) {
    if (parent_.count(x)) return;
    parent_[x] = x;
    size_[x] = 1;
    if (best_ < 1) best_ = 1;
  }

  bool contains(int x) const { return parent_.count(x) > 0; }

  int find(int x) {
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];  // path compression (halving)
      x = parent_[x];
    }
    return x;
  }

  void unite(int a, int b) {
    int ra = find(a);
    int rb = find(b);
    if (ra == rb) return;
    if (size_[ra] < size_[rb]) std::swap(ra, rb);  // union by size
    parent_[rb] = ra;
    size_[ra] += size_[rb];
    if (size_[ra] > best_) best_ = size_[ra];
  }

  int best() const { return best_; }

 private:
  std::unordered_map<int, int> parent_;
  std::unordered_map<int, int> size_;
  int best_ = 0;
};

int longestConsecutive(const std::vector<int>& nums) {
  DSU dsu;
  for (const int x : nums) {
    dsu.add(x);
    if (dsu.contains(x - 1)) dsu.unite(x, x - 1);
    if (dsu.contains(x + 1)) dsu.unite(x, x + 1);
  }
  return dsu.best();
}
```

Hash-keyed DSU; each new value tries to merge with its numerical neighbors. The running maximum is a single field updated on every size change.
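The streaming scenario is where this structure pays off. A condensed, self-contained sketch (the class name `StreamingRuns` is illustrative; internals restate the DSU above) whose `push` reports the longest run after every arrival:

```cpp
#include <unordered_map>
#include <utility>

// Streaming sketch: feed numbers one at a time; push() returns the
// longest consecutive run seen so far. Condensed hash-keyed DSU.
class StreamingRuns {
 public:
  int push(int x) {
    if (!parent_.count(x)) {  // duplicates are no-ops
      parent_[x] = x;
      size_[x] = 1;
      if (best_ < 1) best_ = 1;
      if (parent_.count(x - 1)) unite(x, x - 1);
      if (parent_.count(x + 1)) unite(x, x + 1);
    }
    return best_;
  }

 private:
  int find(int x) {
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];  // path halving
      x = parent_[x];
    }
    return x;
  }
  void unite(int a, int b) {
    int ra = find(a), rb = find(b);
    if (ra == rb) return;
    if (size_[ra] < size_[rb]) std::swap(ra, rb);  // union by size
    parent_[rb] = ra;
    size_[ra] += size_[rb];
    if (size_[ra] > best_) best_ = size_[ra];
  }
  std::unordered_map<int, int> parent_, size_;
  int best_ = 0;
};
```

Each `push` is one add plus at most two unites — O(α(n)) amortized, with no per-query rescans.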
Complexity
- Time: O(n α(n)) — effectively linear, with a larger constant than the hash-set approach due to path-compression and size-tracking overhead.
- Space: O(n) for the parent and size maps.
Why the hash set is still the default
Both approaches are asymptotically linear. The hash set has a tighter constant, simpler code (no DSU class), and is more natural to explain in an interview. Mention union-find by name as an alternative — especially for streaming — but lead with the hash-set solution.
Approach comparison
| Approach | Time | Space | Pick when… |
|---|---|---|---|
| Sort + scan | O(n log n) | O(1) in-place | Baseline / memory-constrained only. |
| Hash set + head filter | O(n) expected | O(n) | Default. Static one-shot problem. |
| Union-find | O(n α(n)) | O(n) | Streaming updates, or composing with DSU. |
Default answer: hash set with chain-head filter. Name the amortized argument out loud when you close.
Follow-ups to anticipate
| The interviewer asks… | What to do |
|---|---|
| “Return the sequence, not just its length.” | Remember the chain head that produced the best length, then re-walk from it. Two passes, still O(n). |
| “Numbers arrive one at a time; report the longest run after each.” | Union-find. Each new number is an add plus up to two unite calls, each O(α(n)). |
| “The array has 10⁹ elements and can’t fit in memory.” | External sort + scan is the natural answer; streaming union-find if the values fit in memory even if the full multiset doesn’t. |
| “What if the integers are bounded to a small range, say [0, 10⁶]?” | Bit-set instead of hash set: one bit per value, O(1) membership, far better cache locality. Same algorithm. |
| “How would you solve this without any hash table — say, in a language without one?” | Sort and scan, or union-find with an array-backed DSU keyed on rank-compressed values. |
| “Multidimensional — longest consecutive run in a matrix where adjacency is 4-connectivity.” | Different problem entirely: this is flood-fill (see Number of Islands). |
| “Average case for the hash approach?” | Still O(n). The worst case for hash operations is O(n) per lookup on adversarial inputs, but with a well-distributed hash on integers this is vanishingly rare. |
| “Why not just use a sorted `std::set` with O(log n) lookups?” | The walk would then be O(log n) per step, totalling O(n log n) — no better than sort + scan. The point of the hash set is the O(1) membership check. |
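The bounded-range follow-up is worth sketching, since the change is so small. An illustrative variant (function name and `bound` parameter are my own) using `std::vector<bool>` as the bit-set, assuming all values lie in [0, bound]:

```cpp
#include <cstddef>
#include <vector>

// Bounded-range variant (sketch): values assumed to lie in [0, bound].
// A bit per value replaces the hash set; same chain-head algorithm.
// Time is O(n + bound) — linear when bound is O(n).
int longestConsecutiveBounded(const std::vector<int>& nums, int bound) {
  // One extra slot past bound so the walk can stop without a range check.
  std::vector<bool> present(static_cast<std::size_t>(bound) + 2, false);
  for (const int x : nums) present[x] = true;
  int best = 0;
  for (int x = 0; x <= bound; ++x) {
    if (!present[x]) continue;
    if (x > 0 && present[x - 1]) continue;  // not a chain head
    int length = 1;
    while (present[x + length]) ++length;   // slot bound+1 is false
    if (length > best) best = length;
  }
  return best;
}
```

The cache-locality win comes from the bit array being contiguous; the hash set's buckets are scattered across the heap.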
Pattern recognition — amortized walks on set membership
The move this chapter teaches — build a hash set; walk forward from filtered starting points so each element is touched a bounded number of times — is a staple of amortized linear-time algorithms. Places it reappears:
- Next Greater Element (monotonic stack). Each element is pushed and popped at most once; total work is O(n) despite a nested loop shape.
- Sliding Window Maximum. A monotonic deque touches each element at most twice — once on entry, once on exit.
- Kth Largest in Stream. Amortized analysis on heap size bounds the work per insertion.
- Longest Increasing Subsequence (patience sort). Binary search per element over a bounded-size sorted pile; total O(n log n).
- Find-all-duplicates-in-O(1)-space. The index-placement trick visits each element at most twice via a cyclic-swap argument.
The reflex: when a nested loop looks quadratic but “most” inner iterations are short, ask whether each element can be bounded by a constant number of total visits. If yes, the algorithm is amortized linear even though no single iteration proves it.
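To make the reflex concrete, here is the monotonic-stack shape from the first item in the list above — Next Greater Element — as an illustrative sketch. The nested `while` inside the `for` looks quadratic, but each index is pushed once and popped at most once:

```cpp
#include <stack>
#include <vector>

// Next Greater Element (sketch): for each index i, the value of the next
// element to the right that is strictly greater, or -1 if none exists.
// Each index is pushed once and popped at most once: amortized O(n).
std::vector<int> nextGreater(const std::vector<int>& a) {
  std::vector<int> out(a.size(), -1);
  std::stack<std::size_t> waiting;  // indices still awaiting a greater value
  for (std::size_t i = 0; i < a.size(); ++i) {
    while (!waiting.empty() && a[waiting.top()] < a[i]) {
      out[waiting.top()] = a[i];  // a[i] resolves everything smaller on the stack
      waiting.pop();
    }
    waiting.push(i);
  }
  return out;
}
```

The same accounting as the chain-head walk: bound total visits per element, not iterations per loop.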
What to say out loud — the delivery script
Ten beats:
- “Let me restate: given an unsorted array, return the length of the longest consecutive-integer subsequence.”
- Ask: “duplicates allowed? how large is n? streaming or one-shot?”
- “The obvious solution sorts and scans — O(n log n). Let me name it, then try to beat it.”
- “We don’t need the elements sorted; we just need to know which values are present. That’s a hash set: O(1) expected membership.”
- “With the set, I can walk forward from each value. But walking from every value is O(n²) on a long run. I need to restrict starting points.”
- “Only walk from chain heads: values x where x − 1 is not in the set. That guarantees each chain is traversed once.”
- Write the code — eight lines. Point at the `continue` and say: “this is the load-bearing line.”
- “Complexity argument: outer loop is O(n); total inner-walk work sums to O(n) across all chains; total O(n).”
- Trace example 1 on the whiteboard. Show that [100, 4, 200, 1, 3, 2] produces exactly one walk from 1, and that 2, 3, 4 all skip.
- “For a streaming variant I’d switch to union-find. For a bounded-range variant I’d use a bitset. For the static problem, this hash-set approach is what I’d ship.”
Summary
Longest Consecutive Sequence looks like a sort-and-scan problem and turns out to have a linear-time answer that hinges on a single filter: only walk from chain heads. The insight — don’t start walking from values whose predecessor is already in the set — is what converts a naive nested loop into amortized linear work.
The wider lesson is the amortized-analysis reflex. Nested loops are not automatically quadratic. When each element is bounded in the number of total visits, the algorithm is linear no matter how the loops nest. That reflex unlocks the hash-set filter here, the monotonic-stack trick in next-greater-element, the sliding-window-max deque, and half a dozen other “O(n) in disguise” patterns across the interview canon.