Davide Spataro

Two Sum

easy · Arrays & Hashing · 28 min read
array hashing two-pointers sorting interview-strategy

Introduction

Two Sum is the most-asked coding interview question on Earth. That fact alone would earn it a chapter. But Two Sum deserves far more than that: it is where most engineers first learn the single most important performance lever in the entire coding-interview discipline — trading space for time using a hash table.

In the forty-five minutes of a typical on-site, the interviewer is really testing two things. Can you produce correct code? And: can you reason about tradeoffs? Two Sum surfaces both, fast. A candidate who writes only the nested loop reveals a ceiling. One who sees the O(n)O(n) hashing solution but cannot explain why they look up before they insert reveals another. One who can produce the hashing solution, a sorted two-pointer alternative, and can argue when each wins — that candidate is already distinguishing themselves.

The skill Two Sum actually tests is not “can you do Two Sum”. It is “can you replace a search with a lookup, and know why it is safe.”

This chapter is designed to take you all the way there. If you are new to interview prep, read it linearly: the approaches, the narration, the traps. If you have written the hash solution a dozen times, jump to the Follow-ups and Correctness invariants sections — there is more nuance here than the typical problem sheet reveals.

What the interviewer is actually testing

Before we look at code, know what you are being graded on. The Two Sum rubric at most top companies scores four things:

  1. Clarification discipline. Did you ask at least two sharp clarifying questions in the first minute?
  2. Multiple approaches. Did you surface at least two qualitatively different solutions?
  3. Tradeoff articulation. Can you argue which solution you would ship, and under what constraints you would switch?
  4. Clean code. Did you write compilable, idiomatic code, and walk through it on a concrete input?

Every section below is organized around delivering those four signals.

Problem statement

Problem

Given an array AA of size nn and an integer TT, return true if any two distinct elements of AA sum to TT, and false otherwise.

Formally: return true iff there exist indices iji \neq j such that ai+aj=Ta_i + a_j = T.

Example 1
Input A = [9, 4, 17, 42, 36, -3, 15], T = 14
Output true
17 + (-3) = 14.
Example 2
Input A = [1, 3, 7], T = 6
Output false
No distinct pair sums to 6.
Example 3
Input A = [5, 5, 1], T = 10
Output true
Duplicate values at distinct indices are a valid pair.

Clarifying questions — and why each one matters

In the interview the problem will arrive slightly under-specified. Your clarifications are themselves being evaluated; asking the right ones in the first sixty seconds is one of the highest-value minutes of the whole session. Here is what to ask, and why.

Clarification questions

Q Can the input be modified?
A Decides whether the two-pointer approach is free. If yes, sort in place. If not, you either copy the array first or lean on hashing.
Q Can values be negative?
A If positive only, certain micro-optimizations open up (skip elements larger than T). If negatives are allowed, several tempting shortcuts quietly break.
Q Can the same element be used twice?
A The single most important clarification. Standard answer: no — indices must differ, but two elements with the same value at different indices are a valid pair. Candidates who skip this return wrong answers on inputs like [5, 5, 1], T=10.
Q Is the input sorted?
A If yes, the two-pointer walk is O(n) without the sort. Many interviewers present an unsorted array by default; always ask.
Q Should we return a boolean, or the indices of the pair?
A The common variant (e.g., LeetCode #1) asks for indices. The algorithms do not change but the data structure does — you use a hash map of value → index instead of a hash set.
Q Are there multiple valid pairs; return any or all?
A Affects whether you return on first match or iterate to the end. Most variants ask for any one pair; some ask for the count of pairs.
Q Any guarantees about n or value range?
A At n ≥ 10⁶ the hash-table constants start to matter; at values near INT_MAX, the sum may overflow. Ask.

A mental framework for lookup-shaped array problems

Before diving into approaches, here is the mental sequence I recommend for problems of this shape. It generalizes to 3Sum, 4Sum, Two Difference, Subarray Sum Equals K, Longest Substring Without Repeating Characters, and many more:

  1. Enumerate and check. Can you brute-force every candidate? Write it as a baseline — even if you throw it away.
  2. Exploit structure. Is the input sorted? Bounded in range? Non-negative? Structure is almost always the path to a better algorithm.
  3. Replace search with a hash lookup. If the inner loop is looking for a specific value, a hash table collapses it to O(1)O(1).
  4. Replace search with sort + two pointers. If the inner loop is looking for a range of values, sort + two pointers is often simpler than hashing.
  5. Replace recomputation with running state. Prefix sums, sliding windows, monotonic stacks.

For Two Sum, steps 1, 3, and 4 all apply. Let’s walk them.

Discussion

Approach 01

Brute force

Check every pair with two nested loops. Correct, quadratic, your starting line.

Time O(n²) Space O(1)

The interview narration

Say this out loud, almost verbatim:

“The straightforward approach is to check every pair — two nested loops, O(n2)O(n^2) time, O(1)O(1) space. I’ll write it as a baseline and then look for a faster one.”

This sentence signals three things at once: you know the obvious approach, you know it is not the answer, and you have a plan.

The code

brute-force.cpp
#include <vector>
// Returns true iff any two distinct indices i, j have A[i] + A[j] == T.
// O(n²) time, O(1) extra space. Use only as a baseline.
bool twoSum(const std::vector<int>& A, int T) {
const int n = static_cast<int>(A.size());
for (int i = 0; i < n - 1; ++i) {
for (int j = i + 1; j < n; ++j) {
if (A[i] + A[j] == T) return true;
}
}
return false;
}

Enumerate every index pair; quadratic time, constant extra space — the version you name before you optimize.

Why it is O(n2)O(n^2) — write this on the whiteboard

It is tempting to wave your hand and say “nested loops, quadratic”. Don’t. Write the arithmetic — it is free points and shows your work:

i=0n2(ni1)=n(n1)2=O(n2)\sum_{i=0}^{n-2}(n - i - 1) = \frac{n(n-1)}{2} = O(n^2)

The inner loop runs n1n-1 times for i=0i=0, n2n-2 for i=1i=1, and so on down to 11 for i=n2i=n-2. That is the triangular number, which is O(n2)O(n^2).

Edge cases this handles for free

  • n<2n < 2: the outer loop body never executes; we return false. Correct.
  • All identical values: A[i] + A[j] with i < j is enumerated; correct.
  • Negative numbers: the comparison is value-based; no issue.

When brute force is genuinely fine

If the interviewer explicitly says n100n \le 100 and there will be no follow-up questions, brute force is the answer. Don’t overengineer. But this is rare — usually the follow-up is “now do it faster.”

Approach 02

Hashing — one pass Recommended

Scan once. At each element, ask the hash set: have I already seen T − x? The move that turns quadratic into linear.

Time O(n) Space O(n) linearone pass

The insight, stated cleanly

The inner loop of brute force is searching for a single specific value: for each aia_i, is there an aja_j with jij \neq i such that ai+aj=Ta_i + a_j = T? Rearrange that equation. We’re looking for aj=Taia_j = T - a_i. A hash set answers that lookup in constant time on average.

So: scan AA once. At each index, ask the set “have I already seen TaiT - a_i?”. If yes, return true. If not, record aia_i and move on.

The critical subtlety — look before you insert

This is the detail that separates candidates.

A tempting wrong variant: stuff the whole array into the set first, then in a second loop look up each TaiT - a_i. This is wrong when TT is even and T/2T/2 appears exactly once in AA: the lookup will find T/2T/2 — itself — and return a false positive.

Concrete failure: A=[1,2,5,4],T=10A = [1, 2, 5, 4], T = 10. The two-pass version finds 55 in the set, 10 - 5 = 5 also in the set, returns true. But there is no valid pair. The correct answer is false.

Search before you insert is the invariant that keeps the element out of its own pair.

Live · Hashing approachUse Play or the step buttons to walk the array; drag Target T to change the sum under test. At each step the chip under the cursor shows the complement the set must already contain for a hit — that matches the look-before-insert rule in the text.
ArraySeen · value → index9[0]4[1]17[2]42[3]36[4]-3[5]15[6]looking for 59
Step 1 of 6

At A[0] = 9, look for T − value = 14 − 9 = 5. Not in the set yet — insert 9.

The code

hashset.cpp
#include <unordered_set>
#include <vector>
bool twoSum(const std::vector<int>& A, int T) {
std::unordered_set<int> seen;
seen.reserve(A.size());
for (int x : A) {
// CRITICAL: look for the complement BEFORE inserting x,
// so x can never pair with itself when T == 2·x.
if (seen.count(T - x)) return true;
seen.insert(x);
}
return false;
}

One pass: look up the complement before inserting the current value — the standard interview solution for the boolean variant.

The indices variant

A frequent variant (and the one LeetCode #1 asks) is to return the indices of the pair, not a boolean. One small change — a hash map of value → index instead of a set — handles it:

hashmap-indices.cpp
#include <unordered_map>
#include <vector>
// Returns the zero-based indices {i, j} of any valid pair, or {-1, -1} if none.
// O(n) time, O(n) space.
std::pair<int, int> twoSumIndices(const std::vector<int>& A, int T) {
std::unordered_map<int, int> idx; // value -> index of first occurrence
idx.reserve(A.size());
for (int i = 0; i < static_cast<int>(A.size()); ++i) {
const int complement = T - A[i];
auto it = idx.find(complement);
if (it != idx.end()) {
return {it->second, i};
}
// Insert only after the lookup — never overwrites an earlier occurrence.
idx.emplace(A[i], i);
}
return {-1, -1};
}

Same idea with a map from value to index, for when the interviewer wants positions back (e.g. LeetCode 1).

Anticipate this variant. When the interviewer says “return the indices” after your boolean solution, the delta should take you twenty seconds.

Correctness argument

Claim. The algorithm returns true iff a valid pair exists.

(⇐) If a valid pair (i,j)(i, j) exists with i<ji < j, then when we process index jj, aia_i is already in the set (inserted at iteration ii). We look up Taj=aiT - a_j = a_i, find it, return true.

(⇒) If we return true at iteration jj, the lookup found ak=Taja_k = T - a_j in the set. Since we only insert after lookups, aka_k was inserted at some earlier iteration i<ji < j. Indices differ, sum equals TT. Valid pair.

Short argument. Memorize it. An interviewer asking “why is this correct?” is a chance to look sharp, not to stall.

Edge cases to call out explicitly

InputExpectedWhy it works
Empty array, or a single elementfalseYou never have two distinct indices, so the complement test never pairs two values.
[5, 5], T=10T = 10trueFirst index inserts 5; second index looks for 10510 - 5 and finds the value already in the set.
Single [5], T=10T = 10falseOne iteration: the set is empty on lookup, then you insert 5 and stop — no pair.
[5, 1, 4], T=10T = 10falseHitting 10 with two 5s would re-use the same 5; look-before-insert rules that out.
All zeros, T=0T = 0true if n2n \ge 2The second index hits complement 0 in the set from the first.

Complexity, precisely

  • Time: O(n)O(n) expected. Each of nn elements triggers one lookup and one insert, both O(1)O(1) expected for a good hash. Worst-case O(n2)O(n^2) if the hash degrades (e.g., adversarial inputs hitting pathological collisions). In competitive programming you sometimes need a custom hash to prevent this; in an interview, mention it only if asked.
  • Space: O(n)O(n). Up to nn elements may be stored before a match is found.
Approach 03

Sort, then two pointers

When you cannot allocate a hash set, or when the input is already sorted, two pointers find the pair in O(1) extra space.

Time O(n log n) Space O(1) sorted arrayconstant space

The mechanic

Once AA is sorted, place one pointer at the start (LL) and one at the end (RR). Look at A[L]+A[R]A[L] + A[R] and move:

Sum at LL, RRActionWhy
A[L]+A[R]=TA[L] + A[R] = Treturn trueA valid pair on the two pointers.
A[L]+A[R]<TA[L] + A[R] < TLL+1L \leftarrow L+1Need a larger sum; the array is sorted, so only moving LL right can increase.
A[L]+A[R]>TA[L] + A[R] > TRR1R \leftarrow R-1Need a smaller sum; only moving RR left can decrease.

Stop when LL meets RR.

two-pointers.cpp
#include <algorithm>
#include <vector>
// If the input is already sorted, skip the std::sort and this is O(n) / O(1).
// Otherwise O(n log n) / O(1).
// If the caller needs A to remain untouched, sort a copy.
bool twoSum(std::vector<int> A, int T) {
std::sort(A.begin(), A.end());
int lo = 0;
int hi = static_cast<int>(A.size()) - 1;
while (lo < hi) {
const int s = A[lo] + A[hi];
if (s == T) return true;
if (s < T) ++lo; // Sum too small: the only way to grow it is to raise the left.
else --hi; // Sum too large: the only way to shrink it is to lower the right.
}
return false;
}

After sorting in place, walk $L$ and $R$ inward — linear scan after the sort, $O(1)$ extra space.

Live · Sorted two pointersThe array is sorted in the widget. Use Next to advance one decision (move L, R, or declare a match);Reset re-runs from both ends. The line below the grid states the sum inequality driving each move — it should read like the two-pointer case split in the chapter.
-3L4915173642R
Step 1 of 3

Sum 39 > T 14. Move R left to decrease the sum.

Correctness — the invariant

The two-pointer walk looks like magic; experienced candidates articulate why it is not.

Invariant. At every step of the walk, no pair (i,j)(i, j) with i<Li < L or j>Rj > R can be the answer.

Proof sketch.

  • We move LL right only when A[L]+A[R]<TA[L] + A[R] < T. At that moment, every j[L+1,R]j \in [L+1, R] has A[j]A[R]A[j] \le A[R] because the array is sorted — wait, the opposite: for jRj \le R, A[j]A[R]A[j] \le A[R]. So A[L]+A[j]A[L]+A[R]<TA[L] + A[j] \le A[L] + A[R] < T. No pair with this LL can sum to TT. Safe to discard LL.
  • The RR-- case is symmetric.

Because we never discard a pointer that could participate in a valid pair, if a valid pair exists, the walk finds it.

State this invariant out loud in the interview. It turns “I’m moving the pointers” into “here is why this is correct”, which is the difference between a mid candidate and a strong one.

When you pick two-pointers over hashing

  • The input is already sorted. Now it’s O(n)O(n) time, O(1)O(1) space — strictly better than hashing.
  • The interviewer constrains memory to O(1)O(1). Hashing is disqualified.
  • Adversarial inputs are a concern. The worst-case hash degeneration (O(n2)O(n^2)) can matter in security-sensitive code; two-pointers is always O(nlogn)O(n \log n).
  • Cache behavior matters. Linear scans are cache-friendly; hash tables are the opposite.

When you don’t

  • The input is unsorted and the size is large. Then the sort cost dominates and hashing wins. This is the common case.

Approach comparison

ApproachTimeSpaceProsCons
Brute forceO(n2)O(n^2)O(1)O(1)Simple, correct, reliableToo slow for large nn
Hashing (one pass)O(n)O(n) expectedO(n)O(n)Fastest single-pass solutionExtra memory; worst-case hash collisions
Sort + two pointersO(nlogn)O(n \log n)O(1)O(1)Constant extra space; cache-friendlyModifies or copies input; slower than hashing

Default answer for an unsorted input: hashing. Default for a sorted input: two pointers. Default when memory is tight: two pointers. Say this out loud when you close.

Follow-ups to anticipate

The interviewer asks…What to do
”Return the indices.”Swap hash set for hash map (shown above).
”Return all pairs, not just one.”Iterate to the end instead of returning early; dedup if needed.
”What if the array is sorted?”Go straight to two pointers; mention you’d skip the O(nlogn)O(n \log n) sort.
”What is 3Sum?”Fix one element; do Two Sum on the rest; total O(n2)O(n^2). Standard pattern.
”Values are floats.”Hash-equality on floats is brittle; use epsilon comparison or hash by rounded key.
”The array doesn’t fit in memory.”Distributed sort + linear sweep, or streaming hash with a bounded Bloom-filter front-end.
”Adversarial input.”Mention randomized hashing / universal hash families to prevent worst-case collisions.
”Reduce space to O(1)O(1) on an unsorted array.”You cannot do better than O(nlogn)O(n \log n) time with O(1)O(1) space — sort in place, then two pointers.

Pattern recognition — the real lesson

The move you learned here — replace an inner search with a hash lookup — is the single most productive pattern in the early-round array portion of coding interviews. It reappears constantly:

  • 3Sum, 4Sum. Fix k2k - 2 elements, reduce to Two Sum.
  • Subarray Sum Equals K. Hash prefix sums; look up prefix − K.
  • Contains Duplicate. Hash each element; check before insert.
  • Group Anagrams. Hash by a canonical key (sorted string).
  • Longest Substring Without Repeating Characters. Hash last-seen position.
  • Valid Sudoku. Hash sets per row, column, and box.

Once you see the shape — “at each element, I need to ask a lookup question about previous elements” — you apply the move on reflex. That reflex is what Two Sum is really teaching.

What to say out loud — the delivery script

Delivery is scored. Here is the narrative arc to rehearse:

  1. Repeat the problem in your own words. “So I have an array of integers and a target; I need to decide whether any two distinct elements sum to the target.”
  2. Ask the three clarifying questions. (See above.)
  3. State the brute force + its complexity. “The obvious approach is two nested loops, O(n2)O(n^2). Let me write that as a baseline.”
  4. Announce the insight. “I think I can do better by rearranging the equation. aj=Taia_j = T - a_i. If I’ve seen aia_i before, I can look it up in a hash set in constant time.”
  5. Write the code. No more than a dozen lines.
  6. Walk through an example. Pick one of the given examples; trace the hash set filling.
  7. State the look-before-insert invariant. “One trap here: you have to check the set before inserting aia_i, otherwise T=2aiT = 2 \cdot a_i will false-positive.”
  8. Call out edge cases. Empty array, single element, T=2T = 2 \cdot some element.
  9. Mention the alternative. “If the input were sorted or we couldn’t allocate memory, I’d switch to two pointers after a sort.”
  10. Summarize. “I’d ship the hashing version unless O(1)O(1) space is required.”

Rehearse this until it’s automatic. The candidates who get offers aren’t the ones with the cleverest code — they’re the ones whose interview feels organized.

Summary

Two Sum looks like one problem. It is really three: the brute force is a warm-up, the hashing solution is the workhorse, the two-pointer solution is the connoisseur’s answer. Know all three, know when to use each, know the invariant that makes each one correct, and know what to say as you are writing them.

The problem is also a pattern detector. Every time an interview problem asks you to find something in an array where the criterion is “relative to some other element”, the Two Sum move — hash the values, look up the complement — is your first instinct. Internalize it here; it will save you on dozens of problems after.