Merge Intervals
Introduction
Intervals are everywhere in production software. They represent calendar events, string spans, segments on a number line, TCP flows, reservations, cache lifetimes, query ranges, lock holds. And yet the genre of interview problems around them reduces, almost entirely, to a single technique: sort by an endpoint, then walk the list once. Merge Intervals is the chapter that teaches it.
This problem shows up so reliably at Google, Meta, and Amazon that recruiters joke about it. Its interview utility is not that it is hard — the final algorithm is a dozen lines — but that it discriminates cleanly between candidates who guess and candidates who reason. A candidate who sorts by start and walks once is strong. One who also articulates why sorting by start is the correct choice, and why one pass after sorting is sufficient, is stronger still. And one who can then solve the Meeting Rooms variant with either a sweep or a heap, each time explaining the tradeoff, is the candidate we hire.
Sort the boundaries, sweep once. Two phrases that solve the majority of the interval problems you will ever see.
What the interviewer is actually testing
- Do you reach for sorting on first principles? Unsorted intervals force . Sorted intervals collapse to . Recognizing this is the entire game.
- Can you state a precise overlap condition? Most candidates mumble a case-by-case list. Strong candidates write one inequality.
- Can you prove your one-pass sweep is correct? The “I just compare to the back of the output” line is not a proof. An invariant is.
- Can you extend cleanly? The Meeting Rooms variant, the insert-interval variant, the non-overlapping-intervals variant — all should feel like variations on the same theme once the first problem is solved.
Problem statement
Given a list of intervals , return a new list where every pair of overlapping intervals has been collapsed into their union.
I = [(3,7), (1,5), (6,8), (4,6)] [(1,8)] I = [(1,5), (6,7), (4,4), (9,12)] [(1,5), (6,7), (9,12)] I = [] [] Clarifying questions — and why each one matters
Clarification questions
- Q Is the input sorted?
- A The single most important question. If sorted, you skip the O(n log n) sort and the walk is O(n). Always ask — do not assume.
- Q Do touching intervals merge, or only strictly overlapping ones?
- A Do
(1,3)and(3,5)merge into(1,5), or stay separate? The two conventions differ by a single character in the code (<=vs.<). This is a favorite interviewer trap — ask, do not guess. - Q Are intervals closed, half-open, or something else?
- A Closed
[s, e]is standard in interview questions; half-open[s, e)is standard in production code. Affects both the overlap test and the merge semantics. - Q Can
start == end? - A A point interval. Usually yes. If so, check that your overlap test still handles it correctly; the formula
max(a.s, b.s) ≤ min(a.e, b.e)does. - Q Can the input be modified?
- A If yes, you can sort in place. If not, sort a copy — an O(n) space overhead.
- Q Any guarantee on n or the value range?
- A Large values of n favor sweep-line approaches over per-time-unit counting (which is O(k) in the value range).
The overlap condition — write this once, use it forever
Most candidates get asked “when do two intervals overlap?” at some point. The mumbled answer goes: “well, either is inside , or is inside , or ‘s start is inside , or ‘s start is inside …”. Four cases, bug-prone.
The clean answer is one inequality. Two intervals and overlap iff
Carry this formula with you. It’s the overlap test for every interval problem in the book.
Discussion
Brute force — greedy pairwise merging
Pick an unprocessed interval. Absorb every overlapping interval into it. Repeat. Correct but quadratic.
The interview narration
“The naive approach is to pick each interval in turn and greedily absorb everything that overlaps it, repeating until no more merges happen. time. I’ll sketch it to anchor, then I think sorting gets us to .”
The code
#include <algorithm>#include <vector>
struct Interval { int start; int end;};
// Two intervals overlap iff max(a.start, b.start) <= min(a.end, b.end).static bool overlaps(const Interval& a, const Interval& b) { return std::max(a.start, b.start) <= std::min(a.end, b.end);}
static Interval fuse(const Interval& a, const Interval& b) { return { std::min(a.start, b.start), std::max(a.end, b.end) };}
std::vector<Interval> merge(const std::vector<Interval>& in) { const int n = static_cast<int>(in.size()); std::vector<bool> absorbed(n, false); std::vector<Interval> out;
for (int i = 0; i < n; ++i) { if (absorbed[i]) continue; Interval cur = in[i]; // Keep scanning the whole list until no more absorbs happen. // A single pass is not enough: newly-fused intervals may overlap // ones already skipped in this iteration. bool changed = true; while (changed) { changed = false; for (int j = 0; j < n; ++j) { if (j == i || absorbed[j]) continue; if (overlaps(cur, in[j])) { cur = fuse(cur, in[j]); absorbed[j] = true; changed = true; } } } out.push_back(cur); } return out;}Quadratic merge-by-absorption: the version you contrast with sort-then-linear, not the one you ship.
Correctness sketch
The inner while loop is critical. A single pass over the remaining intervals is not enough: absorbing interval may extend cur, which may then overlap interval that we’ve already passed over. The loop keeps going until a full pass finds nothing new. Because each absorption marks an interval absorbed[j] = true and the loop can only terminate when no unmarked interval overlaps cur, the output contains only pairwise disjoint intervals covering the same union as the input.
Why it’s , not worse
Each interval can be absorbed at most once, so the total number of absorptions across the entire run is . The outer i loop runs times, and each iteration does at most work before settling (because after absorbing intervals, the remaining work for that cur is one clean pass of ). So the total is .
Why you would not ship this
Two nested loops with a gratuitous while makes this fragile to reason about. In an interview, writing it cleanly is still a signal you understand the problem; never offer it as your final answer.
Sort, then one-pass merge Recommended
Sort by start. Every interval either extends the last output or starts a new one. The $O(n \log n)$ algorithm you should ship.
The insight — why sort by start?
Brute force does repeated work because it has no idea where the next overlap might be. Sorting fixes that. Once is sorted by start, any interval that doesn’t overlap the one before it cannot overlap any interval before it either. So we only need to compare each new interval to the last one we output.
Why sort by start and not by end? Because we want the leftmost endpoint to be decided first so that subsequent comparisons are against a stable rightmost endpoint of the current merge group. Sorting by end would give an equally clean but subtly different algorithm used in problems like non-overlapping intervals (the classic activity selection). For merging, start-sort is canonical.
Sorting reveals structure that is invisible in the unsorted input. That is the whole trick, across the entire sweep-line family.
The algorithm
sort I by start ascending
out = [I[0]]
for iv in I[1..]:
if iv.start <= out.last.end:
out.last.end = max(out.last.end, iv.end)
else:
out.append(iv)
return out The code
#include <algorithm>#include <vector>
struct Interval { int start; int end;};
// O(n log n) time (from the sort), O(n) space for the output.// Handles touching intervals as overlapping: (1,3) and (3,5) merge into (1,5).// If the desired convention is strict overlap only, change `<=` to `<`.std::vector<Interval> merge(std::vector<Interval> in) { if (in.empty()) return {};
std::sort(in.begin(), in.end(), [](const Interval& a, const Interval& b) { return a.start < b.start; });
std::vector<Interval> out; out.reserve(in.size()); out.push_back(in.front());
for (size_t i = 1; i < in.size(); ++i) { Interval& back = out.back(); const Interval& next = in[i];
// Sorted by start, so the only way to overlap is // for next.start to fall inside [back.start, back.end]. if (next.start <= back.end) { back.end = std::max(back.end, next.end); } else { out.push_back(next); } } return out;}Sort by start, then one linear pass — the default answer for “merge overlapping intervals.”
Edit intervals
Correctness — the invariant
State this in the interview. It is the difference between “I walked the pointers” and “I proved the algorithm”.
Invariant. After processing interval (0-indexed on the sorted input), every prefix- of the sorted input is correctly merged into the current out.
Inductive argument.
- Base. After processing ,
out = [I[0]]trivially represents that prefix. - Step. Assume the invariant holds for the prefix of length . Consider .
- If , then overlaps the current last output bucket (because , so overlap is determined by
start ≤ end). Extendingout.back.endto correctly merges into it. - If , then is strictly after all previously-seen intervals’ ends. Because the input is sorted, every future interval will also have — so it’s safe to “close” the current bucket and open a new one.
- If , then overlaps the current last output bucket (because , so overlap is determined by
That last sentence is the load-bearing one: future intervals cannot retroactively extend a previously closed bucket, because their starts can only grow. Sorting is what makes this guarantee hold.
Edge cases, explicitly
| Input | Output | Why |
|---|---|---|
| Empty list | Empty | Nothing to merge. |
| Single point | Unchanged | One interval; the loop never extends. |
| Nesting: contains | One interval | Extend the right end with of ends. |
| Touching: and | One interval | Touch counts as mergeable; use if the problem disallows touch. |
| Same start, different ends | One merged interval | Sort order of ties does not change the union. |
| Unsorted input | After sort, same as above | You always sort by start first. |
| Very large | time | Dominated by sort, then a single pass. |
Complexity, precisely
- Time: , dominated by the sort.
- Space: for the output list. If the interviewer says “modify in place”, you can write the merged intervals back into the input array in-place after the sort, giving extra space for the sort stack and output space.
Approach comparison
| Approach | Time | Space | Use when |
|---|---|---|---|
| Brute force | Never, in practice. Useful only as a reference against which to test the sort version. | ||
| Sort, then merge | Default. If the input is already sorted, drop the sort and it becomes . |
Related — How many meeting rooms?
A nearly-guaranteed follow-up: given a list of meetings, how many rooms do we need so that no two overlapping meetings share a room?
Given a list of meetings, each (start, end) with start < end (half-open on the right), return the minimum number of rooms needed to host all of them.
[(9,10), (9,12), (11,13), (14,15)] 2 [(1,10), (2,3), (4,5), (6,7), (8,9)] 2 Sweep the endpoints Recommended
Sort starts and ends independently. Walk them with two pointers; peak concurrency is the answer.
Concurrency can only change at start/end events — never between them. Extract all starts and all ends into two separate sorted arrays, then walk them with two pointers. Every start bumps the live count up; every end that has occurred by the current start time bumps it down (the room freed up can be reused).
#include <algorithm>#include <vector>
struct Interval { int start; int end;};
// Sort starts and ends independently. Walk them with two pointers.// A start event before the next end means we need another room.// Otherwise we can reuse a room that just freed up.int minMeetingRooms(const std::vector<Interval>& meetings) { const size_t n = meetings.size(); if (n == 0) return 0;
std::vector<int> starts, ends; starts.reserve(n); ends.reserve(n); for (const auto& m : meetings) { starts.push_back(m.start); ends.push_back(m.end); }
std::sort(starts.begin(), starts.end()); std::sort(ends.begin(), ends.end());
int rooms = 0; int peak = 0; size_t e = 0;
for (size_t s = 0; s < n; ++s) { while (e < n && ends[e] <= starts[s]) { --rooms; ++e; } ++rooms; peak = std::max(peak, rooms); } return peak;}Dual sorted lists of start and end times; two-pointer sweep to track concurrent meetings.
Why walking starts and ends separately is valid
A subtle point. You might worry: “if I sort starts and ends independently, don’t I lose the pairing — which start goes with which end?” The answer: we don’t care. We only need to know how many ongoing meetings there are at each event time. The number of starts before time is the number of meetings that have begun; the number of ends before-or-equal is the number that have finished; their difference is the number currently running. Since both streams are monotone in time, a two-pointer walk tracks the live count efficiently.
Edge cases
- Ties: half-open
[start, end)meansend <= startof a new meeting → room freed. The while-loop inside the sweep handles this correctly. - Closed intervals (
end > start) — swap<=for<in the ends loop. - Single meeting → returns 1. Multiple identical meetings → returns
n(all concurrent).
Heap of end-times
Min-heap of when rooms will next free up. For each meeting, reuse the earliest-freeing room if possible.
An alternative framing that some interviewers prefer. Sort meetings by start. Maintain a min-heap of end times — one entry per currently-occupied room. For each new meeting: if the earliest-finishing room is free by then, pop it and reuse. Push this meeting’s end onto the heap. At the end, the heap’s size equals the minimum number of rooms ever concurrently occupied.
#include <algorithm>#include <queue>#include <vector>
struct Interval { int start; int end;};
// Min-heap of end-times, sorted starts.// For each meeting (in order of start time): if the earliest-ending room// has freed up by `start`, reuse it; otherwise open a new room.// Same O(n log n) complexity as the sweep approach, arguably clearer code.int minMeetingRooms(std::vector<Interval> meetings) { if (meetings.empty()) return 0;
std::sort(meetings.begin(), meetings.end(), [](const Interval& a, const Interval& b) { return a.start < b.start; });
std::priority_queue<int, std::vector<int>, std::greater<int>> endTimes;
for (const auto& m : meetings) { if (!endTimes.empty() && endTimes.top() <= m.start) { endTimes.pop(); // free the earliest-finishing room } endTimes.push(m.end); // assign m to a room (new or reused) } return static_cast<int>(endTimes.size());}Process meetings by start; min-heap of end times tracks when the next room frees up.
Sweep vs. heap — how to decide
| Factor | Sweep | Heap |
|---|---|---|
| Big-O complexity | ||
| Constant factor | Lower (two sorts, two pointers) | Higher (heap ops) |
| Generality | Specific to “peak concurrency” | Generalizes to “who is in which room” |
| Clarity for an interviewer | Requires explaining why independent sorts work | Follows natural mental model: assign/reuse rooms |
Most candidates find the heap easier to explain on a whiteboard. Most interviewers have seen both; either is accepted.
Edge cases across both problems
| Case | Expected |
|---|---|
| Empty input | Merged: empty list; meeting rooms: rooms |
| Single interval | Unchanged merge; one room if one meeting |
| All identical | One merged bar; concurrent meetings |
| Point intervals (start = end) | Kept as points; sweep counts instants |
| One interval spans the rest | One merge result; room count = peak overlap |
| Large endpoint values | Comparisons only; no width sums |
| Negative timestamps | Still totally ordered after sort |
Follow-ups to anticipate
| The interviewer asks… | What to do |
|---|---|
| ”Insert a new interval into a sorted, disjoint list.” | Walk; copy intervals that end before the new one; merge overlapping ones into a running interval; copy the rest. . |
| ”Remove an interval.” | Same walk; split the overlapping interval into zero, one, or two pieces. . |
| ”Find the intersection of two sorted interval lists.” | Two-pointer walk over both lists; emit when they overlap. . |
| ”Return the gaps between intervals.” | After merging, for consecutive outputs , emit gap . |
| ”Largest overlap count and where it occurs.” | Sweep: track the max count and the time at which it is achieved. |
| ”Intervals that overlap with at least one other interval.” | Sweep: for each interval, check whether rooms > 1 during its lifetime, or equivalently whether it was merged into a group of size ≥ 2. |
| ”Intervals are on a 2D plane (rectangles).” | Separate x-axis sweep with y-axis interval trees; this is a longer problem — mention the approach. |
| ”Streaming: intervals arrive one at a time and you must maintain the merged list.” | Use an ordered structure (balanced BST or std::map) indexed by start; each insert is where is the number of intervals it absorbs. |
Pattern recognition — the sweep-line family
The pattern you learned here is the sweep line. It generalizes to nearly every problem where events happen over a 1D axis and you need to know something about “what is happening at time ”. The idea:
- Extract events (endpoints).
- Sort events by time.
- Walk events in time order, maintaining running state.
Problems that belong to this family:
- Insert Interval, Non-overlapping Intervals, Erase Overlap Intervals — direct applications.
- The Skyline Problem — sweep x-coordinates with a heap of live heights.
- Minimum Arrows to Burst Balloons — greedy shooting with a sorted-end-time sweep.
- Employee Free Time — merge intervals across multiple sorted lists (use a min-heap of starts).
- Maximum Population Year — difference array or sweep to find peak concurrency.
- Calendar scheduling (Calendar I, II, III) — all sweep variants.
Once you see the shape — “events on a number line, need a property of the running state” — you reach for sort + sweep on reflex.
What to say out loud — the delivery script
- Restate the problem. “I’m given a list of intervals, possibly overlapping; I need to return the list of their non-overlapping unions.”
- Ask the three questions. Sorted? Touching semantics? In-place?
- State the overlap formula. “Two intervals overlap iff .”
- Brute force as a baseline. “I could scan the list times, greedily absorbing — that’s .”
- Propose the sort. “But if I sort by start, any interval that doesn’t overlap the previous one can’t overlap any earlier one either, so I only need to compare each interval to the back of the output list. That’s .”
- Write the code. A dozen lines.
- Walk an example. Trace the algorithm on .
- State the invariant. “After each step, the output correctly merges the prefix of the sorted input.”
- Call out the touching semantics and other edge cases.
- Gesture at follow-ups. “If we needed to answer ‘how many meeting rooms’, I’d switch to a sweep of endpoints or a min-heap of end times.”
- Summarize. “I’d ship the sort-then-merge version.”
Summary
Merge Intervals is the sweep-line family’s hello-world. Nail the overlap formula, internalize the sort-by-start then compare-to-back-of-output pattern, and state the invariant that makes it correct. Once you have, the entire family — insert, remove, intersection, gaps, meeting rooms, skyline — becomes small variations on a theme you already understand.
The skill you’re actually practicing here, the one that transfers to dozens of problems, is “extract boundary events, sort them, walk once”. Internalize that and half of the interval problems in your interview pack reduce to thirty lines of code.