Valid Parenthesis String
Introduction
The classic parenthesis problem is a two-line stack trick: push on (, pop on ), valid iff the stack is empty at the end. Add a single wildcard — a * that can be (, ), or empty — and that trick shatters. Every * branches the set of reachable states three ways; by the end of a long string you are staring at an exponential search tree.
What makes this chapter worth a slot is the move the problem demands. Instead of tracking the count of open parens, you track the range of counts still reachable. A ( shifts the range up. A ) shifts it down. A * widens it. The string is valid iff 0 sits inside the range after the last character — and iff the ceiling never fell below 0 along the way.
When every character offers three choices, don’t pick — track the band of outcomes all of them can still reach.
Read this chapter linearly if parenthesis problems feel scary; jump to the greedy approach and the band viz if you’ve seen the brute force before.
What the interviewer is actually testing
This problem is graded on four things, in roughly this order:
- Recognition that the stack trick breaks. A candidate who reaches for
std::stackon the first*and never revisits the decision has revealed a ceiling. - Willingness to start from the brute force. Name the exponential baseline first. Rushing to the greedy insight without acknowledging what you’re skipping reads as over-pattern-matched.
- The DP instinct. Spot the overlapping subproblems. Even if you don’t write the code, say the words “I could memoize this over substring endpoints.”
- The greedy leap. Arrive at min/max tracking without being prompted. This is the senior signal — few candidates reach it under time pressure.
Every section below is organized around hitting those four marks.
Problem statement
Given a string over the three-character alphabet {'(', ')', '*'}, return true if can be turned into a balanced parenthesis string by interpreting each * as one of (, ), or the empty string. Return false otherwise.
A balanced parenthesis string has every ( matched by a later ) and no ) without an earlier (.
s = "(*)" true s = "(**))" true s = "*(*)()(()" false Clarifying questions — and why each one matters
The problem arrives slightly under-specified. What you ask in the first sixty seconds is scored directly.
Clarification questions
- Q Is an empty string considered valid?
- A Yes — the empty string is trivially balanced. You don't need a special case; every algorithm below handles it.
- Q Do we return a boolean, or the matched valid string?
- A The single most load-bearing clarification. If boolean, the greedy min/max approach works and is O(n) time, O(1) space. If the interviewer wants the actual interpretation back, the greedy cannot reconstruct — it never commits to a per-star choice. You'd fall back to the DP and walk the memo table to materialize a witness.
- Q Are the three '*' options independent per position?
- A Yes. Each '*' picks its own of '(', ')', or empty. Two stars are not required to resolve the same way.
- Q Can the input contain characters outside '(', ')', '*'?
- A Per the stated alphabet, no. Some variants mix in '[', ']', '{', '}' and become genuinely different problems. Confirm.
- Q What is the upper bound on the input length?
- A Decides which approach you can ship. At n ≥ 50 the O(3ⁿ) brute force is already infeasible. At n ≥ 1000 the O(n³) DP starts to feel it. Greedy is always safe.
A mental framework — why the stack dies
If the * weren’t in the alphabet, a single integer would solve this problem. Scan left to right. On (, increment; on ), decrement. Invalid if the counter ever goes negative, invalid at the end if the counter isn’t zero. That’s the standard stack trick in its leanest form (you don’t even need the stack; a counter suffices because there’s only one bracket type).
The * breaks that. After processing a prefix containing k stars, the “open-paren count so far” is not a single number — it is a set of 3^k possible numbers, one per way of resolving the stars you’ve seen. You cannot enumerate that.
But the set has a redeeming structure:
- It’s contiguous. Between the smallest and largest reachable counts, every integer in between is also reachable. You can shift any
*-as-(to*-as-)one step at a time, and each step changes the count by exactly 2. (Proof sketch: the count under any resolution is determined only by how many stars become(, how many become), and how many become empty — three independent knobs that change the count linearly.) - It’s nonnegative at every step — you clamp below at
0, because a prefix that would require a negative count represents an unmatched)in every resolution, which is already invalid.
So the “set of possible counts” collapses to two integers: the minimum low and the maximum high still reachable. That’s the algorithm.
Discussion
Brute force — recurse on every star
At each '*', branch three ways: treat it as '(', ')', or empty. Exponential, correct, your starting line.
The interview narration
Say this out loud before writing anything:
“The straightforward approach is to try every interpretation of every star — three branches per
*. That’s time, which is too slow, but it gives me a correct baseline and a recursion I can speed up.”
This sentence signals three things: you see the exponential, you know it’s unacceptable, and you have a plan to improve.
The code
The recursion tracks the running count of ( and ) seen so far. It short-circuits if ) count ever exceeds ( count (no suffix can ever salvage that) and accepts at the end if the two are equal.
#include <string>
namespace {bool solve(const std::string& s, std::size_t pos, int open, int closed) { if (closed > open) return false; if (pos == s.size()) return open == closed;
const char c = s[pos]; if (c == '(' || c == '*') { if (solve(s, pos + 1, open + 1, closed)) return true; } if (c == ')' || c == '*') { if (solve(s, pos + 1, open, closed + 1)) return true; } if (c == '*') { if (solve(s, pos + 1, open, closed)) return true; } return false;}} // namespace
bool checkValidString(const std::string& s) { return solve(s, 0, 0, 0);}One recursive call per choice at each '*'; literal '(' and ')' have exactly one branch each.
The '(' || '*' and ')' || '*' guards are worth calling out: they collapse three separate character checks into two guarded branches plus a third “ignore” branch gated on c == '*'.
Correctness
Claim. The recursion returns true iff some interpretation of the stars yields a balanced string.
(⇐) If any interpretation balances, the recursion finds it by taking, at each *, the branch that matches that interpretation. Literal characters have only one branch, which agrees.
(⇒) Each accept path corresponds to one concrete assignment of stars. If the path ends with open == closed and never had closed > open, the assignment produces a balanced string.
Complexity
- Time: worst case (every character a
*). A string of 20 stars already blows a one-second budget. - Space: recursion depth.
Memoized DP over substrings
The brute force revisits the same substring many times. Memoize over (i, j) intervals to collapse to O(n³).
The insight
Reframe the problem. Let denote “is the substring valid?” There are only distinct intervals, so if each takes work, the total is — a massive cut from exponential.
The recurrence, for a non-empty interval :
The disjunction ranges over all where can close: or . Any one witness makes true. The inner loop over is the work per interval.
The code
#include <cstdint>#include <string>#include <vector>
namespace {enum class Cell : std::int8_t { Unknown, Yes, No };
bool solve(const std::string& s, int i, int j, std::vector<std::vector<Cell>>& dp) { if (i > j) return true; if (dp[i][j] != Cell::Unknown) return dp[i][j] == Cell::Yes;
bool ok = false; const char c = s[i];
if (c == '*') { ok = solve(s, i + 1, j, dp); } if (!ok && c != ')') { for (int k = i + 1; k <= j && !ok; ++k) { if (s[k] == ')' || s[k] == '*') { ok = solve(s, i + 1, k - 1, dp) && solve(s, k + 1, j, dp); } } }
dp[i][j] = ok ? Cell::Yes : Cell::No; return ok;}} // namespace
bool checkValidString(const std::string& s) { if (s.empty()) return true; const int n = static_cast<int>(s.size()); std::vector<std::vector<Cell>> dp(n, std::vector<Cell>(n, Cell::Unknown)); return solve(s, 0, n - 1, dp);}Top-down memoization with a tri-state cache (Unknown / Yes / No) — cheaper and clearer than a std::optional<bool> table, and avoids the hash overhead of a std::unordered_map keyed on std::pair.
Two small C++17 notes worth naming aloud if asked:
- The
enum class Cell : std::int8_tcache gives a packed, type-safe three-state cell — one byte per entry — versus the more commonstd::vector<std::vector<std::optional<bool>>>which wastes space on the optional flag. - The helper is
staticinside an anonymous namespace so it doesn’t pollute the translation unit. The public entry pointcheckValidStringallocates the table and kicks off the recursion.
When DP beats the greedy
The greedy in the next section is strictly better on the boolean question. The reason to know the DP is the reconstruction variant: “return the valid interpretation.” The DP’s dp[i][j] table lets you walk back from dp[0][n-1] = true to a concrete choice per star. The greedy discards that information at every step and has no way to reconstruct.
Edge cases
| Input | Expected | Why |
|---|---|---|
"" | true | The base case i > j returns true immediately. |
"*" | true | Skip the single star (interpret as empty); dp[1][0] = true. |
")" | false | Substring starts with ); caught at the top of the recursion. |
"(" | false | No k in the empty suffix can close it; all candidates fail. |
"(((*)))" | true | Star as ) balances the extra (. |
Complexity
- Time: . states, each does work searching for a matching
)or*. - Space: for the memoization table; recursion depth.
Greedy — the open-count band Recommended
Track only the minimum and maximum reachable open-paren count. A '(' shifts the band up; a ')' shifts it down; a '*' widens it. Valid iff the high never drops below 0 and the low lands at 0.
The insight, stated cleanly
After scanning a prefix, the set of reachable open-paren counts is a contiguous integer range . That range evolves per character with three simple rules:
| Character | Δ low | Δ high |
|---|---|---|
( | +1 | +1 |
) | −1 | −1 |
* | −1 | +1 |
After every update, apply two universal rules:
- Clamp the floor:
low = max(low, 0). A negative floor would mean we kept an interpretation that turned too many stars into)— we simply wouldn’t pick that interpretation. - Fail-fast on the ceiling: if
high < 0, returnfalseimmediately. No future character can rescue an unmatched).
At the end, the string is valid iff low == 0 — some interpretation lands exactly at a zero count. If low > 0, every interpretation leaves at least one unmatched ( dangling.
low is the band floor (a ')' or empty star where possible), high is the ceiling (every star becomes '('). Press Next to advance one character. The string is valid iff high never goes negative AND low = 0 at the end.Start: the band is the single point {0}.
The code
#include <algorithm>#include <string>
bool checkValidString(const std::string& s) { int low = 0; int high = 0; for (const char c : s) { if (c == '(') { ++low; ++high; } else if (c == ')') { low = std::max(low - 1, 0); --high; if (high < 0) return false; } else { low = std::max(low - 1, 0); ++high; } } return low == 0;}Two integers and one pass. The early return on high < 0 is a real speedup on adversarial inputs and also the tightest correctness statement in the algorithm.
Correctness, precisely
Two claims, both load-bearing.
Invariant 1 (the band is a valid reachable range). After processing the first characters, every integer in is the open-paren count under some interpretation of the stars in that prefix; and no count outside the range is reachable.
Proof sketch. By induction. Base: before any character, [0, 0] is trivially the only reachable count. Inductive step: on any character, the per-character rules above take the previous range and produce the new reachable range. The star case is the interesting one — widening by 2 (from below to above) reflects the three options, and contiguity is preserved because stepping from one star-assignment to the next changes the count by ±2 continuously. Clamping at 0 is safe because any count below 0 is already rejected (see Invariant 2).
Invariant 2 (high < 0 is a permanent dead end). If at any step high < 0, every interpretation of the prefix has at least one unmatched ). No suffix can ever salvage that — future ( and * can only increase high, but a ) already went unmatched. Returning false immediately is correct and strict.
The terminal check low == 0 is just: “some interpretation balanced.”
Why clamp low at 0, but let high go negative?
This is the one subtle move, and interviewers will probe it.
highgoing negative means the maximum possible open count has dipped below zero, which implies every interpretation has an unmatched closer. That’s a true fail state — propagate it by returning false.lowgoing negative would just be an accounting artifact of an interpretation that chose too many*-as-). We never actually pursue those interpretations — we’d have switched a star to empty or(instead. Clampinglowat 0 encodes “we’re not forced to take interpretations that make the floor negative.”
Candidates who write --low without the clamp get wrong answers on strings like "*)": without clamping, low becomes -1 but the string is valid (star as (). The clamp is not an optimization — it is the algorithm.
Edge cases
| Input | Band trajectory | Result | Why |
|---|---|---|---|
"" | [0, 0] (no steps) | true | Terminal check low == 0 passes. |
"*" | [0, 1] | true | 0 is in the final range. |
")" | — | false (early) | high = -1 after step 1. |
"(" | [1, 1] | false | Final low = 1 ≠ 0. |
"*))" | [0, 1] → [0, 0] → fail | false (early) | high = -1 after the second ). |
"(((*)))" | ends at [0, 1] | true | Final low = 0. |
Complexity
- Time: . One pass, two integer updates per character.
- Space: . No auxiliary data structures.
Approach comparison
| Approach | Time | Space | When you pick it |
|---|---|---|---|
| Brute force | Never in production. Name it as the baseline; don’t code it unless asked. | ||
| Memoized DP | When the interviewer wants a reconstructed valid string, not just a boolean. | ||
| Greedy band | Default for the boolean question. Fast, clean, single pass, constant memory. |
Default answer: the greedy band. State that out loud when you close.
Follow-ups to anticipate
| The interviewer asks… | What to do |
|---|---|
| ”Reconstruct one valid interpretation.” | Fall back to the DP. Walk dp[0][n-1] = true recursively, recording the choice at each *. |
| ”What if the string has ', ' too?” | Greedy collapses. Three (or more) bracket types force you back to a stack with the * handled as nondeterministic. The problem becomes much harder. |
| ”Count the number of valid interpretations, not just existence.” | Different problem. A DP over dp[i][j][open_count] works in or so — name it but don’t code unless pressed. |
| ”What’s the space with reconstruction?” | for the full table. You cannot get below that and still reconstruct — each of substrings contributes a bit. |
| ”Streaming input — you only see one character at a time.” | The greedy works as-is. That’s a quiet strength: two integers of state, no look-ahead, no look-back. |
| ”Why does clamping at 0 not lose valid paths?” | Because any interpretation that drives low below 0 uses too many *-as-). A shorter-suffix interpretation (one * becomes empty instead) gives the same final result with low ≥ 0 throughout. |
| ”Two Pass instead of band?” | A common alternate formulation: one forward pass counting ) against ( + *, one backward pass counting ( against ) + *. Same time, less elegant framing. Mention it as prior art; don’t prefer it. |
Pattern recognition — tracking a band of possibilities
The move this chapter teaches generalizes. When each step of a scan offers a commutative family of local choices and you only need a yes/no answer at the end, you often don’t need to explore all interpretations — you need to track the range of outcomes they could produce.
Places the same pattern appears:
- Jump Game (LeetCode 55). Track the farthest reachable index rather than every reachable index. Same collapse.
- Regex Matching with
?and*.?widens the match band by one character;*widens it by any number. DP tables are the standard formulation, but the insight is the same. - Wildcard schedulers. When a job has a range of allowed start times, carry the range through dependencies rather than enumerating start times.
- Interval-bound propagation in SAT / constraint solvers. Each variable has a reachable range; constraints prune the range; clamp at implied bounds.
- Abstract interpretation in compilers. Integer-typed variables carry a range of reachable values through straight-line code; branches narrow the range, loops widen it. Same machinery, built to run at compile time.
The reflex you want: when a character (or step) has more than one valid interpretation, ask “can I represent the set of post-step states as a range rather than a list?” When yes, the exponential collapses.
What to say out loud — the delivery script
Ten beats. Rehearse until fluid.
- “Let me repeat: I have a string over
(,),*, and I need to decide if some interpretation of the stars makes it balanced.” - Ask: “Boolean or reconstruction? Empty string valid? Any other characters?”
- “Without
*this is trivial — a counter that tracks open-paren balance. The*is what makes it hard because each one branches three ways.” - “Brute force is : try every interpretation. Too slow, but correct. I’ll name it and move on.”
- “The subproblems overlap: every substring’s validity is a distinct question. I could memoize over endpoints for . I’d ship that if I needed to reconstruct the interpretation.”
- “For the boolean question there’s something better. Instead of tracking the open-paren count, track the range of counts still reachable. Two integers:
lowandhigh.” - Walk the per-character rule on the whiteboard:
(shifts both,)shifts both down with a clamp,*widens. - “If
highever goes negative, every interpretation has an unmatched)— return false. At the end, valid ifflow == 0.” - Trace the second example (
"(**))"). Show the band evolving to[0, 1]and landing at[0, 0]. - “One pass, two integers, time, space. That’s what I’d ship.”
Summary
Valid Parenthesis String looks like a stack problem until the * arrives and every state branches three ways. The brute force is exponential. The DP over substrings is and pays for itself only if you need to reconstruct an interpretation. The production answer is the greedy band: track the minimum and maximum reachable open-paren counts, widen on *, shift on literal brackets, fail early when the ceiling falls below zero, accept iff the floor ends at zero.
The lesson generalizes. When a decision cascade has commutative local choices and a single boolean question at the end, you almost never need to enumerate interpretations — you need to track the band of outcomes they could still reach. Internalize that move on this problem; it unlocks Jump Game, wildcard matching, and a long tail of “could this ever balance?” questions.