Subset Sum Problem

Definition

Subset Sum Problem

Let $S = \{w_{1}, \dots, w_{n}\}$ be a multiset of positive integers and $W \in \mathbb{Z}^{+}$ be a target sum. The subset sum problem is a decision problem asking whether there exists a subset $S' \subseteq S$ such that the sum of the elements in $S'$ is exactly $W$:
$\sum_{w \in S'} w = W$

NP-completeness

$\text{3-SAT} \leq_{P} \text{SUBSET SUM}$

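We start from an instance of the 3-SAT problem (which is NP-complete) and construct a multiset of numbers written in base 10. Base 10 is large enough to prevent carries: no digit position ever sums to more than 5 (three literals plus two fill-up numbers in a clause-digit), so every digit can be analysed independently. This reduction proves that SUBSET SUM is NP-hard. Since it is also in NP (a subset that sums to $W$ is a certificate that can be checked in polynomial time), it is NP-complete.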

Reduction Idea:

Digit Reservations:

For each variable in the 3-SAT instance, reserve a dedicated “variable-digit” in the Subset Sum numbers. We can imagine these are on the left.

For each clause in the 3-SAT instance, reserve a dedicated “clause-digit”. The “position” of that digit won’t matter at all, but let’s say they are on the right.

Variable Numbers: For each variable $x$ in the 3-SAT instance, we create two numbers $x_{T}$ and $x_{F}$ :

$x_{T}$ has a 1 at $x$'s variable-digit and a 1 at the clause-digit of each clause satisfied by $x$. It is 0 everywhere else.

$x_{F}$ has a 1 at $x$'s variable-digit and a 1 at the clause-digit of each clause satisfied by $\bar{x}$. It is 0 everywhere else.

Target Sum ( $W$ ): Set the target sum $W$ to have a 1 for each variable-digit and a 3 for each clause-digit.
Intuition: A satisfying SAT assignment will add up to a number that has (a) 1 in each variable-digit and (b) 1, 2, or 3 in each clause-digit.

Fill-up Numbers: Finally, add 2 “fill-up numbers” for each clause-digit that each have a 1 in their clause-digit and 0 everywhere else.
Intuition: A clause may be satisfied by 1, 2, or 3 true literals. This means its clause-digit might sum to 1, 2, or 3 from the variable numbers alone. Since the target sum $W$ requires exactly a 3 in every clause-digit, we need “fill-up” numbers to make up the difference (adding 2, 1, or 0 respectively) without altering the variable-digits.

Correctness:

( $\Rightarrow$ ) Satisfiable $\to$ Subset Sum: Pick $x_{T}$ if $x$ is true, else $x_{F}$ . The variable-digits sum to exactly 1. Since the formula is satisfied, each clause-digit sums to 1, 2, or 3. By picking 2, 1, or 0 fill-up numbers respectively for each clause, every clause-digit reaches exactly 3, matching $W$ .

( $\Leftarrow$ ) Subset Sum $\to$ Satisfiable: To sum to 1 at each variable-digit without carries, we must choose exactly one of $x_{T}$ or $x_{F}$ for every variable, yielding a valid truth assignment. Because there are only 2 fill-up numbers per clause, the variable numbers must contribute at least 1 to each clause-digit to reach the target of 3. Thus, every clause has at least one true literal.
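
The construction above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name `three_sat_to_subset_sum`, the signed-integer clause encoding (`k` for $x_{k}$, `-k` for $\bar{x}_{k}$), and the dictionary labels are assumptions made here for readability.

```python
def three_sat_to_subset_sum(num_vars, clauses):
    """Build (numbers, target) from a 3-CNF formula.

    Assumed (hypothetical) encoding: each clause is a tuple of nonzero
    ints, where k stands for x_k and -k for NOT x_k, with k in 1..num_vars.
    """
    num_clauses = len(clauses)
    width = num_vars + num_clauses  # variable-digits left, clause-digits right

    def place(position):
        # Value of a single 1 at the given digit, counted from the left.
        return 10 ** (width - 1 - position)

    numbers = {}
    # Variable numbers: x_T and x_F for every variable.
    for v in range(1, num_vars + 1):
        for label, literal in (("T", v), ("F", -v)):
            value = place(v - 1)  # 1 at the variable-digit
            for j, clause in enumerate(clauses):
                if literal in clause:  # this literal satisfies clause j
                    value += place(num_vars + j)
            numbers[f"x{v}{label}"] = value

    # Fill-up numbers: two per clause, a single 1 at the clause-digit.
    for j in range(num_clauses):
        for copy in (1, 2):
            numbers[f"f{j + 1},{copy}"] = place(num_vars + j)

    # Target: 1 at every variable-digit, 3 at every clause-digit.
    target = sum(place(i) for i in range(num_vars))
    target += sum(3 * place(num_vars + j) for j in range(num_clauses))
    return numbers, target
```

The output has $2n + 2m$ numbers of $n + m$ digits each for $n$ variables and $m$ clauses, so the construction runs in polynomial time, as a reduction requires.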

Example:
Consider $F = (x_{1} \lor x_{2} \lor \bar{x}_{3}) \land (\bar{x}_{1} \lor x_{2} \lor x_{3})$ with variables $x_{1}, x_{2}, x_{3}$ and clauses $c_{1}, c_{2}$. We use 5 digits in total (3 variable-digits, 2 clause-digits).

Variable Numbers:

$x_{1, T} = 10010$ (satisfies $c_{1}$ )

$x_{1, F} = 10001$ (satisfies $c_{2}$ )

$x_{2, T} = 01011$ (satisfies $c_{1}, c_{2}$ )

$x_{2, F} = 01000$ (satisfies neither)

$x_{3, T} = 00101$ (satisfies $c_{2}$ )

$x_{3, F} = 00110$ (satisfies $c_{1}$ )

Fill-up Numbers:

For $c_{1}$ : $f_{1, 1} = 00010, f_{1, 2} = 00010$

For $c_{2}$ : $f_{2, 1} = 00001, f_{2, 2} = 00001$

Target Sum: $W = 11133$

Under the satisfying assignment $(x_{1} = F, x_{2} = T, x_{3} = F)$, we select $x_{1, F}$, $x_{2, T}$, and $x_{3, F}$. Their sum is $10001 + 01011 + 00110 = 11122$. To reach $W = 11133$, we simply include one fill-up number for $c_{1}$ ($00010$) and one for $c_{2}$ ($00001$).
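
The arithmetic of this example is small enough to check exhaustively. The following sketch assumes the ten numbers are stored as ordinary integers under shorthand labels chosen here (`x1T`, `f11`, and so on). It first recomputes the sums from the text, then tries all $2^{10}$ subsets and records which variable numbers appear in the subsets that hit $W$.

```python
from itertools import combinations

# The ten numbers of the example as ordinary integers (leading zeros dropped).
numbers = {
    "x1T": 10010, "x1F": 10001,
    "x2T": 1011, "x2F": 1000,
    "x3T": 101, "x3F": 110,
    "f11": 10, "f12": 10,
    "f21": 1, "f22": 1,
}
W = 11133

# The selection described in the text.
chosen = ["x1F", "x2T", "x3F", "f11", "f21"]
variable_part = sum(numbers[name] for name in chosen[:3])
total = sum(numbers[name] for name in chosen)

# Exhaustive check over all 2^10 subsets: collect the variable numbers
# used by every subset that sums to W.
assignments = set()
names = list(numbers)
for size in range(len(names) + 1):
    for subset in combinations(names, size):
        if sum(numbers[name] for name in subset) == W:
            assignments.add(tuple(n for n in subset if n.startswith("x")))

print(variable_part, total == W, len(assignments))  # 11122 True 6
```

The six recorded selections correspond to the six satisfying assignments of $F$. The two falsifying assignments ($x_{1} = F, x_{2} = F, x_{3} = T$ and $x_{1} = T, x_{2} = F, x_{3} = F$) leave a clause-digit at 0, which two fill-up numbers cannot raise to 3.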

Weak NP-hardness

The complexity of the subset sum problem depends heavily on the numerical encoding of its input.

Binary Encoding: When the numbers are encoded in binary, the problem is NP-hard.
Unary Encoding: If the numbers are encoded in unary (e.g., $5$ is encoded as 11111 rather than the binary 101), the input size becomes exponentially larger, and the dynamic programming algorithm runs in time polynomial in this unary input size. Hence, under unary encoding, the problem is in P.

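Problems that exhibit this specific complexity-theoretic behaviour (NP-hard in general, but solvable in polynomial time if inputs are given in unary) are called weakly NP-hard. An algorithm whose running time is polynomial in the unary input size, such as the dynamic programming algorithm, is called pseudo-polynomial.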
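
To make the size gap between the two encodings concrete, here is a small sketch. The helper names `binary_length` and `unary_length` are made up for this illustration, and separators between numbers are ignored for simplicity.

```python
def binary_length(values):
    # Total number of bits needed to write every value in binary.
    return sum(v.bit_length() for v in values)


def unary_length(values):
    # Total number of symbols needed to write every value in unary.
    return sum(values)


small = [3, 5, 8, 13]
large = [3, 5, 8, 2**40]

print(binary_length(small), unary_length(small))  # 13 29
print(binary_length(large), unary_length(large))  # 50 1099511627792
```

Replacing a single value by $2^{40}$ adds only 37 bits to the binary input, but it blows the unary input up to about $10^{12}$ symbols. An algorithm that is polynomial in the unary size can therefore still be exponential in the binary size.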

Practical Implications

We would never actually use the unary encoding to run the algorithm in the real world. Unary encoding simply acts as a formal, simple-to-state promise: every number in the input is polynomially bounded by the input size.

Therefore, a more practical reformulation is: Subset Sum is NP-hard in general, but lies in P if every number in the input is polynomially bounded by the input size. This scenario happens surprisingly often in practice, as the numerical values frequently represent bounded quantities like physical objects.

Reductions

Reduction to Knapsack Problem

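Given an instance $(P_{s}, W_{s})$ of the subset sum problem, construct an instance $(Q_{k}, W_{k}, P_{k})$ of the knapsack decision problem, where $Q_{k}$ is a multiset of (weight, profit) pairs, $W_{k}$ is the weight capacity, and $P_{k}$ is the profit threshold. A solution is a selection $S_{k} \subseteq Q_{k}$. The instance is built as follows:

for each $z \in P_{s}$, add a pair to $Q_{k}$: $Q_{k} \leftarrow Q_{k} \cup \{(z, z)\}$,
set $W_{k} \leftarrow W_{s}$,
set $P_{k} \leftarrow W_{s}$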

Thus:

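$\sum_{(w, p) \in S_{k}} w \leq W_{k} \land \sum_{(w, p) \in S_{k}} p \geq P_{k} \iff \sum_{z \in P'_{s}} z \leq W_{s} \land \sum_{z \in P'_{s}} z \geq W_{s} \iff \sum_{z \in P'_{s}} z = W_{s}$

Here $S_{k} \subseteq Q_{k}$ is the selected set of knapsack items and $P'_{s} \subseteq P_{s}$ is the corresponding subset of the subset sum instance.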

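From the above, it follows that the Knapsack problem is NP-hard. Since Knapsack can also be solved in pseudo-polynomial time by dynamic programming, it is weakly NP-hard as well.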
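
The reduction is short enough to write out directly. The sketch below uses hypothetical names (`subset_sum_to_knapsack`, `is_knapsack_solution`) and represents each knapsack item as a `(weight, profit)` tuple. It is meant to illustrate the mapping, not to solve either problem.

```python
def subset_sum_to_knapsack(values, target):
    # Every number z becomes an item with weight z and profit z.
    items = [(z, z) for z in values]
    capacity = target    # W_k <- W_s
    min_profit = target  # P_k <- W_s
    return items, capacity, min_profit


def is_knapsack_solution(selection, capacity, min_profit):
    # A selection is feasible if it fits the capacity and reaches the profit.
    total_weight = sum(w for w, _ in selection)
    total_profit = sum(p for _, p in selection)
    return total_weight <= capacity and total_profit >= min_profit
```

Because weight and profit coincide for every item, the two inequalities squeeze the selected sum to exactly the target. For example, with target 9 the selection `[(4, 4), (5, 5)]` is accepted, while `[(3, 3), (4, 4)]` is rejected because its profit is only 7.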

Approaches

Brute-force

A brute-force approach evaluates all possible subsets of $S$ and calculates their sum. Since there are $2^{n}$ possible subsets and computing the sum of a subset takes at most $O (n)$ operations, this approach is strictly exponential.

Time complexity: $O (n 2^{n})$
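
A minimal sketch of the brute-force approach, assuming the weights are given as a Python list of positive integers. The function name `subset_sum_brute_force` is chosen here for illustration. It enumerates subsets by size using `itertools.combinations` and returns the first subset that matches, or `None` if there is none.

```python
from itertools import combinations


def subset_sum_brute_force(weights, target):
    n = len(weights)
    # Enumerate index sets so that repeated values in the multiset are
    # treated as distinct elements.
    for size in range(n + 1):
        for indices in combinations(range(n), size):
            if sum(weights[i] for i in indices) == target:
                return [weights[i] for i in indices]
    return None


print(subset_sum_brute_force([3, 34, 4, 12, 5, 2], 9))   # [4, 5]
print(subset_sum_brute_force([3, 34, 4, 12, 5, 2], 30))  # None
```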

Dynamic Programming

The problem can be solved in pseudo-polynomial time using dynamic programming.

Let $DP[i, j]$ be a boolean value indicating whether a subset of the first $i$ elements $\{w_{1}, \dots, w_{i}\}$ can sum to exactly $j$. The state transition is defined as:

$DP[i, j] = \begin{cases} DP[i - 1, j] \lor DP[i - 1, j - w_{i}] & \text{if } j \geq w_{i} \\ DP[i - 1, j] & \text{if } j < w_{i} \end{cases}$

The base cases are $DP[0, 0] = \text{True}$ and $DP[0, j] = \text{False}$ for $j > 0$. The target sum $W$ is achievable if $DP[n, W]$ evaluates to $\text{True}$.

Time complexity: $O (nW)$

This time complexity is pseudo-polynomial because it scales linearly with the numeric magnitude of the target sum $W$, rather than polynomially with the number of bits required to represent $W$ in the input (which is $O(\log W)$ bits).
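
The recurrence translates almost literally into a table-filling routine. The sketch below assumes positive integer weights in a Python list and uses the illustrative name `subset_sum_dp`. It keeps the full $(n + 1) \times (W + 1)$ table to stay close to the definition, although a single row would suffice.

```python
def subset_sum_dp(weights, target):
    n = len(weights)
    # dp[i][j] is True iff some subset of the first i weights sums to j.
    dp = [[False] * (target + 1) for _ in range(n + 1)]
    dp[0][0] = True  # the empty subset sums to 0
    for i in range(1, n + 1):
        w = weights[i - 1]
        for j in range(target + 1):
            # Case 1: skip the i-th weight.
            dp[i][j] = dp[i - 1][j]
            # Case 2: take the i-th weight, if it fits.
            if j >= w and dp[i - 1][j - w]:
                dp[i][j] = True
    return dp[n][target]


print(subset_sum_dp([3, 34, 4, 12, 5, 2], 9))   # True
print(subset_sum_dp([3, 34, 4, 12, 5, 2], 30))  # False
```

The two nested loops perform $n \cdot (W + 1)$ constant-time updates, which matches the $O(nW)$ bound.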

Lukas' Notes
