Definition
Subset Sum Problem
Let be a multiset of positive integers and be a target sum. The subset sum problem is a decision problem asking whether there exists a subset such that the sum of the elements in is exactly :
NP-completeness
Let’s start from an instance of the 3-SAT problem (which is NP-complete). We construct numbers in base 10 to prevent carries. This reduction proves that SUBSET SUM is NP-hard (and since it is in NP, it is NP-complete).
Reduction Idea:
Digit Reservations:
- For each variable in the 3-SAT instance, reserve a dedicated “variable-digit” in the Subset Sum numbers. We can imagine these are on the left.
- For each clause in the 3-SAT instance, reserve a dedicated “clause-digit”. The “position” of that digit won’t matter at all, but let’s say they are on the right.
Variable Numbers: For each variable in the 3-SAT instance, we create two numbers and :
- has a
1at ‘s variable-digit and a1at the clause-digit of each clause satisfied by . It is0everywhere else.- has a
1at ‘s variable-digit and a1at the clause-digit of each clause satisfied by . It is0everywhere else.Target Sum (): Set the target sum to have a
1for each variable-digit and a3for each clause-digit.
Intuition: A satisfying SAT assignment will add up to a number that has (a)1in each variable-digit and (b)1,2, or3in each clause-digit.Fill-up Numbers: Finally, add 2 “fill-up numbers” for each clause-digit that each have a
1in their clause-digit and0everywhere else.
Intuition: A clause may be satisfied by 1, 2, or 3 true literals. This means its clause-digit might sum to 1, 2, or 3 from the variable numbers alone. Since the target sum requires exactly a 3 in every clause-digit, we need “fill-up” numbers to make up the difference (adding 2, 1, or 0 respectively) without altering the variable-digits.Correctness:
- () Satisfiable Subset Sum: Pick if is true, else . The variable-digits sum to exactly
1. Since the formula is satisfied, each clause-digit sums to 1, 2, or 3. By picking 2, 1, or 0 fill-up numbers respectively for each clause, every clause-digit reaches exactly3, matching .- () Subset Sum Satisfiable: To sum to
1at each variable-digit without carries, we must choose exactly one of or for every variable, yielding a valid truth assignment. Because there are only 2 fill-up numbers per clause, the variable numbers must contribute at least1to each clause-digit to reach the target of3. Thus, every clause has at least one true literal.Example:
Consider with variables and clauses . We use 5 digits in total (3 variable-digits, 2 clause-digits).
- Variable Numbers:
- (satisfies )
- (satisfies )
- (satisfies )
- (satisfies neither)
- (satisfies )
- (satisfies )
- Fill-up Numbers:
- For :
- For :
- Target Sum:
Under the satisfying assignment , we select . Their sum is . To reach , we must simply include one fill-up number for () and one for ().
Weak NP-hardness
The complexity of the subset sum problem depends heavily on the numerical encoding of its input.
- Binary Encoding: When the numbers are encoded in binary, the problem is NP-hard.
- Unary Encoding: If the numbers are encoded in unary (e.g., is encoded as
11111rather than the binary101), the input size becomes exponentially larger, and the dynamic programming algorithm runs in time polynomial to this unary input size. Hence, under unary encoding, the problem is in P.
Problems that exhibit this specific complexity-theoretic behaviour—NP-hard in general, but solvable in polynomial time if inputs are given in unary—are called weakly NP-hard (or pseudo-polynomial).
Practical Implications
We would never actually use the unary encoding to run the algorithm in the real world. Unary encoding simply acts as a formal, simple-to-state promise: every number in the input is polynomially bounded by the input size.
Therefore, a more practical reformulation is: Subset Sum is NP-hard in general, but lies in P if every number in the input is polynomially bounded by the input size. This scenario happens surprisingly often in practice, as the numerical values frequently represent bounded quantities like physical objects.
Reductions
Reduction to Knapsack Problem
Given an instance of the subset sum problem, construct an instance of the knapsack problem as follows:
- for each , add a pair to :
- set ,
- set
Thus:
From the above, it follows that the Knapsack problem is also weakly NP-hard.
Approaches
Brute-force
A brute-force approach evaluates all possible subsets of and calculates their sum. Since there are possible subsets and computing the sum of a subset takes at most operations, this approach is strictly exponential.
Dynamic Programming
The problem can be solved in pseudo-polynomial time using dynamic programming.
Let be a boolean value indicating whether a subset of the first elements can sum to exactly . The state transition is defined as:
The base cases are and for . The target sum is achievable if evaluates to .
This time complexity is pseudo-polynomial because it scales linearly with the numeric magnitude of the target sum , rather than polynomially with the number of bits required to represent in the input (which is bits).