Binary encoding represents a number $n$ in $\lfloor \log_2 n \rfloor + 1$ bits. In unary, the same number needs $n$ symbols. The gap between the two is exponential.
This compactness is usually treated as a harmless implementation detail, but it is not. It reshapes the boundary between tractable and intractable.
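A two-line check of the gap (the value is arbitrary):

```python
# Arbitrary value for illustration: one million takes 20 binary
# symbols but a million unary symbols.
n = 10 ** 6
print(n.bit_length())  # 20
print(n)               # 1000000
```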
The Hidden Exponential
A dynamic programming algorithm for the subset sum problem runs in $O(nT)$ time, where $n$ is the number of items and $T$ is the target value. Under unary encoding, $T$ is proportional to the input length, so the algorithm is ordinary polynomial time. Under binary encoding, $T$ can be exponentially larger than the input length, so the same algorithm is exponential. The problem is weakly NP-hard precisely because its hardness evaporates when numbers are small, which is exactly what unary encoding enforces.
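As a minimal sketch of this dynamic programme (the function name and interface are illustrative, not taken from a specific source):

```python
def subset_sum(items: list[int], target: int) -> bool:
    """Decide whether some subset of `items` sums to `target`.

    Builds the full (n+1) x (T+1) table: reach[i][s] is True iff
    some subset of the first i items sums to s. Time and space are
    O(n * T): polynomial in the value T, but exponential in the
    O(log T) bits that encode it.
    """
    n = len(items)
    reach = [[False] * (target + 1) for _ in range(n + 1)]
    reach[0][0] = True  # the empty subset sums to 0
    for i, a in enumerate(items, start=1):
        for s in range(target + 1):
            reach[i][s] = reach[i - 1][s] or (s >= a and reach[i - 1][s - a])
    return reach[n][target]
```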
A dynamic programming table with dimensions $(n+1) \times (T+1)$ contains exactly $(n+1)(T+1)$ cells. For an instance $x$, let
$$b_n = \lfloor \log_2 n \rfloor + 1, \qquad b_T = \lfloor \log_2 T \rfloor + 1$$
be the bit-lengths of the two numeric fields. Then
$$n \ge 2^{b_n - 1}, \qquad T \ge 2^{b_T - 1},$$
so the total number of cells is
$$(n+1)(T+1) > 2^{b_n - 1} \cdot 2^{b_T - 1} = 2^{b_n + b_T - 2}.$$
The algorithm is polynomial in the numeric values $n$ and $T$, yet exponential in the number of bits that encode them. The loops hide an exponential blow-up inside what looks like a simple double iteration.
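A back-of-the-envelope computation makes the blow-up concrete; the values are chosen arbitrarily:

```python
# Arbitrary illustrative values: the two numeric fields occupy only
# ~48 bits, yet the table needs over 10^14 cells.
n, T = 100, 2**40
b_n = n.bit_length()        # 7 bits to write n in binary
b_T = T.bit_length()        # 41 bits to write T in binary
cells = (n + 1) * (T + 1)   # dynamic programming table size
print(b_n + b_T)            # 48
print(cells)                # 111050674405477 -> ~1.1 * 10^14
```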
Bi Knapsack is a concrete instance of this pattern. Its dynamic programme uses a table $D[i][c_1][c_2]$ indexed by the item count and two capacities $C_1$ and $C_2$, so its running time is
$$O(n \cdot C_1 \cdot C_2).$$
This is polynomial in the numeric capacities, but not necessarily polynomial in the encoded input. For instances with $C_1 = C_2 = 2^k$, each capacity needs only $k+1$ bits, while the table has
$$(C_1 + 1)(C_2 + 1) > 2^{2k}$$
capacity states. This is the exact sense in which the dynamic programme is pseudo-polynomial.
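A sketch of such a two-capacity dynamic programme, assuming items are given as (value, weight1, weight2) triples; the data layout and names are assumptions for illustration:

```python
def bi_knapsack(items: list[tuple[int, int, int]], c1: int, c2: int) -> int:
    """Maximise total value subject to two capacity constraints.

    Time O(n * C1 * C2): polynomial in the capacities' values,
    exponential in their bit-lengths.
    """
    # best[a][b] = best value achievable with capacities a and b
    best = [[0] * (c2 + 1) for _ in range(c1 + 1)]
    for value, w1, w2 in items:
        # Iterate downwards so each item is used at most once.
        for a in range(c1, w1 - 1, -1):
            for b in range(c2, w2 - 1, -1):
                cand = best[a - w1][b - w2] + value
                if cand > best[a][b]:
                    best[a][b] = cand
    return best[c1][c2]
```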
When Encoding Does Not Matter
Not every problem is rescued by unary encoding. A strongly NP-hard problem remains intractable even when every numeric parameter is bounded by a polynomial in the input length. The hardness comes from combinatorial structure, not from numeric magnitude.
Consider the travelling salesman problem. Its input is a graph with $n$ vertices and edge weights. Even if all weights are restricted to $\{1, 2\}$ and written in unary, the problem is still NP-hard. The exponential search space of $n!$ permutations does not shrink when the numbers are small. No pseudo-polynomial algorithm is known, and none exists unless $P = NP$.
The weights are tiny, but the number of tours grows factorially. Unary encoding makes the weight labels longer, yet the permutation space is unchanged. The algorithm must still enumerate an exponential number of structures.
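A brute-force sketch makes this visible: the weights appear only inside the cost sum, while the loop count is fixed by the vertex count. Representing the graph as a full weight matrix is an assumption for illustration:

```python
from itertools import permutations

def tsp_brute_force(weights: list[list[int]]) -> int:
    """Return the cost of the cheapest tour over all permutations.

    Fixing vertex 0 as the start still leaves (n-1)! tours; the
    magnitude of the weights never enters the loop count.
    """
    n = len(weights)
    best = None
    for perm in permutations(range(1, n)):
        tour = (0, *perm, 0)
        cost = sum(weights[a][b] for a, b in zip(tour, tour[1:]))
        if best is None or cost < best:
            best = cost
    return best
```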
The Quantity Space
The clean dichotomy has a deeper cause. The question is whether the problem’s difficulty comes from iterating over a space whose size is a direct function of the numeric values themselves.
For the subset sum problem, the dynamic programming table has dimensions $(n+1) \times (T+1)$. The state space size is proportional to $T$ itself:
$$|S| = (n+1)(T+1) = \Theta(nT).$$
The function $T \mapsto |S|$ is essentially the identity on the numeric value. Because binary encoding compresses $T$ into $b_T = \Theta(\log T)$ bits, iterating over all values up to $T$ implicitly iterates over $2^{\Theta(b_T)}$ states. The exponential blow-up is hidden inside the loops.
For the travelling salesman problem, the search space is the set of all permutations of $n$ vertices:
$$|S| = n!$$
The edge weights are merely coefficients in the objective function. Restricting weights to $\{1, 2\}$ does not reduce $|S|$. The combinatorial explosion comes from the factorial of the vertex count, not from any numeric magnitude.
Even if we treat $n$ itself as a binary-encoded number with bit-length $b = \Theta(\log n)$, so that $n = 2^{\Theta(b)}$, the search space $n!$ is already super-polynomial in $n$. Under unary encoding the input length becomes $\Theta(n)$, yet $n!$ is still not polynomial in $n$. The encoding of the vertex count does not matter because the combinatorial explosion outruns the compression.
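One can verify numerically that the factorial outruns any fixed polynomial; the exponent 10 below is an arbitrary stand-in for "any fixed $k$":

```python
# For any fixed exponent k, n! eventually exceeds n**k, so even a
# unary input length of Theta(n) cannot make n! polynomial.
import math

k = 10  # arbitrary fixed exponent for illustration
n = 2
while math.factorial(n) <= n ** k:
    n += 1
print(n)  # 15: from here on, n! dominates n**10
```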
Uniform Polynomial Bounds
Fix an encoding of problem instances, and let $|x|$ be the length of instance $x$ under that encoding. An algorithm $A$ is polynomial-time if there exist fixed constants $c > 0$ and $k$ such that
$$\mathrm{time}_A(x) \le c \cdot |x|^k \quad \text{for all instances } x.$$
The encoding is fixed for the problem. It is not chosen separately for each instance. Polynomial-time solvability is stable under standard encodings whose lengths are polynomially related. Binary and unary encodings of numeric values are not polynomially equivalent, since a value $v$ has binary length $\Theta(\log v)$ but unary length $v$.
The exponent $k$ is chosen once for the whole algorithm. It cannot change with the instance. In particular, one cannot prove polynomial time by saying that, for each instance $x$, there exists some exponent $k_x$ with
$$\mathrm{time}_A(x) \le c \cdot |x|^{k_x}.$$
That would be meaningless: almost any finite runtime can be bounded this way by choosing a large enough instance-dependent exponent.
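A tiny computation shows why. For any runtime $t \ge 1$ and any input length $|x| \ge 2$, an instance-dependent exponent can be solved for exactly (the values below are hypothetical):

```python
# Instance-dependent exponents are vacuous: k_x = log(t) / log(|x|)
# makes t == |x| ** k_x hold exactly, whatever t is.
import math

def instance_exponent(runtime: float, input_len: int) -> float:
    return math.log(runtime) / math.log(input_len)

k_x = instance_exponent(2.0 ** 100, 100)
print(k_x)  # ~15.05: even 2^100 steps "fit" under |x| = 100
```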
Now let $T$ be a numeric quantity used by the algorithm, and define its binary length by
$$b_T = \lfloor \log_2 T \rfloor + 1.$$
Every positive integer satisfies $2^{b_T - 1} \le T < 2^{b_T}$. This alone says nothing about polynomial time. What matters is whether there is a fixed exponent $k$ such that
$$T \le c \cdot |x|^k \quad \text{for all instances } x.$$
If such a $k$ exists, then iterating over $T$ values can still be polynomial in the input length. If no such fixed $k$ exists, then iterating over $T$ values may be exponential in the input length.
Minimum Search
For minimum search, the input is an explicit array
$$x = \langle a_1, a_2, \ldots, a_n \rangle.$$
Here $T = n$, the number of entries scanned. Since the instance explicitly contains $n$ entries,
$$|x| \ge n.$$
So we can choose the fixed exponent $k = 1$:
$$T = n \le |x|^1.$$
A linear scan therefore satisfies
$$\mathrm{time}(x) = O(n) = O(|x|).$$
This is polynomial-time with the fixed exponent $k = 1$.
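For completeness, the scan itself, as a minimal sketch:

```python
def minimum(a: list[int]) -> int:
    """Minimum search: the iterated quantity T equals n = len(a),
    and n is at most the input length, so k = 1 works everywhere."""
    best = a[0]
    for v in a[1:]:     # exactly n - 1 comparisons
        if v < best:
            best = v
    return best
```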
Subset Sum
For subset sum, the input contains a target value $T$ as one binary field:
$$x = \langle a_1, a_2, \ldots, a_n;\; T \rangle.$$
Here $T$ is the quantity the algorithm iterates over. The field has length
$$b_T = \lfloor \log_2 T \rfloor + 1.$$
There is no fixed exponent $k$ such that
$$T \le c \cdot |x|^k \quad \text{for all instances } x.$$
For every fixed $k$, one can choose an instance with $T$ so large that
$$T > c \cdot |x|^k.$$
Equivalently, $b_T$ may be linear in $|x|$, so
$$T \ge 2^{b_T - 1} = 2^{\Omega(|x|)}$$
may be exponential in the input length. The dynamic programming algorithm creates $T + 1$ columns, so
$$\mathrm{time}(x) = \Omega(T),$$
which is not bounded by $c \cdot |x|^k$ for any fixed $k$.
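A concrete pathological instance, with arbitrarily chosen numbers, shows how short the input can be while the column count explodes:

```python
# Arbitrary illustrative instance: a handful of items plus a 65-bit
# target. The written input is under 100 bits, yet the DP table
# would need ~1.8 * 10^19 columns.
items = [3, 7, 11]
target = 2 ** 64
input_bits = sum(v.bit_length() for v in items) + target.bit_length()
columns = target + 1
print(input_bits)  # 74
print(columns)     # 18446744073709551617
```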
In sum: for minimum search, $T = n \le |x|$ with the fixed exponent $k = 1$; for subset sum, $T$ can be $2^{\Omega(|x|)}$, and no fixed exponent works. So the point is not that $b_T$ exists. It always exists. The point is whether the exponent bounding $T$ in terms of $|x|$ is a fixed constant independent of the instance.
Beyond NP-Hardness
This gives the clean dichotomy. For weakly NP-hard problems, the hardness lives in the encoding gap because the state space grows with the numeric value. For strongly NP-hard problems, the hardness lives in the combinatorial structure, and no amount of decompression can remove it.
This means the complexity class of a weakly NP-hard problem can depend on how its input is written. The hardness can come from numeric parameters being compressed efficiently. Remove the compression, and that source of hardness disappears.
Therefore, binary encoding is not just compact—it is the source of the exponential gap that makes weakly NP-hard problems intractable.