Star Sum Problem

Definition

Star Sum Problem

Let $G = (V, E)$ be a graph consisting of a set of vertex-disjoint stars and let $k \in \mathbb{Z}^{+}$ be a target integer. The star sum problem is a decision problem asking whether there exists a subset $S$ of the stars in $G$ such that the total number of vertices in the stars of $S$ is exactly $k$.

Example Instance: Consider a graph $G$ made of 4 disjoint stars ($S_{1}$ to $S_{4}$) with 1, 4, 3 and 6 vertices respectively, and a target integer $k = 7$. A valid solution subset would be $\{S_{2}, S_{3}\}$ since $4 + 3 = 7$, or $\{S_{1}, S_{4}\}$ since $1 + 6 = 7$.

Relation to Subset Sum

The star sum problem is essentially a graph-theoretic formulation of the subset sum problem. Because the stars are vertex-disjoint, the structural properties of the stars (such as their edges) are irrelevant; each star $S_{i}$ in $G$ can be abstracted as merely its vertex count $v_{i} = |V(S_{i})|$.

Consider a concrete instance in which the stars $S_{1}$, $S_{2}$, $S_{3}$ have vertex counts $2$, $3$, $5$ and we want to find a subset summing to $k = 7$. By choosing the integers $2$ and $5$ from the multiset $\{2, 3, 5\}$, we conceptually pick the stars $S_{1}$ and $S_{3}$, which together have exactly $2 + 5 = 7$ vertices.

Equivalence to Subset Sum Problem

The star sum problem is equivalent (w.r.t. polynomial time reduction) to the subset sum problem. Therefore, it is also NP-complete.

$\text{SUBSET SUM} \leq_{P} \text{STAR SUM}$

Let an instance of the subset sum problem be given by a multiset of positive integers $W = \{w_{1}, \dots, w_{n}\}$ and a target sum $k$.

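We construct a graph $G$ as follows: for each integer $w_{i} \in W$, create a star component with exactly $w_{i}$ vertices (the complete bipartite graph $K_{1, w_{i} - 1}$, or the single vertex $K_{1}$ if $w_{i} = 1$). All constructed stars are vertex-disjoint, and we keep the same target integer $k$. The graph has $\sum_{i} w_{i}$ vertices, so the construction takes time polynomial in the values of the $w_{i}$; this is polynomial in the input size when the integers are polynomially bounded or encoded in unary.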
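The construction can be sketched in a few lines. This is only an illustration: the function name `build_star_graph` and the representation of $G$ as a vertex count plus an edge list are assumptions made here, not part of the original text.

```python
def build_star_graph(weights):
    """Build one vertex-disjoint star per weight (hypothetical helper).

    Returns (num_vertices, edges, stars), where vertices are numbered
    0..num_vertices-1 and each star is listed as [centre, leaf, leaf, ...].
    """
    edges = []
    stars = []
    next_vertex = 0
    for w in weights:
        if w < 1:
            raise ValueError("each weight must be a positive integer")
        centre = next_vertex
        leaves = list(range(centre + 1, centre + w))  # w - 1 leaves
        edges.extend((centre, leaf) for leaf in leaves)
        stars.append([centre] + leaves)
        next_vertex += w  # the next star uses fresh vertices
    return next_vertex, edges, stars
```

A weight of $1$ yields a star with no edges, matching the $K_{1}$ case above.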

Correctness: Picking a subset of stars in $G$ whose vertices sum to exactly $k$ directly corresponds to picking a subset of integers in $W$ that sum to exactly $k$ .

$\text{STAR SUM} \leq_{P} \text{SUBSET SUM}$

Let an instance of the star sum problem be given by a graph $G$ consisting of $n$ vertex-disjoint stars $S_{1}, \dots, S_{n}$ and a target $k$ .

In polynomial time, count the number of vertices $v_{i} = |V(S_{i})|$ for each star $S_{i}$ in $G$. We construct a subset sum instance with the multiset $W = \{v_{1}, \dots, v_{n}\}$ and the same target sum $k$.
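As a rough sketch of this counting step, the following computes the size of every connected component, which for a graph of disjoint stars gives exactly the vertex counts $v_{i}$. The function name `star_sizes` and the input format (vertices numbered from 0, plus an edge list) are assumptions for illustration.

```python
from collections import defaultdict


def star_sizes(num_vertices, edges):
    """Return the vertex count of each connected component (hypothetical helper)."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    seen = [False] * num_vertices
    sizes = []
    for start in range(num_vertices):
        if seen[start]:
            continue
        # Depth-first search over the component containing `start`.
        seen[start] = True
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adjacency[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        sizes.append(size)
    return sizes
```

The search runs in time linear in the number of vertices and edges, so the reduction is polynomial in the size of $G$.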

Correctness: Picking a subset of integers from $W$ summing to exactly $k$ corresponds exactly to selecting the matching stars in $G$ to obtain exactly $k$ vertices.

Approaches

Since the problem maps directly onto the subset sum problem, the same algorithmic approaches apply:

Brute-force

A brute-force approach evaluates all $2^{n}$ subsets of the $n$ stars to check whether their vertex counts sum to exactly $k$.

Time complexity: $O(n 2^{n})$, since each of the $2^{n}$ subsets takes $O(n)$ time to sum.
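A minimal sketch of the brute-force search, assuming the stars have already been reduced to a list of vertex counts (the name `star_sum_brute_force` is chosen here for illustration):

```python
from itertools import combinations


def star_sum_brute_force(sizes, k):
    """Try every subset of stars; return indices of one that sums to k, else None."""
    indices = range(len(sizes))
    for r in range(len(sizes) + 1):
        for chosen in combinations(indices, r):
            if sum(sizes[i] for i in chosen) == k:
                return list(chosen)
    return None
```

On the example instance with sizes `[1, 4, 3, 6]` and `k = 7`, this returns the indices of $S_{1}$ and $S_{4}$ (0-based: `[0, 3]`), since subsets are tried in order of increasing size.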

Dynamic Programming

The problem can be solved in pseudo-polynomial time using dynamic programming.

Algorithm:
Given a collection of $n$ disjoint stars, we can evaluate all achievable vertex sums incrementally:

Initialisation: Start by marking all stars as “unprocessed” and initialise a set $L = \{0\}$ of integers; the sum $0$ corresponds to choosing no star.
Intuition: We will process the stars one by one and keep track of all vertex totals we could get by choosing any subset of the stars processed so far. Observe that while the number of possible choices of $S$ can be $2^{n}$, the number of distinct totals is at most one more than the total number of vertices in $G$.
Iteration: Repeatedly process unprocessed stars (in an arbitrary order) as follows:
- Choose an unprocessed star and read the number of its vertices (say it is $c$).
- Construct $L' = \{a + c \mid a \in L\}$, i.e. every element of $L$ shifted by $c$.
- Update $L$ by setting $L := L \cup L^{'}$ .
Termination: After all stars are processed, output YES if and only if $L$ contains $k$ .
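The steps above can be sketched directly with a Python set. The function name `star_sum_reachable` and the input as a list of vertex counts are assumptions for illustration.

```python
def star_sum_reachable(sizes, k):
    """Decide whether some subset of the star sizes sums to exactly k."""
    reachable = {0}  # L: the empty selection has 0 vertices
    for c in sizes:  # process the stars in the given order
        shifted = {a + c for a in reachable}  # L'
        reachable |= shifted  # L := L union L'
    return k in reachable
```

One possible refinement, not described in the text, is to discard sums larger than $k$ as they can never lead back to $k$; the sketch keeps them to stay close to the steps as stated.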

Visual Step-by-Step Example:
Consider $n = 3$ stars with vertex counts $\{2, 3, 5\}$ and target $k = 7$.
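A possible trace of the iteration on this instance, processing the stars in the given order (the labels $L_{0}, \dots, L_{3}$ for the set after each step are introduced here for illustration):

$L_{0} = \{0\}$
$c = 2$: $L' = \{2\}$, so $L_{1} = \{0, 2\}$
$c = 3$: $L' = \{3, 5\}$, so $L_{2} = \{0, 2, 3, 5\}$
$c = 5$: $L' = \{5, 7, 8, 10\}$, so $L_{3} = \{0, 2, 3, 5, 7, 8, 10\}$

Since $7 \in L_{3}$, the answer is YES.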

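Time complexity: $O(n^{3})$ under the following assumption. The size of $L$ is bounded by one plus the total number of vertices. Assuming each of the $n$ stars has $O(n)$ vertices, $|L|$ is $O(n^{2})$. We update $L$ once per star, i.e. $n$ times, and each update takes $O(|L|)$ time (for example with a hash set or a boolean array), yielding an $O(n^{3})$ runtime.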

Why is this Dynamic Programming?

While it may look like we are simply building a set, this procedure is fundamentally a dynamic programming approach. It incrementally constructs a table of boolean values where the state is “can we reach sum $j$ using a subset of the first $i$ stars?”.

Specifically, keeping track of the achievable sums in the set $L$ after processing $i$ stars is mathematically identical to storing a boolean array DP[i][j] (which is True if sum $j$ is achievable using the first $i$ stars). The transition $L := L \cup L'$ directly corresponds to the DP recurrence DP[i][j] = DP[i-1][j] OR DP[i-1][j-c], where $c$ is the size of the $i$-th star and the second term applies only when $j \geq c$. By only keeping track of the “True” values in a set $L$ (which bounds our runtime), we obtain a sparse representation of the DP table.
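For comparison, here is a sketch of the explicit table form of the same recurrence. The function name `star_sum_table` is an assumption, and the table is truncated at column $k$, since larger sums are not needed to answer the decision problem.

```python
def star_sum_table(sizes, k):
    """Fill dp[i][j]: True iff sum j is reachable using the first i stars."""
    n = len(sizes)
    dp = [[False] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = True  # with no stars, only the sum 0 is reachable
    for i in range(1, n + 1):
        c = sizes[i - 1]  # size of the i-th star
        for j in range(k + 1):
            skip = dp[i - 1][j]  # do not take star i
            take = j >= c and dp[i - 1][j - c]  # take star i
            dp[i][j] = skip or take
    return dp
```

The answer to the decision problem is then `dp[n][k]`; the set-based version stores only the column indices `j` whose entry in the current row is True.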

Constructing the Solution (Witness):

Once we know that the target sum $k$ is achievable (i.e., we get a YES answer), we typically want to reconstruct the actual subset $S$ that yields this sum. There are two general approaches that apply to almost all dynamic programming algorithms:

1. Backtracking: Once we find $k \in L$ at the very end, we backtrack through the computation steps to see which elements allowed us to obtain $k$. If we are at sum $u$ after processing a star of size $c$, the sum $u$ either comes from $u - c$ (meaning we included this star) or was already in $L$ before processing $c$ (meaning we skipped it). This requires remembering the contents of $L$ after each step.

2. Explicit Storage: Alternatively, for each intermediate sum $u$ in $L$, we can simply store one subset $S_{u}$ which sums up to $u$. Whenever we construct $L'$ by adding $c$ (from star $S_{i}$) to a previous sum $a \in L$, we also record $S_{a + c} = S_{a} \cup \{S_{i}\}$.

$L_{0} = \{0 \to \emptyset\}$
$L_{1} = \{0 \to \emptyset, 2 \to \{S_{1}\}\}$
$L_{2} = \{0 \to \emptyset, 2 \to \{S_{1}\}, 3 \to \{S_{2}\}, 5 \to \{S_{1}, S_{2}\}\}$
$L_{3} = \{\dots, 7 \to \{S_{1}, S_{3}\}, \dots\}$

This approach makes returning the final solution completely trivial (just look up the set for key $k$), at the cost of significantly higher memory consumption.
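Both reconstruction strategies can be sketched as follows. The function names, the 0-based star indices in the returned lists, and the choice to keep one snapshot of $L$ per step for backtracking are assumptions made for illustration.

```python
def star_sum_backtrack(sizes, k):
    """Backtracking: keep L after every step, then walk back from k."""
    history = [{0}]  # history[i] is L after processing the first i stars
    for c in sizes:
        previous = history[-1]
        history.append(previous | {a + c for a in previous})
    if k not in history[-1]:
        return None

    chosen = []
    u = k
    for i in range(len(sizes), 0, -1):
        if u in history[i - 1]:
            continue  # u was already reachable: star i was skipped
        chosen.append(i - 1)  # otherwise u came from u - c: star i was used
        u -= sizes[i - 1]
    chosen.reverse()
    return chosen


def star_sum_with_witness(sizes, k):
    """Explicit storage: map every reachable sum to one subset achieving it."""
    witness = {0: []}
    for i, c in enumerate(sizes):
        # Iterate over a snapshot so star i is added at most once per subset.
        for a, subset in list(witness.items()):
            if a + c not in witness:
                witness[a + c] = subset + [i]
    return witness.get(k)
```

For the sizes `[2, 3, 5]` and `k = 7`, both functions return `[0, 2]`, i.e. the stars $S_{1}$ and $S_{3}$. When several subsets reach $k$, the two strategies may return different valid witnesses.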

Lukas' Notes