Exercise Sheet 3

Exercise 1

Instruction

Recall the reduction from 3-SAT to Subset Sum presented in the second lecture. Assuming that the Exponential Time Hypothesis holds, what does this reduction imply for the possible running times of algorithms solving Subset Sum? In particular, could Subset Sum then admit an algorithm running in time:

$O^{*} (3^{∣ P ∣})$

$O^{*} (2^{0.5∣ P ∣ + 7})$

$O^{*} (2^{\sqrt{∣ P ∣} \cdot \log ∣ P ∣})$

Let the 3-SAT formula $F$ have $n$ variables and $m$ clauses. We want to reduce it to subset sum by creating a set $P$ and a target sum $T$ such that

$\exists P^{'} \subseteq P : \sum_{w \in P^{'}} w = T \iff F \text{ is satisfiable}$

To achieve this, we attach a quantity $q (c)$ to each clause $c$ of $F$ and build the instance so that every clause can reach a quantity of exactly $3$ precisely when $F$ is satisfiable.

We reduce $F$ to subset sum $(P, T)$ by constructing a set $P$ that contains

two numbers for each variable $x$ , i.e. $x_{T}$ and $x_{F}$
two fill-up numbers for each clause

Fill-up numbers

For each clause $c$ , let $q (c)$ count how many selected assignment numbers satisfy $c$ .

If $c$ is satisfied, then
$q (c) \in \{1, 2, 3\} .$
If $c$ is not satisfied, then
$q (c) = 0.$
The target asks for every clause digit to equal $3$ . Therefore, for each clause, we add two fill-up numbers with value $1$ in that clause digit and $0$ everywhere else.

These can “repair” the satisfied cases:
$1 + 1 + 1 = 3, 2 + 1 = 3, 3 = 3.$
But they cannot repair an unsatisfied clause:
$0 + 1 + 1 = 2 < 3.$
Thus the fill-up numbers make “at least one true literal” equivalent to “the clause digit can be filled to $3$ ”.

Hence, the constructed set $P$ has cardinality

$∣ P ∣ = 2 n + 2 m = O (n + m),$

since we create two numbers $x_{T}$ and $x_{F}$ for each variable, and two fill-up numbers for each clause. Thus $∣ P ∣$ is linearly bounded in the size of the original 3-SAT formula.

The target sum $T$ is built digit by digit. There is one digit for each variable and one digit for each clause. In every variable digit, $T$ has value $1$ , forcing the subset to choose exactly one of $x_{T}$ and $x_{F}$ . In every clause digit, $T$ has value $3$ , forcing the selected assignment numbers together with the fill-up numbers to fill each satisfied clause exactly to $3$ . Thus, if there are $n$ variables and $m$ clauses, the target has the form

$T = \underbrace{11 \dots 1}_{n \text{ variable digits}} \underbrace{33 \dots 3}_{m \text{ clause digits}}$

Shape of one element

Suppose there are $2$ variables and $2$ clauses. Then every number in $P$ has $2 + 2 = 4$ digits.

If setting $x_{1}$ to true satisfies only clause $C_{2}$ , then the corresponding assignment number is
$x_{1, T} = \underbrace{10}_{\text{variable digits}} \underbrace{01}_{\text{clause digits}} .$
The variable part says that this number belongs to $x_{1}$ . The clause part says that this truth choice contributes $1$ only to $C_{2}$ .

A fill-up number for $C_{1}$ would look like
$\underbrace{00}_{\text{variable digits}} \underbrace{10}_{\text{clause digits}},$
since it contributes only to the first clause digit and does not choose any variable value.
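The construction above can be sketched in Python. This is a minimal sketch under assumptions that are not part of the notes: literals are encoded as signed integers (`k` for $x_{k}$, `-k` for $\neg x_{k}$), digits are written in base 10, and the names `reduce_3sat_to_subset_sum` and `has_subset_with_sum` are hypothetical. $P$ is returned as a list, since the two fill-up numbers of a clause have the same value.

```python
from itertools import combinations


def reduce_3sat_to_subset_sum(n, clauses):
    """Build (P, T) from a 3-CNF formula.

    n       -- number of variables, numbered 1..n
    clauses -- list of clauses; each clause is a list of at most three
               non-zero ints (k means x_k, -k means not x_k)
    Numbers have n variable digits followed by m clause digits. In base 10
    no column can overflow: a clause column sums to at most 3 + 2 = 5.
    """
    m = len(clauses)
    width = n + m

    def number(digits):
        # digits maps a position (0 = leftmost digit) to its value
        return sum(d * 10 ** (width - 1 - pos) for pos, d in digits.items())

    P = []
    for var in range(1, n + 1):
        for literal in (var, -var):            # x_T first, then x_F
            digits = {var - 1: 1}
            for j, clause in enumerate(clauses):
                if literal in clause:
                    digits[n + j] = 1
            P.append(number(digits))
    for j in range(m):                         # two fill-up numbers per clause
        P.extend([number({n + j: 1})] * 2)

    target = {i: 1 for i in range(n)}
    target.update({n + j: 3 for j in range(m)})
    return P, number(target)


def has_subset_with_sum(P, T):
    """Exponential brute force, only meant for checking tiny instances."""
    return any(sum(c) == T
               for r in range(len(P) + 1)
               for c in combinations(P, r))


# C_1 = (not x_1 or x_2), C_2 = (x_1 or x_2): x_1 = true satisfies only C_2
P, T = reduce_3sat_to_subset_sum(2, [[-1, 2], [1, 2]])
print(P[0], T)                      # 1001 1133
print(has_subset_with_sum(P, T))    # True
```

The first printed number is the assignment number $x_{1, T}$ from the example above, and the list `P` has $2 n + 2 m = 8$ entries.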

If subset sum had an algorithm $A$ with running time $f (∣ P ∣)$ , then we could first reduce a 3-SAT formula to a subset-sum instance with $∣ P ∣ = O (n + m)$ and then run $A$ . This would give a 3-SAT algorithm with running time $f (O (n + m))$ , up to polynomial factors from the reduction.

If the exponential-time hypothesis (ETH) is true, then 3-SAT cannot be solved in time

$O^{*} (2^{o (n + m)}) = O (2^{o (n + m)} \cdot n^{d})$ for any constant $d$ ,

where $o (\cdot)$ is the small-o notation, i.e. strict asymptotic upper bound, and $O^{*} (\cdot)$ is the big-o-star notation, i.e. the asymptotic upper bound suppressing polynomial factors. Therefore, assuming ETH, the reduction above rules out any algorithm for subset sum whose running time is subexponential in $∣ P ∣$ .

Solution

Assuming ETH holds:

$O^{*} (3^{∣ P ∣})$ is allowed (i.e. not ruled out by ETH), since

$3^{∣ P ∣} = 2^{\log_{2} (3) ∣ P ∣} = 2^{O (∣ P ∣)},$
which is still single-exponential in $∣ P ∣$ , not subexponential.

$O^{*} (2^{0.5∣ P ∣ + 7})$ is also allowed, given that the exponent

$0.5∣ P ∣ + 7 = O (∣ P ∣)$
is still linear in $∣ P ∣$ . ETH does not forbid improving the constant in a linear exponent. That would be the role of SETH.

$O^{*} (2^{\sqrt{∣ P ∣} \cdot \log ∣ P ∣})$ is not allowed, as its exponent is sublinear:

$\sqrt{∣ P ∣} \cdot \log ∣ P ∣ = o (∣ P ∣),$
as
$\lim_{∣ P ∣ \to \infty} \frac{\sqrt{∣ P ∣} \log ∣ P ∣}{∣ P ∣} = \lim_{∣ P ∣ \to \infty} \frac{\sqrt{∣ P ∣} \log ∣ P ∣}{\sqrt{∣ P ∣} \cdot \sqrt{∣ P ∣}} = \lim_{∣ P ∣ \to \infty} \frac{\log ∣ P ∣}{\sqrt{∣ P ∣}} = 0,$
given that $\log ∣ P ∣ \ll \sqrt{∣ P ∣}$ . Since $∣ P ∣ = O (n + m)$ , this would give a 3-SAT algorithm running in
$O^{*} (2^{o (n + m)}),$
contradicting ETH.

Exercise 2

Instruction

Consider the following variant of the classical graph 4-colouring problem. In Equitable 4-Colouring, we are given a graph and are asked for a proper 4-colouring (i.e., one where neighbours do not receive the same colour) but where additionally each colour is used essentially the same number of times ( $\pm 1$ ). To avoid any confusion, a formal definition of the problem is provided below.

Equitable 4-Colouring
Input: A connected $n$ -vertex graph $G$
Question: Does there exist a partition of the vertices $V (G)$ of $G$ into four sets $A_{1}, A_{2}, A_{3}, A_{4}$ such that

(1) for each $1 \leq i \leq 4$ , $A_{i}$ is an independent set, and

(2) for each $1 \leq i \leq 4$ and $1 \leq j \leq 4$ , $∣ A_{i} ∣ \leq ∣ A_{j} ∣ + 1$ ?

EQUITABLE 4-COLOURING admits a simple brute-force exponential algorithm that runs in time $O^{*} (4^{n})$ and simply exhaustively enumerates all possible assignments of colours to the vertices (whereas for each assignment, one can check in polynomial time whether conditions (1) and (2) are satisfied).

The task is to design a single-exponential algorithm for EQUITABLE 4-COLOURING with a running time that has a better base of exponent than $4$ , i.e., one with running time $O^{*} (c^{n})$ for some $c < 4$ .

A 4-colouring is a function $f : V (G) \to \{1, 2, 3, 4\}$ . If we ignore all graph structure, then every vertex has four possible colours. Hence the naive search space has size

$∣ \{1, 2, 3, 4\} ∣^{n} = 4^{n} .$

So the base $4$ comes from treating every vertex as if all four colours were always possible. To improve the base, we order the search so that, for all but one vertex, at least one colour is already unavailable at the moment the vertex is coloured. The search then has running time $O^{*} (c^{n})$ for some $c < 4$ .

The proper-colouring condition says that neighbours must not have the same colour assignment, i.e.

$\forall uv \in E (G) : f (u) \neq f (v) .$

So if a vertex $v$ has an already-coloured neighbour $u$ , then $v$ is forbidden from using colour $f (u)$ . Since there are four colours, this leaves at most $4 - 1 = 3$ possible colours for $v$ .

Since $G$ is connected, a depth-first search (DFS) from an arbitrary root vertex $v_{1}$ reaches every vertex and yields a DFS tree spanning $G$ . Order the vertices by DFS discovery time:

$v_{1}, v_{2}, \dots, v_{n},$

where the DFS run takes polynomial time.

For every $i > 1$ , the vertex $v_{i}$ has a parent in the DFS tree. That parent appears earlier in the order. Hence

$\forall i > 1 : \exists j < i : v_{i} v_{j} \in E (G) .$

So when we colour $v_{i}$ , at least one of its neighbours is already coloured.

Further, we can fix the root’s colour arbitrarily, e.g.

$f (v_{1}) = 1,$

given that the names of the colours don’t matter, only the constraints they impose on the graph. Hence, we only have to try at most $3$ colours (all except the colour of the DFS parent) for each of the remaining $n - 1$ vertices. Therefore, with a fixed root, at most $3^{n - 1}$ colourings are enumerated, which gives a running time of

$O^{*} (3^{n - 1}) = O^{*} (3^{n}),$

which is single-exponential with base $c = 3 < 4$ .

For every completed colouring, we then check in polynomial time whether it is actually a valid solution. First, we check the proper-colouring condition, i.e. whether adjacent vertices receive different colours. Then we form the four colour classes $A_{1}, A_{2}, A_{3}, A_{4}$ and check the equitable size condition

$\forall\, 1 \leq i, j \leq 4 : ∣ A_{i} ∣ \leq ∣ A_{j} ∣ + 1.$

These checks only add a polynomial factor, so they are hidden by the $O^{*} (\cdot)$ notation.
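The enumeration described above can be sketched as follows. This is a minimal sketch, not a tuned implementation: the graph is assumed to be a connected adjacency dictionary, colours are numbered 0 to 3 instead of 1 to 4, and the function names `dfs_order` and `equitable_4_colouring` are hypothetical. Each non-root vertex is given an offset in $\{1, 2, 3\}$ relative to its DFS parent's colour, so exactly $3^{n - 1}$ candidates are generated.

```python
from itertools import product


def dfs_order(adj, root):
    """Return the DFS discovery order and the DFS-tree parent of each vertex."""
    order, parent = [], {root: None}

    def visit(v):
        order.append(v)
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                visit(w)

    visit(root)
    return order, parent


def equitable_4_colouring(adj):
    """Return an equitable 4-colouring as a dict vertex -> colour, or None."""
    order, parent = dfs_order(adj, next(iter(adj)))
    rest = order[1:]
    for offsets in product((1, 2, 3), repeat=len(rest)):
        colour = {order[0]: 0}                 # the root colour is fixed
        for v, off in zip(rest, offsets):
            # never reuse the parent's colour: 3 choices instead of 4
            colour[v] = (colour[parent[v]] + off) % 4
        proper = all(colour[u] != colour[w] for u in adj for w in adj[u])
        sizes = [list(colour.values()).count(c) for c in range(4)]
        if proper and max(sizes) - min(sizes) <= 1:
            return colour
    return None


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(equitable_4_colouring(cycle) is not None)    # True

# Star with 7 leaves: the centre is alone in its class, but 8 vertices
# would need four classes of size exactly 2.
star = {0: list(range(1, 8)), **{i: [0] for i in range(1, 8)}}
print(equitable_4_colouring(star))                 # None
```

The properness check still inspects every edge, because the parent offset only rules out conflicts along DFS-tree edges.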

Correctness

If the algorithm finds a solution, the produced partition
$V (G) = A_{1} \cup A_{2} \cup A_{3} \cup A_{4}$
satisfies that every $A_{i}$ is an independent set, because adjacent vertices have different colours.

It also satisfies the equitable condition, because the algorithm only accepts a colouring after checking that the four colour classes differ in size by at most one. In other words, for all $1 \leq i, j \leq 4$ , it verifies that
$∣ A_{i} ∣ \leq ∣ A_{j} ∣ + 1.$
Hence the accepted partition is not only a proper 4-colouring, but an equitable 4-colouring.

Conversely, suppose that $G$ has an equitable 4-colouring. Renaming colours changes neither properness nor the sizes of the colour classes, so we may rename the colours so that the root $v_{1}$ receives colour $1$ . Since the colouring is proper, every $v_{i}$ with $i > 1$ receives a colour different from that of its DFS parent, so this colouring is among the at most $3^{n - 1}$ colourings that the algorithm enumerates. Since it is proper and its colour classes satisfy the size condition, the algorithm accepts it.

Exercise 3

Instruction

Design a subexponential algorithm for SAT when the input is restricted to instances such that $m \leq \sqrt{n}$ , where $m$ is the number of clauses and $n$ the number of variables.

Given a SAT formula $F$ over $n$ variables and $m$ clauses of form

$F = C_{1} \land C_{2} \land \dots \land C_{m}$

with the promise that $m \leq \sqrt{n}$ , we want to find a subexponential algorithm, i.e. an algorithm that has a runtime of $2^{o (n)}$ .

A naive SAT algorithm enumerates all assignments

$α : \{x_{1}, x_{2}, \dots, x_{n}\} \to \{0, 1\},$

i.e. has a runtime of $O^{*} (2^{n})$ . However, we’re given an upper bound on the number of clauses, which means that it makes more sense to branch over clauses rather than variables.

A CNF formula $F$ is satisfiable iff there is an assignment under which every clause has at least one true literal. Further, each clause in $F$ has at most $2 n$ distinct literals, given that each variable has only two possible literals:

$x_{i}$ or $\neg x_{i}$ .

So the number of ways to choose one literal from each clause is at most

$\prod_{i = 1}^{m} ∣ C_{i} ∣ \leq (2 n)^{m} .$

Given $m \leq \sqrt{n}$ , this is at most

$(2 n)^{m} \leq (2 n)^{\sqrt{n}} .$

For every tuple $(ℓ_{1}, \dots, ℓ_{m}) \in C_{1} \times \dots \times C_{m}$ do the following:

Remove duplicate literals, if any.
Check whether the chosen literals are consistent, i.e. there must not be a variable such that $x$ and $\neg x$ are in the chosen literals.
If the chosen literals are consistent, assign all of them to true.

Example

If the chosen literals are $x_{1}, \neg x_{4}, x_{7}$ , then set
$x_{1} = true, x_{4} = false, x_{7} = true$

Assign all remaining variables arbitrarily.
Accept.

If no tuple of chosen literals is consistent, reject.

Boolean SubexpSAT(Formula F)

let F = C_1 ∧ C_2 ∧ ... ∧ C_m;

for each (ℓ_1, ℓ_2, ..., ℓ_m) ∈ (C_1 × C_2 × ... × C_m) {
    // α is reset for every tuple
    α := empty partial assignment;
    consistent := true;

    for i := 1 to m {
        // x is the variable underlying ℓ_i, so ℓ_i is either x or ¬x
        x := var(ℓ_i);

        if (ℓ_i = x) {
            required := true;
        }

        if (ℓ_i = ¬x) {
            required := false;
        }

        if (x ∈ dom(α)) {
            if (α(x) ≠ required) {
                consistent := false;
                break;
            }
        } else {
            α(x) := required;
        }
    }

    if (consistent = true) {
        return true;
    }
}

return false;
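A runnable counterpart of the pseudocode above, as a minimal sketch. The encoding is an assumption made for illustration: a clause is a list of non-zero integers, where `k` stands for $x_{k}$ and `-k` for $\neg x_{k}$, and the function name `few_clauses_sat` is hypothetical. Instead of only answering yes or no, the sketch returns the partial assignment built from the chosen literals.

```python
from itertools import product


def few_clauses_sat(clauses):
    """Return a partial satisfying assignment (dict var -> bool) or None.

    Enumerates one literal per clause, i.e. at most (2n)^m tuples, and
    accepts the first tuple whose literals do not contradict each other.
    Variables missing from the returned dict may be set arbitrarily.
    """
    for choice in product(*clauses):
        assignment = {}
        consistent = True
        for literal in choice:
            var, required = abs(literal), literal > 0
            # setdefault stores the value on first sight and otherwise
            # returns the value that an earlier literal already demanded
            if assignment.setdefault(var, required) != required:
                consistent = False
                break
        if consistent:
            return assignment
    return None


print(few_clauses_sat([[1, -4], [-4, 7], [1, 7]]) is not None)   # True
print(few_clauses_sat([[1], [-1]]))                              # None
```

An empty clause makes the product empty, so the function returns `None`, which matches the fact that such a formula is unsatisfiable.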

Correctness

Suppose the algorithm accepts. Then it found one chosen literal
$ℓ_{i} \in C_{i}$
for every clause $C_{i}$ , and all chosen literals are mutually consistent. So we can assign all chosen literals to true. Therefore every clause $C_{i}$ contains at least one true literal, namely $ℓ_{i}$ , i.e. the whole formula is satisfied.

Conversely, suppose $F$ is satisfiable. Let $α$ be a satisfying assignment. Since $α$ satisfies every clause, every clause $C_{i}$ contains at least one literal that is true under $α$ . Choose one such literal $ℓ_{i}$ from every clause. The tuple
$(ℓ_{1}, \dots, ℓ_{m})$
is consistent, because all chosen literals are true under the same assignment $α$ . The algorithm enumerates this tuple and accepts.

Therefore, the algorithm is correct.

Running Time

For each clause, there are at most $2 n$ possible literals. Hence, the number of tuples is at most $(2 n)^{m}$ . Using the promise $m \leq \sqrt{n}$ , we get
$(2 n)^{m} \leq (2 n)^{\sqrt{n}} = 2^{\sqrt{n} \log_{2} (2 n)} .$
Given that $\sqrt{n} \log n = o (n)$ , we have
$2^{\sqrt{n} \log_{2} (2 n)} = 2^{o (n)} .$
The consistency check for each tuple is polynomial time, so it is hidden inside $O^{*} (\cdot)$ . Thus the running time is
$O^{*} ((2 n)^{m}) \subseteq O^{*} ((2 n)^{\sqrt{n}}) = 2^{O (\sqrt{n} \log n)} = 2^{o (n)} .$
Therefore, SAT with $m \leq \sqrt{n}$ can be solved in $2^{o (n)}$ , i.e. subexponential time.

Exercise 4

Instruction

Consider an NP-complete graph problem $P$ . Assume $P$ admits a polynomial reduction from 3-SAT which transforms each $n$ -variable 3-SAT formula into an instance of $P$ with at most $n^{3}$ vertices.

Now, say we have an algorithm $A$ that solves any $z$ -vertex instance of $P$ . Under the Exponential Time Hypothesis, which of the following running times can we exclude for $A$ ?

$O^{*} (2^{(z^{4})})$

$O^{*} (2^{\sqrt{z}})$

$O^{*} (2^{z^{\log z}})$

$O (z^{7})$

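We are given an NP-complete graph problem $P$ with the following properties:

$P$ admits a polynomial-time reduction 3-SAT $\leq_{p} P$
- that takes a 3-SAT formula with $n$ variables
- and produces an instance of $P$ with at most $n^{3}$ vertices.
$A$ solves every $z$ -vertex instance of $P$ in time $T (z)$ .

So after the reduction, the number of vertices $z$ satisfies

$z \leq n^{3} .$

Running the reduction and then $A$ solves 3-SAT in time $T (n^{3})$ , up to polynomial factors (assuming $T$ is non-decreasing). If $T (n^{3}) = 2^{o (n)}$ , then 3-SAT can be solved in subexponential time, contradicting ETH. If $T (n^{3})$ is not of this form, the reduction yields no contradiction and the running time cannot be excluded this way.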

Solution

(1) $O^{*} (2^{z^{4}})$ : Given that

$z^{4} \leq (n^{3})^{4} = n^{12},$
we know that
$O^{*} (2^{z^{4}}) \subseteq O^{*} (2^{n^{12}}) .$
The exponent $n^{12}$ grows faster than linearly, so the resulting bound for 3-SAT is not subexponential and there is no contradiction: (1) is not excluded under ETH.

(2) $O^{*} (2^{\sqrt{z}})$ : Given that

$\sqrt{z} \leq \sqrt{n^{3}} = n^{1.5},$
it follows that
$O^{*} (2^{\sqrt{z}}) \subseteq O^{*} (2^{n^{1.5}}) .$
Again, the resulting bound for 3-SAT is not subexponential, since $n^{1.5}$ is not $o (n)$ , meaning that (2) is not excluded under ETH.

(3) $O^{*} (2^{z^{\log z}})$ : From

$z^{\log z} \leq (n^{3})^{\log n^{3}} = n^{9 \log n}$
it follows that
$O^{*} (2^{z^{\log z}}) \subseteq O^{*} (2^{n^{9 \log n}}) .$
Again, the resulting bound for 3-SAT is not subexponential, i.e., (3) is not excluded under ETH.

(4) $O (z^{7})$ : From

$z^{7} \leq (n^{3})^{7} = n^{21}$
it follows that
$O (z^{7}) \subseteq O (n^{21}) .$
Combined with the polynomial-time reduction, this would give a polynomial-time algorithm for 3-SAT. Since $n^{21} = 2^{21 \log_{2} n} = 2^{o (n)}$ , this contradicts ETH, i.e., (4) is excluded.

Exercise 5

Instruction

For each of the following basic scheduling algorithms, construct an example instance of $1 ∣ r_{j}; \overline{d_{j}} ∣ -$ where the algorithm fails.

Earliest Due Date (EDD) First

Earliest Release Date First

Shortest Processing Time First

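The notation

$1 ∣ r_{j}; \overline{d_{j}} ∣ -$

means that

we have one machine,
job $j$ can’t be processed before $r_{j}$ (its release time),
job $j$ must be completed by its deadline $\overline{d_{j}}$ (written $d_{j}$ below), and
we have no objective function, i.e. we only ask whether a feasible schedule exists.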

Earliest Due Date First

The Earliest Due Date First policy chooses, at a time $t$ , the available job $j$ with the earliest due date $d_{j}$ . As in the examples below, the machine is not left idle while a released job is waiting.

Take two jobs $j_{1}, j_{2}$ with:

$j_{1}$ :

$r_{j_{1}} = 0$

$p_{j_{1}} = 2$

$d_{j_{1}} = 4$

$j_{2}$ :

$r_{j_{2}} = 1$

$p_{j_{2}} = 1$

$d_{j_{2}} = 2$

At $t = 0$ , only $j_{1}$ can be chosen, because $j_{2}$ has not yet been released. Hence EDD chooses $j_{1}$ .

At $t = 2$ , only $j_{2}$ remains. But $j_{2}$ cannot meet its deadline anymore, since it would complete at time $3$ :
$C_{j_{2}} = 3 > 2 = d_{j_{2}} .$
This is not because the instance itself is infeasible. A feasible schedule is obtained by idling until $t = 1$ and then processing
$j_{2}, j_{1} .$

Earliest Release Date First

The Earliest Release Date First policy chooses, at a time $t$ , the available job $j$ with the earliest release date $r_{j}$ .

Take three jobs $j_{1}, j_{2}, j_{3}$ with:

$j_{1}$ :

$r_{j_{1}} = 0$

$p_{j_{1}} = 1$

$d_{j_{1}} = 1$

$j_{2}$ :

$r_{j_{2}} = 0$

$p_{j_{2}} = 3$

$d_{j_{2}} = 5$

$j_{3}$ :

$r_{j_{3}} = 1$

$p_{j_{3}} = 1$

$d_{j_{3}} = 3$

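At $t = 0$ , $j_{1}$ and $j_{2}$ are available with equal release times. We let the algorithm choose $j_{1}$ ; this is the best tie-breaking choice, since it meets the tight deadline
$d_{j_{1}} = 1.$
(Starting with $j_{2}$ instead would make $j_{1}$ complete at time $4 > 1$ , so the algorithm fails for either tie-breaking rule.)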

At $t = 1$ , the algorithm can choose between $j_{2}$ and $j_{3}$ . It chooses $j_{2}$ , because
$r_{j_{2}} = 0 < 1 = r_{j_{3}} .$

At $t = 4$ , only $j_{3}$ remains. It would complete too late, since
$C_{j_{3}} = 5 > 3 = d_{j_{3}} .$
This is not because the instance itself is infeasible. A feasible schedule is obtained by processing
$j_{1}, j_{3}, j_{2} .$

Shortest Processing Time First

The Shortest Processing Time First policy chooses, at a time $t$ , the available job $j$ with the shortest processing time $p_{j}$ .

Take two jobs $j_{1}, j_{2}$ with:

$j_{1}$ :

$r_{j_{1}} = 0$

$p_{j_{1}} = 2$

$d_{j_{1}} = 2$

$j_{2}$ :

$r_{j_{2}} = 0$

$p_{j_{2}} = 1$

$d_{j_{2}} = 3$

At $t = 0$ , both jobs are available. The algorithm chooses $j_{2}$ , since
$p_{j_{2}} = 1 < 2 = p_{j_{1}} .$

At $t = 1$ , only $j_{1}$ remains. It would complete too late, since
$C_{j_{1}} = 3 > 2 = d_{j_{1}} .$
This is not because the instance itself is infeasible. A feasible schedule is obtained by processing
$j_{1}, j_{2} .$
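The three counterexamples above can be replayed with a small simulator. This is a sketch under stated assumptions: the policies are modelled as non-preemptive, non-idling list scheduling (the machine idles only when no job has been released), ties go to the job listed first, and the names `greedy_schedule`, `EDD`, `ERD` and `SPT` are hypothetical.

```python
def greedy_schedule(jobs, priority):
    """Simulate a greedy policy on one machine.

    jobs     -- dict name -> (r, p, d): release time, processing time, deadline
    priority -- function (r, p, d) -> key; the available job with the
                smallest key is started next
    Returns the list of (name, completion_time) and whether every job
    finishes by its deadline.
    """
    t, remaining, schedule = 0, dict(jobs), []
    while remaining:
        available = [j for j, (r, _, _) in remaining.items() if r <= t]
        if not available:
            # nothing released yet: jump to the next release time
            t = min(r for r, _, _ in remaining.values())
            continue
        job = min(available, key=lambda name: priority(*remaining[name]))
        _, p, _ = remaining.pop(job)
        t += p
        schedule.append((job, t))
    feasible = all(c <= jobs[j][2] for j, c in schedule)
    return schedule, feasible


def EDD(r, p, d):
    return d


def ERD(r, p, d):
    return r


def SPT(r, p, d):
    return p


edd_instance = {"j1": (0, 2, 4), "j2": (1, 1, 2)}
erd_instance = {"j1": (0, 1, 1), "j2": (0, 3, 5), "j3": (1, 1, 3)}
spt_instance = {"j1": (0, 2, 2), "j2": (0, 1, 3)}

print(greedy_schedule(edd_instance, EDD))  # ([('j1', 2), ('j2', 3)], False)
print(greedy_schedule(erd_instance, ERD))  # ([('j1', 1), ('j2', 4), ('j3', 5)], False)
print(greedy_schedule(spt_instance, SPT))  # ([('j2', 1), ('j1', 3)], False)
```

On the third instance, the Earliest Due Date First policy happens to produce the feasible order $j_{1}, j_{2}$ , which can be checked with `greedy_schedule(spt_instance, EDD)`.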

Exercise 6

Instruction

Consider the following three schedules for $1∣∣ \sum C_{j}$ (i.e., the scheduling problem with one machine where the aim is to minimize the sum of completion times) for an instance with precisely 4 jobs $a, b, c, d$ :

$S_{1} : a, b, c, d$

$S_{2} : c, d, a, b$

$S_{3} : b, a, d, c$

Assume that the job durations satisfy $p_{a} + p_{b} = p_{c} + p_{d}$ . Under this assumption, answer the following two questions:

Do $S_{1}$ and $S_{2}$ always have the same sum of completion times?

Do $S_{1}$ and $S_{3}$ always have the same sum of completion times?

In each case, if the answer is yes then provide a proof sketch, and otherwise provide a counterexample.

The notation

$1 ∣∣ \sum C_{j}$

means that there’s one machine, no further constraints, and the objective is to minimize the total completion time.

Let’s take $S_{1}$ first. The completion times are

$C_{a} = p_{a}$ ,
$C_{b} = p_{a} + p_{b}$ ,
$C_{c} = p_{a} + p_{b} + p_{c}$ , and
$C_{d} = p_{a} + p_{b} + p_{c} + p_{d}$ .

So the total completion time for $S_{1}$ is

$\sum_{j} C_{j} = C_{a} + C_{b} + C_{c} + C_{d} = 4 p_{a} + 3 p_{b} + 2 p_{c} + p_{d} .$

For $S_{2}$ , the completion times are

$C_{c} = p_{c}$ ,
$C_{d} = p_{c} + p_{d}$ ,
$C_{a} = p_{c} + p_{d} + p_{a}$ , and
$C_{b} = p_{c} + p_{d} + p_{a} + p_{b}$ .

Thus, the total completion time for $S_{2}$ is

$\sum_{j} C_{j} = C_{c} + C_{d} + C_{a} + C_{b} = 4 p_{c} + 3 p_{d} + 2 p_{a} + p_{b} .$

For $S_{3}$ , the completion times are

$C_{b} = p_{b}$ ,
$C_{a} = p_{b} + p_{a}$ ,
$C_{d} = p_{b} + p_{a} + p_{d}$ , and
$C_{c} = p_{b} + p_{a} + p_{d} + p_{c}$ .

Thus, the total completion time for $S_{3}$ is

$\sum_{j} C_{j} = C_{b} + C_{a} + C_{d} + C_{c} = 4 p_{b} + 3 p_{a} + 2 p_{d} + p_{c} .$

(1) Do $S_{1}$ and $S_{2}$ always have the same sum of completion times?

If $S_{1}$ and $S_{2}$ have the same sum of completion times, then the difference of their sums should be $0$ .
$(4 p_{a} + 3 p_{b} + 2 p_{c} + p_{d}) - (4 p_{c} + 3 p_{d} + 2 p_{a} + p_{b}) = 2 p_{a} + 2 p_{b} - 2 p_{c} - 2 p_{d} = 2 (p_{a} + p_{b}) - 2 (p_{c} + p_{d})$
Given the assumption
$p_{a} + p_{b} = p_{c} + p_{d},$
we get
$2 (p_{a} + p_{b}) - 2 (p_{c} + p_{d}) = 2 (p_{a} + p_{b}) - 2 (p_{a} + p_{b}) = 0$
Therefore, under the assumption, $S_{1}$ and $S_{2}$ always have the same sum of completion times.

(2) Do $S_{1}$ and $S_{3}$ always have the same sum of completion times?

If $S_{1}$ and $S_{3}$ have the same sum of completion times, then the difference of their sums should be $0$ .
$(4 p_{a} + 3 p_{b} + 2 p_{c} + p_{d}) - (4 p_{b} + 3 p_{a} + 2 p_{d} + p_{c}) = p_{a} - p_{b} + p_{c} - p_{d} = (p_{a} + p_{c}) - (p_{b} + p_{d})$
Under the assumption above, it doesn’t follow that the difference of total completion times is $0$ . Therefore, $S_{1}$ and $S_{3}$ do not always have the same sum of completion times.

Counterexample

A counterexample is

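$p_{a} = 1, p_{b} = 3, p_{c} = 2, p_{d} = 2,$
which satisfies the assumption, since $p_{a} + p_{b} = 4 = p_{c} + p_{d}$ . Plugging those processing times into the equation gives
$(p_{a} + p_{c}) - (p_{b} + p_{d}) = (1 + 2) - (3 + 2) = 3 - 5 = -2 \neq 0.$
Indeed, $S_{1}$ has total completion time $1 + 4 + 6 + 8 = 19$ , whereas $S_{3}$ has total completion time $3 + 4 + 6 + 8 = 21$ .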
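Both answers can be checked numerically on the counterexample. This is a small sketch; the helper name `total_completion_time` is hypothetical, and a schedule is given as a string of job names.

```python
def total_completion_time(order, p):
    """Sum of completion times when the jobs run back to back in `order`."""
    t = total = 0
    for job in order:
        t += p[job]        # completion time of this job
        total += t
    return total


p = {"a": 1, "b": 3, "c": 2, "d": 2}      # p_a + p_b = p_c + p_d = 4

print(total_completion_time("abcd", p))   # S_1: 19
print(total_completion_time("cdab", p))   # S_2: 19
print(total_completion_time("badc", p))   # S_3: 21
```

$S_{1}$ and $S_{2}$ agree, as the proof sketch predicts, while $S_{1}$ and $S_{3}$ differ by $2$ .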
