Exercise Sheet 4

Exercise 1

Instruction

Consider the following scheduling problem: There is one machine, which is unavailable for a given period $[a, b]$ (i.e., no job can be scheduled during this time). The jobs are released at time $0$ and cannot be preempted. We want to compute a schedule that minimises lateness.

Is the problem above NP-hard or polynomial time solvable? In case of the former, provide a reduction. In case of the latter, provide an algorithm with a correctness argument and runtime analysis.

Minimising lateness means minimising the metric

$L_{\max} = \max_{j} \, (C_j - d_j),$

where $C_{j}$ is the completion time and $d_{j}$ is the deadline of job $j$ . So the schedule should minimise the maximum lateness of the jobs.

The problem is NP-hard. We reduce from subset sum by forcing an unavailable time interval $[T, T + 1]$. Given a subset sum instance

$S = \{s_1, \dots, s_n\} \subseteq \mathbb{N}, \quad T \in \mathbb{N},$ which asks whether some subset of $S$ sums to exactly $T$.

Let

$W = \sum_{i=1}^{n} s_i.$

Further, assume that $W$ is an upper bound of $T$, i.e., $T \leq W$. Otherwise, if $T > W$, the instance is immediately a no-instance, and we map it to any fixed no-instance of the scheduling problem.

Next, create one scheduling job $j_{i}$ for each number $s_{i}$ . Set the processing time to $s_{i}$ , the deadline to $W + 1$ , and the release time to $0$ , i.e.,

$p_{j_i} = s_i, \quad d_{j_i} = W + 1, \quad r_{j_i} = 0.$

We make the one machine we have unavailable during

$[a, b] = [T, T + 1],$

and no job can be preempted. We want to decide whether

$L_{\max} = \max_{j} \, (C_j - d_j) \leq 0,$

i.e., no job is late and thus must finish no later than the common deadline $W + 1$ .

This completes the reduction: the subset sum instance is mapped to the scheduling instance with jobs $j_{i}$ , common deadline $W + 1$ , and one unavailable interval $[T, T + 1]$ .

Correctness

The interval before the gap has length $T$ , and the interval after the gap has length $W - T$ . Since all deadlines are $W + 1$ , a schedule with $L_{max} \leq 0$ must fit all $W$ units of work into these two intervals. No further gaps are useful, since any idle time outside the unavailable interval only makes completion times later.

If the jobs scheduled before the unavailable interval have total processing time $P$ , then
$P \leq T .$
Therefore, the remaining work, i.e., the work on the right side of the gap, is $W - P$ . This work cannot be processed during $[T, T + 1]$ , because the machine is unavailable there. Hence, the earliest possible start time for the remaining work is $T + 1$ . Even if the machine works without idle time after the gap, it still needs $W - P$ time units to process the remaining jobs. Hence, the last completion time is at least
$T + 1 + (W - P) .$
To have $L_{max} \leq 0$ , every job must complete by the deadline $W + 1$ , hence:
$T + 1 + W - P \leq W + 1 ⟺ T \leq P .$
Above, we already said $P \leq T$ , so we have
$P \leq T \land T \leq P ⟺ P = T .$
So any schedule with $L_{max} \leq 0$ must completely fill the interval $[0, T]$ before the unavailability gap. The processing times of the jobs scheduled before the gap therefore form a subset of $S$ whose sum is exactly $T$.

First, suppose the subset sum instance has a subset $S' \subseteq S$ with
$\sum_{s_i \in S'} s_i = T.$
Schedule the corresponding jobs in $[0, T)$ , wait for the unavailable interval $[T, T + 1]$ , and then schedule the remaining jobs after time $T + 1$ . The remaining processing time is $W - T$ , so the last job completes at
$T + 1 + (W - T) = W + 1.$
Thus every job completes no later than the common deadline $W + 1$ . Therefore, the constructed scheduling instance has a schedule with
$L_{max} \leq 0.$
Conversely, suppose the constructed scheduling instance has a schedule with
$L_{max} \leq 0.$
Then all jobs complete by time $W + 1$ . Let $P$ be the total processing time of the jobs scheduled before the unavailable interval. As shown above, this implies $P = T$ . Since each processing time is exactly one number $s_{i}$ from the subset sum instance, the jobs before the gap define a subset $S^{'} \subseteq S$ with
$\sum_{s_i \in S'} s_i = P = T.$
So the original subset sum instance is a yes-instance exactly when the constructed scheduling instance admits a schedule with $L_{max} \leq 0$ .

The mapping is polynomial: it creates one job per number and computes only $W$ , $W + 1$ , and the unavailable interval $[T, T + 1]$ . Since the correctness proof shows equivalence with subset sum, deciding whether the constructed instance has a schedule with $L_{max} \leq 0$ is NP-hard. Therefore, computing a schedule that minimises maximum lateness for one machine with one unavailable interval is also NP-hard.
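To sanity-check the reduction on tiny inputs, the following Python sketch builds the scheduling instance from a subset sum instance and computes the optimal $L_{max}$ by brute force over all job orders. It assumes positive integers and $T \leq W$; the helper names `reduce_subset_sum`, `min_lmax` and `has_subset_sum` are hypothetical. For a fixed order, starting every job as early as possible (and moving it behind the gap if it would overlap $[T, T + 1]$) suffices, because this minimises every completion time for that order.

```python
from itertools import combinations, permutations


def reduce_subset_sum(numbers, target):
    """Hypothetical helper: map a subset sum instance (S, T) with T <= W to (jobs, gap)."""
    W = sum(numbers)
    jobs = [(s, W + 1) for s in numbers]  # (processing time p, due date d); all released at 0
    gap = (target, target + 1)            # machine unavailable during [T, T + 1]
    return jobs, gap


def min_lmax(jobs, gap):
    """Brute-force the optimal L_max over all job orders (exponential; tiny inputs only)."""
    a, b = gap
    best = None
    for order in permutations(jobs):
        t, lmax = 0, None
        for p, d in order:
            if t < b and t + p > a:  # job would overlap [a, b]: start it right after the gap
                t = b
            t += p                   # completion time of this job, no preemption
            lmax = t - d if lmax is None else max(lmax, t - d)
        best = lmax if best is None else min(best, lmax)
    return best


def has_subset_sum(numbers, target):
    """Brute-force subset sum: does some subset of numbers sum to target?"""
    return any(sum(c) == target
               for r in range(len(numbers) + 1)
               for c in combinations(numbers, r))


numbers = [3, 5, 7]
for T in range(sum(numbers) + 1):
    jobs, gap = reduce_subset_sum(numbers, T)
    # the last two columns should agree for every T <= W
    print(T, has_subset_sum(numbers, T), min_lmax(jobs, gap) <= 0)
```

This check is exponential and only illustrates the equivalence on small examples; it is not meant as a way to solve either problem.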

Exercise 2

Instruction

Consider the problem
$1 \mid \text{prmp} \mid \sum_j h_j(C_j),$
where $h_j$ is a function (that may be different for each job $j$). For example, if
$h_j(C_j) = C_j,$
then the problem is equivalent to
$1 \mid \text{prmp} \mid \sum_j C_j$
(i.e., minimising the sum of completion times when preemption is allowed). For another example, if
$h_j(C_j) = \begin{cases} 1 & \text{if } C_j > d_j, \\ 0 & \text{otherwise,} \end{cases}$
then the problem is equivalent to
$1 \mid \text{prmp} \mid \sum_j U_j$
(i.e., minimising the number of late jobs, or equivalently maximising throughput, when preemption is allowed).

First, show that if the functions $h_{j}$ are nondecreasing (i.e., for $x < y$ , $h_{j} (x) \leq h_{j} (y)$ ), there exists an optimal schedule that is nonpreemptive.

Second, answer the following question: “Does the result hold for arbitrary functions $h_{j}$?” Briefly justify your answer.

Let $h_j$ be a nondecreasing function for each job $j$, i.e.,

$h_j(x) \leq h_j(y) \quad \text{for all } x < y.$

In other words, finishing job $j$ earlier never increases its cost $h_j$.

An optimal schedule of a scheduling problem $α ∣ β ∣ γ$ is a solution where $γ$ is minimal. In the problem above, it means that

$γ = \sum_j h_j(C_j) = γ^{*},$

where $γ^{*}$ is the smallest (best) feasible objective.

(1)

Let $S^{*}$ be an optimal preemptive schedule with objective value $γ^{*}$, and index the jobs by their completion times in $S^{*}$:
$C_{j_1}^{S^*} \leq C_{j_2}^{S^*} \leq \dots \leq C_{j_n}^{S^*}.$
We construct a new schedule $S$ that processes the jobs non-preemptively and without idle time, starting at time $0$, in the order
$j_1, j_2, \dots, j_n.$
Therefore, the completion time of job $j_k$ in $S$ is the sum of the processing times of $j_k$ and all jobs before it:
$C_{j_k}^{S} = \sum_{i=1}^{k} p_{j_i}.$
In $S^{*}$, all jobs $j_1, \dots, j_k$ have completed by time $C_{j_k}^{S^*}$. Therefore, $S^{*}$ must have processed at least
$\sum_{i=1}^{k} p_{j_i}$
units of work by time $C_{j_k}^{S^*}$. Since there is only one machine, it processes at most one unit of work per unit of time starting at time $0$, i.e.:
$\sum_{i=1}^{k} p_{j_i} \leq C_{j_k}^{S^*}.$
Combining this with the completion times in $S$, we get
$C_{j_k}^{S} = \sum_{i=1}^{k} p_{j_i} \leq C_{j_k}^{S^*}.$
In other words, every job completes no later in $S$ than in $S^{*}$. Further, each $h_j$ is nondecreasing, thus:
$C_j^{S} \leq C_j^{S^*} \implies h_j(C_j^{S}) \leq h_j(C_j^{S^*}).$
Hence:
$γ = \sum_j h_j(C_j^{S}) \leq \sum_j h_j(C_j^{S^*}) = γ^{*}.$
By definition $γ^{*}$ is already optimal, i.e., the smallest feasible value. We conclude that
$γ = γ^{*},$
which means that both $S$ and $S^{*}$ are optimal. Given that $S$ is non-preemptive, there exists an optimal non-preemptive schedule $S$ .
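The construction in (1) can be sketched in Python as follows. The input format is an assumption: a preemptive schedule is given as a list of hypothetical `(job, start, end)` pieces. The sketch reads off the completion times $C_j^{S^*}$, orders the jobs by them, and runs the jobs back to back from time $0$. As a hypothetical $S^{*}$ it uses the preemptive schedule $j_1 : [0, 1], j_2 : [1, 3], j_1 : [3, 4]$ with $p_1 = p_2 = 2$.

```python
def completion_times(pieces):
    """Completion time of each job = end of its last piece in the preemptive schedule."""
    C = {}
    for job, start, end in pieces:
        C[job] = max(C.get(job, 0), end)
    return C


def to_nonpreemptive(pieces, p):
    """Run jobs back to back from time 0, ordered by their completion time in S*."""
    C_star = completion_times(pieces)
    order = sorted(C_star, key=C_star.get)  # ties keep their first-seen order
    C, t = {}, 0
    for job in order:
        t += p[job]  # C_{j_k}^S = p_{j_1} + ... + p_{j_k}
        C[job] = t
    return C


pieces = [(1, 0, 1), (2, 1, 3), (1, 3, 4)]  # j1: [0, 1], j2: [1, 3], j1: [3, 4]
p = {1: 2, 2: 2}
C_star = completion_times(pieces)
C = to_nonpreemptive(pieces, p)
print(C_star, C)  # {1: 4, 2: 3} {2: 2, 1: 4}
assert all(C[j] <= C_star[j] for j in p)  # no job finishes later in S than in S*
```

Since every completion time weakly decreases, any nondecreasing $h_j$ can only decrease as well, which is exactly the argument above.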

(2)

No. The following two-job instance is a counterexample.

Consider two jobs with
$p_{1} = p_{2} = 2.$
In any non-preemptive schedule without idle time, the completion times are either
$(C_1, C_2) = (2, 4) \quad \text{or} \quad (C_1, C_2) = (4, 2).$
With preemption, however, we can schedule
$j_{1} : [0, 1], j_{2} : [1, 3], j_{1} : [3, 4],$
which yields
$(C_{1}, C_{2}) = (4, 3) .$
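Now define (non-monotone) objective functions that reward exactly these preemptive completion times:
$h_1(C) = \begin{cases} 0 & \text{if } C = 4, \\ 1 & \text{otherwise,} \end{cases} \qquad h_2(C) = \begin{cases} 0 & \text{if } C = 3, \\ 1 & \text{otherwise.} \end{cases}$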
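The preemptive schedule has objective value $h_1(4) + h_2(3) = 0$. No non-preemptive schedule, even one with idle time, achieves $C_1 = 4$ and $C_2 = 3$ simultaneously: this would require $j_2$ to run during $[1, 3]$ and $j_1$ during $[2, 4]$, and these intervals overlap. Hence every non-preemptive schedule has objective value at least $1$. Thus, for arbitrary functions $h_j$, there need not exist an optimal non-preemptive schedule.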
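The counterexample can be checked numerically with a small Python sketch. It is an assumption of the sketch that it only enumerates integer start times up to a fixed horizon (10 here); the overlap argument above covers all other start times. The function `best_nonpreemptive` is a hypothetical name.

```python
def h1(C):
    return 0 if C == 4 else 1


def h2(C):
    return 0 if C == 3 else 1


def best_nonpreemptive(p1=2, p2=2, horizon=10):
    """Minimum of h1(C1) + h2(C2) over non-preemptive schedules with integer start times."""
    return min(
        h1(s1 + p1) + h2(s2 + p2)
        for s1 in range(horizon)
        for s2 in range(horizon)
        if s1 + p1 <= s2 or s2 + p2 <= s1  # the two jobs must not overlap
    )


print(best_nonpreemptive())  # 1
print(h1(4) + h2(3))         # 0, the preemptive schedule
```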

Exercise 3

Instruction

Let
$S = \texttt{abcdabc}.$
Assume
$a < b < c < d .$
Compute the prefix function, the suffix tree, and the suffix array of $S$ .

(Note: Some of the concepts in this exercise will be discussed in the lecture on 2 June.)

Prefix function

$i$	$S [0.. i]$	border	$π (i)$
$0$	`a`	$ε$	$0$
$1$	`ab`	$ε$	$0$
$2$	`abc`	$ε$	$0$
$3$	`abcd`	$ε$	$0$
$4$	`abcda`	`a`	$1$
$5$	`abcdab`	`ab`	$2$
$6$	`abcdabc`	`abc`	$3$
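The table can be reproduced with the standard linear-time prefix-function computation known from Knuth-Morris-Pratt. The following Python sketch uses 0-based indexing as in the table; the name `prefix_function` is hypothetical.

```python
def prefix_function(s):
    """pi[q] = length of the longest proper border of s[0..q]."""
    pi = [0] * len(s)
    k = 0  # length of the border currently being extended
    for q in range(1, len(s)):
        while k > 0 and s[q] != s[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if s[q] == s[k]:
            k += 1         # extend the border by one character
        pi[q] = k
    return pi


print(prefix_function("abcdabc"))  # [0, 0, 0, 0, 1, 2, 3]
```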

Suffix tree

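$S\$ = \texttt{abcdabc\$},$
i.e., we append a sentinel character `$` to $S$, with the sorting order
$\$ < a < b < c < d.$
So `$` is the smallest character, and the suffix `$` comes first in lexicographic order.

The sentinel also separates the suffixes cleanly: no suffix of $S\$$ is a prefix of another one, so every suffix ends in its own leaf. For example, `abc$` is not hidden inside `abcdabc$`, whereas without the sentinel, `abc` would be a prefix of `abcdabc`.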

First, list all suffixes of $S' = S\$$:

start	suffix	shared prefix
$0$	`abcdabc$`	$0, 4$: `abc`
$1$	`bcdabc$`	$1, 5$: `bc`
$2$	`cdabc$`	$2, 6$: `c`
$3$	`dabc$`	none
$4$	`abc$`	$0, 4$: `abc`
$5$	`bc$`	$1, 5$: `bc`
$6$	`c$`	$2, 6$: `c`
$7$	`$`	none

Only the suffixes starting at $3$ and $7$ share no prefix with any other suffix, so they become leaves directly below the root. The suffix tree is the compressed trie of all eight suffixes: from the root, the edges `abc`, `bc` and `c` each lead to an inner node with the two leaf edges `dabc$` and `$`, while the edges `dabc$` (start $3$) and `$` (start $7$) lead directly to leaves.

Each leaf stores the starting position of its suffix in $S\$$; for example, below the edge `abc`, the leaf edge `dabc$` ends in leaf $0$ and the leaf edge `$` ends in leaf $4$.
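The tree can be double-checked with a naive Python sketch: it inserts every suffix into an uncompressed trie and then merges chains of single-child nodes into edges labelled with whole substrings. This is a quadratic construction chosen for readability, not a linear-time algorithm such as Ukkonen's. It assumes the text ends with a unique sentinel such as `$`; nodes are plain dictionaries, a leaf is represented by the start index of its suffix, and the names `suffix_trie` and `compress` are hypothetical.

```python
def suffix_trie(text):
    """Uncompressed trie of all suffixes; the key None marks a leaf and stores its start."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node[None] = start  # unique sentinel => this node has no other children
    return root


def compress(node):
    """Merge chains of single-child nodes into edges labelled with whole substrings."""
    edges = {}
    for ch, child in node.items():
        if ch is None:
            continue
        label = ch
        while None not in child and len(child) == 1:  # unary inner node: extend the label
            (nxt, grandchild), = child.items()
            label += nxt
            child = grandchild
        # a leaf becomes its start index, a branching node becomes a nested dict
        edges[label] = child[None] if None in child else compress(child)
    return edges


tree = compress(suffix_trie("abcdabc$"))
print(tree)
```

Each key of the printed dictionary is an edge label, and nesting follows the tree described above.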

Suffix array

Sort the same suffixes lexicographically and assign a rank to each:

$S A$	suffix	rank
$7$	`$`	$0$
$4$	`abc$`	$1$
$0$	`abcdabc$`	$2$
$5$	`bc$`	$3$
$1$	`bcdabc$`	$4$
$6$	`c$`	$5$
$2$	`cdabc$`	$6$
$3$	`dabc$`	$7$

Thus, my suffix array is:
$SA = [7, 4, 0, 5, 1, 6, 2, 3].$
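For small strings, the suffix array can be checked by sorting the start positions by their suffixes, as in the following Python sketch (hypothetical name `suffix_array`). This naive approach takes $O(n^2 \log n)$ time in the worst case. It relies on Python's code-point order, where `$` sorts before the lowercase letters, matching $\$ < a < b < c < d$.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


sa = suffix_array("abcdabc$")
rank = {start: r for r, start in enumerate(sa)}  # inverse permutation: start -> rank
print(sa)  # [7, 4, 0, 5, 1, 6, 2, 3]
```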

Exercise 4

Instruction

In the lecture, the runtime analysis of the Knuth-Morris-Pratt Algorithm is an example of the sliding window technique. This technique can be applied in many problems with “linear” structure, as in the following example.

Suppose we are given as input a number $k$ and an array $(a_{1}, \dots, a_{n})$ sorted in increasing order. The task is to find $i, j \in \{1, \dots, n\}$, such that $a_{i} + a_{j} = k$.

The algorithm based on the sliding window technique for this problem is as follows. We start with $i = 1$ and $j = n$ and repeatedly perform the following procedure:

If $a_{i} + a_{j} < k$ , then we increase $i$ by $1$ .

If $a_{i} + a_{j} > k$ , then we decrease $j$ by $1$ .

We stop when $a_{i} + a_{j} = k$ , when $i > n$ , or when $j < 1$ . In the latter case, we conclude that no such $i$ and $j$ satisfy the requirement.

Your tasks for this exercise are:

Determine the running time of the algorithm above (in $O$ notation, but with providing a bound which is as tight as possible).

Argue that the algorithm correctly solves the problem.

(int, int) Search(Array a, int k)

assert a is sorted in increasing order

int n = |a|;

// 1-based indexing
int i = 1;
int j = n;

while i <= n and j >= 1 {
	int s = a[i] + a[j];
	if s == k {
		return (i, j);
	} else if s < k {
		// left-side is not big enough
		i += 1;
	} else {
		// right-side is too big
		assert s > k;
		j -= 1;
	}
}

return null;
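The pseudocode translates to the following runnable Python sketch (the name `search` is hypothetical). It keeps the 1-based indices $i$ and $j$ of the pseudocode and converts them to Python's 0-based indexing only when accessing `a`; like the pseudocode, it allows $i = j$.

```python
def search(a, k):
    """Two-pointer search on a sorted list; returns 1-based (i, j) or None."""
    n = len(a)
    i, j = 1, n
    while i <= n and j >= 1:
        s = a[i - 1] + a[j - 1]  # 1-based i, j mapped to 0-based list positions
        if s == k:
            return i, j
        elif s < k:
            i += 1  # a[i] is too small even with the largest remaining partner
        else:
            j -= 1  # a[j] is too large even with the smallest remaining partner
    return None


print(search([1, 3, 5, 8], 11))  # (2, 4)
```

Returning `None` plays the role of `null` in the pseudocode.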

Runtime Analysis

In every loop iteration, exactly one of the following happens (unless the algorithm returns immediately):

$i \leftarrow i + 1$ , or

$j \leftarrow j - 1$ .

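Each iteration performs only constant work, i.e., $Θ (1)$. The index $i$ only increases and can be incremented at most $n$ times before $i > n$, and $j$ only decreases and can be decremented at most $n$ times before $j < 1$, because inside the loop
$1 \leq i \leq n \quad \text{and} \quad 1 \leq j \leq n.$
Hence the loop runs at most $2n$ iterations, and for $n = |a|$ the running time satisfies
$T(n) \leq 2n \cdot O(1) = O(n).$
This bound is tight: for $a = (1, 2, \dots, n)$ and $k = 3n$, every sum satisfies $a_i + a_n \leq 2n < k$, so the algorithm increments $i$ exactly $n$ times before it stops. Hence the worst-case running time is $Θ(n)$.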
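To see the bounds concretely, the following Python sketch (hypothetical name `count_iterations`) runs the same loop as `Search` but counts the iterations instead of returning indices. On the worst-case input above it performs exactly $n$ iterations, and on any input it stays within $2n$.

```python
def count_iterations(a, k):
    """Same two-pointer loop as Search, but returns the number of loop iterations."""
    n = len(a)
    i, j, steps = 1, n, 0
    while i <= n and j >= 1:
        steps += 1
        s = a[i - 1] + a[j - 1]
        if s == k:
            break
        elif s < k:
            i += 1
        else:
            j -= 1
    return steps


n = 10
print(count_iterations(list(range(1, n + 1)), 3 * n))  # 10, i.e. n iterations
```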

Correctness

Consider an instance $(a, k)$, where $a$ is sorted in increasing order.

If the algorithm returns a pair $(i, j)$ , then it has checked
$a_{i} + a_{j} = k .$
Therefore, the returned pair is a correct solution.

Now suppose the algorithm returns $null$ . We show that every pointer move only discards candidates that cannot be part of a solution.

Sum too small

Assume that

$a_{i} + a_{j} < k .$
Since $a$ is sorted increasingly, every still available right index $q \leq j$ satisfies
$a_{q} \leq a_{j} .$
Hence
$a_{i} + a_{q} \leq a_{i} + a_{j} < k .$
So $a_{i}$ is too small even with the largest still available right component, which means that no pair using $i$ can be a solution anymore and that it is safe to increase $i$ .

Sum too large

Assume that

$a_{i} + a_{j} > k .$
Since $a$ is sorted increasingly, every still available left index $p \geq i$ satisfies
$a_{p} \geq a_{i} .$
Hence
$a_{p} + a_{j} \geq a_{i} + a_{j} > k .$
So $a_{j}$ is too large even with the smallest still available left component, which means that no pair using $j$ can be a solution anymore and that it is safe to decrease $j$ .

Thus, every step either returns a correct pair or removes only candidates that cannot be part of a solution. If the algorithm terminates with $null$ because $i > n$ or $j < 1$, every candidate pair has been removed, and each one was removed only because it cannot be a solution. Hence, no valid pair exists.

Therefore, the algorithm correctly solves the problem.

Exercise 5

Instruction

Given a string $S$ and a natural number $k \geq 2$ , describe an $O (∣ S ∣)$ -time algorithm to find the maximum border $B$ of $S$ , such that $∣ B ∣$ is a multiple of $k$ . Briefly justify the correctness and running time bound of your algorithm.

Given a string $S$ and a natural number $k \geq 2$, let $n = |S|$. We use the prefix function, where

$π(q) = \text{length of the longest proper border of } S[0..q].$

So $π(n - 1)$ is the length of the longest proper border of $S[0..n-1] = S$.

Let $ℓ = π (n - 1)$. If $ℓ$ is divisible by $k$, we can return the border $S[0..ℓ-1]$. Otherwise, we have to find a smaller border whose length is divisible by $k$.

Example

Let $S = \texttt{abcababcab}$. The longest border of $S$ is
$X = \texttt{abcab}, \quad ℓ = |X| = π(|S| - 1) = 5.$
If $k = 2$, then $5 \bmod 2 \neq 0$, so this border is not valid. We now need the next smaller border of $S$.

Since $S[0..ℓ-1]$ is the current border, every smaller border of $S$ must also be a border of $S[0..ℓ-1]$, and conversely every border of $S[0..ℓ-1]$ is also a border of $S$. Thus, the next smaller border of $S$ is the longest proper border of $S[0..ℓ-1]$:

$π(ℓ - 1) = \text{length of the longest proper border of } S[0..ℓ-1].$

Pseudocode

String MaximumBorder(String S, int k)

assert k >= 2

int n = |S|;
compute prefix function π for S

int ℓ = π(n-1);
while ℓ > 0 and ℓ mod k != 0 {
	ℓ = π(ℓ-1);
}

return S[0 .. ℓ-1];
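A runnable Python version of the pseudocode might look as follows. It is a sketch: it includes a standard prefix-function routine so that the block is self-contained, and it adds a guard for the empty string, which the pseudocode does not handle (there $π(n - 1)$ would be undefined). The names `prefix_function` and `maximum_border` are hypothetical.

```python
def prefix_function(s):
    """pi[q] = length of the longest proper border of s[0..q]."""
    pi = [0] * len(s)
    k = 0
    for q in range(1, len(s)):
        while k > 0 and s[q] != s[k]:
            k = pi[k - 1]
        if s[q] == s[k]:
            k += 1
        pi[q] = k
    return pi


def maximum_border(s, k):
    """Longest border of s whose length is a multiple of k (possibly empty)."""
    assert k >= 2
    if not s:
        return ""  # the empty string only has the empty border
    pi = prefix_function(s)
    length = pi[-1]  # length of the longest proper border of s
    while length > 0 and length % k != 0:
        length = pi[length - 1]  # jump to the next shorter border of s
    return s[:length]


print(maximum_border("abcababcab", 2))  # ab
```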

$S = abcababcab$ and $k = 2$

The borders of $S$ are
$ε, ab, abcab .$
The prefix function first gives
$ℓ = π (9) = 5.$
Since $5 \bmod 2 \neq 0$, the algorithm jumps to the next smaller border:
$ℓ \leftarrow π (4) = 2.$
Since $2 mod 2 = 0$ , it returns $S [0..1] = ab$ .

Correctness

The algorithm starts with $ℓ = π (n - 1)$ , which is the length of the longest border of $S$ .

Whenever the current $ℓ$ is not divisible by $k$ , the border $S [0.. ℓ - 1]$ cannot be the answer, and
$ℓ \leftarrow π (ℓ - 1)$
moves to the next smaller border of $S$ , because every smaller border of $S$ is also a border of the current border $S [0.. ℓ - 1]$ .

Thus, the loop visits the border lengths of $S$ from largest to smallest and stops at the first one divisible by $k$ , which is therefore maximal. If the loop reaches $ℓ = 0$ , no positive valid border exists, so returning $ε$ is correct.

Runtime Analysis

Computing the prefix function takes $O (n)$ time.

During the loop, $ℓ$ strictly decreases because $π (ℓ - 1) < ℓ$ . Thus, the loop performs at most $n$ iterations, and each iteration does constant work.

Hence, the total running time is $O (n)$ .

Exercise 6

Instruction

For a string $S$ , we want to find the length of the longest subsequence that has at least two distinct occurrences in $S$ . Here, two occurrences of a subsequence are distinct, if they differ in at least one index. More formally, we want to find the length of a longest string $T$ , such that for some
$0 \leq i_{0} < \dots < i_{∣ T ∣ - 1} \leq ∣ S ∣ - 1$
and
$0 \leq j_{0} < \dots < j_{∣ T ∣ - 1} \leq ∣ S ∣ - 1,$

$T[ℓ] = S[i_{ℓ}] = S[j_{ℓ}]$ for all $ℓ \in \{0, \dots, |T| - 1\}$ (i.e., these indices encode the same subsequence); and

there exists $ℓ \in \{0, \dots, |T| - 1\}$ such that $i_{ℓ} \neq j_{ℓ}$ (i.e., the two occurrences are distinct).

Describe an $O (∣ S ∣^{2})$ -time algorithm to solve this problem. Briefly justify the correctness and running time bound of your algorithm.

Lukas' Notes
