Recently I dug up some notes from 2 years ago, about a problem I encountered at a previous job: a responsive layout challenge for a simple visualization type, the pie chart. Here’s a quick demo:

- Try resizing the chart area (make it smaller)
- Notice that not all pie slices have callout labels displayed
- Labels are dynamically removed if the available area is too small

The code for this d3 chart is adapted from: http://bl.ocks.org/dbuezas/9306799

We had found a pretty good divide and conquer solution, and iterated on it quite a bit. However, going through my notes, I came up with another formulation for this problem, albeit with worse time complexity. The new idea is to model it as a graph, and I find it quite neat (though I am certain this problem and its more generalized forms have been well covered in the CS literature).

I wrote this version of the solution in `rust` and `WebAssembly`, and it is used in the demo above.

For a simple pie chart like the one above, we have two columns of callout labels. Notice that on each side, labels are aligned horizontally (either left- or right-aligned). For either side, we want the following:

- Labels should never overlap, regardless of available screen real estate (height-wise)
- We’d like to keep as many “important” labels as possible - by some criterion of importance

In the context of pie charts, it makes sense that we would prefer labels for slices with larger arcs. People usually want to know about the components with the largest shares of the total. Alternatively, we may want to ignore each label’s share of the pie, and simply aim to maximize the number of labels kept.

In both situations, we could provide a weight function for each label (using integers as weights for simplicity): `w(x) ∈ ℤ⁺`. For maximizing the remaining number of labels, this function is simply a constant function: `w(x) = 1`. The inputs (labels) can be represented as a set of 1-d intervals, corresponding to their y coordinates. Given a set of weighted labels `S`, we want to find a non-overlapping subset `K ⊆ S` maximizing the total weight:

`K = argmax_{K ⊆ S} Σ_{x ∈ K} w(x)`

where

`∀ a, b ∈ K, a ≠ b ⟹ a ∦ b`

Here we define the operator `∦` as the non-overlapping binary relation: `a ∦ b ⟺ a.upper ≤ b.lower ∨ b.upper ≤ a.lower`. We do not require the intersection of two intervals to be empty for them to be considered non-overlapping; they could share a single boundary point (e.g. `[0, 1]` and `[1, 2]` are considered non-overlapping).
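This relation translates directly into code. A minimal sketch, where the bare `(lower, upper)` tuple representation is just for illustration:

```rust
/// Two 1-d intervals are non-overlapping when one ends
/// at or before the other begins (shared endpoints allowed).
fn non_overlapping(a: (f64, f64), b: (f64, f64)) -> bool {
    a.1 <= b.0 || b.1 <= a.0
}

fn main() {
    assert!(non_overlapping((0.0, 1.0), (1.0, 2.0))); // shared boundary point is fine
    assert!(!non_overlapping((0.0, 1.0), (0.5, 1.5))); // genuine overlap
    println!("ok");
}
```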

The operator `∦` defined here is not a very nice binary relation. It is symmetric, that is, `a ∦ b ⟹ b ∦ a`. But it is not transitive: `a ∦ b` and `b ∦ c` do not imply `a ∦ c`. Here’s an example:

`[0, 1]` does not overlap with `[2, 3]`, and `[2, 3]` does not overlap with `[0.5, 1.5]`, but `[0, 1]` does overlap with `[0.5, 1.5]`.

On the other hand, we could discard symmetry, and try to come up with another relation for intervals that is in fact transitive. This is also very straightforward; let’s call this new relation `≺`, defined as:

`a ≺ b ⟺ a.upper ≤ b.lower`

Clearly, `a ≺ b ⟹ a ∦ b`. Also, `a ∦ b ⟹ a ≺ b ∨ b ≺ a`. The relation states that the right-hand-side interval strictly follows the left-hand side without any overlap. And it’s easy to verify that `≺` is transitive. Furthermore, `≺` is not defined for two overlapping intervals.

What’s good about `≺`? Well, it is a strict partial order:

- It’s *irreflexive*, that is, `a ≺ a` never holds (an interval always overlaps with itself)
- It is *antisymmetric*: if `a ≺ b`, then `b ≺ a` must not hold
- It is transitive

A partial order relation is closely tied to DAGs (directed acyclic graphs). Any finite strictly partially ordered set can be represented as a DAG. Note that our input intervals under `≺` are exactly this. Recall the desired output must satisfy:

`∀ a, b ∈ K, a ≠ b ⟹ a ∦ b`

Expressed using `≺`:

`∀ a, b ∈ K, a ≠ b ⟹ a ≺ b ∨ b ≺ a`

Looking at it this way, we are basically looking for a chain - a strictly totally ordered subset of the input partially ordered set. Stated in graph terms, we are looking for a *path* in a given DAG.

With this realization, now it’s time to construct this DAG:

- Each node in the graph is an interval; it has a non-negative weight
- Each edge corresponds to the `≺` relation, that is, `node_i` connects to `node_j` if and only if `node_i ≺ node_j`
- In addition, we add two dummy nodes:
  - `start_node`, which has weight 0, and connects to all the nodes in the input set
  - `end_node`, which has weight 0, and all input nodes connect to it
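As a sketch of this construction - the `Interval` and `Node` shapes below are simplified stand-ins for illustration, not the structs used later, with index 0 assumed to be `start_node` and index `n + 1` to be `end_node`:

```rust
struct Interval { lower: f64, upper: f64, weight: u32 }
struct Node { weight: u32, successors: Vec<usize> }

fn build_graph(intervals: &[Interval]) -> Vec<Node> {
    let n = intervals.len();
    let mut nodes = Vec::with_capacity(n + 2);
    // node 0 = start_node: weight 0, edges to every input node
    nodes.push(Node { weight: 0, successors: (1..=n).collect() });
    for a in intervals {
        let mut successors: Vec<usize> = intervals
            .iter()
            .enumerate()
            .filter(|(_, b)| a.upper <= b.lower) // edge iff a ≺ b
            .map(|(j, _)| j + 1)
            .collect();
        successors.push(n + 1); // every input node also connects to end_node
        nodes.push(Node { weight: a.weight, successors });
    }
    // node n + 1 = end_node: weight 0, no outgoing edges
    nodes.push(Node { weight: 0, successors: vec![] });
    nodes
}

fn main() {
    let g = build_graph(&[
        Interval { lower: 0.0, upper: 1.0, weight: 2 },
        Interval { lower: 2.0, upper: 3.0, weight: 3 },
    ]);
    assert_eq!(g.len(), 4);
    assert_eq!(g[1].successors, vec![2, 3]); // [0,1] ≺ [2,3], plus end_node
    assert_eq!(g[2].successors, vec![3]);    // only end_node
    println!("ok");
}
```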

To illustrate, this set of three labels:

Translates into this graph:

The problem becomes finding a path from `start_node` to `end_node` with the largest sum of node weights - a variant of the longest path problem.

A quick google search for “longest path graph” points to this link: http://www.mathcs.emory.edu/~cheung/Courses/171/Syllabus/11-Graph/Docs/longest-path-in-dag.pdf

It gives a very succinct solution, keyword *dynamic programming*:

- Conduct a topological sort on the graph, to find a linearized order of its nodes
- Finding a path with the largest node weights ending at `node_i` requires first finding such paths ending at all its predecessors `node_j`, for `j` before `i` in the linearized order
- Pick the predecessor `node_j` that has the largest accumulated weight `w_j`, and mark the total weight up to `node_i` as `w_j + weight(node_i)`
- Continue until reaching `end_node`

Pausing for a second, it’s easy to see that this is essentially the same dynamic programming solution as for the longest increasing subsequence problem - a popular algorithm tutorial topic. There is a more efficient `O(n log n)` solution, but it’s a bit over my head. It would be interesting to know if it could be adapted for the label layout challenge at hand.
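For reference, here is a sketch of that classic `O(n log n)` approach for the plain, unweighted case (not adapted to weights or intervals): it maintains, for each length, the smallest tail value of any increasing subsequence of that length.

```rust
/// Length of the longest strictly increasing subsequence, in O(n log n).
fn lis_length(xs: &[i32]) -> usize {
    // tails[k] = smallest possible tail of an increasing subsequence of length k + 1
    let mut tails: Vec<i32> = Vec::new();
    for &x in xs {
        // binary search: first position whose tail is >= x
        let i = tails.partition_point(|&t| t < x);
        if i == tails.len() {
            tails.push(x); // x extends the longest subsequence found so far
        } else {
            tails[i] = x; // x provides a smaller tail for length i + 1
        }
    }
    tails.len()
}

fn main() {
    assert_eq!(lis_length(&[10, 9, 2, 5, 3, 7, 101, 18]), 4); // e.g. 2, 3, 7, 18
    println!("ok");
}
```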

I chose rust for implementing this for two reasons:

- I have been learning rust for quite a while now; it’s fun and addictive
- rust has a `PartialOrd` trait that’s particularly suitable for expressing this problem

To illustrate the second point, I used this trait to represent an input label/weighted interval:

```
trait Weighted: PartialOrd {
    fn weight(&self) -> u32;
}
```

Three lines of code and that’s it. All the solution code just needs to work with this trait in a generic manner, and does not even need to have any notion of intervals or labels.

Here is the entry function; it returns a list of indices to keep:

```
fn solve<W: Weighted>(inputs: &[W]) -> Vec<usize> {
    let g = build_graph(inputs);
    g.longest_path()
}

struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    fn longest_path(mut self) -> Vec<usize> {
        //...
    }
}

fn build_graph<W: Weighted>(inputs: &[W]) -> Graph {
    // ..
}
```

`Graph` is a simple adjacency list representation storing a `Vec` of nodes, referencing each node by its index.

The `Node` struct:

```
struct Node {
    predecessors: IntHashSet<usize>,
    weight: u32,
    dist: u32,
    path_predecessor: Option<usize>,
}
```

Field `predecessors` is a set of indices of the node’s predecessor nodes. I am using an `IntHashSet` here not for any performance reason, but to make certain parts of the code a little less verbose; I suspect a plain `Vec` might actually perform better for realistic workloads.

All nodes’ `dist` fields are initialized to 0, and their `path_predecessor` to `None`. Calling the `longest_path` method consumes the graph, runs the dynamic programming loop, and populates these fields for each node in linear (`O(V + E)`) time.

I have tried a few graph algorithm crates for this task, and ended up picking `pathfinding`, as it is the easiest to use and has an unassuming API - just a plain function, not requiring any specific graph representation or trait implementations.

These few lines of code do the job:

```
impl Graph {
    fn longest_path(mut self) -> Vec<usize> {
        //...
        let mut sorted = topological_sort(&node_ids, |node_id| {
            let node = &self.nodes[*node_id];
            node.predecessors.iter().map(|id| *id)
        })
        .unwrap();
        sorted.reverse();
        // dynamic programming loop
        for i in 0..n {
            self.mark_longest_path_for(sorted[i]);
        }
        // ...
    }
}
```

This is the method solving a single sub-problem: assigning a `path_predecessor` for a single node, and updating its accumulated path weight (`node.dist`).

```
impl Graph {
    fn mark_longest_path_for(&mut self, node_id: usize) {
        // start_node
        if node_id == 0 {
            self.nodes[0].dist = 0;
            self.nodes[0].path_predecessor = None;
            return;
        }
        let pred_dist;
        let path_predecessor;
        {
            let (i, pred_node) = self
                .predecessors(node_id)
                // pick the predecessor with the largest dist
                .max_by_key(|(_, node)| node.dist)
                .unwrap(); // assume all sub-problems before have been solved
            pred_dist = pred_node.dist;
            path_predecessor = i;
        }
        let node = &mut self.nodes[node_id];
        let weight = node.weight;
        node.dist = pred_dist + weight;
        node.path_predecessor = Some(path_predecessor);
    }
}
```

Finally, with the functions ready, we just need to implement `PartialOrd` and `Weighted` for intervals. First, a simple representation of `Interval`:

```
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct Interval {
    lower: f64,
    upper: f64,
    weight: u32,
}
```

There is no `StrictPartialOrd` trait in `rust`, thus we settle for `PartialOrd`, which requires `PartialEq` - thankfully that is auto-`derive`-able. Next, we provide our `partial_cmp` implementation:

```
use std::cmp::Ordering::{self, Equal, Greater, Less};

impl PartialOrd for Interval {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match self.upper.partial_cmp(&other.lower) {
            Some(Less) | Some(Equal) => return Some(Less),
            _ => {}
        };
        match other.upper.partial_cmp(&self.lower) {
            Some(Less) | Some(Equal) => return Some(Greater),
            _ => {}
        };
        None
    }
}
```

Notice, `partial_cmp` has a type signature of `-> Option<Ordering>`, but only ever returns one of these three:

- `Some(Ordering::Less)`
- `Some(Ordering::Greater)`
- `None`

and never `Some(Ordering::Equal)`. Thus we are using it, at the value level, as a true strict order. In addition, it is also `NaN`-safe, using `f64`’s own `PartialOrd` trait implementation to guard against it.
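To see this behavior concretely, here is a self-contained check reusing the `Interval` definition and the `partial_cmp` implementation from above:

```rust
use std::cmp::Ordering::{self, Equal, Greater, Less};

#[derive(Debug, Copy, Clone, PartialEq)]
pub struct Interval {
    lower: f64,
    upper: f64,
    weight: u32,
}

impl PartialOrd for Interval {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match self.upper.partial_cmp(&other.lower) {
            Some(Less) | Some(Equal) => return Some(Less),
            _ => {}
        };
        match other.upper.partial_cmp(&self.lower) {
            Some(Less) | Some(Equal) => return Some(Greater),
            _ => {}
        };
        None
    }
}

fn main() {
    let a = Interval { lower: 0.0, upper: 1.0, weight: 1 };
    let b = Interval { lower: 1.0, upper: 2.0, weight: 1 };
    let c = Interval { lower: 0.5, upper: 1.5, weight: 1 };
    assert_eq!(a.partial_cmp(&b), Some(Less)); // touching at 1.0 still counts as a ≺ b
    assert_eq!(b.partial_cmp(&a), Some(Greater));
    assert_eq!(a.partial_cmp(&c), None); // overlapping intervals are incomparable
    let nan = Interval { lower: f64::NAN, upper: f64::NAN, weight: 1 };
    assert_eq!(a.partial_cmp(&nan), None); // NaN comparisons fall through to None
    println!("ok");
}
```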

Oh, and finally,

```
impl Weighted for Interval {
    fn weight(&self) -> u32 {
        self.weight
    }
}
```

When it comes to testing, the first thing we’d like to verify is that the solution set indeed does not overlap. Recall from earlier, this condition is formalized using the `∦` relation:

`∀ a, b ∈ K, a ≠ b ⟹ a ∦ b`

In terms of code, this translates quite directly. Here, we use `tuple_combinations` from the `itertools` crate to enumerate over pairs of indices.

```
fn path_overlaps<W: Weighted>(xs: &[W]) -> bool {
    let n = xs.len();
    if n < 2 {
        return false;
    }
    for (i, j) in (0..n).tuple_combinations() {
        match xs[i].partial_cmp(&xs[j]) {
            Some(Greater) | Some(Less) => {}
            _ => return true,
        }
    }
    false
}
```

Quickcheck is a very useful and fun crate for randomized property testing. Here, we will use it to generate random sets of `Interval`s with arbitrary positions and sizes. Then we check that each solution produced from them does not overlap, using the `path_overlaps` function we just wrote.

```
quickcheck! {
    fn removes_overlapping_spans(spans: Vec<WeightedSpan>) -> bool {
        let g = build_graph(&spans);
        let path = g.longest_path();
        let retained_spans: Vec<_> =
            path.into_iter().map(|i| spans[i]).collect();
        !path_overlaps(&retained_spans)
    }
}
```

For this to work, we need to implement the `Arbitrary` trait from `quickcheck` - basically to specify how to randomly generate individual `WeightedSpan`s. Since quickcheck implements `Arbitrary` for `Vec<T>` whenever `T: Arbitrary`, that is sufficient for this test.

Next, we would also like to know that our algorithm produces a solution with the best possible total weight. This is hard to verify for arbitrary inputs; however, we can compare against a brute force solver for small enough input sizes. The brute force approach is, of course, to check all subsets of the input spans.

I have used `SmallVec<[WeightedSpan; 16]>` for this task. The smallvec crate provides array-backed `Vec`-like structs that are suitable for this use case. To enumerate all subsets of a collection of up to 16 spans, we use all `u16` values (65536 total) as bit masks. The function below performs the exhaustive search using some basic iterator operations.

```
fn solve_brute(spans: SmallSpans<WeightedSpan>) -> SmallSpans<WeightedSpan> {
    // inclusive range, so the all-ones mask (the full set) is covered too
    (0..=u16::max_value())
        .map(|mask| spans.subset(mask))
        .filter(|subset| !path_overlaps(subset.as_slice()))
        .max_by_key(|subset| score(subset.as_slice()))
        .unwrap_or(SmallSpans {
            spans: SmallVec::new(),
        })
}
```
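The `subset` helper itself is not shown here; a hypothetical version of the bitmask-to-subset extraction (using plain slices and `Vec` instead of the `SmallSpans` wrapper) could look like this:

```rust
/// Keep the items whose bit is set in `mask` (bit i selects items[i]).
fn subset<T: Copy>(items: &[T], mask: u16) -> Vec<T> {
    items
        .iter()
        .enumerate()
        .filter(|&(i, _)| mask & (1u16 << i) != 0)
        .map(|(_, &x)| x)
        .collect()
}

fn main() {
    let spans = [10, 20, 30];
    assert_eq!(subset(&spans, 0b101), vec![10, 30]); // bits 0 and 2 set
    assert_eq!(subset(&spans, 0), Vec::<i32>::new());
    println!("ok");
}
```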

Since the previous `quickcheck` test ensures our algorithm does not produce any solution that overlaps, this time we just need to check that its total weight is optimal. There’s no need to require a solution identical to the brute force outcome, as there could be multiple solutions sharing the same score.

```
quickcheck! {
    fn check_correctness(spans: SmallSpans<WeightedSpan>) -> bool {
        let g = build_graph(spans.as_slice());
        let path = g.longest_path();
        let retained_spans: Vec<_> =
            path.into_iter().map(|i| spans.spans[i]).collect();
        let solution = solve_brute(spans);
        let score_1 = score(retained_spans.as_slice());
        let score_2 = score(solution.as_slice());
        score_1 == score_2
    }
}
```

With relatively little effort, we now have a fairly good test suite that gives us some confidence in the algorithm. In fact, when making the demo, it worked the first time once I got all the rust tests to pass. In general, I had way more bugs on the js side of things; the wasm code has been much more solid.

If you finished reading this, I hope it has been interesting. Whenever someone claims frontend developers don’t need algorithm or data structure knowledge, this has been my go-to counterexample. The code for this small project is on github here. It was a fun weekend exercise.