1 Definition

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertx v, u comes before v in the ordering.

It sounds pretty academic, but I am sure you are using topological sort unconsciously every single day.

2 Application

Many real world situations can be modeled as a graph with directed edges where some events must occur before others. Then a topological sort gives an order in which to perform these events, for instance:

2.1 College class prerequisites

You must take course b first if you want to take course a. For example, in your alma mater, the student must complete PHYS:1511(College Physics) or PHYS:1611(Introductory Physics I) before taking College Physics II.

The courses can be represented by vertices, and there is an edge from College Physics to College Physics II since PHYS:1511 must be finished before College Physics II can be enrolled.

2.2 Job scheduling

scheduling a sequence of jobs or tasks based on their dependencies. The jobs are represented by vertices, and there is an edge from x to y if job x must be completed before job y can be started.

In the context of a CI/CD pipeline, the relationships between jobs can be represented by directed graph(specifically speaking, by directed acyclic graph). For example, in a CI pipeline, build job should be finished before start test job and lint job.

2.3 Program build dependencies

You want to figure out in which order you should compile all the program’s dependencies so that you will never try and compile a dependency for which you haven’t first built all of its dependencies.

A typical example is GNU Make: you specific your targets in a makefile, Make will parse makefile, and figure out which target should be built firstly. Supposing you have a makefile like this:

# Makefile for analysis report

output/figure_1.png: data/input_file_1.csv scripts/generate_histogram.py
python scripts/generate_histogram.py -i data/input_file_1.csv -o output/figure_1.png

output/figure_2.png: data/input_file_2.csv scripts/generate_histogram.py
python scripts/generate_histogram.py -i data/input_file_2.csv -o output/figure_2.png

output/report.pdf: report/report.tex output/figure_1.png output/figure_2.png
cd report/ && pdflatex report.tex && mv report.pdf ../output/report.pdf

Make will generate a DAG internally to figure out which target should be executed firstly with typological sort:

3 Directed Acyclic Graph

Back to the definition, we say that a topological ordering of a directed graph is a linear ordering of its vertices, but not all directed graphs have a topological ordering.

A topological ordering is possible if and only if the graph has no directed cycles, that is, if it’s a directed acyclic graph(DAG).

Let us see some examples:

The definition requires that only the directed acyclic graph has a topological ordering, but why? What happens if we are trying to find a topological ordering of a directed graph? Let’s take the figure 3 for an example.

The directed graph problem has no solution, this is the reason why directed cycle is forbidden

4 Kahn’s Algorithm

There are several algorithms for topological sorting, Kahn’s algorithm is one of them, based on breadth first search.

The intuition behind Kahn’s algorithm is pretty straightforward:

To repeatedly remove nodes without any dependencies from the graph and add them to the topological ordering

As nodes without dependencies are removed from the graph, the original nodes depend on the removed node should be free now.

We keep removing nodes without dependencies from the graph until all nodes are processed, or a cycle is detected.

The dependencies of one node are represented as in-degree of this node.

Let’s take a quick example of how to find out a topological ordering of a given graph with Kahn’s algorithm.

Now we should understand how Kahn’s algorithm works. Let’s have a look at a C++ implementation of Kahn’s algorithm:

#include <deque>
#include <vector>
// Kahn's algorithm
// `adj` is a directed acyclic graph represented as an adjacency list.
std::vector<int>
findTopologicalOrder(const std::vector<std::vector<int>> &adj) {
  int n = adj.size();

  std::vector<int> in_degree(n, 0);

  for (int i = 0; i < n; i++) {
    for (const auto &to_vertex : adj[i]) {
      in_degree[to_vertex]++;
    }
  }

  // queue contains nodes with no incoming edges
  std::deque<int> queue;
  for (int i = 0; i < n; i++) {
    if (in_degree[i] == 0) {
      queue.push_back(i);
    }
  }

  std::vector<int> order(n, 0);

  int index = 0;
  while (queue.size() > 0) {
    int cur = queue.front();
    queue.pop_front();
    order[index++] = cur;

    for (const auto &next : adj[cur]) {
      if (--in_degree[next] == 0) {
	queue.push_back(next);
      }
    }
  }

  // there is no cycle
  if (n == index) {
    return order;
  } else {
    // return an empty list if there is a cycle
    return std::vector<int>{};
  }
}

5 Bonus

When a pregnant woman takes calcium pills, she must make sure also that her diet is rich in vitamin D, since this vitamin makes the absorption of calcium possible.

After reading the demonstration of topological ordering, you (and I) too should take a certain vitamin, metaphorically speaking, to help you absorb. The vitamin D I pick for you (and myself) is two leetcode problems, which involve with the most typical use case of topological ordering – college class prerequisites:

6 Reference