缩点

2020-08-05

Tarjan算法（以发现者Robert Tarjan[1]命名）是一个在图中查找强连通分量的算法。

问题

Network of Schools(POJ-1236)
Description

A number of schools are connected to a computer network. Agreements have been developed among those schools: each school maintains a list of schools to which it distributes software (the “receiving schools”). Note that if B is in the distribution list of school A, then A does not necessarily appear in the list of school B
You are to write a program that computes the minimal number of schools that must receive a copy of the new software in order for the software to reach all schools in the network according to the agreement (Subtask A). As a further task, we want to ensure that by sending the copy of new software to an arbitrary school, this software will reach all schools in the network. To achieve this goal we may have to extend the lists of receivers by new members. Compute the minimal number of extensions that have to be made so that whatever school we send the new software to, it will reach all other schools (Subtask B). One extension means introducing one new member into the list of receivers of one school.

Input

The first line contains an integer N: the number of schools in the network (2 <= N <= 100). The schools are identified by the first N positive integers. Each of the next N lines describes a list of receivers. The line i+1 contains the identifiers of the receivers of school i. Each list ends with a 0. An empty list contains a 0 alone in the line.

Output

Your program should write two lines to the standard output. The first line should contain one positive integer: the solution of subtask A. The second line should contain the solution of subtask B.

Sample Input

Sample Output

1
2

1
2

题目大意是，有 $N$ 间学校，每间学校维护这一份软件分发列表，如果 $B$ 在 $A$ 的列表上，软件就可以由 $A$ 传向 $B$。
任务一：软件最少分给几间学校，可以使软件传遍所有学校
任务二：最少需要添加多少条边，可以使得：软件给任意一个学校后，都能传遍所有学校。

任务一看起来像是要找到所有结点的根结点，貌似并查集挺符合的，但是题目给的图有向且有环，任务二要求结点能够两两相互到达，这就很明显了，结点之间不仅要连通，而且要强连通。所以这道题的主要任务就是算出给定图的强连通分量。

这就要引进下面这个算法：

Tarjan 算法

Tarjan /tarjan/ (塔扬)，Tarjan发明了很多算法，都叫 Tarjan 算法，这里讲的是求强连通分量的 Tarjan 算法。

图的各种边

图一：四种不同的边

如图(a)，是一棵普通的树，连通，但没有两两相互抵达，黑色的边叫做树边。
如图(b)，从 1 结点的子树上，返回了一条蓝色的边，这种指向某个结点祖先的边叫做后向边，也叫反祖边。
如图(c)，一台红色的边从 1 结点指向了它的子树的结点，这种边和后向边相反，它叫前向边。
如图(d)，一条边从 4 结点指向了它的兄弟，类似的，从一个结点指向自己所在路径之外的结点 （比如这里到 4 结点的路径是 1 -> 2 -> 4 ，那 3 就在路径之外了） ，这条边叫做横叉边。

了解这些有什么用呢？有用！
后向边就很有用，就以图(b)为例。原本只有一条路径（黑色的边）可以从 1 走到 3，现在出现一条后向边指向了 3 结点的祖先，那又可以从 3 走回去了！然后又从 1 走到 2,2 走到 3，3 走到 1 ……树就变成了强连通图,或者可以说有了回路，有了环。所以计算强连通图的任务可以简化为查找后向边。¹

原理

考虑一下 dfs，如果图中存在环，那么搜索时必会碰到之前遇到过的点。Tarjan 就是一个魔改过的 dfs 算法。
在算法执行过程中主要维护这两个变量：

$dfn[u]$：在深度优先搜索时结点 $u$ 被遍历到的顺序。
$low[u]$：以$u$为根节点的子树上，所能找到的最小 $dfn$

如果从某个结点 $u$ 开始遍历，遍历结束后发现 $low[u] = dfn[u]$，说明从$u$开始遍历的子树又回到了 $u$，因为 $low[u]$ 记载的是子树路径上遇到最早被遍历到的点。

搜索过程中用栈记录遍历到的节点，这里用栈只是为了方便记录遍历结点的先后顺序。（关于栈的作用的可以看这篇博客²，图解很详细）

更新数值的方法如下：（以 $u$ 代表当前节点，$v$ 代表下一个节点）

如果 $(u,v)$ 为树边或前向边，且 $v$ 没有被遍历过，就顺着往下遍历，遍历完之后用 $low[v]$ 更新 $low[u]$ 方式是 $low[u] = \min(low[u],low[v])$。因为 $v$ 如果能够到达比 $u$ 更早的结点，那 $u$ 也能顺着 $v$ 到达这个节点。
如果 $(u,v)$ 是后向边，且 $v$ 在栈中，那么直接用 $\min(dfn[v],low[u])$ 更新 $low[u]$。
如果是横叉边，$v$ 不在栈中（如果在栈中，就成树边了），那么就不用处理。

实现

伪代码

$\begin{array}{ll} 1 & \textbf{Input. } \text{a node of graph } e \text{ which is called } u \\ 2 & \textbf{Output. } \text{All the result of strongly connected components }\\ & \text{is stored in the array } scc. \text{The number of scc is strored in } scnt \\ 3 & \textbf{Method.} \text{ Tarjan}\\ 4 & dfn[u],low[u] \gets \text{the order of } u \text{ in traverse}\\ 5 & \text{push } u \text{ into stack}\\ 6 & \textbf{for} \text{ each } (u,v) \text{ in the stored } e \\ 7 & \qquad \textbf{if } v \text{ haven't been traversed}\\ 8 & \qquad \qquad \text{Tarjan(}{v} \text{)}\\ 9 & \qquad \qquad low[u] \gets \min(low[u],low[v]) \\ 10& \qquad \textbf{else if } v \text{ is in stack} \\ 11& \qquad \qquad low[u] \gets \min(low[u],dfn[v]) \\ 12& \textbf{if } dfn[u] = low[u] \\ 13& \quad scnt \gets scnt + 1 \\ 14& \quad \textbf{do} \\ 15& \quad \quad node \gets \text{the top of stack} \\ 16& \quad \quad \text{pop stack} \\ 17& \quad \quad scc[node] \gets scnt \\ 18& \quad \textbf{while } node \ne u \end{array}$

C++实现

vector<int> G[MAXN];
stack<int> Stack;
bool in_stack[MAXN];
int scc[MAXN],scnt;
int dfn[MAXN],low[MAXN];
int dcnt;

void tarjan(int u)
{
    dfn[u] = low[u] = ++dcnt;
    Stack.push(u);
    in_stack[u] = true;

    for(auto v : G[u])
    {
        if(!dfn[v])
        {
            tarjan(v);
            low[u] = min(low[u],low[v]);
        }
        else if(in_stack[v]) low[u] = min(low[u],dfn[v]);
    }

    if(dfn[u] == low[u])
    {
        ++scnt;
        int node;
        do
        {
            node = Stack.top();
            Stack.pop();
            in_stack[node] = false;
            scc[node] = scnt;
        } while(node != u);
    }
}

每个点位于哪个强连通分量内，信息储存在scc数组里，总分量个数存在scnt里。使用时要对每个节点都使用一遍 tarjan：

1	for(int i = 1;i <= n;i++) tarjan(i);

缩点

强连通分量有一个特点，就是同一个连通分量内，每一个结点都相互可达，那么对于一个外部结点来说，能访问其中一个，便能访问分量内的所有结点。那么对外部结点来讲，同一个连通分量的所有结点就可以看成一个整体。这个思想叫做缩点。
像是「环」这种结构，因为环自身就是一个强连通的结构，所以通过把环内所有节点看成一个整体，对外部来讲就少了一个环。如果把所有环都经过缩点处理，那么整张图就变成了有向无环图(DAG)，有了DAG之后就可以进行很多~~奇奇怪怪的~~算法了，比如简单的dfs，或者可以在上面进行dp。

图二：将强连通分量缩点

题解

讨论

任务一：软件最少分给几间学校，可以使软件传遍所有学校
因为同一个强连通分量内可以相互传递，所以同一个分量是可以缩成一个点来看待的。
对于一个节点来说，如果拥有入度，说明软件可以由其它有出度的点传过来，这个点就不用管了，对于没有入度的点，那软件就必须发到这个学校。所以只要统计缩点后入度为 0 的点的个数就行了。

（写题目的时候第一个想到的就是这个办法，但不知道是不是脑子进水,我自己否定掉了这个办法 qwq。。。所以不要随便否定自己的方案，除非有很明确的输入可以证明这方法是错的）

任务二：最少需要添加多少条边，可以使得：软件给任意一个学校后，都能传遍所有学校
还是从入度和出度的角度考虑，要使整张图强连通，就需要解决掉入度为 0 的点和出度为 0 的点。

$n(\text{入度为 0}) > n(\text{出度为 0})$，可以从出度为 0 的点和其它任意点接一些边到入度为 0 的点上，结果为 $n(\text{入度为 0})$。
$n(\text{入度为 0}) < n(\text{出度为 0})$, 从出度为 0 的点接边到入度为 0 的点，剩下出度为 0 的点接边到任意点，结果为 n(\text{出度为 0})。
$n(\text{入度为 0}) = n(\text{出度为 0})$，从出度为 0 的点接边到入度为 0 的点，结果为 $n(\text{入度为 0})$ 或 $n(\text{出度为 0})$
答案为 $\max(n(\text{入度为 0}) , n(\text{出度为 0}))$。

当缩点后整张图只有一个强连通分量时，结果是 1 ，但其实不用添加边整张图就已经强连通，所以这时候要特判答案为 0。

代码

使用链式前向星储存图，tarjan用到的栈是手写的，不过没关系，都一样（

#include <cstdio>
#define MAXN 105
#define MAXM 10005
#define min(a,b) (((a) < (b)) ? (a) : (b))
#define max(a,b) (((a) > (b)) ? (a) : (b))

int head[MAXN],nxt[MAXM],to[MAXM];
int cnt;
int stack[MAXN],top = -1;
bool instack[MAXN];
int scc[MAXN],scnt;
int dfn[MAXN],low[MAXN];
int dfscnt;
int in_deg[MAXN],out_deg[MAXN];

void add(int u,int v)
{
    nxt[++cnt] = head[u];
    head[u] = cnt;
    to[cnt] = v;
}

void init()
{
    for(int i = 0;i < MAXN;i++) head[i] = -1;
    for(int i = 0;i < MAXM;i++) nxt[i] = -1;
    cnt = 0;
}

void tarjan(int u)
{
    dfn[u] = low[u] = ++dfscnt;
    stack[++top] = u;
    instack[u] = true;

    for(int i = head[u];~i;i = nxt[i])
    {
        int v = to[i];
        if(!dfn[v])
        {
            tarjan(v);
            low[u] = min(low[u],low[v]);
        }
        else if(instack[v]) low[u] = min(low[u],dfn[v]);
    }

    if(dfn[u] == low[u])
    {
        ++scnt;
        while(stack[top] != u)
        {
            scc[stack[top]] = scnt;
            instack[stack[top]] = false;
            top--;
        }
        scc[u] = scnt;
        instack[u] = false;
        top--;
    }
}

int main()
{
    init();

    int n;
    scanf("%d",&n);

    for(int u = 1;u <= n;u++)
    {
        int v;
        while (true)
        {
            scanf("%d",&v);
            if(v == 0) break;
            add(u,v);
        }
    }

    for(int i = 1;i <= n;i++) if(!dfn[i])tarjan(i);

    for(int u = 1;u <= n;u++)
    {
        int v;
        for(int i = head[u];~i;i = nxt[i])
        {
            v = to[i];
            if(scc[u] != scc[v])
            {
                out_deg[(scc[u])]++;
                in_deg[(scc[v])]++;
            }
        }
    }

    int zero_in = 0,zero_out = 0;

    for(int i = 1;i <= scnt;i++)
    {
        if(in_deg[i] == 0) zero_in++;
        if(out_deg[i] == 0) zero_out++;
    }

    if(scnt == 1) printf("1\n0");
    else printf("%d\n%d",zero_in,max(zero_in,zero_out));

    return 0;
}

参考资料

¹. 强连通分量-OI-Wiki ↩

². 有向图强连通分量的Tarjan算法 ↩