One of the reasons that I originally wrote MatlabBGL was to be able to quickly implement and test algorithms that utilize graph algorithms as a subroutine. A great example of this is the FlowImprove routine from Andersen and Lang. (Needless disclosure, I know both of the authors and have an extremely high level of respect for their work.)
The authors’ own descriptions of the methods are a 2008 paper in SODA and a talk at Stanford’s MMDS conference:
- Andersen and Lang. An algorithm for improving graph partitions. ACM-SIAM symposium on discrete algorithms, 2008.
At a high-level, this routine takes a partition of a graph into two pieces, and returns a new partition of the graph with an improved quotient cut score. As a reminder (or an introduction!) the quotient cut score of a set is
where measures the number of edges leaving the set S, and is the sum of weights of vertices in S and “not S” (), respectively. A common choice for the weight of a vertex is the uniform weight, or the degree of the vertex. In the second case, the QCut score corresponds to the Normalized Cut objective — another frequently used partitioning score.
The algorithm works by solving a sequence of max-flow/min-cut problems. At each iteration, the weights of the edges changes. The algorithm works towards improving the partition by adjusting the weights. They show that the algorithm will do at most iterations for a graph with n vertices, but usually find that only a few iterations are required. It’s quite likely there is some improved theory that could explain why it works so fast in practice.
This makes it a perfect algorithm to implement in MatlabBGL, it really only requires 5 or 6 max-flow calls! The only real difficulty in the implementation, which is entirely analogous to the pseudocode in the paper, is rounding the floating point weights in the flow procedure to integer weights in order to use the push-relabel algorithm in MatlabBGL (and which is what the authors use in the paper). Because of this change, we introduce the “scale” in the code below to do this rounding.
The full implementation
This implementation is implemented as a custom code in MatlabBGL: https://github.com/dgleich/matlab-bgl/blob/master/custom/flow_improve.m
function [S,q] = flow_improve(G,A,p,maxiter) % FLOW_IMPROVE Improve an existing clustering using max-flow % G must be a symmetric graph % A is an indicator over the vertices of G % p a non-negative integer vertex weight vector (default ones(n,1)) % maxiter the maximum number of times to run (default 5) %% History % 2008-10-20: Added partition list input. n = size(G,1); if ~exist('maxiter','var') || isempty(maxiter), maxiter=5; end if ~exist('p','var'), p='cond'; end if ~islogical(A), lA = false(n,1); lA(A)=1; A=lA; end % convert A to logical % compute p if isempty(p), p = ones(n,1); end if ischar(p) && strfind(p,'cond'), p = full(sum(G,2)); end [ei ej ev] = find(G); L = diag(sum(G,2))-G; % compute the laplacian from_s = find(A); pA = p(A); to_t = find(~A); pB = p(~A); infval = 2^29; piA = sum(pA); piB = sum(pB); iter= 1; S= A; % compute alpha = Q(A) v= zeros(n,1); v(S) = 1; v(~S) = -1; cv = v'*(L*v)/4; bal = min(piA,piB); alpha = cv/min(piA,piB); q = alpha; fprintf('Initial cut : cut= %7i bal= %7i quot= %8.5f\n', ... cv, bal, q); q_old = q; S_old = S; while iter<=maxiter % compute G_A(alpha) scale = ceil(1/alpha); from_s_v = alpha*pA*scale; to_t_v = alpha*pB*piA/piB*scale; GA = sparse([ei; (n+1)*ones(length(from_s),1); to_t], ... [ej; from_s; (n+2)*ones(length(to_t),1)], ... [ev*scale; ceil(from_s_v); ceil(to_t_v)], n+2, n+2); [flowval,cut] = push_relabel_max_flow(GA,n+1,n+2); S = cut(1:n)==1; DAofS = sum(p(S&A)) - sum(p(S&~A))*piA/piB; if DAofS > 0, v= zeros(n,1); v(S) = 1; v(~S) = -1; cv = v'*(L*v)/4; QAofS = cv/DAofS; else QAofS = infval; end bal = min(sum(p(S)), sum(p(~S))); q = cv/bal; fprintf(' iter %3i : cut= %7i bal= %7i q= %8.5f quot(A)= %8.5f DA= %7f\n', ... iter, cv, bal, q, QAofS, DAofS); if QAofS>=alpha, S = S_old; q = q_old; break; end alpha = QAofS; q_old = q; S_old = S; end
How well does it do? Well, we can reproduce some of the examples from the paper:
Here’s a vertical bisection of a US road network with 126,000 vertices
The score of this cut is 148/57507 (quotient score = 0.00257). In 5 seconds, the flow_improve routine yields a new partition which has a cut score of 42/62244 (quotient score = 0.00067). This new cut tracks the Mississippi river and the Indiana border. This was an example that Kevin Lang showed me years ago, and it’s fun to be able to easily reproduce it.