FlowImprove in MatlabBGL: Fast partition improvement!

One of the reasons that I originally wrote MatlabBGL was to be able to quickly implement and test algorithms that utilize graph algorithms as a subroutine. A great example of this is the FlowImprove routine from Andersen and Lang. (Needless disclosure, I know both of the authors and have an extremely high level of respect for their work.)

The authors’ own descriptions of the methods are a 2008 paper in SODA and a talk at Stanford’s MMDS conference:

http://www.stanford.edu/group/mmds/slides2008/andersen.pdf
Andersen and Lang. An algorithm for improving graph partitions. ACM-SIAM symposium on discrete algorithms, 2008.

At a high-level, this routine takes a partition of a graph into two pieces, and returns a new partition of the graph with an improved quotient cut score. As a reminder (or an introduction!) the quotient cut score of a set is

$\text{QCut}(S) = \frac{\text{cut}(S)}{\min(w(S),w(\bar{S}))}$

where $\text{cut}(S)$ measures the number of edges leaving the set S, and $w(S), w(\bar{S})$ is the sum of weights of vertices in S and “not S” ( $\bar{S}$ ), respectively. A common choice for the weight of a vertex is the uniform weight, or the degree of the vertex. In the second case, the QCut score corresponds to the Normalized Cut objective — another frequently used partitioning score.

The algorithm works by solving a sequence of max-flow/min-cut problems. At each iteration, the weights of the edges changes. The algorithm works towards improving the partition by adjusting the weights. They show that the algorithm will do at most $n^2$ iterations for a graph with n vertices, but usually find that only a few iterations are required. It’s quite likely there is some improved theory that could explain why it works so fast in practice.

This makes it a perfect algorithm to implement in MatlabBGL, it really only requires 5 or 6 max-flow calls! The only real difficulty in the implementation, which is entirely analogous to the pseudocode in the paper, is rounding the floating point weights in the flow procedure to integer weights in order to use the push-relabel algorithm in MatlabBGL (and which is what the authors use in the paper). Because of this change, we introduce the “scale” in the code below to do this rounding.

The full implementation

This implementation is implemented as a custom code in MatlabBGL: https://github.com/dgleich/matlab-bgl/blob/master/custom/flow_improve.m

function [S,q] = flow_improve(G,A,p,maxiter)
% FLOW_IMPROVE Improve an existing clustering using max-flow
%   G must be a symmetric graph
%   A is an indicator over the vertices of G
%   p a non-negative integer vertex weight vector (default ones(n,1))
%   maxiter the maximum number of times to run (default 5)

%% History
%  2008-10-20: Added partition list input.

n = size(G,1);

if ~exist('maxiter','var') || isempty(maxiter), maxiter=5; end
if ~exist('p','var'), p='cond'; end

if ~islogical(A), lA = false(n,1); lA(A)=1; A=lA; end % convert A to logical

% compute p
if isempty(p), p = ones(n,1); end
if ischar(p) && strfind(p,'cond'), p = full(sum(G,2)); end

[ei ej ev] = find(G);
L = diag(sum(G,2))-G; % compute the laplacian
from_s = find(A); pA = p(A);
to_t = find(~A); pB = p(~A);

infval = 2^29;

piA = sum(pA);
piB = sum(pB);

iter= 1;
S= A;
% compute alpha = Q(A)
v= zeros(n,1);
v(S) = 1;
v(~S) = -1;
cv = v'*(L*v)/4;
bal = min(piA,piB);
alpha = cv/min(piA,piB);
q = alpha;
fprintf('Initial cut : cut= %7i bal= %7i quot= %8.5f\n', ...
    cv, bal, q);
q_old = q; S_old = S;
while iter<=maxiter
    % compute G_A(alpha)
    scale = ceil(1/alpha);
    from_s_v = alpha*pA*scale;
    to_t_v = alpha*pB*piA/piB*scale;
    GA = sparse([ei; (n+1)*ones(length(from_s),1); to_t], ...
                [ej; from_s; (n+2)*ones(length(to_t),1)], ...
                [ev*scale; ceil(from_s_v); ceil(to_t_v)], n+2, n+2);

    [flowval,cut] = push_relabel_max_flow(GA,n+1,n+2);
    S = cut(1:n)==1;
    DAofS = sum(p(S&A)) - sum(p(S&~A))*piA/piB;
    if DAofS > 0,
        v= zeros(n,1);
        v(S) = 1;
        v(~S) = -1;
        cv = v'*(L*v)/4;
        QAofS = cv/DAofS;
    else
        QAofS = infval;
    end
    bal = min(sum(p(S)), sum(p(~S)));
    q = cv/bal;
    fprintf('   iter %3i : cut= %7i bal= %7i q= %8.5f quot(A)= %8.5f DA= %7f\n', ...
        iter, cv, bal, q, QAofS, DAofS);
    if QAofS>=alpha, S = S_old; q = q_old; break; end
    alpha = QAofS;
    q_old = q; S_old = S;
end

Performance

How well does it do? Well, we can reproduce some of the examples from the paper:

Here’s a vertical bisection of a US road network with 126,000 vertices

The score of this cut is 148/57507 (quotient score = 0.00257). In 5 seconds, the flow_improve routine yields a new partition which has a cut score of 42/62244 (quotient score = 0.00067). This new cut tracks the Mississippi river and the Indiana border. This was an example that Kevin Lang showed me years ago, and it’s fun to be able to easily reproduce it.

So get MatlabBGL and flow_improve_example.m a try to reproduce this yourself. (Note, you’ll need Tim Davis’ UFget.m function)

4 Responses to FlowImprove in MatlabBGL: Fast partition improvement!

khayyatzy says:

April 6, 2013 at 19:19

Hi, I would like to use one of the above figures in a paper presentation; whome I should contact to get the rights to use the figures in this blog?

- dgleich says:
  
  April 6, 2013 at 19:32
  
  Can you provide any context for the presentation?
  
  - khayyatzy says:
    
    April 6, 2013 at 19:41
    
    I am presenting my paper http://thegraphsblog.files.wordpress.com/2012/11/eurosys13_197_mizan.pdf in EuroSys13. I will use the figure as an example for MaxFlow in road networks in the introduction for my presentation
  - dgleich says:
    
    April 6, 2013 at 19:50
    
    Ah, that’s great! Feel free to use them, but can you attribute them back to this post?
    
    Thanks for sending the link to the blog too! It looks like an interesting site.