FlowImprove in MatlabBGL: Fast partition improvement!

One of the reasons that I originally wrote MatlabBGL was to be able to quickly implement and test algorithms that utilize graph algorithms as a subroutine.  A great example of this is the FlowImprove routine from Andersen and Lang.  (Needless disclosure, I know both of the authors and have an extremely high level of respect for their work.)

The authors’ own descriptions of the methods are a 2008 paper in SODA and a talk at Stanford’s MMDS conference:

At a high-level, this routine takes a partition of a graph into two pieces, and returns a new partition of the graph with an improved quotient cut score.  As a reminder (or an introduction!) the quotient cut score of a set is

\text{QCut}(S) = \frac{\text{cut}(S)}{\min(w(S),w(\bar{S}))}

where \text{cut}(S) measures the number of edges leaving the set S, and w(S), w(\bar{S}) is the sum of weights of vertices in S and “not S” (\bar{S}), respectively.  A common choice for the weight of a vertex is the uniform weight, or the degree of the vertex.  In the second case, the QCut score corresponds to the Normalized Cut objective — another frequently used partitioning score.

The algorithm works by solving a sequence of max-flow/min-cut problems.  At each iteration, the weights of the edges changes.  The algorithm works towards improving the partition by adjusting the weights.  They show that the algorithm will do at most n^2 iterations for a graph with n vertices, but usually find that only a few iterations are required.  It’s quite likely there is some improved theory that could explain why it works so fast in practice.

This makes it a perfect algorithm to implement in MatlabBGL, it really only requires 5 or 6 max-flow calls!  The only real difficulty in the implementation, which is entirely analogous to the pseudocode in the paper, is rounding the floating point weights in the flow procedure to integer weights in order to use the push-relabel algorithm in MatlabBGL (and which is what the authors use in the paper).   Because of this change, we introduce the “scale” in the code below to do this rounding.

The full implementation

This implementation is implemented as a custom code in MatlabBGL: https://github.com/dgleich/matlab-bgl/blob/master/custom/flow_improve.m

function [S,q] = flow_improve(G,A,p,maxiter)
% FLOW_IMPROVE Improve an existing clustering using max-flow
%   G must be a symmetric graph
%   A is an indicator over the vertices of G
%   p a non-negative integer vertex weight vector (default ones(n,1))
%   maxiter the maximum number of times to run (default 5)

%% History
%  2008-10-20: Added partition list input.

n = size(G,1);

if ~exist('maxiter','var') || isempty(maxiter), maxiter=5; end
if ~exist('p','var'), p='cond'; end

if ~islogical(A), lA = false(n,1); lA(A)=1; A=lA; end % convert A to logical

% compute p
if isempty(p), p = ones(n,1); end
if ischar(p) && strfind(p,'cond'), p = full(sum(G,2)); end

[ei ej ev] = find(G);
L = diag(sum(G,2))-G; % compute the laplacian
from_s = find(A); pA = p(A);
to_t = find(~A); pB = p(~A);

infval = 2^29;

piA = sum(pA);
piB = sum(pB);

iter= 1;
S= A;
% compute alpha = Q(A)
v= zeros(n,1);
v(S) = 1;
v(~S) = -1;
cv = v'*(L*v)/4;
bal = min(piA,piB);
alpha = cv/min(piA,piB);
q = alpha;
fprintf('Initial cut : cut= %7i bal= %7i quot= %8.5f\n', ...
    cv, bal, q);
q_old = q; S_old = S;
while iter<=maxiter
    % compute G_A(alpha)
    scale = ceil(1/alpha);
    from_s_v = alpha*pA*scale;
    to_t_v = alpha*pB*piA/piB*scale;
    GA = sparse([ei; (n+1)*ones(length(from_s),1); to_t], ...
                [ej; from_s; (n+2)*ones(length(to_t),1)], ...
                [ev*scale; ceil(from_s_v); ceil(to_t_v)], n+2, n+2);

    [flowval,cut] = push_relabel_max_flow(GA,n+1,n+2);
    S = cut(1:n)==1;
    DAofS = sum(p(S&A)) - sum(p(S&~A))*piA/piB;
    if DAofS > 0,
        v= zeros(n,1);
        v(S) = 1;
        v(~S) = -1;
        cv = v'*(L*v)/4;
        QAofS = cv/DAofS;
    else
        QAofS = infval;
    end
    bal = min(sum(p(S)), sum(p(~S)));
    q = cv/bal;
    fprintf('   iter %3i : cut= %7i bal= %7i q= %8.5f quot(A)= %8.5f DA= %7f\n', ...
        iter, cv, bal, q, QAofS, DAofS);
    if QAofS>=alpha, S = S_old; q = q_old; break; end
    alpha = QAofS;
    q_old = q; S_old = S;
end

Performance

How well does it do?  Well, we can reproduce some of the examples from the paper:

Here’s a vertical bisection of a US road network with 126,000 vertices

The score of this cut is 148/57507 (quotient score = 0.00257).  In 5 seconds, the flow_improve routine yields a new partition which has a cut score of 42/62244 (quotient score = 0.00067).  This new cut tracks the Mississippi river and the Indiana border.  This was an example that Kevin Lang showed me years ago, and it’s fun to be able to easily reproduce it.

So get MatlabBGL and flow_improve_example.m a try to reproduce this yourself.  (Note, you’ll need Tim Davis’ UFget.m function)

This entry was posted in Uncategorized. Bookmark the permalink.

4 Responses to FlowImprove in MatlabBGL: Fast partition improvement!

  1. khayyatzy says:

    Hi, I would like to use one of the above figures in a paper presentation; whome I should contact to get the rights to use the figures in this blog?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s