Here, we consider three implementations for computing the SVD of the Netflix matrix.
Just to recap, the matrix has 17770 rows, 480189 columns, and 100480507 non-zeros. We are considering the sparse SVD that treats the missing entries as 0, not the matrix-completion SVD that treats the missing ratings as missing. (It’s unfortunate that these two very different problems are often confused.)
I’m using Matlab R2011a on a dual Intel Xeon E5-2670 computer with 256GB of RAM. Computing a rank-200 SVD takes about 2.34GB of memory (~760MB for the singular vectors, ~1.5GB for the sparse matrix). Given how the algorithms work, there is usually a bit of over-allocation, so let’s say 3GB of memory is reasonable.
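As a back-of-the-envelope check of those figures (a sketch, assuming Matlab’s standard compressed sparse column storage with 8-byte values and 8-byte row indices on a 64-bit machine):

```matlab
% Rough memory estimate for a rank-k SVD of the Netflix matrix.
m = 17770; n = 480189; nz = 100480507; k = 200;
vec_bytes = (m + n) * k * 8;          % dense factors U (m-by-k) and V (n-by-k)
mat_bytes = nz*(8 + 8) + (n + 1)*8;   % values + row indices + column pointers
fprintf('vectors: %.0f MB, matrix: %.2f GB\n', vec_bytes/2^20, mat_bytes/2^30);
```

This works out to roughly 760MB for the vectors and 1.5GB for the matrix, consistent with the numbers above.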
If we just use Matlab’s svds routine
[U,S,V] = svds(A,k);
then we get the following timings:
k = 10 -> Elapsed time is 95.075653 seconds.
k = 25 -> Elapsed time is 151.247499 seconds.
k = 50 -> Elapsed time is 262.132427 seconds.
k = 100 -> Elapsed time is 589.469476 seconds.
k = 150 -> Elapsed time is 983.575712 seconds.
k = 200 -> Elapsed time is 1538.977824 seconds.
What Matlab’s svds routine does internally is compute the extremal eigenvectors of the augmented matrix [0 A; A' 0] using the ARPACK software. A few steps in this computation exploit parallelism.
We can alternatively compute the largest eigenvalues and eigenvectors of the matrix A*A', which squares the condition number and is usually a no-no in numerical analysis; but if we are solely interested in performance, this could be better. My adviser called these the “dreaded normal equations.” To do this, we use Matlab’s eigs routine with a function handle
f = @(x) A*(A'*x);
so we don’t need to actually FORM the matrix A*A'. Again, this uses the ARPACK code, now via the eigs function:
f = @(x) A*(A'*x);
m = size(A,1);
[V,D] = eigs(f,m,k,'LA',struct('issym',1,'disp',0));
What happens here is that the eigenvectors of A*A' are the left singular vectors, so eigs actually returns U (despite the variable name V); we’d need a bit more post-processing to recover the right singular vectors, and the elements of D are the squares of the singular values.
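Concretely, the post-processing might look like this (a sketch; V and D are the outputs of the eigs call above, and the variable Vr is my own name for the recovered right singular vectors):

```matlab
s = sqrt(diag(D));         % singular values are the square roots of the eigenvalues
U = V;                     % eigenvectors of A*A' are the left singular vectors
Vr = (A'*U) * diag(1./s);  % recover the right singular vectors via V = A'*U*S^{-1}
```

The extra cost is just one more block of matrix-vector products with A'.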
k = 10 -> Elapsed time is 26.425276 seconds.
k = 25 -> Elapsed time is 47.842963 seconds.
k = 50 -> Elapsed time is 84.456961 seconds.
k = 100 -> Elapsed time is 166.463371 seconds.
k = 150 -> Elapsed time is 250.260487 seconds.
k = 200 -> Elapsed time is 335.170137 seconds.
But it’s much faster!
Finally, there is a customized routine that does what Matlab’s svds routine does, but using the Golub-Kahan bidiagonalization procedure, which implicitly runs the Lanczos procedure on A*A' without ever forming that matrix or doing extra work. For this, we turn to the PROPACK software.
Ax = @(x) A*x;
Atx = @(x) A'*x;
[m,n] = size(A);
[UD,D,VD] = lansvd(Ax,Atx,m,n,k,'L');

k = 10 -> Elapsed time is 10.205532 seconds.
k = 25 -> Elapsed time is 26.290835 seconds.
k = 50 -> Elapsed time is 44.544767 seconds.
k = 100 -> Elapsed time is 94.061496 seconds.
k = 150 -> Elapsed time is 152.148860 seconds.
k = 200 -> Elapsed time is 216.596219 seconds.
Faster still! Although, when we looked at some of the singular values, they didn’t seem to match.
The trailing singular values returned by ARPACK (either variant) and by PROPACK are:
ARPACK     PROPACK
834.8761   799.9475
834.7372   796.6092
834.3883   794.4793
833.5185   792.0514
832.5988   789.1563
831.0431   787.2585
829.8794   783.6587
829.5437   782.2349
828.1831   778.7559
827.0634   776.5972
825.3958   773.8257
This suggests we might need to study the tolerances used in PROPACK and rerun the test.
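One way to decide which set to trust, independent of either library’s stopping criterion, is to check the residuals of the computed triplets directly (a sketch; UD, D, VD are the outputs of the lansvd call above, and the same check applies to the ARPACK results after post-processing):

```matlab
res = zeros(k, 1);
for j = 1:k
    % an exact singular triplet satisfies A*v = sigma*u, so this residual is zero
    res(j) = norm(A*VD(:,j) - D(j,j)*UD(:,j));
end
disp(max(res));  % large values flag inaccurate triplets
```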
The testing code is on GitHub: netflix_svd.m.
My tremendous thanks to Yangyang Hou for helping with the experiments in this post and to Burak Bayramli for suggesting things that led to it.