Tuesday, November 13, 2012

Posts from FOCS (Part 4)

[Editor: In Part 4 of what now looks to be a 5- or 6-part series of reports from FOCS, second-year graduate student Ludwig Schmidt of MIT contributes a summary of the workshop on Randomized Numerical Linear Algebra from Saturday, October 20.]

The workshop "Randomized Numerical Linear Algebra (RandNLA): Theory and Practice" was organized by Haim Avron, Christos Boutsidis and Petros Drineas. The website of the workshop contains the slides, abstracts and links to the speakers' websites: http://www.cs.rpi.edu/~drinep/RandNLA/ . Michael Mahoney has written a survey of the field: http://arxiv.org/abs/1104.5557 .

After a short introduction by Petros Drineas, Michael Mahoney gave a one-hour tutorial talk. The goal of RandNLA is to design better algorithms for numerical linear algebra by using randomization. Potential benefits include a better runtime (both in theory and in practice) and simpler algorithms that are easier to analyze and more amenable to parallel architectures. The tutorial talk focused on two core problems in numerical linear algebra that highlight the main ideas: low-rank approximation of matrices and least squares approximation for overdetermined systems.

Many algorithms in RandNLA follow a similar pattern: first, parts of the input matrix (rows, columns or entries) are sampled (or otherwise "sketched") in order to construct a smaller problem. This smaller problem is then solved with a traditional algorithm. If the sketch behaves similarly to the original matrix, the solution of the subproblem is close to the solution of the original problem. Hence an important issue is finding a good sampling or sketching scheme. Currently RandNLA algorithms use one of two approaches in order to achieve relative-error approximations: they either construct suitable importance sampling probabilities or use random projections followed by uniform sampling.

A key concept in the analysis of RandNLA algorithms is the set of statistical leverage scores. For an n x d matrix A (n >= d), the leverage scores are defined as follows: Let U be an n x d matrix whose columns form an orthonormal basis for the column space of A, and let U_i be the i-th row of U. Then the i-th statistical leverage score is || U_i ||^2 (|| . || denotes the l_2 norm). The concept of statistical leverage scores comes from regression analysis, where they are used to detect outliers.
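To make the definition concrete, here is a small NumPy snippet (my own illustration, not code from the talks) that computes the leverage scores exactly via a reduced QR factorization:

    import numpy as np

    def leverage_scores(A):
        """Exact statistical leverage scores of an n x d matrix A (n >= d)."""
        # Columns of Q form an orthonormal basis for the column space of A
        # (assuming A has full column rank).
        Q, _ = np.linalg.qr(A)
        # The i-th leverage score is the squared l_2 norm of the i-th row of Q.
        return np.sum(Q**2, axis=1)

    A = np.random.randn(1000, 10)
    print(leverage_scores(A).sum())  # the scores sum to d = 10 for full column rank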

For least-squares approximation, traditional algorithms (based on the Cholesky, QR or singular value decomposition) take O(nd^2) time. A basic randomized algorithm first samples O(d log(d) / eps^2) rows of the input matrix A and the input vector b, using the leverage scores of A as importance sampling probabilities. It then solves the smaller least-squares problem to get an approximate solution x'. The length of the resulting residual vector is a (1 + eps)-approximation of the optimal residual || A x_opt - b ||. There are also bounds on the difference between x' and x_opt.
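The sampling step looks roughly as follows (again just an illustration: it computes the exact leverage scores, which already costs O(nd^2), so it demonstrates the sampling scheme rather than the speedup):

    import numpy as np

    def sampled_least_squares(A, b, num_samples, seed=0):
        """Row sampling for overdetermined least squares: sample rows of (A, b)
        with probabilities proportional to the leverage scores, rescale them,
        and solve the smaller problem exactly."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        Q, _ = np.linalg.qr(A)
        p = np.sum(Q**2, axis=1) / d                 # importance sampling probabilities
        idx = rng.choice(n, size=num_samples, replace=True, p=p)
        scale = 1.0 / np.sqrt(num_samples * p[idx])  # standard rescaling
        x_approx, *_ = np.linalg.lstsq(scale[:, None] * A[idx], scale * b[idx], rcond=None)
        return x_approx

    n, d, eps = 20000, 20, 0.1
    A, b = np.random.randn(n, d), np.random.randn(n)
    x_approx = sampled_least_squares(A, b, num_samples=int(d * np.log(d) / eps**2))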

While it takes O(nd^2) time to calculate the leverage scores exactly, an approximation of the leverage scores can be computed faster. One alternative approach applies a fast JL transform to A, after which the leverage scores are approximately uniform, so importance sampling reduces to sampling rows of the transformed matrix uniformly. The resulting algorithms run in roughly O(n d log(d / eps) + d^3 log^2 n / eps) time.
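A simplified version of the approximation idea (using a dense Gaussian sketch as a stand-in for the fast transform, so it does not achieve the running time above) replaces the exact orthogonal basis by A R^{-1}, where R comes from a QR factorization of the sketched matrix:

    import numpy as np

    def approx_leverage_scores(A, sketch_size, seed=0):
        """Approximate leverage scores via the row norms of A R^{-1},
        where S A = Q R for a random sketching matrix S."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
        _, R = np.linalg.qr(S @ A)             # R is d x d
        AR_inv = np.linalg.solve(R.T, A.T).T   # rows of A R^{-1}
        return np.sum(AR_inv**2, axis=1)

    A = np.random.randn(20000, 20)
    print(approx_leverage_scores(A, sketch_size=200).sum())  # roughly d = 20 when S is a good subspace embedding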

Similar ideas are also used for low-rank approximation. In addition to the theoretical results, some RandNLA algorithms have already been implemented and evaluated in different scenarios with an emphasis on massive data sets (e.g. human genomics and astronomy). There has also been work on implementing RandNLA algorithms in distributed environments.

After the tutorial, several speakers gave 30-minute talks on various topics related to RandNLA. Nick Harvey gave a survey on generalizations of Chernoff bounds to matrices: given a set of random, square, symmetric matrices, we want to show that their sum is close to the expected value of the sum. Here, closeness is expressed by bounds on the smallest and largest eigenvalues. Nick introduced Rudelson's Sampling Lemma, the Ahlswede-Winter inequality and Tropp's user-friendly tail bounds, which work in increasingly general settings.
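One commonly stated form of the matrix Chernoff bound (this is my paraphrase of Tropp's version, so the exact constants should be checked against his paper): for independent random positive semidefinite d x d matrices X_1, ..., X_m with lambda_max(X_k) <= R,

    \[
    \mu_{\min} = \lambda_{\min}\Big(\textstyle\sum_k \mathbb{E}[X_k]\Big), \qquad
    \mu_{\max} = \lambda_{\max}\Big(\textstyle\sum_k \mathbb{E}[X_k]\Big),
    \]
    \[
    \Pr\Big[\lambda_{\min}\Big(\textstyle\sum_k X_k\Big) \le (1-\delta)\,\mu_{\min}\Big]
      \le d \cdot \Big[\tfrac{e^{-\delta}}{(1-\delta)^{1-\delta}}\Big]^{\mu_{\min}/R}
      \quad \text{for } \delta \in [0,1),
    \]
    \[
    \Pr\Big[\lambda_{\max}\Big(\textstyle\sum_k X_k\Big) \ge (1+\delta)\,\mu_{\max}\Big]
      \le d \cdot \Big[\tfrac{e^{\delta}}{(1+\delta)^{1+\delta}}\Big]^{\mu_{\max}/R}
      \quad \text{for } \delta \ge 0.
    \]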

Next, Nikhil Srivastava talked about graph sparsification with a focus on spectral sparsification. A graph H on the same vertex set as G is a spectral sparsifier of G if the Laplacian quadratic form of H gives a good approximation of the Laplacian quadratic form of G. Spectral sparsifiers can be constructed with a sampling scheme that uses effective resistances. For a Laplacian L = B^T W B (where B is the edge-vertex incidence matrix and W is the diagonal matrix of edge weights), the effective resistances are, up to the edge weights, exactly the leverage scores of W^{1/2} B. In the case of effective resistances, the sampling probabilities can be computed quickly using Laplacian solvers.
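Here is a tiny self-contained check (my own toy graph) that the leverage scores of W^{1/2} B are the weighted effective resistances w_e b_e^T L^+ b_e:

    import numpy as np

    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (1, 3)]   # small example graph
    weights = np.array([1.0, 2.0, 1.0, 3.0, 1.0])
    n, m = 4, len(edges)

    B = np.zeros((m, n))                               # edge-vertex incidence matrix
    for e, (u, v) in enumerate(edges):
        B[e, u], B[e, v] = 1.0, -1.0
    W = np.diag(weights)
    L = B.T @ W @ B                                    # graph Laplacian

    # Leverage scores of W^{1/2} B (via the SVD, since this matrix is rank-deficient).
    U, s, _ = np.linalg.svd(np.sqrt(W) @ B, full_matrices=False)
    lev = np.sum(U[:, s > 1e-10]**2, axis=1)

    # Weighted effective resistances w_e * (b_e^T L^+ b_e).
    L_pinv = np.linalg.pinv(L)
    r_eff = np.array([B[e] @ L_pinv @ B[e] for e in range(m)])
    print(np.allclose(lev, weights * r_eff))           # True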

The following two talks discussed practical implementations of RandNLA algorithms. Haim Avron described joint work with Petar Maymounkov and Sivan Toledo. When NLA algorithms are used as subroutines of larger applications, deterministic error bounds and a poly-log dependence on the accuracy are desirable. This can be achieved by using the randomization ideas described above to construct a preconditioner for an iterative algorithm (e.g. CG, LSQR, etc.). The right sampling strategy then leads to a well-behaved matrix and the problem can be solved with a small number of iterations. Empirical results show that a corresponding implementation can be faster than LAPACK (the state of the art) on dense matrices.
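A stripped-down version of this preconditioning idea (with a dense Gaussian sketch instead of the fast transforms used in the actual implementation, and my own parameter choices) can be written with SciPy's LSQR:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, lsqr

    def preconditioned_lsqr(A, b, sketch_size, seed=0):
        """Sketch A, take R from a QR factorization of the sketch, and use R as a
        right preconditioner for LSQR, i.e. solve min || A R^{-1} y - b ||."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        S = rng.standard_normal((sketch_size, n))
        _, R = np.linalg.qr(S @ A)                     # d x d upper triangular
        M = LinearOperator((n, d),
                           matvec=lambda x: A @ np.linalg.solve(R, x),
                           rmatvec=lambda y: np.linalg.solve(R.T, A.T @ y))
        y = lsqr(M, b, atol=1e-10, btol=1e-10)[0]
        return np.linalg.solve(R, y)                   # x = R^{-1} y

    A, b = np.random.randn(5000, 50), np.random.randn(5000)
    x = preconditioned_lsqr(A, b, sketch_size=4 * 50)
    print(np.linalg.norm(A.T @ (A @ x - b)))           # small (up to solver tolerance) at the optimum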

Ilse Ipsen spoke about aspects of the practical implementation outlined above (joint work with Thomas Wentworth). Given a matrix Q with orthonormal columns, we want to sample rows from Q so that the resulting matrix has a good condition number. They compare different row sampling schemes and give bounds for the condition number of the resulting matrix based on the coherence of Q. They also conducted numerical experiments and give tighter bounds on the condition number based on the leverage scores of Q.
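A quick experiment along these lines (my own setup, not the one from the talk) illustrates why coherence matters: when one direction of the column space is concentrated on a single row, uniform row sampling tends to produce a nearly singular submatrix, while leverage-score sampling with rescaling stays well conditioned.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, c = 10000, 20, 400
    # An orthonormal Q with one highly coherent column: the first column is
    # concentrated on a single row, the rest are spread out.
    Q = np.zeros((n, d))
    Q[0, 0] = 1.0
    Q[1:, 1:], _ = np.linalg.qr(rng.standard_normal((n - 1, d - 1)))
    lev = np.sum(Q**2, axis=1)

    uniform = Q[rng.choice(n, size=c, replace=True)]
    idx = rng.choice(n, size=c, replace=True, p=lev / d)
    weighted = Q[idx] / np.sqrt(c * lev[idx, None] / d)

    # Uniform sampling typically misses row 0 entirely, so its condition number blows up.
    print(np.linalg.cond(uniform), np.linalg.cond(weighted))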

Yiannis Koutis made another connection to Laplacian linear systems and presented joint work with Gary Miller, Richard Peng and Alex Levin. The theoretically best solver relies on spectral sparsifiers as preconditioners for solving Laplacian / SDD (symmetric diagonally dominant) systems. In contrast to the full sparsifiers described by Nikhil, the algorithm relies on incremental sparsifiers, i.e. a sequence of preconditioners, which are constructed with low-stretch spanning trees. While this algorithm has not been implemented yet, similar ideas have been used in CMG, which is a solver combining the multigrid method with cut-based graph sparsification.

Anastasios Zouzias talked about solving Laplacian systems in the Gossip model (joint work with Nikolaos Freris). The Gossip model is a distributed model of computation where each node can only communicate with its direct neighbors. Moreover, the input is split so that each node has only a single coordinate of the input vector. At the end of the computation, each node has a single output coordinate. Their solution is based on the (randomized) Kaczmarz method, an iterative algorithm for solving linear systems that uses a single row of the Laplacian matrix in each iteration.
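For reference, the plain (centralized) randomized Kaczmarz iteration on a generic consistent system looks like this; the gossip version from the talk distributes the computation, which this snippet does not attempt to capture:

    import numpy as np

    def randomized_kaczmarz(A, b, num_iters, seed=0):
        """Each iteration picks one row (with probability proportional to its
        squared norm) and projects the iterate onto that row's hyperplane."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        row_norms_sq = np.sum(A**2, axis=1)
        p = row_norms_sq / row_norms_sq.sum()
        x = np.zeros(d)
        for _ in range(num_iters):
            i = rng.choice(n, p=p)
            x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
        return x

    A, x_true = np.random.randn(500, 20), np.random.randn(20)
    x = randomized_kaczmarz(A, A @ x_true, num_iters=5000)
    print(np.linalg.norm(x - x_true))                  # small for a consistent system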

Next, Christos Boutsidis spoke about column-based matrix reconstruction (joint work with Petros Drineas and Malik Magdon-Ismail). The overall problem is to approximate a given matrix A with another matrix X that has rank at most k. Christos focused on the problem where we want to approximate A with either r columns of A or r linear combinations of columns of A. One motivation for this restriction is that approximations with actual data columns can be easier to interpret. Since good theoretical results for the r = k case are already known, the talk focused on the r > k case. The authors give deterministic algorithms for approximation in both the Frobenius and the spectral norm and match the lower bound in the spectral norm case.
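As a simple illustration of the r > k regime (using pivoted QR for the column selection, which is not the deterministic algorithm from the talk), one can compare the error of projecting onto r actual columns of A with the best rank-k error from the SVD:

    import numpy as np
    from scipy.linalg import qr

    rng = np.random.default_rng(0)
    # A matrix with decaying spectrum.
    A = rng.standard_normal((200, 100)) @ np.diag(0.8 ** np.arange(100))
    k, r = 10, 30

    _, _, piv = qr(A, pivoting=True)                   # greedy column selection
    C = A[:, piv[:r]]                                  # r actual columns of A
    col_err = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, 'fro')

    s = np.linalg.svd(A, compute_uv=False)
    best_rank_k = np.sqrt(np.sum(s[k:]**2))            # Frobenius error of the best rank-k
    print(col_err / best_rank_k)                       # ratio of column-based error to best rank-k error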

David Woodruff returned to least-squares regression and discussed his recent results on algorithms running in input sparsity time (joint work with Ken Clarkson). While previous work relied on the fast JL transform for subspace embeddings, their algorithm is based on embeddings that make use of the subspace structure. In particular, it is sufficient to preserve the coordinates with large leverage scores. The subspace embedding is the same as the CountSketch matrix in data streaming. The resulting algorithm runs in time roughly O(nnz(A) + d^3 / eps^2), where nnz(A) is the number of nonzero elements in the matrix A (using iterative methods, this can be improved to a logarithmic dependence on 1 / eps). Similar techniques also work for low-rank approximation.
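A CountSketch embedding is easy to write down explicitly: each column of the sketching matrix has a single nonzero entry (+1 or -1) in a uniformly random row, so applying it costs time proportional to the number of nonzeros of A. The following toy sketch-and-solve example uses illustrative parameters, not the ones from the paper:

    import numpy as np
    from scipy.sparse import csr_matrix

    def countsketch(n, sketch_size, seed=0):
        """CountSketch matrix S of size sketch_size x n."""
        rng = np.random.default_rng(seed)
        rows = rng.integers(0, sketch_size, size=n)
        signs = rng.choice([-1.0, 1.0], size=n)
        return csr_matrix((signs, (rows, np.arange(n))), shape=(sketch_size, n))

    n, d = 100000, 20
    A, b = np.random.randn(n, d), np.random.randn(n)
    S = countsketch(n, sketch_size=10 * d * d)
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))  # close to 1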

In the final talk of the workshop, Ben Recht gave a survey of matrix completion. In the matrix completion problem, we are given a low-rank matrix M with missing entries and have to fill in the missing entries. This problem is also known as the Netflix problem. Since minimizing the rank is NP-hard, one instead uses a convex relaxation of the rank function, the nuclear norm (the sum of the singular values). This convex relaxation approach is similar to l_1-minimization in compressive sensing. For an m x n matrix A (m <= n) with rank r and coherence bounded by mu, the nuclear norm heuristic recovers A with high probability if at least C mu^2 n r log^6 n entries are observed (C is a constant). The dependence on the coherence is necessary because each entry has to provide a similar amount of information about A.
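A small instance of the nuclear norm heuristic can be written directly with a convex optimization package such as cvxpy (the sizes and sampling rate below are my own toy choices):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    m, n, r = 60, 50, 3
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # random rank-3 matrix
    mask = (rng.random((m, n)) < 0.5).astype(float)                # observe about half the entries

    X = cp.Variable((m, n))
    objective = cp.Minimize(cp.norm(X, "nuc"))                     # nuclear norm = sum of singular values
    constraints = [cp.multiply(mask, X) == mask * M]               # agree on the observed entries
    cp.Problem(objective, constraints).solve()

    print(np.linalg.norm(X.value - M) / np.linalg.norm(M))         # typically tiny (near-exact recovery)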


