SIAM Conference on Computational Science and Engineering (SIAM CSE17) took place in Atlanta, Georgia, USA. We (Tammy, Grey, and Bora) have organized a minisymposium on tensor decompositions. SIAM CSE17's web page contains the talk information, including the speaker names, authors, and the abstracts. The speakers of the minisymposium kindly agreed to share the slides at this page.
Tensors, or multidimensional arrays, are a natural way to represent high-dimensional data arising in a multitude of applications. Tensor decompositions, such as the CANDECOMP/PARAFAC and Tucker models, help to identify latent structure, achieve data compression, and enable other tools of data analysis. This minisymposium identifies pressing applications for multidimensional data analysis as well as efficient algorithms for computing various tensor decompositions.
Title: Discovering Fast Matrix Multiplication Algorithms Via Tensor Decomposition
Speakar: Grey Ballard (Wake Forest University, USA)
Abstract:The computational complexity of matrix multiplication is an open question. Fast algorithms for matrix multiplication are those that require $o(n^3)$ flops (that is, the exponent is less than 3) for $n\times n$ matrices. Nearly all fast algorithms are based on a clever method for multiplying very small matrices that can be applied recursively. Discovering such a method corresponds to finding an exact, low-rank decomposition of a particular tensor. In this talk, I'll discuss numerical techniques for computing these decompositions and the practicality of the algorithms that have been discovered.
Title: A Practical Randomized CP Tensor Decomposition
Speaker: Casey Battaglino (Georgia Institute of Technology, USA)
Abstract:The CANDECOMP/PARAFAC (CP) decomposition is a leading method for the analysis of multi-way data. The standard alternating least squares algorithm for the CP decomposition (CP-ALS) involves a series of highly overdetermined least squares problems. We show that, by extending randomized least squares methods to tensors, the workload of CP-ALS can be drastically reduced without a sacrifice in quality. We introduce techniques for efficiently preprocessing, sampling, and computing randomized least squares on a dense tensor of arbitrary order, as well as an efficient sampling-based technique for checking the stopping condition. We also show more generally that the Khatri-Rao product (used within the CP iteration) produces conditions favorable for direct sampling without any preprocessing. In numerical results, we see improvements in speed, storage, and robustness.
Title: Tensor Based Approaches in Magnetic Resonance Spectroscopic Signal Analysis
Speaker: Sabine Van Huffel (KU Leuven, Belgium)
Abstract: In recent years, magnetic resonance spectroscopic imaging (MRSI) is being used for brain tumor diagnosis. The processing of the MRSI signals are mainly performed using matrix based approaches. Tensorizing the matrix and using suitable tensor decompositions provides certain advantages. We focus on tensor based approaches to perform residual water suppression in the MRSI signal and to differentiate various tissue types in glioma patients from MRSI signals. To suppress the residual water, a Loewner tensor is constructed from the MRSI signal. Canonical polyadic decomposition (CPD) is applied on the tensor to extract the water component, and to subsequently remove it from the original MRSI signal. Tensor based tumor tissue type differentiation consists of building a covariance structured 3-D tensor from the MRSI spectra and then applying a non-negative CPD with common factor in mode-1 and mode-2 and l1 regularization on mode-3 to extract tissue-specific spectra and its corresponding distribution in the MRSI grid. An in-vivo study shows that our tensor based approach has better performance in identifying tumor and necrotic tissue types in glioma patients compared to matrix based approaches. This tensor based tissue characterization approach can be further extended to multi-parametric magnetic resonance imaging consisting of conventional magnetic resonance imaging, perfusion-weighted imaging, diffusion-weighted imaging and MRSI modalities.
Title: Impromptu discussion
Abstract: Slides of the improptu discussion are available! Notes are taken by T. G. Kolda.
Title: Tensor Decompositions for Bernoulli Data
Speaker: Tamara G. Kolda (Sandia National Laboratories, USA)
Abstract: Tensor decompositions are a powerful tool for multiway factor analysis. We consider the problem of modeling binary data using the canonical polyadic (CP) tensor decomposition. Binary data appears in many scenarios. For instance, a tensor representing a time-evolving graph may have a true in position $(i,j,k)$ if nodes $i$ and $j$ are connected at time $k$. We assume the data is Bernoulli distributed and that the parameters in the tensor decomposition represent the logarithm of the odds ratio (i.e., also known as the “logit' function) of a true value. We discuss the formulation of the problem, its connection to standard (Gaussian) tensor decomposition, and give examples of its utility on real-world problems.
Title: An Exploration of Optimization Algorithms for High Performance Tensor Completion
Speaker: Shaden Smith (University of Minnesota, USA)
Abstract: Tensor completion is a powerful tool used to estimate or recover missing values in multi-way data. It has seen great success in domains such as product recommendation and healthcare. Tensor completion is most often accomplished via low-rank sparse tensor factorization, a computationally expensive non-convex optimization problem which has only recently been studied in the context of parallel computing. In this work, we study three optimization algorithms that have been successfully applied to tensor completion: alternating least squares (ALS), stochastic gradient descent (SGD), and coordinate descent (CCD++). We explore opportunities for parallelism on shared- and distributed-memory systems and address challenges such as memory- and operation-efficiency, load balance, cache locality, and communication. Among our advancements are an SGD algorithm which combines stratification with asynchronous communication, an ALS algorithm rich in level-3 BLAS routines, and a communication-efficient CCD++ algorithm. We evaluate our optimizations on a variety of real datasets using a modern supercomputer and demonstrate speedups through 1024 cores. These improvements effectively reduce time-to-solution from hours to seconds on real-world datasets. We show that after our optimizations, ALS is advantageous on parallel systems of small-to-moderate scale, while both ALS and CCD++ will provide the lowest time-to-solution on large-scale distributed systems.
Title: Efficient Parallel Software for Tucker Decompositions of Dense Tensors
Speaker: Alicia M. Klinvex (Sandia National Laboratories, USA)
Abstract: As parallel computing tends toward the exascale, scientific data produced by simulations are growing increasingly massive, sometimes resulting in terabytes of data. By viewing this data as a dense tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving impressive compression ratios with negligible loss in accuracy. We present recent improvements in our distributed-memory parallel implementation of the Tucker decomposition, whose key computations correspond to parallel linear algebra operations. To demonstrate the compression and accuracy of the method, we apply our software to real-world data sets from combustion simulations. We also provide detailed performance results.
Title: Large-Scale Dense Tucker Decomposition on GPUs
Authors:Jee Choi (IBM Research, USA)
Abstract: ...