Tenenbaum, et al. 2000

Summary

Isomap, seemingly named for “Isometric mapping”, seeks to provide a solution to the problem of non-linear dimensionality reduction. The method is especially suitable for high-dimensional manifolds that exhibit non-Euclidean geometry, such that the Euclidean distance between data points returns distances that are not actually realistic for the underlying low-dimensional manifold. The intuition for this approach lies in the use of the all-pairs shortest path algorithm to improve upon Multi-dimensional scaling. Under general conditions on the density and curvature of the points, a geodesic distance can be estimated between far away points on the high-dimensional manifold via the all-pairs shortest path that converges to the true distance in the limit. Then, similar to MDS, Isomap attempts to find coordinate vectors for a low-dimensional space within which the distances between points are preserved as much as possible. This essentially results in the selection of the largest p eigenvectors of the matrix of estimated distances on the high-dimensional manifold (transformed to inner products). To make the algorithm work, the first step consists of clustering the data points either using k-NN or $\epsilon$-balls. Edges are placed between all points clustered together, to form the graph upon which all-pairs shortest path is run.

In this paper, the authors present examples of applying Isomap to a dataset of faces, MNIST, and the “swiss roll” dataset. Interestingly, they are able to map the faces dataset to a 3-D space, capturing left-right poses, up-down poses, and variations in ambient lighting. They show that PCA and MDS converge (the residual loss goes to 0) but they are unable to recover the true dimensionality of the low-dimensional manifold. This seems to be troublesome, because if one naively applies PCA to a dataset and the residual loss goes to 0, it appears then that the user of this algorithm will mistakenly believe they have recovered the true low-dimensional manifold. It would be interesting to then run a classifier on this low-dimensional representation produced by PCA, and then check the performance against the same classifier using the low-dimensional representation learned by Isomap. I imagine that the Isomap classifier will have slightly better performance.