Principal Surfaces

What are principal surfaces?

A principal surface of a data matrix \({\bf{X}}: n \times p\) is found by an iterative procedure which starts from a linear summary, typically the plane spanned by the first two principal components. Each successive iteration replaces the surface by a smooth, or local average, of the \(p\)-dimensional points, where "local" is determined by the projections of the points onto the surface from the previous iteration. At each iteration the sum of squared distances between the points and the surface is computed, and the procedure aims to minimise this sum.

The initial step in the algorithm uses the sample principal components as the initial estimate \({\bf{f}}^{(0)}\):

\[\begin{equation} {\bf{f}}^{(0)} = {\bf{X}}_0 {\bf{V}}_2 {\bf{V}}_2' + {\bf{1}}\bar{{\bf{X}}}', \end{equation}\]

where \({\bf{X}}_0\) contains the centred values of \({\bf{X}}\), \({\bf{V}}_2\) consists of the first two columns of the right singular vectors of the centred matrix \({\bf{X}}_0\), and \(\bar{{\bf{X}}}\) is the vector of \(p\) mean values of the variables in \({\bf{X}}\).
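
As a minimal sketch of this initial estimate (an assumed reading of the formula above, not the internal code of prinsurf), the computation can be written directly from the singular value decomposition of the centred data:

## Illustrative sketch only: initial surface f^(0) from the first two
## principal components (not prinsurf internals)
initial_surface <- function(X) {
  X0   <- scale(X, center = TRUE, scale = FALSE)  # centred data matrix X_0
  V2   <- svd(X0)$v[, 1:2]                        # first two right singular vectors
  Xbar <- colMeans(X)                             # the p variable means
  X0 %*% V2 %*% t(V2) + matrix(Xbar, nrow(X), ncol(X), byrow = TRUE)
}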

The \(j\)-th iteration consists of two steps:

Projection Step: Given the current iterate \({\bf{f}}^{(j-1)}\) of the principal surface, each sample point \(x\) is projected onto the surface to obtain an updated value of \(\lambda\):

\[\begin{equation} \lambda_{{\bf{f}}^{(j-1)}}(x) = \lambda^{(j)} = {\bf{f}}^{(j-1)}{\bf{v}}_2 \end{equation}\]

where \({\bf{v}}_2\) consists of the first two columns of the right singular vectors of the matrix \({\bf{f}}^{(j-1)}\).
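
A sketch of the projection step under this reading, where the \(\lambda\) coordinates are taken as \({\bf{f}}^{(j-1)}{\bf{v}}_2\) (an assumption about the computation, not prinsurf's own code):

## Illustrative sketch only: project the current surface onto its first two
## right singular vectors to obtain the n x 2 matrix of lambda coordinates
project_points <- function(f_prev) {
  v2 <- svd(f_prev)$v[, 1:2]
  f_prev %*% v2
}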

With a finite data set of \(n\) observations, the set of observations that project onto any specific point of the surface typically contains only a single point, so the average is instead computed within a small neighbourhood around each point. For this, a scatterplot smoother is fitted to \({\bf{f}}^{(j-1)}\) as a function of \(\lambda^{(j)}\). A locally weighted scatterplot smoother (lowess) is typically used.

Expectation Step: Given the set of \(\lambda_i^{(j)}\) from the projection step, the next iterate of the surface is computed by averaging all points that project onto nearby points on the surface, thus invoking the self-consistency property:

\[\begin{equation} {\bf{f}}^{(j)}(\lambda_i^{(j)}) = E({\bf{X}}|\lambda_{{\bf{f}}^{(j-1)}}(x_i) = \lambda_i^{(j)}) \end{equation}\]
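
A sketch of this averaging step, assuming a bivariate smoother stands in for the local averaging. Since lowess() in R accepts only a single predictor, loess() with the two \(\lambda\) coordinates as predictors is used here; the smoother used inside prinsurf may differ:

## Illustrative sketch only: smooth each of the p coordinates of X against the
## two lambda coordinates to obtain the next iterate of the surface
smooth_surface <- function(X, lambda) {
  d <- data.frame(l1 = lambda[, 1], l2 = lambda[, 2])
  apply(X, 2, function(xk) fitted(loess(xk ~ l1 + l2, data = cbind(d, xk = xk))))
}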

Given the \(n\) observations \({\bf{x}}_i\), the sum of squared Euclidean distances is estimated by:

\[\begin{equation} D^2({\bf{f}},{\bf{\lambda}}) = E || {\bf{X}} - f({\bf{\lambda}})||^2 = \sum_{i=1}^n || {\bf{x}}_i - {\bf{f}}({\bf{\lambda_f}}({\bf{x}}_i))||^2. \end{equation}\]
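
This criterion translates into a short helper (a sketch; the rows of \({\bf{f}}\) are taken to hold the projected points \({\bf{f}}({\bf{\lambda_f}}({\bf{x}}_i))\)):

## Illustrative sketch only: sum of squared Euclidean distances between each
## observation and its corresponding point on the surface
sq_dist <- function(X, f) sum(rowSums((X - f)^2))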

The convergence criterion for the principal surface algorithm is the relative change in \(D^2({\bf{f}},{\bf{\lambda}})\) going from the \((j-1)\)-st iteration to the \(j\)-th iteration,

\[\begin{equation} \text{threshold}^{(j)} = \frac{|D^2({\bf{f}}^{(j-1)},{\bf{\lambda}})-D^2({\bf{f}}^{(j)},{\bf{\lambda}})|}{D^2({\bf{f}}^{(j-1)},{\bf{\lambda}})}. \end{equation}\]
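
The relative change can likewise be computed with a one-line helper (illustrative only):

## Illustrative sketch only: relative change in D^2 between two iterations
rel_change <- function(d_prev, d_curr) abs(d_prev - d_curr) / d_prev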

The algorithm alternates between the projection and expectation steps until \(\text{threshold}^{(j)}\) is reduced below some specified value, such as 0.001. The final iteration yields \(n\) tuples for the principal surface \({\bf{f}}\) and its coordinates \({\bf{\lambda}}\).
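
Putting the sketches above together, the alternation can be written as a simple loop (an illustrative skeleton under the assumptions stated earlier, not the implementation used by principal.surface):

## Illustrative skeleton only: alternate projection and expectation steps until
## the relative change in D^2 drops below tol
fit_surface_sketch <- function(X, tol = 0.001, max.iter = 50) {
  f      <- initial_surface(X)           # f^(0): the PCA plane
  d_prev <- sq_dist(X, f)
  for (j in seq_len(max.iter)) {
    lambda <- project_points(f)          # projection step
    f      <- smooth_surface(X, lambda)  # expectation (local averaging) step
    d_curr <- sq_dist(X, f)
    if (rel_change(d_prev, d_curr) < tol) break
    d_prev <- d_curr
  }
  list(f = f, lambda = lambda)
}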

Once the algorithm has converged, the final principal surface \({\bf{f}}({\bf{\lambda}})\), of size \(n \times p\), together with its corresponding \({\bf{\lambda}}\) coordinates, of size \(n \times 2\), is returned. Since the \({\bf{\lambda}}\) coordinates are not uniformly spread, an additional set of points on the surface is inferred. These points are equally spaced over the surface, which helps to present a fuller representation of the principal surface.
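
One way such evenly spaced points could be inferred is to predict each coordinate of the surface over a regular grid of \(\lambda\) values with the same type of smoother; this is a sketch of that idea and not necessarily how prinsurf constructs its grid:

## Illustrative sketch only: infer surface points on a regular grid of lambda values
surface_grid <- function(X, lambda, n_grid = 20) {
  d <- data.frame(l1 = lambda[, 1], l2 = lambda[, 2])
  g <- expand.grid(l1 = seq(min(d$l1), max(d$l1), length.out = n_grid),
                   l2 = seq(min(d$l2), max(d$l2), length.out = n_grid))
  sapply(seq_len(ncol(X)), function(k) {
    fit <- loess(xk ~ l1 + l2, data = cbind(d, xk = X[, k]))
    predict(fit, newdata = g)   # rows of the result are inferred surface points
  })
}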

Demonstration of the function principal.surface()

library(prinsurf)
library(rgl)

Consider the following simulated data of 3 variables:

# Simulate n points on a noisy two-dimensional structure in 3 variables
# (the 'side' argument is retained from the original call but is unused)
data2 <- function(n, side)
{
  e1 <- runif(n, 0, 0.5)           # uniform noise added to each coordinate
  x <- 2 * runif(n) - 1            # x in (-1, 1)
  theta <- 2 * pi * runif(n) - pi  # angle in (-pi, pi)
  y <- sin(theta) * sqrt(1 - x^3)
  z <- sin(theta) * sqrt(x^2)
  X <- matrix(c(x + e1, y + e1, z + e1), nrow = n, ncol = 3, byrow = FALSE)
  return(X)
}
X <- data2(100, 1)
points3d(X, col = "blue")

The function principal.surface applied to the simulated data generates two plots. The first plot displays the \(\lambda\) points in two dimensions from the final iteration of the algorithm. The second plot shows the principal surface fitted to the data (in blue) in 3D. A grid of inferred points is used to create a surface-like visualisation.

surface <- principal.surface(X,max.iter = 5,print_iterations = TRUE)
#> [1] 1.0000000 0.8561086 0.8076939
#> [1] 2.00000000 0.04860376 0.84695082
#> [1] 3.00000000 0.05764037 0.89576939
#> [1] 4.000000000 0.003053059 0.893034549
#> [1] 5.000000000 0.002164445 0.894967474