# 17.7.2 Partial Least Squares

Partial Least Squares (PLS) combines features of principal components analysis and multiple regression. It first extracts a set of latent factors that explain as much as possible of the covariance between the independent and dependent variables. A regression step then predicts values of the dependent variables using the decomposition of the independent variables.

## Goals

There are two primary reasons for using PLS:

• Prediction
PLS is a popular method for constructing a predictive model when there are many factors and they are highly collinear.
• Data Reduction
PLS can convert a set of highly correlated variables into a smaller set of uncorrelated latent variables.

## Processing Procedure

### Preparing Analysis Data

PLS is suited to variables that are strongly correlated. Because PLS can be viewed as a combination of PCA and multiple regression, data that is appropriate for PCA can also be analyzed with PLS.

### Selecting Computation Methods

#### SVD or Wold's Iteration

These two methods yield the same result; the difference is that Wold's Iteration is slightly faster than SVD. Some papers refer to the method as SIMPLS, which is simply SVD or Wold's Iteration by another name.
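Why the two computations agree can be sketched numerically: the first PLS weight vector is the dominant left singular vector of the cross-covariance matrix X'Y, which an SVD gives directly and a Wold-style power iteration converges to. This is a minimal numpy illustration with made-up data.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))
Y = rng.normal(size=(60, 2))
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
S = Xc.T @ Yc                      # cross-covariance matrix X'Y

# Method 1: SVD -- first left singular vector of S.
w_svd = np.linalg.svd(S)[0][:, 0]

# Method 2: Wold-style power iteration on S S'.
w = np.ones(S.shape[0])
for _ in range(200):
    w = S @ (S.T @ w)
    w /= np.linalg.norm(w)

# Singular vectors are defined only up to sign; align before comparing.
w = w if w @ w_svd > 0 else -w
print(np.allclose(w, w_svd, atol=1e-8))
```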

#### Cross Validation

Verifying the fitted model is an important step, and cross-validation lets us evaluate the model's predictive performance. Origin uses the leave-one-out method of cross-validation: the predicted residual sum of squares (PRESS) and its root mean are used to find the optimal number of factors.

## Handling Missing Values

If the independent or dependent variables contain missing values, the whole case (entire row) is excluded from the analysis.
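This listwise deletion can be sketched in numpy: keep only the rows that are fully observed in both the predictors and the response (an illustration of the rule, not Origin's internal code).

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, 5.0],
              [6.0, 7.0]])
y = np.array([1.0, 2.0, np.nan, 4.0])

# A row is kept only if X and y are both non-missing for that case.
keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
X_clean, y_clean = X[keep], y[keep]
print(X_clean.shape)   # rows 2 and 3 are dropped -> 2 complete cases remain
```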

## Performing Partial Least Squares

• Select **Statistics: Multivariate Analysis: Partial Least Squares**
Or
• Type `pls -d` in the Script Window

Topics covered in this section:
• The Partial Least Squares Dialog Box
• Interpreting Results
• Algorithm
• Tutorial
• Reference