17.7.2.2 Interpreting Results of Partial Least Squares

Partial Least Squares Report Sheet

Cross Validation

This report section appears only when Cross Validation is selected. It gives summary statistics for models fitted with 0 up to the specified maximum number of extracted factors. The maximum number of extracted factors cannot exceed the number of independent variables, and it is capped at 15 when there are more than 15 independent variables. The section contains one table, Cross Validation Summary, and one plot, PRESS Plot. Together they indicate the optimal number of factors. Cross validation can be carried out in several ways, including K-fold cross-validation (with 2-fold as a special case), repeated random sub-sampling validation, and leave-one-out cross-validation. For Partial Least Squares, the leave-one-out method is usually chosen: each time, a single observation serves as the validation data and the remaining observations serve as the training data, and the process repeats until every observation has been used once as validation data.
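The leave-one-out procedure can be sketched in a few lines of Python. This is only an illustration, not the routine used by the software: the function names are hypothetical, and the model plugged in is a mean-only baseline, which is assumed here to stand in for the 0-factor case. Any fit/predict pair could be passed instead.

```python
def leave_one_out_splits(n_obs):
    """Yield (training indices, held-out index), one pair per observation."""
    for held_out in range(n_obs):
        train = [i for i in range(n_obs) if i != held_out]
        yield train, held_out


def loo_press(x, y, fit, predict):
    """Accumulate squared prediction errors over all leave-one-out rounds."""
    press = 0.0
    for train, held_out in leave_one_out_splits(len(y)):
        model = fit([x[i] for i in train], [y[i] for i in train])
        residual = y[held_out] - predict(model, x[held_out])
        press += residual ** 2
    return press


# Hypothetical baseline: ignore the predictors and predict the training mean.
def fit_mean(x_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x_new):
    return model


# Made-up data, for illustration only.
x = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 2.0, 3.0, 6.0]
press_0 = loo_press(x, y, fit_mean, predict_mean)
print(round(press_0, 4))
```

Each observation is held out exactly once, so the loop runs as many times as there are observations.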

Cross Validation Summary

In this table, Root mean PRESS is the square root of the mean PRESS, where PRESS is the predicted residual sum of squares. As the number of factors increases, Root mean PRESS generally decreases (not necessarily strictly) to a minimum and then increases again. The number of factors at which the minimum is reached is the Optimal number of factors. The information of most interest is summarized in the notes below the table.
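As a rough sketch of how the optimal number of factors is read from such a table, the snippet below turns PRESS values into Root mean PRESS and picks the minimum. The PRESS values and the observation count are made up, and dividing PRESS by the number of observations is an assumption; the software may normalize differently.

```python
import math


def root_mean_press(press_values, n_obs):
    """Square root of PRESS averaged over n_obs (assumed normalization)."""
    return [math.sqrt(p / n_obs) for p in press_values]


def optimal_factor_count(rm_press):
    """Index of the smallest value; the index equals the number of factors."""
    return min(range(len(rm_press)), key=rm_press.__getitem__)


# Hypothetical PRESS values for models with 0, 1, 2, ... extracted factors.
press_by_factors = [24.9, 10.2, 6.1, 4.8, 5.3, 6.0]
rm_press = root_mean_press(press_by_factors, n_obs=20)
best = optimal_factor_count(rm_press)

for n_factors, value in enumerate(rm_press):
    marker = "  <- minimum" if n_factors == best else ""
    print(f"{n_factors} factors: {value:.4f}{marker}")
```

If several models tie for the minimum, this sketch returns the one with the fewest factors.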

PRESS Plot

This plot shows Root mean PRESS against the number of factors, so the point at which the minimum is reached can be seen directly.

Variance Explained

This section includes one table, Percent of Variance, and two Variance Explained plots.

Percent of Variance

This table gives the percent variation and the cumulative percent variation explained for both X and Y. The cumulative percent for both X Effects and Y Responses increases as more factors are included.
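The relation between the two columns is a running sum, as the short sketch below shows. The per-factor percentages are invented for illustration and do not come from any real fit.

```python
from itertools import accumulate

# Hypothetical percent variation explained by factors 1, 2 and 3.
x_percent = [55.0, 25.0, 10.0]
y_percent = [60.0, 20.0, 5.0]

# The cumulative percent is the running total over the factors so far.
x_cumulative = list(accumulate(x_percent))
y_cumulative = list(accumulate(y_percent))

rows = zip(x_percent, x_cumulative, y_percent, y_cumulative)
for n_factors, (xp, xc, yp, yc) in enumerate(rows, start=1):
    print(f"{n_factors}: X {xp:.1f}% (cum. {xc:.1f}%), Y {yp:.1f}% (cum. {yc:.1f}%)")
```

Because each factor explains a non-negative share of the variation, the cumulative columns can never decrease.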

Variance Explained Plot

These two plots show Variance Explained for X Effects(%) and Variance Explained for Y Responses(%) respectively.

Coefficients Plots

For each response in Y, the corresponding plot shows the coefficients of X based on the original data.

Variable Importance

This plot shows the VIP (Variable Importance in Projection) value of each predictor variable, which measures how important the variable is in explaining the variance in the responses. The plot includes a reference line at 0.8. A variable is considered 'important' if its VIP value is greater than 0.8.
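For readers who want to see how such a value can be computed, here is a minimal sketch using the commonly cited VIP formula, in which each factor's squared, normalized weight is weighted by the response sum of squares that factor explains. Whether the software uses exactly this formula is an assumption, and the weights and sums of squares below are made up.

```python
import math


def vip_scores(weights, ss_y):
    """VIP per predictor.

    weights[a][j]: weight of predictor j on factor a (hypothetical input).
    ss_y[a]: response sum of squares explained by factor a.
    """
    n_predictors = len(weights[0])
    total_ss = sum(ss_y)
    scores = []
    for j in range(n_predictors):
        acc = 0.0
        for w_a, ss in zip(weights, ss_y):
            norm_sq = sum(w * w for w in w_a)  # squared length of the weight vector
            acc += ss * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_predictors * acc / total_ss))
    return scores


# Made-up weights for 2 factors and 3 predictors.
weights = [[0.8, 0.5, 0.1],
           [0.2, -0.6, 0.7]]
ss_y = [6.0, 2.0]

vip = vip_scores(weights, ss_y)
important = [j for j, value in enumerate(vip) if value > 0.8]  # 0.8 reference line
print([round(v, 3) for v in vip], important)
```

With this formula the squared VIP values average to 1 across predictors, which is one reason a fixed threshold such as 0.8 is a convenient cut-off.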

Loadings Plot

The Loadings Plot shows the relationship between the original variables and the subspace dimensions. It is used to interpret relationships among variables.

Scores Plot

The Scores Plot is a projection of the data onto the subspace. It is used to interpret relationships among observations.
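The idea of projecting observations onto a subspace can be illustrated generically. In the sketch below the two direction vectors are hypothetical and simply stand in for whatever projection the fitted model supplies; the data are made up and are centered before projection.

```python
def center_columns(rows):
    """Subtract each column's mean so every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def project(rows, directions):
    """Coordinates of each observation along each direction (dot products)."""
    return [
        [sum(v * d for v, d in zip(row, direction)) for direction in directions]
        for row in rows
    ]


# Made-up data: 3 observations, 3 variables.
data = [[1.0, 2.0, 3.0],
        [2.0, 4.0, 1.0],
        [3.0, 6.0, 2.0]]

# Hypothetical directions spanning a two-dimensional subspace.
directions = [[0.6, 0.8, 0.0],
              [0.0, 0.0, 1.0]]

scores = project(center_columns(data), directions)
for i, (t1, t2) in enumerate(scores, start=1):
    print(f"observation {i}: ({t1:.2f}, {t2:.2f})")
```

Plotting the first coordinate against the second gives one point per observation, which is the kind of picture a scores plot presents.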

Diagnostics Plots

For each response in Y, there is a linear fit plot, residual scatter plots, and a normal percentile plot. These plots are used for model diagnosis.

Distance Plots

The distance plots show, for each observation, the distance to the X model and the distance to the Y model.

T Square Plot

The van der Voet $T^2$ statistics are used to test whether models with different numbers of extracted factors differ significantly from the optimum model. The T Square Plot is a scatter plot of these statistics.