17.7.2.2 Interpreting Results of Partial Least Squares
Partial Least Squares Report Sheet
Cross Validation
This section of the report appears only when Cross Validation is selected. It gives summary statistics for models fitted with 0 up to the specified maximum number of extracted factors. The maximum number of extracted factors is capped at 15 when there are more than 15 independent variables; otherwise it can be as large as the number of original independent variables. The section contains one table, Cross Validation Summary, and one plot, PRESS Plot. Together they indicate the optimal number of factors, which is usually the result of most interest. Cross validation can be done in several ways, such as K-fold cross-validation, 2-fold cross-validation, repeated random sub-sampling validation and leave-one-out cross-validation. For Partial Least Squares, the leave-one-out method is the usual choice: each time, a single observation is used as validation data and the remaining observations as training data, and the process repeats until every observation has been used as validation data once.
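The cap on the number of factors and the leave-one-out scheme described above can be sketched in a few lines of Python. This is only an illustration of the splitting logic, not the software's implementation; the function names max_extracted_factors and leave_one_out_splits are hypothetical, and no model is fitted here.

```python
def max_extracted_factors(n_independent):
    """Upper limit on extracted factors: the number of independent variables, capped at 15."""
    return min(n_independent, 15)


def leave_one_out_splits(n_obs):
    """Yield (training_indices, validation_index) pairs, one per observation."""
    for held_out in range(n_obs):
        training = [i for i in range(n_obs) if i != held_out]
        yield training, held_out


# Hypothetical data set with 5 observations and 20 independent variables.
splits = list(leave_one_out_splits(5))
for training, held_out in splits:
    print(f"validate on observation {held_out}, train on {training}")

print("maximum factors:", max_extracted_factors(20))
```

Each observation is held out exactly once, so a data set with n observations requires n model fits for every candidate number of factors.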
Cross Validation Summary
In this table, Root mean PRESS is the root mean of PRESS, the predicted residual sum of squares. As the number of factors increases, Root mean PRESS generally decreases (not necessarily strictly) to a minimum and then increases again. The number of factors at which the minimum is reached is the so-called Optimal number of factors. The information of most interest is summarized in the notes below the table.
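As a rough sketch of how the optimal number of factors can be read off such a table, the following Python snippet picks the factor count with the smallest Root mean PRESS. Two things here are assumptions rather than documented behavior: Root mean PRESS is taken to be the square root of PRESS divided by the number of observations (the exact scaling used by the software may differ), and ties are resolved in favor of fewer factors. The PRESS values are made up for illustration.

```python
import math


def root_mean_press(press, n_obs):
    # Assumed definition: square root of PRESS averaged over observations.
    return math.sqrt(press / n_obs)


def optimal_number_of_factors(root_mean_values):
    # root_mean_values[k] belongs to the model with k factors (k = 0, 1, ...).
    # list.index returns the first match, so ties go to the smaller model.
    return root_mean_values.index(min(root_mean_values))


# Made-up PRESS values for models with 0 to 5 factors and 20 observations.
press_by_factors = [40.0, 22.0, 15.0, 12.8, 13.5, 14.1]
rm = [root_mean_press(p, 20) for p in press_by_factors]
best_k = optimal_number_of_factors(rm)
print("optimal number of factors:", best_k)
```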
PRESS Plot
This plot shows how Root mean PRESS changes with the number of factors and where it reaches its minimum.
Variance Explained
This section includes one table, Percent of Variance, and two Variance Explained plots.
Percent of Variance
This table reports the percent variation and the cumulative percent variation explained for both X and Y. The more factors are included, the larger the cumulative percent for both X Effects and Y Responses.
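The relationship between the per-factor percent and the cumulative percent is a running sum, which the short Python sketch below illustrates. The percentages are invented for the example and do not come from any real fit.

```python
from itertools import accumulate

# Invented percent variation explained by factors 1 to 4.
percent_x = [55.0, 25.0, 10.0, 4.0]
percent_y = [60.0, 20.0, 8.0, 2.0]

# The cumulative percent is the running total over the factors.
cumulative_x = list(accumulate(percent_x))
cumulative_y = list(accumulate(percent_y))

for k, (cx, cy) in enumerate(zip(cumulative_x, cumulative_y), start=1):
    print(f"{k} factor(s): X {cx}%  Y {cy}%")
```

Because every factor explains a non-negative share of the variation, the cumulative columns can only grow or stay flat as factors are added.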
Variance Explained Plot
These two plots show Variance Explained for X Effects(%) and Variance Explained for Y Responses(%) respectively.
Coefficients Plots
For each response in Y, the corresponding plot shows the coefficients of X based on the original data.
Variable Importance
This plot shows the VIP value of each predictor variable, a measure of how important the variable is in explaining the variance in the responses. The plot includes a reference line at 0.8. A variable is considered 'important' if its VIP value is greater than 0.8.
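For readers who want to see how a VIP value can be computed and compared with the 0.8 reference line, here is a minimal Python sketch. It uses the commonly cited definition of VIP (squared normalized weights, averaged over factors in proportion to the Y variation each factor explains); whether the software uses exactly this formula is an assumption. The function name vip_scores and the input numbers are hypothetical.

```python
import math


def vip_scores(weights, ss_y):
    """weights[a][j]: weight of predictor j on factor a.
    ss_y[a]: Y sum of squares explained by factor a."""
    n_pred = len(weights[0])
    total_ss = sum(ss_y)
    scores = []
    for j in range(n_pred):
        acc = 0.0
        for w_a, ss_a in zip(weights, ss_y):
            norm_sq = sum(w * w for w in w_a)  # normalize each weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_pred * acc / total_ss))
    return scores


# Hypothetical model: 3 predictors, 2 factors.
weights = [[0.8, 0.6, 0.0],
           [0.0, 0.6, 0.8]]
ss_y = [9.0, 1.0]

vip = vip_scores(weights, ss_y)
important = [j for j, v in enumerate(vip) if v > 0.8]  # 0.8 reference line
print(vip, important)
```

Under this definition the squared VIP values sum to the number of predictors, so a value of 1 corresponds to average importance, which is one way to read the 0.8 threshold.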
Loadings Plot
The Loadings Plot shows the relationship between the original variables and the subspace dimensions. It is used for interpreting relationships among variables.
Scores Plot
The Scores Plot is a projection of the data onto the subspace. It is used for interpreting relationships among observations.
Diagnostics Plots
For each response in Y, there is a linear fit plot, residual scatter plots and a normal percentile plot. These plots are used for model diagnosis.
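To make the idea behind the residual and normal percentile plots concrete, the Python sketch below computes residuals and pairs the sorted residuals with standard normal quantiles. The plotting position (i - 0.5) / n is an assumption chosen for simplicity; the software may use a different convention. The observed and predicted values are invented.

```python
from statistics import NormalDist


def residuals(observed, predicted):
    """Residual = observed response minus predicted response."""
    return [o - p for o, p in zip(observed, predicted)]


def normal_percentile_points(resid):
    """Pair each sorted residual with a standard normal quantile."""
    n = len(resid)
    ordered = sorted(resid)
    std_normal = NormalDist()
    # Assumed plotting position: (i - 0.5) / n for i = 1..n.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Invented observed and predicted values for one response.
observed = [2.0, 3.5, 5.1, 6.9, 9.2]
predicted = [2.2, 3.4, 5.0, 7.1, 9.0]

points = normal_percentile_points(residuals(observed, predicted))
for q, r in points:
    print(f"normal quantile {q:+.3f}  residual {r:+.3f}")
```

If the residuals are roughly normal, these points fall close to a straight line; strong curvature or isolated points far from the line suggest a problem with the model or with individual observations.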
Distance Plots
The distance plots show, for each observation, the distances to the X model and to the Y model.
T Square Plot
The van der Voet statistics are used to test whether models with various numbers of extracted factors differ significantly from the optimum model. The T Square Plot shows a scatter of these statistics.
