4.2.1.1 Linear Fitting and Outlier Removal



Summary

An outlier is typically described as a data point that is "very distant" from the other points in a data set and may therefore be the result of, for example, a fault in the measurement procedure. Identifying and removing outliers is often controversial, and is generally "more acceptable" when the model used to describe the data is well known and well accepted.


What you will learn

This tutorial will show you how to:

  • Perform linear regression on a set of data points
  • Examine the Residuals Table in the output and "identify" outliers
  • Use the Masking Tool to remove the outlier points
  • Use the Recalculation mechanism to automatically update the result after outlier removal

The procedure described in this tutorial also applies to other fitting tools, such as Polynomial and Nonlinear Fitting.

Steps

  1. Start with a new workbook and import the file \Samples\Curve Fitting\Outlier.dat.
  2. Click to select the second column, then use the menu item Plot: Symbol: Scatter to create a scatter plot.
  3. With the graph active, use the menu item Analysis: Fitting: Linear Fit... to bring up the Linear Fit dialog. If you have used the Linear Fit dialog before, a fly-out menu appears instead; select its Open Dialog... submenu.
  4. In the Fit Control tab, clear the Apparent Fit check box.

    DOC-2411 Tutorial FitLinear 007a Magenta.png

  5. In the Residual Analysis tab of the dialog, check the Standardized check box.

    DOC-2411 Tutorial FitLinear 003a Magenta.png

  6. Change the Recalculate drop-down at the top of the dialog to Auto and press the OK button at the bottom of the dialog. The dialog will close and linear regression will be performed on the data.

    DOC-2411 Tutorial FitLinear 004a Magenta.png

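  7. Select the FitLinearCurve1 result sheet in the data workbook and scroll right to view the Standardized Residual column. The value in row 6 of this column is -2.54889; because standardized residuals larger than about 2 in absolute value are commonly flagged as potential outliers, this point is a candidate for removal: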

    DOC-2411 Tutorial FitLinear 001a Magenta.png
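For readers who want to check this step outside Origin, the following is a minimal NumPy sketch of a straight-line least-squares fit and its standardized residuals. It assumes the standardized residual is the raw residual divided by the root mean square error of the fit, sqrt(SSR / (n - 2)); consult the Origin documentation for the exact definition Origin uses. The data are hypothetical (the line y = 2x + 1 with small noise and one point displaced at row 6) and do not reproduce the values from Outlier.dat. The |z| > 2 cutoff is likewise an assumed rule of thumb, not an Origin setting.

```python
import numpy as np

def standardized_residuals(x, y):
    """Fit y = a + b*x by ordinary least squares and return residuals / RMSE.

    Assumption: RMSE = sqrt(SSR / (n - 2)), the usual estimate for a
    two-parameter straight-line fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest power first
    residuals = y - (intercept + slope * x)
    rmse = np.sqrt(np.sum(residuals**2) / (len(x) - 2))
    return residuals / rmse

# Hypothetical data: y = 2x + 1 plus small noise, with row 6 pulled down by 8.
x = np.arange(1, 11, dtype=float)
noise = np.array([0.1, -0.2, 0.15, -0.1, 0.05, -8.0, 0.2, -0.15, 0.1, -0.05])
y = 2 * x + 1 + noise

z = standardized_residuals(x, y)
# Assumed rule of thumb: flag points whose standardized residual exceeds 2 in magnitude.
candidate_rows = np.flatnonzero(np.abs(z) > 2.0) + 1  # convert to 1-based row numbers
print(candidate_rows)
```

With a single large outlier, its standardized residual dominates while the remaining points stay well inside the cutoff, which is the pattern the Standardized Residual column is meant to reveal.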

Masking plotted data by selecting points in the worksheet

At this point, we can mask data using either the worksheet or the graph. We will start by demonstrating how to mask plotted data from the worksheet.

  1. Return to the original Outlier sheet and highlight row 6.
  2. On the Mini Toolbar that pops up, click the Mask/Unmask Data button. The outlier is now masked and shown in red in the graph window.
  3. Select row 6 again and click the Mask/Unmask Data button to remove the mask; the point reverts to black.
Tutorial FitLinear 001b Magenta.png
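Conceptually, masking keeps a row in the worksheet but excludes it from the data passed to the analysis. The sketch below models that idea in NumPy with a boolean mask array. The MaskedXY class, its toggle_mask and fit_input methods, and the linear_fit helper are hypothetical names for illustration, not part of Origin's API, and the data are a hypothetical line with one displaced point at row 6.

```python
import numpy as np

class MaskedXY:
    """Hypothetical stand-in for an X/Y column pair with a per-row mask."""

    def __init__(self, x, y):
        self.x = np.asarray(x, dtype=float)
        self.y = np.asarray(y, dtype=float)
        self.masked = np.zeros(self.x.size, dtype=bool)  # False = included in analysis

    def toggle_mask(self, row):
        """Flip the mask state of a 1-based row, like clicking Mask/Unmask Data."""
        self.masked[row - 1] = not self.masked[row - 1]

    def fit_input(self):
        """Return only the unmasked points; masked rows remain stored in x and y."""
        keep = ~self.masked
        return self.x[keep], self.y[keep]


def linear_fit(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope


x = np.arange(1, 11, dtype=float)
y = 2 * x + 1 + np.array([0.1, -0.2, 0.15, -0.1, 0.05, -8.0, 0.2, -0.15, 0.1, -0.05])
data = MaskedXY(x, y)

print(linear_fit(*data.fit_input()))  # all 10 points, outlier included
data.toggle_mask(6)                   # mask row 6
print(linear_fit(*data.fit_input()))  # fit on the remaining 9 points
data.toggle_mask(6)                   # unmask row 6 again
```

Keeping the mask separate from the data means unmasking is lossless: the original value is never deleted, only skipped.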

Masking plotted data by selecting points in the graph

To mask data in the graph:

  1. Make the graph active, then click and hold the left mouse button on the "Regional Mask Tool" button in the Tools toolbar. From the fly-out menu, select the first item, "Masked Points on Active Plot":

    Tutorial FitLinear 002.png

  2. With this submenu selected, go to the graph and click the 6th data point to mask it (alternatively, drag out a rectangle around the point).

    Tutorial FitLinear 005.png

    Whether you mask the point in the worksheet or in the graph window, masking it changes the input data to the linear fit and triggers the automatic recalculation. The linear fit is repeated with the masked point left out, and the fit curve in the graph and the parameters table update automatically. The resulting graph should look like this:

    Tutorial FitLinear 006.png
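The auto-update behavior can be pictured as an observer pattern: the fit registers interest in its input data and reruns whenever the mask changes. The sketch below is a hypothetical model of that idea, not Origin's internal implementation. The Dataset and AutoLinearFit classes and the mask_region method (which masks every point inside a dragged rectangle, given in data coordinates) are illustrative names, and the data are a hypothetical line with one displaced point at row 6.

```python
import numpy as np

class Dataset:
    """Hypothetical X/Y data with a mask and change notifications."""

    def __init__(self, x, y):
        self.x = np.asarray(x, dtype=float)
        self.y = np.asarray(y, dtype=float)
        self.masked = np.zeros(self.x.size, dtype=bool)
        self._listeners = []

    def subscribe(self, callback):
        """Register a function to call whenever the mask changes."""
        self._listeners.append(callback)

    def mask_region(self, x_min, x_max, y_min, y_max):
        """Mask all points inside a rectangle given in data coordinates."""
        inside = ((self.x >= x_min) & (self.x <= x_max)
                  & (self.y >= y_min) & (self.y <= y_max))
        if (inside & ~self.masked).any():  # notify only if the fit input changed
            self.masked |= inside
            for callback in self._listeners:
                callback(self)

    def unmasked(self):
        """Return the points that are not masked."""
        keep = ~self.masked
        return self.x[keep], self.y[keep]


class AutoLinearFit:
    """Straight-line fit that recalculates itself when its input changes."""

    def __init__(self, data):
        self.runs = 0
        data.subscribe(self.recalculate)
        self.recalculate(data)  # initial fit on all points

    def recalculate(self, data):
        x, y = data.unmasked()
        self.slope, self.intercept = np.polyfit(x, y, 1)
        self.runs += 1


x = np.arange(1, 11, dtype=float)
y = 2 * x + 1 + np.array([0.1, -0.2, 0.15, -0.1, 0.05, -8.0, 0.2, -0.15, 0.1, -0.05])
data = Dataset(x, y)
fit = AutoLinearFit(data)

# Drag a rectangle around the 6th point (x = 6, y = 5 in this hypothetical data).
data.mask_region(5.5, 6.5, 0.0, 10.0)
print(fit.runs, fit.intercept, fit.slope)
```

In this sketch, dragging over a point that is already masked leaves the mask unchanged, so no listener is called and the fit is not rerun.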