Datasets

Introduction

A dataset is a basic Origin object. It is essentially a one-dimensional array that can contain numeric and/or text values.

  • A dataset may contain different data types
  • The individual values in a dataset are called elements.
  • Each element in a dataset is associated with an index number, by which individual elements can be accessed. In a worksheet, the index number corresponds to the row number. Note that in contrast to Origin C convention (where the first element has an index of 0), LabTalk index numbers start with 1.
  • Each dataset in an Origin Project has a unique name.

A dataset is often associated with a worksheet column and/or one or more data plots. When a dataset is deleted, associated worksheet columns and data plots are also deleted.

A dataset is also a LabTalk Variable, so it can either belong to the project, or be declared as a local variable. The following types of datasets are supported in Origin:

Column or Matrix Object: A dataset that is associated with a column in a worksheet or a Matrix Object in a Matrixsheet.

Loose Dataset: A Dataset which is not attached to a worksheet or matrixsheet.

Temporary Dataset: A type of loose dataset used as temporary dataset storage. Origin creates temporary datasets with names starting with an underscore ("_") character, e.g., _temp. Origin temporary datasets are deleted when the project is saved or a graph page is printied/copied. User-created temporary (loose) datasets are deleted when a project is closed or a new project is opened.

To see the names of the existing datasets, use List s command.

Creating a Dataset

Datasets in worksheets and matrix layers, are automatically created when a column or matrix object is added. A column or a matrix object is automatically added when it is requested as an output, like when it is on the left hand side of an assignment, for example:

wks.ncols = 3; // set active sheet to be only 3 columns
col(4)={1,2,3}; // col(4) will be automatically added

There are three ways to create a loose dataset:

  • Declare it with [LabTalk:Dataset_(command)|dataset command]]
  • Use the Create (Command).
  • Auto create with assignment.

Declaring a Dataset and the Dataset -a option

If you declare a dataset without using the -a option, it will be created as Numeric(Double) only. To create a Text & Numeric dataset, add the option -a.

Examples

// Declared a numeric dataset
Dataset ds1 = {1,2,3};

// create a Text & Numeric dataset
dataset -a vv;
vv=data(1,10);
vv[2]$="Test";

// loose dataset mydata1 is created as result of assignment 
mydata1 = col(1); // Since col(1) is a dataset, so is mydata1

Getting and Setting Size

Before Origin 8, you had to use Set and Get command with a dataset name to set and get a dataset's size. In Origin 8 you can use the following two methods:

  • GetSize()
  • SetSize(int nSize)

which are part of the vectorbase OriginC class and inherited by a LabTalk dataset. In order to use these methods, you must have a variable of the type Range or type Dataset. For example:

Range a = 1; // col(1) of active sheet
for( int ii = 1 ; ii <= a.GetSize() ; ii++ )
{
   if( a[ii] < 0 )
     type "Row $(ii) is negative";
}

You cannot use these methods on a dataset name directly:

temp = col(a);
i = temp.GetSize(); // error, cannot do this
i =;// should produce --

Instead, declare a Dataset object first:

Dataset temp = col(a);
// If the dataset is too small
if( temp.GetSize() < 50 )
   // make it bigger
   temp.SetSize(366);

Accessing Datasets

Datasets in Worksheets and in Matrices can be accessed by the following means:

  1. Range Variables
  2. Col() or Mat() function and WCol() function
  3. Cell() function
  4. Substitution Notation
  5. Dataset name

Range Variables

Range variables can be used to access data in worksheets, graphs, and matrices.

For Worksheet Data

string strBk$ = Book2;
string strSh$ = Sheet3;
range rDa = [strBk$]strSh$!col(A);  // This is a data range
range rSh = [strBk$]strSh$!;        // This is a sheet range
rDa = {1:30}; // data generation from 1 to 30 increment 1
// Use properties of a sheet range/object
if(rSh.ncols < 10) 
   // adding more columns can be simply done like this
   rSh.nCols = 10; 
loop(ii,2,10)
{
// re-assign range programmatically 
range rDb = [strBk$]strSh$!wcol(ii); 
    rDb = normal(30);
    rDb += rDa;
}

For Plotted (Graph) Data

You can get various ranges from plotted data as well.

// Get range ( [Book]Sheet!col ) of third dataplot in active layer
range -w rngData = 3; 
// Gets the Page (Book) range of the data
range rngBook = rngData.GetPage()$; 
// Gets the Layer (Sheet) range of the data
range rngSheet = rngData.GetLayer()$;

For Matrix Data

Similarily, you can access Matrix object with a Range variable and matrix elements by [rowNumber, colNumber].

range ml=[MBook1]MSheet1!;   // This is a sheet range
range mm=[MBook1]MSheet1!1;  // This is a data range, 1st Matrix obj
// Access sheet property - number of columns
loop(i,1,ml.ncols)          
{
// Access sheet property - number of rows    
loop(j, 1, ml.nrows)       
    {
        mm[i,j] = sin(pi*i/180) - cos(pi*j/180);
    }
}

Besides, a matrix can be treated as a single-dimensional array using row major ordering, and you can access a matrix value by:

MatrixObject[N*(i - 1) + j]

where N is the number of columns in the matrix, and i and j is the row and column number. So for a 4 rows by 5 columns matrix, using [2, 3] and [8] returns the same cell. For example:

range mm = [MBook1]MSheet1!Mat(1);
if(mm[2, 3] == mm[8]) 
        ty "They are the same!";

By Col(), Mat(), and WCol() function

// Require an active sheet
// Here, the win -o command is used to temporarily switch books and set
string strBk$ = Book5;
string strSh$ = Sheet2;
win -o strBk$
{
   temp = page.active;       // remember the active sheet
   page.active$ = strSh$;
   // col( ) function can take a column number (1..n)
   col(1) = {1:30};          
   loop(ii,2,10)
   {
      // wcol( ) takes column number or an integer variable
      wcol(ii) = normal(30); 
      // col( ) function can also take a Short Name
      wcol(ii) += col(A);    
   }
   page.active = temp;     // Restore the originally active sheet
}                          
// Exit 'win -o', returning to previously active window

// For a matrix, the Mat( ) function serves as an analog to the Col( ) 
// function, and matrix objects are analogous to column numbers.

// Fill the second matrix object in the active matrixsheet
Mat(2) = Data(100,1024,.01);
// Set row 12, col 5 in first matrix object of active matrixsheet:
Mat(1)[12,5] = 125;

Cell Function

The Cell( ) function can be used with either a worksheet or a matrix. When referring to a non-active book, this notation will only support the bookname! prefix for the active sheet.

cell(5,2) = 50; // Sets row 5, column 2 cell in active window 
                //(worksheet or matrix) to 50
// You can use prefix for different active page, 
// but target is still the active sheet in that page

// Sets row 10, column 1 in active sheet of Book5 to 100
Book5!cell(10,1) = 100; 
// Function accepts variables
for(ii = 1 ; ii <= 32 ; ii++)
{
    for(jj = 1 ; jj <= 32 ; jj++)
    {
        MBook1!cell(ii,jj) = (ii-1) * 32 + jj;
    }
}

Substitution Notation

// This notation accepts book name and sheet name 
//to access data directly :
%([Book2]Sheet3,2,1) = 100;
// but does not resolve to active sheet of named book :
%([Book2],2,1) = 10; // THIS PRODUCES A COMMAND ERROR
// unless we just use the page name, as in :
%(Book2,2,1) = 10;   // Use active sheet of Book2
// or :
%(%H,2,1) = 1;       // Uses active sheet of active window
// Substitution works before execution, so you can do things like:
%([Book2]Sheet1,2,1) + %([Book2]Sheet2,2,1) + %([Book2]Sheet3,2,1)=;
val = %([Book2]Sheet1,2,1) + %([Book2]Sheet2,2,2) + %([Book2]Sheet3,2,3);
type "Cross-sheet sum \x3D $(val)";

For other Substitution Notation, please see Substitution Notation .

By Dataset Name

This is the oldest notation in LabTalk and should not be used unless you have to maintain old LabTalk code, or when you know the workbook has only one sheet or you are working with loose datasets.

// Dataset Name
// Using the formal name of a dataset :
// PageName_ColumnName[@SheetIndex]
// Where PageName and ColumnName are the Short Name 
// and SheetIndex is a creation order index
// (SheetIndex is not the order of the sheet tabs in the window)
Book1_B = Uniform(100);   // Can fill datasets
Book1_B@2 = Data(1,365);  // Same book and same column name, but on the
                          // second sheet
July_Week1@3[5]$ = Sell;  // Can access a particular row element
                          // (5th row of column 'Week1' on sheetindex 
                          // 3 in book 'July')
BSmith = 6;
ChemLab_Grade@3[BSmith] = 86; 
     // Can use a variable to access row 
     //(6th row of 'Grade' in third sheet of 'ChemLab')

Accessing Dataset Storage

Beginning with Origin 2018b, there is LabTalk access to dataset Storage. Utility is limited at this time but will be expanded upon in later versions.

Prefix a dataset name with

dataset.

For example:

dataset.Book1_B.info.=; // returns information about content in the info. area of the dataset
dataset.Book1_B.info.Add(Something); // adds storage object "Something" to the dataset
dataset.Book1_B.info.Remove(Something); // deletes storage object "Something" from the dataset

Masking Cells

Cells in worksheet columns and matrixsheets can be masked by setting

dataset<index> = 1

For example:

// Masking the cell 3 in column A
col(a)<3> = 1;
// Mask a cell in matrix
range rr = [mbook1]msheet1!mat(1);
rr<2> = 1;

// Unmask a cell
col(a)<3> = 0;

For matrix, you can also use <row, col> to specify a cell:

// Mask the 2nd row, 3rd column of current matrixsheet
%C<2,3> = 1;
// Mask the whole 2nd row of the current matrixsheet
%C<2,0> = 1;
// Mask the whole 3rd column of the current matrixsheet
%C<0,3> = 1;

To mask a specified range, please see the mark command.

Set Column Values with a Dataset

Minimum Origin Version Required: 9.0 SR0

You can generate a dataset of certain size with a formula and some before formula script, and later assign this dataset to a certain column. The syntax is:

dataset.setvalues("Formula", "Before formula script", "From", "To")

In the Formula you could call functions like in the F(x) menu of set column values dialog.

An example is shown as following:

newbook;
wks.addcol();
dataset ds1,ds2,ds3;//Define three dataset objects
//Set values for each object. 
ds1.setvalues("i+grnd()","","1","20");
ds2.setvalues("sin(i)","","1","20");
ds3.setvalues("col(1)*col(2)","col(1)=ds1;col(2)=ds2;","1","20");
//Use ds3 dataset as the value of column 3
col(3)=ds3;

Note that the setvalues method of a dataset does not support recalculate options. For that, see the csetvalue X-Function.

Using Loose Datasets in X-Functions

Loose datasets are more efficient to use in a LabTalk script since they do not have the overhead of a column in a worksheet. If used inside a scope, they are also cleaned up automatically when the scope ends. For example, declare a loose dataset missingvals, and then use the vfind X-Function to assign row-numbers of missing values in column 1 (of the active worksheet) to it:

// Declare a loose dataset
dataset  missingvals;
// Call vFind X-Function to find row numbers
// where missing values appear in col 1
vfind ix:=1 value:=NANUM ox:=missingvals;
// Access missingvals for results
for(int ii=1; ii<=missingvals.GetSize(); ii++)
   type $(missingvals[ii]);

Many X-Functions produce outputs in the form of a tree and some of the tree nodes can be vectors. To access such a vector, you need to copy it to a dataset. You may directly put the tree vector into a column, but if the vector is used in subsequent calculations, it is more efficent to use a loose dataset.