
DataTable

DataTable is a C++ class template in the txeo namespace that organizes datasets for machine learning workflows. It supports splitting data into training, evaluation, and test sets.

Overview

DataTable<T> is a data container for supervised learning scenarios. Feature and label columns are selected by index, and the evaluation and test splits are optional.


Template Parameters

  • T: Numeric type (e.g., float, double) used in the underlying Matrix<T>.

Constructors

DataTable with X/Y columns

DataTable(const Matrix<T>& data, std::vector<size_t> x_cols, std::vector<size_t> y_cols);

Splits the columns of data into features (x_cols) and labels (y_cols).

DataTable with only Y columns (auto-infer X columns)

DataTable(const Matrix<T>& data, std::vector<size_t> y_cols);

All columns not in y_cols are considered feature columns.
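
The inference rule above can be pictured as a set complement over column indices. The following is a minimal sketch, not the library's implementation: infer_feature_columns is a hypothetical helper, and the ascending order of the result is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of txeo: returns every column index in
// [0, n_cols) that is not listed in y_cols, in ascending order (assumed).
std::vector<std::size_t> infer_feature_columns(std::size_t n_cols,
                                               const std::vector<std::size_t>& y_cols) {
  // Precondition for this sketch: every label index refers to an existing column.
  assert(std::all_of(y_cols.begin(), y_cols.end(),
                     [n_cols](std::size_t col) { return col < n_cols; }));

  std::vector<std::size_t> x_cols;
  for (std::size_t col = 0; col < n_cols; ++col) {
    // Keep the column only if it is not a label column.
    if (std::find(y_cols.begin(), y_cols.end(), col) == y_cols.end()) {
      x_cols.push_back(col);
    }
  }
  return x_cols;
}
```

For a four-column matrix with y_cols = {3}, this sketch yields feature columns 0, 1, and 2.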

DataTable with evaluation split

DataTable(const Matrix<T>& data, std::vector<size_t> x_cols, std::vector<size_t> y_cols,
          size_t eval_percent);

DataTable(const Matrix<T>& data, std::vector<size_t> y_cols, size_t eval_percent);

Reserves eval_percent percent of the rows for evaluation.

DataTable with evaluation and test splits

DataTable(const Matrix<T>& data, std::vector<size_t> x_cols, std::vector<size_t> y_cols,
          size_t eval_percent, size_t eval_test);

DataTable(const Matrix<T>& data, std::vector<size_t> y_cols, size_t eval_percent,
          size_t eval_test);

Splits the dataset into training, evaluation, and test sets.
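
To make the percentage arithmetic concrete, here is a rough sketch of how row counts could be derived from two percentages. It rests on assumptions that this page does not confirm: that the second percentage (eval_test in the signatures) is the share reserved for testing, that row counts are rounded down, and that all remaining rows go to training. SplitSizes and split_sizes are hypothetical names, not txeo API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration, not part of txeo.
struct SplitSizes {
  std::size_t train;
  std::size_t eval;
  std::size_t test;
};

// Assumes floor rounding and that leftover rows are assigned to training.
SplitSizes split_sizes(std::size_t rows, std::size_t eval_percent,
                       std::size_t test_percent) {
  // This sketch requires that some share of the data remains for training.
  assert(eval_percent + test_percent < 100);

  const std::size_t eval = rows * eval_percent / 100;  // integer division rounds down
  const std::size_t test = rows * test_percent / 100;
  return SplitSizes{rows - eval - test, eval, test};
}
```

Under these assumptions, 2 rows with a 50% evaluation share give 1 training row and 1 evaluation row, which matches the example at the end of this page.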

DataTable with explicit splits

DataTable(const Matrix<T>& x_train, const Matrix<T>& y_train,
          const Matrix<T>& x_eval, const Matrix<T>& y_eval,
          const Matrix<T>& x_test, const Matrix<T>& y_test);

DataTable(const Matrix<T>& x_train, const Matrix<T>& y_train,
          const Matrix<T>& x_eval, const Matrix<T>& y_eval);

DataTable(const Matrix<T>& x_train, const Matrix<T>& y_train);

Uses pre-split matrices directly. If rvalues are passed, the copy is avoided.


Accessors

Training Data

const Matrix<T>& x_train() const;
const Matrix<T>& y_train() const;

Evaluation Data

const Matrix<T>* x_eval() const;
const Matrix<T>* y_eval() const;

Returns nullptr if no evaluation split was set.

Test Data

const Matrix<T>* x_test() const;
const Matrix<T>* y_test() const;

Returns nullptr if no test split was set.
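
Because the evaluation and test accessors return pointers that may be null, callers should check them before dereferencing. The sketch below shows that pattern with stand-in types: ToyMatrix, ToyTable, and eval_rows_or_zero are hypothetical and only mimic the documented contract. They are not the txeo classes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in types for illustration only; not txeo::Matrix or txeo::DataTable.
struct ToyMatrix {
  std::vector<std::vector<float>> values;
  std::size_t rows() const { return values.size(); }
};

class ToyTable {
 public:
  explicit ToyTable(ToyMatrix x_train)
      : x_train_(std::move(x_train)), has_eval_(false) {}
  ToyTable(ToyMatrix x_train, ToyMatrix x_eval)
      : x_train_(std::move(x_train)), x_eval_(std::move(x_eval)), has_eval_(true) {}

  const ToyMatrix& x_train() const { return x_train_; }

  // Mirrors the documented contract: nullptr when no evaluation split exists.
  const ToyMatrix* x_eval() const { return has_eval_ ? &x_eval_ : nullptr; }

 private:
  ToyMatrix x_train_;
  ToyMatrix x_eval_;
  bool has_eval_;
};

// Caller-side pattern: test the pointer before dereferencing it.
std::size_t eval_rows_or_zero(const ToyTable& table) {
  const ToyMatrix* eval = table.x_eval();
  return eval != nullptr ? eval->rows() : 0;
}
```

The training accessors return references and need no such check, since a training set is always present.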


Metadata

Input/Output Dimensions

size_t x_dim() const;
size_t y_dim() const;

Row Count

size_t row_size() const;

Number of rows in the training set.


Exceptions

txeo::DataTableError

Thrown when invalid inputs or split percentages are provided.
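
Construction can fail, so it is reasonable to wrap it in a try/catch. The following sketch illustrates the handling pattern with a stand-in exception. ToyDataTableError, check_split_percentages, and percentages_are_valid are hypothetical. The base class of the real txeo::DataTableError and the exact conditions it checks are not stated on this page, so the rule used here (percentages must sum to less than 100) is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>

// Stand-in exception for illustration; the real txeo::DataTableError
// may derive from a different base class.
class ToyDataTableError : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

// Hypothetical check (assumed rule): reject percentages that leave
// no share of the data for training.
void check_split_percentages(std::size_t eval_percent, std::size_t test_percent) {
  if (eval_percent + test_percent >= 100) {
    throw ToyDataTableError("eval and test percentages must sum to less than 100");
  }
}

// Returns true when the percentages pass the check, false when the error is thrown.
bool percentages_are_valid(std::size_t eval_percent, std::size_t test_percent) {
  try {
    check_split_percentages(eval_percent, test_percent);
    return true;
  } catch (const ToyDataTableError&) {
    // Real code would typically log the message or fall back to a default split.
    return false;
  }
}
```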


Example Usage

txeo::Matrix<float> data = {{1, 2, 3, 4}, {5, 6, 7, 8}};
txeo::DataTable<float> dt(data, {3}, 50); // column 3 holds the labels; 50% of the rows go to evaluation

assert(dt.x_train().rows() == 1);
assert(dt.x_eval()->rows() == 1);

For a detailed API reference, see the documentation of the individual methods of txeo::DataTable.