PCA

simbsig.decomposition.PCA.PCA(n_components=None, iterated_power=0, n_oversamples=None, centered=False, device='cpu', mode='arrays', n_jobs=0, batch_size=None, random_state=None, verbose=True, **kwargs)

Principal Component Analysis class. Implements Halko’s algorithm [1], with batched data loading for big datasets and optional GPU-accelerated computation.

Parameters
  • n_components – int, default=None Number of principal components to keep.

  • iterated_power – int, default=0 Number of power iterations (the i in Halko’s paper [1]).

  • n_oversamples – int, default=None The l in Halko’s paper [1]. If None, n_components+2 is used.

  • centered – bool, default=False Whether the features of the input data have been centered.

  • device – str, default='cpu' Which device to use for computations. Options supported are: ['cpu', 'gpu']

  • mode – str, default='arrays' Whether the input data is in memory (as lists, arrays or tensors) or on disk as hdf5 files. The latter should be favored for big datasets. Options supported are: ['arrays', 'hdf5']

  • n_jobs – int, default=0 Number of jobs active in the torch DataLoader (torch.utils.data.DataLoader).

  • batch_size – int, default=None Number of samples in each chunk of data that is processed at once. Should be tuned to the dataset when using device='gpu'. If batch_size=None, the entire dataset is loaded and processed at once, which may raise an error when using device='gpu'.

  • random_state – int, default=None Seed for the torch random number generator.

  • verbose – bool, default=True Whether to log information. If True, progress updates are produced.

[1] Halko, Nathan, et al. “An algorithm for the principal component analysis of large data sets.” SIAM Journal on Scientific Computing 33.5 (2011): 2580-2594.
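
The roles of n_components, iterated_power and n_oversamples may be easier to see in a small sketch of the randomized scheme described in [1]. The code below is an illustration in plain NumPy, not simbsig's implementation: the function name randomized_pca_sketch and the seed argument are hypothetical, it works on an in-memory array only, and it reads n_oversamples as the sketch width l (with l > n_components), which is consistent with the stated default of n_components+2.

```python
import numpy as np

def randomized_pca_sketch(A, n_components, iterated_power=0,
                          n_oversamples=None, seed=0):
    """Illustrative randomized PCA (hypothetical helper, not simbsig code)."""
    rng = np.random.default_rng(seed)
    k = n_components
    # Assumption: n_oversamples plays the role of l, the sketch width.
    l = k + 2 if n_oversamples is None else n_oversamples

    A = A - A.mean(axis=0)                    # center the features
    G = rng.standard_normal((A.shape[1], l))  # random Gaussian test matrix

    # Orthonormal basis Q for the range of (A A^T)^i A G,
    # re-orthonormalized after each product for numerical stability.
    Q, _ = np.linalg.qr(A @ G)
    for _ in range(iterated_power):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Small SVD of the projected matrix B = Q^T A (shape l x n_features).
    B = Q.T @ A
    _, S, Vt = np.linalg.svd(B, full_matrices=False)

    # Keep the leading k right singular vectors as principal components.
    return Vt[:k], S[:k]
```

Larger values of iterated_power generally improve accuracy when the singular values decay slowly, at the cost of extra passes over the data.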

simbsig.decomposition.PCA.PCA.fit(self, X, y=None)

Performs principal component decomposition using Halko’s method [1].

Parameters
  • X – array-like or h5py file handle. Training data of shape (n_samples, n_features) or (n_samples, n_samples) if metric='precomputed'

  • y – Ignored. Only present by convention.

Returns
  • self – PCA. The PCA object with computed principal components.
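
Batched loading works with this kind of decomposition because the large matrix products involved can be assembled from row blocks. The sketch below illustrates that idea in NumPy; it is not simbsig's code, and the helper names batched_rows, batched_sketch and batched_gram_product are hypothetical. Only slicing and .shape are used on X, which is why the same pattern could apply to an on-disk dataset.

```python
import numpy as np

def batched_rows(X, batch_size):
    """Yield consecutive row blocks of X (hypothetical helper)."""
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size]

def batched_sketch(X, G, batch_size):
    """Compute Y = X @ G one row block at a time."""
    blocks = [np.asarray(block) @ G for block in batched_rows(X, batch_size)]
    return np.vstack(blocks)

def batched_gram_product(X, Q, batch_size):
    """Compute B = Q.T @ X by summing the contribution of each row block."""
    B = np.zeros((Q.shape[1], X.shape[1]))
    start = 0
    for block in batched_rows(X, batch_size):
        block = np.asarray(block)
        stop = start + block.shape[0]
        B += Q[start:stop].T @ block  # rows of Q matching this block
        start = stop
    return B
```

Only one block of X needs to be in memory at a time, so the batch size can be chosen to fit the available device memory.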

simbsig.decomposition.PCA.PCA.transform(self, X, centered=False)

Projects data with the same number of features as the training data onto the n_components principal components computed during fit.

Parameters
  • X – array-like or h5py file handle. Data to transform, of shape (n_samples, n_features)

  • centered – bool, default=False Whether the features of the input data have been centered.

Returns
  • X_transformed – torch.tensor. The transformed data.
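
As a rough illustration of what such a projection involves, the NumPy sketch below maps data onto a set of component vectors. It is not simbsig's code: the function project and its components and mean arguments are hypothetical, and it assumes that centered=False means the training mean still has to be subtracted before projecting.

```python
import numpy as np

def project(X, components, mean, centered=False):
    """Project X onto principal components (hypothetical helper).

    components: array of shape (n_components, n_features)
    mean: per-feature mean of the training data, shape (n_features,)
    """
    X = np.asarray(X, dtype=float)
    if not centered:
        # Assumption: uncentered input is shifted by the training mean.
        X = X - mean
    # Result has shape (n_samples, n_components).
    return X @ components.T
```

For example, with a single component along the first feature axis, `project([[1, 2], [3, 4]], np.array([[1.0, 0.0]]), np.array([2.0, 3.0]))` yields a (2, 1) array holding the centered first coordinate of each sample.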

simbsig.decomposition.PCA.PCA.fit_transform(self, X)

Performs fit (principal component decomposition using Halko’s method) followed by transform (projection of the same data onto the n_components principal components computed during fit) on the data X.

Parameters

  • X – array-like or h5py file handle. Training data of shape (n_samples, n_features)

Returns
  • X_transformed – torch.tensor. The transformed data.