PCA

simbsig.decomposition.PCA.PCA(n_components=None, iterated_power=0, n_oversamples=None, centered=False, device='cpu', mode='arrays', n_jobs=0, batch_size=None, random_state=None, verbose=True, **kwargs)

Principal Component Analysis class. Implements Halko’s algorithm [1], with batched data loading for large datasets and optional GPU-accelerated computation.

Parameters

n_components – int, default=None Number of principal components to keep.
iterated_power – int, default=0 Number of power iterations (the i in Halko’s paper).
n_oversamples – int, default=None The l in Halko’s paper. If None, n_components+2 is used.
centered – bool, default=False Whether the features of the input data have been centered.
device – str, default=’cpu’ Which device to use for computations. Options supported are: [‘cpu’,’gpu’]
mode – str, default=’arrays’ Whether the input data is in memory (as lists, arrays or tensors) or on disk as hdf5 files. The latter should be favored for big datasets. Options supported are: [‘arrays’,’hdf5’]
n_jobs – int, default=0 Number of jobs active in the torch DataLoader (torch.utils.data.DataLoader).
batch_size – int, default=None Number of samples in each chunk of data that is processed at once. Should be tuned to the dataset when using device=’gpu’. If batch_size=None, the entire dataset is loaded and processed at once, which may raise an error when using device=’gpu’.
random_state – int, default=None The random state for the seed of torch.
verbose – bool, default=True Logging information. If True, progression updates are produced.

[1] Halko, Nathan, et al. “An algorithm for the principal component analysis of large data sets.” SIAM Journal on Scientific Computing 33.5 (2011): 2580-2594.
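The roles of n_components, n_oversamples and iterated_power can be illustrated with a small NumPy sketch of a randomized PCA in the spirit of [1]. This is not the library’s implementation: the function name randomized_pca, the seed argument, and the choice to center the data inside the function are assumptions made for illustration only.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversamples=2, iterated_power=0, seed=0):
    """Illustrative randomized PCA; X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                      # center the features (assumption)
    sketch_size = n_components + n_oversamples  # a few extra directions improve accuracy
    omega = rng.standard_normal((X.shape[1], sketch_size))
    Q, _ = np.linalg.qr(X @ omega)              # orthonormal basis for the sampled range of X
    for _ in range(iterated_power):
        # Each power iteration pulls the basis toward the leading singular directions.
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)
    # Exact SVD of the small projected matrix Q^T X.
    _, singular_values, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return Vt[:n_components], singular_values[:n_components]
```

The rows of the first returned array are the estimated principal axes. Larger values of n_oversamples and iterated_power trade extra computation for a closer match to an exact SVD.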

simbsig.decomposition.PCA.PCA.fit(self, X, y=None)

Performs principal component decomposition using Halko’s method.

Parameters

X – array-like or h5py file handle. Training data of shape (n_samples, n_features) or (n_samples, n_samples) if metric=’precomputed’.
y – Ignored. Only present by convention.

Returns

Return self: PCA The PCA object with computed principal components
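To show why a batch size matters for large inputs, the following sketch computes the random projection and the feature means of X one chunk of rows at a time, so that only batch_size rows are held in memory at once. It is a hypothetical illustration of batched processing, not the code fit actually runs; the helper name batched_sketch and the matrix omega are assumptions.

```python
import numpy as np


def batched_sketch(X, omega, batch_size):
    """Compute X @ omega and the column means of X one chunk of rows at a time."""
    n_samples = X.shape[0]
    Y = np.empty((n_samples, omega.shape[1]))
    column_sum = np.zeros(X.shape[1])
    for start in range(0, n_samples, batch_size):
        # Row slicing works on NumPy arrays and on h5py datasets alike.
        chunk = np.asarray(X[start:start + batch_size])
        Y[start:start + len(chunk)] = chunk @ omega
        column_sum += chunk.sum(axis=0)
    return Y, column_sum / n_samples
```

Because each chunk is processed independently, the same loop could move every chunk to a GPU before the matrix product, which is where a well-chosen batch size avoids exhausting device memory.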

simbsig.decomposition.PCA.PCA.transform(self, X, centered=False)

Projects data with the same number of features as the training data onto the n_components principal components computed during fit.

Parameters

X – array-like or h5py file handle. Data to transform, of shape (n_samples, n_features)
centered – bool, default=False Whether the features of the input data have been centered.

Returns

Return X_transformed: torch.tensor The transformed data.
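The projection itself can be sketched in a few lines of NumPy. The sketch assumes that centered=False means the feature means learned from the training data still have to be subtracted, while centered=True means the caller has already done so; the names project, components and mean are hypothetical and not part of the library.

```python
import numpy as np


def project(X, components, mean, centered=False):
    """Project X of shape (n_samples, n_features) onto the rows of `components`."""
    if not centered:
        X = X - mean  # subtract the training feature means (assumed behaviour)
    return X @ components.T
```

Under this assumption, passing already-centered data with centered=True gives the same result as passing the raw data with centered=False.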

simbsig.decomposition.PCA.PCA.fit_transform(self, X)

Performs fit (principal component decomposition using Halko’s method) followed by transform (projection of the same data onto the n_components principal components computed during fit) on the data X.

Parameters

X – array-like or h5py file handle. Training data of shape (n_samples, n_features)

Returns

Return X_transformed: torch.tensor The transformed data.