NearestNeighbors

simbsig.neighbors.NearestNeighbors.NearestNeighbors(n_neighbors=5, radius=None, metric='euclidean', p=2, metric_params=None, feature_weights=None, device='cpu', mode='arrays', n_jobs=0, batch_size=None, verbose=True, **kwargs)

Unsupervised learner for performing neighbor searches. Implements batched data loading for big datasets and optional GPU-accelerated computations.

Parameters

  • n_neighbors – int, default=5 Number of neighbors to search for during kneighbors queries.

  • radius – float, default=None Radius of the neighborhood searched by radius_neighbors() queries.

  • metric – str or callable, default='euclidean' The distance metric used to quantify similarity between objects. Available metrics include ['euclidean', 'manhattan', 'minkowski', 'fractional', 'cosine', 'mahalanobis']. When metric='precomputed', X must be a distance matrix, which must be square during fit.

  • p – int, default=2 Parameter used when metric='minkowski'. If p=1 or p=2, this is equivalent to using metric='manhattan' (L1) or metric='euclidean' (L2), respectively. For any other p, the Minkowski distance (L_p) is used.

  • metric_params – dict, default=None Additional metric-specific keyword arguments.

  • feature_weights – np.array of floats, default=None Vector of user-defined weights, one per feature. Its length must equal the number of features, n_features_in. If feature_weights=None, uniform weights are applied.

  • device – str, default='cpu' Which device to use for distance computations. Supported options are: ['cpu', 'gpu'].

  • mode – str, default='arrays' Whether the input data is in memory (as lists, arrays or tensors) or on disk as hdf5 files. The latter should be favored for big datasets. Supported options are: ['arrays', 'hdf5'].

  • n_jobs – int, default=0 Number of jobs active in the PyTorch DataLoader (torch.utils.data.DataLoader).

  • batch_size – int, default=None Size of the data chunks that are processed at once for distance computations. Should be tuned to the dataset when using device='gpu'. If batch_size=None, the entire dataset is loaded and processed at once, which may raise an error when using device='gpu'.

  • verbose – bool, default=True Whether to log progress. If True, progress updates are produced.
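The note on p above can be checked with a small standalone sketch. It uses only the Python standard library and does not call simbsig; the helper name minkowski is hypothetical and is meant only to illustrate the L_p formula, not how the library computes distances.

```python
import math

def minkowski(u, v, p=2):
    """Hypothetical helper: Minkowski (L_p) distance between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

u, v = [0.0, 0.0], [3.0, 4.0]

# Reference values computed directly from the L1 and L2 definitions.
manhattan = sum(abs(a - b) for a, b in zip(u, v))
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d1 = minkowski(u, v, p=1)  # coincides with the Manhattan (L1) distance
d2 = minkowski(u, v, p=2)  # coincides with the Euclidean (L2) distance
d3 = minkowski(u, v, p=3)  # a general L_p distance
```

For this pair of points the distance shrinks as p grows, which is one reason the choice of p can change which neighbors are returned.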

simbsig.neighbors.NearestNeighbors.NearestNeighbors.fit(self, X, y=None)

Fit the nearest neighbors estimator from the training dataset.

Parameters

  • X – array-like or h5py file handle. Training data of shape (n_samples, n_features), or (n_samples, n_samples) if metric='precomputed'.

  • y – Ignored. Present only by convention.

Returns

self – NearestNeighbors The fitted nearest neighbors estimator.

simbsig.neighbors.NearestNeighbors.NearestNeighbors.kneighbors(self, X=None, n_neighbors=None, return_distance=True, sort_results=False)

Find the K nearest neighbors of a point, with K=n_neighbors. Returns the indices of the K neighbors and, optionally, the corresponding distances.

Parameters

  • X – array-like or h5py file handle of shape (n_queries, n_features), or (n_queries, n_indexed) if metric='precomputed', default=None The query point or points. If not provided, the neighbors of each indexed point are returned, excluding the point itself.

  • n_neighbors – int, default=None Number of neighbors to search for. By default, the value passed to the constructor is used.

  • return_distance – bool, default=True Whether to return the distances between the query point and its neighbors.

  • sort_results – bool, default=False Whether to sort the nearest neighbors by increasing distance to the query point. Note that if return_distance=False and sort_results=True, an error is raised.

Returns

neigh_ind – ndarray of shape (n_queries, n_neighbors) Indices of the nearest neighbors in the population matrix.

neigh_dist – ndarray of shape (n_queries, n_neighbors) Distances to the neighbors. Only present if return_distance=True.
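As an illustration of what a kneighbors query computes, here is a minimal brute-force sketch in plain Python. It does not use simbsig: the function brute_force_kneighbors is hypothetical, it assumes the Euclidean metric, and the order of the returned tuple is this sketch's own choice rather than something taken from the documentation above.

```python
import math

def brute_force_kneighbors(train, queries, n_neighbors=5, return_distance=True):
    """Hypothetical helper: exhaustive k-nearest-neighbor search (Euclidean)."""
    neigh_dist, neigh_ind = [], []
    for q in queries:
        # Distance from this query to every indexed (training) point.
        dists = [math.dist(q, x) for x in train]
        # Indices of the k smallest distances, closest first.
        order = sorted(range(len(train)), key=dists.__getitem__)[:n_neighbors]
        neigh_ind.append(order)
        neigh_dist.append([dists[i] for i in order])
    # Tuple order is a choice of this sketch, not the library's contract.
    return (neigh_dist, neigh_ind) if return_distance else neigh_ind

train = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
queries = [[0.1, 0.0], [4.0, 4.0]]

knn_dist, knn_ind = brute_force_kneighbors(train, queries, n_neighbors=2)
only_ind = brute_force_kneighbors(train, queries, n_neighbors=2, return_distance=False)
```

Both results have one row per query and n_neighbors entries per row, mirroring the (n_queries, n_neighbors) shapes described above.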

simbsig.neighbors.NearestNeighbors.NearestNeighbors.radius_neighbors(self, X=None, radius=None, return_distance=True, sort_results=False)

Find the neighbors within a given radius of a point or points. Returns the indices and, optionally, the corresponding distances of the neighbors lying inside or on the boundary of a ball of size radius around each point of the query array. Note that the returned points are not necessarily sorted by distance to their query point.

Parameters

  • X – array-like or h5py file handle of shape (n_samples, n_features), default=None The query point or points. If not provided, the neighbors of each indexed point are returned, excluding the point itself.

  • radius – float, default=None Radius of the neighborhood in which the search is performed. By default, the value passed to the constructor is used.

  • return_distance – bool, default=True Whether to return the distances between the query point and its neighbors.

  • sort_results – bool, default=False Whether to sort the nearest neighbors by increasing distance to the query point. Note that if return_distance=False and sort_results=True, an error is raised.

Returns

neigh_dist – ndarray of shape (n_samples,) of arrays Distances to the neighbors of each query point. Only present if return_distance=True.

neigh_ind – ndarray of shape (n_samples,) of arrays Indices of the approximate nearest points that lie within or on the boundary of a ball of size radius around the query points.

Results from different points may not collect the same number of neighbors and therefore may not fit in a standard array. To overcome this problem efficiently, radius_neighbors returns an array containing 1D arrays of indices or distances.
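The ragged shape of these results can be seen in a small brute-force sketch. It is plain Python and independent of simbsig: brute_force_radius_neighbors is a hypothetical helper, it assumes the Euclidean metric, and it uses nested lists where the library is documented to return an array of 1D arrays.

```python
import math

def brute_force_radius_neighbors(train, queries, radius):
    """Hypothetical helper: all indexed points within `radius` of each query."""
    neigh_dist, neigh_ind = [], []
    for q in queries:
        # Keep (distance, index) pairs inside or on the boundary of the ball.
        hits = [(math.dist(q, x), i) for i, x in enumerate(train)]
        hits = [(d, i) for d, i in hits if d <= radius]
        neigh_dist.append([d for d, _ in hits])
        neigh_ind.append([i for _, i in hits])
    return neigh_dist, neigh_ind

train = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
queries = [[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]]

rad_dist, rad_ind = brute_force_radius_neighbors(train, queries, radius=2.0)
# Number of neighbors found per query; the rows have different lengths.
counts = [len(row) for row in rad_ind]
```

Here the three queries collect three, one and zero neighbors, so the result cannot be stored as a rectangular (n_samples, k) array, and one entry per query is returned instead.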