Advanced
Custom Metric
SIMBSIG allows custom distance metrics to be used during similarity searches.
General Interface
These distance metrics should follow this interface:
def custom_metric(x1, x2, feature_weights=None, metric_params=None):
    """Generic pairwise distance function
    Parameters:
    :parameter x1: torch.tensor of dimension (n_samples, n_features)
    :parameter x2: torch.tensor of dimension (m_samples, n_features)
    :parameter feature_weights: torch.tensor of dimension (n_features,)
    :parameter metric_params: can be any parameter which is handed over to the SIMBSIG neighbors module
        as metric_params
    Returns:
    :return dist_mat: numpy.array of dimension (n_samples, m_samples)
    Notice that n_samples does not have to be equal to m_samples. However, n_features has to match between x1 and x2.
    If a GPU is available and a SIMBSIG neighbors module (NearestNeighbors, KNeighborsClassifier, KNeighborsRegressor,
    RadiusNeighborsClassifier, RadiusNeighborsRegressor) is instantiated with device=='gpu', x1 and x2 will
    be handed over to custom_metric on the GPU.
    """
    # 1. Compute pairwise distances between points in x1 and x2 using torch.tensor operations for GPU acceleration.
    # If the GPU speedup is not required, x1 and x2 can instead be moved off the GPU and processed with,
    # for example, np.array operations. Optionally, feature weights can be used.
    # dist_mat = some_operations(x1, x2)
    # 2. Move the result off the GPU and convert it to numpy.array
    # dist_mat = dist_mat.cpu().numpy()
    # 3. Return the dist_mat
    # return dist_mat
    pass
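To make the template concrete, here is a minimal sketch of a metric that fills in the three steps and uses feature_weights. The function name weighted_manhattan_metric and the weighting scheme are hypothetical, chosen only for illustration. The function is called directly on small CPU tensors, so the sketch makes no assumption about how SIMBSIG itself passes the arguments.

```python
import torch


def weighted_manhattan_metric(x1, x2, feature_weights=None, metric_params=None):
    """Hypothetical weighted Manhattan distance following the interface above."""
    if feature_weights is not None:
        # Scale every feature; for non-negative weights this yields
        # sum_i w_i * |x1_i - x2_i| once the L1 distance is taken.
        x1 = x1 * feature_weights
        x2 = x2 * feature_weights
    # 1. Pairwise L1 distances, computed on whichever device the inputs live on
    dist_mat = torch.cdist(x1, x2, p=1)
    # 2. Move the result to the CPU and convert it to numpy.array
    # 3. Return the distance matrix
    return dist_mat.cpu().numpy()


x1 = torch.tensor([[0.0, 0.0], [1.0, 1.0]])
x2 = torch.tensor([[1.0, 2.0]])
weights = torch.tensor([1.0, 0.5])
dist_mat = weighted_manhattan_metric(x1, x2, feature_weights=weights)
print(dist_mat.shape)  # (2, 1): one row per point in x1, one column per point in x2
```

Checking the output shape on tiny inputs like this is a cheap way to confirm that a new metric returns an (n_samples, m_samples) matrix before using it in a search.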
Example
As an example, we show how 1 minus the RBF kernel similarity can be used as a custom distance metric for kernelised similarity searches. To use it in, for example, NearestNeighbors, the constructor call should pass sigma as a key in the dictionary given to metric_params:
nn_simbsig = NearestNeighbors(n_neighbors=n_neighbors, metric=custom_rbf_metric, metric_params={'sigma':2})
With the following example custom metric:
import torch

def custom_rbf_metric(x1, x2, p=None, feature_weights=None, sigma=None):
    """Example pairwise distance function
    Parameters:
    :parameter x1: torch.tensor of dimension (n_samples, n_features)
    :parameter x2: torch.tensor of dimension (m_samples, n_features)
    :parameter feature_weights: torch.tensor of dimension (n_features,)
    :parameter sigma: passed as metric_params={'sigma':int} in the constructor. Any custom parameter name may be
        chosen.
    Returns:
    :return dist_mat: numpy.array of dimension (n_samples, m_samples)
    Notice that n_samples does not have to be equal to m_samples. However, n_features has to match between x1 and x2.
    If a GPU is available and a SIMBSIG neighbors module (NearestNeighbors, KNeighborsClassifier, KNeighborsRegressor,
    RadiusNeighborsClassifier, RadiusNeighborsRegressor) is instantiated with device=='gpu', x1 and x2 will
    be handed over to custom_rbf_metric on the GPU.
    """
    # 1. Compute pairwise distances between points in x1 and x2 using torch.tensor operations for GPU acceleration.
    # If the GPU speedup is not required, x1 and x2 can instead be moved off the GPU and processed with,
    # for example, np.array operations. Optionally, feature weights can be used.
    # First step: compute pairwise squared Euclidean distances
    sq_euclidean_dist_mat = torch.pow(torch.cdist(x1, x2, 2), 2)
    # Second step: exp(-squared_euclidean_distance / sigma)
    rbf_pairwise = torch.exp(-sq_euclidean_dist_mat / sigma)
    # Third step: turn the similarity into a distance
    dist_mat = 1 - rbf_pairwise
    # 2. Move the result off the GPU and convert it to numpy.array
    dist_mat = dist_mat.cpu().numpy()
    # 3. Return the dist_mat
    return dist_mat
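The comments in the example mention that the tensors can be moved off the GPU and processed with np.array operations when GPU acceleration is not needed. A minimal sketch of that variant follows. The name custom_rbf_metric_numpy is hypothetical, and the parameter list simply mirrors the example above. The function is called directly rather than through a SIMBSIG class.

```python
import numpy as np
import torch


def custom_rbf_metric_numpy(x1, x2, p=None, feature_weights=None, sigma=None):
    """Hypothetical NumPy variant of the 1 - RBF kernel distance."""
    # Move the inputs to the CPU (a no-op if they are already there) and convert to NumPy
    a = x1.detach().cpu().numpy()
    b = x2.detach().cpu().numpy()
    # Squared Euclidean distances via broadcasting:
    # (n_samples, 1, n_features) - (1, m_samples, n_features)
    diff = a[:, None, :] - b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # 1 - exp(-squared_distance / sigma), already a numpy.array
    return 1.0 - np.exp(-sq_dist / sigma)


x1 = torch.tensor([[0.0, 0.0], [1.0, 0.0]])
x2 = torch.tensor([[0.0, 0.0]])
dist_mat = custom_rbf_metric_numpy(x1, x2, sigma=2)
print(dist_mat.shape)  # (2, 1)
```

The broadcasting step materialises an intermediate array of shape (n_samples, m_samples, n_features), so this variant is best suited to small inputs. Identical points get distance 0, and the distance approaches 1 as points move apart.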