DriftLens

class driftlens.driftlens.DriftLens(label_list=None)

Bases: object

Main DriftLens class. It holds the baseline and the threshold used to detect drift in windows of embeddings.

baseline

BaselineClass object.

Type: BaselineClass

threshold

ThresholdClass object.

Type: ThresholdClass

label_list

List of class labels.

Type: list(str)

batch_n_pc

Number of principal components used for the per-batch distribution.

Type: int

per_label_n_pc

Number of principal components used for the per-label distributions.

Type: int

baseline_algorithms

Dictionary of the available baseline estimation algorithms.

Type: dict

threshold_estimators

Dictionary of the available threshold estimators.

Type: dict

KFold_threshold_estimation(label_list: list[int], E: ndarray, Y: ndarray, batch_n_pc: int, per_label_n_pc: int, window_size: int, flag_shuffle: bool = True)

Estimates the threshold using the k-fold algorithm (preliminary version of DriftLens).

Parameters:

label_list (list(int)) – List of class label ids used to train the model.
E (numpy.ndarray) – Embedding matrix of shape (m, d), where m is the number of samples and d the embedding dimensionality.
Y (numpy.ndarray) – Vector of predicted labels of shape (m, 1), where m is the number of samples.
batch_n_pc (int) – Number of principal components used for the per-batch distribution.
per_label_n_pc (int) – Number of principal components used for the per-label distributions.
window_size (int) – Size of the windows used for the threshold estimation.
flag_shuffle (bool, optional) – Whether to shuffle the samples before estimating the threshold. Default is True.

Returns:

The estimated threshold.

Return type:

numpy.ndarray

static _compute_bhattacharyya_distribution_distances(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) → dict

Computes the Bhattacharyya distribution distance per-batch and per-label.

Parameters:

label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).

Returns:

A dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances computed for the passed window with respect to the baseline.
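How DriftLens computes this distance internally is not documented here. As a hedged illustration, if both the baseline and the window are modeled as multivariate Gaussians (an assumption of this sketch), the Bhattacharyya distance has a closed form: one eighth of the squared Mahalanobis distance between the means under the averaged covariance, plus a log-determinant term. The helper name bhattacharyya_gaussian is hypothetical.

```python
import numpy as np


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2).

    Hypothetical helper: assumes both distributions are Gaussian.
    """
    cov = (cov1 + cov2) / 2.0  # averaged covariance
    diff = mu1 - mu2
    # Mean term: (1/8) * diff^T cov^-1 diff, solved without an explicit inverse
    mean_term = diff @ np.linalg.solve(cov, diff) / 8.0
    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2))),
    # computed with log-determinants for numerical stability
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(mean_term + cov_term)


# Usage: compare synthetic "baseline" and "window" embeddings (3 dimensions)
rng = np.random.default_rng(0)
baseline_E = rng.normal(size=(500, 3))
window_E = rng.normal(loc=0.5, size=(200, 3))
distance = bhattacharyya_gaussian(
    baseline_E.mean(axis=0), np.cov(baseline_E, rowvar=False),
    window_E.mean(axis=0), np.cov(window_E, rowvar=False),
)
```

Identical distributions give a distance of 0; the shifted window above gives a positive value.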

static _compute_frechet_distribution_distances(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) → dict

Computes the Fréchet drift distance (FDD) per-batch and per-label.

Parameters:

label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).

Returns:

A dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances computed for the passed window with respect to the baseline.
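The Fréchet distance between two Gaussians N(μ1, Σ1) and N(μ2, Σ2) is ||μ1 − μ2||^2 + Tr(Σ1 + Σ2 − 2(Σ1 Σ2)^(1/2)), the same formula used by the Fréchet Inception Distance. The sketch below computes it with NumPy only; that DriftLens applies exactly this formula to its baseline and window statistics is an assumption here, and frechet_distance is a hypothetical helper name.

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """Fréchet distance between N(mu1, cov1) and N(mu2, cov2) (FID formula)."""
    diff = mu1 - mu2
    # Tr((cov1 @ cov2)^(1/2)) is the sum of the square roots of the eigenvalues
    # of cov1 @ cov2; for PSD inputs they are real and non-negative, so tiny
    # imaginary or negative parts from floating-point error are discarded.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)


# Usage: in 1-D the formula reduces to (mu1 - mu2)^2 + (sigma1 - sigma2)^2
d = frechet_distance(np.array([0.0]), np.array([[1.0]]),
                     np.array([3.0]), np.array([[4.0]]))
# d is 10.0, i.e. 3**2 + (1 - 2)**2
```

Using the eigenvalues of Σ1 Σ2 avoids a dependency on a matrix square-root routine while giving the same trace.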

static _compute_jensen_shannon_distribution_divergences(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) → dict

Computes the Jensen-Shannon distribution divergence per-batch and per-label.

Parameters:

label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).

Returns:

A dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution divergences computed for the passed window with respect to the baseline.
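For reference, the Jensen-Shannon divergence between two discrete distributions P and Q is JS(P, Q) = ½ KL(P || M) + ½ KL(Q || M), with M = ½(P + Q). The sketch below computes it for histograms. How DriftLens turns a window of embeddings into the distributions it compares is not stated here, so the histogram of a single coordinate in the usage example is purely an assumption, and the helper names are hypothetical.

```python
import numpy as np


def _kl(p, q):
    """KL(p || q) for discrete distributions; terms with p == 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log, so bounded by ln 2)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize counts to probabilities
    m = 0.5 * (p + q)  # mixture; m > 0 wherever p > 0 or q > 0
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


# Usage (assumption): compare histograms of one embedding coordinate
rng = np.random.default_rng(0)
bins = np.linspace(-5.0, 5.0, 21)
p_counts, _ = np.histogram(rng.normal(size=1000), bins=bins)
q_counts, _ = np.histogram(rng.normal(loc=1.0, size=1000), bins=bins)
js = jensen_shannon(p_counts, q_counts)
```

Unlike KL, the Jensen-Shannon divergence is symmetric and always finite, because the mixture M is non-zero wherever either input is.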

static _compute_kullback_leibler_distribution_divergences(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) → dict

Computes the Kullback-Leibler distribution divergence per-batch and per-label.

Parameters:

label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).

Returns:

A dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution divergences computed for the passed window with respect to the baseline.
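The Kullback-Leibler divergence between two k-dimensional Gaussians has a closed form, KL(N0 || N1) = ½[Tr(Σ1⁻¹ Σ0) + (μ1 − μ0)ᵀ Σ1⁻¹ (μ1 − μ0) − k + ln(det Σ1 / det Σ0)]. The sketch assumes Gaussian-modeled distributions and uses a hypothetical helper name; it is not necessarily how DriftLens computes the divergence. Note that KL is not symmetric, so the order of baseline and window matters.

```python
import numpy as np


def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) for k-dimensional Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # Log-determinants avoid overflow/underflow of det() in high dimensions
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return float(0.5 * (np.trace(cov1_inv @ cov0)
                        + diff @ cov1_inv @ diff
                        - k + logdet1 - logdet0))


# Usage: the divergence is asymmetric
mu = np.zeros(1)
forward = kl_gaussian(mu, np.array([[1.0]]), mu, np.array([[4.0]]))
reverse = kl_gaussian(mu, np.array([[4.0]]), mu, np.array([[1.0]]))
```

Here forward is about 0.32 and reverse about 0.81, even though the two Gaussians are the same pair.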

static _compute_mahalanobis_drift_distances(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) → dict

Computes the Mahalanobis distribution distance per-batch and per-label.

Parameters:

label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).

Returns:

A dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances computed for the passed window with respect to the baseline.
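One common way to turn the Mahalanobis distance into a window-level score is to measure how far the window's mean embedding lies from the baseline distribution, D = sqrt((μ_w − μ_b)ᵀ Σ_b⁻¹ (μ_w − μ_b)). Whether DriftLens uses exactly this variant is an assumption; the sketch, with a hypothetical helper name, only illustrates the measure.

```python
import numpy as np


def mahalanobis_mean_distance(mu_window, mu_baseline, cov_baseline):
    """Mahalanobis distance of the window mean from the baseline Gaussian.

    Hypothetical helper: comparing means is an assumption of this sketch.
    """
    diff = mu_window - mu_baseline
    # Solve cov_baseline @ x = diff instead of inverting the covariance
    return float(np.sqrt(diff @ np.linalg.solve(cov_baseline, diff)))


# Usage: a shift of 2 along an axis with variance 4 is one standard deviation
cov_b = np.diag([4.0, 1.0])
d_wide = mahalanobis_mean_distance(np.array([2.0, 0.0]), np.zeros(2), cov_b)
d_narrow = mahalanobis_mean_distance(np.array([0.0, 2.0]), np.zeros(2), cov_b)
```

The same Euclidean shift scores 1.0 along the high-variance axis and 2.0 along the low-variance one, which is what makes the measure scale-aware.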

compute_drift_probability(window_distribution_list, threshold)

compute_window_distribution_distances(E_w: ndarray, Y_w: ndarray, distribution_distance_metric: str = 'frechet_drift_distance') → dict

Computes the per-batch and per-label distribution distances for an embedding window.

Parameters:

E_w (numpy.ndarray) – Embeddings of the window.
Y_w (numpy.ndarray) – Predicted labels of the window.
distribution_distance_metric (str, optional) – The distribution distance metric to use. The Fréchet drift distance ('frechet_drift_distance') is used by default. Options are: …

Returns:

A dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances computed for the passed window with respect to the baseline.
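To make the structure of the returned dictionary concrete, here is a hedged sketch that computes a Fréchet distance for the whole window (per-batch) and for the embeddings predicted as each label (per-label). The baseline representation (a mean and covariance for the batch and for each label), the key names "batch" and "per-label", and all helper names are assumptions for illustration; the real BaselineClass and its dimensionality reduction are not reproduced.

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """Fréchet distance between two Gaussians (FID formula)."""
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)


def gaussian_stats(E):
    """Mean vector and covariance matrix of an (m, d) embedding matrix."""
    return E.mean(axis=0), np.cov(E, rowvar=False)


def window_distances(baseline_stats, E_w, Y_w, label_list):
    """Hypothetical per-batch and per-label distances for one window."""
    Y_w = np.ravel(Y_w)  # accept labels of shape (m, 1) or (m,)
    mu_w, cov_w = gaussian_stats(E_w)
    result = {"batch": frechet_distance(*baseline_stats["batch"], mu_w, cov_w),
              "per-label": {}}
    for label in label_list:
        E_l = E_w[Y_w == label]
        if len(E_l) < 2:  # a covariance needs at least two samples
            result["per-label"][label] = float("nan")
            continue
        result["per-label"][label] = frechet_distance(
            *baseline_stats["per-label"][label], *gaussian_stats(E_l))
    return result


# Usage with synthetic data: 3 labels, 4-dimensional embeddings
rng = np.random.default_rng(0)
labels = [0, 1, 2]
E, Y = rng.normal(size=(900, 4)), rng.integers(0, 3, size=(900, 1))
baseline_stats = {"batch": gaussian_stats(E),
                  "per-label": {lab: gaussian_stats(E[Y.ravel() == lab]) for lab in labels}}
E_new, Y_new = rng.normal(size=(300, 4)), rng.integers(0, 3, size=(300, 1))
no_drift = window_distances(baseline_stats, E_new, Y_new, labels)
drift = window_distances(baseline_stats, E_new + 2.0, Y_new, labels)
```

Each window yields one per-batch value and one value per label, so a shift that affects only some classes can surface in the per-label entries even when the per-batch value moves little.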

compute_window_list_distribution_distances(E_w_list: List[ndarray], Y_w_list: List[ndarray], distribution_distance_metric: str = 'frechet_drift_distance') → Tuple[List[dict], List[dict]]

Computes the per-batch and per-label distribution distances for each embedding window.

Parameters:

E_w_list (list(numpy.ndarray)) – List of embeddings of the windows.
Y_w_list (list(numpy.ndarray)) – List of predicted labels of the windows.
distribution_distance_metric (str, optional) – The distribution distance metric to use. The Fréchet drift distance ('frechet_drift_distance') is used by default.

Returns:

A tuple of lists of dictionaries. Each dictionary contains the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances computed for an input window with respect to the baseline.

Return type:

tuple

static convert_distribution_distances_list_to_dataframe(distribution_distances_list: dict) → DataFrame

Converts the list of distribution distances to a pandas DataFrame.

Parameters:

distribution_distances_list (list(dict)) – A list of dictionaries containing the distribution distances.

Returns:

A pandas DataFrame containing the distribution distances.

Return type:

pandas.DataFrame

estimate_baseline(E: ndarray, Y: ndarray, label_list: List[int], batch_n_pc: int, per_label_n_pc: int, baseline_algorithm: str = 'StandardBaselineEstimator') → BaselineClass

Estimates the baseline from an embedding matrix and its predicted labels.

Parameters:

E (numpy.ndarray) – Embedding matrix of shape (m, d), where m is the number of samples and d the embedding dimensionality.
Y (numpy.ndarray) – Vector of predicted labels of shape (m, 1), where m is the number of samples.
label_list (list(int)) – List of class label ids used to train the model.
batch_n_pc (int) – Number of principal components used for the per-batch distribution.
per_label_n_pc (int) – Number of principal components used for the per-label distributions.
baseline_algorithm (str, optional) – Baseline estimation algorithm to use. The only possible value, which is also the default, is “StandardBaselineEstimator”.

Returns:

The estimated baseline, an instance of the BaselineClass class from the _baseline.py module. Estimating the baseline constitutes the offline phase of DriftLens.

Return type:

BaselineClass
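As a hedged sketch of what an offline baseline estimation can look like: fit a PCA on all embeddings (batch_n_pc components) and a separate PCA on each label's embeddings (per_label_n_pc components), then store the mean and covariance of the reduced embeddings. This mirrors the parameters documented above, but the dictionary layout, the SVD-based PCA, and the helper names are assumptions, not the actual StandardBaselineEstimator.

```python
import numpy as np


def fit_pca(E, n_pc):
    """Fit a PCA via SVD; return (mean, components), components of shape (n_pc, d)."""
    mean = E.mean(axis=0)
    # Rows of Vt are principal directions sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(E - mean, full_matrices=False)
    return mean, Vt[:n_pc]


def project(E, pca):
    """Project embeddings onto the fitted principal components."""
    mean, components = pca
    return (E - mean) @ components.T


def estimate_baseline_sketch(E, Y, label_list, batch_n_pc, per_label_n_pc):
    """Hypothetical baseline: PCA + Gaussian statistics, per batch and per label."""
    Y = np.ravel(Y)
    pca = fit_pca(E, batch_n_pc)
    Z = project(E, pca)
    baseline = {"batch": {"pca": pca, "mean": Z.mean(axis=0),
                          "cov": np.cov(Z, rowvar=False)},
                "per-label": {}}
    for label in label_list:
        E_l = E[Y == label]
        pca_l = fit_pca(E_l, per_label_n_pc)
        Z_l = project(E_l, pca_l)
        baseline["per-label"][label] = {"pca": pca_l, "mean": Z_l.mean(axis=0),
                                        "cov": np.cov(Z_l, rowvar=False)}
    return baseline


# Usage: 768-dimensional embeddings reduced to 10 (batch) and 5 (per-label) components
rng = np.random.default_rng(0)
E = rng.normal(size=(600, 768))
Y = rng.integers(0, 3, size=(600, 1))
baseline = estimate_baseline_sketch(E, Y, [0, 1, 2], batch_n_pc=10, per_label_n_pc=5)
```

Storing the fitted PCA alongside the statistics lets new windows be projected into the same reduced space before their distance from the baseline is computed.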

load_baseline(folder_path: str, baseline_name: str) → BaselineClass

Loads the baseline from disk into a BaselineClass object.

Parameters:

folder_path (str) – Path of the folder containing the saved baseline.
baseline_name (str) – Name of the baseline folder.

Returns:

The loaded baseline.

Return type:

BaselineClass

load_threshold(folder_path: str, threshold_name: str) → ThresholdClass

Loads the threshold from disk into a ThresholdClass object.

Parameters:

folder_path (str) – Path of the folder containing the saved threshold.
threshold_name (str) – Name of the threshold file.

Returns:

The loaded threshold.

Return type:

ThresholdClass

random_sampling_threshold_estimation(label_list: list[int], E: ndarray, Y: ndarray, batch_n_pc: int, per_label_n_pc: int, window_size: int, n_samples: int, flag_shuffle: bool = True, flag_replacement: bool = True, proportional_flag: bool = False, proportions_dict=None, distribution_distance_metric: str = 'frechet_drift_distance')

Estimates the threshold using the random sampling algorithm.

Parameters:

label_list (list(int)) – List of class label ids used to train the model.
E (numpy.ndarray) – Embedding matrix of shape (m, d), where m is the number of samples and d the embedding dimensionality.
Y (numpy.ndarray) – Vector of predicted labels of shape (m, 1), where m is the number of samples.
batch_n_pc (int) – Number of principal components used for the per-batch distribution.
per_label_n_pc (int) – Number of principal components used for the per-label distributions.
window_size (int) – Size of the windows used for the threshold estimation.
n_samples (int) – Number of windows randomly sampled for the threshold estimation.
flag_shuffle (bool, optional) – Whether to shuffle the samples before estimating the threshold. Default is True.
flag_replacement (bool, optional) – Whether to sample the windows with replacement. Default is True.
proportional_flag (bool, optional) – Whether to sample windows whose label distribution follows the proportions given in proportions_dict. Default is False.
proportions_dict (dict, optional) – Dictionary with the label proportions to use for the proportional sampling. Default is None.
distribution_distance_metric (str, optional) – The distribution distance metric to use. Default is 'frechet_drift_distance'.

Returns:

A tuple with the sorted per-batch distances and the per-label distances.

Return type:

tuple(numpy.ndarray, numpy.ndarray)
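A hedged sketch of the random-sampling idea for the per-batch case: draw n_samples windows of window_size embeddings from held-out data assumed to be drift-free, compute each window's distance from the baseline, and sort the distances. Interpreting flag_replacement as sampling embeddings with replacement within each window, and deriving the threshold as a high quantile of the sorted distances, are both assumptions of this sketch; the helper names are hypothetical.

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """Fréchet distance between two Gaussians (FID formula)."""
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)


def random_sampling_distances(E, mu_b, cov_b, window_size, n_samples,
                              replacement=True, seed=0):
    """Hypothetical per-batch distances of random windows, sorted ascending."""
    rng = np.random.default_rng(seed)
    distances = np.empty(n_samples)
    for i in range(n_samples):
        # Indices of one random window drawn from the threshold data
        idx = rng.choice(len(E), size=window_size, replace=replacement)
        W = E[idx]
        distances[i] = frechet_distance(mu_b, cov_b, W.mean(axis=0),
                                        np.cov(W, rowvar=False))
    return np.sort(distances)


# Usage: baseline statistics from one sample, windows from a drift-free sample
rng = np.random.default_rng(0)
E_baseline = rng.normal(size=(2000, 5))
E_threshold = rng.normal(size=(1000, 5))
mu_b, cov_b = E_baseline.mean(axis=0), np.cov(E_baseline, rowvar=False)
distances = random_sampling_distances(E_threshold, mu_b, cov_b,
                                      window_size=100, n_samples=200)
threshold = np.quantile(distances, 0.99)  # assumed choice of quantile
```

Because the sampled windows come from data without drift, their distances describe the variation expected from sampling alone; a window whose distance exceeds that range is a drift candidate.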

repeated_KFold_threshold_estimation(label_list, E, Y, batch_n_pc, per_label_n_pc, window_size, repetitions, flag_shuffle=True)

save_baseline(folder_path: str, baseline_name: str) → str

Persistently stores the baseline on disk.

Parameters:

folder_path (str) – Path of the folder in which to save the baseline.
baseline_name (str) – Name of the baseline folder.

Returns:

Baseline folder path.

Return type:

str

save_threshold(folder_path: str, threshold_name: str) → str

Persistently stores the threshold on disk.

Parameters:

folder_path (str) – Path of the folder in which to save the threshold.
threshold_name (str) – Name of the threshold file.

Returns:

The threshold filepath.

Return type:

str

set_baseline(baseline: BaselineClass) → None

Sets the baseline attribute with a BaselineClass object.

Parameters:

baseline (BaselineClass) – The baseline object to set.

Returns:

None

set_threshold(threshold) → None

Sets the threshold attribute with a ThresholdClass object.

Parameters:

threshold (ThresholdClass) – The threshold object to set.

Returns:

None

standard_threshold_estimation(label_list, E, Y, baseline, window_size, flag_shuffle=True)

class driftlens.driftlens.DriftLensVisualizer

Bases: object

Class to visualize the drift detection monitor results.

static _parse_distribution_distances(label_list, windows_distribution_distances)

Parses the distribution distances into per-label and per-batch distances.

Parameters:

label_list (list(int)) – List of label ids.
windows_distribution_distances (list(dict)) – List of distribution distances.

Returns:

per_label_distribution_distances (dict) – Dictionary with the per-label distribution distances.
per_batch_distribution_distances (list) – List of the per-batch distribution distances.

Return type:

tuple(dict, list)
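The parsing step amounts to regrouping a list of per-window dictionaries into one time series per label and one for the whole batch, ready to be plotted over the window index. The sketch below assumes the dictionary keys "batch" and "per-label" (mirroring the notation used for compute_window_distribution_distances) and uses a hypothetical function name; it is not the library's implementation.

```python
def parse_distribution_distances(label_list, windows_distribution_distances):
    """Hypothetical regrouping of per-window distances into time series."""
    per_label = {label: [] for label in label_list}
    per_batch = []
    for window in windows_distribution_distances:  # one dict per window, in order
        per_batch.append(window["batch"])
        for label in label_list:
            per_label[label].append(window["per-label"][label])
    return per_label, per_batch


# Usage with two windows and two labels
windows = [
    {"batch": 0.8, "per-label": {0: 0.5, 1: 1.1}},
    {"batch": 2.4, "per-label": {0: 3.0, 1: 1.9}},
]
per_label, per_batch = parse_distribution_distances([0, 1], windows)
```

After parsing, per_batch[i] and per_label[label][i] all refer to the same window i.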

static plot_per_batch_drift_monitor(window_distribution_list, plt_title=None, plt_xlabel_name=None, plt_ylabel_name=None, ylim_top=15, flag_save=False, folder_path=None, filename=None, format='eps')

static plot_per_label_drift_monitor(window_distribution_list, label_names=None, plt_title=None, plt_xlabel_name=None, plt_ylabel_name=None, ylim_top=15, flag_save=False, folder_path=None, filename=None, format='eps')

plot_per_label_monitor_with_threshold(label_names=None, ylim_top=15)