DriftLens
- class driftlens.driftlens.DriftLens(label_list=None)[source]
Bases: object
DriftLens class.
- baseline
BaselineClass object.
- Type:
BaselineClass
- threshold
ThresholdClass object.
- Type:
ThresholdClass
- label_list
List of class labels.
- Type:
list(str)
- batch_n_pc
Number of principal components to use for the per-batch distribution.
- Type:
int
- per_label_n_pc
Number of principal components to use for the per-label distributions.
- Type:
int
- baseline_algorithms
Dictionary of possible baseline algorithms.
- Type:
dict
- threshold_estimators
Dictionary of possible threshold estimators.
- Type:
dict
- KFold_threshold_estimation(label_list: list[int], E: ndarray, Y: ndarray, batch_n_pc: int, per_label_n_pc: int, window_size: int, flag_shuffle: bool = True)[source]
Estimates the threshold using the KFold algorithm (preliminary version of DriftLens).
- Parameters:
label_list (list(int)) – List of class label ids used to train the model.
E (numpy.ndarray) – Embedding matrix of shape (m, d), where m is the number of samples and d the embedding dimensionality.
Y (numpy.ndarray) – Vector of predicted labels of shape (m, 1), where m is the number of samples.
batch_n_pc (int) – Number of principal components to use for the per-batch distribution.
per_label_n_pc (int) – Number of principal components to use for the per-label distributions.
window_size (int) – Size of the window to use for the threshold estimation.
flag_shuffle (bool, optional) – Flag to shuffle the samples before the threshold estimation. Default is True.
- Returns:
The estimated threshold.
- Return type:
numpy.ndarray
- static _compute_bhattacharyya_distribution_distances(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) dict[source]
Computes the Bhattacharyya distribution distance per-batch and per-label.
- Parameters:
label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).
- Returns:
a dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances.
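As a point of reference, the Bhattacharyya distance between two multivariate normal distributions has a closed form. The sketch below illustrates that formula with NumPy. It is not taken from DriftLens: the helper name `bhattacharyya_gaussian` and its arguments (mean vectors and covariance matrices of the baseline and of the window) are assumptions made for this example.

```python
import numpy as np


def bhattacharyya_gaussian(mu_1, sigma_1, mu_2, sigma_2):
    """Bhattacharyya distance between N(mu_1, sigma_1) and N(mu_2, sigma_2).

    Hypothetical helper for illustration; not part of the DriftLens API.
    """
    sigma = (sigma_1 + sigma_2) / 2.0
    diff = mu_1 - mu_2
    # Mean term: (1/8) * diff^T sigma^{-1} diff.
    mean_term = diff @ np.linalg.solve(sigma, diff) / 8.0
    # Covariance term: (1/2) * ln(det(sigma) / sqrt(det(sigma_1) * det(sigma_2))).
    # slogdet is used instead of det for numerical stability.
    logdet = np.linalg.slogdet(sigma)[1]
    logdet_1 = np.linalg.slogdet(sigma_1)[1]
    logdet_2 = np.linalg.slogdet(sigma_2)[1]
    cov_term = 0.5 * (logdet - 0.5 * (logdet_1 + logdet_2))
    return float(mean_term + cov_term)


identity = np.eye(2)
d = bhattacharyya_gaussian(np.zeros(2), identity, np.array([2.0, 0.0]), identity)
print(d)
```

With equal covariances only the mean term contributes, so two unit-covariance distributions whose means are 2 apart are at distance 4 / 8 = 0.5.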
- static _compute_frechet_distribution_distances(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) dict[source]
Computes the Frechet distribution distance (FDD) per-batch and per-label.
- Parameters:
label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).
- Returns:
a dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances.
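For orientation, the Frechet distance between two multivariate normal distributions can be written in closed form from their means and covariances. The following is a minimal NumPy sketch of that formula, not the DriftLens implementation. The function name `frechet_gaussian` and the synthetic embeddings are assumptions introduced for this example.

```python
import numpy as np


def frechet_gaussian(mu_b, sigma_b, mu_w, sigma_w):
    """Squared Frechet distance between N(mu_b, sigma_b) and N(mu_w, sigma_w).

    Hypothetical helper for illustration; not part of the DriftLens API.
    """
    diff = mu_b - mu_w
    # Tr((sigma_b sigma_w)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_b @ sigma_w. For positive semi-definite inputs these
    # are real and non-negative up to numerical noise, hence the clip.
    eigvals = np.linalg.eigvals(sigma_b @ sigma_w)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_b) + np.trace(sigma_w) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
E_base = rng.normal(size=(500, 4))           # synthetic baseline embeddings
E_win = rng.normal(loc=0.5, size=(200, 4))   # synthetic shifted window

mu_b, sigma_b = E_base.mean(axis=0), np.cov(E_base, rowvar=False)
mu_w, sigma_w = E_win.mean(axis=0), np.cov(E_win, rowvar=False)

d_same = frechet_gaussian(mu_b, sigma_b, mu_b, sigma_b)
d_shift = frechet_gaussian(mu_b, sigma_b, mu_w, sigma_w)
print(d_same, d_shift)
```

Comparing a distribution with itself gives a value that is zero up to rounding, while the shifted window yields a clearly larger distance.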
- static _compute_jensen_shannon_distribution_divergences(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) dict[source]
Computes the Jensen-Shannon distribution divergence per-batch and per-label.
- Parameters:
label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).
- Returns:
a dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution divergences.
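This page does not say how the embeddings are turned into distributions for this divergence. Purely as a reference for the quantity itself, the sketch below computes the Jensen-Shannon divergence between two discrete probability vectors. The helper `js_divergence` is hypothetical and makes no claim about what DriftLens does internally.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two discrete distributions.

    Hypothetical helper for illustration; not part of the DriftLens API.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the KL sum.
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


js_disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
js_equal = js_divergence([0.3, 0.7], [0.3, 0.7])
print(js_disjoint, js_equal)
```

The divergence is symmetric and bounded: it is 0 for identical distributions and ln 2 for distributions with disjoint support.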
- static _compute_kullback_leibler_distribution_divergences(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) dict[source]
Computes the Kullback-Leibler distribution divergence per-batch and per-label.
- Parameters:
label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).
- Returns:
a dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution divergences.
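For two multivariate normal distributions the Kullback-Leibler divergence also has a closed form. The sketch below shows it with NumPy under the assumption that both distributions are summarized by a mean vector and a covariance matrix. The helper name `kl_gaussian` and the argument order are choices made for this example, not DriftLens API.

```python
import numpy as np


def kl_gaussian(mu_0, sigma_0, mu_1, sigma_1):
    """KL(N(mu_0, sigma_0) || N(mu_1, sigma_1)) in nats.

    Hypothetical helper for illustration; not part of the DriftLens API.
    """
    k = mu_0.shape[0]
    diff = mu_1 - mu_0
    trace_term = np.trace(np.linalg.solve(sigma_1, sigma_0))  # tr(sigma_1^{-1} sigma_0)
    quad_term = diff @ np.linalg.solve(sigma_1, diff)         # diff^T sigma_1^{-1} diff
    logdet_term = np.linalg.slogdet(sigma_1)[1] - np.linalg.slogdet(sigma_0)[1]
    return float(0.5 * (trace_term + quad_term - k + logdet_term))


mu_a, sigma_a = np.zeros(2), np.eye(2)
mu_b, sigma_b = np.array([1.0, 0.0]), 2.0 * np.eye(2)

kl_ab = kl_gaussian(mu_a, sigma_a, mu_b, sigma_b)
kl_ba = kl_gaussian(mu_b, sigma_b, mu_a, sigma_a)
print(kl_ab, kl_ba)
```

Unlike a distance, the divergence is not symmetric: swapping the two distributions generally changes the value, so it matters which one plays the role of the baseline.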
- static _compute_mahalanobis_drift_distances(label_list: List[int], baseline: BaselineClass, E_w: ndarray, Y_w: ndarray, window_id: int = 0) dict[source]
Computes the Mahalanobis distribution distance per-batch and per-label.
- Parameters:
label_list (list(int)) – List of label ids.
baseline (BaselineClass) – The baseline object.
E_w (numpy.ndarray) – The embeddings of the current window.
Y_w (numpy.ndarray) – The predicted labels of the current window.
window_id (int) – The window id (default: 0).
- Returns:
a dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances.
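The Mahalanobis distance measures how far a point lies from a distribution, scaled by that distribution's covariance. One plausible use here, stated as an assumption rather than as the library's behaviour, is to compare the mean of a window against the baseline mean and covariance. The helper `mahalanobis_distance` below is hypothetical.

```python
import numpy as np


def mahalanobis_distance(x, mu, sigma):
    """Mahalanobis distance of point x from a distribution with mean mu, covariance sigma.

    Hypothetical helper for illustration; not part of the DriftLens API.
    """
    diff = x - mu
    # solve() avoids forming the explicit inverse of sigma.
    return float(np.sqrt(diff @ np.linalg.solve(sigma, diff)))


baseline_mean = np.zeros(2)
baseline_cov = np.diag([4.0, 1.0])
window_mean = np.array([2.0, 0.0])

d_m = mahalanobis_distance(window_mean, baseline_mean, baseline_cov)
print(d_m)
```

Here the offset of 2 lies along an axis with standard deviation 2, so the distance is one standard deviation.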
- compute_window_distribution_distances(E_w: ndarray, Y_w: ndarray, distribution_distance_metric: str = 'frechet_drift_distance') dict[source]
Computes the per-batch and per-label distribution distances for an embedding window.
- Parameters:
E_w (numpy.ndarray) – Embeddings of the window.
Y_w (numpy.ndarray) – Predicted labels of the window.
distribution_distance_metric (str, optional) – The distribution distance metric to use. The Frechet Distance is used by default. Options are: …
- Returns:
a dictionary containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances computed for the passed window with respect to the baseline.
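The shape of the returned dictionary can be pictured with a small stand-in. The sketch below assumes the keys are the strings "batch" and "per-label", following the notation in the description above, and uses the squared Euclidean distance between means as a placeholder metric. The function `window_distances_sketch` and the `baseline_means` structure are hypothetical and only mimic the layout.

```python
import numpy as np


def window_distances_sketch(E_w, Y_w, baseline_means, label_list):
    """Build a {"batch": ..., "per-label": {label: ...}} dictionary for one window.

    Hypothetical stand-in: the metric is a squared distance between means,
    not one of the metrics offered by DriftLens.
    """
    Y_w = np.ravel(Y_w)  # accept labels of shape (m,) or (m, 1)

    def dist(a, b):
        return float(np.sum((a - b) ** 2))

    result = {"batch": dist(E_w.mean(axis=0), baseline_means["batch"]), "per-label": {}}
    for label in label_list:
        E_label = E_w[Y_w == label]
        # A label may be absent from a window; record None in that case.
        result["per-label"][label] = (
            dist(E_label.mean(axis=0), baseline_means["per-label"][label])
            if len(E_label) > 0
            else None
        )
    return result


E_w = np.array([[0.0, 0.0], [2.0, 2.0]])
Y_w = np.array([[0], [1]])
baseline_means = {"batch": np.zeros(2), "per-label": {0: np.zeros(2), 1: np.zeros(2), 2: np.zeros(2)}}
distances = window_distances_sketch(E_w, Y_w, baseline_means, [0, 1, 2])
print(distances)
```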
- compute_window_list_distribution_distances(E_w_list: List[ndarray], Y_w_list: List[ndarray], distribution_distance_metric: str = 'frechet_drift_distance') Tuple[List[dict], List[dict]][source]
Computes the per-batch and per-label distribution distances for each embedding window.
- Parameters:
E_w_list (list(numpy.ndarray)) – List of embeddings of the windows.
Y_w_list (list(numpy.ndarray)) – List of predicted labels of the windows.
distribution_distance_metric (str, optional) – The distribution distance metric to use. Currently, only the Frechet Inception Distance is supported.
- Returns:
A tuple containing a list of dictionaries containing the per-batch (window_distribution_distances_dict[batch]) and the per-label (window_distribution_distances_dict[per-label][label]) distribution distances computed for each input window with respect to the baseline.
- Return type:
tuple
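The window lists passed to this method have to be prepared by the caller. A minimal way to do that, assuming consecutive fixed-size windows and dropping an incomplete tail, is sketched below. The helper `split_into_windows` is hypothetical and not part of DriftLens.

```python
import numpy as np


def split_into_windows(E, Y, window_size):
    """Split embeddings and labels into consecutive windows of equal size.

    Hypothetical helper; samples that do not fill a last window are dropped.
    """
    n_windows = len(E) // window_size
    E_w_list = [E[i * window_size:(i + 1) * window_size] for i in range(n_windows)]
    Y_w_list = [Y[i * window_size:(i + 1) * window_size] for i in range(n_windows)]
    return E_w_list, Y_w_list


E = np.arange(30, dtype=float).reshape(10, 3)  # 10 samples, 3-dimensional embeddings
Y = np.arange(10) % 2                          # alternating predicted labels
E_w_list, Y_w_list = split_into_windows(E, Y, window_size=4)
print(len(E_w_list), E_w_list[0].shape)
```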
- static convert_distribution_distances_list_to_dataframe(distribution_distances_list: dict) DataFrame[source]
Converts the list of distribution distances to a pandas DataFrame.
- Parameters:
distribution_distances_list (list(dict)) – A list of dictionaries containing the distribution distances.
- Returns:
A pandas DataFrame containing the distribution distances.
- Return type:
pd.DataFrame
- estimate_baseline(E: ndarray, Y: ndarray, label_list: List[int], batch_n_pc: int, per_label_n_pc: int, baseline_algorithm: str = 'StandardBaselineEstimator') BaselineClass[source]
Estimates the baseline.
- Parameters:
label_list (list(int)) – List of class label ids used to train the model.
batch_n_pc (int) – Number of principal components to use for the per-batch distribution.
per_label_n_pc (int) – Number of principal components to use for the per-label distributions.
E (numpy.ndarray) – Embedding matrix of shape (m, d), where m is the number of samples and d the embedding dimensionality.
Y (numpy.ndarray) – Vector of predicted labels of shape (m, 1), where m is the number of samples.
baseline_algorithm (str, optional) – Baseline estimation algorithm to use. Possible values are: "StandardBaselineEstimator". If not provided, the default value is "StandardBaselineEstimator".
- Returns:
An instance of the BaselineClass class from the _baseline.py module, performing the offline phase of DriftLens.
- Return type:
BaselineClass
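The parameters suggest that the baseline summarizes the reference embeddings once for the whole batch and once per label, each after a reduction to a given number of principal components. The sketch below shows one way such a summary could be built with NumPy, assuming it consists of a mean vector and a covariance matrix in the reduced space. All names (`gaussian_stats`, `estimate_baseline_sketch`, the dictionary keys) are hypothetical, and the actual StandardBaselineEstimator may differ.

```python
import numpy as np


def gaussian_stats(E, n_pc):
    """PCA-reduce E to n_pc components and return mean and covariance there."""
    center = E.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(E - center, full_matrices=False)
    components = vt[:n_pc].T                      # shape (d, n_pc)
    Z = (E - center) @ components                 # reduced embeddings, shape (m, n_pc)
    return {
        "center": center,
        "components": components,
        "mean": Z.mean(axis=0),
        "cov": np.atleast_2d(np.cov(Z, rowvar=False)),
    }


def estimate_baseline_sketch(E, Y, label_list, batch_n_pc, per_label_n_pc):
    """Hypothetical baseline: one summary for the batch, one per label."""
    Y = np.ravel(Y)
    baseline = {"batch": gaussian_stats(E, batch_n_pc), "per-label": {}}
    for label in label_list:
        baseline["per-label"][label] = gaussian_stats(E[Y == label], per_label_n_pc)
    return baseline


rng = np.random.default_rng(0)
E = rng.normal(size=(300, 8))
Y = np.arange(300) % 3
baseline = estimate_baseline_sketch(E, Y, [0, 1, 2], batch_n_pc=4, per_label_n_pc=2)
print(baseline["batch"]["cov"].shape, baseline["per-label"][1]["cov"].shape)
```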
- load_baseline(folder_path: str, baseline_name: str) BaselineClass[source]
Loads the baseline from disk into a BaselineClass object.
- Parameters:
folder_path (str) – Folder path with the saved baseline.
baseline_name (str) – Filename of the baseline folder.
- Returns:
the loaded baseline.
- Return type:
BaselineClass
- load_threshold(folder_path: str, threshold_name: str) ThresholdClass[source]
Loads the threshold from disk into a ThresholdClass object.
- Parameters:
folder_path (str) – Folder path with the saved threshold.
threshold_name (str) – Filename of the threshold file.
- Returns:
The loaded threshold.
- Return type:
ThresholdClass
- random_sampling_threshold_estimation(label_list: list[int], E: ndarray, Y: ndarray, batch_n_pc: int, per_label_n_pc: int, window_size: int, n_samples: int, flag_shuffle: bool = True, flag_replacement: bool = True, proportional_flag: bool = False, proportions_dict=None, distribution_distance_metric: str = 'frechet_drift_distance')[source]
Estimates the threshold using the random sampling algorithm.
- Parameters:
label_list (list(int)) – List of class label ids used to train the model.
E (numpy.ndarray) – Embedding matrix of shape (m, d), where m is the number of samples and d the embedding dimensionality.
Y (numpy.ndarray) – Vector of predicted labels of shape (m, 1), where m is the number of samples.
batch_n_pc (int) – Number of principal components to use for the per-batch distribution.
per_label_n_pc (int) – Number of principal components to use for the per-label distributions.
window_size (int) – Size of the window to use for the threshold estimation.
n_samples (int) – Number of randomly sampled windows to use for the threshold estimation.
flag_shuffle (bool, optional) – Flag to shuffle the samples before the threshold estimation. Default is True.
flag_replacement (bool, optional) – Flag to sample the windows with replacement. Default is True.
proportional_flag (bool, optional) – Flag to sample windows whose label distribution follows given proportions. Default is False.
proportions_dict (dict, optional) – Dictionary with the proportions of the labels to use for the proportional sampling. Default is None.
- Returns:
Tuple with the per-batch distances sorted and the per-label distances.
- Return type:
tuple(numpy.ndarray, numpy.ndarray)
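The idea of the random sampling estimation can be sketched as follows: draw many windows from data assumed to be free of drift, measure how far each one lies from the baseline, and sort the resulting distances. The code below is a simplified illustration under several assumptions: the metric is a squared distance between means, only the per-batch distances are produced, and `flag_replacement` is read as sampling the rows of each window with replacement. The function name `random_sampling_distances` and the `seed` argument are hypothetical.

```python
import numpy as np


def random_sampling_distances(E, baseline_mean, window_size, n_samples,
                              flag_replacement=True, seed=0):
    """Sorted per-batch distances of randomly sampled windows from the baseline.

    Hypothetical sketch; the placeholder metric is the squared distance
    between the window mean and the baseline mean.
    """
    rng = np.random.default_rng(seed)
    distances = []
    for _ in range(n_samples):
        # Pick window_size row indices at random to form one window.
        idx = rng.choice(len(E), size=window_size, replace=flag_replacement)
        window_mean = E[idx].mean(axis=0)
        distances.append(float(np.sum((window_mean - baseline_mean) ** 2)))
    return np.sort(np.array(distances))


rng = np.random.default_rng(1)
E = rng.normal(size=(200, 5))
sorted_distances = random_sampling_distances(E, E.mean(axis=0), window_size=50, n_samples=20)
print(sorted_distances[-1])  # largest distance seen without drift
```

A threshold can then be read off the sorted distances, for instance as the maximum or a high quantile. Which rule DriftLens applies is not stated on this page.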
- repeated_KFold_threshold_estimation(label_list, E, Y, batch_n_pc, per_label_n_pc, window_size, repetitions, flag_shuffle=True)[source]
- save_baseline(folder_path: str, baseline_name: str) str[source]
Persistently stores the baseline on disk.
- Parameters:
folder_path (str) – Folder path where the baseline is saved.
baseline_name (str) – Filename of the baseline folder.
- Returns:
Baseline folder path.
- Return type:
str
- save_threshold(folder_path: str, threshold_name: str) str[source]
Persistently stores the threshold on disk.
- Parameters:
folder_path (str) – Folder path where the threshold is saved.
threshold_name (str) – Filename of the threshold file.
- Returns:
The threshold filepath.
- Return type:
str
- set_baseline(baseline: BaselineClass) None[source]
Sets the baseline attribute with a BaselineClass object.
- Parameters:
baseline (BaselineClass) – The baseline object to set.
- Returns:
None
- class driftlens.driftlens.DriftLensVisualizer[source]
Bases: object
Class to visualize the drift detection monitor results.
- static _parse_distribution_distances(label_list, windows_distribution_distances)[source]
Parses the distribution distances into per-label and per-batch distances.
- Parameters:
label_list (list(int)) – List of label ids.
windows_distribution_distances (list(dict)) – List of distribution distances.
- Returns:
per_label_distribution_distances (dict): Dictionary with per-label distribution distances. per_batch_distribution_distances (list): List of per-batch distribution distances.
- Return type:
tuple(dict, list)
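The parsing step described above amounts to regrouping a list of per-window dictionaries into one series per label plus one per-batch series. A minimal sketch follows, assuming each window dictionary uses the keys "batch" and "per-label" and that per-label entries are keyed by the label id. The function name and these key choices are assumptions for this example.

```python
def parse_distribution_distances_sketch(label_list, windows_distribution_distances):
    """Regroup per-window distance dictionaries into per-label and per-batch series.

    Hypothetical re-implementation for illustration only.
    """
    per_label = {label: [] for label in label_list}
    per_batch = []
    for window in windows_distribution_distances:
        per_batch.append(window["batch"])
        for label in label_list:
            per_label[label].append(window["per-label"][label])
    return per_label, per_batch


windows = [
    {"batch": 0.4, "per-label": {0: 0.1, 1: 0.3}},
    {"batch": 0.9, "per-label": {0: 0.2, 1: 1.5}},
]
per_label, per_batch = parse_distribution_distances_sketch([0, 1], windows)
print(per_label, per_batch)
```

Each resulting series has one entry per window, in window order, which is the form a monitoring plot over time needs.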
- static plot_per_batch_drift_monitor(window_distribution_list, plt_title=None, plt_xlabel_name=None, plt_ylabel_name=None, ylim_top=15, flag_save=False, folder_path=None, filename=None, format='eps')[source]