L-infinity histogram gap

Link to the Paper

humancompatible.detect.methods.l_inf.l_inf.check_l_inf_gap(X: ndarray, y: ndarray, binarizer: Binarizer, feature_involved: str, subgroup_to_check: Any, delta: float, verbose: int = 1) → bool

Test whether a protected subgroup’s outcome distribution is within delta of the overall population’s outcome distribution in the L-infinity norm.

Parameters:
  • X (np.ndarray) – Protected-attribute slice of the dataset (same rows as y).

  • y (np.ndarray) – Boolean target vector.

  • binarizer (Binarizer) – The binarizer used to encode X and y.

  • feature_involved (str) – Name of the protected column whose subgroup is tested.

  • subgroup_to_check (Any) – Raw value of the subgroup to isolate.

  • delta (float) – Threshold for the L-infinity norm.

  • verbose (int, default 1) – Verbosity level. 0 = silent, 1 = logger output only, 2 = all detailed logs (including solver output).

Returns:

True if the subgroup histogram is within delta of the overall histogram in the L-infinity norm; False otherwise.

Return type:

bool

Raises:
  • ValueError – If delta is not positive.

  • KeyError – If feature_involved is not in the binarizer’s feature names.

  • KeyError – If subgroup_to_check is not a valid value for the feature.
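The quantity being tested can be illustrated with a minimal NumPy sketch. It does not call the library and makes no claim about how check_l_inf_gap builds its histograms: it assumes a two-bin histogram over the Boolean target only, and the helper name l_inf_gap and the subgroup mask are hypothetical.

```python
import numpy as np

def l_inf_gap(y, mask):
    """Largest per-bin difference between the subgroup and overall outcome histograms."""
    y = np.asarray(y, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    # Two-bin histogram over the Boolean outcome, normalised to densities.
    overall = np.bincount(y.astype(int), minlength=2) / y.size
    subgroup = np.bincount(y[mask].astype(int), minlength=2) / mask.sum()
    # L-infinity norm of the difference = maximum absolute difference over bins.
    return float(np.max(np.abs(overall - subgroup)))

# Toy data (illustrative only): eight rows, the first three belong to the subgroup.
y = np.array([1, 1, 0, 0, 1, 0, 1, 0], dtype=bool)
in_subgroup = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)

gap = l_inf_gap(y, in_subgroup)
within = gap <= 0.2  # plays the role of delta
```

Here the overall histogram is [0.5, 0.5] and the subgroup histogram is [1/3, 2/3], so the gap is 1/6 and the check passes for delta = 0.2.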

humancompatible.detect.methods.l_inf.lp_tools.lin_prog_feas(hist1: ndarray, hist2: ndarray, delta: float, num_samples: float = 1.0) → int

Draws a random sample of histogram bins, sized as a fraction of the total number of bins, and checks whether every sampled bin satisfies

|hist1 - hist2| <= delta

Parameters:
  • hist1 (np.ndarray) – 1-D array (or (n,1) column vector) of histogram bin densities for the full dataset.

  • hist2 (np.ndarray) – 1-D array (or (n,1) column vector) of histogram bin densities for the subgroup.

  • delta (float) – Threshold for the absolute difference |hist1 - hist2|.

  • num_samples (float) – Fraction of total bins to sample. The function draws int(num_samples * (len(hist1) - 1)) random samples.

Returns:

Status code from scipy.optimize.linprog. A status of 0 indicates the constraints are feasible (i.e., |hist1 - hist2| <= delta for all sampled bins); other codes signal infeasibility or solver errors.

Return type:

int
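One way to pose such a check as a linear-programming feasibility problem is sketched below. This is not the library's implementation: the function name sampled_gap_status, the seed argument, sampling without replacement, and the single dummy variable pinned to 1 are all assumptions made for illustration. Only scipy.optimize.linprog and its status codes (0 = success, 2 = infeasible) are taken as given.

```python
import numpy as np
from scipy.optimize import linprog

def sampled_gap_status(hist1, hist2, delta, num_samples=1.0, seed=0):
    """Return the linprog status for |hist1 - hist2| <= delta on sampled bins."""
    h1 = np.ravel(hist1).astype(float)  # accepts 1-D arrays or (n, 1) columns
    h2 = np.ravel(hist2).astype(float)
    # Sample size as documented: int(num_samples * (len(hist1) - 1)).
    k = int(num_samples * (len(h1) - 1))
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(h1), size=k, replace=False)  # assumption: no replacement
    d = h1[idx] - h2[idx]
    # One dummy variable x fixed to 1 by its bounds, so the constraints
    # d*x <= delta and -d*x <= delta encode |d| <= delta for each sampled bin.
    A_ub = np.concatenate([d, -d]).reshape(-1, 1)
    b_ub = np.full(2 * k, float(delta))
    res = linprog(c=[0.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(1.0, 1.0)], method="highs")
    return res.status

# Toy histograms (illustrative only): every bin differs by about 0.05.
full = np.array([0.25, 0.25, 0.25, 0.25])
sub = np.array([0.30, 0.20, 0.30, 0.20])

status_loose = sampled_gap_status(full, sub, delta=0.1)
status_tight = sampled_gap_status(full, sub, delta=0.01)
```

With delta = 0.1 the constraints hold on every bin and the solver reports success; with delta = 0.01 every bin violates the bound and the problem is infeasible. Because only a sample of bins is checked, a status of 0 with num_samples below 1 says nothing about the bins that were not drawn.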