humancompatible.detect.data_handler.DataHandler module

class humancompatible.detect.data_handler.DataHandler.DataHandler(features: list[Feature], target: Feature | None = None, causal_inc: list[tuple[Feature, Feature]] | None = None, greater_than: list[tuple[Feature, Feature]] | None = None)

Bases: object

Performs all data processing from a pandas DataFrame or numpy array to a normalized and encoded input. The expected use is to initialize the handler with training data and then use it to encode all data.

Supports mixed encoding, where only some values of a feature are categorical.

Normalizes continuous data to the [0, 1] range.

Produces either one-hot encoded data or direct data in which categorical values are mapped to negative integers.
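The two ideas in this description, min-max normalization and the "direct" format with negative integer codes, can be illustrated with a minimal sketch in plain Python. It does not call DataHandler and is not its implementation: the helper names min_max_normalize and encode_direct are hypothetical, and the specific codes (-1, -2, ...) are an assumption made for illustration.

```python
def min_max_normalize(values, lower, upper):
    """Scale continuous values from [lower, upper] to [0, 1]."""
    span = upper - lower
    return [(v - lower) / span for v in values]


def encode_direct(values, categories, lower, upper):
    """Encode a mixed feature in a 'direct' (non one-hot) form.

    Categorical values map to negative integers (assumed here: -1, -2, ...),
    all other values are treated as continuous and scaled to [0, 1].
    """
    codes = {cat: -(i + 1) for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        if v in codes:
            encoded.append(codes[v])
        else:
            encoded.append((v - lower) / (upper - lower))
    return encoded


# A purely continuous feature.
ages = [20, 40, 60]
normalized = min_max_normalize(ages, lower=20, upper=60)

# A mixed feature: numeric incomes plus the categorical value "unknown".
incomes = [1000, "unknown", 3000]
direct = encode_direct(incomes, categories=["unknown"], lower=1000, upper=3000)
```

Because the scaled values lie in [0, 1], negative codes cannot collide with them, which is one plausible reason for choosing negative integers.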

allowed_changes(pre_vals, post_vals)
property causal_inc: list[tuple[Feature, Feature]]
decode(X: ndarray[float64], denormalize: bool = True, encoded_one_hot: bool = True, as_dataframe: bool = True) -> ndarray[float64]

Decode input features.

Parameters:

X : array-like

Input data matrix. Shape: (num_samples, num_enc_features), where num_enc_features can be higher than num_features because of one-hot encoding.

denormalize : bool, optional

Whether to invert the normalization of the features (default is True).

encoded_one_hot : bool, optional

Whether the input matrix is one-hot encoded (default is True).

as_dataframe : bool, optional

Whether to return a pandas DataFrame or numpy array (default is True - DataFrame).

Returns:

decoded_X : numpy array (or pandas DataFrame when as_dataframe is True)

Decoded features in the original format. Shape: (num_samples, num_features)
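As a rough illustration of what decoding involves, the sketch below inverts a row made of a one-hot block followed by one normalized value. It is a plain-Python approximation under stated assumptions, not the library's code: decode_row, the column layout, and the bounds are all hypothetical.

```python
def decode_row(row, categories, lower, upper, denormalize=True):
    """Invert a row made of a one-hot block followed by one scaled value."""
    n = len(categories)
    block, scaled = row[:n], row[n]
    # The position of the largest entry identifies the category.
    category = categories[block.index(max(block))]
    # Undo min-max scaling only when requested.
    value = lower + scaled * (upper - lower) if denormalize else scaled
    return [category, value]


categories = ["red", "green", "blue"]
encoded = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

# 4 encoded columns collapse back to 2 original features.
decoded = [decode_row(r, categories, 10.0, 20.0) for r in encoded]
kept_scaled = [
    decode_row(r, categories, 10.0, 20.0, denormalize=False) for r in encoded
]
```

With denormalize=False the category is still recovered, but the continuous value stays in its [0, 1] form.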

decode_y(y: ndarray[float64], denormalize: bool = True, as_series: bool = True) -> ndarray[float64]

Decode target feature.

Parameters:

y : array-like

Target feature data. Shape: (num_samples,) in the general case, or (num_samples, num_categorical_values) in the case of one-hot encoding.

denormalize : bool, optional

Whether to invert the normalization of the feature (default is True).

as_series : bool, optional

Whether to return a pandas Series or numpy array (default is True - Series).

Returns:

decoded_y : numpy array (or pandas Series when as_series is True)

Decoded target feature data. Shape: (num_samples,)

encode(X: ndarray | DataFrame, normalize: bool = True, one_hot: bool = True) -> ndarray[float64]

Encode input features.

Parameters:

X : array-like

Input features (data matrix or DataFrame). Shape: (num_samples, num_features)

normalize : bool, optional

Whether to normalize the features (default is True).

one_hot : bool, optional

Whether to perform one-hot encoding for categorical values (default is True).

Returns:

encoded_X : numpy array

Encoded input features. Shape: (num_samples, one_hot_features) when one-hot encoding is performed, (num_samples, num_features) otherwise.
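The change in width under one-hot encoding can be seen in a small, library-independent sketch. The helpers one_hot and encode_rows, the feature names, and the bounds are hypothetical and only illustrate the shape described above; they are not how DataHandler is implemented.

```python
def one_hot(value, categories):
    """Return a 0/1 indicator list with a single 1 at the value's position."""
    return [1.0 if value == cat else 0.0 for cat in categories]


def encode_rows(rows, colour_categories, size_bounds):
    """Encode rows of (colour, size): one-hot the colour, normalize the size."""
    lower, upper = size_bounds
    encoded = []
    for colour, size in rows:
        scaled = (size - lower) / (upper - lower)
        encoded.append(one_hot(colour, colour_categories) + [scaled])
    return encoded


rows = [("red", 10.0), ("blue", 20.0), ("green", 15.0)]
encoded_X = encode_rows(rows, ["red", "green", "blue"], (10.0, 20.0))

# 2 original features become 3 + 1 = 4 encoded columns.
num_samples, num_enc_features = len(encoded_X), len(encoded_X[0])
```

Each categorical feature contributes one column per category, while each continuous feature contributes a single column, so the encoded width depends on the number of categories seen when the handler was built.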

encode_all(X_all: ndarray, normalize: bool, one_hot: bool)
encode_y(y: ndarray | Series, normalize: bool = True, one_hot: bool = True) -> ndarray[float64]

Encode target feature.

Parameters:

y : array-like

Target feature (array or Series of labels or regression targets). Shape: (num_samples,)

normalize : bool, optional

Whether to normalize the feature (default is True).

one_hot : bool, optional

Whether to perform one-hot encoding for categorical values (default is True).

Returns:

encoded_y : numpy array

Encoded target feature. Shape: (num_samples, num_values) for one-hot encoding or (num_samples,) otherwise.
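The two output shapes can be contrasted with a short sketch: a classification target that is one-hot encoded, and a regression target that is only normalized. The functions one_hot_labels and normalize_targets are hypothetical, and the ordering of classes (sorted) is an assumption; the library may order values differently.

```python
def one_hot_labels(labels):
    """One-hot encode labels; returns the rows and the class order used."""
    classes = sorted(set(labels))  # assumed ordering, for illustration only
    rows = [[1.0 if v == c else 0.0 for c in classes] for v in labels]
    return rows, classes


def normalize_targets(targets):
    """Min-max scale regression targets to [0, 1] using the observed range."""
    lower, upper = min(targets), max(targets)
    return [(t - lower) / (upper - lower) for t in targets]


# Classification: shape (num_samples, num_values) = (3, 2).
encoded_labels, classes = one_hot_labels(["no", "yes", "no"])

# Regression: shape stays (num_samples,) = (3,).
encoded_targets = normalize_targets([2.0, 4.0, 3.0])
```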

encoding_width(one_hot: bool) -> int
property feature_names: list[str]

List of feature names

property features: list[Feature]

List of input features

classmethod from_data(X: ndarray | DataFrame, y: ndarray | Series | None = None, categ_map: dict[int | str, list[int | str]] = {}, ordered: list[int | str] = [], bounds_map: dict[int | str, tuple[int, int]] = {}, discrete: list[int | str] = [], immutable: list[int | str] = [], monotonicity: dict[int | str, Monotonicity] = {}, causal_inc: list[tuple[int | str, int | str]] = [], greater_than: list[tuple[int | str, int | str]] = [], regression: bool = False, feature_names: list[str] | None = None, target_name: str | None = None) -> DataHandler

Construct a DataHandler instance.

Parameters:

X : array-like (2 dimensional)

Input features. Shape: (num_samples, num_features)

y : array-like (1 dimensional)

Target feature (e.g., labels or regression targets). Shape: (num_samples,)

categ_map : dictionary

Dictionary with indices (or column names for a DataFrame) of categorical features as keys and a list of unique categorical values as values.

If the list is empty, each unique value of the feature is considered categorical. If the list is non-empty but does not cover all values, the feature is considered mixed.

regression : bool

True if the task is regression, False if y is categorical and the task is classification.

feature_names : optional list of strings

List of feature names. If None, the names are recovered from the column names when X is a DataFrame.

target_name : optional string

Name of the target feature. If None, the name is recovered from y when y is a pandas Series.
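The rule for categ_map entries (empty list, full list, partial list) can be made concrete with a small sketch that classifies features from example columns. This is only an illustration of the documented rule: classify_feature and the sample data are hypothetical, and treating a list that covers every observed value as fully categorical is an assumption inferred from the description.

```python
def classify_feature(column, listed_values):
    """Classify one feature by the rule described for categ_map entries."""
    observed = set(column)
    if not listed_values:
        # Empty list: every unique observed value counts as a category.
        return "categorical", sorted(observed, key=str)
    if observed <= set(listed_values):
        # Assumed: a list covering all observed values is fully categorical.
        return "categorical", list(listed_values)
    # Some observed values are not listed: the feature is mixed.
    return "mixed", list(listed_values)


columns = {
    "colour": ["red", "blue", "red"],
    "income": [1000, "unknown", 3000],
    "grade": ["a", "b", "a"],
}
categ_map = {"colour": [], "income": ["unknown"], "grade": ["a", "b"]}

result = {
    name: classify_feature(columns[name], values)
    for name, values in categ_map.items()
}
```

Features that do not appear in categ_map at all are presumably treated as non-categorical, which is why only the three keyed columns are classified here.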

property greater_than: list[tuple[Feature, Feature]]
property n_features: int

Number of features in the input space

property target_feature: Feature

Target feature