preprocessing.OneHotEncoder

Encode categorical integer features using a one-hot aka one-of-K scheme. The input to this transformer should be a matrix of integers, denoting the values taken on by categorical (discrete) features. The output will be a sparse matrix where each column corresponds to one possible value of one feature. It is assumed that input features take on values in the range [0, n_values).
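As a rough illustration of the scheme described above, the sketch below turns integer categories in the range [0, n_values) into rows that contain a single 1. The helper `oneHot` and its parameter names are hypothetical and not part of this library, and it returns dense arrays rather than a sparse matrix to keep the example short.

```typescript
// Hypothetical helper for illustration only; not part of the library API.
// Maps each integer category in [0, nValues) to a row with exactly one 1.
function oneHot(values: number[], nValues: number): number[][] {
  return values.map((v) => {
    // Reject anything that is not an integer inside [0, nValues).
    if (Math.floor(v) !== v || v < 0 || v >= nValues) {
      throw new RangeError(`value ${v} is outside [0, ${nValues})`);
    }
    const row: number[] = [];
    for (let i = 0; i < nValues; i++) {
      row.push(i === v ? 1 : 0);
    }
    return row;
  });
}

const oneHotRows = oneHot([0, 2, 1], 3);
// oneHotRows -> [ [ 1, 0, 0 ], [ 0, 0, 1 ], [ 0, 1, 0 ] ]
```

Each output column corresponds to one possible value of the feature, so a feature with three possible values expands into three columns.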

This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels.

Note: a one-hot encoding of y labels should use a LabelBinarizer instead.

Usage

const enc = new OneHotEncoder();
const planetList = [
 { planet: 'mars', isGasGiant: false, value: 10 },
 { planet: 'saturn', isGasGiant: true, value: 20 },
 { planet: 'jupiter', isGasGiant: true, value: 30 }
];
const encodeInfo = enc.encode(planetList, {
 dataKeys: ['value', 'isGasGiant'],
 labelKeys: ['planet']
});
// encodeInfo.data -> [ [ -1, 0, 1, 0, 0 ], [ 0, 1, 0, 1, 0 ], [ 1, 1, 0, 0, 1 ] ]
const decodedInfo = enc.decode(encodeInfo.data, encodeInfo.decoders);
// gives you back the original value, which is `planetList`
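
One way to read the encoded rows above is as three parts: a standardized number, a boolean mapped to 0 or 1, and a one-hot block for the label. The sketch below reproduces the same numbers under that reading. It is an assumption inferred from the example output, not the library's implementation; the helpers `standardize` and `oneHotStrings`, the `Planet` type, and the use of the sample standard deviation (dividing by n - 1) are all hypothetical.

```typescript
// Hypothetical reconstruction of the example output; not the library's source.
type Planet = { planet: string; isGasGiant: boolean; value: number };

const planets: Planet[] = [
  { planet: "mars", isGasGiant: false, value: 10 },
  { planet: "saturn", isGasGiant: true, value: 20 },
  { planet: "jupiter", isGasGiant: true, value: 30 },
];

// Assumption: numbers are centered and scaled by the sample standard deviation (n - 1).
function standardize(xs: number[]): number[] {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const variance =
    xs.reduce((acc, x) => acc + (x - mean) * (x - mean), 0) / (xs.length - 1);
  const std = Math.sqrt(variance);
  return xs.map((x) => (x - mean) / std);
}

// One column per distinct string, in order of first appearance.
function oneHotStrings(labels: string[]): number[][] {
  const categories: string[] = [];
  for (const label of labels) {
    if (categories.indexOf(label) === -1) {
      categories.push(label);
    }
  }
  return labels.map((label) => categories.map((c) => (c === label ? 1 : 0)));
}

const scaledValues = standardize(planets.map((p) => p.value));
const planetColumns = oneHotStrings(planets.map((p) => p.planet));

// Row layout: [standardized value, isGasGiant as 0/1, one-hot planet columns]
const rows = planets.map((p, i) => [
  scaledValues[i],
  p.isGasGiant ? 1 : 0,
  ...planetColumns[i],
]);
// rows -> [ [ -1, 0, 1, 0, 0 ], [ 0, 1, 0, 1, 0 ], [ 1, 1, 0, 0, 1 ] ]
```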

Methods

λ decode

Decode the encoded data back into its original format

Defined in preprocessing/data.ts:194

Parameters:

| Param | Type | Default | Description |
|---|---|---|---|
| encoded | any | | |
| decoders | any | | |

Returns:

any[]
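
To illustrate the idea behind decoding, the sketch below maps a single one-hot row back to its category with a lookup table. The structure of the library's `decoders` is not documented on this page, so the `categories` array and the helper `decodeOneHot` are hypothetical stand-ins rather than the actual decoder format.

```typescript
// Hypothetical helper; the real shape of `decoders` is not specified in this reference.
// Finds the position of the 1 in a one-hot row and returns the matching category.
function decodeOneHot(row: number[], categories: string[]): string {
  const index = row.indexOf(1);
  if (index === -1 || index >= categories.length) {
    throw new Error("row does not contain a valid one-hot position");
  }
  return categories[index];
}

const planetNames = ["mars", "saturn", "jupiter"];
const decodedPlanet = decodeOneHot([0, 1, 0], planetNames);
// decodedPlanet -> "saturn"
```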

λ encode

Encode data according to dataKeys and labelKeys

Defined in preprocessing/data.ts:111

Parameters:

| Param | Type | Default | Description |
|---|---|---|---|
| data | any | null | List of records to encode |
| options.dataKeys | string[] | null | Independent variables |
| options.labelKeys | string[] | null | Dependent variables |

Returns:

| Param | Type | Description |
|---|---|---|
| data | any[] | Encoded data |
| decoders | any[] | Decoder |