6. Dataset transformations
scikit-learn provides a library of transformers, which may clean (see Preprocessing data), reduce (see Unsupervised dimensionality reduction), expand (see Kernel Approximation) or generate (see Feature extraction) feature representations.
Like other estimators, these are represented by classes with a fit method, which learns model parameters (e.g. mean and standard deviation for normalization) from a training set, and a transform method, which applies this transformation model to unseen data. fit_transform may be more convenient and efficient for modelling and transforming the training data simultaneously.
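As a minimal sketch of the fit / transform / fit_transform pattern described above, the following uses StandardScaler as one example transformer. The toy arrays and variable names are illustrative assumptions, not data from this guide.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training set and one unseen sample (illustrative values only).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 40.0]])

# fit learns the per-feature mean and standard deviation from the training set.
scaler = StandardScaler()
scaler.fit(X_train)

# transform applies the learned parameters to unseen data.
X_new_scaled = scaler.transform(X_new)

# fit_transform learns the parameters and transforms the training data in one call.
X_train_scaled = StandardScaler().fit_transform(X_train)

print(scaler.mean_)
print(X_new_scaled)
```

Note that the unseen sample is scaled with the statistics learned from the training set, not with its own.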
Combining such transformers, either in parallel or in series, is covered in Pipelines and composite estimators. Pairwise metrics, Affinities and Kernels covers transforming feature spaces into affinity matrices, while Transforming the prediction target (y) considers transformations of the target space (e.g. categorical labels) for use in scikit-learn.
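The sketch below illustrates the combinations mentioned above: make_pipeline chains transformers in series, FeatureUnion runs them in parallel and concatenates their outputs, and LabelEncoder transforms categorical target labels. The choice of StandardScaler and PCA, the random data, and the label values are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Random toy data: 20 samples, 4 features (illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(20, 4)

# Series: scale first, then project the scaled data onto 2 components.
series = make_pipeline(StandardScaler(), PCA(n_components=2))
X_series = series.fit_transform(X)

# Parallel: apply both transformers to X and stack their outputs column-wise,
# giving 4 scaled features plus 2 PCA components.
parallel = FeatureUnion([
    ("scaled", StandardScaler()),
    ("pca", PCA(n_components=2)),
])
X_parallel = parallel.fit_transform(X)

# Target space: map categorical labels to integer codes.
y = ["cat", "dog", "cat", "bird"]
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

print(X_series.shape, X_parallel.shape)
print(list(encoder.classes_), list(y_encoded))
```

LabelEncoder is intended for the target y rather than for input features; the dedicated sections listed below cover each of these tools in detail.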
- 6.1. Pipelines and composite estimators
- 6.2. Feature extraction
- 6.3. Preprocessing data
- 6.4. Imputation of missing values
- 6.5. Unsupervised dimensionality reduction
- 6.6. Random Projection
- 6.7. Kernel Approximation
- 6.8. Pairwise metrics, Affinities and Kernels
- 6.9. Transforming the prediction target (y)