Transformers
Transformers are a basic building block for executing machine learning pipelines. A transformer takes data from a data frame, transforms it using some operation, and outputs one or more new fields back to the data frame. This is a very simple task, but it is at the core of the Spark, Scikit-learn, MLeap and TensorFlow execution engines.
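As a rough illustration of the idea, here is a minimal sketch in plain Python. It is not the MLeap, Spark, or Scikit-learn API: the data frame is assumed to be a simple list of dictionaries, and the `Transformer` class, its parameters, and the field names are all hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: a "data frame" is modeled as a list of rows, each a dict of field -> value.
Row = Dict[str, object]


class Transformer:
    """Hypothetical transformer: reads input fields and writes one new output field."""

    def __init__(self, inputs: List[str], output: str, fn: Callable[..., object]):
        self.inputs = inputs
        self.output = output
        self.fn = fn

    def transform(self, frame: List[Row]) -> List[Row]:
        # Return a new frame; every row keeps its fields and gains the output field.
        return [
            {**row, self.output: self.fn(*(row[name] for name in self.inputs))}
            for row in frame
        ]


frame = [
    {"price": 200000.0, "sqft": 1000.0},
    {"price": 450000.0, "sqft": 1500.0},
]

# Divide one feature by another and store the result in a new field.
price_per_sqft = Transformer(["price", "sqft"], "price_per_sqft", lambda p, s: p / s)
result = price_per_sqft.transform(frame)
```

The input frame is left unchanged and the new field appears only in the returned frame, which mirrors the "read fields, write new fields" behavior described above.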
Transformers are used for many different tasks, but the most common in machine learning are:
- Feature extraction
- Model scoring
Feature Extraction
Feature extraction is the process of taking one or more features from an input dataset and deriving new features from them. In the case of data frames, the features come from the input data frame and are written to the output data frame.
Some examples of feature extraction are:
- String indexing (label encoding) - Taking a string and converting it to an integer value
- One hot encoding - Converting an integer value to a vector of 1s and 0s
- Feature selection - Running analysis to determine which features are most effective for training a predictive ML algorithm (e.g. CHI2)
- Math - Basic mathematical functions, such as dividing two features by each other or taking the log of a feature
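Three of the items in the list above (string indexing, one hot encoding, and a math function) can be sketched in plain Python as follows. The function names are hypothetical, and the indexer here assigns indices in order of first appearance, which is an assumption of this sketch; real libraries may order labels differently (for example, by frequency).

```python
import math
from typing import Dict, List


def fit_string_indexer(values: List[str]) -> Dict[str, int]:
    """Map each distinct string to an integer, in order of first appearance."""
    index: Dict[str, int] = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


def one_hot(i: int, size: int) -> List[float]:
    """Convert an integer index into a vector with a single 1.0 and 0.0 elsewhere."""
    return [1.0 if j == i else 0.0 for j in range(size)]


regions = ["north", "south", "north", "east"]

# String indexing: string -> integer.
index = fit_string_indexer(regions)
encoded = [index[r] for r in regions]

# One hot encoding: integer -> vector of 1s and 0s.
vectors = [one_hot(i, len(index)) for i in encoded]

# Math: take the log of a feature.
log_sqft = math.log(1500.0)
```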
There are too many examples of feature extraction to enumerate here, but take a look at our complete list of supported feature transformers to get an idea of what is possible.
Regression
Regression is used to predict a continuous numeric value, such as the price of a car or a home. Regression models, for the most part, operate on a vector of doubles called a “feature vector”. The feature vector contains all of the known information about what is being predicted. In the case of predicting the price of a house, the feature vector will have things like the encoded region where the house is, the square footage, how many bathrooms there are, how old it is, etc.
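To make the feature vector concrete, here is a minimal sketch of scoring a linear regression model, chosen only as one simple kind of regression. The coefficients, intercept, and feature layout are made-up values for illustration, not a trained model.

```python
from typing import List


def score_linear_regression(
    features: List[float], coefficients: List[float], intercept: float
) -> float:
    """Predict a continuous value as intercept + dot(features, coefficients)."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    return intercept + sum(x * w for x, w in zip(features, coefficients))


# Hypothetical feature vector: [encoded region, square footage, bathrooms, age in years]
features = [2.0, 1500.0, 2.0, 30.0]

# Hypothetical model parameters, assumed to come from an earlier training step.
coefficients = [10000.0, 100.0, 5000.0, -500.0]
intercept = 50000.0

price = score_linear_regression(features, coefficients, intercept)
```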
See a list of supported regression models.
Classification
Classification is used to predict categorical information. An example is making a binary prediction of whether or not to give a consumer a loan. Another example is predicting what type of sound is contained in a .wav file, or whether or not there is a person in an image.
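The loan example can be sketched as a binary classifier. This sketch assumes a logistic regression model with a decision threshold, which is only one of many possible classifiers; the feature layout and parameters are hypothetical values, not a trained model.

```python
import math
from typing import List, Tuple


def predict_loan_approval(
    features: List[float],
    coefficients: List[float],
    intercept: float,
    threshold: float = 0.5,
) -> Tuple[float, int]:
    """Return (probability of approval, predicted label) where the label is 0 or 1."""
    margin = intercept + sum(x * w for x, w in zip(features, coefficients))
    probability = 1.0 / (1.0 + math.exp(-margin))  # logistic (sigmoid) function
    return probability, int(probability >= threshold)


# Hypothetical features: [scaled income, scaled existing debt]
coefficients = [2.0, -3.0]
intercept = 0.0

approve_prob, approve_label = predict_loan_approval([2.0, 0.5], coefficients, intercept)
reject_prob, reject_label = predict_loan_approval([0.5, 1.0], coefficients, intercept)
```

The output is a discrete label drawn from a fixed set (here 0 or 1), which is what separates classification from regression.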
See a list of supported classification models.
Clustering
Clustering is used to assign a label to similar data, thus grouping it into clusters. It is similar to classification in that the predictions are discrete values from a set. Unlike classification models, though, clustering models are part of the unsupervised family of models and do not operate on labeled data. This is useful for feature engineering, anomaly detection, and many other tasks.
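As a small sketch of how a clustering transformer might label data, the code below assigns each point to its nearest centroid, in the style of k-means scoring. The centroids are hypothetical and assumed to have been learned beforehand from unlabeled data; other clustering models work differently.

```python
import math
from typing import List


def assign_cluster(point: List[float], centroids: List[List[float]]) -> int:
    """Return the index of the centroid closest to the point (Euclidean distance)."""
    distances = [math.dist(point, centroid) for centroid in centroids]
    return distances.index(min(distances))


# Hypothetical centroids, assumed to come from an earlier unsupervised training step.
centroids = [[0.0, 0.0], [5.0, 5.0]]

points = [[0.1, 0.2], [4.8, 5.1], [1.0, 0.5]]
labels = [assign_cluster(p, centroids) for p in points]
```

The resulting cluster index can then be written back to the data frame as a new field, for example as an input to later feature engineering steps.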
Other Types of Transformers
Transformers can do ANYTHING! The uses above are just a sample of the most common ones. You can also build transformers to resize images, resample sound data, import data from different data sources, or anything else you can think of. The sky is the limit.