What about feature engineering methods using simple machine learning techniques?
Laurae: This post lists simple methods for engineering features using unsupervised machine learning techniques (anything where you feed in inputs and get an output without needing to know much about the data, like k-means). An essential business rationale is given at the end. This post (formatting slightly edited) was originally made at Kaggle.
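As a concrete instance of that premise, here is a minimal sketch (with made-up data, using scikit-learn) of turning k-means output into features: the model is fit on the inputs alone, and its cluster assignments and centroid distances become new columns.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))  # stand-in for your feature matrix

km = KMeans(n_clusters=8, n_init=10, random_state=42).fit(X)
cluster_id = km.predict(X)       # one categorical feature per row
centroid_dist = km.transform(X)  # 8 distance-to-centroid features

X_augmented = np.hstack([X, centroid_dist, cluster_id[:, None]])
```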
There are many ways to create features.
These include, but are not limited to:
- Linear combinations of features (e.g., v1+v2, 0.132882*v1+95.4294829428*v2, …)
- Categorical encoding (one-hot encoding, ordinal encoding, binary encoding, sum coding, polynomial coding, Helmert coding, forward difference encoding, backward difference encoding, combinations of any of these encodings, self-made encodings; see the one-hot/ordinal sketch after this list)
- Positive Rates
- Negative Rates (both sketched below)
- Canonical Correlation (sketched below)
- Box-Cox transformation
- Yeo-Johnson transformation (both sketched below)
- Discretization of continuous features (many methods exist; see the binning sketch below)
- Model output
- Tree output (features coming from a tree model; see the leaf-index sketch below)
- Any combination of the items on this list, or combinations of combinations, etc. -> you can create an infinite number of features
However, if you create that many features, you need to understand why. You will also need a model that can work through all of those features without being horribly slow.
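To make a few of these concrete, the sketches below use pandas and scikit-learn on made-up data; they illustrate each technique and are not code from the original post. First, two of the simplest categorical encodings, one-hot and ordinal:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})

# One-hot encoding: one 0/1 column per category.
onehot = pd.get_dummies(df["color"], prefix="color")

# Ordinal encoding: map each category to an integer code.
df["color_ordinal"] = df["color"].astype("category").cat.codes
```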
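For positive and negative rates, one common Kaggle reading (an assumption here, not defined in the post) is the per-category rate of a binary target:

```python
import pandas as pd

df = pd.DataFrame({
    "city":   ["A", "A", "B", "B", "B"],
    "target": [1, 0, 1, 1, 0],  # made-up binary target
})

# Positive rate: share of target == 1 within each category.
df["city_pos_rate"] = df.groupby("city")["target"].transform("mean")
# Negative rate is its complement.
df["city_neg_rate"] = 1.0 - df["city_pos_rate"]
```

Note that target-based features like these leak information if computed on the same rows they are used to predict on, so they are usually built out-of-fold.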
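Canonical correlation turns two blocks of features into projections that are maximally correlated with each other; scikit-learn's CCA can produce those projections as new features:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X1 = rng.normal(size=(200, 4))                                 # first feature block
X2 = X1 @ rng.normal(size=(4, 3)) + rng.normal(size=(200, 3))  # correlated second block

cca = CCA(n_components=2).fit(X1, X2)
X1_c, X2_c = cca.transform(X1, X2)  # maximally correlated projections, usable as features
```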
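Both power transformations are available through scikit-learn's PowerTransformer; the practical difference is that Box-Cox requires strictly positive inputs while Yeo-Johnson accepts any sign:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
x = rng.lognormal(size=(1000, 1))  # skewed, strictly positive feature

x_boxcox = PowerTransformer(method="box-cox").fit_transform(x)
x_yeojohnson = PowerTransformer(method="yeo-johnson").fit_transform(x)  # also handles negatives
```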
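Two of the many discretization methods, equal-width and equal-frequency binning, are one-liners in pandas:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
v = pd.Series(rng.normal(size=1000))

equal_width = pd.cut(v, bins=10, labels=False)  # 10 bins of equal width
equal_freq = pd.qcut(v, q=10, labels=False)     # 10 bins of (roughly) equal population
```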
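Finally, one standard form of tree output features is the index of the leaf each tree routes a row to; with scikit-learn's ensembles, `apply` returns exactly that:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # made-up binary target

forest = RandomForestClassifier(n_estimators=20, random_state=2).fit(X, y)
leaves = forest.apply(X)  # shape (500, 20): one leaf index per tree

# Each column is a high-cardinality categorical feature,
# typically one-hot encoded before feeding a linear model.
```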