What about feature engineering methods using simple machine learning techniques?

Laurae: This post lists simple methods for engineering features using unsupervised machine learning techniques (anything where you work from the inputs alone and do not need to know much about the output, such as k-means). An essential business rationale is given at the end. This post (formatting slightly edited) was originally made at Kaggle.
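
As a concrete illustration of the k-means idea mentioned above, here is a minimal sketch (assuming scikit-learn and NumPy; the random data and the choice of 8 clusters are invented for illustration) that appends each row's distance to every cluster centre, plus its cluster assignment, as new features:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical numeric feature matrix: 1000 rows, 5 original features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))

# Fit k-means purely on the inputs (no target needed), then use the
# distance of each row to every cluster centre as extra features.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
cluster_distances = kmeans.transform(X)          # shape (1000, 8)
cluster_id = kmeans.predict(X).reshape(-1, 1)    # hard assignment as a feature

X_augmented = np.hstack([X, cluster_distances, cluster_id])
print(X_augmented.shape)  # (1000, 5 + 8 + 1)
```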

There are many ways to create features.

These include, but are not limited to, the following (a short code sketch of a few of them appears after the list):

  • Linear combinations of features (e.g. v1+v2, 0.132882*v1+95.4294829428*v2, …)
  • Categorical encoding (one-hot encoding, ordinal encoding, binary encoding, sum coding, polynomial coding, Helmert coding, forward difference encoding, backward difference encoding, combination of any of these encodings, self-made encodings)
  • PCA
  • ICA
  • PLS
  • Positive Rates
  • Negative Rates
  • Canonical Correlation
  • Box-Cox transformation
  • Yeo-Johnson transformation
  • Normalization
  • Standardization
  • Discretization of continuous features (many methods existing)
  • Model output
  • Tree output (features coming from a tree model)
  • Any combination of items on this list, or combinations of combinations, etc. -> you can create an infinite number of features
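
To make a few of these concrete, here is a minimal sketch (assuming scikit-learn and NumPy; the toy data, column counts, and parameter choices are invented for illustration) that derives standardized PCA components, a Yeo-Johnson transform, one-hot encodings, and quantile discretization from the same raw inputs and stacks them next to the originals:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import (KBinsDiscretizer, OneHotEncoder,
                                   PowerTransformer, StandardScaler)

# Hypothetical raw data: 500 rows, 4 skewed numeric columns and 1 categorical column.
rng = np.random.default_rng(42)
X_num = rng.lognormal(size=(500, 4))
X_cat = rng.choice(["red", "green", "blue"], size=(500, 1))

# Standardization, then PCA components as new linear-combination features.
X_std = StandardScaler().fit_transform(X_num)
pca_features = PCA(n_components=2).fit_transform(X_std)

# Yeo-Johnson transformation to reduce skew (Box-Cox would require strictly positive values).
yj_features = PowerTransformer(method="yeo-johnson").fit_transform(X_num)

# One-hot encoding of the categorical column.
onehot_features = OneHotEncoder().fit_transform(X_cat).toarray()

# Discretization of continuous features into 5 equal-frequency bins.
binned_features = KBinsDiscretizer(
    n_bins=5, encode="ordinal", strategy="quantile"
).fit_transform(X_num)

# Stack everything next to the originals; the set of candidate features grows quickly.
X_engineered = np.hstack(
    [X_num, pca_features, yj_features, onehot_features, binned_features]
)
print(X_engineered.shape)  # (500, 4 + 2 + 4 + 3 + 4)
```

In practice you would fit these transformers on the training split only and reuse the fitted objects on validation and test data, to avoid leaking information from outside the training set.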

However, if you create that many features, you need to understand why. You will also need a model that can work through all of these features without becoming horribly slow.
