Data Preprocessing
Published in Data Preprocessing

Data Reduction

As datasets grow, so does the need to process them accurately, and analysis over very large volumes of data becomes difficult. Data reduction techniques are therefore applied during preprocessing in data mining to obtain a smaller representation of the data that still yields the required results. This can be done in the following ways.

Image by @ibm
  1. Data Cube Aggregation:
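    Aggregation operations are applied to the data to construct a data cube, which stores summarized values (for example, yearly totals instead of quarterly figures) in place of the detailed records.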
  2. Attribute Subset Selection:
    Only attributes that are highly relevant to the analysis are kept, and irrelevant or redundant attributes are discarded. A significance threshold can be set, and any attribute that falls below it is dropped.
  3. Numerosity Reduction:
    The actual data is replaced by a smaller representation. For example, a model is fitted to the data and only its parameters are stored, while the original records are discarded.
  4. Dimensionality Reduction:
    Encoding mechanisms are used to reduce the size of the data. Depending on the method, information may or may not be lost. If the original data can be fully reconstructed from the reduced data, the reduction is lossless. Otherwise it is lossy, and the discarded detail cannot be recovered.
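
The four techniques above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy `SALES` records, the function names, and the choice of a variance threshold as the relevance criterion are all assumptions made for the example. The straight-line fit stands in for "storing a model instead of the data", and `zlib` stands in for a lossless encoding.

```python
from collections import defaultdict
from statistics import mean, pvariance
import zlib

# Hypothetical toy records: (year, quarter, region, units_sold)
SALES = [
    (2023, "Q1", "north", 120),
    (2023, "Q2", "north", 135),
    (2023, "Q1", "south", 80),
    (2023, "Q2", "south", 95),
    (2024, "Q1", "north", 150),
    (2024, "Q1", "south", 110),
]


def aggregate_cube(records, dims):
    """Data cube aggregation: sum units_sold over the chosen dimension indexes."""
    totals = defaultdict(int)
    for row in records:
        key = tuple(row[i] for i in dims)
        totals[key] += row[-1]
    return dict(totals)


def select_attributes(columns, min_variance):
    """Attribute subset selection: keep columns whose variance reaches a threshold.

    Variance is only one possible relevance criterion (an assumption here).
    """
    return [name for name, values in columns.items()
            if pvariance(values) >= min_variance]


def fit_line(xs, ys):
    """Numerosity reduction: keep only the slope and intercept of a least-squares line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def compress_lossless(text):
    """Lossless encoding: the original text can be restored exactly."""
    return zlib.compress(text.encode("utf-8"))


def restore(blob):
    """Invert compress_lossless."""
    return zlib.decompress(blob).decode("utf-8")


def reduce_lossy(values, digits=1):
    """Lossy reduction: rounding discards detail that cannot be recovered."""
    return [round(v, digits) for v in values]


if __name__ == "__main__":
    print(aggregate_cube(SALES, dims=(0,)))      # totals per year
    print(aggregate_cube(SALES, dims=(0, 2)))    # totals per year and region
    columns = {"units": [120, 135, 80, 95], "constant_flag": [1, 1, 1, 1]}
    print(select_attributes(columns, min_variance=1.0))
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
    message = "data reduction " * 20
    print(restore(compress_lossless(message)) == message)
    print(reduce_lossy([3.14159, 2.71828]))
```

Rolling `SALES` up to `dims=(0,)` collapses six rows into two yearly totals, which is the kind of summary a data cube stores. The round trip through `compress_lossless` and `restore` returns the original string, while nothing can turn the output of `reduce_lossy` back into the original values.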

Gurram Bhaskar

I am an aspiring data scientist who enjoys connecting the dots: be it ideas from different disciplines, people from different teams.