Data Science 365

Bring data into actionable insights.

3 Easy Steps to Perform Dimensionality Reduction Using Principal Component Analysis (PCA)

10 min read · Jan 3, 2023

Photo by Ashley Jurius on Unsplash

What is the dimensionality of a dataset?

In the context of both statistics and machine learning, the dimensionality of a dataset refers to the number of input variables (features) in the dataset.

If the dataset contains only two input variables as in the following image, it is called a two-dimensional dataset. In this case, the observations (data points) can be plotted in a 2D scatterplot.

Two-dimensional data (Image by author)

If we add another variable called Age to the same dataset, we implicitly add another dimension to the dataset. Now, the dataset becomes three-dimensional and the observations (data points) can be plotted in a 3D scatterplot.

Three-dimensional data (Image by author)
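As a minimal sketch of this idea, the snippet below treats a dataset as a NumPy feature matrix in which each column is one input variable. The feature names Height and Weight and all the values are hypothetical, chosen only for illustration. Only the Age variable comes from the example above.

```python
import numpy as np

# Hypothetical two-feature dataset: each row is one observation,
# each column is one input variable (names and values are assumed).
feature_names = ["Height", "Weight"]
X_2d = np.array([
    [170.0, 65.0],
    [160.0, 58.0],
    [182.0, 80.0],
    [175.0, 72.0],
])


def dimensionality(X):
    """Return the number of input variables (columns) of a 2D feature matrix."""
    return X.shape[1]


# Adding an Age column appends one more dimension to the same observations.
age = np.array([34.0, 28.0, 45.0, 39.0])
X_3d = np.column_stack([X_2d, age])

print(dimensionality(X_2d))  # 2
print(dimensionality(X_3d))  # 3
```

The number of rows (observations) stays the same. Only the number of columns, and therefore the dimensionality, changes.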

Likewise, a dataset with many variables is high-dimensional. It is…

Written by Rukshan Pramoditha

3,000,000+ Views | BSc in Stats (University of Colombo, Sri Lanka) | Top 50 Data Science, AI/ML Technical Writer on Medium
