# Why is the second Principal Component orthogonal (perpendicular) to the first one?

Because **the second principal component** (PC2) is defined to capture the most variance **from what is left** after the first principal component (PC1) has explained as much of the data as it can. (PC1 captures more of the data's variance than any other single direction.)

But why does the orthogonal direction capture the most variation?

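If two directions are not orthogonal, they overlap: the second direction can be split into a part that lies along the first and a part that is perpendicular to it. (This is not the same as linear dependence. Two non-parallel directions are linearly independent even when they are not orthogonal.) The part that lies along PC1 only re-measures variance that PC1 has already captured. A direction orthogonal to PC1 has no such overlap, so the projections of the data onto it are uncorrelated with the projections onto PC1, and all the variance it captures is new.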
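The overlap can be checked numerically. The following is a minimal sketch, not part of the original explanation: the data set, the random seed, and the 30-degree tilt are all made-up assumptions chosen only for illustration. It compares the projections onto PC1 with the projections onto PC2 (orthogonal) and onto a tilted direction (not orthogonal).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed, for reproducibility only
# Hypothetical correlated 2-D data set (500 samples), invented for this example
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=500)
Xc = X - X.mean(axis=0)  # PCA works on centered data

# Principal components are the eigenvectors of the covariance matrix.
# eigh returns eigenvalues in ascending order, so the last column is PC1.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1, pc2 = eigvecs[:, 1], eigvecs[:, 0]

# A unit direction tilted 30 degrees from PC2 toward PC1 (not orthogonal to PC1)
angle = np.deg2rad(30)
tilted = np.cos(angle) * pc2 + np.sin(angle) * pc1

# Project the data onto each direction
scores_pc1 = Xc @ pc1
scores_pc2 = Xc @ pc2
scores_tilted = Xc @ tilted

corr_orthogonal = np.corrcoef(scores_pc1, scores_pc2)[0, 1]
corr_tilted = np.corrcoef(scores_pc1, scores_tilted)[0, 1]

print(f"corr(PC1 scores, PC2 scores):    {corr_orthogonal:.3f}")
print(f"corr(PC1 scores, tilted scores): {corr_tilted:.3f}")
```

Under these assumptions, the first correlation should be zero up to floating-point error, while the second should be clearly nonzero: the tilted direction partly repeats what PC1 already measures.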

PC1 is the direction along which the data varies the most. To find the second principal component (PC2), PCA then looks for the direction that captures the most variance among all directions orthogonal (perpendicular) to PC1.
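This search can be written as a pair of optimization problems. The notation is an assumption introduced here, not taken from the original: $\Sigma$ is the covariance matrix of the centered data, $w$ is a unit-length direction vector, and $w^\top \Sigma\, w$ is the variance of the data projected onto $w$.

$$
w_1 = \arg\max_{\lVert w \rVert = 1} \; w^\top \Sigma\, w,
\qquad
w_2 = \arg\max_{\substack{\lVert w \rVert = 1 \\ w^\top w_1 = 0}} \; w^\top \Sigma\, w
$$

Because $\Sigma$ is symmetric, the solutions can be taken to be its eigenvectors with the largest and second-largest eigenvalues, and those eigenvalues are the variances captured along $w_1$ and $w_2$.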

This is because the goal of PCA is to capture as much variation as possible with as few principal components as possible. By looking for orthogonal directions, each subsequent PC captures additional unique variation in the data.
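One consequence of orthogonality is that the variances captured by the individual PCs add up to the total variance of the data, with nothing counted twice. The sketch below illustrates this with NumPy on a hypothetical 3-D data set. The data, the seed, and the variable names are assumptions made for the example, and the eigendecomposition of the covariance matrix is just one of several ways to compute PCA.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed
# Hypothetical 3-D data with correlated features (random linear mixing)
mixing = rng.normal(size=(3, 3))
X = rng.normal(size=(1000, 3)) @ mixing
Xc = X - X.mean(axis=0)  # center the data

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)              # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder: PC1 first

# The PCs are mutually orthogonal unit vectors, so this is close to the identity
gram = eigvecs.T @ eigvecs

# Variance of the data projected onto each PC equals the matching eigenvalue
scores = Xc @ eigvecs
score_var = scores.var(axis=0, ddof=1)

# The per-PC variances sum to the total variance (trace of the covariance)
total_variance = np.trace(cov)
explained_ratio = eigvals / total_variance

print("PC variances:        ", np.round(score_var, 3))
print("Cumulative explained:", np.round(np.cumsum(explained_ratio), 3))
```

The cumulative explained-variance ratio reaches 1 once all three PCs are included, which is what makes it meaningful to keep only the first few components.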