Natural Language Processing (Part 32)-PCA Algorithm

Coursesteach
5 min read · Feb 25, 2024


📚Chapter 3: Vector Space Model

Introduction

You will now learn about Eigenvalues and Eigenvectors, and you’ll see how you can use them to reduce the dimension of your features. First, I’ll show you how to get uncorrelated features for your data, and then how to reduce the dimensions of your word representations while trying to keep as much information as possible from your original embedding. To perform dimensionality reduction using PCA, begin with your original vector space, then get uncorrelated features for your data, and finally project your data onto the number of desired features that retain the most information.

Sections

How to get uncorrelated Features
How to reduce dimensions while retaining as much information as possible

Section 1- How to get uncorrelated Features

You may recall from algebra that matrices have Eigenvectors and Eigenvalues. You do not need to remember how to compute them right now, but you should keep in mind that in PCA they are useful because the Eigenvectors of the covariance matrix of your data give the directions of uncorrelated features, and the Eigenvalues are the variances of your data along each of those new features.

So to perform PCA, you will need to get the Eigenvectors and Eigenvalues from the covariance matrix of your data.

The first step is to get a set of uncorrelated features. For this step, you mean-normalize your data (the first step in PCA is to standardize the data; since the scale of the data influences PCA, standardizing it so that each feature has a mean of 0 and a variance of 1 ensures that the analysis is not biased towards variables with greater magnitude), and then you compute the covariance matrix.

Covariance Matrix Computation: PCA looks at the variance and the covariance of the data. Variance is a measure of the variability of a single feature, and covariance is a measure of how much two features change together. The covariance matrix is a table where each element represents the covariance between two features.
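Here is a minimal NumPy sketch of these two steps. The matrix X is a hypothetical stand-in for your word embeddings (rows are words, columns are features), used only for illustration:

```python
import numpy as np

# Hypothetical word-embedding matrix: 10 words, 5 features each.
X = np.random.rand(10, 5)

# Step 1: mean-normalize (center) each feature so PCA is not
# biased towards features with a larger scale.
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix of the features (5 x 5).
# rowvar=False tells NumPy that columns are the features.
cov_matrix = np.cov(X_centered, rowvar=False)
print(cov_matrix.shape)  # (5, 5)
```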

And finally, perform a singular value decomposition to get a set of three matrices. The first of those matrices contains the Eigenvectors stacked column-wise, and the second one has the Eigenvalues on the diagonal. The singular value decomposition is already implemented in many programming libraries, so you don’t have to worry about how it works.

Eigenvalue and Eigenvector Calculation: From the covariance matrix, eigenvalues and eigenvectors are calculated. Eigenvectors are the directions of the axes where there is the most variance (i.e., the principal components), and eigenvalues are coefficients attached to eigenvectors that give the amount of variance carried in each Principal Component.
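Continuing the sketch above, the singular value decomposition of the covariance matrix gives you both pieces at once: the columns of U are the Eigenvectors and S holds the Eigenvalues (variances):

```python
# SVD of the (symmetric) covariance matrix: U holds the Eigenvectors
# stacked column-wise, S holds the Eigenvalues (variances).
U, S, Vt = np.linalg.svd(cov_matrix)

# np.linalg.svd already returns S sorted in descending order,
# so the first columns of U point in the directions of largest variance.
print(U.shape, S.shape)  # (5, 5) (5,)
```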

Section 2- How to reduce dimensions while retaining as much information as possible.

The next step is to project your data onto a new set of features. You will be using the Eigenvectors and Eigenvalues in this step; let’s denote the Eigenvectors with U and the Eigenvalues with S. First, you perform the dot product between the matrix containing your word embeddings and the first n columns of the U matrix, where n equals the number of dimensions that you want to have at the end. For visualization, it’s common practice to have two dimensions. Then you get the percentage of variance retained in the new vector space. As an important side note, your Eigenvectors and Eigenvalues should be organized according to the Eigenvalues in descending order. This condition will ensure that you retain as much information as possible from your original embedding. However, most libraries order those matrices for you.

Transforming Data: Finally, the original data is projected onto the principal components (eigenvectors) to transform the data into a new space. This results in a new dataset where the variables are uncorrelated and where the first few variables retain most of the variability of the original data.
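A short sketch of this last step, again continuing from the hypothetical X_centered, U, and S defined above; the choice of n = 2 is just the common visualization setting mentioned earlier:

```python
# Project the centered data onto the first n principal components.
n = 2  # two dimensions, e.g. for visualization
X_reduced = X_centered @ U[:, :n]   # shape (10, 2)

# Fraction of the total variance retained by those n components.
variance_retained = S[:n].sum() / S.sum()
print(X_reduced.shape, round(variance_retained, 3))
```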

Summary

Wrapping up, the Eigenvectors from the covariance matrix of your normalized data give the directions of uncorrelated features. The Eigenvalues associated with those Eigenvectors tell you the variance of your data along those features. The dot product between your word embeddings and the matrix of Eigenvectors projects your data onto a new vector space of the dimension that you choose.

Please follow and 👏 clap for the story, Coursesteach, to see the latest updates on this story.

If you want to learn more about these topics: Python, Machine Learning, Data Science, Statistics for Machine Learning, Linear Algebra for Machine Learning, Computer Vision, and Research.

Then log in and enroll in Coursesteach to get fantastic content in the data field.

Stay tuned for our upcoming articles where we will explore specific topics related to NLP in more detail!

Remember, learning is a continuous process. So keep learning and keep creating and sharing with others!💻✌️

Note: if you are an NLP expert and have good suggestions to improve this blog, please write comments and contribute.

If you want more updates about NLP and want to contribute, then follow and enroll in the following:

👉Course: Natural Language Processing (NLP)

👉📚GitHub Repository

👉 📝Notebook

Do you want to get into data science and AI and need help figuring out how? I can offer you research supervision and long-term career mentoring.
Skype: themushtaq48, email:mushtaqmsit@gmail.com

Contribution: We would love your help in making the Coursesteach community even better! If you want to contribute to some courses, or if you have any suggestions for improvement in any Coursesteach content, feel free to contact and follow.

Together, let’s make this the best AI learning Community! 🚀

To Do List

1- Watch video 2 and include its important points in this blog

2- Include an easy explanation of Eigenvectors and Eigenvalues

3- Collect key points from the blogs

👉WhatsApp

👉 Facebook

👉Github

👉LinkedIn

👉Youtube

👉Twitter

Source

1- Natural Language Processing with Classification and Vector Spaces

2- Principal Component Analysis (PCA)

3- Eigenvalues and Eigenvectors and their use in Machine Learning and AI
