# How to add text similarity to your applications easily using MediaPipe and Python

Machine learning is everywhere in mobile apps: recommending people or products, detecting people or tags in an image, and so on.

Recommending means finding similar items, so the user can discover and consume something new.

Embeddings are an essential component when you need to obtain similarities, and they are used for:

- Semantic search
- Clustering
- Recommendations
- Anomaly detection
- Classification

An embedding is a numerical representation of a piece of text: a vector of floating-point numbers, like this:

```
"embedding": [
  -0.006929283495992422,
  -0.005336422007530928,
  ...
  -4.547132266452536e-05,
  -0.024047505110502243
]
```

Now you can perform operations between two embeddings, for example, measuring the distance between them to quantify their relationship. Small distances suggest a strong relationship and large distances suggest a weak one. The three most commonly used measures are listed below:

- Euclidean Distance — Distance between ends of vectors
- Cosine — Cosine of the angle between vectors
- Dot product — Cosine multiplied by lengths of both vectors

How do you get embeddings? This is our next question. There are some libraries and APIs to get embeddings like the following:

- MediaPipe
- Vertex AI client libraries
- PaLM API by Google
- OpenAI API
- SentenceTransformers
- FastText by Facebook

In the next section, we’ll talk about MediaPipe.

# MediaPipe Solutions

MediaPipe Solutions provides a set of libraries and tools to apply machine learning (ML) to your applications quickly. You can use SDKs for multiple platforms, including Android, Python, and Web, to get your machine learning up and running in minutes. Additionally, you can customize models using transfer learning. MediaPipe offers libraries and resources to use in your applications, plus tools to customize and evaluate new solutions.

MediaPipe also offers a library for getting image and text embeddings; in our case, we use it for text. The MediaPipe documentation recommends the Universal Sentence Encoder model.

# How to get embeddings with MediaPipe

We are using the SDK for Python to get embeddings.

These are the steps:

- Install the MediaPipe Python package

`pip install mediapipe`

- Download the model and store it in your project [download model].
- Then you can get text embeddings like this:
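A minimal sketch of that step using the MediaPipe Tasks Python API. The model filename `universal_sentence_encoder.tflite` is an assumption based on the download step above; adjust it to wherever you stored the model.

```python
from mediapipe.tasks import python
from mediapipe.tasks.python import text

# Assumed filename for the model downloaded in the previous step
MODEL_PATH = "universal_sentence_encoder.tflite"

base_options = python.BaseOptions(model_asset_path=MODEL_PATH)
options = text.TextEmbedderOptions(base_options=base_options)

with text.TextEmbedder.create_from_options(options) as embedder:
    result = embedder.embed("How's it going?")
    # result.embeddings[0].embedding holds the vector of floats
    print(result.embeddings[0].embedding[:5])
```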

Okay, the next step would be to determine the level of similarity between the sentences.

Let’s look at an example with two sentences and their embeddings, something like this:

Sentences:

```
{
  "text1": "How's it going?",
  "text2": "I am fine"
}
```

Embeddings:

```
{
  "text1embedding": "[127 16 185 127 82 127 128 50 127 127 172 10 127 128 127 127 7 160\n 128 128 128 90 127 238 70 127 246 128 127 127 170 128 182 185 9 76\n 154 196 4 42 136 127 127 127 128 28 151 127 127 4 135 127 80 157\n 77 90 113 41 15 127 128 167 127 83 1 127 217 60 128 90 255 2\n 161 232 24 171 127 9 55 12 127 210 127 87 181 79 127 88 128 124\n 128 7 128 128 128 19 127 127 250 145]",
  "text2embedding": "[127 44 209 127 35 127 128 128 127 81 176 26 127 128 127 127 242 180\n 139 128 128 127 127 147 126 127 230 128 127 127 200 137 128 9 65 70\n 217 128 22 124 142 127 118 127 194 131 128 127 110 245 142 127 127 151\n 127 50 67 61 248 127 128 128 127 36 216 127 218 106 151 78 20 223\n 182 189 222 233 127 1 76 11 127 253 127 33 186 127 127 235 128 121\n 128 4 128 128 175 187 127 87 228 141]"
}
```

# How to get the distance between two embeddings

There are three widely used measures for obtaining similarities between embeddings. These are:

**Euclidean Distance — Distance between ends of vectors**

The L2 norm measures a vector’s length from the origin of the vector space, so the Euclidean distance between two embeddings is the L2 norm of their difference. We can calculate it with Python and the Numpy library using the **norm** function.
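For example, a small Numpy sketch (the `euclidean_distance` helper is just an illustrative name):

```python
import numpy as np

def euclidean_distance(u, v):
    # Euclidean distance is the L2 norm of the difference vector
    return np.linalg.norm(np.asarray(u) - np.asarray(v))

# Smaller distance -> more closely related embeddings
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # sqrt(27)
```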

**Dot product — Cosine multiplied by lengths of both vectors**

The dot product is a simple vector operation: we multiply the vectors element by element and then sum the products. We can perform this operation with Python and the Numpy library using the **dot** function.
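A quick sketch of the same idea in Numpy (the `dot_product` wrapper is illustrative):

```python
import numpy as np

def dot_product(u, v):
    # Multiply element by element, then add the products
    return np.dot(u, v)

print(dot_product([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```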

**Cosine — Cosine of the angle between vectors**

We can perform this operation with Python and the Numpy library something like this:
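A sketch of the cosine computation with Numpy, built from the two measures above (the `cosine_similarity` helper name is illustrative):

```python
import numpy as np

def cosine_similarity(u, v):
    u, v = np.asarray(u), np.asarray(v)
    # Cosine of the angle: dot product divided by the product of the lengths
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Vectors pointing the same way give 1.0; orthogonal vectors give 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```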

Although calculating the similarity with Numpy is easy, with **MediaPipe** it is even **easier**, because you only need a single method call:
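A sketch of that single call, using the `TextEmbedder.cosine_similarity` utility from the MediaPipe Tasks API (the model filename is an assumption, as before):

```python
from mediapipe.tasks import python
from mediapipe.tasks.python import text

base_options = python.BaseOptions(
    model_asset_path="universal_sentence_encoder.tflite")  # assumed filename
options = text.TextEmbedderOptions(base_options=base_options)

with text.TextEmbedder.create_from_options(options) as embedder:
    first = embedder.embed("How's it going?").embeddings[0]
    second = embedder.embed("I am fine").embeddings[0]
    # One utility call instead of the manual Numpy computation above
    print(text.TextEmbedder.cosine_similarity(first, second))
```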

So let’s look at a complete example.
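Putting the pieces together, here is a hedged end-to-end sketch. The helper name `text_similarity`, the model filename, and the `quantize=True` option (which produces integer embeddings like the arrays shown earlier) are assumptions, not the only way to wire this up.

```python
from mediapipe.tasks import python
from mediapipe.tasks.python import text

def text_similarity(text1: str, text2: str,
                    model_path: str = "universal_sentence_encoder.tflite") -> float:
    """Embed both texts with MediaPipe and return their cosine similarity."""
    base_options = python.BaseOptions(model_asset_path=model_path)
    # quantize=True yields scalar-quantized embeddings
    options = text.TextEmbedderOptions(base_options=base_options, quantize=True)
    with text.TextEmbedder.create_from_options(options) as embedder:
        first = embedder.embed(text1).embeddings[0]
        second = embedder.embed(text2).embeddings[0]
        return text.TextEmbedder.cosine_similarity(first, second)

print(text_similarity("How's it going?", "I am fine"))
```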

I hope this information is useful to you. Remember to share this blog post — your comments are always welcome.


# Resources

- https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture
- https://platform.openai.com/docs/guides/embeddings
- https://platform.openai.com/docs/guides/embeddings/what-are-embeddings
- https://developers.google.com/mediapipe
- https://developers.google.com/machine-learning/clustering/similarity/measuring-similarity
- https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings#generative-ai-get-text-embedding-python
- https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-multimodal-embeddings
- https://developers.generativeai.google/
- https://www.sbert.net/index.html
- https://github.com/facebookresearch/fastText
- https://developers.google.com/machine-learning/crash-course/embeddings/obtaining-embeddings
- https://github.com/jggomez/Text-Embedings-API
- https://en.wikipedia.org/wiki/Transfer_learning