How to Represent a Poodle as a Vector

Finding your perfect dog using vector representations for fast nearest neighbour searches

Graham Horne
Jun 11 · 5 min read

Word vectorization?

The aim of word vectorization is to give each word in a corpus a “vector” representation: a list of numbers. It works by extracting information from a text corpus, typically about the contexts in which each word appears, so that words used in similar ways end up with similar vectors.
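
To make this concrete, here is a minimal sketch of one simple way to extract vectors from text: counting which words appear near each other. The corpus, the function name `cooccurrence_vectors` and the window size are all made up for illustration, and real systems usually learn much denser vectors than these raw counts.

```python
from collections import Counter

# Toy corpus (hypothetical sentences, invented for illustration).
corpus = [
    "the poodle is a calm dog",
    "the terrier is a lively dog",
    "the poodle has a curly coat",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    tokenised = [s.split() for s in sentences]
    vocab = sorted({w for sent in tokenised for w in sent})
    counts = {w: Counter() for w in vocab}
    for sent in tokenised:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][sent[j]] += 1
    # Turn each Counter into a fixed-length list ordered by the vocabulary.
    return vocab, {w: [counts[w][v] for v in vocab] for w in vocab}

vocab, vectors = cooccurrence_vectors(corpus)
print(vocab)
print("poodle ->", vectors["poodle"])
```

Each position in the printed list corresponds to one vocabulary word, so "poodle" is now a point in a space with one dimension per word.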

So how do these vectors help me to find the right poodle?

For a comprehensive model of poodles, we might include the countries and climates suited to certain types of poodle, along with behavioural patterns, vet care, extensive articles on dog health and lifespans, the history of each breed, etc.

  • Kid Rating

Ok, I get the concept, but why is it interesting?

This technique goes further than grouping words: it also enables arithmetic operations between them. This means that you can do the following:

  • You can use these vectors to feed another more ambitious machine learning algorithm (for example, a neural network).
  • The ultimate goal is to allow machines to understand human language, not by learning it by heart but by having a structured representation of it.
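
As a rough illustration of arithmetic between vectors, the sketch below uses hand-picked three-dimensional vectors. The breed names, the dimensions (size, curliness, energy) and every number are hypothetical; a trained model would learn its own dimensions from text.

```python
import math

# Hypothetical vectors, hand-picked for illustration only.
# Assumed dimensions: [size, curliness, energy]
vectors = {
    "standard_poodle": [0.9, 0.9, 0.6],
    "toy_poodle":      [0.2, 0.9, 0.6],
    "great_dane":      [1.0, 0.1, 0.4],
    "chihuahua":       [0.1, 0.1, 0.7],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "standard poodle" - "great dane" + "chihuahua":
# take away a large dog, add a small one, and see what is closest.
query = add(sub(vectors["standard_poodle"], vectors["great_dane"]),
            vectors["chihuahua"])

ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
best = ranked[0]
print(best)
```

With these made-up numbers the closest breed to the result is the toy poodle, which is the kind of "analogy by arithmetic" that word vectors are known for.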

Sounds feasible! Where do I start with our dogs?

We take each breed's features and output them in a vector format:

  • Life Span: 10 to 15 years
  • Height: 7 inches to 1 foot, 3 inches tall
  • Temperament Rating: Calm
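
One possible way to turn the features listed above into numbers is sketched here. The scaling constants, the temperament scale and the function name `encode_breed` are assumptions made for this example and do not come from a real dataset.

```python
# Assumed scaling constants and temperament scale (not from the article).
MAX_LIFE_SPAN_YEARS = 20.0
MAX_HEIGHT_INCHES = 36.0
TEMPERAMENT_SCALE = {"Calm": 0.0, "Moderate": 0.5, "Lively": 1.0}

def encode_breed(life_span_years, height_inches, temperament):
    """Encode (min, max) ranges and a temperament label as floats in [0, 1]."""
    life_mid = sum(life_span_years) / 2      # midpoint of the life span range
    height_mid = sum(height_inches) / 2      # midpoint of the height range
    return [
        life_mid / MAX_LIFE_SPAN_YEARS,
        height_mid / MAX_HEIGHT_INCHES,
        TEMPERAMENT_SCALE[temperament],
    ]

# Life span 10 to 15 years, height 7 to 15 inches (1 foot, 3 inches), calm.
poodle = encode_breed(life_span_years=(10, 15),
                      height_inches=(7, 15),
                      temperament="Calm")
print(poodle)
```

Scaling everything into the same 0 to 1 range keeps any single feature, such as height in inches, from dominating a similarity score.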

Query Example

The example query (in this case for Elasticsearch) first narrows the candidates with a range query on a vector reduced by Principal Component Analysis (PCA). It then re-ranks those candidates with a function score: an inline vector scoring script that calls an associated vector library (a cosine similarity function) against the full vector.

POST my_index/_search
{
  "query": {
    "function_score": {
      "query": {
        "range": {
          "pca_reduced_vector": {
            "from": "-0.5,-0.5,-0.5,-0.5,-0.5,-0.5,-0.5,-0.5",
            "to": "0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5"
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "inline": "vector_scoring",
              "lang": "binary_vector_score",
              "params": {
                "vector_field": "full_vector",
                "vector": [ 0.0, 0.0716, 0.1761, 0.0, 0.0779, 0.0, 0.1382, 0.3729 ]
              }
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "size": 10
}
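
The two-stage idea behind the query above (a cheap range filter on short vectors, then a cosine re-rank on full vectors) can be sketched in plain Python. This is an analogy under stated assumptions, not how Elasticsearch or the scoring script works internally: the `documents` dictionary, the `search` function and all the numbers are invented, and the "reduced" vectors simply stand in for a PCA projection rather than being computed by one.

```python
import math

# Hypothetical index: each document has a short "reduced" vector (standing in
# for a PCA projection) and a longer "full" vector. All values are invented.
documents = {
    "toy_poodle":      {"reduced": [0.1, -0.2], "full": [0.0, 0.07, 0.18, 0.37]},
    "standard_poodle": {"reduced": [0.3, 0.4],  "full": [0.1, 0.05, 0.20, 0.30]},
    "great_dane":      {"reduced": [0.9, -0.8], "full": [0.9, 0.80, 0.00, 0.05]},
    "chihuahua":       {"reduced": [-0.4, 0.1], "full": [0.5, 0.00, 0.10, 0.00]},
}

def in_range(vector, lower, upper):
    """True if every component lies inside its [lower, upper] bound."""
    return all(lo <= x <= hi for x, lo, hi in zip(vector, lower, upper))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_full, lower, upper, size=10):
    # Stage 1: cheap box filter on the reduced vectors.
    candidates = [name for name, doc in documents.items()
                  if in_range(doc["reduced"], lower, upper)]
    # Stage 2: re-rank the survivors by cosine similarity on the full vectors.
    scored = sorted(((cosine(query_full, documents[name]["full"]), name)
                     for name in candidates), reverse=True)
    return [name for _, name in scored[:size]]

results = search([0.0, 0.07, 0.18, 0.37], lower=[-0.5, -0.5], upper=[0.5, 0.5])
print(results)
```

The filter throws out anything far away in the reduced space before the more expensive cosine comparison runs, which is what makes the approach attractive for large indexes.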

Want to explore more about dense vectors?

Vector Space Model Software

The following software packages may be of interest to you if you want to experiment with vector models and implement search services based upon them.

SEEK blog

At SEEK we’ve created a community of valued, talented, diverse individuals that really know their stuff. Enjoy our Product & Technical insights…

Written by Graham Horne

Search Specialist @ seek.com.au, guitarist and wannabe singer (living the dream)
