Milvus — The Unstructured Olympics of the Mind — AI? Data?

Tim Spann
4 min read · Aug 9, 2024

Milvus, Vector Database, Python, SDK, Zilliz, Paris, Olympics 2024, Data

The first set of data we want comes from the very information-heavy Wikipedia page on the Paris 2024 Summer Olympics.

This is pretty easy. We grab the whole page, vectorize it, and store the summary and the title as VARCHAR fields. If we want to expand and do some analytics, we could grab all of the Olympics pages on Wikipedia. These are things to think about when turning our current Bronze Demo into a potential Gold Demo.
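As a rough sketch of that step, the code below builds one row for the page. It assumes a collection with `title` and `summary` VARCHAR fields plus one float-vector field. The `hashing_embed` function is a hypothetical toy stand-in for a real embedding model. The field names, `DIM`, and the byte limits are also assumptions for illustration. The strings are trimmed by UTF-8 bytes so they stay safely under a VARCHAR `max_length`.

```python
import hashlib
import math

DIM = 64  # assumed vector dimension; must match the collection's vector field
TITLE_MAX_BYTES = 512  # assumed max_length of the title VARCHAR field
SUMMARY_MAX_BYTES = 2048  # assumed max_length of the summary VARCHAR field


def hashing_embed(text: str, dim: int = DIM) -> list[float]:
    """Toy stand-in for an embedding model: hash each word into a bucket, then L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without leaving a split multibyte character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def build_page_record(title: str, summary: str, full_text: str) -> dict:
    """One row: the whole page text is vectorized; title and summary are kept as strings."""
    return {
        "title": truncate_utf8(title, TITLE_MAX_BYTES),
        "summary": truncate_utf8(summary, SUMMARY_MAX_BYTES),
        "vector": hashing_embed(full_text),
    }
```

With pymilvus, a list of such dicts could then be passed to `MilvusClient.insert(collection_name=..., data=[record])`. In a real demo you would swap `hashing_embed` for an actual embedding model.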

I do like the idea of organizing a hackathon for building three levels of cool demos with Milvus and awesome open-source AI tools. We have space in Princeton and can also run it virtually, so if you are interested, comment or reach out.

Check out the Paris Summer Olympics 2024 website!

Okay, so in this very simple demo, we use the Python wikipediaapi library to read our page in English, with HTML as the extract format. This gives us a little bit of data. For the next level, we should chunk and parse the page and pull out the pieces that will help feed an LLM of our choice. We will most likely run an open-source model on Ollama. I have llama3 loaded locally, so I will probably use that.
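The page comes back as HTML, so a first pass at the chunk-and-parse step could strip the tags and split the text into overlapping word windows. This is a minimal sketch using only the standard library. The `chunk_size` and `overlap` defaults are assumed values to tune. Fetching the page itself with `wikipediaapi` is left out.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Join the raw text pieces, then collapse every run of whitespace into one space.
    return " ".join("".join(parser.parts).split())


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which tends to help retrieval. Each chunk could then be embedded and stored as its own row.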

Gold!!!!

We need the medals. Unfortunately, we can’t get a live feed, but I got a download from Kaggle (I’ll have to update this in the next round when we connect to some models and deep learning code). So this data is in flux, and more medals will be awarded before the Games end.

So we load the medals from a CSV, build a sentence from each row to encode and store, and add some filter fields plus a good chunk of JSON for good measure.
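Here is one way that CSV step might look. Each row becomes three things:

- a sentence to encode,
- a few scalar fields to filter on,
- the raw row as JSON.

The column names (`name`, `country`, `medal_type`, `discipline`, `event`) are assumptions about the Kaggle file, not its actual header. `embed` is whatever embedding function you plug in.

```python
import csv
from typing import Callable, Iterable, Iterator


def medal_records(lines: Iterable[str], embed: Callable[[str], list[float]]) -> Iterator[dict]:
    """Turn each medal row into a Milvus-style record: text, vector, filter fields, and JSON."""
    for row in csv.DictReader(lines):
        # Assumed column names; adjust them to match the real CSV header.
        sentence = (
            f"{row['name']} of {row['country']} won {row['medal_type']} "
            f"in {row['discipline']} ({row['event']})."
        )
        yield {
            "sentence": sentence,
            "vector": embed(sentence),
            # Scalar fields that can be used in filter expressions at query time.
            "country": row["country"],
            "medal_type": row["medal_type"],
            # The full row, kept for a JSON field.
            "details": dict(row),
        }
```

In practice you would pass an open file, for example `open("medals.csv", newline="", encoding="utf-8")`, where the filename is hypothetical. `csv.DictReader` accepts any iterable of lines.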

It’s so easy and super fast to query.
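To give a feel for what such a query does, here is a brute-force stand-in. It does three things:

1. Scores every stored vector by cosine similarity against the query vector.
2. Keeps only the rows that pass a scalar filter.
3. Returns the top `limit` hits.

Milvus typically uses an index instead of a full scan. In pymilvus, the equivalent is roughly `MilvusClient.search(collection_name, data=[query_vector], filter='medal_type == "Gold"', limit=5, output_fields=[...])`. The record layout here (dicts with a `vector` key plus scalar fields) is an assumption for illustration.

```python
import heapq
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filtered_search(records: list[dict], query: Sequence[float],
                    keep: Callable[[dict], bool], limit: int = 5) -> list[tuple[float, dict]]:
    """Exhaustive top-k cosine search over only the records that pass the `keep` filter."""
    scored = ((cosine(r["vector"], query), r) for r in records if keep(r))
    # Compare on the score only, so ties never try to compare dicts.
    return heapq.nlargest(limit, scored, key=lambda pair: pair[0])
```

A full scan like this is fine for a toy dataset. For real collections, the index and the filter pushdown in Milvus are what keep queries fast.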

Next up, we’ll combine it with a model for some Olympics fun.
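One possible shape for that step works in three stages:

1. Take a few retrieved text chunks.
2. Put them into a prompt.
3. Send the prompt to Ollama's `/api/generate` endpoint with streaming turned off.

This sketch assumes Ollama is running locally on its default port (11434) with the `llama3` model pulled. The prompt wording and the helper names `build_prompt`, `build_request`, and `ask_llama3` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Put retrieved chunks into a simple retrieval-augmented prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_llama3(question: str, context_chunks: list[str]) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    req = build_request(build_prompt(question, context_chunks))
    with urllib.request.urlopen(req, timeout=120) as resp:
        # With stream=False, Ollama returns one JSON object whose "response" key holds the text.
        return json.loads(resp.read())["response"]
```

The context chunks would come from a Milvus search. Keeping retrieval and generation as separate functions makes it easy to swap in a different model later.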

SOURCE CODE

RESOURCES

https://data.paris2024.org/api/explore/v2.1/catalog/datasets/paris-2024-sites-de-competition/records?limit=20

REAL-WORLD EVENTS

Aug 13, 2024: Unstructured Data Meetup NYC

Aug 15, 2024: AI Camp NYC

Sept 24, 2024: Unstructured Data Meetup NYC

WEBINAR

Star Us On GitHub and Join Our Discord!

If you liked this blog post, consider starring Milvus on GitHub, and feel free to join our Discord! 💙

Tim Spann

Principal Developer Advocate, Zilliz. Milvus, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning, NiFi, Kafka. https://www.datainmotion.dev/