Voxel51

News, tutorials, tips, and big ideas in computer vision and data-centric machine learning, from the company behind open source FiftyOne. Learn more at https://voxel51.com

Memes Are the VLM Benchmark We Deserve

10 min read · Feb 20, 2025


A dataset of memes in FiftyOne format

Note: You can find the repo for this blog here.

# Install FiftyOne, then download each VLM plugin and install its Python requirements
!pip install fiftyone
!fiftyone plugins download https://github.com/harpreetsahota204/janus-vqa-fiftyone

!fiftyone plugins requirements @harpreetsahota/janus_vqa --install

!fiftyone plugins download https://github.com/harpreetsahota204/moondream2-plugin

!fiftyone plugins requirements @harpreetsahota/moondream2 --install
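import os

# Allow operators called with delegate=True to be executed by FiftyOne's locally launched
# delegated-operation service; set it before importing fiftyone so the setting is picked up
os.environ['FIFTYONE_ALLOW_LEGACY_ORCHESTRATORS'] = 'true'

import fiftyone as fo

# ml_memes_dataset is the meme dataset used throughout this post (its loading code is not shown here)
fo.launch_app(ml_memes_dataset)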
Exploring the meme dataset in FiftyOne
import fiftyone.operators as foo

# Look up the operators registered by the two plugins installed above
janus_vqa = foo.get_operator("@harpreetsahota/janus_vqa/janus_vqa")

moondream = foo.get_operator("@harpreetsahota/moondream2/moondream")

OCR

QUESTION = "What does the text on this image say? Respond only with the text on the image and nothing else."

await janus_vqa(
    ml_memes_dataset,
    model_path="deepseek-ai/Janus-Pro-1B",
    question=QUESTION,
    question_field="ocr_questions",
    answer_field="janus_ocr",
    delegate=True,
)

await moondream(
    ml_memes_dataset,
    revision="2025-01-09",
    operation="query",
    output_field="moondream_ocr",
    query_text=QUESTION,
    delegate=True,
)
fo.launch_app(ml_memes_dataset)
Exploring OCR outputs in FiftyOne
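Browsing the OCR outputs in the App shows where each model goes wrong. To put a number on it, you could compute the character error rate (CER) against a reference transcription. The following is a minimal sketch. It assumes you have the true meme text for each sample, which is not something this post shows. The `reference` argument and the helper names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace (meme text often wraps across lines)."""
    return " ".join(text.lower().split())


def character_error_rate(predicted: str, reference: str) -> float:
    """Edit distance between the normalized texts, divided by the normalized reference length.

    `reference` is a hypothetical ground-truth transcription; it is not part of the post's dataset.
    """
    pred, ref = normalize(predicted), normalize(reference)
    if not ref:
        # Convention chosen for this sketch: an empty reference scores 0.0 only if the prediction is empty too
        return 0.0 if not pred else 1.0
    return levenshtein(pred, ref) / len(ref)
```

Lower is better. The value can exceed 1.0 when a model pads its answer with extra words, which is itself a useful signal given the "respond only with the text" instruction. In the notebook, `ml_memes_dataset.values("janus_ocr")` and `ml_memes_dataset.values("moondream_ocr")` return each model's answers in sample order once the delegated runs have finished.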

Meme understanding

MEME_UNDERSTANDING_QUESTION = """This image is a meme. Describe the scene of the meme,
its characters, what they are saying, and what the
target audience of this meme might find funny about it.
"""

await janus_vqa(
    ml_memes_dataset,
    model_path="deepseek-ai/Janus-Pro-1B",
    question=MEME_UNDERSTANDING_QUESTION,
    question_field="meme_understanding_question",
    answer_field="janus_meme_understanding",
    delegate=True,
)

await moondream(
    ml_memes_dataset,
    revision="2025-01-09",
    operation="query",
    output_field="moondream_meme_understanding",
    query_text=MEME_UNDERSTANDING_QUESTION,
    delegate=True,
)
fo.launch_app(ml_memes_dataset)
Assessing meme understanding outputs in FiftyOne

Can the VLMs find the attribution tag?

ATTR_QUESTION = """The creator of this meme has tagged themselves for self-attribution. 
Who can we attribute as the creator of this meme? Respond with just the authors name"""

await janus_vqa(
ml_memes_dataset,
model_path="deepseek-ai/Janus-Pro-1B",
question=ATTR_QUESTION,
question_field="attr_question",
answer_field="janus_attr",
delegate=True
)
await moondream(
ml_memes_dataset,
revision="2025-01-09",
operation="query",
output_field="moondream_attr",
query_text=ATTR_QUESTION,
delegate=True
)
fo.launch_app(ml_memes_dataset)
Assessing attribution finding outputs in FiftyOne
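Attribution answers are short, so they are easier to score automatically than free-form descriptions. The following is a minimal sketch under two assumptions. First, you know each meme's true creator handle; the `expected` values are hypothetical and not a field shown in this post. Second, a lenient match is acceptable, meaning the handle only has to appear somewhere in the answer after case, "@", spaces, and punctuation are stripped.

```python
import re


def normalize_handle(text: str) -> str:
    """Lowercase and keep only ASCII letters, digits, and underscores (drops '@', spaces, punctuation)."""
    return re.sub(r"[^a-z0-9_]", "", text.lower())


def attribution_correct(predicted: str, expected: str) -> bool:
    """Lenient match: the normalized expected handle appears anywhere in the normalized answer."""
    target = normalize_handle(expected)
    return bool(target) and target in normalize_handle(predicted)


def attribution_accuracy(pairs) -> float:
    """Fraction of (predicted, expected) pairs judged correct; 0.0 for no pairs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(attribution_correct(p, e) for p, e in pairs) / len(pairs)
```

Because spaces are removed before matching, a long rambling answer can occasionally match by accident. Tighten the rule, for example by requiring an exact match, if that matters for your comparison.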

Meme captioning

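import fiftyone.utils.huggingface as fouh

# Load the grouped meme-captioning dataset from the Hugging Face Hub
memes_dataset = fouh.load_from_hub(
    "harpreetsahota/memes-dataset",
    name="meme-captioning",
    overwrite=True,
)

# Keep only the "template" slice of each group: the meme images without captions
uncaptioned_memes = memes_dataset.select_group_slices("template")

# Clone the view into its own dataset so the VLM captions are written separately
uncaptioned_memes = uncaptioned_memes.clone(name="vlm-captioned-memes")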
fo.launch_app(uncaptioned_memes)
Exploring the uncaptioned meme dataset in FiftyOne
MEME_GENERATE = """This image is a meme. Write a caption for this meme 
related to deep learning and artificial intelligence.
Respond only with the caption and nothing else.
"""

await janus_vqa(
    uncaptioned_memes,
    model_path="deepseek-ai/Janus-Pro-1B",
    question=MEME_GENERATE,
    question_field="caption_prompt",
    answer_field="janus_caption",
    delegate=True,
)

await moondream(
    uncaptioned_memes,
    revision="2025-01-09",
    operation="query",
    query_text=MEME_GENERATE,
    output_field="moondream_caption",
    delegate=True,
)
fo.launch_app(uncaptioned_memes)
Assessing meme captions in FiftyOne
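The prompt asks for the caption and nothing else. Even so, a model may prepend a label such as "Caption:" or wrap its answer in quotes. If you see that in the App, a small post-processing step can tidy the text before you compare or render captions. This is a minimal sketch: the label patterns and the helper name are assumptions, so extend them to match what your outputs actually contain.

```python
import re

# Hypothetical labels a model might prepend ("Caption:", "Meme caption:"); adjust to what you observe
_LABEL = re.compile(r"^\s*(?:meme\s+)?caption\s*:\s*", re.IGNORECASE)

# Opening -> closing quote pairs to strip when they wrap the whole caption
_QUOTE_PAIRS = {'"': '"', "'": "'", "\u201c": "\u201d"}


def clean_caption(raw: str) -> str:
    """Remove a leading caption label and one layer of wrapping quotes."""
    text = _LABEL.sub("", raw.strip(), count=1)
    if len(text) >= 2 and _QUOTE_PAIRS.get(text[0]) == text[-1]:
        text = text[1:-1].strip()
    return text
```

Only one layer of quotes is removed. This leaves apostrophes and quoted phrases inside a caption untouched.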

Conclusion


Published in Voxel51



Written by Harpreet Sahota

🤖 Generative AI Hacker | 👨🏽‍💻 AI Engineer | Hacker-in-Residence at Voxel51
