Are current benchmarks sufficient? Assessing AI performance in Global Health applications
While generative AI offers the possibility of revolutionizing healthcare planning and decision making in low- and lower-middle-income countries (LMICs), there are valid concerns over privacy, ethics, data security, and data misuse.
While there are open-source LLM options, their computing requirements exceed the resources available in most LMICs.
Further, these LLMs are useful in general settings but can struggle with very specific use cases (e.g., HIV program planning in Malawi) or with domain jargon and acronyms. This means bespoke solutions, such as uptraining smaller LLMs for deployment in LMICs and pairing them with retrieval-augmented generation (RAG), could thread the needle of feasibility while offering high performance to support decision makers with limited time and resources.
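As a rough illustration of what such a pairing might look like, the minimal sketch below retrieves the passages most relevant to a query from a small document store and prepends them to the prompt sent to the model. It assumes scikit-learn is available for retrieval; the documents and the generate() function are hypothetical stand-ins for whatever guidance corpus and locally hosted small LLM are actually deployed.

```python
# Minimal RAG sketch: TF-IDF retrieval over a toy document store,
# followed by prompt construction for a locally hosted small LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for country- and program-specific guidance documents.
documents = [
    "National HIV program targets for antiretroviral therapy coverage.",
    "District-level guidance on index testing and partner notification.",
    "Supply chain procedures for test kits at primary health facilities.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF cosine similarity)."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def generate(prompt: str) -> str:
    """Hypothetical placeholder for a call to a locally hosted small LLM."""
    return f"[model response to a prompt of {len(prompt)} characters]"

query = "How should a district plan HIV index testing?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(generate(prompt))
```

In practice the TF-IDF retriever would likely be swapped for dense embeddings, and the toy corpus for real program documents, but the overall shape (retrieve, assemble context, generate) stays the same.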
Currently, there is an apparent dearth of appropriate benchmarks for AI applications in global health and development. As we've pursued our own AI projects, we have been reviewing available benchmark datasets to inform our strategy for evaluating them.
We searched arXiv, Papers with Code, and Hugging Face for papers on generative LLMs to catalog the benchmarks used by the largest and most commonly used models. We supplemented this with snowball sampling, reviewing popular blog posts, and searching GitHub for additional options.
We briefly categorized each benchmark by its type of assessment, such as reading comprehension, commonsense reasoning, or safety and truthfulness. We further labeled each by topic area, such as medical, statistical, or legal information.
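The sketch below shows roughly how such a catalog can be structured and tallied; the entries and labels are illustrative examples, not our actual catalog.

```python
# Illustrative benchmark catalog with assessment-type and topic-area labels.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Benchmark:
    name: str
    assessment_type: str  # e.g., reading comprehension, commonsense reasoning
    topic_area: str       # e.g., medical, statistics, legal, general

catalog = [
    Benchmark("MMLU", "knowledge and reasoning", "general"),
    Benchmark("TruthfulQA", "safety and truthfulness", "general"),
    Benchmark("MedQA", "domain knowledge", "medical"),
]

# Quick tallies make coverage gaps visible (e.g., no global health topic area).
print(Counter(b.topic_area for b in catalog))
print(Counter(b.assessment_type for b in catalog))
```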
After identifying nearly 60 benchmarks, we confirmed our earlier suspicion that there was no extant benchmark or dataset to adequately test LLM performance in global health settings.
This has prompted us to begin developing our own bespoke benchmarks for AI applications in Africa, focused on primary health care and HIV-specific use cases. We hope these newly developed benchmarks will be useful not only for our own model deployments but also for others doing the same.
Please find a link to our repository here. We know our search was not exhaustive, and with the rapid pace of AI advances, we hope that others will direct us to relevant benchmarks we may have missed. Please let us know!