Ingesting All The Medications in America Every 7 Days

Tim Spann
Cloudera
6 min read · Dec 12, 2023

Apache NiFi to Read RSS REST Feeds the Smart Way!

Photo by Myriam Zilles on Unsplash

“How to Access DailyMed Data via XML, JSON, RSS REST Feeds / HTTP InvokeHTTP GET over SSL”

DailyMed provides a lot of drug-related data, so let’s ingest some of the most interesting feeds.

First feed — RSS Daily

SPL Processing

Input -> Get Set ID -> Get Lookup Details -> PublishKafkaRecord

Get Set ID -> Get Label Details -> Labels -> UpdateRecord -> PublishKafkaRecord -> RetryFlowFile

Get Label Details

UpdateRecord

Input -> Get Set ID -> Get Lookup Details -> UpdateRecord -> ExtractText -> Send Message Notification

Kafka Topics for DailyDrugNews

RSS Description

RSS Feed

https://dailymed.nlm.nih.gov/dailymed/rss.cfm
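As a rough illustration of what the flow does with this feed, here is a minimal Python sketch that pulls the title, link and pubDate out of each RSS item. The function name parse_rss_items is hypothetical and the real flow does this with NiFi processors; the sketch only assumes a standard RSS 2.0 layout with item elements, which matches the example output shown in this article.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def parse_rss_items(rss_xml: str) -> List[Dict[str, Optional[str]]]:
    """Return title, link and pubDate for every <item> in an RSS document."""
    root = ET.fromstring(rss_xml)
    items = []
    # iter() walks the whole tree, so items nested under <channel> are found.
    for item in root.iter("item"):
        items.append(
            {
                "title": item.findtext("title"),
                "link": item.findtext("link"),
                "pubDate": item.findtext("pubDate"),
            }
        )
    return items
```

Fetching the feed itself (for example with urllib.request.urlopen) is left out so the parsing logic can be tried on a saved copy of the XML.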

Supporting Data

Label RSS

https://dailymed.nlm.nih.gov/dailymed/labelrss.cfm?setid=${setID}

Example Output

[
{"version":2.0,
"channel":
{"title":"DailyMed Drug Label Updates for TAZAROTENE CREAM [MAYNE PHARMA]",
"link":[
"https://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=0296f0e9-f940-45d9-987e-1ee26a7ca961&version=2",
{"rel":"self",
"href":"https://dailymed.nlm.nih.gov/dailymed/labelrss.cfm?setid=0296f0e9-f940-45d9-987e-1ee26a7ca961",
"type":"application/rss+xml"}],
"description":"\n\tDailyMed provides high quality information about marketed drugs.\n\tDrug labeling on this Web site is the most recent submitted to the Food and Drug Administration (FDA)\n\tand currently in use; it may include strengthened warnings undergoing FDA review and minor editorial changes.\n ","language":"en-us","pubDate":"Thu, 30 Nov 2023 00:00:00 EST","lastBuildDate":"Fri, 08 Dec 2023 14:20:40 EST",
"item":
{"title":"TAZAROTENE cream [Mayne Pharma]",
"description":"Updated Date: Thu, 30 Nov 2023 00:00:00 EST",
"link":"https://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=0296f0e9-f940-45d9-987e-1ee26a7ca961&version=2",
"pubDate":"Thu, 30 Nov 2023 00:00:00 EST",
"guid":
{"isPermaLink":true,"value":null}}},
"uuid":"fd6b929f-3832-4710-af0a-9fef9581ee79"}
]
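The "Get Set ID" step boils down to pulling the setid query parameter out of an item link and plugging it into the label RSS URL. A minimal Python sketch of that idea follows; the helper names extract_setid and label_rss_url are hypothetical, and the NiFi flow itself does this with processors and the ${setID} attribute rather than Python.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

LABEL_RSS = "https://dailymed.nlm.nih.gov/dailymed/labelrss.cfm"


def extract_setid(link: str) -> Optional[str]:
    """Return the setid query parameter from a DailyMed link, or None."""
    query = parse_qs(urlparse(link).query)
    values = query.get("setid")
    return values[0] if values else None


def label_rss_url(setid: str) -> str:
    """Build the per-label RSS URL for one setid."""
    return f"{LABEL_RSS}?setid={setid}"


link = (
    "https://dailymed.nlm.nih.gov/dailymed/lookup.cfm"
    "?setid=0296f0e9-f940-45d9-987e-1ee26a7ca961&version=2"
)
print(label_rss_url(extract_setid(link)))
```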

HTML Page

Second flow, SPL.

Kafka Topics for DailyMedSPL

REST Ingest Meds

Read all the drug names of the day from the REST endpoint.

Source:

https://dailymed.nlm.nih.gov/dailymed/services/v2/drugnames.json?pagesize=100

DataType
Is Data Updated?

We need to grab all the navigation fields, such as page, URL, next, total elements and total pages.
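To make the navigation fields concrete, here is a small Python sketch that picks them out of a parsed JSON response. The field names (current_page, current_url, next_page_url, total_elements, total_pages) and the enclosing "metadata" object are assumptions about the response layout, not something confirmed in this article, so check them against a real response before relying on them.

```python
from typing import Any, Dict

# Assumed field names inside the response's "metadata" object.
NAV_FIELDS = (
    "current_page",
    "current_url",
    "next_page_url",
    "total_elements",
    "total_pages",
)


def navigation_fields(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the paging fields; missing ones come back as None."""
    metadata = payload.get("metadata", {})
    return {name: metadata.get(name) for name in NAV_FIELDS}
```

Returning None for missing fields keeps the record schema stable even when a response omits one of them.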

Batch

Download all labels

The monthly file is named with the month and year (monYYYY), for example nov2023:

https://dailymed-data.nlm.nih.gov/public-release-files/dm_spl_monthly_update_nov2023.zip
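The monYYYY naming can be sketched in Python as below. Only the nov2023 file name appears in this article; the assumption that every other month uses the same lowercase three-letter abbreviation is mine, and monthly_update_url is a hypothetical helper name.

```python
from datetime import date

# Assumed abbreviations; only "nov" is confirmed by the URL in the article.
MONTHS = (
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
)
BASE = "https://dailymed-data.nlm.nih.gov/public-release-files"


def monthly_update_url(day: date) -> str:
    """Build the monthly SPL update URL for the month containing `day`."""
    tag = f"{MONTHS[day.month - 1]}{day.year}"
    return f"{BASE}/dm_spl_monthly_update_{tag}.zip"


print(monthly_update_url(date(2023, 11, 1)))
```

A fixed tuple is used instead of strftime("%b") so the result does not depend on the machine's locale.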

Data

Grab up to 100 records per request, then iterate through the pages:

pagesize=100&page=13
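A minimal Python sketch of the paging loop, assuming pages are numbered from 1 and that the total page count has already been read from an earlier response. The generator name page_urls is hypothetical; in the NiFi flow the same URLs would be built from flow file attributes.

```python
from typing import Iterator
from urllib.parse import urlencode

DRUG_NAMES = "https://dailymed.nlm.nih.gov/dailymed/services/v2/drugnames.json"


def page_urls(base_url: str, total_pages: int, pagesize: int = 100) -> Iterator[str]:
    """Yield one URL per page, from page 1 through total_pages."""
    for page in range(1, total_pages + 1):
        query = urlencode({"pagesize": pagesize, "page": page})
        yield f"{base_url}?{query}"


for url in page_urls(DRUG_NAMES, total_pages=3):
    print(url)
```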

All Web Services APIs

https://dailymed.nlm.nih.gov/dailymed/app-support-web-services.cfm#restfulapi

UNIIS API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/uniis_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/uniis.json

RXCUIS API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/rxcuis_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/rxcuis.json?pagesize=100&page=2

Drug Names API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/drugnames_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/drugnames.json

App #s API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/applicationnumbers_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/applicationnumbers.json

https://dailymed.nlm.nih.gov/dailymed/services/v2/applicationnumbers.json?pagesize=100&page=13

Drug Classes API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/drugclasses_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/drugclasses.json

SPLS API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/spls_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls.json

NDCS API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/ndcs_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/ndcs.json

Example Use Case

Download daily extracts from FTP and unzip.

Grab daily news from RSS to get what’s changed.

Use setid to get more data.

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls.json?setid=9256d3b2-50eb-4091-bbcd-1982865fb998&pagesize=5000

Also grab the SPL

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998.xml

Grab the SPL media

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998/media.json

This returns records with URLs to JPEGs or other mime_types. Download each of these, for example:

https://dailymed.nlm.nih.gov/dailymed/image.cfm?setid=9256d3b2-50eb-4091-bbcd-1982865fb998&name=mm3.jpg

Get ndcs for it

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998/ndcs.json

This one supports the next_page paradigm that we can use to navigate through many pages.

Get packaging for it

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998/packaging.json

Get all spl version information

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998/history.json

This one supports the next_page paradigm that we can use to navigate through many pages.
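The per-setid calls in this use case all follow the same URL patterns, so they can be generated from a single setid. Here is a minimal Python sketch of that; the helper names spl_urls and image_url and the dictionary keys are hypothetical, while the URL shapes are copied from the examples above.

```python
from typing import Dict

BASE = "https://dailymed.nlm.nih.gov/dailymed"


def spl_urls(setid: str) -> Dict[str, str]:
    """Build the follow-up REST URLs for one SPL setid."""
    services = f"{BASE}/services/v2/spls"
    return {
        "summary": f"{services}.json?setid={setid}&pagesize=5000",
        "spl": f"{services}/{setid}.xml",
        "media": f"{services}/{setid}/media.json",
        "ndcs": f"{services}/{setid}/ndcs.json",
        "packaging": f"{services}/{setid}/packaging.json",
        "history": f"{services}/{setid}/history.json",
    }


def image_url(setid: str, name: str) -> str:
    """Build the download URL for one media file listed in media.json."""
    return f"{BASE}/image.cfm?setid={setid}&name={name}"


for label, url in spl_urls("9256d3b2-50eb-4091-bbcd-1982865fb998").items():
    print(label, url)
```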

SOURCE CODE

Bing AI

Tim Spann
Cloudera

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/