Ingesting All The Medications in America Every 7 Days

Published in Cloudera, Dec 12, 2023 (6 min read)

Apache NiFi to Read RSS REST Feeds the Smart Way!

Photo by Myriam Zilles on Unsplash

“How to Access DailyMed Data via XML, JSON, RSS REST Feeds / HTTP InvokeHTTP GET over SSL”

DailyMed provides a lot of drug-related data, so let’s ingest some of the most interesting feeds.

First feed — RSS Daily

SPL Processing

Input -> Get Set ID -> Get Lookup Details -> PublishKafkaRecord

Get Set ID -> Get Label Details -> Labels -> UpdateRecord -> PublishKafkaRecord -> RetryFlowFile

Get Label Details

UpdateRecord

Input -> Get Set ID -> Get Lookup Details -> UpdateRecord -> ExtractText -> Send Message Notification

Kafka Topics for DailyDrugNews

RSS Description

DailyMed - RSS Updates

DailyMed will deliver notification of updates and additions to Drug Label information currently shown on this site…

dailymed.nlm.nih.gov

RSS Feed

https://dailymed.nlm.nih.gov/dailymed/rss.cfm
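To make the "Get Set ID" step concrete outside of NiFi, here is a minimal Python sketch that pulls the setid out of each item link in an RSS 2.0 document. It assumes every item link carries a setid query parameter, as the link in the example output of this article does. The function name and the SAMPLE document are hypothetical and hand-made for illustration; this is not the NiFi flow itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse


def set_ids_from_rss(rss_xml: str) -> list[str]:
    """Return the setid query parameter of every <item><link> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    set_ids = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        query = parse_qs(urlparse(link).query)
        if "setid" in query:  # assumption: item links carry ?setid=...
            set_ids.append(query["setid"][0])
    return set_ids


# Hand-made sample, shaped like a tiny RSS feed (not real feed output).
SAMPLE = """<rss version="2.0"><channel><title>sample</title>
<item><title>TAZAROTENE cream [Mayne Pharma]</title>
<link>https://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=0296f0e9-f940-45d9-987e-1ee26a7ca961&amp;version=2</link>
</item></channel></rss>"""

print(set_ids_from_rss(SAMPLE))
```

Each set ID found this way can then be plugged into the label and lookup URLs used later in the flow.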

SupportingData

DailyMed - Mapping Files

These are pipe (|) delimited files relating SPL Set IDs with other information. Currently, there are three files: To…

dailymed.nlm.nih.gov
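Since the mapping files are pipe-delimited, they are easy to read with a standard CSV parser. The sketch below assumes nothing about the column layout or a header row: it only splits each line on the pipe character. The function name and the sample text are made up for illustration.

```python
import csv
import io


def read_pipe_delimited(text: str) -> list[list[str]]:
    """Split pipe-delimited text into rows of fields, skipping blank lines."""
    # QUOTE_NONE keeps any quote characters in the data as literal text.
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


# Made-up sample rows; the real files have their own columns.
sample = "SETID-1|value-a|value-b\nSETID-2|value-c|value-d\n"
for row in read_pipe_delimited(sample):
    print(row)
```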

Label RSS

https://dailymed.nlm.nih.gov/dailymed/labelrss.cfm?setid=${setID}

Example Output

[
{"version":2.0,
"channel":
{"title":"DailyMed Drug Label Updates for TAZAROTENE CREAM [MAYNE PHARMA]",
"link":[
"https://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=0296f0e9-f940-45d9-987e-1ee26a7ca961&version=2",
{"rel":"self",
"href":"https://dailymed.nlm.nih.gov/dailymed/labelrss.cfm?setid=0296f0e9-f940-45d9-987e-1ee26a7ca961",
"type":"application/rss+xml"}],
"description":"\n\tDailyMed provides high quality information about marketed drugs.\n\tDrug labeling on this Web site is the most recent submitted to the Food and Drug Administration (FDA)\n\tand currently in use; it may include strengthened warnings undergoing FDA review and minor editorial changes.\n    ","language":"en-us","pubDate":"Thu, 30 Nov 2023 00:00:00 EST","lastBuildDate":"Fri, 08 Dec 2023 14:20:40 EST",
"item":
{"title":"TAZAROTENE cream [Mayne Pharma]",
"description":"Updated Date: Thu, 30 Nov 2023 00:00:00 EST",
"link":"https://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=0296f0e9-f940-45d9-987e-1ee26a7ca961&version=2",
"pubDate":"Thu, 30 Nov 2023 00:00:00 EST",
"guid":
{"isPermaLink":true,"value":null}}},
"uuid":"fd6b929f-3832-4710-af0a-9fef9581ee79"}
]
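A record shaped like the one above is nested, so it usually helps to flatten it before publishing to Kafka. The following Python sketch shows one possible flattening of the fields seen in the example output. The output field names (set_id, version, pub_date and so on) are hypothetical choices for illustration, not the schema used in the NiFi flow.

```python
from email.utils import parsedate_to_datetime
from urllib.parse import parse_qs, urlparse


def flatten_label_record(record: dict) -> dict:
    """Flatten one label RSS record into a single-level dict."""
    item = record["channel"]["item"]
    query = parse_qs(urlparse(item["link"]).query)
    return {
        "uuid": record["uuid"],
        "title": item["title"],
        "set_id": query["setid"][0],
        "version": int(query["version"][0]),
        # RFC 822 style date string converted to ISO 8601
        "pub_date": parsedate_to_datetime(item["pubDate"]).isoformat(),
        "link": item["link"],
    }


# Trimmed copy of the example output shown above.
RECORD = {
    "version": 2.0,
    "channel": {
        "title": "DailyMed Drug Label Updates for TAZAROTENE CREAM [MAYNE PHARMA]",
        "item": {
            "title": "TAZAROTENE cream [Mayne Pharma]",
            "link": "https://dailymed.nlm.nih.gov/dailymed/lookup.cfm"
                    "?setid=0296f0e9-f940-45d9-987e-1ee26a7ca961&version=2",
            "pubDate": "Thu, 30 Nov 2023 00:00:00 EST",
        },
    },
    "uuid": "fd6b929f-3832-4710-af0a-9fef9581ee79",
}

print(flatten_label_record(RECORD))
```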

HTML Page

Second flow, SPL.

Kafka Topics for DailyMedSPL

REST Ingest Meds

Read all the drug names of the day in RSS.

Source:

https://dailymed.nlm.nih.gov/dailymed/services/v2/drugnames.json?pagesize=100

DataType

Is Data Updated?

We need to grab all the fields used for navigation, such as page, url, next, total elements and total pages.
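As a rough illustration of grabbing those navigation fields, here is a Python sketch that reads them from a decoded JSON response. The key names (metadata, current_page, current_url, next_page_url, total_elements, total_pages) are assumptions about the response layout, as is the handling of the string "null" for a missing value. Check them against a real response before relying on this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageInfo:
    current_page: int
    total_pages: int
    total_elements: int
    current_url: Optional[str]
    next_page_url: Optional[str]


def _none_if_null(value):
    """Treat None, an empty string, or the literal string "null" as missing."""
    return None if value in (None, "", "null") else value


def page_info(response: dict) -> PageInfo:
    """Extract navigation fields from the (assumed) "metadata" object."""
    meta = response["metadata"]
    return PageInfo(
        current_page=int(meta["current_page"]),
        total_pages=int(meta["total_pages"]),
        total_elements=int(meta["total_elements"]),
        current_url=_none_if_null(meta.get("current_url")),
        next_page_url=_none_if_null(meta.get("next_page_url")),
    )


# Hand-made response fragment with invented values.
sample = {
    "metadata": {
        "current_page": 1,
        "total_pages": 3,
        "total_elements": 250,
        "current_url": "https://example.invalid/drugnames.json?pagesize=100&page=1",
        "next_page_url": "https://example.invalid/drugnames.json?pagesize=100&page=2",
    }
}
print(page_info(sample))
```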

Batch

Download all labels

DailyMed - Download All Drug Labels

Choose from daily, weekly or monthly periodic updates or full releases of all drug labels. Downloads are available as…

dailymed.nlm.nih.gov

Format for once a month (monYYYY)

https://dailymed-data.nlm.nih.gov/public-release-files/dm_spl_monthly_update_nov2023.zip
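The monthly file name can be built from a date. This sketch assumes every month follows the same pattern as the nov2023 example above (lowercase three-letter month plus four-digit year); only the November 2023 name is confirmed by the URL in this article. The function name is hypothetical.

```python
from datetime import date

BASE = "https://dailymed-data.nlm.nih.gov/public-release-files"

# Fixed English abbreviations so the result does not depend on the locale.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def monthly_update_url(day: date) -> str:
    """Build the monthly update zip URL (monYYYY) for the month containing `day`."""
    return f"{BASE}/dm_spl_monthly_update_{MONTHS[day.month - 1]}{day.year}.zip"


print(monthly_update_url(date(2023, 11, 15)))
```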

Data

Grab up to 100 records per request, then iterate through the pages.

pagesize=100&page=13
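The paging idea can be sketched in Python as a generator that keeps requesting pages until the last one is reached. It assumes each response has a "data" list and a "metadata" object with "total_pages"; those key names are assumptions, and iter_records, fetch_json and fake_fetch are hypothetical helpers. The demo uses a fake fetch function so it runs without network access.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Plain HTTPS GET that decodes a JSON body (not called in the demo below)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_records(base_url: str,
                 fetch: Callable[[str], dict] = fetch_json,
                 pagesize: int = 100) -> Iterator:
    """Yield every record, requesting page 1, 2, ... up to total_pages."""
    page = 1
    while True:
        url = f"{base_url}?{urlencode({'pagesize': pagesize, 'page': page})}"
        payload = fetch(url)
        yield from payload.get("data", [])
        # assumption: the response reports the total page count here
        if page >= int(payload["metadata"]["total_pages"]):
            break
        page += 1


def fake_fetch(url: str) -> dict:
    """Stand-in for the real service: three pages with one record each."""
    page = int(url.rsplit("page=", 1)[1])
    return {"data": [f"record-{page}"], "metadata": {"total_pages": 3}}


print(list(iter_records("https://example.invalid/drugnames.json", fetch=fake_fetch)))
```

Passing the fetch function in as a parameter keeps the paging logic testable; swapping in fetch_json (or a client with retries) points the same loop at the real endpoints.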

All APIs Web Services

https://dailymed.nlm.nih.gov/dailymed/app-support-web-services.cfm#restfulapi

UNIIS API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/uniis_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/uniis.json

RXCUIS API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/rxcuis_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/rxcuis.json?pagesize=100&page=2

Drug Names API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/drugnames_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/drugnames.json

App #s API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/applicationnumbers_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/applicationnumbers.json

https://dailymed.nlm.nih.gov/dailymed/services/v2/applicationnumbers.json?pagesize=100&page=13

Drug Classes API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/drugclasses_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/drugclasses.json

SPLS API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/spls_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls.json

NDCS API

https://dailymed.nlm.nih.gov/dailymed/webservices-help/v2/ndcs_api.cfm

https://dailymed.nlm.nih.gov/dailymed/services/v2/ndcs.json

Example Use Case

Download daily extracts from FTP and unzip.

Grab daily news from RSS to get what’s changed.

Use setid to get more data.

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls.json?setid=9256d3b2-50eb-4091-bbcd-1982865fb998&pagesize=5000

Also grab the SPL

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998.xml

Grab SPL Media

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998/media.json

This will produce data with URLs to JPEGs or other mime_types. Download these.

https://dailymed.nlm.nih.gov/dailymed/image.cfm?setid=9256d3b2-50eb-4091-bbcd-1982865fb998&name=mm3.jpg

Get ndcs for it

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998/ndcs.json

This one supports the next_page paradigm that we can use to navigate through many pages.

Get packaging for it

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998/packaging.json

Get all spl version information

https://dailymed.nlm.nih.gov/dailymed/services/v2/spls/9256d3b2-50eb-4091-bbcd-1982865fb998/history.json

This one supports the next_page paradigm that we can use to navigate through many pages.
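All of the per-label URLs in this use case follow from a single setid, so they can be generated in one place. The sketch below only rebuilds the URL patterns listed above; the function names and dictionary keys are hypothetical, and nothing is fetched.

```python
from urllib.parse import urlencode

BASE = "https://dailymed.nlm.nih.gov/dailymed"


def spl_endpoints(set_id: str) -> dict[str, str]:
    """Return the follow-up URLs for one setid, matching the patterns listed above."""
    spl = f"{BASE}/services/v2/spls/{set_id}"
    return {
        "summary": f"{BASE}/services/v2/spls.json?"
                   f"{urlencode({'setid': set_id, 'pagesize': 5000})}",
        "label_xml": f"{spl}.xml",
        "media": f"{spl}/media.json",
        "ndcs": f"{spl}/ndcs.json",
        "packaging": f"{spl}/packaging.json",
        "history": f"{spl}/history.json",
    }


def image_url(set_id: str, name: str) -> str:
    """Build the image download URL for one media file name."""
    return f"{BASE}/image.cfm?{urlencode({'setid': set_id, 'name': name})}"


set_id = "9256d3b2-50eb-4091-bbcd-1982865fb998"
for label, url in spl_endpoints(set_id).items():
    print(label, url)
print(image_url(set_id, "mm3.jpg"))
```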

DailyMed

Posted: September 15, 2021 The RxImage API will cease operation on December 31, 2021. All RxImage data are available…

dailymed.nlm.nih.gov

SOURCE CODE

ApacheConAtHome2020/flows/DailyMed at main · tspannhw/ApacheConAtHome2020

ApacheCon @Home Sept-Oct 2020 Materials. Contribute to tspannhw/ApacheConAtHome2020 development by creating an account…

github.com

Bing AI

Written by Tim Spann, writer for Cloudera

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/
