Mapping RxNorm and NDC Codes to the National Institutes of Health (NIH) Drug Brand Names with Spark NLP

RxNorm is a standardized nomenclature for clinical drugs. It provides a normalized naming system for medications, which lets different healthcare information systems and applications exchange drug information accurately.

Published in John Snow Labs, Mar 15, 2023

RxNorm is a large domain, with more than 200K codes and terms. Drug names also vary widely with geographic location, manufacturer, and language. For example, acetaminophen may be sold under different brand names in different countries or by different manufacturers, and the same drug may come in different dosage forms or strengths, which further complicates drug identification and communication. Drug mentions also appear in many surface forms: abbreviations (methotrexate vs. MTX), spelling differences (1 milligram vs. 1mg), typos (Lipitor vs. Liptior), and so on. So even if you extract the entities from unstructured data correctly, you may not be able to map them to RxNorm codes directly; the extracted entities first need some pre-processing (spell checking for drugs, drug normalization, casing control, etc.).
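As a rough illustration of the pre-processing mentioned above, the sketch below normalizes casing, one abbreviation, and unit spelling in plain Python. The function name `normalize_drug_mention` and the tiny lookup tables are assumptions made for this example; this is not Spark NLP's drug normalizer or spell checker, and it does not attempt typo correction.

```python
import re

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"mtx": "methotrexate"}
UNIT_ALIASES = {
    "milligram": "mg",
    "milligrams": "mg",
    "microgram": "mcg",
    "micrograms": "mcg",
}


def normalize_drug_mention(mention: str) -> str:
    """Lowercase a drug mention, expand known abbreviations, and unify units."""
    tokens = mention.lower().split()
    tokens = [ABBREVIATIONS.get(token, token) for token in tokens]
    tokens = [UNIT_ALIASES.get(token, token) for token in tokens]
    text = " ".join(tokens)
    # Separate a number glued to its unit: "1mg" becomes "1 mg".
    return re.sub(r"(\d)(mg|mcg|ml)\b", r"\1 \2", text)
```

With these tables, `normalize_drug_mention("MTX 1 milligram")` and `normalize_drug_mention("methotrexate 1mg")` both produce the same string, which is the property a resolver benefits from.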

Spark NLP for Healthcare comes with 60+ entity resolver models that support several clinical terminologies (RxNorm, ICD-10-CM, SNOMED, CPT, ATC, HPO, etc.). It also has a drug spell checker model and a drug normalizer to address these problems in drug-related tasks. For the entity resolution benchmarks we obtained against cloud providers, see the Medium article "Comparison of Key Medical NLP Benchmarks — Spark NLP vs AWS, Google Cloud and Azure".

In addition to the existing RxNorm resolver models, Spark NLP for Healthcare now has a new model that maps clinical entities to RxNorm codes according to the NIH database, as well as a model that maps NDC codes to drug brand names.

Implementing Entity Resolution for Mapping RxNorm Codes According to the NIH Database

To use sentence entity resolver models, you first need to extract the relevant entities from clinical text with the clinical NER models in Spark NLP. The Clinical Named Entity Recognition Notebook in the Spark NLP Workshop repo shows how these models can be used for clinical entity extraction. Once you have the entities, you compute their embeddings and feed them to the relevant entity resolver model. The Clinical Entity Resolvers Notebook in the same repo shows how to implement entity resolver models for several clinical terminologies.

The new sbiobertresolve_rxnorm_nih model, released in v4.3.1, maps clinical entities and concepts (like drugs/ingredients) to RxNorm codes according to the NIH database, using sbiobert_base_cased_mli Sentence BERT embeddings. An example of this model follows.

Example:

...
rxnorm_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_rxnorm_nih","en", "clinical/models") \
     .setInputCols(["sbert_embeddings"]) \
     .setOutputCol("resolution")\
     .setDistanceFunction("EUCLIDEAN")

text= "She is given folic acid 1 mg daily , levothyroxine 0.1 mg and aspirin 81 mg daily ."

Result:

| ner_chunk            | entity |rxnorm_code | all_codes                               | resolutions                                                                      |
|:---------------------|:-------|-----------:|:----------------------------------------|:---------------------------------------------------------------------------------|
| folic acid 1 mg      | DRUG   |   12281181 | ['12281181', '12283696', '12270292', ...| ['folic acid 1 MG [folic acid 1 MG]', 'folic acid 1.1 MG [folic acid 1.1 MG]',...|
| levothyroxine 0.1 mg | DRUG   |   12275630 | ['12275630', '12275646', '12301585', ...| ['levothyroxine sodium 0.1 MG [levothyroxine sodium 0.1 MG]', 'levothyroxine  ...|
| aspirin 81 mg        | DRUG   |   12278696 | ['12278696', '12299811', '12298729', ...| ['aspirin 81 MG [aspirin 81 MG]', 'aspirin 81 MG [YSP Aspirin] [aspirin 81 MG ...|
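The resolver in this example is configured with `setDistanceFunction("EUCLIDEAN")`, and its output lists several candidate codes per chunk. As a toy illustration of that idea (not the library's implementation), the sketch below ranks candidate terms by Euclidean distance to a query vector. The three-dimensional vectors and the codes `C1` to `C3` are made up for this example; real sentence embeddings have far more dimensions.

```python
import math


def rank_by_euclidean(query, candidates):
    """Return (code, distance) pairs sorted from nearest to farthest."""
    scored = [(code, math.dist(query, vector)) for code, vector in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Made-up vectors standing in for sentence embeddings of known terms.
candidates = {
    "C1": [1.0, 0.0, 0.0],
    "C2": [0.0, 1.0, 0.0],
    "C3": [0.9, 0.1, 0.0],
}
query = [1.0, 0.05, 0.0]  # made-up embedding of an extracted chunk

ranking = rank_by_euclidean(query, candidates)
best_code = ranking[0][0]  # the nearest candidate plays the role of the top code
```

In this picture, the first entry of the ranking corresponds to the single code reported for a chunk, and the rest of the list corresponds to the alternative candidates.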

We can also visualize the resolver model results with the EntityResolverVisualizer class in the spark-nlp-display library.

sbiobertresolve_rxnorm_nih model results

Mapping NDC Codes to Drug Brand Names As Well As Clinical Entities (like drugs/ingredients) to RxNorm Codes

NDC stands for National Drug Code, which is a unique numeric identifier assigned to drugs in the United States and used by the Food and Drug Administration (FDA) to track drugs and ensure their safety and efficacy. It is also used by healthcare providers, pharmacies, and insurance companies for drug billing and reimbursement purposes.
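An NDC is written as hyphen-separated segments: a labeler code, a product code and, in its full form, a package code. The two-segment values used in the mapper example in this section (such as `0009-4992`) are labeler-product codes. The sketch below splits such a string into its segments and does a basic shape check. `parse_ndc` is a hypothetical helper, and the digit ranges it accepts are an assumption based on the common 4-4-2, 5-3-2, and 5-4-1 layouts.

```python
import re
from typing import NamedTuple, Optional

# Labeler: 4-5 digits, product: 3-4 digits, optional package: 1-2 digits.
NDC_PATTERN = re.compile(r"^(\d{4,5})-(\d{3,4})(?:-(\d{1,2}))?$")


class Ndc(NamedTuple):
    labeler: str
    product: str
    package: Optional[str]


def parse_ndc(code: str) -> Ndc:
    """Split a hyphenated NDC string into its segments or raise ValueError."""
    match = NDC_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a hyphenated NDC code: {code!r}")
    return Ndc(*match.groups())
```

A check like this can catch malformed codes before they are sent to a mapper, where an unknown code would otherwise just produce no mapping.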

Spark NLP for Healthcare has 30+ chunk mapper models trained for tasks such as mapping clinical terminology codes to each other, defining clinical abbreviations, and mapping drugs to their actions and treatments. You can find detailed examples in the Chunk Mapping Notebook. Two new chunk mapper models have now been added to that set.

The ndc_drug_brandname_mapper model maps NDC codes to their corresponding drug brand names as well as RxNorm codes, according to the National Institutes of Health (NIH).

Example:

...
mapper = ChunkMapperModel.pretrained("ndc_drug_brandname_mapper", "en", "clinical/models")\
    .setInputCols("document")\
    .setOutputCol("mappings")\
    .setRels(["drug_brand_name"])

text= ["0009-4992", "57894-150"]

Result:

| ndc_code   | drug_brand_name   |
|:-----------|:------------------|
| 0009-4992  | ZYVOX             |
| 57894-150  | ZYTIGA            |

The rxnorm_nih_mapper model maps entities to their corresponding RxNorm codes according to the National Institutes of Health (NIH) database. It returns RxNorm codes along with their NIH RxNorm term types in parentheses.

Example:

...
chunkerMapper = ChunkMapperModel\
    .pretrained("rxnorm_nih_mapper", "en", "clinical/models")\
    .setInputCols(["ner_chunk"])\
    .setOutputCol("mappings")\
    .setRels(["rxnorm_code"])

Result:

+-------------------------+-------------+-----------+
|ner_chunk                |mappings     |relation   |
+-------------------------+-------------+-----------+
|Adapin 10 MG Oral Capsule|1911002 (SY) |rxnorm_code|
|acetohexamide            |12250421 (IN)|rxnorm_code|
|Parlodel                 |829 (BN)     |rxnorm_code|
+-------------------------+-------------+-----------+
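Because the mapper returns the code and the term type in a single string, downstream code may want to split them. A minimal sketch, assuming the `code (TTY)` shape shown in the table above; `split_mapping` and the small label table are hypothetical helpers, and the table covers only the three term types that appear in this example.

```python
import re

# Labels for the RxNorm term types that appear in the example output.
TERM_TYPES = {"SY": "Synonym", "IN": "Ingredient", "BN": "Brand Name"}

MAPPING_PATTERN = re.compile(r"^\s*(\d+)\s*\((\w+)\)\s*$")


def split_mapping(value: str):
    """Split a string such as '829 (BN)' into (code, term_type, label)."""
    match = MAPPING_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unexpected mapping format: {value!r}")
    code, term_type = match.groups()
    return code, term_type, TERM_TYPES.get(term_type, "unknown")
```

For the rows above, this separates `Parlodel` into the code `829` and the term type `BN`, so the two can be stored in separate columns.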

All In One Pipeline

Let's build a pipeline that shows how NER models can be used together with the sbiobertresolve_rxnorm_nih resolver and the rxnorm_nih_mapper chunk mapper.

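# Imports in the style of the Spark NLP Workshop notebooks
# (assumed here; adjust to your installation).
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

# Turn the raw "text" column into a document annotation.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")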

sentenceDetectorDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", 'clinical/models') \
    .setInputCols("document") \
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols("sentence")\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("word_embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_posology_greedy", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "word_embeddings"]) \
    .setOutputCol("ner")

ner_converter_icd = NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")\
    .setWhiteList(['DRUG'])\
    .setPreservePosition(False)

c2doc = Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("doc_ner_chunk") 

sbert_embedder = BertSentenceEmbeddings.pretrained('sbiobert_base_cased_mli', 'en','clinical/models')\
    .setInputCols("doc_ner_chunk")\
    .setOutputCol("sentence_embeddings")\
    .setCaseSensitive(False)
    
rxnorm_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_rxnorm_nih","en", "clinical/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("rxnorm_code")\
    .setDistanceFunction("EUCLIDEAN")
    
chunkMapper = ChunkMapperModel.pretrained("rxnorm_nih_mapper", "en", "clinical/models")\
    .setInputCols(["ner_chunk"])\
    .setOutputCol("mappings")\
    .setRels(["rxnorm_code"])

resolver_pipeline = Pipeline(
    stages = [
        document_assembler,
        sentenceDetectorDL,
        tokenizer,
        word_embeddings,
        clinical_ner,
        ner_converter_icd,
        c2doc,
        sbert_embedder,
        rxnorm_resolver,
        chunkMapper
  ])
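Each stage in this pipeline reads columns that an earlier stage writes, and a mismatch in a column name is an easy mistake to make (the standalone resolver example earlier reads `sbert_embeddings`, while this pipeline uses `sentence_embeddings`). The sketch below is a plain-Python wiring check that does not touch Spark; the stage names and columns are copied from the pipeline above, and `find_unwired_inputs` is a hypothetical helper written for this illustration.

```python
# (stage name, input columns, output column), copied from the pipeline above.
STAGES = [
    ("document_assembler", ["text"], "document"),
    ("sentenceDetectorDL", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("word_embeddings", ["sentence", "token"], "word_embeddings"),
    ("clinical_ner", ["sentence", "token", "word_embeddings"], "ner"),
    ("ner_converter_icd", ["sentence", "token", "ner"], "ner_chunk"),
    ("c2doc", ["ner_chunk"], "doc_ner_chunk"),
    ("sbert_embedder", ["doc_ner_chunk"], "sentence_embeddings"),
    ("rxnorm_resolver", ["sentence_embeddings"], "rxnorm_code"),
    ("chunkMapper", ["ner_chunk"], "mappings"),
]


def find_unwired_inputs(stages, initial_columns=("text",)):
    """Return (stage, column) pairs whose input is not produced earlier."""
    available = set(initial_columns)
    problems = []
    for name, inputs, output in stages:
        problems.extend((name, col) for col in inputs if col not in available)
        available.add(output)
    return problems


unwired = find_unwired_inputs(STAGES)  # empty when every input is wired
```

The listing also makes the branching visible: the resolver and the chunk mapper both start from `ner_chunk`, but the resolver goes through `doc_ner_chunk` and `sentence_embeddings` first, while the mapper reads the chunks directly.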

Conclusion

Entity mapping is a critical task in clinical NLP, and Spark NLP for Healthcare is one of the most popular libraries for it. John Snow Labs keeps the library up to date with new releases every two weeks. New features and models will arrive in upcoming releases, so keep following us!

Spark NLP for Healthcare models are licensed. To use them, you can watch the “Get a Free License For John Snow Labs NLP Libraries” video and request a license at https://www.johnsnowlabs.com/install/.

You can follow us on Medium and LinkedIn for further updates, or join the Slack support channel to get technical support from the developers of Spark NLP. If you want to learn more about the library and start coding right away, please check our certification training notebooks.

Written by Muhammet S.