Finance NLP Releases Bug Fixes for Deidentification Pipelines

Version 1.18.0 fixes bugs in the pretrained models.

David Cecchini
John Snow Labs
2 min read · Sep 10, 2023


The latest version of the library fixes several relevant errors in the deidentification pipelines for financial documents. With these fixes, the library is fully compatible with newer versions of Spark.

Deidentification

Deidentification pipelines remove private or personal information from financial documents. They can either mask the information, using entity labels or special characters, or obfuscate it by replacing it with synthetic data. To use them, load the PretrainedPipeline named finpipe_deid:

Obtaining the pipeline:
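A minimal sketch of loading the pipeline, assuming it is published in the `finance/models` repository and that a Spark session was already started with `nlp.start()` (see How to run). The helper name `load_deid_pipeline` is hypothetical; the import sits inside the function so that defining it does not require the licensed library to be installed.

```python
def load_deid_pipeline():
    """Download (on first use) and return the finpipe_deid pretrained pipeline.

    Hypothetical helper: assumes a Spark session started with nlp.start()
    and a valid Finance NLP license.
    """
    # Deferred import: johnsnowlabs is only needed when the helper is actually called.
    from johnsnowlabs import nlp

    # Arguments: pipeline name, language code, and the (assumed) remote model repository.
    return nlp.PretrainedPipeline("finpipe_deid", "en", "finance/models")
```

Once loaded, `pipeline.annotate(text)` returns a dictionary mapping each output column to its results; printing `result.keys()` shows which masked and obfuscated versions your installed pipeline produces.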

Masking with entity labels:
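To make this mode concrete, here is a toy, pure-Python sketch of masking with entity labels. It is not the library's implementation: the entity spans are hard-coded (in the pipeline they come from its NER stages), and the `<LABEL>` placeholder format is an assumption for illustration.

```python
# Toy input: hand-labelled entity spans as (start, end, label), end-exclusive.
text = "John Smith is the CEO of Acme Corp."
entities = [(0, 10, "PERSON"), (25, 34, "ORG")]


def mask_with_labels(text, entities):
    """Replace every entity span with its label in angle brackets."""
    # Walk the spans right to left so earlier offsets stay valid after each edit.
    for start, end, label in sorted(entities, reverse=True):
        text = text[:start] + f"<{label}>" + text[end:]
    return text


print(mask_with_labels(text, entities))  # → <PERSON> is the CEO of <ORG>.
```

Labels hide the values while keeping the type of information visible, which is useful when the masked text is later read by people or fed to other models.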

Masking with special chars:
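A similar toy sketch for masking with special characters, assuming every character of an entity is replaced by `*` so the masked text keeps its original length. The spans are hard-coded and the library's exact placeholder format may differ.

```python
# Toy input: hand-labelled entity spans as (start, end, label), end-exclusive.
text = "John Smith is the CEO of Acme Corp."
entities = [(0, 10, "PERSON"), (25, 34, "ORG")]


def mask_with_chars(text, entities, char="*"):
    """Replace each entity with a run of `char` as long as the entity itself."""
    # Process spans right to left so the offsets of earlier spans remain valid.
    for start, end, _label in sorted(entities, reverse=True):
        text = text[:start] + char * (end - start) + text[end:]
    return text


masked = mask_with_chars(text, entities)
print(masked)
```

Preserving the length keeps the document layout intact, at the cost of revealing how long each hidden value was.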

Masking with fixed-length chars:
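With fixed-length masking, every entity becomes the same placeholder regardless of its size. In this toy sketch the length of 4 and the `*` character are assumptions, and the spans are hard-coded rather than detected.

```python
# Toy input: hand-labelled entity spans as (start, end, label), end-exclusive.
text = "John Smith is the CEO of Acme Corp."
entities = [(0, 10, "PERSON"), (25, 34, "ORG")]


def mask_fixed_length(text, entities, char="*", length=4):
    """Replace each entity with exactly `length` copies of `char`."""
    # Right-to-left replacement keeps the remaining (earlier) offsets correct.
    for start, end, _label in sorted(entities, reverse=True):
        text = text[:start] + char * length + text[end:]
    return text


print(mask_fixed_length(text, entities))  # → **** is the CEO of ****.
```

Unlike same-length masking, this does not leak the length of the original value, but the text no longer lines up with the source document.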

Obfuscating with synthetic data:
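Obfuscation replaces entities with realistic fake values instead of placeholders. The toy sketch below assumes hand-labelled spans and small, made-up pools of replacement values (`FAKE_VALUES` is hypothetical; the library draws on its own synthetic data). It keeps repeated mentions consistent by reusing the replacement chosen for the first occurrence.

```python
import random

# Toy input: "John Smith" appears twice to show consistent replacement.
text = "John Smith is the CEO of Acme Corp. John Smith signed."
entities = [(0, 10, "PERSON"), (25, 34, "ORG"), (36, 46, "PERSON")]

# Hypothetical pools of synthetic values per entity label.
FAKE_VALUES = {
    "PERSON": ["Jane Doe", "Mark Lee", "Ana Ruiz"],
    "ORG": ["Globex Inc", "Initech LLC", "Umbrella Co"],
}


def obfuscate(text, entities, seed=0):
    """Replace each entity with a synthetic value of the same label.

    The same original string always maps to the same fake value, and a fixed
    seed makes the output reproducible.
    """
    rng = random.Random(seed)
    mapping = {}
    # First pass (left to right): choose one replacement per distinct original value.
    for start, end, label in sorted(entities):
        original = text[start:end]
        if original not in mapping:
            mapping[original] = rng.choice(FAKE_VALUES[label])
    # Second pass (right to left): substitute, so earlier offsets stay valid.
    for start, end, _label in sorted(entities, reverse=True):
        text = text[:start] + mapping[text[start:end]] + text[end:]
    return text


print(obfuscate(text, entities))
```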

Fancy trying?

We offer 30-day free licenses with technical support from our financial team of technical experts and SMEs. The trial includes complete access to more than 150 models, including Classification, NER, Relation Extraction, Similarity Search, Summarization, Sentiment Analysis, Question Answering, etc., and 50+ financial language models.

Just go to https://www.johnsnowlabs.com/install/ and follow the instructions!

Don’t forget to check out our notebooks and demos.

How to run

Finance NLP is easy to run on both clusters and driver-only environments using the johnsnowlabs library:

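!pip install johnsnowlabs  # notebook syntax; in a terminal, run: pip install johnsnowlabs
from johnsnowlabs import nlp
# Install the licensed libraries; force_browser=True opens a browser window to log in and fetch your license
nlp.install(force_browser=True)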

Then we can import the Finance NLP module and start working with Spark.

from johnsnowlabs import nlp, finance
# Start a Spark session with the licensed libraries loaded
spark = nlp.start()

For alternative installation methods in specific environments, please check the docs.
