WhyHow.AI

WhyHow.AI’s platform helps devs and non-technical domain experts build Agentic and RAG-native knowledge graphs

Introducing PatientSeek, the first Open-Source MED-LEGAL DeepSeek reasoning model

We are excited to introduce PatientSeek, an open-source MED-LEGAL reasoning model trained on one of the largest accessible datasets of medical records, which can be run locally and securely.

We finetuned a DeepSeek R1 model on one of the largest accessible datasets of patient records for the purpose of medical summaries and question answering. We preprocessed tens of thousands of patient records in a way that aligns with the needs of the MED-LEGAL space, and leveraged the reasoning capabilities of the DeepSeek model family to replicate how correlations between patient records and external events are discovered. We used CometML for dataset storage and experiment tracking, Unsloth and Hugging Face TRL for the finetuning, and AWS SageMaker (funded by credits generously offered to us through the NVIDIA Inception startup program) to produce this model, which can be found here: https://huggingface.co/whyhow-ai/PatientSeek
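
For readers curious what the Unsloth + TRL side of a finetune like this can look like, here is a minimal sketch of a LoRA finetune of a DeepSeek R1 distill. The base model name, hyperparameters, and dataset file below are placeholders for illustration, not our actual training configuration.

```python
# Minimal sketch of an Unsloth + TRL LoRA finetune (illustrative only;
# the base model, hyperparameters, and dataset below are placeholders,
# not the actual PatientSeek training setup).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",  # placeholder base model
    max_seq_length=8192,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical JSONL file where each row holds one fully formatted example.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=8192,
    args=TrainingArguments(
        output_dir="patientseek-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
)
trainer.train()
```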

To run this model, download it from https://huggingface.co/whyhow-ai/PatientSeek and follow these instructions: https://unsloth.ai/blog/deepseek-r1
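
For a quick local test, a sketch like the following should work, assuming the published weights load with the standard Hugging Face causal-LM API and chat template; the prompt is purely illustrative.

```python
# Sketch of local inference with Hugging Face transformers (assumes the
# PatientSeek weights load with the standard causal-LM API and chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "whyhow-ai/PatientSeek"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Summarize this patient's medication history: ..."}]  # illustrative prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
# Print only the newly generated tokens (the model's reasoning and answer).
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```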

Business Value of this Model

In the “MED-LEGAL” domain (defined as industries and workflows that touch both legal and healthcare considerations), we were looking to improve the state of the art in two key areas:

  • Disease and Diagnosis identification
  • Hypothesis testing of correlation & cause and effect

We built this model because MED-LEGAL workflows typically require a number of correlations and associations, particularly around healthcare causation questions that must stand up to legal standards, that are not present in traditional medical or legal workflows. Given our team’s unique background in both law and healthcare, we have been collecting and preprocessing this data with medical professionals for a while now, and the emergence of strong open-source reasoning models like DeepSeek comes at a fortuitous time.

Repetitive tasks for these practitioners can now be supported by a suite of smartly orchestrated models and agents working in collaboration with the necessary human expertise. For example, a quick patient history or a question about diabetes medication use can provide necessary context for a live patient conversation, and the model’s reasoning can highlight things that were not immediately obvious.

Why Now: With the release of DeepSeek R1, as well as the broader commercial acceptance and uptake of automated reasoning, we can start to use data to steer the reasoning in the direction that we want. Further, the general capabilities of the models do not need to be extended, only honed to respond and reason in the ways that best support practitioners. In this way, we can be confident that as we expand our suite of specialized models and agents, they will be well tuned to the tasks required.

With this model, we were optimizing for a model small enough to be run offline, locally, privately, and securely, which is crucial for organizations that handle sensitive patient data. We include O1 as an apples-to-oranges benchmark for accuracy, and show performance as good as or better than O1, despite DeepSeek having roughly 30x lower costs and the ability to be run in private, local environments.

Data Infrastructure as a key unlock for Performance

Finetuning is not about dumping random data into a model and calling it a day. DeepSeek’s existence and advancements are predicated on the idea of deliberate data structuring for better performance, and that is part of the attitude we take to creating models to solve business problems.

Rarely is data collected in a format ready for finetuning; it needs to be preprocessed into a format that reflects the business objective. Further, different model architectures and model types (instruct, SFT, etc.) require different formats (a great reference is Unsloth’s Datasets 101, here: https://docs.unsloth.ai/basics/datasets-101). Because this was a finetuning of a reasoning model, we needed many examples with consistent formatting and a variety of answers. The value of this preprocessing cannot be overstated, especially since data processing done right can keep organizations up to date with the latest model architectures.
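
As an illustration of what this shaping step can look like for a reasoning model, a training example typically packs the question, the worked reasoning, and the final answer into one consistently formatted string. The field names and chat markers below are hypothetical and not our actual PatientSeek schema.

```python
# Hypothetical sketch of turning a preprocessed record into a reasoning-style
# training example; the field names and tags are illustrative, not the actual
# PatientSeek data format.
def to_training_example(record: dict) -> dict:
    text = (
        "<|User|>" + record["question"] + "\n"
        "<|Assistant|><think>\n" + record["reasoning"] + "\n</think>\n"
        + record["answer"]
    )
    return {"text": text}

# Placeholder content, purely to show the shape of one row.
example = to_training_example({
    "question": "Is the neuropathy documented in 2021 consistent with the patient's diabetes history?",
    "reasoning": "Step through the timeline of diagnoses, lab values, and medication changes...",
    "answer": "A short, sourced conclusion goes here.",
})
```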

We intend to build a suite of models designed to take advantage of the latest developments in reasoning and adapt them to specific tasks and use cases. These models will help with standard tasks like correlation analysis, medical knowledge graph creation, entity extraction, reasoning, action-taking, conversation, and many others, to power agentic architectures. Our training set did not contain PII and was created compliantly and commercially.

Model Evaluation

We benchmarked against popular general models that can be run locally and securely, and against O1, which cannot be run locally or privately but represents the state of the art for those who care only about accuracy (and not cost or privacy).

Our evaluation demonstrates PatientSeek’s specialized capabilities across different medical tasks, showing particular strength in complex medical reasoning. While all models perform well on basic tasks like extracting patient demographics (with accuracies ranging from 89.7% to 97.8%), PatientSeek demonstrates increasing advantages as task complexity grows.

In basic clinical tasks such as condition detection and vital signs analysis, PatientSeek achieves ~90% accuracy, outperforming other open source models while maintaining an O1 level of performance. This advantage becomes even more pronounced in complex medical tasks like generating patient summaries and treatment planning, where PatientSeek maintains ~90% accuracy while other models show significant performance drops. The advantage over O1 and high performing open source models in complex tasks highlights PatientSeek’s specialized medical capabilities, achieved through focused training on medical documentation and clinical workflows, as well as specifically fine-tuned medical QA.

[Chart: benchmark results on complex reasoning tasks]

[Chart: benchmark results on basic tasks]

These results suggest that while general-purpose language models can handle basic medical tasks adequately, specialized models like PatientSeek offer substantial benefits for more complex reasoning processes and medical applications. This is particularly relevant for providers seeking reliable automation of sophisticated medical documentation and analysis tasks.

Even for more basic tasks, PatientSeek was clearly superior to other locally run models and competitive with O1, especially when we take into account cost and the ability to run locally. As a comparison, DeepSeek R1 as an API is approximately 27x cheaper than O1.

In our case, PatientSeek was significantly cheaper still. We host the model on AWS, with each basic question (30k input tokens, 2k output tokens) costing <$0.01 and each more complex question still costing <$0.05, even with the verbose reasoning output of R1. We also have the system running on M2 Macs through Ollama, which is functionally free.
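
As a sketch of the Ollama path, the local model tag below is hypothetical; it assumes a GGUF build of the weights has already been imported into Ollama (for example via a Modelfile).

```python
# Sketch of fully local inference through the Ollama Python client.
# The "patientseek" tag is hypothetical and assumes the model has already
# been imported into Ollama; the prompt is illustrative.
import ollama

response = ollama.chat(
    model="patientseek",  # hypothetical local model tag
    messages=[{
        "role": "user",
        "content": "Give a one-paragraph history of this patient's diabetes management: ...",
    }],
)
print(response["message"]["content"])
```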

PatientSeek is the first publicly available, open-source, locally running R1 reasoning model finetuned on patient records, with human-level comprehension in the legal-medical domain. As we continue to develop models and build products that support MED-LEGAL workflows, whether for quickly understanding a patient’s history or for making relevant associations between patient-specific causes and effects, we will keep updating and adapting the latest models to the most relevant problems these practitioners face.

If this sounds like you, get in touch here or follow our work at WhyHow.AI
