How to Run Analytics on Medical Images Using Snowflake

DICOM (Digital Imaging and Communications in Medicine) is a standard protocol for the management and transmission of medical images and related data, and it is used in many healthcare facilities. A DICOM image stores a wealth of metadata, such as the patient's name and birth date, the performing physician's name, and the referring physician's name. Data analysts and data scientists use this metadata for analytics.

Until now, customers had to extract this data outside Snowflake and then load it into Snowflake to run analytics. To extract the data, customers often write custom Python, Spark, or Java code and execute it on separate compute infrastructure.

With the latest release of unstructured data support in Snowflake (in public preview), customers no longer need to run a processing pipeline outside Snowflake. Customers can now store DICOM images directly in Snowflake and process them with a Java function to extract data or run analytics on the fly.

Here are the steps needed to do so.

  1. Create a stage (internal or external) where the DICOM images will be stored.
  2. Create an inline Java function to read a DICOM image and return its data.
  3. [Optional] Build a pipeline using streams and tasks for continuous ingestion of DICOM images (a sketch of this step appears at the end of the post).

The sample code for steps #1 and #2 can be found below.

-- Create an external stage to store the images
create or replace stage dicom_images url = 's3://sshah-demo/unstructured/dicom/'
directory = (enable = true auto_refresh = true)
storage_integration = sshah_demo_storage;
-- Refresh the directory
alter stage dicom_images refresh;
-- Select from directory
select * from directory(@dicom_images);
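
Step #1 permits an internal stage as well. Below is a minimal sketch of that variant (the stage name dicom_images_internal is hypothetical; internal stages need no storage integration, and server-side encryption keeps the staged files accessible through Snowflake file URLs):

-- Alternative for step #1: an internal stage with a directory table
create or replace stage dicom_images_internal
directory = (enable = true)
encryption = (type = 'SNOWFLAKE_SSE');
-- Upload images with PUT from SnowSQL (illustrative local path), then refresh
-- put file:///tmp/dicom/*.dcm @dicom_images_internal auto_compress = false;
alter stage dicom_images_internal refresh;
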
-- Create an inline Java function to parse a DICOM image
create or replace function parse_dicom(file string)
returns string
language java
imports = ('@jars_stage/dcm4che-core-5.24.2.jar', '@jars_stage/log4j-1.2.17.jar',
           '@jars_stage/slf4j-api-1.7.30.jar', '@jars_stage/slf4j-log4j12-1.7.30.jar',
           '@jars_stage/gson-2.8.7.jar')
handler = 'DicomParser.Parse'
as
$$
import org.dcm4che3.data.Attributes;
import org.dcm4che3.data.Tag;
import org.dcm4che3.io.DicomInputStream;
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import com.google.gson.Gson;

public class DicomParser {
    // Snowflake opens the staged file referenced by the SQL string argument
    // and passes its contents to the handler as an InputStream
    public static String Parse(InputStream stream) throws IOException {
        // Read the DICOM dataset; reference bulk (pixel) data by URI
        // instead of loading it into memory
        try (DicomInputStream dis = new DicomInputStream(stream)) {
            dis.setIncludeBulkData(DicomInputStream.IncludeBulkData.URI);
            Attributes attrs = dis.readDataset(-1, -1);

            // Collect the metadata tags of interest
            Map<String, String> attributes = new HashMap<String, String>();
            attributes.put("PerformingPhysicianName", attrs.getString(Tag.PerformingPhysicianName));
            attributes.put("PatientName", attrs.getString(Tag.PatientName));
            attributes.put("PatientBirthDate", attrs.getString(Tag.PatientBirthDate));
            attributes.put("Manufacturer", attrs.getString(Tag.Manufacturer));
            attributes.put("PatientID", attrs.getString(Tag.PatientID));
            attributes.put("PatientSex", attrs.getString(Tag.PatientSex));
            attributes.put("PatientWeight", attrs.getString(Tag.PatientWeight));
            attributes.put("PatientPosition", attrs.getString(Tag.PatientPosition));
            attributes.put("StudyID", attrs.getString(Tag.StudyID));
            attributes.put("PhotometricInterpretation", attrs.getString(Tag.PhotometricInterpretation));
            attributes.put("RequestedProcedureID", attrs.getString(Tag.RequestedProcedureID));
            attributes.put("ProtocolName", attrs.getString(Tag.ProtocolName));
            attributes.put("ImagingFrequency", attrs.getString(Tag.ImagingFrequency));
            attributes.put("StudyDate", attrs.getString(Tag.StudyDate));
            attributes.put("StudyTime", attrs.getString(Tag.StudyTime));
            attributes.put("ContentDate", attrs.getString(Tag.ContentDate));
            attributes.put("ContentTime", attrs.getString(Tag.ContentTime));
            attributes.put("InstanceCreationDate", attrs.getString(Tag.InstanceCreationDate));
            attributes.put("SpecificCharacterSet", attrs.getString(Tag.SpecificCharacterSet));
            attributes.put("StudyDescription", attrs.getString(Tag.StudyDescription));
            attributes.put("ReferringPhysicianName", attrs.getString(Tag.ReferringPhysicianName));
            attributes.put("ImageType", attrs.getString(Tag.ImageType));
            attributes.put("ImplementationVersionName", attrs.getString(Tag.ImplementationVersionName));
            attributes.put("TransferSyntaxUID", attrs.getString(Tag.TransferSyntaxUID));

            // Serialize the attribute map to a JSON string
            return new Gson().toJson(attributes);
        }
    }
}
$$
;
-- Test the Java function
select
relative_path,
file_url,
parse_json(parse_dicom('@dicom_images/' || relative_path)) as data,
data:PatientName::string as PatientName,
data:PatientID::string as PatientID,
data:StudyDate::string as StudyDate,
data:StudyTime::string as StudyTime,
data:StudyDescription::string as StudyDescription,
data:ImageType::string as ImageType,
data:PhotometricInterpretation::string as PhotometricInterpretation,
data:Manufacturer::string as Manufacturer,
data:PatientPosition::string as PatientPosition,
data:PatientSex::string as PatientSex,
data:PerformingPhysicianName::string as PerformingPhysicianName,
data:ImagingFrequency::string as ImagingFrequency,
data:ProtocolName::string as ProtocolName
from directory(@dicom_images)
limit 5;
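
The query above parses each image on the fly. To avoid re-reading every image on each query, the extracted metadata can also be materialized into a table. A minimal sketch (the table name dicom_metadata is hypothetical):

-- Materialize the extracted metadata once, then query it like any table
create or replace table dicom_metadata as
select
relative_path,
parse_json(parse_dicom('@dicom_images/' || relative_path)) as data
from directory(@dicom_images);
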

Using this simple Java function, customers can now process DICOM images, extract data from them, and run analytics on that data directly in Snowflake.
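
Step #3 from the list above can build on the same pieces. Below is a minimal sketch of a continuous-ingestion pipeline, assuming the dicom_metadata table from the previous snippet; the stream, task, and warehouse names (dicom_stream, parse_new_dicom_files, my_wh) are hypothetical:

-- Track newly arrived files via a stream on the stage's directory table
create or replace stream dicom_stream on stage dicom_images;
-- Task that periodically parses any new images and appends their metadata
create or replace task parse_new_dicom_files
warehouse = my_wh
schedule = '5 minute'
when system$stream_has_data('dicom_stream')
as
insert into dicom_metadata
select relative_path, parse_json(parse_dicom('@dicom_images/' || relative_path))
from dicom_stream
where metadata$action = 'INSERT';
-- Tasks are created suspended; resume to start the schedule
alter task parse_new_dicom_files resume;
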

Note: Processing staged files with Java functions is in limited private preview right now. If you are interested in trying it out, please reach out to your Snowflake account team.

To try this code in your Snowflake account, you will need to download the required binaries and upload them to a Snowflake stage (@jars_stage in my example):

@jars_stage/dcm4che-core-5.24.2.jar
@jars_stage/log4j-1.2.17.jar
@jars_stage/slf4j-api-1.7.30.jar
@jars_stage/slf4j-log4j12-1.7.30.jar
@jars_stage/gson-2.8.7.jar
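
With SnowSQL, the upload could look like the sketch below (the local paths are illustrative; auto_compress = false keeps the jars importable as-is):

-- Create a stage for the jars and upload them from a local machine
create or replace stage jars_stage;
put file:///tmp/jars/dcm4che-core-5.24.2.jar @jars_stage auto_compress = false;
put file:///tmp/jars/log4j-1.2.17.jar @jars_stage auto_compress = false;
put file:///tmp/jars/slf4j-api-1.7.30.jar @jars_stage auto_compress = false;
put file:///tmp/jars/slf4j-log4j12-1.7.30.jar @jars_stage auto_compress = false;
put file:///tmp/jars/gson-2.8.7.jar @jars_stage auto_compress = false;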

Saurin Shah

Product @ Snowflake. Passionate about building products to help customers derive more value out of their data.