Cross-Dataset Segmentation Models

Jonathan Striebel
Jun 4, 2019 · 2 min read

At scalable minds we work closely with research labs in the connectomics field and beyond, helping them with automated image analysis and EM dataset reconstruction. We use state-of-the-art machine learning algorithms for automated dense reconstruction of neuron connections in brain tissue. Our alignment and segmentation tools are optimized to handle terabyte-scale 3D EM datasets.

(left) Raw microscopy data from SegEM [1]. (right) Our dense neuron segmentation method applied.

Segmentation models usually require a large amount of training data. Gathering and annotating such datasets manually requires not only expert knowledge but also considerable resources and time. To minimize the data needed for a high-quality segmentation, we train our machine learning models on multiple publicly available annotated datasets before fine-tuning on the target dataset's training data.

In preparation for the Connectomics Conference in Berlin, we validated this approach with two experiments presented here:

  1. How much training data is necessary to train a competitive model on the SegEM dataset [1]?
  2. Can we save on training data by fine-tuning a model trained on other datasets?

To evaluate the methods, we count split errors (one neuron divided into multiple segments) and merge errors (two neurons wrongly joined into one). We optimized for a weighted error of splits and mergers, which compensates for the higher effort of resolving merge errors during proofreading.
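Such a weighted error can be sketched as a simple linear combination. The weight value below is hypothetical: the post only states that mergers are penalized more heavily, not by how much.

```python
def weighted_error(n_splits: int, n_mergers: int, merger_weight: float = 3.0) -> float:
    """Combine split and merge counts into a single score.

    merger_weight is a hypothetical value; the post does not specify the
    actual weight used, only that mergers cost more to fix than splits.
    """
    return n_splits + merger_weight * n_mergers

# Under a 3x merger weight, 10 splits and 2 mergers score the same
# as 16 splits and no mergers.
print(weighted_error(10, 2))  # 16.0
```

Optimizing this score instead of raw error counts pushes the model toward over-splitting rather than over-merging, since splits are cheaper to repair.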

Insight 1: 20M voxels training data suffice

Our first experiment (blue plot) indicates that 20M voxels of training data (10% of the available training data) are sufficient to reach acceptable performance. Adding further training data provides only incremental improvements.

Insight 2: Pretraining saves 50–80% of training data; 10M voxels suffice

By fine-tuning a model pretrained on other datasets (orange plot), as little as 10M voxels (5% of the SegEM training data) produces acceptable results, and 40M voxels (20% of the SegEM training data) yield results comparable to the model trained on the full training dataset. For the same amount of SegEM training data, the pretrained model consistently outperforms the model trained on SegEM alone.
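The pretrain-then-fine-tune recipe can be illustrated with a deliberately tiny stand-in: a linear model trained by gradient descent, where the "other datasets" and the "target dataset" are synthetic and drawn from similar underlying functions. Everything here (the data, the model, the epoch counts) is a hypothetical toy, not the actual segmentation setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_train(w, X, y, lr=0.1, epochs=200):
    """Plain full-batch gradient descent on mean-squared error for a linear model."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Hypothetical stand-ins: a large "other" dataset and a small target dataset,
# generated from slightly different underlying weights.
true_w = np.array([1.0, -2.0, 0.5])
X_pre = rng.normal(size=(1000, 3)); y_pre = X_pre @ true_w
X_tgt = rng.normal(size=(20, 3));   y_tgt = X_tgt @ (true_w + 0.1)

# From scratch: random init, trained briefly on the small target set only.
w_scratch = sgd_train(rng.normal(size=3), X_tgt, y_tgt, epochs=5)

# Pretrain on the large dataset, then fine-tune on the small one.
w_pre  = sgd_train(np.zeros(3), X_pre, y_pre)
w_fine = sgd_train(w_pre, X_tgt, y_tgt, epochs=5)

print("scratch MSE:", mse(w_scratch, X_tgt, y_tgt))
print("fine-tuned MSE:", mse(w_fine, X_tgt, y_tgt))
```

Because the fine-tuned model starts close to a good solution, it reaches a lower error on the small target set with the same (short) training budget, which is the effect the orange plot shows at dataset scale.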

We are currently working on improving this method so that our models perform well even on unseen datasets with no training data at all. Get in touch with us if you would like to know more about our segmentation services or would like to apply them to your own datasets.

[1] M Berning et al. SegEM: Efficient Image Analysis for High-Resolution Connectomics. Cell. 2015
