Cross-Dataset Segmentation Models

Jonathan Striebel
Published in WEBKNOSSOS · Jun 4, 2019

At scalable minds we work closely with research labs in the Connectomics field and beyond, helping them with automated image analysis and EM dataset reconstructions. We use state-of-the-art machine learning algorithms for automated dense reconstruction of neuron connections in brain tissue. Our alignment and segmentation tools are optimized to handle terabyte-scale 3D EM datasets.

(left) Raw microscopy data from SegEM[1]. (right) Our dense neuron segmentation method applied.

Usually, segmentation models require a large amount of training data. Gathering and annotating such datasets manually not only requires expert knowledge but also considerable resources and time. To minimize the data needed for a high-quality segmentation, we train our machine learning models on multiple already-annotated, available datasets before fine-tuning on the actual training data of the target dataset.

In preparation for the Connectomics Conference in Berlin, we validated this approach with two experiments presented here:

  1. How much training data is necessary to train a competitive model on the SegEM dataset [1]?
  2. Can we save on training data by fine-tuning a model trained on other datasets?

To evaluate the methods, we measure whether a neuron was wrongly split into two segments (a split error) and whether two neurons were wrongly merged into one (a merge error). We optimized for a weighted error of splits and mergers, which compensates for the higher effort of resolving merge errors during proofreading.
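As a minimal sketch, the weighted error described above can be expressed as a simple combination of the two error counts. The exact weighting used in our experiments is not stated in the post, so the `merger_weight` value here is a hypothetical placeholder:

```python
def weighted_error(num_splits: int, num_mergers: int, merger_weight: float = 2.0) -> float:
    """Combine split and merge errors into a single score.

    Merge errors are weighted more heavily (merger_weight is a
    hypothetical value, not the one used in our experiments) because
    resolving a wrongly merged pair of neurons takes more proofreading
    effort than reconnecting a wrongly split one.
    """
    return num_splits + merger_weight * num_mergers


# Example: 10 split errors and 5 merge errors
score = weighted_error(10, 5)
```

Minimizing such a score biases model selection toward segmentations that err on the side of splits rather than mergers.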

Insight 1: 20M voxels of training data suffice

Our first experiment (blue plot) indicates that 20M voxels of training data (10% of the available training data) are sufficient to reach acceptable performance. Adding further training data provides only incremental improvements.

Insight 2: With pretraining, 50–80% of the data can be saved; 10M voxels suffice

By fine-tuning a model pretrained on other datasets (orange plot), as little as 10M voxels (5% of the SegEM training data) already produces acceptable results, and 40M voxels (20% of the SegEM training data) yield results comparable to the model trained on the full training dataset. For the same amount of SegEM training data, the pretrained model consistently outperforms the pure SegEM version.
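The pretrain-then-fine-tune workflow can be sketched as follows. This is an illustrative PyTorch example, not our actual pipeline: the real architecture, checkpoint file, and hyperparameters are assumptions, and a toy 3D convolutional network stands in for the segmentation model:

```python
import torch
import torch.nn as nn

# Toy stand-in for a 3D segmentation backbone (the actual architecture
# used in the post is not public).
model = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(16, 2, kernel_size=3, padding=1),  # e.g. boundary vs. interior
)

# Step 1: load weights pretrained on other annotated datasets instead of
# starting from random initialization (file name is hypothetical):
# model.load_state_dict(torch.load("pretrained_on_other_datasets.pt"))

# Step 2: fine-tune on a small amount of target-dataset data, typically
# with a lower learning rate than used for pretraining:
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

raw = torch.randn(1, 1, 32, 32, 32)            # stand-in EM volume patch
labels = torch.randint(0, 2, (1, 32, 32, 32))  # stand-in voxel labels

for _ in range(2):  # a couple of illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(model(raw), labels)
    loss.backward()
    optimizer.step()
```

Because the pretrained weights already encode generic EM image features, the fine-tuning stage only needs to adapt them to the target dataset, which is why far fewer annotated voxels are required.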

We are currently working on improving this method so that our models perform well even on unseen datasets with no training data at all. Get in touch with us if you would like to know more about our segmentation services or would like to apply them to your own datasets.

[1] M. Berning et al. SegEM: Efficient Image Analysis for High-Resolution Connectomics. Cell, 2015.
