Building a multi-user platform for real-time, very large graph editing

Sven Dorkenwald
Princeton Systems Course
9 min read · May 15, 2018

This post is written by Zoe Ashwood, Sven Dorkenwald, and Thomas Macrina, and it serves as a final report on a class project for Princeton COS518: Advanced Computer Systems, Spring 2018.
Check out our project on GitHub

Problem statement

We are members of the Seung Lab at the Princeton Neuroscience Institute, where we use computer vision to map neural circuits. More specifically, we use 3D convolutional networks to segment neurons from very large 3D electron microscopy images in an effort to reconstruct connectomes.

Unfortunately, our segmentation algorithms are not perfect, and the neurons they produce require human editing to be corrected. The output of the segmentation algorithm is a set of neuron fragments along with a proposed clustering, so edits amount to adjusting the clustering of fragments. Said another way, our segmentation output is a graph where the nodes are fragments and the edges indicate that two fragments belong to the same neuron, so edits are just adding and removing edges to adjust the connected components of that graph.
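To make this concrete, here is a minimal sketch in plain Python (illustrative names, not our actual backend) of how proofreading edits reduce to edge operations whose effect is captured by connected components:

```python
from collections import defaultdict, deque

class SegmentationGraph:
    def __init__(self):
        # supervoxel ID -> IDs of fragments proposed to belong to the same neuron
        self.adj = defaultdict(set)

    def merge(self, a, b):
        """Add an edge: fragments a and b belong to the same neuron."""
        self.adj[a].add(b)
        self.adj[b].add(a)

    def split(self, a, b):
        """Remove an edge: fragments a and b do not belong to the same neuron."""
        self.adj[a].discard(b)
        self.adj[b].discard(a)

    def neuron(self, a):
        """Return the connected component (proposed neuron) containing fragment a."""
        seen, queue = {a}, deque([a])
        while queue:
            v = queue.popleft()
            for w in self.adj[v] - seen:
                seen.add(w)
                queue.append(w)
        return seen

g = SegmentationGraph()
g.merge(1, 2); g.merge(2, 3)   # fragments 1, 2, 3 are proposed to form one neuron
g.split(2, 3)                  # a proofreader removes the erroneous edge
assert g.neuron(1) == {1, 2} and g.neuron(3) == {3}
```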

Our current editing system only allows one user to make edits at a time, and there is no version control for users to access the state of the graph at earlier time points. We used concepts from class (operational transforms, locking, pub/sub) so that multiple users can edit the graph concurrently, seeing each other’s edits in real time, and so that users can access the graph in its earlier states. While our project is merely a prototype system for a few users, it serves as a test of design decisions for building a platform that scales to thousands of concurrent users, alongside hundreds of researchers running analyses on earlier states of the graph.

Background

We densely segment 3D electron microscopy images into distinct sets of voxels (3D pixels) that represent each neuron in the image. To do this, we first over-segment the image into sets of voxels that are smaller than the neurons we ultimately want to represent. We call these small sets of voxels supervoxels; they are immutable objects, each of which should represent a fragment of only one neuron. The goal is to cluster these supervoxels together so that each cluster (connected component) represents a unique neuron.

To provide a backend for clustering procedures, we use a graph of our supervoxels, where vertices are the supervoxels and edges connect supervoxels that belong to the same cluster. The edges are imperfect, which can lead to two neurons being incorrectly merged into one (see Figure 1a below), or to one neuron being incorrectly split into two.

Figure 1. a: Two neuron branches are incorrectly merged together by previous methods in our segmentation process. b: Each color represents a supervoxel, the immutable components of our segmentation. c: By removing the incorrect edge between two of our supervoxels, we can cluster the supervoxels correctly into their two neuron branches.

Editing a particular clustering requires the ability to quickly query the graph for a cluster’s components and then to quickly recompute the cluster (see Figure 1b & c). To support these fast queries and edits, we have developed a data structure called the “chunked graph”. The ultimate clustering procedure is to have humans proofread the graph; because humans are slow and expensive, this is the rate-limiting step in producing a correct segmentation.

Research demands tend to target a specific neuron or region of space, so while the graph is large, simultaneous edits will often occur in the same local area. The ability to see collaborators’ edits in real time is required so that human effort can be harnessed most effectively.

System Design

Figure 2. System Design diagram

Figures 2 and 3 show our system design. Each client application has a client interface (written in TypeScript) through which the client sends write and read requests to the master. The master resides in the cloud (GCE) and has exclusive read and write access to the graph database, which is stored in distributed cloud storage (Google BigTable). It is accompanied by processing node(s) that fulfill specific tasks for the master, such as receiving HTTPS requests from the client, applying graph edits, or processing meshes (future work). Client interfaces subscribe to a pub/sub system to learn about changes to components of the graph. The master also implements a lock system to cope with conflicting writes; the client should resend a graph edit if the master does not publish a successful write within a specified wait time.
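As a sketch of that retry behavior (the send_edit callable and the confirmation queue are illustrative stand-ins for our HTTPS and pub/sub plumbing, not our actual API), the client-side logic looks roughly like this:

```python
import queue

def send_edit_with_retry(send_edit, confirmations, edit, wait_s=2.0, max_tries=3):
    """Send a graph edit to the master and resend it if no successful write is
    published within wait_s seconds (the retry policy described above)."""
    for _ in range(max_tries):
        send_edit(edit)                                   # HTTPS POST in practice
        try:
            msg = confirmations.get(timeout=wait_s)       # pub/sub confirmation
            if msg.get("edit_id") == edit["edit_id"]:
                return msg                                # master confirmed our write
        except queue.Empty:
            pass                                          # no confirmation: resend
    raise TimeoutError(f"edit {edit['edit_id']} was never confirmed")

# Toy usage: a stand-in "master" that immediately confirms the edit.
confirmations = queue.Queue()
edit = {"edit_id": 42, "op": "merge", "supervoxels": [101, 102]}
send_edit_with_retry(lambda e: confirmations.put({"edit_id": e["edit_id"]}),
                     confirmations, edit)
```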

The client provides a 3D view of the dataset (see Figure 4). When the client selects a supervoxel or moves the viewer, it sends a set of read requests to the master, each containing a supervoxel ID and the bounding cube of the viewer’s location. The master node identifies the connected component of the supervoxel and returns the subgraph of that connected component that exists within the bounding cube. The subgraph is a set of weighted edges between supervoxels. The client uses this subgraph to determine which additional supervoxels to display, including 3D meshes. The client also uses its collection of requested subgraphs to make write requests, called merge and split, that add or remove edges.
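The field names below are illustrative rather than our actual wire format, but they sketch the shape of these read and write requests:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SubgraphRequest:                       # read
    supervoxel_id: int
    bbox_min: Tuple[int, int, int]           # viewer's bounding cube, in voxels
    bbox_max: Tuple[int, int, int]

@dataclass
class SubgraphResponse:
    edges: List[Tuple[int, int, float]]      # weighted edges between supervoxels

@dataclass
class MergeRequest:                          # write: add an edge
    supervoxel_a: int
    supervoxel_b: int

@dataclass
class SplitRequest:                          # write: remove an edge
    supervoxel_a: int
    supervoxel_b: int

req = SubgraphRequest(supervoxel_id=101, bbox_min=(0, 0, 0), bbox_max=(512, 512, 64))
```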

Figure 3. Master Design diagram
Figure 4. To support the proofreading, we use a Google Maps-styled interface called Neuroglancer that allows users to view the segmentation as an overlay on the original image, as well as the 3D meshes for any neurons selected (Neuroglancer n.d.).

ChunkedGraph

Our application requires fast reads and edits to the connected components of a large, sparse graph. A typical graph that we are currently working with has 10⁸ vertices and 10⁹ edges, with connected components ranging from 10² to 10⁶ vertices, and requires about 50GB of storage. In the medium term, we will interface with graphs that have 1000 times as many vertices, edges, and components, and require 50TB of storage. Our vertices also have spatial labels.

It is important that we use the spatial labels of our vertices when querying and editing our connected components. To do this, our lab has created a variant of the disjoint-set data structure: the chunked graph. This data structure partitions the graph into spatial chunks and uses an octree to store a hierarchy of connected components (see Figure 5). As with any octree, this speeds up spatially localized queries significantly compared to a naive flat store.

Figure 5. Basic concepts behind the chunked graph. (1) Graph stored in an octree of spatial chunks. (2) Supervoxels are stored in the lowest-level (level 0) chunks. (3) A chunk only stores edges between objects within that chunk. Edges between objects in neighboring chunks are stored one level above in the octree. (4) Connected components are stored hierarchically from level 1 to N, where level N contains only one chunk.

The chunks at the lowest level contain the immutable supervoxels. These chunks store only the edges between supervoxels within the same chunk. Each supervoxel stores a pointer to an intermediary connected component that represents a set of supervoxels spanning a 2x2x2 block of chunks. This intermediary connected component is assigned a unique ID and is stored in a chunk at the next-higher level of the octree.

Connected components in the next layer contain pointers to their child supervoxels (referenced by chunk ID and supervoxel ID) and pointers to their parent connected component in the next-higher layer of the octree. We call this the ancestry of connected components.
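This ancestry makes the most common query, finding the neuron (top-level component) of a supervoxel, a short walk up the octree. A minimal sketch, with a plain dict of illustrative IDs standing in for the stored parent pointers:

```python
# The parent table here is a plain dict standing in for rows in the graph database.
parent = {
    "sv:101":  "cc:L1:7",   # supervoxel -> level-1 connected component
    "cc:L1:7": "cc:L2:3",   # level-1 component -> level-2 component
    "cc:L2:3": "cc:L3:1",   # the level-3 chunk covers the whole (toy) volume
}

def get_root(node_id, parent_table):
    """Follow parent pointers up the chunk hierarchy to the top-level component,
    i.e. the neuron that the supervoxel currently belongs to."""
    while node_id in parent_table:
        node_id = parent_table[node_id]
    return node_id

assert get_root("sv:101", parent) == "cc:L3:1"
```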

User edits only add or remove edges between supervoxels. These edits require further updates to the ancestry of connected components, adding or removing edges between supervoxels and adding or removing connected components. This write amplification is the price paid for faster queries.

We also require the graph to be stateful: reads must be able to request the state of the graph at a given time point, or at least at a checkpoint (e.g., weekly). This is important for supporting analyses based on the graph. We therefore build version control into the graph database itself. Changes to the graph do not overwrite older data; they only add new connected components and new edges, both tagged with a timestamp. The timestamp of a read then determines the set of vertices and edges that is returned.
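A minimal sketch of this idea (illustrative IDs and integer timestamps; the real rows live in BigTable): every parent pointer is written with a timestamp and never overwritten, and a read at time t simply picks the newest value written at or before t.

```python
# Append-only history of parent pointers; a dict of lists stands in for the
# timestamped cell versions in the graph database.
history = {
    "sv:101": [(100, "cc:L1:7"), (250, "cc:L1:9")],  # re-parented by an edit at t=250
}

def parent_at(node_id, t, history):
    """Return the parent of node_id as of time t, or None if none existed yet."""
    versions = [(ts, p) for ts, p in history.get(node_id, []) if ts <= t]
    return max(versions)[1] if versions else None

assert parent_at("sv:101", 200, history) == "cc:L1:7"   # state before the edit
assert parent_at("sv:101", 300, history) == "cc:L1:9"   # state after the edit
```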

Evaluation

For our tool to enable real-time collaboration between humans, the latency of communicating changes between clients is our most important metric. We measured the latency of our system by simulating client loads and varying the conditions of the simulations.

We conducted test runs of our system, each at least two minutes long, with between two and sixteen clients. We tested our system under two different conditions:

  1. high-read: We used one client to execute read + write commands, while varying the number of clients that executed read commands.
  2. high-write: We had every client execute read + write commands.

We measured the mean time it took for a write operation to propagate from the initiating client to a receiving client (Figure 6).

Figure 6. The top row shows the average latency for write (merge & split) and read (get_subgraph) operations that occurred during high-read simulations. The bottom row shows the same for high-write simulations. Both are broken down by the processes associated with the operation. Read operations do not propagate to a receiving client.
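In outline (field names illustrative, and a local list standing in for the pub/sub transport), the measurement simply compares the timestamp the initiating client attaches to an edit with the arrival time recorded at the receiving client:

```python
import statistics
import time

def mean_propagation_latency(messages):
    """Each message carries the timestamp stamped by the initiating client and
    the arrival time recorded by the receiving client."""
    return statistics.mean(m["received_at"] - m["sent_at"] for m in messages)

# Toy usage with made-up timings.
now = time.time()
msgs = [{"sent_at": now, "received_at": now + 0.8},
        {"sent_at": now, "received_at": now + 1.1}]
print(f"mean propagation latency: {mean_propagation_latency(msgs):.2f} s")
```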

Discussion

Both our write and read latencies are unsatisfactory even at their minimum, but our read latencies are particularly slow. This is most noticeable in the high-write simulation, where write latencies increase only slightly with the number of clients while read latencies balloon. This helped us pinpoint some read inefficiencies that we intend to address in future work:

  1. Our read operation (get_subgraph) attempted to find the supervoxels in the entire volume. This was done for testing purposes and should be restricted to significantly smaller subvolumes in production. The operation’s runtime is linear in the volume of a chunk, and we are considering making chunks 16x smaller. Since chunks load in parallel, this would further speed up reads of the same effective volume.
  2. To easily associate supervoxels with their particular chunk, supervoxels are relabeled according to a dataset-wide lookup table when the chunked graph is built. Any read operation currently requires translating the chunked graph ID back to the original ID. This could be done asynchronously, or with a function that allows the client to perform the conversion itself (see the sketch after this list).
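One possible shape for the second item (purely illustrative, not our current API): ship the relevant portion of the relabeling table to the client with each subgraph, so the client performs the conversion locally and the master’s read path skips the lookup.

```python
from typing import Dict, List

def to_original_ids(chunked_ids: List[int],
                    relabel_table: Dict[int, int]) -> List[int]:
    """Map chunked graph IDs back to the dataset's original supervoxel IDs."""
    return [relabel_table[cid] for cid in chunked_ids]

# Toy usage with made-up IDs; the per-chunk table would arrive with the
# subgraph response rather than being consulted on the master's read path.
relabel_table = {9_070_000_001: 12_345, 9_070_000_002: 67_890}
print(to_original_ids([9_070_000_001, 9_070_000_002], relabel_table))
```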

We also need to explore improvements to the write operations. The largest opportunity lies in the conservative locking system that the master uses: before issuing a write operation, the master checks all the historic IDs associated with a root to see whether any of them is currently being operated upon. This check could be shortened by limiting the search to recent history, and potentially to only the most recent root ID. We could do away with the search entirely if we moved all relabeling operations to the client, leaving the master to only decide the order of operations and to asynchronously check whether a client operation was valid (Feldman et al. 2010).
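As a rough sketch of the check in question (a plain set of locked IDs stands in for the master’s lock table; names and IDs are illustrative):

```python
def try_lock(historic_root_ids, locked):
    """historic_root_ids: all IDs this root has had over time, oldest first.
    The conservative check scans the full history; a narrowed variant would
    inspect only the most recent root ID (historic_root_ids[-1:])."""
    if any(i in locked for i in historic_root_ids):
        return False                       # another edit is in flight on this neuron
    locked.add(historic_root_ids[-1])      # lock the current root ID
    return True

# Toy usage: a second edit whose root history overlaps a locked ID is rejected.
locked = set()
assert try_lock([201, 305, 412], locked) is True
assert try_lock([412, 509], locked) is False
```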

Related Work

There are other applications for collaboratively tracing neurons, though they are designed for skeleton-based annotations. These applications keep each user’s results separate and only share results between users after an explicit merge step in which conflicts are raised and resolved, so they do not allow for real-time collaboration. Applications include CATMAID (Saalfeld et al. 2009), KNOSSOS (KNOSSOS n.d.), and webKnossos (Boergens et al. 2017).

Conclusion

We have built a prototype of a system that (1) enables humans to modify neuron segmentation in real-time, and (2) allows access to earlier states of the segmentation. We reimplemented the graph backend that supports these modifications, which allows users to quickly query for the connected component of a supervoxel within a given subvolume, and extended this data structure so that previous states can be easily accessed. We developed a master server that processes read and write requests from clients by maintaining a queue and locking system so that edits to the graph backend have a global order. We set up a pub/sub system so that the master server can broadcast to clients any changes made to the graph backend. We simulated the behavior of two to sixteen clients to measure the latency of a change as it propagates from one client to another.

References

Boergens, Kevin M., Manuel Berning, Tom Bocklisch, Dominic Bräunlein, Florian Drawitsch, Johannes Frohnhofen, Tom Herold, et al. 2017. “webKnossos: Efficient Online 3D Data Annotation for Connectomics.” Nature Methods 14 (7): 691–94.

Feldman, Ariel J., William P. Zeller, Michael J. Freedman, and Edward W. Felten. 2010. “SPORC: Group Collaboration Using Untrusted Cloud Resources.” In OSDI, 10:337–50.

“KNOSSOS — 3D Image Annotation.” n.d. Accessed March 9, 2018. https://knossostool.org/.

Neuroglancer. n.d. Google. Accessed May 15, 2018. https://github.com/google/neuroglancer.

Saalfeld, Stephan, Albert Cardona, Volker Hartenstein, and Pavel Tomancak. 2009. “CATMAID: Collaborative Annotation Toolkit for Massive Amounts of Image Data.” Bioinformatics 25 (15): 1984–86.
