May 29, 2025
Introducing Volara: an open-source package for processing volumetric microscopy data
By Arlo Sheridan & Will Patton
TLDR
- We are releasing Volara, an open-source Python package to assist with processing large n-dimensional volumes.
- Provides built-in functionality for affinity-graph-based neuron segmentation and other microscopy ops
- Supports different compute contexts (local, SLURM, LSF)
- Abstractions for common operations
- Plug-in system for easy extension
- Get started: GitHub | Tutorial
Background
E11 Bio is excited to release its first software package, Volara: an open-source Python library that facilitates common block-wise image processing operations on large volumetric microscopy datasets.
When working with large n-dimensional datasets, efficient and scalable processing is necessary. Processing in this context refers to any number of image processing tasks that must be distributed over many workers to transform or analyze a large volume. Complex image processing pipelines have generally been challenging for non-experts to use, limiting the accessibility of cutting-edge methods. The broader community has made many great developments addressing the various challenges of distributed processing; Dask, Chunkflow, and Daisy are a few examples of libraries that help scale methods to larger volumes.
Over the years we have worked extensively with Daisy, a lightweight Python package for processing large n-dimensional data, developed by the Funke lab at Janelia. It has proven to be an excellent framework for scaling novel methods to run on large volumes. We wanted to extend it by adding common block-wise task abstractions to provide both users and developers with nice-to-have functionality. This led to the creation of Volara, along with significant contributions to Daisy itself to support Volara's needs.
We developed Volara for neuron segmentation tasks in optical connectomics datasets created with methods such as PRISM, which we have discussed in more detail here. Volara therefore contains the necessary logic to extract segmentations from affinity graphs. However, at its core, Volara aims to create block-wise task abstractions and is thus easily extendable for other image processing pipelines.
Since it is built on top of Daisy, the complexity of dividing work into manageable blocks (block management, parallelization, and result aggregation) is handled transparently. Volara also provides many nice-to-have features such as block checks, detailed progress bars, organized logging, and visualization of completed blocks. Volara makes it easy to run the same code in different compute contexts, which provides a clean path from implementing and debugging code locally to deploying it on an HPC cluster. It additionally allows for concise task chaining and offers a plugin system for creating custom tasks. We hope that these features make block-wise processing of massive volumes easier, more robust, and more repeatable.
Architecture
This diagram visualizes the lifetime of a block in Volara. On the left we are reading array and/or graph data with optional padding for a specific block. This data is then processed, and written to the output on the right. For every block processed we also mark it done in a separate Zarr. Once each worker completes a block, it will fetch the next. This process continues until the full input dataset has been processed.
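To make this lifecycle concrete, below is a tiny, self-contained toy version of the same loop in Python. It is purely illustrative: real Volara blocks carry read/write ROIs with padding and the completion state lives in a Zarr, whereas here a block is just an index range and the done-state is a Python set.

from collections import deque

volume = list(range(12))        # stand-in for a large input array
output = [None] * len(volume)   # stand-in for the output dataset
done = set()                    # stand-in for the separate "done" Zarr
block_size = 4

blocks = deque(range(0, len(volume), block_size))
while blocks:
    start = blocks.popleft()                    # worker fetches the next block
    if start in done:
        continue                                # previously completed: skip it
    chunk = volume[start:start + block_size]    # read this block's input
    result = [x * 2 for x in chunk]             # process (here: double values)
    output[start:start + block_size] = result   # write to the output
    done.add(start)                             # mark the block done

print(output)  # [0, 2, 4, ..., 22]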
Key Features
1. Common Tasks
Volara comes with built-in support for common operations:
Supports lazy operations: by using Dask, Volara can perform many operations such as thresholding, normalization, slicing, and dtype conversion on the fly
Model Prediction: Run PyTorch machine learning models on large volumes. Currently supports any image-to-image torch model.
Examples: See this tutorial for blockwise torch predictions on data from the CREMI challenge
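For example, the following dataset definition lazily selects three channels from a Zarr array and rescales the intensities on read (for 8-bit input, a scale of 1/255 maps values into [0, 1]):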
from volara.datasets import Raw  # dataset wrappers live in volara.datasets

raw = Raw(
    store="path/to/data.zarr/raw",  # input Zarr array
    channels=[3, 4, 5],             # lazily select a subset of channels
    scale_shift=(1 / 255, 0),       # lazily rescale intensities on read
)
2. Microscopy-Specific Operations
Volara comes equipped with a suite of operations tailored to machine learning and computational tasks for microscopy:
Affinity Processing:
Supervoxel extraction using Mutex Watershed.
Compute aggregated edge costs between supervoxels, both within and across blocks.
Global graph optimization using Mutex Watershed.
Relabel fragments into segments based on a globally optimized lookup table (a sketch of the full pipeline follows below).
Other Tasks:
Perform local registration of a moving image to a fixed image.
Take an argmax over multi-channel volumes.
Examples: See this tutorial on blockwise post-processing of some CREMI predictions
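As a rough sketch of how the affinity steps fit together (task class names follow the blockwise tutorial; constructor arguments are elided, so treat this as the shape of the pipeline rather than a copy-paste recipe):

from volara.blockwise import AffAgglom, ExtractFrags, GraphMWS, Relabel

extract_frags = ExtractFrags(...)  # block-wise mutex watershed -> supervoxels
aff_agglom = AffAgglom(...)        # edge costs between supervoxels, within and across blocks
graph_mws = GraphMWS(...)          # global mutex watershed over the supervoxel graph
relabel = Relabel(...)             # map fragments to segments via the global lookup table

for task in (extract_frags, aff_agglom, graph_mws, relabel):
    task.run_blockwise()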
3. Flexible Graph Support
Volara supports graph storage in databases like SQLite for quick and simple setups, or PostgreSQL for better performance in larger-scale projects.
Demonstrating graph read/write operations
You can take any networkx graph and save it to the database. You must define the node and edge attributes in advance, but a few common attributes are added by default. These include position, size, and filtered attributes on nodes, and a distance attribute on edges.
from volara.dbs import SQLite
import networkx as nx
from funlib.geometry import Roi

# Create the SQLite database
sqlite_db = SQLite(
    path="my_db.db",
    node_attrs={"t": "int", "embedding": 2},
    edge_attrs={"weight": "float"},
).open("w")

# Create the networkx graph
your_graph = nx.Graph()
your_graph.add_node(0, t=0, embedding=[0, 0], position=[1, 1, 1], size=2)
your_graph.add_node(1, t=1, embedding=[1, 1], position=[10, 10, 10], size=2)
your_graph.add_node(2, t=2, embedding=[2, 2], position=[10, 10, 10], size=2)
your_graph.add_edge(0, 1, weight=0.5)
your_graph.add_edge(1, 2, weight=0.5)

# Write the graph to the SQLite database
sqlite_db.write_graph(your_graph)
Once you have written a graph to the database, you can read it back out:
# Read the entire graph back from the SQLite database
retrieved_graph = sqlite_db.read_graph()
print(retrieved_graph)
for node, data in retrieved_graph.nodes(data=True):
    print(node, data)
This gives us:
Graph with 3 nodes and 2 edges
0 {'position': (1, 1, 1), 'size': 2, 'filtered': None, 't': 0, 'embedding': (0, 0)}
1 {'position': (10, 10, 10), 'size': 2, 'filtered': None, 't': 1, 'embedding': (1, 1)}
2 {'position': (10, 10, 10), 'size': 2, 'filtered': None, 't': 2, 'embedding': (2, 2)}
We can also fetch just a portion of the graph based on the spatial positions of the nodes. In this case, one node is contained in our region query, the node at (1, 1, 1). It will be fetched, along with all adjacent edges.
# Read a portion of the graph back from the SQLite database
sub_roi = Roi((0, 0, 0), (3, 3, 3))
retrieved_subgraph = sqlite_db.read_graph(sub_roi)
print(retrieved_subgraph)
for node, data in retrieved_subgraph.nodes(data=True):
    print(node, data)
This query returns the in-ROI node with all of its attributes; the neighboring node appears as a bare ID without attributes, since it lies outside the queried region but is reachable via an adjacent edge:
Graph with 2 nodes and 1 edges
0 {'position': (1, 1, 1), 'size': 2, 'filtered': None, 't': 0, 'embedding': (0, 0)}
1 {}
4. Parallelized Blockwise Processing
Volara uses Daisy under the hood for efficient block-wise processing and task scheduling. It supports running jobs both locally and on a cluster (e.g., SLURM). This is handled by a simple worker configuration, which Volara uses to distribute the workload while ensuring efficient resource utilization.
Demonstrating SLURM worker configuration
from volara.workers import SlurmWorker
from volara_torch.blockwise import Predict

worker = SlurmWorker(queue="gpu", num_cpus=4, num_gpus=1)
task = Predict(
    checkpoint=...,
    in_data=...,
    out_data=...,
    worker_config=worker,
)
task.run_blockwise()
Note that if you want to double-check that blockwise processing is working as expected, you can simply execute with task.run_blockwise(multiprocessing=False) to run the exact same code in a single-threaded, serial execution setting.
5. Progress Tracking and Visualization
Volara has a built-in progress bar which provides an estimated time to completion and detailed information about any failed blocks. Additionally, Volara tracks which blocks have been processed and allows for easy visualization of block progress overlaid on the volumes being processed.
Below is an example interactive Neuroglancer visualization showing the blocks required to process a given segment (right panel) when it is activated. Additionally we see the fragments and graph nodes/edges (colored by edge weights) in the middle two panels. Volara was used for each of these tasks.
6. Robustness to Failure
During processing of large volumes, it is common for blocks to fail for various reasons. If a specific block fails, it can be retried a number of times before being marked as failed. If a specific worker dies, it can be restarted before the task as a whole is considered failed. If a job fails or is interrupted, Volara will, on its next execution, quickly skip all previously completed blocks and continue processing the rest of the volume. Throughout this entire process Volara maintains robust logs, allowing for easier debugging of errors anywhere in the pipeline. Volara provides the functionality for marking blocks as done and checking whether blocks are done, and Daisy then prints an execution summary reflecting this.
Example task summaries
After running a task you will see a summary:
Execution Summary
-----------------
Task test_0:
num blocks : 36
completed ✔: 36 (skipped 0)
failed ✗: 0
orphaned ∅: 0
all blocks processed successfully
If you had a job that previously completed some blocks, you will see some number of blocks skipped:
Execution Summary
-----------------
Task test_0:
num blocks : 36
completed ✔: 36 (skipped 15)
failed ✗: 0
orphaned ∅: 0
all blocks processed successfully
7. Chainable Tasks
Daisy provides flexibility for chaining tasks together by representing each task as a node in a directed acyclic graph (DAG). When the final downstream task is called, it requests its upstream tasks, and this request cascades up to the starting task of the DAG. That task then begins processing, and as soon as enough block context is available the next task starts, and so on until all tasks have completed. This offers an extra layer of parallelization and is useful for long-running tasks that require the output of previous tasks. We extended this in Volara by providing a clean syntax for task chaining using the + and | operators, as shown below.
Visualizing chained task block processing
Multiple BlockwiseTask subclasses can be combined into a single pipeline via two operations: + for two tasks that must be run in order, and | for two tasks that can be run independently. Given a pipeline pipeline = (task_a | task_b) + task_c, executing it with pipeline.run_blockwise() will allow task_a and task_b to run simultaneously without affecting each other, but task_c will only process a block once enough blocks have been completed in both task_a and task_b.
The video below shows an example schematic demonstrating the block processing order of a simple two-step pipeline, e.g. pipeline = task_a_b + task_b_c, where task_a_b reads from array A and writes to array B, and task_b_c reads from array B and writes to array C.
Next, we visualize a real-world example (extracting affinities and fragments from raw microscopy data): pipeline = predict_affs + extract_fragments.
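In code, such a pipeline can be assembled from the tasks shown earlier (constructor arguments elided as before, so this is a sketch rather than a runnable recipe):

from volara.blockwise import ExtractFrags
from volara_torch.blockwise import Predict

predict_affs = Predict(...)            # raw image -> affinities
extract_fragments = ExtractFrags(...)  # affinities -> fragments

# extract_fragments begins processing a block as soon as enough
# surrounding affinity blocks have been completed by predict_affs
pipeline = predict_affs + extract_fragments
pipeline.run_blockwise()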
8. Plugin System for Custom Tasks
Volara has a built-in plugin system that makes it easy to define custom blockwise tasks. With little overhead, a custom task can leverage all of Volara's features, including cluster job processing, progress tracking, task scheduling, and visualization.
Examples: See this tutorial on creating your own plugin and the volara-torch library for an example of a plugin package
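To give a feel for the shape of a plugin, here is a hypothetical sketch of a custom task. The method and attribute names below are illustrative, not Volara's actual interface; see the plugin tutorial for the real base-class contract.

from volara.blockwise import BlockwiseTask

class InvertTask(BlockwiseTask):
    # Hypothetical custom task that inverts 8-bit intensities block by block.
    # Field and method names here are placeholders for the real interface.

    def process_block(self, block):  # illustrative per-block hook
        data = self.in_array[block.read_roi]          # read this block's input
        self.out_array[block.write_roi] = 255 - data  # process and write back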
Performance
We conducted a performance test of an example supported Volara operation. This test does the following:
block-wise mutex watershed to create supervoxels and RAG nodes
reads affinities from a Zarr store on S3, writes supervoxels back to Zarr, and writes nodes (supervoxel centers) to a PostgreSQL database
uses a block size of 160×160×160 voxels
runs async workers on SLURM
uses various numbers of AWS EC2 r5.xlarge workers ($0.252 per worker per hour, on-demand rate)
tests with the following worker counts: 1, 4, 8, 16, 32, 64, 120
The optimal speedup is linear, and we achieve it up to 120 workers. We expect further scaling to be relatively trivial, but do expect to run into I/O and block-communication bottlenecks at some point. Assuming continued linear scaling, and given the volume size of the full adult fly brain dataset, this step would take ~1 week and cost ~$5k using 120 workers. Further improvements could likely be made by scaling to many more workers (e.g., ~1k workers to process in ~1 day).
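The cost figure follows directly from the worker count, the on-demand rate, and the runtime:

workers = 120
rate = 0.252    # USD per r5.xlarge worker per hour (on-demand)
hours = 7 * 24  # ~1 week of wall-clock time

print(f"${workers * rate * hours:,.0f}")  # -> $5,080, i.e. roughly $5k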
Limitations
Volara is particularly well suited to tasks that:
have expensive startups (loading torch models, connecting to databases, etc.)
have cross block dependencies
need to be run in parallel with other steps in a pipeline
There are other tools such as Dask, Chunkflow, and Cubed that might provide more granularity. These may be more robust and easier to use for tasks that do not fall into any of the above categories.
Volara is likely to run into scaling issues when moving to petabyte-sized datasets. Many of these issues could be addressed when encountered, but will require development work. Chunkflow, for comparison, has already been demonstrated to scale to petabyte-sized datasets.
Future steps
There are various extensions we would like to make to Volara in the future, and we welcome any open-source contributions to help improve it! Some ideas are listed below.
More input/output data format support
tiff stacks
different ML libraries
different databases (e.g., BossDB)
Expand number of supported operations
global normalization / CLAHE
scale pyramid generation
Lightweight wrapper/interface to other block-wise processing paradigms (Chunkflow, Dask, etc.) so they can be used in Volara pipelines, addressing the cases that are better handled by other libraries.
Better CLI support
Library/Cookbook of common image processing tasks/pipelines
Getting started
Volara is available on PyPI and can be installed with pip install volara.
For running inference with pre-trained PyTorch models, you can also install volara-torch with pip install volara-torch.
See the API reference to get started.
Additionally, there are several tutorials linked throughout this post.
Also, see the Daisy Tutorial for a more in depth look at what else Daisy offers under the hood!
Acknowledgements
Since 2018, a lot of great work has been done in the Funke Lab to build open-source tools for processing large n-dimensional volumes. This started with the creation of Daisy, which was used for projects such as LSDs, Synful, and Linajea, among others. A lot of the common operation code now lives in dedicated packages (e.g., funlib.persistence for data interfacing & storage). Volara is hugely dependent on this work and the developers behind it (Jan Funke, Caroline Malin-Mayor, Tri Nguyen, and others). We hope that the abstractions/wrappers provided by Volara help users and developers interface with and extend their own block-wise tasks.
Thanks to Claire Wang, Julia Lyudchik from E11 Bio, and Jakob Troidl and Vijay Venu from the scientific community for helpful feedback and testing.
There has been a ton of amazing progress in the field to make it easier to work with large microscopy volumes. Some example contributions include, but are not limited to:
Data interfacing: Zarr, TensorStore, CloudVolume, BossDB, DVID
Visualization: Neuroglancer, BigDataViewer
Proofreading: CAVE, NeuTu/Neu3, Paintera