
Galaxy Brain Topic: Subsystem Architecture


Motivation

As the tasks facing the connectomics community grow in concert with technological capability, CloudVolume must be designed to grow with them. For instance, the Precomputed standard has expanded to include the Sharded format for images, meshes, and skeletons, and both the simpler unsharded format and the more complex sharded format must be supported somehow. Other groups have experimented with manifest-less on-demand meshing, and yet others are interested in designing plugins that interface with existing image servers. Still other imagined use cases involve combining data from different cloud services transparently (e.g. meshes in BigTable, images in Google Storage). How can these disparate uses be accommodated?

Each use case calls for a different reconfiguration of the package. An implementation of a new Precomputed format extension must automatically select which version to use and provide a library for each version. An on-demand meshing capability is mainly concerned with replacing the mesh access subsystem, but would reuse the other components. A remix object that combines different data sources would need to designate a different source for each subsystem. A plugin for a different image server must slot seamlessly into CloudVolume and be selectable by the user. Ideally, the plugin could also be remixed into a different front-end so that users of the other system don't have to learn a new interface.

Design Outline

Given these design parameters, the solution we've devised consists of the following:

  • Adopt the convention from neuroglancer of specifying the format before the service provider where appropriate: graphene://gs://.... If no format is given, precomputed:// is assumed. The format prefix selects the correct implementation by assembling pre-defined combinations of modules as described below (a sketch of this prefix handling follows this list).
  • Break out the image, skeleton, and mesh subsystems into objects conforming to a standard interface so that new objects can be designed to replace them.
  • Convert the CloudVolume object into a front-end interface to back-end modules. This enables different modules to be recombined with community-specific front-ends.
  • Move "info", "provenance", and associated metadata accessors into a Metadata object. The metadata object can be shared by all subsystems or independent metadata can be assigned to a specific subsystem to achieve transparent cross-service fusion. Other image servers can design their own metadata objects to provide a uniform interface.
  • Each format, including the original Precomputed, will be considered a plugin that must be registered.
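
To make the first bullet concrete, here is a minimal sketch of how such a format prefix could be separated from the underlying storage protocol. The split_format helper and its hard-coded format list are illustrative only, not CloudVolume's actual parser:

# A minimal sketch of format-prefix handling; not CloudVolume's real parser.
def split_format(cloudpath, default='precomputed'):
  for fmt in ('graphene', 'precomputed'):
    prefix = fmt + '://'
    if cloudpath.startswith(prefix):
      return fmt, cloudpath[len(prefix):]
  return default, cloudpath  # no format prefix: assume precomputed

split_format('graphene://gs://bucket/dataset/layer')
# >>> ('graphene', 'gs://bucket/dataset/layer')
split_format('gs://bucket/dataset/layer')
# >>> ('precomputed', 'gs://bucket/dataset/layer')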

Expounding on that last point, the architecture of CloudVolume has been essentially inverted. The CloudVolume class was hollowed out and __new__ has been overridden to return the appropriately configured frontend class (as of this writing either CloudVolumePrecomputed or CloudVolumeGraphene) based on the format. Each format is registered with a construction function that assembles the correct image source, mesh source, skeleton source, cache, and configuration options as applicable. It then feeds those objects into the frontend constructor and returns the built instance.
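
A simplified sketch of that inversion follows, assuming a plain dictionary registry. register_plugin does exist in CloudVolume, but the bodies below are a sketch rather than the actual source:

# Simplified sketch of the inverted construction: plugins register a
# creation function, and CloudVolume.__new__ dispatches to it, returning
# a fully assembled frontend instance.
REGISTERED_PLUGINS = {}

def register_plugin(fmt, creation_function):
  REGISTERED_PLUGINS[fmt] = creation_function

class CloudVolume(object):
  def __new__(cls, cloudpath, **kwargs):
    fmt, path = split_format(cloudpath)  # prefix handling as sketched above
    if fmt not in REGISTERED_PLUGINS:
      raise KeyError("No plugin registered for format '{}'.".format(fmt))
    # The creation function assembles metadata, cache, image, mesh, and
    # skeleton sources, then returns e.g. a CloudVolumePrecomputed instance.
    return REGISTERED_PLUGINS[fmt](path, **kwargs)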

Precomputed Example

For example, here is how the standard Precomputed object is built:

# __init__.py
from .datasource.precomputed import register as register_precomputed
register_precomputed()

# in various places in cloudvolume.datasource.precomputed:

from ...cloudvolume import register_plugin

def register():
  register_plugin('precomputed', create_precomputed)

def create_precomputed(
    cloudpath, mip=0, bounded=True, autocrop=False,
    fill_missing=False, cache=False, compress_cache=None,
    cdn_cache=True, progress=False, info=None, provenance=None,
    compress=None, non_aligned_writes=False, parallel=1,
    delete_black_uploads=False, green_threads=False
  ):
    path = strict_extract(cloudpath)
    config = SharedConfiguration(
      cdn_cache=cdn_cache,
      compress=compress,
      green=green_threads,
      mip=mip,
      parallel=parallel,
      progress=progress,
    )

    cache = CacheService(
      cloudpath=(cache if type(cache) == str else cloudpath),
      enabled=bool(cache),
      config=config,
      compress=compress_cache,
    )

    meta = PrecomputedMetadata(
      cloudpath, cache=cache,
      info=info, provenance=provenance,
    )

    image = PrecomputedImageSource(
      config, meta, cache,
      autocrop=bool(autocrop),
      bounded=bool(bounded),
      non_aligned_writes=bool(non_aligned_writes),
      fill_missing=bool(fill_missing),
      delete_black_uploads=bool(delete_black_uploads),
    )

    mesh = PrecomputedMeshSource(meta, cache, config)
    skeleton = PrecomputedSkeletonSource(meta, cache, config)

    return CloudVolumePrecomputed(
      meta, cache, config, 
      image, mesh, skeleton,
      mip
    )
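
Once registered, plugin selection is transparent to the user; the bucket path below is illustrative:

from cloudvolume import CloudVolume

# The precomputed:// prefix is optional since Precomputed is the default.
vol = CloudVolume('precomputed://gs://mybucket/mydataset/mylayer')
# type(vol) -> CloudVolumePrecomputed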

Follow this example when building your own plugin.
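
As a starting point, a condensed hypothetical plugin might look like the sketch below. Every "MyFormat" name is invented for illustration; only the register()/register_plugin() pattern and the subsystem layout come from the Precomputed example above:

# Hypothetical plugin skeleton; all "MyFormat" names are invented.
class MyFormatMetadata:
  def __init__(self, cloudpath, info=None):
    self.cloudpath = cloudpath
    self.info = info  # fetch or construct the dataset's info here

class MyFormatImageSource:
  def __init__(self, meta):
    self.meta = meta
  def download(self, bbox, mip):
    raise NotImplementedError("return an image cutout for bbox at mip")

class CloudVolumeMyFormat:  # the frontend users interact with
  def __init__(self, meta, image, mip):
    self.meta, self.image, self.mip = meta, image, mip

def create_myformat(cloudpath, mip=0, **kwargs):
  meta = MyFormatMetadata(cloudpath)
  image = MyFormatImageSource(meta)
  return CloudVolumeMyFormat(meta, image, mip)

def register():
  # register_plugin imported as in the Precomputed example above
  register_plugin('myformat', create_myformat)

A real plugin would also supply mesh and skeleton sources, a cache, and a SharedConfiguration, as create_precomputed does above.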

Levels of Plugin Support

Depending on how well this scheme is adopted, there will be a few different levels of support:

  1. Official Default - We design, build, and maintain the plugin, and it and all of its dependencies are installed by default.
  2. Official Optional - We design, build, and maintain the plugin, but installation requires requesting it as a pip extra (pip's square bracket syntax).
  3. Unofficial Optional - We include the necessary logic to support an optional pip installation but do not maintain the plugin.
  4. Endorsed Community Supported - We link to an appropriate project page with installation instructions and we might offer advice on integrating it if we are able.
  5. Non-Endorsed Community Supported - We defer all responsibility to the publisher, but it might very well be a best-in-class plugin.