Mmap for discontiguous data? #587

Closed
tlnagy opened this issue Feb 9, 2017 · 9 comments

@tlnagy
Contributor

tlnagy commented Feb 9, 2017

I wasn't sure where to post this, but I'm hoping to add support for reading OME-TIFF files to Images.jl. I would also like to add mmap support, as these files can get quite large, quite fast. I tried reading NRRD.jl's usage of mmap to see how I could apply it, but the issue I'm running into is how to use mmap for discontiguous array blocks. The general layout of OME-TIFFs is that data is stored as separate 2D XY arrays, with labels specifying the CZT (channel, Z, time) information. @timholy do you have any suggestions on how to handle this with Images.jl's architecture/mmapping?

ref https://www.micro-manager.org/wiki/Micro-Manager_File_Formats#Image_file_stack_specification

@tlnagy
Contributor Author

tlnagy commented Feb 9, 2017

Also, data might be split across separate files (due to the 4 GB limit of TIFF). I know which file each XY plane is stored in, so I guess it might be possible to do the loading lazily.
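A minimal sketch of what lazy loading across several files could look like, assuming each plane holds `UInt16` pixels stored contiguously at a known byte offset. `PlaneLocation` and `loadplane` are hypothetical names made up for this illustration, and the temporary files only stand in for real TIFF data. Each file is mmapped the first time one of its planes is requested, and the mapping is cached for later accesses.

```julia
using Mmap

# Hypothetical bookkeeping: which file a plane lives in, and where.
struct PlaneLocation
    file::String
    offset::Int   # byte offset of the plane's pixel data within the file
end

# Return plane `loc` as an nx-by-ny matrix view, mapping its file on first use.
function loadplane(loc::PlaneLocation, nx, ny, cache)
    bytes = get!(() -> Mmap.mmap(loc.file), cache, loc.file)
    nbytes = nx * ny * sizeof(UInt16)
    raw = view(bytes, loc.offset+1:loc.offset+nbytes)
    return reshape(reinterpret(UInt16, raw), nx, ny)
end

# Demo data: two files with two planes each, every plane preceded by
# 8 bytes of unrelated data (a stand-in for headers and tags).
nx, ny = 4, 3
locations = PlaneLocation[]
for f in 1:2
    path, io = mktemp()
    for p in 1:2
        write(io, zeros(UInt8, 8))
        push!(locations, PlaneLocation(path, position(io)))
        write(io, fill(UInt16(10 * f + p), nx * ny))
    end
    close(io)
end

cache = Dict{String,Vector{UInt8}}()
plane3 = loadplane(locations[3], nx, ny, cache)   # only the second file is mapped
```

Nothing is read from a file until one of its planes is indexed, and files that are never touched are never mapped.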

@timholy
Member

timholy commented Feb 9, 2017

Awesome.

I haven't read that document carefully, but my first impression is that you might need to create something a bit like https://github.com/tanmaykm/ChainedVectors.jl. (There may be other examples of similar packages around.) I don't know whether you can mmap the same file multiple times, whether you need one giant mmap per file, or something else. It seems that if you can do one mmap per file, then the problem basically becomes one of computing indexing offsets: internally, your AbstractArray type might need to maintain a vector of file offsets, one per 2D image, and then use sub2ind computations to figure out where to get the data from. I'd certainly parse the table on opening, so that access is pretty fast when you're actually using the image.
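The offset computation described above can be sketched as follows. This is an illustration under stated assumptions, not a TIFF reader: the file layout, the `UInt16` pixel type, the channel-fastest plane ordering, and the helper name `getpixel` are all made up for the example. On Julia 1.0 and later, `LinearIndices` plays the role of the older `sub2ind`.

```julia
using Mmap

nx, ny = 4, 3               # size of each 2D plane
nc, nz, nt = 2, 2, 1        # channels, Z slices, time points
gap = 16                    # bytes of unrelated data before each plane

# Write a demo file and record the byte offset of every plane.
path, io = mktemp()
offsets = Int[]
for p in 1:(nc * nz * nt)
    write(io, zeros(UInt8, gap))          # stand-in for headers and tags
    push!(offsets, position(io))
    write(io, fill(UInt16(p), nx * ny))   # plane p is filled with the value p
end
close(io)

# One mmap for the whole file, as raw bytes.
bytes = Mmap.mmap(path)

# Look up one pixel from its (x, y, c, z, t) coordinates.
function getpixel(bytes, offsets, dims, x, y, c, z, t)
    nx, ny, nc, nz, nt = dims
    p = LinearIndices((nc, nz, nt))[c, z, t]   # which plane (assumed order)
    i = LinearIndices((nx, ny))[x, y]          # position within the plane
    start = offsets[p] + (i - 1) * sizeof(UInt16)
    return reinterpret(UInt16, bytes[start+1:start+2])[1]
end

dims = (nx, ny, nc, nz, nt)
v = getpixel(bytes, offsets, dims, 1, 1, 2, 2, 1)   # reads from plane 4
```

Because the offsets are collected once when the file is opened, each later lookup costs only two small index computations and a read from the mapped memory.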

@tlnagy
Contributor Author

tlnagy commented Feb 10, 2017

That's an interesting idea. How would that work with AxisArrays? Would it be a subtype of that or a completely separate thing?

@tlnagy
Contributor Author

tlnagy commented Feb 10, 2017

Also, I'm going to rope @quinnj into this discussion since he was the main architect of the mmap redesign in JuliaLang/julia#11280

@timholy
Member

timholy commented Feb 10, 2017

Completely separate. You could put the "chained array" inside an AxisArray wrapper, if you wanted. But shoot for a completely orthogonal design.

I'd even advocate for doing this in two pieces (i.e., two packages):

  • a TIFF-OME parser
  • a "chained array" implementation, simply based on a backing array and a set of slice offsets (if such a thing doesn't already exist in other packages)

The TIFF-OME parser would create and return the "chained array," possibly wrapped inside an AxisArray.

The advantage of the two-part split is reusability; it's not hard to imagine that other file formats (or other applications) might want to re-use that part.
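As a rough sketch of the "chained array" half of that split, assuming 2D slices stored at arbitrary element offsets in a flat backing vector: the type name `ChainedArray` and its fields are hypothetical and do not come from an existing package. The backing vector here is an ordinary `Vector`, but an mmapped one would work the same way.

```julia
# Hypothetical N-dimensional array whose 2D slices live at separate offsets
# in one flat backing vector.
struct ChainedArray{T,N,V<:AbstractVector{T}} <: AbstractArray{T,N}
    backing::V              # flat storage, e.g. an mmapped vector
    offsets::Vector{Int}    # 0-based element offset of each 2D slice
    dims::NTuple{N,Int}     # (nx, ny, trailing dimensions...)

    function ChainedArray(backing::V, offsets::Vector{Int},
                          dims::NTuple{N,Int}) where {T,N,V<:AbstractVector{T}}
        N >= 3 || throw(ArgumentError("need at least three dimensions"))
        length(offsets) == prod(dims[3:end]) ||
            throw(ArgumentError("need one offset per 2D slice"))
        return new{T,N,V}(backing, offsets, dims)
    end
end

Base.size(A::ChainedArray) = A.dims

function Base.getindex(A::ChainedArray{T,N}, I::Vararg{Int,N}) where {T,N}
    @boundscheck checkbounds(A, I...)
    slice = LinearIndices(A.dims[3:end])[I[3:end]...]        # which 2D slice
    within = LinearIndices((A.dims[1], A.dims[2]))[I[1], I[2]]
    return A.backing[A.offsets[slice] + within]
end

# Demo: four 4x3 slices, each preceded by 5 elements of unrelated data.
nx, ny = 4, 3
backing = UInt16[]
offsets = Int[]
for s in 1:4
    append!(backing, fill(0xffff, 5))
    push!(offsets, length(backing))
    append!(backing, fill(UInt16(s), nx * ny))   # slice s holds the value s
end

A = ChainedArray(backing, offsets, (nx, ny, 2, 2))
slice3 = A[:, :, 1, 2]    # an ordinary Matrix copied out of the third slice
```

Since the type only implements `size` and scalar `getindex`, it knows nothing about file formats or axis metadata, which is what would let a parser build one and then wrap it in an AxisArray.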

@shashi
Contributor

shashi commented Feb 10, 2017

For reference, here is my Discourse answer about using Dagger.jl to do this: https://discourse.julialang.org/t/mmapping-a-discontiguous-file/2016/2?u=shashi It should give you the "chained array" bit of this puzzle. :) You might have to create a DistributedImage wrapper type to use it as an image though...

@timholy
Member

timholy commented Mar 20, 2017

For anyone interested in TIFF-OME, see https://discourse.julialang.org/t/bioformats-in-julia/2440.

@tlnagy
Contributor Author

tlnagy commented Mar 20, 2017

I'm still interested in writing a pure-Julia OME-TIFF reader; I think it would be fine to have both.

@johnnychen94
Member

@tlnagy could you help check if this issue is already solved?

tlnagy closed this as completed Mar 28, 2021