
jpg is a weird format to be storing microscopy images in, no? They usually end up in some sort of bitmap TIFF (or a Zeiss/etc. proprietary format) from what I've seen.



So the weird thing is that we're not only a lab, but also a web tool. We have the files backed up in a standard format in one place, but delivering 1024x1024x128 cubes of images over the internet has been tricky. We don't need people to always view them at full fidelity, just good enough.

We tried JPEG2000, which gave better quality per file size, but the web worker decoder was slower than the JPEG one, adding seconds to the total download/decode time.

EDIT: We're currently doing 256x256x256 (the same pixel count as a 4096x4096 image) on eyewire.org. We're speeding things up to handle bigger 3D images.

EDIT2: If you check out Eyewire right now, you might notice some slowdown when you load cubes; that's because we're decoding on the main thread. We'll be changing that up next week.
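
A minimal sketch of what the worker hand-off can look like, assuming a jpeg-js-style decoder and made-up tile URLs/file names (not necessarily what Eyewire actually uses):

    // decode-worker.ts: runs inside a Web Worker, keeping JPEG decode
    // off the main thread. jpeg-js is an assumption, not Eyewire's decoder.
    import { decode } from "jpeg-js";

    self.onmessage = (e: MessageEvent<ArrayBuffer>) => {
      // useTArray: true returns the RGBA pixels as a Uint8Array.
      const { width, height, data } = decode(new Uint8Array(e.data), {
        useTArray: true,
      });
      // Transfer (not copy) the decoded buffer back to the main thread.
      postMessage({ width, height, data }, [data.buffer]);
    };

    // main.ts: fetch the compressed tile and hand the bytes to the worker.
    const worker = new Worker("decode-worker.js");
    worker.onmessage = (e) => {
      const { width, height, data } = e.data;
      // ...upload to a texture / draw to a canvas here...
    };
    const buf = await (await fetch("/tiles/cube-000.jpg")).arrayBuffer();
    worker.postMessage(buf, [buf]); // transferable, so the bytes aren't copied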


Yeah, jpeg2k sucks. It doesn't seem to do anything very well. Design by committee ruined it by making it way too complex.


The size difference between "high quality" jpegs and losslessly compressed TIFFs can be quite significant - so it might not be feasible to use e.g. TIFF for archiving. And jpegs might very well be "good enough".

E.g. I just tested on a small 10MP Sony ARW raw image: the raw file is 7MB, the camera jpeg is 2.3MB, and a tiff compressed with LZW is 20MB (uncompressed: 29MB). The uncompressed tiff run through lzma is 9.3MB. Either way: if ~7MB is the likely lossless size and the jpeg is good enough at ~2.3MB, that's a pretty big difference when we're talking petabytes, not megabytes.

(I'll get around to testing lepton on the jpegs shortly)


How on earth did a 7MB raw blow out to a 20MB TIFF? Isn't that going from uncompressed RGBG to losslessly compressed RGB?


DNGs and RAWs aren't generally (AFAIK) uncompressed - ideally they're losslessly compressed. They're all(?) "TIFF files" - but AFAIK saying something is a valid TIFF is almost as helpful as saying something is "a file".

Apparently Sony uses some kind of lossy compression for its files - I just tested a jpeg2000 encoder on the same file above, and the j2k file is approximately the same size as the ARW: 7MB. Btw, the lep-file is 1.7MB.

Note that the uncompressed (flat) PPM file is 29MB as is the uncompressed TIFF - but simply running the TIFF through lzma reduces the size to 9.3MB. So ~7MB isn't that far off.

[ed: And while the lep-file was 1.7MB, shaving a bit off the original jpg, mozjpeg with defaults+baseline created a jpeg (at q=75, its default) 472K in size. Lepton managed to shave a bit off that too: PPM->mozjpeg->lepton ends up as a 359K file. The (standard) progressive mozjpeg ended up at 464K.

This is not quite apples to apples, though: I think the comparable quality setting for mozjpeg would probably be 90 to 95 or so, ending up around 1.6MB. But for this particular (rather crappy) image, I couldn't readily tell any difference.

Which I suppose is where https://github.com/danielgtaylor/jpeg-archive comes in.]
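
For reference, that pipeline scripted from Node (file names are placeholders; -quality/-baseline/-outfile are standard cjpeg flags, and lepton's CLI takes input and output paths):

    // Roughly the PPM -> mozjpeg -> lepton pipeline from the note above.
    import { execFileSync } from "node:child_process";

    // mozjpeg's cjpeg at its default q=75, forced baseline.
    execFileSync("cjpeg", [
      "-quality", "75",
      "-baseline",
      "-outfile", "out.jpg",
      "in.ppm",
    ]);
    // lepton then losslessly recompresses the finished JPEG
    // (472K -> 359K in the numbers above).
    execFileSync("lepton", ["out.jpg", "out.lep"]);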


Raw files store only one channel per pixel. A TIFF has been demosaiced and stores 3 channels per pixel.

On top of that, most sensible raw formats only store 12 or 14 bits per pixel, instead of 16.

And then most are compressed, some losslessly and some lossily (like the infamous Sony format that packs it down to an average of 8 bits per pixel but does exhibit artifacts).
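
To put numbers on that, take the 10MP example upthread (the bit depths here are typical assumptions, not verified for that particular camera):

    // Back-of-the-envelope sizes for a 10MP sensor, in MB (1e6 bytes).
    const pixels = 10e6;

    const rawUncompressed = (pixels * 14) / 8 / 1e6; // one 14-bit Bayer sample per pixel
    const sonyLossy = (pixels * 8) / 8 / 1e6;        // ~8 bits/pixel lossy packing
    const rgbTiff = (pixels * 3 * 8) / 8 / 1e6;      // demosaiced 8-bit RGB, no compression

    console.log(rawUncompressed); // 17.5, before any compression
    console.log(sonyLossy);       // 10, ballpark for the ~7MB ARW
    console.log(rgbTiff);         // 30, matches the ~29MB uncompressed TIFF/PPM

So the jump from one Bayer sample per pixel to three RGB channels accounts for most of the blow-up, before LZW claws some of it back.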


Some TIFF-derived container is the most common representation. But note that even these can use JPEG/J2K internally if desired.

Most microscopy images are stored uncompressed or with lossless compression. But unfortunately that doesn't scale with newer imaging modalities. Here are two examples:

Digital histopathology: whole-slide scanners can create huge images, e.g. 200000x200000 and larger. These are stored using e.g. JPEG, J2K, or JPEG-XR in a tiled BigTIFF container with multiple resolution levels (see the tile-addressing sketch after these examples). When each image is multiple gigabytes, lossless compression doesn't scale.

SPIM involves imaging a 3D volume by rotating the sample and imaging it from multiple angles and directions. The raw data can be multiple terabytes per image and is both sparse and full of redundant information. The viewable post-processed 3D image volume is vastly smaller, but also still sparse.
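
To make the histopathology case concrete, here's a sketch of the tile addressing a pyramidal viewer does (the function and the 256px tile size are illustrative, not any particular library's API):

    // Map a full-resolution coordinate to a tile at a given pyramid level.
    // Level 0 = full resolution; each level halves both dimensions.
    function tileFor(x: number, y: number, level: number, tileSize = 256) {
      const scale = 2 ** level;
      return {
        col: Math.floor(x / scale / tileSize),
        row: Math.floor(y / scale / tileSize),
      };
    }

    // A 200000x200000 slide is a ~782x782 grid of 256px tiles at full
    // resolution (~611k tiles), but only ~49x49 at level 4. The viewer
    // fetches and decodes just the tiles in view, at the level in view.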

For more standard images such as confocal or brightfield or epifluorescence CCD, lossless storage is certainly the norm. You don't really want to perform precise quantitative measurements with poor quality data.



