The listing below is derived from a script that uploaded a multi-hundred-gigabyte uncompressed image dataset from an external hard drive to Google Cloud Storage, but it would also work with Amazon S3. It has the following helpful features:
The resumable upload feature works by recording the index of each uploaded slice on disk: after a slice finishes, the script touches a file named after its z index in a newly created ./progress/ directory. You can reset the upload with rm -r ./progress, or skip individual slices by touching their index, e.g. touch progress/5 prevents z=5 from being uploaded again.
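In outline, the bookkeeping looks like this (a sketch of the same logic that appears in the full listing below; the z range is a placeholder, not a value from the original dataset):

import os
from cloudvolume.lib import mkdir, touch

progress_dir = mkdir('progress/')  # unlike os.mkdir, doesn't fail if the directory exists
done = set(int(z) for z in os.listdir(progress_dir))
todo = sorted(set(range(1, 12209)) - done)  # placeholder z range; slices still to upload

def mark_done(z):
    # called after slice z uploads successfully
    touch(os.path.join(progress_dir, str(z)))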
A curious feature of this script is that it uses ProcessPoolExecutor as an independent multi-process runner rather than CloudVolume's parallel=True option. This is helpful because it parallelizes the file reading and decoding step. ProcessPoolExecutor is used instead of multiprocessing.Pool because the original multiprocessing module can hang when a child process dies.
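As a quick, self-contained illustration of that difference (not part of the upload script): if a worker process dies, ProcessPoolExecutor raises BrokenProcessPool instead of stalling.

import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

def crash(_):
    os._exit(1)  # simulate a worker process dying mid-task

if __name__ == '__main__':
    try:
        with ProcessPoolExecutor(max_workers=2) as executor:
            list(executor.map(crash, range(4)))
    except BrokenProcessPool:
        print('pool reported the dead worker instead of hanging')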
Please use the below Python3 code as a guide.
import os
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import tifffile
from cloudvolume import CloudVolume
from cloudvolume.lib import mkdir, touch

info = CloudVolume.create_new_info(
    num_channels = 1,
    layer_type = 'image', # 'image' or 'segmentation'
    data_type = 'uint16', # can pick any popular uint
    encoding = 'raw', # see: https://github.com/seung-lab/cloud-volume/wiki/Compression-Choices
    resolution = [ 4, 4, 4 ], # X,Y,Z values in nanometers
    voxel_offset = [ 0, 0, 1 ], # X,Y,Z values in voxels
    chunk_size = [ 1024, 1024, 1 ], # rechunk of image X,Y,Z in voxels
    volume_size = [ 8368, 2258, 12208 ], # X,Y,Z size in voxels
)

# If you're using amazon or the local file system, you can replace 'gs' with 's3' or 'file'
vol = CloudVolume('gs://bucket/dataset/layer', info=info)
vol.provenance.description = "Description of Data"
vol.provenance.owners = ['email_address_for_uploader/imager'] # list of contact email addresses

vol.commit_info() # generates gs://bucket/dataset/layer/info json file
vol.commit_provenance() # generates gs://bucket/dataset/layer/provenance json file

direct = 'local/path/to/images'

progress_dir = mkdir('progress/') # unlike os.mkdir, doesn't crash on preexisting directories
done_files = set([ int(z) for z in os.listdir(progress_dir) ])
all_files = set(range(vol.bounds.minpt.z, vol.bounds.maxpt.z + 1))

to_upload = [ int(z) for z in list(all_files.difference(done_files)) ]
to_upload.sort()

def process(z):
    img_name = 'brain_%06d.tif' % z
    print('Processing ', img_name)
    image = tifffile.imread(os.path.join(direct, img_name))
    image = np.swapaxes(image, 0, 1)
    image = image[..., np.newaxis]
    vol[:,:, z] = image
    touch(os.path.join(progress_dir, str(z)))

with ProcessPoolExecutor(max_workers=8) as executor:
    executor.map(process, to_upload)
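Once the upload finishes, you can sanity-check it by reading a slice back. This is a minimal sketch using the same placeholder cloud path as the listing above:

from cloudvolume import CloudVolume

vol = CloudVolume('gs://bucket/dataset/layer', progress=True)
cutout = vol[:, :, 1:2]   # returns an (X, Y, Z, channel) array for the first slice
print(cutout.shape, cutout.dtype)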
To work with RGB data, set num_channels=3, set data_type="uint8", and make sure that RGB is the last axis of the image array.
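For example, here is a hedged sketch of an RGB setup; the bucket path and dimensions are placeholders rather than values from the original dataset:

import numpy as np
from cloudvolume import CloudVolume

# num_channels=3 and data_type='uint8' for RGB; everything else mirrors the listing above.
info = CloudVolume.create_new_info(
    num_channels = 3,
    layer_type = 'image',
    data_type = 'uint8',
    encoding = 'raw',
    resolution = [ 4, 4, 4 ],
    voxel_offset = [ 0, 0, 0 ],
    chunk_size = [ 1024, 1024, 1 ],
    volume_size = [ 1024, 1024, 64 ], # placeholder size
)
vol = CloudVolume('gs://bucket/dataset/rgb_layer', info=info)
vol.commit_info()

# The RGB channel must be the last axis: (X, Y, Z, 3)
rgb_slice = np.zeros((1024, 1024, 1, 3), dtype=np.uint8)
vol[:, :, 0] = rgb_slice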
To view an RGB image in neuroglancer, paste this code into the rendering box.
void main() {
  vec3 data = vec3(
    toNormalized(getDataValue(0)),
    toNormalized(getDataValue(1)),
    toNormalized(getDataValue(2))
  );
  emitRGB(data);
}
Sharded images compact many files into a single randomly accessible file to reduce strain on the filesystem. You probably only need to worry about them once your data reaches the multi-teravoxel range. One option for uploading a sharded image is to upload an unsharded image as above, then run igneous xfer --sharded to create a new sharded image; once that finishes, delete the original upload.