Submitted by deluded_soul t3_1181g88 in MachineLearning

I am looking into any techniques one could use for very large datasets in machine learning. So I am talking about datasets with the following properties:

1: 3D imaging datasets, each on the order of many terabytes.

2: Each 3D image is too big to fit in the GPU or CPU memory.

I am interested in educating myself on methods that people have used in classical ML and modern deep learning for such extremely large datasets.

In particular, how does one capture long-range spatial interactions in such datasets, and what computational techniques can one use to learn from them?

Finally, if someone can point me to some open source examples of such ML systems (domain is not important) that I can learn from, I would be extremely grateful.

4

Comments

the_architect_ai t1_j9h7but wrote

Use binning or quantisation to reduce the image size. Look into voxelisation.
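As a rough illustration of the binning and quantisation idea, here is a minimal NumPy sketch. The function names `bin_volume` and `quantise_uint8` are made up for this example, and block averaging with min/max scaling to 8 bits is just one possible choice, not something specified in this thread.

```python
import numpy as np

def bin_volume(vol, factor):
    """Downsample a 3D volume by averaging non-overlapping factor^3 blocks."""
    z, y, x = vol.shape
    # Crop so every axis is divisible by the binning factor.
    z2, y2, x2 = z - z % factor, y - y % factor, x - x % factor
    v = vol[:z2, :y2, :x2]
    # Split each axis into (blocks, within-block) and average within blocks.
    v = v.reshape(z2 // factor, factor, y2 // factor, factor, x2 // factor, factor)
    return v.mean(axis=(1, 3, 5))

def quantise_uint8(vol):
    """Linearly rescale intensities to 0..255 and store them as uint8."""
    lo, hi = float(vol.min()), float(vol.max())
    scale = (hi - lo) or 1.0  # avoid division by zero on constant volumes
    return np.round((vol - lo) / scale * 255).astype(np.uint8)

# Small in-memory stand-in for a chunk of a much larger volume.
chunk = np.random.default_rng(0).random((64, 64, 64)).astype(np.float32)
small = quantise_uint8(bin_volume(chunk, 4))  # shape (16, 16, 16), dtype uint8
```

For a volume that does not fit in RAM, the same functions could be applied chunk by chunk, for example over slices of a `np.memmap`. The global min/max for quantisation would then have to be computed in a separate pass.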

Transformers can capture long-range spatial interactions, but the computation is hefty. You might have to downsize first.

In ViT, tokenization is applied to patches. You might need a 3D CNN to extract voxel tokens.
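To make the patch tokenization concrete: a 3D convolution whose kernel size equals its stride is equivalent to cutting the volume into non-overlapping cubes, flattening each one, and applying a shared linear projection. The sketch below shows that in plain NumPy. The name `voxel_patch_tokens` and the random `weight` matrix are assumptions for illustration, and a real model would learn the projection (and likely use a deeper 3D CNN).

```python
import numpy as np

def voxel_patch_tokens(vol, patch, weight):
    """Turn a 3D volume into one token per non-overlapping patch^3 cube.

    vol:    (Z, Y, X) array, each axis divisible by `patch`
    weight: (patch**3, dim) projection, standing in for learned conv weights
    """
    p = patch
    z, y, x = vol.shape
    assert z % p == 0 and y % p == 0 and x % p == 0
    # Split each axis into (number of patches, position inside the patch).
    v = vol.reshape(z // p, p, y // p, p, x // p, p)
    # Bring the three patch-index axes to the front: (nz, ny, nx, p, p, p).
    v = v.transpose(0, 2, 4, 1, 3, 5)
    patches = v.reshape(-1, p ** 3)   # (num_tokens, p^3)
    return patches @ weight           # (num_tokens, dim)

rng = np.random.default_rng(0)
volume = rng.standard_normal((32, 32, 32))
proj = rng.standard_normal((8 ** 3, 64)) * 0.02  # hypothetical projection
tokens = voxel_patch_tokens(volume, 8, proj)     # shape (64, 64)
```

A 32^3 volume becomes 64 tokens here, which is why tokenizing by patches (rather than by voxel) keeps the sequence length manageable for attention.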

There are many ways to reduce the computational cost of attention. In the Perceiver IO paper by DeepMind, a bottleneck cross-attention layer is applied.
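The bottleneck idea can be sketched as follows: a small set of N latent vectors forms the queries, and the M input tokens supply the keys and values, so the score matrix is N x M instead of M x M. This is a single-head NumPy sketch under simplifying assumptions (no layer norm, no MLP, random weights). The names `cross_attention`, `wq`, `wk`, and `wv` are illustrative and not taken from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, inputs, wq, wk, wv):
    """Single-head cross-attention from N latents to M input tokens.

    latents: (N, latent_dim), inputs: (M, input_dim)
    wq: (latent_dim, d), wk and wv: (input_dim, d)
    """
    q = latents @ wq                          # (N, d)
    k = inputs @ wk                           # (M, d)
    v = inputs @ wv                           # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, M): cost scales with N * M
    return softmax(scores, axis=-1) @ v       # (N, d)

rng = np.random.default_rng(0)
M, N, input_dim, latent_dim, d = 4096, 32, 64, 128, 64
inputs = rng.standard_normal((M, input_dim))     # e.g. voxel tokens
latents = rng.standard_normal((N, latent_dim))   # learned in a real model
wq = rng.standard_normal((latent_dim, d)) * 0.02
wk = rng.standard_normal((input_dim, d)) * 0.02
wv = rng.standard_normal((input_dim, d)) * 0.02
out = cross_attention(latents, inputs, wq, wk, wv)  # shape (32, 64)
```

Later self-attention layers then operate only on the N latents, so their cost no longer depends on the number of input tokens.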

6

__lawless t1_j9eu7n2 wrote

r/learnmachinelearning

2

Insecure--Login t1_j9ibbc4 wrote

Sorry, this is a bit off-topic, but what medical imaging datasets are you working with? I'm usually looking for those and you seem to be familiar with very large ones.

1

deluded_soul OP t1_j9iqtgs wrote

The dataset is more microscopy-related, and unfortunately I am not allowed to share it :(

1