elnaqnely t1_jdpty1s wrote

> accelerate the image processing tasks using a GPU

You can find some working code to do simple manipulations of images (scaling, flipping, cropping) on a GPU. Search for "gpu image augmentation".
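Those basic augmentations are just array operations, so the logic looks the same on CPU or GPU. A minimal NumPy sketch of crop/flip/scale (GPU libraries such as Kornia or NVIDIA DALI provide device-side versions of these same operations; the function names and the toy image here are illustrative, not from any particular library):

```python
import numpy as np

def random_crop(img, h, w, rng):
    """Crop a random h x w window from an (H, W, C) image."""
    top = rng.integers(0, img.shape[0] - h + 1)
    left = rng.integers(0, img.shape[1] - w + 1)
    return img[top:top + h, left:left + w]

def horizontal_flip(img):
    """Mirror the image left-to-right."""
    return img[:, ::-1]

def scale_nearest(img, out_h, out_w):
    """Nearest-neighbour resize via integer index maps."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
aug = scale_nearest(horizontal_flip(random_crop(img, 224, 224, rng)), 128, 128)
print(aug.shape)  # (128, 128, 3)
```

On a GPU the same pipeline runs on device tensors, which is where the speedup comes from when you process millions of images.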

> image dataset as some form of a database

With millions of images, the metadata alone may be difficult to navigate. I recommend storing the images/metadata on a good SSD (plus a backup), with the metadata in Parquet format, partitioned by categories that are meaningful to you. That will allow the metadata to be efficiently queried using Arrow or Spark, both of which have Python wrappers (pyarrow, pyspark).

For the images themselves, use a nested directory structure that mirrors the metadata partitioning, so the images are grouped by the same meaningful attributes you chose for the metadata. This also keeps the number of images per directory from growing too large, which means you can browse thumbnails with whatever file browser comes with your operating system. To rapidly page through thousands of images, I found that the default Ubuntu image viewer, Eye of GNOME, works really well.
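Mirroring the partition layout is just building the same key=value paths when you place the files. A stdlib-only sketch, with a made-up "category" key and empty files standing in for real images:

```python
import pathlib
import tempfile

# Hypothetical (filename, category) pairs; "category" is whatever
# attribute you partitioned the metadata by.
records = [
    ("1.jpg", "cats"),
    ("2.jpg", "cats"),
    ("3.jpg", "dogs"),
]

root = pathlib.Path(tempfile.mkdtemp())
for name, category in records:
    dest = root / "images" / f"category={category}"
    dest.mkdir(parents=True, exist_ok=True)
    # Stand-in for copying/moving the real image file here.
    (dest / name).write_bytes(b"")

print(sorted(p.name for p in (root / "images").iterdir()))
# ['category=cats', 'category=dogs']
```

Each directory then holds only one category's images, so a file browser (or a glob over `images/category=cats/`) stays fast.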
