dborowiec10 t1_izch61l wrote
How many and what kind of computational resources went into training CICERO, and how long did the training take? If you have access to that information, could you also say which region of the world the computation took place in and what energy/fuel mix powered the machines?
Given these excerpts from the GitHub repo: "One can also instead pass launcher.local.use_local=true to run them on locally, e.g. on an individual 8-GPU-or-more GPU machine but training may be very slow" and "launcher.slurm.num_gpus=256", it seems the resources were quite substantial.
It would be good to get some carbon accountability on this.
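For anyone who wants to do the math once numbers surface, here's a rough back-of-envelope in Python. Every input below (GPU power draw, PUE, grid carbon intensity, run length) is a placeholder assumption on my part, not a figure from the paper or the repo:

```python
# Back-of-envelope energy/CO2 estimate for a multi-GPU training run.
# All defaults are assumptions -- swap in real values if Meta publishes them.

def training_footprint(num_gpus: int,
                       wall_clock_hours: float,
                       gpu_power_kw: float = 0.3,       # ~V100 SXM2 TDP (assumed)
                       pue: float = 1.1,                # datacenter overhead (assumed)
                       grid_kgco2_per_kwh: float = 0.4  # varies hugely by region (assumed)
                       ) -> tuple[float, float]:
    """Return (energy_kwh, co2_kg) for a training run."""
    gpu_hours = num_gpus * wall_clock_hours
    energy_kwh = gpu_hours * gpu_power_kw * pue
    co2_kg = energy_kwh * grid_kgco2_per_kwh
    return energy_kwh, co2_kg

# Hypothetical example: 256 GPUs for two weeks.
energy, co2 = training_footprint(num_gpus=256, wall_clock_hours=14 * 24)
print(f"{energy:,.0f} kWh, ~{co2 / 1000:.1f} tonnes CO2")
```

The grid intensity term is the one that swings the most, which is exactly why the region/fuel-mix question matters.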
pyepyepie t1_izci4w8 wrote
Not from the Meta team, but you might want to take a look at the SM (supplementary materials) and search for "GPU"/"GPUs"; they actually did a very nice job describing the setup. It doesn't answer your question re: the region, but I thought it might be helpful, e.g. for the number of GPUs.
dborowiec10 t1_izckl91 wrote
Thanks, that's a good point of reference. Seems like Nvidia V100s (Volta)?
Would be interesting to see the total compute time involved.
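(For scale, with purely hypothetical numbers: 256 GPUs running for two weeks would be 256 × 336 ≈ 86,000 GPU-hours, which at ~300 W per V100 is on the order of 26 MWh before datacenter overhead. None of those figures come from the paper.)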