halixness t1_j9e80y1 wrote

So far I have tried BLOOM on Petals (a distributed LLM runtime); inference took around 30s for a single prompt on an 8GB-VRAM GPU, but not bad!