pan_berbelek t1_j1cpkth wrote

I'm trying to do basically the same thing, and yes, running BLOOM does require a lot of memory. I managed to run it on:

  • an ordinary computer with no GPU and 16GB of RAM, by loading the model in parts (divided into 73 parts), reloading them for every token (a sketch of this offloading approach follows the list). But this is painfully slow: 2-3 minutes per token produced
  • a VM in Azure with no GPU but lots of RAM (600+ GB). This can generate a single token in 2-3 seconds, still far too slow for my use case
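For reference, here is a minimal sketch of the disk-offload idea using Hugging Face transformers + accelerate, rather than my hand-rolled shard loading; the model name, memory cap, and offload folder are placeholders, not my exact setup:

```python
# Requires: pip install transformers accelerate
# Sketch of disk offload: cap CPU memory below physical RAM so accelerate
# keeps most weights on disk and pages them in per forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom"  # 176B-parameter sharded checkpoint on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",               # cpu + disk placement when no GPU is found
    max_memory={"cpu": "12GiB"},     # leave headroom for activations on a 16GB box
    offload_folder="bloom_offload",  # weights are paged to/from this directory
    torch_dtype=torch.bfloat16,
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The constant paging of weights from disk for every single token is exactly why this mode is so slow.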

Now I'm trying to run it on an Azure VM with 8 A100 GPUs, as recommended by the BLOOM authors, but this is of course significantly more expensive: the right-sized VM costs $35 per hour. From what I've read, this setup could be capable of generating a single token in less than 1 millisecond. If that's really true, then despite the high VM cost this is actually the cheapest setup for my use case, but I first need to validate that I can really achieve this speed.
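A minimal sketch of how I'd measure per-token latency on the 8x A100 box, assuming the same transformers/accelerate stack; device_map="auto" shards the layers across all visible GPUs, and the prompt and token count are arbitrary:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",           # spread layers across the 8 GPUs
    torch_dtype=torch.bfloat16,  # fits in 8x80GB; float32 would not
)

inputs = tokenizer("Deep learning is", return_tensors="pt").to("cuda:0")

# Warm-up run so CUDA init and first-pass allocation don't skew the timing
model.generate(**inputs, max_new_tokens=5)

n_tokens = 100
torch.cuda.synchronize()
start = time.perf_counter()
gen = model.generate(**inputs, max_new_tokens=n_tokens)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# Divide by the number of tokens actually produced (EOS can end it early)
n_generated = gen.shape[1] - inputs["input_ids"].shape[1]
print(f"{elapsed / n_generated * 1000:.1f} ms per token")
```

If the measured number is anywhere near the figure I read, the per-token cost math works out in favor of the GPU VM.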
