rePAN6517 t1_jc585bd wrote

> If you're a game developer, do you want to dedicate the bulk of the user's VRAM/GPU time to text inference to... add some mildly dynamic textual descriptions to NPCs you encounter? Or would you rather use those resources to, y'know, actually render the game world?

When you're interacting with an NPC, you're usually not moving around much or paying close attention to FPS. LLM inference would only happen at interaction time, and only for a second or so per interaction.
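
As a rough sketch of that pattern: fire the inference request when dialogue opens, run it on a background thread, and poll the result once per frame so the render loop never stalls. Names like `run_llm_inference` and `show_dialogue` are illustrative placeholders, not any real engine API:

```python
# Minimal sketch, assuming a hypothetical run_llm_inference() wrapper
# around whatever local model is actually shipped. The render loop
# never blocks: inference starts at interaction time and the result
# is polled once per frame.
from concurrent.futures import ThreadPoolExecutor, Future
from typing import Optional

executor = ThreadPoolExecutor(max_workers=1)  # one inference at a time
pending: Optional[Future] = None

def run_llm_inference(prompt: str) -> str:
    # Placeholder: swap in a real local-inference call here.
    return f"(generated reply to: {prompt})"

def on_npc_interact(npc_name: str, player_line: str) -> None:
    # Called when the player opens dialogue; kicks off inference.
    global pending
    pending = executor.submit(
        run_llm_inference, f"{npc_name} responds to: {player_line}"
    )

def per_frame_update() -> None:
    # Called once per frame; hands the text to the UI when ready.
    global pending
    if pending is not None and pending.done():
        show_dialogue(pending.result())
        pending = None

def show_dialogue(text: str) -> None:
    print(text)  # stand-in for the game's dialogue UI
```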

5

Jepacor t1_jc698s6 wrote

You can't just snap your fingers and instantly load a multi-GB LLM into VRAM and spin it up while the game is running, though.
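
For a sense of the cost, here's a rough timing sketch with PyTorch. The `"model.pt"` path and the assumption that it's a state dict of tensors are stand-ins; for a real multi-GB checkpoint both steps take seconds, which is why you'd keep the model resident rather than loading it mid-session:

```python
# Sketch: time the two separate costs of bringing a model online,
# disk -> system RAM, then system RAM -> VRAM.
import time
import torch

t0 = time.perf_counter()
state = torch.load("model.pt", map_location="cpu")  # disk -> RAM
t1 = time.perf_counter()
print(f"disk -> RAM: {t1 - t0:.2f}s")

if torch.cuda.is_available():
    # Assumes the checkpoint is a dict of tensors (hypothetical layout).
    state = {k: v.cuda() for k, v in state.items()}  # RAM -> VRAM
    torch.cuda.synchronize()  # wait for transfers before reading the clock
    t2 = time.perf_counter()
    print(f"RAM -> VRAM: {t2 - t1:.2f}s")
```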

3

zackline t1_jc69d50 wrote

I'm not sure about this, but I've heard that it's currently not possible to use CUDA while running a game, supposedly because the GPU needs to switch into a different mode or something along those lines.

If that is indeed the case, it might even be a hardware limitation that rules out this use case on current GPUs.

2