Viewing a single comment thread. View all comments

plocco-tocco t1_jdj9is4 wrote

It woulde be quite expensive to do tho. You have to do inference very fast with multiple images of your screen, don't know if it is even feasible.


ThirdMover t1_jdjf69i wrote

I am not sure. Exactly how does inference scale with the complexity of the input? The output would be very short, just enough tokens for the "move cursor to" command.


plocco-tocco t1_jdjx7qz wrote

The complexity of the input wouldn't change in this case since it's just a screen grab of the display. Just that you'd need to do inference at a certain frame rate to be able to detect the cursor, which isn't that cheap with GPT-4. Now, I'm not sure what the latency or cost would be, I'd need to get access to the API to answer it.