banmeyoucoward t1_jdhg7kt wrote

I'd bet that screen recordings + mouse clicks + keyboard inputs made their way into the training data too.


nmkd t1_jdhmgpm wrote

Nope, it's multimodal only in the sense that it understands language and images. It wasn't trained on mouse movement, because that's neither language nor imagery.