-Rizhiy- t1_j2lkgn2 wrote

Look at papers dealing with multi-modal tasks, e.g. Perceiver/Perceiver IO by DeepMind.

You can encode your data into tokens of the same size using something like an MLP, then feed these tokens into the decoder along with the encoder tokens. You should probably also add a learnable embedding for each type of data, so the model can tell the sources apart and the signals don't get confused.
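
A rough sketch of that idea, in plain Python so it runs without any framework. Everything named here is made up for illustration: `TOKEN_DIM`, `ModalityTokenizer`, `build_decoder_input`, the hidden size, and the "tabular"/"audio" modalities. It only shows the forward pass; in practice the MLP weights and the type embedding would be trainable parameters in something like PyTorch or JAX. It also assumes that "along with" means simple concatenation along the sequence axis, which is one option among several.

```python
import random

TOKEN_DIM = 8  # shared token width (an assumed value)


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(x, layer):
    """Apply a dense layer to a single feature vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class ModalityTokenizer:
    """Two-layer MLP mapping one data type to TOKEN_DIM, plus a type embedding."""

    def __init__(self, in_dim, rng, hidden_dim=16):
        self.fc1 = make_linear(in_dim, hidden_dim, rng)
        self.fc2 = make_linear(hidden_dim, TOKEN_DIM, rng)
        # One vector per data type; it would be learned during training.
        self.type_embedding = [rng.uniform(-0.1, 0.1) for _ in range(TOKEN_DIM)]

    def __call__(self, items):
        tokens = []
        for x in items:
            projected = linear(relu(linear(x, self.fc1)), self.fc2)
            # Adding the type embedding marks which data type the token came from.
            tokens.append([p + e for p, e in zip(projected, self.type_embedding)])
        return tokens


rng = random.Random(0)
# Hypothetical data types with different raw feature sizes.
tokenizers = {
    "tabular": ModalityTokenizer(in_dim=5, rng=rng),
    "audio": ModalityTokenizer(in_dim=12, rng=rng),
}


def build_decoder_input(encoder_tokens, extra_inputs):
    """Concatenate encoder tokens with the tokens from each extra data type."""
    sequence = list(encoder_tokens)
    for name, items in extra_inputs.items():
        sequence.extend(tokenizers[name](items))
    return sequence


encoder_tokens = [[0.0] * TOKEN_DIM for _ in range(3)]  # stand-in for encoder output
extra_inputs = {"tabular": [[1.0] * 5, [0.5] * 5], "audio": [[0.2] * 12]}
sequence = build_decoder_input(encoder_tokens, extra_inputs)
print(len(sequence), len(sequence[0]))  # 6 8
```

The decoder then attends over all 6 tokens as one sequence. Since every token has the same width, the decoder itself does not need to change when another data type is added; you only add another tokenizer with its own type embedding.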