Submitted by parabellum630 t3_z088fo in MachineLearning
yannbouteiller t1_ix4aesl wrote
We are also currently struggling to train a Transformer on 1D sequential data, in the hope that it may eventually outperform our state-of-the-art model based on a mix of CNN, GRU and time dilation. First, be careful about your choice of positional encoding: with low-dimensional embeddings, an additive positional signal can easily drown out your data. Second, according to the papers, dataset size will likely be a decisive factor: Transformers have less inductive bias than, e.g., GRUs, so you need an enormous amount of data to compensate for that.
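On the positional-encoding point, here is a minimal sketch (assuming PyTorch; the dimensions and names like `SinusoidalPositionalEncoding` are illustrative, not from the original comment) of one common way to avoid the problem: project the low-dimensional inputs up to the model dimension first, so the additive sinusoidal encoding does not dominate the data.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)                 # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2)
                             * (-math.log(10000.0) / d_model))        # (d_model/2,)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)                                # (max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the encoding rather than concatenate it
        return x + self.pe[: x.size(1)]

# Hypothetical setup: 4 raw features per time step, projected to d_model=128
# before the positional signal is added, so it does not swamp the low-dim data.
d_in, d_model = 4, 128
embed = nn.Linear(d_in, d_model)
pos_enc = SinusoidalPositionalEncoding(d_model)

x = torch.randn(8, 200, d_in)           # (batch, seq_len, features)
h = pos_enc(embed(x))                    # ready for a TransformerEncoder
```

Whether you add or concatenate the positional encoding, and how large you make the projection, are design choices worth ablating on your own data.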
hadaev t1_ix5eduw wrote
Just replace the GRU with a Transformer and keep the CNN as the positional encoding.
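One way to read this suggestion is a learned convolutional positional embedding: a Conv1d over the sequence whose output is added back to the input before the Transformer, so position is inferred from local context instead of a fixed sinusoid (similar in spirit to wav2vec 2.0). A rough sketch, assuming PyTorch; the module name, kernel size, and layer sizes are assumptions, not from the original comment:

```python
import torch
import torch.nn as nn

class ConvPositionalEncoding(nn.Module):
    """Depthwise Conv1d whose output is added to the sequence, acting as a
    learned, locally-informed positional encoding."""
    def __init__(self, d_model: int, kernel_size: int = 31):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> Conv1d expects (batch, d_model, seq_len)
        pos = self.act(self.conv(x.transpose(1, 2))).transpose(1, 2)
        return x + pos

# Illustrative drop-in: CNN-based positional encoding feeding a TransformerEncoder
d_model = 128
model = nn.Sequential(
    ConvPositionalEncoding(d_model),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
        num_layers=4,
    ),
)

x = torch.randn(8, 200, d_model)         # (batch, seq_len, d_model)
out = model(x)                            # (batch, seq_len, d_model)
```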