
vwings t1_iud9fb8 wrote

The best way is probably to use a feature encoding and plug this into a Transformer. First sample, with 200 features of group A and 5 of group B: encode it as the set {[A feats, encoding for A] W_A, [B feats (possibly repeated), encoding for B] W_B}. Second sample, with groups B and C: {[C feats, encoding for C] W_C, [B feats (possibly repeated), encoding for B] W_B}. The linear mappings W_A, W_B, and W_C must map to the same dimension. The order of the feature groups does not matter (permutation invariance of the Transformer). Note that this also learns a feature-group embedding.
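A minimal PyTorch sketch of what I mean (names like `FeatureSetTransformer`, the group dimensions, and `d_model` are my own illustrative choices, not anything standard): each feature group gets its own learned embedding and its own linear map into a shared token dimension, and the resulting set of tokens goes through a Transformer encoder with no positional encoding, so group order is irrelevant.

```python
import torch
import torch.nn as nn

class FeatureSetTransformer(nn.Module):
    """Hypothetical sketch: one token per feature group.

    Each group g contributes a token [feats_g, emb_g] @ W_g, where
    emb_g is a learned group embedding and W_g maps to a shared
    d_model. No positional encoding is added, so the encoder is
    permutation-invariant over the feature groups.
    """

    def __init__(self, group_dims, d_model=64, emb_dim=16, nhead=4):
        super().__init__()
        # learned encoding ("embedding") per feature group
        self.group_emb = nn.ParameterDict({
            g: nn.Parameter(torch.randn(emb_dim)) for g in group_dims
        })
        # per-group linear map W_g: (group features + embedding) -> d_model
        self.proj = nn.ModuleDict({
            g: nn.Linear(dim + emb_dim, d_model)
            for g, dim in group_dims.items()
        })
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, sample):
        # sample: dict mapping group name -> 1-D feature tensor;
        # each sample may contain a different subset of groups
        tokens = [
            self.proj[g](torch.cat([feats, self.group_emb[g]]))
            for g, feats in sample.items()
        ]
        tokens = torch.stack(tokens).unsqueeze(0)  # (1, n_groups, d_model)
        return self.encoder(tokens)

model = FeatureSetTransformer({"A": 200, "B": 5, "C": 30})
# first sample has groups A and B, second has B and C
out1 = model({"A": torch.randn(200), "B": torch.randn(5)})
out2 = model({"B": torch.randn(5), "C": torch.randn(30)})
```

Both samples come out as a set of per-group tokens of the same width, even though they contain different feature groups.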
