
nuthinbutneuralnet t1_j0rcwd1 wrote

If I have a large set of input features (1000s+) and most of them can be categorized into one of several feature groups (metadata, feature extractions A, feature extractions B, etc.), is it always necessary for your neural network architecture to reflect those feature groups? For example, is it better to have one flat fully connected layer over all of the features to allow for any type of cross-interaction, as opposed to, say, creating linear or embedding layers for each feature group before combining them together? What are the pros and cons of each? What is usually done in practice?

5

alkibijad t1_j14x08t wrote

This may not be the direct answer, but it's applicable to many problems:

  1. Use the simplest approach first. Here that means creating a simple model, in this case a single flat fully connected layer over all features.
  2. Measure the results.
  3. If the results aren't good enough, think about what could improve the results: different model architecture, training procedure, obtaining more data...
  4. Iterate (go to 2)
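To make step 1 concrete, here's a minimal PyTorch sketch of the flat baseline: concatenate every feature group into one vector and feed a small MLP. All sizes (`NUM_FEATURES`, `HIDDEN`, `NUM_CLASSES`) are hypothetical placeholders, not anything from the question.

```python
import torch
import torch.nn as nn

# Hypothetical sizes -- adjust to your own data.
NUM_FEATURES = 2000   # all feature groups concatenated together
HIDDEN = 256
NUM_CLASSES = 2

# Simplest baseline: one flat fully connected network over everything.
flat_model = nn.Sequential(
    nn.Linear(NUM_FEATURES, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, NUM_CLASSES),
)

x = torch.randn(8, NUM_FEATURES)  # a batch of 8 examples
out = flat_model(x)
print(out.shape)  # torch.Size([8, 2])
```

Measure this first (step 2); only move to a fancier architecture if it isn't good enough.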

Also:

`creating linear or embedding layers for each feature group before combining them together` - this injects additional prior knowledge into the network, so it may help... but in theory the network should be able to discover this structure on its own: the cross-interactions that don't make much sense will end up with weights close to zero. That's why I advise you to start without it, and only add it if the simple version falls short.
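If the flat baseline does fall short, the per-group variant might look like this sketch: one small linear "embedding" branch per feature group, concatenated before a shared head. The group names and sizes are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical feature-group sizes (metadata, extractions A, extractions B).
GROUP_SIZES = {"meta": 50, "feat_a": 1000, "feat_b": 950}

class GroupedNet(nn.Module):
    """One linear embedding per feature group, then a shared classifier head."""
    def __init__(self, group_sizes, embed_dim=64, num_classes=2):
        super().__init__()
        # One branch per group; ModuleDict registers their parameters.
        self.branches = nn.ModuleDict(
            {name: nn.Linear(size, embed_dim) for name, size in group_sizes.items()}
        )
        self.head = nn.Linear(embed_dim * len(group_sizes), num_classes)

    def forward(self, groups):
        # groups: dict mapping group name -> (batch, group_size) tensor
        embs = [torch.relu(self.branches[name](groups[name])) for name in self.branches]
        return self.head(torch.cat(embs, dim=1))

model = GroupedNet(GROUP_SIZES)
batch = {name: torch.randn(8, size) for name, size in GROUP_SIZES.items()}
print(model(batch).shape)  # torch.Size([8, 2])
```

The trade-off: the branches cut the parameter count and encode your grouping as a prior, but they also prevent raw cross-group interactions before the head, which the flat model allows.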


1K+ features: in some cases this is a lot of features, in others it's not that many... but it may make sense to reduce the number of features using a dimensionality reduction technique (e.g. PCA).
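As one concrete option, PCA via scikit-learn is a common starting point for this kind of reduction (the sample and component counts below are arbitrary, and real data would replace the random matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 500 samples, 2000 features (stand-in for real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2000))

# Keep the top 100 principal components; alternatively, pass a float in (0, 1)
# as n_components to keep enough components for that fraction of variance.
pca = PCA(n_components=100)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (500, 100)
```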

5