freaky1310 t1_iy91uxy wrote

TL;DR: Each model tries to solve the problems that affect the current state-of-the-art model.

Theoretically, yes. Practically, definitely not.

I’ll try to explain myself, please let me know if something I say is not clear. The whole point of training NNs is to find an approximator that can provide correct answers to our questions, given our data. The different architectures designed over the years address different problems.

Namely, CNNs addressed the curse of dimensionality: MLPs and similar fully-connected architectures wouldn’t scale to “large” images (large meaning larger than 64x64) because every pixel connects to every unit, so the number of weights grows quadratically with the layer sizes. Convolution was found to provide a nice way of aggregating local pixels (called “features” from now on) with small shared kernels, and CNNs were born.
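To put rough numbers on that scaling argument (my own back-of-the-envelope sketch, with arbitrary layer sizes): compare the parameter count of one dense layer versus one small conv layer on a 64x64 RGB image.

```python
# Parameter count: one dense layer vs. one conv layer on a 64x64 RGB image.
# Layer sizes here are illustrative choices, not from any particular model.

h, w, c = 64, 64, 3           # input image: height, width, channels
n_in = h * w * c              # flattened input size for the MLP

# Dense layer with 1024 hidden units: every input connects to every unit.
hidden = 1024
dense_params = n_in * hidden + hidden          # weights + biases

# Conv layer with 64 filters of size 3x3: the same small kernel is
# shared across all spatial positions, so params don't depend on h, w.
filters, k = 64, 3
conv_params = filters * (k * k * c) + filters  # weights + biases

print(f"dense: {dense_params:,} params")   # 12,583,936
print(f"conv:  {conv_params:,} params")    # 1,792
```

The dense layer needs ~12.6M parameters for a single hidden layer, and the count blows up further as the image grows; the conv layer stays at a couple of thousand regardless of image size.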

After that, expressiveness became a problem: for example, stacking too many convolutions erases too much information on one side, and significantly increases inference time on the other. To address this, researchers found recurrent units useful for retaining information and propagating it through the network. Et voilà, RNNs were born.
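A minimal sketch of what “retaining information and propagating it” means in a recurrent unit (toy scalar cell with made-up weights, not any particular RNN implementation): the hidden state folds each new input into everything seen so far.

```python
import math

# Toy scalar RNN cell: h_t = tanh(w_h * h_{t-1} + w_x * x_t).
# Weights are arbitrary illustrative values.
w_h, w_x = 0.5, 1.0

def rnn_step(h_prev, x):
    # h_prev carries information from all earlier inputs;
    # each step mixes the new input x into it.
    return math.tanh(w_h * h_prev + w_x * x)

h = 0.0
for x in [1.0, 0.0, 0.0]:     # input only at the first step
    h = rnn_step(h, x)

# h is still nonzero: the first input's influence persisted through
# two zero-input steps, something a stateless feed-forward pass can't do.
print(h)
```

The point is just the state: a feed-forward layer recomputes from scratch each time, while the recurrent unit keeps a running summary.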

Long story short: each different type of architecture was born to solve the problems of another kind of models, while introducing new issues and limitations at the same time.

So, to go back to your first question: can NNs approximate everything? Not literally everything, but (per the universal approximation theorem) a “wide variety of interesting functions”. In practice, they can try to approximate pretty much anything you’ll need, even though some limitations will always remain.
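As a tiny concrete instance of that approximation power (my own toy example): a one-hidden-layer ReLU network with just two units represents |x| exactly, since relu(x) + relu(-x) = |x|; stacking and combining pieces like this is how NNs build up more interesting functions.

```python
def relu(z):
    return max(0.0, z)

def abs_net(x):
    # Hidden layer: two ReLU units with weights +1 and -1, no biases.
    h1 = relu(1.0 * x)
    h2 = relu(-1.0 * x)
    # Output layer: sum the two hidden units with weight 1 each.
    return 1.0 * h1 + 1.0 * h2

# The hand-built net matches abs() on every input, positive or negative.
for x in [-2.5, -1.0, 0.0, 3.0]:
    assert abs_net(x) == abs(x)
print("abs_net matches abs on all test points")
```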
