
freaky1310 t1_iy91uxy wrote

TL;DR: Each model tries to solve the problems that affect the current state-of-the-art model.

Theoretically, yes. Practically, definitely not.

I’ll try to explain myself; please let me know if anything I say is unclear. The whole point of training NNs is to find an approximator that provides correct answers to our questions, given our data. The different architectures designed over the years address different problems.

Namely, CNNs addressed the curse of dimensionality: MLPs and similar architectures wouldn’t scale to “large” images (large meaning larger than 64x64), because the number of connections grows with the product of the layer sizes, which blows up quickly once every pixel is an input neuron. Convolution turned out to be a cheap way of aggregating neighboring pixels into compact representations (called “features” from now on), and CNNs were born.
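To make that scaling concrete, here’s a rough back-of-the-envelope sketch in Python (the layer sizes are my own illustrative picks, not from any specific paper):

```python
# Comparing the number of weights in one fully-connected layer vs. one
# convolutional layer on a 224x224 RGB image (biases ignored).

H, W, C = 224, 224, 3        # input image size
hidden = 4096                # width of a hypothetical dense layer

# A dense layer connects every input pixel/channel to every hidden unit.
dense_weights = (H * W * C) * hidden
print(f"fully-connected: {dense_weights:,} weights")   # ~616 million

# A conv layer only learns one small kernel per (in_channel, out_channel) pair,
# shared across all spatial positions.
kernel, out_channels = 3, 64
conv_weights = (kernel * kernel * C) * out_channels
print(f"convolutional:   {conv_weights:,} weights")    # 1,728
```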

After that, expressiveness became a problem: for example, stacking too many convolutions erases too much information on one side, and significantly increases inference time on the other. To address this, researchers found recurrent units useful for retaining information that would otherwise be lost and propagating it through the network. Et voilà, RNNs were born.
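If it helps, here’s a minimal sketch of a vanilla recurrent unit in NumPy (sizes and weights are arbitrary, just to show how a hidden state carries information forward):

```python
import numpy as np

# A vanilla recurrent unit: the hidden state h mixes the current input with the
# previous state, so information from earlier steps persists through the sequence.
input_size, hidden_size = 8, 16
rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrence step of a plain (Elman-style) RNN cell."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_size)
sequence = rng.normal(size=(5, input_size))   # 5 time steps of dummy input
for x_t in sequence:
    h = rnn_step(x_t, h)                      # earlier inputs keep influencing h
```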

Long story short: each type of architecture was born to solve the problems of other kinds of models, while introducing new issues and limitations of its own.

So, to go back to your first question: can NNs approximate everything? Not everything everything, but a “wide variety of interesting functions”. In practice, they can approximate pretty much anything you will actually need, even though some limitations will always remain.
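As a toy illustration of that “wide variety of interesting functions” (this assumes PyTorch and is just a sketch, not anyone’s reference implementation): a tiny MLP can fit sin(x) on an interval.

```python
import math
import torch
import torch.nn as nn

# Fit sin(x) on [-pi, pi] with a small tanh MLP.
x = torch.linspace(-math.pi, math.pi, 256).unsqueeze(1)
y = torch.sin(x)

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(f"final MSE: {loss.item():.5f}")   # should end up close to zero
```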


freaky1310 t1_iy7ielr wrote

Thanks for pointing out the article; it’s going to be useful for a lot of people.

Anyway, when we refer to the “black box” nature of DNNs, we don’t mean “we don’t know what’s going on”, but rather “we know exactly what’s going on in theory, but there are so many simple calculations that it’s impossible for a human being to keep track of them”. Just think of a relatively simple ConvNet like AlexNet (designed for ImageNet classification): it has ~62M parameters, meaning that the simple calculations (gradient updates and whatnot) are performed A LOT of times in a single backward pass.
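If you want to check that order of magnitude yourself, something like this works (assuming a recent torchvision; the exact count depends on the implementation):

```python
import torchvision.models as models

# Count AlexNet's parameters. Weights are randomly initialized here;
# we only care about the shapes, not the values.
alexnet = models.alexnet(weights=None)
n_params = sum(p.numel() for p in alexnet.parameters())
print(f"{n_params:,} parameters")   # roughly 61 million in torchvision's version
```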

Also, DNNs often work with a latent representation, which adds another layer of abstraction for the user: the “reasoning” part happens in a latent space that we don’t know anything about, except for some of its properties (and again, if we did the calculations we would know exactly what it is; it’s just unfeasible to do them).
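To make “latent representation” concrete, here’s a small sketch (again assuming torchvision, with a dummy input): the penultimate activations of a classifier are exactly the kind of vector we can compute but not easily read.

```python
import torch
import torchvision.models as models

# Chop the final classification layer off a ResNet-18 to expose its latent features.
resnet = models.resnet18(weights=None).eval()
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])  # drop the fc layer

image = torch.randn(1, 3, 224, 224)           # dummy input image
with torch.no_grad():
    latent = feature_extractor(image).flatten(1)
print(latent.shape)                            # torch.Size([1, 512]) -- 512 numbers with no obvious meaning
```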

To address these points, several research projects have focused on network interpretability, that is, finding ways of making sense of NNs’ reasoning process. Here’s a review written in 2021 regarding this.
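As a taste of what such methods look like, here’s a minimal sketch of a vanilla saliency map (assuming torchvision): the gradient of the top class score with respect to the input pixels, one of the simplest techniques in that literature.

```python
import torch
import torchvision.models as models

# Vanilla saliency: large gradient magnitudes mark pixels that most influence
# the predicted class score. Dummy input and random weights, just to show the mechanics.
model = models.resnet18(weights=None).eval()
image = torch.randn(1, 3, 224, 224, requires_grad=True)

scores = model(image)
scores[0, scores.argmax()].backward()          # backprop the top class score to the input

saliency = image.grad.abs().max(dim=1).values  # per-pixel importance, shape (1, 224, 224)
print(saliency.shape)
```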
