Submitted by Imaginary_Carrot4092 t3_xudng9 in MachineLearning

I am trying to fit a dataset with a neural network. The loss decreases very steeply in the first 2 epochs and then remains almost constant. I tried changing the hyperparameters, but the loss pattern never changed. My dataset is quite large, so there is no scarcity of data.

How do I identify the problem here, and how can I conclude that my data cannot be learned at all?

Data visualization

Losses

0

Comments

dumpyact t1_iqv19d4 wrote

Have you tried using an LR scheduler? I was able to reduce the loss in similar situations with one.
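For anyone unsure what an LR scheduler does, here is a minimal sketch assuming a plain step-decay rule. The function name `step_decay_lr` and its default values are hypothetical and not taken from any library; real frameworks ship their own schedulers.

```python
def step_decay_lr(base_lr, epoch, drop=0.5, every=10):
    """Return the learning rate for `epoch` under a step-decay schedule.

    The rate is multiplied by `drop` once every `every` epochs.
    All defaults here are illustrative assumptions.
    """
    return base_lr * (drop ** (epoch // every))


# Print the schedule at a few epochs to see the staircase shape.
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay_lr(0.1, epoch))
```

If the loss curve keeps the same shape under several such schedules, the learning rate is probably not the bottleneck.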

2

PassionatePossum t1_iqv5fo9 wrote

Are we talking about training loss or validation loss? The training loss will almost always go down, and on its own that means very little.

3

Imaginary_Carrot4092 OP t1_iqv9whz wrote

Do you mean reducing the LR as training progresses? I already tried playing with the LR, and it doesn't seem to change anything. The losses have a different magnitude, but the pattern is the same.

0

ThrowThisShitAway10 t1_iqviiv2 wrote

What loss are you using? It seems to be around 0.1, yet in your image the predictions are clearly worse than 0.1 MAE. I'm guessing there's some bug in your code.
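One way to check this is to recompute the error by hand from the predictions and compare it with the logged loss. The sketch below uses made-up numbers (not OP's data) and hypothetical helper names. It also shows that MSE and MAE on the same predictions can differ noticeably, so a logged loss of 0.1 only contradicts the plot if the loss really is MAE.

```python
def mae(preds, targets):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def mse(preds, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Illustrative values only: a model that always predicts 3.5.
targets = [3.0, 3.5, 4.0, 3.2, 3.8]
preds = [3.5] * len(targets)

mae_value = mae(preds, targets)
mse_value = mse(preds, targets)
print("MAE:", mae_value)
print("MSE:", mse_value)
```

If the hand-computed value on the original target scale disagrees with what the training loop logs, that points to a bug or to a loss computed on rescaled targets.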

1

PassionatePossum t1_iqvklvt wrote

Sorry, I still need a little more information. From the plots you have provided I would assume that (1) you have a regression problem with a 1D input and a 1D target, correct? Or (2) are we talking about a time series?

For the moment I'll go with assumption (1). The data you provided looks fairly random. I'm curious what function you want to use to model this. What does the network look like (how many layers), and what exactly are the inputs to your network (powers of your input variable, or something else)?

1

sanderbaduk t1_iqvmins wrote

You have a single input and single output? It's likely to just learn something like the smoothed average, which is quite reasonable.
Also, your post seems better suited to the subreddits mentioned under rule 4.

6

Imaginary_Carrot4092 OP t1_iqvqztc wrote

Yes, your assumption 1 is exactly right. The network I am using is very simple, with 2 hidden layers (I am not sure if this model is enough to learn this data). This is not time-series data.

The input is the number of hours it takes for a certain process to complete and the output is one of the process variables.

1

ThrowThisShitAway10 t1_iqvtrz7 wrote

Oh... then I'm not sure what you're expecting to learn. There doesn't appear to be much (if any) correlation between your input and output values. If you provide a 0.0 as input to the network, how is it supposed to predict an output? There's no indication whether the value should be 3.0 or 4.0, so it will always just predict around the mean.

This one input feature is pretty useless. The ideal model is just y=3.5 and doesn't include x at all. If you're able to provide more input features that actually correlate with the output, then you'll get an interesting model.
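To put a number on "no correlation", one option is to compare a constant mean predictor against a least-squares line. The sketch below runs on synthetic data generated to be independent of x (an assumption for illustration, not OP's dataset), and all function names are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    slope = cov / vx
    return slope, my - slope * mx


def mse(preds, ys):
    """Mean squared error."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


random.seed(0)
# Synthetic stand-in: y is drawn independently of x.
xs = [random.uniform(0.0, 10.0) for _ in range(5000)]
ys = [random.uniform(3.0, 4.0) for _ in range(5000)]

r = pearson_r(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = mse([mean_y] * len(ys), ys)
slope, intercept = linear_fit(xs, ys)
linear_mse = mse([slope * x + intercept for x in xs], ys)

print("pearson r:   ", r)
print("mean-only MSE:", baseline_mse)
print("linear MSE:   ", linear_mse)
```

When the two errors are nearly identical, x adds nothing linearly. Pearson's r only measures linear dependence, so a near-zero value does not rule out a non-linear relationship by itself.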

1

PassionatePossum t1_iqvz2n0 wrote

Yeah, this has no chance of working. Neural networks aren't magic. They are function approximators, nothing more. And a neuron can only learn a linear combination of its inputs.

Since you only have one input, the first layer will only be able to learn fractions of the original input. And the second layer will learn how to add them together. So some non-linearities (due to activations) aside, your model can essentially only learn to add fractions of the original input.

And while the universal approximation theorem says that theoretically this is enough to approximate any function if you make your network wide or deep enough, you have no guarantees that the solver will actually find the solution. And in practice, it won't.

A common trick is to use (1, x, x^2, ..., x^n) as input but I doubt that this will do the trick in your case. If there is a function that describes a relationship between your input variable and the output variable, it has to be a polynomial of extremely high degree.
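A minimal sketch of that polynomial-feature trick, for reference. The helper names and the 0 to 10 hour range are assumptions made up for the example. The rescaling step is my addition: raw powers of a large input grow very quickly, so mapping x to [-1, 1] first keeps them bounded.

```python
def scale_to_unit_interval(x, lo, hi):
    """Map x from [lo, hi] to [-1, 1] so that high powers stay bounded."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def polynomial_features(x, degree):
    """Return [1, x, x^2, ..., x^degree] for a scalar x."""
    return [x ** k for k in range(degree + 1)]


# Hypothetical input: 7.5 hours on an assumed 0-10 hour range.
hours = 7.5
x = scale_to_unit_interval(hours, lo=0.0, hi=10.0)
features = polynomial_features(x, degree=4)
print(features)
```

The resulting list would replace the single raw input as the network's input vector.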

If you have additional inputs you could use, it might help. But just looking at what you have provided, it is not going to work.

2

BrotherAmazing t1_iqwgcms wrote

You can’t tell if the data is “fairly random” or not just based on that plot though. Once one blue dot of finite size is plotted over another, two dots nearly on top of one another look the same to the human eye as 10 or 100 or any N > 1 dots plotted almost entirely on top of one another.

Unfortunately, OP doesn’t provide anything close to enough information for anyone here to truly be able to diagnose the problem (what is the theoretical relationship between these inputs/outputs?), or even if there is a problem; i.e., what would we estimate the Bayes Error Rate to be for this problem and what loss would that yield?
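One rough way to get past the overplotting and to estimate a noise floor is to bin x and look at the count, mean and variance of y in each bin. This is a sketch under assumptions: the data is synthetic, the function names are hypothetical, and the within-bin variance is only a crude stand-in for the irreducible squared error (it depends on the bin width and on how many points land in each bin).

```python
import math
import random
from collections import defaultdict


def binned_stats(xs, ys, n_bins=20):
    """Return (bin_center, count, mean_y, var_y) for each non-empty x bin."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    bins = defaultdict(list)
    for x, y in zip(xs, ys):
        idx = min(int((x - lo) / width), n_bins - 1)
        bins[idx].append(y)
    stats = []
    for idx in sorted(bins):
        vals = bins[idx]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats.append((lo + (idx + 0.5) * width, len(vals), mean, var))
    return stats


def explained_fraction(xs, ys, n_bins=20):
    """Share of Var(y) captured by per-bin means (0 = none, 1 = all)."""
    n = len(ys)
    mean_y = sum(ys) / n
    total_var = sum((y - mean_y) ** 2 for y in ys) / n
    within = sum(c * v for _, c, _, v in binned_stats(xs, ys, n_bins)) / n
    return 1.0 - within / total_var


random.seed(0)
xs = [random.uniform(0.0, 10.0) for _ in range(5000)]
# Two synthetic targets: one independent of x, one that depends on x.
ys_noise = [random.uniform(3.0, 4.0) for _ in xs]
ys_signal = [math.sin(x) + random.gauss(0.0, 0.1) for x in xs]

frac_noise = explained_fraction(xs, ys_noise)
frac_signal = explained_fraction(xs, ys_signal)
print("explained fraction, independent y:", frac_noise)
print("explained fraction, y = sin(x) + noise:", frac_signal)
```

A fraction near zero suggests no model of this single input will beat the mean by much, while the per-bin counts show how many points are hidden under each blob in the scatter plot.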

1

Tgs91 t1_iryxsqa wrote

A neural network is capable of learning non-linear relationships from a 1D input to a 1D output. The problem is that your data doesn't have any relationship between those variables. You need to find some input variables that are actually related to the output. A neural net can't approximate a relationship that doesn't exist.

1