
Hamster729 t1_iwo4ma4 wrote

Okay. So, as I understand it, your labels are usually either zero (before normalization) or negative, and only very rarely positive.

With the abs, it's easy for the model to reproduce the "baseline" level, because it's still zero after normalization, and as long as the last Dense produces a large negative number, the sigmoid turns it into essentially zero.

I think it would work even better if, instead of taking the abs, you set all positive labels to zero and then normalize. (After normalization, the "baseline" level becomes 1, which is also easy to reproduce.)
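
Something like this, as a rough sketch of that preprocessing (I'm assuming a 1-D NumPy label array and plain min-max scaling, since I don't know exactly how you normalize):

    import numpy as np

    def preprocess_labels(y):
        """Clip positive labels to zero, then min-max scale to [0, 1].

        Assumes y is a 1-D array whose values are mostly zero or negative;
        after scaling, the original zero "baseline" maps to exactly 1.0.
        """
        y = np.minimum(y, 0.0)              # set all positive labels to zero
        y_min = y.min()                     # the most negative label
        if y_min == 0.0:                    # degenerate case: nothing negative
            return np.ones_like(y, dtype=float)
        return (y - y_min) / (0.0 - y_min)  # y_min -> 0.0, baseline 0 -> 1.0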

In both cases, the model will work for data points that originally had negative or zero labels, but it won't work for data points with originally positive labels.

You have a problem without normalization, because the "baseline" level is no longer 0 or 1, and your model needs to converge on that number. I think it would get there eventually, but you'll need more training, and probably learning rate decay (replace the constant learning rate with a tf.keras.optimizers.schedules.LearningRateSchedule object and play with its settings).
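
For example, something along these lines, where the numbers are placeholders you'd have to tune and "model" is your existing model:

    import tensorflow as tf

    # Start at 1e-3 and multiply the learning rate by 0.9 every 10,000 steps.
    # All three numbers are placeholders - tune them for your problem.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3,
        decay_steps=10_000,
        decay_rate=0.9,
    )

    # "model" is your existing Keras model; the loss is just an example.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
        loss="mse",
    )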

The question is, do you want to, and do you expect to be able to, reproduce the positive labels? Or are they just random noise? If you don't need to reproduce them, just set them to zero. If they are valid and you need to reproduce them, do more training.

P.S. There are other things you could try. Here's an easy one: drop the abs, drop the normalization, and change the last layer to model.add(Dense(1, activation=None, use_bias=False)).
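
In context, the tail end of the model would then look roughly like this (just a sketch; the earlier layers, optimizer and loss are whatever you already have):

    from tensorflow.keras.layers import Dense

    # ... your existing layers up to the last one stay the same ...

    # Linear output, no bias: the network can emit raw positive and negative
    # values directly, so no abs and no label normalization is needed.
    model.add(Dense(1, activation=None, use_bias=False))

    model.compile(optimizer="adam", loss="mse")  # optimizer/loss just examples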


Hamster729 t1_iwmtbs9 wrote

It is not clear what you are doing, because your code does not match your plots. The model in your code outputs values in the 0..1 range, but your plots show large positive and negative values. To help you, we would need to understand what exactly is going on: either the complete model or the physical significance of your data. Generally speaking, unless the signs in your data have no significance (so e.g. a +5 and a -5 correspond to the same fundamental physical state), applying an abs to the data will only make the model perform worse.


Hamster729 t1_iv7swx8 wrote

Absolutely. In fact, you typically get more DL performance per dollar with AMD GPUs than with NVIDIA.

However, there are caveats:

  1. The primary target scenario for ROCm is Linux + docker container + gfx9 server SKUs (Radeon Instinct MIxxx). The further you move from that optimal target, the more uncertain things become. You can install the whole thing directly into your Ubuntu system, or, if you really want to waste a lot of time, compile everything from source, but it is best to install just the kernel-mode driver and then do "docker run --privileged" to pull a pre-built container image with every package already in place. I am not sure what the situation is with Windows support. Support for consumer-grade GPUs usually comes with some delay: e.g. Navi 21 support was only "officially" added last winter, and the new chips announced last week may not be officially supported for months after they hit the shelves.
  2. You occasionally run into third-party packages that expect CUDA and only CUDA. I just had to go through the process of hacking pytorch3d (the visualization package from FB) because it had issues with ROCm.