DeepNonseNse t1_it6yta0 wrote
Reply to comment by dearnot in [D] Simple Questions Thread by AutoModerator
>to clarify. I have read it everywhere, including the official forums - that feature normalization is not required when training the decision trees model
All XGBoost decision tree splits are of the form [feature] >= [threshold], so any order-preserving normalization/transformation (log, sigmoid, z-scoring, min-max, etc.) won't have any impact on the results. But if the order is not preserved, creating new transformed features can be beneficial.
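To illustrate why, here is a minimal pure-Python sketch (not XGBoost itself): a regression stump picks the threshold that minimizes squared error, and because the split test is just [feature] >= [threshold], a monotone transform such as log leaves the resulting partition of the samples unchanged. The toy data is made up.

```python
import math

def best_stump_partition(x, y):
    """Return the set of sample indices sent to the right child
    ([feature] >= threshold) by the squared-error-minimizing split."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_err, best_right = float("inf"), frozenset()
    for k in range(1, len(x)):
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right))
        if err < best_err:
            best_err, best_right = err, frozenset(order[k:])
    return best_right

x = [1.0, 3.0, 10.0, 30.0, 100.0]   # raw feature values (toy data)
y = [0.0, 0.1, 1.0, 1.1, 1.2]

raw = best_stump_partition(x, y)
logged = best_stump_partition([math.log(v) for v in x], y)
assert raw == logged  # log is order-preserving: same samples in each child
```

Only the numeric threshold changes under the transform; the grouping of samples, and therefore the tree's predictions, stay identical.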
Without doing any transformations or changes to the modelling procedure, with training data covering years 2000-2014 and test data covering 2015-2080, the predictions would stay close to the 2014 values, as you originally suspected. There is no hidden built-in magic to handle data shift.
One common way to tackle this type of time-series problem is to switch to autoregressive(-style) modelling. So, instead of using raw stock prices directly, use yearly change percentages.
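As a quick illustration of that preprocessing step (the prices and years below are made up), raw yearly prices are converted to year-over-year percentage changes; the model then predicts the change, and raw forecasts are recovered by compounding:

```python
# Hypothetical yearly closing prices (toy data)
prices = {2011: 100.0, 2012: 110.0, 2013: 99.0, 2014: 108.9}

years = sorted(prices)
pct_change = {
    yr: (prices[yr] - prices[prev]) / prices[prev] * 100.0
    for prev, yr in zip(years, years[1:])
}
# pct_change: 2012 -> +10.0%, 2013 -> -10.0%, 2014 -> +10.0%
# The target is now roughly stationary even when the price level drifts
# outside the training range; a predicted change r recovers the price via
# price[t] = price[t-1] * (1 + r / 100)
```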
DeepNonseNse t1_iqvgzsk wrote
Reply to comment by ResourceResearch in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
But then again, that just leads to another question: why are deep(er) architectures better in the first place?
DeepNonseNse t1_izxxdf0 wrote
Reply to comment by gwern in [D] G. Hinton proposes FF – an alternative to Backprop by mrx-ai
As far as I can tell, the tweet just means that you can combine learnable layers with some blackbox components which are not adjusted/learned at all. I.e. the model architecture could be something like layer_1 -> blackbox -> layer_2, where each layer_i is locally optimized using typical gradient-based algorithms and the blackbox just performs some predefined calculation in between.
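A minimal numpy sketch of that architecture (the names and fixed function are my own, purely illustrative): two locally-optimized linear layers with a fixed, non-learned blackbox in between. Because each layer would be trained against its own local objective, no gradient ever needs to flow through the blackbox, so it can even be non-differentiable.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # layer_1 weights (locally learned)
W2 = rng.normal(size=(4, 2))   # layer_2 weights (locally learned)

def blackbox(h):
    # any fixed, predefined computation with no learnable parameters;
    # here a non-differentiable sign function
    return np.sign(h)

def forward(x):
    h1 = np.maximum(x @ W1, 0.0)   # layer_1; trained on a local objective
    z = blackbox(h1)               # never updated, gradients never cross it
    h2 = np.maximum(z @ W2, 0.0)   # layer_2; trained on its own local objective
    return h2

out = forward(rng.normal(size=(5, 8)))
assert out.shape == (5, 2)
```

The training loops themselves are omitted; the point is only that the forward pass composes learned and fixed pieces without any end-to-end gradient.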
So given that, I can't see how the blackbox aspect is really that useful. If we initially can't tell what kind of values each layer is going to represent, it's going to be really difficult to come up with useful blackboxes beyond, say, some simple normalization or sampling.