Comments

No_Dust_9578 t1_j8vv85i wrote

A few things. Don't build a model from scratch; use a pre-trained one. There are plenty on Hugging Face. Later on, if you have your own data, you can use it to fine-tune those models to better suit your task. This is the general approach for ML applications where data isn't available or there isn't enough of it. Side note, speaking from experience: the large sentiment models out there do perform well, but some of them were trained on large sentiment datasets that contain inconsistencies. For instance, I once had to manually validate performance on my data and noticed that the pre-trained models labeled the following sentence as POSITIVE, even though to a human it is not positive: "oh yay, I love cold food...". So be careful and set up some sanity checks. Don't assume the predictions are accurate.
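One way the sanity checks mentioned above could look, as a rough sketch: keep a small hand-labeled list of tricky sentences and report where the model disagrees with the human label. Everything here is illustrative. `SANITY_CASES`, `run_sanity_checks`, and `naive_keyword_model` are hypothetical names, the NEGATIVE label on the sarcastic sentence is an assumption (the comment only says it is not positive), and the keyword model is just a stand-in for a real pre-trained model.

```python
# Hypothetical hand-labeled cases; replace with sentences from your own data.
SANITY_CASES = [
    ("oh yay, I love cold food...", "NEGATIVE"),  # sarcasm; label is an assumption
    ("The earnings report beat expectations.", "POSITIVE"),
    ("The company missed its revenue target.", "NEGATIVE"),
]


def run_sanity_checks(predict, cases):
    """Return (text, expected, predicted) for every case the model gets wrong."""
    failures = []
    for text, expected in cases:
        predicted = predict(text)
        if predicted != expected:
            failures.append((text, expected, predicted))
    return failures


def naive_keyword_model(text):
    """Stand-in for a pre-trained model: 'love' or 'beat' means POSITIVE."""
    lowered = text.lower()
    if "love" in lowered or "beat" in lowered:
        return "POSITIVE"
    return "NEGATIVE"


failures = run_sanity_checks(naive_keyword_model, SANITY_CASES)
for text, expected, predicted in failures:
    print(f"expected {expected}, got {predicted}: {text!r}")
```

With a real model, `predict` would wrap whatever inference call you use (for example a Hugging Face `pipeline("sentiment-analysis")` and its returned label), and the case list would grow as you find new failure modes.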

5

bubudumbdumb t1_j8wpmih wrote

Do you know how to validate a pricing signal, run a backtest, and optimize a portfolio? The NLP/ML part might be the easy one.

1

bubudumbdumb t1_j8wtq42 wrote

https://www.investopedia.com/terms/b/backtesting.asp

https://en.m.wikipedia.org/wiki/Modern_portfolio_theory

In extreme summary:

Markets are not stationary environments, so you have to expect and mitigate drift. This has implications for the evaluation methodology, and it favors time series models that can be calibrated with fewer data points.
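To make the evaluation point concrete, here is a minimal sketch of walk-forward splitting, one common way to evaluate under drift: train on a short recent window, test on the period right after it, then roll forward. The function name `walk_forward_splits` and the window sizes are assumptions for illustration, not something from the links above.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) using a rolling, fixed-length window.

    Each test block comes strictly after its training block, so the model
    never sees the future. The window advances by test_size each step.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train = list(range(start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        start += test_size


# Toy example: 10 time-ordered observations, train on 4, test on the next 2.
splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print("train", train, "test", test)
```

A fixed-length training window drops old data on purpose, which is one simple way to limit the effect of drift. scikit-learn's `TimeSeriesSplit` offers a similar scheme if you would rather not roll your own.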

A strategy for making money in the markets allocates capital across multiple financial instruments using multiple signals. The value of a signal is therefore the predictive advantage it provides when stacked on top of other commonly used signals. If the predictive capability of news sentiment is easily replicated by a linear combination of cheaply available signals, then it's not worth much.
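A rough first check on the replication point, sketched under assumptions: regress the new signal on the existing ones and look at the R². An R² near 1 means a linear combination of the existing signals already reproduces it. This is only a proxy for "adds predictive advantage" (a fuller test would compare out-of-sample forecasts with and without the signal). The names `solve_linear_system` and `replication_r2` and the toy data are hypothetical, and the code assumes the baseline signals are not collinear with each other.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def replication_r2(baseline_signals, new_signal):
    """R^2 of an OLS fit of new_signal on the baseline signals plus an intercept."""
    rows = [[1.0] + list(r) for r in baseline_signals]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, new_signal)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(new_signal) / len(new_signal)
    ss_res = sum((y - f) ** 2 for y, f in zip(new_signal, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in new_signal)
    return 1.0 - ss_res / ss_tot


# Toy data (made up): two cheap baseline signals observed over 8 periods.
x1 = [1, 1, 1, 1, -1, -1, -1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
baseline = list(zip(x1, x2))

# A "sentiment" signal that is just a blend of the baselines: redundant.
redundant = [0.5 * a - 0.2 * b for a, b in baseline]
# A signal unrelated to either baseline: potentially adds information.
independent = [1, -1, 1, -1, 1, -1, 1, -1]

r2_redundant = replication_r2(baseline, redundant)
r2_independent = replication_r2(baseline, independent)
print(r2_redundant, r2_independent)
```

A low R² does not prove the signal makes money; it only says the signal is not already contained in what you have, so it is worth testing further.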

1