No_Dust_9578 t1_j8vv85i wrote on February 17, 2023 at 9:01 AM

#1,841,164

A few things. Don't build a model from scratch; use a pre-trained one. There are plenty on Hugging Face. Later on, if you have your own data, you can use it to fine-tune those models to better suit your task. This is the general approach for ML applications where data isn't available or there isn't enough of it.

Side note, speaking from experience: the large sentiment models out there do have great performance, but some of them were trained on large sentiment datasets that have inconsistencies. For instance, I once had to manually validate the performance on my data and noticed that the pre-trained models predicted the following sentence as POSITIVE, even though to a human it is clearly not positive: "oh yay, I love cold food...". So be careful and set up some sanity checks. Don't assume the predictions are accurate.
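To make the "set up some sanity checks" advice concrete, here is a minimal Python sketch. It assumes you keep a small hand-labeled probe set (including sarcastic sentences like the one above) and can wrap any model as a function from text to a label string. The names `PROBES`, `sanity_check`, and `keyword_predict` are hypothetical, and `keyword_predict` is only a toy stand-in for a real model.

```python
from typing import Callable, List, Tuple

# Hypothetical hand-labeled probes; build your own from your actual data.
PROBES: List[Tuple[str, str]] = [
    ("oh yay, I love cold food...", "NEGATIVE"),  # sarcasm
    ("The food was hot and delicious.", "POSITIVE"),
    ("Great, another delayed flight.", "NEGATIVE"),  # sarcasm
]


def sanity_check(
    predict: Callable[[str], str], probes: List[Tuple[str, str]]
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Return accuracy on the probes and the (text, expected, predicted) misses."""
    if not probes:
        raise ValueError("need at least one probe")
    misses = []
    for text, expected in probes:
        predicted = predict(text)
        if predicted != expected:
            misses.append((text, expected, predicted))
    accuracy = 1 - len(misses) / len(probes)
    return accuracy, misses


def keyword_predict(text: str) -> str:
    """Toy stand-in for a real model: POSITIVE if any cue word appears."""
    cues = ("love", "great", "delicious")
    return "POSITIVE" if any(c in text.lower() for c in cues) else "NEGATIVE"


if __name__ == "__main__":
    acc, misses = sanity_check(keyword_predict, PROBES)
    print(f"accuracy: {acc:.2f}")
    for text, expected, predicted in misses:
        print(f"MISS: {text!r} expected {expected}, got {predicted}")
```

To probe a Hugging Face model instead, something like `clf = pipeline("sentiment-analysis")` (from `transformers`) wrapped as `lambda t: clf(t)[0]["label"]` should plug in, but the label strings depend on the model you load, so map them to your own labels first.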

justundertheblack OP t1_j8vvt1i wrote on February 17, 2023 at 9:09 AM

#1,841,206

Replying to No_Dust_9578 (#1,841,164)

Thanks for this, man

justundertheblack OP t1_j8vvwgw wrote on February 17, 2023 at 9:10 AM

#1,841,211

Replying to No_Dust_9578 (#1,841,164)

BTW, this is a school project, so we have to train our own model, and we have the dataset for it too. Do you know any good ones?

No_Dust_9578 t1_j8vw117 wrote on February 17, 2023 at 9:12 AM

#1,841,216

Replying to justundertheblack (#1,841,211)

Ah, I see. In that case, something like this should give you a good direction.
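If you need a from-scratch baseline to compare against, here is a minimal sketch (it is not the resource mentioned above): a multinomial Naive Bayes classifier with add-one smoothing in plain Python, assuming your dataset is a list of `(text, label)` pairs. The tokenizer, class name, and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Very simple tokenizer: lowercase, split on whitespace, strip punctuation.
    tokens = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [t for t in tokens if t]


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, data: List[Tuple[str, str]]) -> "NaiveBayesSentiment":
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for text, label in data:
            self.class_counts[label] += 1
            for w in tokenize(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)
        self.total_docs = sum(self.class_counts.values())
        return self

    def predict(self, text: str) -> str:
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[label][w] + 1) / (total_words + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    train = [
        ("I love this phone, it is great", "POSITIVE"),
        ("great battery and great screen", "POSITIVE"),
        ("terrible service, I hate it", "NEGATIVE"),
        ("awful screen and bad battery", "NEGATIVE"),
    ]
    model = NaiveBayesSentiment().fit(train)
    print(model.predict("great phone"))
    print(model.predict("hate this awful phone"))
```

Naive Bayes trains in a single pass over the data, which makes it a common first baseline; whatever model you build for the project should at least beat it on held-out data.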

justundertheblack OP t1_j8vw3f2 wrote on February 17, 2023 at 9:13 AM

#1,841,221

Replying to No_Dust_9578 (#1,841,216)

This seems good. I'll look into it.

bubudumbdumb t1_j8wpmih wrote on February 17, 2023 at 2:26 PM

#1,843,382

Do you know how to validate a pricing signal, and how backtesting and portfolio optimization work? The NLP/ML part might be the easy one.

justundertheblack OP t1_j8wpq8p wrote on February 17, 2023 at 2:26 PM

#1,843,395

Replying to bubudumbdumb (#1,843,382)

Nah, I don't. Can you point me towards some resources?

Hot_Initial7865 t1_j8wrxtj wrote on February 17, 2023 at 2:42 PM

#1,843,611

It sounds like a school assignment.

justundertheblack OP t1_j8ws5u6 wrote on February 17, 2023 at 2:43 PM

#1,843,630

Replying to Hot_Initial7865 (#1,843,611)

Nah, it's a college project 😂

justundertheblack OP t1_j8ws7px wrote on February 17, 2023 at 2:44 PM

#1,843,638

Replying to justundertheblack (#1,843,630)

Maybe I didn't describe it well enough.

justundertheblack OP t1_j8ws9ep wrote on February 17, 2023 at 2:44 PM

#1,843,642

Replying to Hot_Initial7865 (#1,843,611)

Help me out if you know things, though.

bubudumbdumb t1_j8wtq42 wrote on February 17, 2023 at 2:54 PM

#1,843,762

Replying to justundertheblack (#1,843,395)

https://www.investopedia.com/terms/b/backtesting.asp

https://en.m.wikipedia.org/wiki/Modern_portfolio_theory

To summarize very briefly:

Markets are not stationary environments, so you have to expect and mitigate drift. This has implications for the evaluation methodology and for the choice of time series models: favor ones that can be calibrated with fewer data points.
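One common way to respect non-stationarity when evaluating is walk-forward (rolling-window) validation: fit only on a recent window, test on the period right after it, then roll forward. The Python sketch below only generates the index splits; the function name and window sizes are hypothetical, and the short fixed training window reflects the point about models that can be calibrated with fewer data points.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n: int, train_window: int, test_window: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges over a time-ordered series of length n.

    Each fit sees only the most recent `train_window` points and is evaluated
    on the next `test_window` points, so no future data leaks into training
    and old regimes age out of the window.
    """
    if train_window <= 0 or test_window <= 0:
        raise ValueError("window sizes must be positive")
    start = 0
    while start + train_window + test_window <= n:
        train = range(start, start + train_window)
        test = range(start + train_window, start + train_window + test_window)
        yield train, test
        start += test_window  # roll forward by one test block


if __name__ == "__main__":
    for train, test in walk_forward_splits(n=10, train_window=4, test_window=2):
        print(list(train), list(test))
```

You would fit the model on each `train` slice, score it on the matching `test` slice, and look at how the scores vary over time rather than at a single number. scikit-learn's `TimeSeriesSplit` with `max_train_size` offers a similar split.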

A strategy to make money in the markets allocates capital across multiple financial instruments using multiple signals, so the value of a signal is the predictive advantage it provides when stacked on top of other commonly used signals. If the predictive capability of news sentiment is easily replicated by a linear combination of cheaply available signals, then it's not worth much.
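A rough way to measure that "predictive advantage when stacked on top" is to compare the out-of-sample R² of a linear return forecast with and without the candidate signal. This Python sketch assumes OLS, a single chronological train/test split, and synthetic data; the function names and the data-generating process are hypothetical, and it is not a full backtest.

```python
import numpy as np


def _with_intercept(X: np.ndarray) -> np.ndarray:
    # Prepend a column of ones so OLS fits an intercept.
    return np.column_stack([np.ones(len(X)), X])


def ols_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    coef, *_ = np.linalg.lstsq(_with_intercept(X), y, rcond=None)
    return coef


def oos_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # 1 - SSE / total sum of squares on the held-out period.
    resid = y_true - y_pred
    centered = y_true - y_true.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)


def incremental_r2(
    returns: np.ndarray, cheap: np.ndarray, candidate: np.ndarray, split: int
) -> float:
    """Out-of-sample R^2 gain from adding `candidate` on top of `cheap` signals.

    Rows are time-ordered: rows before `split` train, the rest test.
    """
    full = np.column_stack([cheap, candidate])
    scores = []
    for X in (cheap, full):
        coef = ols_fit(X[:split], returns[:split])
        pred = _with_intercept(X[split:]) @ coef
        scores.append(oos_r2(returns[split:], pred))
    return scores[1] - scores[0]


# Synthetic example (hypothetical data): returns depend on one cheap signal
# plus a hidden factor that only the "novel" signal partly captures.
rng = np.random.default_rng(0)
n = 2000
cheap = rng.normal(size=(n, 2))
hidden = rng.normal(size=n)
returns = 0.5 * cheap[:, 0] + 0.5 * hidden + rng.normal(size=n)

redundant = 2 * cheap[:, 0] - cheap[:, 1] + 0.1 * rng.normal(size=n)
novel = hidden + 0.5 * rng.normal(size=n)

gain_redundant = incremental_r2(returns, cheap, redundant, split=n // 2)
gain_novel = incremental_r2(returns, cheap, novel, split=n // 2)
print(f"redundant: {gain_redundant:.3f}, novel: {gain_novel:.3f}")
```

A gain near zero, as for the redundant candidate, means the signal is effectively a linear combination of the cheap ones. In practice you would repeat the comparison over walk-forward windows rather than trust a single split.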