2muchnet42day t1_jcjsy5s wrote on March 17, 2023 at 10:39 AM

Reply to comment by farmingvillein in [D] What is the best way to fine tune a LLM with your own data and build a custom text classifier? by pgalgali

Why do you suggest RoBERTa and not something like LLaMA or Stanford Alpaca?

farmingvillein t1_jckjsyr wrote on March 17, 2023 at 2:38 PM

Much more off-the-shelf right now (although that is changing rapidly)
No/minimal IP issues/concerns (although maybe OP doesn't care about that)

2muchnet42day t1_jckjy9i wrote on March 17, 2023 at 2:39 PM

Thank you

farmingvillein t1_jckm5r2 wrote on March 17, 2023 at 2:54 PM

Although note that OP does say that his data isn't labeled, and you of course need labeled examples to fine-tune RoBERTa. So you're going to need to bootstrap that process via manual labeling or, ideally, if able, via an LLM labeling process.
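
A minimal sketch of what such an LLM labeling step could look like. It assumes a hypothetical `complete` callable (prompt string in, model reply text out) standing in for whichever LLM client is actually used. The label set and prompt wording are illustrative assumptions, not something from this thread.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical label set; replace with the classes for your own task.
LABELS = ["billing", "technical", "other"]


def build_prompt(text: str, labels: List[str]) -> str:
    """Ask the model to answer with exactly one label name."""
    return (
        "Classify the text into exactly one of these labels: "
        + ", ".join(labels)
        + ".\nReply with the label only.\n\nText: "
        + text
    )


def parse_label(reply: str, labels: List[str]) -> Optional[str]:
    """Map a free-form reply to a known label, or None if it is unusable."""
    cleaned = reply.strip().lower().strip(".\"'")
    return cleaned if cleaned in labels else None


def bootstrap_labels(
    texts: Iterable[str],
    complete: Callable[[str], str],  # assumption: prompt in, reply text out
    labels: List[str] = LABELS,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (labeled pairs, texts that still need manual review)."""
    labeled, needs_review = [], []
    for text in texts:
        label = parse_label(complete(build_prompt(text, labels)), labels)
        if label is None:
            needs_review.append(text)  # reply did not match any known label
        else:
            labeled.append((text, label))
    return labeled, needs_review
```

The labeled pairs would then serve as training data for the smaller model, while anything in the review list falls back to manual labeling.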

If you go through the effort to set up an LLM labeling pipeline, you might just find that it is easier to use the LLM as the classifier directly, instead of fine-tuning yet another model (depending on cost, quality, and similar concerns).
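
One rough way to weigh that trade-off is to score the LLM on a small hand-labeled sample and estimate the per-item cost. This is only a sketch: `classify` is a hypothetical callable (text in, predicted label out), and the flat cost-per-call model is a simplifying assumption.

```python
from typing import Callable, List, Tuple


def accuracy_on_gold(
    gold: List[Tuple[str, str]],     # (text, hand-assigned label) pairs
    classify: Callable[[str], str],  # assumption: text in, predicted label out
) -> float:
    """Fraction of hand-labeled examples the classifier gets right."""
    if not gold:
        raise ValueError("need at least one hand-labeled example")
    correct = sum(1 for text, label in gold if classify(text) == label)
    return correct / len(gold)


def estimated_cost(n_items: int, cost_per_call: float) -> float:
    """Rough cost of classifying n_items with one LLM call each."""
    return n_items * cost_per_call
```

If accuracy on the hand-labeled sample is already acceptable and the estimated cost at production volume is tolerable, skipping the fine-tuning step may be the simpler option.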