
mxby7e t1_jdjzkzy wrote

Using OpenAI’s models to generate training data for competing models violates the terms of use, which is why the Stanford dataset is restricted.

17

__Maximum__ t1_jdkepie wrote

Also, it's very shady for a company called OpenAI. They claimed they became for-profit because they needed the money to grow, but these restrictions just show that they are filthy liars who only care about keeping power and making a profit. I'm sure they already have a strategy for getting around that 30B cap, just like they planned to steal money and talent by calling themselves a non-profit first.

17

throwaway2676 t1_jdl0y80 wrote

Alpaca was only trained on 50k instructions, right? A large group of grad students or a forum like Reddit could write that many by hand in a couple of weeks. I'm surprised they even had to resort to using ClosedAI.

8

mxby7e t1_jdl18t6 wrote

Maybe. Open Assistant by LAION is doing this kind of manual dataset collection. The training data and the model weights are supposed to be released once training is complete.

11

WarAndGeese t1_jdl5t0z wrote

Boo hoo to OpenAI, people should do it anyway. Are the terms of service the only reason not to do it, or are there actual material barriers? If it's a problem of money, then as long as people know how much it would cost, it can be crowdfunded. If it's a matter of manpower, there are already large volunteer networks. Or is it just not practical or feasible?

7

visarga t1_jdlpae7 wrote

OpenAI has first-hand RLHF data. Alpaca has second-hand data. I wonder whether third-hand data is good enough and free of any restrictions.

2

lexcess t1_jdlj8tf wrote

Classy, especially when they are breezing past the copyright on the datasets they train on. I wonder whether they can legally enforce that without setting a potentially bad precedent for themselves, or whether it could be worked around if the training were indirect, through something like Alpaca.

3

ebolathrowawayy t1_jdnc05i wrote

But what if you're training a model for a narrow use case and don't intend for anyone to use it except a niche set of users? Is that enough to be in the clear? Or is using OpenAI's model output to train a model a no-no for any purpose?

1

mxby7e t1_jdncs51 wrote

From my understanding, it's limited to non-commercial use, so you can use it for what you need, just not commercially.

1