
wind_dude t1_jdxrcpp wrote

>depend on the Alpaca dataset, which was generated from a GPT3 davinci model, and is subject to non-commercial use

Where do you get that? tatsu-lab/stanford_alpaca is Apache 2.0, so you can use it for whatever.

As for OpenAI, their terms say:

> (c) Restrictions. You may not (i) use the Services in a way that infringes, misappropriates or violates any person’s rights; (ii) reverse assemble, reverse compile, decompile, translate or otherwise attempt to discover the source code or underlying components of models, algorithms, and systems of the Services (except to the extent such restrictions are contrary to applicable law); (iii) use output from the Services to develop models that compete with OpenAI; (iv) except as permitted through the API...

So as far as I'm concerned, you are allowed to use the generated dataset for commercial purposes...

The only issue might be the licensing on the LLaMA models... but you can train another LLM.

2

lazybottle t1_jec8i0c wrote

Alpaca is not Apache 2.0

https://huggingface.co/datasets/tatsu-lab/alpaca#licensing-information

> The dataset is available under the Creative Commons NonCommercial (CC BY-NC 4.0).

Edit: I see the source of confusion. https://github.com/tatsu-lab/stanford_alpaca

While the code is released under Apache 2.0, the instruct dataset, as pointed out by OP, is not. One could potentially reproduce the generation steps, possibly with human-written ground truth, and release the result under a more amenable data license.

1