Submitted by coconautico t3_11c1hzc in MachineLearning
visarga t1_ja2r2fe wrote
Wouldn't it be better if people could donate their interactions with chatGPT, BingChat and other models? Make a scraping extension that collects chat logs and anonymises them. Then you've got a diverse distribution of real-life tasks.
I suspect this is the reason OpenAI and Bing offered their models for free to the public - to find the real distribution of tasks people want to solve with AI bots.
avocadoughnut t1_ja35pg6 wrote
There's risk of breaking OpenAI TOS by training on their models. It's a hard no for this project to ensure legal safety.
sebzim4500 t1_ja874jk wrote
Oh how the turntables.
coconautico OP t1_ja3nvs7 wrote
I have manually copy-pasted a few interesting questions (i.e., my input) that I previously asked chatGPT and that encouraged lateral thinking or required specialized knowledge.
However, I'm not so sure it would be a good idea to load thousands of questions indiscriminately. Just as we wouldn't phrase a question on Reddit the same way we would in person, when we ask chatGPT (or Google) a question, we slightly modify the way we talk to account for the weaknesses of the system. And given that we are looking for a high-quality dataset of natural conversations, I don't think this would be a very good strategy in the short term.
Moreover, we also have to consider that the project prioritizes quality above all else, and unless the number of volunteers ranking questions/replies increases considerably, the "ratio of trees to ready exported" wouldn't increase much either.
LetterRip t1_ja3rzqk wrote
> I have manually copy-pasted a few interesting questions that I asked chatGPT and encouraged lateral thinking or required specialized knowledge.
Don't do that - it violates ChatGPT's TOS, which could result in a lawsuit against the model developers.
coconautico OP t1_ja3ujgs wrote
According to OpenAI's terms of service, I'm the owner of the input (i.e., my question), which implies that they can use, modify, and distribute my input for the purpose of operating and improving the ChatGPT system, but they can't do anything to prevent me from using my data in other systems.
Link: https://openai.com/terms/
LetterRip t1_ja4d12c wrote
It appears they have changed the ToS. It used to restrict usage of output.
sebzim4500 t1_ja87cym wrote
> You may not [...] (iii) use the Services to develop foundation models or other large scale models that compete with OpenAI
coconautico OP t1_ja8abnh wrote
I can't use the output of ChatGPT to train other systems, but I can use my input however I want because, according to the TOS, I'm the owner of it.
sebzim4500 t1_ja8agwp wrote
Are you using the output of ChatGPT to determine which inputs you copy across and which ones you don't? If not, I agree that you are probably in the clear. Otherwise idk.
coconautico OP t1_ja8dbew wrote
No, I don't, because even if chatGPT could answer my question correctly, that doesn't mean that another assistant could.
Therefore, when I come up with a question that, from my point of view, could be challenging for a virtual assistant to answer, I end up typing it into OpenAssistant (again, just my question), regardless of whether I have searched Google/Reddit/StackOverflow/ChatGPT/... for the answer.