Submitted by Business-Lead2679 t3_1271po7 in MachineLearning
wind_dude t1_jecbli5 wrote
What are the concerns with the release of the [shareGPT] dataset? I really hope it does get released, since it looks like shareGPT has shutdown api access, and even web access.
gmork_13 t1_jecj9vo wrote
It'll be filled with copies of people attempting weird jailbreaks haha
wind_dude t1_jedvs9b wrote
That’d actually be pretty cool to see, could train some classifiers pretty quick and pull some interesting stats on how people are using chatgpt.
Hoping someone publishes the dataset.
[deleted] t1_jedvdeo wrote
[deleted]
ZCEyPFOYr0MWyHDQJZO4 t1_jedz3ps wrote
I'm guessing there's some PII/questionable data that couldn't easily be filtered.
KerfuffleV2 t1_jecbxy7 wrote
It's based on Llama, so basically the same problem as anything based on Llama. From the repo "We plan to release the model weights by providing a version of delta weights that build on the original LLaMA weights, but we are still figuring out a proper way to do so." edit: Nevermind.
You will still probably need a way to get a hold of the original Llama weights (which isn't the hardest thing...)
wind_dude t1_jecct1i wrote
ahh, sorry, referring to the dataset pulled from shareGPT that was used for finetuning. Which shareGPT has disappeared since the media hype about google using it for BARD.
​
Yes, the llama weights are everywhere, including HF in converted form for hf transformers.
Viewing a single comment thread. View all comments