firejak308 t1_ja4e7rp wrote on February 26, 2023 at 7:32 PM

Reply to comment by blueSGL in [R] [P] New ways of breaking app-integrated LLMs with prompt injection by taken_every_username

Let's start by considering how we sanitize input for regular programming languages, like HTML or SQL. In both cases, we look for certain symbols that could be interpreted as code, such as < in HTML or ' in SQL and escape them to not-code, such as &lt; and \'.

So for LLMs, what kinds of things could be interpreted as "code"? Well, any text. Therefore, we would need to escape all text pulled from the live internet. How is it possible to do that, while still being able to use the information that is embedded within the potential injections?

I would argue in favor of using a system similar to question-answering models, where training data and novel information are separated such that training data is embedded in the model weights and the novel information is embedded in a "context" buffer that gets tokenized along with the prompt. Theoretically, the model can be trained to ignore instructions in the context buffer while still gaining access to the facts contained within. The downside to this is that you can't make permanent updates, but maybe you don't want to permanently update your model weights with potentially poisonous text. Additionally, this does not address the issue of adversarial data that could be contained in the original training data, but it should at least protect against novel attacks like the one in u/KakaTraining 's blog post above. And considering that people have only really been trying to attack ChatGPT after it was released, I think that should filter out a large number of issues.

firejak308 t1_ja16y0h wrote on February 26, 2023 at 1:56 AM

Reply to [P] [N] Democratizing the chatGPT technology through a Q&A game by coconautico

My main concern with this is how the "Reply as Assistant" texts are generated. That task is orders of magnitude more difficult than labeling an existing reply/prompt or coming up with a new prompt, because it often requires doing background research about the question and summarizing it effectively. If I were to actually try to fill out one of the Reply as Assistant tasks, I would much rather just copy-paste the Google Knowledge Panel or the Wikipedia summary or the ChatGPT output. How do we know that people aren't doing those kinds of things, which could introduce plagiarism concerns?

firejak308 t1_it2esjt wrote on October 20, 2022 at 1:22 PM

Reply to comment by the_javi_himself in [R] State of the art audio classification by the_javi_himself

If you go on PapersWithCode and click the GitHub icon for any of the papers, it'll link you to their public repo, which in many cases will have pretrained models available.

firejak308 t1_iqvqnii wrote on October 3, 2022 at 1:34 PM

Reply to comment by AristocraticOctopus in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187

Thanks for this explanation! I've heard the general reasoning that "transformers have variable weights" before, but I didn't quite understand the significance of that until you provided the concrete example of relationships between x1 and x3 in one input, versus x1 and x2 in another input.