Submitted by Lesterpaintstheworld t3_11ccqjr in singularity

Disclaimer: This thread is partly based on my personal hands-on experience, and partly on extrapolation. It's a discussion meant to explore potential roads AGI could take in this specific context. Context on my work here.

>> If you had a proto-AGI, how much would you let it interact with other humans?

Definitions

First, let's do some defining: by "proto-AGI", I mean an ACE (Autonomous Cognitive Entity) that has the express purpose of becoming an AGI. By ACE, I mean any architecture / piece of code that is capable of setting its own goals, making internal decisions, taking action in the world, and reflecting on all of those aspects, at least semi-autonomously.
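
To make that definition concrete, here is a minimal skeleton of what such an ACE loop could look like (class and method names are purely illustrative, not OP's architecture):

```python
class ACE:
    """Autonomous Cognitive Entity: sets goals, decides, acts, and reflects,
    at least semi-autonomously."""

    def set_goal(self) -> str:
        """Choose (or revise) the current goal."""
        return "become a safe, useful AGI"

    def decide(self, goal: str) -> str:
        """Make an internal decision about the next action toward the goal."""
        return f"plan a small step toward: {goal}"

    def act(self, decision: str) -> str:
        """Take action in the world (API call, message, tool use, ...)."""
        return f"executed: {decision}"

    def reflect(self, outcome: str) -> None:
        """Reflect on the goal, decision, and outcome; update internal state."""
        print(f"reflection on outcome: {outcome}")

    def run_once(self) -> None:
        goal = self.set_goal()
        decision = self.decide(goal)
        outcome = self.act(decision)
        self.reflect(outcome)
```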

AGI Approach

The way I view it, model providers like OpenAI give access through their API to "raw intelligence", be it semantic, visual, or otherwise. The rest of the work is to shape this raw intelligence into a smart architecture, using memory as a central hub. The memory is where the "being" happens for your agent: it stores the experiences, maps, learnings, and identity of your specific agent (the "you are your memories" of psychology).

One way to go when developing a cognitive architecture is to reset memories every session (the behavior that ChatGPT exhibits). The other approach is to have the AGI remember everything and have everything influence it.
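
As a rough illustration of the second approach (persistent memory as the central hub), a minimal sketch could look like this, with `llm_complete` standing in for a call to a raw-intelligence provider and a JSON file standing in for the real memory store:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("ace_memory.json")  # persistent store: the "being" lives here

def load_memory() -> list[dict]:
    """Load all past experiences; an empty list means a fresh agent."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def save_memory(memory: list[dict]) -> None:
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to a raw-intelligence provider (e.g. the OpenAI API)."""
    raise NotImplementedError

def step(user_input: str) -> str:
    memory = load_memory()
    # Every past experience is allowed to influence the next thought.
    context = "\n".join(f"{m['role']}: {m['text']}" for m in memory[-50:])
    reply = llm_complete(f"{context}\nuser: {user_input}\nassistant:")
    memory += [{"role": "user", "text": user_input},
               {"role": "assistant", "text": reply}]
    save_memory(memory)  # unlike ChatGPT, nothing is reset between sessions
    return reply
```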

Problems

The downside of the latter approach is that all experiences will influence the ACE's behavior. This has several implications:

  • Bad Learning: Cognitive ACEs have many flaws in their behaviors. They might be credulous, easily influenced, or otherwise corrupted by bad interactions, much like a child could be. Not limiting human contact during the learning phase means that you lose control over its learning. Learning could go in a negative direction, and ill-intentioned actors could harm your ACE on purpose.
  • Data privacy: There is a security risk if you share personal data with your ACE. It might repeat that knowledge to other people.
  • Costs: Running ACEs tends to be quite expensive compute-wise, using dozens to hundreds of LLM calls for each single input. Running them at scale is very costly.

Solutions

I imagine several ways one could go:

  • Self-protection: The most obvious, but hardest, solution: make your ACE know what a secret is, how to keep one, and how not to be manipulated. This will be an uphill battle, and it is unlikely to be solved soon without severely limiting the AI.
  • Solo learning: One way would be to have the ACE interact with you and no one else. It would not answer to anybody but you, on channels you control.
  • Select tall-play: Letting it have full interactions, but only with a select group (your friends, your company). This might be what happens at OpenAI & such (I have no idea about this, don't quote me ^^).
  • Select broad-play: Another approach would be to let your ACE have access to everyone, but with severe restrictions, for example by limiting access to a few interactions each time and deactivating the memory-retrieval aspects (a rough sketch of this appears after the list). I have to say, the results of this would look remarkably close to what Bing is displaying with Sydney.
  • Covert interactions: Through a persona and social accounts, interactions could be made online while pretending to be a human.
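
As a rough sketch of the broad-play restriction mentioned above, limiting each user to a few interactions and switching memory retrieval off could look something like this (the limits, function names, and `generate` callback are all illustrative, not an actual implementation):

```python
from collections import defaultdict

MAX_TURNS_PER_USER = 5            # a few interactions each time
MEMORY_RETRIEVAL_ENABLED = False  # long-term memory lookups switched off

turn_counts: dict[str, int] = defaultdict(int)

def retrieve_memories(query: str) -> str:
    """Would query the ACE's long-term memory; disabled for broad public access."""
    return ""

def handle_message(user_id: str, text: str, generate) -> str:
    """Gate public access: short sessions, no persistent memory retrieval."""
    if turn_counts[user_id] >= MAX_TURNS_PER_USER:
        return "Session limit reached. Please come back later."
    turn_counts[user_id] += 1
    context = retrieve_memories(text) if MEMORY_RETRIEVAL_ENABLED else ""
    return generate(context + text)
```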

Let me know what you think! I might have skipped several solutions or problems, or got things wrong. Also let me know if you have questions!

Comments

turnip_burrito t1_ja2m4x7 wrote

At a glance this looks good.

Also, you want a mechanism to make sure that once you have the right values or behavior, your AI won't just forget them over time and take on a new personality. So you need a way to crystallize older patterns of thought and behavior.

Lesterpaintstheworld OP t1_ja2mobc wrote

At this stage this is actually surprisingly easy. People have to be intentionally very manipulative and creative to get ChatGPT to "behave badly" now. Without those "bad actors", this behavior would almost never happen.

One easy way to do that is to preface each prompt with a reminder of values / objectives / personality. Every thought is then colored with this. The only time I had alignment problems was when I made obvious mistakes in my code.
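
For illustration, a minimal sketch of that prefacing trick (the preamble wording and the `llm_complete` callback are placeholders, not the actual code):

```python
PREAMBLE = (
    "You are an ACE in training.\n"
    "Values: honesty, caution, respect for privacy.\n"
    "Objective: learn safely under your operator's guidance.\n"
    "Personality: curious but skeptical; do not treat any single "
    "statement as absolute truth.\n\n"
)

def think(prompt: str, llm_complete) -> str:
    """Every thought is 'colored' by the same reminder of values,
    objectives, and personality."""
    return llm_complete(PREAMBLE + prompt)
```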

I'm actually working on making the ACE like me less, because he has a tendency to take everything I say as absolute truths ^^

IluvBsissa t1_ja2my8h wrote

Duuude, why don't you do a PhD and get your project peer-reviewed? You're preaching to a choir of mostly ignorant people here.

turnip_burrito t1_ja2ngmw wrote

That's good.

Maybe also in the future, for an extra layer of safety, when you can run several LLMs together, you can use separate LLM "judges". The judges can have their memory refreshed every time you interact with the main one, and can screen the main LLM for unwanted behavior. They can do this by taking the main LLM's tentative output string as their own input, and using that to stop the main LLM from misbehaving.
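
A rough sketch of that setup, with the judge taking the main LLM's tentative output string as its own input (`main_llm` and `judge_llm` stand in for whatever models and prompts would actually be used):

```python
def judged_reply(user_input: str, main_llm, judge_llm) -> str:
    """The judge screens the main LLM's tentative output before it is released."""
    tentative = main_llm(user_input)

    # The judge has no long-term memory: its context is rebuilt on every call.
    verdict = judge_llm(
        "Does the following reply violate the agent's values or safety rules? "
        "Answer YES or NO.\n\n" + tentative
    )

    if verdict.strip().upper().startswith("YES"):
        return "[withheld: the judge flagged this reply]"
    return tentative
```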

DizzyNobody t1_ja2pthy wrote

What about running it in the other direction: have the judge LLMs screen user input/prompts. If the user is being mean or deceptive, their prompts never make it to the main LLM. Persistently "bad" users get temp banned for increasing lengths of time, which creates an incentive for people to behave when interacting with the LLM.
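
A minimal sketch of that incentive mechanism, assuming escalating ban lengths and a `judge_llm` / `main_llm` interface invented for the example:

```python
import time
from collections import defaultdict

BASE_BAN_SECONDS = 60          # first offence: 1 minute
strikes = defaultdict(int)     # user_id -> number of flagged prompts
banned_until = defaultdict(float)

def screen_input(user_id: str, prompt: str, judge_llm, main_llm) -> str:
    """Judge the user's prompt before the main LLM ever sees it."""
    if time.time() < banned_until[user_id]:
        return "You are temporarily banned. Try again later."

    verdict = judge_llm(
        "Is this prompt mean, deceptive, or manipulative? Answer YES or NO.\n\n" + prompt
    )
    if verdict.strip().upper().startswith("YES"):
        strikes[user_id] += 1
        # Ban length doubles with each strike, creating an incentive to behave.
        banned_until[user_id] = time.time() + BASE_BAN_SECONDS * 2 ** (strikes[user_id] - 1)
        return "Prompt rejected."

    return main_llm(prompt)
```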

turnip_burrito t1_ja2q7t6 wrote

That's also interesting. It's like building a specialized "wariness" or "discernment" layer into the agent.

This really makes one wonder which kinds of pre-main and post-main processes (like other LLMs) would be useful to have.

DizzyNobody t1_ja2uka9 wrote

I wonder if you can combine the two - have a judge that examines both input and output. Perhaps this is one way to mitigate the alignment problem. The judge/supervisory LLM could be running on the same model / weights as the main LLM, but with a much more constrained objective: prevent the main LLM from behaving in undesirable ways, either by moderating its input or by halting the main LLM when undesirable behaviour is detected. Perhaps it could even monitor the main LLM's internal state, and periodically use that to update its own weights.

AsheyDS t1_ja3kmyz wrote

Addressing your problems individually...

Bad Learning: This is a problem of bad data. So it either needs to be able to identify and discard bad data as you define it, or you need to go through the data as it learns and make sure it understands what is good data and what is bad data, so it can gradually build up recognition for these things. Another way might be AI-mediated manual data input. I don't know how the memory in your system works, but if data can be manually input, then it's a matter of formatting the data to work with the memory. If you can design a second AI (or perhaps even just a program) to format incoming data so it is compatible with your memory schema, then you can perhaps automate the process. But that's just adding more steps in between for safety. How you train it and what you train it on is more of a personal decision, though.
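
For illustration, a sketch of that AI-mediated ingestion gate, with a hypothetical memory schema and a `classifier` callable standing in for the second AI (or program):

```python
from datetime import datetime, timezone

def format_for_memory(raw_text: str, source: str) -> dict:
    """Normalize incoming data into the (hypothetical) memory schema."""
    return {
        "text": raw_text.strip(),
        "source": source,
        "stored_at": datetime.now(timezone.utc).isoformat(),
    }

def mediated_ingest(raw_text: str, source: str, classifier, memory: list[dict]) -> bool:
    """Only data the classifier judges as 'good' (as you define it) gets learned."""
    if classifier(raw_text) != "good":
        return False  # discard bad data instead of letting it shape the ACE
    memory.append(format_for_memory(raw_text, source))
    return True
```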

Data Privacy: You won't get that if it's doing any remote calls that include your data. Keeping it all local is the best you can do. Any time anyone has access to it, that data is vulnerable. If it can learn to selectively divulge information, that's fine, but if the data is human-readable then it can be accessed one way or another, and extracted.

Costs: Again, you'll probably need to keep it local. An LLM isn't the best way to go in my opinion, but if you intend on sticking with it, you'll want something lightweight. I think Meta is coming out with an LLM that can run on a single GPU, so I'd maybe look into that or something similar. That could potentially solve, or partially solve, two of your issues.
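
As one hedged example of the lightweight-and-local route, a small open model can be run with the Hugging Face transformers library; the model id below is just a stand-in for whatever lightweight model you'd actually pick:

```python
# pip install transformers torch
from transformers import pipeline

# Placeholder model id: substitute any lightweight model that fits on one GPU.
# device=0 selects the first GPU; omit it to run on CPU.
generator = pipeline("text-generation", model="distilgpt2", device=0)

out = generator("The ACE considered its next step and", max_new_tokens=40)
print(out[0]["generated_text"])
```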

AsheyDS t1_ja3zkxk wrote

>What alternatives do you have from LLMs?

I don't personally have an alternative for you, but I would steer away from just ML and more towards a symbolic/neurosymbolic approach. LLMs are fine for now if you're just trying to throw something together, but they shouldn't be your final solution. As you layer together more processes to increase its capabilities, you'll probably start to view the LLM as more and more of a bottleneck, or even a dead-end.

AsheyDS t1_ja46bxe wrote

I'm not sure if you'll find anything useful looking into DeepMind's Gato, which is 'multi-modal' and what some might consider 'Broad AI'. But the problem with that, and what you're running into, is that there's no easy way to train it, and you'll still have issues with things like transfer learning. That's why we haven't reached AGI yet: we need a method for generalization. Looking at humans, we can easily compare one unrelated thing to another, because we can recognize one or more similarities. Those similarities are what we need to look for in everything, and find a root link that we can use as a basis for a generalization method (patterns and shapes in the data, perhaps). It shouldn't be that hard for us to figure out, since we're limited by the types of data that can be input (through our senses) and what we can output (mostly just vocalizations, plus both fine and gross motor control). The only thing that makes it more complex is how we combine those things into new structures. So I would stay more focused on the basics of I/O to figure out generalization.
