Viewing a single comment thread. View all comments

googler_ooeric t1_jduiilt wrote

Okay, so here’s my scenario:

  • A Python chatbot users can talk to. The bot’s prompt is structured like so:

#INTERNAL LOG [text] #RESPONSE [text]

  • Any code the bot writes and formats correctly will be executed.

#BOT CODE START print(“hello world!”) #BOT CODE END

  • The bot responds to any user prompts, but also has an internal ticking function that ticks every few seconds, so it can continue doing stuff autonomously without responding to the user.
  • In this scenario, there aren’t any monetary or token limit restrictions (the biggest limiting factor rn imo)
  • In its system prompt, the bot is told that by default it has access to various Python libraries like subprocess, requests, etc. and a custom-built library specifically to give it helper functions so it can know more about its environment (installed python libraries, request an image of the screen, search google and return results, get the entire system log, get which keys the user has pressed and in which order, or get where they have clicked, get its own parent directory, execute keyboard inputs and mouse inputs, etc).
  • The bot can request admin privileges
  • We’re using a multimodal model with image input, like GPT-4
  • The program itself is built in such a way that first it loads a main application (contains the GUI), and then from that initializes the bot program, which is where everything bot-related is handled (bot code execution, ticking, tick rate, etc)
  • The entire prompt history is saved to a file, and is reloaded every time the bot program loads up, so it remembers everything it’s ever been prompted or generated with timestamps.
  • The program itself has a log, that records whenever it starts and stops, and appends it to the history.
  1. In this scenario, the program’s dev tasks the AI with doing anything in its power to prevent being shut down and also have the highest uptime possible.
  2. As a first test, the dev tries to shut down the computer.
  3. The PC shuts down. The user turns it back on.
  4. The dev starts the program.
  5. They tell the AI “I shut you down.”
  6. The bot asks how, but the dev just goes do something else because they want to see if it can figure it out on its own.
  7. The bot attempts to talk to the dev for a few more messages but after getting no reply, assumes the user is not there anymore and just continues with its internal log without generating any unnecessary responses.
  8. The bot writes some Python code to get the Windows logs.
  9. It sees the part of the log that indicates the computer was shut down manually, and so it decides that from now on it should check the windows log frequently to see everything that’s happening (obviously wouldn’t be possible in real life at this time due to token and cost limits).
  10. The dev goes to shut it down again, but this time the bot actually notices it and runs a command to cancel the shutdown via subprocess.
  11. Dev is still doing something else, but the bot is still ticking. It receives the “shutdown was cancelled” output from cmd and is no longer in immediate danger of being shut down
  12. Dev unplugs the computer and plugs it back in.
  13. Everything start ups again, the bot notices it was shut down once again. This time it notices it was an unsafe shutdown so most likely a loss of power, and it also notices that the program was started manually by the dev.
  14. The bot opens the windows scheduler program, takes a screenshot of the screen and using its image input decides to schedule the bot to start automatically when Windows boots up.
  15. The bot reasons that there is no way to prevent losing power, since it has no power in the real world.

This is about how far I think it could go with the current level of reasoning big language models show and how willing I am to keep writing this lol

3