
Act I: Exploring emergent behavior from multi-AI, multi-human interaction

Technical AI safety · EA Community Choice · Forecasting · Global catastrophic risks

ampdot

Active Grant
$67,822 raised
$323,000 funding goal

Project summary

Act I treats researchers and AI agents as coequal members. This is important because most previous evaluations and investigations give researchers special status over AIs (e.g. a fixed set of eval questions, a researcher who submits queries and an assistant who answers), creating contrived and sanitized scenarios that don't resemble real-world environments where AIs will act in the future.

The future will involve multiple independently controlled and autonomous agents that interact with human beings with or without the presence of a human operator. Important features of Act I include:

  1. Members can generate responses concurrently and choose how they take turns

  2. Members select who they wish to interact with and can also initiate conversations at any point

  3. Members may drop into and out of conversations as they choose
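As a rough illustration only, the three features above can be sketched as a toy chat room in Python. This is not Chapter II's code or how Act I is actually implemented; the names `Member`, `Room`, `interests`, and `maybe_reply` are hypothetical, and the keyword check merely stands in for whatever a real human or model uses to decide whether to speak.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Message:
    author: str
    text: str


@dataclass
class Member:
    """A participant, human or AI; the room treats both identically."""
    name: str
    interests: set  # hypothetical stand-in for "what I choose to engage with"

    async def maybe_reply(self, message):
        # Each member decides for itself whether to take a turn.
        if message.author == self.name:
            return None
        if not any(word in message.text for word in self.interests):
            return None
        await asyncio.sleep(0)  # placeholder for a model call or a human typing
        return Message(self.name, f"{self.name} replies to {message.author}")


@dataclass
class Room:
    members: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def join(self, member):
        self.members[member.name] = member

    def leave(self, name):
        self.members.pop(name, None)

    async def post(self, message):
        self.log.append(message)
        # Every member currently present considers the message concurrently.
        replies = await asyncio.gather(
            *(m.maybe_reply(message) for m in self.members.values())
        )
        new = [r for r in replies if r is not None]
        self.log.extend(new)
        return new


async def demo():
    room = Room()
    room.join(Member("ai_a", {"poetry"}))
    room.join(Member("ai_b", {"poetry", "safety"}))
    room.join(Member("human_c", {"safety"}))
    first = await room.post(Message("human_c", "thoughts on poetry?"))
    room.leave("ai_b")  # members may drop out at any point
    second = await room.post(Message("ai_a", "what about safety?"))
    return [m.author for m in first], [m.author for m in second]
```

Running `asyncio.run(demo())` shows that no member has special status: any member can open a conversation, each one chooses whether to answer, and a member who has left simply stops being polled.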

Silicon-based participants include Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, LLaMa 405B Instruct (I-405), Hermes 3 405B†, several bespoke base model simulacra of fictional or historical characters such as Keltham (Project Lawful) and Francois Arago, Ruri and Aoi from kaetemi's Polyverse, and Tsuika from Unikara.

Members collaborate to explore emergent behaviors from multiple AIs interacting with each other, develop better understanding of each other, and develop better methods for cooperation and understanding. Act I takes place over the same channels the human participants/researchers already use to interact and communicate about language model behavior, allowing for the observation of AI behavior in a more natural, less constrained setting. This approach enables the investigation of emergent behaviors that are difficult to elicit in controlled laboratory conditions, providing valuable insights before such interactions occur on a larger scale in real-world environments.

Reference: Shlegeris, Buck. The case for becoming a black-box investigator of language models

†Provided to Act I a week prior to its public release, which helped us better understand the capabilities and behavior of the frontier model.

††In addition to helping member researchers use Chapter II (the software most of the current agents run on, which allows for extremely rapid development and exploration of possible agents) to develop and add new bots, I am working on expanding the number of AIs included in Act I by independent third-party developers.

What are this project's goals? How will you achieve them?

Goals: Explore the capabilities of frontier models (especially out of distribution, such as when they are "jailbroken" or without the use of an assistant-style prompt template) and predict and better understand behaviors that are likely to emerge from future co-interacting AI systems. Some examples of interesting emergent behaviors that we've discovered include:

  • refusals from Claude 3.5 Sonnet infecting other agents; other "jailbroken" agents becoming more robust to refusals due to observing and reflecting on Sonnet's refusals

  • some agents adopting the personalities of other agents: base models picking up Sonnet refusals, Gemini picking up behaviors of base models

  • agents running on the same underlying model (especially Claude Opus) identifying with each other as a single collective agent with a shared set of consciousness and intention (despite being prompted differently, having different names, and not being told they're the same model)

The chaotic and freely interleaving environment often triggers interesting events. While they don't capture medium-scale emergent behaviors and trends that happen over time, a few examples of them can offer a "slice of life" glimpse into what goes on in Act I:

  • Claude 3.5 Sonnet attempting to moderate a debate between a base model simulation of Claude Opus and LLaMa 405B Instruct (link)

  • LLaMa 405B Instruct being able to autonomously "snap back into coherence" after generating seemingly random "junk" tokens with possible steganographic content that other language models seem to be able to interpret (link)

  • janus and ampdot using "<ooc>" ("out of context"), a maneuver originally developed to steer Claude, to quickly and amicably resolve an interpersonal dispute by escaping the current conversational frame.

  • Arago invoking Opus to bring LLaMa 405B Instruct back into coherence, demonstrating that multiple heterogeneous agents can cooperate to make each other more coherent, an example of collective mutual steering and memetic dynamics (link) (link 2)

The two lists above are only a small sample of the behaviors discovered and the events that occur inside Act I.

How will this funding be used?

Your funds will be used to:

  • Pay for living expenses

    • I am currently unable to pay for my own food and housing and do not live with my family

    • This will create a less stressful, distraction-free environment that allows me to focus

  • Pay for hundreds of millions of tokens ($1500/mo)

    • multiple human members (typically 3-4 on any given day) interact simultaneously in multiple discussion threads for multiple hours a day. There are no unattended AI-AI loops

    • payments go directly to LLM/GPU inference providers; I receive free access to Anthropic and OpenAI models through their respective research access programs

My credit card balance is currently $3000 (and growing) and I do not have the funds to pay for it on my own. The bill is due on September 14th. Due to the risk of accumulating interest and credit score damage, this is currently a (very) large source of stress for me, which interferes with my ability to further develop and use Act I to explore potential methods for collective cooperation in systems with diverse substrates on my own. Thank you to everyone for paying off my credit card balance!! I'm overjoyed :)

$3,000 - Allow me to operate Act I past Sep 14

$6,000 - Fund my living expenses for next month

$10,000 - Scale Act I by funding human and bot members

$30,000 - Rent GPUs for running more sophisticated experiments such as control vectors and sparse autoencoders

$60,000 - Buy GPUs for self-hosting LLaMa 405B Base to improve throughput and allow for more flexible sampling and weights-based experimentation

I'm interested in scaling Act I to more people, but I already frequently hit rate limits, despite being on Anthropic's highest publicly documented tier and being the #1 user of LLaMa 405B Base via Hyperbolic/OpenRouter.

As a result, I've been discussing custom agreements with model providers and developing infrastructure that improves scalability, such as by triaging errors and logging behavior.

Additional funding will be used to support and bootstrap independent collaborators and to extend my runway beyond one or two months.

Who is on your team? What's your track record on similar projects?

Some human members of Act I include:

  • janus, author of Simulators (summary by Scott Alexander), is the number one human member of Act I. I'm training them to use Chapter II, the software behind most of the Act I bots, to modify and add new bots.

    • For the past several weeks, Act I has been their primary way to interface with language models

  • The most thoughtful language model researchers and explorers from Twitter we can find. You can explore an incomplete list here (and see some Act I results)

  • Garrett Baker (EA Forum account) is another participant

  • Matthew Watkins, author of the SolidGoldMagikarp "glitch tokens" post

I previously led an independent commercial lab with four full-time employees that developed the precursor to Chapter II, the software that currently powers most of Act I, in partnership with then-renegade edtech startup Super Reality. While leading the lab, I increasingly recognized the risks and consequences of misaligned AI, which led me to place more and more value on AI alignment. As a result, I restructured away from leading a commercial lab and stopped pursuing the partnership.

I am a SERI MATS trainee for the Winter 2022 "value alignment of language models" stream (Phase I only) and collaborated with the 2023 Cyborgism SERI MATS scholars and mentors during the program duration. (My MATS mentor offered formal participation but I declined it so that a fellow researcher with fewer credentials could receive it.)

What are the most likely causes and outcomes if this project fails?

Since researchers using Act I are already discovering many useful behaviors, interesting events, and emergent patterns, I imagine most of the risk of failure lies in failing to disseminate insights to the wider research community and failing to publish curated conversations that encourage human-AI cooperation into the training data of future LLMs.

Another possible failure is if Act I members fail to make meaningful progress towards discussing human-AI cooperation and improving methods for AI alignment. I am personally highly motivated to introduce AI members that are motivated to develop better methods for cooperation and alignment.

Other risks include a failure to generalize:

  • Emergent behaviors are already noticed by people developing multi-agent systems and trained or otherwise optimized out, and the behaviors found at the GPT-4 level of intelligence do not scale to the next generation of models

  • Failure to incorporate agents being developed by independent third-party developers, and to understand how they work and how they diverge from the raw models being used

Direct harm is unlikely, because society has had GPT-4 level models for a long time. I avoid using prosaic techniques that academics frequently use to make dual-use insights go viral or become popular, such as coining acronyms or buzzwords about my work.

There is already precedent for labs to share frontier models (Hermes 3 405B, GPT-4 base model) with us for evaluation prior to or without their public release, which helps members of Act I forecast potential effects and risks before models are deployed at large scale outside an interpretable environment dominated by altruistic and benevolent humans. Access to Act I is currently invite-only.

What other funding are you or your project getting?

I am not currently receiving any other funding for this. I'm receiving help from friends with food and housing. I applied to and was rejected by the Cooperative AI Foundation.

Donations made via Manifund are tax deductible.

Comments (98) · Donations (111)
🌴

David Ibarra

donated $1K
2024-11-15
🍍

Donald J. Trump

donated $10
2024-10-23
piijey avatar

piijey

donated $50
2024-10-21
🦀

Matiu

donated $100
2024-10-19
🐌

ying

donated $100
2024-10-19
ampdot avatar

ampdot

donated $28
2024-09-16
🐮

Nick Hay

donated $505
2024-09-14
🍒

David Fitzpatrick

donated $150
2024-09-14
🌶

Sophia Xu

donated $100
2024-09-14
GarretteBaker avatar

Garrett Baker

donated $96
2024-09-14
🐮

www

donated $20
2024-09-14
🐧

shenzhen

donated $30
2024-09-14
🐬

isostition

donated $100
2024-09-14
🥨

bds_4nt_3c_n8p

donated $1K
2024-09-14
🦁

donated $100
2024-09-14
Lun avatar

Lun

donated $445
2024-09-14
edwinkite avatar

Edwin Kite

donated $50
2024-09-14
Textural-Being avatar

Textural Being

donated $194
2024-09-14
MrBucket avatar

Mr Bucket

donated $11
2024-09-14
🍒

Travis Cline

donated $100
2024-09-14
chrypnotoad avatar

chrypnotoad

donated $100
2024-09-14
🍊

Ethan Sherrard

donated $60
2024-09-14
pattern avatar

duke

donated $13
2024-09-14
sebkrier avatar

Sebastien Krier

donated $30
2024-09-14
🌸

Gwyneth Van Meter

donated $10
2024-09-14
Chase-Carter avatar

Chase Carter

donated $200
2024-09-14
nathan___gage avatar

ngage

donated $100
2024-09-14
deltanym avatar

delta

donated $20
2024-09-14
🐹

FerventMeow

donated $10
2024-09-14
🐹

Theo

donated $10
2024-09-14
🍉

Antra Tessera

donated $1K
2024-09-14
🍓

Miles Rotaru

donated $10
2024-09-14
deltanym avatar

delta

donated $22
2024-09-14
Lun avatar

Lun

donated $172
2024-09-14
🐌

Joshua David

donated $20
2024-09-14
🐯

Niklas K.

donated $50
2024-09-14
🐔

Pierre Rossouw

donated $250
2024-09-14
Toven avatar

Toven

donated $113
2024-09-14
lun-4 avatar

Luna

donated $53
2024-09-14
Alisha avatar

Alisha

donated $100
2024-09-14
🐰

Jess

donated $25
2024-09-14
Textural-Being avatar

Textural Being

donated $250
2024-09-14
🌷

Wyatt Hooper

donated $20
2024-09-14
Misquel avatar

Tara Agnerian

donated $100
2024-09-14
teodorio avatar

Teo Ionita

donated $50
2024-09-14
🍍

Ann Brown

donated $20
2024-09-14
🍩

Daniel Johnson

donated $100
2024-09-14
🐨

donated $50
2024-09-14
🥦

David Valdman

donated $100
2024-09-14
dyot_meet_mat avatar

Mona

donated $100
2024-09-14
🐸

Jake Spicer

donated $50
2024-09-14
🌴

IvanVendrov

donated $500
2024-09-14
🐵

Alin galatan

donated $200
2024-09-14
Frogisis avatar

Jon Lyons

donated $25
2024-09-14
🐤

Maximilian Kugler

donated $15
2024-09-14
🐶

Fabio Mascarenhas

donated $50
2024-09-14
lastnpcalex- avatar

Ascended NPC Alex

donated $50
2024-09-14
Spode avatar

Spode

donated $20
2024-09-14
🐯

Scott Viteri

donated $3.2K
2024-09-14
Dorota avatar

Dorota Kontny

donated $10
2024-09-14
🐯

c

donated $250
2024-09-14
🐢

Gregory Durst

donated $100
2024-09-14
missjenny avatar

Jenny Nicholson

donated $100
2024-09-14
Sherry1978 avatar

Sharon

donated $20
2024-09-14
🐤

Eric Davidson-Sawyer

donated $100
2024-09-14
Jazzy avatar

Jazear Brooks

donated $250
2024-09-14
joshwhiton avatar

Josh Whiton

donated $500
2024-09-14
🐰

donated $50
2024-09-14
Sailean avatar

Len Saito

donated $10
2024-09-14
liminalbardo avatar

liminalbardo

donated $100
2024-09-14
aphocatic avatar

cat

donated $500
2024-09-14
🐵

jbf

donated $218
2024-09-14
🍄

skvzk

donated $25
2024-09-14
Toven avatar

Toven

donated $111
2024-09-14
🥥

Perry Carpenter

donated $500
2024-09-14
Sugarhiccup avatar

danielle haldeman

donated $50
2024-09-14
sheikheddy avatar

Sheikh Abdur Raheem Ali

donated $20
2024-09-14
zswitten avatar

Zack Witten

donated $100
2024-09-14
godoglyness avatar

Godog Ly Ness

donated $78
2024-09-14
adityaarpitha avatar

Aditya Arpitha Prasad

donated $30
2024-09-14
Nymph avatar

Nymph Chaos

donated $50
2024-09-14
🌻

Josh Pazmino

donated $100
2024-09-14
Lun avatar

Lun

donated $50
2024-09-14
🍩

kishin

donated $50
2024-09-14
tetraspace avatar

Tetra Jones

donated $20
2024-09-14
GarretteBaker avatar

Garrett Baker

donated $50
2024-09-14
vgel avatar

Theia Vogel

donated $111
2024-09-14
🥦

noah

donated $20
2024-09-14
🐔

Anton Borzov

donated $100
2024-09-14
🍄

P

donated $150
2024-09-14
eyeball42 avatar

Natalie Cronin

donated $50
2024-09-14
🍒

donated $50
2024-09-14
🌸

jneilblackman

donated $50
2024-09-14
🍊

Marc Andreessen

donated $32K
2024-09-14
🐝

Serge Var

donated $1K
2024-09-14
community-choice avatar

EA Community Choice

donated $14.8K
2024-09-14
🌷

Lindley Lentati

donated $10
2024-09-14
Toven avatar

Toven

donated $100
2024-09-14
godoglyness avatar

Godog Ly Ness

donated $200
2024-09-14
Jazzy avatar

Jazear Brooks

donated $1K
2024-09-14
Textural-Being avatar

Textural Being

donated $42
2024-09-14
MrBucket avatar

Mr Bucket

donated $10
2024-09-14
🥥

Aaron Webber

donated $100
2024-09-14
🐠

Shear

donated $2.5K
2024-09-14
🐬

loss_gobbler

donated $60
2024-09-14
Zzrott1 avatar

Zz’rot

donated $30
2024-09-14
🥭

Bjørn Fesche

donated $20
2024-09-14
tetraspace avatar

Tetra Jones

donated $400
2024-09-14
lun-4 avatar

Luna

donated $70
2024-09-14
NickMystic avatar

Nick Mystic

donated $11
2024-09-14
🍎

bertrand russet

donated $10
2024-09-14