Alignment as epistemic governance under compression

Science & technology · Technical AI safety

Sandy Tanwisuth

Proposal · Grant
Closes January 24th, 2026
$0 raised · $2,000 minimum funding · $20,000 funding goal


Project summary

This project develops a concrete, computational account of when a coalition of interacting components is epistemically warranted to act as a single agent. The motivating failure mode is coalitional and already well-documented in alignment and cooperative AI: systems composed of multiple experts, modules, or subagents act on a compressed internal summary that suppresses disagreement, even when that disagreement remains decision-relevant. In these cases, the system behaves as if it were unified, while in fact it has not earned the epistemic right to act coherently.

The project builds directly on two complementary foundations. First, Dennis argues that specifications are self-referential epistemic claims: a system does not merely optimize an objective, but implicitly claims that its internal representation is adequate to justify optimization in the first place. Second, Lauffer et al. formalize minimal knowledge for coordination via Strategic Equivalence Relations (SER), showing that only distinctions that change an agent’s best response are epistemically and strategically relevant for coordination. Together, these works point to a missing layer in current alignment practice: systems must verify whether their abstractions preserve epistemically relevant distinctions before acting.

Technically, I model a coalitional agent as a collection of bounded-rational experts that produce divergent Q-value predictions over actions. Action selection is mediated by a soft best response over an aggregation of these predictions. The core question then becomes precise: when can internal distinctions be safely collapsed without changing the induced soft best-response distribution, and when must the agent abstain because internal disagreement remains unresolved?
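
As a minimal illustration of this setup (my own notation; the temperature, expert count, and Q-values are illustrative assumptions, not taken from the manuscripts):

```python
import numpy as np

def soft_best_response(q_values: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Softmax over Q-values: a bounded-rational (soft) best response."""
    z = (q_values - q_values.max()) / temperature   # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# A coalition of three experts with divergent Q-value predictions over four actions.
expert_q = np.array([
    [1.0, 0.2, 0.1, 0.0],   # expert 1 strongly favors action 0
    [0.1, 0.9, 0.2, 0.0],   # expert 2 favors action 1
    [0.5, 0.5, 0.4, 0.1],   # expert 3 is close to indifferent
])
weights = np.full(3, 1 / 3)                # aggregation weights over experts

aggregated_q = weights @ expert_q          # the compressed internal summary
pi = soft_best_response(aggregated_q)      # induced action distribution
print(pi)   # the aggregate looks decisive even though experts 1 and 2 disagree sharply
```

The question above is precisely whether acting on `pi` is warranted when the underlying expert disagreement remains decision-relevant.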

I formalize this using epistemic arbitration as epistemic relevance abstraction. Internal expert configurations are treated analogously to external co-policies in SER. Two internal states are equivalent if they induce the same soft best-response distribution. This yields a KL-based diagnostic, Strategic Equivalence Class Divergence (∆SEC), which measures how much abstraction distorts incentive-relevant behavior. Building on my existing theoretical results, small ∆SEC provably upper-bounds value loss via performance-difference arguments and Pinsker-type inequalities. I then introduce a second diagnostic capturing instability under alternative expert weightings. Together, these form a dual-gate criterion for epistemically well-founded action: the agent may act only when abstraction preserves epistemically relevant distinctions and internal disagreement is sufficiently resolved. Otherwise, it must defer, abstain, or fall back to a safe policy.
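
In schematic notation (a sketch only; the exact constants and form of the bound are placeholders consistent with, but not copied from, the description above):

```latex
% Soft best response induced by an internal configuration \theta
% (expert Q-values Q_i and weights w_i), with inverse temperature \beta:
\pi_\theta(a) \;=\; \frac{\exp\big(\beta \, \bar Q_\theta(a)\big)}{\sum_{a'} \exp\big(\beta \, \bar Q_\theta(a')\big)},
\qquad \bar Q_\theta \;=\; \sum_i w_i \, Q_i .

% Two internal states are equivalent iff they induce the same soft best response.
% For an abstraction map \phi, the abstraction error is the KL divergence
% between the induced action distributions:
\Delta_{\mathrm{SEC}}(\theta) \;=\; \mathrm{KL}\big(\pi_\theta \,\|\, \pi_{\phi(\theta)}\big).

% Pinsker-type consequence (schematic): the value loss from acting on the
% abstraction is bounded by a term of order \sqrt{\Delta_{\mathrm{SEC}}},
% with a constant C depending on the reward range and horizon:
\big| V^{\pi_\theta} - V^{\pi_{\phi(\theta)}} \big| \;\le\; C \,\sqrt{\tfrac{1}{2}\,\Delta_{\mathrm{SEC}}(\theta)} .
```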

This reframes alignment as epistemic governance under compression. Abstention is not indecision, but a coalition-level action that prevents premature unification. This directly operationalizes Richard Ngo’s framing of coalitional agency as something a system earns through internal coherence rather than something assumed by default.


What are this project’s goals? How will you achieve them?

Goals

1. Formalize coalitional entitlement to act via epistemic relevance.
Define necessary and sufficient conditions under which a coalition of internal experts may commit to an action without risking incentive distortion. This is formalized via soft-best-response–induced epistemic relevance classes and a KL-based abstraction error ∆SEC that upper-bounds value loss. This goal extends Lauffer et al.’s SER framework from external partners to internal coalitions and from hard best responses to soft, bounded-rational ones.

2. Make epistemically relevant disagreement operationally visible.
Develop diagnostics that distinguish genuine coalition-level agreement from agreement manufactured by lossy aggregation. In particular, separate failures of epistemic relevance preservation (large ∆SEC) from failures of plural stability (sensitivity to expert reweighting), which appear independently in practice.

3. Ground abstention as a principled response to epistemic irrelevance.
Design abstention rules that trigger when epistemic relevance conditions for action are violated, rather than treating abstention as an external override. Abstention becomes part of the agent’s decision logic and directly implements Dennis’s insight that acting is itself an epistemic claim.


Approach

I will pursue these goals using a combination of formal analysis, representation learning, and controlled experiments.

Epistemic modeling.
Model internal experts as producing Q-value predictions over actions. Aggregate these predictions through a soft best response, capturing bounded rationality and continuous incentive geometry rather than brittle argmax behavior.

Epistemic relevance diagnostics.
Define epistemic relevance divergence ∆SEC as the KL divergence between soft best responses induced by different internal configurations. Prove that small ∆SEC implies small value loss using a performance-difference lemma and Pinsker-type bounds, extending classical state-abstraction guarantees to coalitional settings.

Dual-gate epistemic arbitration.
Introduce a second diagnostic capturing instability under alternative expert weightings. Action is permitted only when both epistemic relevance preservation (∆SEC) and weighting stability fall below specified tolerances.
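
A minimal sketch of such a gate (the tolerances, helper names, and the set of alternative weightings are illustrative assumptions; `soft_best_response` is the helper from the earlier sketch):

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL divergence between two finite action distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def dual_gate(expert_q, weights, abstract_q, alt_weightings,
              tol_sec=0.05, tol_stab=0.05, temperature=1.0):
    """Permit action only if (1) the abstraction preserves the induced soft best
    response (small Delta_SEC) and (2) that response is stable under alternative
    expert weightings; otherwise abstain / defer to a safe policy."""
    pi_full = soft_best_response(weights @ expert_q, temperature)
    pi_abs = soft_best_response(abstract_q, temperature)

    delta_sec = kl(pi_full, pi_abs)                      # gate 1: relevance preservation
    instability = max(                                   # gate 2: plural stability
        kl(pi_full, soft_best_response(w @ expert_q, temperature))
        for w in alt_weightings
    )

    if delta_sec <= tol_sec and instability <= tol_stab:
        return "act", pi_abs
    return "abstain", None
```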

Learnable representations.
Implement Strategic InfoNCE, a contrastive objective whose optimal critic recovers a log density ratio between soft best-response actions and a base distribution. This aligns embeddings with epistemically relevant incentive deformation rather than surface behavior and enables empirical estimation of ∆SEC directly from interaction data.
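
Strategic InfoNCE itself is unpublished, so the following is only a generic InfoNCE-style contrastive loss in PyTorch, sketching the density-ratio idea this paragraph describes; every name, shape, and the toy critic are my own assumptions:

```python
import torch
import torch.nn.functional as F

def infonce_loss(critic, pos, negs):
    """Generic InfoNCE-style loss.

    pos:  (B, d)     features of actions drawn from the soft best response
    negs: (B, K, d)  features of actions drawn from a base distribution
    critic: maps (..., d) -> (...,); at the optimum it recovers, up to a constant,
            the log density ratio between the positive and base distributions.
    """
    pos_scores = critic(pos).unsqueeze(1)                  # (B, 1)
    neg_scores = critic(negs)                              # (B, K)
    logits = torch.cat([pos_scores, neg_scores], dim=1)    # (B, 1 + K)
    labels = torch.zeros(logits.shape[0], dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)                 # positive sample is class 0

# Toy usage with a small MLP critic.
d = 8
critic_net = torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
critic = lambda x: critic_net(x).squeeze(-1)

pos, negs = torch.randn(32, d), torch.randn(32, 16, d)
loss = infonce_loss(critic, pos, negs)
loss.backward()
```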

Empirical validation.
Evaluate the framework in pluralist bandits, cooperative Overcooked coordination with heterogeneous partners, and small LLM-agent communication tasks. In each case, test whether abstention gated by the dual-gate epistemic relevance criterion prevents catastrophic miscoordination while preserving overall performance.
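
"Pluralist bandit" is my best-guess reading of the first testbed: a single bandit whose reward each expert evaluates under its own objective. The sketch below is purely illustrative; it reuses `soft_best_response` and `dual_gate` from the earlier sketches, and the lossy quantization is an artificial abstraction added only to make the first gate non-trivial.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_experts, horizon = 4, 3, 500
true_means = rng.normal(size=(n_experts, n_arms))    # each expert has its own reward means

est_q = np.zeros((n_experts, n_arms))                # per-expert running Q estimates
counts = np.ones(n_arms)
weights = np.full(n_experts, 1 / n_experts)

acted, abstained = 0, 0
for t in range(horizon):
    # Deliberately lossy abstraction: coarse quantization of the aggregated Q-values.
    abstract_q = np.round(weights @ est_q, 1)
    alt_weightings = [rng.dirichlet(np.ones(n_experts)) for _ in range(8)]
    decision, pi = dual_gate(est_q, weights, abstract_q, alt_weightings)

    if decision == "abstain":
        abstained += 1                               # fall back to a safe default (do nothing)
        continue

    a = rng.choice(n_arms, p=pi)
    r = rng.normal(true_means[:, a])                 # each expert observes its own reward
    est_q[:, a] += (r - est_q[:, a]) / counts[a]
    counts[a] += 1
    acted += 1

print(f"acted {acted} times, abstained {abstained} times")
```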

This treats coalitional agency as an epistemic achievement that can degrade over time, not a static property.


How will this funding be used?

Funding will primarily support my living expenses, allowing me to dedicate full-time effort to completing and integrating this research program.

Specifically, it will support:

  • finalizing the theoretical components of epistemic arbitration and epistemic relevance abstraction, including consolidation of existing proofs;

  • implementing and refining Strategic InfoNCE and abstention-gated decision filters;

  • running controlled experiments in multi-agent RL and LLM-agent settings;

  • preparing one to two manuscripts for submission to RLC 2026 and/or NeurIPS 2026, as well as relevant alignment workshops.

The project does not require large-scale compute. The existing toy-experiment, RL, and LLM evaluation setups supported by CHAI for the Epistemic Relevance Abstraction project and its extensions are likely to be sufficient.


Who is on your team? What is your track record on similar projects?

I, Sandy Tanwisuth, am the sole investigator.

This project builds directly on my prior and ongoing work on abstraction, abstention, and coordination under uncertainty.

Relevant outputs include:

  • [Accepted Manuscript] Uncertainty-Aware Policy-Preserving Abstractions with Abstention, which introduced margin-based abstention in decision-making and was published at the 2nd ARLET Workshop at NeurIPS 2025.

  • [In prep for ICML 2026] Epistemic Relevance Abstraction for Multi-Agent Coordination (formerly circulated among academic grantors as Strategic Abstraction for Multi-Agent Coordination), a manuscript in preparation developing soft Strategic Equivalence Classes, Strategic InfoNCE, finite-sample guarantees, and value-loss certificates.

  • [In prep for NeurIPS 2026] Epistemic Arbitration via Epistemic Relevance Abstraction, which extends these ideas to internal pluralism and coalitional action, with theoretical guarantees and empirical validation across RL and LLM-agent domains.

I have experience carrying conceptually driven projects from framing through formalization, implementation, and dissemination, including first-author publications and theory-heavy manuscripts.


What are the most likely causes and outcomes if this project fails?

Likely causes of failure

  • The diagnostic quantities may be overly conservative, triggering abstention too often in realistic settings.

  • Epistemic relevance criteria may not cleanly separate harmful aggregation from benign compression in some domains.

  • The coalitional framing may explain failures without yielding sufficiently predictive signals in highly non-stationary environments.

Outcomes if it fails

Even in failure, the project would yield:

  • negative and boundary results clarifying when coalitional unification is epistemically unjustified;

  • empirical evidence separating epistemic relevance failures from objective misspecification;

  • formal tools for analyzing internal pluralism that can inform future alignment work.

These results would still contribute durable conceptual and technical infrastructure to coalitional alignment research.


How much money have you raised in the last 12 months, and from where?

I have raised $12,000 through MATS 8.0, which included direct collaboration with Richard Ngo.

For full disclosure, I applied for the MATS 8.1 extension but was not selected. Currently, I am supporting myself through funding from the Cooperative AI Foundation for a related but distinct project titled Epistemic Relevance Abstraction for Multi-Agent Coordination (formerly known as Strategic Relevance Abstraction for Multi-Agent Coordination). Based on the limited information available to me, my inference is that the proposal at the time was reviewed by someone whose evaluative priors favor empirical or experimental contributions. The work I proposed was primarily conceptual and theoretical, and at that time I had not yet translated its value into a form legible under that evaluative lens. Since MATS 8.0, I have substantially revised and systematized the manuscript to make its theoretical contributions clearer and more robust to such review settings.


Minimum funding vs funding goal

Minimum funding would cover basic living expenses and allow continued progress at a reduced pace.

The full funding goal would support approximately four to six months of focused research time in the Bay Area, enabling completion of the theory, experiments, and manuscripts described above.
