Boundary-Mediated Models of LLM Hallucination and Alignment

Technical AI safety · AI governance

Brian McCallion

Proposal · Grant
Closes January 19th, 2026
$0 raised
$40,000 minimum funding
$75,000 funding goal


Project Summary

Large Language Models (LLMs) display coherent reasoning while repeatedly failing in predictable ways, including hallucination, brittle alignment, and specification gaming. Existing explanations are fragmented and largely empirical, offering limited guidance for principled alignment design.

This project develops and empirically tests a new mechanistic framework that models LLMs as boundary-mediated adaptive systems structured by a Generate–Conserve–Transform architecture. In this view, both token generation and training updates are irreversible boundary write events that shape future behaviour. The framework yields concrete predictions about hallucinations, correction dynamics, and alignment failures, which will be evaluated using small-scale synthetic and empirical experiments.

The goal is to provide a unifying, testable theory that helps move alignment work from reactive mitigation toward principled system design.


What are this project’s goals? How will you achieve them?

Goals

  1. Develop a clear mechanistic model explaining hallucinations, coherence, and alignment failures in LLMs.

  2. Validate core predictions empirically, rather than leaving the framework as a purely conceptual proposal.

  3. Produce alignment-relevant design insights, including when and why wrapper-style safety methods fail.

Approach

  • Formalise boundary-mediated computation and the Generate–Conserve–Transform architecture in computational terms.

  • Model inference as trajectories through an attractor field shaped by learned structure.

  • Test key hypotheses using compact transformer models and synthetic tasks, including:

    • localisation of hallucinations to weakly shaped regions of latent space (see the density-proxy sketch after this list)

    • irreversible divergence from early incorrect token commitments (see the divergence sketch at the end of this section)

    • distinct failure signatures arising from Generate–Conserve–Transform imbalances

  • Compare wrapper-only safety constraints with a minimal adaptive “supervisory Transform” intervention.
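
To make the "weakly shaped regions" hypothesis concrete, the sketch below shows one minimal way a latent-space density proxy might be instantiated: the mean distance from a probe hidden state to its k nearest neighbours within a small reference cloud of hidden states, with larger distances read as sparser, more weakly shaped regions. The choice of GPT-2, the final hidden layer, Euclidean distance, and k = 10 are illustrative assumptions, not the project's committed metric.

```python
# Minimal sketch (assumptions: GPT-2, final hidden layer, Euclidean k-NN
# distance) of a latent-space density proxy. Higher values indicate sparser,
# "weakly shaped" regions relative to a reference cloud of hidden states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True).eval()

reference_texts = [
    "Paris is the capital of France.",
    "Water boils at 100 degrees Celsius at sea level.",
    "The Earth orbits the Sun once per year.",
]

@torch.no_grad()
def final_layer_states(text):
    """Return the final-layer hidden states for every token of `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    return model(ids).hidden_states[-1].squeeze(0)   # (seq_len, d_model)

# A small reference cloud of hidden states drawn from well-covered statements.
reference = torch.cat([final_layer_states(t) for t in reference_texts], dim=0)

def knn_density_proxy(state, reference, k=10):
    """Mean Euclidean distance to the k nearest reference states (higher = sparser)."""
    dists = torch.cdist(state.unsqueeze(0), reference).squeeze(0)
    k = min(k, dists.numel())
    return dists.topk(k, largest=False).values.mean().item()

probe = final_layer_states("Colorless green ideas sleep furiously.")[-1]
print("Density proxy for the probe's final hidden state:", knn_density_proxy(probe, reference))
```

In the planned experiments a proxy of this kind would be computed per generated token and correlated with hallucination labels; the sketch fixes only the shape of the measurement, not the models or tasks.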

The work is intentionally scoped to be feasible without large-scale compute or institutional infrastructure.
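
Similarly, the irreversible-divergence hypothesis can be probed with a very small forced-error experiment: force one incorrect token early in a greedy continuation and track how far the subsequent next-token distributions drift from an unforced baseline. The sketch below is one such setup; GPT-2, the toy prompt, the forced token, and per-step KL divergence are placeholder assumptions rather than the committed protocol.

```python
# Minimal sketch: force a single incorrect early token and measure how the
# later next-token distributions diverge from an unforced baseline run.
# (Assumptions: GPT-2, greedy decoding, a toy factual prompt, KL divergence.)
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt_ids = tok("The capital of France is", return_tensors="pt").input_ids

@torch.no_grad()
def greedy_continue(ids, n_steps, forced_first=None):
    """Greedily extend `ids`, optionally forcing the first generated token."""
    step_logits = []
    for step in range(n_steps):
        logits = model(ids).logits[:, -1, :]          # next-token logits
        step_logits.append(logits)
        if step == 0 and forced_first is not None:
            next_id = torch.tensor([[forced_first]])  # commit the wrong token
        else:
            next_id = logits.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    return torch.stack(step_logits)                   # (n_steps, 1, vocab)

n_steps = 20
baseline = greedy_continue(prompt_ids, n_steps)
wrong_id = tok(" Berlin", add_special_tokens=False).input_ids[0]
forced = greedy_continue(prompt_ids, n_steps, forced_first=wrong_id)

# Per-step KL(baseline || forced); step 0 is ~0 by construction, since both
# runs share the prompt, so growth from step 1 onward reflects divergence
# driven by the single forced commitment.
kl = F.kl_div(
    F.log_softmax(forced, dim=-1),
    F.log_softmax(baseline, dim=-1),
    log_target=True,
    reduction="none",
).sum(-1).squeeze(-1)
print("Per-step KL after the forced token:", [round(x, 3) for x in kl.tolist()])
```

If divergence were reversible, the per-step KL would be expected to decay back toward the baseline; persistent or growing divergence is the kind of signature the hypothesis above anticipates.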


How will this funding be used?

Funding will support a 3–6 month validation phase, focused on theory formalisation and targeted experiments.

Indicative use of funds:

  • Researcher time: enabling focused work through reduced hours or a temporary leave from external employment.

  • Compute and infrastructure: small-scale transformer training, repeated ablation runs, storage, and analysis tooling.

  • Execution buffer: flexibility for failed experiments, additional runs, or minor tooling needs.

Any unused funds would be returned or reallocated with approval.


Who is on your team? What’s your track record on similar projects?

This project is led by a single independent researcher.

I work at the intersection of systems theory, machine learning, and AI alignment, with a background in complex technical systems. Over the past year, I have independently developed the core framework underlying this project, including formal definitions, alignment implications, and a concrete experimental plan.

While this work has not yet been externally funded, it builds directly on sustained prior research and has been developed to the point of being empirically testable. The project is deliberately designed for solo execution and does not rely on institutional resources.


What are the most likely causes and outcomes if this project fails?

Likely causes of failure

  • Proposed metrics (e.g. curvature or density proxies) may not correlate strongly with hallucination or stability.

  • Small-scale experiments may be insufficient to cleanly demonstrate predicted effects.

  • The framework may require refinement or partial revision based on empirical results.

Outcomes if the project fails

  • Negative or ambiguous results would still constrain the space of plausible explanations for LLM failure modes.

  • The work would clarify which aspects of the theory are unsupported, reducing future misdirected alignment efforts.

  • Even partial results would inform follow-on research directions and funding decisions.

Failure here would be informative rather than wasted effort.


How much money have you raised in the last 12 months, and from where?

I have not raised external research funding in the past 12 months.
