Manifund foxManifund
Home
Login
About
People
Categories
Newsletter
HomeAboutPeopleCategoriesLoginCreate
1

Exploring a Single-FPS Stability Constraint in LLMs (ZTGI-Pro v3.3)

Science & technologyTechnical AI safety
Capter avatar

Furkan Elmas

Not fundedGrant
$0raised

Project Summary

This is an early-stage, single-person research project exploring whether a single-scalar “hazard” signal can track internal instability in large language models.

The framework is called ZTGI-Pro v3.3 (Tek-Throne / Single-FPS model).
The core idea is that inside any short causal-closed region (CCR) of reasoning, a model should behave as if there is one stable executive trajectory (Single-FPS).
When the model is pulled into mutually incompatible directions—contradiction, “multiple voices”, incoherent reasoning—the Single-FPS constraint begins to break, and we can treat the system as internally unstable.

ZTGI-Pro models this pressure with a hazard scalar:

H=I=−ln⁡QH = I = -\ln QH=I=−lnQ

fed by four internal signals:

  • σ — jitter (unstable token-to-token transitions)

  • ε — dissonance (self-contradiction, “two voices”)

  • ρ — robustness

  • χ — coherence

These feed into HHH.
As inconsistency grows, HHH increases; a small state machine switches between:

  • SAFE

  • WARN

  • BREAK (Ω = 1)

When E ≈ Q drops near zero and Ω = 1, the CCR is interpreted as no longer behaving like a single stable executive stream.

So far, I have built a working prototype on top of a local LLaMA model (“ZTGI-AC v3.3”).
It exposes live metrics (H, dual EMA, risk r, p_break, gates) in a web UI and has already produced one full BREAK event with Ω = 1.

This is not a full safety solution—just an exploratory attempt to see whether such signals are useful at all.

Additionally, I recently published the ZTGI-V5 Book (Zenodo DOI: 10.5281/zenodo.17670650), which expands the conceptual model, formalizes CCR/SFPS dynamics, and clarifies the theoretical motivation behind the hazard signal.


What are this project’s goals? How will you achieve them?

Goals (exploratory)

  • Finalize and “freeze” the mathematical core of ZTGI-Pro v3.3
    (hazard equations, hysteresis, EMA structure, CCR / Single-FPS interpretation).

  • Turn the prototype into a small reproducible library others can test.

  • Design simple evaluation scenarios where the shield either helps or clearly fails.

  • Write a short, honest technical report summarizing results and limitations.

How I plan to achieve this

  • Split the current prototype into:

    • ztgi-core (math, transforms, state machine)

    • ztgi-shield (integration with LLM backends)

  • Build 3–4 stress-test scenarios:

    • contradiction prompts

    • “multi-executor” / multiple-voice prompts

    • emotional content

    • coherence-stress tests

  • Log hazard traces with and without the shield, compare patterns.

  • Document all limitations clearly (false positives, flat hazard, runaway hazard).

  • Produce a small technical note or arXiv preprint as the final deliverable.

This is intentionally scoped:
The goal is to test viability, not claim guarantees.


What has been built so far?

The prototype currently supports:

  • A LLaMA-based assistant wrapped in ZTGI-Shield

  • Real-time computation of:

    • hazard HHH

    • dual EMA Hs,Hl,H^H_s, H_l, \hat{H}Hs​,Hl​,H^

    • risk r=H^−H∗r = \hat{H} - H^*r=H^−H∗

    • collapse probability pbreakp_\text{break}pbreak​

    • mode labels (SAFE / WARN / BREAK)

    • INT/EXT gates

  • A live UI that updates metrics as conversation progresses

Stress test outcomes

  • For emotionally difficult messages (“I hate myself”), the shield remained in SAFE, producing supportive responses without panicking.

  • For contradiction and “multi-voice” prompts, hazard increased as expected.

  • In one extreme contradiction test, the system entered a full BREAK state with:

    • high H

    • near-zero Q / E

    • p_break ≈ 1

    • INT gate

    • collapse flag Ω = 1 set

These are early single-user tests, but they show interpretable signal behavior.


How will this funding be used?

Request: $20,000–$30,000 for 3–6 months.

Breakdown

  • $10,000 — Researcher time
    To work full-time without immediate financial pressure.

  • $6,000 — Engineering & refactor
    Packaging, examples, evaluation scripts, dashboard polish.

  • $2,000–$3,000 — Compute & infra
    GPU/CPU time, storage, logs, testing.

  • $2,000 — Documentation & design
    Technical note, diagrams, reproducible examples.

Deliverables include:

  • cleaned-up codebase,

  • simple eval suite,

  • reproducible dashboard,

  • and a short technical write-up.


Roadmap (high-level)

Month 1–2 — Core cleanup

  • Standardize v3.3 equations (ρ family, calibrations).

  • Refactor into ztgi-core / ztgi-shield.

  • Add tests & examples.

Month 2–3 — Evaluations

  • Define 3–4 stress scenarios.

  • Collect hazard traces.

  • Compare with/without shield.

  • Summarize failures + successes.

Month 3–6 — Packaging & report

  • Release code + dashboard.

  • Publish a short technical note (or arXiv preprint).

  • Document limitations + open problems.


How does this contribute to AI safety?

This project asks a narrow but important question:

“Can a single scalar hazard signal + a small state machine
give useful information about when an LLM’s local CCR
stops behaving like a single stable executive stream?”

If no, the negative result is useful.
If yes, ZTGI-Pro may become a small building block for:

  • agentic system monitors,

  • inconsistency detectors,

  • collapse warnings,

  • or more principled hazard models.

All code, metrics, and results will be publicly available for critique.


Links

Primary Materials

  • ZTGI-V5 Book (Zenodo, DOI):
    https://doi.org/10.5281/zenodo.17670650

  • ZTGI-Pro v3.3 Whitepaper (DOI):
    https://doi.org/10.5281/zenodo.17537160

Live Demo (Experimental — Desktop Only)

https://indianapolis-statements-transparency-golden.trycloudflare.com

This Cloudflare Tunnel demo loads reliably on desktop browsers (Chrome/Edge).
Mobile access may not work. If the demo is offline, please refer to the Zenodo reports.

Update:
The full ZTGI-Pro v3.3 prototype is now open-source under an MIT License.
GitHub repository (hazard layer, shield, CCR state machine, server, demo code):

👉 https://github.com/capterr/ZTGI-Pro-v3.3

If anyone wants a minimal working example or guidance on how the shield integrates with LLaMA (GGUF), I’m happy to provide it.
Model path + installation instructions are included in the README.

— Furkan

Screenshots

https://drive.google.com/file/d/1v5-71UgjWvSco1I7x_Vl2fbx7vbJ_O9n/view?usp=sharing

https://drive.google.com/file/d/1P0XcGK_V-WoJ_zyt4xIeSukXTLjOst7b/view?usp=sharing

  • SAFE / WARN / BREAK transitions

  • p_break and H/E trace examples

  • UI screenshots

Comments2Similar6
quentin101010 avatar

Quentin Feuillade--Montixi

AI-Powered Knowledge Management System for Alignment Research

Funding to cover the first 4 month and relocating to San Francisco

Science & technologyTechnical AI safety
2
2
$0 raised
EGV-Labs avatar

Jared Johnson

Beyond Compute: Persistent Runtime AI Behavioral Conditioning w/o Weight Changes

Runtime safety protocols that modify reasoning, without weight changes. Operational across GPT, Claude, Gemini with zero security breaches in classified use

Science & technologyTechnical AI safetyAI governanceGlobal catastrophic risks
1
0
$0 raised
jesse_hoogland avatar

Jesse Hoogland

Next Steps in Developmental Interpretability

Addressing Immediate AI Safety Concerns through DevInterp

Technical AI safety
10
4
$80.7K raised
🦀

Chi Nguyen

Acausal research and interventions

Making sure AI systems don't mess up acausal interactions

Technical AI safetyGlobal catastrophic risks
8
4
$80K raised
🥭

Thane Ruthenis

Synthesizing Standalone World-Models

Research agenda aimed at developing methods for constructing powerful, easily interpretable world-models.

Science & technologyTechnical AI safetyGlobal catastrophic risks
3
6
$51.5K raised
MichelJusten avatar

Michel Justen

Video essay on risks from AI accelerating AI R&D

Help turn the video from an amateur side-project to into an exceptional, animated distillation

AI governanceGlobal catastrophic risks
1
5
$0 raised