I'm an independent AI researcher building a measurement system that quantifies how much of a transformer language model's next-token behavior is driven by its own hidden state, as opposed to the prompt or training-induced constraints. The system is called IOTA. It decomposes next-state determination into three information channels: external input (E), constraint pressure (C), and prior internal state (R), with E + C + R = 1, derived formally from the chain rule of mutual information.
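As a rough sketch of how a chain-rule decomposition of this kind works (the exact definitions live in the technical note; the symbols and conditioning order below are illustrative, not IOTA's actual notation):

```latex
% Chain rule of mutual information applied to the next hidden state S_t
% and its three candidate sources: external input X, constraint signal K,
% and prior internal state S_{t-1}. The conditioning order is illustrative.
I(S_t;\, X, K, S_{t-1})
  = I(S_t; X) + I(S_t; K \mid X) + I(S_t; S_{t-1} \mid X, K)

% Normalizing each term by the total yields shares that sum to one:
E = \frac{I(S_t; X)}{I(S_t;\, X, K, S_{t-1})},\quad
C = \frac{I(S_t; K \mid X)}{I(S_t;\, X, K, S_{t-1})},\quad
R = \frac{I(S_t; S_{t-1} \mid X, K)}{I(S_t;\, X, K, S_{t-1})},
\qquad E + C + R = 1.
```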
The headline result so far: across three transformer architectures (LLaMA 3 8B, Gemma 2 9B, Gemma 2 2B), the internal-state share R converges at T=1.0 despite different training data, different architectures, and qualitatively different temperature profiles. Four independent measurements across the three architectures and two precisions all fall within R ∈ [0.40, 0.46] at T=1.0.
A precision comparison on Gemma 2 2B reveals a qualitative shift: at FP16, R climbs monotonically from 0.25 (T=0.0) to 0.45 (T=1.0); at Q4, R is temperature-invariant around 0.44. The FP16 run is also the first in any model tested to reach OLS significance at T=0.0.
A linear-vs-nonlinear cross-check (Ridge vs MLP reference) shows the linear estimator is accurate to within 0.008 on the FP16 run and increasingly underestimates under Q4 quantization (gap 0.03 at 2B, 0.05 at 9B, 0.09 at 8B). The linear decomposition is a point estimate under full precision and a conservatively biased lower bound under quantization.
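A minimal sketch of what a linear-vs-nonlinear cross-check of this shape looks like, using scikit-learn. The synthetic data, dimensions, and variable names here are my illustrative assumptions, not IOTA's actual estimator or features; the point is only the pattern of fitting Ridge and an MLP reference on the same prediction problem and comparing held-out explained variance.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Synthetic stand-in: features X predicting a target y that is mostly
# linear with a mild nonlinearity (hypothetical, for illustration only).
X = rng.normal(size=(2000, 32))
y = X @ rng.normal(size=32) + 0.3 * np.tanh(X[:, 0] * X[:, 1])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit the linear estimator and a nonlinear reference on identical data.
linear = Ridge(alpha=1.0).fit(X_tr, y_tr)
nonlin = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                      random_state=0).fit(X_tr, y_tr)

r2_linear = r2_score(y_te, linear.predict(X_te))
r2_mlp = r2_score(y_te, nonlin.predict(X_te))
# A positive gap indicates the linear estimate is a lower bound.
gap = r2_mlp - r2_linear
print(f"Ridge R^2 = {r2_linear:.3f}, MLP R^2 = {r2_mlp:.3f}, gap = {gap:.3f}")
```

If the gap stays near zero, the linear decomposition can be read as a point estimate; a consistently positive gap marks it as a conservatively biased lower bound, which is the interpretation applied to the quantized runs above.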
Supporting evidence on the existing measurements:
Activation patching confirms internal state is causally upstream of output
Synthetic calibration against ground-truth systems shows the estimator is conservatively biased (empirical R values are lower bounds)
Confound tests rule out prompt and estimator artifacts
47 experimental runs, 51 preregistered null hypotheses, held-out validation passes
All runs completed on a single RTX 3080 (10 GB VRAM)
Technical note with formal proofs of boundedness, invariance, non-triviality, and behavior under erasure: Zenodo DOI 10.5281/zenodo.19656944
The empirical paper is nearly ready for arXiv submission via a cs.AI endorsement.
Code and data: github.com/vee-industries/iota-code
Primary goal: land the empirical paper on arXiv with the precision ladder complete across all three architectures.
I currently have FP16 results on Gemma 2 2B and Q4 results on all three models. To make the quantization finding robust across architectures, I need:
Q8 on Gemma 2 2B (fits on the 3080, running this week)
FP16 on LLaMA 3 8B (does not fit on the 3080)
FP16 on Gemma 2 9B (does not fit on the 3080)
The 8B and 9B FP16 runs are the bottleneck. A GPU with 24–48 GB VRAM makes these runs tractable. The 3080 cannot run them at full precision.
Secondary goal: maintain enough financial stability to finish the paper and start the next one without disruption. I'm on medical leave from my day job, my long-term disability claim is in active appeal, and I need runway to keep the lights on while the paper gets finished and submitted.
Full funding goal: $20,000 USD. Minimum: $5,000 USD.
Itemized budget:
Used NVIDIA RTX A6000 (48 GB VRAM) or equivalent: ~$4,500
4 months of living expenses (rent, utilities, food for a family of 3): ~$12,000
Cloud GPU rental for 70B replication attempt: ~$2,000
Buffer for unforeseen costs: ~$1,500
At minimum funding ($5,000): GPU only. The precision ladder experiments complete, the paper finishes with the 8B/9B FP16 results integrated, but no runway for follow-up work and no budget for 70B replication. This is still a successful outcome for the project even if I have to work around financial constraints personally.
At full funding ($20,000): GPU plus four months of runway plus the 70B replication. Four months is enough to finish the paper, submit to arXiv, handle the first round of reviewer feedback, and start the follow-up work on precision scaling.
Team: Solo researcher. Kevin Vaillancourt, independent, based in Greater Sudbury, Ontario, Canada.
Potential collaborator: Dr. Meng Cheng Lau (Laurentian University), with possible co-authorship on applying the same decomposition framework to humanoid robotic gait control.
Track record:
Built the full IOTA measurement system from scratch between January and April 2026: formal mathematical framework, 28-file Python codebase, preregistered hypothesis registry, synthetic calibration suite, live experimental dashboard, subprocess-isolated GPU orchestration with fault-tolerant resume
Published technical note establishing formal properties of the measurement target (Zenodo DOI above)
Completed 47 experimental runs across three model families with causal validation via activation patching
Falsified my own prior prediction about capacity-driven R behavior in small models when the 2B FP16 data came in, and revised the framing honestly
Built prior engineering projects including a local LLM inference server with persistent KV state and a slot-based memory system for stateless LLMs
Background: BSc Interdisciplinary Science, Laurentian University. Independent researcher since early 2025. Neurodivergent profile (2e AuDHD). Independent research format works well for how I focus.
Most likely failure modes, ranked:
Hardware failure or data loss. I'm running on a single consumer machine with no redundancy and no budget for replacement hardware if the GPU or storage fails. A drive failure or GPU failure would halt the experimental program until replacement hardware was available. Mitigation: the codebase and technical note are pushed to public repositories (GitHub, Zenodo) so the intellectual work is preserved; experimental data is backed up opportunistically. The requested funding partly addresses this by providing a second, higher-capacity GPU.
The 8B/9B FP16 results reveal the quantization finding is Gemma-2B-specific. This would mean the R convergence at Q4 has a scale or architecture confound I haven't disentangled. This is a real possibility and it's why the precision ladder matters. If it happens, the paper still has the preregistered hypothesis results, the causal patching evidence, and the formal framework — it just loses the cross-precision convergence finding. Still publishable; less compelling.
No replication at 70B. The stretch goal might reveal the pattern breaks above a certain capacity threshold. This would be a real scientific finding, not a failure.
Outcomes regardless of failure mode: the codebase, data, preregistered registry, and technical note are all public and stay public. Anyone can replicate or extend the work.
$0 in research funding. The full 47-run experimental program, the codebase, and the technical note were produced without any grant, fellowship, or institutional support. Self-funded out of pocket while on medical leave. I have not previously applied for research funding. Manifund is my first application.