What are this project's goals? How will you achieve them?
AI text detectors are everywhere right now, but they all have the same problem: they give you a label and nothing else. "This is AI," says the machine, and you're supposed to just trust it. For students accused of cheating, for researchers trying to audit content, for anyone who wants to understand the decision, that's not good enough.
TELL (Traced Evidence for Learned Labels) is a research prototype that changes this. Instead of outputting a bare score, it outputs evidence: specific spans in the original text, each with an explanation of why that phrase looks AI-generated or human-written. The goal is to move AI detection from a black box to something closer to a peer reviewer.
The pipeline has two parts. A policy model, trained with GRPO, learns to insert <tell explanation="..."> tags around spans in the text without changing a single word. Then a separate, frozen scoring model assigns a continuous score between -1 (human) and +1 (AI) to each tag. The final verdict is the average of all tells. By separating the "tagging" and the "scoring" into two different models, the system can't cheat: the policy model can't just pick words it knows will score high; it actually has to find real evidence.
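The verdict step described above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the tag format is taken from the description, but `score_span` is a stub standing in for the frozen scoring model.

```python
import re
from statistics import mean

# Hypothetical stand-in for the frozen scorer; the real system uses a
# separate model that maps (span, explanation) to a score in [-1, +1],
# where -1 means human-like and +1 means AI-like.
def score_span(span: str, explanation: str) -> float:
    return 0.5  # placeholder value for illustration

# Matches <tell explanation="...">span</tell> tags inserted by the policy model.
TELL_RE = re.compile(r'<tell explanation="([^"]*)">(.*?)</tell>', re.DOTALL)

def verdict(tagged_text: str) -> float:
    """Final verdict: average of the frozen scorer's score over every tell."""
    tells = TELL_RE.findall(tagged_text)
    if not tells:
        return 0.0  # no evidence found; treat as neutral
    return mean(score_span(span, expl) for expl, span in tells)

tagged = ('The study <tell explanation="formulaic hedging">leverages a '
          'novel paradigm</tell> to explore outcomes.')
print(round(verdict(tagged), 2))  # → 0.5
```

Because the policy model only inserts tags, stripping them must recover the original text exactly, which is what keeps the evidence grounded in the input.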
The key research question I want to keep working on is generalization: right now, the model is trained on scientific abstracts in English, and it already shows promising results. But real-world AI detection needs to work across languages, domains, and writing styles, including non-native English. I'm Spanish, so I'm particularly interested in what happens with translated text or Spanish writing. My plan is to expand the training data, improve Z-score normalization for out-of-distribution (OOD) cases, and run more systematic evaluations.
How will this funding be used?
Honestly, the main bottleneck right now is compute. Training runs are expensive, and GRPO is sampling-heavy. This is how I'd use the funding:
API costs for Tinker (training and sampling) and inference for the frozen scorer: around 70% of the budget.
Evaluation: building a proper benchmark across domains and languages (including Spanish), and collecting or licensing some human-written text data: around 20%.
Miscellaneous infrastructure (hosting the demo online, storage for checkpoints): around 10%.
The minimum funding would let me run two or three full training runs with different data mixtures and evaluate OOD generalization more carefully. With full funding I could also start the multilingual track.
Who is on your team? What's your track record on similar projects?
Right now this is a solo project. I'm Aldan Creo, an MSDS student at the Halıcıoğlu Data Science Institute (HDSI) at UC San Diego, funded by Fulbright. My research is on AI-generated text detection and LLM watermarking, working with Prof. Yu-Xiang Wang. In parallel, I work in Prof. Earlence Fernandes's lab on prompt injection.
TELL started as a hackathon project for DiamondHacks 2026, where I built the full pipeline from scratch. It's also closely related to my ongoing graduate research and a previous project (CoDeTect) on out-of-distribution code detection, where I proposed a similar Z-scoring approach for language normalization.
What are the most likely causes and outcomes if this project fails?
The most likely failure mode is that the policy model doesn't generalize beyond scientific abstracts. If that happens, the system works in a narrow domain but isn't practically useful. A second risk is that the frozen scorer becomes a bottleneck: if it's badly calibrated, the tells can be misleading even if the format is correct. Both of these are known issues I'm already working on. If the project fails completely, the worst outcome is that it stays a narrow research prototype, which is still useful for the scientific community and as a proof of concept.
How much money have you raised in the last 12 months, and from where?
I'm funded through the Fulbright program for my MSDS studies, but that doesn't cover research compute costs. I haven't raised any external funding specifically for this project.