I am requesting funding for 6 months of independent mechanistic interpretability research focused on detecting deceptive behaviors in large language models, to be conducted in Kigali, Rwanda.
The project has two phases. Month 1: completing the ARENA curriculum (transformers, circuits, superposition, activation patching). Months 2–6: original research investigating whether whitebox probing methods can reliably detect cases where a model's internal representations contradict its outputs (what the interpretability literature defines as deceptive behavior), and whether these methods genuinely outperform blackbox baselines by leveraging model internals.
The primary goal is to produce original, publishable mechanistic interpretability research on deceptive behavior detection in LLMs, specifically by developing whitebox probing tools that generalize beyond standard academic benchmarks to realistic deployment settings.
I will achieve this by:
Completing ARENA in month 1 to build the technical foundation (TransformerLens, circuits, probing methods)
Designing and running experiments on deception detection using supervised probing and activation patching on open-weight models
Iterating toward a research output (a paper, Alignment Forum post, or benchmark) suitable for submission to an AI safety venue or fellowship
The secondary goal is to unlock access to mechanistic interpretability fellowships and internships (MATS, Neel Nanda's team, Anthropic residency), all of which require demonstrated prior independent research. This grant bridges that gap.
Full budget breakdown: [Sheet]
Summary:
Monthly living × 6 months (rent, food, internet, transport, stipend): $7,200
One-time setup (flight Port Sudan→Kigali, laptop, visa, household basics): $2,180
Cloud compute (net after $500 Modal GPU credit already secured): $1,500
Total: $10,880
The laptop is a hard requirement: all mech interp work requires running TransformerLens and PyTorch locally, and I currently have no working device. I access the internet from cafes in areas of Sudan partially controlled by the RSF militia.
This is a solo project. My relevant background:
Master's in AI for Science, AIMS South Africa / UCT - fully funded by Google DeepMind (2% acceptance rate). Thesis: Context-Aware Neural Network for the ARC-AGI benchmark, supervised by Prof. Ulrich Paquet (Google DeepMind).
Research Engineer, Sultan Qaboos University (Oman) - built an NLP analytics platform using multilingual embeddings and semantic similarity metrics to evaluate research funding.
1st place, Build with AI Hackathon, Kigali 2025
1st place, Qeen.AI Data Science Challenge, Qatar 2025
3rd place, InstaDeep Hackathon, Deep Learning Indaba, Kigali 2025
Undergraduate thesis: adversarial attacks on neural networks (FGSM & one-pixel attacks on MNIST/CIFAR-10)
GitHub: https://github.com/AhMedDa1
The most likely failure mode is that the research does not reach publication quality within the grant period. In that case, I would still have produced documented experiments, negative results, and a public write-up on the Alignment Forum, all of which have value for the field and are accepted as evidence of research experience by fellowship programs.
A secondary risk is that living costs in Kigali exceed estimates. I have budgeted conservatively, but the personal stipend is lean ($550/month) and leaves little buffer. The main cost categories (rent, food, internet) are well established in the local market, so large deviations are unlikely.
The worst outcome is that the research does not happen at all for lack of funding, and that is the most likely outcome if this grant is not awarded. There is no alternative funding path available to me.
$0. I have had no stable income or funding since completing my master's degree in September 2025. I am currently displaced in Sudan due to the ongoing armed conflict between the Sudanese Armed Forces and the RSF.