@kaminovs
Independent researcher working on AI evaluation and agent reliability. Built CRepair, a benchmark and runtime enforcement layer for measuring structural self-repair in LLM agents. Two preprints published on Zenodo. Background in data analytics and systems thinking.
https://github.com/kaminovs/crepair
I work at the intersection of AI safety and empirical evaluation. My current research programme centres on a question that existing benchmarks don't answer: when an AI agent's reasoning breaks down, can it detect the failure, repair it, and verify the repair worked?
I built CRepair to measure this. The benchmark revealed that under standard conditions, LLMs achieve a 0% verification rate: they detect and repair failures but never close the loop by verifying that the repair worked. A follow-up ablation study showed that structured runtime intervention raises this substantially (+0.333 mean improvement) while generic re-prompting barely moves the needle (+0.051).
Both papers are published as open preprints. All code is open source. I'm looking for funding to run cross-model replication (GPT-4o, Gemini) and develop the research into a community benchmark with a public leaderboard.
Background: independent researcher based in the UK, with a day job in casino data analytics. This research is built in my own time.