@sohvenk
Independent Researcher. Currently working on AI Safety and Interpretability
https://sohv.github.io
Research Interests: Understanding LLM capabilities, AI safety and alignment, interpretability
My current research focuses on studying LLM capabilities and their failure modes, such as alignment faking and model scheming. I use interpretability and representation engineering to understand these mechanisms internally, with the goal of improving the safety of AI systems.