Sudarsh Kunnavakkam

@skunnavakkam

https://sudarsh.com
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

I'm Sudarsh. I'm a first-year at Caltech studying physics. I also do research as a contractor at METR, work in the Shapiro Lab at Caltech, and do other things. Please reach out if you'd like, or check out my website at https://sudarsh.com :D

Projects

Tooling + Model Orgs for CoT Faithfulness Research

Outgoing donations

Marius Hobbhahn
$2000
about 1 hour ago

Comments

Tooling + Model Orgs for CoT Faithfulness Research

Sudarsh Kunnavakkam

9 months ago

@NeelNanda i'm doing this by training against a monitor (so negatively rewarding the model when the monitor fires). the reasoning traces definitely get higher reward (they fire the monitor less) and look by eye to be more obfuscated, but i think this "look by eye" impression is likely confirmation bias. i think continuing training from these checkpoints would work!
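
A minimal sketch of the reward shaping described in the comment above, assuming a fixed penalty is subtracted whenever the monitor fires. The names `Rollout`, `shaped_reward`, `monitor_fire_rate`, and the `penalty` parameter are hypothetical and chosen for illustration; the actual training setup is not specified in the comment. The fire-rate helper is one possible quantitative check to use alongside reading traces by eye.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    task_reward: float   # reward from the underlying task (assumed scalar)
    monitor_fired: bool  # True if the hypothetical CoT monitor flagged the trace


def shaped_reward(rollout: Rollout, penalty: float = 1.0) -> float:
    """Return the task reward minus a fixed penalty if the monitor fired."""
    return rollout.task_reward - (penalty if rollout.monitor_fired else 0.0)


def monitor_fire_rate(rollouts: list[Rollout]) -> float:
    """Return the fraction of rollouts on which the monitor fired."""
    if not rollouts:
        return 0.0
    return sum(r.monitor_fired for r in rollouts) / len(rollouts)
```

Comparing `monitor_fire_rate` across checkpoints would show whether the monitor fires less over training, though a lower rate alone does not establish that the traces are more obfuscated.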

Transactions

For | Date | Type | Amount
<9af369f0-dc1e-4577-9bb7-4ad4cb87131f> | about 1 hour ago | profile donation | 2000
Sudarsh Kunnavakkam | about 1 hour ago | cash to charity transfer | 2000
Manifund Bank | 8 months ago | withdraw | 1000
Tooling + Model Orgs for CoT Faithfulness Research | 8 months ago | project donation | +3000