Miguelito De Guzman

@Miguel

Independent Researcher

whitehatstoic.com
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

Projects

Model Interpretability on modFDTGPT2-XL, a partially-aligned model

Comments

Model Interpretability on modFDTGPT2-XL, a partially-aligned model
Miguelito De Guzman

almost 2 years ago

Thank you for the offer, Nicholas Doiron!

Jacques Thibodeau - Independent AI Safety Research
Miguelito De Guzman

almost 2 years ago

Hello,

I upvoted this because I have personally explored this area and identified numerous possibilities and points of interest. Comparing base models to their variants in terms of alignment is currently underexplored, and I encourage more people to focus on it.

Scoping Developmental Interpretability
Miguelito De Guzman

almost 2 years ago

I am also studying phase transitions in GPT2-xl, and I believe this mechanism needs further research. I fully support this application!

Model Interpretability on modFDTGPT2-XL, a partially-aligned model
Miguelito De Guzman

almost 2 years ago

Just finished the first update post on this project: An Analysis of Activation Values (ActVal) in GPT2-xl and modFDTGPT2-xl.

Joseph Bloom - Independent AI Safety Research
Miguelito De Guzman

almost 2 years ago

I am one of the ARENA 2.0 online participants, and from my interactions with Joseph I can say he is very insightful. I believe he is competent enough to deliver on his work in the alignment space.

Model Interpretability on modFDTGPT2-XL, a partially-aligned model
Miguelito De Guzman

almost 2 years ago

Thank you, @Vincent Weisser.

The offers are much appreciated.