Alexander Bistagne

@alexhb61

Computational Complexity & AI Alignment Independent Researcher

https://github.com/Alexhb61
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

I was introduced to concerns about AGI alignment through Robert Miles' work back in college around 2018, and it motivated me to take Lise Getoor's Algorithms and Ethics class at UC Santa Cruz.

Now I'm an independent researcher who's working on AI Alignment, among other things. My current approach to AI Alignment is to apply computational complexity techniques to black boxes. I'm of the opinion that aligning black-box AIs after construction is infeasible.

Projects

Comments


Post available on LessWrong and submitted to the Alignment Forum.

https://www.lesswrong.com/posts/JxhJfqfTJB9dkq72K/alignment-is-hard-an-uncomputable-alignment-problem-1


Project is on GitHub: https://github.com/Alexhb61/Alignment/blob/main/Draft_2.pdf

Adding citations and submitting to the Alignment Forum tomorrow.


Alexander Bistagne

2 months ago

This project is nearly at its target, but it hit a delay near the beginning of September, as I needed to take up other work to pay the bills. Hopefully I will post the minimal paper soon.


Alexander Bistagne

4 months ago


Conditional on the 6K goal being reached, I have committed to submitting an edited draft to the Alignment Forum on August 23rd.


Alexander Bistagne

4 months ago

Correction: co-RE is the class, not co-R. It is the set of problems reducible to the complement of the halting problem.
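
For reference, here is a standard way to state that definition in symbols (a restatement of the textbook definitions, not a quote from the draft): a language is co-RE exactly when its complement is recognizable, equivalently when it many-one reduces to the complement of the halting problem, which is co-RE-complete.

```latex
% Standard definitions (restated for reference, not quoted from the draft):
% the complement of the halting problem, and the class co-RE.
\overline{\mathrm{HALT}} \;=\; \{\, \langle M, x \rangle \;\mid\; M \text{ does not halt on input } x \,\}
\qquad
\mathrm{co\text{-}RE} \;=\; \{\, L \;\mid\; \overline{L} \in \mathrm{RE} \,\}
```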


Alexander Bistagne

4 months ago

Technical detail worth mentioning: here is the main theorem of the 6K project:

Proving that an immutable-code agent with a Turing-complete architecture in a Turing-machine-simulatable environment has nontrivial betrayal-sensitive alignment is co-R-hard.

The paper would define nontrivial betrayal-sensitive alignment and some constructions on agents needed in the proof.
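
To illustrate the flavor of such a hardness claim (a generic Rice-style sketch under assumed definitions, not the paper's actual construction or its definition of betrayal-sensitive alignment): embed an arbitrary machine inside an agent so that the agent takes its one "betrayal" action exactly when the embedded machine halts. Then "this agent never betrays" holds iff the machine never halts on the given input, which is a co-RE-complete question, so deciding that property is at least that hard.

```python
# A generic illustrative sketch (assumed framing, not the paper's construction):
# reduce the complement of the halting problem to a "never betrays" check.
# The agent advances a step-by-step simulation of a machine M on input x each
# time it acts, and emits "betray" only once that simulation halts. So the
# agent never betrays iff M does not halt on x.

from typing import Callable, Iterator


def make_agent(machine_steps: Callable[[str], Iterator[None]], x: str) -> Callable[[int], str]:
    """Build a black-box agent from a machine's step iterator and an input.

    machine_steps(x) yields once per simulated step and is exhausted
    exactly when the machine halts on x.
    """
    sim = machine_steps(x)

    def agent(_observation: int) -> str:
        try:
            next(sim)           # advance the embedded simulation by one step
            return "cooperate"  # still running: keep behaving well
        except StopIteration:
            return "betray"     # embedded machine halted: the single bad action

    return agent


# Toy example: a "machine" that halts after len(x) steps, so this agent
# eventually betrays; a machine that never halts would yield an agent that
# cooperates forever, and no finite amount of testing distinguishes the two.
def toy_machine(x: str) -> Iterator[None]:
    for _ in x:
        yield


agent = make_agent(toy_machine, "abc")
print([agent(t) for t in range(5)])
# ['cooperate', 'cooperate', 'cooperate', 'betray', 'betray']
```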


Alexander Bistagne

4 months ago

Thanks for the encouragement and donation.

The 40K maximum would be a much larger project than the 6K project, which is what I summarized.

6K would cover editing:

- Argument refuting testing anti-betrayal alignments in Turing-complete architectures

- Argument connecting testing alignment to training alignment in single-agent architectures

40K would additionally cover developing and editing:

- Arguments around anti-betrayal alignments in deterministic or randomized, P-complete or PSPACE-complete architectures

- Arguments around short-term anti-betrayal alignments

- Arguments connecting do-no-harm alignments to short-term anti-betrayal alignments

- Arguments refuting general solutions to the stop-button problem which transform the utility function in a computable-reals context

- Arguments around general solutions to the stop-button problem with floating-point utility functions

- Foundations for modelling mutable agents or subagents