@Loppukilpailija
$0 in pending offers
Loppukilpailija
2 months ago
I've found the weekly Sentinel minutes to be very high-quality reporting about world events
Loppukilpailija
7 months ago
"you see more value and practicality in the first steps of this decomposition (taking one big N-level model and decomposing it into n (N-1)-level models) rather than the last steps [...]?"
Yes, I'd see a lot of value in being able to do the first steps of decomposition. I'm particularly thinking about concerns stemming from the AI itself being dangerous, as opposed to systemic risks. Here I think that "a system built of n (N-1)-level models" would likely be much safer than "one N-level model" for reasonable values of n. (E.g. I think this would plausibly be much better in terms of hidden cognition, AI control, deceptive alignment, and staying within assigned boundaries.)
"I would expect the largest performance hit to occur primarily in the initial decomposition steps, and for decomposition to hold on until the end."
I would expect this, too. This is a big factor in why I think one should look here: it doesn't really help if one can solve the (relatively easy) problem of constructing plain-coded white-petal-detectors, if one can't decompose the big dangerous systems into smaller systems. But if, on the other hand, one could get comparable performance from a bunch of small models, or even just one (N-0.1)-level model and a lot of specialized models, then that would be really valuable.
"but are currently worried about both expanding capabilities and safety concerns"
Makes sense. "We are able to get comparable performance by using small models" has the pro of "we can use small models", but the con of "we can get better performance with such-and-such assemblies". I do think this is something one has to think about seriously.
Loppukilpailija
7 months ago
CAIS has done great work in the past; I'm showing appreciation with a small donation, and hope that larger donors will provide more funding
Loppukilpailija
7 months ago
I do think that the textbook improves upon BlueDot Impact's material, and more broadly I think the "small things" such as having one self-contained exposition (as opposed to a collection of loosely-connected articles) are actually quite important.
I second Ryan Kidd's recommendation of consulting people with experience in AI safety strategy and AI governance. I think getting in-depth feedback from many experts could considerably improve the quality of the material, and suggest allocating time for properly collecting and integrating such feedback.
My understanding is that there are a lot of AI safety courses that use materials from the same reference class, and would use improved materials in the future. Best of skill on executing the project!
Loppukilpailija
7 months ago
Overall I find safer-by-design constructions of AI systems an exciting direction: the ideas of constructability are quite orthogonal to other approaches, and marginal progress there could turn out to be broadly useful.
That said, I do think this direction is littered with skulls, and consider the modal outcome to be failure. I think that the fully-plain-coded approaches in particular are not practical for the types of AI we are most worried about, and working on them would very likely be a dead end. I'm more excited about top-down approaches: trying to make models more modular-by-design while essentially retaining performance, in the sense of "we have replaced one big model with N not-so-big models".
The project authors seem to be aware of the skulls, and indeed the proposal has some novel components that may get around some issues. While I think it's still easy to run into dead ends, this is a good enough reason for me to fund the project.
Overall, I think simply having a better understanding of the constraints involved when trying to make systems safer-by-design is great. I'd quite like there to be people thinking about this, and would be happy about progress on mapping out dead ends and not-so-dead ends.
Loppukilpailija
7 months ago
This translation project seems worthwhile to me; I gave a small retrospective donation as a way of saying thank you.
Tangential, but I'm curious about how you proceeded with the translation. Are current machine translators (such as DeepL) good enough to be useful as a first draft, or did you do it completely via human work?
Loppukilpailija
11 months ago
I created an application for this; you can see a screenshot below. It supports the essential features, e.g. logging predictions.
Thanks to Isaac King for allowing me to build this project on top of his codebase; it helped me get started. He deserves a portion of the Shapley value.
You can get the code here. (It might require a bit of fiddling to get working, though, due to some files being in .gitignore.)
The bad news: I have not hosted it anywhere. (I lack the knowledge and, currently, the spoons. I'd very much appreciate it if someone else would do that, but I don't expect that to happen.) Thought I'd still share the update.
Loppukilpailija
about 1 year ago
Update:
I have a demo with basic functionality up.
I realize that I asked for way too much funding: this was much easier to create than I expected. (Not sure what to do with the post now.)
Will give a more substantive update a bit later.
| For | Date | Type | Amount |
|---|---|---|---|
| <c4929171-6618-46fa-9aed-a18c5bbca3e7> | 24 days ago | tip | 1 |
| Fund Sentinel for Q1-2025 | 27 days ago | project donation | 800 |
| Finishing The SB-1047 Documentary | about 2 months ago | project donation | 5000 |
| Fund Sentinel for Q1-2025 | 2 months ago | project donation | 500 |
| Fund Sentinel for Q1-2025 | 2 months ago | project donation | 1000 |
| Research Staff for AI Safety Research Projects | 6 months ago | project donation | 500 |
| <51822c19-d998-453a-9896-bf55d53e1642> | 7 months ago | tip | +1 |
| <51822c19-d998-453a-9896-bf55d53e1642> | 7 months ago | tip | 1 |
| <28ef2a62-3b35-47da-beb8-1d7acce2095d> | 7 months ago | tip | 1 |
| <51822c19-d998-453a-9896-bf55d53e1642> | 7 months ago | tip | 1 |
| AI Safety Textbook | 7 months ago | project donation | 2000 |
| <5b5e53f5-c48c-4c35-a492-c07c6c34fb12> | 7 months ago | tip | 1 |
| Translation of BlueDot Impact's AI alignment curriculum into Portuguese | 7 months ago | project donation | 300 |
| Manifund Bank | 7 months ago | deposit | +10000 |
| Manifund Bank | 8 months ago | mana deposit | +110 |